Metadata-Version: 2.4
Name: wet-mcp
Version: 2.2.0
Summary: Open-source MCP Server for web search, extract, and crawl with embedded SearXNG
Project-URL: Homepage, https://github.com/n24q02m/wet-mcp
Project-URL: Repository, https://github.com/n24q02m/wet-mcp.git
Project-URL: Issues, https://github.com/n24q02m/wet-mcp/issues
Author-email: n24q02m <quangminh2422004@gmail.com>
License: MIT
License-File: LICENSE
Keywords: crawl4ai,mcp,searxng,tavily-alternative,web-scraping
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: ==3.13.*
Requires-Dist: crawl4ai
Requires-Dist: httpx
Requires-Dist: litellm
Requires-Dist: loguru
Requires-Dist: mcp[cli]
Requires-Dist: pydantic
Requires-Dist: pydantic-settings
Description-Content-Type: text/markdown

# WET - Web ExTract MCP Server

**Open-source MCP Server for web scraping & multimodal extraction.**

[![PyPI](https://img.shields.io/pypi/v/wet-mcp)](https://pypi.org/project/wet-mcp/)
[![Docker](https://img.shields.io/docker/v/n24q02m/wet-mcp?label=docker)](https://hub.docker.com/r/n24q02m/wet-mcp)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)

## Features

- **Web Search** - Search via embedded SearXNG (metasearch: Google, Bing, DuckDuckGo, Brave)
- **Content Extract** - Extract clean content (Markdown/Text)
- **Deep Crawl** - Crawl multiple pages from a root URL with depth control
- **Site Map** - Discover website URL structure
- **Media** - List and download images, videos, audio files
- **Anti-bot** - Stealth mode bypasses Cloudflare, Medium, LinkedIn, Twitter

---

## Quick Start

### Prerequisites

- **Python 3.13** (any 3.13.x release; the package metadata pins `==3.13.*`), or use `uvx`

### Add to mcp.json

#### uvx (Recommended)

```json
{
  "mcpServers": {
    "wet": {
      "command": "uvx",
      "args": ["wet-mcp@latest"],
      "env": {
        "API_KEYS": "GOOGLE_API_KEY:AIza..."
      }
    }
  }
}
```

**That's it!** On first run, the server automatically:

1. Installs SearXNG from GitHub
2. Installs Playwright Chromium and its system dependencies
3. Starts the embedded SearXNG subprocess
4. Runs the MCP server
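
If the first search call fails, it can help to confirm that the embedded SearXNG subprocess is actually listening. The following is a minimal troubleshooting sketch, not part of wet-mcp: `is_port_open` and `wait_for_port` are hypothetical helper names, and port 8080 is the documented default of `WET_SEARXNG_PORT`.

```python
import socket
import time


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


def wait_for_port(host: str, port: int, deadline_s: float = 30.0,
                  interval_s: float = 0.5) -> bool:
    """Poll until host:port accepts connections or deadline_s elapses."""
    end = time.monotonic() + deadline_s
    while True:
        if is_port_open(host, port):
            return True
        if time.monotonic() >= end:
            return False
        time.sleep(interval_s)
```

Calling `wait_for_port("localhost", 8080)` while the server is starting returns `True` once something accepts connections on that port. It only checks TCP reachability, not that the listener is SearXNG.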

#### Docker

```json
{
  "mcpServers": {
    "wet": {
      "command": "docker",
      "args": ["run", "-i", "--rm", "-e", "API_KEYS", "n24q02m/wet-mcp:latest"],
      "env": {
        "API_KEYS": "GOOGLE_API_KEY:AIza..."
      }
    }
  }
}
```
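
Either entry above can be pasted into `mcp.json` by hand. For scripted setups, the sketch below merges the `wet` entry into an existing file without touching other servers. `add_wet_server` is a hypothetical helper, not part of wet-mcp, and the location of `mcp.json` depends on the MCP client, so the path is passed in as a parameter.

```python
import json
from pathlib import Path

# Server entry copied from the uvx example; swap in the Docker entry if preferred.
WET_ENTRY = {
    "command": "uvx",
    "args": ["wet-mcp@latest"],
    "env": {"API_KEYS": "GOOGLE_API_KEY:AIza..."},
}


def add_wet_server(config_path: Path, entry: dict = WET_ENTRY) -> dict:
    """Merge the "wet" entry into config_path, keeping other servers intact."""
    config = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        # Treat an empty file the same as a missing one.
        config = json.loads(text) if text.strip() else {}
    config.setdefault("mcpServers", {})["wet"] = entry
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

An existing `wet` entry is overwritten; every other key in the file is preserved.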

### Without uvx

```bash
pip install wet-mcp
wet-mcp
```

---

## Tools

| Tool | Actions | Description |
|:-----|:--------|:------------|
| `web` | search, extract, crawl, map | Web operations |
| `media` | list, download, analyze | Media discovery, download & analysis |
| `help` | - | Full documentation |

### Usage Examples

```json
{"action": "search", "query": "python web scraping", "max_results": 10}
{"action": "extract", "urls": ["https://example.com"]}
{"action": "crawl", "urls": ["https://docs.python.org"], "depth": 2}
{"action": "map", "urls": ["https://example.com"]}
{"action": "list", "url": "https://github.com/python/cpython"}
{"action": "download", "media_urls": ["https://example.com/image.png"]}
```
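
Each example above is the argument object for one tool call, and the action determines which tool receives it: per the Tools table, `search`, `extract`, `crawl`, and `map` belong to `web`, while `list`, `download`, and `analyze` belong to `media`. A minimal sketch of issuing such calls from Python with the MCP SDK's stdio client follows. `tool_for_action` and `call_wet` are hypothetical helper names, and the launch command is assumed to match the uvx configuration shown earlier.

```python
# Action-to-tool routing, taken from the Tools table.
ACTION_TOOL = {
    "search": "web", "extract": "web", "crawl": "web", "map": "web",
    "list": "media", "download": "media", "analyze": "media",
}


def tool_for_action(action: str) -> str:
    """Return the tool name ("web" or "media") that handles the given action."""
    try:
        return ACTION_TOOL[action]
    except KeyError:
        raise ValueError(f"unknown action: {action!r}") from None


async def call_wet(action: str, **params):
    """Launch wet-mcp over stdio, invoke one action, and return the raw tool result."""
    # Imported lazily so the routing helper works without the SDK installed.
    from mcp import ClientSession, StdioServerParameters
    from mcp.client.stdio import stdio_client

    server = StdioServerParameters(command="uvx", args=["wet-mcp@latest"])
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            arguments = {"action": action, **params}
            return await session.call_tool(tool_for_action(action), arguments=arguments)
```

With `asyncio` imported, the first example becomes `asyncio.run(call_wet("search", query="python web scraping", max_results=10))`. Starting a fresh server per call keeps the sketch short; a real client would normally keep one session open.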

---

## Configuration

| Variable | Default | Description |
|:---------|:--------|:------------|
| `WET_AUTO_SEARXNG` | `true` | Auto-start embedded SearXNG subprocess |
| `WET_SEARXNG_PORT` | `8080` | SearXNG port |
| `SEARXNG_URL` | `http://localhost:8080` | External SearXNG URL (when auto disabled) |
| `API_KEYS` | - | LLM API keys for media analysis |
| `LOG_LEVEL` | `INFO` | Logging level |
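
To make the defaults in the table concrete, here is a minimal sketch of resolving these variables from the environment. It is not the project's own settings code (wet-mcp depends on `pydantic-settings`, which may handle this differently). `WetConfig` and `load_config` are hypothetical names, the accepted truthy spellings are an assumption, and so is the rule that `SEARXNG_URL` only applies when auto-start is off, which is read from the table's description.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class WetConfig:
    auto_searxng: bool
    searxng_port: int
    searxng_url: str
    log_level: str

    @property
    def effective_searxng_url(self) -> str:
        # Assumption: with auto-start on, the embedded instance on the
        # configured port is used; otherwise the external URL is used.
        if self.auto_searxng:
            return f"http://localhost:{self.searxng_port}"
        return self.searxng_url


def load_config(env: Mapping[str, str] = os.environ) -> WetConfig:
    """Read the documented variables, falling back to the documented defaults."""
    auto = env.get("WET_AUTO_SEARXNG", "true").strip().lower()
    return WetConfig(
        auto_searxng=auto in {"1", "true", "yes"},  # assumed truthy spellings
        searxng_port=int(env.get("WET_SEARXNG_PORT", "8080")),
        searxng_url=env.get("SEARXNG_URL", "http://localhost:8080"),
        log_level=env.get("LOG_LEVEL", "INFO"),
    )
```

Passing a plain dict instead of `os.environ` makes it easy to try combinations, for example `load_config({"WET_AUTO_SEARXNG": "false", "SEARXNG_URL": "http://searx.internal:8888"})`.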

### LLM Configuration (Optional)

For media analysis (images, videos, audio), set an API key in `API_KEYS` and the model to use in `LLM_MODELS`:

```bash
API_KEYS=GOOGLE_API_KEY:AIza...
LLM_MODELS=gemini/gemini-3-flash-preview
```
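
The `API_KEYS` value above appears to pair a provider variable name with its secret, separated by a colon. A minimal sketch of splitting one such pair follows. `parse_api_key` is a hypothetical helper, not wet-mcp's parser, and how multiple pairs are separated is not shown here, so only a single pair is handled.

```python
def parse_api_key(value: str) -> tuple[str, str]:
    """Split a single "NAME:secret" pair on the first colon."""
    name, sep, secret = value.partition(":")
    name, secret = name.strip(), secret.strip()
    if not sep or not name or not secret:
        raise ValueError("expected a value of the form NAME:secret")
    return name, secret
```

Splitting on the first colon only means a secret that itself contains colons stays intact.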

---

## Architecture

```
┌─────────────────────────────────────────────────────────┐
│                    MCP Client                           │
│            (Claude, Cursor, Windsurf)                   │
└─────────────────────┬───────────────────────────────────┘
                      │ MCP Protocol
                      ▼
┌─────────────────────────────────────────────────────────┐
│                   WET MCP Server                        │
│  ┌──────────┐  ┌──────────┐  ┌──────────────────────┐   │
│  │   web    │  │  media   │  │        help          │   │
│  │ (search, │  │ (list,   │  │  (full documentation)│   │
│  │ extract, │  │ download,│  └──────────────────────┘   │
│  │ crawl,   │  │ analyze) │                             │
│  │ map)     │  └────┬─────┘                             │
│  └────┬─────┘       │                                   │
│       │             │                                   │
│       ▼             ▼                                   │
│  ┌──────────┐  ┌────────────┐                           │
│  │ SearXNG  │  │  Crawl4AI  │                           │
│  │(embedded)│  │(Playwright)│                           │
│  └──────────┘  └────────────┘                           │
└─────────────────────────────────────────────────────────┘
```

---

## Build from Source

```bash
git clone https://github.com/n24q02m/wet-mcp
cd wet-mcp

# Setup (requires mise: https://mise.jdx.dev/)
mise run setup

# Run
uv run wet-mcp
```

### Docker Build

```bash
docker build -t n24q02m/wet-mcp:latest .
```

**Requirements:** Python 3.13 (any 3.13.x release)

---

## Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md)

## License

MIT - See [LICENSE](LICENSE)
