Metadata-Version: 2.4
Name: webpeel
Version: 0.13.0
Summary: Fast web fetcher for AI agents — smart extraction, stealth mode, structured data
Author-email: Jake Liu <jake@webpeel.dev>
License: AGPL-3.0-or-later
Project-URL: Homepage, https://webpeel.dev
Project-URL: Documentation, https://github.com/JakeLiuMe/webpeel
Project-URL: Repository, https://github.com/JakeLiuMe/webpeel
Project-URL: Issues, https://github.com/JakeLiuMe/webpeel/issues
Keywords: web-scraping,ai,llm,mcp,web-fetcher,markdown,scraper,crawler,ai-agents
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU Affero General Public License v3 or later (AGPLv3+)
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Dynamic: license-file

# WebPeel Python SDK

**Fast web fetcher for AI agents** — smart extraction, stealth mode, structured data.

Zero dependencies. Pure Python 3.8+ stdlib.

## Installation

```bash
pip install webpeel
```

## Quick Start

### Basic Scraping

```python
from webpeel import WebPeel

client = WebPeel()

# Scrape a URL and get clean markdown
result = client.scrape("https://example.com")
print(result.title)
print(result.content)  # Clean markdown content
print(result.metadata)  # Structured metadata
```

### Search the Web

```python
# Search via DuckDuckGo
results = client.search("python web scraping")

for item in results.data.get("web", []):
    print(f"{item['title']}: {item['url']}")
```

### JavaScript-Heavy Sites

```python
# Use browser rendering for SPAs and JS-heavy sites
result = client.scrape(
    "https://twitter.com/elonmusk",
    render=True,  # Enable browser mode
    wait=2000,    # Wait 2s for JS to load
)
```

### Stealth Mode (Bypass Bot Detection)

```python
# Bypass Cloudflare, reCAPTCHA, and anti-bot systems
result = client.scrape(
    "https://protected-site.com",
    stealth=True,  # Enable stealth mode
)
```

### Structured Data Extraction

```python
# Extract specific data using CSS selectors
result = client.scrape(
    "https://amazon.com/product/...",
    extract={
        "selectors": {
            "title": "h1#title",
            "price": "span.price",
            "rating": ".review-rating",
        }
    }
)

print(result.extracted)
# {"title": "Product Name", "price": "$29.99", "rating": "4.5"}
```

### Crawl a Website

```python
# Start an async crawl job (requires API key)
client = WebPeel(api_key="your-api-key")

job = client.crawl(
    "https://docs.example.com",
    limit=100,
    max_depth=3,
)

print(job.id)  # Job ID for tracking

# Check status later
status = client.get_job(job.id)
print(status["status"])  # pending, running, completed, failed
```

### Map a Domain

```python
# Discover all URLs on a domain
result = client.map("https://example.com")

print(f"Found {result.total} URLs")
for url in result.urls[:10]:
    print(url)
```

### Batch Scraping

```python
# Scrape multiple URLs in batch (requires API key)
client = WebPeel(api_key="your-api-key")

urls = [
    "https://example.com/page1",
    "https://example.com/page2",
    "https://example.com/page3",
]

job = client.batch_scrape(urls, max_tokens=5000)
print(job.id)
```

## API Reference

### WebPeel Class

```python
WebPeel(
    api_key: Optional[str] = None,
    base_url: str = "https://api.webpeel.dev",
    timeout: int = 30,
)
```

- **`api_key`**: API key for authentication (optional for free tier)
- **`base_url`**: Base URL for the WebPeel API
- **`timeout`**: Request timeout in seconds

### Methods

#### `scrape(url, **options) -> ScrapeResult`

Scrape a URL and extract content.

**Options:**
- `formats`: Output formats (default: `["markdown"]`)
- `max_tokens`: Maximum token count for output
- `render`: Use headless browser (default: `False`)
- `stealth`: Bypass bot detection (default: `False`)
- `wait`: Wait time in ms after page load
- `extract`: Structured data extraction config
- `headers`: Custom HTTP headers

#### `search(query, limit=5) -> SearchResult`

Search the web via DuckDuckGo.

#### `crawl(url, limit=50, max_depth=3) -> CrawlResult`

Start an async crawl job (requires API key).

#### `map(url) -> MapResult`

Discover all URLs on a domain.

#### `batch_scrape(urls, **options) -> BatchResult`

Batch scrape multiple URLs (requires API key).

#### `get_job(job_id) -> Dict`

Check status of an async job.

## WebPeel vs Firecrawl

| Feature | WebPeel | Firecrawl |
|---------|---------|-----------|
| **Pricing** | $0 local / $9-$29 cloud | $16-$333/mo |
| **Free Tier** | 125 fetches/week | 500 credits one-time |
| **License** | AGPL-3.0 | AGPL-3.0 |
| **Python SDK Deps** | Zero (pure stdlib) | httpx, pydantic |
| **Smart Escalation** | ✅ Auto HTTP→Browser→Stealth | Manual mode selection |
| **Token Budget** | ✅ `--max-tokens` | ❌ |
| **Quality Scoring** | ✅ 0-1 per response | ❌ |
| **Local CLI** | ✅ Free, unlimited | Requires API key |
| **LangChain** | ✅ | ✅ |
| **LlamaIndex** | ✅ | ✅ |

**WebPeel is the free, fast, open-source alternative to Firecrawl.**

## Authentication

Free tier: No API key needed. Anonymous usage with rate limits.

Paid tier: Get an API key at [webpeel.dev](https://webpeel.dev).

```python
client = WebPeel(api_key="wp_...")
```

## Error Handling

```python
from webpeel import WebPeel, WebPeelError, RateLimitError, TimeoutError

client = WebPeel()

try:
    result = client.scrape("https://example.com")
except RateLimitError:
    print("Rate limit exceeded. Upgrade or wait.")
except TimeoutError:
    print("Request timeout. Try again.")
except WebPeelError as e:
    print(f"Error: {e}")
```

## License

AGPL-3.0 © Jake Liu

## Links

- [Homepage](https://webpeel.dev)
- [Documentation](https://github.com/webpeel/webpeel)
- [GitHub](https://github.com/webpeel/webpeel)
- [Issues](https://github.com/webpeel/webpeel/issues)
