Metadata-Version: 2.4
Name: undetecta
Version: 0.1.0
Summary: Python client for the Undetecta API - Anti-detection web scraping made simple
Project-URL: Homepage, https://github.com/mikipavlov/undetecta
Project-URL: Repository, https://github.com/mikipavlov/undetecta
Project-URL: Documentation, https://github.com/mikipavlov/undetecta#readme
Project-URL: Bug Tracker, https://github.com/mikipavlov/undetecta/issues
Author: Undetecta
License: MIT
Keywords: anti-detection,api-client,browser-automation,scraping,undetecta,web-scraping
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Requires-Dist: httpx>=0.27.0
Requires-Dist: pydantic>=2.0.0
Description-Content-Type: text/markdown

# undetecta

> Python client for the Undetecta API - Anti-detection web scraping made simple.

[![PyPI version](https://badge.fury.io/py/undetecta.svg)](https://pypi.org/project/undetecta/)
[![Python](https://img.shields.io/badge/Python-3.10%2B-blue.svg)](https://www.python.org/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

## Features

- **Full Type Hints** - Comprehensive Pydantic v2 types for all API responses and requests
- **Web Scraping** - Scrape URLs with advanced options (screenshots, branding extraction, actions)
- **Web Search** - Search the web with result scraping capabilities
- **Automatic Retries** - Built-in retry logic with exponential backoff for transient failures
- **Error Handling** - Custom error classes for different error types
- **Async First** - Built on `httpx` for async/await operations
- **Context Manager** - Automatic resource cleanup with async context manager support

## Installation

```bash
pip install undetecta
```

Or using uv:

```bash
uv add undetecta
```

## Quick Start

```python
import asyncio
from undetecta import UndetectaClient

async def main():
    async with UndetectaClient(api_key="your-api-key") as client:
        result = await client.scrape(url="https://example.com")
        print(result.markdown)

asyncio.run(main())
```
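Since the client is async, several URLs can be scraped concurrently. The following is a minimal sketch rather than a package feature: `scrape_many` is a hypothetical helper, the concurrency limit of 5 is an arbitrary choice, and it assumes that one client instance may serve several requests at once. It accepts any coroutine function that takes a `url` keyword, so `client.scrape` can be passed in directly.

```python
import asyncio


async def scrape_many(scrape, urls, *, concurrency=5):
    """Call scrape(url=...) for every URL, at most `concurrency` at a time."""
    semaphore = asyncio.Semaphore(concurrency)

    async def one(url):
        async with semaphore:
            return await scrape(url=url)

    # return_exceptions=True returns failures in place instead of raising the first one.
    return await asyncio.gather(*(one(u) for u in urls), return_exceptions=True)


# Stand-in coroutine for illustration; in real use pass client.scrape instead.
async def fake_scrape(url):
    if "bad" in url:
        raise ValueError(url)
    return url.upper()


results = asyncio.run(scrape_many(fake_scrape, ["a", "bad", "c"], concurrency=2))
```

Results come back in the same order as `urls`, so each entry is either a response or the exception raised for that URL.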

## Configuration

```python
from undetecta import UndetectaClient

client = UndetectaClient(
    api_key="your-api-key",                    # Required
    base_url="https://api.undetecta.com",     # Optional, defaults to production
    timeout=60000,                             # Optional, in milliseconds (default 60000 = 60 s)
    max_retries=3,                             # Optional, default 3 retries
)
```
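To keep the API key out of source code, the constructor arguments can be assembled from environment variables. This is a minimal sketch: the helper `client_kwargs_from_env` and the variable names (`UNDETECTA_API_KEY`, `UNDETECTA_BASE_URL`, `UNDETECTA_TIMEOUT_MS`, `UNDETECTA_MAX_RETRIES`) are hypothetical and not defined by the package. Only the four keyword arguments come from the configuration shown above.

```python
from __future__ import annotations

import os
from collections.abc import Mapping


def client_kwargs_from_env(env: Mapping[str, str] | None = None) -> dict:
    """Build UndetectaClient keyword arguments from environment variables.

    The variable names are illustrative assumptions, not part of the package.
    """
    env = os.environ if env is None else env
    api_key = env.get("UNDETECTA_API_KEY")
    if not api_key:
        raise RuntimeError("UNDETECTA_API_KEY is not set")

    kwargs: dict = {"api_key": api_key}
    # Pass optional settings only when present, so the client's defaults apply otherwise.
    if "UNDETECTA_BASE_URL" in env:
        kwargs["base_url"] = env["UNDETECTA_BASE_URL"]
    if "UNDETECTA_TIMEOUT_MS" in env:
        kwargs["timeout"] = int(env["UNDETECTA_TIMEOUT_MS"])
    if "UNDETECTA_MAX_RETRIES" in env:
        kwargs["max_retries"] = int(env["UNDETECTA_MAX_RETRIES"])
    return kwargs
```

The result can then be unpacked into the constructor, for example `UndetectaClient(**client_kwargs_from_env())`.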

## API Reference

### Scraping

#### `scrape(url, formats=None, **kwargs)`

Scrape a URL and return the results.

```python
result = await client.scrape(
    url="https://example.com",
    formats=["markdown", "screenshot"],
    wait_for_selector=".main-content",
)
```

**Scrape Options:**

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `url` | `str` | Required | URL to scrape |
| `formats` | `list[str]` | `["markdown"]` | Output formats: `html`, `rawHtml`, `markdown`, `links`, `screenshot`, `branding` |
| `headless` | `bool` | `True` | Run browser in headless mode |
| `proxy` | `str` | - | Proxy URL to use |
| `wait_for_selector` | `str` | - | CSS selector to wait for |
| `wait_for` | `int` | - | Time to wait in ms |
| `wait_until` | `str` | - | Wait until page load event (`load`, `domcontentloaded`, `networkidle`) |
| `timeout` | `int` | `30000` | Request timeout in ms |
| `mobile` | `bool` | - | Use mobile viewport |
| `actions` | `list[Action]` | - | Browser actions to perform |
| `screenshot_options` | `ScreenshotOptions` | - | Screenshot configuration |

**Browser Actions:**

```python
result = await client.scrape(
    url="https://example.com",
    actions=[
        {"type": "click", "selector": ".cookie-accept"},
        {"type": "fill", "selector": "input[name='email']", "value": "test@example.com"},
        {"type": "wait", "options": {"duration": 1000}},
        {"type": "scroll", "options": {"direction": "down", "amount": 500}},
    ],
)
```
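Small builder functions can reduce typos in the action dictionaries. The helpers below are hypothetical and not exported by `undetecta`. They only reproduce the four dictionary shapes from the example above, so any other action types or options the API may accept are not covered.

```python
def click(selector: str) -> dict:
    """Click the element matching a CSS selector."""
    return {"type": "click", "selector": selector}


def fill(selector: str, value: str) -> dict:
    """Type a value into the input matching a CSS selector."""
    return {"type": "fill", "selector": selector, "value": value}


def wait(duration_ms: int) -> dict:
    """Pause for the given number of milliseconds."""
    return {"type": "wait", "options": {"duration": duration_ms}}


def scroll(direction: str, amount: int) -> dict:
    """Scroll the page in a direction by the given amount."""
    return {"type": "scroll", "options": {"direction": direction, "amount": amount}}


# Same sequence as the example above, ready to pass as actions=...
actions = [
    click(".cookie-accept"),
    fill("input[name='email']", "test@example.com"),
    wait(1000),
    scroll("down", 500),
]
```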

**Response:**

```python
class ScrapeJobResponse(BaseModel):
    id: str
    status: JobStatus  # pending, running, completed, failed, stopped
    created_at: str
    completed_at: str | None
    metadata: ScrapeMetadata | None
    html: str | None
    raw_html: str | None
    markdown: str | None
    links: list[str] | None
    screenshot: str | None  # base64 encoded image
    branding: BrandingProfile | None
    error: ScrapeJobError | None
```
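Because most response fields are optional, it can help to check `status` and `error` before reading content. The sketch below assumes that `status` is either a plain string or an enum whose `.value` is one of the strings listed in the comment above. `require_completed` is a hypothetical helper, and the `SimpleNamespace` object only stands in for a real `ScrapeJobResponse`. Whether the client itself raises for failed jobs is not stated here.

```python
from types import SimpleNamespace


def require_completed(job):
    """Return the job if it completed, otherwise raise RuntimeError."""
    # Accept both plain strings and enum members (an assumption about JobStatus).
    status = getattr(job.status, "value", job.status)
    if status != "completed":
        detail = f": {job.error}" if job.error is not None else ""
        raise RuntimeError(f"job {job.id} ended with status {status!r}{detail}")
    return job


# Stand-in object for illustration; real code would pass the result of client.scrape().
job = SimpleNamespace(id="job_1", status="completed", error=None, markdown="# Hello")
print(require_completed(job).markdown)
```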

### Search

#### `search(query, engine=None, limit=None, sources=None, **kwargs)`

Perform a web search and return the results.

```python
result = await client.search(
    query="web scraping tools",
    limit=10,
    engine="google",
    sources=["web"],
    scrape_options={"formats": ["markdown"]},
)

for item in result.web or []:  # web is None when no web results are returned
    print(item.title)
    print(item.markdown)  # Scraped content
```

**Search Options:**

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `query` | `str` | Required | Search query |
| `engine` | `str` | `"google"` | Search engine (`google` or `duckduckgo`) |
| `limit` | `int` | `10` | Number of results |
| `sources` | `list[str]` | `["web"]` | `web`, `news`, `images` |
| `categories` | `list[str]` | - | `github`, `research`, `pdf` |
| `lang` | `str` | - | Language code |
| `country` | `str` | - | Country code |
| `scrape_options` | `SearchScrapeOptions` | - | Options for scraping results |
| `timeout` | `int` | - | Timeout in ms |

**Response:**

```python
class SearchJobResponse(BaseModel):
    id: str
    status: JobStatus
    created_at: str
    completed_at: str | None
    web: list[WebSearchResult] | None
    news: list[NewsSearchResult] | None
    images: list[ImageSearchResult] | None
    error: SearchJobError | None
```
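Each of `web`, `news`, and `images` may be `None`, so code that walks all sources needs a guard per list. A minimal sketch, using a hypothetical `iter_results` helper and `SimpleNamespace` objects as stand-ins for a real `SearchJobResponse`:

```python
from collections.abc import Iterator
from types import SimpleNamespace


def iter_results(response) -> Iterator[tuple[str, object]]:
    """Yield (source, item) pairs across web, news, and images, skipping empty lists."""
    for source in ("web", "news", "images"):
        # `or []` covers both None and an empty list.
        for item in getattr(response, source, None) or []:
            yield source, item


# Stand-in response for illustration; real code would pass the result of client.search().
response = SimpleNamespace(
    web=[SimpleNamespace(title="A"), SimpleNamespace(title="B")],
    news=None,
    images=[SimpleNamespace(title="C")],
)
pairs = [(source, item.title) for source, item in iter_results(response)]
```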

### Health Check

#### `health()`

Check the API health status.

```python
health = await client.health()
print(health.status)  # 'ok'
```

## Error Handling

The client provides custom error classes for different error types:

```python
from undetecta import UndetectaClient
from undetecta.errors import (
    ApiKeyError,
    RateLimitError,
    TimeoutError,  # shadows Python's built-in TimeoutError in this module
    NetworkError,
    ValidationError,
    NotFoundError,
    UndetectaError,
)

async with UndetectaClient(api_key="your-api-key") as client:
    try:
        result = await client.scrape(url="https://example.com")
    except ApiKeyError as e:
        print(f"Invalid API key: {e.message}")
    except RateLimitError as e:
        print(f"Rate limit exceeded: {e.message}")
    except TimeoutError as e:
        print(f"Request timed out: {e.message}")
    except NetworkError as e:
        print(f"Network error: {e.message}")
    except ValidationError as e:
        print(f"Validation error: {e.message}")
    except NotFoundError as e:
        print(f"Resource not found: {e.message}")
    except UndetectaError as e:
        print(f"API error: {e.code} - {e.message}")
        if e.status_code:
            print(f"Status: {e.status_code}")
```

### Error Classes

| Error Class | Status Code | Description |
|-------------|-------------|-------------|
| `ApiKeyError` | 401 | Invalid or missing API key |
| `ValidationError` | 400 | Request validation failed |
| `NotFoundError` | 404 | Resource not found |
| `RateLimitError` | 429 | Rate limit exceeded |
| `UndetectaError` | Variable | Base error class (includes status code for server errors) |
| `TimeoutError` | - | Request timed out |
| `NetworkError` | - | Network connection failed |
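
Because `UndetectaError` is described as the base class, the order of `except` clauses matters: Python uses the first matching clause, so the base class should come last. The sketch below demonstrates this with local stand-in classes. They are not imported from `undetecta`, and the assumption that the specific errors subclass `UndetectaError` is inferred from the table rather than confirmed.

```python
# Local stand-ins that mirror the assumed hierarchy; not the package's real classes.
class UndetectaError(Exception):
    pass


class RateLimitError(UndetectaError):
    pass


def classify(exc: Exception) -> str:
    """Show which except clause handles a given exception."""
    try:
        raise exc
    except RateLimitError:
        # Specific subclass first; otherwise the base clause would catch it.
        return "rate limited"
    except UndetectaError:
        return "other API error"


print(classify(RateLimitError()))   # rate limited
print(classify(UndetectaError()))   # other API error
```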

## Advanced Examples

### Extract Branding

```python
result = await client.scrape(
    url="https://example.com",
    formats=["branding"],
)

branding = result.branding
if branding is not None:  # branding is optional in the response model
    print(branding.colors.primary)
    print(branding.typography.font_families)
    print(branding.components.button_primary)
```

### Screenshot with Custom Options

```python
from undetecta.types import ScreenshotOptions, ScreenshotClip, ScreenshotViewport

result = await client.scrape(
    url="https://example.com",
    formats=["screenshot"],
    screenshot_options=ScreenshotOptions(
        full_page=False,
        format="png",
        quality=90,
        clip=ScreenshotClip(x=0, y=0, width=1920, height=1080),
        selector=".main-content",
        viewport=ScreenshotViewport(width=1920, height=1080),
    ),
)

# Save the screenshot (base64-encoded in the response; None if not returned)
import base64

if result.screenshot is not None:
    with open("screenshot.png", "wb") as f:
        f.write(base64.b64decode(result.screenshot))
```

### Search with Result Scraping

```python
result = await client.search(
    query="typescript web scraping",
    limit=5,
    scrape_options={
        "formats": ["markdown", "links"],
        "only_main_content": True,
    },
)

for item in result.web or []:
    print(item.title)
    print(item.url)
    if item.markdown:
        print(item.markdown[:200] + "...")
```

### Custom Error Handling with Retry

```python
import asyncio

from undetecta import UndetectaClient
from undetecta.errors import RateLimitError

async def scrape_with_retry(url, max_retries=3):
    async with UndetectaClient(api_key="your-api-key") as client:
        for attempt in range(max_retries):
            try:
                return await client.scrape(url=url)
            except RateLimitError:
                if attempt == max_retries - 1:
                    raise  # out of attempts, re-raise the last error
                # Exponential backoff: 1 s, 2 s, 4 s, ...
                await asyncio.sleep(2 ** attempt)
```
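
The same pattern can be factored into a reusable helper that caps the delay and adds jitter, which spreads retries from many callers over time. This is a sketch under assumptions: `retry_async` and its parameters are hypothetical, not part of `undetecta`, and the demonstration uses a stand-in coroutine instead of a real API call.

```python
import asyncio
import random


async def retry_async(make_call, *, retry_on, max_attempts=3, base_delay=1.0, max_delay=30.0):
    """Await make_call() until it succeeds or max_attempts is exhausted."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return await make_call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise
            # Exponential backoff capped at max_delay, with full jitter.
            delay = min(max_delay, base_delay * 2 ** attempt)
            await asyncio.sleep(random.uniform(0, delay))


# Stand-in coroutine that fails twice before succeeding.
calls = {"n": 0}


async def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return "ok"


outcome = asyncio.run(retry_async(flaky, retry_on=ConnectionError, base_delay=0.0))
```

With the client, a call might look like `await retry_async(lambda: client.scrape(url=url), retry_on=(RateLimitError,))`. Note that the client already retries on its own (`max_retries`), so wrapping it multiplies the total number of attempts.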

## Development

Install development dependencies:

```bash
uv sync
```

Run linting and formatting:

```bash
uv run ruff check src
uv run ruff format src
```

Run type checking:

```bash
uv run pyright src
```

Run tests:

```bash
uv run pytest
```

## Requirements

- **Python**: >= 3.10
- **httpx**: >= 0.27.0
- **pydantic**: >= 2.0.0

## Schema Documentation

For information about how the Pydantic models relate to the JSON schemas and how to regenerate types, see [SCHEMAS.md](./SCHEMAS.md).

## License

MIT &copy; Undetecta

## Links

- [Documentation](https://undetecta.com/docs)
- [API Reference](https://undetecta.com/api)
- [GitHub](https://github.com/undetecta/undetecta)
- [Report Issues](https://github.com/undetecta/undetecta/issues)
