Metadata-Version: 2.4
Name: crawl4ai-client
Version: 0.1.2
Summary: Lightweight async client for Crawl4AI Docker server — no browser dependencies required
Author-email: Maysam Hafez Parast <hafezparast@gmail.com>
License-Expression: Apache-2.0
Project-URL: Homepage, https://github.com/hafezparast/crawl4ai-client
Project-URL: Documentation, https://github.com/hafezparast/crawl4ai-client#readme
Project-URL: Repository, https://github.com/hafezparast/crawl4ai-client
Project-URL: Issues, https://github.com/hafezparast/crawl4ai-client/issues
Keywords: crawl4ai,web-scraping,docker,async,client
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Framework :: AsyncIO
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: httpx>=0.27.0
Requires-Dist: pydantic>=2.0
Dynamic: license-file

# crawl4ai-client

Lightweight async Python client for the [Crawl4AI](https://github.com/unclecode/crawl4ai) Docker server.

**No browser dependencies required.** Just `httpx` + `pydantic` (~2 MB vs ~500 MB for the full crawl4ai package).

## Install

```bash
pip install crawl4ai-client
```

## Quick Start

```python
import asyncio
from crawl4ai_client import Crawl4aiDockerClient

async def main():
    async with Crawl4aiDockerClient(
        base_url="http://localhost:11235",
        api_token="your-token",  # optional
    ) as client:
        result = await client.crawl(["https://example.com"])
        print(result.raw_markdown)

asyncio.run(main())
```
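
To keep the token out of source code, the connection settings can be read from the environment. This is a minimal sketch, not part of the package: the helper `client_settings_from_env` and the variable names `CRAWL4AI_BASE_URL` and `CRAWL4AI_API_TOKEN` are hypothetical names chosen for illustration.

```python
import os


def client_settings_from_env(env=None):
    """Build keyword arguments for Crawl4aiDockerClient from environment variables.

    CRAWL4AI_BASE_URL and CRAWL4AI_API_TOKEN are hypothetical names used
    only in this sketch; pick whatever names suit your deployment.
    """
    env = os.environ if env is None else env
    settings = {"base_url": env.get("CRAWL4AI_BASE_URL", "http://localhost:11235")}
    token = env.get("CRAWL4AI_API_TOKEN")
    if token:
        # api_token is optional, so it is left out when the variable is unset.
        settings["api_token"] = token
    return settings
```

The returned dictionary can then be unpacked into the constructor, for example `Crawl4aiDockerClient(**client_settings_from_env())`.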

## Features

- **Crawl** single or multiple URLs (`/crawl`)
- **Stream** results as they complete (`/crawl/stream`)
- **Markdown** extraction with filters (`/md`)
- **Screenshots** as base64 PNG (`/screenshot`)
- **PDF** generation (`/pdf`)
- **HTML** preprocessing for schema extraction (`/html`)
- **JavaScript** execution on pages (`/execute_js`)
- **LLM Q&A** about page content (`/llm`)
- **Per-URL configs** for batch crawling (`crawler_configs` list)
- **Schema** retrieval (`/schema`)
- **Async context manager** with automatic cleanup

## Usage

The snippets below use `async with` at the top level for brevity. Run them inside an `async` function, as in Quick Start.

### Basic crawl

```python
from crawl4ai_client import Crawl4aiDockerClient, CrawlerRunConfig, CacheMode

async with Crawl4aiDockerClient(base_url="http://localhost:11235") as client:
    result = await client.crawl(
        ["https://example.com"],
        crawler_config=CrawlerRunConfig(cache_mode=CacheMode.BYPASS),
    )
    print(result.raw_markdown)
```

### Multiple URLs with per-URL configs

```python
from crawl4ai_client import Crawl4aiDockerClient, CrawlerRunConfig

async with Crawl4aiDockerClient(base_url="http://localhost:11235") as client:
    results = await client.crawl(
        ["https://example.com", "https://httpbin.org/html"],
        crawler_configs=[
            CrawlerRunConfig(word_count_threshold=5),
            CrawlerRunConfig(word_count_threshold=50),
        ],
    )
    for r in results:
        print(f"{r.url}: {len(r.raw_markdown)} chars")
```
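
The examples above suggest that `crawl` returns a single result for one URL and an iterable of results for several. That is an inference from this README, not a documented guarantee. If the number of URLs is only known at runtime, a small helper can normalize both shapes. `as_result_list` is a hypothetical name, and the sketch assumes multiple results arrive as a list or tuple.

```python
def as_result_list(crawl_output):
    """Return crawl output as a list, whether it is one result or many.

    Assumption: multiple results arrive as a list or tuple, and a single
    result arrives as a bare object.
    """
    if crawl_output is None:
        return []
    if isinstance(crawl_output, (list, tuple)):
        return list(crawl_output)
    return [crawl_output]
```

With this in place, `for r in as_result_list(await client.crawl(urls)):` works for any number of URLs.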

### Streaming

```python
async with Crawl4aiDockerClient(base_url="http://localhost:11235") as client:
    async for result in client.crawl_stream(["https://example.com", "https://httpbin.org/html"]):
        print(f"Got: {result.url}")
```
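
If only the first few streamed results are needed, the loop can stop early. The sketch below works with any async iterator, so it does not depend on the client's internals. `collect_stream` is a hypothetical helper, and passing `client.crawl_stream(...)` to it assumes that method returns an async iterator, as the `async for` above implies.

```python
import asyncio


async def collect_stream(stream, limit=None):
    """Collect items from an async iterator, optionally stopping after `limit` items."""
    results = []
    try:
        async for item in stream:
            results.append(item)
            if limit is not None and len(results) >= limit:
                break
    finally:
        # Async generators expose aclose(); calling it releases resources
        # promptly when the loop stops before the stream is exhausted.
        aclose = getattr(stream, "aclose", None)
        if aclose is not None:
            await aclose()
    return results
```

For example, `first_two = await collect_stream(client.crawl_stream(urls), limit=2)`.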

### Markdown endpoint

```python
async with Crawl4aiDockerClient(base_url="http://localhost:11235") as client:
    md = await client.get_markdown("https://example.com", content_filter="fit")
    print(md)
```

### Screenshot

```python
async with Crawl4aiDockerClient(base_url="http://localhost:11235") as client:
    screenshot_b64 = await client.screenshot("https://example.com")
```

### PDF generation

```python
async with Crawl4aiDockerClient(base_url="http://localhost:11235") as client:
    pdf_b64 = await client.get_pdf("https://example.com")
```
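
The screenshot and PDF examples return base64 text rather than raw bytes, so the payload has to be decoded before it is written to disk. A minimal sketch using only the standard library follows. `save_base64` is a hypothetical helper, and it assumes the returned value is a plain base64 string with no `data:` URI prefix.

```python
import base64
from pathlib import Path


def save_base64(data_b64, path):
    """Decode a base64 string and write the raw bytes to `path`.

    Returns the number of bytes written.
    """
    raw = base64.b64decode(data_b64)
    Path(path).write_bytes(raw)
    return len(raw)
```

Usage would look like `save_base64(screenshot_b64, "page.png")` or `save_base64(pdf_b64, "page.pdf")`.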

### HTML preprocessing

```python
async with Crawl4aiDockerClient(base_url="http://localhost:11235") as client:
    html = await client.get_html("https://example.com")
```

### JavaScript execution

```python
async with Crawl4aiDockerClient(base_url="http://localhost:11235") as client:
    result = await client.execute_js(
        "https://example.com",
        scripts=["document.title", "document.querySelectorAll('a').length"],
    )
    print(result.js_execution_result)
```

### LLM Q&A

```python
async with Crawl4aiDockerClient(base_url="http://localhost:11235") as client:
    answer = await client.llm_query(
        "https://example.com",
        query="What is this page about?",
    )
    print(answer)
```

## Why this package?

The full `crawl4ai` package installs 34+ dependencies (~500 MB), including Playwright, browsers, numpy, and litellm. If you run Crawl4AI as a Docker service and only need the client, this package gives you the same `Crawl4aiDockerClient` with just two dependencies: `httpx` and `pydantic`.

## Compatibility

This client is compatible with Crawl4AI Docker server v0.8.x and later. The config classes (`BrowserConfig`, `CrawlerRunConfig`) produce the same serialized format as the full library.

## License

Apache 2.0 — based on [crawl4ai](https://github.com/unclecode/crawl4ai) by [unclecode](https://github.com/unclecode).
