Metadata-Version: 2.4
Name: fast-scrape
Version: 0.1.5
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Programming Language :: Rust
Classifier: Topic :: Text Processing :: Markup :: HTML
Classifier: Typing :: Typed
Summary: High-performance HTML parsing library for Python
Keywords: html,parser,scraping,css-selectors,dom
License: MIT OR Apache-2.0
Requires-Python: >=3.10
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Documentation, https://github.com/bug-ops/scrape-rs
Project-URL: Homepage, https://github.com/bug-ops/scrape-rs
Project-URL: Issues, https://github.com/bug-ops/scrape-rs/issues
Project-URL: Repository, https://github.com/bug-ops/scrape-rs

# fast-scrape

[![PyPI](https://img.shields.io/pypi/v/fast-scrape)](https://pypi.org/project/fast-scrape)
[![Python](https://img.shields.io/pypi/pyversions/fast-scrape)](https://pypi.org/project/fast-scrape)
[![License](https://img.shields.io/pypi/l/fast-scrape)](../../LICENSE-MIT)

**10-50x faster** HTML parsing for Python compared to BeautifulSoup. Rust-powered, with a BeautifulSoup-compatible API.

## Installation

```bash
pip install fast-scrape
```

<details>
<summary>Alternative package managers</summary>

```bash
# uv (recommended, 10-100x faster than pip)
uv pip install fast-scrape

# Poetry
poetry add fast-scrape

# Pipenv
pipenv install fast-scrape
```

</details>

> [!IMPORTANT]
> Requires Python 3.10 or later.

## Quick start

```python
from scrape_rs import Soup

soup = Soup("<html><body><div class='content'>Hello, World!</div></body></html>")

div = soup.find("div")
print(div.text)  # Hello, World!
```

## Usage

<details open>
<summary><strong>Find elements</strong></summary>

```python
from scrape_rs import Soup

# Sample input so the snippet runs on its own
html = "<div class='content'><p>Hello</p></div>"
soup = Soup(html)

# Find first element by tag
div = soup.find("div")

# Find all elements
divs = soup.find_all("div")

# CSS selectors
for el in soup.select("div.content > p"):
    print(el.text)
```
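A lookup such as `find()` may not match anything. Below is a minimal sketch of a guard, assuming a missing element comes back as `None` as it does in BeautifulSoup (this README does not confirm that behavior). `first_text` is a hypothetical helper, not part of the library, and it relies only on the `find()` and `.text` members shown above.

```python
from typing import Any


def first_text(soup: Any, tag: str, default: str = "") -> str:
    """Return the text of the first `tag` element, or `default` if none is found."""
    element = soup.find(tag)
    # Assumption: a missing element is reported as None, as in BeautifulSoup.
    if element is None:
        return default
    return element.text
```

With a parsed document, `first_text(soup, "title", default="(untitled)")` falls back to the default instead of raising `AttributeError` on a missing element.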

</details>

<details>
<summary><strong>Element properties</strong></summary>

```python
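from scrape_rs import Soup

# Sample input so the snippet runs on its own
soup = Soup('<a href="/docs">Docs</a>')
element = soup.find("a")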

text = element.text          # Get text content
html = element.inner_html    # Get inner HTML
href = element.get("href")   # Get attribute
```
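Attribute values such as `href` are often relative, so it can help to resolve them before storing them. Here is a minimal sketch using the standard-library `urljoin`. `link_record` and `base_url` are hypothetical names, and the code relies only on the `.text` and `.get()` members shown above. It assumes `.get()` returns `None` for a missing attribute.

```python
from typing import Any, Optional
from urllib.parse import urljoin


def link_record(element: Any, base_url: str) -> dict[str, Optional[str]]:
    """Collect an anchor's text and absolute URL into a plain dict."""
    href = element.get("href")
    return {
        "text": element.text.strip(),
        # Assumption: a missing attribute yields None; leave it unresolved.
        "url": urljoin(base_url, href) if href else None,
    }
```

Calling `link_record(element, "https://example.com/")` on each result of `soup.select("a[href]")` would produce records ready for serialization.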

</details>

<details>
<summary><strong>Batch processing</strong></summary>

```python
from scrape_rs import Soup

# Process multiple documents in parallel
documents = [
    "<title>One</title>",
    "<title>Two</title>",
    "<title>Three</title>",
]
soups = Soup.parse_batch(documents)

for soup in soups:
    print(soup.find("title").text)
```
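For very large collections, it may be preferable to feed the batch parser in bounded chunks so that not every parsed document is held in memory at once. This is a minimal sketch, assuming `Soup.parse_batch` accepts a list of strings and returns one parsed document per input, as the example above suggests. `parse_in_chunks` and `chunk_size` are hypothetical names, and the parser is passed in as a plain callable.

```python
from collections.abc import Callable, Iterable, Iterator
from itertools import islice
from typing import TypeVar

T = TypeVar("T")


def parse_in_chunks(
    documents: Iterable[str],
    parse_batch: Callable[[list[str]], list[T]],
    chunk_size: int = 256,
) -> Iterator[T]:
    """Yield parsed documents, calling `parse_batch` on at most `chunk_size` inputs at a time."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    iterator = iter(documents)
    # Pull fixed-size slices until the input is exhausted.
    while chunk := list(islice(iterator, chunk_size)):
        yield from parse_batch(chunk)
```

Usage would look like `for soup in parse_in_chunks(documents, Soup.parse_batch): ...`. The chunk size trades peak memory against per-call overhead, and a suitable value depends on document size.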

> [!TIP]
> Use `parse_batch()` to process multiple documents. It uses all CPU cores automatically.

</details>

<details>
<summary><strong>Type hints</strong></summary>

Full IDE support with type stubs:

```python
from scrape_rs import Soup, Tag

def extract_links(soup: Soup) -> list[str]:
    return [a.get("href") for a in soup.select("a[href]")]
```

</details>

## Performance

Compared to BeautifulSoup:

| Operation | Speedup |
|-----------|---------|
| Parse (1 KB) | **9.7x** faster |
| Parse (5.9 MB) | **10.6x** faster |
| `find(".class")` | **132x** faster |
| `select(".class")` | **40x** faster |

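The table does not state the hardware, the input documents, or the BeautifulSoup parser backend used, so results on other workloads may differ. Below is a minimal sketch for measuring on your own input with the standard-library `timeit` module. `time_parser` and `speedup` are hypothetical helpers that accept any callable taking an HTML string.

```python
import timeit
from collections.abc import Callable


def time_parser(
    parse: Callable[[str], object], html: str, repeat: int = 5, number: int = 100
) -> float:
    """Return the best observed time per call, in seconds."""
    timings = timeit.repeat(lambda: parse(html), repeat=repeat, number=number)
    # The minimum is the run least disturbed by other system activity.
    return min(timings) / number


def speedup(
    baseline: Callable[[str], object],
    candidate: Callable[[str], object],
    html: str,
    repeat: int = 5,
    number: int = 100,
) -> float:
    """Return how many times faster `candidate` is than `baseline` on `html`."""
    return time_parser(baseline, html, repeat, number) / time_parser(
        candidate, html, repeat, number
    )
```

For example, `speedup(lambda h: BeautifulSoup(h, "html.parser"), Soup, html)` compares the two constructors, assuming `bs4` is installed and `html` holds a representative document.
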
## Related packages

| Platform | Package |
|----------|---------|
| Rust | [`scrape-core`](https://crates.io/crates/scrape-core) |
| Node.js | [`@fast-scrape/node`](https://www.npmjs.com/package/@fast-scrape/node) |
| WASM | [`@fast-scrape/wasm`](https://www.npmjs.com/package/@fast-scrape/wasm) |

## License

MIT OR Apache-2.0

