Metadata-Version: 2.4
Name: webcreeper
Version: 0.2.0
Summary: Open-source web crawling framework with specialized crawler agents.
Author: WebCreeper Contributors
License: MIT
Project-URL: Homepage, https://github.com/Y-Elsayed/WebCreeper
Project-URL: Repository, https://github.com/Y-Elsayed/WebCreeper
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: requests<3,>=2.31
Requires-Dist: httpx<1,>=0.27
Requires-Dist: beautifulsoup4<5,>=4.12
Provides-Extra: nlp
Requires-Dist: accelerate==1.1.1; extra == "nlp"
Requires-Dist: annotated-types==0.7.0; extra == "nlp"
Requires-Dist: blis==1.0.1; extra == "nlp"
Requires-Dist: catalogue==2.0.10; extra == "nlp"
Requires-Dist: cloudpathlib==0.20.0; extra == "nlp"
Requires-Dist: confection==0.1.5; extra == "nlp"
Requires-Dist: cymem==2.0.10; extra == "nlp"
Requires-Dist: huggingface-hub==0.26.2; extra == "nlp"
Requires-Dist: joblib==1.4.2; extra == "nlp"
Requires-Dist: langcodes==3.5.0; extra == "nlp"
Requires-Dist: language_data==1.3.0; extra == "nlp"
Requires-Dist: marisa-trie==1.2.1; extra == "nlp"
Requires-Dist: mpmath==1.3.0; extra == "nlp"
Requires-Dist: murmurhash==1.0.11; extra == "nlp"
Requires-Dist: networkx==3.4.2; extra == "nlp"
Requires-Dist: numpy==2.0.2; extra == "nlp"
Requires-Dist: packaging==24.2; extra == "nlp"
Requires-Dist: pillow==11.0.0; extra == "nlp"
Requires-Dist: preshed==3.0.9; extra == "nlp"
Requires-Dist: pydantic==2.10.3; extra == "nlp"
Requires-Dist: pydantic_core==2.27.1; extra == "nlp"
Requires-Dist: PyYAML==6.0.2; extra == "nlp"
Requires-Dist: regex==2024.11.6; extra == "nlp"
Requires-Dist: safetensors==0.4.5; extra == "nlp"
Requires-Dist: scikit-learn==1.5.2; extra == "nlp"
Requires-Dist: scipy==1.14.1; extra == "nlp"
Requires-Dist: sentence-transformers==3.3.1; extra == "nlp"
Requires-Dist: smart-open==7.0.5; extra == "nlp"
Requires-Dist: spacy==3.8.2; extra == "nlp"
Requires-Dist: spacy-legacy==3.0.12; extra == "nlp"
Requires-Dist: spacy-loggers==1.0.5; extra == "nlp"
Requires-Dist: srsly==2.4.8; extra == "nlp"
Requires-Dist: sympy==1.13.1; extra == "nlp"
Requires-Dist: thinc==8.3.2; extra == "nlp"
Requires-Dist: threadpoolctl==3.5.0; extra == "nlp"
Requires-Dist: tokenizers==0.20.3; extra == "nlp"
Requires-Dist: torch==2.5.1; extra == "nlp"
Requires-Dist: tqdm==4.67.0; extra == "nlp"
Requires-Dist: transformers==4.46.3; extra == "nlp"
Requires-Dist: typer==0.15.1; extra == "nlp"
Requires-Dist: typing_extensions==4.12.2; extra == "nlp"
Requires-Dist: wasabi==1.1.3; extra == "nlp"
Requires-Dist: weasel==0.4.1; extra == "nlp"
Requires-Dist: wrapt==1.17.0; extra == "nlp"
Dynamic: license-file

# WebCreeper: Crawl. Extract. Discover.

WebCreeper is an open-source crawling framework built around **agents**.
Each agent is a crawler specialized for a specific task, and all agents share core crawling primitives from `creeper_core`.

## Agent Model

- Agents are modular crawler units with clear responsibilities.
- Each agent can expose its own settings and extraction behavior.
- Shared infrastructure (robots handling, retries, rate limits, hooks, policies) lives in the core.

This separation makes it easy to:
- Start simple with one agent.
- Add new agents without rewriting crawl infrastructure.
- Compose custom extraction logic through callbacks and hooks.
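
To make the split between shared core and agents concrete, here is a minimal, self-contained sketch. It is not the WebCreeper API: `CoreCrawler`, `CallbackAgent`, `before_fetch_hooks`, and `on_page` are hypothetical names chosen for illustration, and the fetcher is injected so the example runs without network access. See the agent docs for the real interfaces.

```python
from typing import Callable, Dict, Iterable, List, Optional


class CoreCrawler:
    """Hypothetical stand-in for shared infrastructure: retries and hooks."""

    def __init__(self, fetch: Callable[[str], str], max_retries: int = 2) -> None:
        self.fetch = fetch                      # injected fetcher (assumption)
        self.max_retries = max_retries
        self.before_fetch_hooks: List[Callable[[str], None]] = []

    def get(self, url: str) -> Optional[str]:
        # Hooks run once per URL, before any fetch attempt.
        for hook in self.before_fetch_hooks:
            hook(url)
        for _attempt in range(self.max_retries + 1):
            try:
                return self.fetch(url)
            except OSError:
                continue                        # retry on transient I/O errors
        return None                             # every attempt failed


class CallbackAgent(CoreCrawler):
    """Hypothetical agent: hands each fetched page to a user callback."""

    def __init__(self, fetch: Callable[[str], str],
                 on_page: Callable[[str, str], None], max_retries: int = 2) -> None:
        super().__init__(fetch, max_retries)
        self.on_page = on_page

    def crawl(self, urls: Iterable[str]) -> None:
        for url in urls:
            html = self.get(url)
            if html is not None:
                self.on_page(url, html)


# Usage with an in-memory "site" and a fetcher that fails once, then succeeds.
PAGES: Dict[str, str] = {"https://example.com/": "<title>Home</title>"}
attempts = {"count": 0}


def flaky_fetch(url: str) -> str:
    attempts["count"] += 1
    if attempts["count"] == 1:
        raise OSError("simulated transient failure")
    return PAGES[url]


seen: List[str] = []
page_sizes: Dict[str, int] = {}

agent = CallbackAgent(flaky_fetch, on_page=lambda u, h: page_sizes.update({u: len(h)}))
agent.before_fetch_hooks.append(seen.append)
agent.crawl(["https://example.com/"])
```

In this sketch the retry loop and hook list live in the base class, so a new agent only has to define what it does with each page.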

## Agent Selection

Use this table to choose the right agent.

| Agent | When To Use It | Documentation |
|---|---|---|
| `Atlas` | Crawl website structure, build link graphs, and run custom per-page extraction callbacks/hooks. | `docs/agents/atlas.md` |

Agent-specific setup and code examples are documented on each agent's page.
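
For orientation only, the sketch below shows what building a link graph with a per-page callback can look like using just the Python standard library. It does not use or describe the `Atlas` API. `LinkExtractor`, `build_link_graph`, `fetch`, `on_page`, and `max_pages` are hypothetical names, and the same-host restriction is an assumption made to keep the example small.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def build_link_graph(
    start_url: str,
    fetch: Callable[[str], str],
    on_page: Optional[Callable[[str, str], None]] = None,
    max_pages: int = 100,
) -> Dict[str, List[str]]:
    """Breadth-first crawl from start_url; returns {page: [same-host links]}."""
    host = urlparse(start_url).netloc
    graph: Dict[str, List[str]] = {}
    queue = deque([start_url])
    while queue and len(graph) < max_pages:
        url = queue.popleft()
        if url in graph:
            continue                            # already visited
        html = fetch(url)
        if on_page is not None:
            on_page(url, html)                  # custom per-page extraction
        parser = LinkExtractor()
        parser.feed(html)
        links: List[str] = []
        for href in parser.hrefs:
            # Resolve relative links and drop #fragments.
            target = urldefrag(urljoin(url, href)).url
            if urlparse(target).netloc == host and target not in links:
                links.append(target)
        graph[url] = links
        queue.extend(t for t in links if t not in graph)
    return graph


# Usage with an in-memory site, so no network access is needed.
SITE = {
    "https://example.com/": '<a href="/a">A</a> <a href="/b#top">B</a> '
                            '<a href="https://other.org/">X</a>',
    "https://example.com/a": '<a href="/">home</a>',
    "https://example.com/b": "",
}
visited: List[str] = []
graph = build_link_graph(
    "https://example.com/", SITE.__getitem__, on_page=lambda u, h: visited.append(u)
)
```

A real crawler would also need the robots handling, retries, and rate limits that the core provides. This sketch leaves them out on purpose.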

## Documentation

- Installation and project docs index: `docs/README.md`
- Agent docs index: `docs/agents/README.md`

## License

MIT. See `LICENSE`.
