Metadata-Version: 2.4
Name: cccscrapper
Version: 0.1.1
Summary: A simple web scraping helper package
Home-page: https://github.com/harishrvarma/cccscrapper
Author: Harish Varma
Author-email: Harish Varma <harish.cybercom@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/harishrvarma/cccscrapper
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: requests>=2.31.0
Requires-Dist: beautifulsoup4>=4.12.0
Requires-Dist: markdownify>=0.11.6
Requires-Dist: PyPDF2>=3.0.0
Requires-Dist: playwright>=1.45.0
Provides-Extra: dev
Requires-Dist: build; extra == "dev"
Requires-Dist: twine; extra == "dev"
Requires-Dist: requests-toolbelt; extra == "dev"
Dynamic: license-file

# pyscrapper

A simple Python package for crawling and structure validation. It provides four scripts:

- `get_structure.py`
- `validate_get_structure.py`
- `crawl.py`
- `validate_crawl.py`
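
The four scripts can be chained from Python. The following is a minimal sketch, not part of the package: it takes the script names and arguments from the usage lines in this README, and it assumes that the steps run in the order structure, validate, crawl, validate, and that the scripts sit in the current directory and are started with the Python interpreter. `build_commands` and `run_pipeline` are hypothetical helper names.

```python
import subprocess
import sys
from typing import List

# Argument lists copied from the usage lines in this README.
# The ordering of the steps is an assumption, not documented behaviour.
PIPELINE: List[List[str]] = [
    ["get_structure.py", "--out", "output/topic_structure.json"],
    ["validate_get_structure.py", "--out", "output/topic_structure.json"],
    ["crawl.py", "--url", "https://topic.com", "--out", "output"],
    ["validate_crawl.py", "--out", "output"],
]


def build_commands(python: str = sys.executable) -> List[List[str]]:
    """Prefix each step with the interpreter (assumes scripts are in the cwd)."""
    return [[python, *step] for step in PIPELINE]


def run_pipeline(dry_run: bool = True) -> List[List[str]]:
    """Print each command in order and, unless dry_run is set, execute it.

    Execution stops at the first failing step because check=True raises
    subprocess.CalledProcessError on a non-zero exit code.
    """
    commands = build_commands()
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Dry run by default: only prints the commands that would be executed.
    run_pipeline(dry_run=True)
```

Keeping the dry run as the default makes it possible to inspect the commands before anything touches the network or the `output` directory.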

## Installation

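```bash
pip install -e .
```

## Usage

Each script is listed with its command-line arguments, followed by the `init` call associated with it:

- `get_structure.py --out output/topic_structure.json`: `init(config, url)`
- `validate_get_structure.py --out output/topic_structure.json`: `init()`
- `crawl.py --url https://topic.com --out output`: `init(config)`
- `validate_crawl.py --out output`: `init()`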

