Metadata-Version: 2.4
Name: py-websites-scraper
Version: 0.1.0
Summary: Multiple-website scraper using aiohttp to make it easy to scrape multiple URLs
Project-URL: Homepage, https://github.com/hilmanski/py-websites-scraper
Project-URL: Issues, https://github.com/hilmanski/py-websites-scraper
Author-email: Hilman Ramadhan <hilmanhgb@gmail.com>
License: MIT
License-File: LICENSE
Requires-Python: >=3.8
Requires-Dist: aiohttp>=3.12.2
Requires-Dist: readability-lxml>=0.8.4
Provides-Extra: dev
Requires-Dist: black; extra == 'dev'
Requires-Dist: isort; extra == 'dev'
Requires-Dist: pytest; extra == 'dev'
Description-Content-Type: text/markdown

# About
Scrape the main content of multiple websites in parallel using Python.

## Dependencies
- [asyncio](https://docs.python.org/3/library/asyncio.html) (Python standard library)
- [aiohttp](https://docs.aiohttp.org/en/stable/)
- [Readability-lxml](https://pypi.org/project/readability-lxml/)

## How to use 
```
pip install py-websites-scraper
```

Quick usage:
```
import asyncio
from py_websites_scraper import scrape_urls

urls = ["https://news.ycombinator.com", "https://example.com"]
data = asyncio.run(scrape_urls(urls, max_concurrency=5))
for item in data:
    print(item["url"], item.get("title"))
```
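The `max_concurrency` argument presumably caps how many URLs are fetched at the same time. A common way to build such a cap with `asyncio` is to wrap each task in a semaphore and collect the results with `asyncio.gather`, which keeps them in input order. The sketch below shows that pattern using only the standard library; `run_bounded` and `fake_fetch` are hypothetical names (not part of `py_websites_scraper`), and `fake_fetch` stands in for a real aiohttp request so the example runs offline.

```python
import asyncio
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def run_bounded(
    items: List[T],
    worker: Callable[[T], Awaitable[R]],
    max_concurrency: int = 5,
) -> List[R]:
    """Run worker(item) for every item, with at most max_concurrency running at once.

    Results come back in the same order as items, because asyncio.gather preserves order.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(item: T) -> R:
        # Wait for a free slot before starting this item's work.
        async with semaphore:
            return await worker(item)

    return await asyncio.gather(*(guarded(item) for item in items))


# Hypothetical stand-in for an aiohttp request, so the sketch runs without network access.
async def fake_fetch(url: str) -> dict:
    await asyncio.sleep(0.01)
    return {"url": url, "title": None}


if __name__ == "__main__":
    urls = [f"https://example.com/page{i}" for i in range(10)]
    results = asyncio.run(run_bounded(urls, fake_fetch, max_concurrency=3))
    print([r["url"] for r in results])
```

With `max_concurrency=3`, at most three `fake_fetch` calls are in flight at any moment, while the remaining URLs wait on the semaphore.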

You can also pass extra aiohttp request parameters to `scrape_urls`, such as `headers`, `proxy`, and other request options.

Example:
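```
import asyncio
from py_websites_scraper import scrape_urls

async def main():
    urls = ["https://example.com"]
    # Extra keyword arguments are passed on to the aiohttp request
    results = await scrape_urls(
        urls,
        proxy="YOUR_PROXY_INFO",
        headers={"User-Agent": "USER_AGENT_INFO"},
    )
    print(results)

asyncio.run(main())
```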
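Assuming `scrape_urls` forwards these extra keyword arguments to aiohttp, as described above, the forwarding could look roughly like the hypothetical helper below. `fetch_html` is an illustrative name, not part of the package. `session` is meant to be an `aiohttp.ClientSession`, whose `get()` accepts options such as `headers`, `proxy`, and `timeout`; aiohttp itself is not imported here, so the helper can be exercised with any object of the same shape.

```python
from typing import Any


async def fetch_html(session: Any, url: str, **request_kwargs: Any) -> str:
    """Fetch one URL, passing extra options (headers, proxy, timeout, ...) straight to session.get().

    `session` is expected to behave like aiohttp.ClientSession: session.get() returns an
    async context manager whose response has raise_for_status() and an awaitable text().
    """
    async with session.get(url, **request_kwargs) as response:
        response.raise_for_status()  # turn 4xx/5xx responses into exceptions
        return await response.text()
```

Inside an `async with aiohttp.ClientSession() as session:` block, calling `await fetch_html(session, url, headers={...}, proxy="...")` would send both options along with the request for that URL.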

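## Limitations
- Gated content (for example, pages behind a login or paywall)
- Dynamically generated content (JavaScript is not executed, because aiohttp only fetches the raw HTML)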

## How to test the package locally (development)
Install in editable mode:
```
pip install -e .
```

Then run any script that imports the package, for example:
```
python test_local.py
```