Metadata-Version: 2.3
Name: scrapetools
Version: 1.1.9
Summary: A collection of tools to aid in web scraping.
Project-URL: Homepage, https://github.com/matt-manes/scrapetools
Project-URL: Documentation, https://github.com/matt-manes/scrapetools/tree/main/docs
Project-URL: Source code, https://github.com/matt-manes/scrapetools/tree/main/src/scrapetools
Author-email: Matt Manes <mattmanes@pm.me>
License-File: LICENSE.txt
Keywords: email,html,scrape,scraping,web,webscraping
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.10
Requires-Dist: beautifulsoup4
Requires-Dist: phonenumbers
Description-Content-Type: text/markdown

# Scrapetools

A collection of tools to aid in web scraping.  

Install using:

```console
pip install scrapetools
```

Scrapetools contains three functions (`scrape_emails`, `scrape_phone_numbers`, `scrape_inputs`)
and one class (`LinkScraper`).  

## Basic usage

```python
import scrapetools
import requests  # not installed with scrapetools; install it separately

url = 'https://somewebsite.com'
source = requests.get(url).text

emails = scrapetools.scrape_emails(source)

phone_numbers = scrapetools.scrape_phone_numbers(source)

scraper = scrapetools.LinkScraper(source, url)
scraper.scrape_page()
# links can be accessed and filtered via the get_links() method
same_site_links = scraper.get_links(same_site_only=True)
same_site_image_links = scraper.get_links(link_type='img', same_site_only=True)
external_image_links = scraper.get_links(link_type='img', excluded_links=same_site_image_links)

# scrape_inputs() returns a five-item tuple of BeautifulSoup Tag elements, grouped by kind of user input element
forms, inputs, buttons, selects, text_areas = scrapetools.scrape_inputs(source)
```
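
To illustrate what a same-site filter such as `same_site_only=True` means conceptually, here is a minimal standard-library sketch. It is not how `LinkScraper` is implemented: `is_same_site` is a hypothetical helper written for this example, and it assumes "same site" means "same host after resolving relative links".

```python
from urllib.parse import urljoin, urlparse


def is_same_site(link: str, page_url: str) -> bool:
    """Hypothetical helper: True if `link` resolves to the same host as `page_url`."""
    # Resolve relative links (e.g. "/about") against the page URL first.
    absolute = urljoin(page_url, link)
    # Compare hosts case-insensitively (assumption: same host == same site).
    return urlparse(absolute).netloc.lower() == urlparse(page_url).netloc.lower()


page_url = "https://somewebsite.com/index.html"
links = [
    "/about",
    "images/logo.png",
    "https://somewebsite.com/contact",
    "https://othersite.org/banner.png",
]

# Split the links into same-site and external groups.
same_site = [link for link in links if is_same_site(link, page_url)]
external = [link for link in links if not is_same_site(link, page_url)]
```

Relative links always count as same-site here because they are resolved against the page URL before the hosts are compared.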
