Metadata-Version: 2.1
Name: pywebcrwl
Version: 0.1.1
Summary: Web crawling tool
Author-email: idriss1433@gmail.com
Project-URL: Homepage and Documentation, https://github.com/NoneToRoot/pywebcrwl
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE.txt
Requires-Dist: requests
Requires-Dist: beautifulsoup4
Requires-Dist: phonenumbers
Requires-Dist: geotext

# pywebcrwl

**pywebcrwl** is a simple Python web crawler that extracts links, email addresses, phone numbers, keywords, and other information from websites.

## Features

- Crawl a site and collect all pages reachable from a given URL
- Extract email addresses (with optional domain filtering)
- Extract phone numbers (including international formats)
- Detect cities mentioned in the text
- Find matches for a given regular expression
- Extract all image URLs
- Extract all websites/domains mentioned on a page
- Extract downloadable documents (optionally by file extension)
- Extract raw HTML code of pages
- Identify keywords from the content
- Extract all sentences containing a specific word
- Extract website favicons
- Extract social media links
- Generate a summary (resume) of a page
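
To give a feel for what a few of these features involve, here is a minimal standard-library sketch of link, image, and email extraction with an optional domain filter. It is not pywebcrwl's API: the names `LinkAndImageParser`, `extract_emails`, and `EMAIL_RE` are hypothetical, the email pattern is deliberately simplified, and the HTML is an inline sample rather than a fetched page.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Simplified email pattern (an assumption for illustration; real-world
# address syntax is broader than this).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class LinkAndImageParser(HTMLParser):
    """Collect absolute URLs from <a href> and <img src> attributes."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            # Resolve relative paths against the page URL.
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "img" and attrs.get("src"):
            self.images.append(urljoin(self.base_url, attrs["src"]))


def extract_emails(text, domain=None):
    """Return unique addresses in first-seen order, optionally for one domain."""
    found = list(dict.fromkeys(EMAIL_RE.findall(text)))
    if domain is not None:
        suffix = "@" + domain.lower()
        found = [e for e in found if e.lower().endswith(suffix)]
    return found


# Inline sample page so the sketch runs without network access.
html = """
<html><body>
  <a href="/about">About</a>
  <img src="logo.png">
  <p>Contact info@example.com or sales@other.org.</p>
</body></html>
"""

parser = LinkAndImageParser("https://example.com/")
parser.feed(html)
print(parser.links)
print(parser.images)
print(extract_emails(html, domain="example.com"))
```

A real crawl would fetch each page first (for example with `requests`, which this package lists as a dependency) and repeat the link extraction for every newly discovered URL on the same site.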

## Installation

```bash
pip install pywebcrwl
