Metadata-Version: 2.1
Name: scrawler
Version: 0.3.0
Summary: Tool for General Purpose Web Scraping and Crawling
Home-page: https://github.com/dglttr/scrawler
Author: Daniel Glatter
Author-email: d.glatter@outlook.com
License: MIT
Project-URL: Bug Tracker, https://github.com/dglttr/scrawler/issues
Project-URL: Documentation, https://scrawler.readthedocs.io/
Project-URL: Source Code, https://github.com/dglttr/scrawler
Keywords: Web Scraping,Crawling,asyncio,multithreading
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.6
Requires-Dist: beautifulsoup4 (>=4.9.1)
Requires-Dist: requests (>=2.24.0)
Requires-Dist: tld (>=0.12.5)
Requires-Dist: pandas (>=1.0.5)
Requires-Dist: python-dateutil (>=2.8.1)
Requires-Dist: setuptools (>=28.8.0)
Requires-Dist: aiohttp (>=3.7.3)
Requires-Dist: readability-lxml (>=0.8.1)

Welcome to scrawler's documentation!
====================================

*"scrawler" = "scraper" + "crawler"*

Provides functionality for the automatic collection of website data
(`web scraping <https://en.wikipedia.org/wiki/Web_scraping>`__) and
following links to map an entire domain
(`crawling <https://en.wikipedia.org/wiki/Web_crawler>`__). It can
handle these tasks individually, or process several websites/domains in
parallel using ``asyncio`` and multithreading.
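
To illustrate the crawling idea, here is a minimal standard-library sketch. It is
**not** scrawler's API. ``FAKE_SITE``, ``fetch``, ``extract_links`` and
``crawl_domain`` are hypothetical names. The in-memory site stands in for real HTTP
requests so the example runs offline; a real crawler would use an HTTP client such
as ``aiohttp``, which is one of scrawler's dependencies. The sketch follows only
links on the start URL's domain, breadth-first, and fetches each level
concurrently with ``asyncio.gather``:

::

    import asyncio
    from html.parser import HTMLParser
    from urllib.parse import urldefrag, urljoin, urlparse

    # Hypothetical in-memory "website" so the sketch runs without network access.
    FAKE_SITE = {
        "https://example.com/": '<a href="/about">About</a> <a href="https://other.org/">Elsewhere</a>',
        "https://example.com/about": '<a href="/">Home</a> <a href="/team#staff">Team</a>',
        "https://example.com/team": "<p>No links here.</p>",
    }


    class LinkExtractor(HTMLParser):
        """Collect the href values of all <a> tags in an HTML document."""

        def __init__(self):
            super().__init__()
            self.links = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.links.append(value)


    def extract_links(html, base_url):
        """Return absolute link URLs found in html, with #fragments removed."""
        parser = LinkExtractor()
        parser.feed(html)
        return [urldefrag(urljoin(base_url, href))[0] for href in parser.links]


    async def fetch(url):
        """Hypothetical stand-in for an HTTP GET; returns "" for unknown pages."""
        await asyncio.sleep(0)  # yield to the event loop like a real request would
        return FAKE_SITE.get(url, "")


    async def crawl_domain(start_url, max_pages=100):
        """Breadth-first crawl that only follows links on start_url's domain."""
        domain = urlparse(start_url).netloc
        visited = set()
        frontier = [start_url]
        while frontier and len(visited) < max_pages:
            batch = frontier[: max_pages - len(visited)]
            visited.update(batch)
            # Fetch every page of the current level concurrently.
            pages = await asyncio.gather(*(fetch(url) for url in batch))
            next_frontier = []
            for url, html in zip(batch, pages):
                for link in extract_links(html, url):
                    same_domain = urlparse(link).netloc == domain
                    if same_domain and link not in visited and link not in next_frontier:
                        next_frontier.append(link)
            frontier = next_frontier
        return visited


    if __name__ == "__main__":
        print(sorted(asyncio.run(crawl_domain("https://example.com/"))))

To process several domains in parallel, the same coroutine could be run once per
start URL, e.g. ``await asyncio.gather(*(crawl_domain(u) for u in start_urls))``.
Again, this only sketches the general technique and does not reflect how scrawler
is implemented internally.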

This project was initially developed while working at the `Fraunhofer
Institute for Systems and Innovation
Research <https://www.isi.fraunhofer.de/en.html>`__. Many thanks for the
opportunity and support!

Installation
------------

You can install scrawler from PyPI:

::

    pip install scrawler

.. note::
    Alternatively, you can find the ``.whl`` and ``.tar.gz`` files on GitHub
    for each respective `release <https://github.com/dglttr/scrawler/releases>`__.

Getting Started
---------------

Check out the `Getting Started Guide <https://scrawler.readthedocs.io/en/latest/getting_started.html>`__.

Documentation
-------------

Documentation is available at `Read the Docs <https://scrawler.readthedocs.io/en/latest/>`__.

