Metadata-Version: 2.1
Name: silene
Version: 1.0.0
Summary: Silene is an open source web crawler framework built upon Pyppeteer.
Home-page: https://github.com/peterbencze/silene
Author: Peter Bencze
Author-email: benczepeter95@gmail.com
License: Apache License, Version 2.0
Project-URL: Bug Reports, https://github.com/peterbencze/silene/issues
Project-URL: Funding, https://www.paypal.com/paypalme/peterbencze
Project-URL: Source, https://github.com/peterbencze/silene
Keywords: crawler scraper spider framework pyppeteer data
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Topic :: Software Development :: Libraries :: Application Frameworks
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.7
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.7
Description-Content-Type: text/markdown
Requires-Dist: appdirs (==1.4.4)
Requires-Dist: pyee (==7.0.4)
Requires-Dist: pyppeteer (==0.2.2)
Requires-Dist: syncer (==1.3.0)
Requires-Dist: tld (==0.12.3)
Requires-Dist: websockets (==8.1) ; python_full_version >= "3.6.1"
Requires-Dist: tqdm (==4.54.1) ; python_version >= "2.7" and python_version not in "3.0, 3.1, 3.2, 3.3"
Requires-Dist: urllib3 (==1.26.2) ; python_version >= "2.7" and python_version not in "3.0, 3.1, 3.2, 3.3, 3.4" and python_full_version < "4.0.0"
Provides-Extra: dev
Requires-Dist: appdirs (==1.4.4) ; extra == 'dev'
Requires-Dist: cached-property (==1.5.2) ; extra == 'dev'
Requires-Dist: cerberus (==1.3.2) ; extra == 'dev'
Requires-Dist: certifi (==2020.12.5) ; extra == 'dev'
Requires-Dist: distlib (==0.3.1) ; extra == 'dev'
Requires-Dist: iniconfig (==1.1.1) ; extra == 'dev'
Requires-Dist: orderedmultidict (==1.0.1) ; extra == 'dev'
Requires-Dist: pathspec (==0.8.1) ; extra == 'dev'
Requires-Dist: pep517 (==0.9.1) ; extra == 'dev'
Requires-Dist: pipenv-setup (==3.1.1) ; extra == 'dev'
Requires-Dist: pipfile (==0.0.2) ; extra == 'dev'
Requires-Dist: pytest (==6.1.2) ; extra == 'dev'
Requires-Dist: pytest-cov (==2.10.1) ; extra == 'dev'
Requires-Dist: pytest-httpserver (==0.3.6) ; extra == 'dev'
Requires-Dist: pytest-mock (==3.3.1) ; extra == 'dev'
Requires-Dist: regex (==2020.11.13) ; extra == 'dev'
Requires-Dist: typed-ast (==1.4.1) ; extra == 'dev'
Requires-Dist: importlib-metadata (==3.3.0) ; (python_version < "3.8") and extra == 'dev'
Requires-Dist: typing-extensions (==3.7.4.3) ; (python_version < "3.8") and extra == 'dev'
Requires-Dist: zipp (==3.4.0) ; (python_version < "3.8") and extra == 'dev'
Requires-Dist: plette[validation] (==0.2.3) ; (python_version >= "2.6" and python_version not in "3.0, 3.1, 3.2, 3.3") and extra == 'dev'
Requires-Dist: pyparsing (==2.4.7) ; (python_version >= "2.6" and python_version not in "3.0, 3.1, 3.2, 3.3") and extra == 'dev'
Requires-Dist: toml (==0.10.2) ; (python_version >= "2.6" and python_version not in "3.0, 3.1, 3.2, 3.3") and extra == 'dev'
Requires-Dist: attrs (==20.3.0) ; (python_version >= "2.7" and python_version not in "3.0, 3.1, 3.2, 3.3") and extra == 'dev'
Requires-Dist: idna (==2.10) ; (python_version >= "2.7" and python_version not in "3.0, 3.1, 3.2, 3.3") and extra == 'dev'
Requires-Dist: packaging (==20.8) ; (python_version >= "2.7" and python_version not in "3.0, 3.1, 3.2, 3.3") and extra == 'dev'
Requires-Dist: pluggy (==0.13.1) ; (python_version >= "2.7" and python_version not in "3.0, 3.1, 3.2, 3.3") and extra == 'dev'
Requires-Dist: py (==1.10.0) ; (python_version >= "2.7" and python_version not in "3.0, 3.1, 3.2, 3.3") and extra == 'dev'
Requires-Dist: python-dateutil (==2.8.1) ; (python_version >= "2.7" and python_version not in "3.0, 3.1, 3.2, 3.3") and extra == 'dev'
Requires-Dist: six (==1.15.0) ; (python_version >= "2.7" and python_version not in "3.0, 3.1, 3.2, 3.3") and extra == 'dev'
Requires-Dist: vistir (==0.5.2) ; (python_version >= "2.7" and python_version not in "3.0, 3.1, 3.2, 3.3") and extra == 'dev'
Requires-Dist: chardet (==4.0.0) ; (python_version >= "2.7" and python_version not in "3.0, 3.1, 3.2, 3.3, 3.4") and extra == 'dev'
Requires-Dist: click (==7.1.2) ; (python_version >= "2.7" and python_version not in "3.0, 3.1, 3.2, 3.3, 3.4") and extra == 'dev'
Requires-Dist: colorama (==0.4.4) ; (python_version >= "2.7" and python_version not in "3.0, 3.1, 3.2, 3.3, 3.4") and extra == 'dev'
Requires-Dist: pip-shims (==0.5.3) ; (python_version >= "2.7" and python_version not in "3.0, 3.1, 3.2, 3.3, 3.4") and extra == 'dev'
Requires-Dist: requests (==2.25.1) ; (python_version >= "2.7" and python_version not in "3.0, 3.1, 3.2, 3.3, 3.4") and extra == 'dev'
Requires-Dist: requirementslib (==1.5.16) ; (python_version >= "2.7" and python_version not in "3.0, 3.1, 3.2, 3.3, 3.4") and extra == 'dev'
Requires-Dist: tomlkit (==0.7.0) ; (python_version >= "2.7" and python_version not in "3.0, 3.1, 3.2, 3.3, 3.4") and extra == 'dev'
Requires-Dist: werkzeug (==1.0.1) ; (python_version >= "2.7" and python_version not in "3.0, 3.1, 3.2, 3.3, 3.4") and extra == 'dev'
Requires-Dist: wheel (==0.36.2) ; (python_version >= "2.7" and python_version not in "3.0, 3.1, 3.2, 3.3, 3.4") and extra == 'dev'
Requires-Dist: urllib3 (==1.26.2) ; (python_version >= "2.7" and python_version not in "3.0, 3.1, 3.2, 3.3, 3.4" and python_full_version < "4.0.0") and extra == 'dev'
Requires-Dist: coverage (==5.3) ; (python_version >= "2.7" and python_version not in "3.0, 3.1, 3.2, 3.3, 3.4" and python_version < "4") and extra == 'dev'
Requires-Dist: black (==19.10b0) ; (python_version >= "3.6") and extra == 'dev'

# Silene

Silene is an open source web crawler framework built upon [Pyppeteer](https://github.com/pyppeteer/pyppeteer).

## Requirements
You must have at least [Python 3.7](https://www.python.org/downloads/) installed.

## Installation
To install the latest release, run `pip install silene`.

## Quickstart guide

Each crawler must subclass the `Crawler` class and implement the abstract `configure` method, which returns a
`CrawlerConfiguration`. The configuration specifies the initial requests to make and other properties of the crawler.
Once a request has been processed, the appropriate callback is invoked. By default, the `on_response_success` callback
is executed when a request succeeds. This is where you can interact with the page content. You can also specify custom
callbacks for your requests.

Below is a minimal implementation.

### Example code snippet

```python
from silene.crawl_request import CrawlRequest
from silene.crawl_response import CrawlResponse
from silene.crawler import Crawler
from silene.crawler_configuration import CrawlerConfiguration


class MyCrawler(Crawler):
    def configure(self) -> CrawlerConfiguration:
        return CrawlerConfiguration([CrawlRequest('https://example.com')])

    def on_response_success(self, response: CrawlResponse) -> None:
        # Do something with the response...
        pass
```
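The callback dispatch described in the quickstart can be modelled, very roughly, with the toy sketch below. It is not Silene's implementation and does not use Silene's API: `ToyRequest`, `ToyResponse`, `ToyCrawler`, the `success_callback` field and the `dispatch` method are hypothetical names invented for illustration, and Silene's own `CrawlRequest` may expose custom callbacks differently. The sketch only shows the idea that a request can carry its own callback and that the crawler falls back to its default `on_response_success` otherwise.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ToyResponse:
    """Hypothetical stand-in for a crawl response (not Silene's CrawlResponse)."""
    url: str


@dataclass
class ToyRequest:
    """Hypothetical request that may carry its own success callback."""
    url: str
    success_callback: Optional[Callable[[ToyResponse], None]] = None


class ToyCrawler:
    """Toy model of callback dispatch (not Silene's Crawler)."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def on_response_success(self, response: ToyResponse) -> None:
        # Default callback, used when a request does not specify its own.
        self.log.append(f"default:{response.url}")

    def dispatch(self, request: ToyRequest, response: ToyResponse) -> None:
        # Hypothetical dispatch for a successful request: prefer the
        # request's custom callback, otherwise fall back to the default.
        callback = request.success_callback or self.on_response_success
        callback(response)


crawler = ToyCrawler()

# No custom callback: the default on_response_success handles the response.
crawler.dispatch(ToyRequest("https://example.com"), ToyResponse("https://example.com"))

# Custom callback attached to the request takes precedence over the default.
crawler.dispatch(
    ToyRequest(
        "https://example.com/about",
        success_callback=lambda r: crawler.log.append(f"custom:{r.url}"),
    ),
    ToyResponse("https://example.com/about"),
)

print(crawler.log)  # ['default:https://example.com', 'custom:https://example.com/about']
```

A per-request callback with a crawler-wide default keeps simple crawlers short (override one method) while still letting individual requests route their results to specialised handlers.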

## Development instructions

### Prerequisite

This project requires [Pipenv](https://docs.pipenv.org/) to be installed.

### Create environment

Run `pipenv install --dev` to create a new virtual environment and install the necessary packages.

### Run tests

Run `pytest` in the project root folder.

### Run tests with coverage

Run `pytest --cov=silene` in the project root folder.

## License

The source code of Silene is made available under
the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0).


