Metadata-Version: 2.4
Name: scrapelib
Version: 2.4.0
Project-URL: Repository, https://codeberg.org/jpt/scrapelib
Author-email: James Turk <dev@jpt.sh>
License: BSD-2-Clause
License-File: LICENSE
Classifier: Development Status :: 6 - Mature
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: BSD License
Classifier: Natural Language :: English
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Requires-Dist: requests[security]>=2.28.1
Requires-Dist: urllib3
Provides-Extra: dev
Requires-Dist: coveralls>=3.3.1; extra == 'dev'
Requires-Dist: flake8>=3.9.0; extra == 'dev'
Requires-Dist: flask==2.1.0; extra == 'dev'
Requires-Dist: importlib-metadata<5.0; extra == 'dev'
Requires-Dist: mkdocs-material>=9.2.7; extra == 'dev'
Requires-Dist: mkdocstrings==0.19.0; extra == 'dev'
Requires-Dist: mock>=4.0.3; extra == 'dev'
Requires-Dist: mypy>=0.961; extra == 'dev'
Requires-Dist: pytest-cov>=2.11.1; extra == 'dev'
Requires-Dist: pytest-httpbin>=2.0.0; extra == 'dev'
Requires-Dist: pytest>=7.1.2; extra == 'dev'
Requires-Dist: types-mock>=4.0.15; extra == 'dev'
Requires-Dist: types-requests>=2.28.11; extra == 'dev'
Requires-Dist: werkzeug==2.0.3; extra == 'dev'
Description-Content-Type: text/markdown

**scrapelib** is a library for making requests to less-than-reliable websites.

**This repository has moved to Codeberg; GitHub will remain as a read-only mirror.**

Source: [https://codeberg.org/jpt/scrapelib](https://codeberg.org/jpt/scrapelib)

Documentation: [https://jamesturk.github.io/scrapelib/](https://jamesturk.github.io/scrapelib/)

Issues: [https://codeberg.org/jpt/scrapelib/issues](https://codeberg.org/jpt/scrapelib/issues)

[![PyPI badge](https://badge.fury.io/py/scrapelib.svg)](https://badge.fury.io/py/scrapelib)
[![Test badge](https://github.com/jamesturk/scrapelib/workflows/Test/badge.svg)](https://github.com/jamesturk/scrapelib/actions?query=workflow%3ATest)

## Features

**scrapelib** originated as part of the [Open States](http://openstates.org/)
project to scrape the websites of all 50 state legislatures. As a result, it
was designed with features that are useful when dealing with sites that
have intermittent errors or require rate limiting.

Advantages of using scrapelib over using requests as-is:

- HTTP(S) and FTP requests via an identical API
- support for simple caching with pluggable cache backends
- highly configurable request throttling
- configurable retries for non-permanent site failures
- all of the power of the superb [requests](http://python-requests.org) library
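
To make the retry feature concrete, here is a rough standard-library sketch of retrying a transient failure with an increasing wait. It is not scrapelib's implementation: the helper `with_retries`, its parameters `max_retries` and `base_wait`, and the choice of exponential backoff are all assumptions made for illustration.

``` python
import time


def with_retries(fetch, max_retries=3, base_wait=1.0,
                 transient=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call fetch(); retry up to max_retries times on transient errors.

    Hypothetical helper for illustration only, not part of scrapelib.
    """
    for attempt in range(max_retries + 1):
        try:
            return fetch()
        except transient:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            # Wait longer after each failure (assumed exponential backoff).
            sleep(base_wait * 2 ** attempt)
```

With scrapelib this kind of loop should not need to be written by hand; the sketch only shows the idea behind "configurable retries".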


## Installation

*scrapelib* is available on [PyPI](https://pypi.org/project/scrapelib/) and can be installed with any standard Python package management tool, such as pip.


## Example Usage

``` python
import scrapelib

s = scrapelib.Scraper(requests_per_minute=10)

# Grab Google front page
s.get('http://google.com')

# Will be throttled to 10 HTTP requests per minute
while True:
    s.get('http://example.com')
```
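
As a rough illustration of what a limit of 10 requests per minute amounts to, the sketch below enforces a minimum gap of 60 / 10 = 6 seconds between calls. It uses only the standard library and is not how scrapelib is implemented; the `Throttle` class and its injectable `clock` and `sleep` arguments are hypothetical names chosen so the logic can be tested without really waiting.

``` python
import time


class Throttle:
    """Enforce a minimum interval between calls (illustrative only)."""

    def __init__(self, requests_per_minute, clock=time.monotonic, sleep=time.sleep):
        # 10 requests per minute means at least 6 seconds between requests.
        self.min_interval = 60.0 / requests_per_minute
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Block until the next request is allowed; return seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            # Sleep only for the part of the interval that has not elapsed yet.
            delay = max(0.0, self.min_interval - (now - self._last))
            if delay:
                self._sleep(delay)
        self._last = now + delay
        return delay
```

Calling `wait()` before each request spaces requests evenly, which is the effect the `requests_per_minute` argument is described as having in the example above.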
