Metadata-Version: 2.4
Name: spatula
Version: 1.0.0
Summary: A modern Python library for writing maintainable web scrapers.
Project-URL: Repository, https://codeberg.org/jpt/spatula/
Project-URL: Documentation, https://jamesturk.github.io/spatula/
Author-email: james turk <dev@jpt.sh>
License: MIT
License-File: LICENSE
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.8
Requires-Dist: attrs<21.0.0,>=20.3.0
Requires-Dist: click<9.0.0,>=8.0.0
Requires-Dist: cssselect<2.0.0,>=1.1.0
Requires-Dist: lxml<6,>4.6
Requires-Dist: openpyxl<4.0.0,>=3.0.6
Requires-Dist: scrapelib<3.0.0,>=2.0.6
Provides-Extra: shell
Requires-Dist: ipython<8.0.0,>=7.19.0; extra == 'shell'
Description-Content-Type: text/markdown

# Overview

*spatula* is a modern Python library for writing maintainable web scrapers.

**Please note: the official repository has moved to Codeberg; GitHub will only be used as a mirror.**

Source: [https://codeberg.org/jpt/spatula/](https://codeberg.org/jpt/spatula/)

Documentation: [https://jamesturk.github.io/spatula/](https://jamesturk.github.io/spatula/)

Issues: [https://codeberg.org/jpt/spatula/issues](https://codeberg.org/jpt/spatula/issues)

[![PyPI badge](https://badge.fury.io/py/spatula.svg)](https://badge.fury.io/py/spatula)

## Features

- **Page-oriented design**: Encourages writing understandable & maintainable scrapers.
- **Not Just HTML**: Provides built-in [handlers for common data formats](https://jamesturk.github.io/spatula/reference/#pages) including CSV, JSON, XML, PDF, and Excel, or write your own.
- **Fast HTML parsing**: Uses `lxml.html` for fast, consistent, and reliable parsing of HTML.
- **Flexible Data Model Support**: Compatible with `dataclasses`, `attrs`, `pydantic`, or bring your own data model classes for storing & validating your scraped data.
- **CLI Tools**: Offers several [CLI utilities](https://jamesturk.github.io/spatula/cli/) that can help streamline the development & testing cycle.
- **Fully Typed**: Makes full use of Python 3 type annotations.
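To make the page-oriented idea concrete, here is a minimal, stdlib-only sketch of the pattern. It is **not** spatula's API: `Page`, `HtmlStaffPage`, `CsvStaffPage`, and `Employee` are hypothetical names, the sources are inline strings rather than fetched URLs so it runs offline, and `xml.etree` / `csv` stand in for spatula's own HTML and CSV handlers. See the documentation for the real page classes.

```python
import csv
import io
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class Employee:
    """Hypothetical data model for one scraped record."""

    first: str
    last: str
    position: str


class Page:
    """Hypothetical base class: one page = one source + one parsing method."""

    def __init__(self, source: str) -> None:
        # A real scraper would fetch this from a URL; an inline string keeps
        # the sketch runnable offline.
        self.source = source

    def process_page(self) -> Iterator[Employee]:
        raise NotImplementedError


class HtmlStaffPage(Page):
    """Turns a staff table into Employee records."""

    def process_page(self) -> Iterator[Employee]:
        # ElementTree needs well-formed markup; real-world HTML calls for a
        # forgiving parser such as lxml.html.
        root = ET.fromstring(self.source.strip())
        for row in root.iter("tr"):
            cells = [td.text or "" for td in row.findall("td")]
            # Skip rows that are not exactly first/last/position
            # (e.g. header rows made of <th> cells).
            if len(cells) == 3:
                yield Employee(*cells)


class CsvStaffPage(Page):
    """Same records, different format: CSV instead of HTML."""

    def process_page(self) -> Iterator[Employee]:
        for row in csv.DictReader(io.StringIO(self.source.strip())):
            yield Employee(row["first"], row["last"], row["position"])


HTML = """
<table><tbody>
  <tr><td>Jane</td><td>Doe</td><td>Engineer</td></tr>
  <tr><td>John</td><td>Roe</td><td>Manager</td></tr>
</tbody></table>
"""

CSV = """
first,last,position
Ann,Lee,Analyst
"""

html_staff: List[Employee] = list(HtmlStaffPage(HTML).process_page())
csv_staff: List[Employee] = list(CsvStaffPage(CSV).process_page())

for employee in html_staff + csv_staff:
    print(employee)
```

Because each page class owns both its source and its parsing logic, switching from HTML to CSV only means writing a new `process_page`; the `Employee` data model, and any code consuming it, stays the same.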
