Metadata-Version: 2.0
Name: piculet
Version: 1.0b7
Summary: XML/HTML scraper using XPath queries.
Home-page: https://bitbucket.org/uyar/piculet
Author: H. Turgut Uyar
Author-email: uyar@tekir.org
License: LGPL
Keywords: xml html xpath scrape json
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU Lesser General Public License v3 or later (LGPLv3+)
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing :: Markup :: HTML
Classifier: Topic :: Text Processing :: Markup :: XML
Classifier: Topic :: Utilities
Provides-Extra: dev
Requires-Dist: flake8; extra == 'dev'
Requires-Dist: flake8-isort; extra == 'dev'
Requires-Dist: flake8-docstrings; extra == 'dev'
Requires-Dist: wheel; extra == 'dev'
Requires-Dist: twine; extra == 'dev'
Provides-Extra: doc
Requires-Dist: sphinx; extra == 'doc'
Requires-Dist: sphinx-rtd-theme; extra == 'doc'
Requires-Dist: pygenstub; extra == 'doc'
Provides-Extra: test
Requires-Dist: pytest; extra == 'test'
Requires-Dist: pytest-cov; extra == 'test'
Requires-Dist: pytest-profiling; extra == 'test'
Provides-Extra: yaml
Requires-Dist: pyyaml; extra == 'yaml'

Copyright (C) 2014-2018 H. Turgut Uyar <uyar@tekir.org>

Piculet is a module for extracting data from XML or HTML documents
using XPath queries. It consists of a `single source file`_
with no dependencies other than the standard library, which makes it easy
to integrate into applications. It also provides a command-line interface.

:PyPI: https://pypi.python.org/pypi/piculet/
:Repository: https://bitbucket.org/uyar/piculet
:Documentation: https://piculet.readthedocs.io/

Piculet has been tested with Python 2.7, Python 3.4+, PyPy2 5.7+,
and PyPy3 5.7+. You can install the latest version using ``pip``::

    pip install piculet
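
The idea of extracting data with XPath queries can be sketched with the
standard library alone. The example below is not Piculet's API: it uses the
limited XPath subset of ``xml.etree.ElementTree``, and the ``extract``
function, the ``QUERIES`` mapping, and the sample document are hypothetical
names made up for illustration.

```python
import xml.etree.ElementTree as ET

# Sample well-formed document (made up for this sketch).
DOCUMENT = """
<html>
  <head><title>The Shining</title></head>
  <body>
    <span class="year">1980</span>
    <ul class="cast">
      <li>Jack Nicholson</li>
      <li>Shelley Duvall</li>
    </ul>
  </body>
</html>
"""

# Hypothetical mapping from result keys to XPath queries.
QUERIES = {
    "title": ".//title",
    "year": ".//span[@class='year']",
    "cast": ".//ul[@class='cast']/li",
}


def extract(document, queries):
    """Return a dict mapping each key to the texts of the matched nodes."""
    root = ET.fromstring(document.strip())
    result = {}
    for key, path in queries.items():
        # Collect the text of every element selected by the query.
        texts = [node.text for node in root.findall(path) if node.text]
        if texts:
            result[key] = texts
    return result


data = extract(DOCUMENT, QUERIES)
print(data)
```

Keys whose queries match nothing are left out of the result, so the output
contains only the data that was actually found in the document.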

.. _single source file: https://bitbucket.org/uyar/piculet/src/tip/piculet.py




History
=======

1.0b7 (2018-03-21)
------------------

- Dropped support for Python 3.3.
- Fixes for handling Unicode data in HTML for Python 2.
- Added registry for preprocessors.

1.0b6 (2018-01-17)
------------------

- Support for writing specifications in YAML.

1.0b5 (2018-01-16)
------------------

- Added a class-based API for writing specifications.
- Added predefined transformation functions.
- Removed callables from specification maps. Use the new API instead.
- Added support for registering new reducers and transformers.
- Added support for defining sections in a document.
- Refactored XPath evaluation method in order to parse path expressions once.
- Preprocessing is now done only once, when the tree is built.
- Concatenation is now the default reducing operation.

1.0b4 (2018-01-02)
------------------

- Added the "--version" command-line option.
- Added option to force the use of lxml's HTML builder.
- Fixed the error where non-truthy values would be excluded from the result.
- Added support for transforming node text during preprocess.
- Added separate preprocessing function to API.
- Renamed the "join" reducer as "concat".
- Renamed the "foreach" keyword for keys as "section".
- Removed some low level debug messages to substantially increase speed.

1.0b3 (2017-07-25)
------------------

- Removed the caching feature.

1.0b2 (2017-06-16)
------------------

- Added helper function for getting cache hash keys of URLs.

1.0b1 (2017-04-26)
------------------

- Added optional value transformations.
- Added support for custom reducer callables.
- Added command-line option for scraping documents from local files.

1.0a2 (2017-04-04)
------------------

- Added support for Python 2.7.
- Fixed lxml support.

1.0a1 (2016-08-24)
------------------

- First release on PyPI.


