Metadata-Version: 2.1
Name: merlin-py
Version: 1.1.0
Summary: Read the merlin-py tutorials for use...
Home-page: https://github.com/kellen-t-oconnor/merlin-py
Author: Kellen O'Connor
Author-email: kellen.t.oconnor@gmail.com
License: MIT
Platform: UNKNOWN
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Description-Content-Type: text/markdown
Requires-Dist: camelot
Requires-Dist: tabula-py

Merlin-py is a potential evolution of camelot ( /atlanhq/camelot ) and tabula ( /tabulapdf/tabula ) to further simplify the extraction of tabular data from PDFs. Merlin-py aims to simplify data extraction by giving users search logic for locating the data they want, whether by a table label, a set of column names, or table dimensions. An ideal use case is extracting a common data table from hundreds or thousands of separate PDFs where the desired data is in a different location in each document (tax documents, historical records, etc.).
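
To make the idea of search logic concrete, here is a minimal sketch of matching tables by column names, dimensions, or label text. It is an illustration only and not merlin-py's actual API: the function `find_tables`, its parameters, and the sample data are hypothetical, and it assumes the tables have already been extracted (for example by camelot or tabula-py) and converted to plain lists of rows with the header in the first row.

```python
from typing import List, Optional, Sequence, Tuple

# Assumption: each table is a list of rows, and the first row is the header.
Table = List[List[str]]


def find_tables(
    tables: Sequence[Table],
    columns: Optional[Sequence[str]] = None,
    shape: Optional[Tuple[int, int]] = None,
    label: Optional[str] = None,
) -> List[Table]:
    """Hypothetical helper: return tables meeting every criterion supplied."""
    matches = []
    for table in tables:
        if not table:
            continue  # skip empty extractions
        header = [cell.strip().lower() for cell in table[0]]
        if columns is not None:
            # Every requested column name must appear in the header row.
            wanted = [name.strip().lower() for name in columns]
            if not all(name in header for name in wanted):
                continue
        if shape is not None:
            # Dimensions are (rows, columns); the header counts as a row.
            if (len(table), len(table[0])) != shape:
                continue
        if label is not None:
            # Case-insensitive search for the label text in any cell.
            needle = label.lower()
            if not any(needle in cell.lower() for row in table for cell in row):
                continue
        matches.append(table)
    return matches


# Made-up sample data standing in for tables pulled from one PDF.
page_tables = [
    [["Item", "Qty"], ["Widget", "3"]],
    [["Year", "Income", "Tax"], ["2019", "50000", "7500"], ["2020", "52000", "7800"]],
]
hits = find_tables(page_tables, columns=["year", "tax"])
```

Running the same search over the tables from each PDF in a batch would pick out the desired table regardless of which page or position it occupies in a given document.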

Current compatibility: Linux

Future compatibility: Windows, Mac



Linux Software requirements: bundled Python packages, python-cv (Debian)

Windows Software requirements: TBD

Mac Software requirements: TBD



Future features: Tesseract-based conversion of images to machine-readable PDFs, runtime performance improvements, GUI frontend.

Big thanks to the developers and contributors of both tabula-py and camelot-py, as this project is largely built atop these two efforts.



