Metadata-Version: 2.1
Name: edspdf
Version: 0.5.2
Summary: Smart text extraction from PDF documents
Home-page: https://github.com/aphp/edspdf/
License: BSD-3
Author: Basile Dura
Author-email: basile.dura-ext@aphp.fr
Requires-Python: >=3.7, !=2.7.*, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*, !=3.5.*, !=3.6.*
Classifier: License :: OSI Approved :: BSD License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Requires-Dist: catalogue (>=2.0.7,<3.0.0)
Requires-Dist: loguru (>=0.6.0,<0.7.0)
Requires-Dist: networkx (>=2.6,<3.0)
Requires-Dist: pandas (>=1.2,<2.0)
Requires-Dist: pdfminer.six (>=20220319,<20220320)
Requires-Dist: pydantic (>=1.2,<2.0)
Requires-Dist: pypdfium2 (>=2.7.1,<3.0.0)
Requires-Dist: scikit-learn (>=1.0.2,<2.0.0)
Requires-Dist: scipy (>=1.7.0,<2.0.0)
Requires-Dist: thinc (>=8.0.15,<9.0.0)
Project-URL: Documentation, https://aphp.github.io/edspdf/
Project-URL: Repository, https://github.com/aphp/edspdf/
Description-Content-Type: text/markdown

![Tests](https://img.shields.io/github/workflow/status/aphp/edspdf/Tests%20and%20Linting?label=tests&style=flat-square)
[![Documentation](https://img.shields.io/github/workflow/status/aphp/edspdf/Documentation?label=docs&style=flat-square)](https://aphp.github.io/edspdf/latest/)
[![PyPI](https://img.shields.io/pypi/v/edspdf?color=blue&style=flat-square)](https://pypi.org/project/edspdf/)
[![Codecov](https://img.shields.io/codecov/c/github/aphp/edspdf?logo=codecov&style=flat-square)](https://codecov.io/gh/aphp/edspdf)
[![DOI](https://zenodo.org/badge/517726737.svg)](https://zenodo.org/badge/latestdoi/517726737)

# EDS-PDF

EDS-PDF provides a modular framework to extract text from PDF documents.

You can use it out-of-the-box, or extend it to fit your use-case.

## Getting started

Install the library with pip:

<div class="termy">

```console
$ pip install edspdf
```

</div>
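To confirm that the installation succeeded, you can query the installed version from Python. The snippet below is a minimal sketch that relies only on the standard library (`importlib.metadata`, available from Python 3.8) and does not call any EDS-PDF API. The helper name `installed_version` is hypothetical and not part of the library.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed.

    Hypothetical helper for illustration only; it is not part of EDS-PDF.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version string if edspdf is installed, otherwise None.
print(installed_version("edspdf"))
```

If this prints `None`, the package is not visible to the interpreter you are running, which usually means that `pip` installed it into a different environment.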

Visit the [documentation](https://aphp.github.io/edspdf/) for more information!

## Citation

If you use EDS-PDF, please cite us as below.

```bibtex
@software{edspdf,
  author  = {Dura, Basile and Wajsburt, Perceval and Calliger, Alice and Gérardin, Christel and Bey, Romain},
  doi     = {10.5281/zenodo.6902977},
  license = {BSD-3-Clause},
  title   = {{EDS-PDF: Smart text extraction from PDF documents}},
  url     = {https://github.com/aphp/edspdf}
}
```

## Acknowledgement

We would like to thank [Assistance Publique – Hôpitaux de Paris](https://www.aphp.fr/)
and the [AP-HP Foundation](https://fondationrechercheaphp.fr/) for funding this project.

