Metadata-Version: 2.1
Name: docling
Version: 2.1.0
Summary: Docling PDF conversion package
Home-page: https://github.com/DS4SD/docling
License: MIT
Keywords: docling,convert,document,pdf,layout model,segmentation,table structure,table former
Author: Christoph Auer
Author-email: cau@zurich.ibm.com
Requires-Python: >=3.10,<4.0
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Provides-Extra: tesserocr
Requires-Dist: beautifulsoup4 (>=4.12.3,<5.0.0)
Requires-Dist: certifi (>=2024.7.4)
Requires-Dist: deepsearch-glm (>=0.25.0,<0.26.0)
Requires-Dist: docling-core (>=2.0.0,<3.0.0)
Requires-Dist: docling-ibm-models (>=2.0.1,<3.0.0)
Requires-Dist: docling-parse (>=1.6.0,<2.0.0)
Requires-Dist: easyocr (>=1.7,<2.0)
Requires-Dist: filetype (>=1.2.0,<2.0.0)
Requires-Dist: huggingface_hub (>=0.23,<1)
Requires-Dist: pandas (>=2.1.4,<3.0.0)
Requires-Dist: pyarrow (>=16.1.0,<17.0.0)
Requires-Dist: pydantic (>=2.0.0,<3.0.0)
Requires-Dist: pydantic-settings (>=2.3.0,<3.0.0)
Requires-Dist: pypdfium2 (>=4.30.0,<5.0.0)
Requires-Dist: python-docx (>=1.1.2,<2.0.0)
Requires-Dist: python-pptx (>=1.0.2,<2.0.0)
Requires-Dist: requests (>=2.32.3,<3.0.0)
Requires-Dist: rtree (>=1.3.0,<2.0.0)
Requires-Dist: scipy (>=1.14.1,<2.0.0)
Requires-Dist: tesserocr (>=2.7.1,<3.0.0) ; extra == "tesserocr"
Requires-Dist: torch (>=2.2.2,<2.3.0) ; sys_platform == "darwin" and platform_machine == "x86_64"
Requires-Dist: torch (>=2.2.2,<3.0.0) ; sys_platform != "darwin" or platform_machine != "x86_64"
Requires-Dist: torchvision (>=0,<1) ; sys_platform != "darwin" or platform_machine != "x86_64"
Requires-Dist: torchvision (>=0.17.2,<0.18.0) ; sys_platform == "darwin" and platform_machine == "x86_64"
Requires-Dist: typer (>=0.12.5,<0.13.0)
Project-URL: Repository, https://github.com/DS4SD/docling
Description-Content-Type: text/markdown

<p align="center">
  <a href="https://github.com/ds4sd/docling">
    <img loading="lazy" alt="Docling" src="docs/assets/docling_processing.png" width="100%"/>
  </a>
</p>

# Docling

[![arXiv](https://img.shields.io/badge/arXiv-2408.09869-b31b1b.svg)](https://arxiv.org/abs/2408.09869)
[![Docs](https://img.shields.io/badge/docs-live-brightgreen)](https://ds4sd.github.io/docling/)
[![PyPI version](https://img.shields.io/pypi/v/docling)](https://pypi.org/project/docling/)
![Python](https://img.shields.io/badge/python-3.10%20%7C%203.11%20%7C%203.12-blue)
[![Poetry](https://img.shields.io/endpoint?url=https://python-poetry.org/badge/v0.json)](https://python-poetry.org/)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![Imports: isort](https://img.shields.io/badge/%20imports-isort-%231674b1?style=flat&labelColor=ef8336)](https://pycqa.github.io/isort/)
[![Pydantic v2](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/pydantic/pydantic/main/docs/badge/v2.json)](https://pydantic.dev)
[![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white)](https://github.com/pre-commit/pre-commit)
[![License MIT](https://img.shields.io/github/license/DS4SD/docling)](https://opensource.org/licenses/MIT)

Docling parses documents and exports them to the desired format with ease and speed.


## Features

* 🗂️ Multi-format support for input (PDF, DOCX etc.) & output (Markdown, JSON etc.)
* 📑 Advanced PDF document understanding incl. page layout, reading order & table structures
* 📝 Metadata extraction, including title, authors, references & language
* 🤖 Seamless LlamaIndex 🦙 & LangChain 🦜🔗 integration for powerful RAG / QA applications
* 🔍 OCR support for scanned PDFs
* 💻 Simple and convenient CLI

Explore the [documentation](https://ds4sd.github.io/docling/) to discover plenty examples and unlock the full power of Docling!


## Installation

To use Docling, simply install `docling` from your package manager, e.g. pip:
```bash
pip install docling
```

Works on macOS, Linux and Windows environments. Both x86_64 and arm64 architectures.

More [detailed installation instructions](https://ds4sd.github.io/docling/installation/) are available in the docs.

## Getting started

To convert individual documents, use `convert()`, for example:

```python
from docling.document_converter import DocumentConverter

source = "https://arxiv.org/pdf/2408.09869"  # PDF path or URL
converter = DocumentConverter()
result = converter.convert(source)
print(result.document.export_to_markdown())  # output: "## Docling Technical Report[...]"
```


Check out [Getting started](https://ds4sd.github.io/docling/).
You will find lots of tuning options to leverage all the advanced capabilities.


## Get help and support

Please feel free to connect with us using the [discussion section](https://github.com/DS4SD/docling/discussions).


## Technical report

For more details on Docling's inner workings, check out the [Docling Technical Report](https://arxiv.org/abs/2408.09869).

## Contributing

Please read [Contributing to Docling](https://github.com/DS4SD/docling/blob/main/CONTRIBUTING.md) for details.


## References

If you use Docling in your projects, please consider citing the following:

```bib
@techreport{Docling,
  author = {Deep Search Team},
  month = {8},
  title = {Docling Technical Report},
  url = {https://arxiv.org/abs/2408.09869},
  eprint = {2408.09869},
  doi = {10.48550/arXiv.2408.09869},
  version = {1.0.0},
  year = {2024}
}
```

## License

The Docling codebase is under MIT license.
For individual model usage, please refer to the model licenses found in the original packages.

