Metadata-Version: 2.4
Name: versed-pdf
Version: 1.1.0
Summary: Semantic PDF-to-Markdown engine for Arabic and bilingual texts with local routing and repair
Author: Versed Team
License: MIT
Keywords: arabic,pdf,markdown,text-extraction,quran,ocr,nlp
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Text Processing :: Linguistic
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: pdf
Requires-Dist: pymupdf>=1.24.0; extra == "pdf"
Provides-Extra: ocr
Requires-Dist: pytesseract>=0.3.10; extra == "ocr"
Requires-Dist: Pillow>=10.0.0; extra == "ocr"
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Dynamic: license-file

# versed

Local PDF-to-Markdown tooling for Arabic and bilingual texts.

It repairs broken extraction, decodes QCF Quran fonts, classifies pages, and renders semantic Markdown from local PDFs.

## Install

```bash
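# Core package only
pip install versed-pdf
# With the pdf extra (pulls in pymupdf)
pip install "versed-pdf[pdf]"
# With the pdf and ocr extras (adds pytesseract and Pillow)
pip install "versed-pdf[pdf,ocr]"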
```

## Quick start

```python
from versed import extract_document

result = extract_document("book.pdf", title="Book")
print(result.markdown)
```
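
To keep the output, the Markdown can be written to disk. The following is a minimal sketch that relies only on the names shown in the quick start (`extract_document`, the `title` keyword, and `result.markdown`). The helper `convert_to_file` and its `extract` parameter are hypothetical, and the sketch assumes `result.markdown` is a `str`. Writing with an explicit UTF-8 encoding avoids platform-default encodings mangling Arabic text.

```python
from pathlib import Path


def convert_to_file(pdf_path, out_path, extract=None):
    """Extract a PDF and write the resulting Markdown as UTF-8.

    `extract` is a hypothetical hook so the function can be exercised
    without versed installed; by default it uses versed.extract_document.
    """
    if extract is None:
        from versed import extract_document as extract

    # Use the file name (without extension) as the document title.
    result = extract(str(pdf_path), title=Path(pdf_path).stem)

    out_path = Path(out_path)
    out_path.write_text(result.markdown, encoding="utf-8")
    return out_path


if __name__ == "__main__":
    print(convert_to_file("book.pdf", "book.md"))
```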

## CLI

```bash
versed repair-text "tafß¬l"
versed detect book.pdf
versed classify book.pdf
versed extract book.pdf -o book.md
```
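
For a folder of PDFs, the `versed extract` command above can be driven from a short script. This is a sketch under assumptions: it uses only the `extract <input> -o <output>` form shown above, assumes `-o` takes a file path, and assumes the CLI exits with a non-zero status on failure. The function names are hypothetical.

```python
import subprocess
from pathlib import Path


def build_extract_command(pdf_path, out_dir):
    """Build the argument list for `versed extract <pdf> -o <out_dir>/<name>.md`."""
    pdf_path = Path(pdf_path)
    out_path = Path(out_dir) / (pdf_path.stem + ".md")
    return ["versed", "extract", str(pdf_path), "-o", str(out_path)]


def extract_all(pdf_dir, out_dir):
    """Run `versed extract` once per PDF in pdf_dir, stopping on the first failure."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        # check=True raises CalledProcessError if the command exits non-zero.
        subprocess.run(build_extract_command(pdf, out_dir), check=True)


if __name__ == "__main__":
    extract_all("pdfs", "markdown")
```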

## Public modules

- `versed.repair`: Sabon mojibake repair helpers
- `versed.qcf`: QCF Quran font decoding
- `versed.classify`: local page classification and backend selection
- `versed.routing`: cost-aware routing heuristics
- `versed.layout`: grouping of aligned words into semantic blocks
- `versed.markdown`: rendering of semantic blocks to Markdown or plain text
- `versed.extract`: end-to-end local extraction

## License

MIT
