Metadata-Version: 2.4
Name: docquill
Version: 0.1.4
Summary: Professional DOCX document processing library with AI-ready JSON export, PDF/HTML rendering, and round-trip editing
Project-URL: Homepage, https://github.com/AddNap/DocQuill
Project-URL: Documentation, https://github.com/AddNap/DocQuill/tree/main/docs
Project-URL: Repository, https://github.com/AddNap/DocQuill
Project-URL: Issues, https://github.com/AddNap/DocQuill/issues
Author: AddNap
License-Expression: Apache-2.0
Keywords: ai,document,docx,html,nlp,office,openxml,parser,pdf,renderer,word
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Office/Business :: Office Suites
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing :: Markup :: XML
Requires-Python: >=3.9
Requires-Dist: lxml>=4.9.0
Requires-Dist: reportlab>=4.0.0
Provides-Extra: dev
Requires-Dist: black>=23.0; extra == 'dev'
Requires-Dist: mypy>=1.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.0; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Provides-Extra: rust
Requires-Dist: docquill-rust>=0.1.0; extra == 'rust'
Description-Content-Type: text/markdown

# DocQuill

[![PyPI version](https://badge.fury.io/py/docquill.svg)](https://badge.fury.io/py/docquill)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)

**Professional DOCX document processing library** with AI-ready JSON export, PDF/HTML rendering, and round-trip editing.

## Features

- 📄 **Full DOCX Parsing** - Headers, footers, tables, images, styles, numbering
- 🔄 **Round-trip Editing** - DOCX → HTML → DOCX with formatting preservation
- 📊 **AI-Ready JSON Export** - Structured layout data for ML/NLP workflows
- 🖨️ **PDF Rendering** - Python (ReportLab) or high-performance Rust backend
- 🎨 **HTML Export** - Static or editable HTML output
- 📝 **Placeholder Engine** - 20+ placeholder types for document automation
- 🔀 **Document Merging** - Combine documents with OPC relationship handling

## Installation

```bash
pip install docquill
```

For high-performance PDF rendering with Rust:

```bash
pip install "docquill[rust]"
```
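
To confirm whether the optional backend is actually installed before relying on it, the standard library can query installed distributions. The following is a minimal sketch that assumes only the distribution name `docquill-rust` from the package metadata; `installed_version` is a hypothetical helper and not part of DocQuill:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# "docquill-rust" is the distribution pulled in by the [rust] extra.
rust_version = installed_version("docquill-rust")
print("Rust backend:", rust_version or "not installed")
```

This checks only that the distribution is present; how DocQuill selects between the Python and Rust backends at runtime is not covered here.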

## Quick Start

```python
from docquill import Document

# Open and process a document
doc = Document.open("document.docx")

# Export to PDF
doc.to_pdf("output.pdf")

# Export to HTML
doc.to_html("output.html")

# Get AI-ready JSON layout
layout = doc.pipeline()
json_data = layout.to_json()

# Fill placeholders
doc.fill_placeholders({
    "company_name": "Acme Corp",
    "date": "2024-01-15"
})
doc.save("filled.docx")
```
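
The same calls can be applied to a whole folder. The sketch below uses only `Document.open` and `to_pdf` from the Quick Start and assumes both accept string paths; `plan_outputs` and `convert_all` are hypothetical helper names introduced for illustration, not part of DocQuill:

```python
from pathlib import Path
from typing import List, Tuple


def plan_outputs(src_dir: Path, out_dir: Path) -> List[Tuple[Path, Path]]:
    """Pair every .docx in src_dir with a same-named .pdf path in out_dir."""
    return [
        (src, out_dir / (src.stem + ".pdf"))
        for src in sorted(src_dir.glob("*.docx"))
        if not src.name.startswith("~$")  # skip Word's temporary lock files
    ]


def convert_all(src_dir: Path, out_dir: Path) -> int:
    """Convert each planned pair to PDF and return the number of files processed."""
    # Imported lazily so that plan_outputs can be used without DocQuill installed.
    from docquill import Document

    out_dir.mkdir(parents=True, exist_ok=True)
    pairs = plan_outputs(src_dir, out_dir)
    for src, dst in pairs:
        # Assumption: string paths are accepted, as in the Quick Start above.
        Document.open(str(src)).to_pdf(str(dst))
    return len(pairs)
```

Separating the planning step from the conversion makes it possible to preview which files would be written before any rendering happens.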

## Documentation

See the [full documentation](https://github.com/AddNap/DocQuill/tree/main/docs) for:

- [Getting Started](https://github.com/AddNap/DocQuill/blob/main/docs/getting-started.md)
- [API Reference](https://github.com/AddNap/DocQuill/blob/main/docs/api-reference.md)
- [Architecture](https://github.com/AddNap/DocQuill/blob/main/docs/architecture.md)
- [AI Integration](https://github.com/AddNap/DocQuill/blob/main/docs/ai-integration.md)

## License

Apache License 2.0

