Metadata-Version: 2.4
Name: docuweave
Version: 0.1.0
Summary: Layout-aware document parser for structured LLM-ready JSON
Author-email: venkateswaraRao <mrvenky18@gmail.com>
License: MIT
Keywords: pdf,llm,rag,document parsing,nlp
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pydantic>=2.0
Requires-Dist: pymupdf>=1.23
Requires-Dist: tiktoken>=0.5
Dynamic: license-file

# DocuWeave

Layout-aware document parser that converts PDFs into structured, hierarchical, LLM-ready JSON.

## Features

- Deterministic layout-based parsing
- Hierarchical section detection
- Token-aware smart chunking
- Embedding-ready JSON export
- Optimized for RAG pipelines
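
To make "token-aware chunking" concrete, here is a minimal sketch of one common approach: greedily packing paragraphs into chunks under a token budget. It is an illustration only, not DocuWeave's actual implementation. The names `chunk_paragraphs` and `whitespace_token_count` are hypothetical, and the default counter simply counts whitespace-separated words as a stand-in for a real tokenizer.

```python
from typing import Callable, Iterable, List


def whitespace_token_count(text: str) -> int:
    """Stand-in token counter (assumption): counts whitespace-separated words."""
    return len(text.split())


def chunk_paragraphs(
    paragraphs: Iterable[str],
    max_tokens: int,
    count_tokens: Callable[[str], int] = whitespace_token_count,
) -> List[str]:
    """Greedily pack paragraphs into chunks of at most max_tokens tokens.

    Tokens are counted per paragraph and summed. A single paragraph that
    exceeds max_tokens on its own is kept whole as an oversized chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    current_tokens = 0

    for paragraph in paragraphs:
        n = count_tokens(paragraph)
        # Flush the current chunk if adding this paragraph would exceed the budget.
        if current and current_tokens + n > max_tokens:
            chunks.append("\n\n".join(current))
            current, current_tokens = [], 0
        current.append(paragraph)
        current_tokens += n

    # Emit whatever is left over as the final chunk.
    if current:
        chunks.append("\n\n".join(current))
    return chunks


if __name__ == "__main__":
    for chunk in chunk_paragraphs(["a b c", "d e", "f g h i", "j"], max_tokens=5):
        print(repr(chunk))
```

Since `tiktoken` is already a dependency, a real tokenizer could presumably be swapped in by passing something like `lambda s: len(enc.encode(s))` as `count_tokens`, where `enc = tiktoken.get_encoding("cl100k_base")`. With a real tokenizer the per-paragraph sum is only an approximation of the joined chunk's token count.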

## Installation

```bash
pip install docuweave
```

## Usage

```python
from docuweave import parse

doc = parse("sample.pdf")

doc.to_chunks(max_tokens=500)
doc.save_json("output.json")
```

## Output

```json
{
  "metadata": {...},
  "sections": [...],
  "chunks": [...]
}
```

We’ll improve this later.

---
