Metadata-Version: 2.4
Name: open-xtract
Version: 0.1.2
Summary: Open-source framework that extracts structured data from unstructured data.
Project-URL: Homepage, https://mellow-artificial-intelligence.github.io/open-xtract/
Author-email: Cole McIntosh <cole@staymellow.ai>
License: MIT
License-File: LICENSE
Requires-Python: >=3.12
Requires-Dist: aiofiles>=24.0.0
Requires-Dist: httpx>=0.27.0
Requires-Dist: langchain-anthropic>=0.3.0
Requires-Dist: langchain-community>=0.3.0
Requires-Dist: langchain-google-genai>=2.1.9
Requires-Dist: langchain-openai>=0.3.29
Requires-Dist: langchain>=0.3.0
Requires-Dist: langgraph>=0.2.0
Requires-Dist: pillow>=11.3.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: pymupdf>=1.26.3
Requires-Dist: pytest>=8.4.1
Requires-Dist: python-dotenv>=1.0.0
Requires-Dist: tiktoken>=0.8.0
Requires-Dist: typing-extensions>=4.8.0
Provides-Extra: dev
Requires-Dist: black>=24.8.0; extra == 'dev'
Requires-Dist: mypy>=1.11.0; extra == 'dev'
Requires-Dist: pre-commit>=3.7.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.24.0; extra == 'dev'
Requires-Dist: pytest-mock>=3.14.0; extra == 'dev'
Requires-Dist: pytest>=8.0.0; extra == 'dev'
Requires-Dist: ruff>=0.6.8; extra == 'dev'
Provides-Extra: vision
Requires-Dist: pymupdf>=1.24.0; extra == 'vision'
Description-Content-Type: text/markdown

# OpenXtract

**Turn documents into structured data**

Open-source toolkit for extracting clean, structured data from text, images, and PDFs.

- [GitHub](https://github.com/Mellow-Artificial-Intelligence/open-xtract)
- [PyPI](https://pypi.org/project/open-xtract/)

## Installation

```bash
pip install open-xtract
# or
uv add open-xtract
```

## Usage

Pass the model as a string of the form `<provider>:<model_string>`.

For example: `"openai:gpt-5-nano"` or `"xai:grok-4"`.

```python
from pydantic import BaseModel
from open_xtract import OpenXtract

class InvoiceData(BaseModel):
    invoice_number: str
    date: str
    total_amount: float
    vendor: str

ox = OpenXtract(model="openai:gpt-5-nano")  # or any model

# Extract from text (str)
result = ox.extract("Invoice INV-1001: total $123.45 on 2025-03-01 from ACME", InvoiceData)
print(result)

# Extract from image (bytes)
with open("/path/to/receipt.png", "rb") as f:
    img_bytes = f.read()
result = ox.extract(img_bytes, InvoiceData)
print(result)

# Extract from PDF (bytes) — each page is rendered to an image internally
with open("/path/to/invoice.pdf", "rb") as f:
    pdf_bytes = f.read()
result = ox.extract(pdf_bytes, InvoiceData)
print(result)
```

## Advanced Features

### Model Configuration

```python
# Use any OpenAI-compatible model
ox = OpenXtract(model="openrouter:qwen/qwen3-max")
ox = OpenXtract(model="xai:grok-4")
```
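
To make the `<provider>:<model_string>` convention concrete, the sketch below splits a model string on its first colon, so model names that contain slashes (such as `qwen/qwen3-max`) stay intact. `split_model_string` is a hypothetical helper for illustration; how OpenXtract parses the string internally is not documented here.

```python
def split_model_string(model: str) -> tuple[str, str]:
    """Split '<provider>:<model_string>' into (provider, model_string).

    Hypothetical helper for illustration only; not part of OpenXtract.
    """
    # partition() splits on the first colon only.
    provider, sep, model_name = model.partition(":")
    if not sep or not provider or not model_name:
        raise ValueError(f"expected '<provider>:<model_string>', got {model!r}")
    return provider, model_name


print(split_model_string("openai:gpt-5-nano"))          # ('openai', 'gpt-5-nano')
print(split_model_string("openrouter:qwen/qwen3-max"))  # ('openrouter', 'qwen/qwen3-max')
```

Validating the string up front gives a clearer error than a failed provider lookup when the `provider:` prefix is missing.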

## Features

- Extract structured data from text, images, and PDFs
- Model-agnostic (works with any OpenAI-compatible API)
- Simple, clean API

## Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md) for contribution guidelines.

## License

MIT - see [LICENSE](LICENSE).

---

Built with ❤️ by [Mellow AI](https://github.com/Mellow-Artificial-Intelligence)
