Metadata-Version: 2.1
Name: doxstractor
Version: 0.0.5
Summary: Doxstractor extracts structured data from text in an easily configurable way.
Author: Jannes Klaas
License: Apache Software License (Apache 2.0)
Project-URL: Homepage, https://github.com/JannesKlaas/doxstractor
Project-URL: Issues, https://github.com/JannesKlaas/doxstractor/issues
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE.md
Requires-Dist: anthropic==0.25.1
Requires-Dist: transformers==4.39.3

# Doxstractor 📄➡️📊

Doxstractor is a modular library for extracting structured data from documents using LLMs.

There are many situations where you want to extract data such as numbers, text, or categories from a set of documents. Doxstractor was created with M&A due diligence in mind. When a company is sold, the prospective buyer receives a data room containing everything from key employment contracts to real estate leases.

People then need to go through all these documents and extract key information, such as "How many stock options have been granted?" or "Does this lease contain a break clause?". This data is first compiled into spreadsheets and then written up in the due diligence report. The work is tedious for the people doing it and expensive for the people buying the report.

## Tutorial

Install using pip:
`pip install doxstractor`

