Metadata-Version: 2.4
Name: petey
Version: 0.1.1
Summary: Petey — The Easy PDF Extractor
License-Expression: MIT
Project-URL: Homepage, https://github.com/user/petey
Project-URL: Repository, https://github.com/user/petey
Keywords: pdf,extraction,llm,openai,anthropic,structured-data
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Text Processing
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: PyMuPDF
Requires-Dist: openai
Requires-Dist: anthropic
Requires-Dist: instructor
Requires-Dist: pydantic
Requires-Dist: pyyaml
Requires-Dist: python-dotenv
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pytest-asyncio; extra == "dev"
Requires-Dist: build; extra == "dev"
Requires-Dist: twine; extra == "dev"
Dynamic: license-file

# Petey

The Easy PDF Extractor.

```bash
pip install petey
```

## Setup

Add your API key to a `.env` file:

```
OPENAI_API_KEY=sk-...
```

Or for Anthropic:

```
ANTHROPIC_API_KEY=sk-ant-...
```

## Usage

```bash
petey extract --schema schema.yaml ./pdfs/ -o results.csv
```

Options: `--model/-m` (default: `gpt-4.1-mini`), `--concurrency/-c` (default: 10), `--format/-f` (csv/json/jsonl), `--output/-o`, `--instructions/-i`.

## Schema

```yaml
name: Invoice
fields:
  vendor:
    type: string
    description: Company name on the invoice
  amount:
    type: number
    description: Total amount due
  date:
    type: date
    description: Invoice date
  status:
    type: enum
    values: [Paid, Unpaid, Overdue]
    description: Payment status
```

Field types: `string`, `number`, `date`, `enum` (with or without `values`), `array` (with nested `fields`).

All fields are nullable — the LLM returns `null` for anything it can't find.

Set `record_type: array` at the top level for table extraction (multiple records per document).

Add `instructions` at the top level to append guidance to the system prompt.
