Metadata-Version: 2.4
Name: reducto-cli
Version: 0.1.1
Summary: CLI for Reducto document processing
Requires-Python: >=3.11
Description-Content-Type: text/markdown
Requires-Dist: async-typer>=0.1.10
Requires-Dist: reductoai>=0.13.0
Requires-Dist: typer>=0.20.0

# Reducto CLI

Welcome to the Reducto CLI. This tool lets you parse documents, extract structured data, and modify documents using Reducto’s platform.

## Installation

Install the Reducto CLI using pip:

```bash
pip install reducto-cli
```

## Authentication

Before using the CLI, authenticate with your Reducto API key:

```bash
reducto login
```

## Quick Examples

- Parse a single file: `reducto parse path/to/document.pdf`
- Parse an entire folder: `reducto parse ./docs`
- Extract with a schema (path or inline JSON): `reducto extract ./docs/invoice.pdf -s schemas/invoice.json`
- Edit a single file: `reducto edit path/to/document.pdf --instructions "Your editing instructions here"`

Parse output is written to `<filename>.parse.md`. Extraction reuses an existing parse when one is available and saves `<filename>.extract.json`, which contains only the extracted payload.

## Supported File Types

- PDF: `.pdf`
- Images: `.png`, `.jpg`, `.jpeg`
- Office documents: `.doc`, `.docx`, `.ppt`, `.pptx`
- Spreadsheets: `.xls`, `.xlsx`

Commands accept either a file or a directory. Directories are scanned recursively, and only the supported file types listed above are processed.
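To preview which files a directory run would pick up, the scanning rule above can be approximated in a few lines of Python. This is a minimal sketch, not the CLI's own implementation: the helper name `find_supported_files` is hypothetical, and matching extensions case-insensitively is an assumption.

```python
from pathlib import Path

# Extensions copied from the "Supported File Types" list above.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".png", ".jpg", ".jpeg",
    ".doc", ".docx", ".ppt", ".pptx",
    ".xls", ".xlsx",
}


def find_supported_files(root: str) -> list[Path]:
    """Return supported files under `root`, or `root` itself if it is a supported file."""
    path = Path(root)
    if path.is_file():
        return [path] if path.suffix.lower() in SUPPORTED_EXTENSIONS else []
    # rglob("*") walks the directory tree recursively.
    # Assumption: extensions are compared case-insensitively.
    return sorted(
        p for p in path.rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

Calling `find_supported_files("./docs")` before `reducto parse ./docs` gives a rough idea of how many documents will be submitted.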

## Parse Command Options

The `parse` command supports the following flags to customize parsing behavior.

### Flags

| Flag | Description |
|------|-------------|
| `--agentic` | Enables all agentic options for tables, text, and figures. Improves accuracy at the cost of higher latency. Use it when poor document quality or complex layouts call for enhanced processing. |
| `--change-tracking` | Enables change tracking during parsing. Returns `<s>` tags around strikethrough text, `<u>` tags around underlined text, and `<change>` tags around colored adjacent strikethrough and underlined text. Useful for documents with revision history. |
| `--highlights` | Includes highlighted text in the parsed output. |
| `--hyperlinks` | Includes embedded hyperlinks in the parsed output. |
| `--comments` | Includes document comments in the parsed output. |

### Examples

```bash
# Basic parse
reducto parse document.pdf

# Parse with maximum accuracy (slower)
reducto parse document.pdf --agentic

# Parse a contract with change tracking
reducto parse contract.pdf --change-tracking

# Parse with hyperlinks, comments, and highlights
reducto parse document.pdf --hyperlinks --comments --highlights

# Combine flags as needed
reducto parse legal_doc.pdf --agentic --change-tracking --comments
```
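Output produced with `--change-tracking` can be post-processed to list the tracked spans. The sketch below is illustrative only: it assumes the tags are plain, paired, and carry no attributes (`<s>...</s>`, `<u>...</u>`, `<change>...</change>`), and both the helper name `find_tracked_spans` and the sample sentence are hypothetical. If tags are nested, the inner tags are returned as part of the outer span's text.

```python
import re

# Tag names taken from the --change-tracking description above.
# Assumption: tags are well-formed and have no attributes.
TAG_PATTERN = re.compile(r"<(s|u|change)>(.*?)</\1>", re.DOTALL)


def find_tracked_spans(markdown: str) -> list[tuple[str, str]]:
    """Return (tag, text) pairs for each tracked span, in document order."""
    return [(m.group(1), m.group(2)) for m in TAG_PATTERN.finditer(markdown)]


# Hypothetical sample text, not real CLI output.
sample = "Payment is due in <s>30</s> <u>45</u> days."
print(find_tracked_spans(sample))  # → [('s', '30'), ('u', '45')]
```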

## Extract Command Overview

The `extract` command pulls specific, structured data from your documents according to a JSON Schema that you provide. It automates information extraction by mapping complex or unstructured documents (such as invoices, receipts, reports, forms, contracts, financial statements, or tables) into machine-readable JSON.

Common use cases include:
- Extracting line items, totals, vendor/customer info from invoices and receipts
- Pulling key fields, tables, or sections from contracts or legal documents
- Capturing form field values from scanned forms or applications
- Summarizing structured results from reports, statements, or medical records

Providing a schema keeps the output consistent across documents: the extracted JSON follows the structure you define. This is especially valuable for automating downstream processing pipelines, integrating with databases, or feeding data to other tools.

You can perform extraction on individual files or batches (folders), and extracted payloads are saved as `<filename>.extract.json`.

## Schema Guidelines for `reducto extract`

- Schemas must be valid JSON Schema documents.
- The top-level schema **must** be an object (`{"type": "object", ...}`); a top-level string or array schema is not permitted.
- Provide explicit property definitions so the extractor can map fields deterministically.
- Schemas may be supplied as file paths or inline JSON strings.
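A schema can be checked against these guidelines before it is passed to `reducto extract`. The following sketch uses only the standard library; the helper name `schema_problems` is hypothetical, and it checks the guidelines listed above rather than full JSON Schema validity.

```python
import json


def schema_problems(schema_text: str) -> list[str]:
    """Return a list of guideline violations; an empty list means none were found."""
    try:
        schema = json.loads(schema_text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    # The top-level schema must be an object schema.
    if not isinstance(schema, dict) or schema.get("type") != "object":
        return ['top-level schema must be an object ({"type": "object", ...})']
    problems = []
    # Explicit property definitions are recommended by the guidelines.
    if not schema.get("properties"):
        problems.append("no explicit property definitions")
    return problems
```

For a schema stored in a file, read the file first and pass its text to the function.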

### Example Schema

```json
{
  "type": "object",
  "properties": {
    "items": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "article_number": {"type": "string"},
          "description": {"type": "string"},
          "quantity": {"type": "number"},
          "unit_price": {"type": "number"},
          "total_price": {"type": "number"}
        },
        "required": [
          "article_number",
          "description",
          "quantity",
          "unit_price",
          "total_price"
        ]
      }
    }
  },
  "required": ["items"]
}
```
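A payload that follows this schema is straightforward to consume downstream. As an illustration, the sketch below flags line items whose `total_price` does not match `quantity * unit_price`. It assumes the saved `.extract.json` payload has the same shape as the schema; the helper name `line_item_mismatches` and the sample data are hypothetical.

```python
def line_item_mismatches(payload: dict, tolerance: float = 0.01) -> list[str]:
    """Return article numbers whose total_price differs from quantity * unit_price."""
    mismatches = []
    for item in payload["items"]:
        expected = item["quantity"] * item["unit_price"]
        if abs(expected - item["total_price"]) > tolerance:
            mismatches.append(item["article_number"])
    return mismatches


# Hypothetical payload shaped like the example schema; not real CLI output.
sample = {
    "items": [
        {"article_number": "A-1", "description": "Widget",
         "quantity": 2, "unit_price": 9.99, "total_price": 19.98},
        {"article_number": "B-2", "description": "Gadget",
         "quantity": 3, "unit_price": 5.0, "total_price": 16.0},
    ]
}
print(line_item_mismatches(sample))  # → ['B-2']
```

In practice the payload would be loaded from the saved file with `json.load` instead of being defined inline.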

Parses are reused across multiple extractions: the CLI automatically detects an existing `.parse.md` file, recovers the job ID recorded in it, and uses a `jobid://<id>` reference to speed up the extraction job.

## Editing Documents with `reducto edit`

The `edit` command allows you to modify documents using natural language instructions. It uploads the document, applies the specified edits, and downloads the resulting file.

### Usage

```bash
reducto edit path/to/document.pdf --instructions "Your editing instructions here"
reducto edit path/to/document.pdf -i "Your editing instructions here"
```

### Parameters

| Parameter | Required | Description |
|-----------|----------|-------------|
| `path` | Yes | Path to a file or directory. Directories are scanned recursively for supported file types. |
| `--instructions`, `-i` | Yes | Natural language instructions describing the edits to apply. |

### Output

Edited files are saved alongside the original with the naming pattern `<filename>.edited.<extension>`. For example:
- `invoice.pdf` → `invoice.edited.pdf`
- `report.docx` → `report.edited.docx`
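Scripts that need to locate the edited file can derive its path from the original. This is a minimal sketch of the naming pattern shown above; the helper name `edited_path` is hypothetical and not part of the CLI.

```python
from pathlib import Path


def edited_path(original: str) -> Path:
    """Map `<filename>.<extension>` to `<filename>.edited.<extension>` in the same directory."""
    p = Path(original)
    # p.stem is the name without its final extension; p.suffix includes the dot.
    return p.with_name(f"{p.stem}.edited{p.suffix}")


print(edited_path("invoice.pdf"))  # → invoice.edited.pdf
```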

### Examples

```bash
reducto edit contract.pdf -i "Fill in the client name as 'Acme Corporation' and set the contract date to January 15, 2024"

reducto edit document.pdf -i "Fill out the form with: Name: John Doe, Email: john@example.com, Select 'Yes' for newsletter subscription"
```

### Effective Instructions

For best results with the `--instructions` flag:
- Be specific about what content to modify and how
- Reference specific elements (headers, footers, tables, specific text)
- Describe the desired outcome clearly
- For bulk operations on directories, ensure instructions apply uniformly to all file types

