Metadata-Version: 2.4
Name: kita
Version: 1.2.0
Summary: Official Python SDK for Kita Document Processing API
Author-email: Kita Team <support@usekita.com>
Maintainer-email: Kita Team <support@usekita.com>
License: MIT
Project-URL: Homepage, https://usekita.com
Project-URL: Documentation, https://docs.usekita.com
Project-URL: Repository, https://github.com/usekita/kita-python-sdk
Project-URL: Issues, https://github.com/usekita/kita-python-sdk/issues
Project-URL: Changelog, https://github.com/usekita/kita-python-sdk/blob/main/CHANGELOG.md
Keywords: kita,document-processing,ocr,bank-statement,payslip,pdf,api,sdk
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Office/Business :: Financial
Classifier: Topic :: Scientific/Engineering :: Image Recognition
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: requests>=2.25.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: black>=22.0.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Requires-Dist: types-requests>=2.28.0; extra == "dev"
Dynamic: requires-python

# Kita Python SDK

The official Python SDK for the Kita Document Processing API.

## Installation

```bash
pip install kita
```

Or install from source:

```bash
cd sdk/python
pip install -e .
```

## Quick Start

```python
from kita import KitaClient

# Initialize client with your API key
client = KitaClient(api_key="kita_prod_...")

# Process a single document
result = client.process("statement.pdf", "bank_statement")

# Access parsed data
print(result.metadata)
print(result.transactions)
```

## Configuration

### API Key

Get your API key from https://api.usekita.com/api-keys.html

You can provide the API key in three ways:

```python
# 1. Pass directly to client
client = KitaClient(api_key="kita_prod_...")

# 2. Set environment variable
# export KITA_API_KEY=kita_prod_...
client = KitaClient()

# 3. Use .env file (with python-dotenv)
from dotenv import load_dotenv
load_dotenv()
client = KitaClient()
```

### Base URL

The SDK defaults to production (`https://api.usekita.com`). For local development:

```python
# Override with parameter
client = KitaClient(api_key="...", base_url="http://localhost:8080")

# Or set environment variable
# export KITA_API_URL=http://localhost:8080
```
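
The lookup order described above can be sketched as a small standalone helper. This is only an illustration of the idea: `resolve_kita_settings` is a hypothetical name, and the precedence shown (explicit argument, then environment variable, then the production default) is an assumption about how the SDK behaves rather than its actual code.

```python
import os

DEFAULT_BASE_URL = "https://api.usekita.com"


def resolve_kita_settings(api_key=None, base_url=None, env=None):
    """Return (api_key, base_url), preferring explicit arguments over environment values.

    Hypothetical helper; the precedence is an assumption, not the SDK's documented logic.
    """
    env = os.environ if env is None else env
    key = api_key or env.get("KITA_API_KEY")
    if not key:
        raise ValueError("No API key: pass api_key or set KITA_API_KEY")
    # Strip a trailing slash so paths can be appended consistently.
    url = (base_url or env.get("KITA_API_URL") or DEFAULT_BASE_URL).rstrip("/")
    return key, url
```

The resolved values could then be passed on as `KitaClient(api_key=key, base_url=url)`.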

## Usage

### Process Single Document

```python
from kita import KitaClient

client = KitaClient(api_key="kita_prod_...")

# Process and wait for result
result = client.process("document.pdf", "bank_statement")

# Access the data
print(result.metadata)           # Account info, dates, etc.
print(result.transactions)       # List of transactions
print(result.signals)            # Financial signals
print(result.raw)                # Full raw response
```

### Document Types

Supported document types (case-insensitive):

| Type | Description |
|------|-------------|
| `bank_statement` | Bank account statements |
| `payslip` | Salary/pay stubs |
| `bill` | Utility bills |
| `bir_2303` | BIR Form 2303 |
| `bir_2307` | BIR Form 2307 |
| `secretarys_certificate` | Secretary's Certificate |
| `credit_report` | Credit reports (CIBI, etc.) |

Your API key may restrict which document types you can process based on your organization's plan.

```python
# All of these work: matching ignores case, and a space is accepted in place of an underscore
result = client.process("doc.pdf", "bank_statement")
result = client.process("doc.pdf", "BANK_STATEMENT")
result = client.process("doc.pdf", "Bank Statement")
```
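
If you want to validate a document type locally before making a request, the matching shown above could be approximated as follows. This is a sketch under assumptions: `normalize_document_type` is a hypothetical helper, and the SDK or server may normalize differently.

```python
import re

# Document types listed in the table above.
SUPPORTED_TYPES = {
    "bank_statement",
    "payslip",
    "bill",
    "bir_2303",
    "bir_2307",
    "secretarys_certificate",
    "credit_report",
}


def normalize_document_type(value: str) -> str:
    """Lowercase the value and turn runs of whitespace into underscores (assumed rule)."""
    key = re.sub(r"\s+", "_", value.strip()).lower()
    if key not in SUPPORTED_TYPES:
        raise ValueError(f"Unsupported document type: {value!r}")
    return key
```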

### Accessing Full Processed Data

The complete processed data including transaction tables, metrics, and fraud detection is in `result.raw['result']`:

```python
result = client.process("statement.pdf", "bank_statement")

# Full API response
full_json = result.raw

# Processed data with tables, metrics, fraud detection
processed = result.raw['result']

# Transaction tables (rows of transaction data)
tables = processed['tables']
transactions = tables[0]['data']  # List of transaction dicts
for tx in transactions[:5]:
    print(f"{tx.get('Date and Time')} | {tx.get('Description')} | {tx.get('Credit') or tx.get('Debit')}")

# Financial metrics (aggregated totals)
metrics = processed['metrics']
print(f"Inflow: {metrics['total_inflow']}")
print(f"Outflow: {metrics['total_outflow']}")
print(f"Transactions: {metrics['transaction_count']}")

# Category breakdown
categories = processed['category_metrics']
# {'financial': {'count': 14, 'total': 15000}, 'food': {'count': 2, 'total': 500}, ...}

# Fraud detection
fraud = processed['fraud_check']
print(f"Suspicious: {fraud['is_suspicious']}")

# Save to file
import json
with open('result.json', 'w') as f:
    json.dump(full_json, f, indent=2)
```
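
The transaction rows can also be written to CSV with the standard library. The sketch below assumes the `result.raw` layout shown above (`result` → `tables` → `data`); `transactions_to_csv` is a hypothetical helper, not an SDK method, and it tolerates rows whose keys differ.

```python
import csv


def transactions_to_csv(raw, csv_path):
    """Write the first transaction table of a raw response to CSV; return the row count."""
    tables = raw.get("result", {}).get("tables", [])
    rows = tables[0].get("data", []) if tables else []
    if not rows:
        return 0
    # Collect column names in first-seen order, since rows may not share identical keys.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)  # missing keys are written as empty cells
    return len(rows)
```

Usage would look like `transactions_to_csv(result.raw, "transactions.csv")`.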

### Process from URL

Process a document from a public URL (the file is downloaded server-side):

```python
result = client.process_url(
    "https://example.com/statement.pdf",
    "bank_statement"
)
print(result.metadata)
```

### Download Custom Excel Export

Download a processed document as a custom Excel export:

```python
result = client.process("statement.pdf", "bank_statement")
client.download_export(result.document_id, "report.xlsx")

# Credit report specific export
result = client.process("report.pdf", "credit_report")
client.download_export(result.document_id, "credit.xlsx", export_type="credit_report")
```

### Async Processing

```python
# Submit without waiting for completion
result = client.process("large_doc.pdf", "bank_statement", wait=False)
print(result.raw)  # Contains documentId
document_id = result.raw["documentId"]

# Check status later
doc = client.get_document(document_id)
print(doc.status)  # pending, processing, completed, failed
```
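
Checking the status later usually means polling until the document reaches a terminal state. A minimal sketch, assuming only that `client.get_document` returns an object with a `status` attribute as shown above; `wait_for_document` is a hypothetical helper, not part of the SDK:

```python
import time


def wait_for_document(client, document_id, poll_interval=2.0, timeout=600.0, sleep=time.sleep):
    """Poll get_document until the status is 'completed' or 'failed', or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        doc = client.get_document(document_id)
        if doc.status in ("completed", "failed"):
            return doc
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Document {document_id} not finished after {timeout} seconds")
        sleep(poll_interval)
```

The `sleep` parameter exists only so the loop can be tested without real delays.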

### Batch Processing

Process all documents in a folder:

```python
batch = client.batch_process("/path/to/folder", "payslip")

# Wait for all to complete and get results
results = batch.results()  # {filepath: DocumentResult}

for filepath, result in results.items():
    print(result.metadata)
    result.save_json(f"{filepath}_output.json")

# Check progress
print(batch.status)  # {'total': 10, 'completed': 8, 'failed': 1, 'pending': 1}
```

Options:

```python
batch = client.batch_process(
    "/folder",
    "bank_statement",
    extensions=['.pdf', '.png', '.jpg'],  # File types to process
    recursive=True,                        # Search subdirectories
    max_workers=5                          # Parallel upload threads
)
```
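
To preview which files a given `extensions` and `recursive` combination would likely pick up before uploading anything, a local approximation can help. `find_documents` is a hypothetical helper, and the SDK's own matching (for example, whether extensions are compared case-insensitively) may differ.

```python
from pathlib import Path


def find_documents(folder, extensions=(".pdf", ".png", ".jpg", ".jpeg"), recursive=False):
    """Return sorted file paths under `folder` whose suffix is in `extensions`."""
    root = Path(folder)
    wanted = {ext.lower() for ext in extensions}
    # rglob walks subdirectories; iterdir looks at the top level only.
    candidates = root.rglob("*") if recursive else root.iterdir()
    return sorted(p for p in candidates if p.is_file() and p.suffix.lower() in wanted)
```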

### Batch Process URLs

Process multiple documents from URLs:

```python
results = client.batch_process_urls([
    {"file_url": "https://example.com/stmt1.pdf", "document_type": "bank_statement"},
    {"file_url": "https://example.com/stmt2.pdf", "document_type": "bank_statement"},
])
for doc in results['documents']:
    if doc['status'] == 'completed':
        print(doc['result']['metadata'])
```
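
When some documents in a batch do not complete, it can be convenient to separate them for a retry or a log entry. The sketch below assumes the response shape used above (a `documents` list whose items carry a `status` key); `split_by_status` is a hypothetical helper.

```python
def split_by_status(batch_response):
    """Return (completed, other) lists of document dicts from a batch response."""
    completed, other = [], []
    for doc in batch_response.get("documents", []):
        if doc.get("status") == "completed":
            completed.append(doc)
        else:
            other.append(doc)
    return completed, other
```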

### Error Handling

```python
from kita import (
    KitaClient,
    KitaError,
    KitaAPIError,
    KitaAuthenticationError,
    KitaRateLimitError
)

client = KitaClient(api_key="kita_prod_...")

try:
    result = client.process("document.pdf", "payslip")
except KitaAuthenticationError:
    print("Invalid API key")
except KitaRateLimitError as e:
    print(f"Rate limited. Retry after {e.retry_after} seconds")
except KitaAPIError as e:
    print(f"API Error {e.status_code}: {e.message}")
except KitaError as e:
    print(f"SDK Error: {e}")
```
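
Rate-limit errors are often worth retrying rather than only reporting. The following sketch retries a callable and honours the exception's `retry_after` attribute when present. `call_with_retry` is a hypothetical helper, and the exception class is passed in so the example does not depend on the SDK being importable.

```python
import time


def call_with_retry(func, rate_limit_error, max_attempts=3, default_delay=1.0, sleep=time.sleep):
    """Call func(), retrying on rate_limit_error up to max_attempts times in total."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except rate_limit_error as exc:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Prefer the server-suggested delay; fall back to a fixed default.
            delay = getattr(exc, "retry_after", None) or default_delay
            sleep(delay)
```

With the SDK, this might be used as `call_with_retry(lambda: client.process("document.pdf", "payslip"), KitaRateLimitError)`.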

### List Documents

```python
docs = client.list_documents(
    limit=50,
    offset=0,
    status='completed',
    document_type='bank_statement'
)
print(docs['documents'])
```
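
To walk through every document rather than a single page, `limit` and `offset` can be advanced in a loop. This sketch assumes that a page shorter than `limit` marks the end of the listing, which the source does not state; `iter_documents` is a hypothetical helper.

```python
def iter_documents(client, page_size=50, **filters):
    """Yield documents page by page using list_documents(limit=..., offset=...)."""
    offset = 0
    while True:
        page = client.list_documents(limit=page_size, offset=offset, **filters)
        docs = page.get("documents", [])
        yield from docs
        if len(docs) < page_size:
            break  # assumed end-of-listing signal
        offset += page_size
```

For example, `for doc in iter_documents(client, status='completed'): ...` would iterate over all completed documents.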

### Convenience Function

For quick one-off processing:

```python
from kita import process

# Uses KITA_API_KEY environment variable
result = process("document.pdf", "bank_statement")
```

## API Reference

### KitaClient

```python
client = KitaClient(
    api_key: str = None,      # API key (or set KITA_API_KEY env var)
    base_url: str = None,     # API URL (default: https://api.usekita.com)
    timeout: int = 60         # Request timeout in seconds
)
```

### Methods

#### `process(file_path, document_type, ...)`

Process a single document.

```python
result = client.process(
    file_path: str,           # Path to document
    document_type: str,       # Type of document
    wait: bool = True,        # Wait for completion
    poll_interval: int = 2,   # Seconds between status checks
    timeout: int = 600,       # Max wait time in seconds
    password: str = None      # PDF password if encrypted
)
```

Returns: `DocumentResult`

#### `process_url(file_url, document_type, ...)`

Process a document from a URL.

```python
result = client.process_url(
    file_url: str,            # Public URL to document
    document_type: str,       # Type of document
    filename: str = None,     # Optional filename override
    wait: bool = True,        # Wait for completion
    poll_interval: int = 3,   # Seconds between status checks
    timeout: int = 600        # Max wait time in seconds
)
```

Returns: `DocumentResult`

#### `download_export(document_id, output_path, export_type)`

Download a processed document as an Excel export.

```python
client.download_export(
    document_id: str,         # Document ID from result.document_id
    output_path: str,         # Path to save .xlsx file
    export_type: str = 'custom'  # 'custom' or 'credit_report'
)
```

Returns: output file path

#### `batch_process(folder_path, document_type, ...)`

Process multiple documents from a folder.

```python
batch = client.batch_process(
    folder_path: str,         # Path to folder
    document_type: str,       # Type of documents
    extensions: list = None,  # File extensions (default: ['.pdf', '.png', '.jpg', '.jpeg'])
    recursive: bool = False,  # Search subdirectories
    max_workers: int = 5      # Parallel upload threads
)
```

Returns: `Batch`

#### `batch_process_urls(documents, ...)`

Process multiple documents from URLs.

```python
results = client.batch_process_urls(
    documents: list,          # List of {file_url, document_type, filename?}
    wait: bool = True,        # Wait for completion
    poll_interval: int = 3,   # Seconds between status checks
    timeout: int = 600        # Max wait time
)
```

Returns: `dict` with batch results

#### `get_document(document_id)`

Get a document by ID. The returned `status` may be `pending`, `processing`, `completed`, or `failed`.

Returns: `DocumentResult`

#### `list_documents(limit, offset, status, document_type)`

List processed documents.

Returns: `dict` with `documents` list

### DocumentResult

```python
result.status          # 'completed', 'failed', etc.
result.document_id     # Document ID (for exports)
result.document_type   # 'bank_statement', etc.
result.metadata        # Dict with account info, dates, etc.
result.transactions    # List of transactions
result.signals         # Financial signals
result.raw             # Full raw response dict
result.to_dict()       # Convert to dictionary
result.to_json()       # Formatted JSON string
result.save_json(path) # Save to JSON file
```

### Batch

```python
batch.id               # Batch ID
batch.status           # Dict with total/completed/failed/pending counts
batch.results()        # Dict mapping filepath -> DocumentResult
```

## Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `KITA_API_KEY` | API key | (none; required unless `api_key` is passed to `KitaClient`) |
| `KITA_API_URL` | API base URL | `https://api.usekita.com` |

## Development

```bash
# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Format code
black kita/
```

## License

MIT License

## Support

- Documentation: https://docs.usekita.com
- Email: support@usekita.com
