Metadata-Version: 2.4
Name: kita
Version: 2.0.0
Summary: Official Python SDK for Kita Document Processing API
Author-email: Kita Team <support@usekita.com>
Maintainer-email: Kita Team <support@usekita.com>
License: MIT
Project-URL: Homepage, https://usekita.com
Project-URL: Documentation, https://docs.usekita.com
Project-URL: Repository, https://github.com/usekita/kita-python-sdk
Project-URL: Issues, https://github.com/usekita/kita-python-sdk/issues
Project-URL: Changelog, https://github.com/usekita/kita-python-sdk/blob/main/CHANGELOG.md
Keywords: kita,document-processing,ocr,bank-statement,payslip,pdf,api,sdk
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Office/Business :: Financial
Classifier: Topic :: Scientific/Engineering :: Image Recognition
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: requests>=2.25.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: black>=22.0.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Requires-Dist: types-requests>=2.28.0; extra == "dev"
Dynamic: requires-python

# Kita Python SDK

The official Python SDK for the Kita Document Processing API.

## Installation

```bash
pip install kita
```

## Quick Start

```python
from kita import KitaClient

client = KitaClient(api_key="kita_prod_...")

# Process a document
result = client.process("statement.pdf", "bank_statement")
print(result.metadata)
print(result.transactions)
result.save_json("output.json")
```

## Configuration

### API Key

Get your API key from the Kita dashboard.

```python
# Pass directly
client = KitaClient(api_key="kita_prod_...")

# Or set environment variable: export KITA_API_KEY=kita_prod_...
client = KitaClient()
```

### Base URL

The SDK defaults to production (`https://portal.usekita.com`). Override for local development:

```python
client = KitaClient(api_key="...", base_url="http://localhost:8080")
# Or: export KITA_API_URL=http://localhost:8080
```

---

## Document Types

### Transaction-based types

| Type | Description |
|------|-------------|
| `bank_statement` | Bank account statements |
| `passbook` | Passbook savings accounts |
| `credit_card_statement` | Credit card statements |

### Structured types

| Type | Description |
|------|-------------|
| `payslip` | Salary/pay stubs |
| `bill` | Utility bills |
| `credit_report` | Credit reports (CIBI, etc.) |
| `sales_invoice` | Sales invoices |
| `afs` | Audited financial statements |

### Schema-based types

| Type | Description |
|------|-------------|
| `bir_2303` | BIR Form 2303 (Certificate of Registration) |
| `bir_2307` | BIR Form 2307 (Certificate of Creditable Tax Withheld) |
| `secretarys_certificate` | Secretary's Certificate |
| `certificate_of_employment` | Certificate of Employment |
| `government_id` | Government-issued IDs (passport, driver's license, etc.) |
| `tin_id` | Tax Identification Number card |
| `income_tax_return` | Income tax returns (ITR) |
| `proof_of_billing` | Proof of billing address |
| `loan_statement` | Loan statements |
| `remittance_slip` | Remittance slips |
| `business_registration_dti` | DTI business registration |
| `business_registration_sec` | SEC business registration |
| `mayors_permit` | Mayor's permit |
| `business_permit` | Business permit |
| `certificate_of_registration` | Certificate of registration |
| `general_information_sheet` | General information sheet (GIS) |
| `certificate_of_incorporation` | Certificate of incorporation |
| `purchase_order` | Purchase orders |
| `land_title` | Land titles |
| `vehicle_registration` | Vehicle registration (OR/CR) |
| `insurance_policy` | Insurance policies |
| `loan_agreement` | Loan agreements |
| `bill_of_lading` | Bill of lading |
| `barangay_clearance` | Barangay clearance |

### General types

| Type | Description |
|------|-------------|
| `general_document` | Auto-detected general document |
| `other_document` | Unclassified document |

Type names are case-insensitive, and spaces are accepted in place of underscores: `"bank_statement"`, `"BANK_STATEMENT"`, and `"Bank Statement"` all work.
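
You can mirror this normalization on the client side, for example to reject a typo before uploading. The following is a minimal sketch. It assumes the SDK lowercases the name and turns whitespace into underscores. `normalize_document_type` and `KNOWN_TYPES` are hypothetical helpers and are not part of the `kita` package.

```python
# Hypothetical helpers: not part of the kita package.
# The type list is copied from the tables above.
KNOWN_TYPES = set("""
bank_statement passbook credit_card_statement
payslip bill credit_report sales_invoice afs
bir_2303 bir_2307 secretarys_certificate certificate_of_employment
government_id tin_id income_tax_return proof_of_billing loan_statement
remittance_slip business_registration_dti business_registration_sec
mayors_permit business_permit certificate_of_registration
general_information_sheet certificate_of_incorporation purchase_order
land_title vehicle_registration insurance_policy loan_agreement
bill_of_lading barangay_clearance
general_document other_document
""".split())


def normalize_document_type(name: str) -> str:
    """Lowercase the name and join whitespace-separated words with underscores (assumed rule)."""
    normalized = "_".join(name.lower().split())
    if normalized not in KNOWN_TYPES:
        raise ValueError(f"Unknown document type: {name!r}")
    return normalized
```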

---

## Methods

### `process(file_path, document_type)` -- Process a document

```python
result = client.process(
    "statement.pdf",
    "bank_statement",
    wait=True,            # Wait for completion (default: True)
    poll_interval=2,      # Seconds between status checks
    timeout=600,          # Max wait time in seconds
    password=None,        # PDF password if encrypted
    show_progress=True    # Show spinner
)
```

Returns a `DocumentResult`.
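
With `wait=True` (the default), `process()` blocks until the document has finished. The loop below is a rough sketch of how `poll_interval` and `timeout` plausibly interact. It is not the SDK's implementation. `get_status` is a hypothetical callable that returns the current status string. The terminal statuses `completed` and `failed` come from the `DocumentResult.status` examples.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"completed", "failed"}  # assumed terminal states


def wait_for_completion(
    get_status: Callable[[], str],
    poll_interval: float = 2,
    timeout: float = 600,
) -> str:
    """Poll get_status() until it returns a terminal status, or raise once timeout would be exceeded."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        # Give up if waiting another interval would pass the deadline.
        if time.monotonic() + poll_interval > deadline:
            raise TimeoutError(f"Document still {status!r} after {timeout}s")
        time.sleep(poll_interval)
```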

### `process_url(file_url, document_type)` -- Process from URL

```python
result = client.process_url(
    "https://example.com/statement.pdf",
    "bank_statement",
    filename="statement.pdf"  # Optional filename override
)
```

### `get_result(document_id)` -- Get full result by ID

```python
result = client.get_result(12345)
print(result.metadata)
result.save_json("output.json")
```

### `get_summary(document_id)` -- Bank statement summary

Returns a flat dict of 48 summary metrics (no transactions). Only works for `bank_statement` and `passbook` documents.

```python
# As JSON dict
summary = client.get_summary(result.document_id)
print(summary['total_inflow'])
print(summary['average_daily_balance'])

# As CSV string
csv_data = client.get_summary(result.document_id, format='csv')
with open('summary.csv', 'w', newline='', encoding='utf-8') as f:
    f.write(csv_data)
```

### `custom_export(document_id, output_path, export_type)` -- Excel export

```python
# Org-configured custom Excel export
client.custom_export(result.document_id, "report.xlsx")

# Credit report multi-sheet export
client.custom_export(result.document_id, "credit.xlsx", export_type="credit_report")
```

### `batch_process(folder_path, document_type)` -- Batch from folder

```python
batch = client.batch_process(
    "/path/to/statements",
    "bank_statement",
    extensions=['.pdf', '.png', '.jpg'],  # Default: ['.pdf', '.png', '.jpg', '.jpeg']
    recursive=False,                       # Search subdirectories
    max_workers=5                          # Parallel upload threads
)

results = batch.results()  # {filepath: DocumentResult}

for filepath, result in results.items():
    print(f"{filepath}: {result.status}")
    result.save_json(f"{filepath}_output.json")
```
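
The loop above writes each JSON file next to its source document. If you want to collect the outputs in one folder and skip failures, a variation could look like the sketch below. It uses only the `status` attribute and the `save_json()` method documented under DocumentResult. `save_completed` and the `<stem>.json` naming are illustrative choices, not SDK features.

```python
from pathlib import Path
from typing import Dict, List, Protocol


class SavableResult(Protocol):
    """The subset of DocumentResult used here."""
    status: str

    def save_json(self, path: str) -> None: ...


def save_completed(results: Dict[str, SavableResult], out_dir: str) -> List[str]:
    """Save completed results as <out_dir>/<source stem>.json and return the paths that did not complete."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    failed = []
    for filepath, result in results.items():
        if result.status == "completed":
            result.save_json(str(out / f"{Path(filepath).stem}.json"))
        else:
            failed.append(filepath)
    return failed
```

With `recursive=True`, two files in different subfolders can share a stem (for example `a/jan.pdf` and `b/jan.pdf`). In that case, include part of the relative path in the output name.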

### `batch_process_urls(documents)` -- Batch from URLs

```python
results = client.batch_process_urls([
    {"file_url": "https://example.com/stmt1.pdf", "document_type": "bank_statement"},
    {"file_url": "https://example.com/stmt2.pdf", "document_type": "bank_statement"},
])
for doc in results['documents']:
    if doc['status'] == 'completed':
        print(doc['result']['metadata'])
```
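
`batch_process_urls()` takes a list of plain dicts and returns a dict, not `DocumentResult` objects. The hypothetical helpers below build the request list from a list of URLs and split the response by status. They use only the keys shown above (`file_url`, `document_type`, `documents`, `status`).

```python
from typing import Any, Dict, List, Tuple


def build_url_batch(urls: List[str], document_type: str) -> List[Dict[str, str]]:
    """Build one request entry per URL, all with the same document type."""
    return [{"file_url": url, "document_type": document_type} for url in urls]


def split_by_status(response: Dict[str, Any]) -> Tuple[List[dict], List[dict]]:
    """Return (completed, not_completed) entries from a batch_process_urls() response."""
    docs = response.get("documents", [])
    completed = [d for d in docs if d.get("status") == "completed"]
    others = [d for d in docs if d.get("status") != "completed"]
    return completed, others
```

For example: `done, rest = split_by_status(client.batch_process_urls(build_url_batch(urls, "bank_statement")))`.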

### `list_documents(limit, offset, status, document_type)` -- List documents

```python
docs = client.list_documents(limit=50, status='completed', document_type='bank_statement')
for doc in docs['documents']:
    print(f"{doc['id']}: {doc.get('document_type')}")
```
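
`limit` and `offset` page through the document list. The sketch below shows a generator that walks every page. It assumes the last page is the first one that returns fewer than `limit` entries, which the API may not guarantee. `iter_documents` is a hypothetical helper. It accepts `client.list_documents`, or any callable with the same keyword arguments.

```python
from typing import Any, Callable, Dict, Iterator


def iter_documents(
    list_documents: Callable[..., Dict[str, Any]],
    page_size: int = 50,
    **filters: Any,
) -> Iterator[Dict[str, Any]]:
    """Yield documents page by page using limit/offset pagination."""
    offset = 0
    while True:
        page = list_documents(limit=page_size, offset=offset, **filters)
        docs = page.get("documents", [])
        yield from docs
        if len(docs) < page_size:  # assumed: a short page means there are no more results
            return
        offset += page_size
```

For example: `for doc in iter_documents(client.list_documents, status="completed"): ...`.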

---

## Standard Response Format

All document types share this standard structure:

```json
{
  "status": "completed",
  "document_type": "bank_statement",
  "document_id": 2791,
  "filename": "statement.pdf",
  "processing_time_seconds": 50.46,
  "uploaded_at": "2026-02-17T...",
  "metadata": { ... },
  "extracted_data": { ... },
  "fraud_detection": { ... }
}
```

Key standardization notes:
- `document_id` is always a **number** (not a string)
- `processing_time_seconds` is always a **number** (not a string)
- `metadata` has internal fields stripped (`schema_used`, `display_name`, `document_type`)
- `extracted_data` contains all type-specific content
- `fraud_detection` is **only present when fraud detection produced results**; otherwise the key is omitted rather than sent empty
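
You can check these guarantees on a raw response dict, such as `result.raw` or a parsed JSON file. The function below is a hypothetical sanity check written from the notes above. It is not a schema published by Kita.

```python
from typing import Any, Dict, List

REQUIRED_KEYS = ("status", "document_type", "document_id", "filename",
                 "processing_time_seconds", "uploaded_at", "metadata", "extracted_data")


def check_standard_response(resp: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the response matches the standard shape."""
    problems = [f"missing key: {k}" for k in REQUIRED_KEYS if k not in resp]
    # Numbers, not strings. bool is a subclass of int, so exclude it explicitly.
    doc_id = resp.get("document_id")
    if doc_id is not None and (not isinstance(doc_id, int) or isinstance(doc_id, bool)):
        problems.append("document_id is not an int")
    secs = resp.get("processing_time_seconds")
    if secs is not None and (not isinstance(secs, (int, float)) or isinstance(secs, bool)):
        problems.append("processing_time_seconds is not a number")
    # fraud_detection should be omitted rather than sent empty.
    if "fraud_detection" in resp and not resp["fraud_detection"]:
        problems.append("fraud_detection is present but empty")
    return problems
```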

---

## DocumentResult

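`process()`, `process_url()`, and `get_result()` return a `DocumentResult` object, and so do the values of `batch.results()`. (`batch_process_urls()` and `list_documents()` return plain dicts.) Its property accessors automatically look inside `extracted_data`: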

```python
# Common properties (all document types)
result.status           # 'completed', 'failed', etc.
result.document_id      # Document ID (int)
result.document_type    # 'bank_statement', 'payslip', etc.
result.metadata         # Dict of document metadata
result.extracted_data   # Full extracted data container
result.fraud_detection  # Fraud detection results (if present)
result.raw              # Full response dict

# Bank statement / passbook / credit card statement
result.transactions     # List of transactions
result.metrics          # Summary metrics, category breakdowns

# Payslip
result.employment_info  # Employer/employee details, statutory IDs
result.year_to_date     # YTD totals (only when non-null)
result.signals          # Lender signals (flat array)
result.payslips         # Payslips array (earnings, deductions, totals, pay_period)
result.payslip_count    # Number of payslips detected
result.fraud_score      # Fraud score

# Bill
result.bill_fields      # Provider, amounts, dates, account info
result.signals          # Verification signals (list)
result.signal_summary   # Signal summary (score, pass/warn/fail counts)

# Sales invoice
result.extracted_data   # Invoice data (invoices array)
result.invoice_signals  # Invoice verification signals

# Credit report
result.credit_report_data  # Accounts, KYC, summaries, payment history
result.metrics             # Credit report metrics

# Other / generic / schema-based documents
result.extracted_data   # Schema-extracted fields
result.general_signals  # AI-generated document signals

# Serialization
result.to_dict()        # Convert to dictionary
result.to_json()        # Formatted JSON string
result.save_json(path)  # Save to JSON file
result['key']           # Dict-like access
result.get('key', default)
```

---

## Response Examples by Document Type

### Bank Statement

```python
result = client.process("statement.pdf", "bank_statement")
```

```json
{
  "status": "completed",
  "document_type": "bank_statement",
  "document_id": 123,
  "filename": "statement.pdf",
  "processing_time_seconds": 12.5,
  "uploaded_at": "2026-02-17T...",

  "metadata": {
    "account_holder_name": "Juan Dela Cruz",
    "account_number": "1234567890",
    "financial_institution": "BDO",
    "statement_start_date": "01-01-2024",
    "statement_end_date": "01-31-2024",
    "country": "Philippines",
    "currency": "PHP",
    "opening_balance": 50000.00,
    "closing_balance": 62000.00
  },

  "extracted_data": {
    "transactions": [
      {
        "date": "01-02-2024",
        "description": "SALARY CREDIT",
        "credit": 30000.00,
        "debit": null,
        "balance": 80000.00,
        "category": "income",
        "subcategory": "salary",
        "transaction_type": "credit"
      }
    ],

    "metrics": {
      "total_inflow": 45000.00,
      "total_outflow": 33000.00,
      "net_cash_flow": 12000.00,
      "average_balance": 58000.00,
      "total_transactions": 25,
      "by_category": { ... },
      "by_month": { ... }
    }
  },

  "fraud_detection": {
    "risk_level": "low",
    "authenticity_score": 92,
    "signals": [
      {
        "severity": "info",
        "category": "document_integrity",
        "message": "Document appears authentic"
      }
    ]
  }
}
```
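
Each transaction carries `credit`, `debit` (null when not applicable), and `balance`, so you can recompute the cash-flow figures yourself, for example to cross-check `metrics`. This sketch works on the transaction list shown above (e.g. `result.transactions`). `summarize_cash_flow` is a hypothetical helper, and it treats `null` amounts as zero.

```python
from typing import Any, Dict, List


def summarize_cash_flow(transactions: List[Dict[str, Any]]) -> Dict[str, float]:
    """Total the credits and debits and compute the net flow; null (None) amounts count as zero."""
    inflow = sum(t.get("credit") or 0 for t in transactions)
    outflow = sum(t.get("debit") or 0 for t in transactions)
    return {
        "total_inflow": round(inflow, 2),
        "total_outflow": round(outflow, 2),
        "net_cash_flow": round(inflow - outflow, 2),
    }
```

A second check is also worth automating. In the sample above, `opening_balance` (50,000) plus `net_cash_flow` (12,000) equals `closing_balance` (62,000).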

### Payslip

```python
result = client.process("payslip.pdf", "payslip")
```

```json
{
  "status": "completed",
  "document_type": "payslip",
  "document_id": 456,
  "filename": "payslip.pdf",
  "processing_time_seconds": 15.2,
  "uploaded_at": "2026-02-17T...",

  "metadata": {
    "employee_name": "Juan Dela Cruz",
    "employer_name": "Acme Corp",
    "pay_date": "01-15-2024",
    "period_start": "01-01-2024",
    "period_end": "01-15-2024"
  },

  "extracted_data": {
    "payslip_count": 1,
    "payslips": [
      {
        "earnings": [
          { "label": "Basic Pay", "amount": 25000, "taxable": true },
          { "label": "Rice Allowance", "amount": 2000, "taxable": false }
        ],
        "deductions": [
          { "label": "SSS", "amount": 900, "category": "sss" },
          { "label": "PhilHealth", "amount": 450, "category": "philhealth" },
          { "label": "Withholding Tax", "amount": 2500, "category": "tax" }
        ],
        "totals": {
          "gross_pay": 30000,
          "total_deductions": 4950,
          "net_pay": 25050
        },
        "pay_period": {
          "start_date": "01-01-2024",
          "end_date": "01-15-2024",
          "pay_date": "01-15-2024"
        }
      }
    ],

    "employment_info": {
      "employer_name": "Acme Corp",
      "employee_name": "Juan Dela Cruz",
      "employee_id": "EMP-001",
      "department": "Engineering",
      "employment_type": "Regular",
      "statutory_ids": { "tin": "...", "sss": "...", "philhealth": "...", "pagibig": "..." }
    },

    "signals": [
      { "key": "mandatory_coverage", "label": "Mandatory Deductions Coverage", "score": 95, "status": "good", "display_value": "3 / 3" },
      { "key": "arithmetic_integrity", "label": "Payslip Arithmetic Integrity", "score": 98, "status": "good", "display_value": "98 / 100" },
      { "key": "fraud_confidence", "label": "Document Trust Score", "score": 80, "status": "good", "display_value": "80 / 100" }
    ],

    "fraud_score": {
      "overall_score": 95.93,
      "risk_level": "low",
      "confidence": "low",
      "categories": {
        "duplicates": { "score": 100, "confidence": "low" },
        "round_numbers": { "score": 72.86, "confidence": "low" },
        "data_consistency": { "score": 100, "confidence": "medium" }
      }
    }
  }
}
```

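The sample arrays are illustrative and do not add up to `totals`. The listed `earnings` sum to 27,000 against a `gross_pay` of 30,000, and the listed `deductions` sum to 3,850 against a `total_deductions` of 4,950.

Key simplifications: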
- **No `payslip_data` wrapper** -- earnings/deductions/totals/pay_period are directly on each payslip entry
- **No `financial_breakdown`** -- tax classification merged into each earning item via `taxable` boolean
- **No `underwriting_signals`** -- internal analytics, not included in download
- **`signals` not `payslip_signals`** -- flat array of lender signals (key, label, score, status, display_value)
- **`year_to_date`** only present when non-null values exist
- **Deduction categories** cleaned: `"CATEGORY: SSS"` becomes `"sss"`
- **`fraud_score` simplified** -- categories have just score + confidence
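
Each payslip entry exposes `earnings`, `deductions`, and `totals` directly, so simple consistency checks need no unwrapping. The sketch below verifies that `gross_pay - total_deductions == net_pay`. It also sums earnings using the `taxable` flag. `check_payslip` is hypothetical, and the 0.01 rounding tolerance is an arbitrary choice.

```python
from typing import Any, Dict


def check_payslip(payslip: Dict[str, Any], tolerance: float = 0.01) -> Dict[str, Any]:
    """Check the net-pay arithmetic and total the taxable earnings of one payslip entry."""
    totals = payslip["totals"]
    expected_net = totals["gross_pay"] - totals["total_deductions"]
    taxable = sum(e["amount"] for e in payslip.get("earnings", []) if e.get("taxable"))
    return {
        "net_pay_consistent": abs(expected_net - totals["net_pay"]) <= tolerance,
        "taxable_earnings": taxable,
    }
```

For example: `for p in result.payslips: print(check_payslip(p))`.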

### Bill

```python
result = client.process("bill.pdf", "bill")
```

```json
{
  "status": "completed",
  "document_type": "bill",
  "document_id": 789,
  "filename": "bill.pdf",
  "processing_time_seconds": 8.3,
  "uploaded_at": "2026-02-17T...",

  "metadata": {
    "account_holder_name": "Juan Dela Cruz",
    "service_address": "123 Main St, Makati"
  },

  "extracted_data": {
    "bill_fields": {
      "provider": "Meralco",
      "account_number": "1234567890",
      "billing_period_start": "12-01-2023",
      "billing_period_end": "12-31-2023",
      "due_date": "01-15-2024",
      "total_amount_due": 3500.00
    },
    "signals": [
      {
        "signal_id": "address_match",
        "label": "Address Verification",
        "value": true,
        "status": "pass",
        "message": "Service address matches applicant address"
      }
    ],
    "signal_summary": {
      "overall_score": 85,
      "total_signals": 6,
      "passed": 5,
      "warnings": 1,
      "failed": 0,
      "risk_level": "low"
    }
  }
}
```
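
To find the items that need review, filter `signals` for entries whose `status` is not `pass`. This sketch assumes non-passing statuses are other strings, such as warnings or failures, as the pass/warn/fail counts in `signal_summary` suggest. `signals_needing_review` is a hypothetical helper that takes `result.signals`.

```python
from typing import Any, Dict, List


def signals_needing_review(signals: List[Dict[str, Any]]) -> List[str]:
    """Return 'label: message' for every bill signal whose status is not 'pass'."""
    return [
        f"{s.get('label', s.get('signal_id'))}: {s.get('message', '')}"
        for s in signals
        if s.get("status") != "pass"
    ]
```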

### Credit Report

```python
result = client.process("credit_report.pdf", "credit_report")
```

```json
{
  "status": "completed",
  "document_type": "credit_report",
  "document_id": 321,
  "filename": "credit_report.pdf",
  "processing_time_seconds": 25.1,
  "uploaded_at": "2026-02-17T...",

  "metadata": {
    "subject_name": "Dela Cruz, Juan",
    "bureau_score": 650,
    "source_bureau": "CIBI"
  },

  "extracted_data": {
    "credit_report_data": {
      "report_metadata": {
        "source_bureau": "CIBI",
        "bureau_score_value": 650,
        "bureau_score_band": "Fair"
      },
      "subject_person": {
        "last_name": "Dela Cruz",
        "first_name": "Juan",
        "date_of_birth": "1990-05-15"
      },
      "accounts": [
        {
          "product_type": "Installment",
          "product_category": "Housing Loan",
          "provider_name": "BDO",
          "outstanding_balance": 1800000,
          "monthly_payment": 15000
        }
      ],
      "kyc_data": { ... }
    },

    "metrics": {
      "credit_report_metrics": {
        "loan_activity_24m": { ... },
        "repayment_performance_60m": { ... },
        "dpd_analysis_60m": { ... }
      }
    }
  }
}
```
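
The `accounts` array makes simple exposure totals straightforward. The following is a minimal sketch. It assumes `outstanding_balance` and `monthly_payment` may be missing or null on some accounts and treats those as zero. `summarize_accounts` is hypothetical.

```python
from typing import Any, Dict, List


def summarize_accounts(accounts: List[Dict[str, Any]]) -> Dict[str, float]:
    """Sum outstanding balances and monthly payments across all reported credit accounts."""
    return {
        "account_count": len(accounts),
        "total_outstanding": sum(a.get("outstanding_balance") or 0 for a in accounts),
        "total_monthly_payment": sum(a.get("monthly_payment") or 0 for a in accounts),
    }
```

For example: `summarize_accounts(result.credit_report_data["accounts"])`.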

### Sales Invoice

```python
result = client.process("invoice.pdf", "sales_invoice")
```

```json
{
  "status": "completed",
  "document_type": "sales_invoice",
  "document_id": 987,
  "filename": "invoice.pdf",
  "processing_time_seconds": 30.5,
  "uploaded_at": "2026-02-17T...",

  "metadata": { ... },

  "extracted_data": {
    "invoices": [
      {
        "seller": { "name": "ABC Trading Corp", "tin": "123-456-789-000" },
        "buyer": { "name": "XYZ Industries", "tin": "987-654-321-000" },
        "invoice_number": "INV-2024-001",
        "invoice_date": "2024-01-15",
        "line_items": [
          { "description": "Product A", "quantity": 100, "unit_price": 500, "amount": 50000 }
        ],
        "subtotal": 50000,
        "vat": 6000,
        "total": 56000
      }
    ]
  },

  "invoice_signals": {
    "signals": [ ... ],
    "per_invoice": [ ... ]
  }
}
```
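
Each invoice entry carries `line_items`, `subtotal`, `vat`, and `total`, so you can verify the basic arithmetic on the client side. The sketch below is a hypothetical check and not what `invoice_signals` computes. For the sample above, it reports an implied VAT rate of 0.12 (6,000 / 50,000).

```python
from typing import Any, Dict


def check_invoice(inv: Dict[str, Any], tolerance: float = 0.01) -> Dict[str, Any]:
    """Verify that line items sum to the subtotal and that subtotal + VAT equals the total."""
    items_sum = sum(item.get("amount") or 0 for item in inv.get("line_items", []))
    subtotal, vat, total = inv["subtotal"], inv.get("vat") or 0, inv["total"]
    return {
        "line_items_match_subtotal": abs(items_sum - subtotal) <= tolerance,
        "total_matches": abs(subtotal + vat - total) <= tolerance,
        "implied_vat_rate": round(vat / subtotal, 4) if subtotal else None,
    }
```

For example: `for inv in result.extracted_data["invoices"]: print(check_invoice(inv))`.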

### AFS (Audited Financial Statement)

```python
result = client.process("afs.pdf", "afs")
```

```json
{
  "status": "completed",
  "document_type": "afs",
  "document_id": 654,
  "filename": "afs.pdf",
  "processing_time_seconds": 45.0,
  "uploaded_at": "2026-02-17T...",

  "metadata": { ... },
  "statements_found": {
    "balance_sheet": true,
    "income_statement": true,
    "cash_flow_statement": true
  },

  "extracted_data": {
    "signals": {
      "profitability": { "revenue": 5000000, "net_income": 800000 },
      "liquidity": { "current_ratio": 2.1 },
      "leverage": { "debt_to_equity": 0.45 }
    },
    "risk_flags": [],
    "data_validation": { ... }
  },

  "financial_tables": [ ... ]
}
```

### Schema-based Documents (BIR, COE, Government ID, etc.)

All schema-based types return their extracted fields in `extracted_data` with no `fraud_detection` key:

```python
result = client.process("bir.pdf", "bir_2303")
```

```json
{
  "status": "completed",
  "document_type": "bir_2303",
  "document_id": 111,
  "filename": "bir.pdf",
  "processing_time_seconds": 5.2,
  "uploaded_at": "2026-02-17T...",

  "metadata": {},
  "extracted_data": {
    "tin": "123-456-789-000",
    "registered_name": "ABC Corp",
    "registration_date": "2020-01-15",
    "business_address": "...",
    "lines_of_business": ["Retail Trade"],
    "tax_types": ["Income Tax", "VAT"]
  }
}
```

### Other / General Document

```python
result = client.process("document.pdf", "other_document")
```

```json
{
  "status": "completed",
  "document_type": "other_document",
  "document_id": 222,
  "filename": "document.pdf",
  "processing_time_seconds": 10.0,
  "uploaded_at": "2026-02-17T...",

  "metadata": { ... },
  "extracted_data": {
    "general_signals": { ... }
  }
}
```

---

## Error Handling

```python
from kita import (
    KitaClient,
    KitaError,                # Base SDK error
    KitaAPIError,             # API returned an error (has status_code, message)
    KitaAuthenticationError,  # 401 - invalid API key
    KitaRateLimitError,       # 429 - rate limited (has retry_after)
)

client = KitaClient()  # reads KITA_API_KEY from the environment

try:
    result = client.process("doc.pdf", "bank_statement")
except KitaAuthenticationError:
    print("Invalid API key")
except KitaRateLimitError as e:
    print(f"Rate limited. Retry after {e.retry_after}s")
except KitaAPIError as e:
    print(f"API Error {e.status_code}: {e.message}")
except KitaError as e:
    print(f"SDK Error: {e}")
```
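
Because `KitaRateLimitError` exposes `retry_after`, a small retry wrapper is possible. The sketch below is generic so that it runs without the SDK; to use it with the SDK, pass `(KitaRateLimitError,)` as `retry_on`. `call_with_retry`, the attempt cap, and the fallback delay are illustrative assumptions. `retry_after` is assumed to be in seconds, as the message above implies.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 3,
    default_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(); after a retryable error, sleep retry_after (or default_delay) seconds and try again."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            return fn()
        except retry_on as exc:
            if attempt >= max_attempts:
                raise  # out of attempts: surface the last error
            sleep(getattr(exc, "retry_after", None) or default_delay)
            attempt += 1
```

For example: `call_with_retry(lambda: client.process("doc.pdf", "bank_statement"), (KitaRateLimitError,))`. The wrapper repeats the whole call, including the upload.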

---

## Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `KITA_API_KEY` | API key | (required) |
| `KITA_API_URL` | API base URL | `https://portal.usekita.com` |
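
The table above implies an order in which settings are looked up. The following is a minimal sketch. It assumes an explicit constructor argument takes precedence over the environment variable, which in turn takes precedence over the default; this precedence is an assumption and is not documented here. `resolve_config` is hypothetical.

```python
import os
from typing import Mapping, Optional, Tuple

DEFAULT_API_URL = "https://portal.usekita.com"


def resolve_config(
    api_key: Optional[str] = None,
    base_url: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> Tuple[str, str]:
    """Return (api_key, base_url), checking the explicit argument, then the env var, then the default."""
    key = api_key or env.get("KITA_API_KEY")
    if not key:
        raise ValueError("No API key: pass api_key or set KITA_API_KEY")
    url = base_url or env.get("KITA_API_URL") or DEFAULT_API_URL
    return key, url
```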

## License

MIT License
