Metadata-Version: 2.4
Name: atomingest
Version: 0.2.0
Summary: A metadata-driven data ingestion framework for AtomSQL with fintech-grade security and compliance features.
Author-email: Prashant Singh <box_prashant@outlook.com>
Maintainer-email: Prashant Singh <box_prashant@outlook.com>
License: MIT
Project-URL: Homepage, https://github.com/prashant-fintech/atomingest
Project-URL: Repository, https://github.com/prashant-fintech/atomingest.git
Project-URL: Issues, https://github.com/prashant-fintech/atomingest/issues
Project-URL: Documentation, https://github.com/prashant-fintech/atomingest/blob/main/README.md
Project-URL: Changelog, https://github.com/prashant-fintech/atomingest/releases
Keywords: data-engineering,etl,ingestion,fintech,orm,atomsql,pipeline,normalization,validation,metadata,yaml,csv,database,compliance,security,encryption
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Financial and Insurance Industry
Classifier: Intended Audience :: System Administrators
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Topic :: Database
Classifier: Topic :: Office/Business :: Financial
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: System :: Systems Administration
Classifier: Topic :: Security :: Cryptography
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Environment :: Console
Classifier: Typing :: Typed
Requires-Python: >=3.12
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: atomsql>=0.0.1
Requires-Dist: PyYAML>=6.0
Requires-Dist: cryptography>=41.0.0
Requires-Dist: click>=8.1.0
Requires-Dist: python-dateutil>=2.8.0
Provides-Extra: all
Requires-Dist: openpyxl>=3.1.0; extra == "all"
Requires-Dist: pandas>=2.0.0; extra == "all"
Requires-Dist: fastparquet>=2024.2.0; extra == "all"
Requires-Dist: prometheus-client>=0.18.0; extra == "all"
Provides-Extra: excel
Requires-Dist: openpyxl>=3.1.0; extra == "excel"
Provides-Extra: parquet
Requires-Dist: fastparquet>=2024.2.0; extra == "parquet"
Requires-Dist: pandas>=2.0.0; extra == "parquet"
Provides-Extra: metrics
Requires-Dist: prometheus-client>=0.18.0; extra == "metrics"
Provides-Extra: dev
Requires-Dist: pytest>=8.0.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.23.0; extra == "dev"
Requires-Dist: pytest-cov>=4.1.0; extra == "dev"
Requires-Dist: ruff>=0.4.0; extra == "dev"
Requires-Dist: pre-commit>=3.5.0; extra == "dev"
Requires-Dist: mypy>=1.8.0; extra == "dev"
Requires-Dist: bandit[toml]>=1.7.0; extra == "dev"
Requires-Dist: types-PyYAML>=6.0; extra == "dev"
Requires-Dist: sphinx>=7.0.0; extra == "dev"
Requires-Dist: sphinx-rtd-theme>=2.0.0; extra == "dev"
Dynamic: license-file

# AtomIngest

**A Metadata-Driven Data Ingestion Framework for [AtomSQL](https://github.com/prashant-fintech/atomsql)**

[![PyPI Version](https://img.shields.io/pypi/v/atomingest)](https://pypi.org/project/atomingest/)
[![Python Versions](https://img.shields.io/pypi/pyversions/atomingest)](https://pypi.org/project/atomingest/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
[![Downloads](https://img.shields.io/pypi/dm/atomingest)](https://pypi.org/project/atomingest/)
[![Build Status](https://img.shields.io/github/actions/workflow/status/prashant-fintech/atomingest/ci.yml?branch=main)](https://github.com/prashant-fintech/atomingest/actions)
[![Code Coverage](https://img.shields.io/codecov/c/github/prashant-fintech/atomingest)](https://codecov.io/gh/prashant-fintech/atomingest)

**AtomIngest** is a configurable, zero-boilerplate data ingestion framework designed for fintech environments where data reliability, compliance, and scalability are paramount. Built on the metaprogramming foundations of **AtomSQL**, it dynamically generates ORM models, validators, and transformation logic entirely from YAML configurations.

---

## 🚀 Key Features

AtomIngest loads raw data (CSV, JSON, API responses) into AtomSQL-managed databases as structured, validated records, with no hand-written model code.

* **📄 YAML-Driven Configuration**: Define table schemas, constraints, and pipelines declaratively.
* **🔮 Dynamic Model Generation**: Uses Python metaprogramming to create AtomSQL models (`ModelMeta`) at runtime.
* **🛡️ Fintech-Grade Security**:
    * **PII Vault**: Transparent field-level encryption for sensitive data.
    * **Immutable Ledgers**: WORM (Write Once, Read Many) support for audit trails.
    * **Crypto-Shredding**: GDPR compliance via key destruction.
* **✅ Robust Validation**: Inject regex, range checks, and custom business rules directly into `Field` definitions.
* **🔄 Reliability**: Built-in support for Idempotency (Upserts), Dead Letter Queues (DLQ), and Transaction Management.
* **⚡ High Performance**: Async batching and smart buffering for high-throughput feeds.
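
The reliability bullet above combines two ideas: a bad row should not abort the run, and good rows should be committed in batches. The sketch below shows one way that could work in plain Python. It is an illustration under stated assumptions, not AtomIngest's implementation: `IngestStats`, `ingest_rows`, and `require_positive_amount` are hypothetical names, and the dead letter queue is an in-memory list rather than a table.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Iterable

Row = dict[str, Any]


@dataclass
class IngestStats:
    """Run counters (hypothetical; not AtomIngest's own stats object)."""
    processed: int = 0
    failed: int = 0
    dead_letters: list[dict[str, Any]] = field(default_factory=list)


def ingest_rows(
    rows: Iterable[Row],
    validate: Callable[[Row], None],
    commit_batch: Callable[[list[Row]], None],
    batch_size: int = 2,
) -> IngestStats:
    """Validate each row, divert failures to a DLQ, commit the rest in batches."""
    stats = IngestStats()
    batch: list[Row] = []
    for row in rows:
        try:
            validate(row)  # expected to raise ValueError on a bad row
        except ValueError as exc:
            # Keep the failed row and the reason, then move on to the next row.
            stats.failed += 1
            stats.dead_letters.append({"row": row, "error": str(exc)})
            continue
        batch.append(row)
        if len(batch) >= batch_size:
            commit_batch(batch)
            stats.processed += len(batch)
            batch = []
    if batch:  # flush the final partial batch
        commit_batch(batch)
        stats.processed += len(batch)
    return stats


def require_positive_amount(row: Row) -> None:
    if row["amount"] <= 0:
        raise ValueError(f"amount must be positive, got {row['amount']}")


committed: list[list[Row]] = []  # stands in for the database
stats = ingest_rows(
    [{"amount": 10}, {"amount": -5}, {"amount": 3}, {"amount": 7}],
    validate=require_positive_amount,
    commit_batch=committed.append,
)
print(f"Ingested: {stats.processed}, Failed: {stats.failed}")
```

Here the one invalid row lands in `stats.dead_letters` with its error message, while the three valid rows are committed in two batches.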

---

## 📦 Installation

```bash
pip install atomingest
```

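Requires Python 3.12+ and AtomSQL. Optional extras pull in additional dependencies: `excel` (openpyxl), `parquet` (fastparquet and pandas), `metrics` (prometheus-client), or `all` for every optional dependency, for example `pip install "atomingest[parquet]"`.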

---

## ⚡ Quick Start

### 1. Define your Pipeline (`trades.yaml`)

Create a YAML configuration defining your target table, schema, and validation rules. AtomIngest maps these types directly to `atomsql.orm.fields`.

```yaml
target_table: trades
strategy: upsert
business_keys: [trade_id]

schema:
  trade_id:
    type: StringField
    unique: true
    nullable: false
  symbol:
    type: CharField
    max_length: 10
    validation:
      regex: "^[A-Z]{3,5}$"  # Validates ticker symbols (e.g., AAPL)
  amount:
    type: DecimalField
    validation:
      min: 0.01
  executed_at:
    type: DateTimeField

hooks:
  pre_save: "utils.enrich_metadata"
```
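
To make the `validation` entries concrete, here is a minimal sketch of how a `regex` rule and a `min` rule could be turned into check functions. It uses only the standard library and a plain dict that mirrors the YAML above. `SCHEMA_RULES`, `build_validators`, and `validate_row` are hypothetical names for illustration and are not part of the AtomIngest API.

```python
import re
from decimal import Decimal
from typing import Any, Callable

# Plain-dict mirror of the `validation` entries in trades.yaml (assumption:
# the real loader may represent these differently).
SCHEMA_RULES: dict[str, dict[str, Any]] = {
    "symbol": {"regex": r"^[A-Z]{3,5}$"},
    "amount": {"min": "0.01"},
}

Validator = Callable[[Any], None]


def build_validators(rules: dict[str, Any]) -> list[Validator]:
    """Turn one field's rule mapping into a list of check functions."""
    validators: list[Validator] = []
    if "regex" in rules:
        pattern = re.compile(rules["regex"])

        def check_regex(value: Any) -> None:
            if not pattern.fullmatch(str(value)):
                raise ValueError(f"{value!r} does not match {pattern.pattern}")

        validators.append(check_regex)
    if "min" in rules:
        minimum = Decimal(str(rules["min"]))

        def check_min(value: Any) -> None:
            # Compare as Decimal to avoid float rounding on money amounts.
            if Decimal(str(value)) < minimum:
                raise ValueError(f"{value!r} is below the minimum {minimum}")

        validators.append(check_min)
    return validators


def validate_row(row: dict[str, Any]) -> list[str]:
    """Return error messages for a row; an empty list means it passed."""
    errors: list[str] = []
    for field_name, rules in SCHEMA_RULES.items():
        for check in build_validators(rules):
            try:
                check(row[field_name])
            except ValueError as exc:
                errors.append(f"{field_name}: {exc}")
    return errors


print(validate_row({"symbol": "AAPL", "amount": "150.25"}))
print(validate_row({"symbol": "aapl", "amount": "0"}))
```

The first row passes both rules; the second fails the ticker pattern and the minimum amount, so it yields two error messages.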

### 2. Run the Ingestion

Use the CLI or the Python API to ingest data. The framework builds the `Trade` model at runtime and writes the rows to your database using the configured strategy (`upsert` in this example).

**Using CLI:**

```bash
atomingest run trades.yaml --source data/daily_trades.csv
```

**Using Python:**

```python
from atomingest.core import Ingester
from atomsql import Database

# Connect to DB
db = Database("sqlite:///finance.db")

# Initialize Ingester
ingest = Ingester(db)

# Run Pipeline
stats = ingest.run(
    config="trades.yaml",
    source="data/daily_trades.csv"
)

print(f"Ingested: {stats.processed}, Failed: {stats.failed}")
```
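
With `strategy: upsert` and `business_keys: [trade_id]`, replaying the same source file should update existing rows instead of duplicating them. The sketch below illustrates that idea with the standard-library `sqlite3` module and SQLite's `ON CONFLICT ... DO UPDATE` clause. It is not how AtomIngest or AtomSQL issue their statements; the table layout and the `upsert_trades` helper are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trades (trade_id TEXT PRIMARY KEY, symbol TEXT, amount TEXT)"
)

# The business key (trade_id) decides whether a row is inserted or updated.
UPSERT_SQL = """
INSERT INTO trades (trade_id, symbol, amount)
VALUES (:trade_id, :symbol, :amount)
ON CONFLICT(trade_id) DO UPDATE SET
    symbol = excluded.symbol,
    amount = excluded.amount
"""


def upsert_trades(rows: list[dict[str, str]]) -> None:
    """Insert new trades and overwrite existing ones with the same trade_id."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany(UPSERT_SQL, rows)


feed = [
    {"trade_id": "T-1", "symbol": "AAPL", "amount": "150.25"},
    {"trade_id": "T-2", "symbol": "MSFT", "amount": "99.10"},
]
upsert_trades(feed)
upsert_trades(feed)  # replaying the same feed must not create duplicates
upsert_trades([{"trade_id": "T-1", "symbol": "AAPL", "amount": "151.00"}])

row_count = conn.execute("SELECT COUNT(*) FROM trades").fetchone()[0]
t1_amount = conn.execute(
    "SELECT amount FROM trades WHERE trade_id = 'T-1'"
).fetchone()[0]
print(row_count, t1_amount)
```

After three calls the table still holds two rows, and `T-1` carries the most recent amount.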

### 3. Normalization Pipeline

AtomIngest includes a normalization pipeline that cleans and standardizes raw data before ingestion. Each step is configured independently and can be switched off with `enabled`:

```python
from atomingest.normalization import normalize_csv_file, NormalizationStepConfig

# Define normalization steps
steps = [
    NormalizationStepConfig(step="RemoveBOMStep", config={}, enabled=True),
    NormalizationStepConfig(step="TrimWhitespaceStep", config={"trim_quotes": True}, enabled=True),
    NormalizationStepConfig(step="ColumnNameNormalizationStep", config={"case": "lower"}, enabled=True),
]

# Normalize data
rows, report = normalize_csv_file("messy_data.csv", steps=steps)
print(f"Cleaned {report.rows_emitted} rows with {len(report.warnings)} warnings")
```
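
For intuition about what these three steps do to the data, the following standard-library sketch strips a byte order mark, trims whitespace and stray quotes, and lowercases column names. It only approximates the behavior suggested by the step names; the helper functions are hypothetical and the real steps may handle more cases.

```python
import csv
import io

# Sample input with a BOM, padded cells, and a quoted value.
RAW = '\ufeff Trade_ID , Symbol \n "T-1" ,  AAPL \n'


def remove_bom(text: str) -> str:
    """Drop a leading UTF-8 byte order mark if present."""
    return text.removeprefix("\ufeff")


def trim_cell(value: str, trim_quotes: bool = True) -> str:
    """Strip surrounding whitespace and, optionally, surrounding double quotes."""
    value = value.strip()
    if trim_quotes:
        value = value.strip('"').strip()
    return value


def normalize_csv_text(text: str) -> list[dict[str, str]]:
    """Apply BOM removal, trimming, and lowercase column names to CSV text."""
    reader = csv.reader(io.StringIO(remove_bom(text)))
    header = [trim_cell(cell).lower() for cell in next(reader)]
    return [
        dict(zip(header, (trim_cell(cell) for cell in record)))
        for record in reader
        if record  # skip blank lines
    ]


print(normalize_csv_text(RAW))
```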

**YAML Integration:**

```yaml
target_table: customers

# Embedded normalization configuration
normalization:
  strict: false
  steps:
    - step: TrimWhitespaceStep
      config: {trim_quotes: true, collapse_spaces: true}
      enabled: true
    - step: NullTokenNormalizationStep
      config: {null_tokens: ["", "NA", "NULL", "-"]}
      enabled: true
    - step: BasicTypeCleanupStep
      config:
        numeric_columns: ["age", "amount"]
        date_columns: ["created_date"]
        date_formats: ["%Y-%m-%d", "%m/%d/%Y"]
      enabled: true

schema:
  name: {type: CharField, max_length: 100}
  age: {type: IntegerField}
  amount: {type: DecimalField}
  created_date: {type: DateField}
```
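
The null-token and type-cleanup settings above can be read as a per-cell conversion: configured tokens become `None`, numeric columns are parsed as numbers, and date columns are tried against each format in order. The sketch below mirrors that configuration in plain Python. It is an assumption about the intended behavior, not the actual step implementation, and `parse_date` and `clean_row` are hypothetical helpers. Numeric values are parsed as `Decimal` here for simplicity.

```python
from datetime import date, datetime
from decimal import Decimal
from typing import Any

# Values copied from the YAML configuration above.
NULL_TOKENS = {"", "NA", "NULL", "-"}
NUMERIC_COLUMNS = {"age", "amount"}
DATE_COLUMNS = {"created_date"}
DATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y"]


def parse_date(value: str) -> date:
    """Try each configured format in order and return the first match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {value!r}")


def clean_row(row: dict[str, str]) -> dict[str, Any]:
    """Map null tokens to None, then convert numeric and date columns."""
    cleaned: dict[str, Any] = {}
    for column, raw in row.items():
        value = raw.strip()
        if value in NULL_TOKENS:
            cleaned[column] = None
        elif column in NUMERIC_COLUMNS:
            cleaned[column] = Decimal(value)
        elif column in DATE_COLUMNS:
            cleaned[column] = parse_date(value)
        else:
            cleaned[column] = value
    return cleaned


result = clean_row(
    {"name": " Ada ", "age": "36", "amount": "NA", "created_date": "03/15/2024"}
)
print(result)
```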

---

## 🏗️ Architecture

AtomIngest leverages the dynamic nature of AtomSQL:

1. **Schema Loader**: Parses YAML and resolves types to AtomSQL classes (e.g., `StringField`, `IntegerField`).

2. **Metaclass Factory**: Uses `type()` to construct a new class inheriting from `atomsql.orm.models.Model`, automatically registering it with the Database.

3. **Pipeline Runner**: Streams data, applies "Validator Injection" decorators, executes hooks, and manages transactions.

4. **Reliability Layer**: Catches errors per row, routing failed records to a DLQ table while committing valid rows in batches.
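
Steps 1 and 2 can be sketched without AtomSQL installed. The example below resolves `type` names from a schema mapping to field classes and then calls `type()` to build a model class at runtime. The `Field`, `Model`, and `build_model` definitions are simplified stand-ins written for this illustration; they are not the AtomSQL classes and make no claim about how AtomSQL registers models.

```python
from typing import Any


class Field:
    """Stand-in for an ORM field; NOT the AtomSQL class."""

    def __init__(self, **options: Any) -> None:
        self.options = options


class StringField(Field): ...
class CharField(Field): ...
class DecimalField(Field): ...


class Model:
    """Stand-in base class that records which attributes are fields."""

    _fields: dict[str, Field] = {}

    def __init_subclass__(cls, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        cls._fields = {k: v for k, v in vars(cls).items() if isinstance(v, Field)}

    def __init__(self, **values: Any) -> None:
        unknown = set(values) - set(self._fields)
        if unknown:
            raise TypeError(f"unknown fields: {sorted(unknown)}")
        for name in self._fields:
            setattr(self, name, values.get(name))


# Step 1: resolve the YAML `type` string to a field class.
FIELD_TYPES = {cls.__name__: cls for cls in (StringField, CharField, DecimalField)}


def build_model(table: str, schema: dict[str, dict[str, Any]]) -> type[Model]:
    """Step 2: construct a Model subclass from a parsed schema mapping."""
    attrs: dict[str, Any] = {"__tablename__": table}
    for column, spec in schema.items():
        options = dict(spec)
        field_cls = FIELD_TYPES[options.pop("type")]
        options.pop("validation", None)  # validators are handled separately
        attrs[column] = field_cls(**options)
    class_name = table.rstrip("s").capitalize()  # naive: "trades" -> "Trade"
    return type(class_name, (Model,), attrs)


Trade = build_model(
    "trades",
    {
        "trade_id": {"type": "StringField", "unique": True, "nullable": False},
        "symbol": {"type": "CharField", "max_length": 10},
        "amount": {"type": "DecimalField", "validation": {"min": 0.01}},
    },
)
trade = Trade(trade_id="T-1", symbol="AAPL")
print(Trade.__name__, sorted(Trade._fields), trade.symbol)
```

Calling `type(name, bases, attrs)` is equivalent to writing a `class` statement by hand, which is why the base class hook `__init_subclass__` still runs for the generated class.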

---

## 🗺️ Roadmap

We are currently building out the core phases:

- **Phase 1: Core Configuration** - YAML Parser & Dynamic Model Factory.

- **Phase 2: Validation** - Regex/Range constraints & Hook system.

- **Phase 3: Reliability** - DLQ, Upsert strategies, and Async Batching.

- **Phase 4: Security (In Progress)** - PII Encryption & Immutable Mixins.

- **Phase 5: Orchestration** - Topological dependency resolution.

See the full [Project Backlog](https://github.com/prashant-fintech/atomingest/issues) for details.

---

## 🤝 Contributing

We welcome contributions! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for details on how to set up the dev environment.

1. Clone the repo.
2. Install dependencies, including the `dev` extra that provides pytest and the linters: `uv pip install -r pyproject.toml --extra dev`.
3. Run tests: `pytest tests/`.

---

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

---

**Built with ❤️ for the Fintech Open Source Community.**
