Metadata-Version: 2.4
Name: aptoro
Version: 0.5.0
Summary: A minimal, functional Python ETL library for reading, validating, and transforming data using YAML schemas
Project-URL: Homepage, https://github.com/plataformasindigenas/aptoro
Project-URL: Documentation, https://github.com/plataformasindigenas/aptoro#readme
Project-URL: Repository, https://github.com/plataformasindigenas/aptoro
Author: Plataformas Indígenas
License-Expression: GPL-3.0-or-later
License-File: LICENSE
Keywords: data,etl,pydantic,schema,validation,yaml
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Text Processing :: Linguistic
Classifier: Typing :: Typed
Requires-Python: >=3.11
Requires-Dist: pydantic>=2.0
Requires-Dist: pyyaml>=6.0
Provides-Extra: dev
Requires-Dist: build; extra == 'dev'
Requires-Dist: mypy>=1.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.0; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Requires-Dist: ruff>=0.1; extra == 'dev'
Requires-Dist: twine; extra == 'dev'
Requires-Dist: types-pyyaml; extra == 'dev'
Provides-Extra: excel
Requires-Dist: openpyxl>=3.0; extra == 'excel'
Provides-Extra: sheets
Requires-Dist: google-auth>=2.0; extra == 'sheets'
Requires-Dist: gspread>=5.0; extra == 'sheets'
Provides-Extra: sql
Requires-Dist: sqlalchemy>=2.0; extra == 'sql'
Description-Content-Type: text/markdown

# Aptoro

[![PyPI version](https://img.shields.io/pypi/v/aptoro.svg)](https://pypi.org/project/aptoro/)
[![Python versions](https://img.shields.io/pypi/pyversions/aptoro.svg)](https://pypi.org/project/aptoro/)
[![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)
[![Code style: ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)

**Aptoro** is a Xavante word for *"preparing the arrows for hunting"*.

It is a minimal, functional Python ETL library for reading, validating, and transforming data using YAML schemas. Designed for simplicity and correctness, it bridges the gap between raw data files (CSV, JSON, YAML, TOML, Markdown front matter) and typed, validated Python objects.

## Features

- **Schema-First:** Define your data model in simple, readable YAML.
- **Strict Validation:** Ensures data quality with type checks, constraints, and range validation.
- **Rich Types:** Built-in support for `datetime` (ISO 8601), `url`, `file`, `dict`, nested objects, and standard primitives.
- **Multi-Format:** CSV, JSON, YAML, TOML, and Markdown front-matter (Jekyll/Hugo/Obsidian style).
- **Glob Patterns:** Read multiple files at once with `read("data/*.md")`.
- **Functional API:** Pure functions and immutable dataclasses make pipelines predictable.
- **Zero Boilerplate:** No complex class definitions; just load your schema and go.
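
To make the "Functional API" point concrete, here is a minimal sketch of the style in plain Python: a frozen dataclass plus a pure function that returns a new value instead of mutating its input. The `Entry` class and `bump_frequency` function are hypothetical names chosen for illustration; they are not part of Aptoro's API.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Entry:
    """Hypothetical record type; frozen, so instances cannot be mutated."""

    lemma: str
    frequency: int = 0


def bump_frequency(entry: Entry) -> Entry:
    """Pure transformation: build a new Entry and leave the input untouched."""
    return replace(entry, frequency=entry.frequency + 1)


original = Entry(lemma="water")
updated = bump_frequency(original)
```

Because `original` is never modified, each pipeline step can be reasoned about and tested in isolation.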

## Installation

```bash
pip install aptoro
```

## CLI Usage

Aptoro provides a command-line interface for validating data files directly.

```bash
# Validate a CSV file against a schema
aptoro validate data.csv --schema schema.yaml

# Explicitly specify format
aptoro validate data.txt --schema schema.yaml --format json
```

## Quick Start

```python
from aptoro import load, load_schema, read, validate, to_json

# All-in-one: read + validate
entries = load(source="data.csv", schema="schema.yaml")

# Or as a step-by-step pipeline:
schema = load_schema("schema.yaml")
data = read("data.csv")
entries = validate(data, schema)

# Export to JSON
json_str = to_json(entries)

# Export with embedded metadata (self-describing files)
json_meta = to_json(entries, schema=schema, include_meta=True)
```

## Documentation

For full details on the schema language, advanced validation, and API reference, see the [Documentation](DOCS.md).

## Schema Language

Define your data schema in YAML:

```yaml
name: lexicon_entry
description: Dictionary entries

fields:
  id: str
  lemma: str
  pos: str[noun|verb|adj|adv]     # Constrained values (Enum)
  definition: str
  translation: str?               # Optional field
  examples: list[str]?            # Optional list
  frequency: int = 0              # Default value
  created_at: datetime?           # Optional ISO 8601 datetime
  source_url: url?                # Optional URL
```

### Type Syntax

- **Basic types:** `str`, `int`, `float`, `bool`
- **Specialized types:** `url`, `file`, `datetime`
- **Optional:** `str?`, `int?`, `url?`, `datetime?`
- **Default value:** `str = "default"`, `int = 0`, `list[str] = []`, `dict[str, int] = {}`
- **Constrained:** `str[a|b|c]`
- **Ranges:** `int[0..120]`, `float[0.0..1.0]`
- **Lists:** `list[str]`, `list[int]`
- **Dicts:** `dict`, `dict[str, int]`, `dict[str]`
- **Nested objects:** `type: object` with a `fields` block

See [DOCS.md](DOCS.md) for full syntax, including inheritance, nested structures, and front-matter reading.
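
As a rough illustration of how the type strings above decompose, the following sketch parses a subset of the syntax (optional marker, default value, enum choices, numeric ranges) using only the standard library. It is not Aptoro's parser: `FieldSpec` and `parse_field` are hypothetical names, defaults are kept as raw text, and nested objects and inheritance are not handled.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldSpec:
    """Hypothetical parsed form of one type string (illustration only)."""

    base: str                                      # e.g. "str", "int", "list[str]"
    optional: bool = False                         # trailing "?"
    default: Optional[str] = None                  # raw text after "=", left unparsed
    choices: tuple[str, ...] = ()                  # from "str[a|b|c]"
    bounds: Optional[tuple[float, float]] = None   # from "int[0..120]"


# Brackets on a primitive carry a constraint; list[...] and dict[...] do not match.
_CONSTRAINED = re.compile(r"^(?P<base>str|int|float)\[(?P<body>[^\]]+)\]$")


def parse_field(spec: str) -> FieldSpec:
    text, default = spec.strip(), None
    if "=" in text:                                # "int = 0" -> ("int", "0")
        text, default = (part.strip() for part in text.split("=", 1))
    optional = text.endswith("?")
    if optional:
        text = text[:-1]
    match = _CONSTRAINED.match(text)
    if match:
        base, body = match["base"], match["body"]
        if ".." in body:                           # numeric range, e.g. "0..120"
            low, high = body.split("..", 1)
            return FieldSpec(base, optional, default, bounds=(float(low), float(high)))
        return FieldSpec(base, optional, default, choices=tuple(body.split("|")))
    return FieldSpec(text, optional, default)
```

For example, `parse_field("str[noun|verb|adj|adv]")` yields a spec whose `choices` holds the four allowed values, while `parse_field("list[str]?")` keeps `list[str]` as the base type and sets `optional`.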

## Supported Formats

- **CSV** (auto-detects types)
- **JSON**
- **YAML**
- **TOML**
- **Markdown front-matter** (`.md` files with YAML front matter)
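
For readers unfamiliar with front matter: it is a YAML block delimited by `---` lines at the very top of a Markdown file. The sketch below shows one way to separate that block from the body with the standard library only. `split_front_matter` is a hypothetical helper written for this illustration, not a function exported by Aptoro, and how Aptoro itself handles edge cases is not shown here.

```python
def split_front_matter(text: str) -> tuple[str, str]:
    """Return (front_matter, body); front_matter is "" when none is present."""
    lines = text.splitlines()
    # Front matter must start on the very first line with a "---" delimiter.
    if not lines or lines[0].strip() != "---":
        return "", text
    for index in range(1, len(lines)):
        if lines[index].strip() == "---":          # closing delimiter found
            front = "\n".join(lines[1:index])
            body = "\n".join(lines[index + 1:])
            return front, body
    return "", text  # opening delimiter never closed: treat everything as body


document = "---\nid: e1\nlemma: water\n---\nBody text.\n"
front, body = split_front_matter(document)
```

The extracted `front` string is ordinary YAML, so it could then be handed to a YAML parser such as `yaml.safe_load` from PyYAML.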

## License

GNU General Public License v3 or later (GPL-3.0-or-later). See the `LICENSE` file.
