API Reference

Complete reference documentation for PyCharter's Python API.

Module Overview

pycharter
├── Pipeline              # ETL pipeline orchestration
├── Validator             # Data validation
├── QualityCheck          # Quality monitoring
├── etl_generator/        # ETL components
│   ├── extractors/       # Data extraction
│   ├── transformers/     # Data transformation
│   ├── loaders/          # Data loading
│   ├── state             # Incremental extraction state stores
│   └── testing           # Mock components and test harness
├── contract_parser/      # Contract parsing
├── contract_builder/     # Contract building
├── metadata_store/       # Schema registry
├── schema_evolution/     # Schema versioning
├── pydantic_generator/   # Model generation
├── json_schema_converter/ # Schema conversion
├── docs_generator/       # Contract documentation generation
├── domain/               # Lifecycle binding (FSM integration)
├── wiki/                 # Ontology, knowledge graph, governance
└── shared/               # Utilities and errors

Core Classes

Class         Description
Pipeline      ETL pipeline orchestration
Validator     Data validation
QualityCheck  Quality monitoring

ETL Components

Module        Description
Extractors    HTTP, File, Database, Cloud
Transformers  Rename, Filter, AddField, etc.
Loaders       Postgres, File, Cloud

Contract Management

Module            Description
Contract Parser   Parse contract files
Contract Builder  Build contracts

Storage

Module            Description
Metadata Store    Schema registry
Schema Evolution  Versioning & compatibility

Utilities & Extensions

Module                 Description
Pydantic Generator     Generate Pydantic models from schemas
JSON Schema Converter  Convert Pydantic ↔ JSON Schema
Docs Generator         Generate Markdown docs from contracts
Domain                 Lifecycle binding for FSM engines
Wiki & Ontology        Semantic annotations, knowledge graph, governance
Testing Framework      Mock components and pipeline test harness
Errors                 Exception hierarchy

Import Patterns

# Core classes
from pycharter import Pipeline, Validator, QualityCheck

# ETL components
from pycharter import (
    HTTPExtractor, FileExtractor, DatabaseExtractor, CloudStorageExtractor,
    Rename, Filter, AddField, Drop, Select, Convert, CustomFunction,
    PostgresLoader, FileLoader, CloudStorageLoader,
)

# Metadata stores
from pycharter import (
    InMemoryMetadataStore,
    SQLiteMetadataStore,
    PostgresMetadataStore,
    MongoDBMetadataStore,
    RedisMetadataStore,
)

# Convenience functions
from pycharter import (
    from_dict, from_file, from_json,
    to_dict, to_file, to_json,
    validate, validate_batch,
    parse_contract_file, build_contract,
)

# Errors
from pycharter.shared.errors import (
    PyCharterError,
    ConfigError,
    ConfigValidationError,
    ExpressionError,
)
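
An exception hierarchy is usually handled by catching the most specific error first and falling back to the base class. The sketch below is self-contained and does not import PyCharter: it defines stand-in classes that reuse the names imported above. The inheritance shown (every error deriving from PyCharterError, and ConfigValidationError deriving from ConfigError) is an assumption made for illustration, and load_config is a hypothetical helper, not part of the library.

```python
# Stand-in classes that mirror the names exported by pycharter.shared.errors.
# ASSUMPTION: the inheritance relationships below are illustrative only.
class PyCharterError(Exception):
    """Assumed base class for all library errors."""


class ConfigError(PyCharterError):
    """Assumed: raised for configuration problems."""


class ConfigValidationError(ConfigError):
    """Assumed: raised when a configuration fails validation."""


class ExpressionError(PyCharterError):
    """Assumed: raised when an expression cannot be evaluated."""


def load_config(config: dict) -> dict:
    """Hypothetical helper that rejects a config without a 'name' key."""
    if "name" not in config:
        raise ConfigValidationError("missing required key: name")
    return config


def describe_failure(config: dict) -> str:
    """Return a short label describing how loading the config went."""
    try:
        load_config(config)
    except ConfigValidationError as exc:
        # Most specific handler first
        return f"invalid config: {exc}"
    except PyCharterError as exc:
        # Catch-all for any other error in the assumed hierarchy
        return f"pycharter error: {exc}"
    return "ok"
```

Ordering the handlers from specific to general matters: if the PyCharterError clause came first, it would also catch ConfigValidationError and the specific branch would never run.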

Type Annotations

PyCharter is fully typed and ships a py.typed marker, so type checkers pick up its annotations:

from pycharter import Validator, ValidationResult

def process_data(validator: Validator, data: dict) -> ValidationResult:
    return validator.validate(data)
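
When a typed helper like the one above needs to be unit-tested without the real library, a structural Protocol is one option. The following is a minimal sketch, not PyCharter's API: SupportsValidate, StubValidator, and the dict returned by the stub are all hypothetical stand-ins, and the only detail taken from this page is that a validator exposes a validate(data) method.

```python
from typing import Any, Protocol


class SupportsValidate(Protocol):
    """Hypothetical structural type: anything with a validate(data) method."""

    def validate(self, data: dict) -> Any: ...


class StubValidator:
    """Hypothetical test double; the result shape here is invented."""

    def validate(self, data: dict) -> dict:
        # Treat a record as valid when it carries an 'id' key
        return {"ok": "id" in data}


def process_all(validator: SupportsValidate, records: list[dict]) -> list[Any]:
    """Validate each record and collect the results in input order."""
    return [validator.validate(record) for record in records]


results = process_all(StubValidator(), [{"id": 1}, {"name": "no id"}])
```

Because Protocol matching is structural, a type checker should accept any object with a compatible validate method here, without the stub inheriting from a library class.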

Async Support

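All pipeline operations are async, so pipeline.run() must be awaited or driven by an event loop:

import asyncio
from pycharter import Pipeline

# `pipeline` is assumed to be an already-configured Pipeline instance.

# From a script: asyncio.run() starts an event loop and runs the pipeline to completion
result = asyncio.run(pipeline.run())

# From an async function: await the coroutine directly
async def main():
    result = await pipeline.run()
    return result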

See the Async Execution Model guide for detailed guidance on running pipelines from scripts, FastAPI, notebooks, and Celery.
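
Because run() is a coroutine, several pipelines can in principle be awaited concurrently with asyncio.gather. The sketch below illustrates the pattern with a stand-in class rather than the real Pipeline: StubPipeline, its constructor arguments, and the dict it returns are hypothetical, and only the async run() shape is taken from this page.

```python
import asyncio


class StubPipeline:
    """Hypothetical stand-in; only the async run() method mirrors the docs."""

    def __init__(self, name: str, rows: int) -> None:
        self.name = name
        self.rows = rows

    async def run(self) -> dict:
        # Yield control to the event loop, as real I/O would
        await asyncio.sleep(0)
        return {"pipeline": self.name, "rows": self.rows}


async def run_all(pipelines: list[StubPipeline]) -> list[dict]:
    # gather() runs the coroutines concurrently and keeps input order
    return await asyncio.gather(*(p.run() for p in pipelines))


# From a script: a single asyncio.run() call drives every pipeline
results = asyncio.run(run_all([StubPipeline("orders", 3), StubPipeline("users", 5)]))
```

Note that asyncio.run() raises a RuntimeError when called from a thread that already has a running event loop, which is why async frameworks and notebooks typically await the coroutine instead.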