
PyCharter

Data Contract Management, ETL Pipelines, and Quality Assurance for Python


PyCharter is a comprehensive data contract management platform for Python that enables you to define, store, version, enforce, and monitor data contracts throughout your data pipelines.

Key Features

  • ETL Pipelines


    Build data pipelines by chaining steps with the | operator. Includes 13 built-in extractors covering HTTP, files, databases, cloud storage, streaming (SSE, WebSocket), and messaging (Kafka, RabbitMQ, SQS).

    ETL Tutorial

  • Data Contracts


    Define formal agreements specifying data structure, quality rules, and governance policies.

    Contracts Tutorial

  • Quality Assurance


    Monitor data quality with metrics, track violations, and set threshold alerts.

    Quality Tutorial

  • Schema Registry


    Store schemas centrally in a PostgreSQL, SQLite, MongoDB, or Redis backend.

    Metadata Tutorial

Quick Example

ETL Pipeline with | Operator

import asyncio
from pycharter import Pipeline, HTTPExtractor, PostgresLoader, Rename, Filter

# Build pipeline with fluent syntax
pipeline = (
    Pipeline(HTTPExtractor(url="https://api.example.com/users"))
    | Rename({"user_name": "name", "user_email": "email"})
    | Filter(lambda r: r.get("active", False))
    | PostgresLoader(connection_string="postgresql://...", table="users")
)

# Run the pipeline
result = asyncio.run(pipeline.run())
print(f"Loaded {result.rows_loaded} rows")
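
The chaining syntax above relies on Python's operator overloading. The following is a minimal sketch of how a `|` operator can compose pipeline steps, not PyCharter's actual implementation: the `Step` class and the `rename` and `keep_if` helpers are hypothetical names introduced only for illustration, and the sketch runs synchronously on in-memory records.

```python
from __future__ import annotations

from typing import Callable, Iterable, Iterator

Record = dict


class Step:
    """A hypothetical pipeline stage: maps an iterable of records to an iterator."""

    def __init__(self, fn: Callable[[Iterable[Record]], Iterator[Record]]):
        self.fn = fn

    def __or__(self, other: Step) -> Step:
        # `a | b` builds a new Step that feeds a's output into b.
        return Step(lambda records: other.fn(self.fn(records)))

    def run(self, records: Iterable[Record]) -> list[Record]:
        # Materialize the lazily evaluated chain into a list.
        return list(self.fn(records))


def rename(mapping: dict[str, str]) -> Step:
    """Rename keys found in `mapping`; other keys pass through unchanged."""
    return Step(
        lambda records: ({mapping.get(k, k): v for k, v in r.items()} for r in records)
    )


def keep_if(predicate: Callable[[Record], bool]) -> Step:
    """Drop records for which `predicate` returns a falsy value."""
    return Step(lambda records: (r for r in records if predicate(r)))


pipeline = rename({"user_name": "name"}) | keep_if(lambda r: r.get("active", False))

rows = pipeline.run([
    {"user_name": "Alice", "active": True},
    {"user_name": "Bob", "active": False},
])
print(rows)
```

Because each step wraps a generator, records flow through the chain one at a time, and nothing is processed until `run` is called.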

Data Validation

from pycharter import Validator

# Create validator from contract file
validator = Validator.from_file("user_contract.yaml")

# Validate data
result = validator.validate({"name": "Alice", "age": 30, "email": "alice@example.com"})

if result.is_valid:
    print(f"Valid: {result.data}")
else:
    print(f"Errors: {result.errors}")

Quality Check

from pycharter import QualityCheck, QualityThresholds

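# `store` (a configured metadata store) and `records` (the data to check)
# are assumed to be defined earlier; this snippet does not create them.
# Run quality check with thresholds
check = QualityCheck(store=store)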
report = check.run(
    schema_id="user_schema_v1",
    data=records,
    thresholds=QualityThresholds(min_overall_score=95.0)
)

print(f"Quality Score: {report.quality_score.overall_score}/100")
print(f"Passed: {report.passed}")
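
To make the threshold idea concrete, here is a minimal sketch of one way a batch could be scored and compared against a minimum score. It assumes the overall score is simply the percentage of records that pass a check. How PyCharter actually computes its score is not described here, and `Report` and `score_records` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Report:
    overall_score: float  # percentage of records that passed, 0 to 100
    passed: bool          # True when the score meets the threshold


def score_records(records, is_valid, min_overall_score=95.0):
    """Score a batch as the percentage of records accepted by `is_valid`."""
    if not records:
        # Assumption: an empty batch contains no violations, so it scores 100.
        return Report(100.0, True)
    ok = sum(1 for r in records if is_valid(r))
    score = 100.0 * ok / len(records)
    return Report(score, score >= min_overall_score)


records = [{"age": 30}, {"age": 41}, {"age": -5}, {"age": 27}]
report = score_records(records, lambda r: r["age"] >= 0)

print(f"Quality Score: {report.overall_score}/100")
print(f"Passed: {report.passed}")
```

With three of four records valid, the batch scores 75.0 and fails the default 95.0 threshold.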

Installation

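# Core package
pip install pycharter

# With optional extras (quoted so shells such as zsh do not expand the brackets)
pip install "pycharter[api]"
pip install "pycharter[ui]"
pip install "pycharter[api,ui,etl]"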

Architecture Overview

graph TB
    subgraph Input["Data Sources"]
        HTTP[HTTP/API]
        Files[Files]
        DB[(Database)]
        Cloud[Cloud Storage]
        Stream[SSE / WebSocket]
        MQ[Kafka / RabbitMQ / SQS]
    end

    subgraph PyCharter["PyCharter"]
        Extract[Extractors]
        Transform[Transformers]
        Load[Loaders]
        Validate[Validator]
        Quality[Quality Check]
        Store[(Metadata Store)]
    end

    subgraph Output["Destinations"]
        PG[(PostgreSQL)]
        File[Files]
        S3[Cloud Storage]
    end

    HTTP --> Extract
    Files --> Extract
    DB --> Extract
    Cloud --> Extract
    Stream --> Extract
    MQ --> Extract

    Extract --> Transform
    Transform --> Validate
    Validate --> Load
    Validate --> Quality

    Store --> Validate
    Quality --> Store

    Load --> PG
    Load --> File
    Load --> S3
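
The data flow in the diagram can be sketched in plain Python. This is an illustrative outline only: it assumes that records failing validation are collected as violations rather than loaded, which the diagram does not state explicitly, and `run_flow` and `validate_user` are hypothetical names that are not part of PyCharter.

```python
def run_flow(source, transform, validate, load):
    """Extract, transform, and validate; load valid records and collect violations."""
    violations = []
    loaded = 0
    for raw in source:
        record = transform(raw)
        errors = validate(record)  # list of error strings; empty means valid
        if errors:
            # Assumption: invalid records are tracked for quality reporting, not loaded.
            violations.append({"record": record, "errors": errors})
        else:
            load(record)
            loaded += 1
    return loaded, violations


def validate_user(record):
    """Return a list of human-readable errors for a user record."""
    errors = []
    if not record.get("name"):
        errors.append("name is required")
    if "@" not in record.get("email", ""):
        errors.append("email must contain '@'")
    return errors


destination = []  # stands in for a real loader target
loaded, violations = run_flow(
    source=[
        {"name": " Alice ", "email": "alice@example.com"},
        {"name": "", "email": "not-an-email"},
    ],
    transform=lambda r: {**r, "name": r["name"].strip()},
    validate=validate_user,
    load=destination.append,
)
print(loaded, len(violations))
```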

Next Steps

  • Get Started


    Install PyCharter and run your first pipeline in minutes.

    Installation

  • Learn


    Follow step-by-step tutorials for each major feature.

    Tutorials

  • API Reference


    Detailed documentation for all classes and functions.

    API Reference

  • Contribute


    Help improve PyCharter by contributing code or documentation.

    Contributing