PyCharter
Data Contract Management, ETL Pipelines, and Quality Assurance for Python
PyCharter is a comprehensive data contract management platform for Python that enables you to define, store, version, enforce, and monitor data contracts throughout your data pipelines.
Key Features
- ETL Pipelines
  Build data pipelines with a fluent | operator. 13 built-in extractors cover HTTP, files, databases, cloud storage, streaming (SSE, WebSocket), and messaging (Kafka, RabbitMQ, SQS).
- Data Contracts
  Define formal agreements specifying data structure, quality rules, and governance policies.
- Quality Assurance
  Monitor data quality with metrics, track violations, and set threshold alerts.
- Schema Registry
  Centralized storage for schemas with PostgreSQL, SQLite, MongoDB, or Redis backends.
Quick Example
ETL Pipeline with | Operator
import asyncio

from pycharter import Pipeline, HTTPExtractor, PostgresLoader, Rename, Filter

# Build pipeline with fluent syntax
pipeline = (
    Pipeline(HTTPExtractor(url="https://api.example.com/users"))
    | Rename({"user_name": "name", "user_email": "email"})
    | Filter(lambda r: r.get("active", False))
    | PostgresLoader(connection_string="postgresql://...", table="users")
)

# Run the pipeline
result = asyncio.run(pipeline.run())
print(f"Loaded {result.rows_loaded} rows")
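For readers curious how a `|` chain like the one above can work in Python, the following is a minimal sketch of the general technique (overloading `__or__`), not PyCharter's actual implementation. `MiniPipeline`, `Step`, `rename`, and `keep_if` are hypothetical names invented for this illustration.

```python
from __future__ import annotations

import asyncio
from typing import Any, Callable, Iterable, Optional

Record = dict[str, Any]


class Step:
    """A record-level stage: returns a new record, or None to drop it."""

    def __init__(self, fn: Callable[[Record], Optional[Record]]) -> None:
        self.fn = fn


class MiniPipeline:
    """Hypothetical stand-in for a pipeline composed with `|`."""

    def __init__(self, source: Iterable[Record], steps: tuple[Step, ...] = ()) -> None:
        self.source = source
        self.steps = steps

    def __or__(self, step: Step) -> MiniPipeline:
        # Return a new pipeline so chaining never mutates the left operand.
        return MiniPipeline(self.source, self.steps + (step,))

    async def run(self) -> list[Record]:
        out: list[Record] = []
        for record in self.source:
            current: Optional[Record] = record
            for step in self.steps:
                current = step.fn(current)
                if current is None:  # a step dropped the record
                    break
            if current is not None:
                out.append(current)
        return out


def rename(mapping: dict[str, str]) -> Step:
    """Rename keys found in `mapping`; leave other keys untouched."""
    return Step(lambda r: {mapping.get(k, k): v for k, v in r.items()})


def keep_if(predicate: Callable[[Record], bool]) -> Step:
    """Keep only records for which `predicate` is true."""
    return Step(lambda r: r if predicate(r) else None)


rows = [
    {"user_name": "Alice", "active": True},
    {"user_name": "Bob", "active": False},
]
pipeline = (
    MiniPipeline(rows)
    | rename({"user_name": "name"})
    | keep_if(lambda r: r.get("active", False))
)
result = asyncio.run(pipeline.run())
print(result)
```

Because `__or__` returns a fresh object, a partially built pipeline can be reused as the base for several variants without the variants affecting each other.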
Data Validation
from pycharter import Validator

# Create validator from contract file
validator = Validator.from_file("user_contract.yaml")

# Validate data
result = validator.validate({"name": "Alice", "age": 30, "email": "alice@example.com"})
if result.is_valid:
    print(f"Valid: {result.data}")
else:
    print(f"Errors: {result.errors}")
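The layout of `user_contract.yaml` is not shown here, so the sketch below does not guess at it. Instead it illustrates, in plain Python, the general idea of checking a record against declared field types and returning a result with `is_valid`, `data`, and `errors`. `SketchResult`, `validate_record`, and the dict-based contract are assumptions made for this example and are not part of PyCharter.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class SketchResult:
    """Hypothetical result object mirroring the attributes used above."""

    data: dict[str, Any]
    errors: list[str] = field(default_factory=list)

    @property
    def is_valid(self) -> bool:
        return not self.errors


def validate_record(record: dict[str, Any], required_types: dict[str, type]) -> SketchResult:
    """Check that every declared field is present and has the declared type."""
    errors: list[str] = []
    for name, expected in required_types.items():
        if name not in record:
            errors.append(f"{name}: missing required field")
        elif not isinstance(record[name], expected):
            actual = type(record[name]).__name__
            errors.append(f"{name}: expected {expected.__name__}, got {actual}")
    return SketchResult(data=record, errors=errors)


# Assumed contract: field name -> required Python type.
contract = {"name": str, "age": int, "email": str}

ok = validate_record({"name": "Alice", "age": 30, "email": "alice@example.com"}, contract)
bad = validate_record({"name": "Bob", "age": "thirty"}, contract)

print(ok.is_valid)
print(bad.errors)
```

Collecting all errors before returning, rather than stopping at the first one, lets a caller report every problem with a record in a single pass.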
Quality Check
from pycharter import QualityCheck, QualityThresholds

# `store` and `records` are not defined in this snippet: `store` is presumably
# a configured metadata store and `records` the data to check.

# Run quality check with thresholds
check = QualityCheck(store=store)
report = check.run(
    schema_id="user_schema_v1",
    data=records,
    thresholds=QualityThresholds(min_overall_score=95.0),
)
print(f"Quality Score: {report.quality_score.overall_score}/100")
print(f"Passed: {report.passed}")
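How PyCharter computes the overall score is not described here. As one plausible reading of a 0 to 100 score checked against `min_overall_score`, the sketch below scores a batch as the percentage of records that pass a rule. `SketchReport`, `score_records`, and the scoring formula itself are assumptions for illustration only.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class SketchReport:
    """Hypothetical report: counts, a 0-100 score, and a pass/fail flag."""

    total: int
    violations: int
    overall_score: float
    passed: bool


def score_records(
    records: list[dict[str, Any]],
    is_valid: Callable[[dict[str, Any]], bool],
    min_overall_score: float,
) -> SketchReport:
    """Score = percentage of records that satisfy `is_valid` (assumed formula)."""
    total = len(records)
    violations = sum(1 for r in records if not is_valid(r))
    # An empty batch has no violations, so treat it as a perfect score.
    score = 100.0 if total == 0 else 100.0 * (total - violations) / total
    return SketchReport(total, violations, score, score >= min_overall_score)


sample = [{"age": 30}, {"age": 41}, {"age": -5}, {"age": 19}]
sketch_report = score_records(sample, lambda r: r["age"] >= 0, min_overall_score=95.0)

print(f"Quality Score: {sketch_report.overall_score}/100")
print(f"Passed: {sketch_report.passed}")
```

With one violation in four records the score is 75.0, which falls below the 95.0 threshold, so `passed` is `False`.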
Installation
Architecture Overview
graph TB
    subgraph Input["Data Sources"]
        HTTP[HTTP/API]
        Files[Files]
        DB[(Database)]
        Cloud[Cloud Storage]
        Stream[SSE / WebSocket]
        MQ[Kafka / RabbitMQ / SQS]
    end

    subgraph PyCharter["PyCharter"]
        Extract[Extractors]
        Transform[Transformers]
        Load[Loaders]
        Validate[Validator]
        Quality[Quality Check]
        Store[(Metadata Store)]
    end

    subgraph Output["Destinations"]
        PG[(PostgreSQL)]
        File[Files]
        S3[Cloud Storage]
    end

    HTTP --> Extract
    Files --> Extract
    DB --> Extract
    Cloud --> Extract
    Stream --> Extract
    MQ --> Extract

    Extract --> Transform
    Transform --> Validate
    Validate --> Load
    Validate --> Quality
    Store --> Validate
    Quality --> Store

    Load --> PG
    Load --> File
    Load --> S3
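The flow in the diagram (extract, transform, validate, then fan out to loading and quality tracking) can be sketched in plain Python as follows. This is an illustration of the data flow only: `run_flow` and its parameters are hypothetical, and the choice to withhold invalid records from the destination is an assumption that the diagram does not state.

```python
from __future__ import annotations

from typing import Any, Callable, Iterable

Record = dict[str, Any]


def run_flow(
    source: Iterable[Record],
    transform: Callable[[Record], Record],
    is_valid: Callable[[Record], bool],
    sink: list[Record],
) -> dict[str, int]:
    """Push records through extract -> transform -> validate -> load."""
    stats = {"seen": 0, "loaded": 0, "rejected": 0}
    for raw in source:                  # Extract
        record = transform(raw)         # Transform
        stats["seen"] += 1
        if is_valid(record):            # Validate
            sink.append(record)         # Load (assumed: valid records only)
            stats["loaded"] += 1
        else:
            stats["rejected"] += 1      # Counted for quality tracking
    return stats


destination: list[Record] = []
stats = run_flow(
    source=[{"user_name": "Alice"}, {"user_name": ""}],
    transform=lambda r: {"name": r["user_name"]},
    is_valid=lambda r: bool(r["name"]),
    sink=destination,
)
print(stats)
print(destination)
```

The returned counts play the role of the quality branch in the diagram: they are produced by the same validation pass that decides what gets loaded.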
Next Steps
- Get Started
  Install PyCharter and run your first pipeline in minutes.
- Learn
  Follow step-by-step tutorials for each major feature.
- API Reference
  Detailed documentation for all classes and functions.
- Contribute
  Help improve PyCharter by contributing code or documentation.