How-To Guides¶
Practical guides for common tasks and advanced patterns.
Available Guides¶
- Building the Package: Build the Python package with pre-built UI static files.
- Configuration & Database Setup: Database connection configuration, initialization, and migrations.
- Data Journey: Complete workflow from contract to runtime validation.
- Database ERD: Entity relationship diagrams for the PyCharter schema.
- Database Operations: ETL write methods such as insert, upsert, replace, and truncate_and_load.
- Custom Extractors: Build extractors for custom data sources.
- Custom Transformers: Create reusable transformation functions.
- Custom Validators: Extend validation with custom rules.
- Domain Models and Lifecycle: Domain entity contracts with optional lifecycle binding for FSM engines.
- Database Configuration: Configure and optimize database backends.
- Production Deployment: Deploy PyCharter in production environments.
- Validation Worker: Run validation jobs asynchronously with the database-backed worker process.
- ETL Transformations: Simple operations, JSONata, and custom Python functions in your transform.yaml.
- Async Execution Model: Run pipelines from scripts, FastAPI, Jupyter, Celery, and concurrent contexts.
- Incremental Extraction: Watermark-based state tracking with FileStateStore and SqliteStateStore.
- Testing Pipelines: Mock extractors, mock loaders, the test harness, fixture files, and assertion helpers.
- Data Profiling: Statistical field analysis with DataProfiler (completeness, distributions, and drift).
- Wiki & Ontology: Semantic field annotations, concept vocabulary, lineage, and governance workflows.
Quick Reference¶
Common Patterns¶
Quality Gate in Pipeline¶
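async def run_with_quality_gate():
    # `pipeline`, `check`, `schema_id`, `output_data`, and `thresholds` are
    # assumed to be defined by the surrounding application code.
    result = await pipeline.run()
    report = check.run(schema_id=schema_id, data=output_data, thresholds=thresholds)
    if not report.passed:
        # Fail the run so that bad data does not flow downstream.
        raise QualityError(f"Quality gate failed: {report.threshold_breaches}")
    return result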
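The snippet above leaves `QualityError`, `check`, and the report object undefined. As a rough, self-contained sketch of the same gate pattern, the following uses hypothetical stand-ins: `QualityReport`, `run_quality_check`, and a single completeness threshold are illustrative names and are not PyCharter APIs. The `run_pipeline` argument stands in for `pipeline.run`.

```python
import asyncio
from dataclasses import dataclass, field


class QualityError(Exception):
    """Raised when output data breaches a quality threshold (hypothetical)."""


@dataclass
class QualityReport:
    # Hypothetical stand-in for the report object used in the snippet above.
    threshold_breaches: list = field(default_factory=list)

    @property
    def passed(self) -> bool:
        return not self.threshold_breaches


def run_quality_check(rows, min_completeness):
    """Flag every field whose share of non-null values is below the threshold."""
    report = QualityReport()
    fields = {key for row in rows for key in row}
    for name in sorted(fields):
        filled = sum(1 for row in rows if row.get(name) is not None)
        completeness = filled / len(rows)
        if completeness < min_completeness:
            report.threshold_breaches.append(
                f"{name}: completeness {completeness:.2f}"
            )
    return report


async def run_with_quality_gate(run_pipeline, min_completeness=0.9):
    rows = await run_pipeline()  # stand-in for `await pipeline.run()`
    report = run_quality_check(rows, min_completeness)
    if not report.passed:
        raise QualityError(f"Quality gate failed: {report.threshold_breaches}")
    return rows
```

Raising an exception, rather than returning a flag, means a scheduler or caller that ignores the return value still sees the run as failed.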
Retry with Backoff¶
from tenacity import retry, stop_after_attempt, wait_exponential

# Up to 3 attempts, with an exponential wait clamped between 4 and 10 seconds.
@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=4, max=10))
async def run_pipeline_with_retry():
    return await pipeline.run()
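Where adding a dependency is not an option, the same idea can be approximated with the standard library. The sketch below is a hand-rolled helper, not tenacity, and its delay schedule is not guaranteed to match `wait_exponential`; `retry_async` and its parameters are hypothetical names chosen for illustration.

```python
import asyncio


async def retry_async(func, attempts=3, base_delay=1.0, max_delay=10.0):
    """Call `func` until it succeeds or `attempts` calls have failed."""
    for attempt in range(1, attempts + 1):
        try:
            return await func()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # The delay doubles after each failure, capped at max_delay.
            delay = min(base_delay * 2 ** (attempt - 1), max_delay)
            await asyncio.sleep(delay)
```

It could be called as `await retry_async(pipeline.run)`. Catching bare `Exception` retries every failure, including permanent ones such as a bad configuration, so in practice the `except` clause is usually narrowed to transient error types.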
Validation Decorator¶
from pycharter import validate_input

@validate_input("contracts/user.yaml")
def create_user(data: dict) -> dict:
    # data is already validated
    return db.insert(data)
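To make the mechanism concrete, here is a minimal sketch of how a validating decorator can wrap a function. It is not how `validate_input` is implemented: `validate_required` is a hypothetical decorator that only checks for the presence of keys, whereas a contract-based validator would also check types and constraints.

```python
import functools


def validate_required(*required_fields):
    """Hypothetical decorator: reject dicts that lack any required field."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(data, *args, **kwargs):
            missing = [name for name in required_fields if name not in data]
            if missing:
                raise ValueError(f"Missing required fields: {missing}")
            return func(data, *args, **kwargs)
        return wrapper
    return decorator


@validate_required("id", "email")
def create_user(data: dict) -> dict:
    # Only reached when the check above passed.
    return {"created": data["id"]}
```

Because the check runs before the wrapped function body, the body can assume well-formed input and stay free of defensive checks.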
Incremental Processing¶
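async def run_incremental():
    # Track state between runs
    last_id = load_last_processed_id()
    pipeline = Pipeline.from_config_files(
        extract="extract.yaml",
        load="load.yaml",
        variables={"LAST_ID": last_id},
    )
    # `await` must run inside a coroutine (or a notebook cell).
    result = await pipeline.run()
    # Only advance the watermark when the run reports a new one.
    new_last_id = result.metadata.get("last_id")
    if new_last_id is not None:
        save_last_processed_id(new_last_id)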
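The `load_last_processed_id` and `save_last_processed_id` helpers are not defined in the snippet above. One possible shape for them, assuming an integer watermark kept in a local JSON file, is sketched below; the file name and the `path` and `default` parameters are hypothetical. The Incremental Extraction guide covers the library's own state stores, which are likely the better choice for real pipelines.

```python
import json
from pathlib import Path

STATE_FILE = Path("pipeline_state.json")  # hypothetical location


def load_last_processed_id(path: Path = STATE_FILE, default: int = 0) -> int:
    """Return the stored watermark, or `default` on the first run."""
    if not path.exists():
        return default
    return json.loads(path.read_text())["last_id"]


def save_last_processed_id(last_id: int, path: Path = STATE_FILE) -> None:
    """Write the watermark to a temp file, then swap it in with one rename."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps({"last_id": last_id}))
    tmp.replace(path)
```

Writing to a temporary file and renaming it means a crash mid-write leaves the previous watermark in place instead of a half-written file.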
Need More Help?¶
- Check the API Reference for detailed documentation
- Search GitHub Issues
- Ask on GitHub Discussions