How-To Guides

Practical guides for common tasks and advanced patterns.

Available Guides

  • Building the Package

    Build the Python package with pre-built UI static files.

  • Configuration & Database Setup

    Database connection configuration, initialization, and migrations.

  • Data Journey

    Complete workflow from contract to runtime validation.

  • Database ERD

    Entity relationship diagrams for the PyCharter schema.

  • Database Operations

    ETL write methods: insert, upsert, replace, truncate_and_load, and more.

  • Custom Extractors

    Build extractors for custom data sources.

  • Custom Transformers

    Create reusable transformation functions.

  • Custom Validators

    Extend validation with custom rules.

  • Domain Models and Lifecycle

    Domain entity contracts with optional lifecycle binding for FSM engines.

  • Database Configuration

    Configure and optimize database backends.

  • Production Deployment

    Deploy PyCharter in production environments.

  • Validation Worker

    Run validation jobs asynchronously with the database-backed worker process.

  • ETL Transformations

    Simple operations, JSONata, and custom Python functions in your transform.yaml.

  • Async Execution Model

    Run pipelines from scripts, FastAPI, Jupyter, Celery, and concurrent contexts.

  • Incremental Extraction

    Watermark-based state tracking with FileStateStore and SqliteStateStore.

  • Testing Pipelines

    Mock extractors, mock loaders, the test harness, fixture files, and assertion helpers.

  • Data Profiling

    Statistical field analysis with DataProfiler: completeness, distributions, and drift.

  • Wiki & Ontology

    Semantic field annotations, concept vocabulary, lineage, and governance workflows.

Quick Reference

Common Patterns

Quality Gate in Pipeline

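# Assumes pipeline, check, schema_id, and thresholds are defined elsewhere.
async def run_with_quality_gate():
    result = await pipeline.run()

    # output_data is the data produced by the run; how you obtain it depends on your setup
    report = check.run(schema_id=schema_id, data=output_data, thresholds=thresholds)

    # Fail the run if any quality threshold was breached
    if not report.passed:
        raise QualityError(f"Quality gate failed: {report.threshold_breaches}")

    return result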
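
The gate above relies on a report that exposes `passed` and `threshold_breaches`. As a rough illustration of that shape only, the sketch below uses hypothetical stand-ins (`QualityReport`, `evaluate_thresholds`, `enforce_gate`, and a local `QualityError`). None of these are PyCharter APIs, and the real report object may look different.

```python
from dataclasses import dataclass, field


class QualityError(Exception):
    """Hypothetical stand-in for the error raised by the gate."""


@dataclass
class QualityReport:
    """Hypothetical report shape with the two attributes the gate reads."""
    threshold_breaches: list = field(default_factory=list)

    @property
    def passed(self) -> bool:
        # The report passes only when no breach was recorded
        return not self.threshold_breaches


def evaluate_thresholds(metrics: dict, thresholds: dict) -> QualityReport:
    """Record a breach for every metric that falls below its minimum threshold."""
    breaches = [
        f"{name}: {metrics.get(name, 0.0):.2f} < {minimum:.2f}"
        for name, minimum in thresholds.items()
        if metrics.get(name, 0.0) < minimum
    ]
    return QualityReport(threshold_breaches=breaches)


def enforce_gate(report: QualityReport) -> None:
    """Raise if the report contains any breach, mirroring the gate above."""
    if not report.passed:
        raise QualityError(f"Quality gate failed: {report.threshold_breaches}")
```

Raising an exception rather than returning a flag means a scheduler or caller sees the run as failed without having to inspect the report.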

Retry with Backoff

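from tenacity import retry, stop_after_attempt, wait_exponential

# Give up after 3 attempts; each wait between attempts is clamped to 4-10 seconds.
# Once attempts are exhausted, tenacity raises RetryError unless reraise=True is passed.
@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=4, max=10))
async def run_pipeline_with_retry():
    return await pipeline.run()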
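
To make the retry behavior concrete, here is a stdlib-only sketch of the same idea: exponential delays clamped to a minimum and maximum, with a fixed number of attempts. `backoff_delay` and `retry_async` are hypothetical helpers written for illustration. This is not tenacity's implementation, and its exact delay schedule may differ.

```python
import asyncio


def backoff_delay(attempt: int, multiplier: float = 1.0,
                  min_wait: float = 4.0, max_wait: float = 10.0) -> float:
    """Exponential delay for a 1-based attempt number, clamped to [min_wait, max_wait]."""
    raw = multiplier * (2 ** (attempt - 1))
    return max(min_wait, min(raw, max_wait))


async def retry_async(func, attempts: int = 3, sleep=asyncio.sleep):
    """Await the coroutine function `func` until it succeeds or attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            return await func()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            await sleep(backoff_delay(attempt))
```

Passing `sleep` as a parameter keeps the helper testable, since a test can substitute a fake that records delays instead of actually waiting.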

Validation Decorator

from pycharter import validate_input

@validate_input("contracts/user.yaml")
def create_user(data: dict) -> dict:
    # data is already validated
    return db.insert(data)
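
For readers curious how a validate-before-call decorator works in general, the following is a minimal sketch. `validate_with` and `require_fields` are hypothetical names invented for this illustration. This is not how `validate_input` is implemented, and it checks only for missing keys rather than a full contract.

```python
import functools


def validate_with(validator):
    """Hypothetical decorator factory: run `validator(data)` before the wrapped function."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(data, *args, **kwargs):
            errors = validator(data)
            if errors:
                raise ValueError(f"Invalid input for {func.__name__}: {errors}")
            return func(data, *args, **kwargs)
        return wrapper
    return decorator


def require_fields(*names):
    """Toy validator: return the list of missing field names (empty means valid)."""
    def check(data: dict) -> list:
        return [name for name in names if name not in data]
    return check


@validate_with(require_fields("id", "email"))
def create_user(data: dict) -> dict:
    # Only reached when validation passed
    return {"created": data["id"]}
```

Because the check runs in the wrapper, the function body can assume well-formed input, which is the property the pattern above depends on.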

Incremental Processing

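# Track state between runs
last_id = load_last_processed_id()

pipeline = Pipeline.from_config_files(
    extract="extract.yaml",
    load="load.yaml",
    variables={"LAST_ID": last_id}
)

# Top-level await works in Jupyter; in a script, put this in an async function
# and start it with asyncio.run().
result = await pipeline.run()

# Keep the previous watermark if the run did not report a new one
new_last_id = result.metadata.get("last_id")
if new_last_id is not None:
    save_last_processed_id(new_last_id)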
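
The snippet above leaves `load_last_processed_id` and `save_last_processed_id` undefined. One possible shape for them, assuming a small JSON file is enough for your state, is sketched below. The file name and the default of 0 are assumptions made for illustration. The Incremental Extraction guide lists FileStateStore and SqliteStateStore for watermark tracking, which are likely the better choice in practice.

```python
import json
from pathlib import Path

STATE_FILE = Path("pipeline_state.json")  # hypothetical location


def load_last_processed_id(path: Path = STATE_FILE, default: int = 0) -> int:
    """Return the stored watermark, or `default` on the first run."""
    if not path.exists():
        return default
    return json.loads(path.read_text()).get("last_id", default)


def save_last_processed_id(last_id: int, path: Path = STATE_FILE) -> None:
    """Persist the watermark via a temp file so a crash cannot leave a partial file."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps({"last_id": last_id}))
    tmp.replace(path)  # rename over the old state file
```

Saving the watermark only after a run succeeds means a failed run is retried from the previous position instead of skipping records.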

Need More Help?