PyCharter Command & Function Reference

Quick reference guide for all PyCharter CLI commands and Python functions.

CLI Commands

Database Management

| Command | Description | Example Usage |
| --- | --- | --- |
| pycharter db init [database_url] | Initialize database schema from scratch | pycharter db init postgresql://user:pass@localhost/db |
| pycharter db upgrade [database_url] | Upgrade database to latest revision | pycharter db upgrade --revision head |
| pycharter db downgrade [database_url] | Downgrade database to previous revision | pycharter db downgrade --revision -1 |
| pycharter db current [database_url] | Show current database revision | pycharter db current |
| pycharter db history [database_url] | Show migration history | pycharter db history |
| pycharter db stamp [revision] [database_url] | Stamp database with revision without running migrations | pycharter db stamp head |
| pycharter db seed [seed_dir] [database_url] | Seed database with initial data from YAML files | pycharter db seed data/seed |
| pycharter db truncate [database_url] | Truncate all PyCharter tables (clear all data) | pycharter db truncate --force |

API Server

| Command | Description | Example Usage |
| --- | --- | --- |
| pycharter api | Start the API server (default port 8002) | pycharter api --host 0.0.0.0 --port 8002 |
| pycharter api --reload | Start API server with auto-reload | pycharter api --reload |
| pycharter api --no-reload | Start API server without auto-reload | pycharter api --no-reload |

UI Server

| Command | Description | Example Usage |
| --- | --- | --- |
| pycharter ui serve | Serve the built UI (default port 3002) | pycharter ui serve --port 3002 |
| pycharter ui dev | Run UI development server | pycharter ui dev --api-url http://localhost:8002 |
| pycharter ui build | Build UI for production | pycharter ui build |

Documentation (MkDocs)

| Command | Description | Example Usage |
| --- | --- | --- |
| pycharter docs serve | Serve docs locally (default: http://127.0.0.1:5002) | pycharter docs serve --port 5002 |
| pycharter docs build | Build static site to site/ | pycharter docs build |

Validation Worker

| Command | Description | Example Usage |
| --- | --- | --- |
| pycharter worker start | Run the validation worker (polls DB for jobs) | pycharter worker start |
| pycharter worker start --concurrency N | Set number of poller tasks | pycharter worker start --concurrency 5 |
| pycharter worker start --poll-interval MS | Poll interval in milliseconds | pycharter worker start --poll-interval 1000 |
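
The way --concurrency and --poll-interval interact can be pictured with a generic polling loop. The following is only a sketch of the pattern, not PyCharter's worker: fetch_job, handle_job, poller, and run_worker are hypothetical names, and the real worker reads its jobs from the database.

```python
import asyncio


async def poller(fetch_job, handle_job, poll_interval_ms, stop):
    """Poll for jobs until `stop` is set, sleeping whenever no job is available."""
    while not stop.is_set():
        job = fetch_job()  # hypothetical: returns a job or None
        if job is None:
            # Nothing to do: wait one poll interval before asking again.
            await asyncio.sleep(poll_interval_ms / 1000)
            continue
        await handle_job(job)


async def run_worker(fetch_job, handle_job, concurrency, poll_interval_ms, stop):
    """Run `concurrency` poller tasks side by side until all of them stop."""
    tasks = [
        asyncio.create_task(poller(fetch_job, handle_job, poll_interval_ms, stop))
        for _ in range(concurrency)
    ]
    await asyncio.gather(*tasks)
```

In this picture, a higher concurrency means more jobs in flight at once, and a shorter poll interval means idle pollers notice new jobs sooner at the cost of more frequent queries.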

Quality Assurance

| Command | Description | Example Usage |
| --- | --- | --- |
| pycharter quality check [schema_id] --data <file> | Run a data quality check | pycharter quality check user_schema --data data.json |
| pycharter quality check --contract <file> --data <file> | Run quality check with contract file | pycharter quality check --contract contract.yaml --data data.json |
| pycharter quality violations [schema_id] | List recorded data quality violations | pycharter quality violations user_schema --status open |

Python Functions & Classes

Contract Parser

| Function/Class | Description | Example Usage |
| --- | --- | --- |
| parse_contract_file(path) | Parse contract from YAML/JSON file | metadata = parse_contract_file("contract.yaml") |
| parse_contract(dict) | Parse contract from dictionary | metadata = parse_contract({"schema": {...}}) |
| ContractMetadata | Parsed contract metadata object | metadata.schema, metadata.governance_rules, metadata.ontology |

Contract Builder

| Function/Class | Description | Example Usage |
| --- | --- | --- |
| build_contract(artifacts) | Build contract from artifacts dict | contract = build_contract(ContractArtifacts(...)) |
| build_contract_from_store(store, schema_id) | Rebuild contract from metadata store | contract = build_contract_from_store(store, "schema_id") |
| ContractArtifacts | Container for contract artifacts | artifacts = ContractArtifacts(schema={...}, metadata={...}, ontology={...}) |

Metadata Store

| Function/Class | Description | Example Usage |
| --- | --- | --- |
| PostgresMetadataStore(connection_string) | PostgreSQL metadata store | store = PostgresMetadataStore("postgresql://...") |
| MongoDBMetadataStore(connection_string, database_name) | MongoDB metadata store | store = MongoDBMetadataStore("mongodb://...", "db") |
| InMemoryMetadataStore() | In-memory metadata store | store = InMemoryMetadataStore() |
| RedisMetadataStore(connection_string) | Redis metadata store | store = RedisMetadataStore("redis://...") |
| store.connect() | Connect to metadata store | store.connect() |
| store.disconnect() | Disconnect from metadata store | store.disconnect() |
| store.store_schema(schema_name, schema, version) | Store schema in database | schema_id = store.store_schema("user", schema, "1.0.0") |
| store.get_schema(schema_id) | Retrieve schema by ID | schema = store.get_schema(schema_id) |
| store.store_metadata(resource_id, resource_type, metadata) | Store metadata for resource | store.store_metadata(schema_id, "schema", metadata_dict) |
| store.get_metadata(resource_id, resource_type) | Retrieve metadata by resource ID | metadata = store.get_metadata(schema_id, "schema") |
| store.store_coercion_rules(schema_id, coercion_rules, version) | Store coercion rules | store.store_coercion_rules(schema_id, rules, "1.0.0") |
| store.get_coercion_rules(schema_id, version) | Retrieve coercion rules | rules = store.get_coercion_rules(schema_id) |
| store.store_validation_rules(schema_id, validation_rules, version) | Store validation rules | store.store_validation_rules(schema_id, rules, "1.0.0") |
| store.get_validation_rules(schema_id, version) | Retrieve validation rules | rules = store.get_validation_rules(schema_id) |
| store.store_ontology(name, version, ontology) | Store ontology (wiki-enabled stores) | store.store_ontology("user", "1.0.0", ontology_dict) |
| store.get_ontology(name, version) | Retrieve ontology (wiki-enabled stores) | ontology = store.get_ontology("user", "1.0.0") |
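
Because every store call has to happen between connect() and disconnect(), the pair lends itself to a context manager. A minimal sketch, assuming only that the store exposes the connect() and disconnect() methods listed above; open_store is a hypothetical helper and not part of PyCharter.

```python
from contextlib import contextmanager


@contextmanager
def open_store(store):
    """Connect a metadata store and always disconnect it afterwards.

    `open_store` is a hypothetical helper, not a PyCharter function. It only
    relies on the connect() and disconnect() methods documented above.
    """
    store.connect()
    try:
        yield store
    finally:
        # Runs on normal exit and also when the body raises.
        store.disconnect()
```

Usage would then read: with open_store(InMemoryMetadataStore()) as store: schema = store.get_schema(schema_id). The store is released even if a call inside the block fails.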

Pydantic Generator

| Function/Class | Description | Example Usage |
| --- | --- | --- |
| from_dict(schema_dict, model_name) | Generate Pydantic model from dict | UserModel = from_dict(schema, "User") |
| from_file(file_path, model_name) | Generate Pydantic model from file | UserModel = from_file("schema.json", "User") |
| from_json(json_string, model_name) | Generate Pydantic model from JSON string | UserModel = from_json(json_str, "User") |
| from_url(url, model_name) | Generate Pydantic model from URL | UserModel = from_url("https://...", "User") |
| generate_model(schema, model_name) | Generate Pydantic model (generic) | UserModel = generate_model(schema, "User") |
| generate_model_file(schema, output_path, model_name) | Generate and save model to file | generate_model_file(schema, "models.py", "User") |

JSON Schema Converter

| Function/Class | Description | Example Usage |
| --- | --- | --- |
| to_dict(model) | Convert Pydantic model to dict | schema = to_dict(UserModel) |
| to_file(model, file_path) | Convert model to file | to_file(UserModel, "schema.json") |
| to_json(model) | Convert model to JSON string | json_str = to_json(UserModel) |
| model_to_schema(model) | Convert model to JSON Schema | schema = model_to_schema(UserModel) |

Runtime Validator

| Function/Class | Description | Example Usage |
| --- | --- | --- |
| validate(model, data) | Validate single record | result = validate(UserModel, data) |
| validate_batch(model, data_list) | Validate batch of records | results = validate_batch(UserModel, data_list) |
| validate_with_store(store, schema_id, data) | Validate using store | result = validate_with_store(store, "schema_id", data) |
| validate_batch_with_store(store, schema_id, data_list) | Batch validate using store | results = validate_batch_with_store(store, "schema_id", data_list) |
| validate_with_contract(contract, data) | Validate using contract dict | result = validate_with_contract(contract, data) |
| validate_batch_with_contract(contract, data_list) | Batch validate using contract | results = validate_batch_with_contract(contract, data_list) |
| get_model_from_store(store, schema_id, model_name) | Get model from store | UserModel = get_model_from_store(store, "schema_id", "User") |
| get_model_from_contract(contract, model_name) | Get model from contract | UserModel = get_model_from_contract(contract, "User") |
| ValidationResult | Validation result object | result.is_valid, result.errors, result.data |
| validate_stream(model, data_stream) | Validate streaming data | results = validate_stream(UserModel, stream) |
| validate_async(model, data) | Async validation | result = await validate_async(UserModel, data) |
| validate_batch_async(model, data_list) | Async batch validation | results = await validate_batch_async(UserModel, data_list) |
| StreamingValidator | Streaming validator with callbacks and statistics | validator = StreamingValidator(model, on_valid=..., on_invalid=...) |
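
A common follow-up to validation is splitting records into accepted data and rejects. The sketch below relies only on the is_valid, errors, and data attributes of ValidationResult listed above. The helper partition_records is hypothetical, and the validation function is passed in so the snippet runs on its own.

```python
def partition_records(validate_fn, model, records):
    """Split records into validated data and (record, errors) pairs.

    `validate_fn` is expected to behave like validate(model, data) from the
    table above: it returns an object with is_valid, errors, and data.
    `partition_records` itself is a hypothetical helper, not a PyCharter API.
    """
    valid, invalid = [], []
    for record in records:
        result = validate_fn(model, record)
        if result.is_valid:
            # Keep the validated data returned by the validator.
            valid.append(result.data)
        else:
            # Keep the original record next to its errors for later inspection.
            invalid.append((record, result.errors))
    return valid, invalid
```

With PyCharter installed, the call would presumably be partition_records(validate, UserModel, data_list).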

Quality Assurance

Convenience functions

| Function | Description | Example Usage |
| --- | --- | --- |
| check_quality(contract, data, options) | One-liner quality check | report = check_quality(contract, data) |
| check_quality_with_store(store, name, version, data) | Quality check via metadata store | report = check_quality_with_store(store, "user", "1.0.0", data) |
| profile_data(data, fields) | Standalone data profiling | profile = profile_data(records) |

Presets

| Preset | Description | Example |
| --- | --- | --- |
| QualityCheckOptions.basic() | Metrics + violations, no profiling/thresholds | Default for check_quality() |
| QualityCheckOptions.strict() | Everything on, including default thresholds | check_quality(c, d, QualityCheckOptions.strict()) |
| QualityCheckOptions.monitoring() | Strict + skip-if-unchanged + dedup | For scheduled/recurring checks |

Classes

| Function/Class | Description | Example Usage |
| --- | --- | --- |
| QualityCheck(store, db_session) | Quality check orchestrator | check = QualityCheck(store, db_session) |
| check.run(schema_id, data, options) | Run quality check | report = check.run("schema_id", data, options) |
| QualityCheckOptions | Quality check configuration | options = QualityCheckOptions(calculate_metrics=True) |
| QualityReport | Quality check report | report.quality_score, report.violation_count |
| QualityScore | Quality metrics | score.overall_score, score.accuracy, score.completeness |
| QualityThresholds | Quality thresholds for alerting | thresholds = QualityThresholds(min_overall_score=95.0) |
| QualityMetrics | Quality metrics calculator | metrics = QualityMetrics() |
| ViolationTracker | Violation tracking system | tracker = ViolationTracker(db_session=session) |
| tracker.record_violation(...) | Record a violation | tracker.record_violation(schema_id, record_id, data, result) |
| tracker.get_violations(...) | Query violations | violations = tracker.get_violations(schema_id="schema_id") |
| tracker.resolve_violation(violation_id, resolved_by) | Resolve a violation | tracker.resolve_violation(violation_id, "user@example.com") |
| ViolationRecord | Individual violation record | violation.field_name, violation.error_message |
| DataProfiler | Data profiling tool | profiler = DataProfiler() |
| profiler.profile(data, fields) | Profile dataset | profile = profiler.profile(data_list) |
| FieldQualityMetrics | Per-field quality metrics | metrics.field_name, metrics.completeness |
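
A report can also be inspected by hand, for example to fail a CI job. The sketch below reads only report.quality_score.overall_score and report.violation_count from the table above. The helper gate_failures and its default limits are hypothetical, and the 0 to 100 score scale is an assumption suggested by min_overall_score=95.0. For built-in threshold checks, QualityThresholds remains the documented route.

```python
def gate_failures(report, min_overall_score=95.0, max_violations=0):
    """Return a list of human-readable reasons why a report fails the gate.

    Hypothetical helper: it only reads report.quality_score.overall_score and
    report.violation_count. An empty list means the report passes.
    """
    failures = []
    score = report.quality_score.overall_score
    if score < min_overall_score:
        failures.append(f"overall score {score} is below {min_overall_score}")
    if report.violation_count > max_violations:
        failures.append(
            f"{report.violation_count} violations exceed the limit of {max_violations}"
        )
    return failures
```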

QualityCheckOptions Parameters:

  • record_violations: Record violations to database (default: True)
  • calculate_metrics: Calculate quality metrics (default: True)
  • check_thresholds: Check against quality thresholds (default: False)
  • thresholds: QualityThresholds object for threshold checking (optional)
  • include_profiling: Include data profiling in report (default: False)
  • include_field_metrics: Include per-field metrics (default: True)
  • sample_size: Process only a sample of records (optional)
  • metadata: Additional metadata dictionary (optional)
  • data_version: Version identifier for dataset (optional)
  • data_source: Source identifier - file path, table name, etc. (optional)
  • skip_if_unchanged: Skip check if data hasn't changed (requires data fingerprint, default: False)
  • deduplicate_violations: Deduplicate violations for same record+field+error (default: True)
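
The deduplicate_violations option treats violations with the same record, field, and error as one. The keying idea can be sketched as follows. This is an illustration only, not PyCharter's implementation: violations are modeled as plain dicts, and the keys record_id, field_name, and error_message are illustrative names.

```python
def deduplicate(violations):
    """Keep the first violation seen for each (record, field, error) key.

    Illustration only: each violation is a plain dict with the hypothetical
    keys record_id, field_name, and error_message.
    """
    seen = set()
    unique = []
    for violation in violations:
        key = (
            violation["record_id"],
            violation["field_name"],
            violation["error_message"],
        )
        if key not in seen:
            seen.add(key)
            unique.append(violation)
    return unique
```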

Pipeline Quality (Post-Load Checks)

| Class/Alias | Description | Example Usage |
| --- | --- | --- |
| PostLoadChecker / QualityChecker | Column/dataset checks after ETL load | checker = PostLoadChecker(checks, pipeline_name="orders") |

DLQ (Dead Letter Queue)

| Function | Description | Example Usage |
| --- | --- | --- |
| add_record_sync(dlq, ...) | Sync wrapper for dlq.add_record() | record = add_record_sync(dlq, "pipeline", data, reason, ...) |
| add_batch_sync(dlq, ...) | Sync wrapper for dlq.add_batch() | records = add_batch_sync(dlq, "pipeline", batch, reason, ...) |
| retry_record_sync(dlq, record_id) | Sync wrapper for dlq.retry_record() | ok = retry_record_sync(dlq, "abc123") |

Quick Examples

Complete Workflow

from pycharter import (
    parse_contract_file,
    PostgresMetadataStore,
    from_dict,
    validate,
    QualityCheck,
    QualityCheckOptions
)

# 1. Parse contract
metadata = parse_contract_file("contract.yaml")

# 2. Store in database
store = PostgresMetadataStore("postgresql://...")
store.connect()
schema_id = store.store_schema("user", metadata.schema, "1.0.0")

# Merge ownership and governance into metadata before storing
metadata_dict = metadata.metadata.copy() if metadata.metadata else {}
if metadata.ownership:
    metadata_dict["business_owners"] = [metadata.ownership.get("owner", "unknown")] if metadata.ownership.get("owner") else []
if metadata.governance_rules:
    metadata_dict["governance_rules"] = metadata.governance_rules
store.store_metadata(resource_id=schema_id, resource_type="schema", metadata=metadata_dict)

# 3. Generate model (the store must still be connected here)
schema = store.get_schema(schema_id)
UserModel = from_dict(schema, "User")

# 4. Validate data
# `data` is a single record (dict) matching the schema, defined elsewhere.
result = validate(UserModel, data)
if result.is_valid:
    print("Valid!")

# 5. Quality check
# `db_session` is an existing database session, created elsewhere.
check = QualityCheck(store=store, db_session=db_session)
report = check.run(
    schema_id=schema_id,
    data="data.json",
    options=QualityCheckOptions(
        calculate_metrics=True,
        record_violations=True,
        include_profiling=True,
        data_version="v1.0.0",
        data_source="data.json",
        deduplicate_violations=True,
        skip_if_unchanged=True
    )
)
print(f"Quality Score: {report.quality_score.overall_score}")

# 6. Disconnect only after every store-backed step has finished
store.disconnect()

CLI Workflow

# Initialize database
pycharter db init postgresql://user:pass@localhost/db

# Run quality check
pycharter quality check user_schema --data data.json

# Start API server
pycharter api --port 8002

# Start UI
pycharter ui serve --port 3002

Notes

  • All database commands support PYCHARTER_DATABASE_URL environment variable
  • Quality checks can persist to database when db_session is provided
  • Metadata stores support PostgreSQL, MongoDB, Redis, and InMemory
  • All validation functions return ValidationResult objects
  • Quality reports include metrics, violations, and optional profiling data
  • Violation Deduplication: Violations are automatically deduplicated (same record+field+error only recorded once)
  • Data Fingerprinting: Quality checks automatically calculate data fingerprints to detect unchanged data
  • Version Tracking: Use data_version and data_source options to track data versions and sources
  • Skip Unchanged: Set skip_if_unchanged=True to avoid creating duplicate metrics for unchanged data
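
The fingerprinting and skip-unchanged notes can be illustrated with a content hash. The sketch below is an assumption-laden illustration: PyCharter's actual fingerprint algorithm is not described here, so SHA-256 over canonical JSON is a stand-in, and fingerprint and should_skip are hypothetical helpers.

```python
import hashlib
import json


def fingerprint(records):
    """Return a stable SHA-256 hex digest for JSON-serializable records.

    Stand-in for a data fingerprint; not PyCharter's actual algorithm.
    """
    # sort_keys makes the digest independent of dict key order.
    payload = json.dumps(records, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def should_skip(records, last_fingerprint):
    """Return True when the data matches the previously checked fingerprint."""
    return last_fingerprint is not None and fingerprint(records) == last_fingerprint
```

In this sketch the order of records still affects the digest, so a reordered dataset counts as changed. Sorting records before hashing would relax that.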