PyCharter Command & Function Reference
Quick reference guide for all PyCharter CLI commands and Python functions.
CLI Commands
Database Management
| Command | Description | Example Usage |
|---|---|---|
| pycharter db init [database_url] | Initialize database schema from scratch | pycharter db init postgresql://user:pass@localhost/db |
| pycharter db upgrade [database_url] | Upgrade database to latest revision | pycharter db upgrade --revision head |
| pycharter db downgrade [database_url] | Downgrade database to previous revision | pycharter db downgrade --revision -1 |
| pycharter db current [database_url] | Show current database revision | pycharter db current |
| pycharter db history [database_url] | Show migration history | pycharter db history |
| pycharter db stamp [revision] [database_url] | Stamp database with revision without running migrations | pycharter db stamp head |
| pycharter db seed [seed_dir] [database_url] | Seed database with initial data from YAML files | pycharter db seed data/seed |
| pycharter db truncate [database_url] | Truncate all PyCharter tables (clear all data) | pycharter db truncate --force |
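For scripted setups, the database commands can also be driven from Python. The sketch below only assembles the argument list and environment for a call. db_command and run_db_command are hypothetical helpers, not part of PyCharter, and the sketch assumes the CLI reads PYCHARTER_DATABASE_URL when no URL argument is given, as stated in the Notes.

```python
import os
import subprocess


def db_command(action, *extra_args, database_url=None):
    """Build (argv, env) for a `pycharter db <action>` call without running it."""
    argv = ["pycharter", "db", action, *extra_args]
    env = dict(os.environ)
    if database_url is not None:
        # Assumption: the CLI falls back to this variable when no URL is passed.
        env["PYCHARTER_DATABASE_URL"] = database_url
    return argv, env


def run_db_command(action, *extra_args, database_url=None):
    """Run the command; raises CalledProcessError on a non-zero exit code."""
    argv, env = db_command(action, *extra_args, database_url=database_url)
    return subprocess.run(argv, env=env, check=True)


argv, env = db_command(
    "upgrade", "--revision", "head",
    database_url="postgresql://user:pass@localhost/db",
)
```

Passing the URL through the environment rather than as an argument keeps credentials out of the process argument list.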
API Server
| Command | Description | Example Usage |
|---|---|---|
| pycharter api | Start the API server (default port 8002) | pycharter api --host 0.0.0.0 --port 8002 |
| pycharter api --reload | Start API server with auto-reload | pycharter api --reload |
| pycharter api --no-reload | Start API server without auto-reload | pycharter api --no-reload |
UI Server
| Command | Description | Example Usage |
|---|---|---|
| pycharter ui serve | Serve the built UI (default port 3002) | pycharter ui serve --port 3002 |
| pycharter ui dev | Run UI development server | pycharter ui dev --api-url http://localhost:8002 |
| pycharter ui build | Build UI for production | pycharter ui build |
Documentation (MkDocs)
| Command | Description | Example Usage |
|---|---|---|
| pycharter docs serve | Serve docs locally (default: http://127.0.0.1:5002) | pycharter docs serve --port 5002 |
| pycharter docs build | Build static site to site/ | pycharter docs build |
Validation Worker
| Command | Description | Example Usage |
|---|---|---|
| pycharter worker start | Run the validation worker (polls DB for jobs) | pycharter worker start |
| pycharter worker start --concurrency N | Set number of poller tasks | pycharter worker start --concurrency 5 |
| pycharter worker start --poll-interval MS | Poll interval in milliseconds | pycharter worker start --poll-interval 1000 |
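To make the two worker options concrete, here is a minimal sketch of how a number of poller tasks and a poll interval typically interact. It is an illustration built on asyncio and an in-memory queue, not PyCharter's worker implementation; poller, run_worker and run_for_s are hypothetical names.

```python
import asyncio
from collections import deque


async def poller(name, jobs, handled, poll_interval_ms, stop):
    """Take jobs while any are queued; otherwise sleep for one poll interval."""
    while not stop.is_set():
        if jobs:
            job = jobs.popleft()
            handled.append((name, job))  # stand-in for validating the job
            await asyncio.sleep(0)  # yield so the other pollers get a turn
        else:
            await asyncio.sleep(poll_interval_ms / 1000)


async def run_worker(jobs, concurrency, poll_interval_ms, run_for_s=0.05):
    """Start `concurrency` pollers, let them run for `run_for_s` seconds, then stop."""
    stop = asyncio.Event()
    handled = []
    tasks = [
        asyncio.create_task(
            poller(f"poller-{i}", jobs, handled, poll_interval_ms, stop)
        )
        for i in range(concurrency)
    ]
    await asyncio.sleep(run_for_s)
    stop.set()
    await asyncio.gather(*tasks)
    return handled


handled = asyncio.run(
    run_worker(deque(range(7)), concurrency=3, poll_interval_ms=10)
)
```

In this model a higher concurrency drains a backlog faster, while a shorter poll interval reduces the delay before a newly queued job is picked up, at the cost of more frequent polling.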
Quality Assurance
| Command | Description | Example Usage |
|---|---|---|
| pycharter quality check [schema_id] --data <file> | Run a data quality check | pycharter quality check user_schema --data data.json |
| pycharter quality check --contract <file> --data <file> | Run quality check with contract file | pycharter quality check --contract contract.yaml --data data.json |
| pycharter quality violations [schema_id] | List recorded data quality violations | pycharter quality violations user_schema --status open |
Python Functions & Classes
Contract Parser
| Function/Class | Description | Example Usage |
|---|---|---|
| parse_contract_file(path) | Parse contract from YAML/JSON file | metadata = parse_contract_file("contract.yaml") |
| parse_contract(dict) | Parse contract from dictionary | metadata = parse_contract({"schema": {...}}) |
| ContractMetadata | Parsed contract metadata object | metadata.schema, metadata.governance_rules, metadata.ontology |
Contract Builder
| Function/Class | Description | Example Usage |
|---|---|---|
| build_contract(artifacts) | Build contract from artifacts dict | contract = build_contract(ContractArtifacts(...)) |
| build_contract_from_store(store, schema_id) | Rebuild contract from metadata store | contract = build_contract_from_store(store, "schema_id") |
| ContractArtifacts | Container for contract artifacts | artifacts = ContractArtifacts(schema={...}, metadata={...}, ontology={...}) |
Metadata Store
| Function/Class | Description | Example Usage |
|---|---|---|
| PostgresMetadataStore(connection_string) | PostgreSQL metadata store | store = PostgresMetadataStore("postgresql://...") |
| MongoDBMetadataStore(connection_string, database_name) | MongoDB metadata store | store = MongoDBMetadataStore("mongodb://...", "db") |
| InMemoryMetadataStore() | In-memory metadata store | store = InMemoryMetadataStore() |
| RedisMetadataStore(connection_string) | Redis metadata store | store = RedisMetadataStore("redis://...") |
| store.connect() | Connect to metadata store | store.connect() |
| store.disconnect() | Disconnect from metadata store | store.disconnect() |
| store.store_schema(schema_name, schema, version) | Store schema in database | schema_id = store.store_schema("user", schema, "1.0.0") |
| store.get_schema(schema_id) | Retrieve schema by ID | schema = store.get_schema(schema_id) |
| store.store_metadata(resource_id, resource_type, metadata) | Store metadata for resource | store.store_metadata(schema_id, "schema", metadata_dict) |
| store.get_metadata(resource_id, resource_type) | Retrieve metadata by resource ID | metadata = store.get_metadata(schema_id, "schema") |
| store.store_coercion_rules(schema_id, coercion_rules, version) | Store coercion rules | store.store_coercion_rules(schema_id, rules, "1.0.0") |
| store.get_coercion_rules(schema_id, version) | Retrieve coercion rules | rules = store.get_coercion_rules(schema_id) |
| store.store_validation_rules(schema_id, validation_rules, version) | Store validation rules | store.store_validation_rules(schema_id, rules, "1.0.0") |
| store.get_validation_rules(schema_id, version) | Retrieve validation rules | rules = store.get_validation_rules(schema_id) |
| store.store_ontology(name, version, ontology) | Store ontology (wiki-enabled stores) | store.store_ontology("user", "1.0.0", ontology_dict) |
| store.get_ontology(name, version) | Retrieve ontology (wiki-enabled stores) | ontology = store.get_ontology("user", "1.0.0") |
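Because every store call has to happen between connect() and disconnect(), it can help to wrap the pair in a context manager so the disconnect is never forgotten or issued too early. The sketch below uses a stand-in class; FakeStore and connected are hypothetical names, and the reference above does not say whether the real stores already support the with statement.

```python
from contextlib import contextmanager


class FakeStore:
    """Stand-in exposing only connect()/disconnect(), like the stores above."""

    def __init__(self):
        self.connected = False

    def connect(self):
        self.connected = True

    def disconnect(self):
        self.connected = False


@contextmanager
def connected(store):
    """Connect on entry and always disconnect on exit, even if the body raises."""
    store.connect()
    try:
        yield store
    finally:
        store.disconnect()


store = FakeStore()
with connected(store) as s:
    was_connected_inside = s.connected  # store calls would go here
```

The same wrapper should work with any object that exposes connect() and disconnect() methods.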
Pydantic Generator
| Function/Class | Description | Example Usage |
|---|---|---|
| from_dict(schema_dict, model_name) | Generate Pydantic model from dict | UserModel = from_dict(schema, "User") |
| from_file(file_path, model_name) | Generate Pydantic model from file | UserModel = from_file("schema.json", "User") |
| from_json(json_string, model_name) | Generate Pydantic model from JSON string | UserModel = from_json(json_str, "User") |
| from_url(url, model_name) | Generate Pydantic model from URL | UserModel = from_url("https://...", "User") |
| generate_model(schema, model_name) | Generate Pydantic model (generic) | UserModel = generate_model(schema, "User") |
| generate_model_file(schema, output_path, model_name) | Generate and save model to file | generate_model_file(schema, "models.py", "User") |
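The idea behind these generators, turning a JSON Schema into a class at runtime, can be shown with the standard library alone. The sketch below is an analogue that builds a dataclass rather than a Pydantic model and handles only flat schemas with scalar types; dataclass_from_schema is a hypothetical helper and says nothing about how PyCharter generates its models.

```python
from dataclasses import field, make_dataclass
from typing import Optional

# Minimal mapping from JSON Schema scalar types to Python types.
JSON_TO_PY = {"string": str, "integer": int, "number": float, "boolean": bool}


def dataclass_from_schema(schema, model_name):
    """Build a dataclass whose fields mirror a flat JSON Schema object."""
    required = set(schema.get("required", []))
    required_fields, optional_fields = [], []
    for name, spec in schema.get("properties", {}).items():
        py_type = JSON_TO_PY.get(spec.get("type"), object)
        if name in required:
            required_fields.append((name, py_type))
        else:
            # Properties that are not required default to None.
            optional_fields.append((name, Optional[py_type], field(default=None)))
    # Dataclass fields without defaults must precede fields with defaults.
    return make_dataclass(model_name, required_fields + optional_fields)


schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name"],
}
User = dataclass_from_schema(schema, "User")
user = User(name="Alice")
```

Unlike a Pydantic model, a dataclass does not validate or coerce values at construction time, so this analogue covers only the class-generation step.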
JSON Schema Converter
| Function/Class | Description | Example Usage |
|---|---|---|
| to_dict(model) | Convert Pydantic model to dict | schema = to_dict(UserModel) |
| to_file(model, file_path) | Convert model to file | to_file(UserModel, "schema.json") |
| to_json(model) | Convert model to JSON string | json_str = to_json(UserModel) |
| model_to_schema(model) | Convert model to JSON Schema | schema = model_to_schema(UserModel) |
Runtime Validator
| Function/Class | Description | Example Usage |
|---|---|---|
| validate(model, data) | Validate single record | result = validate(UserModel, data) |
| validate_batch(model, data_list) | Validate batch of records | results = validate_batch(UserModel, data_list) |
| validate_with_store(store, schema_id, data) | Validate using store | result = validate_with_store(store, "schema_id", data) |
| validate_batch_with_store(store, schema_id, data_list) | Batch validate using store | results = validate_batch_with_store(store, "schema_id", data_list) |
| validate_with_contract(contract, data) | Validate using contract dict | result = validate_with_contract(contract, data) |
| validate_batch_with_contract(contract, data_list) | Batch validate using contract | results = validate_batch_with_contract(contract, data_list) |
| get_model_from_store(store, schema_id, model_name) | Get model from store | UserModel = get_model_from_store(store, "schema_id", "User") |
| get_model_from_contract(contract, model_name) | Get model from contract | UserModel = get_model_from_contract(contract, "User") |
| ValidationResult | Validation result object | result.is_valid, result.errors, result.data |
| validate_stream(model, data_stream) | Validate streaming data | results = validate_stream(UserModel, stream) |
| validate_async(model, data) | Async validation | result = await validate_async(UserModel, data) |
| validate_batch_async(model, data_list) | Async batch validation | results = await validate_batch_async(UserModel, data_list) |
| StreamingValidator | Streaming validator with callbacks and statistics | validator = StreamingValidator(model, on_valid=..., on_invalid=...) |
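A common pattern with batch validation is to split the results into accepted records and error lists. The sketch below shows that pattern with a stand-in result type carrying the three attributes listed for ValidationResult. ResultSketch, check_record and partition are hypothetical, the toy validator is not PyCharter's, and setting data to None on failure is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ResultSketch:
    """Stand-in with the three attributes listed for ValidationResult."""
    is_valid: bool
    errors: list = field(default_factory=list)
    data: Optional[Any] = None


def check_record(record, required_types):
    """Toy validator: each key must be present and have the expected type."""
    errors = []
    for key, expected in required_types.items():
        if key not in record:
            errors.append(f"{key}: missing")
        elif not isinstance(record[key], expected):
            errors.append(f"{key}: expected {expected.__name__}")
    # Assumption: data is only populated for valid records.
    return ResultSketch(not errors, errors, record if not errors else None)


def partition(results):
    """Split batch results into (valid records, error lists)."""
    valid = [r.data for r in results if r.is_valid]
    invalid = [r.errors for r in results if not r.is_valid]
    return valid, invalid


rules = {"name": str, "age": int}
batch = [{"name": "Alice", "age": 30}, {"name": "Bob"}, {"name": 7, "age": "x"}]
valid, invalid = partition([check_record(r, rules) for r in batch])
```

With the real library, the list returned by validate_batch would presumably take the place of the list comprehension in the last line.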
Quality Assurance
Convenience functions
| Function | Description | Example Usage |
|---|---|---|
| check_quality(contract, data, options) | One-liner quality check | report = check_quality(contract, data) |
| check_quality_with_store(store, name, version, data) | Quality check via metadata store | report = check_quality_with_store(store, "user", "1.0.0", data) |
| profile_data(data, fields) | Standalone data profiling | profile = profile_data(records) |
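As a rough picture of what data profiling computes, the sketch below derives two per-field statistics from a list of records: completeness (the share of non-null values) and the number of distinct values. profile_records is a hypothetical stand-in, and the shape of its output is an assumption rather than the structure returned by profile_data.

```python
def profile_records(records, fields=None):
    """Return per-field completeness (non-null share) and distinct-value count."""
    if fields is None:
        # Collect keys across all records, preserving first-seen order.
        fields = list(dict.fromkeys(key for rec in records for key in rec))
    total = len(records)
    profile = {}
    for name in fields:
        # A missing key and an explicit None both count as "not present".
        present = [rec[name] for rec in records if rec.get(name) is not None]
        profile[name] = {
            "completeness": len(present) / total if total else 0.0,
            # repr() lets unhashable values (lists, dicts) be counted too.
            "distinct": len(set(map(repr, present))),
        }
    return profile


records = [
    {"name": "Alice", "age": 30},
    {"name": "Bob", "age": None},
    {"name": "Alice"},
    {"name": "Dana", "age": 30},
]
profile = profile_records(records)
```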
Presets
| Preset | Description | Example |
|---|---|---|
| QualityCheckOptions.basic() | Metrics + violations, no profiling/thresholds | Default for check_quality() |
| QualityCheckOptions.strict() | Everything on, including default thresholds | check_quality(c, d, QualityCheckOptions.strict()) |
| QualityCheckOptions.monitoring() | Strict + skip-if-unchanged + dedup | For scheduled/recurring checks |
Classes
| Function/Class | Description | Example Usage |
|---|---|---|
| QualityCheck(store, db_session) | Quality check orchestrator | check = QualityCheck(store, db_session) |
| check.run(schema_id, data, options) | Run quality check | report = check.run("schema_id", data, options) |
| QualityCheckOptions | Quality check configuration | options = QualityCheckOptions(calculate_metrics=True) |
| QualityReport | Quality check report | report.quality_score, report.violation_count |
| QualityScore | Quality metrics | score.overall_score, score.accuracy, score.completeness |
| QualityThresholds | Quality thresholds for alerting | thresholds = QualityThresholds(min_overall_score=95.0) |
| QualityMetrics | Quality metrics calculator | metrics = QualityMetrics() |
| ViolationTracker | Violation tracking system | tracker = ViolationTracker(db_session=session) |
| tracker.record_violation(...) | Record a violation | tracker.record_violation(schema_id, record_id, data, result) |
| tracker.get_violations(...) | Query violations | violations = tracker.get_violations(schema_id="schema_id") |
| tracker.resolve_violation(violation_id, resolved_by) | Resolve a violation | tracker.resolve_violation(violation_id, "user@example.com") |
| ViolationRecord | Individual violation record | violation.field_name, violation.error_message |
| DataProfiler | Data profiling tool | profiler = DataProfiler() |
| profiler.profile(data, fields) | Profile dataset | profile = profiler.profile(data_list) |
| FieldQualityMetrics | Per-field quality metrics | metrics.field_name, metrics.completeness |
QualityCheckOptions Parameters:
- record_violations: Record violations to database (default: True)
- calculate_metrics: Calculate quality metrics (default: True)
- check_thresholds: Check against quality thresholds (default: False)
- thresholds: QualityThresholds object for threshold checking (optional)
- include_profiling: Include data profiling in report (default: False)
- include_field_metrics: Include per-field metrics (default: True)
- sample_size: Process only a sample of records (optional)
- metadata: Additional metadata dictionary (optional)
- data_version: Version identifier for dataset (optional)
- data_source: Source identifier, such as a file path or table name (optional)
- skip_if_unchanged: Skip check if data hasn't changed (requires data fingerprint, default: False)
- deduplicate_violations: Deduplicate violations for same record+field+error (default: True)
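The documented defaults and the three presets can be summarised in one stand-in dataclass. OptionsSketch below is a hypothetical mirror written from the parameter and preset descriptions in this reference, not the real QualityCheckOptions. In particular, which flags strict() turns on is inferred from "everything on", and the default threshold values are left out because they are not documented here.

```python
from dataclasses import dataclass, replace
from typing import Any, Optional


@dataclass
class OptionsSketch:
    """Stand-in mirroring the documented QualityCheckOptions defaults."""
    record_violations: bool = True
    calculate_metrics: bool = True
    check_thresholds: bool = False
    thresholds: Optional[Any] = None
    include_profiling: bool = False
    include_field_metrics: bool = True
    sample_size: Optional[int] = None
    metadata: Optional[dict] = None
    data_version: Optional[str] = None
    data_source: Optional[str] = None
    skip_if_unchanged: bool = False
    deduplicate_violations: bool = True

    @classmethod
    def basic(cls):
        # Metrics + violations, no profiling or thresholds: the plain defaults.
        return cls()

    @classmethod
    def strict(cls):
        # Assumption: "everything on" means thresholds and profiling are enabled.
        return cls(check_thresholds=True, include_profiling=True)

    @classmethod
    def monitoring(cls):
        # Strict plus skip-if-unchanged and deduplication.
        return replace(
            cls.strict(), skip_if_unchanged=True, deduplicate_violations=True
        )


nightly = replace(OptionsSketch.monitoring(), data_source="orders_table")
```

Deriving a variant with dataclasses.replace, as in the last line, keeps a preset's flags while overriding a single field.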
Pipeline Quality (Post-Load Checks)
| Class/Alias | Description | Example Usage |
|---|---|---|
| PostLoadChecker / QualityChecker | Column/dataset checks after ETL load | checker = PostLoadChecker(checks, pipeline_name="orders") |
DLQ (Dead Letter Queue)
| Function | Description | Example Usage |
|---|---|---|
| add_record_sync(dlq, ...) | Sync wrapper for dlq.add_record() | record = add_record_sync(dlq, "pipeline", data, reason, ...) |
| add_batch_sync(dlq, ...) | Sync wrapper for dlq.add_batch() | records = add_batch_sync(dlq, "pipeline", batch, reason, ...) |
| retry_record_sync(dlq, record_id) | Sync wrapper for dlq.retry_record() | ok = retry_record_sync(dlq, "abc123") |
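A sync wrapper of this kind usually just runs the underlying coroutine to completion from blocking code. The sketch below shows the general technique with asyncio.run and a stand-in queue. FakeDLQ and add_record_blocking are hypothetical names, and this is not a description of how PyCharter implements its own wrappers.

```python
import asyncio


class FakeDLQ:
    """Stand-in dead letter queue with an async add_record(), for illustration."""

    def __init__(self):
        self.records = []

    async def add_record(self, pipeline, data, reason):
        await asyncio.sleep(0)  # stands in for real async I/O
        record = {
            "id": len(self.records) + 1,
            "pipeline": pipeline,
            "data": data,
            "reason": reason,
        }
        self.records.append(record)
        return record


def add_record_blocking(dlq, pipeline, data, reason):
    """Run the coroutine to completion from synchronous code.

    asyncio.run() raises RuntimeError if an event loop is already running in
    this thread, so call this only from non-async code.
    """
    return asyncio.run(dlq.add_record(pipeline, data, reason))


dlq = FakeDLQ()
record = add_record_blocking(dlq, "orders", {"order_id": 1}, "schema mismatch")
```

Inside async code, awaiting the queue's coroutine method directly avoids the running-loop restriction described in the docstring.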
Quick Examples
Complete Workflow
from pycharter import (
    parse_contract_file,
    PostgresMetadataStore,
    from_dict,
    validate,
    QualityCheck,
    QualityCheckOptions,
)

# 1. Parse contract
metadata = parse_contract_file("contract.yaml")

# 2. Store in database
store = PostgresMetadataStore("postgresql://...")
store.connect()
schema_id = store.store_schema("user", metadata.schema, "1.0.0")

# Merge ownership and governance into metadata before storing
metadata_dict = metadata.metadata.copy() if metadata.metadata else {}
if metadata.ownership:
    owner = metadata.ownership.get("owner")
    metadata_dict["business_owners"] = [owner] if owner else []
if metadata.governance_rules:
    metadata_dict["governance_rules"] = metadata.governance_rules
store.store_metadata(resource_id=schema_id, resource_type="schema", metadata=metadata_dict)

# 3. Generate model (the store must still be connected here)
schema = store.get_schema(schema_id)
UserModel = from_dict(schema, "User")

# 4. Validate data
data = {"name": "Alice", "age": 30}  # hypothetical example record; use your own data
result = validate(UserModel, data)
if result.is_valid:
    print("Valid!")

# 5. Quality check
# db_session is a database session created elsewhere (not shown here)
check = QualityCheck(store=store, db_session=db_session)
report = check.run(
    schema_id=schema_id,
    data="data.json",
    options=QualityCheckOptions(
        calculate_metrics=True,
        record_violations=True,
        include_profiling=True,
        data_version="v1.0.0",
        data_source="data.json",
        deduplicate_violations=True,
        skip_if_unchanged=True,
    ),
)
print(f"Quality Score: {report.quality_score.overall_score}")

# Disconnect only after the last call that uses the store
store.disconnect()
CLI Workflow
# Initialize database
pycharter db init postgresql://user:pass@localhost/db
# Run quality check
pycharter quality check user_schema --data data.json
# Start API server
pycharter api --port 8002
# Start UI
pycharter ui serve --port 3002
Notes
- All database commands support the PYCHARTER_DATABASE_URL environment variable
- Quality checks can persist to database when db_session is provided
- Metadata stores support PostgreSQL, MongoDB, Redis, and InMemory
- All validation functions return ValidationResult objects
- Quality reports include metrics, violations, and optional profiling data
- Violation Deduplication: Violations are automatically deduplicated (same record+field+error only recorded once)
- Data Fingerprinting: Quality checks automatically calculate data fingerprints to detect unchanged data
- Version Tracking: Use data_version and data_source options to track data versions and sources
- Skip Unchanged: Set skip_if_unchanged=True to avoid creating duplicate metrics for unchanged data
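The fingerprinting and deduplication notes describe two small techniques that can be sketched with the standard library. The code below hashes a canonical JSON encoding of the data to detect an unchanged dataset, and keeps one violation per record+field+error key. The hash algorithm, the violation dictionary keys and all function names are assumptions for illustration; the reference does not say how PyCharter computes fingerprints.

```python
import hashlib
import json


def fingerprint(records):
    """SHA-256 over a canonical JSON encoding (sorted keys, fixed separators)."""
    canonical = json.dumps(records, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def should_skip(records, last_fingerprint):
    """True when the data hashes to the fingerprint of the previous run."""
    return fingerprint(records) == last_fingerprint


def deduplicate(violations):
    """Keep the first violation per (record_id, field, error) key, in order."""
    seen = set()
    unique = []
    for v in violations:
        key = (v["record_id"], v["field"], v["error"])
        if key not in seen:
            seen.add(key)
            unique.append(v)
    return unique


data = [{"id": 1, "email": None}]
previous = fingerprint([{"email": None, "id": 1}])  # same data, other key order
violations = [
    {"record_id": 1, "field": "email", "error": "required"},
    {"record_id": 1, "field": "email", "error": "required"},
    {"record_id": 1, "field": "email", "error": "invalid format"},
]
unique = deduplicate(violations)
```

Sorting keys makes the fingerprint independent of key order within a record, but in this sketch the order of the records themselves still affects the hash.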