PyCharter Configuration and Database Setup¶
PyCharter follows Airflow's configuration pattern for database credentials, making it easy to manage database connections across different environments. This guide covers database configuration, initialization, and migrations.
Table of Contents¶
- Database Connection Configuration
- Variable Injection in Configuration Files
- Database Initialization
- Database Migrations
- Troubleshooting
Database Connection Configuration¶
Configuration Priority¶
PyCharter checks for database configuration in this order:
-
Command-line argument (highest priority)
-
Environment variable:
PYCHARTER_DATABASE_URL -
Config file:
pycharter.cfg(in project root or~/.pycharter/) -
Alembic config:
alembic.ini(fallback)
Config File Locations¶
PyCharter looks for pycharter.cfg in:
- Current working directory
~/.pycharter/pycharter.cfg(user home directory)- Project root (where
alembic.iniis located)
Comparison with Airflow¶
| Feature | Airflow | PyCharter |
|---|---|---|
| Env Var | AIRFLOW__DATABASE__SQL_ALCHEMY_CONN |
PYCHARTER_DATABASE_URL |
| Config File | airflow.cfg |
pycharter.cfg |
| Config Section | [database] |
[database] |
| Config Key | sql_alchemy_conn |
PYCHARTER_DATABASE_URL |
Best Practices¶
-
Development: Use environment variables
-
Production: Use config file or environment variables (set by deployment system)
-
CI/CD: Use environment variables (set in CI/CD secrets)
-
Docker: Use environment variables or mounted config file
Security Notes¶
- Never commit
pycharter.cfgwith credentials to version control - Use environment variables or secrets management in production
- Consider using
pycharter.cfg.exampleas a template (without credentials)
Variable Injection in Configuration Files¶
PyCharter supports Docker Compose-style variable substitution in all YAML and JSON configuration files. This allows you to inject dynamic values from environment variables or config files into your configuration.
Syntax¶
PyCharter supports the following variable substitution syntax:
- Basic substitution:
${VAR} - Replaces with value from environment variable or config file
-
If variable not found, returns original string (backward compatible)
-
Default value:
${VAR:-default} - Uses
defaultifVARis unset or empty -
Useful for providing fallback values
-
Required variable:
${VAR:?error message} - Raises
ValueErrorifVARis unset or empty - Error message is included in the exception
-
Use for critical configuration that must be provided
-
Escaped literal:
$${VAR} - Results in literal
${VAR}(no substitution) - Use when you need to include
${}in your output
Value Resolution Priority¶
Variables are resolved in the following priority order:
- Context dictionary (if provided programmatically)
- Environment variables (
os.getenv()) - pycharter.cfg [variables] section
- pycharter.cfg [etl] section
- Default value (if
${VAR:-default}syntax used) - Error (if
${VAR:?error}syntax used and variable not found)
Examples¶
Basic Usage¶
# extract.yaml
params:
apikey: ${FMP_API_KEY} # Basic substitution
timeout: ${REQUEST_TIMEOUT:-30} # Default to 30 if not set
base_url: https://${API_HOST:-api.example.com}/v1 # Partial substitution with default
Required Variables¶
# extract.yaml
params:
apikey: ${FMP_API_KEY:?FMP_API_KEY is required} # Will raise error if not set
Config File Variables¶
Define variables in pycharter.cfg:
# pycharter.cfg
[variables]
FMP_API_KEY = your_api_key_here
API_HOST = api.example.com
DB_PORT = 5432
[etl]
default_timeout = 30
default_batch_size = 1000
Then use in YAML files:
# extract.yaml
params:
apikey: ${FMP_API_KEY} # Resolved from config file if env var not set
timeout: ${default_timeout} # Resolved from [etl] section
Complex Examples¶
# extract.yaml
base_url: https://${API_HOST}/v1
params:
apikey: ${FMP_API_KEY:?API key required}
timeout: ${TIMEOUT:-30}
headers:
Authorization: Bearer ${API_KEY}
X-API-Version: ${API_VERSION:-v1}
Escaped Variables¶
# When you need literal ${VAR} in output
message: "Cost is $${AMOUNT}" # Results in: "Cost is ${AMOUNT}"
Supported Files¶
Variable injection is automatically applied to:
- ETL Configuration Files:
extract.yaml,load.yaml,transform.yaml - Contract Files:
schema.yaml,metadata.yaml,coercion_rules.yaml,validation_rules.yaml - All YAML/JSON files loaded through PyCharter's parsers
Error Handling¶
When a required variable is missing, PyCharter raises a clear error:
The error message includes: - The variable name - The error message (if provided) - The source file and line number (if available)
Best Practices¶
-
Use defaults for optional configuration:
-
Use required syntax for critical values:
-
Store secrets in environment variables, not config files:
-
Store non-sensitive defaults in config files:
-
Use partial substitution for URLs:
Database Initialization¶
Quick Start by Database Type¶
PostgreSQL (Most Common)¶
PostgreSQL automatically initializes the schema when you connect:
from pycharter import PostgresMetadataStore
# Just connect - schema is created automatically!
store = PostgresMetadataStore("postgresql://user:pass@localhost/pycharter")
store.connect() # Tables created automatically on first connection
# Ready to use
schema_id = store.store_schema("user", {"type": "object", "version": "1.0.0"}, version="1.0.0")
MongoDB¶
MongoDB requires no initialization - collections are created automatically:
from pycharter import MongoDBMetadataStore
store = MongoDBMetadataStore(
connection_string="mongodb://user:pass@localhost:27017",
database_name="pycharter"
)
store.connect() # Collections and indexes created automatically
# Ready to use
schema_id = store.store_schema("user", {"type": "object", "version": "1.0.0"}, version="1.0.0")
Redis¶
Redis requires no initialization - keys are created automatically:
from pycharter import RedisMetadataStore
store = RedisMetadataStore(
connection_string="redis://localhost:6379/0",
key_prefix="pycharter"
)
store.connect() # Keys created automatically when storing data
# Ready to use
schema_id = store.store_schema("user", {"type": "object", "version": "1.0.0"}, version="1.0.0")
In-Memory¶
In-memory store requires no initialization:
from pycharter import InMemoryMetadataStore
store = InMemoryMetadataStore()
store.connect() # No database needed
# Ready to use
schema_id = store.store_schema("user", {"type": "object", "version": "1.0.0"}, version="1.0.0")
PostgreSQL Initialization (Detailed)¶
PostgreSQL is the most feature-rich option with automatic schema management.
Method 1: Automatic Initialization (Recommended for Development)¶
from pycharter import PostgresMetadataStore
# Create store with connection string
store = PostgresMetadataStore(
connection_string="postgresql://user:password@localhost:5432/pycharter"
)
# Connect - schema is automatically created if it doesn't exist
store.connect() # auto_initialize=True by default
# Ready to use!
schema_id = store.store_schema("user", {"type": "object", "version": "1.0.0"}, version="1.0.0")
Method 2: Manual Initialization (Recommended for Production)¶
For production, you may want to initialize the schema separately:
from pycharter import PostgresMetadataStore
# Step 1: Initialize schema (run once, e.g., in deployment script)
store = PostgresMetadataStore(connection_string="postgresql://...")
store.connect(auto_initialize=True) # Creates schema
store.disconnect()
# Step 2: Connect with validation only (in application)
store = PostgresMetadataStore(connection_string="postgresql://...")
store.connect(auto_initialize=False, validate_schema_on_connect=True)
Method 3: Using CLI¶
# Initialize schema
pycharter db init postgresql://user:pass@localhost/pycharter
# Then in Python, connect with validation only
store = PostgresMetadataStore(connection_string="postgresql://...")
store.connect(auto_initialize=False, validate_schema_on_connect=True)
What Gets Created¶
When PostgreSQL initializes, it creates:
- Tables:
schemas- JSON Schema definitionscoercion_rules- Data type coercion rulesvalidation_rules- Business logic validation rulesgovernance_rules- Governance rulesmetadata_records- Comprehensive metadata storageowners- Ownership informationdata_contracts- Central table linking all componentssystems- System informationdomains- Domain informationmetadata_record_system_pulls- Join table for pulls_from relationshipsmetadata_record_system_pushes- Join table for pushes_to relationshipsmetadata_record_system_sources- Join table for system_sources relationshipsmetadata_record_domains- Join table for domain relationships-
alembic_version- Alembic migration version tracking -
Indexes:
- Indexes on
schemas.data_contractandschemas.version - Indexes on foreign keys
- Composite indexes for common queries
- Unique constraints on key combinations
Connection String Formats¶
PostgreSQL:
Examples: -postgresql://postgres:password@localhost:5432/pycharter
- postgresql://user@localhost/pycharter (no password)
- postgresql://user:pass@host.example.com:5432/pycharter (remote)
MongoDB:
Examples: -mongodb://localhost:27017
- mongodb://user:password@localhost:27017
- mongodb://user:password@host1:27017,host2:27017/pycharter (replica set)
Redis:
Examples: -redis://localhost:6379
- redis://localhost:6379/0 (database 0)
- redis://:password@localhost:6379 (with password)
Docker Setup¶
PostgreSQL¶
docker run -d \
--name pycharter-postgres \
-e POSTGRES_USER=postgres \
-e POSTGRES_PASSWORD=postgres \
-e POSTGRES_DB=pycharter \
-p 5432:5432 \
postgres:latest
Connection: postgresql://postgres:postgres@localhost:5432/pycharter
MongoDB¶
docker run -d \
--name pycharter-mongo \
-e MONGO_INITDB_ROOT_USERNAME=rootUser \
-e MONGO_INITDB_ROOT_PASSWORD=rootPassword \
-p 27017:27017 \
mongodb/mongodb-community-server:latest
Connection: mongodb://rootUser:rootPassword@localhost:27017
Redis¶
Connection: redis://localhost:6379/0
CLI Initialization¶
If you're using PostgreSQL with the CLI, you can initialize the database:
# If you configured via environment variable or config file:
pycharter db init
# Or pass the connection string directly:
pycharter db init postgresql://user:password@localhost:5432/pycharter
This will: 1. ✅ Create all database tables (schemas, coercion_rules, validation_rules, metadata_records, owners, governance_rules, data_contracts, systems, domains, and join tables) 2. ✅ Create all indexes and unique constraints 3. ✅ Set up Alembic version tracking 4. ✅ Stamp the database with the current revision
Verify Initialization¶
Check that everything worked:
# Check current database revision
pycharter db current
# View migration history
pycharter db history
You should see output like:
Database Migrations¶
When you need to change the database structure (add columns, tables, indexes, etc.), you'll create and apply migrations using Alembic.
Quick Reference: Core Migration Commands¶
Check current database revision:
# Option 1: With database URL as argument
pycharter db current postgresql://user:pass@localhost:5432/pycharter
# Option 2: Using environment variable
export PYCHARTER_DATABASE_URL="postgresql://user:pass@localhost:5432/pycharter"
pycharter db current
View migration history:
Run migration (upgrade to latest):
# Option 1: With database URL as argument
pycharter db upgrade postgresql://user:pass@localhost:5432/pycharter
# Option 2: Using environment variable
export PYCHARTER_DATABASE_URL="postgresql://user:pass@localhost:5432/pycharter"
pycharter db upgrade
Rollback migration (downgrade one step):
# Option 1: With database URL as argument
pycharter db downgrade postgresql://user:pass@localhost:5432/pycharter --revision -1
# Option 2: Using environment variable
export PYCHARTER_DATABASE_URL="postgresql://user:pass@localhost:5432/pycharter"
pycharter db downgrade --revision -1
Verify migration:
# Check current revision (should show the new migration)
pycharter db current
# Verify the schema matches the model (PostgreSQL)
# Connect to PostgreSQL and check:
# \d pycharter.metadata_records
Workflow Overview¶
- Modify SQLAlchemy models in
pycharter/db/models/ - Generate migration using Alembic
- Review migration file (auto-generated, may need manual edits)
- Test migration on development database
- Apply migration to production
Step-by-Step: Creating a Migration¶
Example: Adding a New Column¶
Let's say you want to add a description field to the schemas table.
Step 1: Modify the SQLAlchemy Model
Edit pycharter/db/models/schema.py:
from sqlalchemy import Column, Integer, String, JSON, DateTime, Text
from sqlalchemy.sql import func
from sqlalchemy.orm import relationship
from pycharter.db.models.base import Base
class SchemaModel(Base):
__tablename__ = "schemas"
id = Column(Integer, primary_key=True, autoincrement=True)
title = Column(String(255), nullable=False)
data_contract = Column(String(255), nullable=False)
version = Column(String(50), nullable=False)
schema_data = Column(JSON, nullable=False)
description = Column(Text, nullable=True) # ← NEW COLUMN
created_at = Column(DateTime(timezone=True), server_default=func.now())
updated_at = Column(DateTime(timezone=True), server_default=func.now(), onupdate=func.now())
# ... rest of the model
Step 2: Generate Migration
Make sure your database URL is configured, then run:
# Set database URL (if not already configured)
export PYCHARTER_DATABASE_URL=postgresql://user:password@localhost:5432/pycharter
# Generate migration
alembic revision --autogenerate -m "Add description to schemas table"
This creates a new migration file in pycharter/db/migrations/versions/ like:
Step 3: Review the Generated Migration
Open the generated migration file and review it:
"""Add description to schemas table
Revision ID: abc123def456
Revises: ac3d8fcc3e60
Create Date: 2025-11-19 10:00:00.000000
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers
revision = 'abc123def456'
down_revision = 'ac3d8fcc3e60' # Points to previous migration
branch_labels = None
depends_on = None
def upgrade() -> None:
# Add column
op.add_column('schemas', sa.Column('description', sa.Text(), nullable=True))
def downgrade() -> None:
# Remove column (for rollback)
op.drop_column('schemas', 'description')
Important: Review the migration carefully:
- ✅ Check that upgrade() does what you expect
- ✅ Check that downgrade() can reverse the changes
- ✅ Verify column types, constraints, and defaults
- ✅ Remove any unwanted changes (Alembic sometimes detects unrelated changes)
Step 4: Test the Migration
Test on a development database first:
# Apply the migration
pycharter db upgrade
# Verify the change
pycharter db current # Should show new revision
# Test rollback (optional)
pycharter db downgrade --revision -1
pycharter db upgrade # Re-apply
Step 5: Apply to Production
Once tested, apply to production:
# Backup database first!
pg_dump -h localhost -U user pycharter > backup.sql
# Apply migration
pycharter db upgrade
Common Migration Scenarios¶
Adding a New Table¶
1. Create Model (pycharter/db/models/new_table.py):
from sqlalchemy import Column, Integer, String, DateTime
from sqlalchemy.sql import func
from pycharter.db.models.base import Base
class NewTableModel(Base):
__tablename__ = "new_table"
id = Column(Integer, primary_key=True)
name = Column(String(255), nullable=False)
created_at = Column(DateTime(timezone=True), server_default=func.now())
2. Import in pycharter/db/models/__init__.py:
3. Generate migration:
Adding a Column¶
1. Modify model (add column to existing model)
2. Generate migration:
Adding an Index¶
1. Modify model (add index):
from sqlalchemy import Index
class SchemaModel(Base):
# ... columns ...
__table_args__ = (
Index('idx_schemas_name_version', 'name', 'version'),
)
2. Generate migration:
Modifying a Column Type¶
1. Modify model (change column type)
2. Generate migration and manually edit if needed:
def upgrade() -> None:
# Alembic may generate this, but you might need to adjust
op.alter_column('table_name', 'column_name',
type_=sa.String(500), # New type
existing_type=sa.String(255)) # Old type
Migration Commands Reference¶
All commands can be used in two ways:
1. With database URL as argument: pycharter db <command> <database_url>
2. With environment variable: Set PYCHARTER_DATABASE_URL and run pycharter db <command>
View Current Revision¶
# Option 1: With database URL as argument
pycharter db current postgresql://user:pass@localhost:5432/pycharter
# Option 2: Using environment variable
export PYCHARTER_DATABASE_URL="postgresql://user:pass@localhost:5432/pycharter"
pycharter db current
View Migration History¶
Upgrade to Latest¶
# Option 1: With database URL as argument
pycharter db upgrade postgresql://user:pass@localhost:5432/pycharter
# Option 2: Using environment variable
export PYCHARTER_DATABASE_URL="postgresql://user:pass@localhost:5432/pycharter"
pycharter db upgrade
Upgrade to Specific Revision¶
# Option 1: With database URL as argument
pycharter db upgrade postgresql://user:pass@localhost:5432/pycharter --revision abc123def456
# Option 2: Using environment variable
export PYCHARTER_DATABASE_URL="postgresql://user:pass@localhost:5432/pycharter"
pycharter db upgrade --revision abc123def456
Downgrade One Step¶
# Option 1: With database URL as argument
pycharter db downgrade postgresql://user:pass@localhost:5432/pycharter --revision -1
# Option 2: Using environment variable
export PYCHARTER_DATABASE_URL="postgresql://user:pass@localhost:5432/pycharter"
pycharter db downgrade --revision -1
Downgrade to Specific Revision¶
# Option 1: With database URL as argument
pycharter db downgrade postgresql://user:pass@localhost:5432/pycharter --revision ac3d8fcc3e60
# Option 2: Using environment variable
export PYCHARTER_DATABASE_URL="postgresql://user:pass@localhost:5432/pycharter"
pycharter db downgrade --revision ac3d8fcc3e60
Generate Migration (Manual)¶
# Set database URL (if not already configured)
export PYCHARTER_DATABASE_URL="postgresql://user:pass@localhost:5432/pycharter"
# Auto-generate from model changes
alembic revision --autogenerate -m "Description"
# Create empty migration (for manual SQL)
alembic revision -m "Description"
Migration Best Practices¶
1. Always Test First¶
- Test migrations on a development database
- Verify both
upgrade()anddowngrade()work - Test with real data if possible
2. Review Auto-Generated Migrations¶
- Alembic's autogenerate is helpful but not perfect
- Always review and edit migration files
- Remove unwanted changes
- Add data migrations if needed
3. Use Descriptive Messages¶
# Good
alembic revision --autogenerate -m "Add description column to schemas table"
# Bad
alembic revision --autogenerate -m "update"
4. Version Control¶
- Commit migration files to version control
- Never edit existing migrations that have been applied
- Create new migrations for additional changes
5. Backup Before Production¶
# Always backup before applying migrations
pg_dump -h localhost -U user pycharter > backup_$(date +%Y%m%d_%H%M%S).sql
6. Data Migrations¶
If you need to migrate data (not just schema), add it to the migration:
def upgrade() -> None:
# Schema change
op.add_column('schemas', sa.Column('description', sa.Text(), nullable=True))
# Data migration
connection = op.get_bind()
connection.execute(
sa.text("UPDATE schemas SET description = 'Default description' WHERE description IS NULL")
)
Complete Migration Workflow Example¶
# 1. Configure database
export PYCHARTER_DATABASE_URL=postgresql://user:pass@localhost:5432/pycharter
# 2. Initialize (first time only)
pycharter db init
# 3. Make changes to models
# Edit pycharter/db/models/schema.py
# 4. Generate migration
alembic revision --autogenerate -m "Add description to schemas"
# 5. Review migration file
# Edit pycharter/db/migrations/versions/xxx_add_description_to_schemas.py
# 6. Test migration
pycharter db upgrade
pycharter db current # Verify
# 7. Apply to production (after testing)
pycharter db upgrade
Troubleshooting¶
Configuration Issues¶
Problem: Can't connect to database
Solutions: - Check connection string format - Verify database is running - Check credentials - Verify network connectivity
PostgreSQL: "relation does not exist"¶
Solution: Schema wasn't initialized. Run:
PostgreSQL: "permission denied"¶
Solution: User needs CREATE TABLE permission:
MongoDB: "authentication failed"¶
Solution: Check username/password in connection string:
Redis: "Connection refused"¶
Solution: Make sure Redis is running:
Migration Fails¶
Problem: Migration fails with error
Solution:
# Check current state
pycharter db current
# View migration history
pycharter db history
# Try to fix manually or rollback
pycharter db downgrade --revision -1
Common Migration Errors:
Migration fails with "column does not exist"¶
- The column might have already been removed
- Check current database state:
pycharter db current - You may need to manually adjust the migration
Migration fails with "column already exists"¶
- The column might already exist
- Check the database schema directly (e.g.,
\d pycharter.metadata_recordsin PostgreSQL) - You may need to stamp the database:
pycharter db stamp <revision>
Connection errors during migration¶
- Verify PostgreSQL is running
- Check connection string format
- Ensure database exists and user has permissions
Tables Already Exist¶
Problem: pycharter db init says tables already exist
Solution: Use --force flag or stamp the database:
Migration Out of Sync¶
Problem: Database state doesn't match migrations
Solution: Stamp database to current state:
# Check what revision database thinks it's at
pycharter db current
# Stamp to correct revision
alembic stamp head # or specific revision
Need to Edit Migration¶
Problem: Need to modify a migration before applying
Solution: Edit the migration file before running pycharter db upgrade. Never edit migrations that have already been applied to production.
UI theme¶
The global color theme for the PyCharter web UI (light / dark / system) can be set via:
- Environment variable:
PYCHARTER_UI_THEME— one oflight,dark, orsystem(follow OS preference). Default:light. - Config file:
pycharter.cfgsection[ui], keyPYCHARTER_UI_THEME. Same values. Env overrides .cfg.
When the UI is served by the Python server (pycharter ui or similar), the theme is applied before first paint to avoid a flash. In development (next dev), the app fetches theme from the API and applies it after load.
UI app name¶
The website/app name shown in the UI (nav bar, login, settings) can be overridden via:
- Environment variable:
PYCHARTER_UI_APP_NAME— display name (e.g.My Company Portal). Default:PyCharter. - Config file:
pycharter.cfgsection[ui], keyPYCHARTER_UI_APP_NAME. Env overrides .cfg.
Summary¶
- Configuration: Use environment variables or config files (see Database Connection Configuration)
- Initialization: Most stores auto-initialize on connect; PostgreSQL can use CLI:
pycharter db init - Create Migration: Modify models →
alembic revision --autogenerate -m "message" - Apply Migration:
pycharter db upgrade - Rollback:
pycharter db downgrade --revision -1 - Check Status:
pycharter db current
For more details, see:
- pycharter/db/README.md - Database management overview
- Data Journey Guide - Complete workflow guide
- Alembic documentation: https://alembic.sqlalchemy.org/
Configuration Priority¶
PyCharter checks for database configuration in this order:
-
Command-line argument (highest priority)
-
Environment variable:
PYCHARTER_DATABASE_URL -
Config file:
pycharter.cfg(in project root or~/.pycharter/) -
Alembic config:
alembic.ini(fallback)
Usage Examples¶
CLI Commands¶
Once configured, you can run commands without passing the database URL:
# Initialize database
pycharter db init
# Upgrade database
pycharter db upgrade
# Check current revision
pycharter db current
# View migration history
pycharter db history
Python Code¶
PostgresMetadataStore also uses the configuration:
from pycharter import PostgresMetadataStore
# Uses configuration automatically
store = PostgresMetadataStore()
store.connect()
# Or override with explicit connection string
store = PostgresMetadataStore("postgresql://other:db@localhost:5432/other_db")
Config File Locations¶
PyCharter looks for pycharter.cfg in:
- Current working directory
~/.pycharter/pycharter.cfg(user home directory)- Project root (where
alembic.iniis located)
Comparison with Airflow¶
| Feature | Airflow | PyCharter |
|---|---|---|
| Env Var | AIRFLOW__DATABASE__SQL_ALCHEMY_CONN |
PYCHARTER_DATABASE_URL |
| Config File | airflow.cfg |
pycharter.cfg |
| Config Section | [database] |
[database] |
| Config Key | sql_alchemy_conn |
PYCHARTER_DATABASE_URL |
Best Practices¶
-
Development: Use environment variables
-
Production: Use config file or environment variables (set by deployment system)
-
CI/CD: Use environment variables (set in CI/CD secrets)
-
Docker: Use environment variables or mounted config file
Security Notes¶
- Never commit
pycharter.cfgwith credentials to version control - Use environment variables or secrets management in production
- Consider using
pycharter.cfg.exampleas a template (without credentials)