# Quality
PyCharter provides two quality systems. This page covers contract quality — row-level validation with scoring, violations, and profiling. For column/dataset-level checks after ETL loads, see Pipeline Quality and PostLoadChecker.
## Quick Start

### One-Liner Quality Check
The fastest way to check quality:
```python
from pycharter import check_quality

report = check_quality(
    contract={"schema": {
        "version": "1.0.0",
        "properties": {"name": {"type": "string"}, "email": {"type": "string", "format": "email"}},
        "required": ["name", "email"],
    }},
    data=[
        {"name": "Alice", "email": "alice@example.com"},
        {"name": "", "email": "invalid"},  # second record: empty name, malformed email
    ],
)
print(f"Score: {report.quality_score.overall_score:.1f}/100")
print(f"Valid: {report.valid_count}/{report.record_count}")
```
### Quick Data Profiling

Profile a dataset without a contract:

```python
from pycharter import profile_data

profile = profile_data([{"name": "Alice", "age": 30}, {"name": "Bob", "age": None}])
print(f"Records: {profile['record_count']}")
print(f"Age nulls: {profile['field_profiles']['age']['null_count']}")
```
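To make the `null_count` figure concrete, here is a minimal pure-Python sketch of a per-field null count, assuming that "null" covers both a missing key and an explicit `None`. `toy_profile` is a hypothetical helper for illustration only; it is not PyCharter's profiler, and real profiles carry many more statistics (see the Data Profiling Guide).

```python
from __future__ import annotations

from typing import Any


def toy_profile(records: list[dict[str, Any]], fields: list[str] | None = None) -> dict[str, Any]:
    """Hypothetical stand-in for profile_data: counts records and per-field nulls only."""
    # With no subset given, profile every field name seen in any record.
    if fields is None:
        fields = sorted({key for record in records for key in record})
    field_profiles = {}
    for field in fields:
        # Assumption: a missing key and an explicit None both count as null.
        nulls = sum(1 for record in records if record.get(field) is None)
        field_profiles[field] = {"null_count": nulls}
    return {"record_count": len(records), "field_profiles": field_profiles}


profile = toy_profile([{"name": "Alice", "age": 30}, {"name": "Bob", "age": None}])
print(profile["record_count"])                         # 2
print(profile["field_profiles"]["age"]["null_count"])  # 1
```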
## Convenience Functions

### check_quality

```python
from pycharter import check_quality

report = check_quality(
    contract=contract_dict_or_file_path,
    data=records_or_file_path,
    options=None,  # Defaults to QualityCheckOptions.basic()
)
```

| Parameter | Type | Description |
|---|---|---|
| `contract` | `dict \| str` | Contract dict or file path |
| `data` | `list[dict] \| str \| Callable` | Records, file path, or callable |
| `options` | `QualityCheckOptions \| None` | Options (defaults to `basic()`) |

Returns: `QualityReport`
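The `data` parameter accepts three shapes. The sketch below shows one plausible way a caller-side helper could normalize them into a list of records, assuming a file path names a JSON file holding an array of objects. `resolve_records` is hypothetical and does not describe how PyCharter itself loads data (it may support other file formats).

```python
import json
from typing import Any, Callable, Union

DataSource = Union[list[dict[str, Any]], str, Callable[[], Any]]


def resolve_records(data: DataSource) -> list[dict[str, Any]]:
    """Hypothetical helper: normalize a list, a JSON file path, or a zero-arg callable."""
    if callable(data):
        # Defer loading until the check runs, e.g. a query wrapped in a lambda.
        data = data()
    if isinstance(data, str):
        # Assumption: the path points at a JSON file containing an array of objects.
        with open(data, encoding="utf-8") as fh:
            data = json.load(fh)
    if not isinstance(data, list):
        raise TypeError(f"expected a list of records, got {type(data).__name__}")
    return data


print(len(resolve_records(lambda: [{"name": "Alice"}])))  # 1
```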
### check_quality_with_store

```python
from pycharter import check_quality_with_store

report = check_quality_with_store(
    store=store,
    contract_name="user",
    contract_version="1.0.0",
    data=records,
)
```

| Parameter | Type | Description |
|---|---|---|
| `store` | `MetadataStoreClient` | Connected metadata store |
| `contract_name` | `str` | Contract name in the store |
| `contract_version` | `str` | Contract version |
| `data` | `list[dict] \| str \| Callable` | Records, file path, or callable |
| `options` | `QualityCheckOptions \| None` | Options (defaults to `basic()`) |

Returns: `QualityReport`
### profile_data

```python
from pycharter import profile_data

profile = profile_data(data, fields=["name", "age"])  # or fields=None for all fields
```

| Parameter | Type | Description |
|---|---|---|
| `data` | `list[dict]` | Records to profile |
| `fields` | `list[str] \| None` | Subset of fields (all if `None`) |

Returns: `dict` with keys `record_count`, `field_profiles`, and `overall_stats`.

See the Data Profiling Guide for the full profile structure.
## QualityCheckOptions Presets

Instead of configuring every option, use a preset:

```python
from pycharter import QualityCheckOptions

opts = QualityCheckOptions.basic()       # Quick check
opts = QualityCheckOptions.strict()      # Gated check with thresholds
opts = QualityCheckOptions.monitoring()  # Recurring check with dedup
```

| Preset | Metrics | Violations | Profiling | Thresholds | Skip unchanged | Dedup |
|---|---|---|---|---|---|---|
| `basic()` | Yes | Yes | No | No | No | Yes |
| `strict()` | Yes | Yes | Yes | Yes (defaults) | No | Yes |
| `monitoring()` | Yes | Yes | Yes | Yes (defaults) | Yes | Yes |

You can also customize any preset.
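The exact customization call isn't shown on this page. Since `QualityCheckOptions` is a Pydantic `BaseModel`, one likely route, assuming Pydantic v2, is `QualityCheckOptions.strict().model_copy(update={"sample_size": 1000})`. The runnable sketch below demonstrates the same copy-and-override pattern with a stdlib dataclass stand-in; `OptionsStandIn` and its default values are hypothetical, and only the field names come from the examples on this page.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class OptionsStandIn:
    """Hypothetical stand-in; field names mirror the QualityCheckOptions examples."""
    calculate_metrics: bool = True
    record_violations: bool = True
    check_thresholds: bool = False
    include_field_metrics: bool = False
    sample_size: int | None = None

    @classmethod
    def strict(cls) -> OptionsStandIn:
        # Illustrative values only, not PyCharter's actual preset.
        return cls(check_thresholds=True, include_field_metrics=True)


base = OptionsStandIn.strict()
custom = replace(base, sample_size=1000)  # copy the preset, override one field
print(custom.check_thresholds, custom.sample_size)  # True 1000
print(base.sample_size)                             # None (the preset is untouched)
```

Copying rather than mutating keeps the preset reusable across several checks.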
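## QualityCheck Class

For store-backed schemas, database persistence, or advanced control:

```python
from pycharter import QualityCheck, QualityCheckOptions, QualityThresholds

check = QualityCheck(store=store)
# run() takes a (contract_name, contract_version) pair for store-based contracts;
# thresholds are passed through QualityCheckOptions.
report = check.run(
    contract_name="user",
    contract_version="1.0.0",
    data=records,
    options=QualityCheckOptions(
        check_thresholds=True,
        thresholds=QualityThresholds(min_overall_score=95.0),
    ),
)
```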
## API Reference

### QualityCheck

```python
QualityCheck(
    store: MetadataStoreClient | None = None,
    db_session: "Session" | None = None,
)
```

Contract-based, orchestrator-agnostic quality scoring engine.

Validates data against a data contract, calculates quality scores, records violations, and optionally checks thresholds.

This class can be used:

- Standalone (CLI, API, Python scripts)
- Within orchestrators (Airflow, Prefect, Dagster)
- Via API calls

For post-load structural checks (row count, null rate, uniqueness), see `PostLoadChecker` in `pycharter.etl_generator`.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `store` | `MetadataStoreClient \| None` | Optional metadata store for retrieving contracts and storing violations | `None` |
| `db_session` | `'Session' \| None` | Optional SQLAlchemy database session for persisting metrics and violations | `None` |
Source code in src/pycharter/quality/check.py
#### run

```python
run(
    contract_name: str | None = None,
    contract_version: str | None = None,
    contract: dict[str, Any] | str | None = None,
    data: (
        list[dict[str, Any]]
        | str
        | Callable[[], Any]
        | None
    ) = None,
    options: QualityCheckOptions | None = None,
) -> QualityReport
```

Run a quality check against a data contract. Use `contract_name` and `contract_version` for store-based validation, or `contract` for an in-memory contract dict or file path.
Source code in src/pycharter/quality/check.py
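`run` accepts either a store reference (`contract_name` plus `contract_version`) or an in-memory `contract`. How PyCharter reacts when both or neither are passed isn't documented here, so a caller-side guard such as the hypothetical `check_contract_args` below can make the intended mode explicit before calling `run`. Requiring both halves of the store pair is a choice made by this sketch; PyCharter may be more lenient.

```python
from __future__ import annotations

from typing import Any


def check_contract_args(
    contract_name: str | None = None,
    contract_version: str | None = None,
    contract: dict[str, Any] | str | None = None,
) -> str:
    """Hypothetical guard: return "store" or "in-memory", or raise ValueError."""
    uses_store = contract_name is not None or contract_version is not None
    if uses_store and contract is not None:
        raise ValueError("pass either contract_name/contract_version or contract, not both")
    if uses_store:
        # This sketch insists on the full (name, version) pair.
        if contract_name is None or contract_version is None:
            raise ValueError("store-based validation needs contract_name and contract_version")
        return "store"
    if contract is None:
        raise ValueError("no contract given")
    return "in-memory"


print(check_contract_args(contract_name="user", contract_version="1.0.0"))  # store
print(check_contract_args(contract={"schema": {}}))                         # in-memory
```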
#### run_by_state

```python
run_by_state(
    contract_name: str | None = None,
    contract_version: str | None = None,
    contract: dict[str, Any] | str | None = None,
    data: (
        list[dict[str, Any]]
        | str
        | Callable[[], Any]
        | None
    ) = None,
    state_field: str = "status",
    options: QualityCheckOptions | None = None,
) -> dict[str, QualityReport]
```

Run a quality check segmented by state value.

Groups the data by the value of `state_field`, runs a separate quality check for each group, and returns a mapping from state value to `QualityReport`.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `contract_name` | `str \| None` | Contract name for store-based validation. | `None` |
| `contract_version` | `str \| None` | Contract version for store-based validation. | `None` |
| `contract` | `dict[str, Any] \| str \| None` | In-memory contract dict or file path. | `None` |
| `data` | `list[dict[str, Any]] \| str \| Callable[[], Any] \| None` | Data source (list, file path, or callable). | `None` |
| `state_field` | `str` | Field name to group records by. | `'status'` |
| `options` | `QualityCheckOptions \| None` | Optional quality check options (applied to each group). | `None` |

Returns:

| Type | Description |
|---|---|
| `dict[str, QualityReport]` | Dict mapping each unique state value to its `QualityReport`. |

Example:

```python
qc = QualityCheck()
reports = qc.run_by_state(
    contract=contract_dict,
    data=[{"status": "NEW", "x": 1}, {"status": "ACTIVE", "x": 2}],
    state_field="status",
)
print(reports.keys())  # dict_keys(['NEW', 'ACTIVE'])
```
Source code in src/pycharter/quality/check.py
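The grouping step that `run_by_state` describes can be sketched in plain Python. `group_by_state` is a hypothetical helper; in this sketch, records that lack the field land under the key `None`, which is an assumption because the docs above don't say how such records are handled. Each resulting group would then be checked on its own.

```python
from collections import defaultdict
from typing import Any, Dict, List


def group_by_state(
    records: List[Dict[str, Any]], state_field: str = "status"
) -> Dict[Any, List[Dict[str, Any]]]:
    """Hypothetical helper: bucket records by state_field, keeping first-seen key order."""
    groups: Dict[Any, List[Dict[str, Any]]] = defaultdict(list)
    for record in records:
        groups[record.get(state_field)].append(record)
    return dict(groups)


groups = group_by_state([
    {"status": "NEW", "x": 1},
    {"status": "ACTIVE", "x": 2},
    {"status": "NEW", "x": 3},
])
print(list(groups))                     # ['NEW', 'ACTIVE']
print([r["x"] for r in groups["NEW"]])  # [1, 3]
```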
### QualityThresholds

Bases: `BaseModel`
Quality thresholds for alerting.
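Only `min_overall_score` appears on this page, so the sketch below models that single threshold. `ThresholdsStandIn` and `evaluate` are hypothetical; they illustrate how a breach list and a `passed` flag (the shape of `QualityReport.threshold_breaches` and `QualityReport.passed`) could be derived from a score, not PyCharter's actual breach logic or message format.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ThresholdsStandIn:
    """Hypothetical stand-in for QualityThresholds with a single field."""
    # Presumably on the 0-100 overall_score scale (95.0 in the examples).
    min_overall_score: float | None = None


def evaluate(overall_score: float, thresholds: ThresholdsStandIn) -> tuple[list[str], bool]:
    """Return (breaches, passed) for one score; the message text is made up."""
    breaches: list[str] = []
    if thresholds.min_overall_score is not None and overall_score < thresholds.min_overall_score:
        breaches.append(
            f"overall_score {overall_score:.1f} < min_overall_score {thresholds.min_overall_score:.1f}"
        )
    # "passed" simply means no threshold was breached.
    return breaches, not breaches


print(evaluate(97.0, ThresholdsStandIn(min_overall_score=95.0)))  # ([], True)
```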
### QualityCheckOptions

Bases: `BaseModel`
Options for quality checks.
#### basic (classmethod)

```python
basic() -> QualityCheckOptions
```

Create options for quick one-off quality checks.

Enables metrics and violation recording. Disables profiling and threshold checking for speed.

Returns:

| Type | Description |
|---|---|
| `QualityCheckOptions` | QualityCheckOptions configured for basic checks. |
Source code in src/pycharter/quality/models.py
#### strict (classmethod)

```python
strict() -> QualityCheckOptions
```

Create options for gated quality checks.

Enables all features, including profiling and threshold checking with default thresholds. Use this when quality must meet minimum standards before proceeding.

Returns:

| Type | Description |
|---|---|
| `QualityCheckOptions` | QualityCheckOptions configured for strict checks. |
Source code in src/pycharter/quality/models.py
#### monitoring (classmethod)

```python
monitoring() -> QualityCheckOptions
```

Create options for scheduled/recurring quality checks.

Enables all features plus deduplication and skip-if-unchanged to avoid redundant work in monitoring pipelines.

Returns:

| Type | Description |
|---|---|
| `QualityCheckOptions` | QualityCheckOptions configured for monitoring. |
Source code in src/pycharter/quality/models.py
### QualityReport

The report returned by `QualityCheck.run()` and the convenience functions:

| Attribute | Type | Description |
|---|---|---|
| `schema_id` | `str` | Schema identifier |
| `check_timestamp` | `str` | ISO timestamp |
| `quality_score` | `QualityScore` | Quality metrics |
| `field_metrics` | `dict` | Per-field metrics |
| `record_count` | `int` | Total records |
| `valid_count` | `int` | Valid records |
| `invalid_count` | `int` | Invalid records |
| `violation_count` | `int` | Total violations |
| `threshold_breaches` | `list[str]` | Breached thresholds |
| `passed` | `bool` | All thresholds passed |
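Code that consumes a `QualityReport` only needs the attributes listed above, so it can be unit-tested against a lightweight stand-in. The sketch below uses `types.SimpleNamespace` as a hypothetical fake report and builds a one-line log summary; `summarize` is illustrative and not part of PyCharter.

```python
from types import SimpleNamespace


def summarize(report) -> str:
    """Build a one-line log message from the documented QualityReport attributes."""
    status = "PASS" if report.passed else "FAIL"
    return (
        f"[{status}] {report.schema_id}: score={report.quality_score.overall_score:.1f}, "
        f"valid={report.valid_count}/{report.record_count}, violations={report.violation_count}"
    )


# Hypothetical fake report carrying only the attributes summarize() reads.
fake = SimpleNamespace(
    schema_id="user",
    passed=False,
    quality_score=SimpleNamespace(overall_score=87.5),
    valid_count=7,
    record_count=8,
    violation_count=2,
)
print(summarize(fake))  # [FAIL] user: score=87.5, valid=7/8, violations=2
```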
### QualityScore

| Attribute | Type | Description |
|---|---|---|
| `overall_score` | `float` | 0-100 quality score |
| `violation_rate` | `float` | 0-1 violation ratio |
| `completeness` | `float` | 0-1 completeness ratio |
| `accuracy` | `float` | 0-1 accuracy ratio |
| `field_scores` | `dict[str, float]` | Per-field scores |
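`field_scores` makes it straightforward to point at the weakest fields. The hypothetical `weakest_fields` helper below sorts fields by ascending score and breaks ties by name; it doesn't assume a particular scale, since the table above doesn't say whether per-field scores are 0-100 or 0-1.

```python
def weakest_fields(field_scores: dict[str, float], n: int = 3) -> list[tuple[str, float]]:
    """Return up to n (field, score) pairs with the lowest scores, ties broken by name."""
    return sorted(field_scores.items(), key=lambda item: (item[1], item[0]))[:n]


print(weakest_fields({"email": 0.62, "name": 0.99, "age": 0.80}, n=2))
# [('email', 0.62), ('age', 0.8)]
```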
## Examples

### One-Liner with Strict Thresholds

```python
from pycharter import check_quality, QualityCheckOptions

report = check_quality(
    contract="contracts/user.yaml",
    data="data/users.json",
    options=QualityCheckOptions.strict(),
)
if not report.passed:
    print(f"Breaches: {report.threshold_breaches}")
```
### Store-Based with Custom Options

```python
from pycharter import QualityCheck, QualityCheckOptions, QualityThresholds

check = QualityCheck(store=store)
report = check.run(
    contract_name="user",  # store-based contracts are addressed by name + version
    contract_version="1.0.0",
    data=records,
    options=QualityCheckOptions(
        calculate_metrics=True,
        record_violations=True,
        check_thresholds=True,
        thresholds=QualityThresholds(min_overall_score=95.0),
        include_field_metrics=True,
        sample_size=1000,
    ),
)
```
### Quality Gate in a Pipeline

```python
from pycharter import check_quality, QualityCheckOptions

report = check_quality(
    contract="contracts/orders.yaml",
    data=loaded_records,
    options=QualityCheckOptions.strict(),
)
if not report.passed:
    raise RuntimeError(f"Quality gate failed: {report.threshold_breaches}")
```
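Pipelines that want a dedicated exception type, rather than a bare `RuntimeError`, can wrap the gate in a small helper. `QualityGateError` and `enforce_gate` are hypothetical names. The helper reads only `passed` and `threshold_breaches`, so it is exercised here against a `SimpleNamespace` stand-in instead of a real report; subclassing `RuntimeError` keeps existing `except RuntimeError` handlers working.

```python
from types import SimpleNamespace


class QualityGateError(RuntimeError):
    """Hypothetical exception that carries the breached thresholds."""

    def __init__(self, breaches: list[str]):
        super().__init__(f"Quality gate failed: {breaches}")
        self.breaches = breaches


def enforce_gate(report) -> None:
    """Raise QualityGateError when the report did not pass its thresholds."""
    if not report.passed:
        raise QualityGateError(list(report.threshold_breaches))


enforce_gate(SimpleNamespace(passed=True, threshold_breaches=[]))  # no exception
try:
    enforce_gate(SimpleNamespace(passed=False, threshold_breaches=["min_overall_score"]))
except QualityGateError as exc:
    print(exc.breaches)  # ['min_overall_score']
```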
## See Also

- Quality Monitoring Tutorial: full walkthrough
- Data Profiling Guide: profile structure and drift detection
- Pipeline Quality Checks: column/dataset-level checks with PostLoadChecker
- Contract vs Pipeline Quality: when to use each
- Metadata Store: storing schemas and quality metrics