Data Quality: Contract vs Pipeline
PyCharter supports two distinct kinds of data quality checks. Use the right one for your use case.
Two Types of Quality
| | Contract quality (row-based) | Pipeline quality (column/dataset-based) |
|---|---|---|
| Scope | Each record validated against a contract | Batches/columns (e.g. row count, null rate) |
| Output | Per-record results, accuracy, completeness, score 0–100 | Pass/fail per check (row count, null rate, uniqueness, etc.) |
| When to use | “Is this row valid?” Record-level validation and metrics | “Did this run meet expectations?” ETL run checks |
| API | `POST /quality/check` (REST), `QualityCheck` (Python) | ETL run with `quality_checks` in load config; `pipeline_quality_report` in run response |
| Runs dashboard | `contract_quality_score`, `contract_quality_passed` | `pipeline_quality_passed` |
Contract Quality (Row-Based)
- What it is: Validate each row against a data contract (schema + rules). You get per-record pass/fail, aggregate metrics (accuracy, completeness, overall score), and optional violation storage.
- Use for: Incoming data validation, data quality dashboards, compliance, and any “is this record valid?” question.
- Docs: Data Quality Monitoring, QualityCheck API, REST `POST /quality/check`.
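To make the row-based idea concrete, here is a minimal, library-independent sketch. It does not use PyCharter's `QualityCheck` API; the `CONTRACT` mapping, the helper names, and the formulas for accuracy, completeness, and score are all assumptions chosen for illustration, and PyCharter's actual definitions may differ.

```python
from typing import Any, Callable

# Hypothetical stand-in for a data contract: one validation rule per required field.
# This is NOT PyCharter's contract format, only an illustration of row-based checks.
CONTRACT: dict[str, Callable[[Any], bool]] = {
    "id": lambda v: isinstance(v, int) and v > 0,
    "email": lambda v: isinstance(v, str) and "@" in v,
}


def check_record(record: dict[str, Any]) -> list[str]:
    """Return the violations for one record (an empty list means the record is valid)."""
    violations = []
    for field, rule in CONTRACT.items():
        value = record.get(field)
        if value is None:
            violations.append(f"{field}: missing")
        elif not rule(value):
            violations.append(f"{field}: invalid value {value!r}")
    return violations


def contract_quality(records: list[dict[str, Any]]) -> dict[str, Any]:
    """Validate every record and aggregate the results (assumed metric definitions)."""
    per_record = [check_record(r) for r in records]
    total = len(records)
    valid = sum(1 for v in per_record if not v)
    slots = total * len(CONTRACT)
    filled = sum(1 for r in records for f in CONTRACT if r.get(f) is not None)
    accuracy = valid / total if total else 0.0          # share of fully valid records
    completeness = filled / slots if slots else 0.0     # share of populated required fields
    return {
        "per_record": per_record,
        "accuracy": accuracy,
        "completeness": completeness,
        "score": round(100 * (accuracy + completeness) / 2, 1),  # assumed 0-100 blend
    }


sample = [
    {"id": 1, "email": "a@example.com"},
    {"id": -5, "email": "b@example.com"},
    {"id": 3, "email": None},
]
result = contract_quality(sample)
```

The point of the sketch is the shape of the output: one verdict per record plus aggregate metrics, which is what distinguishes contract quality from the run-level checks described next.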
Pipeline Quality (Column/Dataset-Based)
- What it is: Checks run after a pipeline load (e.g. row count, null rate, uniqueness, custom expressions). Results are pass/fail per check and an overall “passed” for the run.
- Use for: ETL run assurance: “Did this run have the expected row count?”, “Was null rate within bounds?”
- Docs: Pipeline quality guide, ETL run response field `pipeline_quality_report`, REST runs field `pipeline_quality_passed`.
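The dataset-level checks described above can be sketched in plain Python. This is an illustration of the concept only: the check dictionaries (`type`, `column`, `min`, `max`) and the report layout are hypothetical and are not PyCharter's `quality_checks` schema or its `pipeline_quality_report` format.

```python
from typing import Any


def run_pipeline_checks(
    rows: list[dict[str, Any]], checks: list[dict[str, Any]]
) -> dict[str, Any]:
    """Evaluate batch-level checks and return per-check results plus an overall verdict."""
    results = []
    for check in checks:
        kind = check["type"]
        column = check.get("column")
        if kind == "row_count":
            observed = len(rows)
            passed = check.get("min", 0) <= observed <= check.get("max", float("inf"))
        elif kind == "null_rate":
            nulls = sum(1 for row in rows if row.get(column) is None)
            observed = nulls / len(rows) if rows else 0.0
            passed = observed <= check["max"]
        elif kind == "uniqueness":
            values = [row.get(column) for row in rows]
            observed = len(values) - len(set(values))  # count of duplicate values
            passed = observed == 0
        else:
            raise ValueError(f"unknown check type: {kind}")
        results.append(
            {"type": kind, "column": column, "observed": observed, "passed": passed}
        )
    # The run passes only if every individual check passes.
    return {"checks": results, "passed": all(r["passed"] for r in results)}


rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 2, "email": "c@example.com"},
]
report = run_pipeline_checks(
    rows,
    [
        {"type": "row_count", "min": 1, "max": 10},
        {"type": "null_rate", "column": "email", "max": 0.5},
        {"type": "uniqueness", "column": "id"},
    ],
)
```

Unlike the row-based case, no individual record is marked invalid here; the duplicated `id` only surfaces as a failed uniqueness check on the batch as a whole.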
Choosing One or Both
- Record-level only: Use contract quality (e.g. `POST /quality/check` or `QualityCheck.run()`).
- Run-level only: Use pipeline quality by adding `quality_checks` to your ETL load config; read `pipeline_quality_report` and `pipeline_quality_passed` from the run.
- Both: Run contract quality for record validation and metrics; run pipeline quality in the same or a separate ETL run for batch/column checks. The runs API exposes both via `contract_quality_*` and `pipeline_quality_passed`.
REST API Summary
- Contract quality: `POST /api/v1/quality/check`. The request body includes the contract/schema and data; the response includes `quality_score`, `passed`, and violations.
- Pipeline quality: `POST /api/v1/etl/run` with a load config that includes `quality_checks`. The response includes `pipeline_quality_report` (and runs store `pipeline_quality_passed`).
- Runs: `GET /api/v1/runs` (and related endpoints) return `contract_quality_score`, `contract_quality_passed`, and `pipeline_quality_passed` so you can tell which type of quality was run and the result.
See REST API for endpoint details.
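As a rough sketch of how a client might interpret a run record, the helper below reads the three run fields named above. It assumes a field is `null` or absent when that type of quality check was not part of the run; that convention and the `summarize_run` helper are hypothetical, so confirm the actual behavior against the REST API reference.

```python
from typing import Any, Optional


def _verdict(passed: Optional[bool]) -> str:
    """Map a tri-state flag to a label; None is ASSUMED to mean the check did not run."""
    if passed is None:
        return "not run"
    return "passed" if passed else "failed"


def summarize_run(run: dict[str, Any]) -> dict[str, Any]:
    """Summarize which quality types ran for one run record and how they turned out."""
    return {
        "contract": _verdict(run.get("contract_quality_passed")),
        "contract_score": run.get("contract_quality_score"),
        "pipeline": _verdict(run.get("pipeline_quality_passed")),
    }


# Example run record using only the field names from this page; the values are made up.
summary = summarize_run(
    {
        "contract_quality_score": 92.5,
        "contract_quality_passed": True,
        "pipeline_quality_passed": None,
    }
)
```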