Data Quality: Contract vs Pipeline

PyCharter supports two distinct kinds of data quality checks. Use the right one for your use case.

Two Types of Quality

| | Contract quality (row-based) | Pipeline quality (column/dataset-based) |
| --- | --- | --- |
| Scope | Each record validated against a contract | Batches/columns (e.g. row count, null rate) |
| Output | Per-record results, accuracy, completeness, score 0–100 | Pass/fail per check (row count, null rate, uniqueness, etc.) |
| When to use | “Is this row valid?” Record-level validation and metrics | “Did this run meet expectations?” ETL run checks |
| API | POST /quality/check (REST), QualityCheck (Python) | ETL run with quality_checks in load config; pipeline_quality_report in run response |
| Runs dashboard | contract_quality_score, contract_quality_passed | pipeline_quality_passed |

Contract Quality (Row-Based)

  • What it is: Validate each row against a data contract (schema + rules). You get per-record pass/fail, aggregate metrics (accuracy, completeness, overall score), and optional violation storage.
  • Use for: Incoming data validation, data quality dashboards, compliance, and any “is this record valid?” question.
  • Docs: Data Quality Monitoring, QualityCheck API, REST POST /quality/check.
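
To make the row-based idea concrete, here is a minimal plain-Python sketch of validating each record against a contract and rolling the results up into aggregate metrics. It does not use PyCharter's API: the contract format (field name mapped to a type and an optional rule), the function names check_record and contract_quality, and the scoring formula (score = percentage of fully valid records) are all assumptions made for illustration.

```python
from typing import Any, Callable, Optional


def check_record(record: dict, contract: dict) -> list:
    """Return violation messages for one record (an empty list means valid)."""
    violations = []
    for field, (expected_type, rule) in contract.items():
        value = record.get(field)
        if value is None:
            violations.append(f"{field}: missing")
        elif not isinstance(value, expected_type):
            violations.append(f"{field}: expected {expected_type.__name__}")
        elif rule is not None and not rule(value):
            violations.append(f"{field}: rule failed")
    return violations


def contract_quality(records: list, contract: dict) -> dict:
    """Validate every record and compute illustrative aggregate metrics."""
    per_record = [check_record(r, contract) for r in records]
    total = len(records)
    if total == 0 or not contract:
        return {"per_record": per_record, "accuracy": 0.0,
                "completeness": 0.0, "score": 0.0}
    valid = sum(1 for v in per_record if not v)
    # Completeness: share of contract fields that are present and non-null.
    present = sum(1 for r in records for f in contract if r.get(f) is not None)
    accuracy = valid / total
    completeness = present / (total * len(contract))
    # Assumed scoring formula; PyCharter's real score may be computed differently.
    return {"per_record": per_record, "accuracy": accuracy,
            "completeness": completeness, "score": 100.0 * accuracy}


# Hypothetical contract: field -> (required type, optional rule).
contract = {"id": (int, None), "email": (str, lambda v: "@" in v)}
records = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": "3", "email": "c@example.com"},
    {"id": 4, "email": "not-an-email"},
]
result = contract_quality(records, contract)
```

The key property is that every record yields its own result, and the aggregate metrics are derived from those per-record results rather than from the dataset as a whole.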

Pipeline Quality (Column/Dataset-Based)

  • What it is: Checks run after a pipeline load (e.g. row count, null rate, uniqueness, custom expressions). Results are pass/fail per check and an overall “passed” for the run.
  • Use for: ETL run assurance: “Did this run have the expected row count?”, “Was null rate within bounds?”
  • Docs: Pipeline quality guide, ETL run response field pipeline_quality_report, REST runs field pipeline_quality_passed.
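
By contrast, dataset-level checks look at the loaded batch as a whole. The sketch below shows the three checks named above (row count, null rate, uniqueness) in plain Python, each returning pass/fail, plus an overall "passed" for the run. The function names and the result dictionaries are hypothetical; they are not the actual format of quality_checks or pipeline_quality_report.

```python
from functools import partial


def row_count_check(rows, min_rows, max_rows=None):
    """Pass if the batch size is within the expected bounds."""
    n = len(rows)
    ok = n >= min_rows and (max_rows is None or n <= max_rows)
    return {"check": "row_count", "passed": ok, "observed": n}


def null_rate_check(rows, column, max_null_rate):
    """Pass if the fraction of null values in a column is within bounds."""
    nulls = sum(1 for r in rows if r.get(column) is None)
    rate = nulls / len(rows) if rows else 0.0
    return {"check": f"null_rate({column})",
            "passed": rate <= max_null_rate, "observed": rate}


def uniqueness_check(rows, column):
    """Pass if no non-null value in the column appears more than once."""
    values = [r.get(column) for r in rows if r.get(column) is not None]
    duplicates = len(values) - len(set(values))
    return {"check": f"unique({column})",
            "passed": duplicates == 0, "observed": duplicates}


def pipeline_quality(rows, checks):
    """Run every check on the whole batch; the run passes only if all pass."""
    results = [check(rows) for check in checks]
    return {"checks": results, "passed": all(r["passed"] for r in results)}


rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 2, "email": "c@example.com"},
    {"id": 4, "email": "d@example.com"},
]
report = pipeline_quality(rows, [
    partial(row_count_check, min_rows=1),
    partial(null_rate_check, column="email", max_null_rate=0.5),
    partial(uniqueness_check, column="id"),
])
```

Here no individual row is judged valid or invalid; the duplicate id only shows up because the check sees the whole column at once.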

Choosing One or Both

  • Record-level only: Use contract quality (e.g. POST /quality/check or QualityCheck.run()).
  • Run-level only: Use pipeline quality by adding quality_checks to your ETL load config; read pipeline_quality_report and pipeline_quality_passed from the run.
  • Both: Run contract quality for record validation and metrics; run pipeline quality in the same or a separate ETL run for batch/column checks. The runs API exposes both via contract_quality_* and pipeline_quality_passed.
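
When both kinds of quality may be present on a run, a small helper can report which ones actually ran and whether they passed. The sketch below reads only the three run fields named above. It assumes a field is absent or None when that type of check was not run, which this page does not confirm; the helper name and the shape of its return value are also hypothetical.

```python
from typing import Any, Mapping


def summarize_run_quality(run: Mapping[str, Any]) -> dict:
    """Report which quality types have results on a run record.

    Assumption: a missing or None field means that check type did not run.
    """
    contract_passed = run.get("contract_quality_passed")
    contract_score = run.get("contract_quality_score")
    pipeline_passed = run.get("pipeline_quality_passed")

    contract_ran = contract_passed is not None or contract_score is not None
    pipeline_ran = pipeline_passed is not None

    # Overall verdict over the checks that ran; None if nothing ran.
    outcomes = [p for p in (contract_passed, pipeline_passed) if p is not None]
    all_passed = all(outcomes) if outcomes else None

    return {
        "contract": {"ran": contract_ran, "passed": contract_passed,
                     "score": contract_score},
        "pipeline": {"ran": pipeline_ran, "passed": pipeline_passed},
        "all_passed": all_passed,
    }


# Example run record with illustrative values (not real output).
summary = summarize_run_quality({
    "contract_quality_score": 92.5,
    "contract_quality_passed": True,
    "pipeline_quality_passed": False,
})
```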

REST API Summary

  • Contract quality: POST /api/v1/quality/check. The request body includes the contract/schema and the data; the response includes quality_score, passed, and violations.
  • Pipeline quality: POST /api/v1/etl/run with a load config that includes quality_checks. The response includes pipeline_quality_report, and the stored run records pipeline_quality_passed.
  • Runs: GET /api/v1/runs and related endpoints return contract_quality_score, contract_quality_passed, and pipeline_quality_passed, so you can tell which type of quality was run and what the result was.
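
As a rough illustration of the contract quality call, the sketch below builds a POST request to /api/v1/quality/check with the standard library and extracts the three response fields listed above. The host in BASE_URL, the body keys "contract" and "data", and both helper names are assumptions; check the REST API reference for the actual request schema before relying on them.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # hypothetical host; replace with your server


def build_quality_check_request(contract: dict, data: list) -> urllib.request.Request:
    """Build (but do not send) a contract quality request."""
    # The body keys "contract" and "data" are assumed, not confirmed.
    body = json.dumps({"contract": contract, "data": data}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/quality/check",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_quality_check_response(raw: str) -> dict:
    """Pick out the response fields named in this summary."""
    payload = json.loads(raw)
    return {
        "quality_score": payload.get("quality_score"),
        "passed": payload.get("passed"),
        "violations": payload.get("violations", []),
    }


request = build_quality_check_request(
    contract={"name": "users"},  # placeholder contract for illustration
    data=[{"id": 1, "email": "a@example.com"}],
)
# To send it: urllib.request.urlopen(request), then pass the decoded body
# to parse_quality_check_response.
```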

See REST API for endpoint details.