Testing Framework
pycharter.etl_generator.testing provides mock components, assertion helpers, fixture loading, and a test harness for writing isolated unit and integration tests of ETL pipelines.
See Testing Pipelines for the full guide.
Classes
MockExtractor
Mock extractor that yields pre-configured fixture data.
Implements the Extractor protocol for testing pipelines without real data sources.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | list[dict[str, Any]] \| list[list[dict[str, Any]]] | Fixture records. Either a flat list of dicts (auto-batched by batch_size) or a pre-batched list of lists. | required |
| batch_size | int | Number of records per batch when data is flat. | 1000 |
Source code in src/pycharter/etl_generator/testing.py
extract (async)
Yield fixture data as batches.
Yields:
| Type | Description |
|---|---|
| AsyncIterator[list[dict[str, Any]]] | Batches of records from the configured fixture data. |
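The batching rule described for data and batch_size can be sketched as follows. This is a standalone illustration, not the pycharter source: SketchExtractor and collect are hypothetical names, and the sketch assumes pre-batched input is recognized by its first element being a list.

```python
import asyncio
from typing import Any

Record = dict[str, Any]


class SketchExtractor:
    """Hypothetical stand-in that mimics the documented batching behaviour."""

    def __init__(self, data: list[Any], batch_size: int = 1000) -> None:
        # data is either a flat list of records or a list of batches.
        self.data = data
        self.batch_size = batch_size

    async def extract(self):
        # Assumption: input counts as pre-batched when its first element is a list.
        if self.data and isinstance(self.data[0], list):
            for batch in self.data:
                yield list(batch)
            return
        # Flat input: slice into chunks of at most batch_size records.
        for start in range(0, len(self.data), self.batch_size):
            yield self.data[start:start + self.batch_size]


async def collect(extractor: SketchExtractor) -> list[list[Record]]:
    """Drain the async generator into a list of batches."""
    return [batch async for batch in extractor.extract()]


flat = [{"id": i} for i in range(5)]
flat_batches = asyncio.run(collect(SketchExtractor(flat, batch_size=2)))
print([len(batch) for batch in flat_batches])  # [2, 2, 1]
```

A small batch_size in tests is a cheap way to exercise multi-batch code paths with only a handful of fixture records.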
MockLoader (dataclass)
MockLoader(
    simulate_failure: bool = False,
    failure_error: str = "Simulated load failure",
    loaded_records: list[dict[str, Any]] = list(),
    load_calls: list[list[dict[str, Any]]] = list(),
)
Mock loader that captures loaded records for assertion.
Implements the Loader protocol. Records are accumulated in
loaded_records and each call's batch is stored in load_calls.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| simulate_failure | bool | If True, | False |
| failure_error | str | Error message used when simulating failure. | 'Simulated load failure' |
load (async)
load(
    data: list[dict[str, Any]], **params: Any
) -> LoadResult
Capture loaded data and return a result.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | list[dict[str, Any]] | Batch of records to load. | required |
| **params | Any | Ignored. | {} |

Returns:
| Type | Description |
|---|---|
| LoadResult | LoadResult indicating success or simulated failure. |
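The capture behaviour described above (a running loaded_records list plus one load_calls entry per call) might look roughly like this. SketchLoader and SketchLoadResult are hypothetical stand-ins; in particular, the fields of the real LoadResult are not documented here, and the sketch assumes a simulated failure captures nothing.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any

Record = dict[str, Any]


@dataclass
class SketchLoadResult:
    """Hypothetical stand-in for LoadResult; the real fields may differ."""
    success: bool
    rows_loaded: int = 0
    error: str = ""


@dataclass
class SketchLoader:
    """Hypothetical capturing loader modelled on the documented fields."""
    simulate_failure: bool = False
    failure_error: str = "Simulated load failure"
    loaded_records: list[Record] = field(default_factory=list)
    load_calls: list[list[Record]] = field(default_factory=list)

    async def load(self, data: list[Record], **params: Any) -> SketchLoadResult:
        if self.simulate_failure:
            # Assumption: a simulated failure stores nothing and reports the message.
            return SketchLoadResult(success=False, error=self.failure_error)
        self.load_calls.append(list(data))  # one entry per call
        self.loaded_records.extend(data)    # flat view across all calls
        return SketchLoadResult(success=True, rows_loaded=len(data))


async def demo() -> SketchLoader:
    loader = SketchLoader()
    await loader.load([{"id": 1}])
    await loader.load([{"id": 2}, {"id": 3}])
    return loader


loader = asyncio.run(demo())
print(len(loader.load_calls), len(loader.loaded_records))  # 2 3
```

Keeping both views lets a test assert on the final data and, separately, on how the pipeline chunked its writes.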
PipelineTestHarness
PipelineTestHarness(
    pipeline: Any,
    fixture_data: (
        list[dict[str, Any]] | list[list[dict[str, Any]]]
    ),
    batch_size: int = 1000,
)
Run a pipeline with mock I/O injected.
Works with both programmatic and config-driven pipelines by replacing the extractor and loader with mock implementations.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| pipeline | Any | The pipeline to test. | required |
| fixture_data | list[dict[str, Any]] \| list[list[dict[str, Any]]] | Test data: a flat list of dicts or a pre-batched list of lists. | required |
| batch_size | int | Batch size for the mock extractor. | 1000 |
Example
harness = PipelineTestHarness(
    pipeline, fixture_data=[{"id": 1, "name": "Alice"}]
)
result = await harness.run()
assert result.success
assert harness.loaded_records == [{"id": 1, "name": "Alice"}]
run (async)
run(**params: Any) -> PipelineResult
Run the pipeline with mock extractor and loader.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| **params | Any | Passed through to | {} |

Returns:
| Type | Description |
|---|---|
| PipelineResult | PipelineResult from the pipeline execution. |
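The idea of replacing a pipeline's extractor and loader with mocks can be illustrated with a toy model. Everything in this sketch is hypothetical (ToyPipeline, ListExtractor, CapturingLoader, SketchHarness, and the extractor/loader attribute names); how pycharter pipelines actually expose their I/O components is not shown in this reference.

```python
import asyncio
from typing import Any, Callable

Record = dict[str, Any]


class ListExtractor:
    """Hypothetical mock source that replays fixture batches."""

    def __init__(self, batches: list[list[Record]]) -> None:
        self.batches = batches

    async def extract(self):
        for batch in self.batches:
            yield batch


class CapturingLoader:
    """Hypothetical mock sink that remembers everything it was given."""

    def __init__(self) -> None:
        self.loaded_records: list[Record] = []

    async def load(self, data: list[Record]) -> None:
        self.loaded_records.extend(data)


class ToyPipeline:
    """Hypothetical pipeline with swappable extractor and loader attributes."""

    def __init__(self, transform: Callable[[Record], Record]) -> None:
        self.transform = transform
        self.extractor: Any = None  # a real data source in production
        self.loader: Any = None     # a real destination in production

    async def run(self) -> int:
        rows = 0
        async for batch in self.extractor.extract():
            out = [self.transform(record) for record in batch]
            await self.loader.load(out)
            rows += len(out)
        return rows


class SketchHarness:
    """Injects the mocks, then delegates to the pipeline's own run()."""

    def __init__(self, pipeline: ToyPipeline, fixture_data: list[Record]) -> None:
        self.pipeline = pipeline
        self.loader = CapturingLoader()
        pipeline.extractor = ListExtractor([fixture_data])
        pipeline.loader = self.loader

    @property
    def loaded_records(self) -> list[Record]:
        return self.loader.loaded_records

    async def run(self) -> int:
        return await self.pipeline.run()


pipeline = ToyPipeline(lambda r: {**r, "name": r["name"].upper()})
harness = SketchHarness(pipeline, [{"id": 1, "name": "Alice"}])
rows = asyncio.run(harness.run())
print(rows, harness.loaded_records)  # 1 [{'id': 1, 'name': 'ALICE'}]
```

Because only the I/O ends are swapped, the transform logic under test runs exactly as it would in production.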
TestFixture (dataclass)
TestFixture(
    name: str = "",
    records: tuple[dict[str, Any], ...] = (),
    batches: tuple[tuple[dict[str, Any], ...], ...] = (),
    metadata: dict[str, Any] = dict(),
)
Container for loaded test fixture data.
Attributes:
| Name | Type | Description |
|---|---|---|
| name | str | Optional fixture name from metadata. |
| records | tuple[dict[str, Any], ...] | Flat list of records. |
| batches | tuple[tuple[dict[str, Any], ...], ...] | Pre-batched records (empty if data was flat). |
| metadata | dict[str, Any] | Additional metadata from the fixture file. |
Functions
load_fixture
Load fixture data from a YAML or JSON file.
Supported formats
- Top-level list: [{id: 1}, ...]
- Dict with records key: {records: [{id: 1}, ...]}
- Dict with batches key: flattened into a single list.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str \| Path | Path to the fixture file (.yaml, .yml, or .json). | required |

Returns:
| Type | Description |
|---|---|
| list[dict[str, Any]] | Flat list of records. |

Raises:
| Type | Description |
|---|---|
| FileNotFoundError | If the file does not exist. |
| ValueError | If the format is unrecognized. |
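The three supported shapes and the two documented errors can be illustrated with a JSON-only sketch. load_json_fixture is a hypothetical helper written for this example, not the library function; YAML handling is left out to keep the sketch dependency-free.

```python
import json
import tempfile
from pathlib import Path
from typing import Any

Record = dict[str, Any]


def load_json_fixture(path: Any) -> list[Record]:
    """Hypothetical JSON-only loader covering the three documented shapes."""
    path = Path(path)
    if not path.exists():
        raise FileNotFoundError(f"Fixture file not found: {path}")
    raw = json.loads(path.read_text(encoding="utf-8"))
    if isinstance(raw, list):
        return raw  # top-level list
    if isinstance(raw, dict) and "records" in raw:
        return list(raw["records"])  # dict with a records key
    if isinstance(raw, dict) and "batches" in raw:
        # dict with a batches key: flatten into a single list
        return [record for batch in raw["batches"] for record in batch]
    raise ValueError(f"Unrecognized fixture format in {path}")


with tempfile.TemporaryDirectory() as tmp:
    fixture_path = Path(tmp) / "users.json"
    fixture_path.write_text(
        json.dumps({"batches": [[{"id": 1}], [{"id": 2}]]}), encoding="utf-8"
    )
    flattened = load_json_fixture(fixture_path)

print(flattened)  # [{'id': 1}, {'id': 2}]
```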
load_test_fixture
load_test_fixture(path: str | Path) -> TestFixture
Load a test fixture with full metadata.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str \| Path | Path to the fixture file (.yaml, .yml, or .json). | required |

Returns:
| Type | Description |
|---|---|
| TestFixture | TestFixture with records, batches, and metadata. |

Raises:
| Type | Description |
|---|---|
| FileNotFoundError | If the file does not exist. |
| ValueError | If the format is unrecognized. |
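To show how a fixture container can keep batches and metadata alongside the flat records, here is a sketch that works on already-parsed data. SketchFixture and build_fixture are hypothetical, and the assumption that metadata lives under a top-level "metadata" key (with the name inside it) is a guess made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SketchFixture:
    """Hypothetical container mirroring the documented TestFixture fields."""
    name: str = ""
    records: tuple = ()
    batches: tuple = ()
    metadata: dict = field(default_factory=dict)


def build_fixture(raw: Any) -> SketchFixture:
    """Turn parsed YAML/JSON content into a SketchFixture."""
    if isinstance(raw, list):
        return SketchFixture(records=tuple(raw))
    if not isinstance(raw, dict):
        raise ValueError("Unrecognized fixture format")
    # Assumption: metadata is stored under a "metadata" key.
    metadata = dict(raw.get("metadata", {}))
    name = str(metadata.get("name", ""))
    if "batches" in raw:
        batches = tuple(tuple(batch) for batch in raw["batches"])
        # records is the flattened view of the batches
        records = tuple(record for batch in batches for record in batch)
        return SketchFixture(name, records, batches, metadata)
    if "records" in raw:
        # Flat data: batches stays empty, as the attribute table describes.
        return SketchFixture(name, tuple(raw["records"]), (), metadata)
    raise ValueError("Unrecognized fixture format")


fixture = build_fixture(
    {"metadata": {"name": "users"}, "batches": [[{"id": 1}], [{"id": 2}]]}
)
print(fixture.name, len(fixture.records), len(fixture.batches))  # users 2 2
```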
validate_pipeline_config
validate_pipeline_config(
    config: dict[str, Any] | str | Path,
    *,
    variables: dict[str, str] | None = None
) -> tuple[bool, list[dict[str, Any]]]
Validate a pipeline configuration for correctness.
Wraps the existing ConfigValidator for a simpler testing API.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config | dict[str, Any] \| str \| Path | Pipeline config as a dict, file path string, or Path object. | required |
| variables | dict[str, str] \| None | Optional variables for | None |

Returns:
| Type | Description |
|---|---|
| tuple[bool, list[dict[str, Any]]] | Tuple of (is_valid, errors), where errors is a list of dicts with 'section' and 'message' keys. |
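Since the errors come back as dicts with 'section' and 'message' keys, a test can turn them into a readable failure message. format_config_errors is a hypothetical helper, and the sample errors below are made-up values that only illustrate the documented shape.

```python
from typing import Any


def format_config_errors(errors: list[dict[str, Any]]) -> str:
    """Render one '[section] message' line per error (hypothetical helper)."""
    return "\n".join(
        f"[{error.get('section', '?')}] {error.get('message', '')}"
        for error in errors
    )


# Made-up sample in the documented (is_valid, errors) shape.
is_valid = False
errors = [
    {"section": "extract", "message": "missing 'type'"},
    {"section": "load", "message": "unknown option 'mode'"},
]

report = format_config_errors(errors)
print(report)
```

In a test, the report can be passed as the assertion message, for example `assert is_valid, report`, so a failing config shows every problem at once.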
assert_records_match
assert_records_match(
    actual: list[dict[str, Any]],
    expected: list[dict[str, Any]],
    *,
    order_matters: bool = True,
    subset: bool = False
) -> None
Assert that actual records match expected records.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| actual | list[dict[str, Any]] | Records produced by the pipeline. | required |
| expected | list[dict[str, Any]] | Expected records. | required |
| order_matters | bool | If True, records must be in the same order. | True |
| subset | bool | If True, actual must contain expected as a subset. | False |

Raises:
| Type | Description |
|---|---|
| AssertionError | If records do not match. |
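One way to read the order_matters and subset flags is sketched below. sketch_records_match is a hypothetical reimplementation, not the library code; it assumes subset means every expected record appears somewhere in actual, and it compares unordered records through a canonical JSON string because dicts are not hashable.

```python
import json
from typing import Any

Record = dict[str, Any]


def _key(record: Record) -> str:
    """Canonical string form so dicts can be compared regardless of order."""
    return json.dumps(record, sort_keys=True, default=str)


def sketch_records_match(
    actual: list[Record],
    expected: list[Record],
    *,
    order_matters: bool = True,
    subset: bool = False,
) -> None:
    if subset:
        # Assumption: each expected record must be found in actual (once each).
        remaining = [_key(record) for record in actual]
        for record in expected:
            key = _key(record)
            assert key in remaining, f"Expected record not found: {record}"
            remaining.remove(key)
        return
    assert len(actual) == len(expected), (
        f"Record count differs: {len(actual)} != {len(expected)}"
    )
    if order_matters:
        for index, (got, want) in enumerate(zip(actual, expected)):
            assert got == want, f"Record {index} differs: {got} != {want}"
    else:
        assert sorted(map(_key, actual)) == sorted(map(_key, expected)), (
            "Records differ (order ignored)"
        )


rows = [{"id": 2}, {"id": 1}]
sketch_records_match(rows, [{"id": 1}, {"id": 2}], order_matters=False)
sketch_records_match(rows, [{"id": 1}], subset=True)
```

Ignoring order is useful when the pipeline under test does not guarantee output ordering, for example after a parallel transform.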
assert_record_count
assert_record_count(
    result: PipelineResult, expected: int
) -> None
Assert the number of loaded rows in a pipeline result.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| result | PipelineResult | PipelineResult from a pipeline run. | required |
| expected | int | Expected number of loaded rows. | required |

Raises:
| Type | Description |
|---|---|
| AssertionError | If the count does not match. |
assert_fields_present
Assert that all specified fields exist in every record.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| records | list[dict[str, Any]] | List of records to check. | required |
| fields | list[str] | Field names that must be present. | required |

Raises:
| Type | Description |
|---|---|
| AssertionError | If any field is missing from any record. |
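The check described here amounts to a key-membership test per record. The sketch below is a hypothetical equivalent (sketch_fields_present is not the library function) and assumes a key whose value is None still counts as present.

```python
from typing import Any


def sketch_fields_present(records: list[dict[str, Any]], fields: list[str]) -> None:
    """Fail on the first record that lacks any of the required keys."""
    for index, record in enumerate(records):
        missing = [name for name in fields if name not in record]
        assert not missing, f"Record {index} is missing fields: {missing}"


users = [{"id": 1, "email": "a@example.com"}, {"id": 2, "email": None}]
sketch_fields_present(users, ["id", "email"])  # passes: None still counts as present
```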
assert_no_field
Assert that a field does not exist in any record.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| records | list[dict[str, Any]] | List of records to check. | required |
| field_name | str | Field name that must be absent. | required |

Raises:
| Type | Description |
|---|---|
| AssertionError | If the field is found in any record. |
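A typical use for this kind of check is confirming that a transform dropped a sensitive column. The following is a hypothetical equivalent (sketch_no_field and the sample data are illustrative only), assuming absence means the key is not in the dict at all.

```python
from typing import Any


def sketch_no_field(records: list[dict[str, Any]], field_name: str) -> None:
    """Fail if any record still carries the given key."""
    for index, record in enumerate(records):
        assert field_name not in record, (
            f"Record {index} unexpectedly contains field '{field_name}'"
        )


raw = [{"id": 1, "password": "hunter2"}]
# Illustrative transform that strips the sensitive key.
cleaned = [{k: v for k, v in record.items() if k != "password"} for record in raw]
sketch_no_field(cleaned, "password")
```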
assert_field_values
assert_field_values(
    records: list[dict[str, Any]],
    field_name: str,
    expected_values: list[Any],
) -> None
Assert that a specific field has the expected values across records.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| records | list[dict[str, Any]] | List of records to check. | required |
| field_name | str | Field to extract values from. | required |
| expected_values | list[Any] | Expected values in order. | required |

Raises:
| Type | Description |
|---|---|
| AssertionError | If values do not match. |
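Comparing one column across records can be sketched as extracting that field in order and comparing lists. sketch_field_values is a hypothetical equivalent; it assumes a record lacking the field contributes None rather than raising.

```python
from typing import Any


def sketch_field_values(
    records: list[dict[str, Any]], field_name: str, expected_values: list[Any]
) -> None:
    """Compare the ordered values of one field against the expected list."""
    # Assumption: a missing field is read as None.
    actual_values = [record.get(field_name) for record in records]
    assert actual_values == expected_values, (
        f"Field '{field_name}' values differ: {actual_values} != {expected_values}"
    )


orders = [{"id": 1, "status": "paid"}, {"id": 2, "status": "open"}]
sketch_field_values(orders, "status", ["paid", "open"])
```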
assert_schema_shape
Assert that field values match expected types.
None values are allowed for any field type.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| records | list[dict[str, Any]] | List of records to check. | required |
| schema | dict[str, type] | Mapping of field name to expected Python type. | required |

Raises:
| Type | Description |
|---|---|
| AssertionError | If any field has an unexpected type. |
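The type check with its None allowance can be sketched with isinstance. sketch_schema_shape is a hypothetical equivalent; treating a missing field the same as None is an assumption of this sketch, not documented behaviour.

```python
from typing import Any


def sketch_schema_shape(records: list[dict[str, Any]], schema: dict[str, type]) -> None:
    """Check each field's value against its expected type, allowing None."""
    for index, record in enumerate(records):
        for name, expected_type in schema.items():
            value = record.get(name)  # assumption: missing is treated like None
            if value is None:
                continue
            # Note: isinstance(True, int) is True, so bools pass an int check.
            assert isinstance(value, expected_type), (
                f"Record {index} field '{name}' has type "
                f"{type(value).__name__}, expected {expected_type.__name__}"
            )


people = [{"id": 1, "name": "Alice"}, {"id": 2, "name": None}]
sketch_schema_shape(people, {"id": int, "name": str})
```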
Import
from pycharter.etl_generator.testing import (
    # Mock components
    MockExtractor,
    MockLoader,
    # Test harness
    PipelineTestHarness,
    # Fixtures
    load_fixture,
    load_test_fixture,
    TestFixture,
    # Config validation
    validate_pipeline_config,
    # Assertion helpers
    assert_records_match,
    assert_record_count,
    assert_fields_present,
    assert_no_field,
    assert_field_values,
    assert_schema_shape,
)