Changelog
All notable changes to PyCharter are documented here.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[Unreleased]
Added
- Messaging extractors: `KafkaExtractor`, `RabbitMQExtractor`, `SQSExtractor` for consuming from message queues with at-least-once delivery guarantees. Each extractor supports deferred acknowledgment after successful batch load.
- `AckableExtractor` protocol: New protocol extending `Extractor` with an `acknowledge(batch_index, success)` method. The pipeline calls it automatically after each batch.
- Streaming extractors: `SSEExtractor` (Server-Sent Events), `WebSocketExtractor`, `FileWatcherExtractor` for real-time data feeds with automatic reconnection and batch accumulation.
- Pipeline acknowledgment: `Pipeline.run()` automatically acknowledges batches for messaging extractors, committing offsets (Kafka), acking messages (RabbitMQ), or deleting messages (SQS) after successful load.
- Messaging config models: `KafkaExtractConfig`, `RabbitMQExtractConfig`, `SQSExtractConfig` Pydantic models for YAML-driven messaging pipelines.
- New install extras: `pycharter[kafka]`, `pycharter[rabbitmq]`, `pycharter[messaging]` for message queue dependencies.
- ETL bulk runs: `ETLOrchestrator.run_bulk()` runs a pipeline for many values of a single path parameter (e.g. many symbols) with parallel extraction, rate limiting, and a single transform/load pass. Param values can come from `default_param_values` in `extract.yaml`.
- Path parameter injection: HTTP extractor now injects path param values (e.g. `{symbol}`) into every extracted record so transforms/loaders have access without extra logic.
- Config helpers: `get_extract_config()`, `get_path_param_names()`, `get_default_param_values(param_name)` on `ETLOrchestrator` for discovery and bulk defaults.
- `AsyncRateLimiter`: Reusable sliding-window rate limiter (`max_calls_per_minute`) for async ETL extraction; used by `run_bulk()`.
- Extract config: `default_param_values` and `max_calls_per_minute` in HTTP extract config for bulk runs and rate limiting.
- `get_path_param_names(api_endpoint)`: Utility in `pycharter.etl_generator.extractors` to get `{param}` placeholder names from an endpoint string.
- Incremental extraction: Watermark-based state tracking across pipeline runs with `FileStateStore` and `SqliteStateStore`.
- Testing framework: `MockExtractor`, `MockLoader`, `PipelineTestHarness`, and assertion helpers for unit-testing pipelines without real I/O.
- MkDocs-Material documentation site
- Comprehensive tutorials for all major features
- API reference documentation with mkdocstrings
- Jupyter notebook examples
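The deferred-acknowledgment flow described for the messaging extractors can be sketched roughly as follows. This is an illustration only, not PyCharter's implementation: the only detail taken from this changelog is the `acknowledge(batch_index, success)` method. The `extract()` method name, the `InMemoryQueueExtractor` class, and the `run_with_ack()` helper are all assumptions, and the loop is synchronous for brevity.

```python
from typing import Any, Callable, Iterable, Iterator, Protocol, runtime_checkable

Record = dict[str, Any]
Batch = list[Record]


@runtime_checkable
class AckableExtractor(Protocol):
    """Assumed shape: an extractor that can confirm or reject a batch."""

    def extract(self) -> Iterator[Batch]: ...  # method name is an assumption

    def acknowledge(self, batch_index: int, success: bool) -> None: ...


class InMemoryQueueExtractor:
    """Hypothetical stand-in for a broker-backed extractor."""

    def __init__(self, messages: Iterable[Record], batch_size: int = 2) -> None:
        self._messages = list(messages)
        self._batch_size = batch_size
        self.acked: list[int] = []     # batch indexes confirmed to the "broker"
        self.rejected: list[int] = []  # batch indexes left for redelivery

    def extract(self) -> Iterator[Batch]:
        for start in range(0, len(self._messages), self._batch_size):
            yield self._messages[start:start + self._batch_size]

    def acknowledge(self, batch_index: int, success: bool) -> None:
        (self.acked if success else self.rejected).append(batch_index)


def run_with_ack(extractor: AckableExtractor, load: Callable[[Batch], None]) -> None:
    """Load each batch, then acknowledge it only if the load succeeded."""
    for index, batch in enumerate(extractor.extract()):
        try:
            load(batch)
        except Exception:
            # A failed load is reported, so the batch can be redelivered.
            extractor.acknowledge(index, success=False)
        else:
            extractor.acknowledge(index, success=True)
```

Acknowledging only after the load returns is what gives at-least-once delivery: a crash between load and acknowledgment causes a redelivery rather than a lost message, so loaders should tolerate duplicates.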
Changed
- Refactored ETL factory methods: `create()` replaces `get_extractor()`/`get_loader()`
- Standardized config field `type` replaces `source_type`/`target_type`
- Consolidated loader files (merged `file.py` + `file_loader.py`)
Removed
- Legacy factory methods (`get_extractor`, `register_extractor`, `get_loader`)
- Deprecated `types.ts` duplicate definitions
- Legacy config field support (`source_type`, `target_type`)
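Because `source_type` and `target_type` are no longer accepted, existing config sections need the field renamed to `type`. A minimal sketch of such a rename on an already-parsed config dict is shown below. The helper `migrate_type_field()` is hypothetical and not part of PyCharter, and it assumes the field's value carries over unchanged under the new name.

```python
from typing import Any

LEGACY_TYPE_KEYS = ("source_type", "target_type")


def migrate_type_field(config: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of a config section with legacy keys renamed to `type`."""
    migrated = dict(config)  # shallow copy; the input is left untouched
    for legacy_key in LEGACY_TYPE_KEYS:
        if legacy_key in migrated:
            value = migrated.pop(legacy_key)
            # Refuse to guess if both spellings are present and disagree.
            if migrated.get("type", value) != value:
                raise ValueError(f"conflicting values for 'type' and {legacy_key!r}")
            migrated["type"] = value
    return migrated
```

Sections that already use `type` pass through unchanged, so the helper is safe to run more than once.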
[0.0.25] - 2024-01-XX
Added
- ETL Pipeline with `|` operator for fluent composition
- Config-driven pipelines from YAML files
- HTTPExtractor, FileExtractor, DatabaseExtractor, CloudStorageExtractor
- PostgresLoader, FileLoader, CloudStorageLoader
- Transformers: Rename, Filter, AddField, Drop, Select, Convert, CustomFunction
- PipelineBuilder for programmatic pipeline construction
- Variable substitution with `${VAR}` syntax
- ErrorMode (STRICT, LENIENT, COLLECT) for error handling
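For readers unfamiliar with `|`-based composition, the general technique is to overload Python's `__or__` so that `a | b` returns a new stage that runs `a` and feeds its output to `b`. The sketch below shows that idea only. `Step`, `rename()`, and `keep_if()` are hypothetical names, not PyCharter classes, and the sketch is synchronous, whereas the project's `run()` is async as of this release.

```python
from typing import Any, Callable, Iterable

Record = dict[str, Any]


class Step:
    """A hypothetical pipeline stage wrapping a records-to-records function."""

    def __init__(self, fn: Callable[[list[Record]], list[Record]]) -> None:
        self.fn = fn

    def __or__(self, other: "Step") -> "Step":
        # `a | b` builds a new step that runs a, then feeds its output to b.
        return Step(lambda records: other.fn(self.fn(records)))

    def run(self, records: Iterable[Record]) -> list[Record]:
        return self.fn(list(records))


def rename(old: str, new: str) -> Step:
    """Rename one key in every record."""
    return Step(
        lambda rs: [{(new if k == old else k): v for k, v in r.items()} for r in rs]
    )


def keep_if(predicate: Callable[[Record], bool]) -> Step:
    """Keep only the records for which the predicate is true."""
    return Step(lambda rs: [r for r in rs if predicate(r)])


pipeline = rename("px", "price") | keep_if(lambda r: r["price"] > 10)
result = pipeline.run([{"px": 5}, {"px": 12}])  # [{"price": 12}]
```

Each `|` returns a fresh `Step`, so partial pipelines can be reused as building blocks without mutating one another.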
Changed
- Pipeline `run()` is now async
- Improved error messages with context
[0.0.24] - 2024-01-XX
Added
- QualityCheck class for data quality monitoring
- QualityThresholds for alerting
- Violation tracking and querying
- Quality metrics history
Fixed
- Memory leak in batch validation
- Timezone handling in datetime coercion
[0.0.23] - 2024-01-XX
Added
- Validator class as primary validation interface
- Factory methods: `from_file()`, `from_dir()`, `from_files()`, `from_dict()`
- Batch validation with `validate_batch()`
- Contract builder for consolidating artifacts
Changed
- Deprecated convenience functions in favor of Validator class
[0.0.22] - 2024-01-XX
Added
- Schema evolution: compatibility checking and diff
- CompatibilityMode (BACKWARD, FORWARD, FULL, NONE)
- Schema versioning in metadata store
[0.0.21] - 2024-01-XX
Added
- MongoDBMetadataStore
- RedisMetadataStore
- SQLiteMetadataStore
Changed
- Unified MetadataStoreClient interface
[0.0.20] - 2024-01-XX
Added
- REST API with FastAPI
- Swagger/OpenAPI documentation
- Web UI with React/Next.js
- CLI commands: `pycharter api`, `pycharter ui`
[0.0.15] - 2024-01-XX
Added
- Custom coercion registration
- Custom validation registration
- Built-in coercions: `coerce_to_datetime`, `coerce_to_date`, `coerce_to_uuid`
- Built-in validations: `is_email`, `is_url`, `matches_regex`
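Custom registration of this kind is commonly built on a name-to-function registry. The sketch below illustrates that pattern under stated assumptions. `register_coercion()`, `apply_coercion()`, and the `strip_and_lower` coercion are hypothetical and do not reflect PyCharter's actual registration API.

```python
from typing import Any, Callable

Coercion = Callable[[Any], Any]

# Hypothetical module-level registry mapping coercion names to functions.
_COERCIONS: dict[str, Coercion] = {}


def register_coercion(name: str) -> Callable[[Coercion], Coercion]:
    """Decorator that stores a function in the registry under `name`."""

    def decorator(fn: Coercion) -> Coercion:
        if name in _COERCIONS:
            raise ValueError(f"coercion {name!r} is already registered")
        _COERCIONS[name] = fn
        return fn

    return decorator


def apply_coercion(name: str, value: Any) -> Any:
    """Look up a registered coercion by name and apply it to a value."""
    try:
        coercion = _COERCIONS[name]
    except KeyError:
        raise KeyError(f"unknown coercion {name!r}") from None
    return coercion(value)


@register_coercion("strip_and_lower")
def strip_and_lower(value: Any) -> str:
    """Example custom coercion: normalise any value to a trimmed, lowercase string."""
    return str(value).strip().lower()
```

Referring to coercions by name is what lets a schema or YAML file select one without importing Python code directly.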
[0.0.10] - 2024-01-XX
Added
- Initial release
- JSON Schema Draft 2020-12 support
- Pydantic model generation
- Basic coercion and validation
- InMemoryMetadataStore
- PostgresMetadataStore
Version History Summary
| Version | Highlights |
|---|---|
| 0.0.25 | ETL Pipelines with \| operator |
| 0.0.24 | Quality monitoring and alerting |
| 0.0.23 | Validator class, contract builder |
| 0.0.22 | Schema evolution |
| 0.0.21 | MongoDB, Redis, SQLite stores |
| 0.0.20 | REST API and Web UI |
| 0.0.15 | Custom coercion/validation |
| 0.0.10 | Initial release |