Changelog

All notable changes to PyCharter are documented here.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[Unreleased]

Added

  • Messaging extractors: KafkaExtractor, RabbitMQExtractor, SQSExtractor for consuming from message queues with at-least-once delivery guarantees. Each extractor supports deferred acknowledgment after successful batch load.
  • AckableExtractor protocol: New protocol extending Extractor with an acknowledge(batch_index, success) method. The pipeline calls it automatically after each batch.
  • Streaming extractors: SSEExtractor (Server-Sent Events), WebSocketExtractor, FileWatcherExtractor for real-time data feeds with automatic reconnection and batch accumulation.
  • Pipeline acknowledgment: Pipeline.run() automatically acknowledges batches for messaging extractors — committing offsets (Kafka), acking messages (RabbitMQ), or deleting messages (SQS) after successful load.
  • Messaging config models: KafkaExtractConfig, RabbitMQExtractConfig, SQSExtractConfig Pydantic models for YAML-driven messaging pipelines.
  • New install extras: pycharter[kafka], pycharter[rabbitmq], pycharter[messaging] for message queue dependencies.
  • ETL bulk runs: ETLOrchestrator.run_bulk() — run a pipeline for many values of a single path parameter (e.g. many symbols) with parallel extraction, rate limiting, and a single transform/load pass. Param values can come from default_param_values in extract.yaml.
  • Path parameter injection: HTTP extractor now injects path param values (e.g. {symbol}) into every extracted record so transforms/loaders have access without extra logic.
  • Config helpers: get_extract_config(), get_path_param_names(), get_default_param_values(param_name) on ETLOrchestrator for discovery and bulk defaults.
  • AsyncRateLimiter: Reusable sliding-window rate limiter (max_calls_per_minute) for async ETL extraction; used by run_bulk().
  • Extract config: default_param_values and max_calls_per_minute in HTTP extract config for bulk runs and rate limiting.
  • get_path_param_names(api_endpoint): Utility in pycharter.etl_generator.extractors to get {param} placeholder names from an endpoint string.
  • Incremental extraction: Watermark-based state tracking across pipeline runs with FileStateStore and SqliteStateStore.
  • Testing framework: MockExtractor, MockLoader, PipelineTestHarness, and assertion helpers for unit-testing pipelines without real I/O.
  • MkDocs-Material documentation site
  • Comprehensive tutorials for all major features
  • API reference documentation with mkdocstrings
  • Jupyter notebook examples
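
The deferred-acknowledgment flow described for the messaging extractors can be illustrated with a small, self-contained sketch. Only the method name acknowledge(batch_index, success) comes from the entries above. The extract() method, InMemoryQueueExtractor, run_with_ack, and flaky_load are hypothetical stand-ins for illustration and are not PyCharter's actual API.

```python
from typing import Any, Callable, Iterator, Protocol

Record = dict[str, Any]


class AckableExtractor(Protocol):
    """Assumed shape: only acknowledge(batch_index, success) is named in the changelog."""

    def extract(self) -> Iterator[list[Record]]: ...

    def acknowledge(self, batch_index: int, success: bool) -> None: ...


class InMemoryQueueExtractor:
    """Toy stand-in for a message queue consumer (hypothetical, not a PyCharter class)."""

    def __init__(self, messages: list[Record], batch_size: int = 2) -> None:
        self._messages = messages
        self._batch_size = batch_size
        self.acked: list[int] = []   # batch indexes confirmed to the "broker"
        self.failed: list[int] = []  # batch indexes left for redelivery

    def extract(self) -> Iterator[list[Record]]:
        for start in range(0, len(self._messages), self._batch_size):
            yield self._messages[start:start + self._batch_size]

    def acknowledge(self, batch_index: int, success: bool) -> None:
        (self.acked if success else self.failed).append(batch_index)


def run_with_ack(extractor: AckableExtractor, load: Callable[[list[Record]], None]) -> None:
    """Acknowledge each batch only after its load step has finished."""
    for index, batch in enumerate(extractor.extract()):
        try:
            load(batch)
        except Exception:
            extractor.acknowledge(index, success=False)
        else:
            extractor.acknowledge(index, success=True)


def flaky_load(batch: list[Record]) -> None:
    """Simulated loader that fails on any batch containing id 3."""
    if any(record["id"] == 3 for record in batch):
        raise RuntimeError("simulated load failure")


extractor = InMemoryQueueExtractor([{"id": i} for i in range(1, 6)])
run_with_ack(extractor, flaky_load)
print(extractor.acked, extractor.failed)  # [0, 2] [1]
```

Because a batch is acknowledged only after it has loaded, a failed batch stays unacknowledged and can be redelivered, which is what gives at-least-once rather than at-most-once behavior.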

Changed

  • Refactored ETL factory methods: create() replaces get_extractor()/get_loader()
  • Standardized config field: type replaces source_type/target_type
  • Consolidated loader files (merged file.py + file_loader.py)

Removed

  • Legacy factory methods (get_extractor, register_extractor, get_loader)
  • Deprecated types.ts duplicate definitions
  • Legacy config field support (source_type, target_type)
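
Configs written for earlier releases may still carry the removed source_type/target_type keys. A minimal migration sketch follows, assuming configs are loaded as plain dictionaries. The helper name migrate_type_field is hypothetical and not part of PyCharter.

```python
from typing import Any

LEGACY_KEYS = ("source_type", "target_type")


def migrate_type_field(config: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of config with any legacy key renamed to "type"."""
    migrated = dict(config)  # shallow copy so the caller's dict is untouched
    for legacy_key in LEGACY_KEYS:
        if legacy_key not in migrated:
            continue
        value = migrated.pop(legacy_key)
        # Refuse to silently overwrite a conflicting value.
        if migrated.setdefault("type", value) != value:
            raise ValueError(f"{legacy_key!r} conflicts with existing 'type'")
    return migrated


print(migrate_type_field({"source_type": "http", "url": "https://example.com"}))
```

Raising on a conflict rather than picking a winner keeps the migration from hiding a config that was already inconsistent.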

[0.0.25] - 2024-01-XX

Added

  • ETL Pipeline with | operator for fluent composition
  • Config-driven pipelines from YAML files
  • HTTPExtractor, FileExtractor, DatabaseExtractor, CloudStorageExtractor
  • PostgresLoader, FileLoader, CloudStorageLoader
  • Transformers: Rename, Filter, AddField, Drop, Select, Convert, CustomFunction
  • PipelineBuilder for programmatic pipeline construction
  • Variable substitution with ${VAR} syntax
  • ErrorMode (STRICT, LENIENT, COLLECT) for error handling
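
Composition with | relies on Python's __or__ operator overloading. The toy sketch below shows the general idea only. Step and the two example steps are hypothetical and do not mirror PyCharter's real classes or signatures.

```python
from typing import Any, Callable

Record = dict[str, Any]


class Step:
    """Wraps a function over a list of records and supports `|` chaining."""

    def __init__(self, fn: Callable[[list[Record]], list[Record]]) -> None:
        self._fn = fn

    def __call__(self, records: list[Record]) -> list[Record]:
        return self._fn(records)

    def __or__(self, other: "Step") -> "Step":
        # a | b builds a new step that runs a, then feeds its output to b.
        return Step(lambda records: other(self(records)))


# Two illustrative steps: rename a field, then filter rows.
rename = Step(lambda rows: [{"symbol": r["ticker"], "price": r["price"]} for r in rows])
keep_positive = Step(lambda rows: [r for r in rows if r["price"] > 0])

pipeline = rename | keep_positive
result = pipeline([{"ticker": "ABC", "price": 10.0}, {"ticker": "XYZ", "price": -1.0}])
print(result)  # [{'symbol': 'ABC', 'price': 10.0}]
```

Returning a new Step from __or__ keeps each operand reusable, so the same step can appear in several pipelines.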

Changed

  • Pipeline run() is now async
  • Improved error messages with context

[0.0.24] - 2024-01-XX

Added

  • QualityCheck class for data quality monitoring
  • QualityThresholds for alerting
  • Violation tracking and querying
  • Quality metrics history

Fixed

  • Memory leak in batch validation
  • Timezone handling in datetime coercion

[0.0.23] - 2024-01-XX

Added

  • Validator class as primary validation interface
  • Factory methods: from_file(), from_dir(), from_files(), from_dict()
  • Batch validation with validate_batch()
  • Contract builder for consolidating artifacts

Changed

  • Deprecated convenience functions in favor of Validator class

[0.0.22] - 2024-01-XX

Added

  • Schema evolution: compatibility checking and diff
  • CompatibilityMode (BACKWARD, FORWARD, FULL, NONE)
  • Schema versioning in metadata store
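
To make the four modes concrete, here is a deliberately simplified sketch that compares only the required lists of two JSON Schemas. The member names come from the entry above. The enum values, the is_compatible helper, and the exact semantics are assumptions based on the common schema-registry convention, not PyCharter's implementation.

```python
from enum import Enum


class CompatibilityMode(Enum):
    """Local stand-in; member values are illustrative."""

    BACKWARD = "backward"
    FORWARD = "forward"
    FULL = "full"
    NONE = "none"


def is_compatible(old_schema: dict, new_schema: dict, mode: CompatibilityMode) -> bool:
    """Simplified check that looks only at each schema's `required` list."""
    old_required = set(old_schema.get("required", []))
    new_required = set(new_schema.get("required", []))
    # Backward: data written under the old schema must satisfy the new one,
    # so the new schema may not require a field the old one did not.
    backward = new_required <= old_required
    # Forward: data written under the new schema must satisfy the old one.
    forward = old_required <= new_required
    if mode is CompatibilityMode.BACKWARD:
        return backward
    if mode is CompatibilityMode.FORWARD:
        return forward
    if mode is CompatibilityMode.FULL:
        return backward and forward
    return True  # NONE: no compatibility constraint is enforced


v1 = {"required": ["id"]}
v2 = {"required": ["id", "email"]}  # adds a required field
print(is_compatible(v1, v2, CompatibilityMode.BACKWARD))  # False
print(is_compatible(v1, v2, CompatibilityMode.FORWARD))   # True
```

A real check would also need to consider property types, enums, and additionalProperties; the sketch ignores them to keep the mode logic visible.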

[0.0.21] - 2024-01-XX

Added

  • MongoDBMetadataStore
  • RedisMetadataStore
  • SQLiteMetadataStore

Changed

  • Unified MetadataStoreClient interface

[0.0.20] - 2024-01-XX

Added

  • REST API with FastAPI
  • Swagger/OpenAPI documentation
  • Web UI with React/Next.js
  • CLI commands: pycharter api, pycharter ui

[0.0.15] - 2024-01-XX

Added

  • Custom coercion registration
  • Custom validation registration
  • Built-in coercions: coerce_to_datetime, coerce_to_date, coerce_to_uuid
  • Built-in validations: is_email, is_url, matches_regex
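
Custom registration is commonly built on a name-to-function registry. The sketch below shows that pattern in isolation. register_coercion, apply_coercion, and both example coercions are hypothetical names chosen for illustration, and PyCharter's real registration functions may look different.

```python
from typing import Any, Callable

Coercion = Callable[[Any], Any]

# Registry mapping a lookup name to its coercion function.
_COERCIONS: dict[str, Coercion] = {}


def register_coercion(name: str) -> Callable[[Coercion], Coercion]:
    """Decorator that stores a coercion function under a lookup name."""

    def decorator(fn: Coercion) -> Coercion:
        if name in _COERCIONS:
            raise ValueError(f"coercion {name!r} is already registered")
        _COERCIONS[name] = fn
        return fn

    return decorator


@register_coercion("strip_whitespace")
def strip_whitespace(value: Any) -> Any:
    # Leave non-string values untouched.
    return value.strip() if isinstance(value, str) else value


@register_coercion("to_int")
def to_int(value: Any) -> int:
    return int(value)


def apply_coercion(name: str, value: Any) -> Any:
    """Look up a registered coercion by name and apply it."""
    return _COERCIONS[name](value)


print(apply_coercion("strip_whitespace", "  ABC "))  # ABC
print(apply_coercion("to_int", "42"))                # 42
```

Looking coercions up by name is what lets a schema or config file refer to them as plain strings.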

[0.0.10] - 2024-01-XX

Added

  • Initial release
  • JSON Schema Draft 2020-12 support
  • Pydantic model generation
  • Basic coercion and validation
  • InMemoryMetadataStore
  • PostgresMetadataStore

Version History Summary

Version   Highlights
0.0.25    ETL Pipelines with | operator
0.0.24    Quality monitoring and alerting
0.0.23    Validator class, contract builder
0.0.22    Schema evolution
0.0.21    MongoDB, Redis, SQLite stores
0.0.20    REST API and Web UI
0.0.15    Custom coercion/validation
0.0.10    Initial release