Metadata-Version: 2.4
Name: stagecraft
Version: 0.1.1
Summary: A Python library for building robust ETL pipelines with declarative stages and data flow management
Author-email: alkndoom <developer@ivent.app>
License: Apache-2.0
Project-URL: Homepage, https://github.com/alkndoom/stagecraft
Project-URL: Documentation, https://github.com/alkndoom/stagecraft#readme
Project-URL: Repository, https://github.com/alkndoom/stagecraft
Project-URL: Issues, https://github.com/alkndoom/stagecraft/issues
Keywords: etl,pipeline,data-processing,workflow,data-engineering
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pandas>=2.0.0
Requires-Dist: numpy>=1.24.0
Requires-Dist: pandera>=0.17.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Dynamic: license-file

# stagecraft

A Python library for building robust ETL (Extract, Transform, Load) pipelines with declarative stages and data flow management.

## Features

- **Pipeline Architecture**: Build complex data pipelines using declarative stages and conditions
- **Type-Safe Variables**: Strongly-typed variable system with support for DataFrames, NumPy arrays, and serializable data
- **Memory Management**: Built-in memory tracking and optimization for data-intensive workflows
- **Data Sources**: Out-of-the-box support for CSV, JSON, and plain-text file sources
- **Conditional Execution**: Flexible condition system for controlling stage execution
- **Exception Handling**: Comprehensive exception handling with custom wrappers
- **Logging**: Configurable logging system for pipeline monitoring
- **Utility Functions**: Rich set of utility functions for file operations, string manipulation, and more

## Installation

```bash
pip install stagecraft
```

## Quick Start

```python
from stagecraft import (
    PipelineDefinition,
    PipelineRunner,
    ETLStage,
    DFVar,
)

# Define your pipeline stages
class LoadDataStage(ETLStage):
    def recipe(self, **kwargs):
        # Load your data
        pass

# Create pipeline definition
pipeline = PipelineDefinition(
    name="my_pipeline",
    stages=[LoadDataStage()]
)

# Run the pipeline
runner = PipelineRunner(pipeline)
result = runner.run()
```

## Core Components

### Pipeline System

- `PipelineDefinition`: Define pipeline structure and stages
- `PipelineRunner`: Execute pipelines with context management
- `ETLStage`: Base class for creating custom pipeline stages
- `PipelineContext`: Manage pipeline state and variables
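To show how these pieces relate, here is a minimal, self-contained sketch of the stage/context/runner idea in plain Python. `Context`, `Stage`, and `Runner` are hypothetical stand-ins, not stagecraft's API. In particular, this README does not document what `ETLStage.recipe` receives or returns, so passing a shared context object into each stage is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


# Hypothetical stand-ins for illustration only; not stagecraft's classes.
@dataclass
class Context:
    """Shared state that stages read from and write to."""
    variables: Dict[str, Any] = field(default_factory=dict)


class Stage:
    """A unit of work; subclasses override recipe()."""

    def recipe(self, ctx: Context) -> None:
        raise NotImplementedError


class Runner:
    """Runs stages in declaration order against one shared context."""

    def __init__(self, stages: List[Stage]) -> None:
        self.stages = stages

    def run(self) -> Context:
        ctx = Context()
        for stage in self.stages:
            stage.recipe(ctx)  # each stage sees what earlier stages wrote
        return ctx


class Extract(Stage):
    def recipe(self, ctx: Context) -> None:
        ctx.variables["rows"] = [3, 1, 2]


class Transform(Stage):
    def recipe(self, ctx: Context) -> None:
        ctx.variables["sorted_rows"] = sorted(ctx.variables["rows"])


result = Runner([Extract(), Transform()]).run()
print(result.variables["sorted_rows"])  # [1, 2, 3]
```

Keeping data in a context rather than in stage return values lets later stages pick up any earlier output by name, which is the role `PipelineContext` is described as playing above.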

### Variables

- `DFVar`: pandas DataFrame variables
- `NDArrayVar`: NumPy array variables
- `SVar`: Serializable variables for general Python objects
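The idea behind typed variables can be sketched without any third-party packages: a named slot that checks the type of every value assigned to it. `TypedVar` and `JSONSerializableVar` below are hypothetical and do not reproduce stagecraft's `DFVar`/`NDArrayVar`/`SVar`. Treating "serializable" as "JSON-serializable" is also an assumption chosen for the example.

```python
import json
from typing import Any, Generic, Type, TypeVar

T = TypeVar("T")
_UNSET = object()  # sentinel so that None can still be a legitimate value


class TypedVar(Generic[T]):
    """Hypothetical typed slot: rejects values of the wrong type on set()."""

    def __init__(self, name: str, expected: Type[T]) -> None:
        self.name = name
        self.expected = expected  # e.g. pandas.DataFrame for a DataFrame slot
        self._value: Any = _UNSET

    def set(self, value: Any) -> None:
        if not isinstance(value, self.expected):
            raise TypeError(
                f"{self.name}: expected {self.expected.__name__}, "
                f"got {type(value).__name__}"
            )
        self._value = value

    def get(self) -> T:
        if self._value is _UNSET:
            raise LookupError(f"{self.name} has not been set")
        return self._value


class JSONSerializableVar(TypedVar[object]):
    """Accepts any object, provided json.dumps can encode it."""

    def __init__(self, name: str) -> None:
        super().__init__(name, object)

    def set(self, value: Any) -> None:
        json.dumps(value)  # raises TypeError for e.g. sets or custom classes
        super().set(value)


count = TypedVar("count", int)
count.set(5)
try:
    count.set("five")
except TypeError as exc:
    print(exc)  # count: expected int, got str
```

Failing at assignment time, rather than when a later stage consumes the value, points the error at the stage that produced the bad data.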

### Data Sources

- `CSVSource`: Read data from CSV files
- `JSONSource`: Read data from JSON files
- `FileSource`: Read data from text files
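A common way to structure sources like these is a small interface with one `load()` method per file type. The sketch below uses only the standard library, and the class names (`CsvRowsSource`, `JsonFileSource`, `TextFileSource`) are hypothetical, chosen so they are not mistaken for stagecraft's `CSVSource`, `JSONSource`, and `FileSource`. Because it uses `csv.DictReader`, CSV values come back as strings; a pandas-based source would call `pandas.read_csv` instead.

```python
import csv
import json
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any, Dict, List


class Source(ABC):
    """Hypothetical source interface: each source loads one file."""

    def __init__(self, path: str) -> None:
        self.path = Path(path)

    @abstractmethod
    def load(self) -> Any:
        ...


class CsvRowsSource(Source):
    def load(self) -> List[Dict[str, str]]:
        # newline="" is what the csv module expects for opened files
        with self.path.open(newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))


class JsonFileSource(Source):
    def load(self) -> Any:
        with self.path.open(encoding="utf-8") as f:
            return json.load(f)


class TextFileSource(Source):
    def load(self) -> str:
        return self.path.read_text(encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "people.csv"
    csv_path.write_text("name,age\nAda,36\n", encoding="utf-8")
    rows = CsvRowsSource(str(csv_path)).load()

print(rows)  # [{'name': 'Ada', 'age': '36'}]
```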

### Conditions

- `AlwaysExecute`: Unconditional execution
- `AndCondition`/`OrCondition`: Combine multiple conditions
- `ConfigFlagCondition`: Execute based on configuration flags
- `VariableExistsCondition`: Check variable presence
- `CustomCondition`: Define custom execution logic
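Conditions like these fit the composite pattern: leaf conditions answer a yes/no question about the pipeline state, and `And`/`Or` nodes combine them into a tree. The classes below are hypothetical and only mirror the names listed above. The context shape (a dict with `config` and `variables` keys) is an assumption, since this README does not describe what stagecraft conditions are evaluated against.

```python
from typing import Any, Callable, Dict

# Hypothetical context shape: {"config": {...}, "variables": {...}}
Context = Dict[str, Any]


class Condition:
    def check(self, ctx: Context) -> bool:
        raise NotImplementedError


class Always(Condition):
    def check(self, ctx: Context) -> bool:
        return True


class And(Condition):
    def __init__(self, *conds: Condition) -> None:
        self.conds = conds

    def check(self, ctx: Context) -> bool:
        return all(c.check(ctx) for c in self.conds)  # short-circuits on False


class Or(Condition):
    def __init__(self, *conds: Condition) -> None:
        self.conds = conds

    def check(self, ctx: Context) -> bool:
        return any(c.check(ctx) for c in self.conds)  # short-circuits on True


class ConfigFlag(Condition):
    def __init__(self, flag: str) -> None:
        self.flag = flag

    def check(self, ctx: Context) -> bool:
        return bool(ctx.get("config", {}).get(self.flag, False))


class VariableExists(Condition):
    def __init__(self, name: str) -> None:
        self.name = name

    def check(self, ctx: Context) -> bool:
        return self.name in ctx.get("variables", {})


class Custom(Condition):
    def __init__(self, fn: Callable[[Context], bool]) -> None:
        self.fn = fn

    def check(self, ctx: Context) -> bool:
        return bool(self.fn(ctx))


ctx = {"config": {"reload": False}, "variables": {"raw_df": object()}}
should_run = And(
    VariableExists("raw_df"),
    Or(ConfigFlag("reload"), Custom(lambda c: len(c["variables"]) > 0)),
)
print(should_run.check(ctx))  # True
```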

### Utilities

- File operations: `read_file`, `write_file`, `append_file`
- CSV operations: `read_csv`, `write_csv`, `append_csv`
- JSON operations: `read_json`, `write_json`, `append_json`
- String utilities: `camel_to_snake_case`, `snake_to_camel_case`, and more
- Time utilities: `get_timestamp`, `get_current_date`
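Case conversion is the kind of helper where edge cases such as acronyms matter. The following is a hypothetical sketch using `re.sub`; `camel_to_snake` and `snake_to_camel` are illustrative names, and stagecraft's `camel_to_snake_case`/`snake_to_camel_case` may treat acronyms or the first letter differently. Producing lowerCamelCase is an assumption here.

```python
import re


def camel_to_snake(name: str) -> str:
    # Pass 1: split before a capitalized word ("HTTPResponse" -> "HTTP_Response").
    s = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", name)
    # Pass 2: split a lowercase letter or digit from a following capital.
    s = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", s)
    return s.lower()


def snake_to_camel(name: str) -> str:
    # Keep the first part as-is and capitalize the rest (lowerCamelCase).
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


print(camel_to_snake("loadDataStage"))    # load_data_stage
print(camel_to_snake("HTTPResponse"))     # http_response
print(snake_to_camel("load_data_stage"))  # loadDataStage
```

The round trip is lossy for acronyms: `camel_to_snake("ETLStage")` gives `etl_stage`, which converts back to `etlStage`.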

## Requirements

- Python 3.9+
- pandas >= 2.0.0
- NumPy >= 1.24.0
- pandera >= 0.17.0

## Development

Install development dependencies:

```bash
pip install "stagecraft[dev]"
```

Run tests:

```bash
pytest
```

## License

Licensed under the Apache License 2.0. See the LICENSE file for details.

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
