Metadata-Version: 2.4
Name: collie-mlops
Version: 0.1.2b0
Summary: A Lightweight MLOps Framework for Machine Learning Workflows
Home-page: https://github.com/ChingHuanChiu/collie
Author: ChingHuanChiu
Author-email: ChingHuanChiu <stevenchiou8@gmail.com>
Maintainer-email: ChingHuanChiu <stevenchiou8@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/ChingHuanChiu/collie
Project-URL: Documentation, https://github.com/ChingHuanChiu/collie/blob/main/README.md
Project-URL: Repository, https://github.com/ChingHuanChiu/collie
Project-URL: Bug Tracker, https://github.com/ChingHuanChiu/collie/issues
Project-URL: Changelog, https://github.com/ChingHuanChiu/collie/blob/main/CHANGELOG.md
Keywords: mlops,machine-learning,mlflow,pipeline,orchestration,deep-learning,experiment-tracking
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Typing :: Typed
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: mlflow>=2.0.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: pandas>=1.3.0
Requires-Dist: numpy<2.0.0,>=1.20.0
Provides-Extra: sklearn
Requires-Dist: scikit-learn>=1.0.0; extra == "sklearn"
Provides-Extra: xgboost
Requires-Dist: xgboost>=1.5.0; extra == "xgboost"
Provides-Extra: lightgbm
Requires-Dist: lightgbm>=3.0.0; extra == "lightgbm"
Provides-Extra: pytorch
Requires-Dist: torch>=1.9.0; extra == "pytorch"
Requires-Dist: pytorch-lightning>=2.0.0; extra == "pytorch"
Requires-Dist: transformers>=4.0.0; extra == "pytorch"
Requires-Dist: sentence-transformers>=2.0.0; extra == "pytorch"
Provides-Extra: tabular
Requires-Dist: scikit-learn>=1.0.0; extra == "tabular"
Requires-Dist: xgboost>=1.5.0; extra == "tabular"
Requires-Dist: lightgbm>=3.0.0; extra == "tabular"
Provides-Extra: deep-learning
Requires-Dist: torch>=1.9.0; extra == "deep-learning"
Requires-Dist: pytorch-lightning>=2.0.0; extra == "deep-learning"
Requires-Dist: transformers>=4.0.0; extra == "deep-learning"
Requires-Dist: sentence-transformers>=2.0.0; extra == "deep-learning"
Provides-Extra: all
Requires-Dist: scikit-learn>=1.0.0; extra == "all"
Requires-Dist: xgboost>=1.5.0; extra == "all"
Requires-Dist: lightgbm>=3.0.0; extra == "all"
Requires-Dist: torch>=1.9.0; extra == "all"
Requires-Dist: pytorch-lightning>=2.0.0; extra == "all"
Requires-Dist: transformers>=4.0.0; extra == "all"
Requires-Dist: sentence-transformers>=2.0.0; extra == "all"
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=3.0.0; extra == "dev"
Requires-Dist: black>=22.0.0; extra == "dev"
Requires-Dist: mypy>=0.990; extra == "dev"
Requires-Dist: ruff>=0.0.260; extra == "dev"
Dynamic: license-file

# Collie

[![PyPI version](https://badge.fury.io/py/collie-mlops.svg)](https://badge.fury.io/py/collie-mlops)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Documentation](https://img.shields.io/badge/docs-sphinx-blue.svg)](https://collie-mlops.readthedocs.io/en/latest/getting_started.html)
[![codecov](https://codecov.io/gh/ChingHuanChiu/collie/branch/main/graph/badge.svg)](https://codecov.io/gh/ChingHuanChiu/collie)

A Lightweight MLOps Framework for Machine Learning Workflows


## Overview

Collie is an MLOps framework that structures machine learning workflows as modular components integrated with MLflow. Each component handles one stage of the ML lifecycle, so data scientists and ML engineers can build, deploy, and manage pipelines one stage at a time.

## Features

- **Component-Based Architecture**: Modular design with specialized components for each ML workflow stage
- **MLflow Integration**: Built-in experiment tracking, model registration, and deployment capabilities
- **Pipeline Orchestration**: Workflow management built on an event-driven architecture
- **Model Management**: Automated model versioning, staging, and promotion
- **Framework Agnostic**: Supports multiple ML frameworks (PyTorch, scikit-learn, XGBoost, LightGBM, Transformers)

## Architecture

Collie follows an event-driven architecture with the following core components:

- **Transformer**: Data preprocessing and feature engineering
- **Tuner**: Hyperparameter optimization
- **Trainer**: Model training and validation
- **Evaluator**: Model evaluation and comparison
- **Pusher**: Model deployment and registration
- **Orchestrator**: Workflow coordination and execution

## Quick Start

### Installation

#### Basic Installation (Core Framework Only)
```bash
pip install collie-mlops
```

This installs the core MLOps orchestration framework with MLflow integration (~100MB).

#### Install with ML Frameworks

Choose the installation that fits your needs:

**For Traditional ML (Tabular Data)**
```bash
# Individual frameworks
# (quotes keep shells such as zsh from treating the brackets as a glob pattern)
pip install "collie-mlops[sklearn]"      # scikit-learn support
pip install "collie-mlops[xgboost]"      # XGBoost support
pip install "collie-mlops[lightgbm]"     # LightGBM support

# Or install all tabular ML frameworks (~250MB)
pip install "collie-mlops[tabular]"
```

**For Deep Learning**
```bash
# PyTorch ecosystem (includes Transformers for NLP/Vision) (~3GB)
pip install "collie-mlops[pytorch]"

# Or use the equivalent alias
pip install "collie-mlops[deep-learning]"
```

**For Complete Installation**
```bash
# All frameworks (~3.5GB)
pip install "collie-mlops[all]"
```
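
After installing, it can be useful to confirm which optional frameworks are actually present in the environment. The following is a minimal sketch using only the standard library; the `OPTIONAL_MODULES` mapping and the `installed_frameworks` helper are illustrative names, not part of Collie, and the import names are the ones these distributions are commonly imported under.

```python
from importlib.util import find_spec

# Assumed import names for the optional dependencies listed in the extras.
OPTIONAL_MODULES = {
    "scikit-learn": "sklearn",
    "xgboost": "xgboost",
    "lightgbm": "lightgbm",
    "torch": "torch",
    "pytorch-lightning": "pytorch_lightning",
    "transformers": "transformers",
    "sentence-transformers": "sentence_transformers",
}


def installed_frameworks(modules=OPTIONAL_MODULES):
    """Return {distribution name: True/False} without importing the packages."""
    return {dist: find_spec(module) is not None for dist, module in modules.items()}


if __name__ == "__main__":
    # Print one line per optional framework.
    for dist, present in installed_frameworks().items():
        print(f"{dist:<24}{'installed' if present else 'missing'}")
```

Because `find_spec` only locates a top-level module and does not import it, the check stays fast even when large packages such as PyTorch are installed.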

### Prerequisites

- Python >= 3.10
- MLflow tracking server (can be local or remote)


## Components

### Transformer
Handles data preprocessing, feature engineering, and data validation.

```python
class CustomTransformer(Transformer):
    def handle(self, event) -> Event:
        # Process your data
        processed_data = ... 
        return Event(payload=TransformerPayload(train_data=processed_data))
```
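
As an illustration of the kind of preprocessing a `handle` method might perform, here is a minimal, framework-free sketch of standardization in which the statistics are learned from the training split only and then reused for other splits. The helper names `fit_standardizer` and `standardize` are hypothetical and not part of Collie.

```python
from statistics import mean, pstdev


def fit_standardizer(train_values):
    """Learn the mean and standard deviation from the training split only."""
    mu = mean(train_values)
    # Fall back to 1.0 so constant columns do not cause a division by zero.
    sigma = pstdev(train_values) or 1.0
    return mu, sigma


def standardize(values, mu, sigma):
    """Apply previously learned statistics to any split."""
    return [(v - mu) / sigma for v in values]


train, test = [2.0, 4.0, 6.0], [8.0]
mu, sigma = fit_standardizer(train)
train_scaled = standardize(train, mu, sigma)
test_scaled = standardize(test, mu, sigma)  # reuses the training statistics
```

Fitting on the training data alone avoids leaking information from held-out data into the features.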

### Tuner
Performs hyperparameter optimization using various strategies.

```python
class CustomTuner(Tuner):
    def handle(self, event) -> Event:
        # Optimize hyperparameters
        best_params = ...
        return Event(payload=TunerPayload(hyperparameters=best_params))
```
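
One simple strategy that could sit behind the `best_params = ...` placeholder is an exhaustive grid search. The sketch below uses only the standard library; `grid_search` and `toy_objective` are hypothetical names, and the objective is a stand-in for a real validation metric.

```python
from itertools import product


def grid_search(search_space, objective):
    """Try every combination in search_space; return (best_params, best_score).

    search_space: dict mapping parameter name -> list of candidate values.
    objective: callable taking a params dict and returning a score to maximize.
    """
    names = list(search_space)
    best_params, best_score = None, float("-inf")
    for values in product(*(search_space[name] for name in names)):
        params = dict(zip(names, values))
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_objective(params):
    """Stand-in for a validation metric; highest at lr=0.1, depth=4."""
    return -abs(params["lr"] - 0.1) - abs(params["depth"] - 4)


best_params, best_score = grid_search(
    {"lr": [0.01, 0.1, 1.0], "depth": [2, 4, 8]}, toy_objective
)
print(best_params)
```

The resulting dictionary is the kind of value that would be passed as `hyperparameters` in the `TunerPayload` shown above.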

### Trainer
Trains machine learning models with automatic experiment tracking.

```python
class CustomTrainer(Trainer):
    def handle(self, event) -> Event:
        # Train your model
        model = ...
        return Event(payload=TrainerPayload(model=model))
```

### Evaluator
Evaluates model performance and decides whether the new model should be deployed.

```python
class CustomEvaluator(Evaluator):
    def handle(self, event) -> Event:
        # Evaluate model performance
        metrics = ...
        is_better: bool = ...
        return Event(payload=EvaluatorPayload(
            metrics=metrics, 
            is_better_than_production=is_better
        ))
```
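
The `is_better` placeholder above has to come from some comparison rule. A minimal sketch of one possible rule follows, assuming metrics are plain dictionaries; the function name, its parameters, and the "accept when no baseline exists" policy are illustrative assumptions rather than Collie's behavior.

```python
def is_better_than_production(candidate, production, metric,
                              higher_is_better=True, min_delta=0.0):
    """Decide whether the candidate model should replace the production model.

    candidate, production: dicts of metric name -> value (production may be None).
    min_delta: required improvement margin, guarding against noise-level gains.
    """
    if production is None or metric not in production:
        # No production baseline yet: accept the first candidate.
        return True
    delta = candidate[metric] - production[metric]
    if not higher_is_better:
        # For error-style metrics such as RMSE, a decrease is an improvement.
        delta = -delta
    return delta > min_delta


print(is_better_than_production({"f1": 0.82}, {"f1": 0.80}, "f1"))
```

A non-zero `min_delta` is a common safeguard so that a model is only promoted when the improvement exceeds run-to-run variation.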

### Pusher
Handles model deployment and registration.

```python
class CustomPusher(Pusher):
    def handle(self, event) -> Event:
        # Deploy model to production
        model_uri = ...
        return Event(payload=PusherPayload(model_uri=model_uri))
```

### Orchestrator
Coordinates the execution of all components in the pipeline.

```python
from collie import Orchestrator

# Create orchestrator with all components
orchestrator = Orchestrator(
    components=[
        CustomTransformer(),
        CustomTuner(),
        CustomTrainer(),
        CustomEvaluator(),
        CustomPusher()
    ],
    tracking_uri="http://localhost:5000",
    experiment_name="my_experiment",
    registered_model_name="my_model",
    mlflow_tags={"project": "my_project"},
    description="My ML Pipeline"
)

# Run the entire pipeline
orchestrator.run()
```
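
To make the event-driven flow concrete, the sketch below shows the general pattern of chaining components so that each one receives the event produced by the previous one. It is a toy model written with the standard library only: `SketchEvent`, `SketchComponent`, `Scale`, `FitMean`, and `run_pipeline` are hypothetical names, and this is not Collie's implementation.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SketchEvent:
    """Hypothetical stand-in for an event carrying a payload between stages."""
    payload: dict[str, Any] = field(default_factory=dict)


class SketchComponent:
    """Hypothetical base class: each stage turns one event into the next."""

    def handle(self, event: SketchEvent) -> SketchEvent:
        raise NotImplementedError


class Scale(SketchComponent):
    def handle(self, event):
        # Preprocessing stage: rescale the raw values.
        data = [x / 10 for x in event.payload["raw"]]
        return SketchEvent(payload={**event.payload, "train_data": data})


class FitMean(SketchComponent):
    def handle(self, event):
        # "Training" stage: the model here is just the mean of the data.
        data = event.payload["train_data"]
        return SketchEvent(payload={**event.payload, "model": sum(data) / len(data)})


def run_pipeline(components, initial_event):
    """Feed each component's output event into the next component."""
    event = initial_event
    for component in components:
        event = component.handle(event)
    return event


final = run_pipeline([Scale(), FitMean()], SketchEvent(payload={"raw": [10, 20, 30]}))
print(final.payload["model"])
```

Keeping every stage behind the same `handle(event)` signature is what lets stages be reordered, replaced, or omitted without changing the loop that runs them.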

## Configuration

### MLflow Setup

Start a local MLflow tracking server:

```bash
mlflow server \
    --backend-store-uri sqlite:///mlflow.db \
    --default-artifact-root ./mlruns \
    --host 0.0.0.0 \
    --port 5000
```
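
Before running a pipeline, it may help to confirm that the tracking server is reachable. The following sketch uses only the standard library and assumes the server exposes a `/health` endpoint that answers with HTTP 200, as MLflow tracking servers typically do; `tracking_server_is_up` is a hypothetical helper name.

```python
from urllib.error import URLError
from urllib.request import urlopen


def tracking_server_is_up(tracking_uri, timeout=2.0):
    """Return True if the server's /health endpoint answers with HTTP 200."""
    url = tracking_uri.rstrip("/") + "/health"
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error, or malformed URL.
        return False


if __name__ == "__main__":
    # Matches the host and port used in the server command above.
    print(tracking_server_is_up("http://localhost:5000"))
```

The same URI is the value passed as `tracking_uri` to the `Orchestrator`.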

## Supported Frameworks

Collie supports multiple ML frameworks through its flexible optional dependency system:

### Available Frameworks
- **scikit-learn** - Traditional ML algorithms
- **XGBoost** - Gradient boosting for tabular data
- **LightGBM** - Fast gradient boosting framework
- **PyTorch** - Deep learning framework
- **PyTorch Lightning** - High-level PyTorch wrapper
- **Transformers** - Hugging Face transformers for NLP
- **Sentence Transformers** - Sentence embeddings

### Installation Options

| Use Case | Command | Size | Frameworks Included |
|----------|---------|------|---------------------|
| **Core Only** | `pip install collie-mlops` | ~100MB | MLflow orchestration only |
| **Tabular ML** | `pip install "collie-mlops[tabular]"` | ~250MB | scikit-learn, XGBoost, LightGBM |
| **Deep Learning** | `pip install "collie-mlops[pytorch]"` | ~3GB | PyTorch, PyTorch Lightning, Transformers, Sentence Transformers |
| **Complete** | `pip install "collie-mlops[all]"` | ~3.5GB | All frameworks |

> **Note**: Install only what you need to keep your environment lightweight!


## Documentation

The full documentation, including a getting-started guide, is available at [collie-mlops.readthedocs.io](https://collie-mlops.readthedocs.io/en/latest/getting_started.html).

## Roadmap

### Core Features
- [ ] **Pipeline Checkpoint & Resume** - Save intermediate results and resume from failure points

### Framework Support
- [ ] TensorFlow/Keras support
- [ ] Model monitoring and drift detection


## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Citation

If you use Collie in your research, please cite:

```bibtex
@software{collie2025,
  author = {ChingHuanChiu},
  title = {Collie: A Lightweight MLOps Framework},
  year = {2025},
  url = {https://github.com/ChingHuanChiu/collie}
}
```
