Metadata-Version: 2.4
Name: crystallize-ml
Version: 0.20.2
Summary: A framework for reproducible experiments with pipelines, treatments, and hypotheses.
Author-email: Bryson Tang <brysontang@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/brysontang/crystallize
Project-URL: Documentation, https://github.com/brysontang/crystallize/tree/main/docs
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pyyaml<7,>=6.0.2
Requires-Dist: networkx<4,>=3.4.2
Requires-Dist: tqdm
Requires-Dist: rich<14,>=13.8.1
Provides-Extra: dev
Provides-Extra: extras
Requires-Dist: crystallize-extras; extra == "extras"
Provides-Extra: ray
Requires-Dist: ray; extra == "ray"
Provides-Extra: vllm
Requires-Dist: vllm; extra == "vllm"
Provides-Extra: cli
Requires-Dist: textual; extra == "cli"
Provides-Extra: all
Requires-Dist: crystallize-extras; extra == "all"
Requires-Dist: ray; extra == "all"
Requires-Dist: vllm; extra == "all"
Requires-Dist: textual; extra == "all"
Dynamic: license-file

# Crystallize 🧪✨

[![Test](https://github.com/brysontang/crystallize/actions/workflows/test.yml/badge.svg)](https://github.com/brysontang/crystallize/actions/workflows/test.yml)
[![Lint](https://github.com/brysontang/crystallize/actions/workflows/lint.yml/badge.svg)](https://github.com/brysontang/crystallize/actions/workflows/lint.yml)
[![PyPI Version](https://badge.fury.io/py/crystallize-ml.svg)](https://pypi.org/project/crystallize-ml/)
[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/brysontang/crystallize/blob/main/LICENSE)
[![Codecov](https://codecov.io/gh/brysontang/crystallize/branch/main/graph/badge.svg)](https://codecov.io/gh/brysontang/crystallize)

⚠️ Pre-Alpha Notice  
This project is in an early experimental phase. Breaking changes may occur at any time. Use at your own risk.

---

**Rigorous, reproducible, and clear data science experiments.**

Crystallize is an elegant, lightweight Python framework designed to help data scientists, researchers, and machine learning practitioners turn hypotheses into crystal-clear, reproducible experiments.

---

## Why Crystallize?

- **Clarity from Complexity**: Easily structure your experiments, making it straightforward to follow best scientific practices.
- **Repeatability**: Built-in support for reproducible results through immutable contexts, lockfiles, and robust pipeline management.
- **Statistical Rigor**: Hypothesis-driven experiments with integrated statistical verification.
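
As a rough picture of what an immutable context buys you, the sketch below shows a write-once mapping: keys can be added, but an existing key can never be overwritten, so a later step cannot silently change an earlier setting. This is an illustration only, not Crystallize's implementation; the class name `FrozenContextSketch` and its methods are hypothetical.

```python
from types import MappingProxyType


class FrozenContextSketch:
    """Hypothetical write-once context: keys can be added but never overwritten."""

    def __init__(self, initial=None):
        self._data = dict(initial or {})

    def add(self, key, value):
        # Refuse to replace a value that an earlier step already recorded
        if key in self._data:
            raise KeyError(f"cannot overwrite existing key: {key!r}")
        self._data[key] = value

    def __getitem__(self, key):
        return self._data[key]

    def as_dict(self):
        # Read-only view: keys cannot be added or replaced through it
        return MappingProxyType(self._data)


ctx = FrozenContextSketch({"seed": 42})
ctx.add("learning_rate", 0.001)

try:
    ctx.add("seed", 7)  # attempt to overwrite an existing key
    overwrite_blocked = False
except KeyError:
    overwrite_blocked = True
```

Because every key is written at most once, two runs that start from the same initial context and apply the same steps end up with the same values.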

---

## Core Concepts

Crystallize revolves around several key abstractions:

- **DataSource**: Flexible data fetching and generation.
- **Pipeline & PipelineSteps**: Deterministic data transformations. Steps may be
  synchronous or `async` functions; async steps are awaited automatically.
- **Hypothesis & Treatments**: Quantifiable assertions and experimental variations.
- **Statistical Tests**: Built-in support for rigorous validation of experiment results.
- **Optimizer**: Iterative search over treatments using an ask/tell loop.
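
To make the "awaited automatically" behavior of pipeline steps concrete, here is a minimal sketch of a runner that accepts both plain and `async` step functions. It uses only the standard library and does not reflect Crystallize's internals; `run_steps`, `scale`, and `add_one` are hypothetical names chosen for illustration.

```python
import asyncio
import inspect


def scale(data):
    # Ordinary synchronous step
    return [x * 2 for x in data]


async def add_one(data):
    # Asynchronous step; the sleep stands in for real I/O
    await asyncio.sleep(0)
    return [x + 1 for x in data]


async def run_steps(steps, data):
    """Run steps in order, awaiting any result that is awaitable."""
    for step in steps:
        result = step(data)
        if inspect.isawaitable(result):
            result = await result
        data = result
    return data


output = asyncio.run(run_steps([scale, add_one], [1, 2, 3]))
print(output)  # [3, 5, 7]
```

Checking the return value with `inspect.isawaitable` lets the caller mix both kinds of step in one list without marking them differently.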

---

## Getting Started

### Installation

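Crystallize is published on PyPI as `crystallize-ml`. Install it with `pip`, or add it to a `pixi` project as a PyPI dependency:

```bash
pip install crystallize-ml
# or, inside a pixi project:
pixi add --pypi crystallize-ml
```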

### Quick Example

```python
from crystallize import (
    DataSource,
    Pipeline,
    Treatment,
    Experiment,
    SeedPlugin,
    ParallelExecution,
    hypothesis,
)

# Example setup (simple); fill in your own steps and data source
pipeline = Pipeline([...])
datasource = DataSource(...)
# WelchTTest stands in for a verifier you define or import yourself
t_test = WelchTTest()

@hypothesis(verifier=t_test, metrics="accuracy")
def rank_by_p(result):
    return result["p_value"]

# Use a distinct name so the `hypothesis` decorator is not shadowed
accuracy_hypothesis = rank_by_p()

treatment = Treatment(name="experiment_variant", apply_fn=lambda ctx: ctx.update({"learning_rate": 0.001}))

experiment = Experiment(
    datasource=datasource,
    pipeline=pipeline,
    plugins=[SeedPlugin(seed=42), ParallelExecution(max_workers=4)],
)
experiment.validate()
result = experiment.run(
    treatments=[treatment],
    hypotheses=[accuracy_hypothesis],
    replicates=3,
)
print(result.metrics)
print(result.hypothesis_result)
result.print_tree()
```
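
The example above refers to a `WelchTTest` verifier without defining it. As a rough guide to the statistic such a verifier would be built around, the sketch below computes Welch's t statistic and the Welch-Satterthwaite degrees of freedom with the standard library only. It is not Crystallize's verifier; the function name `welch_t` and the sample values are made up for illustration, and turning the statistic into a p-value would additionally need a t-distribution CDF (for example from SciPy).

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Per-group squared standard errors (sample variance divides by n - 1)
    se_a = variance(sample_a) / n_a
    se_b = variance(sample_b) / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / math.sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (se_a**2 / (n_a - 1) + se_b**2 / (n_b - 1))
    return t_stat, df


# Made-up metric values for a baseline and a treated group
baseline = [1.0, 2.0, 3.0]
treated = [2.0, 3.0, 4.0]
t_stat, df = welch_t(baseline, treated)
```

Unlike Student's t-test, Welch's version does not assume the two groups share a variance, which suits comparisons between a baseline and a treatment whose spread may differ.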

### Command Line Interface

Crystallize ships with an interactive CLI for discovering and executing
experiments or experiment graphs.

```bash
# Discover and run a single experiment
crystallize run experiment

# Discover and run a graph from a specific directory
crystallize run graph --path ./my_project/experiments

# Preview actions without executing
crystallize run graph --dry-run
```

### Project Structure

```
crystallize/
├── datasources/
├── experiments/
├── pipelines/
├── plugins/
└── utils/
```

Key classes and decorators are re-exported in `crystallize` for concise imports:

```python
from crystallize import Experiment, Pipeline, ArtifactPlugin
```

This layout keeps implementation details organized while exposing a clean, flat public API.

---

## Roadmap

- **Advanced features**: Adaptive experimentation, intelligent meta-learning
- **Collaboration**: Experiment sharing, templates, and community contributions

---

## Contributing

Contributions are very welcome! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.

Use [`code2prompt`](https://github.com/mufeedvh/code2prompt) to bundle the codebase into a single prompt when drafting LLM-assisted docs:

```bash
code2prompt crystallize --exclude="*.lock" --exclude="**/docs/src/content/docs/reference/*" --exclude="**package-lock.json" --exclude="**CHANGELOG.md"
```

---

## License

Crystallize is licensed under the Apache 2.0 License. See [LICENSE](LICENSE) for details.
