Metadata-Version: 2.4
Name: crystallize-ml
Version: 0.22.0
Summary: A framework for reproducible experiments with pipelines, treatments, and hypotheses.
Author-email: Bryson Tang <brysontang@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/brysontang/crystallize
Project-URL: Documentation, https://github.com/brysontang/crystallize/tree/main/docs
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pyyaml<7,>=6.0.2
Requires-Dist: networkx<4,>=3.4.2
Requires-Dist: tqdm
Requires-Dist: rich<14,>=13.8.1
Requires-Dist: dill<0.4,>=0.3.7
Provides-Extra: dev
Provides-Extra: extras
Requires-Dist: crystallize-extras; extra == "extras"
Provides-Extra: ray
Requires-Dist: ray; extra == "ray"
Provides-Extra: vllm
Requires-Dist: vllm; extra == "vllm"
Provides-Extra: cli
Requires-Dist: textual; extra == "cli"
Provides-Extra: all
Requires-Dist: crystallize-extras; extra == "all"
Requires-Dist: ray; extra == "all"
Requires-Dist: vllm; extra == "all"
Requires-Dist: textual; extra == "all"
Dynamic: license-file

# Crystallize 🧪✨

[![Test](https://github.com/brysontang/crystallize/actions/workflows/test.yml/badge.svg)](https://github.com/brysontang/crystallize/actions/workflows/test.yml)
[![Lint](https://github.com/brysontang/crystallize/actions/workflows/lint.yml/badge.svg)](https://github.com/brysontang/crystallize/actions/workflows/lint.yml)
[![PyPI Version](https://badge.fury.io/py/crystallize-ml.svg)](https://pypi.org/project/crystallize-ml/)
[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/brysontang/crystallize/blob/main/LICENSE)
[![Codecov](https://codecov.io/gh/brysontang/crystallize/branch/main/graph/badge.svg)](https://codecov.io/gh/brysontang/crystallize)

⚠️ Pre-Alpha Notice  
This project is in an early experimental phase. Breaking changes may occur at any time. Use at your own risk.

---

**Rigorous, reproducible, and clear data science experiments.**

Crystallize is an elegant, lightweight Python framework designed to help data scientists, researchers, and machine learning practitioners turn hypotheses into crystal-clear, reproducible experiments.

---

## Why Crystallize?

- **Clarity from Complexity**: Easily structure your experiments, making it straightforward to follow best scientific practices.
- **Repeatability**: Built-in support for reproducible results through immutable contexts, lockfiles, and robust pipeline management.
- **Statistical Rigor**: Hypothesis-driven experiments with integrated statistical verification.

---

## Core Concepts

Crystallize revolves around several key abstractions:

- **DataSource**: Flexible data fetching and generation.
- **Pipeline & PipelineSteps**: Deterministic data transformations. Steps may be
  synchronous or ``async`` functions and are awaited automatically.
- **Hypothesis & Treatments**: Quantifiable assertions and experimental variations.
- **Statistical Tests**: Built-in support for rigorous validation of experiment results.
- **Optimizer**: Iterative search over treatments using an ask/tell loop.

---

## Getting Started

### Installation

Crystallize uses `pixi` for managing dependencies and environments:

```bash
pixi install crystallize-ml
```

### Quick Example

```python
from crystallize import (
    DataSource,
    Hypothesis,
    Pipeline,
    Treatment,
    Experiment,
    SeedPlugin,
    ParallelExecution,
)

# Example setup (simple)
pipeline = Pipeline([...])
datasource = DataSource(...)
t_test = WelchTTest()

@hypothesis(verifier=t_test, metrics="accuracy")
def rank_by_p(result):
    return result["p_value"]

hypothesis = rank_by_p()

treatment = Treatment(name="experiment_variant", apply_fn=lambda ctx: ctx.update({"learning_rate": 0.001}))

experiment = Experiment(
    datasource=datasource,
    pipeline=pipeline,
    plugins=[SeedPlugin(seed=42), ParallelExecution(max_workers=4)],
)
experiment.validate()  # optional
result = experiment.run(
    treatments=[treatment],
    hypotheses=[hypothesis],
    replicates=3,
)
print(result.metrics)
print(result.hypothesis_result)
result.print_tree()
```

For a minimal YAML-driven setup with treatments and artifact output,
see [`examples/folder_experiment`](examples/folder_experiment).

### Command Line Interface

Crystallize ships with an interactive CLI for discovering and executing
experiments or experiment graphs. After each run, the summary screen now displays
both recorded metrics and hypothesis results.

Use the search field above each list to quickly filter experiments or graphs.
Clicking an item shows its details in the right panel with a **Run** button to
start execution. Pressing <kbd>Enter</kbd> while a list is focused also runs the
selected object.

Press ``c`` in the main screen to scaffold a new experiment folder.

```bash
# Discover and run a single experiment
crystallize run experiment

# Discover and run a graph from a specific directory
crystallize run graph --path ./my_project/experiments

# Preview actions without executing
crystallize run graph --dry-run
```

### Project Structure

```
crystallize/
├── datasources/
├── experiments/
├── pipelines/
├── plugins/
└── utils/
```

Key classes and decorators are re-exported in :mod:`crystallize` for concise imports:

```python
from crystallize import Experiment, Pipeline, ArtifactPlugin
```

This layout keeps implementation details organized while exposing a clean, flat public API.

---

## Roadmap

- **Advanced features**: Adaptive experimentation, intelligent meta-learning
- **Collaboration**: Experiment sharing, templates, and community contributions

---

## Contributing

Contributions are very welcome! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.

Use [`code2prompt`](https://github.com/mufeedvh/code2prompt) to generate LLM-powered docs:

```bash
code2prompt crystallize --exclude="*.lock" --exclude="**/docs/src/content/docs/reference/*" --exclude="**package-lock.json" --exclude="**CHANGELOG.md"
```

---

## License

Crystallize is licensed under the Apache 2.0 License. See [LICENSE](LICENSE) for details.
