Metadata-Version: 2.4
Name: scald
Version: 0.1.0
Summary: Scalable Collaborative Agents for Data Science
Author-email: Dmitry Gilemkhanov <dima.rize@yandex.ru>
License: MIT
License-File: LICENSE
Requires-Python: >=3.11
Requires-Dist: catboost>=1.2.8
Requires-Dist: chromadb>=1.3.2
Requires-Dist: docker>=7.1.0
Requires-Dist: fastmcp>=2.13.0.2
Requires-Dist: genai-prices>=0.0.35
Requires-Dist: lightgbm>=4.6.0
Requires-Dist: loguru>=0.7.3
Requires-Dist: optuna>=4.5.0
Requires-Dist: polars>=1.35.1
Requires-Dist: pydantic-ai>=1.10.0
Requires-Dist: pydantic>=2.12.3
Requires-Dist: python-dotenv>=1.2.1
Requires-Dist: python-toon>=0.1.3
Requires-Dist: scikit-learn>=1.7.2
Requires-Dist: xgboost>=3.1.1
Description-Content-Type: text/markdown

<div align="center">

<img src="./assets/logo.svg" alt="logo" width="200"/>

# SCALD

### Scalable Collaborative Agents for Data Science

[![Python 3.11+](https://img.shields.io/badge/python-3.11+-white.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-white.svg)](https://opensource.org/licenses/MIT)
[![Documentation](https://img.shields.io/badge/docs-online-white.svg)](https://dmitryglhf.github.io/scald/)
[![Coverage](./.github/badges/coverage.svg)](htmlcov/index.html)

</div>

## Overview

Scald automates machine learning workflows with Actor-Critic agents that call tools exposed by Model Context Protocol (MCP) servers.

**Key features:**
- Agent-driven EDA, preprocessing, and model training
- Boosting algorithms: CatBoost, LightGBM, XGBoost
- MCP server integration for data operations
- Iterative refinement via Actor-Critic feedback loop

## Installation

Install Python dependencies:
```bash
uv sync
```

Configure environment variables:
```bash
cp .env.example .env  # Add your api_key and base_url to .env
```
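
Before a long run, it can help to confirm that both settings are present. The snippet below is a minimal sketch, not part of Scald: the `missing_settings` helper is hypothetical, and the variable names `api_key` and `base_url` are assumed from the comment above, so check `.env.example` for the exact spelling and case. It takes any mapping, such as `os.environ` after the `.env` file has been loaded.

```python
from collections.abc import Mapping

# Assumed names, taken from the comment in the step above; the real keys
# are whatever .env.example defines.
REQUIRED_SETTINGS = ("api_key", "base_url")


def missing_settings(
    env: Mapping[str, str],
    required: tuple[str, ...] = REQUIRED_SETTINGS,
) -> list[str]:
    """Return the required names that are unset or blank in env."""
    return [name for name in required if not env.get(name, "").strip()]


# Example with an in-memory mapping: base_url is present but empty.
example_env = {"api_key": "sk-example", "base_url": ""}
missing = missing_settings(example_env)
if missing:
    print("Missing settings:", ", ".join(missing))
```

Passing a mapping instead of reading the environment directly keeps the check easy to test.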

## Usage

### CLI

```bash
scald --train data/train.csv --test data/test.csv --target price --task-type regression
```
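
The CLI flags line up with the keyword arguments of the Python API shown in the next section. The sketch below illustrates that mapping with a hypothetical `argparse` parser. It is not Scald's actual CLI code, and restricting `--task-type` to `classification` and `regression` is an assumption based on the two values that appear in this README.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the flags in the command above.
    parser = argparse.ArgumentParser(prog="scald")
    parser.add_argument("--train", required=True, help="Path to the training CSV")
    parser.add_argument("--test", required=True, help="Path to the test CSV")
    parser.add_argument("--target", required=True, help="Name of the target column")
    parser.add_argument(
        "--task-type",
        required=True,
        choices=["classification", "regression"],  # assumed set of task types
    )
    return parser


def to_run_kwargs(args: argparse.Namespace) -> dict[str, str]:
    """Translate parsed flags into the keyword arguments passed to Scald.run."""
    return {
        "train_path": args.train,
        "test_path": args.test,
        "target": args.target,
        "task_type": args.task_type,
    }


argv = [
    "--train", "data/train.csv",
    "--test", "data/test.csv",
    "--target", "price",
    "--task-type", "regression",
]
kwargs = to_run_kwargs(build_parser().parse_args(argv))
print(kwargs)
```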

### Python API

```python
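import asyncio

from scald import Scald


async def main() -> None:
    scald = Scald(max_iterations=5)
    # run() is awaited, so it has to be called from inside a coroutine
    predictions = await scald.run(
        train_path="data/train.csv",
        test_path="data/test.csv",
        target="target_column",
        task_type="classification",
    )
    print(predictions)


asyncio.run(main())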
```

## Architecture

- Actor: Analyzes data and trains models using MCP tools
- Critic: Evaluates solutions, provides feedback, decides acceptance
- MCP Servers: data-analysis, data-preview, data-processing, machine-learning, file-operations, sequential-thinking

<img src="./assets/arch.svg" alt="arch"/>
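
The Actor-Critic feedback loop can be sketched in a few lines. This is an illustrative outline only, not Scald's implementation: `Review`, `refine`, and the toy actor and critic are hypothetical names, and the real agents are LLM-driven and work through MCP tools rather than plain functions. Only the shape of the loop and the `max_iterations` budget follow from the description above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Review:
    accepted: bool  # Critic's decision on the current solution
    feedback: str   # Notes handed back to the Actor for the next attempt


def refine(
    actor: Callable[[str], str],
    critic: Callable[[str], Review],
    max_iterations: int = 5,
) -> tuple[str, int]:
    """Run the Actor until the Critic accepts or the budget is spent."""
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    feedback = ""
    for iteration in range(1, max_iterations + 1):
        solution = actor(feedback)   # Actor proposes a solution
        review = critic(solution)    # Critic evaluates it
        if review.accepted:
            break
        feedback = review.feedback   # Criticism feeds the next round
    return solution, iteration


# Toy stand-ins: the "solution" is a string that grows by one character per round.
def toy_actor(feedback: str) -> str:
    return feedback + "x"


def toy_critic(solution: str) -> Review:
    return Review(accepted=len(solution) >= 3, feedback=solution)


result, rounds = refine(toy_actor, toy_critic)
print(result, rounds)
```

If the Critic never accepts, the loop returns the last proposal once `max_iterations` is reached.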

## Benchmarks

Work in progress.


## Documentation

Serve documentation locally:

1. Install documentation dependencies:

```bash
uv sync --group docs
```

2. Serve documentation:

```bash
mkdocs serve
```

The documentation is then available at http://localhost:8000.

## Development

```bash
make test      # Run tests
make lint      # Check code quality
make format    # Format code
make help      # Show all commands
```

## Requirements

- Python 3.11+
- uv
- API key and base URL for an LLM provider
