Metadata-Version: 2.4
Name: skydiscover
Version: 0.1.0
Summary: A Flexible Framework for AI-Driven Scientific and Algorithmic Discovery
License: Apache-2.0
Requires-Python: <3.14,>=3.10
Description-Content-Type: text/markdown
Requires-Dist: openai>=1.0.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: tqdm>=4.64.0
Requires-Dist: numpy>=1.22.0
Provides-Extra: math
Requires-Dist: scipy>=1.11.0; extra == "math"
Requires-Dist: sympy>=1.14.0; extra == "math"
Requires-Dist: jax>=0.6.2; extra == "math"
Requires-Dist: optax>=0.2.6; extra == "math"
Requires-Dist: torch; extra == "math"
Requires-Dist: scikit-learn>=1.0.0; extra == "math"
Requires-Dist: numba; extra == "math"
Requires-Dist: pandas; extra == "math"
Requires-Dist: matplotlib; extra == "math"
Requires-Dist: plotly; extra == "math"
Requires-Dist: networkx; extra == "math"
Requires-Dist: cvxpy; extra == "math"
Requires-Dist: autograd; extra == "math"
Requires-Dist: pymoo; extra == "math"
Requires-Dist: PyWavelets; extra == "math"
Provides-Extra: adrs
Requires-Dist: numpy>=1.22.0; extra == "adrs"
Requires-Dist: pandas; extra == "adrs"
Requires-Dist: networkx<3.4,>=3.2; extra == "adrs"
Requires-Dist: torch; extra == "adrs"
Provides-Extra: external
Requires-Dist: openevolve; extra == "external"
Requires-Dist: gepa[full]; extra == "external"
Requires-Dist: litellm>=1.81; extra == "external"
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.21.0; extra == "dev"
Requires-Dist: black>=22.0.0; extra == "dev"
Requires-Dist: isort>=5.10.0; extra == "dev"
Requires-Dist: mypy>=0.950; extra == "dev"
Requires-Dist: requests>=2.28.0; extra == "dev"
Provides-Extra: frontier-cs
Requires-Dist: anthropic>=0.74.0; extra == "frontier-cs"
Requires-Dist: colorlog>=6.10.1; extra == "frontier-cs"
Requires-Dist: datasets>=4.4.1; extra == "frontier-cs"
Requires-Dist: google-genai>=1.55.0; extra == "frontier-cs"
Requires-Dist: google-generativeai>=0.8.5; extra == "frontier-cs"
Requires-Dist: numpy>=2.0.0; extra == "frontier-cs"
Requires-Dist: python-dotenv>=1.2.1; extra == "frontier-cs"
Requires-Dist: skypilot>=0.10.5; extra == "frontier-cs"
Provides-Extra: prompt-optimization
Requires-Dist: dspy>=3.1.3; extra == "prompt-optimization"
Requires-Dist: litellm>=1.81; extra == "prompt-optimization"
Requires-Dist: bm25s>=0.3.0; extra == "prompt-optimization"
Requires-Dist: pystemmer>=2.2.0.3; extra == "prompt-optimization"
Requires-Dist: datasets>=4.5.0; extra == "prompt-optimization"
Requires-Dist: diskcache>=5.6.3; extra == "prompt-optimization"
Requires-Dist: ujson>=5.11.0; extra == "prompt-optimization"

<h1 align="center">
  <img src="assets/logo_vector.png" height="80" alt="SkyDiscover logo" style="vertical-align: middle;">&nbsp;

  <b>SkyDiscover</b>
</h1>


 <p align="center"> A Flexible Framework for AI-Driven Scientific and Algorithmic Discovery</p>
  <p align="center">
  <a href="https://skydiscover-ai.github.io/blog.html"><img src="https://img.shields.io/badge/blog-SkyDiscover-orange?style=flat-square" alt="Blog" /></a>
  <a href="https://arxiv.org/html/2602.20133v1"><img src="https://img.shields.io/badge/paper-AdaEvolve-red?style=flat-square" alt="AdaEvolve Paper" /></a>
  <a href="https://escholarship.org/uc/item/68v0c7vf"><img src="https://img.shields.io/badge/paper-EvoX-lightblue?style=flat-square" alt="EvoX Paper" /></a>
  <a href="LICENSE"><img src="https://img.shields.io/badge/license-Apache--2.0-green?style=flat-square" /></a>
  </p>



   <p align="center">
  <img src="assets/architecture.png" width="720" alt="SkyDiscover architecture"><br>
</p>


**SkyDiscover** is a modular framework for AI-driven scientific and algorithmic discovery, providing a unified interface for **running and comparing algorithms** across **200+** optimization tasks. It follows the same high-level evolutionary loop as prior systems like AlphaEvolve, but exposes each stage as a reusable, extensible component:

1. **Context Builder.** Assembles prompts from the problem spec, prior solutions, and human feedback.
2. **Solution Generator.** Produces candidates via LLM calls, with optional tool use (e.g., reading codebases).
3. **Evaluator.** Scores candidates and logs metadata back into the solution database.
4. **Solution Selector.** Maintains the solution database and picks priors for the next iteration (Top-K, MAP-Elites, adaptive evolutionary methods, etc.).
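
The four stages above can be sketched as a toy loop. This is a minimal illustration under stated assumptions, not SkyDiscover's implementation: `Solution`, `build_context`, `generate`, `evaluate`, `select`, and `run_loop` are hypothetical names, the generator is a random-number stand-in for an LLM call, and the selector is plain Top-K.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Solution:
    code: str
    score: float = float("-inf")
    metadata: dict = field(default_factory=dict)


def build_context(problem: str, priors: list[Solution]) -> str:
    """Context Builder: combine the problem spec with prior solutions."""
    history = "\n".join(f"score={p.score:.3f}: {p.code}" for p in priors)
    return f"{problem}\nPrior solutions:\n{history}"


def generate(context: str, rng: random.Random) -> Solution:
    """Solution Generator: stand-in for an LLM call; proposes a random number."""
    return Solution(code=str(rng.uniform(0.0, 10.0)))


def evaluate(candidate: Solution) -> Solution:
    """Evaluator: score the candidate (toy objective, maximized at 7)."""
    candidate.score = -abs(float(candidate.code) - 7.0)
    candidate.metadata["evaluated"] = True
    return candidate


def select(database: list[Solution], k: int = 3) -> list[Solution]:
    """Solution Selector: return the Top-K priors for the next iteration."""
    return sorted(database, key=lambda s: s.score, reverse=True)[:k]


def run_loop(problem: str, iterations: int, seed: int = 0) -> Solution:
    rng = random.Random(seed)
    # Seed the solution database with an evaluated baseline.
    database: list[Solution] = [evaluate(Solution(code="0.0"))]
    for _ in range(iterations):
        priors = select(database)
        context = build_context(problem, priors)
        database.append(evaluate(generate(context, rng)))
    return select(database, k=1)[0]


best = run_loop("Find a number close to 7", iterations=50)
print(best.code, best.score)
```

Swapping any one function (for example, replacing `select` with a MAP-Elites style selector) leaves the rest of the loop unchanged, which is the point of exposing each stage separately.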

Come and build easily with SkyDiscover!
> 🚧 This project is under active development. APIs and interfaces may change.

---

## 🏆 Benchmarks & Performance
We implement two adaptive evolutionary algorithms (**AdaEvolve** and **EvoX**), achieving:

- **Best open-source performance** — ~34% median score improvement on 172 [Frontier-CS](https://frontier-cs.org/) problems over OpenEvolve, GEPA, and ShinkaEvolve
- **Matching or exceeding AlphaEvolve and human SOTA** — on 8 math and 6 systems optimization tasks
- **Real-world discoveries** — 41% lower cross-cloud transfer cost, 14% better GPU load balance for MoE serving, 29% lower KV-cache pressure via GPU model placement
<p align="center">
  <img src="assets/benchmarks.png" width="900" alt="SkyDiscover benchmarks">
</p>

<details>
<summary><b>200+ tasks across math, systems, and algorithms</b></summary>

| | Benchmark | Domain | Tasks | Description |
|-|-----------|--------|------:|-------------|
| 🔢 | [math/](benchmarks/math/) | Math | 14 | Circle packing, Erdős problems, geometric optimization |
| 🖥️ | [ADRS/](benchmarks/ADRS/) | Systems | 6 | Cloud scheduling, load balancing, MoE expert placement |
| ⚡ | [gpu_mode/](benchmarks/gpu_mode/) | Systems | 3 | GPU kernel optimization |
| 🧩 | [frontier-cs-eval/](benchmarks/frontier-cs-eval/) | Algorithms | 172 | [Frontier-CS](https://frontier-cs.org/) competitive programming |
| 🧠 | [arc_benchmark/](benchmarks/arc_benchmark/) | Reasoning | — | ARC-AGI visual reasoning |
| 💻 | [ale_bench/](benchmarks/ale_bench/) | Algorithms | — | Algorithmic programming contests |
| 💬 | [prompt_optimization/](benchmarks/prompt_optimization/) | NLP | 1 | HotPotQA prompt evolution |

See [Dependency extras](#dependency-extras) for install commands per benchmark.

</details>

## 🚀 Quick Start

**Prerequisites:** Python 3.10 to 3.13 (`>=3.10,<3.14`), [uv](https://docs.astral.sh/uv/)

```bash
# Install
uv sync
export OPENAI_API_KEY="<your-key>"

# Try the circle packing benchmark
uv sync --extra math
uv run skydiscover-run benchmarks/math/circle_packing/initial_program.py \
  benchmarks/math/circle_packing/evaluator.py \
  --config benchmarks/math/circle_packing/config.yaml \
  --search adaevolve \
  --iterations 100

# Or run on your own problem
uv run skydiscover-run initial_program.py evaluator.py \
  --search evox \
  --model gpt-5 \
  --iterations 100
```

Or use the Python API:

```python
from skydiscover import run_discovery

result = run_discovery(
    initial_program="initial_program.py",
    evaluator="evaluator.py",
    search="evox",   # or "adaevolve"
    model="gpt-5",
    iterations=100,
)
print(result.best_score, result.best_solution)
```


## ✏️ The Two Files You Write

### Scoring Function

Your evaluator is a Python file with an **evaluate(program_path)** function. It returns a dict with metrics and optional artifacts:

```python
def evaluate(program_path):
    # run_and_grade is a placeholder for your own grading logic
    score = run_and_grade(program_path)
    return {
        "combined_score": score,       # primary optimization target (maximized)
        "artifacts": {                 # optional — stored with the solution for future context
            "feedback": "Off by one in the loop boundary",
        },
    }
```

- **combined_score** drives evolution. If omitted, SkyDiscover averages all numeric values in the dict.
- **artifacts** is optional — entries are injected into the next LLM prompt as context.
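
A fuller evaluator might look like the sketch below. It is only an illustration: the sorting test cases, the `load_solve` helper, and the choice to score a crashing candidate as `0.0` are assumptions made for this example. `fallback_score` shows one plausible reading of the "average all numeric values" rule; the actual fallback (for instance, how booleans or nested values are treated) may differ.

```python
import importlib.util
import numbers
import os
import tempfile


def load_solve(program_path):
    """Import the candidate file and return its solve() function."""
    spec = importlib.util.spec_from_file_location("candidate", program_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module.solve


def evaluate(program_path):
    """Score a candidate on a tiny, made-up sorting test suite."""
    cases = [([3, 1, 2], [1, 2, 3]), ([], []), ([5, 5, 1], [1, 5, 5])]
    try:
        solve = load_solve(program_path)
        passed = sum(solve(list(inp)) == expected for inp, expected in cases)
    except Exception as exc:  # a crashing candidate gets the lowest score here
        return {"combined_score": 0.0, "artifacts": {"feedback": repr(exc)}}
    return {
        "combined_score": passed / len(cases),
        "artifacts": {"feedback": f"{passed}/{len(cases)} cases passed"},
    }


def fallback_score(metrics):
    """Assumed fallback: mean of top-level numeric values (booleans skipped)."""
    values = [
        v for v in metrics.values()
        if isinstance(v, numbers.Real) and not isinstance(v, bool)
    ]
    return sum(values) / len(values) if values else 0.0


# Demo: write a candidate program to a temporary file and evaluate it.
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write("def solve(xs):\n    return sorted(xs)\n")
demo_result = evaluate(f.name)
os.unlink(f.name)
print(demo_result)
```

Returning a short `feedback` string even on failure gives the next prompt something concrete to react to.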

### Starting Solution

Your initial program marks the region to mutate with EVOLVE-BLOCK markers. Everything outside is left untouched.

```python
# EVOLVE-BLOCK-START
def solve(input_data):
    return input_data  # baseline — SkyDiscover will improve this
# EVOLVE-BLOCK-END
```

If no markers are present, the entire file is treated as the region to mutate.
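
To make the marker rule concrete, here is a small sketch of how the region between the markers could be located. `EVOLVE_BLOCK` and `mutable_regions` are hypothetical names for this example, and the regular expression is an assumption about the marker syntax based on the snippet above; SkyDiscover's own parsing may differ.

```python
import re

# Match a START marker line, capture everything up to the next END marker line.
EVOLVE_BLOCK = re.compile(
    r"^[ \t]*# EVOLVE-BLOCK-START[ \t]*\n(.*?)^[ \t]*# EVOLVE-BLOCK-END[ \t]*$",
    re.DOTALL | re.MULTILINE,
)


def mutable_regions(source: str) -> list[str]:
    """Return the text between each marker pair, or the whole file if none."""
    regions = [m.group(1) for m in EVOLVE_BLOCK.finditer(source)]
    return regions if regions else [source]


program = """\
import math

# EVOLVE-BLOCK-START
def solve(input_data):
    return input_data
# EVOLVE-BLOCK-END

print(solve(1))
"""
print(mutable_regions(program))
```

In this example only the body of `solve` is returned; the `import` and the final `print` sit outside the markers and would be left untouched.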


## 🧬 Pick an Algorithm

| Algorithm | Flag | Description |
|:---|:---|:---|
| ⭐&nbsp;**AdaEvolve** | `--search adaevolve` | Multi-island adaptive search with UCB, migration, and paradigm breakthroughs |
| 🧠&nbsp;**EvoX** | `--search evox` | Self-evolving paradigm that co-adapts solution generation and experience management |
| 📊&nbsp;**Top-K** | `--search topk` | Simple baseline — keeps the top-K solutions |
| 🔍&nbsp;**Beam&nbsp;Search** | `--search beam_search` | Breadth-first expansion of a beam of top solutions |
| 🎲&nbsp;**Best-of-N** | `--search best_of_n` | Generates N variants per iteration, keeps the best |
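
As a rough illustration of the two simplest baselines in the table, the sketch below implements Top-K selection and a Best-of-N step on a toy numeric problem. The function names, the `(score, solution)` pair layout, and the Gaussian `mutate` stand-in for an LLM call are all assumptions for this example, not SkyDiscover's internals.

```python
import heapq
import random


def top_k(scored, k):
    """Keep the k highest-scoring (score, solution) pairs."""
    return heapq.nlargest(k, scored, key=lambda pair: pair[0])


def best_of_n(parent, mutate, score, n, rng):
    """Generate n variants of parent and return the best (score, variant)."""
    variants = [mutate(parent, rng) for _ in range(n)]
    return max(((score(v), v) for v in variants), key=lambda pair: pair[0])


# Toy problem: solutions are floats, and the score peaks at 3.0.
def score(x):
    return -(x - 3.0) ** 2


def mutate(x, rng):
    return x + rng.gauss(0.0, 1.0)


rng = random.Random(0)
population = [(score(x), x) for x in (0.0, 2.0, 5.0, 8.0)]
print(top_k(population, k=2))  # [(-1.0, 2.0), (-4.0, 5.0)]

# Iterate Best-of-N, keeping the incumbent unless a variant beats it.
current = (score(0.0), 0.0)
for _ in range(20):
    candidate = best_of_n(current[1], mutate, score, n=4, rng=rng)
    current = max(current, candidate, key=lambda pair: pair[0])
print(current)
```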

<details>
<summary><b>External backends</b></summary>

> **Note:** The `external` extra is only supported when installing via `uv`. It will not work with `pip install skydiscover[external]` because it pulls packages directly from Git. If you are using pip, install the backends manually (see each project's repo).

Install with `uv sync --extra external`, then use the corresponding flag:

| Backend | Flag | Source |
|:---|:---|:---|
| **OpenEvolve** | `--search openevolve` | [codelion/openevolve](https://github.com/codelion/openevolve) |
| **GEPA** | `--search gepa` | [gepa-ai/gepa](https://github.com/gepa-ai/gepa) |
| **ShinkaEvolve** | `--search shinkaevolve` | [SakanaAI/ShinkaEvolve](https://github.com/SakanaAI/ShinkaEvolve) (manual install) |

<details>
<summary>ShinkaEvolve manual install</summary>

```bash
git clone --depth 1 https://github.com/SakanaAI/ShinkaEvolve.git external_repos/ShinkaEvolve
uv pip install -e external_repos/ShinkaEvolve
```

</details>

SkyDiscover shows a helpful error if you try to use a backend that isn't installed.

</details>


## ⚙️ Configuration

Pass a YAML config with `-c`. See [configs/](configs/) for full annotated templates.

```yaml
max_iterations: 100
llm:
  models: [{ name: "gemini/gemini-3-pro-preview", weight: 1.0 }]
search:
  type: "adaevolve"                  # or "evox", "topk", "beam_search", "best_of_n"
prompt:
  system_message: |
    You are an expert at optimizing algorithms.
```

API keys (`OPENAI_API_KEY`, `GEMINI_API_KEY`, etc.) are resolved from environment variables automatically.

### 📊 Live Monitor & Human Feedback

Add `monitor: { enabled: true }` to your config. The dashboard URL is printed when the run starts. The dashboard shows a scatter plot of all programs, code diffs, metrics, and AI summaries, and a **Human Feedback** panel lets you steer evolution in real time.

Replay a completed run:

```bash
uv run skydiscover-viewer /path/to/checkpoints/checkpoint_100
```


## 📖 Reference

<details>
<summary><b>CLI flags</b></summary>

```
uv run skydiscover-run INITIAL_PROGRAM EVALUATOR [options]
```

| Flag | Description |
|:---|:---|
| `-c, --config FILE` | Config YAML |
| `-i, --iterations N` | Number of iterations |
| `-m, --model MODEL` | LLM model (overrides config) |
| `-s, --search TYPE` | Search algorithm |
| `-o, --output DIR` | Output directory |
| `--api-base URL` | Override LLM API endpoint |
| `--checkpoint DIR` | Resume from checkpoint |
| `--codebase DIR` | Agentic mode (LLM can read your files) |
| `-l, --log-level LEVEL` | DEBUG, INFO, WARNING, or ERROR |

</details>

<details>
<summary><b>Python API — discover_solution()</b></summary>

```python
from skydiscover import discover_solution

result = discover_solution(
    initial_solution="def solve(x): return x",
    evaluator=lambda path: {"combined_score": run_tests(path)},  # run_tests: your own test runner
    iterations=50,
    search="evox",
)
```

</details>

<details>
<summary><b>Model providers</b></summary>

Any [LiteLLM](https://docs.litellm.ai/)-compatible model works using `provider/model` format:

```bash
--model gpt-5                                               # OpenAI (default)
--model gemini/gemini-3-pro-preview                          # Gemini
--model anthropic/claude-sonnet-4-20250514                   # Anthropic
--model ollama/llama3 --api-base http://localhost:11434/v1   # Local (Ollama, vLLM, etc.)
```

Multi-model pools with weighted sampling are supported in config:

```yaml
llm:
  models:
    - name: "gpt-5-mini"
      weight: 0.7
    - name: "gemini/gemini-2.0-flash"
      weight: 0.3
```
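
Weighted sampling over such a pool can be pictured with the standard library alone. This is a sketch of the general technique, assuming weights are treated as relative sampling probabilities; `sample_model` is a hypothetical helper, not part of SkyDiscover's API.

```python
import random
from collections import Counter

# Mirrors the YAML pool above.
models = [
    {"name": "gpt-5-mini", "weight": 0.7},
    {"name": "gemini/gemini-2.0-flash", "weight": 0.3},
]


def sample_model(pool, rng):
    """Pick one model name with probability proportional to its weight."""
    names = [m["name"] for m in pool]
    weights = [m["weight"] for m in pool]
    return rng.choices(names, weights=weights, k=1)[0]


rng = random.Random(0)
counts = Counter(sample_model(models, rng) for _ in range(10_000))
print(counts)
```

Over many draws, roughly 70% of calls go to the first model and 30% to the second.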

</details>

<details id="dependency-extras">
<summary><b>Benchmark dependency extras</b></summary>

```bash
uv sync                              # Base install
uv sync --extra math                 # Math benchmarks (SciPy, JAX, PyWavelets, …)
uv sync --extra adrs                 # ADRS systems benchmarks
uv sync --extra frontier-cs          # Frontier-CS benchmark tooling
uv sync --extra external             # OpenEvolve / GEPA backends (uv only, not available via pip; ShinkaEvolve needs a manual install)
uv sync --extra prompt-optimization  # HotPotQA prompt optimization
```

Combine extras as needed: `uv sync --extra external --extra math`

If a benchmark ships its own `requirements.txt`, also run: `uv pip install -r path/to/requirements.txt`

</details>

---

## 🔗 Related Work

SkyDiscover is inspired by [AlphaEvolve](https://deepmind.google/discover/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/), Google DeepMind's evolutionary coding agent, and [OpenEvolve](https://github.com/codelion/openevolve), an open-source reimplementation of the AlphaEvolve framework.

## ✍️ Citation

```bibtex
@misc{skydiscover2025,
  title={SkyDiscover: A Flexible Framework for AI-Driven Scientific and Algorithmic Discovery},
  author={Liu, Shu and Cemri, Mert and Agarwal, Shubham and Krentsel, Alexander and Naren, Ashwin and Mang, Qiuyang and Li, Zhifei and Gupta, Akshat and Maheswaran, Monishwaran and Cheng, Audrey and Pan, Melissa and Boneh, Ethan and Ramchandran, Kannan and Sen, Koushik and Dimakis, Alexandros G. and Zaharia, Matei and Stoica, Ion},
  year={2025},
}
```
