Metadata-Version: 2.4
Name: flashinfer-bench
Version: 0.1.0
Summary: FlashInfer-Bench - AI for AI Infrastructure for Accelerating AI Deployment
Author: FlashInfer team
License: Apache License 2.0
Project-URL: Homepage, https://bench.flashinfer.ai
Project-URL: Documentation, https://bench.flashinfer.ai/docs
Project-URL: Repository, https://github.com/flashinfer-ai/flashinfer-bench
Project-URL: Issues, https://github.com/flashinfer-ai/flashinfer-bench/issues
Keywords: deep-learning,inference,benchmark,flashinfer,gpu
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: <4.0,>=3.9
Description-Content-Type: text/markdown
License-File: LICENCE
License-File: NOTICE
Requires-Dist: torch==2.8.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: safetensors>=0.5.0
Requires-Dist: flashinfer-python>=0.3.1
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: black>=22.0.0; extra == "dev"
Requires-Dist: isort>=5.0.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Requires-Dist: pre-commit>=3.0.0; extra == "dev"
Requires-Dist: pydoc-markdown>=4.8.2; extra == "dev"
Provides-Extra: cuda12
Requires-Dist: nvidia-cublas-cu12>=12.3; extra == "cuda12"
Requires-Dist: nvidia-cudnn-cu12>=9.0; extra == "cuda12"
Dynamic: license-file

# FlashInfer-Bench

**FlashInfer-Bench** is a lightweight, extensible benchmarking suite for evaluating low-level kernel implementations of model inference workloads. It is centered on the `Trace` artifact, a detailed record of a workload execution, and enables systematic comparison of kernel implementations on both correctness and performance metrics.

## Installation

Install FlashInfer-Bench with pip:

```bash
pip install flashinfer-bench
```

Import FlashInfer-Bench:

```python
import flashinfer_bench as fib
```

## Dataset Layout

Each dataset is organized as follows:

```
dataset/
├── definitions/         # One JSON file per workload definition
├── solutions/           # One JSON file per solution implementation
└── traces/              # Benchmark results
```

* Each **Definition** describes a computation task and its reference logic; a hypothetical file sketch follows this list.
* Each **Solution** specifies a kernel or agent implementation for a definition.
* Each **Trace** records a benchmark result: input configuration, performance, correctness, environment, and so on.
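
For orientation, here is a minimal sketch of what a definition file might contain, expressed as a Python dict. The top-level fields (`axes`, `inputs`, `outputs`, `reference`) come from the schema summary at the end of this README; everything else is illustrative, so consult [`schema/`](./schema/) for the authoritative format.

```python
# Illustrative sketch only: field names follow the schema summary below,
# but the exact structure is an assumption, not the real file format.
definition_sketch = {
    "name": "rmsnorm",                               # hypothetical workload name
    "axes": {"batch": 8, "hidden": 4096},            # problem-size axes
    "inputs": [{"name": "x", "dtype": "float16"}],   # input tensor specs
    "outputs": [{"name": "y", "dtype": "float16"}],  # output tensor specs
    "reference": "def ref(x): ...",                  # reference implementation
}
```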

You can load the full dataset using:

```python
from flashinfer_bench import TraceSet
trace_set = TraceSet.from_path("./dataset")
```
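
Once loaded, you can print a quick overview of the dataset; `summary()` is the same helper used in the benchmarking section below:

```python
# Print an overview of the loaded dataset
print(trace_set.summary())
```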

## Command Line Interface (CLI)

FlashInfer-Bench provides a CLI for running benchmarks and analyzing results.

### Usage

The CLI exposes subcommands such as `run` (execute benchmarks) and `report` (summarize and analyze traces), shown in the examples below. Common options:

#### Options
- `--local <PATH>`: Specifies one or more local paths to load traces from. Can be used multiple times.
- `--hub`: Load the latest traces from the FlashInfer Hub (not yet implemented).
- `--warmup-runs <N>`: Number of warmup runs for benchmarking (default: 10).
- `--iterations <N>`: Number of benchmark iterations (default: 50).
- `--device <DEVICE>`: Device to run benchmarks on (default: cuda:0).
- `--log-level <LEVEL>`: Logging level (default: INFO).
- `--save-results` / `--no-save-results`: Whether to save results after running (default: save).

#### Example

```bash
# Run benchmarks on a dataset
flashinfer-bench run --local ./dataset

# Print a summary of traces
flashinfer-bench report summary --local ./dataset

# Find the best solution for each definition
flashinfer-bench report best --local ./dataset
```
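
The options listed above compose with these subcommands. For example (flag values are arbitrary and for illustration only):

```bash
# Run with custom benchmark settings
flashinfer-bench run --local ./dataset \
    --warmup-runs 5 --iterations 100 --device cuda:0 --no-save-results
```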

## Benchmarking Kernels

You can run local benchmarks using the `Benchmark` runner, which scans your dataset for all available definitions and solutions, executes them, and appends the resulting traces to the `TraceSet`.

It also supports single-solution execution via `.run_solution(...)`; a sketch follows the example below.

```python
from flashinfer_bench import Benchmark, BenchmarkConfig, TraceSet

traces = TraceSet.from_path("./dataset")
config = BenchmarkConfig(warmup_runs=5, iterations=20)
benchmark = Benchmark(traces, config)

benchmark.run_all()

# Accessing results
print(traces.summary())
```
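
For single-solution runs, `.run_solution(...)` is mentioned above, but its signature is not documented in this README; the call below is a hypothetical sketch that assumes it takes a solution name.

```python
# Hypothetical: the argument is illustrative; check the API reference for
# the actual signature of run_solution.
benchmark.run_solution("my_solution")
```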

## Schema

Each of the core entities is modeled as a dataclass:

* **Definition**: Workload specification with axes, inputs, outputs, and a reference implementation.
* **Solution**: A concrete implementation with source files and a launch entry point.
* **Trace**: A benchmark result of a solution on a specific workload input.

See [`schema/`](./schema/) for full documentation.
