Metadata-Version: 2.4
Name: robobench
Version: 0.0.2
Summary: A benchmarking tool for AI models and Hardware.
Author-email: Ramon Perez <ramon@menlo.ai>, Minh Nguyen <minh@menlo.ai>
License-Expression: MIT
Keywords: benchmark,cortex,llm,machine-learning
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Testing
Requires-Python: >=3.12
Requires-Dist: aiohttp>=3.11.12
Requires-Dist: gpustat>=1.1.1
Requires-Dist: psutil>=6.1.1
Requires-Dist: pyyaml>=6.0.2
Requires-Dist: rich>=13.9.4
Requires-Dist: typer>=0.15.1
Requires-Dist: websockets>=14.2
Description-Content-Type: text/markdown

# robobench

<p align="center">
  <img width="1280" alt="robobench Banner" src="./images/robo-bench1.jpg">
</p>

<div align="center">
  <table>
    <tr>
      <td align="center">
        <strong>🚧 EARLY DEVELOPMENT WARNING 🚧</strong>
        <br />
        <br />
        <span>
          This tool is currently about as stable as a house of cards in a wind tunnel.
          <br />
          Very early alpha. Bugs aren't just expected - they've signed a lease.
          <br />
          <br />
          <code>Status: Proceed with optimism ☕</code>
        </span>
      </td>
    </tr>
  </table>
</div>

A benchmarking tool for local LLMs. It currently keeps an eye on [Cortex.cpp](https://github.com/janhq/cortex.cpp),
with plans to judge other frameworks equally in the future.

## What is this?

`robobench` measures performance metrics, resource utilization, and stability characteristics of your LLM deployments. Rather comprehensive, really.

## Features

- Model initialization metrics
- Runtime performance
- Resource utilization
- Advanced processing scenarios
- Workload-specific benchmarks
- System integration metrics
- Stability analysis

## Installation

Using `uv`:
```bash
uv tool install robobench
```

Using pip:
```bash
pip install robobench
```

## Usage

### Basic Benchmarking
```bash
# Standard benchmark
robobench "llama3.2:3b-gguf-q2-k"

# With detailed metrics
robobench "llama3.2:3b-gguf-q2-k" --verbose
```

### Specific Benchmarks
```bash
# Initialization only
robobench "llama3.2:3b-gguf-q2-k" --type init

# Runtime metrics
robobench "llama3.2:3b-gguf-q2-k" --type runtime

# Long-running stability test
robobench "llama3.2:3b-gguf-q2-k" --type stability --stability-duration 24
```
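
To run several benchmark types back to back, the invocations above can be scripted. The sketch below only assembles the command lines shown in this README and passes them to `subprocess`. The helper names `build_command` and `run_all` are hypothetical (they are not part of `robobench`), and it assumes the `robobench` executable is on your `PATH`.

```python
import shutil
import subprocess

# Benchmark types taken from the examples above.
BENCHMARK_TYPES = ("init", "runtime")


def build_command(model: str, bench_type: str) -> list[str]:
    """Assemble one robobench invocation as an argument list (hypothetical helper)."""
    return ["robobench", model, "--type", bench_type]


def run_all(model: str) -> dict[str, int]:
    """Run each benchmark type in turn and collect the process exit codes."""
    if shutil.which("robobench") is None:
        raise RuntimeError("robobench not found on PATH")
    exit_codes: dict[str, int] = {}
    for bench_type in BENCHMARK_TYPES:
        # check=False: keep going even if one benchmark fails.
        result = subprocess.run(build_command(model, bench_type), check=False)
        exit_codes[bench_type] = result.returncode
    return exit_codes
```

Calling `run_all("llama3.2:3b-gguf-q2-k")` would run the `init` and `runtime` benchmarks in sequence. Passing an argument list rather than a shell string avoids quoting problems with model names that contain colons.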

### Advanced Usage
```bash
# Custom benchmark prompts
robobench "llama3.2:3b-gguf-q2-k" --type workload --prompts my_prompts.json

# Multi-model benchmarking
robobench "llama3.2:3b-gguf-q2-k" --type advanced \
    --secondary-models "tinyllama:1b-gguf-q4" "phi2:3b-gguf-q4"

# Export results
robobench "llama3.2:3b-gguf-q2-k" --json results.json
```
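
The layout of the file written by `--json` is not documented here, so the following sketch assumes nothing beyond it being valid JSON. It lists the top-level entries and their types as a first step in exploring an export. The function name `summarize` is hypothetical.

```python
import json
from pathlib import Path
from typing import Any


def summarize(path: str) -> list[str]:
    """Return one line per top-level entry of a JSON results file."""
    data: Any = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        # The schema is an unknown here, so handle non-object files too.
        return [f"top-level value is a {type(data).__name__}"]
    return [f"{key}: {type(value).__name__}" for key, value in data.items()]
```

For example, `print("\n".join(summarize("results.json")))` after an export shows which sections the file contains before you write anything that depends on specific keys.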

## Status

Under active development. Support for additional frameworks is planned.

## Roadmap

- Framework-agnostic benchmarking
- Additional performance metrics
- Enhanced visualizations
- Extended stability testing
- Local server and UI
- CI/CD management

## Development

### Setup

1. Clone the repository:
```bash
git clone https://github.com/jan.ai/robobench.git
cd robobench
```

2. Create and activate a virtual environment:
```bash
# Using uv (recommended)
uv venv .venv --python 3.12
source .venv/bin/activate
```

3. Install development dependencies:
```bash
# Install project in editable mode with test dependencies
uv pip install -e ".[test]"

# Install development tools
uv add --dev ruff pytest pytest-cov pytest-asyncio hypothesis
```

### Code Quality

#### Linting and Formatting

Run Ruff linter:
```bash
# Check code
ruff check .

# Auto-fix issues
ruff check --fix .

# Format code
ruff format .

# Check formatting without changes
ruff format --check .
```

#### Testing

Run tests:
```bash
# All tests
pytest

# With coverage
pytest --cov=robobench --cov-report=html

# Specific test file
pytest src/tests/test_utils.py

# Verbose per-test output
pytest -v src/tests/test_utils.py
```

### Pre-commit Checks

Before submitting a PR:
```bash
# Format code
ruff format .

# Run linter
ruff check .

# Run tests with coverage (terminal summary plus HTML report in htmlcov/)
pytest --cov=robobench --cov-report=term-missing --cov-report=html

# Serve the HTML coverage report at http://localhost:8000 (optional)
python -m http.server -d htmlcov
```

### Code Style

The project uses:
- Type hints
- Some docstrings for public functions and classes
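
As an illustration of that style, here is a small sketch of a typed, documented public function. `tokens_per_second` is a hypothetical helper written for this example; it is not taken from the `robobench` codebase.

```python
def tokens_per_second(token_count: int, elapsed_seconds: float) -> float:
    """Return generation throughput in tokens per second.

    Raises:
        ValueError: If ``elapsed_seconds`` is not positive.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return token_count / elapsed_seconds
```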

### Project Structure
```
src/
├── robobench/
│   ├── core/
│   │   ├── initialization.py   # Model initialization metrics
│   │   ├── runtime.py          # Runtime performance metrics
│   │   ├── resources.py        # Resource utilization metrics
│   │   ├── integration.py      # System integration metrics
│   │   ├── workloads.py        # Workload-specific metrics
│   │   ├── stability.py        # Stability metrics
│   │   └── utils.py            # Shared utilities
│   ├── cli.py                  # Command-line interface
│   └── __init__.py
└── tests/
    ├── conftest.py             # Shared test fixtures
    ├── test_initialization.py
    ├── test_runtime.py
    ├── test_resources.py
    ├── test_integration.py
    └── test_utils.py
```

### Pull Request Checklist

Before submitting a PR:
1. Run all tests
2. Check test coverage
3. Verify type hints with mypy (coming soon)
4. Ensure docstrings are up to date


## Contributing

Issues and pull requests welcome. Do have a look at the existing ones first, though.
