Metadata-Version: 2.4
Name: neurobrix
Version: 0.1.0a1
Summary: Universal Deep Learning Inference Engine — execute any AI model without model-specific code
Project-URL: Homepage, https://github.com/NeuroBrix/neurobrix
Project-URL: Repository, https://github.com/NeuroBrix/neurobrix
Project-URL: Issues, https://github.com/NeuroBrix/neurobrix/issues
Project-URL: Documentation, https://github.com/NeuroBrix/neurobrix#readme
Author-email: Neural Networks Holding LTD <contact@neurobrix.es>
License-Expression: Apache-2.0
License-File: LICENSE
Keywords: deep-learning,diffusion,gpu,inference,llm,model-serving,neural-networks,onnx-alternative,pytorch,triton
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Requires-Dist: numpy>=1.24.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: requests>=2.28.0
Requires-Dist: safetensors>=0.4.0
Requires-Dist: tqdm>=4.65.0
Provides-Extra: cuda
Requires-Dist: triton>=2.1.0; extra == 'cuda'
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Description-Content-Type: text/markdown

<p align="center">
  <img src="assets/logo.svg" alt="NeuroBrix Logo" width="280"/>
</p>

<h1 align="center">NeuroBrix</h1>

<p align="center">
  <strong>Universal Deep Learning Inference Engine</strong><br/>
  Execute any AI model without model-specific code. One engine for diffusion, LLMs, MoE, multimodal, and more.
</p>

<p align="center">
  <a href="https://pypi.org/project/neurobrix/"><img src="https://img.shields.io/pypi/v/neurobrix?color=blue" alt="PyPI"/></a>
  <a href="https://pypi.org/project/neurobrix/"><img src="https://img.shields.io/pypi/pyversions/neurobrix" alt="Python"/></a>
  <a href="https://github.com/NeuroBrix/neurobrix/blob/main/LICENSE"><img src="https://img.shields.io/badge/license-Apache%202.0-green" alt="License"/></a>
</p>

---

## What is NeuroBrix?

NeuroBrix is a **universal inference engine** that runs any deep learning model from a single, unified runtime. Instead of writing model-specific code for each architecture (diffusion pipelines, autoregressive LLMs, mixture-of-experts, multimodal models), NeuroBrix reads a self-contained `.nbx` container and executes the model graph directly.

**The runtime has zero domain knowledge.** It doesn't know what an "image" or a "token" is. It only knows tensors, axes, and execution graphs. All model-specific behavior is encoded in the `.nbx` container.

### Key Principles

- **ZERO HARDCODE** -- All values derived from the container. Nothing is hardcoded in the engine.
- **ZERO FALLBACK** -- The system crashes explicitly if data is missing. No silent defaults.
- **ZERO SEMANTIC** -- The runtime has no domain knowledge. Only tensors and execution plans.
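
The ZERO FALLBACK principle can be illustrated with a minimal sketch. The `require` helper, the `MissingContainerValue` exception, and the metadata keys below are hypothetical names chosen for illustration; they are not part of the NeuroBrix API, and the real `.nbx` schema is not shown in this README.

```python
class MissingContainerValue(KeyError):
    """Raised when a required value is absent from the container metadata."""


def require(metadata: dict, dotted_key: str):
    """Return the value at dotted_key, failing loudly instead of defaulting."""
    node = metadata
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            raise MissingContainerValue(
                f"container is missing required value: {dotted_key}"
            )
        node = node[part]
    return node


# Hypothetical metadata, used only to exercise the helper.
meta = {"scheduler": {"num_steps": 20}}
steps = require(meta, "scheduler.num_steps")  # present, so it is returned

try:
    require(meta, "scheduler.guidance_scale")  # absent, so no default is invented
except MissingContainerValue as err:
    print(f"explicit failure: {err}")
```

The point of the pattern is that a missing value surfaces at the lookup site rather than as a silently wrong result later in the run.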

### Supported Model Families

| Family | Models | Description |
|--------|--------|-------------|
| **Image** | PixArt-Sigma, FLUX, Sana, Janus-Pro | Diffusion and VQ-based image generation |
| **LLM** | DeepSeek-MoE, TinyLlama, Mistral | Autoregressive text generation |
| **Audio** | Whisper | Speech-to-text transcription |
| **Video** | CogVideoX | Text-to-video generation |

---

## Installation

```bash
pip install neurobrix
```

**Requirements:**
- Python 3.10+
- PyTorch 2.0+ (install separately for your CUDA version)
- NVIDIA GPU with CUDA support

**Optional (Triton kernels):**
```bash
pip install "neurobrix[cuda]"
```
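
Because PyTorch is installed separately, it can help to check the environment before the first run. The following is a small pre-flight sketch that mirrors the requirements listed above; the `preflight` function is a hypothetical helper, not something shipped with NeuroBrix.

```python
import importlib.util
import sys


def preflight() -> list:
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append("Python 3.10 or newer is required")
    if importlib.util.find_spec("torch") is None:
        problems.append("PyTorch is not installed (install it for your CUDA version)")
    else:
        import torch  # imported lazily so the check also works without PyTorch

        if not torch.cuda.is_available():
            problems.append("PyTorch is installed but cannot see a CUDA device")
    return problems


for problem in preflight():
    print(f"warning: {problem}")
```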

---

## Quick Start

### 1. Browse Available Models

```bash
# List all models on the NeuroBrix registry
neurobrix hub

# Filter by category
neurobrix hub --category IMAGE
neurobrix hub --category LLM

# Search by name
neurobrix hub --search pixart
```

### 2. Import a Model

```bash
# Download from the NeuroBrix registry
neurobrix import pixart/sigma-xl-1024

# Import and delete the .nbx archive after extraction (saves disk space)
neurobrix import deepseek/moe-16b-chat --no-keep
```

### 3. Run Inference

```bash
# Image generation
neurobrix run --model PixArt-Sigma-XL-2-1024-MS --hardware v100-32g \
    --prompt "a sunset over mountains" --steps 20

# Text generation (LLM)
neurobrix run --model deepseek-moe-16b-chat --hardware c4140-4xv100-custom-nvlink \
    --prompt "Explain quantum computing in simple terms"

# With custom parameters
neurobrix run --model TinyLlama-1.1B-Chat-v1.0 --hardware v100-32g \
    --prompt "Hello, World!" --temperature 0.7 --seed 42
```
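
When inference is driven from a script, the same command line can be assembled programmatically. This sketch uses only the flags documented in this README; `build_run_command` is a hypothetical helper, and it assumes every option takes a value (switches such as `--chat` / `--no-chat` would need separate handling).

```python
import shlex
import subprocess


def build_run_command(model: str, hardware: str, prompt: str, **options) -> list:
    """Build an argv list for `neurobrix run` from keyword options."""
    cmd = ["neurobrix", "run", "--model", model,
           "--hardware", hardware, "--prompt", prompt]
    for name, value in options.items():
        # repetition_penalty=1.2 becomes: --repetition-penalty 1.2
        cmd += [f"--{name.replace('_', '-')}", str(value)]
    return cmd


cmd = build_run_command(
    "TinyLlama-1.1B-Chat-v1.0", "v100-32g", "Hello, World!",
    temperature=0.7, seed=42,
)
print(shlex.join(cmd))  # shell-safe rendering of the command

# On a machine with NeuroBrix installed, the command could then be executed:
# subprocess.run(cmd, check=True)
```

Passing an argv list to `subprocess.run` avoids shell quoting problems with prompts that contain spaces or punctuation.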

### 4. Persistent Serving (Daemon Mode)

Keep the model loaded in VRAM so that each request skips the weight-loading step:

```bash
# Start the serving daemon (weights stay in GPU memory)
neurobrix serve --model TinyLlama-1.1B-Chat-v1.0 --hardware v100-32g

# Interactive chat session (connects to the running daemon)
neurobrix chat
neurobrix chat --temperature 0.7 --repetition-penalty 1.2

# Stop the daemon and free VRAM
neurobrix stop
```

**Chat commands** (inside the chat session):

| Command | Description |
|---------|-------------|
| `/new` | Start a new conversation |
| `/context` | Show token usage |
| `/status` | Show engine status |
| `/quit` | Exit chat |

---

## How It Works

```
.nbx Container ──> Prism Solver ──> Execution Plan ──> Runtime Executor ──> Output
                    (hardware)       (strategy)         (graph engine)
```

1. **`.nbx` Container** -- A self-contained archive with the model graph, weights, topology, and metadata. This is the only source of truth.

2. **Prism Solver** -- Analyzes the model's memory requirements against your hardware profile and selects the optimal execution strategy automatically.

3. **Execution Plan** -- The strategy that determines how components are placed across GPUs:

| Strategy | Description |
|----------|-------------|
| `single_gpu` | All components on one GPU |
| `single_gpu_lifecycle` | Load/unload components sequentially on one GPU |
| `pp_nvlink` | Pipeline parallelism across NVLink-connected GPUs |
| `pp_pcie` | Pipeline parallelism across PCIe-connected GPUs |
| `pp_lazy_nvlink` | Pipeline parallelism + lazy weight loading (NVLink) |
| `pp_lazy_pcie` | Pipeline parallelism + lazy weight loading (PCIe) |
| `fgp_nvlink` | Fine-grained parallelism for MoE models (NVLink) |
| `fgp_pcie` | Fine-grained parallelism for MoE models (PCIe) |
| `tp` | Tensor parallelism across GPUs |
| `lazy_sequential` | Sequential component loading (fits large models in small VRAM) |
| `zero3` | CPU offload with GPU compute (for very large models) |

4. **Runtime Executor** -- Executes the graph using CompiledSequence (zero-overhead compiled ops) with automatic dtype management (AMP), KV cache, and multi-device tensor routing.
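
To make the idea of matching memory requirements against a hardware profile concrete, here is a deliberately simplified toy. It is not the Prism solver's algorithm: the `HardwareProfile` class, the `pick_strategy` function, and the three-way decision rule are assumptions made for illustration, and the toy only ever returns a small subset of the strategy names from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HardwareProfile:
    """Hypothetical, reduced view of a hardware profile."""
    gpu_memory_gb: tuple  # VRAM of each GPU, in GB
    interconnect: str     # "nvlink" or "pcie"


def pick_strategy(model_gb: float, hw: HardwareProfile) -> str:
    """Toy rule: one GPU if it fits, else pipeline, else CPU offload."""
    if model_gb <= max(hw.gpu_memory_gb):
        return "single_gpu"
    if len(hw.gpu_memory_gb) > 1 and model_gb <= sum(hw.gpu_memory_gb):
        return f"pp_{hw.interconnect}"
    return "zero3"


single = HardwareProfile(gpu_memory_gb=(32,), interconnect="pcie")
quad = HardwareProfile(gpu_memory_gb=(16, 16, 16, 16), interconnect="nvlink")

print(pick_strategy(10, single))   # fits on one GPU
print(pick_strategy(40, quad))     # needs several GPUs
print(pick_strategy(100, single))  # exceeds all VRAM
```

A real solver would also have to account for activations, per-component sizes, and the lazy and lifecycle variants listed above; the toy ignores all of these.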

---

## CLI Reference

### `neurobrix run` -- Run Inference

```bash
neurobrix run --model <name> --hardware <profile> --prompt <text> [options]
```

| Flag | Description |
|------|-------------|
| `--model` | Model name (required) |
| `--hardware` | Hardware profile ID (required) |
| `--prompt` | Text prompt (required) |
| `--steps` | Number of inference steps (diffusion models) |
| `--cfg` | Guidance scale |
| `--height` / `--width` | Output dimensions in pixels |
| `--output` | Output file path |
| `--seed` | Random seed for reproducibility |
| `--temperature` | Sampling temperature (0 = greedy) |
| `--repetition-penalty` | Repetition penalty (1.0 = none) |
| `--chat` / `--no-chat` | Force chat template on/off |
| `--set KEY=VALUE` | Override any runtime variable |
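
A wrapper script that forwards `--set` overrides may want to check that each one has the `KEY=VALUE` shape before invoking the CLI. The sketch below shows one way to do this with `argparse`; `parse_overrides` is a hypothetical helper, the keys `steps` and `cfg` are placeholders rather than confirmed runtime variable names, and this is not a description of how NeuroBrix parses the flag internally.

```python
import argparse


def parse_overrides(pairs: list) -> dict:
    """Split KEY=VALUE strings into a dict, rejecting malformed entries."""
    overrides = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected KEY=VALUE, got {pair!r}")
        overrides[key] = value  # values stay strings; no type is guessed
    return overrides


parser = argparse.ArgumentParser()
parser.add_argument("--set", dest="overrides", action="append",
                    default=[], metavar="KEY=VALUE")

args = parser.parse_args(["--set", "steps=30", "--set", "cfg=4.5"])
overrides = parse_overrides(args.overrides)
print(overrides)
```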

### `neurobrix serve` -- Persistent Model Serving

```bash
neurobrix serve --model <name> --hardware <profile> [options]
```

| Flag | Description |
|------|-------------|
| `--model` | Model name (required) |
| `--hardware` | Hardware profile ID (required) |
| `--timeout` | Idle timeout in seconds (default: 1800) |
| `--foreground` | Run in foreground (for debugging) |

### `neurobrix chat` -- Interactive Chat

```bash
neurobrix chat [options]
```

| Flag | Description |
|------|-------------|
| `--max-tokens` | Max tokens per response |
| `--temperature` | Sampling temperature |
| `--repetition-penalty` | Repetition penalty |

### `neurobrix hub` -- Browse Registry

```bash
neurobrix hub [--category IMAGE|LLM|VIDEO|AUDIO] [--search <query>]
```

### `neurobrix import` -- Download Model

```bash
neurobrix import <org/name> [--force] [--no-keep]
```

### `neurobrix list` -- List Installed Models

```bash
neurobrix list [--store]
```

### `neurobrix remove` -- Remove Model

```bash
neurobrix remove <model_name> [--store] [--all]
```

### `neurobrix clean` -- Wipe All Models

```bash
neurobrix clean [--store] [--cache] [--all] [-y]
```

### `neurobrix info` -- System Information

```bash
neurobrix info [--models] [--hardware] [--system]
```

### `neurobrix inspect` -- Inspect .nbx File

```bash
neurobrix inspect <path.nbx> [--topology] [--weights]
```

### `neurobrix validate` -- Validate .nbx Integrity

```bash
neurobrix validate <path.nbx> [--level structure|schema|coherence|deep] [--strict]
```

---

## Hardware Profiles

NeuroBrix uses hardware profiles to describe your GPU setup. The Prism solver reads these profiles to determine the best execution strategy.

| Profile ID | GPUs | Description |
|------------|------|-------------|
| `v100-16g` | 1x V100 16GB | Single GPU |
| `v100-32g` | 1x V100 32GB | Single GPU |
| `v100-32g-2` | 2x V100 32GB | Dual GPU |
| `v100-32g-3` | 3x V100 32GB | Triple GPU |
| `c4140-4xv100-custom-nvlink` | 4x V100 (mixed) | Multi-GPU NVLink cluster |
| `v100-32g-x2-nvlink` | 2x V100 32GB | Dual GPU NVLink |
| `c4140-4xv100-16GB-nvlink` | 4x V100 16GB | Multi-GPU NVLink cluster |

---

## Dependencies

### Required

| Package | Version | Purpose |
|---------|---------|---------|
| [PyTorch](https://pytorch.org/) | >= 2.0.0 | Tensor computation and GPU execution (install separately for your CUDA version; not pulled in by `pip install neurobrix`) |
| [safetensors](https://github.com/huggingface/safetensors) | >= 0.4.0 | Fast, safe model weight loading |
| [NumPy](https://numpy.org/) | >= 1.24.0 | Numerical operations |
| [PyYAML](https://pyyaml.org/) | >= 6.0 | Configuration file parsing |
| [requests](https://requests.readthedocs.io/) | >= 2.28.0 | HTTP client for registry access |
| [tqdm](https://tqdm.github.io/) | >= 4.65.0 | Progress bars for downloads |

### Optional

| Package | Version | Purpose |
|---------|---------|---------|
| [Triton](https://triton-lang.org/) | >= 2.1.0 | Custom GPU kernels (`pip install "neurobrix[cuda]"`) |

---

## Project Structure

```
neurobrix/
  cli/                  # Command-line interface
    commands/           # run, serve, chat, hub, import, list, ...
  core/                 # Runtime engine
    runtime/            # Executor, graph engine, CompiledSequence
    prism/              # Hardware solver and execution planning
    flow/               # Execution flows (iterative, autoregressive, forward)
    dtype/              # Automatic mixed precision engine
    module/             # Tokenizer, scheduler, KV cache, text processor
    io/                 # Weight loading
  kernels/              # Triton GPU kernels
  nbx/                  # .nbx container format
  config/               # Hardware profiles, vendor configs
```

---

## License

Apache License 2.0 -- Copyright 2025 [Neural Networks Holding LTD](https://neurobrix.es)

See [LICENSE](LICENSE) for the full text.
