Metadata-Version: 2.4
Name: neuronpack
Version: 1.0.1
Summary: Universal, hardware-compiled, composable model container format enabling True RSI.
Project-URL: Homepage, https://github.com/dog52841/neuronpack
Project-URL: Repository, https://github.com/dog52841/neuronpack.git
Author-email: Dio Dog <dog52841@gmail.com>
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.9
Requires-Dist: numba>=0.58.0
Requires-Dist: numpy>=1.24.0
Requires-Dist: orjson>=3.9.0
Requires-Dist: safetensors>=0.4.0
Requires-Dist: torch>=2.0.0
Description-Content-Type: text/markdown

# NeuronPack
<div align="center">
  <h3>A universal, hardware-compiled, composable model container format.</h3>
  <p>Enabling <i>True Recursive Self-Improvement (RSI)</i> via autonomous dynamic architecture compilation.</p>
</div>

---

**⚠️ NOTE:** *This framework is completely VIBE CODED. It is an experimental pursuit of meta-learning architectures. Improvements, forks, and pull requests are very much welcome and encouraged!*

Read the full specification in [docs/neuronpack.txt](docs/neuronpack.txt).

---

## 🧠 The Problem with Monolithic Models

Standard PyTorch and Transformers architectures store all parameters in a single large state dictionary (a `.pt` or `.safetensors` file) that is tightly coupled to one large, rigid autograd execution graph.

If you want to swap a single 100k-parameter Multi-Layer Perceptron (MLP) within a 10-billion-parameter model, you traditionally have to:
1. Reload all 10 billion parameters into RAM.
2. Unpack the monolithic state dictionary.
3. Replace the sub-module.
4. Reserialize all 10 billion parameters and block on disk I/O while they are written back.

This overhead makes **True Recursive Self-Improvement (RSI)**, meaning systems that actively modify their own algorithm mid-training, impractical at scale.

## 📦 The NeuronPack Solution

NeuronPack breaks large static multi-layer networks into thousands of isolated "Pieces".
Instead of one massive file, you get a directory in which each piece has its own Safetensors weights, metadata, and AOT-compiled `.kernel.pt2` binary.

NeuronPack acts as a file-system container in which reading or writing one piece costs $O(1)$ with respect to the total number of pieces, fronted by a low-latency, Numba-compiled CPU router.
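
To make the cost argument concrete, here is a minimal standard-library sketch contrasting the two write patterns. It is not NeuronPack's implementation: raw byte strings stand in for tensors, `save_monolithic` and `save_piece` are hypothetical helper names, and the `pieces/<id>.weights` path simply follows the layout described in this README.

```python
import tempfile
from pathlib import Path


def save_monolithic(path: Path, state: dict) -> int:
    """Rewrite every parameter blob into one file; return bytes written."""
    blob = b"".join(state[name] for name in sorted(state))
    path.write_bytes(blob)
    return len(blob)


def save_piece(root: Path, piece_id: str, payload: bytes) -> int:
    """Rewrite only the file that belongs to one piece; return bytes written."""
    target = root / "pieces" / f"{piece_id}.weights"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(payload)
    return len(payload)


# 100 stand-in "experts" of 4,000 bytes each (hypothetical sizes).
state = {f"expert_{i:03d}": bytes(4000) for i in range(100)}

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Mutate a single expert, then persist it both ways.
    state["expert_042"] = bytes([1]) * 4000
    mono_bytes = save_monolithic(root / "model.bin", state)
    piece_bytes = save_piece(root, "expert_042", state["expert_042"])

print(mono_bytes, piece_bytes)  # 400000 4000
```

The per-piece write touches 1% of the bytes here, and that amount stays fixed no matter how many other experts the container holds.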

### Components

1.  **Core Container (`NeuronPack`)**: The structural DAG layer that enforces the separation of metadata and parameters on disk. Pieces are hot-swappable and loaded zero-copy via Safetensors.
2.  **`@njit` Search Router (`Router`)**: A dedicated exact cosine-similarity engine that selects which components to load at runtime on the CPU, without consuming PyTorch GPU memory.
3.  **AOT Inductor Compilation**: Uses PyTorch 2.x `torch.export` to freeze Python logic into compiled C++ library binaries (`.kernel.pt2`) whenever an expert is inserted.
4.  **NEURON-RSI Evolutionary Controller**: The architecture search engine. A meta-objective evaluator that autonomously controls spawning, mutation, and safe replacement of dynamic component modules based on a fitness trade-off between performance and a bloat penalty.
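
As an illustration of the routing idea in item 2, the sketch below ranks pieces by exact cosine similarity between a query vector and each piece's embedding. It is plain Python rather than the Numba `@njit` code the library uses, and the function names, registry contents, and vectors are all made up for the example.

```python
import math


def cosine_similarity(a, b):
    """Exact cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)


def route(query, registry, top_k=1):
    """Return the ids of the top_k pieces most similar to the query."""
    ranked = sorted(
        registry.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [piece_id for piece_id, _ in ranked[:top_k]]


# Hypothetical registry: piece id -> semantic embedding.
registry = {
    "mlp_expert_a": [1.0, 0.0, 0.0],
    "mlp_expert_b": [0.0, 1.0, 0.0],
    "attn_layer_04": [0.7, 0.7, 0.0],
}

best = route([0.9, 0.8, 0.0], registry, top_k=2)
print(best)  # ['attn_layer_04', 'mlp_expert_a']
```

Because the lookup only needs the embedding table, it can run on the CPU without loading any piece weights.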

---

## ⚡ Performance Benchmarks

*Benchmarked on a synthetic 10-Million-parameter environment across 100 experts.*
*Targeting the replacement/swapping of a single 100k-parameter module (1% of the network).*

| Metric | Traditional PyTorch Model | NeuronPack | Improvement |
| :--- | :--- | :--- | :--- |
| **Disk I/O Saves** (Evolution) | ~57.59 ms *(Save 10M params)* | **~1.54 ms** *(Save 100k params)* | **37.4x Faster** |
| **Disk I/O Loads** (Swap overhead)| ~65.03 ms *(Load graph + unzip dict)* | **~1.19 ms** *(Zero-copy single target)*| **54.5x Faster** |
| **VRAM Locked Footprint** (Idle) | ~38.15 MB *(Full state embedded in graph)* | **~0.38 MB** *(Swaps dynamically on disk)* | **100x Reduction** |
| **Routing Registry Lookup** | ~0.16 ms *(GPU VRAM Locked nn.Linear)* | ~4.41 ms *(Decoupled CPU Exact Match)* | ~27.6x Slower (trade-off: avoids Autograd lock-in) |

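NeuronPack makes the cost of a mutation depend on the size of the mutated piece rather than on the size of the whole model ($O(1)$ in the number of pieces), which makes True RSI viable locally. Because the architecture lookup engine is decoupled from the PyTorch execution graph, evolutionary training loops only need the active pieces in memory and are far less likely to hit out-of-memory errors.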
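
The figures in the table can be sanity-checked with a few lines of arithmetic. The sketch below assumes 32-bit floats and MiB units, which is consistent with the footprint numbers but is not stated in the benchmark. Ratios are recomputed from the rounded table entries, so the last digit may differ slightly from the listed values.

```python
BYTES_PER_FP32 = 4   # assumption: parameters are stored as 32-bit floats
MIB = 2 ** 20        # assumption: footprints are reported in MiB

# Memory footprint of the full model versus a single piece.
full_model_mib = 10_000_000 * BYTES_PER_FP32 / MIB
single_piece_mib = 100_000 * BYTES_PER_FP32 / MIB

# Ratios computed from the rounded timings in the table.
save_speedup = 57.59 / 1.54
load_speedup = 65.03 / 1.19
routing_slowdown = 4.41 / 0.16

print(f"footprint: {full_model_mib:.2f} MiB vs {single_piece_mib:.2f} MiB")
print(f"save: {save_speedup:.1f}x faster")
print(f"load: {load_speedup:.1f}x faster")
print(f"routing: {routing_slowdown:.1f}x slower on the CPU path")
```

The save and load gains come from touching roughly 1% of the bytes, while the CPU router pays a latency cost in exchange for staying outside the GPU graph.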

---

## 🚀 How to Use

### 1. Initializing and Saving Pieces

```python
import torch
import torch.nn as nn
from neuronpack import NeuronPack

# Create a NeuronPack Workspace
pack = NeuronPack('/path/to/my_model.neuron')

# Define a piece of your network
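class MLPExpert(nn.Module):
    def __init__(self):
        super().__init__()
        self.w = nn.Linear(8, 8)

    def forward(self, x):
        # A forward pass is needed before the piece can be exported or compiled
        return self.w(x)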

expert = MLPExpert()

# Metadata defines the piece contract and routing semantic vector
meta = {
    "role": "mlp_expert",
    "architecture": "MLPExpert",
    "embedding": [0.1, -0.4, 0.9, ...], # Semantic positioning
    "version": "1.0.0",
}

# The parameters and layout are saved safely to disk in isolation
pack.add_piece("unique_piece_id", expert.state_dict(), meta)
```

### 2. Live Pruning & AOT Compilation

```python
# AOT-compile the piece to C++ machine code using torch.export
# This stores `unique_piece_id.kernel.pt2` next to the weights on disk
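# Assumption: example inputs are passed as a tuple matching the module's forward signature
dummy_inputs = (torch.randn(1, 8),)
pack.compile_piece("unique_piece_id", MLPExpert(), dummy_inputs)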

# Delete experts whose telemetry shows they are rarely routed to
# (e.g. load_count falls below the threshold)
pack.prune(role="mlp_expert", min_load_count=10)
```
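
The pruning rule can be pictured as a filter over piece metadata. The sketch below is a guess at the selection logic, not the library's code: it assumes each piece's metadata carries a `role` and a `load_count` telemetry counter, and `select_prunable` is a hypothetical helper name.

```python
def select_prunable(metas, role, min_load_count):
    """Return ids of pieces with the given role whose load_count is below the threshold."""
    return sorted(
        piece_id
        for piece_id, meta in metas.items()
        if meta.get("role") == role
        and meta.get("load_count", 0) < min_load_count
    )


# Hypothetical metadata, keyed by piece id.
metas = {
    "expert_a": {"role": "mlp_expert", "load_count": 120},
    "expert_b": {"role": "mlp_expert", "load_count": 3},
    "attn_04": {"role": "attention", "load_count": 0},
}

to_delete = select_prunable(metas, role="mlp_expert", min_load_count=10)
print(to_delete)  # ['expert_b']
```

Pieces with a different role are left alone even if they have never been loaded, which mirrors the `role=` filter in the call above.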

### 3. Executing True RSI (Evolutionary Looping)

```python
from neuronpack.rsi import Evaluator, Mutator, RSIController

# Create an Evaluator penalizing Architecture Size (C) against Performance (P)
evaluator = Evaluator(
    performance_fn=my_accuracy_metric_function,
    gamma=50.0 # Parameter bloat penalty
)

# Mutator can jump between different Archetypes entirely (Heterogeneous Search)
mutator = Mutator(piece_archetypes=[TinyRNN_Archetype, Transformer_Archetype])

controller = RSIController(pack, monolithic_base_model, evaluator, mutator)

# During training, autonomously evaluate permutations and safely mutate
for batch in train_loader:
    # 1. Spawn clones with weight noise, or entirely new primitives
    # 2. Evaluate fitness J(A')
    # 3. Swap safely into the container if Performance > Bloat Penalty
    # 4. Rollback and cleanup disk automatically if it fails!
    controller.run_generation() 
```
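
The accept-or-rollback decision in steps 2 to 4 can be sketched with a simple penalized objective. The form $J(A) = P(A) - \gamma \, C(A)$, with $C(A)$ taken as the parameter count divided by a budget, is an assumption made for illustration; the README does not state the exact formula used by `Evaluator`. All names and numbers below are hypothetical.

```python
def fitness(performance, param_count, gamma, param_budget):
    """Assumed objective: J(A) = P(A) - gamma * C(A), with C(A) = params / budget."""
    return performance - gamma * (param_count / param_budget)


def try_replace(incumbent, candidate, gamma, param_budget):
    """Keep the candidate only if its fitness beats the incumbent's.

    Returns (surviving_architecture, accepted_flag).
    """
    j_old = fitness(incumbent["accuracy"], incumbent["params"], gamma, param_budget)
    j_new = fitness(candidate["accuracy"], candidate["params"], gamma, param_budget)
    if j_new > j_old:
        return candidate, True
    return incumbent, False  # rollback: the incumbent stays in place


GAMMA = 50.0            # bloat penalty, as in the Evaluator example above
BUDGET = 10_000_000     # hypothetical parameter budget

incumbent = {"name": "mlp", "params": 100_000, "accuracy": 0.90}
bloated = {"name": "wide_mlp", "params": 200_000, "accuracy": 0.91}
compact = {"name": "tiny_rnn", "params": 50_000, "accuracy": 0.89}

survivor, accepted_bloated = try_replace(incumbent, bloated, GAMMA, BUDGET)
survivor, accepted_compact = try_replace(survivor, compact, GAMMA, BUDGET)
print(accepted_bloated, accepted_compact, survivor["name"])  # False True tiny_rnn
```

With this penalty, a one-point accuracy gain does not justify doubling the parameter count, while a smaller piece that loses one point is still accepted.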

---

## 📂 Layout

When you initialize a NeuronPack, it creates the following directory layout, which is meant to be readable from both Python and native implementations:

```
my_model.neuron/
├── manifest.json
├── router/
│   ├── embeddings.npy          # Numba optimized vector map
│   └── registry.json           # Vector ID hash matching
└── pieces/
    ├── attn_layer_04.meta        # JSON Interface Contract & Telemetry
    ├── attn_layer_04.kernel.pt2  # PyTorch AOT-compiled Binary
    └── attn_layer_04.weights     # Raw Isolated Safetensors
```
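
Given this layout, the files for any piece can be located from its id alone. The helper below is a small illustrative sketch using `pathlib`; `piece_paths` is a hypothetical name and not part of the NeuronPack API.

```python
from pathlib import Path


def piece_paths(pack_root, piece_id):
    """Map a piece id to the three files the layout keeps for it."""
    pieces = Path(pack_root) / "pieces"
    return {
        "meta": pieces / f"{piece_id}.meta",
        "kernel": pieces / f"{piece_id}.kernel.pt2",
        "weights": pieces / f"{piece_id}.weights",
    }


paths = piece_paths("my_model.neuron", "attn_layer_04")
print(paths["kernel"].as_posix())  # my_model.neuron/pieces/attn_layer_04.kernel.pt2
```

Since every path is derived from the piece id, swapping a piece never requires scanning or rewriting its neighbours.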

### Examples
Check out the `/examples` directory for advanced implementations:
*   `train_rsi_advanced.py`: True RSI dynamically swapping between Recurrent layers and Transformer Attention Mechanisms based on compute tradeoffs.
*   `train_rsi_diffusion.py`: Evolving sequential Denoising step architectures via the Mutator.
*   `benchmark_traditional.py`: Benchmark utilities vs traditional PyTorch paradigms.

---

### We Welcome Contributions
Since this is an experimental RSI architecture, feel free to open PRs for enhanced routing mechanisms, CUDA bindings, or more complex Mutator archetypes!
