Metadata-Version: 2.4
Name: isagellm-kv-cache
Version: 0.5.2.5
Summary: KV Cache Management Module for sageLLM
Author-email: IntelliStream Team <shuhao_zhang@hust.edu.cn>
License: Private
Project-URL: Homepage, https://github.com/intellistream/sagellm-kv-cache
Project-URL: Repository, https://github.com/intellistream/sagellm-kv-cache
Project-URL: Issues, https://github.com/intellistream/sagellm-kv-cache/issues
Keywords: llm,inference,kv-cache,domestic-hardware
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Requires-Python: ==3.11.*
Description-Content-Type: text/markdown
Requires-Dist: pydantic>=2.0.0
Requires-Dist: isagellm-protocol<0.6.0,>=0.5.2.0
Requires-Dist: isagellm-backend<0.6.0,>=0.5.2.13
Requires-Dist: isagellm-comm<0.6.0,>=0.5.2.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Requires-Dist: isage-pypi-publisher>=0.2.0; extra == "dev"
Requires-Dist: matplotlib>=3.5.0; extra == "dev"
Requires-Dist: numpy>=1.21.0; extra == "dev"

# sagellm-kv-cache

**KV Cache Management + KV Transfer** for the sageLLM inference engine.

[![CI](https://github.com/intellistream/sagellm-kv-cache/actions/workflows/ci.yml/badge.svg)](https://github.com/intellistream/sagellm-kv-cache/actions/workflows/ci.yml)
[![codecov](https://codecov.io/gh/intellistream/sagellm-kv-cache/branch/main/graph/badge.svg)](https://codecov.io/gh/intellistream/sagellm-kv-cache)
[![PyPI version](https://badge.fury.io/py/isagellm-kv-cache.svg)](https://badge.fury.io/py/isagellm-kv-cache)
[![Python 3.11](https://img.shields.io/badge/python-3.11-blue.svg)](https://www.python.org/downloads/)

## Overview

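This package manages the key-value (KV) cache for LLM inference: it allocates and frees cache blocks under a token budget, reuses cached prefixes across requests, and moves KV blocks between nodes.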

**Key Features**:

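- **Prefix Cache**: Block-hash based prefix caching for cross-request KV reuse.
- **KV Pool**: Block-based memory management with budget control.
- **Eviction**: Eviction policy management with LRU/FIFO strategies.
- **KV Transfer**: Primitives for cross-node KV block migration.
- **Observability**: Metrics and hooks for cache monitoring.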

### Architecture

```
┌─────────────────────────────────────────────────────────────────────┐
│                    sagellm-control-plane                            │
│              (Scheduling: alloc/free/migrate decisions)             │
└────────────────────────────┬────────────────────────────────────────┘
                             │ KVCacheInterface
                             ▼
┌─────────────────────────────────────────────────────────────────────┐
│                      sagellm-kv-cache (This Package)                 │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐ │
│  │ PrefixCache │  │  KV Pool    │  │  Eviction   │  │ KV Transfer │ │
│  │  (Task2.1)  │  │  (Task2.2)  │  │  (Task2.3)  │  │  (Task1.3)  │ │
│  └─────────────┘  └─────────────┘  └─────────────┘  └──────┬──────┘ │
└────────────────────────────────────────────────────────────┼────────┘
                             ┌───────────────────────────────┘
                             │ Use CommBackend for transport
                             ▼
┌─────────────────────────────────────────────────────────────────────┐
│                         sagellm-comm                                 │
│              (Network Layer: Topology, Collectives)                 │
└─────────────────────────────────────────────────────────────────────┘
```
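
The layering in the diagram can be sketched with structural typing. This is a minimal illustration, not the package's actual interfaces: `Transport`, `LoopbackTransport`, and `transfer_block` are hypothetical names that stand in for the role `CommBackend` plays above.

```python
from typing import Protocol


class Transport(Protocol):
    """Hypothetical stand-in for the transport role played by CommBackend."""

    def send(self, node: str, payload: bytes) -> None: ...
    def recv(self, node: str) -> bytes: ...


class LoopbackTransport:
    """In-memory transport so the sketch runs in a single CPU process."""

    def __init__(self) -> None:
        self._mailbox: dict[str, list[bytes]] = {}

    def send(self, node: str, payload: bytes) -> None:
        # Queue the payload in the destination node's mailbox.
        self._mailbox.setdefault(node, []).append(payload)

    def recv(self, node: str) -> bytes:
        # Deliver the oldest queued payload for this node.
        return self._mailbox[node].pop(0)


def transfer_block(transport: Transport, node: str, block: bytes) -> int:
    """Send one KV block; the cache layer only sees the Transport interface."""
    transport.send(node, block)
    return len(block)


transport = LoopbackTransport()
sent = transfer_block(transport, "node-1", b"\x00" * 64)
received = transport.recv("node-1")
print(sent, len(received))
```

Because the cache layer depends only on the interface, a loopback transport like this one can replace the network layer in CPU-only tests.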

## Installation

```bash
pip install isagellm-kv-cache
```

## Quick Start (CPU-first)

### KV Pool

```python
from sagellm_kv_cache.pool import KVPool

# Create a KV pool with budget control
pool = KVPool(max_tokens=1024)

# Allocate KV cache block
handle = pool.alloc(num_tokens=128, device="cpu")
print(f"Allocated handle: {handle.handle_id}, Tokens: {handle.num_tokens}")

# Free the handle
pool.free(handle)
```
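
To make "budget control" concrete, here is a toy token-budgeted pool. It is a sketch under stated assumptions, not the real `KVPool`: `ToyPool` and `ToyHandle` are hypothetical, and raising `MemoryError` on an over-budget request is an assumption, since this README does not say how `KVPool` reports that case.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class ToyHandle:
    handle_id: int
    num_tokens: int


class ToyPool:
    """Hypothetical token-budgeted pool; not the real KVPool."""

    def __init__(self, max_tokens: int) -> None:
        self.max_tokens = max_tokens
        self.used_tokens = 0
        self._live: dict[int, ToyHandle] = {}
        self._ids = count(1)

    def alloc(self, num_tokens: int) -> ToyHandle:
        # Reject requests that would push usage past the budget (assumed behavior).
        if self.used_tokens + num_tokens > self.max_tokens:
            raise MemoryError("token budget exceeded")
        handle = ToyHandle(next(self._ids), num_tokens)
        self._live[handle.handle_id] = handle
        self.used_tokens += num_tokens
        return handle

    def free(self, handle: ToyHandle) -> None:
        # Popping first makes a double free raise KeyError instead of
        # returning the same tokens to the budget twice.
        self._live.pop(handle.handle_id)
        self.used_tokens -= handle.num_tokens


pool = ToyPool(max_tokens=1024)
first = pool.alloc(num_tokens=896)
try:
    pool.alloc(num_tokens=256)  # 896 + 256 > 1024, so this is rejected
    over_budget = False
except MemoryError:
    over_budget = True
pool.free(first)
second = pool.alloc(num_tokens=256)  # fits once the first handle is freed
print(over_budget, pool.used_tokens)
```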

### Prefix Cache (Task 2.1)

```python
from sagellm_kv_cache import PrefixCache

# Create cache with block-based hashing
cache = PrefixCache(block_size=16, max_cached_blocks=100, enable_lru=True)

# Insert prefix blocks
tokens = list(range(48))  # 3 blocks
hashes = cache.compute_block_hashes(tokens)
blocks = [{"block_id": i} for i in range(len(hashes))]
cache.insert(hashes, blocks)

# Lookup with prefix overlap
hit_blocks, num_tokens = cache.lookup(hashes)
print(f"Reused {num_tokens} tokens from cache!")

# Check hit rate
stats = cache.get_stats()
print(f"Hit rate: {stats['hit_rate']:.1%}")
```

See [examples/prefix_cache_example.py](examples/prefix_cache_example.py) for comprehensive usage
examples.
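
One common way to implement block-hash prefix matching is to chain hashes, so that each block's hash also covers every block before it. The sketch below illustrates that idea only. Whether `PrefixCache.compute_block_hashes` chains hashes this way is an assumption, and `chained_block_hashes` and `shared_prefix_blocks` are hypothetical helpers.

```python
import hashlib


def chained_block_hashes(tokens: list[int], block_size: int) -> list[str]:
    """Hash each full block together with the hash of the block before it."""
    hashes: list[str] = []
    parent = ""
    full = len(tokens) - len(tokens) % block_size  # skip a trailing partial block
    for start in range(0, full, block_size):
        block = tokens[start:start + block_size]
        digest = hashlib.sha256(f"{parent}|{block}".encode()).hexdigest()
        hashes.append(digest)
        parent = digest
    return hashes


def shared_prefix_blocks(a: list[str], b: list[str]) -> int:
    """Count the leading blocks whose chained hashes match."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


cached = chained_block_hashes(list(range(48)), block_size=16)
# Same first 32 tokens, then a different continuation.
request = chained_block_hashes(list(range(32)) + [999] * 16, block_size=16)
reused_blocks = shared_prefix_blocks(cached, request)
print(f"Reusable: {reused_blocks} blocks, {reused_blocks * 16} tokens")
```

With chaining, two sequences that diverge at block *k* differ in every hash from *k* onward, so equal hashes at a given position imply an equal prefix up to that block.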

### KV Cache Access Pattern Profiling

```python
from sagellm_kv_cache.profiling import AccessStatsCollector

# Create statistics collector
collector = AccessStatsCollector()

# Record accesses during inference
collector.record_access("block_001", is_hit=True)
collector.record_access("block_002", is_hit=False)

# Export statistics to JSON
collector.export_stats("stats.json")

# Get summary
summary = collector.get_stats_summary()
print(f"Hit rate: {summary['hit_rate']:.2%}")
print(f"Total accesses: {summary['total_accesses']}")
```
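
For intuition about what the summary reports, the hit rate is the number of hits divided by the total number of accesses. The toy collector below shows that bookkeeping; `ToyAccessStats` is hypothetical and the real `AccessStatsCollector` may track more than this.

```python
import json
from collections import Counter


class ToyAccessStats:
    """Hypothetical collector; the real AccessStatsCollector may differ."""

    def __init__(self) -> None:
        self.hits: Counter = Counter()
        self.misses: Counter = Counter()

    def record_access(self, block_id: str, is_hit: bool) -> None:
        # Keep per-block counts so hot blocks can be identified later.
        (self.hits if is_hit else self.misses)[block_id] += 1

    def summary(self) -> dict:
        total_hits = sum(self.hits.values())
        total = total_hits + sum(self.misses.values())
        return {
            "total_accesses": total,
            "hit_rate": total_hits / total if total else 0.0,
        }


stats = ToyAccessStats()
for block_id, is_hit in [
    ("block_001", True),
    ("block_001", True),
    ("block_002", False),
    ("block_001", True),
]:
    stats.record_access(block_id, is_hit)

summary = stats.summary()
print(json.dumps(summary))
```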

**CLI Tool - Generate Demo Data**:

```bash
# Generate demo statistics
sage-kv-stats demo --num-accesses 1000 --output demo_stats.json

# Or use the Python script
python examples/kv_profiling_demo.py --num-accesses 500 --output stats.json
```

**CLI Tool - Visualize Results**:

```bash
# Generate heatmap
sage-kv-stats visualize --input stats.json --output heatmap.png

# Generate all visualizations with summary
sage-kv-stats visualize --input stats.json --type all --summary

# Or use the Python script
python scripts/visualize_access_pattern.py --input stats.json --type all --summary
```

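**Install visualization dependencies**: matplotlib is optional. The package metadata declares it only under the `dev` extra, so install that extra or install matplotlib directly:

```bash
pip install "isagellm-kv-cache[dev]"

# Or install only matplotlib
pip install "matplotlib>=3.5.0"
```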

## API Reference

### Core Components

- **`PrefixCache`** (`sagellm_kv_cache`): Block-hash based prefix caching for cross-request KV
  reuse. Supports LRU eviction, hit rate tracking, and handle invalidation. See Task 2.1.
- **`KVPool`** (`sagellm_kv_cache.pool`): Main entry point for memory management. Handles
  allocation, freeing, and budget enforcement.
- **`KVHandle`** (`sagellm_kv_cache`): A reference to an allocated KV cache region. Carries
  metadata such as `handle_id`, `num_tokens`, `dtype`, and `layout`.
- **`KVTransferEngine`** (`sagellm_kv_cache`): Handles moving KV blocks between nodes using
  `sagellm-comm`.
- **`EvictionManager`** (`sagellm_kv_cache`): Eviction policy management with LRU/FIFO strategies.
- **`SchedulerBridge`** (`sagellm_kv_cache`): Bridge between scheduler IR and KV pool operations.
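
The two eviction strategies named above differ only in whether an access refreshes a block's position. The sketch below shows that difference with an `OrderedDict`; `ToyEvictor` and its method names are hypothetical and do not reflect the `EvictionManager` API.

```python
from collections import OrderedDict


class ToyEvictor:
    """Hypothetical LRU/FIFO victim selection; not the real EvictionManager."""

    def __init__(self, policy: str = "lru") -> None:
        if policy not in ("lru", "fifo"):
            raise ValueError(f"unknown policy: {policy}")
        self.policy = policy
        self._order: OrderedDict[str, None] = OrderedDict()

    def on_insert(self, block_id: str) -> None:
        self._order[block_id] = None

    def on_access(self, block_id: str) -> None:
        # Only LRU refreshes recency; FIFO keeps pure insertion order.
        if self.policy == "lru" and block_id in self._order:
            self._order.move_to_end(block_id)

    def pick_victim(self) -> str:
        # The oldest entry sits at the front of the ordering.
        block_id, _ = self._order.popitem(last=False)
        return block_id


def run(policy: str) -> str:
    evictor = ToyEvictor(policy)
    for block_id in ("a", "b", "c"):
        evictor.on_insert(block_id)
    evictor.on_access("a")  # under LRU, "a" becomes the most recently used
    return evictor.pick_victim()


lru_victim = run("lru")
fifo_victim = run("fifo")
print(lru_victim, fifo_victim)
```

After the same access sequence, LRU evicts the block that has gone unused the longest, while FIFO evicts the block that was inserted first regardless of later accesses.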

### Dependencies

- `isagellm-protocol`: Common data structures and protocol definitions.
- `isagellm-backend`: Backend abstraction.
- `isagellm-comm`: Communication layer for transfer.

## Development

1. **Install dev dependencies**:
   ```bash
   pip install -e .[dev]
   ```
1. **Run tests**:
   ```bash
   pytest
   ```
1. **Linting**:
   ```bash
   ruff check .
   ```

## Version

Current version: 0.5.2.5. See [CHANGELOG.md](CHANGELOG.md) for history.
