Metadata-Version: 2.4
Name: isagellm-kv-cache
Version: 0.4.1.2
Summary: KV Cache Management Module for sageLLM
Author-email: IntelliStream Team <shuhao_zhang@hust.edu.cn>
License: Private
Project-URL: Homepage, https://github.com/intellistream/sagellm-kv-cache
Project-URL: Repository, https://github.com/intellistream/sagellm-kv-cache
Project-URL: Issues, https://github.com/intellistream/sagellm-kv-cache/issues
Keywords: llm,inference,kv-cache,domestic-hardware
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Requires-Python: ==3.11.*
Description-Content-Type: text/markdown
Requires-Dist: pydantic>=2.0.0
Requires-Dist: isagellm-protocol<0.6.0,>=0.5.0.0
Requires-Dist: isagellm-backend<0.5.0,>=0.4.0.0
Requires-Dist: isagellm-comm<0.5.0,>=0.4.0.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Requires-Dist: isage-pypi-publisher>=0.2.0; extra == "dev"

# sagellm-kv-cache

**KV Cache Management + KV Transfer** for the sageLLM inference engine.

[![CI](https://github.com/intellistream/sagellm-kv-cache/actions/workflows/ci.yml/badge.svg)](https://github.com/intellistream/sagellm-kv-cache/actions/workflows/ci.yml)
[![codecov](https://codecov.io/gh/intellistream/sagellm-kv-cache/branch/main/graph/badge.svg)](https://codecov.io/gh/intellistream/sagellm-kv-cache)
[![PyPI version](https://badge.fury.io/py/isagellm-kv-cache.svg)](https://badge.fury.io/py/isagellm-kv-cache)
[![Python 3.11](https://img.shields.io/badge/python-3.11-blue.svg)](https://www.python.org/downloads/)

## Overview

This package manages the key-value (KV) cache used during LLM inference and transfers KV blocks between nodes.

**Key Features**:
- **KV Pool**: Block-based memory management with budget control.
- **KV Transfer**: Primitives for cross-node KV block migration.
- **Observability**: Metrics and hooks for cache monitoring.
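
To make "block-based memory management with budget control" concrete, here is a minimal, self-contained sketch. It is not the `KVPool` implementation: `ToyBlockPool`, `ToyHandle`, the `block_size` parameter, and the choice of `MemoryError` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class ToyHandle:
    """Hypothetical stand-in for a KV handle: an ID plus the blocks it owns."""
    handle_id: int
    num_tokens: int
    block_ids: tuple[int, ...]


class ToyBlockPool:
    """Illustrative block pool; NOT the sagellm_kv_cache.pool.KVPool code."""

    def __init__(self, max_tokens: int, block_size: int = 16) -> None:
        # The token budget is rounded down to a whole number of blocks.
        self.block_size = block_size
        self._free_blocks = list(range(max_tokens // block_size))
        self._live: dict[int, ToyHandle] = {}
        self._ids = count(1)

    def alloc(self, num_tokens: int) -> ToyHandle:
        if num_tokens <= 0:
            raise ValueError("num_tokens must be positive")
        # Round the request up to whole blocks (ceiling division).
        needed = -(-num_tokens // self.block_size)
        if needed > len(self._free_blocks):
            raise MemoryError(
                f"budget exceeded: need {needed} blocks, "
                f"{len(self._free_blocks)} free"
            )
        blocks = tuple(self._free_blocks.pop() for _ in range(needed))
        handle = ToyHandle(next(self._ids), num_tokens, blocks)
        self._live[handle.handle_id] = handle
        return handle

    def free(self, handle: ToyHandle) -> None:
        # Return the blocks to the free list; a double free raises KeyError.
        owned = self._live.pop(handle.handle_id)
        self._free_blocks.extend(owned.block_ids)

    @property
    def free_tokens(self) -> int:
        return len(self._free_blocks) * self.block_size
```

Allocating in fixed-size blocks means a request for 20 tokens with 16-token blocks consumes two blocks, so the budget is enforced per block rather than per token.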

### Architecture

```
┌─────────────────────────────────────────────────────────────────────┐
│                    sagellm-control-plane                            │
│              (Scheduling: alloc/free/migrate decisions)             │
└────────────────────────────┬────────────────────────────────────────┘
                             │ KVCacheInterface
                             ▼
┌─────────────────────────────────────────────────────────────────────┐
│                      sagellm-kv-cache (This Package)                │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐ │
│  │ PrefixCache │  │  KV Pool    │  │  Eviction   │  │ KV Transfer │ │
│  │  (Task2.1)  │  │  (Task2.2)  │  │  (Task2.3)  │  │  (Task1.3)  │ │
│  └─────────────┘  └─────────────┘  └─────────────┘  └──────┬──────┘ │
└────────────────────────────────────────────────────────────┼────────┘
                             ┌───────────────────────────────┘
                             │ Use CommBackend for transport
                             ▼
┌─────────────────────────────────────────────────────────────────────┐
│                         sagellm-comm                                │
│              (Network Layer: Topology, Collectives)                 │
└─────────────────────────────────────────────────────────────────────┘
```
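
The layering in the diagram, where the transfer component depends on a transport supplied by the communication layer, can be sketched with dependency injection. Everything below is hypothetical: `Transport`, `LoopbackTransport`, and `migrate_blocks` are illustrative names, not the `CommBackend` or `KVTransferEngine` APIs.

```python
from typing import Protocol


class Transport(Protocol):
    """Hypothetical stand-in for the transport role played by the comm layer."""

    def send(self, dst: str, payload: bytes) -> None: ...


class LoopbackTransport:
    """In-memory transport used only for this example."""

    def __init__(self) -> None:
        self.delivered: dict[str, list[bytes]] = {}

    def send(self, dst: str, payload: bytes) -> None:
        self.delivered.setdefault(dst, []).append(payload)


def migrate_blocks(blocks: list[bytes], dst: str, transport: Transport) -> int:
    """Send each block to dst through the injected transport; return bytes sent."""
    sent = 0
    for block in blocks:
        transport.send(dst, block)
        sent += len(block)
    return sent
```

Because the transfer logic only sees the `Transport` protocol, it can be exercised on a CPU-only machine with an in-memory transport and wired to a real network backend elsewhere.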

## Installation

```bash
pip install isagellm-kv-cache
```
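
The package metadata declares `Requires-Python: ==3.11.*`, so pip refuses to install it on other interpreter versions. A small pre-install check, where `supports_package` is a hypothetical helper name, might look like this:

```python
import sys


def supports_package(version_info=sys.version_info) -> bool:
    """Return True when the interpreter matches the declared ==3.11.* constraint."""
    return tuple(version_info[:2]) == (3, 11)


if __name__ == "__main__":
    status = "ok" if supports_package() else "unsupported"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {status}")
```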

## Quick Start (CPU-first)

```python
from sagellm_kv_cache.pool import KVPool

# Create a KV pool with budget control
pool = KVPool(max_tokens=1024)

# Allocate KV cache block
handle = pool.alloc(num_tokens=128, device="cpu")
print(f"Allocated handle: {handle.handle_id}, Tokens: {handle.num_tokens}")

# Free the handle
pool.free(handle)
```

## API Reference

### Core Components

- **`KVPool`** (`sagellm_kv_cache.pool`): Main entry point for memory management. Handles allocation, freeing, and budget enforcement.
- **`KVHandle`** (`sagellm_kv_cache`): A reference to an allocated region of KV cache. Carries metadata such as `handle_id`, `dtype`, and `layout`.
- **`KVTransferEngine`** (`sagellm_kv_cache`): Moves KV blocks between nodes using `sagellm-comm`.

### Dependencies

- `isagellm-protocol`: Common data structures and protocol definitions.
- `isagellm-backend`: Backend abstraction.
- `isagellm-comm`: Communication layer for transfer.
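
To see which versions of these dependencies are present in the current environment, the standard library's `importlib.metadata` is enough. The helper name `installed_versions` is an assumption for this sketch; it is not part of the package.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

DEPENDENCIES = ("isagellm-protocol", "isagellm-backend", "isagellm-comm")


def installed_versions(names=DEPENDENCIES) -> dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    found: dict[str, Optional[str]] = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```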

## Development

1. **Install dev dependencies**:
   ```bash
   pip install -e ".[dev]"
   ```
2. **Run tests**:
   ```bash
   pytest
   ```
3. **Linting**:
   ```bash
   ruff check .
   ```

## Version

Current version: 0.4.1.2

See [CHANGELOG.md](CHANGELOG.md) for history.
