Metadata-Version: 2.4
Name: isagellm-compression
Version: 0.5.2.3
Summary: Model Compression & Acceleration Module for sageLLM
Author-email: IntelliStream Team <shuhao_zhang@hust.edu.cn>
License: Private
Project-URL: Homepage, https://github.com/intellistream/sagellm-compression
Project-URL: Repository, https://github.com/intellistream/sagellm-compression
Project-URL: Issues, https://github.com/intellistream/sagellm-compression/issues
Keywords: llm,inference,quantization,sparsity,compression,domestic-hardware
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Requires-Python: ==3.11.*
Description-Content-Type: text/markdown
Requires-Dist: isagellm-protocol<0.6.0,>=0.5.2.0
Requires-Dist: isagellm-backend<0.6.0,>=0.5.2.13
Requires-Dist: pydantic>=2.0.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Requires-Dist: isage-pypi-publisher>=0.2.0; extra == "dev"

# sagellm-compression

Model Compression & Acceleration Module for SageLLM (Task 3).

[![CI](https://github.com/intellistream/sagellm-compression/actions/workflows/ci.yml/badge.svg)](https://github.com/intellistream/sagellm-compression/actions/workflows/ci.yml)
[![PyPI version](https://badge.fury.io/py/isagellm-compression.svg)](https://badge.fury.io/py/isagellm-compression)
[![Python 3.11](https://img.shields.io/badge/python-3.11-blue.svg)](https://www.python.org/downloads/)
[![codecov](https://codecov.io/gh/intellistream/sagellm-compression/branch/main/graph/badge.svg)](https://codecov.io/gh/intellistream/sagellm-compression)

## 📌 Architecture & Responsibility

`sagellm-compression` is the core module responsible for model compression (quantization, sparsity) and inference acceleration strategies (speculative decoding, CoT optimizations).

### Dependency Graph

```mermaid
graph TD
    Protocol[isagellm-protocol] --> Backend[isagellm-backend]
    Protocol --> Compression[isagellm-compression]
    Backend --> Compression
    Compression --> Core[isagellm-core]
    Compression --> KVCache[isagellm-kv-cache]
```

- **Depends on**:
  - `isagellm-protocol`: Shared schemas and definitions.
  - `isagellm-backend`: Hardware acceleration kernels (for Quantization).
- **Used by**:
  - `isagellm-core`: Main inference engine (integrates acceleration strategies).
  - `isagellm-kv-cache`: Uses compression techniques for KV storage.
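The dependency graph above also implies an ordering: a change has to land in a package before anything downstream of it can pick it up, which is what the Protocol-First principle relies on. The following is a minimal sketch using the standard-library `graphlib` module. `DEPENDENCIES` and `release_order` are illustrative names, and the mapping encodes only the edges drawn in the diagram. The real packages may have additional dependencies.

```python
from graphlib import TopologicalSorter

# Each key maps a package to the packages it depends on (its predecessors),
# mirroring only the arrows drawn in the dependency graph.
DEPENDENCIES: dict[str, set[str]] = {
    "isagellm-protocol": set(),
    "isagellm-backend": {"isagellm-protocol"},
    "isagellm-compression": {"isagellm-protocol", "isagellm-backend"},
    "isagellm-core": {"isagellm-compression"},
    "isagellm-kv-cache": {"isagellm-compression"},
}


def release_order(deps: dict[str, set[str]]) -> list[str]:
    """Return an order in which every package appears after all of its dependencies."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for name in release_order(DEPENDENCIES):
        print(name)
```

`isagellm-protocol` always comes first and `isagellm-backend` always precedes `isagellm-compression`. The relative order of `isagellm-core` and `isagellm-kv-cache` is not constrained, because neither depends on the other.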

## ✨ Features

- **Chain-of-Thought (CoT)**: Template management and prompt engineering strategies (Zero-shot, Few-shot, Self-Consistency).
- **Quantization** (Planned Task 3.1): INT8/INT4 weight and activation quantization.
- **Sparsity** (Planned Task 3.2): Structured and unstructured pruning support.
- **Speculative Decoding** (Planned Task 3.3): Draft model orchestration.
- **Kernel Fusion** (Planned Task 3.4): Operator fusion optimization.

## 📦 Installation

```bash
pip install isagellm-compression
```

### Requirements
- Python 3.11 (`Requires-Python: ==3.11.*`)
- `isagellm-protocol >= 0.5.2.0, < 0.6.0`
- `isagellm-backend >= 0.5.2.13, < 0.6.0`
- `pydantic >= 2.0.0`
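Installation problems are usually version mismatches. This script checks the running interpreter and the installed dependencies against the constraints declared in this package's metadata (`Requires-Python: ==3.11.*` and the `Requires-Dist` ranges). It is a minimal sketch, not a tool shipped with the package. `REQUIRED`, `parse`, and `check_environment` are illustrative names, and `parse` is a deliberately naive numeric parser. For pre-release, post-release, or local versions, use a full PEP 440 implementation such as `packaging.version` instead.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Ranges copied from this package's Requires-Dist: (minimum, exclusive maximum).
REQUIRED = {
    "isagellm-protocol": ("0.5.2.0", "0.6.0"),
    "isagellm-backend": ("0.5.2.13", "0.6.0"),
}


def parse(raw: str) -> tuple[int, ...]:
    """Naive parser for purely numeric dotted versions; pads to 4 parts so 0.6.0 == 0.6.0.0."""
    parts = [int(p) for p in raw.split(".")]
    return tuple(parts + [0] * (4 - len(parts)))


def check_environment() -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems: list[str] = []
    # Requires-Python is ==3.11.*
    if sys.version_info[:2] != (3, 11):
        problems.append(f"Python 3.11 required, found {sys.version.split()[0]}")
    for name, (low, high) in REQUIRED.items():
        try:
            raw = version(name)
        except PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        try:
            installed = parse(raw)
        except ValueError:
            problems.append(f"{name} {raw!r} is not a plain numeric version; check it manually")
            continue
        if not parse(low) <= installed < parse(high):
            problems.append(f"{name} {raw} is outside >={low},<{high}")
    return problems


if __name__ == "__main__":
    for problem in check_environment() or ["environment looks compatible"]:
        print(problem)
```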

## 🚀 Quick Start

### Chain-of-Thought (CoT) Templates

Currently, the CoT module provides template management for reasoning tasks.

```python
from sagellm_compression.cot import CoTTemplateManager

# Initialize the manager
# Note: Ensure you have the templates directory available
manager = CoTTemplateManager(template_dir="templates/cot")

# Load a specific strategy template (e.g., zero-shot reasoning)
try:
    template_content = manager.load_template("zero_shot")
    # Render the prompt with a question
    prompt = manager.render(template_content, question="What is the result of 25 * 14?")
    print("--- Generated Prompt ---")
    print(prompt)
except FileNotFoundError:
    print("Template not found. Please ensure 'templates/cot' exists.")
```
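The snippet above needs the template files on disk. To preview what a rendered zero-shot prompt looks like without them, this self-contained stand-in uses the standard-library `string.Template`. The template wording and `$placeholder` syntax are assumptions, and the real files under `templates/cot` may differ. `TEMPLATES` and `render` are hypothetical and are not the `CoTTemplateManager` API. The zero-shot text uses the widely cited "Let's think step by step." trigger phrase.

```python
from string import Template

# Hypothetical in-memory templates; the real files under templates/cot may differ.
TEMPLATES = {
    "zero_shot": Template("Q: $question\nA: Let's think step by step."),
    "few_shot": Template("$examples\n\nQ: $question\nA:"),
}


def render(strategy: str, **fields: str) -> str:
    """Fill a template; Template.substitute raises KeyError if a placeholder is missing."""
    try:
        template = TEMPLATES[strategy]
    except KeyError:
        raise ValueError(f"unknown CoT strategy: {strategy!r}") from None
    return template.substitute(**fields)


print(render("zero_shot", question="What is the result of 25 * 14?"))
```

`substitute` is used rather than `safe_substitute` so that a missing field raises an error instead of leaving `$examples` in the prompt, in keeping with the Fail-Fast principle.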

## 🛠️ Development

### Setup

```bash
git clone git@github.com:intellistream/sagellm-compression.git
cd sagellm-compression
./quickstart.sh

# Install in editable mode with dev dependencies
pip install -e ".[dev]"
```

### Testing & Linting

```bash
# Run all tests
pytest -v

# Lint, then auto-format
ruff check .
ruff format .
```

### Core Principles
- **Protocol-First**: Changes involving schemas must update `isagellm-protocol` first.
- **CPU-First**: All compression logic must provide a CPU reference implementation and use it by default.
- **Fail-Fast**: Missing configurations must raise explicit errors.
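A configuration object can enforce the CPU-First and Fail-Fast principles directly. Here is a sketch under stated assumptions: `QuantConfig` and its fields are hypothetical and do not correspond to an existing class in this package. The package already depends on `pydantic`, and a `BaseModel` would give similar validation. A stdlib `dataclass` is used here only to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class QuantConfig:
    """Hypothetical config illustrating the CPU-First and Fail-Fast principles."""

    bits: int  # required: no silent default for the setting that matters most
    device: str = "cpu"  # CPU-First: the CPU reference path unless told otherwise

    def __post_init__(self) -> None:
        # Fail-Fast: reject values the planned INT8/INT4 paths would not understand.
        if self.bits not in (4, 8):
            raise ValueError(f"bits must be 4 or 8, got {self.bits!r}")

    @classmethod
    def from_dict(cls, raw: dict[str, Any]) -> "QuantConfig":
        # Fail-Fast: a missing key is an explicit error, not a guessed default.
        if "bits" not in raw:
            raise ValueError("missing required config key 'bits'")
        return cls(bits=raw["bits"], device=raw.get("device", "cpu"))


print(QuantConfig.from_dict({"bits": 8}))  # QuantConfig(bits=8, device='cpu')
```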

## 📚 Documentation

- [Protocol Specification (v0.1)](https://github.com/intellistream/sagellm-docs/blob/main/docs/specs/protocol_v0.1.md)
- [Team & Roles](docs/TEAM.md)
- [Related Papers](docs/RELATED_PAPERS_SUMMARY.md)

## 🔄 Contributing

Please follow this workflow:

1. **Create an Issue** - describe the problem or requirement
   ```bash
   gh issue create --title "[Bug] Description" --label "bug,sagellm-compression"
   ```

2. **Develop the fix** - work on a local `fix/#123-xxx` branch
   ```bash
   git checkout -b fix/#123-xxx origin/main-dev
   # Develop, test...
   pytest -v
   ruff format . && ruff check . --fix
   ```

3. **Open a PR** - target the `main-dev` branch
   ```bash
   gh pr create --base main-dev --title "Fix: Description" --body "Closes #123"
   ```

4. **Merge** - merge into `main-dev` after approval

For more details, see [.github/copilot-instructions.md](.github/copilot-instructions.md)

## 📅 Versioning & Changelog

Current Version: **0.5.2.3**

See [CHANGELOG.md](CHANGELOG.md) for full history.

## License

Private - IntelliStream Research Project
