Metadata-Version: 2.4
Name: compactllm
Version: 0.1.1
Summary: Compact AI, any device. No API key. No cost.
Author-email: CompactLLM <hello@compactllm.dev>
License: Apache-2.0
Project-URL: Homepage, https://getcompactllm.com
Project-URL: Repository, https://github.com/compactllm-dev/compactllm
Project-URL: Documentation, https://docs.getcompactllm.com
Project-URL: Bug Tracker, https://github.com/compactllm-dev/compactllm/issues
Project-URL: Changelog, https://github.com/compactllm-dev/compactllm/blob/main/CHANGELOG.md
Keywords: ai,llm,local,free,offline,huggingface,models,compact,no-api-key,on-device,privacy,open-source
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Education
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Operating System :: OS Independent
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: psutil>=5.9
Requires-Dist: huggingface_hub>=0.20
Requires-Dist: transformers>=4.40
Requires-Dist: torch>=2.0
Provides-Extra: gpu
Requires-Dist: bitsandbytes>=0.41; extra == "gpu"
Provides-Extra: gguf
Requires-Dist: llama-cpp-python>=0.2; extra == "gguf"
Provides-Extra: pretty
Requires-Dist: rich>=13.0; extra == "pretty"
Provides-Extra: full
Requires-Dist: bitsandbytes>=0.41; extra == "full"
Requires-Dist: llama-cpp-python>=0.2; extra == "full"
Requires-Dist: rich>=13.0; extra == "full"
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: black>=23.0; extra == "dev"
Requires-Dist: ruff>=0.1; extra == "dev"
Requires-Dist: mypy>=1.0; extra == "dev"
Requires-Dist: build; extra == "dev"
Requires-Dist: twine; extra == "dev"

# 🤖 CompactLLM

> **Compact AI, any device. No API key. No cost.**

[![PyPI version](https://img.shields.io/pypi/v/compactllm.svg)](https://pypi.org/project/compactllm/)
[![Python](https://img.shields.io/pypi/pyversions/compactllm.svg)](https://pypi.org/project/compactllm/)
[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](LICENSE)

CompactLLM lets you run powerful AI models **locally on any device** — from an old laptop to a gaming PC. No API keys. No subscriptions. No internet required after the first download. Just AI that runs anywhere.

```python
from compactllm import Model

model = Model('smollm2')
print(model.ask('What is AI?'))
# → "AI, or Artificial Intelligence, is the simulation of human intelligence..."
```

That's it. CompactLLM downloads the model on first run and caches it on disk. Later runs load from the cache, so they skip the download and start much faster.

---

## ✨ Why CompactLLM?

| | CompactLLM | Hosted AI APIs |
|---|---|---|
| API key needed | ❌ Never | ✅ Always |
| Works offline | ✅ Yes | ❌ No |
| Auto hardware detection | ✅ Yes | ❌ No |
| Shows RAM requirements | ✅ Yes | ❌ No |
| Beginner-friendly | ✅ Yes | ⚠️ Sometimes |
| Free forever | ✅ Yes | ⚠️ Free tier limits |

---

## 🚀 Installation

```bash
pip install compactllm
```

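**CPU-only install** (lighter, no CUDA dependencies). Install the CPU build of PyTorch first; otherwise pip pulls in the default build as a dependency and the second command has nothing left to do:
```bash
pip install torch --index-url https://download.pytorch.org/whl/cpu
pip install compactllm
```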

**With GPU acceleration:**
```bash
pip install "compactllm[gpu]"
```

**Everything included:**
```bash
pip install "compactllm[full]"
```

---

## ⚡ Quick Start

### 1. Check what your machine can run

```python
from compactllm import detect_hardware, list_models

# See your hardware
hw = detect_hardware()
print(hw)
# {
#   'ram_gb': 16.0,
#   'cpu_name': 'Intel Core i7-12700H',
#   'has_gpu': False,
#   'tier': 'mid',
#   ...
# }

# List models your machine can run right now
models = list_models()
for m in models:
    print(f"{m['name']:<30} ~{m['min_ram_gb']} GB RAM  |  {m['best_for']}")
```

### 2. Load and run a model

```python
from compactllm import Model

model = Model('smollm2')
response = model.ask('Explain machine learning in simple terms.')
print(response)
```

### 3. Let CompactLLM pick for you

```python
from compactllm import Model

# Automatically selects the best model your hardware can run
model = Model.auto()
print(model.ask('Hello!'))

# Auto-select for a specific task
model = Model.auto(task='coding')
print(model.ask('Write a Python function to reverse a string.'))
```

---

## 📋 Model Catalog

CompactLLM organises models into **4 tiers** based on RAM requirements. Use `list_models()` to see what's available for your machine.

### 🥔 Potato Tier — Under 4 GB RAM
*Works on old laptops, budget PCs, Raspberry Pi 5*

| Model ID | Name | RAM | Best For | License |
|---|---|---|---|---|
| `smollm2-tiny` | SmolLM2 135M | ~0.3 GB | Ultra-fast, testing | Apache 2.0 |
| `qwen2.5-0.5b` | Qwen 2.5 0.5B | ~0.6 GB | Tiny multilingual tasks | Apache 2.0 |
| `tinyllama` | TinyLlama 1.1B | ~1 GB | General chat | Apache 2.0 |
| `gemma3-1b` | Gemma 3 1B | ~1 GB | 128K context, 140 languages | Gemma ToS |
| `deepseek-r1-1.5b` | DeepSeek R1 1.5B | ~1.5 GB | Math & reasoning | MIT |

### 💻 Basic Tier — 4–8 GB RAM
*Works on most laptops from 2018+*

| Model ID | Name | RAM | Best For | License |
|---|---|---|---|---|
| `smollm2` ⭐ | SmolLM2 1.7B | ~2 GB | Best quality <2B. General use | Apache 2.0 |
| `qwen2.5-1.5b` | Qwen 2.5 1.5B | ~1.8 GB | Multilingual, long docs | Apache 2.0 |
| `llama3.2-1b` | Llama 3.2 1B | ~1.2 GB | Meta's edge model | Llama 3.2 |
| `exaone-2.4b` | EXAONE 3.5 2.4B | ~2.5 GB | Reasoning + EN/KO | Apache 2.0 |
| `phi3.5-mini` | Phi-3.5 Mini | ~3.5 GB | Coding & reasoning | MIT |

### 🖥️ Mid Tier — 8–16 GB RAM
*Works on good laptops and most desktops*

| Model ID | Name | RAM | Best For | License |
|---|---|---|---|---|
| `llama3.2-3b` ⭐ | Llama 3.2 3B | ~3.5 GB | Best 3B overall | Llama 3.2 |
| `qwen2.5-coder-3b` | Qwen 2.5 Coder 3B | ~3 GB | Code generation | Apache 2.0 |
| `gemma3-4b` | Gemma 3 4B | ~4 GB | Chat + image understanding | Gemma ToS |
| `mistral-7b` ⭐ | Mistral 7B | ~5 GB | Fast, general purpose | Apache 2.0 |

### 🚀 High Tier — 16 GB+ RAM
*Workstations and gaming PCs*

| Model ID | Name | RAM | Best For | License |
|---|---|---|---|---|
| `phi4` | Phi-4 14B | ~9 GB | Best coding + reasoning | MIT |
| `llama3.1-8b` | Llama 3.1 8B | ~6 GB | High quality chat | Llama 3.1 |

⭐ = Recommended starting point for that tier
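
The tier headings above can be read as a simple mapping from system RAM to a tier name. The sketch below is only an illustration of that mapping, not CompactLLM's own detection code: `classify_tier` is a hypothetical helper, the `'high'` tier name is inferred from the heading, and the exact treatment of boundary values (for example, whether exactly 16 GB counts as mid or high) is an assumption.

```python
def classify_tier(ram_gb: float) -> str:
    """Map system RAM in GB to a tier name, following the catalog headings.

    Assumption: each tier is a half-open range, so 4 GB is 'basic',
    8 GB is 'mid' and 16 GB is 'high'. The library may decide otherwise.
    """
    if ram_gb < 0:
        raise ValueError("ram_gb must be non-negative")
    if ram_gb < 4:
        return "potato"
    if ram_gb < 8:
        return "basic"
    if ram_gb < 16:
        return "mid"
    return "high"


for ram in (2, 6, 12, 32):
    print(f"{ram:>2} GB -> {classify_tier(ram)}")
```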

---

## 💡 Usage Examples

### Simple Q&A

```python
from compactllm import Model

model = Model('smollm2')
print(model.ask('What is the difference between RAM and ROM?'))
```

### Streaming Output

```python
from compactllm import Model

model = Model('mistral-7b')

for chunk in model.stream('Write a short story about a robot learning to paint'):
    print(chunk, end='', flush=True)

print()  # final newline
```
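
If you also need the complete text once streaming finishes, you can collect the chunks while printing them. This is a minimal sketch that assumes `model.stream()` yields plain string chunks, as the loop above suggests. `fake_stream` is a hypothetical stand-in so the example runs without downloading a model, and `print_and_collect` is an illustrative helper, not part of CompactLLM.

```python
from typing import Iterable, Iterator


def fake_stream(text: str, size: int = 4) -> Iterator[str]:
    """Hypothetical stand-in for model.stream(): yields the text in small chunks."""
    for start in range(0, len(text), size):
        yield text[start:start + size]


def print_and_collect(chunks: Iterable[str]) -> str:
    """Print each chunk as it arrives and return the full concatenated text."""
    parts = []
    for chunk in chunks:
        print(chunk, end="", flush=True)
        parts.append(chunk)
    print()  # final newline
    return "".join(parts)


full_text = print_and_collect(fake_stream("The robot picked up a brush."))
```

With a real model, the same helper would presumably be called as `print_and_collect(model.stream(prompt))`.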

### Custom System Prompt

```python
from compactllm import Model

model = Model('smollm2')

response = model.ask(
    'What should I eat for breakfast?',
    system='You are a professional nutritionist. Give concise, practical advice.'
)
print(response)
```

### Deterministic Output

```python
from compactllm import Model
model = Model('smollm2')
# temperature=0.0 gives the same output every time (good for testing)
response = model.ask('Name 3 planets.', temperature=0.0)
```
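
Why a temperature of zero is repeatable: a common convention is to treat it as greedy decoding, where the highest-scoring token is always chosen and no random sampling takes place. The sketch below illustrates that general technique on a toy list of scores. It is not CompactLLM's implementation, and `sample_token` is a hypothetical helper.

```python
import math
import random


def sample_token(logits: list, temperature: float, rng: random.Random) -> int:
    """Pick a token index from raw scores (logits).

    temperature == 0 is treated as greedy decoding: always the top score.
    Higher temperatures flatten the distribution and add randomness.
    """
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


logits = [2.0, 1.0, 0.5]
rng = random.Random()  # deliberately unseeded

# Greedy decoding ignores the random generator entirely.
greedy_picks = {sample_token(logits, 0.0, rng) for _ in range(100)}
print(greedy_picks)  # → {0}
```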

### Interactive Chat

```python
from compactllm import Model

model = Model('smollm2')
model.chat()
# Starts an interactive terminal chat session
# Type 'help' for commands, 'quit' to exit
```

### Browse and Compare Models

```python
from compactllm import list_models

# Models your machine can run
my_models = list_models()

# Only coding models
coding = list_models(task='coding')

# Only potato-tier models
tiny = list_models(tier='potato')

# Everything available
all_models = list_models(show_all=True)

# Print a summary
for m in my_models:
    print(f"{m['id']:<20} {m['params']:<8} ~{m['min_ram_gb']}GB  {m['best_for']}")
```

### Get Model Details

```python
from compactllm import get_model_info

info = get_model_info('mistral-7b')
print(info['name'])        # Mistral 7B Instruct
print(info['min_ram_gb'])  # 5.0
print(info['license'])     # Apache 2.0
print(info['tags'])        # ['chat', 'general', 'fast', 'recommended']
```

---

## 🖥️ Command Line Interface

After installing CompactLLM, the `compact` command is available in your terminal.

```bash
# Check your hardware
compact hardware

# List models for your machine
compact list

# List all models regardless of hardware
compact list --all

# Filter by tier
compact list --tier basic
compact list --tier potato

# Filter by task
compact list --task coding
compact list --task reasoning

# Ask a one-shot question
compact ask smollm2 "What is recursion?"
compact ask mistral-7b "Explain quantum computing simply"

# Start an interactive chat
compact chat smollm2
compact chat mistral-7b

# Get full details about a model
compact info smollm2
compact info mistral-7b

# Check version
compact version
```

---

## 🔧 How It Works

```
pip install compactllm
        ↓
from compactllm import Model
        ↓
Model('smollm2')           ← Resolves model ID, checks your RAM
        ↓
model.ask('Hello!')        ← Downloads model on first call (cached forever)
        ↓
Auto-detects CPU/GPU       ← Picks best device automatically
        ↓
Returns response string    ← Clean, no boilerplate
```

**Model caching:** Models are downloaded to `~/.cache/compactllm/` on first use. After that, they load from disk — no internet needed.
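
To see how much disk space cached models take up, you can total the file sizes under the cache directory. This is a minimal sketch using only the standard library. `cache_size_gb` is a hypothetical helper, and nothing is assumed about the layout inside the directory beyond the path given above.

```python
from pathlib import Path


def cache_size_gb(cache_dir: Path) -> float:
    """Return the total size of all files under cache_dir in GB (0.0 if missing)."""
    if not cache_dir.is_dir():
        return 0.0
    total_bytes = sum(p.stat().st_size for p in cache_dir.rglob("*") if p.is_file())
    return total_bytes / 1024**3


# Default location named in this README.
default_cache = Path.home() / ".cache" / "compactllm"
print(f"{default_cache}: {cache_size_gb(default_cache):.2f} GB")
```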

**Hardware detection:** CompactLLM checks your RAM and GPU automatically. If you try to load a model that's too big, it warns you and suggests a better option.
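
The "warn and suggest" behaviour can be pictured as a comparison between a model's RAM estimate and the memory you have. The sketch below is an assumption-laden illustration, not the library's logic: `CATALOG_RAM_GB` is a hand-copied excerpt of the catalog tables, `check_fit` and `largest_fitting` are hypothetical helpers, and "suggest the largest model that fits" is just one plausible policy.

```python
from typing import Optional

# Excerpt of the catalog tables in this README: model ID -> approximate RAM in GB.
CATALOG_RAM_GB = {
    "smollm2-tiny": 0.3,
    "tinyllama": 1.0,
    "smollm2": 2.0,
    "llama3.2-3b": 3.5,
    "mistral-7b": 5.0,
    "phi4": 9.0,
}


def largest_fitting(available_gb: float) -> Optional[str]:
    """Return the ID of the most RAM-hungry model that still fits, or None."""
    fitting = {m: ram for m, ram in CATALOG_RAM_GB.items() if ram <= available_gb}
    return max(fitting, key=fitting.get) if fitting else None


def check_fit(model_id: str, available_gb: float) -> str:
    """Describe whether model_id fits, suggesting an alternative if it does not."""
    needed = CATALOG_RAM_GB[model_id]
    if needed <= available_gb:
        return f"{model_id} fits (~{needed} GB needed, {available_gb} GB available)"
    alternative = largest_fitting(available_gb)
    hint = f"try {alternative}" if alternative else "no catalog model fits"
    return f"{model_id} needs ~{needed} GB but only {available_gb} GB is available; {hint}"


print(check_fit("phi4", 4.0))
# → phi4 needs ~9.0 GB but only 4.0 GB is available; try llama3.2-3b
```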

**Model registry updates:** The model list updates automatically once a week by fetching a lightweight JSON file from GitHub. This means you always see the latest recommended models without updating the package. It falls back gracefully to the built-in list if you're offline.
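
A weekly refresh with an offline fallback could be structured roughly as follows. This is a sketch under stated assumptions, not the library's code: `load_registry`, `read_cache` and `BUILTIN_REGISTRY` are hypothetical names, the cache age is judged by the file's modification time, and the network call is passed in as a `fetch` function so that no URL or registry format has to be guessed.

```python
import json
import tempfile
import time
from pathlib import Path
from typing import Callable, Optional

WEEK_SECONDS = 7 * 24 * 60 * 60

# Hypothetical built-in fallback list; the real registry format is not shown here.
BUILTIN_REGISTRY = [{"id": "smollm2", "min_ram_gb": 2.0}]


def read_cache(cache_file: Path) -> Optional[list]:
    """Return the cached registry, or None if it is missing or unreadable."""
    try:
        return json.loads(cache_file.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return None


def load_registry(cache_file: Path, fetch: Callable[[], str],
                  now: Optional[float] = None) -> list:
    """Use a cache younger than a week, else fetch, else fall back."""
    now = time.time() if now is None else now
    cached = read_cache(cache_file)
    if cached is not None and now - cache_file.stat().st_mtime < WEEK_SECONDS:
        return cached
    try:
        text = fetch()               # for example, an HTTP GET of the JSON file
        registry = json.loads(text)
    except (OSError, ValueError):    # offline, timed out, or malformed JSON
        return cached if cached is not None else BUILTIN_REGISTRY
    try:
        cache_file.parent.mkdir(parents=True, exist_ok=True)
        cache_file.write_text(text, encoding="utf-8")
    except OSError:
        pass                         # a read-only cache should not break loading
    return registry


def offline_fetch() -> str:
    """Simulates having no network connection."""
    raise OSError("network unreachable")


with tempfile.TemporaryDirectory() as tmp:
    registry = load_registry(Path(tmp) / "registry.json", offline_fetch)
    print(registry)  # no cache and no network, so the built-in list is returned
```

Injecting `fetch` keeps the refresh logic testable without network access. A stale cache is still preferred over the built-in list when the fetch fails.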

---

## 🗺️ Roadmap

- [x] **v0.1** — Python library: core inference, model registry, hardware detection, CLI
- [ ] **v0.2** — Streaming improvements, GGUF/llama.cpp backend for better CPU performance
- [ ] **v0.3** — Vision models (image understanding with LLaVA/Gemma Vision)
- [ ] **v0.4** — Audio (Whisper speech-to-text, Piper text-to-speech — both free and local)
- [ ] **v0.5** — Embeddings for semantic search and RAG pipelines
- [ ] **v1.0** — NPM package (`npm install compactllm`) for Node.js and browser
- [ ] **v1.1** — Browser-local inference via ONNX/WebAssembly

---

## 🤝 Contributing

Contributions are very welcome! Here's how to get involved:

```bash
git clone https://github.com/compactllm-dev/compactllm
cd compactllm
pip install -e ".[dev]"
pytest tests/
```

Most wanted contributions:
- New model entries in `registry.py` (especially non-English models)
- Better hardware detection (especially for ARM Macs and Windows)
- Tests and documentation improvements
- Real-world usage examples

---

## 📜 License

CompactLLM is released under the **Apache 2.0 license** — free for personal and commercial use.

Individual AI models have their own licenses (all free for personal use). Check the `license` field from `get_model_info()` or `list_models()` before commercial deployment.
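
One way to act on this is to flag every model whose license is not on a list you have already approved. The sketch below works on a hand-copied excerpt of the catalog tables so that it runs without the library installed. `needs_license_review` and the `APPROVED` set are hypothetical, and which licenses belong in that set is your decision, not something CompactLLM defines.

```python
# Excerpt of the catalog tables in this README: model ID -> license string.
CATALOG_LICENSES = {
    "smollm2": "Apache 2.0",
    "gemma3-1b": "Gemma ToS",
    "deepseek-r1-1.5b": "MIT",
    "llama3.2-3b": "Llama 3.2",
    "mistral-7b": "Apache 2.0",
    "phi4": "MIT",
}

# Assumption: licenses your project has already cleared for commercial use.
APPROVED = {"Apache 2.0", "MIT"}


def needs_license_review(licenses: dict) -> list:
    """Return the sorted IDs of models whose license is not in APPROVED."""
    return sorted(m for m, lic in licenses.items() if lic not in APPROVED)


print(needs_license_review(CATALOG_LICENSES))
# → ['gemma3-1b', 'llama3.2-3b']
```

With the library installed, the same mapping could presumably be built from the `id` and `license` fields returned by `list_models(show_all=True)`.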

---

## 🙏 Acknowledgements

Built on top of the incredible work by:
- [HuggingFace Transformers](https://github.com/huggingface/transformers)
- [Meta Llama](https://llama.meta.com/)
- [Mistral AI](https://mistral.ai/)
- [Microsoft Phi](https://azure.microsoft.com/en-us/products/phi/)
- [Google Gemma](https://ai.google.dev/gemma)
- [HuggingFace SmolLM](https://huggingface.co/HuggingFaceTB)

---

<p align="center">
  <strong>CompactLLM — Compact AI, any device. No API key. No cost.</strong><br>
  <a href="https://getcompactllm.com">getcompactllm.com</a> •
  <a href="https://github.com/compactllm-dev/compactllm">GitHub</a> •
  <a href="https://pypi.org/project/compactllm">PyPI</a>
</p>
