Metadata-Version: 2.4
Name: linkllm
Version: 0.0.1
Summary: The unified LLM runtime — local inference, API proxy, and monitoring. A powerful alternative to Ollama + LiteLLM, built in Rust.
Project-URL: Homepage, https://github.com/linkllm/linkllm
Project-URL: Documentation, https://docs.linkllm.dev
Project-URL: Repository, https://github.com/linkllm/linkllm
Project-URL: Issues, https://github.com/linkllm/linkllm/issues
Project-URL: Changelog, https://github.com/linkllm/linkllm/blob/main/CHANGELOG.md
Author-email: AJ Ashik <aj@linkllm.dev>
License: MIT
License-File: LICENSE
Keywords: ai,anthropic,gemini,gguf,huggingface,inference,litellm,llama,llm,local-llm,model-router,ollama,openai,rust
Classifier: Development Status :: 1 - Planning
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Typing :: Typed
Requires-Python: >=3.9
Requires-Dist: anyio>=4.0.0
Requires-Dist: httpx>=0.27.0
Requires-Dist: pydantic>=2.0.0
Provides-Extra: dev
Requires-Dist: mypy>=1.10.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23.0; extra == 'dev'
Requires-Dist: pytest>=8.0.0; extra == 'dev'
Requires-Dist: ruff>=0.4.0; extra == 'dev'
Provides-Extra: huggingface
Requires-Dist: huggingface-hub>=0.23.0; extra == 'huggingface'
Requires-Dist: torch>=2.2.0; extra == 'huggingface'
Requires-Dist: transformers>=4.40.0; extra == 'huggingface'
Description-Content-Type: text/markdown

<div align="center">

<img src="https://raw.githubusercontent.com/linkllm/linkllm/main/assets/banner.svg" alt="LinkLLM" width="100%" />

<h1>
  <img src="https://raw.githubusercontent.com/linkllm/linkllm/main/assets/logo.svg" width="32" height="32" align="center" />
  LinkLLM
</h1>

<p>
  <strong>The unified LLM runtime — local inference, API proxy, and monitoring in one blazing-fast tool.</strong><br/>
  A powerful alternative to <a href="https://ollama.com">Ollama</a> + <a href="https://github.com/BerriAI/litellm">LiteLLM</a>, built from the ground up in Rust.
</p>

<p>
  <a href="https://github.com/linkllm/linkllm/releases"><img src="https://img.shields.io/github/v/release/linkllm/linkllm?style=flat-square&color=6e56cf&label=latest" alt="Latest Release" /></a>
  <a href="https://crates.io/crates/linkllm"><img src="https://img.shields.io/crates/v/linkllm?style=flat-square&color=e06c44&label=crates.io" alt="Crates.io" /></a>
  <a href="https://pypi.org/project/linkllm"><img src="https://img.shields.io/pypi/v/linkllm?style=flat-square&color=3b82f6&label=pypi" alt="PyPI" /></a>
  <a href="https://www.npmjs.com/package/linkllm"><img src="https://img.shields.io/npm/v/linkllm?style=flat-square&color=f0db4f&label=npm" alt="npm" /></a>
  <a href="https://github.com/linkllm/linkllm/blob/main/LICENSE"><img src="https://img.shields.io/badge/license-MIT-green?style=flat-square" alt="License" /></a>
  <a href="https://github.com/linkllm/linkllm/stargazers"><img src="https://img.shields.io/github/stars/linkllm/linkllm?style=flat-square&color=yellow" alt="Stars" /></a>
</p>

<p>
  <a href="#-quick-start">Quick Start</a> ·
  <a href="#-features">Features</a> ·
  <a href="#-installation">Installation</a> ·
  <a href="#-usage">Usage</a> ·
  <a href="#-api">API</a> ·
  <a href="https://docs.linkllm.dev">Documentation</a> ·
  <a href="#-contributing">Contributing</a>
</p>

```bash
# Install and run your first model in under 60 seconds
curl -fsSL https://install.linkllm.dev | sh
linkllm pull mistralai/Mistral-7B-Instruct-v0.3-GGUF
linkllm chat mistral
```

</div>

---

## What is LinkLLM?

LinkLLM is a **single tool** that replaces both Ollama and LiteLLM, and goes further. It gives you:

- **Local inference** of any GGUF model (llama.cpp-powered today; a pure-Rust candle backend is on the roadmap)
- **API proxy** to OpenAI, Gemini, Anthropic, Groq, and any OpenAI-compatible endpoint
- **Model management** — pull any model from HuggingFace with one command
- **Production-ready** REST API with OpenAI-compatible routes, auth, rate limiting, TLS
- **Real-time monitoring** dashboard right inside your terminal
- **Multi-model routing** with fallback chains and cost tracking

All in a single binary. No Docker required. Works on Windows, macOS, Linux, and Termux.

---

## ✨ Features

### 🦀 Rust-Powered Core
Built on [Tokio](https://tokio.rs) + [Axum](https://github.com/tokio-rs/axum) — async from the ground up. Memory safe, no garbage collector pauses, minimal footprint.

### 🤖 Local Model Inference
- Run GGUF models via `llama.cpp` FFI bindings — same performance, Rust-safe wrapper
- Pure Rust inference with [candle](https://github.com/huggingface/candle) (no C++ dependency; still on the roadmap)
- GPU acceleration: CUDA, ROCm, Apple Metal — auto-detected
- Quantization: Q4_K_M, Q5_K_S, Q8_0, F16 and more
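
To get a feel for what a quantization level means for disk and memory use, here is a rough sizing sketch. It is not part of LinkLLM; `estimate_weight_gb` is a hypothetical helper. It covers only F16 and Q8_0, whose bits-per-weight follow directly from their storage layout. The K-quants (Q4_K_M, Q5_K_S) mix block types, so their effective size varies by model and is left out.

```python
# Bits per weight for two GGUF formats with a fixed storage layout.
# Q8_0 packs 32 int8 weights plus one fp16 scale per block: 34 bytes / 32 weights = 8.5 bits.
BITS_PER_WEIGHT = {
    "F16": 16.0,
    "Q8_0": 8.5,
}


def estimate_weight_gb(n_params: float, quant: str) -> float:
    """Approximate size of the weight tensors in decimal gigabytes.

    Ignores metadata, the KV cache, and runtime buffers, so real usage is higher.
    """
    bits = BITS_PER_WEIGHT[quant]
    return n_params * bits / 8 / 1e9


if __name__ == "__main__":
    for quant in BITS_PER_WEIGHT:
        print(f"7B parameters @ {quant}: ~{estimate_weight_gb(7e9, quant):.1f} GB")
```

Treat the result as a lower bound when deciding whether a model fits in RAM or VRAM.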

### 🌐 Universal API Proxy
Route requests to any provider through a single unified API:

| Provider | Models |
|---|---|
| OpenAI | gpt-4o, o1, gpt-4-turbo, ... |
| Google Gemini | gemini-2.0-flash, gemini-1.5-pro, ... |
| Anthropic | claude-3-5-sonnet, claude-3-opus, ... |
| Groq | llama3, mixtral (ultra-fast) |
| Together AI | 50+ open models |
| Any OpenAI-compat | Custom base URL |

### 📦 HuggingFace Model Pull
```bash
linkllm pull mistralai/Mistral-7B-Instruct-v0.3-GGUF
linkllm pull TheBloke/Llama-2-13B-chat-GGUF --quant Q4_K_M
linkllm pull google/gemma-2-9b-it
```
Resume interrupted downloads. SHA-256 integrity check. Auto-conversion to GGUF.
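
The resume and integrity steps can be sketched with the Python standard library. This is an illustration of the general technique, not LinkLLM's actual downloader: the function names are hypothetical, and it assumes the file host honors HTTP `Range` requests and publishes a SHA-256 digest for each file.

```python
import hashlib
from pathlib import Path
from typing import Dict


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so multi-gigabyte models never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path: Path, expected_sha256: str) -> bool:
    """Compare the file's digest with the published one (case-insensitive hex)."""
    return sha256_of_file(path) == expected_sha256.strip().lower()


def resume_headers(partial: Path) -> Dict[str, str]:
    """Build the Range header that asks the server to continue after the bytes already on disk."""
    offset = partial.stat().st_size if partial.exists() else 0
    return {"Range": f"bytes={offset}-"} if offset else {}
```

A downloader would pass `resume_headers(...)` with its GET request, append the response body to the partial file, and call `is_intact` once the transfer completes.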

### 📊 Terminal Monitoring Dashboard
```bash
linkllm monitor
```
Real-time TUI powered by [Ink](https://github.com/vadimdemedes/ink):
- Tokens/second live graph
- Latency histograms (p50 / p95 / p99)
- Active model memory usage
- Per-provider cost breakdown
- Request log (live tail)
- API key usage tracker
- Error rate + alerts
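
For readers unfamiliar with the p50 / p95 / p99 notation, the sketch below shows one common way to derive those figures from raw latency samples. It uses the standard library's `statistics.quantiles`. How the dashboard itself aggregates samples is not documented here, so treat `latency_percentiles` as a hypothetical helper.

```python
import statistics
from typing import Dict, Sequence


def latency_percentiles(samples_ms: Sequence[float]) -> Dict[str, float]:
    """Return the 50th, 95th, and 99th percentile of a batch of latencies.

    quantiles(n=100) yields 99 cut points; index k-1 is the k-th percentile.
    Needs at least two samples.
    """
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


if __name__ == "__main__":
    # 101 evenly spaced samples make the percentiles easy to check by eye.
    print(latency_percentiles(list(range(1, 102))))
```

The median (p50) describes the typical request, while p95 and p99 expose the slow tail that an average would hide.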

### 🔐 Security-First Design
- AES-256-GCM encrypted API key store (OS keychain integration)
- TLS 1.3 by default, mTLS for production
- HMAC request signing in the Rust SDK
- JWT bearer tokens for server access
- Per-key rate limits and quotas
- Sandboxed model inference
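
As an illustration of HMAC request signing in general, the sketch below signs the method, path, timestamp, and body hash with a shared secret. The message layout and function names are assumptions made for this example. The Rust SDK's actual signing scheme is not specified in this README and may differ.

```python
import hashlib
import hmac


def sign_request(secret: bytes, method: str, path: str, body: bytes, timestamp: int) -> str:
    """Return a hex HMAC-SHA256 over a canonical form of the request (hypothetical layout)."""
    message = b"\n".join([
        method.upper().encode(),
        path.encode(),
        str(timestamp).encode(),
        hashlib.sha256(body).hexdigest().encode(),
    ])
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_request(secret: bytes, method: str, path: str, body: bytes,
                   timestamp: int, signature: str) -> bool:
    """Recompute the signature and compare in constant time to avoid timing leaks."""
    expected = sign_request(secret, method, path, body, timestamp)
    return hmac.compare_digest(expected, signature)
```

Including a timestamp in the signed message lets a server reject stale requests, which limits replay attacks.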

### 🔀 Multi-Model Routing
Define routing rules in `linkllm.toml`:
```toml
[routing]
default = "mistral"

[[routing.rules]]
match = "code"
model = "deepseek-coder"

[[routing.rules]]
match = "long-context"
model = "gemini-1.5-pro"
fallback = ["gpt-4o", "claude-3-opus"]
```
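
One way to read these rules is "pick the first rule whose `match` applies, then try its model and fallbacks in order". The sketch below implements that reading in Python. It mirrors the TOML above as a plain dict and treats `match` as an exact tag comparison. Both choices are assumptions, since the matching semantics are not defined here, and `candidates` and `call_with_fallback` are hypothetical names.

```python
from typing import Any, Callable, Dict, List

# The routing table above, written out as the dict a TOML parser would produce.
ROUTING: Dict[str, Any] = {
    "default": "mistral",
    "rules": [
        {"match": "code", "model": "deepseek-coder"},
        {"match": "long-context", "model": "gemini-1.5-pro",
         "fallback": ["gpt-4o", "claude-3-opus"]},
    ],
}


def candidates(routing: Dict[str, Any], tag: str) -> List[str]:
    """Return the ordered list of models to try for a request tag."""
    for rule in routing["rules"]:
        if rule["match"] == tag:
            return [rule["model"], *rule.get("fallback", [])]
    return [routing["default"]]


def call_with_fallback(models: List[str], call: Callable[[str], str]) -> str:
    """Try each model in order and return the first successful result."""
    errors = []
    for model in models:
        try:
            return call(model)
        except Exception as exc:  # a real router would catch narrower error types
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))
```

For example, `candidates(ROUTING, "long-context")` yields the primary model followed by its two fallbacks, and an unmatched tag falls through to the default.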

---

## ⚡ Quick Start

### 1. Install

**Linux / macOS / Termux:**
```bash
curl -fsSL https://install.linkllm.dev | sh
```

**Windows (PowerShell):**
```powershell
irm https://install.linkllm.dev/windows | iex
```

**Homebrew:**
```bash
brew install linkllm/tap/linkllm
```

**npm (CLI only):**
```bash
npm install -g linkllm
```

**pip (Python SDK + CLI):**
```bash
pip install linkllm
```

**From source:**
```bash
git clone https://github.com/linkllm/linkllm
cd linkllm
cargo build --release
```

---

### 2. Pull a Model

```bash
# Pull from HuggingFace (GGUF auto-detected)
linkllm pull mistralai/Mistral-7B-Instruct-v0.3-GGUF

# Specify quantization
linkllm pull TheBloke/Llama-2-13B-chat-GGUF --quant Q4_K_M

# List downloaded models
linkllm list
```

### 3. Chat in Terminal

```bash
linkllm chat mistral
linkllm chat gpt-4o          # routes to OpenAI (needs API key)
linkllm chat gemini-flash    # routes to Google Gemini
```

### 4. Start the Server

```bash
linkllm serve
# Server running at http://localhost:11434
# OpenAI-compatible API at http://localhost:11434/v1
```

### 5. Monitor

```bash
linkllm monitor
```

---

## 🔌 API

LinkLLM exposes a fully **OpenAI-compatible REST API**. Drop it in as a replacement for `api.openai.com`:

### Chat Completions

```bash
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-api-key" \
  -d '{
    "model": "mistral",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'
```
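
With `"stream": true`, an OpenAI-style endpoint answers with server-sent events: one `data: {...}` line per chunk, ending with `data: [DONE]`. Assuming LinkLLM follows that wire format (it is implied by the OpenAI compatibility claim but not spelled out here), the text can be reassembled as in this sketch. `iter_stream_text` is a hypothetical helper, not part of any SDK.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text fragments carried by OpenAI-style streaming chunks."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:  # role-only and final chunks carry no content
                yield text


if __name__ == "__main__":
    sample = [
        'data: {"choices":[{"delta":{"role":"assistant"}}]}',
        'data: {"choices":[{"delta":{"content":"Hel"}}]}',
        'data: {"choices":[{"delta":{"content":"lo!"}}]}',
        "data: [DONE]",
    ]
    print("".join(iter_stream_text(sample)))
```

The function accepts any iterable of lines, so it can be fed from an HTTP client's line iterator (for example `response.iter_lines()` in `httpx`) as well as from a test fixture.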

### Python — OpenAI SDK Compatible

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="your-api-key"
)

response = client.chat.completions.create(
    model="mistral",
    messages=[{"role": "user", "content": "Explain Rust ownership"}],
    stream=True
)

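for chunk in response:
    # Some chunks carry no choices or no text (e.g. the role-only first chunk); skip them
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)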
```

### Python — LinkLLM Native SDK

```bash
pip install linkllm
```

```python
import linkllm

client = linkllm.Client()

# Chat with any model — local or API
response = client.chat("mistral", "What is the capital of France?")
print(response.text)

# Streaming
for token in client.stream("gpt-4o", "Write a haiku about Rust"):
    print(token, end="", flush=True)

# Pull a model programmatically
client.pull("TheBloke/Mistral-7B-Instruct-v0.2-GGUF")

# List local models
models = client.list()
for m in models:
    print(f"{m.name} — {m.size_gb:.1f} GB")
```

### TypeScript / JavaScript

```bash
npm install linkllm
```

```typescript
import { LinkLLM } from "linkllm";

const client = new LinkLLM({ baseUrl: "http://localhost:11434" });

// Chat
const response = await client.chat({
  model: "mistral",
  messages: [{ role: "user", content: "Hello from TypeScript!" }],
});
console.log(response.content);

// Streaming
const stream = client.stream({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Tell me a story" }],
});

for await (const token of stream) {
  process.stdout.write(token);
}
```

### Rust SDK

```toml
# Cargo.toml
[dependencies]
linkllm = "0.1"
tokio = { version = "1", features = ["full"] }
```

```rust
use linkllm::{Client, ChatMessage, Role};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = Client::new("http://localhost:11434")?;

    let response = client
        .chat("mistral")
        .message(Role::User, "What is Rust?")
        .send()
        .await?;

    println!("{}", response.content());
    Ok(())
}
```

---

## ⚙️ Configuration

LinkLLM is configured via `~/.linkllm/config.toml`:

```toml
[server]
host = "127.0.0.1"
port = 11434
tls = false

[models]
default = "mistral"
model_dir = "~/.linkllm/models"

[inference]
gpu_layers = -1        # -1 = auto (offload all to GPU)
context_size = 4096
threads = 8

[api_keys]
# Encrypted. Use `linkllm key add` to set these safely.
openai = ""
gemini = ""
anthropic = ""
groq = ""

[routing]
default = "mistral"
fallback_chain = ["mistral", "gpt-4o-mini"]

[monitoring]
enabled = true
metrics_port = 9090    # Prometheus-compatible /metrics
log_level = "info"
```
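
Scripts that sit alongside LinkLLM can read the same file to find the server. The sketch below parses a trimmed copy of the config and derives the API and metrics URLs. `endpoints` is a hypothetical helper. It assumes that the metrics endpoint listens on the same host over plain HTTP, and that `tomllib` (Python 3.11+) or the third-party `tomli` backport is available.

```python
try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:  # Python 3.9 / 3.10
    import tomli as tomllib  # third-party backport with the same API

from typing import Any, Dict

SAMPLE_CONFIG = """
[server]
host = "127.0.0.1"
port = 11434
tls = false

[monitoring]
enabled = true
metrics_port = 9090
"""


def endpoints(config: Dict[str, Any]) -> Dict[str, str]:
    """Derive the OpenAI-compatible base URL and, if enabled, the /metrics URL."""
    server = config["server"]
    scheme = "https" if server.get("tls") else "http"
    urls = {"api": f"{scheme}://{server['host']}:{server['port']}/v1"}
    monitoring = config.get("monitoring", {})
    if monitoring.get("enabled"):
        # Assumption: metrics are served on the same host, without TLS.
        urls["metrics"] = f"http://{server['host']}:{monitoring['metrics_port']}/metrics"
    return urls


if __name__ == "__main__":
    print(endpoints(tomllib.loads(SAMPLE_CONFIG)))
```

To read the real file, open `~/.linkllm/config.toml` in binary mode and pass the handle to `tomllib.load`.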

### Managing API Keys

```bash
linkllm key add openai sk-...
linkllm key add gemini AIza...
linkllm key add anthropic sk-ant-...
linkllm key list
linkllm key rm openai
```

Keys are stored encrypted with AES-256-GCM, tied to your OS keychain.

---

## 📋 CLI Reference

```
linkllm <command> [options]

Commands:
  serve               Start the LinkLLM server
  chat [model]        Start an interactive chat session
  pull <user/model>   Pull a model from HuggingFace
  push <model>        Push a model to the LinkLLM registry
  list                List all local models
  rm <model>          Remove a local model
  show <model>        Show model info and metadata
  monitor             Open the TUI monitoring dashboard
  key <add|rm|list>   Manage encrypted API keys
  config <get|set>    View or update configuration
  run <model>         Pull (if needed) and start chatting

Options:
  --host              Server host (default: 127.0.0.1)
  --port              Server port (default: 11434)
  --model-dir         Override model storage directory
  --log-level         Log verbosity: error|warn|info|debug|trace
  -v, --version       Print version
  -h, --help          Show help
```

---

## 🆚 Comparison

|  | **LinkLLM** | Ollama | LiteLLM |
|---|---|---|---|
| Local GGUF inference | ✅ | ✅ | ❌ |
| API proxy (OpenAI / Gemini / etc.) | ✅ | ❌ | ✅ |
| HuggingFace model pull | ✅ | Partial | ❌ |
| TUI monitoring dashboard | ✅ | ❌ | Web UI only |
| Multi-model routing + fallback | ✅ | ❌ | ✅ |
| Encrypted API key management | ✅ | ❌ | Partial |
| Rust core (memory safe) | ✅ | Go | Python |
| OpenAI-compatible REST API | ✅ | ✅ | ✅ |
| Native Rust SDK | ✅ | ❌ | ❌ |
| Pure-Rust inference (candle) | ✅ | ❌ | ❌ |
| Mobile / Termux | ✅ | Limited | Limited |
| Cost tracking per request | ✅ | ❌ | ✅ |
| Single binary, no Docker | ✅ | ✅ | ❌ |

---

## 🏗️ Architecture

```
┌─────────────────────────────────────────────────┐
│              User Interface Layer                │
│   CLI Chat · TUI Monitor · Model Manager        │
│              (TypeScript + Ink)                  │
└────────────────────┬────────────────────────────┘
                     │
┌────────────────────▼────────────────────────────┐
│              API Gateway (Rust/Axum)             │
│   REST API · Auth · Rate Limiter · TLS          │
└────────────────────┬────────────────────────────┘
                     │
┌────────────────────▼────────────────────────────┐
│            Core Engine (Rust/Tokio)              │
│   Router · Pipeline · Context · Metrics         │
└──────┬─────────────┬───────────────┬────────────┘
       │             │               │
┌──────▼──────┐ ┌────▼─────┐ ┌──────▼──────┐
│  Local GGUF │ │  Python  │ │  API Proxy  │
│  llama.cpp  │ │  Bridge  │ │ OAI/Gemini/ │
│  + candle   │ │    HF    │ │  Anthropic  │
└─────────────┘ └──────────┘ └─────────────┘
```

See the full [Architecture Document](https://github.com/linkllm/linkllm/blob/main/docs/ARCHITECTURE.md) for details.

---

## 📦 Packages

| Package | Registry | Install |
|---|---|---|
| `linkllm` (binary) | GitHub Releases | `curl -fsSL https://install.linkllm.dev \| sh` |
| `linkllm` (CLI) | [npm](https://npmjs.com/package/linkllm) | `npm install -g linkllm` |
| `linkllm` (Python SDK) | [PyPI](https://pypi.org/project/linkllm) | `pip install linkllm` |
| `linkllm` (Rust SDK) | [crates.io](https://crates.io/crates/linkllm) | `cargo add linkllm` |

---

## 🚀 Roadmap

- [x] Core Rust engine + Axum server
- [x] OpenAI-compatible API
- [x] llama.cpp GGUF inference
- [x] HuggingFace model pull
- [x] API proxy (OpenAI, Gemini, Anthropic)
- [x] TUI monitoring dashboard
- [x] Encrypted API key management
- [ ] Multi-model routing (in progress)
- [ ] candle pure-Rust inference
- [ ] WebUI dashboard
- [ ] Model fine-tuning support
- [ ] Plugin / middleware system
- [ ] LoRA adapter merge
- [ ] Distributed inference
- [ ] LinkLLM Cloud (hosted)

---

## 🤝 Contributing

Contributions are welcome! Please read [CONTRIBUTING.md](https://github.com/linkllm/linkllm/blob/main/CONTRIBUTING.md) before submitting a PR.

```bash
git clone https://github.com/linkllm/linkllm
cd linkllm

# Build Rust core
cargo build

# Run tests
cargo test

# Build CLI (the subshell keeps your working directory at the repo root)
(cd cli && npm install && npm run build)

# Run Python bridge tests
(cd python && pip install -e ".[dev]" && pytest)
```

**Good first issues** are labeled [`good-first-issue`](https://github.com/linkllm/linkllm/issues?q=label%3Agood-first-issue) on GitHub.

---

## 📄 License

[MIT License](https://github.com/linkllm/linkllm/blob/main/LICENSE) © 2025 AJ Ashik

---

<div align="center">
  <sub>Built with ❤️ in Rust · <a href="https://twitter.com/linkllm_dev">Twitter</a> · <a href="https://discord.gg/linkllm">Discord</a> · <a href="https://docs.linkllm.dev">Docs</a></sub>
</div>
