Metadata-Version: 2.4
Name: llama-buddy
Version: 0.1.2
Summary: CLI wrapper for llama.cpp providing an ollama-like experience
Keywords: llama.cpp,llm,cli,gguf
Author: Thilo Michael
Author-email: Thilo Michael <thilo.michael@bdr.de>
License-Expression: MIT
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Dist: httpx>=0.27
Requires-Dist: rich>=13.0
Requires-Python: >=3.10
Project-URL: Homepage, https://github.com/thilomichael/llama-buddy
Project-URL: Repository, https://github.com/thilomichael/llama-buddy
Project-URL: Issues, https://github.com/thilomichael/llama-buddy/issues
Description-Content-Type: text/markdown

<div align="center">

# llama-buddy

**A friendly CLI wrapper for [llama.cpp](https://github.com/ggml-org/llama.cpp)**

Manage, download, and serve local LLMs with a single command.
Think of it as an ollama-like experience built on top of `llama-server`.

[![Python 3.10+](https://img.shields.io/badge/python-3.10%2B-blue)](https://www.python.org)
[![License: MIT](https://img.shields.io/badge/license-MIT-green)](LICENSE)
[![PyPI](https://img.shields.io/pypi/v/llama-buddy)](https://pypi.org/project/llama-buddy/)

</div>

---

## Features

- **Background server** &mdash; start/stop/restart `llama-server` as a daemon
- **Multi-model routing** &mdash; preset-based configuration with automatic model load/unload
- **Interactive downloads** &mdash; search Hugging Face, pick a quantization, and download with progress display and resume support
- **Rich terminal UI** &mdash; tables, panels, interactive selectors, and live search
- **GGUF inspector** &mdash; view model metadata, architecture, and sampling parameters
- **Per-model settings** &mdash; context size, GPU layers, flash attention, and more
- **Auto-sync** &mdash; preset file stays in sync with the llama.cpp cache automatically

## Screenshots

<details open>
<summary><b>Model listing</b> &mdash; <code>llb models</code></summary>
<br>
<p align="center">
  <picture>
    <source media="(prefers-color-scheme: dark)" srcset="assets/models.svg">
    <source media="(prefers-color-scheme: light)" srcset="assets/models.svg">
    <img alt="llb models" src="assets/models.svg" width="700">
  </picture>
</p>
</details>

<details open>
<summary><b>Model info</b> &mdash; <code>llb info</code></summary>
<br>
<p align="center">
  <picture>
    <source media="(prefers-color-scheme: dark)" srcset="assets/info.svg">
    <source media="(prefers-color-scheme: light)" srcset="assets/info.svg">
    <img alt="llb info" src="assets/info.svg" width="600">
  </picture>
</p>
</details>

## Installation

```bash
pipx install llama-buddy
```

Or with [uv](https://docs.astral.sh/uv/):

```bash
uv tool install llama-buddy
```

This installs the `llb` command into an isolated environment and adds it to your `PATH`.

### Prerequisites

- Python 3.10+
- [llama.cpp](https://github.com/ggml-org/llama.cpp) installed, with `llama-server` available on your `PATH`
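
To confirm the second prerequisite before the first `llb start`, a quick shell check along these lines may help. This is only a sketch: `check_on_path` is a hypothetical helper name and is not part of llama-buddy.

```bash
# Hypothetical helper (not part of llama-buddy): report whether a
# program can be resolved through PATH.
check_on_path() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
    else
        echo "$1: not found on PATH"
        return 1
    fi
}

# llama-server ships with llama.cpp; llb needs to be able to launch it.
check_on_path llama-server || echo "Install llama.cpp or add its binaries to PATH."
```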

## Quick start

```bash
# Download a model (interactive search)
llb download

# Or specify the repo and quantization directly
llb download mistralai/Ministral-3-3B-Instruct-2512-GGUF:Q4_K_M

# Start the server
llb start

# List all models
llb models

# Chat with a model (uses llama-cli)
llb chat

# Inspect model metadata
llb info

# Configure settings (interactive TUI)
llb settings

# Open the web UI in your browser
llb open

# Stop the server
llb stop
```
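
Because the server starts in the background, a script that calls `llb start` may want to wait until the server answers before sending requests. The sketch below polls the `/health` endpoint that `llama-server` exposes. It assumes the server listens on `http://127.0.0.1:8080` (the `llama-server` default); if you changed the port in `llb settings`, set `LLAMA_URL` accordingly. `wait_for_server` and `LLAMA_URL` are illustrative names, not part of llama-buddy.

```bash
# Assumption: llama-server default address; override if you changed the port.
LLAMA_URL="${LLAMA_URL:-http://127.0.0.1:8080}"

# Hypothetical helper: poll <url>/health once per second until it
# responds successfully or the attempt budget (default 30) is used up.
wait_for_server() {
    url="$1"
    tries="${2:-30}"
    i=0
    while [ "$i" -lt "$tries" ]; do
        # -f makes curl fail on HTTP errors, e.g. while a model is still loading
        if curl -fsS "$url/health" >/dev/null 2>&1; then
            echo "ready"
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    echo "not ready after $tries attempts"
    return 1
}

# Example use in a script:
#   llb start && wait_for_server "$LLAMA_URL" && llb open
```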

## Commands

| Command | Description |
|---------|-------------|
| `llb start` | Start `llama-server` in the background. Extra args are forwarded. |
| `llb stop` | Stop the running server. |
| `llb restart` | Restart the server. |
| `llb status` | Show whether the server is running. |
| `llb models` | List all models with status, size, and grouping. Supports `--sort size`. |
| `llb download [model]` | Download a model. Starts an interactive Hugging Face search when no model is given. |
| `llb remove [model]` | Remove a model after a confirmation prompt. Pass `--keep-files` to keep the GGUF files on disk. |
| `llb info [model]` | Show GGUF metadata. Opens an interactive selector when no model is given. |
| `llb settings` | Interactive editor for global and per-model settings. |
| `llb chat [model]` | Interactive chat via `llama-cli`. Opens a model selector when no model is given. |
| `llb open` | Open the `llama-server` web UI in your browser. |
| `llb logs` | Tail the server log file. |

## Configuration

Config files live in `~/.config/llama/`:

| File | Purpose |
|------|---------|
| `models.ini` | Model preset file &mdash; sections are Hugging Face repo IDs, auto-synced with the llama.cpp cache |
| `settings.json` | Global server settings (port, context size, GPU layers, etc.) |
| `server.pid` | PID of the running server |
| `server.log` | Server stdout/stderr |
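
`llb status` is the supported way to check on the server, but the PID file can also be inspected by hand, for example from a monitoring script. The following is a minimal sketch that assumes `server.pid` contains a single process ID on its first line; the file format is an assumption, and `server_state` is a hypothetical helper, not part of llama-buddy.

```bash
# Assumption: server.pid holds one numeric process ID on its first line.
PID_FILE="${HOME}/.config/llama/server.pid"

# Hypothetical helper: classify the state described by a PID file.
server_state() {
    pid_file="$1"
    if [ ! -r "$pid_file" ]; then
        echo "no pid file"
        return 0
    fi
    pid="$(head -n 1 "$pid_file" | tr -d '[:space:]')"
    # Treat empty or non-numeric content as a stale file.
    case "$pid" in
        ''|*[!0-9]*) echo "stale pid file"; return 0 ;;
    esac
    # kill -0 sends no signal; it succeeds only if the process exists
    # and the current user is allowed to signal it.
    if kill -0 "$pid" 2>/dev/null; then
        echo "running (pid $pid)"
    else
        echo "stale pid file"
    fi
}

server_state "$PID_FILE"
```

A "stale pid file" result usually means the server exited without cleaning up; `server.log` in the same directory is the place to look for the reason.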

### Per-model settings

Run `llb settings` and select **Model Settings** to configure per-model overrides:

- Context size, GPU layers, flash attention
- Custom aliases
- Any `llama-server` parameter

## Development

```bash
# Clone and install
git clone https://github.com/thilomichael/llama-buddy.git
cd llama-buddy
uv sync

# Run
uv run llb <command>

# Test
uv run pytest

# Lint
uv run ruff check src/ tests/
```

## License

MIT
