Metadata-Version: 2.4
Name: guido-inference
Version: 0.1.0
Summary: Inference package for Guido language models — model code, HuggingFace loader, cartridge support, CLI.
Author-email: Northsea Research <oss@northsea.co>
License: Apache-2.0
Project-URL: Homepage, https://github.com/northseadev/guido
Project-URL: Repository, https://github.com/northseadev/guido
Project-URL: Issues, https://github.com/northseadev/guido/issues
Project-URL: HuggingFace, https://huggingface.co/northsea-ai
Keywords: guido,llm,inference,stateful,titans,cartridge,moe,kda,mla
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: guido-common~=0.1.0
Requires-Dist: torch>=2.4.0
Requires-Dist: safetensors>=0.7.0
Requires-Dist: huggingface-hub>=1.0.0
Provides-Extra: hf-tokenizer
Requires-Dist: transformers>=5.0.0; extra == "hf-tokenizer"
Provides-Extra: fla
Requires-Dist: flash-linear-attention>=0.4.0; extra == "fla"
Requires-Dist: triton>=3.6.0; extra == "fla"
Provides-Extra: dev
Requires-Dist: pytest>=9.0; extra == "dev"
Requires-Dist: pytest-cov>=6.0; extra == "dev"
Requires-Dist: ruff>=0.14.0; extra == "dev"
Requires-Dist: mypy>=1.19.0; extra == "dev"
Provides-Extra: all
Requires-Dist: guido-inference[dev,fla,hf-tokenizer]; extra == "all"

# guido-inference

Run [Guido](https://huggingface.co/northsea-ai) models locally. This package provides the model code, with **Titans memory updates during inference**, plus a HuggingFace loader, cartridge support, and a CLI.

Guido's memory evolves as you talk to it. Save the memory state as a `.cart` cartridge file and reload it later — the model picks up where it left off.

## Installation

```bash
pip install guido-inference
```

For Flash Linear Attention kernel support (GPU):
```bash
pip install "guido-inference[fla]"   # quoted so shells like zsh do not expand the brackets
```
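
To confirm that the optional kernel dependencies were actually installed, a small check along these lines may help. It is a minimal sketch: `installed_version` is a hypothetical helper, not part of `guido-inference`, and it only reports whether the distributions named by the `fla` extra are present, not whether a usable GPU is available.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Distribution names as declared by the "fla" extra of this package.
for dist in ("flash-linear-attention", "triton"):
    found = installed_version(dist)
    print(f"{dist}: {found if found else 'not installed'}")
```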

## Quick Start

### CLI

```bash
guido run northsea-ai/Guido-3B --prompt "What is the capital of the Netherlands?"
guido run northsea-ai/Guido-3B                          # Interactive mode
guido run northsea-ai/Guido-3B --effort high             # Adaptive compute
guido run northsea-ai/Guido-3B --cartridge ./john.cart   # Resume with saved memory
guido info northsea-ai/Guido-3B                          # Model info (no GPU)
```
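
For scripting, the same invocations can be assembled from Python. The sketch below only builds the argument list from the flags shown above; `build_guido_command` is a hypothetical helper, and whether the flags can be combined in a single call is an assumption that should be checked against the CLI's own help output.

```python
import shlex


def build_guido_command(model_id, prompt=None, effort=None, cartridge=None):
    """Build an argument list for `guido run` (hypothetical helper)."""
    cmd = ["guido", "run", model_id]
    if prompt is not None:
        cmd += ["--prompt", prompt]
    if effort is not None:
        cmd += ["--effort", effort]
    if cartridge is not None:
        # Assumption: --cartridge may be combined with the other flags.
        cmd += ["--cartridge", cartridge]
    return cmd


cmd = build_guido_command(
    "northsea-ai/Guido-3B",
    prompt="What is the capital of the Netherlands?",
    cartridge="./john.cart",
)
# Print a shell-safe rendering; pass `cmd` to subprocess.run(cmd, check=True) to execute it.
print(shlex.join(cmd))
```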

### Python

```python
from guido_inference import load_model

model, tokenizer = load_model("northsea-ai/Guido-3B")
input_ids = tokenizer.encode("Hello!", return_tensors="pt").to(model.device)
output = model.generate(input_ids)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

### Cartridges — Create, Save, Reload

Memory updates happen automatically during inference (Titans architecture). Save and reload the evolved state:

```python
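from guido_inference import load_model

model, tokenizer = load_model("northsea-ai/Guido-3B")

# Each conversation turn evolves the model's memory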
for msg in ["I'm a researcher at TU Delft.", "My focus is renewable energy."]:
    ids = tokenizer.encode(msg, return_tensors="pt").to(model.device)
    model.generate(ids)

# Save the evolved memory as a portable cartridge (~250 KB to 1 MB)
model.save_memory("./researcher.cart")

# Later: reload and continue where you left off
model.load_memory("./researcher.cart")
ids = tokenizer.encode("What papers should I read?", return_tensors="pt").to(model.device)
output = model.generate(ids)

# Reset memory to blank slate
model.reset_memory()
```
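
When keeping one cartridge per user or session, it can be convenient to resume from a file if it exists and start from a blank slate otherwise. The following is a minimal sketch of that pattern: `resume_or_reset` is a hypothetical helper, not part of the package, and it assumes only the `load_memory` and `reset_memory` methods shown above, with `load_memory` accepting a path string.

```python
from pathlib import Path


def resume_or_reset(model, cart_path):
    """Load a saved cartridge if present, otherwise reset memory.

    Returns True when an existing cartridge was loaded, False when the
    memory was reset. Hypothetical helper; `model` is any object exposing
    load_memory(path) and reset_memory().
    """
    path = Path(cart_path)
    if path.is_file():
        model.load_memory(str(path))
        return True
    model.reset_memory()
    return False


# Example usage (requires a loaded model):
#   resumed = resume_or_reset(model, "./researcher.cart")
#   ... run the conversation ...
#   model.save_memory("./researcher.cart")
```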

## Model Downloads

| Model | Params (Total) | Params (Active) | HuggingFace |
|-------|---------------|-----------------|-------------|
| Guido-300M | 0.42B | 0.29B | [northsea-ai/Guido-300M](https://huggingface.co/northsea-ai/Guido-300M) |
| Guido-3B | 3.33B | 0.84B | [northsea-ai/Guido-3B](https://huggingface.co/northsea-ai/Guido-3B) |
| Guido-7B | 7.36B | 1.32B | [northsea-ai/Guido-7B](https://huggingface.co/northsea-ai/Guido-7B) |

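As a rough guide to reading the table, the share of parameters active per token and an approximate weight footprint can be worked out from the listed counts. This is a back-of-the-envelope sketch: it assumes 16-bit weights and that all parameters must be resident in memory, and it ignores activations, memory state, and framework overhead, so actual requirements may differ.

```python
# (total, active) parameter counts in billions, copied from the table above.
MODELS = {
    "Guido-300M": (0.42, 0.29),
    "Guido-3B": (3.33, 0.84),
    "Guido-7B": (7.36, 1.32),
}

BYTES_PER_PARAM = 2  # assumption: 16-bit weights (bf16/fp16)


def active_fraction(total_b, active_b):
    """Fraction of parameters that are active per token."""
    return active_b / total_b


def weight_gb(total_b, bytes_per_param=BYTES_PER_PARAM):
    """Approximate weight footprint in GB (1e9 bytes): billions of params x bytes each."""
    return total_b * bytes_per_param


for name, (total, active) in MODELS.items():
    print(
        f"{name}: {active_fraction(total, active):.1%} active, "
        f"~{weight_gb(total):.2f} GB of weights"
    )
```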
For fine-tuning, use third-party training harnesses like [Axolotl](https://github.com/axolotl-ai-cloud/axolotl) or [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) with a LoRA/QLoRA adapter.

See the [monorepo README](https://github.com/northseadev/guido#readme) for architecture details and full documentation.
