Metadata-Version: 2.4
Name: rlm-core
Version: 0.1.0
Summary: Recursive Language Model (RLM) inference engine — explore massive contexts via iterative LLM + REPL
Project-URL: Homepage, https://github.com/wgthomas/rlm-core
Project-URL: Repository, https://github.com/wgthomas/rlm-core
Author: wgthomas
License: MIT
License-File: LICENSE
Keywords: context,llm,reasoning,recursive,repl,rlm
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.11
Requires-Dist: httpx>=0.25.0
Provides-Extra: test
Requires-Dist: pytest; extra == 'test'
Requires-Dist: pytest-asyncio; extra == 'test'
Description-Content-Type: text/markdown

# rlm-core

Recursive Language Model (RLM) inference engine for Python.

Based on ["Recursive Language Models"](https://arxiv.org/abs/2502.00010) by Zhang et al. (MIT CSAIL, 2025). The key idea: instead of stuffing massive contexts into the LLM's context window, give the LLM a Python REPL and let it programmatically explore the data, query sub-LLMs, and iteratively build an answer.

## Install

```bash
pip install rlm-core
```

## Quick Start

```python
import asyncio
from rlm_core import RLM, RLMConfig

config = RLMConfig(
    model="gpt-4o-mini",
    api_key="sk-...",
)
rlm = RLM(config)

result = asyncio.run(rlm.completion(
    query="What is the main argument in this document?",
    context=open("huge_document.txt").read(),
))
print(result.answer)
```

## How It Works

1. Your query and context metadata go to a **root LLM**
2. The root LLM writes Python code in ` ```repl ` blocks to explore the context
3. Code executes in a sandboxed REPL with access to `context` (your data) and `llm_query()` (a sub-LLM)
4. Execution results feed back to the root LLM
5. Loop continues until the LLM calls `FINAL(answer)` or max iterations

## Local LLM

Point `api_base` at any OpenAI-compatible server:

```python
config = RLMConfig(
    model="qwen2.5-coder-32b",
    api_base="http://localhost:8080/v1",
    api_key="not-needed",
)
```

## Split Models

Use a strong model for reasoning and a cheaper one for sub-queries:

```python
config = RLMConfig(
    model="gpt-4o",            # root: does the reasoning
    sub_model="gpt-4o-mini",   # sub: answers chunk queries
    api_key="sk-...",
)
```

Or use entirely different backends:

```python
config = RLMConfig(
    model="gpt-4o",
    api_key="sk-...",
    sub_model="local-model",
    sub_api_base="http://localhost:8080/v1",
    sub_api_key="not-needed",
)
```

## API

### `RLMConfig`

| Field | Default | Description |
|-------|---------|-------------|
| `model` | `"gpt-4o-mini"` | Root model name |
| `sub_model` | `"gpt-4o-mini"` | Sub-LLM model name |
| `api_base` | OpenAI | Root model API base URL |
| `api_key` | `""` | Root model API key |
| `sub_api_base` | same as `api_base` | Sub model API base URL |
| `sub_api_key` | same as `api_key` | Sub model API key |
| `max_iterations` | `15` | Max REPL interaction loops |
| `max_output_length` | `50000` | Truncate REPL output at this length |
| `timeout` | `300.0` | HTTP request timeout (seconds) |

### `RLMResult`

| Field | Description |
|-------|-------------|
| `answer` | Final answer string |
| `iterations` | Number of LLM interaction rounds |
| `total_tokens` | Combined token usage (root + sub) |
| `trajectory` | List of per-iteration dicts with response/code/output |

### `RLM.completion(query, context, context_type="text") -> RLMResult`

Main entry point. Async.

## See Also

- [rlm-mcp-server-webui](https://github.com/wgthomas/rlm-mcp-server-webui) -- Full MCP server with web UI built on this library

## License

MIT
