Metadata-Version: 2.4
Name: skill-mem
Version: 0.2.0
Summary: Routing, utility tracking, and evolution for Agent Skills.
License-Expression: MIT
License-File: LICENSE
Keywords: agent,embeddings,memory,routing,skills
Requires-Python: >=3.11
Requires-Dist: pyyaml>=6.0
Provides-Extra: openrouter
Requires-Dist: openrouter>=0.7.11; extra == 'openrouter'
Description-Content-Type: text/markdown

# skill-mem

Two-stage routing, utility tracking, and evolution for [Agent Skills](https://agentskills.io).

```python
from skill_mem import Library, Outcome, Revision

library = Library(".agents/skills", embed=my_embed_fn, rerank=my_rerank_fn)

# Route: two-stage retrieve-and-rerank over full skill text
query = "analyze this CSV and find outliers"
matches = library.route(query)
skill = matches[0].skill  # route() returns a list; check it is non-empty in real code

# Use it however you want: system prompt, tool description, few-shot examples
# (`agent` is your own async agent; run this inside an async function)
response = await agent.run(query, system=skill.content)

# Record what happened: successful queries improve routing over time
library.record(skill.name, query, Outcome(success=True, reason="found 3 outliers"))

# Rewrite a failing skill, including its scripts and references
library.evolve(skill.name, context="choked on 500k rows", rewrite=my_rewrite_fn)
```

Skills are standard [Agent Skills](https://agentskills.io/specification) `SKILL.md` files with optional `scripts/`, `references/`, and `assets/` directories. Claude Code, Cursor, VS Code, and [30+ other tools](https://agentskills.io) already read them. skill-mem adds routing, utility tracking, and evolution on top.

## Install

```bash
uv add skill-mem
```

You bring your own embedding and (optionally) reranking functions:

```python
from skill_mem import Library

# Embedding: any function mapping a string to a list of floats works
from fastembed import TextEmbedding
model = TextEmbedding("BAAI/bge-small-en-v1.5")
def embed(text: str) -> list[float]:
    # embed() yields one numpy array per input; convert it to plain floats
    return next(iter(model.embed([text]))).tolist()

# Reranking: optional, adds a cross-encoder second stage
from sentence_transformers import CrossEncoder
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
def rerank(query: str, docs: list[str]) -> list[float]:
    # One relevance score per (query, doc) pair
    return reranker.predict([(query, d) for d in docs]).tolist()

library = Library("./skills", embed, rerank=rerank)
```

Without a reranker, routing uses bi-encoder cosine similarity only. Adding one enables the full two-stage pipeline from the [SkillRouter paper](https://arxiv.org/abs/2603.22455).
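For tests or offline experiments, any callables with these two signatures should do. The sketch below is a hypothetical, dependency-free stand-in (the names `toy_embed`, `toy_rerank`, and `DIM` are ours, not part of skill-mem): a hashed bag-of-words embedding plus a token-overlap scorer in place of a cross-encoder. It only captures lexical overlap, so treat it as a test double rather than a production router.

```python
import hashlib
import math
import re

DIM = 256  # hypothetical embedding width; any fixed size works


def _tokens(text: str) -> list[str]:
    # Lowercase alphanumeric tokens; punctuation is dropped
    return re.findall(r"[a-z0-9]+", text.lower())


def toy_embed(text: str) -> list[float]:
    """Hashed bag-of-words vector, L2-normalized (stand-in for a real embedder)."""
    vec = [0.0] * DIM
    for tok in _tokens(text):
        # md5 gives a stable bucket across runs (built-in hash() is salted per process)
        idx = int(hashlib.md5(tok.encode()).hexdigest(), 16) % DIM
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def toy_rerank(query: str, docs: list[str]) -> list[float]:
    """Fraction of query tokens found in each doc (stand-in for a cross-encoder)."""
    q = set(_tokens(query))
    scores = []
    for doc in docs:
        d = set(_tokens(doc))
        scores.append(len(q & d) / len(q) if q else 0.0)
    return scores
```

Passing them in looks the same as with real models: `Library("./skills", toy_embed, rerank=toy_rerank)`.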

## How it works

### Skills on disk

```
.agents/skills/
  csv-analysis/
    SKILL.md              <- standard Agent Skills file
    scripts/              <- optional executable code
      analyze.py
    references/           <- optional documentation
      pandas-guide.md
  web-search/
    SKILL.md
  .skill-meta/            <- added by skill-mem (gitignored)
    embeddings.json       <- cached vectors for routing
    attempts.json         <- full attempt log (query, success, reason)
    versions.json         <- version tracking
    history/              <- archived SKILL.md before each evolution
```

The `SKILL.md` files stay portable. The `.skill-meta/` sidecar holds everything skill-mem adds on top: cached embeddings, the attempt log, and version history. Extra frontmatter fields (`license`, `metadata`, `compatibility`, etc.) are preserved through evolution.

### A skill with scripts

```markdown
---
name: csv-analysis
description: Analyze, summarize, or query CSV files and tabular data.
---

Load the file with pandas. Use df.describe() for numeric columns.
Check nulls with df.isnull().sum(). For large files, run
scripts/analyze.py with sampling enabled.
```

The full skill text (name, description, body, scripts, and references) is used for routing. The [SkillRouter paper](https://arxiv.org/abs/2603.22455) shows that the body text is the decisive signal: 91.7% of cross-encoder attention concentrates on it, and removing it costs 29 to 44 percentage points of accuracy.

## Routing

Routing uses a two-stage retrieve-and-rerank pipeline:

1. **Bi-encoder retrieval**: embeds the full skill text (name + description + body + files + successful queries) and retrieves top-`retrieve_k` candidates by cosine similarity.
2. **Cross-encoder reranking** (optional): scores each candidate against the query with deeper cross-attention, re-sorts, and returns top-`k`.

```python
library = Library(path, embed)                            # stage 1 only
library = Library(path, embed, rerank=fn)                 # both stages
library = Library(path, embed, rerank=fn, retrieve_k=30)  # wider retrieval window
```

The reranker sees the same full text as the embedder. `retrieve_k` defaults to 20 (the paper's value for ~80K skill pools).

## API

```python
from skill_mem import Library, Skill, Match, Outcome, Revision, Stats

library = Library(path, embed, rerank=rerank)
```

### Read

```python
library.route(query, k=3) -> list[Match]       # two-stage search, top-k
library.get(name) -> Skill                     # by name
library.all() -> list[Skill]                   # everything
library.utility(name) -> float                 # success rate (0.0-1.0)
library.stats(name) -> Stats                   # utility, attempts, recent log
```
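`utility` is described as a success rate, which can be sketched over an attempt log shaped like the `(query, success, reason)` entries in `attempts.json`. The `Attempt` class and the dict returned by `stats` below are hypothetical stand-ins (the real `Stats` type may differ), and returning 0.0 when nothing has been recorded is our assumption.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    """One recorded outcome (hypothetical shape of an attempts.json entry)."""
    query: str
    success: bool
    reason: str


def utility(attempts: list[Attempt]) -> float:
    """Success rate in [0.0, 1.0]; 0.0 for an empty log is an assumption."""
    if not attempts:
        return 0.0
    return sum(a.success for a in attempts) / len(attempts)


def stats(attempts: list[Attempt], recent: int = 5) -> dict:
    """Utility, attempt count, and the last `recent` attempts (oldest first)."""
    return {
        "utility": utility(attempts),
        "attempts": len(attempts),
        "recent": attempts[-recent:] if recent > 0 else [],
    }
```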

### Write

```python
library.add(name, description, content) -> Skill
library.record(name, query, Outcome(success, reason))
library.evolve(name, context, rewrite=fn, validate=fn) -> Skill
library.discover(query, context, create=fn) -> Skill
```

### The loop

The core cycle from the [Memento-Skills paper](https://arxiv.org/abs/2603.18743):

```python
matches = library.route(query)
skill = matches[0].skill if matches else None

result = await agent.run(query, skill=skill)
outcome = judge(query, result)

if skill:
    library.record(skill.name, query, outcome)
    if not outcome.success:
        if library.utility(skill.name) < 0.4:
            library.discover(query, outcome.reason, create=my_create_fn)
        else:
            library.evolve(skill.name, outcome.reason, rewrite=my_rewrite_fn)
elif outcome.success:
    library.discover(query, result, create=my_create_fn)
```

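Route. Execute. Judge. Evolve. Each cycle adds evidence: utility scores update, weak skills are rewritten or replaced, and successful unmatched queries become new skills.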
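The branching in the loop above reduces to a small pure function, which makes the policy easy to unit-test separately from any agent or LLM. This is a restatement of that example, not a skill-mem API; the `Action` names are ours, and the 0.4 threshold is the value the example uses.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"          # no evolve/discover step (a matched attempt is still recorded)
    EVOLVE = "evolve"
    DISCOVER = "discover"


def next_action(matched: bool, success: bool, utility: float, threshold: float = 0.4) -> Action:
    """Mirror the loop; `utility` is read after the attempt has been recorded."""
    if matched:
        if success:
            return Action.NONE
        # Failure: a low-utility skill is replaced via discovery, otherwise repaired
        return Action.DISCOVER if utility < threshold else Action.EVOLVE
    # No skill matched: only a success is worth turning into a new skill
    return Action.DISCOVER if success else Action.NONE
```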

## Evolve and discover

`evolve` and `discover` take your functions — the library handles versioning, archival, and re-indexing.

```python
from skill_mem import Revision, Skill

# `llm` stands in for your own model client; assume it returns an object
# with the fields used below (e.g. via structured output).

def my_rewrite(skill: Skill, context: str) -> Revision:
    """Called by library.evolve(). Can update content and files."""
    result = llm(f"Rewrite this skill:\n{skill.content}\n\nFailure: {context}")
    return Revision(description=result.description, content=result.content)

def my_create(query: str, context: str) -> tuple[str, str, str]:
    """Called by library.discover(). Returns (name, description, content)."""
    result = llm(f"Create a reusable skill from this interaction:\n{query}\n{context}")
    return (result.name, result.description, result.content)
```
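Because `create` output comes from a model, it may be worth checking the proposed name before `discover` turns it into a folder. The sketch below encodes the Agent Skills naming rules as we understand them (lowercase letters, digits, and single hyphens, no leading or trailing hyphen, at most 64 characters); verify them against the specification before relying on this. `is_valid_skill_name` and `slugify` are hypothetical helpers, not part of skill-mem.

```python
import re

# Lowercase alphanumeric runs joined by single hyphens
_NAME = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_valid_skill_name(name: str) -> bool:
    """True if `name` satisfies the (assumed) Agent Skills naming rules."""
    return len(name) <= 64 and _NAME.fullmatch(name) is not None


def slugify(text: str) -> str:
    """Coerce free text into a candidate skill name."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return slug[:64].rstrip("-")
```

A `my_create` could pass `slugify(result.name)` instead of the raw model output.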

### Evolving scripts and references

Skills can include `scripts/`, `references/`, and `assets/` directories. These files are available on `skill.files` and can be updated through evolution:

```python
def my_rewrite(skill: Skill, context: str) -> Revision:
    # Fix a broken script
    old_script = skill.files["scripts/analyze.py"]
    fixed = llm(f"Fix this script:\n{old_script}\n\nError: {context}")
    return Revision(
        description=skill.description,
        content=skill.content,
        files={"scripts/analyze.py": fixed},
    )
```

Files in the revision overwrite their counterparts; files not mentioned stay unchanged. An optional `validate` hook receives the old and new versions and gates the entire revision: if it returns `False`, none of the changes are applied.

```python
library.evolve(
    "csv-analysis",
    "choked on 500k rows",
    rewrite=my_rewrite,
    validate=lambda old, new: "pandas" in new.content,  # reject rewrites that drop pandas
)
```
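The merge-then-gate semantics can be sketched with a plain dataclass. `Draft` and `apply_revision` are hypothetical stand-ins for skill-mem's `Skill` and the internals of `evolve`; in particular, returning the old version on rejection is this sketch's choice, since the README does not say whether `evolve` raises or returns in that case.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Draft:
    """Hypothetical stand-in for one version of a skill."""
    description: str
    content: str
    files: dict[str, str] = field(default_factory=dict)


def apply_revision(
    old: Draft,
    description: str,
    content: str,
    files: dict[str, str],
    validate: Optional[Callable[[Draft, Draft], bool]] = None,
) -> Draft:
    # Revised files overwrite their counterparts; unmentioned files carry over
    merged = {**old.files, **files}
    new = Draft(description, content, merged)
    # The gate sees the whole candidate; a falsy result rejects every change at once
    if validate is not None and not validate(old, new):
        return old
    return new
```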

When a skill evolves, the old `SKILL.md` is archived to `.skill-meta/history/<name>/v<N>.md`. Successful queries are folded into routing embeddings so skills become easier to find as they accumulate evidence.

## Examples

```bash
uv run python examples/basic.py       # add skills, route, record outcomes
uv run python examples/evolving.py    # evolution loop with validation gate
uv run python examples/llm.py         # full loop with LLM rewriting and reranking
```

## Papers

Built on two complementary papers:

- [Memento-Skills](https://arxiv.org/abs/2603.18743): the skill memory lifecycle of routing, utility tracking, evolution, and discovery.
- [SkillRouter](https://arxiv.org/abs/2603.22455): two-stage retrieve-and-rerank over full skill text. Shows that the skill body is the decisive routing signal (91.7% of attention), and that a compact 1.2B-parameter pipeline outperforms much larger zero-shot alternatives.

Skills are stored in the [Agent Skills](https://agentskills.io) format for ecosystem compatibility.
