Metadata-Version: 2.4
Name: skill-search
Version: 0.1.0
Summary: LLM-framework-agnostic skill search library with fast full-text search
Project-URL: Homepage, https://github.com/igtm/skill-search
Project-URL: Source, https://github.com/igtm/skill-search
License-Expression: MIT
License-File: LICENSE
Requires-Python: >=3.13
Requires-Dist: pyyaml>=6.0.3
Requires-Dist: tantivy>=0.25.1
Description-Content-Type: text/markdown

# skill-search

LLM-framework-agnostic skill search library.

Parses `SKILL.md` files in a skills directory, builds a fast full-text search index using [tantivy](https://github.com/quickwit-oss/tantivy) (Rust-based), and provides tool definitions and execution handlers that work with any LLM library (OpenAI SDK, Anthropic SDK, LangChain, LiteLLM, etc.).

[Japanese README](README_ja.md)

## Installation

```bash
pip install skill-search
```

## Usage

### Basic

```python
from skill_search import SkillSearch

# Initialize with skill directories
ss = SkillSearch(skills_dirs=["./skills"])

# Get tool definitions (OpenAI function calling format)
tools = ss.get_tool_definitions()

# Get system prompt with skill listing
system_prompt = ss.get_system_prompt()

# Execute tool calls from LLM
result = ss.call_tool("search_skills", {"query": "API reference", "top_k": 3})
```

### OpenAI SDK

```python
import json
from openai import OpenAI
from skill_search import SkillSearch

client = OpenAI()
ss = SkillSearch(skills_dirs=["./skills"])

messages = [
    {"role": "system", "content": ss.get_system_prompt()},
    {"role": "user", "content": "How do I use the Figma API?"},
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=ss.get_tool_definitions(),
)

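# Handle tool calls
for choice in response.choices:
    if choice.message.tool_calls:
        # The assistant message that requested the tools must precede the tool results
        messages.append(choice.message)
        for tc in choice.message.tool_calls:
            # call_tool() returns the result as a string
            result = ss.call_tool(
                tc.function.name,
                json.loads(tc.function.arguments),
            )
            messages.append({"role": "tool", "content": result, "tool_call_id": tc.id})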
```

### Anthropic SDK

```python
from anthropic import Anthropic
from skill_search import SkillSearch

client = Anthropic()
ss = SkillSearch(skills_dirs=["./skills"])

# Convert to Anthropic format
tools = [
    {
        "name": t["function"]["name"],
        "description": t["function"]["description"],
        "input_schema": t["function"]["parameters"],
    }
    for t in ss.get_tool_definitions()
]

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,  # required by the Messages API
    system=ss.get_system_prompt(),
    messages=[{"role": "user", "content": "How do I search in Jira?"}],
    tools=tools,
)

# Handle tool use
for block in response.content:
    if block.type == "tool_use":
        # result is a string; return it to the model in a tool_result block
        result = ss.call_tool(block.name, block.input)
```

## Available Tools

| Tool | Description |
|---|---|
| `list_skills` | List all available skills |
| `read_skill` | Read full SKILL.md content |
| `search_skills` | Full-text search (BM25 via tantivy) |
| `read_resource` | Read supplementary resource files |

## SKILL.md Format

```markdown
---
name: my-skill
description: Brief description of the skill
---

# My Skill

## Usage

1. Step 1
2. Step 2
```
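
To illustrate how the frontmatter separates from the body, here is a minimal sketch that splits a `SKILL.md` string on its `---` delimiters. It is not skill-search's own parser: `split_frontmatter` is a hypothetical helper, and it only understands flat `key: value` lines, whereas the library depends on PyYAML and presumably handles full YAML.

```python
def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a SKILL.md string into (metadata, body).

    Hypothetical helper: handles only flat `key: value` pairs, not full YAML.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter block at all

    # Find the closing delimiter after the opening one.
    end = None
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            end = i
            break
    if end is None:
        return {}, text  # unterminated frontmatter: treat everything as body

    meta = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()

    body = "\n".join(lines[end + 1:]).lstrip("\n")
    return meta, body


doc = """---
name: my-skill
description: Brief description of the skill
---

# My Skill
"""

meta, body = split_frontmatter(doc)
print(meta["name"])  # my-skill
print(body)          # # My Skill
```

A file without a frontmatter block comes back with empty metadata and its text unchanged, so callers can decide whether to skip it.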

## How It Works

1. **Discovery** — Recursively scans for `SKILL.md` files and parses YAML frontmatter
2. **Indexing** — Splits documents into heading-level chunks and indexes with tantivy
3. **Search** — BM25 scoring returns results ranked by relevance
4. **Tool Execution** — `call_tool()` executes LLM tool calls and returns results as strings
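
The indexing and search steps above can be sketched in pure Python. This is only an illustration of the idea, not how skill-search or tantivy implement it: `split_by_heading`, `tokenize`, and `bm25_rank` are hypothetical names, the tokenizer is a naive lowercase split, headings inside fenced code blocks are not special-cased, and `k1=1.2` / `b=0.75` are simply commonly used BM25 defaults.

```python
import math
import re
from collections import Counter


def split_by_heading(markdown: str) -> list[dict[str, str]]:
    """Start a new chunk at every ATX heading (#, ##, ...)."""
    chunks = []
    current = {"heading": "", "text": ""}
    for line in markdown.splitlines():
        match = re.match(r"^#{1,6}\s+(.*)", line)
        if match:
            # Flush the previous chunk unless it is completely empty.
            if current["heading"] or current["text"].strip():
                chunks.append(current)
            current = {"heading": match.group(1).strip(), "text": ""}
        else:
            current["text"] += line + "\n"
    if current["heading"] or current["text"].strip():
        chunks.append(current)
    return chunks


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(query: str, chunks: list[dict[str, str]],
              k1: float = 1.2, b: float = 0.75) -> list[tuple[float, str]]:
    """Return (score, heading) pairs sorted by descending BM25 score."""
    if not chunks:
        return []
    docs = [tokenize(c["heading"] + " " + c["text"]) for c in chunks]
    n = len(docs)
    avg_len = (sum(len(d) for d in docs) / n) or 1.0
    # Number of chunks that contain each term at least once.
    doc_freq = Counter(term for d in docs for term in set(d))
    query_terms = tokenize(query)

    results = []
    for chunk, doc in zip(chunks, docs):
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        results.append((score, chunk["heading"]))
    return sorted(results, key=lambda r: r[0], reverse=True)


sample = """# My Skill

Overview of the skill.

## Usage

Call the API with a token.

## Limits

Rate limits apply to the API.
"""

chunks = split_by_heading(sample)
ranking = bm25_rank("token", chunks)
print(ranking[0][1])  # Usage
```

Chunking at heading level means a query can match the one section that answers it rather than the whole document, which is the likely motivation for step 2.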

## Security

Arguments from LLM tool calls are treated as untrusted input and checked at several independent layers (defense-in-depth).

### Path Traversal Prevention

| Layer | Location | Protection |
|---|---|---|
| Discovery | `discover_resources()` | Symlinks resolved; paths outside skill directory excluded |
| Input validation | `read_resource` handler | Resource names containing `..` are rejected |
| Path resolution | `read_resource` handler | Resolved paths verified to be within skill directory |
| Whitelist | `read_resource` handler | Only pre-discovered resources are accessible |
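
The three `read_resource` layers in the table can be sketched with `pathlib` as follows. This is an assumption about how such checks might look, not the library's actual handler: `resolve_resource` and the `allowed` set are hypothetical, with `allowed` standing in for whatever `discover_resources()` collected.

```python
import tempfile
from pathlib import Path


def resolve_resource(skill_dir: Path, name: str, allowed: set[Path]) -> Path:
    """Return the resolved path for `name`, or raise ValueError (hypothetical helper)."""
    # Input validation: reject any name containing "..".
    if ".." in name:
        raise ValueError("'..' is not allowed in resource names")

    # Path resolution: follow symlinks, then verify the directory boundary.
    root = skill_dir.resolve()
    candidate = (root / name).resolve()
    if not candidate.is_relative_to(root):
        raise ValueError("resource escapes the skill directory")

    # Whitelist: only resources found during discovery are served.
    if candidate not in allowed:
        raise ValueError("resource was not discovered")
    return candidate


with tempfile.TemporaryDirectory() as tmp:
    skill_dir = Path(tmp) / "my-skill"
    skill_dir.mkdir()
    (skill_dir / "reference.md").write_text("# Reference\n")
    allowed = {(skill_dir / "reference.md").resolve()}

    ok = resolve_resource(skill_dir, "reference.md", allowed)

    outcomes = []
    for bad in ["../secret.txt", "/etc/passwd", "unlisted.md"]:
        try:
            resolve_resource(skill_dir, bad, allowed)
            outcomes.append("allowed")
        except ValueError as exc:
            outcomes.append(str(exc))

print(ok.name)   # reference.md
print(outcomes)  # one rejection message per bad name
```

Each bad name is stopped by a different layer: `..` by input validation, the absolute path by the boundary check, and the undiscovered file by the whitelist.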

### Extension Filter

Only these file extensions are indexed as resources:

`.md`, `.json`, `.yaml`, `.yml`, `.csv`, `.xml`, `.txt`

Executable files (`.py`, `.sh`, `.exe`, etc.) and binaries are excluded.
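
A filter of this kind might look like the sketch below. `ALLOWED_EXTENSIONS` and `is_indexable` are hypothetical names, and matching case-insensitively on the final extension only is an assumption rather than documented behavior.

```python
from pathlib import Path

# The extensions listed above; everything else is ignored.
ALLOWED_EXTENSIONS = {".md", ".json", ".yaml", ".yml", ".csv", ".xml", ".txt"}


def is_indexable(path: str) -> bool:
    """True when the file's final extension is on the allow-list."""
    return Path(path).suffix.lower() in ALLOWED_EXTENSIONS


names = ["guide.md", "schema.JSON", "run.sh", "tool.py", "notes.txt.exe", "README"]
indexable = [n for n in names if is_indexable(n)]
print(indexable)  # ['guide.md', 'schema.JSON']
```

Checking only the final suffix means a name such as `notes.txt.exe` is rejected even though it contains an allowed extension.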

### Design Principles

- **Read-only** — No write or execute capabilities
- **Whitelist-based** — Only pre-validated resources are accessible
- **Defense-in-depth** — Input validation → path resolution → directory boundary check

## Development

```bash
uv run pytest tests/ -v
```

## License

MIT
