# kenso — Full Reference

> Your docs are the source of truth. Make them searchable.

kenso is an MCP server that turns a folder of Markdown files into a searchable knowledge base for AI agents. SQLite FTS5, zero config, two dependencies (`mcp`, `aiosqlite`).

## Why kenso

- Search returns ranked excerpts (~600 tokens) instead of entire files (3,000–8,000+ tokens).
- No format conversion, no lock-in — your Markdown files stay as they are.
- Deterministic results — same query, same answer, every time.
- Document graph with typed relations — agents navigate context, not just keywords.
- Two dependencies. No vector DB, no embedding model.

| Capability | Raw file reading | kenso (BM25) | Embedding-based RAG |
|------------|-----------------|--------------|---------------------|
| Setup | None | `pip install` + `ingest` | Vector DB + embedding model |
| Token cost per query | 3,000–8,000+ | ~600 | ~600 |
| Vocabulary mismatch | None | Aliases + tags + cascade | Semantic similarity |
| Document relationships | None | Typed graph with traversal | None (flat retrieval) |
| Deterministic results | Yes | Yes | No (embedding drift) |
| Dependencies | None | 2 | 5+ |

## Quick start

```bash
pip install kenso              # core (Python 3.11+, MIT license)
pip install "kenso[yaml]"      # recommended: YAML frontmatter support (quotes keep zsh from globbing)
kenso ingest ./docs
kenso serve
```

```bash
# Test it before connecting to an editor
kenso search "how does authentication work"
# → title: "Authentication Flow"
#   score: 12.4
#   content_preview: "OAuth 2.0 with PKCE. The client redirects to /authorize..."
```

Python 3.11+ · MIT license · Typed (py.typed) · CI + 75% coverage threshold

---

# Usage Reference

## MCP Tools

### search_docs

Keyword search with BM25 ranking, deduplication, and relation re-ranking.

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `query` | str | yes | — | Search query text |
| `category` | str \| None | no | None | Filter by exact category ("all", "", or None = no filter) |
| `limit` | int | no | 5 | Max results (capped to KENSO_SEARCH_LIMIT_MAX) |

Returns: JSON array of `{file_path, title, category, content_preview, score, tags?, related_count?, highlight?}`.

On empty query: returns `{"error": "Empty query."}`.

### search_multi

Multi-query search with Reciprocal Rank Fusion (RRF).

Each query runs independently through `search_docs`, and the ranked lists are merged with RRF: `score = sum(1 / (K + rank))` over every query in which a document appears, with K = 60. At most 5 queries are run per call.

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `queries` | list[str] | yes | — | List of search query strings (max 5) |
| `category` | str \| None | no | None | Filter by exact category |
| `limit` | int | no | 5 | Max results after merge (capped to KENSO_SEARCH_LIMIT_MAX) |

Returns: same shape as `search_docs`.

### get_doc

Retrieve full document content by file path.

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `path` | str | yes | — | Document file path |
| `max_length` | int \| None | no | None | Max characters to return |

Returns: JSON object `{title, content, category, audience, tags, truncated?}`.

On non-existent path: returns `{"error": "No document found at path: <path>"}`.

### get_related

Navigate the document graph with configurable depth and an optional relation-type filter.

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `path` | str | yes | — | Document file path |
| `depth` | int | no | 1 | Hops to traverse (1=direct, max 3) |
| `relation_type` | str \| None | no | None | Filter by relation type |

Returns: JSON array of `{related_path, relation_type, direction, depth}`. Direction is "incoming" or "outgoing". Cycle-safe via visited set.

On non-existent path: returns empty array `[]` (no error — the path simply has no links).

---

## Frontmatter

Frontmatter is parsed with PyYAML when it is installed (`pip install "kenso[yaml]"`); otherwise a regex-based fallback parser is used.

### Supported fields

| Field | Type | Indexed | Description |
|-------|------|---------|-------------|
| `title` | str | 10× weight | Document title |
| `category` | str | 5× weight | Category |
| `tags` | str or list | 7× weight | Tags (comma-separated or list) |
| `aliases` | list[str] | in searchable_content | Vocabulary bridges — lets users find the doc with different terminology |
| `answers` | list[str] | in searchable_content | Question-form queries — matches natural language questions |
| `description` | str | in searchable_content | Document description |
| `relates_to` | str, list, or list of dicts | links table | Document relationships |

### Full frontmatter example

```yaml
---
title: Settlement Lifecycle for Equity Trades    # indexed at 10×
category: post-trade                              # indexed at 5×
tags: [settlement, clearing, T+2, DVP, CNMV]    # indexed at 7×
aliases:                                          # injected into searchable_content
  - trade settlement
  - post-trade processing
  - liquidación de operaciones
answers:                                          # injected into searchable_content
  - How are equity trades settled?
  - What is the T+2 settlement cycle?
  - What happens when a settlement fails?
relates_to:
  - path: order-management/matching-engine.md
    relation: receives_from
  - path: compliance/cnmv-reporting.md
    relation: triggers
---
```

### relates_to formats

```yaml
# Simple string
relates_to: guides/setup.md

# List of paths (all get relation_type "related")
relates_to:
  - guides/setup.md
  - compliance/reporting.md

# Typed relations
relates_to:
  - path: guides/setup.md
    relation: feeds_into
  - path: compliance/reporting.md
    relation: triggers
```
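All three shapes reduce to a list of `(path, relation)` pairs before they reach the `links` table. The sketch below shows that normalization; `normalize_relates_to` is a hypothetical helper, not kenso's API, and treating a typed entry that omits `relation` as `"related"` is an assumption.

```python
from typing import Any

DEFAULT_RELATION = "related"


def normalize_relates_to(value: Any) -> list[tuple[str, str]]:
    """Hypothetical helper: turn any relates_to shape into (path, relation) pairs."""
    if value is None:
        return []
    if isinstance(value, str):
        # Simple string: one untyped link.
        return [(value, DEFAULT_RELATION)]
    pairs: list[tuple[str, str]] = []
    for item in value:
        if isinstance(item, str):
            # Plain list entry: untyped link.
            pairs.append((item, DEFAULT_RELATION))
        elif isinstance(item, dict) and "path" in item:
            # Typed entry; assumption: a missing relation falls back to "related".
            pairs.append((item["path"], item.get("relation") or DEFAULT_RELATION))
    return pairs
```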

### Relation type conventions

| Type | Meaning |
|------|---------|
| `feeds_into` | Upstream data flow |
| `receives_from` | Downstream dependency |
| `triggers` | Causal relationship |
| `contains` | Parent-child hierarchy |
| `part_of` | Inverse of contains |
| `monitors` | Observational relationship |
| `implements` | Specification to implementation |
| `related` | Default fallback |

---

## CLI commands

### kenso ingest \<path\>

Scan a directory for `.md` files and load them into the database.

**Content hashing:** each file's entire raw text, frontmatter included, is hashed with SHA-256 and truncated to 16 hex characters. Files whose hash has not changed are skipped. Because the frontmatter is part of the hashed text, frontmatter-only changes (e.g., adding a tag) *are* detected.

### kenso serve

Start the MCP server. The transport is selected with `KENSO_TRANSPORT` (default `stdio`).

### kenso search \<query\>

Search from the command line. Prints the top 5 results with score, path, title, and highlight.

### kenso stats

Show database statistics. Prints a human-readable table with docs, chunks, content_bytes, categories (with counts), and links count.

---

## Database resolution

kenso resolves the database with a local-first cascade:

1. `KENSO_DATABASE_URL` env var — explicit override, always wins
2. `.kenso/docs.db` in the current directory — project-local (default for new projects)
3. `~/.local/share/kenso/docs.db` — global fallback

Each project gets its own isolated database by default. Add `.kenso/` to your `.gitignore`.

## Environment variables

| Variable | Type | Default | Description |
|----------|------|---------|-------------|
| `KENSO_DATABASE_URL` | str | (cascade above) | SQLite database path override |
| `KENSO_TRANSPORT` | str | `stdio` | Transport protocol: `stdio`, `sse`, or `streamable-http` |
| `KENSO_HOST` | str | `127.0.0.1` | Server host (sse/streamable-http only) |
| `KENSO_PORT` | str | `8000` | Server port (sse/streamable-http only) |
| `KENSO_CHUNK_SIZE` | int | `4000` | Max chunk size in characters |
| `KENSO_CHUNK_OVERLAP` | int | `0` | Characters carried over from the previous chunk |
| `KENSO_CONTENT_PREVIEW_CHARS` | int | `200` | Preview length in search results |
| `KENSO_SEARCH_LIMIT_MAX` | int | `20` | Maximum results per search |
| `KENSO_LOG_LEVEL` | str | `INFO` | Logging level |

---

## Editor configuration

**Cursor** `.cursor/mcp.json`:
```json
{ "mcpServers": { "kenso": { "command": "kenso", "args": ["serve"] } } }
```

**Claude Code** `.claude/mcp.json`:
```json
{ "mcpServers": { "kenso": { "command": "kenso", "args": ["serve"] } } }
```

**Codex CLI** `~/.codex/config.toml`:
```toml
[mcp_servers.kenso]
command = "kenso"
args = ["serve"]
```

If kenso is installed in a virtualenv, set `command` to the executable's full path, e.g. `/path/to/.venv/bin/kenso`.

---

# Internals

## Search engine

### Cascade strategy: AND → NEAR → OR

1. **AND** — all terms must be present (highest precision)
2. **NEAR** — terms within 10 tokens of each other
3. **OR** — any term present (highest recall)

The first stage returning ≥3 results (or `limit` if smaller) wins.
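A sketch of the cascade, assuming the terms are already processed and a caller-supplied `run` callable executes one FTS5 `MATCH` expression and returns its ranked results. `cascade_expressions` and `cascade_search` are hypothetical names; quoting each term as an FTS5 string and keeping the OR results when no stage reaches the threshold are assumptions.

```python
from collections.abc import Callable


def quote(term: str) -> str:
    # Wrap a term as an FTS5 string; embedded double quotes are doubled.
    return '"' + term.replace('"', '""') + '"'


def cascade_expressions(terms: list[str], near_distance: int = 10) -> list[tuple[str, str]]:
    quoted = [quote(t) for t in terms]
    stages = [("AND", " AND ".join(quoted))]
    if len(quoted) > 1:
        # NEAR(a b c, N): all phrases within N tokens of each other.
        stages.append(("NEAR", f"NEAR({' '.join(quoted)}, {near_distance})"))
    stages.append(("OR", " OR ".join(quoted)))
    return stages


def cascade_search(terms: list[str], limit: int, run: Callable[[str], list]) -> tuple[str, list]:
    if not terms:
        return "", []
    threshold = min(3, limit)
    stage, results = "", []
    for stage, expression in cascade_expressions(terms):
        results = run(expression)
        if len(results) >= threshold:
            break
    # Assumption: if no stage reaches the threshold, the OR stage's results are kept.
    return stage, results[:limit]
```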

### FTS5 column weights

| Column | Weight | Content |
|--------|--------|---------|
| `title` | 10× | Document title + section path |
| `section_path` | 8× | Heading hierarchy (e.g., "Settlement > DVP Mechanics") |
| `tags` | 7× | Comma-separated tags from frontmatter |
| `category` | 5× | Document category |
| `searchable_content` | 1× | Chunk text + aliases + answers + keywords + source path |

Tokenizer: `porter unicode61 remove_diacritics 2`.
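FTS5 applies per-column weights through its `bm25()` auxiliary function, passed in column order. The sketch below uses a standalone FTS5 table named `demo_fts` (not kenso's external-content `chunks_fts`) so it runs on its own; negating `bm25()` to get a positive, higher-is-better score is an assumption about how kenso produces the `score` it reports.

```python
import sqlite3

# Weights in FTS5 column order: title, section_path, tags, category, searchable_content.
WEIGHTS = (10.0, 8.0, 7.0, 5.0, 1.0)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE VIRTUAL TABLE demo_fts USING fts5("
    "title, section_path, tags, category, searchable_content, "
    "tokenize='porter unicode61 remove_diacritics 2')"
)
conn.executemany(
    "INSERT INTO demo_fts (title, section_path, tags, category, searchable_content) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        # "settlement" appears in the heavily weighted title and section_path columns...
        ("Settlement Lifecycle", "Settlement", "clearing", "post-trade",
         "How equity trades are finalised."),
        # ...and here only in the 1x searchable_content column.
        ("Order Matching", "Matching", "orders", "trading",
         "Matched orders move on to settlement."),
    ],
)


def weighted_search(match: str) -> list[tuple[str, float]]:
    # bm25() returns lower (more negative) values for better matches,
    # so negate it to get a "higher is better" score.
    weights = ", ".join(str(w) for w in WEIGHTS)
    sql = (
        f"SELECT title, -bm25(demo_fts, {weights}) AS score "
        "FROM demo_fts WHERE demo_fts MATCH ? ORDER BY score DESC"
    )
    return conn.execute(sql, (match,)).fetchall()
```

`weighted_search("settlement")` ranks "Settlement Lifecycle" above "Order Matching" because its hits land in the 10× and 8× columns.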

### Query processing

- Expands camelCase/snake_case: `orderMatchingEngine` → `[order, matching, engine]`
- Single words with 3+ chars get prefix fallback: `word` → `[word, word*]`
- Escapes FTS5 special chars: `"*()+-^:`
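A simplified sketch of these three steps. `process_query` and `split_identifier` are hypothetical; special characters are neutralized by replacing them with spaces (kenso's actual escaping may differ), and "single words" is read as a query that yields exactly one term.

```python
import re

FTS5_SPECIAL = '"*()+-^:'
_LOWER_UPPER = re.compile(r"(?<=[a-z0-9])(?=[A-Z])")    # orderMatching -> order Matching
_ACRONYM_WORD = re.compile(r"(?<=[A-Z])(?=[A-Z][a-z])")  # HTTPServer -> HTTP Server


def split_identifier(token: str) -> list[str]:
    # Break camelCase boundaries, then split on whitespace and underscores.
    spaced = _ACRONYM_WORD.sub(" ", _LOWER_UPPER.sub(" ", token))
    return [part.lower() for part in re.split(r"[\s_]+", spaced) if part]


def process_query(query: str) -> list[str]:
    # Sketch: neutralize FTS5 operator characters by replacing them with spaces.
    cleaned = query.translate({ord(ch): " " for ch in FTS5_SPECIAL})
    terms: list[str] = []
    for word in cleaned.split():
        terms.extend(split_identifier(word))
    # Prefix fallback for a single term of 3+ characters: word -> [word, word*].
    if len(terms) == 1 and len(terms[0]) >= 3:
        terms.append(terms[0] + "*")
    return terms
```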

### Deduplication

Keeps only the best chunk per document (the one with the highest BM25 score).
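A sketch of the per-document deduplication, assuming each hit carries a `file_path` and a score where higher is better. The `Hit` type and `dedupe_best_chunk` are hypothetical.

```python
from typing import TypedDict


class Hit(TypedDict):
    file_path: str
    chunk_index: int
    score: float  # assumption: higher is better


def dedupe_best_chunk(hits: list[Hit]) -> list[Hit]:
    best: dict[str, Hit] = {}
    for hit in hits:
        current = best.get(hit["file_path"])
        # Keep the highest-scoring chunk seen so far for each document.
        if current is None or hit["score"] > current["score"]:
            best[hit["file_path"]] = hit
    # Re-sort so the surviving chunks stay in score order.
    return sorted(best.values(), key=lambda h: h["score"], reverse=True)
```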

### Relation re-ranking

Boosts the score of each result that has `relates_to` links to other results in the same result set, where `connection_count` is the number of such connections:
`boosted_score = score * (1 + 0.15 * connection_count)`
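A sketch of the boost, assuming links are given as `(source_path, target_path)` pairs with relation type ignored, and that `connection_count` counts each other result linked in either direction once. `rerank_by_relations` is a hypothetical helper.

```python
def rerank_by_relations(
    results: list[dict], links: set[tuple[str, str]], boost: float = 0.15
) -> list[dict]:
    paths = {r["file_path"] for r in results}
    reranked = []
    for r in results:
        me = r["file_path"]
        # Other results linked to this one, in either direction.
        connection_count = sum(
            1 for other in paths
            if other != me and ((me, other) in links or (other, me) in links)
        )
        # Copy the dict so the caller's results are not mutated.
        reranked.append({**r, "score": r["score"] * (1 + boost * connection_count)})
    return sorted(reranked, key=lambda r: r["score"], reverse=True)
```

With scores `a=10.0`, `b=9.0`, `c=9.5` and a single link between `b` and `c`, both linked results get ×1.15 and overtake `a`.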

### Reciprocal Rank Fusion (search_multi)

Each query runs independently. Results are merged with:
`rrf_score = sum(1 / (60 + rank))` across all queries where the document appears. A higher score means the document appeared in more queries and at better (lower-numbered) ranks.
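A sketch of the fusion step over per-query rankings of file paths. `rrf_merge` is hypothetical; 1-based ranks and truncating to the first 5 queries are assumptions consistent with the description of `search_multi`.

```python
from collections import defaultdict


def rrf_merge(
    rankings: list[list[str]], k: int = 60, limit: int = 5, max_queries: int = 5
) -> list[tuple[str, float]]:
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings[:max_queries]:
        # Assumption: ranks are 1-based (the top result has rank 1).
        for rank, path in enumerate(ranking, start=1):
            scores[path] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:limit]
```

With rankings `[a, b, c]` and `[b, d]`, `b` scores 1/62 + 1/61 ≈ 0.0325 and comes first, followed by `a` (1/61 ≈ 0.0164), `d` (1/62 ≈ 0.0161), and `c` (1/63 ≈ 0.0159).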

---

## Document graph

### Structure

- Links are extracted from `relates_to` frontmatter and traversed in both directions (reported as `outgoing` or `incoming`).
- Stored in `links` table: `(source_path, target_path, relation_type)`.
- Default relation type: `"related"`.

### Traversal (get_related)

- Depth capped at 3 (prevents runaway traversal).
- Deduplicates by `(related_path, relation_type, direction)`.
- Skips self-references.
- Cycle-safe via visited set.

---

## Chunking

1. Split by H2 headings (primary boundary).
2. Pre-H2 content → preamble chunk (title ends with "— Overview").
3. Oversized H2 sections split at H3, then H4 (max 4 heading levels).
4. Still oversized → split at paragraph boundaries (`\n\n`).
5. Never splits inside fenced code blocks or tables (protected ranges).
6. Overlap: prepend the last `KENSO_CHUNK_OVERLAP` characters of the previous chunk (not applied to the first chunk or the preamble chunk).
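A reduced sketch of steps 1, 2, 5 (fenced code only) and 6. It skips the H3/H4 and paragraph fallbacks and table protection. The helper names are hypothetical, any line starting with three backticks or tildes toggles the fence state (a simplification), the `"{title} — Overview"` format is assumed from the description above, and the newline between the overlap text and the chunk body is an assumption.

```python
import re

H2 = re.compile(r"^## (.+)$")
FENCE_MARKERS = ("`" * 3, "~" * 3)


def split_h2(markdown: str, doc_title: str) -> list[tuple[str, str]]:
    """Split at H2 headings, ignoring '## ' lines inside fenced code blocks."""
    sections: list[tuple[str, str]] = []
    title = f"{doc_title} — Overview"  # content before the first H2
    buf: list[str] = []
    in_fence = False
    for line in markdown.splitlines():
        if line.startswith(FENCE_MARKERS):
            in_fence = not in_fence  # simplification: any fence line toggles
        heading = None if in_fence else H2.match(line)
        if heading:
            body = "\n".join(buf).strip()
            if body:
                sections.append((title, body))
            title, buf = heading.group(1).strip(), []
        buf.append(line)
    body = "\n".join(buf).strip()
    if body:
        sections.append((title, body))
    return sections


def apply_overlap(sections: list[tuple[str, str]], overlap: int) -> list[tuple[str, str]]:
    """Prepend the tail of the previous chunk to every chunk except the first."""
    if overlap <= 0 or not sections:
        return list(sections)
    out = [sections[0]]  # first (or preamble) chunk gets no overlap
    for (_, prev_body), (title, body) in zip(sections, sections[1:]):
        out.append((title, prev_body[-overlap:] + "\n" + body))
    return out
```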

### searchable_content construction

Each chunk's searchable content is assembled as:

```
[chunk_content]

Also known as: [aliases joined by comma]

Questions this document answers: [answers joined by |]

[description]

Keywords: [tags joined by comma]

Source: [file_path]
```
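The layout above, expressed as a function. `build_searchable_content` is hypothetical; the exact separators (`", "` and `" | "`) and omitting empty sections are assumptions.

```python
from collections.abc import Sequence


def build_searchable_content(
    chunk: str,
    file_path: str,
    aliases: Sequence[str] = (),
    answers: Sequence[str] = (),
    description: str = "",
    tags: Sequence[str] = (),
) -> str:
    parts = [chunk]
    # Assumption: sections with nothing in them are omitted entirely.
    if aliases:
        parts.append("Also known as: " + ", ".join(aliases))
    if answers:
        parts.append("Questions this document answers: " + " | ".join(answers))
    if description:
        parts.append(description)
    if tags:
        parts.append("Keywords: " + ", ".join(tags))
    parts.append(f"Source: {file_path}")
    # Blank line between sections, matching the layout above.
    return "\n\n".join(parts)
```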

---

## SQLite schema

**chunks** — main content table:
```sql
CREATE TABLE chunks (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    file_path TEXT NOT NULL,
    chunk_index INTEGER NOT NULL DEFAULT 0,
    title TEXT,
    section_path TEXT,
    content TEXT NOT NULL,
    searchable_content TEXT,
    category TEXT,
    audience TEXT DEFAULT 'all',
    tags TEXT,  -- JSON array
    content_hash TEXT,
    created_at TEXT DEFAULT (strftime('%Y-%m-%dT%H:%M:%fZ', 'now')),
    updated_at TEXT DEFAULT (strftime('%Y-%m-%dT%H:%M:%fZ', 'now')),
    UNIQUE (file_path, chunk_index)
);
```

Indexes: `idx_chunks_path` (file_path), `idx_chunks_cat` (category).

**chunks_fts** — FTS5 virtual table:
```sql
CREATE VIRTUAL TABLE chunks_fts USING fts5(
    title, section_path, tags, category, searchable_content,
    content='chunks', content_rowid='id',
    tokenize='porter unicode61 remove_diacritics 2'
);
```

Kept in sync with `chunks` by triggers on INSERT, UPDATE, and DELETE.
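Such triggers can follow SQLite's documented pattern for external-content FTS5 tables, where removals go through the special `'delete'` command with the old column values. This sketch assumes that pattern (kenso's actual trigger names and bodies may differ) and uses a trimmed `chunks` table without `audience`, `content_hash`, or the timestamp columns.

```python
import sqlite3

SCHEMA = """
CREATE TABLE chunks (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    file_path TEXT NOT NULL,
    chunk_index INTEGER NOT NULL DEFAULT 0,
    title TEXT, section_path TEXT, content TEXT NOT NULL,
    searchable_content TEXT, category TEXT, tags TEXT,
    UNIQUE (file_path, chunk_index)
);
CREATE VIRTUAL TABLE chunks_fts USING fts5(
    title, section_path, tags, category, searchable_content,
    content='chunks', content_rowid='id',
    tokenize='porter unicode61 remove_diacritics 2'
);
CREATE TRIGGER chunks_ai AFTER INSERT ON chunks BEGIN
    INSERT INTO chunks_fts(rowid, title, section_path, tags, category, searchable_content)
    VALUES (new.id, new.title, new.section_path, new.tags, new.category, new.searchable_content);
END;
CREATE TRIGGER chunks_ad AFTER DELETE ON chunks BEGIN
    INSERT INTO chunks_fts(chunks_fts, rowid, title, section_path, tags, category, searchable_content)
    VALUES ('delete', old.id, old.title, old.section_path, old.tags, old.category, old.searchable_content);
END;
CREATE TRIGGER chunks_au AFTER UPDATE ON chunks BEGIN
    INSERT INTO chunks_fts(chunks_fts, rowid, title, section_path, tags, category, searchable_content)
    VALUES ('delete', old.id, old.title, old.section_path, old.tags, old.category, old.searchable_content);
    INSERT INTO chunks_fts(rowid, title, section_path, tags, category, searchable_content)
    VALUES (new.id, new.title, new.section_path, new.tags, new.category, new.searchable_content);
END;
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)


def matching_ids(query: str) -> list[int]:
    # Reads the FTS index, so it only sees what the triggers have indexed.
    rows = conn.execute(
        "SELECT rowid FROM chunks_fts WHERE chunks_fts MATCH ? ORDER BY rowid", (query,)
    )
    return [row[0] for row in rows]


conn.execute(
    "INSERT INTO chunks (file_path, title, content, searchable_content) "
    "VALUES ('post-trade/settlement.md', 'Settlement', 'DVP mechanics', 'DVP mechanics')"
)
```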

**links** — document relationships:
```sql
CREATE TABLE links (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    source_path TEXT NOT NULL,
    target_path TEXT NOT NULL,
    relation_type TEXT NOT NULL DEFAULT 'related',
    created_at TEXT DEFAULT (strftime('%Y-%m-%dT%H:%M:%fZ', 'now')),
    UNIQUE (source_path, target_path, relation_type)
);
```

Indexes: `idx_links_source` (source_path), `idx_links_target` (target_path).

---

## Concurrency

SQLite runs in WAL (Write-Ahead Logging) mode: multiple readers can operate concurrently, and a single writer proceeds without blocking them. Running several `kenso serve` instances against the same database is therefore safe. Concurrent writes are serialized by SQLite's write lock: the second writer waits for the lock, up to the connection's busy timeout, instead of failing immediately. High-write concurrency is not a target use case.
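A small demonstration of this behavior with Python's built-in `sqlite3` (kenso itself uses `aiosqlite`). `enable_wal` and `wal_demo` are hypothetical helpers; the `timeout` argument sets how long a blocked writer waits before raising "database is locked".

```python
import os
import sqlite3
import tempfile


def enable_wal(conn: sqlite3.Connection) -> str:
    # journal_mode=WAL is persistent: once set, it is recorded in the database file.
    return conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]


def wal_demo() -> tuple[str, int, int]:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "docs.db")
        writer = sqlite3.connect(path, timeout=5.0)
        mode = enable_wal(writer)
        writer.execute("CREATE TABLE t (x INTEGER)")
        writer.execute("INSERT INTO t VALUES (1)")
        writer.commit()

        writer.execute("BEGIN IMMEDIATE")  # take the write lock and hold it
        writer.execute("INSERT INTO t VALUES (2)")

        reader = sqlite3.connect(path, timeout=5.0)
        # Not blocked by the open write transaction; sees the last committed state.
        during = reader.execute("SELECT COUNT(*) FROM t").fetchone()[0]
        writer.commit()
        after = reader.execute("SELECT COUNT(*) FROM t").fetchone()[0]

        reader.close()
        writer.close()
    return mode, during, after
```

`wal_demo()` reports the `wal` journal mode, a count of 1 while the write is in progress, and 2 once it commits.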
