Metadata-Version: 2.3
Name: cocosearch
Version: 0.1.2
Summary: Local-first code search via MCP/CLI
Author: VioletCranberry
Author-email: VioletCranberry <zh6an0w.fedor@gmail.com>
Requires-Dist: cocoindex[embeddings]>=0.3.31
Requires-Dist: mcp[cli]>=1.26.0
Requires-Dist: pathspec>=1.0.3
Requires-Dist: pgvector>=0.4.2
Requires-Dist: psycopg[binary,pool]>=3.3.2
Requires-Dist: pyyaml>=6.0.2
Requires-Dist: rich>=13.0.0
Requires-Dist: tree-sitter>=0.25.0,<0.26.0
Requires-Dist: tree-sitter-language-pack>=0.13.0
Requires-Python: >=3.11
Description-Content-Type: text/markdown

<p align="center">
  <img src="./docs/banner.svg" alt="Coco[-S]earch — Local-first hybrid semantic code search" width="960">
</p>

<p align="center">
  <a href="https://pypi.org/project/cocosearch/"><img src="https://img.shields.io/pypi/v/cocosearch?color=blue&logo=pypi&logoColor=white" alt="PyPI"></a>
  <a href="https://www.python.org/"><img src="https://img.shields.io/badge/python-%3E%3D3.11-blue?logo=python&logoColor=white" alt="Python >= 3.11"></a>
  <a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-green" alt="License: MIT"></a>
  <a href="https://github.com/astral-sh/ruff"><img src="https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json" alt="Ruff"></a>
  <a href="https://github.com/astral-sh/uv"><img src="https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/uv/main/assets/badge/v0.json" alt="uv"></a>
  <a href="https://docs.pytest.org/"><img src="https://img.shields.io/badge/tests-pytest-blue?logo=pytest&logoColor=white" alt="pytest"></a>
  <a href="https://modelcontextprotocol.io/"><img src="https://img.shields.io/badge/MCP-compatible-8A2BE2?logo=anthropic&logoColor=white" alt="MCP"></a>
</p>

<p align="center">
  <a href="#supported-languages"><img src="https://img.shields.io/badge/Bash-4EAA25?logo=gnubash&logoColor=white" alt="Bash"></a>
  <a href="#supported-languages"><img src="https://img.shields.io/badge/C-A8B9CC?logo=c&logoColor=white" alt="C"></a>
  <a href="#supported-languages"><img src="https://img.shields.io/badge/C%2B%2B-00599C?logo=cplusplus&logoColor=white" alt="C++"></a>
  <a href="#supported-languages"><img src="https://img.shields.io/badge/C%23-512BD4?logo=csharp&logoColor=white" alt="C#"></a>
  <a href="#supported-languages"><img src="https://img.shields.io/badge/CSS-1572B6?logo=css3&logoColor=white" alt="CSS"></a>
  <a href="#supported-languages"><img src="https://img.shields.io/badge/Dockerfile-2496ED?logo=docker&logoColor=white" alt="Dockerfile"></a>
  <a href="#supported-languages"><img src="https://img.shields.io/badge/DTD-7A7A7A" alt="DTD"></a>
  <a href="#supported-languages"><img src="https://img.shields.io/badge/Fortran-734F96?logo=fortran&logoColor=white" alt="Fortran"></a>
  <a href="#supported-languages"><img src="https://img.shields.io/badge/Go-00ADD8?logo=go&logoColor=white" alt="Go"></a>
  <a href="#supported-languages"><img src="https://img.shields.io/badge/Groovy-4298B8?logo=apachegroovy&logoColor=white" alt="Groovy"></a>
  <a href="#supported-languages"><img src="https://img.shields.io/badge/HCL-844FBA?logo=terraform&logoColor=white" alt="HCL"></a>
  <a href="#supported-languages"><img src="https://img.shields.io/badge/HTML-E34F26?logo=html5&logoColor=white" alt="HTML"></a>
  <a href="#supported-languages"><img src="https://img.shields.io/badge/Java-ED8B00?logo=openjdk&logoColor=white" alt="Java"></a>
  <a href="#supported-languages"><img src="https://img.shields.io/badge/JavaScript-F7DF1E?logo=javascript&logoColor=black" alt="JavaScript"></a>
  <a href="#supported-languages"><img src="https://img.shields.io/badge/JSON-000000?logo=json&logoColor=white" alt="JSON"></a>
  <a href="#supported-languages"><img src="https://img.shields.io/badge/Kotlin-7F52FF?logo=kotlin&logoColor=white" alt="Kotlin"></a>
  <a href="#supported-languages"><img src="https://img.shields.io/badge/Markdown-000000?logo=markdown&logoColor=white" alt="Markdown"></a>
  <a href="#supported-languages"><img src="https://img.shields.io/badge/Pascal-0364B8" alt="Pascal"></a>
  <a href="#supported-languages"><img src="https://img.shields.io/badge/PHP-777BB4?logo=php&logoColor=white" alt="PHP"></a>
  <a href="#supported-languages"><img src="https://img.shields.io/badge/Python-3776AB?logo=python&logoColor=white" alt="Python"></a>
  <a href="#supported-languages"><img src="https://img.shields.io/badge/R-276DC3?logo=r&logoColor=white" alt="R"></a>
  <a href="#supported-languages"><img src="https://img.shields.io/badge/Ruby-CC342D?logo=ruby&logoColor=white" alt="Ruby"></a>
  <a href="#supported-languages"><img src="https://img.shields.io/badge/Rust-000000?logo=rust&logoColor=white" alt="Rust"></a>
  <a href="#supported-languages"><img src="https://img.shields.io/badge/Scala-DC322F?logo=scala&logoColor=white" alt="Scala"></a>
  <a href="#supported-languages"><img src="https://img.shields.io/badge/Solidity-363636?logo=solidity&logoColor=white" alt="Solidity"></a>
  <a href="#supported-languages"><img src="https://img.shields.io/badge/SQL-336791" alt="SQL"></a>
  <a href="#supported-languages"><img src="https://img.shields.io/badge/Swift-F05138?logo=swift&logoColor=white" alt="Swift"></a>
  <a href="#supported-languages"><img src="https://img.shields.io/badge/TOML-9C4121?logo=toml&logoColor=white" alt="TOML"></a>
  <a href="#supported-languages"><img src="https://img.shields.io/badge/TypeScript-3178C6?logo=typescript&logoColor=white" alt="TypeScript"></a>
  <a href="#supported-languages"><img src="https://img.shields.io/badge/XML-0060AC" alt="XML"></a>
  <a href="#supported-languages"><img src="https://img.shields.io/badge/YAML-CB171E?logo=yaml&logoColor=white" alt="YAML"></a>
</p>

<p align="center">
  <a href="#supported-grammars"><img src="https://img.shields.io/badge/Docker_Compose-2496ED?logo=docker&logoColor=white" alt="Docker Compose"></a>
  <a href="#supported-grammars"><img src="https://img.shields.io/badge/GitHub_Actions-2088FF?logo=githubactions&logoColor=white" alt="GitHub Actions"></a>
  <a href="#supported-grammars"><img src="https://img.shields.io/badge/GitLab_CI-FC6D26?logo=gitlab&logoColor=white" alt="GitLab CI"></a>
  <a href="#supported-grammars"><img src="https://img.shields.io/badge/Helm_Template-0F1689?logo=helm&logoColor=white" alt="Helm Template"></a>
  <a href="#supported-grammars"><img src="https://img.shields.io/badge/Helm_Values-0F1689?logo=helm&logoColor=white" alt="Helm Values"></a>
  <a href="#supported-grammars"><img src="https://img.shields.io/badge/Kubernetes-326CE5?logo=kubernetes&logoColor=white" alt="Kubernetes"></a>
</p>

Coco[-S]earch is a local-first hybrid semantic code search tool. It combines vector similarity and keyword matching, merged with Reciprocal Rank Fusion (RRF), to find code by meaning, not just text. Powered by [CocoIndex](https://github.com/cocoindex-io/cocoindex) for indexing, [Tree-sitter](https://tree-sitter.github.io/tree-sitter/) for syntax-aware chunking and symbol extraction, [PostgreSQL](https://www.postgresql.org/) with [pgvector](https://github.com/pgvector/pgvector) for storage, and [Ollama](https://ollama.com/) for local embeddings. No external APIs: everything runs on your machine.

Available as a CLI, interactive REPL, web dashboard, or MCP server. Indexing is incremental and respects `.gitignore`. Supports 31 languages, with symbol-level filtering for 14 of them, plus domain-specific grammars for structured config files.

## 📑 Table of Contents

- [⚠️ Disclaimer](#disclaimer)
- [🚀 Quick Start](#quick-start)
- [✨ Features](#features)
- [🖥️ Interfaces](#interfaces)
- [🏆 Where MCP Wins](#where-mcp-wins)
- [📚 Useful Documentation](#useful-documentation)
- [🧩 Components](#components)
- [⚙️ How Search Works](#how-search-works)
- [🌐 Supported Languages](#supported-languages)
- [📝 Supported Grammars](#supported-grammars)
- [🔧 Configuration](#configuration)
- [🧪 Testing](#testing)
- [🛠️ Troubleshooting](#troubleshooting)

## Disclaimer

This project was originally built for personal use — a solo experiment in local-first, privacy-focused code search to accelerate self-onboarding to new codebases and explore spec-driven development. Initially scaffolded with [GSD](https://github.com/glittercowboy/get-shit-done) and refined by hand. Ships with a CLI, MCP tools, dashboards (TUI/WEB), a status API, reusable [Claude SKILLS](https://code.claude.com/docs/en/skills), and a [Claude Code plugin](https://code.claude.com/docs/en/plugins) for one-command setup.

## Quick Start

- **Services**:

  ```bash
  # 1. Clone this repository and start infrastructure:
  git clone https://github.com/VioletCranberry/coco-s.git && cd coco-s
  # Docker volumes are bind-mounted to ./docker_data/ inside the repository,
  # so infrastructure must be started from the cloned repo directory.
  docker compose up -d
  # 2. Verify services are ready.
  uvx cocosearch config check
  ```

- **Indexing your projects**:

  ```bash
  # 3.1 Use WEB Dashboard:
  uvx cocosearch dashboard
  # 3.2 Use CLI:
  uvx cocosearch index .
  # 3.3 Use AI and MCP - see below.
  ```

- **Register with your AI assistant (pick one)**:

  **Option A — Plugin (recommended):**

  ```bash
  claude plugin marketplace add VioletCranberry/coco-s
  claude plugin install cocosearch@cocosearch
  # All 7 skills + MCP server configured automatically
  ```

  <p align="center">
    <img src="./docs/plugin-examples.png" alt="CocoSearch plugin skills in Claude Code" width="720">
  </p>

  **Option B — Manual MCP registration:**

  ```bash
  claude mcp add --scope user cocosearch -- \
    uvx cocosearch mcp --project-from-cwd
  ```

  > **Note:** The MCP server automatically opens a web dashboard in your browser on a random port. Set `COCOSEARCH_DASHBOARD_PORT=8080` to pin it to a fixed port, or `COCOSEARCH_NO_DASHBOARD=1` to disable it.

  Install skills manually (for development; run from the repository root so the relative `../../skills/` symlinks resolve):

  ```bash
  mkdir -p .claude/skills
  for skill in cocosearch-onboarding cocosearch-refactoring cocosearch-debugging cocosearch-quickstart cocosearch-explore cocosearch-new-feature cocosearch-subway; do
      ln -sfn "../../skills/$skill" ".claude/skills/$skill"
  done
  ```

## Features

- 🔍 **Hybrid search** -- combines semantic similarity and keyword matching via RRF fusion to find code by meaning and by text.
- 🏷️ **Symbol filtering** -- narrow results to functions, classes, methods, or interfaces; match symbol names with glob patterns.
- 📐 **Context expansion** -- results automatically expand to enclosing function/class boundaries using Tree-sitter, so you see complete units of code.
- ⚡ **Query caching** -- exact and semantic cache for fast repeated queries (0.95 cosine threshold).
- 🩺 **Parse health tracking** -- per-language parse status, failure details, and staleness warnings when the index drifts from your branch.
- 🔒 **Privacy-first** -- everything runs locally. No external API calls, no telemetry.

## Interfaces

Search your code four ways — pick what fits your workflow:

| Interface            | Best for                                                                                                                                                          | How to start                        |
| -------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------- |
| **CLI**              | One-off searches, scripting, CI                                                                                                                                   | `cocosearch search "auth flow"`     |
| **Interactive REPL** | Exploratory sessions — tweak filters, switch indexes, iterate on queries without restarting                                                                       | `cocosearch search --interactive`   |
| **Web Dashboard**    | Visual search + index management in the browser — filters, syntax-highlighted results, charts, dark/light theme                                                   | `cocosearch dashboard`              |
| **MCP Server**       | AI assistant integration ([Claude Code](https://claude.com/product/claude-code), [Claude Desktop](https://claude.com/download), [OpenCode](https://opencode.ai/)) | `cocosearch mcp --project-from-cwd` |

### CLI

```bash
# Index a project
uvx cocosearch index /path/to/project

# Search with natural language
uvx cocosearch search "authentication flow" --pretty

# Serve CocoSearch WEB dashboard
uvx cocosearch dashboard

# Start interactive REPL
uvx cocosearch search --interactive

# View index stats with parse health
uvx cocosearch stats --pretty
```

Example output (captured with `uv run cocosearch stats --pretty`):

```text
Index: cocosearch
Source: GIT/personal/coco-s
Branch: main (0b6050b) · up to date
Status: Indexed
Files: 192 | Chunks: 2,023 | Size: 15.0 MB
Created: 2026-02-09 18:30
Last Updated: 2026-02-14 12:36 (0 days ago)

                        Language Distribution
┏━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Language     ┃  Files ┃   Chunks ┃ Distribution                   ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ py           │    162 │     1648 │ ██████████████████████████████ │
│ md           │     22 │      267 │ ████▊                          │
│ html         │      1 │      100 │ █▊                             │
│ json         │      3 │        3 │                                │
│ toml         │      1 │        2 │                                │
│ yaml         │      2 │        2 │                                │
│ docker-comp… │      1 │        1 │                                │
└──────────────┴────────┴──────────┴────────────────────────────────┘

                             Grammar Distribution
┏━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ Grammar              ┃ Base Language  ┃  Files ┃   Chunks ┃  Recognition % ┃
┡━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ docker-compose       │ yaml           │      1 │        1 │         100.0% │
└──────────────────────┴────────────────┴────────┴──────────┴────────────────┘

 Symbol Statistics
┏━━━━━━━━━━┳━━━━━━━┓
┃ Type     ┃ Count ┃
┡━━━━━━━━━━╇━━━━━━━┩
│ function │   927 │
│ class    │   229 │
└──────────┴───────┘

Parse health: 100.0% clean (162/162 files)
                Parse Status by Language
┏━━━━━━━━━━┳━━━━━━━┳━━━━━┳━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━┓
┃ Language ┃ Files ┃  OK ┃ Partial ┃ Error ┃ No Grammar ┃
┡━━━━━━━━━━╇━━━━━━━╇━━━━━╇━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━┩
│ python   │   162 │ 162 │       0 │     0 │          0 │
└──────────┴───────┴─────┴─────────┴───────┴────────────┘
```

```bash
# View index stats with parse health live
uvx cocosearch stats --live

# List all indexes
uvx cocosearch list --pretty
```

Example output:

```text
┏━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┓
┃ Name       ┃ Table                                      ┃ Branch                  ┃ Status  ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━┩
│ cocosearch │ codeindex_cocosearch__cocosearch_chunks     │ main (ed00733)          │ Indexed │
└────────────┴────────────────────────────────────────────┴─────────────────────────┴─────────┘
```

For the full list of commands and flags, see [CLI Reference](./docs/cli-reference.md).

### Web Dashboard

`cocosearch dashboard` opens a browser UI at `http://localhost:8080` with:

- **Code search** — natural language queries with language, symbol type, and hybrid search filters. Results show syntax-highlighted snippets, score badges, match type, and symbol metadata.
- **Index management** — create, reindex (incremental or fresh), and delete indexes from the browser.
- **Observability** — language distribution charts, parse health breakdown, staleness warnings, storage metrics.

<details>
<summary>Dashboard screenshots</summary>

<p align="center">
  <img src="./docs/dashboard-dark.png" alt="CocoSearch dashboard — dark theme" width="480">
  &nbsp;&nbsp;
  <img src="./docs/dashboard-light.png" alt="CocoSearch dashboard — light theme" width="480">
</p>

<p align="center">
  <img src="./docs/dashboard-search-light.png" alt="CocoSearch dashboard — search results with file actions" width="480">
  &nbsp;&nbsp;
  <img src="./docs/dashboard-open-file-dark.png" alt="CocoSearch dashboard — file viewer modal" width="480">
</p>

</details>

### Interactive REPL

`cocosearch search --interactive` starts a persistent search session:

```
cocosearch> authentication middleware
  [results...]
cocosearch> :lang python
  Language filter: python
cocosearch> error handling in views
  [results filtered to Python...]
cocosearch> :index other-project
  Switched to index: other-project
```

Settings persist across queries — change `:limit`, `:lang`, `:context`, or `:index` without restarting. Supports command history (up/down arrows) and inline filters (`lang:python` directly in queries).

## Where MCP Wins

For codebases of meaningful size, CocoSearch reduces the number of MCP tool calls needed to find relevant code — often from 5-15 iterative grep/read cycles down to 1-2 semantic searches. This means fewer round-trips, less irrelevant content in the context window, and lower token consumption for exploratory and intent-based queries.

- **Exploratory/semantic queries**: "how does authentication work", "where is error handling done", "find the caching logic".
  - Native approach: Claude does 5-15 iterative grep/glob/read cycles, each adding results to context. Lots of trial-and-error, irrelevant matches, and full-file reads.
  - CocoSearch: a single `search_code` call returns ranked, pre-chunked results with smart context expansion to function/class boundaries, so far fewer tokens end up in context.
- **Identifier search with fuzzy intent**: "find the function that handles user signup".
  - Native grep requires Claude to guess the exact name (grep "signup", grep "register", grep "create_user"...). Each miss costs a round-trip + tokens.
  - CocoSearch's hybrid search (vector + keyword, fused with RRF) handles this in 1 call.
- **Filtered searches**: language/symbol type/symbol name filtering is built-in. Native tools require Claude to manually assemble glob patterns and filter results.

## Useful Documentation

- [How It Works](./docs/how-it-works.md)
- [Architecture Overview](./docs/architecture.md)
- [Search Features](./docs/search-features.md)
- [Dogfooding](./docs/dogfooding.md)
- [MCP Configuration](./docs/mcp-configuration.md)
- [MCP Tools Reference](./docs/mcp-tools.md)
- [CLI Reference](./docs/cli-reference.md)
- [Retrieval Logic](./docs/retrieval.md)
- [Adding Languages](./docs/adding-languages.md)

## Components

- **Ollama** -- runs the embedding model (`nomic-embed-text`) locally.
- **PostgreSQL + pgvector** -- stores code chunks and their vector embeddings for similarity search.
- **CocoSearch** -- CLI and MCP server that coordinates indexing and search.

### Available MCP Tools

- `index_codebase` -- index a directory for semantic search
- `search_code` -- search indexed code with natural language queries
- `list_indexes` -- list all available indexes
- `index_stats` -- get statistics and parse health for an index
- `clear_index` -- remove an index from the database

### Available Skills

- **cocosearch-quickstart** ([SKILL.md](./skills/cocosearch-quickstart/SKILL.md)): Use when setting up CocoSearch for the first time or indexing a new project. Guides through infrastructure check, indexing, and verification in under 2 minutes.
- **cocosearch-debugging** ([SKILL.md](./skills/cocosearch-debugging/SKILL.md)): Use when debugging an error, unexpected behavior, or tracing how code flows through a system. Guides root cause analysis using CocoSearch semantic and symbol search.
- **cocosearch-onboarding** ([SKILL.md](./skills/cocosearch-onboarding/SKILL.md)): Use when onboarding to a new or unfamiliar codebase. Guides you through understanding architecture, key modules, and code patterns step-by-step using CocoSearch.
- **cocosearch-refactoring** ([SKILL.md](./skills/cocosearch-refactoring/SKILL.md)): Use when planning a refactoring, extracting code into a new module, renaming across the codebase, or splitting a large file. Guides impact analysis and safe step-by-step execution using CocoSearch.
- **cocosearch-new-feature** ([SKILL.md](./skills/cocosearch-new-feature/SKILL.md)): Use when adding new functionality — a new command, endpoint, module, handler, or capability. Guides placement, pattern matching, and integration using CocoSearch.
- **cocosearch-explore** ([SKILL.md](./skills/cocosearch-explore/SKILL.md)): Use for codebase exploration — answering questions about how code works, tracing flows, or researching a topic. Autonomous mode for subagent/plan mode research; interactive mode for user-facing "how does X work?" explanations.
- **cocosearch-subway** ([SKILL.md](./skills/cocosearch-subway/SKILL.md)): Use when the user wants to visualize codebase structure as an interactive London Underground-style subway map. AI-generated visualization using CocoSearch tools for exploration.

## How Search Works

```
 Query: "authentication flow"
 ─────────────────────────────────────────────────────────────────────
                              │
                    ┌─────────▼──────────┐
                    │   Query Analysis   │  Detect identifiers
                    │  (camelCase, etc.) │  → auto-enable hybrid
                    └─────────┬──────────┘
                              │
                    ┌─────────▼──────────┐
                    │  Ollama Embedding  │  nomic-embed-text
                    │   768-dim vector   │  (runs locally)
                    └─────────┬──────────┘
                              │
              ┌───────────────┴───────────────┐
              │                               │
    ┌─────────▼──────────┐          ┌─────────▼──────────┐
    │  Vector Similarity │          │  Keyword Search    │
    │  (pgvector cosine) │          │  (tsvector FTS)    │
    └─────────┬──────────┘          └─────────┬──────────┘
              │                               │
              └───────────┬───────────────────┘
                          │
                ┌─────────▼──────────┐
                │    RRF Fusion      │  Reciprocal Rank Fusion
                │  + Definition 2x   │  merges both ranked lists
                └─────────┬──────────┘
                          │
                ┌─────────▼──────────┐
                │ Symbol & Language  │  --symbol-type function
                │     Filtering      │  --language python
                └─────────┬──────────┘
                          │
                ┌─────────▼──────────┐
                │ Context Expansion  │  Expand to enclosing
                │ (Tree-sitter)      │  function/class boundaries
                └─────────┬──────────┘
                          │
                ┌─────────▼──────────┐
                │   Query Cache      │  Exact hash + semantic
                │   (LRU + 0.95)     │  similarity fallback
                └─────────┬──────────┘
                          │
                          ▼
                   Ranked Results
 ─────────────────────────────────────────────────────────────────────
```
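
The fusion step can be sketched in a few lines of Python. This assumes the commonly used RRF constant `k = 60` and reads "Definition 2x" as doubling the fused score of chunks that define a symbol; both are assumptions, and `rrf_fuse` is a hypothetical helper rather than CocoSearch's code.

```python
from collections import defaultdict
from collections.abc import Iterable, Sequence


def rrf_fuse(
    ranked_lists: Sequence[Sequence[str]],
    definition_ids: Iterable[str] = (),
    k: int = 60,
    definition_boost: float = 2.0,
) -> list[tuple[str, float]]:
    """Merge ranked chunk-id lists with Reciprocal Rank Fusion.

    Each list adds 1 / (k + rank) for every chunk it contains (rank starts at 1),
    so a chunk found by both vector and keyword search collects two contributions.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranked in ranked_lists:
        for rank, chunk_id in enumerate(ranked, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Assumed interpretation of "Definition 2x": boost chunks that define a symbol.
    for chunk_id in set(definition_ids):
        if chunk_id in scores:
            scores[chunk_id] *= definition_boost
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Because RRF combines ranks rather than raw scores, pgvector cosine distances and full-text rank values never have to be put on a common scale.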

## Supported Languages

CocoSearch indexes 31 languages and file formats. Symbol-aware languages (✓) support `--symbol-type` and `--symbol-name` filtering.

```
┏━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┓
┃ Language   ┃ Extensions                  ┃ Symbols ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━┩
│ C          │ .c, .h                      │    ✓    │
│ C++        │ .cpp, .cc, .cxx, .hpp, .hxx │    ✓    │
│ C#         │ .cs                         │    ✗    │
│ CSS        │ .css, .scss                 │    ✓    │
│ DTD        │ .dtd                        │    ✗    │
│ Fortran    │ .f, .f90, .f95, .f03        │    ✗    │
│ Go         │ .go                         │    ✓    │
│ Groovy     │ .groovy, .gradle            │    ✗    │
│ HTML       │ .html, .htm                 │    ✗    │
│ Java       │ .java                       │    ✓    │
│ Javascript │ .js, .mjs, .cjs, .jsx       │    ✓    │
│ JSON       │ .json                       │    ✗    │
│ Kotlin     │ .kt, .kts                   │    ✗    │
│ Markdown   │ .md, .mdx                   │    ✗    │
│ Pascal     │ .pas, .dpr                  │    ✗    │
│ Php        │ .php                        │    ✓    │
│ Python     │ .py, .pyw, .pyi             │    ✓    │
│ R          │ .r, .R                      │    ✗    │
│ Ruby       │ .rb                         │    ✓    │
│ Rust       │ .rs                         │    ✓    │
│ Scala      │ .scala                      │    ✓    │
│ Solidity   │ .sol                        │    ✗    │
│ SQL        │ .sql                        │    ✗    │
│ Swift      │ .swift                      │    ✗    │
│ TOML       │ .toml                       │    ✗    │
│ Typescript │ .ts, .tsx, .mts, .cts       │    ✓    │
│ XML        │ .xml                        │    ✗    │
│ YAML       │ .yaml, .yml                 │    ✗    │
│ Bash       │ .sh, .bash, .zsh            │    ✓    │
│ Dockerfile │ Dockerfile                  │    ✗    │
│ HCL        │ .tf, .hcl, .tfvars          │    ✓    │
└────────────┴─────────────────────────────┴─────────┘
```

<details>
<summary>How chunking works</summary>

Chunking strategy depends on the language:

- **Tree-sitter chunking (~20 languages)**: CocoIndex's `SplitRecursively` uses Tree-sitter internally to split at syntax-aware boundaries (function/class edges). Covers Python, JavaScript, TypeScript, Go, Rust, Java, C, C++, C#, Ruby, PHP, and others in CocoIndex's [built-in list](https://cocoindex.io/docs/ops/functions#supported-languages).
- **Custom handler chunking (6 languages)**: HCL, Dockerfile, Bash, Go Template, Scala, and Groovy use regex-based `CustomLanguageSpec` separators tuned for their syntax — no Tree-sitter grammar available for these in CocoIndex.
- **Text fallback**: Languages not recognized by either tier (Markdown, JSON, YAML, TOML, etc.) are split on blank lines and whitespace boundaries.

Independently of chunking, CocoSearch runs its own Tree-sitter queries (`.scm` files in `src/cocosearch/indexer/queries/`) to extract symbol metadata — function, class, method, and interface names and signatures. This powers `--symbol-type` and `--symbol-name` filtering. Symbol extraction is available for 14 languages.

In short: CocoIndex's Tree-sitter tells you _where to cut_; the `.scm` files tell you _what's inside each piece_.

See [Adding Languages](./docs/adding-languages.md) for details on how these tiers work and how to add new languages or grammars.

</details>
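
For the text fallback, a rough approximation is to split on blank lines and greedily pack paragraphs into chunks of at most `chunk_size` bytes, falling back to whitespace splits for oversized paragraphs. This is a simplified illustration under those assumptions (it ignores `chunk_overlap`), not CocoIndex's `SplitRecursively`; `chunk_text` is a hypothetical helper.

```python
import re


def _split_on_whitespace(paragraph: str, chunk_size: int) -> list[str]:
    """Split one oversized paragraph into word runs of at most chunk_size bytes."""
    pieces: list[str] = []
    current = ""
    for word in paragraph.split():
        candidate = f"{current} {word}" if current else word
        if current and len(candidate.encode("utf-8")) > chunk_size:
            pieces.append(current)
            current = word  # a single word longer than chunk_size is kept whole
        else:
            current = candidate
    if current:
        pieces.append(current)
    return pieces


def chunk_text(text: str, chunk_size: int = 1000) -> list[str]:
    """Greedily pack blank-line-separated paragraphs into chunks of <= chunk_size bytes."""
    chunks: list[str] = []
    current = ""
    for paragraph in re.split(r"\n\s*\n", text):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        if len(paragraph.encode("utf-8")) > chunk_size:
            pieces = _split_on_whitespace(paragraph, chunk_size)
        else:
            pieces = [paragraph]
        for piece in pieces:
            candidate = f"{current}\n\n{piece}" if current else piece
            if current and len(candidate.encode("utf-8")) > chunk_size:
                chunks.append(current)
                current = piece
            else:
                current = candidate
    if current:
        chunks.append(current)
    return chunks
```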

## Supported Grammars

Beyond language-level support, CocoSearch recognizes **grammars** — domain-specific schemas within a base language. A **language** is matched by file extension (e.g., `.yaml` -> YAML), while a **grammar** is matched by file path and content patterns (e.g., `.github/workflows/ci.yml` containing `on:` + `jobs:` -> GitHub Actions). Grammars provide structured chunking and richer metadata compared to generic text chunking.

```
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Grammar        ┃ File Format ┃ Path Patterns                                                                    ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ docker-compose │ yaml        │ docker-compose*.yml, docker-compose*.yaml, compose*.yml, compose*.yaml           │
│ github-actions │ yaml        │ .github/workflows/*.yml, .github/workflows/*.yaml                                │
│ gitlab-ci      │ yaml        │ .gitlab-ci.yml                                                                   │
│ helm-template  │ gotmpl      │ **/templates/*.yaml, **/templates/**/*.yaml, **/templates/*.yml,                 │
│                │             │ **/templates/**/*.yml                                                            │
│ helm-values    │ yaml        │ **/values.yaml, **/values-*.yaml                                                 │
│ kubernetes     │ yaml        │ *.yaml, *.yml                                                                    │
└────────────────┴─────────────┴──────────────────────────────────────────────────────────────────────────────────┘
```

<details>
<summary>How grammar matching works</summary>

Priority: Grammar match > Language match > TextHandler fallback.

A grammar is matched by file path patterns and optionally by content patterns. For example, a YAML file at `.github/workflows/ci.yml` containing `on:` + `jobs:` is recognized as GitHub Actions, not generic YAML. This enables structured chunking by job/step and richer metadata extraction (job names, service names, stages).

</details>
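
The priority order can be sketched as a small classifier. It uses two rows from the grammar table, treats content patterns as required substrings, and compares patterns without a `/` against the file name only; all of these, plus the names `Grammar`, `classify`, and the extension map, are assumptions for illustration.

```python
from dataclasses import dataclass
from fnmatch import fnmatch
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Grammar:
    name: str
    path_patterns: tuple[str, ...]
    # Hypothetical rule: every marker must appear somewhere in the file content.
    content_markers: tuple[str, ...] = ()


# Two rows from the grammar table above; the content markers are an assumption.
GRAMMARS = (
    Grammar(
        "github-actions",
        (".github/workflows/*.yml", ".github/workflows/*.yaml"),
        ("on:", "jobs:"),
    ),
    Grammar(
        "docker-compose",
        ("docker-compose*.yml", "docker-compose*.yaml", "compose*.yml", "compose*.yaml"),
    ),
)

# Hypothetical extension map standing in for the language table.
LANGUAGES = {".yaml": "yaml", ".yml": "yaml", ".py": "python"}


def _path_matches(path: str, pattern: str) -> bool:
    # Patterns without a "/" are compared against the file name only.
    target = path if "/" in pattern else PurePosixPath(path).name
    return fnmatch(target, pattern)


def classify(path: str, content: str) -> str:
    """Return "grammar:<name>", "language:<name>", or "text", in that priority order."""
    for grammar in GRAMMARS:
        if any(_path_matches(path, p) for p in grammar.path_patterns) and all(
            marker in content for marker in grammar.content_markers
        ):
            return f"grammar:{grammar.name}"
    language = LANGUAGES.get(PurePosixPath(path).suffix)
    return f"language:{language}" if language else "text"
```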

## Configuration

Create `cocosearch.yaml` in your project root to customize indexing:

```yaml
indexing:
  # See also https://cocoindex.io/docs/ops/functions#supported-languages
  include_patterns:
    - "*.py"
    - "*.js"
    - "*.ts"
    - "*.go"
    - "*.rs"
  exclude_patterns:
    - "*_test.go"
    - "*.min.js"
  chunk_size: 1000 # bytes
  chunk_overlap: 300 # bytes
```
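
To reason about how the two pattern lists interact, here is a sketch that assumes a file is indexed when it matches at least one `include_patterns` entry and no `exclude_patterns` entry, with patterns checked against both the file name and the relative path. The dictionary mirrors the YAML above (in practice it might be loaded with PyYAML's `yaml.safe_load`); `should_index` is a hypothetical helper and the precedence rule is an assumption.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Mirrors the `indexing` section of the example cocosearch.yaml above.
CONFIG = {
    "include_patterns": ["*.py", "*.js", "*.ts", "*.go", "*.rs"],
    "exclude_patterns": ["*_test.go", "*.min.js"],
}


def _matches_any(path: str, patterns: list[str]) -> bool:
    name = PurePosixPath(path).name
    return any(fnmatch(name, p) or fnmatch(path, p) for p in patterns)


def should_index(path: str, config: dict[str, list[str]] = CONFIG) -> bool:
    """Assumed rule: kept by some include pattern and not dropped by any exclude."""
    return _matches_any(path, config["include_patterns"]) and not _matches_any(
        path, config["exclude_patterns"]
    )
```

Under this rule exclusions always win, which is how `*_test.go` can drop Go test files even though `*.go` is included.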

## Testing

Tests use [pytest](https://docs.pytest.org/). All tests are unit tests, fully mocked, and require no infrastructure. Markers are auto-applied based on directory -- no need to add them manually.

```bash
uv run pytest                                          # Run all unit tests
uv run pytest tests/unit/search/test_cache.py -v       # Single file
uv run pytest -k "test_rrf_double_match" -v            # Single test by name
uv run pytest tests/unit/handlers/ -v                  # Handler tests
```

## Troubleshooting

**Dashboard shows "Indexing" but CLI shows "Indexed"**

The web dashboard and CLI now share a status sync mechanism: when the dashboard detects a live indexing thread, it corrects the database status so both interfaces agree. If you still see a discrepancy, check whether indexing is genuinely running (CPU usage, `docker stats` for Ollama activity).

**Index appears stuck in "Indexing" status**

After 1 hour with no progress updates, the status auto-recovers to "Indexed". You can also run `cocosearch index .` again to force a fresh index, which will reset the status.

**High CPU after indexing appears complete**

Ollama may still be processing embeddings in its queue. Check with `docker stats` or `ps aux | grep ollama`. CocoIndex may also perform background cleanup after the main indexing loop finishes.
