Metadata-Version: 2.4
Name: rubberduck-index
Version: 0.1.7
Summary: Local project indexer for RubberDuck Semantic Intelligence MCP
Author: RubberDuck Team
License-Expression: MIT
Project-URL: Homepage, https://github.com/Grieco/cpg_query_service
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Quality Assurance
Classifier: Topic :: Software Development :: Testing
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: requests>=2.28
Requires-Dist: pathspec>=0.11
Requires-Dist: watchdog>=3.0

# rubberduck-index

Local project indexer for [RubberDuck Semantic Intelligence](https://github.com/Grieco/cpg_query_service). It syncs your Python source code to the RubberDuck MCP server, which analyzes it into Code Property Graphs (CPGs) so that LLMs can query definitions, data flow, call chains, and more.

## How It Works

```
Your Machine                          RubberDuck Server
┌──────────────┐    SHA-256 hashes    ┌──────────────────────┐
│  rubberduck  │───────────────────►  │  /index/manifest     │
│  -index      │◄──── need_upload ──  │  (diff check)        │
│              │                      │                      │
│  scanner.py  │──── changed files ─► │  /index/upload       │
│  syncer.py   │    (gzip tar/JSON)   │  (store + build CPG) │
│  watcher.py  │                      │                      │
└──────────────┘                      │  ProjectStore        │
                                      │  LocalProjects/      │
                                      │    u{user_id}/       │
                                      │      {project}/      │
                                      └──────────────────────┘
```

1. **Scan** — Finds Python files, computes SHA-256 hashes, respects `.gitignore`
2. **Diff** — Sends hashes to server; server replies with which files need uploading
3. **Upload** — Sends only changed files (JSON for small batches, gzip tar for large)
4. **Build** — Server stores files in user-scoped directories and builds CPG graphs
5. **Query** — LLMs use MCP tools (`analyze_code`, `trace_variable`, `call_chain`, etc.)
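
The scan and diff steps can be sketched in a few lines of Python. This is a minimal illustration, not the package's actual `scanner.py`: the function names are hypothetical, the server manifest is assumed to be a plain path-to-hash mapping, and `.gitignore` handling is left out for brevity.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each Python file's POSIX-style relative path to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*.py"))
        if p.is_file()
    }


def files_needing_upload(local: dict[str, str], remote: dict[str, str]) -> list[str]:
    """Return paths that are new or whose hash differs from the server's copy.

    `remote` is an assumed shape for the server manifest (path -> hash).
    """
    return sorted(path for path, digest in local.items() if remote.get(path) != digest)
```

Comparing digests rather than timestamps means a file that was touched but not modified is never re-uploaded.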

## Install

```bash
pip install rubberduck-index

# With file watcher support (auto-sync on save):
pip install "rubberduck-index[watch]"
```

### Requirements

- Python 3.9+
- `requests` (HTTP client) — installed automatically
- `pathspec` (`.gitignore`-compatible pattern matching) — installed automatically
- `watchdog` (optional — for `watch` command)

## Quick Start

```bash
# 1. Initialize a project (first-time setup)
cd ~/my-python-project
rubberduck-index init
# Prompts for your auth token (one-time setup)
# Auto-detects project name from directory

# 2. Sync changes (incremental — only uploads what changed)
rubberduck-index sync

# 3. Watch for changes (auto-sync on file save)
rubberduck-index watch -d    # daemon mode (background)
rubberduck-index stop        # stop the daemon

# 4. Check status
rubberduck-index status      # local + server status
rubberduck-index list        # all your projects on server
```

## Commands

### `init`

Initialize a project directory for indexing. Creates `.rubberduck/config.json`, prompts for your token, scans files, and performs the first sync.

```bash
cd ~/my-project
rubberduck-index init [OPTIONS]
```

| Flag | Description |
|------|-------------|
| `--project` | Project name on the server (default: current directory name) |
| `--token` | Bearer token (skips interactive prompt) |
| `--server` | Override server URL (default: `https://semantic.rubberduck.com`) |
| `--directory` | Project directory (default: current directory) |
| `--include` | File patterns to include (default: `**/*.py`) |
| `--watch`, `-w` | Start background watcher after init |

**Typical usage — no flags needed:**

```bash
cd ~/my-project
rubberduck-index init
# Enter your token: ****
# Scanning project... 42 file(s)
# Uploading... Synced 42 file(s), 42 CPG graph(s) built
```

### `sync`

Sync changed files to the server. Compares local SHA-256 hashes with the server manifest and uploads only the files that differ.

```bash
rubberduck-index sync [--force]
```

| Flag | Description |
|------|-------------|
| `--force` | Force full re-upload (ignore hash comparison) |
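
As a rough illustration of the upload step that `sync` performs, the sketch below packs a batch of changed files either as JSON or as a gzip-compressed tar archive. The cutoff `JSON_BATCH_LIMIT`, the JSON layout, and the function names are all assumptions made for this example; the real threshold and wire format are not documented here.

```python
import base64
import io
import json
import tarfile

# Hypothetical cutoff between "small" and "large" batches.
JSON_BATCH_LIMIT = 20


def pack_json(files: dict[str, bytes]) -> bytes:
    """Encode a small batch as JSON, base64-encoding each file's contents."""
    payload = {path: base64.b64encode(data).decode("ascii") for path, data in files.items()}
    return json.dumps({"files": payload}).encode("utf-8")


def pack_tar_gz(files: dict[str, bytes]) -> bytes:
    """Bundle a large batch into an in-memory gzip-compressed tar archive."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for path, data in sorted(files.items()):
            info = tarfile.TarInfo(name=path)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def pack_upload(files: dict[str, bytes]) -> tuple[str, bytes]:
    """Pick an encoding by batch size and return (content_type, body)."""
    if len(files) <= JSON_BATCH_LIMIT:
        return "application/json", pack_json(files)
    return "application/gzip", pack_tar_gz(files)
```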

### `watch`

Watch for file changes and auto-sync. Uses OS-native file system events (FSEvents on macOS, inotify on Linux).

```bash
rubberduck-index watch [-d]
```

| Flag | Description |
|------|-------------|
| `--daemon`, `-d` | Run in background. Stop with `rubberduck-index stop`. |

Changes are debounced (500 ms by default, configurable via `watch_debounce_ms`) and batched before uploading.
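
To make the debouncing behaviour concrete, here is a small sketch of one way it could work: every change restarts a quiet period, and the accumulated paths are released as one batch once that period passes without new events. The `Debouncer` class is hypothetical and takes explicit timestamps so it can be reasoned about without real timers; it is not the package's `watcher.py`.

```python
class Debouncer:
    """Collect changed paths and release them once no event arrived for `quiet_ms`."""

    def __init__(self, quiet_ms: int = 500) -> None:
        self.quiet_s = quiet_ms / 1000.0
        self.pending: set[str] = set()
        self.last_event: float = 0.0

    def record(self, path: str, now: float) -> None:
        """Register a change; every new event restarts the quiet period."""
        self.pending.add(path)
        self.last_event = now

    def flush_if_quiet(self, now: float) -> list[str]:
        """Return the batched paths if the quiet period has elapsed, else an empty list."""
        if not self.pending or now - self.last_event < self.quiet_s:
            return []
        batch = sorted(self.pending)
        self.pending.clear()
        return batch
```

Because pending paths are kept in a set, an editor that writes the same file several times during one save produces a single upload.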

### `stop`

Stop the background watcher daemon.

```bash
rubberduck-index stop
```

### `status`

Show index status for the current project — local file count vs. server state.

```bash
rubberduck-index status
```

### `list`

List all your indexed projects on the server.

```bash
rubberduck-index list
```

### `remove`

Remove a project from the server (deletes synced files and CPG graphs).

```bash
rubberduck-index remove [--project NAME]
```

## Authentication

The server requires a Bearer token. Get your token from your RubberDuck admin.

**Three ways to provide your token:**

1. **Interactive prompt** (default on `init`):
   ```bash
   rubberduck-index init
   # Enter your token: ****
   # Token is saved to .rubberduck/config.json — you won't be asked again
   ```

2. **Environment variable** (any command):
   ```bash
   export RUBBERDUCK_TOKEN=your-token
   rubberduck-index init
   ```

3. **`--token` flag** (init only, for scripting):
   ```bash
   rubberduck-index init --token your-token
   ```

The token is saved in `.rubberduck/config.json` after init. All subsequent commands (`sync`, `watch`, `status`, etc.) read it from there automatically.

The `.rubberduck/` directory is automatically added to `.gitignore` to prevent accidental token commits.
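
The token lookup described above could look roughly like the following. The precedence used here (explicit `--token` value, then `RUBBERDUCK_TOKEN`, then the saved config) is an assumption made for this sketch, and `resolve_token` and `auth_headers` are hypothetical helper names rather than part of the package's API.

```python
import json
import os
from pathlib import Path
from typing import Optional


def resolve_token(flag_token: Optional[str], project_dir: Path) -> Optional[str]:
    """Return the first token found: flag value, environment variable, saved config.

    The lookup order is an assumption, not confirmed CLI behaviour.
    """
    if flag_token:
        return flag_token
    env_token = os.environ.get("RUBBERDUCK_TOKEN")
    if env_token:
        return env_token
    config_path = project_dir / ".rubberduck" / "config.json"
    if config_path.is_file():
        return json.loads(config_path.read_text(encoding="utf-8")).get("token") or None
    return None


def auth_headers(token: str) -> dict[str, str]:
    """Build the HTTP header used for Bearer authentication."""
    return {"Authorization": f"Bearer {token}"}
```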

## Configuration

All config lives in `.rubberduck/config.json` (created by `init`):

```json
{
  "server": "https://semantic.rubberduck.com",
  "project": "my-app",
  "token": "your-bearer-token",
  "include": ["**/*.py"],
  "exclude_defaults": true,
  "max_file_size": 50000000,
  "watch_debounce_ms": 500
}
```

| Field | Default | Description |
|-------|---------|-------------|
| `server` | `https://semantic.rubberduck.com` | MCP server URL |
| `project` | directory name | Project name on the server |
| `token` | — | Bearer token for auth |
| `include` | `["**/*.py"]` | Glob patterns for files to index |
| `exclude_defaults` | `true` | Use built-in exclude list (see below) |
| `max_file_size` | `50000000` (50 MB) | Skip files larger than this (bytes) |
| `watch_debounce_ms` | `500` | Debounce interval for file watcher |
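
A minimal sketch of how these defaults might be merged with the values stored in `.rubberduck/config.json`. The `load_config` helper and the `DEFAULTS` mapping are hypothetical; only the field names and default values come from the table above.

```python
import json
from pathlib import Path
from typing import Any

# Defaults copied from the table; `project` falls back to the directory name,
# and `token` has no default.
DEFAULTS: dict[str, Any] = {
    "server": "https://semantic.rubberduck.com",
    "include": ["**/*.py"],
    "exclude_defaults": True,
    "max_file_size": 50_000_000,
    "watch_debounce_ms": 500,
}


def load_config(project_dir: Path) -> dict[str, Any]:
    """Read the project's config file and fill in any missing fields."""
    config_path = project_dir / ".rubberduck" / "config.json"
    if not config_path.is_file():
        raise FileNotFoundError(f"No .rubberduck/config.json found in {project_dir}")
    user_values = json.loads(config_path.read_text(encoding="utf-8"))
    # Values from the file win over the defaults.
    return {**DEFAULTS, "project": project_dir.name, **user_values}
```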

### Custom ignore patterns

Create `.rubberduck/ignore` with `.gitignore`-style patterns:

```
# Extra excludes
tests/fixtures/**
docs/**
*.generated.py
```

### Built-in excludes

Excluded whenever `exclude_defaults` is `true` (the default): `__pycache__`, `.git`, `.venv`, `venv`, `node_modules`, `.tox`, `.mypy_cache`, `.pytest_cache`, `.eggs`, `*.egg-info`, `dist`, `build`, `.DS_Store`, `.hg`, `.svn`, `.env`.
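
One plausible way to apply this list is to test every component of a file's relative path against the names. The sketch below assumes that interpretation; the `is_excluded` helper is hypothetical, and the real scanner may match patterns differently (for example through `pathspec`).

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Names taken from the list above; `*.egg-info` is the only wildcard entry.
BUILTIN_EXCLUDES = (
    "__pycache__", ".git", ".venv", "venv", "node_modules", ".tox",
    ".mypy_cache", ".pytest_cache", ".eggs", "*.egg-info", "dist",
    "build", ".DS_Store", ".hg", ".svn", ".env",
)


def is_excluded(relative_path: str) -> bool:
    """True if any component of the path matches a built-in exclude name."""
    parts = PurePosixPath(relative_path).parts
    return any(fnmatchcase(part, pattern) for part in parts for pattern in BUILTIN_EXCLUDES)
```

Matching whole components, not substrings, keeps files such as `builder.py` or `environment.py` in the index even though `build` and `.env` are excluded.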

## User Isolation

Each user's projects are stored in separate directories on the server. Two users can have projects with the same name without conflicts:

```
LocalProjects/
  u1/my-app/       ← User 1's "my-app"
  u2/my-app/       ← User 2's "my-app" (completely isolated)
```

The `user_id` is derived from your Bearer token — the CLI never needs to know it.
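
The directory layout can be expressed as a small path-building function. This is only an illustration of the naming scheme shown above: `project_dir` is a hypothetical helper, and the validation it performs is an assumption about what a server would reasonably reject.

```python
from pathlib import PurePosixPath


def project_dir(user_id: int, project: str, base: str = "LocalProjects") -> PurePosixPath:
    """Build `LocalProjects/u{user_id}/{project}` and reject names that could escape it."""
    if not project or project in {".", ".."} or "/" in project or "\\" in project:
        raise ValueError(f"invalid project name: {project!r}")
    return PurePosixPath(base) / f"u{user_id}" / project
```

Rejecting path separators in the project name prevents a name such as `../u2/my-app` from resolving into another user's directory.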

## Server Limits

| Limit | Default | Description |
|-------|---------|-------------|
| Max project size | 2 GB | Total size of all files in a project |
| Max file size | 50 MB | Individual file size limit |
| Compressed upload | 2 GB | Max gzip-compressed body size |
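
A client can check these limits locally before uploading. The sketch below assumes decimal units (2 GB = 2,000,000,000 bytes), consistent with the `50000000` value used for `max_file_size`; the `check_limits` helper is hypothetical.

```python
MAX_FILE_BYTES = 50_000_000        # 50 MB, matching `max_file_size` in the config
MAX_PROJECT_BYTES = 2_000_000_000  # 2 GB, assuming decimal units


def check_limits(sizes: dict[str, int]) -> tuple[list[str], bool]:
    """Return (files over the per-file limit, whether the remaining total fits)."""
    too_large = sorted(path for path, size in sizes.items() if size > MAX_FILE_BYTES)
    # Oversized files would be skipped, so they do not count toward the total.
    remaining = sum(size for size in sizes.values() if size <= MAX_FILE_BYTES)
    return too_large, remaining <= MAX_PROJECT_BYTES
```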

## Examples

### Index a Django project

```bash
cd ~/django-project
rubberduck-index init --watch
```

### Index only specific directories

```bash
rubberduck-index init --include "src/**/*.py" "lib/**/*.py"
```

### Use with Cursor / Claude Code

After indexing, tell the LLM in Cursor or Claude Code:

```
Load the project "my-app" and trace the data flow of the `request` variable.
```

The LLM will use MCP tools:
1. `load_repo(repo="local/my-app")` — loads CPG graphs
2. `analyze_code(statement="trace data flow of request", graph_id="...")` — queries the graph
3. Returns facts about definitions, assignments, and flow paths
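
For readers curious what such a tool invocation looks like on the wire, MCP clients send tool calls as JSON-RPC 2.0 `tools/call` requests. The sketch below builds one for `load_repo`; the `tool_call` helper is hypothetical, and the argument name is taken from the example above, not from a verified tool schema.

```python
import json
from typing import Any


def tool_call(request_id: int, name: str, arguments: dict[str, Any]) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


# Request the CPG graphs for the locally indexed project.
load_request = tool_call(1, "load_repo", {"repo": "local/my-app"})
```

In practice the MCP client inside Cursor or Claude Code builds these messages for you; the example only shows what the LLM's tool choice translates to.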

## Troubleshooting

**"No .rubberduck/config.json found"**
Run `rubberduck-index init` first, or `cd` into the project directory.

**"Token is required"**
Get your token from your RubberDuck admin, then run `rubberduck-index init` again.

**"Request body too large"**
The upload exceeds the server's 2 GB limit. Narrow the `include` patterns to reduce the number of files.

**"No matching files found"**
Check your `include` patterns in `.rubberduck/config.json`. Default is `**/*.py`.

**Watcher not detecting changes**
Ensure `watchdog` is installed: `pip install "rubberduck-index[watch]"`. On Linux, check inotify limits: `sysctl fs.inotify.max_user_watches`.
