Metadata-Version: 2.4
Name: gloggur
Version: 0.1.0
Summary: Symbol-level, incremental codebase indexer for semantic search and precedent retrieval.
Author: Gloggur Contributors
License-Expression: MIT
Keywords: code-search,indexer,tree-sitter,embeddings,cli
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: click>=8.1
Requires-Dist: pydantic>=2.6
Requires-Dist: pyyaml>=6.0
Requires-Dist: watchfiles>=0.22.0
Requires-Dist: tree-sitter>=0.25
Requires-Dist: tree-sitter-language-pack>=0.13.0
Requires-Dist: faiss-cpu>=1.8
Requires-Dist: numpy>=1.26
Requires-Dist: sentence-transformers>=2.7
Provides-Extra: local
Provides-Extra: openai
Requires-Dist: openai>=1.3; extra == "openai"
Requires-Dist: tenacity>=8.3; extra == "openai"
Provides-Extra: gemini
Requires-Dist: google-genai>=0.5; extra == "gemini"
Provides-Extra: all
Requires-Dist: sentence-transformers>=2.7; extra == "all"
Requires-Dist: openai>=1.3; extra == "all"
Requires-Dist: tenacity>=8.3; extra == "all"
Requires-Dist: google-genai>=0.5; extra == "all"
Provides-Extra: dev
Requires-Dist: pytest>=8.2.0; extra == "dev"
Requires-Dist: pytest-cov>=5.0.0; extra == "dev"
Requires-Dist: black>=24.4.0; extra == "dev"
Requires-Dist: mypy>=1.10.0; extra == "dev"
Requires-Dist: ruff>=0.4.0; extra == "dev"
Dynamic: license-file

# Gloggur

Gloggur is a symbol-level, incremental codebase indexer for semantic search and precedent retrieval. It parses code into symbols, generates embeddings, stores them locally, and exposes a JSON-friendly CLI for integration with AI agents.

## Features

- Tree-sitter parsing for multi-language symbol extraction
- Incremental indexing with SHA-256 hashing
- Pluggable embedding backends (local models, OpenAI, or Gemini)
- FAISS vector search with SQLite metadata
- Docstring audit with semantic similarity scoring
- JSON output for CLI automation

## Installation

Preferred:

```bash
pipx install gloggur
```

Alternative (plain `pip`):

```bash
pip install gloggur
```

Optional extras (choose the one that matches your embedding provider):

```bash
pipx install "gloggur[openai]"
pipx install "gloggur[gemini]"
pipx install "gloggur[local]"
```

For local development worktrees, use the repo wrapper/bootstrap flow:

```bash
scripts/bootstrap_gloggur_env.sh
scripts/gloggur status --json
```

`scripts/gloggur` runs a preflight check before launching the CLI. The wrapper:
- prefers `.venv/bin/python` when healthy
- otherwise falls back to system Python with repo-root `PYTHONPATH`
- returns structured `--json` failures with `operation=preflight`

## Quickstart

Index a repository:

```bash
gloggur index . --json
```

Search for similar symbols:

```bash
gloggur search "streaming parser" --top-k 5 --json
```

Stream results as line-delimited JSON:

```bash
gloggur search "streaming parser" --top-k 50 --json --stream
```
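
A consumer can handle streamed results one line at a time instead of waiting for the whole payload. The following is a minimal sketch, assuming each streamed line is a single JSON object. `iter_ndjson` is an illustrative helper, not part of Gloggur, and the fields in `sample` are hypothetical stand-ins for real output.

```python
import json
from typing import Iterable, Iterator


def iter_ndjson(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one decoded JSON object per non-empty line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        yield json.loads(line)


# Hypothetical sample of streamed output; real record fields may differ.
sample = [
    '{"symbol": "parse_stream", "similarity_score": 0.91}\n',
    "\n",
    '{"symbol": "StreamParser", "similarity_score": 0.87}\n',
]
records = list(iter_ndjson(sample))
for record in records:
    print(record["symbol"], record["similarity_score"])
```

To read from the CLI directly, the same helper could be fed the `stdout` of `subprocess.Popen(["gloggur", "search", "streaming parser", "--json", "--stream"], stdout=subprocess.PIPE, text=True)`.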

Inspect docstrings (semantic similarity scoring):

```bash
gloggur inspect . --json
```

`gloggur inspect` skips unchanged files by default. Use `--force` to reinspect everything.

Check status:

```bash
gloggur status --json
```

Enable save-triggered indexing (one-time setup):

```bash
gloggur watch init . --json
gloggur watch start --daemon --json
```

Check or stop the watcher:

```bash
gloggur watch status --json
gloggur watch stop --json
```

Clear cache:

```bash
gloggur clear-cache --json
```

Cache compatibility is automatic:
- If the cache schema changes, Gloggur rebuilds `.gloggur-cache/index.db` automatically.
- If the embedding provider or model changes, the next `gloggur index ...` run rebuilds the cache and vectors automatically.

## On-Save Indexing (Watch Mode)

Watch mode keeps the index updated in the background as files are saved.

- Filesystem events are debounced (`watch_debounce_ms`, default `300`) to coalesce save bursts.
- Content hashing avoids re-parsing/re-embedding unchanged files.
- Vectors for deleted or changed symbols are removed before upsert to prevent stale search hits.
- Runtime state is written to `.gloggur-cache/watch_state.json` by default.
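
The content-hashing step above can be illustrated with a short sketch. It assumes a plain in-memory mapping from path to SHA-256 digest; `file_digest` and `changed_files` are hypothetical names, and Gloggur's actual bookkeeping (stored in its cache) may differ.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable, List


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def changed_files(paths: Iterable[Path], known: Dict[str, str]) -> List[Path]:
    """Return paths whose content differs from the recorded digest.

    `known` is updated in place, so a second call with unchanged files
    returns an empty list and no re-parsing or re-embedding is needed.
    """
    changed = []
    for path in paths:
        digest = file_digest(path)
        if known.get(str(path)) != digest:
            known[str(path)] = digest
            changed.append(path)
    return changed
```

Because the comparison is on content rather than modification time, a save that leaves a file byte-for-byte identical is skipped.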

## Verification

Core behavior checks, including smoke tests, run under `pytest`:

```bash
.venv/bin/pytest
```

Additional verification probes cover embedding providers, edge cases, and performance:

```bash
# Run non-test verification phases (providers, edge cases, performance)
python scripts/run_suite.py

# Run specific phases
python scripts/run_provider_probe.py  # Embedding providers
python scripts/run_edge_bench.py  # Edge cases & performance
```

See `docs/VERIFICATION.md` for detailed documentation and `docs/AGENT_INTEGRATION.md` for agent integration guidance.

## Configuration

Create `.gloggur.yaml` or `.gloggur.json` in your repository:

```yaml
embedding_provider: gemini
local_embedding_model: microsoft/codebert-base
openai_embedding_model: text-embedding-3-large
gemini_embedding_model: gemini-embedding-001
cache_dir: .gloggur-cache
watch_enabled: false
watch_path: .
watch_debounce_ms: 300
watch_mode: daemon
watch_state_file: .gloggur-cache/watch_state.json
watch_pid_file: .gloggur-cache/watch.pid
watch_log_file: .gloggur-cache/watch.log
docstring_semantic_threshold: 0.2
docstring_semantic_max_chars: 4000
supported_extensions:
  - .py
  - .js
  - .jsx
  - .ts
  - .tsx
  - .rs
  - .go
  - .java
excluded_dirs:
  - .git
  - node_modules
  - venv
  - .venv
  - .gloggur-cache
  - dist
  - build
  - htmlcov
```
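
Since `.gloggur.json` is accepted as well, a configuration file can be generated from a script. The sketch below assumes the JSON form uses the same keys as the YAML example; `write_json_config` and `BASE_CONFIG` are hypothetical names introduced for illustration.

```python
import json
from pathlib import Path
from typing import Optional

# Keys copied from the YAML example above; this subset is illustrative.
BASE_CONFIG = {
    "embedding_provider": "gemini",
    "gemini_embedding_model": "gemini-embedding-001",
    "cache_dir": ".gloggur-cache",
    "watch_debounce_ms": 300,
}


def write_json_config(repo_root: Path, overrides: Optional[dict] = None) -> Path:
    """Write `.gloggur.json` under repo_root, applying overrides on top of the base."""
    config = {**BASE_CONFIG, **(overrides or {})}
    target = repo_root / ".gloggur.json"
    target.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return target
```

For example, `write_json_config(Path("."), {"embedding_provider": "openai"})` would write a config that selects the OpenAI backend.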

Environment variables:

- `GLOGGUR_EMBEDDING_PROVIDER`
- `GLOGGUR_LOCAL_MODEL`
- `GLOGGUR_OPENAI_MODEL`
- `OPENAI_API_KEY`
- `GLOGGUR_GEMINI_MODEL`
- `GLOGGUR_GEMINI_API_KEY`
- `GLOGGUR_CACHE_DIR`
- `GLOGGUR_WATCH_ENABLED`
- `GLOGGUR_WATCH_PATH`
- `GLOGGUR_WATCH_DEBOUNCE_MS`
- `GLOGGUR_WATCH_MODE`
- `GLOGGUR_DOCSTRING_SEMANTIC_MIN_CHARS`
- `GEMINI_API_KEY` (or `GOOGLE_API_KEY`)

## Output Schema

Search results are returned as JSON:

```json
{
  "query": "...",
  "results": [
    {
      "symbol": "function_name",
      "kind": "function",
      "file": "path/to/file.py",
      "line": 42,
      "signature": "def function_name(arg1, arg2)",
      "docstring": "...",
      "similarity_score": 0.95,
      "context": "surrounding code snippet"
    }
  ],
  "metadata": {
    "total_results": 10,
    "search_time_ms": 45
  }
}
```
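
A caller might decode this payload into typed records before acting on it. The following is a minimal sketch based only on the fields shown above; `SearchHit` and `parse_hits` are hypothetical names, and the `min_score` filter is an illustrative addition, not a Gloggur feature.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class SearchHit:
    symbol: str
    kind: str
    file: str
    line: int
    similarity_score: float


def parse_hits(payload: str, min_score: float = 0.0) -> List[SearchHit]:
    """Decode a search payload and keep hits at or above min_score."""
    data = json.loads(payload)
    hits = [
        SearchHit(
            symbol=item["symbol"],
            kind=item["kind"],
            file=item["file"],
            line=item["line"],
            similarity_score=item["similarity_score"],
        )
        for item in data.get("results", [])
    ]
    return [hit for hit in hits if hit.similarity_score >= min_score]
```

Fields such as `signature`, `docstring`, and `context` are left out of the dataclass for brevity and could be added the same way.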

## Development

```bash
scripts/bootstrap_gloggur_env.sh
scripts/gloggur status --json
.venv/bin/pytest
```

If bootstrap/preflight fails with `--json`, error payloads include:
- `error_code`: one of `missing_venv`, `missing_python`, `missing_package`, `broken_environment`
- `message`
- `remediation` steps
- `detected_environment` details
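
An agent or script could turn such a payload into a short, readable report. This sketch assumes the fields listed above are top-level JSON keys and that `remediation` is either a string or a list of strings; `describe_failure` is a hypothetical helper, and the sample message text is invented for illustration.

```python
import json


def describe_failure(payload: str) -> str:
    """Format a preflight failure payload as a short multi-line report."""
    data = json.loads(payload)
    code = data.get("error_code", "unknown")
    message = data.get("message", "")
    remediation = data.get("remediation", [])
    if isinstance(remediation, str):
        remediation = [remediation]  # normalize a single step to a list
    lines = [f"[{code}] {message}".rstrip()]
    lines += [f"  - {step}" for step in remediation]
    return "\n".join(lines)


# Hypothetical payload; the real structure and wording may differ.
sample = json.dumps({
    "operation": "preflight",
    "error_code": "missing_venv",
    "message": "No virtual environment found.",
    "remediation": ["Run scripts/bootstrap_gloggur_env.sh"],
})
print(describe_failure(sample))
```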
