Metadata-Version: 2.4
Name: mcp-vector-search
Version: 0.8.2
Summary: CLI-first semantic code search with MCP integration
Project-URL: Homepage, https://github.com/bobmatnyc/mcp-vector-search
Project-URL: Documentation, https://mcp-vector-search.readthedocs.io
Project-URL: Repository, https://github.com/bobmatnyc/mcp-vector-search
Project-URL: Bug Tracker, https://github.com/bobmatnyc/mcp-vector-search/issues
Author-email: Robert Matsuoka <bobmatnyc@gmail.com>
License: MIT License
        
        Copyright (c) 2024 Robert Matsuoka
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
License-File: LICENSE
Keywords: code-search,mcp,semantic-search,vector-database
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Code Generators
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.11
Requires-Dist: aiofiles>=23.0.0
Requires-Dist: chromadb>=0.5.0
Requires-Dist: click-didyoumean>=0.3.0
Requires-Dist: httpx>=0.25.0
Requires-Dist: loguru>=0.7.0
Requires-Dist: mcp>=1.12.4
Requires-Dist: packaging>=23.0
Requires-Dist: pydantic-settings>=2.1.0
Requires-Dist: pydantic>=2.5.0
Requires-Dist: rich>=13.0.0
Requires-Dist: sentence-transformers>=2.2.2
Requires-Dist: tree-sitter-language-pack>=0.9.0
Requires-Dist: tree-sitter>=0.20.1
Requires-Dist: typer>=0.9.0
Requires-Dist: watchdog>=3.0.0
Description-Content-Type: text/markdown

# MCP Vector Search

🔍 **CLI-first semantic code search with MCP integration**

[![PyPI version](https://badge.fury.io/py/mcp-vector-search.svg)](https://badge.fury.io/py/mcp-vector-search)
[![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

> ⚠️ **Alpha Release (v0.8.2)**: This is an early-stage project under active development. Expect breaking changes and rough edges. Feedback and contributions are welcome!

A modern, fast, and intelligent code search tool that understands your codebase through semantic analysis and AST parsing. Built with Python, powered by ChromaDB, and designed for developer productivity.

## ✨ Features

### 🚀 **Core Capabilities**
- **Semantic Search**: Find code by meaning, not just keywords
- **AST-Aware Parsing**: Understands code structure (functions, classes, methods)
- **Multi-Language Support**: 8 languages - Python, JavaScript, TypeScript, Dart/Flutter, PHP, Ruby, HTML, and Markdown/Text (with extensible architecture)
- **Real-time Indexing**: File watching with automatic index updates
- **Automatic Version Tracking**: Smart reindexing on tool upgrades
- **Local-First**: Complete privacy with on-device processing
- **Zero Configuration**: Auto-detects project structure and languages

### 🛠️ **Developer Experience**
- **CLI-First Design**: Simple commands for immediate productivity
- **Rich Output**: Syntax highlighting, similarity scores, context
- **Fast Performance**: Sub-second search responses, efficient indexing
- **Modern Architecture**: Async-first, type-safe, modular design
- **Semi-Automatic Reindexing**: Multiple strategies without daemon processes

### 🔧 **Technical Features**
- **Vector Database**: ChromaDB with connection pooling for a 13.6% performance boost
- **Embedding Models**: Configurable sentence transformers
- **Smart Reindexing**: Search-triggered, Git hooks, scheduled tasks, and manual options
- **Extensible Parsers**: Plugin architecture for new languages
- **Configuration Management**: Project-specific settings
- **Production Ready**: Connection pooling, auto-indexing, comprehensive error handling

## 🚀 Quick Start

### Installation

```bash
# Install from PyPI
pip install mcp-vector-search

# Or with UV (recommended)
uv add mcp-vector-search

# Or install from source
git clone https://github.com/bobmatnyc/mcp-vector-search.git
cd mcp-vector-search
uv sync && uv pip install -e .
```

### Complete Setup with Install Command

The **`install` command** provides a complete one-step setup:

```bash
# Interactive setup with MCP configuration
mcp-vector-search install

# Setup without MCP configuration
mcp-vector-search install --no-mcp

# Setup for specific MCP tool
mcp-vector-search install --mcp-tool "Claude Code"

# Setup without automatic indexing
mcp-vector-search install --no-index

# Custom file extensions
mcp-vector-search install --extensions .py,.js,.ts,.dart
```

The install command:
- Initializes your project configuration
- Detects and configures MCP tools (Claude Code, Cursor, Windsurf, VS Code)
- Automatically indexes your codebase
- Provides rich progress indicators and next-step hints

### Basic Usage

```bash
# Initialize your project
mcp-vector-search init

# Index your codebase
mcp-vector-search index

# Search your code
mcp-vector-search search "authentication logic"
mcp-vector-search search "database connection setup"
mcp-vector-search search "error handling patterns"

# Setup automatic reindexing (recommended)
mcp-vector-search auto-index setup --method all

# Check project status
mcp-vector-search status

# Start file watching (auto-update index)
mcp-vector-search watch
```

### Smart CLI with "Did You Mean" Suggestions

The CLI includes intelligent command suggestions for typos:

```bash
# Typos are detected and the closest matching command is suggested
$ mcp-vector-search serach "auth"
No such command 'serach'. Did you mean 'search'?

$ mcp-vector-search indx
No such command 'indx'. Did you mean 'index'?
```

See [docs/CLI_FEATURES.md](docs/CLI_FEATURES.md) for more details.

## Versioning & Releasing

This project uses semantic versioning with an automated release workflow.

### Quick Commands
- `make version-show` - Display current version
- `make release-patch` - Create patch release
- `make publish` - Publish to PyPI

See [docs/VERSIONING_WORKFLOW.md](docs/VERSIONING_WORKFLOW.md) for complete documentation.

## 📖 Documentation

### Commands

#### `init` - Initialize Project
```bash
# Basic initialization
mcp-vector-search init

# Custom configuration
mcp-vector-search init --extensions .py,.js,.ts --embedding-model sentence-transformers/all-MiniLM-L6-v2

# Force re-initialization
mcp-vector-search init --force
```

#### `index` - Index Codebase
```bash
# Index all files
mcp-vector-search index

# Index specific directory
mcp-vector-search index /path/to/code

# Force re-indexing
mcp-vector-search index --force

# Reindex entire project
mcp-vector-search index reindex

# Reindex entire project (explicit)
mcp-vector-search index reindex --all

# Reindex entire project without confirmation
mcp-vector-search index reindex --force

# Reindex specific file
mcp-vector-search index reindex path/to/file.py
```

#### `search` - Semantic Search
```bash
# Basic search
mcp-vector-search search "function that handles user authentication"

# Adjust similarity threshold
mcp-vector-search search "database queries" --threshold 0.7

# Limit results
mcp-vector-search search "error handling" --limit 10

# Find code similar to a specific file location (path:line)
mcp-vector-search search similar "path/to/function.py:25"
```
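
To make `--threshold` and `--limit` concrete, here is a minimal sketch of how a similarity cutoff and a result cap could interact. It assumes cosine similarity over embedding vectors, which this README does not confirm, and `cosine_similarity` and `rank_chunks` are hypothetical helpers, not part of the package's API.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_chunks(
    query_vec: list[float],
    chunks: dict[str, list[float]],
    threshold: float = 0.7,
    limit: int = 10,
) -> list[tuple[float, str]]:
    """Score every chunk, drop those below the threshold, return the top `limit`."""
    scored = [(cosine_similarity(query_vec, vec), name) for name, vec in chunks.items()]
    kept = [item for item in scored if item[0] >= threshold]
    kept.sort(key=lambda item: item[0], reverse=True)
    return kept[:limit]
```

In this sketch a higher threshold trades recall for precision, while the limit only caps how many of the surviving results are shown.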

#### `auto-index` - Automatic Reindexing
```bash
# Setup all auto-indexing strategies
mcp-vector-search auto-index setup --method all

# Setup specific strategies
mcp-vector-search auto-index setup --method git-hooks
mcp-vector-search auto-index setup --method scheduled --interval 60

# Check for stale files and auto-reindex
mcp-vector-search auto-index check --auto-reindex --max-files 10

# View auto-indexing status
mcp-vector-search auto-index status

# Remove auto-indexing setup
mcp-vector-search auto-index teardown --method all
```

#### `watch` - File Watching
```bash
# Start watching for changes
mcp-vector-search watch

# Check watch status
mcp-vector-search watch status

# Enable/disable watching
mcp-vector-search watch enable
mcp-vector-search watch disable
```

#### `status` - Project Information
```bash
# Basic status
mcp-vector-search status

# Detailed information
mcp-vector-search status --verbose
```

#### `config` - Configuration Management
```bash
# View configuration
mcp-vector-search config show

# Update settings
mcp-vector-search config set similarity_threshold 0.8
mcp-vector-search config set embedding_model microsoft/codebert-base

# List available models
mcp-vector-search config models
```

## 🚀 Performance Features

### Connection Pooling
Automatic connection pooling provides a **13.6% performance improvement** with zero configuration:

```python
# Automatically enabled for high-throughput scenarios
from mcp_vector_search.core.database import PooledChromaVectorDatabase

database = PooledChromaVectorDatabase(
    max_connections=10,    # Pool size
    min_connections=2,     # Warm connections
    max_idle_time=300.0,   # 5 minutes
)
```
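
For readers unfamiliar with pooling, the sketch below shows the general idea behind `max_connections` and `min_connections`: a few connections are created up front, more are created on demand up to a cap, and released connections are reused. It is a generic illustration, not the implementation of `PooledChromaVectorDatabase`; `SimplePool` and its `factory` argument are hypothetical, and idle-time eviction (`max_idle_time`) is left out for brevity.

```python
import queue
import threading
from contextlib import contextmanager


class SimplePool:
    """Generic connection pool sketch (hypothetical, not the package's class)."""

    def __init__(self, factory, max_connections: int = 10, min_connections: int = 2):
        self._factory = factory
        self._idle = queue.LifoQueue(maxsize=max_connections)
        self._lock = threading.Lock()
        self._max = max_connections
        self._created = 0
        # Warm the pool so the first callers do not pay the connection cost.
        for _ in range(min_connections):
            self._idle.put(self._new())

    def _new(self):
        self._created += 1
        return self._factory()

    @contextmanager
    def connection(self, timeout: float = 5.0):
        conn = None
        with self._lock:
            # Grow the pool only when nothing is idle and the cap is not reached.
            if self._idle.empty() and self._created < self._max:
                conn = self._new()
        if conn is None:
            # Otherwise wait for an idle connection (raises queue.Empty on timeout).
            conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # Return the connection for reuse.
```

Reusing connections avoids repeated setup work, which is the usual source of the gain from pooling.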

### Semi-Automatic Reindexing
Multiple strategies keep your index up to date without daemon processes:

1. **Search-Triggered**: Automatically checks for stale files during searches
2. **Git Hooks**: Triggers reindexing after commits, merges, checkouts
3. **Scheduled Tasks**: System-level cron jobs or Windows tasks
4. **Manual Checks**: On-demand via CLI commands
5. **Periodic Checker**: In-process periodic checks for long-running apps

```bash
# Setup all strategies
mcp-vector-search auto-index setup --method all

# Check status
mcp-vector-search auto-index status
```
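
All of the strategies above depend on deciding which files are stale. A minimal sketch of that check follows, assuming staleness means "modified after the time recorded at indexing". The `find_stale_files` function and the `indexed_mtimes` mapping are hypothetical; the `max_files` cap mirrors the `--max-files` option shown earlier.

```python
from pathlib import Path


def find_stale_files(
    root: Path,
    indexed_mtimes: dict[str, float],
    extensions: tuple[str, ...] = (".py", ".js", ".ts"),
    max_files: int = 10,
) -> list[Path]:
    """Return up to `max_files` files that are new or modified since they were indexed.

    `indexed_mtimes` maps a path relative to `root` to the modification time
    recorded when that file was last indexed (an assumed bookkeeping format).
    """
    stale: list[Path] = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in extensions:
            continue
        key = str(path.relative_to(root))
        # Files missing from the mapping default to 0.0 and therefore count as stale.
        if path.stat().st_mtime > indexed_mtimes.get(key, 0.0):
            stale.append(path)
            if len(stale) >= max_files:
                break
    return stale
```

Capping the number of files keeps a search-triggered check cheap; a larger backlog can be left to a scheduled or manual reindex.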

### Configuration

Projects are configured via `.mcp-vector-search/config.json`:

```json
{
  "project_root": "/path/to/project",
  "file_extensions": [".py", ".js", ".ts"],
  "embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
  "similarity_threshold": 0.75,
  "languages": ["python", "javascript", "typescript"],
  "watch_files": true,
  "cache_embeddings": true
}
```
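
If you want to read this file from your own scripts, a minimal sketch using only the standard library might look like the following. The field names come from the JSON above; the `ProjectConfig` dataclass, its default values, and the range check are assumptions, not the package's own settings model.

```python
import json
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class ProjectConfig:
    """Mirror of the keys shown in config.json; defaults are illustrative only."""

    project_root: str
    file_extensions: list[str] = field(default_factory=lambda: [".py"])
    embedding_model: str = "sentence-transformers/all-MiniLM-L6-v2"
    similarity_threshold: float = 0.75
    languages: list[str] = field(default_factory=list)
    watch_files: bool = False
    cache_embeddings: bool = True


def load_config(project_dir: Path) -> ProjectConfig:
    """Load .mcp-vector-search/config.json and apply a basic sanity check."""
    config_path = project_dir / ".mcp-vector-search" / "config.json"
    raw = json.loads(config_path.read_text(encoding="utf-8"))
    config = ProjectConfig(**raw)  # Unknown keys raise TypeError in this sketch.
    if not 0.0 <= config.similarity_threshold <= 1.0:
        raise ValueError("similarity_threshold must be between 0 and 1")
    return config
```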

## 🏗️ Architecture

### Core Components

- **Parser Registry**: Extensible system for language-specific parsing
- **Semantic Indexer**: Efficient code chunking and embedding generation
- **Vector Database**: ChromaDB integration for similarity search
- **File Watcher**: Real-time monitoring and incremental updates
- **CLI Interface**: Rich, user-friendly command-line experience

### Supported Languages

MCP Vector Search supports **8 languages** (six programming languages plus HTML and Text/Markdown) for semantic search:

| Language   | Extensions | Status | Features |
|------------|------------|--------|----------|
| Python     | `.py`, `.pyw` | ✅ Full | Functions, classes, methods, docstrings |
| JavaScript | `.js`, `.jsx`, `.mjs` | ✅ Full | Functions, classes, JSDoc, ES6+ syntax |
| TypeScript | `.ts`, `.tsx` | ✅ Full | Interfaces, types, generics, decorators |
| Dart       | `.dart` | ✅ Full | Functions, classes, widgets, async, dartdoc |
| PHP        | `.php`, `.phtml` | ✅ Full | Classes, methods, traits, PHPDoc, Laravel patterns |
| Ruby       | `.rb`, `.rake`, `.gemspec` | ✅ Full | Modules, classes, methods, RDoc, Rails patterns |
| HTML       | `.html`, `.htm` | ✅ Full | Semantic content extraction, heading hierarchy, text chunking |
| Text/Markdown | `.txt`, `.md`, `.markdown` | ✅ Basic | Semantic chunking for documentation |

**Planned Languages:**

| Language   | Status | Features |
|------------|--------|----------|
| Java       | 🔄 Planned | Classes, methods, annotations |
| Go         | 🔄 Planned | Functions, structs, interfaces |
| Rust       | 🔄 Planned | Functions, structs, traits |

#### New Language Support

**HTML Support** (Unreleased):
- **Semantic Extraction**: Content from h1-h6, p, section, article, main, aside, nav, header, footer
- **Intelligent Chunking**: Based on heading hierarchy (h1-h6)
- **Context Preservation**: Maintains class and id attributes for searchability
- **Script/Style Filtering**: Ignores non-content elements
- **Use Cases**: Static sites, documentation, web templates, HTML fragments
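
Heading-based chunking with script/style filtering, as described above, can be sketched with the standard-library `html.parser` module. This is an illustration of the idea only: `HeadingChunker` and `chunk_html` are hypothetical names, the sketch ignores `class`/`id` attributes, and the project's actual HTML parser may work differently.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
SKIPPED = {"script", "style"}  # Non-content elements whose text is ignored.


class HeadingChunker(HTMLParser):
    """Start a new chunk at every heading and collect the text that follows it."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[dict] = []
        self._current = {"heading": "", "level": 0, "parts": []}
        self._in_heading = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED:
            self._skip_depth += 1
        elif tag in HEADINGS:
            self.flush()
            self._current = {"heading": "", "level": int(tag[1]), "parts": []}
            self._in_heading = True

    def handle_endtag(self, tag):
        if tag in SKIPPED and self._skip_depth:
            self._skip_depth -= 1
        elif tag in HEADINGS:
            self._in_heading = False

    def handle_data(self, data):
        text = data.strip()
        if not text or self._skip_depth:
            return
        if self._in_heading:
            self._current["heading"] = (self._current["heading"] + " " + text).strip()
        else:
            self._current["parts"].append(text)

    def flush(self) -> None:
        """Store the chunk collected so far, if it has any content."""
        if self._current["heading"] or self._current["parts"]:
            self.chunks.append({
                "heading": self._current["heading"],
                "level": self._current["level"],
                "text": " ".join(self._current["parts"]),
            })
        self._current = {"heading": "", "level": 0, "parts": []}


def chunk_html(html: str) -> list[dict]:
    parser = HeadingChunker()
    parser.feed(html)
    parser.close()
    parser.flush()  # Emit the final chunk after the last heading.
    return parser.chunks
```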

**Dart/Flutter Support** (v0.4.15):
- **Widget Detection**: StatelessWidget, StatefulWidget recognition
- **State Classes**: Automatic parsing of `_WidgetNameState` patterns
- **Async Support**: `Future<T>` and async function handling
- **Dartdoc**: Triple-slash comment extraction
- **Tree-sitter AST**: Fast, accurate parsing with regex fallback

**PHP Support** (v0.5.0):
- **Class Detection**: Classes, interfaces, traits
- **Method Extraction**: Public, private, protected, static methods
- **Magic Methods**: `__construct`, `__get`, `__set`, `__call`, etc.
- **PHPDoc**: Full comment extraction
- **Laravel Patterns**: Controllers, Models, Eloquent support
- **Tree-sitter AST**: Fast parsing with regex fallback

**Ruby Support** (v0.5.0):
- **Module/Class Detection**: Full namespace support (::)
- **Method Extraction**: Instance and class methods
- **Special Syntax**: Support for method names ending in `?` or `!`
- **Attribute Macros**: attr_accessor, attr_reader, attr_writer
- **RDoc**: Comment extraction (# and =begin...=end)
- **Rails Patterns**: ActiveRecord, Controllers support
- **Tree-sitter AST**: Fast parsing with regex fallback

## 🤝 Contributing

We welcome contributions! Please see our [Contributing Guide](CONTRIBUTING.md) for details.

### Development Setup

```bash
# Clone the repository
git clone https://github.com/bobmatnyc/mcp-vector-search.git
cd mcp-vector-search

# Install dependencies with UV
uv sync

# Install in development mode
uv pip install -e .

# Run tests
uv run pytest

# Run linting
uv run ruff check
uv run mypy src/
```

### Adding Language Support

1. Create a new parser in `src/mcp_vector_search/parsers/`
2. Extend the `BaseParser` class
3. Register the parser in `parsers/registry.py`
4. Add tests and documentation

## 📊 Performance

- **Indexing Speed**: ~1000 files/minute (typical Python project)
- **Search Latency**: <100ms for most queries
- **Memory Usage**: ~50MB baseline + ~1MB per 1000 code chunks
- **Storage**: ~1KB per code chunk (compressed embeddings)
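
As a rough planning aid, the figures above can be turned into back-of-the-envelope estimates. The sketch below simply applies those approximate numbers and treats 1 MB as 1000 KB; `estimate_resources` and `estimate_indexing_minutes` are hypothetical helpers, and real usage will vary with project and hardware.

```python
def estimate_resources(num_chunks: int) -> dict[str, float]:
    """Apply the approximate figures above to a given number of code chunks."""
    memory_mb = 50 + num_chunks / 1000   # ~50MB baseline + ~1MB per 1000 chunks
    storage_mb = num_chunks / 1000       # ~1KB per chunk, with 1MB taken as 1000KB
    return {"memory_mb": memory_mb, "storage_mb": storage_mb}


def estimate_indexing_minutes(num_files: int, files_per_minute: float = 1000.0) -> float:
    """Estimate indexing time from the ~1000 files/minute figure."""
    return num_files / files_per_minute
```

For example, 200,000 chunks would come to roughly 250 MB of memory and 200 MB of storage under these assumptions.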

## ⚠️ Known Limitations (Alpha)

- **Tree-sitter Integration**: Currently using regex fallback parsing (Tree-sitter setup needs improvement)
- **Search Relevance**: Embedding model may need tuning for code-specific queries
- **Error Handling**: Some edge cases may not be gracefully handled
- **Documentation**: API documentation is minimal
- **Testing**: Limited test coverage, needs real-world validation

## 🙏 Feedback Needed

We're actively seeking feedback on:

- **Search Quality**: How relevant are the search results for your codebase?
- **Performance**: How does indexing and search speed feel in practice?
- **Usability**: Is the CLI interface intuitive and helpful?
- **Language Support**: Which languages would you like to see added next?
- **Features**: What functionality is missing for your workflow?

Please [open an issue](https://github.com/bobmatnyc/mcp-vector-search/issues) or start a [discussion](https://github.com/bobmatnyc/mcp-vector-search/discussions) to share your experience!

## 🔮 Roadmap

### v0.0.x: Alpha (Current) 🔄
- [x] Core CLI interface
- [x] Python/JS/TS parsing
- [x] ChromaDB integration
- [x] File watching
- [x] Basic search functionality
- [ ] Real-world testing and feedback
- [ ] Bug fixes and stability improvements
- [ ] Performance optimizations

### v0.1.x: Beta 🔮
- [ ] Advanced search modes (contextual, similar code)
- [ ] Additional language support (Java, Go, Rust)
- [ ] Configuration improvements
- [ ] Comprehensive testing suite
- [ ] Documentation improvements

### v1.0.x: Stable 🔮
- [ ] MCP server implementation
- [ ] IDE extensions (VS Code, JetBrains)
- [ ] Git integration
- [ ] Team collaboration features
- [ ] Production-ready performance

## 🛠️ Development

### Three-Stage Development Workflow

**Stage A: Local Development & Testing**
```bash
# Setup development environment
uv sync && uv pip install -e .

# Run development tests
./scripts/dev-test.sh

# Test CLI locally
uv run mcp-vector-search version
```

**Stage B: Local Deployment Testing**
```bash
# Build and test clean deployment
./scripts/deploy-test.sh

# Test on other projects
cd ~/other-project
mcp-vector-search init && mcp-vector-search index
```

**Stage C: PyPI Publication**
```bash
# Publish to PyPI
./scripts/publish.sh

# Verify published version
pip install mcp-vector-search --upgrade
```

### Quick Reference
```bash
./scripts/workflow.sh  # Show workflow overview
```

See [DEVELOPMENT.md](DEVELOPMENT.md) for detailed development instructions.

## 📚 Documentation

For comprehensive documentation, see **[CLAUDE.md](CLAUDE.md)** - the main documentation index.

### Quick Links
- **[Installation & Deployment](docs/DEPLOY.md)** - Setup and deployment guide
- **[Project Structure](docs/STRUCTURE.md)** - Architecture and file organization
- **[Contributing Guidelines](docs/developer/CONTRIBUTING.md)** - How to contribute
- **[API Reference](docs/developer/API.md)** - Internal API documentation
- **[Testing Guide](docs/developer/TESTING.md)** - Testing strategies
- **[Code Quality](docs/developer/LINTING.md)** - Linting and formatting
- **[Versioning](docs/VERSIONING.md)** - Version management
- **[Releases](docs/RELEASES.md)** - Release process

## 🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

## 📄 License

MIT License - see [LICENSE](LICENSE) file for details.

## 🙏 Acknowledgments

- [ChromaDB](https://github.com/chroma-core/chroma) for vector database
- [Tree-sitter](https://tree-sitter.github.io/) for parsing infrastructure
- [Sentence Transformers](https://www.sbert.net/) for embeddings
- [Typer](https://typer.tiangolo.com/) for CLI framework
- [Rich](https://rich.readthedocs.io/) for beautiful terminal output

---

**Built with ❤️ for developers who love efficient code search**
