Metadata-Version: 2.4
Name: largefile
Version: 0.1.8
Summary: MCP server for AI assistants to navigate, search, and edit large codebases, logs, and data files with semantic code analysis
Project-URL: Homepage, https://github.com/peteretelej/largefile
Project-URL: Repository, https://github.com/peteretelej/largefile.git
Project-URL: Issues, https://github.com/peteretelej/largefile/issues
Project-URL: Documentation, https://github.com/peteretelej/largefile#readme
Author-email: Peter Etelej <peter@etelej.com>
Maintainer-email: Peter Etelej <peter@etelej.com>
License: MIT
Keywords: ai-tools,code-navigation,large-files,llm,log-analysis,mcp,semantic-analysis,tree-sitter
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing
Classifier: Typing :: Typed
Requires-Python: >=3.10
Requires-Dist: chardet>=5.0.0
Requires-Dist: click>=8.2.1
Requires-Dist: mcp>=1.10.1
Requires-Dist: rapidfuzz>=3.0.0
Requires-Dist: tree-sitter-go>=0.21.0
Requires-Dist: tree-sitter-java>=0.21.0
Requires-Dist: tree-sitter-javascript>=0.21.0
Requires-Dist: tree-sitter-python>=0.21.0
Requires-Dist: tree-sitter-rust>=0.21.0
Requires-Dist: tree-sitter-typescript>=0.21.0
Requires-Dist: tree-sitter>=0.21.0
Description-Content-Type: text/markdown

# Largefile MCP Server

Navigate, search, and edit large codebases, logs, and data files that exceed AI context limits.

[![CI](https://img.shields.io/github/actions/workflow/status/peteretelej/largefile/ci.yml?branch=main&logo=github)](https://github.com/peteretelej/largefile/actions/workflows/ci.yml) [![codecov](https://codecov.io/gh/peteretelej/largefile/branch/main/graph/badge.svg)](https://codecov.io/gh/peteretelej/largefile) [![PyPI version](https://img.shields.io/pypi/v/largefile.svg)](https://pypi.org/project/largefile/) [![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

## Why Largefile?

- **Go beyond context limits** - Read, search, and edit files too large to fit in AI context windows
- **Semantic code navigation** - Tree-sitter extracts functions/classes for Python, JS/TS, Rust, Go, and Java
- **Fewer LLM errors** - Search/replace editing avoids the line-number mistakes common with line-based edits
- **Smart search** - Fuzzy matching, regex, case-insensitive, inverted, and count-only modes
- **No size limits** - Handles multi-GB files via tiered memory strategy (RAM → mmap → streaming)

## Quick Start

**Prerequisite:** Install [uv](https://docs.astral.sh/uv/getting-started/installation/) for the `uvx` command.

```json
{
  "mcpServers": {
    "largefile": {
      "command": "uvx",
      "args": ["--from", "largefile", "largefile-mcp"]
    }
  }
}
```

## Tools

| Tool               | Use For                                                |
| ------------------ | ------------------------------------------------------ |
| `get_overview`     | File structure and semantic outline before diving in   |
| `search_content`   | Finding patterns, counting occurrences, regex matching |
| `read_content`     | Reading specific sections; tail/head modes for logs    |
| `edit_content`     | Safe search/replace with automatic backups             |
| `revert_edit`      | Recovering from bad edits                              |
| `list_directory`   | Browse directory trees with recursive depth control    |
| `search_directory` | Search patterns across all files in a directory        |

## When to Use Largefile

**Use when:**

- File exceeds ~1000 lines or 100KB (supports multi-GB files)
- Navigating large codebases with semantic structure
- Analyzing log files (especially recent entries with tail mode)
- Making search/replace edits across large files
- Counting occurrences without loading full content

**Don't use for:**

- Small files that fit in context (AI doesn't need help with those)
- Binary files (images, executables, compressed archives)

## Usage Examples

### Large Codebase Navigation

```python
# Get semantic structure of a large Python file
overview = get_overview("/path/to/large_module.py")
# Returns: 2,847 lines, 15 classes, function outline via Tree-sitter

# Find all class definitions
classes = search_content("/path/to/large_module.py", "class ", fuzzy=False)

# Read complete class with semantic chunking
code = read_content("/path/to/large_module.py", pattern="class UserModel", mode="semantic")
```

### Batch Refactoring

```python
# Preview rename across file
preview = edit_content("/path/to/api.py", changes=[
    {"search": "process_data", "replace": "transform_data"},
    {"search": "old_endpoint", "replace": "new_endpoint"}
], preview=True)

# Apply changes (creates automatic backup)
result = edit_content("/path/to/api.py", changes=[...], preview=False)

# Undo if needed
revert_edit("/path/to/api.py")
```

### Log Analysis

```python
# Get log file overview
overview = get_overview("/var/log/app.log")
# Returns: 150,000 lines, 2.1GB

# Read last 500 lines efficiently
recent = read_content("/var/log/app.log", limit=500, mode="tail")

# Count errors without loading content
error_count = search_content("/var/log/app.log", "ERROR", count_only=True, fuzzy=False)

# Find errors with regex
errors = search_content("/var/log/app.log", r"ERROR.*timeout", regex=True)
```

## Supported Languages

Tree-sitter semantic analysis for: **Python**, **JavaScript/JSX**, **TypeScript/TSX**, **Rust**, **Go**, **Java**

Other file types use text-based analysis with graceful fallback.

## File Size Handling

| Size     | Strategy                                |
| -------- | --------------------------------------- |
| < 50MB   | Full memory loading with AST caching    |
| 50-500MB | Memory-mapped access                    |
| > 500MB  | Streaming (tail/head modes recommended) |
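
As a rough illustration of the tiers above, the sketch below maps a file size to a strategy. The function names, the returned labels, and the treatment of sizes that fall exactly on a boundary are assumptions made for this example, not largefile's actual implementation. The defaults mirror the table, and the environment variable names are the ones listed under Configuration.

```python
import os

MB = 1024 * 1024


def choose_strategy(size_bytes: int, memory_mb: int = 50, mmap_mb: int = 500) -> str:
    """Map a file size to one of the three tiers in the table above.

    The labels and the handling of exact boundary sizes are assumptions.
    """
    if size_bytes < memory_mb * MB:
        return "memory"      # load fully into RAM
    if size_bytes <= mmap_mb * MB:
        return "mmap"        # memory-mapped access
    return "streaming"       # read sequentially in chunks


def strategy_for_path(path: str) -> str:
    """Hypothetical helper: read thresholds from the environment, then classify."""
    memory_mb = int(os.environ.get("LARGEFILE_MEMORY_THRESHOLD_MB", "50"))
    mmap_mb = int(os.environ.get("LARGEFILE_MMAP_THRESHOLD_MB", "500"))
    return choose_strategy(os.path.getsize(path), memory_mb, mmap_mb)
```

Keeping the thresholds as parameters means a lower memory limit (for example on a constrained machine) moves files into the mmap tier earlier without touching the classification logic.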

## Configuration

Environment variables for tuning:

```bash
LARGEFILE_MEMORY_THRESHOLD_MB=50      # RAM loading limit
LARGEFILE_MMAP_THRESHOLD_MB=500       # Memory mapping limit
LARGEFILE_FUZZY_THRESHOLD=0.8         # Match sensitivity (0.0-1.0)
LARGEFILE_MAX_SEARCH_RESULTS=20       # Results per search
LARGEFILE_BACKUP_DIR=~/.largefile/backups
```
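
To give a feel for what a match sensitivity of `0.8` means, here is a minimal sketch of threshold-based fuzzy line matching. It uses the standard library's `difflib.SequenceMatcher` as a stand-in so the example has no dependencies; largefile itself depends on rapidfuzz, whose scoring may differ. The `fuzzy_search` function and its return shape are hypothetical and are not part of largefile's API.

```python
from difflib import SequenceMatcher


def fuzzy_search(lines, pattern, threshold=0.8):
    """Return (line_number, line, score) for lines similar enough to pattern.

    Similarity is difflib's ratio in [0.0, 1.0]; this is a stand-in for
    whatever scorer largefile actually uses.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        score = SequenceMatcher(None, pattern, line.strip()).ratio()
        if score >= threshold:
            hits.append((number, line, score))
    return hits


lines = ["process_data", "proces_data", "unrelated text"]
for number, line, score in fuzzy_search(lines, "process_data"):
    print(f"{number}: {line} ({score:.2f})")
```

Under this scorer, raising the threshold toward `1.0` keeps only near-exact matches, while lowering it admits lines with more typos or differences.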

## Documentation

- [API Reference](docs/API.md) - Detailed tool documentation
- [Configuration Guide](docs/configuration.md) - All environment variables
- [Examples](docs/examples.md) - More workflow examples
- [Design Document](docs/design.md) - Architecture details
- [Contributing](docs/CONTRIBUTING.md) - Development setup
