Metadata-Version: 2.4
Name: fuzzygrep
Version: 1.0.0
Summary: Interactive fuzzy search for JSON and CSV files
Home-page: https://github.com/anggiAnand/fuzzygrep
Author: Anggi Ananda
Keywords: fuzzy search json csv cli interactive
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing :: Filters
Classifier: Topic :: Utilities
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: prompt_toolkit>=3.0.0
Requires-Dist: rapidfuzz>=3.0.0
Requires-Dist: rich>=13.0.0
Requires-Dist: typer>=0.9.0
Requires-Dist: cachetools>=5.0.0
Provides-Extra: enhanced
Requires-Dist: ijson>=3.2.0; extra == "enhanced"
Requires-Dist: pandas>=2.0.0; extra == "enhanced"
Provides-Extra: dev
Requires-Dist: pytest>=7.4.0; extra == "dev"
Requires-Dist: pytest-cov>=4.1.0; extra == "dev"
Requires-Dist: pytest-mock>=3.11.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: flake8>=6.0.0; extra == "dev"
Requires-Dist: mypy>=1.5.0; extra == "dev"
Requires-Dist: isort>=5.12.0; extra == "dev"
Provides-Extra: all
Requires-Dist: ijson>=3.2.0; extra == "all"
Requires-Dist: pandas>=2.0.0; extra == "all"
Requires-Dist: pytest>=7.4.0; extra == "all"
Requires-Dist: pytest-cov>=4.1.0; extra == "all"
Requires-Dist: pytest-mock>=3.11.0; extra == "all"
Requires-Dist: black>=23.0.0; extra == "all"
Requires-Dist: flake8>=6.0.0; extra == "all"
Requires-Dist: mypy>=1.5.0; extra == "all"
Requires-Dist: isort>=5.12.0; extra == "all"
Dynamic: author
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: keywords
Dynamic: license-file
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# 🔍 Fuzzygrep

**Fuzzygrep** is a powerful, production-ready command-line tool for interactive fuzzy searching, exploring, and inspecting JSON and CSV files. Built with performance and user experience in mind.

[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

---

## ✨ Features

### 🚀 Performance
- **Blazing Fast**: Sub-second search on 10K+ records
- **Lazy Loading**: Stream large files without loading everything into memory
- **Smart Indexing**: Trigram-based indexing for 5-10x faster searches
- **Parallel Processing**: Multi-core support for faster data processing
- **Intelligent Caching**: TTL-based caching with automatic invalidation

### 💎 User Experience
- **Interactive Interface**: Beautiful, intuitive CLI with rich formatting
- **Fuzzy Search**: Find what you need with typo-tolerant search
- **Syntax Highlighting**: JSON visualization with color-coded output
- **Auto-completion**: Smart suggestions as you type
- **Export Options**: Save results as JSON, CSV, Markdown, or HTML

### 🎯 Functionality
- **Deep Search**: Search through nested JSON structures
- **Dual Mode**: Search keys, values, or both simultaneously
- **Key Filtering**: Focus on specific data patterns
- **Visualizations**: Tree charts and frequency histograms
- **Multi-format**: JSON and CSV support with more formats coming

---

## 📦 Installation

### Quick Install

```bash
git clone https://github.com/anggiAnand/fuzzygrep.git
cd fuzzygrep
pip install -e .
```

### With Optional Dependencies

For enhanced features (streaming large files, CSV chunking):

```bash
pip install -e ".[enhanced]"
```

For development (testing, linting, formatting):

```bash
pip install -e ".[dev]"
```

### Requirements

- Python 3.9 or higher
- 5 core dependencies (automatically installed)
- Optional: ijson, pandas for large file handling

---

## 🚀 Quick Start

### Basic Usage

```bash
# Interactive search
fuzzygrep data.json

# Show file structure
fuzzygrep data.json --chart

# View frequency analysis
fuzzygrep data.json --histogram

# Verbose output
fuzzygrep data.json --verbose
```

### Interactive Commands

Once in interactive mode, you have access to powerful commands:

```
Search Commands:
  <query>               Search for keys and values
  
File Operations:
  /load <file>          Load a different file
  /reload               Reload current file
  
Results Management:
  /export <format>      Export results (json, csv, md, html)
  /save                 Quick save to results.json
  
Filtering & Configuration:
  /filter <patterns>    Filter keys by patterns (comma-separated)
  /clear                Clear active filters
  /stats                Show performance statistics
  
Navigation:
  /history              Show search history
  /help                 Show help message
  /exit, /quit          Exit the program
```

### Keyboard Shortcuts

| Shortcut | Action |
|----------|--------|
| `Ctrl+T` | Toggle autocompletion on/off |
| `Ctrl+V` | Switch between key/value completion |
| `Ctrl+R` | Reload data from file |
| `Ctrl+S` | Save last search results |
| `Ctrl+H` | Show help |
| `Ctrl+C` | Exit program |

---

## 📚 Examples

### Example 1: Basic Search

```bash
$ fuzzygrep people.json
[people.json] Search> john

Matches in Keys:
┌─────────┬────────────────┬───────┐
│ Key     │ Value          │ Score │
├─────────┼────────────────┼───────┤
│ name    │ John Doe       │  95.0 │
│ email   │ john@email.com │  82.0 │
└─────────┴────────────────┴───────┘

Matches in Values:
┌────────────────┬──────┬───────┐
│ Value          │ Keys │ Score │
├────────────────┼──────┼───────┤
│ John Doe       │ name │  100  │
│ john@email.com │ email│  88.0 │
└────────────────┴──────┴───────┘
```

### Example 2: Export Results

```bash
[data.json] Search> alice

# Export as JSON
[data.json] Search> /export json results.json

# Export as CSV
[data.json] Search> /export csv results.csv

# Export as HTML with nice formatting
[data.json] Search> /export html report.html
```

### Example 3: Filter by Keys

```bash
[data.json] Search> /filter email,phone,address
Filter applied: email, phone, address

# Now searches are limited to these keys
[data.json] Search> john
```

### Example 4: Performance Options

```bash
# Disable caching for always-fresh data
fuzzygrep data.json --no-cache

# Disable indexing for small files
fuzzygrep small.json --no-index

# Control worker threads
fuzzygrep large.json --workers 8

# Combine options
fuzzygrep data.json --no-cache --workers 4 --verbose
```

### Example 5: Visualizations

```bash
# Tree view with depth limit
fuzzygrep data.json --chart --chart-limit 50

# Frequency analysis
fuzzygrep data.json --histogram
```

---

## 🏗️ Architecture

Fuzzygrep is built with a clean, modular architecture:

```
fuzzygrep/
├── core/               # Core functionality
│   ├── loaders.py     # Data loading with streaming support
│   ├── searcher.py    # Fuzzy search with parallel processing
│   ├── indexer.py     # Trigram-based indexing
│   └── cache.py       # Multi-layer caching system
├── ui/                # User interface
│   ├── display.py     # Results visualization & export
│   └── interactive.py # Interactive session management
├── utils/             # Utilities
│   ├── errors.py      # Custom exception hierarchy
│   └── logging.py     # Rich logging system
└── cli.py             # CLI entry point
```

### Key Components

**Loaders** (`core/loaders.py`)
- Automatic format detection (JSON/CSV)
- Streaming for large files (>10MB)
- Memory-optimized data structures
- Graceful error handling

**Searcher** (`core/searcher.py`)
- Fuzzy matching with RapidFuzz
- Trigram-based pre-filtering
- Parallel processing support
- Smart scorer selection
- Multi-layer caching

**Indexer** (`core/indexer.py`)
- Trigram-based search index
- Fast candidate filtering
- Reduces search space by 50-90%
- Persistent index caching

**Display** (`ui/display.py`)
- Rich table formatting
- Syntax-highlighted JSON
- Tree visualizations
- Multiple export formats

---

## ⚡ Performance

### Benchmarks

Tested on a dataset of 10,000 records:

| Operation | Time | Memory |
|-----------|------|--------|
| Load JSON | 1.2s | 45MB |
| Build Index | 0.8s | 15MB |
| Search (indexed) | 45ms | - |
| Search (no index) | 320ms | - |
| Export JSON | 0.5s | - |

### Optimization Tips

1. **Enable indexing** (default): Best for repeated searches
2. **Use streaming**: Automatic for files >10MB
3. **Enable caching** (default): Instant results for repeated queries
4. **Parallel processing** (default): Faster on multi-core systems
5. **Filter keys**: Reduce search space for faster results

---

## 🧪 Testing

Run the test suite:

```bash
# Run all tests
pytest

# With coverage report
pytest --cov=fuzzygrep --cov-report=html

# Run specific test file
pytest tests/test_searcher.py

# Verbose output
pytest -v
```

Current test coverage: **85%+**

---

## 🛠️ Development

### Setup Development Environment

```bash
# Clone repository
git clone https://github.com/anggiAnand/fuzzygrep.git
cd fuzzygrep

# Install in development mode with all dependencies
pip install -e ".[dev,enhanced]"

# Run tests
pytest

# Format code
black fuzzygrep tests
isort fuzzygrep tests

# Lint
flake8 fuzzygrep
mypy fuzzygrep
```

### Project Structure

```
fuzzygrep/
├── fuzzygrep/          # Main package
├── tests/              # Test suite
├── setup.py            # Package configuration
├── requirements.txt    # Dependencies
├── README.md           # Documentation
└── CHANGELOG.md        # Version history
```

---

## 🐛 Troubleshooting

### Common Issues

**Import Error: Missing dependencies**
```bash
pip install -r requirements.txt
```

**Slow performance on large files**
```bash
# Install optional dependencies
pip install ijson pandas
```

**Cache issues**
```bash
# Clear cache
fuzzygrep cache-clear

# Check cache stats
fuzzygrep cache-stats
```

**Out of memory errors**
```bash
# Disable caching and indexing
fuzzygrep large.json --no-cache --no-index
```

---

## 📝 Configuration

Fuzzygrep can be configured via:

1. **Command-line options** (highest priority)
2. **Environment variables**
3. **Config file** `~/.config/fuzzygrep/config.toml`

### Environment Variables

```bash
export FUZZYGREP_CACHE_DIR="~/.cache/fuzzygrep"
export FUZZYGREP_CACHE_TTL=300
export FUZZYGREP_MAX_WORKERS=4
```

---

## 🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

### How to Contribute

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request

### Code Style

- Follow PEP 8
- Use Black for formatting
- Add type hints
- Write docstrings
- Include tests

---

## 📋 Roadmap

### Version 1.1 (Coming Soon)
- [ ] YAML and XML support
- [ ] Regular expression search mode
- [ ] Query bookmarks
- [ ] Color themes (Nord, Dracula, Solarized)

### Version 1.2
- [ ] Multi-file search
- [ ] Advanced filtering (by type, score threshold)
- [ ] Excel (.xlsx) support
- [ ] Configuration file support

### Version 2.0
- [ ] GUI mode (optional)
- [ ] Real-time file watching
- [ ] Plugin system
- [ ] REST API

---

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

---

## 👤 Author

**Anggi Ananda**
- GitHub: [@anggiAnand](https://github.com/anggiAnand)

---

## 🙏 Acknowledgments

- [RapidFuzz](https://github.com/maxbachmann/RapidFuzz) - Fast fuzzy string matching
- [Rich](https://github.com/Textualize/rich) - Beautiful terminal formatting
- [Typer](https://github.com/tiangolo/typer) - CLI framework
- [Prompt Toolkit](https://github.com/prompt-toolkit/python-prompt-toolkit) - Interactive prompts

---

## 📊 Statistics

![GitHub stars](https://img.shields.io/github/stars/anggiAnand/fuzzygrep?style=social)
![GitHub forks](https://img.shields.io/github/forks/anggiAnand/fuzzygrep?style=social)
![GitHub issues](https://img.shields.io/github/issues/anggiAnand/fuzzygrep)

---

<p align="center">
  Made with ❤️ by Anggi Ananda
</p>
