Metadata-Version: 2.4
Name: memvid
Version: 0.1.0
Summary: Video-based AI memory library for fast semantic search and retrieval
Home-page: https://github.com/olow304/memvid
Author: Memvid Team
Author-email: team@memvid.ai
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Multimedia :: Video
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: qrcode[pil]==8.2
Requires-Dist: opencv-python==4.11.0.86
Requires-Dist: pyzbar==0.1.9
Requires-Dist: sentence-transformers==4.1.0
Requires-Dist: numpy==1.26.4
Requires-Dist: openai==1.82.0
Requires-Dist: tqdm==4.67.1
Requires-Dist: faiss-cpu==1.7.4
Requires-Dist: Pillow==11.2.1
Requires-Dist: python-dotenv==1.1.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: flake8>=6.0.0; extra == "dev"
Provides-Extra: pdf
Requires-Dist: PyPDF2==3.0.1; extra == "pdf"
Provides-Extra: web
Requires-Dist: fastapi>=0.100.0; extra == "web"
Requires-Dist: gradio>=4.0.0; extra == "web"
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: license-file
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# Memvid - Video-Based AI Memory 🧠📹

**The lightweight, game-changing solution for AI memory at scale**

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)

Memvid revolutionizes AI memory management by encoding text data into videos, enabling **lightning-fast semantic search** across millions of text chunks with **sub-second retrieval times**. Unlike traditional vector databases that consume massive amounts of RAM and storage, Memvid compresses your knowledge base into compact video files while maintaining instant access to any piece of information.

## 🚀 Why Memvid?

### Game-Changing Innovation
- **Video as Database**: Store millions of text chunks in a single MP4 file
- **Instant Retrieval**: Sub-second semantic search across massive datasets
- **10x Storage Efficiency**: Video compression reduces memory footprint dramatically
- **Zero Infrastructure**: No database servers, just files you can copy anywhere
- **Offline-First**: Works completely offline once videos are generated

### Lightweight Architecture
- **Minimal Dependencies**: Core functionality in ~1000 lines of Python
- **CPU-Friendly**: Runs efficiently without GPU requirements
- **Portable**: Single video file contains your entire knowledge base
- **Streamable**: Videos can be streamed from cloud storage

## 📦 Installation

### Prerequisites
```bash
# Ubuntu/Debian
sudo apt-get install libzbar0

# macOS
brew install zbar

# Windows
# Download and install from https://sourceforge.net/projects/zbarw/
```

### Install from PyPI
```bash
pip install memvid
```

### Install from Source
```bash
git clone https://github.com/yourusername/memvid.git
cd memvid
pip install -e .
```

### Development Installation
```bash
# Create virtual environment
python -m venv .memvid
source .memvid/bin/activate  # On Windows: .memvid\Scripts\activate

# Install with development dependencies
pip install -e ".[dev]"
```

## 🎯 Quick Start

### Basic Usage
```python
from memvid import MemvidEncoder, MemvidChat

# Create video memory from text chunks
chunks = ["Important fact 1", "Important fact 2", "Historical event details", ...]
encoder = MemvidEncoder()
encoder.add_chunks(chunks)
encoder.build_video("memory.mp4", "memory_index.json")

# Chat with your memory
chat = MemvidChat("memory.mp4", "memory_index.json")
chat.start_session()
response = chat.chat("What do you know about historical events?")
print(response)
```

### Building Memory from Documents
```python
from memvid import MemvidEncoder
import os

# Load documents
encoder = MemvidEncoder(chunk_size=512, overlap=50)

# Add text files
for file in os.listdir("documents"):
    with open(f"documents/{file}", "r") as f:
        encoder.add_text(f.read(), metadata={"source": file})

# Build optimized video
encoder.build_video(
    "knowledge_base.mp4",
    "knowledge_index.json",
    fps=30,  # Higher FPS = more chunks per second
    frame_size=512  # Larger frames = more data per frame
)
```

### Advanced Search & Retrieval
```python
from memvid import MemvidRetriever

# Initialize retriever
retriever = MemvidRetriever("knowledge_base.mp4", "knowledge_index.json")

# Semantic search
results = retriever.search("machine learning algorithms", top_k=5)
for chunk, score in results:
    print(f"Score: {score:.3f} | {chunk[:100]}...")

# Get context window
context = retriever.get_context("explain neural networks", max_tokens=2000)
print(context)
```

### Interactive Chat Interface
```python
from memvid import MemvidInteractive

# Launch interactive chat UI
interactive = MemvidInteractive("knowledge_base.mp4", "knowledge_index.json")
interactive.run()  # Opens web interface at http://localhost:7860
```

## 📊 Performance Benchmarks

| Dataset Size | Encoding Time | Search Time | Storage Size | RAM Usage |
|-------------|--------------|-------------|--------------|-----------|
| 10K chunks | 45 seconds | 0.05s | 12 MB | 50 MB |
| 100K chunks | 8 minutes | 0.15s | 120 MB | 200 MB |
| 1M chunks | 80 minutes | 0.8s | 1.2 GB | 800 MB |
| 10M chunks | 13 hours | 1.5s | 12 GB | 2 GB |

*Benchmarked on Intel i7-9700K, 16GB RAM, SSD storage*

## 🔧 API Reference

### MemvidEncoder
```python
encoder = MemvidEncoder(
    chunk_size=512,      # Characters per chunk
    overlap=50,          # Character overlap between chunks
    model_name='all-MiniLM-L6-v2'  # Sentence transformer model
)

# Methods
encoder.add_chunks(chunks: List[str], metadata: List[dict] = None)
encoder.add_text(text: str, metadata: dict = None)
encoder.build_video(video_path: str, index_path: str, fps: int = 30, qr_size: int = 512)
```

### MemvidRetriever
```python
retriever = MemvidRetriever(
    video_path: str,
    index_path: str,
    cache_size: int = 100  # Number of frames to cache
)

# Methods
results = retriever.search(query: str, top_k: int = 5)
context = retriever.get_context(query: str, max_tokens: int = 2000)
chunks = retriever.get_chunks_by_ids(chunk_ids: List[int])
```

### MemvidChat
```python
chat = MemvidChat(
    video_path: str,
    index_path: str,
    llm_backend: str = 'openai',  # 'openai', 'anthropic', 'local'
    model: str = 'gpt-4'
)

# Methods
chat.start_session(system_prompt: str = None)
response = chat.chat(message: str, stream: bool = False)
chat.clear_history()
chat.export_conversation(path: str)
```

## 🛠️ Advanced Configuration

### Custom Embeddings
```python
from sentence_transformers import SentenceTransformer

# Use custom embedding model
custom_model = SentenceTransformer('sentence-transformers/all-mpnet-base-v2')
encoder = MemvidEncoder(embedding_model=custom_model)
```

### Video Optimization
```python
# For maximum compression
encoder.build_video(
    "compressed.mp4",
    "index.json",
    fps=60,  # More frames per second
    frame_size=256,  # Smaller frames
    video_codec='h265',  # Better compression
    crf=28  # Compression quality (lower = better quality)
)
```

### Distributed Processing
```python
# Process large datasets in parallel
encoder = MemvidEncoder(n_workers=8)
encoder.add_chunks_parallel(massive_chunk_list)
```

## 🐛 Troubleshooting

### Common Issues

**Video Decode Errors**
```bash
# Increase error correction
encoder.build_video(..., error_correction='H')  # Highest error correction
```

**Memory Issues with Large Datasets**
```python
# Use streaming mode
encoder = MemvidEncoder(streaming=True)
for batch in chunk_generator():
    encoder.add_chunks_batch(batch)
```

**Slow Search Performance**
```python
# Pre-warm cache
retriever.warm_cache(common_queries=['AI', 'machine learning', 'data'])
```

## 🤝 Contributing

We welcome contributions! Please see our [Contributing Guide](CONTRIBUTING.md) for details.

```bash
# Run tests
pytest tests/

# Run with coverage
pytest --cov=memvid tests/

# Format code
black memvid/
```

## 📚 Examples

Check out the [examples/](examples/) directory for:
- Building memory from Wikipedia dumps
- Creating a personal knowledge base
- Multi-language support
- Real-time memory updates
- Integration with popular LLMs

## 🔗 Links

- [Documentation](https://memvid.readthedocs.io)
- [PyPI Package](https://pypi.org/project/memvid)
- [GitHub Repository](https://github.com/yourusername/memvid)
- [Discord Community](https://discord.gg/memvid)

## 📄 License

MIT License - see [LICENSE](LICENSE) file for details.

## 🙏 Acknowledgments

Built with ❤️ using:
- [sentence-transformers](https://www.sbert.net/) for embeddings
- [OpenCV](https://opencv.org/) for video processing
- [qrcode](https://github.com/lincolnloop/python-qrcode) & [pyzbar](https://github.com/NaturalHistoryMuseum/pyzbar) for data encoding

---

**Ready to revolutionize your AI memory management? Install Memvid and start building!** 🚀
