Metadata-Version: 2.4
Name: gurrt
Version: 0.1.4
Summary: An intelligent open-source video understanding system: a different path from traditional Large Video Language Models (LVLMs), built for modularity, openness, and real-world usability.
Author-email: Mohammad Owais <owaismohammad2515@gmail.com>, Fareha Aslam <farehaaslam57@gmail.com>
License-Expression: MIT
Requires-Python: >=3.12
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: opencv-python>=4.13.0.92
Requires-Dist: transformers>=5.1.0
Requires-Dist: accelerate>=1.12.0
Requires-Dist: pillow>=12.1.0
Requires-Dist: chromadb>=1.4.1
Requires-Dist: ollama>=0.6.1
Requires-Dist: langchain>=1.2.9
Requires-Dist: langchain-groq>=1.1.2
Requires-Dist: moviepy>=1.0.3
Requires-Dist: sentence-transformers>=5.2.2
Requires-Dist: tqdm>=4.67.3
Requires-Dist: scenedetect>=0.6.7.1
Requires-Dist: scikit-image>=0.26.0
Requires-Dist: faster-whisper>=1.2.1
Requires-Dist: langchain-text-splitters>=1.1.0
Requires-Dist: fastapi>=0.128.7
Requires-Dist: pydantic>=2.12.5
Requires-Dist: supermemory>=3.24.0
Requires-Dist: platformdirs>=4.5.1
Requires-Dist: typer>=0.21.1
Requires-Dist: opencv-python-headless>=4.13.0.92
Provides-Extra: cuda
Requires-Dist: torch; extra == "cuda"
Requires-Dist: torchvision; extra == "cuda"
Requires-Dist: torchaudio; extra == "cuda"
Dynamic: license-file

# gurrt
An intelligent video understanding system designed as an open-source alternative to monolithic Large Video Language Models (LVLMs).

I built gurrt out of frustration.

Working with Large Video Language Models locally is:

- Expensive to set up  
- GPU intensive  
- Slow to experiment with  
- Difficult to run on consumer hardware  
- Often closed or partially restricted  

Most state-of-the-art video models require massive compute clusters and large-scale infrastructure.  
They are impressive, but they are not accessible.

If meaningful video intelligence requires:

- Multiple high-end GPUs  
- Hours of inference time  
- Proprietary model access  

Then it stops feeling truly open.

---

## A Different Philosophy

gurrt is not an attempt to compete with systems like YouTube’s internal models or other large-scale industrial LVLMs trained on massive GPU clusters.

It is an attempt to rethink the approach.

Instead of asking how to build a larger end-to-end video transformer, it explores a different path:

- Smarter frame sampling techniques  
- Stronger and more modular vision models  
- Better structured embedding strategies  
- More efficient and grounded RAG pipelines  
- Persistent memory-driven reasoning  

It represents a belief that meaningful video understanding can emerge from:

- Thoughtful engineering  
- Smart sampling  
- Strong modular components  
- Memory-augmented retrieval  

Not just from massive GPU clusters and billion-parameter models.
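
To make "smart sampling" concrete, here is a minimal sketch of one common idea: keep a frame only when it differs enough from the last frame that was kept. This is an illustration, not gurrt's actual sampler. The function names, the `threshold` value, and the flat-list frame representation are all assumptions chosen to keep the example dependency-free; a real pipeline would more likely work on NumPy arrays read through OpenCV.

```python
from typing import List, Sequence


def mean_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute per-pixel difference between two equally sized frames."""
    if not a or len(a) != len(b):
        raise ValueError("frames must be non-empty and the same size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def select_keyframes(
    frames: Sequence[Sequence[float]], threshold: float = 10.0
) -> List[int]:
    """Return indices of frames that differ enough from the last kept frame.

    Hypothetical helper: `threshold` is an illustrative value,
    not a gurrt configuration setting.
    """
    if not frames:
        return []
    kept = [0]  # always keep the first frame
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i], frames[kept[-1]]) > threshold:
            kept.append(i)
    return kept


# Synthetic 4-pixel grayscale "frames": a static shot, then a cut.
frames = [
    [10, 10, 10, 10],
    [11, 10, 9, 10],       # nearly identical to frame 0, skipped
    [200, 200, 200, 200],  # scene change, kept
    [201, 199, 200, 200],  # nearly identical to frame 2, skipped
]
print(select_keyframes(frames))  # [0, 2]
```

Comparing against the last kept frame, rather than the immediately preceding one, means slow gradual changes still produce a new keyframe once they accumulate past the threshold.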


## Architecture Overview
```text
Video
  │
  ├── Smart Frame Extraction
  │     └── Captioning + Embeddings
  │
  ├── Audio Extraction
  │     └── Speech-to-Text + Embeddings
  │
  ├── Vector Memory Store
  │
  ├── Supermemory (Persistent Conversation Layer)
  │
  └── LLM Reasoning Engine
```
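
The flow above can be sketched as a toy retrieval loop: captions and transcript chunks are embedded, stored with their timestamps, and the closest entries are fetched as context for the LLM. Everything here is a stand-in under stated assumptions. `embed` is a bag-of-words toy rather than a real embedding model, `MemoryStore` is a hypothetical in-memory class rather than the project's vector database interface, and the sample texts are invented.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


def embed(text: str, vocab: List[str]) -> List[float]:
    """Toy bag-of-words embedding; a stand-in for a real embedding model."""
    words = text.lower().split()
    return [float(words.count(term)) for term in vocab]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class MemoryStore:
    """Hypothetical in-memory store of (vector, metadata) pairs."""

    vocab: List[str]
    entries: List[Tuple[List[float], Dict[str, object]]] = field(default_factory=list)

    def add(self, text: str, source: str, timestamp: float) -> None:
        meta = {"text": text, "source": source, "timestamp": timestamp}
        self.entries.append((embed(text, self.vocab), meta))

    def search(self, query: str, k: int = 1) -> List[Tuple[float, Dict[str, object]]]:
        """Return the k entries most similar to the query, best first."""
        q = embed(query, self.vocab)
        scored = [(cosine(q, vec), meta) for vec, meta in self.entries]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


store = MemoryStore(vocab=["dog", "ball", "park", "weather", "rain", "forecast"])
# Visual branch: a caption generated from a sampled frame.
store.add("a dog chases a ball in the park", source="caption", timestamp=3.0)
# Audio branch: a chunk of the speech-to-text transcript.
store.add("the forecast says rain and bad weather", source="transcript", timestamp=12.5)

score, hit = store.search("what does the dog chase", k=1)[0]
print(hit["source"], hit["timestamp"])  # caption 3.0
```

Storing the source and timestamp next to each vector is what lets the reasoning step cite where in the video an answer came from.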

## Project Setup (using uv)

```bash
# Install uv if you haven't already
pip install uv

# Sync dependencies
uv sync

# Activate the environment (Windows)
.venv\Scripts\activate

# Activate the environment (macOS/Linux)
source .venv/bin/activate
```

## File Structure

```text
gurrt/
├── src/
│   │
│   └── videorag/                      # Core Video-RAG application package
│       │
│       ├── api/
│       │   └── server.py              # API server (exposes endpoints for querying, ingestion, etc.)
│       │
│       ├── cli/
│       │   └── main.py                # CLI entry point (init, ingest, query commands)
│       │
│       ├── config/
│       │   └── config.py              # Configuration management (API keys, paths, environment setup)
│       │
│       ├── core/                      # Core intelligence pipeline
│       │   ├── __init__.py
│       │   ├── asr.py                 # Audio extraction + speech-to-text processing
│       │   ├── embedding.py           # Embedding generation for captions & transcripts
│       │   ├── llm.py                 # LLM interaction and reasoning logic
│       │   ├── models.py              # Model loading and management utilities
│       │   ├── pipeline.py            # End-to-end ingestion + query pipeline orchestration
│       │   ├── prompts.py             # Prompt templates and structured context injection
│       │   ├── search.py              # Retrieval logic (semantic search over stored embeddings)
│       │   └── vectordb.py            # Vector database interface and storage abstraction
│       │
│       └── utils/
│           └── utils.py               # Shared utility functions and helpers
│
└── README.md                          # Project documentation
```



