Metadata-Version: 2.4
Name: matrice_vss
Version: 0.1.2
Summary: Video Search & Summarization service for Matrice.ai
Author-email: "Matrice.ai" <dipendra@matrice.ai>
License: MIT
Project-URL: Homepage, https://matrice.ai
Project-URL: Repository, https://github.com/matrice-ai/py_vss
Keywords: matrice,vss,video,search,summarization,langgraph,llm,vlm
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Operating System :: POSIX :: Linux
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Typing :: Typed
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE.txt
Requires-Dist: pyyaml>=6.0.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.21.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Dynamic: license-file
Dynamic: requires-python

# VSS — Video Search & Summarization Service

> **Production-grade natural language query engine for video analytics data.**  
> Ask questions in plain English. Get answers from your ClickHouse database or directly from camera frames — no SQL knowledge required.

---

## Table of Contents

1. [Executive Summary](#1-executive-summary)
2. [Project Structure](#2-project-structure)
3. [Tech Stack](#3-tech-stack)
4. [High-Level Architecture](#4-high-level-architecture)
5. [Production Workflow](#5-production-workflow)
6. [Data Schema](#6-data-schema)
7. [Configuration & Environment Variables](#7-configuration--environment-variables)
8. [How to Start the Project](#8-how-to-start-the-project)
9. [Docker Setup on H100](#9-docker-setup-on-h100)
10. [Ports Reference](#10-ports-reference)
11. [API Reference](#11-api-reference)
12. [Logging & Observability](#12-logging--observability)
13. [Testing & Validation](#13-testing--validation)
14. [Limitations & Open Items](#14-limitations--open-items)

---

## 1. Executive Summary

VSS is a **two-mode AI query service** built for video analytics platforms. It sits on top of a ClickHouse analytics database and (optionally) a Vision Language Model, exposing a single clean REST API that accepts natural language.

**What it does:**

- **Mode 1 — SQL Analytics:** Accepts a text question (e.g. *"How many vehicles were detected on Camera 5 yesterday?"*), classifies intent, auto-generates a safe read-only ClickHouse SQL query via an LLM, executes it, and returns a human-readable answer.
- **Mode 2 — Visual Analysis:** Accepts a question + a base64-encoded image (e.g. a camera frame), routes it to a Vision Language Model (`nvidia/Cosmos-Reason2-8B`), and returns a scene-level description or answer.

**Design principles:**


| Principle             | Implementation                                               |
| --------------------- | ------------------------------------------------------------ |
| SQL-only by default   | VLM is fully disabled unless `VSS_VLM_ENABLED=true`          |
| Zero hardcodes        | Every credential, host, port, and GPU index is config-driven |
| Safe by default       | All SQL is validated — only `SELECT`/`WITH` allowed          |
| Pluggable backends    | LLM backend switchable between Ollama and vLLM               |
| Production-observable | Three rotating log files, structured per-request traces      |


---

## 2. Project Structure

```
vss/
├── main.py                         # FastAPI app entrypoint, CORS, lifespan
├── requirements.txt                # Python dependencies
├── .env                            # Runtime secrets (gitignored)
├── .env.example                    # Template — copy to .env
├── checklist.md                    # Code alignment checklist
│
├── config/
│   ├── settings.py                 # Pydantic BaseSettings — single config source
│   └── logging_config.py           # Rotating log handlers + structured trace writer
│
├── api/
│   ├── routes.py                   # FastAPI endpoints: /query, /query/media, /health
│   └── schemas.py                  # Request/Response Pydantic models
│
├── orchestration/
│   ├── graph.py                    # VSSOrchestrator: LangGraph state machine
│   └── state.py                    # VSSState TypedDict, helpers: add_trace, set_error
│
├── agents/
│   ├── routing_agent.py            # Intent classification: SQL vs VLM
│   ├── sql_agent.py                # LangGraph SQL sub-graph: table selection → execute → retry
│   ├── vlm_agent.py                # VLM inference wrapper (requires vlm_enabled=True)
│   └── response_composer.py        # LLM-based natural language response formatter
│
├── inference/
│   ├── llm_client.py               # Ollama / vLLM text LLM client
│   └── vlm_client.py               # Cosmos-Reason2-8B via vLLM OpenAI-compat client
│
├── storage/
│   └── clickhouse_client.py        # ClickHouse driver: connect, execute, health check
│
├── scripts/
│   ├── launch_vllm.sh              # Bash script to start vLLM server on target GPU
│   ├── launch_ollama.sh            # Bash script to start Ollama server
│   ├── vss_demo_app.py             # Gradio UI for local testing
│   ├── vss_demo_streamlit.py       # Streamlit UI for local testing
│   ├── test_connections.py         # Component connectivity smoke test
│   └── test_vlm_path.py            # End-to-end VLM path smoke test
│
├── tests/
│   ├── run_tests.py                # Test runner (no pytest required)
│   ├── conftest.py                 # Shared fixtures
│   ├── shared.py                   # Logging helpers for tests
│   ├── test_agents/
│   │   ├── test_routing.py         # RoutingAgent unit tests
│   │   ├── test_sql_agent.py       # SQL agent + sanitizer unit tests
│   │   └── test_llm_init.py        # LLM client init tests
│   ├── test_e2e/
│   │   ├── test_api.py             # API endpoint integration tests
│   │   └── test_mode1_flow.py      # Full Mode 1 flow integration test
│   └── test_orchestration/
│       └── test_graph.py           # LangGraph orchestration tests
│
└── logs/                           # Auto-created on first run (gitignored)
    ├── vss_app.log                 # All INFO+ messages (rotates at 10 MB, 7 backups)
    ├── vss_trace.log               # Per-request trace blocks with full pipeline steps
    └── vss_errors.log              # WARNING/ERROR/CRITICAL only
```

---

## 3. Tech Stack


| Layer             | Technology                         | Purpose                                                     |
| ----------------- | ---------------------------------- | ----------------------------------------------------------- |
| **API**           | FastAPI + Uvicorn                  | Async REST server                                           |
| **Orchestration** | LangGraph                          | State-machine agent pipeline                                |
| **LLM (text)**    | Ollama (`llama3.1:8b`)             | SQL generation, routing, response formatting                |
| **VLM (vision)**  | vLLM + `nvidia/Cosmos-Reason2-8B`  | Visual scene analysis                                       |
| **Database**      | ClickHouse                         | Video analytics data store                                  |
| **DB Driver**     | `clickhouse-connect`               | Python ClickHouse client (also provides SQLAlchemy dialect) |
| **LangChain**     | `langchain`, `langchain-community` | LLM tooling, SQL database abstraction                       |
| **Config**        | `pydantic-settings` v2             | Type-safe, env-driven configuration                         |
| **HTTP client**   | `httpx`                            | Async calls to vLLM API                                     |
| **Demo UI**       | Gradio / Streamlit                 | Local testing interfaces                                    |


**Python version:** 3.11+ required (matches `Requires-Python: >=3.11`)  
**GPU requirements:** NVIDIA H100 / A100 / RTX (for VLM); CPU-only works for SQL-only mode

---

## 4. High-Level Architecture

```
┌──────────────────────────────────────────────────────────────────┐
│                         CLIENT / UI                              │
│          (Gradio Demo / Streamlit / REST API consumer)           │
└───────────────────────────┬──────────────────────────────────────┘
                            │  HTTP POST
                            ▼
┌──────────────────────────────────────────────────────────────────┐
│                      FastAPI  (main.py)                          │
│   /api/v1/query         /api/v1/query/media      /api/v1/health  │
└───────────────────────────┬──────────────────────────────────────┘
                            │  async call
                            ▼
┌──────────────────────────────────────────────────────────────────┐
│                  VSSOrchestrator  (LangGraph)                    │
│                                                                  │
│   ┌─────────────┐                                                │
│   │  CLASSIFY   │  ◄── RoutingAgent                              │
│   │  (intent)   │      Rule-based patterns + LLM fallback        │
│   └──────┬──────┘                                                │
│          │ intent=sql            intent=vlm (if enabled)         │
│   ┌──────▼──────┐          ┌──────────────┐                     │
│   │  SQL AGENT  │          │   VLM AGENT  │                     │
│   │  (LangGraph │          │  (Cosmos-R2) │                     │
│   │   sub-graph)│          └──────┬───────┘                     │
│   └──────┬──────┘                 │                             │
│          │ escalate (optional)    │                             │
│          └───────────┬────────────┘                             │
│                      ▼                                           │
│              ┌───────────────┐                                   │
│              │    COMPOSE    │  ◄── ResponseComposer (LLM)       │
│              │   (response)  │                                   │
│              └───────┬───────┘                                   │
│                      ▼ END                                       │
└──────────────────────────────────────────────────────────────────┘
                            │
          ┌─────────────────┴──────────────────┐
          ▼                                     ▼
┌──────────────────┐                 ┌──────────────────────┐
│   ClickHouse DB  │                 │  vLLM Server         │
│  (analytics data)│                 │  Cosmos-Reason2-8B   │
└──────────────────┘                 │  OpenAI-compat API   │
                                     └──────────────────────┘
```

---

## 5. Production Workflow

### 5.1 SQL Path (default — always active)

```
User Query (text)
      │
      ▼
RoutingAgent.classify()
  ├─ Step 1: Check for uploaded media → if present, force VLM
  ├─ Step 2: Rule-based regex scoring (SQL patterns vs VLM patterns)
  │          Confidence ≥ 0.8 → decided
  └─ Step 3: LLM classification (for ambiguous queries)
      │
      │ intent = "sql"
      ▼
SQLAgent (inner LangGraph)
  ├─ route_query     → LLM decides if DB lookup is needed
  ├─ discover_tables → list available ClickHouse tables
  ├─ select_tables   → LLM selects the minimal relevant tables (max 5)
  ├─ load_schema     → fetch schema + semantic column samples
  ├─ generate_sql    → LLM writes a read-only SELECT query
  ├─ check_sql       → LLM query-checker validates & corrects SQL
  ├─ execute_sql     → run against ClickHouse; blocked if non-SELECT
  └─ assess_execution → retry loop (up to VSS_SQL_MAX_RETRIES=2)
      │
      ▼
ResponseComposer
  └─ LLM converts raw tabular data → natural language answer
      │
      ▼
API Response (JSON)
```
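
The three classification steps above can be sketched in Python roughly as follows. This is an illustration, not the actual `RoutingAgent` code: the regex patterns, the scoring formula, the `classify` signature, and the SQL default when no LLM is available are all assumptions. Only the step order and the 0.8 threshold come from the workflow above.

```python
import re
from typing import Callable, Optional, Tuple

# Hypothetical patterns; the real RoutingAgent patterns are not shown in this README.
SQL_PATTERNS = [r"\bhow many\b", r"\bcount\b", r"\btotal\b", r"\byesterday\b", r"\btrend\b"]
VLM_PATTERNS = [r"\bthis (frame|image)\b", r"\bdescribe\b", r"\bvisible\b"]
CONFIDENCE_THRESHOLD = 0.8  # "Confidence >= 0.8 -> decided"


def _hits(patterns, text: str) -> int:
    """Count how many patterns match the text (case-insensitive)."""
    return sum(1 for p in patterns if re.search(p, text, re.IGNORECASE))


def classify(
    query: str,
    has_media: bool = False,
    llm_classify: Optional[Callable[[str], str]] = None,
) -> Tuple[str, float, str]:
    """Return (intent, confidence, reason)."""
    # Step 1: uploaded media always forces the VLM path.
    if has_media:
        return "vlm", 1.0, "media attached"

    # Step 2: rule-based scoring; assumed formula is the winning share of all hits.
    sql_hits, vlm_hits = _hits(SQL_PATTERNS, query), _hits(VLM_PATTERNS, query)
    total = sql_hits + vlm_hits
    if total > 0:
        confidence = max(sql_hits, vlm_hits) / total
        if confidence >= CONFIDENCE_THRESHOLD:
            intent = "sql" if sql_hits > vlm_hits else "vlm"
            return intent, confidence, f"rules: sql={sql_hits}, vlm={vlm_hits}"

    # Step 3: ambiguous query, defer to the LLM if one is supplied.
    if llm_classify is not None:
        return llm_classify(query), 0.5, "llm classification"
    return "sql", 0.5, "default"  # assumed fallback, in line with SQL-only by default
```

Passing the LLM in as a plain callable keeps the rule-based part testable without a running Ollama server.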

### 5.2 VLM Path (requires `VSS_VLM_ENABLED=true`)

```
User Query + Base64 Image
      │
      ▼
API guard: /query/media checks vlm_enabled → 503 if disabled
      │
      ▼
RoutingAgent → intent = "vlm" (media always forces VLM)
      │
      ▼
VLMAgent.process()
  ├─ Guard: re-checks vlm_enabled
  ├─ Health check: probes vLLM /models endpoint
  └─ Sends multimodal payload to Cosmos-Reason2-8B via vLLM
      │
      ▼
ResponseComposer → format VLM output as conversational answer
      │
      ▼
API Response (JSON)
```

### 5.3 SQL → VLM Escalation (optional)

When both `VSS_VLM_ENABLED=true` and `VSS_ENABLE_VLM_ESCALATION=true`, the SQL agent can escalate to VLM if it exhausts retries or returns empty results. This is disabled by default.
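
A minimal sketch of that escalation condition, assuming a hypothetical `should_escalate` helper and flag container (the real orchestrator logic may be structured differently):

```python
from dataclasses import dataclass


@dataclass
class EscalationFlags:
    vlm_enabled: bool = False            # VSS_VLM_ENABLED
    enable_vlm_escalation: bool = False  # VSS_ENABLE_VLM_ESCALATION


def should_escalate(
    flags: EscalationFlags,
    *,
    succeeded: bool,
    retries_used: int,
    max_retries: int = 2,  # VSS_SQL_MAX_RETRIES default
    row_count: int = 0,
) -> bool:
    """Escalate only when both flags are on and SQL either gave up or found nothing."""
    if not (flags.vlm_enabled and flags.enable_vlm_escalation):
        return False
    retries_exhausted = (not succeeded) and retries_used >= max_retries
    empty_result = succeeded and row_count == 0
    return retries_exhausted or empty_result
```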

---

## 6. Data Schema

### Tables in ClickHouse (`default` database)

The SQL agent is aware of three tables. In its `select_tables` step it picks the minimal set needed per query.

---

#### `aggregated_analytics_totals`

Aggregated counts and statistics for all analytics applications over configurable time periods.


| Column            | Type                   | Description                                                                                                                                            |
| ----------------- | ---------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `_idApplication`  | `String`               | Unique application ID                                                                                                                                  |
| `_idCamera`       | `String`               | Unique camera ID                                                                                                                                       |
| `cameraName`      | `String`               | Human-readable camera name                                                                                                                             |
| `cameraGroup`     | `String`               | Camera group/zone label                                                                                                                                |
| `location`        | `String`               | Physical location description                                                                                                                          |
| `applicationName` | `String`               | App type: `Fire Safety Monitoring`, `License Plate Recognition`, `People Counting`, `People in Zone Counting`, `Vehicle Type Monitoring`               |
| `period`          | `String`               | Aggregation period: `minutes`, `hourly`, `daily`, `monthly`                                                                                            |
| `periodStart`     | `DateTime64(3, 'UTC')` | Window start (UTC, millisecond precision)                                                                                                              |
| `periodEnd`       | `DateTime64(3, 'UTC')` | Window end (UTC, millisecond precision)                                                                                                                |
| `category`        | `String`               | Detected object/event class: `Fire`, `Fire/Smoke`, `Smoke`, `License_Plate`, `license_plate`, `person`, `car`, `truck`, `bus`, `motorcycle`, `bicycle` |
| `count`           | `Int64`                | Aggregated detection count in window                                                                                                                   |
| `row_count`       | `Int64`                | Number of raw rows aggregated                                                                                                                          |


**Common queries:** total detections, time-series trends, per-category breakdowns, camera comparisons.

---

#### `alert_instances`

Individual alert events generated by analytics applications for each camera.


| Column            | Type         | Description                          |
| ----------------- | ------------ | ------------------------------------ |
| `_idCamera`       | `String`     | Camera that triggered the alert      |
| `applicationName` | `String`     | Application that generated the alert |
| `alertType`       | `String`     | Category/type of alert               |
| `timestamp`       | `DateTime64` | When the alert occurred              |
| `severity`        | `String`     | Alert severity level                 |


**Common queries:** alert counts, alert history, recent alerts per camera.

---

#### `incidents`

Detected incidents with severity classification across applications and cameras.


| Column            | Type         | Description                       |
| ----------------- | ------------ | --------------------------------- |
| `_idCamera`       | `String`     | Camera that detected the incident |
| `incidentType`    | `String`     | Type of incident                  |
| `severity`        | `String`     | Severity level                    |
| `timestamp`       | `DateTime64` | When the incident was detected    |
| `applicationName` | `String`     | Application that classified it    |


**Common queries:** incident statistics, severity breakdowns, incident history.

---

### SQL Rules Enforced by the Agent

- All queries are `SELECT` or `WITH` only — writes are blocked at the tool level
- ClickHouse date functions: use `yesterday()`, `today()`, `now()` — not `INTERVAL` syntax
- Always qualify: `SELECT ... FROM default.<table_name>`
- Time filtering: `WHERE periodStart >= 'YYYY-MM-DD' AND periodStart < 'YYYY-MM-DD'`
- For totals: `SUM(count)` — not `COUNT(*)`
- For breakdowns: `GROUP BY category`
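
The read-only rule in the first bullet could be enforced with a guard along these lines. This is a deliberately conservative sketch, not the agent's actual sanitizer: the function name, the keyword list, and the regex-based scanning are assumptions, and a production guard would more likely rely on a real SQL parser.

```python
import re

# Single left-to-right pass over string literals and comments, so that
# a "--" inside a literal (or a quote inside a comment) is not misread.
_LITERAL_OR_COMMENT = re.compile(r"'(?:[^'\\]|\\.)*'|--[^\n]*|/\*.*?\*/", re.DOTALL)
_ALLOWED_START = re.compile(r"^(SELECT|WITH)\b", re.IGNORECASE)
# Assumed deny-list of write/DDL/admin keywords.
_FORBIDDEN = re.compile(
    r"\b(INSERT|UPDATE|DELETE|DROP|ALTER|TRUNCATE|CREATE|RENAME|"
    r"ATTACH|DETACH|OPTIMIZE|GRANT|REVOKE|KILL|SYSTEM)\b",
    re.IGNORECASE,
)


def _blank(match: re.Match) -> str:
    """Replace literals with an empty literal and comments with a space."""
    return "''" if match.group(0).startswith("'") else " "


def is_read_only(sql: str) -> bool:
    """Return True only for a single SELECT/WITH statement with no write keywords."""
    cleaned = _LITERAL_OR_COMMENT.sub(_blank, sql).strip().rstrip(";").strip()
    if ";" in cleaned:  # more than one statement
        return False
    if not _ALLOWED_START.match(cleaned):
        return False
    return _FORBIDDEN.search(cleaned) is None
```

The guard errs on the side of rejection: a legitimate query that uses a column named `update`, for example, would be blocked by this sketch.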

---

## 7. Configuration & Environment Variables

All configuration is controlled by `VSS_`-prefixed environment variables. Nested models use `__` as the delimiter.

Copy `.env.example` to `.env` and fill in your values:

```bash
cp .env.example .env
```
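
To make the naming convention concrete, the stdlib-only sketch below shows how `VSS_`-prefixed variables with `__` delimiters map onto a nested structure. It only mirrors the convention (pydantic-settings exposes it through its `env_prefix` and `env_nested_delimiter` options); the `collect_settings` helper is hypothetical, and unlike the real `settings.py` it performs no type coercion or validation.

```python
from typing import Any, Dict, Mapping

PREFIX = "VSS_"
DELIMITER = "__"


def collect_settings(environ: Mapping[str, str]) -> Dict[str, Any]:
    """Group VSS_* variables into a nested dict; values stay as raw strings."""
    settings: Dict[str, Any] = {}
    for key, value in environ.items():
        if not key.startswith(PREFIX):
            continue  # ignore unrelated variables such as PATH
        path = key[len(PREFIX):].lower().split(DELIMITER)
        node = settings
        for part in path[:-1]:
            node = node.setdefault(part, {})
        node[path[-1]] = value
    return settings


example = collect_settings({
    "VSS_API_PORT": "8080",
    "VSS_CLICKHOUSE__HOST": "localhost",
    "VSS_CLICKHOUSE__PORT": "8123",
    "PATH": "/usr/bin",
})
# example == {"api_port": "8080", "clickhouse": {"host": "localhost", "port": "8123"}}
```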

### Full Variable Reference


| Variable                          | Default                      | Description                                           |
| --------------------------------- | ---------------------------- | ----------------------------------------------------- |
| **Service**                       |                              |                                                       |
| `VSS_SERVICE_NAME`                | `vss`                        | Service identifier in logs                            |
| `VSS_API_HOST`                    | `0.0.0.0`                    | Bind host for Uvicorn                                 |
| `VSS_API_PORT`                    | `8080`                       | Bind port for Uvicorn                                 |
| `VSS_LOG_LEVEL`                   | `INFO`                       | Root log level (`DEBUG`/`INFO`/`WARNING`/`ERROR`)     |
| `VSS_CORS_ALLOWED_ORIGINS`        | `["*"]`                      | CORS allowed origins — restrict in production         |
| **Feature Flags**                 |                              |                                                       |
| `VSS_VLM_ENABLED`                 | `false`                      | `true` to enable VLM path and `/query/media` endpoint |
| `VSS_ENABLE_VLM_ESCALATION`       | `false`                      | `true` to allow SQL → VLM escalation on exhausted retries or empty results (also requires `VSS_VLM_ENABLED=true`) |
| **ClickHouse**                    |                              |                                                       |
| `VSS_CLICKHOUSE__HOST`            | `localhost`                  | ClickHouse server hostname or IP                      |
| `VSS_CLICKHOUSE__PORT`            | `8123`                       | ClickHouse HTTP port                                  |
| `VSS_CLICKHOUSE__DATABASE`        | `default`                    | Target database name                                  |
| `VSS_CLICKHOUSE__USER`            | `default`                    | ClickHouse username                                   |
| `VSS_CLICKHOUSE__PASSWORD`        | *(empty)*                    | ClickHouse password — never hardcode                  |
| `VSS_CLICKHOUSE__TABLE`           | `aggregated_analytics_final` | Primary analytics table                               |
| **LLM (Ollama)**                  |                              |                                                       |
| `VSS_LLM__ENDPOINT`               | `http://localhost:11434`     | Ollama server base URL                                |
| `VSS_LLM__MODEL_NAME`             | `llama3.1:8b`                | Ollama model tag                                      |
| `VSS_LLM__MAX_TOKENS`             | `512`                        | Max generation tokens                                 |
| `VSS_LLM__TEMPERATURE`            | `0.1`                        | Sampling temperature                                  |
| `VSS_LLM__GPU_DEVICE`             | `0`                          | CUDA device index for Ollama                          |
| `VSS_LLM__USE_VLLM`               | `false`                      | Route LLM traffic through vLLM instead of Ollama      |
| `VSS_LLM__VLLM_ENDPOINT`          | `http://localhost:8001/v1`   | vLLM endpoint for LLM (if `USE_VLLM=true`)            |
| **VLM (Cosmos-Reason2-8B)**       |                              |                                                       |
| `VSS_VLM__ENDPOINT`               | `http://localhost:8000/v1`   | vLLM OpenAI-compatible base URL                       |
| `VSS_VLM__MODEL_NAME`             | `nvidia/Cosmos-Reason2-8B`   | HuggingFace model ID                                  |
| `VSS_VLM__MAX_TOKENS`             | `1024`                       | Max VLM generation tokens                             |
| `VSS_VLM__TEMPERATURE`            | `0.1`                        | VLM sampling temperature                              |
| `VSS_VLM__GPU_DEVICE`             | `1`                          | CUDA device index for vLLM                            |
| `VSS_VLM__GPU_MEMORY_UTILIZATION` | `0.85`                       | vLLM GPU memory fraction (0.0–1.0)                    |
| **SQL Agent Tuning**              |                              |                                                       |
| `VSS_SQL_MAX_RETRIES`             | `2`                          | Max SQL generation retry attempts                     |
| `VSS_SQL_DIALECT`                 | `clickhouse`                 | SQL dialect for query generation                      |
| `VSS_SQL_TOP_K_TABLES`            | `5`                          | Max tables forwarded to LLM for selection             |
| `VSS_SQL_SAMPLE_ROWS_PER_TABLE`   | `0`                          | Sample rows shown in schema context (0 = none)        |
| `VSS_SQL_SCHEMA_LOADING_MODE`     | `selected_tables`            | `selected_tables` or `all_tables`                     |
| `VSS_SQL_ENFORCE_READ_ONLY`       | `true`                       | Block non-SELECT/WITH SQL at tool level               |


---

## 8. How to Start the Project

### Prerequisites

Ensure the following infrastructure is running before starting VSS:

- **ClickHouse** — accessible at `VSS_CLICKHOUSE__HOST:VSS_CLICKHOUSE__PORT`
- **Ollama** — serving `llama3.1:8b` (required in every mode: it handles routing, SQL generation, and response formatting)
- **vLLM** — required only when `VSS_VLM_ENABLED=true`

### Step 1 — Clone & Install

```bash
cd vss/
pip install -r requirements.txt
```

### Step 2 — Configure

```bash
cp .env.example .env
# Edit .env with your ClickHouse credentials, LLM/VLM endpoints
```

Minimum required values in `.env`:

```dotenv
VSS_CLICKHOUSE__HOST=<your-clickhouse-host>
VSS_CLICKHOUSE__USER=<your-user>
VSS_CLICKHOUSE__PASSWORD=<your-password>
VSS_LLM__ENDPOINT=http://localhost:11434
```

### Step 3 — Start Ollama (LLM)

```bash
bash scripts/launch_ollama.sh
# Or manually (`ollama serve` stays in the foreground, so run the pull from a second terminal):
OLLAMA_HOST=0.0.0.0 ollama serve
ollama pull llama3.1:8b
```

### Step 4 — (Optional) Start VLM Server

Only needed when `VSS_VLM_ENABLED=true`:

```bash
# Override GPU and port via env vars
VLM_GPU=1 VLM_PORT=8000 bash scripts/launch_vllm.sh
```

This starts `nvidia/Cosmos-Reason2-8B` via vLLM on port 8000, binding to all interfaces.

### Step 5 — Start the VSS API

```bash
# Default (uses settings from .env)
python main.py

# Explicit Uvicorn with reload
uvicorn main:app --host 0.0.0.0 --port 8080 --reload

# Override port via env var
VSS_API_PORT=9090 python main.py
```

### Step 6 — (Optional) Launch Demo UI

```bash
# Gradio UI
python scripts/vss_demo_app.py
# → http://localhost:7860 (server binds to 0.0.0.0)

# Streamlit UI
streamlit run scripts/vss_demo_streamlit.py
# → http://localhost:8501
```

### Step 7 — Verify

```bash
# Health check
curl http://localhost:8080/api/v1/health

# Quick query test
curl -X POST http://localhost:8080/api/v1/query \
  -H "Content-Type: application/json" \
  -d '{"query": "How many people were detected yesterday?"}'
```

---

## 9. Docker Setup on H100

The following sets up the full VSS stack — ClickHouse, Ollama, and (optionally) vLLM — in containers on an H100 machine.

### Directory layout

```
vss-docker/
├── docker-compose.yml
├── .env                    # ← copy from vss/.env.example, fill values
└── vss/                    # ← your vss/ source directory
```

### `docker-compose.yml`

```yaml
version: "3.9"

services:

  # ─── ClickHouse ──────────────────────────────────────────────────────────
  clickhouse:
    image: clickhouse/clickhouse-server:24.3
    container_name: vss-clickhouse
    restart: unless-stopped
    ports:
      - "8123:8123"   # HTTP interface
      - "9000:9000"   # Native TCP interface
    volumes:
      - clickhouse-data:/var/lib/clickhouse
    environment:
      CLICKHOUSE_USER: ${VSS_CLICKHOUSE__USER:-default}
      CLICKHOUSE_PASSWORD: ${VSS_CLICKHOUSE__PASSWORD:-}
      CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT: 1
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:8123/ping"]
      interval: 10s
      timeout: 5s
      retries: 5

  # ─── Ollama (LLM) ────────────────────────────────────────────────────────
  ollama:
    image: ollama/ollama:latest
    container_name: vss-ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - ollama-data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["0"]        # LLM on GPU 0
              capabilities: [gpu]
    environment:
      OLLAMA_HOST: "0.0.0.0"
    healthcheck:
      test: ["CMD", "ollama", "list"]   # CLI check; does not depend on curl being present in the image
      interval: 15s
      timeout: 5s
      retries: 5

  # ─── VSS API ─────────────────────────────────────────────────────────────
  vss:
    build:
      context: ./vss
      dockerfile: Dockerfile
    container_name: vss-api
    restart: unless-stopped
    ports:
      - "${VSS_API_PORT:-8080}:8080"
    env_file:
      - .env
    environment:
      # Override ClickHouse host to point at the Docker service name
      VSS_CLICKHOUSE__HOST: clickhouse
      VSS_LLM__ENDPOINT: http://ollama:11434
      # VLM endpoint — update if vllm container is running
      VSS_VLM__ENDPOINT: http://vss-vllm:8000/v1
    depends_on:
      clickhouse:
        condition: service_healthy
      ollama:
        condition: service_healthy
    volumes:
      - ./vss/logs:/app/logs
    healthcheck:
      test: ["CMD", "curl", "-sf", "http://localhost:8080/api/v1/health"]
      interval: 20s
      timeout: 10s
      retries: 3

  # ─── vLLM (VLM — only needed when VSS_VLM_ENABLED=true) ─────────────────
  vss-vllm:
    image: vllm/vllm-openai:latest
    container_name: vss-vllm
    restart: unless-stopped
    profiles:
      - vlm                           # Start with: docker compose --profile vlm up
    ports:
      - "8000:8000"
    volumes:
      - huggingface-cache:/root/.cache/huggingface
    command: >
      --model nvidia/Cosmos-Reason2-8B
      --host 0.0.0.0
      --port 8000
      --gpu-memory-utilization 0.85
      --max-model-len 8192
      --trust-remote-code
      --dtype auto
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["1"]        # VLM on GPU 1
              capabilities: [gpu]

volumes:
  clickhouse-data:
  ollama-data:
  huggingface-cache:
```

### `Dockerfile` (place in `vss/`)

```dockerfile
FROM python:3.11-slim

WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    curl \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# Create logs directory
RUN mkdir -p logs

EXPOSE 8080

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8080", "--workers", "2"]
```

### Start Commands

```bash
# SQL-only mode (ClickHouse + Ollama + VSS)
docker compose up -d

# Pull the LLM model after Ollama starts
docker exec vss-ollama ollama pull llama3.1:8b

# With VLM enabled: set VSS_VLM_ENABLED=true in .env first (the vss container reads it
# via env_file), then start the stack with the vss-vllm container added
docker compose --profile vlm up -d

# View logs
docker compose logs -f vss
docker compose logs -f vss-vllm

# Stop everything
docker compose down
```

### GPU Assignment on H100


| Container    | GPU      | VRAM Usage          | Purpose                                             |
| ------------ | -------- | ------------------- | --------------------------------------------------- |
| `vss-ollama` | `GPU 0`  | ~8 GB               | `llama3.1:8b` — SQL generation, routing, formatting |
| `vss-vllm`   | `GPU 1`  | Up to 85% of GPU memory (about 16 GB of it model weights) | `Cosmos-Reason2-8B` — visual analysis |
| `vss-api`    | CPU only | —                   | FastAPI orchestration                               |


> **Note:** Adjust `device_ids` in `docker-compose.yml` to match your actual H100 GPU topology. On multi-GPU H100 nodes, run `nvidia-smi` to confirm device indices before deployment.

---

## 10. Ports Reference


| Port    | Service         | Protocol | Description                                      |
| ------- | --------------- | -------- | ------------------------------------------------ |
| `8080`  | VSS API         | HTTP     | Main REST API (configurable via `VSS_API_PORT`)  |
| `8000`  | vLLM            | HTTP     | Cosmos-Reason2-8B OpenAI-compatible API          |
| `8001`  | vLLM (LLM mode) | HTTP     | Optional: vLLM serving `llama3.1:8b`             |
| `11434` | Ollama          | HTTP     | Ollama LLM server API                            |
| `8123`  | ClickHouse      | HTTP     | ClickHouse HTTP interface                        |
| `9000`  | ClickHouse      | TCP      | ClickHouse native TCP interface                  |
| `7860`  | Gradio Demo     | HTTP     | Local testing UI (scripts/vss_demo_app.py)       |
| `8501`  | Streamlit Demo  | HTTP     | Local testing UI (scripts/vss_demo_streamlit.py) |


---

## 11. API Reference

Base URL: `http://<host>:8080/api/v1`

Interactive docs (Swagger UI): `http://<host>:8080/docs`

---

### `POST /query`

Process a natural language text query. Routes to SQL or VLM based on intent classification.

**Request body:**

```json
{
  "query": "How many fire alerts were triggered yesterday?",
  "include_trace": false
}
```


| Field           | Type     | Required | Description                                                 |
| --------------- | -------- | -------- | ----------------------------------------------------------- |
| `query`         | `string` | Yes      | Natural language question (1–2000 chars)                    |
| `include_trace` | `bool`   | No       | Include full execution trace in response (default: `false`) |


**Response:**

```json
{
  "request_id": "a3f1c2e0-...",
  "status": "success",
  "response": "There were 14 fire alerts triggered yesterday across all cameras.",
  "processing_time_ms": 1842,
  "trace": [],
  "metadata": {
    "intent": "sql",
    "intent_confidence": 0.9,
    "response_source": "sql",
    "escalated": false,
    "escalation_reason": null,
    "sql_query": "SELECT COUNT(*) FROM default.alert_instances WHERE ...",
    "routing_reason": "SQL patterns: 3"
  }
}
```

**Status codes:**


| Code  | Meaning                                                    |
| ----- | ---------------------------------------------------------- |
| `200` | Success (check `status` field for `"success"` / `"error"`) |
| `422` | Validation error (e.g. empty query)                        |
| `500` | Internal server error                                      |

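A minimal client sketch for this endpoint, using only the Python standard library. The helper names (`build_query_request`, `ask`) and the `localhost` base URL are illustrative assumptions, not part of VSS; the path and JSON fields come from the tables above.

```python
import json
import urllib.request

# Assumption: the service runs locally on the default port.
BASE_URL = "http://localhost:8080/api/v1"


def build_query_request(query: str, include_trace: bool = False,
                        base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST /query request."""
    if not 1 <= len(query) <= 2000:
        # Client-side mirror of the documented 1-2000 character limit.
        raise ValueError("query must be 1-2000 characters")
    body = json.dumps({"query": query, "include_trace": include_trace}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(query: str, timeout: float = 120.0) -> dict:
    """Send the query and return the parsed JSON body.

    Non-2xx responses (422, 500) surface as urllib.error.HTTPError.
    """
    with urllib.request.urlopen(build_query_request(query), timeout=timeout) as resp:
        result = json.load(resp)
    # HTTP 200 can still carry status == "error", so check the body as well.
    if result.get("status") != "success":
        raise RuntimeError(f"VSS returned status={result.get('status')!r}")
    return result
```

With a running service, `ask("How many fire alerts were triggered yesterday?")["response"]` would return the answer text. Checking the `status` field separately matters because a `200` does not by itself indicate a successful query.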

---

### `POST /query/media`

Process a query with an attached camera frame or image. **Requires `VSS_VLM_ENABLED=true`.**

**Request body:**

```json
{
  "query": "How many people are visible in this frame?",
  "media_type": "image",
  "media_base64": "<base64-encoded-jpeg>",
  "include_trace": false
}
```


| Field           | Type     | Required | Description                                |
| --------------- | -------- | -------- | ------------------------------------------ |
| `query`         | `string` | Yes      | Question about the image                   |
| `media_type`    | `string` | No       | Only `"image"` is supported (default)      |
| `media_base64`  | `string` | Yes      | Base64-encoded image (JPEG/PNG)            |
| `include_trace` | `bool`   | No       | Include execution trace (default: `false`) |


**Response:** Same structure as `/query` above, with `response_source: "vlm"`.

**Status codes:**


| Code  | Meaning                                      |
| ----- | -------------------------------------------- |
| `200` | Success                                      |
| `503` | VLM not enabled — set `VSS_VLM_ENABLED=true` |
| `422` | Validation error                             |
| `500` | Internal server error                        |


**Example error when VLM is disabled:**

```json
{
  "detail": "VLM is not enabled. Set VSS_VLM_ENABLED=true to enable media queries."
}
```
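
The request body for this endpoint can be assembled as in the sketch below. `build_media_payload` is a hypothetical helper name, and the sketch assumes the API expects plain base64 text without a `data:` URI prefix, which the field description suggests but does not state outright.

```python
import base64


def build_media_payload(image_bytes: bytes, query: str,
                        include_trace: bool = False) -> dict:
    """Return the JSON body for POST /query/media."""
    return {
        "query": query,
        "media_type": "image",
        # Assumption: raw base64 text, no "data:image/jpeg;base64," prefix.
        "media_base64": base64.b64encode(image_bytes).decode("ascii"),
        "include_trace": include_trace,
    }
```

A caller would typically pass `Path("frame.jpg").read_bytes()` as `image_bytes` and POST the resulting dictionary as JSON. A `503` response means the server was started without `VSS_VLM_ENABLED=true`.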

---

### `GET /health`

Check health status of all service components.

**Response:**

```json
{
  "status": "healthy",
  "timestamp": "2026-03-02T10:00:00.000Z",
  "components": {
    "clickhouse": { "status": "ok" },
    "llm":        { "status": "ok" },
    "vlm":        { "status": "disabled", "message": "VLM not enabled (VSS_VLM_ENABLED=false)" }
  }
}
```


| Component status | Meaning                                   |
| ---------------- | ----------------------------------------- |
| `ok`             | Service is reachable and responding       |
| `error`          | Service is unreachable or failing         |
| `disabled`       | Feature is intentionally off (`vlm` only) |


**Overall status:**


| `status`    | Meaning                                        |
| ----------- | ---------------------------------------------- |
| `healthy`   | All enabled components are `ok`                |
| `degraded`  | One or more components are `error` but not all |
| `unhealthy` | All components are `error`                     |

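The aggregation rule in the two tables above can be sketched as a small function. This is an illustration of the documented rule, not the service's actual code; `overall_status` is a hypothetical name, and the sketch assumes that `disabled` components are excluded before counting errors.

```python
def overall_status(components: dict[str, str]) -> str:
    """Derive the overall status from per-component statuses.

    Each value is one of "ok", "error", or "disabled".
    """
    # Assumption: disabled components do not count toward the result.
    enabled = [status for status in components.values() if status != "disabled"]
    errors = sum(1 for status in enabled if status == "error")
    if errors == 0:
        return "healthy"
    if errors == len(enabled):
        return "unhealthy"
    return "degraded"
```

For the sample response above (`clickhouse` and `llm` are `ok`, `vlm` is `disabled`), this yields `healthy`.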

---

### `GET /`

Root endpoint. Returns service identity.

```json
{ "service": "VSS", "status": "running", "docs": "/docs" }
```

---

## 12. Logging & Observability

VSS writes three rotating log files under `logs/`. The directory is created automatically, and each file rotates at 10 MB and keeps 7 backups:


| File                  | Level      | Contents                                                            |
| --------------------- | ---------- | ------------------------------------------------------------------- |
| `logs/vss_app.log`    | `DEBUG+`   | All module logs — startup, routing decisions, SQL execution, errors |
| `logs/vss_trace.log`  | `DEBUG+`   | Structured per-request trace blocks with per-node step breakdowns   |
| `logs/vss_errors.log` | `WARNING+` | Errors and warnings only — for alerting/monitoring                  |

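A comparable setup can be reproduced with the standard library's `RotatingFileHandler`. The sketch below is an assumption about how such a configuration might look, not the project's logging module: the logger name, format string, and helper names are hypothetical, 10 MB is taken as 10 × 1024 × 1024 bytes, and the trace log is omitted for brevity.

```python
import logging
from logging.handlers import RotatingFileHandler
from pathlib import Path

MAX_BYTES = 10 * 1024 * 1024  # assumption: "10 MB" means 10 MiB
BACKUP_COUNT = 7


def make_rotating_handler(path: Path, level: int) -> RotatingFileHandler:
    """Create a size-rotated file handler, creating the directory if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    handler = RotatingFileHandler(
        path, maxBytes=MAX_BYTES, backupCount=BACKUP_COUNT, encoding="utf-8"
    )
    handler.setLevel(level)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )
    return handler


def configure_logging(log_dir: Path = Path("logs")) -> logging.Logger:
    """Attach an all-levels app log and a WARNING+ error log to one logger."""
    logger = logging.getLogger("vss_example")  # hypothetical logger name
    logger.setLevel(logging.DEBUG)
    logger.addHandler(make_rotating_handler(log_dir / "vss_app.log", logging.DEBUG))
    logger.addHandler(make_rotating_handler(log_dir / "vss_errors.log", logging.WARNING))
    return logger
```

Setting the level on each handler rather than on the logger is what lets one log call land in `vss_app.log` while only warnings and errors reach `vss_errors.log`.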

### Sample trace block (`vss_trace.log`)

```
════════════════════════════════════════════════════════════════════════════════
  REQUEST  │  2026-03-02 14:23:01  │  id=a3f1c2e0  │  completed  │  1842 ms
  QUERY    │  How many fire alerts were triggered yesterday?
  INTENT   │  sql  (conf=0.90)  via: SQL patterns: 3
  SQL      │  SELECT COUNT(*) FROM default.alert_instances WHERE ...
  SOURCE   │  sql
────────────────────────────────────────────────────────────────────────────────
  FULL TRACE  (6 steps)
    [14:23:01] State initialized: How many fire alerts were triggered yesterday?
    [14:23:01] Node: classify
    [14:23:01] Routing (rules): sql (0.90)
    [14:23:01] Node: sql
    [14:23:01] SQLAgent: completed (1 attempt(s))
    [14:23:01] Node: compose
────────────────────────────────────────────────────────────────────────────────
  ANSWER
    There were 14 fire alerts triggered yesterday across all cameras.
════════════════════════════════════════════════════════════════════════════════
```

### Per-node step blocks

Every LangGraph node writes a step block showing exactly which trace entries it produced — useful for debugging where the pipeline diverges.

---

## 13. Testing & Validation

### Running Tests

Tests live in `tests/` and run on Python's built-in `unittest` through `tests/run_tests.py`, so pytest is not required.

```bash
# From the vss/ directory

# Run all tests
python tests/run_tests.py

# Verbose (shows each test method)
python tests/run_tests.py -v

# Run a single module
python tests/run_tests.py --module routing
python tests/run_tests.py --module sql
python tests/run_tests.py --module llm
python tests/run_tests.py --module api

# Run a single test class
python tests/run_tests.py --class TestSanitizeSql -v
```

Test logs are written to `tests/logs/vss_tests.log` with a structured session banner, per-class headers, and PASS/FAIL per test.

### Test Modules


| Module    | File                            | What it tests                                                  |
| --------- | ------------------------------- | -------------------------------------------------------------- |
| `routing` | `test_agents/test_routing.py`   | Intent classification — SQL/VLM pattern matching, LLM fallback |
| `sql`     | `test_agents/test_sql_agent.py` | SQL sanitizer, read-only enforcement, retry logic              |
| `llm`     | `test_agents/test_llm_init.py`  | LLM client initialization and config loading                   |
| `api`     | `test_e2e/test_api.py`          | API endpoint integration tests                                 |


### Connectivity Smoke Tests

Use these standalone scripts before running the full service to validate each component independently:

```bash
# Test ClickHouse, Ollama, and vLLM connectivity
python scripts/test_connections.py

# Test end-to-end VLM path with a sample image
python scripts/test_vlm_path.py
```

### Quick Validation Checklist

```bash
# 1. Verify no hardcoded hosts, credentials, or GPU settings in Python files
grep -r "172.17" --include="*.py" .          # Should only appear in defaults
grep -r "matrice" --include="*.py" .         # Should be ZERO matches
grep -r "CUDA_VISIBLE_DEVICES" --include="*.py" .   # Should be ZERO matches

# 2. Verify VLM is disabled by default
grep -r "vlm_enabled" --include="*.py" .     # Should show default=False

# 3. Verify env var override works
VSS_CLICKHOUSE__HOST=test python -c \
  "from config.settings import get_settings; print(get_settings().clickhouse.host)"
# Expected output: test

# 4. Health check after startup
curl -s http://localhost:8080/api/v1/health | python -m json.tool
```

---

## 14. Limitations & Open Items

### Current Limitations


| Area                    | Limitation                                                                                                                                                                                 |
| ----------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| **Media input**         | Only JPEG/PNG images via base64. Video frame extraction is not implemented — callers must extract frames externally.                                                                       |
| **SQL scope**           | SQL agent targets a fixed set of three tables (`aggregated_analytics_totals`, `alert_instances`, `incidents`). Adding new tables requires updating `TABLE_DESCRIPTIONS` in `sql_agent.py`. |
| **VLM context**         | VLM receives a single image per request — no multi-frame or temporal analysis.                                                                                                             |
| **Authentication**      | The API has no authentication layer. In production, place it behind an API gateway or add OAuth2/API key middleware.                                                                       |
| **Rate limiting**       | No rate limiting is implemented — add via API gateway or a middleware like `slowapi` before production exposure.                                                                           |
| **Concurrent requests** | Each request creates a new `SQLAgent` and `VLMAgent` instance. Under high concurrency, LLM/DB connections will multiply; a connection pool or agent pool is not yet implemented.           |
| **VLM cold start**      | First VLM request after startup may time out while the model loads into GPU memory. The 120 s `httpx` timeout provides some buffer, but very large models may exceed this.                 |
| **Time zone handling**  | All timestamps are stored and queried in UTC. The agent does not translate user-local time zones to UTC automatically.                                                                     |

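Until time zone handling is built in, callers can work around that limitation by converting local day boundaries to UTC themselves and stating the UTC range in the query. The sketch below shows one way to do the conversion; `local_day_to_utc_range` is a hypothetical helper, not part of VSS.

```python
from datetime import datetime, time, timedelta, timezone


def local_day_to_utc_range(local_now: datetime,
                           days_ago: int = 1) -> tuple[datetime, datetime]:
    """Return the [start, end) UTC bounds of a whole local calendar day.

    days_ago=1 means "yesterday" as seen from local_now's time zone.
    """
    if local_now.tzinfo is None:
        raise ValueError("local_now must be timezone-aware")
    day = local_now.date() - timedelta(days=days_ago)
    start_local = datetime.combine(day, time.min, tzinfo=local_now.tzinfo)
    end_local = start_local + timedelta(days=1)
    return start_local.astimezone(timezone.utc), end_local.astimezone(timezone.utc)


# Example: a user at UTC+05:30 asking about "yesterday" on 2026-03-02.
ist = timezone(timedelta(hours=5, minutes=30))
start_utc, end_utc = local_day_to_utc_range(datetime(2026, 3, 2, 10, 0, tzinfo=ist))
print(start_utc.isoformat(), end_utc.isoformat())
# 2026-02-28T18:30:00+00:00 2026-03-01T18:30:00+00:00
```

The resulting bounds can then be written into the question text (for example, "between 2026-02-28 18:30 and 2026-03-01 18:30 UTC") so that the generated SQL filters on the intended window.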

### Open Items / Roadmap


| #   | Item                                                                                                 | Priority |
| --- | ---------------------------------------------------------------------------------------------------- | -------- |
| 1   | Add API key / JWT authentication middleware                                                          | High     |
| 2   | Implement agent instance pooling to handle concurrent load                                           | High     |
| 3   | Add a rate limiter (e.g. `slowapi`)                                                                  | Medium   |
| 4   | Extend SQL agent to support schema discovery at runtime (dynamic tables)                             | Medium   |
| 5   | Add multi-frame VLM support (video clip as sequence of frames)                                       | Medium   |
| 6   | Expose Prometheus metrics (`/metrics` endpoint) for MLOps dashboards                                 | Medium   |
| 7   | Implement user-timezone-aware date parsing in the routing agent                                      | Low      |
| 8   | Add streaming responses (`text/event-stream`) for long LLM completions                               | Low      |
| 9   | Migrate `ClickHouseConfig`, `LLMConfig`, `VLMConfig` to `BaseSettings` for independent instantiation | Low      |
| 10  | Add CI/CD pipeline with automated test execution on push                                             | Low      |


---

