Searchat — RAG Chat Pipeline

Query → Search → Context → LLM → Response
v0.6.0 | Last updated: 2026-02-17
Python 3.10+ · FastAPI · LiteLLM · FAISS · Hybrid Search (DuckDB FTS + Vector)
Stage 1
Query Reception
User Query Input Natural language question from user interface
Query Parser Extracts intent, validates input, prepares for search
Top-K Selector Determines context size: 6 (simple), 8 (default), 16 (complex)
Parsed Query
Stage 2
Hybrid Search Engine <100ms
DuckDB FTS Keyword Search Full-text search with English stemmer for exact terms and code identifiers
FAISS Semantic Search Vector similarity using all-MiniLM-L6-v2 embeddings (384-dim)
RRF Fusion Reciprocal Rank Fusion merges DuckDB FTS + FAISS results
Top-K Results (default: 8)
Stage 3
Context Builder
Result Formatter Extracts conversation_id, updated_at, project_id, snippet from each result
System Prompt Injection Combines instruction template with formatted context chunks for LLM
Formatted Context + Prompt
Stage 4
LLM Provider Selection
OpenAI
<2s
Ollama
<5s
Embedded
<8s
Model Resolution Uses config defaults or user-specified model name
Generated Response
Stage 5
Response Delivery
Streaming Mode (SSE) Yields response chunks as they arrive from LLM
Non-Streaming Mode Returns complete RAGGeneration object with full answer
Citation Extraction Links answer segments to source conversations
<100ms
Search
<2s
OpenAI
<5s
Ollama
<8s
Embedded
8
Top-K