Searchat — RAG Chat Pipeline
Query → Search → Context → LLM → Response
v0.4.0 | Last updated: 2026-02-03
Python 3.9+ · FastAPI · LiteLLM · FAISS · Hybrid Search (BM25 + Vector)
Stage 1
Query Reception
User Query Input
Natural language question from user interface
Query Parser
Extracts intent, validates input, prepares for search
Top-K Selector
Determines how many search results to pass as context: 6 (simple), 8 (default), or 16 (complex)
Parsed Query
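A minimal sketch of Stage 1, assuming a word-count heuristic for query complexity. Only the three context sizes (6, 8, 16) come from the pipeline description; the names `ParsedQuery`, `classify`, and `parse_query` and the thresholds are hypothetical.

```python
from dataclasses import dataclass

# Context sizes stated in the pipeline: 6 (simple), 8 (default), 16 (complex).
TOP_K = {"simple": 6, "default": 8, "complex": 16}


@dataclass
class ParsedQuery:
    text: str
    top_k: int


def classify(query: str) -> str:
    """Hypothetical heuristic: bucket a query by its word count."""
    words = len(query.split())
    if words <= 4:
        return "simple"
    if words >= 20:
        return "complex"
    return "default"


def parse_query(raw: str) -> ParsedQuery:
    """Validate the input, normalize whitespace, and attach a context size."""
    text = " ".join(raw.split())
    if not text:
        raise ValueError("query must not be empty")
    return ParsedQuery(text=text, top_k=TOP_K[classify(text)])
```

A real classifier would likely look at more than length (for example, the number of distinct entities or sub-questions); the sketch only shows where the top-K decision attaches to the parsed query.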
Stage 2
Hybrid Search Engine (<100ms)
BM25 Keyword Search
Traditional text matching for exact terms and code identifiers
FAISS Semantic Search
Vector similarity using all-MiniLM-L6-v2 embeddings (384-dim)
RRF Fusion
Reciprocal Rank Fusion merges BM25 + FAISS results
Top-K Results (default: 8)
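The fusion step can be sketched as follows. Reciprocal Rank Fusion scores each document as the sum of 1 / (k + rank) over the ranked lists it appears in. The constant k = 60 is the conventional choice and is an assumption here, as are the function name and the sample ids.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60, top_k=8):
    """Merge ranked lists of document ids with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed, conventional constant.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    ordered = sorted(scores, key=lambda d: (-scores[d], d))
    return ordered[:top_k]


bm25_hits = ["c3", "c1", "c7"]   # hypothetical BM25 ranking
faiss_hits = ["c1", "c9", "c3"]  # hypothetical FAISS ranking
fused = rrf_fuse([bm25_hits, faiss_hits])
print(fused)  # ['c1', 'c3', 'c9', 'c7']
```

Because RRF uses only ranks, it needs no normalization between BM25 scores and vector similarities, which live on different scales.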
Stage 3
Context Builder
Result Formatter
Extracts conversation_id, updated_at, project_id, snippet from each result
System Prompt Injection
Combines instruction template with formatted context chunks for LLM
Formatted Context + Prompt
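A minimal sketch of the context builder, assuming each search result is a dict carrying the four fields named above. The instruction text, the bracketed numbering, and the function names are hypothetical.

```python
# Hypothetical instruction template; the real wording is not given here.
INSTRUCTIONS = (
    "Answer using only the conversation excerpts below. "
    "Cite excerpts by their bracketed number."
)


def format_result(index, result):
    """Render one search result as a numbered context chunk."""
    header = (
        f"[{index}] conversation_id={result['conversation_id']} "
        f"project_id={result['project_id']} "
        f"updated_at={result['updated_at']}"
    )
    return header + "\n" + result["snippet"]


def build_system_prompt(results):
    """Combine the instruction template with the formatted chunks."""
    chunks = [format_result(i, r) for i, r in enumerate(results, start=1)]
    return INSTRUCTIONS + "\n\n" + "\n\n".join(chunks)
```

Numbering the chunks gives the model a compact handle for each source, which a later citation step can map back to a conversation.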
Stage 4
LLM Provider Selection
OpenAI: <2s
Ollama: <5s
Embedded: <8s
Model Resolution
Uses config defaults or user-specified model name
Generated Response
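Model resolution might look like the sketch below. It assumes that a user-specified model name takes precedence over the config default; the pipeline description does not state the precedence. The default model names are placeholders, not Searchat's actual configuration.

```python
from typing import Optional

# Hypothetical config defaults; the model names are placeholders.
DEFAULT_MODELS = {
    "openai": "example-openai-model",
    "ollama": "example-ollama-model",
    "embedded": "example-embedded-model",
}


def resolve_model(provider: str, requested: Optional[str] = None) -> str:
    """Return the user-specified model if given, else the provider default."""
    if provider not in DEFAULT_MODELS:
        raise ValueError(f"unknown provider: {provider}")
    if requested and requested.strip():
        return requested.strip()
    return DEFAULT_MODELS[provider]
```

Rejecting unknown providers up front keeps a typo in the provider name from surfacing later as an opaque LLM call failure.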
Stage 5
Response Delivery
Streaming Mode (SSE)
Yields response chunks as they arrive from LLM
Non-Streaming Mode
Returns complete RAGGeneration object with full answer
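The two delivery modes can be contrasted in a short sketch. The streaming side frames each chunk as a Server-Sent Events message, and the non-streaming side joins the chunks into one string. The JSON payload shape and the closing `done` event are assumed conventions, and `collect` returns a plain string rather than the RAGGeneration object, whose fields are not described here.

```python
import json
from typing import Iterable, Iterator


def sse_events(chunks: Iterable[str]) -> Iterator[str]:
    """Wrap each LLM chunk in a Server-Sent Events frame.

    JSON-encoding the payload keeps newlines inside a chunk from
    splitting the frame. The final "done" event is an assumed convention.
    """
    for chunk in chunks:
        payload = json.dumps({"text": chunk})
        yield f"data: {payload}\n\n"
    yield "event: done\ndata: {}\n\n"


def collect(chunks: Iterable[str]) -> str:
    """Non-streaming counterpart: join every chunk into one answer."""
    return "".join(chunks)
```

In a FastAPI app, a generator like `sse_events` could be passed to `StreamingResponse` with `media_type="text/event-stream"`.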
Citation Extraction
Links answer segments to source conversations
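How citations are linked is not specified, so the following is a simplified sketch under one assumption: that context chunks are numbered from 1 in the prompt and the model cites them with bracketed markers such as [2]. The function name and marker format are hypothetical.

```python
import re

MARKER = re.compile(r"\[(\d+)\]")


def extract_citations(answer, results):
    """Map bracketed markers like [2] in the answer to conversation ids.

    Assumes chunks were numbered from 1 in the prompt. Markers that point
    outside the result list are ignored, and each id is reported once,
    in order of first appearance.
    """
    cited = []
    for match in MARKER.finditer(answer):
        index = int(match.group(1))
        if 1 <= index <= len(results):
            conv_id = results[index - 1]["conversation_id"]
            if conv_id not in cited:
                cited.append(conv_id)
    return cited
```

To link individual answer segments rather than the whole answer, the same loop could record `match.start()` alongside each conversation id.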
Search: <100ms · OpenAI: <2s · Ollama: <5s · Embedded: <8s · Top-K: 8