Query Processing + Retrieval + Generation
Step 1
Pipeline Routing
_pipeline_from_question() keyword-matches the question, assigns a content_type, and derives the knowledge_types filter used for retrieval.
Step 2
Live Tool Check
_needs_live_tools() detects historical vs. live keywords and conditionally enables the tool set, avoiding live-data spam on historical queries.
Step 3
SLM Classification
Phi-3 Mini / Phi-4-mini determines retrieval strategy (semantic vs. structured vs. point-read)
Step 4
RAG Retrieval
PostgreSQL HNSW semantic search + Cosmos DB structured queries. Context assembled.
Step 5
Claude Generation
Claude Sonnet 4.5 + Bedrock Prompt Cache (cache when context > 1200 chars). Conversational tone enforced.
🔒
Anti-Hallucination + SWA Proxy Auth
Same data-source-only rule as the pipelines: never fabricate. Historical keyword detection prevents unnecessary live tool calls. The API key is never exposed to the client (the SWA proxy injects it). Bedrock Prompt Cache reduces cost on multi-turn conversations.
Data-source-only rule · Historical keyword filter · SWA proxy key injection · Bedrock Prompt Cache (>1200 chars) · No API key on client · Tool call trace in UI
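
The key-injection idea can be illustrated with a small sketch. How the SWA proxy is actually implemented is not shown here, so everything below is an assumption: the header name x-api-key, the environment variable BACKEND_API_KEY, and the build_upstream_headers() helper are hypothetical names chosen for the example. The point is only that the secret is read server-side and any key supplied by the client is discarded.

```python
import os
from typing import Dict, Mapping

API_KEY_HEADER = "x-api-key"     # hypothetical header name
API_KEY_ENV = "BACKEND_API_KEY"  # hypothetical environment variable


def build_upstream_headers(
    client_headers: Mapping[str, str],
    env: Mapping[str, str] = os.environ,
) -> Dict[str, str]:
    """Copy client headers, drop any client-supplied key, inject the server key."""
    headers = {
        name: value
        for name, value in client_headers.items()
        if name.lower() != API_KEY_HEADER  # ignore keys sent by the browser
    }
    headers[API_KEY_HEADER] = env[API_KEY_ENV]  # secret lives only server-side
    return headers
```

Because the key is added after the client headers are filtered, a forged header sent from the browser cannot override the server-side value.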
Pipeline routing: Auto
Live tools available: 15+
History turns: 4
Cache: Bedrock Prompt Cache
Stack: Python · Claude Sonnet 4.5 · AWS Bedrock · Bedrock Prompt Cache · LangChain · PostgreSQL HNSW · Cosmos DB · Phi-3 Mini SLM · Azure SWA proxy