ETL Pipeline — build_knowledge_from_ops.py · Rolling 7–14 day window
Step 1 · Source Read: Cosmos DB partition scans for ops window
Step 2 · Extractors: low_wait_gem, crowd_trend, sellout_pattern, hidden_gem extractors
Step 3 · Insight Filter: Quality gate: minimum signal threshold, dedup, freshness check
Step 4 · Embed: Azure OpenAI text-embedding-3-small → 1536-dim vector per doc
Step 5 · Upsert: PostgreSQL + pgvector · HNSW index on embedding column
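
The Step 3 quality gate could be sketched roughly as follows. This is a minimal illustration, not the code in build_knowledge_from_ops.py: the `Insight` fields, the `filter_insights` name, the 0.6 signal threshold, and the choice to deduplicate by keeping the most recent insight per (kind, subject) pair are all assumptions. Only the three checks themselves (minimum signal, dedup, freshness) and the 14-day upper bound of the rolling window come from the diagram.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Insight:
    # All field names are hypothetical; the real schema is not shown in the source.
    kind: str              # extractor name, e.g. "low_wait_gem" or "crowd_trend"
    subject_id: str        # whatever entity the insight describes
    signal: float          # extractor-assigned strength score
    observed_at: datetime  # when the underlying ops data was recorded


MIN_SIGNAL = 0.6               # assumed threshold, chosen only for illustration
MAX_AGE = timedelta(days=14)   # upper bound of the rolling 7-14 day window


def filter_insights(insights, now, min_signal=MIN_SIGNAL, max_age=MAX_AGE):
    """Apply the three gates: minimum signal, freshness, then dedup."""
    newest = {}
    for ins in insights:
        if ins.signal < min_signal:
            continue  # too weak to be worth embedding
        if now - ins.observed_at > max_age:
            continue  # older than the rolling window
        key = (ins.kind, ins.subject_id)
        current = newest.get(key)
        # Dedup: keep only the most recent insight for each (kind, subject).
        if current is None or ins.observed_at > current.observed_at:
            newest[key] = ins
    # Sort so the output order does not depend on input order.
    return sorted(newest.values(), key=lambda i: (i.kind, i.subject_id))


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    sample = [
        Insight("low_wait_gem", "ride-1", 0.9, now - timedelta(days=2)),
        Insight("low_wait_gem", "ride-1", 0.8, now - timedelta(days=1)),
        Insight("crowd_trend", "zone-3", 0.2, now - timedelta(days=1)),
    ]
    for kept in filter_insights(sample, now):
        print(kept.kind, kept.subject_id, kept.signal)
```

Running the gate before Step 4 means that only insights which survive it are sent to the embedding model, so weak, stale, or duplicate records never incur an embedding call.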
Records indexed: 1.6M
Embedding dimensions: 1536
Rolling data window: 7–14d
Retrieval strategies: 3
Stack: Azure OpenAI text-embedding-3-small · PostgreSQL + pgvector · HNSW index · Cosmos DB · Phi-3 Mini (SLM) · Azure Functions · LangChain
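
For the Upsert step, the PostgreSQL + pgvector side might look something like the sketch below. The table name `knowledge_docs`, its columns, the cosine operator class, and the helper names are hypothetical; the source only states that 1536-dimensional embeddings are upserted into PostgreSQL with an HNSW index on the embedding column. The `%s` placeholders assume a psycopg-style driver.

```python
EMBEDDING_DIM = 1536  # output size of text-embedding-3-small, per the diagram

# Hypothetical schema: table and column names are illustrative only.
SCHEMA_SQL = f"""
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS knowledge_docs (
    doc_id     text PRIMARY KEY,
    content    text NOT NULL,
    embedding  vector({EMBEDDING_DIM}) NOT NULL,
    updated_at timestamptz NOT NULL DEFAULT now()
);

-- HNSW index on the embedding column; cosine distance is an assumption.
CREATE INDEX IF NOT EXISTS knowledge_docs_embedding_hnsw
    ON knowledge_docs USING hnsw (embedding vector_cosine_ops);
"""

# Insert a new doc, or overwrite the existing row with the same doc_id.
UPSERT_SQL = """
INSERT INTO knowledge_docs (doc_id, content, embedding)
VALUES (%s, %s, %s::vector)
ON CONFLICT (doc_id) DO UPDATE
SET content    = EXCLUDED.content,
    embedding  = EXCLUDED.embedding,
    updated_at = now();
"""


def to_vector_literal(embedding):
    """Format a list of floats in pgvector's text form, e.g. '[0.1,0.2]'."""
    if len(embedding) != EMBEDDING_DIM:
        raise ValueError(
            f"expected {EMBEDDING_DIM} dimensions, got {len(embedding)}"
        )
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def upsert_params(doc_id, content, embedding):
    """Build the parameter tuple that matches UPSERT_SQL's placeholders."""
    return (doc_id, content, to_vector_literal(embedding))
```

With a psycopg cursor `cur`, one row would then be written with `cur.execute(UPSERT_SQL, upsert_params(doc_id, text, vector))`, after the schema statements have been applied once. Keying the upsert on a stable `doc_id` would let each run over the rolling window refresh existing rows instead of accumulating duplicates.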