Skip to content

RAG Data Flow (Canonical)

Canonical decomposition of the retrieval + generation pipeline. Supersedes monolithic legacy page ../rag_pipeline.md.

Intent + profile (+ optional trip context) → cache lookup → hybrid retrieval → conditional enrichment → context assembly → LLM generation → attribution & compliance → cache store → client delivery.

RAG pipeline flow

Figure: High-level RAG + caching overview. Blue = processing stages, gold = decisions, tan = caches, teal = persistent stores.


1. Hashing & Cache Keys

Each cache key uses a stable 64-bit FNV or Murmur hash of a canonical JSON subset. Version suffixes bump when prompt or template semantics change.

Cache entries: - Retrieval set (petal IDs + metadata snapshot) - Generated answer (trip draft or recommendation set)

Eviction / TTL (initial): - Retrieval: 30–120 min (volatile sources shorter) - Generation: 10–30 min unless user-saved

2. Retrieval Strategy

Primary: Azure Cognitive Search hybrid (BM25 + vector). Server-side filters: geo radius, tags, seasonality, accessibility.

Diversification heuristic: if top 8 share >60% identical tag set, replace lowest 2 with high-distance related_petals to enhance thematic variety.

Ranking blend (heuristic v1):

score = 0.55 * vector_score + 0.35 * bm25 + 0.10 * recency_decay

Future: learned-to-rank on interaction feedback.

Failure handling: - Empty result → broaden filters once (geo widen, relax tag strictness) - API timeout → fallback to cached retrieval (<1h old)

3. External Enrichment Decision Matrix

Trigger enrichment only if: - Missing critical fields (opening_hours, pricing_tier) - last_updated stale (>7d events, >30d attractions) - License flag requires refreshed attribution

Controls: token bucket per provider; circuit breaker after consecutive failures.

4. Context Assembly Templates

Blocks: 1. User preference summary 2. Trip scaffold (dates, companions, constraints) 3. Petal snippets (top N grouped by thematic cluster) 4. Preference summary (favorite tags, companions, travel modes) 5. Safety / license directives

Budget: target < 6k input tokens (reserve generation headroom).

5. Prompt Strategy & Safety Filters

Sections: - System (role & compliance constraints) - Context (assembled blocks) - Instruction (output format JSON itinerary or bullet recommendations)

Safety: - Regex/policy filter post-generation - Mandatory attribution enforcement before return

6. Response Post-Processing & Attribution

Steps: 1. Parse JSON (repair prompt retry if invalid) 2. Attach source & license metadata per petal 3. Insert AI generation disclaimers 4. Compute quality score (coverage, diversity, personalization alignment)

7. Metrics & Observability

Metric Target (MVP) Notes
P95 end-to-end latency < 1200 ms Warm cache path ~500–700 ms
Cache hit ratio (retrieval) > 55% Cost efficiency lever
Generation token cost / req < $0.25 Track per intent type
Invalid JSON rate < 3% Trigger prompt revision above
Enrichment call success > 98% After retries
Diversification applied rate 10–25% Healthy exploration range

Tracing: correlate intent_id across classification → retrieval → generation.

8. Future Optimizations

  • Learned re-ranking model
  • Multi-vector field indexing
  • Adaptive cache TTL (volatility score)
  • Structured function calling for JSON validation
  • Distillation for cheaper re-ranker
  • Daily preference embedding refresh

Last updated: September 2025