A scheduled content factory that uses a fine-tuned Phi-3 Mini SLM (deployed to Azure Container Apps)
and a LangChain agentic loop backed by Claude on AWS Bedrock to generate multi-platform park content —
social posts, blogs, video scripts, morning briefings, evening recaps — fully automatically.
The SLM classifies query intent for tool selection; a parallel pre-fetch layer pulls knowledge
from the PostgreSQL RAG before the LLM sees a single token; tool calls execute in parallel via
ThreadPoolExecutor; and every pipeline type has a curated, type-specific tool subset.
Each pipeline type runs on a schedule (operations bulletin every morning, weather updates 3× daily, trend reports nightly, etc.). A Container App Job launches the pipeline runner, which assembles context from the knowledge RAG + real-time tools, then calls Claude via LangChain to produce formatted multi-platform content.
Full park ops snapshot: current waits, LL status, closures, dining, shows. Targets social + blog. Heaviest pre-fetch — 6 knowledge queries + 7 real-time tools in parallel.
3× daily weather-focused content: NWS forecast, storm risk, ride sensitivity, rides at risk of closure. All three use identical WEATHER_TOOLS but different prompt tone (morning plan vs. midday adjustment vs. evening recap).
Pre-open park strategy: first attractions to target, LL prioritization, weather impact on opening conditions. Uses forecast + current LL + purchase options + entity proximity context.
End-of-day analysis: wait time trends, sellout/release patterns, weekly comparisons. Uses TREND_TOOLS including query_intelligence for historical context from the knowledge RAG.
Narrative recap of prior day's operations: headline waits, LL drama, downtime incidents, weather impact. Structured by pre-fetched daily actuals from precomputed_docs.
Pre-visit preparation content: what to expect today based on historical day-of-week patterns, LL opening strategy, weather outlook. MORNING_TOOLS blend real-time current state with pattern knowledge.
Long-form analytical content: week-over-week comparisons, crowd pattern explanations, predictive guidance. INSIGHTS_TOOLS combine query_intelligence (RAG) with structured daily aggregates for multi-layer analysis.
Day-end narrative: today's operational highlights, LL sellout counts, top waits, crowd vs. historical baseline. Pre-fetches today's KPI digest and crowd index for data-driven hooks.
Every pipeline has a PIPELINE_PREFETCH config that defines exactly which knowledge
queries and real-time tool calls to fire in parallel before the LLM call.
This pattern eliminates agentic round-trips for predictable data needs — the LLM receives
a complete context block and can focus on generation, not data retrieval.
The operations_bulletin pipeline pre-fetches 6 semantic knowledge queries and 7 live tool calls
in parallel. Entity names are embedded into query strings (not just type filters) so BM25
ranking in the knowledge store surfaces the right headliner rides — not low-priority attractions
written later in the upsert cycle. Template variables like {last_same_dow} and
{date} are resolved at runtime from an ET-anchored date context before queries are dispatched.
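The pre-fetch pattern described above can be sketched roughly as follows. The source names PIPELINE_PREFETCH but does not show its schema, so the dictionary shape, the `run_prefetch` helper, and the `query_fn` / `tool_fns` callables are all hypothetical stand-ins for illustration. The sketch also assumes each callable already returns a string on failure rather than raising.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# Hypothetical config shape; only the name PIPELINE_PREFETCH comes from the source.
PIPELINE_PREFETCH = {
    "operations_bulletin": {
        "knowledge_queries": [
            "top 10 restaurants by wait time date={last_same_dow}",
            "TRON Rise Resistance Seven Dwarfs Mine Train Guardians Cosmic Rewind popularity ranking",
        ],
        "tools": ["get_current_wait_times", "get_lightning_lane_status"],
    },
}


def run_prefetch(
    pipeline: str,
    date_ctx: Dict[str, str],
    query_fn: Callable[[str], str],
    tool_fns: Dict[str, Callable[[], str]],
    max_workers: int = 16,
) -> Dict[str, str]:
    """Fire every configured knowledge query and tool call in parallel."""
    spec = PIPELINE_PREFETCH[pipeline]
    # Resolve template variables such as {date} before anything is dispatched.
    queries = [q.format(**date_ctx) for q in spec["knowledge_queries"]]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {f"knowledge:{q}": pool.submit(query_fn, q) for q in queries}
        for name in spec["tools"]:
            futures[f"tool:{name}"] = pool.submit(tool_fns[name])
        # Blocks until the slowest call finishes; total wall time tracks that call.
        return {key: fut.result() for key, fut in futures.items()}
```

The returned dictionary can then be rendered into a single context block for the generation prompt, and the config itself can be unit-tested without touching any LLM code.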
A fine-tuned Phi-3 Mini model deployed to Azure Container Apps handles query classification, the step that decides which retrieval strategy to use for each tool call. The same classifier serves both user-facing RAG queries and the automated content pipeline's tool calls. The SLM responds in under 200 ms at minimal inference cost; Phi-4-mini on Azure OpenAI serves as the automatic fallback.
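A minimal sketch of the primary/fallback arrangement, assuming both classifiers are exposed as plain callables that return a label. The function name, the label set (borrowed from the `mode` values of query_intelligence), and the final default are assumptions for illustration, not the deployed logic.

```python
from typing import Callable, Sequence, Tuple

# Assumed label set; the source does not list the classifier's output classes.
VALID_LABELS = ("semantic", "structured", "knowledge")


def classify_with_fallback(
    query: str,
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
    valid_labels: Sequence[str] = VALID_LABELS,
) -> Tuple[str, str]:
    """Return (label, source). Try the SLM first, then the fallback model."""
    for source, classifier in (("primary", primary), ("fallback", fallback)):
        try:
            label = classifier(query)
        except Exception:
            # Timeout, HTTP error, or bad payload: move on to the next classifier.
            continue
        if label in valid_labels:
            return label, source
    # Hypothetical last resort when both classifiers fail or return junk.
    return "semantic", "default"
```

Returning the source alongside the label makes it easy to log how often the fallback path is actually taken.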
Every tool wraps an HTTP call to one of two backends: park-knowledge-updater (the PostgreSQL RAG)
or router-bot (real-time data). LangChain tool docstrings are the only documentation the model
sees for tool selection, so they are written as precise data-query specifications, not
conversational descriptions. A failing tool returns the string "No data available: [reason]";
exceptions are never propagated into the generation context.
| Tool | Backend | Used By | Key Data |
|---|---|---|---|
| query_intelligence (RAG) | park-knowledge-updater → PostgreSQL RAG | TREND_TOOLS, INSIGHTS_TOOLS | Historical patterns, knowledge articles. Supports mode=semantic/structured/knowledge, knowledge_types filter to avoid full-library scan. |
| get_current_wait_times (Real-time) | router-bot /api/ai/v1/currentWaits | BULLETIN, MORNING, ROPE_DROP | Live wait times all attractions, operational status |
| get_lightning_lane_status (Real-time) | router-bot /api/ai/v1/lightningLane | BULLETIN, MORNING, ROPE_DROP | LL availability, pricing, sellout status per ride |
| get_weather_intelligence (Real-time) | router-bot /api/ai/weather/intelligence | All tool sets | AI-analyzed weather with crowd/wait impact predictions, storm risk |
| get_weather_forecast (Real-time) | router-bot /api/ai/weather/forecast | WEATHER_TOOLS, ROPE_DROP | Hourly NWS forecast, storm risk %, wind risk %, comfort level |
| get_rides_at_risk (Real-time) | router-bot | WEATHER_TOOLS, ROPE_DROP | Outdoor rides with closure risk given current/forecast weather |
| get_ride_sensitivity (Real-time) | router-bot | WEATHER_TOOLS, ROPE_DROP | Historical weather sensitivity profile per attraction (rain/wind/lightning) |
| get_down_summary (Real-time) | router-bot | BULLETIN, TREND, INSIGHTS | Today's downtime incidents, total hours, impacted attractions |
| get_top_wait_aggregates (Real-time) | router-bot | TREND_TOOLS, INSIGHTS_TOOLS | Today's ranked top-10 waits from precomputed_docs |
| get_daily_sellout_summary (Real-time) | router-bot | TREND_TOOLS, INSIGHTS_TOOLS | Today's LL sellout counts per attraction |
| get_daily_release_summary (Real-time) | router-bot | TREND_TOOLS, INSIGHTS_TOOLS | Today's LL release/restock events; a high count indicates extreme demand cycling |
| get_entity_context (Real-time) | router-bot → park_knowledge parkEntities | BULLETIN, ROPE_DROP, evening_wrap | Attraction metadata: land, park, type, proximity. Used for spatial narrative. |
| get_weekly_aggregation (Real-time) | router-bot | TREND_TOOLS, INSIGHTS_TOOLS | Week-over-week comparative wait/downtime metrics |
The previous implementation used boto3 directly against the Bedrock Agent runtime (RETURN_CONTROL flow),
which serialized tool calls and had no actual token counting. The current implementation uses
LangChain's ChatBedrock (bedrock-runtime:InvokeModel directly), parallel tool execution,
and real token counts from usage_metadata.
RETURN_CONTROL tool calls were serial. Token counts were a chars ÷ 4 estimate. Output format required retries. Every iteration paid AWS Agent service overhead.
Calls bedrock-runtime:InvokeModel directly, with no Agent service overhead. Tool calls fire in parallel via ThreadPoolExecutor. Real token counts come from usage_metadata. Output format is enforced at temperature=0.
When the model emits multiple tool_calls in a single turn, all execute concurrently. Results collected via as_completed(). Wall time = slowest tool, not sum.
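The concurrent execution step might look like the sketch below. It assumes tool calls arrive as dictionaries with `id`, `name`, and `args` keys (the shape LangChain uses for `AIMessage.tool_calls`); the `execute_tool_calls` helper and the `registry` mapping of tool names to callables are illustrative names, not the project's own.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any, Callable, Dict, List, Tuple


def execute_tool_calls(
    tool_calls: List[Dict[str, Any]],
    registry: Dict[str, Callable[..., Any]],
    max_workers: int = 8,
) -> List[Tuple[str, Any]]:
    """Run all tool calls from one model turn concurrently."""
    results: Dict[str, Any] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the tool_call id that produced it.
        future_to_id = {
            pool.submit(registry[call["name"]], **call["args"]): call["id"]
            for call in tool_calls
        }
        # as_completed yields futures in finish order, not submission order.
        for future in as_completed(future_to_id):
            results[future_to_id[future]] = future.result()
    # Return results in the order the model emitted the calls.
    return [(call["id"], results[call["id"]]) for call in tool_calls]
```

Each `(id, result)` pair can then be turned into a tool message and appended to the conversation before the next model call.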
Every tool wraps its HTTP call in a try/except. Timeout, HTTP error, or exception → return "No data available: [reason]". Model instructed to omit that section rather than invent data.
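One way to apply this uniformly is a decorator around each tool function. This is a sketch under the assumption that tools are ordinary Python callables; the `safe_tool` name is hypothetical, and the real tools may inline the try/except instead.

```python
import functools
from typing import Any, Callable


def safe_tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Convert any exception raised by a tool into a 'No data available' string."""

    @functools.wraps(fn)  # keeps the docstring, which the model uses for tool selection
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        try:
            return fn(*args, **kwargs)
        except Exception as exc:
            # Fall back to the exception class name when there is no message.
            reason = str(exc) or type(exc).__name__
            return f"No data available: {reason}"

    return wrapper
```

Because the wrapper preserves the original docstring, the data-query specification the model relies on survives the decoration.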
Input tokens, output tokens, and total tracked per generation call. Used for cost audit trail and pipeline-level token budget enforcement.
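A small ledger is enough to illustrate the tracking and budget check. The sketch assumes usage arrives as a dictionary with `input_tokens` and `output_tokens` keys, as in LangChain's `usage_metadata`; the `TokenLedger` class and the budget value are hypothetical.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass
class TokenLedger:
    """Accumulates token usage across the generation calls of one pipeline run."""

    budget: int  # hypothetical pipeline-level cap on total tokens
    input_tokens: int = 0
    output_tokens: int = 0

    @property
    def total(self) -> int:
        return self.input_tokens + self.output_tokens

    def record(self, usage: Mapping[str, int]) -> None:
        # Missing keys count as zero so a partial usage payload does not crash the run.
        self.input_tokens += usage.get("input_tokens", 0)
        self.output_tokens += usage.get("output_tokens", 0)

    def over_budget(self) -> bool:
        return self.total > self.budget
```

A pipeline runner could call `record()` after every model response and stop issuing further calls once `over_budget()` returns true.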
Content-type-specific model selection: long-form insight generation uses Sonnet; real-time bulletin updates use Haiku for lower latency and cost.
Content generation uses multiple sequential LLM calls to separate concerns. Pass 1 synthesizes raw tool data into a structured summary. Pass 2 reformats for each enabled platform. Pass 3 (Reels/TikTok) builds the video script. Each pass uses different temperature and prompt directives.
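The pass structure can be sketched with an injected `llm` callable so the flow is visible without any provider code. The prompt wording, the temperature values, and the platform identifiers below are assumptions for illustration; the source states only that each pass uses a different temperature and different directives.

```python
from typing import Callable, Dict, List

VIDEO_PLATFORMS = {"reels", "tiktok"}  # assumed identifiers for the video pass


def generate_content(
    tool_data: str,
    platforms: List[str],
    llm: Callable[..., str],
) -> Dict[str, str]:
    """Run the sequential passes and return one output per target."""
    # Pass 1: synthesize raw tool data into a structured summary.
    summary = llm(f"Synthesize a structured summary:\n{tool_data}", temperature=0.0)
    outputs = {"summary": summary}

    # Pass 2: reformat the summary for each enabled text platform.
    for platform in platforms:
        if platform not in VIDEO_PLATFORMS:
            outputs[platform] = llm(
                f"Reformat this summary for {platform}:\n{summary}", temperature=0.3
            )

    # Pass 3: build the video script only when a video platform is enabled.
    if VIDEO_PLATFORMS & set(platforms):
        outputs["video_script"] = llm(
            f"Write a short video script from this summary:\n{summary}", temperature=0.7
        )
    return outputs
```

Keeping the summary as the only input to passes 2 and 3 means formatting changes never re-read the raw tool data.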
For scheduled pipeline types (not interactive chat), the data requirements are fully predictable. Pre-fetching in parallel has two major advantages: total wall time equals the slowest single call (not sum of all calls), and the LLM receives a complete context block rather than making tool decisions under generation pressure. The agentic loop still fires for unexpected data needs, but the pre-fetch layer covers 90%+ of what each pipeline type needs. The PIPELINE_PREFETCH config makes it explicit and testable — a new engineer can see exactly what data flows into each pipeline type before reading any LLM code.
should_gate_for_fact_pack() inspects the pre-fetched KPI fact pack before any LLM call. Required conditions: crowd_score is present and numeric, top_10_waits has at least one entry, and operational_date matches today (Eastern Time). If any condition fails, the pipeline logs a gate reason (insufficient_kpi_data, aws_auth_failure, generation_failed) and skips generation entirely. This prevents publishing content built from stale or partial data — which would be worse than publishing nothing. The gate reason is auditable in the pipeline run log.
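A sketch of the gate check, following the three conditions listed above. The source gives the function name but not its signature or return type, so the `(gated, reason)` tuple, the `now` parameter, and the ISO date format for operational_date are assumptions.

```python
from datetime import datetime
from numbers import Real
from typing import Any, Mapping, Optional, Tuple
from zoneinfo import ZoneInfo

ET = ZoneInfo("America/New_York")


def should_gate_for_fact_pack(
    fact_pack: Mapping[str, Any],
    now: Optional[datetime] = None,
) -> Tuple[bool, Optional[str]]:
    """Return (True, reason) when generation should be skipped."""
    # `now` must be timezone-aware; it is converted to Eastern Time before comparing.
    now = now or datetime.now(ET)
    today_et = now.astimezone(ET).date().isoformat()

    score = fact_pack.get("crowd_score")
    # bool is a subclass of int, so exclude it explicitly.
    score_ok = isinstance(score, Real) and not isinstance(score, bool)
    waits_ok = len(fact_pack.get("top_10_waits") or []) >= 1
    date_ok = fact_pack.get("operational_date") == today_et

    if score_ok and waits_ok and date_ok:
        return False, None
    return True, "insufficient_kpi_data"
```

The caller would write the returned reason to the pipeline run log and exit before any model call is made.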
PostgreSQL BM25-style ranking (and the previous Cosmos query layer) scores documents by text relevance. If you query knowledge_types=["popularity_ranking"] with a generic question, the ranker returns low-priority attractions written last in the upsert cycle. Headliner rides (TRON, Rise of the Resistance, Seven Dwarfs Mine Train) that were upserted earlier score lower on recency-based metrics. By embedding headliner names directly into the query string — "TRON Rise Resistance Seven Dwarfs Mine Train Guardians Cosmic Rewind popularity ranking" — the text matching layer surfaces the right rides first regardless of upsert order. This was discovered through operational observation: bulletin content was featuring Remy's Ratatouille Adventure over Space Mountain because Remy was written later and scored higher on recency.
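The effect can be illustrated with a toy term-overlap scorer. This is not BM25 and not the production ranker; `build_entity_query`, `term_overlap`, and the two sample documents are hypothetical and only show why adding entity names to the query string breaks a tie in favor of the headliner.

```python
from typing import List

# Name fragments taken from the example query string in the text above.
HEADLINERS = ["TRON", "Rise Resistance", "Seven Dwarfs Mine Train", "Guardians Cosmic Rewind"]


def build_entity_query(topic: str, entity_names: List[str]) -> str:
    """Prefix the topic with entity names so text matching favors those entities."""
    return " ".join([*entity_names, topic])


def term_overlap(query: str, doc: str) -> int:
    """Toy relevance score: number of distinct query terms that appear in the doc."""
    return len(set(query.lower().split()) & set(doc.lower().split()))


docs = [
    "Remy's Ratatouille Adventure popularity ranking",
    "TRON Lightcycle Run popularity ranking",
]
generic = "popularity ranking"
targeted = build_entity_query(generic, HEADLINERS)

# With the generic query both documents tie; with entity names the TRON doc wins.
generic_scores = [term_overlap(generic, d) for d in docs]
targeted_scores = [term_overlap(targeted, d) for d in docs]
```

When scores tie, whatever secondary ordering the store applies (such as write order) decides the result, which is the failure mode the embedded names avoid.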
The Bedrock Agent runtime (RETURN_CONTROL loop) serialized tool calls: the agent emitted one tool call, waited for the response, then decided whether to call another. This meant a 3-tool bulletin build took 3 sequential 45-second HTTP calls = 135 seconds. LangChain's ChatBedrock calls bedrock-runtime:InvokeModel directly — we own the agentic loop. When Claude emits 5 tool calls in a single turn, we fire all 5 in parallel via ThreadPoolExecutor and collect results. The same bulletin build now takes ~45 seconds (cost of the slowest tool). The secondary benefit is output format control: with the Bedrock Agent, getting structured section-marked output reliably required retries. At temperature=0 with explicit format instructions in the system prompt, Claude's first response is always correctly formatted.
Restaurant wait patterns are strongly day-of-week dependent (Magic Kingdom Saturdays vs. Tuesdays have completely different dining pressure). The structured knowledge query "top 10 restaurants by wait time date={last_same_dow}" resolves {last_same_dow} to the prior week's same day-of-week (e.g., for a Thursday bulletin, it fetches last Thursday's restaurant data). This gives the LLM a meaningful historical dining baseline that reflects the same crowd pattern as today — much more accurate than a trailing 7-day average, which blends weekday and weekend patterns.
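Resolving the template variable is a small date calculation. The sketch below assumes an ISO date format and a `build_date_context` helper, neither of which is confirmed by the source; only the {date} and {last_same_dow} variable names and the Eastern Time anchoring come from the text.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional
from zoneinfo import ZoneInfo

ET = ZoneInfo("America/New_York")


def build_date_context(now: Optional[datetime] = None) -> Dict[str, str]:
    """Build template values anchored to the current date in Eastern Time."""
    # `now` must be timezone-aware; a UTC timestamp late at night maps to the prior ET date.
    today = (now or datetime.now(ET)).astimezone(ET).date()
    return {
        "date": today.isoformat(),
        # Seven days back always lands on the same day of the week.
        "last_same_dow": (today - timedelta(days=7)).isoformat(),
    }


# Example: a Thursday bulletin resolves to the previous Thursday.
ctx = build_date_context(datetime(2025, 6, 12, 9, 0, tzinfo=ET))
query = "top 10 restaurants by wait time date={last_same_dow}".format(**ctx)
```

Anchoring to Eastern Time matters for jobs that run on UTC hosts, where the calendar date rolls over several hours before it does at the park.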