Four AI models with different strengths and specific per-model instructions collaborate to generate daily theme park content — each model assigned to exactly the task where it performs best. A fine-tuned SLM classifies intent, a fast model handles tool-driven synthesis, and a quality-focused model formats every output for each platform. No single model does everything; every model does what it does best.
Every model in the pipeline receives instructions tuned to its task. The classifiers receive a classification schema and confidence contract. The synthesis model receives anti-hallucination rules and tool-use directives. The formatting model receives platform-specific character limits, format templates, and brand voice guidelines. No model carries instructions for tasks it doesn't perform.
Fine-tuned for classification on this specific task. Deployed to Azure Container Apps (scales to zero). SHA-256 routing for determinism — same question always routes the same path. Returns strategy + confidence score.
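The text does not say exactly how SHA-256 is used in routing. One plausible reading, sketched below, is that the question is normalized and hashed, and the digest keys a lookup so an identical question always gets the same route. The names `route_key`, `classify_cached`, and the in-memory `_route_cache` are hypothetical and do not come from the pipeline's code.

```python
import hashlib

# Hypothetical in-memory store; a real deployment would likely use a shared cache.
_route_cache: dict[str, dict] = {}


def route_key(question: str) -> str:
    """Return a SHA-256 hex digest of the question after light normalization."""
    # Lowercasing and collapsing whitespace are assumptions about what "same question" means.
    normalized = " ".join(question.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def classify_cached(question: str, classify) -> dict:
    """Classify once per distinct digest, then reuse the stored result."""
    key = route_key(question)
    if key not in _route_cache:
        _route_cache[key] = classify(question)
    return _route_cache[key]
```

Keying on a digest rather than the raw string keeps keys fixed-length and avoids storing question text in the lookup table.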
Azure OpenAI deployment. Automatically activates when Phi-3 Mini confidence < 0.85 or the container times out. Identical input/output contract — stamps classification_source="slm_fallback_*" for audit trail.
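A minimal sketch of the fallback rule described above, assuming both classifiers are callables that return a dict with `strategy` and `confidence`. The 0.85 threshold and the `slm_fallback_` prefix come from the text. The suffixes `low_confidence` and `timeout`, the primary label `"slm"`, and the function name are illustrative assumptions.

```python
from typing import Callable

CONFIDENCE_THRESHOLD = 0.85  # threshold stated in the text

Classifier = Callable[[str], dict]


def classify_with_fallback(
    question: str, slm_classify: Classifier, fallback_classify: Classifier
) -> dict:
    """Try the SLM first; fall back on low confidence or a timeout."""
    try:
        result = slm_classify(question)
    except TimeoutError:
        reason = "timeout"  # hypothetical suffix
    else:
        if result["confidence"] >= CONFIDENCE_THRESHOLD:
            # "slm" as the primary source label is an assumption.
            return {**result, "classification_source": "slm"}
        reason = "low_confidence"  # hypothetical suffix
    fallback = fallback_classify(question)
    # Same output contract, with the source stamped for the audit trail.
    return {**fallback, "classification_source": f"slm_fallback_{reason}"}
```

Because both paths return the same shape, downstream code never needs to branch on which classifier answered.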
Handles the agentic tool loop: decides which tools to call, receives results, and synthesizes them into a structured data summary. Chosen for low latency and cost, since tool calls are the bottleneck, not model inference. Parallel tool execution via ThreadPoolExecutor reduces total wall time to roughly that of the slowest tool.
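The parallel tool step might look roughly like the sketch below, which uses the standard-library `ThreadPoolExecutor`. The helper names `run_tools_parallel` and `fake_tool` are hypothetical; the real tools and their signatures are not shown in the text.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


def run_tools_parallel(
    calls: dict[str, Callable[[], Any]], max_workers: int = 8
) -> dict[str, Any]:
    """Run zero-argument tool callables concurrently and collect results by name."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit everything first so all tools start before any result is awaited.
        futures = {name: pool.submit(fn) for name, fn in calls.items()}
        # .result() blocks per future; total wait is bounded by the slowest tool.
        return {name: future.result() for name, future in futures.items()}


def fake_tool(value: Any, delay_s: float) -> Callable[[], Any]:
    """Build a stand-in tool that sleeps to simulate an I/O-bound call."""
    def _call() -> Any:
        time.sleep(delay_s)
        return value
    return _call
```

Threads suit this case because tool calls spend their time waiting on network I/O, so the interpreter lock is not the limiting factor.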
All platform-facing content. Sonnet replaced Haiku for social after Haiku produced park-unaware hallucinations and used unfamiliar attraction terminology. Three separate calls: blog post, platform formatting (Facebook/Instagram/Threads), and Reels narration script — each with specific per-format instructions.
Each generation call uses the model best suited for that task. Model IDs are env-var overridable, so no redeploy is needed to experiment with models. The prompt and instruction set also change per pass to match the task's requirements.
| Pass | Model | Instructions Given | Output | Why This Model |
|---|---|---|---|---|
| Pass 1: Agentic loop | Haiku 4.5 | PIPELINE_SYSTEM_PROMPT (identity + anti-hallucination) + tool use directives + Cosmos voice prompt. Sees all 30+ tool definitions and pre-fetched context block. | Tool selection + structured data synthesis. Section-marked content block (---BLOG---, ---DISCORD---, ---TELEGRAM---) | Tool calls are the bottleneck; model inference is fast. Low cost per run (<$0.003). Park-generic synthesis doesn't require domain terminology depth. |
| Pass 2a: Blog post | Sonnet 4.5 | Cosmos prompt_body (voice/tone specific to this pipeline) + blog-format instructions (200-300 words, 2-3 data points, scannable). Gets Pass 1 output as context. | Full blog post with headline, structured paragraphs, actionable insights. Links auto-appended by publisher. | Blog readers expect craft and domain knowledge. Sonnet understands park-specific terminology (what a "rope-drop advantage" is vs. a generic attraction list). |
| Pass 2b: Social format | Sonnet 4.5 | Per-channel format blocks: Facebook (IMAGE_KEY + HEADLINE ≤55 chars + CAPTION 500-800 chars), Instagram (PUBLISH gate + IMAGE_KEY + hashtags), Threads (data_drop or meteorologist template by content type). Section markers define output structure. | Section-marked response: ---FACEBOOK--- block, ---INSTAGRAM--- block, ---THREADS--- block. Each parsed into ContentArtifact platform slots. | Replaced Haiku after Haiku hallucinated ride names and used generic travel language ("head to X if you're around"). Sonnet knows Disney World attractions. |
| Pass 3: Reels script | Sonnet 4.5 | Strict schema: HOOK (5-8 words, leave tension unresolved), SEGMENT_1 (6-9 words, payoff), CTA (5-7 words), NARRATION (36-42 words, 3 sentences: 2 data moves + CTA), IG_CAPTION, TIKTOK_CAPTION, YT_TITLE, YT_DESCRIPTION. Forbidden phrases list (e.g. "here's what you need to know"). | Structured REELS block with 10 required fields. NARRATION field goes to ElevenLabs TTS → synchronized captions in the video pipeline. | Strict word-count contracts for Reels require precision. Sonnet reliably stays within the hook/narration word constraints at temperature=0. |
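
The Reels word-count contract in the last row can be checked mechanically. The sketch below is an assumption about how such a check could look, not the pipeline's own validator. The word ranges and the one forbidden phrase come from the table; `validate_reels` and the constant names are hypothetical, and the rest of the forbidden-phrase list is not given in the text.

```python
# Word ranges taken from the table; field names match the Reels schema.
REELS_WORD_LIMITS = {
    "HOOK": (5, 8),
    "SEGMENT_1": (6, 9),
    "CTA": (5, 7),
    "NARRATION": (36, 42),
}

# Only this example phrase appears in the text; the full list is unknown.
FORBIDDEN_PHRASES = ["here's what you need to know"]


def validate_reels(fields: dict[str, str]) -> list[str]:
    """Return human-readable violations; an empty list means the block passes."""
    errors = []
    for name, (low, high) in REELS_WORD_LIMITS.items():
        count = len(fields.get(name, "").split())
        if not low <= count <= high:
            errors.append(f"{name}: {count} words, expected {low}-{high}")
    for name, text in fields.items():
        for phrase in FORBIDDEN_PHRASES:
            if phrase in text.lower():
                errors.append(f"{name}: contains forbidden phrase '{phrase}'")
    return errors
```

A missing field counts as zero words, so it is reported as a range violation rather than silently skipped.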
The instruction surface seen by the generation model is assembled from three independent layers. This separation lets operational voice changes happen in Cosmos without a redeploy, while the anti-hallucination contract and platform format rules stay in code where they're versioned.
langchain_agent.py :: PIPELINE_SYSTEM_PROMPT
pipeline_configs · prompt_source: "cosmos" | "hardcoded"
operations_bulletin, morning_briefing, weather_morning, etc. The document contains prompt_body (overall voice and blog format) and prompt_platforms (per-platform section-specific instructions). Both fields are editable in the Portal UI without redeploying the function app. Hardcoded fallbacks exist for every pipeline so a missing Cosmos document never blocks generation.
shared/platform_prompt_config.py :: build_enabled_call2_format_block()
If the prompt_platforms section already defines a channel, the hardcoded block is suppressed; Cosmos wins.
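The three-layer assembly and the "Cosmos wins" precedence can be sketched as follows. This is illustrative only: the prompt text, the `HARDCODED_FORMAT_BLOCKS` contents, and the functions `build_format_block` and `assemble_instructions` are placeholders, and `prompt_platforms` is assumed to be a mapping from channel name to instruction text.

```python
# Layer 1 (code): placeholder text standing in for the real system prompt.
PIPELINE_SYSTEM_PROMPT = "Report only facts returned by tools."

# Layer 3 (code): placeholder per-channel format blocks.
HARDCODED_FORMAT_BLOCKS = {
    "facebook": "FACEBOOK: IMAGE_KEY, HEADLINE, CAPTION",
    "instagram": "INSTAGRAM: PUBLISH, IMAGE_KEY, hashtags",
    "threads": "THREADS: data_drop or meteorologist template",
}


def build_format_block(enabled_channels: list[str], prompt_platforms: dict[str, str]) -> str:
    """Pick one format block per enabled channel, preferring the Cosmos definition."""
    blocks = []
    for channel in enabled_channels:
        if channel in prompt_platforms:
            blocks.append(prompt_platforms[channel])  # Cosmos suppresses hardcoded
        elif channel in HARDCODED_FORMAT_BLOCKS:
            blocks.append(HARDCODED_FORMAT_BLOCKS[channel])
    return "\n\n".join(blocks)


def assemble_instructions(
    prompt_body: str, enabled_channels: list[str], prompt_platforms: dict[str, str]
) -> str:
    """Join the code-level prompt, the Cosmos voice prompt, and the format block."""
    layers = [
        PIPELINE_SYSTEM_PROMPT,
        prompt_body,  # Layer 2 (Cosmos): voice and blog format
        build_format_block(enabled_channels, prompt_platforms),
    ]
    return "\n\n".join(layer for layer in layers if layer)
```

Resolving precedence per channel, rather than per document, lets an operator override one platform in Cosmos while the others keep their versioned defaults.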
The same 4-model team handles all 10 pipeline types, but each type configures the tool set, platform targets, Cosmos prompt, Reels template variant, and image selection hint differently. A weather pipeline runs Haiku with WEATHER_TOOLS and a Cosmos meteorologist voice prompt; a rope-drop pipeline runs Haiku with ROPE_DROP_TOOLS and an urgency-first prompt. Same models, different instruction surfaces.
Three reasons. Cost: running Sonnet for 10 parallel pre-fetch queries + tool calls + synthesis is ~5× more expensive than Haiku for those steps, with no quality benefit — tool responses are structured data, not prose. Latency: the agentic loop fires up to 10 LLM calls with tool round-trips; Haiku's lower inference latency meaningfully reduces end-to-end pipeline time. Hallucination surface: Sonnet's broader knowledge sometimes works against it for the synthesis pass — it may blend training knowledge with tool results. Haiku at temperature=0 is more literal, which is exactly what synthesis needs.
Haiku produced content that failed on two fronts: it used unfamiliar attraction terminology (calling "TRON Lightcycle / Run" just "TRON" in some contexts and something unrecognizable in others), and it wrote generic travel advice that read as AI-generated. Two specific failure patterns: "head to X if you're around" (no Disney guest "heads to" a ride, they "rope-drop" it or "grab a Lightning Lane"), and "based on the information provided" appearing in captions. Sonnet understands the Disney World vernacular — what "rope-drop" means, what a "Lightning Lane Multi Pass booking" is vs. "Individual Lightning Lane", and the urgency framing that Disney social content requires.
deploy.sh upserts pipeline_configs documents at deploy time, which resets prompt_source to "hardcoded" and clears any Cosmos-stored prompt. The correct operations sequence is: deploy.sh first, then push Cosmos prompt updates via API. If prompt updates go first, deploy.sh overwrites them and the next scheduled run uses the hardcoded fallback. This surfaces in the pipeline log as "Cosmos prompt unavailable — no hardcoded fallback exists". This is documented as Rule 6 in the deployment runbook.
The generation pipeline parses responses by exact section markers (---FACEBOOK---, ---INSTAGRAM---, IMAGE_KEY:, HEADLINE:, etc.). Any deviation — extra preamble text, wrong marker spelling, field reordering — breaks the parser and produces an empty ContentArtifact. At temperature=0 with explicit format instructions in the system prompt, Claude's first response follows the schema without retries. The previous Bedrock Agent runtime approach at higher temperatures required multi-turn format correction loops. temperature=0 made the retry logic unnecessary — it was removed entirely.
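To show why exact markers matter, here is a small strict parser in the spirit of the description above. It is a sketch under assumptions, not the pipeline's parser: `parse_sections` and `parse_fields` are hypothetical names, and returning an empty dict on any preamble is one way to mirror the "empty ContentArtifact" behavior.

```python
import re

# A marker is a whole line of the form ---NAME--- with uppercase letters or underscores.
SECTION_RE = re.compile(r"^---([A-Z_]+)---[ \t]*$", re.MULTILINE)


def parse_sections(response: str) -> dict[str, str]:
    """Split a response into {section_name: body}; return {} if the format deviates."""
    parts = SECTION_RE.split(response)
    # With one capture group, split yields [preamble, name1, body1, name2, body2, ...].
    if parts[0].strip():
        return {}  # extra text before the first marker counts as a format failure
    return {name: body.strip() for name, body in zip(parts[1::2], parts[2::2])}


def parse_fields(block: str) -> dict[str, str]:
    """Read 'KEY: value' lines whose key is uppercase, such as IMAGE_KEY or HEADLINE."""
    fields = {}
    for line in block.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().isupper():
            fields[key.strip()] = value.strip()
    return fields
```

A misspelled marker such as `---FACEBOK---` still matches the pattern but yields a section name no downstream slot expects, which is the same failure seen from the other side.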
Three env vars govern model selection: CLAUDE_MODEL_ID (Pass 1, defaults to Haiku 4.5), CLAUDE_MODEL_ID_BLOG (Pass 2 blog, defaults to Sonnet 4.5), CLAUDE_MODEL_ID_SOCIAL (Pass 2 social, defaults to Sonnet 4.5). Changing any of these via az functionapp config appsettings set + restart swaps the model with no code change. This lets A/B testing happen against live production traffic: one day Sonnet on social, next day Haiku, compare platform engagement metrics directly. Model IDs use the cross-region inference profile format (us.anthropic.claude-*) for failover across AWS regions.
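Resolving the model per pass from the environment could look like the sketch below. The three variable names come from the text. The default ID strings are deliberately placeholders, since the exact identifiers are not given, and `resolve_model_id` is a hypothetical helper.

```python
import os
from typing import Mapping

# Placeholder defaults: the real IDs follow the us.anthropic.claude-* profile format.
DEFAULT_MODEL_IDS = {
    "CLAUDE_MODEL_ID": "us.anthropic.claude-<haiku-4.5-placeholder>",
    "CLAUDE_MODEL_ID_BLOG": "us.anthropic.claude-<sonnet-4.5-placeholder>",
    "CLAUDE_MODEL_ID_SOCIAL": "us.anthropic.claude-<sonnet-4.5-placeholder>",
}


def resolve_model_id(var_name: str, environ: Mapping[str, str] = os.environ) -> str:
    """Return the env-var override if set and non-empty, else the coded default."""
    if var_name not in DEFAULT_MODEL_IDS:
        raise KeyError(f"unknown model variable: {var_name}")
    override = environ.get(var_name, "").strip()
    return override or DEFAULT_MODEL_IDS[var_name]
```

Passing `environ` as a parameter keeps the lookup testable without mutating the process environment.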