Multi-Model AI Content Generation

The Model Team

Each Model Has One Job — and Specific Instructions for It

Every model in the pipeline receives instructions tuned to its task. The classifiers receive a classification schema and confidence contract. The synthesis model receives anti-hallucination rules and tool-use directives. The formatting model receives platform-specific character limits, format templates, and brand voice guidelines. No model carries instructions for tasks it doesn't perform.

Classifier · Primary

Phi-3 Mini

Query intent → retrieval strategy

Fine-tuned for classification on this specific task. Deployed to Azure Container Apps (scales to zero). SHA-256 routing for determinism — same question always routes the same path. Returns strategy + confidence score.

Container Apps · confidence ≥ 0.85 · POST /classify · timeout 2000ms

Classifier · Fallback

Phi-4-mini-instruct

Query intent → retrieval strategy

Azure OpenAI deployment. Automatically activates when Phi-3 Mini confidence < 0.85 or the container times out. Identical input/output contract — stamps classification_source="slm_fallback_*" for audit trail.

Azure OpenAI · fallback path · same schema
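The primary/fallback hand-off described in the two classifier cards could look roughly like the sketch below. It is an illustration, not the pipeline's code: the `Classifier` callable signature, the `Classification` dataclass, the `"slm_primary"` label, and the suffixes after `slm_fallback_` are all assumptions (the source only shows `slm_fallback_*`).

```python
from dataclasses import dataclass
from typing import Callable

CONFIDENCE_THRESHOLD = 0.85  # from the text: fallback activates below this


@dataclass
class Classification:
    strategy: str
    confidence: float
    classification_source: str


# Hypothetical signature: a classifier takes a query and returns (strategy, confidence).
Classifier = Callable[[str], tuple[str, float]]


def classify_with_fallback(query: str, primary: Classifier, fallback: Classifier) -> Classification:
    """Try the primary classifier; use the fallback on timeout or low confidence."""
    try:
        strategy, confidence = primary(query)
    except TimeoutError:
        reason = "timeout"
    else:
        if confidence >= CONFIDENCE_THRESHOLD:
            # "slm_primary" is an assumed label for the non-fallback path.
            return Classification(strategy, confidence, "slm_primary")
        reason = "low_confidence"
    strategy, confidence = fallback(query)
    # The suffix after "slm_fallback_" is an assumption made for this sketch.
    return Classification(strategy, confidence, f"slm_fallback_{reason}")
```

Because both classifiers share one input/output contract, the caller never needs to know which one answered; the audit stamp carries that information.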

Synthesis · Agentic Loop

Claude Haiku 4.5

Tool calls + data synthesis (Pass 1)

Handles the agentic tool loop: decides which tools to call, receives results, and synthesizes them into a structured data summary. Chosen for low latency and cost: tool calls are the bottleneck, not model inference. Parallel tool execution via ThreadPoolExecutor reduces total wall time to that of the slowest tool.

AWS Bedrock · temperature=0 · max 10 iterations · tool_response ≤ 12000 chars
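A minimal sketch of the parallel tool execution and the 12000-character response cap mentioned above. The function name, the `(name, callable)` input shape, and the error-handling behavior are assumptions for illustration; only `ThreadPoolExecutor` and the character limit come from the text.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

MAX_TOOL_RESPONSE_CHARS = 12_000  # matches the tool_response limit above


def run_tools_parallel(calls: list[tuple[str, Callable[[], str]]]) -> dict[str, str]:
    """Run every requested tool concurrently and truncate oversized results."""
    if not calls:
        return {}
    results = {}
    with ThreadPoolExecutor(max_workers=len(calls)) as pool:
        # Submit everything first so the tools overlap; wall time ~ slowest tool.
        futures = {name: pool.submit(fn) for name, fn in calls}
        for name, future in futures.items():
            try:
                text = future.result()
            except Exception as exc:
                # Assumed behavior: report the failure to the model instead of crashing.
                text = f"No data available ({type(exc).__name__})"
            results[name] = text[:MAX_TOOL_RESPONSE_CHARS]
    return results
```

Submitting all futures before collecting any result is what lets the slowest tool set the total wait, rather than the sum of all tools.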

Format · Blog · Social · Reels

Claude Sonnet 4.5

Blog, social, and Reels script (Passes 2–3)

All platform-facing content. Sonnet replaced Haiku for social after Haiku produced park-unaware hallucinations and used unfamiliar attraction terminology. Three separate calls: blog post, platform formatting (Facebook/Instagram/Threads), and Reels narration script — each with specific per-format instructions.

AWS Bedrock temperature=0 BLOG_MODEL_ID SOCIAL_MODEL_ID

Pass-by-Pass Model Routing

Three Generation Passes, Three Model Assignments

Each generation call uses the model best suited for that task. Model IDs are env-var overridable, so no redeploy is needed to experiment with models. The prompt and instruction set also change per pass to match the task's requirements.

Pass	Model	Instructions Given	Output	Why This Model
Pass 1 Agentic loop	Haiku 4.5	PIPELINE_SYSTEM_PROMPT (identity + anti-hallucination) + tool use directives + Cosmos voice prompt. Sees all 30+ tool definitions and pre-fetched context block.	Tool selection + structured data synthesis. Section-marked content block (---BLOG---, ---DISCORD---, ---TELEGRAM---)	Tool calls are the bottleneck — model inference is fast. Low cost per run (<$0.003). Park-generic synthesis doesn't require domain terminology depth.
Pass 2a Blog post	Sonnet 4.5	Cosmos prompt_body (voice/tone specific to this pipeline) + blog-format instructions (200-300 words, 2-3 data points, scannable). Gets Pass 1 output as context.	Full blog post with headline, structured paragraphs, actionable insights. Links auto-appended by publisher.	Blog readers expect craft and domain knowledge. Sonnet understands park-specific terminology (what a "rope-drop advantage" is vs. a generic attraction list).
Pass 2b Social format	Sonnet 4.5	Per-channel format blocks: Facebook (IMAGE_KEY + HEADLINE ≤55 chars + CAPTION 500-800 chars), Instagram (PUBLISH gate + IMAGE_KEY + hashtags), Threads (data_drop or meteorologist template by content type). Section markers define output structure.	Section-marked response: ---FACEBOOK--- block, ---INSTAGRAM--- block, ---THREADS--- block. Each parsed into ContentArtifact platform slots.	Replaced Haiku after Haiku hallucinated ride names and used generic travel language ("head to X if you're around"). Sonnet knows Disney World attractions.
Pass 3 Reels script	Sonnet 4.5	Strict schema: HOOK (5-8 words, leave tension unresolved), SEGMENT_1 (6-9 words, payoff), CTA (5-7 words), NARRATION (36-42 words, 3 sentences: 2 data moves + CTA), IG_CAPTION, TIKTOK_CAPTION, YT_TITLE, YT_DESCRIPTION. Forbidden phrases list (e.g. "here's what you need to know").	Structured REELS block with 10 required fields. NARRATION field goes to ElevenLabs TTS → synchronized captions in the video pipeline.	Strict word-count contracts for Reels require precision. Sonnet reliably stays within the hook/narration word constraints at temperature=0.
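The word-count contract in the Pass 3 row lends itself to a mechanical check. The sketch below is one possible validator, assuming the Reels block has already been parsed into a dict of field name to text; the function name and the violation message format are hypothetical.

```python
# Word-count ranges taken from the Pass 3 row of the table above.
WORD_LIMITS = {
    "HOOK": (5, 8),
    "SEGMENT_1": (6, 9),
    "CTA": (5, 7),
    "NARRATION": (36, 42),
}


def check_reels_word_counts(fields: dict[str, str]) -> list[str]:
    """Return human-readable violations; an empty list means the script passes."""
    problems = []
    for name, (low, high) in WORD_LIMITS.items():
        # A missing field counts as zero words and is reported as a violation.
        count = len(fields.get(name, "").split())
        if not low <= count <= high:
            problems.append(f"{name}: {count} words, expected {low}-{high}")
    return problems
```

A check like this can run before the NARRATION field is sent to text-to-speech, so an out-of-range script is caught before audio is generated.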

Prompt Architecture

Three Prompt Layers — Each with a Different Owner and Update Cycle

The instruction surface seen by the generation model is assembled from three independent layers. This separation lets operational voice changes happen in Cosmos without a redeploy, while the anti-hallucination contract and platform format rules stay in code where they're versioned.

Layer 1 — Identity & Safety Contract Static · In Code

File: langchain_agent.py :: PIPELINE_SYSTEM_PROMPT

Applied identically across every pipeline type and every model invocation. Sets the Park Whisperer persona, enforces the anti-hallucination rule ("Use ONLY the data in the LIVE DATA section"), and instructs the model to follow the content prompt below. This layer never changes at runtime — it is the invariant trust boundary.

You are Park Whisperer — Walt Disney World's data-driven intelligence source.
ANTI-HALLUCINATION RULE: Use ONLY the data that appears in the LIVE DATA section of this request.
Do not invent wait times, weather readings, crowd levels, Lightning Lane prices, or show times.
If a data source shows "No data available", state that explicitly in the relevant section.
Follow the voice, tone, and formatting instructions in the content prompt below.

Layer 2 — Voice, Tone & Content Format Per-Pipeline · Cosmos DB

Collection: pipeline_configs · prompt_source: "cosmos" | "hardcoded"

Each pipeline type owns its own Cosmos document: operations_bulletin, morning_briefing, weather_morning, etc. The document contains prompt_body (overall voice and blog format) and prompt_platforms (per-platform section-specific instructions). Both fields are editable in the Portal UI without redeploying the function app. Hardcoded fallbacks exist for every pipeline so a missing Cosmos document never blocks generation.

VOICE: Enthusiastic insider, chatty, data-obsessed. Never generic travel writing.
BLOG: Lead with the 2-3 most surprising numbers. 200-250 words. No ride lists — tell the story.
OPERATIONS BULLETIN SPECIFIC: Crowd score first if > 8.0. LL sellout urgency if < 10 remain.

Layer 3 — Platform Format Contracts Versioned · platform_prompt_config.py

File: shared/platform_prompt_config.py :: build_enabled_call2_format_block()

Defines the exact structured format each platform receives. Facebook: IMAGE_KEY enum (6 options) + HEADLINE ≤55 chars (no hashtags, no emojis) + CAPTION 500-800 chars + 2-3 hashtags at end. Instagram: PUBLISH gate + IMAGE_KEY + HEADLINE + hashtag-rich CAPTION. Reels: 10-field schema with exact word-count constraints per field. When a Cosmos prompt_platforms section already defines a channel, the hardcoded block is suppressed — Cosmos wins.

• ---FACEBOOK--- MUST use EXACTLY:
IMAGE_KEY: [lightning_lane_surge | peak_crowds | light_day | downtime_ripple | record_high | rope_drop_advantage]
HEADLINE: [ALL CAPS, max 55 chars, no hashtags, no emojis]
CAPTION: [500-800 chars, insider data voice, 2-3 hashtags at end]
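To make the layering concrete, here is a rough sketch of how the three layers might be joined, including the "Cosmos wins" rule for per-channel format blocks. The prompt strings are shortened stand-ins, and `assemble_prompt` and `HARDCODED_FORMAT_BLOCKS` are hypothetical names; the real assembly lives in `build_enabled_call2_format_block()` and may differ.

```python
# Shortened stand-ins for the real prompt text; only the layering logic matters here.
PIPELINE_SYSTEM_PROMPT = "You are Park Whisperer. Use ONLY the data in the LIVE DATA section."
HARDCODED_FORMAT_BLOCKS = {
    "facebook": "---FACEBOOK--- MUST use EXACTLY: IMAGE_KEY, HEADLINE, CAPTION",
    "instagram": "---INSTAGRAM--- MUST use EXACTLY: PUBLISH, IMAGE_KEY, HEADLINE, CAPTION",
}


def assemble_prompt(prompt_body: str, prompt_platforms: dict[str, str], channels: list[str]) -> str:
    """Join Layer 1, Layer 2, and one format block per enabled channel."""
    parts = [PIPELINE_SYSTEM_PROMPT, prompt_body]
    for channel in channels:
        # Cosmos wins: a channel defined in prompt_platforms suppresses the hardcoded block.
        block = prompt_platforms.get(channel) or HARDCODED_FORMAT_BLOCKS.get(channel)
        if block:
            parts.append(block)
    return "\n\n".join(parts)
```

Keeping Layer 1 as the first element on every call is what makes it an invariant: no Cosmos edit can displace or reorder it.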

Content-Type Routing

Pipeline Type Changes What Every Model Receives

The same 4-model team handles all 10 pipeline types, but each type configures the tool set, platform targets, Cosmos prompt, Reels template variant, and image selection hint differently. A weather pipeline runs Haiku with WEATHER_TOOLS and a Cosmos meteorologist voice prompt; a rope-drop pipeline runs Haiku with ROPE_DROP_TOOLS and an urgency-first prompt. Same models, different instruction surfaces.

Tool Set

BULLETIN_TOOLS (14) · WEATHER_TOOLS (9) · ROPE_DROP_TOOLS (10) · TREND_TOOLS (8) · INSIGHTS_TOOLS (10) · MORNING_TOOLS (10)
Haiku receives only the tools for this content type, not all 30+.

Cosmos Prompt

prompt_body: pipeline-specific voice and blog format
prompt_platforms: per-channel override sections (optional)
prompt_source=cosmos → live from DB · prompt_source=hardcoded → fallback in code
Updatable in Portal UI between deploys; voice changes don't require a redeploy

Platform Targets

channels dict: facebook/instagram/threads/reels/youtube per content type
operations_bulletin: all channels · weather_morning: threads + reels
Sonnet only formats channels that are enabled, so no tokens are wasted on disabled platforms
_normalize_generation_channels() resolves legacy platforms[] + modern channels{} formats

Reels Script Template

_resolve_reels_call3_override() → content-type variant of the Reels prompt
weather_* → "weather" template (meteorologist tone, storm risk hook)
yesterday_recap → "yesterday_recap" template (past tense, what happened)
morning_briefing → "morning_briefing" template (planning framing, forward-looking)
Default template for operations_bulletin, trend_report, etc.
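The template mapping listed above can be read as a small lookup. The sketch below is not the body of `_resolve_reels_call3_override()`, which is not shown in the source; `resolve_reels_template` and the `"default"` return value are hypothetical names used to illustrate the rules.

```python
def resolve_reels_template(content_type: str) -> str:
    """Map a content type to a Reels template name, per the rules listed above."""
    # Any weather_* pipeline shares the meteorologist-tone template.
    if content_type.startswith("weather_"):
        return "weather"
    # These two content types have a template named after themselves.
    if content_type in ("yesterday_recap", "morning_briefing"):
        return content_type
    # Everything else (operations_bulletin, trend_report, ...) uses the default.
    return "default"
```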

Design Decisions

Why This Model Team Structure Works

Why not use one model for everything?

Three reasons. Cost: running Sonnet for 10 parallel pre-fetch queries + tool calls + synthesis is ~5× more expensive than Haiku for those steps, with no quality benefit — tool responses are structured data, not prose. Latency: the agentic loop fires up to 10 LLM calls with tool round-trips; Haiku's lower inference latency meaningfully reduces end-to-end pipeline time. Hallucination surface: Sonnet's broader knowledge sometimes works against it for the synthesis pass — it may blend training knowledge with tool results. Haiku at temperature=0 is more literal, which is exactly what synthesis needs.

Why did you move social formatting from Haiku to Sonnet?

Haiku produced content that failed on two fronts: it used unfamiliar attraction terminology (calling "TRON Lightcycle / Run" just "TRON" in some contexts and something unrecognizable in others), and it wrote generic travel advice that read as AI-generated. Two specific failure patterns: "head to X if you're around" (no Disney guest "heads to" a ride, they "rope-drop" it or "grab a Lightning Lane"), and "based on the information provided" appearing in captions. Sonnet understands the Disney World vernacular — what "rope-drop" means, what a "Lightning Lane Multi Pass booking" is vs. "Individual Lightning Lane", and the urgency framing that Disney social content requires.

How does the Cosmos prompt layer prevent stale prompts after a deploy?

deploy.sh upserts pipeline_configs documents at deploy time, which resets prompt_source to "hardcoded" and clears any Cosmos-stored prompt. The correct operations sequence is therefore deploy.sh first, then Cosmos prompt updates pushed via the API. If prompt updates go first, deploy.sh overwrites them and the next scheduled run uses the hardcoded fallback, which is surfaced in the pipeline log by the message "Cosmos prompt unavailable — no hardcoded fallback exists". This is documented as Rule 6 in the deployment runbook.

How does temperature=0 enforce format compliance?

The generation pipeline parses responses by exact section markers (---FACEBOOK---, ---INSTAGRAM---, IMAGE_KEY:, HEADLINE:, etc.). Any deviation — extra preamble text, wrong marker spelling, field reordering — breaks the parser and produces an empty ContentArtifact. At temperature=0 with explicit format instructions in the system prompt, Claude's first response follows the schema without retries. The previous Bedrock Agent runtime approach at higher temperatures required multi-turn format correction loops. temperature=0 made the retry logic unnecessary — it was removed entirely.
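To show why a stray preamble or misspelled marker yields an empty artifact, here is a minimal marker-based parser. It is a sketch under assumptions: the function names are hypothetical, markers are taken to sit alone on a line, and multi-line field values are not handled.

```python
import re

# A marker is a line consisting only of ---NAME--- (upper-case letters or underscores).
SECTION_MARKER = re.compile(r"^---([A-Z_]+)---\s*$", re.MULTILINE)


def split_sections(response: str) -> dict[str, str]:
    """Split a model response into {section name: body} using ---NAME--- markers."""
    sections = {}
    matches = list(SECTION_MARKER.finditer(response))
    for i, match in enumerate(matches):
        # A section runs until the next marker, or to the end of the response.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(response)
        sections[match.group(1).lower()] = response[match.end():end].strip()
    return sections


def parse_fields(body: str) -> dict[str, str]:
    """Read single-line 'KEY: value' pairs such as IMAGE_KEY: or HEADLINE:."""
    fields = {}
    for line in body.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().isupper():
            fields[key.strip()] = value.strip()
    return fields
```

A response with no recognizable marker returns an empty dict from `split_sections`, which corresponds to the empty ContentArtifact failure mode described above.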

How are model IDs managed across environments?

Three env vars govern model selection: CLAUDE_MODEL_ID (Pass 1, defaults to Haiku 4.5), CLAUDE_MODEL_ID_BLOG (Pass 2 blog, defaults to Sonnet 4.5), CLAUDE_MODEL_ID_SOCIAL (Pass 2 social, defaults to Sonnet 4.5). Changing any of these via az functionapp config appsettings set + restart swaps the model with no code change. This lets A/B testing happen against live production traffic: one day Sonnet on social, next day Haiku, compare platform engagement metrics directly. Model IDs use the cross-region inference profile format (us.anthropic.claude-*) for failover across AWS regions.
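The env-var override could be resolved with a lookup along these lines. The three variable names come from the text; the pass labels, the function name, and the default strings are assumptions, and the defaults are deliberately placeholders rather than real Bedrock model IDs.

```python
import os

# Env var names come from the text; the default IDs are placeholders, not real model IDs.
MODEL_ENV_DEFAULTS = {
    "agentic": ("CLAUDE_MODEL_ID", "us.anthropic.claude-haiku-PLACEHOLDER"),
    "blog": ("CLAUDE_MODEL_ID_BLOG", "us.anthropic.claude-sonnet-PLACEHOLDER"),
    "social": ("CLAUDE_MODEL_ID_SOCIAL", "us.anthropic.claude-sonnet-PLACEHOLDER"),
}


def model_id_for(pass_name: str) -> str:
    """Resolve the model ID for a pass, preferring the environment over the default."""
    env_var, default = MODEL_ENV_DEFAULTS[pass_name]
    # An unset or empty variable falls back to the default.
    return os.environ.get(env_var) or default
```

Reading the variable at call time rather than at import time means a changed app setting takes effect on the next invocation after the restart, with no code change.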
