Prototype · Azure OpenAI Assistants API · GPT-4o-mini · IaC

Park Whisperer
Azure OpenAI Assistants API

An earlier architecture for the Park Whisperer conversational agent, built on the Azure OpenAI Assistants API: a stateful model with a persistent assistant object and server-managed threads and runs, distinct from the chat-completions-with-tools approach used in production. The assistant exposes 16 function tools to GPT-4o-mini, including vector search over a Cosmos DB knowledge base, and its entire configuration is managed as Infrastructure as Code and pushed by a Python deployment script.

This Version: Assistants API
API: client.beta.assistants
State model: Persistent; Thread objects stored server-side
Tool invocation: Run loop (submit → poll → submit tool output → poll)
Assistant config: Persistent object (asst_UEzQ...), updated via API
Deployment: update-assistant.py (IaC push to Azure OpenAI)
Tool count: 16 function tools
vs
Production: Chat Completions + Tools
API: client.chat.completions.create
State model: Stateless; history passed in the messages array on every call
Tool invocation: Agentic loop in Azure Function code; streaming response
Assistant config: System prompt in function code; deployed with the function app
Deployment: Standard Azure Functions deploy pipeline
Tool count: 15+ function tools
Agent Version: v4
Function Tools: 16
Knowledge Types: 9
System Prompt Lines: 285
Model: GPT-4o-mini · Azure

Threads, Runs, and Tool Output Submission

The Assistants API uses a fundamentally different execution model from chat completions. The assistant object is persistent and server-managed. Conversations are Threads. Each LLM invocation is a Run on a thread — which may pause mid-execution, requiring the client to submit tool results and resume.

Once · Create / Update Assistant
A persistent assistant object holding the system instructions, model, and all 16 tool definitions. Updated by update-assistant.py, which pushes from agent-config.json.
beta.assistants.update()

Per Conversation · Create Thread
A Thread holds the full conversation history server-side. Threads persist between sessions, so the client stores only the thread ID, not the message history.
beta.threads.create()

Per Message · Add Message + Create Run
The user message is added to the thread, and a Run is created to invoke the model. A Run typically moves queued → in_progress → completed; when the model calls tools it pauses at requires_action and resumes once the tool outputs are submitted. A Run can also end as failed, cancelled, or expired.
beta.threads.messages.create() + beta.threads.runs.create()

If Tools Needed · Poll + Submit Tool Outputs
The client polls the Run until it reaches requires_action with a submit_tool_outputs action, executes the requested function calls, then submits all results in a single request to resume the Run.
beta.threads.runs.submit_tool_outputs()

Final · Retrieve Response
Once the Run reaches completed, the client retrieves the assistant's messages from the thread. The full exchange stays stored server-side in the Thread.
beta.threads.messages.list()
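The five steps above can be driven by one function per user turn. This is a minimal sketch, assuming the openai Python SDK v1.x (`client.beta.threads...`) and an already-created thread and assistant; the `handlers` dict that maps tool names to Python callables is hypothetical and stands in for the Router Bot HTTP calls. It polls on a fixed interval and has no overall timeout, both of which a real client would want to tune.

```python
import json
import time
from typing import Any, Callable, Dict

# Run statuses that end the polling loop.
TERMINAL_STATES = {"completed", "failed", "cancelled", "expired", "incomplete"}


def run_turn(client, thread_id: str, assistant_id: str, user_text: str,
             handlers: Dict[str, Callable[..., Any]], poll_seconds: float = 1.0) -> str:
    """Add a user message, drive one Run to a terminal state, return the reply text."""
    client.beta.threads.messages.create(thread_id=thread_id, role="user", content=user_text)
    run = client.beta.threads.runs.create(thread_id=thread_id, assistant_id=assistant_id)

    while run.status not in TERMINAL_STATES:
        if run.status == "requires_action":
            # The Run is paused: execute every requested function call locally.
            outputs = []
            for call in run.required_action.submit_tool_outputs.tool_calls:
                args = json.loads(call.function.arguments or "{}")
                result = handlers[call.function.name](**args)
                outputs.append({"tool_call_id": call.id, "output": json.dumps(result)})
            # All outputs go back in one request, which resumes the Run.
            run = client.beta.threads.runs.submit_tool_outputs(
                thread_id=thread_id, run_id=run.id, tool_outputs=outputs)
        else:
            # queued / in_progress: wait, then poll (one HTTP round-trip per poll).
            time.sleep(poll_seconds)
            run = client.beta.threads.runs.retrieve(thread_id=thread_id, run_id=run.id)

    if run.status != "completed":
        raise RuntimeError(f"Run ended with status {run.status}")

    # Newest message first: the assistant's reply to this turn is at the top.
    page = client.beta.threads.messages.list(thread_id=thread_id, order="desc", limit=1)
    return "".join(p.text.value for p in page.data[0].content if p.type == "text")
```

Because only the thread ID is needed between turns, the caller can persist it and call run_turn again for the next message; the history never leaves the server.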

Agent Defined in JSON, Deployed via Python Script

All agent configuration — system prompt, model selection, and all 16 tool definitions — lives in version-controlled JSON files. update-assistant.py reads the config and pushes it to the Azure OpenAI service, treating the assistant configuration as deployable infrastructure.

update-assistant.py — Push config to Azure OpenAI
```python
import json
import os

from openai import AzureOpenAI

# Load config from agent-config.json
with open('agent-config.json') as f:
    config = json.load(f)

client = AzureOpenAI(
    azure_endpoint="https://parkwhisperer-dev-cz4vkm.cognitiveservices.azure.com/",
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version="2024-05-01-preview",
)

# Push all configuration in one call: instructions + tools + model
updated = client.beta.assistants.update(
    assistant_id="asst_UEzQNn0e8YKninGZhi1kGstr",
    instructions=config['instructions'],  # 285-line system prompt
    tools=config['tools'],                # 16 function definitions
    name=config['name'],
    model=config['model'],                # gpt-4o-mini
)

# Snapshot deployed state back to agent-assistant-config.json
snapshot = {
    "assistant_id": updated.id,
    "tools_count": len(updated.tools),
    "updated_at": "2025-12-14",
}
with open('agent-assistant-config.json', 'w') as f:
    json.dump(snapshot, f, indent=2)
```
agent-config.json
  • Source of truth: name, description, model, instructions, tools array
  • System instructions: 285 lines covering data source rules, multi-day query patterns, knowledge base search types, tool combinations, and response format
  • Rule: ONLY use function tool responses; NEVER fabricate data
  • Rule: multi-day queries call get_downtime_summary once per day and aggregate the results
agent-functions.json
  • Canonical function definitions with JSON Schema parameters
  • Metadata block: endpoint URL map, update frequency, date format rules, operational date boundary (3 AM Eastern)
  • 16 functions with descriptions written for LLM tool selection, not human documentation
  • All data-returning endpoints include optional entity_id and date filters
update-assistant.py
  • One-command deploy: reads config, calls beta.assistants.update(), snapshots the result
  • Prints all 16 function names before updating, as a visual confirmation of what is being pushed
  • Saves the updated state to agent-assistant-config.json for drift detection
  • API version: 2024-05-01-preview (the Assistants API was still in beta)
agent.html
  • Static HTML chat interface for testing the assistant directly in the browser
  • Space Mono monospace font, dark slate theme (Park Whisperer visual identity)
  • Calls the Azure OpenAI Assistants API from browser JS (API key supplied via agent_config.js)

16 Function Tools Across Six Categories

Function | Category | Router Bot Endpoint | Description
get_current_wait_times | Live | /v1/currentWaits | Real-time attraction wait times, updated every 10 minutes. Optional entity_id filter.
get_current_down_attractions | Live | /v1/currentDown | Attractions experiencing downtime. Critical for plan-change guidance.
get_park_schedule | Live | /v1/dailySchedules | Park operating hours for planning arrival and departure.
get_lightning_lane_availability | Lightning Lane | /v1/lightningLane | LL return time availability. Filters: queue_type (PAID_RETURN_TIME | RETURN_TIME), sold_out_only.
get_top_wait_times | Ranked | /v1/topTenWaits | Ranked longest waits over daily/weekly/monthly periods.
get_top_restaurant_waits | Ranked | /v1/topTenRestaurants | Ranked restaurant wait times for dining crowd avoidance.
get_sellout_events | Lightning Lane | /v1/selloutEvents | LL return time sellout events. Shows which slots are no longer available.
get_top_sellouts | Lightning Lane | /v1/topSellouts | Attractions ranked by sellout speed. Indicates the most in-demand LL windows.
get_historical_wait_patterns | History | /v1/waitSnapshots | Wait time history for a specific attraction and date. Required params: entityName, date.
get_downtime_summary | History | /v1/downSummary | Daily downtime summary. Multi-day queries call once per date and aggregate.
get_park_advisories | Live | /v1/advisories | Current park advisories, alerts, and special event announcements. No parameters.
get_purchase_options | Lightning Lane | /v1/purchaseOptions | Individual LL and Premier Pass pricing and availability.
get_park_shows | Live | /v1/parkShows | Show schedules and entertainment listings.
get_release_events | Lightning Lane | /v1/releaseEvents | LL release timing events: when return times become available.
get_park_entities | Reference | /v1/parkEntities | Park entity reference data: IDs, names, and types for all attractions and parks.
search_park_knowledge | Vector | /v1/searchKnowledge | Cosmos DB vector search over the park_knowledge container. Required param: query. Optional: type, limit (default 3).
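A function tool in the Assistants `tools` array pairs a name and description (read by the model) with a JSON Schema for its arguments. The sketch below shows what the search_park_knowledge entry could look like based on its table row (required query; optional type and limit, default 3), plus a helper that turns a tool call into a Router Bot request. The base URL, the description wording, and passing arguments as GET query parameters are all assumptions; only four endpoints from the table are mapped, to keep it short.

```python
from typing import Any, Dict
from urllib.parse import urlencode

# Hypothetical host: the real Router Bot base URL is not given in this write-up.
ROUTER_BOT_BASE = "https://router-bot.example.com"

# One entry of the Assistants `tools` array, in function-tool format.
SEARCH_PARK_KNOWLEDGE = {
    "type": "function",
    "function": {
        "name": "search_park_knowledge",
        "description": "Search park strategy knowledge (advice, tips, recommendations). "
                       "Use for advice questions, not live operational data.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Natural-language search text"},
                "type": {"type": "string", "description": "Optional knowledge type filter"},
                "limit": {"type": "integer", "minimum": 1, "default": 3},
            },
            "required": ["query"],
        },
    },
}

# Subset of the function -> endpoint map from the table above.
ENDPOINTS = {
    "get_current_wait_times": "/v1/currentWaits",
    "get_current_down_attractions": "/v1/currentDown",
    "get_downtime_summary": "/v1/downSummary",
    "search_park_knowledge": "/v1/searchKnowledge",
}


def build_request_url(function_name: str, args: Dict[str, Any]) -> str:
    """Turn a model tool call into a GET URL, dropping arguments set to None."""
    path = ENDPOINTS[function_name]
    query = {k: v for k, v in args.items() if v is not None}
    return f"{ROUTER_BOT_BASE}{path}" + (f"?{urlencode(query)}" if query else "")
```

Keeping the name-to-endpoint map in data (as the metadata block in agent-functions.json does) means adding a tool is a config change rather than new dispatch code.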

9 Knowledge Types in the Cosmos DB Vector Store

The search_park_knowledge tool queries a Cosmos DB container with vector embeddings. The agent is instructed to use this tool when guests ask for advice, strategies, tips, or recommendations — as opposed to live operational data. Nine semantic knowledge types are defined and searchable by type filter.

wait_time_strategy
When and how to approach each attraction for minimum wait
reliability_info
Historical downtime patterns and reliability risk per attraction
popularity_ranking
Relative crowd demand and peak periods by attraction
low_wait_gem
Hidden or underrated attractions with consistently short waits
next_best_action
Context-aware recommendations: what to do next given current conditions
sellout_risk_windows
Time windows when LL return times historically sell out fastest
dining_pressure_windows
Peak dining congestion times and avoidance strategies
lightning_lane_volatility
Which LL return times fluctuate most and when to grab them
park_whisperer_magic
Expert insider tips and non-obvious strategies for maximizing the visit
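A type-filtered vector search against the park_knowledge container could be expressed as a parameterized Cosmos DB for NoSQL query using VectorDistance. This is a sketch under stated assumptions: the document fields (c.type, c.content, c.embedding) are guesses at the container's schema, and the query embedding is assumed to come from an embeddings deployment that is not shown here.

```python
from typing import Any, Dict, List, Optional, Tuple

# The nine knowledge types listed above.
KNOWLEDGE_TYPES = {
    "wait_time_strategy", "reliability_info", "popularity_ranking",
    "low_wait_gem", "next_best_action", "sellout_risk_windows",
    "dining_pressure_windows", "lightning_lane_volatility", "park_whisperer_magic",
}


def build_knowledge_query(embedding: List[float], type_filter: Optional[str] = None,
                          limit: int = 3) -> Tuple[str, List[Dict[str, Any]]]:
    """Build a vector-search query and its parameters for the park_knowledge container.

    Field names c.type, c.content, c.embedding are assumptions about the documents.
    """
    if type_filter is not None and type_filter not in KNOWLEDGE_TYPES:
        raise ValueError(f"unknown knowledge type: {type_filter}")
    where = "WHERE c.type = @type " if type_filter else ""
    # ORDER BY VectorDistance returns the most similar documents first.
    query = (
        "SELECT TOP @limit c.id, c.type, c.content, "
        "VectorDistance(c.embedding, @embedding) AS score "
        f"FROM c {where}"
        "ORDER BY VectorDistance(c.embedding, @embedding)"
    )
    params = [{"name": "@limit", "value": limit}, {"name": "@embedding", "value": embedding}]
    if type_filter:
        params.append({"name": "@type", "value": type_filter})
    return query, params
```

With the azure-cosmos SDK, the pair could be passed as `container.query_items(query=query, parameters=params, enable_cross_partition_query=True)`. Validating the type against the nine known values keeps a model-invented type name from silently returning zero results.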

What the Assistants API Got Right — and Why Production Moved Away

The core advantage: server-managed thread state

The most compelling property of the Assistants API is that the conversation history lives server-side in Thread objects. The client only needs to store a thread_id — not reconstruct or send the full message history on every turn. For a multi-turn park planning conversation that accumulates dozens of messages and multiple tool call results, this matters: passing the full history in the messages array on every call adds tokens and latency proportional to conversation length.
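To put numbers on the stateless cost: if each turn adds roughly t tokens of history (user message, reply, and tool results), call k carries about k·t tokens, and n turns transmit t·n(n+1)/2 tokens in total. The 300-tokens-per-turn figure below is an assumption for illustration, not a measurement from Park Whisperer.

```python
def history_tokens_for_call(turn: int, tokens_per_turn: int) -> int:
    """History carried by the `turn`-th call when every call resends all prior turns."""
    return turn * tokens_per_turn


def cumulative_history_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total history sent over `turns` calls: t * (1 + 2 + ... + n) = t * n * (n + 1) / 2."""
    return tokens_per_turn * turns * (turns + 1) // 2


# Assumed 300 tokens per turn, 20-turn planning session.
print(history_tokens_for_call(20, 300))    # 6000
print(cumulative_history_tokens(20, 300))  # 63000
```

Per-call size grows linearly with the session, so the cumulative payload grows quadratically; that is the growth the Thread model keeps off the client.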

In practice, the Thread-based model works well for long-running planning sessions where a guest asks follow-up questions across many turns. The server-side state also means if the client drops the connection mid-turn, the Run can continue and the result is retrievable later — a resilience property that the stateless chat completions model doesn't have.

The polling requirement: why it complicates real-time UX

The Assistants API Run lifecycle requires polling: create a Run, poll until requires_action or completed, execute tools, submit outputs, poll again. Each poll is an HTTP round-trip. A conversation turn requiring three tool calls involves: create run → poll → submit three tool outputs → poll → retrieve messages. This polling loop adds latency and complexity compared to streaming chat completions where the tool call response arrives in the stream and the agentic loop runs in the same function execution.

The production system uses streaming chat completions inside an Azure Function, which allows token-by-token streaming to the client — users see the response building in real time. The Assistants API (at beta/2024-05-01-preview) did not support streaming for runs with tool calls, which made the UX feel slower even when total latency was similar.
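The production code is not shown here, but the core of streaming with tool use is reassembling tool calls from deltas: each chunk may carry a text token or a fragment of a tool call's JSON arguments, keyed by the call's index. A minimal sketch, assuming the openai Python SDK v1.x chunk shape yielded by `client.chat.completions.create(..., tools=..., stream=True)`:

```python
from typing import Any, Dict, Iterable, List, Tuple


def collect_stream(chunks: Iterable[Any]) -> Tuple[str, List[Dict[str, str]]]:
    """Accumulate text and tool-call fragments from a streamed chat completion."""
    text_parts: List[str] = []
    calls: Dict[int, Dict[str, str]] = {}
    for chunk in chunks:
        if not chunk.choices:
            # Azure can emit chunks with no choices (e.g. prompt filter results).
            continue
        delta = chunk.choices[0].delta
        if delta.content:
            # A real handler would forward this token to the client immediately.
            text_parts.append(delta.content)
        for tc in delta.tool_calls or []:
            slot = calls.setdefault(tc.index, {"id": "", "name": "", "arguments": ""})
            if tc.id:
                slot["id"] = tc.id
            if tc.function is not None:
                if tc.function.name:
                    slot["name"] = tc.function.name
                if tc.function.arguments:
                    # Arguments arrive as JSON text split across chunks.
                    slot["arguments"] += tc.function.arguments
    return "".join(text_parts), [calls[i] for i in sorted(calls)]
```

When the returned list is non-empty, the function would execute those tools, append the assistant's tool-call message and the tool results to messages, and call the API again; when it is empty, the streamed text was the final answer.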

16 tools vs 15+: what was added and why

get_park_shows and get_release_events were present in this v4 Assistants version but not in early production iterations. Show schedules are straightforward: the agent can tell a guest when Fantasmic plays tonight. The release events tool tracks when Lightning Lane return times become available throughout the day, which is a more advanced use case: guests asking "when should I try to book TRON Lightning Lane?" can get data-driven timing recommendations rather than generic advice.

The 16th tool, get_park_entities, provides the reference entity data (IDs, names, types) that other tools use for filtering. In practice, the agent rarely needs to call this explicitly — it uses entity IDs from the system context — but having it as a callable tool means the agent can resolve an ambiguous attraction name to an ID before calling other tools. This resolved a class of "attraction not found" errors when guests used informal names.
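In this version the model itself does the resolution: it calls get_park_entities and picks the matching ID from the result. A deterministic helper is one hypothetical alternative (for example, inside a tool endpoint). The sketch below assumes entity records shaped as {"id", "name"}; returning None for ambiguous names is a design choice so the agent can ask the guest a follow-up question instead of guessing.

```python
import difflib
from typing import Dict, List, Optional


def resolve_entity(name: str, entities: List[Dict[str, str]]) -> Optional[str]:
    """Map an informal attraction name to an entity ID, or None if not confident.

    `entities` is assumed to be a list of {"id": ..., "name": ...} records,
    such as a cached /v1/parkEntities response.
    """
    query = name.strip().lower()
    by_name = {e["name"].lower(): e["id"] for e in entities}

    # 1) Exact, case-insensitive match.
    if query in by_name:
        return by_name[query]

    # 2) Substring match: a short nickname inside a longer official name.
    hits = [eid for ename, eid in by_name.items() if query in ename]
    if len(hits) == 1:
        return hits[0]
    if len(hits) > 1:
        return None  # ambiguous: let the agent ask which one the guest meant

    # 3) Fuzzy match for typos, using difflib's similarity ratio.
    close = difflib.get_close_matches(query, list(by_name), n=1, cutoff=0.6)
    return by_name[close[0]] if close else None
```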

IaC config management: treating the assistant as deployable infrastructure

Separating the agent configuration into version-controlled JSON files (agent-config.json, agent-functions.json) and deploying via script is the most durable practice from this version. It meant that changes to the system prompt or tool definitions were tracked in git, reviewable as diffs, and reproducible — rather than being made directly in the Azure portal where changes leave no audit trail.

The agent-assistant-config.json snapshot pattern (reading back the deployed state after each update) is analogous to a Terraform state file: it records what was actually deployed, enabling drift detection between what is in agent-config.json and what the Azure OpenAI service holds. This pattern carried forward into the production deployment pipeline, where function app settings are read back and validated after each deploy.
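The snapshot written by update-assistant.py records only the assistant ID, tool count, and date, so a fuller drift check needs the live assistant. A sketch, assuming the deployed state is read back with `client.beta.assistants.retrieve(assistant_id).model_dump()`; detect_drift is a hypothetical helper that compares name, model, instructions, and tool names (not full tool schemas).

```python
from typing import Any, Dict, List, Set


def tool_names(tools: List[Dict[str, Any]]) -> Set[str]:
    """Names of the function tools in an Assistants-style tools array."""
    return {t["function"]["name"] for t in tools if t.get("type") == "function"}


def detect_drift(config: Dict[str, Any], deployed: Dict[str, Any]) -> List[str]:
    """Compare agent-config.json with the deployed assistant; return readable diffs."""
    drift = []
    for key in ("name", "model", "instructions"):
        if config.get(key) != deployed.get(key):
            drift.append(f"{key} differs")
    want = tool_names(config["tools"])
    have = tool_names(deployed.get("tools", []))
    for missing in sorted(want - have):
        drift.append(f"tool missing from deployment: {missing}")
    for extra in sorted(have - want):
        drift.append(f"tool only in deployment: {extra}")
    return drift
```

An empty list means the service matches git; anything else is a change made outside the script (for example in the portal) or a failed push.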

Why production migrated from Assistants API to Chat Completions + Tools

Three concrete reasons drove the migration to the stateless chat completions approach for production:

  • Streaming — chat completions support token streaming with tool use; the beta Assistants API did not. Users expect streaming in modern chat interfaces.
  • Control — the agentic tool-call loop running inside an Azure Function is fully observable and debuggable. The Assistants API's Run lifecycle is opaque: if a Run fails or stalls, the error is often a generic timeout rather than a specific exception from the tool execution code.
  • Thread management overhead — server-side Thread objects accumulate and require cleanup. For a high-traffic public product, managing Thread lifecycle (creation, expiry, deletion) adds operational burden that the stateless model avoids entirely. Each conversation in the production system is self-contained in the client-sent messages array.
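The Thread lifecycle burden is easy to see in code. As far as the API surface goes, threads can be created, retrieved, modified, and deleted, but there is no call to list them, so the application has to keep its own registry of thread IDs and last-activity times and delete stale ones itself. A minimal sketch, with a hypothetical in-memory registry and TTL:

```python
import time
from typing import Dict, List, Optional


def expire_threads(client, last_used: Dict[str, float], ttl_seconds: float,
                   now: Optional[float] = None) -> List[str]:
    """Delete threads idle longer than ttl_seconds and return their IDs.

    last_used maps thread_id -> last-activity epoch seconds; the app must
    maintain this registry itself (here a plain dict, in practice a database).
    """
    now = time.time() if now is None else now
    stale = [tid for tid, ts in last_used.items() if now - ts > ttl_seconds]
    for tid in stale:
        client.beta.threads.delete(thread_id=tid)  # one API call per thread
        del last_used[tid]
    return stale
```

In a deployed service this would run on a schedule against a persistent store; the stateless chat completions design needs none of it.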

The trade-off: the production system must send the full conversation history on every turn, which uses more tokens for long sessions. But the streaming UX improvement and reduced operational complexity were judged worth that cost.
