Prototype · Azure OpenAI Assistants API · GPT-4o-mini · IaC

Park Whisperer
Azure OpenAI Assistants API

An earlier architecture for the Park Whisperer conversational agent, built on the Azure OpenAI Assistants API: a stateful, persistent assistant model with managed threads and runs, distinct from the chat-completions-with-tools approach used in production. GPT-4o-mini drives 16 function tools, including vector search over a Cosmos DB knowledge base, and the whole configuration is managed as Infrastructure as Code and pushed by a Python deployment script.

Production Successor: Park Agent Chat

Aspect	This Version (Assistants API)	Production (Chat Completions + Tools)
API	client.beta.assistants	client.chat.completions.create
State model	Persistent: Thread objects stored server-side	Stateless: history passed in messages array per call
Tool invocation	Run loop: submit → poll → submit tool output → poll	Agentic loop in Azure Function code; streaming response
Assistant config	Persistent object (asst_UEzQ...), updated via API	System prompt in function code; deployed with function app
Deployment	update-assistant.py: IaC push to Azure OpenAI	Standard Azure Functions deploy pipeline
Tool count	16 function tools	15+ function tools

Agent Version: v4

Function Tools: 16

Knowledge Types: 9

System Prompt Lines: 285

Model: GPT-4o-mini · Azure

The Assistants API Model

Threads, Runs, and Tool Output Submission

The Assistants API uses a fundamentally different execution model from chat completions. The assistant object is persistent and server-managed. Conversations are Threads. Each LLM invocation is a Run on a thread — which may pause mid-execution, requiring the client to submit tool results and resume.

Once

Create / Update Assistant

Persistent assistant object with system instructions, model, and all 16 tool definitions. Updated via update-assistant.py pushing from agent-config.json.

beta.assistants.update()

Per Conversation

Create Thread

A Thread holds the full conversation history server-side. Threads persist between sessions — the client only stores the thread ID, not the message history.

beta.threads.create()

Per Message

Add Message + Create Run

User message added to thread. A Run is created to invoke the model. The Run transitions through queued → in_progress → completed, pausing at requires_action whenever the model requests tool calls.

beta.threads.runs.create()

If Tools Needed

Poll + Submit Tool Outputs

Run pauses at requires_action with a submit_tool_outputs action. Client executes the function calls, then submits results to resume the Run.

beta.threads.runs.submit_tool_outputs()

Final

Retrieve Response

Once Run reaches completed, client retrieves the assistant's messages from the thread. The full exchange is stored server-side in the Thread.

beta.threads.messages.list()
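The per-message steps above can be sketched as a single helper. This is a minimal illustration rather than the project's actual client code: `run_turn`, `tool_handlers`, `poll_interval`, and `max_polls` are hypothetical names, and `client` is assumed to be an `AzureOpenAI` instance (or anything exposing the same `beta.threads` methods).

```python
import json
import time


def run_turn(client, assistant_id, thread_id, user_text, tool_handlers,
             poll_interval=0.5, max_polls=120):
    """Add a user message, run the assistant, service tool calls, return the reply text.

    tool_handlers maps a function name to a Python callable (hypothetical wiring).
    """
    client.beta.threads.messages.create(
        thread_id=thread_id, role="user", content=user_text
    )
    run = client.beta.threads.runs.create(
        thread_id=thread_id, assistant_id=assistant_id
    )

    polls = 0
    while run.status != "completed":
        if run.status == "requires_action":
            # The Run is paused: execute each requested function locally.
            outputs = []
            for call in run.required_action.submit_tool_outputs.tool_calls:
                handler = tool_handlers[call.function.name]
                args = json.loads(call.function.arguments or "{}")
                outputs.append(
                    {"tool_call_id": call.id, "output": json.dumps(handler(**args))}
                )
            # Submitting the outputs resumes the Run.
            run = client.beta.threads.runs.submit_tool_outputs(
                thread_id=thread_id, run_id=run.id, tool_outputs=outputs
            )
        elif run.status in ("failed", "cancelled", "expired"):
            raise RuntimeError(f"Run ended with status {run.status}")
        else:
            # queued / in_progress: wait, then poll again.
            if polls >= max_polls:
                raise TimeoutError("Run did not complete in time")
            polls += 1
            time.sleep(poll_interval)
            run = client.beta.threads.runs.retrieve(
                thread_id=thread_id, run_id=run.id
            )

    # Newest message first; the assistant's reply is the latest entry.
    messages = client.beta.threads.messages.list(
        thread_id=thread_id, order="desc", limit=1
    )
    return messages.data[0].content[0].text.value
```

Every pass through the `else` branch is one extra HTTP round-trip, which is where the latency of this model comes from.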

Configuration as Code

Agent Defined in JSON, Deployed via Python Script

All agent configuration — system prompt, model selection, and all 16 tool definitions — lives in version-controlled JSON files. update-assistant.py reads the config and pushes it to the Azure OpenAI service, treating the assistant configuration as deployable infrastructure.

update-assistant.py — Push config to Azure OpenAI

import json
import os

from openai import AzureOpenAI

# Load config from agent-config.json
with open('agent-config.json') as f:
    config = json.load(f)

client = AzureOpenAI(
    azure_endpoint="https://parkwhisperer-dev-cz4vkm.cognitiveservices.azure.com/",
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version="2024-05-01-preview"
)

# Push all configuration in one call: instructions + tools + model
updated = client.beta.assistants.update(
    assistant_id="asst_UEzQNn0e8YKninGZhi1kGstr",
    instructions=config['instructions'],   # 285-line system prompt
    tools=config['tools'],                 # 16 function definitions
    name=config['name'],
    model=config['model']                  # gpt-4o-mini
)

# Snapshot deployed state back to agent-assistant-config.json
snapshot = {
    "assistant_id": updated.id,
    "tools_count": len(updated.tools),
    "updated_at": "2025-12-14"             # date is hard-coded in this excerpt
}
# Use a context manager so the snapshot file is flushed and closed
with open('agent-assistant-config.json', 'w') as f:
    json.dump(snapshot, f)

agent-config.json

Source of truth: name, description, model, instructions, tools array
System instructions: 285 lines covering data source rules, multi-day query patterns, knowledge base search types, tool combinations, response format
Rule: ONLY use function tool responses; NEVER fabricate data
Rule: Multi-day queries call get_downtime_summary once per day, then aggregate results
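The multi-day rule can be illustrated with a short sketch. In the prototype the model itself issues one tool call per day; the code below only shows the equivalent aggregation logic. `fetch_summary` stands in for a get_downtime_summary call, and the row fields `entity_name` and `downtime_minutes` are assumed for illustration, since the real response shape is not shown here.

```python
from datetime import date, timedelta


def daterange(start, end):
    """Yield each date from start to end, inclusive."""
    for offset in range((end - start).days + 1):
        yield start + timedelta(days=offset)


def aggregate_downtime(fetch_summary, start, end):
    """Call fetch_summary once per day and total downtime minutes per attraction.

    fetch_summary(day_iso) is a stand-in for get_downtime_summary; it is
    assumed to return rows like {"entity_name": ..., "downtime_minutes": ...}.
    """
    totals = {}
    for day in daterange(start, end):
        for row in fetch_summary(day.isoformat()):
            name = row["entity_name"]
            totals[name] = totals.get(name, 0) + row["downtime_minutes"]
    return totals
```

A three-day question therefore produces three calls, one per date, and a single merged answer.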

agent-functions.json

Canonical function definitions with JSON Schema parameters
Metadata block: endpoint URL map, update frequency, date format rules, operational date boundary (Eastern 3 AM)
16 functions with descriptions written for LLM tool selection rather than as human documentation
All data-returning endpoints include optional entity_id and date filters
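The operational date boundary is the kind of rule that is easy to get wrong, so a small sketch may help. It assumes that timestamps before 3 AM Eastern belong to the previous operating day and that dates are formatted as ISO `YYYY-MM-DD`; both are assumptions, since the metadata block itself is not reproduced here. The names `operational_date` and `ROLLOVER_HOUR` are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

EASTERN = ZoneInfo("America/New_York")
ROLLOVER_HOUR = 3  # assumed: the operating day rolls over at 3 AM Eastern


def operational_date(moment):
    """Map a timezone-aware datetime to its operating date (ISO string).

    Anything before 3 AM Eastern is attributed to the previous calendar day.
    """
    local = moment.astimezone(EASTERN)
    if local.hour < ROLLOVER_HOUR:
        local -= timedelta(days=1)
    return local.date().isoformat()


# 06:30 UTC on 5 July is 02:30 Eastern, so it still counts as 4 July.
late_night = datetime(2025, 7, 5, 6, 30, tzinfo=timezone.utc)
print(operational_date(late_night))
```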

update-assistant.py

One-command deploy: reads config, calls beta.assistants.update(), snapshots result
Prints all 16 function names before updating, as visual confirmation of what's being pushed
Saves updated state to agent-assistant-config.json for drift detection
API version: 2024-05-01-preview (the Assistants API was still in beta)

agent.html

Static HTML chat interface for testing the assistant directly in the browser
Space Mono monospace font and dark slate theme: the Park Whisperer visual identity
Calls the Azure OpenAI Assistants API from browser JS (API key via agent_config.js)

Tool Definitions

16 Function Tools Across Six Categories

Function	Category	Router Bot Endpoint	Description
get_current_wait_times	Live	/v1/currentWaits	Real-time attraction wait times, updated every 10 minutes. Optional entity_id filter.
get_current_down_attractions	Live	/v1/currentDown	Attractions experiencing downtime. Critical for plan-change guidance.
get_park_schedule	Live	/v1/dailySchedules	Park operating hours for planning arrival and departure.
get_lightning_lane_availability	Lightning Lane	/v1/lightningLane	LL return time availability. Filters: queue_type (PAID_RETURN_TIME | RETURN_TIME), sold_out_only.
get_top_wait_times	Ranked	/v1/topTenWaits	Ranked longest waits over daily/weekly/monthly periods.
get_top_restaurant_waits	Ranked	/v1/topTenRestaurants	Ranked restaurant wait times for dining crowd avoidance.
get_sellout_events	Lightning Lane	/v1/selloutEvents	LL return time sellout events. Shows which slots are no longer available.
get_top_sellouts	Lightning Lane	/v1/topSellouts	Attractions ranked by sellout speed. Indicates most-in-demand LL windows.
get_historical_wait_patterns	History	/v1/waitSnapshots	Wait time history for a specific attraction + date. Required params: entityName, date.
get_downtime_summary	History	/v1/downSummary	Daily downtime summary. Multi-day queries call once per date and aggregate.
get_park_advisories	Live	/v1/advisories	Current park advisories, alerts, and special event announcements. No parameters.
get_purchase_options	Lightning Lane	/v1/purchaseOptions	Individual LL and Premier Pass pricing and availability.
get_park_shows	Live	/v1/parkShows	Show schedules and entertainment listings.
get_release_events	Lightning Lane	/v1/releaseEvents	LL release timing events — when return times become available.
get_park_entities	Reference	/v1/parkEntities	Park entity reference data — IDs, names, types for all attractions and parks.
search_park_knowledge	Vector	/v1/searchKnowledge	Cosmos DB vector search over `park_knowledge` container. Required param: query. Optional: type, limit (default 3).
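The table is effectively a routing map from function name to endpoint. A minimal sketch of that mapping follows. It assumes the endpoints accept GET requests with query-string parameters, which the table does not state, and it uses a placeholder base URL. `ENDPOINTS` lists only three of the 16 rows, and `build_tool_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Subset of the table above; the remaining rows follow the same pattern.
ENDPOINTS = {
    "get_current_wait_times": "/v1/currentWaits",
    "get_downtime_summary": "/v1/downSummary",
    "search_park_knowledge": "/v1/searchKnowledge",
}


def build_tool_url(base_url, function_name, arguments):
    """Translate a model tool call into a request URL (assumed GET + query string).

    Raises KeyError for a function name that has no endpoint mapping.
    """
    path = ENDPOINTS[function_name]
    # Optional filters the model left unset are dropped from the query.
    params = {key: value for key, value in arguments.items() if value is not None}
    url = base_url.rstrip("/") + path
    query = urlencode(params)
    return f"{url}?{query}" if query else url


print(build_tool_url(
    "https://example.invalid",  # placeholder base URL
    "get_downtime_summary",
    {"date": "2025-07-04", "entity_id": None},
))
```

Keeping the map in data rather than in branching code mirrors how agent-functions.json stores the endpoint URL map alongside the definitions.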

Design Decisions

What the Assistants API Got Right — and Why Production Moved Away

The core advantage: server-managed thread state

The most compelling property of the Assistants API is that the conversation history lives server-side in Thread objects. The client only needs to store a thread_id — not reconstruct or send the full message history on every turn. For a multi-turn park planning conversation that accumulates dozens of messages and multiple tool call results, this matters: passing the full history in the messages array on every call adds tokens and latency proportional to conversation length.

In practice, the Thread-based model works well for long-running planning sessions where a guest asks follow-up questions across many turns. The server-side state also means if the client drops the connection mid-turn, the Run can continue and the result is retrievable later — a resilience property that the stateless chat completions model doesn't have.

The polling requirement: why it complicates real-time UX

The Assistants API Run lifecycle requires polling: create a Run, poll until requires_action or completed, execute tools, submit outputs, poll again. Each poll is an HTTP round-trip. A conversation turn requiring three tool calls involves: create run → poll → submit three tool outputs → poll → retrieve messages. This polling loop adds latency and complexity compared to streaming chat completions where the tool call response arrives in the stream and the agentic loop runs in the same function execution.

The production system uses streaming chat completions inside an Azure Function, which allows token-by-token streaming to the client — users see the response building in real time. The Assistants API (at beta/2024-05-01-preview) did not support streaming for runs with tool calls, which made the UX feel slower even when total latency was similar.
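For contrast, here is a sketch of what consuming a streamed chat completion with tool calls involves. It is not the production code. It assumes chunks shaped like those yielded by `client.chat.completions.create(..., stream=True)`, where text arrives in `delta.content` and tool-call arguments arrive as string fragments keyed by `index`. `collect_stream` and `on_token` are hypothetical names.

```python
def collect_stream(chunks, on_token=None):
    """Consume streamed chat-completion chunks.

    Forwards text tokens to on_token as they arrive and reassembles
    fragmented tool calls. Returns (full_text, tool_calls).
    """
    text_parts = []
    calls = {}  # tool-call index -> partially assembled call
    for chunk in chunks:
        if not chunk.choices:
            continue  # some chunks carry no choices (metadata only)
        delta = chunk.choices[0].delta
        if delta.content:
            if on_token is not None:
                on_token(delta.content)  # stream the token to the user
            text_parts.append(delta.content)
        for fragment in delta.tool_calls or []:
            slot = calls.setdefault(
                fragment.index, {"id": None, "name": "", "arguments": ""}
            )
            if fragment.id:
                slot["id"] = fragment.id
            if fragment.function and fragment.function.name:
                slot["name"] += fragment.function.name
            if fragment.function and fragment.function.arguments:
                slot["arguments"] += fragment.function.arguments
    return "".join(text_parts), [calls[i] for i in sorted(calls)]
```

Because the tool calls are assembled inside the same function execution that streams text, no separate polling step is needed between the model's request and the tool result.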

16 tools vs 15+: what was added and why

get_park_shows and get_release_events were present in this v4 Assistants version but not in early production iterations. Show schedules are straightforward: the agent can tell a guest when Fantasmic plays tonight. The release events tool tracks when Lightning Lane return times become available throughout the day, which supports a more advanced use case: a guest asking "when should I try to book TRON Lightning Lane?" can get a data-driven timing recommendation rather than generic advice.

The 16th tool, get_park_entities, provides the reference entity data (IDs, names, types) that other tools use for filtering. In practice, the agent rarely needs to call this explicitly — it uses entity IDs from the system context — but having it as a callable tool means the agent can resolve an ambiguous attraction name to an ID before calling other tools. This resolved a class of "attraction not found" errors when guests used informal names.
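In the prototype the model performs this resolution itself, using the data returned by get_park_entities. The sketch below only illustrates the idea of mapping an informal name to an ID in ordinary code. The entity fields `id` and `name`, the helper `resolve_entity_id`, and the matching rules (exact, then unique substring, then closest match) are all assumptions made for the example.

```python
import difflib


def resolve_entity_id(query, entities, cutoff=0.6):
    """Resolve an informal attraction name to an entity ID, or None.

    entities is assumed to be a list of {"id": ..., "name": ...} dicts.
    """
    by_name = {entity["name"].lower(): entity["id"] for entity in entities}
    wanted = query.strip().lower()

    # 1. Exact match on the official name.
    if wanted in by_name:
        return by_name[wanted]

    # 2. Unique substring match, e.g. "tron" in "tron lightcycle / run".
    hits = [name for name in by_name if wanted and wanted in name]
    if len(hits) == 1:
        return by_name[hits[0]]

    # 3. Closest fuzzy match, to absorb small misspellings.
    close = difflib.get_close_matches(wanted, list(by_name), n=1, cutoff=cutoff)
    return by_name[close[0]] if close else None
```

Returning `None` instead of guessing keeps the "never fabricate data" rule intact: an unresolved name becomes a clarifying question rather than a wrong ID.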

IaC config management: treating the assistant as deployable infrastructure

Separating the agent configuration into version-controlled JSON files (agent-config.json, agent-functions.json) and deploying via script is the most durable practice from this version. It meant that changes to the system prompt or tool definitions were tracked in git, reviewable as diffs, and reproducible — rather than being made directly in the Azure portal where changes leave no audit trail.

The agent-assistant-config.json snapshot pattern (reading back the deployed state after update) is equivalent to a Terraform state file: it records what was actually deployed, allowing drift detection between what's in agent-config.json and what the Azure OpenAI service has. This pattern carried forward into the production deployment pipeline, where function app settings are read back and validated after each deploy.
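A drift check along these lines could look like the sketch below. It goes further than the snapshot described above, which records only the assistant ID, tool count, and update date. Here `deployed` is assumed to be a dict of the live assistant's fields (for example, built from the object returned by `beta.assistants.retrieve()`), and `detect_drift` and `tool_names` are hypothetical helpers.

```python
import hashlib


def tool_names(tools):
    """Sorted names of the function tools in an Assistants-style tools array."""
    return sorted(
        tool["function"]["name"] for tool in tools if tool.get("type") == "function"
    )


def detect_drift(desired, deployed):
    """Return {field: (desired_value, deployed_value)} for every field that differs."""
    drift = {}
    for field in ("name", "model"):
        if desired.get(field) != deployed.get(field):
            drift[field] = (desired.get(field), deployed.get(field))

    want = tool_names(desired.get("tools", []))
    have = tool_names(deployed.get("tools", []))
    if want != have:
        drift["tools"] = (want, have)

    # Compare long prompts by hash so the report stays readable.
    want_hash = hashlib.sha256((desired.get("instructions") or "").encode()).hexdigest()
    have_hash = hashlib.sha256((deployed.get("instructions") or "").encode()).hexdigest()
    if want_hash != have_hash:
        drift["instructions_sha256"] = (want_hash[:12], have_hash[:12])
    return drift
```

An empty result means the service matches agent-config.json; anything else points to a change made outside the script.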

Why production migrated from Assistants API to Chat Completions + Tools

Three concrete reasons drove the migration to the stateless chat completions approach for production:

Streaming — chat completions support token streaming with tool use; the beta Assistants API did not. Users expect streaming in modern chat interfaces.
Control — the agentic tool-call loop running inside an Azure Function is fully observable and debuggable. The Assistants API's Run lifecycle is opaque: if a Run fails or stalls, the error is often a generic timeout rather than a specific exception from the tool execution code.
Thread management overhead — server-side Thread objects accumulate and require cleanup. For a high-traffic public product, managing Thread lifecycle (creation, expiry, deletion) adds operational burden that the stateless model avoids entirely. Each conversation in the production system is self-contained in the client-sent messages array.
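To make the thread-lifecycle burden concrete, here is a sketch of the kind of cleanup job a Thread-based deployment would need. It is hypothetical: the prototype is not described as having one. It assumes the application records a last-used timestamp per thread ID in `last_used`, and it relies on `client.beta.threads.delete()` to remove a thread.

```python
import time


def purge_stale_threads(client, last_used, max_age_seconds, now=None):
    """Delete threads idle for longer than max_age_seconds.

    last_used maps thread_id -> last activity time (epoch seconds) and is
    updated in place. Returns the list of deleted thread IDs.
    """
    now = time.time() if now is None else now
    deleted = []
    for thread_id, timestamp in list(last_used.items()):
        if now - timestamp > max_age_seconds:
            client.beta.threads.delete(thread_id)  # remove server-side state
            del last_used[thread_id]
            deleted.append(thread_id)
    return deleted
```

The stateless production design needs none of this, because no conversation state outlives the request.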

The trade-off: the production system must send the full conversation history on every turn, which uses more tokens for long sessions. But the streaming UX improvement and reduced operational complexity were judged worth that cost.
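The size of that cost can be estimated with simple arithmetic. The sketch below counts the prompt tokens a stateless client sends when every call repeats the whole history. The per-turn token counts are illustrative inputs, not measurements from the project, and `cumulative_prompt_tokens` is a hypothetical helper.

```python
def cumulative_prompt_tokens(turn_tokens):
    """Total prompt tokens sent when each call resends all prior turns.

    turn_tokens[i] is the number of tokens turn i adds to the history.
    """
    total = 0
    history = 0
    for tokens in turn_tokens:
        history += tokens   # the history grows by this turn
        total += history    # and the whole history is sent again
    return total


# Ten turns of 100 tokens each: 100 + 200 + ... + 1000 tokens sent in total.
print(cumulative_prompt_tokens([100] * 10))
```

With equal-sized turns the total grows as n(n+1)/2 times the turn size, so doubling the length of a session roughly quadruples the tokens sent.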
