AI Video Team | Architecture

Click any component to explore it in the full case study.

10-field structured block from Pass 3: HOOK, SEGMENT_1, CTA, NARRATION (36–42 words enforced), platform captions. NARRATION feeds TTS; all fields feed caption system.

36–42 word contract

Input B

Park Background Images

9:16 aspect ratio images from Azure Blob Storage. Per-content-type image selection (rope_drop vs. weather vs. morning briefing).

Azure Blob · 9:16 images

Input C

Background Music Track

Per-content-type music selection. Energy and tempo matched to content type and time of day.

Per-pipeline audio

5-Stage Video Production Pipeline

Stage 1

Number Normalize

Digits → words for TTS ("60" → "sixty"). Prevents robotic pronunciation in narration.

›

Stage 2

ElevenLabs TTS

NARRATION → MP3 + per-character timestamps (eleven_multilingual_v2). Word-level timing map generated.

›

Stage 3

Caption Align

Word-level timing from ElevenLabs → ASR map (spoken words back to written). Semantic color-coding applied.

›

Stage 4

Frame Compose

Park images + word-pop captions + brand overlays. Per-content-type render profiles (motion, transitions, effects).

›

Stage 5

FFmpeg Encode

1080×1920 @ 30fps. CRF 18. Min 3500k bitrate. Duration-matched to TTS audio. Thumbnail extracted.

🎬

Quality Gates + Safety Checks

Word count (36–42) enforced pre-render. _must_skip_social() check before any platform API call. PREVIEW_MODE and PIPELINE_DRY_RUN gates for staging. Video quality validated (frame count, audio sync).

Word count gate (36–42)_must_skip_social() checkPREVIEW_MODE / DRY_RUNCaption color validationAudio sync verification0 manual steps

Output A

1080×1920 MP4 Video

9:16 Reels/TikTok/YouTube Shorts format. Word-pop captions with semantic color-coding: purple (numbers), mint (ride names), coral (status words). Brand overlay included.

1080×1920 · 30fps · CRF 18

Output B

Per-Platform Captions

IG_CAPTION, TIKTOK_CAPTION, YT_TITLE, YT_DESCRIPTION, YT_TAGS. Platform-native character limits and hashtag strategy applied per publish target.

Instagram · TikTok · YouTube

Output C

Thumbnail + Publish

Extracted thumbnail frame. Published to Instagram Reels, TikTok, YouTube Shorts conditionally. All gated by _must_skip_social() check.

Auto-published · 3 platforms

AI systems (Claude + ElevenLabs + Pillow + FFmpeg + Blob)

Caption colors (semantic)

10+

Render profiles

Manual steps

StackPythonElevenLabs eleven_multilingual_v2Pillow (PIL)FFmpegAzure Blob StorageClaude Sonnet (script)Instagram APITikTok APIYouTube API

Script-to-Published Video Pipeline