Machine Learning · BigQuery · GCP · scikit-learn · METAR · Atmospheric Science

8 Years of Florida Weather Data
Turned Into Consumable Park Intelligence

The Weather Whisperer ML pipeline ingests METAR aviation weather reports from 17 Central Florida stations, fuses them with upper-air atmospheric soundings (CAPE, Lifted Index, K-Index), and produces six trained models that generate 2-hour thunderstorm, precipitation, wind, fog, and venue-impact nowcasts every 20 minutes — purpose-built to tell park visitors what weather means for the rides they're about to board.

44K+ · Training observations
8 yrs · Historical METAR depth
6 · Trained ML models
17 · Weather stations
0.99 · Best ROC AUC
20 min · Prediction cadence

From Raw Aviation Reports to Operational Predictions

The intelligence pipeline transforms raw FAA METAR observations into park-actionable predictions through four distinct processing stages, each running as an independent GCP Cloud Run Job on its own cadence.

Stage 1 · Every 10 min
METAR Ingestion — weather-updater-v6
Polls 17 Central Florida FAA stations via aviationweather.gov. Parses raw METAR strings: decodes present weather groups (TS, VCTS, LTG for thunderstorms; BR/FG for fog), computes derived fields (dewpoint depression, precip rolling windows), and upserts to BigQuery weather_updater_v3. Also flags adverse events: adverse_thunderstorm, adverse_high_wind_event, adverse_fog_mist.
aviationweather.gov API | BigQuery weather_updater_v3 | METAR present-weather decode
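As a rough illustration of the Stage 1 decode, the sketch below scans a raw METAR string for thunderstorm and fog tokens and derives dewpoint depression from the temperature/dewpoint group. The function name, the regular expressions, and the 25 kt gust threshold (borrowed from the high-wind target described later on this page) are assumptions; the production parser in weather-updater-v6 is not shown here and likely handles many more cases.

```python
import re

# Wind group such as 27012G28KT or VRB04KT; group 1 is the gust, if present
WIND_RE = re.compile(r"^(?:\d{3}|VRB)\d{2,3}(?:G(\d{2,3}))?KT$")
# Temperature/dewpoint group in whole degrees Celsius, e.g. 29/24 or M02/M05
TEMP_RE = re.compile(r"^(M?\d{2})/(M?\d{2})$")
FOG_CODES = {"BR", "FG", "MIFG", "BCFG", "PRFG", "FZFG"}

def _celsius(group: str) -> int:
    # METAR encodes negative temperatures with a leading "M"
    return -int(group[1:]) if group.startswith("M") else int(group)

def decode_metar(raw: str, gust_threshold_kt: int = 25) -> dict:
    """Hypothetical helper: derive a few adverse-weather flags from one METAR."""
    flags = {
        "adverse_thunderstorm": False,
        "adverse_high_wind_event": False,
        "adverse_fog_mist": False,
        "dewpoint_depression_f": None,
    }
    for token in raw.split()[1:]:  # skip the station identifier
        core = token.lstrip("+-")  # drop intensity prefix
        # TSNO is a remark meaning "no thunderstorm sensor", not a storm report
        if core.startswith(("TS", "VCTS", "LTG")) and core != "TSNO":
            flags["adverse_thunderstorm"] = True
        if core in FOG_CODES:
            flags["adverse_fog_mist"] = True
        wind = WIND_RE.match(token)
        if wind and wind.group(1) and int(wind.group(1)) >= gust_threshold_kt:
            flags["adverse_high_wind_event"] = True
        temp = TEMP_RE.match(token)
        if temp:
            depression_c = _celsius(temp.group(1)) - _celsius(temp.group(2))
            # A difference in Celsius converts to Fahrenheit by the 1.8 factor alone
            flags["dewpoint_depression_f"] = round(depression_c * 1.8, 1)
    return flags

report = decode_metar(
    "KISM 121953Z 27012G28KT 5SM +TSRA BR BKN035CB 29/24 A2992 RMK AO2 LTG DSNT W"
)
```

The sample report is invented for illustration; in it all three adverse flags are set and the dewpoint depression is 9.0°F.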
Stage 2 · Every 12 hrs
Atmospheric Soundings — weather-updater-atmos
Fetches upper-air radiosonde data from the University of Wyoming soundings archive for Tampa Bay station 72210. Parses CAPE (convective available potential energy), CIN (convective inhibition), Lifted Index, K-Index, Total Totals Index, Showalter Index, and PWAT (precipitable water). These 7 atmospheric parameters are the most predictive features for the thunderstorm model — they describe the thermodynamic state of the atmosphere above the surface, where convection is initiated.
UWyo soundings archive (Station 72210) | CAPE / CIN / Lifted Index | K-Index · Total Totals · PWAT
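The pipeline parses these indices from the archive rather than computing them, but two of them have simple closed forms that show what the model is being fed. The sketch below uses the standard definitions of K-Index and Total Totals from temperature (T) and dewpoint (Td) in °C at 850, 700, and 500 hPa. The function names and the sample profile are illustrative assumptions, not values from a real sounding.

```python
def k_index(t850, td850, t700, td700, t500):
    """K-Index: lapse rate (T850 - T500) plus low-level moisture (Td850)
    minus the 700 hPa dewpoint depression (T700 - Td700)."""
    return (t850 - t500) + td850 - (t700 - td700)

def total_totals(t850, td850, t500):
    """Total Totals = Vertical Totals (T850 - T500) + Cross Totals (Td850 - T500)."""
    return (t850 - t500) + (td850 - t500)

# Made-up warm, moist summer profile, degrees Celsius
sample = dict(t850=18.0, td850=15.0, t700=8.0, td700=4.0, t500=-8.0)

ki = k_index(**sample)
tt = total_totals(sample["t850"], sample["td850"], sample["t500"])
```

For this invented profile the K-Index is 37 and Total Totals is 49. Both formulas reward a steep lapse rate and a moist lower troposphere, which is why they sit alongside CAPE as thunderstorm features.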
Stage 3 · On-demand
Model Training — weather-ml-trainer
Reads from ml_training_features — a BigQuery view that assembles hourly feature vectors from 8+ years of historical METAR and atmospheric data. Trains 6 independent scikit-learn models and serializes each to GCS as a pickled .pkl file. Training runs on-demand after data accumulation milestones or when model drift is detected. The thunderstorm model uses all 7 atmospheric indices plus METAR trends; the fog model uses only 6 surface-level features (atmospheric soundings don't add predictive value for radiation fog).
RandomForestClassifier · GradientBoostingRegressor | 44,000+ training samples | GCS pickled model storage
Stage 4 · Every 20 min
ML Predictions — weather-ml-predictor
Loads pickled models from GCS, queries current conditions + spatial storm context via BigQuery, and generates prediction payloads for all 17 stations. Applies a spatial boost adjustment to the thunderstorm model when nearby or upwind stations report active storms. Publishes per-station nowcast JSON and venue-impact JSON to public GCS. Total execution time: <30 seconds.
JSON to GCS (public, no auth) | Spatial boost via ST_DISTANCE | <30s execution, <$0.01/run

Model Performance at 44,000+ Observations

All models use Random Forest classifiers with class-weight balancing to handle the natural imbalance of severe weather events (thunderstorms occur on roughly 15–20% of summer afternoons). ROC AUC scores reflect test set evaluation on a 20% holdout using stratified splitting.
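A minimal sketch of the evaluation setup described above, assuming scikit-learn and NumPy are installed: a stratified 80/20 split, a class-weight-balanced random forest, and ROC AUC on the holdout. The synthetic data, random seeds, and variable names are placeholders. The values n_estimators=150 and max_depth=12 mirror the thunderstorm_nowcast entry below, on the assumption that "n" and "depth" there refer to those two parameters.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))  # placeholder feature matrix
# Rare positive class (a few percent) driven by the first two features plus noise
signal = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=2000)
y = (signal > 1.8).astype(int)

# stratify=y keeps the positive rate the same in the train and test partitions
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=0
)

model = RandomForestClassifier(
    n_estimators=150,
    max_depth=12,
    class_weight="balanced",  # up-weights the minority class during training
    random_state=0,
)
model.fit(X_train, y_train)

test_prob = model.predict_proba(X_test)[:, 1]  # probability of the positive class
auc = roc_auc_score(y_test, test_prob)
```

ROC AUC is computed from predicted probabilities rather than hard labels, so it is insensitive to the decision threshold. That makes it a more informative headline metric than accuracy when the positive class is rare.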

| Model | Target | ROC AUC | Accuracy | Key Feature Groups |
| --- | --- | --- | --- | --- |
| thunderstorm_nowcast (RandomForest · n=150 · depth=12 · balanced) | target_thunderstorm | 0.750 | 72% | CAPE, Lifted Index, K-Index, Total Totals, PWAT, pressure change, dewpoint depression |
| precipitation_prediction (RandomForest · balanced) | target_precipitation | 0.990 | 99% | Precip rolling windows, dewpoint depression, visibility, hourly precip |
| high_wind_prediction (RandomForest · balanced) | target_high_wind (≥25 kt gusts) | 1.000 | 100% | Wind speed, wind gust change 1h, pressure change, temperature change 3h |
| fog_prediction (RandomForest · n=100 · depth=8 · balanced) | target_fog | 0.900 | 92% | Dewpoint depression, visibility, wind speed, hour of day (radiation fog window) |
| venue_impact_prediction (RandomForest · KISM primary features) | Operational impact at WDW | 0.847 | 81% | KISM-specific features (nearest WDW station), multi-hour precip, atmospheric indices |
| high_impact_venue_prediction (RandomForest · high-impact threshold) | High-severity operational impact | 0.950 | 95% | KISM + CAPE thresholds, historical high-impact event labels |

The fog model intentionally uses no atmospheric sounding data. Surface METAR features (dewpoint depression <2°F, wind <5 kt, visibility trending down) proved more predictive for Florida radiation fog than upper-air instability indices, which describe convective initiation rather than surface-based fog formation.

What the Models Actually See

The thunderstorm nowcast model uses the 21 input features listed below, grouped into four categories. The atmospheric sounding features (CAPE, LI, K-Index) consistently rank as the top predictors; surface METAR alone cannot capture the pre-storm thermodynamic instability that drives Florida convection.

Atmospheric Soundings (Upper-Air)
cape_j_kg: Energy for convective initiation
cin_j_kg: Inhibition (lid suppressing storms)
lifted_index: Parcel stability; <-2 = unstable
k_index: Moisture depth + lapse rate
total_totals_index: Thunderstorm composite index
showalter_index: Shallow convection threshold
pwat_inches: Total column water vapor

Surface METAR Observations
avg_temp_f: Surface temperature
dewpoint_depression: temp_f − dewpoint_f; <5°F = humid
avg_pressure_hpa: Station pressure (altimeter)
pressure_change_1h: Falling = approaching system
pressure_change_3h: Medium-term trend
total_precip_in: Hourly accumulation
avg_wind_speed_kt: Sustained wind
max_wind_gust_kt: Peak gust in window

Spatial Storm Context
nearby_tstm_count: Stations with TS within 100 mi
nearby_precip_count: Stations with precip within 100 mi
upwind_tstm_count: TS stations within 150 mi (all dirs)
Computed via BigQuery ST_DISTANCE CROSS JOIN across all 17 stations per prediction cycle

Temporal Context
hour_of_day: Critical for FL sea-breeze window
temp_change_1h: Rapid cooling = outflow boundary
temp_change_3h: Diurnal heating trend
Florida peak convective window: 15:00–20:00 UTC (11 AM–4 PM local). hour_of_day captures this pattern: storm probability peaks sharply in afternoon hours.

How Nearby Storms Modify the Prediction

Pure single-station METAR features miss a critical signal: a station's atmosphere can still be clear while a storm 50 miles away is confirmed active and propagating toward it. The spatial boost uses BigQuery's geography functions to measure real-time storm proximity across the entire 17-station network and inject that awareness into the model output.

Two-Ring Spatial Context Around Each Station

100 mi — Nearby Ring
+0.20 boost (20 percentage points) per active TS station
  • Immediate threat vector
  • Any confirmed TS in window
  • Drives ACTIVE storm state
150 mi — Upwind Ring
+0.15 boost (15 percentage points) per active TS station
  • Approaching storm signal
  • Includes all bearing directions
  • Typical Florida sea-breeze reach
Cross-JOIN Architecture
BigQuery ST_DISTANCE
  • 17 × 17 = 289 distance pairs
  • Computed fresh every 20 min
  • Filters to latest obs per station
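The same cross join can be sketched in plain Python to show what the spatial features count. This version uses a haversine great-circle distance in place of BigQuery's ST_DISTANCE, and the station coordinates and thunderstorm flags are approximate, illustrative values rather than pipeline data. Counting the 150 mi ring as every other station within 150 mi, including those already inside 100 mi, is an assumption; the description above does not say whether the rings overlap.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MI = 3958.8

def haversine_mi(a, b):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_MI * asin(sqrt(h))

# Approximate coordinates and made-up storm flags for four stations
stations = {
    "KISM": {"pos": (28.29, -81.44), "ts": False},
    "KMCO": {"pos": (28.43, -81.31), "ts": False},
    "KTPA": {"pos": (27.98, -82.53), "ts": True},
    "KGNV": {"pos": (29.69, -82.27), "ts": True},
}

def spatial_context(stations, nearby_mi=100.0, upwind_mi=150.0):
    """Pair every station with every other one and count active-TS neighbours."""
    context = {}
    for station_id, here in stations.items():
        nearby = upwind = 0
        for other_id, there in stations.items():
            if other_id == station_id or not there["ts"]:
                continue  # skip the self pair and stations without a storm
            dist = haversine_mi(here["pos"], there["pos"])
            nearby += int(dist <= nearby_mi)
            upwind += int(dist <= upwind_mi)
        context[station_id] = {"nearby_tstm_count": nearby, "upwind_tstm_count": upwind}
    return context

context = spatial_context(stations)
```

With these sample flags, KISM sees one storm inside 100 mi (Tampa, roughly 70 mi away) and two inside 150 mi (Tampa plus Gainesville, roughly 109 mi away).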
spatial_boost = min(0.6, nearby_tstm × 0.20 + upwind_tstm × 0.15)
final_prob = min(0.95, model_prob + spatial_boost)

# Example: 1 nearby storm + 1 upwind storm
# spatial_boost = min(0.6, 0.20 + 0.15) = 0.35
# If base model says 9% → boosted to 44% (MODERATE risk level)
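The two formulas translate directly into a small function. This is a sketch rather than the production predictor: the function and parameter names are assumptions, while the coefficients and clamps are the ones stated above.

```python
def apply_spatial_boost(model_prob, nearby_tstm, upwind_tstm):
    """Return (spatial_boost, final_prob) for one station."""
    # Additive offset per active storm station, capped at 0.6 in total
    spatial_boost = min(0.6, nearby_tstm * 0.20 + upwind_tstm * 0.15)
    # The boosted probability never exceeds 0.95
    final_prob = min(0.95, model_prob + spatial_boost)
    return spatial_boost, final_prob

# Worked example from above: 1 nearby storm + 1 upwind storm, 9% base probability
boost, prob = apply_spatial_boost(0.09, nearby_tstm=1, upwind_tstm=1)

# Both clamps engaged: the raw boost would be 0.90 and the raw sum 1.10
capped_boost, capped_prob = apply_spatial_boost(0.50, nearby_tstm=3, upwind_tstm=2)
```

The first call reproduces the 0.35 boost and 44% result from the worked example; the second returns the two ceilings, 0.6 and 0.95.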
Why the boost is additive rather than multiplicative

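A multiplicative boost (probability × multiplier) scales with the base probability, so its effect is largest where it is least needed: a 60% base probability multiplied by 1.5 gives 90% (+30 points), whereas a base of 5% gives only 7.5% (+2.5 points). The low-probability case is exactly the one that matters here, because a station under clear skies with a confirmed storm nearby is where the single-station model underestimates risk, and a multiplier barely moves it.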

An additive offset applies a fixed increment regardless of starting point, which more faithfully represents the idea that "a confirmed storm 50 miles away adds approximately X% to your probability of seeing one in the next 2 hours." The hard clamps at 0.6 maximum boost and 0.95 maximum final probability prevent overflow and preserve model calibration at the extremes.
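The difference is easy to check by applying both schemes to the 5% and 60% base probabilities used in this comparison. The 1.5 multiplier and the 0.35 offset are example values from this page; the helper names are illustrative, and applying the same 0.95 ceiling to both schemes is an assumption made so the comparison is like for like.

```python
def multiplicative(p, multiplier=1.5):
    return min(0.95, p * multiplier)

def additive(p, offset=0.35):
    return min(0.95, p + offset)

for base in (0.05, 0.60):
    gain_mult = multiplicative(base) - base
    gain_add = additive(base) - base
    print(f"base={base:.2f}  multiplicative +{gain_mult:.3f}  additive +{gain_add:.3f}")
```

The multiplicative gain grows from 0.025 to 0.300 as the base rises, while the additive gain stays at 0.350 until the 0.95 ceiling is reached.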

The clear-sky suppressor: why CAPE alone can't trigger a HIGH risk

Tampa Bay atmospheric soundings are collected twice daily (00Z and 12Z) and have a 25-hour valid window. A sounding from the previous morning can show high CAPE values that were accurate when it was taken but are stale by noon the next day, when skies are clear and the sea-breeze circulation has not yet developed.

Without a suppressor, the BigQuery views would emit HIGH thunderstorm risk scores on clear sunny mornings because the cached CAPE value from 18 hours prior is still loaded. The clear-sky suppressor in v_weather_current_enhanced gates CAPE and Lifted Index contributions: if the current METAR shows no present weather, visibility ≥5 miles, and no precipitation in the last 3 hours, atmospheric contributions are zeroed out. This prevents false HIGH alerts on clear mornings while preserving them when surface conditions corroborate the upper-air instability.

The morning dewpoint suppression is a companion rule: the dewpoint depression risk score contribution is gated to 11:00–21:00 local time, since a small dewpoint depression at 06:00 does not indicate convective risk. It reflects overnight humidity that will dry out with morning solar heating.
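The two gating rules can be sketched as a pair of small functions. The real logic lives in the BigQuery view v_weather_current_enhanced and is not reproduced here; the function names, argument names, and the representation of score contributions as plain floats are assumptions. Only the thresholds (no present weather, visibility ≥5 miles, no precipitation in the last 3 hours, 11:00 to 21:00 local) come from the description above.

```python
def is_clear_sky(present_weather, visibility_mi, precip_last_3h_in):
    """True when surface obs show nothing that corroborates upper-air instability."""
    return (not present_weather) and visibility_mi >= 5 and precip_last_3h_in == 0

def gated_contributions(cape_score, li_score, dewpoint_score,
                        present_weather, visibility_mi, precip_last_3h_in,
                        local_hour):
    """Apply the clear-sky suppressor and the morning dewpoint suppression."""
    if is_clear_sky(present_weather, visibility_mi, precip_last_3h_in):
        # Stale sounding values must not drive risk under clear skies
        cape_score = li_score = 0.0
    # Assumption: the window covers hours 11 through 20, ending at 21:00
    if not 11 <= local_hour < 21:
        dewpoint_score = 0.0
    return cape_score, li_score, dewpoint_score

clear_morning = gated_contributions(0.4, 0.3, 0.2, "", 10, 0.0, local_hour=6)
stormy_afternoon = gated_contributions(0.4, 0.3, 0.2, "TSRA", 3, 0.1, local_hour=15)
```

On the clear morning every contribution is zeroed; in the stormy afternoon all three pass through unchanged because the surface observations corroborate the sounding.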

Why 8 years of data matters for a binary classification problem

Thunderstorms occur at any given FAA station on roughly 70–80 days per year in Central Florida, the most thunderstorm-prone region in the continental US. But the class imbalance is still significant: hourly observations number roughly 24 × 365 = 8,760 per station per year, of which perhaps 500–700 contain a thunderstorm flag. That's a ~6–8% positive class rate.

With one year of data this would produce approximately 500–700 positive samples for training — sufficient for a basic model but too thin for reliable probability calibration, especially for the multi-feature atmospheric interactions that drive convection. Eight years gives approximately 4,000–5,600 positive samples, enough to reliably learn the atmospheric state patterns that precede Florida afternoon thunderstorms (specifically: high CAPE + low CIN + elevated K-Index + afternoon local time + sea-breeze convergence signals in the surface observations).

The class_weight='balanced' parameter also compensates by up-weighting the minority (thunderstorm) class during training, but this only helps if there are enough minority samples for the tree structure to learn meaningful splits. Eight years of data provides that depth.
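The arithmetic in this section, together with the weights that class_weight='balanced' would assign, can be checked in a few lines. scikit-learn documents the balanced heuristic as n_samples / (n_classes * count_per_class). The counts below are the rough single-station estimates from this section, not measured values, and the helper name is illustrative.

```python
HOURS_PER_YEAR = 24 * 365          # 8,760 hourly observations
POSITIVES_PER_YEAR = (500, 700)    # estimated hours carrying a thunderstorm flag
YEARS = 8

def balanced_weights(n_negative, n_positive):
    """Class weights under the 'balanced' heuristic for a two-class problem."""
    n_samples = n_negative + n_positive
    return {0: n_samples / (2 * n_negative), 1: n_samples / (2 * n_positive)}

for positives in POSITIVES_PER_YEAR:
    rate = positives / HOURS_PER_YEAR
    total_pos = positives * YEARS
    total_neg = HOURS_PER_YEAR * YEARS - total_pos
    weights = balanced_weights(total_neg, total_pos)
    print(f"{positives}/yr: rate={rate:.1%}, {YEARS}-yr positives={total_pos}, "
          f"weight_neg={weights[0]:.2f}, weight_pos={weights[1]:.2f}")
```

At the low estimate each thunderstorm hour is weighted about 8.8 times a typical sample and each quiet hour about half. The weighting rescales the loss but cannot create new examples, which is why the absolute count of positive samples still matters.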

Venue impact models: translating meteorology to operational decisions

The venue_impact_prediction and high_impact_venue_prediction models are KISM-specific — KISM (Kissimmee Gateway Airport, 3.5 miles from Magic Kingdom) is the most spatially relevant FAA station for WDW operations. The venue impact target is derived from historical correlation between KISM weather events and observed ride closure rates in the park operations database.

A thunderstorm at KISM doesn't automatically mean ride closures. The correlation depends on storm intensity (CAPE-derived), proximity (KISM vs. peripheral stations), and which rides are operating (outdoor coasters close at different thresholds than indoor dark rides). The venue impact models learn this relationship directly from the historical data rather than encoding fixed rules, producing a probability output that reflects the empirical operational impact of past weather events on the same property.

17-Station Central Florida Coverage

Coverage extends roughly 100–150 miles out from WDW. KISM is designated the primary venue-impact station. Multi-station spatial queries are the foundation of the storm tracking system.

KISM
Kissimmee Gateway Airport
Primary WDW venue station — 3.5 mi from Magic Kingdom. Used for all venue-impact models.
KMCO
Orlando International
Regional anchor — 15 mi NE of WDW. High-reliability obs, good upper-air consistency.
KORL
Orlando Executive
Urban Orlando heat-island reference. 12 mi NE.
KLAL
Lakeland Linder
West vector — storm track from Tampa Bay sea-breeze convergence zone.
KTPA
Tampa International
65 mi west. Critical west-approach upwind station; Tampa Bay sea-breeze origin.
KSFB
Sanford Orlando
North vector — I-4 corridor storms propagating south.
KVDF
Tampa Executive
SW approach. Sea-breeze collision zone between Gulf and Atlantic moisture.
KMLB
Melbourne Orlando Intl
East vector — Atlantic sea-breeze. Storms initiate at the east coast and move west.
+ 9 more
Daytona · Ocala · Gainesville · Sebring · Vero Beach · Punta Gorda · Fort Myers · Sarasota · Flagler
Outer ring for long-range spatial context (100–150 mi upwind detection).