Data Pipeline — weather-updater-v6 · Continuous ingestion
Step 1 · METAR Parse: Decode raw strings → structured fields + adverse event flags
Step 2 · Sounding Fetch: Parse CAPE/CIN/LI/K-Index/TT/SI/PWAT from UWY archive
Step 3 · Feature Assembly: ml_training_features BigQuery view joins METAR + sounding into hourly vectors
Step 4 · BigQuery Upsert: weather_updater_v3 table, 8+ years of history, growing daily
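
Step 1 can be illustrated with a minimal sketch of decoding a raw METAR string into structured fields plus adverse event flags. This is not the pipeline's actual parser: the function name `parse_metar`, the flag names, the 25 kt high-wind cut-off, and the sample report are all assumptions made for illustration, and only wind, visibility, and present-weather groups are handled.

```python
import re

# Hypothetical cut-off; the pipeline's real threshold is not stated in the text.
HIGH_WIND_KT = 25

WIND_RE = re.compile(r"(\d{3}|VRB)(\d{2,3})(?:G(\d{2,3}))?KT")
VIS_RE = re.compile(r"M?(\d+)(?:/(\d+))?SM")
# Present weather: optional intensity, optional descriptor, zero or more phenomena.
WX_RE = re.compile(
    r"(\+|-|VC)?(MI|PR|BC|DR|BL|SH|TS|FZ)?"
    r"((?:DZ|RA|SN|SG|PL|GR|GS|UP|BR|FG|HZ|FU|SQ)*)"
)
PRECIP_CODES = {"DZ", "RA", "SN", "SG", "PL", "GR", "GS", "UP"}


def parse_metar(raw):
    """Decode a subset of METAR groups; assumes the report starts with the station ID."""
    tokens = raw.split()
    out = {"station": tokens[0], "wind_kt": None, "gust_kt": None,
           "visibility_sm": None, "weather": []}
    codes = set()
    for i in range(1, len(tokens)):
        tok = tokens[i]
        if tok == "RMK":  # remarks section is not decoded in this sketch
            break
        if (m := WIND_RE.fullmatch(tok)):
            out["wind_kt"] = int(m.group(2))
            out["gust_kt"] = int(m.group(3)) if m.group(3) else None
        elif (m := VIS_RE.fullmatch(tok)):
            vis = int(m.group(1)) / (int(m.group(2)) if m.group(2) else 1)
            # Mixed fractions such as "1 1/2SM" arrive as two tokens.
            if m.group(2) and tokens[i - 1].isdigit():
                vis += int(tokens[i - 1])
            out["visibility_sm"] = vis
        elif (m := WX_RE.fullmatch(tok)) and (m.group(2) or m.group(3)):
            out["weather"].append(tok)
            if m.group(2):
                codes.add(m.group(2))
            phenomena = m.group(3)
            codes.update(phenomena[j:j + 2] for j in range(0, len(phenomena), 2))
    peak_wind = max(out["wind_kt"] or 0, out["gust_kt"] or 0)
    out["flags"] = {
        "thunderstorm": "TS" in codes,
        "precipitation": bool(codes & PRECIP_CODES),
        "fog": "FG" in codes,
        "high_wind": peak_wind >= HIGH_WIND_KT,
    }
    return out


# Made-up report for illustration only.
sample = "KISM 121853Z 18015G28KT 1/2SM +TSRA FG BKN008 OVC020 24/23 A2990 RMK AO2"
parsed = parse_metar(sample)
print(parsed["flags"])
```

Keeping the flags as a separate dictionary mirrors the "structured fields + adverse event flags" split: the decoded fields can feed the hourly feature vectors while the flags can serve as candidate labels.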
Model Training — weather-ml-trainer · On-demand after data milestones
Reads from the ml_training_features view. Trains 6 independent Random Forest classifiers with class-weight balancing. Serializes each model with pickle and stores the resulting .pkl file in GCS.
44,000+ training observations
Random Forest · class-weight balanced
BigQuery → scikit-learn → GCS pkl
8+ years of historical data
Drift-triggered retraining
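
The training step described above can be sketched roughly as follows. This is an illustrative outline, not the trainer's actual code: the data is synthetic (standing in for rows read from ml_training_features), the helper name `train_and_serialize` is hypothetical, and reading "n=150 depth=12" as scikit-learn's `n_estimators` and `max_depth` is an assumption. The BigQuery read and the GCS upload are left out.

```python
import pickle

import numpy as np
from sklearn.ensemble import RandomForestClassifier


def train_and_serialize(X, y, n_estimators=150, max_depth=12):
    """Fit a class-weight-balanced Random Forest and return (model, pickled bytes)."""
    model = RandomForestClassifier(
        n_estimators=n_estimators,
        max_depth=max_depth,
        class_weight="balanced",  # reweights classes inversely to their frequency
        random_state=0,
    )
    model.fit(X, y)
    return model, pickle.dumps(model)


# Synthetic stand-in data: 7 features, with a rare positive class.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 7))
y = (X[:, 0] + 0.5 * X[:, 1] > 1.5).astype(int)

model, blob = train_and_serialize(X, y)

# The bytes in `blob` are what would be written to a .pkl object in storage.
restored = pickle.loads(blob)
print(len(blob) > 0, (restored.predict(X) == model.predict(X)).all())
```

Class-weight balancing matters here because adverse events are rare relative to benign hours; without it a forest can score well by mostly predicting the majority class. Pickle files should only be loaded from trusted storage, since unpickling can execute arbitrary code.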
Six Trained Models — All Random Forest · GCS pickle storage
Thunderstorm Nowcast: RF n=150 depth=12; F1 0.87; All 7 atmos indices
Precipitation: RF · balanced; F1 0.82; METAR precip groups
High Wind: RF · balanced; F1 0.78; Wind speed trends
Fog: RF n=100 depth=8; F1 0.84; Surface only (no sounding)
Venue Impact: RF · KISM primary; F1 0.79; Nearest WDW station
High-Impact Venue: RF · high threshold; Precision 0.91; Severe event filter
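
Five of the six models report F1 while the High-Impact Venue model reports precision. One plausible reading of "high threshold" is a higher probability cut-off on the classifier's output, which trades recall for precision; that reading is an assumption, not something the text confirms. The sketch below uses made-up labels and scores (not the project's data) to show how the two metrics respond when the cut-off is raised.

```python
def precision_recall_f1(y_true, scores, threshold):
    """Compute precision, recall, and F1 for scores binarized at `threshold`."""
    tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
    fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
    fn = sum(1 for t, s in zip(y_true, scores) if s < threshold and t == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy data for illustration only: 4 true events among 10 observations.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.95, 0.85, 0.60, 0.40, 0.70, 0.30, 0.20, 0.10, 0.05, 0.55]

default_cut = precision_recall_f1(y_true, scores, threshold=0.5)
high_cut = precision_recall_f1(y_true, scores, threshold=0.8)
print("threshold 0.5:", default_cut)
print("threshold 0.8:", high_cut)
```

On this toy set, raising the cut-off removes both false alarms at the cost of missing two of the four events, which is the usual trade-off for a filter meant to fire only on severe cases.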
17 FAA stations monitored
6 Trained ML models
44K+ Training observations
<30s Inference runtime
Stack: GCP · BigQuery · scikit-learn · Random Forest · Google Cloud Storage · Python · aviationweather.gov API · UWY Soundings Archive