Migration Intelligence · AWS Lambda · Python 3.11 · AI-Driven

Digital Twin Premium: AI-Driven Migration Wave Planning

An end-to-end platform that instruments live datacenters, synthesizes 30 days of telemetry into an in-memory dependency graph, and uses AI simulation to automate wave planning, blast radius analysis, right-sizing, and TCO modeling — transforming migration planning from a multi-month consulting exercise into a data-driven, automated process.

  • 1,053 servers modeled
  • 4 datacenters
  • 2,715 applications
  • 174 dependency edges
  • 5 AI simulations
  • 8 6R decision rules

Migration planning fails because dependency data doesn't exist in a usable form

Enterprise datacenter migrations involving hundreds or thousands of servers most often fail because of planning errors rather than technical execution errors: wrong wave groupings, missed dependencies, and unexpected blast radii that take down production applications. The root cause is almost always the same: dependency data was never systematically collected, so the migration plan was built on assumptions.

🕸️

Dependency Data Is Tribal Knowledge

Application owners know their dependencies informally. At the estate level, there is no authoritative, machine-readable map of which servers talk to which. Wave plans built without this map cut live connections on migration night.

📋

Wave Planning Is a Spreadsheet Exercise

Migration consultants manually group servers into waves based on interviews, CMDB exports, and educated guesses. A 1,000-server migration might take 3–6 months of planning before a single server moves. The plan is outdated by the time execution starts.

💥

Blast Radius Is Unknown Until It's Too Late

Without a live dependency graph, you can't answer: "if this server goes down during migration, what else breaks?" Discovering that answer during a maintenance window — while production is degraded — is the most common cause of emergency rollbacks.

📐

Right-Sizing Is Guesswork Without Telemetry

Lifting and shifting servers at their current spec can waste 40–60% of cloud spend. Right-sizing, however, requires real utilization data over time. The peak figures stored in the CMDB often reflect capacity provisioned years ago rather than actual load.

💰

TCO Models Are Disconnected from Reality

Executive-level cloud TCO projections are often built in Excel using list-price estimates against total server counts. They don't reflect the 6R disposition mix, actual core counts, utilization-adjusted sizing, or the cost difference between Rehost, Replatform, and Retire at scale.

🔍

No Data Masking for Client Demos and Proposals

Reference implementations built on real client data can't be demonstrated to prospects without a masking layer. Many firms maintain parallel anonymized datasets manually — which drift from the real data and undermine confidence in the demo.


Four phases: collect, model, simulate, deliver

Digital Twin Premium instruments the live estate, builds an in-memory dependency graph, runs five AI-driven simulations against it, and delivers results as an interactive HTML dashboard, multi-tab Excel report, and PDF executive brief — all from a single automated pipeline.

Phase 1
Collection
  • 30-day telemetry
  • Windows PowerShell
  • Linux Bash collectors
  • Network TCP/UDP deps
  • Ansible mass deploy
  • SQLite data lake
Phase 2
Core Pipeline
  • FQDN matching
  • App ID enrichment
  • 8-rule 6R disposition
  • Gap analysis
  • Graph construction
Phase 3
AI Simulation
  • Wave rehearsal
  • Blast radius walk
  • Co-dependency BFS
  • Right-sizing
  • Cost projection
Phase 4
Outputs
  • HTML dashboard (D3.js)
  • Excel workbook (7 tabs)
  • PDF executive report
  • EARE AI agent export
  • Masked demo dataset
Collection Layer
Datacenter hosts → SQLite
  • collect_windows.ps1 (WMI/CIM)
  • collect_linux.sh (/proc)
  • collect_network_windows.ps1
  • collect_network_linux.sh (ss -tnp)
  • aggregate_pipeline.py → perf_raw
  • aggregate_network.py → net_connections
  • deploy_collectors.yml (Ansible)
Compute Layer
Lambda + optional EC2
  • Lambda: run_pipeline.py
  • Lambda: export_dashboard_data.py
  • Lambda: eare_integration.py
  • Lambda: executive_report.py
  • EC2 t3.medium (optional, for aggregation)
  • CloudFormation nested stacks
Pipeline Core
Pure Python, no graph DB
  • data_loader.py (5 sources, pandas)
  • matching_engine.py (4-step FQDN)
  • disposition_engine.py (8-rule chain)
  • twin_simulator.py (in-memory graph)
  • masking.py (deterministic, 160+ names)
  • report_generator.py (openpyxl)
Storage & Output
S3 + 3 output formats
  • S3: raw/ processed/ datalake/ reports/ dashboard/
  • AES-256 · Versioning · 90-day Glacier
  • dashboard.html (Chart.js + D3.js, offline-capable)
  • report.xlsx (7 tabs, openpyxl)
  • executive_report.pdf
  • eare-export/ (cross-account S3 for AI agents)

Every server in the estate receives a deterministic 6R classification from an 8-rule priority chain. The first matching rule wins — no ambiguity, no LLM hallucination risk on the classification itself. AI enters downstream, in simulation and narrative generation.

Priority | Disposition | Condition
1 | Retire | environment = "Decommissioned"
2 | Retain (Already in Cloud) | in_azure = "Y" OR azure_status = "Already in Azure"
3 | Repurchase (SaaS) | application_type = "SaaS"
4 | Retain (Appliance) | application_name ∈ KNOWN_APPLIANCES (60 entries)
5 | Refactor (Citrix) | is_citrix = "Y"
6 | Replatform (Database) | database field populated AND ≠ "In Progress"
7 | Rehost (Lift & Shift) | device_type = "virtual" AND no database AND not appliance
8 | Retain (Physical) | device_type = "physical"
- | Review Required | No rule matched
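
A first-match-wins chain like the one in the table can be sketched as an ordered list of (disposition, predicate) pairs. This is a minimal illustration, not the product's disposition_engine.py: the dict-based server record, the helper names, and the two sample KNOWN_APPLIANCES entries are assumptions, and "no database" in rule 7 is read here as an empty database field.

```python
from typing import Callable

# Assumption: two placeholder entries standing in for the 60-entry list.
KNOWN_APPLIANCES = {"Example Load Balancer", "Example Firewall"}


def _db(server: dict) -> str:
    """Return the database field as a trimmed string ('' when missing)."""
    return (server.get("database") or "").strip()


def _is_appliance(server: dict) -> bool:
    return server.get("application_name") in KNOWN_APPLIANCES


# Order encodes priority: the first predicate that returns True wins.
RULES: list[tuple[str, Callable[[dict], bool]]] = [
    ("Retire", lambda s: s.get("environment") == "Decommissioned"),
    ("Retain (Already in Cloud)",
     lambda s: s.get("in_azure") == "Y" or s.get("azure_status") == "Already in Azure"),
    ("Repurchase (SaaS)", lambda s: s.get("application_type") == "SaaS"),
    ("Retain (Appliance)", _is_appliance),
    ("Refactor (Citrix)", lambda s: s.get("is_citrix") == "Y"),
    ("Replatform (Database)", lambda s: _db(s) not in ("", "In Progress")),
    # Assumption: "no database" means the database field is empty.
    ("Rehost (Lift & Shift)",
     lambda s: s.get("device_type") == "virtual" and _db(s) == "" and not _is_appliance(s)),
    ("Retain (Physical)", lambda s: s.get("device_type") == "physical"),
]


def classify(server: dict) -> str:
    """Return the disposition of the first matching rule, else the fallback."""
    for disposition, matches in RULES:
        if matches(server):
            return disposition
    return "Review Required"
```

Because the rules are evaluated in order, a decommissioned SaaS server is classified as Retire, and a record that matches nothing falls through to Review Required rather than being forced into a category.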

Five simulations — one in-memory graph, zero external databases

The DigitalTwin class builds a Python dict-based graph from server detail records with five cross-reference indexes. ~1,000 servers load in milliseconds — no Neptune, no graph database licensing cost. AI simulations run against this graph to answer the questions that migration planners could previously only answer by trial and error.

Simulation 1
Wave Rehearsal
simulate_migration(hostnames)
Given a proposed wave (set of hostnames), identifies which application stacks are split across the wave boundary, maps cross-boundary dependencies that would break, and computes total resource load (cores, RAM, storage) for the wave.
Output: split stacks · cross-boundary dep count · resource totals per wave
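
As a rough sketch of what a wave rehearsal computes, the function below takes plain dicts in place of the twin's indexes. The parameter names, the spec keys, and the sample hostnames are hypothetical; only the three outputs (split stacks, cross-boundary dependencies, resource totals) come from the description above.

```python
def simulate_migration(wave_hosts, server_stack, server_deps, specs):
    """Report split stacks, boundary-crossing dependencies, and resource totals."""
    wave = set(wave_hosts)

    # Group hosts by stack so each stack can be checked for a split.
    stack_members = {}
    for host, stack in server_stack.items():
        stack_members.setdefault(stack, set()).add(host)

    # A stack is split when some members are in the wave and some are not.
    split_stacks = sorted(
        stack for stack, members in stack_members.items()
        if members & wave and members - wave
    )

    # A dependency crosses the boundary when exactly one endpoint is in the wave.
    cross_boundary = sorted(
        (src, dst)
        for src, targets in server_deps.items()
        for dst in targets
        if (src in wave) != (dst in wave)
    )

    totals = {
        key: sum(specs[host][key] for host in wave)
        for key in ("cores", "ram_gb", "storage_gb")
    }
    return {"split_stacks": split_stacks, "cross_boundary": cross_boundary, "totals": totals}


# Hypothetical four-server estate used only to exercise the function.
server_stack = {
    "epic-app-01": "Epic Production Cluster",
    "epic-app-02": "Epic Production Cluster",
    "clarity-db-01": "Epic Clarity Database",
    "ctx-vdi-01": "Citrix VDI Farm",
}
server_deps = {
    "epic-app-01": {"clarity-db-01", "ctx-vdi-01"},
    "epic-app-02": {"clarity-db-01"},
}
specs = {
    "epic-app-01": {"cores": 8, "ram_gb": 64, "storage_gb": 500},
    "epic-app-02": {"cores": 8, "ram_gb": 64, "storage_gb": 500},
    "clarity-db-01": {"cores": 16, "ram_gb": 128, "storage_gb": 2000},
    "ctx-vdi-01": {"cores": 4, "ram_gb": 16, "storage_gb": 100},
}
result = simulate_migration(["epic-app-01", "clarity-db-01"], server_stack, server_deps, specs)
```

In this toy wave, Epic Production Cluster is reported as split because epic-app-02 is left behind, and two dependencies cross the wave boundary.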
Simulation 2
Blast Radius
blast_radius(hostname)
Walks the inbound dependency graph from a single server hostname, discovering all servers and applications that depend on it — directly or transitively. Answers the critical question: "If this server goes offline, what breaks?"
Output: affected servers · affected apps · dependency chain depth
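
One way to picture the inbound walk is a breadth-first search over the reversed dependency edges. The sketch below assumes server_deps maps each host to the hosts it depends on and server_app maps each host to its application. The argument shapes and sample hostnames are illustrative, not the product's actual signature.

```python
from collections import deque


def blast_radius(hostname, server_deps, server_app):
    """Find every server that depends on `hostname`, directly or transitively."""
    # Invert the edges: for each host, who depends on it?
    dependents = {}
    for src, targets in server_deps.items():
        for dst in targets:
            dependents.setdefault(dst, set()).add(src)

    # Breadth-first walk over inbound edges, recording hop distance.
    depth = {hostname: 0}
    queue = deque([hostname])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in depth:
                depth[dependent] = depth[current] + 1
                queue.append(dependent)

    affected = sorted(host for host in depth if host != hostname)
    return {
        "affected_servers": affected,
        "affected_apps": sorted({server_app[h] for h in affected if h in server_app}),
        "max_depth": max(depth.values()),
    }


# Hypothetical chain: web-01 -> app-01 -> db-01, plus report-01 -> db-01.
server_deps = {"web-01": {"app-01"}, "app-01": {"db-01"}, "report-01": {"db-01"}}
server_app = {"web-01": "Portal", "app-01": "Portal", "report-01": "Reporting", "db-01": "Clarity"}
radius = blast_radius("db-01", server_deps, server_app)
```

Taking db-01 offline in this example affects three servers across two applications, with the furthest dependent two hops away.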
Simulation 3
Co-Dependency Groups
find_co_dependencies(dc=None)
BFS traversal of the full dependency graph to discover clusters of mutually dependent servers that must be migrated as a unit. Optional DC-scoped filter. These groups become the atomic building blocks of migration wave construction.
Output: co-dependency groups → natural migration wave candidates
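
A simple reading of co-dependency discovery is connected components over the dependency graph with edge direction ignored. The sketch below takes that reading as an assumption. The product may define "mutually dependent" more strictly, and the server_dc mapping and hostnames are hypothetical.

```python
from collections import deque


def find_co_dependencies(server_deps, server_dc, dc=None):
    """Group servers linked by any dependency path, optionally within one DC."""
    hosts = {h for h, host_dc in server_dc.items() if dc is None or host_dc == dc}

    # Treat each dependency as an undirected link between in-scope hosts.
    neighbors = {h: set() for h in hosts}
    for src, targets in server_deps.items():
        for dst in targets:
            if src in hosts and dst in hosts:
                neighbors[src].add(dst)
                neighbors[dst].add(src)

    seen, groups = set(), []
    for start in sorted(hosts):
        if start in seen:
            continue
        seen.add(start)
        group, queue = [], deque([start])
        while queue:  # BFS collects one connected component
            current = queue.popleft()
            group.append(current)
            for nxt in sorted(neighbors[current] - seen):
                seen.add(nxt)
                queue.append(nxt)
        if len(group) > 1:  # isolated servers are not a co-dependency group
            groups.append(sorted(group))
    return groups


# Hypothetical estate: a -> b -> c spans two datacenters; d and e are isolated.
server_dc = {"a": "DC1", "b": "DC1", "c": "DC2", "d": "DC1", "e": "DC2"}
server_deps = {"a": {"b"}, "b": {"c"}}
all_groups = find_co_dependencies(server_deps, server_dc)
dc1_groups = find_co_dependencies(server_deps, server_dc, dc="DC1")
```

Scoping to DC1 drops server c from the group, which shows how a DC filter can hide a cross-datacenter dependency that the estate-wide run would surface.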
Simulation 4
Right-Sizing Analysis
rightsizing_analysis(hostnames)
Classifies each server into one of four utilization categories using 30 days of actual telemetry data, then computes recommended cloud specs and projected core savings. Replaces CMDB-based provisioned capacity with real observed utilization.
Output: category · current specs · recommended specs · core savings
Simulation 5
Cost Projection
export_dashboard_data.py
Computes per-disposition TCO using configurable per-core annual rates for both on-premises and cloud environments. Produces an aggregate financial model showing current spend, projected cloud spend, and annual savings per 6R category.
Output: per-disposition TCO · aggregate current vs. projected · annual savings
Category | Condition | Recommended spec | Action
ZOMBIE | CPU avg < 10% AND RAM avg < 20% | - | Recommend decommission review
OVERSIZED | CPU avg < 20% (any RAM) | max(2, cores ÷ 2) · max(2, RAM × 0.6) | Halve cores, cut RAM 40%
MODERATE | CPU avg < 50% (any RAM) | max(2, cores × 0.75) · max(2, RAM × 0.8) | Reduce cores 25%, RAM 20%
WELL-UTILIZED | CPU avg ≥ 50% | Current spec appropriate | Lift & shift at current spec
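
The four categories overlap (every ZOMBIE also satisfies the OVERSIZED condition), so they only work when checked in order. The sketch below shows one way to apply the thresholds and formulas. The function name, rounding up to whole cores and whole GB, and leaving a ZOMBIE's spec unchanged pending review are assumptions.

```python
import math

MIN_CORES = 2   # floor from the max(2, ...) formulas
MIN_RAM_GB = 2


def rightsize_server(cores, ram_gb, cpu_avg, ram_avg):
    """Classify one server from 30-day averages and suggest a cloud spec."""
    if cpu_avg < 10 and ram_avg < 20:
        # Assumption: no resize is proposed for a decommission candidate.
        category, rec_cores, rec_ram = "ZOMBIE", cores, ram_gb
    elif cpu_avg < 20:
        category = "OVERSIZED"
        rec_cores = max(MIN_CORES, math.ceil(cores / 2))
        rec_ram = max(MIN_RAM_GB, math.ceil(ram_gb * 0.6))
    elif cpu_avg < 50:
        category = "MODERATE"
        rec_cores = max(MIN_CORES, math.ceil(cores * 0.75))
        rec_ram = max(MIN_RAM_GB, math.ceil(ram_gb * 0.8))
    else:
        category, rec_cores, rec_ram = "WELL-UTILIZED", cores, ram_gb
    return {
        "category": category,
        "recommended_cores": rec_cores,
        "recommended_ram_gb": rec_ram,
        "core_savings": cores - rec_cores,
    }


oversized = rightsize_server(cores=8, ram_gb=32, cpu_avg=15, ram_avg=60)
moderate = rightsize_server(cores=8, ram_gb=32, cpu_avg=35, ram_avg=60)
```

For an 8-core, 32 GB server averaging 15% CPU, this yields 4 cores and 20 GB. At 35% CPU it yields 6 cores and 26 GB.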
Wave 2 (proposed): 8 servers across 3 stacks · 2 split stacks detected · 3 cross-boundary dependencies at risk
  • Epic Production Cluster → Epic Clarity Database, TCP 1433 (SQL): ✓ In wave
  • Epic Production Cluster → Citrix VDI Farm, TCP 443 (HTTPS): ⚠ NOT in wave, will break
  • Epic Clarity Database → Reporting Services Stack, TCP 8443 (API): ⚠ NOT in wave, will break
Wave Rehearsal Recommendation
Move Citrix VDI Farm (12 servers) and Reporting Services Stack (4 servers) into Wave 2 to eliminate cross-boundary dependencies. Revised wave: 24 servers, all co-dependencies satisfied.
Disposition | On-Prem Cost | Cloud Cost | Savings/Core/Year
Retire | $150 | $0 | $150
Repurchase (SaaS) | $150 | $60 | $90
Replatform | $200 | $130 | $70
Refactor | $180 | $110 | $70
Rehost (Lift & Shift) | $150 | $95 | $55
Retain (Cloud) | $95 | $95 | $0
Retain (Appliance / Physical) | $200–250 | $200–250 | $0
8,420 cores at reference estate mix | ~$1.52M | ~$0.91M | ~$610K/year
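
The per-core arithmetic behind a projection like this is straightforward: multiply each disposition's core count by its on-premises and cloud rates, then sum. The sketch below uses the rates from the table (omitting the Retain Appliance / Physical row, which is given as a range) and a hypothetical core mix. It does not attempt to reproduce the 8,420-core reference figures, because the reference mix is not stated.

```python
# (on_prem_rate, cloud_rate) in dollars per core per year, from the table.
RATES = {
    "Retire": (150, 0),
    "Repurchase (SaaS)": (150, 60),
    "Replatform": (200, 130),
    "Refactor": (180, 110),
    "Rehost (Lift & Shift)": (150, 95),
    "Retain (Cloud)": (95, 95),
}


def project_costs(cores_by_disposition, rates=RATES):
    """Return per-disposition and aggregate annual cost figures."""
    rows = {}
    for disposition, cores in cores_by_disposition.items():
        on_prem, cloud = rates[disposition]
        rows[disposition] = {
            "current": cores * on_prem,
            "projected": cores * cloud,
            "savings": cores * (on_prem - cloud),
        }
    totals = {
        key: sum(row[key] for row in rows.values())
        for key in ("current", "projected", "savings")
    }
    return rows, totals


# Hypothetical mix of 1,500 cores, for illustration only.
mix = {"Retire": 100, "Rehost (Lift & Shift)": 1000, "Replatform": 400}
rows, totals = project_costs(mix)
```

With this mix the model reports $245,000 current spend, $147,000 projected cloud spend, and $98,000 in annual savings. Swapping in right-sized core counts instead of provisioned ones changes the inputs, not the formula.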

The collection pipeline produces ~50 MB per 1,000 hosts over 30 days — a moderate data volume that doesn't justify managed database cost or operational complexity. SQLite provides zero-configuration, file-based storage that works identically on EC2, Lambda (via /tmp or EFS), and local development without any provisioning.

The perf_raw table uses a UNIQUE INDEX ON (hostname, timestamp) for deduplication — CSV drops from collectors may overlap, and the insert-or-ignore pattern ensures exactly-once semantics per sample. The perf_summary view computes per-host aggregates (avg CPU, max RAM, storage totals) used directly by the right-sizing engine.
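
The deduplication pattern can be illustrated with Python's built-in sqlite3 module. The table and view names follow the text, while the column names, the index name, and the sample rows are assumptions made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a file path would be used in practice
conn.execute(
    "CREATE TABLE perf_raw ("
    " hostname TEXT NOT NULL, timestamp TEXT NOT NULL,"
    " cpu_pct REAL, ram_pct REAL)"
)
# The unique index is what turns a repeated sample into a no-op insert.
conn.execute(
    "CREATE UNIQUE INDEX idx_perf_raw_host_ts ON perf_raw (hostname, timestamp)"
)
conn.execute(
    "CREATE VIEW perf_summary AS"
    " SELECT hostname, AVG(cpu_pct) AS cpu_avg, MAX(ram_pct) AS ram_max"
    " FROM perf_raw GROUP BY hostname"
)


def ingest(rows):
    """Load one collector drop; rows already present are silently skipped."""
    conn.executemany("INSERT OR IGNORE INTO perf_raw VALUES (?, ?, ?, ?)", rows)
    conn.commit()


# Two overlapping drops: the 00:05 sample appears in both.
ingest([("srv-a", "2024-01-01T00:00", 12.0, 40.0),
        ("srv-a", "2024-01-01T00:05", 14.0, 41.0)])
ingest([("srv-a", "2024-01-01T00:05", 14.0, 41.0),
        ("srv-a", "2024-01-01T00:10", 9.0, 39.0)])

row_count = conn.execute("SELECT COUNT(*) FROM perf_raw").fetchone()[0]
summary = conn.execute(
    "SELECT cpu_avg, ram_max FROM perf_summary WHERE hostname = 'srv-a'"
).fetchone()
```

After both drops the table holds three rows rather than four, so the overlapping sample does not skew the averages that the view computes.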

The DigitalTwin class builds a Python dict-based graph from server detail records with five cross-reference indexes (servers, stack_members, server_deps, app_servers, server_app). ~1,000 servers load in milliseconds.
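
To make the five indexes concrete, here is a minimal sketch of how they might be populated in one pass over the server records. The index names come from the text. The record fields (hostname, stack, application, depends_on) and the sample data are assumptions, and the real class presumably carries more state.

```python
from collections import defaultdict


class DigitalTwin:
    """Dict-based graph with five cross-reference indexes."""

    def __init__(self, records):
        self.servers = {}                      # hostname -> full record
        self.stack_members = defaultdict(set)  # stack -> hostnames
        self.server_deps = defaultdict(set)    # hostname -> hostnames it depends on
        self.app_servers = defaultdict(set)    # application -> hostnames
        self.server_app = {}                   # hostname -> application

        for record in records:
            host = record["hostname"]
            self.servers[host] = record
            self.stack_members[record["stack"]].add(host)
            self.app_servers[record["application"]].add(host)
            self.server_app[host] = record["application"]
            self.server_deps[host].update(record.get("depends_on", ()))


# Hypothetical records for illustration.
records = [
    {"hostname": "app-01", "stack": "Epic Production Cluster",
     "application": "Epic", "depends_on": ["db-01"]},
    {"hostname": "db-01", "stack": "Epic Clarity Database",
     "application": "Clarity"},
]
twin = DigitalTwin(records)
```

Every lookup the simulations need (members of a stack, servers behind an application, outbound dependencies of a host) is then a single dict access.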

This avoids Neptune costs (~$0.10/hr + storage), eliminates graph database provisioning and IAM complexity, and keeps the product self-contained with no external runtime dependencies. BFS for co-dependency discovery runs in-process in Python — no network round trips. The design decision trades raw query flexibility (which Neptune provides) for simplicity and zero operational overhead — the right tradeoff for an estate of this size.

The DataMasker class uses sequential ID generation and a curated list of 160+ Epic ecosystem application names to produce a masked dataset that is referentially consistent across all outputs. The same input hostname always maps to the same SRV-NNNN — so the dependency graph, the dashboard, the Excel report, and the EARE export all reference the same masked identity.

This is deterministic masking, not tokenization. No masking database required. The masked dataset faithfully preserves all structural properties of the real data — stack membership, dependency edges, utilization patterns, disposition mix — while replacing all identifiers with Epic infrastructure zone names that read naturally in the context of enterprise IT (Epic Production Cluster, Epic Clarity Database, Citrix VDI Farm, etc.).
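
The hostname side of this can be sketched as a memoized sequential counter. This is an illustration under assumptions: the method name, the case-insensitive normalization, and the sample hostnames are hypothetical, and application-name masking from the curated list is not shown.

```python
class DataMasker:
    """Assign each distinct hostname a stable SRV-NNNN identifier."""

    def __init__(self):
        self._ids = {}  # normalized hostname -> masked ID

    def mask_hostname(self, hostname):
        # Assumption: hostnames are compared case-insensitively.
        key = hostname.strip().lower()
        if key not in self._ids:
            # The next sequential number is assigned on first sight only.
            self._ids[key] = f"SRV-{len(self._ids) + 1:04d}"
        return self._ids[key]


masker = DataMasker()
first = masker.mask_hostname("epicprd01.corp.example")
second = masker.mask_hostname("sqlclr02.corp.example")
repeat = masker.mask_hostname("EPICPRD01.corp.example")
```

A sequential scheme like this is reproducible across runs only if hostnames are presented in the same order each time, for example by sorting the estate before masking. Within a run, every output that goes through the same masker sees the same identity.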

When deployed alongside EARE Standard (the main cloud advisory product), Digital Twin Premium exports four data packages to an S3 prefix accessible via a cross-account bucket policy: disposition CSV, dependencies JSON (stack edges + server-to-server graph), right-sizing CSV, and cost projections JSON.

EARE's Bedrock AI agents consume these structured exports to generate advisory narratives, answer migration planning questions, and produce deal-specific TCO models. This creates a clean data pipeline: Digital Twin does the deterministic computation; AI does the synthesis and communication. Neither product requires the other — the integration activates via a single CloudFormation parameter (EnableEareIntegration=true).

The interactive dashboard is a single HTML file with embedded Chart.js (bar/pie charts), D3.js (dependency graph visualization), and Tailwind CSS. It loads dashboard_data.json from S3 and renders 12 data sections including disposition summaries, resource-by-disposition, TCO analysis, stack dependency graphs, and per-server detail tables.

No build step, no API dependency, no authentication surface area for the dashboard layer. The JSON file contains all masked data — the dashboard works fully offline by downloading both files. This makes it trivial to share with client stakeholders or embed in a deal portal without hosting infrastructure.

The matching engine, disposition engine, twin simulator, masking engine, cost projection model, and right-sizing engine are all validated with property-based tests (Hypothesis) in addition to unit tests. Properties include: CSV deduplication preserving unique (hostname, timestamp) pairs; the disposition chain assigning exactly one outcome to every server (the first matching rule, or Review Required when none matches); masking being referentially consistent across all outputs; right-sizing recommendations always meeting minimum core/RAM floor constraints; and cost projections computing correct per-core savings.

These aren't just "happy path" unit tests — property tests generate hundreds of random inputs per property and verify the invariant holds for all of them. This is especially important for the matching engine, which handles per-DC column variations and edge cases in FQDN normalization.


From multi-month planning exercise to data-driven, automated process

The primary transformation is replacing assumption-driven migration planning with evidence-driven migration planning. Every wave grouping, every right-sizing recommendation, and every TCO projection is backed by 30 days of actual telemetry and a live dependency graph.

Aspect | Before: Manual Migration Planning | After: Digital Twin Premium
Dependency mapping | Interviews + guesses | 30-day live telemetry + graph
Wave planning duration | 3–6 months | Hours after data collection
Blast radius discovery | During production outage | Simulated pre-migration
Right-sizing data source | CMDB provisioned capacity | Actual CPU/RAM avg over 30 days
TCO model basis | List prices × total count | Per-disposition, right-sized cores
Demo dataset | Manually maintained, drifts | Deterministic masking, always consistent
Plan currency | Stale by execution start | Re-run pipeline on latest telemetry
174 Dependency Edges Discovered
26 stack groups and 174 stack-to-stack dependency edges were auto-discovered from live network telemetry, none of which existed in the CMDB before the collection run.
~40% Right-Sizing Waste Captured
In typical enterprise estates, 30–50% of servers are in ZOMBIE or OVERSIZED categories. Telemetry-based right-sizing prevents lifting provisioned-capacity waste directly into cloud.
0 External Database Costs
No Neptune, no RDS, no graph database licensing. SQLite + in-memory Python graph handles 1,053 servers, 2,715 apps, and 174 dependency edges with zero managed database overhead.
100% Masking Referential Integrity
Deterministic masking ensures the dashboard, Excel report, PDF brief, and EARE export all reference the same masked identities, so the demo dataset behaves exactly like real client data.

Dependency Data Is the Migration Plan

Migration projects don't fail in execution — they fail in planning, because the dependency graph was never built. Digital Twin Premium makes building it an automated, repeatable, data-driven process.

AWS Lambda · Python 3.11 · SQLite · CloudFormation · Ansible · D3.js · Chart.js