Digital Twin Premium — AI-Driven Migration Wave Planning

An end-to-end platform that instruments live datacenters, synthesizes 30 days of telemetry into an in-memory dependency graph, and uses AI simulation to automate wave planning, blast radius analysis, right-sizing, and TCO modeling — transforming migration planning from a multi-month consulting exercise into a data-driven, automated process.

Servers modeled: 1,053
Datacenters: multiple
Applications: 2,715
Dependency edges: 174
AI simulations: 5
6R decision rules: 8

Migration planning fails because dependency data doesn't exist in a usable form

Enterprise datacenter migrations involving hundreds or thousands of servers fail most often not from technical execution errors — but from planning errors: wrong wave groupings, missed dependencies, and surprise blast radii that take down production applications. The root cause is almost always the same: dependency data was never systematically collected, and the migration plan was built on assumptions.

Dependency Data Is Tribal Knowledge

Application owners know their dependencies informally. No authoritative, machine-readable map of which servers talk to which exists at the estate level. Wave plans built without this map cut live connections on migration night.

Wave Planning Is a Spreadsheet Exercise

Migration consultants manually group servers into waves based on interviews, CMDB exports, and educated guesses. A 1,000-server migration might take 3–6 months of planning before a single server moves. The plan is outdated by the time execution starts.

Blast Radius Is Unknown Until It's Too Late

Without a live dependency graph, you can't answer: "if this server goes down during migration, what else breaks?" Discovering that answer during a maintenance window — while production is degraded — is the most common cause of emergency rollbacks.

Right-Sizing Is Guesswork Without Telemetry

Lifting and shifting servers at their current spec can waste 40–60% of cloud spend. But right-sizing requires real utilization data collected over time, not the CMDB figures, which usually record capacity provisioned years ago rather than actual load.

TCO Models Are Disconnected from Reality

Executive-level cloud TCO projections are often built in Excel using list-price estimates against total server counts. They don't reflect the 6R disposition mix, actual core counts, utilization-adjusted sizing, or the cost difference between Rehost, Replatform, and Retire at scale.

No Data Masking for Client Demos and Proposals

Reference implementations built on real client data can't be demonstrated to prospects without a masking layer. Many firms maintain parallel anonymized datasets manually — which drift from the real data and undermine confidence in the demo.

Four phases: collect, model, simulate, deliver

Digital Twin Premium instruments the live estate, builds an in-memory dependency graph, runs five AI-driven simulations against it, and delivers results as an interactive HTML dashboard, multi-tab Excel report, and PDF executive brief — all from a single automated pipeline.

Phase 1: Collection. 30-day telemetry · Windows PowerShell and Linux Bash collectors · network TCP/UDP dependencies · Ansible mass deploy · SQLite data lake

Phase 2: Core Pipeline. FQDN matching · App ID enrichment · 8-rule 6R disposition · gap analysis · graph construction

Phase 3: AI Simulation. Wave rehearsal · blast radius walk · co-dependency BFS · right-sizing · cost projection

Phase 4: Outputs. HTML dashboard (D3.js) · Excel workbook (7 tabs) · PDF executive report · EARE AI agent export · masked demo dataset

System Architecture

Collection Layer

Datacenter hosts → SQLite

collect_windows.ps1 (WMI/CIM)
collect_linux.sh (/proc)
collect_network_windows.ps1
collect_network_linux.sh (ss -tnp)
aggregate_pipeline.py → perf_raw
aggregate_network.py → net_connections
deploy_collectors.yml (Ansible)

Compute Layer

Lambda + optional EC2

Lambda: run_pipeline.py
Lambda: export_dashboard_data.py
Lambda: eare_integration.py
Lambda: executive_report.py
EC2 t3.medium (optional, for aggregation)
CloudFormation nested stacks

Pipeline Core

Pure Python, no graph DB

data_loader.py (5 sources, pandas)
matching_engine.py (4-step FQDN)
disposition_engine.py (8-rule chain)
twin_simulator.py (in-memory graph)
masking.py (deterministic, 160+ names)
report_generator.py (openpyxl)

Storage & Output

S3 + 3 output formats

S3 prefixes: raw/ · processed/ · datalake/ · reports/ · dashboard/
AES-256 encryption · versioning · 90-day Glacier lifecycle
dashboard.html (Chart.js + D3.js, offline-capable)
report.xlsx (7 tabs, openpyxl)
executive_report.pdf
eare-export/ (cross-account S3 for AI agents)

Deterministic 6R Disposition Engine

Every server in the estate receives a deterministic 6R classification from an 8-rule priority chain. The first matching rule wins — no ambiguity, no LLM hallucination risk on the classification itself. AI enters downstream, in simulation and narrative generation.

Priority	Disposition	Condition
1	Retire	environment = "Decommissioned"
2	Retain (Already in Cloud)	in_azure = "Y" OR azure_status = "Already in Azure"
3	Repurchase (SaaS)	application_type = "SaaS"
4	Retain (Appliance)	application_name ∈ KNOWN_APPLIANCES (60 entries)
5	Refactor (Citrix)	is_citrix = "Y"
6	Replatform (Database)	database field populated AND ≠ "In Progress"
7	Rehost (Lift & Shift)	device_type = "virtual" AND no database AND not appliance
8	Retain (Physical)	device_type = "physical"
—	Review Required	No rule matched
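The rule chain maps naturally onto an ordered list of predicates. The sketch below is a minimal illustration, not the product's disposition_engine.py: each server is a plain dict keyed by the field names in the table, KNOWN_APPLIANCES is a two-entry placeholder for the real 60-entry list, and "no database" in rule 7 is read as "database field empty", which is an assumption.

```python
from typing import Callable

# Hypothetical two-entry stand-in for the 60-entry KNOWN_APPLIANCES list.
KNOWN_APPLIANCES = {"F5 BIG-IP", "Infoblox"}

Server = dict  # one server record, keyed by the field names used in the rule table


def _db_field(s: Server) -> str:
    return (s.get("database") or "").strip()


# (disposition, predicate) pairs in priority order; the first predicate that matches wins.
RULES: list[tuple[str, Callable[[Server], bool]]] = [
    ("Retire", lambda s: s.get("environment") == "Decommissioned"),
    ("Retain (Already in Cloud)",
     lambda s: s.get("in_azure") == "Y" or s.get("azure_status") == "Already in Azure"),
    ("Repurchase (SaaS)", lambda s: s.get("application_type") == "SaaS"),
    ("Retain (Appliance)", lambda s: s.get("application_name") in KNOWN_APPLIANCES),
    ("Refactor (Citrix)", lambda s: s.get("is_citrix") == "Y"),
    ("Replatform (Database)", lambda s: _db_field(s) not in ("", "In Progress")),
    # Assumption: "no database" means the database field is empty.
    ("Rehost (Lift & Shift)",
     lambda s: s.get("device_type") == "virtual" and _db_field(s) == ""
     and s.get("application_name") not in KNOWN_APPLIANCES),
    ("Retain (Physical)", lambda s: s.get("device_type") == "physical"),
]


def classify(server: Server) -> str:
    """Return the first matching disposition, or 'Review Required' if no rule fires."""
    for disposition, predicate in RULES:
        if predicate(server):
            return disposition
    return "Review Required"
```

Because classify() stops at the first match and always falls back to Review Required, every record receives exactly one label.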

AI Simulation Engine

Five simulations — one in-memory graph, zero external databases

The DigitalTwin class builds a Python dict-based graph from server detail records with five cross-reference indexes. ~1,000 servers load in milliseconds — no Neptune, no graph database licensing cost. AI simulations run against this graph to answer the questions that migration planners could previously only answer by trial and error.

Simulation 1

Wave Rehearsal

simulate_migration(hostnames)

Given a proposed wave (set of hostnames), identifies which application stacks are split across the wave boundary, maps cross-boundary dependencies that would break, and computes total resource load (cores, RAM, storage) for the wave.

Output: split stacks · cross-boundary dep count · resource totals per wave
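One way to implement the wave check, sketched under assumptions: stack_members maps a stack name to its hostnames, server_deps maps a hostname to the hostnames it depends on, and specs holds per-host cores, RAM, and storage. These argument shapes are illustrative and are not taken from the DigitalTwin API.

```python
from dataclasses import dataclass


@dataclass
class WaveReport:
    split_stacks: list[str]
    cross_boundary_deps: list[tuple[str, str]]
    totals: dict[str, float]


def simulate_migration(hostnames, stack_members, server_deps, specs) -> WaveReport:
    wave = set(hostnames)
    # A stack is split when some, but not all, of its members are in the wave.
    split = sorted(
        stack for stack, members in stack_members.items()
        if members & wave and not members <= wave
    )
    # A dependency crosses the boundary when exactly one endpoint is in the wave.
    cross = sorted(
        (src, dst)
        for src, targets in server_deps.items()
        for dst in targets
        if (src in wave) != (dst in wave)
    )
    # Resource load of the wave: sum each metric over the hosts being moved.
    totals = {
        key: sum(specs[h][key] for h in wave if h in specs)
        for key in ("cores", "ram_gb", "storage_gb")
    }
    return WaveReport(split, cross, totals)
```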

Simulation 2

Blast Radius

blast_radius(hostname)

Walks the inbound dependency graph from a single server hostname, discovering all servers and applications that depend on it — directly or transitively. Answers the critical question: "If this server goes offline, what breaks?"

Output: affected servers · affected apps · dependency chain depth
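A blast-radius walk amounts to a breadth-first search over reversed dependency edges. This minimal sketch assumes the same server_deps shape (hostname → hostnames it depends on) plus a server_app index (hostname → application names); "depth" counts hops from the failed server.

```python
from collections import deque


def blast_radius(hostname, server_deps, server_app):
    """Everything that depends on `hostname`, directly or transitively."""
    # Reverse the "src depends on dst" edges so we can walk from dst to its dependents.
    inbound: dict[str, set[str]] = {}
    for src, targets in server_deps.items():
        for dst in targets:
            inbound.setdefault(dst, set()).add(src)
    depth = {hostname: 0}  # hops from the failed server
    queue = deque([hostname])
    while queue:
        node = queue.popleft()
        for dependent in inbound.get(node, ()):
            if dependent not in depth:
                depth[dependent] = depth[node] + 1
                queue.append(dependent)
    affected = sorted(h for h in depth if h != hostname)
    apps = sorted({app for h in affected for app in server_app.get(h, ())})
    return {"affected_servers": affected, "affected_apps": apps,
            "max_depth": max(depth.values())}
```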

Simulation 3

Co-Dependency Groups

find_co_dependencies(dc=None)

BFS traversal of the full dependency graph to discover clusters of mutually dependent servers that must be migrated as a unit. Optional DC-scoped filter. These groups become the atomic building blocks of migration wave construction.

Output: co-dependency groups → natural migration wave candidates
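The sketch below is one reading of the co-dependency step: it treats every dependency edge as undirected and returns the connected components found by BFS, so no two groups share a dependency edge. Strictly "mutually dependent" clusters would be strongly connected components, a stricter variant not shown here. The server_dc index (hostname → datacenter) is an assumed input for the optional dc filter.

```python
from collections import deque


def find_co_dependencies(server_deps, server_dc, dc=None):
    """Group servers into connected components of the (undirected) dependency graph."""
    # Undirected adjacency: moving either endpoint alone would cut the link.
    adj: dict[str, set[str]] = {}
    for src, targets in server_deps.items():
        for dst in targets:
            adj.setdefault(src, set()).add(dst)
            adj.setdefault(dst, set()).add(src)
    nodes = set(server_dc) | set(adj)
    if dc is not None:  # optional DC-scoped view: ignore servers and edges outside the DC
        nodes = {h for h in nodes if server_dc.get(h) == dc}
    seen: set[str] = set()
    groups: list[list[str]] = []
    for start in sorted(nodes):
        if start in seen:
            continue
        seen.add(start)
        group, queue = [], deque([start])
        while queue:  # standard BFS from an unvisited server
            node = queue.popleft()
            group.append(node)
            for neighbor in adj.get(node, ()):
                if neighbor in nodes and neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        groups.append(sorted(group))
    return groups
```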

Simulation 4

Right-Sizing Analysis

rightsizing_analysis(hostnames)

Classifies each server into one of four utilization categories using 30 days of actual telemetry data, then computes recommended cloud specs and projected core savings. Replaces CMDB-based provisioned capacity with real observed utilization.

Output: category · current specs · recommended specs · core savings

Simulation 5

Cost Projection

export_dashboard_data.py

Computes per-disposition TCO using configurable per-core annual rates for both on-premises and cloud environments. Produces an aggregate financial model showing current spend, projected cloud spend, and annual savings per 6R category.

Output: per-disposition TCO · aggregate current vs. projected · annual savings

Right-Sizing: 4 Utilization Categories

ZOMBIE

CPU avg < 10% AND RAM avg < 20%

→ Recommend decommission review

OVERSIZED

CPU avg < 20% (any RAM)
Recommended: max(2, cores ÷ 2) · max(2, RAM × 0.6)

→ Halve cores, cut RAM 40%

MODERATE

CPU avg < 50% (any RAM)
Recommended: max(2, cores × 0.75) · max(2, RAM × 0.8)

→ Reduce cores 25%, RAM 20%

WELL-UTILIZED

CPU avg ≥ 50%
Current spec appropriate

→ Lift & shift at current spec
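The four categories translate directly into a small classifier. This is a sketch under assumptions: cpu_avg and ram_avg are 30-day averages in percent, RAM is in GB, and fractional core counts are rounded down before the 2-core floor is applied (the source does not specify rounding).

```python
import math


def rightsize(cores: int, ram_gb: float, cpu_avg: float, ram_avg: float) -> dict:
    """Classify one server; cpu_avg and ram_avg are 30-day average utilization in percent."""
    if cpu_avg < 10 and ram_avg < 20:
        # ZOMBIE: keep current spec pending a decommission review.
        return {"category": "ZOMBIE", "cores": cores, "ram_gb": ram_gb, "core_savings": 0}
    if cpu_avg < 20:
        category, core_factor, ram_factor = "OVERSIZED", 0.5, 0.6
    elif cpu_avg < 50:
        category, core_factor, ram_factor = "MODERATE", 0.75, 0.8
    else:
        return {"category": "WELL-UTILIZED", "cores": cores, "ram_gb": ram_gb, "core_savings": 0}
    # Floors of 2 cores and 2 GB RAM; rounding cores down is an assumption.
    rec_cores = max(2, math.floor(cores * core_factor))
    rec_ram = max(2, ram_gb * ram_factor)
    # Never report negative savings when the floor raises a very small server.
    return {"category": category, "cores": rec_cores, "ram_gb": rec_ram,
            "core_savings": max(0, cores - rec_cores)}
```

For example, a 16-core, 64 GB server averaging 15% CPU lands in OVERSIZED with an 8-core, 38.4 GB recommendation.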

Dependency Graph — Wave Boundary Analysis (Example)

Wave 2 (proposed): 8 servers across 3 stacks · 2 split stacks detected · 3 cross-boundary dependencies at risk

Epic Production Cluster → Epic Clarity Database · TCP 1433 (SQL) · ✓ In wave
Epic Production Cluster → Citrix VDI Farm · TCP 443 (HTTPS) · ⚠ NOT in wave, will break
Epic Clarity Database → Reporting Services Stack · TCP 8443 (API) · ⚠ NOT in wave, will break

Wave Rehearsal Recommendation

Move Citrix VDI Farm (12 servers) and Reporting Services Stack (4 servers) into Wave 2 to eliminate cross-boundary dependencies. Revised wave: 24 servers, all co-dependencies satisfied.

Per-Disposition Cost Model (Per Core / Year)

Disposition	On-Prem Cost	Cloud Cost	Savings/Core/Year
Retire	$150	$0	$150
Repurchase (SaaS)	$150	$60	$90
Replatform	$200	$130	$70
Refactor	$180	$110	$70
Rehost (Lift & Shift)	$150	$95	$55
Retain (Cloud)	$95	$95	$0
Retain (Appliance / Physical)	$200–250	$200–250	$0
8,420 cores at reference estate mix	~$1.52M	~$0.91M	~$610K/year
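The cost table reduces to per-disposition arithmetic: current = cores × on-prem rate, projected = cores × cloud rate, savings = the difference. A minimal sketch follows. The table gives Retain (Appliance / Physical) as a $200–250 range, so the sketch assumes $200 on both sides (savings are $0 either way); the core counts in the usage example are made up.

```python
# Per-core annual rates (on-prem, cloud) in USD, taken from the table above.
RATES: dict[str, tuple[int, int]] = {
    "Retire": (150, 0),
    "Repurchase (SaaS)": (150, 60),
    "Replatform": (200, 130),
    "Refactor": (180, 110),
    "Rehost (Lift & Shift)": (150, 95),
    "Retain (Cloud)": (95, 95),
    "Retain (Appliance / Physical)": (200, 200),  # table gives $200–250; low end assumed
}


def project_costs(cores_by_disposition: dict[str, int]) -> dict:
    by_disposition = {}
    for disposition, cores in cores_by_disposition.items():
        onprem_rate, cloud_rate = RATES[disposition]
        by_disposition[disposition] = {
            "current": cores * onprem_rate,
            "projected": cores * cloud_rate,
            "savings": cores * (onprem_rate - cloud_rate),
        }
    # Aggregate model: sum each column across dispositions.
    total = {
        key: sum(row[key] for row in by_disposition.values())
        for key in ("current", "projected", "savings")
    }
    return {"by_disposition": by_disposition, "total": total}
```

With 100 Rehost cores and 10 Retire cores, the totals come to $16,500 current, $9,500 projected, and $7,000 saved per year.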

Technical Architecture Details

The collection pipeline produces ~50 MB per 1,000 hosts over 30 days — a moderate data volume that doesn't justify managed database cost or operational complexity. SQLite provides zero-configuration, file-based storage that works identically on EC2, Lambda (via /tmp or EFS), and local development without any provisioning.

The perf_raw table uses a UNIQUE INDEX ON (hostname, timestamp) for deduplication. CSV drops from collectors may overlap, and the insert-or-ignore pattern guarantees that each (hostname, timestamp) sample is stored exactly once, no matter how many times it is loaded. The perf_summary view computes per-host aggregates (avg CPU, max RAM, storage totals) that the right-sizing engine uses directly.
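A minimal sketch of that pattern with Python's built-in sqlite3 module. The metric column names are hypothetical, and the view adds an avg_ram column (an assumption) because the right-sizing categories are defined on average RAM.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS perf_raw (
    hostname   TEXT NOT NULL,
    timestamp  TEXT NOT NULL,
    cpu_pct    REAL,
    ram_pct    REAL,
    storage_gb REAL
);
-- One row per (hostname, timestamp); duplicates are rejected by the index.
CREATE UNIQUE INDEX IF NOT EXISTS ux_perf_raw_host_ts ON perf_raw (hostname, timestamp);
CREATE VIEW IF NOT EXISTS perf_summary AS
SELECT hostname,
       AVG(cpu_pct)    AS avg_cpu,
       AVG(ram_pct)    AS avg_ram,
       MAX(ram_pct)    AS max_ram,
       MAX(storage_gb) AS storage_gb,
       COUNT(*)        AS samples
FROM perf_raw
GROUP BY hostname;
"""


def open_lake(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def load_samples(conn: sqlite3.Connection, rows) -> int:
    """Insert rows, skipping any (hostname, timestamp) already stored; return the row count."""
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT OR IGNORE INTO perf_raw VALUES (?, ?, ?, ?, ?)", rows)
    return conn.execute("SELECT COUNT(*) FROM perf_raw").fetchone()[0]
```

Reloading the same overlapping CSV batch leaves the row count unchanged, which is what makes re-running the aggregation safe.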

The DigitalTwin class builds a Python dict-based graph from server detail records with five cross-reference indexes (servers, stack_members, server_deps, app_servers, server_app). ~1,000 servers load in milliseconds.

This avoids Neptune costs (~$0.10/hr + storage), eliminates graph database provisioning and IAM complexity, and keeps the product self-contained with no external runtime dependencies. BFS for co-dependency discovery runs in-process in Python — no network round trips. The design decision trades raw query flexibility (which Neptune provides) for simplicity and zero operational overhead — the right tradeoff for an estate of this size.
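A minimal sketch of the five-index layout, assuming each input record carries hostname, stack, apps, and depends_on fields (these field names are hypothetical; the source does not list them). Every index is a plain dict built in a single pass, which is why loading stays fast at this estate size.

```python
from collections import defaultdict


class DigitalTwin:
    """Dict-based dependency graph with the five cross-reference indexes."""

    def __init__(self, records):
        self.servers = {}                      # hostname -> full record
        self.stack_members = defaultdict(set)  # stack name -> hostnames
        self.server_deps = defaultdict(set)    # hostname -> hostnames it depends on
        self.app_servers = defaultdict(set)    # application -> hostnames
        self.server_app = defaultdict(set)     # hostname -> applications
        for rec in records:  # single pass over the server detail records
            host = rec["hostname"]
            self.servers[host] = rec
            if rec.get("stack"):
                self.stack_members[rec["stack"]].add(host)
            for app in rec.get("apps", ()):
                self.app_servers[app].add(host)
                self.server_app[host].add(app)
            self.server_deps[host].update(rec.get("depends_on", ()))
```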

The DataMasker class uses sequential ID generation and a curated list of 160+ Epic ecosystem application names to produce a masked dataset that is referentially consistent across all outputs. The same input hostname always maps to the same SRV-NNNN — so the dependency graph, the dashboard, the Excel report, and the EARE export all reference the same masked identity.

This is deterministic masking, not tokenization. No masking database required. The masked dataset faithfully preserves all structural properties of the real data — stack membership, dependency edges, utilization patterns, disposition mix — while replacing all identifiers with Epic infrastructure zone names that read naturally in the context of enterprise IT (Epic Production Cluster, Epic Clarity Database, Citrix VDI Farm, etc.).
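A minimal sketch of deterministic, sequential masking. The two zone names stand in for the curated 160+ list. Because sequential IDs depend on the order in which names are first seen, this sketch adds a hypothetical prime() step that assigns IDs in sorted hostname order; that ordering choice is an assumption, not something the source specifies.

```python
class DataMasker:
    """Same input name -> same masked name, with no lookup database."""

    def __init__(self, zone_names):
        self.zone_names = list(zone_names)  # stand-in for the curated 160+ name list
        self._hosts: dict[str, str] = {}
        self._apps: dict[str, str] = {}

    def prime(self, hostnames) -> None:
        # Assign IDs in sorted order so the mapping doesn't depend on load order (an assumption).
        for name in sorted({h.strip().lower() for h in hostnames}):
            self.host(name)

    def host(self, hostname: str) -> str:
        key = hostname.strip().lower()  # case/whitespace normalization
        if key not in self._hosts:
            self._hosts[key] = f"SRV-{len(self._hosts) + 1:04d}"  # sequential SRV-NNNN
        return self._hosts[key]

    def app(self, name: str) -> str:
        if name not in self._apps:
            i = len(self._apps)
            base = self.zone_names[i % len(self.zone_names)]
            rnd = i // len(self.zone_names)
            # Suffix a counter once the curated list is exhausted, so names stay unique.
            self._apps[name] = base if rnd == 0 else f"{base} {rnd + 1}"
        return self._apps[name]
```

Since every output (graph, dashboard, workbook, export) calls the same masker instance, a hostname masked in one place resolves to the same identity everywhere else.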

When deployed alongside EARE Standard (the main cloud advisory product), Digital Twin Premium exports four data packages to an S3 prefix accessible via a cross-account bucket policy: disposition CSV, dependencies JSON (stack edges + server-to-server graph), right-sizing CSV, and cost projections JSON.

EARE's Bedrock AI agents consume these structured exports to generate advisory narratives, answer migration planning questions, and produce deal-specific TCO models. This creates a clean data pipeline: Digital Twin does the deterministic computation; AI does the synthesis and communication. Neither product requires the other — the integration activates via a single CloudFormation parameter (EnableEareIntegration=true).
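The four-package layout can be sketched locally; uploading to the S3 prefix and the cross-account bucket policy are out of scope here. File names and column sets below are hypothetical, and the function writes to a temporary directory when no output directory is given.

```python
import csv
import json
import tempfile
from pathlib import Path


def write_eare_export(dispositions, stack_edges, server_edges, rightsizing, costs, out_dir=None):
    """Write the four EARE packages to out_dir (a fresh temp dir if None) and return the file names."""
    out = Path(out_dir) if out_dir else Path(tempfile.mkdtemp())
    out.mkdir(parents=True, exist_ok=True)
    # 1. Disposition CSV: one row per (masked) hostname.
    with open(out / "disposition.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["hostname", "disposition"])
        writer.writerows(sorted(dispositions.items()))
    # 2. Dependencies JSON: stack-to-stack edges plus the server-to-server graph.
    deps = {"stack_edges": stack_edges, "server_edges": server_edges}
    (out / "dependencies.json").write_text(json.dumps(deps, indent=2))
    # 3. Right-sizing CSV: one dict per server.
    with open(out / "rightsizing.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["hostname", "category", "cores", "ram_gb"])
        writer.writeheader()
        writer.writerows(rightsizing)
    # 4. Cost projections JSON.
    (out / "cost_projections.json").write_text(json.dumps(costs, indent=2))
    return sorted(p.name for p in out.iterdir())
```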

The interactive dashboard is a single HTML file with embedded Chart.js (bar/pie charts), D3.js (dependency graph visualization), and Tailwind CSS. It loads dashboard_data.json from S3 and renders 12 data sections including disposition summaries, resource-by-disposition, TCO analysis, stack dependency graphs, and per-server detail tables.

No build step, no API dependency, no authentication surface area for the dashboard layer. The JSON file contains all masked data — the dashboard works fully offline by downloading both files. This makes it trivial to share with client stakeholders or embed in a deal portal without hosting infrastructure.

The matching engine, disposition engine, twin simulator, masking engine, cost projection model, and right-sizing engine are all validated with property-based tests (Hypothesis) in addition to unit tests. Properties include: CSV deduplication preserving unique (hostname, timestamp) pairs; disposition rules being mutually exclusive and exhaustive; masking being referentially consistent across all outputs; right-sizing recommendations always meeting minimum core/RAM floor constraints; and cost projections computing correct per-core savings.

These aren't just "happy path" unit tests — property tests generate hundreds of random inputs per property and verify the invariant holds for all of them. This is especially important for the matching engine, which handles per-DC column variations and edge cases in FQDN normalization.
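To keep the example dependency-free, the sketch below hand-rolls the property-testing idea with the standard random module rather than Hypothesis. It checks the floor-constraint property for the OVERSIZED and MODERATE formulas and ignores the ZOMBIE branch, which keeps current specs pending review. Rounding cores down is an assumption.

```python
import math
import random


def recommend(cores: int, ram_gb: float, cpu_avg: float) -> tuple[int, float]:
    """OVERSIZED / MODERATE sizing rules (floor rounding of cores is assumed)."""
    if cpu_avg < 20:
        return max(2, math.floor(cores * 0.5)), max(2, ram_gb * 0.6)
    if cpu_avg < 50:
        return max(2, math.floor(cores * 0.75)), max(2, ram_gb * 0.8)
    return cores, ram_gb  # WELL-UTILIZED: keep current spec


def check_floor_property(trials: int = 500, seed: int = 0) -> int:
    """Randomized check: downsized recommendations respect the 2-core / 2-GB floor and never grow."""
    rng = random.Random(seed)  # fixed seed so any failure is reproducible
    for _ in range(trials):
        cores = rng.randint(1, 128)
        ram = rng.uniform(1.0, 1024.0)
        cpu = rng.uniform(0.0, 100.0)
        rec_cores, rec_ram = recommend(cores, ram, cpu)
        if cpu < 50:
            assert rec_cores >= 2 and rec_ram >= 2, (cores, ram, cpu)
            # Only the floor itself may push a recommendation above the current spec.
            assert rec_cores <= max(cores, 2) and rec_ram <= max(ram, 2), (cores, ram, cpu)
    return trials
```

Hypothesis adds input shrinking and a failure database on top of this, which is why the source relies on it rather than plain random sampling.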

From multi-month planning exercise to data-driven, automated process

The primary transformation is replacing assumption-driven migration planning with evidence-driven migration planning. Every wave grouping, every right-sizing recommendation, and every TCO projection is backed by 30 days of actual telemetry and a live dependency graph.

Dimension	Before: Manual Migration Planning	After: Digital Twin Premium
Dependency mapping	Interviews + guesses	30-day live telemetry + graph
Wave planning duration	3–6 months	Hours after data collection
Blast radius discovery	During production outage	Simulated pre-migration
Right-sizing data source	CMDB provisioned capacity	Actual CPU/RAM avg over 30 days
TCO model basis	List prices × total count	Per-disposition, right-sized cores
Demo dataset	Manually maintained, drifts	Deterministic masking, always consistent
Plan currency	Stale by execution start	Re-run pipeline on latest telemetry

174

Dependency Edges Discovered

26 stack groups, 174 stack-to-stack dependency edges auto-discovered from live network telemetry — none of which existed in the CMDB before the collection run.

~40%

Right-Sizing Waste Captured

In typical enterprise estates, 30–50% of servers are in ZOMBIE or OVERSIZED categories. Telemetry-based right-sizing prevents lifting provisioned-capacity waste directly into cloud.

$0

External Database Costs

No Neptune, no RDS, no graph database licensing. SQLite + in-memory Python graph handles 1,053 servers, 2,715 apps, and 174 dependency edges with zero managed database overhead.

100%

Masking Referential Integrity

Deterministic masking ensures the dashboard, Excel report, PDF brief, and EARE export all reference the same masked identities — making the demo dataset behave exactly like real client data.

Dependency Data Is the Migration Plan

Migration projects rarely fail in execution; they fail in planning, because the dependency graph was never built. Digital Twin Premium makes building it an automated, repeatable, data-driven process.

AWS Lambda · Python 3.11 · SQLite · CloudFormation · Ansible · D3.js · Chart.js
