A multi-agent AI pipeline that analyzes enterprise application portfolios and produces auditable, provenance-tagged 6R cloud migration recommendations — with zero hallucinated numbers and built-in healthcare governance.
When a healthcare organization decides to migrate 36 clinical applications — PACS systems, oncology platforms, cardiology imaging — to the cloud, the analysis required is enormous. Traditional consulting approaches fail in predictable ways that cost time, money, and clinical trust.
Consultants produce ROI projections ($2.3M savings!) that cite no source, reference no formula, and can't be reproduced. When challenged, the analysis falls apart. Healthcare CIOs lose confidence in the entire recommendation.
A portfolio of 36 applications — each requiring vendor research, VM telemetry analysis, procurement intelligence, dependency mapping, and financial modeling — traditionally takes 3–6 months of analyst time at $150–$350/hr billing rates.
Generic cloud migration frameworks don't know that a radiation therapy linear accelerator interface is hardware-integrated and life-safety-classified — making cloud migration structurally impossible, regardless of whether a SaaS alternative exists.
When a healthcare compliance officer asks "why did you recommend REPURCHASE for Sectra PACS with 124 VMs?", there is no traceable chain from evidence to recommendation. The analysis is a black box.
Different analysts, different days, different answers. A portfolio analysis that takes three consultants three months has no reproducibility guarantee, a critical gap when the same portfolio needs re-analysis after a vendor acquisition.
A mid-size healthcare portfolio advisory engagement runs $250K–$750K in consulting fees. Most of that spend goes to data gathering and formatting — work that is inherently automatable if the AI is governed properly.
The platform is a multi-agent AI pipeline built on AWS Bedrock (Claude Sonnet 4.6) that processes an enterprise application portfolio through a chain of specialized agents — each governed by deterministic validation rules that run after every LLM call. The pipeline produces fully auditable recommendations where every number traces to a source.
vm_count × SRC-TCO-001 × cloud_multiplier with a fully traceable formula in the output JSON.
phi_handled, baa_confirmed, hardware_dependency, and life_safety_classification
are tri-state, not binary. The pipeline enforces this with a governance rule:
a missing field must not be read as false. A baa_confirmed=false inferred from a
missing field would cap confidence at 40 on a PHI-handling system, potentially
blocking a valid migration recommendation.
The tri-state model separates "we don't know" from "the answer is no."
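A minimal sketch of how tri-state parsing might look, assuming the third state is represented as Python's None. The field names come from the text above; the helper name parse_tri_state and the accepted string spellings are hypothetical.

```python
from typing import Optional

# Field names are taken from the text; everything else is illustrative.
TRI_STATE_FIELDS = (
    "phi_handled",
    "baa_confirmed",
    "hardware_dependency",
    "life_safety_classification",
)

def parse_tri_state(record: dict, field: str) -> Optional[bool]:
    """Return True, False, or None (unknown). A missing field is never False."""
    raw = record.get(field)
    if isinstance(raw, bool):
        return raw
    if isinstance(raw, str):
        text = raw.strip().lower()
        if text in ("true", "yes"):
            return True
        if text in ("false", "no"):
            return False
    # Missing, blank, or unrecognized input: "we don't know".
    return None

# Hypothetical intake record with baa_confirmed absent.
record = {"phi_handled": True}
flags = {name: parse_tri_state(record, name) for name in TRI_STATE_FIELDS}
```

Here flags["baa_confirmed"] is None rather than False, so downstream rules can tell an unanswered question from a confirmed "no".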
IF hardware_dependency == true AND (latency_sensitive == true OR life_safety_classification == true)
→ strategic_recommendation = RETAIN, unconditionally.
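The rule above could be expressed as a small deterministic function. This is a sketch only: the function name is hypothetical, and it assumes the three flags are stored as True, False, or None so that an unknown value never triggers the override.

```python
def apply_hard_retain_rule(app: dict, recommendation: str) -> str:
    """Force RETAIN when the hardware rule from the text matches."""
    # "is True" keeps unknown (None) from being treated as a match.
    hardware = app.get("hardware_dependency") is True
    latency = app.get("latency_sensitive") is True
    life_safety = app.get("life_safety_classification") is True
    if hardware and (latency or life_safety):
        return "RETAIN"
    return recommendation

# Hypothetical application record for illustration.
linac_interface = {
    "hardware_dependency": True,
    "latency_sensitive": None,
    "life_safety_classification": True,
}
result = apply_hard_retain_rule(linac_interface, "REPURCHASE")
```

Because the check runs outside the LLM, an agent's proposed disposition cannot talk its way past it.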
phi_handled=true AND baa_confirmed != true
snapshot_count=1 AND apm_telemetry_available=false
If final_confidence < 60 and the
strategic recommendation is not RETAIN, the displayed recommendation becomes RETAIN.
No exceptions. If an agent reasons that an exception applies, that reasoning is itself the
signal to apply the gate.
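As an illustration, the confidence gate might look like the following. The threshold of 60 comes from the text; the function name and the returned keys are assumptions, chosen so the original strategic recommendation stays visible next to the displayed one.

```python
CONFIDENCE_GATE = 60  # threshold stated in the text

def gate_recommendation(strategic_recommendation: str,
                        final_confidence: float) -> dict:
    """Return both the agent's recommendation and the one shown to reviewers."""
    gated = (final_confidence < CONFIDENCE_GATE
             and strategic_recommendation != "RETAIN")
    return {
        "strategic_recommendation": strategic_recommendation,
        "displayed_recommendation": "RETAIN" if gated else strategic_recommendation,
        "confidence_gate_applied": gated,
    }

low = gate_recommendation("REPURCHASE", 55)
high = gate_recommendation("REPURCHASE", 82)
```

Keeping both values in the output lets a reviewer see that the gate fired instead of silently losing the underlying analysis.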
scenario_bridge.py runs with zero LLM calls.
It reads the completed recommendations JSON and produces a deterministic comparison
of current state (on-prem, full TCO) versus target state (recommended disposition,
projected cloud cost) for every application.
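A rough sketch of what such a deterministic comparison could look like. It is not the actual scenario_bridge.py: the function name, the app_name key, and the assumption that each cost is stored under a nested "value" key are all hypothetical.

```python
def build_scenario_rows(recommendations: list[dict]) -> list[dict]:
    """Compare current and target state per application, with no LLM involved."""
    rows = []
    # Sorting makes the output order independent of input order.
    for app in sorted(recommendations, key=lambda a: a["app_name"]):
        current = app["current_annual_tco_usd"]["value"]
        target = app["projected_cloud_cost_usd"]["value"]
        rows.append({
            "app_name": app["app_name"],
            "disposition": app["strategic_recommendation"],
            "current_annual_tco_usd": current,
            "projected_cloud_cost_usd": target,
            "annual_delta_usd": current - target,
        })
    return rows

# Hypothetical input, reusing the figures from the sample output on this page.
sample = [{
    "app_name": "example-app",
    "strategic_recommendation": "REPLATFORM",
    "current_annual_tco_usd": {"value": 117000},
    "projected_cloud_cost_usd": {"value": 31200},
}]
rows = build_scenario_rows(sample)
```

The same input always yields the same rows, which is the property that makes the comparison reproducible.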
The core architectural principle is that hallucination must be structurally impossible to hide — not merely discouraged. This is achieved through a provenance model that requires every numeric value to carry one of four explicit tags. A deterministic Python validator (22 rules, no LLM) runs after every agent call and rejects non-compliant output.
Value has a named industry source. References the Sourced Values Registry (Gartner, IDC, AWS MAP data). Validator rejects if citation doesn't resolve.
Value came from customer-supplied data. Requires source artifact (e.g., rvtools_export.xlsx) and source field. Validator rejects if artifact is not in the input manifest.
Derived from a formula whose inputs are themselves provenance-tagged. Requires formula, inputs, and computation. Validator recomputes and rejects on mismatch > 0.01.
Documented default used when data is missing. References the Default Assumptions Registry. Surfaced explicitly in the output so reviewers see every assumption made.
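The tag checks described above can be sketched as a recursive function. The tag names SOURCED, CUSTOMER_PROVIDED, and COMPUTED appear in the sample output on this page; the DEFAULT tag name, the default_id key, and the function signature are assumptions made for illustration.

```python
# Required keys per provenance tag. "DEFAULT" and "default_id" are hypothetical.
REQUIRED_FIELDS = {
    "SOURCED": ("source_citation",),
    "CUSTOMER_PROVIDED": ("source_artifact",),
    "COMPUTED": ("formula", "inputs", "computation"),
    "DEFAULT": ("default_id",),
}

def check_provenance(node: dict, sourced_registry: set,
                     input_manifest: set) -> list[str]:
    """Return a list of violations; an empty list means the node is compliant."""
    tag = node.get("provenance")
    if tag not in REQUIRED_FIELDS:
        return [f"unknown or missing provenance tag: {tag!r}"]
    errors = [f"{tag} value missing {key}"
              for key in REQUIRED_FIELDS[tag] if key not in node]
    citation = node.get("source_citation")
    if tag == "SOURCED" and citation is not None and citation not in sourced_registry:
        errors.append(f"citation does not resolve: {citation}")
    artifact = node.get("source_artifact")
    if tag == "CUSTOMER_PROVIDED" and artifact is not None and artifact not in input_manifest:
        errors.append(f"artifact not in input manifest: {artifact}")
    if tag == "COMPUTED":
        # Inputs to a formula must themselves be provenance-tagged.
        for child in node.get("inputs", []):
            errors.extend(check_provenance(child, sourced_registry, input_manifest))
    return errors

untagged = check_provenance({"value": 117000}, {"SRC-TCO-001"}, {"rvtools_export.xlsx"})
```

A bare number with no tag produces a violation immediately, which is what prevents an unsourced figure from reaching the report unnoticed.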
{
  "current_annual_tco_usd": 117000,
  "projected_cloud_cost_usd": 31200,
  "annual_savings_usd": 85800
}
// Where did $117,000 come from?
// No source. No formula. Unverifiable.
{
  "current_annual_tco_usd": {
    "value": 117000,
    "provenance": "COMPUTED",
    "formula": "vm_count × on_prem_tco_per_vm",
    "inputs": [
      {"field": "vm_count", "value": 18,
       "provenance": "CUSTOMER_PROVIDED",
       "source_artifact": "rvtools_export.xlsx"},
      {"field": "on_prem_tco_per_vm", "value": 6500,
       "provenance": "SOURCED",
       "source_citation": "SRC-TCO-001",
       "source_basis": "Gartner IaaS TCO 2022-24"}
    ],
    "computation": "18 × 6500 = 117000"
  }
}
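To show how a COMPUTED value like the one above might be re-verified, here is a minimal sketch that handles product-only formulas. The 0.01 tolerance comes from the validator description; the function name and the approach of splitting the formula string on "×" are assumptions, not the platform's actual parser.

```python
import math

TOLERANCE = 0.01  # mismatch threshold stated in the text

def recompute_product(node: dict, tolerance: float = TOLERANCE) -> bool:
    """Return True when node["value"] matches the product of its formula inputs."""
    values = {item["field"]: item["value"] for item in node["inputs"]}
    factors = [name.strip() for name in node["formula"].split("×")]
    if any(name not in values for name in factors):
        return False  # the formula references an input that was not supplied
    expected = math.prod(values[name] for name in factors)
    return abs(expected - node["value"]) <= tolerance

# The sample value from this page, reduced to the keys the check needs.
tco = {
    "value": 117000,
    "formula": "vm_count × on_prem_tco_per_vm",
    "inputs": [
        {"field": "vm_count", "value": 18},
        {"field": "on_prem_tco_per_vm", "value": 6500},
    ],
}
is_consistent = recompute_product(tco)
```

If an agent reported 117500 for the same inputs, the recomputed product of 18 × 6500 would differ by 500 and the value would be rejected.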
A centralized table of every numeric constant the system uses (on-prem TCO per VM by type, cloud cost multipliers, migration factors), each with a Gartner / IDC citation and a variance band. Agents reference Source IDs; they cannot override values without a documented reason.
A Python program (not an LLM) runs after every agent call. It mechanically checks provenance tags, resolves source citations, recomputes arithmetic, and gates downstream agents on failure. PASS / PASS_WITH_DEFAULTS / FAIL — explicit, logged, surfaced.
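One way the three outcomes could be derived and used to gate downstream agents is sketched below. The status names come from the text; the mapping of PASS_WITH_DEFAULTS to "no violations, but at least one default was used" is inferred from the name, and both function names are hypothetical.

```python
from enum import Enum

class ValidationStatus(Enum):
    PASS = "PASS"
    PASS_WITH_DEFAULTS = "PASS_WITH_DEFAULTS"
    FAIL = "FAIL"

def summarize(errors: list[str], defaults_used: list[dict]) -> ValidationStatus:
    """Collapse validator findings into one explicit status."""
    if errors:
        return ValidationStatus.FAIL
    if defaults_used:
        return ValidationStatus.PASS_WITH_DEFAULTS
    return ValidationStatus.PASS

def may_run_downstream(status: ValidationStatus) -> bool:
    """Downstream agents run only when the upstream output did not fail."""
    return status is not ValidationStatus.FAIL

status = summarize(errors=[], defaults_used=[{"default_id": "example"}])
```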
When an agent returns malformed JSON, the pipeline automatically retries up to 2 times with an explicit correction instruction injected into the next call. Parse failures and retries are logged and surfaced in the token report — never silently swallowed.
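The retry behavior could be sketched as follows. The limit of two retries is from the text; call_agent stands in for whatever function invokes the model and is a hypothetical placeholder, as are the wording of the correction instruction and the log entry layout.

```python
import json
from typing import Callable

MAX_RETRIES = 2  # retry limit stated in the text

def call_with_json_retry(call_agent: Callable[[str], str],
                         prompt: str, log: list) -> object:
    """Call the agent, retrying with a correction instruction on invalid JSON."""
    attempt_prompt = prompt
    for attempt in range(MAX_RETRIES + 1):
        raw = call_agent(attempt_prompt)
        try:
            return json.loads(raw)
        except json.JSONDecodeError as exc:
            # Record the failure so it can be surfaced in reporting.
            log.append({"attempt": attempt, "error": str(exc)})
            attempt_prompt = (
                f"{prompt}\n\nYour previous reply was not valid JSON ({exc}). "
                "Reply with a single valid JSON object only."
            )
    raise ValueError(f"agent returned invalid JSON after {MAX_RETRIES} retries")
```

Passing the log in as an argument keeps every parse failure visible to the caller, whether or not a later attempt succeeds.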
Every default value used (when customer data is missing) is surfaced in a top-level
default_assumptions_used array in the output. Reviewers see exactly what
the system assumed, why, and whether better data would resolve it.
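A sketch of how the default_assumptions_used array might be assembled by walking the output. The array name is from the text; the DEFAULT tag name, the default_id key, and the entry layout are hypothetical, and the sample value is illustrative only.

```python
DEFAULT_TAG = "DEFAULT"  # hypothetical name for the default-assumption tag

def collect_defaults(node: object, path: str = "") -> list[dict]:
    """Walk nested output and list every value that relied on a default."""
    found = []
    if isinstance(node, dict):
        if node.get("provenance") == DEFAULT_TAG:
            found.append({
                "path": path,
                "value": node.get("value"),
                "default_id": node.get("default_id"),
            })
        for key, child in node.items():
            found.extend(collect_defaults(child, f"{path}.{key}" if path else key))
    elif isinstance(node, list):
        for index, child in enumerate(node):
            found.extend(collect_defaults(child, f"{path}[{index}]"))
    return found

# Illustrative output fragment; the identifier and number are made up.
output = {"app": {"cloud_multiplier": {
    "value": 0.8, "provenance": DEFAULT_TAG, "default_id": "DEF-EXAMPLE"}}}
output["default_assumptions_used"] = collect_defaults(output)
```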
This platform doesn't just automate the analysis — it changes what's possible. Engagements that were previously gated by analyst headcount can now be run iteratively, updated when new telemetry arrives, and re-executed against a changed portfolio without starting over.
This v4 run analyzed the full client Imaging & Radiology portfolio: 36 clinical applications spanning PACS systems (Sectra, 124 VMs), oncology (Mosaiq/Elekta), cardiology imaging (Lumedx Apollo, GE Centricity), radiology AI (RapidAI, Viz.ai, HeartFlow), and enterprise imaging (Hyland OnBase, 55 VMs). The pipeline correctly applied life-safety RETAIN rules to Philips Perinatal, Philips Patient Monitoring, and Mosaiq — systems where hardware integration makes cloud migration structurally blocked regardless of SaaS availability. The final report was published to a live S3-hosted dashboard accessible to the engagement team within hours of data ingestion.
Full token accounting from the 2026-05-22 production run — 36 apps, 408 VMs, Claude Sonnet 4.6 on AWS Bedrock.
The full Imaging & Radiology portfolio report and executive summary are live-hosted on AWS S3. Both are fully interactive with sortable tables, disposition breakdowns, and per-application evidence trails.
Built on AWS Bedrock · Claude Sonnet 4.6 · Python 3.11 · ~3M tokens · $24 total compute