Chapter 16.2
Subsystem Roadmaps 2026 → 2030 (Consolidated)
Every subsystem in the AI data center is climbing the same exponential at once — power, cooling, silicon, fabric, and storage are all replatforming on overlapping three-year clocks — so the only roadmap that matters is the one that asks which irreversible substrate you must over-build today to absorb a density ramp you have not yet committed to.
What you'll decide here
- Whether to plumb and wire the facility substrate (floor loading, water headroom, electrical risers, pipe-rack space, voltage class) for the 2027–2028 density step — ~600 kW Kyber-class racks on 800 VDC — or accept that you have built a current-generation, ~130 kW liquid hall that cannot absorb the ramp without a tear-out.
- Which power architecture you commit the building to now: stay on 415/480 VAC with rack-level rectification, or take the ±400/800 VDC disaggregated-sidecar path that the >150 kW rack physically requires — the single least-reversible electrical decision in the project.
- Which scale-up standard you underwrite — NVLink/NVLink Fusion, UALink/UALoE, or Broadcom SUE/ESUN — because that bet sets your accelerator-vendor optionality, your copper-vs-optics reach budget, and whether co-packaged optics is on your 2027 critical path.
- Whether to design the cooling plant for single-phase direct-to-chip as table stakes while reserving the thermal and mechanical headroom for the microfluidic / two-phase step that the >2 kW-per-die generation will demand.
- Which roadmap items are reversible (accelerator generation within an envelope, oversubscription ratio, storage tier mix, transceiver form factor) and can be deferred — versus the handful that are irreversible and must be hedged in concrete and copper at scoping time.
The forward-looking sections scattered through Parts 4 through 9 each end with the same uneasy sentence: this subsystem is about to replatform. Power is moving off AC. Cooling is moving off air, and single-phase liquid is already being framed as the floor rather than the ceiling. Silicon is doubling memory and tripling fabric bandwidth on a yearly cadence. The network is fighting a three-way standards war while optics migrate onto the package. Storage is being pulled into the GPU's memory hierarchy. None of these is a standalone trend. They are five faces of one phenomenon — the power-bound density ramp — and they are synchronized, because they are all downstream of the same accelerator roadmap. This chapter consolidates those forward pointers into a single 2026 → 2030 view and does what a roadmap is actually for: it sorts the moves you can defer from the ones you must commit before steel is cut.
A roadmap is a register of option premiums. Betting on the wrong generation of GPU is reversible: you buy the next tray. The expensive mistake of the 2026 era is building a substrate that cannot absorb the generation after next, because the floor, the water, the risers, and the voltage class are the things you cannot retrofit cheaply once the hall is live. We walk the five subsystems in the order the cascade flows — power, cooling, compute/memory, network/optics, storage — and close on the rack and facility integration that ties them together, because by 2027 the unit of design stops being the 19-inch rack and becomes the ~600 kW power-and-cooling chassis.
Power: 415 VAC → 800 VDC, solid-state transformers, behind-the-meter generation
The power chain is the subsystem under the most acute roadmap pressure, because it sits at the intersection of two exponentials: the per-rack power draw climbing from ~40 kW (H100, 2023) through ~120–142 kW (GB200/GB300, 2024–2025) to ~190–230 kW (Vera Rubin, 2026) and ~600 kW (Rubin Ultra Kyber, H2 2027), and the grid-side scarcity that made power the binding constraint in the first place (Chapter 16.1). At ~40 kW you can rectify AC inside the rack and not think about it. At ~600 kW you cannot: the copper cross-section, the rectifier count, and the I²R losses of pushing that power at 415/480 VAC to the rack become physically and economically untenable. The roadmap answer is a step-change in distribution voltage, and it is a one-way door.
The fork is between staying on the familiar 415/480 VAC chain with rack-level rectification, which is fine through roughly 150 kW, and taking the ±400/800 VDC disaggregated-sidecar path that NVIDIA, the OCP Mt Diablo project (Google/Meta/Microsoft), and the major electrical vendors (ABB, Eaton, Schneider, Vertiv) have converged on for the >150 kW rack. 800 VDC carries over 150% more power through the same copper than 415/480 VAC, and the end-to-end electrical chain efficiency rises from the AC chain's ~61–87.5% to over 92% on a DC path. Against the best-case AC chain that is about a 5% end-to-end gain, which at gigawatt scale is tens of megawatts of recovered capacity. The catch is that the DC ecosystem is immature: the solid-state transformer (SST) that makes medium-voltage-to-800 VDC conversion efficient hit ~98% at 400 kW in an ETH Zurich benchmark and targets ~99%, but UL listing is not expected until roughly 2029. The decision you face in 2026 is therefore to design the building's voltage class and riser topology for the DC future while bridging with conventional gear until the SST and DC-breaker supply chain matures.
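The "tens of megawatts" figure can be checked with simple arithmetic. The sketch below is illustrative only: the 1 GW campus size is a hypothetical input, the AC chain is taken at its best-case 87.5%, and the "over 92%" DC figure is rounded down to exactly 92%. The function names are invented for this example.

```python
def delivered_mw(grid_mw: float, chain_efficiency: float) -> float:
    """Power that reaches the silicon after end-to-end conversion losses."""
    if not 0.0 < chain_efficiency <= 1.0:
        raise ValueError("chain_efficiency must be in (0, 1]")
    return grid_mw * chain_efficiency


def recovered_mw(grid_mw: float, eff_ac: float, eff_dc: float) -> float:
    """Extra delivered power from the same grid draw on the more efficient chain."""
    return delivered_mw(grid_mw, eff_dc) - delivered_mw(grid_mw, eff_ac)


# Assumed inputs: hypothetical 1 GW grid draw, best-case AC chain (87.5%),
# DC chain taken at exactly 92%.
gain = recovered_mw(1000.0, eff_ac=0.875, eff_dc=0.92)
print(f"Recovered capacity: {gain:.0f} MW")
```

Under these assumptions the same grid draw delivers roughly 45 MW more to the load. Measured against the low end of the AC range (61%) the gap would be far larger, so the best-case comparison is the conservative one.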
Underneath the distribution question sits the generation question. The interconnection wall has pushed behind-the-meter (BTM) on-site generation from a fringe tactic to a mainline strategy: ~82–101 GW of BTM gas has been announced cumulatively by 2026 (though only ~7 GW is under construction and ~2–3 GW online), because energizing megawatts on-site in 18–36 months beats waiting four to seven years in a utility large-load queue. The roadmap consequence is that the power subsystem is no longer just a distribution-engineering problem inside the fence: it is a generation, fuel-supply, and grid-interactive problem, and the building must be designed to host the generation it may need as a bridge to grid power. → DC architecture in Chapter 4.7; on-site generation in Chapter 4.8; the queue and speed-to-power in Chapter 3.2.
| Decision axis | Stay 415/480 VAC (rack rectification) | ±400/800 VDC (disaggregated sidecar) |
|---|---|---|
| Density ceiling served | Comfortable to ~150 kW/rack | Designed for >150 kW → ~600 kW Kyber → ~1 MW racks |
| End-to-end chain efficiency | ~61–87.5% (utility-to-VRM, AC) | Over 92% on the DC path (~5% gain over the best-case AC chain) |
| Copper / busway sizing | Baseline; rises sharply past 150 kW | Over 150% more power through the same copper |
| Ecosystem maturity (2026) | Mature, fully UL-listed, low risk | Immature; SST UL listing ~2029, DC breakers ramping |
| Reversibility | Low risk now, but caps the density ramp | Irreversible substrate bet; unlocks the 2027–2028 step |
| Best-fit decision | Current-gen inference halls, retrofits, bridges | New-build training / dense-inference campuses with a multi-gen ramp |
Cooling: single-phase DLC as table stakes, two-phase and microfluidics on deck
Cooling crossed its decisive fork earlier than the other subsystems, and the roadmap reflects a settled near-term and a contested far-term. The near-term is settled: single-phase direct-to-chip liquid cooling is the 2026 default, holding roughly 55% of the liquid-cooling market, because air saturates near 41 kW/rack and the GB200 NVL72 draws ~120–132 kW. Designing a new dense hall for anything but liquid in 2026 is a category error. The forward question is what comes after single-phase D2C when per-die thermal density keeps climbing toward and past 2 kW.
Here the roadmap forks into two candidate successors, and the fork is shaped as much by chemistry and liability as by thermodynamics. Two-phase approaches (two-phase D2C and two-phase immersion) offer better heat transfer by exploiting latent heat of vaporization — but two-phase immersion stalled hard when the PFAS health-and-liability crisis drove 3M to exit the Novec fluorochemical business, leaving the supply chain and regulatory picture for the dielectric fluids deeply uncertain. Microfluidics — etching coolant channels directly into the silicon, as in Microsoft's research demonstrating up to ~3x cold-plate performance with AI-designed bio-inspired channels — is the more promising far-term path because it attacks the thermal resistance at its source (the die-to-coolant interface) rather than downstream of it. Neither is a 2026 production decision. The roadmap consequence is conservative and specific: build for single-phase D2C now, but reserve the mechanical and facility-water headroom (CDU capacity, secondary-loop delta-T margin, manifold and quick-disconnect provisioning) so the hall can adopt the successor without re-plumbing. → DLC in Chapter 5.4; immersion and the PFAS problem in Chapter 5.5; facility water loops in Chapter 5.7.
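To make "secondary-loop delta-T margin" concrete, the sketch below applies the sensible-heat relation Q = m_dot * c_p * delta_T to estimate the coolant flow a rack needs. It assumes plain-water properties and that all rack heat goes to liquid. Real loops often run a glycol mix with different properties, and the 10 K delta-T is a hypothetical design point, not a figure from this chapter.

```python
# Assumed fluid properties for plain water near room temperature.
WATER_CP_J_PER_KG_K = 4186.0
WATER_DENSITY_KG_PER_M3 = 997.0


def coolant_flow_lpm(heat_kw: float, delta_t_k: float,
                     cp: float = WATER_CP_J_PER_KG_K,
                     density: float = WATER_DENSITY_KG_PER_M3) -> float:
    """Volumetric flow (litres/minute) needed to carry heat_kw at a given delta-T."""
    if delta_t_k <= 0:
        raise ValueError("delta_t_k must be positive")
    mass_flow_kg_s = heat_kw * 1000.0 / (cp * delta_t_k)   # Q = m_dot * cp * dT
    volume_flow_m3_s = mass_flow_kg_s / density
    return volume_flow_m3_s * 1000.0 * 60.0                # m^3/s -> L/min


# Hypothetical design point: 10 K rise across the rack.
for rack_kw in (132.0, 600.0):
    print(f"{rack_kw:.0f} kW rack: {coolant_flow_lpm(rack_kw, 10.0):.0f} L/min")
```

Flow scales linearly with heat load and inversely with delta-T, so a hall piped for a ~130 kW rack needs several times the flow, or a wider delta-T, to serve a ~600 kW chassis. That is the headroom the paragraph above asks you to reserve.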
Compute & memory: Vera Rubin → Rubin Ultra/Kyber → Feynman, HBM4, the advanced-packaging gate
The accelerator roadmap is the metronome the whole building marches to, because every other subsystem's clock is set by it. The publicly stated cadence is yearly: Vera Rubin (VR200, ~190–230 kW rack) in 2026; Rubin Ultra in the ~600 kW Kyber NVL576 rack in H2 2027 (576 GPU dies across 144 quad-die packages, ~15 ExaFLOPS FP4 inference / 5 ExaFLOPS FP8 training per rack); and Feynman in 2028, widely expected to push toward the 1 MW rack. Memory climbs in lockstep: per-GPU HBM goes H100 80 GB → B200 192 GB → B300 288 GB → Rubin 288 GB (HBM4) → Rubin Ultra 1 TB (HBM4e). For a strategist the takeaway is the cadence, not the spec sheet. A yearly generation step with a roughly 2–3 year frontier-economic life (the contested depreciation figure from Chapter 1.8) means the substrate you pour in 2026 must absorb at least two generations of density growth within its first refresh window.
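The "at least two generations" claim is a division of economic life by cadence. The sketch below makes that explicit and attaches a density multiple to it. The rack figures are assumptions drawn from this chapter: 210 kW is the midpoint of the ~190–230 kW Vera Rubin range, and 1,000 kW for 2028 is the expected direction, not a published specification.

```python
import math

# Assumed rack power by launch year (kW); see the caveats in the lead-in.
RACK_KW_BY_YEAR = {2026: 210.0, 2027: 600.0, 2028: 1000.0}


def generation_steps(economic_life_years: float, cadence_years: float = 1.0) -> int:
    """Number of accelerator generation steps that land inside one economic life."""
    return math.floor(economic_life_years / cadence_years)


def density_multiple(build_year: int, economic_life_years: float) -> float:
    """Rack-power growth the substrate must absorb before its first refresh."""
    last_year = build_year + generation_steps(economic_life_years)
    # Clamp to the last year we have a figure for rather than extrapolating.
    last_year = min(last_year, max(RACK_KW_BY_YEAR))
    return RACK_KW_BY_YEAR[last_year] / RACK_KW_BY_YEAR[build_year]


print(generation_steps(2.0), generation_steps(3.0))
print(f"{density_multiple(2026, 2.0):.1f}x rack power within the window")
```

On these assumptions a hall scoped in 2026 sees rack power grow by roughly 4–5x inside a two-year frontier life, which is why the substrate, not the tray, is the thing to over-build.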
The real roadmap risk on this axis is the advanced-packaging gate, not the GPU. The binding constraint sits downstream of wafer fabrication: TSMC CoWoS capacity and HBM stacking, not transistor supply. 2026 HBM3E is effectively sold out with a ~30% supply gap and ~15–20% per-quarter price rises, and HBM4 (qualified with Samsung and SK hynix for Rubin) inherits the same hybrid-bonding and CoWoS-wafer bottleneck. The consequence for a data-center program is counterintuitive: your delivery schedule is gated less by your capex and more by an upstream packaging line you do not control. A scope that assumes GPU availability on the vendor's stated cadence, without an allocation agreement, is underwriting a schedule it cannot guarantee. → NVIDIA roadmap in Chapter 7.2; HBM in Chapter 7.6; advanced packaging in Chapter 7.7.
Deep dive: why the packaging gate, not the fab, sets your delivery date
The intuitive model of GPU supply is a fab-limited one: more EUV wafers, more chips. That model has been wrong since Blackwell. The genuine bottleneck moved one step down the line, to advanced packaging — TSMC's CoWoS (chip-on-wafer-on-substrate) that integrates the logic die with its HBM stacks — and to the HBM stacking itself, where hybrid bonding limits how fast capacity can grow. The numbers tell the story: in 2026 HBM3E is fully allocated with a supply gap on the order of 30%, prices are rising ~15–20% per quarter, and CoWoS wafer starts — not GPU die yield — are what the hyperscalers are actually fighting over.
The consequence for facility planning is direct and frequently missed. You can energize the megawatts, pour the slab, plumb the liquid, and string the fabric, and still have empty racks because your accelerators are stuck behind someone else's packaging allocation. This inverts the usual planning order: in the power-bound era you race to energize power, but the GPUs that fill that power are gated by an upstream line measured in CoWoS wafers per month. The defensible scope treats accelerator allocation as a long-lead item on par with the transformer and the interconnection agreement — contracted, not assumed — and phases the capacity ramp to the packaging supply curve rather than to the vendor's headline launch date. → Chapter 7.7.
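Phasing the ramp to the packaging supply curve amounts to taking, period by period, the smaller of the power you have energized and the power your delivered racks can draw. The sketch below illustrates that with invented quarterly numbers. The energization schedule, rack deliveries, and the 210 kW rack figure are all hypothetical inputs.

```python
def populated_mw(energized_mw: list[float], racks_delivered: list[int],
                 rack_kw: float) -> list[float]:
    """Per period: load actually populated = min(energized power, delivered-rack draw)."""
    populated = []
    cumulative_racks = 0
    for mw_ready, racks in zip(energized_mw, racks_delivered):
        cumulative_racks += racks
        rack_draw_mw = cumulative_racks * rack_kw / 1000.0
        populated.append(min(mw_ready, rack_draw_mw))
    return populated


# Hypothetical plan: power energized per quarter (cumulative MW) versus
# racks released by the accelerator allocation (per quarter).
energized = [50.0, 100.0, 150.0, 200.0]
deliveries = [100, 100, 200, 300]

filled = populated_mw(energized, deliveries, rack_kw=210.0)
stranded = [ready - used for ready, used in zip(energized, filled)]
print("populated MW:", filled)
print("stranded MW: ", stranded)
```

In this made-up plan the energized megawatts outrun the allocation in every quarter, so the stranded column is the cost of scheduling to the launch date instead of to contracted supply.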
Network & optics: the scale-up wars, Ultra Ethernet, and co-packaged optics
The network roadmap is where the most consequential standards bet lives, because unlike power or cooling, where physics largely dictates the answer, the fabric question is partly a political one about which ecosystem you tie your accelerator optionality to. Two distinct fabrics are evolving in parallel. Scale-up (intra-rack, the memory-semantic domain that binds the GPUs in a rack into one logical accelerator) is the contested ground: NVIDIA's NVLink (1.8 TB/s/GPU on Gen5, 3.6 TB/s on Gen6) with NVLink Fusion opening the IP to third-party CPUs/XPUs; the open UALink / UALoE consortium path (AMD-led, up to 1,024 accelerators in the 1.0 spec); and Broadcom's SUE/ESUN (Scale-Up Ethernet). Scale-out (inter-rack, the all-reduce fabric) is consolidating faster around Ultra Ethernet (UEC 1.0: packet spray with reorder, UCCM congestion control, native RDMA) as the open answer to InfiniBand, alongside NVIDIA's Spectrum-X.
The scale-up standard you underwrite is a multi-year lock-in decision: it sets which accelerators you can rack, how large a scale-up domain you can build (and therefore your tensor-/expert-parallel ceiling), and your copper-vs-optics reach budget. Choose NVLink and you get the largest, most mature scale-up domains and the deepest software stack, at the price of single-vendor accelerator dependence. Choose UALink or SUE and you preserve multi-vendor optionality at the price of a younger ecosystem and smaller proven domains. Underneath both sits the optics transition: as lane rates climb to 200G and 448G and reach budgets shrink (passive copper ~1–2 m at 800G/1.6T), co-packaged optics moves the optical engine onto the switch package, cutting per-1.6T-link power from ~30 W (pluggable DSP) to ~9 W and delivering ~3.5x better energy efficiency with a claimed 10x gain in resilience. NVIDIA's Quantum-X Photonics (InfiniBand) reached availability in early 2026 and Spectrum-X Photonics (Ethernet) follows in H2 2026. The roadmap consequence: if your 2027 build uses a ~600 kW Kyber rack, rack-to-rack optics and CPO are very likely on your critical path, not an option. → scale-out standards in Chapter 8.4; topology and oversubscription in Chapter 8.5; CPO and fiber plant in Chapter 8.10.
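The per-link optics figures translate into fabric-level power with one multiplication. The sketch below uses the ~30 W and ~9 W per-1.6T-link figures quoted above. The link count is a hypothetical fabric size chosen for illustration.

```python
# Per-1.6T-link optics power as quoted in this chapter (approximate).
PLUGGABLE_W_PER_LINK = 30.0
CPO_W_PER_LINK = 9.0


def optics_power_kw(num_links: int, watts_per_link: float) -> float:
    """Total optics power for a fabric with num_links 1.6T links."""
    return num_links * watts_per_link / 1000.0


def cpo_saving_kw(num_links: int) -> float:
    """Power recovered by moving every link from pluggable DSP optics to CPO."""
    return (optics_power_kw(num_links, PLUGGABLE_W_PER_LINK)
            - optics_power_kw(num_links, CPO_W_PER_LINK))


links = 100_000  # hypothetical fabric size
print(f"pluggable: {optics_power_kw(links, PLUGGABLE_W_PER_LINK):.0f} kW")
print(f"CPO:       {optics_power_kw(links, CPO_W_PER_LINK):.0f} kW")
print(f"saving:    {cpo_saving_kw(links):.0f} kW")
```

At that assumed scale the difference is on the order of two megawatts, which in a power-bound campus is capacity that can go to accelerators instead of transceivers.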
| Axis | NVLink / NVLink Fusion | UALink / UALoE | Broadcom SUE / ESUN |
|---|---|---|---|
| Backer / camp | NVIDIA (Fusion opens IP to 3rd-party XPUs) | Open consortium (AMD-led) | Broadcom (Ethernet-based scale-up) |
| Accelerator lock-in | Highest — NVIDIA-centric ecosystem | Lowest — multi-vendor by design | Low — Ethernet-merchant-silicon path |
| Per-GPU bandwidth | 1.8 TB/s (Gen5) → 3.6 TB/s (Gen6) | 200G-class lanes; spec to 1,024 accelerators | Ethernet SerDes (200G/400G/lane) |
| Maturity (2026) | Shipping at scale; deepest software stack | Spec 1.0 final; early silicon | Emerging; leverages Ethernet supply chain |
| Proven domain size | Largest (NVL72 → NVL576 Kyber) | Smaller proven domains so far | Smaller proven domains so far |
| Best-fit decision | Max performance, single-vendor accepted | Multi-vendor optionality a priority | Ethernet-everywhere / merchant strategy |
Storage: GPU/DPU-initiated I/O, all-flash everywhere, file/object convergence, CXL tiering
Storage is the subsystem most often left off a roadmap, and the most quietly transformed by the density ramp. The forward direction has four threads. First, the data path is moving off the CPU: GPUDirect Storage and GPU/DPU-initiated I/O let the accelerator pull data over NVMe-oF without a host-CPU bounce, and the new generation of DPUs (NVIDIA BlueField-4 at 800 Gb/s) is built to own this path. Second, the spinning disk is being designed out of the hot path entirely — all-flash is becoming the default for both the training scratch tier and the checkpoint tier, with PCIe 6.0 dense-flash servers (e.g. 96 E3.S SSDs, ~2.9 PB) emerging to feed it. Third, the historical split between parallel file systems and object stores is converging, as the same platforms (WEKA, VAST, DDN) serve file and object semantics over one flash substrate. Fourth, and most distinctively for inference, CXL and Ethernet-attached flash are becoming a KV-cache tier — a new layer of the memory hierarchy that sits between HBM and bulk storage.
That last thread is the one with the sharpest 2026 roadmap signal. As reasoning models emit long decode sequences, the KV cache balloons, and the economics of holding it in HBM collapse. The answer is a three-tier KV hierarchy — HBM, then host/CXL memory, then NVMe and Ethernet-attached flash — that NVIDIA is standardizing via BlueField-4's context-memory platform (CMX) and the NIXL transfer library, with vendors reporting the ability to serve roughly 10x more users by offloading prefix caches to flash. The roadmap consequence for the facility is that storage is no longer a back-of-house capacity question; it is part of the inference memory hierarchy and sits on the latency-critical path. A 2026 design that treats storage as bulk capacity, decoupled from the fabric and the accelerator, is designing for the training workload of 2023, not the inference workload of 2026. → the CPU-bypass data path in Chapter 9.3; inference and KV-cache storage in Chapter 9.7; object/capacity tier in Chapter 9.6.
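A rough sizing sketch shows why the KV cache outgrows HBM. It uses the standard per-token estimate for a transformer with grouped-query attention: two tensors (key and value) per layer, each of `num_kv_heads * head_dim` values. The model shape, the 16-bit cache precision, and the free-capacity figures are illustrative assumptions, and the tier-placement rule is a deliberately naive stand-in rather than how any vendor's stack decides.

```python
GIB = 2 ** 30


def kv_bytes_per_token(num_layers: int, num_kv_heads: int, head_dim: int,
                       bytes_per_value: int = 2) -> int:
    """KV-cache bytes per token: key + value for every layer."""
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value


def place_kv_cache(context_bytes: int, hbm_free_bytes: int,
                   host_free_bytes: int) -> str:
    """Naive placement: first tier in the hierarchy with room for the context."""
    if context_bytes <= hbm_free_bytes:
        return "HBM"
    if context_bytes <= host_free_bytes:
        return "host/CXL"
    return "flash"


# Hypothetical model shape: 80 layers, 8 KV heads, head_dim 128, 16-bit cache.
per_token = kv_bytes_per_token(num_layers=80, num_kv_heads=8, head_dim=128)
context = per_token * 131_072  # one 128k-token context

print(f"{per_token / 1024:.0f} KiB per token, {context / GIB:.0f} GiB per context")
print(place_kv_cache(context, hbm_free_bytes=24 * GIB, host_free_bytes=512 * GIB))
```

A single long context on this assumed model consumes tens of gibibytes, so a handful of concurrent users exhausts whatever HBM is left after weights. The spill tiers are then serving live requests, not archival data.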
Rack & facility integration: from the 19-inch rack to the ~600 kW power-and-cooling chassis
The five subsystem roadmaps do not converge on a server — they converge on a chassis. The defining structural shift of 2026 → 2030 is that the unit of design stops being the 19-inch rack populated with independent servers and becomes an integrated power-and-cooling enclosure where compute, NVLink switching, liquid distribution, and power delivery are co-engineered as one mechanical object. The GB200 NVL72 (~120–132 kW, ~1.36 t / 3,000 lb, 5,184 in-rack copper NVLink cables, blind-mate liquid manifolds, a 1,400 A busbar) is the first generation that can only be understood as a single integrated unit. The Kyber NVL576 at ~600 kW is the next: at that power the disaggregated sidecar (Mt Diablo / Diablo 400) moves power conversion out of the compute rack into an adjacent power chassis, the busbar becomes liquid-cooled, and scale-up optics move onto the package.
For facility design this collapses several previously independent decisions into one substrate commitment. A ~600 kW chassis at ~3,000–5,000 lb wet implies a structural slab and seismic-anchoring basis you cannot retrofit (Chapter 6.2, Chapter 6.7); an 800 VDC riser and busway you cannot rip out without a hall outage; a facility-water capacity and pipe-rack geometry sized for liquid heat rejection at that density; and a row-and-aisle pitch set by the chassis footprint plus its sidecar. The roadmap-correct move is the one this chapter has argued throughout: reserve the irreversible substrate for the chassis you will house in 2027–2028, and keep the reversible fit-out (the actual trays, transceivers, CDUs, and storage shelves) matched to the generation you are buying this year. → rack as integration unit in Chapter 7.13; modular/prefab construction in Chapter 6.4.
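To give the slab question an order of magnitude, the sketch below converts chassis weight into an average floor pressure. The 0.6 m by 1.2 m footprint is a hypothetical value, since this chapter does not state the chassis dimensions. The result is a uniform-load average that ignores point loads at feet or casters, aisle live load, and seismic factors, so it is a screening number and not a structural calculation.

```python
LB_TO_KG = 0.45359237
STANDARD_GRAVITY_M_S2 = 9.80665


def floor_pressure_kpa(weight_lb: float, footprint_m2: float) -> float:
    """Average static pressure under a chassis, assuming uniform load spread."""
    if footprint_m2 <= 0:
        raise ValueError("footprint_m2 must be positive")
    force_n = weight_lb * LB_TO_KG * STANDARD_GRAVITY_M_S2
    return force_n / footprint_m2 / 1000.0


# Hypothetical footprint: 0.6 m x 1.2 m.
footprint = 0.6 * 1.2
for weight in (3000.0, 5000.0):
    print(f"{weight:.0f} lb: {floor_pressure_kpa(weight, footprint):.1f} kPa")
```

On that assumed footprint the wet-weight range works out to roughly 19–31 kPa of average loading, before any sidecar or piping is counted.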
Deep dive: the five clocks are one clock
It is tempting to manage the five subsystem roadmaps as five independent programs with five owners. That decomposition is exactly the error, because the five clocks are not independent — they are harmonics of the accelerator clock, and they fall out of phase only at the cost of stranded capacity. Work the dependency chain forward from a single accelerator generation. A ~600 kW Kyber rack requires 800 VDC distribution (the AC chain cannot feed it efficiently); 800 VDC requires the disaggregated sidecar and a DC-breaker supply chain; the ~600 kW thermal load requires liquid at a delta-T and flow that sizes the CDU and facility-water loop; the scale-up domain at NVL576 requires rack-to-rack optics and likely CPO because copper reach has run out; and the inference workload that justifies the rack requires the KV-cache flash tier on the fabric. Miss one clock and the others are stranded: 800 VDC with an air-cooled hall is pointless; liquid with a 480 VAC riser caps at ~150 kW; a non-blocking fabric with no flash KV tier starves the inference engine it was built for.
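The dependency chain above can be written down as a checklist that a scoping review runs against a target rack. The sketch below is one hypothetical encoding. The ~41 kW air limit and ~150 kW AC limit come from this chapter. The 600 kW optics trigger, the `Hall` fields, and the function name are assumptions made for illustration.

```python
from dataclasses import dataclass

AIR_LIMIT_KW = 41.0       # air saturates near this rack power
AC_LIMIT_KW = 150.0       # rack-level rectification is comfortable up to here
OPTICS_TRIGGER_KW = 600.0  # assumed: Kyber-class racks need rack-to-rack optics


@dataclass(frozen=True)
class Hall:
    distribution: str          # "AC" or "800VDC"
    liquid_cooled: bool
    rack_to_rack_optics: bool
    kv_flash_tier: bool


def out_of_phase(hall: Hall, rack_kw: float, inference: bool = True) -> list[str]:
    """Subsystems whose clock lags the target rack and would strand the others."""
    lagging = []
    if rack_kw > AC_LIMIT_KW and hall.distribution != "800VDC":
        lagging.append("power")
    if rack_kw > AIR_LIMIT_KW and not hall.liquid_cooled:
        lagging.append("cooling")
    if rack_kw >= OPTICS_TRIGGER_KW and not hall.rack_to_rack_optics:
        lagging.append("network")
    if inference and not hall.kv_flash_tier:
        lagging.append("storage")
    return lagging


current_gen = Hall("AC", liquid_cooled=True, rack_to_rack_optics=False,
                   kv_flash_tier=False)
print(out_of_phase(current_gen, rack_kw=130.0))
print(out_of_phase(current_gen, rack_kw=600.0))
```

The same hall that is nearly in phase for a ~130 kW rack lags on three of four checks at ~600 kW, which is the stranding risk expressed as a list instead of a sentence.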
This is why the consolidated roadmap belongs in one chapter rather than five appendices. The planning artifact that captures it is the capacity-ramp curve from Chapter 1.1 and Chapter 1.7, extended to call out, generation by generation, the density step and the substrate it implies — so that the floor, the voltage class, the water, the pipe-rack, and the fabric reach budget are all sized to the same future rack, on the same clock, at scoping time. The roadmap is not a forecast you read; it is a synchronization constraint you design to.