Chapter 16.2

Subsystem Roadmaps 2026 → 2030 (Consolidated)

Every subsystem in the AI data center is climbing the same exponential at once — power, cooling, silicon, fabric, and storage are all replatforming on overlapping three-year clocks — so the only roadmap that matters is the one that asks which irreversible substrate you must over-build today to absorb a density ramp you have not yet committed to.

POWER-BOUND · DENSITY-RAMP

What you'll decide here

Whether to plumb and wire the facility substrate (floor loading, water headroom, electrical risers, pipe-rack space, voltage class) for the 2027–2028 density step — ~600 kW Kyber-class racks on 800 VDC — or accept that you have built a current-generation, ~130 kW liquid hall that cannot absorb the ramp without a tear-out.
Which power architecture you commit the building to now: stay on 415/480 VAC with rack-level rectification, or take the ±400/800 VDC disaggregated-sidecar path that the >150 kW rack physically requires — the single least-reversible electrical decision in the project.
Which scale-up standard you underwrite — NVLink/NVLink Fusion, UALink/UALoE, or Broadcom SUE/ESUN — because that bet sets your accelerator-vendor optionality, your copper-vs-optics reach budget, and whether co-packaged optics is on your 2027 critical path.
Whether to design the cooling plant for single-phase direct-to-chip as table stakes while reserving the thermal and mechanical headroom for the microfluidic / two-phase step that the >2 kW-per-die generation will demand.
Which roadmap items are reversible (accelerator generation within an envelope, oversubscription ratio, storage tier mix, transceiver form factor) and can be deferred — versus the handful that are irreversible and must be hedged in concrete and copper at scoping time.

The forward-looking sections scattered through Parts 4 through 9 each end with the same uneasy sentence: this subsystem is about to replatform. Power is moving off AC. Cooling is moving off air, and single-phase liquid is already being framed as the floor rather than the ceiling. Silicon is doubling memory and tripling fabric bandwidth on a yearly cadence. The network is fighting a three-way standards war while optics migrate onto the package. Storage is being pulled into the GPU's memory hierarchy. None of these is a standalone trend. They are five faces of one phenomenon — the power-bound density ramp — and they are synchronized, because they are all downstream of the same accelerator roadmap. This chapter consolidates those forward pointers into a single 2026 → 2030 view and does what a roadmap is actually for: it sorts the moves you can defer from the ones you must commit before steel is cut.

A roadmap is a register of option premiums. Betting on the wrong generation of GPU is reversible: you buy the next tray. The expensive mistake of the 2026 era is building a substrate that cannot absorb the generation after next, because the floor, the water, the risers, and the voltage class are the things you cannot retrofit cheaply once the hall is live. We walk the five subsystems in the order the cascade flows — power, cooling, compute/memory, network/optics, storage — and close on the rack and facility integration that ties them together, because by 2027 the unit of design stops being the 19-inch rack and becomes the ~600 kW power-and-cooling chassis.

The master roadmap fork: substrate vs fit-out

Every subsystem on the roadmap splits into two layers with opposite reversibility. The fit-out layer — the accelerator generation, the transceiver form factor, the CDU model, the scheduler, the oversubscription ratio, the storage-tier mix — is reversible: it turns over every one to three years and you re-buy it at refresh. The substrate layer — floor loading, structural grid, facility water capacity, electrical riser and busway ampacity, voltage class, pipe-rack and knockout space, and the macro decision to plumb a hall for liquid at all — is effectively permanent for the life of the building. The roadmap discipline is to spend your option premium where it is cheap and irreversible: over-build the substrate to the 2027–2028 density step now, and keep the fit-out matched to the shipping generation. Designing the substrate to today's ~130 kW rack and being surprised by the ~600 kW Kyber generation strands power, floor, and an interconnection slot you cannot recover. → density-ramp trap in Chapter 1.1; structural basis in Chapter 6.2.
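One way to make the fork operational is to keep it as a small decision register. The sketch below is illustrative only: the item names and layer assignments are copied from the paragraph above, while the `RoadmapItem` type and the two helper functions are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoadmapItem:
    name: str
    layer: str  # "substrate" (effectively permanent) or "fit-out" (re-bought at refresh)


# Layer assignments follow the substrate vs fit-out split described in the text.
REGISTER = [
    RoadmapItem("floor loading", "substrate"),
    RoadmapItem("structural grid", "substrate"),
    RoadmapItem("facility water capacity", "substrate"),
    RoadmapItem("riser and busway ampacity", "substrate"),
    RoadmapItem("voltage class", "substrate"),
    RoadmapItem("pipe-rack and knockout space", "substrate"),
    RoadmapItem("accelerator generation", "fit-out"),
    RoadmapItem("transceiver form factor", "fit-out"),
    RoadmapItem("CDU model", "fit-out"),
    RoadmapItem("oversubscription ratio", "fit-out"),
    RoadmapItem("storage-tier mix", "fit-out"),
]


def commit_at_scoping(register):
    """Items that must be sized for the future density step before construction."""
    return [item.name for item in register if item.layer == "substrate"]


def defer_to_refresh(register):
    """Items that can stay matched to the shipping generation."""
    return [item.name for item in register if item.layer == "fit-out"]


print(commit_at_scoping(REGISTER))
print(defer_to_refresh(REGISTER))
```

The point of the structure is that every roadmap item carries exactly one layer label, so a scoping review can list the irreversible commitments mechanically instead of debating them item by item.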

Power: 415 VAC → 800 VDC, solid-state transformers, behind-the-meter generation

The power chain is the subsystem under the most acute roadmap pressure, because it sits at the intersection of two exponentials: the per-rack power draw climbing from ~40 kW (H100, 2023) through ~120–142 kW (GB200/GB300, 2024–2025) to ~190–230 kW (Vera Rubin, 2026) and ~600 kW (Rubin Ultra Kyber, H2 2027), and the grid-side scarcity that made power the binding constraint in the first place (Chapter 16.1). At ~40 kW you can rectify AC inside the rack and not think about it. At ~600 kW you cannot: the copper cross-section, the rectifier count, and the I²R losses of pushing that power at 415/480 VAC to the rack become physically and economically untenable. The roadmap answer is a step-change in distribution voltage, and it is a one-way door.

The fork is between staying on the familiar 415/480 VAC chain with rack-level rectification (fine through roughly 150 kW) and taking the ±400/800 VDC disaggregated-sidecar path that NVIDIA, the OCP Mt Diablo project (Google/Meta/Microsoft), and the major electrical vendors (ABB, Eaton, Schneider, Vertiv) have converged on for the >150 kW rack. 800 VDC carries over 150% more power through the same copper than 415/480 VAC, and the end-to-end electrical chain efficiency rises from the AC chain's ~61–87.5% to over 92% on a DC path. Measured against the best-case AC chain, that is a gain of about 5 percentage points end to end, which at gigawatt scale is tens of megawatts of recovered capacity. The catch is that the DC ecosystem is immature: the solid-state transformer (SST) that makes medium-voltage-to-800 VDC conversion efficient hit ~98% at 400 kW in an ETH Zurich benchmark and targets ~99%, but UL listing is not expected until roughly 2029. The decision you face in 2026 is therefore to design the building's voltage class and riser topology for the DC future while bridging with conventional gear until the SST and DC-breaker supply chain matures.
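The efficiency arithmetic can be checked with a few lines. This is a rough sketch, not a design calculation: it assumes "recovered capacity" means the reduction in upstream input power needed to deliver a fixed IT load, it uses the chapter's efficiency figures (87.5% best-case AC, 61% worst-case AC, 92% DC), and the 1,000 MW campus size is a hypothetical round number.

```python
def recovered_input_mw(it_load_mw: float, eff_old: float, eff_new: float) -> float:
    """Upstream input power saved when the same IT load is fed through a more efficient chain."""
    return it_load_mw / eff_old - it_load_mw / eff_new


def dc_bus_current_a(power_w: float, bus_voltage_v: float) -> float:
    """Current on a DC bus delivering the given power: I = P / V."""
    return power_w / bus_voltage_v


IT_LOAD_MW = 1000.0  # hypothetical 1 GW of delivered IT load

# Best-case AC chain (87.5%) vs the DC path (92%): tens of megawatts.
best_case = recovered_input_mw(IT_LOAD_MW, 0.875, 0.92)

# Worst-case AC chain (61%) vs the DC path: the saving depends heavily on the baseline.
worst_case = recovered_input_mw(IT_LOAD_MW, 0.61, 0.92)

# A ~600 kW rack on an 800 V bus.
kyber_bus_a = dc_bus_current_a(600_000, 800)

print(f"recovered vs best-case AC: {best_case:.1f} MW")
print(f"recovered vs worst-case AC: {worst_case:.1f} MW")
print(f"bus current for 600 kW at 800 VDC: {kyber_bus_a:.0f} A")
```

Against the 87.5% baseline the saving is roughly 56 MW per gigawatt of IT load, which matches the "tens of megawatts" claim. The much larger result against the 61% baseline shows why the choice of AC baseline matters when quoting a single gain figure.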

Underneath the distribution question sits the generation question. The interconnection wall has pushed behind-the-meter (BTM) on-site generation from a fringe tactic to a mainline strategy: ~82–101 GW of BTM gas has been announced cumulatively by 2026 (though only ~7 GW is under construction and ~2–3 GW online), because energizing megawatts on-site in 18–36 months beats waiting four to seven years in a utility large-load queue. The roadmap consequence is that the power subsystem is no longer just a distribution-engineering problem inside the fence. It is a generation, fuel-supply, and grid-interactive problem, and the building must be designed to host the generation it may need to bridge to grid power. → DC architecture in Chapter 4.7; on-site generation in Chapter 4.8; the queue and speed-to-power in Chapter 3.2.

Power-distribution roadmap: AC bridge vs 800 VDC path

Decision axis	Stay 415/480 VAC (rack rectification)	±400/800 VDC (disaggregated sidecar)
Density ceiling served	Comfortable to ~150 kW/rack	Designed for >150 kW → ~600 kW Kyber → ~1 MW racks
End-to-end chain efficiency	~61–87.5% (utility-to-VRM, AC)	Over 92% on the DC path (~5% e2e gain)
Copper / busway sizing	Baseline; rises sharply past 150 kW	Over 150% more power through the same copper
Ecosystem maturity (2026)	Mature, fully UL-listed, low risk	Immature; SST UL listing ~2029, DC breakers ramping
Reversibility	Low risk now, but caps the density ramp	Irreversible substrate bet; unlocks the 2027–2028 step
Best-fit decision	Current-gen inference halls, retrofits, bridges	New-build training / dense-inference campuses with a multi-gen ramp

Rack-power figures are from the NVIDIA roadmap (2026–2027 entries are pre-shipment). SST/UL timing per SemiAnalysis / ETH Zurich. The decision is which substrate voltage class you commit the building to, not which gear you buy first.

Cooling: single-phase DLC as table stakes, two-phase and microfluidics on deck

Cooling crossed its decisive fork earlier than the other subsystems, and the roadmap reflects a settled near-term and a contested far-term. The near-term is settled: single-phase direct-to-chip liquid cooling is the 2026 default, holding roughly 55% of the liquid-cooling market, because air saturates near 41 kW/rack and the GB200 NVL72 draws ~120–132 kW. Designing a new dense hall for anything but liquid in 2026 is a category error. The forward question is what comes after single-phase D2C when per-die thermal density keeps climbing toward and past 2 kW.

Here the roadmap forks into two candidate successors, and the fork is shaped as much by chemistry and liability as by thermodynamics. Two-phase approaches (two-phase D2C and two-phase immersion) offer better heat transfer by exploiting latent heat of vaporization — but two-phase immersion stalled hard when the PFAS health-and-liability crisis drove 3M to exit the Novec fluorochemical business, leaving the supply chain and regulatory picture for the dielectric fluids deeply uncertain. Microfluidics — etching coolant channels directly into the silicon, as in Microsoft's research demonstrating up to ~3x cold-plate performance with AI-designed bio-inspired channels — is the more promising far-term path because it attacks the thermal resistance at its source (the die-to-coolant interface) rather than downstream of it. Neither is a 2026 production decision. The roadmap consequence is conservative and specific: build for single-phase D2C now, but reserve the mechanical and facility-water headroom (CDU capacity, secondary-loop delta-T margin, manifold and quick-disconnect provisioning) so the hall can adopt the successor without re-plumbing. → DLC in Chapter 5.4; immersion and the PFAS problem in Chapter 5.5; facility water loops in Chapter 5.7.
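The headroom argument can be made concrete with the sensible-heat relation Q = ṁ · c_p · ΔT. The sketch below is a first-order estimate under loud assumptions: it treats the coolant as plain water (c_p ≈ 4186 J/kg·K, about 1 kg per litre), assumes all rack heat goes to the liquid loop, and uses a 10 K loop delta-T chosen only for illustration. Real loops often use glycol mixes and capture less than 100% of the heat.

```python
WATER_CP_J_PER_KG_K = 4186.0    # assumption: plain water, not a glycol mix
WATER_DENSITY_KG_PER_L = 1.0    # assumption: ~1 kg per litre


def coolant_flow_lpm(heat_w: float, delta_t_k: float,
                     cp: float = WATER_CP_J_PER_KG_K,
                     density: float = WATER_DENSITY_KG_PER_L) -> float:
    """Volumetric flow (litres/minute) needed to carry heat_w at a given loop delta-T."""
    mass_flow_kg_s = heat_w / (cp * delta_t_k)
    return mass_flow_kg_s / density * 60.0


DELTA_T_K = 10.0  # hypothetical secondary-loop temperature rise

today = coolant_flow_lpm(130_000, DELTA_T_K)   # ~130 kW current-generation rack
kyber = coolant_flow_lpm(600_000, DELTA_T_K)   # ~600 kW Kyber-class rack

print(f"~130 kW rack: {today:.0f} L/min")
print(f"~600 kW rack: {kyber:.0f} L/min")
print(f"flow multiple at fixed delta-T: {kyber / today:.1f}x")
```

At a fixed delta-T, flow scales linearly with rack power, so the density step multiplies the required flow by about 4.6. That is why pipe diameter, manifold provisioning, and CDU capacity belong on the headroom list even if the cold plates themselves are replaced later.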

Figure	What it measures	Source (year)
~40 → ~600 kW	Rack power, H100 (2023) → Rubin Ultra Kyber (H2 2027); GB200 ~120–132 kW and Rubin ~190–230 kW in between	SemiAnalysis / NVIDIA roadmap (2026)
Over 92%	End-to-end electrical chain efficiency on an 800 VDC path vs ~61–87.5% legacy AC (~5% e2e gain)	SemiAnalysis, Datacenter Anatomy (2025)
~98% (→99%)	Solid-state transformer efficiency at 400 kW (13.2 kVAC→800 VDC); UL listing pending ~2029	SemiAnalysis / ETH Zurich INTELEC (2025)
~55%	Single-phase direct-to-chip share of the liquid-cooling market; the 2026 default as two-phase stalled on PFAS	DCD / IDTechEx (2026)
H100 80GB → Rubin Ultra 1TB	Per-GPU HBM capacity trajectory (B200 192GB, B300 288GB, Rubin 288GB HBM4)	NVIDIA Developer (2026)
1.8 → 3.6 TB/s	NVLink per-GPU bandwidth, Gen5 (Blackwell) → Gen6 (Rubin); NVL72 = 130 TB/s rack aggregate	NVIDIA (2026)
~30W → ~9W	Per-1.6T-link power, pluggable DSP vs co-packaged optics; ~3.5x energy saving, 10x resilience	NVIDIA (CPO) / Spectrum-X Photonics (2026)
~30% / sold out	2026 HBM3E supply gap, fully allocated; CoWoS advanced-packaging capacity is the true supply gate	SemiAnalysis / TrendForce (2026)

Compute & memory: Vera Rubin → Rubin Ultra/Kyber → Feynman, HBM4, the advanced-packaging gate

The accelerator roadmap is the metronome the whole building marches to, because every other subsystem's clock is set by it. The publicly-stated cadence is yearly: Vera Rubin (VR200, ~190–230 kW rack) in 2026, Rubin Ultra in the ~600 kW Kyber NVL576 rack in H2 2027 — 576 GPU dies across 144 quad-die packages, ~15 ExaFLOPS FP4 inference / 5 ExaFLOPS FP8 training per rack — and Feynman in 2028, widely expected to push toward the 1 MW rack. Memory climbs in lockstep: per-GPU HBM goes H100 80GB → B200 192GB → B300 288GB → Rubin 288GB (HBM4) → Rubin Ultra 1TB (HBM4e). For a strategist the takeaway is the cadence, not the spec sheet. A yearly generation step with a roughly 2–3 year frontier-economic life (the contested depreciation figure from Chapter 1.8) means the substrate you pour in 2026 must absorb at least two generations of density growth within its first refresh window.
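The cadence argument can be turned into a small sizing check. The sketch below is illustrative: the rack-power table uses the upper end of each range quoted in this chapter, the 2028 entry treats "toward the 1 MW rack" as 1,000 kW, and the function names are hypothetical. All 2027+ values are roadmap estimates, not shipped specifications.

```python
# Upper-bound rack power (kW) by first-ship year, taken from the figures in this chapter.
ROADMAP_KW = {
    2023: 40,     # H100
    2025: 142,    # GB200/GB300
    2026: 230,    # Vera Rubin
    2027: 600,    # Rubin Ultra, Kyber NVL576 (pre-shipment)
    2028: 1000,   # Feynman, "toward 1 MW" (pre-shipment)
}


def substrate_target_kw(build_year: int, window_years: int) -> int:
    """Highest rack power on the roadmap between build_year and the end of the window."""
    in_window = [kw for year, kw in ROADMAP_KW.items()
                 if build_year <= year <= build_year + window_years]
    return max(in_window)


def annual_growth(start_year: int, end_year: int) -> float:
    """Compound annual growth rate in rack power between two roadmap years."""
    years = end_year - start_year
    return (ROADMAP_KW[end_year] / ROADMAP_KW[start_year]) ** (1 / years) - 1


# Kyber NVL576: 144 quad-die packages.
assert 144 * 4 == 576

print(substrate_target_kw(2026, 1))   # one generation ahead
print(substrate_target_kw(2026, 2))   # two generations ahead
print(f"{annual_growth(2023, 2027):.0%} per year, 2023 to 2027")
```

On these figures rack power has roughly doubled every year from 2023 to 2027, so a substrate sized to the shipping generation is undersized within one refresh window.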

The real roadmap risk on this axis is the advanced-packaging gate, not the GPU. The binding constraint sits downstream of wafer fabrication, in TSMC CoWoS capacity and HBM stacking, not in transistors. 2026 HBM3E is effectively sold out with a ~30% supply gap and ~15–20% per-quarter price rises, and HBM4 (qualified with Samsung and SK hynix for Rubin) inherits the same hybrid-bonding and CoWoS-wafer bottleneck. The consequence for a data-center program is counterintuitive: your delivery schedule is gated less by your capex and more by an upstream packaging line you do not control. A scope that assumes GPU availability on the vendor's stated cadence, without an allocation agreement, is underwriting a schedule it cannot guarantee. → NVIDIA roadmap in Chapter 7.2; HBM in Chapter 7.6; advanced packaging in Chapter 7.7.

Deep dive: why the packaging gate, not the fab, sets your delivery date

The intuitive model of GPU supply is a fab-limited one: more EUV wafers, more chips. That model has been wrong since Blackwell. The genuine bottleneck moved one step down the line, to advanced packaging — TSMC's CoWoS (chip-on-wafer-on-substrate) that integrates the logic die with its HBM stacks — and to the HBM stacking itself, where hybrid bonding limits how fast capacity can grow. The numbers tell the story: in 2026 HBM3E is fully allocated with a supply gap on the order of 30%, prices are rising ~15–20% per quarter, and CoWoS wafer starts — not GPU die yield — are what the hyperscalers are actually fighting over.

The consequence for facility planning is direct and frequently missed. You can energize the megawatts, pour the slab, plumb the liquid, and string the fabric, and still have empty racks because your accelerators are stuck behind someone else's packaging allocation. This inverts the usual planning order: in the power-bound era you race to energize power, but the GPUs that fill that power are gated by an upstream line measured in CoWoS wafers per month. The defensible scope treats accelerator allocation as a long-lead item on par with the transformer and the interconnection agreement — contracted, not assumed — and phases the capacity ramp to the packaging supply curve rather than to the vendor's headline launch date. → Chapter 7.7.
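Phasing the ramp to the supply curve amounts to taking, period by period, the smaller of two cumulative quantities: racks with power and cooling ready, and racks that contracted accelerator deliveries can actually fill. The sketch below shows that calculation. The quarterly figures are invented for illustration, the function names are hypothetical, and 72 GPUs per rack is used because it matches the NVL72 configuration.

```python
from itertools import accumulate


def deployable_racks(power_ready_racks, gpus_allocated, gpus_per_rack):
    """Cumulative racks that can go live each period: limited by power AND by GPU supply."""
    cum_power = accumulate(power_ready_racks)
    cum_gpus = accumulate(gpus_allocated)
    return [min(p, g // gpus_per_rack) for p, g in zip(cum_power, cum_gpus)]


def stranded_racks(power_ready_racks, live):
    """Racks that are energized and plumbed but empty, per period."""
    return [p - l for p, l in zip(accumulate(power_ready_racks), live)]


# Hypothetical plan: 40 racks energized per quarter, back-loaded GPU allocation.
power_ready = [40, 40, 40, 40]
allocation = [1440, 1440, 2880, 5760]

live = deployable_racks(power_ready, allocation, gpus_per_rack=72)
empty = stranded_racks(power_ready, live)

print(live)    # racks in service at the end of each quarter
print(empty)   # energized racks waiting on accelerators
```

In this invented plan the hall is fully energized on schedule, yet half of the ready racks sit empty for the first two quarters because the allocation is back-loaded. Aligning the energization schedule with the contracted delivery curve removes that stranded capacity.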

Network & optics: the scale-up wars, Ultra Ethernet, and co-packaged optics

The network roadmap is where the most consequential standards bet lives, because unlike power or cooling — where physics largely dictates the answer — the fabric question is partly a political one about which ecosystem you tie your accelerator optionality to. Two distinct fabrics are evolving in parallel. Scale-up (intra-rack, the memory-semantic domain that binds a tray of GPUs into one logical accelerator) is the contested ground: NVIDIA's NVLink (1.8 TB/s/GPU on Gen5, 3.6 TB/s on Gen6) with NVLink Fusion opening the IP to third-party CPUs/XPUs; the open UALink / UALoE consortium path (AMD-led, up to 1,024 accelerators in the 1.0 spec); and Broadcom's SUE/ESUN (Scale-Up Ethernet). Scale-out (inter-rack, the all-reduce fabric) is consolidating faster around Ultra Ethernet (UEC 1.0: packet spray with reorder, UCCM congestion control, native RDMA) as the open answer to InfiniBand, alongside NVIDIA's Spectrum-X.

The scale-up standard you underwrite is a multi-year lock-in decision: it sets which accelerators you can rack, how large a scale-up domain you can build (and therefore your tensor-/expert-parallel ceiling), and your copper-vs-optics reach budget. Choose NVLink and you get the largest, most mature scale-up domains and the deepest software stack, at the price of single-vendor accelerator dependence. Choose UALink or SUE and you preserve multi-vendor optionality at the price of a younger ecosystem and smaller proven domains. Underneath both sits the optics transition: as lane rates climb to 200G and 448G and reach budgets shrink (passive copper ~1–2 m at 800G/1.6T), co-packaged optics moves the optical engine onto the switch package, cutting per-1.6T-link power from ~30W (pluggable DSP) to ~9W and delivering ~3.5x energy efficiency with 10x resilience. NVIDIA's Quantum-X Photonics (InfiniBand) reached availability in early 2026 and Spectrum-X Photonics (Ethernet) follows in H2 2026. The roadmap consequence: if your 2027 build uses a ~600 kW Kyber rack, rack-to-rack optics and CPO are very likely on your critical path, not an option. → scale-out standards in Chapter 8.4; topology and oversubscription in Chapter 8.5; CPO and fiber plant in Chapter 8.10.
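The optics power saving scales linearly with link count, which makes it easy to estimate for a given fabric. The sketch below uses the chapter's per-link figures (~30 W pluggable, ~9 W co-packaged). The link count of 10,000 is a hypothetical fabric size, not a figure from any vendor.

```python
PLUGGABLE_W_PER_LINK = 30.0   # ~30 W per 1.6T link, pluggable DSP (chapter figure)
CPO_W_PER_LINK = 9.0          # ~9 W per 1.6T link, co-packaged optics (chapter figure)


def optics_power_kw(links: int, watts_per_link: float) -> float:
    """Total optical-link power for a fabric with the given number of 1.6T links."""
    return links * watts_per_link / 1000.0


links = 10_000  # hypothetical fabric size

pluggable_kw = optics_power_kw(links, PLUGGABLE_W_PER_LINK)
cpo_kw = optics_power_kw(links, CPO_W_PER_LINK)
saved_kw = pluggable_kw - cpo_kw
ratio = PLUGGABLE_W_PER_LINK / CPO_W_PER_LINK

print(f"pluggable: {pluggable_kw:.0f} kW, CPO: {cpo_kw:.0f} kW, saved: {saved_kw:.0f} kW")
print(f"power ratio from the rounded per-link figures: {ratio:.1f}x")
```

The rounded per-link figures give a ratio of about 3.3x, close to the ~3.5x quoted above. In a power-bound hall, the saved kilowatts are capacity that can be reassigned to accelerators.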

The scale-up standards bet: NVLink vs UALink/UALoE vs SUE/ESUN

Axis	NVLink / NVLink Fusion	UALink / UALoE	Broadcom SUE / ESUN
Backer / camp	NVIDIA (Fusion opens IP to 3rd-party XPUs)	Open consortium (AMD-led)	Broadcom (Ethernet-based scale-up)
Accelerator lock-in	Highest — NVIDIA-centric ecosystem	Lowest — multi-vendor by design	Low — Ethernet-merchant-silicon path
Per-GPU bandwidth	1.8 TB/s (Gen5) → 3.6 TB/s (Gen6)	200G-class lanes; spec to 1,024 accelerators	Ethernet SerDes (200G/400G/lane)
Maturity (2026)	Shipping at scale; deepest software stack	Spec 1.0 final; early silicon	Emerging; leverages Ethernet supply chain
Proven domain size	Largest (NVL72 → NVL576 Kyber)	Smaller proven domains so far	Smaller proven domains so far
Best-fit decision	Max performance, single-vendor accepted	Multi-vendor optionality a priority	Ethernet-everywhere / merchant strategy

Scale-up = the intra-rack memory-semantic fabric. This is a multi-year lock-in that sets accelerator optionality and domain size. Figures per NVIDIA / UALink Consortium / SemiAnalysis (AI networks).

Storage: GPU/DPU-initiated I/O, all-flash everywhere, file/object convergence, CXL tiering

Storage is the subsystem most often left off a roadmap, and the most quietly transformed by the density ramp. The forward direction has four threads. First, the data path is moving off the CPU: GPUDirect Storage and GPU/DPU-initiated I/O let the accelerator pull data over NVMe-oF without a host-CPU bounce, and the new generation of DPUs (NVIDIA BlueField-4 at 800 Gb/s) is built to own this path. Second, the spinning disk is being designed out of the hot path entirely — all-flash is becoming the default for both the training scratch tier and the checkpoint tier, with PCIe 6.0 dense-flash servers (e.g. 96 E3.S SSDs, ~2.9 PB) emerging to feed it. Third, the historical split between parallel file systems and object stores is converging, as the same platforms (WEKA, VAST, DDN) serve file and object semantics over one flash substrate. Fourth, and most distinctively for inference, CXL and Ethernet-attached flash are becoming a KV-cache tier — a new layer of the memory hierarchy that sits between HBM and bulk storage.

That last thread is the one with the sharpest 2026 roadmap signal. As reasoning models emit long decode sequences, the KV cache balloons, and the economics of holding it in HBM collapse. The answer is a three-tier KV hierarchy — HBM, then host/CXL memory, then NVMe and Ethernet-attached flash — that NVIDIA is standardizing via BlueField-4's context-memory platform (CMX) and the NIXL transfer library, with vendors reporting the ability to serve roughly 10x more users by offloading prefix caches to flash. The roadmap consequence for the facility is that storage is no longer a back-of-house capacity question; it is part of the inference memory hierarchy and sits on the latency-critical path. A 2026 design that treats storage as bulk capacity, decoupled from the fabric and the accelerator, is designing for the training workload of 2023, not the inference workload of 2026. → the CPU-bypass data path in Chapter 9.3; inference and KV-cache storage in Chapter 9.7; object/capacity tier in Chapter 9.6.
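A back-of-envelope calculation shows why the KV cache outgrows HBM. The sketch below uses the standard per-token KV size for a decoder-only transformer (one key and one value vector per layer per KV head) and a simple first-fit placement across the three tiers. The model shape, tier capacities, and function names are hypothetical, and the placement logic is a toy illustration. It does not represent how NIXL or BlueField-4's context-memory platform work.

```python
GIB = 2**30


def kv_bytes_per_token(layers: int, kv_heads: int, head_dim: int, bytes_per_value: int) -> int:
    """KV-cache bytes per token: a key and a value vector per layer per KV head."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value


def place(session_bytes: int, free_bytes_by_tier: dict) -> str:
    """First-fit placement: use the fastest tier with room and debit its free space."""
    for tier, free in free_bytes_by_tier.items():
        if session_bytes <= free:
            free_bytes_by_tier[tier] = free - session_bytes
            return tier
    raise ValueError("no tier has room for this session")


# Hypothetical model: 80 layers, 8 KV heads, head dimension 128, 16-bit values.
per_token = kv_bytes_per_token(layers=80, kv_heads=8, head_dim=128, bytes_per_value=2)
session = per_token * 131_072  # one session with a 128K-token context

# Hypothetical free capacity per tier, ordered fastest to slowest.
tiers = {"hbm": 64 * GIB, "host_cxl": 512 * GIB, "flash": 8192 * GIB}
placements = [place(session, tiers) for _ in range(3)]

print(f"{per_token // 1024} KiB per token, {session // GIB} GiB per session")
print(placements)
```

With this hypothetical shape a single long-context session needs 40 GiB, so the HBM budget holds one session and the second already spills to host/CXL memory. The lower tiers therefore sit on the serving path and are not just archival capacity.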

Rack & facility integration: from the 19-inch rack to the ~600 kW power-and-cooling chassis

The five subsystem roadmaps do not converge on a server — they converge on a chassis. The defining structural shift of 2026 → 2030 is that the unit of design stops being the 19-inch rack populated with independent servers and becomes an integrated power-and-cooling enclosure where compute, NVLink switching, liquid distribution, and power delivery are co-engineered as one mechanical object. The GB200 NVL72 (~120–132 kW, ~1.36 t / 3,000 lb, 5,184 in-rack copper NVLink cables, blind-mate liquid manifolds, a 1,400 A busbar) is the first generation that can only be understood as a single integrated unit. The Kyber NVL576 at ~600 kW is the next: at that power the disaggregated sidecar (Mt Diablo / Diablo 400) moves power conversion out of the compute rack into an adjacent power chassis, the busbar becomes liquid-cooled, and scale-up optics move onto the package.

For facility design this collapses several previously-independent decisions into one substrate commitment. A ~600 kW chassis at ~3,000–5,000 lb wet implies a structural slab and seismic-anchoring basis you cannot retrofit (Chapter 6.2, Chapter 6.7); an 800 VDC riser and busway you cannot rip out without a hall outage; a facility-water capacity and pipe-rack geometry sized for liquid heat rejection at that density; and a row-and-aisle pitch set by the chassis footprint plus its sidecar. The roadmap-correct move is the one this chapter has argued throughout: reserve the irreversible substrate for the chassis you will house in 2027–2028, and keep the reversible fit-out — the actual trays, transceivers, CDUs, and storage shelves — matched to the generation you are buying this year. → rack as integration unit in Chapter 7.13; modular/prefab construction in Chapter 6.4.

Roadmap figures are bets, not facts

Almost every number on the 2027–2028 horizon in this chapter is a vendor roadmap or a pre-shipment estimate, not a shipped specification. The ~600 kW Kyber rack, the ~1 MW Feynman rack, Rubin Ultra's 1TB HBM4e, SST UL listing timing, and the CPO adoption curve are all announced, and announced roadmaps slip, re-scope, and occasionally reverse (recall the contested depreciation and BTM-gas figures, where announced and built diverge by an order of magnitude). The discipline is to treat the direction as high-confidence — density up, AC→DC, air→liquid, copper→optics, CPU-bypass storage — while treating any specific 2027+ figure as a planning assumption to be re-checked, not a commitment to design rigidly against. Over-build the substrate for the direction; do not bet the schedule on a single dated milestone you do not control. → forecast register in Appendix D.

Deep dive: the five clocks are one clock

It is tempting to manage the five subsystem roadmaps as five independent programs with five owners. That decomposition is exactly the error, because the five clocks are not independent — they are harmonics of the accelerator clock, and they fall out of phase only at the cost of stranded capacity. Work the dependency chain forward from a single accelerator generation. A ~600 kW Kyber rack requires 800 VDC distribution (the AC chain cannot feed it efficiently); 800 VDC requires the disaggregated sidecar and a DC-breaker supply chain; the ~600 kW thermal load requires liquid at a delta-T and flow that sizes the CDU and facility-water loop; the scale-up domain at NVL576 requires rack-to-rack optics and likely CPO because copper reach has run out; and the inference workload that justifies the rack requires the KV-cache flash tier on the fabric. Miss one clock and the others are stranded: 800 VDC with an air-cooled hall is pointless; liquid with a 480 VAC riser caps at ~150 kW; a non-blocking fabric with no flash KV tier starves the inference engine it was built for.
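The dependency chain can be expressed as a consistency check over a hall design. The sketch below encodes only the rules stated in this chapter: the ~150 kW ceiling for the AC chain, air saturating near 41 kW/rack, rack-to-rack optics at NVL576 scale, and a flash KV tier for inference. The `HallDesign` type, its field names, and the exact threshold comparisons are hypothetical simplifications.

```python
from dataclasses import dataclass

AC_LIMIT_KW = 150    # 415/480 VAC is comfortable to roughly 150 kW/rack
AIR_LIMIT_KW = 41    # air cooling saturates near 41 kW/rack


@dataclass
class HallDesign:
    rack_kw: float
    voltage: str               # "480VAC" or "800VDC"
    cooling: str               # "air" or "liquid"
    scale_up_gpus: int         # GPU dies in one scale-up domain
    rack_to_rack_optics: bool
    inference: bool
    flash_kv_tier: bool


def out_of_phase(d: HallDesign) -> list:
    """List the subsystem clocks that are out of step with the target rack."""
    issues = []
    if d.rack_kw > AC_LIMIT_KW and d.voltage != "800VDC":
        issues.append("power: AC riser caps the rack at ~150 kW")
    if d.rack_kw > AIR_LIMIT_KW and d.cooling != "liquid":
        issues.append("cooling: air cannot reject this rack's heat")
    if d.voltage == "800VDC" and d.cooling == "air":
        issues.append("power: 800 VDC is stranded in an air-cooled hall")
    if d.scale_up_gpus >= 576 and not d.rack_to_rack_optics:
        issues.append("network: NVL576-scale domain needs rack-to-rack optics")
    if d.inference and not d.flash_kv_tier:
        issues.append("storage: inference engine has no flash KV tier")
    return issues


mismatched = HallDesign(600, "480VAC", "liquid", 576, False, True, False)
aligned = HallDesign(600, "800VDC", "liquid", 576, True, True, True)

print(out_of_phase(mismatched))
print(out_of_phase(aligned))
```

A design review that runs every candidate hall through one check like this, instead of five separate subsystem reviews, makes an out-of-phase clock visible at scoping time, before capacity is stranded.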

This is why the consolidated roadmap belongs in one chapter rather than five appendices. The planning artifact that captures it is the capacity-ramp curve from Chapter 1.1 and Chapter 1.7, extended to call out, generation by generation, the density step and the substrate it implies — so that the floor, the voltage class, the water, the pipe-rack, and the fabric reach budget are all sized to the same future rack, on the same clock, at scoping time. The roadmap is not a forecast you read; it is a synchronization constraint you design to.

This chapter consolidates forward pointers that live in full depth elsewhere. Power: the DC revolution in Chapter 4.7, on-site generation in Chapter 4.8, UPS/transient absorption in Chapter 4.5. Cooling: DLC in Chapter 5.4, immersion and PFAS in Chapter 5.5, facility water in Chapter 5.7. Compute and memory: the NVIDIA roadmap in Chapter 7.2, HBM in Chapter 7.6, advanced packaging in Chapter 7.7. Network and optics: scale-up fabric in Chapter 8.2, scale-out standards in Chapter 8.4, CPO and fiber plant in Chapter 8.10. Storage: the CPU-bypass path in Chapter 9.3, KV-cache hierarchy in Chapter 9.7. The power-bound framing that motivates the whole ramp is Chapter 16.1; the reversible-vs-irreversible discipline is from Chapter 1.1; the economics that score the refresh cadence are in Chapter 1.8; and the dated forecast register is Appendix D.