The Definitive Guide toAI Data Centers
Ask the Guide

Appendix B

Reference Designs & Worked Examples

A reference design is the cascade made arithmetic: pick the archetype, the accelerator generation, and the scalable unit, and the megawatts, liters-per-minute, fiber strands, switch ports, and dollars all fall out of a small set of multipliers you can carry in your head — this appendix supplies those multipliers and works three reference builds end-to-end (a scalable-unit budget, a 50 MW campus, and a 100k-GPU cluster BOM) so you can sanity-check any vendor's sizing in an afternoon.

What you'll decide here

  1. Start from the per-archetype design-basis sheet that matches your dominant workload — it fixes density tier, cooling modality, fabric blocking, redundancy, and the GPU:CPU/storage/network ratios that every later number multiplies against.
  2. Treat the scalable unit (SU) as the atomic costing and deployment block: size the SU once from the budget table, then multiply — campuses and clusters in this appendix are all integer counts of SUs, not bespoke arithmetic.
  3. Use the 50 MW campus and 100k-GPU BOM as order-of-magnitude calibrators, not bids: counts are exact from the ratios; dollar figures are 2025–2026 list/street ranges that move quarterly and must be re-quoted.
  4. When a vendor proposal disagrees with these tables by more than ~20% on a count (racks, CDUs, switches, optics), find out why before you sign — the divergence is usually a hidden oversubscription, redundancy, or generation assumption.
  5. Re-derive, do not interpolate, when you change generation (GB200 → GB300 → VR200 → Kyber): density, flow, and busbar current step discontinuously, so the multipliers in §2 are generation-stamped on purpose.

This appendix is the reusable arithmetic layer behind Part 1's archetype framework (Chapter 1.1) and Part 1's requirements matrix (Chapter 1.7). It does four things, in order: (1) a per-archetype design-basis sheet that freezes the inputs every later number inherits; (2) a scalable-unit (SU) budget giving the power, cooling, water, and network draw of one atomic deployment block per accelerator generation; (3) a 50 MW campus sized from the SU up, with the power chain, cooling plant, and water loop derived; and (4) a 100k-GPU cluster reference BOM with counts and rough 2025–2026 costs for GPUs, racks, CDUs, switches, optics, and storage.

The discipline throughout is multiplier-first. A reference design is a chain of ratios: GPUs per rack, racks per SU, kW per rack, L/min per kW, NICs per node, optics per NIC, GB/s per GPU of storage. Once those are pinned, every aggregate is a multiplication you can audit. Counts in the tables are exact arithmetic from the stated ratios. Dollar figures are 2025–2026 street/list ranges (sources stamped inline); they drift quarterly and are calibration aids, not quotes. Density, flow, and current figures are generation-stamped because they step discontinuously across GB200 → GB300 → Vera Rubin VR200 → Rubin Ultra Kyber; do not interpolate across a generation boundary.

1. Per-archetype design-basis sheets

The design-basis sheet is the single page that everything downstream inherits — it is the concrete instantiation of the workload-profile and design-basis artifacts named in Chapter 1.1. Read it as: choose the row that matches your dominant archetype, and the rest of the appendix is parameterized for you. The three reference builds in §3–§4 use the frontier-training column unless noted, because it is the most constraining; an inference-shaped build relaxes density, fabric, and redundancy and is cheaper on every axis.

Design-basis sheet by workload archetype (2026 reference points)
ParameterFrontier trainingPost-training / RLOnline inferenceBatch inferenceEdge inference
Dominant acceleratorGB200/GB300 NVL72Disaggregated: NVL72 trainer + HGX rolloutHGX B200/B300; GB200 for MoEHGX B200; prior-gen acceptableL4/L40S, Jetson, single B200
Rack density (design)120–142 kWMixed: 132 kW trainer / 40–60 kW rollout40–60 kW30–60 kW3–30 kW per micro-site
Cooling modalityDLC mandatory, warm-waterDLC trainer + RDHx/air rolloutAir, RDHx, or DLC by densityAir often sufficientAir / sealed modular
Scale-up domain72 GPUs (→144, →576)72 trainer / 8 rollout8–72 (MoE wants 72)81 (single node)
Scale-out fabric1:1 non-blocking, 8-railTight trainer / oversub rollout2:1–3:1 oversubscribedHeavily oversubscribedMinimal; WAN backhaul
Fabric transportInfiniBand XDR or Spectrum-XIB trainer / RoCE rolloutEthernet/RoCE commonEthernet, cost-optimizedStandard IP
GPU:CPU ratio2:1 (NVL72: 72G:36C)2:1 trainer / 4–8:1 rollout4:1–8:18:1+1:1 appliance
GPU:storage (BW)~250–400 GB/s per 1,024 GPUTrainer like trainingKV-cache + model load tierStreaming object tierLocal NVMe only
Redundancy basisN / N+1 (checkpointable)N+1, staleness-tolerant2N / Tier-IV-class + N+1 coolingN (queue-and-retry)N; fleet geo-redundancy
EDPp sizing factor~1.4–1.5× TDP~1.4× trainer~1.3× TDP~1.2× TDP~1.2× TDP
Siting driverCheap firm MW + cold climateFollows dominant sub-workloadSub-50 ms to usersCheapest / curtailable MWLatency budget (30/50/100 ms)
Density and fabric figures are GB200/GB300 NVL72-class, 2026-current. GPU:CPU and GPU:storage are design ratios, not hard limits. See keynumbers for sources and vintages.

2. The scalable unit (SU): power / cooling / water / network budget

The scalable unit is the atomic deployment and costing block — order it, integrate it at the factory (L11/L12), ship it, energize it, repeat. Sizing the SU once and then multiplying is what makes campus and cluster arithmetic tractable. We anchor the SU to the NVIDIA DGX SuperPOD GB200 reference: 8 × NVL72 racks = 576 GPUs per SU, with the full SuperPOD at 16 SUs (128 racks, 9,216 GPUs). The budget below gives one SU's draw across four generations; later sections count SUs, not racks.

The cross-generation columns exist because the multipliers step. A GB200 SU is ~1.06 MW of IT; the same 8-rack SU at Kyber density (~600 kW/rack) is ~4.8 MW — a 4.5× jump in the same floor footprint. That is the density-ramp trap from Chapter 1.1 rendered as a budget line: the floor, water, and busbar you reserve today must survive it.

Scalable-unit budget — 8-rack SU (576 GPUs), by accelerator generation
MetricGB200 NVL72 (2025)GB300 NVL72 (2025–26)VR200 NVL144 (H2 2026)Kyber NVL576 (H2 2027)
GPUs per SU5765761,152 (144/rack)4,608 (576/rack)
Rack density (TDP)132 kW142 kW~200 kW~600 kW
IT power per SU (TDP)~1.06 MW~1.14 MW~1.60 MW~4.80 MW
IT power per SU (EDPp ~1.4×)~1.48 MW~1.59 MW~2.24 MW~6.7 MW (smoothed ~30%)
DLC heat to liquid (~87%)~0.92 MW~0.99 MW~1.74 MW (100% liquid)~4.8 MW (100% liquid)
Residual air heat~0.14 MW (~17 kW/rack)~0.15 MW~0 (100% liquid)~0 (100% liquid)
Secondary-loop flow (~1.5 L/min·kW)~1,580 L/min~1,700 L/min~2,400 L/min~7,200 L/min
Coolant inlet / ΔT target<25 °C / <10 °C<25 °C / ~10 °C~45 °C warm-water~45 °C warm-water
Back-end NICs (8× 400G/node)144 NICs (3.2 Tb/s/node)144 NICs288 (CX-9 800G)1,152 (CX-9/CPO)
Leaf switch ports consumed (back-end)576 (1 port/GPU rail)5761,1524,608
Back-end optics (transceivers, 1:1)~1,152 (NIC+leaf ends)~1,152~2,304~9,216 (CPO shifts mix)
Storage BW attributable (~250 GB/s/1,024 GPU)~140 GB/s~140 GB/s~280 GB/s~1,125 GB/s
SU = 8 NVL72-class racks = 576 GPUs (DGX SuperPOD GB200 SU definition). Water flow at ~1.5 L/min per kW DLC heat; facility-water make-up assumes evaporative rejection (zero for closed-loop). Optics/switch counts are the 1:1 non-blocking back-end share attributable to one SU.
Worked example: deriving one GB200 SU line-by-line

Take the GB200 column and walk it forward so the multipliers are explicit. GPUs: 8 racks × 72 GPUs = 576. IT power (TDP): 8 × 132 kW = 1.056 MW ≈ 1.06 MW. EDPp: 1.06 MW × 1.4 ≈ 1.48 MW provisioned on the rack power chain. Heat split: at ~115 kW liquid + ~17 kW air per rack, liquid carries 8 × 115 = 920 kW and air carries 8 × 17 = 136 kW. Flow: 920 kW × ~1.5 L/min·kW ≈ 1,380–1,580 L/min secondary loop (≈ 130–200 LPM/rack, matching NVL72 CDU sizing). NICs: 8 racks × 18 compute nodes... note NVL72 presents 72 GPUs across 18 trays; a rail-optimized back-end gives 8× 400G per GPU-pair node → ~144 NICs/SU at 3.2 Tb/s/node. Optics: the back-end optic count tracks GPU rails, not NIC bodies — each of the 576 GPUs drives one 1:1 rail link, and each link burns two transceivers (server/NIC end + leaf end), so 576 GPUs × 2 ≈ ~1,152 rail-side optics for the SU's share of the non-blocking fabric (before spine). These are the only numbers; everything in §3–§4 is integer multiples of them. → SU definition in Chapter 1.7; fabric sizing in Chapter 8.5.

3. Worked example: a 50 MW campus sized from the SU up

Now multiply. The brief: a 50 MW IT frontier-training campus on GB200/GB300 NVL72, built as integer SUs, with the power chain, cooling plant, and water loop derived. We size on TDP for the IT budget and EDPp for the electrical chain, and we reserve floor/water/busbar headroom for a GB300 → VR200 density step (the irreversible substrate from Chapter 1.1).

SU count. 50 MW IT ÷ ~1.06 MW/SU ≈ 47 SUs. Round to 48 SUs (a clean 3 × 16-SU SuperPOD-scale halls, or 6 × 8-SU halls). That is 48 × 8 = 384 NVL72 racks and 48 × 576 = 27,648 GPUs. Total IT at 132 kW/rack = 50.7 MW — within the 50 MW brief at design margin.

50 MW campus — derived from 48 SUs (384 racks, 27,648 GB200 GPUs)
SubsystemSizing basisQuantity / value
Scalable units50 MW ÷ 1.06 MW/SU48 SUs
NVL72 racks48 × 8384 racks
GPUs384 × 7227,648 GPUs
IT power (TDP)384 × 132 kW50.7 MW
Facility power (PUE ≈ 1.2)50.7 MW × 1.2~60.8 MW
Utility interconnect (N, +margin)~61 MW × 1.15~70 MW POI / 2× 132 kV feeders
Main transformers≥2 × 75 MVA (N+1 at MV)2–3 × 75 MVA
MV distribution33/13.8 kV ring or radialper-hall 13.8 kV → 415 V / 800 VDC
UPS / ride-through (EDPp)50.7 MW × 1.4 EDPp~71 MW transient basis; BESS + rack BBU
Backup generationN (gas RICE / turbine for island)~65–75 MW prime/standby
DLC heat to facility water384 × 115 kW~44 MW thermal
CDUs (L2L, ~1.4 MW each, N+1)44 MW ÷ ~1.3 MW useful + N+1~36–40 CDUs (≈ 1 per 8–10 racks + spares)
Secondary-loop flow384 × ~190 LPM~73,000 L/min aggregate
Heat rejection~61 MW total heattowers/dry-coolers + adiabatic, economized
Water make-up (WUE ~0.5 L/kWh)60.8 MW × 0.5 × 8,760 h~266 ML/yr (≈ 700 m³/day peak)
Back-end fabric (1:1)27,648 GPUs, 8-rail fat-tree~3,456 leaf+spine switch ASICs (see §4)
Floor area (white space)384 racks @ ~30 m²/rack incl. aisles/CDU~11,500 m² + plant
Floor loading reservewet rack ~1.36 t + VR200 headroomdesign slab to ≥1,500 kg/m²
TDP basis 132 kW/rack. Facility power applies PUE ≈ 1.2 (warm-water DLC + economized rejection). Generator/UPS sized to EDPp. Water assumes hybrid rejection with adiabatic assist; closed-loop dry would trade WUE→0 for ~+0.05 PUE.

4. Reference BOM: a 100k-GPU GB200/GB300 cluster

The flagship worked example: a 100,000-GPU GB200/GB300-class training cluster, costed as a bill of materials. Built from the SU: 100,000 ÷ 576 ≈ 174 SUs; round to 176 SUs = 1,408 NVL72 racks = 101,376 GPUs (≈ 100k). At 132 kW/rack that is ~186 MW IT, ~223 MW facility at PUE 1.2 — a multi-campus build in practice (scale-across over DCI, Chapter 8.8), but costed here as one logical cluster.

The cost column carries the heaviest caveat: GPU/system pricing is 2025 street/list (SemiAnalysis), networking and storage are practitioner ranges, and all of it moves quarterly. Use the counts as gospel (they are arithmetic) and the dollars as an order-of-magnitude frame. The GPU/system line dominates so heavily (~75–80% of cluster capex) that errors elsewhere barely move the total.

100k-GPU cluster reference BOM (176 SUs · 1,408 NVL72 racks · 101,376 GPUs)
BOM lineCountBasisUnit cost (2025–26)Line cost (rough)
GB200/GB300 GPUs101,376176 SU × 576~$60–70k effective
NVL72 racks (integrated, L11)1,408176 SU × 8~$3.0–3.5M / rack~$4.5–4.9B
— (rack line includes GPUs, Grace, NVSwitch, DLC)GB200 ~$200k/GPU system-level~$70–80B system total
CDUs (L2L, N+1)~150~1 per 8–10 racks + spares~$120–180k~$22–27M
Back-end leaf switches (Quantum-X800/Spectrum-X)~2,0008-rail, ~72 GPU/leaf group~$120k (64×800G)~$240M
Back-end spine switches~1,0002-tier fat-tree, 1:1~$120k~$120M
Front-end / storage / mgmt switches~600in-band + OOB + storage net~$30–60k~$25M
Back-end optics / transceivers (800G)~405,000101,376 GPU × ~4 (NIC+leaf+spine ends)~$1,000–1,500~$450–600M
DAC/AEC copper (intra-rack scale-up)in-rack5,184 NVLink cables/rack (copper, in rack price)incl. in rack
High-perf storage (parallel FS)~25 PB usable~250 GB/s per 1,024 GPU → ~25 TB/s~$0.20–0.40/GB flash tier~$1.0–2.0B
Capacity / object tier~150–250 PBdata lake + checkpoints~$0.02–0.05/GB~$5–12M
Facility power chain (per MW)~223 MW facilitytransformers, UPS/BESS, switchgear, gen~$10–15M / MW (AI-grade)~$2.2–3.3B
Cooling plant + water loop (per MW)~186 MW ITCDUs counted above + rejection + piping~$3–5M / MW~$0.6–0.9B
Cluster total (compute + network + storage + facility)GPU/system dominates ~75–80%~$80–95B
Counts are exact from §2 ratios. Unit costs are 2025–2026 street/list ranges (SemiAnalysis AI Neocloud Playbook; vendor lists); they drift quarterly and exclude land, building shell, and soft costs. Networking assumes 1:1 non-blocking 8-rail fat-tree; storage at ~250 GB/s per 1,024 GPUs.
132 / 142 kW
GB200 / GB300 NVL72 rack TDP; GB300 up to ~142 kW (483k BTU/hr) per rack
2026Schneider Electric / Converge Digest; Supermicro &amp; Lenovo GB300 datasheets
576 GPUs / SU
DGX SuperPOD GB200 scalable unit = 8 NVL72 racks; full SuperPOD = 16 SU / 128 racks / 9,216 GPUs
2025NVIDIA DGX SuperPOD GB200 Reference Architecture
~600 kW
Rubin Ultra Kyber NVL576 rack on 800 VDC; ~4.8 MW per 8-rack SU
H2 2027 (announced)NVIDIA GTC; The Next Platform; DCD
~1.5 L/min·kW
DLC secondary-loop flow rule of thumb (PG25); ~130–200 LPM per NVL72 rack
2025Dober PG25; Introl NVL72 deployment
~1.4 MW / CDU
L2L CDU useful capacity; ~1 per 8–10 NVL72 racks at N+1; Google Deschutes row CDU ~2 MW
2025Vertiv/Eaton/nVent CDU specs; Google Cloud
8× 400 Gb/s
back-end NICs per training node = 3.2 Tb/s/node; 1:1 non-blocking 8-rail fat-tree
2025SemiAnalysis AI Neocloud Playbook
~$200k/GPU
GB200 system-level all-in (~$200M per 1,000 GPUs); ~$60–70k effective per GPU
2025SemiAnalysis; domain-research BOM
~250–400 GB/s
aggregate storage bandwidth per 1,024 training GPUs (~2 PB initial); place on front-end, not back-end
2025SemiAnalysis storage sizing
Sensitivity: how the three builds move when you change one input

Generation step (GB200 → VR200). Same 8-rack SU footprint, but IT power per SU goes ~1.06 → ~1.60 MW and flow ~1,580 → ~2,400 L/min. The 50 MW campus at VR200 density needs only ~31 SUs for the same 50 MW — but each hall now dissipates ~1.6× the heat per rack, so the cooling plant, not the floor, becomes binding. Oversubscription (1:1 → 2:1) for an inference cluster. Cuts back-end switches and optics by ~31% — on the 100k BOM that is ~$0.3B saved on a ~$90B total (≈ 0.3%), which is why you never compromise training fabric to save fabric money, but always oversubscribe an inference fabric. Closed-loop dry cooling. Drives the 50 MW campus's ~266 ML/yr water make-up toward zero, at the cost of ~+0.05 PUE (~+2.5 MW facility power) and a larger heat-rejection footprint — the WUE↔PUE trade from Chapter 15.4. Effective GPU life (3 yr → 5 yr). Does not move any count or capex line, but nearly doubles the denominator in $/GPU-hr — the dominant TCO lever, quantified in Chapter 1.8 and Appendix C.

These reference designs operationalize the archetype framework in Chapter 1.1 and the requirements matrix in Chapter 1.7; the economics that score them live in Chapter 1.8 with the calculators in Appendix C. The SU and BOM inherit: rack/integration detail from Chapter 7.13 and Chapter 7.14; the 800 VDC power chain from Chapter 4.7 and transient sizing from Chapter 4.5; CDU and warm-water loop sizing from Chapter 5.6 and Chapter 5.7; fabric topology and oversubscription from Chapter 8.5 and optics from Chapter 8.10; storage sizing from Chapter 9.8; multi-campus scale-across from Chapter 8.8. Every dated figure here is registered with vintage and scenario in Appendix D.