Chapter 5.1

Thermal Fundamentals & the Density Wall

Heat is the binding physical constraint of an AI data center: every watt you put into a chip must leave it through a stack of thermal resistances that air can no longer clear, so the density of the machine you intend to run silently dictates which cooling regime you are forced into — and the boundary between regimes is a cliff, not a slope.

POWER-BOUND · DENSITY-RAMP

What you'll decide here

The peak rack density your facility must clear — not today's accelerator but the one two generations out — because density sets the cooling regime, and the regime sets the slab, the plenum, the water, and the heat-rejection plant before any of them can be changed.
Which side of the air-cooling cliff (~30–50 kW/rack) your design basis sits on, and therefore whether you are committing to air, a rear-door bridge, or direct-to-chip liquid before steel is cut.
The junction-to-coolant thermal-resistance budget you are designing against — the chip vendor fixes Tjunction and TDP, leaving you only the coolant temperature and the resistance stack to spend, and that budget decides whether a cold plate can keep the silicon legal.
The approach temperature and effectiveness you target at every heat exchanger in the chain, because each delta-T you spend narrows the free-cooling window and pushes the facility toward mechanical chilling.
Whether the irreversible substrate (floor loading, facility water, pipe-rack and CDU space, electrical headroom) is sized for the density ramp, even where the reversible IT fit-out is matched to the current generation.

Cross ~41 kW per rack and air cooling stops working — committing you to liquid, a plumbed hall, and reinforced floors, permanently.

Every accelerator is, thermodynamically, a space heater that happens to do arithmetic. A 1,200 W GPU converts essentially all of its electrical input into heat, and that heat must be removed continuously and within a few degrees of a fixed temperature limit or the silicon throttles, ages, or fails. This is the one constraint in the building that does not negotiate. You can oversubscribe a fabric, defer a redundancy tier, re-price a power contract — but you cannot argue with the second law. The heat leaves through the path you built for it, at the rate physics allows, or the machine slows down to match the path you actually have.

This chapter lays the thermal foundation for all of Part 5. We start from heat-flux first principles and the thermal-resistance stack-up that connects a transistor junction to the coolant, then show why air hit a wall somewhere around 30–50 kW per rack and why that wall is a discontinuity rather than a gentle ceiling. We trace the density curve from 2020 to 2027 (40 kW H100 racks to ~132 kW GB200 NVL72 to ~600 kW Kyber-class) and map the cooling hierarchy onto it, naming where each technology saturates. We close on the thermal metrics (approach temperature, NTU/effectiveness, the chain of delta-Ts) that the rest of this part uses as working vocabulary. Density is the decision; the cooling regime is its consequence; the cliff between regimes is the most expensive boundary to cross in the wrong direction.

Heat flux and the resistance stack-up

The governing quantity is not power but heat flux — power per unit area, W/cm². A 700 W H100 die spread over roughly 8 cm² runs near 85–90 W/cm²; a Blackwell-class package past 1 kW pushes toward and beyond 100 W/cm² at the hotspot. For scale, that flux rivals a nuclear-reactor fuel rod and exceeds a domestic cooktop element by an order of magnitude. Flux is what the cooling solution actually fights, because heat removal is fundamentally limited by how much surface area you can couple to a coolant and how steep a temperature gradient you can sustain across it.
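As a quick check on the figures above, here is a minimal sketch of the flux arithmetic. It computes the average flux over the die; hotspot flux is higher. The 8 cm² area is the rough figure quoted above, and the function name is illustrative.

```python
def heat_flux_w_per_cm2(power_w: float, area_cm2: float) -> float:
    """Average heat flux over the die: power divided by area."""
    if area_cm2 <= 0:
        raise ValueError("area_cm2 must be positive")
    return power_w / area_cm2


# H100-class example from the text: 700 W over roughly 8 cm^2.
h100_flux = heat_flux_w_per_cm2(700.0, 8.0)
print(f"{h100_flux:.1f} W/cm^2")  # 87.5 W/cm^2
```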

The chip vendor hands you two fixed numbers and no others. Tjunction-max — the maximum allowable on-die temperature, typically ~90–105 °C for datacenter accelerators — is a hard reliability and functional limit; cross it and the part throttles, then degrades, then fails. TDP — the thermal design power you must remove — is set by the silicon and the workload. Everything between the junction and the coolant is your design space, and it is governed by a simple, unforgiving relation: the temperature rise from coolant to junction equals the heat removed times the thermal resistance of the path, ΔT = Q × Rθ. Fix Q (the TDP) and Tjunction-max, and the resistance you can afford collapses to a fixed budget. Spend it badly and the chip is illegal at any coolant temperature you can practically supply.

The resistance stack is a series chain — Tjunction to Tcase to the cooling medium — and like any series circuit, the largest resistor dominates. Walk it from the silicon outward:

The junction-to-coolant thermal-resistance stack

Stage	Interface	What it is	Why it dominates or doesn't
Junction → case	Rθ-JC (in-package)	Silicon → TIM1 → integrated heat spreader / lid	Largely fixed by the vendor's package; you cannot improve it from outside
Case → cold plate	TIM2 / thermal interface	Lid → second thermal interface → cold-plate base	The most abused link; a poor or pumped-out TIM2 silently adds 5–15 °C
Cold plate → coolant	Convective resistance	Microchannel / skived-fin base → flowing coolant film	Set by flow rate, channel geometry, and coolant; where DLC wins over air
Coolant → facility	Loop ΔT + CDU approach	Technology-cooling loop carries heat to the CDU heat exchanger	Cumulative; every approach temperature here narrows free-cooling headroom

Representative single-GPU values for a ~1 kW datacenter accelerator under single-phase direct-to-chip cold-plate cooling. Resistances are order-of-magnitude design references, not vendor specs; the series sum and the coolant temperature together set the junction temperature.
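To make the series sum concrete, the sketch below applies ΔT = Q × Rθ to a three-stage stack. Every resistance value here is a hypothetical placeholder chosen for illustration, not a vendor or measured figure; only the ~1 kW load and the 25 °C inlet echo numbers used elsewhere in this chapter.

```python
# Hypothetical, order-of-magnitude resistances in K/W (assumptions, not specs).
HEALTHY_STACK = {
    "junction_to_case": 0.030,      # in-package, fixed by the vendor
    "case_to_cold_plate": 0.010,    # TIM2
    "cold_plate_to_coolant": 0.015, # convective film
}


def junction_temp_c(coolant_c: float, tdp_w: float, stack: dict) -> float:
    """Series chain: junction = coolant + Q * sum of resistances."""
    return coolant_c + tdp_w * sum(stack.values())


# Same stack with a degraded TIM2 (assumed extra 0.010 K/W).
degraded = dict(HEALTHY_STACK, case_to_cold_plate=0.020)

print(f"healthy:  {junction_temp_c(25.0, 1000.0, HEALTHY_STACK):.1f} C")
print(f"degraded: {junction_temp_c(25.0, 1000.0, degraded):.1f} C")
```

At 1 kW, each additional 0.010 K/W in the chain costs 10 °C at the junction, which is the scale of penalty the table attributes to a poor TIM2.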

The reason air lost is visible in the third row. The convective resistance from a surface to a fluid scales inversely with the fluid's heat-transfer coefficient and the wetted area. Water's volumetric heat capacity is roughly 3,500× that of air, and its convective coefficient at a cold-plate surface is one to two orders of magnitude higher than forced air over a finned heatsink. Air can be pushed harder (more CFM, taller fins, colder supply), but each lever has sharply diminishing returns and a parasitic fan-power penalty that grows much faster than the heat it removes. Liquid does not so much beat air as operate in a different regime entirely: it shrinks the case-to-coolant resistor by enough that the same TDP fits inside the same junction budget at a far more relaxed coolant temperature. → the cold-plate engineering is in Chapter 5.4; in-chip microchannels that attack Rθ-JC itself are in Chapter 16.2.
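The roughly 3,500× figure can be reproduced from textbook fluid properties. The sketch below assumes water and air near 20 °C at sea-level pressure; the property values are standard approximations, not figures taken from this guide.

```python
# Assumed properties near 20 C and 1 atm (textbook approximations).
WATER_DENSITY_KG_M3 = 998.0
WATER_CP_J_KG_K = 4182.0
AIR_DENSITY_KG_M3 = 1.2
AIR_CP_J_KG_K = 1005.0


def volumetric_heat_capacity(density_kg_m3: float, cp_j_kg_k: float) -> float:
    """Heat stored per cubic metre per kelvin, in J/(m^3 K)."""
    return density_kg_m3 * cp_j_kg_k


water = volumetric_heat_capacity(WATER_DENSITY_KG_M3, WATER_CP_J_KG_K)
air = volumetric_heat_capacity(AIR_DENSITY_KG_M3, AIR_CP_J_KG_K)
ratio = water / air
print(f"water/air volumetric heat capacity ratio: about {ratio:,.0f}x")
```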

The vendor fixes two numbers; you spend the gap

Read the resistance stack as a budget you are handed and then spend. Tjunction-max and TDP are given. Your degrees of freedom are exactly two: the coolant supply temperature and the sum of resistances from coolant to junction. A warmer coolant (good for free cooling and heat reuse) eats into the same budget that a higher TDP eats into — they trade against each other in the same equation. This is why the GB200 NVL72 envelope is so tight: a ~1 kW-plus GPU with a ~90 °C junction limit leaves only enough budget for ~20–25 °C inlet coolant across a well-designed cold plate, and deviation throttles the GPUs by up to 50%. The thermal designer's whole job is to spend that gap so the silicon stays legal while the facility stays efficient — and those two goals pull in opposite directions.
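The trade between coolant temperature and TDP can be sketched by rearranging ΔT = Q × Rθ for the warmest coolant the budget allows. The 0.055 K/W total resistance is a hypothetical value chosen for illustration; the 90 °C limit is the approximate junction figure quoted above.

```python
def max_coolant_inlet_c(t_junction_max_c: float, tdp_w: float,
                        r_total_k_per_w: float) -> float:
    """Warmest coolant that keeps the junction at or below its limit."""
    return t_junction_max_c - tdp_w * r_total_k_per_w


R_TOTAL = 0.055  # K/W, assumed junction-to-coolant sum (illustrative only)

for tdp_w in (700.0, 1000.0, 1200.0):
    ceiling = max_coolant_inlet_c(90.0, tdp_w, R_TOTAL)
    print(f"TDP {tdp_w:6.0f} W -> coolant inlet ceiling {ceiling:5.1f} C")
```

With the resistance held fixed, every extra 100 W of TDP removes 5.5 °C of coolant headroom in this example, which is why a higher-power part and a warmer loop compete for the same budget.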

Why air hit a wall

Air cooling did not gradually run out of headroom; it hit a wall, and the location of the wall is one of the most consequential numbers in this guide. With aggressive hot/cold-aisle containment, tuned airflow, and cool supply air, a well-engineered hall tops out around 40–50 kW per rack, with ~41 kW a common practitioner reference for the point past which air becomes uneconomic and unreliable (ASHRAE TC 9.9; SemiAnalysis). Some operators push to 50 kW with specialized architecture; many never clear 20 kW in legacy halls. The exact figure is site-specific, but the existence of the ceiling is not.

Three physical limits converge to build the wall. First, fan power scales with the cube of airflow: doubling CFM to chase more heat roughly octuples fan power, so beyond a point each additional kilowatt of cooling costs more fan energy than it can justify. Second, air's low heat capacity forces large temperature rises and large volumes; the supply-to-return delta-T that air can carry is small, and you run out of mass flow before you run out of fans. Third, acoustic and velocity limits cap how hard you can blow before noise, vibration, and bypass airflow make the hall unworkable and the cooling ineffective at the chip. Past the wall, the curve does not bend; it breaks. There is no airflow scheme, no containment trick, no warmer ASHRAE class that closes the ~90 kW gap between a 41 kW air ceiling and a 132 kW rack.
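The first two limits can be put into numbers with a short sketch. It assumes sea-level air properties, a 15 K supply-to-return rise, and the ideal fan affinity law (power proportional to flow cubed through an unchanged fan and duct system); all three are simplifying assumptions.

```python
AIR_DENSITY_KG_M3 = 1.2   # assumed: air near 20 C at sea level
AIR_CP_J_KG_K = 1005.0    # assumed: specific heat of dry air
M3S_TO_CFM = 2118.88      # 1 m^3/s expressed in cubic feet per minute


def airflow_m3s(heat_w: float, delta_t_k: float) -> float:
    """Volumetric airflow needed to carry heat_w at a given air temperature rise."""
    if delta_t_k <= 0:
        raise ValueError("delta_t_k must be positive")
    return heat_w / (AIR_DENSITY_KG_M3 * AIR_CP_J_KG_K * delta_t_k)


def fan_power_multiple(flow_multiple: float) -> float:
    """Ideal affinity law: fan power scales with the cube of flow."""
    return flow_multiple ** 3


for rack_kw in (41.0, 132.0):
    flow = airflow_m3s(rack_kw * 1000.0, 15.0)
    print(f"{rack_kw:5.0f} kW rack: {flow:.2f} m^3/s ({flow * M3S_TO_CFM:,.0f} CFM)")

print(f"fan power multiple, 41 -> 132 kW: {fan_power_multiple(132.0 / 41.0):.0f}x")
```

Under these assumptions a 41 kW rack needs roughly 4,800 CFM and a 132 kW rack roughly 15,500 CFM, and forcing that flow ratio through the same fans and ducts would cost about 33 times the fan power.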

This is why the boundary is a cliff, not a slope, and why it is a one-way door in a retrofit. A hall built for air has the wrong floor loading, no plenum or pipe-rack for liquid distribution, insufficient electrical headroom, and frequently no facility water provisioned at all. Crossing the cliff after the fact runs $5–10M/MW and still tends to strand capacity: power you cannot use because cooling caps first, or floor area you cannot fill because the slab cannot bear wet racks. The decision to plumb a hall for liquid is an archetype decision, not a mechanical one, and it must be made before the slab is poured. → the retrofit paths are engineered in Chapter 5.10; air pushed to its honest limit is Chapter 5.2.

The density curve, 2020–2027

The density wall would be an academic curiosity if accelerators had stayed where they were. They did not. Per-GPU thermal design power has climbed from the A100's ~300 W to the H100's 700 W to GB200's ~1.0–1.2 kW, with Rubin and Rubin Ultra projected near ~1.8 kW and ~2.3 kW. Multiply by the GPUs packed into a rack and the rack-level curve is steeper still — because the scale-up domain grew at the same time, concentrating more silicon behind a single liquid manifold. The result is a density ramp that crossed the air-cooling cliff somewhere in the H100-to-GB200 transition and never looked back.

Rack density by GPU generation, and the cooling regime each forces

Generation (year)	Per-GPU TDP	Per-rack draw	Cooling regime forced	Relation to the air cliff
A100 / HGX (2020–22)	~300–400 W	~10–20 kW	Air; raised floor + containment	Comfortably under the wall
H100 / HGX (2023)	~700 W	~30–40 kW	Air at the limit; RDHx optional	At the wall; air still wins for many
GB200 NVL72 (2024–25)	~1.0–1.2 kW	~120–132 kW	Direct-to-chip liquid mandatory	~3× over the wall; no air path exists
GB300 NVL72 (2025)	~1.4 kW class	~135–142 kW (up to ~155 kW peak)	DLC; residual air load on RDHx	Well over; hybrid liquid+air per rack
Rubin VR200 (2026)	~1.8 kW	~190–230 kW	DLC + 800 VDC power path	Far over; warm-water loops to free-cool
Rubin Ultra Kyber (2027)	~2.3 kW	~600 kW	DLC mandatory; in-chip microfluidics on the roadmap	An order of magnitude over the wall

Per-rack figures are NVIDIA-class reference points; 2026–2027 entries are roadmap, not shipping (Rubin TDPs and Kyber rack power are pre-shipment estimates). The regime column is the consequence the density forces — it is not a choice once the density is fixed.

The regime column is the consequence. Once the per-rack draw crosses ~41 kW, the cooling regime is no longer a decision you get to make; the physics has made it for you. The only decisions left are when you cross (which generation your facility targets) and whether the irreversible substrate is ready when you do. A hall scoped for 40 kW air-cooled racks cannot absorb a 132 kW NVL72 generation, let alone a 600 kW Kyber-class rack: not the floor, not the power chain, not the cooling plant. The expensive mistake of the 2026 era is designing to today's density and being surprised by the ramp.
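The "~3×" and "order of magnitude" entries in the table are simple ratios against the ~41 kW reference. The sketch below reproduces them using the upper per-rack figure for each generation; taking the upper bound is an assumption made here for illustration.

```python
AIR_CEILING_KW = 41.0  # practitioner reference ceiling quoted in this chapter

# Upper per-rack figure from the table above (2026-2027 rows are roadmap).
RACK_KW = {
    "A100 / HGX": 20.0,
    "H100 / HGX": 40.0,
    "GB200 NVL72": 132.0,
    "GB300 NVL72": 142.0,
    "Rubin VR200": 230.0,
    "Rubin Ultra Kyber": 600.0,
}


def multiple_of_ceiling(rack_kw: float, ceiling_kw: float = AIR_CEILING_KW) -> float:
    """How many times the air ceiling a given rack draw represents."""
    return rack_kw / ceiling_kw


for name, kw in RACK_KW.items():
    m = multiple_of_ceiling(kw)
    side = "over" if m > 1.0 else "under"
    print(f"{name:18s} {kw:5.0f} kW  {m:5.1f}x ceiling ({side})")
```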

The density-ramp trap is the irreversible mistake

The fix is not to build for 600 kW on day one — that strands capital against a depreciation clock. The fix is to make the irreversible substrate accommodate the ramp while keeping the reversible IT fit-out matched to the current generation. Reserve what you cannot retrofit: floor loading for ~3,000–5,000 lb wet racks, facility water and pipe-rack routing, CDU floor space, and electrical headroom and knockouts for the next voltage class. Defer what you can: the specific accelerator, the CDU count, the loop fill. A powered shell plumbed for liquid preserves the density-ramp option at a modest premium; a hall built air-only against the 2024 generation forecloses it permanently. → the scoping cascade that frames this fork is in Chapter 1.1.

Figure	What it is	As of	Source
~41 kW	practical air-cooling ceiling per rack; RDHx bridges ~50–100 kW; DLC clears 200+ kW	2025	ASHRAE TC 9.9; SemiAnalysis Datacenter Anatomy
~3,500×	volumetric heat capacity of water vs air; the reason liquid operates in a different cooling regime	2025	Thermodynamic reference; ASHRAE TC 9.9
~120–132 kW	per GB200 NVL72 rack (~115 kW removed by liquid, ~17 kW by air); ~3× over the air wall	2025	NVIDIA OCP / Introl
20–25 °C inlet, ~80 L/min	GB200 NVL72 DLC envelope; deviation throttles GPUs up to ~50%	2025	NVIDIA OCP / Introl
~135–142 kW	per GB300 NVL72 rack (up to ~155 kW peak); CPUs/GPUs/NVSwitch liquid, optics/storage air	2025	Schneider Electric / HPE / Lenovo datasheets
~600 kW	per Rubin Ultra Kyber NVL576 rack on 800 VDC (roadmap, H2 2027)	H2 2027 (announced)	NVIDIA GTC; The Next Platform; Tom's Hardware
~2.3 kW	projected Rubin Ultra per-GPU TDP (A100 was ~0.3 kW); a ~7× climb in seven years	2026 (pre-ship)	NVIDIA / SemiAnalysis roadmap
$5–10M/MW	cost to retrofit an air-cooled hall across the cliff to AI liquid cooling; still strands capacity	2026	Introl / SemiAnalysis

The cooling hierarchy and where each rung saturates

Map the density curve onto the available cooling technologies and you get a hierarchy — a ladder where each rung removes more heat at higher capital cost and integration complexity, and each rung saturates at a density that hands the load to the next. The engineering discipline is to choose the lowest rung that clears your peak density with margin, because every rung up costs money, water, and plumbing complexity you do not get back.

Air (containment + CRAH/in-row). Saturates ~40–50 kW/rack. Cheapest, simplest, no facility water at the rack. Still the right answer for storage, networking, modest-density inference, and edge. → Chapter 5.2.
Rear-door heat exchangers / air-assisted liquid. Bridges ~50–100 kW/rack. Captures heat at the rack exhaust with a liquid coil; the brownfield-friendly rung because it needs no chip-level plumbing and tolerates facilities without facility water. Saturates where the door coil can no longer extract a high enough fraction of the heat. → Chapter 5.3.
Direct-to-chip liquid (single-phase DLC). The 2026 default, ~55% of the liquid-cooling market. Cold plates on the GPUs/CPUs/switches, in-rack manifolds, dripless quick-disconnects, a CDU isolating the technology-cooling loop from facility water. Clears 100 kW to 200+ kW and scales to the Kyber generation with warm-water loops. → Chapter 5.4; CDU loop in Chapter 5.6.
Immersion (single- and two-phase). Best-in-class PUE, but niche: single-phase wins on serviceability and floor loading, two-phase stalled on the PFAS reckoning and insurability. → Chapter 5.5.
In-chip / direct-to-silicon microfluidics. The next rung, attacking the in-package Rθ-JC resistor itself with microchannels etched into or onto the die — the only lever that touches the dominant resistance the cold plate cannot reach. Roadmap, not yet default. → Chapter 16.2.

Deep dive: the chain of delta-Ts, approach temperature, and why warm water decides free cooling

Heat does not teleport from the junction to the sky; it walks down a staircase of temperature drops, and every step costs you. SemiAnalysis frames this as the Four Delta-Ts, and it is the right mental model for the entire facility. Start at the junction (~90 °C limit). Drop across the package and TIMs to the cold-plate coolant. Drop again across the loop as the coolant carries heat to the CDU. Drop a third time across the CDU heat exchanger from the technology-cooling loop to the facility-water loop. Drop a fourth time at heat rejection: the cooling tower, dry cooler, or chiller that finally hands the heat to ambient. The junction temperature is fixed; ambient is fixed by your climate and season; everything in between is a budget of degrees you allocate across those four steps.

The lever at each exchanger is approach temperature: the residual gap between one stream leaving a heat exchanger and the other stream entering it (at a CDU, technology-loop supply minus facility-water supply), a gap that never fully closes. A tighter approach means a more effective (and larger, costlier) exchanger, and it lets the cold side run warmer for the same hot-side target. Formally this is captured by effectiveness and the NTU (number of transfer units) method: effectiveness is the actual heat transferred divided by the thermodynamic maximum, and it rises with NTU, which rises with exchanger surface area and overall conductance. More area buys more effectiveness buys a tighter approach buys warmer facility water for the same junction temperature.
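The link from NTU to approach can be sketched with the standard effectiveness relation for an ideal counterflow exchanger. The sketch takes approach as hot-side outlet minus cold-side inlet (technology-loop supply minus facility-water supply at a CDU); the capacity rates and temperatures are hypothetical, and a real CDU plate exchanger will deviate from the ideal counterflow model.

```python
import math


def effectiveness_counterflow(ntu: float, c_ratio: float) -> float:
    """Standard epsilon-NTU relation for an ideal counterflow exchanger."""
    if ntu < 0 or not 0.0 <= c_ratio <= 1.0:
        raise ValueError("need ntu >= 0 and 0 <= c_ratio <= 1")
    if math.isclose(c_ratio, 1.0):
        return ntu / (1.0 + ntu)  # balanced-flow limit
    x = math.exp(-ntu * (1.0 - c_ratio))
    return (1.0 - x) / (1.0 - c_ratio * x)


def approach_k(ntu: float, c_hot_w_k: float, c_cold_w_k: float,
               t_hot_in_c: float, t_cold_in_c: float) -> float:
    """Hot-side outlet minus cold-side inlet, in kelvin."""
    c_min = min(c_hot_w_k, c_cold_w_k)
    c_max = max(c_hot_w_k, c_cold_w_k)
    eps = effectiveness_counterflow(ntu, c_min / c_max)
    q_w = eps * c_min * (t_hot_in_c - t_cold_in_c)
    t_hot_out_c = t_hot_in_c - q_w / c_hot_w_k
    return t_hot_out_c - t_cold_in_c


# Hypothetical balanced loops: 5 kW/K each, 45 C loop return, 30 C facility supply.
for ntu in (1.0, 2.0, 4.0):
    print(f"NTU {ntu:.0f}: approach {approach_k(ntu, 5000.0, 5000.0, 45.0, 30.0):.1f} K")
```

In this example, quadrupling NTU from 1 to 4 tightens the approach from 7.5 K to 3 K, which is the diminishing-returns trade between exchanger size and warm-water headroom.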

Why does warmer water matter so much? Because it is the difference between free cooling and mechanical chilling. If your facility-water loop can run warm — ASHRAE's W17 through W45-plus classes key cooling water by supply temperature — a dry cooler or tower can reject heat to ambient for most or all of the year, and a PUE near 1.1 is reachable. If your delta-T budget forces cold supply water, you burn compressor energy on chillers, drive PUE up, and shrink your siting envelope to cool climates. Every degree you waste on a sloppy TIM or an under-sized exchanger upstream is a degree you cannot spend on free cooling downstream. This is why the 30 °C-coolant roadmap exists, and why warm-water design is treated as a first-class objective rather than an afterthought. → facility loops and warm-water design in Chapter 5.7; heat rejection in Chapter 5.8; the metric definitions in Chapter 15.1.
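A rough way to see how supply temperature sets the free-cooling window: count the hours in which ambient plus the heat-rejection approach still lands at or below the required facility-water supply. The ambient samples and the 8 K approach below are made-up illustrative values, and a real study would use hourly weather data and wet-bulb temperature for evaporative equipment.

```python
from typing import Iterable


def free_cooling_fraction(ambient_c: Iterable[float],
                          required_supply_c: float,
                          rejection_approach_k: float) -> float:
    """Fraction of sampled hours where ambient + approach <= required supply."""
    hours = list(ambient_c)
    if not hours:
        raise ValueError("ambient_c must contain at least one sample")
    free = sum(1 for t in hours if t + rejection_approach_k <= required_supply_c)
    return free / len(hours)


# Hypothetical ambient samples (C) standing in for a year of hourly data.
SAMPLE_AMBIENT_C = [5.0, 12.0, 20.0, 28.0, 35.0]

for supply_c in (20.0, 32.0, 45.0):
    frac = free_cooling_fraction(SAMPLE_AMBIENT_C, supply_c, 8.0)
    print(f"supply {supply_c:.0f} C -> free cooling {frac:.0%} of samples")
```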

Thermal metrics used in this part

Part 5 leans on a small, consistent vocabulary of thermal metrics. Pin them down here so the engineering chapters can use them without re-deriving:

Approach temperature — the residual gap between one stream leaving a heat exchanger and the other stream entering it. Smaller approach, more effective and more expensive exchanger, warmer achievable cold-side supply. The single knob you tune at every exchanger in the chain.
Effectiveness (ε) and NTU — effectiveness is actual heat transfer over the thermodynamic maximum; NTU is the dimensionless measure of exchanger size (overall heat-transfer coefficient × area over the minimum heat-capacity rate, NTU = UA / Cmin). The ε-NTU method is how you size CDU and facility heat exchangers without solving the full temperature field. As NTU rises, ε approaches its ceiling asymptotically, with diminishing returns.
Delta-T (ΔT) — the temperature rise a coolant carries across a load. A larger ΔT moves the same heat at lower flow (smaller pumps, smaller pipes), which is why warm-water, high-ΔT design is favored. The flow-rate rule of thumb — ~1.2–2.0 L/min per kW — falls directly out of the ΔT you choose.
Heat flux (W/cm²) — power per die area, the quantity the cold plate actually fights at the hotspot, distinct from total TDP.
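The flow-rate rule of thumb in the ΔT entry follows from Q = ṁ × cp × ΔT. The sketch below assumes plain water near 20 °C; glycol mixtures have a lower specific heat and need somewhat more flow for the same ΔT.

```python
def flow_lpm_per_kw(delta_t_k: float,
                    density_kg_per_l: float = 0.998,
                    cp_j_per_kg_k: float = 4182.0) -> float:
    """Coolant flow in L/min needed to carry 1 kW at the given temperature rise."""
    if delta_t_k <= 0:
        raise ValueError("delta_t_k must be positive")
    mass_flow_kg_s = 1000.0 / (cp_j_per_kg_k * delta_t_k)
    return mass_flow_kg_s / density_kg_per_l * 60.0


for dt in (7.0, 10.0, 12.0):
    print(f"dT {dt:4.1f} K -> {flow_lpm_per_kw(dt):.2f} L/min per kW")
```

For water, the ~1.2–2.0 L/min per kW range corresponds to a loop ΔT of roughly 7 to 12 K.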

The facility-efficiency metrics — PUE, WUE, ITUE, and TUE — sit one level up, scoring the whole plant rather than a single exchanger. PUE is total facility energy over IT energy; WUE is water consumed per IT energy; ITUE and TUE extend the accounting to capture fan and pump parasitics that liquid cooling reshuffles. These are defined canonically in Chapter 15.1 and used throughout Part 5 as the scorecard for the design choices this chapter sets up; we name them here only so the cross-references resolve.
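For orientation only, the two simplest ratios can be written out as below. The energy and water totals are invented round numbers, and the canonical definitions remain those in Chapter 15.1.

```python
def pue(total_facility_kwh: float, it_kwh: float) -> float:
    """Power usage effectiveness: total facility energy over IT energy."""
    if it_kwh <= 0:
        raise ValueError("it_kwh must be positive")
    return total_facility_kwh / it_kwh


def wue_l_per_kwh(water_litres: float, it_kwh: float) -> float:
    """Water usage effectiveness: litres of water consumed per kWh of IT energy."""
    if it_kwh <= 0:
        raise ValueError("it_kwh must be positive")
    return water_litres / it_kwh


# Hypothetical annual totals for illustration.
print(f"PUE: {pue(11_000_000.0, 10_000_000.0):.2f}")
print(f"WUE: {wue_l_per_kwh(2_000_000.0, 10_000_000.0):.2f} L/kWh")
```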

Choose the lowest rung that clears your peak density with margin

The cooling-hierarchy decision reduces to a single rule with two failure modes on either side. Choose a rung too low for your density and you hit the saturation wall — the rack throttles, the hall strands capacity, and you pay $5–10M/MW to retrofit upward. Choose a rung too high for your density and you have over-plumbed a facility, sinking capital into CDUs, manifolds, and facility water that a modest-density inference workload never needed. The right move is to identify the peak density across the planned ramp (not the steady state), select the lowest rung that clears it with thermal margin, and make sure the irreversible substrate reaches one rung higher than today's fit-out. Peak-density-driven, ramp-aware, substrate-hedged: that is thermal scoping.
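The rule can be sketched as a small selector. The saturation points are the approximate figures used in this chapter (~41 kW for air, ~100 kW for rear-door exchangers); treating direct-to-chip liquid as having no ceiling, omitting immersion and in-chip microfluidics, and the 15% margin are all simplifying assumptions.

```python
from typing import Sequence

# (rung name, assumed saturation in kW per rack), ordered lowest rung first.
RUNGS = (
    ("air", 41.0),
    ("rear-door heat exchanger", 100.0),
    ("direct-to-chip liquid", float("inf")),  # no ceiling modelled here
)


def lowest_rung(ramp_kw: Sequence[float], margin: float = 0.15) -> str:
    """Pick the lowest rung that clears the peak of the planned ramp with margin."""
    if not ramp_kw:
        raise ValueError("ramp_kw must contain at least one density")
    design_kw = max(ramp_kw) * (1.0 + margin)  # peak, not steady state
    for name, saturation_kw in RUNGS:
        if design_kw <= saturation_kw:
            return name
    raise AssertionError("unreachable: the last rung has no ceiling")


print(lowest_rung([15.0, 20.0]))          # air
print(lowest_rung([30.0, 40.0]))          # rear-door heat exchanger
print(lowest_rung([40.0, 132.0, 600.0]))  # direct-to-chip liquid
```

Note that a 40 kW peak already lands on the rear-door rung once margin is applied, which is the point of sizing against the ramp rather than today's rack.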

This chapter sets the foundation the rest of Part 5 builds on. Air pushed to its honest limit is Chapter 5.2; the rear-door bridge is Chapter 5.3; direct-to-chip liquid — the 2026 default — is Chapter 5.4; immersion is Chapter 5.5; the CDU and secondary loop are Chapter 5.6; facility water and warm-water design are Chapter 5.7; heat rejection is Chapter 5.8; and retrofitting across the cliff is Chapter 5.10. The density-and-cooling fork that this chapter treats as physics is framed as a scoping decision in Chapter 1.1; in-chip microfluidics that attack the in-package resistance live on the roadmap in Chapter 16.2; and the efficiency metrics that score every cooling choice are defined in Chapter 15.1.