The Definitive Guide to AI Data Centers

Chapter 15.2

Energy Efficiency: Cooling, Free Cooling, Setpoints & Power-Chain Losses

Energy efficiency in an AI facility is not a virtue you bolt on after commissioning — it is a chain of irreversible design forks (coolant temperature, economizer hours, voltage class) that you either bank at scoping time or pay for in megawatts every hour the building runs, and the leverage has migrated from chillers to coolant setpoints and the power chain.

POWER-BOUND · GOODPUT

What you'll decide here

  1. The facility-water and coolant setpoint band (chilled ~18-27 C vs warm-water ~32-45 C) — the single decision that determines how many compressor-free hours your climate gives you and whether heat reuse is even physically possible.
  2. Whether to design for free cooling / economization at all, and which kind (airside, waterside, or dry-cooler-direct) — a climate-and-water-coupled fork that sets your annualized PUE floor, not the nameplate.
  3. The IT inlet and coolant-return setpoints you will actually operate at — every degree of headroom you decline to use is compressor energy you chose to spend, but raising setpoints narrows the thermal ride-through margin that protects 1 kW+ GPUs.
  4. The power-chain topology and UPS operating mode (double-conversion ~94-96% vs eco/dynamic-online ~98-99%; 415/480 VAC vs an 800 VDC path) — 2-5 points of end-to-end efficiency that compounds across every watt, for the life of the building.
  5. Whether ML-driven cooling control and part-load-aware operation are in the design basis or retrofitted later — the difference between a plant that is efficient at nameplate and one that is efficient at the 40-70% utilization where it actually lives.

The efficiency conversation in AI data centers is usually conducted in the wrong units. People argue about PUE as if it were a scoreboard, when it is a consequence: the visible residue of a dozen upstream decisions about coolant temperature, economizer design, setpoint discipline, and how many conversion stages sit between the utility and the GPU's voltage regulator. Most of the leverage is spent at design time, in decisions that are expensive or impossible to reverse once steel is cut and water is plumbed.

Three things changed with the AI buildout. First, the denominator grew: an air hall's cooling overhead used to be 40-50% of IT load; a well-designed warm-water direct-to-chip (DLC) plant can push facility overhead toward 5-10%, which means the marginal efficiency game has moved off the chiller and onto the coolant loop and the power chain. Second, density removed the slack: at 132 kW/rack and climbing toward 600 kW, you no longer have the option to waste a few points of efficiency on conservative setpoints — the cooling plant is on the critical path, and every avoidable watt of overhead is a watt of IT capacity you cannot energize in a power-bound site. Third, the metric stopped meaning what it used to: PUE was built for an air-cooled world and quietly rewards moving inefficiency inside the IT envelope (server fans, pumps) where it stops counting as overhead. We treat the metric stack itself in Chapter 15.1; here we treat the physical decisions that move it.
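The power-bound point can be made concrete with one line of arithmetic. The sketch below assumes a hypothetical 100 MW interconnection and two illustrative PUE values taken from inside the bands discussed in this chapter; the function name is invented for illustration.

```python
def it_capacity_mw(site_power_mw: float, pue: float) -> float:
    """IT load that a fixed utility allocation can energize at a given PUE."""
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return site_power_mw / pue


# Hypothetical 100 MW site; PUE values are illustrative, not measurements.
site_mw = 100.0
air_hall_mw = it_capacity_mw(site_mw, 1.40)
warm_dlc_mw = it_capacity_mw(site_mw, 1.10)

print(f"IT capacity at PUE 1.40: {air_hall_mw:.1f} MW")
print(f"IT capacity at PUE 1.10: {warm_dlc_mw:.1f} MW")
print(f"Extra IT load energized: {warm_dlc_mw - air_hall_mw:.1f} MW")
```

In a power-bound site the difference between the two results is compute you could have energized on the same interconnection.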

The efficiency budget: where the leverage actually is

Start by drawing the energy budget honestly, because that is where you find the leverage. In a modern AI facility, non-IT (overhead) energy splits roughly into cooling (60-80% of overhead) and the power chain (conversion and distribution losses, most of the rest), with lighting and ancillaries a rounding error. Cooling is the dominant lever, and within cooling the dominant sub-lever is not the chiller's efficiency at full load — it is how many hours per year you can avoid running the compressor at all. That single framing reorders the entire chapter: the highest-value efficiency decision is not buying a better chiller, it is designing a coolant loop warm enough that the chiller is mostly off.

The reason this is the master lever is thermodynamic. Mechanical (compressorized) cooling is the expensive mode; free cooling, which rejects heat directly to ambient air or water without a vapor-compression cycle, is nearly free by comparison. The fraction of the year you spend in each mode is set almost entirely by two numbers you choose at design time: the coolant/facility-water supply temperature and the approach temperature of your heat-rejection equipment relative to ambient. Raise the supply temperature and you convert thousands of compressor-hours into economizer-hours. This is why a Reykjavík facility and a Phoenix facility built to the same nameplate PUE can have wildly different annualized PUE. The nameplate is a snapshot at design conditions; the annualized number is the integral over the weather, and the weather is set by siting (Chapter 3.7) and exploited by setpoint.
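A minimal sketch of "the annualized number is the integral over the weather", under loud assumptions: the hourly dry-bulb series is synthetic (two sinusoids, not a real weather file), IT load is constant, an hour counts as free cooling when ambient plus approach is at or below the supply temperature, and the two per-mode PUE values are hypothetical placeholders.

```python
import math

HOURS_PER_YEAR = 8760


def synthetic_dry_bulb_c(mean_c, seasonal_amp_c, daily_amp_c):
    """Hypothetical hourly dry-bulb series: an annual sinusoid plus a daily one."""
    series = []
    for hour in range(HOURS_PER_YEAR):
        seasonal = seasonal_amp_c * math.sin(2 * math.pi * hour / HOURS_PER_YEAR)
        daily = daily_amp_c * math.sin(2 * math.pi * (hour % 24) / 24)
        series.append(mean_c + seasonal + daily)
    return series


def annualized_pue(ambient_c, supply_c, approach_c, pue_free=1.06, pue_mech=1.35):
    """Return (annualized PUE, free-cooling fraction) at constant IT load.

    An hour is compressor-free when ambient + approach <= supply temperature.
    pue_free and pue_mech are assumed per-mode values, not measured data.
    """
    free_hours = sum(1 for t in ambient_c if t + approach_c <= supply_c)
    free_fraction = free_hours / len(ambient_c)
    pue = free_fraction * pue_free + (1.0 - free_fraction) * pue_mech
    return pue, free_fraction


# Two hypothetical climates, swept across three supply temperatures.
sites = {
    "cool site": synthetic_dry_bulb_c(mean_c=5.0, seasonal_amp_c=8.0, daily_amp_c=4.0),
    "hot site": synthetic_dry_bulb_c(mean_c=24.0, seasonal_amp_c=10.0, daily_amp_c=7.0),
}
for name, ambient in sites.items():
    for supply_c in (20.0, 30.0, 40.0):
        pue, frac = annualized_pue(ambient, supply_c, approach_c=8.0)
        print(f"{name}, supply {supply_c:.0f} C: free fraction {frac:.2f}, PUE {pue:.3f}")
```

With a real hourly weather file in place of the synthetic series, the same loop shows how far a given supply temperature moves a specific site within its PUE band.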

Cooling architecture and the PUE bands it produces

Cooling architecture is the first fork, and it sets a hard band on achievable PUE before you tune a single setpoint. The choice is not a smooth dial — it is a small set of discrete regimes, each with its own efficiency floor, each gated by rack density (the cooling cliff of Chapter 5.1). The table below is the decision, stated as bands rather than point estimates because the within-band variation is exactly the setpoint-and-climate game discussed in the rest of the chapter.

Cooling architecture → achievable PUE band and the efficiency mechanism
| Cooling architecture | Density fit | PUE band | Free-cooling exposure | Why it sits there |
|---|---|---|---|---|
| Legacy air + DX/chiller | ≤ ~30 kW/rack | 1.4-1.6 | Low: compressor-bound much of the year | Compressorized cooling dominates; warm supply air limited by IT inlet specs |
| Optimized air + economizer | ≤ ~40 kW/rack | 1.25-1.4 | Moderate: airside/waterside economizer hours | Containment + economization cut compressor hours; still air-transport-limited |
| Rear-door HX / air-assisted liquid | ~40-75 kW/rack | 1.2-1.35 | Moderate-high: warmer loop enables more free hours | Liquid-to-the-door removes the air-transport penalty; loop can run warmer |
| Direct-to-chip liquid (warm-water) | ~75-200+ kW/rack | 1.05-1.15 | High: 32-45 C loop is dry-cooler / economizer friendly | Heat captured at the source in a warm loop; compressor often unnecessary |
| Single-/two-phase immersion | ~50-200+ kW/rack | 1.02-1.10 | Very high: high-grade warm bath rejects to ambient | Minimal fan/pump parasitics; best floor, but fluid/PFAS and serviceability costs |
PUE bands are 2026-current design ranges (SemiAnalysis Datacenter Anatomy Pt 2; Uptime Institute). Annualized PUE lands inside these bands as a function of climate and setpoint discipline — the colder the site and the warmer the loop, the closer to the floor.

The architecture you can choose is gated by density on the left; the PUE band you inherit is on the right; and the mechanism that places you within the band, free-cooling exposure, is the lever the rest of this chapter pulls. The consequence is sharp: choosing air at the limit instead of warm-water DLC does not merely cost you a few tenths of PUE at nameplate — it caps your annualized efficiency at a band you can never escape, because an air hall's loop is too cold to economize for most of the year in most climates. The cooling architecture is, in effect, a decision about how many compressor-hours you have signed up to pay for over the building's life.

Free cooling and economization: the highest-value efficiency decision

Free cooling — economization — is the act of rejecting heat to the environment without running a vapor-compression cycle. It is the single largest annualized-efficiency lever in the building, and it comes in three architectures that trade water, capital, and reachable hours against one another. The fork is not merely "do we economize" (you always should) but which kind, because each one couples to a different siting constraint and a different downstream cost.

Airside economization pulls filtered outside air directly into the hall when ambient is cool and dry enough, exhausting hot air rather than recirculating and re-cooling it. It is the cheapest to operate and the most water-free, but it imports outdoor humidity, particulates, and gaseous contaminants into the white space — a real reliability liability — and it works only when the air-cooled IT inlet spec (ASHRAE A1-A4) can be met directly. Waterside economization keeps the air loop sealed and instead uses a cooling tower or dry cooler to chill the facility water without the chiller, via a plate heat exchanger when wet-bulb (or dry-bulb) is low enough. It tolerates a wider climate envelope and keeps contaminants out, at the cost of water (evaporative towers) or a larger dry-cooler footprint and a warmer achievable loop. Dry-cooler-direct (compressor-less) operation is the warm-water DLC endgame: if your facility loop runs at 40-45 C, a dry cooler can reject to ambient air across most of the year with no evaporation and no compressor at all — zero process water for cooling, and PUE that approaches the pump-and-fan parasitic floor.

Free-cooling architecture → the trade it forces
| Economizer type | Sealed white space? | Water use | Reachable free hours | Primary downside |
|---|---|---|---|---|
| Airside (direct outside air) | No: outside air enters hall | None (unless adiabatic assist) | High in cool/dry climates | Imports humidity, particulates, gaseous contaminants; needs filtration + RH control |
| Airside + adiabatic assist | No | Moderate (evaporative pre-cool) | Extends warm-climate hours | Reintroduces water; adds spray/media maintenance |
| Waterside (tower + plate HX) | Yes | High (evaporative) or low (dry) | High where wet-bulb is low | Tower water, blowdown, Legionella control (ASHRAE 188) |
| Dry-cooler-direct (warm loop) | Yes | ≈ Zero for cooling | Very high if loop ≥ ~40 C | Larger heat-rejection footprint; needs warm-water DLC to begin with |
Reachable free hours depend on climate; the figures are directional for a temperate-to-cool site. The engineering home for heat-rejection equipment is Chapter 5.8; the warm-water facility loop is Chapter 5.7.

The trade is a water-versus-PUE-versus-capital triangle, and the AI-density era has bent it decisively toward the dry-cooler-direct corner. The reason is the warm loop: once DLC lets you capture heat at 40-45 C instead of cooling air to 18-27 C, the dry cooler becomes viable for most of the year in most temperate climates, which lets you design water out of the building entirely for cooling — the closed-loop, near-zero-WUE designs hyperscalers now publish (Microsoft's next-gen closed-loop facilities report cutting >125 million litres per year). That is the same decision that governs water stewardship in Chapter 15.4, which is why efficiency and water cannot be optimized separately: the coolant setpoint that maximizes free-cooling hours is also the one that lets you eliminate evaporative water. The cost you pay is footprint and capital — dry coolers are larger and cannot reach as low a loop temperature as an evaporative tower on a hot day — which is precisely why this is a siting-coupled decision, not a mechanical one.

Warm-water / high-temperature cooling: the setpoint that unlocks everything

Everything above converges on one number: the facility-water supply temperature. The industry has spent decades over-cooling — running 18-27 C chilled loops because that was what air-cooled IT inlets demanded — and in doing so threw away both free-cooling hours and any chance of heat reuse. Warm-water DLC inverts the logic. ASHRAE's liquid-cooling classes (W17 through W45+ in the 5th-edition Thermal Guidelines) are keyed to the upper facility-water supply temperature precisely to make this a deliberate design choice, and the direction of travel is unambiguous: ASHRAE TC 9.9's own roadmap argues for standardizing toward a ~30 C facility-water target, and NVIDIA's Vera Rubin reference designs target dry-cooler-capable 40-45 C loops.

The consequence chain from this one setpoint is the most important in the chapter. A warmer loop (a) widens the temperature difference between your coolant and ambient, which (b) lets a dry cooler or economizer reject heat across more of the year, which (c) collapses compressor-hours toward zero, which (d) drops annualized PUE toward the parasitic floor, and simultaneously (e) lifts the return-water temperature high enough that the waste heat becomes a sellable product instead of a disposal problem (Chapter 15.5). One setpoint, five downstream wins. The cost is margin: a warmer loop leaves less thermal headroom, so the cold-plate design, flow rate, and CDU approach temperature must be tighter and the controls more disciplined (the transient-stability problem of Chapter 5.12). The engineering of the loop that delivers it lives in Chapter 5.7.

Setpoint strategy: every conservative degree is a watt you chose to spend

Setpoint strategy is where design intent meets operating reality, and it is where most facilities quietly leave efficiency on the table. The instinct of an operations team is to run cold and conservative — it feels safe. But ASHRAE's recommended IT inlet envelope has been 18-27 C for years, with A1-A4 allowable ranges extending to ~32-45 C, and every degree you decline to use is compressor or fan energy you have chosen to spend for a thermal margin the equipment did not require. The setpoint decision is therefore a deliberate risk-versus-efficiency trade, and it must be made explicitly rather than defaulted to "cold."

The fork has three settings. Conservative (low IT inlet, cold loop, wide margin) maximizes ride-through and equipment longevity headroom at the cost of free-cooling hours; it is defensible only for legacy air halls or sites with poor thermal monitoring. ASHRAE-recommended (mid-band inlet, moderate loop) is the safe default for most operators. Aggressive / allowable-band (high inlet within A-class limits, warm loop) maximizes economizer hours and heat-reuse grade, and is the right posture for a well-instrumented liquid-cooled facility with disciplined controls. It also narrows the thermal ride-through window, which matters enormously when a 1 kW+ GPU can thermal-trip within seconds of a cooling-loss event (the resilience coupling of Chapter 12.2). The downstream cost of running warm is usually not lost efficiency; it is spent transient margin, so your pump redundancy, UPS-backed cooling, and controls stability had better be commissioned to match. Raising setpoints without first proving ride-through is how an efficiency initiative becomes an outage.
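How fast a warmer setpoint eats ride-through can be estimated with a lumped thermal-mass calculation. This is a rough sketch, not a transient model: it assumes heat rejection is lost while pumps keep circulating, treats the coolant as plain water, ignores the thermal mass of pipes and cold plates, and uses a hypothetical loop volume, heat load, and maximum coolant temperature.

```python
WATER_CP_J_PER_KG_K = 4186.0  # specific heat of water
WATER_KG_PER_L = 1.0          # approximate density of water


def ride_through_s(loop_volume_l, heat_load_kw, supply_c, max_coolant_c):
    """Seconds until a well-mixed loop warms from its setpoint to its limit.

    Lumped estimate: time = mass * specific heat * headroom / heat load.
    """
    headroom_k = max_coolant_c - supply_c
    if headroom_k <= 0:
        return 0.0
    stored_j = loop_volume_l * WATER_KG_PER_L * WATER_CP_J_PER_KG_K * headroom_k
    return stored_j / (heat_load_kw * 1000.0)


# Hypothetical loop: 2,000 L of coolant serving 1 MW, with a 50 C coolant limit.
for supply_c in (30.0, 40.0, 45.0):
    seconds = ride_through_s(2000.0, 1000.0, supply_c, max_coolant_c=50.0)
    print(f"supply {supply_c:.0f} C: about {seconds:.0f} s of ride-through")
```

The window scales linearly with headroom, so a setpoint that leaves a quarter of the margin leaves a quarter of the time for pumps, UPS-backed cooling, and controls to respond.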

Power-chain efficiency: the other half of overhead

Cooling is the larger lever, but the power chain is the one that compounds across every single watt the building draws, every hour, with no weather dependence. Each conversion stage between the utility and the GPU's voltage regulator — transformer, UPS, PDU, rack PSU, board-level VRM — sheds a few percent, and the product of those efficiencies is the end-to-end electrical efficiency. A legacy AC chain can land anywhere from ~61% to ~87.5% end-to-end depending on vintage; a modern path exceeds 92%, and an 800 VDC architecture targets a further ~5-point gain by eliminating conversion stages (SemiAnalysis Datacenter Anatomy Pt 1, 2025). Five points end-to-end does not sound like much until you multiply it by a gigawatt running continuously for a decade.

Two power-chain decisions carry most of the leverage. The first is the UPS operating mode: a double-conversion UPS runs at ~94-96% efficiency because it rectifies and re-inverts continuously; eco / dynamic-online modes hold the inverter on standby and pass utility power through at ~98-99%, recovering 2-4 points — at the cost of a few milliseconds of transfer time that must be reconciled against the load's ride-through requirement (the UPS architecture decision lives in Chapter 4.5). The second is the voltage architecture: the 48 V → ±400 V → 800 VDC transition (Chapter 4.7) removes conversion stages, pushes ~150% more power through the same copper, and is the structural enabler for 600 kW+ racks — which is why the efficiency case and the density-ramp case point the same way. Both decisions are substantially irreversible: you commission a UPS topology and a voltage class once, and re-doing either mid-life is a rebuild, not a tune-up.
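Because stage efficiencies multiply, the effect of one UPS mode change can be traced through the whole chain. The sketch below uses hypothetical per-stage efficiencies (only the UPS values come from the ranges quoted above) and a hypothetical 100 MW delivered load; it illustrates the arithmetic and does not model any specific product.

```python
from math import prod

HOURS_PER_YEAR = 8760


def chain_efficiency(stages: dict[str, float]) -> float:
    """End-to-end efficiency is the product of the per-stage efficiencies."""
    return prod(stages.values())


def annual_loss_mwh(delivered_mw: float, efficiency: float) -> float:
    """Energy lost in the chain per year while delivering delivered_mw."""
    input_mw = delivered_mw / efficiency
    return (input_mw - delivered_mw) * HOURS_PER_YEAR


# Hypothetical stage efficiencies; UPS at 0.95 represents double-conversion.
double_conversion = {
    "transformer": 0.985,
    "ups": 0.95,
    "pdu": 0.99,
    "rack_psu": 0.94,
    "vrm": 0.90,
}
# Same chain with the UPS in an eco / dynamic-online mode (assumed 0.985).
eco_mode = {**double_conversion, "ups": 0.985}

for name, stages in (("double-conversion", double_conversion), ("eco mode", eco_mode)):
    eff = chain_efficiency(stages)
    loss = annual_loss_mwh(100.0, eff)
    print(f"{name}: end-to-end {eff:.3f}, annual loss {loss:,.0f} MWh at 100 MW delivered")
```

Swapping a single stage changes the whole product, which is why a few points at the UPS show up as a large absolute number at facility scale.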

| Figure | What it describes | Source (year) |
|---|---|---|
| ~1.54 | Industry weighted-average PUE, flat for six straight years (legacy infrastructure limits gains) | Uptime Institute Global Data Center Survey 2025 (2025) |
| ~1.09 | Google fleet-wide trailing-12-month PUE; best sites ~1.07-1.08 (quarterly swings 1.08-1.11 with season) | Google datacenters.google (2025) |
| 1.05-1.15 | PUE band for warm-water direct-to-chip liquid; two-phase immersion ~1.02-1.10; air halls 1.3-1.5+ | SemiAnalysis Datacenter Anatomy Pt 2 / Uptime (2025) |
| ~60-80% | Cooling share of non-IT (overhead) energy: the dominant efficiency lever | ASHRAE TC 9.9 / SemiAnalysis synthesis (2025) |
| ~40-45 C | Warm-water facility-loop target (NVIDIA Vera Rubin) enabling dry-cooler / compressor-less rejection; ASHRAE roadmap argues ~30 C standard | ASHRAE TC 9.9 30°C Coolant Roadmap; NVIDIA (2025) |
| 94-96% → 98-99% | UPS efficiency: double-conversion vs eco/dynamic-online mode (2-4 points recoverable, large at GW scale) | Vertiv / Eaton; SemiAnalysis (2025) |
| >92% (→ +~5 pts) | Modern end-to-end electrical-chain efficiency; 800 VDC path targets a further ~5-point gain over legacy AC | SemiAnalysis Datacenter Anatomy Pt 1; 800VDC Revolution (2025) |
| up to ~50% | GPU throttle if coolant inlet/flow drifts out of spec: the hard stop on warm-setpoint efficiency gains | NVIDIA OCP / Introl (GB200 NVL72) (2025) |

ML-driven cooling optimization: closing the part-load gap

A plant designed to be efficient at nameplate is not the same as a plant that operates efficiently, because the building almost never sits at nameplate. AI facilities live at 40-70% utilization much of the time, with training jobs that ramp and checkpoint and inference fleets that swing 30-90% in minutes — and cooling plant that is efficient at full load is frequently inefficient at part load, where pumps, fans, and chillers run off their best-efficiency point. The part-load gap is real money, and it is the domain where machine-learning control has earned its place in the design basis rather than as an afterthought.

The canonical result is Google's, where a DeepMind reinforcement-learning controller adjusting setpoints across the cooling plant in real time cut cooling energy by ~40% and total facility overhead (PUE) by ~15% relative to the human-tuned baseline — the kind of gain that is invisible to nameplate PUE because it lives entirely in the part-load, multi-variable interactions a static setpoint table cannot capture. The mechanism is straightforward: a cooling plant has dozens of coupled actuators (pump speeds, valve positions, tower fan speeds, chiller staging) and a non-linear response surface that shifts with load and weather; an ML controller searches that surface continuously where a human operator sets-and-forgets. The decision is whether to design for this — instrumenting the plant densely enough (the DCIM telemetry of Chapter 14.2) and giving the controller safe authority — or to bolt it on later against a plant that lacks the sensors and actuators to exploit it. Retrofitting observability is far more expensive than designing it in, which is why agentic and RL-based control (Chapter 14.13) belongs in the efficiency design basis, not the wish list. The caution: an ML controller that optimizes facility PUE without a goodput constraint will happily throttle the IT load to make its own number look better — the objective must be useful-work-per-watt, with the GPU thermal envelope as a hard constraint, never PUE in isolation.
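The caution about objectives can be shown without any learning machinery. The sketch below is not a reinforcement-learning controller and is not Google's method; it is a hypothetical selection over four made-up candidate operating points, showing only how the choice of objective and a hard thermal constraint change which setpoint wins. All names and numbers are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    supply_c: float      # candidate facility-water supply setpoint
    it_power_kw: float   # IT power drawn at this setpoint
    overhead_kw: float   # cooling and power-chain overhead
    goodput: float       # useful work rate, arbitrary units
    gpu_inlet_c: float   # coolant temperature reaching the cold plates


def pue(p: OperatingPoint) -> float:
    return (p.it_power_kw + p.overhead_kw) / p.it_power_kw


def work_per_kw(p: OperatingPoint) -> float:
    return p.goodput / (p.it_power_kw + p.overhead_kw)


def pick_setpoint(candidates, max_gpu_inlet_c, objective):
    """Maximize the objective over points inside the GPU thermal envelope."""
    feasible = [p for p in candidates if p.gpu_inlet_c <= max_gpu_inlet_c]
    if not feasible:
        raise ValueError("no candidate satisfies the thermal envelope")
    return max(feasible, key=objective)


# Hypothetical plant responses; the 45 C point is feasible but throttled.
candidates = [
    OperatingPoint(32.0, 1000.0, 120.0, 1000.0, 34.0),
    OperatingPoint(40.0, 1000.0, 70.0, 1000.0, 42.0),
    OperatingPoint(45.0, 850.0, 40.0, 700.0, 45.0),
    OperatingPoint(48.0, 1000.0, 35.0, 1000.0, 50.0),  # outside the envelope
]

by_pue = pick_setpoint(candidates, 45.0, objective=lambda p: -pue(p))
by_work = pick_setpoint(candidates, 45.0, objective=work_per_kw)
print(f"minimize PUE      -> supply {by_pue.supply_c:.0f} C, goodput {by_pue.goodput:.0f}")
print(f"maximize work/kW  -> supply {by_work.supply_c:.0f} C, goodput {by_work.goodput:.0f}")
```

With these invented numbers the PUE-only objective prefers the throttled point, while the work-per-kilowatt objective keeps the fleet at full goodput on a slightly cooler loop.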

Deep dive: the four delta-Ts and why each one is an efficiency decision

Practitioners decompose a liquid-cooled facility's thermal path into a chain of temperature differences — the "four delta-Ts" — and each one is a lever you trade against capital, parasitic power, and free-cooling hours. Walking them from chip to sky makes the efficiency physics concrete.

Delta-T #1: chip-to-coolant (across the cold plate). Set by cold-plate thermal resistance (~0.02-0.03 C/W) and flow rate. A tighter delta-T here lets the coolant run warmer for the same junction temperature, but demands more flow, which costs pump power.

Delta-T #2: coolant-loop rise (the ~7.5-12 C the coolant gains across the rack). A larger rise means less flow for the same heat, and so lower pump parasitics, but a higher return temperature that the CDU must handle.

Delta-T #3: CDU approach (the ~3-5 C the heat exchanger loses transferring from the technology-cooling loop to the facility-water loop). Smaller approach means a warmer facility loop for the same chip temperature, which translates directly into more free-cooling hours, but a larger, costlier heat exchanger.

Delta-T #4: facility-loop-to-ambient (the margin the dry cooler or tower needs over wet- or dry-bulb to reject heat). This is the one the weather controls and the warm loop widens; it is the delta-T that decides how many hours the compressor stays off.

The unifying insight: efficiency is the art of spending the right delta-T in the right place. Every degree you can push into the facility-loop-to-ambient delta-T (#4) by tightening the upstream three (#1-#3) is a degree of free-cooling exposure. That is why warm-water design, cold-plate engineering, and CDU sizing are not separate problems — they are one continuous budget, and the budget's bottom line is annualized PUE. Engineering homes: cold plates and flow in Chapter 5.4, the CDU and secondary loop in Chapter 5.6, heat rejection in Chapter 5.8.
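The delta-T budget can be walked as simple subtraction from the junction limit down to the warmest ambient that still allows compressor-free rejection. The sketch assumes a hypothetical 85 C junction limit, a 1 kW chip, an 8 C heat-rejection approach, no safety margin, and the worst-case simplification that the last chip in the flow path sees the supply temperature plus the full loop rise; the class and method names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeltaTBudget:
    chip_power_w: float
    cold_plate_r_c_per_w: float   # delta-T #1 = power x thermal resistance
    loop_rise_c: float            # delta-T #2, coolant rise across the rack
    cdu_approach_c: float         # delta-T #3, technology loop to facility loop
    rejection_approach_c: float   # delta-T #4, facility loop over ambient

    def chip_to_coolant_c(self) -> float:
        return self.chip_power_w * self.cold_plate_r_c_per_w

    def facility_supply_max_c(self, junction_limit_c: float) -> float:
        """Warmest facility-water supply that keeps the junction at its limit."""
        hottest_coolant_c = junction_limit_c - self.chip_to_coolant_c()
        technology_supply_c = hottest_coolant_c - self.loop_rise_c
        return technology_supply_c - self.cdu_approach_c

    def max_free_cooling_ambient_c(self, junction_limit_c: float) -> float:
        """Warmest ambient at which the heat rejection can still hold the loop."""
        return self.facility_supply_max_c(junction_limit_c) - self.rejection_approach_c


JUNCTION_LIMIT_C = 85.0  # hypothetical limit, not a vendor specification

baseline = DeltaTBudget(1000.0, 0.025, 10.0, 4.0, 8.0)
tightened = DeltaTBudget(1000.0, 0.020, 10.0, 3.0, 8.0)

for name, budget in (("baseline", baseline), ("tightened", tightened)):
    loop_c = budget.facility_supply_max_c(JUNCTION_LIMIT_C)
    ambient_c = budget.max_free_cooling_ambient_c(JUNCTION_LIMIT_C)
    print(f"{name}: facility supply up to {loop_c:.1f} C, free cooling up to {ambient_c:.1f} C ambient")
```

Every degree removed from delta-Ts #1 and #3 in the tightened case reappears as a degree of ambient headroom, which is the "one continuous budget" argument in numbers.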

Deep dive: why PUE understates liquid-cooled efficiency (and what to use instead)

PUE was defined for an air-cooled world, and it has a structural blind spot that the liquid-cooled, high-density era makes acute: it counts everything outside the IT envelope as overhead, and everything inside it as useful — regardless of whether the inside-the-box power is doing computation or merely moving heat. In an air-cooled server, the cooling fans (sometimes 10-20% of server power at high density) live inside the IT boundary and therefore improve PUE even though they are pure cooling overhead. Move that same heat-moving work to a facility CDU and pump, and it now counts against PUE. The metric can be gamed by relocating inefficiency across the IT boundary, and direct-to-chip liquid — which removes the server fans — can make a genuinely more efficient facility look worse on PUE than the air hall it replaced.

The fix is to measure the thing PUE was a proxy for. TUE (Total-power Usage Effectiveness) folds the in-server cooling and conversion losses back in (TUE = ITUE × PUE), so relocating a fan across the boundary changes nothing. Work-based metrics go further and divide useful computational work by total energy, which is the only framing that correctly penalizes a throttled GPU. The practical takeaway for this chapter: when you raise setpoints or warm the loop, watch a TUE-class or work-based number, not facility PUE — otherwise you can congratulate yourself on a falling PUE while server fans spin up and goodput falls. The full metric stack and its governance live in Chapter 15.1.
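A small worked example of the boundary effect, using the TUE = ITUE × PUE relation above. The two facilities are hypothetical: both run 800 kW of actual compute, and the in-server and facility overhead figures are invented to show the effect rather than drawn from measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Facility:
    compute_kw: float             # power doing computation
    in_server_overhead_kw: float  # server fans and in-box conversion losses
    facility_overhead_kw: float   # cooling plant, CDUs, pumps, power chain

    @property
    def it_kw(self) -> float:
        return self.compute_kw + self.in_server_overhead_kw

    @property
    def total_kw(self) -> float:
        return self.it_kw + self.facility_overhead_kw

    def pue(self) -> float:
        return self.total_kw / self.it_kw

    def itue(self) -> float:
        return self.it_kw / self.compute_kw

    def tue(self) -> float:
        return self.itue() * self.pue()


# Hypothetical pair: the liquid hall removes most server fan power.
air_hall = Facility(compute_kw=800.0, in_server_overhead_kw=200.0, facility_overhead_kw=120.0)
liquid_hall = Facility(compute_kw=800.0, in_server_overhead_kw=20.0, facility_overhead_kw=110.0)

for name, f in (("air hall", air_hall), ("liquid hall", liquid_hall)):
    print(f"{name}: PUE {f.pue():.3f}, ITUE {f.itue():.3f}, TUE {f.tue():.3f}, total {f.total_kw:.0f} kW")
```

In this invented pair the liquid hall draws less total power for the same compute yet reports a slightly higher PUE; TUE ranks the two correctly because it is total power over compute power.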

Part-load and utilization: efficient at the load you actually run

The last efficiency lever is the most often ignored because it does not appear on a nameplate at all: match the plant's efficiency curve to the load profile you will actually run, not the design-day peak. A cooling plant or UPS that is most efficient at 100% load is the wrong plant for a facility that lives at 40-70%. The fix is modularity and staging — multiple smaller chillers, pumps, and CDUs that can be staged on and off to keep the running units near their best-efficiency point, rather than a few large units running inefficiently part-loaded. The same logic governs the power chain: a UPS bank sized so that each module sits in its high-efficiency band at typical load beats one sized so every module idles at 30%.
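The staging argument can be sketched with a generic loss curve. The model below assumes each module's losses have a fixed no-load term plus linear and quadratic load terms, with hypothetical coefficients; it picks the number of running modules that minimizes total loss, subject to a per-module loading cap and a minimum running count that stands in for a redundancy requirement.

```python
def module_loss_kw(load_kw, rating_kw, no_load=0.015, linear=0.01, quadratic=0.02):
    """Loss of one module; coefficients are hypothetical fractions of rating."""
    x = load_kw / rating_kw
    return rating_kw * (no_load + linear * x + quadratic * x * x)


def best_stage(total_load_kw, rating_kw, installed, min_running=1, max_fraction=0.9):
    """Return (modules running, total loss in kW) for the lowest-loss staging."""
    best = None
    for n in range(max(min_running, 1), installed + 1):
        per_module_kw = total_load_kw / n
        if per_module_kw > max_fraction * rating_kw:
            continue  # too few modules to carry the load within the cap
        loss_kw = n * module_loss_kw(per_module_kw, rating_kw)
        if best is None or loss_kw < best[1]:
            best = (n, loss_kw)
    if best is None:
        raise ValueError("load exceeds the staged capacity")
    return best


# Hypothetical bank: eight 500 kW modules carrying 1,600 kW (40% of nameplate).
all_on_loss_kw = 8 * module_loss_kw(1600.0 / 8, 500.0)
running, staged_loss_kw = best_stage(1600.0, 500.0, installed=8)
print(f"all 8 running: {all_on_loss_kw:.1f} kW lost")
print(f"staged to {running}: {staged_loss_kw:.1f} kW lost")
```

Raising min_running is how a resilience requirement enters the same calculation: the search then trades some efficiency for the ride-through margin it must not compromise.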

This is a design-time decision with an operating-time payoff, and it interacts with two threads of this guide. On the power-bound axis, part-load-efficient plant means more of your contracted megawatts reach the IT load instead of being lost to oversized, under-loaded equipment — directly more compute per interconnection slot. On the goodput axis, the staging logic must never compromise the thermal ride-through that protects the GPUs, so part-load efficiency and resilience are co-designed (Chapter 14.7 on operational capacity/thermal management). The anti-pattern is the facility designed and benchmarked at peak, commissioned with a single efficiency point, and then operated for years at a load where that point is irrelevant — efficient on paper, wasteful in production.

This chapter is the strategy-and-decisions layer over the cooling and power engineering treated in depth elsewhere. The metric stack it leans on — PUE, WUE, ERF, TUE, work-based metrics — is built in Chapter 15.1. The density wall that gates cooling architecture is Chapter 5.1; warm-water facility loops in Chapter 5.7; heat rejection, economizers, dry coolers and towers in Chapter 5.8; the CDU and secondary loop in Chapter 5.6; DLC cold-plate engineering in Chapter 5.4; and setpoint/controls transient stability in Chapter 5.12. The power-chain decisions live in Chapter 4.5 (UPS modes) and Chapter 4.7 (the DC power revolution). Climate-and-water siting is Chapter 3.7; the efficiency-vs-water coupling is Chapter 15.4; the warm-loop heat-reuse payoff is Chapter 15.5. The goodput-vs-availability reframe that disciplines setpoint risk is Chapter 12.2; operational capacity/thermal management is Chapter 14.7; DCIM telemetry is Chapter 14.2; and ML/agentic cooling control is Chapter 14.13.