Chapter 15.2
Energy Efficiency: Cooling, Free Cooling, Setpoints & Power-Chain Losses
Energy efficiency in an AI facility is not a virtue you bolt on after commissioning — it is a chain of irreversible design forks (coolant temperature, economizer hours, voltage class) that you either bank at scoping time or pay for in megawatts every hour the building runs, and the leverage has migrated from chillers to coolant setpoints and the power chain.
What you'll decide here
- The facility-water and coolant setpoint band (chilled ~18-27 C vs warm-water ~32-45 C) — the single decision that determines how many compressor-free hours your climate gives you and whether heat reuse is even physically possible.
- Whether to design for free cooling / economization at all, and which kind (airside, waterside, or dry-cooler-direct) — a climate-and-water-coupled fork that sets your annualized PUE floor, not the nameplate.
- The IT inlet and coolant-return setpoints you will actually operate at — every degree of headroom you decline to use is compressor energy you chose to spend, but raising setpoints narrows the thermal ride-through margin that protects 1 kW+ GPUs.
- The power-chain topology and UPS operating mode (double-conversion ~94-96% vs eco/dynamic-online ~98-99%; 415/480 VAC vs an 800 VDC path) — 2-5 points of end-to-end efficiency that compounds across every watt, for the life of the building.
- Whether ML-driven cooling control and part-load-aware operation are in the design basis or retrofitted later — the difference between a plant that is efficient at nameplate and one that is efficient at the 40-70% utilization where it actually lives.
The efficiency conversation in AI data centers is usually conducted in the wrong units. People argue about PUE as if it were a scoreboard, when it is a consequence: the visible residue of a dozen upstream decisions about coolant temperature, economizer design, setpoint discipline, and how many conversion stages sit between the utility and the GPU's voltage regulator. Most of the leverage is spent at design time, in decisions that are expensive or impossible to reverse once steel is cut and water is plumbed.
Three things changed with the AI buildout. First, the denominator grew: an air hall's cooling overhead used to be 40-50% of IT load; a well-designed warm-water direct-to-chip (DLC) plant can push facility overhead toward 5-10%, which means the marginal efficiency game has moved off the chiller and onto the coolant loop and the power chain. Second, density removed the slack: at 132 kW/rack and climbing toward 600 kW, you no longer have the option to waste a few points of efficiency on conservative setpoints — the cooling plant is on the critical path, and every avoidable watt of overhead is a watt of IT capacity you cannot energize in a power-bound site. Third, the metric stopped meaning what it used to: PUE was built for an air-cooled world and quietly rewards moving inefficiency inside the IT envelope (server fans, pumps) where it stops counting as overhead. We treat the metric stack itself in Chapter 15.1; here we treat the physical decisions that move it.
The efficiency budget: where the leverage actually is
Start by drawing the energy budget honestly, because that is where you find the leverage. In a modern AI facility, non-IT (overhead) energy splits roughly into cooling (60-80% of overhead) and the power chain (conversion and distribution losses, most of the rest), with lighting and ancillaries a rounding error. Cooling is the dominant lever, and within cooling the dominant sub-lever is not the chiller's efficiency at full load — it is how many hours per year you can avoid running the compressor at all. That single framing reorders the entire chapter: the highest-value efficiency decision is not buying a better chiller, it is designing a coolant loop warm enough that the chiller is mostly off.
The reason this is the master lever is thermodynamic. Mechanical (compressorized) cooling is the expensive mode; free cooling, meaning heat rejected directly to ambient air or water without a vapor-compression cycle, is nearly free by comparison. The fraction of the year you spend in each mode is set almost entirely by two numbers you choose at design time: the coolant/facility-water supply temperature and the approach temperature of your heat-rejection equipment relative to ambient. Raise the supply temperature and you convert thousands of compressor-hours into economizer-hours. This is why a Reykjavík facility and a Phoenix facility built to the same nameplate PUE can have wildly different annualized PUE. The nameplate is a snapshot at design conditions; the annualized number is the integral over the weather, and the weather is set by siting (Chapter 3.7) and exploited by setpoint.
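The dependence of free-cooling hours on supply temperature and approach can be sketched in a few lines. This is a minimal illustration, not a design tool: the weather series is a synthetic sinusoid (hypothetical mean and amplitudes, not data for any real site), and the rule "free cooling is available when ambient plus approach is at or below supply" is a deliberate simplification that ignores humidity, part-load behavior, and mixed-mode operation.

```python
import math

HOURS_PER_YEAR = 8760


def synthetic_dry_bulb_c(mean_c=15.0, seasonal_amp_c=12.0, diurnal_amp_c=6.0):
    """Return 8760 hypothetical hourly dry-bulb temperatures in C.

    Purely illustrative: a seasonal cosine plus a daily cosine.
    Substitute real hourly weather data for any actual site.
    """
    temps = []
    for hour in range(HOURS_PER_YEAR):
        day = hour // 24
        hour_of_day = hour % 24
        seasonal = seasonal_amp_c * math.cos(2 * math.pi * (day - 200) / 365)
        diurnal = diurnal_amp_c * math.cos(2 * math.pi * (hour_of_day - 15) / 24)
        temps.append(mean_c + seasonal + diurnal)
    return temps


def free_cooling_hours(ambient_c, supply_c, approach_c):
    """Count hours in which ambient + approach <= supply (compressor can stay off)."""
    return sum(1 for t in ambient_c if t + approach_c <= supply_c)


weather = synthetic_dry_bulb_c()
for supply in (20.0, 30.0, 40.0):
    hours = free_cooling_hours(weather, supply, approach_c=6.0)
    print(f"supply {supply:.0f} C: {hours} free-cooling hours "
          f"({hours / HOURS_PER_YEAR:.0%} of the year)")
```

Swapping in a measured hourly weather file for the chosen site turns the same loop into a first-pass siting comparison.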
Cooling architecture and the PUE bands it produces
Cooling architecture is the first fork, and it sets a hard band on achievable PUE before you tune a single setpoint. The choice is not a smooth dial — it is a small set of discrete regimes, each with its own efficiency floor, each gated by rack density (the cooling cliff of Chapter 5.1). The table below is the decision, stated as bands rather than point estimates because the within-band variation is exactly the setpoint-and-climate game discussed in the rest of the chapter.
| Cooling architecture | Density fit | PUE band | Free-cooling exposure | Why it sits there |
|---|---|---|---|---|
| Legacy air + DX/chiller | ≤ ~30 kW/rack | 1.4-1.6 | Low — compressor-bound much of the year | Compressorized cooling dominates; warm supply air limited by IT inlet specs |
| Optimized air + economizer | ≤ ~40 kW/rack | 1.25-1.4 | Moderate — airside/waterside economizer hours | Containment + economization cut compressor hours; still air-transport-limited |
| Rear-door HX / air-assisted liquid | ~40-75 kW/rack | 1.2-1.35 | Moderate-high — warmer loop enables more free hours | Liquid-to-the-door removes the air-transport penalty; loop can run warmer |
| Direct-to-chip liquid (warm-water) | ~75-200+ kW/rack | 1.05-1.15 | High — 32-45 C loop is dry-cooler / economizer friendly | Heat captured at the source in a warm loop; compressor often unnecessary |
| Single-/two-phase immersion | ~50-200+ kW/rack | 1.02-1.10 | Very high — high-grade warm bath rejects to ambient | Minimal fan/pump parasitics; best floor, but fluid/PFAS and serviceability costs |
The architecture you can choose is gated by density on the left; the PUE band you inherit is on the right; and the mechanism that places you within the band, free-cooling exposure, is the lever the rest of this chapter pulls. The consequence is sharp: choosing air at the limit instead of warm-water DLC does not merely cost you a few tenths of PUE at nameplate — it caps your annualized efficiency at a band you can never escape, because an air hall's loop is too cold to economize for most of the year in most climates. The cooling architecture is, in effect, a decision about how many compressor-hours you have signed up to pay for over the building's life.
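The density gate in the table can be expressed as a simple lookup. The sketch below encodes the table's approximate bounds as data (the "~" and "+" qualifiers are flattened into hard numbers, which is an assumption made for illustration) and returns the architectures whose density fit covers a given rack power. The names `CoolingBand`, `BANDS`, and `feasible_architectures` are hypothetical.

```python
from typing import List, NamedTuple, Optional


class CoolingBand(NamedTuple):
    name: str
    min_kw: float             # lower density bound, kW/rack (approximate)
    max_kw: Optional[float]   # None means open-ended ("200+")
    pue_low: float
    pue_high: float


# Values transcribed from the table above; bounds are approximate in the source.
BANDS = [
    CoolingBand("Legacy air + DX/chiller", 0.0, 30.0, 1.40, 1.60),
    CoolingBand("Optimized air + economizer", 0.0, 40.0, 1.25, 1.40),
    CoolingBand("Rear-door HX / air-assisted liquid", 40.0, 75.0, 1.20, 1.35),
    CoolingBand("Direct-to-chip liquid (warm-water)", 75.0, None, 1.05, 1.15),
    CoolingBand("Single-/two-phase immersion", 50.0, None, 1.02, 1.10),
]


def feasible_architectures(rack_kw: float) -> List[CoolingBand]:
    """Return the bands whose density range includes rack_kw."""
    return [
        band for band in BANDS
        if band.min_kw <= rack_kw and (band.max_kw is None or rack_kw <= band.max_kw)
    ]


for band in feasible_architectures(132.0):
    print(f"{band.name}: PUE {band.pue_low:.2f}-{band.pue_high:.2f}")
```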
Free cooling and economization: the highest-value efficiency decision
Free cooling — economization — is the act of rejecting heat to the environment without running a vapor-compression cycle. It is the single largest annualized-efficiency lever in the building, and it comes in three architectures that trade water, capital, and reachable hours against one another. The fork is not merely "do we economize" (you always should) but which kind, because each one couples to a different siting constraint and a different downstream cost.
Airside economization pulls filtered outside air directly into the hall when ambient is cool and dry enough, exhausting hot air rather than recirculating and re-cooling it. It is the cheapest to operate and uses no water, but it imports outdoor humidity, particulates, and gaseous contaminants into the white space (a real reliability liability), and it works only when the air-cooled IT inlet spec (ASHRAE A1-A4) can be met directly.
Waterside economization keeps the air loop sealed and instead uses a cooling tower or dry cooler, coupled through a plate heat exchanger, to cool the facility water without the chiller whenever wet-bulb (for a tower) or dry-bulb (for a dry cooler) is low enough. It tolerates a wider climate envelope and keeps contaminants out, at the cost of water (evaporative towers) or a larger dry-cooler footprint and a warmer achievable loop.
Dry-cooler-direct (compressor-less) operation is the warm-water DLC endgame: if your facility loop runs at 40-45 C, a dry cooler can reject to ambient air across most of the year with no evaporation and no compressor at all. The result is zero process water for cooling and a PUE that approaches the pump-and-fan parasitic floor.
| Economizer type | Sealed white space? | Water use | Reachable free hours | Primary downside |
|---|---|---|---|---|
| Airside (direct outside air) | No — outside air enters hall | None (unless adiabatic assist) | High in cool/dry climates | Imports humidity, particulates, gaseous contaminants; needs filtration + RH control |
| Airside + adiabatic assist | No | Moderate (evaporative pre-cool) | Extends warm-climate hours | Reintroduces water; adds spray/media maintenance |
| Waterside (tower + plate HX) | Yes | High (evaporative) or low (dry) | High where wet-bulb is low | Tower water, blowdown, Legionella control (ASHRAE 188) |
| Dry-cooler-direct (warm loop) | Yes | ≈ Zero for cooling | Very high if loop ≥ ~40 C | Larger heat-rejection footprint; needs warm-water DLC to begin with |
The trade is a water-versus-PUE-versus-capital triangle, and the AI-density era has bent it decisively toward the dry-cooler-direct corner. The reason is the warm loop: once DLC lets you capture heat at 40-45 C instead of cooling air to 18-27 C, the dry cooler becomes viable for most of the year in most temperate climates, which lets you design water out of the building entirely for cooling — the closed-loop, near-zero-WUE designs hyperscalers now publish (Microsoft's next-gen closed-loop facilities report cutting >125 million litres per year). That is the same decision that governs water stewardship in Chapter 15.4, which is why efficiency and water cannot be optimized separately: the coolant setpoint that maximizes free-cooling hours is also the one that lets you eliminate evaporative water. The cost you pay is footprint and capital — dry coolers are larger and cannot reach as low a loop temperature as an evaporative tower on a hot day — which is precisely why this is a siting-coupled decision, not a mechanical one.
Warm-water / high-temperature cooling: the setpoint that unlocks everything
Everything above converges on one number: the facility-water supply temperature. The industry has spent decades over-cooling — running 18-27 C chilled loops because that was what air-cooled IT inlets demanded — and in doing so threw away both free-cooling hours and any chance of heat reuse. Warm-water DLC inverts the logic. ASHRAE's liquid-cooling classes (W17 through W45+ in the 5th-edition Thermal Guidelines) are keyed to the upper facility-water supply temperature precisely to make this a deliberate design choice, and the direction of travel is unambiguous: ASHRAE TC 9.9's own roadmap argues for standardizing toward a ~30 C facility-water target, and NVIDIA's Vera Rubin reference designs target dry-cooler-capable 40-45 C loops.
The consequence chain from this one setpoint is the most important in the chapter. A warmer loop (a) widens the temperature difference between your coolant and ambient, which (b) lets a dry cooler or economizer reject heat across more of the year, which (c) collapses compressor-hours toward zero, which (d) drops annualized PUE toward the parasitic floor, and simultaneously (e) lifts the return-water temperature high enough that the waste heat becomes a sellable product instead of a disposal problem (Chapter 15.5). One setpoint, five downstream wins. The cost is margin: a warmer loop leaves less thermal headroom, so the cold-plate design, flow rate, and CDU approach temperature must be tighter and the controls more disciplined (the transient-stability problem of Chapter 5.12). The engineering of the loop that delivers it lives in Chapter 5.7.
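Steps (c) and (d) of that chain reduce to a weighted average if the IT load is held constant. The sketch below blends a free-cooling-mode PUE and a mechanical-mode PUE by hours spent in each; the two mode PUEs (1.06 and 1.30) and the hour counts are hypothetical placeholders, and a real estimate would integrate over varying load and partial-economizer hours.

```python
HOURS_PER_YEAR = 8760


def annualized_pue(free_hours, pue_free, pue_mechanical, total_hours=HOURS_PER_YEAR):
    """Hours-weighted PUE, assuming constant IT load across the year.

    With constant IT power, total energy / IT energy is simply the
    time-weighted mean of the PUE in each operating mode.
    """
    if not 0 <= free_hours <= total_hours:
        raise ValueError("free_hours must lie between 0 and total_hours")
    free_fraction = free_hours / total_hours
    return free_fraction * pue_free + (1.0 - free_fraction) * pue_mechanical


# Hypothetical mode PUEs: 1.06 with compressors off, 1.30 with compressors on.
for free_hours in (2000, 5000, 8000, 8760):
    print(f"{free_hours} free hours -> annualized PUE "
          f"{annualized_pue(free_hours, 1.06, 1.30):.3f}")
```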
Setpoint strategy: every conservative degree is a watt you chose to spend
Setpoint strategy is where design intent meets operating reality, and it is where most facilities quietly leave efficiency on the table. The instinct of an operations team is to run cold and conservative because it feels safe. But ASHRAE's recommended IT inlet envelope has been 18-27 C for years, with allowable upper limits running from ~32 C (class A1) to ~45 C (class A4), and every degree you decline to use is compressor or fan energy you have chosen to spend for a thermal margin the equipment did not require. The setpoint decision is therefore a deliberate risk-versus-efficiency trade, and it must be made explicitly rather than defaulted to "cold."
The fork has three settings. Conservative (low IT inlet, cold loop, wide margin) maximizes ride-through and equipment longevity headroom at the cost of free-cooling hours — defensible only for legacy air halls or sites with poor thermal monitoring. ASHRAE-recommended (mid-band inlet, moderate loop) is the safe default for most operators. Aggressive / allowable-band (high inlet within A-class limits, warm loop) maximizes economizer hours and heat-reuse grade, and is the right posture for a well-instrumented liquid-cooled facility with disciplined controls — but it narrows the thermal ride-through window, which matters enormously when a 1 kW+ GPU can thermal-trip within seconds of a cooling-loss event (the resilience coupling of Chapter 12.2). The downstream cost of running warm is not normally efficiency — it is that you have spent your transient margin, so your pump redundancy, UPS-backed cooling, and controls stability had better be commissioned to match. Raising setpoints without first proving ride-through is how an efficiency initiative becomes an outage.
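The way a warmer setpoint spends transient margin can be approximated with a first-order thermal-mass estimate. This sketch assumes heat rejection is lost entirely while pumps keep circulating, treats the loop as a well-mixed mass of water, and ignores the thermal mass of metal and silicon; the loop mass, heat load, and trip temperature are hypothetical numbers. It says nothing about the much faster chip-level transient when flow itself is lost.

```python
WATER_CP_J_PER_KG_K = 4186.0  # specific heat of water; glycol mixes are lower


def ride_through_seconds(loop_water_kg, heat_load_w, setpoint_c, trip_c):
    """Seconds until a well-mixed loop warms from setpoint_c to trip_c.

    Energy balance with zero heat rejection: m * cp * dT = Q * t.
    """
    margin_k = trip_c - setpoint_c
    if margin_k <= 0:
        return 0.0
    return loop_water_kg * WATER_CP_J_PER_KG_K * margin_k / heat_load_w


# Hypothetical loop: 5,000 kg of water absorbing 1 MW, tripping at 60 C.
for setpoint in (30.0, 40.0, 45.0):
    t = ride_through_seconds(5000.0, 1.0e6, setpoint, trip_c=60.0)
    print(f"setpoint {setpoint:.0f} C -> about {t:.0f} s of bulk ride-through")
```

Under these assumptions ride-through scales linearly with the margin between setpoint and trip temperature, which is why the commissioning evidence has to move with the setpoint.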
Power-chain efficiency: the other half of overhead
Cooling is the larger lever, but the power chain is the one that compounds across every single watt the building draws, every hour, with no weather dependence. Each conversion stage between the utility and the GPU's voltage regulator — transformer, UPS, PDU, rack PSU, board-level VRM — sheds a few percent, and the product of those efficiencies is the end-to-end electrical efficiency. A legacy AC chain can land anywhere from ~61% to ~87.5% end-to-end depending on vintage; a modern path exceeds 92%, and an 800 VDC architecture targets a further ~5-point gain by eliminating conversion stages (SemiAnalysis Datacenter Anatomy Pt 1, 2025). Five points end-to-end does not sound like much until you multiply it by a gigawatt running continuously for a decade.
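The "product of efficiencies" arithmetic is easy to make concrete. In the sketch below the per-stage efficiencies are hypothetical round numbers chosen only to illustrate the multiplication (they are not measurements of any specific equipment), and the energy figure simply multiplies a five-point gain by one gigawatt of continuous draw over ten years.

```python
from math import prod  # available in Python 3.8+

HOURS_PER_YEAR = 8760


def end_to_end_efficiency(stages):
    """Multiply per-stage efficiencies (each between 0 and 1)."""
    return prod(stages.values())


def energy_saved_gwh(load_gw, efficiency_gain, years):
    """Energy not lost when end-to-end efficiency improves by efficiency_gain.

    efficiency_gain is a fraction of the continuous draw (0.05 = five points).
    """
    return load_gw * efficiency_gain * HOURS_PER_YEAR * years


# Hypothetical stage efficiencies for an older AC chain (illustrative only).
legacy_chain = {
    "transformer": 0.98,
    "ups_double_conversion": 0.94,
    "pdu": 0.98,
    "rack_psu": 0.94,
    "board_vrm": 0.90,
}

print(f"end-to-end: {end_to_end_efficiency(legacy_chain):.1%}")
print(f"5 points on 1 GW for 10 years: {energy_saved_gwh(1.0, 0.05, 10):,.0f} GWh")
```

Five stages that each look respectable on their own multiply to roughly 76% here, which falls inside the legacy range quoted above.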
Two power-chain decisions carry most of the leverage. The first is the UPS operating mode: a double-conversion UPS runs at ~94-96% efficiency because it rectifies and re-inverts continuously; eco / dynamic-online modes hold the inverter on standby and pass utility power through at ~98-99%, recovering 2-4 points — at the cost of a few milliseconds of transfer time that must be reconciled against the load's ride-through requirement (the UPS architecture decision lives in Chapter 4.5). The second is the voltage architecture: the 48 V → ±400 V → 800 VDC transition (Chapter 4.7) removes conversion stages, pushes ~150% more power through the same copper, and is the structural enabler for 600 kW+ racks — which is why the efficiency case and the density-ramp case point the same way. Both decisions are substantially irreversible: you commission a UPS topology and a voltage class once, and re-doing either mid-life is a rebuild, not a tune-up.
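The UPS-mode trade can be framed as a loss calculation gated by a ride-through check. The sketch below is illustrative only: the 100 MW load, the 95% and 98.5% efficiencies (midpoints of the ranges quoted above), and the millisecond figures are assumptions, and the function names are hypothetical.

```python
HOURS_PER_YEAR = 8760


def ups_loss_mw(load_mw, efficiency):
    """Power dissipated in the UPS while delivering load_mw at its output."""
    return load_mw * (1.0 / efficiency - 1.0)


def annual_saving_mwh(load_mw, eff_double_conversion, eff_eco):
    """Annual energy saved by eco mode relative to double conversion."""
    delta_mw = ups_loss_mw(load_mw, eff_double_conversion) - ups_loss_mw(load_mw, eff_eco)
    return delta_mw * HOURS_PER_YEAR


def eco_mode_allowed(transfer_ms, load_ride_through_ms):
    """Eco mode is acceptable only if the load tolerates the transfer gap."""
    return transfer_ms <= load_ride_through_ms


saving = annual_saving_mwh(100.0, 0.95, 0.985)
print(f"eco mode saves roughly {saving:,.0f} MWh/yr on a 100 MW load")
print("eco mode allowed:", eco_mode_allowed(transfer_ms=4.0, load_ride_through_ms=10.0))
```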
ML-driven cooling optimization: closing the part-load gap
A plant designed to be efficient at nameplate is not the same as a plant that operates efficiently, because the building almost never sits at nameplate. AI facilities live at 40-70% utilization much of the time, with training jobs that ramp and checkpoint and inference fleets that swing 30-90% in minutes — and cooling plant that is efficient at full load is frequently inefficient at part load, where pumps, fans, and chillers run off their best-efficiency point. The part-load gap is real money, and it is the domain where machine-learning control has earned its place in the design basis rather than as an afterthought.
The canonical result is Google's, where a DeepMind reinforcement-learning controller adjusting setpoints across the cooling plant in real time cut cooling energy by ~40% and overall PUE overhead by ~15% relative to the human-tuned baseline. That kind of gain is invisible to nameplate PUE because it lives entirely in the part-load, multi-variable interactions a static setpoint table cannot capture. The mechanism is straightforward: a cooling plant has dozens of coupled actuators (pump speeds, valve positions, tower fan speeds, chiller staging) and a non-linear response surface that shifts with load and weather; an ML controller searches that surface continuously, where a human operator sets a value and forgets it. The decision is whether to design for this, by instrumenting the plant densely enough (the DCIM telemetry of Chapter 14.2) and giving the controller safe authority, or to bolt it on later against a plant that lacks the sensors and actuators to exploit it. Retrofitting observability is far more expensive than designing it in, which is why agentic and RL-based control (Chapter 14.13) belongs in the efficiency design basis, not the wish list. The caution: an ML controller that optimizes facility PUE without a goodput constraint will happily throttle the IT load to make its own number look better. The objective must be useful-work-per-watt, with the GPU thermal envelope as a hard constraint, never PUE in isolation.
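The shape of that objective (useful work per watt, with the thermal envelope as a hard constraint) can be shown with a toy setpoint search. Nothing here reflects Google's or DeepMind's actual controller: `PlantState`, `pick_setpoint`, and `toy_predict` are hypothetical, and the plant model is an invented linear stand-in for whatever learned or physics-based predictor a real system would use.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class PlantState:
    gpu_temp_c: float          # predicted hottest GPU temperature
    facility_power_kw: float   # IT plus overhead
    useful_work_rate: float    # e.g. samples/s or tokens/s


def pick_setpoint(candidates: Iterable[float],
                  predict: Callable[[float], PlantState],
                  gpu_temp_limit_c: float) -> Optional[float]:
    """Return the setpoint maximizing work per kW subject to the thermal limit."""
    best_setpoint, best_score = None, float("-inf")
    for setpoint in candidates:
        state = predict(setpoint)
        if state.gpu_temp_c > gpu_temp_limit_c:
            continue  # hard constraint: never trade the thermal envelope for score
        score = state.useful_work_rate / state.facility_power_kw
        if score > best_score:
            best_setpoint, best_score = setpoint, score
    return best_setpoint


def toy_predict(setpoint_c: float) -> PlantState:
    """Invented model: warmer loop cuts cooling power, but GPUs throttle above 80 C."""
    gpu_temp = setpoint_c + 40.0
    cooling_kw = max(50.0, 300.0 - 5.0 * setpoint_c)
    throttle = max(0.0, 1.0 - 0.05 * max(0.0, gpu_temp - 80.0))
    return PlantState(gpu_temp, 1000.0 + cooling_kw, 1000.0 * throttle)


print(pick_setpoint([25.0, 30.0, 35.0, 40.0, 45.0, 50.0], toy_predict, 85.0))
```

In this toy model the 45 C candidate has the lowest facility power among the candidates within the limit, yet it loses on work per kW because the GPUs throttle, which is the failure a PUE-only objective would miss.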
Deep dive: the four delta-Ts and why each one is an efficiency decision
Practitioners decompose a liquid-cooled facility's thermal path into a chain of temperature differences — the "four delta-Ts" — and each one is a lever you trade against capital, parasitic power, and free-cooling hours. Walking them from chip to sky makes the efficiency physics concrete.
- Delta-T #1: chip-to-coolant (across the cold plate). Set by cold-plate thermal resistance (~0.02-0.03 C/W) and flow rate. A tighter delta-T here lets the coolant run warmer for the same junction temperature, but it demands more flow, which costs pump power.
- Delta-T #2: coolant-loop rise (the ~7.5-12 C the coolant gains across the rack). A larger rise means less flow for the same heat, and therefore lower pump parasitics, but a higher return temperature that the CDU must handle.
- Delta-T #3: CDU approach (the ~3-5 C the heat exchanger loses transferring from the technology-cooling loop to the facility-water loop). A smaller approach means a warmer facility loop for the same chip temperature, which translates directly into more free-cooling hours, but it requires a larger, costlier heat exchanger.
- Delta-T #4: facility-loop-to-ambient (the margin the dry cooler or tower needs over wet- or dry-bulb to reject heat). This is the one the weather controls and the warm loop widens; it is the delta-T that decides how many hours the compressor stays off.
The unifying insight: efficiency is the art of spending the right delta-T in the right place. Every degree you can push into the facility-loop-to-ambient delta-T (#4) by tightening the upstream three (#1-#3) is a degree of free-cooling exposure. That is why warm-water design, cold-plate engineering, and CDU sizing are not separate problems — they are one continuous budget, and the budget's bottom line is annualized PUE. Engineering homes: cold plates and flow in Chapter 5.4, the CDU and secondary loop in Chapter 5.6, heat rejection in Chapter 5.8.
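The delta-T budget can be walked numerically from junction to ambient. The sketch below is a deliberately conservative one-dimensional stack: it assumes the chip sees the warmest (outlet) coolant, uses water properties for the coolant, and takes a hypothetical 85 C junction limit, 1 kW chip, and 8 C heat-rejection approach. Real designs add junction-to-case resistance, flow maldistribution, and safety margin.

```python
WATER_CP_J_PER_KG_K = 4186.0  # assumption: plain water; glycol mixes differ


def coolant_flow_kg_s(heat_w, loop_rise_k):
    """Mass flow needed to carry heat_w with a given loop temperature rise (delta-T #2)."""
    return heat_w / (WATER_CP_J_PER_KG_K * loop_rise_k)


def max_facility_supply_c(junction_limit_c, chip_power_w, cold_plate_r_c_per_w,
                          loop_rise_k, cdu_approach_k):
    """Warmest facility-water supply that still holds the junction limit."""
    chip_to_coolant_k = chip_power_w * cold_plate_r_c_per_w   # delta-T #1
    coolant_return_c = junction_limit_c - chip_to_coolant_k   # warmest coolant at the chip
    coolant_supply_c = coolant_return_c - loop_rise_k         # delta-T #2
    return coolant_supply_c - cdu_approach_k                  # delta-T #3


def max_ambient_for_free_cooling_c(facility_supply_c, rejection_approach_k):
    """Highest ambient at which heat rejection still meets supply (delta-T #4)."""
    return facility_supply_c - rejection_approach_k


supply = max_facility_supply_c(85.0, 1000.0, 0.025, 10.0, 4.0)
print(f"max facility supply: {supply:.1f} C")
print(f"max ambient for free cooling: {max_ambient_for_free_cooling_c(supply, 8.0):.1f} C")
print(f"flow for a 132 kW rack at 10 C rise: {coolant_flow_kg_s(132_000.0, 10.0):.2f} kg/s")
```

Tightening any of the first three terms by a degree raises the permissible facility supply by the same degree, which is the single-budget view in code form.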
Deep dive: why PUE understates liquid-cooled efficiency (and what to use instead)
PUE was defined for an air-cooled world, and it has a structural blind spot that the liquid-cooled, high-density era makes acute: it counts everything outside the IT envelope as overhead, and everything inside it as useful — regardless of whether the inside-the-box power is doing computation or merely moving heat. In an air-cooled server, the cooling fans (sometimes 10-20% of server power at high density) live inside the IT boundary and therefore improve PUE even though they are pure cooling overhead. Move that same heat-moving work to a facility CDU and pump, and it now counts against PUE. The metric can be gamed by relocating inefficiency across the IT boundary, and direct-to-chip liquid — which removes the server fans — can make a genuinely more efficient facility look worse on PUE than the air hall it replaced.
The fix is to measure the thing PUE was a proxy for. TUE (Total-power Usage Effectiveness) folds the in-server cooling and conversion losses back in (TUE = ITUE × PUE), so relocating a fan across the boundary changes nothing. Work-based metrics go further and divide useful computational work by total energy, which is the only framing that correctly penalizes a throttled GPU. The practical takeaway for this chapter: when you raise setpoints or warm the loop, watch a TUE-class or work-based number, not facility PUE — otherwise you can congratulate yourself on a falling PUE while server fans spin up and goodput falls. The full metric stack and its governance live in Chapter 15.1.
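A small worked example shows the boundary effect. The sketch below compares two hypothetical facilities with identical compute power and identical total power; the only difference is whether 100 W of heat-moving work per server sits inside the IT boundary (server fans) or outside it (facility pumps and CDU). In-server conversion losses are omitted for simplicity, and all wattages are invented.

```python
from dataclasses import dataclass


@dataclass
class Facility:
    compute_w: float            # power reaching compute components
    in_server_cooling_w: float  # fans inside the IT boundary
    facility_overhead_w: float  # cooling and power-chain losses outside it

    @property
    def it_w(self) -> float:
        return self.compute_w + self.in_server_cooling_w

    @property
    def total_w(self) -> float:
        return self.it_w + self.facility_overhead_w

    @property
    def pue(self) -> float:
        return self.total_w / self.it_w

    @property
    def itue(self) -> float:
        return self.it_w / self.compute_w

    @property
    def tue(self) -> float:
        return self.itue * self.pue  # equals total_w / compute_w


air = Facility(compute_w=900.0, in_server_cooling_w=100.0, facility_overhead_w=300.0)
liquid = Facility(compute_w=900.0, in_server_cooling_w=0.0, facility_overhead_w=400.0)

for label, f in (("air", air), ("liquid", liquid)):
    print(f"{label}: PUE {f.pue:.3f}, ITUE {f.itue:.3f}, TUE {f.tue:.3f}")
```

PUE differs between the two cases while TUE is identical, because TUE reduces to total power divided by compute power.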
Part-load and utilization: efficient at the load you actually run
The last efficiency lever is the most often ignored because it does not appear on a nameplate at all: match the plant's efficiency curve to the load profile you will actually run, not the design-day peak. A cooling plant or UPS that is most efficient at 100% load is the wrong plant for a facility that lives at 40-70%. The fix is modularity and staging — multiple smaller chillers, pumps, and CDUs that can be staged on and off to keep the running units near their best-efficiency point, rather than a few large units running inefficiently part-loaded. The same logic governs the power chain: a UPS bank sized so that each module sits in its high-efficiency band at typical load beats one sized so every module idles at 30%.
This is a design-time decision with an operating-time payoff, and it interacts with two threads of this guide. On the power-bound axis, part-load-efficient plant means more of your contracted megawatts reach the IT load instead of being lost to oversized, under-loaded equipment — directly more compute per interconnection slot. On the goodput axis, the staging logic must never compromise the thermal ride-through that protects the GPUs, so part-load efficiency and resilience are co-designed (Chapter 14.7 on operational capacity/thermal management). The anti-pattern is the facility designed and benchmarked at peak, commissioned with a single efficiency point, and then operated for years at a load where that point is irrelevant — efficient on paper, wasteful in production.
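The staging logic can be sketched as a search over how many modules to run. The efficiency curve below is a hypothetical parabola peaking at 75% load, not vendor data, and the rule of always running one spare module is an assumption standing in for whatever redundancy and ride-through policy the site actually requires.

```python
import math


def module_efficiency(load_fraction):
    """Hypothetical efficiency curve peaking at 75% load (illustrative only)."""
    return 0.97 - 0.25 * (load_fraction - 0.75) ** 2


def best_module_count(load_kw, module_kw, installed, spare=1):
    """Pick the number of running modules that maximizes efficiency.

    At least ceil(load / module_kw) + spare modules always run, so the
    efficiency search never removes the redundancy margin.
    Returns (count, efficiency), or (None, None) if the load cannot be met.
    """
    min_running = math.ceil(load_kw / module_kw) + spare
    best_count, best_eff = None, None
    for count in range(min_running, installed + 1):
        eff = module_efficiency(load_kw / (count * module_kw))
        if best_eff is None or eff > best_eff:
            best_count, best_eff = count, eff
    return best_count, best_eff


count, eff = best_module_count(load_kw=2000.0, module_kw=500.0, installed=8)
print(f"run {count} of 8 modules at {eff:.1%}")
print(f"all 8 running: {module_efficiency(2000.0 / (8 * 500.0)):.1%}")
```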