Chapter 13.3
Electrical Power Acceptance (L3/L4)
Electrical acceptance is the gate where the paper power chain — the interconnect, the switchgear, the generators, the UPS/BESS — is forced to prove, with instrumented evidence, that it will hold the most violent load on the planet: a synchronized cluster of GPUs that can swing tens of megawatts in milliseconds.
What you'll decide here
- Whether you accept the power chain against a static, steady-state load-bank profile (cheaper, faster, what most NETA scopes default to) or against a dynamic, AI-emulating profile that reproduces the cluster's real di/dt — and therefore whether transient tolerance is proven at L4 or discovered at first real training run.
- Which load-bank technology — resistive, reactive, or AI-load-emulating — you commission each subsystem against, because each proves a different thing and none of them proves everything.
- How far you validate redundancy topology under load: which transfers, breaker operations, and source losses you actually trip with the racks (or load banks) energized, versus which you take on the drawing's word.
- Whether protection coordination and arc-flash are verified as-built against the study (Chapter 4.2) by primary injection and the relay's own event records, or merely confirmed by settings file — the difference is whether a fault clears selectively or cascades.
- Which acceptance criteria for NVL72-class power smoothing and dynamic-load-swing tolerance you write into the L4 script as quantitative pass/fail gates, and which you defer to IST (Chapter 13.6) and the proxy training run (Chapter 13.9).
By the time a cluster reaches electrical acceptance, the power chain has been designed, studied, manufactured, installed, and point-to-point checked. Every component has a datasheet that says it works. Electrical acceptance — Levels 3 and 4 in the commissioning taxonomy of Chapter 13.1 — is where that paper is converted into instrumented evidence. L3 is functional: each subsystem is energized and operated on its own (the switchgear racks and trips, the generator starts and assumes load, the UPS transfers and rides through), proving the component does what its sequence of operations says it does. L4 is integrated: the subsystems are operated together under load, proving that a utility loss, a generator start, a transfer, and a UPS ride-through chain into one another without a gap the IT load can feel. This chapter is about what "under load" must mean for an AI facility — because the load this building was designed for is unlike anything a conventional data center commissioning script was written to test.
At every gate there is a choice between a cheaper, faster acceptance that proves the steady-state case and a more expensive, slower one that proves the dynamic case. The cost of taking the cheap path is not paid at commissioning. It is paid months later, at the first real workload, when a synchronized load step the load banks never reproduced trips an under-voltage relay, sags a UPS, or de-syncs a generator paralleling scheme, and the operator discovers that an L4 sign-off certified a building that cannot actually hold its design load. This chapter walks the chain in energization order (utility, switchgear, generators, UPS/BESS), then the three decisions specific to AI: load-bank realism, redundancy-topology validation under load, and protection/arc-flash verification against the study. It closes on the acceptance criterion conventional commissioning has no vocabulary for: dynamic-load-swing tolerance.
The energization sequence: from the fence to the busbar
Electrical acceptance follows the direction power flows, and each stage is a prerequisite for the next — you cannot commission the UPS until the switchgear that feeds it is accepted, and you cannot commission the switchgear until the utility or generators can energize it. The sequence is a dependency chain, and a deficiency anywhere upstream stalls everything below it.
Utility energization (backfeed / first energization). The interconnect and on-site substation (Chapter 4.2) are energized for the first time — often by backfeeding the customer transformer from the utility side, with the utility's protection and the customer's relays both live. This is the single highest-energy moment in the build: the first time full fault current is available behind the customer's breakers. Acceptance here is dominated by protection — relay settings loaded and verified, CT/PT ratios and polarity confirmed, the trip scheme proven to operate before the bus is left energized. Get the polarity of a differential CT wrong and the first through-fault, not the commissioning team, finds it.
Switchgear commissioning. Medium- and low-voltage switchgear and the busway/PDU distribution (Chapter 4.6) are tested to ANSI/NETA ATS — the acceptance-testing specification that the industry treats as the as-installed bar. Insulation resistance and DC hi-pot prove the dielectric survived shipping and install; contact-resistance (ductor) testing on every bolted joint catches the loose lug that becomes a hot spot; breaker timing and trip verification prove the mechanism operates within spec; and, critically, primary injection proves the protective relays trip on real current, not just on a settings file. NETA ATS-2025 is explicit that data centers, with thousands of low-voltage breakers, generate a substantial deficiency count even at a small per-device failure rate — the punch-list management of Chapter 13.2 is not optional bureaucracy here, it is the only way to track which of 3,000 breakers has been proven and which has not.
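The scale effect is simple arithmetic. The sketch below uses the 3,000-breaker count from the text; the 2% per-device deficiency rate is an illustrative assumption (not a NETA figure), and devices are assumed to fail independently.

```python
def expected_deficiencies(device_count: int, per_device_rate: float) -> float:
    """Expected number of deficient devices, assuming independent failures."""
    if device_count < 0 or not 0.0 <= per_device_rate <= 1.0:
        raise ValueError("device_count must be >= 0 and rate within [0, 1]")
    return device_count * per_device_rate


def prob_at_least_one(device_count: int, per_device_rate: float) -> float:
    """Probability that at least one device in the population is deficient."""
    if device_count < 0 or not 0.0 <= per_device_rate <= 1.0:
        raise ValueError("device_count must be >= 0 and rate within [0, 1]")
    return 1.0 - (1.0 - per_device_rate) ** device_count


if __name__ == "__main__":
    breakers = 3000       # population size quoted in the text
    assumed_rate = 0.02   # hypothetical per-device deficiency rate
    items = expected_deficiencies(breakers, assumed_rate)
    print(f"expected punch-list items: {items:.0f}")
    print(f"P(at least one deficiency): {prob_at_least_one(breakers, assumed_rate):.6f}")
```

Even under this modest assumed rate the expected punch list runs to dozens of items, which is why per-device tracking matters more than the aggregate pass rate.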
Generator commissioning. Each generating set is started, run, and load-tested — first on its own load bank (L3), then in parallel and onto the building (L4). Cold-start time to accept load, voltage and frequency recovery on a block load step, fuel-system and exhaust acceptance, and protection (reverse-power, loss-of-field, over/under-frequency) are all proven. For AI sites that lean on on-site generation as prime or island power, the paralleling and load-sharing acceptance is deep enough to warrant its own treatment in Chapter 13.4; here the gate is simply that the gensets can pick up the design block load within the voltage and frequency window the UPS downstream can tolerate.
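One way to turn "voltage and frequency recovery on a block load step" into numbers is to post-process the recorded trace. The sketch below is a minimal illustration: the function name, the tolerance band, and the sample trace are all hypothetical, and the time axis is assumed to start at the instant the block load is applied.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class StepResponse:
    max_deviation: float               # largest |sample - nominal| in the trace
    recovery_time_s: Optional[float]   # time after which the trace stays in band


def analyze_step(times_s: Sequence[float], samples: Sequence[float],
                 nominal: float, band: float) -> StepResponse:
    """Summarize a recorded voltage or frequency trace after a block load step."""
    if len(times_s) != len(samples) or len(samples) == 0:
        raise ValueError("times_s and samples must be non-empty and equal length")
    max_dev = max(abs(s - nominal) for s in samples)
    # Index of the last sample that is still outside the tolerance band.
    last_out = None
    for i, s in enumerate(samples):
        if abs(s - nominal) > band:
            last_out = i
    if last_out is None:
        recovery = 0.0                     # never left the band
    elif last_out == len(samples) - 1:
        recovery = None                    # had not recovered when recording ended
    else:
        recovery = times_s[last_out + 1]   # first sample of the final in-band run
    return StepResponse(max_dev, recovery)


if __name__ == "__main__":
    # Hypothetical 60 Hz genset trace sampled every 100 ms after the step.
    t = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
    f = [60.0, 58.2, 58.9, 59.6, 59.9, 60.0]
    print(analyze_step(t, f, nominal=60.0, band=0.5))
```

The two outputs map onto the gate described above: the maximum deviation is compared against the window the downstream UPS tolerates, and the recovery time against the genset's specified recovery limit.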
UPS and BESS commissioning. The ride-through layer (Chapter 4.5) is the last electrical subsystem and the one the IT load is most intimate with. The acceptance set is the battery/capacitor discharge test to proven autonomy, the transfer tests (normal-to-bypass, bypass-to-normal, and the all-important loss-of-utility transfer that hands off to the generator), the harmonic and power-factor behavior under non-linear load, and — for the increasingly common BESS layer — the fast-response transient-absorption behavior that conventional double-conversion UPS was never asked to provide. This is where the AI-specific acceptance criteria start to bite, because it is the UPS/BESS that must absorb the GPU load swing the gensets and the grid cannot follow fast enough.
Load banks: three technologies, three different proofs
A load bank is how you put load on the power chain before the GPUs arrive (or before you dare put them at risk). The choice of load-bank technology is a choice about which question you are answering, and the three available answers do not overlap.
Resistive load banks dissipate real power (kW) as heat through resistive elements at unity power factor. They are the workhorse of data-center commissioning: cheap, robust, and sufficient to prove that the transformers, switchgear, busway, and cooling can carry the design kW and reject the design heat. What they cannot do is load the power chain reactively. They draw no VARs, so they exercise the chain's real-power capability and reveal nothing about how it behaves under the leading/lagging and harmonic-rich conditions a real GPU power supply imposes.
Reactive load banks add inductive (and sometimes capacitive) load, letting the commissioning team test the chain at a realistic, non-unity power factor, typically down to ~0.8 lagging. This matters because generator sets and UPS are rated in kVA, not kW, and the same kW drawn at 0.8 PF requires 1/0.8 = 125% of the kVA it draws at unity; a chain that carries 100% rated kW at unity power factor may therefore be over its kVA limit at 0.8 PF. Reactive load banks prove the generator's automatic voltage regulator, the UPS's VAR handling, and the transformer's heating under the apparent-power load the equipment is actually rated for. They still hold the load steady; they prove the operating point, not the transient.
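The kW-versus-kVA point can be checked with the standard power-triangle relations, S = P / PF and Q = sqrt(S² − P²). The sketch below is illustrative only; the 1,000 kW load is an assumed figure, not a value from any specific site.

```python
import math


def apparent_power_kva(real_power_kw: float, power_factor: float) -> float:
    """Apparent power S = P / PF for a given real power and power factor."""
    if real_power_kw < 0 or not 0.0 < power_factor <= 1.0:
        raise ValueError("power must be >= 0 and power factor within (0, 1]")
    return real_power_kw / power_factor


def reactive_power_kvar(real_power_kw: float, power_factor: float) -> float:
    """Reactive power Q = sqrt(S^2 - P^2) from the power triangle."""
    s = apparent_power_kva(real_power_kw, power_factor)
    return math.sqrt(max(s * s - real_power_kw * real_power_kw, 0.0))


if __name__ == "__main__":
    load_kw = 1000.0  # assumed design load, for illustration only
    for pf in (1.0, 0.8):
        print(f"PF {pf}: {apparent_power_kva(load_kw, pf):.0f} kVA, "
              f"{reactive_power_kvar(load_kw, pf):.0f} kvar")
```

At 0.8 PF the same real power demands a quarter more apparent power, which is the margin a resistive-only test never exercises.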
AI-load-emulating (dynamic) load banks are the newest and least standardized class. They are designed to reproduce the workload's electrical fingerprint — programmable, fast load steps that mimic the synchronized ramp-up, steady-state oscillation, and abrupt ramp-down of a GPU cluster entering and leaving collective operations. They are the only load-bank class that exercises the di/dt the power chain was actually designed to survive: the under-voltage and frequency excursions on a step load, the UPS/BESS transient-absorption response, the generator's load-acceptance and load-rejection recovery, and the interaction between rack-level capacitance and facility-level storage. They are more expensive, harder to source, and still cannot perfectly reproduce a real cluster (the only true emulator is the proxy training run of Chapter 13.9) — but they are the difference between accepting a static building and accepting the building you designed.
| Load-bank class | Loads the chain in | Proves | Leaves untested | Relative cost / availability |
|---|---|---|---|---|
| Resistive | Real power (kW), unity PF, steady | Capacity, thermal/heat-rejection, basic steady-state holding | Reactive (kVA) behavior; all transient/di/dt behavior; harmonics | Lowest; ubiquitous, easily rented |
| Reactive (R+L, sometimes +C) | Apparent power (kVA) at ~0.8 PF lagging, steady | Generator AVR & kVA rating, UPS VAR handling, transformer heating at rated PF | Transient/di/dt behavior; workload-realistic harmonic spectrum | Moderate; available but heavier/larger |
| AI-load-emulating (dynamic) | Programmable fast load steps mimicking cluster ramp/oscillation | Transient ride-through, UPS/BESS absorption, generator load-accept/reject, di/dt tolerance | Exact cluster spectrum, real cold-plate thermal coupling, true software-driven swings | Highest; specialist gear, limited rental pool |
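To make "programmable fast load steps" concrete, the sketch below generates a simple ramp-up, oscillating plateau, and ramp-down profile and reports its steepest ramp rate. It is a toy model, not any vendor's load-bank control format: every parameter name and value is hypothetical, and a real profile would be derived from measured cluster telemetry.

```python
import math
from typing import List


def load_profile_kw(t_s: float, base_kw: float, peak_kw: float, ramp_s: float,
                    hold_s: float, osc_frac: float, osc_hz: float) -> float:
    """Piecewise load: linear ramp up, oscillating plateau, linear ramp down.

    For a continuous profile, hold_s should span a whole number of
    oscillation periods (hold_s * osc_hz is an integer).
    """
    if ramp_s <= 0 or hold_s < 0:
        raise ValueError("ramp_s must be > 0 and hold_s >= 0")
    span_kw = peak_kw - base_kw
    if t_s < 0:
        return base_kw
    if t_s < ramp_s:                       # synchronized ramp-up
        return base_kw + span_kw * t_s / ramp_s
    if t_s < ramp_s + hold_s:              # plateau dipping below peak periodically
        dip_kw = osc_frac * span_kw
        phase = 2.0 * math.pi * osc_hz * (t_s - ramp_s)
        return peak_kw - dip_kw * 0.5 * (1.0 - math.cos(phase))
    if t_s < 2.0 * ramp_s + hold_s:        # ramp-down back to base
        return peak_kw - span_kw * (t_s - ramp_s - hold_s) / ramp_s
    return base_kw


def max_ramp_rate_kw_per_s(samples_kw: List[float], dt_s: float) -> float:
    """Largest absolute change between consecutive samples, per second."""
    if dt_s <= 0 or len(samples_kw) < 2:
        raise ValueError("need dt_s > 0 and at least two samples")
    return max(abs(b - a) for a, b in zip(samples_kw, samples_kw[1:])) / dt_s


if __name__ == "__main__":
    dt = 0.001  # 1 ms sampling, assumed
    trace = [load_profile_kw(i * dt, 200.0, 1000.0, 0.05, 1.0, 0.3, 2.0)
             for i in range(1200)]
    print(f"peak {max(trace):.0f} kW, "
          f"steepest ramp {max_ramp_rate_kw_per_s(trace, dt):.0f} kW/s")
```

The ramp rate is the quantity the static classes in the table leave at zero; it is the number to compare against what the generator and UPS/BESS are specified to follow.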
The practical pattern at a 2026 AI site is layered, not single-choice. Resistive banks prove capacity and that the cooling can reject the heat (the facility load banks reject to air, which is itself a realism limit treated in Chapter 13.5). Reactive banks prove the generator and UPS at their kVA ratings. Then, where the budget and schedule allow, dynamic banks prove the transient envelope — and where they do not, that envelope is left to IST (Chapter 13.6) and ultimately to the first proxy training run (Chapter 13.9), which is the only instrument that loads the chain with a real, software-synchronized cluster. The decision a strategist signs off is how much of the transient risk to retire at L4 (expensive, but caught before GPUs are at risk) versus how much to carry into IST and first-workload (cheaper to commission, but discovered with $100M+ of accelerators energized).
Validating the redundancy topology under load
A redundancy topology — N+1, 2N, distributed-redundant, or the catalog of fault-domain designs in Chapter 12.1 — is a claim that the load survives the loss of any one (or any block of) components. The drawing makes the claim; L4 acceptance is where you make the claim true by actually removing components with the load energized and watching whether the load notices. The fork here is how aggressively you test: which transfers and source losses you genuinely trip versus which you accept on the single-line diagram's authority.
The acceptance set, in ascending order of how much it tells you and how much it risks: (1) controlled transfers — operate every static transfer switch, automatic transfer switch, and breaker tie through its full sequence under load, confirming break/make timing and that no downstream bus drops; (2) single-source loss — open a utility feed, a UPS module, a CDU power feed, or a PDU and confirm the redundant path picks up within the IT load's ride-through window; (3) concurrent / cascading loss — the IST-grade test (Chapter 13.6) where you stack faults to confirm the topology survives the design-basis worst case, up to the full black-building "pull-the-plug" demonstration. For an AI cluster, the loaded redundancy test has a sharp edge that conventional data centers lack: the redundancy you are validating may not be the redundancy the workload values. A synchronous training job already restarts from a checkpoint on a node loss; spending L4 effort proving 2N facility availability for a job that tolerates checkpoint-and-resume can be effort spent on the wrong nines — the goodput-vs-availability reframing of Chapter 12.2. The acceptance scope should validate the topology that protects the revenue, not every redundancy the drawings happen to contain.
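Before tripping anything live, the single-source-loss case can be screened on paper. The sketch below is a minimal pre-check under stated assumptions: module capacities, load, and ride-through figures are hypothetical, and passing it does not replace the loaded test, which is the only proof the redundant path actually picks up.

```python
from typing import Sequence


def survives_single_loss(unit_capacities_kw: Sequence[float], load_kw: float) -> bool:
    """True if the load is still covered after losing the largest single unit."""
    if not unit_capacities_kw:
        raise ValueError("at least one unit is required")
    remaining_kw = sum(unit_capacities_kw) - max(unit_capacities_kw)
    return remaining_kw >= load_kw


def transfer_within_ride_through(measured_gap_ms: float, ride_through_ms: float) -> bool:
    """True if the measured transfer gap fits inside the IT load's ride-through window."""
    if measured_gap_ms < 0 or ride_through_ms < 0:
        raise ValueError("durations must be non-negative")
    return measured_gap_ms <= ride_through_ms


if __name__ == "__main__":
    ups_modules_kw = [2500.0, 2500.0, 2500.0, 2500.0]  # hypothetical N+1 group
    print(survives_single_loss(ups_modules_kw, 7000.0))
    print(transfer_within_ride_through(measured_gap_ms=6.0, ride_through_ms=10.0))
```

The first check catches a topology that was N+1 on the drawing but has since been loaded past N; the second is the pass/fail comparison recorded for each tripped transfer.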
Protection coordination & arc-flash: verifying the building against the study
The protection coordination study and the arc-flash incident-energy study (produced in Chapter 4.2) are analytical artifacts: they specify relay settings, breaker trip curves, and PPE/labeling on the assumption that the as-built plant matches the model. Electrical acceptance is where that assumption is tested. The fork is verification depth: confirm coordination by reading settings back from the relays (fast, cheap, and proves only that someone typed the right numbers) versus proving it by primary injection and recorded event data (slow, expensive, and proves the device actually trips at the studied current in the studied time).
The acceptance set is concrete.
Settings verification. Every protective relay's pickup, time-dial, and curve is read back and reconciled against the coordination study, and against the device labeling, because NETA ATS is explicit that mislabeled or mis-addressed devices are a primary cause of lost selective coordination.
Primary/secondary injection. Current is injected to confirm the relay operates within the study's time-current band, so that a downstream fault is cleared by the nearest upstream device and not by one two levels up that would needlessly drop a whole bus.
Coordination proof under fault. Where practical, faults are staged and the relay event records confirm selective operation, the only evidence that coordination is real and not merely modeled.
Arc-flash verification. The as-built clearing times from those injection tests feed back into the incident-energy calculation, because arc-flash energy scales with clearing time: a relay that trips slower than the study assumed raises the incident energy at that bus, which can invalidate the PPE category on the label a technician is relying on.
Acceptance is therefore not just "the relay trips"; it is "the relay trips fast enough that the arc-flash labels on this gear are still true."
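The clearing-time dependence can be sketched to first order. The function below assumes the arcing current, working distance, and enclosure are unchanged from the study, so incident energy scales linearly with arc duration; it is a screening check, not a substitute for rerunning the incident-energy calculation. All numbers and names are hypothetical.

```python
def scaled_incident_energy(study_energy_cal_cm2: float, study_clearing_s: float,
                           measured_clearing_s: float) -> float:
    """Scale studied incident energy by the ratio of measured to studied clearing time.

    Assumes everything except arc duration matches the study.
    """
    if study_energy_cal_cm2 < 0 or study_clearing_s <= 0 or measured_clearing_s < 0:
        raise ValueError("energy must be >= 0, study time > 0, measured time >= 0")
    return study_energy_cal_cm2 * measured_clearing_s / study_clearing_s


def label_still_valid(as_built_energy_cal_cm2: float, label_rating_cal_cm2: float) -> bool:
    """True if the as-built energy does not exceed the rating printed on the label."""
    return as_built_energy_cal_cm2 <= label_rating_cal_cm2


if __name__ == "__main__":
    # Hypothetical bus: study assumed 6.0 cal/cm^2 at 0.10 s; injection measured 0.15 s.
    as_built = scaled_incident_energy(6.0, 0.10, 0.15)
    print(f"as-built estimate: {as_built:.1f} cal/cm^2")
    print("label valid:", label_still_valid(as_built, label_rating_cal_cm2=8.0))
```

A relay that is 50% slower than studied pushes this example bus past its label rating, which is the failure mode the acceptance test exists to catch.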
Deep dive: why the coordination study and the acceptance test are not the same document
A common and dangerous shortcut treats the protection coordination study as the proof — "the study says these settings coordinate, the settings are loaded, therefore the building coordinates." Each link in that chain can be false. The study models the equipment that was specified; the building contains the equipment that was installed, which may differ in available fault current (a different upstream transformer impedance, a utility that strengthened the source), in breaker trip-unit firmware, or in the actual length and impedance of feeders. The settings may have been transcribed wrong, loaded to the wrong device address, or overwritten during a firmware update. And the device may simply not operate as its curve claims — trip mechanisms age, CTs saturate, and a relay that reads back the right settings can still trip at the wrong current.
Primary injection collapses all of that uncertainty into a single empirical test: real current in, measured trip time out, compared against the study's band. For an AI site this matters more than for a conventional load, because the density ramp (Chapter 12.1 fault domains) packs enormous fault energy behind compact busway and the consequence of a mis-coordinated trip is not one rack but a fault domain of GPUs going dark mid-job — and because arc-flash incident energy, which the same clearing times determine, governs whether a technician can safely work a live cabinet during day-2 operations. The study is the hypothesis. The injection test is the experiment. Acceptance requires both, and the cost of skipping the experiment is a building whose protection is true on paper and unknown in fact.
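The comparison "measured trip time against the study's band" can be sketched for one common case. The example assumes the relay is set to the IEC standard inverse characteristic, t = TMS × 0.14 / ((I/Is)^0.02 − 1); the actual curve family, pickup, time multiplier, and tolerance come from the coordination study and the relay itself, and the ±10% band used here is a hypothetical placeholder.

```python
def iec_standard_inverse_trip_s(current_a: float, pickup_a: float, tms: float) -> float:
    """Expected trip time on the IEC standard inverse curve (k = 0.14, alpha = 0.02)."""
    if pickup_a <= 0 or tms <= 0:
        raise ValueError("pickup and time multiplier must be positive")
    multiple = current_a / pickup_a
    if multiple <= 1.0:
        raise ValueError("injected current must exceed pickup for the relay to trip")
    return tms * 0.14 / (multiple ** 0.02 - 1.0)


def within_band(measured_s: float, expected_s: float, tol_frac: float) -> bool:
    """True if the measured trip time is within +/- tol_frac of the expected time."""
    return abs(measured_s - expected_s) <= tol_frac * expected_s


if __name__ == "__main__":
    # Hypothetical injection: 10x pickup, time multiplier 1.0, measured 3.10 s.
    expected = iec_standard_inverse_trip_s(current_a=1000.0, pickup_a=100.0, tms=1.0)
    print(f"expected {expected:.2f} s, pass: {within_band(3.10, expected, 0.10)}")
```

Repeating the check at several current multiples traces the as-built curve, and those measured times are what feed the arc-flash recalculation.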
The AI-specific gate: dynamic-load-swing tolerance and NVL72 power smoothing
This is the acceptance criterion conventional data-center commissioning has no vocabulary for, and it is the one most likely to be skipped, because the load banks that most scopes specify cannot produce it and the relays that would trip on it are upstream of the IT the script focuses on. A modern rack actively fights its own transients: the GB300 NVL72 power shelf carries roughly 65 J/GPU of energy storage in electrolytic capacitors (about half the PSU volume) and combines it with power-capping on ramp-up and a deliberate "GPU burn" on ramp-down to taper the load gracefully, cutting peak grid demand by up to 30%. The Vera Rubin generation pushes rack storage to ~400 J/GPU (≈6x) with closed-loop state-of-charge control. Behind the rack, facility BESS provides the next layer of fast-response absorption (Chapter 4.5). Together these form a layered mitigation stack (rack capacitance, then BBU/UPS, then facility BESS), and electrical acceptance is where you prove the stack actually engages and that the residual swing it does not absorb stays inside the power chain's tolerance.
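The per-GPU storage figures translate into time via t = E / ΔP. The sketch below gives an upper bound only: it assumes all stored energy is usable and ignores conversion losses and the capacitors' minimum operating voltage. The 65 J and 400 J figures come from the text; the 500 W per-GPU swing is an assumed value for illustration.

```python
def cover_time_ms(stored_j_per_gpu: float, swing_w_per_gpu: float) -> float:
    """Upper-bound time the stored energy can source a constant power swing."""
    if stored_j_per_gpu < 0 or swing_w_per_gpu <= 0:
        raise ValueError("energy must be >= 0 and swing must be > 0")
    return 1000.0 * stored_j_per_gpu / swing_w_per_gpu


if __name__ == "__main__":
    assumed_swing_w = 500.0  # hypothetical per-GPU load swing
    for name, joules in (("GB300 NVL72", 65.0), ("Vera Rubin", 400.0)):
        print(f"{name}: {cover_time_ms(joules, assumed_swing_w):.0f} ms")
```

Under these assumptions the rack layer buys on the order of a hundred milliseconds to under a second, which is why slower swings fall through to the BBU/UPS and facility BESS layers.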
The acceptance criteria, written as quantitative pass/fail gates in the L4 script (the script anatomy is Chapter 13.2): the voltage excursion at the bus on a defined load step stays within band; the frequency excursion (on generator or island power) stays within the relay's no-trip window; the UPS/BESS sources the swing without dropping to bypass or sagging the output; the rack-level smoothing demonstrably reduces the upstream peak by its rated amount; and no protective relay anywhere in the chain trips on the largest synchronized step the cluster can produce. The honest limit, named plainly: a load bank — even a dynamic one — can approximate these steps but cannot reproduce the exact, software-driven, thousands-of-GPUs-in-lockstep spectrum of a real workload. That is why this gate spans three commissioning activities: dynamic load banks at L4 retire the gross transient risk, IST (Chapter 13.6) stacks it with concurrent faults, and the proxy training run (Chapter 13.9) is the only true emulator. The decision is where to draw the line — and the consequence of drawing it too early is a cluster whose first real all-reduce is also its first real transient test.
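Written as data, the gates above might look like the following sketch. The structure and every threshold are hypothetical placeholders; the real limits come from the equipment specifications and the coordination study, not from this example.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Gate:
    name: str
    measured: float
    limit: float
    higher_is_better: bool = False  # False: measured must not exceed the limit

    def passed(self) -> bool:
        if self.higher_is_better:
            return self.measured >= self.limit
        return self.measured <= self.limit


def evaluate(gates: Sequence[Gate]) -> Tuple[bool, List[str]]:
    """Return overall pass/fail and the names of any failed gates."""
    failures = [g.name for g in gates if not g.passed()]
    return (not failures, failures)


if __name__ == "__main__":
    # All values below are invented for illustration.
    script = [
        Gate("bus voltage excursion (%)", measured=4.2, limit=5.0),
        Gate("frequency excursion (Hz)", measured=0.8, limit=1.0),
        Gate("UPS transfers to bypass (count)", measured=0, limit=0),
        Gate("upstream peak reduction (%)", measured=24.0, limit=30.0,
             higher_is_better=True),
        Gate("protective relay trips (count)", measured=0, limit=0),
    ]
    ok, failed = evaluate(script)
    print("PASS" if ok else f"FAIL: {failed}")
```

Expressing each gate as a measured value against a limit keeps the sign-off auditable: a failed gate names itself instead of hiding inside a narrative test report.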
The density ramp as an acceptance problem
For an AI facility, electrical acceptance is a recurring gate the density ramp re-opens with every GPU generation, not a one-time event. A power chain accepted for ~132 kW GB200 NVL72 racks faces a different acceptance problem when the hall fills with ~190–230 kW Rubin VR200 racks, and a categorically different one at ~600 kW Rubin Ultra Kyber on 800 VDC. The fault current rises, the transient magnitude rises, the protection coordination must be re-studied and re-injected, the arc-flash energy is recomputed, and the load-bank profile that proved the old generation under-tests the new one. The irreversible electrical substrate — the interconnect capacity, the voltage class, the switchgear fault rating, the busway ampacity — has to be accepted not just for today's load but for the ramp it must absorb (the density-ramp trap of Chapter 12.1). The strategist's acceptance decision is therefore forward-looking: do you commission the protection and the transient envelope against the generation you are installing, or against the headroom the substrate must eventually carry? Accepting only to today's load is cheaper and re-opens the gate at every refresh; accepting to the substrate's ceiling is more expensive now and converts a recurring acceptance cost into a one-time one.
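The ramp's effect on the distribution can be sketched as feeder current per rack, using I = P / (√3 · V · PF) for three-phase AC and I = P / V for DC. Rack powers and the 800 VDC bus are taken from the text; the 415 V line voltage and 0.95 power factor are assumptions for illustration, not a statement of how any given rack is actually fed.

```python
import math


def three_phase_current_a(power_kw: float, line_voltage_v: float,
                          power_factor: float) -> float:
    """Line current for a balanced three-phase load."""
    if line_voltage_v <= 0 or not 0.0 < power_factor <= 1.0:
        raise ValueError("voltage must be > 0 and power factor within (0, 1]")
    return power_kw * 1000.0 / (math.sqrt(3.0) * line_voltage_v * power_factor)


def dc_current_a(power_kw: float, bus_voltage_v: float) -> float:
    """Bus current for a DC-fed load."""
    if bus_voltage_v <= 0:
        raise ValueError("voltage must be > 0")
    return power_kw * 1000.0 / bus_voltage_v


if __name__ == "__main__":
    assumed_vll, assumed_pf = 415.0, 0.95  # hypothetical AC distribution
    for rack_kw in (132.0, 230.0, 600.0):
        amps = three_phase_current_a(rack_kw, assumed_vll, assumed_pf)
        print(f"{rack_kw:.0f} kW at {assumed_vll:.0f} V AC: {amps:.0f} A")
    print(f"600 kW at 800 VDC: {dc_current_a(600.0, 800.0):.0f} A")
```

The per-rack current growth is what drives the re-study: busway ampacity, breaker ratings, and available fault current all have to be re-accepted against the larger figure.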