The Definitive Guide to AI Data Centers

Chapter 13.4

Commissioning On-Site Generation & Microgrid Controls

When you build behind the meter you are no longer a customer of a grid — you are a grid, and commissioning is the only point at which you prove your generation, controls, and storage can actually keep a gigawatt-class AI load alive when the utility lets go.

POWER-BOUND · GOODPUT

What you'll decide here

  1. Whether the site runs prime/island, grid-parallel, or grid-as-backup — because that one choice decides whether your microgrid controller must hold frequency and voltage on its own or merely follow the utility's.
  2. Seamless (make-before-break, no-break island) vs break-before-make islanding — and therefore whether the IT load rides through a grid loss on inverter/BESS support or drops to UPS/BBU for the transfer.
  3. Who owns the seam: where facility commissioning (13.3) ends, where microgrid Cx begins, and where it must hand a stable, characterized bus to Integrated Systems Testing (13.6).
  4. How much of the transient-support stack — grid-forming BESS, synchronous condensers, GPU-side power smoothing — you validate at Cx versus discover under a real workload.
  5. Which failure modes you are willing to demonstrate live (loss-of-utility, loss-of-largest-generator, black start) versus accept on analysis — because each live demo costs schedule and risk but buys a tested restoration sequence.

For most of the data-center industry's history, on-site generation meant a yard of standby diesels that started, took the block load, ran for an hour during the monthly test, and went back to sleep. Commissioning was a checklist: crank it, transfer to it, load-bank it, prove the ATS. That world is gone for the AI build. When you site behind the meter to escape a five-to-seven-year interconnection queue — and roughly 82 GW of behind-the-meter gas had been announced by 2026 against a grid that simply cannot energize it in time (Cleanview / SemiAnalysis, 2026) — your generators are not standby. They are prime. They run the building. The utility, if it is present at all, is the backup. That inversion changes the entire character of acceptance: you are no longer commissioning a transfer switch, you are commissioning a power system — generation, synchronization, protection, storage, and a controller that must do what a utility control room does, autonomously, in milliseconds.

This chapter is about proving that power system before a single GPU depends on it. It sits deliberately in the sequence: after electrical power acceptance (Chapter 13.3) has energized and characterized the switchgear, UPS, and BESS as components, and before Integrated Systems Testing (Chapter 13.6) pulls the plug on the whole building. The engineering it validates — paralleling and load-sharing, fuel-gas conditioning and dual-fuel changeover, islanding transitions, microgrid-controller tuning, black start, and inertia/transient support — is designed in Chapter 4.8 (electrical integration), Chapter 4.9 (fuel and gas process), and Chapter 4.10 (grid-interactive behavior toward the point of interconnection). Here we prove it works, and mark where a wrong acceptance decision strands the most capital.

Prime/island acceptance: paralleling and load-sharing

The first acceptance gate is mechanical and electrical at once: getting multiple prime movers to share a load without fighting each other. A single 35 MW aeroderivative turbine does not run a 200 MW hall; eight or ten of them do, in parallel, on a common bus. (The 2025 reference point is GE Vernova's LM2500XPRESS: 35 MW, five-minute start, black-start-capable and grid-independent by design. Crusoe ordered 29 of them for its Abilene campus.) The units must synchronize (match voltage, frequency, and phase angle within tight windows before the breaker closes) and then load-share (split real and reactive power proportionally, with no unit hogging kW or VARs).
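The synchronization condition described above can be sketched as a simple permissive check. This is an illustrative sketch only: the `SyncWindow` limits (5% voltage, 0.1 Hz slip, 10 degrees of phase) are hypothetical placeholders, not values from this guide, and the real windows come from the protection and stability study and the sync-check relay settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncWindow:
    """Hypothetical sync-check limits; replace with the project's relay settings."""
    max_voltage_diff_pct: float = 5.0   # |V_gen - V_bus| as a percentage of bus voltage
    max_slip_hz: float = 0.1            # |f_gen - f_bus|
    max_phase_diff_deg: float = 10.0    # |phase-angle difference| across the open breaker


def wrap_angle_deg(angle: float) -> float:
    """Wrap an angle difference into the range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def breaker_close_permitted(v_gen: float, v_bus: float,
                            f_gen: float, f_bus: float,
                            phase_gen_deg: float, phase_bus_deg: float,
                            window: SyncWindow = SyncWindow()) -> bool:
    """Return True only if voltage, frequency, and phase are all inside the window."""
    dv_pct = abs(v_gen - v_bus) / v_bus * 100.0
    slip_hz = abs(f_gen - f_bus)
    dphi_deg = abs(wrap_angle_deg(phase_gen_deg - phase_bus_deg))
    return (dv_pct <= window.max_voltage_diff_pct
            and slip_hz <= window.max_slip_hz
            and dphi_deg <= window.max_phase_diff_deg)


# Incoming unit 0.02 Hz fast and 5 degrees behind the bus: close is permitted.
print(breaker_close_permitted(13800.0, 13800.0, 60.02, 60.0, 355.0, 0.0))
```

Wrapping the angle difference matters in practice: a raw difference of 355 degrees is really 5 degrees, and a check that forgets this will block a perfectly good close.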

The fork here is the load-sharing control philosophy, and it is not academic — it determines how the plant behaves the instant a GPU cluster steps load. Isochronous control holds frequency dead-flat at 60 Hz and is the natural mode for a true island, but it requires a master controller or a robust load-sharing line and is intolerant of communication faults. Droop control lets frequency sag slightly with load (e.g. 4% droop = 2.4 Hz across 0–100%), is inherently stable and communication-free, and is mandatory when running in parallel with a utility. Most AI microgrids commission both — isochronous-load-sharing when islanded, droop when grid-parallel — and the acceptance test must prove the controller hands off cleanly between them. Skip the dual-mode validation and you discover, the first time the utility drops, that the plant either hunts (units oscillating against each other) or dumps load because the mode transition was never exercised.

Load-sharing / reference modes — what each costs you at acceptance
| Mode | Frequency behavior | Reference source | Primary use | What Cx must prove | Failure if skipped |
|---|---|---|---|---|---|
| Isochronous (single master) | Flat 60 Hz | Grid-forming master | Steady island, one prime mover leading | Master holds f under largest load step; backup-master failover | Loss of master collapses the island |
| Isochronous load-sharing | Flat 60 Hz | Distributed / load-share line | Multi-generator island | Proportional kW/VAR split; no hunting; comms-fault behavior | Units fight; oscillation; uneven wear |
| Droop | Sags with load (e.g. 4%) | External grid (utility) | Grid-parallel operation | Correct power dispatch vs grid; reverse-power protection | Reverse power; export trip; instability |
| Isochronous ↔ droop handoff | Flat → sag on transition | Switches on island detect | Seamless grid/island duty | Clean mode transition at island detect and reconnect | Load dump or hunting at the transition |
The fork is which mode(s) you commission and prove the handoff between. Figures are typical practitioner settings; tune to the protection and stability study from Chapter 4.8.
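The droop arithmetic quoted earlier (4% droop is 2.4 Hz across 0–100% load at 60 Hz) and its effect on load sharing can be worked through in a few lines. This is a minimal sketch under stated assumptions: a linear droop curve, a no-load setpoint exactly at nominal frequency, all units on a common bus with the same setpoint, and no unit at its limit. The function names are illustrative, not taken from any controller product.

```python
def droop_frequency(load_pu: float, f_nominal: float = 60.0,
                    droop_pct: float = 4.0) -> float:
    """Steady-state frequency on a linear droop curve.

    load_pu is the unit's load as a fraction of rating (0.0 to 1.0).
    Assumes the no-load setpoint sits at f_nominal (real governors bias it).
    """
    return f_nominal * (1.0 - droop_pct / 100.0 * load_pu)


def droop_load_share(total_mw: float, units: list[tuple[float, float]]) -> list[float]:
    """Split total_mw across units given as (rating_mw, droop_pct) pairs.

    At a common bus frequency each unit picks up load in proportion to
    rating / droop, so equal droop gives a split proportional to rating.
    """
    stiffness = [rating_mw / droop_pct for rating_mw, droop_pct in units]
    total_stiffness = sum(stiffness)
    return [total_mw * s / total_stiffness for s in stiffness]


# Full-load sag for 4% droop at 60 Hz.
print(60.0 - droop_frequency(1.0))

# Two identical 35 MW units share evenly; a mismatched droop setting skews the split.
print(droop_load_share(35.0, [(35.0, 4.0), (35.0, 4.0)]))
print(droop_load_share(35.0, [(35.0, 4.0), (35.0, 5.0)]))
```

The last call shows why acceptance checks the droop setting on every unit: a one-point mismatch between two otherwise identical machines is enough to push noticeably more load onto the stiffer unit.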

Fuel-supply and gas-process acceptance

An electron-side island is only as firm as its molecule-side supply, and this is the acceptance gate that the electrical team is most tempted to wave through because it lives in a different P&ID. Don't. The prime movers will not start, hold load, or pass an emissions stack test if the fuel gas is out of spec, and every aeroderivative turbine is fussy about its inlet. Commissioning the fuel chain (the engineering of which is laid out in Chapter 4.9) means accepting the conditioning train end to end: filtration and coalescing (no liquids, no particulates), dew-point control and inlet heating (superheat above the hydrocarbon dew point so nothing condenses in the fuel skid), Wobbe-index verification (the interchangeability number the turbine's combustion map is tuned to — a Wobbe excursion forces a derate or a trip), and on-site boost compression for aeroderivatives that demand higher fuel-gas pressure than the pipeline tap provides.
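The Wobbe-index verification mentioned above reduces to one formula: the higher heating value divided by the square root of the gas's specific gravity relative to air. A minimal sketch follows, assuming volumetric heating value in MJ/m³. The ±5% acceptance band and the sample numbers are hypothetical placeholders; the real band comes from the turbine OEM's fuel specification.

```python
import math


def wobbe_index(hhv_mj_per_m3: float, specific_gravity: float) -> float:
    """Wobbe index = higher heating value / sqrt(specific gravity relative to air)."""
    if specific_gravity <= 0.0:
        raise ValueError("specific gravity must be positive")
    return hhv_mj_per_m3 / math.sqrt(specific_gravity)


def wobbe_within_band(measured: float, design: float,
                      tolerance_pct: float = 5.0) -> bool:
    """Check a measured Wobbe index against the design value.

    tolerance_pct is a hypothetical placeholder, not an OEM limit.
    """
    return abs(measured - design) / design * 100.0 <= tolerance_pct


# Hypothetical sample: HHV 40 MJ/m3 and specific gravity 0.64.
design_wi = 50.0
measured_wi = wobbe_index(40.0, 0.64)
print(measured_wi, wobbe_within_band(measured_wi, design_wi))
```

Because the index combines heating value and density, two gases with different compositions can be interchangeable at the burner, and a gas that looks fine on heating value alone can still fall outside the combustion map.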

The decision with the longest tail is firmness: firm vs interruptible pipeline transport, and the dual-fuel hedge. Interruptible gas is cheaper and faster to contract, but a winter curtailment is exactly when the grid is also stressed and your island is doing real work. The hedge is dual-fuel changeover — gas to distillate (diesel) with on-site liquid storage — and the changeover is a commissioning test in its own right: prove the turbine transfers fuels under load, without a trip, within the changeover window, and that the on-site distillate inventory matches the firmness commitment (operators chasing 'five-nines' reliability size days, not hours, of on-site liquid backup). Accept the gas path on paper and you will find the gap at the worst possible time — a Wobbe trip or a failed changeover during the one curtailment event the whole island exists to survive.

Islanding transitions: seamless vs break-before-make

This is the defining fork of the chapter, and it maps directly onto the IT load's tolerance for a power interruption. Break-before-make islanding opens the utility breaker, lets the bus de-energize, then closes onto generation. It is a clean, simple, well-understood sequence with one fatal property for AI: there is a dead-bus interval. For every millisecond of that interval the GPU racks ride on UPS or rack-level BBU, and a synchronous training run that loses power mid-step restarts from its last checkpoint. Seamless (make-before-break, or 'no-break') islanding never lets the bus go dead: a grid-forming source, almost always a grid-forming BESS inverter, is already holding the bus in parallel; the controller detects the grid anomaly and opens the utility breaker while the inverter assumes the full reference. Done right, the IT load never sees an event.

The consequence cascade is unforgiving. Choose break-before-make and you have implicitly sized your UPS/BBU autonomy to cover the dead-bus interval plus generator pickup — and you have accepted that every utility disturbance is a restart risk for tightly-coupled jobs. Choose seamless and you have committed to a grid-forming BESS sized and tuned to instantaneously source the entire campus transient, a faster and more sophisticated controller, and a far more demanding commissioning test: you must prove the island forms with the load online, not on a dead bus you re-energize at leisure. The acceptance criterion is exact: measure the bus voltage and frequency excursion through the transition and show the IT-critical bus never leaves the ITIC/CBEMA ride-through envelope. This is the test that, more than any other in microgrid Cx, determines whether the building is goodput-grade or merely available.
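The ride-through criterion can be turned into a pass/fail check on a recorded bus-voltage trace. The sketch below is a simplification under stated assumptions. It covers only the undervoltage side of the envelope, and it uses approximate breakpoints (below 70% of nominal for at most 20 ms, below 80% for at most 0.5 s, below 90% for at most 10 s) as placeholders. A real acceptance test should load the project's agreed envelope, including the overvoltage and frequency limits, from the test specification.

```python
# (threshold in % of nominal, longest allowed continuous time below it in seconds)
# Approximate, undervoltage-only placeholders; not a substitute for the agreed envelope.
UNDERVOLTAGE_LIMITS = [(90.0, 10.0), (80.0, 0.5), (70.0, 0.020)]


def longest_run_below(trace_pct: list[float], threshold_pct: float, dt_s: float) -> float:
    """Longest continuous duration (s) the trace spends below threshold_pct."""
    longest = current = 0
    for v in trace_pct:
        current = current + 1 if v < threshold_pct else 0
        longest = max(longest, current)
    return longest * dt_s


def rides_through(trace_pct: list[float], dt_s: float,
                  limits: list[tuple[float, float]] = UNDERVOLTAGE_LIMITS) -> bool:
    """True if no threshold is violated for longer than its allowed duration."""
    return all(longest_run_below(trace_pct, thr, dt_s) <= max_s
               for thr, max_s in limits)


# Hypothetical 1 kHz recording: 15 ms dead bus, then 200 ms at 85% while sources pick up.
dt = 0.001
trace = [100.0] * 100 + [0.0] * 15 + [85.0] * 200 + [100.0] * 100
print(rides_through(trace, dt))
```

The same function makes the two islanding philosophies comparable on one recording: a seamless transition should produce a trace that barely leaves 100%, while a break-before-make transition passes only if the dead-bus interval is shorter than the deepest band allows.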

Microgrid controller tuning: droop, regulation, dispatch

The microgrid controller is the brain, and IEEE 2030.7-2017 is the standard that defines what it must do — the control functions above component level: dispatch, unplanned and planned islanding, reconnection, and black start. Its companion, IEEE 2030.8-2018, defines how you test those functions, and a disciplined microgrid Cx plan cites both as the acceptance basis. Tuning the controller is where commissioning stops being a checklist and becomes engineering, because the controller's loops interact: droop coefficients set steady-state power sharing; the secondary voltage/frequency regulation restores nominal after a droop excursion; and the dispatch logic decides, second by second, which generators run, which idle, and how the BESS is charged and discharged against the campus load and the GPU power profile.

The tuning fork that bites is response speed vs stability. Tune the controller too aggressive and it overshoots and oscillates against the turbine governors and the inverter control — a multi-loop instability that can build into a wide-area oscillation, a real and documented risk as synchronized AI loads and inverter-based resources proliferate (see the power-stabilization literature, arXiv 2508.14318). Tune it too soft and it cannot arrest the frequency excursion from a GPU load step before protection trips. There is no generic setting; the loops must be tuned against the measured dynamics of this plant, which is why controller tuning is a commissioning activity and not a factory pre-set. The deep-dive below walks the tuning sequence.

Deep dive: tuning the controller against measured plant dynamics

Controller tuning at Cx is a staged escalation, never a single test, and the discipline is to characterize each loop in isolation before closing the outer loops. The four stages are:

  1. Primary (droop/governor): with the island formed at light load, inject controlled load steps and measure the frequency and voltage excursion and recovery for each prime mover and inverter individually. Confirm the governor and AVR responses match the model from the Chapter 4.8 stability study; mismatch here means the model is wrong and everything downstream is suspect.
  2. Secondary (restoration): close the controller's voltage/frequency restoration loop and prove it returns the bus to nominal after a droop excursion without overshoot, with restoration time inside the dispatch-cycle budget.
  3. Load-sharing: with multiple sources online, prove proportional kW/VAR split holds through load steps and that no unit hunts.
  4. Dispatch: exercise the economic/availability dispatch logic across a synthetic day, including a forced loss-of-a-generator, and confirm the controller re-dispatches and the BESS bridges the gap without a frequency violation.

The reason this sequence matters: each stage's gains are inputs to the next, and a hidden instability in an inner loop is invisible until an outer loop excites it under a real transient. Tuning the whole thing at once — the temptation when schedule is tight — is how you ship a controller that passes every static test and then oscillates the first time a 200 MW training cluster steps its load. The transient physics you are tuning against are canonical in Chapter 4.5.
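The per-stage measurements (excursion, overshoot, recovery) can be extracted from a recorded frequency trace with a few lines of code, which makes stage-to-stage comparison against the stability model repeatable. This is a sketch, not a standard procedure: the ±0.05 Hz settling band, the function name, and the sample trace are all hypothetical, and the load step is assumed to be applied at the first sample.

```python
from typing import Optional


def step_response_metrics(times_s: list[float], freq_hz: list[float],
                          f_nominal: float = 60.0,
                          band_hz: float = 0.05) -> dict[str, Optional[float]]:
    """Summarize a load-step frequency recording.

    Returns the frequency nadir, the overshoot above nominal during recovery,
    and the settling time: the earliest sample time after which every later
    sample stays within +/- band_hz of nominal (None if it never settles).
    """
    if len(times_s) != len(freq_hz) or not freq_hz:
        raise ValueError("times_s and freq_hz must be non-empty and equal length")

    settling_time = None
    # Walk backwards from the end until the first out-of-band sample is found.
    for i in range(len(freq_hz) - 1, -1, -1):
        if abs(freq_hz[i] - f_nominal) > band_hz:
            break
        settling_time = times_s[i]

    return {
        "nadir_hz": min(freq_hz),
        "overshoot_hz": max(0.0, max(freq_hz) - f_nominal),
        "settling_time_s": settling_time,
    }


# Hypothetical recording: dip to 59.4 Hz, slight overshoot, settled by t = 4 s.
t = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
f = [60.0, 59.4, 59.8, 60.08, 60.02, 60.0]
print(step_response_metrics(t, f))
```

Running the same function on the Stage 1 trace for each prime mover, and again after the restoration loop is closed in Stage 2, gives numbers that can be compared directly against the modeled response rather than judged by eye from a chart.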

Black-start capability and restoration sequencing

Black start is the demonstration that the island can come back from nothing — no utility, no running generation, a dead campus — under its own power. It is the most demanding test in the program and the one most likely to be deferred to 'analysis' under schedule pressure, which is exactly why it deserves a live demonstration. The black-start source (an aeroderivative turbine with on-board black-start capability, or a grid-forming BESS energizing a starting bus) must cold-start, energize a dead bus, then sequence the restoration: bring up auxiliaries and fuel-gas compression, parallel additional generation, and pick up load in blocks small enough that no single step exceeds the running generation's transient capability.

The fork is block-loading granularity, and it trades restoration speed against transient risk. Large blocks restore the campus faster but risk a frequency excursion that trips the nascent island and forces a restart of the whole black-start sequence. Small blocks are safe but slow, and a slow restoration is lost goodput. The acceptance test must prove the sequence, not just the start: cold-start the source, energize the bus, and walk the documented block-loading steps with the BESS bridging each step's transient, demonstrating the frequency stays inside the protection window at every block. Critically, the cooling plant must restore in lockstep — there is no point energizing GPU racks faster than the CDUs and pumps can take heat, a coupling that becomes the IST problem in Chapter 13.6 and is why this test is sequenced before, not during, integrated testing.
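The granularity trade-off can be made concrete with a small planning helper. This is a deliberately simplified sketch: it holds online generation fixed, treats the transient capability as a single hypothetical fraction of online capacity (`max_step_fraction`), and ignores BESS bridging and cooling-plant sequencing, all of which a real restoration plan must include.

```python
def plan_load_blocks(total_load_mw: float, online_capacity_mw: float,
                     max_step_fraction: float) -> list[float]:
    """Split total_load_mw into pickup blocks no larger than the allowed step.

    max_step_fraction is the largest single load step, as a fraction of
    online generation capacity, assumed to keep frequency inside the
    protection window. It is a placeholder for a value from the stability study.
    """
    if not 0.0 < max_step_fraction <= 1.0:
        raise ValueError("max_step_fraction must be in (0, 1]")
    if total_load_mw > online_capacity_mw:
        raise ValueError("load exceeds online generation; parallel more units first")

    max_block = online_capacity_mw * max_step_fraction
    blocks = []
    remaining = total_load_mw
    while remaining > 1e-9:
        block = min(max_block, remaining)
        blocks.append(block)
        remaining -= block
    return blocks


# 100 MW of load on 140 MW of online generation (hypothetical figures).
print(plan_load_blocks(100.0, 140.0, 0.25))    # [35.0, 35.0, 30.0]
print(len(plan_load_blocks(100.0, 140.0, 0.125)))  # 6
```

Halving the allowed step doubles the number of blocks the operator has to walk, which is the restoration-time cost the live demonstration puts a real number on.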

BESS / synchronous-condenser inertia and transient support

An island built on inverter-based generation and aeroderivative turbines is low-inertia: it lacks the large spinning masses of a utility grid that resist frequency change. Low inertia means frequency moves fast under a load step, and an AI load steps hard: a synchronous cluster can swing tens of megawatts in milliseconds as a training step begins or ends. NERC's rare Level 3 alert came after data-center loads in Virginia dropped roughly 1,500 MW (1.5 GW) within about 82 seconds of a single 230 kV fault in 2024, the grid-facing symptom of exactly this dynamic. Inside the island, the same physics means the transient-support stack is not optional, and commissioning must validate every layer of it.

The stack is layered by timescale, and the acceptance plan must test each. Synchronous condensers (free-spinning synchronous machines) and flywheels add real rotating inertia and short-circuit strength — validate by measuring the rate-of-change-of-frequency (RoCoF) the island survives. Grid-forming BESS provides synthetic inertia and sub-second power injection — validate against measured load steps. And increasingly the GPU silicon itself carries the fastest layer: NVIDIA's Vera Rubin power-smoothing system holds ~400 J of rack-level energy storage per GPU and a closed-loop controller that cuts peak current demand by up to ~25%, flattening the sub-second transient before it ever reaches the bus (NVIDIA, 2025). The commissioning consequence is profound: the more the transient is smoothed in silicon and BESS, the less inertia the island must carry — but you can only bank that relief if you measure it at acceptance rather than assume it. Validate the layers bottom-up, and the load-bank-vs-real-workload realism gap (the canonical home of which is Chapter 13.6) is where you confront what the test load could and could not reproduce.
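The link between inertia and RoCoF follows from the swing equation: immediately after a power imbalance, before any governor or inverter responds, the frequency slope is RoCoF = f0 · ΔP / (2 · H · S), where H is the aggregate inertia constant in seconds and S the rated MVA it is referenced to. The sketch below applies that relation in both directions. All numeric inputs are hypothetical, and the result is only the initial slope; it says nothing about the nadir, which depends on how fast the BESS and governors respond.

```python
def initial_rocof_hz_per_s(delta_p_mw: float, inertia_h_s: float,
                           system_mva: float, f_nominal: float = 60.0) -> float:
    """Magnitude of the initial rate of change of frequency after a load step.

    Swing-equation estimate: RoCoF = f0 * dP / (2 * H * S).
    Ignores governor, BESS, and load damping response.
    """
    return f_nominal * delta_p_mw / (2.0 * inertia_h_s * system_mva)


def required_inertia_h_s(delta_p_mw: float, max_rocof_hz_per_s: float,
                         system_mva: float, f_nominal: float = 60.0) -> float:
    """Aggregate inertia constant needed to hold the initial RoCoF to a limit."""
    return f_nominal * delta_p_mw / (2.0 * max_rocof_hz_per_s * system_mva)


# Hypothetical island: 300 MVA online, 30 MW load step.
print(initial_rocof_hz_per_s(30.0, 2.0, 300.0))   # 1.5 Hz/s with H = 2 s
print(initial_rocof_hz_per_s(30.0, 6.0, 300.0))   # 0.5 Hz/s with H = 6 s
print(required_inertia_h_s(30.0, 1.0, 300.0))     # H needed for a 1 Hz/s limit
```

The same relation shows why smoothing the step upstream helps: any reduction in ΔP reaching the bus lowers the initial RoCoF in direct proportion, which is the relief that has to be measured at acceptance before it is counted on.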

| Key figure | What it describes | Year | Source |
|---|---|---|---|
| ~82 GW | Behind-the-meter gas announced by 2026 (~7 GW under construction); the scale of the islanding problem | 2026 | Cleanview / SemiAnalysis |
| 35 MW / 5 min | LM2500XPRESS aeroderivative unit rating and start time; black-start-capable, grid-independent | 2025 | GE Vernova / Crusoe (29-unit order) |
| 18–36 mo+ | Aeroderivative gas-turbine lead time (refurb under 12 mo); the speed-to-power constraint behind islanding | 2025 | Data Center Frontier / Grid Capacity Intelligence |
| ~400 J / GPU | Vera Rubin rack-level energy storage for power smoothing (~6x prior gen); cuts peak current ~25% | 2025 | NVIDIA developer blog |
| ~1,500 MW | Data-center load lost on a single 230 kV fault; 1.5 GW dropped in 82 s (VA, 2024); triggered NERC Level 3 alert | 2026 | NERC Level 3 Alert / Utility Dive |
| IEEE 2030.7 / 2030.8 | Microgrid-controller specification (2017) and conformance-test method (2018); the Cx acceptance basis | 2017–2018 | IEEE Standards |
| ~7 days / 512 GPUs | Best-in-class cluster MTBF; a single power transient that drops a synchronous job restarts from checkpoint | 2025 | SemiAnalysis (100k H100 clusters) |

Failure-mode demonstration at the seam

Microgrid Cx earns its place in the program by demonstrating failure, not just function. The three live demonstrations that define a credible island acceptance are: loss of utility (the planned and unplanned islanding test — prove the seamless or break-before-make transition holds the IT bus), loss of the largest generator (the N-1 contingency — prove the BESS bridges and the controller re-dispatches without a frequency violation), and black start (prove the campus comes back from dead). Each is a fork between live demonstration and analytical acceptance, and each live demo costs schedule and carries the risk of a real trip — but each buys a tested restoration sequence and a characterized transient response that the IST in Chapter 13.6 can build on rather than re-discover.

The discipline is to demonstrate live exactly those failures whose analytical models you do not trust — which, on a first-of-its-kind low-inertia island feeding a novel AI load, is most of them. The transition where the utility lets go, the contingency where a turbine trips off the bus, and the cold black start are precisely the events where a wrong assumption strands a gigawatt of capital. Witness them here, with a load bank standing in for the GPUs, and hand a stable, characterized bus to integrated testing.

The electrical integration this chapter accepts is designed in Chapter 4.8; the fuel and gas-process side in Chapter 4.9; and the grid-interactive behavior toward the point of interconnection — ride-through, reactive/voltage support, frequency response — in Chapter 4.10. The transient physics the controller is tuned against are canonical in Chapter 4.5. Upstream in the commissioning program, electrical power acceptance is Chapter 13.3 and documentation/scripts are Chapter 13.2; downstream, the load-bank-vs-real-workload realism gap and the black-building pull-the-plug test live in Chapter 13.6, and cooling acceptance that must restore in lockstep with generation is Chapter 13.5. The reliability framing — why goodput, not facility availability, is the target the islanding transition is really protecting — is Chapter 12.2.