Chapter 4.11
Grounding, Bonding, Earthing, Lightning Protection, SPD & EMC
Grounding is not a code box to tick at the end — it is the silent reference plane the entire facility rides on, and on a gigawatt of millisecond-stepping, phase-coherent GPU load fed by an ungrounded 800 VDC bus, the wrong earthing regime is a safety hazard, a goodput killer, and a one-way concrete decision all at once.
What you'll decide here
- Which AC system-earthing regime (TN-S, TT, or IT) you commit the facility to — the choice is regional code (IEC vs NEC/ANSI) plus a reliability-vs-safety fork that propagates into every protective device, RCD, and bonding detail downstream.
- How you ground and monitor the ungrounded ±400/800 VDC bus that the dense-rack generation introduces — the genuinely novel problem, where there is no neutral, no RCD, and a first ground fault is invisible without an insulation-monitoring device.
- The substation ground-grid design and the ground-potential-rise / step-and-touch budget you build to (IEEE 80) — an irreversible, soil-resistivity-bound civil decision poured before any white space exists.
- The Signal Reference Grid and equipotential-bonding scheme for high-density halls (TIA-607 / IEEE 1100), including how you bond conductive liquid-cooled racks, manifolds, and CDUs that the air-cooled era never had to.
- The multi-stage SPD coordination (Type 1/2/3 staging under IEC 61643 / UL 1449) and the lightning-protection / EMC posture that keep multi-Tb/s SerDes and the management plane alive through a strike or a switching transient.
Of all the electrical subsystems in an AI data center, grounding is the one most likely to be treated as paperwork and most likely to bite. It is invisible when it works, it spans every other discipline — substation civil, power distribution, structured cabling, mechanical, networking — and almost every decision in it is poured, bonded, or buried before commissioning, which makes it expensive to revisit. This chapter treats grounding, bonding, earthing, surge protection and EMC as what they actually are: a set of decisions with consequences, where the fork you pick at design time determines whether a grid disturbance trips a 50 MW hall, whether a maintenance technician survives touching a frame during a fault, and whether a first ground fault on an ungrounded DC bus is a logged warning or a silent landmine waiting for the second fault.
The AI-factory era changed the stakes in three specific ways. First, the load is no longer a benign collection of dual-corded servers — it is a phase-coherent population of accelerators that steps from idle to >150 kW/rack and back in milliseconds, injecting common-mode noise and switching transients into every reference plane it touches. Second, the dense-rack generation is migrating off the grounded 415/480 VAC neutral onto an ungrounded ±400/800 VDC bus — a regime with no neutral, no residual-current device, and a fundamentally different fault philosophy. Third, the racks themselves became wet: conductive coolant, metal manifolds, CDUs and dripless quick-disconnects are now bondable objects carrying their own touch-potential and stray-current concerns. The earthing system has to absorb all three at once.
The AC system-earthing fork: TN-S, TT, IT
The first and most consequential fork is which system-earthing regime the LV distribution runs. In IEC terminology the regime is named by a letter code. The first letter is the source-to-earth relationship (T = directly earthed, I = isolated or impedance-earthed). The second is the exposed-conductive-parts-to-earth relationship (T = locally earthed, N = connected to the earthed point of the source, normally the neutral). For TN systems a suffix says whether the neutral and protective functions run in separate conductors (-S), in one combined PEN conductor (-C), or combined and then split (-C-S). The mainstream data-center answer in IEC regions is TN-S: a single point of earthing at the source, with separate protective-earth (PE) and neutral (N) conductors carried all the way to the load. In North America the equivalent is a solidly grounded wye with a separate equipment-grounding conductor (EGC) per NEC and a single neutral-to-ground bond at the service or separately derived source, which is functionally TN-S downstream of that bond. The fork that matters is not TN-S versus its regional twin. It is whether any part of the facility deliberately departs from a solidly earthed source toward IT (isolated or impedance-earthed), and the dense-rack DC bus drags exactly that question to the front (covered below).
The consequence chain runs through every protective device. A solidly-earthed TN-S system clears an earth fault as a high-magnitude short circuit — overcurrent protection (breakers, fuses) sees it and trips, and supplementary residual-current devices (RCDs / GFCIs) catch the lower-magnitude faults that overcurrent misses. A TT system (source earthed, but loads earthed to a separate local electrode rather than the source neutral) cannot rely on overcurrent for earth faults because the fault loop runs through soil; it mandates RCDs, and it is common where a facility cannot trust the quality of the utility's earth. An IT system is the deliberate opposite: the source is isolated or impedance-earthed so that a first earth fault does not draw enough current to trip anything — the load keeps running — and the system instead carries an insulation-monitoring device that alarms so the fault can be found and fixed before a second fault on a different phase creates a true short. That continuity-of-service property is precisely why IT earthing is the conceptual ancestor of the ungrounded DC bus, and why hospitals and process industries have used it for decades.
| Regime | Source earthing | First earth-fault behavior | Clears via | Where it fits in an AI DC |
|---|---|---|---|---|
| TN-S (IEC) / solidly-grounded wye + EGC (NEC) | Single point at source; PE and N separate to the load | High fault current; load trips immediately | Overcurrent (breaker/fuse) + supplementary RCD/GFCI | The mainstream LV default; lowest-cost protection, fastest fault clearing |
| TT | Source earthed; loads to a separate local electrode | Fault loop runs through soil; current may be too low for overcurrent | RCDs mandatory | Where utility earth quality is poor or the site electrode is independent |
| IT (isolated / impedance-earthed) | Source isolated or earthed through high impedance | First fault draws negligible current; load keeps running | Insulation-monitoring device alarms; second fault then cleared | Continuity-critical loads; the conceptual parent of the ungrounded DC bus |
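The practical difference between the three regimes is the size of the first earth-fault current compared with what the protection needs to see. The sketch below makes several assumptions: a 230/400 V system, illustrative values for the loop impedance and for the electrode and insulation resistances, and a hypothetical 32 A Type C breaker whose magnetic trip is treated as guaranteed at 10 × In. The IT case is a static view that ignores leakage capacitance. The TN and TT checks follow the IEC 60364-4-41 conditions Zs × Ia ≤ U0 and RA × IΔn ≤ 50 V.

```python
# Illustrative first earth-fault currents under TN-S, TT and IT earthing.
# Every impedance and device rating here is an assumed example value.

U0 = 230.0            # V, line-to-earth voltage of a 230/400 V system
I_A = 10 * 32.0       # A, hypothetical 32 A Type C breaker: magnetic trip taken at 10 x In
RA, RB = 20.0, 10.0   # ohm, assumed TT load-side and source electrode resistances
I_DELTA_N = 0.030     # A, 30 mA RCD


def tn_fault_current(zs_ohm: float) -> float:
    """TN-S: the fault loop is line conductor out, PE conductor back to the source."""
    return U0 / zs_ohm


def tt_fault_current(ra_ohm: float, rb_ohm: float) -> float:
    """TT: the fault loop closes through the load electrode, the soil and the source electrode."""
    return U0 / (ra_ohm + rb_ohm)


def it_first_fault_current(r_iso_ohm: float) -> float:
    """IT (static view): the only return path is the insulation of the healthy conductors."""
    return U0 / r_iso_ohm


i_tn = tn_fault_current(0.4)          # assumed loop impedance Zs = 0.4 ohm
i_tt = tt_fault_current(RA, RB)
i_it = it_first_fault_current(100e3)  # assumed 100 kohm insulation resistance

print(f"TN-S: {i_tn:.0f} A, breaker trips: {i_tn >= I_A}, Zs*Ia = {0.4 * I_A:.0f} V <= {U0:.0f} V")
print(f"TT:   {i_tt:.1f} A, breaker trips: {i_tt >= I_A}, frame at {i_tt * RA:.0f} V")
print(f"TT + RCD: RA * I_delta_n = {RA * I_DELTA_N:.1f} V (limit 50 V)")
print(f"IT:   {i_it * 1e3:.1f} mA, nothing trips; only an IMD will notice")
```

The results come out as follows:

- **TN-S:** the fault draws about 575 A and clears on overcurrent.
- **TT:** the fault draws about 7.7 A. That is far too small for the breaker, yet it still lifts the frame to roughly 150 V, which is why the RCD is mandatory.
- **IT:** the first fault is a couple of milliamps. Nothing trips, which is both the feature and the risk.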
The substation ground grid, GPR and the IEEE 80 budget
Before any white space exists, the on-site MV/HV substation (Chapter 4.2) needs a buried ground grid — a mesh of bare copper conductors and driven rods that gives fault current a low-impedance path back to source and, crucially, controls the voltages a person can be exposed to during a fault. When a ground fault dumps current into the earth, the whole grid rises in potential relative to remote earth: this is ground-potential rise (GPR), and it can reach kilovolts on a stiff fault. The danger is not the absolute rise but the gradients it creates — the touch voltage between a grounded structure and the earth a person stands on (hand-to-feet), and the step voltage between one foot and the other across the soil. IEEE 80 (the substation grounding standard) is the design basis that bounds both against tolerable body-current limits derived from fault clearing time and body weight.
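IEEE 80 expresses the tolerable limits with Dalziel-derived equations. The levers are shock duration, body weight and surface-layer resistivity. The sketch below implements the IEEE 80-2013 formulas for a 50 kg or 70 kg body, including the empirical surface-layer derating factor Cs. The example inputs are assumptions: the soil resistivity, the 100 mm crushed-rock layer and the 0.5 s clearing time.

```python
import math

# Body-weight constants from the IEEE 80 tolerable body-current equations.
K_BODY = {50: 0.116, 70: 0.157}


def surface_layer_derating(rho_soil: float, rho_surface: float, h_surface: float) -> float:
    """Empirical Cs from IEEE 80-2013; equals 1.0 when rho_surface == rho_soil (no distinct layer)."""
    return 1.0 - 0.09 * (1.0 - rho_soil / rho_surface) / (2.0 * h_surface + 0.09)


def tolerable_touch_step(rho_soil: float, rho_surface: float, h_surface: float,
                         t_fault_s: float, body_kg: int = 50) -> tuple[float, float]:
    """Return (E_touch, E_step) in volts for the given body weight and shock duration."""
    cs = surface_layer_derating(rho_soil, rho_surface, h_surface)
    scale = K_BODY[body_kg] / math.sqrt(t_fault_s)
    e_touch = (1000.0 + 1.5 * cs * rho_surface) * scale
    e_step = (1000.0 + 6.0 * cs * rho_surface) * scale
    return e_touch, e_step


# Assumed site: 100 ohm-m native soil, 0.5 s fault clearing time, 50 kg body.
bare = tolerable_touch_step(100.0, 100.0, 0.1, 0.5)   # no surface layer
rock = tolerable_touch_step(100.0, 3000.0, 0.1, 0.5)  # 0.1 m of 3000 ohm-m crushed rock
print(f"bare soil:    touch {bare[0]:.0f} V, step {bare[1]:.0f} V")
print(f"crushed rock: touch {rock[0]:.0f} V, step {rock[1]:.0f} V")
```

On these assumptions the crushed-rock layer raises the tolerable touch voltage from about 189 V to about 681 V. Faster fault clearing helps too, because both limits scale with 1/√t.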
The decisive, irreversible input is soil resistivity, which can vary by orders of magnitude between a wet clay site and a dry rocky or desert one and which you must measure (Wenner four-pin survey) before grid design, not assume. High-resistivity soil forces a larger, denser grid, more rods, deeper electrodes, or imported low-resistivity backfill / ground-enhancement material to hit the target grid resistance and keep touch/step voltages inside the IEEE 80 envelope. Get this wrong and the consequences are poured into concrete: a substation energized over an under-designed grid is a step-and-touch hazard that is enormously expensive to retrofit, and a high grid resistance also degrades fault clearing and lifts GPR onto every bonded metallic path leaving the site — fences, cable shields, pipework, even the structured-cabling bonding network in the halls. This is a power-bound, civil-first decision: the grid goes in with the substation, on the same critical path as the interconnect, and it is one of the least reversible items in the build.
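The chain from soil survey to hazard is short enough to sketch. The example rests on three relationships:

- A Wenner reading with probe depth small relative to the spacing, so ρ = 2πaR.
- The Sverak grid-resistance estimate from IEEE 80.
- GPR = I_G × R_g.

The inputs are illustrative assumptions: a 70 m × 70 m grid, 1,500 m of buried conductor, 0.5 m burial depth, 2 kA grid current and a 680 V tolerable touch voltage.

```python
import math


def wenner_resistivity(spacing_m: float, measured_ohm: float) -> float:
    """Apparent soil resistivity (ohm-m) from a Wenner four-pin reading, probe depth << spacing."""
    return 2.0 * math.pi * spacing_m * measured_ohm


def grid_resistance_sverak(rho: float, total_length_m: float, area_m2: float, depth_m: float) -> float:
    """IEEE 80 (Sverak) estimate of grid resistance to remote earth, in ohms."""
    return rho * (1.0 / total_length_m
                  + (1.0 / math.sqrt(20.0 * area_m2))
                  * (1.0 + 1.0 / (1.0 + depth_m * math.sqrt(20.0 / area_m2))))


rho = wenner_resistivity(spacing_m=6.0, measured_ohm=10.6)  # ~400 ohm-m
rg = grid_resistance_sverak(rho, total_length_m=1500.0, area_m2=70.0 * 70.0, depth_m=0.5)
gpr = 2000.0 * rg                                           # assumed 2 kA grid current
E_TOUCH_TOLERABLE = 680.0                                   # V, assumed from a touch-voltage calculation

print(f"rho = {rho:.0f} ohm-m, Rg = {rg:.2f} ohm, GPR = {gpr:.0f} V")
# IEEE 80 screening step: if GPR is below the tolerable touch voltage the design passes
# without further analysis; otherwise mesh and step voltages must be computed.
print("needs mesh/step analysis" if gpr > E_TOUCH_TOLERABLE else "passes on GPR alone")
# Rg is linear in rho: the same grid on soil ten times more resistive has ten times the GPR.
print(f"same grid at 10x rho: Rg = {grid_resistance_sverak(10 * rho, 1500.0, 4900.0, 0.5):.1f} ohm")
```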
Equipotential bonding and the Signal Reference Grid
Inside the halls the governing idea is equipotential bonding. You tie every exposed conductive surface to a common reference: rack frames, busway enclosures, cable trays, containment, raised-floor structure, and now liquid-cooling metalwork. During a fault or a transient they then all rise and fall together, leaving no dangerous potential difference for a person or a sensitive signal to see. The structured-cabling discipline formalizes this as a bonding network with three parts:

- a primary busbar at the entrance;
- secondary busbars per space;
- bonding conductors tying racks and pathways back to it.

In North America the reference is ANSI/TIA-607 (telecommunications bonding and grounding). TIA-607-C calls the busbars the primary and secondary bonding busbars (PBB and SBBs); earlier editions and much field documentation still call them the telecommunications main grounding busbar (TMGB) and telecommunications grounding busbars (TGBs). The equipment-and-electronics rationale and the high-frequency reference-plane concept come from IEEE 1100 (the Emerald Book).
For high-density, high-frequency halls the upgrade from a radial (single-point) bond to a Signal Reference Grid (SRG) — a fine copper mesh under or bonded across the floor, with racks tied to it at multiple points — matters because at the frequencies modern silicon and SerDes operate, a long single bonding conductor is an inductor, not a short. A mesh keeps the reference impedance low across a wide band, suppresses common-mode noise, and stops fault or transient energy from finding a path through signal cabling. A single-point bond is cheaper and adequate for low-density legacy IT, but a dense AI hall with multi-Tb/s links and a load that slams common-mode current into everything wants a meshed SRG. Retrofitting one under a live, wet, 132 kW/rack floor is miserable, so it is a design-time call.
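A back-of-envelope check shows why length, not copper cross-section, dominates at high frequency. The sketch uses the high-frequency self-inductance of a straight round conductor, L ≈ (μ0/2π)·l·[ln(2l/r) − 1]. That formula is valid for l ≫ r and well below the quarter-wave resonance of the run, which is about 7.5 MHz for 10 m. The 10 mm conductor diameter and the 0.6 m mesh spacing are assumptions for illustration, and resistance and skin effect are ignored.

```python
import math

MU0_OVER_2PI = 2e-7  # H/m


def wire_inductance(length_m: float, radius_m: float) -> float:
    """High-frequency self-inductance of an isolated straight round conductor (length >> radius)."""
    return MU0_OVER_2PI * length_m * (math.log(2.0 * length_m / radius_m) - 1.0)


def reactance(inductance_h: float, freq_hz: float) -> float:
    """Inductive reactance X = 2*pi*f*L in ohms."""
    return 2.0 * math.pi * freq_hz * inductance_h


long_bond = wire_inductance(10.0, 0.005)  # 10 m single-point bonding run, 10 mm diameter
mesh_cell = wire_inductance(0.6, 0.005)   # one 0.6 m SRG segment (assumed spacing)

for f in (60.0, 1e5, 1e6):
    print(f"{f:>10,.0f} Hz   10 m bond: {reactance(long_bond, f):8.3f} ohm   "
          f"0.6 m segment: {reactance(mesh_cell, f):7.3f} ohm")
```

At 60 Hz the 10 m bond presents only milliohms of reactance. At 1 MHz it presents roughly 90 Ω, while each 0.6 m mesh segment is about 3 Ω and the mesh offers many such segments in parallel.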
The novel problem: grounding the ungrounded ±400/800 VDC bus
This is the section the legacy textbooks do not cover, and the canonical home for it in this guide. The dense-rack generation moves the distribution from a grounded AC wye to an ungrounded (or high-resistance-grounded) DC bus at ±400 VDC (the Mt Diablo / Diablo 400 ecosystem) or 800 VDC (the NVIDIA reference and the OCP 800 VDC work), delivered from a sidecar power rack (Chapter 4.1). The DC bus is deliberately run as the conceptual cousin of an AC IT system: floating relative to earth so that a single ground fault on one pole draws negligible current and does not shut down a hall of accelerators. There is no neutral to reference, no RCD that works the way it does on AC, and the protective philosophy inverts from 'trip on first fault' to 'detect, alarm, locate, and survive to the next maintenance window.'
The instrument that makes this safe is the insulation-monitoring device (IMD). It continuously measures the insulation resistance of the entire floating system to earth and alarms when that resistance degrades, so the first fault becomes a tracked work order rather than an invisible loss of your safety margin.

Pair the IMD with two other things:

- **Ground-fault detection and location**, so operations can find which branch faulted on a live bus.
- **Protective devices and insulation rated for the full bus voltage from either pole to earth.** After a first fault on one pole, the healthy pole of an 800 VDC bus sits at the full 800 V to earth, not the nominal 400.

High-resistance grounding (HRG) is the pragmatic middle ground: it earths the bus through a deliberate high impedance instead of leaving it fully floating. It bounds the fault current and transient overvoltages while still letting the system ride through a first fault. The cost is a small continuous leakage and the grounding-conductor sizing rules that the higher voltage tightens.

The trade-off between the two is as follows. A fully floating bus maximizes continuity but demands flawless insulation monitoring and disciplined fault-clearing operations. An HRG bus trades a sliver of that continuity for bounded fault energy and easier overvoltage control. Either way, the IMD is not optional. Without it you have not built a resilient ungrounded system; you have built a system that silently runs on one fault until the second one arcs.
| Scheme | First ground-fault behavior | Detection requirement | Pros | Cons |
|---|---|---|---|---|
| Ungrounded / floating (IT-like) | Negligible fault current; load keeps running | Insulation-monitoring device (IMD) + ground-fault location, mandatory | Maximum continuity; no nuisance trips on first fault | Transient overvoltage risk; safety depends entirely on monitoring and disciplined fault clearing |
| High-resistance grounded (HRG) | Small bounded fault current; load keeps running | IMD / ground-fault relay; alarm on fault | Bounds fault current and overvoltage; easier to locate faults | Small continuous leakage; tighter grounding-conductor sizing at 800 VDC |
| Solidly / low-resistance grounded | High fault current; protection trips the affected segment | Overcurrent / DC ground-fault protection | Familiar trip philosophy; lowest overvoltage stress | Sacrifices the continuity that motivated the DC architecture in the first place |
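The active-injection measuring principle behind many IMDs can be sketched in a few lines. The model assumes the IMD applies a known measuring voltage through a known internal resistance and reads the returning current. Because the bus source is stiff, both poles look tied together to that signal, so the reading is the parallel combination of the positive- and negative-pole insulation resistances. The measuring voltage and internal resistance are hypothetical placeholders, and so, especially, are the two alarm thresholds. Real settings come from the IMD vendor and the applicable code.

```python
from dataclasses import dataclass

# Hypothetical alarm ladder; not values from any standard or product.
PREWARN_OHM = 300e3
ALARM_OHM = 100e3


@dataclass
class ImdReading:
    u_measure_v: float     # injected measuring voltage (assumed)
    i_measure_a: float     # current returning through the insulation to earth
    r_internal_ohm: float  # IMD internal series resistance (assumed)


def insulation_resistance(reading: ImdReading) -> float:
    """System-to-earth insulation resistance inferred from the injected signal."""
    return reading.u_measure_v / reading.i_measure_a - reading.r_internal_ohm


def parallel(a: float, b: float) -> float:
    """Two resistances in parallel: what the IMD sees when Rp and Rn share one earth path."""
    return a * b / (a + b)


def classify(r_iso_ohm: float) -> str:
    if r_iso_ohm < ALARM_OHM:
        return "ALARM: first fault present, locate and clear"
    if r_iso_ohm < PREWARN_OHM:
        return "PREWARNING: insulation degrading, schedule inspection"
    return "OK"


reading = ImdReading(u_measure_v=50.0, i_measure_a=50.0 / 1.1e6, r_internal_ohm=100e3)
print(f"{insulation_resistance(reading):.0f} ohm")  # healthy bus, about 1 Mohm total
for r_pos, r_neg in [(2e6, 2e6), (2e6, 150e3), (2e6, 5e3)]:
    r_iso = parallel(r_pos, r_neg)
    print(f"Rp={r_pos:.0e} Rn={r_neg:.0e} -> Riso={r_iso:,.0f} ohm: {classify(r_iso)}")
```

The parallel combination means one badly degraded pole drags the whole reading down even when the other pole is healthy. That is exactly the first-fault condition the IMD exists to catch.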
Multi-stage SPD coordination: Type 1/2/3 staging
Surge-protective devices defend the facility against three kinds of overvoltage transient:

- lightning-induced surges;
- utility switching events;
- the internally generated transients a GPU load throws when it steps hundreds of kilowatts in milliseconds.

The engineering principle is cascaded, coordinated staging. No single device can both survive a direct-strike-class surge and clamp to a voltage low enough to protect electronics, so you stage them:

- **Type 1** (IEC test Class I, tested with the 10/350 µs waveform that mimics a direct lightning current) sits at the service entrance and takes the brunt.
- **Type 2** (Class II, 8/20 µs) sits at downstream distribution boards and handles residual and switching surges.
- **Type 3** (Class III, combination-wave tested) sits close to sensitive equipment as a final clamp.

The two governing standard families are IEC 61643 and UL 1449:

- **IEC 61643** is the international family; IEC 61643-11 was rolled into the expanded IEC 61643-01:2024.
- **UL 1449** covers North America. The 5th Edition is the current listing basis, ANSI-approved in 2025, and NEC requires SPDs to be UL 1449 listed.

The Type numbers are not interchangeable across the two families. UL 1449 Types describe where a device may be installed, not a 10/350 µs test class: Type 1 on the line side of the service overcurrent device, Type 2 on the load side, Type 3 at the point of use. A UL Type 1 listing therefore does not by itself certify IEC Class I lightning-current capability.
The consequence of getting the coordination wrong is subtle and common: if the upstream and downstream devices are not energy-coordinated, a downstream Type 2/3 device can try to clamp a surge the upstream Type 1 should have absorbed, and it fails — or the upstream device's let-through voltage is too high for the downstream device's rating and you cascade a failure. SPDs must be coordinated as a system (let-through voltage, nominal discharge current, short-circuit current rating) with the right disconnect and fusing, and they must be maintainable — they degrade with every surge they absorb, so monitored status and replaceable modules are not luxuries on a facility that cannot take a hall down to swap a sacrificial part. SPDs are also a primary reason the lightning-protection and bonding systems must be unified: a strike that is not given a low-impedance bonded path will find one through your SPDs and your signal cabling.
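One part of coordination can be checked on paper. Each stage's effective protection level, including the inductive drop on its connecting leads, must stay below the impulse withstand of the equipment it protects. The sketch makes these assumptions:

- voltage-limiting SPDs, for which IEC 62305-4 gives Up/f = Up + ΔU;
- the common rule of thumb of about 1 kV of ΔU per metre of connecting lead;
- IEC 60664-1 impulse-withstand values for a 230/400 V system;
- a hypothetical 0.8 design margin.

All Up values are placeholders. Energy coordination between stages needs manufacturer data and is not modelled here.

```python
from dataclasses import dataclass

KV_PER_M_LEAD = 1.0  # rule-of-thumb inductive drop on SPD connecting leads, kV per metre
MARGIN = 0.8         # hypothetical design margin on equipment impulse withstand


@dataclass
class SpdStage:
    name: str
    up_kv: float          # SPD voltage protection level (placeholder, not a datasheet value)
    lead_length_m: float  # total connecting-lead length, line side plus earth side
    uw_kv: float          # rated impulse withstand of the equipment this stage protects

    def effective_level_kv(self) -> float:
        """Voltage-limiting SPD: Up/f = Up + delta-U from the connecting-lead inductance."""
        return self.up_kv + KV_PER_M_LEAD * self.lead_length_m

    def ok(self) -> bool:
        return self.effective_level_kv() <= MARGIN * self.uw_kv


stages = [
    SpdStage("Type 1, service entrance", up_kv=2.5, lead_length_m=0.5, uw_kv=6.0),    # Cat IV gear
    SpdStage("Type 2, distribution board", up_kv=1.5, lead_length_m=1.5, uw_kv=4.0),  # Cat III gear
    SpdStage("Type 3, rack PDU", up_kv=1.0, lead_length_m=0.3, uw_kv=1.5),            # Cat I electronics
]

for s in stages:
    verdict = "OK" if s.ok() else "FAIL"
    print(f"{s.name:28s} Up/f = {s.effective_level_kv():.1f} kV, "
          f"limit {MARGIN * s.uw_kv:.1f} kV -> {verdict}")
```

Here the Type 3 stage fails, and not because its Up is poor: 0.3 m of lead adds 0.3 kV. Shortening the leads or choosing a lower-Up device fixes it. A passing voltage check is necessary but not sufficient, and energy coordination still has to be confirmed against the manufacturers' data.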
Lightning protection and EMC
The lightning-protection system (LPS) on the building envelope — air terminals, down-conductors, and a low-impedance connection to the earthing system — is governed internationally by IEC 62305 (and NFPA 780 in North America). Its job is to intercept a strike, conduct it to earth, and equipotentialize the structure so the energy does not pass through the building and its electronics. The physical envelope coordination — where air terminals sit, how down-conductors route around a liquid-cooled roof plant, how the LPS earth ties into the substation grid — is detailed in Chapter 6.3; the electrical obligation here is that the LPS earth, the substation ground grid, and the in-hall bonding network must be a single, equipotentialized system. Separate, isolated earths are the classic mistake: a strike on an isolated LPS earth drives a transient potential difference into everything bonded to the 'other' earth, and the path of least impedance runs through your SerDes.
EMC closes the loop. Multi-Tb/s SerDes and a sensitive management plane live or die on common-mode noise control, and the grounding/bonding system is the EMC system at low frequency. The mechanisms that matter: a low-impedance meshed reference (the SRG above) to give common-mode currents somewhere to go; correct shield bonding of STP copper and fiber armor — bonded at the right points to drain induced currents without creating ground loops; and segregation of noisy power runs from sensitive signal pathways. The detailed STP/fiber-shield bonding practice for the network plant is consolidated in Chapter 8.9; the decision that lands here is that EMC performance is not a property you add with filters at the end — it is largely determined by the bonding topology you poured into the floor, which is why the SRG, the LPS, and the SPD staging all have to be designed as one system.
Deep dive: why the first ground fault on an 800 VDC bus is the dangerous one
On a grounded AC system the fault model is intuitive: a hot conductor touches a frame, a large current flows back to the source neutral, and a breaker sees it and trips. The fault announces itself and the protection clears it. The ungrounded/HRG DC bus deliberately breaks that model to preserve continuity, and the price is that the first fault is silent. With no low-impedance path to earth, a single pole-to-ground fault draws almost no current. Nothing trips, and nothing alarms unless an instrument is watching the insulation resistance. The system keeps running, which is the whole point.

But the bus is now referenced by that fault. The formerly floating system is effectively earthed at the fault point, and the opposite pole now sits at close to the full bus voltage relative to ground. On an 800 VDC bus that is the full 800 V, not 400. A second fault on the opposite pole is then a pole-to-pole short through two ground points. That is a high-energy DC arc fault, and DC's lack of a natural current zero makes it especially hard to interrupt.
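The voltage shift is just a resistive divider. The static sketch below assumes the bus sees earth only through the positive- and negative-pole insulation resistances, with leakage capacitance and transients ignored. The 1 MΩ insulation values and the 10 Ω fault resistances are illustrative assumptions. The second-fault figure also ignores the source and cable impedance that would limit it in practice.

```python
def parallel(a: float, b: float) -> float:
    return a * b / (a + b)


def pole_to_earth(v_bus: float, r_pos: float, r_neg: float) -> tuple[float, float]:
    """Static pole-to-earth voltages of a floating bus.

    With no other connection, earth settles on the divider formed by the two
    insulation resistances: V(+) = Vbus*Rp/(Rp+Rn), V(-) = -Vbus*Rn/(Rp+Rn)."""
    v_pos = v_bus * r_pos / (r_pos + r_neg)
    v_neg = -v_bus * r_neg / (r_pos + r_neg)
    return v_pos, v_neg


def first_fault_current(v_bus: float, r_pos: float, r_neg: float, r_fault: float) -> float:
    """Current in a negative-pole-to-earth fault; it can only return via the positive-pole insulation."""
    i_total = v_bus / (r_pos + parallel(r_neg, r_fault))
    return i_total * r_neg / (r_neg + r_fault)  # share of the total that flows in the fault


V_BUS, R_INS, R_FAULT = 800.0, 1e6, 10.0

healthy = pole_to_earth(V_BUS, R_INS, R_INS)
faulted = pole_to_earth(V_BUS, R_INS, parallel(R_INS, R_FAULT))
i_first = first_fault_current(V_BUS, R_INS, R_INS, R_FAULT)
i_second = V_BUS / (R_FAULT + R_FAULT)  # pole-to-pole through both faults, source impedance ignored

print(f"healthy:      +{healthy[0]:.0f} V / {healthy[1]:.0f} V to earth")
print(f"after fault:  +{faulted[0]:.1f} V / {faulted[1]:.3f} V to earth, "
      f"fault current {i_first * 1e3:.2f} mA")
print(f"second fault: {i_second:.0f} A pole-to-pole")
```

The first fault moves the healthy pole from 400 V to essentially the full 800 V to earth while drawing under a milliamp. The same two 10 Ω faults on opposite poles then carry 40 A. With bolted faults, only the source and cabling limit the current.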
This is why the insulation-monitoring device is the load-bearing safety element of the entire DC architecture, not an accessory. The IMD continuously injects a small signal and measures system-to-earth resistance, alarming when it degrades toward a threshold long before it becomes a hard fault. Ground-fault location adds the ability to pinpoint the faulted branch on a live bus so it can be isolated on a planned window. The operational discipline that follows is non-negotiable: an IMD in alarm means you have spent your one free fault, and the system must be treated as compromised until it is found and cleared. Run a floating bus with the IMD bypassed, miscalibrated, or chronically in alarm and you have quietly converted a resilient design into a single-fault-from-catastrophe one — the exact opposite of the intent. The voltage-architecture decision that creates this obligation is in Chapter 4.1; the fault-current and relaying coordination it feeds is in Chapter 4.2.
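One common approach to branch-level location injects a locating signal and watches residual (summation) current sensors that enclose both poles of each branch. Only sensors on the path between the injector and the fault see a net current. The sketch below shows only the decision logic. The sensor hierarchy uses hypothetical path names, and the response threshold is also hypothetical; it must be set above the capacitive leakage of healthy branches.

```python
def locate_faults(residual_ma: dict[str, float], threshold_ma: float) -> list[str]:
    """Return the deepest sensors that see the locating signal.

    Sensor names are paths ("source/feeder/branch"); a sensor that sees the signal
    is reported as the fault location only if no sensor beneath it also sees it."""
    hits = {path for path, i_ma in residual_ma.items() if i_ma >= threshold_ma}
    return sorted(p for p in hits if not any(q.startswith(p + "/") for q in hits))


# Hypothetical readings from one locating pulse on a sidecar-fed row.
readings = {
    "sidecar-1": 2.0,
    "sidecar-1/row-A": 0.07,
    "sidecar-1/row-B": 1.9,
    "sidecar-1/row-B/rack-07": 1.8,
    "sidecar-1/row-B/rack-08": 0.06,
}
print(locate_faults(readings, threshold_ma=0.5))
```

Two other outcomes are worth reading correctly. If the signal stops at row-B with no rack sensor responding, the fault is in the row busway rather than in a rack. An empty result while the IMD is in alarm points upstream of every sensor, or at a threshold set too high.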
Deep dive: unifying the earths — why isolated ground systems are the recurring failure
A surprising amount of grounding pathology traces to a single anti-pattern: someone built more than one 'earth' and let them float relative to each other. The candidates are the substation ground grid, the building lightning-protection earth, the electrical equipment-grounding system, the structured-cabling bonding network, the mechanical/cooling-plant bonding, and any 'isolated' or 'clean' ground a vendor asked for. The intuition behind separating them — keep noisy power grounds away from clean signal grounds — is exactly backwards at the frequencies and energies that matter. When a fault or a lightning strike dumps energy into one earth, every metallic path bonded to a different earth sees a transient potential difference, and that difference drives current through whatever ties the two systems together — invariably a signal cable, a shield, or a sensitive interface that was never meant to carry it.
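The size of the problem follows from a lumped resistive model. In the sketch, a strike current enters the LPS electrode (R1), and an optional bond of impedance Zb ties it to the equipment electrode (R2). With no bond, the equipment earth is assumed to stay near remote-earth potential. The 30 kA current and all resistances are illustrative assumptions. Treating the bond as a pure resistance ignores the inductance that dominates at lightning rise times, which is why many parallel bonding paths matter.

```python
from typing import Optional


def earth_potential_difference(i_strike_a: float, r_lps_ohm: float, r_equip_ohm: float,
                               z_bond_ohm: Optional[float] = None) -> float:
    """Voltage between the LPS earth and the equipment earth during a strike (lumped, resistive)."""
    if z_bond_ohm is None:
        # Isolated earths: the LPS electrode rises I*R1, the equipment earth stays near remote earth.
        return i_strike_a * r_lps_ohm
    # Bonded: the bond carries I*R1/(R1 + Zb + R2); the difference is the drop across the bond.
    i_bond = i_strike_a * r_lps_ohm / (r_lps_ohm + z_bond_ohm + r_equip_ohm)
    return i_bond * z_bond_ohm


I_STRIKE, R_LPS, R_EQUIP = 30e3, 10.0, 5.0

for label, zb in [("isolated", None), ("single bond, 50 mohm", 0.05), ("mesh bonding, 10 mohm", 0.01)]:
    dv = earth_potential_difference(I_STRIKE, R_LPS, R_EQUIP, zb)
    print(f"{label:24s} {dv / 1e3:8.2f} kV between the two earths")
```

Even this crude model shows the ordering. Isolating the earths exposes the interface to the full electrode rise, hundreds of kilovolts here. Bonding collapses the difference to the drop across the bond, which makes the bond impedance (length, inductance, number of parallel paths) the design variable.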
The correct posture is a single, equipotentialized earthing system with a common bonding network: one low-impedance reference that the LPS, the substation grid, the electrical EGC/PE, the SRG, and the cooling-plant bonding all tie into. 'Clean' references are achieved by topology and segregation within the bonded network (meshed reference, careful routing, shield management) — not by a galvanically separate earth. For an AI hall the stakes are higher than legacy IT because the load injects common-mode current continuously and the links are multi-Tb/s: a few volts of transient potential difference across a 'split' ground is enough to corrupt a management plane or degrade a SerDes margin. Get the unification right at design time and EMC, lightning, and fault safety all improve together; get it wrong and you chase intermittent, undiagnosable link errors for the life of the facility. The network-side shield-bonding detail is in Chapter 8.9; the envelope LPS coordination in Chapter 6.3.
How the pieces have to be designed as one system
The throughline of this chapter is that grounding does not decompose into independent subsystems you can hand to separate trades. The substation grid sets the GPR that the in-hall bonding network must survive; the bonding network is the EMC reference the SerDes ride on; the SPD staging only works if it has a low-impedance bonded path to the same earth the LPS uses; and the DC-bus monitoring philosophy only makes sense against the AC earthing regime it descends from. The forks compound:
- Solidly-earthed vs IT/HRG — trip-on-first-fault safety vs ride-through-and-alarm continuity, decided per power domain, and mirrored on the DC bus by floating vs HRG.
- Single-point bond vs meshed SRG — cheaper and fine for low-density legacy IT; a meshed reference is effectively mandatory for a dense, wet, multi-Tb/s hall and cannot be retrofitted cheaply.
- Unified earth vs split/clean earths — the single most reliable predictor of EMC and lightning trouble; unify and equipotentialize, segregate by topology not by separate electrodes.
- SPD coordination as a system vs device-by-device — energy-coordinated Type 1/2/3 staging with monitored, replaceable modules vs an uncoordinated cascade that fails on the surge it was bought to stop.
Each is a density-ramp-sensitive decision: the substrate (buried grid, floor SRG, bonded cooling network, service-entrance SPD provisioning) is poured or bonded early and is painful to revisit, so it must accommodate the ramp to 600 kW-class racks and an 800 VDC bus even when today's hall is lighter. Reserve the reference-plane and earthing headroom you cannot retrofit; defer only the device-level spend you can swap later.