Part 11

Security

12 chapters

Threat Model, Assets & Security Levels for AI Infrastructure

An AI data center is the rare facility that concentrates nation-state-grade IP, allocation-constrained silicon, and infrastructure now recognized as a wartime target inside one fence line. The first security decision is therefore not which controls to buy but which adversary tier you intend to survive, because that single target level determines the entire control stack and its cost.

Physical Security: Siting, Zones & Kinetic/Drone Threats

Physical security is a siting decision before it is a fence decision: where you put the slab, how you stack the concentric zones, and whether you can legally shoot down a drone are choices made years before the first attack. In 2026 that attack is no longer hypothetical, because hyperscale AI facilities have been hit by drones in combat.

Supply-Chain Security & Hardware Provenance

A GPU is the most valuable, most counterfeited, most tamperable industrial component on the planet right now — so the security boundary of an AI data center does not begin at the cage door, it begins at the fab and ends at the shredder, and every link you cannot prove is a link an adversary can substitute.

Hardware Root of Trust, Firmware & BMC Security

Firmware is the lowest, most privileged, and least-watched layer in the machine — own the root of trust in silicon and you can prove what is running; cede it to the BMC and an attacker who roots one management controller can brick, backdoor, or wiretap a GPU rack from beneath every defense you bought above it.

GPU Confidential Computing & Trusted Execution

Confidential computing moves the trust boundary from the operator to the silicon — a cryptographic guarantee that the cloud cannot read your weights or your prompts — but you pay for it in attestation plumbing, a narrower-than-advertised threat model, and a performance tax that is near-zero on Blackwell and severe on anything that crosses the PCIe boundary.

Multi-Tenant & Workload Isolation Security

Every shared accelerator is a decision about how much of someone else's blast radius you are willing to inherit — partitioning a GPU is a utilization win and a confidentiality liability at the same time, and the only honest question is which boundary you trust to hold.

Network Segmentation, Microsegmentation & Zero Trust

An AI cluster is one enormous flat east-west fabric built for collective bandwidth, not for containment — so the segmentation decision is really a decision about where the blast radius of a single compromised node ends, and the wrong answer is a perimeter firewall guarding a network that has no interior walls at all.

Model & Weight Protection (At-Rest, In-Transit, In-Use)

The weights are the only asset in the building that is both worth a nation-state's full operational capacity to steal and small enough to fit on a thumb drive. The protection problem is therefore not 'encrypt the disk' but 'control every path a few terabytes of floating-point parameters can take out of a machine that exists to emit terabytes per day.'

Insider Threat & Human-Layer Security

Insider threat is the one attack vector that runs through almost every other one, which is why it is the dominant gap keeping frontier programs at RAND Security Level 2 — and why the path to SL4-5 is bought with human-layer controls and organizational friction, not more cryptography.

Cyber-Physical & Destructive Attacks on OT/Facility Systems

The control plane that keeps an AI factory alive — BMS, EPMS, cooling controllers, the power-cap and firmware layers — is also the shortest path to destroying it, because a single forged setpoint or synchronized load step can do in seconds what a kinetic strike needs explosives to do.

Compliance, Certification & Governance

Compliance is not a stamp you collect at the end. It is a design constraint that decides, at scoping time, where your data may sit, which workloads you may host, who may touch the silicon, and how much of your engineering effort is permanently diverted into producing machine-readable evidence. Choose the wrong framework portfolio and you have built a campus you cannot legally sell into.

Security Operations, Detection & Incident Response

Every control in Part 11 is a hypothesis about an adversary; security operations is the function that tests those hypotheses in production. On an AI campus the test is harder: the highest-value asset is a file you cannot watch leave, the highest-consequence attack is one that turns the cooling plant into a weapon, and the detection surface spans a converged cyber-physical estate that no off-the-shelf SOC was designed to watch.