Chapter 11.8

Model & Weight Protection (At-Rest, In-Transit, In-Use)

The weights are the only asset in the building that is simultaneously worth a nation-state's full operational capacity to steal and small enough to fit on a thumb drive — so the protection problem is not 'encrypt the disk' but 'control every path a few terabytes of float can take out of a machine that exists to emit terabytes per day.'


What you'll decide here

Which Weights Security Level (WSL/SL1-5) you are actually defending to — i.e., which adversary tier (opportunistic, cybercrime, insider, top-tier nation-state) your controls stop — because the honest answer for most frontier labs in 2026 is SL2-SL3, far below the OC5 actor the weights attract.
Where the choke point lives: egress-rate caps and upload limits on a fabric whose entire purpose is high-bandwidth output, versus accepting that a production inference server cannot have its primary channel throttled and pushing the boundary inward to attestation-gated decryption.
Whether weights are ever in cleartext in HBM at all — plaintext-in-use with perimeter controls (fast, simple, breached-once-lost) versus attestation-gated key release into a GPU TEE so the weights only decrypt inside a measured enclave (defensible, but a CC tax and a new key-management plane).
How the key hierarchy and HSM/KMS root are operated at fleet scale: who can authorize a key release, how rotation and revocation propagate across tens of thousands of accelerators, and whether a single compromised KMS credential unlocks the crown jewels.
Whether the cryptography protecting decade-lived weights is crypto-agile and on a post-quantum path now, because 'harvest-now-decrypt-later' makes today's RSA/ECC-wrapped key material a future liability you cannot patch after exfiltration.

Every other asset in an AI data center is replaceable. A GPU that fails is swapped in ninety seconds; a transformer that trips is re-energized; a tenant's data is restored from backup. The weights are the exception. They are the distilled output of a training run that may have cost hundreds of millions of dollars and burned an interconnection slot you cannot get back — and unlike the hardware, they can be copied perfectly, instantly, and silently. That asymmetry is the entire chapter. The crown jewels are a few terabytes of floating-point that, once they leave, are gone with no way to claw them back and no way to know for certain they are gone.

This chapter is structured as a sequence of decisions across three states of the asset — at-rest (on disk, in checkpoints, in object storage), in-transit (across the fabric, between regions, to and from storage), and in-use (decrypted into HBM where a GPU can actually compute on it). The organizing framework is RAND's Securing AI Model Weights and its five Weights Security Levels, because it is the only published model that ties controls to a named adversary tier rather than to a compliance checklist. We treat egress as the choke point, attestation-gated key release as the in-use answer, key management and PKI as a discipline rather than a feature, and we close on the crypto-agility problem that long-lived weights force on you whether you want it or not. The data-governance and privacy regime — who is allowed to train on what — is a different problem; it lives in Chapter 10.10.

The framework: Weights Security Levels and the adversary you actually face

You cannot scope weight protection until you have named the adversary, and the industry's reference for naming adversaries is RAND's framework. It defines five Weights Security Levels (WSL, SL1-SL5) and five operational-capacity tiers of attacker (OC1 amateurs through OC5 top-priority nation-state operations), and it catalogs 38 distinct attack vectors. The benchmark is deliberately concrete: a given security level is defined by whether it can prevent a given attacker tier from stealing the weights in under roughly two months of dedicated effort. RAND's own assessment is the uncomfortable headline: most frontier labs in 2024-2026 sit around SL2, capable of stopping opportunistic actors and basic insiders, but not the OC4-OC5 nation-state operations that the strategic value of frontier weights actively invites. Eight of the 38 vectors are infeasible for OC1-OC3 but feasible for OC4-OC5, which is to say the gap between what most operators defend against and what the weights attract is structural, not incremental.

That gap is why a cross-industry SL5 Task Force formed in March 2025, targeting nation-state-resistant frontier infrastructure by 2028/2029 and publishing an SL5 standard framed as a NIST SP 800-53 overlay (43 controls across 10 families, spanning network, physical, machine, personnel, and supply-chain streams). The reason the timeline is years out is the same reason siting and interconnection are: the controls with the longest lead time — facility construction, hardware procurement, organizational capability — must be planned before the steel is cut. SL5 is a building decision rather than a software patch, which is exactly why it belongs in this guide and not in a security-tooling appendix.

The master fork: which adversary tier do your controls actually stop?

Pick the Weights Security Level you are genuinely defending to, not the one the marketing deck implies. SL1-SL2 stops amateurs, opportunistic crime, and untrained insiders with standard cloud hygiene, RBAC, and disk encryption — cheap, fast, and adequate for non-frontier or open-weight models. SL3 adds egress-rate limiting, hardware-backed key custody, two-person controls, and aggressive insider mitigation — the level a serious frontier lab can plausibly reach in 2026, and the level Anthropic has publicly committed to. SL4-SL5 targets professional and top-tier nation-state operations and demands attestation-gated in-use protection, air-gapped or confidential-by-default compute, hardware supply-chain assurance, and personnel programs that most operators have never built. The cost of mis-stating your level is not embarrassment — it is designing a facility whose irreversible substrate (network topology, HSM placement, confidential-compute capability, secure rooms) cannot reach the level the weights will eventually demand. Decide the target level before you pour the slab; the in-use and key-custody decisions below all inherit from it.

At-rest: the easy state, and why it is not enough

At-rest protection is the one part of the problem that is genuinely close to solved by mature practice, and it is therefore the part operators over-index on. Weights and checkpoints live encrypted on disk and in object storage under envelope encryption: a data-encryption key (DEK) encrypts the bytes, a key-encryption key (KEK) in an HSM or KMS wraps the DEK, and the HSM root never leaves tamper-resistant hardware. A 175B-parameter checkpoint is roughly 2.3 TB and a 1-trillion-parameter checkpoint near 13.8 TB at ~14 bytes/parameter including optimizer state, so at-rest encryption is bulk symmetric crypto (AES-256-GCM/XTS) where the performance cost is negligible against the storage bandwidth you already provisioned for checkpointing (Chapter 9.4).

The fork that matters at-rest is not whether to encrypt but where the key lives and who can release it. Disk-level transparent encryption with keys resident on the host protects against a stolen drive and nothing else — the moment an attacker has code execution on the host, the cleartext is theirs. The defensible posture binds DEK release to an attestation of the requesting node, so a checkpoint only decrypts on a machine whose firmware, kernel, and GPU state match a known-good measurement. That single design choice converts at-rest protection from 'encrypted until someone logs in' into 'encrypted until a verified machine asks' — and it is the same machinery that the in-use decision will reuse, which is why getting the key hierarchy right at-rest pays off twice. The deeper consequence: at-rest is the state where you have the most time and the fewest constraints, so it is where you should spend your strongest cryptography and your strictest access policy. It is also, bluntly, not where the weights get stolen.
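The release rule described above can be sketched in a few lines. This is a minimal illustration, not a real attestation verifier: the component names, the `GOLDEN` values, and the functions `attestation_ok` and `release_wrapped_dek` are all hypothetical, and a production verifier would check a signed, nonce-bound quote from a hardware root of trust rather than trusting reported digests.

```python
import hashlib
import hmac
from typing import Mapping, Optional


def digest(blob: bytes) -> str:
    """SHA-256 hex digest, standing in for a hardware-reported measurement."""
    return hashlib.sha256(blob).hexdigest()


# Hypothetical known-good measurements for an approved node image.
GOLDEN = {
    "firmware": digest(b"firmware-approved-build"),
    "kernel": digest(b"kernel-approved-build"),
    "gpu_state": digest(b"gpu-cc-mode-on"),
}


def attestation_ok(reported: Mapping[str, str],
                   golden: Mapping[str, str] = GOLDEN) -> bool:
    """True only if every golden component is reported and matches exactly."""
    results = [
        hmac.compare_digest(reported.get(name, ""), expected)
        for name, expected in golden.items()
    ]
    return all(results)


def release_wrapped_dek(reported: Mapping[str, str],
                        wrapped_dek: bytes) -> Optional[bytes]:
    """Hand the wrapped DEK to a node whose measurements match; else withhold."""
    return wrapped_dek if attestation_ok(reported) else None
```

A node that reports an unmeasured kernel, or omits a component entirely, gets `None` back and the checkpoint stays ciphertext. The design point is that the decision input is machine state, not a human credential.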

Egress as the choke point: the inference-server paradox

Here is the decision the whole field is stuck on. The cleanest way to stop weight exfiltration is to cap the rate at which bytes can leave the secure environment (an egress-rate limit or upload limit), because the weights are large and the attacker has to move all of them. The arithmetic is stark: if a model is ~1,000 GB (~1 TB) and you cap egress at 800 GB/day, a determined exfiltrator still drains it in about 1.25 days across any channel. To make exfiltration take long enough to detect and interdict, the cap has to be aggressive, and that is exactly where it collides with the workload.
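The cap arithmetic is simple enough to keep as a pair of helper functions. The sketch below uses gigabytes (GB) throughout; the 1,000 GB payload and the 60-day window are illustrative inputs (the window loosely echoes the two-month benchmark in RAND's level definitions), and the function names are hypothetical.

```python
def days_to_exfiltrate(payload_gb: float, cap_gb_per_day: float) -> float:
    """Days needed to move payload_gb if the attacker saturates the cap."""
    if cap_gb_per_day <= 0:
        raise ValueError("cap must be positive")
    return payload_gb / cap_gb_per_day


def cap_for_window(payload_gb: float, min_days: float) -> float:
    """Largest daily cap that still forces exfiltration to take min_days."""
    if min_days <= 0:
        raise ValueError("window must be positive")
    return payload_gb / min_days


if __name__ == "__main__":
    # 1,000 GB under an 800 GB/day cap
    print(days_to_exfiltrate(1000, 800))
    # Cap needed to stretch the same payload over 60 days
    print(round(cap_for_window(1000, 60), 1))
```

Under these inputs the 800 GB/day cap buys about a day and a quarter, while a 60-day window would need a cap in the neighborhood of 17 GB/day. That is a budget a training cluster might live with and an inference fleet could not.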

The collision is the inference-server paradox: a production inference fleet's entire job is to emit enormous volumes of data — on the order of a terabyte of tokens per day per server — to users on the outside. You cannot rate-cap the primary output channel without breaking the service that earns the revenue. So egress limiting works beautifully for a training cluster that has no reason to send weight-sized payloads outward, and works poorly for an inference cluster whose normal behavior is high-volume output. The 2026 frontier of this problem (active research, not settled practice) is inference-output verification — proving that what leaves is plausibly model output and not encoded model weights — and the looming counter-threat is aggressive weight compression: preliminary work suggests weights may be compressible toward ~1 bit/parameter in a theft context, shrinking the payload an attacker must exfiltrate and undercutting any fixed-rate cap. Egress limiting is necessary and it is not sufficient, and any honest design says so.
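To see why compression undercuts a fixed cap, it helps to put the payload size next to the legitimate output volume. The sketch below assumes a hypothetical 500-billion-parameter model and treats the ~1 TB/day inference figure as 1,000 GB/day. Neither number is a measurement, and the bit widths are just sample points on the way to the ~1 bit/parameter floor mentioned above.

```python
def payload_gb(params: float, bits_per_param: float) -> float:
    """Size an attacker must move, in decimal gigabytes."""
    return params * bits_per_param / 8 / 1e9


def hours_under_cap(size_gb: float, cap_gb_per_day: float) -> float:
    """Hours to move size_gb when the daily cap is fully used."""
    return size_gb / cap_gb_per_day * 24


PARAMS = 500e9               # hypothetical model size (assumption)
CAP_GB_PER_DAY = 800         # egress cap from the example in the text
INFERENCE_GB_PER_DAY = 1000  # ~1 TB/day of legitimate output per server

if __name__ == "__main__":
    for bits in (16, 8, 1):
        size = payload_gb(PARAMS, bits)
        hours = hours_under_cap(size, CAP_GB_PER_DAY)
        share = size / INFERENCE_GB_PER_DAY
        print(f"{bits:>2} bits/param: {size:7.1f} GB, {hours:5.1f} h under cap, "
              f"{share:.1%} of one server-day of output")
```

At 16 bits per parameter the hypothetical model is a full server-day of output. At 1 bit per parameter it is 62.5 GB, a small fraction of what one server legitimately emits in a day, which is the sense in which a fixed-rate cap stops being a meaningful barrier.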

The three states of weights → control posture and the dominant exfiltration path

State	Primary control	Dominant exfil path it closes	Residual path it leaves open	Defends to (WSL)
At-rest (disk / object store / checkpoints)	Envelope encryption (AES-256), HSM/KMS-held KEK, attestation-gated DEK release	Stolen drive; offline checkpoint copy; backup theft	Host with code-exec reads cleartext after key release	SL2-SL3
In-transit (fabric / storage / cross-region)	mTLS / IPsec / link encryption; signed manifests; egress-rate caps; upload limits	Passive wiretap; bulk copy to external endpoint at high rate	Low-and-slow exfil; compressed weights; inference channel abuse	SL2-SL3
In-use (decrypted in HBM)	Plaintext-in-use + perimeter (fast, simple)	Nothing structural — relies entirely on the perimeter holding	Any host/firmware/insider compromise yields cleartext HBM	SL1-SL2
In-use (decrypted in HBM)	Attestation-gated key release into a GPU TEE; encrypted HBM; TEE-I/O over NVLink	Host OS, hypervisor, BMC, and operator-with-root reading cleartext weights	Side channels; metadata leakage; supply-chain/firmware below the RoT	SL3-SL4

Decision-and-consequence view. 'Defends to' is the realistic Weights Security Level the listed controls reach against RAND's adversary tiers (OC1-OC5); higher levels require the in-use column to be genuinely attestation-gated, not perimeter-only.

In-transit: cheap to encrypt, hard to make atomic

In-transit protection on the wire is the least controversial of the three states: mutual TLS or IPsec for north-south and control-plane traffic, link-layer or fabric encryption for the high-bandwidth east-west path, and signed, hash-anchored manifests so a tampered checkpoint is detected before it is loaded. The decision here is where the encryption boundary sits relative to the fabric you are trying to keep fast. Encrypting the back-end collective fabric — the non-blocking InfiniBand or Spectrum-X plane that exists for all-reduce bandwidth — adds latency and, more importantly, may not be terminable at line rate without offload, so the common posture is to encrypt the storage and cross-region paths hard while relying on physical and segmentation controls (Chapter 11.7) for the in-rack scale-up fabric. The GPU-TEE answer (below) collapses part of this problem by extending the trust boundary across NVLink with TEE-I/O, so weights stay encrypted even on the inter-GPU link.
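For the control-plane and storage paths, the mutual part of mutual TLS comes down to the server refusing any client that does not present a certificate from the expected CA. A minimal sketch using Python's standard `ssl` module follows; the function name and the file-path parameters are hypothetical, and certificate issuance, rotation, and revocation are out of scope here.

```python
import ssl
from typing import Optional


def make_mtls_server_context(cert_file: Optional[str] = None,
                             key_file: Optional[str] = None,
                             client_ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Server-side TLS context that requires a client certificate."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    # Reject any peer that does not present a verifiable certificate.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if cert_file and key_file:
        # The server's own identity (paths supplied by the caller).
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if client_ca_file:
        # Trust only the private CA that issues client certificates.
        ctx.load_verify_locations(cafile=client_ca_file)
    return ctx
```

Pinning `client_ca_file` to a private CA, rather than the system trust store, is what keeps an arbitrary publicly trusted certificate from counting as a valid peer.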

The subtler in-transit problem is provenance and integrity, not confidentiality. A weight blob that arrives intact but wrong — a poisoned checkpoint, a backdoored fine-tune, a substituted base model — is an integrity failure that encryption does nothing to catch. The control is a signed chain: every checkpoint and released weight artifact carries a cryptographic signature and a measurement that ties it back to the training run and the build pipeline, validated before load. This is the same measured-boot and golden-measurement machinery used for firmware integrity (Chapter 11.4), applied to the model artifact. Skip it and you have protected the weights from being read but not from being silently replaced — and a replaced model is, for many threat models, worse than a stolen one.
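The validate-before-load step can be sketched as a hash manifest plus a signature check. One loud assumption: the sketch uses an HMAC as a stand-in for the signature so that it runs on the standard library alone. A real pipeline would use an asymmetric signature tied to the build system's signing key, and the manifest fields shown (`run_id`, `shards`) are hypothetical.

```python
import hashlib
import hmac
import json
from typing import Dict


def build_manifest(shards: Dict[str, bytes], run_id: str) -> dict:
    """Record a SHA-256 digest per shard, tied to a training-run identifier."""
    return {
        "run_id": run_id,
        "shards": {name: hashlib.sha256(data).hexdigest()
                   for name, data in sorted(shards.items())},
    }


def _canonical(manifest: dict) -> bytes:
    """Stable byte encoding so signer and verifier hash the same bytes."""
    return json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode()


def sign_manifest(manifest: dict, key: bytes) -> str:
    """HMAC-SHA256 over the manifest (stand-in for an asymmetric signature)."""
    return hmac.new(key, _canonical(manifest), hashlib.sha256).hexdigest()


def verify_before_load(shards: Dict[str, bytes], manifest: dict,
                       signature: str, key: bytes) -> bool:
    """Accept only if the manifest is authentic and every shard matches it."""
    if not hmac.compare_digest(sign_manifest(manifest, key), signature):
        return False
    if set(shards) != set(manifest["shards"]):
        return False
    return all(
        hmac.compare_digest(hashlib.sha256(data).hexdigest(),
                            manifest["shards"][name])
        for name, data in shards.items()
    )
```

The check fails closed in three cases: a modified shard, a missing or extra shard, and a manifest that was swapped for one the signing key never approved. The last case is the substituted-model scenario that encryption alone does not catch.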

In-use: the only state where the weights must be cleartext

This is the hardest state and the one where the real decision lives, because to compute on weights a GPU must, at some instant, have them in usable form in HBM. The classic posture is plaintext-in-use behind a perimeter: decrypt the weights into HBM, trust that the host OS, hypervisor, BMC, network segmentation, and personnel controls keep everyone else out. It is fast, simple, and adds zero compute tax — and it fails completely the moment any one of those layers is breached, because the cleartext is sitting in memory for anyone with sufficient privilege to read. Against an OC4-OC5 adversary, or an insider with root, that perimeter is not a boundary; it is a speed bump.

The alternative is attestation-gated key release into a GPU TEE: the weights stay encrypted until a confidential-computing enclave on the GPU proves, via a hardware-rooted attestation, that its firmware and configuration match a known-good measurement — only then does the KMS release the decryption key, and the weights decrypt inside encrypted HBM where even the host operator with root cannot read them. NVIDIA's Blackwell and Hopper confidential computing is the reference implementation: ~90% of GPU memory placed inside the encrypted Compute Protected Region, the BAR0 decoupler hiding ~99.78% of memory-mapped registers in CC mode, a 5-certificate device-identity chain validating 64 structured measurement records against NVIDIA's remote attestation service and golden RIMs, and 44+ per-channel session keys derived from one SPDM-negotiated master secret. The full architecture and its residual side-channels are the subject of Chapter 11.5; what matters here is the decision it forces.

Attestation-gated key release is what makes 'confidential' mean something

Confidential computing without attestation-gated key release protects nothing. The point is not that the HBM is encrypted — it is that the decryption key is withheld until the requesting machine cryptographically proves it is the machine you think it is, running the firmware you approved, in the configuration you measured. The weight key becomes a function of the machine's verified state. Flash a malicious BMC, downgrade the firmware, or boot an unmeasured kernel, and the attestation fails and the key never arrives — the weights stay ciphertext. This is the mechanism that lets you run frontier weights on infrastructure you do not fully trust (a neocloud, a colo, a multi-tenant host) and still bound who can see cleartext to 'a verified enclave' rather than 'anyone with a root shell.' The cost is real — a key-management plane that can issue and revoke release policies at fleet scale, plus the CC performance tax — but it is the only in-use posture that survives an insider with privileged access. See Chapter 11.5 for the enclave internals and Chapter 11.9 for why the insider is the vector this is really defending against.

5 levels / 5 tiers / 38 vectors

RAND Weights Security Levels (SL1-5), attacker operational-capacity tiers (OC1-5), and catalogued attack vectors

2024 · RAND RRA2849-1 (Securing AI Model Weights)

~SL2

where RAND assesses most frontier labs currently sit — stops opportunistic actors and basic insiders, not OC4-OC5 nation-states

2024-2026 · RAND RRA2849-1

2028/2029

SL5 Task Force target for nation-state-resistant frontier AI infrastructure; SL5 standard = 43 controls / 10 families (NIST SP 800-53 overlay)

2025-2026 · SL5 Task Force / Institute for Security & Technology

~1.25 days

to exfiltrate a ~1,000 GB model even under an 800 GB/day egress cap, which is why fixed-rate limits are necessary but not sufficient

2025 · LessWrong/Alignment Forum egress-limit analyses

~1 TB/day

token output of a single production inference server — the channel that cannot be rate-capped without breaking the service

2025 · Inference-verification exfiltration research

~1 bit/param

preliminary feasible weight-compression floor in a theft context — shrinks the payload an attacker must move, undercutting fixed egress caps

2026 · arXiv 'Aggressive Compression Enables LLM Weight Theft'

~90% / ~99.78%

GPU HBM inside the encrypted Compute Protected Region; memory-mapped registers hidden by the BAR0 decoupler in CC mode

2025 · arXiv 2507.02770; NVIDIA WP-12554

2.3-13.8 TB

checkpoint size for a 175B to 1T-param model at ~14 bytes/param incl. optimizer state — the at-rest bulk the crypto must wrap

2025 · NVIDIA storage guidance; checkpoint-sizing rules of thumb

Key management and PKI as a discipline, not a feature

Every control above — at-rest envelope encryption, in-transit mTLS, attestation-gated in-use release — collapses to the same root: a key hierarchy and the discipline that operates it. Get the keys wrong and the strongest cryptography in the world is a single stolen credential away from irrelevant. The hierarchy is conventional in shape and unforgiving in operation: a hardware root of trust (an HSM cluster, FIPS 140-3 Level 3 or equivalent) holds the top-level KEKs that never leave tamper-resistant silicon; those wrap per-domain and per-tenant keys in a KMS; those in turn wrap the DEKs that encrypt actual weight blobs. The decisions that distinguish a real program from a checkbox are about who can release, how fast you can revoke, and what a single compromise unlocks.
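The question of what a single compromise unlocks is a graph question, and it is worth making it computable. The sketch below models the hierarchy as a mapping from each key to the keys it wraps, then walks it to list the DEKs reachable from one compromised key. The key names and the `dek-` prefix convention are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical hierarchy: each key maps to the keys it wraps.
WRAPS: Dict[str, List[str]] = {
    "hsm-root-kek": ["kek-training", "kek-inference"],
    "kek-training": ["dek-ckpt-0100", "dek-ckpt-0200"],
    "kek-inference": ["dek-release-v1"],
}


def unlockable(compromised: str,
               wraps: Dict[str, List[str]] = WRAPS) -> Set[str]:
    """Every key that can be unwrapped, transitively, from one compromised key."""
    seen: Set[str] = set()
    stack = [compromised]
    while stack:
        key = stack.pop()
        for child in wraps.get(key, []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def exposed_deks(compromised: str,
                 wraps: Dict[str, List[str]] = WRAPS) -> Set[str]:
    """Data-encryption keys exposed by the compromise, including the key itself."""
    reachable = unlockable(compromised, wraps) | {compromised}
    return {k for k in reachable if k.startswith("dek-")}


if __name__ == "__main__":
    for key in ("kek-inference", "kek-training", "hsm-root-kek"):
        print(key, "->", sorted(exposed_deks(key)))
```

Running the same traversal over real IAM roles and KMS grants, rather than over keys alone, is one way to find a role that reaches every DEK before an attacker does.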

Authorization to release is where two-person control earns its keep: a weight-key release for the crown-jewel model should require an attestation and a policy that no single human or service credential can satisfy alone — the separation-of-duties and least-privilege discipline that Chapter 11.9 treats as the dominant unaddressed vector. Rotation and revocation at fleet scale is the operational nightmare nobody budgets for: re-wrapping DEKs across tens of thousands of accelerators, propagating a revocation before an attacker can use a leaked key, and doing it without stalling training goodput. A KMS that can issue a key in milliseconds but takes hours to revoke one across the fleet is a KMS that fails exactly when it matters. And blast radius: if one compromised KMS credential or one over-scoped IAM role can release every weight key, you have built a hierarchy with no interior walls — the same anti-pattern Chapter 11.7 warns about for the network, now applied to the keys. The control-plane secrets that operate this machinery (KMS credentials, attestation-service trust anchors, signing keys) are themselves crown-jewel-adjacent and belong under the same custody discipline as the weights they protect (Chapter 10.6, Chapter 11.7).
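A release policy that no single human or service credential can satisfy alone reduces to a small predicate: a passing attestation, plus approvals from two different principals holding two different roles. The sketch below is one way to write it; the role names and the `Approval` type are hypothetical, and a real system would also verify that each approval is authenticated, fresh, and scoped to the specific key.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Approval:
    principal: str  # who approved
    role: str       # the role they approved under


def may_release(attested: bool,
                approvals: Iterable[Approval],
                role_a: str = "key-custodian",
                role_b: str = "model-owner") -> bool:
    """Release needs attestation plus two distinct principals, one per role."""
    if not attested:
        return False
    approvals = list(approvals)
    a_principals = {x.principal for x in approvals if x.role == role_a}
    b_principals = {x.principal for x in approvals if x.role == role_b}
    # One person holding both roles must not be able to satisfy the policy.
    return any(a != b for a in a_principals for b in b_principals)
```

The final line is the separation-of-duties check. Counting approvals is not enough, because one over-privileged account can hold both roles.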

Deep dive: insider exfiltration paths, and why they bypass most of the stack

RAND's framework and the SL5 effort both land on the same uncomfortable conclusion: the insider, not the remote nation-state, is the path most operators leave open. The reason is that insiders sit inside the perimeter the at-rest and in-transit controls assume is the boundary. A privileged operator with root on a host running plaintext-in-use weights can read HBM directly. A storage admin with KMS access can request a DEK release and pull a cleartext checkpoint. A platform engineer can stage weights to a low-and-slow side channel that stays under any egress cap. None of these require breaking encryption; they require legitimate access used illegitimately, which is precisely what cryptography does not address.

This is why attestation-gated key release matters as an insider control specifically: it removes 'operator with root' from the set of principals who can see cleartext, because the key is bound to a verified machine state rather than to a human credential. But it is only one layer. The complete posture pairs it with two-person controls on key release, separation of duties so no single role spans both the key and the data, behavioral and access monitoring on the privileged paths, hardware-enforced data-loss-prevention on egress, and the personnel-security and offboarding discipline that Chapter 11.9 develops in full. The choice is stark: spend your marginal security dollar on the remote-attacker perimeter and you harden the path the data shows is least likely to be used; spend it on insider mitigation and attestation-gated in-use protection and you close the path that actually matters. The egress-as-choke-point control and the in-use enclave are the two that an insider cannot trivially walk around — everything else, a sufficiently privileged insider can.

Crypto-agility and the post-quantum clock on long-lived weights

The last decision is the one easiest to defer and most expensive to get wrong, because its consequence lands years after the choice. Frontier weights are long-lived secrets — a model trained in 2026 may retain strategic and economic value for a decade. The asymmetric cryptography that wraps key material today (RSA, ECC) is the part of the stack a future cryptanalytically-relevant quantum computer threatens, and the threat does not wait for that computer to exist: harvest-now-decrypt-later means an adversary can exfiltrate today's RSA/ECC-protected key material and weight blobs now, store them, and decrypt them when the capability arrives. For a one-week-lived session token this is a non-issue. For a decade-lived weight, the encryption you choose in 2026 has to survive an adversary's 2035 capabilities.

The decision is therefore crypto-agility now: design the key hierarchy and the in-transit/at-rest envelope so the algorithms can be swapped without re-architecting, adopt NIST-standardized post-quantum KEMs and signatures (ML-KEM/ML-DSA) for the asymmetric layers protecting long-lived weight keys, and run hybrid classical-plus-PQC during the transition. The symmetric bulk encryption (AES-256) is already quantum-resistant enough at full key length, so the migration is concentrated in the key-wrapping and signing layers — which is exactly where attestation and PKI live. The consequence of deferring: you cannot retroactively protect weights that have already been harvested under classical crypto. The forward-pointer for the full subsystem roadmap — including where PQC lands in the 2026-2030 transition — is Chapter 16.2.
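One common way to run hybrid classical-plus-PQC is to derive the wrapping key from both shared secrets, so an attacker has to break both exchanges to recover it. The sketch below shows only the combiner, built on HKDF (RFC 5869) from the standard library. The two shared secrets are random placeholders standing in for the outputs of a classical key exchange and an ML-KEM decapsulation (neither is implemented here), and the suite label and function names are hypothetical.

```python
import hashlib
import hmac
import os


def hkdf_sha256(ikm: bytes, info: bytes, salt: bytes = b"", length: int = 32) -> bytes:
    """HKDF with SHA-256 (RFC 5869): extract, then expand to `length` bytes."""
    if not salt:
        salt = bytes(32)  # RFC 5869 default: a string of HashLen zero bytes
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def hybrid_wrapping_key(classical_ss: bytes, pq_ss: bytes, suite_id: str) -> bytes:
    """Derive one wrapping key that depends on both shared secrets."""
    # Length-prefix each secret so the concatenation is unambiguous.
    ikm = (len(classical_ss).to_bytes(2, "big") + classical_ss
           + len(pq_ss).to_bytes(2, "big") + pq_ss)
    # Binding the suite label into the derivation keeps keys from different
    # algorithm suites distinct, which is what makes later swaps tractable.
    return hkdf_sha256(ikm, info=suite_id.encode())


if __name__ == "__main__":
    classical_ss = os.urandom(32)  # placeholder for an ECDH shared secret
    pq_ss = os.urandom(32)         # placeholder for an ML-KEM shared secret
    key = hybrid_wrapping_key(classical_ss, pq_ss, "hybrid-ecdh+ml-kem/v1")
    print(len(key))
```

Storing the suite label alongside each wrapped key is the agility half of the design: a later migration re-wraps under a new label instead of re-architecting the hierarchy.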

Reaching higher Weights Security Levels: what each step actually costs

Climbing the WSL ladder is not a matter of buying a product; each level demands controls with different lead times and irreversibility, and the most consequential ones are facility decisions that must be made before construction. The honest framing is that SL1-SL2 is operational hygiene you can add late, SL3 is a serious program you can retrofit with effort, and SL4-SL5 is a design basis you largely have to build in from the start.

SL1 → SL2 (stop amateurs and basic insiders): at-rest encryption, RBAC and least privilege, MFA, audit logging, standard cloud hygiene. Cheap, late-bindable, reversible. The floor for any non-open-weight model.
SL2 → SL3 (stop cybercrime and trained insiders): egress-rate limiting and upload caps, HSM-backed key custody, attestation-gated DEK release, two-person controls on weight-key release, behavioral monitoring, hardened offboarding. This is the level a committed frontier lab can reach in 2026 (Anthropic's public SL3 commitment is the reference point). Mostly retrofittable, but the egress and key-custody architecture is easier built-in.
SL3 → SL4 (stop professional/state-adjacent operations): attestation-gated in-use protection as the default (GPU TEEs, confidential-by-default compute), hardware supply-chain assurance and firmware integrity below the root of trust, network microsegmentation with DPU enforcement, and personnel-security programs. This starts to dictate the hardware you buy and the topology you build — increasingly irreversible.
SL4 → SL5 (stop top-tier nation-state operations): the SL5 standard's 43 controls — air-gapped or strongly-isolated compute for the highest-value weights, secure facilities, supply-chain provenance end-to-end, and organizational capability that takes years to build. This is the level the SL5 Task Force targets for 2028/2029 because, like interconnection, it is gated by lead time, not by will.

The recurring trap is the same one that governs density and power: designing the irreversible substrate to today's level and being unable to reach tomorrow's. If there is any chance a facility will host weights that attract OC4-OC5 attention, the confidential-compute capability, the HSM placement, the segmentation topology, and the secure-room provisions are headroom you reserve at scoping time — because, unlike the egress policy, you cannot retrofit them after the weights arrive.
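The ladder above is cumulative, which makes it easy to check a claimed level against the controls actually deployed. The sketch below encodes the steps as shorthand labels taken from the list in this section. The labels are illustrative, not the RAND or SL5 control catalogs, and treating SL1 as the floor is an assumption of the sketch.

```python
from typing import List, Set, Tuple

# Shorthand for the controls named at each step of the ladder (illustrative).
LADDER: List[Tuple[str, Set[str]]] = [
    ("SL2", {"at-rest encryption", "rbac", "mfa", "audit logging"}),
    ("SL3", {"egress-rate limiting", "hsm key custody",
             "attestation-gated dek release", "two-person key release",
             "behavioral monitoring"}),
    ("SL4", {"gpu tee in-use protection", "supply-chain assurance",
             "dpu microsegmentation", "personnel security"}),
    ("SL5", {"isolated compute", "secure facility", "end-to-end provenance"}),
]


def highest_level_reached(deployed: Set[str]) -> str:
    """Highest level whose controls, and all lower levels' controls, are present."""
    level = "SL1"  # assumed floor when nothing above it is satisfied
    for name, required in LADDER:
        if not required <= deployed:
            break
        level = name
    return level


def missing_for(target: str, deployed: Set[str]) -> Set[str]:
    """Controls still needed to reach `target`, counting every lower step."""
    needed: Set[str] = set()
    for name, required in LADDER:
        needed |= required
        if name == target:
            return needed - deployed
    raise ValueError(f"unknown level: {target}")
```

A facility with the SL2 controls plus a GPU TEE, but without egress limiting or HSM custody, still evaluates to SL2. A higher-level control does not raise the level while a lower step is incomplete.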

Plaintext-in-use plus a perimeter is not a frontier-weight control

The most common over-confidence in 2026 is treating a hardened perimeter — strong segmentation, strict IAM, encrypted disks and links — as if it protected the weights in use. It does not. The instant weights are decrypted into HBM for compute, a perimeter-only posture means the cleartext is readable by anyone who breaches any one perimeter layer or who already sits inside it with privilege. For a non-frontier or open-weight model this is a defensible trade. For weights that attract a nation-state, plaintext-in-use is the single largest residual exposure in the building, and no amount of network hardening closes it — only attestation-gated decryption into a TEE does. If your threat model includes OC4-OC5 and your weights are plaintext in HBM behind a perimeter, you have an SL2 building wearing an SL4 sticker. → Chapter 11.5 for the enclave; Chapter 11.9 for the insider it really defends against.

Weight protection is the apex of the security stack and it inherits from nearly all of it. The GPU TEE that makes attestation-gated in-use protection real — its internals, its performance tax, and its residual side-channels — is Chapter 11.5. The hardware root of trust, measured boot, and firmware integrity that the attestation chain stands on are Chapter 11.4. The network segmentation and zero-trust posture that bound the blast radius around a host holding cleartext weights are Chapter 11.7; the control-plane secrets and telemetry that operate the key hierarchy touch Chapter 10.6. The insider — the vector this chapter's hardest controls are really aimed at — is the whole of Chapter 11.9, and the governance and certification framing of all of it is Chapter 11.11. The checkpoint sizes and cadence that set the at-rest crypto bulk are Chapter 9.4; the data-governance and privacy regime that this chapter is explicitly not about is Chapter 10.10; and the crypto-agility/PQC roadmap for long-lived weights is Chapter 16.2.