Numbers Provenance Register
Every date-stamped figure in the guide — 1,420 entries, sourced and flagged where contested.
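Several derived figures in the register follow from simple arithmetic: the downtime implied by an availability percentage, the interval between interruptions over an observation window, and fiber latency from distance alone. A minimal sketch for re-deriving them is below. The function names are hypothetical, and the 2/3 c propagation speed is the approximation cited in the fiber-latency row, not a measured value.

```python
# Hypothetical helpers for sanity-checking derived figures in the register.
HOURS_PER_YEAR = 365 * 24  # 8,760 h, ignoring leap years


def downtime_minutes_per_year(availability_pct: float) -> float:
    """Annual downtime (minutes) implied by an availability percentage."""
    return (1 - availability_pct / 100) * HOURS_PER_YEAR * 60


def hours_between_interruptions(interruptions: int, days: float) -> float:
    """Mean interval (hours) between interruptions over an observation window."""
    return days * 24 / interruptions


def fiber_one_way_ms(distance_km: float, fraction_of_c: float = 2 / 3) -> float:
    """One-way propagation delay (ms), assuming light in glass travels at ~2/3 c."""
    c_km_per_ms = 299_792.458 / 1000  # speed of light in vacuum, km per ms
    return distance_km / (c_km_per_ms * fraction_of_c)


if __name__ == "__main__":
    # Tier III / Tier IV availability rows
    print(f"99.982% -> {downtime_minutes_per_year(99.982) / 60:.2f} h/yr")
    print(f"99.995% -> {downtime_minutes_per_year(99.995):.1f} min/yr")
    # Llama 3 row: 419 unplanned interruptions in 54 days
    print(f"419 in 54 days -> one every {hours_between_interruptions(419, 54):.2f} h")
    # Fiber-latency row: distance-only delay per 1,000 km
    print(f"1,000 km -> {fiber_one_way_ms(1000):.2f} ms one-way")
```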
| Metric | Value | As of | Where | Source |
|---|---|---|---|---|
| rack power across the inflection: legacy → GB200 NVL72 (~132 kW) → Rubin Ultra Kyber (~600 kW, 2027 roadmap) | ~10–15 kW → 120–600 kW | 2026 | 0.1 | SemiAnalysis / NVIDIA roadmap |
| practical air-cooling ceiling per rack — the discontinuity that forces liquid and rewrites the building | ~41 kW | 2025 | 0.1 | ASHRAE TC 9.9; SemiAnalysis Datacenter Anatomy |
| inference share of AI compute in 2026 (½ in 2025, ⅓ in 2023); 80–90% of draw at large operators | ~2/3 | 2026 | 0.1 | Deloitte TMT Predictions 2026; McKinsey |
| US large-load grid interconnection lead time end-to-end; up to ~10 yr in the worst queues — the binding constraint | ~3–7+ yr | 2025 | 0.1 | ERCOT / PJM filings synthesis |
| HV/substation power transformer lead time (standard); up to ~60 months in constrained markets — often the schedule's long pole | ~128 wk | 2025 | 0.1 | Wood Mackenzie / pv magazine |
| global data center capex in 2026 (~21% CAGR through 2029; GPUs ~1/3 of capex) | approaching ~$1T | 2026 | 0.1 | Dell'Oro Group |
| cumulative global data center capex by 2030 (~$5.2T AI-capable) — the scale that makes mis-coordination catastrophic | ~$6.7T | 2025 | 0.1 | McKinsey, 'The cost of compute' |
| end-to-end electrical-chain efficiency, 800VDC/DC chain vs legacy AC (utility-to-VRM) — a system gain only co-design captures | >92% vs ~61–87.5% | 2025 | 0.1 | SemiAnalysis, Datacenter Anatomy Pt 1 |
| inference share of AI compute in 2026 (½ in 2025, ⅓ in 2023) — a fast-moving figure read as direction, not a fixed level | ~2/3 | 2026 | 0.2 | Deloitte TMT Predictions 2026 |
| global data center capex 2026, approaching — volatile market figure; analyst estimates differ by capex-scope definition | ~$1T | 2026 | 0.2 | Dell'Oro Group |
| per GB200 NVL72 rack (shipping, ~115 kW liquid + ~17 kW air) — a semi-durable hardware spec you can design against | ~132 kW | 2025 | 0.2 | NVIDIA OCP / Introl |
| per Rubin Ultra Kyber-class rack — marked roadmap/announced, not shipping; do not budget as a level | ~600 kW | 2027 (announced) | 0.2 | SemiAnalysis / NVIDIA roadmap |
| practical air-cooling ceiling per rack — a durable physics number, safe to treat as a hard constraint | ~41 kW | 2025 | 0.2 | ASHRAE TC 9.9 / SemiAnalysis |
| large-load grid interconnection lead time — volatile and region-dependent; up to ~10 yr in worst queues | 3–7+ yr | 2025 | 0.2 | ERCOT / PJM filings synthesis |
| GPU economic vs book life — flagged CONTESTED; run irreversible decisions across the range, not a point estimate | 2–3 yr vs 5–6 yr | 2026 | 0.2 | CNBC / SemiAnalysis synthesis |
| best-in-class vs industry-average training goodput — a GOODPUT-thread target, vendor-marketed upper bound | ~96% vs ~90% | 2025 | 0.2 | SemiAnalysis ClusterMAX / CoreWeave |
| industry-weighted average PUE, flat for a 6th year; best-in-class liquid 1.05-1.15 | ~1.54 | 2025 | 0.3 | Uptime Institute Global Data Center Survey 2025 |
| WUE range: industry avg ~1.8-1.9; best-in-class 0.3-0.7; closed-loop ~0 | ~0-1.9 L/kWh | 2025 | 0.3 | Vertiv / NREL synthesis; Microsoft FY2025 fleet ~0.30 |
| goodput (effective training time): industry average vs best-in-class | ~90% / ~96% | 2025 | 0.3 | SemiAnalysis ClusterMAX 2.0 / CoreWeave |
| scale-up (NVLink) domain size: HGX node, NVL72 rack, announced Rubin Ultra Kyber | 8 → 72 → 576 | 2026 | 0.3 | NVIDIA NVLink / Rubin platform roadmap |
| scale-up (NVLink5/GPU) vs scale-out (per-NIC) bandwidth; roughly 18x apart per direction (900 GB/s vs 50 GB/s) | ~1.8 TB/s (bidirectional) vs ~400 Gb/s | 2025 | 0.3 | NVIDIA / SemiAnalysis |
| self-operated TCO at 2048-GPU scale, 90% util; ~$1.03-3.50 rented (contested — single-source) | ~$0.74/GPU-hr | 2025-2026 | 0.3 | SemiAnalysis H100 cost / rental analyses |
| inference cost per million tokens: self-hosted 70B worked example vs market average | ~$1.90-2.50 | 2025 | 0.3 | Introl / SemiAnalysis synthesis |
| Uptime Tier III vs Tier IV availability (~1.6 hr vs ~26 min downtime/yr) | 99.982% / 99.995% | 2025 | 0.3 | Uptime Institute (figures Uptime no longer formally endorses) |
| Uptime: concurrent maintainability vs fault tolerance; legacy ~99.982% (~1.6 h/yr) vs ~99.995% (~26 min/yr), now Uptime-disavowed | Tier III / IV | 2025 | 0.4 | Uptime Institute Tier Standard |
| TIA-942-C resilience scale; full-facility telecom + M&E standard, May 2024 (C) revision | Rated 1–4 | 2024 | 0.4 | ANSI/TIA-942-C |
| EN 50600 / ISO/IEC 22237 Availability Classes (+ Protection Classes); basis of the EU DC sustainability scheme | Class 1–4 | 2024 | 0.4 | CEN / ISO/IEC JTC 1 |
| ASHRAE TC 9.9 air classes and liquid W-classes (5th ed. + 2024 liquid-cooling resiliency addendum) | A1–A4 / W17–W45 | 2024 | 0.4 | ASHRAE TC 9.9 Thermal Guidelines |
| OCP Diablo 400 (Mt. Diablo) sidecar-power spec; ±400/800 VDC, ~100 kW to ~1 MW racks | v0.5.2 | May 2025 | 0.4 | OCP (Google/Meta/Microsoft) |
| FedRAMP 20x Key Security Indicators replacing 325+ NIST 800-53 controls; Phase 3 opens to all Q3 2026 | 56–61 KSIs | 2026 | 0.4 | FedRAMP PMO (RFC-0006) |
| ISO/IEC 42001 (first AI management-system standard) from publication to operationalized certification bodies | 2023 → 2026 | 2026 | 0.4 | ISO/IEC; ANAB/BSI accreditation |
| industry-weighted PUE (flat YoY) — the ISO/IEC 30134-2 KPI that lands in leases and disclosures | ~1.54 | 2025 | 0.4 | Uptime Institute Global DC Survey 2025 |
| published Tier III (~1.6 hr/yr down) vs Tier IV (~26 min/yr) availability — Uptime no longer endorses the specific % | 99.982% / 99.995% | 2025 | 0.5 | Uptime Institute Tier Standard |
| Tier IV capital premium over Tier III for the fault-tolerance step; total build often 2-3x in practice | 20-40% | 2026 | 0.5 | Uptime Institute; INGENIOUS.BUILD; market data |
| of impactful data-center outages root-caused to power (most often UPS); IT/networking ~23% | 45% | 2025 | 0.5 | Uptime Institute Annual Outage Analysis |
| of recent major outages cost over $100k / over $1M respectively | ~57% / ~20% | 2025 | 0.5 | Uptime Institute Global Survey |
| of human-error outages caused by staff not following procedures (up 10 pts YoY) — process, not topology | 58% | 2025 | 0.5 | Uptime Institute Annual Outage Analysis |
| best-in-class H100 cluster failure rate; one failure restarts a synchronous job from checkpoint | ~1 failure / 512 GPUs / week | 2025 | 0.5 | SemiAnalysis (100k H100 clusters) |
| training goodput: industry average vs best-in-class; reliability overhead 6-21% of TCO | ~90% / ~96% | 2025 | 0.5 | SemiAnalysis ClusterMAX / CoreWeave |
| data-center load tripped on a single 230 kV fault, triggering a rare NERC Level 3 alert — a grid-scale blast radius | ~1,500 MW | 2026 | 0.5 | NERC / Utility Dive |
| per GB200 NVL72 rack (≈132 kW typical: ~115 kW liquid + ~17 kW air) | 120–140 kW | 2025 | 1.1 | NVIDIA GB200 NVL72 / HPE & Supermicro datasheets |
| per Rubin Ultra Kyber NVL576 rack on 800 VDC | ~600 kW | H2 2027 (announced) | 1.1 | NVIDIA GTC (Jensen Huang); DCD, Tom's Hardware |
| practical air-cooling ceiling per rack; RDHx ~50–100 kW; DLC 200+ kW | ~41 kW | 2025 | 1.1 | ASHRAE TC 9.9; SemiAnalysis Datacenter Anatomy |
| inference share of AI compute in 2026 (½ in 2025, ⅓ in 2023); 80–90% of draw at large operators | ~2/3 | 2026 | 1.1 | Deloitte TMT Predictions 2026; McKinsey |
| active generation + storage in US interconnection queues (end-2024; ~twice US installed capacity); large-load waits 4–7 yr in top hubs | ~2,290 GW | end-2024 | 1.1 | LBNL, Queued Up 2025 Edition |
| all-in cost per 8-GPU H100 server (excl. storage); ~$31k/GPU/yr enterprise all-in | $283–318k | 2025 | 1.1 | SemiAnalysis AI Neocloud Playbook |
| TCO at 2048-GPU scale, 90% utilization; ~$1.03 small clusters; cloud H100 ~$1.49 (contested — single-source) | ~$0.74/GPU-hr | 2025 | 1.1 | SemiAnalysis H100 cost/rental analyses |
| accelerated economic life vs 5–6 yr book life; used GPUs retain ~20–40% residual after 3 yr | 2–3 yr | 2025 | 1.1 | Goldman Sachs; CNBC/secondary-market analyses |
| per dense training rack (GB200 NVL72 ~120–132 kW; GB300 ~142 kW) | 120–142 kW | 2025 | 1.2 | NVIDIA OCP / SemiAnalysis / Introl |
| per Rubin Ultra Kyber NVL576 rack on 800 VDC (announced roadmap) | ~600 kW | 2027 (announced) | 1.2 | NVIDIA GTC; SemiAnalysis 800 VDC |
| practical air-cooling ceiling/rack; RDHx ~50–100 kW; DLC 200+ kW | ~41 kW | 2025 | 1.2 | ASHRAE TC 9.9 / SemiAnalysis |
| GB200 NVL72 DLC inlet & flow; deviation throttles GPUs up to ~50% | 20–25 °C / ~80 L/min | 2025 | 1.2 | NVIDIA OCP / Introl |
| training back-end fabric non-blocking; 2:1 'optimized' cuts back-end cost ~31% (contested — single-source) | 1:1 vs 2:1 | 2025 | 1.2 | SemiAnalysis AI Neocloud Playbook |
| NVLink5 per-GPU BW (1.8 TB/s bidirectional, 900 GB/s per direction) vs ~400G (50 GB/s) scale-out NIC; keep collectives in scale-up | ~18x | 2025 | 1.2 | NVIDIA / SemiAnalysis |
| unplanned interruptions on 16,384 H100s (~1 every 3 hr); 78% hardware-caused | 419 / 54 days | 2024 | 1.2 | Meta Llama 3 paper (Table 5) |
| best-in-class mature H100 cluster MTBF; one failure restarts a synchronous job | ~7 days / 512 GPUs | 2025 | 1.2 | SemiAnalysis 100k-H100 clusters |
| training goodput: industry average / best-in-class; reliability overhead 6–21% of TCO | ~90% / ~96% | 2025 | 1.2 | SemiAnalysis ClusterMAX / CoreWeave |
| inference share of AI compute in 2026 (½ in 2025, ⅓ in 2023); 80-90% of draw at large operators | ~2/3 | 2026 | 1.3 | Deloitte TMT Predictions 2026 |
| AI inference capacity to 2030 (~35% CAGR) vs training 23.1 → 62.2 GW (~22%) | 20.9 → 93.3 GW | 2026 | 1.3 | McKinsey, 'The next big shifts in AI workloads' |
| market for inference-optimized chips in 2026; most inference stays in data centers, not at the edge | >$50B | 2026 | 1.3 | Deloitte TMT Predictions 2026 |
| power-oversubscription headroom: inference (uncorrelated per-request peaks) vs training (synchronous peaks) | ~21% vs ~3% | 2026 | 1.3 | Uptime Institute Journal; arXiv power-profile studies |
| inference fabric oversubscription (vs 1:1 non-blocking for training); 2:1 cuts back-end cost ~31% (contested — single-source) | 2:1-3:1 | 2025 | 1.3 | SemiAnalysis AI Neocloud Playbook; Juniper |
| HBM3E per Ironwood TPU v7 (inference-era ASIC); 9,216-chip pods, 42.5 FP8 ExaFLOPS, 4,614 FP8 TFLOPS/chip | 192 GiB / 7.4 TB/s | 2025 | 1.3 | Google Cloud; SemiAnalysis |
| self-hosted vs market-avg inference cost per million tokens; ~10x/yr token-price deflation (LLMflation) | ~$1.90 → ~$2.50/M tok | 2025 | 1.3 | Introl / NVIDIA synthesis; a16z |
| inference uptime target (99.995%) vs training's checkpoint-tolerant N/N+1 posture | Tier IV ~26 min/yr | 2025 | 1.3 | Uptime Institute (Tier classes) |
| of wall-clock spent on rollout generation in agentic/reasoning RL post-training | ~80% | 2026 | 1.4 | 2025–2026 RL-systems papers (ROLL Flash, ROLLART) & Introl RLHF infra report |
| of compute consumed by rollouts at 16K-token generation length (RLVR long-CoT) | ~70% | 2025 | 1.4 | RLVR / long-CoT RL-systems analyses (arXiv) |
| tokens per RL trajectory for reasoning/agentic tasks — the rollout that dominates cost | 10K–100K+ | 2026 | 1.4 | domain-research keyNumbers; reasoning-model RL reports |
| wall-clock speedup of variance-controlled async RL vs synchronous at equal accuracy (~42h vs ~105h) | 2.5x | 2026 | 1.4 | Stable Asynchrony / VCPO (arXiv 2602.17616) |
| just to hold weights for a 70B PPO-RLHF stack (actor + reference + reward + critic), pre-optimizer | 8–16 GPUs | 2025 | 1.4 | Introl RLHF infrastructure report |
| QLoRA fine-tune on a single 48 GB GPU; memory cut from >780 GB to <48 GB without quality loss | 65B on 48 GB | 2023 | 1.4 | QLoRA (Dettmers et al., arXiv 2305.14314) |
| share of parameters trained by a LoRA adapter vs full fine-tune (model-dependent) | ~0.1% | 2026 | 1.4 | LoRA (Hu et al.) / 2026 PEFT practitioner guides |
| GPU:CPU norm rebalancing toward more CPU per node as agentic RL adds rollout/tool/env load | from 8:1 | 2026 | 1.4 | domain-research (System Composition); SemiAnalysis |
| one-way fiber latency from distance alone (~5 ms per 1,000 km); ~1.64 ms RT per 100 mi before any processing | ~0.82 ms / 100 mi | 2025 | 1.5 | M2 Optics fiber-latency analysis (≈2/3 c in glass) |
| MEC round-trip at the access edge; under ~50 ms from a regional 5G URLLC breakout | sub-10 ms | 2025 | 1.5 | ETSI ISG MEC; arXiv 2504.03708 (telco-LLM latency) |
| perceptibility thresholds: hard real-time / interactive (AR-VR, agentic) / 'instant' conversational | ~30 / 50 / 100 ms | 2026 | 1.5 | Spheron hybrid edge guide; AR/VR latency literature |
| edge data center market, 2026 to 2033, ~14.9% CAGR; AI/ML inference the fastest-growing segment | ~$40B → ~$106B | 2026 | 1.5 | Grand View Research; Coherent Market Insights |
| micro data centers' share of the edge market (global 2025) / of US edge by 2026 | ~35% / ~54% | 2026 | 1.5 | Grand View Research; Coherent Market Insights (US) |
| inference share of AI compute in 2026 (½ in 2025); the growth pool the edge competes for | ~2/3 | 2026 | 1.5 | Deloitte TMT Predictions 2026 |
| edge-site deploy time and install-time reduction under zero-touch provisioning (Vapor IO; ZTP fleet tooling) | ~1 hr / 90%+ | 2026 | 1.5 | Vapor IO; Scale Computing / VMware VCF Edge |
| practical power envelope per edge micro-site (vs ~132 kW for a centralized NVL72 rack) | a few kW – ~30 kW | 2026 | 1.5 | research/domain-research.json; practitioner ranges |
| time-to-power: greenfield self-build vs wholesale colo (live 50k+ GPU cluster) vs neocloud | 24–36 mo / 6–12 mo / days–weeks | 2026 | 1.6 | SemiAnalysis; JLL 2026 Outlook; Introl |
| brownfield retrofit cost: cooling-only vs full AI retrofit; ~2/3 of pre-2015 DCs unsuitable for frontier density | $2–3M / $5–10M per MW | 2025 | 1.6 | Introl / Tetra Tech / Schneider synthesis |
| global wholesale colo average 2025 (record); ~$120 Atlanta to ~$450 Singapore; ~1% vacancy | ~$217/kW-month | 2025 | 1.6 | JLL / CBRE synthesis |
| self-build TCO at 2,048-GPU scale, 90% utilization (~$1.03 small clusters) vs neocloud median ~$2.3–3.5/hr (contested — single-source) | ~$0.74/GPU-hr | 2025 | 1.6 | SemiAnalysis cost / H100 rental analyses |
| neocloud GPU rental vs hyperscaler pricing (8-GPU node ~$34/hr neocloud vs ~$98/hr hyperscaler) | 40–85% below | 2026 | 1.6 | SemiAnalysis H100 Index / AM Compute |
| rise in the 1-year H100 rental contract index, Oct 2025 to Mar 2026, as capacity tightened; on-demand largely sold out | ~+40% | 2026 | 1.6 | SemiAnalysis H100 Rental Index |
| breakeven utilization for a debt-financed cluster; swings -$330k to +$340k/mo (55% vs 85%) on a 1,024-GPU H100 build (contested — single-source) | ~70% | 2025 | 1.6 | AM Compute / McKinsey |
| US large-load grid interconnection lead time end-to-end; up to ~10 yr in worst queues — the gate behind self-build | ~3–7+ yr | 2026 | 1.6 | LBNL Queued Up; ERCOT / PJM filings |
| practical air-cooling ceiling per rack; RDHx ~50–100 kW; DLC 100–200 kW+ | ~41 kW | 2025 | 1.7 | ASHRAE TC 9.9; SemiAnalysis Datacenter Anatomy |
| per GB200 NVL72 rack (~115 kW liquid + ~17 kW air); GB300 ~142 kW; Rubin Ultra Kyber ~600 kW | 120–132 kW | 2026 | 1.7 | NVIDIA OCP / SemiAnalysis roadmap |
| GB200 NVL72 DLC inlet & flow; deviation can throttle GPUs up to ~50% | 20–25 °C / ~80 L/min | 2025 | 1.7 | NVIDIA OCP / Introl |
| training non-blocking vs inference oversubscribed; 2:1 cuts back-end cost ~31% (contested — single-source); Meta ran 7:1 on 24k H100 | 1:1 vs 2:1–3:1 | 2025 | 1.7 | SemiAnalysis AI Neocloud Playbook / Meta |
| GPU:CPU ratio shifting from training-era norm toward agentic-inference host demand | ~8:1 → 4–8:1 | 2026 | 1.7 | TrendForce Insights; Introl |
| full AI liquid retrofit cost crossing the cooling cliff; still strands capacity | ~$5–10M/MW | 2026 | 1.7 | Introl / Vera Rubin deployment analysis |
| ~1.6 hr/yr vs ~26 min/yr downtime; Tier IV ~20–40% capital premium | Tier III 99.982% / Tier IV 99.995% | 2025 | 1.7 | Uptime Institute |
| goodput (effective training time): industry avg vs best-in-class; reliability overhead 6–21% of TCO | ~90% / ~96% | 2025 | 1.7 | SemiAnalysis ClusterMAX / CoreWeave |
| 1 GW AI data center: total-program capex (core stack ~$27.9/W plus land, build-out, financing) and all-in annual TCO (~$8.5M/MW-yr) | ~$38B / ~$8.5B/yr | 2026 | 1.8 | Epoch AI, AI datacenter cost breakdown |
| 1 GW annual TCO at 3-yr / 5-yr / 7-yr IT useful life — the dominant lever | $12B / $8.5B / $7B | 2025 | 1.8 | Epoch AI / AM Compute synthesis |
| self-operated TCO at 2048-GPU scale, 90% util; ~$1.03 small clusters (contested — single-source) | ~$0.74/GPU-hr | 2025 | 1.8 | SemiAnalysis, GPU cluster cost |
| breakeven utilization (debt-financed); 1,024-GPU cluster swings -$330k to +$340k/mo (contested — single-source) | ~70% | 2025 | 1.8 | AM Compute / McKinsey |
| LLMflation: inference cost decline at fixed quality (Epoch: ~50x/yr median) | ~10x/yr | 2024-2026 | 1.8 | a16z; Epoch AI |
| AI-app gross margin vs 70-90% for mature SaaS | ~41% to ~52% | 2026 | 1.8 | ICONIQ State of AI 2026; Bessemer |
| wholesale colo global avg 2025; BTS/CTL ~$150-220/kW-mo over 15 yr | ~$217/kW-mo | 2025 | 1.8 | JLL / CBRE synthesis |
| estimated understated AI D&A 2026-2028 (CONTESTED); industry AI D&A ~$400B/yr | ~$176B | 2026 | 1.8 | Burry / secondary analyses; filings |
| AI/HPC scheduler share: Slurm / Kubernetes / in-house (rule of thumb) | ~70% / ~20% / ~10% | 2026 | 10.1 | HPCwire, ‘Slurm vs Kubernetes in the Age of AI’; ClusterMAX |
| GPU-cloud customers using K8s for inference vs Slurm for training | ~90% K8s / ~50% Slurm | 2025 | 10.1 | SemiAnalysis ClusterMAX |
| Kubernetes Dynamic Resource Allocation graduated to stable (Sept 2025) | GA in 1.34 | 2025 | 10.1 | Kubernetes blog, ‘v1.34: DRA has graduated to GA’ |
| reported GPU utilization, device plugins vs DRA (better packing/sharing) | 45–60% → 70–85% | 2026 | 10.1 | Red Hat / vendor DRA analyses |
| goodput (effective-training-time): industry avg vs best-in-class | ~90% / ~96% | 2025 | 10.1 | SemiAnalysis ClusterMAX / CoreWeave |
| failure interval for a 16k-GPU cluster at ~80,000-hr per-GPU MTBF | ~every 3 hr | 2025 | 10.1 | Meta Llama 3 / domain reliability math |
| demonstrated scale of Slurm-on-Kubernetes (Slinky) side-by-side workloads | 8,000+ GPUs | 2025 | 10.1 | NVIDIA Developer, Slinky / slurm-bridge blog |
| NVIDIA open-sourced KAI Scheduler (gang, fair-share, DRA, topology) | Apache-2.0, Apr 2025 | 2025 | 10.1 | NVIDIA / KAI-Scheduler GitHub |
| Bartz v. Anthropic settlement — largest US copyright payout; ~$3,000 per work across ~500,000 works; pirated copies ordered destroyed | $1.5B | 2025 | 10.10 | Bartz v. Anthropic (N.D. Cal.); Authors Guild; Fortune |
| Italian Garante fine on OpenAI for training ChatGPT without adequate legal basis + transparency failures; plus a 6-month awareness campaign | €15M | Dec 2024 | 10.10 | Garante per la protezione dei dati personali |
| EU AI Act GPAI obligations apply to new models; training-content summary template (Commission) mandatory | Aug 2, 2025 | 2025 | 10.10 | European Commission; EU AI Act |
| deadline for pre-existing GPAI models (placed on the market before Aug 2, 2025) to publish their training-content summary | Aug 2, 2027 | 2025 | 10.10 | European Commission; Mayer Brown analysis |
| ChatGPT conversation logs OpenAI was ordered to produce in NYT v. OpenAI discovery — output logs deemed relevant to fair-use defense | 20M logs | 2025 | 10.10 | NYT v. OpenAI (S.D.N.Y.); Bloomberg Law |
| EDPB legitimate-interest assessment for AI training (interest, necessity, balancing); high bar to claim a trained model is anonymous | 3-step test | Dec 2024 | 10.10 | EDPB Opinion 28/2024 |
| UK High Court: AI model weights are not an infringing 'copy' under the CDPA (statistical parameters, not stored images) | weights ≠ copy | Nov 2025 | 10.10 | Getty Images v. Stability AI (UK High Court) |
| EU DSM TDM exception is opt-out by default — crawlers must honor machine-readable reservations (robots.txt / TDM Reservation Protocol) | opt-out | 2025 | 10.10 | EU DSM Directive 2019/790, Art. 4; AI Act Code of Practice |
| TPOT roughly matching human reading speed (~20-25 tok/s); common interactive decode target | ~40-50 ms | 2025 | 10.11 | Practitioner consensus; NVIDIA / vLLM serving guides |
| inference share of AI compute in 2026 (½ in 2025, ⅓ in 2023); 80-90% of draw at large operators | 2/3 (~67%) | 2026 | 10.11 | Deloitte TMT Predictions 2026; McKinsey |
| market-average self-hosted inference cost, fell ~$10→~$2.50 in a year; worked example ~$1.90/M (8xH100, Llama-70B FP16) | ~$2.50/M tok | 2025 | 10.11 | Introl / NVIDIA synthesis (via provenance) |
| goodput gain from hybrid aggregation/disaggregation over SOTA when both TTFT and TPOT bind | up to ~77% | 2025 | 10.11 | TaiChi (arXiv 2508.01989); see also FlowKV/HexGen-2 |
| inference back-end fabric oversubscription (training is 1:1 non-blocking); 2:1 cuts back-end cost ~31% (contested — single-source) | 2:1-3:1 | 2025 | 10.11 | SemiAnalysis AI Neocloud Playbook |
| best-in-class vs industry-average goodput (training framing); reliability overhead 6-21% of TCO | ~96% / ~90% | 2025 | 10.11 | SemiAnalysis ClusterMAX 2.0 / CoreWeave |
| runtime-reconfigurable disaggregation: x prefill workers feeding y decode workers, re-balanced live | xPyD | 2026 | 10.11 | NVIDIA Dynamo / TensorRT-LLM disaggregated serving docs |
| GB200 NVL72 coherent NVLink domain — the rack-scale block the scheduler treats as atomic | 72 GPUs / 130 TB/s | 2025 | 10.2 | NVIDIA GB200 NVL72 / NVLink product page |
| NVLink 5 per-GPU bidirectional bandwidth (scale-up); ~3.6 TB/s on Rubin (roadmap) | ~1.8 TB/s | 2026 | 10.2 | NVIDIA NVLink |
| scale-up (NVLink) vs scale-out (~400G NIC) per-GPU bandwidth — the cliff the scheduler defends | ~5–18x | 2025 | 10.2 | NVIDIA / SemiAnalysis |
| Slurm block size for one NVL72 NVLink domain in topology.yaml (topology/block plugin) | 18 nodes | 2025 | 10.2 | NVIDIA Developer — Slurm block scheduling on GB200 NVL72 |
| max IMEX channels (ComputeDomains) per node in Kubernetes DRA — strands partial-node GPUs | 1 / node | 2025 | 10.2 | NVIDIA Developer — MNNVL on Kubernetes |
| minimum Kubernetes with DRA APIs enabled for ComputeDomains; GPU Operator 25.3+ | K8s 1.32+ | 2025 | 10.2 | NVIDIA / AWS EKS GB200 guidance |
| share of Llama-3 training job interruptions traced to network/config issues — the cost of getting topology wrong | ~10.7% | 2024 | 10.2 | Meta (via Introl topology analysis) |
| training goodput, industry average vs best-in-class — topology-aware placement is a lever on the gap | ~90% → ~96% | 2025 | 10.2 | SemiAnalysis ClusterMAX / CoreWeave |
| MIG instances per GPU (B200/GB200): 2×~93 GB, 4×~46 GB, or 7×~23 GB profiles | up to 7 | 2025 | 10.3 | NVIDIA MIG User Guide (r580); MIG supported-profiles docs |
| HBM per Blackwell GPU available to partition across tenants (B200/GB200 class) | 180–192 GB | 2026 | 10.3 | NVIDIA Blackwell datasheets; provenance.js HBM trajectory |
| NVIDIAScape (CVE-2025-23266) Container Toolkit escape — container-to-host on shared GPU nodes | CVSS 9.0 | 2025 | 10.3 | Wiz Research; NVIDIA security bulletin |
| CVE-2025-23290 — first acknowledged cross-VM GPU-metric leak via vGPU Manager (co-tenant side channel) | CVSS 2.5 | 2025 | 10.3 | NVIDIA security bulletin; Tenable |
| Slurm vs Kubernetes share of AI clusters — the two quota/fairness enforcement planes operators must master | ~70% / ~20% | 2026 | 10.3 | HPCwire, 'Slurm vs Kubernetes in the Age of AI' |
| inference share of AI compute in 2026 — the workload class that most rewards fractional/MIG sharing | ~2/3 | 2026 | 10.3 | Deloitte TMT Predictions 2026; McKinsey |
| accelerated GPU economic life — the depreciation clock that makes reclaiming idle silicon urgent | 2–3 yr | 2025 | 10.3 | Goldman Sachs; secondary-market analyses |
| current CUDA Toolkit (released May 2026); paired with R580 LTSB data-center driver | CUDA 13.3 | mid-2026 | 10.4 | NVIDIA CUDA Toolkit Release Notes; Data Center Driver docs |
| current NVIDIA data-center LTS driver branch; ~3-yr lifecycle, EOL ~Aug 2028 | R580 | mid-2026 | 10.4 | NVIDIA Data Center Drivers; AI Enterprise lifecycle policy |
| AMD stack with RCCL NCCL-API parity; MI350X/MI355X support (7.0 Sep 2025) | ROCm 7.x | 2025-2026 | 10.4 | AMD ROCm 7.0 release notes & compatibility matrix |
| NVLink-SHARP Multimem multicast + symmetric-memory kernels within an NVL72 domain | NCCL 2.28+ | 2026 | 10.4 | NVIDIA NCCL release notes; GitHub releases |
| GPU SMs consumed by a reduction after composing NVLink-SHARP + IB-SHARP in NCCL 2.27 | ~16 → ≤6 SMs | 2025 | 10.4 | NVIDIA Developer (NCCL 2.27); SHARP in-network computing |
| NCCL all_reduce busbw vs theoretical (acceptance gate); ≈370 GB/s per node on 8×400G NDR (400 GB/s aggregate) | ~92% | 2025 | 10.4 | NVIDIA DGX BasePOD NCCL validation; OCI/Together AI |
| lower hardware cost for AMD vs NVIDIA — the prize that funds the ROCm tax | 15-30% | 2026 | 10.4 | domain-research keyNumbers; SemiAnalysis AMD vs NVIDIA |
| failure cadence of a 16k-GPU cluster (Llama 3: 419 unplanned/54 days) the stack must absorb | every ~3 hr | 2024 | 10.4 | Meta Llama 3 405B disclosure |
| of fabric line rate is the NCCL all_reduce acceptance bar (~370 GB/s on a 400 GB/s fabric) | ~92% | 2025 | 10.5 | Together AI — Practitioner's Guide to Testing Large GPU Clusters |
| typical burn-in soak before a new cluster is admitted to production | 72–168 hr | 2025 | 10.5 | Introl validation frameworks; neocloud operator reports |
| mean time between failures for a 16,000-GPU cluster — why provisioning is a continuous day-2 loop, not a one-time event | ~3 hr | 2024 | 10.5 | Meta Llama 3 (16,384 H100); ~80,000-hr per-GPU MTBF |
| MTBF per 512 GPUs at a top-tier H100 operator; new clusters fail far more during 3–4 week burn-in | ~7 days | 2025 | 10.5 | SemiAnalysis (100k H100 clusters) |
| automated node replacement on a best-in-class fleet — the day-2 lifecycle target | ~90 sec | 2026 | 10.5 | SemiAnalysis AI Neocloud Playbook / ClusterMAX |
| to provision 128 GPUs to a customer at a top-rated neocloud — the bring-up-as-competitive-lever benchmark | <2 days | 2026 | 10.5 | SemiAnalysis ClusterMAX 2.0 |
| goodput (effective-training-time) achievable despite ~3-hr cluster MTBF, given automated validation + recovery | >90% | 2025 | 10.5 | Google Cloud goodput; NVIDIA Mission Control |
| revenue per GW per year — the depreciation clock that makes time-from-rack-to-first-job a million-dollar-per-week metric (contested — single-source) | $10–12B | 2026 | 10.5 | Domain synthesis; SemiAnalysis |
| unplanned interruptions on 16,384 H100s (~1 every 3 hr); 78% hardware, 58.7% GPU-related | 419 / 54 days | 2024 | 10.6 | Meta (Llama 3 paper) / Tom's Hardware |
| MTBF per 512 GPUs at a best-in-class mature H100 operator (new clusters fail far more) | ~7 days | 2025 | 10.6 | SemiAnalysis (100k H100 clusters) |
| machines harboring an SDC-prone defect; SDC expected every 1-2 weeks in large training | ~1 in 1,000 | 2025 | 10.6 | Meta Engineering; OCP SDC-in-AI whitepaper |
| SDC test seeds per month across Meta's fleet (Fleetscanner + Ripple) | ~2.5 billion | 2025 | 10.6 | Meta Engineering (How Meta keeps AI hardware reliable) |
| industry-average vs best-in-class goodput (effective training time) | ~90% / ~96% | 2025 | 10.6 | SemiAnalysis ClusterMAX / CoreWeave |
| large-LLM-job failure rate (~37% hardware-attributed; ~73% recoverable via restart) | ~43.4% | 2024 | 10.6 | Alibaba (Unicron) via SemiAnalysis |
| reliability/recovery overhead — the cost the observability loop exists to shrink | 6-21% of TCO | 2025 | 10.6 | SemiAnalysis ClusterMAX 2.0 |
| MTTR achievable with multi-tier checkpointing vs 15-30 min naive restart | <2 min | 2025 | 10.6 | Google Cloud (multi-tier checkpointing) |
| mean-time-to-failure of a 1,024-GPU job vs 47.7 days for an 8-GPU job — the single-point-of-failure penalty of scale | 7.9 hr | 2025 | 10.7 | Meta, Revisiting Reliability in Large-Scale ML Clusters (arXiv 2410.21680) |
| failures per thousand node-days on Meta's RSC-1 cluster (11 months, ~80%+ utilization) | 6.50 / 1000 | 2025 | 10.7 | Meta, Revisiting Reliability (arXiv 2410.21680) |
| unplanned interruptions on 16,384 H100s during Llama 3 405B (~1 every 3 hr); 78% hardware, 58.7% GPU-related | 419 / 54 days | 2024 | 10.7 | Meta (Llama 3 paper) / Tom's Hardware |
| checkpoint-and-restart overhead required to hold ETTR ~0.9 on a 100,000-GPU run at RSC-2-like failure rates | ~2 min | 2025 | 10.7 | Meta, Revisiting Reliability (arXiv 2410.21680) |
| best-in-class MTBF per 512 GPUs on a mature 100k-H100 cluster (burn-in 3–4 weeks first) | ~7 days | 2025 | 10.7 | SemiAnalysis, 100k H100 Clusters |
| large-LLM-job failure rate; ~37% hardware-attributed; ~73% recoverable via restart | 43.4% | 2024 | 10.7 | Alibaba Unicron production study |
| goodput (effective training time): industry average / best-in-class; reliability overhead 6–21% of TCO | ~90% / ~96% | 2025 | 10.7 | SemiAnalysis ClusterMAX / CoreWeave |
| training restart latency, storage-only vs multi-tier/in-memory checkpointing | 15–30 min → <2 min | 2025 | 10.7 | Google Cloud multi-tier checkpointing |
| typical model-FLOPS-utilization (MFU) for large LLM training; best-in-class >50% on Hopper | ~30–50% | 2025 | 10.8 | SemiAnalysis; provenance.js (domain economics) |
| BF16 MFU gain on GB200 NVL72 from software/kernel maturation over ~12 months (≈57% throughput from software alone) | 34% → 54% | 2025 | 10.8 | SemiAnalysis (H100 vs GB200 NVL72 training benchmarks) |
| BF16 MFU achieved pre-training Llama 3 on 16k H100s (frontier-scale reference point) | ~41% | 2024 | 10.8 | Meta, The Llama 3 Herd of Models |
| training-state footprint with Adam (4 weight + 4 grad + 8–12 optimizer); the number the framework must shard across the fleet | ~16–18 B/param | 2025 | 10.8 | Standard mixed-precision Adam accounting; DeepSpeed/ZeRO docs |
| rule-of-thumb checkpoint size on disk (weights + optimizer state); sets async-drain bandwidth need | ~14 B/param | 2025 | 10.8 | VAST Data (checkpoint bandwidth analysis) |
| training goodput: industry average vs best-in-class effective-training-time fraction | ~90% / ~96% | 2025 | 10.8 | SemiAnalysis ClusterMAX / CoreWeave; provenance.js |
| large-LLM-job failure rate in a production fleet (~37% hardware-attributed; ~73% restart-recoverable) — why elastic orchestration matters | ~43.4% | 2024 | 10.8 | Alibaba Unicron; provenance.js |
| MTBF per 512 GPUs at a top-tier operator; one failure restarts a synchronous job from its last checkpoint | ~7 days | 2025 | 10.8 | SemiAnalysis (100k H100 clusters); provenance.js |
| goodput (effective training time): industry average vs best-in-class marketed; reliability overhead 6-21% of TCO | ~90% / ~96% | 2025 | 10.9 | SemiAnalysis ClusterMAX / CoreWeave |
| ClusterMAX 2.0 GPU-cloud rating: Security, Lifecycle, Orchestration, Storage, Networking, Reliability, Monitoring, Pricing, Partnerships, Availability — Platinum to UnderPerform | 10 dimensions / 5 tiers | 2025 | 10.9 | SemiAnalysis ClusterMAX 2.0 |
| breakeven utilization for a debt-financed fleet; the cliff the on-demand/take-or-pay mix transfers or carries (contested — single-source) | ~70% | 2025 | 10.9 | AM Compute / McKinsey |
| serverless GPU time-to-first-token (H100): warm-pool vs scale-from-zero; snapshots claim ~10x cold-start gains | 8-15s warm / 30-90s cold | 2026 | 10.9 | RunPod / Modal serverless comparisons |
| spot/preemptible discount vs on-demand; the price of transferring interruption risk to the customer | ~60-80% off | 2026 | 10.9 | Spheron / GCP GPU pricing synthesis |
| H100 on-demand ladder: spot floor to Azure managed; neocloud median ~$2.29-3.50 (the value-stack premium, monetized) | ~$1.03–$12.29/GPU-hr | 2026 | 10.9 | SemiAnalysis H100 Index / AM Compute |
| Tier III vs Tier IV facility availability (~1.6 hr vs ~26 min/yr) — the easy SLA, distinct from goodput | 99.982% / 99.995% | 2025 | 10.9 | Uptime Institute |
| RAND Weights Security Levels and adversary operational-capacity tiers; 38 distinct attack vectors enumerated | SL1–SL5 / OC1–OC5 | 2024 | 11.1 | RAND, Securing AI Model Weights (RRA2849-1) |
| attack vectors infeasible for OC1–OC3 but feasible for OC4–OC5 — why nation-state defense is categorically harder | 8 of 38 | 2024 | 11.1 | RAND RRA2849-1 |
| assessed posture of frontier labs vs OC4–OC5 adversaries that want the weights — the central gap | ~SL2–SL3 | 2026 | 11.1 | RAND RRA2849-1; IFP / IST SL5 Task Force |
| allocation-constrained silicon per GB200 NVL72 rack — theft/destruction economics differ from generic cloud hardware | ~$3M+ | 2025 | 11.1 | Guide domain research; OEM rack pricing |
| RAND theft-window benchmark: a Security Level is defined by thwarting weight theft within roughly this horizon | <2 months | 2024 | 11.1 | RAND RRA2849-1 |
| IRGC drones struck 3 AWS facilities in UAE/Bahrain — first deliberate state targeting of commercial data centers in wartime | 1 Mar 2026 | 2026 | 11.1 | CNBC; The Conversation |
| projected data-center physical-security spend by 2030 (roughly doubling) as kinetic/drone threats enter the planning case | ~$4B | 2030 (proj.) | 11.1 | Guide domain research; industry security forecasts |
| concentric physical model (perimeter → facility → data hall → cage/rack) with escalating MFA at each boundary | 4 zones | 2026 | 11.1 | NIST / DCK physical-security guidance |
| of organizations run BMS affected by known-exploited vulnerabilities (KEVs); data centers the worst case | 75% | 2025 | 11.10 | Claroty Team82, State of CPS Security 2025: BMS Exposures |
| of organizations exposed to KEVs that are ransomware-linked AND insecurely internet-connected | 51% | 2025 | 11.10 | Claroty Team82, State of CPS Security 2025 |
| heat lost in Lviv via FrostyGoop — Modbus firmware downgrade, no zero-day | 600 buildings / ~2 days | 2024 | 11.10 | Dragos / CISA / SANS |
| data-center load lost instantaneously on a single 230 kV fault (the weaponizable swing) | ~1,500 MW | 2024 | 11.10 | NERC Level 3 Alert / Utility Dive |
| load shed in a single Virginia event — the synchronized-load-step primitive, unweaponized | 1.5 GW / 82 s | 2024 | 11.10 | NERC / Utility Dive |
| time from CDU flow-loss to GB200 NVL72 throttle/over-temp trip (no chilled-water inertia) | seconds to tens of seconds | 2025 | 11.10 | NVIDIA OCP DLC spec / Introl |
| IEC 62443 security levels by attacker capability; destructive primitives are SL3-SL4 | SL1-SL4 | 2025 | 11.10 | ISA/IEC 62443 |
| Modbus TCP — unauthenticated by design; the protocol FrostyGoop and most BMS/CDU controllers speak | Port 502 | 2024 | 11.10 | Dragos / SANS ICS |
| FedRAMP 20x Key Security Indicators (Low / Moderate baseline) — automated, measurable outcomes replacing control-narrative essays | 56 / 61 | 2026 | 11.11 | FedRAMP PMO RFC-0006 |
| RFC-0024 deadline: machine-readable (OSCAL) packages mandatory for all FedRAMP providers | Sep 2026 | 2026 | 11.11 | FedRAMP PMO RFC-0024 |
| CMMC Level 2 third-party certification becomes mandatory for CUI-handling DoD contracts (Phase 2) | 10 Nov 2026 | 2025-2026 | 11.11 | DoD 48 CFR final rule |
| EU AI Act full enforcement powers activate; most high-risk obligations apply; fines up to 7% global turnover | 2 Aug 2026 | 2026 | 11.11 | European Commission |
| EU / North American enterprise AI-vendor RFPs asking for ISO 42001 certification or implementation | ~40% / ~25% | 2026 | 11.11 | Industry RFP analyses |
| OCP S.A.F.E. accredited Security Review Providers (Atredis, IOActive, NCC Group) for firmware-security conformance audits | 3 SRPs | 2025-2026 | 11.11 | Open Compute Project |
| single-event load loss that pushed NERC to treat large AI loads as grid actors subject to CIP-adjacent scrutiny | ~1,500 MW | 2026 | 11.11 | NERC Level 3 Alert / Utility Dive |
| ISO 27001 / 42001 certification validity with annual surveillance audits; SOC 2 Type II re-issued every 6–12 mo | 3 yr | 2026 | 11.11 | ISO; AICPA |
| mean time to identify + contain a breach in 2025 (lowest in 9 years) — the dwell window your retention must outlast | 241 days | 2025 | 11.12 | IBM Cost of a Data Breach 2025 |
| average breach cost: global down 9% to $4.44M; US at an all-time high of $10.22M | $4.44M / $10.22M | 2025 | 11.12 | IBM Cost of a Data Breach 2025 |
| NIST IR guidance restructured onto CSF 2.0 (Govern/Identify/Protect/Detect/Respond/Recover); first revision since 2012 | SP 800-61r3 | Apr 2025 | 11.12 | NIST SP 800-61 Rev. 3 |
| GPU registers hidden by the BAR0 decoupler in confidential mode (vs ~7.94% normal) — the forensic opacity the SOC works around | ~99.78% | 2025 | 11.12 | NVIDIA WP-12554 / arXiv 2507.02770 |
| certificate device-identity chain and structured measurement records (NRAS/RIM) that become the forensic record on confidential systems | 5 / 64 | 2026 | 11.12 | NVIDIA Secure AI whitepaper (domain synthesis) |
| documented multi-tenant GPU escape (cross-VM disclosure) and cross-tenant DoS — the isolation-breach playbook's design case | CVE-2025-23290 / -23285 | 2025 | 11.12 | NVIDIA security bulletins (domain research) |
| IRGC drone strikes on AWS facilities (UAE/Bahrain) — aerial/kinetic attack now an IR design case, not tail-risk | 1 Mar 2026 | 2026 | 11.12 | Domain research / open reporting |
| projected data-center physical-security spend by 2030 (~2x), reflecting the converged cyber-physical posture | ~$4B | 2026 | 11.12 | Security-domain research synthesis |
| AWS facilities directly hit by drones (UAE) + 1 blast-damaged (Bahrain), Mar 2026 — first confirmed combat strike on US-run hyperscale DC | 2 + blast | Mar 2026 | 11.2 | DefenseScoop / DCK / MWI (West Point) |
| first US statute letting certified state/local/tribal law enforcement deploy counter-UAS (after DOJ training); private operators still cannot legally defeat a drone | FY2026 NDAA | 2026 | 11.2 | FY2026 NDAA; CRS; Route Fifty |
| data center security market 2026, growing to ~$90B by 2034 (~17% CAGR); biometrics the fastest-growing sub-segment | ~$25.7B | 2026 | 11.2 | Fortune Business Insights; market.us |
| data center access-control market 2025, to ~$2.53B by 2030 (~10% CAGR) | ~$1.55B | 2025 | 11.2 | MarketsandMarkets |
| load lost on a single substation fault; 1.5 GW dropped in 82 s (VA, 2024) — the prize a saboteur targets outside the fence | ~1,500 MW | 2024 | 11.2 | NERC Level 3 Alert / Utility Dive |
| grid interconnection lead time for a large load — making the utility tie an irreplaceable single-point-of-failure if attacked | ~3–7+ yr | 2025 | 11.2 | ERCOT / PJM filings synthesis |
| typical standoff range a rural power-first campus gets free; urban inference sites often have near-zero | 50–150 m | 2026 | 11.2 | Practitioner / CPTED siting guidance |
| cost of a commercial FPV/one-way drone — the asymmetry against a multi-hundred-million-dollar facility | ~$300–500/round | 2026 | 11.2 | MWI (West Point) / open-source defense reporting |
| suspect counterfeit-part submissions logged in 2025 (down from 1,055 in 2024, partly a one-off batch); active components ~36% of reports | 748 | 2025 | 11.3 | ERAI 2025 Annual Counterfeit Report |
| of suspect counterfeit parts that PASSED electrical test — would evade detection if electrical test were the only screen | ~24% | 2025 | 11.3 | ERAI 2025 report |
| NIST SP 1800-34 'Validating the Integrity of Computing Devices' finalized — the platform-certificate / provenance reference architecture | Dec 2022 | 2022 | 11.3 | NIST / NCCoE SP 1800-34 |
| NIST SP 800-88 Rev 2 released — media sanitization modernized for encrypted/virtual/cloud media (Clear / Purge / Destroy) | Sept 2025 | 2025 | 11.3 | NIST SP 800-88 Rev 2 |
| IEEE 2883-2022: no overwrite-based method meets the Purge threshold for SSD/NVMe — only verified cryptographic erase or physical destruction qualifies | Purge = CE or destroy | 2025 | 11.3 | IEEE 2883-2022 / NIST 800-88 r2 |
| of used drives resold on the secondary market found to contain residual recoverable data (PII, financial, IP) — the data-remanence base rate | 42% | 2019 | 11.3 | Blancco Technology Group study |
| approximate silicon value concentrated in a single GB200 NVL72 rack (1.36 t) — the asset-value density driving target priority | $3–4M | 2025 | 11.3 | NVIDIA / SemiAnalysis (derived) |
| OCP S.A.F.E. project cadence; AMI the first approved independent firmware vendor SRP — the centralized, inheritable firmware-audit framework | 1st Thu/mo | 2025 | 11.3 | Open Compute Project S.A.F.E. |
| CVE-2024-54085 AMI MegaRAC BMC auth-bypass via Redfish; added to CISA KEV 25 Jun 2025; OEM'd across 12+ server vendors | CVSS 10.0 | 2025 | 11.4 | Eclypsium / CISA KEV / The Hacker News |
| internet-exposed MegaRAC SP-X Redfish instances found, each potentially exploitable for remote takeover/bricking | 1,000+ | 2025 | 11.4 | Eclypsium (Shodan scan) |
| Caliptra open silicon RoT co-developed by Microsoft, Google, AMD, NVIDIA; committed in their first-party/server silicon | 4 contributors | 2025 | 11.4 | OCP / CHIPS Alliance / Microsoft Azure |
| post-quantum signatures + KEM in Caliptra 2.x via open-source Adams Bridge accelerator (CNSA 2.0 path), side-channel hardened | ML-DSA + ML-KEM | 2025 | 11.4 | Microsoft Azure / CHIPS Alliance |
| irreplaceable, allocation-constrained silicon per GB200 NVL72 rack a management-plane implant can brick or wiretap | $3M+ | 2025 | 11.4 | RAND / domain research synthesis |
| NIST Platform Firmware Resiliency (protect/detect/recover); with SP 1800-34 and IR 8320 the standards backbone for firmware integrity | SP 800-193 | 2024 | 11.4 | NIST / NCCoE |
| OCP module decoupling BMC + RoT + TPM from the motherboard; 2.1 open reference designs appeared in 2025 | DC-SCM 2.0 | 2025 | 11.4 | OCP / Cloudflare Project Argus / Antmicro |
| BMC runs on standby power and boots before the host; a rooted BMC is an OS-invisible, persistent foothold under the CPU | always-on | 2025 | 11.4 | Eclypsium / OCP DC-SCM |
| of GPU HBM placed inside the encrypted, integrity-protected Compute Protected Region (CPR) | ~90% | 2025 | 11.5 | arXiv 2507.02770 (GPU CC Demystified); NVIDIA WP-12554 |
| of GPU memory-mapped registers hidden by the BAR0 decoupler in CC mode (vs ~8% in normal mode) | ~99.78% | 2025 | 11.5 | arXiv 2507.02770 |
| device-identity chain length and structured measurement records validated against NRAS + RIM goldens | 5-cert / 64 records | 2025 | 11.5 | arXiv 2507.02770; NVIDIA attestation docs |
| per-channel session keys derived from one SPDM-negotiated master secret (RPC / DMA / fault / workload) | 44+ keys | 2025 | 11.5 | arXiv 2507.02770 |
| training / inference advantage HGX B200 retains over H200 with confidential computing fully enabled | ~2x / ~2.5x | 2025 | 11.5 | NVIDIA Secure AI WP-12554; Corvex/Spheron benchmarks |
| Blackwell CC overhead on large matrix ops (encrypted HBM + TEE-I/O over NVLink); Hopper far heavier on small/PCIe transfers | under ~3% | 2025 | 11.5 | NVIDIA; independent Hopper CC benchmark (arXiv 2409.03992) |
| Hopper-class confidential-computing scope; multi-GPU TEE-I/O across NVLink is Blackwell-and-later | single-GPU | 2025 | 11.5 | NVIDIA Secure AI with Blackwell and Hopper GPUs (WP-12554) |
| year AMD SEV-SNP + Intel TDX + NVIDIA GPU CC reached broad cloud GA as a paired confidential-AI stack | 2025 | 2025-2026 | 11.5 | NVIDIA / cloud-provider CC GA announcements |
| CVSS of NVIDIAScape (CVE-2025-23266) — three-line container escape to host root in NVIDIA Container Toolkit | 9.0 | Jul 2025 | 11.6 | Wiz Research; NVIDIA Security Bulletin |
| NVIDIA Container Toolkit versions vulnerable to NVIDIAScape (GPU Operator ≤25.3.0) | ≤1.17.7 | Jul 2025 | 11.6 | Wiz; NVIDIA |
| first publicly acknowledged cross-VM co-tenant information disclosure via the vGPU Manager | CVE-2025-23290 | Jul 2025 | 11.6 | NVIDIA Security Bulletins |
| max MIG instances per GPU — the only hardware-enforced fractional partition (dedicated SMs, L2 slice, memory controllers, HBM slice) | 7 | 2025 | 11.6 | NVIDIA Multi-Instance GPU |
| LLM-response data recoverable per query via LeftoverLocals (CVE-2023-4969) from un-scrubbed GPU local memory | ≈181 MB | 2024 | 11.6 | Trail of Bits |
| memory and fault isolation guarantees provided by time-slicing / MPS between tenants | 0 | 2025 | 11.6 | Introl; NVIDIA MPS docs |
| ClusterMAX 2.0 operator-maturity rubric grades tenant/fabric isolation, health-checks, and goodput as first-class | 10-dimension | 2025 | 11.6 | SemiAnalysis ClusterMAX 2.0 |
| share of data-center traffic that is east-west (interior); approaches 100% on a training back-end fabric | 76-80% | 2024-2026 | 11.7 | Akamai / Gigamon |
| average eCrime breakout time (initial access to first lateral movement) in 2025, down from 48 min in 2024; fastest 27 s | 29 min | 2025 | 11.7 | CrowdStrike 2026 Global Threat Report |
| BlueField-4 DPU throughput; 64 Arm cores, ~6x BlueField-3 compute; zero-trust east-west enforcement at line rate | 800 Gb/s | 2026 (Vera Rubin platform) | 11.7 | NVIDIA / HPCwire |
| NIST Zero Trust Architecture — 'never trust, always verify'; no trust from network location | SP 800-207 | Aug 2020 (current) | 11.7 | NIST |
| training back-end fabric design; sub-2 µs latency — why inline L7 inspection is a goodput tax there | 1:1 non-blocking | 2025 | 11.7 | SemiAnalysis / NVIDIA |
| industry-avg vs best-in-class goodput; inline enforcement on collectives erodes exactly this metric | ~90% / ~96% | 2025 | 11.7 | SemiAnalysis ClusterMAX / CoreWeave |
| configuration boundaries (subnet-manager / adapter enforced) — segmentation, not cryptographic isolation | VLAN/PKey | 2025 | 11.7 | NVIDIA InfiniBand / SemiAnalysis ClusterMAX |
| egress posture for the weights enclave: allow-listed proxy + blocked/alerted bulk transfers — the anti-exfil linchpin | default-deny | 2025 | 11.7 | RAND RRA2849-1 (weight-security egress controls) |
| RAND Weights Security Levels (SL1-5), attacker operational-capacity tiers (OC1-5), and catalogued attack vectors | 5 levels / 5 tiers / 38 vectors | 2024 | 11.8 | RAND RRA2849-1 (Securing AI Model Weights) |
| where RAND assesses most frontier labs currently sit — stops opportunistic actors and basic insiders, not OC4-OC5 nation-states | ~SL2 | 2024-2026 | 11.8 | RAND RRA2849-1 |
| SL5 Task Force target for nation-state-resistant frontier AI infrastructure; SL5 standard = 43 controls / 10 families (NIST SP 800-53 overlay) | 2028/2029 | 2025-2026 | 11.8 | SL5 Task Force / Institute for Security & Technology |
| to exfiltrate a ~1,000 GB model even under an 800 GB/day egress cap — why fixed-rate limits are necessary but not sufficient | ~1.25 days | 2025 | 11.8 | LessWrong/Alignment Forum egress-limit analyses |
| token output of a single production inference server — the channel that cannot be rate-capped without breaking the service | ~1 TB/day | 2025 | 11.8 | Inference-verification exfiltration research |
| preliminary feasible weight-compression floor in a theft context — shrinks the payload an attacker must move, undercutting fixed egress caps | ~1 bit/param | 2026 | 11.8 | arXiv 'Aggressive Compression Enables LLM Weight Theft' |
| GPU HBM inside the encrypted Compute Protected Region; memory-mapped registers hidden by the BAR0 decoupler in CC mode | ~90% / ~99.78% | 2025 | 11.8 | arXiv 2507.02770; NVIDIA WP-12554 |
| checkpoint size for a 175B to 1T-param model at ~14 bytes/param incl. optimizer state — the at-rest bulk the crypto must wrap | 2.3-13.8 TB | 2025 | 11.8 | NVIDIA storage guidance; checkpoint-sizing rules of thumb |
| where consensus assesses frontier labs sit; insider threat is the dominant gap blocking SL4-5, which need human-layer controls not more crypto | ~SL2 | 2024-2025 | 11.9 | RAND RRA2849-1 (Securing AI Model Weights); IST SL5 Task Force |
| RAND theft benchmark: a Security Level is defined by stopping an adversary attempting weight theft inside this window | <2 months | 2024 | 11.9 | RAND RRA2849-1 |
| distinct attack vectors in RAND's model; insider threat spans most of them rather than being one isolated path | 38 vectors | 2024 | 11.9 | RAND RRA2849-1 (5 SL, 5 OC tiers, 38 vectors) |
| average annual cost of insider risk per organization (largest Ponemon insider study to date) | $17.4M | 2025 | 11.9 | Ponemon / DTEX 2025 Cost of Insider Risks |
| share of insider incidents that are negligent vs malicious; credential theft ~20% but costliest at ~$779,797/event | ~55% / ~25% | 2025 | 11.9 | Ponemon 2025 Cost of Insider Risks |
| average time to detect and contain an insider incident (down from 86 in 2023); far longer than a checkpoint copy takes | 81 days | 2025 | 11.9 | Ponemon 2025 Cost of Insider Risks |
| of breaches involve the human element; among deliberate-misuse motives, convenience (60%) now leads financial gain (33%) | ~60% | 2025 | 11.9 | Verizon 2025 DBIR (12,195 breaches) |
| frontier pattern: time-limited, peer-approved, business-justified grants to weight infrastructure (multi-party authorization) | no standing access | 2025 | 11.9 | Anthropic Frontier Model Security; OpenAI frontier-risk |
| legacy Tier III / Tier IV availability (~1.6 hr vs ~26 min downtime/yr) — figures Uptime no longer endorses | 99.982% / 99.995% | 2025 | 12.1 | Uptime Institute Tier Standard |
| MEP construction-cost swing of 2N over N+1; 2N strands ~50% of capacity idle | +30–50% | 2025 | 12.1 | SemiAnalysis Datacenter Anatomy; STACK Infrastructure |
| Tier IV capital premium over Tier III — for ~70 extra minutes/yr of facility uptime | ~20–40% | 2025 | 12.1 | Uptime Institute / practitioner data |
| share of impactful outages caused by power (most often UPS) — the leading cause, 4th year of falling overall frequency | 45% | 2025 | 12.1 | Uptime Institute Annual Outage Analysis 2025 |
| of human-error outages caused by staff not following procedures (up from 48%); ~40% of orgs hit a major human-error outage in 3 yr | 58% | 2025 | 12.1 | Uptime Institute Annual Outage Analysis 2025 |
| Llama 3 405B training interruptions on 16,384 H100s, 47 planned and 419 unplanned (~1 every 3 hr; 78% of unplanned hardware-attributed), yet >90% effective training time | 466 / 54 days | 2024 | 12.1 | Meta (Llama 3 paper) |
| best-in-class H100 cluster MTBF per 512 GPUs — the job is its own availability risk, not the building | ~7 days | 2025 | 12.1 | SemiAnalysis (100k H100 clusters) |
| rack BBU (OCP ORv3, 5+1 redundant) switchover — backup energy migrating down to the rack/silicon | <5 ms | 2025 | 12.1 | OCP ORv3 / Open Rack BBU specs |
| unplanned interruptions on 16,384 H100s (~1 every 3 hr); 78% hardware, 58.7% GPU/HBM — all at 100% facility availability | 419 / 54 days | 2024 | 12.2 | Meta (Llama 3 405B paper) / Tom's Hardware |
| goodput (effective training time): industry average vs best-in-class; reliability overhead 6–21% of TCO | ~90% / ~96% | 2025 | 12.2 | SemiAnalysis ClusterMAX / CoreWeave |
| best-in-class MTBF per 512 GPUs on mature H100 clusters; far worse during 3–4 week burn-in | ~7 days | 2025 | 12.2 | SemiAnalysis (100k H100 clusters) |
| Uptime Tier III vs Tier IV availability (~1.6 hr vs ~26 min/yr); Tier IV ~20–40% capital premium | 99.982% / 99.995% | 2025 | 12.2 | Uptime Institute (% figures Uptime-disavowed) |
| training MTTR cut by multi-tier checkpointing — a goodput gain no facility tier delivers | 15–30 min → <2 min | 2025 | 12.2 | Google Cloud (multi-tier checkpointing) |
| data-center load lost on a single 230 kV fault (1.5 GW in 82 s, VA); triggered NERC's rare Level 3 alert | ~1,500 MW | 2026 | 12.2 | NERC Level 3 Alert / Utility Dive |
| per-GPU capacitor energy storage, GB300 → Vera Rubin (~6x); ~30% peak-grid-demand reduction demonstrated | 65 → ~400 J/GPU | 2026 | 12.2 | NVIDIA / SemiAnalysis |
| large-LLM job failure rate (Alibaba Unicron); ~37% hardware-attributed, ~73% restart-recoverable | ~43.4% | 2024 | 12.2 | Alibaba (Unicron) via SemiAnalysis |
| practitioner RTO / RPO target for production interactive inference | ~15 min / ~5 min | 2025 | 12.3 | Introl, Disaster Recovery for AI Infrastructure |
| training RPO floor — set by checkpoint interval, not by replication; RTO bounded by GPU re-acquire + resume | 2-4 hr | 2025 | 12.3 | Introl DR analysis; checkpoint practice |
| infrastructure cost of active-active (carry a second live fleet); warm standby ~60% cheaper; pilot light ~20% of full redundancy | ~2x | 2025 | 12.3 | Introl DR analysis; cloud DR-pattern taxonomy |
| training throughput (goodput) penalty of forcing a zero-RPO posture vs setting RPO = checkpoint interval | ~15-20% | 2025 | 12.3 | Introl DR analysis |
| duration of the AWS US-EAST-1 outage (19-20 Oct 2025) — a single-region control-plane/DNS dependency cascading estate-wide | ~15 hr | 2025 | 12.3 | AWS post-event summary; InfoQ; ThousandEyes |
| availability achievable for inference spanning multiple active regions (e.g. Uber's 3-region inference posture) | 99.99% | 2025 | 12.3 | Introl / Uber engineering synthesis |
| continuous replication bandwidth (~200 Gbps) to hold a 1-hour RPO on ~100 TB of training state across regions | ~$50k/mo | 2025 | 12.3 | Introl DR analysis |
| large-load grid interconnection lead time — why failover capacity must be energized in advance, not acquired on the day | 3-7+ yr | 2025 | 12.3 | ERCOT/PJM filings synthesis (provenance register) |
| training goodput: industry average vs best-in-class marketed (CoreWeave); the gap the contract prices | ~90% / ~96% | 2025 | 12.4 | SemiAnalysis ClusterMAX 2.0 / CoreWeave |
| GPU-cloud SLA baseline: node uptime / rack uptime, with penalties (ClusterMAX baseline) | 99.9% / 99% | 2025 | 12.4 | SemiAnalysis ClusterMAX |
| hyperscaler compute SLA: multi-AZ region-level vs single-instance Monthly Uptime | 99.99% / 99.5% | 2026 | 12.4 | Amazon EC2 / Compute SLA |
| reference service-credit ladder rungs (% of monthly bill) as uptime falls through bands | ~10% / 25% / 100% | 2026 | 12.4 | Amazon EC2 / Compute SLA |
| Uptime Tier III vs Tier IV availability (~1.6 hr vs ~26 min downtime/yr); Uptime now disavows the % | 99.982% / 99.995% | 2025 | 12.4 | Uptime Institute Tier Standard |
| best-in-class H100 MTBF per 512 GPUs — the failure environment any cluster SLA is written against | ~7 days | 2025 | 12.4 | SemiAnalysis (100k H100 clusters) |
| Llama-3 405B interruption rate (16,384 H100, 54 days): 466 interruptions (419 unplanned), 78% of unplanned hardware-attributed | ~1 / 3 hr | 2024 | 12.4 | Meta Llama 3 Herd of Models |
| reliability overhead as a share of cluster TCO — the cost of closing the goodput gap | 6–21% | 2025 | 12.4 | SemiAnalysis ClusterMAX |
| failures per 1,000 node-days, Meta RSC-1 vs RSC-2 — the empirical λ that drives any cluster goodput model | 6.50 vs 2.34 | 2024 | 12.5 | Meta, Revisiting Reliability in Large-Scale ML Clusters (arXiv 2410.21680) |
| projected mean time between failures for a 16,384-GPU vs 131,072-GPU synchronous job | 1.8 hr → 14 min | 2024 | 12.5 | Meta (arXiv 2410.21680); SemiAnalysis |
| modeled ETTR (goodput) for a 16k-GPU run moving from 60-min to 5-min checkpoint interval | 0.70 → 0.93 | 2024 | 12.5 | Meta, Revisiting Reliability (arXiv 2410.21680) |
| 512+ GPU job failure rate after lemon-node ejection — a sensitivity result the model must reproduce | 14% → 4% | 2024 | 12.5 | Meta, Revisiting Reliability (arXiv 2410.21680) |
| IEC 61508 beta-factor range for common-cause failure; ~10% the default if no diversity measures applied | 0.5%–10% | 2025 | 12.5 | IEC 61508-6 Annex D; exida |
| annualized GPU failure rate feeding the per-node λ in fleet roll-up models | ~9% AFR | 2026 | 12.5 | domain synthesis / Chapter 14.3 fleet data |
| Uptime Tier III / Tier IV availability targets (~1.6 hr vs ~26 min/yr) — the facility-model benchmark | 99.982% / 99.995% | 2025 | 12.5 | Uptime Institute (Tier classes; % figures Uptime-disavowed) |
| industry-average vs best-in-class training goodput — the validation band any goodput model must land in | ~90% / ~96% | 2025 | 12.5 | SemiAnalysis ClusterMAX / CoreWeave |
| the commissioning ladder: FAT → SAT → pre-functional → functional → Integrated Systems Test (IST) | L1–L5 | 2025 | 13.1 | Construct & Commission; BMP MEP; CxPlanner |
| concurrent maintainability (any path serviceable, no load impact) vs fault tolerance (survive any single unplanned fault) | Tier III vs IV | 2025 | 13.1 | Uptime Institute Tier Standard |
| Tier III (~1.6 hr/yr) vs Tier IV (~26 min/yr) availability; ~20–40% capital premium for IV | 99.982% / 99.995% | 2025 | 13.1 | Uptime Institute (% figures Uptime-disavowed) |
| ASHRAE commissioning-process / Basis-of-Design / data-center-specific Cx guidelines; Std 202 formalizes the Cx-Process | Gd 0 / 1.1 / 1.6 | 2025 | 13.1 | ASHRAE; ACHR News |
| commissioning as a share of construction cost; CxAs now locked in 12–18 months ahead of energization | 0.5–2% | 2025 | 13.1 | CxPlanner; iRecruit / industry practice |
| lost-revenue cost of delaying commissioning a 60 MW facility — the schedule pressure that tempts truncating L5 | ~$14.2M/mo | 2025 | 13.1 | Mastt / industry build-cost analyses |
| unplanned interruptions on a 16,384-GPU Llama 3 run (~1 every 3 hr); the day-2 reality a thin Cx program hands forward | 419 / 54 days | 2024 | 13.1 | Meta (Llama 3 paper) / Tom's Hardware |
| ANSI/BICSI 002-2024 — the most comprehensive lifecycle design+implementation standard; 2024 ed. expanded liquid/immersion | ~575 pp | 2024 | 13.1 | BICSI |
| of serious data-center outages involve human error — most trace to missing or unfollowed procedures (the case for the handover package) | ~70-80% | 2025 | 13.10 | Uptime Institute Global Data Center Survey / Outage Analysis |
| revenue per GW of AI capacity per year — the clock that pressures teams to override the readiness gate (contested — single-source) | ~$10-12B | 2025 | 13.10 | SemiAnalysis (onsite gas economics) |
| data-center load dropped in 82 s (VA, 2024); ~1,500 MW lost on a single fault — the swing go-live first exposes | ~1.5 GW | 2026 | 13.10 | NERC Level 3 Alert / Utility Dive |
| NERC Level 2 Recommendation on large loads (commissioning + ramp coordination); Project 2026-02 Computational Loads under way | Sept 2025 | 2026 | 13.10 | NERC Large Loads Action Plan / Utility Dive |
| industry-average vs best-in-class goodput — the acceptance floor the full-load stage must clear | ~90% / ~96% | 2025 | 13.10 | SemiAnalysis ClusterMAX / CoreWeave |
| Tier III vs Tier IV availability — the redundancy that must hold at every point on the ramp, not just at the end | 99.982% / 99.995% | 2025 | 13.10 | Uptime Institute Tier Classification |
| per GB200/GB300 NVL72 rack — the heat flux and power transient the cooling/smoothing stack must absorb at full load | 120-142 kW | 2026 | 13.10 | SemiAnalysis / NVIDIA roadmap |
| MTBF per 512 GPUs at a mature operator — the failure cadence operations inherits the instant handover completes | ~7 days | 2025 | 13.10 | SemiAnalysis (100k H100 clusters) |
| commissioning as share of total project cost; prevents multiples in rework/downtime | 1–3% | 2025 | 13.2 | Industry Cx cost guidance (TrueLook / practitioner) |
| lead time operators now lock in commissioning agents ahead of energization | 12–18 mo | 2025 | 13.2 | iRecruit / DC construction-trend reporting |
| default fabric BER acceptance threshold per port (InfiniBand ibdiagnet) | 1e-12 | 2025 | 13.2 | NVIDIA/Mellanox ibdiagnet manual |
| GPU node burn-in/soak duration gated before cluster acceptance | 72–168 hr | 2025 | 13.2 | Together AI / Introl validation guides |
| goodput acceptance bar: industry-avg vs best-in-class effective training time | ~90% / ~96% | 2025 | 13.2 | SemiAnalysis ClusterMAX / CoreWeave |
| CDU coolant inlet acceptance band; deviation can throttle GPUs up to ~50% | 20–25 °C | 2025 | 13.2 | NVIDIA OCP / Introl (GB200 NVL72) |
| Tier III vs Tier IV availability the redundancy-topology scripts must demonstrate | 99.982% / 99.995% | 2025 | 13.2 | Uptime Institute Tier classification |
| NVL72 heat split (liquid vs air) — the load a facility load bank cannot reproduce in the loop | ~115 / ~17 kW | 2025 | 13.2 | NVIDIA OCP / Introl |
| ANSI/NETA Acceptance Testing Specifications — the current as-installed bar for switchgear, breakers, relays and primary injection | ATS-2025 | 2025 | 13.3 | ANSI/NETA ATS-2025; NETA World Journal |
| data-center load lost on a single 230 kV fault — the synchronized ride-through failure NETA/Cx must now design against | ~1,500 MW | 2026 | 13.3 | NERC Level 3 Alert / Utility Dive |
| peak grid-demand reduction from GB300 NVL72 power-shelf energy storage (capacitor smoothing) — an L4 acceptance criterion now, not a spec sheet curiosity | up to 30% | 2025 | 13.3 | NVIDIA Developer Blog; ServeTheHome |
| rack-level electrolytic-capacitance energy storage in GB300 NVL72 power shelves (≈half the PSU volume) | 65 J/GPU | 2025 | 13.3 | NVIDIA Developer Blog / LITEON |
| Vera Rubin NVL72 rack-level storage — ~6x GB300 — with closed-loop state-of-charge control for fast transient smoothing | 400 J/GPU | 2026 (roadmap) | 13.3 | NVIDIA Vera Rubin POD blog |
| power-oversubscription headroom: training vs inference — the swing magnitude electrical acceptance must absorb | 3% vs 21% | 2025 | 13.3 | Uptime Institute Journal |
| Rubin Ultra Kyber rack on 800 VDC — the density ramp the irreversible power substrate must accept | ~600 kW | 2027 (announced) | 13.3 | SemiAnalysis / NVIDIA roadmap; The Next Platform |
| lagging power factor a reactive load bank loads the chain to — proving generator/UPS at kVA rating, not just kW | 0.8 PF | 2025 | 13.3 | Aggreko / CxPlanner commissioning practice |
| behind-the-meter gas announced by 2026 (~7 GW under construction) — the scale of the islanding problem | ~82 GW | 2026 | 13.4 | Cleanview / SemiAnalysis |
| LM2500XPRESS aeroderivative unit rating and start time; black-start-capable, grid-independent | 35 MW / 5 min | 2025 | 13.4 | GE Vernova / Crusoe (29-unit order) |
| aeroderivative gas-turbine lead time (refurb under 12 mo); the speed-to-power constraint behind islanding | 18–36 mo+ | 2025 | 13.4 | Data Center Frontier / Grid Capacity Intelligence |
| Vera Rubin rack-level energy storage for power smoothing (~6x prior gen); cuts peak current ~25% | ~400 J/GPU | 2025 | 13.4 | NVIDIA developer blog |
| data-center load lost on a single 230 kV fault; 1.5 GW dropped in 82 s (VA, 2024) — triggered NERC Level 3 alert | ~1,500 MW | 2026 | 13.4 | NERC Level 3 Alert / Utility Dive |
| microgrid-controller specification (2017) and conformance-test method (2018) — the Cx acceptance basis | IEEE 2030.7 / 2030.8 | 2017–2018 | 13.4 | IEEE Standards |
| best-in-class cluster MTBF; a single power transient that drops a synchronous job restarts from checkpoint | ~7 days / 512 GPUs | 2025 | 13.4 | SemiAnalysis (100k H100 clusters) |
| GB200 NVL72 coolant inlet spec; deviation can throttle GPUs up to ~50% | 20–25 °C | 2025 | 13.5 | NVIDIA OCP / Introl |
| DLC flow per GB200 NVL72 rack (~1.2–2.0 L/min per kW design rule) | ~80 L/min | 2025 | 13.5 | Dober / NVIDIA OCP |
| NVL72 CDU/row-level cooling capacity (per-rack heat is ~132 kW: ~115 kW liquid + ~17 kW air) | ~2.4 MW | 2025 | 13.5 | NVIDIA OCP / Introl |
| secondary-loop conductivity limit the loop is flushed to before coolant charge (DI ≥0.5 MΩ·cm) | ≤5 µS/cm | 2026 | 13.5 | Liquid-cooling commissioning practice (XD Thermal / Introl synthesis) |
| rated working pressure for hydrostatic acceptance hold (ASME B31.x / EN 13480 basis) | 1.5× | 2025 | 13.5 | Liquid-cooling commissioning practice; ASME B31 |
| install + commissioning per GB200 NVL72 system; load staged 25→50→75→100% | 2–3 weeks | 2026 | 13.5 | Introl GB200 NVL72 deployment |
| single-phase direct-to-chip share of the liquid-cooling market (the loop you are commissioning) | ~55% | 2026 | 13.5 | DCD / IDTechEx |
| best-in-class training goodput the loop must protect; a cooling trip is lost goodput | ~96% | 2025 | 13.5 | SemiAnalysis ClusterMAX / CoreWeave |
| GB300 NVL72 in-shelf energy storage for power smoothing; ~30% peak-grid reduction on Megatron training | 65 J/GPU | 2025 | 13.6 | NVIDIA Developer (GB300 steady power) |
| Vera Rubin power-smoothing reservoir target; facility BESS roles for transient/ride-through/DR | ~400 J/GPU | 2025 | 13.6 | NVIDIA (production-ready BESS for AI factories) |
| single-event large-load loss on a 230 kV fault; 1.5 GW dropped in 82 s (VA, 2024) — the ride-through problem IST must prove against | ~1,500 MW | 2026 | 13.6 | NERC Level 3 Alert / Utility Dive |
| GB200/GB300 NVL72 coolant inlet window and thermal ride-through envelope; deviation throttles GPUs up to ~50% | 20–25 °C | 2025 | 13.6 | NVIDIA OCP / Introl |
| power-oversubscription headroom, training vs inference: why transient behavior differs between workloads that IST cannot run | 3% vs 21% | 2025 | 13.6 | Uptime Institute Journal |
| single-phase direct-to-chip share of liquid-cooling market — the loop IST load banks cannot exercise at real heat flux | ~55% | 2026 | 13.6 | DCD / IDTechEx |
| typical IST planning horizon before a full-facility Level 5 campaign | weeks-to-months | 2025 | 13.6 | Construct & Commission (L5 IST guide) |
| post-FEC BER pass floor for AI fabric links (tightening toward 1e-13 at the highest lane rates) | ~1e-12 | 2025 | 13.7 | IEEE 802.3 / IBTA link specifications; practitioner acceptance plans |
| PAM4 SerDes per-lane rate driving 800G/1.6T links (FEC-mandatory, BER-screening-critical) | 100–200 Gb/s | 2025 | 13.7 | SemiAnalysis (AI networks); provenance.js optics ladder |
| minimum link-flap soak under line-rate load at operating temperature before a link is accepted | ≥24 hr | 2025 | 13.7 | Practitioner fabric-commissioning practice; Keysight test methodology |
| InfiniBand point-to-point latency (tuned RoCEv2 ~1.5–2.5 µs); the acceptance band for ib_*_lat | ~1–2 µs | 2025 | 13.7 | SemiAnalysis / NVIDIA; provenance.js IB-vs-RoCE |
| PTP accuracy held across a Spectrum switch; ConnectX-class NIC timestamping under ~4 ns variance | ~10 ns | 2025 | 13.7 | NVIDIA Technical Blog, Spectrum switch time-sync |
| fleet PTP offset-from-master target the time-sync gate must demonstrate, under load and across every node | sub-µs | 2024 | 13.7 | Engineering at Meta (SPTP); IEEE 1588 practice |
| effective throughput a well-tuned AI Ethernet fabric (Spectrum-X) sustains — the congestion-gate target | ~95% | 2025 | 13.7 | NVIDIA (Spectrum-X xAI Colossus) |
| NVLink aggregate per GB200 NVL72 rack the scale-up gate verifies whole (1.8 TB/s/GPU, NVLink 5) | ~130 TB/s | 2025 | 13.7 | NVIDIA; provenance.js NVLink |
| GPU node burn-in / soak window (3-day minimum to 7-day strict acceptance) | 72–168 hr | 2025 | 13.8 | Together AI seven-phase guide; Introl validation frameworks; ClusterMAX 2.0 |
| bring-up burn-in period before a new cluster's failure rate decays toward the mature baseline | 3–4 weeks | 2025 | 13.8 | SemiAnalysis (100k H100 clusters) |
| mature best-in-class H100 MTBF; freshly-racked clusters fail far more often | ~7 days / 512 GPUs | 2025 | 13.8 | SemiAnalysis (100k H100 clusters) |
| unplanned interruptions on 16,384 H100s during Llama 3 405B — ~1 every 3 hr | 419 in 54 days | 2024 | 13.8 | Meta (Llama 3 paper) / Tom's Hardware |
| Llama 3 interruptions attributed to faulty GPU and to HBM3 — together >½ of hardware faults | ~30% + ~17% | 2024 | 13.8 | Meta (Llama 3 paper) / DataCenterDynamics |
| machines affected by silent data corruption (SDC) at fleet scale | ~1 in 1,000 | 2025 | 13.8 | Meta Engineering (How Meta keeps its AI hardware reliable) |
| expected SDC events during a large-scale training run (Meta; Google reports similar for Gemini) | every 1–2 weeks | 2025–2026 | 13.8 | Meta Engineering; IEEE / arXiv SDC studies |
| DCGM -r 4 (deep, incl. memtest + EUD) runtime per node, GPU-count dependent | ~1.5 hr | 2026 | 13.8 | NVIDIA DCGM Diagnostics documentation |
| in-domain all-reduce busbw on GB200 NVL72 (vs 900 GB/s/GPU NVLink 5 ceiling); scale-out gate set as % of this | 870–928 GB/s | 2025 | 13.9 | NCCL tests on GB200 NVL72 (Crusoe / Nebius / NVIDIA tuning guide) |
| checkpoint state to size the write path; keep checkpoint stall <10% of step time | ~14 bytes/param | 2025 | 13.9 | VAST Data checkpoint survey (85k+ checkpoints) |
| failure cadence in a 100k-accelerator cluster at full utilization — why checkpoint bandwidth is an acceptance gate | ~every 30 min | 2025 | 13.9 | MLCommons MLPerf Storage v2.0 |
| aggregate storage bandwidth per ~1,024 GPUs (write ≥ ½ read design rule) | 250–400 GB/s | 2025 | 13.9 | NVIDIA DGX SuperPOD reference architecture |
| industry-average vs best-in-class measured training goodput — the number the SLA is set against | ~90% / ~96% | 2025 | 13.9 | SemiAnalysis ClusterMAX 2.0 / CoreWeave |
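
Several rows above carry derived figures (an interruption cadence, a rack-level NVLink aggregate, a bytes-per-parameter sizing rule). The sketch below reproduces that arithmetic so the derived values can be checked against their inputs. The function names are hypothetical, the 72-GPU count is the NVL72 rack size, and applying the ~14 bytes/param rule to a 405B-parameter model is an illustrative assumption, not a figure from the register.

```python
# Inputs are copied from the register rows; everything else is plain arithmetic.
# Function names are hypothetical helpers, not part of any tool named in the table.

HOURS_PER_DAY = 24


def interruption_interval_hr(events: int, days: float) -> float:
    """Mean hours between interruptions over an observation window."""
    return days * HOURS_PER_DAY / events


def nvlink_aggregate_tb_s(gpus: int, per_gpu_tb_s: float) -> float:
    """Rack-level NVLink aggregate as GPU count times per-GPU bandwidth."""
    return gpus * per_gpu_tb_s


def checkpoint_size_tb(params: float, bytes_per_param: float = 14.0) -> float:
    """Checkpoint state in TB (1e12 bytes) from the bytes-per-parameter rule."""
    return params * bytes_per_param / 1e12


if __name__ == "__main__":
    # 419 interruptions in 54 days (register row, section 13.8)
    print(round(interruption_interval_hr(419, 54), 2))   # 3.09
    # 72 GPUs per NVL72 rack at 1.8 TB/s each (register row, section 13.7)
    print(round(nvlink_aggregate_tb_s(72, 1.8), 1))      # 129.6
    # Assumption: a 405B-parameter model at ~14 bytes/param
    print(round(checkpoint_size_tb(405e9), 2))           # 5.67
```

The first two results line up with the "~1 every 3 hr" and "~130 TB/s" entries; the checkpoint figure is only as good as the assumed model size and the survey's per-parameter average.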