Glossary
408 terms across power, cooling, compute, networking, reliability, and economics.
$/GPU-hr
The all-in cost to operate one accelerator for one hour; the core unit economic for build-versus-rent decisions.
$/M-tokens
The cost to serve one million tokens, the revenue-side unit economic for inference businesses.
24/7 CFE · Carbon-Free Energy
Matching every hour of consumption with carbon-free generation, a stricter goal than annual renewable matching.
2N
Redundancy where the entire power/cooling system is fully mirrored, so any single path can fail with zero impact.
800 VDC
Direct-current rack distribution for megawatt-class racks that cuts conversion stages and copper versus traditional AC.
ABS · Asset-Backed Securitization
Financing that bundles cash-flowing assets (such as leases or GPUs) into securities sold to investors.
Aeroderivative turbine
A jet-engine-derived gas turbine that starts fast and ramps quickly, suited to on-site or backup power.
Air-gap
Physically isolating a system or network from any external connection to protect highly sensitive workloads.
All-gather
A collective that assembles each GPU's data shard onto every GPU, common in sharded training and inference.
All-reduce
A collective operation that sums gradients across all GPUs and shares the result, the dominant traffic in training.
Anycast
Routing that sends a request to the nearest of several sites sharing one address, used for low-latency global serving.
Approach temperature
The gap between a coolant's temperature and the medium it rejects heat to; a small approach demands a bigger exchanger.
Archetype
A reference workload pattern (pretraining, post-training, RL, online or batch inference, edge) that drives design choices.
Arithmetic intensity
The ratio of compute operations to bytes moved; it determines whether a kernel is compute- or memory-bound.
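As a rough illustration of the compute-bound versus memory-bound test, the sketch below compares a kernel's arithmetic intensity with a chip's machine balance (peak FLOP/s divided by peak bytes/s). The function names and the accelerator figures are hypothetical, chosen only to make the arithmetic visible.

```python
def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """FLOPs performed per byte moved to or from memory."""
    return flops / bytes_moved


def is_compute_bound(flops: float, bytes_moved: float,
                     peak_flops: float, peak_bandwidth: float) -> bool:
    """Roofline-style test: compute-bound if intensity exceeds machine balance."""
    machine_balance = peak_flops / peak_bandwidth  # FLOPs per byte the chip can feed
    return arithmetic_intensity(flops, bytes_moved) > machine_balance


# Hypothetical accelerator: 1e15 FLOP/s peak, 4e12 B/s memory bandwidth.
# Illustrative matrix-vector product with a 4096 x 4096 weight matrix in 2-byte precision.
n = 4096
flops = 2 * n * n        # one multiply and one add per weight
bytes_moved = 2 * n * n  # weight traffic only; the small vectors are ignored
print(arithmetic_intensity(flops, bytes_moved))
print(is_compute_bound(flops, bytes_moved, 1e15, 4e12))
```

Under these assumed numbers the matrix-vector product lands far below the machine balance, which is why decode-style kernels are usually memory-bound.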
ASHRAE
The engineering society whose thermal guidelines define the temperature and humidity envelopes for IT equipment.
ASHRAE TC 9.9
The industry committee whose A1-A4 air and W17-W45 water classes define the temperature envelopes IT gear can run in.
ASIC · Application-Specific Integrated Circuit
A chip hard-wired for one task; AI ASICs trade flexibility for efficiency versus general-purpose GPUs.
ATS · Automatic Transfer Switch
A switch that automatically moves load from utility to backup generator when grid power fails.
Autoregressive
Generating output one token at a time, each conditioned on all previous tokens, the basis of LLM decoding.
Availability zone · AZ
An isolated data-center location within a cloud region, designed to fail independently of its peers.
Badput
Compute spent on work that is wasted (failed, redundant or thrown away), the opposite of goodput.
Battery Backup Unit · BBU
Rack-mounted battery providing short ride-through and load-transient smoothing for AI racks.
Battery Energy Storage System · BESS
Large battery installation buffering power transients and bridging outages at the facility level.
BBU · Battery Backup Unit
Rack-level battery that rides through power dips and absorbs sudden GPU load swings before the BESS or genset reacts.
Behind-the-meter · BTM
On-site generation connected on the customer side of the utility meter, bypassing grid interconnection delays.
BER · Bit-Error Rate
The fraction of transmitted bits received in error; a key quality metric for high-speed optical and copper links.
BESS · Battery Energy Storage System
Facility-scale battery bank that smooths load transients, provides ride-through, and can bridge to generators.
BF16 · Brain Float 16
A 16-bit floating-point format with a wide exponent range, a common default for stable AI training.
Bisection bandwidth
The aggregate bandwidth across the worst-case cut that splits a network into two equal halves; the key figure for all-reduce-heavy training.
Black start
Restarting generation and energizing a grid or site from a complete shutdown without external power.
Black-building test · Pull-the-plug test
A commissioning test that cuts utility power to prove backup generation and transfer carry the full load.
Blast radius · Fault domain
The scope of impact when something fails; good design shrinks the blast radius so one fault affects few resources.
Blowdown
Periodically draining concentrated mineral-laden water from a cooling tower to control scale, a key water loss.
BMC · Baseboard Management Controller
An always-on chip that remotely monitors and manages a server's hardware independent of its CPU and OS.
BMS · Building Management System
The control system that runs a facility's mechanical and electrical equipment such as cooling, power and alarms.
Breakeven utilization
The fraction of capacity that must be sold or used for a facility's revenue to cover its costs.
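A minimal sketch of the breakeven arithmetic, assuming a single flat price per GPU-hour and an all-in annual cost that does not vary with utilization. Both are simplifications, and the function name and figures are hypothetical.

```python
HOURS_PER_YEAR = 8760  # 365 days; leap years ignored


def breakeven_utilization(annual_cost: float, gpu_count: int,
                          price_per_gpu_hour: float) -> float:
    """Fraction of GPU-hours that must be sold for revenue to equal cost."""
    max_annual_revenue = gpu_count * price_per_gpu_hour * HOURS_PER_YEAR
    return annual_cost / max_annual_revenue


# Hypothetical site: 1,000 GPUs costing $1.00 per GPU-hour all-in, sold at $2.00 per GPU-hour.
annual_cost = 1000 * 1.00 * HOURS_PER_YEAR
print(breakeven_utilization(annual_cost, 1000, 2.00))
```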
Bring-up
The process of powering, validating and tuning a new cluster until it passes acceptance and can run real jobs.
Brownfield
A project that reuses or retrofits an existing building or site rather than building new.
Build-to-suit · BTS
A data center custom-built and leased to a specific tenant's specifications, usually under a long-term contract.
Burn-in
Running new hardware hard for a period to surface early ('infant mortality') failures before production use.
Busbar
A solid metal conductor distributing high current within a rack or system, replacing bulky cabling.
Busway
Overhead enclosed busbar run with tap-off boxes that distributes power flexibly across rows of racks.
CAGR · Compound Annual Growth Rate
The smoothed yearly growth rate of a quantity over a period, used to project demand, cost or capacity.
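The standard CAGR formula, (end / start)^(1 / years) - 1, can be sketched as below. The sample values are illustrative only.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate between two values over a span of years."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Illustrative: capacity growing from 10 to 40 units over 2 years.
print(f"{cagr(10, 40, 2):.0%}")
```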
Caliptra
An open-source silicon root-of-trust block letting chips verify their own firmware, backed by OCP and hyperscalers.
Capex · Capital Expenditure
Up-front spending on long-lived assets like buildings, power gear and GPUs, depreciated over their useful life.
Cascade-to-inference
Repurposing older training GPUs for inference as newer chips take over training, extending hardware economic life.
CCGT · Combined-Cycle Gas Turbine
A high-efficiency gas power plant that reuses turbine exhaust heat to drive a steam turbine, a behind-the-meter option.
CDN · Content Delivery Network
A distributed network of caching servers that delivers content from locations close to users.
CDU · Coolant Distribution Unit
Heat exchanger plus pumps isolating the clean technology loop from facility water and providing leak containment.
CFD · Computational Fluid Dynamics
Simulation of airflow and heat that engineers use to design and verify data-center cooling.
Chain-of-thought · CoT
Prompting or training a model to reason in explicit intermediate steps, trading more tokens for better answers.
Checkpoint
A periodic save of model weights and optimizer state so a long training run can resume after a failure.
Chip-on-Wafer-on-Substrate · CoWoS
TSMC advanced-packaging process integrating compute and memory on an interposer; an upstream supply chokepoint.
Chiplet
A smaller die combined with others in one package, letting designers mix processes and beat single-die size limits.
Chunked prefill
Breaking a long prompt's prefill into chunks interleaved with decode so latency stays steady under load.
Circular financing
Arrangements where a chip vendor invests in customers who use the money to buy its chips, raising scrutiny over demand.
ClusterMAX
SemiAnalysis's rating system grading GPU cloud providers on reliability, performance and operational maturity.
CMMC · Cybersecurity Maturity Model Certification
The US Defense Department framework certifying contractors' cybersecurity to handle controlled unclassified information.
Co-Packaged Optics · CPO
Bringing optical engines into the chip package to push high-speed links further than copper allows.
COD · Commercial Operation Date
The date a facility or power asset begins commercial service, often a contractual and financing milestone.
Cold plate
A liquid-cooled metal block clamped to a chip that conducts heat into the coolant in a direct-liquid-cooling loop.
Collective
A coordinated communication pattern across many GPUs (all-reduce, all-gather, reduce-scatter) central to distributed AI.
Colocation · Colo
Renting space, power and cooling in a shared facility, in wholesale (large blocks) or retail (rack-level) form.
Commissioning · Cx
The structured testing (levels L1-L5) that proves a facility works correctly under load before it goes live.
Concurrent maintainability
The ability to maintain or replace any component without shutting down IT load, the defining trait of Tier III.
Confidential computing
Protecting data while it is being processed by running it inside a hardware trusted execution environment.
Containment
Physically separating hot and cold air (cold-aisle, hot-aisle, or chimney) so cooling air is not wasted by mixing.
Continuous batching · In-flight batching
Dynamically adding and removing requests from a running inference batch to keep the GPU busy and lift throughput.
Coolant Distribution Unit · CDU
Unit that pumps and conditions coolant to racks while isolating it from the facility water system.
Cooling cliff · Density wall
The rack-power point (~100 kW) above which air cooling fails and liquid cooling becomes mandatory.
Cooling distribution · L2L / L2A
Heat-exchange schemes moving heat liquid-to-liquid (L2L) or liquid-to-air (L2A) between cooling loops.
Cooling tower
A structure that rejects heat by evaporating water; efficient but the main driver of data-center water consumption.
COP · Coefficient of Performance
A heat pump or chiller's ratio of heat moved to electricity consumed; higher means more efficient cooling or heating.
Cordon and drain
Marking a node unschedulable and moving its work off so it can be serviced without disrupting the cluster.
CoWoS · Chip-on-Wafer-on-Substrate
TSMC's 2.5D packaging that joins logic die and HBM stacks on a silicon interposer; its wafer capacity gates AI supply.
CPO · Co-Packaged Optics
Optics integrated into the switch or accelerator package to beat copper-reach limits at the cost of serviceability.
CRAC · Computer Room Air Conditioner
A refrigerant-based room cooling unit; the legacy air-cooling workhorse now giving way to liquid for AI density.
CRAH · Computer Room Air Handler
A chilled-water room air handler that cools the data hall, more efficient than refrigerant-based CRAC units.
Critical path · CPM
The longest chain of dependent tasks, where any slip delays the whole project; every task off it has float.
CUDA
NVIDIA's programming platform for general-purpose GPU computing; the software moat underpinning its AI dominance.
CUE · Carbon Usage Effectiveness
Kilograms of CO2-equivalent emitted per kWh of IT energy; the carbon companion to PUE.
Curtailable load
Load a site agrees to reduce on the grid operator's signal, trading interruptions for faster or cheaper interconnection.
Curtailment
Forced reduction of a load's or generator's output, often to manage grid constraints; AI loads may trade it for speed.
CxA · Commissioning Agent
The independent party that plans and verifies commissioning to confirm the facility performs as designed.
CXL · Compute Express Link
A cache-coherent interconnect over PCIe that lets CPUs, accelerators and memory pools share memory across devices.
DAC · Direct Attach Copper
A copper cable carrying high-speed signals over short reach, cheaper than optics for in-rack and adjacent links.
DAOS · Distributed Asynchronous Object Storage
An open-source high-performance storage system built for NVMe and persistent memory in HPC and AI clusters.
Data gravity
The tendency for large datasets to attract compute and services, making data expensive and slow to move.
Data parallelism · DP
Replicating the model across GPUs that each process different data and synchronize gradients via all-reduce.
Data residency
The requirement that data be stored and processed within a specific country or jurisdiction.
Data sovereignty
The principle that data is subject to the laws of the nation where it is collected or stored.
DCGM · Data Center GPU Manager
NVIDIA's tool for monitoring, diagnosing and health-checking GPUs across a fleet.
DCI · Data Center Interconnect
The long-haul links and equipment connecting separate data-center sites, increasingly used to scale AI across buildings.
DCIM · Data Center Infrastructure Management
Software that monitors and manages a facility's power, cooling, space and assets in one place.
DCQCN
The congestion-control algorithm tuning RoCE flows using ECN and PFC; when mis-tuned, it causes victim flows and stalls.
DDTL · Delayed-Draw Term Loan
A loan committed up front but drawn in stages as construction milestones are hit, matching financing to capital needs.
Decode
The memory-bandwidth-bound phase that generates output tokens one at a time after prefill.
Delta-T
The temperature rise of coolant across a cold plate or heat exchanger; it sizes flow rate and the warm-water loop.
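How Delta-T sizes flow rate follows from the heat balance Q = m * cp * Delta-T. The sketch below assumes plain-water properties (specific heat 4186 J/kg-K, density about 1 kg/L); glycol mixes such as PG25 have a lower specific heat and need more flow. The function name is hypothetical.

```python
def coolant_flow_lpm(heat_watts: float, delta_t_kelvin: float,
                     cp_j_per_kg_k: float = 4186.0,
                     density_kg_per_l: float = 1.0) -> float:
    """Coolant flow in litres per minute needed to carry heat_watts at a given Delta-T."""
    mass_flow_kg_s = heat_watts / (cp_j_per_kg_k * delta_t_kelvin)  # from Q = m * cp * dT
    return mass_flow_kg_s / density_kg_per_l * 60.0


# Illustrative 100 kW rack at two loop Delta-T values (water properties assumed).
for dt in (10.0, 5.0):
    print(f"Delta-T {dt} K -> {coolant_flow_lpm(100_000, dt):.1f} L/min")
```

Halving Delta-T doubles the required flow, which is why a wider Delta-T allows smaller pumps and pipes.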
Demand response
Reducing or shifting electricity use on the grid operator's request in exchange for payments or cheaper rates.
Depreciation
Spreading an asset's cost over its useful life; book life and economic life can differ and reshape reported margins.
Design-basis
The frozen set of design assumptions and requirements that, once signed, lets long-lead gear be ordered.
DGX
NVIDIA's fully integrated AI server and SuperPOD reference system sold as a turnkey appliance.
Dielectric fluid
A non-conductive liquid (such as engineered fluorocarbons) used in immersion cooling so it can contact electronics safely.
Digital twin
A live software model of a physical facility or system, used to validate designs and optimize operations.
Direct-to-chip liquid cooling · DLC
Cooling that pipes liquid through cold plates mounted directly on processors, the standard for dense AI racks.
Disaggregated serving · P/D disaggregation
Running prefill and decode on separate GPU pools so each scales independently for better efficiency.
Distributed-redundant · 3N/2
A redundancy scheme (e.g. 3N/2, 4N/3) spreading reserve capacity across multiple paths for efficiency over pure 2N.
District heating
A network piping waste heat to warm nearby buildings; a primary outlet for data-center heat reuse in Europe.
DLC · Direct Liquid Cooling
Cold plates on hot chips fed by an isolated coolant loop; the default above the ~100 kW/rack air-cooling cliff.
DPU · Data Processing Unit
A programmable NIC (e.g. BlueField) that offloads networking, storage and security from the host CPU.
Dry cooler
A finned coil that rejects heat to ambient air without evaporating water, saving water at the cost of efficiency on hot days.
DSCR · Debt-Service Coverage Ratio
Operating cash flow divided by debt payments; lenders require it above a threshold to ensure loans are serviceable.
DWPD · Drive Writes Per Day
How many times an SSD's full capacity can be overwritten daily over its warranty, a key endurance rating.
EBITDA
Earnings before interest, taxes, depreciation and amortization; a proxy for operating cash generation.
ECC · Error-Correcting Code
Memory protection that detects and corrects bit errors; uncorrectable ECC errors signal failing memory.
ECN · Explicit Congestion Notification
A signaling method that marks packets to throttle senders before congestion forces drops, key to lossless AI Ethernet.
Edge inference
Running AI models close to where data is generated, at the network edge, to reduce latency and backhaul.
EDP · Energy-Delay Product
Energy multiplied by latency per operation; a chip figure of merit penalizing designs that are slow or power-hungry.
Egress
Data leaving a cloud or region, typically billed at a premium and a major hidden cost in AI pipelines.
Elastic training
Training that can continue at a reduced GPU count when nodes fail and absorb them back when restored.
Embodied carbon
The greenhouse-gas emissions from making and building hardware and facilities, separate from operating energy.
EN 50600
The European data-center facility standard, with Availability Classes paralleling the Uptime Tier scheme.
Energy Reuse Factor · ERF
Fraction of facility energy recaptured and exported as useful heat rather than rejected to atmosphere.
EPC · Engineering, Procurement and Construction
A delivery model where one contractor designs, buys and builds the project, often under a fixed price.
EPMS · Electrical Power Monitoring System
A system that continuously monitors and records the facility's electrical distribution for reliability and analysis.
Erasure coding
Splitting data into fragments plus parity so it survives multiple drive failures using far less overhead than full copies.
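The overhead comparison can be sketched with simple arithmetic, assuming a k+m scheme (k data fragments, m parity fragments) that survives the loss of any m fragments. The 8+3 layout is an illustrative choice, not one named in this glossary.

```python
def storage_overhead(data_fragments: int, parity_fragments: int) -> float:
    """Raw bytes stored per byte of user data for a k+m erasure-coded layout."""
    return (data_fragments + parity_fragments) / data_fragments


def replication_overhead(copies: int) -> float:
    """Raw bytes stored per byte of user data when keeping full copies."""
    return float(copies)


# Illustrative: 8+3 erasure coding tolerates 3 lost fragments;
# 3 full copies tolerate only 2 lost copies yet store far more raw data.
print(storage_overhead(8, 3))
print(replication_overhead(3))
```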
ERCOT
The grid operator for most of Texas, a frequent AI-data-center destination known for fast interconnection and volatility.
ERF · Energy Reuse Factor
Fraction of facility energy exported as useful heat (e.g. district heating); a rare metric where higher is better.
EVM · Earned Value Management
A method tracking project cost and schedule performance by comparing planned, earned and actual value.
Expert parallelism · EP
Distributing a Mixture-of-Experts model's experts across GPUs, routing tokens to whichever GPU holds the chosen expert.
Export controls
Government restrictions (administered by BIS) limiting which advanced AI chips can be sold to which countries.
Facility water · FWS
The building's water loop that ultimately rejects heat to the outdoors, kept separate from the clean chip-cooling loop.
FAT · Factory Acceptance Test
Testing equipment at the factory before shipment to confirm it meets specification, the first commissioning step.
Fat-tree
A multi-tier leaf/spine network topology providing full, non-blocking bandwidth between any pair of nodes.
Fault tolerance
The ability to withstand any single equipment failure without disrupting IT load, the defining trait of Tier IV.
FEC · Forward Error Correction
Encoding that lets the receiver fix transmission errors without retransmission, essential at high link speeds.
FedRAMP
The US government program standardizing security authorization for cloud services used by federal agencies.
FEOC · Foreign Entity of Concern
A designation restricting subsidies or participation for entities tied to certain adversary nations.
FERC · Federal Energy Regulatory Commission
The US agency regulating interstate electricity transmission, wholesale markets and grid interconnection rules.
FLAP-D
Europe's primary data-center markets: Frankfurt, London, Amsterdam, Paris and Dublin.
Floor loading
The structural weight a floor can bear; dense liquid-cooled racks can exceed limits and need reinforced slabs.
FLOPS · Floating-Point Operations Per Second
The standard measure of compute throughput; AI clusters are rated in petaFLOPS and exaFLOPS.
FOAK · First Of A Kind
The first deployment of a novel technology, carrying higher cost and risk than later, proven units.
Fork
A point where a design or program path splits into mutually exclusive options that must be chosen between.
FP4
A 4-bit floating-point format, native in Blackwell-class silicon, pushing inference throughput and density further.
FP8
An 8-bit floating-point format that roughly doubles throughput and halves memory versus 16-bit for AI workloads.
Free cooling · Economizer
Using cool outside air or water to reject heat without running mechanical chillers, saving energy in mild conditions.
FRU · Field-Replaceable Unit
A component designed to be swapped on-site, such as a power supply, fan or drive, without sending the whole system back.
FSDP · Fully Sharded Data Parallel
A training method that shards model parameters, gradients and optimizer state across GPUs to fit larger models.
Gang scheduling
Scheduling all the GPUs a distributed job needs at once, so it either starts fully or waits, avoiding partial deadlock.
GB200
An NVIDIA Blackwell-generation system pairing two GPUs with a Grace CPU, deployed in the NVL72 rack-scale design.
GDPR · General Data Protection Regulation
The EU's comprehensive data-protection law governing how personal data is collected, processed and transferred.
GEMM · General Matrix Multiply
The dense matrix-multiply operation at the heart of neural-network compute and GPU benchmarking.
Genset
An engine-driven generator set providing backup or primary on-site power, typically diesel or natural gas.
Geo-redundancy
Replicating systems or data across geographically separate sites so one site's loss does not cause an outage.
Glycol
An antifreeze added to cooling water (commonly PG25, 25% propylene glycol) to prevent freezing and inhibit corrosion.
GMP · Guaranteed Maximum Price
A contract capping the owner's cost; the contractor absorbs overruns above the agreed maximum.
Golden image
A standardized, validated base system image cloned to every node for consistent, drift-free deployment.
Goodput
Useful work delivered per unit time after subtracting failed, restarted or stale work; the metric that actually matters.
GPU · Graphics Processing Unit
A massively parallel processor that became the workhorse of AI training and inference.
GPU:CPU ratio
The number of GPUs per CPU in a node; AI servers skew heavily toward GPUs, reshaping system balance.
GPUDirect Storage · GDS
NVIDIA technology moving data directly from storage into GPU memory, bypassing the CPU bounce buffer.
Grace
NVIDIA's Arm-based server CPU, paired tightly with its GPUs over a coherent link in superchip designs.
Gray space
The back-of-house area housing power, cooling and infrastructure equipment that supports the white space.
Greenfield
A project built from scratch on undeveloped land, contrasted with brownfield reuse of an existing site.
Grid-forming inverter
An inverter that actively sets grid voltage and frequency, providing stability that conventional grid-following inverters cannot.
GSU · Generator Step-Up transformer
Transformer that raises on-site generator output to grid or distribution voltage for behind-the-meter power.
HBM · High-Bandwidth Memory
Stacked DRAM mounted on-package with the accelerator; it sets the bandwidth and capacity ceiling and is the key supply bottleneck.
HBM3E
An enhanced generation of high-bandwidth memory shipping in 2024-2025 accelerators, faster and denser than HBM3.
HBM4
The next high-bandwidth-memory generation with wider interfaces and a logic base die, targeting late-2020s accelerators.
HBOM · Hardware Bill of Materials
An inventory of the components in a hardware product, the hardware analog of an SBOM for supply-chain assurance.
Heat reuse
Capturing data-center waste heat to warm buildings or feed district heating, improving total energy use and ERF.
HGX
NVIDIA's baseboard reference platform integrating 8 GPUs with NVLink, the building block for many AI servers.
High-Bandwidth Memory · HBM
Vertically stacked DRAM bonded next to a processor for huge memory bandwidth; AI's binding supply constraint.
Hot spare
A spare node kept ready to swap in instantly when a failure occurs, minimizing interruption to a running job.
HSM · Hardware Security Module
A tamper-resistant device that generates, stores and uses cryptographic keys, the anchor of key custody.
HV transformer · High-Voltage transformer
Large transformer stepping transmission-level voltage down to site distribution; a long-lead item that often gates schedules.
Hybrid bonding
A copper-to-copper die-stacking technique with far finer, denser connections than solder microbumps.
Hyperscaler
A giant cloud and platform operator (AWS, Microsoft, Google, Meta) building data centers at global scale.
IaC · Infrastructure as Code
Managing infrastructure through version-controlled configuration files rather than manual setup, for repeatability.
IEC 62443
The international standard series for cybersecurity of industrial automation and control systems.
Immersion cooling
Submerging servers in a non-conductive dielectric fluid that carries heat away, in single-phase or boiling two-phase form.
IMS · Integrated Master Schedule
The master project schedule linking all tasks and dependencies, from which the critical path is derived.
Inferentia
Amazon's custom AI inference accelerator, optimized for cost-efficient serving of models.
InfiniBand · IB
A low-latency lossless fabric with native RDMA, the historical default for non-blocking AI training back-ends.
Interconnection queue
The utility waitlist for connecting new large loads or generation to the grid; multi-year waits dominate AI siting.
Interposer
The silicon or organic layer carrying dense wiring between logic and memory in a 2.5D package.
IOPS · Input/Output Operations Per Second
The rate of read/write operations a storage system handles; metadata-heavy AI loads are often IOPS-bound, not bandwidth-bound.
IRR · Internal Rate of Return
The discount rate at which an investment's net present value is zero, a headline measure of project return.
ISO · Independent System Operator
An operator managing grid reliability and power markets in a region, often used interchangeably with RTO.
ISO 22237
The international data-center facility standard series, the global counterpart to Europe's EN 50600.
ISO 27001
The international standard for an information security management system (ISMS), a baseline enterprise security certification.
ISO 42001
The international standard for an AI management system (AIMS), governing responsible development and operation of AI.
IST · Integrated Systems Test
The final commissioning level (L5) that proves all facility systems work together under simulated full load.
ITUE · IT-power Usage Effectiveness
An efficiency metric pushing the boundary inside the server to capture fan, VRM and PSU losses.
Jevons paradox
The principle that efficiency gains can raise total consumption; cheaper AI inference spurs far more demand, not less.
Junction temperature · Tj
The temperature of the actual transistors inside a chip; exceeding its maximum forces throttling or risks damage.
Kubernetes · K8s
The dominant open-source container orchestration platform, increasingly used to schedule AI inference and training.
KV cache · Key-Value cache
Stored attention tensors reused during decode; its size grows with context and concurrency, dominating inference memory.
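A back-of-envelope sizing sketch, assuming a standard transformer that stores one key and one value vector per layer, per KV head, per token. The model shape below is hypothetical, and real serving stacks add paging and alignment overhead.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   context_tokens: int, concurrent_sequences: int,
                   bytes_per_element: int = 2) -> int:
    """Total KV-cache size: 2 tensors (K and V) per layer, per head, per token."""
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_element
    return per_token * context_tokens * concurrent_sequences


# Hypothetical model: 32 layers, 8 KV heads of dimension 128, 16-bit cache entries.
size = kv_cache_bytes(32, 8, 128, context_tokens=8192, concurrent_sequences=16)
print(f"{size / 2**30:.0f} GiB")
```

Doubling either the context length or the number of concurrent sequences doubles the cache.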
Kyber
NVIDIA's rack-scale platform for the Rubin Ultra generation, scaling NVLink domains to hundreds of GPUs.
Lakehouse
A data architecture (Iceberg, Delta, Hudi) adding database-like transactions and schema to cheap object storage.
Lemon node
A subtly defective node that passes basic checks but repeatedly degrades jobs, found by lemon-node detection.
LGIA · Large Generator Interconnection Agreement
The contract governing how a large facility or generator connects to the transmission grid.
Lights-out operations
Running a facility with minimal on-site staff, relying on remote management and automation.
LLMflation
The rapid collapse in the cost to serve a given level of AI capability as models and hardware improve.
LMP · Locational Marginal Price
The price of electricity at a specific grid node, reflecting local supply, demand and congestion.
Load step · Power transient
A sudden synchronized swing in GPU draw across thousands of chips that stresses the power chain in milliseconds.
Long-lead equipment
Items like transformers, switchgear and chillers whose long procurement times often gate the schedule.
LoRA · Low-Rank Adaptation
A fine-tuning method that trains small low-rank adapter matrices instead of all weights, slashing cost and memory.
Lose a nine
Industry shorthand for a tenfold increase in downtime, e.g. falling from 99.99% to 99.9% uptime.
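The downtime behind each nine can be sketched as follows, assuming a 365-day year. The function name is hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; leap years ignored


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Expected unavailable minutes per year at a given availability."""
    return (1.0 - availability_percent / 100.0) * MINUTES_PER_YEAR


for pct in (99.999, 99.99, 99.9):
    print(f"{pct}% -> {downtime_minutes_per_year(pct):.1f} min/year")
```

Each nine lost multiplies the downtime by ten.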
LOSF · Lots Of Small Files
A workload pattern of huge numbers of tiny files that stresses storage metadata far more than raw bandwidth.
Maia
Microsoft's custom AI accelerator chip, part of its in-house silicon program for Azure AI.
Makeup water
Fresh water added to a cooling system to replace what evaporation, drift and blowdown remove.
MBU · Model Bandwidth Utilization
Achieved memory bandwidth divided by peak bandwidth during memory-bound decode; the analog of MFU when inference is HBM-bound.
Measured boot · Attestation
Recording cryptographic hashes of each boot stage so a remote party can verify a system booted trusted code.
MEC · Multi-access Edge Computing
Placing compute near users at the network edge (e.g. telco sites) to cut latency for real-time AI services.
Memory-bandwidth-bound
A workload limited by how fast data moves from memory rather than by compute; typical of inference decode.
Merchant
Revenue or power sold on the open market without a long-term contract, carrying price risk versus contracted supply.
MFU · Model FLOPs Utilization
Achieved FLOPs divided by peak FLOPs in a training run; 35-55% is good at scale, eroded by collectives and stragglers.
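One common way to estimate MFU is sketched below. It assumes the widely used approximation of about 6 FLOPs per parameter per token for dense transformer training (forward plus backward pass), which ignores attention FLOPs. The cluster figures are hypothetical.

```python
def mfu(params: float, tokens_per_second: float,
        gpu_count: int, peak_flops_per_gpu: float) -> float:
    """Model FLOPs utilization using the ~6 * params FLOPs-per-token approximation."""
    achieved_flops = 6.0 * params * tokens_per_second  # assumed dense-transformer cost
    peak_flops = gpu_count * peak_flops_per_gpu
    return achieved_flops / peak_flops


# Hypothetical run: 70e9-parameter model, 1e6 tokens/s, 1,024 GPUs at 1e15 FLOP/s peak each.
print(f"{mfu(70e9, 1e6, 1024, 1e15):.1%}")
```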
MGX
NVIDIA's modular server reference architecture letting partners build varied GPU systems from common building blocks.
MIG · Multi-Instance GPU
An NVIDIA feature partitioning one GPU into isolated instances so multiple workloads share it securely.
Mixture of Experts · MoE
An architecture routing each token to a few specialized sub-networks, widening parallelism and reshaping fabric needs.
Model FLOPs Utilization · MFU
How much of a chip's theoretical compute a training run actually uses; the headline training-efficiency metric.
MoE · Mixture of Experts
A sparse model that activates only a subset of expert sub-networks per token, cutting compute per token at scale.
Mt Diablo
An OCP initiative, led by Microsoft and Meta, defining a 400 VDC sidecar power-rack architecture for megawatt-class AI racks.
MTBF · Mean Time Between Failures
The average operating time between failures of a component or system, a core reliability input.
MTIA · Meta Training and Inference Accelerator
Meta's custom AI accelerator family, built to run its recommendation and language workloads more cheaply than GPUs.
MTTI · Mean Time To Interruption
The average time a large training job runs before something interrupts it, a key scaling-reliability metric.
MTTR · Mean Time To Repair
The average time to restore a failed component to service, a core driver of overall availability.
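How MTTR drives availability can be sketched with the standard steady-state relation, availability = MTBF / (MTBF + MTTR). The hours used are illustrative.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability from mean time between failures and mean time to repair."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Illustrative component: fails every 999 hours on average, repaired in 1 hour or in 10 hours.
print(availability(999.0, 1.0))
print(availability(999.0, 10.0))
```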
MVA · Megavolt-Ampere
Unit of apparent power used to size transformers and switchgear, accounting for both real and reactive load.
MW · Megawatt
Unit of real power; in AI data centers it has become the de facto unit of compute capacity and grid demand.
N+1
Redundancy with one spare component beyond what the load needs, tolerating a single failure or maintenance event.
NCCL · NVIDIA Collective Communications Library
NVIDIA's library implementing optimized multi-GPU collective operations like all-reduce over NVLink and the fabric.
Neocloud
A new breed of GPU-focused cloud provider (CoreWeave, Lambda and peers) renting AI compute outside the big hyperscalers.
NEPA · National Environmental Policy Act
The US law requiring federal projects to assess environmental impacts, a potential permitting gate for some sites.
NERC
The North American body setting and enforcing mandatory grid reliability standards, including critical-infrastructure rules.
NIST SP 800-53
The US catalog of security and privacy controls for federal information systems, a foundation for many compliance regimes.
Non-blocking
A network that can carry full bandwidth between all node pairs simultaneously with no internal contention (1:1).
NPDES
The US permit program regulating pollutant discharges to surface waters, governing cooling-water blowdown.
NPV · Net Present Value
The present value of future cash flows minus the investment, the core go/no-go metric for capital projects.
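A minimal NPV sketch, assuming annual cash flows with the up-front investment entered as a negative amount at year 0. The cash flows are made-up figures.

```python
from typing import Sequence


def npv(rate: float, cash_flows: Sequence[float]) -> float:
    """Net present value; cash_flows[0] occurs today, cash_flows[t] after t years."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))


# Illustrative project: invest 100 today, receive 60 in each of the next two years.
flows = [-100.0, 60.0, 60.0]
print(round(npv(0.08, flows), 2))
```

The discount rate at which `npv` returns zero is the project's IRR.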
NVL72
An NVIDIA rack connecting 72 Blackwell GPUs into one NVLink domain that behaves as a single huge accelerator.
NVLink
NVIDIA's high-bandwidth GPU-to-GPU interconnect for tying many GPUs into one memory-coherent scale-up domain.
NVMe · Non-Volatile Memory Express
The high-speed protocol for SSDs over PCIe, the storage interface standard in modern AI servers.
NVSwitch
NVIDIA's switch chip that fully connects all GPUs in a scale-up domain and can do in-network reductions.
Object storage
Storage that keeps data as objects in a flat namespace accessed by API (e.g. S3), the backbone for AI datasets.
OCP · Open Compute Project
An industry community that open-sources data-center hardware designs for racks, power, cooling and security.
Off-gas detection
Sensing the gases a failing battery cell vents, an early-warning trigger before thermal runaway and fire.
Opex · Operating Expenditure
Ongoing running costs such as power, water, staff and maintenance, expensed as incurred.
Optimizer state
The extra per-parameter data an optimizer like Adam keeps (momentum, variance), often doubling or tripling memory needs.
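A rough sizing sketch, assuming Adam keeps two buffers per parameter and stores them in 32-bit floats (the helper name and the 7B-parameter model are hypothetical; real frameworks may use other precisions or shard the state):

```python
def adam_state_bytes(num_params: int, bytes_per_value: int = 4) -> int:
    """Bytes for Adam's two per-parameter buffers (momentum and variance)."""
    buffers_per_param = 2
    return num_params * buffers_per_param * bytes_per_value


params = 7_000_000_000           # hypothetical 7B-parameter model
weights_fp32 = params * 4        # the weights alone, stored in fp32
state = adam_state_bytes(params)
total = weights_fp32 + state

print(f"weights: {weights_fp32 / 1e9:.0f} GB, optimizer state: {state / 1e9:.0f} GB")
print(f"weights + state is {total / weights_fp32:.0f}x the weights alone")
```

Under these assumptions the state is twice the size of the fp32 weights, so weights plus state come to three times the weights alone.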
ORR · Operational Readiness Review
A formal gate confirming a facility or cluster is ready to safely take on production load.
ORV3 · Open Rack V3
The Open Compute Project's third-generation rack standard, defining power, busbar and form-factor for open hardware.
OSAT · Outsourced Semiconductor Assembly and Test
Companies that package and test chips after fabrication, a key and capacity-constrained step for advanced AI silicon.
OT · Operational Technology
The control systems running physical infrastructure (power, cooling, building management), a growing cyber-attack surface.
Oversubscription
Provisioning less network bandwidth than full non-blocking would require; 1:1 is non-blocking, 3:1 is oversubscribed.
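The ratio can be sketched for a single switch tier as downlink bandwidth divided by uplink bandwidth; the port counts below are a hypothetical leaf switch, not a reference design:

```python
def oversubscription_ratio(downlink_gbps: float, uplink_gbps: float) -> float:
    """Host-facing bandwidth divided by uplink bandwidth on one switch tier."""
    if uplink_gbps <= 0:
        raise ValueError("uplink bandwidth must be positive")
    return downlink_gbps / uplink_gbps


# Hypothetical leaf: 48 x 400G ports facing hosts, 16 x 400G ports facing spines.
ratio = oversubscription_ratio(48 * 400, 16 * 400)
print(f"{ratio:.0f}:1")  # a value of 1 would be non-blocking
```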
P50 / P90
Confidence levels for a schedule or estimate: P50 is the median outcome (met half the time); P90 is the more conservative value met 90% of the time.
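One way to obtain such figures is to read them off a simulated distribution. The sketch below assumes the schedule convention (P90 is the duration not exceeded in 90% of outcomes) and uses a hypothetical three-phase project with made-up triangular duration estimates:

```python
import random
import statistics


def schedule_percentiles(durations):
    """Return (P50, P90) from simulated durations using decile cut points."""
    deciles = statistics.quantiles(durations, n=10)  # 9 cut points: P10 .. P90
    return deciles[4], deciles[8]


rng = random.Random(42)  # fixed seed so the sketch is repeatable
# Hypothetical: three sequential phases, each 4 to 9 months, most likely 5.
simulated = [
    sum(rng.triangular(4, 9, 5) for _ in range(3)) for _ in range(10_000)
]
p50, p90 = schedule_percentiles(simulated)
print(f"P50 = {p50:.1f} months, P90 = {p90:.1f} months")
```

The gap between the two numbers is the contingency a planner would carry to move from a coin-flip date to a 90%-confidence date.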
PagedAttention
A vLLM technique that manages the KV cache in fixed pages like virtual memory, cutting waste and fragmentation.
Parallel file system
A storage system (Lustre, GPFS, WEKA, VAST) serving many clients at once with high aggregate bandwidth for AI clusters.
PCIe · Peripheral Component Interconnect Express
The standard high-speed bus connecting CPUs, GPUs, NICs and storage inside a server.
PDU · Power Distribution Unit
Equipment that distributes electrical power to racks; rack PDUs are the metered strips feeding individual servers.
PEFT · Parameter-Efficient Fine-Tuning
A family of methods (like LoRA) that adapt large models by training only a tiny fraction of parameters.
PFAS
Persistent fluorinated 'forever chemicals' found in some dielectric coolants, raising environmental and regulatory concern.
PFC · Priority Flow Control
An Ethernet mechanism that pauses traffic to prevent packet loss, making RoCE lossless but risking head-of-line blocking.
Phase gate · Stage gate
A go/no-go decision point between project phases where deliverables are reviewed and capital is released.
Phase I ESA · Environmental Site Assessment
A desktop and walkover review of a site's environmental history to flag contamination risk before purchase.
PILOT · Payment In Lieu Of Taxes
A negotiated payment a data center makes instead of standard property taxes, often part of siting incentives.
Pipeline bubble
Idle GPU time at the start and end of pipeline-parallel execution while the pipeline fills and drains.
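For a simple schedule in which every stage takes equal time per micro-batch (an assumption that roughly matches GPipe-style and non-interleaved 1F1B schedules), the idle fraction is commonly estimated as (p - 1) / (m + p - 1) for p stages and m micro-batches. A minimal sketch with hypothetical sizes:

```python
def pipeline_bubble_fraction(stages: int, micro_batches: int) -> float:
    """Fraction of each step spent idle while the pipeline fills and drains.

    Assumes equal per-stage time and a non-interleaved schedule.
    """
    if stages < 1 or micro_batches < 1:
        raise ValueError("stages and micro_batches must be at least 1")
    return (stages - 1) / (micro_batches + stages - 1)


# Hypothetical 8-stage pipeline: more micro-batches shrink the bubble.
for m in (8, 32, 128):
    print(m, f"{pipeline_bubble_fraction(8, m):.1%}")
```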
Pipeline parallelism · PP
Splitting a model's layers across GPU groups that process different micro-batches in an assembly line.
PJM
The largest US regional grid operator, covering the mid-Atlantic and parts of the Midwest; its territory is a major data-center hub with long interconnection queues.
Point of interconnection · POI
The physical point where a facility's electrical system connects to the utility grid.
PoP · Point of Presence
A network access location where a provider's infrastructure meets users or other networks.
Post-training
The fine-tuning and alignment stages (SFT, RLHF) after pretraining that shape a model's behavior and usefulness.
Power capping
Limiting how much power chips or racks can draw to stay within facility limits, at the cost of some performance.
Power factor
Ratio of real power (kW) to apparent power (kVA); a low power factor wastes capacity through reactive current.
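A small worked sketch of the ratio, assuming purely sinusoidal load (no harmonic distortion) so that real, reactive and apparent power form a right triangle; the 900 kW / 1,000 kVA load is hypothetical:

```python
import math


def power_factor(real_kw: float, apparent_kva: float) -> float:
    """Real power divided by apparent power."""
    return real_kw / apparent_kva


def reactive_kvar(real_kw: float, apparent_kva: float) -> float:
    """Reactive power from the power triangle: S^2 = P^2 + Q^2."""
    return math.sqrt(apparent_kva**2 - real_kw**2)


pf = power_factor(900.0, 1000.0)
q = reactive_kvar(900.0, 1000.0)
print(f"power factor = {pf:.2f}, reactive power = {q:.0f} kVAR")
```

Here a transformer rated for 1,000 kVA delivers only 900 kW of useful power; the rest of its capacity carries reactive current.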
Power oversubscription
Provisioning more IT than the power can sustain at full draw, relying on workloads rarely peaking together.
Power Purchase Agreement · PPA
Multi-year electricity contract fixing price and often sourcing clean energy for a facility.
Power Usage Effectiveness · PUE
Total facility power divided by IT power; the headline data-center efficiency ratio, lower is better.
Powered shell
A building delivered with power and core infrastructure in place but without IT fit-out, ready for a tenant to finish.
PPA · Power Purchase Agreement
Long-term contract to buy electricity (often renewable) at a set price, used to secure and green a site's power.
Preemption
Pausing or evicting a lower-priority job to free resources for a higher-priority one in a shared cluster.
Prefill
The compute-heavy phase that processes an inference prompt and builds its KV cache before generation begins.
Pretraining
The initial, compute-heavy phase that trains a model on vast unlabeled data to learn general capabilities.
Provenance register
A documented record tracing a component's origin and chain of custody to assure supply-chain integrity.
PSU · Power Supply Unit
The component that converts incoming AC or DC into the regulated voltages a server or GPU node consumes.
PUE · Power Usage Effectiveness
Total facility power divided by IT power; the headline efficiency ratio where 1.0 is perfect and AI halls target ~1.1-1.2.
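The ratio itself is simple arithmetic; a sketch with hypothetical meter readings:

```python
def pue(total_facility_kw: float, it_kw: float) -> float:
    """Total facility power divided by IT power."""
    if it_kw <= 0:
        raise ValueError("IT power must be positive")
    return total_facility_kw / it_kw


# Hypothetical hall: 12 MW at the utility meter, 10 MW reaching IT equipment.
hall_pue = pue(12_000, 10_000)
overhead_kw = 12_000 - 10_000  # cooling, conversion losses, lighting, etc.
print(f"PUE = {hall_pue:.2f}, overhead = {overhead_kw} kW")
```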
Purdue model
A reference architecture that layers and segments industrial control networks to contain cyber threats.
PXE boot · Preboot Execution Environment
Booting a server over the network to load its OS image, the basis of automated bare-metal provisioning.
QLC · Quad-Level Cell
Flash storing four bits per cell, offering high density and low cost at the expense of write endurance and speed.
Quantization
Using lower numerical precision (FP8, FP4, INT8) to cut memory and boost throughput, trading some accuracy for speed.
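To make the accuracy trade-off concrete, here is a minimal sketch of symmetric per-tensor INT8 quantization in plain Python. It is one common scheme among several, the function names are hypothetical, and production stacks use per-channel or per-block scales and hardware formats such as FP8:

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]


weights = [0.5, -1.0, 0.25]  # hypothetical weight values
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
worst_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, f"worst rounding error = {worst_error:.5f}")
```

Each value now occupies one byte instead of four, and the rounding error per value is bounded by half the scale.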
Quick-disconnect · QD
A self-sealing coupling that connects or separates a liquid line without leaks, used to service cooled racks.
Rail-optimized
A topology that connects the same-ranked GPU NIC in every node to the same switch 'rail', giving collision-free, non-blocking paths for training collectives.
RBD · Reliability Block Diagram
A modeling technique that maps components as a network of blocks to compute overall system availability.
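The two basic block combinations can be sketched as follows, assuming component failures are independent (the availability figures are hypothetical): blocks in series multiply, and redundant blocks in parallel fail only if all of them fail.

```python
from math import prod


def series(*availabilities: float) -> float:
    """All blocks must work: availabilities multiply."""
    return prod(availabilities)


def parallel(*availabilities: float) -> float:
    """Any one block suffices: the group fails only if every block fails."""
    return 1.0 - prod(1.0 - a for a in availabilities)


# Hypothetical chain: one utility feed (99.9%) in series with two redundant UPS paths (99% each).
ups_pair = parallel(0.99, 0.99)
system = series(0.999, ups_pair)
print(f"UPS pair = {ups_pair:.4f}, system = {system:.4f}")
```

The redundant pair reaches roughly 99.99%, so the single feed becomes the limiting block in this example.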
RDHx · Rear-Door Heat Exchanger
A liquid-cooled rack door removing ~50-100 kW without piping liquid to the chips; a brownfield step toward full DLC.
RDMA · Remote Direct Memory Access
Network transfers that move data directly between machines' memory, bypassing the CPU for low latency.
Reactive power
Power that oscillates between source and load without doing work, measured in VAR, that must be managed and corrected.
Rear-door heat exchanger · RDHx
A radiator-style door on the back of a rack that cools exhaust air with liquid before it enters the room.
REC · Renewable Energy Certificate
A tradable certificate representing one MWh of renewable generation, used to claim clean-energy use.
Redfish
A modern REST API standard for remotely managing server and infrastructure hardware, succeeding legacy IPMI.
REF · Renewable Energy Factor
The share of a facility's energy supplied from renewable sources, a sustainability companion to PUE.
Reticle
The maximum area a lithography tool can pattern in one exposure (~858 mm²), capping how large a single die can be.
Reversibility
Designing a decision so it can be undone cheaply; reversible choices warrant less analysis than one-way doors.
Reward model
A model trained to score outputs, providing the reward signal that guides reinforcement-learning fine-tuning.
RFS · Ready For Service
The milestone at which a facility or capacity block is fully commissioned and available to take load.
Ride-through
A system's ability to stay running through a brief power or cooling disturbance instead of tripping offline.
RIM · Reference Integrity Manifest
A signed reference of expected firmware measurements used to verify a device booted untampered code.
RLHF · Reinforcement Learning from Human Feedback
Tuning a model using human preference rankings to make its outputs more helpful and aligned.
RMA · Return Merchandise Authorization
The process of returning failed hardware to a vendor for repair or replacement under warranty.
RoCE · RDMA over Converged Ethernet
RDMA carried on Ethernet (made lossless via PFC/ECN); the open, cost-driven alternative to InfiniBand for AI fabrics.
ROCm
AMD's open GPU-computing software stack, its counterpart to NVIDIA's CUDA ecosystem.
Rollout
In reinforcement learning, generating a trajectory of model actions and outcomes used to compute training rewards.
Root of trust · RoT
A hardware-anchored trusted base that verifies firmware and boot integrity before a system is trusted to run.
RTO · Regional Transmission Organization
An entity operating the grid and wholesale power market across multiple utilities in a region (e.g. PJM, MISO).
Rubin
NVIDIA's GPU architecture generation following Blackwell, with Rubin Ultra pushing rack-scale density further.
S3 · Simple Storage Service
Amazon's object-storage service whose API has become the de facto standard for cloud object storage.
SBOM · Software Bill of Materials
A formal inventory of all components in a piece of software, used to track and respond to supply-chain risk.
SCADA · Supervisory Control and Data Acquisition
Industrial control software that monitors and operates a facility's physical systems like power and cooling.
Scalable Unit · SU
The standardized, repeatable design and procurement increment used to scale a cluster predictably.
Scale-out
Connecting many nodes into a looser cluster fabric (InfiniBand or Ethernet) to scale beyond one coherent domain.
Scale-up
Tightly coupling GPUs into one coherent high-bandwidth domain (NVLink class) that acts like a single large accelerator.
Scaling laws
Empirical relationships predicting how model quality improves with more compute, data and parameters.
Scope 1
Direct greenhouse-gas emissions from sources a company owns or controls, such as on-site generators.
Scope 2
Indirect emissions from the purchased electricity, heat or cooling a facility consumes.
Scope 3
All other indirect emissions in the value chain, including the embodied carbon of equipment and construction.
SDC · Silent Data Corruption
Errors that corrupt computation without raising any alert; at fleet scale they quietly spoil training runs and must be actively hunted.
Secure boot
A boot process that cryptographically verifies each firmware and software stage before allowing it to run.
SerDes · Serializer/Deserializer
The circuit converting parallel data to a high-speed serial stream and back; lane rate sets link bandwidth.
SFT · Supervised Fine-Tuning
Adapting a pretrained model by training it on labeled example responses for a target behavior or domain.
SGLang
An open-source LLM serving framework with fast structured generation and aggressive KV-cache reuse.
SLA · Service Level Agreement
A contractual promise of service performance (uptime, latency) with penalties or credits if it is missed.
SLO · Service Level Objective
A target for a service metric such as latency or availability that an inference fleet is sized to meet.
Slurm
A widely used open-source workload manager and job scheduler for HPC and AI clusters.
Small Modular Reactor · SMR
Compact, factory-fabricated nuclear unit pitched as scalable clean-firm power for data centers.
SmartNIC
A network card with onboard processing that offloads packet, storage and security work from the server CPU.
SMR · Small Modular Reactor
Factory-built nuclear reactor under ~300 MW, proposed as clean firm power for large AI campuses.
SOC 2
An audit report (Type II covers a period) attesting that an organization's security and availability controls work as described.
SOFC · Solid-Oxide Fuel Cell
A high-temperature fuel cell generating clean on-site electricity from gas or hydrogen for data-center power.
Solid-State Transformer · SST
High-efficiency electronic transformer converting medium voltage directly to the DC bus feeding modern racks.
Sovereign AI
A nation's drive to own and control AI compute, data and models within its borders for security and autonomy.
Spectrum-X
NVIDIA's Ethernet platform tuning RoCE for AI collectives with adaptive routing and congestion control.
Speculative decoding
Using a small draft model to guess several tokens that a large model verifies in parallel, speeding generation.
SPOF · Single Point of Failure
A component whose failure alone takes down the whole system; eliminating SPOFs is the goal of redundant design.
SPV · Special-Purpose Vehicle
A standalone, bankruptcy-remote legal entity created to own and finance a single project and ring-fence its risk.
SST · Solid-State Transformer
Power-electronics transformer (~99% efficient) enabling medium-voltage-to-DC conversion for 800 VDC rack architectures.
Straggler
A node running slower than its peers that holds up a synchronized collective and drags down whole-job throughput.
Stranded capacity
Provisioned power, cooling or space that cannot be used because a different resource is the binding constraint.
Stranded power
Generation or interconnection capacity that exists but cannot reach load due to transmission or siting limits.
STS · Static Transfer Switch
A solid-state switch that instantly transfers a load between two power sources without interruption.
SU · Scalable Unit
A repeatable build block (a defined MW + GPU + cooling + fabric increment) that capacity ramps are composed of.
Substation
The facility transforming and switching power between transmission and the site, a major long-lead build item.
Super-load · Heavy-haul
An oversized, very heavy shipment (such as a large transformer) needing special permits and routing logistics.
SuperPOD · DGX SuperPOD
NVIDIA's reference cluster design wiring many DGX systems into a validated, scalable AI supercomputer.
Switchgear
Assembly of breakers, switches and protection that controls and isolates electrical circuits; a long-lead procurement item.
Systolic array
A grid of processing elements that pumps data through in lockstep, the core structure of TPUs and many AI ASICs.
Tail latency · p99
The slowest few percent of responses (e.g. 99th percentile); in clusters one slow node can stall a whole job.
Take-or-pay
Contract obligating the buyer to pay for a minimum quantity of power or capacity whether or not it is used.
Tape-out
The milestone of finalizing a chip design and sending it to the foundry for fabrication.
TCO · Total Cost of Ownership
The full lifetime cost of capacity, including capex amortization, power, cooling, staff and maintenance.
TDP · Thermal Design Power
The sustained power and heat a chip package must dissipate; the per-chip number that drives rack density and cooling.
TEE · Trusted Execution Environment
A hardware-isolated, encrypted region of a processor that protects code and data even from the host operator.
Tensor parallelism · TP
Splitting a single layer's math across multiple GPUs so they jointly compute one forward/backward pass.
TensorRT-LLM
NVIDIA's optimized library for compiling and serving large language models at low latency on its GPUs.
Test-time compute
Spending extra inference compute (e.g. chain-of-thought reasoning) to improve answers, shifting cost from training to serving.
The cascade · Training-to-inference
The lifecycle where today's training hardware becomes tomorrow's inference fleet as newer chips arrive.
Thermal Design Power · TDP
Sustained heat-dissipation requirement of a processor, setting cooling and rack-density needs.
Thermal runaway
A self-reinforcing temperature rise, notably in batteries, that can lead to fire if not detected and contained.
TIA-942
A telecom-industry standard for data-center infrastructure with its own Rated 1-4 reliability classification.
Tier III
An Uptime Institute classification meaning concurrently maintainable: any component can be serviced without taking IT load down.
Tier IV
The top Uptime Institute classification, meaning fault tolerant: the facility survives any single failure with no impact on IT load.
TIM · Thermal Interface Material
The paste or pad filling microscopic gaps between a chip and its heat spreader or cold plate to conduct heat.
Time-to-power · Speed-to-power
The elapsed time from contract to energized megawatts; the binding constraint and primary siting screen of the AI era.
TLC · Triple-Level Cell
Flash storing three bits per cell, the mainstream balance of cost, endurance and performance for SSDs.
Tokens-per-joule
Inference energy efficiency measured as tokens generated per joule, a cross-vendor comparator that survives generations.
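Because a watt is a joule per second, the metric reduces to throughput divided by average power. A sketch with hypothetical measurements:

```python
def tokens_per_joule(tokens_per_second: float, average_power_watts: float) -> float:
    """Throughput divided by power; 1 W = 1 J/s, so the seconds cancel."""
    if average_power_watts <= 0:
        raise ValueError("power must be positive")
    return tokens_per_second / average_power_watts


# Hypothetical accelerator: 1,000 tokens/s while drawing 500 W on average.
efficiency = tokens_per_joule(1000.0, 500.0)
print(f"{efficiency:.1f} tokens per joule")
```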
Tokens-per-watt
Inference efficiency framed as token throughput (tokens per second) per watt of power, numerically the same as tokens-per-joule; used to compare accelerators and fleets.
Topology-aware scheduling
Placing a job's GPUs to respect network topology so its collectives run on high-bandwidth, low-latency links.
Total Cost of Ownership · TCO
The all-in cost of running infrastructure over its life, not just the purchase price.
TPOT · Time Per Output Token
The steady-state delay between successive generated tokens; the inter-token latency SLO governed by decode.
TPU · Tensor Processing Unit
Google's custom AI accelerator chip, built around a systolic array for matrix math and used across its cloud and models.
Trainium
Amazon's custom AI training accelerator, part of its bid to reduce dependence on merchant GPUs.
Truck roll
Dispatching a technician on-site to fix something; minimizing truck rolls is a goal of remote and automated ops.
TSV · Through-Silicon Via
A vertical electrical channel drilled through a die to stack chips, the wiring that makes HBM and 3D stacking possible.
TTFT · Time To First Token
How long an inference request waits before the first output token appears; a key latency SLO set by prefill.
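TTFT and TPOT together give a first-order estimate of a request's end-to-end latency. The sketch below assumes a constant TPOT and ignores queueing; the numbers are hypothetical:

```python
def request_latency_s(ttft_s: float, tpot_s: float, output_tokens: int) -> float:
    """Time to first token plus one TPOT for each subsequent token."""
    if output_tokens < 1:
        raise ValueError("at least one output token is required")
    return ttft_s + (output_tokens - 1) * tpot_s


# Hypothetical request: 0.5 s TTFT, 20 ms per token, 101 output tokens.
latency = request_latency_s(0.5, 0.02, 101)
print(f"end-to-end latency = {latency:.2f} s")
```

For long outputs the TPOT term dominates, which is why prefill and decode are often sized against separate SLOs.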
TUE · Total Usage Effectiveness
PUE multiplied by IT-side efficiency (ITUE); the true facility-to-transistor energy ratio.
Two-person rule
A control requiring two authorized people to act together for a sensitive operation, reducing insider risk.
Two-phase cooling
Cooling that absorbs heat by boiling a fluid and condensing it, exploiting latent heat for very high heat flux.
UALink
An open scale-up interconnect standard for up to 1,024 accelerators, the multi-vendor alternative to NVLink.
UCIe · Universal Chiplet Interconnect Express
An open standard for connecting chiplets from different vendors within one package.
UEC · Ultra Ethernet Consortium
The industry group defining AI-grade Ethernet transport (UET) with packet spray and modern congestion control.
Uninterruptible Power Supply · UPS
Power system that keeps IT load energized through grid sags and outages until backup generation starts.
UPS · Uninterruptible Power Supply
System (battery or flywheel backed) that maintains clean power through grid disturbances and bridges to generators.
Uptime Tier
Uptime Institute's I-IV classification of facility resilience, from basic (I) to fault-tolerant 2N (IV).
UQD · Universal Quick Disconnect
A dripless connector that lets liquid-cooled hardware be plugged and unplugged without tools and without spilling coolant.
vLLM
A popular open-source inference engine known for PagedAttention and high-throughput continuous batching.
Voltage Regulator Module · VRM
On-board converter delivering the precise low voltage a processor core requires from the board supply.
VRM · Voltage Regulator Module
Power-electronics stage that steps board voltage down to the low voltage a GPU or CPU core actually needs.
WACC · Weighted Average Cost of Capital
The blended cost of a project's debt and equity, used as the discount rate for valuing its cash flows.
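The textbook form weights each source of capital by its share of total value and applies the tax shield to debt. A sketch with a hypothetical capital structure:

```python
def wacc(equity: float, debt: float, cost_of_equity: float,
         cost_of_debt: float, tax_rate: float) -> float:
    """Value-weighted cost of equity plus after-tax cost of debt."""
    total = equity + debt
    if total <= 0:
        raise ValueError("total capital must be positive")
    equity_part = (equity / total) * cost_of_equity
    debt_part = (debt / total) * cost_of_debt * (1.0 - tax_rate)
    return equity_part + debt_part


# Hypothetical project: 40% equity at 12%, 60% debt at 6%, 25% tax rate.
project_wacc = wacc(40.0, 60.0, 0.12, 0.06, 0.25)
print(f"WACC = {project_wacc:.2%}")
```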
Warm-water loop
A liquid-cooling loop run at elevated temperature so heat can be rejected with free cooling and reused downstream.
Water Usage Effectiveness · WUE
Liters of water consumed per kWh of IT energy; the water-efficiency companion to PUE.
Water-positive
A commitment to replenish more water than a facility consumes, a sustainability pledge by several operators.
Wet-bulb temperature
The lowest temperature achievable by evaporation, setting the floor for how well evaporative cooling can perform.
White space
The conditioned data-hall area where IT racks sit, as opposed to support (gray) space for power and cooling gear.
WORM · Write Once Read Many
Storage that prevents data from being altered or deleted after writing, used for compliance and tamper resistance.
WUE · Water Usage Effectiveness
Liters of water consumed per kWh of IT energy; the water analog of PUE that evaporative cooling worsens.
XID
An NVIDIA GPU error code reported by the driver; specific XIDs flag memory, hardware or driver faults to triage.
XPU
Generic term for a non-GPU AI accelerator such as a TPU, Trainium, Maia or MTIA; hyperscaler custom silicon.
Young/Daly
The formula setting the optimal checkpoint interval by balancing checkpoint cost against expected failure-rollback loss.
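In its first-order form (Young's approximation, valid when the checkpoint cost is small relative to the mean time between failures), the interval is sqrt(2 x checkpoint cost x MTBF). A sketch with hypothetical values:

```python
import math


def young_checkpoint_interval_s(checkpoint_cost_s: float, mtbf_s: float) -> float:
    """First-order optimal time between checkpoints: sqrt(2 * C * MTBF)."""
    if checkpoint_cost_s <= 0 or mtbf_s <= 0:
        raise ValueError("checkpoint cost and MTBF must be positive")
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)


# Hypothetical job: 50 s to write a checkpoint, one failure every 10,000 s on average.
interval = young_checkpoint_interval_s(50.0, 10_000.0)
print(f"checkpoint roughly every {interval:.0f} s")
```

Because the interval grows only with the square root of MTBF, a cluster that fails four times as often should checkpoint about twice as often.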
ZeRO · Zero Redundancy Optimizer
A technique partitioning optimizer state, gradients and parameters across GPUs to remove memory redundancy in training.
Zero trust
A security model that trusts no user or device by default and verifies every access request continuously.
Zero-touch provisioning · ZTP
Automatically configuring devices on first power-up with no manual setup, key to deploying at scale.
ZLD · Zero Liquid Discharge
A water system that recovers nearly all wastewater for reuse, leaving essentially no liquid discharge.