Glossary

408 terms across power, cooling, compute, networking, reliability, and economics.

The all-in cost to operate one accelerator for one hour; the core unit economic for build-versus-rent decisions.

The cost to serve one million tokens, the revenue-side unit economic for inference businesses.

24/7 CFE · Carbon-Free Energy

Matching every hour of consumption with carbon-free generation, a stricter goal than annual renewable matching.

Redundancy where the entire power/cooling system is fully mirrored, so any single path can fail with zero impact.

Direct-current rack distribution for megawatt-class racks that cuts conversion stages and copper versus traditional AC.

ABS · Asset-Backed Securitization

Financing that bundles cash-flowing assets (such as leases or GPUs) into securities sold to investors.

Aeroderivative turbine

A jet-engine-derived gas turbine that starts fast and ramps quickly, suited to on-site or backup power.

Physically isolating a system or network from any external connection to protect highly sensitive workloads.

A collective that assembles each GPU's data shard onto every GPU, common in sharded training and inference.

A collective operation that sums gradients across all GPUs and shares the result, the dominant traffic in training.
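
As a rough illustration of the sum-and-share behavior described above, the sketch below simulates the result of an all-reduce on plain Python lists. It models only the outcome, not how a real library moves data between GPUs; the function name `all_reduce_sum` is hypothetical.

```python
from typing import List


def all_reduce_sum(per_rank_gradients: List[List[float]]) -> List[List[float]]:
    """Return what each rank holds after a sum all-reduce (simulation only)."""
    # Element-wise sum across ranks: position i collects the i-th value from every rank.
    reduced = [sum(values) for values in zip(*per_rank_gradients)]
    # Every rank ends up with its own copy of the same reduced vector.
    return [list(reduced) for _ in per_rank_gradients]


if __name__ == "__main__":
    grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # three ranks, two gradient values each
    print(all_reduce_sum(grads))
```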

Routing that sends a request to the nearest of several sites sharing one address, used for low-latency global serving.

Approach temperature

The gap between a coolant's temperature and the medium it rejects heat to; a small approach demands a bigger exchanger.

A reference workload pattern (pretraining, post-training, RL, online or batch inference, edge) that drives design choices.

Arithmetic intensity

The ratio of compute operations to bytes moved; it determines whether a kernel is compute- or memory-bound.
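
A minimal sketch of how this ratio might be computed for a matrix multiply and compared with a machine's balance point. It assumes each matrix is moved to or from memory exactly once (real kernels may move more), and the peak figures in the usage lines are hypothetical, not taken from any specific chip.

```python
def gemm_arithmetic_intensity(m: int, n: int, k: int, bytes_per_element: int = 2) -> float:
    """FLOPs per byte for C = A @ B, assuming A, B and C each cross memory once."""
    flops = 2 * m * n * k  # one multiply and one add per inner-product term
    bytes_moved = bytes_per_element * (m * k + k * n + m * n)  # read A and B, write C
    return flops / bytes_moved


def is_compute_bound(intensity: float, peak_flops: float, peak_bytes_per_s: float) -> bool:
    """A kernel is compute-bound when its intensity meets the machine balance point."""
    machine_balance = peak_flops / peak_bytes_per_s  # FLOPs the chip can do per byte it can move
    return intensity >= machine_balance


if __name__ == "__main__":
    peak_flops, peak_bw = 1e15, 3e12  # hypothetical accelerator: 1 PFLOP/s, 3 TB/s
    big = gemm_arithmetic_intensity(4096, 4096, 4096)
    skinny = gemm_arithmetic_intensity(1, 4096, 4096)  # matrix-vector shape, as in decode
    print(big, is_compute_bound(big, peak_flops, peak_bw))
    print(skinny, is_compute_bound(skinny, peak_flops, peak_bw))
```

Under these assumptions the large square multiply lands well above the balance point, while the matrix-vector shape sits far below it.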

The engineering society whose thermal guidelines define the temperature and humidity envelopes for IT equipment.

The industry committee whose A1-A4 air and W17-W45 water classes define the temperature envelopes IT gear can run in.

ASIC · Application-Specific Integrated Circuit

A chip hard-wired for one task; AI ASICs trade flexibility for efficiency versus general-purpose GPUs.

ATS · Automatic Transfer Switch

A switch that automatically moves load from utility to backup generator when grid power fails.

Generating output one token at a time, each conditioned on all previous tokens, the basis of LLM decoding.

Availability zone · AZ

An isolated data-center location within a cloud region, designed to fail independently of its peers.

Compute spent on work that is wasted (failed, redundant or thrown away), the opposite of goodput.

Battery Backup Unit · BBU

Rack-mounted battery providing short ride-through and load-transient smoothing for AI racks.

Battery Energy Storage System · BESS

Large battery installation buffering power transients and bridging outages at the facility level.

BBU · Battery Backup Unit

Rack-level battery that rides through power dips and absorbs sudden GPU load swings before the BESS or genset reacts.

Behind-the-meter · BTM

On-site generation connected on the customer side of the utility meter, bypassing grid interconnection delays.

BER · Bit-Error Rate

The fraction of transmitted bits received in error; a key quality metric for high-speed optical and copper links.

BESS · Battery Energy Storage System

Facility-scale battery bank that smooths load transients, provides ride-through, and can bridge to generators.

BF16 · Brain Float 16

A 16-bit floating-point format with a wide exponent range, a common default for stable AI training.

Bisection bandwidth

The aggregate bandwidth across the worst-case cut that splits a network into two equal halves; the key figure for all-reduce-heavy training.

Restarting generation and energizing a grid or site from a complete shutdown without external power.

Black-building test · Pull-the-plug test

A commissioning test that cuts utility power to prove backup generation and transfer carry the full load.

Blast radius · Fault domain

The scope of impact when something fails; good design shrinks the blast radius so one fault affects few resources.

Periodically draining concentrated mineral-laden water from a cooling tower to control scale, a key water loss.

BMC · Baseboard Management Controller

An always-on chip that remotely monitors and manages a server's hardware independent of its CPU and OS.

BMS · Building Management System

The control system that runs a facility's mechanical and electrical equipment such as cooling, power and alarms.

Breakeven utilization

The fraction of capacity that must be sold or used for a facility's revenue to cover its costs.

The process of powering, validating and tuning a new cluster until it passes acceptance and can run real jobs.

A project that reuses or retrofits an existing building or site rather than building new.

Build-to-suit · BTS

A data center custom-built and leased to a specific tenant's specifications, usually under a long-term contract.

Running new hardware hard for a period to surface early ('infant mortality') failures before production use.

A solid metal conductor distributing high current within a rack or system, replacing bulky cabling.

Overhead enclosed busbar run with tap-off boxes that distributes power flexibly across rows of racks.

CAGR · Compound Annual Growth Rate

The smoothed yearly growth rate of a quantity over a period, used to project demand, cost or capacity.
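
The standard formula is (end / start)^(1 / years) - 1. A short sketch, with illustrative numbers that are not drawn from this glossary:

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate between two values separated by `years`."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


if __name__ == "__main__":
    # Illustrative only: a quantity growing from 100 to 121 over two years.
    print(f"{cagr(100.0, 121.0, 2):.2%}")
```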

An open-source silicon root-of-trust block letting chips verify their own firmware, backed by OCP and hyperscalers.

Capex · Capital Expenditure

Up-front spending on long-lived assets like buildings, power gear and GPUs, depreciated over their useful life.

Cascade-to-inference

Repurposing older training GPUs for inference as newer chips take over training, extending hardware economic life.

CCGT · Combined-Cycle Gas Turbine

A high-efficiency gas power plant that reuses turbine exhaust heat to drive a steam turbine, a behind-the-meter option.

CDN · Content Delivery Network

A distributed network of caching servers that delivers content from locations close to users.

CDU · Coolant Distribution Unit

Heat exchanger plus pumps isolating the clean technology loop from facility water and providing leak containment.

CFD · Computational Fluid Dynamics

Simulation of airflow and heat that engineers use to design and verify data-center cooling.

Chain-of-thought · CoT

Prompting or training a model to reason in explicit intermediate steps, trading more tokens for better answers.

A periodic save of model weights and optimizer state so a long training run can resume after a failure.

Chip-on-Wafer-on-Substrate · CoWoS

TSMC advanced-packaging process integrating compute and memory on an interposer; an upstream supply chokepoint.

A smaller die combined with others in one package, letting designers mix processes and beat single-die size limits.

Chunked prefill

Breaking a long prompt's prefill into chunks interleaved with decode so latency stays steady under load.

Circular financing

Arrangements where a chip vendor invests in customers who use the money to buy its chips, raising scrutiny over demand.

SemiAnalysis's rating system grading GPU cloud providers on reliability, performance and operational maturity.

CMMC · Cybersecurity Maturity Model Certification

The US Defense Department framework certifying contractors' cybersecurity to handle controlled unclassified information.

Co-Packaged Optics · CPO

Bringing optical engines into the chip package to push high-speed links further than copper allows.

COD · Commercial Operation Date

The date a facility or power asset begins commercial service, often a contractual and financing milestone.

A liquid-cooled metal block clamped to a chip that conducts heat into the coolant in a direct-liquid-cooling loop.

A coordinated communication pattern across many GPUs (all-reduce, all-gather, reduce-scatter) central to distributed AI.

Colocation · Colo

Renting space, power and cooling in a shared facility, in wholesale (large blocks) or retail (rack-level) form.

Commissioning · Cx

The structured testing (levels L1-L5) that proves a facility works correctly under load before it goes live.

Concurrent maintainability

The ability to maintain or replace any component without shutting down IT load, the defining trait of Tier III.

Confidential computing

Protecting data while it is being processed by running it inside a hardware trusted execution environment.

Physically separating hot and cold air (cold-aisle, hot-aisle, or chimney) so cooling air is not wasted by mixing.

Continuous batching · In-flight batching

Dynamically adding and removing requests from a running inference batch to keep the GPU busy and lift throughput.

Coolant Distribution Unit · CDU

Unit that pumps and conditions coolant to racks while isolating it from the facility water system.

Cooling cliff · Density wall

The rack-power point (~100 kW) above which air cooling fails and liquid cooling becomes mandatory.

Cooling distribution · L2L / L2A

Heat-exchange schemes moving heat liquid-to-liquid (L2L) or liquid-to-air (L2A) between cooling loops.

A structure that rejects heat by evaporating water; efficient but the main driver of data-center water consumption.

COP · Coefficient of Performance

A heat pump or chiller's ratio of heat moved to electricity consumed; higher means more efficient cooling or heating.

Cordon and drain

Marking a node unschedulable and moving its work off so it can be serviced without disrupting the cluster.

CoWoS · Chip-on-Wafer-on-Substrate

TSMC's 2.5D packaging that joins logic die and HBM stacks on a silicon interposer; its wafer capacity gates AI supply.

CPO · Co-Packaged Optics

Optics integrated into the switch or accelerator package to beat copper-reach limits at the cost of serviceability.

CRAC · Computer Room Air Conditioner

A refrigerant-based room cooling unit; the legacy air-cooling workhorse now giving way to liquid for AI density.

CRAH · Computer Room Air Handler

A chilled-water room air handler that cools the data hall, more efficient than refrigerant-based CRAC units.

Critical path · CPM

The longest chain of dependent tasks, in which any slip delays the whole project; every task off it has float.

NVIDIA's programming platform for general-purpose GPU computing; the software moat underpinning its AI dominance.

CUE · Carbon Usage Effectiveness

Kilograms of CO2-equivalent emitted per kWh of IT energy; the carbon companion to PUE.

Curtailable load

Load a site agrees to reduce on the grid operator's signal, trading interruptions for faster or cheaper interconnection.

Forced reduction of a load's or generator's output, often to manage grid constraints; AI loads may trade it for speed.

CxA · Commissioning Agent

The independent party that plans and verifies commissioning to confirm the facility performs as designed.

CXL · Compute Express Link

A cache-coherent interconnect over PCIe that lets CPUs, accelerators and memory pools share memory across devices.

DAC · Direct Attach Copper

A copper cable carrying high-speed signals over short reach, cheaper than optics for in-rack and adjacent links.

DAOS · Distributed Asynchronous Object Storage

An open-source high-performance storage system built for NVMe and persistent memory in HPC and AI clusters.

The tendency for large datasets to attract compute and services, making data expensive and slow to move.

Data parallelism · DP

Replicating the model across GPUs that each process different data and synchronize gradients via all-reduce.

The requirement that data be stored and processed within a specific country or jurisdiction.

Data sovereignty

The principle that data is subject to the laws of the nation where it is collected or stored.

DCGM · Data Center GPU Manager

NVIDIA's tool for monitoring, diagnosing and health-checking GPUs across a fleet.

DCI · Data Center Interconnect

The long-haul links and equipment connecting separate data-center sites, increasingly used to scale AI across buildings.

DCIM · Data Center Infrastructure Management

Software that monitors and manages a facility's power, cooling, space and assets in one place.

The congestion-control algorithm that tunes RoCE flows using ECN and PFC; when mis-tuned, it causes victim flows and stalls.

DDTL · Delayed-Draw Term Loan

A loan committed up front but drawn in stages as construction milestones are hit, matching financing to capital needs.

The memory-bandwidth-bound phase that generates output tokens one at a time after prefill.

The temperature rise of coolant across a cold plate or heat exchanger; it sizes flow rate and the warm-water loop.

Demand response

Reducing or shifting electricity use on the grid operator's request in exchange for payments or cheaper rates.

Spreading an asset's cost over its useful life; book life and economic life can differ and reshape reported margins.

The frozen set of design assumptions and requirements that, once signed, lets long-lead gear be ordered.

NVIDIA's fully integrated AI server and SuperPOD reference system sold as a turnkey appliance.

Dielectric fluid

A non-conductive liquid (such as engineered fluorocarbons) used in immersion cooling so it can contact electronics safely.

A live software model of a physical facility or system, used to validate designs and optimize operations.

Direct-to-chip liquid cooling · DLC

Cooling that pipes liquid through cold plates mounted directly on processors, the standard for dense AI racks.

Disaggregated serving · P/D disaggregation

Running prefill and decode on separate GPU pools so each scales independently for better efficiency.

Distributed-redundant · 3N/2

A redundancy scheme (e.g. 3N/2, 4N/3) spreading reserve capacity across multiple paths for efficiency over pure 2N.

District heating

A network piping waste heat to warm nearby buildings; a primary outlet for data-center heat reuse in Europe.

DLC · Direct Liquid Cooling

Cold plates on hot chips fed by an isolated coolant loop; the default above the ~100 kW/rack air-cooling cliff.

DPU · Data Processing Unit

A programmable NIC (e.g. BlueField) that offloads networking, storage and security from the host CPU.

A finned coil that rejects heat to ambient air without evaporating water, saving water at the cost of efficiency on hot days.

DSCR · Debt-Service Coverage Ratio

Operating cash flow divided by debt payments; lenders require it above a threshold to ensure loans are serviceable.

DWPD · Drive Writes Per Day

How many times an SSD's full capacity can be overwritten daily over its warranty, a key endurance rating.
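
One way to read a DWPD rating is to convert it into total rated writes over the warranty. The sketch below assumes a 365-day year and decimal terabytes; the drive figures in the usage line are hypothetical.

```python
def rated_writes_tb(dwpd: float, capacity_tb: float, warranty_years: float) -> float:
    """Total terabytes the rating allows to be written over the warranty period."""
    return dwpd * capacity_tb * 365 * warranty_years


if __name__ == "__main__":
    # Hypothetical drive: 3.84 TB, 1 DWPD, 5-year warranty.
    print(rated_writes_tb(1.0, 3.84, 5))
```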

Earnings before interest, taxes, depreciation and amortization; a proxy for operating cash generation.

ECC · Error-Correcting Code

Memory protection that detects and corrects bit errors; uncorrectable ECC errors signal failing memory.

ECN · Explicit Congestion Notification

A signaling method that marks packets to throttle senders before congestion forces drops, key to lossless AI Ethernet.

Running AI models close to where data is generated, at the network edge, to reduce latency and backhaul.

EDP · Energy-Delay Product

Energy multiplied by latency per operation; a chip figure of merit penalizing designs that are slow or power-hungry.

Data leaving a cloud or region, typically billed at a premium and a major hidden cost in AI pipelines.

Elastic training

Training that can continue at a reduced GPU count when nodes fail and absorb them back when restored.

Embodied carbon

The greenhouse-gas emissions from making and building hardware and facilities, separate from operating energy.

The European data-center facility standard, with Availability Classes paralleling the Uptime Tier scheme.

Energy Reuse Factor · ERF

Fraction of facility energy recaptured and exported as useful heat rather than rejected to atmosphere.

EPC · Engineering, Procurement and Construction

A delivery model where one contractor designs, buys and builds the project, often under a fixed price.

EPMS · Electrical Power Monitoring System

A system that continuously monitors and records the facility's electrical distribution for reliability and analysis.

Splitting data into fragments plus parity so it survives multiple drive failures using far less overhead than full copies.

The grid operator for most of Texas, a frequent AI-data-center destination known for fast interconnection and volatility.

ERF · Energy Reuse Factor

Fraction of facility energy exported as useful heat (e.g. district heating); a rare metric where higher is better.

EVM · Earned Value Management

A method tracking project cost and schedule performance by comparing planned, earned and actual value.

Expert parallelism · EP

Distributing a Mixture-of-Experts model's experts across GPUs, routing tokens to whichever GPU holds the chosen expert.

Export controls

Government restrictions (administered by BIS) limiting which advanced AI chips can be sold to which countries.

Facility water · FWS

The building's water loop that ultimately rejects heat to the outdoors, kept separate from the clean chip-cooling loop.

FAT · Factory Acceptance Test

Testing equipment at the factory before shipment to confirm it meets specification, the first commissioning step.

A multi-tier leaf/spine network topology providing full, non-blocking bandwidth between any pair of nodes.

Fault tolerance

The ability to withstand any single equipment failure without disrupting IT load, the defining trait of Tier IV.

FEC · Forward Error Correction

Encoding that lets the receiver fix transmission errors without retransmission, essential at high link speeds.

The US government program standardizing security authorization for cloud services used by federal agencies.

FEOC · Foreign Entity of Concern

A designation restricting subsidies or participation for entities tied to certain adversary nations.

FERC · Federal Energy Regulatory Commission

The US agency regulating interstate electricity transmission, wholesale markets and grid interconnection rules.

Europe's primary data-center markets: Frankfurt, London, Amsterdam, Paris and Dublin.

The structural weight a floor can bear; dense liquid-cooled racks can exceed limits and need reinforced slabs.

FLOPS · Floating-Point Operations Per Second

The standard measure of compute throughput; AI clusters are rated in petaFLOPS and exaFLOPS.

FOAK · First Of A Kind

The first deployment of a novel technology, carrying higher cost and risk than later, proven units.

A point where a design or program path splits into mutually exclusive options that must be chosen between.

A 4-bit floating-point format, native in Blackwell-class silicon, pushing inference throughput and density further.

An 8-bit floating-point format that roughly doubles throughput and halves memory versus 16-bit for AI workloads.

Free cooling · Economizer

Using cool outside air or water to reject heat without running mechanical chillers, saving energy in mild conditions.

FRU · Field-Replaceable Unit

A component designed to be swapped on-site, such as a power supply, fan or drive, without sending the whole system back.

FSDP · Fully Sharded Data Parallel

A training method that shards model parameters, gradients and optimizer state across GPUs to fit larger models.

Gang scheduling

Scheduling all the GPUs a distributed job needs at once, so it either starts fully or waits, avoiding partial deadlock.

An NVIDIA Blackwell-generation system pairing two GPUs with a Grace CPU, deployed in the NVL72 rack-scale design.

GDPR · General Data Protection Regulation

The EU's comprehensive data-protection law governing how personal data is collected, processed and transferred.

GEMM · General Matrix Multiply

The dense matrix-multiply operation at the heart of neural-network compute and GPU benchmarking.

An engine-driven generator set providing backup or primary on-site power, typically diesel or natural gas.

Replicating systems or data across geographically separate sites so one site's loss does not cause an outage.

An antifreeze added to cooling water (commonly PG25, 25% propylene glycol) to prevent freezing and inhibit corrosion.

GMP · Guaranteed Maximum Price

A contract capping the owner's cost; the contractor absorbs overruns above the agreed maximum.

A standardized, validated base system image cloned to every node for consistent, drift-free deployment.

Useful work delivered per unit time after subtracting failed, restarted or stale work; the metric that actually matters.

GPU · Graphics Processing Unit

A massively parallel processor that became the workhorse of AI training and inference.

The number of GPUs per CPU in a node; AI servers skew heavily toward GPUs, reshaping system balance.

GPUDirect Storage · GDS

NVIDIA technology moving data directly from storage into GPU memory, bypassing the CPU bounce buffer.

NVIDIA's Arm-based server CPU, paired tightly with its GPUs over a coherent link in superchip designs.

The back-of-house area housing power, cooling and infrastructure equipment that supports the white space.

A project built from scratch on undeveloped land, contrasted with brownfield reuse of an existing site.

Grid-forming inverter

An inverter that actively sets grid voltage and frequency, providing stability that conventional grid-following inverters cannot.

GSU · Generator Step-Up transformer

Transformer that raises on-site generator output to grid or distribution voltage for behind-the-meter power.

HBM · High-Bandwidth Memory

Stacked DRAM mounted on-package with the accelerator; the bandwidth and capacity ceiling and the key supply bottleneck.

An enhanced generation of high-bandwidth memory shipping in 2024-2025 accelerators, faster and denser than HBM3.

The next high-bandwidth-memory generation with wider interfaces and a logic base die, targeting late-2020s accelerators.

HBOM · Hardware Bill of Materials

An inventory of the components in a hardware product, the hardware analog of an SBOM for supply-chain assurance.

Capturing data-center waste heat to warm buildings or feed district heating, improving total energy use and ERF.

NVIDIA's baseboard reference platform integrating 8 GPUs with NVLink, the building block for many AI servers.

High-Bandwidth Memory · HBM

Vertically stacked DRAM bonded next to a processor for huge memory bandwidth; AI's binding supply constraint.

A spare node kept ready to swap in instantly when a failure occurs, minimizing interruption to a running job.

HSM · Hardware Security Module

A tamper-resistant device that generates, stores and uses cryptographic keys, the anchor of key custody.

HV transformer · High-Voltage transformer

Large transformer stepping transmission-level voltage down to site distribution; a long-lead item that often gates schedules.

A copper-to-copper die-stacking technique with far finer, denser connections than solder microbumps.

A giant cloud and platform operator (AWS, Microsoft, Google, Meta) building data centers at global scale.

IaC · Infrastructure as Code

Managing infrastructure through version-controlled configuration files rather than manual setup, for repeatability.

The international standard series for cybersecurity of industrial automation and control systems.

Immersion cooling

Submerging servers in a non-conductive dielectric fluid that carries heat away, in single-phase or boiling two-phase form.

IMS · Integrated Master Schedule

The master project schedule linking all tasks and dependencies, from which the critical path is derived.

Amazon's custom AI inference accelerator, optimized for cost-efficient serving of models.

InfiniBand · IB

A low-latency lossless fabric with native RDMA, the historical default for non-blocking AI training back-ends.

Interconnection queue

The utility waitlist for connecting new large loads or generation to the grid; multi-year waits dominate AI siting.

The silicon or organic layer carrying dense wiring between logic and memory in a 2.5D package.

IOPS · Input/Output Operations Per Second

The rate of read/write operations a storage system handles; metadata-heavy AI loads are often IOPS-bound, not bandwidth-bound.

IRR · Internal Rate of Return

The discount rate at which an investment's net present value is zero, a headline measure of project return.
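
Because IRR is defined as the rate where NPV crosses zero, it is usually found numerically. The sketch below uses simple bisection and assumes annual cash flows with the investment at period 0 and exactly one sign change of NPV inside the search bracket; the bracket limits and function names are illustrative choices.

```python
from typing import Sequence


def npv(rate: float, cash_flows: Sequence[float]) -> float:
    """Net present value; cash_flows[0] occurs now, cash_flows[t] after t periods."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))


def irr(cash_flows: Sequence[float], low: float = -0.99, high: float = 10.0,
        tol: float = 1e-9) -> float:
    """Find the rate where NPV is zero by bisection over [low, high]."""
    f_low = npv(low, cash_flows)
    if f_low * npv(high, cash_flows) > 0:
        raise ValueError("NPV does not change sign inside the bracket")
    for _ in range(200):
        mid = (low + high) / 2.0
        f_mid = npv(mid, cash_flows)
        if abs(f_mid) < tol or (high - low) < tol:
            return mid
        if f_low * f_mid < 0:
            high = mid  # root lies in the lower half
        else:
            low, f_low = mid, f_mid  # root lies in the upper half
    return (low + high) / 2.0


if __name__ == "__main__":
    print(irr([-100.0, 0.0, 121.0]))  # invest 100 now, receive 121 in two years
```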

ISO · Independent System Operator

An operator managing grid reliability and power markets in a region, often used interchangeably with RTO.

The international data-center facility standard series, the global counterpart to Europe's EN 50600.

The international standard for an information security management system (ISMS), a baseline enterprise security certification.

The international standard for an AI management system (AIMS), governing responsible development and operation of AI.

IST · Integrated Systems Test

The final commissioning level (L5) that proves all facility systems work together under simulated full load.

ITUE · IT-power Usage Effectiveness

An efficiency metric pushing the boundary inside the server to capture fan, VRM and PSU losses.

The principle that efficiency gains can raise total consumption; cheaper AI inference spurs far more demand, not less.

Junction temperature · Tj

The temperature of the actual transistors inside a chip; exceeding its max forces throttling or damage.

Kubernetes · K8s

The dominant open-source container orchestration platform, increasingly used to schedule AI inference and training.

KV cache · Key-Value cache

Stored attention tensors reused during decode; its size grows with context and concurrency, dominating inference memory.
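
To show how the size scales with context and concurrency, here is a back-of-envelope estimate. It assumes a standard transformer that stores one key and one value vector per layer, per KV head, per token; the model shape in the usage lines is hypothetical.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int, bytes_per_element: int = 2) -> int:
    """Estimated KV-cache footprint in bytes for a batch of equal-length sequences."""
    # Factor of 2: one key tensor and one value tensor per layer.
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_element
    return per_token * seq_len * batch_size


if __name__ == "__main__":
    # Hypothetical model: 32 layers, 8 KV heads of dimension 128, 16-bit cache.
    size = kv_cache_bytes(32, 8, 128, seq_len=8192, batch_size=16)
    print(size / 2**30, "GiB")
```

Doubling either the context length or the number of concurrent sequences doubles the estimate, which is why the cache tends to dominate memory at long context.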

NVIDIA's rack-scale platform for the Rubin Ultra generation, scaling NVLink domains to hundreds of GPUs.

A data architecture (Iceberg, Delta, Hudi) adding database-like transactions and schema to cheap object storage.

A subtly defective node that passes basic checks but repeatedly degrades jobs, found by lemon-node detection.

LGIA · Large Generator Interconnection Agreement

The contract governing how a large facility or generator connects to the transmission grid.

Lights-out operations

Running a facility with minimal on-site staff, relying on remote management and automation.

The rapid collapse in the cost to serve a given level of AI capability as models and hardware improve.

LMP · Locational Marginal Price

The price of electricity at a specific grid node, reflecting local supply, demand and congestion.

Load step · Power transient

A sudden synchronized swing in GPU draw across thousands of chips that stresses the power chain in milliseconds.

Long-lead equipment

Items like transformers, switchgear and chillers whose long procurement times often gate the schedule.

LoRA · Low-Rank Adaptation

A fine-tuning method that trains small low-rank adapter matrices instead of all weights, slashing cost and memory.

Industry shorthand for an order-of-magnitude drop in availability, e.g. from 99.99% to 99.9% uptime.

LOSF · Lots Of Small Files

A workload pattern of huge numbers of tiny files that stresses storage metadata far more than raw bandwidth.

Microsoft's custom AI accelerator chip, part of its in-house silicon program for Azure AI.

Fresh water added to a cooling system to replace what evaporation, drift and blowdown remove.

MBU · Model Bandwidth Utilization

Achieved memory bandwidth divided by peak, for memory-bound decode inference; the analog of MFU when a workload is HBM-bound.

Measured boot · Attestation

Recording cryptographic hashes of each boot stage so a remote party can verify a system booted trusted code.

MEC · Multi-access Edge Computing

Placing compute near users at the network edge (e.g. telco sites) to cut latency for real-time AI services.

Memory-bandwidth-bound

A workload limited by how fast data moves from memory rather than by compute; typical of inference decode.

Revenue or power sold on the open market without a long-term contract, carrying price risk versus contracted supply.

MFU · Model FLOPs Utilization

Achieved FLOPs divided by peak FLOPs in a training run; 35-55% is good at scale, eroded by collectives and stragglers.
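
A minimal sketch of the ratio. Achieved FLOPs are estimated here with the common approximation of about 6 FLOPs per parameter per token for dense transformer training, which is an assumption and not part of the definition above; the cluster figures are hypothetical.

```python
def training_flops_per_second(num_params: float, tokens_per_second: float) -> float:
    """Approximate achieved FLOP/s, assuming ~6 FLOPs per parameter per token."""
    return 6.0 * num_params * tokens_per_second


def mfu(achieved_flops_per_second: float, num_gpus: int, peak_flops_per_gpu: float) -> float:
    """Model FLOPs Utilization: achieved FLOP/s over the cluster's peak FLOP/s."""
    return achieved_flops_per_second / (num_gpus * peak_flops_per_gpu)


if __name__ == "__main__":
    # Hypothetical run: 70B parameters, 1M tokens/s on 1,024 GPUs rated at 1 PFLOP/s each.
    achieved = training_flops_per_second(70e9, 1e6)
    print(f"{mfu(achieved, 1024, 1e15):.1%}")
```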

NVIDIA's modular server reference architecture letting partners build varied GPU systems from common building blocks.

MIG · Multi-Instance GPU

An NVIDIA feature partitioning one GPU into isolated instances so multiple workloads share it securely.

Mixture of Experts · MoE

An architecture routing each token to a few specialized sub-networks, widening parallelism and reshaping fabric needs.

Model FLOPs Utilization · MFU

How much of a chip's theoretical compute a training run actually uses; the headline training-efficiency metric.

MoE · Mixture of Experts

A sparse model that activates only a subset of expert sub-networks per token, cutting compute per token at scale.

An OCP/NVIDIA initiative defining an 800 VDC sidecar power architecture for megawatt-class AI racks.

MTBF · Mean Time Between Failures

The average operating time between failures of a component or system, a core reliability input.

MTIA · Meta Training and Inference Accelerator

Meta's custom AI accelerator family, built to run its recommendation and language workloads more cheaply than GPUs.

MTTI · Mean Time To Interruption

The average time a large training job runs before something interrupts it, a key scaling-reliability metric.

MTTR · Mean Time To Repair

The average time to restore a failed component to service, a core driver of overall availability.
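
MTBF and MTTR combine into steady-state availability as MTBF / (MTBF + MTTR). A short sketch, assuming both are in hours and using an 8,760-hour year; the example values are illustrative.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability from mean time between failures and mean time to repair."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def annual_downtime_hours(avail: float, hours_per_year: float = 8760.0) -> float:
    """Expected downtime per year implied by an availability figure."""
    return (1.0 - avail) * hours_per_year


if __name__ == "__main__":
    a = availability(9990.0, 10.0)  # illustrative: fails about every 9,990 h, 10 h to repair
    print(a, annual_downtime_hours(a))
```

Halving MTTR at a fixed MTBF roughly halves the expected downtime, which is why repair time matters as much as failure rate.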

MVA · Megavolt-Ampere

Unit of apparent power used to size transformers and switchgear, accounting for both real and reactive load.

Unit of real power; in AI data centers it has become the de facto unit of compute capacity and grid demand.

Redundancy with one spare component beyond what the load needs, tolerating a single failure or maintenance event.

NCCL · NVIDIA Collective Communications Library

NVIDIA's library implementing optimized multi-GPU collective operations like all-reduce over NVLink and the fabric.

A new breed of GPU-focused cloud provider (CoreWeave, Lambda and peers) renting AI compute outside the big hyperscalers.

NEPA · National Environmental Policy Act

The US law requiring federal projects to assess environmental impacts, a potential permitting gate for some sites.

The North American body setting and enforcing mandatory grid reliability standards, including critical-infrastructure rules.

The US catalog of security and privacy controls for federal information systems, a foundation for many compliance regimes.

A network that can carry full bandwidth between all node pairs simultaneously with no internal contention (1:1).

The US permit program regulating pollutant discharges to surface waters, governing cooling-water blowdown.

NPV · Net Present Value

The present value of future cash flows minus the investment, the core go/no-go metric for capital projects.
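
A small sketch of the calculation, assuming annual cash flows with the up-front investment entered as a negative amount at period 0. The discount rate and cash flows in the usage lines are illustrative.

```python
from typing import Sequence


def npv(rate: float, cash_flows: Sequence[float]) -> float:
    """Net present value; cash_flows[0] is the period-0 investment (negative)."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))


if __name__ == "__main__":
    flows = [-1000.0, 500.0, 500.0, 500.0]  # invest 1,000, then receive 500 a year for 3 years
    value = npv(0.10, flows)
    print(round(value, 2), "go" if value > 0 else "no-go")
```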

An NVIDIA rack connecting 72 Blackwell GPUs into one NVLink domain that behaves as a single huge accelerator.

NVIDIA's high-bandwidth GPU-to-GPU interconnect for tying many GPUs into one memory-coherent scale-up domain.

NVMe · Non-Volatile Memory Express

The high-speed protocol for SSDs over PCIe, the storage interface standard in modern AI servers.

NVIDIA's switch chip that fully connects all GPUs in a scale-up domain and can do in-network reductions.

Storage that keeps data as objects in a flat namespace accessed by API (e.g. S3), the backbone for AI datasets.

OCP · Open Compute Project

An industry community that open-sources data-center hardware designs for racks, power, cooling and security.

Off-gas detection

Sensing the gases a failing battery cell vents, an early-warning trigger before thermal runaway and fire.

Opex · Operating Expenditure

Ongoing running costs such as power, water, staff and maintenance, expensed as incurred.

Optimizer state

The extra per-parameter data an optimizer like Adam keeps (momentum, variance), often doubling or tripling memory needs.

ORR · Operational Readiness Review

A formal gate confirming a facility or cluster is ready to safely take on production load.

ORV3 · Open Rack V3

The Open Compute Project's third-generation rack standard, defining power, busbar and form-factor for open hardware.

OSAT · Outsourced Semiconductor Assembly and Test

Companies that package and test chips after fabrication, a key and capacity-constrained step for advanced AI silicon.

OT · Operational Technology

The control systems running physical infrastructure (power, cooling, building management), a growing cyber-attack surface.

Oversubscription

Provisioning less network bandwidth than full non-blocking would require; 1:1 is non-blocking, 3:1 is oversubscribed.
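
The ratio is usually worked out per switch tier as downstream bandwidth divided by upstream bandwidth. The sketch below assumes a hypothetical leaf switch with 48 host-facing and 16 uplink ports, all at 400 Gb/s; the port counts are made up for illustration.

```python
def oversubscription_ratio(downlink_gbps, uplink_gbps):
    """Downstream (host-facing) bandwidth divided by upstream (fabric-facing) bandwidth."""
    return downlink_gbps / uplink_gbps


# Hypothetical leaf: 48 x 400G ports to hosts, 16 x 400G uplinks to spines.
leaf_ratio = oversubscription_ratio(48 * 400, 16 * 400)      # 3.0, i.e. 3:1
balanced_ratio = oversubscription_ratio(32 * 400, 32 * 400)  # 1.0, i.e. non-blocking
print(f"{leaf_ratio:.0f}:1 and {balanced_ratio:.0f}:1")
```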

Confidence levels for a schedule or estimate: P50 is the median outcome, P90 the value that 90% of outcomes do not exceed.
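
One way to read these levels off a set of simulated or historical outcomes is a nearest-rank percentile. The sketch below assumes ten hypothetical schedule durations in months; the helper name and the data are illustrative only.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical schedule outcomes in months, e.g. from a Monte Carlo run.
durations = [10, 11, 12, 12, 13, 14, 15, 17, 20, 26]
p50 = percentile(durations, 50)
p90 = percentile(durations, 90)
print(p50, p90)
```

The gap between the two values is a rough measure of schedule risk: a long right tail pushes P90 far above P50.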

A vLLM technique that manages the KV cache in fixed pages like virtual memory, cutting waste and fragmentation.

Parallel file system

A storage system (Lustre, GPFS, WEKA, VAST) serving many clients at once with high aggregate bandwidth for AI clusters.

PCIe · Peripheral Component Interconnect Express

The standard high-speed bus connecting CPUs, GPUs, NICs and storage inside a server.

PDU · Power Distribution Unit

Equipment that distributes electrical power to racks; rack PDUs are the metered strips feeding individual servers.

PEFT · Parameter-Efficient Fine-Tuning

A family of methods (like LoRA) that adapt large models by training only a tiny fraction of parameters.

Persistent fluorinated 'forever chemicals' found in some dielectric coolants, raising environmental and regulatory concern.

PFC · Priority Flow Control

An Ethernet mechanism that pauses traffic to prevent packet loss, making RoCE lossless but risking head-of-line blocking.

Phase gate · Stage gate

A go/no-go decision point between project phases where deliverables are reviewed and capital is released.

Phase I ESA · Environmental Site Assessment

A desktop and walkover review of a site's environmental history to flag contamination risk before purchase.

PILOT · Payment In Lieu Of Taxes

A negotiated payment a data center makes instead of standard property taxes, often part of siting incentives.

Pipeline bubble

Idle GPU time at the start and end of pipeline-parallel execution while the pipeline fills and drains.
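
For a simple fill-and-drain (GPipe-style) schedule in which every stage takes the same time per micro-batch, the idle share is commonly estimated as (p - 1) / (m + p - 1) for p stages and m micro-batches. The sketch below assumes that schedule; interleaved schedules have smaller bubbles.

```python
def bubble_fraction(stages, micro_batches):
    """Idle share of a fill-and-drain pipeline with equal-time stages.

    Each stage is busy for micro_batches slots out of
    (micro_batches + stages - 1) total slots.
    """
    return (stages - 1) / (micro_batches + stages - 1)


few = bubble_fraction(stages=4, micro_batches=12)    # 3 / 15
many = bubble_fraction(stages=4, micro_batches=60)   # 3 / 63
print(f"{few:.1%} idle vs {many:.1%} idle")
```

Raising the number of micro-batches per step is the usual lever for shrinking the bubble.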

Pipeline parallelism · PP

Splitting a model's layers across GPU groups that process different micro-batches in an assembly line.

The largest US regional grid operator, covering the mid-Atlantic and parts of the Midwest; it includes the Northern Virginia data-center hub and has long interconnection queues.

Point of interconnection · POI

The physical point where a facility's electrical system connects to the utility grid.

PoP · Point of Presence

A network access location where a provider's infrastructure meets users or other networks.

The fine-tuning and alignment stages (SFT, RLHF) after pretraining that shape a model's behavior and usefulness.

Limiting how much power chips or racks can draw to stay within facility limits, at the cost of some performance.

Ratio of real power (kW) to apparent power (kVA); a low power factor wastes capacity through reactive current.
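
A short worked example, using the standard power-triangle relation kVA² = kW² + kVAR². The 800 kW and 1000 kVA figures are hypothetical.

```python
import math


def power_factor(real_kw, apparent_kva):
    """Real power divided by apparent power (dimensionless, 0 to 1)."""
    return real_kw / apparent_kva


def reactive_kvar(real_kw, apparent_kva):
    """Reactive power from the power triangle: kVA^2 = kW^2 + kVAR^2."""
    return math.sqrt(apparent_kva ** 2 - real_kw ** 2)


# Hypothetical feeder: 800 kW of real load drawing 1000 kVA.
pf = power_factor(800.0, 1000.0)
kvar = reactive_kvar(800.0, 1000.0)
print(pf, kvar)
```

At this power factor a fifth of the feeder's kVA rating carries no useful work.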

Power oversubscription

Provisioning more IT than the power can sustain at full draw, relying on workloads rarely peaking together.

Power Purchase Agreement · PPA

Multi-year electricity contract fixing price and often sourcing clean energy for a facility.

Power Usage Effectiveness · PUE

Total facility power divided by IT power; the headline data-center efficiency ratio, lower is better.

A building delivered with power and core infrastructure in place but without IT fit-out, ready for a tenant to finish.

PPA · Power Purchase Agreement

Long-term contract to buy electricity (often renewable) at a set price, used to secure and green a site's power.

Pausing or evicting a lower-priority job to free resources for a higher-priority one in a shared cluster.

The compute-heavy phase that processes an inference prompt and builds its KV cache before generation begins.

The initial, compute-heavy phase that trains a model on vast unlabeled data to learn general capabilities.

Provenance register

A documented record tracing a component's origin and chain of custody to assure supply-chain integrity.

PSU · Power Supply Unit

The component that converts incoming AC or DC into the regulated voltages a server or GPU node consumes.

PUE · Power Usage Effectiveness

Total facility power divided by IT power; the headline efficiency ratio where 1.0 is perfect and AI halls target ~1.1-1.2.

A reference architecture that layers and segments industrial control networks to contain cyber threats.

PXE boot · Preboot Execution Environment

Booting a server over the network to load its OS image, the basis of automated bare-metal provisioning.

QLC · Quad-Level Cell

Flash storing four bits per cell, offering high density and low cost at the expense of write endurance and speed.

Using lower numerical precision (FP8, FP4, INT8) to cut memory and boost throughput, trading some accuracy for speed.
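
To make the accuracy trade concrete, here is a minimal sketch of symmetric per-tensor INT8 quantization in plain Python. It is one simple scheme among many, not the method of any particular library, and the weights are made up.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using a single shared scale."""
    scale = max(abs(v) for v in values) / 127 or 1.0  # avoid a zero scale
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the integers and the scale."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.003]  # hypothetical weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q, max_error)
```

Each value now fits in one byte instead of four, and the rounding error is bounded by half the scale; small values such as 0.003 collapse to zero, which is the accuracy cost the definition refers to.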

Quick-disconnect · QD

A self-sealing coupling that connects or separates a liquid line without leaks, used to service cooled racks.

A topology pinning each GPU's NIC to a dedicated switch 'rail' for collision-free, non-blocking training collectives.

RBD · Reliability Block Diagram

A modeling technique that maps components as a network of blocks to compute overall system availability.
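
The two basic block arrangements reduce to simple formulas when failures are assumed independent: series blocks multiply availabilities, and parallel (redundant) blocks multiply unavailabilities. The component availabilities below are hypothetical.

```python
import math


def series(*availabilities):
    """All blocks must work: multiply availabilities."""
    return math.prod(availabilities)


def parallel(*availabilities):
    """Any one block suffices: one minus the product of unavailabilities."""
    return 1 - math.prod(1 - a for a in availabilities)


# Hypothetical power path: one switchgear lineup feeding a redundant UPS pair.
ups_pair = parallel(0.99, 0.99)
power_path = series(0.999, ups_pair)
print(round(ups_pair, 6), round(power_path, 6))
```

The result shows why redundancy is applied selectively: once the UPS pair reaches four nines, the single switchgear block dominates the path's unavailability.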

RDHx · Rear-Door Heat Exchanger

A liquid-cooled rack door removing ~50-100 kW without piping liquid to the chips; a brownfield step toward full DLC.

RDMA · Remote Direct Memory Access

Network transfers that move data directly between machines' memory, bypassing the CPU for low latency.

Power that oscillates between source and load without doing work, measured in VAR, that must be managed and corrected.

Rear-door heat exchanger · RDHx

A radiator-style door on the back of a rack that cools exhaust air with liquid before it enters the room.

REC · Renewable Energy Certificate

A tradable certificate representing one MWh of renewable generation, used to claim clean-energy use.

A modern REST API standard for remotely managing server and infrastructure hardware, succeeding legacy IPMI.

REF · Renewable Energy Factor

The share of a facility's energy supplied from renewable sources, a sustainability companion to PUE.

The maximum area a lithography tool can pattern in one exposure (~858 mm²), capping how large a single die can be.

Designing a decision so it can be undone cheaply; reversible choices warrant less analysis than one-way doors.

A model trained to score outputs, providing the reward signal that guides reinforcement-learning fine-tuning.

RFS · Ready For Service

The milestone at which a facility or capacity block is fully commissioned and available to take load.

A system's ability to stay running through a brief power or cooling disturbance instead of tripping offline.

RIM · Reference Integrity Manifest

A signed reference of expected firmware measurements used to verify a device booted untampered code.

RLHF · Reinforcement Learning from Human Feedback

Tuning a model using human preference rankings to make its outputs more helpful and aligned.

RMA · Return Merchandise Authorization

The process of returning failed hardware to a vendor for repair or replacement under warranty.

RoCE · RDMA over Converged Ethernet

RDMA carried on Ethernet (made lossless via PFC/ECN); the open, cost-driven alternative to InfiniBand for AI fabrics.

AMD's open GPU-computing software stack, its counterpart to NVIDIA's CUDA ecosystem.

In reinforcement learning, generating a trajectory of model actions and outcomes used to compute training rewards.

Root of trust · RoT

A hardware-anchored trusted base that verifies firmware and boot integrity before a system is trusted to run.

RTO · Regional Transmission Organization

An entity operating the grid and wholesale power market across multiple utilities in a region (e.g. PJM or MISO).

NVIDIA's GPU architecture generation following Blackwell, with Rubin Ultra pushing rack-scale density further.

S3 · Simple Storage Service

Amazon's object-storage service whose API has become the de facto standard for cloud object storage.

SBOM · Software Bill of Materials

A formal inventory of all components in a piece of software, used to track and respond to supply-chain risk.

SCADA · Supervisory Control and Data Acquisition

Industrial control software that monitors and operates a facility's physical systems like power and cooling.

Scalable Unit · SU

The standardized, repeatable design and procurement increment used to scale a cluster predictably.

Connecting many nodes into a looser cluster fabric (InfiniBand or Ethernet) to scale beyond one coherent domain.

Tightly coupling GPUs into one coherent high-bandwidth domain (NVLink class) that acts like a single large accelerator.

Empirical relationships predicting how model quality improves with more compute, data and parameters.

Direct greenhouse-gas emissions from sources a company owns or controls, such as on-site generators.

Indirect emissions from the purchased electricity, heat or cooling a facility consumes.

All other indirect emissions in the value chain, including the embodied carbon of equipment and construction.

SDC · Silent Data Corruption

Errors that corrupt computation without any alert; at fleet scale they silently spoil training and must be hunted.

A boot process that cryptographically verifies each firmware and software stage before allowing it to run.

SerDes · Serializer/Deserializer

The circuit converting parallel data to a high-speed serial stream and back; lane rate sets link bandwidth.

SFT · Supervised Fine-Tuning

Adapting a pretrained model by training it on labeled example responses for a target behavior or domain.

An open-source LLM serving framework with fast structured generation and aggressive KV-cache reuse.

SLA · Service Level Agreement

A contractual promise of service performance (uptime, latency) with penalties or credits if it is missed.

SLO · Service Level Objective

A target for a service metric such as latency or availability that an inference fleet is sized to meet.

A widely used open-source workload manager and job scheduler for HPC and AI clusters.

Small Modular Reactor · SMR

Compact, factory-fabricated nuclear unit pitched as scalable clean-firm power for data centers.

A network card with onboard processing that offloads packet, storage and security work from the server CPU.

SMR · Small Modular Reactor

Factory-built nuclear reactor under ~300 MW, proposed as clean firm power for large AI campuses.

An audit report (Type II covers a period) attesting that an organization's security and availability controls work as described.

SOFC · Solid-Oxide Fuel Cell

A high-temperature fuel cell generating clean on-site electricity from gas or hydrogen for data-center power.

Solid-State Transformer · SST

High-efficiency electronic transformer converting medium voltage directly to the DC bus feeding modern racks.

A nation's drive to own and control AI compute, data and models within its borders for security and autonomy.

NVIDIA's Ethernet platform tuning RoCE for AI collectives with adaptive routing and congestion control.

Speculative decoding

Using a small draft model to guess several tokens that a large model verifies in parallel, speeding generation.
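
The control flow can be sketched with greedy decoding and toy stand-ins for the two models. Everything here is hypothetical: `draft_next` and `target_next` are plain functions rather than real models, and the verification loop stands in for what a real system does in one batched forward pass. Production implementations typically use a probabilistic acceptance rule rather than exact matching.

```python
def speculative_step(prefix, draft_next, target_next, k=4):
    """One round of greedy speculative decoding; returns the newly accepted tokens."""
    # 1. The draft model proposes k tokens autoregressively.
    context = list(prefix)
    proposal = []
    for _ in range(k):
        token = draft_next(context)
        proposal.append(token)
        context.append(token)

    # 2. The target model checks each proposed position.
    context = list(prefix)
    accepted = []
    for token in proposal:
        expected = target_next(context)
        if token != expected:
            accepted.append(expected)  # first mismatch: keep the target's token and stop
            return accepted
        accepted.append(token)
        context.append(token)

    accepted.append(target_next(context))  # all k matched: the target adds one more token
    return accepted


# Toy models: the target counts upward mod 5; the draft agrees except at context length 3.
target = lambda ctx: len(ctx) % 5
draft = lambda ctx: 9 if len(ctx) == 3 else len(ctx) % 5

new_tokens = speculative_step([0], draft, target, k=4)
print(new_tokens)
```

The output always matches what the target model would have produced alone; the speedup comes from accepting several tokens per target pass when the draft guesses well.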

SPOF · Single Point of Failure

A component whose failure alone takes down the whole system; eliminating SPOFs is the goal of redundant design.

SPV · Special-Purpose Vehicle

A standalone, bankruptcy-remote legal entity created to own and finance a single project and ring-fence its risk.

SST · Solid-State Transformer

Power-electronics transformer (~99% efficient) enabling medium-voltage-to-DC conversion for 800 VDC rack architectures.

A node running slower than its peers that holds up a synchronized collective and drags down whole-job throughput.

Stranded capacity

Provisioned power, cooling or space that cannot be used because a different resource is the binding constraint.

Generation or interconnection capacity that exists but cannot reach load due to transmission or siting limits.

STS · Static Transfer Switch

A solid-state switch that transfers a load between two power sources within a few milliseconds, fast enough that IT equipment sees no interruption.

SU · Scalable Unit

A repeatable build block (a defined MW + GPU + cooling + fabric increment) that capacity ramps are composed of.

The facility transforming and switching power between transmission and the site, a major long-lead build item.

Super-load · Heavy-haul

An oversized, very heavy shipment (such as a large transformer) needing special permits and routing logistics.

SuperPOD · DGX SuperPOD

NVIDIA's reference cluster design wiring many DGX systems into a validated, scalable AI supercomputer.

Assembly of breakers, switches and protection that controls and isolates electrical circuits; a long-lead procurement item.

A grid of processing elements that pumps data through in lockstep, the core structure of TPUs and many AI ASICs.

Tail latency · p99

The slowest few percent of responses (e.g. 99th percentile); in clusters one slow node can stall a whole job.
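
The cluster effect follows from simple probability. If each node independently has a 1% chance of a p99-slow step, the chance that a synchronized step waits on at least one slow node grows quickly with node count. The independence assumption and the node counts are illustrative.

```python
def prob_step_hits_tail(nodes, p_slow=0.01):
    """Probability that at least one of `nodes` independent nodes is slow in a given step."""
    return 1 - (1 - p_slow) ** nodes


for n in (1, 100, 1000):
    print(n, round(prob_step_hits_tail(n), 4))
```

At 100 nodes roughly two steps in three are paced by a tail event, so the p99 of a single node effectively becomes the typical step time of the job.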

Contract obligating the buyer to pay for a minimum quantity of power or capacity whether or not it is used.

The milestone of finalizing a chip design and sending it to the foundry for fabrication.

TCO · Total Cost of Ownership

The full lifetime cost of capacity, including capex amortization, power, cooling, staff and maintenance.

TDP · Thermal Design Power

The sustained power and heat a chip package must dissipate; the per-chip number that drives rack density and cooling.

TEE · Trusted Execution Environment

A hardware-isolated, encrypted region of a processor that protects code and data even from the host operator.

Tensor parallelism · TP

Splitting a single layer's math across multiple GPUs so they jointly compute one forward/backward pass.

NVIDIA's optimized library for compiling and serving large language models at low latency on its GPUs.

Test-time compute

Spending extra inference compute (e.g. chain-of-thought reasoning) to improve answers, shifting cost from training to serving.

The cascade · Training-to-inference

The lifecycle where today's training hardware becomes tomorrow's inference fleet as newer chips arrive.

Thermal Design Power · TDP

Sustained heat-dissipation requirement of a processor, setting cooling and rack-density needs.

Thermal runaway

A self-reinforcing temperature rise, notably in batteries, that can lead to fire if not detected and contained.

A telecom-industry standard for data-center infrastructure with its own Rated 1-4 reliability classification.

An Uptime classification meaning concurrently maintainable: any component can be serviced without taking IT load down.

The top Uptime classification meaning fault tolerant: the facility survives any single failure with no impact.

TIM · Thermal Interface Material

The paste or pad filling microscopic gaps between a chip and its heat spreader or cold plate to conduct heat.

Time-to-power · Speed-to-power

The elapsed time from contract to energized megawatts; the binding constraint and primary siting screen of the AI era.

TLC · Triple-Level Cell

Flash storing three bits per cell, the mainstream balance of cost, endurance and performance for SSDs.

Tokens-per-joule

Inference energy efficiency measured as tokens generated per joule, a cross-vendor comparator that stays meaningful across hardware generations.

Tokens-per-watt

Inference efficiency framed as token throughput (tokens per second) per watt of power, numerically the same as tokens-per-joule; used to compare accelerators and fleets.

Topology-aware scheduling

Placing a job's GPUs to respect network topology so its collectives run on high-bandwidth, low-latency links.

Total Cost of Ownership · TCO

The all-in cost of running infrastructure over its life, not just the purchase price.

TPOT · Time Per Output Token

The steady-state delay between successive generated tokens; the inter-token latency SLO governed by decode.

TPU · Tensor Processing Unit

Google's custom AI accelerator chip, built around a systolic array for matrix math and used across its cloud services and its own models.

Amazon's custom AI training accelerator, part of its bid to reduce dependence on merchant GPUs.

Dispatching a technician on-site to fix something; minimizing truck rolls is a goal of remote and automated ops.

TSV · Through-Silicon Via

A vertical electrical channel drilled through a die to stack chips, the wiring that makes HBM and 3D stacking possible.

TTFT · Time To First Token

How long an inference request waits before the first output token appears; a key latency SLO set by prefill.
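
TTFT and TPOT combine into a rough end-to-end latency estimate: the first token arrives after TTFT, and each later token adds one TPOT. The sketch below assumes a steady decode rate and uses hypothetical numbers.

```python
def response_latency(ttft_s, tpot_s, output_tokens):
    """Approximate time to stream a full response: TTFT plus one TPOT per later token."""
    if output_tokens < 1:
        raise ValueError("need at least one output token")
    return ttft_s + (output_tokens - 1) * tpot_s


# Hypothetical request: 0.4 s to first token, 30 ms per token, 200 output tokens.
total_s = response_latency(ttft_s=0.4, tpot_s=0.03, output_tokens=200)
print(round(total_s, 2))
```

For long outputs the TPOT term dominates, which is why decode speed rather than prefill speed usually sets perceived latency.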

TUE · Total Usage Effectiveness

PUE multiplied by IT-side efficiency (ITUE); the true facility-to-transistor energy ratio.
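
A short worked example of the product. ITUE is taken here as total IT energy divided by the energy reaching the compute components (so server fans, PSU and VRM losses count as overhead); the input values are hypothetical.

```python
def pue(total_facility_kw, it_kw):
    """Total facility power divided by IT power."""
    return total_facility_kw / it_kw


def tue(pue_value, itue_value):
    """Total Usage Effectiveness: facility overhead times IT-internal overhead."""
    return pue_value * itue_value


# Hypothetical hall: 12 MW at the meter, 10 MW at the IT racks,
# and 8 MW actually reaching the compute silicon (ITUE = 10 / 8).
facility_pue = pue(12_000, 10_000)
hall_tue = tue(facility_pue, 10_000 / 8_000)
print(facility_pue, round(hall_tue, 3))
```

Here a respectable PUE of 1.2 still means 1.5 W drawn at the meter for every watt delivered to the compute components.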

Two-person rule

A control requiring two authorized people to act together for a sensitive operation, reducing insider risk.

Two-phase cooling

Cooling that absorbs heat by boiling a fluid and condensing it, exploiting latent heat for very high heat flux.

An open scale-up interconnect standard for up to 1,024 accelerators, the multi-vendor alternative to NVLink.

UCIe · Universal Chiplet Interconnect Express

An open standard for connecting chiplets from different vendors within one package.

UEC · Ultra Ethernet Consortium

The industry group defining AI-grade Ethernet transport (UET) with packet spray and modern congestion control.

Uninterruptible Power Supply · UPS

Power system that keeps IT load energized through grid sags and outages until backup generation starts.

UPS · Uninterruptible Power Supply

System (battery or flywheel backed) that maintains clean power through grid disturbances and bridges to generators.

Uptime Institute's I-IV classification of facility resilience, from basic (I) to fault-tolerant 2N (IV).

UQD · Universal Quick Disconnect

A dripless connector that lets liquid-cooled hardware be plugged and unplugged without tools and without spilling coolant.

A popular open-source inference engine known for PagedAttention and high-throughput continuous batching.

Voltage Regulator Module · VRM

On-board converter delivering the precise low voltage a processor core requires from the board supply.

VRM · Voltage Regulator Module

Power-electronics stage that steps board voltage down to the low voltage a GPU or CPU core actually needs.

WACC · Weighted Average Cost of Capital

The blended cost of a project's debt and equity, used as the discount rate for valuing its cash flows.
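
A minimal sketch of the usual textbook form, which weights each source of capital by its share and applies a tax shield to debt. The tax adjustment is an assumption beyond the one-line definition above, and all inputs are hypothetical.

```python
def wacc(equity, debt, cost_of_equity, cost_of_debt, tax_rate=0.0):
    """Weighted average cost of capital with an after-tax cost of debt."""
    total = equity + debt
    return (equity / total) * cost_of_equity + (debt / total) * cost_of_debt * (1 - tax_rate)


# Hypothetical project: 40% equity at 12%, 60% debt at 6%, 25% tax rate.
rate = wacc(equity=40.0, debt=60.0, cost_of_equity=0.12, cost_of_debt=0.06, tax_rate=0.25)
print(round(rate, 4))
```

The resulting rate is what a discounted-cash-flow valuation of the project would use as its discount rate.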

Warm-water loop

A liquid-cooling loop run at elevated temperature so heat can be rejected with free cooling and reused downstream.

Water Usage Effectiveness · WUE

Liters of water consumed per kWh of IT energy; the water-efficiency companion to PUE.

A commitment to replenish more water than a facility consumes, a sustainability pledge by several operators.

Wet-bulb temperature

The lowest temperature achievable by evaporation, setting the floor for how well evaporative cooling can perform.

The conditioned data-hall area where IT racks sit, as opposed to support (gray) space for power and cooling gear.

WORM · Write Once Read Many

Storage that prevents data from being altered or deleted after writing, used for compliance and tamper resistance.

WUE · Water Usage Effectiveness

Liters of water consumed per kWh of IT energy; the water analog of PUE that evaporative cooling worsens.

An NVIDIA GPU error code reported by the driver; specific XIDs flag memory, hardware or driver faults to triage.

Generic term for a non-GPU AI accelerator such as a TPU, Trainium, Maia or MTIA; hyperscaler custom silicon.

The formula setting the optimal checkpoint interval by balancing checkpoint cost against expected failure-rollback loss.
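
The first-order form commonly attributed to Young sets the interval to the square root of twice the checkpoint cost times the mean time between failures. The sketch below assumes that form, which holds when the checkpoint cost is small relative to the MTBF; the numbers are hypothetical.

```python
import math


def checkpoint_interval(checkpoint_cost, mtbf):
    """First-order optimal interval: sqrt(2 * checkpoint_cost * mtbf), in the inputs' time unit."""
    return math.sqrt(2 * checkpoint_cost * mtbf)


# Hypothetical job: checkpoints take 2 minutes, the cluster fails every 400 minutes on average.
interval_min = checkpoint_interval(checkpoint_cost=2.0, mtbf=400.0)
print(interval_min)
```

Because MTBF shrinks as a cluster grows, the same job on more nodes should checkpoint more often.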

ZeRO · Zero Redundancy Optimizer

A technique partitioning optimizer state, gradients and parameters across GPUs to remove memory redundancy in training.
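
The memory effect of each partitioning stage can be estimated with the accounting commonly used for mixed-precision Adam: 2 bytes per parameter for FP16 weights, 2 for FP16 gradients, and about 12 for FP32 optimizer state. Those byte counts, the model size and the GPU count are assumptions for illustration, and activations are ignored.

```python
def zero_bytes_per_gpu(params, gpus, stage, optimizer_bytes=12):
    """Approximate model-state bytes per GPU for ZeRO stages 0 to 3."""
    weights = 2 * params                 # FP16 parameters
    grads = 2 * params                   # FP16 gradients
    optimizer = optimizer_bytes * params # FP32 master weights, momentum, variance
    if stage >= 1:
        optimizer /= gpus                # stage 1 partitions optimizer state
    if stage >= 2:
        grads /= gpus                    # stage 2 also partitions gradients
    if stage >= 3:
        weights /= gpus                  # stage 3 also partitions parameters
    return weights + grads + optimizer


# Hypothetical 7.5B-parameter model on 64 GPUs.
per_stage_gb = [zero_bytes_per_gpu(7.5e9, 64, s) / 1e9 for s in range(4)]
print([round(gb, 1) for gb in per_stage_gb])
```

Without partitioning the model states alone exceed a single GPU's memory; full partitioning brings them down to under 2 GB per GPU in this example.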

A security model that trusts no user or device by default and verifies every access request continuously.

Zero-touch provisioning · ZTP

Automatically configuring devices on first power-up with no manual setup, key to deploying at scale.

ZLD · Zero Liquid Discharge

A water system that recovers nearly all wastewater for reuse, leaving essentially no liquid discharge.