Chapter 1.8
Business Models, Economics & ROI
An AI data center is a depreciating capital asset whose return is decided by four numbers — capex per watt, the depreciation life you assume, the utilization you actually achieve, and the price you can still charge after the market deflates it — and getting any one of them wrong turns a 'factory' into a stranded balance-sheet liability.
What you'll decide here
- Which depreciation life you underwrite the asset against — the 5–6 year book life that flatters near-term earnings, or the 2–3 year frontier-economic life that the workload actually obeys — because that single assumption is the dominant lever on every TCO, $/GPU-hr, and breakeven number downstream.
- Which operating archetype you are (hyperscaler, neocloud, colo/build-to-suit, or self-build) and therefore whose cost-of-capital, utilization risk, and margin structure you inherit.
- The utilization you can credibly contract or fill — because below the ~70% debt-financed breakeven the same hardware that prints money at 85% bleeds cash, and the swing is wider than any engineering optimization can recover.
- How much of your revenue is contracted/take-or-pay versus merchant/spot — the split that sets your debt capacity, your exposure to the ~10x/yr token-price deflation, and whether a single non-renewal strands the asset.
- Which downside you are designed to survive — utilization collapse, a residual-value shock, a rate spike, or a contract non-renewal — and whether the design-for-flexibility you paid for actually hedges it.
Every chapter before this one spends money; this is the chapter that decides whether the money comes back. An AI data center is not, financially, a building — it is a depreciating capital asset with a very short fuse, dominated by silicon that loses value faster than almost any industrial equipment ever financed at this scale. The engineering decisions in Parts 5 through 12 set the cost stack; the market decisions in Part 16 set the demand; this chapter is where the two meet in a single objective function: does the asset earn its cost of capital before the workload, the hardware, or the price curve makes it obsolete?
We build the cost stack and the TCO denominator; we confront the depreciation debate that quietly determines whether the whole industry is profitable; we lay out the $/GPU-hr pricing ladder and the breakeven that governs it; we trace inference unit economics down to $/M-tokens and the gross-margin waterfall that the application layer lives or dies on; we score build-vs-own-vs-lease as an NPV with an explicit option value; and we close on the operating archetypes and the downside stress tests. The through-line: most of the decisive numbers in AI-infrastructure economics are contested, and the contested ones are exactly the ones the return depends on. We flag them as we go.
The asset and its cost stack
Start with the denominator. The canonical bottom-up reference is a 1 GW AI data center: roughly $38B total-program capex and ~$8.5B/yr all-in TCO once costs are annualized over their respective asset lives (Epoch AI, 2026). The $38B is the total program: the ~$27.90/W core IT-plus-power-plus-shell stack below, plus land, the multi-year build-out, and financing carried to energization. The TCO works out to about $8.5M per MW per year, the single number to carry in your head when someone quotes you a lease or a colo rate, because it is the all-in cost you are implicitly benchmarking against.
The cost stack inverts the intuition of a traditional data center. In a legacy facility the building and power plant dominate; in an AI factory the silicon dominates everything. The split is roughly: IT/servers ~60–64% (about $17.50/W), power + cooling + electrical ~29–30% (about $7–10/W), and the building shell ~7% (about $1.90/W) — a core capital intensity near $27.90/W (≈ $17.50 + $8.50 + $1.90) for the IT-plus-power-plus-shell scope of an AI-optimized build, before land, multi-year build-out, and financing (Epoch AI; domain synthesis, 2026). The consequence of a server-dominated stack is profound: the asset's economic life is the GPU's economic life, not the concrete's. You can amortize a shell over thirty years; you cannot amortize a frontier accelerator over thirty years, and pretending otherwise is the original sin of AI-infrastructure accounting.
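The split above is simple arithmetic, and a minimal sketch makes it easy to re-run with different inputs. The per-watt values are the chapter's own; the dictionary keys and function names are illustrative, and $8.50/W is simply the value the chapter uses in its sum, inside the quoted ~$7–10/W range.

```python
# Core capital intensity in $/W, using the chapter's figures.
# 8.50 is the value used in the chapter's own sum (range quoted: ~$7-10/W).
STACK_USD_PER_W = {
    "it_servers": 17.50,
    "power_cooling_electrical": 8.50,
    "building_shell": 1.90,
}

def core_capex_per_watt(stack):
    """Sum the per-watt components of the core stack."""
    return sum(stack.values())

def component_shares(stack):
    """Each component's fraction of the core stack."""
    total = core_capex_per_watt(stack)
    return {name: usd / total for name, usd in stack.items()}

core_per_watt = core_capex_per_watt(STACK_USD_PER_W)
shares = component_shares(STACK_USD_PER_W)
core_capex_1gw = core_per_watt * 1e9       # 1 GW = 1e9 W
non_core_1gw = 38e9 - core_capex_1gw       # land, build-out, financing

for name, share in shares.items():
    print(f"{name:28s} {share:6.1%}")
print(f"core stack: ${core_per_watt:.2f}/W, ${core_capex_1gw / 1e9:.1f}B per GW")
print(f"implied non-core program cost: ${non_core_1gw / 1e9:.1f}B")
```

On these inputs the core stack accounts for about $27.9B of the $38B program, which leaves roughly $10B for land, build-out, and financing.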
The costing denominator matters as much as the numerator. Quote a facility in $/MW-year and you are comparing real estate; quote it in $/GPU-hour and you are comparing compute supply; quote it in $/M-tokens and you are comparing the product the customer actually buys. The three denominators are linked by utilization and by tokens-per-GPU-second, and a number that looks competitive in one can be uncompetitive in another. Naming the denominator is the first act of honesty in any AI-infrastructure pro-forma. → metric definitions in Chapter 0.3.
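The link between the three denominators can be sketched as two conversions. This is an illustration under stated assumptions, not a pricing model: the all-in kW per GPU and the tokens per GPU-second below are hypothetical placeholders, and only the $8.5M/MW-year figure comes from the chapter.

```python
HOURS_PER_YEAR = 8760

def usd_per_sold_gpu_hour(usd_per_mw_year, kw_per_gpu, utilization):
    """Facility-level $/MW-year -> $ per GPU-hour actually sold."""
    usd_per_kw_year = usd_per_mw_year / 1000.0
    usd_per_gpu_year = usd_per_kw_year * kw_per_gpu
    return usd_per_gpu_year / (HOURS_PER_YEAR * utilization)

def usd_per_million_tokens(usd_per_gpu_hour, tokens_per_gpu_second):
    """$ per GPU-hour -> $ per million tokens served."""
    tokens_per_hour = tokens_per_gpu_second * 3600
    return usd_per_gpu_hour / tokens_per_hour * 1_000_000

TCO_USD_PER_MW_YEAR = 8.5e6    # chapter figure
KW_PER_GPU_ALL_IN = 1.4        # hypothetical: facility kW attributed to one GPU
TOKENS_PER_GPU_SECOND = 350    # hypothetical serving throughput

for utilization in (0.55, 0.70, 0.85):
    gpu_hr = usd_per_sold_gpu_hour(TCO_USD_PER_MW_YEAR, KW_PER_GPU_ALL_IN, utilization)
    m_tok = usd_per_million_tokens(gpu_hr, TOKENS_PER_GPU_SECOND)
    print(f"util {utilization:.0%}: ${gpu_hr:.2f}/GPU-hr, ${m_tok:.2f}/M-tokens")
```

The same $/MW-year produces a different $/GPU-hour at each utilization, and a different $/M-tokens at each throughput, which is why a figure quoted without its denominator cannot be compared.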
GPU economics, depreciation and obsolescence
This is the canonical home for the depreciation debate, because it is where the contested figures do the most damage if mis-set. Begin with the unit. An 8-GPU H100 SXM server runs ~$250k–$400k (~$27k–$40k/GPU all-in). Depreciation on a $300k server is ~$50k/yr over six years, ~$60k/yr over five, ~$75k/yr over four — and that one line is the largest single component of a self-operated cluster's cost. The depreciation schedule is therefore not a footnote; it is the cost structure.
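The depreciation figures above follow from straight-line arithmetic. A minimal sketch, assuming zero salvage value and the $300k server from the text, reproduces them; the per-GPU-hour column spreads the charge over every hour of the year, so it is a floor rather than a cost per sold hour.

```python
HOURS_PER_YEAR = 8760
GPUS_PER_SERVER = 8

def straight_line_depreciation(cost_usd, life_years, salvage_usd=0.0):
    """Annual straight-line depreciation charge."""
    if life_years <= 0:
        raise ValueError("life_years must be positive")
    return (cost_usd - salvage_usd) / life_years

SERVER_COST_USD = 300_000   # chapter's worked example
schedule = {
    life: straight_line_depreciation(SERVER_COST_USD, life)
    for life in (3, 4, 5, 6)
}

for life, annual in schedule.items():
    # Spread over all calendar hours (100% availability, no idle adjustment).
    per_gpu_hour = annual / GPUS_PER_SERVER / HOURS_PER_YEAR
    print(f"{life}-year life: ${annual:,.0f}/yr, ${per_gpu_hour:.2f}/GPU-hr")
```

Moving from a six-year to a three-year life doubles the annual charge on the same hardware.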
The bull case — the reason a 5–6 year book life is defensible — is the training-to-inference cascade. A GPU retired from frontier pre-training is not scrap; it cascades down to post-training, then to inference serving, then to batch and internal workloads, earning revenue at each step. If the cascade holds, the economic life stretches toward the book life and the accounting is honest. The bear case is that the cascade is finite (there is only so much inference demand for a two-generations-old part), that each new generation is so much more efficient per token that the old part is uneconomic to run against grid power, and that the residual market is thin. Both can be partly true at once.
The residual evidence is genuinely mixed, which is why it is CONTESTED. H100s retained ~60–83% of value at 18 months, but secondary rental rates fell 64–75% from their $8–10/hr peak, and the implied residual after three years is ~20–40% (Hashrate Index; CNBC synthesis, 2025). A residual that high underwrites the cascade defense; a residual that low validates the short-life bears. The hyperscalers themselves disagree in public: Meta extended server life from 4.0 to 5.5 years (+$2.9B income); Amazon went the other way, 6 to 5 years (−$700M), in the same window (company filings, 2025). When the largest operators move depreciation in opposite directions, no outside party should pretend the number is settled.
Deep dive: the Burry thesis and why understated depreciation is a systemic question, not a stock pick
The sharpest version of the bear case is the claim that the industry is systematically under-depreciating its AI fleet — booking long lives to flatter earnings while the assets decay on the short schedule. The most-cited estimate puts ~$176B of understated depreciation across 2026–2028 for the major operators, against an industry AI-asset D&A line approaching ~$400B/yr (Michael Burry / secondary analyses, 2025–2026). The mechanism is simple accounting: every year you extend the assumed useful life, you move cost off the current income statement, so reported operating margin rises even though nothing about the physical asset improved.
Why this matters beyond a single short position: depreciation policy is the hinge between two completely different pictures of AI-infrastructure profitability. On the long life, the build-out is a high-margin growth story. On the short life, a large share of current 'earnings' is borrowed from a future write-down. The honest engineering-economics posture is not to pick a side but to model the cash flows on the economic life and the reported earnings on the book life, and watch the gap — because the gap is where stranded-asset risk hides. This is CONTESTED and the figures bind to the dated forecast register. → Appendix D; macro framing in Chapter 16.4.
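The gap between the two schedules can be sketched directly. The fleet cost below is a hypothetical round number chosen for illustration, not an estimate of any operator's fleet, and the sketch assumes straight-line depreciation with no salvage value.

```python
def annual_depreciation(cost, life_years):
    """Straight-line annual charge, zero salvage."""
    return cost / life_years

def annual_gap(cost, book_life, economic_life):
    """Extra yearly expense the economic life implies over the book life."""
    return annual_depreciation(cost, economic_life) - annual_depreciation(cost, book_life)

def book_value_at_economic_end(cost, book_life, economic_life):
    """Net book value still on the balance sheet when the asset stops earning."""
    years_elapsed = min(economic_life, book_life)
    return cost - annual_depreciation(cost, book_life) * years_elapsed

FLEET_COST = 100.0   # hypothetical fleet cost, $B

gap = annual_gap(FLEET_COST, book_life=6, economic_life=3)
stranded = book_value_at_economic_end(FLEET_COST, book_life=6, economic_life=3)
print(f"annual earnings flattered by: ${gap:.1f}B")
print(f"book value left at economic end-of-life: ${stranded:.1f}B")
```

In this illustration half the purchase price is still on the books when the asset stops earning; that residual is the future write-down the short-life argument points to.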
Pricing, utilization and revenue management
Cost is half the equation; the other half is what you can charge and how full you keep the asset. The $/GPU-hour ladder in 2026 spans roughly an order of magnitude (about 12x from floor to ceiling) for the same H100: a ~$1.03/hr spot floor, a neocloud median ~$2.29–3.50/hr, AWS on-demand ~$6.88/hr, and Azure ~$12.29/hr (SemiAnalysis H100 Index; AM Compute, 2026). Neoclouds price 40–85% below the hyperscalers because they sell raw capacity without the managed-services envelope. The ladder is not static: the 1-year contract index rose ~40% from October 2025 to March 2026 as supply tightened, a reminder that GPU pricing is a commodity market with real cycles, not a SaaS price list.
Against that revenue ladder sits the cost the operator actually carries. A self-operated build lands near $0.74/GPU-hr at 2048-GPU scale and 90% utilization, rising to ~$1.03/hr for small clusters (SemiAnalysis, 2025). The spread between that cost and the rental ladder is the gross margin, but only if the asset is full. Utilization is the silent variable that dominates the whole pro-forma.
Revenue management therefore reduces to filling the asset above the cliff and tiering the fill by value. Revenue per GW of AI capacity runs ~$10–12B/GW/yr (SemiAnalysis, 2025 — a contested, single-source figure), which is why speed-to-power has direct dollar value: energizing 200 MW six months early is worth roughly $1–1.2B in incremental revenue against a depreciation clock that is already running. The revenue-per-MW you can actually realize tiers by archetype — interactive inference at a latency premium, batch at a spot discount — and the mix you contract determines whether you sit comfortably above breakeven or hope for it.
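The speed-to-power figure is a one-line calculation. The sketch below reproduces it from the chapter's revenue-per-GW range; the function name is illustrative.

```python
def early_energization_revenue(mw, months_early, revenue_per_gw_year):
    """Incremental revenue from bringing `mw` online `months_early` sooner."""
    return (mw / 1000.0) * (months_early / 12.0) * revenue_per_gw_year

low = early_energization_revenue(200, 6, 10e9)    # low end of $10-12B/GW/yr
high = early_energization_revenue(200, 6, 12e9)   # high end
print(f"200 MW, six months early: ${low / 1e9:.1f}B to ${high / 1e9:.1f}B")
```

The result inherits the single-source caveat on the revenue-per-GW input: if that figure is wrong, so is the value of the head start.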
| Supply channel | Price / cost ($/GPU-hr) | What it includes | Implied posture |
|---|---|---|---|
| Spot floor | ~$1.03 | Bare capacity, interruptible, no SLA | Below or at self-op cost — a buyer's market signal |
| Neocloud median | ~$2.29–3.50 | Reserved capacity, basic SLA, fast time-to-job | 40–85% under hyperscaler; the volume tier |
| AWS on-demand | ~$6.88 | Managed, integrated, enterprise SLA | Convenience and trust premium |
| Azure on-demand | ~$12.29 | Managed, integrated, enterprise SLA | Top of the ladder; managed-services envelope |
| Self-operated cost (2048 GPU @ 90%) | ~$0.74 | Your own all-in TCO, excludes margin | The cost you must beat to justify building |
| Self-operated cost (small cluster) | ~$1.03 | Sub-scale all-in TCO | Scale penalty erases much of the build advantage |
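The self-operated cost rows are quoted at a stated utilization, so it helps to restate them per available hour and see how the cost per sold hour moves as the asset empties. This is a sketch assuming every cost is fixed (nothing scales down when GPUs sit idle) and ignoring debt service, so its breakeven is lower than a debt-financed one.

```python
def cost_per_available_hour(cost_per_sold_hour, utilization):
    """Back out the fixed cost per calendar GPU-hour from a quoted figure."""
    return cost_per_sold_hour * utilization

def cost_per_sold_hour(cost_available_hour, utilization):
    """Fixed cost spread over only the hours that are actually sold."""
    return cost_available_hour / utilization

def breakeven_utilization(cost_available_hour, price_per_hour):
    """Utilization at which revenue per calendar hour equals fixed cost."""
    return cost_available_hour / price_per_hour

# Chapter figure: ~$0.74/GPU-hr at 90% utilization (2048-GPU scale).
fixed = cost_per_available_hour(0.74, 0.90)

for utilization in (0.90, 0.85, 0.70, 0.55):
    print(f"util {utilization:.0%}: ${cost_per_sold_hour(fixed, utilization):.2f}/sold GPU-hr")

# Unlevered breakeven against the ~$1.03/hr spot floor.
spot_breakeven = breakeven_utilization(fixed, 1.03)
print(f"breakeven vs spot floor: {spot_breakeven:.0%}")
```

Even before interest, a fleet selling at the spot floor needs to stay roughly two-thirds full to cover its own cost.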
Inference revenue and unit economics
This is the canonical home for inference unit economics, because inference is now ~2/3 of AI compute and the workload most operators actually monetize. The build-up runs from physics to price: tokens/GPU-second → $/GPU-hour → $/M-tokens. A worked example: an 8x H100 node at ~$19.20/hr serving Llama-70B at ~2,800 tokens/sec lands near ~$1.90 per million tokens self-hosted (Introl / NVIDIA synthesis, 2025) — though the number is brutally sensitive to model size, precision, and batch efficiency. The same hardware can swing the cost several-fold depending on how well you batch and how long the decode sequences run.
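The worked example can be checked in a few lines, and the same function shows the throughput sensitivity the text describes. The node price and throughput are the chapter's; the halved-throughput case is a hypothetical illustration of poor batching, not a measured figure.

```python
def usd_per_million_tokens(node_usd_per_hour, tokens_per_second):
    """Cost to serve one million tokens on a node at a given throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return node_usd_per_hour / tokens_per_hour * 1_000_000

NODE_USD_PER_HOUR = 19.20    # 8x H100 node, chapter figure
TOKENS_PER_SECOND = 2_800    # Llama-70B aggregate throughput, chapter figure

base = usd_per_million_tokens(NODE_USD_PER_HOUR, TOKENS_PER_SECOND)
degraded = usd_per_million_tokens(NODE_USD_PER_HOUR, TOKENS_PER_SECOND / 2)
print(f"baseline: ${base:.2f}/M-tokens")
print(f"half throughput (hypothetical): ${degraded:.2f}/M-tokens")
```

Cost per token is inversely proportional to throughput, so any loss of batch efficiency passes straight through to $/M-tokens.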
The application layer that sits on top earns a gross margin that is structurally worse than software's, and this is the figure most often missed in AI business plans. AI-app gross margins run ~41% rising toward ~52% in 2026 (application-layer specifically nearer 45%), against traditional SaaS at 70–90% (ICONIQ State of AI 2026; Bessemer, 2026). Inference COGS averages ~23% of revenue at scaling-stage AI companies — for every $1M of AI product revenue, roughly $230k is consumed by inference. The gross-margin waterfall is: list price, minus token COGS, minus the inevitable free-tier and retry overhead, minus the cost of the long decode sequences that reasoning models emit. Every layer of that waterfall is under pressure from the layer below it.
Build vs own vs lease: the NPV and the option value
Chapter 1.6 framed the procurement fork qualitatively; this is its quantitative home. The benchmark to anchor against is wholesale colocation: ~$217/kW-month global average in 2025 (Ashburn ~$215 at record highs; range ~$120 in Atlanta to ~$250 in Silicon Valley, up to ~$450 in Singapore), with build-to-suit / credit-tenant leases at ~$150–220/kW-month over 15 years and vacancy near 1% (JLL / CBRE, 2025). Convert $/kW-month to $/MW-year (multiply by 12,000, so $217/kW-month is about $2.6M/MW-year) before comparing it against the ~$8.5M/MW-year all-in TCO of a self-build, and keep the scopes straight: the rent buys powered space, while the self-build figure also carries the IT that a tenant must still fund. On a like-for-like scope, leasing trades a higher steady-state unit cost for capex-light speed and, critically, optionality under demand uncertainty.
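The unit conversion is worth writing down once, because lease quotes and TCO figures rarely arrive in the same denominator. A minimal sketch using the chapter's rates:

```python
def kw_month_to_mw_year(usd_per_kw_month):
    """$/kW-month -> $/MW-year (1,000 kW per MW, 12 months per year)."""
    return usd_per_kw_month * 1000 * 12

rates = {
    "wholesale colo, global avg": 217,
    "build-to-suit, low": 150,
    "build-to-suit, high": 220,
}
for label, rate in rates.items():
    print(f"{label}: ${kw_month_to_mw_year(rate) / 1e6:.2f}M/MW-yr")
```

These converted figures cover the leased facility only; tenant-owned IT sits on top of them.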
The NPV comparison is necessary but not sufficient, because a flat DCF understates the value of being able to change your mind. When demand is uncertain — the normal state in 2026 — a lease is a real option: the right, not the obligation, to continue holding capacity, exercisable as demand resolves. A self-build forecloses that option; you own the megawatts whether or not the workload materializes. The correct comparison prices the option premium: how much extra $/MW-year is the exit/flex right worth, given your demand variance? For a durable, well-forecast workload at scale the option is nearly worthless and the build wins on unit cost. For a spiky or uncertain workload the option dominates and leasing or renting wins even at a higher headline rate. The fork is not build-vs-lease in the abstract — it is how confident is your demand forecast, expressed as an option price.
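A two-state toy model shows why the exit right can outweigh a lower unit cost. Everything here is hypothetical and undiscounted: demand either materializes or it does not, the owner carries the cost for the full horizon, and the lessee can walk away after one year. The numbers are illustrative units per MW-year, not market data.

```python
def expected_net_value(p_demand, value_if_demand, annual_cost, years,
                       can_exit, exit_after_years=1):
    """Undiscounted expected net value under a two-state demand outcome."""
    with_demand = (value_if_demand - annual_cost) * years
    if can_exit:
        # The exit right caps the loss at the cost carried before walking away.
        without_demand = -annual_cost * exit_after_years
    else:
        # An owner carries the cost for the whole horizon regardless.
        without_demand = -annual_cost * years
    return p_demand * with_demand + (1 - p_demand) * without_demand

VALUE, BUILD_COST, LEASE_COST, YEARS = 12.0, 8.5, 10.0, 5   # all hypothetical

for p in (0.9, 0.5):
    build = expected_net_value(p, VALUE, BUILD_COST, YEARS, can_exit=False)
    lease = expected_net_value(p, VALUE, LEASE_COST, YEARS, can_exit=True)
    winner = "build" if build > lease else "lease"
    print(f"P(demand)={p:.0%}: build {build:+.2f}, lease {lease:+.2f} -> {winner}")
```

With a confident forecast the cheaper unit cost wins; as the forecast weakens, the capped downside of the lease dominates even at a higher rate. A fuller treatment would discount the cash flows and model more than two demand states.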
| Mode | Unit cost (steady state) | Capital intensity | Time-to-power | Option value under uncertainty |
|---|---|---|---|---|
| Self-build (own) | Lowest at scale (~$8.5M/MW-yr all-in, IT included) | Highest (full capex) | 24–36 months | Lowest: you own the MW regardless of demand |
| Build-to-suit lease | ~$150–220/kW-mo (~$1.8–2.6M/MW-yr, facility only) | Capex-light (lease) | 12–24 months | Moderate: long term limits exit |
| Wholesale colo | ~$217/kW-mo avg (~$2.6M/MW-yr, facility only) | Capex-light (lease + IT) | 6–12 months | High: shorter terms preserve exit |
| Neocloud / rental | ~$2.29–3.50/GPU-hr (opex) | Opex only | Days to weeks | Highest: pure pay-as-you-go optionality |
Financing strategy: why the capital structure shapes the asset
How you finance the asset changes what you can build and what survives a downturn — the deal mechanics live in Chapter 2.5, but the strategic logic belongs here because it feeds straight into the ROI scorecard. The defining feature of the 2026 build-out is that it has outgrown self-funding: against a multi-year build estimated near $2.9T (2025–2028) with a ~$1.5T financing gap beyond hyperscaler cash flow (Morgan Stanley, 2025), the market reached for GPU-collateralized debt, delayed-draw term loans (DDTLs), bankruptcy-remote SPVs, and asset-backed securitization. ABS issuance ran ~$27B in 2025 and is projected toward $30–40B/yr in 2026–2027.
The strategic catch is that the collateral is the very asset whose value is contested. GPU-backed lending underwrites a depreciating, deflating asset against a thin secondary market — the same residual-value uncertainty from the depreciation debate, now wired into the capital structure. CoreWeave is the visible test case: FY25 revenue $5.13B and 60% adjusted-EBITDA margin, but a −$1.17B net loss, ~$21–25B of debt, interest near 46% of EBITDA, and a ~$66.8B backlog (~13x revenue) concentrated in a few anchor tenants (company filings, 2026). The 'circular financing' critique — vendor stakes and residual backstops that let a buyer finance the purchase of the vendor's own chips — is a real structural risk, not a talking point: it couples the financing to the same demand and residual assumptions the equipment depends on, so a residual shock hits collateral, covenants, and revenue at once.
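The ratios implied by the quoted figures can be derived in a few lines. This sketch uses only the numbers in the paragraph above; the derived values (EBITDA, implied interest, coverage, leverage) are arithmetic consequences of those inputs, not separately reported figures.

```python
REVENUE_B = 5.13              # FY25 revenue, $B
EBITDA_MARGIN = 0.60          # adjusted-EBITDA margin
INTEREST_SHARE = 0.46         # interest as a share of EBITDA
BACKLOG_B = 66.8              # backlog, $B
DEBT_RANGE_B = (21.0, 25.0)   # debt range, $B

ebitda_b = REVENUE_B * EBITDA_MARGIN
implied_interest_b = ebitda_b * INTEREST_SHARE
interest_coverage = 1.0 / INTEREST_SHARE          # EBITDA / interest
backlog_multiple = BACKLOG_B / REVENUE_B
leverage = tuple(debt / ebitda_b for debt in DEBT_RANGE_B)

print(f"implied adjusted EBITDA: ${ebitda_b:.2f}B")
print(f"implied interest: ${implied_interest_b:.2f}B")
print(f"interest coverage: {interest_coverage:.1f}x")
print(f"backlog / revenue: {backlog_multiple:.1f}x")
print(f"debt / EBITDA: {leverage[0]:.1f}x to {leverage[1]:.1f}x")
```

Coverage of roughly two times leaves little room for a fall in EBITDA or a rise in rates, which is the sensitivity the stress tests later in the chapter probe.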
The four operating archetypes
The same physical asset earns very different returns depending on who operates it, because each archetype inherits a different cost-of-capital, utilization risk, and margin structure. They are four distinct business models that happen to share a bill of materials.
The hyperscaler finances from operating cash flow at the lowest cost of capital, fills the asset with its own first-party demand (search, ads, cloud, internal training), and treats the data center as cost-of-revenue for a far larger product. Utilization risk is low because demand is captive; the depreciation policy is the visible lever, which is why hyperscaler life-extensions move billions of reported income. The neocloud is the opposite: thin margins, high leverage, GPU-backed debt, and acute exposure to the ~70% breakeven and to tenant concentration — a high-beta bet on sustained GPU demand. The colo / build-to-suit operator sells powered shells and steady $/kW-month rent, carries real-estate-like risk and real-estate-like cost of capital, and is largely insulated from GPU obsolescence because the tenant owns the silicon. The self-build enterprise/lab optimizes for control and long-run unit cost on a durable, well-forecast workload, accepting the deepest capital commitment and the full obsolescence risk in exchange.
| Archetype | Cost of capital | Utilization risk | Margin structure | Obsolescence exposure |
|---|---|---|---|---|
| Hyperscaler | Lowest (op cash flow) | Low — captive first-party demand | Cost-of-revenue for a larger product | High but absorbed; depreciation policy is the lever |
| Neocloud / GPU cloud | High (GPU-backed debt) | High — merchant demand, tenant concentration | Thin, leveraged, ~70% breakeven | Highest — owns silicon, sells hours |
| Colo / build-to-suit | Real-estate-like | Low-moderate — long leases | Steady $/kW-mo rent | Low — tenant owns the GPUs |
| Self-build (enterprise/lab) | Corporate/project finance | Self-imposed — own workload | Internal cost; lowest unit cost at scale | Full — owns and runs to economic end-of-life |
Protecting ROI and stress-testing the downside
Protecting the return comes down to a small set of levers, each of which maps to a downside it hedges. The power-cost lever is the largest controllable opex line (energy is ~$0.6B/yr in the 1 GW model), so a cheap, firm, long-dated PPA or on-site generation is worth more to lifetime ROI than most capex optimizations. Depreciation policy is the lever that decides whether reported margin reflects reality; the conservative choice protects against a residual shock at the cost of near-term earnings. Design-for-flexibility (reserving floor loading, water, and electrical headroom for a density ramp, and keeping procurement mode hybrid) is the lever that hedges workload and generation uncertainty. The ROI scorecard ties them together: levered IRR, DSCR, payback against economic (not book) life, and the contracted-vs-merchant revenue split that sets debt capacity.
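Two of the scorecard lines, DSCR and payback against economic life, can be sketched as follows. The $38B capex is the chapter's 1 GW figure; the annual cash flow, operating income, and debt service are hypothetical placeholders, and the payback is a simple undiscounted one. Levered IRR is omitted because it needs a dated cash-flow schedule.

```python
def dscr(net_operating_income, debt_service):
    """Debt service coverage ratio: cash available per dollar of debt service."""
    return net_operating_income / debt_service

def simple_payback_years(capex, annual_net_cash_flow):
    """Undiscounted years to recover capex from a level annual cash flow."""
    return capex / annual_net_cash_flow

def pays_back_within(capex, annual_net_cash_flow, life_years):
    """True if simple payback lands inside the assumed asset life."""
    return simple_payback_years(capex, annual_net_cash_flow) <= life_years

CAPEX_B = 38.0             # chapter's 1 GW total-program capex, $B
NET_CASH_B = 10.0          # hypothetical annual net cash flow, $B
NOI_B, DEBT_SERVICE_B = 10.0, 8.0   # hypothetical, $B

payback = simple_payback_years(CAPEX_B, NET_CASH_B)
print(f"payback: {payback:.1f} years")
print(f"inside 6-year book life: {pays_back_within(CAPEX_B, NET_CASH_B, 6)}")
print(f"inside 3-year economic life: {pays_back_within(CAPEX_B, NET_CASH_B, 3)}")
print(f"DSCR: {dscr(NOI_B, DEBT_SERVICE_B):.2f}x")
```

The same project passes the payback test on the book life and fails it on the economic life, which is why the scorecard specifies which life it is measured against.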
Then stress the downside, because the asset's fragility lives in the tails:
- Utilization collapse. The dominant risk. Falling from 85% to 55% utilization flips a profitable cluster to a cash-burning one ($670k/month swing on 1,024 GPUs). The hedge is contracted/take-or-pay revenue; the failure mode is a merchant fleet into a soft GPU-rental market.
- Residual-value shock. If three-year residuals collapse from ~40% toward the low end, GPU-backed debt is under-collateralized and the short-life depreciation bears are vindicated. The hedge is conservative depreciation and limited leverage; the failure mode is circular financing against an optimistic residual.
- Rate spike. Highly-levered builds (interest already ~46% of EBITDA at the visible neocloud) are acutely rate-sensitive; a financing-cost spike can exceed the entire margin. The hedge is fixed-rate, long-dated debt and a contracted revenue base.
- Contract non-renewal. Backlogs concentrated in a few anchor tenants (~13x revenue at the visible case) mean one non-renewal can strand a campus. The hedge is tenant diversification and take-or-pay with real termination economics.
- The secondary-market-depth (Burry) thesis. The whole defense of the long life rests on a deep, liquid secondary GPU market that can absorb cascaded hardware at a stable residual. If that market is thin, the cascade is a story rather than a cash flow, and a large slice of reported industry earnings is borrowed from a future write-down. This is the systemic version of the residual shock.
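The utilization-collapse figure in the list above can be unpacked with a short sketch. It assumes the swing is a revenue swing and uses 730 hours per month (8,760 / 12); the source states the swing but not the hourly rate, so the rate is backed out here rather than quoted.

```python
HOURS_PER_MONTH = 730   # assumption: 8,760 hours / 12 months

def monthly_revenue(gpus, utilization, price_per_gpu_hour):
    """Revenue from the GPU-hours actually sold in a month."""
    return gpus * HOURS_PER_MONTH * utilization * price_per_gpu_hour

def monthly_swing(gpus, util_high, util_low, price_per_gpu_hour):
    """Revenue lost when utilization falls from util_high to util_low."""
    return (monthly_revenue(gpus, util_high, price_per_gpu_hour)
            - monthly_revenue(gpus, util_low, price_per_gpu_hour))

def implied_price(gpus, util_high, util_low, swing_usd):
    """Hourly rate that would produce the stated swing."""
    return swing_usd / (gpus * HOURS_PER_MONTH * (util_high - util_low))

price = implied_price(1024, 0.85, 0.55, 670_000)
print(f"implied rate: ${price:.2f}/GPU-hr")
print(f"check: ${monthly_swing(1024, 0.85, 0.55, price):,.0f}/month")
```

On these assumptions the implied rate falls inside the neocloud band quoted earlier in the chapter, so the swing is consistent with a merchant fleet priced at the volume tier.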
Deep dive: why design-for-flexibility is the cheapest downside hedge you can buy
Most of the downside cases above are expensive to hedge after the fact and cheap to hedge at scoping time, which is the entire argument for spending an option premium early. A merchant operator cannot manufacture a take-or-pay contract once utilization has already collapsed; but it can, at design time, keep its procurement mode hybrid (a colo anchor plus neocloud overflow) so that a demand miss sheds opex instead of stranding capex. An operator cannot retrofit a soft residual market; but it can choose a conservative depreciation life up front so the balance sheet already assumes the bear case. And an operator cannot re-pour a slab for a denser generation mid-life; but it can, per Chapter 1.1, reserve the floor loading, water, and electrical headroom that let the asset absorb a roughly 3–5x density jump per generation (Hopper ~40 kW → Blackwell ~130 kW → Rubin Ultra ~600 kW), capturing the ~5x revenue-per-MW of a generation step instead of being stranded one generation behind.
The unifying principle: the downside cases are correlated — a demand miss tends to arrive with a residual shock and a financing squeeze at the same time, because they share the same underlying cause (AI demand resolving lower than the build assumed). Flexibility is valuable precisely because it is the one hedge that pays off across all of the correlated tails at once: it lets you shrink, defer, or re-mix the asset rather than carry a fixed cost into a falling market. Price the flexibility premium against the joint probability of the tails, not each one in isolation. → structural scenarios in Chapter 16.5.
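The difference between joint and isolated tail probabilities can be made concrete with a common-cause sketch. All probabilities below are hypothetical: demand resolves low with probability q, and each of three tails (utilization, residual, financing) fires with probability c when demand is low and b otherwise, independently given the demand state.

```python
def marginal_tail_probability(q, c, b):
    """Probability that any single tail fires, averaged over demand states."""
    return q * c + (1 - q) * b

def joint_if_independent(q, c, b, n_tails=3):
    """Naive joint probability: multiply the marginals as if unrelated."""
    return marginal_tail_probability(q, c, b) ** n_tails

def joint_with_common_cause(q, c, b, n_tails=3):
    """Joint probability when the tails share the demand state as a cause."""
    return q * c ** n_tails + (1 - q) * b ** n_tails

Q, C, B = 0.20, 0.80, 0.05   # hypothetical probabilities

naive = joint_if_independent(Q, C, B)
correlated = joint_with_common_cause(Q, C, B)
print(f"each tail alone: {marginal_tail_probability(Q, C, B):.1%}")
print(f"all three, treated as independent: {naive:.2%}")
print(f"all three, common cause: {correlated:.2%}")
print(f"understatement factor: {correlated / naive:.1f}x")
```

Each tail has the same stand-alone probability in both views; only the chance of all three arriving together changes, and it is that joint probability the flexibility premium should be priced against.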