GPU futures and the locus of inference
Where inference occurs in the future will tell us a lot about whether GPU futures make sense
I’ve previously written about the idea of GPU futures, and it seems to elicit a lot of confusion or ire from readers. I get a bunch of DMs, emails, and unsubscriptions every time I write about it. “This is an AI newsletter!” I hear you say. “What’s dirty finance doing in my inbox?” Well, this post is an FAQ to help my subscribers understand why GPU futures might be important for the future of AI.
What are futures, in plain English?
A futures contract is a promise to buy or sell something at a fixed price on a fixed date in the future. Farmers use them to lock in grain prices. Airlines use them to manage jet fuel costs. Traders use them to speculate.
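A toy example, with made-up numbers, of what that lock-in does for the farmer:

```python
# Toy example: a farmer locks in a grain price with a futures contract.
# All numbers are hypothetical.

futures_price = 5.00   # $/bushel agreed today for delivery in 6 months
bushels = 10_000

for spot_at_delivery in (4.00, 5.00, 6.00):
    # Without the hedge: sell at whatever the spot price turns out to be.
    unhedged = spot_at_delivery * bushels
    # With the hedge: gains/losses on the futures position offset spot
    # moves, so the farmer effectively receives the locked-in price.
    hedged = futures_price * bushels
    print(f"spot=${spot_at_delivery:.2f}: unhedged=${unhedged:,.0f}, hedged=${hedged:,.0f}")
```

The farmer gives up the upside in exchange for certainty. That certainty is the whole product.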
But not every good or service can be turned into futures. You need:
a standardized unit (a barrel of oil, a bushel of corn),
lots of buyers and sellers with ongoing exposure,
a way to measure spot prices,
and a trusted mechanism to settle disputes.
When these conditions exist, futures markets thrive.
Why talk about futures for GPUs?
GPUs are the scarce resource behind AI. They’re expensive, oversubscribed, and in high demand for both training (building models) and inference (running them). On the surface, that looks a lot like oil in the 1970s: a bottlenecked resource begging for financialization. If GPU futures existed, data centers and AI companies could hedge costs, investors could speculate, and infrastructure could be financed at a lower cost of capital.
Why does it matter where inference happens?
Because the economics look completely different depending on the locus of inference:
Cloud inference: You rent compute from a data center, paying as you go. That’s an operating expense: volatile, recurring, hedgeable.
Edge inference: The compute lives in your phone, laptop, car, robot, drone, industrial sensor. You pay once, up front, when you buy the device. That’s a capital expense: fixed, bundled, not hedgeable.
Futures thrive in opex-shaped markets, not capex-shaped ones.
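To make the difference concrete, here’s a rough sketch with invented numbers. The edge cost is fixed the day you buy the device; the cloud bill floats with a spot price, and that floating bill is the thing a futures contract could pin down:

```python
# Hypothetical numbers: why cloud inference is hedgeable and edge isn't.

# Edge: pay once for the device; the per-token cost is fixed at purchase.
device_cost = 1_200.0            # capex, $
device_lifetime_tokens = 2e9     # tokens served over the device's life
edge_cost_per_m = device_cost / (device_lifetime_tokens / 1e6)

# Cloud: pay a spot price that moves month to month -- this is the
# recurring, volatile exposure a futures contract could lock in.
cloud_spot_per_m = [0.50, 0.80, 0.35, 0.95]   # $/million tokens, by month
monthly_tokens_m = 40                          # million tokens per month

print(f"edge (fixed at purchase): ${edge_cost_per_m:.2f} per million tokens")
for month, spot in enumerate(cloud_spot_per_m, start=1):
    print(f"cloud month {month}: ${spot * monthly_tokens_m:,.0f} bill at ${spot:.2f}/M")
```

There is nothing to hedge on the edge side: the money is already spent. Only the cloud side produces an ongoing price exposure.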
Is this a zero-sum game: cloud or edge?
Not really. It’s not all cloud or all edge. Reality is hybrid:
Training is locked in the cloud.
Inference splits: some workloads centralize, others move to devices.
The question is whether enough inference stays in the cloud, standardized, aggregated, and volatile, to justify a futures curve. GPU futures are a bet on the locus of inference.
What would a GPU futures contract look like?
It wouldn’t be tied to chip names like H100. Hardware evolves too fast. More likely:
Unit: a benchmarked measure of throughput (tokens per second on a standard test).
Location: delivery at a few regional hubs (Virginia, Oregon, Frankfurt).
Firmness: “firm” contracts guarantee priority and uptime; “interruptible” contracts are cheaper but can be preempted.
Settlement: against a trusted index built from actual transactions, with guardrails against manipulation.
That’s how you turn GPU time into a defensible financial instrument.
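To make that concrete, here’s a minimal sketch of the fields such a contract might carry. Everything here is illustrative: “StandardBench-v1” is an invented benchmark name, and the hub and index names are placeholders, not real market conventions.

```python
# Sketch, not a real exchange spec: the fields a standardized
# GPU futures contract might carry, following the design above.
from dataclasses import dataclass

@dataclass(frozen=True)
class GPUFuturesContract:
    unit: str            # benchmarked throughput, e.g. tokens/s on a standard test
    hub: str             # delivery location: "us-east-virginia", "eu-frankfurt", ...
    firmness: str        # "firm" (guaranteed priority/uptime) or "interruptible"
    delivery_month: str  # e.g. "2027-03"
    price: float         # $ per unit, agreed today
    settlement: str      # the index it settles against

contract = GPUFuturesContract(
    unit="1M tokens/hour on StandardBench-v1",     # hypothetical benchmark name
    hub="us-east-virginia",
    firmness="firm",
    delivery_month="2027-03",
    price=42.50,
    settlement="transaction-weighted spot index",  # hypothetical index
)
```

Note that the contract references a benchmark and a hub, never a chip model. That is what lets it outlive any particular hardware generation.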
How are AI data centers financed today, without futures?
Right now, AI-grade data centers are financed with blunt, expensive tools because there’s no way to hedge GPU risk:
Hyperscalers (Microsoft, Google, Amazon, Meta) use balance sheets and bonds.
GPU cloud startups (CoreWeave, Crusoe, Lambda) raise equity from VCs, hedge funds, and sovereign wealth funds — costly capital.
Banks rarely offer project-finance debt because there’s no hedged offtake market (unlike oil or LNG).
Governments subsidize with tax credits, cheap land, and power deals.
AI data centers today look more like merchant power plants circa 1990 than like LNG trains. Without a futures curve, they can’t tap cheap, non-recourse project finance.
How would GPU futures lower the cost of capital?
With a futures curve:
A data center builder could sell forward strips of GPU hours.
Lenders could underwrite loans against those locked-in revenues.
Project finance debt (cheaper than equity) would become available, as it did for oil, LNG, and fiber once those had futures curves.
In short: a futures market would let AI infra be financed like an asset class, not a speculative venture.
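A back-of-the-envelope sketch, with invented figures, of how a forward strip translates into debt capacity. The 1.3x debt service coverage ratio is a placeholder, not a quoted market standard:

```python
# Back-of-the-envelope: why a forward curve unlocks cheaper debt.
# All figures hypothetical.

# A builder pre-sells a multi-year strip of GPU hours at a fixed price.
hours_per_year = 8_000_000
forward_price = 2.00           # $/GPU-hour, locked in via futures
annual_revenue = hours_per_year * forward_price   # contracted, not projected

opex = 6_000_000               # power, staff, maintenance ($/yr)
cash_flow = annual_revenue - opex

# Lenders size non-recourse debt off contracted cash flow using a
# debt service coverage ratio (DSCR); 1.3x is an illustrative target.
target_dscr = 1.3
max_debt_service = cash_flow / target_dscr
print(f"contracted revenue: ${annual_revenue:,.0f}/yr")
print(f"supportable debt service: ${max_debt_service:,.0f}/yr")
# Without the hedge, that revenue line is a forecast, not a contract --
# lenders either decline or price the risk like equity.
```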
Are GPU futures like a crack spread in oil?
Yes. Think of it this way:
In oil, refiners buy crude and sell gasoline or diesel. The crack spread is the margin between input and output prices.
In AI, inference can run in the cloud (opex per million tokens) or on the edge (capex amortized per million tokens).
The “spread” between those costs is the margin that decides where workloads go. Traders could one day build products around this, like swaps or indices, to hedge or speculate on the cloud vs edge economics.
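Here’s a rough sketch of what such a spread product might look like, with invented numbers; the swap described is hypothetical, not a traded instrument:

```python
# Hypothetical cloud-vs-edge "crack spread" per million tokens.

cloud_opex_per_m = 0.80   # $/M tokens, spot cloud inference price
edge_capex_per_m = 0.60   # $/M tokens, device cost amortized over its life

spread = cloud_opex_per_m - edge_capex_per_m
print(f"cloud-edge spread: ${spread:.2f} per million tokens")

# A notional swap on that spread: one side pays a fixed rate, the other
# pays the realized spread -- a way to hedge or speculate on where
# inference economics are heading. Entirely hypothetical instrument.
fixed_leg = 0.15
notional_m_tokens = 1_000
payoff_to_fixed_payer = (spread - fixed_leg) * notional_m_tokens
print(f"swap payoff to fixed payer: ${payoff_to_fixed_payer:,.2f}")
```

When the spread is wide, workloads drift toward the edge; when it narrows, they stay in the cloud. That is the number the market would actually be trading.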
Why can’t Silicon Valley build this market?
Because building futures markets requires a completely different skill set:
Chicago/New York DNA: clearinghouses, collateral, regulation, contracts that survive litigation.
Silicon Valley DNA: platforms, apps, scaling fast, breaking things.
Futures markets can’t break. They need enforceability and neutrality. Hyperscalers and device OEMs are too vertically integrated to play that role, and SV startups don’t have the legal/regulatory infrastructure.
So why are VCs sniffing around?
Because the story is irresistible: GPUs are the oil of AI, oil has futures, therefore GPUs must too.
To a VC, it’s a cheap option: most bets die, but if this one works, the payoff is massive.
What they fund are proto-markets: auctions, credit marketplaces, indices.
Their hope: become the front end that CME/ICE eventually buys.
But the real futures curve, if it comes, will be born where commodity markets always are: Chicago, New York, London, Geneva.
Where else in the world are commodity markets centered?
Chicago: grains, Treasuries, financial futures.
New York: coffee, cocoa, metals, energy.
Houston: physical energy hub.
London: metals (LME), Brent crude.
Geneva/Zug: merchant houses (Vitol, Glencore, Trafigura).
Singapore: Asian oil, LNG, palm oil, freight.
São Paulo: B3 exchange, a monster in equity options and ag futures.
Shanghai/Dalian: iron ore, soybeans, rebar (huge but semi-closed).
Each commodity has a “home court.” GPU futures will likely be born in Chicago (futures DNA) but traded globally, with Geneva-style merchants and São Paulo-style derivatives desks.
What should we watch to know if GPU futures will exist?
Five leading indicators:
Cloud share of inference. Does a deep artery of cloud traffic remain?
Standardization. Are there clear benchmarks and service levels?
Congestion. Are queues and latency persistent enough to hedge?
Pricing models. Are AI services billed by usage (cloud opex) or bundled into devices (edge capex)?
Enterprise posture. Do regulated industries stick with the cloud for compliance?
Final takeaway
GPU futures aren’t inevitable. They hinge on whether enough inference stays cloud-side to create a deep, standardized, hedgeable channel of demand.
If cloud retains that artery, futures can emerge, lowering costs of capital and reshaping AI infra finance. If edge eats too much, the dream fades, and financialization migrates elsewhere (equities, OEM contracts, securitization).
Either way, the decision won’t be made in Chicago or New York. It will be made in Cupertino, Seattle, and Shenzhen, by engineers deciding where inference runs.
If you enjoy this newsletter, consider sharing it with a colleague.
I’m always happy to receive comments, questions, and pushback.