AI's fork in the road
Does intelligence stay in the cloud, or migrate to the edge? The answer decides the fate of Nvidia, data centers, and GPU futures
Today’s dominant AI narrative is built around the GPU. Hyperscalers and sovereigns are in a global arms race to secure Nvidia hardware. Investors talk about compute as the scarce industrial input of our era. Don Wilson, one of the savviest commodities traders of his generation, has even predicted that GPU futures markets could be larger than oil within a decade.
That’s the consensus view: GPUs are the oil of the AI economy.
But there’s a contrarian one. Inference, the day-to-day use of trained AI models, may not remain in centralized GPU clusters. Instead, it could migrate to the edge: smartphones, laptops, cars, industrial sensors. If that shift materializes, it undermines the assumption that GPUs are the canonical commodity of the AI era.
The question isn’t academic. Billions in capex, Nvidia’s valuation, and the entire premise of GPU financialization hinge on the answer. Does AI scale like oil and natgas, forever flowing through centralized commodity infrastructure? Or does it scale like smartphones: ubiquitous, distributed, and personalized?
Why edge inference matters
Inference has different economics from training. Training is episodic, capital-intensive, and centralized: you need thousands of GPUs running in parallel to grind through massive datasets. Inference is continuous and user-facing: every text completion, every image recognition, every voice command. It’s the ongoing operational cost of AI.
Running inference on edge devices addresses several structural bottlenecks:
Latency: cloud inference requires a round trip across the network. That’s tolerable for some tasks, unacceptable for safety-critical or user-experience-critical ones. Processing locally gives sub-100ms responses.
Privacy: local models mean your voice, text, or camera feed doesn’t leave the device. Apple has leaned hard into this advantage.
Bandwidth and cost: sending raw video or sensor data to the cloud for processing is expensive. Doing the work locally means the network only carries conclusions, not terabytes of raw input.
Energy efficiency: data centers already consume a single-digit share of U.S. electricity. Shifting a fraction of inference to billions of efficient edge NPUs distributes the load and lowers cost.
In other words, edge inference isn’t speculative utopia. It’s a practical response to cost, latency, and privacy constraints.
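The bandwidth point above is easy to quantify. Here is a back-of-envelope sketch comparing a continuously streamed camera feed against a device that runs the model locally and sends only detection events; every figure (the 4 Mbit/s stream rate, the 1 KB event size) is an illustrative assumption, not a measured value.

```python
# Back-of-envelope: streaming raw sensor data to the cloud vs. sending only
# local inference results. All figures are illustrative assumptions.

SECONDS_PER_DAY = 24 * 60 * 60

# Assumption: one 1080p camera at ~4 Mbit/s (H.264), streamed continuously.
raw_stream_gb_per_day = 4e6 / 8 * SECONDS_PER_DAY / 1e9

# Assumption: a local model emits one ~1 KB detection event per second instead.
events_gb_per_day = 1e3 * SECONDS_PER_DAY / 1e9

reduction = raw_stream_gb_per_day / events_gb_per_day
print(f"raw stream:  {raw_stream_gb_per_day:.1f} GB/day")
print(f"events only: {events_gb_per_day:.2f} GB/day")
print(f"network traffic cut by ~{reduction:.0f}x")
```

Even under these rough assumptions, a single camera saves tens of gigabytes per day; multiply by a factory floor or a vehicle fleet and the cloud-streaming bill becomes the argument for edge.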
Evidence in 2025
We already have tangible proof that significant inference can run outside the cloud.
Apple: iOS 18 ships with a ~3B parameter model running on the iPhone 15 Pro's Neural Engine. It generates ~30 tokens/sec locally. That's not toy performance; it's a miniature ChatGPT-class assistant in your pocket.
Qualcomm: in 2023 they demonstrated Stable Diffusion, once thought to require an A100 cluster, running entirely on a Snapdragon phone in under 20 seconds.
Open-source community: projects like llama.cpp have compressed Meta’s LLaMA and Mistral models to run on laptops and even phones. Enthusiasts now run 7B parameter LLMs locally, no cloud required.
Industrial deployments: cars use on-board compute for autopilot; factory cameras detect defects in real time without streaming to the cloud; smart speakers process wake words locally.
Cloud providers hedging: AWS Greengrass, Azure IoT Edge, and Cloudflare Workers AI all let customers push inference closer to users.
These are early signals, but across consumer, industrial, and enterprise settings they all point in the same direction: inference is escaping the data center.
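The on-device token rates above are roughly what a memory-bandwidth-bound estimate predicts. Autoregressive decoding typically reads the full weight set once per generated token, so throughput is approximately bandwidth divided by model size in bytes; the quantization level and bandwidth figure below are assumptions chosen to mirror the Apple example, not published specs.

```python
# Sanity check on on-device LLM speed: autoregressive decoding is typically
# memory-bandwidth bound, so tokens/sec ≈ usable bandwidth / model bytes.
# The quantization and bandwidth figures are assumptions for illustration.

params = 3e9               # ~3B parameter model (per the Apple example)
bytes_per_param = 0.5      # assume ~4-bit quantized weights
model_bytes = params * bytes_per_param  # ~1.5 GB read per generated token

bandwidth = 50e9           # assume ~50 GB/s of usable memory bandwidth

tokens_per_sec = bandwidth / model_bytes
print(f"estimated throughput: ~{tokens_per_sec:.0f} tokens/sec")
```

Under those assumptions the estimate lands in the low 30s of tokens per second, consistent with the ~30 tokens/sec figure reported for the on-device model.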
What changes if edge inference scales
Data Centers
They remain indispensable for training and ultra-large inference. But the growth trajectory could flatten relative to the “everything centralizes” narrative. If edge accounts for 20-30% of inference by 2030, that trims trillions off the most bullish capex projections. Expect a tiered topology: centralized clusters for training, regional sites for latency-sensitive workloads, and billions of devices handling routine inference.
Nvidia
Training demand is secure. Inference is where erosion happens. Hyperscalers are building ASICs (Inferentia, TPU, MTIA) to cut GPU reliance. Edge silicon from Apple, Qualcomm, and others captures consumer inference. Nvidia’s GPUs are powerful but expensive, hot, and replaced every 18-24 months. CUDA dominance is weaker in inference: ONNX and PyTorch runtimes make it easier to target non-GPU hardware. Nvidia still wins training, but inference fragments.
Financialization
Here’s the hinge. Don Wilson has said GPU futures could be larger than oil. That assumes two things: fungibility and durable demand. Oil meets both. GPUs don’t.
Fungibility: oil is chemically identical; GPUs are not. An H100 isn’t a GB200, and both are obsolete in a few years.
Durability: oil has powered industry for a century; GPU demand could fragment within five years as ASICs and edge NPUs scale.
Near-term GPU futures make sense as hedges and speculative contracts during scarcity. Long-term, the ecosystem fractures. A single GPU contract cannot represent the compute economy. Expect a portfolio: training-GPU contracts, ASIC capacity swaps, edge-compute indices.
Forecast through 2030
A reasonable base case: 20-30% of global inference runs on edge devices by 2030.
Breakdown by domain:
Vision (consumer + industrial): 40-60% edge
Speech/assistants: 30-50% edge (with cloud fallback)
LLM chat/knowledge: 10-25% edge (small local models, cloud for heavy queries)
Robotics/vehicles: >80% edge (safety-critical)
The rest remains in the cloud: training and ultra-large inference still on GPUs, but much routine inference migrating to custom ASICs.
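The domain breakdown above can be blended into a single edge share. The sketch below uses the midpoints of the ranges listed and a purely illustrative guess at how global inference workload splits across domains; change the weights and the blend moves, which is the point of the sensitivity.

```python
# Blended edge-inference share implied by the per-domain forecast.
# Workload weights are illustrative assumptions; edge shares are the
# midpoints of the ranges above (robotics assumed at 0.85 for ">80%").

# domain: (assumed share of global inference workload, midpoint edge share)
domains = {
    "vision":            (0.18, 0.50),   # 40-60% edge
    "speech/assistants": (0.12, 0.40),   # 30-50% edge
    "llm chat":          (0.65, 0.175),  # 10-25% edge
    "robotics/vehicles": (0.05, 0.85),   # >80% edge
}

blended = sum(weight * share for weight, share in domains.values())
print(f"blended edge share: {blended:.0%}")
```

With this weighting the blend comes out near the top of the 20-30% base case; weighting LLM chat more heavily pulls it down, weighting vision or robotics more heavily pushes it up.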
This implies a financialization arc:
Late 2020s: GPU futures thrive as short-tenor capacity hedges.
Early 2030s: fragmentation sets in; GPU contracts lose benchmark status.
Beyond: AI compute markets splinter into multiple instruments, none as dominant as oil.
Signals to watch
Investors and operators should monitor:
Device benchmarks: does each phone/laptop generation double the feasible model size?
Cloud positioning: do AWS and Azure market edge-first deployment?
ASIC adoption: how fast do Amazon, Google, Meta shift inference off GPUs?
Privacy regulation: new laws requiring local data processing could accelerate edge.
These signals will reveal whether GPUs remain the canonical commodity or become just one piece of a heterogeneous landscape.
Closing
Capital is flowing into data centers as if GPUs are the oil of our era. Don Wilson is betting that GPU futures will eclipse oil.
But look closer. Apple is running LLMs locally. Qualcomm put Stable Diffusion on a phone. Cars and factories are processing data at the edge. Even cloud providers are hedging toward distributed inference.
GPU futures can be a blockbuster late-2020s instrument. By the early 2030s, compute financialization will splinter: GPUs for training, ASICs for hyperscale inference, NPUs for edge. One monolithic contract won’t survive the topology shift.
If inference moves to the edge, then the topology of AI changes. And if you’re trading this space, the play is clear: go long near-term GPU scarcity, accumulate exposure to ASIC and edge, and fade the fantasy that one oil-like benchmark will dominate the 2030s.
If you enjoy this newsletter, consider sharing it with a colleague.
I’m always happy to receive comments, questions, and pushback. If you want to connect with me directly, you can: