Welcome to the latest edition of Buy the Rumor; Sell the News. We’re closing in on 2,000 subscribers, including institutional allocators, venture capitalists, litigators, senior executives and entrepreneurs. Thank you to all who have subscribed!
In today’s post, I show how LLM-native startups look nothing like traditional SaaS.
If you want to connect with me directly, my contact information is at the end of this post.
Silicon Valley is in the throes of a new gold rush. Pitch decks for LLM startups echo the same mantra: This is SaaS 2.0 but bigger. At first glance, the analogy is seductive. LLM-native startups look like classic software plays:
Subscription or usage-based revenue
Sticky customer relationships
Data accumulation driving network effects
And, critically, the promise of gross margins in the 80–90% range that justify the entire SaaS multiple regime.
But once you actually interrogate the unit economics, this narrative falls apart. LLM-native businesses inherit a structural trap first formalized by 19th-century economist William Stanley Jevons.
Why Jevons lives in GPUs
Jevons famously observed that as steam engines became more efficient, coal consumption soared. Lower costs unlocked new use cases, driving total demand higher. Efficiency stimulated consumption.
AI is playing out the same way:
Inference costs, which are the GPU seconds and datacenter power that underlie every model call, have fallen 50–100x over the past two years thanks to better chips, orchestration, quantization, and batching.
But instead of stabilizing COGS, these gains uncork insatiable demand: bigger context windows, more parallel agents, always-on copilots.
This is Jevons for the digital era. Lower unit costs do not automatically expand margins. They simply widen the runway for workloads to balloon.
SaaS-like margins are an accounting illusion
Classic SaaS economics are simple. Serving one more customer costs close to zero. The infrastructure is amortized across thousands. This is how companies like Snowflake sustain gross margins around 66% (even after paying AWS), and why pure-play software can push into the 80s.
AI-native startups are fundamentally different. Here’s the brutal causal chain:
Customer query → GPU seconds → power draw → real dollars out the door
These costs are not OpEx; they’re COGS. Under GAAP, cloud compute consumed to serve customers is booked directly against revenue. The more your product is used, the more you pay whoever owns the GPUs. That means the marginal cost of serving one more query does not approach zero. It’s directly tied to the physics of energy and compute.
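The arithmetic of that causal chain is worth making concrete. Here's a toy model of gross margin when revenue is a flat subscription but COGS scales with usage; all prices and query volumes are hypothetical, chosen only to illustrate the shape of the problem:

```python
# Toy model: usage-linked COGS vs. flat subscription revenue.
# All numbers are hypothetical, for illustration only.

def gross_margin(price_per_seat: float, queries_per_seat: float,
                 cost_per_query: float) -> float:
    """Gross margin when revenue is fixed but COGS scales with usage."""
    revenue = price_per_seat
    cogs = queries_per_seat * cost_per_query
    return (revenue - cogs) / revenue

# Classic SaaS: near-zero marginal cost per request.
print(f"SaaS-like:      {gross_margin(30.0, 1_000, 0.0005):.0%}")  # ~98%

# LLM-native: each query burns real GPU seconds.
print(f"LLM-native:     {gross_margin(30.0, 1_000, 0.02):.0%}")    # ~33%

# Same product after usage triples (copilots, agents, bigger contexts).
print(f"After 3x usage: {gross_margin(30.0, 3_000, 0.02):.0%}")    # -100%
```

The point of the sketch: nothing about the product changed except how much customers used it, and the business went from software margins to paying for the privilege of serving its users.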
Even if you own the stack, it’s not a free lunch: AWS’s own infra margins top out around 25–30%. So even verticalized, AI-heavy workloads face a permanent haircut relative to SaaS.
Why elasticity is wide enough to devour margins
Is demand truly infinite? Of course not. Eventually you hit constraints such as human attention, latency budgets, and daylight hours. But demand doesn’t have to be infinite. It just has to be wide enough and long enough to consume all foreseeable efficiency gains.
And that’s exactly what’s happening. As inference gets cheaper:
Chatbots evolve into full-time copilots, constantly scanning docs, summarizing meetings, drafting outreach.
Legal tools go from spot-checking clauses to analyzing entire contract stacks.
Customer service runs 24/7 multi-agent triage.
The result: usage scales up to fill the cheaper compute. Efficiency gains become margin erosion.
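The Jevons dynamic above is just two multipliers fighting each other. A minimal sketch, with illustrative (assumed) token prices and usage figures:

```python
# Sketch of the Jevons dynamic: unit costs fall, but demand expands faster,
# so total compute spend per user rises. All numbers are assumptions.

cost_per_1k_tokens = 0.06    # year 0 price, hypothetical
tokens_per_user = 50_000     # year 0 monthly usage, hypothetical

efficiency_gain = 10         # inference gets 10x cheaper
demand_multiplier = 25       # agents/copilots drive 25x more tokens

spend_before = tokens_per_user / 1_000 * cost_per_1k_tokens
spend_after = (tokens_per_user * demand_multiplier) / 1_000 \
              * (cost_per_1k_tokens / efficiency_gain)

print(f"before: ${spend_before:.2f}/user")  # $3.00/user
print(f"after:  ${spend_after:.2f}/user")   # $7.50/user
```

Whenever demand growth outpaces the efficiency gain, per-user COGS goes up even as the per-token price collapses. That is the margin erosion in one line of arithmetic.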
Meanwhile, open-source releases erode proprietary model moats, pushing ever more of the value capture back to whoever controls the cheapest, most reliable compute — not the software layer.
Horizontal vs vertical AI
This dynamic isn’t uniform. It’s most vicious for horizontal model APIs: OpenAI- and Cohere-style businesses selling raw token completions. Here, every unit of usage is a direct pass-through to GPUs.
By contrast, vertical AI applications, such as BloombergGPT or Intuit’s TurboTax copilots, can amortize compute across higher-value workflows. They constrain inference to narrow domains, compress context intelligently, and monetize outcomes, not raw token churn. This buffers them somewhat from the pure Jevons trap.
Still, even verticals face pressure as expectations rise: “If your model can reason across my entire data lake for $30/month, why not run it 24/7?”
Why venture quietly ignores it
If all this seems obvious, you might wonder: why are the best VCs shoveling hundreds of millions into LLM-heavy startups? Because:
Optionality trumps margin math. Venture only needs one outlier to return the fund. If there’s even a small chance your LLM play becomes the next indispensable platform, they’ll underwrite gross margin risk and pass it to the next buyer.
Short time horizons + fake margin optics. Many startups are riding on millions in free compute credits from hyperscalers. P&Ls look artificially healthy. The reckoning comes when those expire.
Social proof. If Sequoia leads your Series A at $200m post, Benchmark and a16z have powerful reasons to join, even if they suspect your COGS will eat you alive. No one wants to miss the next Salesforce.
But eventually public market investors or strategic acquirers will demand: Show me your gross margins. Not next year, not on paper. Now.
We’re already seeing early hints. Public companies embedding generative AI, from Google to Salesforce, are issuing explicit disclaimers about inference costs. Google, which owns its own TPUs and datacenters, flagged increased AI-driven OpEx on its last earnings call. If it’s material for Google, imagine a Series C startup still renting GPUs at spot prices.
Where this all leads
Eventually, most AI-heavy startups are forced into one of three uncomfortable directions:
Verticalize completely. Buy or build your own compute stack, from silicon to datacenter design. This is the OpenAI, Anthropic, xAI model. But it’s CapEx heavy, shifts you from SaaS multiples to infrastructure multiples, and demands sophisticated capital structuring more akin to power plants.
Throttle demand. Raise prices, impose usage caps, gate the most compute-hungry features. This protects gross margin but strangles the hyper-growth story that justified your last valuation.
Accept permanent margin compression. Operate like a utility, hoping volume compensates for thin spreads. That might suit infra funds, but it’s poison to growth VCs hunting 80% software margins.
Ironically, the most robust LLM businesses may end up looking less like SaaS darlings and more like energy merchants or telecom operators: CapEx heavy, throughput driven, valued on predictable cash flows, not infinite gross margin software fairy tales.
How to build or invest under Jevons
This environment isn’t hopeless. It just demands a radically different playbook.
If you’re building:
Design for bounded inference. Price on outcomes, not raw GPU hours.
Prioritize proprietary data that compresses the problem space. A domain-specific model that’s 10× cheaper to run is your best moat.
Explore async and offline architectures. Many workloads can be batched or distilled to cut live compute.
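Why batching helps is simple amortization: each live invocation carries fixed GPU overhead (scheduling, cache setup) that a batch shares across many requests. A rough sketch, with hypothetical overhead and GPU pricing figures:

```python
# Why batching cuts inference COGS: fixed per-invocation overhead is
# amortized across the batch. Overhead, per-request compute, and GPU
# pricing below are hypothetical assumptions, not real benchmarks.

def cost_per_request(batch_size: int,
                     overhead_gpu_s: float = 0.50,
                     gpu_s_per_request: float = 0.10,
                     dollars_per_gpu_s: float = 0.002) -> float:
    """Dollar cost per request when fixed overhead is shared by a batch."""
    gpu_seconds = overhead_gpu_s / batch_size + gpu_s_per_request
    return gpu_seconds * dollars_per_gpu_s

print(f"live, batch=1:   ${cost_per_request(1):.5f}")
print(f"async, batch=32: ${cost_per_request(32):.5f}")
```

The savings flatten out once overhead is fully amortized, which is why the advice is to batch what can wait (summaries, enrichment, backfills) and reserve live inference for what can’t.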
If you’re investing:
Don’t just chase topline. Scrutinize the ratio of COGS to customer value delivered.
Favor companies that either own critical infra outright or have a credible path to do so.
Be wary of startups driving usage into ever-more GPU-hungry workflows with no plan to curb elasticity.
The bottom line
The Jevons paradox ensures that for the foreseeable future, most investors will misprice the unit economics of AI-native businesses, just as they did early cloud or telecom buildouts.
That creates a golden window for more disciplined builders and allocators who understand the physics behind gross margin. In AI, as in energy, efficiency doesn’t widen margins. It fuels more consumption, until your cost structure is all that’s left.
Coda
If you enjoy this newsletter, consider sharing it with a colleague.
Most posts are public. Some are paywalled.
I’m always happy to receive comments, questions, and pushback. If you want to connect with me directly, you can: