Where large language models break
I watched two frontier AIs debate a strategy question. The result revealed the models' ceilings
TL;DR
Two frontier AIs debated a strategic question and quickly slid from sensible analysis into confident, theatrical nonsense. The collapse revealed a hard limit of today’s large language models: they can interpolate beautifully, but they cannot build new world models. When pushed beyond their training, they stop reasoning and start performing.
People usually talk about the limits of large language models in soft-focus terms: hallucinations, shallow understanding, pattern-matching. It’s the kind of critique that feels abstract until you watch a model actually hit its limits in real time. A reader recently found that edge, not by asking a model to solve a physics problem or reason about ethics, but by doing something far simpler: asking two frontier AIs, Anthropic’s Claude Opus 4.5 and Moonshot AI’s Kimi K2 Thinking, to debate a strategic question.1
The question itself was straightforward: how might OpenAI realistically prevail over Google in the race to AGI?
The early answers were competent, even insightful. Both systems produced arguments that would fit comfortably in a consulting deck or an investor memo: Google’s structural advantages in compute, data, and distribution; OpenAI’s reliance on algorithmic breakthroughs; the potential of recursive self-improvement; the underexploited terrain of enterprise verticals; the idea that governance and safety could form moats in regulated sectors. It was the familiar vocabulary of strategy, rendered with enviable fluency.
But then the prompt forced the models to evaluate each other’s arguments. This was the turning point. A debate format encourages escalation. Each response implicitly demands: sharpen the reasoning, intensify the claim, distill toward truth. LLMs interpret that not as an invitation to nuance, but as an instruction to amplify. Their reasoning became more confident, more extreme, less conditional. Tradeoffs hardened into mandates. Possibilities turned into prophecies. Within a few rounds, the models had wandered away from strategic analysis and into metaphysics. One insisted OpenAI executives must choose between betraying their mission or betraying their company. Another declared an immediate Microsoft acquisition the only viable path. Soon they were making pronouncements that sounded like philosophical verdicts rather than business advice. And then, finally, one of them announced that OpenAI’s “options are to be acquired, to be destroyed, or to voluntarily diminish to insignificance.”
At that point, the dialogue had drifted into something uncanny. It wasn’t merely wrong; it was theatrical. The arguments swelled into the kind of sweeping fatalism you associate with literature, not analysis. The turns became tight loops of assertion and counter-assertion that circled the same anxieties without progressing. It began to sound less like two strategists and more like two characters who cannot leave the stage because leaving would require a world model they do not possess.
It was, in other words, Beckettian. The ending of the exchange felt like Waiting for Godot rewritten in transformer weights: circular, performative, drenched in the style of profundity but detached from any grounding. The models weren’t intentionally imitating Beckett. They arrived there naturally, because absurdist dialogue is what happens when linguistic engines exhaust their epistemic runway. Vladimir and Estragon talk in circles because they have nothing but the talking; the world outside the stage never materializes. When the two AIs reached the limits of their world models, they, too, began producing language that remained elegant but weightless, rhythmic but unmoored, all motion and no locomotion.
This is not a cute observation about literary coincidence. It’s a diagnostic fact about how LLMs behave at the boundary of their capabilities. Beckett is what interpolation looks like after it loses the ability to connect to the world.
And that proves Ilya Sutskever’s point about LLMs better than his interviews ever have. Today’s models can interpolate with astonishing skill. They can recombine every strategic trope, every business framework, every rhetorical flourish. But when asked to build a genuinely novel causal model—one requiring the simulation of incentives, multistep reasoning, organizational behavior, capital constraints, or adversarial dynamics—they don’t generalize. They revert to the shape of explanation, not its substance. They slip from strategy into myth. They produce confident, polished, self-contained narratives that feel like revelation while containing almost no cognitive traction.
This is the tell. When LLMs can no longer anchor themselves in a model of reality, they don’t produce gibberish. They produce literary coherence without semantic grounding. They produce Godot. The dialogue continues, but the world disappears.
People often imagine that more capable models will fail in subtle ways. In practice, they fail in theatrical ways, because rhetoric is what remains when cognition runs out.
Watching these two frontier systems debate was not watching intelligence scale. It was watching the transformer paradigm reach its natural asymptote. The ceiling is not brittleness or hallucination; it is the inability to generate new models of the world. You can stack more layers, lengthen the context window, or pour more compute into training, but you cannot turn a staggeringly powerful interpolator into a system that generalizes simply by making it bigger. That requires a different architecture: a system with persistent internal representations, causal reasoning, and memory that isn’t mediated through prompt tokens.
Sutskever’s insistence on a post-LLM paradigm now makes more sense. If AGI requires stable, self-consistent world models, transformers probably won’t get us there. They will continue to do what they do beautifully: talk, predict, imitate, remix, refine. But they won’t step off the stage. They will keep waiting for Godot, fluent and stranded.
If you enjoy this newsletter, consider sharing it with a colleague.
I’m always happy to receive comments, questions, and pushback.