Every startup pitch today includes the same boast: "We have proprietary data." But in the AI economy, that’s like bragging about untapped oil in a war zone. The territory doesn’t matter unless you can defend it and extract value from it.
Data isn’t a moat. It’s an inert asset. A possibility. Moats form when data flows through product, through feedback, through behavior. Without that motion, you're not building a moat. You're building a storage locker for someone else to loot.
Owning Data Isn’t the Same as Using It
We are told, ad nauseam, that data is the new oil. But most companies are sitting on dry wells or contaminated sludge. They boast about exclusive access, but ship a chatbot with a logo. They claim defensibility, but offer no behavioral integration. The core assumption, that owning proprietary data translates to durable AI advantage, is not just occasionally wrong. It's structurally wrong.
Proprietary data is valuable only when it moves. When it trains models, shapes decisions, refines outputs, and learns from outcomes. Without that closed loop, your asset is stranded. The moat appears not when you own data, but when you activate it.
The Bloomberg Mirage
Consider Bloomberg. Their much-hyped BloombergGPT was heralded as the canonical vertical model: trained on exclusive financial data, targeting the elite user base of the Bloomberg Terminal. If any incumbent was positioned to fight back against foundation model monopolies, it was Bloomberg.
But BloombergGPT vanished. Not because the model failed, but because it never entered the workflow. Terminal users still navigate arcane function codes. The model sat next to the product, not inside it. It didn’t restructure decisions. It didn’t induce behavior. It didn’t learn from users. It was a press release, not a flywheel.
Bloomberg owns the data, yes. But it failed to build the loop. It never connected model output to user interaction, to retraining, to feedback. The moat never formed because the water never moved.
The Pattern of Failure
Bloomberg isn’t an outlier. It’s the archetype. Others follow:
LexisNexis and Westlaw sit on decades of legal data, but ship faster search, not adaptive legal reasoning.
Flatiron Health has oncology outcomes data, but offers retrospective reports, not real-time clinical guidance.
Coursera and Khan Academy track millions of learning interactions, but embed static bots, not dynamic pedagogy engines.
These companies own data. But they don’t own behavior. They lack the feedback systems that compound value. Their advantage is frozen in time.
Why? Three structural failures.
1. Workflow Is the Choke Point
AI becomes valuable only when embedded directly into the work. If your assistant sits outside the primary interface, it’s irrelevant. Alt-tabbing kills usage. Integration is gravity.
Epic doesn’t just have healthcare data. It owns the environment where physicians think, diagnose, and prescribe. Veeva doesn’t just store pharma info. It mediates regulated decision points.
You don’t always need full workflow control. Sometimes, owning the moment of decision is enough, like GitHub Copilot inside the IDE. But if you’re not intercepting behavior at the point of leverage, you’re just floating.
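To make "owning the moment of decision" concrete, here is a minimal sketch of an in-workflow assistant. Everything in it is hypothetical: the StubModel, the on_keystroke hook, the event shape. The point is where things live: the model is called inside the keystroke path, and every accept or reject is captured where the work happens.

```python
# A minimal sketch of an in-workflow assistant. The hook, the StubModel,
# and the event shape are all hypothetical, not any real editor or
# Copilot API; the point is where the model call and the telemetry live.
import time
from dataclasses import dataclass, field


@dataclass
class DecisionEvent:
    context: str       # what the model saw at the decision point
    suggestion: str    # what it proposed
    accepted: bool     # whether the user took it
    latency_ms: float  # keystroke-to-suggestion time


class StubModel:
    """Stand-in for any completion backend; swap in a real client."""
    def complete(self, context: str) -> str:
        return context + "  # suggested"


@dataclass
class InlineAssistant:
    model: StubModel
    telemetry: list[DecisionEvent] = field(default_factory=list)

    def on_keystroke(self, buffer: str, user_accepted: bool) -> str:
        """Runs inside the primary interface: no alt-tab, no second app.
        (A real editor would report acceptance asynchronously.)"""
        start = time.monotonic()
        suggestion = self.model.complete(buffer)
        latency_ms = (time.monotonic() - start) * 1000
        # Every accept or reject at the point of leverage becomes telemetry.
        self.telemetry.append(
            DecisionEvent(buffer, suggestion, user_accepted, latency_ms)
        )
        return suggestion if user_accepted else buffer
```

That telemetry list is the raw material for the second failure mode: most companies never do anything with it.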
2. Feedback Loops Create Compounding Advantage
Static datasets are one-time edges. Loops that adapt to user behavior generate compounding returns. That’s the moat: the system that gets smarter every time it’s used.
Most companies don’t build this. Their models don’t observe users. They don’t fine-tune on corrections. They don’t close the loop. They ship a model and call it a strategy.
It’s hard. Feedback is messy, legally fraught, and expensive. But the difficulty is the defense. Tesla does it with FSD. Copilot is starting to. The harder the loop is to build, the stronger the moat.
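Mechanically, closing the loop is not exotic. A minimal sketch, with invented names (Correction is illustrative, and fine_tune is a placeholder, not a real API): capture every user edit as a supervised pair, and retrain once enough accumulate.

```python
# Minimal sketch of a correction-driven loop. `fine_tune` is a placeholder,
# not a real training API; real pipelines add consent checks, dedupe,
# evaluation gates, and staged rollout.
from dataclasses import dataclass, field


@dataclass
class Correction:
    prompt: str        # input the model saw
    model_output: str  # what it produced
    user_fix: str      # what the user changed it to


@dataclass
class FeedbackLoop:
    batch_size: int = 1000
    buffer: list[Correction] = field(default_factory=list)
    version: int = 0

    def observe(self, c: Correction) -> None:
        """Every user edit to model output is a training signal."""
        self.buffer.append(c)
        if len(self.buffer) >= self.batch_size:
            self._retrain()

    def _retrain(self) -> None:
        # Supervised pairs: the user's fix is preferred over the model's output.
        pairs = [(c.prompt, c.user_fix) for c in self.buffer]
        self.fine_tune(pairs)  # placeholder for an actual training job
        self.buffer.clear()
        self.version += 1      # each cycle ships a slightly better model

    def fine_tune(self, pairs: list[tuple[str, str]]) -> None:
        ...  # out of scope: the actual training, eval, and rollout machinery
```

The compounding lives in version: each cycle ships a slightly better model, trained on corrections no competitor can see. The difficulty lives in everything hidden behind fine_tune.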
3. Access ≠ Rights
Even when companies hold the data, they often lack the rights to use it meaningfully. HIPAA, FERPA, GDPR, FINRA: compliance regimes fracture control. You may have access, but no ability to train, re-identify, or productize.
Much of what's labeled "proprietary data" is legally unusable. It exists more in pitch decks than in product.
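Here is the access-versus-rights gap as a sketch. The flags are invented simplifications; real regimes like HIPAA and GDPR are far more granular. But the shape is right: availability is a superset of trainability.

```python
# Illustrative rights gate: holding a record is not the same as being
# allowed to train on it. These flags are invented simplifications.
from dataclasses import dataclass


@dataclass
class Record:
    payload: str
    consent_to_train: bool   # e.g. a GDPR lawful basis that covers training
    deidentified: bool       # e.g. HIPAA-style de-identification complete
    license_allows_ml: bool  # contractual rights extend to ML use


def trainable(records: list[Record]) -> list[Record]:
    """Access gives you the full list; rights give you the filtered one."""
    return [
        r for r in records
        if r.consent_to_train and r.deidentified and r.license_allows_ml
    ]
```

Run a filter like this over a typical "proprietary" corpus and the trainable slice is often a small fraction of what the pitch deck counts.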
What Real Defensibility Looks Like
Defensibility doesn’t come from raw data. It comes from a closed system:
Sensor inputs (e.g. Samsara)
Behavioral interaction (e.g. Tesla)
Embedded workflow (e.g. Epic, Veeva)
Feedback optimization (e.g. Copilot)
Together, these form a loop. That’s the moat. Everything else is narrative.
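As a schematic, the loop is four interfaces and one cycle. None of these names belong to Samsara, Tesla, Epic, or Copilot; this is a sketch of the shape they share.

```python
# Schematic of the full loop: sense -> decide inside the workflow ->
# observe behavior -> optimize. All names here are illustrative.
from dataclasses import dataclass
from typing import Protocol


@dataclass
class SensorReading:   # raw signal from the world (Samsara-style input)
    signal: str


@dataclass
class Decision:        # surfaced inside the workflow (Epic/Veeva-style)
    action: str


@dataclass
class Outcome:         # what the user actually did (Tesla-style behavior)
    followed: bool


class MoatLoop(Protocol):
    """Each method is one arc of the loop; the moat is the whole cycle."""
    def sense(self) -> SensorReading: ...                     # data enters
    def decide(self, r: SensorReading) -> Decision: ...       # embedded in workflow
    def observe(self, d: Decision) -> Outcome: ...            # behavior is captured
    def optimize(self, d: Decision, o: Outcome) -> None: ...  # the model improves


def run_once(loop: MoatLoop) -> None:
    """One turn of the flywheel: every pass makes the next one better."""
    reading = loop.sense()
    decision = loop.decide(reading)
    outcome = loop.observe(decision)
    loop.optimize(decision, outcome)
```

Each arc is hard to build, and each is something a competitor would have to replicate. The data alone buys you only the first one.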
You don’t have to beat GPT-5. You just have to own the moment of decision. Even if your underlying model is slightly worse than the frontier, owning the interface, the latency budget, the customization, and the telemetry gives you control. And control compounds.
The Two Axes of AI Moats
Think of defensibility on two axes:
X-axis: Who owns the data?
Y-axis: Who owns the behavior?
The bottom-left quadrant, where you own neither, is where wrappers and UX layers die. The top-right, where you own both, is where real moats live. That’s the Copilot model. That’s the Tesla model. That’s the target.
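The diagnostic is small enough to write down. The quadrant labels below are shorthand, not a taxonomy:

```python
# The two axes as a diagnostic. Labels are shorthand, not a taxonomy.
def moat_quadrant(owns_data: bool, owns_behavior: bool) -> str:
    if owns_data and owns_behavior:
        return "real moat"          # Copilot, Tesla: the loop is closed
    if owns_behavior:
        return "loop without fuel"  # interface owner renting someone else's data
    if owns_data:
        return "stranded asset"     # Bloomberg-style: data without behavior
    return "wrapper"                # UX layer on someone else's model
```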
Conclusion: Build the Loop or Be Replaced
Owning data is not a moat. It’s raw material.
To make it defensible, you must mine it, refine it, circulate it, and feed it back. You must build a system that learns. One that shapes decisions. One that improves over time. Otherwise, you're just leasing oil rights to OpenAI.
The AI era doesn’t reward possession. It rewards circulation. Build the loop. Or get looped out.