Why AI Agents Won’t Just “Do Stuff”
Cross-platform execution requires trust rails we haven't built yet
TL;DR
Intelligence is not the bottleneck: While Aaron Levie argues that cheaper AI will lead to an explosion of work (Jevons Paradox), “doing stuff” is limited by human attention, decision rights, and risk budgets, not just the cost of production.
Permissions are the ultimate barrier: Enterprises are built on strict security and permission systems designed to prevent autonomous action; letting agents operate at machine speed creates massive security risks, like prompt injection, that current infrastructure cannot handle.
Platform protectionism: Major tech giants (Microsoft, Google, etc.) have no incentive to let “foreign” third-party agents roam freely through their data; they will likely enclose agents within their own ecosystems to maintain strategic control and revenue.
Agent clearinghouses are needed: To move beyond simple chatbots to executive agents, we need a new institutional layer, similar to credit card networks or SWIFT, that manages identity, cryptographic attestations of integrity, and liability when things go wrong.
The shift from produce to prove: In high-stakes industries, AI won’t create a surplus of work. It will create a massive new verification tax, where the real value and power migrate to the infrastructure that can prove an agent’s actions were legal, safe, and intentional.
Aaron Levie, CEO of Box, recently published a thoughtful piece arguing that AI agents will trigger a “Jevons Paradox for knowledge work,” in which agents dramatically lower the cost of cognitive tasks. The result, he argues, will be an explosion in total knowledge work activity, much like how cheaper coal led to more coal consumption rather than less. It’s a compelling story. It’s also missing about half the architecture.
The core Jevons mechanism is real: make something cheaper, and if demand is elastic enough, total usage explodes. This happened with computing power, with cloud infrastructure, with marketing technology. Levie is betting it happens with cognition. But there’s a reason why “AI agents that do stuff” keeps getting promised and keeps not quite arriving at the scale the rhetoric suggests. The missing piece isn’t better models or better prompts. It’s that executing work across organizational and platform boundaries requires infrastructure that doesn’t exist yet, and that the entities who’d need to build it have every incentive not to.
The Good Parts of the Argument
Let’s be fair: Levie gets several things right.
First, the “lower I, not higher R” insight is genuine. Most ROI discussions obsess over returns when the real leverage is collapsing investment costs. When prototyping goes from “hire three engineers for six months” to “engineer plus AI for two weeks,” the set of experiments you’re willing to attempt explodes. That’s a real phase change. Option value becomes abundant when fixed costs compress.
Second, demand for knowledge work probably is more elastic than we assume. There are thousands of projects that don’t get started because the coordination cost is too high, not because the idea is bad. If AI can collapse those coordination costs—generate the first draft, build the prototype, do the research—then yes, more things will be attempted. Jevons isn’t magic; it’s just what happens when you remove a binding constraint and discover new use cases.
Third, the “tasks aren’t jobs” observation is historically solid. Automation usually eats subtasks, not roles. ATMs didn’t eliminate bank tellers; they changed what tellers do. Excel didn’t eliminate accountants; it changed what counted as acceptable analysis. Most jobs are bundles of tasks, and when you automate some tasks, the job boundary just shifts.
So far, the argument holds. The problem is what comes next.
The Hidden Assumption: Demand is Unbounded
Jevons requires that the binding constraint is the cost of the input. Levie assumes that knowledge work is primarily constrained by the cost of producing drafts, analysis, code, and creative material. Cheaper production therefore means much more production.
But for a lot of knowledge work, the binding constraints live elsewhere:
Attention doesn’t scale. Most knowledge work outputs must be consumed by humans: customers, regulators, managers, users. Attention doesn’t expand 10× just because generation does. When drafts become abundant, value migrates to filtering, ranking, trust, and distribution. Generation becomes the easy part; getting anyone to care becomes the hard part.
This isn’t a footnote. It’s the economic center of gravity. If AI makes it trivial to generate marketing campaigns, customer support responses, contract reviews, and strategic analyses, you don’t get 10× more valuable work. You get 10× more artifacts, and the scarce resource becomes human judgment about which ones matter.
Decision rights and risk budgets are the real constraints. In enterprises, the limiting factor often isn’t “can we generate a contract review?” but rather: who is allowed to sign off, who bears liability, what the audit trail requires, what the regulator expects, what the brand can risk. If AI makes output cheaper but increases the probability of subtle failure, or makes failures harder to detect, organizations don’t scale usage linearly. They hit governance ceilings.
Market demand is derived, not intrinsic. More marketing campaigns or software prototypes only create value if they increase revenue or reduce costs. But when everyone gets the same capability, you often get more competitive intensity, lower differentiation, lower margins, and higher churn of tactics. Activity can go up 10× while economic surplus per unit collapses. Jevons can describe compute usage while completely missing who captures value.
The marketing employment analogy Levie uses is particularly slippery. Yes, marketing jobs increased as marketing technology improved. But they increased largely because digital platforms created entirely new categories (SEO, social media, programmatic buying) and because marketing became more measurement-intensive. That’s not pure Jevons. That’s platform intermediation creating jobs and taking 40-50% of ad spend. The AI equivalent might be: yes, more activity, but huge chunks of value flow to model providers, API vendors, and trust infrastructure rather than to distributed knowledge workers.
The Cross-Platform Execution Problem
Here’s where the argument really starts to fracture: Levie’s vision requires that AI agents can access and execute within any system they need to complete their tasks. A small team should be able to deploy an agent that handles procurement, accounting, customer support, and engineering, orchestrating across Microsoft, Salesforce, AWS, QuickBooks, Zendesk, and GitHub as needed.
This collides with reality in two ways: internal permissions and external platform politics.
Internal: Permissions Are the Product
Enterprises are permission systems wearing a trench coat. Most valuable actions require identity verification, authorization, separation of duties, auditability, and revocation capabilities. An agent that can “just do the thing” is exactly what security teams are paid to prevent.
Even if the agent is “you,” it changes the risk profile because it executes at machine speed, can be tricked at machine scale, and can cause correlated failures across systems. Firms will throttle it, sandbox it, or route it through approval gates. That’s an immediate brake on the implied scaling curve.
The integration reality is messier still. The “agentic” story presumes clean tool surfaces: stable APIs, well-defined schemas, deterministic side effects, idempotent actions. Real workflows are half-SaaS APIs, half-brittle UIs, manual exceptions, undocumented business logic, Slack approvals, and someone’s spreadsheet with the “real numbers.” Agents can navigate this, but reliability craters when workflows aren’t formalized. The edge cases are the business.
And then there’s security. If an agent can read email, documents, and tickets and then take privileged actions, you’ve created a pipeline from untrusted input to privileged execution. Prompt injection—malicious instructions hidden in documents or web pages—becomes the new SQL injection. The payload isn’t “drop table,” it’s “wire money, change vendor, exfiltrate data, grant permissions.”
To safely deploy agents with broad access, you need content sanitization, tool-use allowlists, least-privilege enforcement, policy engines, human-in-the-loop approvals, anomaly detection, and provenance tracking. The constraint moves from “can the agent do it?” to “can we prove it won’t do the wrong thing?”
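To make that shift concrete, here is a minimal sketch of a policy gate sitting between an agent and its tools. The action names, the $5k threshold, and the escalation rule are all hypothetical; a production policy engine would be far richer, but the shape is the point: allowlist, least privilege, human-in-the-loop above a risk budget, and an audit trail for every decision.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical allowlist: the only actions this agent may even request,
# each with a ceiling above which a human must approve.
ALLOWED_ACTIONS = {
    "create_draft_invoice": {"max_amount": 5_000},
    "open_support_ticket":  {"max_amount": 0},
}

@dataclass
class ActionRequest:
    agent_id: str
    action: str
    amount: float = 0.0
    payload: dict = field(default_factory=dict)

audit_log: list[dict] = []   # in practice: append-only, tamper-evident storage

def gate(request: ActionRequest) -> str:
    """Decide allow / deny / escalate for a single tool call."""
    policy = ALLOWED_ACTIONS.get(request.action)
    if policy is None:
        decision = "deny"                      # not on the allowlist at all
    elif request.amount > policy["max_amount"]:
        decision = "escalate_to_human"         # over budget: human-in-the-loop
    else:
        decision = "allow"
    audit_log.append({
        "at": datetime.now(timezone.utc).isoformat(),
        "agent": request.agent_id,
        "action": request.action,
        "amount": request.amount,
        "decision": decision,
    })
    return decision

# Example: the agent tries to issue a $12k invoice -> escalated, never executed.
print(gate(ActionRequest("agent-42", "create_draft_invoice", amount=12_000)))
```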
External: Platforms Are Not Neutral Pipes
But even if you solve internal security, there’s a bigger problem: why would Microsoft allow agents from Amazon to execute within Microsoft’s systems on Amazon’s behalf?
The answer is: they won’t, except under very controlled conditions.
From Microsoft’s perspective, an agent is just a client acting on behalf of some identity: a user, a service principal, an app registration. So Microsoft already “allows” third-party agents via OAuth tokens and API scopes, the same way they allow any CRM, backup tool, or SIEM.
What Microsoft will not allow is ambient authority for foreign agents. And more importantly, Microsoft will permit cross-platform agent activity only when it increases usage of Microsoft systems, is monetizable, is auditable and controllable, and doesn’t undermine their strategic control points.
They won’t allow it when it increases attack surface with unclear accountability, enables easy data exfiltration or automated switching away from their products, disintermediates their own agent layer (Copilot, Graph, Power Platform), or turns their platform into a dumb pipe while a competitor captures the orchestration layer.
This isn’t a technical constraint. It’s political economy.
If agents become the primary interface to software, whoever controls the agent layer controls default choices, procurement, switching costs, and ultimately customer relationships. So Microsoft, Google, Salesforce, and AWS all have strong incentives to say: “Use our agent framework to operate in our domain. We’ll integrate outward, but on our terms. Third-party agents can connect, but only through audited channels.”
Think about how this plays out. Apple allows apps, but only through App Store rules. Banks allow transfers, but through SWIFT and ACH rails with compliance layers. Cloud providers allow integrations, but through IAM, scopes, quotas, and contracts. Agents don’t magically dissolve these incentives. They intensify them.
The likely equilibrium is not “agents roam freely across systems.” It’s “bring the agent to the system.” In-tenant agents run inside Microsoft’s security boundary, inside Google Workspace, inside Salesforce. Cross-platform work happens through narrow, well-scoped connectors. This preserves data gravity, trust boundaries, and billing control, and it reduces “foreign agent” risk.
In other words: the agent layer becomes another enclosure movement, not a universal passport.
The Missing Layer: Clearinghouses for Agent Authority
Once you understand that cross-org agent execution is gated by trust, not by capability, the problem shape changes completely. This isn’t primarily an AI problem. It’s a distributed systems + identity + liability + adversarial security problem. AI just makes the transaction volume explode.
The generic solution to “trusted transactions between strangers at scale” is some variant of a clearinghouse. We’ve built them before:
Credit card networks (identity + fraud detection + dispute resolution)
SWIFT and ACH rails (messaging standards + settlement)
PKI and certificate authorities (identity attestations)
Container supply-chain signing (provenance + revocation)
Derivatives clearinghouses (margin + default management)
Agents taking consequential actions are closer in kind to payments and clearinghouses than to chatbots.
What Would an Agent Clearinghouse Actually Do?
The unit of account isn’t messages. It’s authority. An agent is dangerous only because it can cause side effects. So what you’re clearing is: a signed request to execute an action, under a scoped delegation, producing a verifiable receipt.
The minimum viable clearinghouse needs six primitives:
1. Strong identity for principals and agents. You need to distinguish between the legal entity that’s responsible (company, department, individual) and the software actor that’s executing (model + toolchain + policy engine). “This agent belongs to DaveCo and is allowed to do X” is different from “DaveCo is allowed to do X.”
2. Delegation and capability tokens. Not “this agent can access my CRM.” More like: “This agent can create draft invoices up to $5k, expiring in 24 hours, revocable, auditable.” This is capability-based security: permissions as unforgeable tokens, not static roles. (A minimal sketch of such a token, and the receipt its use produces, follows this list.)
3. Attestation of agent integrity. You want cryptographic statements about: what model version, what tool connectors, what policy rules, what sandbox, what human-approval gates, whether it’s running in a hardened environment. This is the software supply chain problem, but for agent stacks.
4. Execution receipts. Every action produces a signed, tamper-evident receipt: who requested, what authority token was used, what inputs were considered, what tool calls happened, what side effect occurred, what approvals were obtained. If you can’t generate an audit trail that survives lawyers, regulators, and incident response, you don’t have enterprise agents.
5. Revocation and incident broadcast. When something goes bad: revoke tokens instantly, revoke agent certificates, publish compromise indicators, coordinate rollback procedures where possible. This is the certificate revocation list equivalent.
6. Liability and dispute resolution. If an agent wires money wrong, deletes data, violates policy, or exfiltrates secrets: who eats the loss, what’s the appeals process, what insurance exists, what evidence is admissible? Without this, large firms will sandbox agents into uselessness.
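To make primitives 2, 4, and 5 concrete, here is a minimal sketch of a scoped, expiring, revocable capability token and the signed receipt its use would produce. Every name, field, and the shared-secret HMAC signing are illustrative assumptions; a real clearinghouse would use asymmetric keys, standardized action schemas, and hardened, append-only storage.

```python
import hmac, hashlib, json, time, uuid

CLEARINGHOUSE_KEY = b"demo-shared-secret"   # illustrative only; real systems use asymmetric keys

def sign(obj: dict) -> str:
    """Produce a tamper-evident signature over a JSON-serializable record."""
    blob = json.dumps(obj, sort_keys=True).encode()
    return hmac.new(CLEARINGHOUSE_KEY, blob, hashlib.sha256).hexdigest()

# Primitive 2: a scoped, expiring, revocable delegation token.
token = {
    "token_id": str(uuid.uuid4()),
    "principal": "DaveCo/accounts-payable",        # the legal entity that is liable
    "agent": "daveco-procurement-agent:v3",        # the software actor that executes
    "action": "CREATE_DRAFT_INVOICE",              # hypothetical standardized action schema
    "limit_usd": 5_000,
    "expires_at": time.time() + 24 * 3600,
}
token["signature"] = sign({k: v for k, v in token.items() if k != "signature"})

revoked_tokens: set[str] = set()                   # Primitive 5: instant revocation list

def execute(token: dict, amount_usd: float) -> dict:
    """Verify the delegation, then emit a signed execution receipt (Primitive 4)."""
    body = {k: v for k, v in token.items() if k != "signature"}
    assert hmac.compare_digest(sign(body), token["signature"]), "bad signature"
    assert token["token_id"] not in revoked_tokens, "token revoked"
    assert time.time() < token["expires_at"], "token expired"
    assert amount_usd <= token["limit_usd"], "over delegated limit"

    receipt = {
        "receipt_id": str(uuid.uuid4()),
        "token_id": token["token_id"],
        "principal": token["principal"],
        "agent": token["agent"],
        "action": token["action"],
        "amount_usd": amount_usd,
        "executed_at": time.time(),
    }
    receipt["signature"] = sign({k: v for k, v in receipt.items() if k != "signature"})
    return receipt

print(execute(token, 1_200)["receipt_id"])          # within scope: succeeds, receipt issued
revoked_tokens.add(token["token_id"])               # incident response: revoke instantly
# A second execute(token, 1_200) would now fail with "token revoked".
```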
Once you have the minimum viable clearinghouse, the next layer looks like a financial CCP: risk scoring that requires more friction for high-risk actions, standardized action schemas (CREATE_PURCHASE_ORDER, ISSUE_REFUND, PROVISION_USER), and a reputation layer based on certified assurance levels and incident history.
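As a rough sketch of that next layer, the following assumes the clearinghouse assigns risk tiers to standardized action schemas and tracks agent reputation from audits and incident history; the schema names, scores, and thresholds are made up for illustration.

```python
# Hypothetical standardized action schemas with clearinghouse-assigned risk tiers.
ACTION_RISK = {
    "CREATE_PURCHASE_ORDER": 3,
    "ISSUE_REFUND":          5,
    "PROVISION_USER":        7,
    "WIRE_TRANSFER":         9,
}

def required_friction(action: str, agent_reputation: float) -> str:
    """Map (action risk, agent reputation) to the friction the clearinghouse demands."""
    risk = ACTION_RISK.get(action, 10)           # unknown actions get maximum risk
    score = risk * (1.0 - agent_reputation)      # reputation in [0, 1] from certification and incident history
    if score >= 6:
        return "human approval + posted collateral"
    if score >= 3:
        return "human approval"
    return "automatic, receipt only"

print(required_friction("ISSUE_REFUND", agent_reputation=0.9))   # trusted agent, low-risk action: automatic
print(required_friction("WIRE_TRANSFER", agent_reputation=0.2))  # untrusted agent, high-risk action: maximum friction
```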
Who Runs This Thing?
There are three plausible equilibria:
Platform-run trust rails. Microsoft, Google, and Salesforce each build “agent trust” internally, then offer cross-platform connectors on their terms. Fastest to deploy, integrated with existing IAM and logging, but leads to fragmentation and rent extraction.
Identity incumbents as clearinghouses. Okta, Entra, Ping Identity, and security vendors evolve into “agent identity + delegation” networks. They already sit in the auth choke point, but they still need liability and action receipt standardization.
Central Counterparty Clearinghouse (CCP) model. For regulated workflows, including finance, healthcare, and government, you eventually get a standards body, certified participants, audit requirements, dispute processes, and possibly regulatory blessing. Banks did it. Derivatives markets did it. Payments did it.
My bet: platforms dominate early for productivity use cases, consortiums emerge for high-stakes cross-firm actions, and identity vendors supply much of the plumbing.
The Real Moat Isn’t Models
Here’s the non-obvious punchline: the clearinghouse will be the real “agent moat.” Models will commoditize faster than people think. The durable defensibility will be: trust rails, compliance acceptance, standardized receipts, incident response network effects, liability frameworks, and certification ecosystems.
That’s the boring stuff that becomes priceless when the first big “agent-induced loss event” hits headlines and every board asks, “Are we exposed?”
Levie’s Jevons story is about cheap cognition creating more work. But executed work is gated by trust. The clearinghouse is the mechanism that converts “cheap tokens” into “trusted transactions.” That’s the missing institution. And it’s where a lot of the real money, and power, ends up living.
What This Means for the Jevons Thesis
So where does this leave the original argument? Levie is directionally right that cheaper cognition will create rebound effects. But the scaling curve is much steeper for some activities than others.
Jevons will hit hard for: experimentation, prototyping, first drafts, search and retrieval, personalization at scale, long-tail customer interaction where “good enough” is acceptable.
But for high-stakes domains, including legal commitments, regulated reporting, security-sensitive code, medical decisions, and mission-critical operations, AI increases throughput only as fast as you can scale verification and accountability. The binding constraint moves from “produce” to “prove.”
The labor outcome is probably task explosion with role polarization: fewer people doing routine mid-level synthesis, more leverage for operators who can run AI workflows end-to-end, more demand for domain owners who can sign off (liability holders), and more demand for toolsmiths building evaluation, monitoring, and governance infrastructure.
So Jevons can be true about tokens while not being a clean story about jobs. And it can be true about activity while missing entirely where value and power concentrate.
The Boring Truth
The universe remains allergic to free lunches. It just changes where the bill shows up.
In this case, the bill shows up as: governance infrastructure, trust rails, liability allocation, and adversarial security. The exciting vision of “AI agents democratize Fortune 500 capabilities” crashes into the boring reality that executing consequential actions requires institutions we haven’t built yet.
Those institutions will get built. They always do when enough money is at stake. But they’ll be built by the entities with the most to gain from controlling the choke points: platforms, identity providers, and regulated industry consortiums. The democratization story becomes more ambiguous when the trust layer is owned by the same concentrated players.
The most Jevons-y thing about AI in enterprises may not be the explosion of useful work. It may be the explosion of artifacts, bureaucracy, and meta-work required to verify that all those cheap tokens actually did something correct, legal, and intentional.
That’s less inspiring than “every small business gets Fortune 500 capabilities.” But it’s probably closer to what actually happens.
If you enjoy this newsletter, consider sharing it with a colleague.
I’m always happy to receive comments, questions, and pushback.
