Three Problems the AI Industry Is Glossing Over
After installing OpenClaw this past Friday—one weekend of heavy use, $80 in Anthropic API costs, another $5.80 in Chinese model tokens in a couple of hours—some uncomfortable realities about AI agents became hard to ignore.
There are three problems with AI agents right now that are deeply coupled. Call it the agent triangle: the human bottleneck, the quality ceiling, and the cost. They reinforce each other in ways that make the “AI will 10x everyone” narrative more fragile than it sounds.
The Human Bottleneck
Coding agents speed people up. No question. But there’s a less comfortable observation hiding underneath: the human is also what’s slowing the agent down.
The common assumption is that the bottleneck is code review, that the agent writes code and the human validates it. And for less experienced developers, that’s probably true. But for senior developers, validating code is the easy part. The actual bottleneck is everything that happens before the code gets written.
Knowing what to build. Aligning the agent on how to build it. Getting it enough context to understand the problem, which means having it read files, wikis, documentation, codebases, sometimes entire websites. That context-gathering loop takes real time, even though it’s more comfortable than doing the research manually. The agent hunts the resources, reads them, explains them back. That’s genuinely useful. But it’s not instant, and the back-and-forth of specification and discussion can easily meander into tangents or slow exchanges that eat into the productivity gains.
There’s also a subtler shift happening. When an agent is always available to discuss ideas with, the temptation is to externalize thinking, to bounce half-formed thoughts off the agent instead of iterating internally first. This isn’t necessarily bad. Having a tireless collaborator that can process context faster than any human means more resources are within reach, more ideas get explored, more knowledge gets absorbed in less time. But “faster” doesn’t mean “free.” The exploration still takes time, and expanded curiosity means expanded scope, which means more hours spent even if each individual hour is more productive.
And then there’s the parallelization problem, arguably where the biggest theoretical gains are. An agent can work on multiple features, multiple projects simultaneously. But a human can’t effectively supervise multiple agents at once. Context-switching between parallel workstreams, keeping track of what each agent is doing, maintaining the mental model for each project—human multitasking has hard limits. This is where the real 10x productivity unlock would come from, and it’s precisely where human cognitive bandwidth caps it.
The obvious fix is more autonomy. Take the human out of the loop entirely. Let the agents run unsupervised. But that only works if the output can be trusted. And whether the output can be trusted depends entirely on the next problem.
The Quality Ceiling
Models get better every release. No argument there. But here’s the question nobody’s answering clearly: does quality converge to human level, or surpass it?
If it asymptotes just below human quality—always getting closer, never quite reaching it—then the human can never fully leave the loop. The bottleneck from the first point becomes permanent. Not because the human is needed for every task, but because the trust is never quite enough to let go.
And it’s worse than “average quality” suggests. The real killer is variance. A model can look near-human on average while confidently doing something bizarre 1–5% of the time. That’s exactly the regime where trust never fully develops. The output is good most of the time, but “most of the time” isn’t enough to hand over the reins.
Fundamentally, this is a trust problem. Humans make mistakes too. Humans hallucinate, forget things, get things wrong. But there’s a baseline trust that a competent human meets a quality bar and can learn from errors. The question is whether models reach that same threshold of trust—not perfection, but enough reliability that the default shifts from “check everything” to “trust but verify occasionally.” Until that threshold is crossed, the human stays in the loop, and the bottleneck holds.
The Cost
Remember “intelligence too cheap to meter”?
That framing was plausible when AI meant chatting with a model. A few dollars per million tokens on simple prompt-response interactions. But agentic AI is a completely different cost profile. Reasoning traces, tool calls, long contexts, retries, multi-step workflows—a single real task can burn through hundreds of thousands of tokens without breaking a sweat.
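To make that concrete, here is a minimal sketch of the token arithmetic. The prices are illustrative placeholders, not any provider’s actual rates, and the token counts are just one plausible shape for a multi-step task:

```python
# Rough cost model for a single agentic task.
# Prices are hypothetical per-million-token rates, not real provider pricing.

def task_cost(input_tokens: int, output_tokens: int,
              price_in_per_m: float, price_out_per_m: float) -> float:
    """Dollar cost of one task given token counts and per-million-token prices."""
    return (input_tokens / 1e6) * price_in_per_m + (output_tokens / 1e6) * price_out_per_m

# A multi-step agentic task: 400k input tokens (context, tool results, retries)
# and 50k output tokens, at an assumed $3/M input and $15/M output:
print(round(task_cost(400_000, 50_000, 3.0, 15.0), 2))  # 1.95
```

Call it roughly two dollars per real task under these assumptions. A few dozen such tasks over a weekend adds up quickly, and that is before reasoning traces and retries inflate the input side further.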
$80 in a weekend on Anthropic’s models. $5.80 in a couple of hours on a Chinese model that’s roughly 10x cheaper. These aren’t edge cases. This is what normal, productive agentic usage looks like.
Now do the math for a company. An employee costs salary X and is productive at level Y. With AI agents, maybe they’re 30% more productive but now also burning $1,000–2,000 per month in inference costs. Is that a win? At what point does the productivity gain actually cover the compute bill?
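The break-even question above can be written down directly. This is a back-of-the-envelope sketch with hypothetical numbers (salary, boost, and token bill are all placeholders), and it assumes the crude simplification that extra output is worth a proportional slice of salary:

```python
# Break-even check: does the productivity gain cover the compute bill?
# All figures are hypothetical, for illustration only.

def monthly_value_gain(salary_per_year: float, productivity_boost: float) -> float:
    """Extra value per month, assuming output value scales with salary."""
    return (salary_per_year / 12) * productivity_boost

def agent_is_net_positive(salary_per_year: float,
                          productivity_boost: float,
                          inference_cost_per_month: float) -> bool:
    """True if the monthly productivity gain exceeds the monthly token bill."""
    return monthly_value_gain(salary_per_year, productivity_boost) > inference_cost_per_month

# A $120k/year employee, 30% more productive:
print(monthly_value_gain(120_000, 0.30))              # 3000.0
print(agent_is_net_positive(120_000, 0.30, 1_500))    # True
print(agent_is_net_positive(120_000, 0.30, 3_500))    # False
```

Under these assumptions the answer is "it depends on the token bill": a 30% boost on a $120k salary covers $1,500/month in inference but not $3,500/month. The uncomfortable part is how narrow that margin is.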
The subscription models make the economics even more visible. The $20/month and $200/month plans from the major labs were designed for conversational and coding usage: chat with a model, use a coding agent, build something. And coding is already an expensive usage pattern. But it’s still bounded—someone sits down, codes for a few hours, stops. An always-on agent accessible from a phone, running tasks throughout the day, exploring things on the user’s behalf, is a fundamentally different usage pattern. It’s not a coding session. It’s a digital employee that never clocks out. And the flat-rate pricing was never built for that.
The consequences are playing out in real time. This past week, Anthropic officially banned the use of their subscription OAuth tokens in third-party tools like OpenClaw. The math is straightforward: a $200/month Max subscription becomes deeply unprofitable when users route autonomous agent workloads through it, since the equivalent API usage would cost over $1,000. Then, just days later, Google started banning Antigravity users doing the same thing, disabling the accounts of paying $250/month Ultra subscribers without warning. OpenAI is the only major lab still allowing this—they hired OpenClaw’s creator and are leaning into third-party tool support, at least for now. But the pattern is clear: flat-rate subscriptions and agentic usage are fundamentally incompatible. The labs priced these plans assuming human-paced interaction, not autonomous agents running loops all night.
This isn’t just an OpenClaw story. It’s the market discovering in real time that agentic AI costs dramatically more to serve than conversational AI, and the current pricing doesn’t account for it.
The Triangle Compounds
These three problems don’t just coexist. They compound each other.
The human can’t be removed because quality and trust aren’t there yet. The human is slow because supervising AI output is cognitively expensive. Every iteration costs real money in tokens. The result: slower than promised AND more expensive than before.
Try to fix the bottleneck by giving agents more autonomy, and quality issues bite harder. Try to fix quality by adding more guardrails, and token spend and human time both go up. Try to fix cost by using cheaper models, and quality drops, which means more human review. Every lever tightens one of the other constraints.
Is This a Bubble?
The technology is real and transformative. AI agents are a genuine step change from even a year ago, and nobody who’s used them seriously is going back.
But there’s a gap between the narrative and the current economics that the industry isn’t being honest enough about. “AI agents will replace entire workflows” is a compelling pitch. The reality right now is more like “AI agents will make some workflows faster while adding a significant compute bill, and there still needs to be a human checking the work.”
That gap doesn’t mean the technology is overhyped in the long run. It means the edges need to be seen clearly right now, so the focus can be on the things that actually close the gap rather than pretending it doesn’t exist.
What Actually Unlocks This
The three sides of the triangle aren’t equal. The quality ceiling is the keystone. How it breaks determines which of two futures we land in.
Future one: cost drops, humans stay. Model quality remains near but below human level, so the human never fully leaves the loop. But inference costs come down far enough that the human-plus-AI equation pencils out. The productivity boost is real, it’s affordable, and it justifies deploying agents widely even with a human still supervising. Some tasks get fully automated where trust can be established programmatically. Others stay human-in-the-loop but move faster and cheaper. This is the incremental future. It’s good, but it’s not the revolution being sold.
Future two: quality clears the bar, humans leave. Model reliability decisively surpasses human level. The human bottleneck dissolves because the output is trustworthy enough to not need supervision. The cost math flips entirely—the question stops being “does the productivity gain cover the compute bill on top of the salary?” and becomes “are the tokens cheaper than the human they’re replacing?” That’s a much easier equation to win.
But future two has a tension worth naming. If AI becomes reliable enough to replace human labor, the labs are providing enormous value, and companies that provide enormous value price accordingly. The incentive to drive inference costs down weakens precisely when the product becomes indispensable. As long as humans are in the loop, they’re a pricing check: no company will pay $5,000/month in tokens for an agent that still needs a $150k/year supervisor. Remove the human, and the ceiling on what labs can charge rises to whatever the salary used to be—unless open-source models running on cheaper, commoditized compute get good enough to be the market correction. There’s probably a whole essay in that game theory alone.
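The ceiling described above is easy to state as arithmetic. A sketch, with hypothetical figures, including an assumed overhead multiplier for the fully loaded cost of an employee beyond base salary:

```python
# Pricing ceiling in "future two": once the agent replaces the worker outright,
# the lab can charge up to the worker's fully loaded monthly cost.
# Salary and overhead multiplier are hypothetical.

def max_viable_token_bill(salary_per_year: float,
                          overhead_multiplier: float = 1.3) -> float:
    """Monthly token spend at which replacing the human breaks even."""
    return (salary_per_year * overhead_multiplier) / 12

# A $150k/year role with an assumed 30% overhead supports up to:
print(round(max_viable_token_bill(150_000), 2))  # 16250.0
```

In other words, under these assumptions a fully autonomous replacement for a $150k role could be priced north of $16k/month and still pencil out for the buyer, which is exactly why the incentive to cut inference prices weakens once the human leaves the loop.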
Neither future is certain. Quality might plateau and costs might stay high, and the triangle stays locked. But the shape of the problem is worth seeing clearly, because “AI will just keep getting cheaper and better” assumes a smooth path that the economics don’t guarantee.
The agent triangle is real. Which future breaks it open—and whether either does—depends on a quality threshold nobody can confidently predict, and an economic dynamic almost nobody is talking about yet.