The Machines Are Talking to Themselves
OpenRouter data shows that AI agents now consume more tokens than humans. Here's what that means for the future of computing.
The New Exponential
Moore's Law was the steady heartbeat of tech for half a century. The relentless doubling of transistors on a chip gave us everything from the PC to the smartphone. As a software engineer, I grew up taking that rhythm for granted—it was just the way progress worked. But I've come to realize that era is over.
I'm calling the new trend the Token Law. Where Moore's Law was a supply-side observation about the physics of silicon, this new Token Law is a demand-side phenomenon—reflecting the explosive growth in the complexity of tasks we are now entrusting to AI. It's not about how many transistors we can cram onto a chip, but how many "thoughts" an AI can process.
[Chart: OpenRouter observed tokens per week, annotated at roughly 1.3 quadrillion in October 2025, plotted alongside the far gentler doubling curve of Moore's Law]
The chart above tells the story. Moore's Law, the gentle upward slope on the left, delivered a 2× improvement every two years. The Token Law, the steep eruption on the right, is delivering 12× in a single year. And this isn't just one platform—Google went from 980 trillion to 1.3 quadrillion tokens per month in just two months. Alibaba reports its token use is doubling every few months.
The Robots Are Doing the Talking
My first thought was that it's just more people like you and me chatting with AI. But that's not the whole story. The primary driver is a fundamental shift in how AI operates. We're moving from simple, single-shot queries to complex, multi-step workflows executed by what we in the field call autonomous "agentic" AI systems.
For me, the most compelling evidence was seeing which applications were consuming the most tokens. When I looked at the OpenRouter leaderboard, it wasn't dominated by chatbots. The real power users were coding agents—specialized AI systems designed to write, debug, and manage software autonomously.
[Table: OpenRouter app leaderboard, listing rank, app, type, tokens per day, and share of volume]
I think of it as the difference between asking a person for directions and hiring a consultant who then makes dozens of phone calls, reads manuals, and runs tests on your behalf. A simple query might consume a few hundred tokens. But when you ask an AI agent to fix a bug, it kicks off a complex internal monologue. It has to analyze the code, replicate the error, search for solutions, write a new patch, and then test its own work. If the test fails, it starts the whole loop over again, learning as it goes. As an engineer, I find this process of "self-reflection" fascinating. It can multiply the token cost by 10, 50, or even 100 times compared to a simple query.
[Diagram: token footprint of a simple query vs. an agentic task]
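To make that multiplier concrete, here's a back-of-the-envelope sketch in Python. The per-step token budgets and the pass rate are invented for illustration, not measurements from any real agent, but the shape of the loop is the point: every failed attempt replays the whole analyze-patch-test cycle.

```python
import random

# Rough, illustrative token budgets; real numbers vary widely by model and task.
SIMPLE_QUERY_TOKENS = 300          # one prompt, one answer
STEP_TOKENS = {                    # hypothetical per-step costs inside an agent loop
    "analyze_code": 4_000,
    "replicate_error": 1_500,
    "search_docs": 2_500,
    "write_patch": 3_000,
    "run_tests": 1_000,
}

def agentic_task_tokens(max_attempts: int = 5, pass_rate: float = 0.4) -> int:
    """Simulate an agent that loops analyze -> patch -> test until the tests pass."""
    total = 0
    for attempt in range(1, max_attempts + 1):
        total += sum(STEP_TOKENS.values())    # pay for the full loop on every attempt
        if random.random() < pass_rate:       # tests pass, loop ends
            break
    return total

if __name__ == "__main__":
    random.seed(7)
    agent_cost = agentic_task_tokens()
    print(f"simple query : ~{SIMPLE_QUERY_TOKENS} tokens")
    print(f"agentic task : ~{agent_cost} tokens "
          f"({agent_cost // SIMPLE_QUERY_TOKENS}x a simple query)")
```

Even with these invented budgets, a single attempt costs roughly 40 times the simple query, and each failed attempt adds another full cycle on top.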
The Unseen Brakes on the Exponential Engine
While the demand for tokens is exploding, a parallel and equally intense engineering effort is underway to tame this exponential growth. The story of the Token Law isn't just about unchecked expansion; it's also about the sophisticated optimizations being built to manage the cost and complexity of these powerful new systems.
The most significant of these is KV Caching. In the iterative "self-reflection" loops common to AI agents, much of the initial context remains the same from one step to the next. Instead of re-processing this entire context each time, caching techniques allow the model to reuse the intermediate calculations, dramatically reducing the effective number of tokens processed and making complex, multi-step reasoning economically feasible.
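Here's a toy illustration of why that matters. It does no real attention math; it simply counts the tokens that need fresh computation, under the assumption that a cached prefix is paid for once while an uncached context is re-processed on every turn.

```python
# Toy illustration of prefix (KV) caching in an agent loop.
# We only count "tokens processed"; the point is that a shared prefix is
# paid for once rather than on every iteration.

def tokens_processed(turns: list[str], cache_prefixes: bool) -> int:
    processed = 0
    cached_prefix_len = 0
    context = ""
    for turn in turns:
        context += turn
        context_len = len(context.split())
        if cache_prefixes:
            # Only the tokens added since the last turn need fresh computation;
            # the cached intermediate state for the earlier prefix is reused.
            processed += context_len - cached_prefix_len
            cached_prefix_len = context_len
        else:
            # Without caching, the whole context is re-processed every turn.
            processed += context_len
    return processed

turns = ["read the bug report " * 200,   # large shared context
         "attempt patch v1 ",            # small incremental steps
         "test results: fail ",
         "attempt patch v2 ",
         "test results: pass "]

print("no cache   :", tokens_processed(turns, cache_prefixes=False), "tokens")
print("with cache :", tokens_processed(turns, cache_prefixes=True), "tokens")
```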
Furthermore, the AI ecosystem is not monolithic. Sophisticated agents rarely rely on a single, massive model. Instead, they orchestrate a cascade of models, using smaller, faster, and cheaper specialized models for routine tasks like intent recognition or data extraction, only calling upon the powerful—and expensive—frontier models for the most complex steps. This, combined with the fact that input tokens are often 3-5× cheaper than output tokens, forms a powerful set of brakes on the runaway train of token consumption. The true challenge for engineers is not just building token-hungry agents, but architecting systems that balance their immense power with these crucial economic and computational realities.
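A minimal sketch of that routing idea follows, with made-up model names and prices; they are placeholders for the cheap-tier/frontier-tier split, not anyone's actual rate card.

```python
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    cost_per_1k_in: float     # illustrative prices in USD, not real list prices
    cost_per_1k_out: float

# Hypothetical tiers: a small, cheap model for routine steps and a frontier
# model reserved for the hard ones.
SMALL    = Model("small-worker",      cost_per_1k_in=0.10, cost_per_1k_out=0.40)
FRONTIER = Model("frontier-reasoner", cost_per_1k_in=3.00, cost_per_1k_out=15.00)

def route(step_kind: str) -> Model:
    """Send routine steps (intent recognition, extraction) to the cheap model."""
    routine = {"classify_intent", "extract_fields", "summarize_logs"}
    return SMALL if step_kind in routine else FRONTIER

def step_cost(model: Model, tokens_in: int, tokens_out: int) -> float:
    return (tokens_in / 1000) * model.cost_per_1k_in + \
           (tokens_out / 1000) * model.cost_per_1k_out

workflow = [("classify_intent", 800, 50),
            ("extract_fields", 2000, 300),
            ("plan_refactor", 6000, 1200),    # the one genuinely hard step
            ("summarize_logs", 3000, 200)]

total = sum(step_cost(route(kind), t_in, t_out) for kind, t_in, t_out in workflow)
print(f"cascaded workflow cost: ${total:.3f}")
```

In a real system the routing decision is often made by another small model rather than a static lookup, but even this crude version captures the economics: the frontier model is invoked only where its capability is actually needed.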
The Physical Cost of Thought
This exponential growth in abstract "tokens" has a very real, physical cost. As someone who works on messaging infrastructure at scale, my world is governed by the trade-offs between latency, bandwidth, and computational resources. We fight for every kilobyte saved in our data serialization and every millisecond shaved off our processing time. From that perspective, the sheer scale of token consumption by agentic AI is staggering. The Jevons Paradox is in full effect: as models become more efficient, we don't just do the same tasks for less energy; we invent entirely new, token-hungry workflows that were previously unimaginable.
"The unit that once measured text now measures energy. Moore's Law no longer governs progress because token growth does." — Jonathan Lishawa, illuminem
Global data center electricity use is projected to rise from roughly 400 terawatt-hours in 2024 to nearly 1,000 TWh by 2030, with AI workloads responsible for about a third of that total. The future of AI is now inextricably linked to the future of energy.
The Dawn of a New Machine Age
The fifty-year reign of Moore's Law gave us the tools to connect the world. The new exponential, the Token Law, is about what happens now that the world is connected. It's a paradigm shift driven not by human-to-machine chatter, but by a vast and growing chorus of machines talking to themselves—agentic systems that write code, run experiments, and manage complex workflows with multiplying levels of autonomy.
As we've seen, this new age comes with a new set of rules. The abstract "thought" of a token carries a real-world cost in energy, and the economics of AI are being rewritten around tasks completed, not tokens spent. We're also starting to account for what I'd call the "Unreliability Tax"—the hidden but significant engineering cost of building production-grade systems on top of non-deterministic models. This tax is paid in the engineering hours spent on robust retry logic with exponential backoff, the computational overhead of input/output validation parsers that can handle hallucinated JSON, and the architectural complexity of stateful error recovery to roll back a workflow that fails midway.
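As a sketch of what that tax looks like in practice, here is the kind of defensive wrapper I mean, assuming a hypothetical `call_model` function standing in for a real API: retries with exponential backoff for transient failures, plus validation that rejects hallucinated or incomplete JSON before it can corrupt downstream state.

```python
import json
import random
import time

class ModelCallError(Exception):
    pass

def call_model(prompt: str) -> str:
    """Stand-in for a real model API call; here it just misbehaves sometimes."""
    roll = random.random()
    if roll < 0.2:
        raise ModelCallError("transient 429 / timeout")
    if roll < 0.4:
        return "Sure! Here is the JSON you asked for: {broken"   # hallucinated JSON
    return '{"status": "ok", "patched_files": ["auth.py"]}'

def call_with_backoff(prompt: str, required_keys: set[str], max_retries: int = 5) -> dict:
    """Retry transient failures and malformed output with exponential backoff."""
    for attempt in range(max_retries):
        try:
            raw = call_model(prompt)
            parsed = json.loads(raw)                 # rejects hallucinated JSON
            if not required_keys <= parsed.keys():
                raise ValueError(f"missing keys: {required_keys - parsed.keys()}")
            return parsed
        except (ModelCallError, json.JSONDecodeError, ValueError) as exc:
            delay = min(2 ** attempt, 8) + random.random()   # backoff with jitter
            print(f"attempt {attempt + 1} failed ({exc}); retrying in {delay:.1f}s")
            time.sleep(delay)
    raise RuntimeError("model output never validated; rolling back workflow step")

if __name__ == "__main__":
    random.seed(3)
    print(call_with_backoff("fix the login bug",
                            required_keys={"status", "patched_files"}))
```

None of this scaffolding would be necessary against a deterministic dependency; it exists purely because the component in the middle sometimes lies, stalls, or returns garbage.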
The central challenge for engineers and innovators in the next decade will not be merely building bigger models, but mastering the art of orchestrating these powerful, token-hungry agents. The future will belong to those of us who can manage this flow of digital thought as meticulously as a conductor leads an orchestra.
However, there is a fascinating counter-argument to consider: the Intelligence Paradox. Does a truly advanced agent use more tokens, or fewer? A novice programmer might write 1,000 lines of brute-force code to solve a problem a senior engineer solves in 100 elegant lines. It's possible that the current explosion in token use is a symptom of agent immaturity, and that as these systems become more intelligent, they will become more efficient, learning to solve complex problems with a fraction of the "thought" they require today.
The age of the token has just begun, and it promises to be a far stranger, faster, and more transformative era than the one we're leaving behind.