We have ways of making you pay

By Iain


The true cost of AI work is hard to measure; the value of AI work is also hard to measure, and metering changes which of those two blind spots you notice first. It drags the cost into the light, itemised and arriving monthly, while the value stays diffuse, lagging and easy to argue about. That asymmetry is exactly why the panic is showing up now, ahead of any definitive verdict on whether the spending was worth it.

Simon Willison did the arithmetic on himself. He pays $200 a month across his Anthropic and OpenAI consumer plans, and when he ran the ccusage tool against his laptop, it showed $2,180 of API tokens burned in 30 days. He calls himself a moderately heavy user.

That gap between what he pays and what he burns is a straight subsidy, and it is closing. The way it closes tells you where AI pricing is heading. For two years, the frontier labs priced like companies fighting a land grab, which is exactly what they were doing, with allowances so generous that heavy users cost more to serve than they paid. That phase is ending, and the new one has a meter attached. Rising bills are only half of the story. The better half is what the meter exposes once it is running, which is that almost nobody, buyer or seller, can say what a unit of this work costs.

From a typical workday to the meter

The clearest sign is what happened to enterprise contracts. When Anthropic added Claude Code to its Team and Enterprise plans in August 2025, each seat included “enough usage for a typical workday.” By the account The Information published in April, that became $20 a seat plus API pricing for whatever you burn on top. Anthropic dates the change to November, and companies are discovering it, uncapped, as their annual contracts renew.

OpenAI followed weeks later, moving its Codex rate card to API token usage from April 2nd 2026, first across Plus, Pro and Business, then across every existing Enterprise plan, with the charges dressed up as “credits” that map onto published token rates. The timing was not gentle. GPT-5.5 landed on April 23rd at twice the API price of GPT-5.4, and Opus 4.7 a week earlier at about 1.4 times Opus 4.6 once the new tokeniser is counted, so both labs rebuilt their contracts to bill usage just as the rates went up, locking year-long customers to the meter.

Then GitHub removed any remaining doubt. From June 1st 2026, every Copilot plan moved to usage-based billing, with a monthly allotment of credits consumed at published API rates. Chief product officer Mario Rodriguez gave the honest reason. A quick chat question and a multi-hour autonomous coding session, he wrote, have been costing the user the same amount, with GitHub absorbing the difference, and that arrangement is over. The fallback to a cheaper model when you run out is gone too. When the credits are spent, you either pay the rate or stop.

And this week Anthropic showed where the logic ends up. Claude Fable 5, launched on June 9th as the most capable model the company has ever sold to the public, costs $10 per million input tokens and $50 per million output, double Opus 4.8 and by some distance the priciest flagship any major lab sells.
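To make those rates concrete, here is a minimal sketch of what a single request costs at the prices quoted above. The Opus 4.8 figures are inferred from the "double" comparison rather than quoted anywhere, and the token counts are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rate:
    """Price in US dollars per million tokens."""
    input_per_million: float
    output_per_million: float


# Quoted in the article.
FABLE_5 = Rate(input_per_million=10.0, output_per_million=50.0)
# Assumption: inferred from "double Opus 4.8", not a quoted price.
OPUS_4_8 = Rate(input_per_million=5.0, output_per_million=25.0)


def request_cost(rate: Rate, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the given rate."""
    return (
        input_tokens * rate.input_per_million
        + output_tokens * rate.output_per_million
    ) / 1_000_000


# Hypothetical agent turn: 200,000 tokens of context in, 20,000 tokens out.
print(f"Fable 5:  ${request_cost(FABLE_5, 200_000, 20_000):.2f}")
print(f"Opus 4.8: ${request_cost(OPUS_4_8, 200_000, 20_000):.2f}")
```

On those made-up numbers the same turn costs $3.00 on Fable 5 and $1.50 on the inferred Opus 4.8 rate. Most of the bill is input rather than output, a point that matters later.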

The subscription mechanics are the telling part. Pro, Max, Team and seat-based Enterprise plans include Fable 5 only until June 22nd. From June 23rd it leaves those plans entirely, and continued use means buying usage credits at API rates, with Anthropic promising to fold it back into subscriptions “once capacity allows”, something nobody will be holding their breath for. The monthly fee now buys you the second-best model. The best one runs on the meter from birth, and the two weeks of included access is a free sample with an expiry date.

Why the labs can charge what they like

Pricing power like this arrives only when a product becomes something people cannot work without, and Willison’s argument, which I think is right, is that the labs have found product-market fit in coding agents rather than the chatbot.

The chatbot numbers explain why. OpenAI said in February that ChatGPT had more than 900 million weekly active users, of whom only 50 million paid. One in 18 converting at $20 a month is a poor way to repay a trillion dollars of infrastructure.

Coding agents bend the maths the other way. They eat tokens at a rate that a chat session never will, and they have become the standard tool for most developers because even $1,000 a month in inference costs pays for itself. The reach goes beyond software, too, since a coding agent can automate anything you do by typing instructions into a computer.

The labs spend enormous sums serving this, and the figure that stopped me came from SpaceX’s S-1 filing, which disclosed that Anthropic pays $1.25 billion a month through 2029 for compute on the Colossus clusters, capacity it ties to raising Claude Code and API usage limits. A company spending that with one vendor has a strong reason to stop subsidising anyone, and it explains why Fable 5 launched rationed. When the constraint is physical capacity, the meter doubles as a queueing system.

The budget panic

The uncapped meter has already produced casualties. The headline case is Uber, whose CTO told The Information the company had maxed out its full-year AI budget within months, mostly on Claude Code. Claude Code only became dependable last November, so a budget drawn up in 2025 was always going to underestimate 2026. That is a forecasting miss more than a scandal, and the supporting quote is softer than the headline it generated. Uber COO Andrew Macdonald said roughly a quarter of the company’s code commits last quarter ran through Claude Code, and that he could not yet draw a clean line from that to shipped features.

Some of the stories are starker. Axios reported that one company spent $500 million in a single month on Anthropic’s models after failing to set spend limits, the kind of accident that is only possible when the meter has no ceiling and nobody is watching the dial. The reaction has been a scramble for the controls, with Walmart capping its internal “Code Puppy” coding tool and an Amazon senior vice-president telling staff not to use AI just for the sake of using AI.

There will be many more such moments, each requiring the CFO to take a stiff drink in a darkened room, but the budget panic is just the surface. Underneath sits a problem that more spending discipline will not fix.

Nobody can price the work

The writer Ed Zitron makes the sharpest version of this case in a piece titled AI Doesn’t Have ROI, with no room for ambiguity. I do not follow him all the way to his conclusion, but his central observation is one the industry keeps stepping around. The return-on-investment debate assumes you can at least measure the cost side cleanly, and you cannot.

A token price is not a task price. The same request, run twice, can cost wildly different amounts depending on how far the model wanders, how many times it loops, how much context it drags along, and whether it does the wrong thing in a way that takes three more turns to unpick. OpenAI has conceded that hallucination is mathematically inevitable rather than an engineering bug, so the variance is not going away. You can know the price per million tokens to four decimal places and still have no idea what a finished piece of work will cost before you run it.
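A rough sketch shows how the same task can land at very different prices. It assumes an agent that resends its whole accumulated context on every turn, ignores any prompt-caching discount a provider might offer, and uses the Fable 5 rates quoted earlier as defaults. The turn counts and token sizes are hypothetical.

```python
def run_cost(
    turns: int,
    base_context: int,
    added_per_turn: int,
    output_per_turn: int,
    input_price: float = 10.0,   # $ per million input tokens
    output_price: float = 50.0,  # $ per million output tokens
) -> float:
    """Dollar cost of a run whose context grows and is resent each turn."""
    # Turn t (counting from 0) sends the base context plus everything added so far.
    input_tokens = sum(base_context + t * added_per_turn for t in range(turns))
    output_tokens = turns * output_per_turn
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Same request, two hypothetical runs: one goes straight there, one wanders.
direct = run_cost(turns=4, base_context=20_000, added_per_turn=5_000, output_per_turn=1_000)
wandering = run_cost(turns=12, base_context=20_000, added_per_turn=5_000, output_per_turn=1_000)

print(f"direct:    ${direct:.2f}")
print(f"wandering: ${wandering:.2f}")
```

Under these assumptions, three times the turns costs nearly five times as much ($1.30 against $6.30), because every extra turn pays again for all the context that came before it.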

The subsidy made that invisible, and that was the point. For two years, a model that re-read the same file 20 times or spent an hour pursuing a dead end cost the user nothing extra, because the monthly cost was $20 or $200 regardless. Everyone learned to treat tokens as free and to file the failures under growing pains. GitHub’s own admission that a one-line chat and a multi-hour agent run have been billed identically is a confession that the meter was switched off on purpose. Switch it back on, and every retry and every wasted pass has a price next to it.

This is why the GitHub move drew anger rather than mere grumbling. Users were posting that a single prompt ate half their monthly allowance, during a promotional window that still hands out free credits. The work had not changed, only the visibility of its cost. People were seeing, for the first time, what they had been spending all along.

When firms try to net it off against the gains, the picture stays murky. Bain surveyed 951 executives at companies with revenue above $100 million and found 37% reporting cost reductions of 10 to 20%, a larger 40% reporting 10% or less, and only 4% clearing 30%. Worse, 44% were funding their next wave of AI with the savings from the last one, savings that several had not yet banked. Bain’s blunt summary was that the technology worked, but the value did not arrive, and self-funding the next wave from past returns is a circular bet with an obvious structural flaw.

What the meter makes visible

The meter does more than raise the bill. It turns a long list of mundane engineering sins, the kind flat-rate pricing hid completely, into line items you can read.

Take the Model Context Protocol, the now-standard way of plugging tools and data into an agent. A carelessly built MCP server injects its entire tool schema into the model’s context on every turn. That can be tens of thousands of tokens of JSON definitions, re-read on every message. Under an all-you-can-eat plan, this is free and invisible, the software equivalent of leaving every light in the house on because the bill is fixed. Put it on the meter, and you are paying, repeatedly, to remind the model of abilities it is not using.

The same goes for the retrieval setup that re-fetches and re-tokenises the same large file 20 times in one task, the agent that retries a failing call 10 times and resends its whole context each time, the system prompt that has quietly grown to novella length, and the tool that returns a 5,000-row blob when the model needed four fields.
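Those sins can be turned into a rough ledger. The sketch below prices three of them at the Fable 5 input rate quoted earlier. Every token count is a hypothetical placeholder, and it assumes no prompt caching, so treat the output as an order-of-magnitude illustration rather than a measurement.

```python
INPUT_PRICE = 10.0  # $ per million input tokens (Fable 5 rate quoted earlier)


def dollars(tokens: int, price_per_million: float = INPUT_PRICE) -> float:
    """Convert an input-token count into dollars."""
    return tokens * price_per_million / 1_000_000


def waste_ledger(
    schema_tokens: int, turns: int,
    file_tokens: int, fetches: int,
    context_tokens: int, retries: int,
) -> dict[str, float]:
    """Itemise avoidable input spend for one task (all inputs hypothetical)."""
    return {
        # The full tool schema rides along on every turn.
        "tool schema resent every turn": dollars(schema_tokens * turns),
        # Only the first fetch of the file was necessary.
        "same file re-fetched": dollars(file_tokens * (fetches - 1)),
        # Each retry resends the whole context.
        "failed call retried with full context": dollars(context_tokens * retries),
    }


ledger = waste_ledger(
    schema_tokens=30_000, turns=40,
    file_tokens=50_000, fetches=20,
    context_tokens=80_000, retries=10,
)
for item, cost in ledger.items():
    print(f"{item:<40} ${cost:>6.2f}")
print(f"{'total':<40} ${sum(ledger.values()):>6.2f}")
```

With these placeholder numbers, one task carries $29.50 of spend that produced nothing, none of which would have shown up on a flat-rate plan.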

None of it was worth worrying about when usage was free. All of it now has a price, and the price compounds against intuition. Per-token costs have fallen by roughly 10 times a year at the raw inference layer, yet as the coding-tool maker Kilo worked through, frontier-model prices have stayed roughly flat while the tokens a task burns have climbed about tenfold in two years. In other words, the fuel got cheaper, but the trips got far longer.

The multipliers stack from there, because a reasoning model spends thousands of tokens deliberating, and on a hard query can burn more than 100 times the compute of a simple lookup. Anthropic’s own engineers found an agent uses roughly four times as many tokens as a chat, and a system fanning work across parallel subagents about 15 times. The 15 then compounds, since a subagent that spawns more subagents, or a tool handing back an oversized result, can multiply one query’s cost by another 10. Which is the bloated MCP from a moment ago, with a far larger price tag.
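The stacking is easier to see as arithmetic. This sketch applies the multipliers quoted above to a baseline chat exchange. The 2,000-token baseline is an assumption chosen for round numbers.

```python
import math

# Multipliers relative to a plain chat exchange, as quoted in the article.
AGENT = 4         # a single agent
MULTI_AGENT = 15  # work fanned out across parallel subagents
BLOWUP = 10       # nested subagents or an oversized tool result

CHAT_TOKENS = 2_000  # hypothetical baseline for one chat exchange


def tokens_for(chat_tokens: int, *multipliers: int) -> int:
    """Scale a chat-sized token count by each multiplier in turn."""
    return chat_tokens * math.prod(multipliers)


print("chat:              ", tokens_for(CHAT_TOKENS))
print("single agent:      ", tokens_for(CHAT_TOKENS, AGENT))
print("parallel subagents:", tokens_for(CHAT_TOKENS, MULTI_AGENT))
print("with a blow-up:    ", tokens_for(CHAT_TOKENS, MULTI_AGENT, BLOWUP))
```

The last line is 150 times the chat baseline, which is how a query that looks like a question ends up priced like a project.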

The client research agent that spins up five subagents to comb sources in parallel felt instant and free on a flat-rate plan. On the meter, it is a few dollars a run, and a few dollars a run, 100 times a day, is a junior salary by the end of the quarter. Kilo projects heavy individual usage at $100,000 per developer per year before long.
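The run-rate arithmetic is simple enough to check. The sketch assumes $3 for "a few dollars a run", 63 working days in a quarter and 252 in a year. All three figures are assumptions for illustration.

```python
def projected_spend(cost_per_run: float, runs_per_day: int, working_days: int) -> float:
    """Total dollars spent if the same agent runs at a steady daily rate."""
    return cost_per_run * runs_per_day * working_days


COST_PER_RUN = 3.0   # assumption: "a few dollars"
RUNS_PER_DAY = 100   # from the article
QUARTER_DAYS = 63    # assumption: working days in a quarter
YEAR_DAYS = 252      # assumption: working days in a year

print(f"per quarter: ${projected_spend(COST_PER_RUN, RUNS_PER_DAY, QUARTER_DAYS):,.0f}")
print(f"per year:    ${projected_spend(COST_PER_RUN, RUNS_PER_DAY, YEAR_DAYS):,.0f}")
```

That works out to $18,900 a quarter and $75,600 a year for one agent, which puts Kilo's projection within reach of a developer running a few of them.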

The meter, then, is a forcing function for the context discipline good engineers were supposed to have all along. Organisations that treat token efficiency as a first-class concern and audit what their tools send will pay a fraction of what sloppier competitors pay for the same output. The ones who ported flat-rate habits onto a metered plan are about to find out, in itemised detail, what those bad habits cost.

The value side is just as dark

This is where I part company with Zitron. He reads the missing measurement as proof that the value is missing altogether and that the technology is a four-year con, a reading the evidence does not support. The measurement problem is genuine, but the conclusion he draws from it is not the only one available.

The most thoughtful counter comes from SemiAnalysis, whose Dark Output piece argues that AI output will be felt long before it can be counted, an echo of Robert Solow’s old crack that you could see the computer age everywhere except in the productivity statistics.

There is no barrel of consulting, no metric ton of research reviews. When a task that used to carry a human price becomes a few cents’ worth of tokens absorbed within a firm, the value does not vanish, but the only trace left in the accounts is the cost. A million tokens can produce junk or automate a process that reshapes a company’s operations, and the two can look identical in the ledger unless someone deliberately measures the difference, which takes planning and effort.

So the true cost of AI work is hard to measure, and the value of AI work is also hard to measure, and metering changes which of those two blind spots you notice first. It drags the cost into the light, itemised and arriving monthly, while the value stays diffuse and lagging, easy to argue about. That asymmetry is why the panic is showing up now, ahead of any verdict on whether the spending was worth it. The bill is legible. The benefit much less so. Human nature does the rest.

The price ladder

The Fable launch produced hand-wringing about the end of subsidised AI and the divide it opens between corporations that can pay the meter and the individuals and small firms that cannot. Fair, as far as it goes, but the more useful way to read this week is that the market is sorting itself into a ladder, with more rungs than the rich-and-poor framing allows.

Apple supplied the bottom rung at WWDC on June 8th. Siri AI, the company’s rebuilt assistant, is a chatbot with personal context built into the operating system. It reads your screen, digs up a friend’s address from last March’s Messages, follows the conversation across devices, and costs nothing beyond the hardware. That is a baseline tier of intelligence owned by the company that already owns the slab of glass in your pocket, and the labs’ free offerings now compete with something that ships in the box.

Above that sit the $20 plans, which buy better models and more generation. The serious subscriptions come next, the $100-to-$200 tiers Willison pays for, built for people whose working day runs through the tools. At the top is the meter itself, usage billed at API rates, where Fable 5 now lives and where the most complex and demanding agentic workloads are heading. So four rungs, where a year ago the public conversation saw two. There is arguably a fifth, since Anthropic sells its unrestricted Mythos-class models only to approved organisations, a rung where the product is permission rather than tokens.

A ladder like this has a name in every other industry. It is price discrimination, the structure airlines and utilities converge on once they stop buying market share and start charging what each segment will bear. Its arrival is the clearest sign yet that the labs believe the land grab is over. You do not build fare classes for a product you are still giving away.

The routing worry is well founded too, and it goes alongside the context discipline above. The router that sends boilerplate to a cheap model and reserves Fable for work that justifies $50 a million output tokens is the same instinct as the engineer who trims the MCP schema. Both need to ask the same question of every task. What does this work deserve to cost?
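In its simplest form, a router is a lookup with a cheap default. The sketch below is illustrative only. The model identifiers, the small model's price and the task classes are all hypothetical, and only the $50 output rate comes from the figures quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str                        # hypothetical identifier, not a real API model ID
    output_price_per_million: float  # $ per million output tokens


CHEAP = Model("small-model", 2.0)   # hypothetical model and price
FLAGSHIP = Model("fable-5", 50.0)   # price quoted in the article

# Hypothetical task classes judged worth the flagship rate.
ESCALATE = {"architecture-review", "cross-repo-refactor"}


def route(task_class: str) -> Model:
    """Default to the cheap model; escalate only for named task classes."""
    return FLAGSHIP if task_class in ESCALATE else CHEAP


for task in ("boilerplate", "cross-repo-refactor"):
    model = route(task)
    print(f"{task}: {model.name} at ${model.output_price_per_million:.0f}/M output")
```

The design choice worth noticing is the default. Unknown work goes to the cheap model, and someone has to decide, by name, that a class of task deserves the expensive one.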

What you are buying

The fixed subscription is turning into a variable utility bill that scales with how useful the tool is, which means the better it gets, the more it costs. Track consumption the way you track cloud spend, because it now behaves similarly. Set hard caps before you need them, since the $500 million accident only happened because no ceiling existed.
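A hard cap can be as blunt as refusing any charge that would breach it. This is a minimal client-side sketch with invented names. It does not replace whatever spend limits your provider's console offers, and it assumes you can estimate a call's cost before making it.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a charge would push spend past the cap."""


class BudgetGuard:
    """Tracks spend against a fixed ceiling (hypothetical helper)."""

    def __init__(self, cap_usd: float) -> None:
        self.cap_usd = cap_usd
        self.spent_usd = 0.0

    def charge(self, cost_usd: float) -> None:
        """Record a cost, refusing any charge that would breach the cap."""
        if self.spent_usd + cost_usd > self.cap_usd:
            raise BudgetExceeded(
                f"charge of ${cost_usd:.2f} would exceed the ${self.cap_usd:.2f} cap "
                f"(${self.spent_usd:.2f} already spent)"
            )
        self.spent_usd += cost_usd


guard = BudgetGuard(cap_usd=10.0)
guard.charge(6.0)          # accepted: $6 of $10 used
try:
    guard.charge(6.0)      # refused: would take spend to $12
except BudgetExceeded as err:
    print(err)
```

The guard fails closed. The refused charge is not recorded, so the agent stops and a person decides whether to raise the ceiling.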

Treat your agent harnesses and MCP servers as cost centres rather than latency problems, because the difference between a tidy context and a bloated one is now denominated in dollars on an invoice. And decide in advance which rung of the ladder each class of work belongs on, because the router is about to become the most important piece of software in the company.

The cheap years happened. We all ordered like it was an open bar, and the labs let us, because it was a customer-acquisition cost that was always going to be clawed back. Willison’s $2,180 is what the product is economically worth. His $200 is what it cost while the labs were still buying market share, and that is nearly at an end. The tab, it turns out, was always going to land on the people who liked to party most.
