The cost of everything and value of nothing
By Iain
Nobody knows what a token will cost in five years. Nobody knows how many tokens a single user will burn through in a working day, or whether the word “token” will even still mean what it means now once models have been carved up, distilled, and pushed to the edge. We know roughly the shape of the spreadsheet. We have no idea what goes in the cells.
We know reasoning chains are long, media is heavy, and agents loop. We know inference cost per token has been roughly halving every three months, a Moore’s Law cadence that nobody would sensibly extrapolate past 2027. We know each frontier model has a commercial half-life of six to nine months before the next one arrives and eats its lunch, which means even a profitable inference margin today is a bet on your ability to keep chasing. And sitting behind all of this is the fact that consumer demand, once you give people something genuinely useful, does not politely cap itself.
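To make the compounding concrete, here is a minimal sketch of what a three-month halving cadence implies, assuming (a big assumption) that it simply continues. The starting cost of 1.0 and the horizons are placeholders, not real prices.

```python
# Rough sketch: what a three-month halving cadence implies for cost per token.
# The 0.5 factor and the starting cost are illustrative assumptions, not quotes.

def cost_after(months: float, start_cost: float = 1.0, halving_period: float = 3.0) -> float:
    """Cost per unit after `months`, assuming cost halves every `halving_period` months."""
    return start_cost * 0.5 ** (months / halving_period)

for months in (3, 6, 12, 24, 36):
    print(f"{months:>2} months: {cost_after(months):.4f}x of today's cost")

# 12 months -> ~1/16th of today's cost, 36 months -> ~1/4096th, which is exactly
# why nobody would sensibly extrapolate the curve past 2027.
```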
Forecasting AI pricing in 2026 feels like forecasting bandwidth in 1998. You know roughly what the question is. But you have no idea what the answer will turn out to be, and anyone claiming otherwise is probably selling something.
The mobile precedent
In fact, there is an even better comparison than internet bandwidth: mobile data in the 2000s.
The telcos had spent vast sums deploying radio networks that could, in principle, give you data on your phone. What they could not do was let everyone use as much as they wanted without the networks collapsing under their own weight. So they spent the better part of a decade trying to work out how to ration it — how to turn a bit, which means nothing to a normal human being, into something a normal human being would pay for. It was a miserable and wildly profitable era in equal measure. If you owned a Nokia in 2006, you remember the dread of the bill.
Change the proper nouns and the problems map almost one for one onto what every SaaS CEO is now arguing about with their CFO over Slack. A megabyte of data, like a thousand tokens, means nothing to a normal human being. Usage does not track value in either direction — a teenager downloading ringtones could cost the operator more than a banker reading email, the same way a single agentic workflow can burn through more tokens than a hundred chat sessions while producing something the user values less. Perceived value, which is what people pay for, drifts wildly from the thing you are measuring on the meter. That gap is the whole game, both then and now.
The telcos dealt with this in two ways, both worth remembering.
First, they spent 15 to 20% of revenue on capex every year, deploying successive Gs that delivered incremental speed and, more importantly, much more capacity. The entire business case for 5G was capacity, even though the marketing was about raw speed. The AI industry is running the same play. Anthropic, OpenAI and Google are each burning tens of billions of dollars a year on training compute and data-centre buildout, which is the research-era equivalent of laying fibre and lighting up base stations. The capex line items look different. The economic shape is identical. You spend vast sums up front to push the capacity curve out far enough that demand can grow into it without the whole thing breaking.
Second, they tried to segment pricing by perceived value. The canonical example was SMS. A text message rode on the signalling channel rather than the data channel, which made it almost free for the operator to deliver. The retail price per bit was nevertheless astronomical, because the price tracked perceived value rather than cost: a text to your girlfriend was worth 10 cents in a way that a megabyte of wallpaper downloads was not. People paid, (mostly) happily, for years.
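To give a sense of how wide that gap was, here is a back-of-envelope illustration using round placeholder figures (10 cents per text, $2 per metered megabyte), not any particular operator's tariff.

```python
# Back-of-envelope: the implied per-megabyte price of SMS versus mobile data.
# 10 cents per text and $2 per MB are illustrative round numbers only.

SMS_PRICE = 0.10          # dollars per message (illustrative)
SMS_PAYLOAD_BYTES = 140   # maximum payload of a single SMS
DATA_PRICE_PER_MB = 2.00  # dollars per megabyte of metered data (illustrative)

sms_price_per_mb = SMS_PRICE * (1_048_576 / SMS_PAYLOAD_BYTES)
print(f"SMS:   ~${sms_price_per_mb:,.0f} per MB")   # roughly $750 per MB
print(f"Data:   ${DATA_PRICE_PER_MB:.2f} per MB")
print(f"Ratio: ~{sms_price_per_mb / DATA_PRICE_PER_MB:,.0f}x")
```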
Every AI vendor is now hunting for its SMS. GitHub Copilot sells at $19 a month because writing code feels worth that much, even though the underlying inference probably costs a fraction of that. Cursor charges for a premium model tier because debugging at 2am feels worth the upgrade. Any pricing scheme that finds a moment where the value is vivid enough to override the abstraction of the meter can charge almost anything it likes. The hard part is that most software use cases do not have an SMS moment.
After SMS, the segmentation got messier. Operators zero-rated their own portals, then offered “unlimited” plans with fair-use caps that were not, on any honest reading of the word, unlimited. They were trying to trace the price-elasticity curve with a crayon, and the crayon kept slipping.
Speed-running the same conversation
Every pricing debate currently happening in enterprise AI is a compressed replay of something that happened to the mobile industry between about 2003 and 2012: bundles, caps, tiered allocations, fair-use policies, hybrid constructs with a monthly floor and usage-based overages, commitment-based discounting.
The arguments are the same arguments. The buzzy idea at the moment, outcome-based pricing, where the customer pays a percentage of the value the software creates, is not new either — performance-based marketing has operated this way for decades, and it has always had the same problem.
The problem is that for most enterprise software, you cannot mechanistically link a dollar of value to a specific set of actions by the software.
Stripe can do it. Stripe literally sits on top of a flow of money and takes a cut. Salesforce sort of claims to do it, and an ad agency will claim to, and half the ad-tech world has spent twenty years and trillions of dollars arguing about attribution models without anyone truly being sure which impression drove which purchase. But when an HR team buys a piece of hiring software, there is no revenue to attribute and no cost saving that bookkeeping would recognise. When a security vendor prevents an incident that would have ended the company, the value is infinite and also entirely counterfactual. What is the outcome-based price for a fire that did not happen?
The honest answer is that outcome-based pricing only works where outcomes are measurable, isolable, and discretely attributable to the software. That is a tiny corner of the enterprise software world. Everywhere else, outcome-based pricing is either a salesperson’s rhetorical device or a messy calculation the vendor does internally before quoting a flat annual figure.
What happens next
Two things will likely happen, and they are not mutually exclusive.
The first is that we will see a proliferation of clever and complex pricing structures. Some will be genuine attempts to match price to value. Some will be obfuscation, designed to make bills unpredictable in a way that favours the vendor. A handful will endure. Most will not. Expect the next eighteen months of enterprise sales calls to contain long sections explaining why a particular vendor’s six-dimensional pricing model is, in fact, simple once you understand it.
The second is the one that has happened every other time this argument has played out. We will end up, for the vast majority of use cases, with flat-rate bundled pricing that gives the buyer absolute predictability on what they are spending. A seat-based licence is a usage-based licence with the usage abstracted away, and the HR team buying a hiring tool is perfectly capable of doing the value calculation themselves, in their own heads, in terms of hours saved and vacancies filled. They do not want to hand over a percentage of the hiring manager’s time savings. They want a number that goes into a line item.
Put slightly differently, the legacy pricing models were already outcome-based pricing, just with the outcome calculation pre-agreed between buyer and seller and then hidden inside a simple annual fee. Every seat licence is an implicit bet by both sides on how much value the software will deliver per person per year. When that bet is roughly right, nobody argues. When it drifts out of whack, the contract renegotiates at renewal. This is not naive pricing. This is pricing that has absorbed the lessons of every previous attempt to get clever.
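The shape of that implicit bet is easy to sketch. The figures below are entirely hypothetical; the point is the structure of the calculation the buyer runs in their head, not the numbers.

```python
# The implicit bet inside a seat licence, with hypothetical numbers.
# The buyer's back-of-envelope: hours saved per user per month, priced at a
# loaded hourly cost, against a flat annual fee per seat.

HOURS_SAVED_PER_USER_PER_MONTH = 4      # hypothetical
LOADED_HOURLY_COST = 60.0               # dollars, hypothetical
ANNUAL_SEAT_PRICE = 1_200.0             # dollars, hypothetical

annual_value_per_seat = HOURS_SAVED_PER_USER_PER_MONTH * LOADED_HOURLY_COST * 12
print(f"Implied value per seat: ${annual_value_per_seat:,.0f} / year")
print(f"Seat price:             ${ANNUAL_SEAT_PRICE:,.0f} / year")
print(f"Buyer keeps:            ${annual_value_per_seat - ANNUAL_SEAT_PRICE:,.0f} / year")

# When the first number drifts below the second, the bet is off and the
# contract renegotiates at renewal.
```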
The edge is coming back
The most underrated part of the mobile story is where the traffic ended up.
In 2007, nobody quite knew how mobile data would scale. The answer, in the end, was that a huge share of it did not stay on mobile networks at all. It got offloaded to Wi-Fi. Your phone is a mobile device only for the minutes you are not at home or at work or in a coffee shop. Everywhere else, the bits go over someone else’s fixed-line connection, paid for by a different business model entirely, and the mobile operator never sees them. The carriers built the expensive network. A different set of pipes carried most of the traffic. This was not what anyone planned for in 2005.
The AI industry is heading for the same split. Today every token runs through centralised inference on bespoke silicon in a data centre with a dedicated power substation, which is the equivalent of 2007, when every bit traversed the radio network. The future, already visible in the research, is a layered world. A large frontier model in the cloud for the hard problems. A medium model on a cheap cloud instance for the bulk of enterprise work. A small model on the device for the things you do a hundred times an hour — autocomplete, summarisation, intent classification, simple tool use. That third tier is the Wi-Fi of AI.
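One way to picture that layered world is a router that sends each request to the cheapest tier likely to handle it. Everything in the sketch below is a placeholder: the tier names, the difficulty scores, and the prices are illustrative, not any vendor's actual API or price list.

```python
# A sketch of tiered inference routing: send each request to the cheapest tier
# that is likely to handle it. Tiers, scores, and prices are hypothetical.
from dataclasses import dataclass

@dataclass
class Tier:
    name: str
    max_difficulty: float       # crude 0-1 score of the hardest task this tier can absorb
    cost_per_1k_tokens: float   # illustrative dollars, not a real price list

TIERS = [
    Tier("on-device small model", max_difficulty=0.3, cost_per_1k_tokens=0.0),   # hardware already paid for
    Tier("cloud medium model",    max_difficulty=0.7, cost_per_1k_tokens=0.001),
    Tier("frontier model",        max_difficulty=1.0, cost_per_1k_tokens=0.02),
]

def route(task_difficulty: float) -> Tier:
    """Pick the cheapest tier whose ceiling covers the task."""
    for tier in TIERS:
        if task_difficulty <= tier.max_difficulty:
            return tier
    return TIERS[-1]

# The economics follow from the distribution of difficulty, not the price list:
# if most requests score under 0.3, most tokens never touch a metered tier.
for difficulty, label in [(0.1, "autocomplete"), (0.5, "summarise a contract"), (0.9, "multi-step agent run")]:
    print(f"{label:>22} -> {route(difficulty).name}")
```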
Apple has been telegraphing this for two years with its on-device hardware capabilities, software shortcomings notwithstanding. Qualcomm and MediaTek are putting NPUs in every new chipset. Microsoft’s Phi series and Google’s Gemma are pushing the ceiling of what a sub-10B-parameter model can do on a laptop. None of this will eliminate the frontier. It will swallow the tail of the distribution, which is where the tokens live.
This matters for pricing because token volume is not spread evenly across use cases. The vast bulk of tokens, in the median enterprise, are used on tasks where a model one generation behind the frontier is completely adequate. Summarising an email thread does not need GPT-5 or Claude Opus 4.7. It needs a capable small model, running somewhere cheap, preferably on hardware you have already bought. When that tier of demand moves off the centralised inference bill and onto silicon the customer already owns, the pricing conversation changes. Flat-rate becomes easier to offer, because the expensive edge cases are a smaller fraction of the volume. This will be the single most important pricing story of the next three years, and it is happening quietly because the companies benefiting from the current pricing model have no incentive to accelerate it.
Two forces, one direction
Two things are true at once about AI pricing, and the tension between them is what will shape the next three years.
One is commercial. Vendors want to extract value in proportion to the value they create. Buyers want predictability. The distance between those two positions is where all the clever pricing structures live, and it is where most of the energy in enterprise sales conversations is currently going.
The other is technical. Moore’s Law, in its AI-era form, is squashing the complexity out of the cost structure. Inference cost halves. Small models close the gap on large ones for the majority of workloads. The edge swallows the median case. Over a long enough horizon, the marginal cost of a token asymptotes toward the marginal cost of electricity plus a small engineering margin, and the pricing conversation simplifies because there is less to argue about.
These two forces are pulling in the same direction, even though they feel opposed. Complex pricing models are a symptom of a cost base that is still volatile and heavy. As the cost base stabilises and the inference tail moves to the edge, the pricing base stabilises with it. Mobile went the same way. In 2008 you had per-megabyte pricing, add-on data packs, roaming charges that could bankrupt a small country, and a fair-use policy that nobody in the shop could justify with a straight face. By 2015 most consumers in developed markets had a flat monthly bundle with a number of gigabytes they never came close to using, and the meter had all but disappeared. AI is somewhere around 2008 on that curve. The bills are complicated. The meter is new. Give it time.
The bill at the end of the universe
Nobody wants a bill they did not expect. It is the most durable pricing preference in the history of commerce. Gyms do not charge per squat. Restaurants do not itemise the pepper. When metered pricing has settled in the past, it has settled into abstractions that hide the meter.
The AI industry is going through the same arc. We will have a period, possibly a long one, of bundles and caps and tiers and outcome-adjusted credits and hybrid floors with overage multipliers. Some will be genuinely clever. Most will be friction. Ten years from now, a knowledge worker will pay a flat fee, use what she needs, and think about the token meter the way we now think about long-distance phone charges, which is to say not at all.
None of this contradicts the capex story. Hundreds of billions are being spent on datacentres precisely because the industry expects volume to scale faster than efficiency. The hyperscalers are not betting on token prices staying where they are. They are betting on being the utility layer when pricing settles, the same way Amazon bet on compute a generation earlier. Build-out and commoditisation are the same wager placed from different ends of the market.
Until then, enjoy your tokenmaxxing. The meter is not going to stay visible forever.