The narrow window for probabilistic agents
By Iain
You can see the exact moment it goes wrong. The CIO sits through a vendor demo, watches an “AI agent” process a support ticket, look up an order, apply a returns policy, issue a refund, and send a confirmation email. It is slick, fast, and in every meaningful way, a workflow automation disguised in a language model’s trenchcoat. Each step follows a rule, and each rule was written by a human. The agent is not reasoning under uncertainty. It is executing a script, like a very expensive macro that can spell.
This is the blurriness at the core of the AI agent debate. Probabilistic AI, the kind that assigns confidence levels to outcomes and reasons about genuinely uncertain situations, has been lumped together with deterministic automation, workflow orchestration, and glorified if/then logic under the single banner of “agents.” The result is a market where every company believes it needs autonomous probabilistic systems when what it actually requires is better plumbing.
McKinsey’s research confirms the scale of the mismatch. Nearly eight in ten companies now claim to use generative AI, yet roughly the same proportion say it has produced no meaningful impact on earnings. McKinsey calls it the “gen AI paradox.” A less diplomatic interpretation might be that companies are buying hammers and then finding they needed a plumber. Cases where a probabilistic hammer is genuinely needed do exist, though, and they are worth examining carefully, because understanding where the boundary lies between probabilistic necessity and deterministic sufficiency is the single most useful thing a business leader can do before signing an agent platform contract.
Where probability is the territory, not a feature
A probabilistic system proves its worth when three conditions align. The input data is naturally uncertain, the problem space evolves more quickly than any static rule set can follow, and the cost of missing a pattern vastly exceeds the expense of an occasional incorrect prediction. Remove any one of these three factors and you could likely address the problem with a deterministic process, a decision tree, or, more simply, a spreadsheet maintained by someone who knows the business. The test is simple, and that is the point.
Real-time ad bidding and programmatic media. If you have ever managed programmatic advertising for a client, or sat alongside someone doing it, you have already encountered a genuinely probabilistic system, whether you recognised it or not. Every time an ad impression goes to auction, the bidding system has roughly 100 milliseconds to estimate the likelihood that this specific user, on this particular page, at this moment, will click and then convert. The inputs are noisy and incomplete, with the user’s browsing history providing only a partial signal at best. The competitive landscape shifts with each auction because other bidders are adjusting their models in real time, which is akin to playing poker against a table of opponents who can see half your cards and are each playing their own separate game at the same time. A deterministic bidding strategy, one that states “always bid £2.40 for users in this demographic,” will deplete the budget within days because it cannot adapt to the shifting probabilities of engagement. The system needs to reason about likelihoods, update those likelihoods continuously, and make decisions under uncertainty thousands of times per second. There is no way to write a static ruleset that handles this, because the uncertainty is the problem, not an imperfection in the data.
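The core of that contrast can be sketched in a few lines. This is a toy expected-value bidder, not any real platform’s logic, and every probability and price below is invented for illustration:

```python
# Illustrative expected-value bidding: price each impression by what it
# is probably worth, rather than applying a fixed "always bid £2.40" rule.

def expected_value_bid(p_click: float, p_convert_given_click: float,
                       value_per_conversion: float) -> float:
    """Bid the expected revenue of this specific impression.

    A static rule ignores all three inputs; this version re-prices
    every auction as the probability estimates shift.
    """
    return p_click * p_convert_given_click * value_per_conversion

# A high-intent impression justifies an aggressive bid...
hot = expected_value_bid(p_click=0.05, p_convert_given_click=0.10,
                         value_per_conversion=80.0)   # 0.40
# ...while a low-intent one is worth a fraction of a penny.
cold = expected_value_bid(p_click=0.002, p_convert_given_click=0.01,
                          value_per_conversion=80.0)  # 0.0016
```

The real difficulty, of course, is not this arithmetic but estimating `p_click` and `p_convert_given_click` under adversarial, shifting conditions, which is exactly where the probabilistic model earns its keep.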
Fraud detection and financial anomaly monitoring. Any business that processes payments at scale, from an e-commerce platform to a SaaS company running subscription billing, faces a problem that is fundamentally adversarial. The fraudster is actively trying to appear as a legitimate customer, which makes this structurally different from, say, automating invoice processing, where the invoices are not trying to deceive you (though it sometimes feels that way). A deterministic system tuned to catch last quarter’s fraud patterns will consistently miss next quarter’s mutations because the adversary studies your defences and adapts.
A probabilistic system trained on distributions of normal transaction behaviour can flag anomalies it has never seen before, precisely because it models what “normal” looks like as a probability distribution rather than a checklist. It does not need to know what the next attack looks like. It needs to know, with quantified confidence, when observed behaviour deviates from the baseline by an improbable degree. This is why every major payment processor runs probabilistic models and why no amount of manual rule-writing will keep up with someone who is paid to find the holes in your rules.
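The “distribution rather than a checklist” idea can be made concrete with the simplest possible baseline model. This sketch uses a plain standard-deviation score over one customer’s transaction amounts; the figures are invented, and a production system would model far richer behaviour, but the principle is the same:

```python
import statistics

def anomaly_score(history: list[float], observed: float) -> float:
    """How many standard deviations the observation sits from the
    customer's baseline. The model never needs to have seen this
    attack before; it only needs a distribution of 'normal'."""
    mu = statistics.mean(history)
    sigma = statistics.stdev(history)
    return abs(observed - mu) / sigma

# Typical transaction amounts for one customer (illustrative).
normal = [42.0, 39.5, 44.1, 41.2, 40.7, 43.3, 38.9, 41.8]

assert anomaly_score(normal, 41.0) < 1.0    # unremarkable, let it through
assert anomaly_score(normal, 950.0) > 5.0   # wildly improbable, flag it
```

No rule anywhere says “block transactions over £900”; the £950 charge is flagged purely because it is improbable relative to the baseline, which is why the approach generalises to fraud patterns nobody has seen yet.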
Dynamic pricing under competitive pressure. Hotels, airlines, rental car companies, and increasingly e-commerce businesses face a pricing problem that is genuinely stochastic. The optimal price for a room on a Tuesday in March depends on how many competitors have availability, what events are happening in the area, how demand has been trending over the last 48 hours, and whether a conference that was announced two weeks ago just got cancelled, which you may find out from a tweet before you find out from your booking system.
These variables interact non-linearly and change by the hour. A static pricing matrix will leave money on the table when demand spikes and fail to fill capacity when it softens, and the gap between the two is where the probabilistic model earns back its cost many times over. The demand variance is not a data quality issue to be solved by better collection. It is the market behaving the way markets behave, and you need a tool that treats that variance as the input rather than trying to smooth it away.
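To see what a pricing model has to consume, here is a deliberately oversimplified, deterministic stand-in: it nudges price with occupancy and competitor availability, using made-up coefficients. A real revenue-management system would estimate demand probabilistically rather than apply fixed multipliers, but the inputs are the point:

```python
def dynamic_price(base_price: float, occupancy: float,
                  competitor_availability: float) -> float:
    """Toy demand-responsive pricing. Occupancy and competitor
    availability are both in [0, 1]; the 0.4 coefficient is invented."""
    demand_pressure = occupancy + (1 - competitor_availability)
    return round(base_price * (1 + 0.4 * demand_pressure), 2)

quiet = dynamic_price(100.0, occupancy=0.3, competitor_availability=0.9)  # 116.0
surge = dynamic_price(100.0, occupancy=0.9, competitor_availability=0.1)  # 172.0
```

Even this toy beats a static matrix when conditions move, and its weakness, the hard-coded coefficient, is precisely what the probabilistic model replaces with continuously updated demand estimates.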
Predictive maintenance on equipment-dependent operations. If your clients run manufacturing lines, logistics fleets, or data centres with heavy physical infrastructure, the question of when a machine will fail is irreducibly probabilistic. Sensor data from a CNC machine or a delivery truck engine is noisy, and the relationship between vibration patterns, temperature readings, and actual failure is stochastic rather than deterministic, more like medicine than mechanics. A rule-based system that says “replace the bearing every 10,000 hours” will either replace it too early, wasting money and downtime on a healthy part, or too late, causing a $50,000 production stoppage during a client delivery crunch.
A probabilistic model that estimates the likelihood of failure within the next 200 operating hours, updating that estimate with every new sensor reading, can tell you when a machine is drifting toward breakdown in time to schedule maintenance during a planned window. The model is not clairvoyant, but it does not need to be. It just needs to be right more often than the calendar-based schedule it replaced, and the bar for that is, candidly, very low.
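The “updating that estimate with every new sensor reading” step is, at its simplest, a Bayesian update. This sketch treats each reading as either anomalous or not; the likelihoods and base rate are illustrative, not calibrated to any real machine:

```python
def update_failure_belief(prior: float, reading_anomalous: bool,
                          p_anom_if_failing: float = 0.7,
                          p_anom_if_healthy: float = 0.05) -> float:
    """One Bayesian update of 'this machine is drifting toward failure'.
    Likelihood values are placeholders for illustration."""
    like_failing = p_anom_if_failing if reading_anomalous else 1 - p_anom_if_failing
    like_healthy = p_anom_if_healthy if reading_anomalous else 1 - p_anom_if_healthy
    numer = like_failing * prior
    return numer / (numer + like_healthy * (1 - prior))

belief = 0.02  # base rate of imminent failure
for anomalous in [True, True, False, True]:
    belief = update_failure_belief(belief, anomalous)
# Three anomalous readings out of four push belief from 2% to roughly 95%:
# high enough to schedule maintenance in the next planned window.
```

Note that one clean reading (the `False`) pulls the estimate back down rather than resetting it, which is exactly the graceful behaviour a calendar-based schedule cannot offer.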
These domains share something important: the uncertainty is not an engineering failure or a sign of messy data. It is a property of the territory itself. No amount of better data architecture, cleaner processes, or more rigorous project management will make competitive ad auctions deterministic, stop fraudsters from evolving their methods, or make demand curves hold still. That is why these domains need probabilistic tools, and why they have always needed them, long before the current wave of LLM-powered agents arrived.
What opens up next
The set of genuinely probabilistic use cases will grow, but probably not in the way the demos imply. The most promising areas for expansion are where large-scale pattern recognition under uncertainty intersects with high-frequency decision-making, and where the cost of mistakes is measured in more tangible terms than developer hours.
Supply chain disruption prediction ranks highly. This is not the basic demand forecasting that most companies already perform, with seasonal adjustments and historical sales data, an established problem that predates AI by decades, but genuine disruption modelling. Estimating the probability that a port strike in Shenzhen leads to a component shortage in Stuttgart four weeks later, considering geopolitical signals, shipping route congestion, and the dependencies among hundreds of suppliers, is genuinely non-linear; the combinatorial explosion of interdependencies means no static model can capture it. Companies that succeed here will gain a real competitive advantage, and they will need probabilistic agents to do it.
Autonomous cybersecurity response is another area where probabilistic reasoning proves valuable. Not just anomaly detection, which is already well developed, but deciding in real time how to respond. Quarantine this endpoint or just flag it? Block this IP range or throttle it? The best response depends on a probability estimate of the threat level, the business cost of a false positive (locking a legitimate user out of a system at 2am is a cost anyone who has been on the receiving end will remember), and how the attacker might adapt to each countermeasure. This is a game-theoretic problem layered on top of a probabilistic one, and it cannot be managed by a static playbook because the adversary adapts faster than any human can update the rules.
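Stripped to its skeleton, the quarantine-or-flag decision is an expected-cost comparison. The cost figures below are placeholders, and a real system would also model attacker adaptation, but the shape of the trade-off is this:

```python
def choose_response(p_malicious: float,
                    cost_breach: float = 500_000.0,
                    cost_false_quarantine: float = 2_000.0) -> str:
    """Pick the action with the lower expected cost.

    Ignoring a real threat costs p * cost_breach; quarantining a
    legitimate user costs (1 - p) * cost_false_quarantine. Both
    cost estimates are invented for illustration.
    """
    expected_cost_ignore = p_malicious * cost_breach
    expected_cost_quarantine = (1 - p_malicious) * cost_false_quarantine
    return "quarantine" if expected_cost_quarantine < expected_cost_ignore else "flag only"

assert choose_response(0.30) == "quarantine"   # likely threat: isolate now
assert choose_response(0.001) == "flag only"   # probably the CFO at 2am
```

The interesting part is that the decision threshold falls out of the cost ratio rather than being hand-tuned, so when the business cost of a false positive changes, the behaviour changes with it, no playbook rewrite required.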
Revenue optimisation across channels in real time will become a genuine use case for probabilistic modelling as businesses operate across more platforms simultaneously. Not “set the price and review it monthly,” which is what most companies actually do despite calling it “dynamic pricing,” but continuously adjusting pricing, inventory allocation, and promotional spend across channels based on signals about competitor behaviour, demand elasticity, and margin sensitivity. The challenge is not that any single decision is difficult, but that the interactions between decisions across channels create feedback loops that are genuinely non-linear. Discount too aggressively on one channel and you cannibalise full-price sales on another, and the sensitivity of that cannibalisation shifts hour by hour, which is why the model must be probabilistic or it will be wrong.
Notice something about these areas: they are complex, technically demanding, capital-intensive operations with dedicated engineering teams, specialised data systems, and regulatory frameworks requiring auditability at each step. They are not simply “our sales team wastes four hours a week on CRM data entry” or “our customer support tickets take too long to resolve.” The divide between domains that truly demand probabilistic agents and the problems most companies are trying to solve with AI is large, and pretending otherwise is costly.
The uncomfortable middle
Most business processes companies aim to automate with AI agents are not probabilistic at all. They involve workflow management, routing, data hygiene, and process documentation. These are problems that could have been solved, and often should have been solved, over the past decade using existing tools.
Consider the canonical “AI agent” use case from any enterprise vendor’s marketing page. An agent that receives a customer support ticket, looks up the customer’s order history, checks the returns policy, processes a refund, and sends a confirmation email. Every step in that workflow is deterministic. The returns policy is a set of rules; the order history sits in a database; the refund is a transaction; and the email is a template. The only step where you might argue for a probabilistic component is interpreting the customer’s initial message, and even there, you could make a strong case that a well-structured intake form would eliminate the need for natural language interpretation entirely.
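Writing that canonical demo out as plain code makes the point. The policy constant, field names, and email templates below are hypothetical, but notice that nothing in it reasons under uncertainty:

```python
from datetime import date, timedelta

RETURN_WINDOW_DAYS = 30  # the "returns policy" is just a constant

def process_return(order_date: date, amount: float, today: date) -> dict:
    """The canonical 'AI agent' demo, written as what it actually is:
    a deterministic workflow. Every branch is a business rule a human
    wrote; the order lookup, refund, and email are ordinary plumbing."""
    if today - order_date <= timedelta(days=RETURN_WINDOW_DAYS):
        return {"refund_issued": True, "amount": amount,
                "email_template": "refund_confirmation"}
    return {"refund_issued": False, "amount": 0.0,
            "email_template": "outside_return_window"}

inside = process_return(date(2025, 3, 1), 49.99, today=date(2025, 3, 20))
outside = process_return(date(2025, 1, 1), 49.99, today=date(2025, 3, 20))
```

Twelve lines, fully auditable, and identical output every time it runs, which is precisely the property you surrender by routing the same decision through a probabilistic model.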
IBM’s Danilevsky put it well when she said, “I’m still struggling to truly believe that this is all that different from just orchestration. You’ve renamed orchestration, but now it’s called agents, because that’s the cool word. But orchestration is something that we’ve been doing in programming forever.” She is right, and the reason it matters is not that orchestration is bad or unimportant. It matters because dressing up orchestration in the language of autonomous agents creates confusion about what the system is doing, what it costs, where it can fail, and who is accountable when it does. A deterministic workflow that processes insurance claims according to business rules is legible, auditable, predictable, and cheap to run. A probabilistic agent doing the same job is opaque, harder to audit, and introduces failure modes such as hallucination and inconsistent policy application that do not exist in the deterministic version. You are adding risk without adding any corresponding ability that the situation demands.
The Gartner data on this paints a picture of widespread institutional caution. Only 15% of IT application leaders are even considering deploying fully autonomous AI agents, and a full 74% of respondents believe these agents represent a new attack vector. Just 13% are confident they have the governance structures to manage them. These are not Luddites afraid of change but the people who will have to maintain whatever gets deployed, and they are telling you that the governance, security, and accountability infrastructure does not exist yet for the kind of autonomous probabilistic systems that the marketing materials promise.
The accidental revolution, or, how AI made plumbing interesting
When organisations try to deploy AI agents and fail, what they discover is not that the AI does not work. What they discover is that their data is a mess, their processes are undocumented, and their operational workflows have accumulated decades of cruft and tribal knowledge that nobody has ever had the budget or the political capital to clean up.
MIT Sloan’s research on this is blunt, reporting that technical debt in the United States alone costs over $2.41 trillion annually, and that figure has nothing to do with AI. It is the accumulated cost of decades of “if it ain’t broke, don’t fix it” thinking across every business function, from accounting systems built in Access to customer databases held together with VLOOKUP formulas and prayer. MIT researchers studying AI agents for cancer detection found that 80% of the effort went into data engineering, stakeholder coordination, governance, and workflow restructuring. The AI was the straightforward part, while the hard part was the stuff that had nothing to do with AI and everything to do with organisational hygiene that had been deferred for years.
There is a dynamic here that deserves more attention than it gets. AI is creating an impetus, a social and organisational inflexion point, for companies to fix problems they could have fixed at any time in the last ten years. The problems were always there, the tools to fix them were always available, and what was missing was the motivation and the political cover to make it happen.
Cleaning up your data architecture was never likely to earn a standing ovation at the board meeting. Documenting your operational workflows was never going to secure a profile in the FT. Moving away from a legacy system that three people in accounting built in 2007 was never going to attract venture funding. These are the unglamorous, career-stalling, budget-begging tasks that every CTO knows need doing and that every organisation delays because there is always something shinier to pursue.
AI changed the incentive structure overnight. Suddenly, fixing the data architecture is not a cost centre cleanup but an “AI readiness initiative.” Documenting your workflows is not tedious process mapping but “preparing for autonomous agents.” Moving away from legacy systems is not maintenance but “building the groundwork for digital transformation.” The work itself remains the same, but the framing is completely different, and it is this framing that unlocks the budget.
You can see this playing out in concrete terms. Amazon reported that using its Q generative AI assistant reduced the time required to upgrade legacy Java applications from roughly 50 developer-days per application to six hours, estimating savings equivalent to 4,500 developer-years of work. But look at what that work actually is, because it is not building new AI capabilities. It is clearing technical debt that had been accumulating for years because nobody wanted to assign engineers to the thankless task of upgrading Java versions. The AI did not create new value so much as it made it possible, and politically acceptable, to capture value that had been leaking through cracks in the operational floor for a very long time.
This is not cynical, or at least not entirely. It is, in a slightly sideways fashion, the most worthwhile thing the AI hype cycle has produced. Not the agents themselves, many of which are orchestration systems that would have been called “workflow automation” three years ago and sold for a tenth of the price, but the organisational permission to do work that was always necessary and never funded. Every decade has its justification for operational housekeeping, from Y2K preparedness in the 2000s to cloud migration in the 2010s to AI readiness today. The justification changes, but the underlying work remains the same.
Arsalan Khan, a technology advisor, captured this dynamic precisely when he observed that technical debt is often both self-inflicted and cultural. Legacy processes, shadow IT, inconsistent data, and short-term shortcuts create friction that compounds over time. His follow-up observation is the one that should be pinned to every AI project board. AI can help by automating repetitive tasks and identifying patterns, but it cannot fix misaligned processes, poor data quality, or departmental biases. The broken spreadsheet does not care what model you run on top of it.
The companies that will see genuine productivity gains from this era will not be the ones that deployed the most sophisticated agents. They will be the ones who used the AI moment as cover to do the operational housekeeping that their organisations had been deferring for a decade. They will standardise their data formats because the AI needs it. They will document their processes because the agent needs a workflow to follow. They will retire legacy systems because the new tools cannot work with them. And then they will discover, quietly, that the productivity gain came less from the AI than from having clean data, documented processes, and systems that actually communicate with each other. It is the business equivalent of renovating a kitchen and discovering that the reason your food tasted bad was the twenty-year-old oven, not the recipe.
The taxonomy
If you are a business leader trying to determine where to invest, the essential framework involves just three questions about any proposed AI deployment.
First, is the core problem genuinely probabilistic? Does it involve irreducible uncertainty, adversarial inputs, or systems where outcomes depend delicately on conditions that change faster than your rules can be updated? If yes, you need probabilistic tools and should invest accordingly. These are the bidding problems, fraud detection issues, and dynamic pricing challenges where investing in proper probabilistic agents will prove valuable many times over.
Second, is the problem a workflow that currently requires a human to bridge gaps between systems that do not communicate with each other? If yes, you do not need an AI agent but rather well-integrated systems that connect seamlessly. You might use an LLM to interpret unstructured inputs at the system’s boundary, but the core work remains deterministic and should be built that way—legible, auditable, cost-effective to operate, and easy to adapt when business rules change tomorrow.
Third, is the problem actually that your operational infrastructure is held together with duct tape and institutional memory? If yes, the solution is not AI but the work you’ve been avoiding. Improve the data, document the processes, and retire the legacy systems. If AI provides the political cover to finally address these issues, then by all means highlight AI and secure the budget, but be honest with yourself about what truly generates productivity gains once implemented.
The S&P Global survey that found 42% of companies abandoned most of their AI pilot projects by the end of 2024 is not the failure story it has been reported as. It is a story about companies running expensive experiments and learning that the prerequisites were not in place. The pilots did not fail because the AI did not work, but because the organisations were not ready for any form of automation, intelligent or otherwise, and the pilot process exposed that truth in a way that years of internal memos and ignored audit reports never did.
Where the light falls
The AI industry has a strong commercial incentive to blur the line between genuinely probabilistic problems and ordinary workflow automation. Every vendor wants you to believe that your customer support operation needs the same category of tool as a trading desk running fraud detection, that both problems belong under the same “agentic AI” umbrella, and both require the same six-figure platform investment.
The narrowness of the genuinely probabilistic use cases should be clarifying rather than dispiriting. It means that for the vast majority of companies, the path to productivity does not run through deploying autonomous agents. It runs through the boring, unfashionable, enormously useful work of getting your house in order, the kind of work that earns nobody a keynote slot but saves everyone from the midnight phone calls when the legacy system crashes.
There is a version of the next five years where this plays out well. Companies use the AI moment to finally clean up their data, document their processes, retire their legacy systems, and build the kind of operational infrastructure that they should have built a decade ago. Along the way, they discover that they need fewer AI agents than they thought, because well-structured deterministic systems can handle 90% of the work they were planning to throw at a probabilistic model. The remaining 10%, the genuinely uncertain, adversarial, or chaotic problems, get the probabilistic treatment they deserve. Even for the companies whose agents never make it out of pilot purgatory, the most durable legacy of the agent era might be that it made operational hygiene fundable for the first time in a generation.
There is also a version where it plays out badly. Companies spend millions on agent platforms, discover that their data is too messy for the agents to work with, and abandon the project without doing the underlying cleanup work that the project was supposed to justify. This is the worst outcome because it combines the cost of the failed pilots with the ongoing cost of carrying the technical debt that caused the failures. The S&P Global number, 42% abandonment, suggests this version is already happening at scale. Still, if the hype around AI is what finally gives organisations the budget and the board-level attention to do the work, the boom will have been worth every breathless press release.
Somewhere soon, a CNC machine is vibrating at a frequency that means it will fail in eleven days, and a probabilistic model is quietly calculating the odds. Somewhere else, a sales team is manually copying data between two systems that could have been connected with a Zapier integration in 2019. The first problem needs an agent, and the second needs a plumber, but the market is currently selling the same thing to both.