Snake oil, SEO, and the GEO chimera

By Iain,

A cobra rising from an open apothecary bottle labelled with a starred review card on a green background

Update, 21 May 2026.

Google used yesterday’s I/O keynote to make AI Mode the default Search experience worldwide, powered by Gemini 3.5 Flash, with conversational follow-ups built into AI Overviews and new agentic features that surface passive digests on user-specified topics. Search has stopped being a distribution mechanic that hands users off to other sites. It has become an engagement sink that resolves queries in place. Queries reached an all-time high in Q1 2026 and search revenue growth accelerated, which means the experiment paid off.

Two implications follow for everything below. The zero-click problem this piece describes has graduated from a publisher complaint into Google’s stated product strategy. Optimising for citation rather than traffic is no longer a hedge against a possible future. It is the whole game. The Marketing Live announcements the same afternoon extended the logic: Universal Cart lets shoppers complete transactions inside Google’s surface without visiting the retailer, Direct Offers adds bundling and BNPL, and AI-powered Shopping ads embed Gemini-generated explainer text inside sponsored results. None of this rewrites the SEO bargain. It just compresses the funnel further and turns citation-as-currency into an operating condition. The practical advice below still holds.


In the Smithsonian Institution’s Division of Medicine and Science sits a glass bottle, embossed “Clark Stanley Snake Oil Liniment”. When the federal Bureau of Chemistry analysed Stanley’s product in 1916, the contents were mineral oil, around one percent beef fat, capsicum, and traces of camphor and turpentine, but no snake. Stanley was fined twenty dollars for misbranding.

The catch is that real snake oil, the kind Chinese railroad workers brought west in the 1840s, was a working anti-inflammatory. A 1989 Western Journal of Medicine study by Richard Kunin found Chinese water snake oil contained around 20% eicosapentaenoic acid, against 8.5% in American rattlesnakes. Stanley substituted rattlesnake, then nothing, and kept the name.

GEO in 2026 is in similar shape. The traffic shift behind the hype is undeniable. Gartner projected that traditional search engine volume would drop 25% by 2026 as AI chatbots eat queries. Ahrefs found AI Overviews had cut clicks to the top organic result by 58% by the end of 2025, up from 34.5% eight months earlier, with the trend accelerating. ChatGPT processes roughly 2.5 billion prompts a day. The traffic is moving. The question is whether the discipline being built to chase that traffic has anything in it worth the time and effort, or whether it is, as Stanley’s bottle was, mostly a meaningless label.

What SEO offered

Search engine optimisation at least offered the chimera of control. Google’s ranking system was opaque, but it produced deterministic outputs you could read. For any single query at any given moment, there was a SERP with positions. You could observe it. You could track it. You could change a page, wait for the recrawl, and see whether the position eventually moved. This supposed feedback loop was enough to sustain the belief that you were steering the boat, even when most of what you observed was really the current.

A great deal of what was sold as SEO over the years was nonsense. Keyword density meters, link wheels, exact-match domains, doorway pages. The first decade of the industry was mostly correlation-mining dressed up as science, with periodic algorithm updates from the search engines that punished the manipulators and reset the field.

Google’s Panda update in 2011 buried a generation of content farms. Penguin the following year did the same to link schemes. What survived was less a science than a discipline of habits. Produce good content, make it accessible, build credibility, and the system would mostly reward you. Most “ranking factors” were folk wisdom retrofitted to outcomes. The control was always largely an illusion. But the illusion was sustainable because the system gave you outputs, and for many people, reading outputs that change over time is psychologically indistinguishable from controlling them.

That instinct is what people are reaching for when they talk about Generative Engine Optimisation (GEO, although I actually prefer Answer Engine Optimisation). If LLMs are eating search traffic, it follows that there should be an equivalent discipline for influencing what they cite. Position three in a SERP should have an equivalent in AI answers.

It doesn’t. And the SEO version, if we are brutally honest, didn’t either. The difference is that SEO at least gave you a surface to focus on. GEO is the same wishful thinking with no observable substrate to keep it plausible.

The chimera

LLM outputs are not deterministic. Sampling temperature, model version, prompt phrasing, user context, retrieval strategy, and a constantly changing index all influence which sources appear in a given response. The same query, run twice, can produce different citations. The same model, updated next week, can produce different citations. There is no SERP. There is no position. There is no citation rank you can observe and track the way you observed and tracked search engine rankings.
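One way to put a number on that instability is to compare the citation sets returned by repeated runs of the same prompt. The sketch below is illustrative only: the domain lists are made up, `citation_overlap` is a hypothetical helper rather than anything from a real tool, and it assumes you have already collected the cited domains for each run by whatever means you use.

```python
def citation_overlap(run_a: set[str], run_b: set[str]) -> float:
    """Jaccard overlap: shared citations divided by all distinct citations."""
    union = run_a | run_b
    if not union:
        return 1.0  # two answers with no citations are trivially identical
    return len(run_a & run_b) / len(union)


# Made-up cited domains from two runs of the same prompt (illustrative only).
monday = {"example.com", "docs.example.org", "news.example.net"}
tuesday = {"example.com", "blog.example.io", "forum.example.dev"}

overlap = citation_overlap(monday, tuesday)
print(f"overlap: {overlap:.2f}, changed: {1 - overlap:.2f}")
# prints "overlap: 0.20, changed: 0.80"
```

An overlap near 1.0 across many repeats would be the nearest thing to a stable position. The argument here is that you should not expect to see one.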

The closest the industry has produced is share-of-model tracking. Run a panel of prompts repeatedly, count how often your brand appears, and watch the number drift. The drift is usually enormous. One analysis put month-on-month variance in AI citations at 40 to 60% without any underlying content changes. The cross-platform picture is worse. ZipTie’s analysis found that 89% of citations differ between ChatGPT and Perplexity, with only 18% of brands appearing across all three major AI platforms simultaneously.

There is no unified “AI visibility” to track. There are several weakly correlated phenomena that the industry treats as one. Tracking citation share is not the equivalent of tracking SERP positions. It is the equivalent of estimating average rainfall by counting the puddles outside the window.
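To see how little a panel number can resolve, it helps to put an uncertainty interval around it. The following is a minimal sketch, not a description of how any tracking tool works. It assumes each prompt run is an independent trial with a fixed chance of citing you, which is optimistic given model updates and index churn, so real-world noise is likely larger. The panel size and counts are hypothetical, and `wilson_interval` and `intervals_overlap` are illustrative helper names.

```python
import math


def wilson_interval(cited: int, runs: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a citation share of cited/runs (z=1.96 is roughly 95%)."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    share = cited / runs
    denom = 1 + z**2 / runs
    centre = (share + z**2 / (2 * runs)) / denom
    half_width = z * math.sqrt(share * (1 - share) / runs + z**2 / (4 * runs**2)) / denom
    return centre - half_width, centre + half_width


def intervals_overlap(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True when the two intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical panel: 200 prompt runs per month, brand cited 30 times, then 40 times.
last_month = wilson_interval(30, 200)
this_month = wilson_interval(40, 200)

print(f"last month: {last_month[0]:.3f} to {last_month[1]:.3f}")
print(f"this month: {this_month[0]:.3f} to {this_month[1]:.3f}")
print("intervals overlap:", intervals_overlap(last_month, this_month))
```

Under these assumptions, a one-third relative rise in raw citation share still leaves the two intervals overlapping. Overlap is a rough check rather than a formal significance test, but it illustrates why a single panel-derived number is a weak basis for claiming that anything changed.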

You can also run controlled experiments in a lab. The Aggarwal et al. Princeton paper did exactly this, with a benchmark of 10,000 queries against a system designed to mimic Bing Chat. The paper reported around 41% lift in citation likelihood from adding statistics, 28% from quotations, and 30 to 40% from citing third-party sources. The numbers are real, but they are also benchmark artefacts produced in a controlled environment that does not exist in production.

SandboxSEO has made a sharper methodological objection. The three winning techniques all involved adding content, while the six failing ones only tweaked existing text. The lifts may be a function of content density rather than any optimisation magic. Pages with added statistics and quotations get cited because they give the model more substantive material, not because the model has a preference for statistics and quotations as such.

Read carefully, the strongest published evidence in the field reduces to two claims. The Princeton paper says that content with statistics, quotations, and citations is more likely to be cited, in a benchmark. Semrush’s analysis of 230,000 prompts across three LLMs over 13 weeks says that the domains ranking well in traditional search get cited more by AI. ZipTie qualified the second claim by noting that Domain Authority alone explains only about 18% of citation variance, meaning aggregate authority captures only part of why some domains get cited and others do not.

This is the chimera. There is no GEO playbook that adds materially to “write substantive, well-sourced content” and “do good basic SEO”. Everything else is either crawler plumbing, debunked tactics, or invented numbers.

What the evidence supports

The full list of GEO practices with published evidence behind them fits in a single table. None of it is novel, and none of it constitutes a separate discipline.

| Practice | Why | Evidence |
| --- | --- | --- |
| Allow OAI-SearchBot, Claude-SearchBot, Claude-User, and PerplexityBot in robots.txt | These are the live retrieval crawlers for ChatGPT search, Claude search, Claude user-fetches, and Perplexity. Block them and you are invisible to live AI search. | Provider documentation, summarised in Search Engine Journal’s coverage of the Anthropic and OpenAI crawler split |
| Server-render the content you want cited | AI crawlers do not reliably execute JavaScript. A client-rendered React or Vue site appears as an empty HTML shell to most retrieval crawlers. | Multiple independent server-log analyses and crawler documentation. Client-side rendering is a well-documented retrieval failure mode. |
| Structure content for extraction | AI systems retrieve sections rather than scoring whole pages. Each section needs to function as a self-contained unit with a clear heading and the answer at the top. GenOptima’s analysis found that AI Overviews cite from the first 30% of content 55% of the time. | GenOptima cross-platform monitoring. Inverted-pyramid journalistic convention long predates LLMs. |
| Add sourced statistics to your content | Statistics addition produced about a 41% lift on the position-adjusted word count metric in the Princeton benchmark. Combined with fluency optimisation it was the best-performing single strategy. | Aggarwal et al., ACM KDD 2024 |
| Add named quotations from credible experts | Quotation addition lifted citation likelihood by about 28% on the subjective impression metric. | Aggarwal et al., ACM KDD 2024 |
| Cite third-party sources inline | Citing sources lifted citation likelihood by 30 to 40% in the Princeton benchmark. Lower-ranked pages saw lifts of up to 115% from GEO methods overall. | Aggarwal et al., ACM KDD 2024 |
| Publish original research and proprietary data | AI engines are risk-minimising systems and prefer content with verifiable, attributable data. Sites with high original-data density received 4.31 times more citation occurrences per URL than directory-style listings. | ZipTie cross-platform analysis |
| Maintain strong domain authority through traditional SEO | Domains that rank well in Google search dominate AI citations, though the cited URL is often a deep subpage. Aggregate Domain Authority explains only about 18% of citation variance, so this is one input rather than a master lever. | Semrush analysis of 230,000 prompts, with the variance caveat from ZipTie’s analysis |
| Implement schema markup for Bing Copilot and Google AI Overviews | Microsoft has confirmed schema helps Copilot interpret content. Google’s Search Liaison has said structured data gives an advantage in AI-generated search experiences. | Microsoft’s Fabrice Canel at SMX Munich, March 2025, and Google’s Search Liaison statement, April 2025, both reported by Search Engine Land |
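The robots.txt row is the easiest of these to check mechanically. The sketch below embeds an example robots.txt and tests it with Python's standard `urllib.robotparser`. The user-agent tokens are the ones named in this article. The decision to disallow GPTBot and ClaudeBot is a hypothetical policy choice included only to show the search and training split, not a recommendation, and as noted elsewhere in this piece not every training crawler honours it. The `allowed_agents` helper and the example.com URL are illustrative.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
# Live retrieval crawlers named in this article: allow everything.
User-agent: OAI-SearchBot
User-agent: Claude-SearchBot
User-agent: Claude-User
User-agent: PerplexityBot
Allow: /

# Hypothetical policy choice, not a recommendation: opt out of training crawlers.
User-agent: GPTBot
User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed_agents(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Report which user agents this robots.txt permits to fetch the given URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


agents = [
    "OAI-SearchBot", "Claude-SearchBot", "Claude-User", "PerplexityBot",
    "GPTBot", "ClaudeBot",
]
result = allowed_agents(ROBOTS_TXT, agents, "https://example.com/article")
for agent, allowed in result.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Running the same check against your live file would catch the common mistake of a blanket rule that blocks the retrieval crawlers along with everything else. It only tells you what the file says, not whether a given crawler actually obeys it.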

What the evidence does not support

The snake oil in the category draws attention disproportionate to its evidence. The biggest single example is llms.txt. As Kai Spriestersbach put it, the argument that AI providers read your llms.txt because they publish their own is “the fact that a restaurant has a menu with the claim that it reads other restaurants’ menus before cooking”. The provider files cited as proof of adoption all live on developer documentation subdomains, where they exist so that coding agents like Cursor and Claude Code can pull API references at inference time. Production retrieval systems are not reading them.

| Practice | Why it is snake oil | Counter-evidence |
| --- | --- | --- |
| Treating llms.txt as a GEO tactic | A proposed Markdown file format for LLM retrieval. No major AI provider has committed to reading it during retrieval. The crawlers do not request it during routine site visits. | SE Ranking analysed 300,000 domains and found no meaningful correlation with citations. Semrush implemented one on Search Engine Land and reported no effect. John Mueller compared it to the discontinued meta keywords tag, and Gary Illyes confirmed Google does not support it. |
| Blanket blocking all AI bots | Recycled from 2023-era publisher advice that predates the training and search crawler split. Sites running it today block themselves from AI search visibility while leaving training ingestion mostly intact, because not all training crawlers honour robots.txt the way the search crawlers do. | OpenAI separated GPTBot from OAI-SearchBot and ChatGPT-User in late 2024. Anthropic followed with the ClaudeBot, Claude-SearchBot, Claude-User split. A Rutgers and Wharton study published in December 2025 found publishers who blocked AI bots experienced a 23% traffic decline against peers who allowed crawling. |
| Citation guarantees from agencies or tools | Cross-platform variance and constant model updates make any individual citation impossible to guarantee. The 89% citation difference between ChatGPT and Perplexity means there is no single “AI” to be guaranteed in. | ZipTie cross-platform analysis, combined with the underlying non-determinism of LLM retrieval |
| Treating schema markup as a universal GEO lever | Confirmation exists only for Bing Copilot and probably Google AI Overviews. No major AI provider beyond those has said schema affects citation likelihood. | Search Atlas, December 2024, found no correlation between schema coverage and citation rates across OpenAI, Gemini, and Perplexity |
| Vendor lift statistics without methodology | Numbers like “60% visibility loss without llms.txt”, “300% accuracy from schema”, and “30% citation lift from X” appear in consultant content without any underlying study. Many are laundered versions of the Princeton paper’s benchmark numbers, recycled as universal claims. | The Princeton paper is the only large-scale peer-reviewed study in the field. Any vendor claim that does not name a methodology should be treated as invented. |
| AI visibility tracking tools that promise to track citation share precisely | Month-on-month variance in AI citations runs at 40 to 60% even without content changes. The signal is dwarfed by the noise. Treating a panel-derived “share of model” number as a controllable metric is a category error. | Citation variance analysis |
| Buying into a separate “GEO discipline” with its own playbook | Both the Princeton paper and the Semrush correlation reduce to two instructions. Write substantively with credible sources, and keep doing SEO. There is no third lever beyond those that the published evidence supports. | The combination of Aggarwal et al., ACM KDD 2024 and the Semrush 230,000-prompt analysis |

In summary

The reason SEO survived as a discipline, despite its early-decade silliness, is that the system gave you outputs you could monitor over time. You could see your position. You could eventually see the change after a recrawl. The feedback loop was crude and the causal stories were mostly wrong, but the SERP held its shape long enough for the discipline to feel like one. The chimera was sustainable because the system gave you something to focus on.

GEO is the same instinct applied to a system that gives no equivalent outputs. The closest analogues to a ranked position are noisy averages over hundreds of prompts whose underlying variance dwarfs whatever signal you might detect. The rational response is to admit that any attempt to measure this is wishful thinking, do the access plumbing, keep writing substantive, well-sourced content, maintain the SEO authority that is the entry ticket to AI citation, and stop chasing shadows.

Stanley’s bottle still sits under glass in the Smithsonian. The label still promises a curative discovery that astonished and convinced a generation of intelligent people. History rhyming once again.
