The trust problem that you already solved

By Garrett

Every developer who has spent time with AI coding tools carries the same low-grade anxiety. You ask the model to build something, it hands you back a file, and then you stare at it like a customs inspector wondering whether the suitcase has a false bottom. Line by line, function by function, you trace through the logic looking for the thing that will blow up in production at 2am on a Saturday. It is exhausting, and it is also, if you think about it for more than ten seconds, a problem you solved years ago in a completely different context.

Other people’s code

When you work at any company of reasonable size, you depend on services built by other teams. The payments team builds an API and the auth team builds an identity layer. You read their documentation, you call their endpoints, and you get on with your life. You do not clone their repository and audit every function before making your first request. You do not trace through their pagination logic line by line to confirm they handle off-by-one errors correctly. If you did, you would never ship anything, because you would be too busy reading other people’s merge histories.

The implicit contract is simple enough, and you have probably never thought twice about it. They are professionals who tested this, and if something breaks you will debug it together, but until that happens you trust the abstraction.

Nobody finds this arrangement controversial, because it is how software gets built at scale and has been for decades. The reason it works is not that the code behind those services is perfect. The code is often mediocre, occasionally horrifying, and periodically held together by a single engineer who left the company in 2019. It works because the output is predictable enough that you stop worrying. You call the endpoint, you get the right data back, the pagination behaves as documented. You have done it enough times that the pattern holds. Your trust is built on accumulated evidence, not on a leap of faith.

The same problem wearing different clothes

When a model generates code for you, the anxiety is real but the underlying structure is identical. You are receiving output from an opaque system that you did not build and cannot fully inspect. The question is whether the output behaves correctly, not whether you can trace every computational step that produced it.

The discomfort comes from the fact that the opaque system is a language model rather than a team of engineers in a Slack channel. With the engineering team, you have social accountability, a shared codebase, probably a service-level agreement, and the knowledge that their manager will hear about it if their API starts returning garbage. With a language model, you have none of those guardrails. The system is probabilistic, it does not understand the code it writes (in any conventional sense of “understand”), and there is no one to page when it hallucinates a database column that does not exist.

All of which is true, and all of which matters less than you think once the model reaches a certain threshold of reliability.

Reliability is the only thing that earns trust

Opus 4.5 was the first model that cleared the bar for me. After enough rounds of asking it to build specific, well-scoped things and watching it produce correct, unexciting, completely functional output, the anxiety started to loosen. Ask it to build a JSON API that queries a database table, paginates the results, and returns them in a predictable format, and it just does it, not in a way that makes you want to write a LinkedIn post about the future of software but in the way a competent colleague does it, which is to say correctly and without drama.
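To make "specific, well-scoped" concrete, here is a minimal sketch of the kind of task in question: paginating rows from a database table into a predictable JSON-shaped response. The `fetch_page` helper and its response fields are hypothetical, chosen only to illustrate the shape of the problem, not any particular model's output.

```python
import sqlite3

def fetch_page(conn, table, page=1, per_page=20):
    """Return one page of rows from `table` as a predictable dict.

    Illustrative sketch only: `table` must be a trusted identifier,
    since it is interpolated directly into the SQL.
    """
    conn.row_factory = sqlite3.Row  # so rows convert cleanly to dicts
    offset = (page - 1) * per_page
    total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    rows = conn.execute(
        f"SELECT * FROM {table} ORDER BY rowid LIMIT ? OFFSET ?",
        (per_page, offset),
    ).fetchall()
    return {
        "page": page,
        "per_page": per_page,
        "total": total,
        "items": [dict(r) for r in rows],
    }
```

Nothing about this is clever, and that is the point: the inputs, outputs, and edge cases (empty tables, a final short page) are so well understood that correct output is easy to verify by behaviour rather than by line-by-line reading.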

That pattern of repeated correct output is exactly the same mechanism that made you comfortable calling the payments team’s API without reading their source code. You did it once, and it worked; then you did it a hundred times, and it kept working; eventually, you stopped thinking about it. The trust was not granted; it was earned through boring repetition.

The boundary matters, though, and it is worth being precise about where you draw it. I trust the model for classes of problems where I have seen it perform reliably, things like pagination, CRUD endpoints, data transformation, and standard API patterns. For these, the output is predictable enough that line-by-line review is a poor use of my time. If I asked it to design a distributed consensus algorithm or implement a custom encryption scheme, I would read every character, because I have not built up the same evidence base for those tasks.

The review question is really a resource allocation question

Treating every line of AI-generated code with equal suspicion is not rigor; it is waste. It is the equivalent of cloning every internal team’s repo before calling their service, or taste-testing every ingredient before eating at a restaurant. Some level of trust delegation is required for anyone who wants to ship anything faster than geological timescales.

The useful question is not “should I trust AI output” but “for which specific tasks has this model earned my trust?” That framing turns a vague philosophical anxiety into a practical calibration exercise. You build a mental map of the reliability frontier, the boundary between problems the model handles reliably and problems where you need to stay close to the code, and you update that map as the models improve.

Right now, for well-scoped, well-understood programming tasks, the frontier is further out than most developers have internalized. People are still auditing boilerplate that a competent model handles flawlessly, because the idea of trusting a machine to write correct code still feels transgressive. It felt transgressive to trust a remote team’s undocumented API, too, the first time you did it. Then it became Tuesday.

The shift is not about developing blind faith in AI systems. It is about recognizing that you already have a framework for trusting opaque systems that produce predictable output, and that the same framework applies here. The model is not your colleague, and it is not going to send you a passive-aggressive Slack message when you file a bug report. But for the subset of tasks where the evidence supports it, the mechanics of trust are the same ones you have been using your entire career.

You just have to notice that you already know how to do this.
