The trust problem that you already solved
By Garrett

Every developer who has spent time with AI coding tools carries the same low-grade anxiety. You ask the model to build something, it hands you back a file, and then you stare at it like a customs inspector wondering whether the suitcase has a false bottom. Line by line, function by function, you trace through the logic looking for the thing that will blow up in production at 2am on a Saturday. It is exhausting, and it is also, if you think about it for more than ten seconds, a problem you solved years ago in a completely different context.
Other people’s code
When you work at any company of reasonable size, you depend on services built by other teams. The payments team builds an API and the auth team builds an identity layer. You read their documentation, you call their endpoints, and you get on with your life. You do not clone their repository and audit every function before making your first request. You do not trace through their pagination logic line by line to confirm they handle off-by-one errors correctly. If you did, you would never ship anything, because you would be too busy reading other people’s merge histories.
The implicit contract is simple enough, and you have probably never thought twice about it. They are professionals who tested this, and if something breaks you will debug it together, but until that happens you trust the abstraction.
Nobody finds this arrangement controversial, because it is how software gets built at scale and has been for decades. The reason it works is not that the code behind those services is perfect. The code is often mediocre, occasionally horrifying, and periodically held together by a single engineer who left the company in 2019. It works because the output is predictable enough that you stop worrying. You call the endpoint, you get the right data back, the pagination behaves as documented. You have done it enough times that the pattern holds. Your trust is built on accumulated evidence, not on a leap of faith.
The same problem wearing different clothes
When a model generates code for you, the anxiety is real but the underlying structure is identical. You are receiving output from an opaque system that you did not build and cannot fully inspect. The question is whether the output behaves correctly, not whether you can trace every computational step that produced it.
The discomfort comes from the fact that the opaque system is a language model rather than a team of engineers in a Slack channel. With the engineering team, you have social accountability, a shared codebase, probably a service-level agreement, and the knowledge that their manager will hear about it if their API starts returning garbage. With a language model, you have none of those guardrails. The system is probabilistic, it does not understand the code it writes (in any conventional sense of “understand”), and there is no one to page when it hallucinates a database column that does not exist.
All of which is true, and all of which matters less than you think once the model reaches a certain threshold of reliability.
Reliability is the only thing that earns trust
Opus 4.5 was the first model that cleared the bar for me. After enough rounds of asking it to build specific, well-scoped things and watching it produce correct, unexciting, completely functional output, the anxiety started to loosen. Ask it to build a JSON API that queries a database table, paginates the results, and returns them in a predictable format, and it just does it, not in a way that makes you want to write a LinkedIn post about the future of software but in the way a competent colleague does it, which is to say correctly and without drama.
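For a sense of scale, here is roughly what that kind of well-scoped request looks like once it is built. This is a minimal sketch rather than anything the model produced for me: the `items` table, the `paginate` function, and the response fields are all hypothetical, and it uses Python's built-in `sqlite3` and `json` modules so it runs without any setup.

```python
import json
import sqlite3


def paginate(conn, page=1, per_page=2):
    """Return one page of a hypothetical `items` table as a JSON string."""
    page = max(page, 1)  # clamp so the offset is never negative
    offset = (page - 1) * per_page
    total = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
    rows = conn.execute(
        "SELECT id, name FROM items ORDER BY id LIMIT ? OFFSET ?",
        (per_page, offset),
    ).fetchall()
    return json.dumps({
        "page": page,
        "per_page": per_page,
        "total": total,
        "items": [{"id": r[0], "name": r[1]} for r in rows],
        # True only when rows remain beyond the end of this page
        "has_next": offset + len(rows) < total,
    })


# Demo data: an in-memory database with five rows (ids 1 through 5).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items (name) VALUES (?)",
    [(f"item-{i}",) for i in range(1, 6)],
)

first = json.loads(paginate(conn, page=1))
last = json.loads(paginate(conn, page=3))
```

Nothing in it is clever, which is the point. The off-by-one risk lives in two lines, the offset calculation and the `has_next` check, and those are the lines worth a glance before moving on.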
That pattern of repeated correct output is exactly the same mechanism that made you comfortable calling the payments team’s API without reading their source code. You did it once, and it worked; then you did it a hundred times, and it kept working; eventually, you stopped thinking about it. The trust was not granted; it was earned through boring repetition.
The boundary matters, though, and it is worth being precise about where you draw it. I trust the model for classes of problems where I have seen it perform reliably, things like pagination, CRUD endpoints, data transformation, and standard API patterns. For these, the output is predictable enough that line-by-line review is a poor use of my time. If I asked it to design a distributed consensus algorithm or implement a custom encryption scheme, I would read every character, because I have not built up the same evidence base for those tasks.
The review question is really a resource allocation question
Treating every line of AI-generated code with equal suspicion is not rigorous; it is wasteful. It is the equivalent of cloning every internal team’s repo before calling their service, or taste-testing every ingredient before eating at a restaurant. Some level of trust delegation is required for anyone who wants to ship anything faster than geological timescales.
The useful question is not “should I trust AI output” but “for which specific tasks has this model earned my trust?” That framing turns a vague philosophical anxiety into a practical calibration exercise. You build a mental map of the reliability frontier, the boundary between problems the model handles reliably and problems where you need to stay close to the code, and you update that map as the models improve.
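If you wanted to make that map explicit instead of keeping it in your head, it might look something like the sketch below. Everything here is an assumption made for illustration: the `TrustLedger` class, the task-class labels, and the thresholds (20 samples, 95 percent correct) are hypothetical numbers, not a recommendation.

```python
class TrustLedger:
    """Track how often model output was correct, per class of task."""

    def __init__(self, min_samples=20, min_success_rate=0.95):
        # Hypothetical thresholds; tune them to your own risk tolerance.
        self.min_samples = min_samples
        self.min_success_rate = min_success_rate
        self.outcomes = {}  # task class -> [successes, attempts]

    def record(self, task_class, correct):
        """Log one reviewed result for a task class."""
        entry = self.outcomes.setdefault(task_class, [0, 0])
        entry[0] += int(correct)
        entry[1] += 1

    def review_level(self, task_class):
        """Suggest how closely to read the next output of this kind."""
        successes, attempts = self.outcomes.get(task_class, (0, 0))
        if attempts < self.min_samples:
            return "line-by-line"  # not enough evidence yet
        if successes / attempts >= self.min_success_rate:
            return "spot-check"  # trust has been earned here
        return "line-by-line"  # enough evidence, and it is not good


ledger = TrustLedger()

# Thirty correct pagination results in a row.
for _ in range(30):
    ledger.record("pagination", correct=True)

# Twenty CRUD results, three of them wrong (85 percent correct).
for i in range(20):
    ledger.record("crud", correct=(i >= 3))

pagination_level = ledger.review_level("pagination")
crud_level = ledger.review_level("crud")
consensus_level = ledger.review_level("distributed-consensus")
```

A task class with no history falls back to line-by-line review, and so does one with plenty of history and a poor record. Only boring repetition moves a task to spot-checks.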
Right now, for well-scoped, well-understood programming tasks, the frontier is further out than most developers have internalized. People are still auditing boilerplate that a competent model handles flawlessly, because the idea of trusting a machine to write correct code still feels transgressive. It felt transgressive to trust a remote team’s undocumented API, too, the first time you did it. Then it became Tuesday.
The shift is not about developing blind faith in AI systems. It is about recognizing that you already have a framework for trusting opaque systems that produce predictable output, and that the same framework applies here. The model is not your colleague, and it is not going to send you a passive-aggressive Slack message when you file a bug report. But for the subset of tasks where the evidence supports it, the mechanics of trust are the same ones you have been using your entire career.
You just have to notice that you already know how to do this.
