AI governance: between the committee and the catastrophe

By Iain

Every large organisation deploying AI faces two failure modes. Move too slowly, requiring extensive committee approvals and detailed risk assessments for every use case, and the technology is outdated before it delivers anything. Move too quickly, letting engineers deploy models with minimal oversight, and you risk systematic discrimination: a credit algorithm quietly penalising women for six months while the compliance team is still drafting policies.

Most organisations respond to this dilemma in the worst possible way: they bolt governance onto AI as an afterthought, adding bureaucratic layers that slow delivery without preventing harm. The result is a programme that frustrates engineers, disappoints business leaders, and still fails audit.

This piece discusses what a large, complex, regulated organisation actually needs in order to govern AI effectively. It focuses on the real-world challenge of balancing usefulness with oversight in high-stakes environments, where mistakes can lead to regulatory sanctions, reputational damage, or direct harm to individuals.

The harm is not hypothetical

Before analysing structure, it’s worth considering what “getting it wrong” looks like in practice.

In 2019, Optum’s Impact Pro algorithm, used across American hospitals to flag patients for additional care management, was found by researchers at UC Berkeley to contain serious racial bias. The algorithm used healthcare spending as a proxy for healthcare need, which seemed reasonable on paper. But because Black patients historically spent less on healthcare (owing to systemic barriers including geography, income, and access), the model learned to assign them lower risk scores than white patients who were objectively less ill. The researchers estimated this bias reduced the number of Black patients identified for extra care by more than half. The algorithm affected roughly 200 million people per year and had been running for years before anyone outside the organisation examined it closely.

That same year, Goldman Sachs came under investigation by the New York Department of Financial Services after Apple Card customers reported that the credit assessment algorithm was offering men dramatically higher credit limits than women with comparable or superior financial profiles. Tech entrepreneur David Heinemeier Hansson reported receiving a credit limit 20 times higher than his wife’s despite her having a higher credit score. Apple co-founder Steve Wozniak confirmed that the same pattern applied to his own household. As New York’s superintendent of financial services stated at the time, algorithms do not receive immunity from discrimination, and disparate impact is illegal whether the intent is there or not.

These cases share a revealing pattern. In both, the organisations building the systems did not intend to discriminate, and in both, the models performed well against their own internal metrics. Nobody with the authority to intervene was looking at the right things until the damage had already accumulated. The Optum algorithm was, by the company’s own account, “highly predictive of cost.” That was exactly the issue: it was optimised for the wrong goal, and no governance process caught the gap between what the model measured and what it was being used to decide.

This is the fundamental challenge of AI governance in regulated industries: the harm does not reveal itself immediately. It hides inside proxy variables, accumulates over time, and only becomes obvious when someone with the necessary expertise asks the right questions. Annual reviews and static risk registers will miss it entirely.

Why most governance programmes fail before they start

The typical enterprise response to AI risk follows a depressingly familiar pattern. A working group is formed, consultants are hired, a framework is chosen (usually NIST AI RMF, ISO 42001, or a hybrid of both), policies are drafted, and a committee is set up to review AI use cases. Then everything stalls because the committee meets monthly, the review process takes eight weeks, and by the time a model is approved, the business case that justified it has moved on.

The NIST AI Risk Management Framework, released in January 2023, is the most robust of the available frameworks precisely because it avoids this trap. It arranges AI risk management into four functions (Govern, Map, Measure, Manage) that are meant to be iterative and proportional rather than sequential and uniform. The framework explicitly states that its playbook is neither a checklist nor a set of steps to be followed completely. Organisations are encouraged to adopt as many or as few suggestions as suit their situation. But most organisations, when presented with a flexible framework, immediately rigidify it. They turn suggestions into mandatory steps, iterative loops into linear approval chains, and proportional risk assessments into one-size-fits-all review boards.

The result is what one industry analysis called “innovation gridlock,” with teams spending 56% of their time on governance-related activities when using manual processes. More than half the working week is spent on documentation, approvals, and committee cycles rather than on building anything. At that rate, governance is not enabling responsible AI deployment; it is preventing AI deployment altogether, which may seem safe until you realise that your competitors are shipping and your organisation is only writing memos about deploying.

Tiered risk, not uniform bureaucracy

The most crucial structural decision in any AI governance programme is how you tier risk because not every AI application has the same consequences. A model that recommends which internal wiki article to surface in search results is categorically different from one that determines credit eligibility or flags patients for medical intervention. Treating them the same is organisational negligence dressed up as rigour.

The EU AI Act, which becomes fully applicable in August 2026, formalises this through a four-tier classification system: unacceptable risk (banned outright), high risk (heavy compliance obligations), limited risk (transparency requirements), and minimal risk (largely unregulated). Whatever you think of the specific classifications, the principle is sound. The level of oversight should be proportional to the consequences.

For a large regulated organisation, this means building an internal classification system that maps each AI use case to a risk tier at the start, not after deployment. In practice, Tier 1 would include applications where AI outputs directly impact individuals’ rights, finances, health, or legal standing, and these are subject to full governance including bias testing, explainability requirements, human review of decisions, and ongoing monitoring. Tier 2 covers applications where AI supports but does not determine human decisions, which receive lighter but still meaningful oversight. Tier 3 includes internal productivity tools, content summarisation, and search augmentation, governed through acceptable-use policies and periodic audits rather than case-by-case review.

Classification must be quick, because if tiering takes six weeks you have merely moved the bottleneck. It should be a structured questionnaire that a product owner can complete in under an hour, with automatic routing based on the answers. Does this model make or materially influence decisions about individuals? Does it process personal data or operate in a domain subject to sector-specific regulation? The answers determine the governance track, and the track determines the speed.
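
To make that concrete, here is a minimal sketch of what intake routing could look like in code. The questions, tier definitions, and routing logic are illustrative assumptions, not a prescription from any framework.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    TIER_1 = "full governance: bias testing, explainability, human review, monitoring"
    TIER_2 = "meaningful oversight: documented review, periodic sampling"
    TIER_3 = "acceptable-use policy and periodic audit"


@dataclass
class IntakeAnswers:
    # Illustrative questions only; a real questionnaire would be tuned to the organisation.
    affects_rights_finances_health: bool   # credit, medical, legal, or employment decisions
    materially_influences_decisions: bool  # model informs a decision a person ultimately makes
    processes_personal_data: bool
    in_regulated_domain: bool              # sector-specific regulation applies


def classify(answers: IntakeAnswers) -> Tier:
    """Route a proposed AI use case to a governance tier from intake answers."""
    if answers.affects_rights_finances_health:
        return Tier.TIER_1
    if answers.materially_influences_decisions and (
        answers.processes_personal_data or answers.in_regulated_domain
    ):
        return Tier.TIER_2
    return Tier.TIER_3


# Example: an internal meeting-notes summariser lands in Tier 3.
print(classify(IntakeAnswers(False, False, False, False)))
```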

Who owns this, and why that question is harder than it sounds

In most organisations deploying AI at scale, accountability becomes so divided that it almost dissolves. The data science team constructs the model, engineering deploys it, the business unit uses it, legal reviews the vendor contract, compliance examines the regulatory aspects, and risk management assesses the exposure. When issues arise, each function shifts blame onto the others.

The governance structure that effectively functions in practice (as opposed to the idealised version on a slide) assigns three clear roles for every Tier 1 and Tier 2 AI application. First, a model owner, usually within the business unit, who is responsible for the application’s outcomes and monitors its performance once in production. Second, a technical steward from the data science or engineering team, who ensures the model’s technical integrity, including data quality, bias testing, and version control. Third, a risk reviewer from the compliance or risk team, who verifies that the application meets regulatory and policy standards before deployment and at fixed intervals afterwards.
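
As a rough sketch, the accountability register can be as plain as one record per model, with the three named roles and an escalation path. The field names and example values below are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class GovernanceRecord:
    """Illustrative accountability record for one Tier 1 or Tier 2 application."""
    model_name: str
    tier: int                   # 1 or 2
    model_owner: str            # business-unit owner accountable for outcomes
    technical_steward: str      # data science / engineering contact for data quality and bias testing
    risk_reviewer: str          # compliance or risk sign-off, pre-deployment and at fixed intervals
    escalation_path: list[str]  # ordered contacts if monitoring thresholds are breached


record = GovernanceRecord(
    model_name="credit-limit-recommender",
    tier=1,
    model_owner="head_of_consumer_lending",
    technical_steward="ds_platform_lead",
    risk_reviewer="model_risk_officer",
    escalation_path=["model_owner", "risk_reviewer", "chief_risk_officer"],
)
```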

This is not a committee but a clearly identified group of individuals with specific responsibilities and escalation procedures. Named owners focus accountability, whereas committees diffuse it.

Continuous monitoring or none at all

The governance failures in the Optum and Apple Card cases share another trait beyond the proxy problem: in both instances, the models were reviewed at a single point in time and then left to run. That static audit model is fundamentally at odds with how AI systems behave. Models drift as data distributions change, the world moves on from the assumptions embedded in the training data, and a model that was fair on Tuesday can become discriminatory by Thursday if the demographics shift or the data pipeline degrades.

A Gartner survey of 360 organisations in 2025 revealed that organisations using AI governance platforms were 3.4 times more likely to attain high governance effectiveness than those relying solely on manual processes. The difference isn’t about the technology itself, but about the shift from intermittent monitoring to continuous oversight.

For a Tier 1 application, continuous monitoring involves automated tracking of model performance metrics (accuracy, precision, recall) and fairness metrics across protected groups. It includes drift detection that signals when input data diverges from training data and logs every decision with enough context to explain why a particular output was produced. It also requires predefined thresholds that automatically trigger escalation when false-positive rates for any subgroup surpass set limits, instead of waiting until the next quarterly review.
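
A minimal sketch of threshold-based escalation, assuming performance and fairness metrics are already computed per protected subgroup. The threshold values below are placeholders for illustration, not recommendations.

```python
# Threshold-based escalation for a Tier 1 model: compare the latest monitoring
# metrics against predefined limits and alert immediately on any breach.
THRESHOLDS = {
    "max_false_positive_rate_gap": 0.05,  # gap between any subgroup and the overall rate
    "min_recall": 0.80,
    "max_drift_score": 0.25,              # e.g. population stability index on key inputs
}


def check_and_escalate(metrics: dict, notify) -> list[str]:
    """Return a list of breaches and notify the named owners for each one.

    `metrics` might look like:
    {"recall": 0.83, "drift_score": 0.31,
     "false_positive_rate": {"overall": 0.08, "group_a": 0.07, "group_b": 0.15}}
    """
    breaches = []

    fpr = metrics["false_positive_rate"]
    worst_gap = max(abs(v - fpr["overall"]) for k, v in fpr.items() if k != "overall")
    if worst_gap > THRESHOLDS["max_false_positive_rate_gap"]:
        breaches.append(f"false-positive-rate gap {worst_gap:.2f} exceeds limit")

    if metrics["recall"] < THRESHOLDS["min_recall"]:
        breaches.append(f"recall {metrics['recall']:.2f} below minimum")

    if metrics["drift_score"] > THRESHOLDS["max_drift_score"]:
        breaches.append(f"input drift {metrics['drift_score']:.2f} above limit")

    for breach in breaches:
        notify(breach)  # page the model owner and risk reviewer, not the next quarterly review
    return breaches
```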

For Tier 2 and Tier 3 applications, the monitoring can be lighter, based on periodic sampling instead of exhaustive logging and aggregate statistics rather than detailed decision logs. Nonetheless, the core principle remains: governance that only exists at approval loses relevance once the model engages with real-world data.

The speed question

Everything described above sounds expensive and slow, which is exactly the objection that kills most governance programmes early on. Business leaders hear “risk tier classification, named accountability, continuous monitoring” and interpret it as “months of delay and headcount we cannot justify.”

This is where the design of the programme matters more than its content. Organisations that manage this well share a structural choice: embedding governance into existing development workflows rather than adding it as a separate process. Classification occurs during product planning, not after development. Bias testing runs as part of the CI/CD process, not as an extra gate before deployment. Monitoring is automated and integrated into the same dashboards engineers already use, not a separate compliance portal that nobody checks.
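
As an illustration of bias testing running in the build process, a fairness check can live in the same test suite that gates every deployment, so a breach fails the pipeline rather than waiting for a review meeting. The helper functions and the 0.05 gap below are assumptions for the sketch, not regulatory figures.

```python
import pytest

# Hypothetical project helpers, assumed to exist for this sketch.
from my_project.model import load_candidate_model, load_validation_data

MAX_APPROVAL_RATE_GAP = 0.05  # illustrative threshold


@pytest.mark.parametrize("protected_attribute", ["sex", "ethnicity"])
def test_approval_rate_gap(protected_attribute):
    """Fail the build if approval rates diverge too far across a protected attribute."""
    features, groups = load_validation_data(protected_attribute)
    model = load_candidate_model()
    approvals = model.predict(features)  # 1 = approved, 0 = declined

    rates = {}
    for group in set(groups):
        group_approvals = [a for a, g in zip(approvals, groups) if g == group]
        rates[group] = sum(group_approvals) / len(group_approvals)

    gap = max(rates.values()) - min(rates.values())
    assert gap <= MAX_APPROVAL_RATE_GAP, (
        f"approval rate gap {gap:.3f} across {protected_attribute} exceeds {MAX_APPROVAL_RATE_GAP}"
    )
```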

The relevant analogy is code review in software engineering. Two decades ago, code review was a formal, scheduled event involving printed listings and conference rooms. Today, it is a pull request that happens inline, as part of the natural development rhythm, and it is faster and more effective because it is integrated, not added on. AI governance needs the same structural shift.

The other key is proportionality: not every model requires the same level of review. A Tier 3 application, where a model summarises internal meeting notes, should not need the same documentation as a Tier 1 application, where a model influences loan approvals. If your governance process cannot differentiate, you will either over-govern low-risk applications (wasting time) or under-govern high-risk ones (risking disaster), and both outcomes are common and avoidable.

Regulatory pressure is not slowing down

For organisations still debating whether to invest in governance infrastructure, the regulatory calendar is settling the argument. The EU AI Act’s high-risk system rules take effect in August 2026, with penalties of up to EUR 35 million or 7% of global annual turnover for prohibited practice violations. In the United States, over 1,100 AI-related bills were introduced at the state level in 2025 alone, with states including Texas, Colorado, and California already enacting disclosure, bias prevention, and risk management requirements. Gartner projects that by 2030, fragmented AI regulation will cover 75% of the world’s economies, with total compliance spending exceeding $1 billion.

The question is no longer whether your organisation will need a governance programme. The question is whether you build one now, when you can design it to be proportional and efficient, or later, when you are scrambling to retrofit controls onto systems that have been running ungoverned for years.

Building governance reactively, after a regulatory investigation or a public incident, is invariably more expensive and more disruptive than building it proactively. A Conference Board analysis of S&P 500 filings found that 38% of companies now cite AI-related reputational risk as a material concern, up 12 percentage points in two years. That shift reflects a dawning recognition that the cost of ungoverned AI extends beyond fines. It includes the erosion of customer trust, the drag of legal uncertainty, and the competitive disadvantage of being unable to deploy AI confidently because you cannot demonstrate its safety.

What the programme actually looks like

Strip away the framework jargon and the regulatory acronyms, and the operating model for AI governance in a large, complex, regulated organisation comes down to five things.

A risk classification system that is fast and proportional. Every AI application is tiered on intake, with the classification driving the governance track, from full treatment for Tier 1 to a light touch for Tier 3, and the tiering itself takes less than a day, not less than a quarter.

Named accountability for every consequential model. Model owner, technical steward, risk reviewer, all documented with escalation paths. No committees-of-the-whole where everyone is responsible and therefore nobody is.

Governance embedded in the development lifecycle. Bias testing running in the build process, documentation generated alongside code, review happening at pull-request cadence rather than board-meeting cadence.

Continuous monitoring proportional to risk. Automated performance and fairness tracking for Tier 1 applications, periodic sampling for everything else, and predefined thresholds that trigger action without waiting for the next scheduled review.

A regulatory mapping that stays current. If you operate across jurisdictions, you need a maintained mapping of which AI applications fall under which regulatory regimes and what each regime requires. This is not a one-off exercise, because the regulatory terrain shifts quarterly and your mapping needs to keep pace.
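
A maintained mapping does not need specialist tooling to start; a reviewed data file per application is often enough. The applications, regimes, and obligations below are illustrative assumptions, not legal advice.

```python
# Illustrative shape for a maintained regulatory mapping, reviewed on a fixed cadence.
REGULATORY_MAP = {
    "credit-limit-recommender": {
        "jurisdictions": ["EU", "US-NY"],
        "regimes": ["EU AI Act (high risk)", "NYDFS fair lending guidance"],
        "obligations": ["conformity assessment", "bias testing", "adverse-action explanations"],
        "last_reviewed": "2025-Q4",
    },
    "meeting-notes-summariser": {
        "jurisdictions": ["internal"],
        "regimes": [],
        "obligations": ["acceptable-use policy"],
        "last_reviewed": "2025-Q4",
    },
}
```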

None of this is conceptually difficult, but the hard part is organisational. It requires executive sponsorship that treats governance as an operational function rather than a cost centre, engineering teams that accept governance as a professional obligation rather than an impediment, and risk and compliance functions willing to move at the speed of technology rather than the speed of audit cycles.

The uncomfortable middle ground

There is a persistent belief in governance discourse that you can have both perfect oversight and maximum speed, that with enough cleverness in your framework design, governance becomes invisible and frictionless. This is not true, and pretending otherwise sets programmes up for failure.

Good governance creates friction, and that is precisely the point. The question is whether the friction is targeted, proportional, and quickly resolved, or whether it is blanket, disproportionate, and endless. A Tier 1 application should face meaningful friction in the form of tough questions about training data, stress testing against edge cases, documented justification for proxy variables, and human review of decisions with serious consequences. That friction exists because the alternative is a credit algorithm that discriminates against half the population or a healthcare model that systematically under-serves Black patients for years without detection.

The World Economic Forum’s assessment of AI governance myths is worth repeating here. AI governance failures are rarely caused by technological limits; instead, they arise from conceptual mistakes that mislead regulation. The organisations that govern AI effectively are not the ones with the most complicated frameworks, but those that have figured out where friction is necessary and eliminated it elsewhere.

Optum’s algorithm did what it was designed to do, predicting healthcare costs with high precision. The failure was upstream, in the decision to use cost as a proxy for need without considering how that proxy might affect patients who had historically been denied access to the system. No amount of model monitoring could have identified that error because the model was performing exactly as planned. What was missing was someone with the authority and mandate to question whether the model measured what should have been measured.

That question cannot be automated or answered by a dashboard. It requires human judgment, domain expertise, and an institutional willingness to slow down at moments when slowing down matters. The governance programme’s role is to create the space for that question to be asked, and to ensure it is addressed before the model ships, not after it has been running on 200 million people for three years.
