Jun 09, 2026

An IAEA for AI — and who pays for it

Who builds the AI verification mechanism — and how to pay for it without poisoning it

An open letter to Marina Favaro and Jack Clark, authors of "When AI builds itself" (Anthropic Institute, on recursive self-improvement).

Your essay closes on the right problem: if recursive self-improvement is plausible, we need verifiable coordination mechanisms that could enable a global slowdown if one is ever warranted. You name the need but leave two questions open — who carries such a mechanism, and how its founding gets financed without destroying the very neutrality that makes it credible. What follows is an attempt at those two questions: deliberately one idea, sharpened, not a survey.

The institution is the long pole

The mechanism has two parts — a body and a method — and they move at different speeds. A verification method is an engineering problem and can be developed quickly. A legitimate, neutral institution that states will actually submit to is a problem of consent and trust, and historically that takes a decade. So the institution is the binding constraint, and the clock on it starts now — well before the method is finished. The resolution to "but the body would be an empty shell": make commissioning the verification method its founding statutory mandate. The institution's first job is to build the tool it will later administer. That fuses the two timelines instead of trading one against the other.

Legal form: an IAEA, not a UN organ

The natural reflex is "house it in the UN." That is half right. The model to copy is the IAEA, which is not a UN organ but an autonomous body created by its own statute — "UN family" for legitimacy, independent for capacity. This matters concretely: the institution's existence and authority do not hang on a Security Council veto. Call it, as an open working title, the International AI Verification Agency (IAVA) — verification, the thing you identify as missing, named in the institution itself.

This is not a duplicate of what exists. National AI Safety Institutes (US, UK), the Bletchley/Seoul process, the OECD principles, the UN advisory bodies — all are either national (and therefore not neutral enough for one great power to let another verify it), or soft law (no binding inspection regime), or both. IAVA supplies precisely what they structurally cannot: a neutral treaty body with inspection rights and a hard membership condition.

The verifiable handle: compute, privacy-preserving

The nuclear regime works because fissile material is physical, concentrated, and countable. The closest analogue for AI is compute — advanced chips flow through a narrow supply chain (a handful of fabs, EUV, a few designers), are already serial-numbered and export-controlled. Through on-chip attestation / hardware-enabled mechanisms, a cluster can attest that a training run above a compute threshold occurred without exposing weights, data, or architecture.

That last property is the whole game for great-power consent. You verify that and at what scale computation happened, not what is inside it. This turns the problem from "intrusive inspection = loss of sovereignty" (no equilibrium any major state will accept) into "symmetric, IP-preserving attestation = verified mutual restraint" (a cooperative equilibrium that can hold). The remaining gap — getting determined states to opt in — is then an organizational and game-theoretic problem rather than a technical impossibility: the move is to make verification symmetric and cheap enough that opting in dominates holding out. Compute is the entry point for the regime; the design should not foreclose extending the scope to other verifiable objects as the method matures.

Financing without capture

The obvious objection to a frontier lab seeding the body that regulates frontier labs is regulatory capture — and it is a real precedent, not a hypothetical: the Gates Foundation's position as a top funder of the WHO produces persistent, documented concern about a private actor setting a UN body's agenda. Here it would be sharper, because the funder is itself regulated.

So here is the concrete proposal: Anthropic puts up the first tranche — on the order of one billion dollars — to seed the agency, and structures that gift so it builds legitimacy rather than buys control. The constraint, after all, is not the money — at current market scale a billion dollars is a rounding error — it is legitimacy, the scarce good, which the financing must manufacture, not merely supply. Two devices do this:

Contribution as the entry ticket. Make a financial contribution a condition of recognized standing as a legitimate frontier developer. This inverts the capture problem: single-donor dominance is structurally impossible when broad contribution is a condition of the institution's existence, and it builds exactly the pressure you would want — anyone claiming a seat at the frontier must acknowledge that the frontier needs global verification, and pay accordingly. Participation becomes incentive-compatible (standing, reputation, a hand in writing the rules), so the institution does not depend on altruism to survive.
Matching escrow for the first move. A lone billion is the unilateral disadvantage you warn about. So the seed contribution should be conditional: held in escrow, released only once N further contributors join. The first mover is then not a sucker making a solo concession but the trigger of a coordinated one.

Why act on an imperfect design

This concept has real flaws, and I have not hidden them — great-power consent is not solved, only made tractable; enforcement is hard; the timeline is tight. But the honest comparison is not "this design vs. a perfect one," it is "this design vs. the status quo," and the status quo's flaws are worse — they are merely diffuse, and therefore invisible. A concrete construction with nameable weaknesses beats a diffuse non-construction with deniable ones. If your own thesis about timelines is even roughly right, the cost of waiting for an unflawed mechanism is the one cost we cannot afford.

This letter was drafted with Claude Opus 4.8 from my own reasoning. You — and the people you could convene — undoubtedly command better instruments and more information than I do to find and close the concept's open seams, enforcement against a determined state being the hardest of them.

posted at Juni 09, 12:15 · Artikel

ai gesellschaft politik

blog:cknoll