Jun 12, 2026 · 5 min read AI Reliability Architecture

The trust gap: bounded autonomy for AI SRE agents

SREs face 50+ alerts a day at 60% false positives while vendors promise autonomous resolution. The autonomy ladder: what an AI agent should never do alone.

Every vendor deck in 2026 has the same headline: this is the year of autonomous incident resolution. Rootly frames it that way, the Medium thought-leadership circuit frames it that way, and the cloud providers shipped agents to match. The pitch is seductive: the agent sees the incident, reasons about it, and fixes it while you sleep.

I build these agents. I also keep them on a short leash, on purpose. Because underneath the “autonomous” headline is a number the decks skip past: SREs face an average of 50+ alerts a day, and roughly 60% of them are false positives. An agent that acts unsupervised is an agent that acts on bad signal — at machine speed, at scale, during the exact moments your systems are already fragile.

That gap — between what the agents can do and what we can trust them to do — is the real story of 2026. The technology is arriving faster than the frameworks needed to deploy it safely. This post is the framework I use to close it: bounded autonomy.

“Autonomous” is the wrong axis

The mistake baked into the vendor framing is treating autonomy as a switch. On: the agent acts. Off: the agent suggests. Reality doesn’t have that switch, because the question was never “can the agent act” — it’s “which action, with what blast radius, and can you undo it.”

Restarting a single stateless replica and altering a database schema are both “actions.” Putting them on the same on/off toggle is how you turn a clever demo into a postmortem. I made the conceptual version of this argument in “When NOT to use AI in production SRE”; this is the operational follow-through.

The right axis is how much authority each specific action has earned. And authority is a function of two things:

Blast radius — how many systems, users, or dollars a wrong action touches.
Reversibility — how cheaply you can undo it when (not if) the agent is wrong.

Low blast radius, easily reversible → a candidate for bounded autonomy. High blast radius or irreversible → human-approved, forever, no matter how confident the model sounds. The model’s confidence is not evidence; with 60% false positives in the input, confident and wrong is the expected case, not the edge case.

The model’s output is a request, never an authorization. Bounded autonomy is just that principle with a permission system attached.
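
A minimal sketch of that permission system follows. It assumes a hypothetical 1-to-5 blast-radius score and a boolean reversibility flag, neither of which is prescribed above. The names `Action`, `authorize`, and `MAX_AUTO_BLAST_RADIUS` are illustrative, not from any real library. The model's confidence is passed in and deliberately ignored.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    AUTO = "bounded autonomy candidate"
    HUMAN = "human approval required"


@dataclass(frozen=True)
class Action:
    name: str
    blast_radius: int  # hypothetical scale: 1 (one replica) .. 5 (fleet, data, or money)
    reversible: bool   # can a wrong call be undone cheaply?


# Hypothetical threshold; choose your own from your risk review.
MAX_AUTO_BLAST_RADIUS = 1


def authorize(action: Action, model_confidence: float) -> Decision:
    """Treat the model's proposal as a request and decide who may authorize it."""
    del model_confidence  # confidence is not evidence, so it never enters the decision
    if action.reversible and action.blast_radius <= MAX_AUTO_BLAST_RADIUS:
        return Decision.AUTO
    return Decision.HUMAN


restart = Action("restart_stateless_replica", blast_radius=1, reversible=True)
schema = Action("alter_db_schema", blast_radius=5, reversible=False)

print(authorize(restart, model_confidence=0.55).value)  # bounded autonomy candidate
print(authorize(schema, model_confidence=0.99).value)   # human approval required
```

The schema change is routed to a human even at 0.99 confidence, because the gate only looks at properties of the action, never at how sure the model sounds.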

The autonomy ladder

Here’s the model I actually deploy against. Six rungs, and you climb them by earning trust at each level — never by flipping a config flag.

1. Observe & summarize. The agent reads telemetry and tells you what’s happening in plain language. Zero write access. Pure upside, zero risk. Every team should be here today.
2. Investigate & hypothesize. The agent correlates signals, forms a causal hypothesis, and shows its reasoning. This is the genuine step-change over 2019-era AIOps: hypothesis generation against live telemetry instead of signature matching. Still read-only. Still safe.
3. Propose remediation. The agent recommends a specific fix and the command to run it. A human executes. This is where most teams find the steepest part of the value curve, and it’s still read-only on the agent’s side.
4. Execute bounded, reversible actions, human in the loop. The agent may run a small, explicit, reversible action set (restart a stateless pod, scale a worker, drain a node) but each action waits for a one-click human approval. This is the first rung with write access, and it’s deliberately small.
5. Execute bounded, reversible actions, human on the loop. Same action set, but the agent acts and the human can veto, rather than pre-approving. You earn this rung only after rung 4 has a measured track record on your systems.
6. Full unsupervised action. For the narrow, well-understood, high-frequency, low-blast-radius cases where you’ve proven the agent across hundreds of incidents. Most teams should treat this as a destination for a handful of action types, not a default, and never for anything irreversible.

Most of the production value lives at rungs 2 through 4. The vendors are selling rung 6 because it demos well. The reliability is at 2–4 because that’s where the agent is useful and the human still owns the irreversible decisions.
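
One way to make the ladder enforceable rather than aspirational is to encode the rungs as an ordered type and check every write against it. The sketch below is an assumption about how that might look. `Rung`, `may_execute`, and the action names are hypothetical, and a real rung 5 would also need a veto window, which is omitted here.

```python
from enum import IntEnum


class Rung(IntEnum):
    OBSERVE = 1
    INVESTIGATE = 2
    PROPOSE = 3
    EXECUTE_IN_LOOP = 4   # human pre-approves each action
    EXECUTE_ON_LOOP = 5   # agent acts, human can veto
    UNSUPERVISED = 6


def can_write(rung: Rung) -> bool:
    """Rungs 1 through 3 are read-only on the agent's side."""
    return rung >= Rung.EXECUTE_IN_LOOP


def may_execute(rung: Rung, action: str, allowed: frozenset, approved: bool) -> bool:
    """True only if the rung and the written action set let the agent run `action` now."""
    if not can_write(rung) or action not in allowed:
        return False
    if rung == Rung.EXECUTE_IN_LOOP:
        return approved  # rung 4 waits for the one-click approval
    return True          # rungs 5 and 6 act first (veto handling not modeled here)


ALLOWED = frozenset({"restart_stateless_pod", "scale_worker", "drain_node"})

print(may_execute(Rung.PROPOSE, "restart_stateless_pod", ALLOWED, approved=True))           # False
print(may_execute(Rung.EXECUTE_IN_LOOP, "restart_stateless_pod", ALLOWED, approved=False))  # False
print(may_execute(Rung.EXECUTE_IN_LOOP, "restart_stateless_pod", ALLOWED, approved=True))   # True
print(may_execute(Rung.EXECUTE_ON_LOOP, "alter_db_schema", ALLOWED, approved=True))         # False
```

The last call shows that a higher rung never widens the action set: an action that is not on the list is refused at every rung.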

You earn rungs with measurement, not faith

The reason teams skip straight to “autonomous” and get burned is that they treat the ladder as a config choice instead of an evidence problem. Each rung up is a claim: the agent is reliable enough at this blast radius to lose the human check. That claim needs data — the evaluation layer of the stack, the one everybody skips.

Concretely, before an action type graduates from rung 4 to rung 5, I want to see: across the last N incidents where the agent proposed this action, how often was it right, how often did a human override it, and, where the agent or the override turned out to be wrong, what did the mistake cost? If you can’t answer that, you haven’t earned the rung; you’re just hoping. And hope, deployed to a system facing 50 alerts a day at 60% false positives, is a strategy for generating incidents, not resolving them.
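
Those three questions can be computed from a per-incident log. The following is a sketch under stated assumptions. The `Outcome` record, the cost field, and the thresholds `min_n=200` and `min_precision=0.99` are all hypothetical placeholders, not recommended values. Set them from your own risk tolerance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    agent_was_right: bool
    human_overrode: bool
    cost_if_wrong_usd: float = 0.0  # estimated cost had a wrong action run unreviewed


def promotion_report(outcomes, min_n=200, min_precision=0.99):
    """Summarize one action type's track record at rung 4 (thresholds are assumptions)."""
    n = len(outcomes)
    right = sum(o.agent_was_right for o in outcomes)
    overrides = sum(o.human_overrode for o in outcomes)
    exposure = sum(o.cost_if_wrong_usd for o in outcomes if not o.agent_was_right)
    precision = right / n if n else 0.0
    return {
        "n": n,
        "precision": precision,
        "override_rate": overrides / n if n else 0.0,
        "wrong_action_exposure_usd": exposure,
        # Eligible only with enough incidents AND a high enough hit rate.
        "eligible_for_rung_5": n >= min_n and precision >= min_precision,
    }


history = [Outcome(True, False)] * 199 + [Outcome(False, True, 5000.0)]
print(promotion_report(history))
```

An empty or short history reports as not eligible, which is the point: with no data, the answer to "has this action earned the rung" is no.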

This is also the antidote to agent sprawl. The reason forty ungoverned agents become your next outage is that none of them sit on an explicit ladder — they each accumulated whatever permissions someone granted in a hurry. A team that places every agent on a named rung, with the action set written down and the evaluation data attached, has a governance surface. A team that has “the autonomous agent” has a liability with good marketing.

What to do Monday

Place every agent you run on a rung. If you can’t name the rung, the agent is at rung 6 by accident — fix that today.
Write the action set down. For any agent above rung 3, the exact set of actions it may take should be a reviewable list, scored by blast radius and reversibility. Anything irreversible comes off the list.
Attach evaluation data to every rung. No promotion without a measured track record at the rung below.
Default new agents to rung 2. Observe and hypothesize first. Earn the write access. The New Relic data — 2x correlation, 27% less noise — says rungs 2 and 3 already pay for themselves before you grant a single write.
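
The checklist above can be run as a mechanical audit over an agent inventory. This is a sketch only. The `AgentRecord` shape and the finding messages are hypothetical, and it assumes rung 2 is the starting point that needs no prior track record.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AgentRecord:
    name: str
    rung: Optional[int] = None                    # None means nobody named a rung
    actions: dict = field(default_factory=dict)   # action name -> is it reversible?
    has_eval_data: bool = False


def audit(agent: AgentRecord) -> list:
    """Return findings for one agent; an empty list means it passes the checklist."""
    findings = []
    if agent.rung is None:
        findings.append("no named rung (effectively rung 6 by accident)")
    rung = agent.rung if agent.rung is not None else 6
    if rung > 3 and not agent.actions:
        findings.append("write access without a written action set")
    for action, reversible in agent.actions.items():
        if not reversible:
            findings.append(f"irreversible action on the list: {action}")
    # Assumption: rung 2 is the default entry point, so only higher rungs need evidence.
    if rung > 2 and not agent.has_eval_data:
        findings.append("no evaluation data for the rung below")
    return findings


fleet = [
    AgentRecord("triage-bot", rung=2),
    AgentRecord("mystery-agent"),
    AgentRecord("fixer", rung=4, actions={"restart_pod": True, "drop_table": False},
                has_eval_data=True),
]
for agent in fleet:
    print(agent.name, audit(agent))
```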

The year of autonomous incident resolution is real in the sense that the agents are genuinely good at rungs 2 through 4. It’s marketing in the sense that rung 6 is being sold as the default. Bounded autonomy is how you take the real value without buying the marketing — and how you make sure the agent that’s supposed to resolve your 3am incident isn’t the thing that caused it.

Frequently asked questions

What is bounded autonomy for an AI SRE agent?

Bounded autonomy is the design principle that an AI SRE agent operates from a small, explicit, reversible set of actions it may take on its own, while everything outside that set is a proposal a human approves. The agent's authority is a function of the blast radius and reversibility of each action — not a single on/off 'autonomous' switch. It's the difference between letting an agent restart a stateless pod unsupervised and letting it modify a database, which it should never do without a human.

Why isn't full autonomy the goal for AI SRE agents?

Because the failure cost is asymmetric. The data shows SREs face 50+ alerts a day with roughly 60% false positives, so an agent acting unsupervised will act on bad signals at scale. A wrong read-only investigation costs nothing; a wrong write action during an incident can turn a degradation into an outage. Full autonomy optimizes for the rare case where the agent is right and ignores the common case where the signal is wrong.

What is the autonomy ladder?

The autonomy ladder is a model for granting an AI SRE agent authority in graduated rungs: (1) observe and summarize, (2) investigate and hypothesize, (3) propose remediation, (4) execute reversible bounded actions with a human in the loop, (5) execute reversible bounded actions with a human on the loop, and (6) full unsupervised action. Most production value lives at rungs 2 through 4. You earn each rung with measured reliability at the one below it, never by default.

How do I decide which actions an agent can take unsupervised?

Score each candidate action on two axes: blast radius (how many systems or users a wrong action affects) and reversibility (how cheaply you can undo it). Low-blast-radius, easily-reversible actions like restarting a stateless replica or scaling a queue worker are candidates for bounded autonomy. High-blast-radius or irreversible actions — schema changes, data deletion, anything touching money or safety-critical paths — stay human-approved regardless of how confident the model is.