Agentic AI Patterns: The Maturity Model (Part 3 of 3)

A five-level maturity model for agentic AI — from manual to multi-agent mesh — with a self-assessment to find where your team sits, what the jump to the next level actually requires, and where regulated industries should draw the line.


Agentic AI Patterns — Part 3 of 3  ·  Part 1: The Decision Guide  ·  Part 2: Where They Break in Production

Most teams that say they’re “doing AI agents” are at Level 2. A few know it. Most don’t.

This post maps agentic AI capability onto a five-level maturity model. The levels describe increasing autonomy, trust, and infrastructure complexity — not increasing model capability. You can be at Level 4 with an open-weights model on commodity hardware. Level is determined by how you’ve built and governed the system, not which API you’re calling.

[Figure: five-level agentic maturity ladder from Manual to Multi-Agent Mesh, with a regulated-industry target marker]

Each level adds autonomy and demands more governance — most teams sit at Level 2; regulated industries should target Level 3.


The five levels

Level 1 — Manual

What it looks like: An engineer copies text into a chat interface, gets a response, and applies the result manually. The LLM is a smart clipboard.

Typical use: Drafting incident summaries, brainstorming runbook steps, explaining unfamiliar error messages.

Limitations: Zero repeatability. Zero auditability. Output quality depends entirely on the individual prompt. No leverage — one engineer, one task, one output.


Level 2 — Assisted

What it looks like: The LLM is embedded in a tool — an IDE plugin, a Slack bot, a CLI wrapper — and produces suggestions that a human reviews and applies. The human is still the agent; the LLM is the copilot.

Typical use: GitHub Copilot for code, an alert-explainer bot in your observability channel, an AI-assisted runbook generator that outputs a draft for human review.

Limitations: Human review is the bottleneck. Every output needs a human to review and apply it, so the system is only as fast as its reviewer.

Most production “AI” deployments as of mid-2026 are at this level. It’s a legitimate place to be, especially in regulated environments — but calling it “agentic” is a stretch.


Level 3 — Supervised Agent

What it looks like: The system runs an agentic loop autonomously — reads context, selects tools, executes actions — but with human-in-the-loop gates at defined checkpoints before irreversible actions. The agent does the work; a human approves the consequential steps.

Typical use: An incident-response agent that investigates, produces a root-cause analysis, drafts a remediation plan, and then pauses for human approval before executing any infrastructure change.

What the jump requires:

  • Working tool integrations (not just LLM calls — real tool use with structured inputs/outputs)
  • A gate mechanism: Slack webhook, approval queue, or UI checkpoint
  • Observability: what did the agent do, in what order, with what results?
  • Error handling: what happens when a tool call fails mid-run?
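
The four requirements above fit into one small loop. What follows is a minimal sketch, not a reference implementation: `Step`, `RunLog`, `run_supervised`, and the two lambda tools are hypothetical names invented for illustration, and the `approve` callback stands in for whatever gate you use (Slack webhook, approval queue, UI checkpoint).

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Step:
    tool: str           # hypothetical tool name
    args: dict          # structured input for the tool
    irreversible: bool  # True for actions that change infrastructure


@dataclass
class RunLog:
    """Observability: every step, in order, with its outcome."""
    events: list = field(default_factory=list)

    def record(self, kind: str, step: Step, detail: str = "") -> None:
        self.events.append((kind, step.tool, detail))


def run_supervised(steps: list, tools: dict,
                   approve: Callable[[Step], bool], log: RunLog) -> bool:
    """Run steps in order, pausing at the gate before irreversible ones."""
    for step in steps:
        if step.irreversible and not approve(step):
            log.record("rejected", step)
            return False  # stop before anything irreversible happens
        try:
            result = tools[step.tool](**step.args)
        except Exception as exc:  # a tool call failed mid-run
            log.record("failed", step, str(exc))
            return False
        log.record("ok", step, str(result))
    return True


# Assumed example tools and plan, for illustration only.
tools = {
    "read_logs": lambda service: f"{service}: 3 OOM kills",
    "restart_pod": lambda pod: f"{pod} restarted",
}
plan = [
    Step("read_logs", {"service": "checkout"}, irreversible=False),
    Step("restart_pod", {"pod": "checkout-1"}, irreversible=True),
]
log = RunLog()
completed = run_supervised(plan, tools, approve=lambda step: False, log=log)
```

With the approver rejecting everything, the investigation step still runs and is logged, but the restart never executes. The agent does the work; the human decides the consequential step.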

This is where most teams should be building right now. The autonomy is real; the safety net is present; the trust is being established.


Level 4 — Autonomous Agent

What it looks like: The agent operates end-to-end without a human gate on individual actions. It has defined boundaries (scope, tool access, blast radius) enforced by the platform, not by human review. Humans review outcomes, not individual steps.

Typical use: An auto-remediation agent that detects a known failure pattern, executes a bounded playbook (restart pod, scale up, clear cache), and posts a summary to Slack. No human approval per-action — the boundaries are enforced in the tool layer.

What the jump from Level 3 requires:

  • Mature tool access controls: Cedar, OPA, or an equivalent policy layer that enforces what the agent can and cannot do at the infrastructure level, not just in the prompt
  • Blast radius limits: the agent can restart pods but cannot delete namespaces; the tool surface enforces this, not the model
  • Outcome observability: if you can’t see what the agent did and measure whether it worked, you can’t trust it enough to remove the gate
  • Incident playbook: when the autonomous agent causes a problem, how do you detect it, stop it, and roll back?
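
To make "the tool surface enforces this, not the model" concrete, here is a rough sketch of a guarded tool layer. It is a plain-Python stand-in for a real policy engine such as Cedar or OPA and does not use either engine's API; `Policy`, `GuardedTools`, and `PolicyViolation` are hypothetical names, and the returned string is a placeholder for the real infrastructure call.

```python
from dataclasses import dataclass


class PolicyViolation(Exception):
    """Raised when a call falls outside the agent's allowed boundary."""


@dataclass(frozen=True)
class Policy:
    allowed_actions: frozenset     # what the agent may do at all
    allowed_namespaces: frozenset  # where it may do it
    max_replicas: int              # blast-radius cap for scaling


class GuardedTools:
    """Every tool call passes the policy check, whatever the model asked for."""

    def __init__(self, policy: Policy) -> None:
        self.policy = policy
        self.audit = []  # outcome observability: every decision is recorded

    def call(self, action: str, namespace: str, **params) -> str:
        decision = self._check(action, namespace, params)
        self.audit.append((action, namespace, decision))
        if decision != "allow":
            raise PolicyViolation(f"{action} in {namespace}: {decision}")
        return f"{action} executed in {namespace}"  # placeholder for the real call

    def _check(self, action: str, namespace: str, params: dict) -> str:
        if action not in self.policy.allowed_actions:
            return "action not permitted"
        if namespace not in self.policy.allowed_namespaces:
            return "namespace out of scope"
        if params.get("replicas", 0) > self.policy.max_replicas:
            return "replica limit exceeded"
        return "allow"


# Assumed policy mirroring the bounded playbook described above.
policy = Policy(
    allowed_actions=frozenset({"restart_pod", "scale_up", "clear_cache"}),
    allowed_namespaces=frozenset({"checkout"}),
    max_replicas=10,
)
guarded = GuardedTools(policy)
ok = guarded.call("restart_pod", "checkout")
try:
    guarded.call("delete_namespace", "checkout")
    blocked = False
except PolicyViolation:
    blocked = True
```

The point of the design is that a prompt injection or a confused model cannot widen the boundary: the restart succeeds, the namespace deletion is refused, and both decisions land in the audit trail.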

This level requires trust you earn, not trust you assume. Many teams moving up from Level 2 try to skip Level 3 and go straight here. That’s how production incidents happen.


Level 5 — Multi-Agent Mesh

What it looks like: Multiple specialized agents coordinate to accomplish goals that no single agent could handle alone. A supervisor routes tasks to sub-agents, sub-agents may themselves spawn sub-agents, and results flow back up the chain. The system operates continuously, not just on demand.

Typical use: An autonomous SRE mesh where a monitoring agent detects anomalies, a diagnosis agent identifies root cause, a remediation agent executes a fix, and a reporting agent updates the incident record — all without a human step in the loop for known failure classes.

What the jump from Level 4 requires:

  • Mature supervisor pattern (see Part 1)
  • Sub-agent interface contracts: structured typed inputs and outputs, not freeform text
  • Multi-agent observability: tracing that follows a task across agent boundaries, not just within a single agent
  • Governance for autonomous-to-autonomous delegation: what can the supervisor authorize a sub-agent to do that the sub-agent couldn’t do on its own?
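
Two of these requirements, typed interface contracts and tracing across agent boundaries, can be sketched in a few lines. This is an illustration under stated assumptions: `DiagnosisRequest`, `DiagnosisResult`, `diagnosis_agent`, and `supervisor` are hypothetical names, and the diagnosis logic is a placeholder for a real model and tool call.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class DiagnosisRequest:
    trace_id: str  # follows the task across agent boundaries
    service: str
    anomaly: str


@dataclass(frozen=True)
class DiagnosisResult:
    trace_id: str
    root_cause: str
    confidence: float

    def __post_init__(self) -> None:
        # The contract rejects malformed output instead of passing it along.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")


def diagnosis_agent(req: DiagnosisRequest) -> DiagnosisResult:
    # Placeholder logic; a real sub-agent would call a model and tools here.
    cause = "memory leak" if "OOM" in req.anomaly else "unknown"
    confidence = 0.8 if cause != "unknown" else 0.2
    return DiagnosisResult(req.trace_id, cause, confidence)


def supervisor(service: str, anomaly: str) -> DiagnosisResult:
    trace_id = uuid.uuid4().hex  # one id for the whole task
    result = diagnosis_agent(DiagnosisRequest(trace_id, service, anomaly))
    if result.trace_id != trace_id:
        raise RuntimeError("sub-agent dropped the trace id")
    return result


result = supervisor("checkout", "OOM kills on 3 pods")
```

Because the supervisor receives a typed result rather than freeform text, it can validate the hand-off and stitch the sub-agent's work into a single trace.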

This is the frontier. A handful of teams in cloud-native environments are operating here. Most are not.


Self-assessment: where is your team?

Answer these honestly:

  1. Can your AI system complete a multi-step task without a human copy-pasting between steps? If no → Level 1 or 2.
  2. Does your AI system call external tools (APIs, databases, infrastructure) directly? If no → Level 2.
  3. Do you have observability on individual tool calls — not just final output, but every step? If no → you’re at Level 2 even if you think you’re at Level 3.
  4. Do you have policy-enforced tool access controls (not just prompt-based restrictions)? If no → you’re at Level 3 at most.
  5. Do multiple specialized agents coordinate on a single task, with structured interfaces between them? If yes → Level 5 candidate.
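
The questions above read as a short decision cascade, where the first "no" settles the answer. A minimal sketch, assuming that reading: the field names are hypothetical, and the "Level 4 candidate" fallback is an inference from the model rather than something the questions state outright.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Answers:
    multi_step_without_copy_paste: bool      # question 1
    calls_tools_directly: bool               # question 2
    per_step_observability: bool             # question 3
    policy_enforced_tool_access: bool        # question 4
    multi_agent_structured_interfaces: bool  # question 5


def assess(a: Answers) -> str:
    """Return the level implied by the first question that gets a 'no'."""
    if not a.multi_step_without_copy_paste:
        return "Level 1 or 2"
    if not a.calls_tools_directly:
        return "Level 2"
    if not a.per_step_observability:
        return "Level 2"
    if not a.policy_enforced_tool_access:
        return "Level 3 at most"
    if a.multi_agent_structured_interfaces:
        return "Level 5 candidate"
    return "Level 4 candidate"  # assumption: not stated in the questions


# A team with tools and tracing but only prompt-based restrictions.
verdict = assess(Answers(True, True, True, False, False))
```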

Where regulated and safety-critical industries should draw the line

This question comes up constantly in aviation, finance, and healthcare: how autonomous is too autonomous?

The honest answer: most regulated environments should target Level 3, with a well-defined path to Level 4 for bounded, reversible actions.

Here’s why:

  • Auditability requirements in regulated industries typically demand a human decision point before consequential actions. Level 4 removes that point, so you need explicit regulatory analysis before removing human gates on actions that affect safety or compliance.
  • Blast radius asymmetry: in consumer software, a bad autonomous action is annoying. In aviation infrastructure or financial settlement, it can be catastrophic. The cost-benefit of Level 4 changes dramatically.
  • Trust is regulatory, not just operational. Even if your Level 4 system works perfectly, you need to demonstrate to regulators that it operates within defined bounds. Policy-enforced tool access (Cedar, OPA) is the mechanism — but the regulatory framework for certifying autonomous AI decision-making in most industries is still immature.

The right posture for regulated industries: build Level 3 deeply, with excellent observability and tight blast-radius controls. Run Level 4 in read-only or low-consequence paths first (log analysis, summarization, reporting) to build the evidence base. Earn Level 4 trust one bounded use case at a time.


Using the model

The decision guide in Part 1 tells you which pattern to use for a specific task. This maturity model tells you which patterns are appropriate given your current level of governance, observability, and trust.

If you’re at Level 2, don’t implement a Supervisor pattern. You don’t have the observability or error handling to operate it safely.

If you’re at Level 3 with strong observability and a working gate mechanism, Plan-and-Execute and Critic loops are appropriate next steps. Parallel fan-out and Supervisor are Level 4+ patterns in most organizations.
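
As a quick reference, the guidance above can be written as a lookup from pattern to minimum maturity level. This is a sketch of the rule of thumb only: the identifiers are hypothetical, the Level 3 entries assume strong observability and a working gate mechanism, and the Level 4 entries reflect "most organizations" rather than a hard rule.

```python
# Minimum maturity level at which each pattern is appropriate,
# following the guidance in this section.
MIN_LEVEL = {
    "plan_and_execute": 3,
    "critic_loop": 3,
    "parallel_fan_out": 4,
    "supervisor": 4,
}


def allowed_patterns(team_level: int) -> list:
    """Return the patterns a team at this level can reasonably operate."""
    return sorted(name for name, level in MIN_LEVEL.items() if team_level >= level)


level_2 = allowed_patterns(2)
level_3 = allowed_patterns(3)
```

At Level 2 the list is empty, which is the point: none of these patterns is safe to run without the observability and error handling that Level 3 introduces.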

The failure modes in Part 2 are largely failures of teams implementing Level 4+ patterns at Level 2 maturity. The maturity model is the checklist that prevents that.

