The MCP gateway pattern: five jobs your agent runtime can't skip

Letting agents call MCP servers directly is the same mistake as letting microservices call each other without an API gateway. Here are the five jobs an MCP gateway has to do, and a reproducible pattern for each — scope-token exchange, a catalog/broker split, schema firewall, quarantine queue, and provenance ledger.


The first time someone in your org wires Claude (or any agent) directly to a Model Context Protocol server in production, three things happen in the next ninety days.

One: a developer adds a second MCP server. Then a third. Each one carries its own credentials, its own scopes, its own idea of what an “id” looks like.

Two: someone discovers the agent called a write tool nobody expected — a Jira ticket got closed, a row got deleted, a Slack DM went out — and there’s no clean answer to which conversation, which user, which prompt.

Three: security asks for an inventory of what tools the agent can reach, and the answer is “depends which MCP servers were running when the conversation started.”

I’ve watched the same arc happen with microservices in 2014, with Kubernetes operators in 2019, and now with MCP in 2026. The fix is the same shape every time: a gateway. Not a proxy. A gateway — a thing that owns identity, policy, schema, and audit between the caller and the downstream surface.

This post is the architecture I keep arriving at, written down so I stop redrawing it on whiteboards. Five jobs an MCP gateway has to do, and a reproducible pattern for each.


Why “just use the MCP server directly” stops working

The MCP spec is a transport. It defines how a client discovers tools, calls them, and reads resources. It is intentionally unopinionated about who is calling, what they’re allowed to call, which arguments are acceptable, and what gets logged.

That is correct for a protocol. It is wrong for a runtime.

Direct-to-MCP works for one developer, one laptop, one server. It breaks the moment you have:

  • More than one MCP server — credentials sprawl, tool-name collisions, no single allowlist
  • More than one agent surface — Claude Desktop, Claude Code, an internal chat app, a backend worker — each holds its own copy of the config
  • Any production write tool — destructive verbs reach a downstream system with no second pair of eyes
  • Any compliance scope — SOC 2, HIPAA, FedRAMP, internal data-classification — and the auditor asks “show me every tool call against PHI in March”

An MCP gateway is the thing that lets you answer those questions without rewriting every agent.


The five jobs

Every gateway I’ve designed or reviewed ends up doing these five things. If you skip one, you’ll add it back inside a quarter.

1. Identity & scope-token exchange

The agent should never hold long-lived downstream credentials. The gateway should.

The pattern: the agent presents a user-bound identity token (OIDC ID token, signed session JWT, whatever your IdP emits). The gateway validates it, looks up the user’s authorizations, and exchanges that token for a short-lived, per-tool, per-call credential that it injects into the downstream MCP server’s request.

Agent ──[user JWT]──> Gateway ──[scoped token, TTL=60s]──> MCP server ──> Backend

Two properties matter:

  • The downstream credential is never visible to the agent. The agent cannot exfiltrate what it cannot see.
  • The credential is scoped to the call, not to the session. A read:incidents call cannot be replayed as write:incidents.

This is RFC 8693 token exchange applied to tool calls. The plumbing is boring. The discipline is not.
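
A minimal sketch of the exchange step in Python, assuming PyJWT for validation and an RFC 8693-compliant token endpoint. TOKEN_ENDPOINT, the key loading, and the audience value are illustrative stand-ins for whatever your IdP actually provides:

# gateway/token_exchange.py (illustrative sketch, not a drop-in client)
import jwt        # PyJWT
import requests

TOKEN_ENDPOINT = "https://idp.example.com/oauth2/token"   # your IdP's exchange endpoint
IDP_PUBLIC_KEY = open("idp_public_key.pem").read()        # published by your IdP

def exchange_for_scoped_token(user_jwt: str, audience: str, scopes: list[str]):
    # Validate the user-bound token before touching anything downstream.
    claims = jwt.decode(user_jwt, IDP_PUBLIC_KEY, algorithms=["RS256"],
                        audience="mcp-gateway")
    # RFC 8693 token exchange: trade the user token for a short-lived,
    # per-call credential scoped to exactly what this one tool needs.
    resp = requests.post(TOKEN_ENDPOINT, data={
        "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
        "subject_token": user_jwt,
        "subject_token_type": "urn:ietf:params:oauth:token-type:jwt",
        "audience": audience,          # e.g. "pagerduty.api"
        "scope": " ".join(scopes),     # e.g. "incidents:read"
    }, timeout=5)
    resp.raise_for_status()
    # The scoped token's TTL is enforced by the IdP; aim for ~60s.
    return claims["sub"], resp.json()["access_token"]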

2. Tool allowlist (catalog/broker split)

Don’t let the agent enumerate every tool on every connected MCP server. Curate.

The pattern I keep landing on is a catalog/broker split:

  • The catalog is the set of tools the gateway advertises to the agent. It’s a flat, governed list — jira.search_issues, incidents.read_summary, runbook.fetch — namespaced, versioned, with descriptions the agent can actually use.
  • The broker is the layer that maps each catalog entry to a downstream MCP server, an authentication strategy, and an argument transform.
# gateway/catalog.yaml
- name: incidents.read_summary
  description: "Get the structured summary of an incident by ID"
  upstream:
    server: pagerduty-mcp
    tool: get_incident
    args_map:
      incident_id: id
  auth:
    strategy: token_exchange
    audience: pagerduty.api
    scopes: [incidents:read]
  classification: internal

The agent sees incidents.read_summary. It does not see pagerduty-mcp.get_incident, does not see the credentials, does not see the seventeen other tools that PagerDuty’s MCP server happens to expose.

Adding a new tool is a pull request against catalog.yaml. That’s the governance hook security and platform teams have been asking for.
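
On the broker side, resolving a catalog entry is little more than a lookup plus an argument rename. A minimal sketch, assuming the YAML above is loaded with PyYAML; the direction of args_map (agent-facing name to upstream name) is a convention you pick once, not something the spec dictates:

# gateway/broker.py (sketch; error handling kept minimal)
import yaml

CATALOG = yaml.safe_load(open("gateway/catalog.yaml"))

def resolve(catalog: list[dict], tool_name: str, agent_args: dict):
    entry = next((e for e in catalog if e["name"] == tool_name), None)
    if entry is None:
        raise LookupError(f"tool {tool_name!r} is not in the catalog")  # hard deny
    upstream = entry["upstream"]
    args_map = upstream.get("args_map", {})
    # Rename agent-facing argument names to the upstream tool's names.
    mapped = {args_map.get(k, k): v for k, v in agent_args.items()}
    return upstream["server"], upstream["tool"], mapped, entry["auth"]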

3. Schema firewall

LLMs hallucinate arguments. They also occasionally exfiltrate data through return values. The gateway is the only honest place to enforce the shape of what crosses the boundary.

Two directions, both mandatory:

  • Inbound (agent → tool): strict JSON Schema validation. Reject calls with extra fields. Coerce types only where the cast is lossless; refuse implicit casts that silently truncate. Cap string lengths. If an id field has a known regex, enforce it.
  • Outbound (tool → agent): a redaction pass keyed by data classification. PII fields stripped or hashed. Secrets blocked entirely. Internal-only fields filtered when the calling agent is on an external surface.

This is the part most teams skip in v1 and regret in v2. A return-value firewall is what stops “we asked the agent to summarize incidents” from accidentally including a customer email in the summary that gets pasted into a Slack channel with external guests.
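
A sketch of both directions, using the jsonschema package for inbound validation. The schema, the id regex, and the redaction list are illustrative; a real gateway would key all three off the catalog entry's classification:

# gateway/firewall.py (illustrative sketch of both directions)
from jsonschema import validate   # raises ValidationError on any mismatch

INCIDENT_ARGS = {
    "type": "object",
    "properties": {
        "incident_id": {"type": "string", "pattern": "^[A-Z0-9]{1,14}$",
                        "maxLength": 14},
    },
    "required": ["incident_id"],
    "additionalProperties": False,   # hallucinated extra fields are rejected, not dropped
}

PII_FIELDS = {"email", "phone", "reporter_name"}   # illustrative classification table

def inbound(args: dict) -> dict:
    validate(args, INCIDENT_ARGS)    # strict: nothing gets coerced silently here
    return args

def outbound(result: dict, surface: str) -> dict:
    # Redaction pass keyed by data classification; internal surfaces see more.
    if surface == "internal":
        return result
    return {k: ("[redacted]" if k in PII_FIELDS else v) for k, v in result.items()}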

4. Quarantine queue for destructive calls

Read tools can be auto-approved by policy. Write tools — anything with create, update, delete, close, send, restart, scale, drain in its verb — should default to requiring a second signal.

The pattern: a quarantine queue. When the agent calls a write-classified tool, the gateway:

  1. Validates the call (schema firewall passes)
  2. Computes a call hash = sha256(agent_id, tool, canonical(args))
  3. Pushes a pending record to a queue
  4. Returns to the agent: {"status": "pending_approval", "approval_id": "..."}
  5. Waits for an out-of-band approval — Slack button, ServiceNow ticket, web UI — bound to the same human user whose token initiated the call

The agent gets a structured “pending” response it can reason about. The human gets a single-screen view of what’s about to happen. The audit log gets both halves.

The trap to avoid: do not let the agent self-approve by re-calling with force: true. That’s a backdoor with a friendly name. If you need an auto-approve path, scope it to specific tools in catalog.yaml, not to a runtime flag.
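
A sketch of the quarantine path, with an in-memory dict standing in for what should be a durable queue; the hashing mirrors the recipe above:

# gateway/quarantine.py (sketch; PENDING stands in for a durable queue)
import hashlib
import json
import uuid

PENDING: dict[str, dict] = {}

def canonical(args: dict) -> str:
    # Stable serialization so the same call always hashes the same way.
    return json.dumps(args, sort_keys=True, separators=(",", ":"))

def quarantine(agent_id: str, user_id: str, tool: str, args: dict) -> dict:
    call_hash = hashlib.sha256(
        f"{agent_id}|{tool}|{canonical(args)}".encode()).hexdigest()
    approval_id = str(uuid.uuid4())
    PENDING[approval_id] = {
        "call_hash": call_hash,
        "user_id": user_id,     # approval must be bound to this same human
        "tool": tool,
        "args": args,
        "status": "pending_approval",
    }
    # The out-of-band approval (Slack button, ticket, web UI) should reference
    # call_hash, so an approved call can't be quietly swapped for another.
    return {"status": "pending_approval", "approval_id": approval_id}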

5. Provenance ledger

Every tool call gets one append-only log entry. Schema:

ts, agent_id, surface, user_id, conversation_id, turn_id,
tool, args_hash, args_classification,
result_hash, result_classification,
auth_token_id, latency_ms, decision (allow|deny|quarantine|approve|reject),
policy_versions[]

Two properties earn their place:

  • Hashes, not payloads. Storing every argument and result blob is a data-classification nightmare. Hashes give you replayability and dedup; the actual payloads live in a separately-governed object store with shorter retention and stricter access.
  • Policy versions. When an auditor asks “why did this call get approved in March but blocked in April,” the answer is the policy version that was live at the time. Embed it.
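
A sketch of the append path, with JSON Lines on disk standing in for a real append-only store; the field names follow the schema above:

# gateway/ledger.py (sketch; a real deployment wants an append-only store)
import hashlib
import json
import time

def _hash(payload) -> str:
    return hashlib.sha256(
        json.dumps(payload, sort_keys=True).encode()).hexdigest()

def append_entry(path, *, agent_id, surface, user_id, conversation_id, turn_id,
                 tool, args, result, args_classification, result_classification,
                 auth_token_id, latency_ms, decision, policy_versions):
    entry = {
        "ts": time.time(),
        "agent_id": agent_id, "surface": surface, "user_id": user_id,
        "conversation_id": conversation_id, "turn_id": turn_id,
        "tool": tool,
        "args_hash": _hash(args),        # hashes, not payloads (see above)
        "args_classification": args_classification,
        "result_hash": _hash(result),
        "result_classification": result_classification,
        "auth_token_id": auth_token_id,
        "latency_ms": latency_ms,
        "decision": decision,            # allow|deny|quarantine|approve|reject
        "policy_versions": policy_versions,
    }
    with open(path, "a") as f:
        f.write(json.dumps(entry) + "\n")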

This is the artifact that turns “we use AI agents in production” into a sentence you can say to a regulator without flinching.


The reference architecture, end to end

┌─────────────────┐      ┌──────────────────────────────────────┐
│   Agent runtime │      │           MCP Gateway                │
│ (Claude / app)  │      │                                      │
│                 │ ───► │  1. Validate identity (OIDC/JWT)     │
│  user JWT +     │      │  2. Resolve catalog entry            │
│  tool call      │      │  3. Schema firewall (inbound)        │
│                 │      │  4. Token exchange ──► scoped cred   │
│                 │      │  5. Quarantine if write-classified   │
│                 │      │  6. Forward to upstream MCP server   │
│                 │      │  7. Schema firewall (outbound)       │
│                 │ ◄─── │  8. Append provenance ledger entry   │
└─────────────────┘      └──────────────────────────────────────┘


                          ┌──────────────────────────────────┐
                          │  Upstream MCP servers            │
                          │  (PagerDuty, Jira, internal,...) │
                          └──────────────────────────────────┘

You can implement this in 600 lines of Python on top of FastAPI plus an OPA sidecar for policy. The hard part is not the code. The hard part is treating catalog.yaml, the policy bundle, and the provenance ledger as first-class platform artifacts with their own review process — the same way you treat Terraform modules and Kubernetes admission policies.
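
For concreteness, here is how the eight steps might compose into a single FastAPI handler. This is a sketch that assumes the helper functions from the earlier sections are collected into one hypothetical gateway.helpers module, with the actual MCP forwarding stubbed out:

# gateway/app.py (sketch wiring the eight steps together)
import time
import yaml
from fastapi import FastAPI, Header

from gateway.helpers import (resolve, inbound, outbound,      # the sketches above,
                             exchange_for_scoped_token,       # collected into
                             quarantine, append_entry)        # one module

app = FastAPI()
CATALOG = yaml.safe_load(open("gateway/catalog.yaml"))
WRITE_TOOLS = {"incidents.close"}        # illustrative; derive this from catalog.yaml

def forward_to_upstream(server, tool, args, token):
    raise NotImplementedError            # MCP client call (stdio/HTTP) goes here

@app.post("/tools/{tool_name}")
def call_tool(tool_name: str, args: dict,
              authorization: str = Header(...),
              agent_id: str = Header("unknown", alias="X-Agent-Id")):
    started = time.monotonic()
    user_jwt = authorization.removeprefix("Bearer ")
    # 1-2. Validate identity, resolve the catalog entry.
    server, upstream_tool, mapped, auth = resolve(CATALOG, tool_name, args)
    # 3. Inbound schema firewall on the agent-facing args (per-tool lookup elided).
    inbound(args)
    # 4. Token exchange: user JWT in, scoped short-lived credential out.
    user_id, scoped = exchange_for_scoped_token(
        user_jwt, auth["audience"], auth["scopes"])
    # 5. Write-classified tools go to the quarantine queue, not upstream.
    if tool_name in WRITE_TOOLS:
        return quarantine(agent_id, user_id, tool_name, mapped)
    # 6-7. Forward, then run the outbound firewall over the result.
    result = outbound(forward_to_upstream(server, upstream_tool, mapped, scoped),
                      surface="internal")
    # 8. One append-only ledger entry per call, whatever the outcome.
    # (conversation_id/turn_id would come from request headers in a real gateway.)
    append_entry("ledger.jsonl", agent_id=agent_id, surface="internal",
                 user_id=user_id, conversation_id="-", turn_id="-",
                 tool=tool_name, args=mapped, result=result,
                 args_classification="internal", result_classification="internal",
                 auth_token_id="-", latency_ms=int((time.monotonic() - started) * 1000),
                 decision="allow", policy_versions=["v1"])
    return result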


What I’d build first

If I were standing this up at a new org tomorrow, in order:

  1. Identity + scope-token exchange. Without this, nothing else matters.
  2. Catalog with one tool. Wire one read-only tool through the gateway end-to-end. Resist the urge to onboard ten.
  3. Provenance ledger. Even before you have policies, capture the calls. You will want March’s data when April asks.
  4. Schema firewall, inbound only. Outbound redaction can come in v2 once you have a classification taxonomy.
  5. Quarantine queue. Right before — not after — you onboard the first write tool.

That order keeps the gateway useful at every step and avoids the dead-end where you build a beautiful policy engine and have no traffic to apply it to.


What this is not

This is not a vendor pitch. There’s a small but growing set of MCP gateway projects in the wild — some open source, some commercial — and any of them will save you time over rolling your own. Pick one that lets you express the five jobs above as configuration, not code. If you can’t find one that fits, the patterns above are enough to write your own in a sprint.

This is also not a substitute for sandboxing the agent runtime itself, for output filtering at the LLM boundary, or for the boring work of writing decent tool descriptions. Those are different problems. The gateway is just the one that’s getting skipped right now in the rush to ship agentic features, and the one that’s hardest to retrofit later.

The microservices generation eventually agreed: you don’t put services into production without a gateway. The agent generation is about to learn the same lesson. The cheap version is to learn it now.


If you’re designing or operating an MCP gateway and want to compare notes, my contact is on cloudandsre.com. I’m collecting field reports for a follow-up post on what breaks at scale.