No anonymous inference endpoints — the MCP security principle you're probably violating

In 2026 the NSA and NIST both put MCP and AI agents on notice: the protocol that lets your agents act is also a centralized funnel for prompt injection and privilege abuse. Why 'no anonymous inference endpoints' is the principle most teams break by default, and how token exchange (RFC 8693) plus policy-as-code closes it.


Two things happened in 2026 that should change how you wire agents into production. The NSA published a cybersecurity information sheet on Model Context Protocol, warning that MCP standardizes how agents talk to the world and in doing so creates a predictable, centralized funnel for prompt injection, schema manipulation, and trust-boundary abuse. And NIST stood up an AI Agent Standards Initiative, which is the polite federal way of announcing that the Wild West phase for autonomous systems is ending.

When the NSA and NIST move in the same quarter, the underlying issue is usually mundane and widespread. This one is: most teams are running anonymous inference endpoints, and they don’t think of it that way.


The principle, stated plainly

Every call that reaches a model or a tool must carry a verifiable identity, be authorized for that specific action, and leave an audit record. No exceptions, no anonymous paths.

Read that and most people nod — of course we authenticate. Then you look at how the agent actually reaches the database, the cluster, the cloud API, and you find one of these:

  • A single service account whose key is baked into the agent, used for every action regardless of which user or workflow triggered it.
  • An MCP server that any agent on the network can reach, trusting whatever the model decided to send because the model is “internal.”
  • A tool endpoint with no record of who invoked it, which agent relayed it, or what context produced the request.

Each of those is an anonymous inference endpoint. The model — a non-deterministic component that an attacker can influence through prompt injection — is being trusted as if it were authenticated code. The NSA’s framing is exact: compromise the MCP node or poison the model’s input, and you have a direct line into the agent’s reach, with a shared credential and no attribution. That’s not a hardening gap. That’s the front door.


Why the model is the part you must not trust

Classic service security has a clean trust boundary: authenticated code on one side, untrusted input on the other, validation at the seam. AI breaks the diagram because the untrusted input flows into the component making the decisions. A prompt-injected document, a poisoned retrieval result, a malicious tool description — any of these can steer the model into emitting a tool call it should never make. And if that tool call rides a shared service-account credential to your infrastructure, the model just became a confused deputy with admin rights.

So the architectural rule follows from the threat model, not from compliance: the model’s output is a request, never an authorization. The agent doesn’t get to act because the model said so. It gets to act because an identity it carries is authorized, at the gateway, for that exact action, right now. This is the same instinct behind the broker I described in the MCP gateway pattern — but this post is about the one job in that pipeline that anonymous endpoints skip: proving who is asking before deciding whether they may.
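To make the request-versus-authorization split concrete, here is a minimal Python sketch. The names ToolCall, Principal, and authorize are illustrative assumptions, not part of any MCP SDK, and token verification is assumed to have happened upstream of this function.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToolCall:
    """What the model proposed. Untrusted: it may be the product of injected text."""
    action: str
    resource: str


@dataclass(frozen=True)
class Principal:
    """Who is asking, as established by a verified token (verification not shown)."""
    subject: str
    grants: frozenset  # (action, resource) pairs this identity may perform


def authorize(call: ToolCall, principal: Optional[Principal]) -> bool:
    # No identity, no action: an anonymous path is denied outright.
    if principal is None:
        return False
    # The decision depends only on what the identity is granted,
    # never on what the model's output asks for.
    return (call.action, call.resource) in principal.grants


# Hypothetical principal and resources, for illustration only.
alice = Principal("user:alice", frozenset({("logs:read", "svc:checkout")}))
print(authorize(ToolCall("logs:read", "svc:checkout"), alice))  # True
print(authorize(ToolCall("db:drop", "svc:checkout"), alice))    # False
print(authorize(ToolCall("logs:read", "svc:checkout"), None))   # False
```

The model can emit any ToolCall it likes; the only thing that changes the outcome is the identity attached to the call.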


Closing it: identity that survives the hop

The mechanism that kills anonymous endpoints is token exchange (RFC 8693), the OAuth 2.0 extension built for exactly this. The user (or upstream service) authenticates and gets a token. When the agent needs to act on their behalf, it doesn’t reach for a god-mode service account; it exchanges the user’s token for a narrowly scoped, short-lived token that says: this principal, this action, this resource, expiring shortly. The identity survives the hop from user to agent to tool, instead of being laundered into a shared credential at the first step.

That gives you three things anonymous endpoints can’t:

  • Attribution. Every tool call traces to the human or service that originated it — not to a blob named agent-prod.
  • Least privilege per action. The token is scoped to the operation at hand, so a compromised agent can’t do more than the current task authorizes.
  • A short fuse. Short-lived tokens mean a leaked credential is a minutes-long problem, not a forever one — which is the same posture the 2026 guidance pushes toward with cryptographic agility and rotation.
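As a rough sketch of what the exchange looks like on the wire, the Python below builds the form-encoded body an agent would POST to the authorization server's token endpoint. The parameter names and URNs come from RFC 8693; the function name, the token placeholders, and the audience URL are assumptions for illustration. Note that the request carries no lifetime parameter: how short-lived the issued token is gets decided by the authorization server and reported back in expires_in.

```python
from urllib.parse import urlencode, parse_qs

# Identifiers defined by RFC 8693.
TOKEN_EXCHANGE_GRANT = "urn:ietf:params:oauth:grant-type:token-exchange"
ACCESS_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:access_token"


def build_exchange_request(subject_token: str, actor_token: str,
                           audience: str, scope: str) -> str:
    """Return the application/x-www-form-urlencoded body for a token exchange."""
    return urlencode({
        "grant_type": TOKEN_EXCHANGE_GRANT,
        # The user (or upstream service) the agent is acting for.
        "subject_token": subject_token,
        "subject_token_type": ACCESS_TOKEN_TYPE,
        # The agent itself, so the issued token records who relayed the call.
        "actor_token": actor_token,
        "actor_token_type": ACCESS_TOKEN_TYPE,
        # Narrow the result to one target service and one operation.
        "audience": audience,
        "scope": scope,
        "requested_token_type": ACCESS_TOKEN_TYPE,
    })


# Placeholder tokens and a hypothetical audience, for illustration only.
body = build_exchange_request(
    subject_token="<user-access-token>",
    actor_token="<agent-access-token>",
    audience="https://logs.example.internal",
    scope="logs:read",
)
fields = parse_qs(body)
print(fields["grant_type"][0])
print(fields["scope"][0])
```

Sending both a subject token and an actor token is what keeps the originating user and the relaying agent distinguishable in the issued token, rather than collapsing them into one credential.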

Token exchange answers who. The other half is whether — and that’s policy.


Authorization as code, not vibes

Once a call carries a real identity, something has to decide if that identity may take that action. Burying those rules in if statements across each tool is how you end up with inconsistent, unauditable, drift-prone authorization. The pattern that holds up is policy-as-code with a dedicated engine — I use Cedar — where authorization lives in declarative, reviewable, version-controlled policies the gateway evaluates on every call:

// An agent may read logs only for its own service,
// only when acting on behalf of an on-call engineer.
permit(
  principal in Role::"oncall-agent",
  action == Action::"logs:read",
  resource
)
when {
  resource.service == principal.assignedService &&
  context.onBehalfOf.role == "oncall-engineer"
};

The win isn’t the syntax — it’s that authorization becomes a thing you can review in a PR, test, audit, and reason about, instead of an emergent property of scattered code. When NIST’s standards land and someone asks “show me your agent authorization model,” the answer is a directory of policies, not a shrug. This is also the input-validation discipline the 2026 guidance keeps stressing: every tool invocation validated against a defined schema and an explicit policy before it executes — never trusted because it came from “our” model.
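To show what "test" can mean here, the sketch below mirrors the logic of the policy above in plain Python and runs it against a small table of cases. This is a stand-in, not the Cedar engine: in a real setup the same cases would be evaluated against the actual policy file. The dict shapes, the function name, and the sample services are assumptions.

```python
def may_read_logs(principal: dict, action: str, resource: dict, context: dict) -> bool:
    """Plain-Python mirror of the policy above. Anything not matched is denied."""
    service = resource.get("service")
    return (
        "oncall-agent" in principal.get("roles", ())
        and action == "logs:read"
        # A missing attribute must not match by accident (None == None).
        and service is not None
        and service == principal.get("assignedService")
        and context.get("onBehalfOf", {}).get("role") == "oncall-engineer"
    )


# Hypothetical entities for the test table.
agent = {"roles": ["oncall-agent"], "assignedService": "checkout"}
oncall = {"onBehalfOf": {"role": "oncall-engineer"}}
intern = {"onBehalfOf": {"role": "intern"}}

CASES = [
    # (description, principal, action, resource, context, expected decision)
    ("own service, on-call engineer", agent, "logs:read", {"service": "checkout"}, oncall, True),
    ("another team's service", agent, "logs:read", {"service": "billing"}, oncall, False),
    ("not acting for on-call", agent, "logs:read", {"service": "checkout"}, intern, False),
    ("different action", agent, "logs:delete", {"service": "checkout"}, oncall, False),
    ("resource without a service", agent, "logs:read", {}, oncall, False),
]

for description, p, a, r, c, expected in CASES:
    assert may_read_logs(p, a, r, c) is expected, description
print(f"{len(CASES)} policy cases passed")
```

A table like this is what makes a policy change reviewable: the diff shows both the rule and the decisions it is expected to produce.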


The checklist

You’re running anonymous inference endpoints if you can’t answer yes to all of these:

  1. Identity on every call. Does every model and tool invocation carry a verifiable principal — not a shared service account?
  2. Token exchange, not god-mode keys. Do agents act with narrowly scoped, short-lived tokens derived from the originating user (RFC 8693), rather than one baked-in credential?
  3. Policy-as-code authorization. Is “may this principal take this action on this resource” decided by reviewable policy at a gateway, on every call?
  4. Schema validation at the seam. Is every tool input checked against a defined schema and range before execution?
  5. An audit record. Can you reconstruct who triggered what, through which agent, with what context — after the fact?
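Items 4 and 5 are easy to agree with and easy to skip, so here is a minimal Python sketch of both. The schema, field names, limits, and record layout are all assumptions for illustration; a production gateway would more likely use a schema library and a structured logging pipeline.

```python
import json
import time

# Hypothetical schema for a logs:read tool: field -> (exact type, range check).
LOGS_READ_SCHEMA = {
    "service": (str, lambda v: 0 < len(v) <= 64),
    "limit": (int, lambda v: 1 <= v <= 1000),
}


def validate(args: dict, schema: dict) -> list:
    """Return a list of problems; an empty list means the input is acceptable."""
    # Reject fields the schema does not know about instead of ignoring them.
    problems = [f"unexpected field: {k}" for k in args if k not in schema]
    for field, (expected_type, in_range) in schema.items():
        if field not in args:
            problems.append(f"missing field: {field}")
        elif type(args[field]) is not expected_type:
            problems.append(f"wrong type: {field}")
        elif not in_range(args[field]):
            problems.append(f"out of range: {field}")
    return problems


def audit_record(principal: str, agent: str, action: str,
                 args: dict, decision: str) -> str:
    """One JSON line: who triggered what, through which agent, with what input."""
    return json.dumps({
        "ts": time.time(),
        "principal": principal,
        "agent": agent,
        "action": action,
        "args": args,
        "decision": decision,
    }, sort_keys=True)


call_args = {"service": "checkout", "limit": 50000, "cmd": "rm -rf /"}
problems = validate(call_args, LOGS_READ_SCHEMA)
print(problems)  # ['unexpected field: cmd', 'out of range: limit']
print(audit_record("user:alice", "agent:oncall-1", "logs:read", call_args,
                   "deny" if problems else "allow"))
```

The denied call is logged too. An audit trail that only records successes can't tell you what an injected prompt tried to do.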

If any answer is no, that’s the path an injected prompt walks straight through. The NSA and NIST didn’t invent a new threat in 2026; they put a federal stamp on one that’s been shipping quietly inside every agent deployment that treats “internal” as a synonym for “trusted.” Identity, scoped tokens, policy-as-code, validation, audit. None of it is exotic. All of it is the difference between an agent that can act and an attacker who can act through it.

