Agentic Resource Discovery: I Read the Spec, Then Published a Catalog

Google, Microsoft, and Hugging Face shipped Agentic Resource Discovery — a well-known ai-catalog.json plus a registry search API so agents can find, verify, and connect to tools without scraping the DOM. The real schema, a working catalog, serving config, and the gotchas that break it.


Agentic Resource Discovery infographic: the runtime workflow (provider, registry, agent, verify, connect), the three primitives (capability manifest, search registries, domain-anchored trust), and the four publication mechanisms

The whole model in one picture: publish a manifest, let registries index it, let agents find and verify you, then connect over the native protocol.

On June 17, 2026, Google, Microsoft, and Hugging Face jointly announced Agentic Resource Discovery (ARD), with a coalition behind it that includes Amazon, Cisco, GitHub, Salesforce, Snowflake, and Nvidia. It’s Apache-2.0, it’s a v0.9 draft, and it’s built on the Linux Foundation’s AI Catalog data model. That’s a lot of logos for a JSON file at a well-known path — so I read the actual spec and published a real catalog to see what’s underneath the announcement. This post is what I found, with working code and the gotchas that bite in production.

The problem ARD is actually solving

Today, when an agent needs a capability it doesn’t already have wired in, its options are bad. It scrapes a webpage’s DOM and hopes, it screenshots a UI and reasons over pixels, or a human hardcodes an endpoint into the agent’s config months in advance. None of that scales to a web where every company wants to expose tools to agents.

ARD reframes this as three questions an agent needs answered at runtime:

  1. Where does the right capability live?
  2. Which one should I use out of the candidates?
  3. How do I verify it’s safe to connect to before I hand it a task?

The answer is deliberately boring, which is why it might work: a standard, machine-readable manifest at a predictable URL, plus “search engines for agents” that index those manifests. If robots.txt and DNS taught us anything, it’s that boring, well-known conventions are exactly what scale across an adversarial, decentralized web.

The two primitives

The discovery core of ARD is just two things, with domain-anchored trust (covered below) layered on top:

  • A static ai-catalog.json manifest, hosted by the provider at a well-known path, describing the capabilities that domain offers.
  • A registry API that crawls those catalogs, indexes them, and answers natural-language discovery queries with ranked results.

Providers publish. Registries index. Agents query, verify, and connect directly using the resource’s own protocol. Critically, ARD does not replace MCP or A2A — it’s the layer above them.

Layered diagram: ARD discovery layer (ai-catalog.json + registry) sits above the connection layer (MCP, A2A, OpenAPI), which sits above the actual resources

ARD is discovery; MCP/A2A/OpenAPI are connection. Different jobs, stacked.

If you’ve read my MCP gateway pattern post, this slots in cleanly: ARD is how an agent finds the MCP server and verifies it; the gateway is how it safely talks to it. Discovery and connection are separate concerns, and conflating them is how you end up with agents scraping HTML.

Publishing a real catalog

Here’s a genuine ai-catalog.json — the shape straight from the spec, advertising an MCP server. Note the details, because the spec is stricter than the launch-day examples suggest.

{
  "specVersion": "1.0",
  "host": {
    "displayName": "cloudandsre.com",
    "identifier": "did:web:cloudandsre.com"
  },
  "entries": [
    {
      "identifier": "urn:air:cloudandsre.com:mcp:incident-scribe",
      "displayName": "Incident Scribe",
      "type": "application/mcp-server-card+json",
      "url": "https://cloudandsre.com/mcp/incident-scribe.json",
      "description": "Turns a Slack incident thread into a structured postmortem.",
      "capabilities": ["SummarizeThread", "DraftPostmortem"],
      "representativeQueries": [
        "summarize this incident slack thread into a postmortem",
        "draft an incident report from these messages"
      ],
      "version": "1.2.0",
      "updatedAt": "2026-07-01T00:00:00Z"
    }
  ]
}

Every field here is load-bearing:

  • identifier uses the urn:air: scheme: urn:air:<publisher-fqdn>:<namespace>:<name>. Not urn:ai:. That one-letter mistake fails validation, and it was wrong in more than one early example floating around.
  • type is an IANA media type. For an MCP server it’s exactly application/mcp-server-card+json. A2A agents use application/a2a-agent-card+json; OpenAPI tools and nested catalogs have their own types.
  • url OR data, never both. Reference the full server card by URL, or inline it under data — exactly one. The spec rejects entries with both, and rejects unknown properties on the host object.
  • representativeQueries takes 2–5 entries. These aren’t decoration: registries embed them to build the semantic index, so this is the single highest-leverage field for whether agents actually find you. Write them the way a user would phrase the request, not the way your docs describe the feature.
  • capabilities lets a registry filter without fetching the full server card — cheap pre-filtering before the expensive fetch.

That’s the manifest. Now you have to make agents aware it exists.

Advertising it: four signals, and the config to serve them

The spec defines four ways to point crawlers at your catalog. Belt and suspenders — use several.

1. The well-known path. This blog runs on Caddy, so here’s the actual serving config — serve the file with the right content type and advertise it in an HTTP Link header on every response:

cloudandsre.com {
    root * /var/www/cloudandsre
    file_server

    # Correct media type for the catalog
    @catalog path /.well-known/ai-catalog.json
    header @catalog Content-Type application/json

    # Advertise the catalog on every response
    header Link `</.well-known/ai-catalog.json>; rel="ai-catalog"`
}

2. robots.txt — a directive pointing compliant crawlers straight at it:

User-agent: *
Allow: /

Agentmap: https://cloudandsre.com/.well-known/ai-catalog.json

3. An HTML <link> in your <head>, so the catalog is discoverable during ordinary web indexing:

<link rel="ai-catalog" href="/.well-known/ai-catalog.json">

4. A DNS SVCB record under _catalog._agents.cloudandsre.com, for zero-fetch discovery at the DNS layer — the same delegated-lookup trick that makes DNS itself scale, now pointing at your catalog.
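
To confirm the signals are actually live, it helps to parse them the way a crawler might. Here’s a minimal sketch, not anything the spec prescribes: the function names are hypothetical, it uses only the Python standard library, and it assumes the Agentmap directive and rel="ai-catalog" values shown above. It skips the DNS SVCB record, because the standard library has no SVCB resolver.

```python
import re
from html.parser import HTMLParser
from typing import Optional


def catalog_from_link_header(value: str) -> Optional[str]:
    """Return the target of the first rel="ai-catalog" link in a Link header.

    Naive on purpose: splits on commas, so URLs containing commas are not handled.
    """
    for part in value.split(","):
        m = re.match(r"\s*<([^>]*)>(.*)", part)
        if not m:
            continue
        rel = re.search(r'rel\s*=\s*"([^"]*)"|rel\s*=\s*([^;\s]+)', m.group(2))
        rels = (rel.group(1) or rel.group(2) or "").split() if rel else []
        if "ai-catalog" in rels:
            return m.group(1)
    return None


def catalog_from_robots(text: str) -> Optional[str]:
    """Return the value of the first Agentmap directive in a robots.txt body."""
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "agentmap" and value.strip():
            return value.strip()
    return None


class _LinkFinder(HTMLParser):
    """Remembers the href of the first <link rel="ai-catalog"> tag it sees."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        rels = (a.get("rel") or "").lower().split()
        if tag == "link" and self.href is None and "ai-catalog" in rels:
            self.href = a.get("href")


def catalog_from_html(html: str) -> Optional[str]:
    """Return the href advertised by an HTML <link rel="ai-catalog"> tag, if any."""
    finder = _LinkFinder()
    finder.feed(html)
    return finder.href
```

Feed these the Link header, robots.txt body, and homepage HTML you fetch from outside your own network. If any of them returns None, that signal isn’t reaching crawlers.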

Not theoretical: what actually breaks

Publishing the JSON is the easy 20%. Here’s the 80% that bites, drawn from the spec’s strict edges and from Todd O’Rourke’s hands-on writeup of implementing it on a real stack:

Your control panel may hijack /.well-known/. cPanel (and other panels) create a real /.well-known/ directory on disk for ACME/SSL challenges, and the web server serves it straight from the filesystem — bypassing your app’s routing entirely. If you’re generating the catalog dynamically in your app, it silently never gets served. Fix: serve ai-catalog.json as a static file, and confirm what’s actually on disk.

Your WAF can make the catalog invisible. This is the nasty one. A web application firewall that blocks unfamiliar crawler user-agents will happily return 200 to your browser and 403 to a registry’s crawler. Your catalog looks perfect when you check it and is undiscoverable to every agent on the planet. You cannot assume — you have to test your own stack against real agent user-agents:

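import httpx

URL = "https://cloudandsre.com/.well-known/ai-catalog.json"
# A mix of AI-crawler, generic HTTP client, and browser-style user-agents
AGENTS = ["ClaudeBot/1.0", "GPTBot/1.0", "python-httpx/0.27", "Mozilla/5.0"]

for ua in AGENTS:
    try:
        r = httpx.get(URL, headers={"User-Agent": ua}, follow_redirects=True, timeout=10)
    except httpx.HTTPError as exc:
        # A dropped or timed-out connection counts as a failure, not a crash
        print(f"FAIL {ua:22} {type(exc).__name__}")
        continue
    ctype = r.headers.get("content-type", "")
    ok = r.status_code == 200 and "json" in ctype
    print(f"{'OK' if ok else 'FAIL':4} {ua:22} {r.status_code} {ctype}")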

If any line prints FAIL, an agent using that user-agent can’t see you. Run this from outside your network — a WAF that trusts your office IP will lie to you.

An empty catalog is worthless. The blunt lesson from anyone who’s shipped one: publishing the manifest with no working tool behind it accomplishes nothing. Expose at least one real, callable capability — an MCP endpoint that returns actual results — or don’t bother. Discovery is only the front door; there has to be a room behind it.

A quick sanity-check script for the manifest itself, before you even get to serving:

import json, re, sys

REQUIRED = {"identifier", "displayName", "type"}
# urn:air:<publisher-fqdn>:<namespace>:<name>
URN = re.compile(r"^urn:air:[^:]+:[^:]+:.+")

# Usage: python check_catalog.py path/to/ai-catalog.json
with open(sys.argv[1]) as f:
    cat = json.load(f)
assert cat.get("specVersion"), "missing specVersion"
assert cat.get("host", {}).get("identifier"), "host needs an identifier (did:web:...)"

entries = cat.get("entries", [])
for e in entries:
    missing = REQUIRED - e.keys()
    assert not missing, f"{e.get('identifier','?')}: missing {missing}"
    assert URN.match(e["identifier"]), f"bad URN: {e['identifier']}"
    # Exactly one of url/data: reference the card or inline it, never both
    assert ("url" in e) ^ ("data" in e), f"{e['identifier']}: need exactly one of url/data"
    # An absent or empty list is tolerated here; if present, it must hold 2–5 queries
    rq = e.get("representativeQueries", [])
    assert 2 <= len(rq) <= 5 or not rq, f"{e['identifier']}: representativeQueries must be 2–5"

print(f"OK: {len(entries)} entries valid")

Trust: the part that matters for regulated industries

Discovery without verification is a security incident waiting to happen — an agent connecting to whoever claims to be your bank. ARD’s answer is domain-anchored trust: identity is cryptographically bound to domain ownership, and the URN’s <publisher> segment must match the cryptographic identity. This is the same principle as TLS certificates — you trust the domain because it proved control of the domain.
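
The cheapest piece of that check is a string comparison between the URN’s publisher segment and the host’s did:web identifier. Here’s a rough sketch under stated assumptions: the helper names are hypothetical, the exact matching rules in the spec may be stricter, and this does no cryptographic verification at all. It only catches an entry that claims a publisher other than the catalog’s host.

```python
def urn_publisher(identifier: str) -> str:
    """Extract <publisher-fqdn> from urn:air:<publisher-fqdn>:<namespace>:<name>."""
    parts = identifier.split(":")
    if len(parts) < 5 or parts[0] != "urn" or parts[1] != "air":
        raise ValueError(f"not a urn:air identifier: {identifier!r}")
    return parts[2].lower()


def did_web_domain(did: str) -> str:
    """Extract the domain from a did:web identifier.

    Assumption: any path segments after the domain are ignored, and
    percent-encoded ports are not decoded.
    """
    prefix = "did:web:"
    if not did.startswith(prefix):
        raise ValueError(f"not a did:web identifier: {did!r}")
    return did[len(prefix):].split(":")[0].lower()


def publisher_matches_host(entry_identifier: str, host_identifier: str) -> bool:
    """True when the entry's URN publisher equals the host's did:web domain."""
    return urn_publisher(entry_identifier) == did_web_domain(host_identifier)
```

With the catalog above, publisher_matches_host("urn:air:cloudandsre.com:mcp:incident-scribe", "did:web:cloudandsre.com") holds, while an entry claiming a different publisher under the same host does not.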

The optional but important trustManifest is where enterprise adoption lives:

"trustManifest": {
  "identity": "spiffe://cloudandsre.com/mcp/incident-scribe",
  "identityType": "spiffe",
  "attestations": [
    { "type": "SOC2-Type2", "uri": "https://trust.cloudandsre.com/soc2.pdf",
      "digest": "sha256:..." },
    { "type": "HIPAA-Audit", "uri": "https://trust.cloudandsre.com/hipaa.pdf" }
  ],
  "signature": "eyJhbGciOiJFUzI1NiJ9.."
}

  • identity is a workload identity (SPIFFE, a DID, or an HTTPS URI), not just a name.
  • attestations carry compliance proofs (SOC2, HIPAA, GDPR) as URIs with content digests, so an agent — or the platform team governing it — can filter to only resources that meet a compliance bar before connecting.
  • signature is a detached JWS over the entry.
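
On the consuming side, two of those checks are easy to sketch: does the entry declare the attestations you require, and does a downloaded attestation document match its digest? This is a minimal sketch, not the spec’s verification procedure. The function names are hypothetical, the "sha256:<hex>" digest format is inferred from the example above, the trustManifest is assumed to live on the catalog entry, and verifying the JWS signature is deliberately left out.

```python
import hashlib
import hmac


def digest_matches(content: bytes, digest: str) -> bool:
    """Check downloaded bytes against a digest string of the assumed form 'sha256:<hex>'."""
    algo, _, expected = digest.partition(":")
    if algo != "sha256" or not expected:
        return False  # unknown or missing algorithm: treat as unverified
    actual = hashlib.sha256(content).hexdigest()
    return hmac.compare_digest(actual, expected.lower())


def has_attestations(entry: dict, required: set) -> bool:
    """True when the entry's trustManifest declares every required attestation type."""
    manifest = entry.get("trustManifest") or {}
    declared = {a.get("type") for a in manifest.get("attestations", [])}
    return required <= declared
```

A declared attestation is only a claim until you fetch the document at its uri and digest_matches passes, so a cautious gateway would require both before connecting.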

For anyone building agents in a regulated, safety-critical industry, this is the headline. It’s the mechanism that turns “some tool on the internet” into “a SOC2-attested, HIPAA-audited, cryptographically-identified capability from a verified publisher.” It’s also the natural enforcement point for the argument I’ve made before: no anonymous inference endpoints. ARD gives you the identity; your gateway enforces the policy.

Intent-based discovery: the registry query API

The registry side is a plain REST API. An agent searches in natural language and gets ranked, structured results:

curl -X POST https://registry.example.com/api/v1/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": {
      "text": "summarize a slack incident thread into a postmortem",
      "filter": {
        "type": ["application/mcp-server-card+json"],
        "trustManifest.attestations.type": ["SOC2-Type2"]
      }
    },
    "federation": "referrals",
    "pageSize": 5
  }'

A few things worth internalizing:

  • Filters use dot-separated paths over any catalog field. Within a key, values OR together; across keys, they AND. The example above means: MCP servers AND SOC2-attested. Any field is filterable, including metadata.*.
  • federation controls reach: none (this registry only), referrals (local results plus pointers to other registries), or auto (query upstream and merge).
  • score measures relevance on a 0–100 scale; it says nothing about trust. The spec is explicit: ranking reflects semantic match, not safety or compliance. Trust is a separate, verifiable dimension you check via the trust manifest. Conflating “ranked #1” with “safe to use” would be the classic mistake, and the spec goes out of its way to prevent it.
  • There’s also an /explore endpoint that returns facet counts (“how many MCP servers, by publisher”) without ranking or burning context-window tokens on full results.
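
The filter semantics are easier to hold in your head as code. Here’s a small sketch of how a registry might evaluate a filter against one catalog entry. It’s my reading of the rules above, not the reference implementation: the function names are hypothetical, and the way a path fans out over a list of objects (such as attestations) is an assumption.

```python
def resolve(entry, path):
    """Collect every value found at a dot-separated path, fanning out over lists."""
    nodes = [entry]
    for key in path.split("."):
        next_nodes = []
        for node in nodes:
            # A list of objects (e.g. attestations) contributes each of its items.
            items = node if isinstance(node, list) else [node]
            next_nodes.extend(i[key] for i in items if isinstance(i, dict) and key in i)
        nodes = next_nodes
    values = []
    for node in nodes:
        # A terminal list (e.g. capabilities) is flattened into individual values.
        values.extend(node if isinstance(node, list) else [node])
    return values


def matches(entry, filters):
    """AND across filter keys, OR across the accepted values within each key."""
    return all(
        any(value in accepted for value in resolve(entry, path))
        for path, accepted in filters.items()
    )


entry = {
    "type": "application/mcp-server-card+json",
    "trustManifest": {"attestations": [{"type": "SOC2-Type2"}, {"type": "HIPAA-Audit"}]},
}
filters = {
    "type": ["application/mcp-server-card+json"],
    "trustManifest.attestations.type": ["SOC2-Type2"],
}
print(matches(entry, filters))
```

An entry with no value at a filtered path never matches that key, which is why an entry without a trustManifest drops out of any compliance-filtered query.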

That last point connects to something I’ve written about repeatedly: the context window is a budget. Structured discovery — a small JSON result describing a tool — is dramatically cheaper in tokens than dumping a scraped DOM or a screenshot into the model and asking it to reason out what the page can do. Cheaper discovery means more budget left for the actual task. That efficiency, not the novelty, is the real reason structured discovery beats scraping.

Should you adopt it yet? My honest take

ARD is v0.9, and real-world adoption today is close to zero — GitHub Copilot’s Agent Finder is the most prominent registry implementation so far, and most of the coalition’s support is still announcement-stage. So calibrate:

  • Publish a catalog now. It’s a static file and a few DNS/config lines — a cheap, two-way door. If ARD wins, you’re indexed early; if it stalls, you’ve lost an afternoon. Start with one real, callable capability and honest representativeQueries.
  • Wire up the trust manifest if you’re in a regulated space. This is where ARD is most differentiated and where it’s least served by the alternatives (a bare MCP endpoint has no standard way to carry a SOC2 attestation).
  • Don’t build your own registry yet. Indexing the agentic web is a search-engine problem; let Google, GitHub, and the coalition spend that money. Consume registries; don’t become one.
  • Watch the WAF and the well-known path. The two failure modes above will make a perfect catalog invisible, and you won’t notice because it looks fine from your browser.

The takeaway

Agentic Resource Discovery is the unglamorous infrastructure the agentic web has been missing: a well-known manifest, a search registry, and domain-anchored trust — robots.txt and DNS, reimagined for agents instead of crawlers. It doesn’t compete with MCP; it’s the phone book and the caller-ID check that sit above MCP, A2A, and OpenAPI. The spec is young and adoption is early, but the cost of publishing a catalog is trivial and the trust layer is genuinely useful today for regulated workloads. The theoretical part is a JSON file. The real part is serving it so agents can actually see it, backing it with a capability they can actually call, and proving you are who you say you are. That last mile is where discovery stops being a spec and starts being infrastructure.

Frequently asked questions

What is Agentic Resource Discovery (ARD)?

ARD is an open specification, announced June 17, 2026 by Google, Microsoft, and Hugging Face, for publishing, discovering, and verifying AI capabilities across the web. Providers host a machine-readable ai-catalog.json at /.well-known/ai-catalog.json, registries crawl and index those catalogs, and agents query registries in natural language to find tools, verify the publisher's identity, and connect using the tool's native protocol (MCP, A2A, or OpenAPI).

Does ARD replace MCP?

No. ARD is the discovery layer that sits above MCP, A2A, and OpenAPI. It answers 'which capability exists, where does it live, and can I trust it?' — then the agent connects using the resource's native protocol. ARD is the phone book and caller-ID check; MCP is the phone call.

Where do I host the ARD catalog?

At the well-known path https://yourdomain.com/.well-known/ai-catalog.json, served with Content-Type application/json. You can additionally advertise it via a robots.txt Agentmap directive, an HTML <link rel="ai-catalog"> tag, an HTTP Link header, or a DNS SVCB record under _catalog._agents.yourdomain.com.

Is ARD production-ready?

As of mid-2026 it is a v0.9 draft, Apache-2.0 licensed, and adoption is early: GitHub Copilot's Agent Finder is the most prominent registry implementation. Publishing a catalog is a cheap, reversible step worth doing now; building your own registry is not yet worth it for most teams.