The whole model in one picture: publish a manifest, let registries index it, let agents find and verify you, then connect over the native protocol.
On June 17, 2026, Google, Microsoft, and Hugging Face jointly announced Agentic Resource Discovery (ARD), with a coalition behind it that includes Amazon, Cisco, GitHub, Salesforce, Snowflake, and Nvidia. It’s Apache-2.0, it’s a v0.9 draft, and it’s built on the Linux Foundation’s AI Catalog data model. That’s a lot of logos for a JSON file at a well-known path — so I read the actual spec and published a real catalog to see what’s underneath the announcement. This post is what I found, with working code and the gotchas that bite in production.
The problem ARD is actually solving
Today, when an agent needs a capability it doesn’t already have wired in, its options are bad. It scrapes a webpage’s DOM and hopes, it screenshots a UI and reasons over pixels, or a human hardcodes an endpoint into the agent’s config months in advance. None of that scales to a web where every company wants to expose tools to agents.
ARD reframes this as three questions an agent needs answered at runtime:
- Where does the right capability live?
- Which one should I use out of the candidates?
- How do I verify it’s safe to connect to before I hand it a task?
The answer is deliberately boring, which is why it might work: a standard, machine-readable manifest at a predictable URL, plus “search engines for agents” that index those manifests. If robots.txt and DNS taught us anything, it’s that boring, well-known conventions are exactly what scale across an adversarial, decentralized web.
The two primitives
ARD is just two things:
- A static ai-catalog.json manifest, hosted by the provider at a well-known path, describing the capabilities that domain offers.
- A registry API that crawls those catalogs, indexes them, and answers natural-language discovery queries with ranked results.
Providers publish. Registries index. Agents query, verify, and connect directly using the resource’s own protocol. Critically, ARD does not replace MCP or A2A — it’s the layer above them.
ARD is discovery; MCP/A2A/OpenAPI are connection. Different jobs, stacked.
If you’ve read my MCP gateway pattern post, this slots in cleanly: ARD is how an agent finds the MCP server and verifies it; the gateway is how it safely talks to it. Discovery and connection are separate concerns, and conflating them is how you end up with agents scraping HTML.
Publishing a real catalog
Here’s a genuine ai-catalog.json — the shape straight from the spec, advertising an MCP server. Note the details, because the spec is stricter than the launch-day examples suggest.
{
  "specVersion": "1.0",
  "host": {
    "displayName": "cloudandsre.com",
    "identifier": "did:web:cloudandsre.com"
  },
  "entries": [
    {
      "identifier": "urn:air:cloudandsre.com:mcp:incident-scribe",
      "displayName": "Incident Scribe",
      "type": "application/mcp-server-card+json",
      "url": "https://cloudandsre.com/mcp/incident-scribe.json",
      "description": "Turns a Slack incident thread into a structured postmortem.",
      "capabilities": ["SummarizeThread", "DraftPostmortem"],
      "representativeQueries": [
        "summarize this incident slack thread into a postmortem",
        "draft an incident report from these messages"
      ],
      "version": "1.2.0",
      "updatedAt": "2026-07-01T00:00:00Z"
    }
  ]
}
Every field here is load-bearing:
- identifier uses the urn:air: scheme: urn:air:<publisher-fqdn>:<namespace>:<name>. Not urn:ai:. That one-letter mistake fails validation, and it was wrong in more than one early example floating around.
- type is an IANA media type. For an MCP server it’s exactly application/mcp-server-card+json. A2A agents use application/a2a-agent-card+json; OpenAPI tools and nested catalogs have their own types.
- url OR data, never both. Reference the full server card by URL, or inline it under data. Exactly one. The spec rejects entries with both, and rejects unknown properties on the host object.
- representativeQueries takes 2–5 entries. These aren’t decoration: registries embed them to build the semantic index, so this is the single highest-leverage field for whether agents actually find you. Write them the way a user would phrase the request, not the way your docs describe the feature.
- capabilities lets a registry filter without fetching the full server card, which gives it cheap pre-filtering before the expensive fetch.
That’s the manifest. Now you have to make agents aware it exists.
Advertising it: four signals, and the config to serve them
The spec defines four ways to point crawlers at your catalog. Belt and suspenders — use several.
1. The well-known path. This blog runs on Caddy, so here’s the actual serving config — serve the file with the right content type and advertise it in an HTTP Link header on every response:
cloudandsre.com {
    root * /var/www/cloudandsre
    file_server

    # Correct media type for the catalog
    @catalog path /.well-known/ai-catalog.json
    header @catalog Content-Type application/json

    # Advertise the catalog on every response
    header Link `</.well-known/ai-catalog.json>; rel="ai-catalog"`
}
2. robots.txt — a directive pointing compliant crawlers straight at it:
User-agent: *
Allow: /
Agentmap: https://cloudandsre.com/.well-known/ai-catalog.json
3. An HTML <link> in your <head>, so the catalog is discoverable during ordinary web indexing:
<link rel="ai-catalog" href="/.well-known/ai-catalog.json">
4. A DNS SVCB record under _catalog._agents.cloudandsre.com, for zero-fetch discovery at the DNS layer — the same delegated-lookup trick that makes DNS itself scale, now pointing at your catalog.
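To make the first three signals concrete, here is a rough sketch of how a crawler might pull the catalog URL out of each one. This is my own illustration, not code from the spec: the function names are hypothetical, the Link-header parsing is deliberately naive (it splits on commas, which a full RFC 8288 parser would not), and the DNS SVCB signal is left out because the post does not show the record's contents.

```python
import re
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


def catalog_from_link_header(value: str, base: str) -> Optional[str]:
    """Return the rel="ai-catalog" target from an HTTP Link header value."""
    for part in value.split(","):  # naive split; fine for simple headers
        m = re.match(r'\s*<([^>]*)>(.*)', part)
        if not m:
            continue
        rel = re.search(r'\brel\s*=\s*"?([^";]*)"?', m.group(2))
        if rel and "ai-catalog" in rel.group(1).split():
            return urljoin(base, m.group(1))
    return None


def catalog_from_robots(body: str) -> Optional[str]:
    """Return the value of the first Agentmap line in a robots.txt body."""
    for line in body.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "agentmap":
            return value.strip()
    return None


class _LinkFinder(HTMLParser):
    """Remembers the href of the first <link rel="ai-catalog"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rels = (attr.get("rel") or "").split()
        if tag == "link" and self.href is None and "ai-catalog" in rels:
            self.href = attr.get("href")


def catalog_from_html(html: str, base: str) -> Optional[str]:
    """Return the absolute catalog URL advertised in an HTML document."""
    finder = _LinkFinder()
    finder.feed(html)
    return urljoin(base, finder.href) if finder.href else None
```

Running all three against your own site and checking that they agree is a quick way to catch a stale robots.txt or a header that only appears on some routes.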
Not theoretical: what actually breaks
Publishing the JSON is the easy 20%. Here’s the 80% that bites, drawn from the spec’s strict edges and from Todd O’Rourke’s hands-on writeup of implementing it on a real stack:
Your control panel may hijack /.well-known/. cPanel (and other panels) create a real /.well-known/ directory on disk for ACME/SSL challenges, and the web server serves it straight from the filesystem — bypassing your app’s routing entirely. If you’re generating the catalog dynamically in your app, it silently never gets served. Fix: serve ai-catalog.json as a static file, and confirm what’s actually on disk.
Your WAF can make the catalog invisible. This is the nasty one. A web application firewall that blocks unfamiliar crawler user-agents will happily return 200 to your browser and 403 to a registry’s crawler. Your catalog looks perfect when you check it and is undiscoverable to every agent on the planet. You cannot assume — you have to test your own stack against real agent user-agents:
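import httpx

URL = "https://cloudandsre.com/.well-known/ai-catalog.json"
# User-agent strings to probe with; add whichever crawlers matter to you
AGENTS = ["ClaudeBot/1.0", "GPTBot/1.0", "python-httpx/0.27", "Mozilla/5.0"]

for ua in AGENTS:
    # Fetch the catalog as this user-agent, following redirects like a crawler would
    r = httpx.get(URL, headers={"User-Agent": ua}, follow_redirects=True)
    ctype = r.headers.get("content-type", "")
    # Pass only on a 200 response with a JSON content type
    ok = r.status_code == 200 and "json" in ctype
    print(f"{'OK  ' if ok else 'FAIL'} {ua:22} {r.status_code} {ctype}")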
If any line prints FAIL, an agent using that user-agent can’t see you. Run this from outside your network — a WAF that trusts your office IP will lie to you.
An empty catalog is worthless. The blunt lesson from anyone who’s shipped one: publishing the manifest with no working tool behind it accomplishes nothing. Expose at least one real, callable capability — an MCP endpoint that returns actual results — or don’t bother. Discovery is only the front door; there has to be a room behind it.
A quick sanity-check script for the manifest itself, before you even get to serving:
import json, re, sys

REQUIRED = {"identifier", "displayName", "type"}
# Shape: urn:air:<publisher-fqdn>:<namespace>:<name>
URN = re.compile(r"^urn:air:[^:]+:[^:]+:.+")

with open(sys.argv[1]) as f:
    cat = json.load(f)

assert cat.get("specVersion"), "missing specVersion"
assert cat.get("host", {}).get("identifier"), "host needs an identifier (did:web:...)"

entries = cat.get("entries", [])
for e in entries:
    missing = REQUIRED - e.keys()
    assert not missing, f"{e.get('identifier','?')}: missing {missing}"
    assert URN.match(e["identifier"]), f"bad URN: {e['identifier']}"
    # Exactly one of url/data must be present
    assert ("url" in e) ^ ("data" in e), f"{e['identifier']}: need exactly one of url/data"
    # representativeQueries is optional here, but if present it must have 2-5 items
    rq = e.get("representativeQueries", [])
    assert 2 <= len(rq) <= 5 or not rq, f"{e['identifier']}: representativeQueries must be 2–5"

print(f"OK: {len(entries)} entries valid")
Trust: the part that matters for regulated industries
Discovery without verification is a security incident waiting to happen — an agent connecting to whoever claims to be your bank. ARD’s answer is domain-anchored trust: identity is cryptographically bound to domain ownership, and the URN’s <publisher> segment must match the cryptographic identity. This is the same principle as TLS certificates — you trust the domain because it proved control of the domain.
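As a rough illustration of the "publisher segment must match" rule, here is a string-level pre-check an agent could run before doing any cryptography. It is only a sketch under assumptions: the helper names are mine, it assumes the host identity is a did:web identifier like the one in the catalog above, and it does not perform the actual cryptographic verification the spec calls for.

```python
from urllib.parse import unquote


def urn_publisher(urn: str) -> str:
    """Extract <publisher-fqdn> from urn:air:<publisher-fqdn>:<namespace>:<name>."""
    parts = urn.split(":")
    if len(parts) < 5 or parts[0] != "urn" or parts[1] != "air":
        raise ValueError(f"not a urn:air identifier: {urn!r}")
    return parts[2].lower()


def did_web_domain(did: str) -> str:
    """Extract the domain from a did:web identifier, ignoring any port or path."""
    prefix = "did:web:"
    if not did.startswith(prefix):
        raise ValueError(f"not a did:web identifier: {did!r}")
    # did:web puts path segments after further ":" and percent-encodes a port as %3A
    host = unquote(did[len(prefix):].split(":")[0])
    return host.split(":")[0].lower()


def publisher_matches_host(entry_urn: str, host_did: str) -> bool:
    """Cheap sanity check only; it proves nothing about who controls the domain."""
    return urn_publisher(entry_urn) == did_web_domain(host_did)


print(publisher_matches_host(
    "urn:air:cloudandsre.com:mcp:incident-scribe",
    "did:web:cloudandsre.com",
))
```

A mismatch here is a reason to stop immediately; a match only means the entry is worth the cost of real signature verification.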
The optional but important trustManifest is where enterprise adoption lives:
"trustManifest": {
"identity": "spiffe://cloudandsre.com/mcp/incident-scribe",
"identityType": "spiffe",
"attestations": [
{ "type": "SOC2-Type2", "uri": "https://trust.cloudandsre.com/soc2.pdf",
"digest": "sha256:..." },
{ "type": "HIPAA-Audit", "uri": "https://trust.cloudandsre.com/hipaa.pdf" }
],
"signature": "eyJhbGciOiJFUzI1NiJ9.."
}
- identity is a workload identity (SPIFFE, a DID, or an HTTPS URI), not just a name.
- attestations carry compliance proofs (SOC2, HIPAA, GDPR) as URIs with content digests, so an agent, or the platform team governing it, can filter to only resources that meet a compliance bar before connecting.
- signature is a detached JWS over the entry.
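The content digest is what lets an agent confirm that the attestation document it fetched is the one the publisher pinned. A minimal sketch of that check follows, with two assumptions of mine stated up front: the digest is formatted as sha256: followed by a lowercase hex string (the example above elides the value, so the encoding is a guess), and the function names are hypothetical. Fetching the document and verifying the JWS signature are out of scope here.

```python
import hashlib
import hmac


def digest_matches(content: bytes, digest: str) -> bool:
    """Compare content against a digest string assumed to look like 'sha256:<hex>'."""
    algo, sep, expected = digest.partition(":")
    if not sep or algo != "sha256":
        raise ValueError(f"unsupported digest format: {digest!r}")
    actual = hashlib.sha256(content).hexdigest()
    # Constant-time comparison; cheap habit for anything security-adjacent
    return hmac.compare_digest(actual, expected.lower())


def check_attestation(attestation: dict, content: bytes) -> str:
    """Classify an attestation as 'ok', 'mismatch', or 'unpinned' (no digest given)."""
    if "digest" not in attestation:
        return "unpinned"
    return "ok" if digest_matches(content, attestation["digest"]) else "mismatch"
```

Note that the HIPAA entry in the example above has no digest, so under this sketch it would come back as unpinned: the document can be fetched, but nothing ties its bytes to the catalog entry. Whether to accept that is a policy decision for your gateway.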
For anyone building agents in a regulated, safety-critical industry, this is the headline. It’s the mechanism that turns “some tool on the internet” into “a SOC2-attested, HIPAA-audited, cryptographically-identified capability from a verified publisher.” It’s also the natural enforcement point for the argument I’ve made before: no anonymous inference endpoints. ARD gives you the identity; your gateway enforces the policy.
Intent-based discovery: the registry query API
The registry side is a plain REST API. An agent searches in natural language and gets ranked, structured results:
curl -X POST https://registry.example.com/api/v1/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": {
      "text": "summarize a slack incident thread into a postmortem",
      "filter": {
        "type": ["application/mcp-server-card+json"],
        "trustManifest.attestations.type": ["SOC2-Type2"]
      }
    },
    "federation": "referrals",
    "pageSize": 5
  }'
A few things worth internalizing:
- Filters use dot-separated paths over any catalog field. Within a key, values OR together; across keys, they AND. The example above means: MCP servers AND SOC2-attested. Any field is filterable, including metadata.*.
- federation controls reach: none (this registry only), referrals (local results plus pointers to other registries), or auto (query upstream and merge).
- score is relevance, 0–100, and never trust. The spec is explicit: ranking reflects semantic match, not safety or compliance. Trust is a separate, verifiable dimension you check via the trust manifest. Conflating “ranked #1” with “safe to use” would be the classic mistake, and the spec goes out of its way to prevent it.
- There’s also an /explore endpoint that returns facet counts (“how many MCP servers, by publisher”) without ranking or burning context-window tokens on full results.
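The OR-within-a-key, AND-across-keys rule is easier to trust once you have seen it evaluated. Below is a small local sketch of that matching logic, useful for re-checking registry results on the client side. It is my own reading of the semantics described above, not registry code: the function names are hypothetical, and I am assuming that a path fans out across list elements (so trustManifest.attestations.type sees every attestation) and that trustManifest sits on the entry itself.

```python
from typing import Any


def resolve(obj: Any, path: str) -> list:
    """Collect every value reachable via a dot-separated path, fanning out over lists."""
    current = [obj]
    for key in path.split("."):
        found = []
        for item in current:
            candidates = item if isinstance(item, list) else [item]
            for candidate in candidates:
                if isinstance(candidate, dict) and key in candidate:
                    found.append(candidate[key])
        current = found
    # A final list value (e.g. capabilities) counts as several values
    values = []
    for value in current:
        values.extend(value if isinstance(value, list) else [value])
    return values


def matches(entry: dict, filters: dict) -> bool:
    """True if, for every filter key, at least one allowed value is present."""
    return all(
        any(value in allowed for value in resolve(entry, path))
        for path, allowed in filters.items()
    )


entry = {
    "type": "application/mcp-server-card+json",
    "capabilities": ["SummarizeThread", "DraftPostmortem"],
    "trustManifest": {
        "attestations": [{"type": "SOC2-Type2"}, {"type": "HIPAA-Audit"}]
    },
}
filters = {
    "type": ["application/mcp-server-card+json"],
    "trustManifest.attestations.type": ["SOC2-Type2"],
}
print(matches(entry, filters))
```

Re-applying the trust filter locally is worth the few lines, since the registry's score says nothing about whether the attestation requirement was actually met.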
That last point connects to something I’ve written about repeatedly: the context window is a budget. Structured discovery — a small JSON result describing a tool — is dramatically cheaper in tokens than dumping a scraped DOM or a screenshot into the model and asking it to reason out what the page can do. Cheaper discovery means more budget left for the actual task. That efficiency, not the novelty, is the real reason structured discovery beats scraping.
Should you adopt it yet? My honest take
ARD is v0.9, and real-world adoption today is close to zero — GitHub Copilot’s Agent Finder is the most prominent registry implementation so far, and most of the coalition’s support is still announcement-stage. So calibrate:
- Publish a catalog now. It’s a static file and a few DNS/config lines: a cheap, two-way door. If ARD wins, you’re indexed early; if it stalls, you’ve lost an afternoon. Start with one real, callable capability and honest representativeQueries.
- Wire up the trust manifest if you’re in a regulated space. This is where ARD is most differentiated and where it’s least served by the alternatives (a bare MCP endpoint has no standard way to carry a SOC2 attestation).
- Don’t build your own registry yet. Indexing the agentic web is a search-engine problem; let Google, GitHub, and the coalition spend that money. Consume registries; don’t become one.
- Watch the WAF and the well-known path. The two failure modes above will make a perfect catalog invisible, and you won’t notice because it looks fine from your browser.
The takeaway
Agentic Resource Discovery is the unglamorous infrastructure the agentic web has been missing: a well-known manifest, a search registry, and domain-anchored trust — robots.txt and DNS, reimagined for agents instead of crawlers. It doesn’t compete with MCP; it’s the phone book and the caller-ID check that sit above MCP, A2A, and OpenAPI. The spec is young and adoption is early, but the cost of publishing a catalog is trivial and the trust layer is genuinely useful today for regulated workloads. The theoretical part is a JSON file. The real part is serving it so agents can actually see it, backing it with a capability they can actually call, and proving you are who you say you are. That last mile is where discovery stops being a spec and starts being infrastructure.