notes from the field
Making cloud run itself,
one agent at a time.
Hands-on writing on agentic infrastructure, MCP, and reliability — from someone shipping it in production at a Fortune-500 aviation platform, not theorizing about it. By Ajin Baby.
Latest field notes
The trust gap: bounded autonomy for AI SRE agents
Vendors call 2026 the year of autonomous incident resolution. But SREs still face 50+ alerts a day at 60% false positives, and the trust frameworks lag the agents. Here's the autonomy-ladder model for what an AI SRE agent should — and should never — do on its own.
MCP goes stateless — what the 2026 release candidate means for your SRE tooling
The 2026-07-28 MCP release candidate is the biggest revision since launch: it drops the session handshake in favor of a stateless HTTP core and hardens OAuth against mix-up attacks. Here's what changes for the agents wired into your production systems, and the migration window you have to act in.
The AI-native SRE stack — a 2026 reference guide
A practitioner's map of the AI-native SRE stack in 2026: the six layers from telemetry to bounded remediation, the tools that actually fill each one, and an honest read on where AI pays off — and where the New Relic and Datadog data says it doesn't yet.
Tools I build in the open
Who's writing this
Ajin Baby: 15 years making cloud platforms reliable, 2x founder before that. Azure AI Engineer, Neo4j Certified, IEEE member.