AI Platform & Cloud Infrastructure Architect
Designing AI-enabled cloud platforms — from agentic infrastructure and MCP servers to self-healing systems — for safety-critical and regulated industries.
By Ajin Baby · 15+ years architecting cloud systems · Two-time founder before becoming an architect. Writing and shipping open-source code at the intersection of AI, cloud, and reliability.
Recent Posts
-
Harness engineering: the third phase of AI maturity
Agent = Model + Harness. In 2026 the model is rarely the bottleneck — the scaffolding around it is. Here's what a production-grade SRE harness actually contains, with a ~40-line reference implementation you can run offline: tool orchestration, verification, memory, guardrails, and observability.
-
Observability and incident response — the SRE basics
A primer on the two operational disciplines every SRE team needs to run: observability (logs, metrics, traces) and incident response (roles, severities, blameless postmortems). Includes the practical shape of an incident and how AI is starting to absorb the lower rungs of both.
-
Toil and the 50% rule — what it is, how to measure it, and how to kill it
A primer on toil — the manual, repetitive, automatable work that quietly eats SRE teams. Covers Google's six-part definition, the 50% cap, how to measure toil honestly, and how the 2026 generation of AI agents changes the toil-elimination playbook.
-
SLI, SLO, SLA, and error budgets — the reliability contract explained
A primer on the four numbers every SRE team needs to agree on: Service Level Indicators, Objectives, Agreements, and the error budget that falls out of them. Includes concrete examples, the math behind 'nines,' and what the contract looks like once AI agents start contributing to the burn rate.
-
What is Site Reliability Engineering (SRE)?
A primer on Site Reliability Engineering — what SRE is, where it came from at Google, how it differs from DevOps and Platform Engineering, and the core principles that make it work. Includes a short note on what changes in 2026 as AI moves into the on-call seat.
-
What are vector embeddings?
A short primer on vector embeddings — the numerical representation that lets a computer treat 'the meaning of this text' as something it can search, cluster, and compare. Covers what an embedding actually is, how similarity works, why model choice matters more than retrieval quality, and the production failure modes you only see in evaluation.