Skills for AI agents that do SRE work

Most agent skills are chatbot prompts in disguise. The ones I just published are operator tools — opinionated, output-contracted, with mandatory discipline sections that say what the skill won't do. Three skills, portable across Claude Code, Claude Desktop, Codex CLI, and any markdown-prompt runtime.


There are now thousands of “agent skill” libraries floating around the AI tooling space. Most of them are chatbot prompts in a directory.

A skill that says “You are a helpful SRE assistant. Be concise.” is not an operator tool. It’s a vibe. The agent following it will produce confidently-wrong output the first time the input is sparse, hallucinate runbook URLs the first time the alert lacks an annotation, and quietly attribute blame the first time a postmortem mentions a person by name. None of that is what you want at 2 a.m.

I published cloudandsre-skills yesterday with a different premise: a skill is an operator tool, and operator tools have output contracts and discipline sections. Three skills are in v0.1.0, all of them shaped by the same SRE patterns I’ve been writing about all month. Apache-2.0, portable across Claude Code, Claude Desktop, Codex CLI, and any agent runtime that consumes a markdown prompt.

This post is about the design of those skills — what makes one of them different from the typical prompt template, and why each rule earns its place.


What’s in v0.1.0

| Skill | Job |
| --- | --- |
| prometheus-alert-explain | Alertmanager / Prometheus alert payload → structured triage brief |
| incident-postmortem-draft | Slack thread or message log → blameless postmortem markdown |
| waf-pattern-review | Code, architecture, or design doc → opinionated reliability review |

Each one corresponds to a real production discipline I’ve shipped or reviewed. prometheus-alert-explain is the prompt-shaped sibling of alert-explainer, the service I released yesterday. incident-postmortem-draft is the prompt-shaped sibling of incident-scribe, released last week. waf-pattern-review is the brand thesis distilled — it’s what a senior SRE asks when someone shows them a design.


The six rules every skill follows

Each one earns its place. None of them are aesthetic preferences.

1. One job per skill

A skill that does “explain alerts and draft postmortems and review architectures” is three skills wearing a trench coat. When an agent has to decide which mode of a many-mode skill to invoke, it picks wrong roughly half the time. A small library of single-job skills is more useful than a large library of multi-tools, the same way a Unix toolbox of grep + awk + sort is more useful than a single super-program with sixty flags.

2. Output contract is non-negotiable

Every skill specifies the exact sections of its output. Not “summarize the alert in a useful way” but “produce four sections in this order: Summary, Likely causes, Triage checklist, False-positive check, ending with a confidence line.”

The reason isn’t aesthetics; it’s chainability. If the next skill in a pipeline (or the next LLM call, or a downstream parser) needs to extract the triage steps, it can — because the section name is fixed. Free-form output makes a skill useless to chain. Once you have one skill in production, you’ll want a second one. The output contract is what makes that possible.
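To make the chainability point concrete, here is a minimal sketch of a downstream consumer. It assumes the skill renders each contract section as a markdown `## <name>` heading — an assumption about the rendering, not something the repo specifies — but the idea holds for any fixed section marker:

```python
import re

def extract_section(report: str, name: str) -> str:
    """Return the body of one named section from a skill's output.

    Assumes each contract section is rendered as a '## <name>'
    markdown heading (an illustrative assumption; the actual
    skills may use a different marker).
    """
    pattern = rf"^## {re.escape(name)}\n(.*?)(?=^## |\Z)"
    match = re.search(pattern, report, flags=re.MULTILINE | re.DOTALL)
    if match is None:
        raise ValueError(f"section {name!r} missing: output contract violated")
    return match.group(1).strip()

# A report shaped like the four-section contract described above.
report = """\
## Summary
Disk pressure on node-7.

## Likely causes
Log volume growth outpacing rotation.

## Triage checklist
1. Check df -h on node-7.
2. Inspect /var/log growth rate.

## False-positive check
Confirm the node_exporter mount filter excludes tmpfs.
"""

steps = extract_section(report, "Triage checklist")
```

Note that a missing section raises instead of returning an empty string: a contract violation should fail loudly, not propagate silence down the pipeline.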

3. Discipline section is mandatory

Every skill has a Discipline section that says what the skill will not do.

  • prometheus-alert-explain will not propose remediation actions. It tells the engineer what to investigate, never what to mutate. This is the same line alert-explainer holds: the AI layer is additive, never load-bearing. An agent that confidently says “restart the pod” is exactly the agent you don’t want anywhere near production.
  • incident-postmortem-draft will not record proper names in the timeline. The timeline takes actor_role, never actor_name. The schema enforces blamelessness — if a person’s name doesn’t have a field to live in, it can’t appear. A human reviewer can add names back if the team’s culture wants attribution; the default artifact is structurally neutral.
  • waf-pattern-review will not pile on patterns the system doesn’t need. Most services do not need Event Sourcing or Leader Election. Recommending Bulkhead when there’s no shared-fate concern is operational surface area you charge a future team to maintain.
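The "no field for a name to live in" idea is worth seeing in code. This is a hypothetical sketch of a role-only timeline entry — `TimelineEntry` and its field names are mine, not the repo's actual schema — but it shows how structure, rather than instruction, enforces blamelessness:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TimelineEntry:
    """One postmortem timeline event.

    Illustrative sketch of a role-only schema: there is deliberately
    no actor_name field, so a person's name has nowhere to live.
    """
    timestamp: str   # ISO-8601, e.g. "2025-01-14T02:13:00Z"
    actor_role: str  # "on-call SRE", "incident commander", ...
    action: str      # what was done, in neutral voice

entry = TimelineEntry(
    timestamp="2025-01-14T02:13:00Z",
    actor_role="on-call SRE",
    action="Acknowledged the page and began triage.",
)
```

A prompt that says "don't use names" can be ignored under pressure; a schema without the field cannot.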

The discipline section is where the production experience lives. Without it, the skill is a prompt template.

4. No emoji, no boldface theater

Operators read these at 2 a.m. A skill that opens with 🚨🔥 ALERT ANALYSIS COMPLETE 🔥🚨 is rude. The output should look like a report, not a notification.

5. Examples are mandatory

Every skill ships with at least one tiny input → output example so a reader can verify the skill is doing what its description claims. The example is the only contract that survives a refactor — the body of a prompt can drift, but if the example still parses and still matches the output contract, the skill is still working.
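Concretely, a skill-level example is small enough to read in one glance. This is a hypothetical sketch of the shape, not a copy of any shipped example:

```markdown
### Example

Input (Alertmanager payload, truncated):
{"labels": {"alertname": "KubePodCrashLooping", "severity": "warning"}}

Output (must match the contract sections, in order):
## Summary
A pod is crash-looping; severity warning.
## Likely causes
...
```

If a refactor breaks this pairing, the skill's description is lying, and the example is what catches it.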

6. Stay simple

A skill body longer than ~250 lines is doing too much. Either split it into two skills, or move the long parts into a referenced doc. The simplicity philosophy I keep coming back to: graduate when the system demands it, not when a checklist says so.


Why this is portable

SKILL.md is intentionally plain markdown with YAML frontmatter. The frontmatter is the metadata an agent runtime needs to decide whether to invoke; the body is the prompt the agent uses once invoked. Both pieces are stable across runtimes:
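A minimal SKILL.md might look like this. The field values and the condensed body are illustrative; only the frontmatter structure and the name/description fields are what the lint actually requires:

```markdown
---
name: prometheus-alert-explain
description: Turn an Alertmanager/Prometheus alert payload into a structured triage brief.
---

You are given an Alertmanager alert payload. Produce exactly four
sections, in this order: Summary, Likely causes, Triage checklist,
False-positive check. End with a confidence line. Do not propose
remediation actions.
```

Everything above the closing `---` is metadata for the runtime; everything below it is the prompt. That split is the whole portability story.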

  • Claude Code drops SKILL.md files into .claude/skills/<name>/ and discovers them automatically.
  • Claude Desktop has a Skills feature that imports SKILL.md files (or zipped bundles).
  • Codex CLI doesn’t have a “skill” abstraction, but it consumes prompt files. The body of SKILL.md is a prompt; copy it in and it works.
  • Other agent SDKs that handle markdown prompts (cline, cursor’s .cursorrules analogues, Continue, etc.) work the same way.

The lint script scripts/lint_skills.py enforces the contract: every skill has frontmatter with the required name and description fields, the name matches its directory, the description fits within Anthropic’s 1024-character limit, and the body is non-trivial. CI runs the lint on every push.
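The checks are simple enough to sketch. This is a simplified stand-in for scripts/lint_skills.py, not the script itself — in particular, the "non-trivial body" threshold and the naive frontmatter parsing are my assumptions:

```python
from pathlib import Path

MAX_DESCRIPTION = 1024  # Anthropic's description limit
MIN_BODY_CHARS = 200    # "non-trivial" threshold; illustrative, not the repo's

def lint_skill(skill_dir: Path) -> list[str]:
    """Return a list of contract violations for one skill directory."""
    errors = []
    text = (skill_dir / "SKILL.md").read_text()
    if not text.startswith("---\n"):
        return ["missing YAML frontmatter"]
    frontmatter, _, body = text[4:].partition("\n---\n")
    # Naive key: value parsing; a real linter would use a YAML library.
    fields = {}
    for line in frontmatter.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    if "name" not in fields or "description" not in fields:
        errors.append("frontmatter needs both name and description")
    if fields.get("name") != skill_dir.name:
        errors.append("name must match the directory name")
    if len(fields.get("description", "")) > MAX_DESCRIPTION:
        errors.append("description exceeds the 1024-character limit")
    if len(body.strip()) < MIN_BODY_CHARS:
        errors.append("body is too short to be a real skill")
    return errors
```

The point is that every rule in the contract is mechanically checkable; nothing about the format depends on a reviewer remembering to look.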


Try it in two minutes

git clone https://github.com/ajinb/cloudandsre-skills.git
mkdir -p .claude/skills
cp -r cloudandsre-skills/skills/* .claude/skills/

Restart Claude Code, then paste an Alertmanager payload (or a Slack incident thread, or a design doc) and watch the right skill engage.

If you write a skill that’s more opinionated than the ones I shipped — has a tighter discipline section, a smaller scope, a sharper output contract — open a PR. The contribution bar is “your skill is more useful at 2 a.m. than the alternative I’d write myself.”


Ajin Baby is an AI Platform & Cloud Infrastructure Architect at a Fortune-500 aviation technology company and the founder of cloudandsre.com, where he publishes production-grade tooling at the intersection of AI and SRE. He is currently pursuing an MS in Artificial Intelligence and is a 15+ year IEEE member.