
Taming AI Coding Agents: A Powerful Development Workflow with Engineering Discipline

The article introduces Matt Pocock's open-source "skills" collection for AI coding agents, shows how it builds traditional engineering practices such as requirement alignment, domain modeling, TDD, and architecture governance into reusable commands, and walks through implementing a complete partial-refund feature with them.

BirdNest Tech Talk

May 18, 2026


Overview

Matt Pocock, a well-known figure in the TypeScript community, open-sourced his personal set of AI coding skills (.claude skills) in the repository mattpocock/skills. The collection has quickly amassed 89K+ stars and 7.8K forks, a sign of strong community interest.

These skills are not simple prompt templates. They are engineered workflows that build core software-engineering disciplines into how the AI agent works: requirement alignment, domain modeling, test-driven development (TDD), and architecture governance.

Getting Started

Installation is a single command:

npx skills@latest add mattpocock/skills

After selecting the skills you want and your target agent (Claude Code, Codex, Cursor, Copilot, Windsurf, etc.), run the one-time setup command inside the agent:

/setup-matt-pocock-skills

It asks for your issue tracker, label set, and documentation path, then configures the entire suite.

Skill Structure

Each skill is a directory containing a SKILL.md with YAML front‑matter (name, description) and a Markdown workflow body. Optional files include REFERENCE.md, EXAMPLES.md, or a scripts/ folder. The agent discovers skills via the front‑matter description.
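
The discovery step can be illustrated with a small TypeScript sketch that reads a SKILL.md file. It is not how Claude Code or any other agent actually loads skills: the SkillMeta type and parseSkill function are hypothetical, the example file contents are invented, and the parser handles only flat "key: value" front-matter rather than full YAML.

```typescript
// Hypothetical shape of a parsed skill; the article names only the
// `name` and `description` front-matter fields plus the Markdown body.
interface SkillMeta {
  name: string;
  description: string;
  body: string;
}

// Minimal front-matter reader. It understands only flat `key: value` lines,
// not full YAML (no nesting, lists, or multi-line strings).
function parseSkill(markdown: string): SkillMeta {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/.exec(markdown);
  if (!match) {
    throw new Error("SKILL.md is missing its YAML front-matter block");
  }
  const header = match[1] ?? "";
  const body = match[2] ?? "";

  const fields: Record<string, string> = {};
  for (const line of header.split(/\r?\n/)) {
    const colon = line.indexOf(":");
    if (colon === -1) continue; // skip blank or malformed lines
    const key = line.slice(0, colon).trim();
    // Strip one pair of surrounding quotes, if present.
    const value = line.slice(colon + 1).trim().replace(/^["']|["']$/g, "");
    fields[key] = value;
  }

  const { name, description } = fields;
  if (!name || !description) {
    throw new Error("front-matter must define both name and description");
  }
  return { name, description, body: body.trim() };
}

// Example input (illustrative only, not copied from the repository).
const example = [
  "---",
  "name: tdd",
  "description: Test-driven development loop. Use when building a feature test-first.",
  "---",
  "# TDD",
  "1. Write one failing test through the public interface.",
].join("\n");

const skill = parseSkill(example);
console.log(skill.name); // tdd
```

An agent could then compare a user's request against each skill's description to decide which body to load; that matching step is deliberately left out of the sketch.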

Skills Classification

Engineering (11 skills)

/setup-matt-pocock-skills – one-time configuration wizard.

/triage – state-machine issue workflow that labels issues as bugs or enhancements and moves them through states such as needs-triage, needs-info, ready-for-agent, ready-for-human, and wontfix.

/grill-me – general-purpose questioning engine; the most popular skill (157K installs).

/grill-with-docs – enhanced version of /grill-me that consults CONTEXT.md and docs/adr/, challenges terminology, pressure-tests answers, cross-checks them against the code, updates the terminology, and creates an ADR when a decision meets three criteria: it is hard to reverse, it would be hard to understand without context, and it involves a real trade-off.

/tdd – full TDD loop (RED → GREEN → REFACTOR) with a checklist ensuring that tests verify behavior through public interfaces.

/diagnose – disciplined debugging loop: reproduce → minimize → hypothesize → instrument → fix → regression-test.

/improve-codebase-architecture – applies Ousterhout's deep-module theory, uses a "deletion test" to identify shallow modules, then runs an interactive redesign via /grill-with-docs.

/to-prd – synthesizes a PRD from the current conversation and submits it as a GitHub Issue.

/to-issues – splits a PRD into vertically sliced, independently deliverable issues.

/zoom-out – gives a high-level explanation of an unfamiliar section of code.

/prototype – creates disposable prototypes for design validation.
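
Because /triage is described as a state machine, its core idea fits in a few lines of TypeScript. The five state names come from the description above; the allowed transitions, the Issue type, and the moveIssue helper are assumptions made for illustration, not the skill's actual rules.

```typescript
// State names as listed for /triage. The transition table is an
// illustrative assumption, not the skill's real rule set.
type TriageState =
  | "needs-triage"
  | "needs-info"
  | "ready-for-agent"
  | "ready-for-human"
  | "wontfix";

const transitions: Record<TriageState, readonly TriageState[]> = {
  "needs-triage": ["needs-info", "ready-for-agent", "ready-for-human", "wontfix"],
  "needs-info": ["needs-triage", "wontfix"], // reporter answered, or never did
  "ready-for-agent": ["ready-for-human"],    // agent hit a decision it cannot make
  "ready-for-human": ["ready-for-agent"],    // a human clarified the scope
  "wontfix": [],                             // terminal state
};

interface Issue {
  id: number;
  kind: "bug" | "enhancement";
  state: TriageState;
}

// Returns a copy of the issue in the target state, or throws on an illegal move.
function moveIssue(issue: Issue, to: TriageState): Issue {
  if (!transitions[issue.state].includes(to)) {
    throw new Error(`illegal transition ${issue.state} -> ${to} for #${issue.id}`);
  }
  return { ...issue, state: to };
}

let issue: Issue = { id: 42, kind: "bug", state: "needs-triage" };
issue = moveIssue(issue, "needs-info");
issue = moveIssue(issue, "needs-triage");
issue = moveIssue(issue, "ready-for-agent");
```

Encoding the transitions as data keeps illegal moves (for example, reopening a wontfix issue straight into ready-for-agent) from happening silently.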

Productivity (4 skills)

/caveman – ultra-compressed response mode that reduces token usage by ~75%.

/handoff – packages the current context into a handoff document so work can continue seamlessly in another session or with another agent.

/write-a-skill – scaffolds a new skill with proper YAML front-matter and progressive disclosure.

Misc (4 tools)

/git-guardrails-claude-code – adds confirmation prompts before dangerous Git commands.

/setup-pre-commit – installs Husky pre-commit hooks for linting, formatting, type-checking, and testing.

/migrate-to-shoehorn – migrates TypeScript "as" type assertions to @total-typescript/shoehorn.

/scaffold-exercises – creates a directory structure for teaching exercises.
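
The core of a guardrail like /git-guardrails-claude-code is deciding which commands deserve a confirmation prompt. The TypeScript sketch below shows only that classification step, using a hypothetical deny-list (dangerousPatterns and needsConfirmation are invented names); the skill's real command list, and the way it hooks into Claude Code, may differ.

```typescript
// Hypothetical deny-list of destructive Git invocations.
const dangerousPatterns: RegExp[] = [
  /\bgit\s+push\b.*\s(?:--force|-f)\b/,             // force push (also catches --force-with-lease)
  /\bgit\s+reset\b.*\s--hard\b/,                    // discards uncommitted work
  /\bgit\s+clean\b.*\s(?:-[a-zA-Z]*f|--force)/,     // deletes untracked files
  /\bgit\s+branch\b.*\s-D\b/,                       // force-deletes a branch
];

// True when the command should be confirmed by a human before it runs.
function needsConfirmation(command: string): boolean {
  return dangerousPatterns.some((pattern) => pattern.test(command));
}

for (const cmd of ["git status", "git push --force origin main", "git reset --hard HEAD~1"]) {
  console.log(needsConfirmation(cmd) ? `confirm before running: ${cmd}` : `allowed: ${cmd}`);
}
```

A deny-list errs on the side of asking too often; an allow-list would be stricter but would interrupt routine commands far more.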

Complete Development Loop Example

Phase 1: Requirement Alignment

Input: "Our order system needs partial refunds; currently only full refunds are supported." The agent runs /grill-with-docs, scans CONTEXT.md for existing terms (Order, Refund, LineItem), and asks clarifying questions about terminology, refund granularity, inventory handling, and interaction with the payment gateway.

The answers are written back to CONTEXT.md (adding the term PartialRefund), and any decision that meets the three-condition threshold is recorded as an ADR.
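
The three-condition ADR threshold is effectively a boolean AND. The TypeScript sketch below encodes it; the Decision type, the warrantsAdr helper, and both sample decisions are hypothetical stand-ins for answers gathered during the partial-refund session, and the field names paraphrase the criteria listed for /grill-with-docs.

```typescript
// Hypothetical record of a decision reached while grilling the partial-refund request.
interface Decision {
  summary: string;
  hardToReverse: boolean;
  unclearWithoutContext: boolean;
  realTradeOff: boolean;
}

// The three-condition threshold: an ADR is written only when all three hold.
// Decisions below the threshold are not written up as ADRs.
function warrantsAdr(d: Decision): boolean {
  return d.hardToReverse && d.unclearWithoutContext && d.realTradeOff;
}

const decisions: Decision[] = [
  {
    summary: "Call the new concept PartialRefund in CONTEXT.md",
    hardToReverse: false, // renaming a glossary term is cheap
    unclearWithoutContext: false,
    realTradeOff: false,
  },
  {
    summary: "Track partial refunds per LineItem rather than as a free-form order amount",
    hardToReverse: true, // shapes the data model and stored records
    unclearWithoutContext: true,
    realTradeOff: true, // simpler reconciliation vs. less flexibility
  },
];

const adrTitles = decisions.filter(warrantsAdr).map((d) => d.summary);
console.log(adrTitles);
```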

Phase 2: Task Splitting

Running /to-prd creates a structured PRD and submits it as a GitHub Issue. Then /to-issues splits the PRD into vertical slices such as "partial refund per LineItem", "refund amount validation", and "refund status tracking".

Phase 3: Implementation (TDD)

For the "refund amount validation" issue, the agent starts /tdd. It plans the public interface, writes a failing test for an over-refund (a refund larger than the amount paid), then creates refundService.requestPartialRefund() with just enough logic to make the test pass. The RED → GREEN → REFACTOR cycle then repeats for further edge cases.
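
To make the loop concrete, here is a hedged TypeScript sketch of what the first cycles for "refund amount validation" might produce. Only the name refundService.requestPartialRefund() comes from the example above; the Order and LineItem shapes, the use of integer cents, the refundError helper, and the hand-rolled expectRefundError function (standing in for a test runner such as Vitest or Jest) are all assumptions.

```typescript
// Hypothetical domain types; only refundService.requestPartialRefund() is named in the article.
interface LineItem {
  id: string;
  paidCents: number;     // amount captured for this line
  refundedCents: number; // amount already refunded
}

interface Order {
  id: string;
  lineItems: LineItem[];
}

function refundError(message: string): Error {
  const err = new Error(message);
  err.name = "RefundError";
  return err;
}

// GREEN: just enough logic to pass the tests below. Returns an updated copy
// of the line item instead of mutating the order.
const refundService = {
  requestPartialRefund(order: Order, lineItemId: string, amountCents: number): LineItem {
    const item = order.lineItems.find((li) => li.id === lineItemId);
    if (!item) throw refundError(`unknown line item ${lineItemId}`);
    if (!Number.isInteger(amountCents) || amountCents <= 0) {
      throw refundError("refund amount must be a positive whole number of cents");
    }
    const refundable = item.paidCents - item.refundedCents;
    if (amountCents > refundable) {
      throw refundError(`refund of ${amountCents} exceeds refundable ${refundable}`);
    }
    return { ...item, refundedCents: item.refundedCents + amountCents };
  },
};

// Stand-in for a test runner's toThrow() matcher.
function expectRefundError(fn: () => unknown): void {
  try {
    fn();
  } catch (e) {
    if (e instanceof Error && e.name === "RefundError") return;
    throw e;
  }
  throw new Error("expected a RefundError, but nothing was thrown");
}

// In TDD order, these tests are written (and fail) before the implementation above exists.
const order: Order = {
  id: "order-1",
  lineItems: [{ id: "li-1", paidCents: 5000, refundedCents: 3000 }],
};

// Cycle 1: over-refund (only 2000 cents remain refundable).
expectRefundError(() => refundService.requestPartialRefund(order, "li-1", 2500));
// Cycle 2: zero and negative amounts.
expectRefundError(() => refundService.requestPartialRefund(order, "li-1", 0));
expectRefundError(() => refundService.requestPartialRefund(order, "li-1", -100));
// Happy path: refunding exactly the remainder succeeds.
const updated = refundService.requestPartialRefund(order, "li-1", 2000);
console.log(updated.refundedCents); // 5000
```

Storing money as integer cents is a choice made for this sketch; it keeps refund arithmetic free of floating-point rounding.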

Phase 4: Debugging

A production bug (duplicate refunds) triggers /diagnose. The agent guides the user through reproducing the issue, minimizing the reproduction steps, hypothesizing a race condition, instrumenting the code with logs, fixing it with a distributed lock, and adding a regression test.
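
The race described above is a classic check-then-act bug. The hedged TypeScript sketch below reproduces it in a single process and shows the shape of the fix. KeyedLock is a hypothetical stand-in for the distributed lock (a real one would live in shared storage so every service instance sees it), and the ledger, request IDs, and simulated gateway delay are illustrative.

```typescript
// Hypothetical single-process stand-in for the distributed lock. A real fix needs
// a lock in shared storage (for example a Redis key written with SET ... NX PX)
// so every service instance sees it; this Set only guards one process.
class KeyedLock {
  private readonly held = new Set<string>();

  tryAcquire(key: string): boolean {
    if (this.held.has(key)) return false;
    this.held.add(key);
    return true;
  }

  release(key: string): void {
    this.held.delete(key);
  }
}

interface RefundLedger {
  completed: Set<string>; // refund request IDs already paid out
  payouts: number;        // how many times the simulated gateway was called
}

const gatewayDelay = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Buggy: check-then-act with an await in between, so two concurrent
// requests can both pass the check before either records the payout.
async function refundUnsafe(ledger: RefundLedger, requestId: string): Promise<void> {
  if (ledger.completed.has(requestId)) return;
  await gatewayDelay(10);
  ledger.payouts += 1;
  ledger.completed.add(requestId);
}

// Fixed: hold a per-request lock across both the check and the side effect.
async function refundWithLock(ledger: RefundLedger, lock: KeyedLock, requestId: string): Promise<boolean> {
  if (!lock.tryAcquire(requestId)) return false; // another worker owns this refund
  try {
    if (ledger.completed.has(requestId)) return false; // already paid earlier
    await gatewayDelay(10);
    ledger.payouts += 1;
    ledger.completed.add(requestId);
    return true;
  } finally {
    lock.release(requestId);
  }
}

// Regression scenario: fire the same refund twice concurrently.
async function runScenario(): Promise<{ unsafe: number; locked: number }> {
  const a: RefundLedger = { completed: new Set(), payouts: 0 };
  await Promise.all([refundUnsafe(a, "refund-1"), refundUnsafe(a, "refund-1")]);

  const b: RefundLedger = { completed: new Set(), payouts: 0 };
  const lock = new KeyedLock();
  await Promise.all([refundWithLock(b, lock, "refund-1"), refundWithLock(b, lock, "refund-1")]);

  return { unsafe: a.payouts, locked: b.payouts };
}

runScenario().then(({ unsafe, locked }) => console.log(unsafe, locked)); // 2 1
```

Swapping KeyedLock for a shared-store lock keeps the same shape: acquire, re-check that the refund has not already been paid, call the gateway, record the payout, release.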

Phase 5: Architecture Governance

A week later, the refund module has accumulated bloated classes. /improve-codebase-architecture scans the code and uses the deletion test to flag RefundCalculator as a shallow module, along with a group of tightly coupled validators. The agent proposes merging the validators, runs /grill-with-docs to redesign the interfaces, and updates CONTEXT.md and the ADRs accordingly.
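
A hedged TypeScript sketch of the deletion test's verdict follows. Only the name RefundCalculator comes from the example; its method, the two validators, and the merged checkRefund function are hypothetical, chosen to show a shallow module (its interface is about as complex as its body) being folded into a deeper one with a single small entry point. This is one possible reading of Ousterhout's idea, not the skill's actual output.

```typescript
interface LineItemBalance {
  paidCents: number;
  refundedCents: number;
}

// BEFORE (shallow): the class adds a name but hides no complexity. Applying the
// deletion test (inlining it at every call site) loses nothing.
class RefundCalculator {
  remaining(item: LineItemBalance): number {
    return item.paidCents - item.refundedCents;
  }
}

// BEFORE (coupled): callers must know to run every validator and to wire in the calculator.
function validatePositive(amountCents: number): string[] {
  return amountCents > 0 ? [] : ["amount must be positive"];
}

function validateWithinBalance(item: LineItemBalance, amountCents: number, calc: RefundCalculator): string[] {
  return amountCents > calc.remaining(item) ? ["amount exceeds refundable balance"] : [];
}

// AFTER (deeper): one small entry point that owns every refund rule.
type RefundCheck =
  | { ok: true; remainingAfterCents: number }
  | { ok: false; errors: string[] };

function checkRefund(item: LineItemBalance, amountCents: number): RefundCheck {
  const remaining = item.paidCents - item.refundedCents;
  const errors: string[] = [];
  if (!(amountCents > 0)) errors.push("amount must be positive");
  if (amountCents > remaining) errors.push("amount exceeds refundable balance");
  return errors.length > 0
    ? { ok: false, errors }
    : { ok: true, remainingAfterCents: remaining - amountCents };
}

const item: LineItemBalance = { paidCents: 5000, refundedCents: 3000 };

// Old call site: two validators plus a calculator instance.
const oldErrors = [
  ...validatePositive(2500),
  ...validateWithinBalance(item, 2500, new RefundCalculator()),
];
// New call site: a single call with the same verdict.
const result = checkRefund(item, 2500);
console.log(oldErrors, result);
```

The discriminated union return type lets callers handle success and failure without knowing which individual rules exist.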

Phase 6: Ongoing Tools

/zoom-out – quickly explains unfamiliar code.

/prototype – builds throwaway prototypes.

/caveman – reduces token consumption.

/handoff – hands long tasks off to another session.

Why the Methodology Works

The approach amplifies classic software‑engineering principles:

Requirement alignment responds to the "programmer's dilemma" from The Pragmatic Programmer, which /grill-me tackles by questioning the requester before any code is written.

Ubiquitous Language from Domain-Driven Design is enforced by /grill-with-docs, which keeps the shared vocabulary in CONTEXT.md up to date.

Fast feedback loops from Extreme Programming are realized through the combined /tdd + /diagnose cycle.

Ousterhout's deep-module theory guides /improve-codebase-architecture toward deep modules: small, simple interfaces that hide substantial functionality.

By packaging decades of engineering discipline into reusable markdown‑based commands, the skills turn AI agents into disciplined collaborators rather than uncontrolled code generators.

FAQ Highlights

Skills work with multiple agents (Claude Code, Codex, Cursor, Copilot, Windsurf).

Compared to static .cursorrules, skills are interactive and stateful.

Minimal onboarding path: /setup-matt-pocock-skills → /grill-with-docs → /tdd.

Team sharing is natural because CONTEXT.md and ADRs live in the Git repository.

Creating a new skill only requires a SKILL.md with proper front‑matter; /write-a-skill scaffolds it.

References

GitHub repository: https://github.com/mattpocock/skills

Distribution platform: https://skills.sh/mattpocock/skills

Matt Pocock’s Skills Newsletter: https://www.aihero.dev/s/skills-newsletter

The Pragmatic Programmer (David Thomas & Andrew Hunt)

Domain‑Driven Design (Eric Evans)

A Philosophy of Software Design (John Ousterhout)

Extreme Programming Explained (Kent Beck)


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.


Written by

BirdNest Tech Talk

Author of the rpcx microservice framework, original book author, and chair of Baidu's Go CMC committee.
