12 Claude Code Rules Reduce Error Rate from 41% to 3%
After Karpathy's original four CLAUDE.md rules cut Claude's coding error rate from 41% to 11%, the author tested 30 repositories over six weeks, added eight new rules to address emerging failure scenarios, and demonstrated a further drop to 3% error with a compliance rate around 76%, supported by detailed metrics and real‑world examples.
In January 2026 Andrej Karpathy posted a tweet highlighting three failure modes of Claude Code: implicit assumptions, over‑complexity, and unintended damage to unrelated code. Forrest Zhang extracted these problems into four behavioral rules placed in a CLAUDE.md file and published the file on GitHub. Testing the four‑rule template on 30 repositories over six weeks reduced the observed error rate from roughly 41 % to about 11 % and, in well‑matched scenarios, to under 3 %.
Why the rules matter
Claude Code’s CLAUDE.md is often ignored or inflated. When the file exceeds ~200 lines, Anthropic’s documentation reports a sharp drop in rule compliance (from ~80 % to ~30 %). Maintaining a concise rule set therefore preserves token budget and session consistency.
Original four rules
Think before coding : Explicitly state assumptions, ask when uncertain, and propose simpler alternatives.
Simplicity first : Use the minimal code needed, avoid speculative features, and simplify if senior engineers deem the solution overly complex.
Precise edits : Modify only the necessary parts, never “optimise” adjacent code or refactor working code; follow the existing style.
Goal‑driven execution : Define success criteria, iterate until the goal is met, and let Claude aim for the outcome rather than following step‑by‑step instructions.
New failure scenarios (May 2026)
Agent conflicts and hook cascade failures.
Skill‑loading conflicts.
Multi‑step workflow interruptions across sessions.
Additional eight rules
Model only for judgment tasks : Use Claude for classification, summarisation, and information extraction; avoid routing, retry logic, or deterministic transformations that should be handled by code.
Strict token budget : Limit a single task to 4 000 tokens and an entire session to 30 000 tokens; when approaching the limit, summarise and restart instead of silently exceeding.
Expose conflicts, no compromise : When two code‑base patterns clash, choose one, explain the choice, and mark the other for removal; never mix conflicting patterns.
Read before writing : Before adding code, review exported items, direct callers, and shared utilities; if the surrounding code is unclear, ask clarifying questions.
Test intent, not just behaviour : Tests must reflect why a behaviour matters; trivial tests that always pass are ineffective.
Checkpoint after each major step : Summarise completed work, verified items, and remaining tasks; pause if the state cannot be clearly described.
Follow repository conventions : Respect existing naming, component style, and lint rules even if personal preferences differ; propose changes explicitly.
Explicit failure reporting : Never claim success when data is silently skipped; surface any uncertainty instead of hiding it.
Experimental results
Across six weeks, 50 representative tasks were run on the same 30 repositories under three configurations (baseline, four‑rule, twelve‑rule). Metrics recorded were:
Error rate : proportion of tasks requiring correction or rewrite.
Compliance rate : frequency Claude followed the explicit rules.
Expanding from four to twelve rules kept compliance high (78 % → 76 %) while the error rate fell an additional 8 percentage points, reaching ~3 % overall. The four‑rule baseline already lowered error from 41 % to ~11 %; the eight new rules covered the previously uncovered failure scenarios without sacrificing model attention.
Unsuccessful attempts
Alternative rule sources (Reddit/X rewrites, domain‑specific rules) were either too narrow or caused compliance to drop sharply when the rule count exceeded ~12 (compliance fell to 52 % beyond 14 rules). Over‑specific tools such as “always use ESLint” failed when the tool was absent.
Full 12‑rule template
# CLAUDE.md — 12 Rules Template
## Rule 1 — Think before coding
Explicitly state assumptions. Ask when uncertain; don’t guess.
Provide multiple interpretations for ambiguous cases.
Propose simpler solutions when they exist.
Stop if confused and state the unclear part.
## Rule 2 — Simplicity first
Solve with minimal code; no speculative features.
Don’t implement out‑of‑scope functionality; avoid abstraction for one‑off use.
If senior engineers deem it overly complex, simplify.
## Rule 3 — Precise edits
Modify only necessary parts; don’t “optimise” adjacent code, comments, or formatting.
Don’t refactor code that works; follow existing style.
## Rule 4 — Goal‑driven execution
Define success criteria; loop until met.
Don’t follow step‑by‑step prompts; define the desired outcome and iterate.
Clear success criteria enables autonomous loops.
## Rule 5 — Model only for judgment tasks
Applicable: classification, writing, summarisation, information extraction.
**Not applicable**: routing, retry, deterministic transformations.
Let code handle what code can solve.
## Rule 6 — Token budget (non‑suggestive)
Single task: 4000 tokens; single session: 30000 tokens.
When near budget, summarise then restart.
Explicitly warn about overflow; don’t silently exceed.
## Rule 7 — Expose conflicts, no compromise
If two patterns conflict, pick one (updated or better‑tested), explain why, and mark the other for cleanup.
Never mix conflicting patterns.
## Rule 8 — Read before writing
Before adding code, read exported items, direct callers, and common utilities.
If the surrounding structure is unclear, ask first.
"Seems unrelated" is the most dangerous statement.
## Rule 9 — Test intent, not just behaviour
Tests must show why the behaviour matters, not just what it does.
If a test can’t fail when business logic changes, it’s useless.
## Rule 10 — Checkpoint after each major step
Summarise completed work, verified items, and remaining tasks.
If the state can’t be clearly described, pause and restate.
## Rule 11 — Follow repository conventions
Respect existing naming, component style, and lint rules even if you prefer otherwise.
If you think a convention is harmful, raise it explicitly; don’t change silently.
## Rule 12 — Explicit failure reporting
If content is silently skipped, you cannot claim “migration complete”.
If any test is skipped, you cannot claim “tests passed”.
Default to exposing uncertainty rather than hiding it.Installation
Two steps:
# 1. Append Karpathy’s original four rules to your CLAUDE.md
curl https://raw.githubusercontent.com/forrestchang/andrej-karpathy-skills/main/CLAUDE.md >> CLAUDE.md
# 2. Paste rules 5–12 from the template below the existing contentSave the file at the repository root; the >> operator ensures the new rules are appended rather than overwriting existing project‑specific rules.
Core understanding
CLAUDE.mdis a behavioral contract that directly addresses observed failure modes. The original four rules mitigate implicit assumptions, over‑design, unrelated code damage, and weak success criteria. The eight new rules add token budgeting, multi‑step checkpoints, test‑quality enforcement, and explicit failure reporting to handle the agent‑driven, multi‑repo workflows that emerged by May 2026.
Conclusion
Extending the rule set from four to twelve reduced the overall error rate from 41 % to 3 % while keeping compliance around 76 %. The twelve‑rule template provides a concrete, reproducible contract for Claude Code projects across diverse codebases.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
AI Architecture Hub
Focused on sharing high-quality AI content and practical implementation, helping people learn with fewer missteps and become stronger through AI.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
