12 Claude Code Rules Reduce Error Rate from 41% to 3%

After four CLAUDE.md rules derived from Andrej Karpathy's observations cut Claude's coding error rate from 41% to 11%, the author tested 30 repositories over six weeks and added eight rules to address newer failure scenarios. The result was a further drop to about 3% error, with a compliance rate around 76%.

AI Architecture Hub

May 28, 2026

In January 2026 Andrej Karpathy posted a tweet highlighting three failure modes of Claude Code: implicit assumptions, over‑complexity, and unintended damage to unrelated code. Forrest Zhang extracted these problems into four behavioral rules placed in a CLAUDE.md file and published the file on GitHub. Testing the four‑rule template on 30 repositories over six weeks reduced the observed error rate from roughly 41 % to about 11 % and, in well‑matched scenarios, to under 3 %.

Why the rules matter

In practice, a CLAUDE.md file is often either ignored or allowed to grow too long. When the file exceeds ~200 lines, Anthropic's documentation reports a sharp drop in rule compliance (from ~80 % to ~30 %). Keeping the rule set concise therefore preserves both token budget and session consistency.
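One way to keep the file under that threshold is to check its length automatically, for example from a pre-commit hook. The following is a minimal sketch, not part of any Claude Code tooling: the function names and the hard 200-line default are assumptions based on the figure quoted above.

```python
from pathlib import Path

MAX_LINES = 200  # assumed limit, based on the ~200-line figure quoted above


def within_line_budget(text: str, max_lines: int = MAX_LINES) -> bool:
    """Return True when the rules text has at most max_lines lines."""
    return len(text.splitlines()) <= max_lines


def check_rules_file(path: str = "CLAUDE.md") -> bool:
    """Read the file at path and report whether it is within the limit."""
    return within_line_budget(Path(path).read_text(encoding="utf-8"))
```

Calling `check_rules_file()` from the repository root returns `False` once the file grows past the limit, which a hook could turn into a failed commit.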

Original four rules

Think before coding: Explicitly state assumptions, ask when uncertain, and propose simpler alternatives.

Simplicity first: Use the minimal code needed, avoid speculative features, and simplify whenever a senior engineer would consider the solution overly complex.

Precise edits: Modify only the necessary parts, never “optimise” adjacent code or refactor working code, and follow the existing style.

Goal-driven execution: Define success criteria and iterate until they are met, working toward the outcome rather than following step-by-step instructions.

New failure scenarios (May 2026)

Agent conflicts and hook cascade failures.

Skill‑loading conflicts.

Multi‑step workflow interruptions across sessions.

Additional eight rules

Model only for judgment tasks: Use Claude for classification, summarisation, and information extraction; leave routing, retry logic, and deterministic transformations to code.

Strict token budget: Limit a single task to 4 000 tokens and an entire session to 30 000 tokens; when approaching the limit, summarise and restart instead of silently exceeding it.

Expose conflicts, no compromise: When two code-base patterns clash, choose one, explain the choice, and mark the other for removal; never mix conflicting patterns.

Read before writing: Before adding code, review exported items, direct callers, and shared utilities; if the surrounding code is unclear, ask clarifying questions.

Test intent, not just behaviour: Tests must reflect why a behaviour matters; a test that can never fail verifies nothing.

Checkpoint after each major step: Summarise completed work, verified items, and remaining tasks; pause if the state cannot be clearly described.

Follow repository conventions: Respect existing naming, component style, and lint rules even if personal preferences differ; propose changes explicitly.

Explicit failure reporting: Never claim success when data has been silently skipped; surface any uncertainty instead of hiding it.
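In CLAUDE.md the token budget is an instruction to the model, but the same limits can also be tracked on the harness side. The sketch below is one possible shape for that, using the 4 000 and 30 000 figures from the rule above. The class name, the 90 % "approaching the limit" threshold, and the returned status strings are all hypothetical, and the token counts are assumed to come from wherever the caller already measures usage.

```python
from dataclasses import dataclass

TASK_LIMIT = 4_000      # per-task budget from the rule above
SESSION_LIMIT = 30_000  # per-session budget from the rule above


@dataclass
class TokenBudget:
    """Tracks token usage for the current task and for the whole session."""
    task_used: int = 0
    session_used: int = 0
    warn_ratio: float = 0.9  # hypothetical threshold for "approaching the limit"

    def start_task(self) -> None:
        """Reset the per-task counter; session usage keeps accumulating."""
        self.task_used = 0

    def record(self, tokens: int) -> str:
        """Add usage and return 'ok', 'summarise', or 'over'."""
        self.task_used += tokens
        self.session_used += tokens
        if self.task_used > TASK_LIMIT or self.session_used > SESSION_LIMIT:
            return "over"
        if (self.task_used >= self.warn_ratio * TASK_LIMIT
                or self.session_used >= self.warn_ratio * SESSION_LIMIT):
            return "summarise"
        return "ok"
```

A caller would summarise and restart on `"summarise"` and treat `"over"` as an explicit error, so the limit is never exceeded silently.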

Experimental results

Across six weeks, 50 representative tasks were run on the same 30 repositories under three configurations (baseline, four‑rule, twelve‑rule). Metrics recorded were:

Error rate: proportion of tasks requiring correction or rewrite.

Compliance rate: how often Claude followed the explicit rules.

Expanding from four to twelve rules kept compliance roughly level (78 % → 76 %) while the error rate fell a further 8 percentage points, from ~11 % to ~3 % overall. The four-rule configuration had already lowered the error rate from 41 % to ~11 %; the eight new rules covered the remaining failure scenarios without a meaningful loss of compliance.
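To make the arithmetic behind these figures explicit, the short sketch below separates percentage-point drops from relative reductions. It uses only the percentages reported above; the function names are illustrative, and no per-task counts are assumed because the article does not give them.

```python
def error_rate(corrected: int, total: int) -> float:
    """Share of tasks that needed a correction or rewrite."""
    if total <= 0:
        raise ValueError("total must be positive")
    return corrected / total


def point_drop(before_pct: float, after_pct: float) -> float:
    """Absolute change, in percentage points."""
    return before_pct - after_pct


def relative_drop(before_pct: float, after_pct: float) -> float:
    """Change as a fraction of the starting value."""
    return (before_pct - after_pct) / before_pct


# Error rates reported in the article, in percent.
BASELINE, FOUR_RULES, TWELVE_RULES = 41.0, 11.0, 3.0

print(point_drop(BASELINE, FOUR_RULES))                 # 30.0
print(point_drop(FOUR_RULES, TWELVE_RULES))             # 8.0
print(round(relative_drop(BASELINE, TWELVE_RULES), 3))  # 0.927
```

The last line shows that going from 41 % to 3 % is a relative reduction of roughly 93 %, even though the final step from four to twelve rules accounts for only 8 of the 38 percentage points.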

Unsuccessful attempts

Rules drawn from other sources (Reddit/X rewrites, domain-specific rule sets) were either too narrow to generalise or pushed the rule count past ~12, at which point compliance dropped sharply (to 52 % beyond 14 rules). Tool-specific rules such as “always use ESLint” failed in repositories where the tool was absent.
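The ESLint case suggests making any tool dependency conditional rather than absolute. As a rough illustration of the same idea in code, the sketch below checks whether a tool is on the PATH before relying on it, using the standard library's `shutil.which`. The helper name `lint_command` and the `[tool, "."]` invocation are hypothetical.

```python
import shutil
from typing import Optional


def lint_command(tool: str = "eslint") -> Optional[list[str]]:
    """Return the command to run, or None when the tool is not installed."""
    if shutil.which(tool) is None:
        return None  # caller should report the missing tool, not fail silently
    return [tool, "."]
```

A rule phrased the same way ("run the linter the repository already uses, if one is configured") avoids the failure mode described above.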

Full 12‑rule template

# CLAUDE.md — 12 Rules Template

## Rule 1 — Think before coding
Explicitly state assumptions. Ask when uncertain; don’t guess.
Provide multiple interpretations for ambiguous cases.
Propose simpler solutions when they exist.
Stop if confused and state the unclear part.

## Rule 2 — Simplicity first
Solve with minimal code; no speculative features.
Don’t implement out‑of‑scope functionality; avoid abstraction for one‑off use.
If a senior engineer would consider it overly complex, simplify.

## Rule 3 — Precise edits
Modify only necessary parts; don’t “optimise” adjacent code, comments, or formatting.
Don’t refactor code that works; follow existing style.

## Rule 4 — Goal‑driven execution
Define success criteria; loop until met.
Work toward the defined outcome and iterate rather than following step-by-step prompts.
Clear success criteria enable autonomous loops.

## Rule 5 — Model only for judgment tasks
Applicable: classification, writing, summarisation, information extraction.
**Not applicable**: routing, retry, deterministic transformations.
Let code handle what code can solve.

## Rule 6 — Token budget (not a suggestion)
Single task: 4000 tokens; single session: 30000 tokens.
When near budget, summarise then restart.
Explicitly warn about overflow; don’t silently exceed.

## Rule 7 — Expose conflicts, no compromise
If two patterns conflict, pick one (the more recent or the better-tested), explain why, and mark the other for cleanup.
Never mix conflicting patterns.

## Rule 8 — Read before writing
Before adding code, read exported items, direct callers, and common utilities.
If the surrounding structure is unclear, ask first.
"Seems unrelated" is the most dangerous statement.

## Rule 9 — Test intent, not just behaviour
Tests must show why the behaviour matters, not just what it does.
If a test can’t fail when business logic changes, it’s useless.

## Rule 10 — Checkpoint after each major step
Summarise completed work, verified items, and remaining tasks.
If the state can’t be clearly described, pause and restate.

## Rule 11 — Follow repository conventions
Respect existing naming, component style, and lint rules even if you prefer otherwise.
If you think a convention is harmful, raise it explicitly; don’t change silently.

## Rule 12 — Explicit failure reporting
If content is silently skipped, you cannot claim “migration complete”.
If any test is skipped, you cannot claim “tests passed”.
Default to exposing uncertainty rather than hiding it.
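Outside the template itself, Rule 12 can also be mirrored in the code Claude is asked to write. The sketch below shows one way a migration might record skipped items so that its final status cannot read as complete when something was dropped. The `MigrationReport` class, the `migrate` function, and the "missing id" condition are hypothetical examples, not taken from the original tests.

```python
from dataclasses import dataclass, field


@dataclass
class MigrationReport:
    """Collects outcomes so that skipped items cannot be hidden."""
    migrated: list[str] = field(default_factory=list)
    skipped: dict[str, str] = field(default_factory=dict)  # item -> reason

    def status(self) -> str:
        """Only report 'complete' when nothing was skipped."""
        if self.skipped:
            return (f"incomplete: {len(self.skipped)} skipped, "
                    f"{len(self.migrated)} migrated")
        return f"complete: {len(self.migrated)} migrated"


def migrate(records: list[dict]) -> MigrationReport:
    """Hypothetical migration: records without an 'id' are skipped and logged."""
    report = MigrationReport()
    for index, record in enumerate(records):
        if "id" not in record:
            report.skipped[f"record[{index}]"] = "missing id"
            continue
        report.migrated.append(str(record["id"]))
    return report
```

With this shape, a summary such as "migration complete" is derived from the recorded outcomes instead of being asserted by hand.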

Installation

Two steps:

# 1. Append the original four rules to your CLAUDE.md
#    (-f stops curl from appending an HTML error page if the download fails)
curl -fsSL https://raw.githubusercontent.com/forrestchang/andrej-karpathy-skills/main/CLAUDE.md >> CLAUDE.md

# 2. Paste rules 5-12 from the template above below the appended content

Save the file at the repository root; the >> operator ensures the new rules are appended rather than overwriting existing project‑specific rules.
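Because `>>` always appends, running the installation twice would duplicate the rules. The sketch below shows one way to make the append step idempotent. It is an illustration only: the function names are hypothetical, and using the `## Rule 5` heading as the marker for "already installed" is an assumption.

```python
from pathlib import Path

MARKER = "## Rule 5"  # assumed marker: a heading from the 12-rule template


def append_rules(existing: str, new_rules: str, marker: str = MARKER) -> str:
    """Return the file content with new_rules appended at most once."""
    if marker in existing:
        return existing  # already installed; leave the content untouched
    parts = [existing.rstrip("\n"), new_rules.strip("\n")]
    return "\n\n".join(part for part in parts if part) + "\n"


def install(path: str, new_rules: str) -> None:
    """Append new_rules to the file at path, creating it if necessary."""
    target = Path(path)
    existing = target.read_text(encoding="utf-8") if target.exists() else ""
    target.write_text(append_rules(existing, new_rules), encoding="utf-8")
```

Existing project-specific rules are preserved in both cases; the only difference from a plain `>>` is that a second run becomes a no-op.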

Core understanding

CLAUDE.md is a behavioral contract that directly addresses observed failure modes. The original four rules mitigate implicit assumptions, over‑design, unrelated code damage, and weak success criteria. The eight new rules add token budgeting, multi‑step checkpoints, test‑quality enforcement, and explicit failure reporting to handle the agent‑driven, multi‑repo workflows that emerged by May 2026.

Conclusion

Extending the rule set from four to twelve lowered the error rate from about 11 % to about 3 % (down from 41 % with no rules) while keeping compliance around 76 %. The twelve‑rule template provides a concrete, reproducible contract for Claude Code projects across diverse codebases.
