Why Context Size Isn’t Everything: A Practical Guide to AI Agent Context Engineering
The article explains that an AI agent’s performance depends less on how much context it is given than on which information sits in the main thread. It presents a six-layer framework (prompt, rule files, skills, MCP, subagents, and artifacts) for systematically engineering the context of reliable coding agents.
Layer 1 – Prompt holds the current task contract
A prompt should contain only the four elements that define the immediate work: Goal, Context, Constraints, and Done‑when criteria. Example (OpenAI Codex best practices [1]):
Goal: Fix the checkout page when the coupon API returns 500 and orders cannot be submitted.
Context: Relevant code is in apps/web/src/checkout, error log at logs/checkout-500.log.
Constraints: Do not modify the payment SDK; keep existing error‑message style; do not add new dependencies.
Done when: New or updated tests pass; local checkout tests run; root cause and risk are documented.
Repeating static constraints in every prompt inflates the main thread. Durable rules like these belong in persistent rule files instead.
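The four-part contract can be made mechanical. The sketch below is a minimal illustration that assumes a hypothetical TaskContract type. Neither Codex nor Claude Code requires this structure. It refuses to render a prompt that lacks a goal or done-when criteria, because those are what the agent checks its own work against.

```python
from dataclasses import dataclass, field


@dataclass
class TaskContract:
    """The four elements of a Layer 1 prompt (hypothetical field names)."""
    goal: str
    context: list[str] = field(default_factory=list)
    constraints: list[str] = field(default_factory=list)
    done_when: list[str] = field(default_factory=list)

    def render(self) -> str:
        # Without a goal or exit criteria the prompt is a wish, not a contract.
        if not self.goal.strip():
            raise ValueError("goal is required")
        if not self.done_when:
            raise ValueError("at least one done-when criterion is required")
        lines = [f"Goal: {self.goal}"]
        if self.context:
            lines.append("Context: " + "; ".join(self.context))
        if self.constraints:
            lines.append("Constraints: " + "; ".join(self.constraints))
        lines.append("Done when: " + "; ".join(self.done_when))
        return "\n".join(lines)


contract = TaskContract(
    goal="Fix the checkout page when the coupon API returns 500.",
    context=["Code: apps/web/src/checkout", "Log: logs/checkout-500.log"],
    constraints=["Do not modify the payment SDK", "No new dependencies"],
    done_when=["New or updated tests pass", "Root cause and risk documented"],
)
print(contract.render())
```

Only task-specific constraints go in the contract. A constraint that keeps recurring across tasks is a signal to move it into the rule file.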
Layer 2 – AGENTS.md / CLAUDE.md store durable rules
Persistent project rules live in AGENTS.md (read by OpenAI Codex) or CLAUDE.md (read by Anthropic’s Claude Code) [2][3]. They should include:
Repository structure (business code, tests, scripts, generated artifacts)
Common commands (run unit tests, type‑check, format, build)
Edit boundaries (files or directories that must not be touched, directories that require extra confirmation)
Acceptance criteria (conditions for task completion)
Team preferences (naming, error handling, logging, documentation, commit messages)
Known pitfalls (historical failure points)
They should leave out vision statements, outdated introductions, generic AI advice, abstract principles, and commands that no longer match the repository. Codex applies a default size limit of 32 KiB (project_doc_max_bytes) [2], which pushes teams to keep rule files lean and actively maintained.
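A rule file can be linted against both concerns above: staying within the size budget and covering the sections that matter. In this sketch the section names are hypothetical (neither AGENTS.md nor CLAUDE.md mandates particular headings), and the 32 KiB budget is borrowed from the Codex default.

```python
# Budget borrowed from the Codex default for project_doc_max_bytes;
# adjust if your configuration differs.
MAX_BYTES = 32 * 1024

# Hypothetical headings matching the checklist above.
REQUIRED_SECTIONS = [
    "Repository structure",
    "Commands",
    "Edit boundaries",
    "Acceptance criteria",
    "Team preferences",
    "Known pitfalls",
]


def lint_rule_file(text: str) -> list[str]:
    """Return a list of problems found in a rule file's contents."""
    problems = []
    size = len(text.encode("utf-8"))
    if size > MAX_BYTES:
        problems.append(f"file is {size} bytes; budget is {MAX_BYTES}")
    # Collect markdown headings of any level, case-insensitively.
    headings = {
        line.lstrip("#").strip().lower()
        for line in text.splitlines()
        if line.startswith("#")
    }
    for section in REQUIRED_SECTIONS:
        if section.lower() not in headings:
            problems.append(f"missing section: {section}")
    return problems
```

In CI this might run as lint_rule_file(Path("AGENTS.md").read_text(encoding="utf-8")), failing the build when the returned list is non-empty.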
Layer 3 – Skill encapsulates repeatable workflows
A workflow that is used repeatedly should be promoted to a Skill. Signals that a process qualifies as a Skill:
Appears in three or more tasks
Contains multiple ordered steps that cannot be expressed in a single prompt
Has checkpoints, failure branches, or manual confirmations
Is portable across projects or teammates
Requires assets such as scripts, templates, or configurations
Example: an article‑production pipeline (topic selection, research, drafting, formatting, proofreading, image insertion, HTML generation, WeChat draft publishing) is a Skill rather than a prompt.
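The control flow a Skill encodes (ordered steps, checkpoints, failure branches, manual gates) can be sketched as a small runner. This models only the workflow logic under hypothetical names; it is not the on-disk Skill format of Codex, Claude Code, or any other agent.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    """One stage of a hypothetical Skill workflow."""
    name: str
    run: Callable[[dict], dict]                     # returns the updated state
    check: Optional[Callable[[dict], bool]] = None  # checkpoint after the step
    needs_confirmation: bool = False                # manual gate after the step


def run_skill(steps: list[Step], state: dict,
              confirm: Callable[[str], bool]) -> tuple[str, dict]:
    """Run steps in order, stopping at the first failed checkpoint or declined gate.

    Returns (status, state); status is "done" or the reason execution stopped,
    so the failure branch is explicit rather than silently skipped.
    """
    for step in steps:
        state = step.run(state)
        if step.check is not None and not step.check(state):
            return f"checkpoint failed at {step.name}", state
        if step.needs_confirmation and not confirm(step.name):
            return f"awaiting confirmation at {step.name}", state
    return "done", state


# A shortened version of the article-production pipeline.
steps = [
    Step("research", lambda s: {**s, "sources": ["doc-a", "doc-b"]},
         check=lambda s: len(s["sources"]) >= 2),
    Step("draft", lambda s: {**s, "draft": "Context size isn't everything."},
         needs_confirmation=True),
    Step("format", lambda s: {**s, "html": f"<p>{s['draft']}</p>"}),
]
status, final = run_skill(steps, {"topic": "context engineering"},
                          confirm=lambda name: True)
```

Because the runner returns the state at the stopping point, a declined gate or failed checkpoint can be resumed later instead of restarted from scratch.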
Layer 4 – MCP provides controlled external context
The Model Context Protocol (MCP) lets agents call external tools (logs, databases, monitoring, cloud services) without copying the data into the prompt. MCP defines a data layer (tools, resources, prompts, notifications) and a transport layer (communication, authentication) [4]. Adding MCP introduces two risks:
Irrelevant information can flood the main thread.
Permissions expand, potentially allowing write operations.
Before attaching an MCP, ask four questions:
Is the external data truly more up‑to‑date than repository information?
Does the agent need read, write, or just a summary?
Do tool responses include time, source, sample size, and permission boundaries?
Will tool failures stop the workflow instead of being passed off as conclusions?
If any answer is unclear, defer MCP integration. Minimal MCP definition example:
Goal: Enable the agent to troubleshoot production errors.
Needed tools: read‑only log query, read‑only error details, read‑only deployment records.
Unneeded tools: restart services, modify configs, execute write‑SQL, delete alerts.
Output: For each query, return the time range, filter criteria, sample count, and uncertainty.
The MCP Roots specification further defines the file-system boundaries a client exposes to servers [5].
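The minimal definition above translates into two guards: an allowlist that exposes only read-only tools, and a response envelope that must carry time range, filters, sample count, and uncertainty. This is plain Python with hypothetical tool names, not the MCP SDK; a real server would enforce the same ideas in how it registers and implements its tools.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable

# Hypothetical read-only tools mirroring the minimal definition above.
# Write-capable tools (restarts, config changes, write SQL, alert deletion)
# are simply never registered.
ALLOWED_TOOLS = frozenset({"query_logs", "get_error_details", "list_deployments"})


@dataclass
class ToolResult:
    """Envelope every tool response must carry before reaching the main thread."""
    start: datetime
    end: datetime
    filters: dict
    sample_count: int
    uncertainty: str
    summary: str


class ToolNotAllowed(Exception):
    """Raised when the agent requests a tool outside the read-only allowlist."""


def call_tool(name: str, backend: dict[str, Callable[..., ToolResult]],
              **kwargs) -> ToolResult:
    """Dispatch an allowlisted tool call and validate its envelope.

    `backend` (hypothetical) maps tool names to functions that query the
    external system and return a ToolResult.
    """
    if name not in ALLOWED_TOOLS:
        raise ToolNotAllowed(f"{name} is not on the read-only allowlist")
    result = backend[name](**kwargs)
    # An incomplete envelope is a failure that stops the workflow,
    # not something to pass along as a conclusion.
    if result.start > result.end or not result.uncertainty.strip():
        raise ValueError(f"{name} returned an incomplete envelope")
    return result
```

Keeping the allowlist separate from the backend means adding a new integration never silently widens permissions: a tool must be named explicitly before the agent can reach it.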
Layer 5 – Subagent isolates noisy work
Subagents run in separate context windows with independent tool permissions. They handle noisy tasks (search, log analysis, test‑failure attribution) and return concise, judgment‑ready summaries, not raw data. Subagent output template:
Conclusion: I believe the root cause is A.
Evidence: File X line 42, three samples from log Y, failure info from test Z.
Excludes: B and C have been checked and do not match symptoms.
Uncertain: D lacks production configuration for verification.
Next step: The main thread should validate A before fixing.
This “report contract” ensures the main thread retains the problem definition, confirmed constraints, current decision, final deliverable, and risk points while delegating heavy lifting to subagents [6][7].
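The report contract can be enforced before a subagent’s output is admitted to the main thread. In this sketch the SubagentReport type and the 2,000-character ceiling are hypothetical choices; the point is that a report without evidence, or one long enough to suggest raw data leaked through, is rejected.

```python
from dataclasses import dataclass, field


@dataclass
class SubagentReport:
    """Report contract a subagent returns instead of raw search or log output."""
    conclusion: str
    evidence: list[str]
    excludes: list[str] = field(default_factory=list)
    uncertain: list[str] = field(default_factory=list)
    next_step: str = ""

    def render(self) -> str:
        return "\n".join([
            f"Conclusion: {self.conclusion}",
            "Evidence: " + "; ".join(self.evidence),
            "Excludes: " + ("; ".join(self.excludes) or "none"),
            "Uncertain: " + ("; ".join(self.uncertain) or "none"),
            f"Next step: {self.next_step or 'none'}",
        ])

    def validate(self, max_chars: int = 2000) -> None:
        # A conclusion without evidence is an opinion; reject it.
        if not self.conclusion or not self.evidence:
            raise ValueError("report needs a conclusion and at least one piece of evidence")
        # An oversized report suggests raw data leaked into the summary.
        if len(self.render()) > max_chars:
            raise ValueError("report too long; summarize before returning")
```

The main thread would call validate() and keep only render()’s output, discarding whatever the subagent read along the way.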
Layer 6 – Artifact compacts long‑running tasks
For lengthy tasks, periodically create structured artifacts (e.g., progress.md, evidence.md, handoff.md) that capture the current goal, confirmed facts, excluded paths, hypotheses, next steps, and risks. Example artifact:
# Current Task Status
## Goal
Fix checkout page coupon‑API 500 error.
## Confirmed Facts
- 500 occurs after order draft is created.
- Front‑end error boundary does not catch couponService.reject.
- Payment SDK is not invoked.
## Excluded Paths
- Not a token‑expiry issue.
- Not an empty price‑calc response.
## Current Hypothesis
Coupon error not mapped to a recoverable UI state.
## Next Step
Add failing test, then adjust coupon error mapping.
## Risks
Do not alter payment flow; only handle coupon‑failure branch.
Artifacts like this help keep the context window optimized for correctness, completeness, size, and trajectory [8].
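Writing and reading such an artifact can be automated so a fresh session resumes from the file rather than from a long transcript. This sketch assumes the exact section names of the example above; the functions are hypothetical helpers, not part of any agent’s tooling.

```python
SECTIONS = ["Goal", "Confirmed Facts", "Excluded Paths",
            "Current Hypothesis", "Next Step", "Risks"]
# Sections rendered as bullet lists; the rest are plain lines, as in the example.
BULLETED = {"Confirmed Facts", "Excluded Paths"}


def render_status(state: dict[str, list[str]]) -> str:
    """Render task state in the progress.md layout shown above."""
    lines = ["# Current Task Status"]
    for name in SECTIONS:
        lines.append(f"## {name}")
        for item in state.get(name, []):
            lines.append(f"- {item}" if name in BULLETED else item)
    return "\n".join(lines) + "\n"


def parse_status(text: str) -> dict[str, list[str]]:
    """Recover task state from an artifact so a new session can resume."""
    state: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            state[current] = []
        elif current is not None and line.strip():
            item = line.strip()
            if current in BULLETED:
                item = item.removeprefix("- ")
            state[current].append(item)
    return state
```

Persisting is then a single call such as Path("progress.md").write_text(render_status(state), encoding="utf-8"). Requires Python 3.9+ for str.removeprefix.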
Six‑Layer Context Template
Current Task – Prompt: goal, context, constraints, completion criteria.
Project Rules – AGENTS.md/CLAUDE.md: repo structure, commands, edit boundaries, acceptance standards.
Repeatable Process – Skill: multi‑step workflow, checkpoints, templates, scripts.
External Context – MCP: changing data, tools, resources with defined boundaries.
Noise Isolation – Subagent: search, log, test summaries, third‑party fact extraction.
Status Recovery – Artifact: facts, hypotheses, exclusions, next steps.
When an agent degrades, use this checklist to locate the failing layer:
Prompt unclear about the current goal?
Project rules missing or outdated?
Repeatable workflow not encapsulated as a Skill?
MCP returning too much irrelevant context?
Main thread polluted by logs or search results?
Long task not compacted into an artifact?
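The template and the checklist can be combined into a small lookup that maps each symptom to the layer responsible for it. This is an illustrative sketch: the Layer names follow the template above, and pairing the “polluted main thread” symptom with the Subagent layer is an inference from the Noise Isolation row.

```python
from enum import Enum


class Layer(Enum):
    PROMPT = "Prompt"
    RULES = "AGENTS.md / CLAUDE.md"
    SKILL = "Skill"
    MCP = "MCP"
    SUBAGENT = "Subagent"
    ARTIFACT = "Artifact"


# One diagnostic question per layer, in the checklist's order.
CHECKLIST = [
    (Layer.PROMPT, "Is the prompt unclear about the current goal?"),
    (Layer.RULES, "Are project rules missing or outdated?"),
    (Layer.SKILL, "Is a repeatable workflow not encapsulated as a Skill?"),
    (Layer.MCP, "Is MCP returning too much irrelevant context?"),
    (Layer.SUBAGENT, "Is the main thread polluted by logs or search results?"),
    (Layer.ARTIFACT, "Has a long task not been compacted into an artifact?"),
]


def suspect_layers(answers: dict[Layer, bool]) -> list[Layer]:
    """Return layers whose question was answered 'yes', in checklist order.

    Unanswered questions are treated as 'no'.
    """
    return [layer for layer, _ in CHECKLIST if answers.get(layer, False)]
```

Returning results in checklist order reflects the framework’s layering: a vague prompt is worth fixing before tuning MCP or subagents.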
References
[1] OpenAI Codex best practices: https://developers.openai.com/codex/learn/best-practices
[2] AGENTS.md: https://developers.openai.com/codex/guides/agents-md
[3] CLAUDE.md: https://code.claude.com/docs/en/memory
[4] MCP architecture: https://modelcontextprotocol.io/docs/learn/architecture
[5] MCP Roots spec: https://modelcontextprotocol.io/specification/2025-06-18/client/roots
[6] OpenAI Codex subagents: https://developers.openai.com/codex/concepts/subagents
[7] Claude Code subagents: https://code.claude.com/docs/en/sub-agents
[8] Advanced Context Engineering for Coding Agents: https://www.humanlayer.dev/blog/advanced-context-engineering
ArcThink
ArcThink makes complex information clearer and turns scattered ideas into valuable insights and understanding.
