
Anthropic’s Practical Approach to Context Engineering for AI Agents

The article explains how Anthropic engineers treat the limited token budget of large language models as a finite resource, detailing static configuration, runtime retrieval, and long‑task strategies such as compaction, structured notes, and sub‑agent architectures to build reliable, efficient AI agents.

Shi's AI Notebook

May 18, 2026


Why context engineering matters for agents

LLMs exhibit “context rot”: their ability to accurately recall information degrades as the number of tokens in the context grows. Benchmarks on 18 mainstream LLMs (GPT‑4.1, Claude 4, Gemini 2.5, Qwen3, and others) show that performance drops non-uniformly as inputs get longer [5]. The cause is architectural. In a Transformer, every token attends to every other token, so n tokens produce O(n²) pairwise relationships [7][8]; in addition, training data skew toward shorter sequences, so models have less experience with context-wide dependencies. Each additional token therefore draws down a finite attention budget, which means the tokens that enter the context must be selected carefully.
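A quick back-of-the-envelope calculation makes the quadratic growth concrete. This sketch only counts ordered token pairs under full self-attention. It is an illustration, not a model of any particular LLM's attention implementation.

```python
# Counts ordered (query, key) pairs under full self-attention, ignoring causal
# masking (which roughly halves the count but keeps the growth quadratic).
def pairwise_relationships(n_tokens: int) -> int:
    return n_tokens * n_tokens

for n in (1_000, 10_000, 100_000):
    print(f"{n:>7,} tokens -> {pairwise_relationships(n):>14,} pairwise relationships")
```

Every tenfold increase in context length multiplies the number of pairs by a hundred, which is why no token in the window is free.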

Key elements of effective context

The system prompt should be crystal-clear and written at a “Goldilocks” level of specificity: neither so prescriptive that it hardcodes brittle logic, nor so vague that it gives the model no concrete signals. A common layout splits the prompt into sections such as <background_information>, <instructions>, ## Tool guidance, and ## Output description, using XML tags or Markdown headings to delimit each section.
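The sectioned layout can be sketched as a small helper that assembles the prompt string. The function `build_system_prompt`, the billing scenario, and the `lookup_invoice` tool it mentions are all hypothetical illustrations, not part of any Anthropic API.

```python
def build_system_prompt(background: str, instructions: str,
                        tool_guidance: str, output_description: str) -> str:
    """Assemble a sectioned system prompt (hypothetical helper, not a library API)."""
    sections = [
        # XML tags delimit the free-form context and the core instructions...
        f"<background_information>\n{background.strip()}\n</background_information>",
        f"<instructions>\n{instructions.strip()}\n</instructions>",
        # ...while Markdown headings mark guidance the model can scan quickly.
        f"## Tool guidance\n{tool_guidance.strip()}",
        f"## Output description\n{output_description.strip()}",
    ]
    return "\n\n".join(sections)

prompt = build_system_prompt(
    background="You are a support agent for an internal billing system.",
    instructions="Answer billing questions; escalate refund requests over $500 to a human.",
    tool_guidance="Use lookup_invoice before quoting any amount.",  # hypothetical tool
    output_description="Reply in plain text, in at most three short paragraphs.",
)
print(prompt)
```

Keeping each concern in its own delimited section makes it easy to tune one part (say, tool guidance) without disturbing the rest.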

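Tools act as the interface between the agent and the external world. Good tools are (1) self-contained, i.e. independent of external state, (2) robust to errors, and (3) self-explanatory about their intended use. Bloated toolsets with overlapping functionality create ambiguous decision points in which the model hesitates over which tool to call: if a human engineer cannot say which tool fits a given situation, the agent cannot be expected to do better. Anthropic's discussion of tool design lists these three traits explicitly [10].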
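Here is a minimal sketch of a tool with all three traits, using a hypothetical `convert_units` tool. The definition dict follows the general shape of Anthropic Messages API tool definitions (`name`, `description`, `input_schema` as JSON Schema); the handler and unit table are illustrative assumptions.

```python
# Hypothetical tool definition in the shape used by Anthropic's Messages API.
CONVERT_UNITS_TOOL = {
    "name": "convert_units",
    "description": (
        "Convert a length between metric units. Supported units: mm, cm, m, km. "
        "Use this whenever the user asks for a length in a different unit."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "value": {"type": "number", "description": "The length to convert."},
            "from_unit": {"type": "string", "enum": ["mm", "cm", "m", "km"]},
            "to_unit": {"type": "string", "enum": ["mm", "cm", "m", "km"]},
        },
        "required": ["value", "from_unit", "to_unit"],
    },
}

# Meters per unit: the handler depends on nothing beyond its inputs and this table.
_METERS_PER_UNIT = {"mm": 0.001, "cm": 0.01, "m": 1.0, "km": 1000.0}

def convert_units(value: float, from_unit: str, to_unit: str) -> dict:
    """Return {"result": ...} on success, or {"error": ...} with a hint the model can act on."""
    for unit in (from_unit, to_unit):
        if unit not in _METERS_PER_UNIT:
            supported = ", ".join(_METERS_PER_UNIT)
            return {"error": f"Unknown unit '{unit}'. Supported units: {supported}."}
    return {"result": value * _METERS_PER_UNIT[from_unit] / _METERS_PER_UNIT[to_unit]}

print(convert_units(2, "km", "m"))
print(convert_units(2, "km", "furlong"))
```

Returning errors as data rather than raising lets the agent loop pass the message straight back to the model, which can then retry with a supported unit.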

Few-shot examples should be a small set of diverse, canonical cases that portray the expected behavior, rather than an exhaustive list of edge cases; the model infers the intended behavior from these concise illustrations.
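One simple way to favor diversity over volume is to keep at most one example per behavior category. This is a minimal sketch under that assumption. The `Example` type, its `category` labels, and the first-of-each-category heuristic are all hypothetical; real curation is usually done by hand.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass(frozen=True)
class Example:
    category: str   # hypothetical label for the kind of behavior the example shows
    user: str
    assistant: str

def pick_canonical(pool: list[Example], max_examples: int = 3) -> list[Example]:
    """Keep the first example of each distinct category, up to max_examples."""
    seen: set[str] = set()
    chosen: list[Example] = []
    for ex in pool:
        if len(chosen) == max_examples:
            break
        if ex.category not in seen:
            seen.add(ex.category)
            chosen.append(ex)
    return chosen

def render_examples(examples: list[Example]) -> str:
    """Wrap each chosen example in <example> tags for inclusion in the prompt."""
    return "\n".join(
        f"<example>\nUser: {ex.user}\nAssistant: {ex.assistant}\n</example>"
        for ex in examples
    )

pool = [
    Example("lookup", "What's invoice 42's total?", "Invoice 42 totals $310."),
    Example("lookup", "Total for invoice 43?", "Invoice 43 totals $95."),
    Example("refusal", "Delete all invoices.", "I can't delete invoices; please ask an admin."),
    Example("clarify", "Fix my bill.", "Which invoice number should I look at?"),
    Example("refusal", "Show me other customers' invoices.", "I can only show your own invoices."),
]
print(render_examples(pick_canonical(pool)))
```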

Context retrieval and agentic retrieval

Two main approaches exist: pre-inference retrieval (typically embedding-based search run before the model is called) and agentic retrieval, where the agent decides at runtime what data to fetch via tools. Claude Code relies heavily on the latter, using Unix commands such as head and tail to inspect large data without loading the full content into the context [13]. References such as file paths, query strings, or URLs serve as lightweight cues that the agent can resolve on demand, avoiding stale indexes and preserving structural signals like directory hierarchy and timestamps.
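The idea of keeping only a reference in context and resolving it on demand can be sketched with two small helpers that mimic `head -n` and `tail -n`. The helper names and the temporary log file are illustrative assumptions; in Claude Code the agent runs the real shell commands.

```python
import tempfile
from collections import deque
from itertools import islice
from pathlib import Path

def head(path: str, n: int = 10) -> str:
    """Like `head -n`: return the first n lines without reading the rest of the file."""
    with open(path, encoding="utf-8") as f:
        return "".join(islice(f, n))

def tail(path: str, n: int = 10) -> str:
    """Like `tail -n`: stream the file, keeping only the last n lines in memory."""
    with open(path, encoding="utf-8") as f:
        return "".join(deque(f, maxlen=n))

with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "server.log"
    log.write_text("".join(f"line {i}\n" for i in range(100_000)), encoding="utf-8")
    reference = str(log)          # the context holds this short path, not 100,000 lines
    first_two = head(reference, 2)
    last_two = tail(reference, 2)

print(first_two + last_two)
```

Only four short lines ever reach the model; the rest of the file stays on disk until a later step asks for it.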

Long‑task context engineering

When a task exceeds a single context window, three techniques are recommended:

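Compaction: When the context window nears its limit, summarize the conversation so far and start a fresh window with that summary plus the most recently accessed files. Overly aggressive compaction can lose subtle context whose importance only becomes clear later, so tune the compaction prompt iteratively: first maximize recall so the summary captures everything relevant, then improve precision by removing superfluous content. Claude Code implements compaction by passing the message history to the model for summarization, preserving details such as architectural decisions, unresolved bugs, and implementation specifics while discarding redundant tool-call outputs, and then continuing with the summary plus the five most recently accessed files.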
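The compaction flow can be sketched as follows. This is a minimal illustration, not Claude Code's implementation: the `Message` type, the 4-characters-per-token estimate, and the stub summarizer are assumptions, and a real agent would call the model with a compaction prompt and re-read the files' contents.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Callable

@dataclass
class Message:
    role: str      # "user", "assistant", or "tool"
    content: str

def estimate_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token; use a real tokenizer in practice.
    return max(1, len(text) // 4)

def compact(history: list[Message], recent_files: list[str],
            summarize: Callable[[list[Message]], str],
            budget_tokens: int, keep_files: int = 5) -> list[Message]:
    """Return history unchanged if it fits, else a fresh window: summary + recent files."""
    if sum(estimate_tokens(m.content) for m in history) <= budget_tokens:
        return history
    # Drop raw tool outputs first: they are usually the most redundant content.
    conversation = [m for m in history if m.role != "tool"]
    fresh = [Message("user", "Summary of earlier work:\n" + summarize(conversation))]
    # recent_files is assumed to be ordered oldest -> newest.
    for path in recent_files[-keep_files:]:
        # A real agent would re-read each file's current contents here.
        fresh.append(Message("user", f"Recently accessed file: {path}"))
    return fresh

def stub_summarize(msgs: list[Message]) -> str:
    # Stand-in for a model call: keep the first 40 characters of each message.
    return " | ".join(f"{m.role}: {m.content[:40]}" for m in msgs)

history = [
    Message("user", "Refactor the auth module. " * 20),
    Message("tool", "grep output line\n" * 300),
    Message("assistant", "Kept the token format; bug in refresh() still open. " * 5),
]
files = [f"src/file_{i}.py" for i in range(8)]
window = compact(history, files, stub_summarize, budget_tokens=500)
for m in window:
    print(m.role, "->", m.content[:60].replace("\n", " "))
```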

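Structured notes (agentic memory): The agent periodically writes notes to storage outside the context window (e.g., a NOTES.md file) and pulls them back into context when needed. In Anthropic's experiments with Claude playing Pokémon, the agent kept precise tallies across thousands of game steps in its notes (e.g., “for the last 1,234 steps I've been training my Pokémon in Route 1, Pikachu has gained 8 levels toward the target of 10”), which let it resume long training sequences after a context reset without reloading the entire history [14].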
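A minimal note file needs only two operations: append a progress line, and reload recent lines into a fresh window. The helper names, the bullet format, and the `max_lines` cap below are hypothetical choices for illustration.

```python
import tempfile
from pathlib import Path

def append_note(notes_path: Path, note: str) -> None:
    """Append one progress line to the notes file, creating it if needed."""
    with notes_path.open("a", encoding="utf-8") as f:
        f.write(f"- {note.strip()}\n")

def load_notes(notes_path: Path, max_lines: int = 50) -> str:
    """Return the most recent notes for a fresh context window ("" if none yet)."""
    if not notes_path.exists():
        return ""
    lines = notes_path.read_text(encoding="utf-8").splitlines()
    return "\n".join(lines[-max_lines:])

with tempfile.TemporaryDirectory() as tmp:
    notes = Path(tmp) / "NOTES.md"
    append_note(notes, "Goal: migrate the auth module to the v2 client.")
    append_note(notes, "Done: login handler migrated. Next: session refresh.")
    # ... a context reset happens here; the new window starts from the notes alone ...
    restored = load_notes(notes)

print(restored)
```

Because the file lives outside the window, its size does not count against the context budget until the agent chooses to reload it.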

Sub-agent architecture: Delegate focused subtasks to child agents that each start with a clean context window. The main agent handles high-level planning, while sub-agents perform detailed search or tool use. A sub-agent may spend tens of thousands of tokens exploring, but it returns only a condensed summary (often around 1,000–2,000 tokens) to the main agent. This compression is justified because the main agent needs conclusions, not raw evidence, and the detailed search context stays isolated inside each sub-agent. Anthropic describes this pattern in its write-up on its multi-agent research system [15].
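The fan-out and summarize-back pattern can be sketched with a thread pool. This is a minimal illustration under stated assumptions: `run_subagent` only simulates exploration (a real sub-agent would run its own model and tool loop), and the 1,500-token budget and 4-characters-per-token heuristic are hypothetical.

```python
from __future__ import annotations
from concurrent.futures import ThreadPoolExecutor

SUMMARY_BUDGET_TOKENS = 1_500   # assumption: inside the ~1,000-2,000 token range
CHARS_PER_TOKEN = 4             # assumption: rough heuristic, not a real tokenizer

def run_subagent(task: str) -> str:
    """Hypothetical sub-agent with its own clean context; exploration is simulated."""
    private_context = [f"{task}: evidence item {i}" for i in range(5_000)]  # never leaves here
    summary = f"Conclusion for '{task}': reviewed {len(private_context)} items."
    # Enforce the summary budget before handing results back to the lead agent.
    return summary[: SUMMARY_BUDGET_TOKENS * CHARS_PER_TOKEN]

def orchestrate(subtasks: list[str]) -> str:
    """Lead agent: fan out subtasks in parallel and keep only the condensed summaries."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        summaries = list(pool.map(run_subagent, subtasks))  # results keep input order
    return "\n".join(summaries)

report = orchestrate(["pricing pages", "competitor docs", "support forums"])
print(report)
```

The lead agent's context grows by three short lines, while the 15,000 simulated evidence items never leave the sub-agents.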

The choice among these methods depends on task characteristics: compaction for fluid dialogues, structured notes for milestone‑driven development, and sub‑agents for complex research requiring parallel exploration.
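The mapping from task traits to techniques can be written down as a small decision helper. The `TaskProfile` fields and the fallback for tasks with none of the traits are hypothetical; the point is that the three techniques are not mutually exclusive.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class TaskProfile:
    extensive_back_and_forth: bool  # long, fluid conversation
    clear_milestones: bool          # iterative development with checkpoints
    parallel_exploration: bool      # research that benefits from parallel search

def suggest_strategies(task: TaskProfile) -> list[str]:
    """Map task traits to long-task techniques; several may apply at once."""
    strategies = []
    if task.extensive_back_and_forth:
        strategies.append("compaction")
    if task.clear_milestones:
        strategies.append("structured notes")
    if task.parallel_exploration:
        strategies.append("sub-agents")
    # Assumption: a task with none of these traits may fit in a single window.
    return strategies or ["single context window"]

print(suggest_strategies(TaskProfile(False, True, True)))
```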

Conclusion

Context engineering treats the model's attention budget as a finite resource and continuously decides which information enters the window. Whether through compaction, token-efficient tools, or on-demand retrieval, the goal is the same: find the smallest set of high-signal tokens that maximizes the likelihood of the desired outcome.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
