Advanced AI Context Engineering: Building Operable Worlds (Part 2)

This article examines how to evolve AI prompt engineering into full‑stack context and environment engineering, detailing six practical design patterns from the Manus system, the limits of Vibe Coding, the Spec‑Driven development workflow, and concrete steps to give models a persistent, controllable world for long‑term tasks.

Software Engineering 3.0 Era
Software Engineering 3.0 Era
Software Engineering 3.0 Era
Advanced AI Context Engineering: Building Operable Worlds (Part 2)

KV‑Cache‑Centric Context Design

In multi‑turn agents the input token count far exceeds output (≈100:1). Whether the input hits the KV cache determines cost and latency; cached input costs $0.3 per million tokens versus $3 per million uncached.

Manus keeps the prompt prefix stable:

All state is append‑only, never rewriting earlier content.

Structured data (e.g., JSON) is deterministically serializable: fixed key order, no random fields, no timestamps or random IDs.

Points where the cache must be broken are marked explicitly, e.g., inserting a special marker that tells the inference engine not to inherit KV beyond that point.

Replace per‑round prompt reconstruction with a fixed prefix plus an appended log and use a stable serialization function for all JSON/structured context to reduce cost and latency.

Masking Tools Instead of Deleting

Removing a tool from the tool list changes the prompt prefix, causing KV‑cache misses, and the model may still expect the removed tool.

Manus keeps the tool list stable in system/tool definitions and controls availability with a state‑machine + logits mask during decoding:

If the current state disallows Tool A, mask its tokens.

Boost the weight of tokens for required tools.

Action‑space control is placed in the decoder rather than altering the context.

File System as “Ultimate Context”

Long‑context agents face three classic problems:

Tool outputs can be large and overflow the context window.

Beyond a certain length model performance degrades (attention erosion, information dilution).

Even with KV cache, long inputs remain expensive.

Manus treats the file system as part of the context:

“Write file” offloads large content to disk.

“Read file” fetches a local excerpt or summary when needed.

Only file paths/URLs and short summaries remain in the prompt.

[Observation]
已爬取页面:https://example.com/a
内容已保存至:/data/pages/a.md
内容摘要:这是一篇关于上下文工程的博客,包含 A/B/C 三部分……
Keep only references and summaries in the prompt; bulk lives in the file system. The file system becomes “long‑term memory + scratchpad” with virtually unlimited capacity.

Control Attention with Re‑statement

When a task requires dozens of tool calls, the agent can forget the original goal. Manus repeatedly restates the overall goal and the current todo list at the end of the context, updating it after a few steps. These re‑statements appear in the latest messages, ensuring the model’s attention stays on them.

[System 或 Tool 输出的一部分]
当前任务全局目标(请始终对齐):
1. 为用户完成 X 功能
2. 确保代码具备单元测试与基本文档
3. 最终以 README + 源码压缩包形式交付

当前未完成子任务列表:
- [ ] 完成模块 A 的接口定义
- [ ] 实现模块 A 核心逻辑
- [ ] 为模块 A 添加 3 条单测
…
Moving key goal information to the most recent token window counteracts “middle‑information forgetting”.

Preserve Errors in Context

All failed actions, including full stack traces, are retained in the context. This provides the model with negative samples and counter‑factual feedback, allowing it to lower the posterior success probability of previously failing actions and avoid repeating the same mistake.

The context itself acts as an online RLHF / experience replay buffer.

Inject Diversity to Avoid Few‑Shot Lock‑In

When few‑shot examples are overly uniform, the model over‑fits and hallucinates. Manus adds small structured variations:

Different serialization templates.

Varying field order.

Alternative natural‑language phrasing.

Controlled “format noise” within safety limits.

These cues remind the model that patterns are not hard rules, widening its generalisation space.

Spec‑Driven Development vs Vibe Coding

Vibe Coding (Prompt → Code) suffers from three fatal issues:

Users rarely write high‑quality prompts covering edge cases and non‑functional requirements.

Generated code accumulates technical debt: missing docs, tests, architectural constraints.

Developers struggle to understand and extend the generated code.

Spec‑Driven Development flips the workflow to Spec → Design → Tasks → Code and enforces three layers of specification:

Requirements (requirements.md) : written in an EARS‑style conditional format, e.g., WHEN [condition] THE SYSTEM SHALL [expected behavior].

Design (design.md) : architecture, module boundaries, interfaces, data models, front‑end/back‑end layering.

Tasks (tasks.md) : each task is an executable TODO with clear input/output and acceptance criteria (including tests).

These specifications become rich context for the model.

Practical Steps

Convert free‑form requirements into EARS‑style entries, e.g.:

WHEN 用户在移动端点击「一键下单」
THE SYSTEM SHALL 创建一条订单记录,并在 3 秒内返回下单结果。

Prompt the model to produce a design draft (module structure, key classes, dependencies). Humans review and edit; the refined design becomes high‑priority context for subsequent agents.

Automatically split the design into concrete tasks, e.g.:

Task: 为订单模块新增 create_order API
- 修改文件: /backend/order/api.py
- 要求:
  - 接收参数: user_id, item_id, quantity
  - 检查库存 & 用户有效性
  - 写入数据库并返回订单号
  - 添加至少 3 条单元测试用例

When an agent executes a task, its context consists of:

System instructions (team coding standards).

Long‑term memory (project info, coding conventions, security policies).

Relevant snippets from requirements.md, design.md, and the target file (or its path + summary).

Output expectations: code diff, updated docs/tests, and a natural‑language explanation.

The result is a model that follows a full set of constraints rather than “writing code by feel”, and the context becomes reusable across new members or new tasks, dramatically lowering long‑term maintenance cost.

From Context Engineering to Environment Engineering

Context engineering focuses on what the model can see (tokens, RAG, tools, state machines). Environment engineering adds a perceivable, controllable, evolving world on top of that context.

Stages Comparison

Prompt Engineering : focus – how to write a single prompt; typical problem – clarity of wording, tone, instructions; capability – one‑shot Q&A / generation.

Context Engineering : focus – what information the model can see; typical problem – memory, RAG, tools, state machine; capability – multi‑turn dialogue, simple agents.

Environment Engineering : focus – what world the model lives in; typical problem – world state, rules, feedback, persistence; capability – long‑term tasks, self‑adapting, collaborative agents.

Environment engineering adds:

Persistent world state (files, databases, task boards, external service statuses).

Rules & constraints (permissions, security, resource limits).

Feedback & reward signals (success/failure, business KPIs, explicit user feedback).

Multi‑agent collaboration protocols (who can modify what, conflict resolution).

Key differences:

State persistence : context relies on a single call’s window + external storage; environment treats world state as the primary data source, with context acting as an observation window.

Control granularity : context controls token visibility; environment controls read/write permissions and external event triggers.

Objective : context measures success per response; environment measures long‑term business metrics and overall system behaviour.

Mini‑Environment Practices

Integrate a file system or database into agent design. Decide which information is off‑loaded to external storage versus kept as ephemeral context, and expose a well‑defined read/write API.

Build a minimal task‑and‑world‑state panel that can read, update, and log task status (pending/running/done/failed) and associate each task with relevant files or resources.

Let agents turn human feedback into part of the environment: mark tasks as success/failure and write failure reasons into logs, so future decisions incorporate this experience.

Define permission boundaries: explicitly list readable/writable directories or tables, require human confirmation for high‑risk APIs, and use a two‑step “plan‑then‑confirm” flow for critical operations.

Conclusion

Context engineering solves how to pack information into a single page so the model doesn’t make stupid mistakes. Products such as Manus, Kiro, and Claude Code show the next step is to give the model an operable world where it can see, remember, think, act, fail, and correct itself. In that world the context is merely a window; the environment is where the model truly lives.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

AIPrompt EngineeringAgentKV CacheContext EngineeringSpec-Driven
Software Engineering 3.0 Era
Written by

Software Engineering 3.0 Era

With large models (LLMs) reshaping countless industries, software engineering is leading the charge into the Software Engineering 3.0 era—model-driven development and operations. This account focuses on the new paradigms, theories, and methods of SE 3.0, and showcases its tools and practices.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.