The Most Systematic 102‑Page Review of Agent Harnesses
This article provides a comprehensive overview of the "Code as Agent Harness" paradigm, detailing its three‑layer architecture, the roles of code in reasoning, acting, and environment modeling, the mechanisms that enable reliable long‑term execution, and how multi‑agent systems scale the harness through shared code and feedback loops.
Introduction
The rapid progress of large language models (LLMs) in code generation has shifted the focus from code as a final output to code as the operational foundation of autonomous systems, which the authors call the Agent Harness.
Code as Agent Harness
The authors define Code as Agent Harness as code that is executable, inspectable, and stateful, making it the optimal medium to connect model reasoning, environment interaction, and persistent state.
Three‑Layer Architecture
Harness Interface (Interface Layer)
Code serves as a universal interface for agents, supporting three dimensions:
Reasoning: Externalizing intermediate computation to executables or symbolic solvers (Program‑Delegated Reasoning, Formal Verification, Iterative Code‑Grounded Reasoning).
Acting: Translating high‑level intent into executable policies (Grounded Skill Selection, Programmatic Policy Generation, Lifelong Code‑Based Agents).
Environment Modeling: Representing environments with executable artifacts (Structured World Representations, Execution‑Trace World Modeling, Code‑Grounded Evaluation Environments, Verifiable Environment Construction).
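To make Program‑Delegated Reasoning concrete, here is a minimal sketch of the idea: the model writes a small program, and the harness runs it and returns an inspectable result instead of trusting the model's own arithmetic. The names `ExecutionResult` and `run_delegated_program`, and the convention that the program binds its answer to `result`, are assumptions made for illustration and do not come from the paper. Plain `exec()` is used only to keep the sketch short; it is not a sandbox.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ExecutionResult:
    """Inspectable outcome of running one model-written program."""
    ok: bool
    value: Any = None
    error: Optional[str] = None


def run_delegated_program(source: str) -> ExecutionResult:
    """Run model-written code and return whatever it binds to `result`.

    Assumption: the program reports its answer in a variable named `result`.
    NOTE: exec() offers no isolation; a real harness would sandbox this step.
    """
    namespace: dict = {}
    try:
        exec(source, namespace)
    except Exception as exc:
        # Failures become structured feedback rather than a crashed agent.
        return ExecutionResult(ok=False, error=f"{type(exc).__name__}: {exc}")
    if "result" not in namespace:
        return ExecutionResult(ok=False, error="program did not set `result`")
    return ExecutionResult(ok=True, value=namespace["result"])


# Stand-in for model output: the computation is delegated to the interpreter.
model_program = "result = sum(i * i for i in range(1, 11))"
outcome = run_delegated_program(model_program)
print(outcome.ok, outcome.value)

# A faulty program yields an error message the agent can reason about.
failed = run_delegated_program("result = 1 / 0")
print(failed.ok, failed.error)
```

The error string returned for a failing program is the kind of signal that Iterative Code‑Grounded Reasoning would feed back to the model for another attempt.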
Harness Mechanisms (Mechanism Layer)
This layer ensures reliable long‑term execution through five interacting dimensions:
Planning: Four paradigms (Linear Decomposition, Structure‑Grounded Planning, Search‑Based Planning, Orchestration‑Based Planning).
Memory & Context Engineering: Five memory types (Working, Semantic, Experiential, Long‑Term, Multi‑Agent), plus context compaction and state offloading.
Tool Use: Four categories (Function‑Oriented, Environment‑Interaction, Verification‑Driven, Workflow‑Orchestration).
Harness Control: The Plan‑Execute‑Verify (PEV) loop, with four components: Plan as Contract, Sandboxed Execution, Permissioned State Transition, Deterministic Verification.
Agentic Harness Engineering (AHE): Deep telemetry collects trace‑level data; an Evolution Agent diagnoses failures and proposes component revisions, which are evaluated and promoted via Governed Harness Mutation.
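The Plan‑Execute‑Verify loop named under Harness Control can be sketched in a few lines. This is one possible reading of the four components, not the paper's implementation: `Step`, `run_pev`, and the dictionary‑based state are hypothetical, and running each action on a copy of the state merely stands in for real sandboxing.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List

State = Dict[str, int]


@dataclass(frozen=True)
class Step:
    """Plan as Contract: what runs, what it may write, how it is checked."""
    name: str
    action: Callable[[State], State]   # returns the updates it proposes
    writable_keys: FrozenSet[str]      # permissions for the state transition
    verify: Callable[[State], bool]    # deterministic check on the new state


def run_pev(plan: List[Step], state: State) -> List[str]:
    """Run each step, committing its updates only if permitted and verified."""
    log: List[str] = []
    for step in plan:
        # "Sandboxed" execution: the action only ever sees a copy of the state.
        proposed = step.action(dict(state))
        illegal = set(proposed) - step.writable_keys
        if illegal:
            log.append(f"{step.name}: rejected, unpermitted keys {sorted(illegal)}")
            break
        candidate = {**state, **proposed}
        if not step.verify(candidate):
            log.append(f"{step.name}: rejected, verification failed")
            break
        state.update(proposed)  # commit only after verification passes
        log.append(f"{step.name}: committed")
    return log


plan = [
    Step("add_item", lambda s: {"count": s["count"] + 1},
         frozenset({"count"}), lambda s: s["count"] == 1),
    Step("overreach", lambda s: {"budget": 0},
         frozenset({"count"}), lambda s: True),
]
state = {"count": 0, "budget": 10}
log = run_pev(plan, state)
print(log)
print(state)
```

The second step is rejected because it tries to write a key outside its contract, so the live state keeps its original budget. Halting at the first rejection is a design choice of this sketch; a fuller harness might replan instead.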
Scaling the Harness (Scaling Layer)
When task complexity exceeds a single agent's capacity, multi‑agent systems (MAS) share executable code artifacts and use execution feedback as an objective convergence signal. The design space includes:
Specialized functional roles (Manager, Planner, Coder, Reviewer, Tester, Verifier).
Interaction modes (Collaborative Synthesis, Critique & Repair, Adversarial Validation, Reasoning Debate).
Workflow topologies (Waterfall, Agile/Iterative, Hierarchical, Star, Dynamic DAG).
Conclusion
The review serves as a detailed roadmap for researchers building coding agents, computer‑use agents, embodied AI, or scientific discovery agents.
Code as Agent Harness: Toward Executable, Verifiable, and Stateful Agent Systems
https://arxiv.org/pdf/2605.18747
https://github.com/YennNing/Awesome-Code-as-Agent-Harness-Papers