The Most Systematic 102‑Page Review of Agent Harnesses

This article provides a comprehensive overview of the "Code as Agent Harness" paradigm, detailing its three‑layer architecture, the roles of code in reasoning, acting, and environment modeling, the mechanisms that enable reliable long‑term execution, and how multi‑agent systems scale the harness through shared code and feedback loops.

PaperAgent

Introduction

The rapid progress of large language models (LLMs) in code generation has shifted attention from code as a final output to code as the operational foundation of autonomous systems, which the authors call the Agent Harness.

Code as Agent Harness

The authors define Code as Agent Harness as code that is executable, inspectable, and stateful, and argue that these three properties make code the best medium for connecting model reasoning, environment interaction, and persistent state.
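To make the three properties concrete, here is a minimal Python sketch, assuming a single-process harness that runs model-written snippets with `exec`. `CodeHarness` and `TraceEntry` are hypothetical names, not taken from the review. Each call executes code (executable), every call leaves a trace entry with the code, captured output, and any error (inspectable), and variables persist across calls in a shared namespace (stateful). `exec` is not a security boundary; a real harness would run the code in an isolated sandbox.

```python
import contextlib
import io
import traceback
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class TraceEntry:
    code: str             # the snippet that was executed
    stdout: str           # everything the snippet printed
    error: Optional[str]  # formatted exception, or None on success


@dataclass
class CodeHarness:
    # Hypothetical harness: NOT a sandbox, exec runs with full interpreter privileges.
    namespace: Dict[str, Any] = field(default_factory=dict)  # persistent state across calls
    trace: List[TraceEntry] = field(default_factory=list)    # inspectable execution history

    def run(self, code: str) -> TraceEntry:
        buf = io.StringIO()
        error: Optional[str] = None
        try:
            with contextlib.redirect_stdout(buf):
                exec(code, self.namespace)  # run the snippet against the shared state
        except Exception:
            error = traceback.format_exc(limit=1)
        entry = TraceEntry(code=code, stdout=buf.getvalue(), error=error)
        self.trace.append(entry)  # record every step, successful or not
        return entry


harness = CodeHarness()
harness.run("x = 2")
step = harness.run("y = x * 21\nprint(y)")  # reuses x from the previous call
print(step.stdout.strip(), len(harness.trace))  # 42 2
```

Because failed steps are recorded rather than discarded, the trace can later be replayed or inspected to diagnose what went wrong.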

Three‑Layer Architecture

Harness Interface (Interface Layer)

Code serves as a universal interface for agents along three dimensions:

Reasoning: externalizing intermediate computation to executable programs or symbolic solvers (Program‑Delegated Reasoning, Formal Verification, Iterative Code‑Grounded Reasoning).

Acting: translating high‑level intent into executable policies (Grounded Skill Selection, Programmatic Policy Generation, Lifelong Code‑Based Agents).

Environment Modeling: representing environments as executable artifacts (Structured World Representations, Execution‑Trace World Modeling, Code‑Grounded Evaluation Environments, Verifiable Environment Construction).
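Program-Delegated Reasoning combined with Iterative Code-Grounded Reasoning can be sketched as a loop: a model writes a program, the harness executes it, and any execution error is returned to the model as feedback for the next attempt. In this minimal Python sketch, `propose` stands in for the model and is a hypothetical callable; the convention that the program stores its result in a variable named `answer` is also an assumption for illustration.

```python
from typing import Any, Callable, Optional

# Hypothetical model interface: (question, last_error) -> program text.
Proposer = Callable[[str, Optional[str]], str]


def delegate_reasoning(question: str, propose: Proposer, max_rounds: int = 3) -> Any:
    error: Optional[str] = None
    for _ in range(max_rounds):
        program = propose(question, error)
        scope: dict = {}
        try:
            exec(program, scope)    # the computation happens in code, not in generated text
            return scope["answer"]  # assumed convention: the program binds `answer`
        except Exception as exc:
            error = f"{type(exc).__name__}: {exc}"  # fed back on the next round
    raise RuntimeError(f"no working program after {max_rounds} rounds; last error: {error}")


def scripted_model(question: str, error: Optional[str]) -> str:
    # Stand-in for an LLM: the first draft forgets to define n, the repair adds it.
    if error is None:
        return "answer = sum(i * i for i in range(1, n + 1))"
    return "n = 10\nanswer = sum(i * i for i in range(1, n + 1))"


print(delegate_reasoning("What is 1^2 + 2^2 + ... + 10^2?", scripted_model))  # 385
```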
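Grounded Skill Selection and Programmatic Policy Generation can be illustrated together: the model may act only through named skills from a library, and the harness checks each proposed call against the skill's actual signature before running it. In this minimal Python sketch, the skills `move_to` and `pick`, the `SKILLS` registry, and the list-of-calls policy format are all hypothetical.

```python
import inspect
from typing import Any, Callable, Dict, List, Tuple

SKILLS: Dict[str, Callable[..., Any]] = {}  # hypothetical skill library


def skill(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function as a callable skill under its own name."""
    SKILLS[fn.__name__] = fn
    return fn


@skill
def move_to(x: int, y: int) -> str:
    return f"moved to ({x}, {y})"


@skill
def pick(obj: str) -> str:
    return f"picked {obj}"


def execute_action(name: str, args: Dict[str, Any]) -> Any:
    if name not in SKILLS:  # grounding: reject actions outside the library
        raise ValueError(f"unknown skill {name!r}; available: {sorted(SKILLS)}")
    fn = SKILLS[name]
    # Checks argument names and arity only (TypeError on mismatch), not value types.
    bound = inspect.signature(fn).bind(**args)
    return fn(*bound.args, **bound.kwargs)


# A programmatic policy: an ordered list of skill calls, e.g. parsed from model output.
Policy = List[Tuple[str, Dict[str, Any]]]
policy: Policy = [("move_to", {"x": 1, "y": 2}), ("pick", {"obj": "cup"})]
print([execute_action(name, args) for name, args in policy])
# ['moved to (1, 2)', 'picked cup']
```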

Harness Mechanisms (Mechanism Layer)

This layer ensures reliable long‑term execution through five interacting dimensions:

Planning: four paradigms, namely Linear Decomposition, Structure‑Grounded Planning, Search‑Based Planning, and Orchestration‑Based Planning.

Memory & Context Engineering: five memory types (Working, Semantic, Experiential, Long‑Term, and Multi‑Agent), plus context compaction and state offloading.

Tool Use: four categories, namely Function‑Oriented, Environment‑Interaction, Verification‑Driven, and Workflow‑Orchestration.

Harness Control: the Plan‑Execute‑Verify (PEV) loop, whose components are Plan as Contract, Sandboxed Execution, Permissioned State Transition, and Deterministic Verification.

Agentic Harness Engineering (AHE): deep telemetry collects trace‑level data, an Evolution Agent diagnoses failures and proposes revisions to harness components, and these revisions are evaluated and then promoted through Governed Harness Mutation.
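Context compaction and state offloading can be sketched as a working memory with a fixed size budget: when the budget is exceeded, the oldest entries are moved to an external store and replaced in context by short placeholders that can be resolved later. This minimal Python sketch assumes a character budget as a proxy for tokens and a plain dictionary as the offload store; `WorkingMemory` and the `[offloaded:...]` placeholder format are hypothetical. A real harness might ask the model to summarize entries instead of replacing them outright.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class WorkingMemory:
    budget_chars: int                                    # proxy for a token budget
    entries: List[str] = field(default_factory=list)     # what the model sees in context
    store: Dict[str, str] = field(default_factory=dict)  # offloaded full text

    def size(self) -> int:
        return sum(len(e) for e in self.entries)

    def add(self, text: str) -> None:
        self.entries.append(text)
        self._compact()

    def _compact(self) -> None:
        # Offload oldest entries first; the newest entry always stays in context,
        # so a single oversized entry can still exceed the budget.
        i = 0
        while self.size() > self.budget_chars and i < len(self.entries) - 1:
            entry = self.entries[i]
            key = f"mem-{len(self.store)}"
            placeholder = f"[offloaded:{key}]"
            if len(placeholder) < len(entry):  # only offload when it saves space
                self.store[key] = entry
                self.entries[i] = placeholder
            i += 1

    def recall(self, key: str) -> str:
        """Reload offloaded text on demand."""
        return self.store[key]


mem = WorkingMemory(budget_chars=60)
mem.add("A" * 40)
mem.add("B" * 40)
print(mem.entries[0], mem.size())  # [offloaded:mem-0] 57
```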
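The four PEV components can be combined in a small Python sketch under these assumptions: harness state is a plain dictionary; each plan step declares up front which keys it may write (plan as contract); the step runs against a deep copy of the state (a stand-in for sandboxed execution, not real isolation); the resulting diff is checked against the declared keys (permissioned state transition); and a deterministic predicate must pass before the copy is committed. `Step`, `run_pev`, and `VerificationError` are hypothetical names.

```python
import copy
from dataclasses import dataclass
from typing import Any, Callable, Dict, FrozenSet, Sequence

State = Dict[str, Any]
_MISSING = object()  # sentinel so added or deleted keys count as changes


class VerificationError(Exception):
    pass


@dataclass(frozen=True)
class Step:
    name: str
    action: Callable[[State], None]  # mutates the scratch copy it is given
    may_write: FrozenSet[str]        # plan as contract: keys this step may change
    check: Callable[[State], bool]   # deterministic postcondition on the result


def run_pev(state: State, plan: Sequence[Step]) -> State:
    for step in plan:
        scratch = copy.deepcopy(state)  # execute away from the committed state
        step.action(scratch)
        changed = {k for k in scratch.keys() | state.keys()
                   if scratch.get(k, _MISSING) != state.get(k, _MISSING)}
        illegal = changed - step.may_write
        if illegal:
            raise PermissionError(f"{step.name} wrote undeclared keys {sorted(illegal)}")
        if not step.check(scratch):
            raise VerificationError(f"{step.name} failed its postcondition")
        state = scratch  # commit only after permission and verification pass
    return state


def write_code(s: State) -> None:
    s["code"] = "def add(a, b):\n    return a + b"


def add_passes(s: State) -> bool:
    ns: Dict[str, Any] = {}
    exec(s["code"], ns)  # verify by executing the artifact on a fixed input
    return ns["add"](2, 3) == 5


initial: State = {"code": "", "log": []}
final = run_pev(initial, [Step("write_code", write_code, frozenset({"code"}), add_passes)])
print(final["code"].splitlines()[0], initial["code"] == "")  # def add(a, b): True
```

Committing only the verified copy means a failed or overreaching step leaves the previous state untouched.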
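Governed Harness Mutation can be sketched as a promotion gate: a revision proposed by the Evolution Agent replaces the current harness configuration only if it passes more evaluation tasks and breaks none that previously passed. In this minimal Python sketch, the harness configuration is reduced to a single hypothetical `max_retries` setting, per-task pass/fail results stand in for trace-level telemetry, and `propose_revision` is a fixed rule standing in for the LLM-based Evolution Agent. The no-regression rule is an assumed promotion policy, not necessarily the one the review describes.

```python
from typing import Callable, Dict, List, Tuple

Config = Dict[str, int]
Task = Callable[[Config], bool]  # True if the harness solves the task under this config


def evaluate(config: Config, tasks: Dict[str, Task]) -> Dict[str, bool]:
    return {name: task(config) for name, task in tasks.items()}


def propose_revision(config: Config, failures: List[str]) -> Config:
    # Hypothetical Evolution Agent: a fixed rule instead of an LLM diagnosis.
    if failures:
        return {**config, "max_retries": config["max_retries"] + 1}
    return dict(config)


def governed_promote(current: Config, candidate: Config,
                     tasks: Dict[str, Task]) -> Tuple[Config, str]:
    before, after = evaluate(current, tasks), evaluate(candidate, tasks)
    regressions = [n for n in tasks if before[n] and not after[n]]
    improved = sum(after.values()) > sum(before.values())
    if improved and not regressions:
        return candidate, "promoted"
    return current, f"rejected (regressions: {regressions})"


tasks: Dict[str, Task] = {
    "easy": lambda c: True,
    "flaky_tool": lambda c: c["max_retries"] >= 2,    # needs a retry to succeed
    "tight_budget": lambda c: c["max_retries"] <= 3,  # too many retries exhaust the budget
}
config: Config = {"max_retries": 1}
failures = [n for n, ok in evaluate(config, tasks).items() if not ok]
config, verdict = governed_promote(config, propose_revision(config, failures), tasks)
print(config, verdict)  # {'max_retries': 2} promoted
```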

Scaling the Harness (Scaling Layer)

When task complexity exceeds a single agent's capacity, multi‑agent systems (MAS) share executable code artifacts and use execution feedback as an objective convergence signal. The design space includes:

Specialized functional roles (Manager, Planner, Coder, Reviewer, Tester, Verifier).

Interaction modes (Collaborative Synthesis, Critique & Repair, Adversarial Validation, Reasoning Debate).

Workflow topologies (Waterfall, Agile/Iterative, Hierarchical, Star, Dynamic DAG).

Conclusion

The review serves as a detailed roadmap for researchers building coding agents, computer‑use agents, embodied AI, or scientific discovery agents.

Code as Agent Harness: Toward Executable, Verifiable, and Stateful Agent Systems
https://arxiv.org/pdf/2605.18747
https://github.com/YennNing/Awesome-Code-as-Agent-Harness-Papers
Code as Agent Harness: a panoramic review of executable, inspectable, stateful agent systems
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
