The Most Systematic 102‑Page Review of Agent Harnesses

This article provides a comprehensive overview of the "Code as Agent Harness" paradigm, detailing its three‑layer architecture, the roles of code in reasoning, acting, and environment modeling, the mechanisms that enable reliable long‑term execution, and how multi‑agent systems scale the harness through shared code and feedback loops.

PaperAgent

Jun 5, 2026

Introduction

The rapid progress of large language models (LLMs) in code generation has shifted the focus from code as a final output to code as the operational foundation of autonomous systems, which the authors call the Agent Harness.

Code as Agent Harness

The authors define Code as Agent Harness as code that is executable, inspectable, and stateful, making it the optimal medium to connect model reasoning, environment interaction, and persistent state.

Three‑Layer Architecture

Harness Interface (Interface Layer)

Code serves as a universal interface for agents, supporting three dimensions:

Reasoning: Externalizing intermediate computation to executables or symbolic solvers (Program‑Delegated Reasoning, Formal Verification, Iterative Code‑Grounded Reasoning).

Acting: Translating high‑level intent into executable policies (Grounded Skill Selection, Programmatic Policy Generation, Lifelong Code‑Based Agents).

Environment Modeling: Representing environments with executable artifacts (Structured World Representations, Execution‑Trace World Modeling, Code‑Grounded Evaluation Environments, Verifiable Environment Construction).
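As a hedged illustration of Program‑Delegated Reasoning, the sketch below lets the harness run a short program instead of asking the model to do arithmetic in free text. The `MODEL_PROGRAM` string is a hypothetical stand‑in for model output, `run_delegated_program` is an illustrative name that does not come from the paper, and the restricted namespace is not a real security sandbox.

```python
import math


def run_delegated_program(program: str) -> object:
    """Execute a model-written program and return the value it binds to `result`."""
    # Restricted namespace for illustration only; this is NOT a security sandbox.
    namespace = {"__builtins__": {"range": range, "sum": sum}, "math": math}
    exec(program, namespace)
    return namespace["result"]


# Hypothetical stand-in for text an LLM might emit when asked:
# "What is the sum of the squares of the integers 1 through 10?"
MODEL_PROGRAM = "result = sum(i * i for i in range(1, 11))"

answer = run_delegated_program(MODEL_PROGRAM)
print(answer)  # 385
```

The point of the design is that the intermediate computation becomes an artifact the harness can execute and inspect, rather than a chain of tokens it has to trust.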

Harness Mechanisms (Mechanism Layer)

This layer ensures reliable long‑term execution through five interacting dimensions:

Planning: Four paradigms (Linear Decomposition, Structure‑Grounded Planning, Search‑Based Planning, Orchestration‑Based Planning).

Memory & Context Engineering: Five memory types (Working, Semantic, Experiential, Long‑Term, Multi‑Agent), plus context compaction and state offloading.

Tool Use: Four categories (Function‑Oriented, Environment‑Interaction, Verification‑Driven, Workflow‑Orchestration).

Harness Control: The Plan‑Execute‑Verify (PEV) loop, with four components (Plan as Contract, Sandboxed Execution, Permissioned State Transition, Deterministic Verification).

Agentic Harness Engineering (AHE): Deep telemetry collects trace‑level data; an Evolution Agent diagnoses failures and proposes component revisions, which are evaluated and promoted via Governed Harness Mutation.
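The Plan‑Execute‑Verify loop named under Harness Control can be sketched roughly as follows. This is a minimal toy under stated assumptions, not the paper's implementation: `Step`, `run_pev`, and the example plan are hypothetical names, a shallow dictionary copy stands in for sandboxed execution, and a per‑step set of writable keys stands in for permissioned state transitions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Step:
    """One entry in the plan, treated as a contract the harness enforces."""
    name: str
    action: Callable[[dict], dict]  # returns the state updates it proposes
    check: Callable[[dict], bool]   # deterministic post-condition
    writes: frozenset               # state keys this step is allowed to modify


def run_pev(plan, state):
    log = []
    for step in plan:
        # Execute on a shallow copy (a stand-in for a real sandbox).
        proposed = step.action(dict(state))
        # Permissioned state transition: reject writes outside the contract.
        if not set(proposed) <= step.writes:
            log.append((step.name, "rejected: unpermitted write"))
            break
        candidate = {**state, **proposed}
        # Deterministic verification before the update is committed.
        if not step.check(candidate):
            log.append((step.name, "rejected: verification failed"))
            break
        state = candidate
        log.append((step.name, "committed"))
    return state, log


plan = [
    Step("load", lambda s: {"items": [3, 1, 2]},
         lambda s: len(s["items"]) == 3, frozenset({"items"})),
    Step("sort", lambda s: {"sorted": sorted(s["items"])},
         lambda s: s["sorted"] == [1, 2, 3], frozenset({"sorted"})),
    # This step tries to overwrite a key it has no permission for.
    Step("rogue", lambda s: {"items": []},
         lambda s: True, frozenset({"sorted"})),
]

final_state, log = run_pev(plan, {})
print(final_state)
print(log)
```

In this run the first two steps are committed and the third is rejected before it can touch the state, which is the behavior the contract and permission checks are meant to guarantee.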

Scaling the Harness (Scaling Layer)

When task complexity exceeds a single agent's capacity, multi‑agent systems (MAS) share executable code artifacts and use execution feedback as an objective convergence signal. The design space includes:

Specialized functional roles (Manager, Planner, Coder, Reviewer, Tester, Verifier).

Interaction modes (Collaborative Synthesis, Critique & Repair, Adversarial Validation, Reasoning Debate).

Workflow topologies (Waterfall, Agile/Iterative, Hierarchical, Star, Dynamic DAG).

Conclusion

The review serves as a detailed roadmap for researchers building coding agents, computer‑use agents, embodied AI, or scientific discovery agents.

Code as Agent Harness: Toward Executable, Verifiable, and Stateful Agent Systems
https://arxiv.org/pdf/2605.18747
https://github.com/YennNing/Awesome-Code-as-Agent-Harness-Papers

Code as Agent Harness: a panoramic review of executable, inspectable, stateful agent systems
