
Proactive Failure Recovery: How AgentChord Embeds Recovery Actions into Robot Task Graphs

AgentChord, a system presented at RSS 2026, anticipates potential robot manipulation failures by embedding recovery actions directly into a structured task graph, enabling immediate low‑latency switches to pre‑compiled recovery branches and achieving up to 99.2% success in simulated tasks and 77.5% on real robots.

Machine Heart

May 24, 2026


Background and Motivation

Robotic manipulation is moving from tightly structured industrial settings to open, real‑world environments where tasks involve long execution chains, complex object interactions, and unpredictable disturbances. Small deviations—such as a grasp that is not fully secure, a slight object displacement, or a pose error during hand‑over—can cause downstream steps to diverge from the original plan.

Conventional robot manipulation systems plan only a "normal" trajectory and react to failures after they occur. In long‑horizon tasks this reactive approach adds latency and can lead to repeated roll‑backs and re‑planning.

AgentChord Overview

Researchers from The Chinese University of Hong Kong (Shenzhen), Cross‑Dimensional Intelligence, and Shenzhen Heetao Institute introduced AgentChord, a proactive failure‑recovery framework for robotic manipulation. The core question it addresses is: Can a robot, like a human, anticipate how to rescue a task before it starts? Instead of detecting failures at execution time and replanning, AgentChord predicts possible failures in advance and writes corresponding recovery actions into the task graph.

System Architecture

AgentChord models a manipulation task as a directed task graph whose nodes represent semantic sub‑goals (e.g., "grasp bottle", "move above cup", "pour") and edges represent action transitions. Three specialized agents populate and execute this graph:

Task Structuring Agent parses language instructions and the initial scene to construct a baseline “normal” task skeleton.

Recovery Orchestration Agent scans each critical step, imagines likely failure modes (object slip, pose shift, incomplete grasp, etc.), and inserts recovery nodes whose edges lead back into the main path at an appropriate downstream position.

Execution Compilation Agent compiles both normal and recovery actions into executable robot programs and generates low‑latency monitoring functions that continuously read pose, point‑cloud, gripper state, and joint data.
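The graph structure and the roles of the first two agents can be illustrated with a minimal sketch. All class and function names here (`TaskGraph`, `build_skeleton`, `add_recovery`) are hypothetical and chosen for illustration; the article does not describe AgentChord's actual data structures, and in the real system the skeleton and failure modes come from model-driven agents rather than hand-written calls.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A semantic sub-goal such as "grasp bottle"."""
    name: str
    is_recovery: bool = False


@dataclass
class TaskGraph:
    """Directed graph: nodes are sub-goals, edges are labeled transitions."""
    nodes: dict = field(default_factory=dict)  # name -> Node
    edges: dict = field(default_factory=dict)  # name -> {label: target name}

    def add_node(self, name, is_recovery=False):
        self.nodes[name] = Node(name, is_recovery)
        self.edges.setdefault(name, {})

    def add_edge(self, src, label, dst):
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError(f"unknown node in edge {src!r} -> {dst!r}")
        self.edges[src][label] = dst


def build_skeleton(steps):
    """Hypothetical stand-in for task structuring: chain normal sub-goals."""
    graph = TaskGraph()
    for step in steps:
        graph.add_node(step)
    for src, dst in zip(steps, steps[1:]):
        graph.add_edge(src, "ok", dst)
    return graph


def add_recovery(graph, at, failure, recovery, resume_at):
    """Hypothetical stand-in for recovery orchestration: one failure branch."""
    graph.add_node(recovery, is_recovery=True)
    graph.add_edge(at, failure, recovery)      # taken when the failure is seen
    graph.add_edge(recovery, "ok", resume_at)  # rejoin the main path


graph = build_skeleton(["grasp bottle", "move above cup", "pour"])
add_recovery(graph, at="move above cup", failure="object_slip",
             recovery="regrasp bottle", resume_at="move above cup")
```

Keeping recovery nodes in the same graph as normal nodes means the whole plan, including its failure branches, can be inspected before the robot moves.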

When the monitor detects an anomaly, the robot instantly switches to the pre‑compiled recovery branch, corrects the state, and resumes the original task without a full re‑planning cycle. This "forward recovery" strategy keeps progress toward the final goal rather than reverting to the start.
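The switch to a pre‑compiled branch can be sketched as a simple graph walk. This is an illustrative toy under stated assumptions, not AgentChord's executor: the edge table, the `run` function, and the scripted monitor are all hypothetical, and a real monitor would read sensor data rather than a fixed script.

```python
# Hypothetical edge table: each node maps a monitor label to the next node.
EDGES = {
    "grasp bottle": {"ok": "move above cup"},
    "move above cup": {"ok": "pour", "object_slip": "regrasp bottle"},
    "regrasp bottle": {"ok": "move above cup"},  # recovery rejoins main path
    "pour": {"ok": "done"},
}


def run(edges, start, goal, monitor, max_steps=20):
    """Walk the graph; `monitor(node)` returns "ok" or a failure label."""
    node, trace = start, [start]
    for _ in range(max_steps):
        if node == goal:
            return trace
        label = monitor(node)
        branches = edges[node]
        if label not in branches:
            # No branch was prepared for this failure at this step.
            raise LookupError(f"no pre-compiled branch for {label!r} at {node!r}")
        node = branches[label]  # plain lookup, no re-planning
        trace.append(node)
    raise RuntimeError("step budget exhausted before reaching the goal")


def make_scripted_monitor(failures):
    """Return a monitor that reports each scripted failure exactly once."""
    pending = dict(failures)

    def monitor(node):
        return pending.pop(node, "ok")

    return monitor


monitor = make_scripted_monitor({"move above cup": "object_slip"})
trace = run(EDGES, "grasp bottle", "done", monitor)
print(" -> ".join(trace))
```

With the scripted slip, the walk detours through "regrasp bottle" once and then continues to "pour", which is the forward-recovery behavior described above. The `LookupError` path corresponds to a failure nobody anticipated, where a prepared branch does not exist.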

Proactive Recovery Mechanism

The key insight is to treat failure recovery as part of the task plan, akin to writing an "execution score" with emergency passages before the performance begins. Normal actions form the main melody, while recovery actions are pre‑written variations that the monitor can trigger at the right moment.

Simulation Experiments

Evaluations were conducted in the EmbodiChain simulator and on a real CobotMagic dual‑arm robot across six scenarios: single‑arm pouring, dual‑arm pouring, table‑setting, block hand‑over, towel folding, and coffee‑tray placement. In simulation, three tasks (single‑arm pour, dual‑arm pour, table‑setting) were tested under varying probabilities of object‑drop disturbances.

AgentChord achieved the highest average success rate, 99.2%, with an average execution time of 41.5 s, outperforming baselines such as Inner Monologue, DoReMi, ReKep, and Code‑as‑Monitor.

The advantage stemmed not only from more accurate failure detection but also from having recovery branches ready before a failure occurred, eliminating the need for costly on‑the‑fly model inference and re‑planning.

Real‑World Robot Experiments

On the dual‑arm CobotMagic platform, six real‑world tasks were evaluated. AgentChord obtained an average success rate of 77.5% with an average execution time of 92.2 s. By contrast, Code‑as‑Monitor achieved 72.5% success and 130.9 s execution time.
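To put the two real‑world results side by side, the gaps can be worked out directly from the reported averages. The snippet below uses only the four numbers quoted above; the variable names are illustrative.

```python
# Reported real-world averages from the paragraph above.
agentchord = {"success_pct": 77.5, "time_s": 92.2}
code_as_monitor = {"success_pct": 72.5, "time_s": 130.9}

success_gap = agentchord["success_pct"] - code_as_monitor["success_pct"]
time_saved = code_as_monitor["time_s"] - agentchord["time_s"]
relative_saving = time_saved / code_as_monitor["time_s"]

print(f"{success_gap:.1f} percentage points, "
      f"{time_saved:.1f} s faster ({relative_saving:.1%})")
```

This prints a 5.0 percentage point gap in success and a 38.7 s reduction in execution time, roughly 29.6% less than the Code‑as‑Monitor average.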

In fine‑grained collaborative tasks such as block hand‑over and dual‑arm pouring, the pre‑compiled recovery branches proved especially beneficial: when objects were displaced or poses deviated, the system quickly executed the appropriate recovery action instead of waiting for a full model inference.
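One way to picture why a compiled monitor is cheaper than a model call is a plain threshold check over sensor readings. The sketch below is hypothetical: the `Reading` fields, the failure labels, and the threshold values are assumptions made for illustration and are not taken from AgentChord.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """Hypothetical snapshot of the signals a monitor might read."""
    gripper_width_m: float   # opening between the fingers
    object_offset_m: float   # distance of the object from its expected pose


def check_grasp(reading, min_width_m=0.01, max_offset_m=0.03):
    """Return "ok" or a failure label; thresholds are illustrative only."""
    if reading.gripper_width_m < min_width_m:
        # Fingers nearly closed: the object is probably no longer held.
        return "object_slip"
    if reading.object_offset_m > max_offset_m:
        return "pose_shift"
    return "ok"


print(check_grasp(Reading(gripper_width_m=0.05, object_offset_m=0.00)))
print(check_grasp(Reading(gripper_width_m=0.002, object_offset_m=0.00)))
print(check_grasp(Reading(gripper_width_m=0.05, object_offset_m=0.08)))
```

A check like this is a few comparisons per control cycle, so the returned label can select a prepared branch immediately instead of waiting on a large model.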

Using Recovery Trajectories for Policy Training

Beyond runtime rescue, the recovery trajectories generated by AgentChord serve as valuable training data. In a single‑arm pouring experiment, replacing half of the successful trajectories with AgentChord‑generated recoverable failure trajectories raised the Sim2Real‑VLA policy’s success over 50 disturbance tests from 26/50 (52%) to 39/50 (78%).

This result suggests that robot policies benefit from learning not only how to succeed but also how to continue after errors.
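The data mix described here, swapping half of the success‑only trajectories for recovery trajectories while keeping the dataset size fixed, can be sketched as follows. The function name, the sampling scheme, and the string placeholders standing in for trajectories are assumptions; the article does not specify how the replaced trajectories were selected.

```python
import random


def mix_dataset(success, recovery, replace_fraction=0.5, seed=0):
    """Swap a fraction of success-only trajectories for recovery ones."""
    n_replace = int(len(success) * replace_fraction)
    if n_replace > len(recovery):
        raise ValueError("not enough recovery trajectories to swap in")
    rng = random.Random(seed)  # fixed seed so the mix is reproducible
    kept = rng.sample(success, len(success) - n_replace)
    added = rng.sample(recovery, n_replace)
    mixed = kept + added
    rng.shuffle(mixed)
    return mixed


# Placeholder trajectories: real entries would be state/action sequences.
success = [f"success_{i}" for i in range(10)]
recovery = [f"recovery_{i}" for i in range(8)]
mixed = mix_dataset(success, recovery)
n_recovery = sum(t.startswith("recovery_") for t in mixed)
print(len(mixed), n_recovery)
```

The total stays at the original dataset size, so any change in policy performance comes from the composition of the data rather than from simply adding more of it.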

Implications and Future Work

AgentChord offers a clear organizational framework: task definition, failure anticipation, recovery planning, and continuation are all expressed in a single interpretable graph. Limitations include reliance on large‑model predictions of common failure modes; rare or compound failures may still require dynamic diagnosis.

The modular design allows future upgrades—stronger vision‑language models, more robust 3D perception, richer skill libraries—to be plugged into the recovery‑enhanced task graph, extending its applicability from household service robots to complex assembly lines.

Ultimately, the system aims to shift robots from passively fixing failures after the fact to proactively planning recovery paths for potential failures in advance.
