Enabling Robots to “Think While Acting”: LingBot-VA Paper Accepted at RSS 2026
Researchers from AntLingbo and Hong Kong University present LingBot-VA, a causal world-modeling framework for robot control that predicts how the environment will change and generates actions accordingly. It reaches up to 98.5% success on simulation benchmarks and, with only 50 real-world demonstrations, improves success by more than 20 percentage points over the π0.5 baseline. The paper has been accepted at RSS 2026, and the model is open source.
Research Motivation
For robots, the difficulty lies not only in executing actions but also in understanding how those actions change the environment: for example, how the tabletop looks after a cup is picked up, or how object positions shift after a drawer is pulled open. This study gives robots the ability to first predict how the world will evolve and then choose actions based on those predictions.
Technical Approach
The authors propose a causal world-modeling framework for robot control and implement it as LingBot-VA, which they describe as the world’s first open-source autoregressive video-action model. During task execution, the model continuously predicts upcoming changes in the environment and generates the next action command, giving robots a human-like “observe-judge-act” loop.
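To make the “observe-judge-act” idea concrete, here is a toy Python sketch. Everything in it is an illustrative assumption: a one-dimensional world, a hypothetical `world_model` that underestimates each action's effect by 10%, and a simple search over candidate actions. LingBot-VA itself generates actions with an autoregressive diffusion model, not with this kind of search. The sketch also contrasts planning from real observations (closed loop) with planning from the model's own predictions (open loop).

```python
from typing import List

# Toy, hypothetical setup: the "world" is a 1-D object position and the
# robot wants to move it to `goal`. None of this is LingBot-VA's code.


def true_env_step(pos: float, action: float) -> float:
    """Real environment: the action moves the object by exactly `action`."""
    return pos + action


def world_model(pos: float, action: float) -> float:
    """Imperfect learned model (assumption): predicts only 90% of the true displacement."""
    return pos + 0.9 * action


def choose_action(pos: float, goal: float, candidates: List[float]) -> float:
    """'Judge' step: imagine each candidate's outcome and pick the one predicted closest to the goal."""
    return min(candidates, key=lambda a: abs(world_model(pos, a) - goal))


def run(goal: float, steps: int, closed_loop: bool) -> float:
    """Run the observe-judge-act loop and return the final distance to the goal."""
    candidates = [x / 10 for x in range(-10, 11)]  # actions in [-1.0, 1.0]
    real_pos = 0.0      # ground-truth state of the world
    believed_pos = 0.0  # state the robot plans from
    for _ in range(steps):
        action = choose_action(believed_pos, goal, candidates)  # judge
        predicted = world_model(believed_pos, action)           # imagined outcome
        real_pos = true_env_step(real_pos, action)               # act
        # Closed loop: observe the real result and plan from it.
        # Open loop: keep planning from the model's own prediction, so errors compound.
        believed_pos = real_pos if closed_loop else predicted
    return abs(real_pos - goal)


if __name__ == "__main__":
    print("closed-loop error:", run(goal=3.0, steps=10, closed_loop=True))
    print("open-loop error:  ", run(goal=3.0, steps=10, closed_loop=False))
```

With `goal=3.0` and 10 steps, the closed-loop run ends exactly at the goal, while the open-loop run overshoots to about 3.3, because the robot keeps trusting a prediction that says it has not yet arrived. This is the kind of long-horizon error accumulation that feeding back real observations is meant to curb.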
LingBot‑VA embeds causal relationships into its architecture: each prediction step relies only on prior observations and actions, respecting the forward‑in‑time nature of physical reality. The system uses a Mixture‑of‑Transformers (MoT) architecture that unifies video prediction and action generation within a single autoregressive diffusion framework, and it incorporates a closed‑loop inference mechanism that constantly ingests real‑world feedback to reduce error accumulation over long horizons.
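A common way to make a transformer respect “each prediction step relies only on prior observations and actions” is a block-causal attention mask. The Python sketch below is an assumption, not LingBot-VA's published masking scheme: it interleaves, per timestep, a block of video tokens followed by a block of action tokens (the per-step token counts are hypothetical). Video tokens within a step may see each other, a step's action tokens may see that step's video tokens, and no token may attend to a later timestep.

```python
from typing import List, Tuple

VIDEO, ACTION = 0, 1  # segment order within a timestep: video first, then action


def build_layout(num_steps: int, video_tokens: int, action_tokens: int) -> List[Tuple[int, int]]:
    """Return (timestep, segment) for every token in the interleaved sequence."""
    layout: List[Tuple[int, int]] = []
    for t in range(num_steps):
        layout += [(t, VIDEO)] * video_tokens
        layout += [(t, ACTION)] * action_tokens
    return layout


def causal_mask(layout: List[Tuple[int, int]]) -> List[List[bool]]:
    """mask[i][j] is True if token i may attend to token j (True = attend)."""
    n = len(layout)
    mask = [[False] * n for _ in range(n)]
    for i, (ti, si) in enumerate(layout):
        for j, (tj, sj) in enumerate(layout):
            # Any earlier timestep is visible; within the same timestep,
            # a token sees its own segment and the segments before it.
            mask[i][j] = tj < ti or (tj == ti and sj <= si)
    return mask


if __name__ == "__main__":
    # 2 timesteps, 2 video tokens + 1 action token each.
    for row in causal_mask(build_layout(2, 2, 1)):
        print("".join("x" if allowed else "." for allowed in row))
    # xx....
    # xx....
    # xxx...
    # xxxxx.
    # xxxxx.
    # xxxxxx
```

The boolean convention (True means “may attend”) matches what, for example, PyTorch's `scaled_dot_product_attention` expects for a boolean `attn_mask`, so a mask like this could be converted to a tensor and passed in directly.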
Experimental Results
On the RoboTwin 2.0 benchmark covering 50 dual‑arm manipulation tasks, LingBot‑VA achieves average success rates of 92.0% in the Easy setting and 91.1% in the Hard setting. On the LIBERO benchmark it reaches 98.5%.
In real-world evaluations on six challenging tasks (covering long-horizon, high-precision, deformable-object, and articulated-object manipulation), LingBot-VA adapts using only 50 real demonstration trajectories and improves the overall success rate by more than 20 percentage points over the industry baseline π0.5, indicating strong data efficiency and generalization.
Open Source and Future Work
The paper’s acceptance at RSS 2026 marks international recognition of AntLingbo’s work on world-model-driven robot control. Model weights, training code, and inference scripts were released in January 2026 on the ModelScope community, Hugging Face, and GitHub. The authors expect this line of causal world modeling to help robots move beyond following commands toward deeper understanding of their environment, broader task generalization, and autonomous decision-making.