Terminal-World: Large-Scale Environment Synthesis for Terminal Agents

The paper presents Terminal-World, an automated pipeline that uses Agent Skills to generate diverse terminal‑agent training data, builds over 5,700 environments, and trains models that outperform existing baselines on multiple benchmarks despite using far less data.

Machine Learning Algorithms & Natural Language Processing

Terminal agents enable large language models to execute tasks directly in command‑line environments, but their progress is limited by a scarcity of high‑quality training data. Existing data‑generation approaches rely on manually defined seed tasks or local GitHub repositories, leading to narrow task distributions, mismatched environments, and teacher trajectories containing excessive unguided exploration and inefficient operations.

To address these issues, the paper introduces Terminal‑World, a fully automated pipeline for synthesizing terminal‑agent data. The pipeline treats Agent Skills as primitive synthesis units, leveraging the implicit “what to do, when to use, how to execute” information in each skill to jointly generate task instructions, executable environments, and teacher trajectories. To further enlarge the synthesis space, Terminal‑World composes individual skills into skill teams and skill graphs, supporting multi‑role and cross‑domain complex tasks.
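The summary does not describe how skills or synthesized samples are actually represented, so the following is only a minimal sketch under stated assumptions. It illustrates the idea of a skill carrying "what / when / how" information, of a skill team yielding an instruction, an environment, and a trajectory together, and of a skill graph whose paths define multi-step tasks. Every name here (Skill, SynthesisSample, synthesize, paths) is hypothetical, and the template-based generation is a stand-in for whatever model-driven step the real pipeline uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    """Hypothetical record for one Agent Skill (field names are assumptions)."""
    name: str
    what: str             # what the skill does
    when: str             # when it should be used
    how: tuple[str, ...]  # shell commands that carry it out


@dataclass
class SynthesisSample:
    """One synthesized item: the three artifacts named in the text."""
    instruction: str
    environment: dict
    trajectory: list[str]


def synthesize(skills: list[Skill]) -> SynthesisSample:
    """Toy stand-in for generation; a real pipeline would presumably call an LLM."""
    instruction = " Then ".join(f"{s.what} ({s.when})" for s in skills)
    environment = {"required_skills": [s.name for s in skills]}
    trajectory = [cmd for s in skills for cmd in s.how]
    return SynthesisSample(instruction, environment, trajectory)


def paths(graph: dict[str, list[str]], start: str) -> list[list[str]]:
    """Enumerate root-to-leaf paths in an acyclic skill graph (adjacency lists)."""
    successors = graph.get(start, [])
    if not successors:
        return [[start]]
    return [[start] + rest for nxt in successors for rest in paths(graph, nxt)]


# A two-skill "team": the combined sample draws on both skills in order.
extract = Skill("extract", "Unpack the archive", "when given a .tar.gz",
                ("tar -xzf data.tar.gz",))
count = Skill("count", "Count lines in each file", "after files are unpacked",
              ("wc -l *.txt",))
sample = synthesize([extract, count])

# A small skill graph: each path is one candidate multi-step task.
task_paths = paths({"extract": ["count", "search"], "count": ["report"]}, "extract")
```

In this toy setup, composing skills enlarges the task space combinatorially: each additional edge in the graph adds new paths, and each path can be turned into its own sample.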

Using this pipeline, the authors built 5,723 training environments and trained three models: Terminal‑World‑8B, Terminal‑World‑14B, and Terminal‑World‑32B. Experiments on six benchmarks show that they consistently outperform existing terminal‑agent baselines. Notably, Terminal‑World‑32B, trained with only 1.2% of the data used for Nemotron‑Terminal, achieves a Pass@1 score of 31.5 on Terminal‑Bench 2.0, surpassing Nemotron‑Terminal‑32B, and reaches a Pass@3 of 43.8.
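The summary does not say how the Pass@1 and Pass@3 figures are computed. As a point of reference only, the sketch below shows the unbiased pass@k estimator that is widely used in code-generation evaluation: given n sampled attempts on a task, of which c succeed, it estimates the probability that at least one of k randomly chosen attempts succeeds. Whether Terminal-Bench 2.0 or the authors use exactly this estimator is an assumption, and the function name pass_at_k is illustrative.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n attempts with c successes.

    Computes 1 - C(n - c, k) / C(n, k): one minus the chance that
    k attempts drawn without replacement are all failures.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    return 1.0 - comb(n - c, k) / comb(n, k)


# One success out of three attempts on a single task:
single = pass_at_k(n=3, c=1, k=1)  # chance a single random attempt succeeds
triple = pass_at_k(n=3, c=1, k=3)  # chance at least one of all three succeeds
```

A benchmark-level score would then be the mean of this per-task value over all tasks, typically reported as a percentage.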

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

Machine Learning Algorithms & Natural Language Processing
