ATLAS: One Word Unifies Agentic and Latent Visual Reasoning
ATLAS introduces a discrete functional token that serves simultaneously as an agentic operation and a latent reasoning unit, enabling large multimodal models to perform visual tasks without external tools or intermediate image generation. Trained with SFT plus RL and a token-level gradient-anchor technique, ATLAS achieves competitive results.
