How AutoMoT Leverages Large‑Model Understanding for End‑to‑End Driving Decisions and Trajectory Planning
AutoMoT is a unified Vision‑Language‑Action (VLA) model that couples a 4B‑parameter Qwen3‑VL understanding expert with a 1.6B‑parameter action expert through layer‑wise shared attention and asynchronous inference. It reports state‑of‑the‑art results on Bench2Drive and nuScenes while preserving the general capabilities of the underlying VLM.
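To make the two mechanisms named above more concrete, here is a minimal NumPy sketch, not AutoMoT's actual implementation. It assumes that "layer‑wise shared attention" means the action expert's tokens attend, at every layer, over the understanding expert's keys and values concatenated with their own, and that "asynchronous inference" means the large expert's keys and values are cached and refreshed only every few control steps. All names (`AsyncPlanner`, `shared_attention`, `refresh_every`), the dimensions, and the random weights are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def shared_attention(action_h, und_k, und_v, wq, wk, wv):
    """Action tokens attend over understanding K/V plus their own K/V."""
    q = action_h @ wq
    k = np.concatenate([und_k, action_h @ wk], axis=0)
    v = np.concatenate([und_v, action_h @ wv], axis=0)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v


class AsyncPlanner:
    """Toy two-expert planner (hypothetical; assumed structure only)."""

    def __init__(self, n_layers=2, dim=8, refresh_every=4, seed=0):
        rng = np.random.default_rng(seed)

        def mat():
            return rng.normal(scale=0.3, size=(dim, dim))

        # Per-layer projections: (K, V) for the understanding expert,
        # (Q, K, V) for the action expert.
        self.und_weights = [(mat(), mat()) for _ in range(n_layers)]
        self.act_weights = [(mat(), mat(), mat()) for _ in range(n_layers)]
        self.refresh_every = refresh_every
        self.cache = None   # per-layer (K, V) from the understanding expert
        self.refreshes = 0  # how often the slow expert actually ran

    def run_understanding(self, scene_tokens):
        # Stand-in for the expensive VLM forward pass.
        self.cache = [(scene_tokens @ wk, scene_tokens @ wv)
                      for wk, wv in self.und_weights]
        self.refreshes += 1

    def step(self, t, scene_tokens, action_tokens):
        # Slow path: refresh the cache only every `refresh_every` steps.
        if self.cache is None or t % self.refresh_every == 0:
            self.run_understanding(scene_tokens)
        # Fast path: the action expert runs at every step against the cache.
        h = action_tokens
        for (wq, wk, wv), (uk, uv) in zip(self.act_weights, self.cache):
            h = h + shared_attention(h, uk, uv, wq, wk, wv)
        return h


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    planner = AsyncPlanner()
    scene = rng.normal(size=(5, 8))    # 5 scene tokens
    action = rng.normal(size=(3, 8))   # 3 action (trajectory) tokens
    for t in range(8):
        out = planner.step(t, scene, action)
    print(out.shape, planner.refreshes)
```

In this toy setup the action expert runs at all eight steps while the understanding expert runs only at steps 0 and 4, which is the general idea behind decoupling a slow reasoning model from a fast control loop. The refresh interval of 4 is an arbitrary choice for illustration.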
