Tagged articles
1 article
Machine Heart
May 15, 2026 · Artificial Intelligence

How X2SAM Empowers Multimodal Models to Segment Images and Videos at Pixel Level

X2SAM is a unified multimodal large model that combines image and video segmentation with language and visual prompts, introduces a Mask Memory for temporal consistency, defines a new V‑VGD task, and achieves state‑of‑the‑art results while cutting training cost by over 30%.

Large Language Model · V-VGD · X2SAM