Tagged articles
2 articles
Data Party THU
Oct 29, 2025 · Artificial Intelligence

Can Test-Time Scaling Unlock More Reliable Vision‑Language‑Action Robots?

The paper introduces RoboMonkey, a framework that applies a generate-and-verify paradigm and test-time scaling to Vision-Language-Action (VLA) models. It shows that increasing sampling and verification at inference dramatically reduces action error across multiple VLA architectures, and it presents scalable verifier training, synthetic data augmentation, and efficient deployment strategies.

AI research · Action Verification · RoboMonkey
8 min read
Xiaohongshu Tech REDtech
Jun 20, 2022 · Artificial Intelligence

Action Sequence Verification in Videos with CosAlignment Transformer (CAT)

The paper introduces Action Sequence Verification (ASV), a task that determines whether two videos follow the same ordered actions. It provides the Chemical Sequence Verification dataset along with re-annotated COIN-SV and Diving48-SV sets, and proposes the CosAlignment Transformer (CAT), which combines intra-step feature extraction, a Transformer-based inter-step encoder, and a sequence-alignment loss. CAT outperforms prior baselines and serves as a pre-training model for video retrieval and classification.

Action Verification · Multimodal · Transformer
7 min read