7 Kuaishou AI Papers Accepted at ACL 2025: Video Understanding & Safe LLM Decoding
Kuaishou’s foundational large-model team has secured seven papers at ACL 2025, spanning alignment bias in training, safety defenses during inference, decoding strategies, fine-grained video-temporal understanding, reward fairness in RLHF, multimodal captioning benchmarks, and methods to curb hallucinations in vision-language models.
The 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025) will take place from July 27 to August 1 in Vienna. The conference has announced its acceptance list, and seven papers from Kuaishou's foundational large-model team have been accepted.
The accepted works cover frontier topics in large models, including alignment bias during training, inference-time safety protection, decoding strategies and reliability, fine-grained video-temporal understanding, and evaluation benchmarks.
Paper 01: TUNA – Comprehensive Fine‑grained Temporal Understanding Evaluation on Dense Dynamic Videos
Type: ACL 25 Main
Link: https://friedrichor.github.io/projects/TUNA/
Abstract: Existing video-understanding benchmarks treat temporal elements such as shots, scenes, actions, and attributes separately or cover only a few of them, overlooking overall video coherence. TUNA introduces a temporal-focused benchmark for dense dynamic videos with two complementary tasks, video description and question answering, featuring diverse scenes, dynamic attributes, and interpretable, robust evaluation metrics. Evaluating leading models on TUNA reveals challenges such as limited action description, insufficient multi-entity understanding, and insensitivity to camera motion.
Paper 02: Root Defense Strategies – Ensuring Safety of LLM at the Decoding Level
Type: ACL 25 Main
Link: https://arxiv.org/pdf/2410.06809
Abstract: As large language models (LLMs) advance, the risk of harmful outputs triggered by erroneous or malicious prompts grows. Existing jailbreak defenses operate only at the prefill stage and do not exploit decoding-stage information, which limits their effectiveness and robustness and often sacrifices usefulness. This work examines and quantifies LLMs' ability to assess token-level danger, and proposes a decoding-oriented, step-wise defense that corrects harmful queries rather than rejecting them outright, using speculative decoding to preserve usability. Experiments show improved safety without slowing inference.
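For intuition, here is a minimal Python sketch of the general idea of decoding-stage safety checking: score the partial output at every step and steer away from risky tokens instead of refusing the whole query. The helpers `sample_next`, `harm_score`, and the banned-token re-sampling are hypothetical stand-ins, not the paper's actual components (which also rely on speculative decoding).

```python
# Minimal sketch of step-wise, decoding-stage safety checking.
# NOT the paper's algorithm: sample_next(), harm_score(), and the
# "ban and re-sample" correction are hypothetical placeholders.

def safe_decode(model, prompt_ids, max_new_tokens=256, threshold=0.5):
    """Generate token by token, scoring the partial output for harm at each step."""
    output_ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        next_id = model.sample_next(output_ids)          # hypothetical: one decoding step
        risk = model.harm_score(output_ids + [next_id])  # hypothetical: per-step danger estimate
        if risk > threshold:
            # Steer toward a safer continuation instead of refusing outright,
            # here simplified to masking the risky token and re-sampling.
            next_id = model.sample_next(output_ids, ban=[next_id])
        output_ids.append(next_id)
        if next_id == model.eos_token_id:
            break
    return output_ids
```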
Paper 03: Towards Reward Fairness in RLHF – From a Resource Allocation Perspective
Type: ACL 25 Main
Link: https://arxiv.org/pdf/2505.23349
Abstract: Reward functions serve as proxies for human preferences in Reinforcement Learning from Human Feedback (RLHF), but imperfect rewards can introduce biases such as length preference that harm alignment. This paper treats reward as a resource to be allocated, balancing utility and fairness. Two fairness mechanisms, a regularization term and a coefficient, are introduced for the validation and RL stages, yielding fairer reward and policy models. Experiments demonstrate more equitable alignment of LLMs with human preferences.
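As a rough illustration of how a fairness-style term could enter reward-model training, the sketch below adds a length-debiasing penalty to a standard pairwise loss. The penalty form and the weight `lam` are assumptions made for illustration; the paper's actual regularization and coefficient mechanisms are not spelled out in this abstract and are not reproduced here.

```python
import torch
import torch.nn.functional as F

def reward_loss_with_length_fairness(r_chosen, r_rejected, len_chosen, len_rejected, lam=0.1):
    """Pairwise reward-model loss plus an illustrative length-fairness penalty.

    NOT the paper's method; this only shows one concrete way a fairness-style
    regularizer could discourage length bias in rewards.
    """
    # Standard Bradley-Terry preference loss on the reward margin
    margin = r_chosen - r_rejected
    pref_loss = -F.logsigmoid(margin).mean()

    # Penalize the covariance between reward margin and length margin,
    # so a response cannot earn extra reward simply by being longer.
    length_margin = (len_chosen - len_rejected).float()
    cov = ((margin - margin.mean()) * (length_margin - length_margin.mean())).mean()

    return pref_loss + lam * cov.abs()
```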
Paper 04: HAIC – Improving Human Action Understanding and Generation with Better Captions for Multimodal Large Language Models
Type: ACL 25 Main
Link: https://arxiv.org/abs/2502.20811
Abstract: Multimodal large language models have progressed in video understanding, yet lack high-quality data for human-action videos, limiting performance. The authors propose a two-stage annotation pipeline to collect videos with clear human actions and annotate them with standardized, attribute-rich, temporally ordered descriptions. The resulting HAICTrain (126K video-caption pairs) and HAICBench (500 manually annotated pairs plus 1,400 QA pairs) enable comprehensive evaluation. Training on HAICTrain significantly improves action understanding and video-to-text generation quality.
Paper 05: GODBench – A Benchmark for Multimodal Large Language Models in Video Comment Art
Type: ACL 25 Main
Link: https://stan-lei.github.io/KwaiMM-Dialogue/paper3-godbench.html
Abstract: Video comment art enriches user engagement through humor, satire, or emotional resonance, demanding deep cultural and contextual understanding. While multimodal LLMs excel at STEM tasks, they struggle with creative video comment generation, and existing benchmarks lack modality diversity and coverage. GODBench is a multimodal benchmark for evaluating MLLMs' ability to generate artistic video comments; the paper also proposes a Ripple of Thought (RoT) multi-step reasoning framework that markedly enhances creative generation, sketched generically below.
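The abstract does not detail RoT, but a generic staged-prompting loop conveys the flavor of multi-step creative reasoning. The stages and the `call_mllm` wrapper below are illustrative assumptions, not the paper's definition.

```python
# Generic staged-prompting loop in the spirit of multi-step reasoning
# frameworks. The stage prompts and call_mllm() wrapper are hypothetical,
# not the Ripple of Thought procedure itself.

STAGES = [
    "Describe the key visual events and mood of this video.",
    "List cultural references, wordplay, or emotional angles the scene suggests.",
    "Draft three candidate comments that use those angles.",
    "Pick the wittiest candidate and refine it into a single short comment.",
]

def staged_comment(call_mllm, video, context=""):
    """Run the model once per stage, feeding each stage's output into the next."""
    for prompt in STAGES:
        context = call_mllm(video=video, prompt=f"{context}\n\n{prompt}".strip())
    return context  # final refined comment
```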
Paper 06: Mixture of Decoding – An Attention‑Inspired Adaptive Decoding Strategy to Mitigate Hallucinations in Large Vision‑Language Models
Type: ACL 25 Findings
Link: https://arxiv.org/pdf/25
Abstract: Large vision-language models (LVLMs) achieve impressive results but still suffer from hallucinations. The proposed Mixture of Decoding (MoD) adapts decoding based on whether the model's attention is correct: when the tokens the model focuses on align with the image content, a complementary strategy amplifies the key information; when they are misaligned, a contrastive strategy suppresses the misleading cues. Experiments show MoD outperforms existing decoding methods across major benchmarks, effectively reducing hallucinations.
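A hedged sketch of what such an adaptive mixing step could look like: two logit distributions are combined additively or contrastively depending on an attention-alignment flag. Here `logits_focus` and `attention_aligned` are assumed inputs standing in for the paper's attention-derived signals, which are not specified in this abstract.

```python
import torch

def mod_style_step(logits_full, logits_focus, attention_aligned, alpha=0.5):
    """One decoding step illustrating adaptive complementary/contrastive mixing.

    logits_full       : logits from the normal forward pass.
    logits_focus      : hypothetical logits from a pass restricted to the image
                        regions the model attends to (assumed stand-in).
    attention_aligned : hypothetical boolean flag, True when the attended
                        regions match the question-relevant image content.
    """
    if attention_aligned:
        # Complementary strategy: reinforce the image-grounded evidence
        mixed = logits_full + alpha * logits_focus
    else:
        # Contrastive strategy: subtract it to suppress misleading cues
        mixed = logits_full - alpha * logits_focus
    return torch.log_softmax(mixed, dim=-1)
```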
Paper 07: VidCapBench – A Comprehensive Benchmark of Video Captioning for Controllable Text‑to‑Video Generation
Type: ACL 25 Findings
Link: https://arxiv.org/pdf/2502.12782
Abstract: Controllable text-to-video (T2V) generation relies on high-quality video-caption pairs, yet current evaluations treat caption quality separately from T2V generation. VidCapBench provides a caption-format-agnostic evaluation framework that annotates videos with aesthetics, content, motion, and physical-law attributes, split into automatically and manually assessable subsets. Extensive evaluation of state-of-the-art captioning models demonstrates VidCapBench's stability and comprehensiveness, and its scores correlate strongly with T2V quality metrics, offering valuable guidance for training T2V models.
These papers collectively showcase Kuaishou’s advances in AI research, spanning multimodal understanding, safety, fairness, and evaluation benchmarks.
Kuaishou Tech
Official Kuaishou tech account, providing real-time updates on the latest Kuaishou technology practices.