Tag

video understanding


Kuaishou Large Model
Jun 5, 2025 · Artificial Intelligence

7 Kuaishou Papers Accepted at ACL 2025 Reveal Cutting‑Edge AI Advances

Kuaishou's foundational large-model team had seven papers accepted at ACL 2025, covering alignment bias during model training, inference-time safety, decoding strategies, fine-grained temporal video understanding, and new evaluation benchmarks for multimodal large language models.

ACL 2025 · Large Language Models · benchmark
16 min read
Kuaishou Tech
Jun 5, 2025 · Artificial Intelligence

7 Kuaishou AI Papers Accepted at ACL 2025: Video Understanding & Safe LLM Decoding

Kuaishou’s foundational large-model team had seven papers accepted at ACL 2025, spanning alignment bias in training, inference-time safety defenses, decoding strategies, fine-grained temporal video understanding, reward fairness in RLHF, multimodal captioning benchmarks, and methods to curb hallucinations in vision-language models.

ACL · AI safety · Large Language Models
13 min read
DaTaobao Tech
Aug 21, 2023 · Artificial Intelligence

Action Sensitivity Learning for Temporal Action Localization

The paper presents Action Sensitivity Learning (ASL), a framework that models frame-wise importance at both the class level (via learnable Gaussian distributions) and the instance level (using quality scores), integrates these weights into the classification and regression losses, adds a contrastive InfoNCE term, and achieves state-of-the-art temporal action localization performance across six benchmark datasets.

Action Sensitivity Learning · Temporal Action Localization · computer vision
8 min read
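As a rough illustration of the class-level weighting mentioned in the Action Sensitivity Learning summary above, the sketch below weights each frame of an action instance with a Gaussian over its normalized position and applies those weights to per-frame losses. The function names, the frame-center normalization, and the fixed `mu` and `sigma` values are assumptions made for illustration; in the paper these parameters are learned per class, and the exact formulation may differ.

```python
import math


def gaussian_frame_weights(num_frames, mu, sigma):
    """Return one weight per frame from a Gaussian over normalized frame position.

    Assumption: each frame is placed at its center in [0, 1]; mu and sigma
    stand in for per-class parameters that would normally be learned.
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    positions = [(t + 0.5) / num_frames for t in range(num_frames)]
    return [math.exp(-((p - mu) ** 2) / (2 * sigma ** 2)) for p in positions]


def weighted_frame_loss(per_frame_losses, weights):
    """Average per-frame losses, weighting each frame by its sensitivity."""
    if len(per_frame_losses) != len(weights):
        raise ValueError("losses and weights must have the same length")
    total_weight = sum(weights)
    return sum(w * l for w, l in zip(weights, per_frame_losses)) / total_weight


# Hypothetical example: a 5-frame action whose middle frames matter most.
weights = gaussian_frame_weights(num_frames=5, mu=0.5, sigma=0.2)
loss = weighted_frame_loss([0.9, 0.4, 0.1, 0.4, 0.9], weights)
```

With `mu` at the center, frames near the middle of the action dominate the loss, so errors on the boundary frames contribute less.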
HomeTech
Jul 7, 2023 · Artificial Intelligence

Multi-Modal Video Understanding and AIGC Video Generation at Autohome

This article presents a comprehensive multi-modal video understanding system for AIGC video generation, detailing technical architecture, GCN-based semi-supervised learning, and practical applications across automotive content scenarios.

AIGC · BERT · NeXtVLAD
8 min read
DataFunSummit
Jun 22, 2022 · Artificial Intelligence

Generating and Applying Social Relationship Graphs for Video Understanding

This talk presents recent research on integrating dynamic analysis and graph machine learning to generate social relationship graphs from video, detailing hierarchical graph convolution networks, multimodal feature fusion, weakly supervised training, experimental results, and applications such as enhanced video retrieval and storyline understanding.

Weak Supervision · graph neural network · multimodal
11 min read
Xiaohongshu Tech REDtech
Jun 20, 2022 · Artificial Intelligence

Action Sequence Verification in Videos with CosAlignment Transformer (CAT)

The paper introduces Action Sequence Verification (ASV), a task that determines whether two videos follow the same ordered actions, provides the Chemical Sequence Verification dataset and re‑annotated COIN‑SV and Diving48‑SV sets, and proposes the CosAlignment Transformer (CAT) with intra‑step feature extraction, a Transformer‑based inter‑step encoder, and a sequence‑alignment loss that outperforms prior baselines and serves as a pre‑training model for video retrieval and classification.

Transformer · action verification · computer vision
7 min read
DataFunTalk
May 20, 2022 · Artificial Intelligence

Hierarchical Graph Convolutional Networks for Video Social Relationship Modeling

This article presents a multimodal approach that combines dynamic analysis and graph machine learning to generate and apply social relationship graphs in videos, detailing problem background, graph generation modules, applications such as video retrieval, experimental results, and future research directions.

AI · Weak Supervision · graph neural network
11 min read
AntTech
Oct 19, 2021 · Artificial Intelligence

Target Re‑identification and Occluded Video Instance Segmentation: Applications in Insurance Claims and Pet Identification

The article introduces pet identity verification using target re‑identification and occluded video instance segmentation, describes recent ICCV VIPriors competitions where Ant Group’s insurance team achieved top ranks, and explains how these computer‑vision techniques are applied to insurance claims, pet identification, and future AI scenarios.

Insurance AI · Target Re-identification · computer vision
7 min read
Tencent Advertising Technology
May 28, 2021 · Artificial Intelligence

Insights from the Tencent Advertising Algorithm Competition: Model Framework and Optimization Strategies

The article shares a Tencent competition champion’s practical TensorFlow‑based video ad solution, detailing data handling, model architecture, optimization tricks, multimodal fusion techniques, and experimental observations to help participants improve performance in the 2021 Tencent Advertising Algorithm Contest.

TensorFlow · advertising algorithm · competition
7 min read
Youku Technology
Mar 23, 2021 · Artificial Intelligence

Text-Video Alignment Algorithm for Automated Short Video Production at Youku

Youku’s new text‑video alignment system automatically generates short video summaries by extracting multimodal video and linguistic features, matching sentences to clips through embedding and tag‑level models, and enabling AI‑driven auto‑editing that cuts production time from days to minutes.

BERT · NLP · cross-modal matching
10 min read
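The embedding-based matching of sentences to clips mentioned in the Youku summary above can be pictured with a small sketch: each sentence is paired with the clip whose embedding has the highest cosine similarity. The toy vectors and the function names are hypothetical, and the sketch covers only the embedding side; the summary also mentions tag-level models, which are not shown here.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def match_sentences_to_clips(sentence_vecs, clip_vecs):
    """For each sentence embedding, return (best_clip_index, similarity)."""
    matches = []
    for sentence in sentence_vecs:
        scores = [cosine(sentence, clip) for clip in clip_vecs]
        best = max(range(len(scores)), key=scores.__getitem__)
        matches.append((best, scores[best]))
    return matches


# Hypothetical 2-D embeddings; real ones would come from text and video encoders.
sentences = [[1.0, 0.0], [0.0, 1.0]]
clips = [[0.0, 2.0], [3.0, 0.1]]
matches = match_sentences_to_clips(sentences, clips)
```

A production system would likely add a similarity threshold and prevent one clip from being reused for many sentences; both are left out to keep the sketch short.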
iQIYI Technical Product Team
Aug 7, 2020 · Artificial Intelligence

Boundary Content Graph Neural Network (BC‑GNN) for Temporal Action Proposal Generation

The Boundary Content Graph Neural Network (BC‑GNN) introduces a bipartite‑graph framework that jointly refines start/end boundary probabilities and segment‑content confidence, enabling more precise temporal action proposals and achieving state‑of‑the‑art results on ActivityNet‑1.3 and THUMOS14.

BC-GNN · Temporal Action Proposal · computer vision
10 min read
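To make the interplay of boundary probabilities and content confidence in the BC-GNN summary above more concrete, the following sketch ranks candidate proposals by multiplying a start probability, an end probability, and a content confidence for each (start, end) pair. The product scoring rule, the input layout, and the function name are assumptions for illustration; the graph network that refines these values is not modeled.

```python
def rank_proposals(start_probs, end_probs, content_conf):
    """Score every (start, end) pair with start < end and sort by score.

    Assumptions: start_probs[i] and end_probs[j] are per-location boundary
    probabilities, and content_conf maps (start, end) to a confidence in
    [0, 1]; pairs missing from content_conf get a confidence of 0.
    """
    proposals = []
    for start, p_start in enumerate(start_probs):
        for end, p_end in enumerate(end_probs):
            if end <= start:
                continue  # a proposal must end after it starts
            confidence = content_conf.get((start, end), 0.0)
            proposals.append((start, end, p_start * p_end * confidence))
    return sorted(proposals, key=lambda item: item[2], reverse=True)


# Hypothetical values for a video split into three temporal locations.
ranked = rank_proposals(
    start_probs=[0.9, 0.1, 0.0],
    end_probs=[0.0, 0.2, 0.8],
    content_conf={(0, 1): 0.9, (0, 2): 0.5, (1, 2): 0.4},
)
```

Here a proposal needs strong evidence on both boundaries and on its content to rank highly, which is the intuition behind treating boundaries and content jointly.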
HomeTech
Mar 4, 2020 · Artificial Intelligence

Video Multi-Label Classification Using Graph Convolutional Networks

This paper introduces a method for video multi-label classification that incorporates label correlation features using graph convolutional networks, significantly improving classification performance.

GCN · InceptionV3 · NeXtVLAD
7 min read
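One way to picture how a graph convolutional network can inject label correlations, as the summary above describes, is a single propagation step over a label graph. The sketch below applies the standard rule ReLU(Â H W), where Â is the symmetrically normalized adjacency with self-loops. The toy adjacency, the label features, and the weight matrix are hypothetical, and how the article actually builds its label graph is not stated in the summary.

```python
import math


def normalize_adjacency(adj):
    """Return D^(-1/2) (A + I) D^(-1/2) for a square adjacency matrix."""
    n = len(adj)
    with_loops = [
        [adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)
    ]
    degree = [sum(row) for row in with_loops]
    return [
        [with_loops[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
        for i in range(n)
    ]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def gcn_layer(adj_norm, features, weight):
    """One graph-convolution step: ReLU(adj_norm @ features @ weight)."""
    out = matmul(matmul(adj_norm, features), weight)
    return [[max(0.0, value) for value in row] for row in out]


# Hypothetical graph: two labels that frequently co-occur.
adj_norm = normalize_adjacency([[0.0, 1.0], [1.0, 0.0]])
label_features = [[1.0, 0.0], [0.0, 1.0]]
identity_weight = [[1.0, 0.0], [0.0, 1.0]]
mixed = gcn_layer(adj_norm, label_features, identity_weight)
```

After one step each label's representation is a blend of its own features and those of correlated labels, which is what lets correlated tags reinforce each other at prediction time.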
DataFunTalk
Jul 26, 2019 · Artificial Intelligence

Hulu’s Video Content Understanding: Challenges, Practices, and Applications

This article summarizes Hulu Chief Research Officer Xie Xiaohui’s presentation on why video content understanding is essential, the technical challenges involved, and Hulu’s end‑to‑end solutions—including fine‑grained segmentation, logo and subtitle detection, automated pipelines, tagging taxonomy, content generation, and vector embeddings—to improve recommendation, advertising, and search for massive video libraries.

AI · Hulu · content tagging
14 min read
iQIYI Technical Product Team
Dec 28, 2018 · Artificial Intelligence

Short Video Tagging Using Neural Networks

The paper presents a gated‑attention neural network that fuses audio, visual, and title text features to automatically generate high‑quality tags for short videos, achieving state‑of‑the‑art performance on the YouTube‑8M challenge and enabling scalable tagging and recommendation services with future plans for broader tag coverage and temporal segment tagging.

AI · YouTube-8M dataset · attention mechanisms
7 min read
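The fusion of audio, visual, and title features described in the summary above might look roughly like the sketch below: softmax attention weights combine the modality vectors, and an element-wise sigmoid gate then scales the fused vector. The function names and the idea of passing attention scores and gate logits in directly are assumptions for illustration; in a trained network both would be produced by learned layers, and the paper's exact architecture may differ.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_attention_fusion(modal_features, attention_scores, gate_logits):
    """Fuse same-sized modality vectors with attention, then gate each dimension.

    modal_features: dict of modality name -> feature vector.
    attention_scores: dict of modality name -> scalar score (hypothetical input).
    gate_logits: one logit per feature dimension (hypothetical input).
    """
    names = list(modal_features)
    weights = softmax([attention_scores[name] for name in names])
    dim = len(gate_logits)
    fused = [
        sum(w * modal_features[name][d] for w, name in zip(weights, names))
        for d in range(dim)
    ]
    return [sigmoid(g) * f for g, f in zip(gate_logits, fused)]


# Hypothetical 2-D features for the three modalities named in the summary.
features = {"audio": [1.0, 0.0], "visual": [0.0, 1.0], "title": [1.0, 1.0]}
scores = {"audio": 0.0, "visual": 0.0, "title": 0.0}
fused = gated_attention_fusion(features, scores, gate_logits=[0.0, 0.0])
```

The attention step decides how much each modality contributes, while the gate can suppress individual feature dimensions regardless of which modality they came from.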