
Recent ACM MM Papers Accepted by Alibaba Entertainment Group

Alibaba Entertainment Group secured four ACM MM paper acceptances, presenting a probabilistic graphical model for crowdsourced visual quality assessment, an attention‑driven Siamese network with reinforcement learning for robust object tracking, a scene‑aware context‑graph method for unsupervised video anomaly detection, and a cross‑modal graph‑matching approach for visual grounding.

Youku Technology

Recently, the ACM International Conference on Multimedia (ACM MM), a top-tier CCF-A conference, announced its accepted papers. Alibaba Entertainment Group had four papers accepted.

A Probabilistic Graphical Model for Analyzing the Subjective Visual Quality Assessment Data from Crowdsourcing

Authors: Jing Li, Suiyi Ling, Junle Wang, Patrick Le Callet

Abstract: This paper models the video quality assessment process using a probabilistic graphical model. Unlike traditional approaches that assume a Gaussian distribution for quality scores and treat the ground truth as a mean, the authors propose using an ordinal categorical distribution to describe the ground truth. The method can accurately predict ground‑truth quality scores and infer the probability of non‑compliant raters in crowdsourced experiments. Experiments on crowdsourcing data show that the proposed ground‑truth recovery outperforms state‑of‑the‑art methods, achieving higher precision for cost‑effective, high‑accuracy subjective testing.
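The core idea of weighting raters by inferred reliability instead of simply averaging scores can be illustrated with a minimal Dawid-Skene-style EM sketch. This is not the authors' probabilistic graphical model (their ordinal categorical formulation is more sophisticated); it only shows how treating the ground truth as a categorical distribution lets unreliable crowdworkers be down-weighted. The function name and the simple symmetric noise model are assumptions for illustration.

```python
import numpy as np

def recover_ground_truth(ratings, n_levels=5, n_iter=20):
    """EM sketch. ratings[i, j] = quality level rater j gave item i (-1 if missing).
    Returns recovered labels and a per-rater reliability estimate."""
    n_items, n_raters = ratings.shape
    # Initialize each item's label distribution from raw vote counts.
    q = np.zeros((n_items, n_levels))
    for i in range(n_items):
        for j in range(n_raters):
            if ratings[i, j] >= 0:
                q[i, ratings[i, j]] += 1
    q /= q.sum(axis=1, keepdims=True)
    reliability = np.full(n_raters, 0.8)
    for _ in range(n_iter):
        # M-step: a rater's reliability is their expected agreement with current labels.
        for j in range(n_raters):
            mask = ratings[:, j] >= 0
            reliability[j] = np.mean(q[mask, ratings[mask, j]])
        # E-step: re-weight each vote by rater reliability
        # (remaining probability mass spread uniformly over the other levels).
        for i in range(n_items):
            post = np.ones(n_levels)
            for j in range(n_raters):
                r = ratings[i, j]
                if r >= 0:
                    lik = np.full(n_levels, (1.0 - reliability[j]) / (n_levels - 1))
                    lik[r] = reliability[j]
                    post *= lik
            q[i] = post / post.sum()
    return q.argmax(axis=1), reliability
```

In a toy run with three consistent raters and one random "spammer", the recovered labels track the consistent majority and the spammer's reliability estimate drops well below the others.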

Siamese Attentive Graph Tracking

Authors: Fei Zhao, Ting Zhang, Chao Ma, Ming Tang, Jinqiao Wang, Xiaobo Wang

Abstract: To address large appearance variations of targets in visual object tracking, this work proposes an attention‑based Siamese network combined with an actor‑critic deep reinforcement learning framework that learns a cascade bounding‑box regression strategy. The algorithm adapts well to appearance changes during tracking and extracts more discriminative features. It achieves state‑of‑the‑art performance on multiple benchmarks such as OTB and VOT.
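The cascade bounding-box regression idea can be sketched independently of the paper's actor-critic policy: each stage predicts a small offset in the standard box parameterization, and the stages are applied in sequence so that later stages refine the output of earlier ones. This is only an illustration of the cascade mechanism under a common (cx, cy, w, h) parameterization; how the deltas are chosen (the learned RL policy) is the paper's contribution and is not reproduced here.

```python
import math

def apply_delta(box, delta):
    """Apply one regression step. box = (cx, cy, w, h);
    delta = (dx, dy, dw, dh) in the usual normalized parameterization."""
    cx, cy, w, h = box
    dx, dy, dw, dh = delta
    return (cx + dx * w, cy + dy * h, w * math.exp(dw), h * math.exp(dh))

def cascade_refine(box, steps):
    """Apply a sequence of regression deltas, each refining the previous box."""
    for d in steps:
        box = apply_delta(box, d)
    return box
```

For example, a first stage could make a coarse horizontal shift and a second stage a finer scale correction; the composition is still a single box-to-box mapping.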

Scene‑Aware Context Reasoning for Unsupervised Abnormal Event Detection in Videos

Authors: Che Sun, Yunde Jia, Yao Hu, Yuwei Wu

Abstract: This paper introduces a scene‑aware context reasoning method that leverages contextual information in visual features for unsupervised abnormal event detection in videos. A spatio‑temporal context graph is constructed to model object appearance, inter‑object relations, and scene type. Contextual information is encoded on graph nodes and edges and refined through multiple message‑passing RNNs. A graph‑based deep Gaussian mixture model is also proposed for unsupervised scene clustering. Frame‑level abnormal scores are computed from the context, and extensive experiments on UCF‑Crime, Avenue, and ShanghaiTech demonstrate the effectiveness of the approach.
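The scoring step, stripped of the paper's graph reasoning, amounts to fitting a density model on features of normal frames and flagging frames that are unlikely under it. The sketch below uses a single diagonal Gaussian instead of the authors' graph-based deep Gaussian mixture model, purely to show how a frame-level abnormal score (negative log-likelihood) falls out of a fitted normal model; the function names and feature representation are assumptions.

```python
import numpy as np

def fit_normal_model(features):
    """Fit a single diagonal Gaussian to feature vectors of normal frames."""
    mu = features.mean(axis=0)
    var = features.var(axis=0) + 1e-6  # small floor for numerical stability
    return mu, var

def anomaly_score(frame_feat, mu, var):
    """Negative log-likelihood of one frame's features (higher = more abnormal)."""
    return 0.5 * np.sum((frame_feat - mu) ** 2 / var + np.log(2 * np.pi * var))
```

A frame whose features sit near the fitted mean scores low, while an outlying frame scores high; thresholding this score gives frame-level abnormal-event decisions.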

Visual‑Semantic Graph Matching for Visual Grounding

Authors: Chenchen Jing, Mingtao Pei, Yuwei Wu, Yao Hu, Yunde Jia, Qi Wu

Abstract: Visual grounding is defined as a graph matching problem that aligns a visual scene graph with a language scene graph. Because the two graphs are heterogeneous, a cross‑modal graph neural network is employed to learn unified node representations that capture both semantic and structural information. The matching problem is reduced to a linear assignment problem, and a permutation loss together with a semantic cycle‑consistency loss are introduced to handle cases with and without ground‑truth correspondences. The proposed method demonstrates strong performance on referring expression comprehension and phrase grounding tasks.
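The reduction to a linear assignment problem can be made concrete: given a node-similarity matrix between the two graphs (which in the paper comes from the learned cross-modal node embeddings), matching means finding the permutation that maximizes total similarity. The brute-force solver below is only illustrative for tiny graphs; in practice a polynomial-time method such as the Hungarian algorithm (e.g. `scipy.optimize.linear_sum_assignment`) would be used, and the similarity values here are made up.

```python
import itertools
import numpy as np

def match_nodes(sim):
    """Brute-force linear assignment: find the permutation of language-graph
    nodes (columns) that maximizes total similarity with visual-graph nodes
    (rows). sim is an n x n similarity matrix."""
    n = sim.shape[0]
    best_score, best_perm = -np.inf, None
    for perm in itertools.permutations(range(n)):
        score = sum(sim[i, perm[i]] for i in range(n))
        if score > best_score:
            best_score, best_perm = score, perm
    return best_perm
```

With a clearly diagonal-dominant similarity matrix, the recovered assignment maps each visual node to its semantically corresponding language node.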

Further paper analyses will be released soon; stay tuned.

Tags: computer vision, deep learning, visual grounding, multimedia, graph neural networks, crowdsourcing, object tracking