Mastering Video Object Segmentation: 3 Research Paths & Alibaba’s Latest Advances

This article explains video object segmentation and outlines its three main research directions: semi-supervised, interactive, and unsupervised. It then describes the results Alibaba’s Moku Lab achieved in the DAVIS Challenge 2019 and the lab's plans for improving segmentation in complex scenes.

Alibaba Cloud Developer

Sep 18, 2019

Video Object Segmentation (VOS) aims to extract the foreground object region from every frame of a video, providing essential material for content creation such as 3D‑effect videos.

In the computer‑vision community, VOS research is divided into three directions, which correspond to the three tracks of the DAVIS Challenge 2019:

Semi‑supervised VOS (one‑shot VOS)

Interactive VOS

Unsupervised VOS

Semi‑supervised Video Object Segmentation

Also called one-shot VOS, this approach receives a ground-truth mask for the target object in the first frame and propagates the segmentation to the subsequent frames. The main challenges are foreground and background regions with similar colors, and appearance changes over time, for example when new instances of the same object enter the scene.

Algorithms fall into two categories: online-learning and offline-learning methods. Online-learning methods (e.g., Lucid Data Dreaming, OSVOS, PReMVOS) fine-tune a model on the first-frame mask of each test video, which yields high accuracy but requires heavy computation at inference time. More recent offline-learning methods (e.g., FEELVOS, Space-Time Memory Networks) rely on a model trained in advance and skip per-video fine-tuning, so inference is faster.

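The evaluation metrics are the mean Jaccard index (J), which measures region overlap between the predicted and ground-truth masks, and the mean F-measure (F), which measures boundary accuracy. The two are commonly averaged into a single J&F score.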
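To make the two metrics concrete, here is a minimal sketch of J and a simplified boundary F for a single frame. It is not the official DAVIS evaluation code: masks are represented as sets of (row, col) pixels to keep the example dependency-free, and the helper names (`jaccard`, `boundary_f`, `rect`) and the `tolerance` parameter are assumptions made for illustration.

```python
from typing import Set, Tuple

Pixel = Tuple[int, int]
Mask = Set[Pixel]  # assumed representation: the set of foreground pixels

NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))


def jaccard(pred: Mask, gt: Mask) -> float:
    """Region similarity J: intersection over union of two masks."""
    union = pred | gt
    if not union:
        return 1.0  # both masks empty: treat as a perfect match
    return len(pred & gt) / len(union)


def boundary(mask: Mask) -> Mask:
    """Pixels of the mask that have at least one 4-neighbour outside it."""
    return {
        (r, c)
        for (r, c) in mask
        if any((r + dr, c + dc) not in mask for dr, dc in NEIGHBOURS)
    }


def dilate(pixels: Mask, steps: int) -> Mask:
    """Grow a pixel set by `steps` rounds of 4-neighbour dilation."""
    grown = set(pixels)
    for _ in range(steps):
        grown |= {(r + dr, c + dc) for (r, c) in grown for dr, dc in NEIGHBOURS}
    return grown


def boundary_f(pred: Mask, gt: Mask, tolerance: int = 1) -> float:
    """Simplified boundary F: harmonic mean of boundary precision and recall.

    A boundary pixel counts as matched if it lies within `tolerance`
    dilation steps of the other mask's boundary.
    """
    pb, gb = boundary(pred), boundary(gt)
    if not pb and not gb:
        return 1.0
    if not pb or not gb:
        return 0.0
    precision = len(pb & dilate(gb, tolerance)) / len(pb)
    recall = len(gb & dilate(pb, tolerance)) / len(gb)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def rect(r0: int, r1: int, c0: int, c1: int) -> Mask:
    """Toy helper: a filled rectangle covering rows r0..r1-1, cols c0..c1-1."""
    return {(r, c) for r in range(r0, r1) for c in range(c0, c1)}


gt_mask = rect(1, 3, 1, 3)    # 2x2 ground-truth object
pred_mask = rect(1, 3, 1, 4)  # prediction that is one column too wide
j = jaccard(pred_mask, gt_mask)
f = boundary_f(pred_mask, gt_mask)
print(f"J = {j:.3f}, F = {f:.3f}, J&F = {(j + f) / 2:.3f}")
```

In a benchmark setting these per-frame values would be averaged over all frames and objects to give the mean J and mean F.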

Interactive Video Object Segmentation

Interactive VOS, which emerged in 2018, replaces the first-frame ground-truth mask with user interactions on any frame (bounding boxes, scribbles, or edge points). The typical pipeline has five steps: (1) the user provides an interaction on a frame; (2) an interactive image segmentation algorithm produces a mask for that frame; (3) the mask is propagated to the other frames with a semi-supervised VOS method; (4) the user reviews the results and provides additional interactions on poorly segmented frames; (5) the process repeats until the result is satisfactory.
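The interaction loop described above can be sketched as follows. This is only a skeleton under stated assumptions: the three callables (`get_interaction`, `segment_frame`, `propagate`) stand in for the user, the interactive image segmentation algorithm, and the semi-supervised propagation model, and `max_rounds` is a hypothetical cap on the number of interaction rounds. None of these names come from the lab's implementation.

```python
from typing import Any, Callable, Dict, List, Optional

Mask = List[List[int]]         # placeholder mask type for this sketch
Interaction = Dict[str, Any]   # e.g. {"frame": 2, "kind": "scribble"}


def interactive_vos(
    num_frames: int,
    get_interaction: Callable[[Dict[int, Mask]], Optional[Interaction]],
    segment_frame: Callable[[Interaction, Optional[Mask]], Mask],
    propagate: Callable[[int, Mask, int], Dict[int, Mask]],
    max_rounds: int = 8,
) -> Dict[int, Mask]:
    """Run the review-and-refine loop and return one mask per frame."""
    masks: Dict[int, Mask] = {}
    for _ in range(max_rounds):
        # The user reviews the current masks and annotates a frame,
        # or returns None when satisfied.
        interaction = get_interaction(masks)
        if interaction is None:
            break
        frame = int(interaction["frame"])
        # Interactive image segmentation on the annotated frame.
        masks[frame] = segment_frame(interaction, masks.get(frame))
        # Semi-supervised propagation from that frame to the others.
        masks.update(propagate(frame, masks[frame], num_frames))
    return masks


# Toy stand-ins so the skeleton can be exercised end to end.
def make_scripted_user(script: List[Interaction]):
    pending = list(script)

    def get_interaction(masks: Dict[int, Mask]) -> Optional[Interaction]:
        return pending.pop(0) if pending else None

    return get_interaction


def toy_segment(interaction: Interaction, previous: Optional[Mask]) -> Mask:
    return [[1]]  # a real model would turn the scribble into a mask


def toy_propagate(frame: int, mask: Mask, num_frames: int) -> Dict[int, Mask]:
    return {i: mask for i in range(num_frames)}  # copies the mask unchanged


result = interactive_vos(
    5,
    make_scripted_user([{"frame": 2, "kind": "scribble"}]),
    toy_segment,
    toy_propagate,
)
```

Passing the components in as callables keeps the loop independent of any particular segmentation or propagation model, which is convenient when swapping in a stronger semi-supervised method.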

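Performance is measured by J&F@60s and AUC. J&F@60s is the J&F score reached after 60 seconds, and AUC is the area under the curve of J&F plotted against time. Together they reflect both accuracy and speed under a limited number of user interactions.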
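As an illustration of how these two numbers relate to the J&F-versus-time curve, the sketch below interpolates the curve at 60 seconds and computes a normalised area under it with the trapezoidal rule. The function names, the normalisation by the time span, and the sample curve are assumptions for the example; the official benchmark code may differ in its details.

```python
from typing import Sequence


def score_at(times: Sequence[float], scores: Sequence[float], t: float) -> float:
    """Linearly interpolate the score curve at time t, clamping at both ends.

    Assumes `times` is strictly increasing and aligned with `scores`.
    """
    if t <= times[0]:
        return scores[0]
    if t >= times[-1]:
        return scores[-1]
    for i in range(len(times) - 1):
        t0, t1 = times[i], times[i + 1]
        if t0 <= t <= t1:
            s0, s1 = scores[i], scores[i + 1]
            return s0 + (s1 - s0) * (t - t0) / (t1 - t0)
    raise ValueError("times must be strictly increasing")


def normalised_auc(times: Sequence[float], scores: Sequence[float]) -> float:
    """Trapezoidal area under the curve, divided by the covered time span."""
    area = sum(
        (times[i + 1] - times[i]) * (scores[i] + scores[i + 1]) / 2
        for i in range(len(times) - 1)
    )
    return area / (times[-1] - times[0])


# Made-up curve: J&F measured after each interaction round.
elapsed_seconds = [0.0, 30.0, 60.0, 90.0]
jf_scores = [0.0, 0.4, 0.6, 0.8]

jf_at_60s = score_at(elapsed_seconds, jf_scores, 60.0)
auc = normalised_auc(elapsed_seconds, jf_scores)
print(f"J&F@60s = {jf_at_60s:.3f}, AUC = {auc:.3f}")
```

A method that reaches high J&F early scores well on both numbers, while a slow but accurate method is penalised, which is the trade-off the metrics are meant to capture.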

Unsupervised Video Object Segmentation

Unsupervised VOS takes only the RGB video as input, with no mask or user interaction, and aims to segment the salient objects automatically. It is the newest research direction and requires adding a saliency detection module before the core segmentation pipeline. Because object saliency is subjective, a method may predict several objects, so the evaluation first matches predicted objects to ground-truth objects and then computes the mean J&F over the matched pairs.
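The matching step can be illustrated with a small sketch that searches for the one-to-one assignment of predictions to ground-truth objects with the highest total overlap. Several simplifications are assumed here: it scores pairs with J only rather than J&F, it works on a single frame, and it brute-forces all assignments with `itertools.permutations`, which is only practical for a handful of objects (a Hungarian-style solver would be the usual choice at scale). The names `match_objects` and `rect` are hypothetical.

```python
from itertools import permutations
from typing import List, Optional, Set, Tuple

Mask = Set[Tuple[int, int]]  # assumed representation: set of foreground pixels


def jaccard(pred: Mask, gt: Mask) -> float:
    """Intersection over union of two pixel sets."""
    union = pred | gt
    return len(pred & gt) / len(union) if union else 1.0


def match_objects(preds: List[Mask], gts: List[Mask]):
    """Return (mean J, assignment) for the best one-to-one matching.

    assignment[i] is the index of the prediction matched to ground-truth
    object i, or None if that object is left unmatched (scored as 0).
    """
    # Extra None slots allow ground-truth objects to stay unmatched.
    slots: List[Optional[int]] = list(range(len(preds))) + [None] * len(gts)
    best_score, best_assignment = -1.0, ()
    for assignment in permutations(slots, len(gts)):
        score = sum(
            jaccard(preds[p], gt)
            for p, gt in zip(assignment, gts)
            if p is not None
        )
        if score > best_score:
            best_score, best_assignment = score, assignment
    mean_j = best_score / len(gts) if gts else 0.0
    return mean_j, best_assignment


def rect(r0: int, r1: int, c0: int, c1: int) -> Mask:
    """Toy helper: a filled rectangle covering rows r0..r1-1, cols c0..c1-1."""
    return {(r, c) for r in range(r0, r1) for c in range(c0, c1)}


ground_truth = [rect(0, 2, 0, 2), rect(5, 7, 5, 7)]
predictions = [
    rect(5, 7, 5, 8),      # overlaps the second object, one column too wide
    rect(10, 11, 10, 11),  # spurious detection with no counterpart
    rect(0, 2, 0, 2),      # exact match for the first object
]
mean_j, assignment = match_objects(predictions, ground_truth)
print(f"mean J = {mean_j:.3f}, assignment = {assignment}")
```

Predictions that match no ground-truth object, like the spurious one here, are simply left out of the assignment in this sketch.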

Alibaba Entertainment Moku Lab Research Status

Since March 2019, the lab has worked on semi-supervised and interactive VOS. In May 2019 it released a baseline solution and took 4th place in the interactive track of the DAVIS Challenge 2019. Its “VOS with robust tracking” strategy raised the interactive J&F@60s from 0.353 to 0.761, and its semi-supervised method reached J&F = 0.763, comparable to state-of-the-art results.

Future Plans

The lab will continue to improve segmentation in complex scenarios such as small objects, similar foreground/background colors, fast motion, and severe occlusion. Planned research includes online learning, space‑time networks, and region proposal & verification strategies, as well as advancing related image segmentation and multi‑object tracking technologies.
