SVOR Wins CVPR 2026 Video Object Removal Challenge – Xiaomi’s Open‑Source Solution for Three Tough Problems
This article introduces SVOR, a video object removal framework developed by Xiaomi. SVOR tackles shadow residues, motion jitter, and mask defects with three components: MUSE, DA‑Seg, and a two‑stage training pipeline. It sets new state‑of‑the‑art (SOTA) results on multiple benchmarks and took first place in the CVPR 2026 video object removal challenge, and all of its code and models have been released publicly.
Video object removal lets users erase unwanted elements—such as pedestrians or stray objects—from recorded scenes, a capability that becomes essential when re‑shooting is impossible. In practice, most academic methods assume ideal conditions, but real‑world videos suffer from three major imperfections: lingering shadows after erasure, motion‑induced flickering when objects move quickly, and inaccurate masks produced by AI segmentation.
SVOR Framework
To address these issues, Xiaomi’s large‑model team proposes SVOR (Stable Video Object Removal), which integrates three key technologies.
MUSE (Mask Union for Stable Erasure)
MUSE adopts a windowed‑union strategy instead of processing each frame independently. By aggregating mask information over a temporal window, it preserves the continuity of fast‑moving objects, eliminating the “blink‑and‑miss” effect that plagues conventional frame‑by‑frame approaches.
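The windowed‑union idea can be sketched in a few lines. The source does not give MUSE's window size, how the window is centred, or whether the union is taken over pixel masks or some internal representation, so this sketch assumes a symmetric window of `radius` frames on each side and a pixel‑wise logical OR over binary masks. `windowed_mask_union` and `radius` are hypothetical names, not part of SVOR's released API.

```python
from typing import List

Mask = List[List[bool]]  # H x W binary mask for one frame (True = remove this pixel)


def windowed_mask_union(masks: List[Mask], radius: int) -> List[Mask]:
    """For each frame t, return the pixel-wise OR of the masks of frames
    t - radius .. t + radius. Windows are clipped at the clip boundaries.

    Hypothetical illustration of a 'windowed union', not SVOR's implementation.
    """
    if radius < 0:
        raise ValueError("radius must be non-negative")
    num_frames = len(masks)
    merged: List[Mask] = []
    for t in range(num_frames):
        lo = max(0, t - radius)               # first frame in the window
        hi = min(num_frames, t + radius + 1)  # one past the last frame
        height, width = len(masks[t]), len(masks[t][0])
        union = [[False] * width for _ in range(height)]
        for k in range(lo, hi):
            for y in range(height):
                row = masks[k][y]
                for x in range(width):
                    if row[x]:
                        union[y][x] = True
        merged.append(union)
    return merged
```

For example, if a fast‑moving object is masked at x = 0 in frame 0, missed entirely in frame 1, and masked at x = 2 in frame 2, a radius of 1 marks both positions in frame 1, so the object is not left visible for a single "blinking" frame. The trade‑off is a larger masked area per frame that the inpainting model must then fill.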
DA‑Seg (Denoising‑Aware Segmentation)
DA‑Seg acts as a corrective layer for imperfect masks. This denoising‑aware segmentation component repairs boundary errors in the input masks, making SVOR robust to mask defects so that residual shadows or partial masks do not degrade the final reconstruction.
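The source does not describe DA‑Seg's internals, so the sketch below does not implement it. Instead, it makes "boundary errors" concrete: it degrades a clean mask with binary erosion (under‑segmentation) or dilation (over‑segmentation) and scores the damage with intersection‑over‑union (IoU). Morphological perturbation is a common way to build degraded‑mask test inputs; its use here is an assumption, not something taken from the SVOR paper, and all function names are hypothetical.

```python
from typing import List, Tuple

Mask = List[List[bool]]  # H x W binary mask


def _plus_neighbourhood(y: int, x: int) -> List[Tuple[int, int]]:
    # The pixel itself plus its up/down/left/right neighbours.
    return [(y, x), (y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)]


def _is_set(mask: Mask, y: int, x: int) -> bool:
    # Pixels outside the image count as unset.
    return 0 <= y < len(mask) and 0 <= x < len(mask[0]) and mask[y][x]


def erode(mask: Mask) -> Mask:
    """Shrink the mask by one pixel: mimics a segmenter that misses object boundaries."""
    h, w = len(mask), len(mask[0])
    return [[all(_is_set(mask, ny, nx) for ny, nx in _plus_neighbourhood(y, x))
             for x in range(w)] for y in range(h)]


def dilate(mask: Mask) -> Mask:
    """Grow the mask by one pixel: mimics a segmenter that bleeds into the background."""
    h, w = len(mask), len(mask[0])
    return [[any(_is_set(mask, ny, nx) for ny, nx in _plus_neighbourhood(y, x))
             for x in range(w)] for y in range(h)]


def iou(a: Mask, b: Mask) -> float:
    """Intersection-over-union of two same-sized masks (1.0 if both are empty)."""
    inter = union = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            inter += pa and pb
            union += pa or pb
    return inter / union if union else 1.0
```

On a 5×5 frame containing a 3×3 object, one erosion step leaves only the centre pixel (IoU 1/9 against the clean mask), while one dilation step grows the mask to 21 pixels (IoU 3/7). This shows how quickly a one‑pixel boundary error degrades a small mask, which is the kind of input a mask‑tolerant remover has to cope with.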
Curriculum‑Style Two‑Stage Training
The training pipeline first pre‑trains the model on real‑background videos in a self‑supervised manner, allowing it to learn natural temporal dynamics (“learning to walk”). The second stage fine‑tunes the network on synthetic data specifically crafted to handle shadows and reflections (“learning to run”). This staged approach dramatically improves cross‑scene adaptability.
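The source describes the curriculum only at a high level. The sketch below assumes one common way to make stage 1 self‑supervised: hide a region of a real clip behind a mask and train the model to reconstruct the untouched original, so no manually cleaned ground truth is needed. It also assumes a hard switch from stage 1 to stage 2 after a fixed number of steps. How SVOR synthesizes its stage‑2 shadow and reflection data is not described, so stage 2 appears here only as a label. All names (`make_self_supervised_sample`, `stage_for_step`, `stage1_steps`) are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List

Frame = List[List[float]]  # grayscale H x W frame
Mask = List[List[bool]]

STAGE1 = "stage1_real_background_self_supervised"
STAGE2 = "stage2_synthetic_shadow_reflection"


@dataclass
class TrainingSample:
    masked_video: List[Frame]  # model input: frames with the hidden region zeroed out
    masks: List[Mask]          # where the model must reconstruct content
    target: List[Frame]        # ground truth: the untouched real frames
    stage: str


def stage_for_step(step: int, stage1_steps: int) -> str:
    """Hypothetical schedule: stage 1 for the first `stage1_steps` steps, then stage 2."""
    return STAGE1 if step < stage1_steps else STAGE2


def make_self_supervised_sample(video: List[Frame], box_h: int, box_w: int,
                                rng: random.Random) -> TrainingSample:
    """Hide the same random box in every frame of a real clip; the clip itself is the target."""
    h, w = len(video[0]), len(video[0][0])
    if not (0 < box_h <= h and 0 < box_w <= w):
        raise ValueError("box must fit inside the frame")
    y0 = rng.randrange(h - box_h + 1)  # top-left corner chosen so the box fits
    x0 = rng.randrange(w - box_w + 1)
    mask = [[y0 <= y < y0 + box_h and x0 <= x < x0 + box_w for x in range(w)]
            for y in range(h)]
    masked = [[[0.0 if mask[y][x] else frame[y][x] for x in range(w)]
               for y in range(h)] for frame in video]
    return TrainingSample(
        masked_video=masked,
        masks=[[row[:] for row in mask] for _ in video],  # independent copy per frame
        target=[[row[:] for row in frame] for frame in video],
        stage=STAGE1,
    )
```

Because the target is the real clip itself, stage 1 teaches realistic background and temporal dynamics without labeled removal pairs. The static box is a simplification; a real pipeline would likely use moving, object‑shaped masks.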
Performance and Competition Results
SVOR achieves new state‑of‑the‑art results on several standard video object removal datasets and on benchmarks with degraded masks, surpassing previous methods in both quantitative metrics and visual quality. In the CVPR 2026 Physical‑Perception Video Instance Removal Challenge, SVOR outperformed 17 other teams and took first place by a large margin in physical‑perception scores, AI‑based evaluation scores, and overall ranking.
Open‑Source Release
The complete codebase, released under the Apache 2.0 license, is available at https://github.com/xiaomi-research/svor. The accompanying paper (arXiv 2603.09283) and a ready‑to‑use skill package (https://clawhub.ai/wangfei1204/mi-visionforge-svor) enable developers and creators to integrate SVOR directly into their workflows.
Implications
Video creators: more natural erasure of unwanted elements without ghosting or flicker.
Developers: an open‑source, production‑ready library for building advanced video editing tools.
Industry: video restoration moves from a research prototype toward practical, real‑world deployment.
By openly sharing the model and evaluation pipeline, the team encourages ecosystem growth and further innovation in video AI.