Deformable Anchor Model for Image‑Driven Motion Transfer
The Deformable Anchor Model (DAM) and its hierarchical extension (HDAM) enhance image‑driven motion transfer by introducing a learned root node (and, in HDAM, intermediate nodes) as a structural prior for keypoint detection. They yield clearer motion and lower reconstruction error across multiple benchmarks while preserving the inference speed of the original First‑Order Motion Model (FOMM).
Problem Introduction – What is Image‑Driven Motion Transfer
Image‑driven motion transfer (also called image animation) generates a video by combining a single source image with a driving video. The output video keeps the appearance of the source image while adopting the motion of the driving video.
Applications
Image animation powers entertainment video generation (e.g., the popular "Ma Yi Ya Hei" meme) and e‑commerce, where static product images are animated to increase ad engagement.
Main Contributions
Building on the First‑Order Motion Model (FOMM), we propose a Deformable Anchor Model (DAM) that introduces a root node as a structural prior for keypoint detection. A hierarchical version (HDAM) adds an intermediate node, forming a root‑branch‑leaf structure.
Baseline – FOMM
FOMM is a prior‑free method: it predicts source and driving keypoints together with local affine transforms, and uses them to compute an optical‑flow field, without relying on external keypoint detectors or annotations.
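To make the "local affine" idea concrete, here is a minimal NumPy sketch (not the authors' code; the function name, array names, and shapes are illustrative assumptions) of how a per‑keypoint backward flow can be assembled from predicted keypoints and their Jacobians via a first‑order Taylor expansion:

```python
import numpy as np

def first_order_flow(grid, kp_src, kp_drv, jac_src, jac_drv):
    """Per-keypoint backward flow via a first-order Taylor expansion.

    grid:     (H, W, 2) pixel coordinates z in the driving frame
    kp_src:   (K, 2)    keypoint locations in the source frame
    kp_drv:   (K, 2)    keypoint locations in the driving frame
    jac_src, jac_drv: (K, 2, 2) local affine (Jacobian) matrices
    Returns (K, H, W, 2): T(z) ~ p_k_src + J_k (z - p_k_drv),
    with the combined affine term J_k = J_k_src @ inv(J_k_drv).
    """
    K = kp_src.shape[0]
    flows = np.empty((K,) + grid.shape)
    for k in range(K):
        J = jac_src[k] @ np.linalg.inv(jac_drv[k])  # combined affine term
        diff = grid - kp_drv[k]                     # offsets from the keypoint
        flows[k] = kp_src[k] + diff @ J.T           # apply J to each offset
    return flows
```

In FOMM, a dense motion network then combines these per‑keypoint flows, together with predicted masks and an occlusion map, into a single warping field for the generator.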
Our Improvement – Deformable Anchor Model (DAM)
We add a learned root node that imposes a spatial prior on the keypoints, penalizing implausible configurations during training. At inference only the keypoints are predicted, so DAM retains FOMM's runtime.
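As a rough illustration of how a root node can act as a spatial prior, the PyTorch sketch below penalizes keypoints that drift from the positions implied by the root's affine transform. The loss form and the names (root_anchor_loss, learned rest offsets) are assumptions for illustration, not the paper's exact formulation:

```python
import torch

def root_anchor_loss(kp, root_pos, root_jac, offsets):
    """Hypothetical structural prior tying keypoints to a root anchor.

    kp:       (B, K, 2) detected keypoint locations
    root_pos: (B, 2)    detected root-anchor location
    root_jac: (B, 2, 2) root's local affine transform
    offsets:  (K, 2)    learned rest offsets of keypoints from the root
    """
    # Position of each keypoint implied by the root's affine transform.
    implied = root_pos.unsqueeze(1) + torch.einsum('bij,kj->bki', root_jac, offsets)
    # Penalize deviation between detected and root-implied keypoints.
    return ((kp - implied) ** 2).sum(-1).mean()
```

Because such a term serves only as a training regularizer, it can be dropped at inference time, which is why the runtime matches FOMM's.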
Experimental Comparison
Qualitative and quantitative results on TaiChiHD, FashionVideo, MGIF and VoxCeleb1 show that DAM/HDAM produce clearer motion and better scores on the reconstruction and motion metrics (L1, AED, AKD, MKR) than FOMM, Monkey‑Net and RegionMM.
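For readers unfamiliar with these metrics, the sketch below shows simplified versions of L1, AKD, and MKR (AED, which compares identity embeddings of generated and real frames, is omitted). Exact protocols, including which external keypoint detector is used, follow each benchmark's convention:

```python
import numpy as np

def l1_error(gen, real):
    """Mean absolute pixel error between a generated frame and its
    ground-truth driving frame (video reconstruction setting)."""
    return np.abs(gen.astype(np.float64) - real.astype(np.float64)).mean()

def akd(kp_gen, kp_real):
    """Average Keypoint Distance: mean L2 distance between landmarks an
    external detector finds on generated vs. ground-truth frames."""
    return np.linalg.norm(kp_gen - kp_real, axis=-1).mean()

def mkr(found_gen, found_real):
    """Missing Keypoint Rate: share of keypoints detected in the real
    frame that the detector misses in the generated frame."""
    missed = found_real & ~found_gen  # boolean detection masks
    return missed.sum() / max(found_real.sum(), 1)
```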
Paper and Download
Structure‑Aware Motion Transfer with Deformable Anchor Model – CVPR 2022. https://arxiv.org/abs/2204.05018
References
[1] Liu, Wen. "Impersonator – Your Dance, My Move." Zhihu.
[2] Xu, Borun, et al. "Move As You Like: Image Animation in E‑Commerce Scenario." ACM MM 2021.
[3] Siarohin, Aliaksandr, et al. "First Order Motion Model for Image Animation." NeurIPS 2019.
[4] Felzenszwalb, Pedro F., et al. "Object Detection with Discriminatively Trained Part‑Based Models." TPAMI 2010.
[5] Siarohin, Aliaksandr, et al. "Animating Arbitrary Objects via Deep Motion Transfer." CVPR 2019.
[6] Siarohin, Aliaksandr, et al. "Motion Representations for Articulated Animation." CVPR 2021.
[7] Zablotskaia, Polina, et al. "DWNet: Dense Warp‑Based Network for Pose‑Guided Human Video Generation." BMVC 2019.
[8] Nagrani, Arsha, Joon Son Chung, and Andrew Zisserman. "VoxCeleb: A Large‑Scale Speaker Identification Dataset." arXiv 2017.