
Deformable Anchor Model for Image‑Driven Motion Transfer

The Deformable Anchor Model (DAM) and its hierarchical extension (HDAM) improve image‑driven motion transfer by introducing a learned root node (and, in HDAM, intermediate nodes) as a structural prior for keypoint detection, yielding clearer motions and lower reconstruction errors across multiple benchmarks while preserving the inference speed of the original First‑Order Motion Model (FOMM).

Alimama Tech
Problem Introduction – What Is Image‑Driven Motion Transfer?

Image‑driven motion transfer (also called image animation) generates a video by combining a single source image with a driving video. The output video keeps the appearance of the source image while adopting the motion of the driving video.

Applications

Used for entertainment video generation (e.g., the popular "Ma Yi Ya Hei" meme) and for e‑commerce, where static product images are animated to increase ad engagement.

Main Contributions

Building on the First‑Order Motion Model (FOMM), we propose a Deformable Anchor Model (DAM) that introduces a root node as a structural prior for keypoint detection. A hierarchical version (HDAM) adds an intermediate node, forming a root‑branch‑leaf structure.

Baseline – FOMM

FOMM is a model‑free method that predicts source and driving keypoints and their local affine transforms to compute optical flow, without relying on external keypoint detectors.
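Concretely, FOMM approximates the backward motion around each keypoint with a first‑order (local affine) expansion: a point z in the driving frame maps back to the source via the keypoint pair and the two local Jacobians. A minimal NumPy sketch of that approximation (function and variable names are ours for illustration, not the paper's code):

```python
import numpy as np

def sparse_motion(grid, kp_src, kp_drv, jac_src, jac_drv):
    """First-order approximation of the backward flow around one keypoint.

    grid:     (H, W, 2) pixel coordinates z in the driving frame
    kp_src:   (2,) keypoint location in the source image
    kp_drv:   (2,) matching keypoint location in the driving frame
    jac_src, jac_drv: (2, 2) local affine Jacobians at each keypoint
    Returns (H, W, 2) source-image coordinates to sample from.
    """
    # T_{S<-D}(z) ~= kp_src + J_src @ J_drv^{-1} @ (z - kp_drv)
    A = jac_src @ np.linalg.inv(jac_drv)
    return kp_src + (grid - kp_drv) @ A.T

# Sanity check: identical keypoints and Jacobians yield the identity motion.
H, W = 4, 4
grid = np.stack(np.meshgrid(np.arange(W), np.arange(H)), axis=-1).astype(float)
flow = sparse_motion(grid, np.zeros(2), np.zeros(2), np.eye(2), np.eye(2))
```

In the full model, one such local flow is computed per keypoint and a dense‑motion network combines them (with occlusion masks) into the final warp; the sketch above covers only the per‑keypoint term.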

Our Improvement – Deformable Anchor Model (DAM)

We add a learned root node that imposes a spatial prior on the keypoints, penalizing implausible configurations during training. At inference time only the keypoints themselves are predicted, so FOMM's speed is preserved.
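One plausible way such a structural prior can enter training is as a consistency term: each keypoint also predicts its displacement from the root, and the loss penalizes disagreement between the directly predicted keypoint and the root‑anchored estimate. The sketch below is our illustration of this idea, not the paper's exact formulation (all names are hypothetical):

```python
import numpy as np

def anchor_prior_loss(keypoints, root, offsets):
    """Illustrative anchor-consistency penalty (hypothetical form).

    keypoints: (K, 2) keypoint locations predicted directly
    root:      (2,)   learned root-node location
    offsets:   (K, 2) each keypoint's predicted displacement from the root
    Returns the mean squared distance between the two estimates of each
    keypoint: the direct prediction and the root-anchored one.
    """
    anchored = root + offsets  # where the root "expects" each keypoint
    diff = keypoints - anchored
    return float(np.mean(np.sum(diff ** 2, axis=-1)))

# When every keypoint agrees with its root-anchored estimate, the penalty is zero.
root = np.array([0.5, 0.5])
offsets = np.array([[0.1, 0.0], [-0.1, 0.2]])
loss = anchor_prior_loss(root + offsets, root, offsets)
```

Because this term only shapes the keypoint detector during training, it adds no cost at inference, which is consistent with DAM matching FOMM's runtime.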

Experimental Comparison

Qualitative and quantitative results on TaiChiHD, FashionVideo, MGIF and VoxCeleb1 show that DAM/HDAM produce clearer motions and lower reconstruction errors (L1, AED, AKD, MKR) than FOMM, Monkey‑Net and RegionMM.

Paper and Download

Structure‑Aware Motion Transfer with Deformable Anchor Model – CVPR 2022. https://arxiv.org/abs/2204.05018

