High-Fidelity Image-to-Video Generation for E-commerce with AtomoVideo and Noise Rectification
Alibaba’s AI team introduced AtomoVideo, a diffusion‑based image‑to‑video generator, together with a training‑free Noise Rectification module. The module adds controlled noise to the input image and corrects the predicted noise during denoising to eliminate first‑frame errors, so merchants can automatically create high‑fidelity 4‑second 720p product videos with strong temporal consistency for e‑commerce advertising.
In the e‑commerce domain, video content is becoming a key medium for advertising, but traditional production is costly and labor‑intensive.
To address this, Alibaba’s AI team has developed AtomoVideo, a diffusion‑based image‑to‑video generation system, and a training‑free Noise Rectification module that improves fidelity without extra model training.
Noise Rectification adds controlled noise to the input image, then corrects the predicted noise during DDIM denoising, eliminating the first‑frame error and preserving product details.
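The add-noise-then-correct idea can be illustrated with a small NumPy sketch. This is not the team's implementation: the linear beta schedule, the blending weight `weight`, the helper names, and the noisy stand-in for the model's prediction are all assumptions made for illustration. The sketch relies only on the fact that the noise added to the input image is known, so the predicted noise can be pulled toward it before each deterministic DDIM update.

```python
import numpy as np

def make_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    # Cumulative products of (1 - beta_t) for an assumed linear beta schedule.
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def add_noise(x0, eps, alpha_bar_t):
    # Forward diffusion: mix the clean image with known Gaussian noise.
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps

def rectify_noise(eps_pred, eps_known, weight):
    # Hypothetical rectification rule: blend the model's prediction with the
    # noise that was actually added. weight=1 trusts the known noise fully.
    return weight * eps_known + (1.0 - weight) * eps_pred

def ddim_step(x_t, eps, alpha_bar_t, alpha_bar_prev):
    # Deterministic DDIM update (eta = 0).
    x0_hat = (x_t - np.sqrt(1.0 - alpha_bar_t) * eps) / np.sqrt(alpha_bar_t)
    return np.sqrt(alpha_bar_prev) * x0_hat + np.sqrt(1.0 - alpha_bar_prev) * eps

def denoise(x0, weight, timesteps, alpha_bars, rng):
    # Noise the input image with known noise, then denoise it step by step.
    eps_known = rng.standard_normal(x0.shape)
    x = add_noise(x0, eps_known, alpha_bars[timesteps[0]])
    for i, t in enumerate(timesteps):
        # Stand-in for an imperfect denoising network (assumption).
        eps_pred = eps_known + 0.3 * rng.standard_normal(x0.shape)
        eps = rectify_noise(eps_pred, eps_known, weight)
        last = i + 1 == len(timesteps)
        alpha_bar_prev = 1.0 if last else alpha_bars[timesteps[i + 1]]
        x = ddim_step(x, eps, alpha_bars[t], alpha_bar_prev)
    return x

alpha_bars = make_alpha_bars()
timesteps = [800, 600, 400, 200, 0]
image = np.random.default_rng(0).uniform(-1.0, 1.0, size=(8, 8))

# Maximum absolute reconstruction error with and without rectification.
err_rectified = np.abs(denoise(image, 1.0, timesteps, alpha_bars, np.random.default_rng(1)) - image).max()
err_plain = np.abs(denoise(image, 0.0, timesteps, alpha_bars, np.random.default_rng(1)) - image).max()
```

With `weight=1.0` the denoised result matches the input image up to floating-point error, which is the sense in which the first frame keeps the product's details. An intermediate weight would trade that fidelity against the model's own prediction; the value to use, and which frames it applies to, are not specified in this summary.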
AtomoVideo upgrades the base text‑to‑video (T2V) model to an image‑to‑video (I2V) model through multi‑granularity image injection, high‑quality dataset construction, and progressive motion‑intensity training, producing 4‑second, 720p videos with high temporal consistency.
The system has been deployed in Alibaba’s advertising platforms, enabling merchants to generate dynamic product videos automatically, with examples shown in the article.
Future work includes extending video length, improving stability, and exploring larger‑scale AIGC models such as Sora.
Alimama Tech
Official Alimama tech channel, showcasing all of Alimama's technical innovations.