
Towards Smooth Video Composition: A New Benchmark for GAN‑Based Video Generation

Researchers from multiple institutions propose a GAN‑based video generation framework that explicitly models short‑, medium‑, and long‑range temporal relations, introduces B‑spline motion embeddings and temporal shift modules, and demonstrates substantial quality improvements across several video datasets.


Recent advances in Generative Adversarial Networks (GANs) have enabled high‑resolution image synthesis and novel applications such as personalized editing and animation, yet generating coherent videos remains challenging due to the need to model complex temporal dynamics.

In "Towards Smooth Video Composition," authors from The Chinese University of Hong Kong, Shanghai AI Lab, Ant Research Institute, and UCLA present a new video generation method that tackles temporal modeling at three scales: short‑range (~5 frames), medium‑range (~5 seconds), and unlimited length. They extend the GAN framework by sampling a sequence of latent vectors and generating each frame with a shared content variable and a per‑frame motion variable.

Short‑range modeling addresses the texture‑sticking artifacts observed in StyleGAN‑V videos, where fine details appear glued to fixed image coordinates rather than moving with objects. By adopting the alias‑free generation techniques of StyleGAN3 and pre‑training the image backbone, the method improves fine‑grained motion consistency.

Medium‑range modeling introduces an explicit temporal shift module (TSM) into each layer of the discriminator, enabling richer temporal supervision and significantly reducing Fréchet Video Distance (FVD) scores on three datasets.
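The core shift operation is well established (TSM, Lin et al., 2019): a fraction of feature channels is shifted one step forward or backward along the time axis so each frame's features mix with its neighbors'. A minimal sketch of that operation follows; exactly how the paper wires it into the discriminator layers is summarized above, and this is only an illustration of the mechanism.

```python
import torch

def temporal_shift(x: torch.Tensor, fold_div: int = 8) -> torch.Tensor:
    # x: (batch, time, channels, height, width)
    b, t, c, h, w = x.shape
    fold = c // fold_div
    out = torch.zeros_like(x)
    out[:, 1:, :fold] = x[:, :-1, :fold]                   # shift forward in time
    out[:, :-1, fold:2 * fold] = x[:, 1:, fold:2 * fold]   # shift backward in time
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]              # remaining channels unchanged
    return out
```

Boundary frames receive zeros in the shifted channels, matching the standard zero-padded variant of the module.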

Unlimited‑length modeling replaces linear motion interpolation with B‑spline‑based motion embeddings, yielding smoother motion trajectories and mitigating periodic jitter. A low‑rank constraint on motion embeddings further reduces repetitive artifacts.
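As a hedged sketch of the B‑spline idea: rather than linearly interpolating between successive motion codes, fit a smooth cubic B‑spline through a sequence of control codes and sample it at each frame time. The control-point count, dimensions, and the low-rank factorization shown are illustrative assumptions, not the paper's exact values.

```python
import numpy as np
from scipy.interpolate import make_interp_spline

motion_dim, num_controls, num_frames = 512, 8, 64

# Optional low-rank parameterization (one possible reading of the paper's
# constraint): factor the control codes through a small inner dimension.
rank = 4
U, V = np.random.randn(num_controls, rank), np.random.randn(rank, motion_dim)
control_codes = U @ V                                   # coarse motion anchors

# Cubic (k=3) B-spline through the control codes, sampled densely in time.
control_times = np.linspace(0.0, 1.0, num_controls)
spline = make_interp_spline(control_times, control_codes, k=3)
frame_times = np.linspace(0.0, 1.0, num_frames)
motion_codes = spline(frame_times)                      # (num_frames, motion_dim)
```

Because the spline is C2-continuous, per-frame motion codes change smoothly across segment boundaries, which is the property linear interpolation lacks.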

The approach is evaluated on the YouTube Driving, Timelapse, and Taichi‑HD datasets, achieving notable gains in both image quality (Fréchet Inception Distance, FID) and video quality (FVD) compared with prior works such as MoCoGAN and StyleGAN‑V. Visual results illustrate smoother, more natural motion and the elimination of texture‑sticking and jitter.

Overall, the proposed benchmark and associated improvements provide a simple yet effective baseline for GAN‑based video synthesis, advancing the state of the art across multiple temporal scales.

Tags: deep learning, GAN, video generation, temporal modeling, B-spline, StyleGAN-V
Written by AntTech

Technology is the core driver of Ant's future creation.