
How Kuaishou Y‑Tech Leverages GANs for Real‑Time Face Attribute Editing in Short Videos

This article details Kuaishou Y‑Tech's practical deployment of GAN‑based high‑precision face attribute editing—covering gender, age, hair, and expression transformations—for short‑video effects, discussing background, business applications, technical challenges, and solutions across data preparation, model training, and mobile deployment.

Background Introduction

Face effects are a key component of short‑video content creation, but traditional 2D/3D semantic methods often lack realism. Recent generative technologies such as VAE, GAN, AutoRegressive Models, and Normalizing Flow Models have advanced rapidly, with GANs becoming a prominent solution due to their ability to generate data distributions close to real images.

GANs provide strong realism, high clarity, end‑to‑end output, and enable automated image and video production, making them valuable for industrial special‑effects pipelines. Internationally, applications like FaceApp and Snapchat demonstrate the impact of GANs, while Kuaishou was the first in China to integrate GANs into short‑video effect creation, focusing on high‑precision face attribute editing (gender, age, hair, expression).

Business Applications

1. Kuaishou Magic Board: Since August 2019, the Magic Board feature has released effects such as "Child", "My Life", "Gender Swap", and "Big Smile", providing novel user experiences.

2. Yitian Camera Hair Growth: Realistic hair generation is crucial for visual appeal. Traditional fake‑hair overlays look artificial, whereas generative techniques enable high‑fidelity, natural‑looking hair growth effects.

Problem Analysis

Key challenges include:

Unstable GAN training leading to artifacts, spots, and local distortions.

Diverse effect requirements across server‑side and mobile‑side deployments.

User‑experience‑driven priorities, such as balancing realism with aesthetic appeal.

Technical Practice

Data Preparation: Collecting paired data (e.g., the same person in the same pose at different ages) is difficult, so unpaired data are gathered and augmented with StyleGAN‑generated virtual samples. High‑quality, diverse data are essential for attributes such as hair length, color, and texture.
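One common way to keep StyleGAN‑generated virtual samples clean is the "truncation trick": pull randomly sampled latents toward the average latent, trading diversity for fidelity. A minimal numpy sketch (the zero `w_avg` and 512‑dimensional latent size are illustrative assumptions, not details from the Y‑Tech pipeline):

```python
import numpy as np

def truncate_latents(w, w_avg, psi=0.7):
    """Pull sampled latents toward the average latent.

    Lower psi biases samples toward the high-density region of the
    latent space, which tends to yield cleaner (if less varied) images.
    """
    return w_avg + psi * (w - w_avg)

rng = np.random.default_rng(0)
w_avg = np.zeros(512)                      # average latent (illustrative)
w = rng.standard_normal((4, 512))          # a batch of sampled latents
w_trunc = truncate_latents(w, w_avg, psi=0.5)
```

With `psi=0.5` every latent is moved halfway toward the mean; `psi=1.0` leaves sampling unchanged.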

Data Generation Models: Domain‑translation methods (CycleGAN, MUNIT, U‑GAT‑IT, StarGAN v2) convert unpaired data into paired samples. Improvements include adaptive spatial attention, semi‑supervised cyclic training, and mask‑guided techniques (MaskGAN, SEAN, MichiGAN) that focus generation on specific facial regions.
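The idea that makes unpaired training possible in CycleGAN‑style methods is cycle consistency: translating an image to the target domain and back should reproduce the original. A toy numpy sketch, where the two "translators" are stand‑in brightness shifts rather than real networks:

```python
import numpy as np

def cycle_consistency_loss(x, x_reconstructed):
    # L1 distance between an input and its round-trip translation G(F(x))
    return np.abs(x - x_reconstructed).mean()

# Toy stand-ins for the two domain translators (purely illustrative):
def F(x):          # source -> target domain
    return x + 0.1
def G(y):          # target -> source domain
    return y - 0.1

x = np.random.default_rng(1).random((8, 8))
loss = cycle_consistency_loss(x, G(F(x)))   # ~0: G undoes F exactly
```

In a real model, this loss is added to the adversarial losses of both translators so each one is forced to preserve the content that the other needs to invert.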

Hair Generation: 3D hair templates are converted into masks and edge maps, which are fused with real images. Deformation operations and frequency‑domain separation keep texture and color generation stable, while multi‑scale generators and latent‑space control via StyleGAN enable natural hair growth across domains.
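Frequency‑domain separation can be sketched as a simple FFT low‑pass split: the low band carries smooth color and shading, the high band carries hair‑like texture, and the two can then be generated or constrained separately. This is a simplified numpy stand‑in for the technique named above, not the production implementation:

```python
import numpy as np

def split_frequencies(img, cutoff=4):
    """Split a grayscale image into low- and high-frequency components.

    A circular mask of radius `cutoff` in the centered FFT keeps the
    low band; the high band is the residual, so the two always sum
    back to the original image.
    """
    f = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    mask = (np.hypot(yy - h / 2, xx - w / 2) <= cutoff).astype(float)
    low = np.real(np.fft.ifft2(np.fft.ifftshift(f * mask)))
    high = img - low
    return low, high

img = np.random.default_rng(2).random((32, 32))
low, high = split_frequencies(img)
```

A color‑stability constraint would then be applied to `low` only, leaving `high` free to carry fine strand detail.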

Pixel‑to‑Pixel Model: Knowledge distillation converts models trained on unpaired data into paired, pixel‑to‑pixel models with more stable training. Mobile‑inference optimizations include the custom YCNN engine, which supports NPU, Metal, DSP, OpenCL, and NEON back ends, as well as discriminators enhanced with pretrained features and multi‑scale designs.
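Distillation here means treating the heavy unpaired model as a teacher whose outputs become pseudo ground truth, so a lightweight student can be fit by ordinary supervised regression. A toy numpy sketch with a linear student and MSE loss (all names, shapes, and the learning rate are illustrative):

```python
import numpy as np

def distillation_step(student_w, x, teacher_out, lr=0.1):
    """One gradient-descent step fitting the student to teacher outputs.

    The teacher's outputs on real inputs stand in for paired labels,
    turning unpaired generation into supervised pixel-to-pixel training.
    """
    pred = x @ student_w
    grad = 2 * x.T @ (pred - teacher_out) / len(x)   # gradient of MSE
    return student_w - lr * grad

rng = np.random.default_rng(3)
x = rng.standard_normal((64, 8))
true_w = rng.standard_normal((8, 1))
teacher_out = x @ true_w          # teacher outputs used as targets
w = np.zeros((8, 1))
for _ in range(500):
    w = distillation_step(w, x, teacher_out)
```

After enough steps the student reproduces the teacher's input–output mapping, which is all the deployed model needs to match.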

Model Deployment & Material Design: Post‑processing steps such as beautification, sharpening, and atmospheric effects improve perceived quality. The same operations can also be folded into the data‑generation pipeline to raise data quality.
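Sharpening, for instance, is often implemented as unsharp masking: add back the detail that a blur removes. A minimal numpy sketch, assuming a plain 3×3 box blur rather than whatever tuned kernel a production pipeline would use:

```python
import numpy as np

def unsharp_mask(img, amount=1.0):
    """Sharpen by amplifying the difference between img and a box blur.

    Flat regions are unchanged (blur equals the image there); edges get
    extra contrast proportional to `amount`.
    """
    padded = np.pad(img, 1, mode="edge")
    blurred = sum(
        padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
        for dy in range(3) for dx in range(3)
    ) / 9.0
    return img + amount * (img - blurred)

img = np.full((8, 8), 0.5)
img[4:, :] = 1.0                  # a soft horizontal edge to sharpen
sharp = unsharp_mask(img)
```

Because the operation is a fixed local filter, it is cheap enough to run per frame on mobile after the generator.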

Conclusion

While GANs have matured for face attribute manipulation, achieving optimal user experience within limited project timelines remains challenging. Ongoing research focuses on low‑data StyleGAN control, data‑quality enhancements, and efficient mobile deployment to continuously innovate short‑video effects.

References

[1] Liu et al., 2020. Generative Adversarial Networks for Image and Video Synthesis.
[2] Goodfellow et al., 2014. Generative Adversarial Networks.
[3] Karras et al., 2020. Analyzing and Improving the Image Quality of StyleGAN.
[4] Zhu et al., 2017. Unpaired Image‑to‑Image Translation Using Cycle‑Consistent Adversarial Networks.
[5] Huang et al., 2018. Multimodal Unsupervised Image‑to‑Image Translation.
[6] Kim et al., 2020. U‑GAT‑IT.
[7] Choi et al., 2020. StarGAN v2.
[8] Cai et al., 2020. Frequency Domain Image Translation.
[9] Shen et al., 2020. Interpreting the Latent Space of GANs for Semantic Face Editing.
[10] Tewari et al., 2020. StyleRig.
[11] Shoshan et al., 2021. GAN‑Control.
[12] Lee et al., 2020. MaskGAN.
[13] Zhu et al., 2020. SEAN.
[14] Tan et al., 2020. MichiGAN.

Tags: computer vision · GAN · Kuaishou · StyleGAN · mobile deployment · face editing · short video effects
Written by Kuaishou Large Model (Official Kuaishou Account)
