Kuaishou Mobile Mixed Reality System: Architecture, Algorithms, and Applications
This article presents Kuaishou's mobile mixed reality (MR) system, detailing its integration of deep learning, SLAM, and scene reconstruction for real‑time spatial computing, the design of a monocular depth‑estimation model, a lightweight 3D rendering engine, and its deployment across iOS and Android devices with various user‑facing effects.
Abstract: Kuaishou's mixed reality (MR) system combines deep learning, SLAM, and scene reconstruction to achieve real‑time spatial computation and scene understanding on mobile devices, enabling immersive virtual‑real interactions through effects such as "New Year Lantern", "Dream Sprite" and "Underwater World".
Mixed Reality Development Trend: XR encompasses VR, AR, and MR; MR extends AR by understanding the real world to enable interactive virtual‑real behaviors. Long limited by algorithms and compute, MR adoption has accelerated with recent advances in mobile NPUs, LiDAR sensors, and deep learning.
Industry Status: Early MR devices such as Microsoft HoloLens 2 relied on dedicated hardware. Modern smartphones now embed NPUs and LiDAR (e.g., Apple ARKit 4, Google ARCore 1.18 Depth API), allowing MR experiences without specialized sensors.
Kuaishou MR Layout: Kuaishou has shipped planar and spatial AR effects since 2017. To overcome AR's limitations, the MR team built a tracking and reconstruction pipeline based on a monocular camera, integrating a custom 3D engine (SKwai) to render realistic MR effects across iOS and Android models.
Mobile MR System Architecture: The system consists of two modules: a monocular scene‑understanding system and a special‑effects rendering engine. The scene‑understanding module performs 6‑DOF pose estimation, semantic understanding, and dense mesh reconstruction, feeding the engine for occlusion, collision, lighting, and surface‑attachment effects.
Monocular Scene Understanding System: It includes three sub‑modules: (1) Motion tracking using VIO (visual‑inertial odometry) that fuses image frames with IMU data; (2) Monocular depth estimation guided by sparse point‑cloud maps and a deep learning model, ensuring temporal stability; (3) 3D reconstruction that fuses pose and dense depth to generate a refined mesh. The pipeline is optimized for diverse mobile hardware, covering ~1.5 billion users.
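One common way to combine a scale‑ambiguous monocular depth prediction with the sparse metric points from VIO, as the pipeline above does, is to fit a global scale and shift against the triangulated map points. The following is a minimal illustrative sketch, not Kuaishou's actual implementation; all names are hypothetical.

```python
# Hypothetical sketch: align a scale-ambiguous learned depth map to metric
# scale using sparse VIO map points, via a closed-form least-squares fit.

def fit_scale_shift(pred, sparse):
    """Solve min over (s, t) of sum((s * pred_i + t - sparse_i)^2)."""
    n = len(pred)
    sp = sum(pred)
    ss = sum(sparse)
    spp = sum(p * p for p in pred)
    sps = sum(p * q for p, q in zip(pred, sparse))
    denom = n * spp - sp * sp
    s = (n * sps - sp * ss) / denom
    t = (ss - s * sp) / n
    return s, t

# Sparse triangulated depths at a few pixels, and the network's predictions there:
sparse_depths = [2.0, 4.0, 6.0]
pred_depths = [1.0, 2.0, 3.0]   # correct up to a scale of 2, shift of 0
s, t = fit_scale_shift(pred_depths, sparse_depths)
print(round(s, 6), round(t, 6))  # -> 2.0 0.0
```

Applying the recovered `s` and `t` to the full dense prediction yields a metric depth map consistent with the VIO trajectory from frame to frame.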
Depth‑Estimation Model Design: Built on a UNet backbone with dual encoders for RGB and sparse point maps, the model incorporates multi‑task learning (depth, semantics, normals) and spatial attention. Training uses diverse depth datasets with refinement networks, and model compression employs depth‑wise separable convolutions, pruning, and SE modules, achieving real‑time inference (8 ms on iPhone 11, 15 ms on iPhone 6).
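To see why depthwise‑separable convolutions are an effective compression lever for a model like this, it helps to count parameters. The sketch below uses generic layer sizes (not the actual network's) to show the savings.

```python
# Illustrative parameter count: depthwise-separable vs. standard convolution,
# one of the compression techniques mentioned above. Layer sizes are generic.

def std_conv_params(cin, cout, k):
    # standard k x k convolution mixing all channels at once
    return cin * cout * k * k

def dw_sep_params(cin, cout, k):
    # depthwise k x k filter per input channel + 1x1 pointwise projection
    return cin * k * k + cin * cout

cin, cout, k = 64, 128, 3
std = std_conv_params(cin, cout, k)   # 73728
sep = dw_sep_params(cin, cout, k)     # 576 + 8192 = 8768
print(std, sep, round(std / sep, 2))  # -> 73728 8768 8.41
```

An ~8x parameter reduction per layer, compounded across the encoder-decoder, is what makes single-digit-millisecond inference plausible on recent phone NPUs.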
Real‑Time Reconstruction: Using pose and dense depth, the system performs key‑frame selection and frame‑to‑frame filtering to produce stable surface meshes. Compared with traditional RGB‑D methods, it offers (1) dense reconstruction from a single RGB camera, (2) hash‑based storage for large scenes, and (3) plane‑semantic integration for higher accuracy.
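The hash‑based storage the article credits for scaling to large scenes is typically a sparse map from voxel index to a truncated signed distance value plus a fusion weight, so memory is spent only near observed surfaces. A generic toy sketch (not SKwai's implementation) of that update:

```python
# Minimal sketch of hash-based TSDF fusion: voxels are allocated lazily in a
# dict keyed by integer grid index, and each depth observation is fused as a
# weighted running average. Constants and key layout are illustrative.

VOXEL = 0.05   # voxel size in meters
TRUNC = 0.15   # truncation distance in meters

tsdf = {}      # sparse hash map: voxel key -> (tsdf value, weight)

def integrate(surface_depth, voxel_depth, key):
    """Fuse one depth observation into the voxel at `key`."""
    sdf = surface_depth - voxel_depth       # signed distance along the ray
    if sdf < -TRUNC:
        return                              # far behind the surface: skip
    d = min(1.0, sdf / TRUNC)               # truncate to [-1, 1]
    v, w = tsdf.get(key, (0.0, 0.0))
    tsdf[key] = ((v * w + d) / (w + 1.0), w + 1.0)

# Observe a surface at depth 1.0 m; update voxels along the camera ray.
for z in [0.9, 0.95, 1.0, 1.05]:
    integrate(1.0, z, (0, 0, round(z / VOXEL)))
print(len(tsdf))  # -> 4: only voxels near the surface were allocated
```

A mesh is then extracted from the zero crossing of the stored distance field (e.g., with marching cubes), and the plane/semantic labels mentioned above can be fused per voxel in the same pass.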
Mobile Rendering Engine (SKwai): SKwai is a lightweight next‑generation 3D engine supporting PBR, IBL, deferred rendering, physics (collision, cloth, soft‑body), GPU particles, and an ECS‑based scripting system. It enables virtual lighting, occlusion, physical collision, surface‑attachment, and other MR effects on a wide range of mobile devices.
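The ECS pattern behind SKwai's scripting layer separates data (components) from behavior (systems that iterate over matching entities). The toy sketch below illustrates the pattern in generic Python; it is not SKwai's API.

```python
# Toy ECS (entity-component-system) sketch: entities are ids, components are
# plain data attached to them, and systems run over entities that carry the
# required component set. All names here are illustrative.

class World:
    def __init__(self):
        self.components = {}   # entity id -> {component name: data}
        self.next_id = 0

    def spawn(self, **comps):
        eid = self.next_id
        self.next_id += 1
        self.components[eid] = comps
        return eid

    def query(self, *names):
        for eid, comps in self.components.items():
            if all(n in comps for n in names):
                yield eid, comps

def physics_system(world, dt):
    # Advance every entity that has both a position and a velocity.
    for _, c in world.query("pos", "vel"):
        c["pos"] = [p + v * dt for p, v in zip(c["pos"], c["vel"])]

world = World()
ball = world.spawn(pos=[0.0, 0.0, 0.0], vel=[1.0, 0.0, 0.0])
world.spawn(pos=[5.0, 5.0, 5.0])          # static prop: no velocity component
physics_system(world, dt=0.5)
print(world.components[ball]["pos"])      # -> [0.5, 0.0, 0.0]
```

Because systems touch only the component data they declare, this layout keeps per-frame iteration cache-friendly and lets rendering, physics, and effect scripts be scheduled independently.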
Value of Mixed Reality: MR merges virtual and physical worlds, offering immersive experiences for user creation, gaming, commerce, social interaction, and education. By delivering MR on commodity smartphones, Kuaishou democratizes access to advanced XR technologies and paves the way for future metaverse applications.
Kuaishou Tech
Official Kuaishou tech account, providing real-time updates on the latest Kuaishou technology practices.