How Kuaishou’s ‘All‑Things AR’ Turns Real Objects into Interactive 3D Characters
‘All‑Things AR’ (万物AR) is a Kuaishou Y‑tech solution that lets users capture any real‑world object with a phone. A custom AI model segments the object automatically, and a lightweight SLAM‑based pipeline renders it as an animated 3D avatar, enabling low‑cost, high‑quality AR experiences.
Background
In August 2020, a Japanese app called RakugakiAR let users turn 2D doodles into animated 3D AR characters. The app quickly went viral, highlighting the appeal of turning real‑world sketches into moving avatars.
Concept of “All‑Things AR”
Inspired by RakugakiAR, Kuaishou’s Y‑tech team created “All‑Things AR”, a feature that lets users point a phone at any object; the system automatically segments the object and renders a lively 3D avatar, without requiring the user to draw anything first.
Overall Pipeline
The workflow consists of generic object detection, object‑level segmentation (the “万物分割”, literally “all‑things segmentation”, step), passing the mask texture to an effect SDK, rendering the AR character, compositing it with the background, and finally delivering the interactive AR experience.
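The stages above can be sketched as a per‑frame function. Everything here is illustrative: the stage implementations are simple NumPy stand‑ins for the real Ykit/FaceMagic calls, and all function names are hypothetical.

```python
import numpy as np

# Hypothetical stand-ins for the real detection / segmentation / effect stages.

def detect_objects(frame):
    """Generic object detection: return (x0, y0, x1, y1) boxes.
    Here: one box around all bright pixels."""
    ys, xs = np.nonzero(frame.sum(axis=2) > 300)
    if len(xs) == 0:
        return []
    return [(xs.min(), ys.min(), xs.max() + 1, ys.max() + 1)]

def segment_object(frame, box):
    """Object-level segmentation: binary mask inside the box."""
    mask = np.zeros(frame.shape[:2], dtype=bool)
    x0, y0, x1, y1 = box
    mask[y0:y1, x0:x1] = frame[y0:y1, x0:x1].sum(axis=2) > 300
    return mask

def render_avatar(frame, mask):
    """Stand-in for the effect SDK: tint the masked region."""
    avatar = frame.copy()
    avatar[mask] = [255, 128, 0]
    return avatar

def composite(frame, avatar, mask):
    """Blend the rendered avatar over the original background."""
    out = frame.copy()
    out[mask] = avatar[mask]
    return out

def process_frame(frame):
    """One frame through the full pipeline described in the text."""
    boxes = detect_objects(frame)
    if not boxes:
        return frame                      # nothing to animate
    mask = segment_object(frame, boxes[0])
    avatar = render_avatar(frame, mask)
    return composite(frame, avatar, mask)
```

In production each stage runs in a different component (detection and segmentation in Ykit, rendering in FaceMagic/SKwai); the function boundaries here mirror the data handoffs, not the real APIs.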
Model Training and Data Preparation
To support segmentation of any object, the team collected large‑scale open‑source datasets and internal data, used a server‑side SOTA model to generate pseudo‑labels for new categories, and performed iterative manual annotation on hard cases.
Data augmentation (boundary smoothing, hole filling, random occlusion, lighting changes) was applied to improve robustness.
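Two of the augmentations mentioned, random occlusion and lighting changes, can be sketched as follows. This is a minimal illustration, not the team’s actual pipeline; zeroing the occluded patch in both image and label is one plausible policy among several.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image, mask):
    """Illustrative augmentation: global brightness jitter plus a random
    occluding patch removed from both the image and the label mask."""
    img = image.astype(np.float32)
    # Lighting change: random global brightness scale.
    img *= rng.uniform(0.7, 1.3)
    # Random occlusion: blank a patch and drop it from the label,
    # so the model learns to cope with partially hidden objects.
    h, w = mask.shape
    ph, pw = h // 4, w // 4
    y = rng.integers(0, h - ph + 1)
    x = rng.integers(0, w - pw + 1)
    img[y:y + ph, x:x + pw] = 0
    mask = mask.copy()
    mask[y:y + ph, x:x + pw] = 0
    return np.clip(img, 0, 255).astype(np.uint8), mask
```

Boundary smoothing and hole filling would act on the mask channel in the same (image, mask) pairwise fashion.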
Segmentation Model Optimization
Two specialized models were trained: one for faces and one for generic objects, reducing complexity. A cascade architecture with multi‑task learning (boundary detection, classification) was introduced to suppress background errors.
Loss functions were enhanced with OHEM for hard‑example mining and contrastive learning to stabilize predictions.
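The OHEM idea, averaging the loss only over the hardest pixels rather than all of them, can be sketched on a per‑pixel binary cross‑entropy. The keep ratio and NumPy formulation are illustrative assumptions, not the team’s exact loss.

```python
import numpy as np

def ohem_bce(logits, targets, keep_ratio=0.25):
    """Online hard example mining: compute per-pixel binary cross-entropy,
    then average only the hardest keep_ratio fraction of pixels."""
    probs = 1.0 / (1.0 + np.exp(-logits))          # sigmoid
    eps = 1e-7
    per_pixel = -(targets * np.log(probs + eps)
                  + (1 - targets) * np.log(1 - probs + eps))
    flat = np.sort(per_pixel.ravel())[::-1]        # hardest first
    k = max(1, int(keep_ratio * flat.size))
    return flat[:k].mean()
```

Because easy background pixels dominate a segmentation map, the mined loss is strictly larger than the plain mean whenever any hard pixels exist, which concentrates gradient signal on them.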
Engineering Integration
The AI pipeline (Ykit) detects objects each frame and, when segmentation is triggered, returns both bounding boxes and masks. Separate handling for faces and generic objects ensures optimal performance on mobile devices.
The effect engine (FaceMagic) receives the mask texture via SDK calls, then uses a custom rendering engine (SKwai) to display the AR avatar.
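The face/object split in the per‑frame pipeline amounts to routing each detection to its specialized model. The sketch below assumes a hypothetical dispatch API; the real Ykit interfaces are not public.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    box: tuple      # (x0, y0, x1, y1) bounding box
    is_face: bool   # set by the generic detector

def run_segmentation(frame, detections, face_model, object_model):
    """Route each detection to the specialized segmentation model,
    mirroring the separate face / generic-object handling in the text."""
    results = []
    for det in detections:
        model = face_model if det.is_face else object_model
        results.append((det.box, model(frame, det.box)))
    return results
```

The returned (box, mask) pairs are what would then be handed to the effect SDK as mask textures.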
AR Effect Development
Designers created three themed avatars (band, Ramadan, Olympics) with 3D models, skeletal rigs, and particle effects. Adaptive scaling ensures the avatar fits objects of varying shapes.
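Adaptive scaling reduces to choosing one uniform scale factor so the avatar fits the detected object’s bounding box without distortion. The `fill` margin is a hypothetical policy parameter.

```python
def fit_avatar(avatar_w, avatar_h, box_w, box_h, fill=0.8):
    """Uniform scale so the avatar fits inside the object's bounding box,
    occupying at most `fill` of the tighter dimension.
    A single min() keeps the aspect ratio intact for objects of any shape."""
    return fill * min(box_w / avatar_w, box_h / avatar_h)
```

A non‑uniform scale (stretching each axis independently) would fill the box exactly but deform the rig, which is why a single factor is the natural choice here.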
Visual polish includes color correction, fake shadows, and particle systems for dynamic effects.
Camera Localization (SLAM)
A lightweight SLAM system estimates the phone’s pose and reconstructs a sparse 3D map, enabling stable placement of virtual objects. For simple AR scenes, a plane‑assumption SLAM variant runs at >15 fps on low‑end devices.
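Stable placement follows from the standard pinhole projection: once SLAM yields a camera pose (R, t) each frame, a world‑anchored 3D point projects to a pixel that stays glued to the scene. A minimal sketch, with generic intrinsics K assumed:

```python
import numpy as np

def project_point(K, R, t, X_world):
    """Project a world-anchored 3D point into the current camera frame.
    As SLAM updates (R, t) every frame, re-projecting keeps the virtual
    object fixed relative to the real scene."""
    X_cam = R @ X_world + t          # world -> camera coordinates
    x = K @ X_cam                    # camera -> homogeneous pixel
    return x[:2] / x[2]              # perspective divide
```

Under the plane assumption mentioned above, the map points all lie on one plane, which simplifies both triangulation and pose estimation and is what makes the >15 fps budget reachable on low‑end devices.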
Conclusion
Through coordinated advances in AI segmentation, real‑time rendering, and efficient SLAM, the All‑Things AR feature was delivered at scale on Kuaishou, driving user engagement and opening new business opportunities.
Kuaishou Large Model (official Kuaishou account)