AI‑Generated Pet Companions for the New Xiaomi SU7: How Vision Forge Brings Your Own Pet to the Car Cockpit
The article details how Xiaomi's Vision Forge AI engine creates a high‑fidelity, lively 3D pet avatar from a single photo, integrates it into the SU7 cockpit with precise matting and motion, and explains the underlying model architecture, reinforcement‑learning enhancements, and inference optimizations that make the feature both realistic and responsive.
01 Light‑Touch Pet Companion
For many families, pets are family members, yet traveling with them is constrained by cabin space and the motion of the car. Xiaomi’s large-model team built Vision Forge so that users can upload a photo of their pet, have the AI generate a realistic 3D avatar, and pin it to the in-car infotainment screen as a virtual co-driver.
High fidelity: The model extracts distinctive features such as color, pattern, ear shape, eye spacing, and even the wetness of the nose, ensuring the virtual pet is instantly recognizable.
High liveliness: Beyond static replication, the engine adds “cute‑factor” elements that make the pet blink, wag its tail, and react instantly to screen taps.
High naturalness: Vision Forge provides precise cut‑out and background fusion, so the pet blends seamlessly with any cockpit backdrop, appearing as if it truly sits on the dashboard.
02 Self‑Developed Generation Engine
The realistic pet effect relies on the Vision Forge agent, which combines image generation, video generation, and segmentation-fusion technologies within a large-model framework. It takes the user's pet photo and the infotainment screen's background image as input and produces an animated pet avatar that is then composited into the cockpit display.
Key technical upgrades include:
MoE-LoRA deep customization: A mixture-of-experts (MoE) arrangement of lightweight LoRA expert networks automatically matches species-specific dynamics (e.g., cat agility, bird lightness). This avoids over-fitting and enables zero-switch inference, that is, inference without switching models between species.
Reinforcement-learning-enhanced dynamics: The proprietary ACC-FlowGRPO algorithm, combined with a reward model and a multimodal large model, improves the stochastic differential equation (SDE) sampling of the flow model, yielding higher-quality, more vivid pet motions.
Inference accelerated by tens of times: Multi-step engine optimizations, feature caching, and operator quantization compress generation time, delivering smooth, real-time interaction.
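Two of the ideas in this list can be illustrated with a minimal sketch. The article does not publish Vision Forge's code, so everything below is an assumption made for illustration: the class `MoELoRALinear`, its soft router, and the helper `group_relative_advantages` are hypothetical names. The first part shows the general MoE-LoRA pattern (a frozen base weight plus a router-weighted sum of low-rank updates). The second shows the group-relative advantage used by GRPO-style methods in general, not the proprietary ACC-FlowGRPO algorithm.

```python
import math
import random
import statistics


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class MoELoRALinear:
    """Hypothetical layer: frozen base weight W plus a router-weighted
    sum of low-rank updates B_e A_e, one pair per expert."""

    def __init__(self, d_in, d_out, rank, n_experts, seed=0):
        rng = random.Random(seed)

        def rand(rows, cols):
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.W = rand(d_out, d_in)          # frozen base projection
        self.gate = rand(n_experts, d_in)   # router: one logit per expert
        self.A = [rand(rank, d_in) for _ in range(n_experts)]  # down-projections
        # Up-projections start at zero (the usual LoRA init), so the layer
        # initially reproduces the base model exactly.
        self.B = [[[0.0] * rank for _ in range(d_out)] for _ in range(n_experts)]

    def forward(self, x):
        weights = softmax(matvec(self.gate, x))  # soft routing over experts
        y = matvec(self.W, x)
        for w, A, B in zip(weights, self.A, self.B):
            delta = matvec(B, matvec(A, x))      # this expert's low-rank update
            y = [yi + w * di for yi, di in zip(y, delta)]
        return y, weights


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: standardize each reward within its sample group.

    Uses the population standard deviation; some implementations use the
    sample standard deviation instead.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]
```

Because the router computes the expert weights from the input itself, one set of weights can serve every species without an adapter being swapped by hand, which is one plausible reading of "zero-switch inference". In the second function, a group would be several animations generated from the same pet photo and scored by the reward model. Samples that score above the group mean receive a positive advantage and are reinforced.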
03 Technology Enables a Warm Experience
To achieve seamless integration, the team created an end-to-end visual matting model that extracts pet fur at pixel precision and outputs a high-accuracy alpha mask directly. This replaces the traditional multi-stage pipeline (detection → segmentation → transparency estimation) and ensures each hair strand is correctly rendered, even under complex lighting.
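Once an alpha mask exists, placing the pet over a cockpit backdrop is standard alpha compositing: each output pixel is alpha × foreground + (1 − alpha) × background. The sketch below shows only that blending step on plain RGB tuples. The function names are illustrative assumptions, and the matting model that predicts the alpha values is not reproduced here.

```python
def composite_pixel(fg, bg, alpha):
    """Blend one RGB foreground pixel over a background pixel.

    alpha is the matte value in [0, 1]: 1.0 is fully pet, 0.0 is fully backdrop.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return tuple(round(alpha * f + (1.0 - alpha) * b) for f, b in zip(fg, bg))


def composite(fg_img, bg_img, alpha_mask):
    """Composite same-sized images given as nested lists of RGB tuples."""
    return [
        [composite_pixel(f, b, a) for f, b, a in zip(fg_row, bg_row, a_row)]
        for fg_row, bg_row, a_row in zip(fg_img, bg_img, alpha_mask)
    ]


# A 1x3 strip: solid fur, a half-transparent hair strand, and pure backdrop.
fur, dash = (200, 100, 0), (0, 100, 200)
strip = composite([[fur, fur, fur]], [[dash, dash, dash]], [[1.0, 0.5, 0.0]])
print(strip)  # [[(200, 100, 0), (100, 100, 100), (0, 100, 200)]]
```

Fractional alpha values are what let thin hair strands fade into the backdrop instead of ending in a hard, jagged outline.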
Additional edge processing removes color fringing (green or black halos along the outline) and corrects hue shifts so that the pet’s fur matches the cabin lighting. The result is a realistic overlay with semi-transparent fur edges that keeps visual focus on the pet.
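The article does not say how the color fringing is removed. One common technique, shown here purely as an assumption, is foreground decontamination: if an edge pixel was observed as C = alpha × F + (1 − alpha) × B and the original background color B is known or estimated, the clean fur color is F = (C − (1 − alpha) × B) / alpha. The function name and the `min_alpha` threshold are hypothetical.

```python
def decontaminate_pixel(observed, background, alpha, min_alpha=0.05):
    """Estimate the true foreground color of a semi-transparent edge pixel.

    Inverts C = alpha * F + (1 - alpha) * B for F. Pixels with very small
    alpha are returned unchanged, because dividing by a tiny alpha would
    amplify noise and they are nearly invisible after compositing anyway.
    """
    def clamp(v):
        return min(255, max(0, round(v)))

    if alpha < min_alpha:
        return tuple(clamp(c) for c in observed)
    return tuple(
        clamp((c - (1.0 - alpha) * b) / alpha)
        for c, b in zip(observed, background)
    )


# A beige hair strand (200, 180, 160) photographed half-transparent against
# pure green (0, 255, 0) is observed as (100, 217.5, 80): a green fringe.
clean = decontaminate_pixel((100.0, 217.5, 80.0), (0, 255, 0), 0.5)
print(clean)  # (200, 180, 160)
```

Compositing the recovered color, rather than the observed one, over the cockpit backdrop keeps the old background's tint from following the pet into the new scene.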
The engineers, many of whom are pet owners themselves, emphasize that the AI-generated pet is meant to provide a warm, emotional experience: accompanying children on trips, adding fun for couples, and preserving memories of departed pets.
Future work will explore further applications of generative agents in the cockpit, continuing to deliver emotionally resonant, AI‑driven features for Xiaomi car owners.