
AR+AI Powered Video Interactive Mini‑Games on iQIYI: Architecture, Face & Gesture Control, and Lua Game Layer

iQIYI’s AR+AI powered video interactive mini‑games blend a custom VideoAR engine with real‑time AI‑driven face and gesture detection, use lightweight Lua for game logic, and offer rapid hot‑updates, enabling diverse IP integrations that have attracted over a million participants and boosted viewer engagement.

iQIYI Technical Product Team

iQIYI’s recent hit dramas have introduced a new way of watching: video interactive mini‑games that combine IP elements with playful video effects, attracting millions of users and drawing participation from many entertainment celebrities.

This article analyzes the technical implementation and product applications of these games.

AR+AI Dual Engine: To deliver a smooth interactive experience, iQIYI’s tech team integrated an AR engine (VideoAR) and an AI engine (SmileAR). The AR engine provides sticker rendering and AR prop support, while the AI engine supplies real‑time face and gesture detection.

Overall Architecture: VideoAR incorporates a self‑developed AR shooting engine and the AI engine. Lua was chosen for the upper‑level game logic because it is lightweight, efficient, and easy to embed. All game‑related logic runs in Lua, which reduces inter‑team coupling and enables rapid development, quick iteration, and online hot‑updates.
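The hot‑update mechanism described above can be sketched as a version check against a script store, reloading game logic only when a newer script has been published. This is a minimal illustrative sketch in Python, not iQIYI's actual update protocol; the class names, version scheme, and in‑memory "store" are all assumptions made for the example.

```python
# Sketch of version-gated hot-updates for game scripts.
# ScriptStore stands in for a CDN/config service; HotUpdater mimics the
# client-side check. All names here are hypothetical.
import hashlib


class ScriptStore:
    """In-memory stand-in for a server that publishes versioned Lua game scripts."""

    def __init__(self):
        self._scripts = {}  # name -> (version, source)

    def publish(self, name, version, source):
        self._scripts[name] = (version, source)

    def fetch(self, name):
        return self._scripts[name]


class HotUpdater:
    """Reloads a game script only when its published version changes."""

    def __init__(self, store):
        self.store = store
        self.loaded = {}  # name -> (version, checksum)

    def sync(self, name):
        version, source = self.store.fetch(name)
        current = self.loaded.get(name)
        if current and current[0] == version:
            return False  # already up to date, skip reload
        checksum = hashlib.sha256(source.encode()).hexdigest()
        # A real client would hand `source` to the Lua VM here.
        self.loaded[name] = (version, checksum)
        return True  # script (re)loaded
```

Because only the Lua scripts change, a game can be patched live without shipping a new app binary, which is the key benefit the architecture is designed around.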

Face Control: Accurate facial key‑point localization is essential for features such as face slimming, local makeup, and virtual wearables. The detection model is based on MobileNetV2, trimmed and quantized for mobile real‑time performance. Five optimization points are highlighted:

Lower latency and improve control accuracy by moving the heavy face‑detection module to an asynchronous thread and tracking the face on the main thread.

Use a quantized MobileNetV2 backbone to ensure real‑time inference on mobile devices.

Reduce model complexity for the 106‑point and 240‑point versions by shrinking the input size and channel counts and applying hard‑example mining.

Adopt multi‑task learning to predict pose and expression together, cutting error by 10%.

Combine direct regression with heat‑map prediction during training, then remove the heat‑map branch at inference to keep accuracy without extra latency.
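The first optimization above, splitting expensive detection onto a worker thread while cheap tracking keeps the main thread responsive, can be sketched as follows. The "detector" and "tracker" here are trivial stand‑ins (the real models are MobileNetV2 landmark networks, not reproduced here), and the worker thread is joined immediately only to keep the sketch deterministic; a production pipeline would let it run ahead of the frame loop.

```python
# Sketch of the detect-on-worker / track-on-main-thread scheduling idea.
import queue
import threading


def heavy_detect(frame):
    """Stand-in for the full face-detection model (expensive)."""
    return {"frame": frame, "bbox": (10, 10, 100, 100)}


def light_track(prev_result, frame):
    """Stand-in for cheap frame-to-frame tracking on the main thread."""
    x, y, w, h = prev_result["bbox"]
    return {"frame": frame, "bbox": (x + 1, y + 1, w, h)}  # toy drift


def run_pipeline(frames, detect_every=3):
    results = []
    out_q = queue.Queue()
    last = None
    for i, frame in enumerate(frames):
        if i % detect_every == 0:
            # Hand the expensive detection to a worker thread.
            t = threading.Thread(target=lambda f=frame: out_q.put(heavy_detect(f)))
            t.start()
            t.join()  # joined here only so the example is deterministic
        # Main thread: pick up a fresh detection if one is ready, else track.
        try:
            last = out_q.get_nowait()
        except queue.Empty:
            last = light_track(last, frame)
        results.append(last)
    return results
```

The payoff is that per‑frame latency is bounded by the tracker, not the detector, which is what keeps face control responsive on mid‑range phones.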

Gesture Control: Hand‑gesture recognition is implemented with an SSD detection model using MobileNet as the backbone. Quantization techniques accelerate inference, achieving real‑time gesture detection on mobile CPUs (Qualcomm Snapdragon, Huawei Kirin, MediaTek Helio). The quantization‑aware training pipeline adds only two lines of code to the TensorFlow training script.
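In TensorFlow 1.x, the "two lines" were typically calls like `tf.contrib.quantize.create_training_graph(...)` and `create_eval_graph(...)`, which insert fake‑quantization ops into the graph; the exact calls iQIYI used are not shown in the article. The core math those ops perform, quantize to a low‑bit grid and immediately dequantize so training sees the rounding error, can be illustrated with a plain Python helper (the function and its calibration range are assumptions for the example, not iQIYI's pipeline):

```python
# Sketch of the fake-quantization op inserted by quantization-aware training.
def fake_quantize(x, x_min, x_max, num_bits=8):
    """Quantize x to num_bits levels over [x_min, x_max], then dequantize.

    Training sees the rounding error, so weights adapt to the reduced
    precision, while the forward pass still produces float values for
    the next layer.
    """
    levels = 2 ** num_bits - 1
    scale = (x_max - x_min) / levels
    x = min(max(x, x_min), x_max)      # clamp to the calibrated range
    q = round((x - x_min) / scale)     # integer level in [0, levels]
    return x_min + q * scale           # back to float for the next layer
```

Because the model trains with these rounding effects baked in, the accuracy drop when deploying true INT8 inference on mobile CPUs is much smaller than with post‑training quantization alone.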

Lua Game Layer: Inspired by traditional game engines, iQIYI wrapped the basic AR rendering capabilities into a configurable framework. Core elements such as sticker rendering, motion control, AR messaging, collision detection, and audio management are exposed via JSON configuration, which is then compiled into Lua scripts. This allows developers to create game logic without writing code directly, and the framework integrates with the internal MusesEffect tool for graphical configuration.
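The config‑to‑script idea can be sketched as a declarative rule table compiled into an event dispatcher. Every field name below is invented for illustration; the real MusesEffect schema and the JSON‑to‑Lua compiler are internal to iQIYI, and the sketch is in Python rather than Lua only to keep it self‑contained.

```python
# Hypothetical sketch of the config-driven game layer: a JSON-style config
# describing a sticker and its trigger is "compiled" into a lookup table
# that drives game behavior without hand-written logic.
game_config = {
    "elements": [
        {"id": "fruit", "type": "sticker", "texture": "fruit.png",
         "motion": {"kind": "fall", "speed": 120}},
    ],
    "rules": [
        {"on": "collision", "a": "fruit", "b": "mouth",
         "actions": ["play_sound:bite", "score:+10", "despawn:fruit"]},
    ],
}


def compile_rules(config):
    """Turn declarative rules into an event -> actions lookup table
    (standing in for the JSON-to-Lua compilation step)."""
    table = {}
    for rule in config["rules"]:
        key = (rule["on"], rule["a"], rule["b"])
        table[key] = rule["actions"]
    return table


def dispatch(table, event, a, b):
    """Look up the configured actions for a runtime event."""
    return table.get((event, a, b), [])
```

Keeping the game description declarative is what lets designers build new IP‑themed games in a graphical tool and ship them as hot‑updatable scripts.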

Multi‑Scenario Applications: The interactive mini‑games have been customized for dozens of IP programs (e.g., "Non‑Daily Party", "Youth With You 2", "Chinese New Rap 2020", etc.), providing new interactive marketing formats, boosting user‑generated content, and enhancing program viewership. They are also integrated into iQIYI’s children’s brand "QibaBu" to deliver educational content, such as English word learning through fruit‑eating gameplay.

Since launch, the games have accumulated over one million participants in five months, demonstrating strong user engagement and commercial value. Future plans include adding facial deformation, frame freezing, hand‑tracking, multi‑player modes, side‑storylines, and story‑linked experiences.

Tags: computer vision, AI, AR, Lua, mobile gaming, video interaction