Zero‑Second Startup: iQIYI Playback Kernel Performance Optimization and 5.0 Architecture
iQIYI’s new Playback Kernel 5.0 introduces a decoupled pre‑decode component that creates a single hardware (or software) decoder and supplies pre‑decoded frames to multiple player instances, cutting start‑up latency from roughly 400 ms to about 35 ms on a Qualcomm Snapdragon 450 test device and enabling true “zero‑second” playback across a wide range of Android devices.
Background : iQIYI’s playback core (the Playback Kernel) runs on Android Mobile, Android TV, Apple TV, iPhone, iPad, GPad, macOS and Windows PC, and supports live streaming, VOD, ads, membership, VR/AR, interactive video and more. After nine years and four major versions, the core is stable for long‑video playback, but new “scroll‑to‑play” scenarios (short‑video feeds, rapid switching) expose its performance limits.
Problem : To achieve “zero‑second” start‑up, the product requires 2‑3 player instances for pre‑loading and instance switching. While this solves latency, it dramatically increases memory and thread usage, causing noticeable stutter on mid‑ and low‑end devices.
Investigation : The issue is most severe on Android devices with diverse hardware capabilities. Tests show that the dominant latency comes from decoder creation and opening, ranging from ~20 ms on high‑end phones to >350 ms on low‑end phones (some >500 ms).
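A measurement like the one described above can be sketched as a small timing helper. This is an illustrative sketch only, not iQIYI's test harness: `measure_open_latency_ms` and its `create_and_open` callback are hypothetical names, and the callback stands in for whatever platform call creates and opens a decoder.

```python
import time
from typing import Callable, List


def measure_open_latency_ms(
    create_and_open: Callable[[], object],
    runs: int = 5,
    clock: Callable[[], float] = time.perf_counter,
) -> List[float]:
    """Time repeated decoder creation and return one sample per run, in ms.

    `create_and_open` is a hypothetical callback that creates and opens a
    decoder. `clock` must return seconds; it is injectable so the helper
    can be exercised without real hardware.
    """
    samples: List[float] = []
    for _ in range(runs):
        start = clock()
        decoder = create_and_open()
        samples.append((clock() - start) * 1000.0)
        # Release the decoder between runs if it exposes a close() method,
        # so every run measures a fresh creation.
        close = getattr(decoder, "close", None)
        if callable(close):
            close()
    return samples
```

Taking several samples per device matters here because the article reports a wide spread between high‑end and low‑end phones, and a single run can hide outliers.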
Proposed Solutions :
Multi‑decoder scheme: create several decoders inside the core and pre‑decode, reducing memory/threads compared with multiple player instances. Works on mid‑high devices but still problematic on low‑end.
Software‑decode scheme: decoder creation <20 ms, but CPU usage and power consumption become high, especially for high‑bitrate streams.
Live‑playback architecture: open a decoder once and reuse it, leveraging Adaptive Playback on Android. Simplifies decoder creation but complicates timeline management for VOD, seeking, ads, etc.
All three have advantages and drawbacks; the final approach merges them.
Solution – Playback Kernel 5.0 : The new architecture adds a “pre‑decode” unit that is decoupled from the player instance. The pre‑decode unit creates a hardware decoder once (falling back to software decoding if needed) and supplies pre‑decoded frames to any player instance, so decoder creation and opening, the most time‑consuming step, no longer sits on the start‑up path. This enables “zero‑second” start‑up while keeping the existing API compatible.
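The idea of a pre‑decode unit shared by several players can be sketched as follows. This is a minimal model under stated assumptions, not the kernel's actual code: `PreDecodeUnit`, `Player`, `FakeDecoder` and `Frame` are hypothetical names, the decoder is a stand‑in rather than a real platform decoder, and a `RuntimeError` from the hardware factory is assumed to be the signal for software fallback.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Frame:
    stream_id: str
    pts_ms: int


class FakeDecoder:
    """Stand-in for a platform decoder (hypothetical, for illustration)."""

    def __init__(self, kind: str) -> None:
        self.kind = kind

    def decode_first_frame(self, stream_id: str) -> Frame:
        return Frame(stream_id=stream_id, pts_ms=0)


class PreDecodeUnit:
    """Owns a single decoder and caches pre-decoded first frames per stream."""

    def __init__(
        self,
        create_hw: Callable[[], FakeDecoder],
        create_sw: Callable[[], FakeDecoder],
    ) -> None:
        self._create_hw = create_hw
        self._create_sw = create_sw
        self._decoder: Optional[FakeDecoder] = None
        self._first_frames: Dict[str, Frame] = {}
        self.decoders_created = 0

    def _ensure_decoder(self) -> FakeDecoder:
        # Create the decoder lazily and only once; prefer hardware and
        # fall back to software if hardware creation fails.
        if self._decoder is None:
            try:
                self._decoder = self._create_hw()
            except RuntimeError:
                self._decoder = self._create_sw()
            self.decoders_created += 1
        return self._decoder

    @property
    def decoder_kind(self) -> Optional[str]:
        return self._decoder.kind if self._decoder else None

    def preload(self, stream_id: str) -> None:
        decoder = self._ensure_decoder()
        self._first_frames[stream_id] = decoder.decode_first_frame(stream_id)

    def take_first_frame(self, stream_id: str) -> Optional[Frame]:
        return self._first_frames.pop(stream_id, None)


class Player:
    """A player instance that asks the shared unit for a ready frame."""

    def __init__(self, pre_decode: PreDecodeUnit) -> None:
        self._pre_decode = pre_decode

    def start(self, stream_id: str) -> str:
        if self._pre_decode.take_first_frame(stream_id) is not None:
            return "pre-decoded"  # fast path: frame was ready before start
        # Slow path: nothing was preloaded, so decode on demand.
        self._pre_decode.preload(stream_id)
        self._pre_decode.take_first_frame(stream_id)
        return "cold"
```

In this model any number of `Player` objects can share one `PreDecodeUnit`, so the decoder creation cost is paid once rather than once per player instance, which is the property the article attributes to the 5.0 design.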
Architecture diagram (originally shown in the source) is omitted here, but the key change is the independent pre‑decode component.
Performance Test (Qualcomm Snapdragon 450) :
Comparison of total, business‑logic and decode/render times (ms):
Version | Total Time | Business Time | Decode/Render Time
4.0 | 395.85 | 72.30 | 323.55
5.0 | 35.14 | 24.21 | 10.93
The 5.0 version reduces total latency from ~396 ms to ~35 ms, meeting the “zero‑second” goal (sub‑50 ms on mid‑low devices, sub‑20 ms on high‑end).
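The reported figures can be cross‑checked with a few lines of arithmetic. The numbers below are taken from the Snapdragon 450 comparison; the variable and function names are only illustrative.

```python
# Times in milliseconds from the Snapdragon 450 comparison.
timings_ms = {
    "4.0": {"business": 72.30, "decode_render": 323.55},
    "5.0": {"business": 24.21, "decode_render": 10.93},
}


def total_ms(version: str) -> float:
    """Total time is business-logic time plus decode/render time."""
    parts = timings_ms[version]
    return round(parts["business"] + parts["decode_render"], 2)


def reduction_pct(before: float, after: float) -> float:
    """Percentage by which `after` is lower than `before`."""
    return (before - after) / before * 100.0


total_cut = reduction_pct(total_ms("4.0"), total_ms("5.0"))
decode_cut = reduction_pct(
    timings_ms["4.0"]["decode_render"], timings_ms["5.0"]["decode_render"]
)
business_cut = reduction_pct(
    timings_ms["4.0"]["business"], timings_ms["5.0"]["business"]
)

print(f"total: {total_ms('4.0')} -> {total_ms('5.0')} ms ({total_cut:.1f}% lower)")
print(f"decode/render: {decode_cut:.1f}% lower")
print(f"business logic: {business_cut:.1f}% lower")
# total: 395.85 -> 35.14 ms (91.1% lower)
# decode/render: 96.6% lower
# business logic: 66.5% lower
```

The two component columns sum to the reported totals, and most of the saving comes from the decode/render column, which is consistent with decoder creation having been the dominant cost.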
Conclusion : By integrating multi‑decoder, software‑decode fallback, and live‑playback concepts into a unified pre‑decode architecture, iQIYI achieved a dramatic performance boost across a wide range of devices. The next steps will build on version 5.0 to deliver further innovations.
iQIYI Technical Product Team