Mobile Real-Time Portrait Segmentation for Youku Bullet Comment Passthrough
To enable real‑time bullet‑comment passthrough on Youku’s mobile app, the team built a million‑scale portrait dataset and designed the AirSegNet series—CPU, GPU, and server variants—using VGG‑style nets, edge‑aware losses, and hybrid CPU‑GPU inference, achieving 0.98 IoU and sub‑15 ms latency on most devices.
As video platforms roll out bullet-comment passthrough, in which comments flow behind the people on screen instead of covering them, Youku (Alibaba's video platform) needed on-device portrait segmentation to complement its server-side solution. This article presents the real-time portrait segmentation pipeline deployed in Youku's mobile app for that feature.
1. Business Background
While server-side segmentation offers stable quality and high accuracy, it incurs storage and bandwidth costs and cannot meet real-time requirements for trending videos. This drove the need for mobile-side portrait segmentation with both high accuracy and real-time performance.
2. Salient Portrait Segmentation
The solution segments only the salient human figures in a video, meaning the people who are in focus, and ignores background or out-of-focus regions. The team built a million-scale dataset covering modern urban dramas, historical dramas, and military content, with half-body, full-body, single-person, and multi-person shots. Hard cases such as backlight, low light, and outstretched hands were also collected.
3. Model Design - AirSegNet Series
Through extensive experiments on MNN, Alibaba's mobile inference framework, the team found that plain, straight-through (VGG-style) network designs performed best. They developed the AirSegNet series in three variants: AirSegNet-CPU, AirSegNet-GPU, and AirSegNet-Server. Key design choices include using 1x1 convolutions in the decoder to reduce computation, fusing multi-scale low-level features (x2, x4, x8), and setting align_corners=False when resizing feature maps for more accurate edge segmentation.
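To illustrate why the align_corners setting matters for edges, the following minimal sketch resamples a single mask row with linear interpolation under both conventions. The coordinate formulas follow the semantics used by common resize operators such as PyTorch's `interpolate`; the article does not say which framework was used for training, and the helper names `source_coord` and `resize_linear_1d` are hypothetical.

```python
def source_coord(dst_index, in_size, out_size, align_corners):
    """Map an output pixel index to a (fractional) input coordinate."""
    if align_corners:
        # Corner pixels of input and output are pinned to each other.
        if out_size == 1:
            return 0.0
        return dst_index * (in_size - 1) / (out_size - 1)
    # Pixels are treated as unit cells and the cell centers are aligned.
    scale = in_size / out_size
    return (dst_index + 0.5) * scale - 0.5


def resize_linear_1d(row, out_size, align_corners=False):
    """Linearly resample a 1D list of values to out_size samples."""
    in_size = len(row)
    out = []
    for i in range(out_size):
        x = source_coord(i, in_size, out_size, align_corners)
        x = min(max(x, 0.0), in_size - 1)  # clamp to the valid range
        lo = int(x)
        hi = min(lo + 1, in_size - 1)
        frac = x - lo
        out.append(row[lo] * (1 - frac) + row[hi] * frac)
    return out


mask_row = [0.0, 0.0, 1.0, 1.0]  # a hard edge between index 1 and 2
up_false = resize_linear_1d(mask_row, 8, align_corners=False)
up_true = resize_linear_1d(mask_row, 8, align_corners=True)
```

With align_corners=False and a 2x upsample, every output sample sits on a regular half-pixel grid, so the edge lands exactly where a stride-2 downsample would have placed it. With align_corners=True the sample spacing depends on the tensor size (3/7 of a pixel here), which shifts edges slightly.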
4. Training Optimization
Several training strategies proved effective: (1) a computed background weight to reduce misclassified background pixels; (2) edge weighting, in which dilating and eroding the label mask with a 5x5 kernel identifies the edge band, and pixels in that band receive 5x loss weight; (3) a novel clustering loss to improve segmentation accuracy; (4) a TopK loss for handling hard negative samples. The final portrait IoU reached 0.98.
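The edge weighting and TopK ideas can be sketched in plain Python as below. This is an illustration under assumptions, not the team's training code: binary cross-entropy as the base loss, the `k_fraction` parameter, the clipped-window border handling, and the function names `edge_weight_map` and `weighted_topk_bce` are all hypothetical. Only the 5x5 kernel and the 5x edge weight come from the article.

```python
import math


def _morph(mask, k, op):
    """Apply a k x k min (erosion) or max (dilation) filter to a binary mask.

    Windows are clipped at the image border (an assumption of this sketch).
    """
    h, w = len(mask), len(mask[0])
    r = k // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                mask[yy][xx]
                for yy in range(max(0, y - r), min(h, y + r + 1))
                for xx in range(max(0, x - r), min(w, x + r + 1))
            ]
            out[y][x] = op(window)
    return out


def edge_weight_map(mask, k=5, edge_weight=5.0):
    """Give pixels where dilation and erosion disagree (the edge band) extra weight."""
    dilated = _morph(mask, k, max)
    eroded = _morph(mask, k, min)
    return [
        [edge_weight if d != e else 1.0 for d, e in zip(d_row, e_row)]
        for d_row, e_row in zip(dilated, eroded)
    ]


def weighted_topk_bce(probs, mask, weights, k_fraction=0.25):
    """Average only the largest k weighted per-pixel BCE losses (hard pixels)."""
    eps = 1e-7
    losses = []
    for p_row, m_row, w_row in zip(probs, mask, weights):
        for p, m, w in zip(p_row, m_row, w_row):
            p = min(max(p, eps), 1 - eps)  # avoid log(0)
            bce = -(m * math.log(p) + (1 - m) * math.log(1 - p))
            losses.append(w * bce)
    k = max(1, int(len(losses) * k_fraction))
    hardest = sorted(losses, reverse=True)[:k]
    return sum(hardest) / k


# Toy label: person on the left half of an 8x8 image.
mask = [[1, 1, 1, 1, 0, 0, 0, 0] for _ in range(8)]
weights = edge_weight_map(mask)
uncertain = [[0.5] * 8 for _ in range(8)]  # a maximally unsure prediction
loss = weighted_topk_bce(uncertain, mask, weights)
```

In this toy case the four columns around the boundary form the edge band, so the hardest quarter of the pixels all come from the edge and dominate the loss.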
5. Post-Processing Optimization
Three post-processing steps address aliasing artifacts that appear when masks are rendered at different resolutions, along with temporal instability: (1) edge optimization using a 3x3 Gaussian blur fused into the network, plus a curve transformation that narrows the transition band; (2) momentum-based suppression of frame-to-frame jitter with adaptive thresholds; (3) frame skipping in stable scenes to reduce computation.
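The article does not give the exact jitter-suppression rule, so the following is only one plausible reading of "momentum with adaptive thresholds". Small per-pixel changes are blended with the previous output, large changes are accepted immediately, and the threshold shrinks as overall motion grows. The class name `MaskSmoother`, its parameters, and the threshold formula are assumptions made for illustration.

```python
class MaskSmoother:
    """Temporal smoothing of soft masks given as flat lists of values in [0, 1]."""

    def __init__(self, momentum=0.8, base_threshold=0.2):
        self.momentum = momentum
        self.base_threshold = base_threshold
        self.prev = None

    def update(self, mask):
        if self.prev is None:
            # First frame: nothing to smooth against.
            self.prev = list(mask)
            return list(mask)
        diffs = [abs(c - p) for c, p in zip(mask, self.prev)]
        mean_diff = sum(diffs) / len(diffs)
        # Hypothetical adaptive rule: more global motion lowers the threshold,
        # so fewer pixels are smoothed when the scene really changes.
        threshold = self.base_threshold * (1.0 - mean_diff)
        out = []
        for c, p, d in zip(mask, self.prev, diffs):
            if d <= threshold:
                # Small change: treat as jitter and lean on the previous value.
                out.append(self.momentum * p + (1 - self.momentum) * c)
            else:
                # Large change: treat as real motion and follow the new frame.
                out.append(c)
        self.prev = out
        return list(out)


smoother = MaskSmoother()
frame0 = smoother.update([0.0, 1.0, 0.5, 1.0])
frame1 = smoother.update([0.1, 1.0, 0.5, 0.0])
```

In the second frame the first pixel flickers by 0.1 and is pulled back toward its previous value, while the last pixel flips from 1.0 to 0.0 and is accepted unchanged.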
6. Engineering Deployment
The solution was integrated into Alibaba's PixelAI SDK and tested across Android and iOS devices. Key optimizations included: (1) initializing both the CPU and GPU models to hide GPU initialization latency; (2) distributing the CPU or GPU model according to device capability. Inference takes under 15 ms on more than 90% of devices, which enables a seamless bullet-comment passthrough experience.
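One way to read the dual-initialization idea is to serve frames from the CPU model immediately while the GPU model warms up in the background, then switch over once it is ready. The sketch below shows that pattern with Python threads. It does not use the PixelAI SDK or MNN APIs; `HybridSegmenter`, the factory callables, and the string "models" are stand-ins invented for the example.

```python
import threading


class HybridSegmenter:
    """Start on a fast-to-load CPU model, then swap to the GPU model when ready."""

    def __init__(self, cpu_factory, gpu_factory):
        self._active = cpu_factory()  # assumed to be ready almost instantly
        self._lock = threading.Lock()
        self.gpu_ready = threading.Event()
        self._thread = threading.Thread(
            target=self._init_gpu, args=(gpu_factory,), daemon=True
        )
        self._thread.start()

    def _init_gpu(self, gpu_factory):
        try:
            model = gpu_factory()  # slow step, runs off the main thread
        except Exception:
            return  # GPU unavailable: keep serving from the CPU model
        with self._lock:
            self._active = model
        self.gpu_ready.set()

    def segment(self, frame):
        with self._lock:
            model = self._active
        return model(frame)


# Stand-in models: real ones would return a mask for the frame.
release_gpu = threading.Event()


def slow_gpu_factory():
    release_gpu.wait()  # simulates a long GPU initialization
    return lambda frame: "gpu:" + frame


seg = HybridSegmenter(lambda: (lambda frame: "cpu:" + frame), slow_gpu_factory)
first = seg.segment("f0")  # served by the CPU model during GPU warm-up
release_gpu.set()
seg.gpu_ready.wait(timeout=5)
later = seg.segment("f1")  # served by the GPU model after the swap
```

If GPU initialization fails, the segmenter simply stays on the CPU path, which is also a natural fallback for devices classified as CPU-only.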
Youku Technology