AI-Powered Masked Danmaku: Design and Implementation
This article details the design and practical implementation of an AI-driven masked danmaku system that keeps comments from covering the people in a video. It covers the background, technology selection, instance segmentation methods, distributed task scheduling, mask generation, client rendering, performance optimizations, and future directions.
Background
Danmaku (bullet comments) are viewer comments that scroll across the video. They make watching more interactive, but when comments are dense they cover the picture. To keep the fun of danmaku without spoiling the viewing experience, the article proposes a mask‑based solution that lets comments avoid the human regions of the video.
Technical Research and Selection
Video Frame Extraction
Frames are extracted with FFmpeg through PyAV, a Python binding to the FFmpeg libraries that offers flexible decoding and converts frames to common formats such as NumPy arrays.
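A minimal sketch of reading one time window of frames with PyAV, assuming the video stream's timestamps start near 0. The function names `seconds_to_pts` and `iter_frames` are hypothetical; `av.open`, `container.seek(..., stream=...)`, `frame.time` and `frame.to_ndarray(format="rgb24")` are PyAV calls. Seeking lands on a keyframe at or before the target, so the loop skips frames until the window starts.

```python
from fractions import Fraction


def seconds_to_pts(seconds, time_base):
    """Convert seconds to an integer timestamp in the stream's time base."""
    # limit_denominator avoids the long binary expansion of floats such as 0.1
    return int(Fraction(seconds).limit_denominator(1_000_000) / Fraction(time_base))


def iter_frames(url, start_s, end_s):
    """Yield (time_in_seconds, RGB ndarray) for decoded frames in [start_s, end_s).

    Assumes the video stream's timestamps start near 0.
    """
    import av  # PyAV; imported lazily so seconds_to_pts works without it

    with av.open(url) as container:
        stream = container.streams.video[0]
        # seek() moves to the nearest keyframe at or before the target,
        # so frames before start_s are decoded and skipped below
        container.seek(seconds_to_pts(start_s, stream.time_base), stream=stream)
        for frame in container.decode(stream):
            if frame.time is None or frame.time < start_s:
                continue
            if frame.time >= end_s:
                break
            yield frame.time, frame.to_ndarray(format="rgb24")
```

Each yielded array has shape (height, width, 3) and can be passed straight to a Detectron2 predictor.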
Human Region Detection
Instance segmentation (a computer‑vision technique) is employed to identify human regions in each frame.
AI Framework Choice
Between TensorFlow and PyTorch, PyTorch was chosen for its gentler learning curve, ease of use, and rich ecosystem of pre‑trained models.
Instance Segmentation Algorithm
After comparing Mask R‑CNN, YOLACT and BlendMask, the team chose BlendMask for its higher accuracy and its roughly 20% speed advantage over Mask R‑CNN.
Open‑Source Detection Projects
Several open‑source detection libraries were evaluated: Detectron, maskrcnn‑benchmark, Detectron2, MMDetection, SimpleDet and the TensorFlow Object Detection API. Detectron2 was adopted because it is based on PyTorch, maintained by Facebook AI Research (FAIR), and modular, and because BlendMask is already available for it through the AdelaiDet toolbox.
Mask Storage Format
Human‑region masks are stored as SVG vector graphics to retain quality at any scale and keep file size minimal. Multiple SVGs are compressed and packaged for storage in the FastDFS file system.
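The article does not specify the package layout, so the following is a hypothetical binary format for illustration only: a header (magic `MDNK`, version, entry count) followed by one record per frame holding a millisecond timestamp, a length, and the zlib‑compressed SVG. Compressing each SVG separately keeps individual masks addressable by timestamp.

```python
import struct
import zlib

MAGIC = b"MDNK"                   # hypothetical magic number for this sketch
_HEADER = struct.Struct(">4sHI")  # magic, format version, entry count
_ENTRY = struct.Struct(">QI")     # timestamp in ms, compressed payload length


def pack_masks(frames):
    """frames: iterable of (timestamp_ms, svg_text). Returns the package bytes."""
    entries = [(ts, zlib.compress(svg.encode("utf-8"), 9)) for ts, svg in frames]
    entries.sort(key=lambda e: e[0])  # ascending timestamps allow binary search
    out = bytearray(_HEADER.pack(MAGIC, 1, len(entries)))
    for ts, blob in entries:
        out += _ENTRY.pack(ts, len(blob))
        out += blob
    return bytes(out)


def unpack_masks(data):
    """Inverse of pack_masks: returns a list of (timestamp_ms, svg_text)."""
    magic, version, count = _HEADER.unpack_from(data, 0)
    if magic != MAGIC or version != 1:
        raise ValueError("not a mask package")
    offset = _HEADER.size
    result = []
    for _ in range(count):
        ts, length = _ENTRY.unpack_from(data, offset)
        offset += _ENTRY.size
        svg = zlib.decompress(data[offset:offset + length]).decode("utf-8")
        offset += length
        result.append((ts, svg))
    return result
```

SVG text is highly repetitive markup, which is why general‑purpose compression such as zlib pays off here.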
Client Rendering
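The front end applies each mask to the danmaku layer with the CSS3 mask-image property: comments stay visible where the mask is opaque and disappear over the transparent human regions, producing the “mask‑danmaku” effect.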
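A sketch of how a mask could be turned into CSS, assuming the SVG should be opaque everywhere except the person. The helpers `cutout_svg` and `mask_css` are hypothetical, and the person outline is taken as an SVG path string. `fill-rule="evenodd"` punches the person out of a full‑frame rectangle, and the SVG is embedded as a base64 data URI; in the browser the same declarations would be set on the danmaku layer as playback time advances.

```python
import base64


def cutout_svg(width, height, person_path_d):
    """Full-frame opaque rectangle with the person outline punched out.

    fill-rule="evenodd" leaves the region inside both subpaths unfilled, so the
    person area becomes transparent regardless of path direction.
    """
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 {width} {height}" '
        'preserveAspectRatio="none">'
        f'<path fill="#000" fill-rule="evenodd" d="M0 0H{width}V{height}H0Z {person_path_d}"/>'
        "</svg>"
    )


def mask_css(svg_text):
    """CSS declarations applying the SVG as an alpha mask on the danmaku layer."""
    b64 = base64.b64encode(svg_text.encode("utf-8")).decode("ascii")
    image = f'url("data:image/svg+xml;base64,{b64}")'
    # the -webkit- prefixed properties cover older Blink/WebKit browsers
    return (
        f"-webkit-mask-image: {image}; mask-image: {image}; "
        "-webkit-mask-size: 100% 100%; mask-size: 100% 100%;"
    )
```

Stretching the mask to 100% of the layer (with `preserveAspectRatio="none"`) assumes the danmaku layer exactly covers the video frame.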
Mask Generation Design
A distributed task system is built to handle large video volumes efficiently. Videos are split into time‑based segments, each becoming a mask‑generation task processed in parallel across multiple machines.
Task Production
The producer reads the video's duration, divides it into fixed‑length segments (e.g., 0–10 min, 10–20 min), and stores each task's metadata (video URL, start and end time) in a database.
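A minimal sketch of the producer, assuming 10‑minute segments and a hypothetical `mask_task` table (shown with SQLite for self‑containment; the article does not name its database or schema). Half‑open intervals `[start, end)` make consecutive segments meet without gaps or overlaps.

```python
import math
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class MaskTask:
    video_url: str
    start_s: int  # inclusive
    end_s: int    # exclusive


def split_video(video_url, duration_s, segment_s=600):
    """Cut [0, duration) into half-open segments of at most segment_s seconds."""
    total = math.ceil(duration_s)
    if total <= 0 or segment_s <= 0:
        raise ValueError("duration and segment length must be positive")
    return [
        MaskTask(video_url, start, min(start + segment_s, total))
        for start in range(0, total, segment_s)
    ]


def save_tasks(conn, tasks):
    """Persist tasks in a hypothetical mask_task table, all starting as 'pending'."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS mask_task ("
        "id INTEGER PRIMARY KEY, video_url TEXT NOT NULL, "
        "start_s INTEGER NOT NULL, end_s INTEGER NOT NULL, "
        "status TEXT NOT NULL DEFAULT 'pending')"
    )
    conn.executemany(
        "INSERT INTO mask_task (video_url, start_s, end_s) VALUES (?, ?, ?)",
        [(t.video_url, t.start_s, t.end_s) for t in tasks],
    )
    conn.commit()
```

A 25‑minute video therefore yields three tasks: 0–600 s, 600–1200 s and 1200–1500 s.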
Task Scheduling
The scheduler dispatches tasks, monitors execution, recovers stalled tasks, and sends SMS alerts when anomalies occur.
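The stall‑recovery logic can be sketched as an in‑memory state machine, assuming consumers send periodic heartbeats. The `Scheduler` class, its timeout and retry limits are hypothetical, and the `alert` callable stands in for the SMS gateway. A running task that stops heartbeating is requeued; after `max_attempts` it is marked failed and an alert is raised.

```python
import time
from dataclasses import dataclass


@dataclass
class TaskState:
    task_id: int
    status: str = "pending"  # pending | running | done | failed
    attempts: int = 0
    last_heartbeat: float = 0.0


class Scheduler:
    """In-memory sketch; a real deployment would keep this state in the task database."""

    def __init__(self, tasks, stall_timeout_s=300, max_attempts=3,
                 alert=print, clock=time.monotonic):
        self.tasks = {t.task_id: t for t in tasks}
        self.stall_timeout_s = stall_timeout_s
        self.max_attempts = max_attempts
        self.alert = alert  # stand-in for the SMS alert gateway
        self.clock = clock

    def dispatch(self):
        """Hand the next pending task to a consumer, or return None."""
        for t in self.tasks.values():
            if t.status == "pending":
                t.status = "running"
                t.attempts += 1
                t.last_heartbeat = self.clock()
                return t.task_id
        return None

    def heartbeat(self, task_id):
        self.tasks[task_id].last_heartbeat = self.clock()

    def complete(self, task_id):
        self.tasks[task_id].status = "done"

    def recover_stalled(self):
        """Requeue tasks whose consumer went silent; give up and alert after max_attempts."""
        now = self.clock()
        for t in self.tasks.values():
            if t.status != "running" or now - t.last_heartbeat <= self.stall_timeout_s:
                continue
            if t.attempts >= self.max_attempts:
                t.status = "failed"
                self.alert(f"mask task {t.task_id} failed after {t.attempts} attempts")
            else:
                t.status = "pending"
```

Injecting `clock` makes the timeout logic testable without waiting in real time.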
Task Consumption
Consumers retrieve tasks, extract the relevant video segment with FFmpeg, read frames via PyAV, run Detectron2 instance segmentation, filter out low‑confidence or unsuitable masks, generate PNG masks, convert them to SVG with potrace (later replaced by pypotrace), pack the SVGs with their timestamps into a binary file, upload the file to FastDFS, and report its URL back to the scheduler.
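The filtering step can be sketched as follows, assuming a COCO‑trained Detectron2 model, where class index 0 is "person". The function name and both thresholds (`min_score=0.5`, `min_area_ratio=0.01`) are illustrative assumptions, not the article's values. With real output, the inputs would be `inst.pred_classes.tolist()`, `inst.scores.tolist()` and `inst.pred_masks.numpy()` for `inst = outputs["instances"].to("cpu")`.

```python
PERSON_CLASS_ID = 0  # "person" in the COCO label map used by Detectron2's model zoo


def select_person_masks(classes, scores, masks, min_score=0.5, min_area_ratio=0.01):
    """Return indices of confident person instances whose mask is not tiny.

    classes, scores: per-instance sequences; masks: per-instance 2-D boolean grids.
    """
    keep = []
    for i, (cls, score, mask) in enumerate(zip(classes, scores, masks)):
        if cls != PERSON_CLASS_ID or score < min_score:
            continue  # wrong class or low confidence
        total = sum(len(row) for row in mask)
        area = sum(1 for row in mask for px in row if px)
        # drop slivers that would only add noise to the SVG
        if total and area / total >= min_area_ratio:
            keep.append(i)
    return keep
```

Dropping tiny masks keeps the SVGs small and avoids comments flickering around detection noise.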
Optimization
CUDA Memory Management
PyTorch’s caching allocator keeps freed GPU memory reserved for reuse instead of returning it to the driver. Calling torch.cuda.empty_cache() after every ~100 images releases these cached, unused blocks, which reduced GPU memory usage from ~15 GB to ~900 MB.
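A minimal sketch of the periodic release, using a hypothetical `run_inference` helper. The `release` hook defaults to `torch.cuda.empty_cache` and can be swapped for a stub when testing. Note that `empty_cache()` only returns blocks no tensor still references, so `predict` should hand back CPU data rather than keep GPU tensors alive.

```python
def run_inference(frames, predict, release_every=100, release=None):
    """Run predict() on each frame, releasing cached GPU memory every N frames."""
    if release is None:
        import torch  # only needed when no custom release hook is supplied
        release = torch.cuda.empty_cache
    results = []
    for i, frame in enumerate(frames, start=1):
        # predict is expected to return CPU-side data (e.g. .cpu().numpy())
        results.append(predict(frame))
        if i % release_every == 0:
            release()
    return results
```

Releasing every N images rather than after every image avoids paying the allocator's re‑allocation cost on each frame.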
Prediction Speed
Three changes together cut per‑frame inference time from over 200 ms to about 35 ms: switching from Mask R‑CNN to BlendMask, resizing input images to at most 320 px, and upgrading the GPUs from NVIDIA K80 to V100.
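The resize step can be sketched as an aspect‑preserving fit, assuming "≤320 px" refers to the longer side (the article does not say which side). `fit_within` is a hypothetical helper. Because the mask is later stored as SVG, the small inference resolution can be scaled back up without quality loss. Detectron2 also exposes its own test‑time resize through the `INPUT.MIN_SIZE_TEST` and `INPUT.MAX_SIZE_TEST` config keys.

```python
def fit_within(width, height, max_side=320):
    """Scale (width, height) so the longer side is at most max_side, keeping aspect ratio."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough; never upscale
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For a 1920×1080 frame this gives 320×180, roughly 36 times fewer pixels to process.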
PNG Generation
By using only the pred_masks field from Detectron2 results and skipping unnecessary visualisation steps, PNG creation time dropped from >130 ms to ~1 ms.
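To illustrate how little work is needed once only `pred_masks` is used, here is a dependency‑free sketch that merges per‑instance boolean masks into one 8‑bit grayscale PNG. `masks_to_png` is hypothetical. In production, `pred_masks.any(dim=0)` plus an image library would be the idiomatic route; the sketch just shows that no drawing or visualisation is involved.

```python
import struct
import zlib


def _png_chunk(tag, data):
    # length, type, data, CRC over type + data (PNG spec chunk layout)
    crc = zlib.crc32(tag + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)


def masks_to_png(masks, height, width):
    """Merge boolean instance masks (each height x width, nested sequences)
    into one 8-bit grayscale PNG: 255 where any instance covers the pixel, else 0."""
    raw = bytearray()
    for y in range(height):
        raw.append(0)  # filter type 0 (None) at the start of each scanline
        for x in range(width):
            raw.append(255 if any(m[y][x] for m in masks) else 0)
    # width, height, bit depth 8, color type 0 (grayscale), compression, filter, interlace
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    return (b"\x89PNG\r\n\x1a\n"
            + _png_chunk(b"IHDR", ihdr)
            + _png_chunk(b"IDAT", zlib.compress(bytes(raw)))
            + _png_chunk(b"IEND", b""))
```

Passing an empty mask list yields an all‑black image of the requested size, which is useful for frames without people.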
SVG Conversion
Replacing the original potrace pipeline with a custom pypotrace implementation reduced SVG conversion from ~80 ms per image to ~1 ms.
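Tracing in‑process avoids round trips through the potrace command line. The sketch below serialises a traced path into an SVG `d` string. It assumes the object layout described in the pypotrace README (`potrace.Bitmap(mask).trace()` returns curves with `start_point`, and segments with `is_corner`, `c`, `c1`, `c2`, `end_point`); verify these names against the installed version. `potrace_path_to_svg_d` itself is hypothetical.

```python
def potrace_path_to_svg_d(path):
    """Serialise a pypotrace trace result into an SVG path 'd' string."""
    parts = []
    for curve in path:
        x, y = curve.start_point
        parts.append(f"M{x:.1f},{y:.1f}")
        for seg in curve:
            ex, ey = seg.end_point
            if seg.is_corner:
                # a corner segment is two straight lines through the corner point c
                cx, cy = seg.c
                parts.append(f"L{cx:.1f},{cy:.1f}L{ex:.1f},{ey:.1f}")
            else:
                # otherwise a cubic Bezier with control points c1 and c2
                (c1x, c1y), (c2x, c2y) = seg.c1, seg.c2
                parts.append(f"C{c1x:.1f},{c1y:.1f} {c2x:.1f},{c2y:.1f} {ex:.1f},{ey:.1f}")
        parts.append("Z")
    return "".join(parts)
```

Rounding coordinates to one decimal place keeps the SVG text short at the ≤320 px working resolution.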
m3u8 Seek Issue
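In HLS, an EXT‑X‑DISCONTINUITY tag signals that the following segment may use a different timestamp sequence, so PTS values can jump or restart. The fix tracks the last PTS before each discontinuity and offsets the timestamps that follow it, which keeps seeking reliable in long HLS streams.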
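A minimal sketch of the timestamp fix, assuming PTS values are integers in the 90 kHz MPEG‑TS clock and a default gap of one frame at 25 fps (3600 ticks). Both helpers are hypothetical. `discontinuity_flags` marks which media segments follow the tag, and `continuous_timeline` re‑anchors each post‑discontinuity segment right after the last PTS seen before it.

```python
def discontinuity_flags(m3u8_text):
    """Return one bool per media segment: True if EXT-X-DISCONTINUITY precedes it."""
    flags, pending = [], False
    for line in m3u8_text.splitlines():
        line = line.strip()
        if line == "#EXT-X-DISCONTINUITY":
            pending = True
        elif line and not line.startswith("#"):  # a segment URI line
            flags.append(pending)
            pending = False
    return flags


def continuous_timeline(segments, default_step=3600):
    """Map per-segment PTS lists onto one increasing timeline.

    segments: (discontinuity_before, [pts, ...]) pairs in playlist order, with PTS
    as integers in a 90 kHz clock. default_step (one frame at 25 fps) is an
    assumed gap used when no frame duration has been observed yet.
    """
    timeline = []
    offset = 0
    last = None  # last adjusted PTS emitted so far
    step = default_step
    for discontinuity, pts_list in segments:
        if discontinuity and pts_list and last is not None:
            # timestamps restart here: shift so the first new frame follows the last one
            offset = last + step - pts_list[0]
        for pts in pts_list:
            t = pts + offset
            if last is not None and t > last:
                step = t - last  # remember the most recent frame duration
            timeline.append(t)
            last = t
    return timeline
```

With this mapping, a seek target on the continuous timeline maps to exactly one segment, even when an inserted clip restarts its timestamps.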
Conclusion
The article presents an end‑to‑end AI‑driven masked danmaku solution, covering background, technology evaluation, system architecture, mask generation workflow, client rendering, and multiple performance optimizations. The authors hope the experience and lessons learned benefit others building similar video‑overlay systems.
HomeTech
HomeTech tech sharing