Mask Detection System and Visual AI Competition Achievements

Didi’s COVID‑19 mask‑detection system, built on a DFS‑based face detector and an attention‑enhanced ResNet‑50 mask classifier achieving over 99.5 % accuracy, has been deployed in vehicles, open‑sourced, and complemented by top‑ranked results in international visual AI contests, including first place in driver‑gaze prediction and podium finishes in emotion recognition and model‑compression challenges.

Didi Tech

At the beginning of 2020, when the COVID‑19 pandemic broke out, Didi drew on its years of computer‑vision research to develop a mask‑recognition system for epidemic prevention. This article introduces the system’s architecture, underlying principles, and methods, and also reports the image‑technology team’s progress in international computer‑vision competitions.

Mask recognition technology – The solution consists of two modules: a face‑detection module based on DFS (Detection with Feature Fusion and Segmentation Supervision) and a mask‑attribute recognition module. The face‑detection module accurately localizes faces, while the mask module uses attention‑enhanced classification to decide whether a detected face is wearing a mask. The system achieves image‑level accuracy above 99.5 % under diverse lighting, occlusion, pose, scale, and mask‑type variations.
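The two‑module design amounts to a detect‑then‑classify loop. The following is a minimal sketch, not Didi's code: `detect_faces` and `classify_mask` are hypothetical callables standing in for the DFS detector and the ResNet‑50 mask classifier, boxes are assumed to be `(x1, y1, x2, y2)` pixel coordinates, and the 0.5 decision threshold is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

import numpy as np

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels (assumed convention)


@dataclass
class FaceResult:
    box: Box
    mask_prob: float
    wearing_mask: bool


def detect_masks(
    image: np.ndarray,
    detect_faces: Callable[[np.ndarray], List[Box]],  # hypothetical face-detector stand-in
    classify_mask: Callable[[np.ndarray], float],  # hypothetical mask-classifier stand-in
    threshold: float = 0.5,  # assumed decision threshold
) -> List[FaceResult]:
    """Stage 1 localizes faces; stage 2 scores each face crop for a mask."""
    results = []
    for x1, y1, x2, y2 in detect_faces(image):
        crop = image[y1:y2, x1:x2]  # rows are y, columns are x
        p = float(classify_mask(crop))  # probability that this face wears a mask
        results.append(FaceResult((x1, y1, x2, y2), p, p >= threshold))
    return results


# Usage with toy stubs: a bright square plays the role of a masked face.
img = np.zeros((100, 100))
img[10:30, 10:30] = 1.0
faces = detect_masks(
    img,
    detect_faces=lambda im: [(10, 10, 30, 30), (50, 50, 70, 70)],
    classify_mask=lambda crop: crop.mean(),
)
print([f.wearing_mask for f in faces])  # [True, False]
```

Passing the two models in as callables keeps the glue logic testable with stubs before any trained network is plugged in.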

The face‑detection component builds on a feature‑fusion pyramid and segmentation supervision. High‑level semantic features are fused with low‑level detail features through spatial and channel attention, which keeps the high‑level semantics from overwhelming fine details. The segmentation branch is weakly supervised with labels derived from the annotated face boxes, so no extra pixel‑level annotation is required, and it shares scale information with the detection branch during training. Because the segmentation branch is removed at inference, it adds no parameters to the deployed detector.
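A rough NumPy illustration of the two ideas above, under stated assumptions: `boxes_to_seg_target` shows one common way to turn box annotations into a weak segmentation label (every pixel inside a face box counts as foreground), and `attention_fuse` gates upsampled high‑level features with a channel gate and a spatial gate before adding them to the low‑level map. Both functions, the nearest‑neighbour upsampling, and the weight matrix `w_channel` are hypothetical; the exact DFS formulation may differ.

```python
import numpy as np


def boxes_to_seg_target(boxes, height, width):
    """Weak segmentation label (assumed scheme): pixels inside any face box are 1."""
    target = np.zeros((height, width), dtype=np.float32)
    for x1, y1, x2, y2 in boxes:
        target[y1:y2, x1:x2] = 1.0
    return target


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def attention_fuse(low, high, w_channel):
    """Add attention-gated high-level features to a low-level feature map.

    low:       (C, H, W) detail features
    high:      (C, H // 2, W // 2) semantic features from the next pyramid level
    w_channel: (C, C) hypothetical learned weights for the channel gate
    """
    # Nearest-neighbour 2x upsampling puts both maps on the same grid.
    high_up = high.repeat(2, axis=1).repeat(2, axis=2)
    # Channel gate: global average pool -> linear -> sigmoid, one value per channel.
    channel_gate = sigmoid(w_channel @ high_up.mean(axis=(1, 2)))
    # Spatial gate: channel mean of the detail map -> sigmoid, one value per pixel.
    spatial_gate = sigmoid(low.mean(axis=0, keepdims=True))
    # The gated semantic term refines the detail map instead of replacing it.
    return low + high_up * channel_gate[:, None, None] * spatial_gate


rng = np.random.default_rng(0)
low = rng.standard_normal((4, 8, 8))
high = rng.standard_normal((4, 4, 4))
fused = attention_fuse(low, high, np.zeros((4, 4)))
print(fused.shape)  # (4, 8, 8)
print(boxes_to_seg_target([(1, 1, 3, 3)], 4, 4).sum())  # 4.0
```

Because the semantic term is multiplied by gates in (0, 1) and then added, it can only refine the detail features, which is one way to read "preventing high‑level semantics from overwhelming fine details".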

Evaluation on the WIDER FACE benchmark shows that DFS reaches AP scores of 96.9 % (Easy), 95.9 % (Medium), and 91.2 % (Hard) on the validation set, and similar scores on the test set, ranking first in most metrics.

The mask‑attribute classifier is built on a modified ResNet‑50 backbone with an attention layer after block 3. The attention layer multiplies feature maps with learned weights to emphasize mask regions and suppress background, improving discrimination on difficult samples. A 20 % expansion of the detected face bounding box is applied to compensate for localization errors and to include mask‑related regions such as ear loops.
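Both mechanisms in this paragraph can be stated concretely. The sketch below assumes the 20 % expansion means the box's width and height each grow by 20 % about its centre (10 % per side), clipped to the image; the article does not spell out this convention. `apply_attention` shows element‑wise reweighting of a feature map by a sigmoid attention map, one plausible reading of "multiplies feature maps with learned weights"; in a real model, `attn_logits` would come from a small learned layer. Both function names are hypothetical.

```python
import numpy as np


def expand_box(box, img_w, img_h, ratio=0.2):
    """Grow (x1, y1, x2, y2) about its centre, then clip to the image.

    Assumes `ratio` is the total growth in width and height (0.2 -> 10 % per side).
    """
    x1, y1, x2, y2 = box
    dw = (x2 - x1) * ratio / 2
    dh = (y2 - y1) * ratio / 2
    return (
        max(0, int(round(x1 - dw))),
        max(0, int(round(y1 - dh))),
        min(img_w, int(round(x2 + dw))),
        min(img_h, int(round(y2 + dh))),
    )


def apply_attention(features, attn_logits):
    """Reweight a (C, H, W) feature map with a sigmoid (H, W) attention map.

    Weights near 1 keep a location (e.g. the mask region); weights near 0 suppress it.
    """
    weights = 1.0 / (1.0 + np.exp(-attn_logits))
    return features * weights[None, :, :]


print(expand_box((100, 100, 200, 200), 640, 480))  # (90, 90, 210, 210)
print(expand_box((0, 0, 50, 50), 40, 40))  # (0, 0, 40, 40)
```

Clipping matters for faces near the frame edge, which are common in in‑vehicle camera views.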

Since January 2020, the system has been deployed in Didi’s pre‑trip quality inspection and in in‑vehicle devices, where it automatically checks whether drivers are wearing masks. The solution was published in *Scientia Sinica Informationis* and open‑sourced on GitHub in February 2020.

Visual competition results – The team took part in several international challenges. In the EmotiW 2020 challenge held with ACM ICMI 2020, they placed 1st in Driver Gaze Prediction and 3rd in Group Emotion Recognition. The driver‑gaze task requires classifying each sample into one of nine gaze zones; the team’s model combined appearance features and geometric landmark features through multi‑modal fusion, reaching an accuracy of 82.53 %.
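The article does not say how the modalities were combined. One common option is late fusion: each branch produces logits over the nine gaze zones, the logits are converted to probabilities, and a weighted average picks the zone. The sketch below assumes that scheme; `fuse_predictions` and the branch weights are hypothetical.

```python
import numpy as np

NUM_ZONES = 9  # the challenge defines nine gaze zones


def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def fuse_predictions(logits_by_modality, weights):
    """Late fusion: weighted mean of per-branch probabilities, then argmax per sample."""
    probs = sum(w * softmax(logits) for logits, w in zip(logits_by_modality, weights))
    probs = probs / sum(weights)
    return probs.argmax(axis=-1), probs


# One sample: the appearance branch favours zone 2, the landmark branch zone 5.
appearance = np.zeros((1, NUM_ZONES))
appearance[0, 2] = 5.0
landmarks = np.zeros((1, NUM_ZONES))
landmarks[0, 5] = 5.0
zone, probs = fuse_predictions([appearance, landmarks], weights=[0.7, 0.3])
print(zone)  # [2]
```

Averaging probabilities rather than raw logits keeps each branch's contribution on the same scale, so the weights are easy to tune on a validation split.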

For the Group Emotion Recognition track, they fused video‑level, facial‑expression, and body‑keypoint features, achieving a final score of 70.77 % and ranking 3rd globally.

In the ICME 2020 Embedded Deep Learning Object Detection Model Compression Competition for Traffic in Asian Countries, the team placed 3rd overall. They addressed challenges such as low‑quality night images, severe class imbalance, and inconsistent annotations with data augmentation, online hard example mining, and fusion of anchor‑based and anchor‑free detectors. In the final stage they used knowledge distillation and quantization to compress the model, achieving the best scores for model size, computational complexity, and inference speed on the NVIDIA Jetson TX2.
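Neither the distillation objective nor the quantization scheme is specified above. As a hedged illustration, the sketch below uses two textbook techniques: Hinton‑style soft‑target distillation (KL divergence between teacher and student softmax outputs at temperature `t`, scaled by `t**2`) and symmetric per‑tensor int8 weight quantization. Distilling a detector usually also involves box‑regression and feature‑level terms, which are omitted here; all function names are hypothetical.

```python
import numpy as np


def softmax(x, t=1.0):
    z = x / t
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def distillation_loss(student_logits, teacher_logits, t=4.0):
    """Soft-target KD: KL(teacher || student) at temperature t, scaled by t**2."""
    p_t = softmax(teacher_logits, t)
    log_p_t = np.log(p_t + 1e-12)
    log_p_s = np.log(softmax(student_logits, t) + 1e-12)
    kl = (p_t * (log_p_t - log_p_s)).sum(axis=-1)
    return float(kl.mean() * t * t)


def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: w is approximated by scale * q."""
    max_abs = float(np.abs(w).max())
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q, scale):
    return q.astype(np.float64) * scale


teacher = np.array([[2.0, 0.5, -1.0]])
print(distillation_loss(teacher, teacher))  # 0.0 when the student matches the teacher
w = np.array([-1.0, 0.5, 1.0])
q, scale = quantize_int8(w)
print(q, dequantize(q, scale))
```

In training, the distillation term is typically added to the student's ordinary task loss with a weighting factor; int8 storage cuts weight size to a quarter of float32, with rounding error of at most `scale / 2` per weight.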

These research and competition activities have strengthened Didi’s cloud‑ and edge‑side visual perception capabilities, providing a solid foundation for future anti‑epidemic projects and other transportation‑related AI services.
