Deep Learning-based Object Detection Algorithm Review (Part 2): Solutions and Network Improvements
The article reviews deep learning object detection solutions: small object detection via FPN and TDM, irregular shapes via deformable convolution, sample imbalance via focal loss and cascade methods, occlusion handling with Soft‑NMS and RRC, large‑batch training using MegDet, relationship modeling with Relation Networks, and network improvements such as DetNet, RefineDet, Pelee, and RFBNet.
This article is the second part of a comprehensive survey on deep learning-based object detection algorithms, focusing on solutions to common detection problems and network structure improvements.
Small Object Detection:
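The Feature Pyramid Networks (FPN) paper introduces a top-down pathway with lateral connections to improve small object detection: high-level, semantically strong features are upsampled and merged with low-level, high-resolution features by element-wise addition, and predictions are made at every level of the resulting pyramid. The Beyond Skip Connections paper proposes Top-Down Modulation (TDM), which also pairs a top-down pathway with lateral connections but fuses features differently from FPN, using convolution-based feature fusion.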
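The top-down merge can be illustrated on a single-channel toy example. The following is a minimal sketch, not the paper's implementation: it assumes 2x nearest-neighbour upsampling and element-wise addition, and it leaves out the 1x1 lateral convolution and the 3x3 smoothing convolution. The function names `upsample_nearest_2x` and `fpn_merge` are hypothetical.

```python
def upsample_nearest_2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2D map given as a list of rows."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each column twice
        out.append(wide)
        out.append(list(wide))                     # repeat each row twice
    return out


def fpn_merge(coarse, lateral):
    """Fuse a coarse high-level map with a finer lateral map by addition."""
    up = upsample_nearest_2x(coarse)
    if len(up) != len(lateral) or len(up[0]) != len(lateral[0]):
        raise ValueError("lateral map must be exactly 2x the coarse map")
    return [[u + l for u, l in zip(up_row, lat_row)]
            for up_row, lat_row in zip(up, lateral)]


# Toy data: a 2x2 high-level map and a 4x4 low-level map.
coarse = [[1.0, 2.0],
          [3.0, 4.0]]
lateral = [[0.1] * 4 for _ in range(4)]
merged = fpn_merge(coarse, lateral)
```

Each 2x2 block of `merged` carries the semantic value of one coarse cell plus the local detail of the finer map, which is the property that helps small objects.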
Irregular Shape Object Detection:
Deformable Convolutional Networks address the limitation of fixed geometric structures in CNNs by introducing deformable convolution and deformable RoI pooling, enabling the network to adapt to spatial geometric transformations.
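To make the idea of a deformable sampling grid concrete, here is a minimal single-channel sketch that evaluates a 3x3 deformable convolution at one location. It assumes the offsets are already given (in the real network they are predicted by an extra convolution layer) and that fractional positions are read with bilinear interpolation. The names `bilinear` and `deform_conv_at` are hypothetical.

```python
import math


def bilinear(fmap, y, x):
    """Sample fmap at fractional (y, x); positions outside the map count as 0."""
    h, w = len(fmap), len(fmap[0])
    y0, x0 = math.floor(y), math.floor(x)
    total = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                weight = (1 - abs(y - yy)) * (1 - abs(x - xx))
                total += weight * fmap[yy][xx]
    return total


def deform_conv_at(fmap, kernel, offsets, cy, cx):
    """3x3 deformable convolution output at centre (cy, cx).

    offsets[i][j] is a (dy, dx) pair added to the regular grid position.
    """
    out = 0.0
    for i in range(3):
        for j in range(3):
            dy, dx = offsets[i][j]
            out += kernel[i][j] * bilinear(fmap, cy + i - 1 + dy, cx + j - 1 + dx)
    return out


fmap = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
kernel = [[1.0] * 3 for _ in range(3)]
zero_offsets = [[(0.0, 0.0)] * 3 for _ in range(3)]
shifted_offsets = [[(0.0, 0.5)] * 3 for _ in range(3)]

regular = deform_conv_at(fmap, kernel, zero_offsets, 1, 1)
shifted = deform_conv_at(fmap, kernel, shifted_offsets, 1, 1)
```

With all offsets set to zero the result equals an ordinary 3x3 convolution, so the deformable layer strictly generalises the fixed grid.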
Sample Imbalance Solutions:
Focal Loss addresses the extreme imbalance between positive and negative samples in one-stage detectors by multiplying the cross-entropy loss by a modulating factor, (1 - p_t)^γ, which shrinks the loss of easy, well-classified samples (mostly easy negatives) so that training concentrates on hard ones. The Chained Cascade Network uses a cascade approach to filter out background regions progressively. RON introduces an objectness prior map to guide training and prediction.
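The effect of the modulating factor is easy to see numerically. The sketch below computes the focal loss for a single binary prediction, FL(p_t) = -α_t (1 - p_t)^γ log(p_t). The defaults γ = 2 and α = 0.25 follow the values reported as working well in the Focal Loss paper; the function name and the clamping constant are assumptions made for this example.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Focal loss for one binary prediction.

    p: predicted probability of the positive class.
    y: ground-truth label, 1 (object) or 0 (background).
    """
    p_t = p if y == 1 else 1.0 - p              # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    p_t = min(max(p_t, 1e-12), 1.0)             # avoid log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy_negative = focal_loss(0.01, 0)  # background, confidently rejected
hard_negative = focal_loss(0.90, 0)  # background, wrongly scored as object
plain_ce = focal_loss(0.30, 1, gamma=0.0, alpha=1.0)  # reduces to cross-entropy
```

Setting `gamma=0` removes the modulating factor, which gives a simple way to compare against plain cross-entropy.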
Occluded Object Detection:
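Soft-NMS improves upon traditional NMS: instead of discarding every box whose IoU with a higher-scoring box exceeds a threshold, it decays that box's score as a function of the overlap, so a heavily occluded object next to a detected one can still survive. The Recurrent Rolling Convolution (RRC) network addresses detection of small and occluded objects through iterative feature aggregation.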
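A minimal sketch of the Gaussian variant of Soft-NMS follows, where each remaining score is multiplied by exp(-IoU² / σ) after a box is selected. Boxes are assumed to be `(x1, y1, x2, y2)` tuples; the function names, the default `sigma`, and the default `score_thresh` are illustrative choices rather than values taken from the article.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS; returns (index, decayed score) pairs in selection order."""
    remaining = [[i, s] for i, s in enumerate(scores)]
    kept = []
    while remaining:
        best = max(remaining, key=lambda item: item[1])
        remaining.remove(best)
        kept.append((best[0], best[1]))
        survivors = []
        for idx, score in remaining:
            # Decay instead of delete: heavier overlap means a stronger penalty.
            score *= math.exp(-iou(boxes[best[0]], boxes[idx]) ** 2 / sigma)
            if score >= score_thresh:
                survivors.append([idx, score])
        remaining = survivors
    return kept


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
result = soft_nms(boxes, scores)
```

In this toy input the second box overlaps the first heavily, so it is kept with a reduced score and ranked below the isolated third box, whereas hard NMS with a typical threshold would have dropped it.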
Large Mini-batch Training:
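MegDet addresses large mini-batch training for object detection. It uses a variance-equivalence argument to justify scaling the learning rate with the batch size, adds a warmup strategy to keep early training stable, and applies cross-GPU batch normalization so that statistics are computed over enough samples. Together these reduce training time from 33 hours to 4 hours.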
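The interaction between learning-rate scaling and warmup can be sketched as a small schedule function. This is an illustration under stated assumptions, not MegDet's actual configuration: the base learning rate, base batch size, warmup length, and starting factor below are placeholder values, and the function name is hypothetical.

```python
def learning_rate(step, batch_size, base_lr=0.02, base_batch=16,
                  warmup_steps=500, warmup_start_factor=0.1):
    """Linearly scaled learning rate with a linear warmup phase."""
    # Linear scaling: k times the batch size gives k times the learning rate.
    target = base_lr * batch_size / base_batch
    if step >= warmup_steps:
        return target
    # Warmup: ramp from a small fraction of the target up to the target.
    start = target * warmup_start_factor
    return start + (target - start) * step / warmup_steps


schedule = [learning_rate(s, batch_size=256) for s in (0, 250, 500, 1000)]
```

Starting at a small fraction of the target rate avoids the divergence that a large learning rate tends to cause in the first iterations.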
Object Relationship Modeling:
Relation Networks introduce a relation module that models relationships between detected objects through their appearance and geometry. The module can be applied both in the object recognition stage and in duplicate removal, where it replaces NMS, which makes the detector end-to-end trainable.
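The core of a relation module is an attention-style weighted sum over all other objects. The sketch below is heavily simplified and only meant to show the shape of the computation: it assumes the geometry weights are precomputed and non-negative, and it omits the learned projection matrices and multiple relation heads used in the paper. The name `relation_feature` and its arguments are hypothetical.

```python
import math


def relation_feature(app_feats, geo_weights, n):
    """Relation feature for object n as a weighted sum over all objects.

    app_feats: list of appearance vectors, one per object.
    geo_weights[m][n]: non-negative geometry weight between objects m and n
    (hypothetical, assumed to be computed elsewhere from box coordinates).
    """
    d = len(app_feats[n])
    # Appearance weight: scaled dot product between object m and object n.
    logits = [sum(a * b for a, b in zip(app_feats[m], app_feats[n])) / math.sqrt(d)
              for m in range(len(app_feats))]
    top = max(logits)  # subtract the max for numerical stability
    raw = [geo_weights[m][n] * math.exp(logit - top)
           for m, logit in enumerate(logits)]
    total = sum(raw)
    if total == 0.0:
        return [0.0] * d  # no object is geometrically related to n
    weights = [r / total for r in raw]
    return [sum(w * feat[i] for w, feat in zip(weights, app_feats))
            for i in range(d)]


feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
all_related = [[1.0] * 3 for _ in range(3)]
only_self = [[1.0 if m == n else 0.0 for n in range(3)] for m in range(3)]

mixed = relation_feature(feats, all_related, 0)
isolated = relation_feature(feats, only_self, 0)
```

When the geometry weights allow only self-relation, the output collapses to the object's own feature, which shows how geometry gates the appearance-based attention.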
Network Structure Improvements:
DetNet redesigns the backbone network specifically for detection tasks, maintaining higher resolution in deep layers while keeping computational cost in check. RefineDet combines one-stage and two-stage approaches through anchor refinement. Pelee targets real-time detection on mobile devices. RFBNet uses Receptive Field Blocks with dilated convolutions to enlarge the receptive field without adding parameters.
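The claim that dilation enlarges the receptive field at no parameter cost follows from simple arithmetic: a k x k kernel with dilation d spans k + (k - 1)(d - 1) positions per side while still holding k x k weights. A minimal sketch, with hypothetical helper names and illustrative channel counts:

```python
def effective_kernel(k, dilation):
    """Side length covered by a k x k kernel with the given dilation."""
    return k + (k - 1) * (dilation - 1)


def conv_weights(k, c_in, c_out):
    """Number of weights in a k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


# A 3x3 kernel at dilation 1, 3 and 5; the weight count never changes.
spans = [effective_kernel(3, d) for d in (1, 3, 5)]
params = conv_weights(3, c_in=64, c_out=64)
```

The spans grow from 3 to 7 to 11 while `params` stays fixed, because dilation only changes where the kernel reads, not how many weights it has.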
Meitu Technology
Curating Meitu's technical expertise, valuable case studies, and innovation insights. We deliver quality technical content to foster knowledge sharing between Meitu's tech team and outstanding developers worldwide.