
How BCNet Tackles Occlusion in Instance Segmentation with a Dual‑Layer GCN

The article introduces BCNet, a lightweight dual‑layer instance segmentation network that models images as overlapping occluder and occludee layers, enabling effective handling of heavy object occlusion and achieving significant performance gains on COCO, COCOA and KINS datasets compared to existing methods.

Kuaishou Audio & Video Technology

Abstract

Object occlusion is common in daily life and severely degrades the performance of existing detection and segmentation algorithms. This work models an image as two overlapping layers to introduce explicit occluder‑occludee relationships and proposes a lightweight instance segmentation algorithm that effectively handles occlusion, achieving large performance improvements on COCO and KINS datasets.

Background

Instance segmentation combines object detection and semantic segmentation and is fundamental for video editing, video conferencing, medical imaging, autonomous driving, and other applications.

Problem

Methods such as Mask R‑CNN follow a detect‑then‑segment pipeline. When objects overlap heavily, however, the true object boundaries become confused, especially when the occluder and occludee belong to the same class or share similar textures, and this leads to large segmentation errors.

Method (BCNet)

BCNet treats each Region of Interest (RoI) as two overlapping layers: the top layer models the occluding object (occluder), while the bottom layer infers the partially hidden object (occludee). The mask head is a cascade of two graph convolutional network (GCN) branches. The first branch applies four 3×3 convolutional layers together with a non‑local graph convolution to model the occluder's shape and appearance. The second branch takes the RoI features fused with the first branch's output and predicts the occludee mask. The non‑local operator lets spatially disjoint regions of an occluded object exchange information while keeping the parameter count low.
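The cascade described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's exact architecture: the function names, the weight shapes, and the simple additive fusion between the two branches are assumptions for clarity (BCNet itself uses 3×3 convolutional layers and learned fusion inside an FCN‑style mask head).

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def non_local_gcn(x, w_theta, w_phi, w_g):
    """Non-local graph convolution: each spatial position aggregates
    features from every other position, weighted by pairwise affinity."""
    # x: (N, C) flattened RoI features, N = H * W positions
    theta, phi, g = x @ w_theta, x @ w_phi, x @ w_g
    # (N, N) affinity matrix acts as a dense graph adjacency
    affinity = softmax(theta @ phi.T / np.sqrt(theta.shape[1]))
    return x + affinity @ g  # residual aggregation of disjoint regions

def bilayer_head(roi_feat, occluder_params, occludee_params):
    """Two cascaded branches: the occluder branch runs first, and the
    occludee branch sees RoI features fused with the occluder output
    (additive fusion is a simplifying assumption here)."""
    occluder_feat = non_local_gcn(roi_feat, *occluder_params)
    occludee_feat = non_local_gcn(roi_feat + occluder_feat, *occludee_params)
    return occluder_feat, occludee_feat

# Toy usage: a 14x14 RoI with 8 feature channels.
rng = np.random.default_rng(0)
N, C = 14 * 14, 8
roi = rng.standard_normal((N, C))
make_params = lambda: tuple(0.1 * rng.standard_normal((C, C)) for _ in range(3))
top, bottom = bilayer_head(roi, make_params(), make_params())
print(top.shape, bottom.shape)  # (196, 8) (196, 8)
```

Because the occludee branch is conditioned on the occluder branch's features, the network can explicitly subtract the occluder's contribution when reasoning about the hidden object's boundary, which is the key idea behind the bilayer decoupling.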

Experiments

BCNet was evaluated on COCO, COCOA, and KINS. It consistently outperforms CenterMask, BlendMask, HTC, and other state‑of‑the‑art methods, especially under heavy occlusion, without increasing model size or inference time. Tables 1‑3 show quantitative gains; Figures 5‑8 provide qualitative comparisons demonstrating more robust predictions and better interpretability.

Significance

The dual‑layer design decouples occluder and occludee boundaries, offering a practical solution for high‑precision instance segmentation in real‑world scenarios such as short‑video editing and autonomous driving.

References

Ke, Lei, Yu-Wing Tai, and Chi-Keung Tang. "Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers." CVPR, 2021.

Lee, Youngwan, and Jongyoul Park. "CenterMask: Real-Time Anchor-Free Instance Segmentation." CVPR, 2020.

Huang, Zhaojin, et al. "Mask Scoring R-CNN." CVPR, 2019.

Qi, Lu, et al. "Amodal Instance Segmentation with KINS Dataset." CVPR, 2019.

Follmann, Patrick, et al. "Learning to See the Invisible: End-to-End Trainable Amodal Instance Segmentation." WACV, 2019.

Tags: Computer Vision, deep learning, instance segmentation, graph convolutional network, occlusion handling