
Image Segmentation for High-Definition Mapping: Evolution and Practices at Gaode Maps

Gaode Maps has advanced its image segmentation from early heuristic region splitting to modern deep‑learning pipelines. Using FCNs, multi‑task networks, Mask R‑CNN, and specialized losses, it achieves centimeter‑level, instance‑aware mapping of roads, signs, and small objects while pursuing lighter, real‑time models.

Amap Tech

1. Introduction

Image segmentation is a fundamental computer vision task: it assigns a label to each pixel so that pixels with the same label share visual characteristics. Research in the field dates back to the 1960s, and deep learning has advanced it dramatically. Early algorithms relied on low‑level features and therefore struggled with abstract semantic targets (e.g., text, animals, pedestrians, vehicles), a limitation known as the “semantic gap”. Because deep networks learn features automatically, they have largely closed this gap and made segmentation based on high‑level semantics practical.

Figure: History of image segmentation development

Gaode Maps holds massive volumes of image and video data captured by vehicles, satellites, and users' phones. Many of its scenarios require semantic understanding of this data, such as detecting text, road surfaces, buildings, bridges, signs, and lane markings.

How does Gaode achieve robust understanding of such complex scenes? This article describes the evolution of image segmentation at Gaode Maps from a simple tool to a powerful engine for highly automated data production.

2. Exploration Phase – Early Attempts

In street‑level data collection, Gaode needed to automatically generate POI (Point of Interest) data such as shops. OCR could read the text, but separating multiple POIs within a single image was difficult, and simple heuristics (e.g., background color) produced many errors. Gaode therefore used the unsupervised gPb‑owt‑ucm algorithm [1], combined with an improved watershed method, to split images into regions, and used Cascade Boosting‑based text detection to isolate the regions that contain text.

3. Growth Phase – Semantic Segmentation in Natural Scenes

In late 2014, Fully Convolutional Networks (FCNs) [2] provided the first end‑to‑end deep learning solution for pixel‑wise classification. Gaode quickly applied FCNs to tasks such as text region segmentation. However, FCNs produced coarse masks, could not separate adjacent instances, raised false alarms, and handled multi‑scale objects poorly. To address text‑line adhesion, where neighboring lines merge into a single mask, a multi‑task network added a “mid‑line” segmentation branch; a Dijkstra‑like algorithm then split adjacent text lines by assigning each pixel to the mid‑line at the shortest distance.
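The mid-line splitting step can be illustrated with a small sketch. It is not Gaode's implementation: the function name `split_text_lines`, the 4-connected grid, and the unit step cost are assumptions made for illustration. The idea shown is a multi-source Dijkstra search that starts from every mid-line pixel and labels each text pixel with the mid-line it reaches first.

```python
import heapq


def split_text_lines(mask, midlines):
    """Assign each text pixel to its nearest mid-line (hypothetical helper).

    mask:     2D list of 0/1, where 1 marks a text pixel (lines may be merged).
    midlines: 2D list of ints, 0 = no mid-line, k > 0 = mid-line of text line k.
    Returns a 2D list of line labels (0 = background or unreachable).
    """
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    dist = [[float("inf")] * w for _ in range(h)]
    heap = []
    # Every mid-line pixel is a source with distance 0.
    for y in range(h):
        for x in range(w):
            if mask[y][x] and midlines[y][x]:
                dist[y][x] = 0
                labels[y][x] = midlines[y][x]
                heapq.heappush(heap, (0, y, x))
    # Multi-source Dijkstra restricted to text pixels (unit step cost assumed).
    while heap:
        d, y, x = heapq.heappop(heap)
        if d > dist[y][x]:
            continue  # stale heap entry
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if (0 <= ny < h and 0 <= nx < w and mask[ny][nx]
                    and d + 1 < dist[ny][nx]):
                dist[ny][nx] = d + 1
                labels[ny][nx] = labels[y][x]
                heapq.heappush(heap, (d + 1, ny, nx))
    return labels
```

Because the search only walks through text pixels, the distance is measured inside the merged mask rather than across background, which is one plausible reading of "shortest distance to the mid-line".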

To suppress false alarms, a parallel R‑CNN branch was added for text detection, providing cues to filter out non‑text regions. A consistency loss was introduced to train the segmentation and detection branches jointly, improving overall accuracy (see the 2017 arXiv paper [3]).
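How detection output might be used to filter segmented regions can be sketched as follows. The article does not describe the actual filtering rule, so everything here is an assumption: the function name `filter_regions`, the region and box formats, and the `min_cover` threshold are hypothetical. A region is kept only if a sufficient fraction of its pixels falls inside some detected text box.

```python
def filter_regions(regions, boxes, min_cover=0.5):
    """Keep segmented regions supported by detection boxes (hypothetical rule).

    regions:   list of regions, each a list of (y, x) pixel coordinates.
    boxes:     list of (x0, y0, x1, y1) detection boxes, inclusive bounds.
    min_cover: assumed minimum fraction of region pixels inside any box.
    """
    kept = []
    for pixels in regions:
        # Count the region's pixels that lie inside at least one box.
        inside = sum(
            1
            for (y, x) in pixels
            if any(x0 <= x <= x1 and y0 <= y <= y1 for (x0, y0, x1, y1) in boxes)
        )
        if pixels and inside / len(pixels) >= min_cover:
            kept.append(pixels)
    return kept
```

For example, with one box covering the top-left corner, a region in that corner is kept while a region far away is discarded as a likely false alarm.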

4. Maturity Phase – Fine‑grained and Instance‑aware Segmentation

The introduction of Mask R‑CNN [4] made instance segmentation more accessible. For shop sign segmentation, traditional rectangular proposals fit poorly because signs are often captured from non‑vertical (oblique) viewpoints; Mask R‑CNN’s combined detection‑segmentation pipeline yields precise masks even for irregular shapes.

High‑definition map production demands centimeter‑level accuracy, meaning segmentation errors must be limited to 1–2 pixels. Edge accuracy and multi‑scale recall become critical. Gaode designed a custom loss that heavily penalizes errors near ground‑truth edges, improving road surface and lane‑line segmentation.
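The article does not give the form of the custom edge loss, so the following is only a sketch of one common way to penalize errors near ground-truth edges. The names `edge_distance` and `edge_weighted_bce`, the Gaussian weighting, and the parameters `alpha` and `sigma` are all assumptions. Each pixel's binary cross-entropy is multiplied by a weight that is largest on the boundary and decays with distance from it.

```python
import math
from collections import deque

NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def edge_distance(gt):
    """City-block distance from each pixel to the nearest label boundary."""
    h, w = len(gt), len(gt[0])
    dist = [[float("inf")] * w for _ in range(h)]
    queue = deque()
    # A pixel is a boundary pixel if any 4-neighbor has a different label.
    for y in range(h):
        for x in range(w):
            for dy, dx in NEIGHBORS:
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and gt[ny][nx] != gt[y][x]:
                    dist[y][x] = 0
                    queue.append((y, x))
                    break
    # Breadth-first search outward from all boundary pixels at once.
    while queue:
        y, x = queue.popleft()
        for dy, dx in NEIGHBORS:
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and dist[ny][nx] == float("inf"):
                dist[ny][nx] = dist[y][x] + 1
                queue.append((ny, nx))
    return dist


def edge_weighted_bce(prob, gt, alpha=4.0, sigma=2.0, eps=1e-7):
    """Weighted mean binary cross-entropy; alpha and sigma are assumed values."""
    dist = edge_distance(gt)
    total, weight_sum = 0.0, 0.0
    for y, row in enumerate(gt):
        for x, target in enumerate(row):
            p = min(max(prob[y][x], eps), 1 - eps)
            ce = -math.log(p) if target == 1 else -math.log(1 - p)
            # Weight is 1 + alpha on the edge and decays toward 1 far from it.
            weight = 1.0 + alpha * math.exp(-dist[y][x] ** 2 / (2 * sigma ** 2))
            total += weight * ce
            weight_sum += weight
    return total / weight_sum
```

Under this weighting, the same misclassification costs more when it sits on a boundary than when it sits a few pixels away, which matches the stated goal of keeping edge errors within 1–2 pixels.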

Objects at extreme scales, both very small and very large (e.g., lamp posts, lane markings), are challenging because a network's receptive field is limited. Architectures such as PSPNet [5], DeepLab [6], and FPN [7] mitigate this by aggregating multi‑scale features. Moreover, the class imbalance caused by varying object scales is addressed by adapting Focal Loss [8], originally proposed for object detection, to segmentation, which focuses training on hard, small‑scale targets.
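As an illustration of applying Focal Loss per pixel, the sketch below computes the binary form FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t) from [8] for every pixel and averages the result. The function name `focal_loss` and the list-based input format are assumptions; the defaults gamma = 2 and alpha = 0.25 follow the values reported in [8], not necessarily those used by Gaode.

```python
import math


def focal_loss(prob, gt, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean per-pixel binary focal loss.

    prob: 2D list of predicted foreground probabilities.
    gt:   2D list of 0/1 ground-truth labels.
    """
    total, count = 0.0, 0
    for p_row, t_row in zip(prob, gt):
        for p, target in zip(p_row, t_row):
            p = min(max(p, eps), 1 - eps)
            # p_t is the probability assigned to the true class.
            p_t = p if target == 1 else 1 - p
            alpha_t = alpha if target == 1 else 1 - alpha
            # (1 - p_t) ** gamma shrinks the loss of well-classified pixels.
            total += -alpha_t * (1 - p_t) ** gamma * math.log(p_t)
            count += 1
    return total / count
```

With gamma = 0 the modulating factor disappears and the loss reduces to alpha-weighted cross-entropy; increasing gamma shifts the training signal toward hard pixels such as those on thin lamp posts or faded lane markings.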

5. Future Outlook

Recent advances such as Mask Scoring R‑CNN [9] and Hybrid Task Cascade [10] further improve segmentation precision. Nevertheless, segmentation remains computationally heavy compared to classification. Lightweight architectures such as ICNet and MobileNet, together with techniques like knowledge distillation, aim to reduce inference cost while preserving accuracy, enabling more real‑time applications.
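To make the knowledge distillation idea concrete, here is a minimal sketch of the standard temperature-softened distillation loss applied per pixel. The article does not describe a specific setup, so the function names, the input format (one logit vector per pixel), and the temperature value are assumptions. A small student network is trained to match the softened class distribution of a larger teacher.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over one pixel's class logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    norm = sum(exps)
    return [e / norm for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """Mean per-pixel KL(teacher || student) on softened outputs.

    Both arguments are lists with one class-logit vector per pixel.
    The temperature of 4.0 is an assumed, illustrative value.
    """
    total = 0.0
    for s_logits, t_logits in zip(student_logits, teacher_logits):
        p_student = softmax(s_logits, temperature)
        p_teacher = softmax(t_logits, temperature)
        total += sum(
            t * math.log(t / s)
            for t, s in zip(p_teacher, p_student)
            if t > 0
        )
    # Scaling by temperature squared keeps gradient magnitudes comparable.
    return temperature ** 2 * total / len(student_logits)
```

In practice this term is usually added to the ordinary segmentation loss on the ground-truth labels, so the student learns from both the labels and the teacher.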

For Gaode Maps, image segmentation is an indispensable foundation across all automated data pipelines. Ongoing work will continue to pursue more accurate and lightweight segmentation solutions for mapping scenarios.

6. References

[1] Arbelaez, P. et al., “Contour detection and hierarchical image segmentation,” IEEE TPAMI, 2010.

[2] Long, J., Shelhamer, E., Darrell, T., “Fully convolutional networks for semantic segmentation,” CVPR, 2015.

[3] Jiang, F., Hao, Z., Liu, X., “Deep scene text detection with connected component proposals,” arXiv:1708.05133, 2017.

[4] He, K. et al., “Mask R‑CNN,” ICCV, 2017.

[5] Zhao, H. et al., “Pyramid scene parsing network,” CVPR, 2017.

[6] Chen, L.-C. et al., “DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs,” IEEE TPAMI, 2017.

[7] Lin, T.-Y. et al., “Feature pyramid networks for object detection,” CVPR, 2017.

[8] Lin, T.-Y. et al., “Focal loss for dense object detection,” ICCV, 2017.

[9] Huang, Z. et al., “Mask scoring R‑CNN,” CVPR, 2019.

[10] Chen, K. et al., “Hybrid task cascade for instance segmentation,” CVPR, 2019.
