
Applying Learning to Rank for Search Suggestion Optimization at Gaode Maps

Gaode Maps applied Learning to Rank to optimize search suggestions, replacing rule-based ranking with a gradient boosted rank model. By tackling sample construction with session-based labeling and feature sparsity with a loss adjustment, the team achieved a seven-point MRR gain and higher coverage, paving the way for personalization and deep learning.

Amap Tech

Introduction: Gaode aims to connect the real world and improve travel experiences. To achieve this, it must handle massive LBS (Location‑Based Service) data and intelligently link users with relevant information. Information retrieval is a core technology, and search suggestions are an indispensable component of the retrieval service.

This article introduces how machine learning, specifically Learning to Rank (LTR), is applied to Gaode's search suggestion (suggest) service, focusing on model optimization experiments that have been validated and yielded significant improvements. These efforts also laid the groundwork for later personalization, deep learning, and vector indexing applications.

Search Suggestion Overview: The suggest service provides real‑time query or POI (Point of Interest) completions as users type, presenting an intelligently ranked list of candidates. It is designed to be fast and lightweight, functioning in effect as a simplified LBS information retrieval system.

The ranking stage computes weighted scores from query‑doc textual relevance and document‑side features such as static POI weight and click statistics. As the number of features grew, rule‑based ranking became hard to maintain, prompting a shift to LTR.
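To make the starting point concrete, here is a minimal sketch of what such rule-based weighted scoring might look like. The feature names and weights are illustrative assumptions, not Gaode's actual formula:

```python
def rule_based_score(text_relevance: float, poi_weight: float, click_rate: float) -> float:
    """Combine query-doc relevance with document-side priors via hand-tuned weights.
    The weights below are illustrative, not Gaode's real values."""
    W_TEXT, W_WEIGHT, W_CLICK = 0.5, 0.2, 0.3
    return W_TEXT * text_relevance + W_WEIGHT * poi_weight + W_CLICK * click_rate

# Hypothetical candidates: high text match vs. popular, frequently clicked POI.
candidates = [
    {"name": "A", "rel": 0.9, "w": 0.4, "ctr": 0.1},
    {"name": "B", "rel": 0.7, "w": 0.8, "ctr": 0.6},
]
ranked = sorted(
    candidates,
    key=lambda c: rule_based_score(c["rel"], c["w"], c["ctr"]),
    reverse=True,
)
```

Every new feature means re-tuning such weights by hand, which is exactly the maintenance burden that motivated the move to a learned model.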

Challenges: Sample construction and model tuning are the two main hurdles. Manual labeling is infeasible given the massive daily traffic and the astronomical number of candidate POIs, while automatic sample construction from click/no‑click pairs faces several issues:

Click‑overfitting: past clicks heavily influence future rankings.

Clicks may not reflect genuine satisfaction.

Only top‑10 suggestions are displayed, leaving many candidates unclicked.

Some users bypass suggestions entirely, making their intent invisible.

These problems lead to ambiguous modeling when click data is sparse, and sparse features are often ignored by the model despite being crucial for long‑tail cases.

System Modeling Details:

To address sample construction, Gaode aggregates multiple server logs (suggest, search, navigation) to build user sessions. Instead of treating each query click in isolation, the entire session is considered, and the final click in a session is propagated to all preceding queries, providing a richer supervision signal.
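A minimal sketch of this session-level label propagation, assuming a session is a list of query events where only the last one carries a click (the data shapes are hypothetical):

```python
def propagate_session_labels(session):
    """Propagate the final click in a session back to all preceding queries,
    turning them into positive samples for the eventually chosen POI.
    `session` is a list of dicts: {"query": str, "clicked": poi_id or None}."""
    final_click = None
    for event in reversed(session):
        if event["clicked"] is not None:
            final_click = event["clicked"]
            break
    # Every prefix query in the session now supervises toward the final POI.
    return [{"query": e["query"], "positive_poi": final_click} for e in session]

# Hypothetical session: the user types incrementally, clicking only at the end.
session = [
    {"query": "star", "clicked": None},
    {"query": "starbu", "clicked": None},
    {"query": "starbucks", "clicked": "poi_123"},
]
labels = propagate_session_labels(session)
```

This is what gives short prefixes like "star" a supervision signal they would never receive from their own (unclicked) impressions.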

Millions of randomly sampled online click logs are used to retrieve the top‑N candidate POIs for each query, generating tens of millions of effective training samples for a gradient boosted rank (gbrank) model.
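gbrank is a pairwise method, so each query's labeled click is turned into preference pairs against the other recalled candidates. A simplified sketch of that pair construction (the real pipeline is more involved):

```python
def build_pairwise_samples(candidates, clicked_poi):
    """For one query, pair the clicked POI against each unclicked candidate
    from the top-N recall, yielding (preferred, other) preference pairs."""
    return [(clicked_poi, poi) for poi in candidates if poi != clicked_poi]

# Hypothetical top-N recall for one query, with one clicked POI.
pairs = build_pairwise_samples(["poi_1", "poi_123", "poi_9"], "poi_123")
```

With tens of millions of queries, this expansion is where the "tens of millions of effective training samples" come from.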

Feature engineering covers four modeling needs:

Cross‑chain comparability for multiple recall pipelines (different cities, pinyin, etc.).

Dynamic representation of target POIs as user input evolves.

Prior features for low‑frequency, long‑tail queries lacking click‑based signals.

Regional personalization using geohash‑based spatial partitioning.
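As a rough illustration of the regional-personalization idea, click statistics can be aggregated per spatial cell. A coarse lat/lon grid stands in for geohash here; the function and the click data are illustrative:

```python
from collections import Counter

def spatial_cell(lat: float, lon: float, scale: int = 10) -> str:
    """Bucket coordinates into coarse grid cells (a simple stand-in for
    geohash) so click statistics can be aggregated per region."""
    return f"{int(lat * scale)}_{int(lon * scale)}"

# Illustrative click log entries: (lat, lon, clicked POI).
clicks = [
    (39.91, 116.40, "poi_a"),  # Beijing area
    (39.92, 116.41, "poi_a"),  # same coarse cell
    (31.23, 121.47, "poi_b"),  # Shanghai area
]

# Per-(region, POI) click counts serve as a regional prior feature.
region_clicks = Counter((spatial_cell(lat, lon), poi) for lat, lon, poi in clicks)
```

Keying click features by region lets the same POI rank differently for users in different cities, which a single global click count cannot express.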

After feature design, standard preprocessing (scaling, smoothing, position‑bias removal, normalization) is applied.
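A hedged sketch of some of these preprocessing steps on count-like features; the function names and smoothing constants are illustrative, and position-bias removal is omitted:

```python
import math

def log_scale(x: float) -> float:
    """Compress heavy-tailed count features (e.g. raw click counts)."""
    return math.log1p(x)

def smooth_ctr(clicks: int, impressions: int,
               alpha: float = 1.0, beta: float = 20.0) -> float:
    """Additive smoothing so low-exposure candidates don't get extreme CTRs."""
    return (clicks + alpha) / (impressions + beta)

def min_max(values):
    """Normalize a feature column to [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]
```

Tree models tolerate unscaled inputs, but smoothing and normalization still matter here because many features are ratios computed from very few observations.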

The initial model, with all rule‑based components removed, improved MRR by about 5 points on the test set. However, gbrank exhibited highly uneven feature learning: only a few features (e.g., city‑click) dominated tree splits, while many engineered features were ignored.

Analysis of feature importance revealed two main causes:

Cross‑features like query‑click are missing in ~60% of samples, yielding low split gain.

Text similarity features have low positive/negative ratios, also resulting in low split gain.
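Low coverage directly limits how often a feature can even be considered for a split. A tiny sketch of measuring feature coverage (the complement of the missing rate) across training samples, with hypothetical data:

```python
def feature_coverage(samples, feature):
    """Fraction of samples where a feature is present (non-missing).
    Low coverage means few usable splits and low cumulative split gain."""
    present = sum(1 for s in samples if s.get(feature) is not None)
    return present / len(samples)

# Hypothetical samples: the query-click cross feature is often missing.
samples = [
    {"query_click": 0.3},
    {"query_click": None},
    {},
    {"query_click": 0.1},
    {},
]
```

A coverage of 0.4 here mirrors the article's observation that cross-features like query-click are absent in roughly 60% of samples.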

To mitigate this, two solutions were considered:

Oversampling sparse and low‑frequency query samples (simple but distorts data distribution).

Adjusting the loss function: modify the negative gradient for specific features, effectively increasing their split gain in subsequent trees.

The chosen approach modifies the loss as follows (illustrated in the figure):

The added term loss_diff acts as a shift on the sigmoid output, increasing the penalty for mis‑predicted samples involving the target feature. Larger differences produce a larger loss_diff, thereby boosting the feature's split gain in the next iteration.
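The exact formulation is in the figure above; as a rough sketch, assuming a pairwise logistic loss, the loss_diff term can be folded into the sigmoid's argument so that mis-ranked pairs involving the under-learned feature receive a larger negative gradient. The shift value here is illustrative:

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def pairwise_gradient(score_pos: float, score_neg: float,
                      has_sparse_feature: bool, shift: float = 0.5) -> float:
    """Negative-gradient magnitude for a pairwise logistic loss
    L = log(1 + exp(-(margin))). For pairs involving the target sparse
    feature, a loss_diff shift is subtracted from the margin, enlarging
    the gradient and hence the feature's split gain in the next tree."""
    margin = score_pos - score_neg
    loss_diff = shift if has_sparse_feature else 0.0
    # Gradient grows as (margin - loss_diff) goes negative, i.e. the pair
    # must be ranked correctly by an extra margin to escape the penalty.
    return sigmoid(-(margin - loss_diff))
```

Because the gradients define the regression targets for the next tree in gradient boosting, inflating them for specific pairs is equivalent to making those features worth splitting on.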

After loss adjustment and retraining, MRR increased an additional 2 points, and the proportion of resolved ranking cases rose from 40% to 70%.

Conclusion: Deploying Learning to Rank in Gaode's search suggestion system eliminated rule‑based coupling and patch‑heavy maintenance, delivering clear performance gains. The gbrank model now satisfies ranking needs across query frequencies. Ongoing work includes personalized modeling, deep learning, vector indexing, and user‑behavior sequence prediction for further enhancements.
