Advancements in Query Analysis for Map Search: City Analysis, Where‑What Segmentation, and Path Planning
The article details Amap’s upgraded map‑search query analysis: a two‑stage city‑identification system, enhanced where‑what segmentation built on CRF and GBDT models, a three‑stage path‑planning pipeline, and an outlook on future deep‑learning and knowledge‑graph work for robustness and low‑frequency query handling.
In the previous article we introduced the overall evolution of geographic text processing at Amap and presented several generic query‑analysis techniques. This second part focuses on map‑specific text analysis, including city analysis, where‑what segmentation, and path planning, and closes with a brief outlook on future work.
4.1 City Analysis
Map search at Amap operates on a city‑level granularity. A complete search request contains the user‑entered query, the city displayed on the map, and the user’s current city. Accurately identifying the target city is the first and crucial step. The original city‑analysis module relied on rule‑based priors and posteriors, which suffered from poor effectiveness and maintainability, and used only query‑level features, leading to weak performance on low‑frequency queries.
The redesign treats city analysis as a two‑stage “recall + selection” problem. In the recall stage, features are extracted from both query and phrase granularity and candidate cities are merged. In the selection stage, a GBDT binary classifier ranks candidate cities. Samples are drawn randomly from search logs and manually labeled, with care taken to balance local and remote city distributions. Feature groups include query‑level, phrase‑level, and engineered composite features.
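The two‑stage flow can be sketched in a few lines of Python. This is a toy illustration, not Amap’s implementation: the dictionaries are invented stand‑ins for log‑mined signals, and a simple weighted scorer stands in for the GBDT binary classifier in the selection stage.

```python
# Toy sketch of two-stage "recall + selection" city analysis.
# CITY_DICT and QUERY_CITY_PRIOR are illustrative stand-ins for log-mined features.

CITY_DICT = {"beijing": "Beijing", "xian": "Xi'an"}       # phrase -> city
QUERY_CITY_PRIOR = {"big wild goose pagoda": "Xi'an"}     # whole-query prior

def recall_candidates(query, map_city, user_city):
    """Stage 1: gather candidate cities from query-level, phrase-level,
    and context (map city / located city) signals, then merge them."""
    candidates = {map_city, user_city}
    if query in QUERY_CITY_PRIOR:                         # query-level feature
        candidates.add(QUERY_CITY_PRIOR[query])
    for phrase in query.split():                          # phrase-level feature
        if phrase in CITY_DICT:
            candidates.add(CITY_DICT[phrase])
    return candidates

def select_city(query, map_city, user_city):
    """Stage 2: score each candidate; a trained GBDT classifier would
    replace this hand-weighted linear stub."""
    def score(city):
        s = 0.0
        if city == QUERY_CITY_PRIOR.get(query):
            s += 2.0                                      # strong query-level prior
        if city == map_city:
            s += 1.0                                      # city shown on the map
        if city == user_city:
            s += 0.5                                      # user's located city
        return s
    return max(recall_candidates(query, map_city, user_city), key=score)

print(select_city("big wild goose pagoda", "Beijing", "Beijing"))  # → Xi'an
```

The point of the split is that recall can stay high‑coverage and cheap, while all the modeling effort (features, balanced local/remote training samples) concentrates in the selection classifier.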
4.2 Where‑What Analysis
Map queries often contain both a spatial constraint (where) and a target POI (what). Correctly separating these components enables precise retrieval. The task consists of a prior stage (sequence labeling to split the query into where and what) and a posterior stage (intent selection, modeled as classification or ranking).
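As a minimal illustration of the prior stage, the sketch below splits a tokenized query into where and what segments. A dictionary lookup stands in for the CRF sequence labeler, and `WHERE_TERMS` is an invented spatial vocabulary, not a real lexicon.

```python
# Toy where-what splitter: dictionary lookup stands in for CRF sequence labeling.
WHERE_TERMS = {"haidian", "near", "wangjing", "chaoyang"}  # illustrative vocabulary

def split_where_what(tokens):
    """Prior stage: label the leading run of spatial tokens as 'where',
    the remainder as 'what' (the target POI)."""
    i = 0
    while i < len(tokens) and tokens[i] in WHERE_TERMS:
        i += 1
    return tokens[:i], tokens[i:]          # (where segment, what segment)

where, what = split_where_what(["haidian", "starbucks"])
print(where, what)  # → ['haidian'] ['starbucks']
```

A real CRF would instead emit a label per token and let the posterior stage decide among multiple candidate splits.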
Problems identified include a simplistic CRF model that depends heavily on component‑analysis features, and rule‑based post‑processing that is hard to maintain. Improvements involve:
Developing a CRF analysis tool to visualize Viterbi paths and diagnose issues.
Enhancing features such as prefix confidence, suffix entropy, and what confidence, which quantify how strongly a phrase functions as a where or what segment.
Replacing rule‑based intent selection with a GBDT model that incorporates prior confidence and textual features.
Addressing robustness by building shallow ensemble models to handle query variations such as reversed order, misspelled districts, and unexpected token swaps.
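The suffix‑entropy feature mentioned above can be computed directly from query logs: the more varied the tokens that follow a phrase, the more likely it acts as a free‑standing where prefix rather than part of a fixed POI name. The log below is an invented miniature; real signals come from large‑scale search logs.

```python
import math
from collections import Counter

# Illustrative query log; real features are mined from large-scale search logs.
LOG = [["haidian", "starbucks"], ["haidian", "hospital"], ["haidian", "kfc"],
       ["peking", "university"], ["peking", "university"]]

def suffix_entropy(phrase, log):
    """Shannon entropy of the tokens following `phrase`. Higher entropy suggests
    the phrase is a detachable 'where' segment; near-zero entropy suggests it is
    glued to a fixed continuation (part of one POI name)."""
    followers = Counter(q[i + 1] for q in log
                        for i, t in enumerate(q[:-1]) if t == phrase)
    total = sum(followers.values())
    return 0.0 - sum(c / total * math.log2(c / total) for c in followers.values())

print(round(suffix_entropy("haidian", LOG), 3))  # → 1.585 (three distinct suffixes)
print(round(suffix_entropy("peking", LOG), 3))   # → 0.0 (always 'university')
```

Prefix confidence and what confidence would be built analogously, as ratios over the same logs.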
These upgrades lead to comparable overall accuracy while significantly improving performance on adversarial and low‑frequency cases.
4.3 Path Planning
Parsing path‑planning queries (e.g., “from X to Y”) is a classic NLP task. Early implementations used template matching, which handled most simple cases but could not scale to complex expressions such as multi‑modal transit routes or indirect phrasing. The new pipeline adopts a three‑stage approach:
Keyword‑based pre‑filtering to quickly discard non‑path queries.
A CRF model for slot filling (origin, destination, transport mode) on the remaining candidates.
Post‑processing validation to ensure consistency.
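The three stages above can be sketched as follows. This is a hedged toy, not the production pipeline: the keyword list is invented, and a regular expression stands in for the CRF slot filler.

```python
import re

# Three-stage path-query pipeline sketch; a regex stands in for the CRF slot filler.
PATH_KEYWORDS = ("from", "to", "route")          # illustrative trigger words

def prefilter(query):
    """Stage 1: cheap keyword check to discard obvious non-path queries."""
    return any(k in query.split() for k in PATH_KEYWORDS)

def fill_slots(query):
    """Stage 2: extract origin / destination / transport mode."""
    m = re.fullmatch(r"from (?P<origin>.+) to (?P<dest>.+?)(?: by (?P<mode>\w+))?",
                     query)
    return m.groupdict() if m else None

def validate(slots):
    """Stage 3: consistency checks, e.g. origin and destination must differ."""
    return slots is not None and slots["origin"] != slots["dest"]

def parse_path_query(query):
    if not prefilter(query):
        return None
    slots = fill_slots(query)
    return slots if validate(slots) else None

print(parse_path_query("from west lake to east station by bus"))
```

The pre‑filter keeps the expensive model off the vast majority of ordinary POI queries, which is what makes a per‑query CRF affordable at search scale.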
Training data are generated automatically by enriching known patterns and substituting real origin/destination tokens, reducing the reliance on costly manual annotation. Features combine component‑analysis signals and POI‑keyword dictionaries. Evaluation on validation and random query sets shows clear gains in precision and recall, and the CRF‑based solution paves the way for future seq2seq multi‑task models.
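The template‑substitution idea can be sketched directly; the templates and POI lists below are invented examples, and the emitted slot dictionaries stand in for per‑token CRF labels.

```python
import itertools

# Toy generator: enrich known path-query templates with real origin/destination
# tokens to build labeled training data. Templates and POI lists are illustrative.
TEMPLATES = ["how to get from {o} to {d}", "{o} to {d} by bus",
             "route from {o} to {d}"]
ORIGINS = ["west lake", "people's square"]
DESTS = ["east station", "the airport"]

def generate_samples():
    """Yield (query, slots) pairs; slots can be projected onto token labels."""
    for tpl, o, d in itertools.product(TEMPLATES, ORIGINS, DESTS):
        yield tpl.format(o=o, d=d), {"origin": o, "dest": d}

samples = list(generate_samples())
print(len(samples))   # → 12 (3 templates x 2 origins x 2 destinations)
print(samples[0])
```

Because the substituted slots are known by construction, every generated query arrives pre‑labeled, which is what removes the manual‑annotation bottleneck.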
5 Outlook
After two years of extensive machine‑learning deployment, geographic text processing has entered a “deep water” stage. Future work will focus on:
Attack side: Leveraging deep learning (e.g., seq2seq) to improve low‑frequency and long‑tail query handling, and integrating knowledge graphs to provide human‑like prior judgments.
Defense side: Enhancing system robustness against atypical expressions, query reordering, and input errors through targeted optimizations and ensembles of shallow models.
Map search, though a niche vertical, presents a full set of challenges. Continued adoption of state‑of‑the‑art techniques tailored to geographic text characteristics will drive smarter, more reliable search experiences.
Amap Tech
Official Amap technology account showcasing all of Amap's technical innovations.