
Geographic Alias Mining and Knowledge Base Construction Using Contextual Vectors and Address Similarity

The paper presents two low-cost techniques for extracting geographic aliases of points of interest (comparing high-dimensional contextual vectors of nearby shipping addresses, and analyzing co-occurring words in identical addresses) in order to build a knowledge base that links official names to their synonyms and improves the accuracy of location-based services.

Xianyu Technology

Abstract: Location‑based services (e.g., rental platforms) rely on points of interest (POI) to deliver fine‑grained user experiences.

Each POI has an official name and often several aliases (e.g., “Beijing University” is also called “Beida”). Users frequently search by these aliases, so missing alias data leads to inaccurate search results; acquiring aliases is therefore crucial.

Figure 1: Overview of the alias generation model.

The industry lacks public geographic alias datasets, and existing ways of obtaining aliases are costly (manual collection, or mining with machine learning and deep learning). This paper proposes two low-cost methods: (1) alias extraction using high-dimensional contextual vectors (see Figure 1); and (2) analysis of co-occurring words in identical shipping addresses.

Keywords: geographic alias, POI, shipping address, knowledge base

1. Alias Data Mining

1.1 Alias extraction based on contextual high‑dimensional vectors

Figure 2: Shipping address context comparison.

Two shipping addresses are compared by computing the cosine similarity of their token frequency vectors. The cosine formula (Figures 3 to 5) extends naturally from 2-D vectors to n-dimensional vectors.

Figure 3: Cosine function formula. Figure 4: 2‑D vector cosine similarity. Figure 5: Multi‑dimensional vector cosine similarity.
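For reference, the standard cosine similarity that Figures 3 to 5 illustrate can be written as follows, first for 2-D vectors (x_1, y_1) and (x_2, y_2) and then for n-dimensional vectors A and B. The symbol names are chosen here for illustration, since the figures themselves are not reproduced.

```latex
\cos\theta = \frac{x_1 x_2 + y_1 y_2}{\sqrt{x_1^2 + y_1^2}\,\sqrt{x_2^2 + y_2^2}}

\cos\theta = \frac{\mathbf{A}\cdot\mathbf{B}}{\lVert\mathbf{A}\rVert\,\lVert\mathbf{B}\rVert}
           = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}}
```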

In the example, addresses A and B are tokenized, yielding the frequency vectors A = {1,1,1,1,0,1} and B = {1,1,1,0,1,1}. Their cosine similarity is 4 / (√5 × √5) = 0.8, which indicates high overlap.
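The comparison of two addresses can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the helper names and the sample token lists are hypothetical, and a real pipeline would first run a Chinese address tokenizer. The sample tokens are chosen so that each address has five tokens, four of them shared, which reproduces the 0.8 similarity of vectors A and B.

```python
import math
from collections import Counter


def frequency_vectors(tokens_a, tokens_b):
    """Align two token lists into frequency vectors over their joint vocabulary."""
    vocab = sorted(set(tokens_a) | set(tokens_b))
    ca, cb = Counter(tokens_a), Counter(tokens_b)
    return [ca[t] for t in vocab], [cb[t] for t in vocab]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector has no direction
    return dot / (norm_a * norm_b)


# Frequency vectors of addresses A and B from the text.
A = [1, 1, 1, 1, 0, 1]
B = [1, 1, 1, 0, 1, 1]
print(round(cosine_similarity(A, B), 3))

# Hypothetical pre-tokenized addresses with the same overlap pattern.
tokens_a = ["Haidian", "Yiheyuan Rd", "No. 5", "Gate 1", "Beida"]
tokens_b = ["Haidian", "Yiheyuan Rd", "No. 5", "Gate 2", "Beida"]
va, vb = frequency_vectors(tokens_a, tokens_b)
print(round(cosine_similarity(va, vb), 3))
```

Both calls print 0.8: the shared vocabulary has six tokens, each vector has five ones, and four positions match.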

After two addresses are confirmed to be similar, alias candidates are generated: the addresses are tokenized vertically, co-occurrences are counted, frequency vectors are built, and cosine similarity is computed (Figures 6 and 7). The candidate alias words are all order-preserving subsets of the characters in the official name, excluding single characters.

Figure 6: Tokenization horizontal split. Figure 7: Tokenization vertical split.
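Candidate generation and context comparison can be sketched in Python as follows. This is a simplified sketch under stated assumptions, not the authors' implementation: addresses are assumed to be already tokenized (the horizontal and vertical splitting of Figures 6 and 7 is not reproduced), the "context" of a word is assumed to be the other tokens in the same address, excluding the full name from its own candidates is an assumption, and the helper names and sample addresses are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def alias_candidates(name):
    """Order-preserving character subsets of `name` with length >= 2.

    Single characters are excluded as in the text; excluding the full
    name itself is an assumption of this sketch.
    """
    chars = list(name)
    found = []
    for k in range(2, len(chars)):
        for idx in combinations(range(len(chars)), k):
            found.append("".join(chars[i] for i in idx))
    return list(dict.fromkeys(found))  # de-duplicate while keeping order


def context_vector(target, tokenized_addresses):
    """Count the tokens that co-occur with `target` in the same address."""
    ctx = Counter()
    for tokens in tokenized_addresses:
        if target in tokens:
            ctx.update(t for t in tokens if t != target)
    return ctx


def cosine(c1, c2):
    """Cosine similarity of two sparse frequency vectors stored as Counters."""
    dot = sum(c1[k] * c2[k] for k in c1.keys() & c2.keys())
    n1 = math.sqrt(sum(v * v for v in c1.values()))
    n2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


def rank_aliases(official, tokenized_addresses):
    """Score every candidate by how closely its context matches the official name's."""
    base = context_vector(official, tokenized_addresses)
    scores = {c: cosine(base, context_vector(c, tokenized_addresses))
              for c in alias_candidates(official)}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical pre-tokenized addresses near 北京大学 (Peking University).
addresses = [
    ["海淀区", "颐和园路", "5号", "北京大学", "宿舍楼"],
    ["海淀区", "颐和园路", "北大", "宿舍楼"],
    ["朝阳区", "建国路", "88号"],
]
ranking = rank_aliases("北京大学", addresses)
print(ranking[0][0])  # top-scoring candidate
```

With these sample addresses the top candidate is 北大 (Beida), whose context shares three of the official name's four context tokens; candidates that never appear as tokens score 0.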

Only address data within 300 m of the POI (collected with user authorization) is used, and sparse vectors (more than 90% zero entries) are discarded.
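The two filters can be sketched as below. The haversine great-circle distance (a spherical approximation) is used here as an assumed way of measuring the 300 m radius, since the text does not say how distance is computed; the address record layout, the function names, and the coordinates are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points (spherical model)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_radius(addresses, poi_lat, poi_lon, radius_m=300.0):
    """Keep only addresses whose coordinates lie within radius_m of the POI."""
    return [a for a in addresses
            if haversine_m(a["lat"], a["lon"], poi_lat, poi_lon) <= radius_m]


def is_too_sparse(vector, max_zero_ratio=0.9):
    """True if more than max_zero_ratio of the vector's entries are zero."""
    if not vector:
        return True
    zeros = sum(1 for v in vector if v == 0)
    return zeros / len(vector) > max_zero_ratio


addresses = [
    {"id": "near", "lat": 40.001, "lon": 116.0},  # roughly 111 m north of the POI
    {"id": "far", "lat": 40.010, "lon": 116.0},   # roughly 1.1 km north of the POI
]
nearby = within_radius(addresses, poi_lat=40.0, poi_lon=116.0)
print([a["id"] for a in nearby])
```

The sparsity check uses a strict "more than 90%" comparison to match the wording above; a vector with exactly 90% zeros is kept.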

2. Analysis of Same‑Context Words in Identical Addresses

Shipping addresses that point to the same place may name it in different ways (e.g., “Beijing University” vs. “Beida”). Addresses within 500 m of each other are grouped, tokenized, and analyzed for co-occurring words, which yields three cases (Figure 8): fully overlapping name sets indicate a synonym relation; disjoint sets indicate distinct locations; partially overlapping sets are ambiguous and are excluded.

Figure 8: Alias relationship analysis.
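The three-way decision can be sketched as a comparison of sets. This sketch rests on an interpretation that the text does not spell out: each name's set is taken to be the set of address groups (identical addresses within 500 m) in which the name appears. The grouping step is assumed to have run upstream, and the function names and records are hypothetical.

```python
from collections import defaultdict


def occurrence_sets(records):
    """Map each token to the set of address groups in which it appears.

    `records` holds (group_id, tokens) pairs; group_id is assumed to come
    from an upstream step that clusters identical addresses within 500 m.
    """
    occ = defaultdict(set)
    for group_id, tokens in records:
        for token in tokens:
            occ[token].add(group_id)
    return occ


def classify(set_a, set_b):
    """Apply the three cases from the text to two names' sets."""
    if set_a == set_b:
        return "synonym"    # fully overlapping
    if set_a.isdisjoint(set_b):
        return "distinct"   # no overlap: different places
    return "ambiguous"      # partial overlap: excluded


records = [
    (1, ["北京大学", "宿舍楼"]),
    (1, ["北大", "宿舍楼"]),
    (2, ["北京大学"]),
    (2, ["北大"]),
    (3, ["清华大学"]),
]
occ = occurrence_sets(records)
print(classify(occ["北京大学"], occ["北大"]))
```

In this toy data, 北京大学 and 北大 occur in exactly the same groups and are classified as synonyms, 清华大学 shares no group with them, and 宿舍楼 only partially overlaps.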

3. Knowledge Base Construction

The knowledge base stores entries (name, ID, description) and allows one name to carry multiple meanings (e.g., “Apple” the fruit vs. “Apple” the device). Figure 9 shows the entry schema; Figure 10 depicts the relationship graph, in which synonym links hold only under specific conditions.

Figure 9: Knowledge‑base entry representation. Figure 10: Knowledge‑base structure.
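The entry schema can be illustrated with a small in-memory structure. This is a minimal sketch, assuming entries are keyed by a unique ID so that several entries may share a name; the class and method names (`Entry`, `KnowledgeBase`, `senses`) and the sample IDs are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    entry_id: str     # unique key; separates meanings that share a name
    name: str
    description: str


class KnowledgeBase:
    def __init__(self):
        self.by_id = {}                   # entry_id -> Entry
        self.by_name = defaultdict(list)  # name -> entries with that name

    def add(self, entry):
        if entry.entry_id in self.by_id:
            raise ValueError(f"duplicate entry ID: {entry.entry_id}")
        self.by_id[entry.entry_id] = entry
        self.by_name[entry.name].append(entry)

    def senses(self, name):
        """All entries that share `name`, one per meaning."""
        return list(self.by_name.get(name, []))


kb = KnowledgeBase()
kb.add(Entry("e1", "Apple", "A fruit"))
kb.add(Entry("e2", "Apple", "A consumer-electronics device"))
for e in kb.senses("Apple"):
    print(e.entry_id, e.description)
```

Keying by ID rather than by name is what lets the two "Apple" meanings coexist without overwriting each other.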

By inserting geographic names and their aliases as entries and establishing synonym relations under the POI condition, users can retrieve the same entity via either the official name or any alias.
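Conditional synonym links and retrieval can be sketched as follows. This is an illustration under assumptions, not the system described in the paper: the condition is modeled as a plain string such as "POI", aliases are stored as their own entries, and the names `AliasKB`, `link_synonyms`, and `lookup` as well as the sample IDs are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    entry_id: str
    name: str
    description: str


class AliasKB:
    """Hypothetical knowledge base with synonym links that hold under a condition."""

    def __init__(self):
        self.entries = {}                  # entry_id -> Entry
        self.by_name = defaultdict(set)    # name -> entry IDs
        self.synonyms = defaultdict(set)   # (entry_id, condition) -> linked IDs

    def add(self, entry):
        self.entries[entry.entry_id] = entry
        self.by_name[entry.name].add(entry.entry_id)

    def link_synonyms(self, id_a, id_b, condition):
        # Synonymy is symmetric, so store the link in both directions.
        self.synonyms[(id_a, condition)].add(id_b)
        self.synonyms[(id_b, condition)].add(id_a)

    def lookup(self, name, condition):
        """Entries named `name` plus their direct synonyms under `condition`."""
        ids = set()
        for eid in self.by_name.get(name, set()):
            ids.add(eid)
            ids |= self.synonyms.get((eid, condition), set())
        return {self.entries[i] for i in ids}


kb = AliasKB()
kb.add(Entry("poi-001", "Beijing University", "University campus POI"))
kb.add(Entry("poi-001-a1", "Beida", "Alias of Beijing University"))
kb.link_synonyms("poi-001", "poi-001-a1", condition="POI")

for query in ("Beijing University", "Beida"):
    print(query, "->", sorted(e.name for e in kb.lookup(query, "POI")))
```

Under the "POI" condition, querying either the official name or the alias returns both linked entries; under any other condition only the queried entry comes back. The sketch follows direct links only; chains of aliases would need a transitive closure.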

4. Extension

Machine-learning techniques can further improve alias mining, as demonstrated in “Research on POI Abbreviation Retrieval” [1], which combines statistical segmentation, classification, and hidden Markov models.

Reference: [1] “Research on POI Abbreviation Retrieval”.
