
Optimizing International Hotel Data Aggregation Algorithms at Qunar

This article outlines the challenges Qunar faces in aggregating international hotel data, analyzes issues such as country‑specific address formats and the limited ability of text‑similarity algorithms to parse names and addresses, and presents a pattern‑matching and weighted‑scoring approach that improves aggregation accuracy across multiple countries.

Qunar Tech Salon

Hotel aggregation is a core business function at Qunar: it unifies hotel data from many agencies so that users get a reliable price‑comparison experience. The success rate and accuracy of aggregation directly affect how users perceive the product.

The main responsibility is to map the disparate hotel trees supplied by agencies into Qunar’s unified hotel catalog, using key fields such as hotel name, address, city, coordinates, and phone number.

Key pain points include large country‑to‑country differences in address formats and the limited ability of existing text‑similarity algorithms to parse the individual components of hotel names and addresses. Manual aggregation is costly and inefficient.

To address these challenges, the team reorganized high‑frequency name and address patterns, shifted from pure text similarity to a pattern‑matching approach with weighted scoring, and built dictionaries for brands, cities, POI terms, descriptive words, and industry terms.
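To make the dictionary idea concrete, here is a minimal Python sketch of labeling the parts of a hotel name. It is an illustration under stated assumptions, not Qunar's implementation: the dictionaries, the `parse_name` function, and the rule that unmatched words become the "branch" component are all hypothetical. It also produces a single greedy tokenization, whereas the article describes generating several possible tokenizations and matching them against patterns.

```python
# Hypothetical, tiny dictionaries; real ones would be much larger and
# curated per market. Order matters only when a phrase is in several sets.
DICTIONARIES = [
    ("brand", {"hilton", "holiday inn", "ibis"}),
    ("city", {"paris", "tokyo", "bangkok"}),
    ("industry", {"hotel", "resort", "hostel", "inn", "suites"}),
    ("descriptive", {"grand", "new", "royal"}),
]
MAX_PHRASE_WORDS = 3  # assumed upper bound on dictionary phrase length


def parse_name(name: str) -> dict[str, list[str]]:
    """Label each word or phrase of a hotel name with a component type."""
    words = name.lower().replace(",", " ").split()
    parts: dict[str, list[str]] = {}
    i = 0
    while i < len(words):
        # Try the longest phrase first so "holiday inn" wins over "inn".
        for n in range(min(MAX_PHRASE_WORDS, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            label = next(
                (lbl for lbl, vocab in DICTIONARIES if phrase in vocab), None
            )
            if label is not None:
                parts.setdefault(label, []).append(phrase)
                i += n
                break
        else:
            # Assumption: words found in no dictionary belong to the branch name.
            parts.setdefault("branch", []).append(words[i])
            i += 1
    return parts
```

For example, `parse_name("Holiday Inn Paris Gare de Lyon Hotel")` labels "holiday inn" as the brand, "paris" as the city, "hotel" as the industry term, and leaves "gare", "de", "lyon" as the branch.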

The optimized workflow consists of:

1. Ingest a hotel record to be aggregated.

2. Parse the hotel name and address using pre‑collected dictionaries (brand, city, country, etc.) to generate possible tokenizations.

3. Match token sets against all candidate patterns and select the best match.

4. Search the existing aggregated dataset with the extracted key components to retrieve a candidate set.

5. Compute component‑wise string similarity (exact, contains, prefix, suffix) and apply weights (brand > branch > industry term > city) to calculate a composite score.

6. Adjust scores for special cases (e.g., same street name but different house number reduces score; identical phone or close coordinates increase score).

7. Select the candidate with the highest score that exceeds a predefined threshold as the final aggregation result.
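The scoring, adjustment, and selection steps above can be sketched in Python as follows. This is a simplified illustration, not the production algorithm: the article gives only the weight ordering (brand > branch > industry term > city), so the numeric weights, the similarity values per match type, the adjustment sizes, the 100 m distance cut‑off, the threshold, and all field and function names are assumptions.

```python
import math

# Assumed values; only the ordering brand > branch > industry > city
# comes from the article.
WEIGHTS = {"brand": 0.4, "branch": 0.3, "industry": 0.2, "city": 0.1}
THRESHOLD = 0.75       # assumed acceptance threshold
NEAR_METERS = 100.0    # assumed definition of "close coordinates"


def component_similarity(a: str, b: str) -> float:
    """Grade a pair of strings: exact > prefix/suffix > contains > none."""
    a, b = a.strip().lower(), b.strip().lower()
    if not a or not b:
        return 0.0
    if a == b:
        return 1.0
    if a.startswith(b) or b.startswith(a) or a.endswith(b) or b.endswith(a):
        return 0.8
    if a in b or b in a:
        return 0.6
    return 0.0


def distance_m(p, q) -> float:
    """Haversine distance in meters between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000 * math.asin(math.sqrt(h))


def score(record: dict, candidate: dict) -> float:
    """Weighted component score plus special-case adjustments, clamped to [0, 1]."""
    total = sum(
        w * component_similarity(record.get(k, ""), candidate.get(k, ""))
        for k, w in WEIGHTS.items()
    )
    # Same street but different house number: likely a different property.
    if record.get("street") and record.get("street") == candidate.get("street"):
        r_no, c_no = record.get("house_no"), candidate.get("house_no")
        if r_no and c_no and r_no != c_no:
            total -= 0.2
    # Identical phone number or nearby coordinates: stronger evidence of a match.
    if record.get("phone") and record.get("phone") == candidate.get("phone"):
        total += 0.15
    if record.get("coords") and candidate.get("coords"):
        if distance_m(record["coords"], candidate["coords"]) < NEAR_METERS:
            total += 0.1
    return max(0.0, min(1.0, total))


def best_match(record: dict, candidates: list):
    """Return the top-scoring candidate if it exceeds THRESHOLD, else None."""
    if not candidates:
        return None
    best = max(candidates, key=lambda c: score(record, c))
    return best if score(record, best) > THRESHOLD else None
```

Returning `None` when no candidate clears the threshold leaves the record unmatched instead of forcing a low‑confidence merge; what happens to such records (for example, manual review) is not specified in the article.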

Applying this pattern‑matching and weighted scoring method to eight key international markets yielded significant improvements in aggregation success rates, as shown in the result charts.

The team plans to continue sharing experiences on room‑type aggregation and encourages further collaboration.

Tags: data integration, algorithm optimization, text similarity, pattern matching, hotel aggregation

Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.
