
Improving International Hotel Room‑Type Merging with Text Similarity and Machine‑Learning Models

This article describes how a large‑scale international hotel platform reduced room‑type merging errors and user complaints by applying rule‑based methods, text‑similarity algorithms (Jaccard, LCS, N‑Gram) and supervised machine‑learning classifiers such as fastText to standardize and merge heterogeneous room‑type data.

Tongcheng Travel Technology Center

Project Background

Consolidating basic hotel information is a major difficulty for the international hotel business. Room‑type merging is the hardest part: it has historically had a high error rate and has been a recurring source of user complaints.

Existing Problems

Room‑type data come from many sources and are both inconsistent and massive (over 1.2 million overseas hotels, with millions of updates per day). Manual maintenance is costly and error‑prone, leading to mismatches such as a "double room" being incorrectly merged with a "standard room".

Solution Overview

Three approaches were adopted in successive iterations:

1) Rule‑based method: extract keywords from room‑type strings and group them using manually crafted rules. Labor cost is high and coverage is limited.

2) String similarity: build a keyword library from the standard room types and compute vector similarity to match supplier room types against it. This reduces labor but still produces false matches when two different room types have highly similar text.

3) Machine‑learning model: train supervised classifiers (fastText and textCNN) on labeled data (≈10 k samples covering 160 room‑type categories) to recognize room types automatically, achieving higher accuracy while keeping training time low.
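The second approach, and its weakness, can be illustrated with a minimal sketch. It assumes a bag‑of‑keywords vector and cosine similarity, which the article does not specify; the standard room types and the function names (`tokenize`, `to_vector`, `cosine`, `best_match`) are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical standard room types; the real library is not given in the article.
STANDARD_ROOM_TYPES = [
    "standard double room",
    "standard twin room",
    "deluxe king room",
]

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

# Keyword library: every token that occurs in some standard room type.
KEYWORDS = sorted({tok for name in STANDARD_ROOM_TYPES for tok in tokenize(name)})

def to_vector(text):
    """Count how often each library keyword occurs in the text."""
    counts = Counter(tokenize(text))
    return [counts[k] for k in KEYWORDS]

def cosine(u, v):
    """Cosine similarity of two equal-length count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def best_match(supplier_name):
    """Return (score, standard room type) for the most similar standard type."""
    vec = to_vector(supplier_name)
    scored = [(cosine(vec, to_vector(std)), std) for std in STANDARD_ROOM_TYPES]
    return max(scored, key=lambda pair: pair[0])
```

In this toy setup, a supplier string such as "standard room" scores identically against "standard double room" and "standard twin room", which is the kind of ambiguity that leads to false merges when texts are highly similar.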

Data Preparation

A standard room‑type library was built from hotels' official websites. Training samples were then generated by annotating supplier data against this standard.
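As a rough illustration of what annotated samples might look like once prepared for a fastText classifier, the sketch below formats (category, supplier room name) pairs as lines with fastText's default `__label__` prefix. The sample pairs and the helper `to_fasttext_line` are hypothetical; the article does not show its actual data or preprocessing.

```python
import re

LABEL_PREFIX = "__label__"  # fastText's default label prefix for supervised training

def to_fasttext_line(category, room_name):
    """Format one annotated sample as '__label__<category> <text>'."""
    # Labels must not contain spaces, so join the category words with underscores.
    label = re.sub(r"\s+", "_", category.strip().lower())
    # Lowercase the supplier text and collapse repeated whitespace.
    text = " ".join(room_name.lower().split())
    return f"{LABEL_PREFIX}{label} {text}"

# Hypothetical annotated samples: (standard category, supplier room name).
samples = [
    ("Deluxe King Room", "Deluxe Room, 1 King Bed  (City View)"),
    ("Standard Twin Room", "STD TWIN ROOM"),
]

lines = [to_fasttext_line(category, name) for category, name in samples]
```

Written one per line to a text file, such lines are in the form that the fastText Python package's `train_supervised(input=...)` reads.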

Algorithm Model

The pipeline has three stages:

Text processing: remove information unrelated to the room type, normalize English and Chinese scripts, remove stop words, convert synonyms, and unify numeric expressions.

Text similarity: compute Jaccard set similarity, Longest Common Subsequence (LCS), and N‑Gram checks. A Jaccard similarity of 1 means the two token sets are identical and is treated as a perfect match; because Jaccard ignores order, LCS is used to measure similarity over ordered characters.

Text classification: for cases that similarity cannot resolve, a fastText classifier (three‑layer architecture: input, hidden, output) predicts the room‑type category. Predictions are then filtered by a confidence threshold before final standardization.
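The three stages can be sketched end to end as follows. This is an illustration under stated assumptions, not the platform's implementation: the synonym and stop‑word tables, all threshold values, and the way the LCS and N‑Gram signals are combined are invented for the example, and `classify` is a hypothetical callable returning a `(label, confidence)` pair, a role that a wrapper around a trained fastText model could fill.

```python
import re

# Hypothetical synonym and stop-word tables; the article does not list the real ones.
SYNONYMS = {"dbl": "double", "twn": "twin", "std": "standard", "one": "1", "two": "2"}
STOP_WORDS = {"with", "and", "the", "a"}

def normalize(text):
    """Lowercase, tokenize, map synonyms and number words, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    tokens = [SYNONYMS.get(t, t) for t in tokens]
    return [t for t in tokens if t not in STOP_WORDS]

def jaccard(a, b):
    """Jaccard set similarity: |A & B| / |A | B|, ignoring order and duplicates."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def lcs_length(a, b):
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def lcs_ratio(a, b):
    """LCS length divided by the longer string's length, in [0, 1]."""
    longest = max(len(a), len(b))
    return lcs_length(a, b) / longest if longest else 0.0

def ngram_similarity(a, b, n=2):
    """Jaccard similarity over the sets of character n-grams of two strings."""
    grams = lambda s: {s[i:i + n] for i in range(len(s) - n + 1)}
    return jaccard(grams(a), grams(b))

def merge_room_type(name, standard_types, classify,
                    lcs_threshold=0.75, ngram_threshold=0.6, conf_threshold=0.9):
    """Return (standard type or None, stage that decided). Thresholds are assumptions."""
    tokens = normalize(name)
    text = " ".join(tokens)
    best, best_score = None, 0.0
    for std in standard_types:
        std_tokens = normalize(std)
        if jaccard(tokens, std_tokens) == 1.0:  # identical token sets
            return std, "jaccard"
        score = lcs_ratio(text, " ".join(std_tokens))
        if score > best_score:
            best, best_score = std, score
    # Accept the best LCS candidate only if the character bigram check agrees.
    if (best is not None and best_score >= lcs_threshold
            and ngram_similarity(text, " ".join(normalize(best))) >= ngram_threshold):
        return best, "lcs"
    label, confidence = classify(text)  # classifier handles what similarity cannot
    if confidence >= conf_threshold:
        return label, "classifier"
    return None, "manual_review"  # low confidence: leave for a human to decide
```

With the standard types `["standard double room", "deluxe king room"]`, the input "Room, Double (STD)" is resolved by the Jaccard stage, "Deluxe Kingsize Room" by the LCS stage, and an unrelated string such as "Ocean Suite" falls through to the classifier, where a low‑confidence prediction is left unmerged.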

Model Effectiveness

Offline tests on 500 hotels showed a high merge rate, and manual verification found the merges to be 99% accurate. The new algorithm also reduced complaint‑related compensation by 40% and increased distribution volume.

Business Feedback

After deployment, the upgraded room‑type merging system improved overall product quality, reduced complaints caused by merging errors, and contributed to higher revenue for the platform.

machine learning, text similarity, fastText, hotel, jaccard, N-gram, room type merging