Xianyu Technology
Dec 12, 2018 · Big Data
Community Data Normalization Using Prefix Matching and Text Similarity
The study presents a four-step pipeline that normalizes community data for rental platforms by clustering records using longest-common-prefix patterns, geographic filtering, Levenshtein similarity, and pattern-based parent-child assignment, achieving under 8% false positives and 5% false negatives.
Clustering · Real Estate · data normalization
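The summary combines two string primitives (longest common prefix and Levenshtein similarity) with a geographic filter. The block below is a minimal Python sketch of those building blocks, not the authors' implementation. The helper names, the haversine great-circle distance, the 1 km radius and the 0.8 similarity threshold are all hypothetical choices for illustration. The parent-child assignment step is not shown, and real community names would typically be Chinese strings rather than the English examples used here.

```python
import math


def longest_common_prefix(a: str, b: str) -> str:
    """Return the longest shared leading substring of a and b."""
    n = min(len(a), len(b))
    i = 0
    while i < n and a[i] == b[i]:
        i += 1
    return a[:i]


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert/delete/substitute, cost 1 each)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalize edit distance into [0, 1]; 1.0 means identical strings."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km (assumed stand-in for the geographic filter)."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def is_candidate_match(name_a, loc_a, name_b, loc_b,
                       max_km: float = 1.0, min_sim: float = 0.8) -> bool:
    """Hypothetical helper: geographic filter first, then text similarity.

    Thresholds max_km and min_sim are illustrative assumptions.
    """
    if haversine_km(*loc_a, *loc_b) > max_km:
        return False
    return similarity(name_a, name_b) >= min_sim
```

Filtering by distance before comparing names keeps the quadratic-time edit-distance computation off pairs of records that are too far apart to be the same community. For example, `longest_common_prefix("Sunshine Garden Phase 1", "Sunshine Garden Phase 2")` yields the shared stem `"Sunshine Garden Phase "`, which hints at how prefix patterns can group phases of one development.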