
How ID‑Mapping Connects Data Silos Across Industries

This article explains the fundamentals of ID‑Mapping and why it matters for unifying fragmented user and device data, surveys industry solutions from Alibaba, NetEase, 58.com and Meituan, and outlines two technical approaches: priority‑based rules and graph‑based computation.

Data Thinking Notes

ID‑Mapping Overview

ID‑Mapping is a fundamental yet critical step in big‑data analysis that links multiple data sources to the same entity—such as a device, user, or enterprise—turning fragmented pieces into a complete user profile and eliminating data islands.

Typical challenges include users switching between accounts on the same device, the same user holding different accounts on different channels (e.g., a WeChat mini‑program vs. the app), and the same user logging in on devices from different manufacturers.

Industry Solutions

Alibaba OneID

Alibaba aggregates identifiers such as phone numbers, PC cookies, IMEI/IDFA, Taobao accounts, Alipay accounts, and email addresses. Within its OneData framework (OneModel, OneID, OneService), it unifies these identifiers into a single UID using business rules, machine learning, and graph algorithms.

NetEase ID‑Mapping

NetEase combines various account and device identifiers (e.g., musicid, oaid, phone, email, idfa, imei) and applies rules together with data‑mining algorithms (connected‑graph partitioning combined with community detection) to determine whether different accounts belong to the same person.
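To illustrate what a community‑detection step can add on top of plain connectivity, the Python below runs weighted label propagation, one common community‑detection algorithm (not necessarily the one NetEase uses), inside a single connected component. It is a minimal sketch under stated assumptions: edge weights are assumed to count how many records linked two IDs, and the ID strings are made up. Each ID repeatedly adopts the label carrying the most edge weight among its neighbours, so one weak link between two dense clusters does not merge them.

```python
from collections import defaultdict


def label_propagation(weighted_edges, max_iter=20):
    """Weighted, asynchronous label propagation over an ID graph.

    weighted_edges: iterable of (id_a, id_b, weight); weight is assumed to be
    the number of records that linked the two IDs.
    Returns the communities as a list of sets, sorted for stable output.
    """
    adj = defaultdict(dict)
    for a, b, w in weighted_edges:
        adj[a][b] = adj[a].get(b, 0) + w
        adj[b][a] = adj[b].get(a, 0) + w

    labels = {node: node for node in adj}  # every ID starts in its own community
    for _ in range(max_iter):
        changed = False
        for node in sorted(adj):  # fixed visiting order keeps results deterministic
            scores = defaultdict(int)
            for neighbour, w in adj[node].items():
                scores[labels[neighbour]] += w
            best = max(scores.values())
            # Ties go to the lexicographically smallest label.
            new_label = min(label for label, s in scores.items() if s == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break

    communities = defaultdict(set)
    for node, label in labels.items():
        communities[label].add(node)
    return sorted(communities.values(), key=sorted)


# Hypothetical component: two tightly linked ID clusters plus one weak link.
edges = [
    ("phone_1", "imei_1", 5), ("phone_1", "musicid_1", 5), ("imei_1", "musicid_1", 5),
    ("phone_2", "imei_2", 4), ("phone_2", "musicid_2", 4), ("imei_2", "musicid_2", 4),
    ("imei_1", "musicid_2", 1),  # e.g. one login of account 2 on device 1
]
communities = label_propagation(edges)
# two communities: {phone_1, imei_1, musicid_1} and {phone_2, imei_2, musicid_2}
```

A plain connected‑components pass would put all six IDs in one cluster because of the single imei_1 to musicid_2 edge; label propagation keeps them apart because that edge carries little weight.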

58.com ID‑Mapping

58.com integrates data from multiple products (58 Tongcheng, Ganji, Anjuke, etc.) across logs, resumes, posts, and merchant databases. Different business lines use distinct ID tags (e.g., wuser, guser, kimei), which are linked through shared fields such as telep, bidua, appua, imei, and idfa to build a unified mapping.
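A minimal Python sketch of how rows from different products might be turned into linked ID pairs, assuming each row is a dict. The column names in `FIELD_TYPES` and the source names are hypothetical stand‑ins, not 58.com's actual schema. IDs are namespaced by type, account IDs are additionally namespaced by product (they are only unique within one product), and every pair of IDs seen in the same row becomes a weighted edge.

```python
from collections import Counter
from itertools import combinations

# Hypothetical mapping from per-product column names to a shared ID type.
FIELD_TYPES = {
    "phone": "phone",
    "telephone": "phone",  # another product's name for the same field
    "imei": "imei",
    "idfa": "idfa",
    "user_id": "account",
}


def record_to_ids(source, row):
    """Turn one row into namespaced ID nodes such as 'imei:d1'."""
    ids = set()
    for field, value in row.items():
        id_type = FIELD_TYPES.get(field)
        if id_type is None or not value:
            continue  # unknown column or empty value
        if id_type == "account":
            # Account IDs are only unique within one product.
            ids.add(f"account:{source}:{value}")
        else:
            ids.add(f"{id_type}:{value}")
    return ids


def build_edges(records):
    """records: iterable of (source, row). Returns a Counter of (id_a, id_b) -> count."""
    edges = Counter()
    for source, row in records:
        # Sorting makes each edge key independent of column order.
        for a, b in combinations(sorted(record_to_ids(source, row)), 2):
            edges[(a, b)] += 1
    return edges


records = [
    ("site_a", {"user_id": "u1", "phone": "p1", "imei": "d1"}),
    ("site_b", {"user_id": "u9", "telephone": "p1"}),
    ("site_a", {"user_id": "u1", "imei": "d1", "idfa": ""}),
]
edges = build_edges(records)
# edges[("account:site_a:u1", "imei:d1")] == 2
```

Here the shared phone value p1 ties site_b's account u9 to site_a's account u1 and device d1, which is the kind of cross‑product link a unified mapping depends on.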

Meituan ID‑Mapping

After merging with Dianping, Meituan aligns user identities across apps by using common login methods (phone, WeChat, Weibo) and selects the phone number as the unique identifier.

Technical Approaches

Method 1: Priority‑Based ID Mapping

Assign each record a unique identifier by selecting the highest‑priority ID available (e.g., phone, UID, device ID). This simple method breaks down when users have multiple devices or use multiple channels, or when identifiers such as cookies, unionid, MAC, IMEI, IMSI, AndroidID, OpenUUID, IDFA, or custom device IDs vary across logs.
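A minimal Python sketch of the priority rule, assuming each log record is a dict and a hypothetical priority order of `phone`, then `uid`, then `device_id`. It also shows the weakness described above: the same device yields two different keys depending on whether the user happened to be logged in.

```python
# Hypothetical priority order: earlier fields win.
PRIORITY = ("phone", "uid", "device_id")


def priority_id(record):
    """Return the highest-priority non-empty identifier as 'field:value', or None."""
    for field in PRIORITY:
        value = record.get(field)
        if value:
            return f"{field}:{value}"
    return None


events = [
    {"phone": "p1", "device_id": "d1"},  # logged in on device d1
    {"device_id": "d1"},                 # same device, not logged in
]
keys = [priority_id(e) for e in events]
# keys == ["phone:p1", "device_id:d1"]: one device, two apparent "users"
```

The rule never looks at relationships between IDs, so it cannot tell that `phone:p1` and `device_id:d1` describe the same person; that gap is what the graph‑based method addresses.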

Method 2: Graph‑Based Computation

Represent identifiers as nodes and their relationships as edges, then apply a graph algorithm such as connected components (maximal connected subgraphs) to find clusters of IDs that belong to the same entity. The workflow generates the day's node and edge sets, merges them with the previous mapping, runs the connectivity algorithm, and assigns each cluster a persistent UID.
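The workflow above can be sketched on a single machine with a union‑find (disjoint‑set) structure, a standard way to compute connected components. This is a minimal illustration under hypothetical assumptions: the previous mapping is a dict from ID string to integer UID, the day's edges are ID pairs, and when existing clusters merge the smallest (oldest) UID is kept. That merge rule is a choice made for this sketch, not one stated in the source.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest with path compression."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:  # point every node on the path at the root
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def update_mapping(previous, daily_edges, next_uid):
    """Merge today's edges into the previous mapping and assign persistent UIDs.

    previous: dict of id -> uid from earlier runs.
    daily_edges: iterable of (id_a, id_b) pairs seen today.
    next_uid: first integer to hand out to a brand-new cluster.
    Returns (mapping, next_uid).
    """
    uf = UnionFind()
    # 1. Re-link IDs that already share a UID so earlier clusters survive.
    ids_by_uid = defaultdict(list)
    for id_, uid in previous.items():
        ids_by_uid[uid].append(id_)
    for ids in ids_by_uid.values():
        uf.find(ids[0])  # registers single-ID clusters too
        for other in ids[1:]:
            uf.union(ids[0], other)
    # 2. Add today's edges.
    for a, b in daily_edges:
        uf.union(a, b)
    # 3. Collect connected components.
    components = defaultdict(set)
    for id_ in list(uf.parent):
        components[uf.find(id_)].add(id_)
    # 4. Reuse the smallest existing UID in each component, else mint a new one.
    mapping = {}
    for members in components.values():
        old_uids = [previous[i] for i in members if i in previous]
        if old_uids:
            uid = min(old_uids)
        else:
            uid, next_uid = next_uid, next_uid + 1
        for i in members:
            mapping[i] = uid
    return mapping, next_uid


previous = {"phone:p1": 1, "imei:d1": 1, "imei:d9": 2}
today = [("imei:d1", "idfa:x1"), ("phone:p2", "imei:d5")]
mapping, next_uid = update_mapping(previous, today, next_uid=3)
# idfa:x1 joins UID 1; phone:p2 and imei:d5 form a new cluster with UID 3
```

Reusing an existing UID keeps identifiers stable for downstream tables. A real pipeline would also need to record merges and handle clusters that should be split, which this sketch ignores.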

The resulting ID mapping dictionary acts as a bridge that connects previously isolated data islands, enabling comprehensive user profiling and more precise analytics.
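As a small illustration of that bridge role, assume the mapping dictionary is a plain dict from ID to UID and each silo yields (ID, event) pairs; the function and data below are hypothetical. Events from different systems that carry different IDs of the same person end up under one UID, while events whose ID is missing from the dictionary stay unlinked.

```python
from collections import defaultdict


def build_profiles(id_to_uid, *sources):
    """Group events from separate silos under a shared UID.

    id_to_uid: the ID mapping dictionary (id -> uid).
    sources: (source_name, events) pairs, where events are (id, payload) tuples.
    """
    profiles = defaultdict(list)
    for source_name, events in sources:
        for id_, payload in events:
            uid = id_to_uid.get(id_)
            if uid is not None:  # unmapped IDs cannot be linked to anyone yet
                profiles[uid].append((source_name, payload))
    return dict(profiles)


id_to_uid = {"phone:p1": 1, "imei:d1": 1, "cookie:c7": 2}
orders = [("phone:p1", "bought item A")]
app_logs = [("imei:d1", "viewed item A"), ("cookie:zz", "viewed item B")]
profiles = build_profiles(id_to_uid, ("orders", orders), ("app", app_logs))
# profiles == {1: [("orders", "bought item A"), ("app", "viewed item A")]}
```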
