
Overview of the 58 User Profile System Architecture and Data Processing

The article describes the design, data integration, ID mapping, tag generation, and application scenarios of the 58 user profiling platform, which aggregates billions of user IDs across multiple business lines to provide online and offline persona data for personalization, analytics, and AI modeling.

58 Tech
“People always like to do what they are good at, and they hope that some people know they are right there.” (Fredrik Backman, "List Life")

As the world's largest life-service platform, we also want users to know that “58 is right there!” To achieve this, we need deep, precise insight into our users, and that is the goal of the 58 user-profile system.

Since its founding, 58 has accumulated hundreds of millions of registered users across countless scenarios—from housing, vehicles, and second‑hand goods to recruitment and on‑site services. The massive scale and diversity of data mean a single user can be described by thousands of dimensions. Extracting all needed data from source systems each time would be prohibitively costly, so we built a unified user‑profile platform to integrate data across subsidiaries.

1. Introduction to the 58 User Profile

The platform now ingests core data sources such as 58, Ganji, Anjuke, DaJia, Yingcai, commercial, and certification systems, aggregating billions of active IDs. It provides seven categories of tags—personal attributes, certification, location, B‑side behavior, C‑side behavior, interests, and devices—totaling over 2,300 tags, with an average of nearly 100 tags per user.

There are three ways to use the profile data:

FaceAPI Interface: An online service that returns all tags for a user based on phone number, device ID, account, cookie, etc., supporting personalization in search, recommendation, DSP, and other scenarios.

Offline Profile Data: Provides tag data keyed by common IDs for statistical analysis, model training, and other batch‑processing tasks.

Smart Website: Offers two main functions: (1) a filter that lets users build user groups with AND/OR logic for targeted pushes, and (2) a crowd‑analysis tool that enables two‑dimensional attribute combination analysis and instant report generation for product or operations teams.
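The AND/OR group filter described above can be illustrated with a minimal sketch. It assumes profiles are held as a plain mapping from user ID to a set of tag names; the tag names and the helper functions (`has`, `all_of`, `any_of`, `select_users`) are hypothetical and are not the platform's actual interface.

```python
from typing import Callable, Dict, Set

# Hypothetical in-memory profile store: user ID -> set of tag names.
Profiles = Dict[str, Set[str]]
Predicate = Callable[[Set[str]], bool]


def has(tag: str) -> Predicate:
    """Match users that carry the given tag."""
    return lambda tags: tag in tags


def all_of(*preds: Predicate) -> Predicate:
    """AND: every sub-condition must match."""
    return lambda tags: all(p(tags) for p in preds)


def any_of(*preds: Predicate) -> Predicate:
    """OR: at least one sub-condition must match."""
    return lambda tags: any(p(tags) for p in preds)


def select_users(profiles: Profiles, pred: Predicate) -> Set[str]:
    """Return the IDs of all users whose tags satisfy the predicate."""
    return {uid for uid, tags in profiles.items() if pred(tags)}


# Illustrative data only; tag names are made up for this example.
profiles = {
    "u1": {"city:beijing", "interest:rental"},
    "u2": {"city:shanghai", "interest:used_car"},
    "u3": {"city:beijing", "interest:used_car"},
}

# Users in Beijing AND (interested in rentals OR used cars).
group = select_users(
    profiles,
    all_of(
        has("city:beijing"),
        any_of(has("interest:rental"), has("interest:used_car")),
    ),
)
```

Composing predicates as small functions keeps nested AND/OR conditions readable; a production filter would more likely translate the same expression into a query against the offline tag tables.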

2. Data Architecture of the 58 User Profile

The core of profile construction is data organization and tag management. 58’s business spans real estate, recruitment, vehicles, yellow pages, etc., and data originates from logs, resume databases, post databases, user info databases, merchant databases, and certification databases. Logs alone include PC, mobile, and app logs from multiple sub‑products, making data integration the first major challenge.

To address this, we built an IDMapping model that unifies IDs across the group.

(Figure: IDMapping model diagram.)

IDMapping is a key module that maps various source IDs to a unique user ID, allowing a single account or phone number to retrieve all behaviors across business lines. It also merges multiple IDs into one persona, increasing data density, improving personalization matching, and enabling targeted governance of problematic users.

We construct an ID association graph from co-occurring IDs, then split the graph based on business and temporal attributes. Currently, IDMapping stores dozens of ID types and more than 10 billion IDs. To handle the growing data volume and computation cost, we designed full-update and incremental-update pipelines that meet the daily update requirement.
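One common way to turn co-occurrence pairs into per-user ID groups is a union-find (disjoint-set) pass that computes connected components. The sketch below shows that technique only; the article does not say which algorithm 58 uses, the ID formats are made up, and the subsequent graph split by business and temporal attributes is not modeled here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_id_groups(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group IDs linked by co-occurrence into connected components.

    Returns a mapping from a representative ID (the lexicographically
    smallest ID in the component) to the sorted list of member IDs.
    """
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        # Path halving shortens chains as we walk up to the root.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the smaller root so the representative is deterministic.
            if rb < ra:
                ra, rb = rb, ra
            parent[rb] = ra

    for a, b in pairs:
        union(a, b)

    groups: Dict[str, List[str]] = defaultdict(list)
    for node in sorted(parent):
        groups[find(node)].append(node)
    return dict(groups)


# Illustrative co-occurrence pairs; the ID formats are hypothetical.
pairs = [
    ("phone:138", "cookie:abc"),
    ("cookie:abc", "device:xyz"),
    ("phone:139", "account:42"),
]
groups = build_id_groups(pairs)
```

At the scale of billions of IDs this would run as a distributed job rather than in memory, and an incremental update would only need to union the newly observed pairs into the existing components.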

Based on IDMapping, the system architecture is divided into three layers:

Data Resource Management Layer: Controls data ingestion tools, scheduling, quality monitoring, and metadata management.

Profile Tag Production Layer: Includes ETL, IDM, data aggregation, tag extraction, and algorithmic strategy tools.

Storage and Application Layer: Stores online and offline tables for downstream consumption.

During tag generation, raw data is abstracted into a seven‑tuple {userID, time, location, category, behavior, entity, other} for behavior data, and a five‑tuple {entity, time, location, category, attribute‑table} for posts. This abstraction enables unified management across heterogeneous sources. In the data fusion layer, behavior data is converted via IDM and aggregated daily, then further aggregated across multiple time slices. Algorithms and rule‑based strategies then produce various tags for the application layer.
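As a rough illustration of the seven-tuple abstraction and the two aggregation steps, the sketch below models a behavior record as a dataclass and counts events per day and then across days. The field types, the aggregation keys, and the function names are assumptions; the article specifies only the tuple fields and that aggregation happens daily and then over multiple time slices.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class BehaviorRecord:
    """Seven-tuple for behavior data; field types are assumptions."""
    user_id: str      # assumed to be the unified ID after IDM conversion
    time: datetime
    location: str
    category: str
    behavior: str
    entity: str
    other: str = ""


# Daily aggregation key: (user_id, ISO day, category, behavior).
DailyKey = Tuple[str, str, str, str]
WindowKey = Tuple[str, str, str]


def aggregate_daily(records: Iterable[BehaviorRecord]) -> Dict[DailyKey, int]:
    """Count events per user, day, category, and behavior."""
    counts: Counter = Counter()
    for r in records:
        counts[(r.user_id, r.time.date().isoformat(), r.category, r.behavior)] += 1
    return dict(counts)


def aggregate_window(daily: Dict[DailyKey, int]) -> Dict[WindowKey, int]:
    """Roll daily counts up across every day present in the input."""
    totals: Counter = Counter()
    for (user_id, _day, category, behavior), n in daily.items():
        totals[(user_id, category, behavior)] += n
    return dict(totals)


# Illustrative records only.
records = [
    BehaviorRecord("u1", datetime(2024, 1, 1, 9), "beijing", "rental", "view", "post:1"),
    BehaviorRecord("u1", datetime(2024, 1, 1, 10), "beijing", "rental", "view", "post:2"),
    BehaviorRecord("u1", datetime(2024, 1, 2, 8), "beijing", "rental", "view", "post:1"),
]
daily = aggregate_daily(records)
window = aggregate_window(daily)
```

Keeping the daily counts as an intermediate result means longer windows can be rebuilt by summing daily slices instead of rescanning raw logs.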

Because many data sources contribute to the same tag, different generation strategies are applied and the results are merged. For example, gender tags may come from resumes, registration info, or a classification algorithm that predicts gender from interests. Reliable sources are prioritized, then algorithmic predictions fill gaps, with each source weighted appropriately to mitigate sparsity.
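The merge step can be sketched as a weighted vote across sources. The weights, the source names, and the `merge_tag` function below are hypothetical; the article states only that reliable sources take priority, that algorithmic predictions fill gaps, and that each source is weighted.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical reliability weights per source; the real values are not given.
SOURCE_WEIGHTS: Dict[str, float] = {
    "resume": 1.0,
    "registration": 0.8,
    "classifier": 0.4,
}


def merge_tag(candidates: List[Tuple[str, str, float]]) -> Optional[str]:
    """Merge (source, value, confidence) candidates into one tag value.

    Each candidate votes with its source weight times its confidence,
    and the value with the highest total score wins. Returns None when
    there are no candidates at all.
    """
    scores: Dict[str, float] = defaultdict(float)
    for source, value, confidence in candidates:
        scores[value] += SOURCE_WEIGHTS.get(source, 0.0) * confidence
    if not scores:
        return None
    return max(scores, key=lambda value: scores[value])


# A declared value from a resume outweighs a conflicting prediction.
declared = merge_tag([("resume", "female", 1.0), ("classifier", "male", 0.9)])
# With no declared value, the classifier's prediction fills the gap.
predicted = merge_tag([("classifier", "male", 0.7)])
```

With these weights a high-confidence prediction still loses to a declared value, but it supplies a tag for users who have no resume or registration data, which is how the sparsity is reduced.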

3. Application Scenarios and Case Studies

Philip Kotler noted in "Marketing Management" that retaining an existing customer is far cheaper than acquiring a new one, and that reducing churn by 5% can increase profit by 25-85%.

User profiles are widely used for personalized matching and information-quality governance. They support more than 50 application scenarios across search, recommendation, publishing, and information security, and handle over 10 billion calls per day. They improve conversion rates in search, recommendation, push, and ad matching, and they also support financial modeling, identity verification, and anomaly detection.

The profiling platform supports a five‑level modeling abstraction:

Basic Data Ingestion: Connects core sources (58, Ganji, Anjuke, etc.) and allows custom data source integration.

IDMapping: Retrieves associated features for a user, supporting additional IDs such as payment codes.

Profile Tags: Currently over 2,300 tags across seven major categories, with the ability to define new tags.

User Features: Extracts modeling dimensions from the cleaned tag data, supporting offline training sample generation.

AI Algorithms: Applies suitable algorithms to business scenarios, producing model outputs and evaluation data.
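To illustrate the step from profile tags to user features, the sketch below multi-hot encodes each user's tags against a shared vocabulary, which is one simple way to produce offline training samples. The tag names and function names are made up, and the article does not describe how 58 actually builds its features.

```python
from typing import Dict, List, Set


def build_vocabulary(profiles: Dict[str, Set[str]]) -> List[str]:
    """Collect every tag seen in the profiles, in a stable sorted order."""
    return sorted({tag for tags in profiles.values() for tag in tags})


def to_feature_vector(tags: Set[str], vocabulary: List[str]) -> List[int]:
    """Multi-hot encode a user's tags against the vocabulary."""
    return [1 if tag in tags else 0 for tag in vocabulary]


# Illustrative profiles only.
profiles = {
    "u1": {"gender:female", "interest:rental"},
    "u2": {"gender:male", "interest:rental", "device:android"},
}
vocab = build_vocabulary(profiles)
features = {uid: to_feature_vector(tags, vocab) for uid, tags in profiles.items()}
```

With thousands of tags and an average of about 100 per user, a sparse representation would be a better fit than dense lists in practice.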
