Beike DMP Platform: Architecture, Implementation Challenges, and Business Impact
The article describes Beike's Data Management Platform (DMP), in development since May 2018, covering its overall architecture, data collection, processing, real-time profiling, storage solutions, application scenarios, achieved performance metrics, and future development directions.
1. Background: To better understand real user needs, provide differentiated services, and achieve refined user operations, Beike launched its DMP in May 2018. The platform collects diverse user data, tags user interests, and enables personalized recommendation, search, content guidance, and precise advertising or push messaging.
2. Challenges: The platform needed to unify user identities, handle massive volumes of real‑time behavior data, estimate audience sizes within seconds, and compute complex audiences within minutes.
3. Implementation:
3.1 Overall Architecture
3.2 Data Collection Layer: Collects online and offline user behavior; a unified tracking specification ("Luopan") was introduced in early 2018 to provide a solid data foundation.
3.3 Data Processing Layer: Builds a wide table (topic table) to flatten data, unifies user identity across devices and identifiers (IMEI, IDFA, app‑generated IDs, UCID), and generates three types of tags: basic/behavioral tags, preference scores, and predictive labels produced by classification and clustering algorithms.
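The source does not say how identity unification is implemented. One common approach is to treat every identifier as a node and merge identifiers that are observed together, using a union-find structure. The sketch below illustrates that idea only; the class name `IdentityGraph`, the `type:value` identifier strings, and the sample data are hypothetical.

```python
from collections import defaultdict


class IdentityGraph:
    """Union-find over identifiers such as IMEI, IDFA, app-generated IDs, and UCID."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        # Register unseen identifiers as their own root.
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def link(self, a, b):
        """Record that identifiers a and b were seen together (e.g. in one login event)."""
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            # Keep the lexicographically smaller root so results are deterministic.
            if rb < ra:
                ra, rb = rb, ra
            self.parent[rb] = ra

    def groups(self):
        """Return the sets of identifiers that resolve to the same user."""
        out = defaultdict(set)
        for x in list(self.parent):
            out[self.find(x)].add(x)
        return list(out.values())


# Hypothetical observations: two devices share one UCID, a third device is unrelated.
graph = IdentityGraph()
graph.link("imei:A1", "ucid:1001")
graph.link("idfa:B2", "ucid:1001")
graph.link("imei:C3", "app:xyz")
same_user = graph.find("imei:A1") == graph.find("idfa:B2")
```

Here `same_user` is true because both devices were linked through the same UCID, while `imei:C3` stays in its own group. A production pipeline would also need rules for shared devices and stale links, which this sketch ignores.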
3.4 Real‑time Profiling: Uses Spark Streaming to consume behavior data, stores it in HBase wide tables, updates counts atomically, and caches real‑time preferences in Redis, so that user profiles are refreshed within seconds of a behavior event.
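The update flow described above (consume a micro-batch, increment per-user counters, refresh the cached preference) can be sketched without any of the real infrastructure. In the sketch below, plain dictionaries stand in for the HBase wide table and the Redis cache, and the event shape `(user_id, dimension, value)` and all names are assumptions made for illustration, not Beike's schema.

```python
from collections import Counter, defaultdict


class RealtimeProfileStore:
    """In-memory stand-in for the counter table (HBase) and preference cache (Redis)."""

    def __init__(self):
        self.counts = defaultdict(Counter)  # user_id -> {(dimension, value): count}
        self.preference_cache = {}          # user_id -> {dimension: most frequent value}

    def apply_batch(self, events):
        """Apply one micro-batch of (user_id, dimension, value) behavior events."""
        # Pre-aggregate within the batch so each counter is incremented only once.
        deltas = Counter(events)
        touched = set()
        for (user_id, dimension, value), n in deltas.items():
            self.counts[user_id][(dimension, value)] += n
            touched.add(user_id)
        # Refresh the cached preferences only for users seen in this batch.
        for user_id in touched:
            self.preference_cache[user_id] = self._top_preferences(user_id)

    def _top_preferences(self, user_id):
        best = {}
        for (dimension, value), n in self.counts[user_id].items():
            if dimension not in best or n > best[dimension][1]:
                best[dimension] = (value, n)
        return {dimension: value for dimension, (value, _) in best.items()}


# Hypothetical events from two consecutive micro-batches.
store = RealtimeProfileStore()
store.apply_batch([
    ("u1", "district", "Chaoyang"),
    ("u1", "district", "Haidian"),
    ("u1", "district", "Chaoyang"),
])
store.apply_batch([
    ("u1", "price_band", "3-5M"),
    ("u2", "district", "Haidian"),
])
```

After both batches, the cached preference for `u1` is `Chaoyang` for `district` and `3-5M` for `price_band`. In a real deployment the increments would need to be atomic on the storage side, as the article notes, because several streaming tasks may update the same user concurrently.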
3.5 Application Data Storage Layer: Uses ClickHouse, MongoDB, and HBase for different workloads. Spark jobs import Hive data into ClickHouse, where bitmap operations support fast audience estimation; data is also synced to MongoDB for push services and to HBase/Redis for low‑latency personalized services.
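Bitmap-based audience estimation works by keeping, for each tag, one bitmap whose set bits are the indices of the users carrying that tag. Combining tags then reduces to bitwise AND, OR, and AND NOT, and the audience size is the number of set bits. ClickHouse provides bitmap functions for this (for example `bitmapAnd` and `bitmapCardinality`), but the source does not say which ones Beike uses. The sketch below shows the principle with Python integers as bitmaps; the tag names and user indices are made up.

```python
class TagBitmaps:
    """Maps each tag to a bitmap (a Python int) whose set bits are user indices."""

    def __init__(self):
        self.bitmaps = {}

    def add(self, tag, user_index):
        # Set the bit at position user_index for this tag.
        self.bitmaps[tag] = self.bitmaps.get(tag, 0) | (1 << user_index)

    def get(self, tag):
        # Unknown tags behave as an empty audience.
        return self.bitmaps.get(tag, 0)


def cardinality(bitmap):
    """Count set bits, i.e. the number of users in the audience."""
    return bin(bitmap).count("1")


# Hypothetical tags over users numbered 0..5.
tags = TagBitmaps()
for idx in (0, 1, 2, 5):
    tags.add("city=Beijing", idx)
for idx in (1, 2, 3):
    tags.add("intent=buy", idx)
tags.add("pushed_last_7d", 2)

# Audience: in Beijing AND intends to buy AND NOT pushed in the last 7 days.
audience = tags.get("city=Beijing") & tags.get("intent=buy") & ~tags.get("pushed_last_7d")
audience_size = cardinality(audience)

# Union of two tags, e.g. for reach estimation.
reach = cardinality(tags.get("city=Beijing") | tags.get("intent=buy"))
```

With this data, `audience_size` is 1 (only user 1 qualifies) and `reach` is 5. Because each combination is a handful of bitwise operations over compact structures, estimates of this kind can plausibly return within seconds even for large user bases, which matches the goal stated in the Challenges section.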
3.6 Application Layer: Provides functionalities such as audience definition, audience insight, tag management, tag marketplace, look‑alike audience expansion, and push messaging, all powered by the underlying data platform.
4. Effects: The platform processes billions of events per day, has its data ready by 10 a.m. each day, and supports over 400 million daily API calls with an average response time of 5 ms, enabling personalized services across DSP advertising, recommendation, search, homepage layout, and more.
5. Summary: After two years of iteration, Beike's DMP has become a core platform supporting a wide range of scenarios, delivering personalized services and refined operations.
6. Outlook: Future work includes deepening tag coverage with predictive models, continuously improving effectiveness, and further platformization of the DMP.
Beike Product & Technology