Evolution and Architecture of Beike Commercial Strategy Algorithm Platform
This article details the evolution of Beike's commercial strategy algorithm platform, describing its business scenarios, bidding mechanisms, architecture redesign across online, near‑real‑time, and offline layers, model training, vector retrieval, service governance, and the performance and stability improvements achieved.
The article introduces the rapid growth of traffic on the Beike real‑estate app and the increasing complexity of its commercial scenarios, which demand higher efficiency for business iteration and effect optimization.
It explains the core business flow: users browse listings, a real‑time bidding system selects an agent, and interactions are charged using Beike coins, making agents the advertisers.
Key technical components include eCPM‑based ranking (bid price × estimated CVR), a range of commercial products (performance ads, contract ads, organic slots), and distinct payment and settlement models, such as CPA settlement that charges only after three user–agent interactions.
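The eCPM ranking rule described above (bid price weighted by estimated conversion rate) can be sketched in a few lines. All names and numbers here are hypothetical, not taken from Beike's system:

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    """A hypothetical ad candidate: an agent bidding for an interaction slot."""
    agent_id: str
    bid_price: float  # bid per conversion, in Beike coins (illustrative unit)
    est_cvr: float    # model-estimated conversion rate

def ecpm(c: Candidate) -> float:
    # eCPM-style score: bid price weighted by estimated CVR
    return c.bid_price * c.est_cvr

def rank(candidates):
    """Order candidates by descending eCPM score."""
    return sorted(candidates, key=ecpm, reverse=True)

bids = [
    Candidate("agent_a", bid_price=10.0, est_cvr=0.02),
    Candidate("agent_b", bid_price=6.0, est_cvr=0.05),
    Candidate("agent_c", bid_price=20.0, est_cvr=0.01),
]
print([c.agent_id for c in rank(bids)])  # agent_b ranks first: 6.0 * 0.05 = 0.30
```

Note that a lower bid can still win the auction if the estimated CVR is high enough, which is exactly why CVR estimation quality matters so much in this design.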
Architecture evolution is traced from 2018 to 2020. Early versions relied on a simple Python/Tornado service, MySQL‑to‑Hive pipelines, and an ADX layer using Redis. By 2019, a Python service performed eCPM estimation, but performance, stability, and data‑isolation issues emerged.
In response, a comprehensive redesign introduced a three‑layer strategy algorithm middle‑platform:
Online Service Layer – high‑performance, stable Java services with configurable micro‑service orchestration, AB testing, and service governance.
Near‑Real‑Time Layer – minute‑level data collection, feature processing, and model updates using Kafka, Redis, Milvus/FAISS vector search, and Elasticsearch.
Offline Computing Layer – hour‑level batch ETL, feature engineering, and model training (LR, LightGBM, Wide&Deep, DeepFM) on Hive/HDFS.
The redesign replaced Python with Java to eliminate GIL limitations, introduced service governance frameworks for reliability, and split services both horizontally (by business domain) and vertically (by functional modules) to reduce coupling.
Detailed designs of the online service layer include a central control service that reads configuration from MySQL, orchestrates sub‑services (recall, filtering, feature processing, ranking, fusion), and applies dynamic timeout and circuit‑breaker policies.
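The orchestration pattern above, where the central control service degrades gracefully when a sub‑service misbehaves, can be sketched with a minimal circuit breaker. This is an illustrative stand‑in (the article names Sentinel for this role in the real stack), and every identifier below is hypothetical:

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: after `max_failures` consecutive failures,
    reject calls for `cooldown` seconds, then allow a trial call."""
    def __init__(self, max_failures=3, cooldown=30.0):
        self.max_failures = max_failures
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed (healthy)

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.cooldown:
            return True  # half-open: permit one trial request
        return False

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = time.monotonic()

def call_subservice(fn, breaker, fallback):
    """Orchestrator wrapper: skip a degraded sub-service and use a fallback,
    so one slow recall or ranking module cannot stall the whole pipeline."""
    if not breaker.allow():
        return fallback
    try:
        result = fn()
        breaker.record_success()
        return result
    except Exception:
        breaker.record_failure()
        return fallback
```

A dynamic timeout policy fits the same wrapper: the orchestrator reads per‑sub‑service budgets from configuration and treats a deadline overrun as a failure.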
Recall mechanisms combine KV‑based rule recall, Elasticsearch queries, and vector similarity search using FAISS/Milvus. Filtering handles budget control, frequency caps, and business‑specific rules via RPC.
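The recall‑then‑filter flow can be sketched end to end with a brute‑force cosine scan standing in for FAISS/Milvus (which replace the linear scan with an approximate index at scale) and a frequency cap as the filtering stage. Item IDs and vectors here are made up for illustration:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def vector_recall(query, index, top_k=10):
    """Brute-force nearest-neighbour recall over an {id: vector} index."""
    scored = sorted(index.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
    return [item_id for item_id, _ in scored[:top_k]]

def apply_frequency_cap(items, impressions: Counter, cap=3):
    """Filtering stage: drop candidates the user has already seen `cap` times."""
    return [i for i in items if impressions[i] < cap]

index = {"ad1": (1.0, 0.0), "ad2": (0.7, 0.7), "ad3": (0.0, 1.0)}
seen = Counter({"ad1": 3})                     # ad1 has hit its frequency cap
recalled = vector_recall((0.9, 0.1), index, top_k=3)
print(apply_frequency_cap(recalled, seen))     # ['ad2', 'ad3']
```

Budget control and business rules slot into the same filtering stage; the article notes these checks are delegated to downstream services over RPC rather than computed inline.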
Model serving utilizes PMML files for lightweight models and a TensorFlow‑Serving‑like platform for deep learning models, supporting TensorFlow and PyTorch.
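The appeal of the PMML path is that a model trained offline is exported as a declarative file and re‑evaluated in the online service without shipping training code. For a logistic regression, the evaluation a PMML scorer performs amounts to the sigmoid of a weighted feature sum, sketched here with hypothetical coefficients:

```python
import math

def lr_score(features: dict, weights: dict, bias: float) -> float:
    """Score with exported LR coefficients, as a PMML evaluator would:
    sigmoid of bias plus the weighted feature sum."""
    z = bias + sum(weights.get(name, 0.0) * value
                   for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients, as if read from an exported model file
weights = {"recent_clicks": 0.8, "listing_quality": 1.2}
score = lr_score({"recent_clicks": 1.0, "listing_quality": 0.5}, weights, bias=-1.0)
print(round(score, 3))  # 0.599
```

Deep models (Wide&Deep, DeepFM) do not fit this lightweight path, which is why the article pairs PMML with a separate TensorFlow‑Serving‑style platform for them.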
Micro‑service technology stack includes Spring Cloud Gateway, Eureka service registry, Feign/HTTP RPC, Ribbon load balancing, Sentinel for rate limiting, Apollo for configuration, and Prometheus‑Grafana for monitoring.
Post‑restructuring results show service availability improving from 97% to 99.99%, a reduction in required capacity (from 36 eight‑core instances to 24 four‑core instances), and faster AB‑test rollout through minute‑level configuration updates.
The article concludes with future directions: expanding the AI platform to other domains (image analysis, valuation), supporting multiple programming languages, further RPC performance optimization, and exploring more deep‑learning models for online deployment.
DataFunTalk