Design and Implementation of 58.com Commercial Recruitment Recommendation System
This article presents a comprehensive overview of the 58.com commercial recruitment recommendation system, detailing its business challenges, system architecture, region‑based and behavior‑based recall strategies, coarse‑ and fine‑ranking models, bias handling, evaluation methods, and future directions.
The 58.com recruitment platform concentrates its commercial traffic on recommendation scenarios, aiming to efficiently match a large, diverse pool of job seekers (C‑side) with a relatively small set of commercial job postings (B‑side). The core problem is improving bidirectional matching while monetizing commercial traffic.
System Overview: The commercial recommendation pipeline consists of multi‑channel recall, filtering, and ranking. It combines user and item data with machine‑learning models to generate personalized job recommendations.
Recall Stage includes:
Region‑based recall using DBSCAN clustering on user GPS data to identify dense user areas and retrieve top‑clicked job categories within each region.
Behavior‑based recall employing the EGES model: user click logs are cleaned and segmented into three‑hour sessions, a graph of job posts is built from in‑session click transitions, random walks over the graph generate node sequences, and a Word2Vec‑style skip‑gram is trained on them with side information (tags, categories, salary) to produce post embeddings. For online retrieval, each L2‑normalized post vector is concatenated with a 16‑bit binary city encoding (0 bits mapped to −1), and candidates are ranked by cosine similarity.
Additional recall methods such as content‑based (title Word2Vec) and rule‑based (category expansion) are also applied.
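The region-based recall above clusters users' GPS points with DBSCAN. As a minimal sketch of that clustering step, here is a pure-NumPy DBSCAN over planar coordinates; the `eps` and `min_pts` values are illustrative, and a production system clustering raw latitude/longitude would use a haversine distance rather than the Euclidean distance shown here.

```python
import numpy as np

def dbscan(points, eps, min_pts):
    """Toy DBSCAN: label dense clusters 0,1,...; noise points get -1."""
    n = len(points)
    labels = np.full(n, -1)          # -1 means noise until proven otherwise
    # Pairwise Euclidean distances (production GPS clustering: haversine)
    dists = np.linalg.norm(points[:, None] - points[None, :], axis=2)
    neighbors = [np.where(dists[i] <= eps)[0] for i in range(n)]
    visited = np.zeros(n, dtype=bool)
    cluster = 0
    for i in range(n):
        # only an unvisited core point (dense neighborhood) seeds a cluster
        if visited[i] or len(neighbors[i]) < min_pts:
            continue
        visited[i] = True
        labels[i] = cluster
        stack = [i]
        while stack:
            j = stack.pop()
            for k in neighbors[j]:
                if labels[k] == -1:
                    labels[k] = cluster      # border or core point joins cluster
                if not visited[k]:
                    visited[k] = True
                    if len(neighbors[k]) >= min_pts:
                        stack.append(k)      # core points keep expanding
        cluster += 1
    return labels
```

Each resulting cluster would then serve as a "region" whose top-clicked job categories feed the recall candidate pool.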
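The online retrieval step of the behavior-based recall can be sketched as follows. A hypothetical `build_index_vector` concatenates the L2-normalized post embedding with a 16-bit binary encoding of the city id (0 bits mapped to −1), so cosine similarity rewards same-city matches; the function names and the 2-dimensional toy embeddings are assumptions for illustration.

```python
import numpy as np

def city_code(city_id, bits=16):
    """16-bit binary encoding of the city id, with 0 bits mapped to -1."""
    code = np.array([(city_id >> i) & 1 for i in range(bits)], dtype=float)
    code[code == 0] = -1.0
    return code

def build_index_vector(post_emb, city_id):
    """L2-normalize the post embedding, then append the city encoding."""
    v = post_emb / np.linalg.norm(post_emb)
    return np.concatenate([v, city_code(city_id)])

def cosine_topk(query_vec, index, k):
    """Rank indexed posts by cosine similarity to the query vector."""
    sims = index @ query_vec / (
        np.linalg.norm(index, axis=1) * np.linalg.norm(query_vec))
    return np.argsort(-sims)[:k]
```

Because every city code has the same magnitude, a same-city post always gains a fixed similarity bonus over an otherwise identical post in another city.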
Ranking Stage is divided into coarse‑ranking and fine‑ranking:
Coarse‑ranking aims to select 200–300 candidates within 15 ms. Its models evolved from rule‑based scoring to logistic regression, then to a dual‑tower architecture, and finally to knowledge distillation, in which the complex fine‑ranking model provides soft targets for a lightweight student.
Fine‑ranking addresses position bias and jointly models CTR and CVR. Its models progressed from a baseline that treats position as a bias feature to DIN‑bias (a dedicated sub‑network for position), then to W3DA (a dual‑tower model with Wide sub‑networks capturing first‑ and second‑order feature interactions), and finally to MultiTask‑W3DA, which jointly optimizes CTR, CTCVR, and conversion.
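The distillation step in coarse-ranking trains a lightweight student against soft targets produced by the fine-ranking teacher. A minimal sketch of one plausible loss, assuming a binary-CTR setting and a mixing weight `alpha` (both illustrative choices, not details confirmed by the talk):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def distill_loss(student_logits, hard_labels, teacher_probs, alpha=0.5):
    """Blend cross-entropy on hard click labels with cross-entropy
    against the teacher's soft CTR predictions."""
    eps = 1e-7
    p = np.clip(sigmoid(student_logits), eps, 1 - eps)
    hard = -(hard_labels * np.log(p) + (1 - hard_labels) * np.log(1 - p))
    soft = -(teacher_probs * np.log(p) + (1 - teacher_probs) * np.log(1 - p))
    return np.mean(alpha * hard + (1 - alpha) * soft)
```

The soft term lets the cheap student absorb the teacher's ranking preferences even on impressions where the hard label is uninformative.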
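MultiTask-W3DA's joint optimization of CTR and CTCVR resembles the ESMM formulation, where pCTCVR = pCTR × pCVR, so CVR is learned implicitly over the full impression space rather than only on clicked samples. A hedged NumPy sketch of that loss structure (function names and the equal task weighting are assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(p, y, eps=1e-7):
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def esmm_loss(ctr_logits, cvr_logits, click, convert):
    """Supervise CTR on clicks and CTCVR = CTR * CVR on conversions,
    both over all impressions; CVR itself needs no direct label."""
    p_ctr = sigmoid(ctr_logits)
    p_cvr = sigmoid(cvr_logits)
    p_ctcvr = p_ctr * p_cvr
    return bce(p_ctr, click) + bce(p_ctcvr, click * convert)
```

Because both losses are defined on impressions, the CVR tower avoids the sample-selection bias of training only on the clicked subset.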
Evaluation uses offline AUC and online A/B tests, monitoring metrics such as CTR, CVR, CTCVR, and ACP. The Q&A section discusses clustering metrics, embedding evaluation, negative sampling, and future plans like deploying models on edge devices.
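Offline AUC can be computed from ranks alone, as the probability that a randomly chosen positive sample is scored above a randomly chosen negative one. A minimal sketch (ties are left unhandled here; a production evaluator would average tied ranks):

```python
import numpy as np

def auc(scores, labels):
    """Rank-statistic AUC: P(score of random positive > score of random negative)."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)   # 1-based ranks, ascending score
    pos = labels == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    # Mann-Whitney U statistic normalized by the number of pos/neg pairs
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```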
Future work includes improving cold‑start recall, exploring more expressive ranking models (e.g., DCNv2), and enhancing the recommendation ecosystem for healthier commercial traffic.
DataFunSummit