
Ranking Strategy Optimization Practice for 58's Commercial Traffic

This article details the comprehensive optimization of 58's commercial traffic ranking system. It covers data‑flow upgrades, advanced feature engineering, model enhancements (online training, a multi‑task model, and a relevance model), and a multi‑factor ranking mechanism, which together improve monetization efficiency and user experience.

58 Tech

Oct 28, 2019


Background: 58 is China’s largest life‑information service platform, offering local services, recruitment, real estate, used cars, and more. The commercial department serves these business lines, improving monetization efficiency while meeting each line's diverse traffic‑monetization needs.

Before optimization, the baseline ranking system consisted of four parts: data flow, feature engineering, model computation, and the ranking mechanism. The data flow collected raw logs (impressions, clicks, and conversions), which were merged, cleaned, and passed through anti‑fraud filtering to produce hundreds of base fields used downstream for sample generation and metric calculation.

Feature engineering covered feature design, processing, and selection, including position‑bias correction with COEC (clicks over expected clicks), Bayesian smoothing, discretization, and feature selection based on information gain or AUC.
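The two statistical corrections named above can be sketched in a few lines. This is a minimal Python illustration, not 58's implementation. It assumes COEC is computed as observed clicks divided by the clicks expected from a per‑position reference CTR (the `ref_ctr_by_pos` table is an assumed input, typically an average CTR per slot), and it fits the Beta prior for Bayesian smoothing by the method of moments, which is one common choice; the article does not say how the prior was estimated. All function names are hypothetical.

```python
from typing import Dict, List, Tuple


def coec(clicks_by_pos: Dict[int, int],
         impressions_by_pos: Dict[int, int],
         ref_ctr_by_pos: Dict[int, float]) -> float:
    """Clicks over expected clicks: observed clicks divided by the clicks an
    average item would have received at the same positions. A value above 1
    means the item beats what its positions alone would predict."""
    actual = sum(clicks_by_pos.values())
    expected = sum(n * ref_ctr_by_pos[pos] for pos, n in impressions_by_pos.items())
    return actual / expected if expected > 0 else 0.0


def fit_beta_prior(ctrs: List[float]) -> Tuple[float, float]:
    """Method-of-moments estimate of a Beta(alpha, beta) prior from the
    historical CTRs of many items (an assumed choice of estimator)."""
    m = sum(ctrs) / len(ctrs)
    var = sum((c - m) ** 2 for c in ctrs) / len(ctrs)
    if not 0.0 < var < m * (1.0 - m):
        raise ValueError("method of moments requires 0 < variance < m(1 - m)")
    common = m * (1.0 - m) / var - 1.0
    return m * common, (1.0 - m) * common


def bayes_smoothed_ctr(clicks: int, impressions: int,
                       alpha: float, beta: float) -> float:
    """Posterior mean CTR under a Beta(alpha, beta) prior: items with few
    impressions are pulled toward the prior mean alpha / (alpha + beta)."""
    return (clicks + alpha) / (impressions + alpha + beta)
```

For example, an item with 1 click in 2 impressions has a raw CTR of 0.5, but with a Beta(2, 38) prior (mean 0.05) its smoothed CTR is 3/42, roughly 0.071, which is a far safer feature value for a sparse item.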

Model computation supported FTRL, GBDT, and Wide & Deep models, updated daily in batch or incremental mode; Wide & Deep models were served for online inference through the TensorFlow Serving API.

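The ranking mechanism originally supported only single‑factor sorting (by CTR, CVR, or eCPM). The system as a whole also suffered from feature diff, that is, a mismatch between the features used to build training samples offline and the features seen online at serving time, as well as insufficient timeliness.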

Data‑flow upgrade: To address feature diff and latency, two real‑time streams were built. The first uses Spark Streaming to stitch impression and click logs together in real time and generate training samples, with a latency of about 1 minute; this reduced the feature‑diff rate from 5% to under 1%. The second uses Storm (with a planned move to Flink) to compute real‑time metrics, which are updated within 3 seconds.
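The core of real‑time sample generation is a stateful join between the impression stream and the click stream. The sketch below shows that join logic in plain Python rather than as a Spark Streaming job; the class and field names are hypothetical, and the 60‑second attribution window is an assumption chosen to match the roughly one‑minute latency described above. Out‑of‑order events (a click arriving before its impression) are simply dropped here, which a production job would need to handle.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple

Sample = Tuple[Dict[str, Any], int]  # (features, label)


@dataclass
class Impression:
    impression_id: str
    ts: float                  # event time in seconds
    features: Dict[str, Any]   # features as logged at serving time


@dataclass
class Click:
    impression_id: str
    ts: float


class SampleStitcher:
    """Joins clicks to impressions within a fixed attribution window.

    An impression becomes a positive sample as soon as a matching click
    arrives within `window` seconds, and a negative sample once the window
    has expired without a click (emitted by `flush`)."""

    def __init__(self, window: float = 60.0) -> None:
        self.window = window
        self.pending: Dict[str, Impression] = {}

    def on_impression(self, imp: Impression) -> None:
        self.pending[imp.impression_id] = imp

    def on_click(self, click: Click) -> Optional[Sample]:
        imp = self.pending.get(click.impression_id)
        if imp is not None and click.ts - imp.ts <= self.window:
            del self.pending[click.impression_id]
            return (imp.features, 1)
        return None  # unknown impression, already emitted, or late click

    def flush(self, now: float) -> List[Sample]:
        """Emit negative samples for impressions whose window has passed."""
        expired = [i for i, imp in self.pending.items()
                   if now - imp.ts > self.window]
        return [(self.pending.pop(i).features, 0) for i in expired]
```

Because the features travel with the impression record, each sample reflects what the model actually saw at serving time rather than a later snapshot joined offline, which is one plausible reason this kind of stitching lowers the feature‑diff rate.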

Feature upgrade: Discrete ID features (e.g., car model and brand) were enhanced with embeddings and attention mechanisms. User interest vectors were generated by embedding the discrete attributes a user interacted with, weighting each embedding by its interaction count, and pooling the results. Cross features between user and post attributes were embedded as well.
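A minimal sketch of the two pooling styles mentioned above, assuming a lookup table of already‑trained attribute embeddings (the `table` contents and attribute keys are hypothetical). The article says only that embeddings are weighted by interaction counts and pooled; a count‑weighted average is assumed here. The attention variant is a generic dot‑product softmax attention against a query vector such as the candidate post's embedding, shown as one common way to apply attention, not as 58's exact design.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def weighted_pool(embeddings: List[Vector], weights: List[float]) -> List[float]:
    """Weighted average of embeddings (weights = interaction counts)."""
    total = sum(weights)
    dim = len(embeddings[0])
    return [sum(w * e[d] for e, w in zip(embeddings, weights)) / total
            for d in range(dim)]


def user_interest_vector(table: Dict[str, Vector],
                         counts: Dict[str, int], dim: int) -> List[float]:
    """Count-weighted pooling over the attributes a user interacted with.
    Attributes missing from the embedding table are skipped."""
    known = [(table[a], c) for a, c in counts.items() if a in table and c > 0]
    if not known:
        return [0.0] * dim
    embs, ws = zip(*known)
    return weighted_pool(list(embs), list(ws))


def attention_pool(embeddings: List[Vector], query: Vector) -> List[float]:
    """Softmax attention: weight each embedding by its dot product with the
    query, so attributes related to the candidate post dominate."""
    scores = [sum(q * x for q, x in zip(query, e)) for e in embeddings]
    m = max(scores)                       # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return weighted_pool(embeddings, [x / z for x in exps])
```

With `table = {"brand:A": [1.0, 0.0], "brand:B": [0.0, 1.0]}` and counts `{"brand:A": 3, "brand:B": 1}`, the interest vector is `[0.75, 0.25]`: the more frequently engaged brand dominates the user's representation.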

Model upgrade: Three upgrades were introduced:

Online training (a real‑time model) with FTRL: the model is dumped every 30 minutes and hot‑loaded by the online service.

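A multi‑task model (ESMM) that jointly learns pCTR and pCTCVR (where pCTCVR = pCTR × pCVR) over the entire impression space. Because CVR is no longer trained only on clicked samples, this mitigates sample‑selection bias and data sparsity.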

A relevance model for the coarse‑ranking stage, which computes the cosine similarity between user and post vectors (learned with DSSM or item2vec) and combines strong‑intent and weak‑intent scores.
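Online training with FTRL usually means the per‑coordinate FTRL‑Proximal update for logistic regression. The sketch below is a minimal pure‑Python version of that update, not 58's code; the hyperparameter defaults are arbitrary assumptions, and `snapshot()` is a hypothetical helper standing in for the periodic model dump (the dump format and the 30‑minute scheduler are not shown).

```python
import math
from collections import defaultdict
from typing import Dict


class FTRLProximal:
    """Per-coordinate FTRL-Proximal for logistic regression on sparse input."""

    def __init__(self, alpha=0.1, beta=1.0, l1=1.0, l2=1.0) -> None:
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z: Dict[str, float] = defaultdict(float)  # accumulated adjusted gradients
        self.n: Dict[str, float] = defaultdict(float)  # accumulated squared gradients

    def _weight(self, i: str) -> float:
        z = self.z[i]
        if abs(z) <= self.l1:  # L1 keeps weak features exactly at zero
            return 0.0
        return -(z - math.copysign(self.l1, z)) / (
            (self.beta + math.sqrt(self.n[i])) / self.alpha + self.l2)

    def predict(self, x: Dict[str, float]) -> float:
        s = sum(self._weight(i) * v for i, v in x.items())
        s = max(min(s, 35.0), -35.0)  # avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-s))

    def update(self, x: Dict[str, float], y: int) -> float:
        """One online step on a single sample; returns the pre-update pCTR."""
        p = self.predict(x)
        for i, v in x.items():
            w = self._weight(i)
            g = (p - y) * v  # log-loss gradient for coordinate i
            sigma = (math.sqrt(self.n[i] + g * g) - math.sqrt(self.n[i])) / self.alpha
            self.z[i] += g - sigma * w
            self.n[i] += g * g
        return p

    def snapshot(self) -> Dict[str, float]:
        """Non-zero weights only, e.g. to dump for hot-loading."""
        weights = {}
        for i in list(self.z):
            w = self._weight(i)
            if w != 0.0:
                weights[i] = w
        return weights
```

Each streamed sample from the stitching job can be fed to `update`, while a separate timer periodically writes `snapshot()` for the online service to hot‑load; the L1 term keeps that snapshot sparse.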
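The essence of ESMM is in its loss: both supervised tasks are defined on every impression, and pCVR is learned only through the product pCTCVR = pCTR × pCVR. The sketch below shows that loss in plain Python, with the two network towers abstracted away as per‑sample logits (in ESMM they share embeddings and would be trained in a framework such as TensorFlow). Function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def bce(y: float, p: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy with clipping to avoid log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def esmm_predict(ctr_logit: float, cvr_logit: float) -> Tuple[float, float, float]:
    """Return (pCTR, pCVR, pCTCVR), with pCTCVR = pCTR * pCVR."""
    p_ctr, p_cvr = sigmoid(ctr_logit), sigmoid(cvr_logit)
    return p_ctr, p_cvr, p_ctr * p_cvr


def esmm_loss(ctr_logits: Sequence[float], cvr_logits: Sequence[float],
              clicks: Sequence[int], conversions: Sequence[int]) -> float:
    """Mean ESMM loss over all impressions.

    CTR task label: click. CTCVR task label: click AND conversion.
    The CVR tower has no direct label; it is trained through pCTCVR, so it
    sees the full impression space instead of clicked samples only."""
    total = 0.0
    for lc, lv, y_click, y_conv in zip(ctr_logits, cvr_logits, clicks, conversions):
        p_ctr, _, p_ctcvr = esmm_predict(lc, lv)
        total += bce(y_click, p_ctr) + bce(y_click * y_conv, p_ctcvr)
    return total / len(clicks)
```

At serving time the CVR tower can be used directly as pCVR, even though it was never fitted on the biased clicked‑only subset.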
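A sketch of relevance scoring for coarse ranking, assuming the user and post vectors have already been produced by DSSM or item2vec. The article does not say how strong‑ and weak‑intent scores are combined, so this version assumes the user has one vector built from strong‑intent behavior and one from weak‑intent behavior, blended linearly with a hypothetical `strong_weight`; the function names are illustrative only.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def relevance_score(strong_vec: Vector, weak_vec: Vector, post_vec: Vector,
                    strong_weight: float = 0.7) -> float:
    """Linear blend of strong- and weak-intent cosine similarity.
    The blend form and the 0.7 default are assumptions."""
    return (strong_weight * cosine(strong_vec, post_vec)
            + (1.0 - strong_weight) * cosine(weak_vec, post_vec))


def coarse_rank(strong_vec: Vector, weak_vec: Vector,
                posts: Dict[str, Vector], k: int) -> List[str]:
    """Keep the k most relevant post IDs for the fine-ranking stage."""
    ranked = sorted(posts,
                    key=lambda pid: relevance_score(strong_vec, weak_vec, posts[pid]),
                    reverse=True)
    return ranked[:k]
```

Cosine similarity is cheap enough to evaluate over many candidates, which is why it suits the coarse stage; the more expensive CTR and CVR models then run only on the survivors.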

Ranking mechanism upgrade: The new multi‑factor framework supports hierarchical sorting, weighted fusion, custom fusion functions (e.g., multiplying factors), and nesting of these layers, so each business line can configure flexible ranking over multiple factors.
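The four capabilities listed above compose naturally if every factor and every fusion is a scoring function. The sketch below is one possible design under that assumption; the article does not describe the framework's actual configuration format, so all names (`factor`, `weighted`, `product`, `bucketed`, `hierarchical_sort`) and the example fields are hypothetical. eCPM is modeled here as pCTR × bid, a common convention for CPC ads rather than a formula stated in the article.

```python
import math
from typing import Any, Callable, Dict, List, Sequence, Tuple

Item = Dict[str, Any]
Scorer = Callable[[Item], float]


def factor(name: str) -> Scorer:
    """Leaf scorer: read one precomputed factor (pctr, bid, relevance...)."""
    return lambda item: item[name]


def weighted(parts: Sequence[Tuple[Scorer, float]]) -> Scorer:
    """Weighted fusion: sum of weight * sub-score."""
    return lambda item: sum(w * s(item) for s, w in parts)


def product(parts: Sequence[Scorer]) -> Scorer:
    """Custom fusion by multiplication, e.g. pctr * bid for eCPM."""
    def score(item: Item) -> float:
        result = 1.0
        for s in parts:
            result *= s(item)
        return result
    return score


def bucketed(scorer: Scorer, width: float) -> Scorer:
    """Coarsen a score into tiers so a lower level can order within a tier."""
    return lambda item: math.floor(scorer(item) / width)


def hierarchical_sort(items: List[Item], levels: Sequence[Scorer]) -> List[Item]:
    """Hierarchical sorting: order by the first level, break ties with the next."""
    return sorted(items, key=lambda it: tuple(s(it) for s in levels), reverse=True)
```

Nesting comes for free because fused scorers are themselves scorers. For example, `hierarchical_sort(ads, [factor("tier"), weighted([(product([factor("pctr"), factor("bid")]), 0.8), (factor("rel"), 0.2)])])` groups ads by a business tier first and, within each tier, ranks by a blend of eCPM and relevance.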

Summary and outlook: After these four upgrades, the ranking system showed significant gains in monetization efficiency and connection efficiency across business lines. Future work includes introducing true exposure data, further automating feature engineering, exploring model fusion and compression, and building multi‑objective offline evaluation to automate the exploration of ranking factors.

Team introduction: The Commercial Product Technology team handles ad delivery and retrieval, supporting platforms such as the Lego Ad System, City Treasure ADX, the Intelligent Creative Platform, and a DSP, and has a strong culture of technical sharing.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.


Written by

58 Tech

Official tech channel of 58, a platform for tech innovation, sharing, and communication.

