Algorithmic Practices in Haolo Carpool Service: Platform, Matching Engine, Transaction Governance, and Intelligent Marketing
The article details Haolo's end-to-end AI platform—built on Hadoop/Yarn with Spark ML, XGBoost and TensorFlow—and explains how its matching recommendation engine, transaction-ecosystem governance models, and intelligent uplift-based marketing system jointly boost carpool efficiency, safety, user retention, and ROI.
Introduction: This article presents the practical application of AI algorithms in Haolo's carpool (顺风车, shared‑ride hitchhiking) service. It first describes the underlying algorithm platform, then details the matching recommendation engine, transaction‑ecosystem governance algorithms, and finally the intelligent marketing system.
01 Business Introduction
The Haolo machine‑learning platform is built on Hadoop/Yarn for resource scheduling and integrates Spark ML, XGBoost, and TensorFlow, supporting both CPU and GPU. It offers end‑to‑end services for data preprocessing, model training, evaluation, and online prediction.
Feature service provides offline and real‑time feature capabilities.
Model service manages algorithm versions, features, and parameters, delivering high‑availability online inference.
AB‑testing platform enables rapid, scientific validation of algorithmic improvements.
Initially, features and models were tightly coupled with business code, leading to low iteration efficiency. Decoupling them onto the ML platform accelerated deployment of carpool algorithms.
02 Matching Recommendation Engine
The engine aims to maximize transaction efficiency while preserving long‑term user retention. Its architecture consists of a data layer (real‑time context, Flink‑computed near‑real‑time signals, and offline wide tables) and a model layer with four stages: recall, coarse ranking, fine ranking, and re‑ranking.
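The four model-layer stages can be sketched as a cascading funnel. This is a minimal illustration only; the scoring functions, cut-off sizes, and `run_funnel` helper are hypothetical, not Haolo's implementation:

```python
def run_funnel(candidates, coarse_score, fine_score, rerank,
               k_coarse=1000, k_fine=100):
    """Cascade candidates through coarse ranking, fine ranking, and re-ranking.

    Recall is assumed to have happened upstream (candidates is its output).
    """
    # Coarse ranking: a cheap model trims the recalled pool.
    pool = sorted(candidates, key=coarse_score, reverse=True)[:k_coarse]
    # Fine ranking: a heavier model orders the survivors.
    pool = sorted(pool, key=fine_score, reverse=True)[:k_fine]
    # Re-ranking: business rules (diversity, fairness, freshness) finalize the list.
    return rerank(pool)
```

Each stage trades accuracy for throughput: the earlier the stage, the more candidates it must score and the cheaper its model must be.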
Recall challenges include heavy real‑time computation (e.g., geographic expansion and route planning), single‑appearance orders that cannot be modeled offline, and low‑frequency users. Solutions involve trajectory compression (80% reduction), graph‑based recall via encoding orders into discrete IDs, and grid‑based recall using historical destinations.
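The article does not name the trajectory-compression method behind the 80% reduction; a common choice for GPS trajectory simplification is the Ramer-Douglas-Peucker algorithm, sketched here on plain 2-D coordinates as an assumption-laden stand-in:

```python
def rdp(points, epsilon):
    """Ramer-Douglas-Peucker: recursively drop points whose perpendicular
    distance to the chord between the segment endpoints is below epsilon."""
    if len(points) < 3:
        return points
    (x1, y1), (x2, y2) = points[0], points[-1]
    dx, dy = x2 - x1, y2 - y1
    norm = (dx * dx + dy * dy) ** 0.5 or 1e-12
    # Perpendicular distance of each interior point to the chord.
    dists = [abs(dy * (x - x1) - dx * (y - y1)) / norm for x, y in points[1:-1]]
    i = max(range(len(dists)), key=dists.__getitem__)
    if dists[i] > epsilon:
        # Keep the farthest point and recurse on both halves.
        left = rdp(points[: i + 2], epsilon)
        right = rdp(points[i + 1 :], epsilon)
        return left[:-1] + right
    return [points[0], points[-1]]
```

Near-straight segments collapse to their endpoints, which is what makes dense GPS traces compress so well.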
Graph recall process:
Orders are discretized into codes, mapped to driver history, and processed with node2vec to generate homogeneous graph structures. Skip‑gram with negative sampling trains embeddings for downstream recall.
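The walk-generation half of that pipeline can be sketched as follows. The graph, the p/q bias values, and the walk lengths are illustrative assumptions; the resulting walk corpus would then be fed to skip-gram with negative sampling (e.g., via gensim's Word2Vec) to produce the embeddings:

```python
import random

def node2vec_walks(adj, walk_len=10, walks_per_node=2, p=1.0, q=2.0, seed=0):
    """Generate second-order biased random walks (node2vec style) over an
    adjacency dict {node: [neighbors]}. Each walk is a sequence of order
    codes; the corpus trains skip-gram embeddings downstream."""
    rng = random.Random(seed)
    walks = []
    for start in adj:
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_len:
                cur = walk[-1]
                nbrs = adj[cur]
                if not nbrs:
                    break
                if len(walk) == 1:
                    walk.append(rng.choice(nbrs))
                    continue
                prev = walk[-2]
                # node2vec bias: weight 1/p to return to prev, 1 to move to a
                # common neighbor of prev, 1/q to explore farther away.
                weights = [1 / p if n == prev else
                           (1.0 if n in adj[prev] else 1 / q) for n in nbrs]
                walk.append(rng.choices(nbrs, weights=weights)[0])
            walks.append(walk)
    return walks
```

With q > 1 the walks stay local (BFS-like), capturing the "drivers who took similar orders" structure the recall stage needs.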
Fine‑ranking evolution:
Algorithm 1.0: started with simple route‑distance sorting; replacing it with an LR model increased order acceptance by 6%.
Algorithm 2.0: LightGBM + LR (pointwise) and LambdaRank (listwise) raised acceptance by up to 10%.
Algorithm 3.0: hybrid model – LightGBM trees produce 1000‑dim leaf codes, which are embedded and fed into a pyramid‑style neural network with dropout and batch normalization, then combined with raw discrete features before a final sigmoid layer. This model improved offline AUC by two percentage points.
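The leaf-code idea behind Algorithm 3.0 can be sketched compactly. This is a simplified stand-in, assuming scikit-learn's GradientBoostingClassifier in place of LightGBM and a logistic-regression layer in place of the pyramid network, on synthetic data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
gbdt = GradientBoostingClassifier(n_estimators=20, max_depth=3,
                                  random_state=0).fit(X, y)
# Each sample lands in one leaf per tree; these discrete leaf indices are the
# "leaf codes" that Algorithm 3.0 embeds and feeds into the neural network.
leaves = gbdt.apply(X).reshape(X.shape[0], -1)  # shape: (n_samples, n_trees)
enc = OneHotEncoder(handle_unknown="ignore").fit(leaves)
# Stand-in for the pyramid network: a linear model over leaf codes + raw features.
clf = LogisticRegression(max_iter=1000).fit(
    np.hstack([enc.transform(leaves).toarray(), X]), y)
```

The trees act as automatic feature crossers: each leaf code names a conjunction of splits, which the downstream network (or here, a linear model) can weight directly.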
03 Transaction Ecosystem Governance Algorithm
The governance module ensures driver and passenger experience and safety before, during, and after trips. It combines static features (spatial‑temporal, order attributes) with real‑time signals (trajectory streams, IM/chat, calls).
Samples are obtained via manual labeling, customer‑service tickets, and crowd‑sourced reviews. Offline models are trained for pre‑trip cancellation prediction, in‑trip trajectory deviation detection, and post‑trip dispute resolution.
Model evolution challenges and solutions:
Sparse trajectory data – trigger user prompts at appropriate moments.
Scarce labeled samples – augment with small‑sample learning and pseudo‑labeling.
Multi‑modal features – early‑fusion of heterogeneous signals using correlation‑aware layers.
Explainability – combine rule‑based logic with SHAP analysis for feature contribution.
Trajectory deviation (偏航, i.e., going off‑route) algorithm versions:
v1: compute angle and distance changes between consecutive trajectory batches, generate a deviation score, and push alerts to the app.
v2: use v1 alerts plus customer‑service tickets as positive samples to train a LightGBM small‑sample model.
v3: with larger crowdsourced labels, train a deep model and explore multi‑objective training (e.g., deviation + ticket generation).
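The v1 logic above can be sketched as a simple geometric score. The blending weights and the 500 m distance normalization are hypothetical; the article does not specify the exact formula:

```python
from math import radians, sin, cos, atan2, sqrt, degrees

def haversine_m(a, b):
    """Great-circle distance in meters between (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * 6371000 * atan2(sqrt(h), sqrt(1 - h))

def bearing_deg(a, b):
    """Initial compass bearing from point a to point b, in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    y = sin(lon2 - lon1) * cos(lat2)
    x = cos(lat1) * sin(lat2) - sin(lat1) * cos(lat2) * cos(lon2 - lon1)
    return degrees(atan2(y, x)) % 360

def deviation_score(actual, planned, w_angle=0.5, w_dist=0.5, d_norm=500.0):
    """Blend heading mismatch and off-route distance into a [0, 1] score.
    `actual` and `planned` are pairs of consecutive (lat, lon) points."""
    angle = abs(bearing_deg(*actual) - bearing_deg(*planned)) % 360
    angle = min(angle, 360 - angle) / 180.0        # normalize to [0, 1]
    dist = min(haversine_m(actual[1], planned[1]) / d_norm, 1.0)
    return w_angle * angle + w_dist * dist
```

A score above some alert threshold would trigger the in-app push described for v1; v2 then reuses those alerts (confirmed by tickets) as training labels.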
04 Intelligent Marketing
The marketing system consists of architecture, user lifecycle management, model evolution, and uplift modeling.
Data sources include user profiles, behavior features, and recent click‑stream signals. Offline models (ML or deep) are deployed online via a CRM platform to deliver personalized coupons.
User lifecycle stages: acquisition, activation, retention, and re‑engagement. The key problem is identifying the right users for coupons, the appropriate coupon value, and the optimal copy (smart copy generation is handled by a dedicated team).
Uplift modeling evolution:
v1: a response model predicts travel probability and coupons go to high‑probability users; ROI is low because many recipients would have converted naturally without a coupon.
v2: uplift model estimates incremental effect of coupon issuance, improving targeting.
v3: combine predicted redemption probability and incremental gain, solve an integer‑programming allocation problem (using Google OR‑Tools) to respect budget constraints and maximize ROI.
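The article solves this allocation as an integer program with Google OR-Tools. The sketch below is a deliberately simplified greedy stand-in for that solver, shown only to make the budget-constrained objective concrete; the expected-cost formula and `allocate_coupons` helper are assumptions:

```python
def allocate_coupons(users, budget):
    """Greedy heuristic: pick coupons with the best incremental-gain-per-cost
    ratio until the budget is exhausted.

    `users` is a list of (user_id, expected_uplift, expected_cost), where
    expected_cost = coupon_value * predicted_redemption_probability.
    """
    ranked = sorted(users, key=lambda u: u[1] / u[2], reverse=True)
    chosen, spent = [], 0.0
    for uid, uplift, cost in ranked:
        # Only spend on users with positive incremental effect and within budget.
        if uplift > 0 and spent + cost <= budget:
            chosen.append(uid)
            spent += cost
    return chosen, spent
```

An exact integer program can beat this greedy pass when coupon costs vary widely, which is presumably why the production system uses a solver rather than a ratio heuristic.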
Uplift model families:
S‑Learner: single model with treatment indicator.
T‑Learner: separate models for treatment and control.
X‑Learner: hybrid approach, leveraging both S‑ and T‑Learner predictions; shown to achieve the best offline AUUC and Gini scores for driver activation.
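Of the three families above, the T-Learner is the simplest to show end to end. A minimal sketch on synthetic data (the outcome function, features, and treatment assignment are fabricated purely for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))
t = rng.integers(0, 2, n)                 # 1 = user received a coupon
# Synthetic outcome: a base effect plus an uplift that only helps when x0 > 0.
p = 1 / (1 + np.exp(-(X[:, 0] + 0.8 * t * (X[:, 0] > 0))))
y = rng.random(n) < p

# T-Learner: fit one response model per arm;
# estimated uplift = difference of the two predicted probabilities.
m_t = LogisticRegression().fit(X[t == 1], y[t == 1])
m_c = LogisticRegression().fit(X[t == 0], y[t == 0])
uplift = m_t.predict_proba(X)[:, 1] - m_c.predict_proba(X)[:, 1]
```

Ranking users by this uplift, rather than by raw conversion probability, is exactly what separates v2 from the v1 response model: high-probability users with near-zero uplift stop receiving coupons.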
Overall, the integrated uplift + optimization framework increased ROI by about 10% compared with the previous response‑plus‑rule pipeline, demonstrating the value of causal inference in marketing.
The article concludes with a call to action for readers to follow the Haolo technology public account for future technical shares.
HelloTech
Official Hello technology account, sharing tech insights and developments.