
Algorithmic Practices in Haola Ride‑Sharing: Platform Infrastructure, Matching Recommendation Engine, Transaction Governance, and Intelligent Marketing

This article presents a comprehensive overview of Haola's ride‑sharing algorithm ecosystem, covering the machine‑learning platform foundation, the architecture and evolution of the matching recommendation engine, the transaction‑ecosystem governance models, and the intelligent marketing uplift framework, highlighting technical challenges, solutions, and performance gains.

DataFunTalk

01 Business Introduction

The article first introduces Haola's algorithm platform infrastructure, which provides the foundation for deploying algorithms across the ride‑sharing business.

1. Platform Infrastructure

Haola's machine‑learning platform is built on standard machine‑learning and deep‑learning frameworks and offers end‑to‑end services spanning data preprocessing, model training, evaluation, and online prediction, allowing algorithm engineers to focus on strategy rather than engineering.

The platform relies on Hadoop/YARN for resource scheduling and integrates Spark ML, XGBoost and TensorFlow, supporting both CPU and GPU resources.

The feature‑service platform provides offline and real‑time feature capabilities, enabling offline features to be applied online and real‑time computed features to be pushed online.

The model‑service platform manages algorithm versions, associated models, features and parameters, and provides highly available online prediction for ML/DL models.

The AB‑testing platform enables fast and reliable evaluation of algorithm effects.

Initially, features and models were tightly coupled with business logic, which made iteration slow; after the two were decoupled, the platform enabled rapid deployment of ride‑sharing algorithms.

02 Matching Recommendation Engine

The matching engine consists of three parts: architecture, recall‑module evolution, and ranking‑module evolution.

1. Architecture

The goal is to maximize transaction efficiency while maintaining long‑term retention. Data comes from three sources: real‑time context from the client (price, distance, etc.), near‑real‑time metrics from Flink (e.g., number of drivers who saw the order), and offline wide‑table features.

The model layer includes recall, coarse‑ranking, fine‑ranking, and re‑ranking stages. Different sub‑scenarios (driver side: temporary trips, frequent routes, nearby orders, cross‑city orders; passenger side) each have customized models.

2. Recall Module

Four challenges are identified: (1) heavy real‑time computation, solved by a trajectory‑compression algorithm that reduces computation by 80%; (2) one‑time orders lacking offline embeddings, addressed by discretizing orders into a 14‑digit code and applying graph‑based recall; (3) driver preference for nearby, high‑price, on‑time orders, handled by adding these core factors to the recall chain; (4) low‑frequency users, mitigated by grid‑based recall of historical destinations.
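The talk does not name the trajectory‑compression algorithm used for challenge (1). A classic choice for thinning GPS tracks while preserving their shape is Douglas‑Peucker, sketched below; the coordinates and tolerance are illustrative only:

```python
import math

def _perp_dist(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len = math.hypot(dx, dy)
    if seg_len == 0:
        return math.hypot(px - ax, py - ay)
    # |cross product| / segment length = distance to the line.
    return abs(dx * (ay - py) - dy * (ax - px)) / seg_len

def douglas_peucker(points, epsilon):
    """Compress a trajectory: keep only points deviating more than epsilon."""
    if len(points) < 3:
        return list(points)
    # Find the point farthest from the chord between the endpoints.
    idx, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _perp_dist(points[i], points[0], points[-1])
        if d > dmax:
            idx, dmax = i, d
    if dmax <= epsilon:
        return [points[0], points[-1]]
    left = douglas_peucker(points[: idx + 1], epsilon)
    right = douglas_peucker(points[idx:], epsilon)
    return left[:-1] + right  # drop the duplicated split point

track = [(0, 0), (1, 0.05), (2, -0.04), (3, 0.02), (4, 3), (5, 3.1), (6, 3)]
compressed = douglas_peucker(track, 0.5)
```

On this toy track the seven raw points collapse to the four that define its two straight legs, which is the kind of reduction that drives the reported 80% computation savings.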

Recall versions: V1 distinguishes intra‑city vs cross‑city and uses core‑strategy recall; V2 adds mileage‑based recall, improving order completion by 5%; V3 adds grid and graph recall.

Graph recall process: important features are bucketed (price, route similarity, distances) to obtain a 14‑digit code; driver order histories are mapped to these codes; a homogeneous graph is built with node2vec; embeddings are trained with skip‑gram and negative sampling.
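The first two steps can be sketched as follows. The bucket boundaries, digit layout, and toy graph below are invented for illustration (the production code uses 14 digits and node2vec's biased walks; these walks are uniform, i.e. p = q = 1), and the resulting walks would then feed a skip‑gram trainer:

```python
import random

# Hypothetical bucket boundaries; the real feature set and digit layout
# are not disclosed in the talk.
PRICE_BINS = [10, 20, 40, 80, 150]   # price, yuan
SIM_BINS = [0.2, 0.4, 0.6, 0.8]      # route similarity
DIST_BINS = [1, 3, 5, 10, 20]        # pickup distance, km

def bucketize(value, bins):
    """Index of the first bin boundary exceeding value."""
    for i, b in enumerate(bins):
        if value < b:
            return i
    return len(bins)

def order_code(price, route_sim, pickup_km):
    """Discretize an order into a short code (production uses 14 digits)."""
    return (f"{bucketize(price, PRICE_BINS)}"
            f"{bucketize(route_sim, SIM_BINS)}"
            f"{bucketize(pickup_km, DIST_BINS)}")

def random_walks(graph, num_walks=10, walk_len=5, seed=42):
    """Uniform random walks over the code graph; walks become skip-gram
    'sentences' for embedding training."""
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        for node in graph:
            walk = [node]
            for _ in range(walk_len - 1):
                nbrs = graph[walk[-1]]
                if not nbrs:
                    break
                walk.append(rng.choice(nbrs))
            walks.append(walk)
    return walks

# Codes that co-occur in a driver's order history become graph edges.
graph = {"012": ["113", "214"], "113": ["012"], "214": ["012"]}
walks = random_walks(graph)
```

Because recall happens against codes rather than one‑time order IDs, a fresh order immediately inherits the embedding of its code, sidestepping the cold‑start problem described in challenge (2).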

3. Ranking Module

During cold‑start the system ranked by route similarity. Algorithm 1.0 employed logistic regression, increasing order acceptance by 6%. Algorithm 2.0 used LightGBM + LR (pointwise) and LambdaRank (listwise), achieving up to 10% lift. Deep models (DeepFM, xDeepFM, DIN, DIEN) did not outperform version 2.0.

Algorithm 3.0 converts continuous features into high‑dimensional discrete features via LightGBM leaf encoding (1000‑dimensional), then feeds normalized continuous features, raw discrete features and embeddings into a pyramid‑style neural network with dropout and batch‑norm, finally combining the last hidden layer with raw discrete features before a sigmoid output.
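The leaf‑encoding step can be illustrated with a small sketch. Here scikit‑learn's GradientBoostingClassifier stands in for LightGBM, and the data is synthetic; the production model one‑hot encodes into roughly 1000 dimensions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.preprocessing import OneHotEncoder

# Toy stand-in data; the real system trains LightGBM on ride features.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

gbdt = GradientBoostingClassifier(n_estimators=20, max_depth=3, random_state=0)
gbdt.fit(X, y)

# Each sample lands in exactly one leaf per tree; one-hot encoding those
# leaf indices yields the high-dimensional discrete features that are fed
# into the downstream pyramid network alongside raw features.
leaf_idx = gbdt.apply(X).reshape(X.shape[0], -1)   # (n_samples, n_trees)
encoder = OneHotEncoder()
leaf_onehot = encoder.fit_transform(leaf_idx).toarray()
```

Each row of `leaf_onehot` contains exactly one active slot per tree, so the encoding is sparse and fixed‑width regardless of the underlying continuous feature distributions.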

Offline tests show roughly a two‑percentage‑point AUC improvement over the baseline.

03 Transaction Ecosystem Governance Algorithms

The goal is to ensure driver and passenger experience and safety before, during, and after trips.

1. Trip Lifecycle

Before a trip, the system predicts probabilities of cancellation, complaints, or malicious events to intervene and match suitable drivers and passengers, and also predicts risky behaviors (offline transactions, detours) to provide education and guidance.

During a trip, real‑time trajectory deviation and abnormal stop detection protect safety.

After a trip, responsibility‑attribution algorithms protect the rights of both parties.

2. Architecture

Features include basic spatio‑temporal data, order attributes (e.g., car‑pooling, passenger count, cross‑city flag), offline driver‑passenger behavior, and real‑time signals such as trajectory streams, IM chat, and calls. Labels are obtained from manual annotation, customer‑service tickets, and crowd‑review samples.

Offline models are trained and deployed via the ML platform for each stage of the transaction.

3. Model Evolution

Challenges: scarce offline behavior data, lack of labeled samples, multimodal feature sources, and need for strong interpretability. Solutions include using customer‑service tickets and crowd‑review as positive samples, multimodal feature fusion, and SHAP‑based rule extraction for explainability.

Trajectory deviation algorithm evolution: V1 computes angle and distance changes between successive trajectory batches and issues push notifications; V2 adds labeled samples and trains LightGBM with few‑shot learning; V3 incorporates more crowd‑review data and trains deep models. Future work plans multi‑objective training (e.g., deviation + ticket generation).
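The V1 rule can be sketched as below. The thresholds, the planar‑coordinate simplification, and the "drifting away from the destination" condition are assumptions for illustration; production systems would work on map‑matched GPS batches:

```python
import math

# Hypothetical thresholds; production values are not given in the talk.
ANGLE_THRESHOLD_DEG = 60.0

def heading(p, q):
    """Bearing of segment p->q in degrees, treating coordinates as planar."""
    return math.degrees(math.atan2(q[1] - p[1], q[0] - p[0]))

def point_dist(p, q):
    return math.hypot(q[0] - p[0], q[1] - p[1])

def deviation_alert(prev_batch, cur_batch, route_end):
    """V1-style rule: flag when the heading swings sharply between
    successive trajectory batches AND the vehicle stops closing in on
    the planned destination."""
    angle_change = abs(heading(prev_batch[0], prev_batch[-1]) -
                       heading(cur_batch[0], cur_batch[-1]))
    angle_change = min(angle_change, 360 - angle_change)  # wrap-around
    drift = (point_dist(cur_batch[-1], route_end) -
             point_dist(prev_batch[-1], route_end))
    return angle_change > ANGLE_THRESHOLD_DEG and drift > 0

# A car heading east toward (10, 0) that suddenly turns north trips the rule.
alert = deviation_alert([(0, 0), (1, 0), (2, 0)],
                        [(2, 0), (2, 1), (2, 3)], (10, 0))
```

A positive alert triggers the push notification; V2 and V3 replace this hand‑tuned conjunction with learned models over the same signals.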

04 Intelligent Marketing

This section covers architecture, user lifecycle, model evolution, and uplift modeling.

1. Architecture

Data includes user profiles, behavior features, and recent browsing/clicks. Offline models (machine‑learning or deep) are trained and deployed online; a CRM system distributes personalized coupons.

2. User Lifecycle

Typical stages are acquisition, activation, retention, and re‑engagement. For each stage, specific marketing algorithms and delivery methods are applied to boost activity and ROI.

3. Uplift Model

V1 predicts travel probability (response model) to issue coupons, resulting in low ROI because many users convert naturally. V2 introduces uplift modeling (S‑Learner) to estimate the incremental effect of coupons, improving targeting. V3 predicts coupon redemption probability and incremental value, then solves an integer‑programming problem (Google OR‑Tools) to allocate coupons under budget constraints, achieving a 10% ROI increase.
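The V3 allocation step can be sketched as a budget‑constrained selection problem. Production solves it as an integer program with Google OR‑Tools; the pure‑Python exact‑DP knapsack below, with invented uplift and cost numbers, shows the same formulation on a toy instance:

```python
def allocate_coupons(uplifts, costs, budget):
    """0/1 knapsack over users: maximize total modeled incremental value
    subject to a total coupon budget. dp[b] = (best uplift, chosen users)
    achievable within budget b."""
    n = len(uplifts)
    dp = [(0.0, frozenset())] * (budget + 1)
    for i in range(n):
        new_dp = dp[:]
        for b in range(costs[i], budget + 1):
            cand = dp[b - costs[i]][0] + uplifts[i]
            if cand > new_dp[b][0]:
                new_dp[b] = (cand, dp[b - costs[i]][1] | {i})
        dp = new_dp
    return max(dp, key=lambda t: t[0])

uplifts = [0.30, 0.05, 0.22, 0.18]   # modeled incremental conversion per user
costs = [5, 5, 3, 3]                 # coupon face value per user
best_value, chosen = allocate_coupons(uplifts, costs, budget=8)
```

User 1 has a high natural conversion rate and low uplift, so the optimizer skips them even though a pure response model would have targeted them first; this is exactly the ROI failure mode of V1 that uplift modeling fixes.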

Three uplift modeling paradigms are described: S‑Learner (single model with treatment indicator), T‑Learner (separate models for treatment and control), and X‑Learner (combines both, uses pseudo‑outcomes, trains two models whose weighted sum yields uplift). X‑Learner shows the best offline AUUC and Gini scores and superior performance in driver activation scenarios.
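The T‑Learner, the simplest of the three to demonstrate, can be sketched on synthetic data. Everything below (the data‑generating process, the logistic base learners) is illustrative; the production models and features are not disclosed:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic users: one feature drives both baseline conversion and uplift.
n = 4000
x = rng.normal(size=(n, 1))
treat = rng.integers(0, 2, size=n)             # coupon sent or not
base = 1 / (1 + np.exp(-x[:, 0]))              # natural conversion propensity
lift = 0.15 * (x[:, 0] > 0)                    # coupon helps only x > 0 users
y = (rng.random(n) < np.clip(base + treat * lift, 0, 1)).astype(int)

# T-Learner: fit separate response models on treated and control groups,
# then score uplift as the difference of predicted conversion rates.
m_t = LogisticRegression().fit(x[treat == 1], y[treat == 1])
m_c = LogisticRegression().fit(x[treat == 0], y[treat == 0])
uplift = m_t.predict_proba(x)[:, 1] - m_c.predict_proba(x)[:, 1]
```

The S‑Learner would instead fit one model on `(x, treat)` and difference its predictions at `treat=1` vs `treat=0`; the X‑Learner additionally cross‑imputes pseudo‑outcomes from each group's model and blends the two resulting uplift estimates.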

The uplift + optimization framework outperforms the previous response + rule framework, demonstrating the effectiveness of causal inference in marketing.

Thank you for listening.

Written by DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
