
Reinforcement Learning for Intelligent Marketing in Didi's Xiaoju Car Service

Didi’s Xiaoju Car Service replaces manual rule‑based marketing with a reinforcement‑learning framework built on Double DQN and graph‑embedding‑based personalization, spanning its traffic‑distribution, tagging, portrait, targeting, strategy, and reach‑optimization modules. The system achieves roughly an 8% lift in new users, a 50% cost reduction, and significant gains in open and conversion rates.

Didi Tech

Overview

Didi’s Xiaoju Car Service, a brand under Didi, provides an integrated one‑stop vehicle service platform for car owners. The presentation focuses on the practical application of reinforcement learning (RL) to improve user operation and marketing ROI.

Algorithm System Architecture

The user‑operation system consists of a traffic distribution platform, a tag system, a portrait module, a target‑audience targeting module, a marketing‑strategy module, and a reach‑optimization module.

Traffic Distribution Platform: Delivers scenario‑based ads and interacts with users via push notifications, SMS, etc.

Tag System: Identifies user attributes and helps operators finely select target groups.

The four stages of user operation are supported by corresponding algorithmic modules:

Module 1 – Portrait: Builds fine‑grained user, merchant, and vehicle portraits.

Module 2 – Target Audience Targeting: Selects appropriate user groups for long‑term value, short‑term conversion, churn prediction, etc.

Module 3 – Marketing Strategy: Uses RL, combinatorial policies, and personalized recommendation. It covers both internal driver‑related data and external growth channels (social marketing, DSP).

Module 4 – Reach Optimization: Issues coupons or activity reminders based on user state, balancing user disturbance with platform revenue.
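The disturbance‑versus‑revenue trade‑off in reach optimization can be sketched as a simple expected‑value rule. The function name, inputs, and cost model below are illustrative assumptions, not Didi's actual decision logic:

```python
def should_reach(p_convert: float, order_value: float,
                 disturb_cost: float, coupon_cost: float) -> bool:
    """Hypothetical expected-value rule: reach out only if the expected
    revenue from a conversion exceeds the coupon cost plus a penalty
    for disturbing the user."""
    expected_revenue = p_convert * order_value
    return expected_revenue > coupon_cost + disturb_cost
```

In practice such thresholds would be learned rather than hand‑set, which is what the RL formulation in the next sections provides.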

Pain Points of Manual Operation

Manual rule‑based marketing is coarse, relies heavily on operator experience, splits the user lifecycle into rigid stages, and fails to exploit rich online/offline behavior features, leading to user fatigue and low conversion.

RL‑Based Solution

The RL framework models the interaction between the platform (agent) and users (environment). At each step the agent can:

Issue a coupon of a certain denomination.

Send a message with a specific frequency.

Take no action (empty action).

The environment returns a reward based on user response (e.g., coupon redemption, message click). State features include offline behavior, online activity, static attributes, and model‑predicted scores.

Each interaction is split into two cycles: an action cycle, during which the user can react to the action taken, followed by a silence cycle in which no further actions are taken.
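The state, action space, and reward described above can be sketched as follows. All feature names, coupon denominations, message frequencies, and the cost weighting are hypothetical placeholders, not values from the talk:

```python
from dataclasses import dataclass

# Hypothetical action space: coupons of several denominations,
# messages at several frequencies, and the "empty action".
COUPON_VALUES = [5, 10, 20]   # coupon denominations (illustrative)
MESSAGE_FREQS = [1, 2]        # messages per action cycle (illustrative)
ACTIONS = (
    [("coupon", v) for v in COUPON_VALUES]
    + [("message", f) for f in MESSAGE_FREQS]
    + [("noop", None)]        # take no action
)

@dataclass
class UserState:
    offline_visits: int    # offline behavior, e.g. station visits
    online_clicks: int     # online activity
    is_new_user: bool      # static attribute
    churn_score: float     # model-predicted score

def reward(action, redeemed: bool, clicked: bool,
           cost_weight: float = 0.1) -> float:
    """Reward = observed user response minus the marketing cost
    of the action (coupons cost money, messages here do not)."""
    kind, value = action
    response = 1.0 if (redeemed or clicked) else 0.0
    cost = cost_weight * value if kind == "coupon" else 0.0
    return response - cost
```

An empty action that draws no response yields zero reward, while an expensive coupon must be redeemed to pay for itself.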

Double DQN Algorithm

To avoid the Q‑value overestimation of classic DQN, a Double Deep Q‑Network is employed: the training network selects the greedy next action while a target network evaluates it, and the loss is computed using both. The training network's parameters are periodically copied to the target network. Negative sampling mitigates the class imbalance between positive and negative feedback.
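Decoupling action selection from action evaluation is the core of Double DQN. A minimal sketch of the target computation, assuming a standard experience‑replay batch:

```python
import numpy as np

def double_dqn_targets(q_online_next, q_target_next,
                       rewards, dones, gamma=0.99):
    """Double DQN target: the online (training) network *selects* the
    best next action, the target network *evaluates* it, which curbs
    the overestimation that arises when one network does both.

    q_online_next, q_target_next: (batch, n_actions) Q-values
    for the next states under each network.
    """
    best_actions = np.argmax(q_online_next, axis=1)                 # select
    next_q = q_target_next[np.arange(len(rewards)), best_actions]   # evaluate
    return rewards + gamma * next_q * (1.0 - dones)                 # bootstrap
```

The regression loss is then taken between these targets and the training network's Q‑values for the actions actually taken.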

Graph Embedding for Personalized Messaging

Three graph‑embedding methods are used to personalize reminders:

LINE: Learns on homogeneous graphs to embed users and stations.

TransE: Handles heterogeneous graphs with user–station edges.

GraphSAGE: Works on both homogeneous and heterogeneous graphs, leveraging structural and attribute information.
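As an illustration of how such embeddings could drive personalized reminders, here is a TransE‑style scoring sketch. The embeddings and the station‑ranking use are assumptions for illustration; TransE itself models an edge (head, relation, tail) as head + relation ≈ tail:

```python
import numpy as np

def transe_score(head, relation, tail):
    """TransE plausibility: a lower L2 distance between (head + relation)
    and tail means the triple is more likely to hold."""
    return np.linalg.norm(head + relation - tail)

def rank_stations(user_emb, rel_emb, station_embs):
    """Rank candidate stations for a user by ascending TransE distance,
    so the most plausible station to recommend comes first."""
    scores = [transe_score(user_emb, rel_emb, s) for s in station_embs]
    return np.argsort(scores)
```

The best‑ranked stations could then be mentioned in the reminder message sent to that user.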

Results

RL experiments show a stable ROI advantage over the control group. Full‑traffic deployment in the fueling business yields:

≈8% lift in new‑user acquisition and recall rates.

~50% cost reduction, achieving higher ROI.

Personalized messaging using graph embedding improves open rates and conversion rates:

LINE vs. manual: +7% open rate, +10% conversion.

TransE vs. LINE: +4% open rate, +6% conversion (cumulative +11% open, +16% conversion over manual).

Overall, the intelligent marketing system driven by RL and graph embedding significantly enhances user growth, efficiency, and experience.

Conclusion

The case demonstrates how reinforcement learning and graph‑based personalization can be integrated into large‑scale user operation to achieve measurable business impact.

Tags: machine learning, reinforcement learning, Didi, graph embedding, intelligent marketing, user operation
Written by Didi Tech

Official Didi technology account