
Neuron‑level Shared Multi‑task Learning for Joint CTR and CVR Prediction

This article introduces a neuron‑level shared multi‑task learning framework that jointly estimates click‑through rate (CTR) and conversion rate (CVR). It covers the background and advantages of multi‑task learning, reviews classic shared‑bottom models, describes the proposed pruning‑based architecture, and presents experimental results demonstrating its effectiveness in large‑scale recommendation systems.


Introduction

The article presents a neuron‑level shared approach for jointly estimating CTR and CVR in recommendation systems. It is divided into two parts: an overview of multi‑task learning (MTL) development and a detailed description of the proposed joint estimation method.

Background of Multi‑task Learning

MTL aims to improve performance by sharing information across related tasks. In e‑commerce, CTR (click‑through rate) and CVR (conversion rate) are key metrics; although they target different stages of the funnel (exposure→click vs. click→conversion), they share underlying user‑item interactions, making them suitable for MTL.
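The two funnel stages can be made concrete with toy counts (the numbers below are illustrative, not from the article's dataset). Chaining the two stage rates recovers the end‑to‑end exposure→conversion rate, which is the identity the ESMM family exploits:

```python
# Funnel rates for the two stages described above (toy counts, not real data).
impressions, clicks, conversions = 10_000, 300, 15

ctr = clicks / impressions        # stage 1: exposure -> click
cvr = conversions / clicks        # stage 2: click -> conversion
ctcvr = ctr * cvr                 # chained: exposure -> conversion

# The chained rate equals the direct exposure->conversion rate.
assert abs(ctcvr - conversions / impressions) < 1e-12
```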

Advantages of MTL

MTL provides implicit data augmentation, attention focusing, eavesdropping (one task learning useful features through another), reduced representation bias, and regularization effects, all of which can improve model generalization and efficiency.

Classic Multi‑task Models

The most common architectures in recommendation are the ESMM series (shared‑bottom with joint CTR/CVR learning) and the MMoE series (mixtures of experts combined by per‑task gates). These models share bottom‑layer embeddings but differ in how task‑specific information is combined.
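A minimal numpy sketch of the MMoE idea, under assumed toy dimensions (the layer sizes, single‑layer experts, and linear towers here are illustrative, not the architectures from the talk): shared experts transform the bottom embedding, and each task's softmax gate produces its own mixture of expert outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration only.
D_IN, D_EXPERT, N_EXPERTS, N_TASKS = 8, 4, 3, 2

W_experts = rng.normal(size=(N_EXPERTS, D_IN, D_EXPERT))  # one layer per expert
W_gates = rng.normal(size=(N_TASKS, D_IN, N_EXPERTS))     # one gate per task
W_towers = rng.normal(size=(N_TASKS, D_EXPERT))           # task towers -> logit

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def mmoe_forward(x):
    # All tasks see the same expert outputs (the shared part).
    expert_out = np.tanh(np.einsum("eid,i->ed", W_experts, x))
    logits = []
    for t in range(N_TASKS):
        gate = softmax(x @ W_gates[t])   # per-task mixture weights over experts
        mixed = gate @ expert_out        # task-specific combination
        logits.append(float(W_towers[t] @ mixed))
    return logits                        # [CTR logit, CVR logit]

ctr_logit, cvr_logit = mmoe_forward(rng.normal(size=D_IN))
```

The gates are what differentiate the tasks: a shared‑bottom model is the special case where every task uses the same fixed mixture.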

Proposed Neuron‑level Shared Method

Inspired by the lottery ticket hypothesis, the method iteratively prunes a base network to obtain task‑specific masks for CTR and CVR. Each task updates only the weights within its mask, reducing conflict while preserving shared representations. The final model is obtained by element‑wise (Hadamard) multiplication of the base weights with the learned masks.

Training and Inference

Both tasks use the same base network (sNET). Masks are initialized as all‑ones; then, after each epoch, the smallest‑magnitude weights (a fraction p) are masked out. The best mask per task is selected based on validation performance, and the corresponding sub‑network is fine‑tuned.
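One round of that schedule can be sketched as an iterative magnitude‑pruning step (a sketch assuming p is the fraction of still‑active weights dropped per epoch; the function and array names are illustrative):

```python
import numpy as np

def prune_step(weights, mask, p=0.2):
    # Zero out the lowest-magnitude fraction p of the still-active weights.
    active = np.flatnonzero(mask)
    k = int(len(active) * p)                 # how many weights to drop this round
    if k == 0:
        return mask
    order = np.argsort(np.abs(weights.ravel()[active]))
    new_mask = mask.copy()
    new_mask[active[order[:k]]] = 0.0        # mask the smallest-magnitude weights
    return new_mask

w = np.array([0.05, -0.9, 0.3, -0.02, 0.7, 0.1])
m = np.ones_like(w)
m = prune_step(w, m, p=1/3)   # drops the two smallest entries: 0.05 and -0.02
```

Repeating this step per epoch yields a sequence of progressively sparser masks, from which the best‑validating one is kept for each task.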

Experimental Results

Offline experiments on a short‑video dataset (≈10 M users, ≈11 M videos) show that the proposed S_weight model reduces CVR MSE by about 3 % compared with a hard‑sharing baseline. Online A/B tests with four configurations demonstrate that the joint S_weight CTR + CVR model yields modest but consistent improvements over single‑task baselines.

Conclusion

The neuron‑level sharing paradigm mitigates task conflict and automatically discovers useful sub‑networks. It has been deployed in a billion‑scale video recommendation system, where it delivered significant performance gains, and it extends naturally to other downstream metrics such as like rate or comment rate.

Tags: CTR, CVR, neural networks, multi-task learning, recommendation systems, model pruning
Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
