Deep Contextual Interest Network (DCIN) for CTR Prediction

This article introduces the Deep Contextual Interest Network (DCIN), a CTR prediction model that jointly models clicked items, their surrounding display context, and position bias through three modules (PCAM, FCFM, and IMM). It reports the highest offline AUC among the compared baselines and a 1.5% online CTR improvement.

Meituan Technology Team

Nov 9, 2023

1. Background

Click‑through‑rate (CTR) prediction is a fundamental task for online advertising and recommendation systems. Existing user‑behavior sequence models mainly capture positive interest from clicked items and ignore the surrounding display context, which limits performance.

Prior work such as DFN and DUMN introduced unclicked behavior to model negative preference, but they still treat clicks and non‑clicks separately and overlook interactions between clicked and surrounding items. Because many items are shown together on a screen, the context of displayed items influences user decisions.

Figure 1 illustrates the effect: a user may click a green T‑shirt when the surrounding items are unrelated (phone, shoes), but may prefer a blue T‑shirt when all surrounding items are T‑shirts. The same item can therefore win or lose a click depending on its display context.

2. Deep Contextual Interest Network (DCIN)

2.1 CTR Model Overview

Given a user's click sequence c_1,…,c_T and the items displayed around each click, the model predicts the probability that the user clicks a target item. For each click c_t we select K surrounding display items, denoted s_{t,1},…,s_{t,K}, and record their absolute and relative positions.
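The input described above can be pictured with a small data structure. This is a minimal sketch under stated assumptions: the article does not say how the K surrounding items are chosen or how positions are encoded, so the one-dimensional page layout, the nearest-slot selection rule, and the names `ContextItem` and `surrounding_items` are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ContextItem:
    item_id: str
    abs_pos: int  # assumed: slot index of the item on the page
    rel_pos: int  # assumed: signed offset from the clicked slot

def surrounding_items(page, click_idx, k):
    """Pick up to k displayed items closest to the clicked slot (assumed rule)."""
    others = [i for i in range(len(page)) if i != click_idx]
    # Sort by distance to the click; ties are broken by slot index.
    nearest = sorted(others, key=lambda i: (abs(i - click_idx), i))[:k]
    return [ContextItem(page[i], i, i - click_idx) for i in sorted(nearest)]

# Toy page with the click on slot 2.
page = ["phone", "shoes", "green_tshirt", "watch", "bag"]
ctx = surrounding_items(page, click_idx=2, k=2)
for c in ctx:
    print(c)
```

Here the two neighbours of the clicked slot are returned with relative positions -1 and +1; a real system would presumably use the page's actual layout.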

2.2 Model Architecture

DCIN consists of three key modules:

Position‑aware Context Aggregation Module (PCAM): uses an attention mechanism to aggregate the K display items for each click. Position bias is incorporated by concatenating each item embedding with an absolute‑position embedding and a learned relative‑position embedding.

Feedback‑Context Fusion Module (FCFM): a two‑layer MLP that non‑linearly fuses the click representation with its aggregated context representation, enabling interaction between the click and its surrounding items.

Interest Matching Module (IMM): adopts the DIN attention to match the fused representations with the target item, producing the final user representation used for CTR estimation. The model is trained with a binary cross‑entropy loss.
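The three modules can be sketched as a toy forward pass. The following is a minimal, dependency-free Python sketch, not the authors' implementation: the scaled dot-product scoring, the projection matrix `W_ctx`, the layer sizes, and all helper names are assumptions made for illustration. The softmax-normalized dot product in `imm` is a simplified stand-in for DIN's attention unit, which scores each pair with a small feed-forward network.

```python
import math
import random

D, P = 4, 2               # item and position embedding sizes (arbitrary)
rng = random.Random(0)    # fixed seed so the toy run is repeatable

def rand_vec(n):
    return [rng.uniform(-0.5, 0.5) for _ in range(n)]

def rand_mat(rows, cols):
    return [rand_vec(cols) for _ in range(rows)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(W, x):
    return [dot(row, x) for row in W]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def weighted_sum(weights, vectors):
    return [sum(w * v[i] for w, v in zip(weights, vectors))
            for i in range(len(vectors[0]))]

# Hypothetical parameters: random here, learned in a real model.
W_ctx = rand_mat(D, D + 2 * P)  # projects [item; abs_pos; rel_pos] to D dims
W_1 = rand_mat(D, 2 * D)        # FCFM layer 1
W_2 = rand_mat(D, D)            # FCFM layer 2

def pcam(click, items, abs_pos, rel_pos):
    """Attention-pool the K display items around one click."""
    # List "+" concatenates item, absolute-position and relative-position embeddings.
    keys = [matvec(W_ctx, it + a + r) for it, a, r in zip(items, abs_pos, rel_pos)]
    weights = softmax([dot(click, k) / math.sqrt(D) for k in keys])
    return weighted_sum(weights, keys)

def fcfm(click, context):
    """Two-layer MLP over the concatenated click and context vectors."""
    hidden = [max(0.0, h) for h in matvec(W_1, click + context)]
    return matvec(W_2, hidden)

def imm(interests, target):
    """Attention over per-click interest vectors, queried by the target item."""
    weights = softmax([dot(u, target) / math.sqrt(D) for u in interests])
    return weighted_sum(weights, interests), weights

# Toy input: T = 3 clicks, K = 5 display items per click.
# Position embeddings are random vectors here; a model would look them up by index.
T, K = 3, 5
clicks = [rand_vec(D) for _ in range(T)]
contexts = [[rand_vec(D) for _ in range(K)] for _ in range(T)]
abs_pos = [[rand_vec(P) for _ in range(K)] for _ in range(T)]
rel_pos = [[rand_vec(P) for _ in range(K)] for _ in range(T)]
target = rand_vec(D)

interests = [fcfm(c, pcam(c, s, a, r))
             for c, s, a, r in zip(clicks, contexts, abs_pos, rel_pos)]
user_vec, imm_weights = imm(interests, target)
```

Note that `pcam` and `fcfm` never read `target`; only `imm` does. In this sketch the target item enters the computation at the last step only.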

Figure 2 shows the overall DCIN framework.

2.3 Online Service Optimizations

Because adding K display items per click lengthens the input sequence by a factor of tens, naive online inference would add substantial latency. DCIN mitigates this by pre‑computing the context‑aware interest vector u_t of each click offline, which is possible because PCAM and FCFM do not depend on the target item. At inference time only the IMM computation is required, which saves roughly 10 ms and lets DCIN handle sequences 28× longer than the RACP baseline with only about 1 ms of additional latency.
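The split between offline and online work can be sketched as follows. This is an illustrative sketch only: the in-memory `interest_store`, the function names, and the plain dot-product attention are assumptions, and the article does not describe the actual storage or serving stack.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

# Hypothetical offline store: user id -> precomputed interest vectors u_1..u_T.
interest_store = {}

def offline_job(user_id, interest_vectors):
    """Batch side: persist the target-independent vectors (PCAM + FCFM output)."""
    interest_store[user_id] = interest_vectors

def online_user_vector(user_id, target):
    """Request side: only the target-dependent attention (IMM) is computed."""
    interests = interest_store[user_id]
    weights = softmax([dot(u, target) for u in interests])
    return [sum(w * u[i] for w, u in zip(weights, interests))
            for i in range(len(target))]

# Two toy interest vectors, scored against two different target items.
offline_job("user_42", [[1.0, 0.0], [0.0, 1.0]])
vec_a = online_user_vector("user_42", [5.0, 0.0])
vec_b = online_user_vector("user_42", [0.0, 5.0])
```

Under this arrangement the request path touches one stored vector per click instead of every display item around every click, so its cost grows with T rather than with T × K.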

3. Experiments

3.1 Experimental Setup

We constructed a 31‑day industrial dataset from Meituan’s ad system, containing billions of samples. The first 30 days form the training set; the last day is held out for testing. Each user’s recent 50 clicks are kept, and for each click the surrounding 20 display items are used as context.
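The split and truncation rules above can be written down in a few lines. This is a sketch under assumptions: the record layout (`day`, `item`, `context` keys), the oldest-to-newest ordering of clicks, the choice to keep the first 20 context entries, and the placeholder start date are all hypothetical, since the article does not describe the pipeline.

```python
from datetime import date, timedelta

MAX_CLICKS = 50     # recent clicks kept per user (from the article)
CONTEXT_SIZE = 20   # display items kept per click (from the article)

def split_by_day(samples, first_day, n_days=31):
    """Train on the first n_days - 1 days and test on the final day."""
    last_day = first_day + timedelta(days=n_days - 1)
    train = [s for s in samples if first_day <= s["day"] < last_day]
    test = [s for s in samples if s["day"] == last_day]
    return train, test

def truncate_history(clicks):
    """Keep the most recent clicks and cap each click's context list.

    Assumes `clicks` is ordered oldest to newest.
    """
    return [
        {"item": c["item"], "context": c["context"][:CONTEXT_SIZE]}
        for c in clicks[-MAX_CLICKS:]
    ]

start = date(2023, 1, 1)  # placeholder date, not from the article
samples = [{"day": start + timedelta(days=d), "label": d % 2} for d in range(31)]
train, test = split_by_day(samples, start)

history = [{"item": i, "context": list(range(30))} for i in range(60)]
trimmed = truncate_history(history)
```

Splitting by day rather than at random keeps every test sample later in time than every training sample, which matches how the model would be used in production.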

Six baselines are compared: Wide&Deep, DeepFM, DIN, DIEN, DFN, and RACP. All models share the same feature set. Evaluation metrics are AUC and Relative Improvement (RelaImpr) over the baseline.
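The article does not spell out the RelaImpr formula. The definition commonly used in CTR work, for example in the DIN paper, measures the gain relative to the 0.5 AUC of a random ranker: RelaImpr = ((AUC_model − 0.5) / (AUC_base − 0.5) − 1) × 100%. A small sketch, assuming that definition applies here; the AUC values in the example are made up for illustration and are not results from the article.

```python
def rela_impr(auc_model, auc_base):
    """Relative improvement in percent over a base model.

    Subtracting 0.5 removes the AUC that a random ranker would achieve.
    """
    return ((auc_model - 0.5) / (auc_base - 0.5) - 1.0) * 100.0

# Illustrative numbers only: a 0.01 absolute AUC gain over a 0.65 base.
example = rela_impr(0.66, 0.65)
print(f"{example:.2f}%")
```

Because the denominator is the base model's margin over random, a small absolute AUC gain can correspond to a large RelaImpr when the base AUC is close to 0.5.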

3.2 Results

DCIN achieves the highest AUC and a 21.24% RelaImpr over the best baseline. When limiting sequence length to that of RACP (DCIN‑Short), DCIN still outperforms RACP because it models both per‑click context and position bias, which RACP ignores.

Offline results (Table) show DCIN surpassing all baselines; online A/B testing demonstrates a 1.5% lift in CTR and RPM compared with the DIN online baseline.

3.3 Ablation Study

Removing position information from PCAM (DCIN‑short‑w/o position) degrades performance, confirming the importance of position bias. Removing FCFM (DCIN‑short‑w/o FCFM) also hurts results, indicating that fusing click and context representations is essential.

3.4 Case Study

We sampled 10 clicked items and, for each, generated 100 different display contexts. t‑SNE visualizations of the resulting context‑aware interest vectors show clear separation between different clicked items and diverse representations for the same click under different contexts, evidencing DCIN’s ability to capture fine‑grained contextual interest.

Attention weight visualizations (Figure 4) reveal that, for a given target item, DIN assigns a clicked item the same weight regardless of its display context, whereas DCIN produces distinct weights that reflect context‑dependent competition.

4. Conclusion

We highlight the necessity of incorporating display context and position information into user‑interest modeling. The proposed DCIN model delivers significant offline and online gains and has been fully deployed in Meituan’s online advertising system.
