Dynamic Creative Optimization and DeepMCP Feature Learning for CTR Prediction
This talk presents a dynamic creative optimization framework that combines style‑material selection with DSA modeling to tame the combinatorial explosion in CTR prediction. It also introduces DeepMCP, an auxiliary‑network approach that improves feature embeddings by modeling user‑ad and ad‑ad relationships, achieving superior performance in large‑scale advertising systems.
Dynamic Creative Optimization for CTR Prediction
In the rich‑media era, ad formats must be personalized, which poses major challenges for Click‑Through Rate (CTR) estimation. To tackle this, we propose dynamic style‑combination selection together with a DSA (Dynamic Slot Auction) model, integrated with position‑wise auction, to solve the style‑selection problem effectively.
Traditional CTR models map features directly to CTR but ignore relationships among the feature embeddings themselves. We introduce the DeepMCP model, which learns better feature representations via auxiliary networks without adding any extra online inference cost.
Agenda
Dynamic style CTR modeling
Feature expression auxiliary learning
1. Business Background and CTR Introduction
Our main business includes search ads and feed ads. Search ads are triggered by user queries, while feed ads are shown without a query.
A typical commercial ad system consists of:
Frontend server: media ingestion, ad style encapsulation, etc.
Server A: central strategy server that connects downstream servers and performs ranking and filtering.
Server C: generates candidate ad sets (triggers for search, targeting for feed).
Server B: handles ad retrieval and requests Q‑value computation from the model server.
Model server: performs various Q‑value calculations.
Additional user‑dimension real‑time information servers and storage.
The entire ad system is maintained by engineering teams, while strategy teams iterate on policies. The next sections focus on the crucial component of the system: CTR estimation.
2. Click‑Through Rate (CTR) Estimation
CTR estimation predicts the probability that a given user will click a given ad. It is essential for ad selection, ranking (CTR × bid), and charging.
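The ranking rule above (CTR × bid, i.e. eCPM) can be sketched as follows; the candidate tuples and their values are illustrative, not real data.

```python
# Rank ad candidates by eCPM = predicted CTR * bid (illustrative sketch).
def rank_by_ecpm(candidates):
    """candidates: list of (ad_id, predicted_ctr, bid); highest eCPM first."""
    return sorted(candidates, key=lambda c: c[1] * c[2], reverse=True)

ads = [("a", 0.02, 1.0),   # eCPM 0.020
       ("b", 0.01, 3.0),   # eCPM 0.030
       ("c", 0.05, 0.5)]   # eCPM 0.025
print([ad_id for ad_id, _, _ in rank_by_ecpm(ads)])  # -> ['b', 'c', 'a']
```

Note that a high‑bid, low‑CTR ad ("b") can outrank a high‑CTR, low‑bid one ("c"), which is why accurate CTR estimation directly affects both ranking and charging.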
Typical CTR prediction pipeline:
Collect user ID, user attributes (age, gender, interests), and ad title.
Gather click / non‑click feedback as training samples to train a complex non‑linear model.
Research directions include feature optimization, model algorithm improvements, and innovative model applications.
3. Related Work
Logistic Regression (LR): simple and popular in early years.
Factorization Machines (FM): adds second‑order feature interactions.
Deep Neural Networks (DNN): embeds features, stacks fully‑connected layers, and applies a sigmoid output to produce the CTR.
Wide&Deep: combines memorization (LR) with generalization (DNN).
DeepFM: upgrades Wide&Deep by replacing LR with FM and jointly learning FM and DNN embeddings.
Later works include DCN, DIN, etc.
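To make the second‑order interactions FM adds concrete, here is a minimal sketch; the O(k·n) reformulation of the pairwise term is the standard FM identity, and all weights and inputs below are illustrative.

```python
import math

def fm_predict(x, w0, w, V):
    """FM CTR score: sigmoid(w0 + sum_i w_i*x_i + sum_{i<j} <V_i, V_j>*x_i*x_j).
    V[i] is the k-dimensional latent vector of feature i. The pairwise term
    uses the O(k*n) identity: sum_{i<j} = 0.5 * ((sum_i)^2 - sum of squares),
    computed per latent dimension f."""
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    k = len(V[0])
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(len(x)))
        sq = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        pairwise += 0.5 * (s * s - sq)
    return 1.0 / (1.0 + math.exp(-(linear + pairwise)))  # sigmoid -> CTR
```

The same latent vectors V are what DeepFM shares between its FM and DNN components, which is why joint embedding learning helps.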
4. Dynamic Style CTR Modeling
Ad product styles have evolved from simple title + description to richer layouts such as product lists, mixed image‑text, and grouped sub‑chains, greatly increasing CTR potential.
Dynamic creative is realized through two elements: layout (style arrangement) and components (material). An ad is decomposed into elements like title, image, description, and groups, forming a layout. Components are then filled with content from the ad library.
The challenge lies in selecting the optimal layout‑material combination for each user and ad, as well as handling the combinatorial explosion when many ads share the same screen.
4.1 Simple Algorithm (Baseline)
Ad retrieval: obtain creative IDs.
CTR estimation: filter ads using a basic CTR score (no style/material).
Ad ranking: determine final display order.
Style & material selection: compute final creative style.
This reduces the candidate set but ignores the contribution of style and material to CTR, leading to sub‑optimal ranking.
4.2 Style+Material Combination Optimization
We first select a layout, then fill each container with appropriate material (title, image, description, etc.). The naive exhaustive computation would require millions of CTR evaluations per ad, which is infeasible.
We adopt a greedy algorithm with exploration‑exploitation (EE) based pruning: each layer of the layout tree is optimized sequentially (title → image → description → group …), dramatically cutting the search space.
Ranking is performed by combining CTR estimates with eCPM maximization.
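The layer‑by‑layer greedy search can be sketched as below. The beam‑style pruning rule (keep the top‑K partial combinations per layer by estimated CTR), the layer order, and all names are illustrative assumptions; in the real system the pruning decision is driven by the EE mechanism.

```python
# Illustrative greedy layer-by-layer style/material selection with pruning.
# `ctr_fn`, the layer order, and the beam width are assumptions for this sketch.
def greedy_select(layers, materials, ctr_fn, beam=2):
    """layers: ordered component names (e.g. title -> image -> description);
    materials: dict mapping each layer to its candidate materials.
    Keeps at most `beam` partial combinations per layer instead of
    enumerating the full cross product of all layers."""
    partials = [()]
    for layer in layers:
        expanded = [p + (m,) for p in partials for m in materials[layer]]
        expanded.sort(key=ctr_fn, reverse=True)
        partials = expanded[:beam]  # pruning: keep only the top `beam` nodes
    return partials[0]

# Toy scoring: additive per-material CTR contributions (illustrative).
scores = {"t1": 0.10, "t2": 0.30, "i1": 0.20, "i2": 0.05}
best = greedy_select(["title", "image"],
                     {"title": ["t1", "t2"], "image": ["i1", "i2"]},
                     lambda combo: sum(scores[m] for m in combo))
print(best)  # -> ('t2', 'i1')
```

With L layers of M materials each, exhaustive search costs M^L CTR evaluations, while this scheme costs roughly L × beam × M, which is what makes per‑request selection feasible.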
4.3 Overall Optimization Strategy
Per‑layer selection with mutual‑exclusion rules.
CTR estimation for each component.
Ranking by max‑eCPM.
Pruning (EE) to limit nodes per layer.
After layout selection, component‑level material selection is performed.
4.4 Algorithms and Models Used
CTR estimation: DNN.
CTR‑EE: Thompson sampling.
Pruning EE: future work includes deep reinforcement learning (DRL) for early pruning decisions.
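Thompson sampling for the creative EE problem can be sketched as a Beta‑Bernoulli bandit; the per‑creative click statistics below are illustrative.

```python
import random

def thompson_pick(stats):
    """stats: dict creative_id -> (clicks, impressions).
    Draws a CTR sample from each creative's Beta posterior
    (Beta(clicks + 1, impressions - clicks + 1)) and returns the argmax,
    so under-explored creatives with uncertain CTR still get traffic."""
    best_id, best_sample = None, -1.0
    for cid, (clicks, imps) in stats.items():
        sample = random.betavariate(clicks + 1, imps - clicks + 1)
        if sample > best_sample:
            best_id, best_sample = cid, sample
    return best_id

random.seed(0)
print(thompson_pick({"a": (900, 1000), "b": (10, 1000)}))  # -> 'a' (far higher posterior CTR)
```

As a creative accumulates impressions its posterior narrows, so the policy naturally shifts from exploration to exploitation.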
4.5 Improved Workflow with DSA Model
The improved pipeline adds a DSA model that incorporates position, layout, material, and context features on top of the upstream Q‑value.
Retrieve creative IDs.
Coarse CTR pre‑ranking to select top‑N ads.
Style‑material combination optimization for the top‑N ads.
PSA (position‑wise sequential auction) determines which ads are displayed at which positions.
DSA model refines CTR using rank, style, material, and previous ads.
Joint training of the coarse CTR model and DSA model mitigates survivor bias and improves overall eCPM.
5. Feature Expression Auxiliary Learning (DeepMCP)
Traditional CTR models map features to CTR but ignore relationships among feature embeddings. DeepMCP adds auxiliary networks to learn better embeddings by modeling user‑ad and ad‑ad relationships.
5.1 DeepMCP Overview
Three sub‑networks share the same embedding matrix:
Prediction sub‑network: standard CTR prediction (can be any model such as Wide&Deep, DeepFM, etc.).
Matching sub‑network: learns similarity between user and ad embeddings via a tanh‑activated MLP, producing a matching score.
Correlation sub‑network: models ad‑ad relationships using a graph‑like approach inspired by skip‑gram, where a target ad is pulled close to its context ads and pushed away from negative samples.
During training, all three losses (prediction, matching, correlation) are combined with weighting factors α and β, tuned on a validation set for best AUC. In online inference only the prediction sub‑network is active, incurring no extra cost.
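A minimal sketch of the matching score and of how the three losses combine is given below; the single tanh layer, the dimensions, and the inputs are illustrative assumptions standing in for the full sub‑networks.

```python
import math

def matching_score(user_emb, ad_emb):
    """Matching sub-network sketch: one tanh layer per side (the paper uses a
    tanh-activated MLP), then a sigmoid of the dot product as the match score."""
    u = [math.tanh(v) for v in user_emb]
    a = [math.tanh(v) for v in ad_emb]
    dot = sum(x * y for x, y in zip(u, a))
    return 1.0 / (1.0 + math.exp(-dot))  # matching probability in (0, 1)

def deepmcp_loss(pred_loss, match_loss, corr_loss, alpha, beta):
    """Joint training objective L = L_p + alpha * L_m + beta * L_c, with alpha
    and beta tuned on a validation set. At serving time only the prediction
    sub-network runs, so the auxiliary terms add no online cost."""
    return pred_loss + alpha * match_loss + beta * corr_loss
```

Because all three sub‑networks share one embedding matrix, gradients from the matching and correlation losses reshape the same embeddings the prediction sub‑network uses online.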
5.2 Experiments
We evaluated on the public Avito dataset and internal business data, comparing against baseline models (LR, FM, DNN, PNN, Wide&Deep, DeepFM). Variants of DeepMCP were tested:
DeepCP: only correlation network.
DeepMP: only matching network.
DeepMCP: both matching and correlation.
Results show DeepMP outperforms DeepCP, and DeepMCP achieves the best overall performance. In production, DeepMP was chosen for offering the best trade‑off between accuracy and model complexity, yielding +2.9% CTR and +0.9% CPM in A/B tests.
We also studied the impact of the weighting parameters and the depth of the auxiliary networks, confirming that a larger weight for the matching loss and moderate depth for the MCP network give the best results.
5.3 Summary of Findings
Traditional CTR models only learn feature‑to‑CTR relationships.
DeepMCP additionally learns feature‑to‑feature relationships (user‑ad and ad‑ad).
It is a multi‑task learning framework that improves both prediction ability and embedding quality.
Matching sub‑network contributes more to performance than correlation sub‑network.
DeepMP (prediction + matching) offers the best accuracy‑complexity trade‑off.
6. Conclusion
Dynamic style upgrades introduce a combinatorial explosion problem for CTR estimation. By first narrowing the ad queue and then applying greedy + EE style‑material selection, we reduce computation while preserving ranking quality. Integrating PSA and DSA models further fuses style selection with ranking, achieving significant gains. DeepMCP demonstrates that auxiliary feature‑learning networks can substantially boost CTR prediction without extra online cost.
Thank you for your attention.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.