
CSCNN: Category‑Specific Convolutional Neural Network for Visual CTR Prediction in JD E‑commerce Advertising

This article presents CSCNN, a category‑specific convolutional neural network that integrates visual priors into click‑through‑rate (CTR) models for JD.com’s e‑commerce advertising, detailing its motivation, architecture, engineering optimizations, offline and online training strategies, and empirical performance gains on both public and industrial datasets.


JD.com’s search advertising platform relies heavily on CTR models to rank ads; with the massive influx of visual content, leveraging image information has become a new trend. The talk introduces CSCNN, a next‑generation ad‑ranking model that incorporates visual cues into CTR prediction.

The presentation first outlines the background of JD’s 9NAI platform, the challenges of optimizing eCPM in e‑commerce, and the four‑fold feature space (query, user, product, context) used in CTR modeling. It then discusses the limitations of traditional CNNs for this domain, such as weak supervision, overfitting on sparse features, and engineering bottlenecks.
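As background for the eCPM objective mentioned above, e-commerce ad ranking typically scores each candidate by the advertiser's bid multiplied by the predicted CTR. A minimal sketch of this ranking rule (names and numbers are illustrative, not JD's production logic):

```python
def ecpm(bid_cpc: float, predicted_ctr: float) -> float:
    # Expected cost per mille: revenue per thousand impressions
    return bid_cpc * predicted_ctr * 1000

ads = [
    {"ad_id": "a", "bid": 0.50, "pctr": 0.020},
    {"ad_id": "b", "bid": 0.80, "pctr": 0.010},
    {"ad_id": "c", "bid": 0.30, "pctr": 0.040},
]

# Rank candidates by expected revenue; a better pCTR model directly
# improves this ordering, which is why CTR accuracy matters so much.
ranked = sorted(ads, key=lambda ad: ecpm(ad["bid"], ad["pctr"]), reverse=True)
```

Here a higher bid cannot compensate for a much lower predicted CTR, which is the core tension CTR models are optimized to resolve.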

To address these issues, the authors propose a multi‑modal feature pipeline: manual features, text features, user‑side interaction features, and image features. They highlight the need for visual priors—category information that can guide CNN learning—so that the network focuses on category‑relevant details and avoids irrelevant background noise.

CSCNN builds on a category‑specific attention mechanism inspired by SENet. At each convolutional layer, channel‑wise and spatial attention modules receive both the feature map and an embedding of the product's category, producing refined feature maps biased toward category‑relevant patterns. This design effectively turns the CNN into a category‑aware feature extractor.
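The channel‑wise half of this idea can be sketched as an SE‑style gate whose input is the pooled feature map concatenated with the category embedding. This is a simplified NumPy illustration under assumed shapes and a single random projection, not the paper's exact parameterization (which also includes an analogous spatial attention module over the H×W grid):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def category_channel_attention(feature_map, category_emb, w):
    """SE-style channel attention conditioned on a category embedding.

    feature_map:  (C, H, W) activations from one conv layer
    category_emb: (D,) learned embedding of the product category
    w:            (C + D, C) projection producing per-channel gates
    """
    squeeze = feature_map.mean(axis=(1, 2))            # global average pool -> (C,)
    gate_in = np.concatenate([squeeze, category_emb])  # inject the category prior
    gate = sigmoid(gate_in @ w)                        # (C,) weights in (0, 1)
    return feature_map * gate[:, None, None]           # reweight channels

rng = np.random.default_rng(0)
fmap = rng.standard_normal((8, 4, 4))      # toy feature map: 8 channels, 4x4
cat = rng.standard_normal(3)               # toy 3-d category embedding
w = rng.standard_normal((11, 8)) * 0.1
out = category_channel_attention(fmap, cat, w)
```

The same feature map passed with two different category embeddings yields two differently weighted outputs, which is precisely what makes the extractor category‑aware.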

Engineering solutions are described to mitigate training and serving costs: offline pre‑computation of image embeddings, aggregation of identical product requests, synchronized multi‑GPU updates, and a lookup‑table‑based online serving that reduces latency to sub‑20 ms on CPU.
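The serving-side trick is that the CNN never runs at request time: image embeddings are computed once offline, and online CTR inference reduces to a key lookup. A minimal sketch with hypothetical SKU keys and toy 3‑dimensional embeddings (production embeddings are far larger):

```python
# Offline: run the CNN once per ad image and persist the visual embedding.
offline_embeddings = {
    "sku_1001": [0.12, -0.45, 0.88],
    "sku_1002": [0.05, 0.31, -0.27],
}

# Fallback for items whose images have not been embedded yet.
DEFAULT_EMBEDDING = [0.0, 0.0, 0.0]

def fetch_visual_feature(sku_id: str):
    # Online: an O(1) table lookup replaces a CNN forward pass,
    # which is what keeps CPU serving latency in the low milliseconds.
    return offline_embeddings.get(sku_id, DEFAULT_EMBEDDING)
```

The looked-up vector is then fed into the CTR model alongside the other (query, user, product, context) features.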

Extensive experiments on a public Amazon dataset and JD’s massive industrial logs (hundreds of billions of samples, thousands of categories) demonstrate significant AUC improvements over baseline models, late‑fusion approaches, and other attention‑based methods. The CSCNN model has been deployed at scale, serving billions of users daily.

In conclusion, integrating category‑specific visual priors into CNNs substantially boosts CTR prediction performance in e‑commerce advertising, and the proposed system architecture enables practical, large‑scale deployment.

Tags: machine learning, deep learning, CTR prediction, e-commerce advertising, category-specific CNN, visual modeling
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
