
Bayesian Smoothing and Key-Value Memory Networks for Click-Through Rate Prediction in Recommendation Systems

This article presents a Bayesian smoothing approach to alleviate cold-start problems in click-through rate estimation, introduces key-value memory networks to incorporate prior knowledge, and proposes methods to convert continuous features into dictionary embeddings for deep learning models in recommendation systems.

Tencent Advertising Technology

The article begins by describing the cold-start problem in recommendation systems, where new users or items have insufficient historical data, leading to unreliable click-through rate (CTR) estimates. It illustrates the issue with examples showing how a single click can cause large variance in CTR for sparse data.
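To make the variance problem concrete, a toy calculation (the numbers here are illustrative, not taken from the article) shows how a single click moves the raw estimate for a sparse item but barely touches a well-observed one:

```python
def raw_ctr(clicks, impressions):
    # Naive maximum-likelihood CTR estimate.
    return clicks / impressions

# Sparse item: 5 impressions. One click moves the estimate by 20 points.
sparse_before = raw_ctr(0, 5)       # 0.0
sparse_after = raw_ctr(1, 5)        # 0.2

# Well-observed item: 5000 impressions. One click barely moves it.
dense_before = raw_ctr(100, 5000)   # 0.02
dense_after = raw_ctr(101, 5000)    # 0.0202
```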

To address this, the author proposes Bayesian smoothing: placing a beta prior on the CTR (fitted from the behavior of all users), modeling the observed clicks as binomial draws, and taking the posterior mean, which blends the population-level prior with the individual's observed counts. This yields a smoothed CTR that is much more stable for cold-start cases.
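A minimal sketch of this scheme (function names are mine; the article's own code differs): fit Beta(α, β) to the population of observed CTRs by the method of moments, then report the posterior mean as the smoothed estimate.

```python
import statistics

def fit_beta_moments(ctrs):
    # Method-of-moments fit of Beta(alpha, beta) to a list of observed CTRs.
    mean = statistics.fmean(ctrs)
    var = statistics.pvariance(ctrs)
    common = mean * (1 - mean) / var - 1
    return mean * common, (1 - mean) * common

def smoothed_ctr(clicks, impressions, alpha, beta):
    # Posterior mean of Beta(alpha + clicks, beta + impressions - clicks).
    return (clicks + alpha) / (impressions + alpha + beta)
```

With zero data the estimate falls back to the prior mean α/(α+β); as impressions grow, it converges to the raw click ratio.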

Next, the article introduces Key-Value Memory Networks (KVMN) as a mechanism for injecting prior knowledge into deep neural networks. The network consists of three components: key hashing (building a dictionary of frequent words to pre-select candidate memories), key addressing (scoring the similarity between the query and each key via embeddings and a softmax), and value reading (producing the output vector as the similarity-weighted sum of the value embeddings). KVMN lets the model treat its input as a set of key-value pairs, enabling richer feature representations.
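The addressing and reading steps can be sketched in NumPy as follows (key hashing, which narrows the candidate memory slots beforehand, is omitted; all names here are illustrative, not the article's code):

```python
import numpy as np

def kv_memory_read(query, key_emb, value_emb):
    """One read over a key-value memory.

    query:     (dim,)            embedded query
    key_emb:   (num_slots, dim)  key embeddings
    value_emb: (num_slots, dim)  value embeddings
    """
    # Key addressing: softmax over query-key inner products.
    scores = key_emb @ query
    weights = np.exp(scores - scores.max())  # max-shift for stability
    weights /= weights.sum()
    # Value reading: similarity-weighted sum of value embeddings.
    return weights @ value_emb
```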

For continuous features such as the smoothed CTR, the author surveys several strategies for converting them into vectors suitable for embedding layers: unsupervised discretization (equal-width or equal-frequency binning), supervised discretization using LightGBM, and the AutoInt approach of multiplying the original continuous value by the embedding of its discretized bucket. A further approach, credited to contest winner Guo Da, maps the continuous value onto a dictionary feature using its distances to uniformly spaced anchors in [0, 1]: inverse distances are passed through a softmax to obtain weights, with an optional squared inverse-distance variant to improve the spread of the weights.
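The anchor-based mapping can be sketched as follows (a reconstruction from the description above, not the winner's code; anchor count and the epsilon guard are my choices):

```python
import numpy as np

def anchor_weights(x, num_anchors=11, squared=False):
    # Uniformly spaced anchors in [0, 1]; x is assumed normalized to [0, 1].
    anchors = np.linspace(0.0, 1.0, num_anchors)
    dist = np.abs(x - anchors) + 1e-8          # guard against exact hits
    inv = 1.0 / dist ** (2 if squared else 1)  # (squared) inverse distance
    # Softmax turns the inverse distances into weights over the anchors.
    w = np.exp(inv - inv.max())
    return w / w.sum()

# The embedding of x is then the weighted sum of the anchor embeddings:
#   emb(x) = anchor_weights(x) @ anchor_embedding_table
```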

The article further extends the idea by treating the click‑through probability as a full distribution rather than a single expectation. By segmenting the probability density function into intervals and using the average probability in each bin as a dictionary feature, the model can directly consume a distribution‑level representation without an extra continuous‑to‑discrete step.
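Under the beta posterior from the smoothing step, this distribution-level feature can be sketched as below (the bin count and the midpoint approximation of the per-bin mass are my choices, not the article's):

```python
import math

def beta_pdf(p, a, b):
    # Density of Beta(a, b) at p, via the gamma function.
    coef = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return coef * p ** (a - 1) * (1 - p) ** (b - 1)

def distribution_feature(a, b, num_bins=10):
    # Approximate the probability mass in each bin by the pdf at the bin
    # midpoint times the bin width, then normalize so the weights sum to 1.
    width = 1.0 / num_bins
    mids = [(i + 0.5) * width for i in range(num_bins)]
    mass = [beta_pdf(m, a, b) * width for m in mids]
    total = sum(mass)
    return [m / total for m in mass]
```

Each bin's normalized mass then serves directly as the weight of the corresponding dictionary embedding, with no separate continuous-to-discrete conversion.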

Finally, the author provides code implementations for Bayesian smoothing, beta‑distribution‑based feature generation, continuous‑to‑vector conversion (including softmax and gravitation‑based weighting), and the KVMN layers (embedding, hashing, addressing, reading, and sequence‑multiply/pooling components). The code is presented in Python with necessary imports and class definitions, and the author notes that the smoothing code is adapted from online sources while the remainder is original.

Deep Learning · recommendation systems · click-through rate · Bayesian smoothing · continuous feature embedding · key-value memory networks
Written by

Tencent Advertising Technology

Official hub of Tencent Advertising Technology, sharing the team's latest cutting-edge achievements and advertising technology applications.
