Artificial Intelligence · 10 min read

Exploitation & Exploration Algorithms in Recommender Systems: ε‑Greedy, UCB, and Thompson Sampling Applications

This article introduces recommender systems and the exploitation‑exploration dilemma, explains common E&E algorithms such as ε‑greedy, Upper‑Confidence‑Bound, and Thompson Sampling, and details their practical deployment for interest‑point eviction, selection, and adaptive recall count optimization in an automotive recommendation platform.


Recommender systems help users discover content by modeling interests from behavior, addressing information overload and complementing search engines, which suffer from the Matthew effect (popular items attract ever more exposure while long‑tail items stay buried). They collect historical actions, demographics, and preferences to generate item lists, giving exposure to long‑tail items.

The exploitation‑exploration (E&E) problem arises when new users or items appear, when diversity and novelty are needed, and when recommending only known interests reduces long‑term gains. Exploitation recommends against known user interests, while exploration probes less‑certain items to discover new preferences.

Common E&E algorithms include:

ε‑Greedy: with probability ε (e.g., 0.05) pick a random arm to explore; otherwise pick the arm with the highest estimated reward.

Upper‑Confidence‑Bound (UCB): selects the arm with the highest estimated reward plus a confidence bonus that shrinks as an arm is pulled more often, so uncertain arms are explored automatically.

Thompson Sampling: models each arm's reward as a Bernoulli variable with a Beta prior, samples from each arm's posterior, and pulls the arm with the highest sample.
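The three strategies can be sketched as minimal Python functions over click/pull counts. This is a hedged illustration; the function names and signatures are not from the original system:

```python
import math
import random

def epsilon_greedy(means, epsilon=0.05):
    """With probability epsilon pick a random arm, else the best-known arm."""
    if random.random() < epsilon:
        return random.randrange(len(means))
    return max(range(len(means)), key=lambda i: means[i])

def ucb1(clicks, pulls, total_pulls):
    """Pick the arm maximizing mean reward plus a confidence bonus."""
    def score(i):
        if pulls[i] == 0:
            return float("inf")  # try every arm at least once
        mean = clicks[i] / pulls[i]
        bonus = math.sqrt(2 * math.log(total_pulls) / pulls[i])
        return mean + bonus
    return max(range(len(pulls)), key=score)

def thompson(wins, losses):
    """Sample a CTR estimate from each arm's Beta posterior; pick the max."""
    samples = [random.betavariate(w + 1, l + 1) for w, l in zip(wins, losses)]
    return max(range(len(samples)), key=lambda i: samples[i])
```

Note the different exploration mechanisms: ε‑greedy explores uniformly at a fixed rate, UCB lets uncertainty drive exploration deterministically, and Thompson Sampling explores in proportion to posterior uncertainty.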

In the AutoHome recommendation system, these algorithms are applied as follows:

Interest‑point eviction removes low‑impact interest points probabilistically: after 3 non‑clicks a point is evicted with 80% probability, after 5 with 95%, and after 8 with 99%.
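The eviction rule might look like the following sketch, assuming a per‑point counter of non‑clicks; the threshold table is taken from the article, while the function name is illustrative:

```python
import random

# Eviction probability by non-click count (thresholds from the article).
EVICT_RULES = [(8, 0.99), (5, 0.95), (3, 0.80)]

def should_evict(non_clicks):
    """Return True if a low-impact interest point should be removed."""
    for threshold, prob in EVICT_RULES:
        if non_clicks >= threshold:
            return random.random() < prob
    return False
```

Checking the highest threshold first ensures a point with, say, 8 non‑clicks is judged at the 99% level rather than the 80% one.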

Interest‑point selection employs Thompson Sampling over each tag's click (win) and view (loss) counts (e.g., car series, brand), sampling a CTR estimate per tag and selecting the top‑n interest points for recall.
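A minimal sketch of that selection step, assuming tag statistics live in a plain dict (the +1 smoothing corresponds to a Beta(1, 1) prior; the function name is hypothetical):

```python
import random

def top_n_interest_points(stats, n):
    """Rank tags by a sample from each tag's Beta posterior; keep the top n.

    stats maps tag -> (clicks, views); per the article, clicks count as
    wins and views as losses.
    """
    def sample(tag):
        clicks, views = stats[tag]
        return random.betavariate(clicks + 1, views + 1)
    return sorted(stats, key=sample, reverse=True)[:n]
```

Because each call draws fresh posterior samples, tags with few observations occasionally outrank established ones, which is exactly the exploration behavior wanted here.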

Adaptive recall count leverages UCB to automatically allocate the number of items recalled per strategy, taking minute‑level CTR as the reward signal and storing the resulting configuration in Redis.
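One way to turn UCB scores into per‑strategy recall counts is proportional allocation, sketched below. The proportional step is an assumption (the article does not spell out how scores map to counts), and in the production setup described above the resulting dict would then be written to Redis for the recall layer to read:

```python
import math

def allocate_recall_counts(strategies, total_items):
    """Split the recall budget across strategies in proportion to UCB scores.

    strategies maps name -> (clicks, impressions) from minute-level CTR logs.
    """
    total_pulls = sum(imps for _, imps in strategies.values()) or 1

    def ucb(clicks, imps):
        if imps == 0:
            return 1.0  # optimistic score so unexplored strategies get traffic
        return clicks / imps + math.sqrt(2 * math.log(total_pulls) / imps)

    scores = {name: ucb(c, i) for name, (c, i) in strategies.items()}
    norm = sum(scores.values())
    return {name: round(total_items * s / norm) for name, s in scores.items()}
```

Low‑CTR strategies still receive a nonzero share via the confidence bonus, so the allocation keeps exploring rather than starving any single recall path.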

These deployments improve system simplicity, reduce unnecessary recalls, and enable fast parameter tuning without heavy code changes, while also addressing limitations such as coarse eviction granularity and tracking challenges.

Future work plans to extend E&E to broader reinforcement‑learning frameworks, exploring Deep Q‑learning (DQN) with neural networks to model state‑action value functions for optimal recommendation and ranking decisions.

Tags: reinforcement learning, recommender systems, epsilon-greedy, UCB, bandit algorithms, exploitation, exploration, Thompson Sampling
Written by HomeTech (HomeTech tech sharing)