Exploitation & Exploration Algorithms in Recommender Systems: ε‑Greedy, UCB, and Thompson Sampling Applications
This article introduces recommender systems and the exploitation‑exploration dilemma, explains common E&E algorithms such as ε‑greedy, Upper‑Confidence‑Bound, and Thompson Sampling, and details their practical deployment for interest‑point eviction, selection, and adaptive recall count optimization in an automotive recommendation platform.
Recommender systems help users discover content by modeling interests from behavior, addressing information overload and complementing search engines, which suffer from the Matthew effect. They collect historical actions, demographics, and preferences to generate item lists, giving exposure to long‑tail items.
The exploitation‑exploration (E&E) problem arises when new users or items appear, when diversity and novelty are needed, and when recommending only known interests erodes long‑term gains. Exploitation serves content matching known user interests, while exploration probes randomly to discover new preferences.
Common E&E algorithms include:
ε‑Greedy: with probability ε (e.g., 0.05) explore a random arm, otherwise exploit the arm with the highest estimated reward.
Upper‑Confidence‑Bound (UCB): selects the arm maximizing estimated reward plus a confidence term, so rarely tried arms get a built‑in exploration bonus.
Thompson Sampling: models each arm with a Beta‑Bernoulli bandit and samples from the posterior to decide which arm to pull.
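The three strategies above can be sketched as arm‑selection functions over shared click/view statistics. This is a minimal illustrative sketch, not the AutoHome implementation; the `Arm` class and function names are hypothetical.

```python
import math
import random

class Arm:
    """Tracks per-arm click (success) and view (trial) counts."""
    def __init__(self):
        self.clicks = 0
        self.views = 0

    @property
    def mean(self):
        # Empirical CTR; 0 for an arm that has never been shown.
        return self.clicks / self.views if self.views else 0.0

def epsilon_greedy(arms, epsilon=0.05):
    """With probability epsilon explore a random arm, else exploit the best mean."""
    if random.random() < epsilon:
        return random.randrange(len(arms))
    return max(range(len(arms)), key=lambda i: arms[i].mean)

def ucb1(arms):
    """Pick the arm maximizing mean + sqrt(2 ln t / n_i); untried arms go first."""
    t = sum(a.views for a in arms) or 1
    def score(i):
        a = arms[i]
        if a.views == 0:
            return float('inf')
        return a.mean + math.sqrt(2 * math.log(t) / a.views)
    return max(range(len(arms)), key=score)

def thompson(arms):
    """Sample a CTR from each arm's Beta(clicks+1, views-clicks+1) posterior."""
    return max(range(len(arms)),
               key=lambda i: random.betavariate(arms[i].clicks + 1,
                                                arms[i].views - arms[i].clicks + 1))
```

Note how the exploration pressure differs: ε‑greedy explores at a fixed rate, UCB explores arms with few trials, and Thompson Sampling explores in proportion to posterior uncertainty.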
In the AutoHome recommendation system, these algorithms are applied as follows:
Interest‑point eviction uses probabilistic thresholds (80% after 3 non‑clicks, 95% after 5, 99% after 8) to remove low‑impact points.
Interest‑point selection employs Thompson Sampling on click (win) and view (loss) counts of tags (e.g., car series, brand) to rank and select the top‑n points for recall.
Adaptive recall count leverages UCB to automatically allocate the number of items recalled per strategy, using minute‑level CTR as reward and storing the configuration in Redis.
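The probabilistic eviction thresholds described above (80% after 3 consecutive non‑clicks, 95% after 5, 99% after 8) can be sketched as a simple lookup. The function name and the injectable `rng` parameter are assumptions for testability, not part of the original system.

```python
import random

# Eviction probability after N consecutive non-clicks, checked from highest N down.
EVICTION_THRESHOLDS = [(8, 0.99), (5, 0.95), (3, 0.80)]

def should_evict(non_clicks, rng=random.random):
    """Probabilistically evict an interest point given its consecutive non-click count."""
    for min_count, prob in EVICTION_THRESHOLDS:
        if non_clicks >= min_count:
            return rng() < prob
    return False  # fewer than 3 non-clicks: never evict
```

Randomizing the eviction rather than using a hard cutoff lets a small fraction of stale interest points survive, which itself is a mild form of exploration.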
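Interest‑point selection via Thompson Sampling can be sketched as sampling one CTR per tag from a Beta posterior built from its click (win) and view counts, then keeping the top‑n. The input shape (`tag -> (clicks, views)`) and the function name are hypothetical.

```python
import random

def select_interest_points(stats, n, rng=random):
    """Rank tags (e.g., car series, brand) by a draw from the
    Beta(clicks + 1, views - clicks + 1) posterior and keep the top-n for recall.

    stats: dict mapping tag -> (clicks, views) -- an assumed input shape.
    """
    sampled = {
        tag: rng.betavariate(clicks + 1, views - clicks + 1)
        for tag, (clicks, views) in stats.items()
    }
    return sorted(sampled, key=sampled.get, reverse=True)[:n]
```

Because each call re‑samples the posterior, tags with few views occasionally draw high values and get recalled, giving new interest points a chance without a separate exploration rule.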
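Adaptive recall count allocation can be sketched as UCB1 over a small set of candidate counts per recall strategy, with minute‑level CTR as the reward (the chosen count would then be written to Redis; persistence is omitted here). The class and its candidate values are illustrative assumptions.

```python
import math

class RecallCountUCB:
    """UCB1 over candidate recall counts for one strategy; reward is minute-level CTR."""
    def __init__(self, candidates):
        self.candidates = candidates            # e.g. [20, 50, 100] items to recall
        self.pulls = [0] * len(candidates)
        self.reward_sum = [0.0] * len(candidates)

    def choose(self):
        """Return the recall count to use for the next window."""
        total = sum(self.pulls)
        best, best_score = 0, float("-inf")
        for i in range(len(self.candidates)):
            if self.pulls[i] == 0:
                return self.candidates[i]       # try every candidate once first
            score = (self.reward_sum[i] / self.pulls[i]
                     + math.sqrt(2 * math.log(total) / self.pulls[i]))
            if score > best_score:
                best, best_score = i, score
        return self.candidates[best]

    def update(self, count, ctr):
        """Record the observed CTR for the count used in the last window."""
        i = self.candidates.index(count)
        self.pulls[i] += 1
        self.reward_sum[i] += ctr
```

Each recall strategy holds its own bandit, so counts shrink automatically for strategies whose extra candidates do not convert into clicks.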
These deployments simplify the system, reduce unnecessary recalls, and enable fast parameter tuning without heavy code changes, while known limitations remain, such as coarse eviction granularity and attribution‑tracking challenges.
Future work plans to extend E&E toward broader reinforcement‑learning frameworks, exploring Deep Q‑learning with neural networks to model state‑action value functions for recommendation and ranking decisions.
HomeTech tech sharing