Improving A/B Testing with a 20‑Line Multi‑Armed Bandit Algorithm
This article explains how a simple 20‑line multi‑armed bandit implementation can replace traditional A/B testing by continuously balancing exploration and exploitation to automatically discover the most effective UI variant, reducing manual analysis and improving conversion rates.
Traditional A/B testing splits users into two groups, shows version A to one group and version B to the other, then counts which group converts more. While simple, this approach is inefficient: it keeps sending half of the traffic to the losing variant for the full duration of the test, much as a medical trial keeps giving half its patients a placebo even after the drug appears to work.
The article proposes using a multi‑armed bandit algorithm, which can handle more than two options (A, B, C, …) and adaptively allocate traffic to the best performing variant. By incorporating a small amount of random exploration (e.g., 10% of the time) and greedy exploitation the rest of the time, the system quickly converges on the optimal choice.
The core logic is demonstrated with a concise code snippet that tracks lever pulls and rewards and updates each lever's expected reward. A cleaned-up, runnable Python version of the snippet:

    import random

    NUM_LEVERS = 3
    counts = [0] * NUM_LEVERS     # times each lever has been played
    totals = [0.0] * NUM_LEVERS   # total reward given by each lever

    def choose():
        if random.random() < 0.1:
            # exploration!
            # choose a random lever 10% of the time.
            choice = random.randrange(NUM_LEVERS)
        else:
            # exploitation!
            # for each lever, calculate the expectation of reward:
            # the total reward given by that lever divided by the
            # number of times it has been played.
            expectations = [totals[i] / counts[i] if counts[i] else 0.0
                            for i in range(NUM_LEVERS)]
            # choose the lever with the greatest expectation of reward.
            choice = expectations.index(max(expectations))
        # increment the number of times the chosen lever has been played.
        counts[choice] += 1
        # store test data in redis, choice in session key, etc.
        return choice

    def reward(choice, amount):
        # add the reward to the total for the given lever.
        totals[choice] += amount

When applied to a "Buy Now" button with three color options (orange, green, white), the algorithm favors the option it currently believes has the highest success probability, updates its estimate after each click, and eventually shows the best-performing color about 90% of the time.
The article also discusses common objections: difficulty interpreting statistics, distrust of machine‑learning algorithms, lack of support in mainstream tools, and concerns about delayed adaptation. It suggests using a forgetting factor or adjusting exploration rates to address these issues.
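The forgetting-factor idea can be sketched as follows. This is an illustrative implementation, not the article's code: it assumes exponential decay of past counts and totals (the `ForgettingBandit` name and the `gamma` default are my own), so recent results weigh more and the bandit can re-adapt if a variant's performance drifts:

```python
class ForgettingBandit:
    """Reward bookkeeping with a forgetting factor: every update decays
    all past counts and totals by gamma, so old observations fade and
    the expectation tracks recent behavior. gamma = 1 reduces to the
    plain running average."""

    def __init__(self, num_levers, gamma=0.99):
        self.gamma = gamma
        self.counts = [0.0] * num_levers
        self.totals = [0.0] * num_levers

    def update(self, choice, amount):
        # decay all history, then credit the chosen lever
        self.counts = [c * self.gamma for c in self.counts]
        self.totals = [t * self.gamma for t in self.totals]
        self.counts[choice] += 1.0
        self.totals[choice] += amount

    def expectation(self, i):
        # decayed average reward for lever i
        return self.totals[i] / self.counts[i] if self.counts[i] else 0.0
```

Raising the exploration rate has a similar effect through a different mechanism: more random pulls mean a lever whose payoff has improved gets noticed sooner, at the cost of showing more users a suboptimal variant in the steady state.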
Overall, the multi‑armed bandit approach offers a practical, automated alternative to static A/B testing, enabling faster optimization of UI elements without extensive manual analysis.
Qunar Tech Salon
Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.