Unleashing the Algorithmic Advantage with Multi-Armed Bandits

Conversion optimisation is crucial for boosting website metrics. While A/B testing is a standard experimentation approach, Multi-Armed Bandits (MAB) offer an intriguing alternative methodology. But what exactly are multi-armed bandits and how can they enhance your CRO efforts?

What is a Multi-Armed Bandit?

The multi-armed bandit is a decision-making problem. You have a limited set of options (the "arms") and limited resources, and you don't know in advance which option will give the best results; a MAB algorithm learns which option performs best while you keep choosing between them.

The classic example of a multi-armed bandit problem is the dilemma of a gambler in a casino. Facing a row of slot machines with no prior knowledge of them and only limited time, it is very difficult to decide which machine to play, how many times to play it, and in which order to play so as to maximise the total payout. Bandit algorithms resolve this dilemma by learning as they go: applied to a website, the algorithm continuously optimises traffic distribution, favouring variations that show promising results.

Various algorithms have been developed for the multi-armed bandit problem. Here are some of the most commonly used (a minimal code sketch of each selection rule follows this list):

Epsilon-Greedy Algorithm: This algorithm balances exploration and exploitation. It selects the arm with the highest estimated reward with probability (1 - ε) and selects a random arm with probability ε to explore other options.
Upper Confidence Bound (UCB) Algorithm: The UCB algorithm adds an uncertainty bonus to each arm's estimated reward and plays the arm with the highest resulting upper confidence bound, so that rarely tried arms still get explored while strong performers are exploited.
Thompson Sampling: Named after William R. Thompson, this is a Bayesian algorithm for the explore-exploit dilemma. It samples a plausible reward for each arm from its posterior distribution and plays the arm with the best draw, balancing exploration of uncertain variations (arms) with exploitation of those that have shown promise in the observed data.
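
To make these rules concrete, here is a minimal Python sketch of the three selection strategies, assuming a simple 0/1 "converted or not" reward. The helper names and bookkeeping lists are illustrative only, not the implementation used by any particular testing tool.

```python
import math
import random

# Minimal sketch: each "arm" is a variation. counts[i] = times arm i was shown,
# values[i] = its running mean reward, and (successes, failures) feed the Beta
# posterior used by Thompson sampling.

def epsilon_greedy(values, epsilon=0.1):
    """Explore a random arm with probability epsilon, otherwise exploit the best one."""
    if random.random() < epsilon:
        return random.randrange(len(values))
    return max(range(len(values)), key=lambda i: values[i])

def ucb1(counts, values):
    """Play any untried arm first, then the arm with the highest upper confidence bound."""
    for i, c in enumerate(counts):
        if c == 0:
            return i
    total = sum(counts)
    return max(range(len(values)),
               key=lambda i: values[i] + math.sqrt(2 * math.log(total) / counts[i]))

def thompson(successes, failures):
    """Sample a conversion rate from each arm's Beta posterior and play the best draw."""
    draws = [random.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(draws)), key=lambda i: draws[i])

def update(counts, values, arm, reward):
    """Record the observed reward and refresh the chosen arm's running mean."""
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]
```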

In the context of A/B testing and experimentation software, Thompson sampling can be used to dynamically allocate traffic between variants to maximise the desired outcome. Just as the gambler gradually shifts play towards the machines that pay out, an MAB test dynamically shifts visitors towards the better-performing variation while the experiment is still running.

Unlike fixed A/B testing, which distributes traffic evenly, an intelligent MAB algorithm dynamically allocates more users to the high-converting variation to exploit that winning experience. This enables faster optimisation by focusing traffic on what is empirically working best.
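
As an illustration of that dynamic allocation, the following sketch simulates Thompson sampling over two variants with hypothetical true conversion rates of 4% and 5%; the rates, visitor count, and variant names are invented for the example.

```python
import random

# Hypothetical true conversion rates -- unknown in a real experiment.
TRUE_RATES = {"control": 0.04, "variation": 0.05}

# One Beta posterior per variant, starting from a flat Beta(1, 1) prior.
posteriors = {name: [1, 1] for name in TRUE_RATES}
served = {name: 0 for name in TRUE_RATES}

for _ in range(20_000):
    # Draw a plausible conversion rate for each variant and serve the best draw.
    draws = {name: random.betavariate(a, b) for name, (a, b) in posteriors.items()}
    chosen = max(draws, key=draws.get)
    served[chosen] += 1

    # Simulate whether this visitor converts, then update that variant's posterior.
    if random.random() < TRUE_RATES[chosen]:
        posteriors[chosen][0] += 1  # success
    else:
        posteriors[chosen][1] += 1  # failure

share = {name: round(count / sum(served.values()), 2) for name, count in served.items()}
print(share)  # the better-converting variant typically receives the majority of simulated traffic
```

Because the stronger variant keeps producing better draws from its posterior, it ends up receiving most of the simulated traffic, which is exactly the skew described above.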

Benefits of Multi-Armed Bandits

The biggest benefit of Multi-Armed Bandits is short-term exploitation: getting to a well-performing variant in as short a time as possible. Consider a big promotion such as a Black Friday campaign. If you are running a test, you want to find the best possible content as quickly as possible, and past learnings may not apply during the sale period. Put another way, the key benefit of a Multi-Armed Bandit is time to value: instead of running an experiment for two weeks, or waiting for 100k users, before extracting any value, you can start shifting traffic in favour of winning variations as soon as there is data to support a skew. Some other key upsides of multi-armed bandits for experimentation include:

  • Faster optimisation: more users are shifted to the optimal experience, so the maximum number of visitors see the best variation during the experiment.
  • Particularly helpful for websites with dynamic content, like news sites, where the objective is to maximise interaction but there are many options and no obvious strategy for guiding users towards a particular course of action.
  • Continuous learning and adaptation during the test run, unlike A/B testing, where you have to wait until the experiment finishes.
  • Works well for high-traffic websites: it facilitates swift decision-making and moves visitors to the optimal variation more quickly.
  • Well suited to websites that already have some historical data.
  • Effective for testing many variations at once, such as individual algorithm components.

Multi-Armed Bandit Limitations

While multi-armed bandits can be a powerful approach to Conversion Rate Optimisation (CRO), there are some limitations as well.

  • Statistical significance needs balancing with speed: the test may end before a definitive "winning" variation has emerged.
  • Qualitative experience testing: optimising full-page templates, long forms and other qualitative experiences may be difficult, since bandits favour iterative component testing.
  • High-stakes modifications: bandits may not be appropriate where the consequences of a wrong decision are significant, such as adding shipping or payment options to checkout funnel pages.
  • They are also less attractive for small A/B tests with only two variations.
  • Low-traffic sites: for smaller sites with sparse traffic, bandits may not accumulate enough data to optimise effectively.

While promising for their efficiency, bandits do need strategic implementation. Understanding these limitations allows teams to apply them selectively where they can maximise impact. A balanced optimisation program combines multiple methodologies.

Practical examples of Multi-Armed Bandits

Here are some practical examples of how multi-armed bandits can be applied to conversion rate optimisation and experimentation:

1. Try various page layouts

Run a bandit test that optimises multiple page layout variants in real-time by serving more users the best-performing layouts. It will shift traffic away from underperforming layouts faster than A/B testing.

2. Test Pricing/Offer variations

You can apply a bandit model to test multiple pricing and offer points simultaneously. It will steer traffic toward the variation that best balances conversion rate and revenue.
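
One way to express that balance in code is to make the reward the revenue a visitor generates rather than a 0/1 conversion. This is a hedged sketch only; the offers and price points below are invented for illustration.

```python
import random

# Hypothetical offers -- the price points are purely illustrative.
OFFERS = ["$19, no trial", "$24, 14-day trial", "$29, 30-day trial"]

counts = [0] * len(OFFERS)          # visitors who saw each offer
avg_revenue = [0.0] * len(OFFERS)   # running mean revenue per visitor for each offer

def choose_offer(epsilon=0.1):
    """Epsilon-greedy over revenue per visitor rather than raw conversion rate."""
    if 0 in counts or random.random() < epsilon:
        return random.randrange(len(OFFERS))                          # explore / warm up
    return max(range(len(OFFERS)), key=lambda i: avg_revenue[i])      # exploit

def record_outcome(offer_index, revenue):
    """Update the chosen offer with the revenue this visitor generated (0.0 if no purchase)."""
    counts[offer_index] += 1
    avg_revenue[offer_index] += (revenue - avg_revenue[offer_index]) / counts[offer_index]
```

In a live test, record_outcome would be called with the order value when a visitor purchases and 0 otherwise, so the arm the bandit favours is the one with the highest revenue per visitor, not simply the highest conversion rate.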

3. Surface higher-performing recommendations faster

A Multi-Armed Bandit algorithm can test out different product recommendation algorithms or the placement/combination of products to determine the optimal approach for generating conversions. More users will see the most relevant, higher-performing recommendation faster.

4. Find the best lead-generating landing page variation faster

Bandits are effective for testing various layouts and content on landing pages: you can test different headlines, CTAs, images, and copy, and the better-performing content is shown to more users to accelerate optimisation.

5. Try various designs for forms

Different configurations of form fields, social proof placements, and other elements within a sign-up flow can be tested adaptively to optimise conversion through the funnel.

6. On-page Messaging

Test different on-page messages like exit popups, embedded hello bars, or feedback prompts using a bandit algorithm to determine optimal placement, copy, and offer.

7. Page Load Progress Bars

Experiment with different progress bar styles, colours, and messaging that indicate page load speed or progress. A bandit can converge on the most effective format.

8. Chatbot Integrations

Adaptively test the impact of different chatbots, scripts, and activation techniques (e.g. popups, badges) on providing assistance and reducing drop-off.

9. Social Proof

Test the impact of showing different types of social proof like reviews, testimonials, trust badges, and peer purchase activity using a bandit.

10. Marketing campaigns

A marketing campaign that runs for only a short period is well suited to bandit testing, which refines the optimal combination iteratively.

11. Page Content Ordering

Use a bandit to determine the ideal order of content blocks or modules on key website pages. The ordering with the most conversions will be served to visitors more frequently.

12. Dynamic content websites

A dynamic content website has to decide which articles to display to each visitor. With no information about the visitor, all click outcomes are unknown. Which articles will get the most clicks, and in which order should they appear? The site's goal is to maximise engagement, but it has many pieces of content to choose from and lacks the data that would justify a specific strategy.
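
A minimal sketch of that scenario, assuming a hypothetical list of article slugs and a UCB-style rule (chosen here because it tries every article at least once before exploiting), might look like this:

```python
import math

# Hypothetical article slugs competing for a homepage slot.
ARTICLES = ["budget-2025", "transfer-rumours", "local-weather", "gadget-review"]
impressions = {a: 0 for a in ARTICLES}
clicks = {a: 0 for a in ARTICLES}

def pick_article():
    """UCB1 over click-through rate: untried articles first, then balance explore/exploit."""
    for a in ARTICLES:
        if impressions[a] == 0:
            return a
    total = sum(impressions.values())
    return max(ARTICLES,
               key=lambda a: clicks[a] / impressions[a]
               + math.sqrt(2 * math.log(total) / impressions[a]))

def record(article, clicked):
    """Log the impression and whether the visitor clicked through."""
    impressions[article] += 1
    clicks[article] += int(clicked)
```
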
The key is applying bandits to experiences that can be dynamically adjusted and optimised based on live user data. Their adaptive nature suits any scenario with clear success metrics.

Overall, multi-armed bandits introduce an intriguing opportunity to complement standard testing efforts, empowering practitioners to optimise pages rapidly. When applied strategically, bandits can accelerate experimentation programs.

A more dynamic approach

As consumer expectations rise across industries, companies need to continually optimise experiences to stay competitive. While A/B testing remains a stalwart experimentation technique, multi-armed bandits offer an intriguing complementary methodology.

By dynamically allocating visitors to better-performing variations, bandit algorithms enable faster optimisation through real-time learning. Their ability to double down on what is empirically working makes them well-suited for certain iterative tests. However, bandits do require thoughtful implementation to ensure adequate statistical significance and reasonable interpretation. When applied strategically to key pages and components, bandits provide optimisation teams with increased efficiency and flexibility.

The future of sophisticated experimentation is bright, and techniques like bandits will propel the evolution even further. So get ready to spin those dials and may the highest-converting variation win!