What is a Multi-Armed Bandit (MAB)?
Imagine you're at a bank of slot machines ("one-armed bandits"). Your goal is to maximize your winnings. A traditional A/B test would have you pull each lever the same fixed number of times (say, thousands of pulls each) before deciding which is best. A Multi-Armed Bandit, however, is smarter. It pulls each lever a few times, sees which one is paying out, and then automatically shifts more of its attention (and money) to the proven winners.
MAB is an algorithm that dynamically allocates incoming traffic to the best-performing variations throughout the experiment, ensuring that you minimize "regret" (the opportunity cost of sending traffic to a suboptimal version).
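To make "regret" concrete, here's a small illustrative calculation (not PageSense code; the visitor counts and conversion rates are made-up example values):

```python
# Illustrative sketch: "regret" is the expected number of conversions lost
# by not sending every visitor to the best variation.
def regret(visitors_per_variation, conversion_rates):
    best_rate = max(conversion_rates)
    expected = sum(n * p for n, p in zip(visitors_per_variation, conversion_rates))
    optimal = sum(visitors_per_variation) * best_rate
    return optimal - expected

# A fixed 50/50 split over 10,000 visitors, with true rates of 5% vs 8%,
# forfeits roughly 150 expected conversions compared to the optimal policy:
print(regret([5000, 5000], [0.05, 0.08]))
```

A bandit shrinks this number by moving traffic toward the 8% variation as evidence accumulates, instead of holding the 50/50 split for the whole test.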
The Explore vs. Exploit Dilemma
A core challenge in experimentation is balancing exploration (testing all options to discover the best) with exploitation (redirecting most resources toward the current top performer).
- Exploration helps you discover potential winners: what if something unexpected performs best?
- Exploitation gets you the most gains quickly by prioritizing what already seems to be working.
MAB algorithms are unique because they handle both at once: they continuously manage this trade-off, ensuring you never waste too much on underperforming ideas, but also don’t miss hidden opportunities.
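The trade-off is easiest to see in epsilon-greedy, one of the simplest bandit strategies. This is purely an illustration (PageSense itself uses Thompson Sampling, covered below); the epsilon value and counts are assumed for the demo:

```python
import random

# Epsilon-greedy: with probability epsilon, explore a random variation;
# otherwise, exploit the variation with the best observed conversion rate.
def choose_arm(successes, trials, epsilon=0.1):
    if random.random() < epsilon or 0 in trials:
        return random.randrange(len(trials))      # explore
    rates = [s / n for s, n in zip(successes, trials)]
    return rates.index(max(rates))                # exploit

# With epsilon=0.1, ~90% of visitors go to the current leader,
# while ~10% keep testing the alternatives.
```

A fixed epsilon never stops exploring, but it also never sharpens its allocation; Thompson Sampling improves on this by scaling exploration down naturally as the evidence grows.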
How MAB Works in PageSense
PageSense’s MAB feature moves beyond rigid, fixed splits and instead learns and optimizes all the time:
- Dynamic Allocation: All variations start with equal traffic, but successful versions rapidly receive a higher share of new visitors.
- Intelligent Learning (Thompson Sampling): We use a Bayesian approach called Thompson Sampling. This means the algorithm models each variation’s conversion rate as a probability distribution. Every time a user arrives, PageSense "samples" from these distributions and sends the visitor to the variation most likely to be a winner in that moment. Over time, the best-performing variation naturally gets the most traffic, while the system continues to test alternatives—never fully shutting off exploration.
- Real-Time Optimization: No waiting for weeks to reach a confidence level. MAB instantly and continuously optimizes, making it perfect for time-sensitive campaigns or when rapid gains are critical.
How Does Thompson Sampling Work?
Thompson Sampling is one of the most widely used and effective MAB algorithms, especially valued for digital optimization:
- Every variation is assigned a probability model based on its past results (e.g., number of successes and failures).
- When a new visitor lands, Thompson Sampling draws a random value from each variation’s probability curve and picks the variation with the highest sampled value.
- Over thousands of visits, the best-performing variations emerge with higher probability and thus collect more traffic, but there’s always a small chance underdogs are revisited, keeping the system adaptive as performance shifts.
- This approach balances learning enough about all possibilities (exploration) with maximizing wins (exploitation) in real time.
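The steps above can be sketched in a few lines of Python. This is an illustrative toy, not PageSense's implementation; the Beta(1, 1) prior and the 5%/8% true conversion rates are assumptions for the demo:

```python
import random

# Thompson Sampling sketch: each variation's conversion rate is modeled as
# a Beta distribution parameterized by its observed successes and failures.
def thompson_pick(successes, failures):
    samples = [random.betavariate(s + 1, f + 1)   # Beta(1, 1) prior
               for s, f in zip(successes, failures)]
    return samples.index(max(samples))            # highest sampled value wins

# Simulate 10,000 visitors with true rates of 5% (Control) and 8% (Variant):
true_rates = [0.05, 0.08]
succ, fail = [0, 0], [0, 0]
for _ in range(10_000):
    arm = thompson_pick(succ, fail)
    if random.random() < true_rates[arm]:
        succ[arm] += 1
    else:
        fail[arm] += 1

# The better variation ends up receiving most of the traffic,
# yet the weaker one is never fully shut off.
print([s + f for s, f in zip(succ, fail)])
```

Early on, both Beta distributions are wide and overlap heavily, so both variations get sampled as the winner often (exploration); as data accumulates, the distributions narrow and the stronger variation wins most draws (exploitation).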
Key Benefits of Using Multi-Armed Bandit
- Maximized Conversions During Testing: Rather than waiting until the end, MAB ramps up gains by directing visitors to top performers throughout the experiment ("earn while you learn").
- Perfect for Fast, High-Stakes Campaigns: Quickly react and capture value in short-term promotions or news cycles—where instant action matters most.
- Fully Automated Optimization: Once started, MAB takes over the traffic shifts, eliminating the need for manual tuning or constant attention.
- Adapts to Trends and Shifts: If user behavior or market dynamics change mid-campaign (seasonality, new trends), MAB spots the shift and finds the new winner, so there's no need to restart tests from scratch.
Limitations to Keep in Mind
- Statistical Trade-Off: MAB focuses on maximizing immediate outcomes, not on providing a scientific, final answer to "which variant is best?" Use traditional A/B tests when you need a definitive, reportable statistical comparison.
- Not for Research or Causal Insights: Need to precisely attribute cause and lift? A/B/n is still the gold standard for analytical rigor.
- Poor Fit for Delayed Feedback: Avoid for scenarios like email campaigns or long sales cycles where conversions don’t happen quickly, as that breaks MAB’s adaptive feedback loop.
Allocating Traffic Split in A/B Tests
PageSense gives you two ways to split traffic between variations: manual control or automatic optimization with MAB.
1. Manual Distribution
You decide exactly how traffic is split between versions:
- Equal Split: For example, with 2 variations, you assign 50/50. With 3, you might do 33/33/33.
- Custom Split: Tailor the distribution to your needs — e.g., 60% to Control, 40% to Variant.
Best for controlled experiments, gradual rollouts, or when you want full oversight of exposure.
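For intuition, a manual custom split boils down to weighted random assignment per visitor. A minimal sketch, using the 60/40 example weights from above (not PageSense's bucketing logic, which may also need to keep returning visitors in the same variation):

```python
import random

# Assign each visitor to a variation according to fixed weights (60/40 here).
def assign_variation(weights=(0.6, 0.4)):
    r = random.random()
    cumulative = 0.0
    for i, w in enumerate(weights):
        cumulative += w
        if r < cumulative:
            return i
    return len(weights) - 1  # guard against floating-point rounding

counts = [0, 0]
for _ in range(1000):
    counts[assign_variation()] += 1
print(counts)  # roughly [600, 400]
```

Unlike MAB, these weights never change: the 40% going to the weaker variation keeps going there until you manually edit the split.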
2. Auto Distribution (MAB)
PageSense handles the split dynamically:
- Starts with equal distribution.
- Learns from results and sends more traffic to high performers while reducing traffic to weaker ones.
- Keeps adjusting in real time to maximize conversions.
Important: Once you pick auto-distribution and launch the experiment, you can't revert to manual distribution or edit the traffic splits.
When to use Auto Distribution:
- For high-traffic campaigns where speed and adaptive learning matter more than declaring a single “winner.”
- During promotions, time-sensitive events, or ongoing optimizations.
Example Scenarios:
- Manual Equal: 1000 visitors split 500/500.
- Manual Custom: 1000 visitors split 600/400.
- Auto (MAB): 1000 visitors start evenly, then shift dynamically (e.g., 700 Variant / 300 Control).