Multi Armed Bandit Testing Task


A Multi Armed Bandit Testing Task is a discrete testing task that ...



References

2016

2012

  • https://conductrics.com/balancing-earning-with-learning-bandits-and-adaptive-optimization/
    • QUOTE: … Let’s see how Bandit’s continuous Learn and Earn balancing act looks compared to AB Testing in the graph below.

      With the AB Testing approach we try out each of the three options in equal proportions until we end our test at week 5, and then select the option with the highest value. So we have effectively selected the best option 44% of the time over the 6 week period (33% of the time through week 5, and 100% at week 6), and the medium and lowest performing options 28% of the time each (33% of the time through week 5, and 0% at week 6). If we then weight these selection proportions by the actual value of each option, we get an expected yield of about $1.31 per user over the 6 weeks (44%*$2 + 28%*$1 + 28%*$0.50 = $1.31).

      Now let's look at the Bandit approach. Remember, the bandit attempts to use what it knows about each option from the very beginning, so it continuously updates the probabilities that it will select each option throughout the optimization project. In the above chart we can see that with each new week, the bandit reduces how often it selects the lower performing options and increases how often it selects the highest performing option.

      So, over the 6 week period the bandit approach has selected the best option 72% of the time, the medium option 16% of the time, and the lowest performer only 12% of the time. If we weight by the average value of each option we get an expected yield of about $1.66 per user. In this case the bandit would have returned a 27% lift over the AB Testing approach – ($1.66-$1.31)/$1.31 – during the learning time of the project.
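
To make the arithmetic in the quoted example concrete, the following Python sketch (an illustration only, not code from the Conductrics article) simulates both strategies under the same assumptions: three options worth $2.00, $1.00, and $0.50 per user and a 6-week horizon. The A/B branch allocates traffic uniformly through week 5 and then sends everyone to the observed winner, which reproduces the quoted yield of about $1.31 per user. The bandit branch uses a simple epsilon-greedy rule as a stand-in for the adaptive allocation described in the quote; the traffic volume, the epsilon value, and the epsilon-greedy rule itself are assumptions, so its yield will not match the quoted $1.66 exactly, though it illustrates the same earn-while-you-learn effect.

```python
import random

# Per-user values of the three options, taken from the quoted example.
OPTION_VALUES = [2.00, 1.00, 0.50]
WEEKS = 6
USERS_PER_WEEK = 1000  # assumed traffic volume; not stated in the source


def ab_test_yield():
    """Uniform 1/3 allocation for weeks 1-5, then 100% to the observed best in week 6."""
    total_value, total_users = 0.0, 0
    estimates, counts = [0.0] * 3, [0] * 3
    for week in range(1, WEEKS + 1):
        for _ in range(USERS_PER_WEEK):
            if week <= 5:
                arm = random.randrange(3)              # explore all options equally
            else:
                arm = estimates.index(max(estimates))  # exploit the observed winner
            reward = OPTION_VALUES[arm]                # deterministic values for simplicity
            counts[arm] += 1
            estimates[arm] += (reward - estimates[arm]) / counts[arm]  # running mean
            total_value += reward
            total_users += 1
    return total_value / total_users


def epsilon_greedy_yield(epsilon=0.1):
    """A simple epsilon-greedy bandit: mostly pick the current best, explore occasionally."""
    total_value, total_users = 0.0, 0
    estimates, counts = [0.0] * 3, [0] * 3
    for _ in range(WEEKS * USERS_PER_WEEK):
        if random.random() < epsilon or 0 in counts:
            arm = random.randrange(3)                  # explore
        else:
            arm = estimates.index(max(estimates))      # exploit
        reward = OPTION_VALUES[arm]
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_value += reward
        total_users += 1
    return total_value / total_users


if __name__ == "__main__":
    ab = ab_test_yield()
    bandit = epsilon_greedy_yield()
    print(f"A/B testing yield per user: ${ab:.2f}")
    print(f"Bandit yield per user:      ${bandit:.2f}")
    print(f"Lift: {(bandit - ab) / ab:.0%}")
```

The lift is computed exactly as in the quote: (bandit yield − A/B yield) / A/B yield. With a more conservative exploration schedule the bandit's allocation would look closer to the 72%/16%/12% split described above.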