Contextual Bandit Algorithm

A Contextual Bandit Algorithm is a bandit algorithm that selects each action conditioned on an observed context and that can be implemented by a contextual bandit system (to solve a contextual bandit task).



  • Ashok Chandrashekar, Fernando Amat, Justin Basilico, and Tony Jebara. (2017). “Artwork Personalization at Netflix.” In: Netflix Blog.
    • QUOTE: … Briefly, contextual bandits are a class of online learning algorithms that trade off the cost of gathering training data required for learning an unbiased model on an ongoing basis with the benefits of applying the learned model to each member context. In our previous unpersonalized image selection work, we used non-contextual bandits where we found the winning image regardless of the context. For personalization, the member is the context as we expect different members to respond differently to the images.

      A key property of contextual bandits is that they are designed to minimize regret. At a high level, the training data for a contextual bandit is obtained through the injection of controlled randomization in the learned model’s predictions. The randomization schemes can vary in complexity from simple epsilon-greedy formulations with uniform randomness to closed loop schemes that adaptively vary the degree of randomization as a function of model uncertainty. …
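
The simple epsilon-greedy scheme with uniform randomness mentioned in the quote can be illustrated with a minimal sketch. This is not the method described in the Netflix post; it assumes a small discrete set of contexts, keeps one running mean reward per (context, arm) pair, and explores uniformly at random with probability epsilon. The class name, the member segments, the image names, and the click probabilities are all hypothetical values chosen for illustration.

```python
import random
from collections import defaultdict


class EpsilonGreedyContextualBandit:
    """Tabular epsilon-greedy policy: one running mean reward per (context, arm)."""

    def __init__(self, arms, epsilon=0.1, seed=None):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)   # (context, arm) -> number of pulls
        self.means = defaultdict(float)  # (context, arm) -> mean observed reward

    def select(self, context):
        # Explore: with probability epsilon, pick an arm uniformly at random.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        # Exploit: pick the arm with the highest estimated reward for this context.
        return max(self.arms, key=lambda arm: self.means[(context, arm)])

    def update(self, context, arm, reward):
        key = (context, arm)
        self.counts[key] += 1
        # Incremental update of the running mean reward.
        self.means[key] += (reward - self.means[key]) / self.counts[key]


def simulate(rounds=5000, seed=0):
    # Hypothetical click probabilities per (member segment, image).
    click_prob = {
        ("comedy_fan", "image_a"): 0.8, ("comedy_fan", "image_b"): 0.2,
        ("drama_fan", "image_a"): 0.2, ("drama_fan", "image_b"): 0.8,
    }
    env = random.Random(seed)
    bandit = EpsilonGreedyContextualBandit(
        ["image_a", "image_b"], epsilon=0.1, seed=seed + 1
    )
    for _ in range(rounds):
        context = env.choice(["comedy_fan", "drama_fan"])
        arm = bandit.select(context)
        # Only the reward of the chosen arm is observed (bandit feedback).
        reward = 1.0 if env.random() < click_prob[(context, arm)] else 0.0
        bandit.update(context, arm, reward)
    return bandit


if __name__ == "__main__":
    learned = simulate()
    for (context, arm), mean in sorted(learned.means.items()):
        print(context, arm, round(mean, 2))
```

In this toy setting a non-contextual bandit would see both images perform about equally well on average, whereas the contextual policy can learn a different preferred image for each segment. A fixed epsilon keeps exploring at the same rate forever; the adaptive schemes the quote mentions would instead reduce randomization as model uncertainty shrinks.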