2011 LearningtoTradeOffBetweenExplor

From GM-RKB
Jump to navigation Jump to search

Subject Headings:

Notes

Cited By

Quotes

Author Keywords

Abstract

We study multi-class bandit prediction, an online learning problem where the learner only receives a partial feedback in each trial indicating whether the predicted class label is correct. The exploration vs. exploitation tradeoff strategy is a well-known technique for online learning with incomplete feedback (i.e., bandit setup). Banditron [8], a multi-class online learning algorithm for bandit setting, maximizes the run-time gain by balancing between exploration and exploitation with a fixed tradeoff parameter. The performance of Banditron can be quite sensitive to the choice of the tradeoff parameter and therefore effective algorithms to automatically tune this parameter is desirable. In this paper, we propose three learning strategies to automatically adjust the tradeoff parameter for Banditron. Our extensive empirical study with multiple real-world data sets verifies the efficacy of the proposed approach in learning the exploration vs. exploitation tradeoff parameter.

References

;

 AuthorvolumeDate ValuetitletypejournaltitleUrldoinoteyear
2011 LearningtoTradeOffBetweenExplorRong Jin
Hamed Valizadegan
Shijun Wang
Learning to Trade Off Between Exploration and Exploitation in Multiclass Bandit Prediction10.1145/2020408.20204452011