2013 BalancingExplorationandExploita

Subject Headings: Online Learning-to-Rank.

Notes

As retrieval systems become more complex, learning to rank approaches are being developed to automatically tune their parameters. Using online learning to rank, retrieval systems can learn directly from implicit feedback inferred from user interactions. In such an online setting, algorithms must obtain feedback for effective learning while simultaneously utilizing what has already been learned to produce high quality results. We formulate this challenge as an exploration-exploitation dilemma and propose two [[online LtR method|methods]] for addressing it. By adding mechanisms for balancing exploration and exploitation during learning, each method extends a state-of-the-art learning to rank method, one based on listwise learning and the other on pairwise learning. Using a recently developed simulation framework that allows assessment of online performance, we empirically evaluate both methods. Our results show that balancing exploration and exploitation can substantially and significantly improve the online retrieval performance of both listwise and pairwise approaches. In addition, the results demonstrate that such a balance affects the two approaches in different ways, especially when user feedback is noisy, yielding new insights relevant to making online learning to rank effective in practice.
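
The exploration-exploitation trade-off described in the abstract can be illustrated with a small sketch. The code below shows one generic way to balance the two when building a result list: at each rank, with probability `epsilon` an exploratory document is chosen at random, and otherwise the best remaining document under the current ranking is shown. This is an illustrative epsilon-greedy scheme under stated assumptions, not a reproduction of the paper's algorithms; the function name `epsilon_greedy_ranking` and its parameters are hypothetical.

```python
import random


def epsilon_greedy_ranking(exploit_ranking, epsilon, length, rng=None):
    """Build a result list that mixes exploitative and exploratory picks.

    exploit_ranking: documents ordered best-first by the current model
                     (hypothetical input, assumed for this sketch).
    epsilon:         probability of an exploratory pick at each rank.
    length:          maximum number of results to return.
    """
    rng = rng or random.Random()
    remaining = list(exploit_ranking)  # copy so the caller's list is untouched
    result = []
    while remaining and len(result) < length:
        if rng.random() < epsilon:
            # Explore: pick any remaining document uniformly at random.
            doc = remaining.pop(rng.randrange(len(remaining)))
        else:
            # Exploit: take the highest-ranked remaining document.
            doc = remaining.pop(0)
        result.append(doc)
    return result
```

With `epsilon = 0` the list is purely exploitative (the current ranking, truncated to `length`), and with `epsilon = 1` it is a uniformly random selection. Intermediate values trade result quality shown to the user against the variety of feedback the learner can collect.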


	Author	volume	Date Value	title	type	journal	titleUrl	doi	note	year
2013 BalancingExplorationandExploita	Maarten de Rijke, Katja Hofmann, Shimon Whiteson			Balancing Exploration and Exploitation in Listwise and Pairwise Online Learning to Rank for Information Retrieval