2010 WebScaleBayesianClickthroughRat

From GM-RKB
Jump to navigation Jump to search

Subject Headings: Bayesian Prediction Algorithm, Supervised Click-through Prediction Algorithm.

Notes

Cited By

Quotes

Abstract

We describe a new Bayesian click-through rate (CTR) prediction algorithm used for Sponsored Search in Microsoft’s Bing search engine. The algorithm is based on a probit regression model that maps discrete or real-valued input features to probabilities. It maintains Gaussian beliefs over weights of the model and performs Gaussian online updates derived from approximate message passing. Scalability of the algorithm is ensured through a principled weight pruning procedure and an approximate parallel implementation. We discuss the challenges arising from evaluating and tuning the predictor as part of the complete system of sponsored search where the predictions made by the algorithm decide about future training sample composition. Finally, we show experimental results from the production system and compare to a calibrated Naïve Bayes algorithm.

1. Introduction

Sponsored search remains one of the most profitable business models on the web today. It accounts for the overwhelming majority of income for the three major search engines Google, Yahoo and Bing, and generates revenue of at least 25 billion dollars per year and rising. All three major players use keyword auctions to allocate display space alongside the algorithmic search results based on a pay-per-click model in which advertisers are charged only if their advertisements are clicked by a user. In this mechanism it is necessary for the search engine to estimate the click-through rate (CTR) of available ads for a given search query to determine the best allocation of display space and appropriate payments (Edelman, Ostrovsky, & Schwarz, 2007). As a consequence, the task of CTR prediction is absolutely crucial to Sponsored Search advertising because it impacts user experience, profitability of advertising and search engine revenue.

Recognising the importance of CTR estimation for online advertising, management at Bing/adCenter decided to run a competition to entice people across the company to develop the most accurate and scalable CTR predictor. The algorithm described in this publication tied for first place in the first competition and won the subsequent competition based on prediction accuracy. As a consequence, it was chosen to replace Bing􀂶􀁖􀀃 􀁓􀁕􀁈􀁙􀁌􀁒􀁘􀁖􀀃 CTR prediction algorithm, a transition that was completed in the summer of 2009.

The paper makes three major contributions. First, it describes the Sponsored Search application scenario, the key role of CTR prediction in general, and the particular constraints derived from the task, including accuracy, calibration, scalability, dynamics, and exploration. Second, it describes a new Bayesian online learning algorithm for binary prediction, subsequently referred to as adPredictor. The algorithm is based on a generalised linear model with a probit (cumulative Gaussian) link function, a factorising Gaussian belief distribution on the feature weights, and calculates the approximate posterior using message passing, providing simple, closed-form update equations with automatic feature-wise learning rate adaptation. Third, we discuss the techniques we employed to make adPredictor work in Bing􀂶􀁖􀀃production environment, now driving 100% Sponsored Search traffic with 􀣩􁈺􀍳􀍲􀬵􀬴 􀵆 􀍳􀍲􀬵􀬵􁈻 ad impressions per year.

2.3. Domain - Specific Challenges

2.3.1. Evaluation

An important question is how to evaluate a predictor within the context of a given application domain. Broadly speaking, the performance of a predictor can be evaluated in isolation or as part of the larger system. To evaluate a predictor in isolation, the machine learning community has developed a number of reasonable measures such as the log-likelihood of test data under the model or the area under the receiver-operator (RO) curve (AUC). In the experimental section we will use these measures to evaluate adPredictor in comparison to calibrated Naïve Bayes. However, it is clear that these measures can only act as a proxy for the performance of the predictor in the larger system.

Ultimately, the predictor is part of a larger system that serves a purpose different from predicting user behaviour, namely the selection of ads. The ad selection system must be designed to balance the utilities of different players participating in the transaction: advertisers, users, and the search engine. These three types of players have different, even contradictory objectives. Advertisers are interested in maximising their return on investment at high volume . Users would like to see maximally relevant ads that help them pursue their intent. The search engine would like to maximise revenue and growth.

References

,

 AuthorvolumeDate ValuetitletypejournaltitleUrldoinoteyear
2010 WebScaleBayesianClickthroughRatRalf Herbrich
Thore Graepel
Joaquin Quinonero Candela
Thomas Borchert
Web-scale Bayesian Click-through Rate Prediction for Sponsored Search Advertising in Microsoft’s Bing Search Engine