2009 ACaseStudyofBehaviorDrivenConjo

From GM-RKB

Subject Headings: Conjoint Analysis.

Notes

Cited By

Quotes

Author Keywords

Abstract

Conjoint analysis is one of the most popular market research methodologies for assessing how customers with heterogeneous preferences appraise various objective characteristics in products or services, which provides critical inputs for many marketing decisions, e.g. optimal design of new products and target market selection. Nowadays it has become practical in e-commerce applications to collect millions of samples quickly. However, the large-scale data sets make traditional conjoint analysis coupled with sophisticated Monte Carlo simulation for parameter estimation computationally prohibitive. In this paper, we report a successful large-scale case study of conjoint analysis on a click-through stream in a real-world application at Yahoo!. We consider identifying users' heterogeneous preferences from millions of click/view events and building predictive models to classify new users into segments with distinct behavior patterns. A scalable conjoint analysis technique, known as tensor segmentation, is developed by utilizing logistic tensor regression in the standard partworth framework. In offline analysis on the samples collected from a random bucket of the Yahoo! Front Page Today Module, we compare tensor segmentation against other segmentation schemes using demographic information, and study user preferences on article content within tensor segments. The knowledge acquired from the segmentation results also provides assistance to editors in content management and user targeting. The usefulness of our approach is further verified by the observations in a bucket test launched in December 2008.

1. INTRODUCTION

Since the advent of conjoint methods in marketing research pioneered by Green and Rao [9], research on theoretical methodologies and pragmatic issues has thrived. Conjoint analysis is one of the most popular marketing research methodologies for assessing users' preferences on various objective characteristics in products or services. Analysis of trade-offs, driven by heterogeneous preferences on benefits derived from product attributes, provides critical input for many marketing decisions, e.g. optimal design of new products, target market selection, and product pricing. It is also an analytical tool for predicting users' plausible reactions to new products or services.

In practice, a set of categorical or quantitative attributes is collected to represent products or services of interest, while a user's preference on a specific attribute is quantified by a utility function (also called a partworth function). While there exist several ways to specify a conjoint model, additive models that linearly sum up individual partworth functions are the most popular choice.

As a measurement technique for quantifying users' preferences on product attributes (or partworths), conjoint analysis always consists of a series of steps, including stimulus representation, feedback collection and estimation methods. Stimulus representation involves the development of stimuli based on a number of salient attributes (hypothetical profiles or choice sets) and the presentation of stimuli to appropriate respondents. Based on the nature of users' responses to the stimuli, popular conjoint analysis approaches are either choice-based or ratings-based. Recent developments of estimation methods comprise hierarchical Bayesian (HB) methods [15], polyhedral adaptive estimation [19], Support Vector Machines [2, 7], etc.

New challenges for conjoint analysis techniques arise from emerging personalized services on the Internet. A well-known Web site can easily attract millions of users daily. The Web content is intentionally programmed to cater to users' needs. As a new kind of stimulus, the content could be a hyperlink with text or an image with a caption. The traditional stimuli, such as questionnaires with itemized answers, are seldom applied on web pages, due to the attention burden on the user side. Millions of responses, such as click or non-click, to content stimuli are collected at much lower cost compared to the feedback solicited in traditional practice, but the implicit feedback without incentive-compatibility constraints is potentially noisy and more difficult to interpret. Although indispensable elements in the traditional settings of conjoint analysis have changed greatly, conjoint analysis is still particularly important in identifying the most appropriate users at the right time, and then optimizing available content to improve user satisfaction and retention. We summarize three main differences between Web-based conjoint analysis and the traditional one in the following:

• The Web content may have various stimuli that potentially contain many psychologically related attributes, rather than predefined attributes of interest as in traditional experimental design. Meanwhile, most users are casual or new visitors who declare part or none of their personal information and interests. Since we have to extract attributes or discover latent features in profiling both content stimuli and users, parameter estimation methods become more challenging than in the traditional situation;

• In feedback collection, most respondents haven't experienced strong incentives to expend their cognitive resources on the prominent but unsolicited content. This issue causes a relatively high rate of false negatives (false non-clicks);

• The sample size considered in traditional conjoint analysis is usually less than a thousand, whereas it is common in modern e-business applications to observe millions of responses in a short time, e.g. in a few hours. The large-scale data sets make traditional conjoint analysis coupled with sophisticated Monte Carlo simulation for parameter estimation computationally prohibitive.

In this paper, we conduct a case study of conjoint analysis on a click-through stream to understand users' intentions. We construct features to represent the Web content, and collect user information across the Yahoo! network. The partworth function is optimized in a tensor regression framework via gradient descent methods on large-scale samples. In the partworth space, we apply clustering algorithms to identify meaningful segments with distinct behavior patterns. These segments result in significant CTR lift over both the unsegmented baseline and two demographic segmentation methods in offline and online tests on the Yahoo! Front Page Today Module application. Also, by analyzing characteristics of user segments, we obtain interesting insights into users' intentions and behavior that could be applied to marketing campaigns and user targeting. The knowledge could be further utilized to help editors with content management.

The paper is organized as follows: In Section 2, we delineate the scenario under consideration by introducing the Today Module application on Yahoo! Front Page; in Section 3, we review related work in the literature; in Section 4, we present tensor segmentation in detail. We report experimental results in Section 5 and conclude in Section 6.

Figure 1: A snapshot of the default "Featured" tab in the Today Module on Yahoo! Front Page, delineated by the rectangle. There are four articles displayed at footer positions, indexed by F1, F2, F3 and F4. One of the four articles is highlighted at the story position. By default, the article at F1 is highlighted at the story position.

2. PROBLEM SETTING

In this section, we first describe our problem domain and our motivations for this research work. Then we describe our data set and define some notations.

2.1 Today Module

Today Module is the most prominent panel on Yahoo! Front Page, which is also one of the most popular pages on the Internet; see a snapshot in Figure 1. The default "Featured" tab in the Today Module highlights one of four high-quality articles selected from a daily-refreshed article pool curated by human editors. As illustrated in Figure 1, there are four articles at footer positions, indexed by F1, F2, F3 and F4 respectively. Each article is represented by a small picture and a title. One of the four articles is highlighted at the story position, which is featured by a large picture, a title and a short summary along with related links. By default, the article at F1 is highlighted at the story position. A user can click on the highlighted article at the story position to read more details if the user is interested in the article. The event is recorded as a "story click". If a user is interested in one of the articles at the F2~F4 positions, she can highlight the article at the story position by clicking on its footer position.

A pool of articles is maintained by human editors and incarnates the "voice" of the site, i.e., the desired nature and mix of various content. Fresh stories and breaking news are regularly acquired by the editors to replace out-of-date articles every few hours. The lifetime of articles is short, usually just a few hours, and the popularity of articles, measured by their click-through rate (CTR),[1] changes over time. Yahoo! Front Page serves millions of visitors daily. Each visit generates a "view" event on the Today Module, though the visitor may not pay any attention to the Today Module. The users of the Today Module, a subset of the whole traffic, also generate a large number of "story click" events by clicking at the story position for more details of the stories they would like to read. Our setting is characterized by the dynamic nature of articles and users. Scalability is also an important requirement in our system design.

One of our goals is to increase user activity, measured by overall CTR, on the Today Module. To draw visitors' attention and increase the number of clicks, we would like to rank available articles according to visitors' interests, and to highlight the most attractive article at the F1 position. In our previous research [1] we developed an Estimated Most Popular algorithm (EMP), which estimates the CTR of available articles in near real-time by a Kalman filter, and presents the article with the highest estimated CTR at the F1 position. Note that there is no personalized service in that system, i.e. the article shown at F1 is the same for all visitors at a given time. In this work we would like to further boost overall CTR by launching a partially personalized service. User segments determined by conjoint analysis will be served with different content according to segmental interests. Articles with the highest segmental CTR will be served to the respective user segments.

In addition to optimizing overall CTR, another goal of this study is to understand users' intentions and behavior to some extent for user targeting and marketing campaigns. Once we identify users who share similar interests in conjoint analysis, predictive models can be built to classify users (including new visitors) into segments. For example, if we find a user segment that likes "Cars & Transportation" articles, then we can target this user segment for Autos marketing campaigns. Furthermore, with knowledge of user interests in segments, we can provide assistance to the editors for content management. For example, if we know that most users in one segment who like News articles visit us in the morning, while most users in another segment who like articles about TV come in the evening, then editors can target these two segments by simply programming more News articles in the morning and more TV-related articles in the evening.

Note that if we only consider the first goal of maximizing overall CTR, personalized recommender systems at the individual level might be an appropriate approach to pursue. In our recent research [4], we observed that feature-based models yield higher CTR than conjoint models in offline analysis. However, conjoint models significantly outperform the "one-size-fits-all" EMP approach on the metric of CTR, and also provide actionable management of content and user targeting at the segment level. The flexibility supplied by conjoint models is valuable to portals such as Yahoo!.

2.2 Data Collection

We collected three sets of data, including content features, user profiles and interaction data between users and articles.

Each article is summarized by a set of features, such as topic categories, sub-topics, URL resources, etc. Each visitor is profiled by a set of attributes as well, e.g. age, gender, residential location, Yahoo! property usage, etc. Here we simply selected a set of informative attributes to represent users and articles. Gauch et al. [8] gave an extensive review of various profiling techniques.

The interaction data consist of two types of actions, view only or story click, for each pair of a visitor and an article. One visitor can only view a small portion of the available articles, and it is also possible that one article is shown to the same visitor more than once. It is difficult to detect whether the visitors have paid enough attention to the article at the story position. Thus in data collection a large number of view events are false non-click events. In other words, there is a relatively high rate of false negatives (false non-clicks) in our observations.

There are multiple treatments of users' reactions in modeling the partworth utility, such as

• Choice-based responses: We only consider whether an article has been clicked by a visitor, while ignoring repeated views and clicks. In this case, an observed response is simply binary, click or not;

• Poisson-based responses: The number of clicks we observed on each article/user pair is considered as a realization from a Poisson distribution;

• Metric-based responses: We consider repeated views and clicks and treat the CTR of articles by each user as the target.

In the Today Module setting, Poisson-based and metric-based responses might be vulnerable to the high rate of false negative observations. Thus we follow the choice-based responses only in this work.

2.3 Notations

Let $x_i$ denote the $i$-th user, a $D \times 1$ vector of user features, and $z_j$ the $j$-th content item, a $C \times 1$ vector of article features. We denote by $r_{ij}$ the interaction between the user $x_i$ and the item $z_j$, where $r_{ij} \in \{-1, +1\}$ for a "view" event and a "story click" event respectively. We only observe interactions on a small subset of all possible user/article pairs, and denote by $\mathcal{O}$ the set of observations $\{r_{ij}\}$.

3. RELATED WORK

In very early studies [21], homogeneous groups of consumers are entailed by a priori segmentation. For example, consumers are assigned to groups on the basis of demographic and socioeconomic variables, and the conjoint models are estimated within each of those groups. Clearly, the criteria in the two steps are not necessarily related: one is the homogeneity of customers in terms of their descriptive variables and the other is the conjoint preference within segments. However, this segmentation strategy is easy to implement and is still widely applied in industry.

Traditionally, conjoint analysis procedures have two stages: 1) estimating a partworth utility function in terms of attributes, which represents customers' preferences at the individual level, e.g. via ordinary least squares regression; 2) if segmentation is of interest to marketing, grouping customers into segments where people share similar individual-level partworths, through hierarchical or non-hierarchical clustering algorithms.

Conjoint studies, especially of the partworth function, depend on designs of stimuli (e.g. product profiles or choice sets on questionnaires) and methods of data collection from respondents. One of the challenges in traditional conjoint analysis is to obtain sufficient data from respondents to estimate partworth utilities at the individual level using relatively few questions. The theory of experimental design is adapted for constructing compact profiles to evaluate respondents' opinions effectively. Kuhfeld et al. [14] studied orthogonal designs for linear models. Huber and Zwerina [11] brought two additional properties, minimal level overlap and utility balance, into choice-based conjoint experiments. Sandor and Wedel [17] developed experimental designs by utilizing prior information. More references are reviewed by Rao [16] recently.

Hierarchical Bayesian (HB) methods [15] were developed to exploit partworth information of all respondents in modeling the partworth function. The HB models relate the variation in a subject's metric-based responses and the variation in the subjects' partworths over the population as follows:

$$r_{ij} = \beta_i^T z_j + \epsilon_{ij} \quad \forall i, j \qquad (1)$$

$$\beta_i = W^T x_i + \delta_i \quad \text{for } i = 1, \ldots, n \qquad (2)$$

where $\beta_i$ is a $C$-dimensional partworth vector of the user $i$ and $W$ is a matrix of regression coefficients that relates user attributes to partworths. The error terms $\{\epsilon_{ij}\}$ and $\{\delta_i\}$ in eq(1) and eq(2) are assumed to be mutually independent Gaussian random variables with zero mean and covariance $\{\sigma_j^2\}$ and $\Lambda$ respectively, i.e. $\epsilon_{ij} \sim \mathcal{N}(0, \sigma_j^2)$ and $\delta_i \sim \mathcal{N}(0, \Lambda)$.[2] Together with appropriate prior distributions over the remaining variables, the posterior analysis yields the HB estimator of the partworths as a convex combination of an individual-level estimator and a pooled estimator, in which the weights mainly depend on the noise levels $\{\sigma_j^2\}$ and $\Lambda$ in eq(1) and eq(2). The estimation of these noise levels usually involves Monte Carlo simulation, such as Gibbs sampling or Metropolis-Hastings algorithms [20], which is computationally infeasible for our applications with millions of samples.
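For intuition, a minimal simulation of this two-level structure (synthetic data and dimensions, not from the paper, and with a single scalar noise variance in place of per-item $\sigma_j^2$) can be sketched as:

```python
import numpy as np

rng = np.random.default_rng(42)
n, D, C = 100, 5, 4             # users, user-attribute dim, item-attribute dim
X = rng.random((n, D))          # user attributes x_i
Z = rng.random((20, C))         # item attributes z_j
W = rng.normal(size=(D, C))     # coefficients relating user attributes to partworths
Lambda = 0.1 * np.eye(C)        # covariance of delta_i
sigma2 = 0.05                   # response noise variance (scalar simplification)

# eq(2): individual-level partworths beta_i = W^T x_i + delta_i
beta = X @ W + rng.multivariate_normal(np.zeros(C), Lambda, size=n)

# eq(1): metric responses r_ij = beta_i^T z_j + eps_ij
R = beta @ Z.T + rng.normal(scale=np.sqrt(sigma2), size=(n, Z.shape[0]))
```

The HB machinery then infers $W$, $\Lambda$ and the noise levels from the observed responses, typically via the Monte Carlo simulation mentioned above, which is what becomes prohibitive at the scale discussed in this paper.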

Huber and Train [10] compared the estimates obtained from the HB methods with those from classical maximum simulated likelihood methods, and found the averages of the expected partworths to be almost identical for the two methods in some applications on electricity suppliers. In the past decade, integrated conjoint analysis methods have emerged that simultaneously segment the market and estimate segment-level partworths, e.g. the finite mixture conjoint models [6]. The study conducted by [20] shows the finite mixture conjoint models performed well at the segment level, compared with the HB models.

Toubia et al. [19] developed a fast polyhedral adaptive conjoint analysis method that greatly reduces respondent burden in parameter estimation. They employ "interior point" mathematical programming to select salient questions for respondents that narrow the feasible region of the partworth values as fast as possible.

A recently developed technique for parameter estimation is based on ideas from statistical learning and support vector machines. The choice-based data can be translated into a set of inequalities that compare the utilities between two selected items. Evgeniou et al. [7] formulated partworth learning as a convex optimization problem in a regularization framework. Chapelle and Harchaoui [2] proposed two machine learning algorithms that efficiently estimate conjoint models from pairwise preference data.

Jiang and Tuzhilin [12] experimentally demonstrated that both 1-to-1 personalization and segmentation approaches significantly outperform aggregate modeling. Chu and Park [4] recently proposed a feature-based model for personalized service at the individual level. On the Today Module application, the 1-to-1 personalized model outperforms several segmentation models in offline analysis. However, conjoint models are still indispensable components in our system, because of the valuable insight into user intention and tangible control on both the content and user sides at the segment level.

4. TENSOR SEGMENTATION

In this section, we employ logistic tensor regression coupled with efficient gradient-descent methods to estimate the partworth function conjointly on large data sets. In the users' partworth space, we further apply clustering techniques to segment users. Note that we consider the cases of millions of users and thousands of articles. The number of observed interactions between user/article pairs could be tens of millions.

4.1 Tensor Indicator

We first define an indicator as a parametric function of the tensor product of both the article features $z_j$ and the user attributes $x_i$ as follows:

$$s_{ij} = \sum_{a=1}^{C} \sum_{b=1}^{D} x_{i,b} \, w_{ab} \, z_{j,a}, \qquad (3)$$

where $D$ and $C$ are the dimensionality of user and content features respectively, $z_{j,a}$ denotes the $a$-th feature of $z_j$, and $x_{i,b}$ denotes the $b$-th feature of $x_i$. The weight variable $w_{ab}$ is independent of user and content features, and represents the affinity of these two features $x_{i,b}$ and $z_{j,a}$ in interactions. In matrix form, eq(3) can be rewritten as

$$s_{ij} = x_i^T W z_j,$$

where $W$ denotes a $D \times C$ matrix with entries $\{w_{ab}\}$. The partworth vector of the user $x_i$ on article attributes is evaluated as $W^T x_i$, denoted as $\beta_i$, a vector of the same length as $z_j$.
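To make the bilinear form concrete, here is a minimal NumPy sketch (variable names and random inputs are illustrative only; the dimensionalities follow Section 5) of the tensor indicator and the implied partworth vector:

```python
import numpy as np

D, C = 1193, 82                 # user and article feature dimensions (Section 5)
rng = np.random.default_rng(0)

W = rng.normal(scale=0.01, size=(D, C))    # affinity weights {w_ab}
x_i = rng.random(D)                        # a user feature vector
z_j = rng.random(C)                        # an article feature vector

# Tensor (bilinear) indicator s_ij = x_i^T W z_j, as in eq(3).
s_ij = x_i @ W @ z_j

# Partworths of user i on article attributes: beta_i = W^T x_i (length C).
beta_i = W.T @ x_i
```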

The tensor product above, also known as a bilinear model, can be regarded as a special case of the Tucker family [5], which has been extensively studied in the literature and in applications. For example, Tenenbaum and Freeman [18] developed a bilinear model for separating "style" and "content" in images, and recently Chu and Ghahramani [3] derived a probabilistic framework of the Tucker family for modeling structural dependency from partially observed high-dimensional array data.

The tensor indicator is closely related to the traditional HB models as in eq(1) and eq(2). Respondent heterogeneity is assumed either to be randomly distributed or to be constrained by attributes measured at the individual level.

4.2 Logistic Regression

Conventionally the tensor indicator is related to an observed binary event by a logistic function. In our particular application, we found that three additions were needed:

• User-specific bias: Users' activity levels are quite different. Some are active clickers, while some might be casual users. We introduce a bias term for each user, denoted as $\mu_i$ for user $i$;

• Article-specific bias: Articles have different popularity. We have a bias term for each article as well, denoted as $\nu_j$ for article $j$;

• Global offset: Since the number of click events is much smaller than that of view events in our observations, the classification problem is heavily imbalanced. Thus we introduce a global offset $\iota$ to take this situation into account.

Based on the considerations above, the logistic function is defined as follows,

$$p(r_{ij} \mid \hat{s}_{ij}) = \frac{1}{1 + \exp(-r_{ij}\hat{s}_{ij} + \iota)}$$

where $\hat{s}_{ij} = s_{ij} - \mu_i - \nu_j$ and $s_{ij}$ is defined as in eq(3).

Together with a standard Gaussian prior on the coefficients $\{w_{ab}\}$, i.e. $w_{ab} \sim \mathcal{N}(0, c)$,[3] the maximum a posteriori estimate of $\{w_{ab}\}$ is obtained by solving the following optimization problem,

$$\min_{W,\,\mu,\,\gamma} \;\; \frac{1}{2c} \sum_{a=1}^{C} \sum_{b=1}^{D} w_{ab}^2 \;-\; \sum_{ij \in \mathcal{O}} \log p(r_{ij} \mid \hat{s}_{ij}),$$

where $\mu$ denotes $\{\mu_i\}$ and $\gamma$ denotes $\{\nu_j\}$. We employ a gradient descent package to find the optimal solution. The gradient with respect to $w_{ab}$ is given by

$$\frac{1}{c} w_{ab} \;-\; \sum_{ij \in \mathcal{O}} r_{ij}\, x_{i,b}\, z_{j,a} \bigl(1 - p(r_{ij} \mid \hat{s}_{ij})\bigr),$$

and the gradient with respect to the bias terms can be derived similarly. The model parameter $c$ is determined by cross validation.
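As a rough illustration of this estimation step (an assumption-laden sketch, not the authors' production code: the user and article bias terms are dropped, and plain full-batch gradient descent stands in for the unspecified gradient descent package), the objective and its gradient with respect to $W$ can be written as:

```python
import numpy as np

def objective(W, obs, X, Z, c=1.0, iota=0.0):
    """(1/2c) * sum w_ab^2  -  sum_{ij in O} log p(r_ij | s_ij_hat).

    obs: list of (i, j, r) with r in {-1, +1};
    X: (num_users, D) user features; Z: (num_items, C) article features.
    Bias terms mu_i and nu_j are omitted for brevity."""
    reg = 0.5 / c * np.sum(W ** 2)
    loglik = 0.0
    for i, j, r in obs:
        s = X[i] @ W @ Z[j]
        loglik += -np.log1p(np.exp(-r * s + iota))
    return reg - loglik

def grad_W(W, obs, X, Z, c=1.0, iota=0.0):
    """Gradient: (1/c) W - sum r * (1 - p(r|s)) * x_i z_j^T."""
    g = W / c
    for i, j, r in obs:
        s = X[i] @ W @ Z[j]
        p = 1.0 / (1.0 + np.exp(-r * s + iota))
        g -= r * (1.0 - p) * np.outer(X[i], Z[j])
    return g

def fit(X, Z, obs, c=1.0, lr=0.1, steps=100):
    """Plain full-batch gradient descent over the observed pairs."""
    W = np.zeros((X.shape[1], Z.shape[1]))
    for _ in range(steps):
        W -= lr * grad_W(W, obs, X, Z, c=c)
    return W
```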

4.3 Clustering

With the optimal coefficients $W$ in hand, we compute the partworths for each training user as $\beta_i = W^T x_i$. The vector $\beta_i$ represents the user's preferences on article attributes.

In the partworth space spanned by $\{\beta_i\}$, we further apply a clustering technique, e.g. K-means [13], to group training users having similar preferences into segments. The number of clusters can be determined by validation in offline analysis.

For an existing or new user, we can predict her partworths by $\beta = W^T x$, where $x$ is the vector of user features. Then her segment membership can be determined by the shortest distance between the partworth vector and the centroids of the clusters, i.e.

$$\arg\min_k \; \|\beta - o_k\|, \qquad (4)$$

where $\{o_k\}$ denote the centroids obtained in clustering.
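The segmentation step can be sketched as follows (illustrative only; scikit-learn's K-means is used here, and the cluster count of 5 follows the offline analysis in Section 5):

```python
import numpy as np
from sklearn.cluster import KMeans

def segment_users(W, X_train, n_clusters=5, seed=0):
    """Cluster training users in the partworth space beta_i = W^T x_i."""
    betas = X_train @ W                       # shape (num_users, C)
    return KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(betas)

def assign_segment(W, x_new, km):
    """Eq(4): nearest-centroid assignment for an existing or new user."""
    beta = x_new @ W                          # partworths of the user
    dists = np.linalg.norm(km.cluster_centers_ - beta, axis=1)
    return int(np.argmin(dists))
```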

5. EXPERIMENTS

We collected events from a random bucket in July 2008 for training and validation. In the random bucket, articles are randomly selected from a content pool to serve users. An event records a user's action on the article at the F1 position, which is either "view" or "story click", encoded as -1 and +1 respectively. We also collected events from a random bucket in September 2008 for testing.

Note that a user may view or click on the same article multiple times but at different timestamps. In our training, repeated events were treated as a single event. The distinct events were indexed in triplet format, i.e. (user_id, article_id, click/view).
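A minimal sketch of that preprocessing (field names are hypothetical): repeated events per (user, article) pair collapse into a single choice-based response, +1 if the pair ever produced a story click and -1 otherwise.

```python
def to_choice_triplets(events):
    """events: iterable of (user_id, article_id, action), where action is
    "view" or "story_click", possibly repeated at different timestamps.
    Returns deduplicated (user_id, article_id, response) triplets with
    response in {-1, +1}; any click for a pair dominates repeated views."""
    responses = {}
    for user_id, article_id, action in events:
        key = (user_id, article_id)
        r = 1 if action == "story_click" else -1
        responses[key] = max(responses.get(key, -1), r)
    return [(u, a, r) for (u, a), r in responses.items()]
```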

We split the July random bucket data by a timestamp threshold for training and validation. There are 37.806 million click and view events generated by 4.635 million users before the timestamp threshold, and 0.604 million click events that happened after the timestamp for validation. In the September test data, there are about 1.784 million click events.

The features of users and items were selected by "support". The "support" of a feature means the number of samples having the feature. We only selected the features with support above a prefixed threshold, e.g. 5% of the population. Each user is represented by a vector of 1192 categorical features, which include:

• Demographic information: gender (2 classes) and age discretized into 10 classes;

• Geographical features: 191 locations of countries and US states;

• Behavioral categories: 989 binary categories that summarize the consumption behavior of a user within Yahoo! properties.

Each article is profiled by a vector of 81 static features, which include:

• URL categories: 43 classes inferred from the URL of the article resource;

• Editor categories: 38 topics tagged by human editors to summarize the article content.

Categorical features are encoded as binary vectors with non-zero indicators. For example, "gender" is translated into two binary features, i.e., "male" is encoded as [0, 1], "female" is encoded as [1, 0] and "unknown" is [0, 0]. As the number of non-zero entries in a binary feature vector varies, we further normalized each vector to unit length, i.e., non-zero entries in the normalized vector are replaced by $1/\sqrt{k}$, where $k$ is the number of non-zero entries. For article features, we normalized URL and Editor categories together. For user features, we normalized behavioral categories and the remaining features (age, gender and location) separately, due to the variable number of behavioral categories per user. Following conventional treatment, we also augmented each feature vector with a constant attribute 1. Each content item is finally represented by a feature vector of 82 entries, while each user is represented by a feature vector of 1193 entries.
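The following sketch illustrates the described encoding for a single user (the tiny vocabularies are hypothetical stand-ins for the real feature dictionaries): binary indicators per categorical group, per-group normalization by $1/\sqrt{k}$, and a constant attribute 1 appended.

```python
import numpy as np

def encode_group(active, vocabulary):
    """Binary indicator vector for one feature group, normalized so that
    non-zero entries become 1/sqrt(k), where k is the number of non-zeros."""
    v = np.zeros(len(vocabulary))
    for name in active:
        if name in vocabulary:
            v[vocabulary.index(name)] = 1.0
    k = np.count_nonzero(v)
    return v / np.sqrt(k) if k > 0 else v

# Hypothetical, tiny vocabularies for illustration only.
demo_geo_vocab = ["male", "female", "age_25_34", "age_35_49", "US", "GB"]
behavior_vocab = ["sports", "finance", "travel", "music"]

# A user: female, aged 25-34, located in the US, with two behavioral categories.
demo_geo = encode_group(["female", "age_25_34", "US"], demo_geo_vocab)
behavior = encode_group(["sports", "music"], behavior_vocab)

# Demographic/geo features and behavioral categories are normalized separately,
# then concatenated and augmented with a constant 1.
user_vector = np.concatenate([demo_geo, behavior, [1.0]])
```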

5.1 Offline Analysis

For each user in the test set, we first computed her segment membership as in eq(4), and sorted all available articles in descending order according to their CTR in the test user's segment at the timestamp of the event. On click events, we measured the rank position of the article being clicked by the user. The performance metric we used in offline analysis is the number of clicks at the top four rank positions. A good predictive model should have more clicks at top-ranked positions. We computed the click portion at each of the top four rank positions in the predictive ranking, i.e. (# clicks at the rank position) / (# of all clicks), and the metric "lift" over the baseline model, which is defined as (click portion) / (baseline click portion).
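A sketch of this evaluation (an illustrative helper, not the authors' pipeline):

```python
from collections import Counter

def click_portion_and_lift(clicked_ranks, baseline_portions, top_k=4):
    """clicked_ranks: for each click event, the rank position (1-based) at
    which the clicked article was placed by the model under evaluation.
    baseline_portions: the baseline model's click portions at ranks 1..top_k.
    Returns (portions, lifts) at ranks 1..top_k."""
    counts = Counter(clicked_ranks)
    total = len(clicked_ranks)
    portions = [counts.get(r, 0) / total for r in range(1, top_k + 1)]
    lifts = [p / b for p, b in zip(portions, baseline_portions)]
    return portions, lifts
```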

We trained our logistic regression models on the training data with different values of the trade-off parameter $c$, where $c \in \{0.01, 0.1, 1, 10, 100\}$, and examined their performance (click portion at the top rank position) on the validation data set. We found that $c = 1$ gives the best validation performance.

Figure 2: Click portion at the top rank position with different cluster numbers on the July validation data.

Using the model with $c = 1$, we ran K-means clustering [13] on the partworth vectors of training users, computed as $\beta_i = W^T x_i$, to group users with similar preferences into clusters. We varied the number of clusters from 1 to 20, and presented the corresponding results of the click portion at the top rank position in Figure 2. Note that the CTR estimation within segments suffers from low-traffic issues when the number of segments is large. We observed the best validation performance at 8 clusters, but the difference compared with that at 5 clusters is not statistically significant. Thus we selected 5 clusters in our application.

To verify the stability of the clusters we found in the July data, we further tested on the random bucket data collected in September 2008. The EMP approach, described in Section 2.1, was utilized as the baseline model. We also implemented two demographic segmentations for comparison purposes:

• Gender: 3 clusters defined by users' gender, {"male", "female", "unknown"};

• AgeGender: 11 clusters defined as {"<17, male", "17~24, male", "25~34, male", "35~49, male", ">49, male", "<17, female", "17~24, female", "25~34, female", "35~49, female", ">49, female", "unknown"}.

We estimated the article CTR within segments by the same technique [1] used in the EMP approach. A user is served with the most popular article in the segment to which she belongs.
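A highly simplified illustration of segment-level serving (empirical click/view ratios stand in for the Kalman-filter CTR tracking of [1]; names are hypothetical):

```python
from collections import defaultdict

def estimate_segment_ctr(events):
    """events: iterable of (segment, article_id, clicked) with clicked in {0, 1}.
    Returns {(segment, article_id): empirical CTR}."""
    clicks, views = defaultdict(int), defaultdict(int)
    for seg, art, clicked in events:
        views[(seg, art)] += 1
        clicks[(seg, art)] += clicked
    return {k: clicks[k] / views[k] for k in views}

def article_to_serve(segment, candidates, ctr):
    """Serve the candidate article with the highest CTR in the user's segment."""
    return max(candidates, key=lambda a: ctr.get((segment, a), 0.0))
```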

We computed the lifts over the EMP baseline approach and presented the results for the top 4 rank positions in Figure 3. All segmentation approaches outperform the baseline model, the unsegmented EMP approach. AgeGender segmentation with 11 clusters works better than Gender segmentation with 3 clusters in our study. Tensor segmentation with 5 clusters consistently gives more lift than Gender and AgeGender at all of the top 4 positions.

5.1.1 Segment Analysis

We collected some characteristics of the 5 segments we discovered. On the September data, we identified cluster membership for all users, and plotted the population distribution across the 5 segments as a pie chart in Figure 4. The largest cluster contains 32% of users, while the smallest cluster contains 10% of users. We further studied the user composition of the 5 clusters with popular demographic categories, and presented the results in Figure 5 as a Hinton graph. We found that

Figure 3: A comparison of lift at the top 4 rank positions in the offline analysis on the September data. The baseline model is the unsegmented EMP approach.

Figure 4: Population distribution in the 5 segments.

• Cluster c1 consists mostly of female users under age 34;

• Cluster c2 consists mostly of male users under age 44;

• Cluster c3 consists of female users above age 30;

• Cluster c4 consists mainly of male users above age 35;

• Cluster c5 is predominantly non-U.S. users.

We also observed that c1 and c2 contain a small portion of users above age 55, and c3 has some young female users as well. Here, cluster membership is not solely determined by demographic information, though the demographic information gives a very strong signal. It is users' behavior (click patterns) that reveals users' interest in article topics.

We utilized the centroid of each cluster as a representative to illustrate users' preferences on article topics. The centroids in the space of article topics are presented in Figure 6 as a heatmap. The gray level indicates users' preference, from like (white) to dislike (black). We found the following favorite and unattractive topics by comparing the representatives' scores across segments:

Figure 5: Distributions of users with a specific demographic category in the 5 clusters. In the Hinton graph, each square's area represents the user percentage in a cluster. Each column represents users having a particular demographic category, while each row corresponds to a cluster.

Figure 6: Users' preferences on selected article topics in the 5 clusters. Each square's gray level indicates the preference of a segment on the corresponding article topic, from white (like) to black (dislike).

• Users in c1 don't have strong opinions on article topics, while relatively liking Music more;

• Users in c2 greatly like Sports, Movies, Cars & Transportation, Tech & Gadgets and Dating/Personals, while disliking TV and Food;

• Users in c3 like TV, OMG[4] and Food, while disliking Sports, Movies, Science & Mathematics, Games & Recreation, etc.;

• Users in c4 are active readers who like Finance, Cars & Transportation and Politics & Government most;

• Users in c5 like Travel, Hard News and Beauty & Style, while not being interested in Personal Finance and Politics & Government.

[4] http://omg.yahoo.com/, a web site of celebrity gossip, news, photos, etc.

Some topics are preferable to all users, such as Celebrity, but the discrepancy in interests is significant for most topics. The discrepancy between clusters can be exploited by editors in content management to enhance user engagement.

5.1.2 Editorial Assistant

One interesting finding in our conjoint analysis is that the visiting patterns of some segments are quite different from the others, as shown in Figure 7. Yahoo! Front Page is visited by more older users (c3 and c4) in the morning and by more younger users (c1 and c2) in the late afternoon. Most users around midnight are international users (c5). The portion of traffic from older male users decreases significantly during weekends and holidays, while traffic from the other segments remains at almost the same level for the entire week. This finding suggests some tips for content management, such as programming more articles related to News, Politics and Finance on weekday mornings, more entertainment articles about Sports and Music in the late afternoon, and more articles relevant to international users around midnight. We can also monitor user activities within segments and remind editors to target underperforming segments when the CTR within those segments runs below its average level.

Figure 7: Fraction of views in each user segment over a week. The first day is a US holiday and the sixth and seventh day are Saturday and Sunday.

Table 1: Bucket test results of three segmentation methods, Gender, AgeGender and Tensor-5. All buckets served almost the same number of page views.

Segmentation   Lift on story CTR
Gender         1.49
AgeGender      2.45
Tensor-5       3.24

5.2 Online Bucket Test

To validate the tensor segmentation we proposed, we launched a bucket test in December 2008. Three segmentation methods, Gender, AgeGender and Tensor-5, were implemented in our production system. From 8:00am 12 December to 0:00am 15 December, each of the three schemes and the control (EMP) bucket served several million page views. The numbers of page views in these four buckets are almost the same over the three days. We computed the story CTR and report the corresponding lifts over the EMP control bucket for the three segmentation schemes in Table 1. The tensor segmentation with 5 clusters yields the most lift in the bucket test. We also observed that the AgeGender segmentation outperforms the Gender segmentation. The observations in the online bucket test are consistent with our results in offline analysis. Although the bucket test lasted only about 3 days around a weekend, the CTR lift we observed over a significant amount of traffic provides further strong empirical evidence.

6. CONCLUSIONS

In this study, we executed conjoint analysis on a large-scale click-through stream of the Yahoo! Front Page Today Module. We validated the segments discovered in conjoint analysis by conducting offline and online tests. We analyzed characteristics of users in segments and also found different visiting patterns across segments. The insight into user intention at the segment level found in this study could be exploited to enhance user engagement on the Today Module by assisting editors in article content management. In this study, a user can belong to only one segment. We would like to exploit other clustering techniques, such as Gaussian mixture models, that allow for multiple membership, so that a user's preference might be determined by a weighted sum of several segmental preferences. We plan to pursue this direction in the future.

7. ACKNOWLEDGMENTS

We thank Raghu Ramakrishnan, Scott Roy, Deepak Agarwal, Bee-Chung Chen, Pradheep Elango, and Ajoy Sojan for many discussions and their help with data collection.

References


Wei Chu, Seung-Taek Park, Todd Beaupre, Nitin Motgi, Amit Phadke, Seinjuti Chakraborty, and Joe Zachariah (2009). "A Case Study of Behavior-driven Conjoint Analysis on Yahoo!: Front Page Today Module." In: KDD-2009 Proceedings. doi:10.1145/1557019.1557138
  1. The CTR of an article is measured by the total number of clicks on the article divided by the total number of views on the article in a certain time interval.
  2. $\mathcal{N}(\tau, \varsigma)$ denotes a Gaussian distribution with mean $\tau$ and variance $\varsigma$.
  3. Appropriate priors can be specified for the bias terms too.