A Clickthrough Log Dataset is an interaction dataset of item clicks.
- (Joachims et al., 2017) ⇒ Thorsten Joachims, Adith Swaminathan, and Tobias Schnabel. (2017). “Unbiased Learning-to-Rank with Biased Feedback.” In: Proceedings of the Tenth ACM International Conference on Web Search and Data Mining. ISBN:978-1-4503-4675-7 doi:10.1145/3018661.3018699
- QUOTE: Implicit feedback (e.g., clicks, dwell times, etc.) is an abundant source of data in human-interactive systems. While implicit feedback has many advantages (e.g., it is inexpensive to collect, user centric, and timely), its inherent biases are a key obstacle to its effective use. For example, position bias in search rankings strongly influences how many clicks a result receives, so that directly using click data as a training signal in Learning-to-Rank (LTR) methods yields sub-optimal results. To overcome this bias problem, we present a counterfactual inference framework that provides the theoretical basis for unbiased LTR via Empirical Risk Minimization despite biased data.
- train - Training set. 10 days of click-through data, ordered chronologically. Non-clicks and clicks are subsampled according to different strategies.
- test - Test set. 1 day of ads to for testing your model predictions.
- sampleSubmission.csv - Sample submission file in the correct format, corresponds to the All-0.5 Benchmark.
id: ad identifier
click: 0/1 for non-click/click
hour: format is YYMMDDHH, so 14091123 means 23:00 on Sept. 11, 2014 UTC.
C1 -- anonymized categorical variable
C14-C21 -- anonymized categorical variables