Click-Through Log Dataset

A Click-Through Log Dataset is a click-through data source that is a log dataset (of click-through records).

Context:
- It can be associated with a Click-Through Data Stream.
- …
Example(s):
- a Website Clickthrough Log, such as:
  - Avazu's CTR Prediction Dataset [1].
  - Criteo Labs' Display Ad Dataset [2].
- …
Counter-Example(s):
- a Dwell Time Dataset.
- a Purchase Transaction Dataset.
- …
See: Clickthrough Rate Estimation, Interaction Dataset.

References

2017

(Joachims, Swaminathan et al., 2017) ⇒ Thorsten Joachims, Adith Swaminathan, and Tobias Schnabel. (2017). “Unbiased Learning-to-Rank with Biased Feedback.” In: Proceedings of the Tenth ACM International Conference on Web Search and Data Mining. ISBN:978-1-4503-4675-7 doi:10.1145/3018661.3018699
- QUOTE: Implicit feedback (e.g., clicks, dwell times, etc.) is an abundant source of data in human-interactive systems. While implicit feedback has many advantages (e.g., it is inexpensive to collect, user centric, and timely), its inherent biases are a key obstacle to its effective use. For example, position bias in search rankings strongly influences how many clicks a result receives, so that directly using click data as a training signal in Learning-to-Rank (LTR) methods yields sub-optimal results. To overcome this bias problem, we present a counterfactual inference framework that provides the theoretical basis for unbiased LTR via Empirical Risk Minimization despite biased data.
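The position-bias point in the quote can be illustrated with inverse-propensity weighting, a standard counterfactual correction. The sketch below is illustrative only: the function name `ips_weighted_click_rate` and the examination propensities are hypothetical values chosen for the example, not figures from the paper, and the code shows the re-weighting idea rather than the paper's full Learning-to-Rank method.

```python
from typing import Mapping, Sequence


def ips_weighted_click_rate(
    clicks: Sequence[int],
    positions: Sequence[int],
    propensity: Mapping[int, float],
) -> float:
    """Average click signal, with each click divided by the (assumed)
    probability that a result at its rank position was examined."""
    if len(clicks) != len(positions):
        raise ValueError("clicks and positions must have the same length")
    if not clicks:
        raise ValueError("at least one log record is required")
    total = 0.0
    for click, position in zip(clicks, positions):
        # A click at a rarely examined position counts for more.
        total += click / propensity[position]
    return total / len(clicks)


# Hypothetical examination propensities per rank position (assumption).
PROPENSITY = {1: 1.0, 2: 0.5, 3: 0.25}

# Four made-up log records: click indicator and the rank it was shown at.
clicks = [1, 0, 1, 1]
positions = [1, 2, 2, 3]

naive_rate = sum(clicks) / len(clicks)
weighted_rate = ips_weighted_click_rate(clicks, positions, PROPENSITY)
```

With these made-up values, the naive rate treats all clicks equally, while the weighted rate gives more credit to clicks on lower-ranked results, which were less likely to be seen at all.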

2014

http://kaggle.com/c/avazu-ctr-prediction
- QUOTE:
  - train - Training set. 10 days of click-through data, ordered chronologically. Non-clicks and clicks are subsampled according to different strategies.
  - test - Test set. 1 day of ads for testing your model predictions.
  - sampleSubmission.csv - Sample submission file in the correct format, corresponds to the All-0.5 Benchmark.

Data fields:
- id: ad identifier
- click: 0/1 for non-click/click
- hour: format is YYMMDDHH, so 14091123 means 23:00 on Sept. 11, 2014 UTC.
- C1: anonymized categorical variable
- banner_pos
- site_id
- site_domain
- site_category
- app_id
- app_domain
- app_category
- device_id
- device_ip
- device_model
- device_type
- device_conn_type
- C14-C21: anonymized categorical variables
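
As a minimal sketch of how the `hour` and `click` fields described above might be used, the following parses the YYMMDDHH timestamp and computes a click-through rate per hour. The helper names (`parse_hour`, `ctr_by_hour`) and the four sample rows are made up for illustration; they are not taken from the actual dataset, and only a subset of the columns is shown.

```python
import csv
import io
from datetime import datetime, timezone
from typing import Dict


def parse_hour(value: str) -> datetime:
    """Parse a YYMMDDHH string (e.g. '14091123') into a UTC datetime."""
    return datetime.strptime(value, "%y%m%d%H").replace(tzinfo=timezone.utc)


def ctr_by_hour(csv_text: str) -> Dict[datetime, float]:
    """Return clicks / impressions for each distinct hour in the log."""
    impressions: Dict[datetime, int] = {}
    clicks: Dict[datetime, int] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        hour = parse_hour(row["hour"])
        impressions[hour] = impressions.get(hour, 0) + 1
        clicks[hour] = clicks.get(hour, 0) + int(row["click"])
    return {hour: clicks[hour] / impressions[hour] for hour in impressions}


# Made-up rows in the documented column layout (assumption, not real data).
SAMPLE_CSV = """id,click,hour,banner_pos
a1,0,14091123,0
a2,1,14091123,1
a3,0,14091200,0
a4,0,14091200,0
"""

ctr = ctr_by_hour(SAMPLE_CSV)
for hour, rate in sorted(ctr.items()):
    print(hour.isoformat(), rate)
```

Because the source notes that clicks and non-clicks are subsampled according to different strategies, a rate computed this way on the training set should not be read as the true click-through rate of the underlying traffic.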
