Cross-Validation Evaluation Task

A Cross-Validation Evaluation Task is an out-of-sample evaluation task that estimates how accurately a predictive model will perform in practice.

AKA: Rotation Estimation, Out-of-Sample Test.
- It creates holdout and training datasets by sampling [math]\displaystyle{ n }[/math] folds without replacement from different portions of the annotated dataset.
- It can be solved by a Cross-Validation Evaluation System (that implements a cross-validation evaluation algorithm).
- It can range from being a Non-exhaustive Cross-validation Task, to being an Exhaustive Cross-Validation Task, to being a Nested Cross-validation Task.
- …
Example(s):
- a Non-exhaustive Cross-validation Task such as:
- an Exhaustive Cross-Validation Task such as:
  - Leave-P-Out Cross-Validation (LpO CV) Task,
  - Leave-One-Out Cross-Validation (LOOCV) Task,
- a Nested Cross-validation Task such as:
  - KL-fold Cross-Validation Task,
  - K-Fold Cross-Validation Task with Validation and Test Set.
- …
Counter-Example(s):
- An In-Sample Test.
- Bootstrapping.
See: Temporal Data Evaluation, Out-of-Sample Forecasting Experiment, Extrinsic Performance, Variance, Model Validation, Statistics, Accuracy, Predictive Modelling, Validation Set, Overfitting, Selection Bias, Partition of a Set, Statistical Sample, Data.
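
The exhaustive variants listed in the examples above can be illustrated with a short sketch. It enumerates every way of holding out p records from a dataset of n records, so leave-one-out is the special case p = 1. The function name `leave_p_out_splits` is hypothetical and chosen only for this illustration; it is not taken from any library.

```python
from itertools import combinations
from math import comb

def leave_p_out_splits(n_samples, p):
    """Yield (train_indices, test_indices) for every size-p held-out subset."""
    all_indices = range(n_samples)
    for test_idx in combinations(all_indices, p):
        held_out = set(test_idx)
        train_idx = [i for i in all_indices if i not in held_out]
        yield train_idx, list(test_idx)

n = 5
loo_splits = list(leave_p_out_splits(n, 1))  # leave-one-out: one split per record
lpo_splits = list(leave_p_out_splits(n, 2))  # leave-2-out: one split per pair

print(len(loo_splits), len(lpo_splits))  # 5 10
```

The number of splits equals the binomial coefficient C(n, p), which grows quickly with n; this is one reason non-exhaustive schemes are often preferred on larger datasets.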

References

2020

(SciKit-Learn, 2020) ⇒ https://scikit-learn.org/stable/modules/cross_validation.html Retrieved: 2020-02-15.
- QUOTE: However, by partitioning the available data into three sets, we drastically reduce the number of samples which can be used for learning the model, and the results can depend on a particular random choice for the pair of (train, validation) sets.
  A solution to this problem is a procedure called cross-validation (CV for short). A test set should still be held out for final evaluation, but the validation set is no longer needed when doing CV. In the basic approach, called k-fold CV, the training set is split into k smaller sets (other approaches are described below, but generally follow the same principles). The following procedure is followed for each of the k “folds”:
  - A model is trained using $k-1$ of the folds as training data;
  - the resulting model is validated on the remaining part of the data (i.e., it is used as a test set to compute a performance measure such as accuracy).

  The performance measure reported by k-fold cross-validation is then the average of the values computed in the loop. This approach can be computationally expensive, but does not waste too much data (as is the case when fixing an arbitrary validation set), which is a major advantage in problems such as inverse inference where the number of samples is very small.
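
The k-fold procedure quoted above can be sketched in plain Python. This is a minimal illustration and not scikit-learn's implementation: the helper names (`k_fold_indices`, `fit_threshold`, `accuracy`, `cross_validate`), the toy threshold classifier, and the synthetic data are all assumptions made for the example.

```python
import random
from statistics import mean

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle indices 0..n_samples-1 and split them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most 1.
    return [indices[i::k] for i in range(k)]

def fit_threshold(xs, ys):
    """Hypothetical toy model: the midpoint between the two class means.

    Assumes both classes are present and class 1 has the larger mean.
    """
    mean0 = mean(x for x, y in zip(xs, ys) if y == 0)
    mean1 = mean(x for x, y in zip(xs, ys) if y == 1)
    return (mean0 + mean1) / 2

def accuracy(threshold, xs, ys):
    """Fraction of records whose thresholded prediction matches the label."""
    predictions = [1 if x > threshold else 0 for x in xs]
    return sum(p == y for p, y in zip(predictions, ys)) / len(ys)

def cross_validate(xs, ys, k=5, seed=0):
    """Train on k-1 folds, score on the remaining fold, and average the scores."""
    folds = k_fold_indices(len(xs), k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        threshold = fit_threshold([xs[j] for j in train_idx],
                                  [ys[j] for j in train_idx])
        scores.append(accuracy(threshold,
                               [xs[j] for j in test_idx],
                               [ys[j] for j in test_idx]))
    return scores, mean(scores)

# Synthetic one-dimensional data: two Gaussian classes centered at 0 and 3.
rng = random.Random(42)
xs = [rng.gauss(0, 1) for _ in range(50)] + [rng.gauss(3, 1) for _ in range(50)]
ys = [0] * 50 + [1] * 50

fold_scores, cv_score = cross_validate(xs, ys, k=5)
print(fold_scores, cv_score)
```

Each record is used for validation exactly once and for training k-1 times, which is why the quoted passage describes the approach as computationally expensive but economical with data.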

2019

(Wikipedia, 2019) ⇒ https://en.wikipedia.org/wiki/Cross-validation_(statistics) Retrieved:2019-5-1.
- Cross-validation, sometimes called rotation estimation, ^[1] ^[2] ^[3] or out-of-sample testing is any of various similar model validation techniques for assessing how the results of a statistical analysis will generalize to an independent data set. It is mainly used in settings where the goal is prediction, and one wants to estimate how accurately a predictive model will perform in practice. In a prediction problem, a model is usually given a dataset of known data on which training is run (training dataset), and a dataset of unknown data (or first seen data) against which the model is tested (called the validation dataset or testing set). ^[4] ^[5] The goal of cross-validation is to test the model's ability to predict new data that was not used in estimating it, in order to flag problems like overfitting or selection bias and to give an insight on how the model will generalize to an independent dataset (i.e., an unknown dataset, for instance from a real problem). One round of cross-validation involves partitioning a sample of data into complementary subsets, performing the analysis on one subset (called the training set), and validating the analysis on the other subset (called the validation set or testing set). To reduce variability, in most methods multiple rounds of cross-validation are performed using different partitions, and the validation results are combined (e.g. averaged) over the rounds to give an estimate of the model's predictive performance. In summary, cross-validation combines (averages) measures of fitness in prediction to derive a more accurate estimate of model prediction performance.^[6]

[1] Geisser, Seymour (1993). Predictive Inference. New York, NY: Chapman and Hall. ISBN 978-0-412-03471-8.
[2] Kohavi, Ron (1995). "A study of cross-validation and bootstrap for accuracy estimation and model selection". Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence. San Mateo, CA: Morgan Kaufmann. 2 (12): 1137–1143. CiteSeerX 10.1.1.48.529.
[3] Devijver, Pierre A.; Kittler, Josef (1982). Pattern Recognition: A Statistical Approach. London, GB: Prentice-Hall.
[4] "What is the difference between test set and validation set?". Retrieved 10 October 2018.
[5] "Newbie question: Confused about train, validation and test data!". Archived from the original on 2015-03-14. Retrieved 2013-11-14.
[6] Grossman, Robert; Seni, Giovanni; Elder, John; Agarwal, Nitin; Liu, Huan (2010). "Ensemble Methods in Data Mining: Improving Accuracy Through Combining Predictions". Synthesis Lectures on Data Mining and Knowledge Discovery. Morgan & Claypool. 2: 1–126. doi:10.2200/S00240ED1V01Y200912DMK002.
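
The combination step described in the Wikipedia passage above (multiple rounds over different partitions, with validation results averaged) can be written compactly. The notation here is an assumption introduced for illustration and does not come from the quoted source: D is the available sample, T_r and V_r are the complementary training and validation subsets used in round r, \hat{f}_{T_r} is the model fitted on T_r, M is the chosen performance measure, and R is the number of rounds.

```latex
\widehat{\mathrm{Perf}}_{\mathrm{CV}}
  = \frac{1}{R} \sum_{r=1}^{R} M\!\left(\hat{f}_{T_r},\, V_r\right),
\qquad T_r \cup V_r = D,\quad T_r \cap V_r = \emptyset
```

In k-fold cross-validation R equals k and the validation subsets V_1, ..., V_k are themselves disjoint and together cover D.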

2005

(Inoue & Kilian, 2005) ⇒ Atsushi Inoue, and Lutz Kilian. (2005). “In-Sample or Out-of-Sample Tests of Predictability: Which One Should We Use?” In: Econometric Reviews, 23(4). doi:10.1081/ETC-200040785
- ABSTRACT: It is widely known that significant in-sample evidence of predictability does not guarantee significant out-of-sample predictability. This is often interpreted as an indication that in-sample evidence is likely to be spurious and should be discounted. In this paper, we question this interpretation. Our analysis shows that neither data mining nor dynamic misspecification of the model under the null nor unmodelled structural change under the null are plausible explanations of the observed tendency of in-sample tests to reject the no-predictability null more often than out-of-sample tests. We provide an alternative explanation based on the higher power of in-sample tests of predictability in many situations. We conclude that results of in-sample tests of predictability will typically be more credible than results of out-of-sample tests.
