# Cross-Validation Algorithm


## References

### 2018

• (Wikipedia, 2018) ⇒ https://en.wikipedia.org/wiki/Cross-validation_(statistics) Retrieved:2018-2-20.
• QUOTE: Cross-validation, sometimes called rotation estimation, is a model validation technique for assessing how the results of a statistical analysis will generalize to an independent data set. It is mainly used in settings where the goal is prediction, and one wants to estimate how accurately a predictive model will perform in practice. In a prediction problem, a model is usually given a dataset of known data on which training is run (training dataset), and a dataset of unknown data (or first-seen data) against which the model is tested (called the validation dataset or testing set). The goal of cross-validation is to define a dataset to "test" the model in the training phase (i.e., the validation set), in order to limit problems like overfitting, give an insight on how the model will generalize to an independent dataset (i.e., an unknown dataset, for instance from a real problem), etc.

One round of cross-validation involves partitioning a sample of data into complementary subsets, performing the analysis on one subset (called the training set), and validating the analysis on the other subset (called the validation set or testing set). To reduce variability, most methods perform multiple rounds of cross-validation using different partitions, and the validation results are combined (e.g., averaged) over the rounds to estimate a final predictive model.

One of the main reasons for using cross-validation instead of conventional validation (e.g., partitioning the data set into two sets of 70% for training and 30% for test) is that there is often not enough data available to partition it into separate training and test sets without losing significant modelling or testing capability. In these cases, cross-validation is a fair way to properly estimate model prediction performance.

In summary, cross-validation combines (averages) measures of fit (prediction error) to derive a more accurate estimate of model prediction performance.
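The rounds-and-averaging procedure described above can be sketched in a few lines of plain Python. The mean-value "model" and the toy sample below are illustrative assumptions, not part of the quoted text; the point is only the mechanics of partitioning, scoring each round, and combining the per-round errors.

```python
# Sketch of cross-validation as described above: partition a sample into
# complementary subsets, fit on one, validate on the other, and average the
# per-round errors into one estimate. The mean predictor is a toy "model"
# chosen purely for illustration.

def cv_estimate(data, k=5):
    """Average squared prediction error over k rounds of cross-validation."""
    folds = [data[i::k] for i in range(k)]           # k complementary subsets
    errors = []
    for i in range(k):
        held_out = folds[i]                          # validation set this round
        training = [x for j, f in enumerate(folds) if j != i for x in f]
        prediction = sum(training) / len(training)   # toy model: the mean
        mse = sum((x - prediction) ** 2 for x in held_out) / len(held_out)
        errors.append(mse)
    return sum(errors) / k                           # combined (averaged) estimate

sample = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
print(cv_estimate(sample, k=5))
```

Each round leaves out a different subset, so every observation is used for validation exactly once, which is what distinguishes this from a single fixed train/test split.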

### 1998

• (Kohavi & Provost, 1998) ⇒ Ron Kohavi, and Foster Provost. (1998). “Glossary of Terms.” In: Machine Learning, 30(2-3).
• QUOTE: Cross-validation: A method for estimating the accuracy (or error) of an inducer by dividing the data into k mutually exclusive subsets (the “folds”) of approximately equal size. The inducer is trained and tested k times. Each time it is trained on the data set minus a fold and tested on that fold. The accuracy estimate is the average accuracy for the k folds.
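The k-fold procedure in this definition can be sketched directly: train k times on the data minus one fold, test on that fold, and average the k accuracies. The majority-class "inducer" and the labelled toy data below are assumptions made for illustration; any inducer could be substituted.

```python
# Sketch of k-fold cross-validation per the Kohavi & Provost definition:
# k mutually exclusive folds, k train/test rounds, accuracies averaged.
from collections import Counter

def k_fold_accuracy(labels, k):
    folds = [labels[i::k] for i in range(k)]     # k mutually exclusive folds
    accuracies = []
    for i in range(k):
        test_fold = folds[i]
        train = [y for j, f in enumerate(folds) if j != i for y in f]
        majority = Counter(train).most_common(1)[0][0]   # toy "trained" inducer
        correct = sum(1 for y in test_fold if y == majority)
        accuracies.append(correct / len(test_fold))
    return sum(accuracies) / k                   # average over the k folds

labels = ["a", "a", "a", "b", "a", "b", "a", "a", "b", "a"]
print(k_fold_accuracy(labels, k=5))
```

Because the folds are mutually exclusive, each example contributes to exactly one test fold and to k-1 training runs.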

### 1993

• (Geisser, 1993) ⇒ Seymour Geisser. (1993). “Predictive Inference.” Chapman and Hall. ISBN 0412034719.
• Cross-validation: Divide the sample in half, use the second half to "validate" the first half and vice versa, yielding a second validation or comparison. The two may be combined into a single one.
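Geisser's two-way scheme can be sketched as follows: split the sample in half, validate each half against a model fit on the other, then combine the two validation figures. The mean predictor and the toy data are illustrative assumptions, not from Geisser's text.

```python
# Sketch of Geisser's half-and-half scheme: validate the first half with the
# second, then vice versa, and combine the two errors into a single figure.

def half_swap_error(data):
    first, second = data[: len(data) // 2], data[len(data) // 2 :]

    def mse(train, test):
        m = sum(train) / len(train)                         # fit on one half
        return sum((x - m) ** 2 for x in test) / len(test)  # validate on other

    return (mse(first, second) + mse(second, first)) / 2    # combine the two

print(half_swap_error([1.0, 2.0, 3.0, 4.0, 5.0, 6.0]))
```

This is the special case k = 2 of the k-fold scheme in the Kohavi & Provost entry above.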

### 1983

• (Efron & Gong, 1983) ⇒ Bradley Efron, and Gail Gong. (1983). “A Leisurely Look at the Bootstrap, the Jackknife, and Cross-Validation.” In: The American Statistician, 37(1). http://www.jstor.org/stable/2685844
• Abstract: This is an invited expository article for The American Statistician. It reviews the nonparametric estimation of statistical error, mainly the bias and standard error of an estimator, or the error rate of a prediction rule. The presentation is written at a relaxed mathematical level, omitting most proofs, regularity conditions, and technical details.
