# Training Dataset

A Training Dataset is a learning dataset composed of training data records that is used to fit a supervised machine learning system.

## References

### 2018

1. Ripley, B.D. (1996) Pattern Recognition and Neural Networks, Cambridge: Cambridge University Press, p. 354
2. "Subject: What are the population, sample, training set, design set, validation set, and test set?", Neural Network FAQ, part 1 of 7: Introduction (txt), comp.ai.neural-nets, Sarle, W.S., ed. (1997, last modified 2002-05-17)

### 2009

• (Wikipedia, 2009) ⇒ http://en.wikipedia.org/wiki/Training_set
• In artificial intelligence, a training set consists of an input vector and an answer vector, and is used together with a supervised learning method to train a knowledge database (e.g. a neural net or a naive bayes classifier) used by an AI machine.
• In general, the intelligent system consists of a function taking one or more arguments and results in an output vector, and the learning method's task is to run the system once with the input vector as the arguments, calculating the output vector, comparing it with the answer vector and then changing somewhat in order to get an output vector more like the answer vector next time the system is simulated.
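The run, compare, and adjust cycle described in the quotation above can be sketched as follows. This is only an illustrative sketch, not the method of any cited source: it assumes a linear system with a scalar answer per input vector (rather than a full answer vector) and uses the delta rule as the adjustment step. The names `predict`, `train`, and `training_set`, and the toy data, are hypothetical.

```python
def predict(weights, x):
    """Run the system once: a weighted sum of the input vector."""
    return sum(w * xi for w, xi in zip(weights, x))


def train(training_set, n_features, learning_rate=0.1, epochs=200):
    """Repeatedly compare outputs with answers and nudge the weights."""
    weights = [0.0] * n_features
    for _ in range(epochs):
        for x, answer in training_set:
            output = predict(weights, x)   # calculate the output
            error = answer - output        # compare it with the answer
            # Change the system slightly so the next output is closer.
            weights = [w + learning_rate * error * xi
                       for w, xi in zip(weights, x)]
    return weights


# Hypothetical training set: (input vector, answer) pairs
# generated by the rule answer = 2*x[0] - 1*x[1].
training_set = [
    ([1.0, 0.0], 2.0),
    ([0.0, 1.0], -1.0),
    ([1.0, 1.0], 1.0),
]
weights = train(training_set, n_features=2)
```

On this toy data the learned weights should approach the generating coefficients, so `predict(weights, [1.0, 1.0])` should be close to `1.0`.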

### 2000

• (Evgeniou et al., 2000) ⇒ Theodoros Evgeniou, Massimiliano Pontil, and Tomaso Poggio. (2000). “Regularization Networks and Support Vector Machines.” In: Advances in Computational Mathematics, 13(1).
• ... Vapnik’s theory characterizes and formalizes these concepts in terms of the capacity of a set of functions and capacity control depending on the training data: for instance, for a small training set the capacity of the function space in which $f$ is sought has to be small whereas it can increase with a larger training set.
• ... We are provided with examples of this probabilistic relationship, that is with a data set $D_l \equiv \{(x_i, y_i) \in X \times Y\}_{i=1}^{l}$ called the training data, obtained by sampling $l$ times the set $X \times Y$ according to $P(x, y)$.
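As a rough illustration of how a training set $D_l$ arises from sampling $l$ times according to $P(x, y)$, the sketch below draws pairs from an assumed joint distribution. The distribution itself is a made-up example (uniform $x$ on $[-1, 1]$, with $y$ a noisy linear function of $x$) and does not come from the cited paper. The function name `sample_training_data` and its parameters are hypothetical.

```python
import random


def sample_training_data(l, seed=0):
    """Draw l independent (x, y) pairs from an assumed P(x, y)."""
    rng = random.Random(seed)  # seeded so the sample is reproducible
    data = []
    for _ in range(l):
        # Assumption: P(x) is uniform on [-1, 1].
        x = rng.uniform(-1.0, 1.0)
        # Assumption: P(y | x) is Gaussian around 2*x with std 0.1.
        y = 2.0 * x + rng.gauss(0.0, 0.1)
        data.append((x, y))
    return data


# D_l with l = 20 examples.
D_l = sample_training_data(l=20)
```

Increasing `l` yields a larger training set from the same distribution, which is the setting in which the quoted passage says a larger function-space capacity becomes admissible.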