1995 AStudyOfCrossValidAndBoostrap
(Kohavi, 1995) => Ron Kohavi. (1995). "A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection." In: Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence (IJCAI 1995).
Keywords: Cross-Validation Estimation Algorithm, Bootstrap Algorithm, Supervised Learning Task.
Notes
Presentation slides: http://robotics.stanford.edu/%7Eronnyk/accEst-talk.ps
Quotes
Abstract
We review accuracy estimation methods and compare the two most common methods: cross-validation and bootstrap. Recent experimental results on artificial data and theoretical results in restricted settings have shown that for selecting a good classifier from a set of classifiers (model selection), ten-fold cross-validation may be better than the more expensive leave-one-out cross-validation. We report on a large-scale experiment --- over half a million runs of C4.5 and a Naive-Bayes algorithm --- to estimate the effects of different parameters on these algorithms on real-world datasets. For cross-validation, we vary the number of folds and whether the folds are stratified or not; for bootstrap, we vary the number of bootstrap samples. Our results indicate that for real-world datasets similar to ours, the best method to use for model selection is ten-fold stratified cross-validation, even if computation power allows using more folds.
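As an illustration of the procedure the abstract describes, the following is a minimal sketch of k-fold cross-validation accuracy estimation with optional stratification. It is not the paper's experimental code: the names `majority_inducer`, `make_folds`, and `cv_accuracy` are hypothetical, the dataset is assumed to be a list of `(instance, label)` pairs, and a trivial majority-class inducer stands in for C4.5 or Naive-Bayes.

```python
import random
from collections import Counter, defaultdict


def majority_inducer(train):
    """Toy inducer (assumption): the classifier always predicts the
    most common label seen in the training set."""
    most_common = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda instance: most_common


def make_folds(data, k, stratified, rng):
    """Partition the indices of `data` into k disjoint folds."""
    indices = list(range(len(data)))
    rng.shuffle(indices)
    if stratified:
        # Group the shuffled indices by label, then concatenate the groups.
        # Dealing the result round-robin gives every fold roughly the same
        # class proportions as the full dataset.
        by_label = defaultdict(list)
        for i in indices:
            by_label[data[i][1]].append(i)
        indices = [i for group in by_label.values() for i in group]
    return [indices[f::k] for f in range(k)]


def cv_accuracy(data, inducer, k=10, stratified=True, seed=0):
    """Estimate accuracy as (correct held-out predictions) / (instances).
    Assumes 2 <= k <= len(data)."""
    rng = random.Random(seed)
    correct = 0
    for held_out in make_folds(data, k, stratified, rng):
        held = set(held_out)
        # Train on everything outside the current fold.
        train = [pair for i, pair in enumerate(data) if i not in held]
        classifier = inducer(train)
        # Test on the held-out fold only.
        correct += sum(classifier(data[i][0]) == data[i][1] for i in held_out)
    return correct / len(data)


if __name__ == "__main__":
    toy = [((i,), "a") for i in range(20)] + [((i,), "b") for i in range(20, 30)]
    print(cv_accuracy(toy, majority_inducer, k=10, stratified=True))
```

Setting `k=len(data)` gives leave-one-out cross-validation, and `stratified=False` gives plain random folds, so the two parameters the abstract says were varied for cross-validation can both be explored with the same function.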
1. Introduction
A classifier is a function that maps an unlabelled instance to a label using internal data structures. An inducer or an induction algorithm builds a classifier from a given dataset. CART and C4.5 (Breiman, Friedman, Olshen & Stone 1984, Quinlan 1993) are decision tree inducers that build decision tree classifiers. In this paper we are not interested in the specific method for inducing classifiers, but assume access to a dataset and an inducer of interest.
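The classifier and inducer definitions above can be read as two function types. The sketch below is only an illustration under stated assumptions: the type aliases and the `one_nn_inducer` function are hypothetical names, instances are assumed to be tuples of numbers, and a 1-nearest-neighbour rule stands in for a real inducer such as CART or C4.5.

```python
from typing import Callable, Hashable, List, Tuple

Instance = Tuple[float, ...]
Label = Hashable
# A classifier maps an unlabelled instance to a label.
Classifier = Callable[[Instance], Label]
# An inducer builds a classifier from a labelled dataset.
Inducer = Callable[[List[Tuple[Instance, Label]]], Classifier]


def one_nn_inducer(dataset: List[Tuple[Instance, Label]]) -> Classifier:
    """Hypothetical inducer: the returned classifier labels an instance
    with the label of its nearest stored training instance."""
    stored = list(dataset)  # the classifier's internal data structure

    def classify(instance: Instance) -> Label:
        nearest = min(
            stored,
            key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], instance)),
        )
        return nearest[1]

    return classify


if __name__ == "__main__":
    dataset = [((0.0, 0.0), "neg"), ((1.0, 1.0), "pos")]
    classifier = one_nn_inducer(dataset)
    print(classifier((0.9, 0.8)))
```

Treating the inducer as an opaque function from dataset to classifier mirrors the paper's stance: an accuracy estimation method only needs to call the inducer and the resulting classifier, not inspect how either works.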