C4.5 Algorithm

A C4.5 Algorithm is a classification tree training algorithm that uses an Information Gain-based impurity function (normalized as the gain ratio) as its decision tree splitting criterion.

Context:
- It is a direct descendant of the ID3 algorithm.
- It implements the Information Gain Measure as its Branch Splitting Heuristic.
- It employs Post-Pruning.
- It is implemented in the C4.5 System.
- …
Counter-Example(s):
- ID3 Algorithm.
- C5.0 Algorithm.
- FOIL Algorithm.
- CART Algorithm.
See: Decision Tree Pruning, Gini Index, Ross Quinlan.

References

Wikidata Concept: http://www.wikidata.org/wiki/Q1022655

2011

(Wikipedia, 2011) ⇒ http://en.wikipedia.org/wiki/C4.5_algorithm
- C4.5 is an algorithm used to generate a decision tree developed by Ross Quinlan. C4.5 is an extension of Quinlan's earlier ID3 algorithm. The decision trees generated by C4.5 can be used for classification, and for this reason, C4.5 is often referred to as a statistical classifier. C4.5 builds decision trees from a set of training data in the same way as ID3, using the concept of information entropy. The training data is a set [math]\displaystyle{ S = \{s_1, s_2, \ldots\} }[/math] of already classified samples. Each sample [math]\displaystyle{ s_i = \{x_1, x_2, \ldots\} }[/math] is a vector where [math]\displaystyle{ x_1, x_2, \ldots }[/math] represent attributes or features of the sample. The training data is augmented with a vector [math]\displaystyle{ C = \{c_1, c_2, \ldots\} }[/math] where [math]\displaystyle{ c_1, c_2, \ldots }[/math] represent the class to which each sample belongs. At each node of the tree, C4.5 chooses one attribute of the data that most effectively splits its set of samples into subsets enriched in one class or the other. Its criterion is the normalized information gain (difference in entropy) that results from choosing an attribute for splitting the data. The attribute with the highest normalized information gain is chosen to make the decision. The C4.5 algorithm then recurs on the smaller sublists.
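The attribute-selection step described in the passage above can be illustrated with a minimal Python sketch. It assumes categorical attributes only, with samples stored as tuples; the names `entropy`, `gain_ratio`, and `best_attribute` are hypothetical helpers chosen for this illustration and are not taken from Quinlan's C4.5 System, which applies further heuristics that are omitted here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def gain_ratio(rows, labels, attr_index):
    """Normalized information gain of splitting on one categorical attribute.

    Illustrative only: assumes every row is a tuple of categorical values.
    """
    total = len(labels)
    # Group the class labels by the value taken by the chosen attribute.
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr_index], []).append(label)
    # Expected entropy that remains after the split.
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    info_gain = entropy(labels) - remainder
    # Split information: entropy of the attribute's own value distribution.
    split_info = entropy([row[attr_index] for row in rows])
    return info_gain / split_info if split_info > 0 else 0.0


def best_attribute(rows, labels):
    """Index of the attribute with the highest gain ratio."""
    return max(range(len(rows[0])),
               key=lambda i: gain_ratio(rows, labels, i))


# Toy data (made up for this example): attribute 0 separates the classes.
rows = [("sunny", "hot"), ("sunny", "mild"), ("rain", "hot"), ("rain", "mild")]
labels = ["no", "no", "yes", "yes"]
chosen = best_attribute(rows, labels)
```

In a full tree builder, the chosen attribute would partition the samples and the same selection would be repeated on each subset until a stopping condition is met.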

2009
(Wu & Kumar, 2009) ⇒ Xindong Wu, and Vipin Kumar, editors. (2009). “The Top Ten Algorithms in Data Mining.” Chapman & Hall. ISBN:1420089641
2002
(Gabor Melli, 2002) ⇒ Gabor Melli. (2002). “PredictionWorks' Data Mining Glossary.” PredictionWorks.
- C4.5: A decision tree algorithm developed by Ross Quinlan, and a direct descendant of the ID3 algorithm. C4.5 can process both discrete and continuous data and makes classifications. C4.5 implements the information gain measure as its splitting criterion and employs post-pruning. Through the 1990s it was the most common algorithm to compare results against. See ID3, Pruning, Gini.
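One common way to handle a continuous attribute in a decision tree is to turn it into a binary test of the form "value ≤ threshold" and to pick the threshold with the highest information gain. The sketch below illustrates that idea under simplifying assumptions: candidate thresholds are midpoints between adjacent distinct sorted values, and plain information gain is used without any correction. The function name `best_threshold` is hypothetical, and the released C4.5 System differs in its details.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, information_gain) for the best binary split.

    Simplified illustration: tries the midpoint between each pair of
    adjacent distinct values and keeps the one with the highest gain.
    """
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    total = len(pairs)
    base = entropy(labels)
    best = (None, 0.0)
    for i in range(1, total):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no boundary between equal values
        threshold = (lo + hi) / 2
        left = [c for _, c in pairs[:i]]
        right = [c for _, c in pairs[i:]]
        gain = (base
                - len(left) / total * entropy(left)
                - len(right) / total * entropy(right))
        if gain > best[1]:
            best = (threshold, gain)
    return best


# Toy data (made up for this example): the classes separate at 2.5.
threshold, gain = best_threshold([1.0, 2.0, 3.0, 4.0], ["a", "a", "b", "b"])
```

The resulting binary test can then compete with the discrete attributes under the same splitting criterion.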
1996
(Quinlan, 1996) ⇒ J. Ross Quinlan. (1996). “Improved Use of Continuous Attributes in C4.5.” In: Journal of Artificial Intelligence Research, 4.
1993
(Quinlan, 1993a) ⇒ J. Ross Quinlan. (1993). “C4.5: Programs for Machine Learning.” Morgan Kaufmann. ISBN:1558602380
