Gini Diversity Index

From GM-RKB

A Gini diversity index is a dispersion metric based on an impurity function.



References

2012

  • http://en.wikipedia.org/wiki/Decision_tree_learning#Gini_impurity
    • Used by the CART algorithm, Gini impurity is a measure of how often a randomly chosen element from the set would be incorrectly labelled if it were randomly labelled according to the distribution of labels in the subset. Gini impurity can be computed by summing the probability of each item being chosen times the probability of a mistake in categorizing that item. It reaches its minimum (zero) when all cases in the node fall into a single target category. To compute Gini impurity for a set of items, suppose the label y takes on values in {1, 2, ..., m}, and let [math]\displaystyle{ f_i }[/math] be the fraction of items labelled with value i in the set. [math]\displaystyle{ I_{G}(f) = \sum_{i=1}^{m} f_i (1-f_i) = \sum_{i=1}^{m} (f_i - {f_i}^2) = \sum_{i=1}^m f_i - \sum_{i=1}^{m} {f_i}^2 = 1 - \sum^{m}_{i=1} {f_i}^{2} }[/math]
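The formula quoted above can be checked with a short sketch. This is a minimal illustration, not CART's or any library's implementation. The function names `gini_impurity` and `gini_impurity_mislabel` are hypothetical. Labels are assumed to be any hashable values, and the fractions f_i are estimated by counting. The two functions compute the two ends of the derivation: 1 - Σ f_i² and Σ f_i (1 - f_i). They should agree up to floating-point rounding.

```python
from collections import Counter
from typing import Hashable, Iterable


def _fractions(labels: Iterable[Hashable]) -> list[float]:
    """Return the fraction f_i of items carrying each distinct label."""
    counts = Counter(labels)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("Gini impurity is undefined for an empty set")
    return [c / n for c in counts.values()]


def gini_impurity(labels: Iterable[Hashable]) -> float:
    """Hypothetical helper: I_G(f) = 1 - sum_i f_i^2."""
    return 1.0 - sum(f * f for f in _fractions(labels))


def gini_impurity_mislabel(labels: Iterable[Hashable]) -> float:
    """Hypothetical helper: I_G(f) = sum_i f_i * (1 - f_i).

    Each term is the probability of picking an item of class i (f_i)
    times the probability of randomly labelling it wrongly (1 - f_i).
    """
    return sum(f * (1.0 - f) for f in _fractions(labels))


if __name__ == "__main__":
    print(gini_impurity(["a", "a", "a"]))        # pure node -> 0.0
    print(gini_impurity(["a", "b"]))             # 0.5
    print(gini_impurity(["a", "a", "a", "b"]))   # 1 - (0.75^2 + 0.25^2) -> 0.375
    print(gini_impurity_mislabel(["a", "a", "a", "b"]))  # same value, 0.375
```

The pure node returns zero, matching the minimum stated in the quotation. With m equally frequent labels, the value is 1 - 1/m.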



2011

2004

1984