k-Means Clustering Algorithm

From GM-RKB
Jump to navigation Jump to search

A k-Means Clustering Algorithm is a top-down clustering algorithm that relocates cluster centroids by minimizing mean within-cluster sum of squares (mean squared error).



References

2014


  • (Wikipedia, 2014) ⇒ http://en.wikipedia.org/wiki/K-means_clustering#Description Retrieved:2014-10-20.
    • Given a set of observations (x1, x2, …, xn), where each observation is a d-dimensional real vector, k-means clustering aims to partition the n observations into k (≤ n) sets S = {S1S2, …, Sk} so as to minimize the within-cluster sum of squares (WCSS). In other words, its objective is to find: [math]\displaystyle{ \underset{\mathbf{S}} {\operatorname{arg\,min}} \sum_{i=1}^{k} \sum_{\mathbf x \in S_i} \left\| \mathbf x - \boldsymbol\mu_i \right\|^2 }[/math] where μi is the mean of points in Si.

2009

2004

2001

1999

  • (Larsen & Aone, 1999) ⇒ B. Larsen, and C. Aone. (1999). “Fast and Effective Text Mining using Linear-time Document Clustering.” In: Proceedings of KDD 1999.

1998

  • (Neal & Hinton, 1998) ⇒ R. Neal, and G. Hinton. (1998). “A view of the EM algorithm that justifies incremental, sparse, and other variants” In: Michael I. Jordan (Ed.), Learning in Graphical Models, Kluwer: 1998.
    • Allows for application of k-Means/EM to non-continuous metric spaces

1992

  • (Cutting et al., 1992) ⇒ D. R. Cutting, D. R. Karger, J. O. Pedersen, and John W. Tukey. (1992). “Scatter/gather: A cluster-based approach to browsing large document collections.” In: Proceedings of the Fifteenth ACM SIGIR Conference Retrieval.

1990

  • (Kaufman & Rousseeuw, 1990) ⇒ L. Kaufman, and P. J. Rousseeuw. Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley and Sons, March 1990.

1979

1973

  • (Duda & Hart, 1973) ⇒ R. O. Duda, and P.E. Hart. (1973). “Pattern Classification and Scene Analysis. New York: John Wiley and Sons.

1967

  • (MacQueen, 1967) ⇒ J. B. MacQueen. (1967). “Some Methods for Classification and Analysis of Multivariate Observations.” In: Proceedings of the Fifth Symposium on Math, Statistics, and Probability (pp. 281{297).

1965

  • (Forgy, 1965) ⇒ E. Forgy. (1965). “Cluster Analysis of Multivariate Data: Efficiency vs. interpretability of classifications.” In: Biometrics 21:768.

1956

  • (Steinhaus, 1956) ⇒ H. Steinhaus. (1956). “Sur la division des corp materiels en parties.” In: Bull. Acad. Polon. Sci., C1. III vol IV.