# k-Means Clustering Algorithm

(Redirected from k-Means)

A k-Means Clustering Algorithm is a top-down clustering algorithm that relocates cluster centroids by minimizing mean within-cluster sum of squares (mean squared error).

## References

### 2014

• (Wikipedia, 2014) ⇒ http://en.wikipedia.org/wiki/K-means_clustering#Description Retrieved:2014-10-20.
• Given a set of observations (x1, x2, …, xn), where each observation is a d-dimensional real vector, k-means clustering aims to partition the n observations into k (≤ n) sets S = {S1S2, …, Sk} so as to minimize the within-cluster sum of squares (WCSS). In other words, its objective is to find: $\underset{\mathbf{S}} {\operatorname{arg\,min}} \sum_{i=1}^{k} \sum_{\mathbf x \in S_i} \left\| \mathbf x - \boldsymbol\mu_i \right\|^2$ where μi is the mean of points in Si.

### 1999

• (Larsen & Aone, 1999) ⇒ B. Larsen, and C. Aone. (1999). “Fast and Effective Text Mining using Linear-time Document Clustering.” In: Proceedings of KDD 1999.

### 1998

• (Neal & Hinton, 1998) ⇒ R. Neal, and G. Hinton. (1998). “A view of the EM algorithm that justifies incremental, sparse, and other variants” In: Michael I. Jordan (Ed.), Learning in Graphical Models, Kluwer: 1998.
• Allows for application of k-Means/EM to non-continuous metric spaces

### 1992

• (Cutting et al., 1992) ⇒ D. R. Cutting, D. R. Karger, J. O. Pedersen, and John W. Tukey. (1992). “Scatter/gather: A cluster-based approach to browsing large document collections.” In: Proceedings of the Fifteenth ACM SIGIR Conference Retrieval.

### 1990

• (Kaufman & Rousseeuw, 1990) ⇒ L. Kaufman, and P. J. Rousseeuw. Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley and Sons, March 1990.

### 1973

• (Duda & Hart, 1973) ⇒ R. O. Duda, and P.E. Hart. (1973). “Pattern Classification and Scene Analysis. New York: John Wiley and Sons.

### 1967

• (MacQueen, 1967) ⇒ J. B. MacQueen. (1967). “Some Methods for Classification and Analysis of Multivariate Observations.” In: Proceedings of the Fifth Symposium on Math, Statistics, and Probability (pp. 281{297).

### 1965

• (Forgy, 1965) ⇒ E. Forgy. (1965). “Cluster Analysis of Multivariate Data: Efficiency vs. interpretability of classifications.” In: Biometrics 21:768.

### 1956

• (Steinhaus, 1956) ⇒ H. Steinhaus. (1956). “Sur la division des corp materiels en parties.” In: Bull. Acad. Polon. Sci., C1. III vol IV.