# k-Means Clustering Algorithm


A k-Means Clustering Algorithm is a top-down clustering algorithm that iteratively relocates cluster centroids to minimize the mean within-cluster sum of squares (mean squared error).

**AKA:** Forgy's Method, MacQueen's Algorithm.

**Context:**
- It can be implemented by a k-Means Clustering System (to solve a k-means clustering task).
- It requires that *k* be provided.
- It can start by creating *k* initial sets, either at random or using some heuristic data.
- It can iteratively refine the clusters by:
  1. assigning each Instance *d_i* to its closest Cluster Centroid;
  2. updating each Cluster Centroid *C_j* to be the Mean of its Members.
- It can terminate after an iteration that does not change Cluster composition.
- It can (often) be applied to Metric Spaces because of the requirement to compute the Mean.
- It can (often) have a Computational Complexity of [math]\mathcal{O}(n \times K \times I \times d)[/math], where *n* is the number of points, *K* the number of clusters, *I* the number of iterations, and *d* the number of attributes.
- It can be considered a special case of the EM Algorithm that assumes:
  - each cluster is modeled by a spherical Gaussian distribution;
  - each data item is assigned to a single cluster;
  - the mixture weights (prior probabilities) are equal (each cluster is similarly sized).
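The refinement loop above (assign each instance to its nearest centroid, then recompute each centroid as the mean of its members, stopping when cluster composition stops changing) can be sketched in plain Python; the function and parameter names here are illustrative, not from the source:

```python
import math

def kmeans(points, k, initial_centroids, max_iters=100):
    """Lloyd-style k-means sketch (illustrative names, not from the source).

    points            -- list of equal-length numeric tuples
    k                 -- number of clusters (must be provided)
    initial_centroids -- k starting centroids, chosen at random or by heuristic
    """
    centroids = [list(c) for c in initial_centroids]
    assignments = None
    for _ in range(max_iters):
        # Step 1: assign each instance d_i to its closest cluster centroid.
        new_assignments = []
        for p in points:
            dists = [math.dist(p, c) for c in centroids]
            new_assignments.append(dists.index(min(dists)))
        # Terminate after an iteration that does not change cluster composition.
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Step 2: update each centroid C_j to be the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # leave an empty cluster's centroid in place
                centroids[j] = [sum(xs) / len(members) for xs in zip(*members)]
    return centroids, assignments
```

For example, `kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], 2, [(0, 0), (10, 10)])` converges in two iterations to centroids near `(0, 0.5)` and `(10, 10.5)`.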

**Example(s):**

**See:** Vector Quantization, Voronoi Cell, NP-Hard, Heuristic Algorithm, Mixture Model, Gaussian Distribution.

## References

### 2014

- (Wikipedia, 2014) ⇒ http://en.wikipedia.org/wiki/k-means_clustering Retrieved:2014-10-20.
  - *k*-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. *k*-means clustering aims to partition *n* observations into *k* clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells. The problem is computationally difficult (NP-hard); however, there are efficient heuristic algorithms that are commonly employed and converge quickly to a local optimum. These are usually similar to the expectation-maximization algorithm for mixtures of Gaussian distributions via an iterative refinement approach employed by both algorithms. Additionally, they both use cluster centers to model the data; however, *k*-means clustering tends to find clusters of comparable spatial extent, while the expectation-maximization mechanism allows clusters to have different shapes.

- (Wikipedia, 2014) ⇒ http://en.wikipedia.org/wiki/K-means_clustering#Description Retrieved:2014-10-20.
  - Given a set of observations (**x**_{1}, **x**_{2}, …, **x**_{n}), where each observation is a *d*-dimensional real vector, *k*-means clustering aims to partition the *n* observations into *k* (≤ *n*) sets **S** = {*S*_{1}, *S*_{2}, …, *S*_{k}} so as to minimize the within-cluster sum of squares (WCSS). In other words, its objective is to find: [math]\underset{\mathbf{S}} {\operatorname{arg\,min}} \sum_{i=1}^{k} \sum_{\mathbf x \in S_i} \left\| \mathbf x - \boldsymbol\mu_i \right\|^2 [/math] where *μ*_{i} is the mean of points in *S*_{i}.
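The WCSS objective can be evaluated directly from a partition of the data; a minimal sketch (the helper name `wcss` is illustrative, not from the source):

```python
def wcss(clusters):
    """Within-cluster sum of squares: for each cluster S_i with mean mu_i,
    sum ||x - mu_i||^2 over its members x, then sum over clusters.
    `clusters` is a list of non-empty lists of equal-length numeric tuples."""
    total = 0.0
    for members in clusters:
        d = len(members[0])
        # mu_i: coordinate-wise mean of the cluster's members
        mu = [sum(x[t] for x in members) / len(members) for t in range(d)]
        for x in members:
            total += sum((x[t] - mu[t]) ** 2 for t in range(d))
    return total
```

For the partition `[[(0, 0), (0, 1)], [(10, 10), (10, 11)]]`, each cluster's two points lie 0.5 from their mean, so the WCSS is 4 × 0.25 = 1.0.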

### 2009

- (Ghosh & Liu, 2009) ⇒ Joydeep Ghosh and Alexander Liu. (2009). “K-Means.” In: (Wu & Kumar, 2009)

### 2004

- (Ding & He, 2004) ⇒ Chris Ding, and Xiaofeng He. (2004). “K-means Clustering via Principal Component Analysis.” In: Proceedings of the Int'l Conf. on Machine Learning (ICML 2004).
- http://www.nature.com/nrg/journal/v5/n6/glossary/nrg1349_glossary.html
  - An unsupervised clustering technique. Data points are partitioned into a predetermined number of non-hierarchical clusters based on similarity.

### 2001

- (Zha et al., 2001) ⇒ H. Zha, Chris Ding, M. Gu, Xiaofeng He, and H. D. Simon. (2001). “Spectral Relaxation for K-means Clustering.” In: Neural Information Processing Systems, vol. 14 (NIPS 2001).
- (Wagstaff et al., 2001) ⇒ Kiri Wagstaff, Claire Cardie, Seth Rogers, and Stefan Schrödl. (2001). “Constrained K-means Clustering with Background Knowledge.” In: Proceedings of the Eighteenth International Conference on Machine Learning.

### 1999

- (Larsen & Aone, 1999) ⇒ B. Larsen, and C. Aone. (1999). “Fast and Effective Text Mining using Linear-time Document Clustering.” In: Proceedings of KDD 1999.

### 1998

- (Neal & Hinton, 1998) ⇒ R. Neal, and G. Hinton. (1998). “A View of the EM Algorithm that Justifies Incremental, Sparse, and Other Variants.” In: Michael I. Jordan (Ed.), Learning in Graphical Models. Kluwer, 1998.
  - Allows for application of k-Means/EM to non-continuous metric spaces.

### 1992

- (Cutting et al., 1992) ⇒ D. R. Cutting, D. R. Karger, J. O. Pedersen, and John W. Tukey. (1992). “Scatter/Gather: A Cluster-based Approach to Browsing Large Document Collections.” In: Proceedings of the Fifteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.

### 1990

- (Kaufman & Rousseeuw, 1990) ⇒ L. Kaufman, and P. J. Rousseeuw. Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley and Sons, March 1990.

### 1979

- (Hartigan & Wong, 1979) ⇒ J. A. Hartigan, and M. A. Wong. (1979). “A K-means Clustering Algorithm.” In: Journal of the Royal Statistical Society, Series C (Applied Statistics).

### 1973

- (Duda & Hart, 1973) ⇒ R. O. Duda, and P. E. Hart. (1973). “Pattern Classification and Scene Analysis.” New York: John Wiley and Sons.

### 1967

- (MacQueen, 1967) ⇒ J. B. MacQueen. (1967). “Some Methods for Classification and Analysis of Multivariate Observations.” In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability (pp. 281-297).

### 1965

- (Forgy, 1965) ⇒ E. Forgy. (1965). “Cluster Analysis of Multivariate Data: Efficiency vs. Interpretability of Classifications.” In: Biometrics 21:768.

### 1956

- (Steinhaus, 1956) ⇒ H. Steinhaus. (1956). “Sur la division des corps matériels en parties.” In: Bull. Acad. Polon. Sci., C1. III, vol. IV.