Spectral Clustering Algorithm
Jump to navigation
Jump to search
A Spectral Clustering Algorithm is a clustering algorithm that performs dimensionality reduction (for clustering in fewer dimensions) by means of the spectrum of the affinity matrix.
- AKA: Spectral Clustering.
- Example(s):
- Counter-Example(s):
- See: Kernel Principal Component Analysis; K-Way Spectral Clustering.
References
2011
- (Sammut & Webb, 2011) ⇒ Claude Sammut; Geoffrey I. Webb. (2011). "Spectral Clustering." In: (Sammut & Webb, 2011) p.907
- (Wikipedia - Spectral Clustering, 2011-Jun-13) ⇒ http://en.wikipedia.org/wiki/Cluster_analysis#Spectral_clustering
- QUOTE: Given a set of data points A, the similarity matrix may be defined as a matrix [math]\displaystyle{ S }[/math] where [math]\displaystyle{ S_{ij} }[/math] represents a measure of the similarity between points [math]\displaystyle{ i, j\in A }[/math]. Spectral clustering techniques make use of the spectrum of the similarity matrix of the data to perform dimensionality reduction for clustering in fewer dimensions. One such technique is the Normalized Cuts algorithm by Shi-Malik, commonly used for image segmentation. It partitions points into two sets [math]\displaystyle{ (S_1,S_2) }[/math] based on the eigenvector [math]\displaystyle{ v }[/math] corresponding to the second-smallest eigenvalue of the Laplacian matrix [math]\displaystyle{ L = I - D^{-1/2}SD^{-1/2} }[/math] of [math]\displaystyle{ S }[/math], where [math]\displaystyle{ D }[/math] is the diagonal matrix [math]\displaystyle{ D_{ii} = \sum_{j} S_{ij}. }[/math] This partitioning may be done in various ways, such as by taking the median [math]\displaystyle{ m }[/math] of the components in [math]\displaystyle{ v }[/math], and placing all points whose component in [math]\displaystyle{ v }[/math] is greater than [math]\displaystyle{ m }[/math] in [math]\displaystyle{ S_1 }[/math], and the rest in [math]\displaystyle{ S_2 }[/math]. The algorithm can be used for hierarchical clustering by repeatedly partitioning the subsets in this fashion. A related algorithm is the Meila-Shi algorithm, which takes the eigenvectors corresponding to the [math]\displaystyle{ k }[/math] largest eigenvalues of the matrix [math]\displaystyle{ P = SD^{-1} }[/math] for some [math]\displaystyle{ k }[/math], and then invokes another algorithm (e.g. k-means) to cluster points by their respective [math]\displaystyle{ k }[/math] components in these eigenvectors.
- See also: Kernel principal component analysis
2003
- (Bengio & al, 2003) ⇒ Yoshua Bengio, Jean-François Paiement, Pascal Vincent, Olivier Delalleau, Nicolas Le Roux, Marie Ouimet. (2003). "Out-of-Sample Extensions for LLE, Isomap, MDS, Eigenmaps, and Spectral Clustering." In: Advances in Neural Information Processing Systems 16 (NIPS 2004).
2001
- (Ng, Jordan & Weiss, 2001) ⇒ Andrew Y. Ng, Michael I. Jordan, and Yair Weiss. (2001). "On Spectral Clustering: Analysis and an algorithm." In: Advances in Neural Information Processing Systems, 14 (NIPS 2001)
- QUOTE: Despite many empirical successes of spectral clustering methods algorithms that cluster points using eigenvectors of matrices derived from the data - there are several unresolved issues. First, there are a wide variety of algorithms that use the eigenvectors in slightly different ways. Second, many of these algorithms have no proof that they will actually compute a reasonable clustering. In this paper, we present a simple spectral clustering algorithm that can be implemented using a few lines of Matlab. Using tools from matrix perturbation theory, we analyze the algorithm, and give conditions under which it can be expected to do well. We also show surprisingly good experimental results on a number of challenging clustering problems.