Subspace Clustering Algorithm

From GM-RKB
Jump to navigation Jump to search

A Subspace Clustering Algorithm is a Clustering Algorithm that detects all clusters in all subspaces



References

2014

  • (Wikipedia, 2014) ⇒ http://en.wikipedia.org/wiki/Clustering_high-dimensional_data#Subspace_clustering Retrieved:2014-7-27.
    • Subspace clustering is the task of detecting all clusters in all subspaces. This means that a point might be a member of multiple clusters, each existing in a different subspace. Subspaces can either be axis-parallel or affine. The term is often used synonymous with general clustering in high-dimensional data.

      The image on the right shows a mere two-dimensional space where a number of clusters can be identified. In the one-dimensional subspaces, the clusters [math]\displaystyle{ c_a }[/math] (in subspace [math]\displaystyle{ \{x\} }[/math]) and [math]\displaystyle{ c_b }[/math], [math]\displaystyle{ c_c }[/math], [math]\displaystyle{ c_d }[/math] (in subspace [math]\displaystyle{ \{y\} }[/math]) can be found. [math]\displaystyle{ c_c }[/math] cannot be considered a cluster in a two-dimensional (sub-)space, since it is too sparsely distributed in the [math]\displaystyle{ x }[/math] axis. In two dimensions, the two clusters [math]\displaystyle{ c_{ab} }[/math] and [math]\displaystyle{ c_{ad} }[/math] can be identified.

      The problem of subspace clustering is given by the fact that there are [math]\displaystyle{ 2^d }[/math] different subspaces of a space with [math]\displaystyle{ d }[/math] dimensions. If the subspaces are not axis-parallel, an infinite number of subspaces is possible. Hence, subspace clustering algorithm utilize some kind of heuristic to remain computationally feasible, at the risk of producing inferior results. For example, the downward-closure property (cf. association rules) can be used to build higher-dimensional subspaces only by combining lower-dimensional ones, as any subspace T containing a cluster, will result in a full space S also to contain that cluster (i.e. S ⊆ T), an approach taken by most of the traditional algorithms such as CLIQUE , SUBCLU . It is also possible to define a subspace using different degrees of relevance for each dimension, an approach taken by iMWK-Means .

  • (Kriegel & Ntoutsi, 2014) ⇒ Hans-Peter Kriegel, and Eirini Ntoutsi. (2014). “Clustering High Dimensional Data: Examining Differences and Commonalities Between Subspace Clustering and Text Clustering - a Position Paper.” In: ACM SIGKDD Explorations Newsletter Journal, 15(2). doi:10.1145/2641190.2641192

2002

1999