2014 RelevantOverlappingSubspaceClus

From GM-RKB
Jump to navigation Jump to search

Subject Headings:

Notes

Cited By

Quotes

Author Keywords

Abstract

Clustering categorical data poses some unique challenges: Due to missing order and spacing among the categories, selecting a suitable similarity measure is a difficult task. Many existing techniques require the user to specify input parameters which are difficult to estimate. Moreover, many techniques are limited to detect clusters in the full-dimensional data space. Only few methods exist for subspace clustering and they produce highly redundant results. Therefore, we propose ROCAT (Relevant Overlapping Subspace Clusters on Categorical Data), a novel technique based on the idea of data compression. Following the Minimum Description Length principle, ROCAT automatically detects the most relevant subspace clusters without any input parameter. The relevance of each cluster is validated by its contribution to compress the data. Optimizing the trade-off between goodness-of-fit and model complexity, ROCAT automatically determines a meaningful number of clusters to represent the data. ROCAT is especially designed to detect subspace clusters on categorical data which may overlap in objects and/or attributes; i.e. objects can be assigned to different clusters in different subspaces and attributes may contribute to different subspaces containing clusters. ROCAT naturally avoids undesired redundancy in clusters and subspaces by allowing overlap only if it improves the compression rate. Extensive experiments demonstrate the effectiveness and efficiency of our approach.

References

;

 AuthorvolumeDate ValuetitletypejournaltitleUrldoinoteyear
2014 RelevantOverlappingSubspaceClusClaudia Plant
Jing Feng
Xiao He
Bettina Konte
Son T. Mai
Relevant Overlapping Subspace Clusters on Categorical Data10.1145/2623330.26236522014