2015 NonExhaustiveOverlappingCluster

From GM-RKB
Jump to navigation Jump to search

Subject Headings:

Notes

Cited By

Quotes

Author Keywords

Abstract

Clustering is one of the most fundamental tasks in data mining. To analyze complex real-world data emerging in many data-centric applications, the problem of non-exhaustive, overlapping clustering has been studied where the goal is to find overlapping clusters and also detect outliers simultaneously. We propose a novel convex semidefinite program (SDP) as a relaxation of the non-exhaustive, overlapping clustering problem. Although the SDP formulation enjoys attractive theoretical properties with respect to global optimization, it is computationally intractable for large problem sizes. As an alternative, we optimize a low-rank factorization of the solution. The resulting problem is non-convex, but has a smaller number of solution variables. We construct an optimization solver using an augmented Lagrangian methodology that enables us to deal with problems with tens of thousands of data points. The new solver provides more accurate and reliable answers than other approaches. By exploiting the connection between graph clustering objective functions and a kernel k-means objective, our new low-rank solver can also compute overlapping communities of social networks with state-of-the-art accuracy.

References

;

 AuthorvolumeDate ValuetitletypejournaltitleUrldoinoteyear
2015 NonExhaustiveOverlappingClusterInderjit S. Dhillon
David F. Gleich
Yangyang Hou
Joyce Jiyoung Whang
Non-exhaustive, Overlapping Clustering via Low-Rank Semidefinite Programming10.1145/2783258.27833982015