2012 TwoApproachestoUnderstandingWhe

From GM-RKB
Jump to navigation Jump to search

Subject Headings:

Notes

Cited By

Quotes

Author Keywords

Abstract

Most algorithm work in data mining focuses on designing algorithms to address a learning problem. Here we focus our attention on designing algorithms to determine the ease or difficulty of a problem instance. The area of clustering under constraints has recently received much attention in the data mining community. We can view the constraints as restricting (either directly or indirectly) the search space of a clustering algorithm to just feasible clusterings. However, to our knowledge no work explores methods to count the feasible clusterings or other measures of difficulty nor the importance of these measures. We present two approaches to efficiently characterize the difficulty of satisfying must-link (ML) and cannot-link (CL) constraints: calculating the fractional chromatic polynomial of the constraint graph using LP and approximately counting the number of feasible clusterings using MCMC samplers. We show that these measures are correlated to the classical performance measures of constrained clustering algorithms. From these insights and our algorithms we construct new methods of generating and pruning constraints and empirically demonstrate their usefulness.

References

;

 AuthorvolumeDate ValuetitletypejournaltitleUrldoinoteyear
2012 TwoApproachestoUnderstandingWheIan DavidsonTwo Approaches to Understanding When Constraints Help Clustering10.1145/2339530.23397342012