2012 TwoApproachestoUnderstandingWhe

Subject Headings:

Notes

Most algorithm work in data mining focuses on designing algorithms to address a learning problem. Here we focus our attention on designing algorithms to determine the ease or difficulty of a problem instance. The area of clustering under constraints has recently received much attention in the data mining community. We can view the constraints as restricting (either directly or indirectly) the search space of a clustering algorithm to just feasible clusterings. However, to our knowledge no work explores methods to count the feasible clusterings or other measures of difficulty nor the importance of these measures. We present two approaches to efficiently characterize the difficulty of satisfying must-link (ML) and cannot-link (CL) constraints: calculating the fractional chromatic polynomial of the constraint graph using LP and approximately counting the number of feasible clusterings using MCMC samplers. We show that these measures are correlated to the classical performance measures of constrained clustering algorithms. From these insights and our algorithms we construct new methods of generating and pruning constraints and empirically demonstrate their usefulness.

;

	Author	volume	Date Value	title	type	journal	titleUrl	doi	note	year
2012 TwoApproachestoUnderstandingWhe	Ian Davidson			Two Approaches to Understanding When Constraints Help Clustering				10.1145/2339530.2339734		2012