2006 MeasuringConstraintSetUtilityfo

From GM-RKB
Jump to navigation Jump to search

Subject Headings: Cluster Performance; Rand Index; Cluster Accuracy; Partitional Cluster Algorithm; High Informativeness.

Notes

Cited By

Quotes

Abstract

Clustering with constraints is an active area of machine learning and data mining research. Previous empirical work has convincingly shown that adding constraints to clustering improves performance, with respect to the true data labels. However, in most of these experiments, results are averaged over different randomly chosen constraint sets, thereby masking interesting properties of individual sets. We demonstrate that constraint sets vary significantly in how useful they are for constrained clustering; some constraint sets can actually decrease algorithm performance. We create two quantitative measures, informativeness and coherence, that can be used to identify useful constraint sets. We show that these measures can also help explain differences in performance for four particular constrained clustering algorithms.

References

  1. Wagstaff, K., Cardie, C., Rogers, S., Schroedl, S.: Constrained k-means clustering with background knowledge. In: Proceedings of the Eighteenth International Conference on Machine Learning (2001)
  2. Klein, D., Kamvar, S.D., Manning, C.D.: From instance-level constraints to space-level constraints: Making the most of prior knowledge in data clustering. In: Proceedings of the Nineteenth International Conference on Machine Learning (2002)
  3. Xing, E.P., Ng, A.Y., Jordan, M.I., Russell, S.: Distance metric learning, with application to clustering with side-information. NIPS 15 (2003)
  4. Basu, S., Bilenko, M., Mooney, R.J.: A probabilistic framework for semi-supervised clustering. In: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, WA (2004)
  5. Bar-Hillel, A., Hertz, T., Shental, N., Weinshall, D.: Learning a Mahalanobis metric from equivalence constraints. Journal of Machine Learning Research 6 (2005)
  6. Wagstaff, K.L.: Intelligent Clustering with Instance-Level Constraints. PhD thesis, Cornell University (2002)
  7. Lu, Z., Leen, T.K.: Semi-supervised learning with penalized probabilistic clustering. In: Advances in Neural Information Processing Systems 17 (2005)
  8. Davidson, I., Ravi, S.S.: Clustering with constraints: Feasibility issues and the k-means algorithm. In: Proceedings of the 2005 SIAM International Conference on Data Mining (2005)
  9. Mitchell, T.: Machine Learning. McGraw Hill, New York (1997)
  10. Blake, C.L., Merz, C.J.: UCI Repository of Machine Learning Databases (1998), http://www.ics.uci.edu/~mlearn/MLRepository.html
  11. Rand, W.M.: Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association 66(366) (1971);


 AuthorvolumeDate ValuetitletypejournaltitleUrldoinoteyear
2006 MeasuringConstraintSetUtilityfoSugato Basu
Ian Davidson
Kiri L. Wagstaff
Measuring Constraint-Set Utility for Partitional Clustering Algorithms10.1007/11871637_152006