- (Xing et al., 2003) ⇒ Eric P. Xing, Andrew Y. Ng, Michael I. Jordan, and Stuart Russell. (2003). “Distance Metric Learning with Application to Clustering with Side-information.” In: Advances in Neural Information Processing Systems.
Subject Headings: Distance-Metric Learning Algorithm.
- (Basu et al., 2004) ⇒ Sugato Basu, Mikhail Bilenko, and Raymond Mooney. (2004). “A Probabilistic Framework for Semi-Supervised Clustering.” In: Proceedings of the tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2004).
Many algorithms rely critically on being given a good metric over their inputs. For instance, data can often be clustered in many "plausible" ways, and if a clustering algorithm such as K-means initially fails to find one that is meaningful to a user, the only recourse may be for the user to manually tweak the metric until sufficiently good clusters are found. For these and other applications requiring good metrics, it is desirable that we provide a more systematic way for users to indicate what they consider “similar." For instance, we may ask them to provide examples. In this paper, we present an algorithm that, given examples of similar (and, if desired, dissimilar) pairs of points in [math]R^n[/math], learns a distance metric over [math]R^n[/math] that respects these relationships. Our method is based on posing metric learning as a convex optimization problem, which allows us to give efficient, local-optima-free algorithms. We also demonstrate empirically that the learned metrics can be used to significantly improve clustering performance.
We have presented an algorithm that, given examples of similar pairs of points in [math]R^n[/math], learns a distance metric that respects these relationships. Our method is based on posing metric learning as a convex optimization problem, which allowed us to derive efficient, local-optima free algorithms. We also showed examples of diagonal and full metrics learned from simple artificial examples, and demonstrated on artificial and on UCI datasets how our methods can be used to improve clustering performance.
- C. Atkeson, A. Moore, and S. Schaal. Locally weighted learning. AI Review, 1996.
- T. Cox and M. Cox. Multidimensional Scaling. Chapman & Hall, London, 1994.
- C. Domeniconi and D. Gunopulos. Adaptive nearest neighbor classification using support vector machines. In Advances in Neural Information Processing Systems 14. MIT Press, 2002.
- G. H. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins Univ. Press, 1996.
- Trevor Hastie and Robert Tibshirani. Discriminant adaptive nearest neighbor classification. IEEE Transactions on Pattern Analysis and Machine Learning, 18:607–616, 1996.
- T.S. Jaakkola and D. Haussler. Exploiting generative models in discriminaive classifier. In: Proceedings of Tenth Conference on Advances in Neural Information Processing Systems, 1999.
- I.T. Jolliffe. Principal Component Analysis. Springer-Verlag, New York, 1989.
- R. Rockafellar. Convex Analysis. Princeton Univ. Press, 1970.
- S.T. Roweis and L.K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science 290: 2323-2326.
- Bernhard Schölkopf and Alexander J. Smola. Learning with Kernels. In Press, 2001.
- N. Tishby, Fernando Pereira, and W. Bialek. The information bottleneck method. In: Proceedings of the 37th Allerton Conference on Communication, Control and Computing, 1999.
- K.Wagstaff, C. Cardie, S. Rogers, and S. Schroedl. Constrained k-means clustering with background knowledge. In: Proceedings of 18th International Conference on Machine Learning, 2001.
|2003 DistanceMetricLearningwithAppli||Eric P. Xing|
Andrew Y. Ng
Michael I. Jordan
Stuart J. Russell
|Distance Metric Learning with Application to Clustering with Side-information||2003|