2008 DerivingDistMetricsFromGenRelations

From GM-RKB
Jump to: navigation, search

Subject Headings: Distance Function, General-Specific Ordering.

Notes

Quotes

Abstract

Many pattern recognition and machine learning approaches employ a distance metric on patterns, or a generality relation to partially order the patterns. We investigate the relationship amongst them and prove a theorem that shows how a distance metric can be derived from a partial order (and a corresponding size on patterns) under mild conditions. We then discuss the use of the theorem. More specifically, we show how well-known distance metrics for sets, strings, trees and graphs can be derived from their generality relation.

1. Introduction

Over the past decade, there has been a growing attention in the pattern recognition, machine learning and data mining literature to dealing with structured data in the form of sets, strings, trees or graphs. Many of these techniques employ a distance metric, which measures how similar two instances are, or a generality order, which partially orders the search space of possible patterns. Distance metrics form the basis for many well-known algorithms, such as k-nearest neighbor and k-means clustering, whereas the generality order is used to intelligently enumerate and traverse the space of possible patterns in local pattern mining and concept-learning (Mitchell, 1997).

This paper points out a general relationship between the generality ordering and distance metrics. Specifically, we focus on partially ordered pattern spaces that satisfy two conditions: (1) there is a monotonic size measure on patterns jsj, that is, a size that monotonically decreases as patterns become more general, and (2) they satisfy the diamond equation, which imposes restrictions on the size of the minimally general generalizations and maximally general specializations. Our theorem extends earlier results by Ramon et al. (1998) obtained in the context of distance metrics for atoms in logic. The earlier results was restricted to complete lattices, and hence assumed that minimally general generalizations and maximally general specializations always exist and are unique.

We also investigate the applicability of the theorem by showing how it can be used to reconstruct well-known distance metrics for sets (Ramon and Bruynooghe, 2001), strings (Wagner and Fischer, 1974), trees (Tai, 1979) and graphs (Bunke and Shearer, 1998) from their generality relations, and drawing some conclusions from this.

References

  • Horst Bunke, and Kim Shearer. (1998). “A graph distance metric based on the maximal common subgraph.” In: Pattern Recognition Lett. 19, 255–259.
  • Tom M. Mitchell. (1997). “Machine Learning.” McGraw-Hill.
  • Jan Ramon, M. Bruynooghe. (2001). “A polynomial time computable metric between point sets.” In: Acta Inform. 37, 765–780.
  • Jan Ramon, M. Bruynooghe, W. Van Laer. (1998). “Distance measures between atoms.” In: ProceedingsCompulogNet Area Meeting on ’Computational Logic and Machine Learning’, pp. 35–41.
  • K. Tai. (1979). “Tree to tree correction problem.” In: ACM 26 (3), 422–433.
  • R.A. Wagner, and M.J. Fischer. (1974). “The string to string correction problem. Journal of the ACM 21 (1), 168–173. January.

,

 AuthorvolumeDate ValuetitletypejournaltitleUrldoinoteyear
2008 DerivingDistMetricsFromGenRelationsLuc De Raedt
Jan Ramon
Deriving Distance Metrics from Generality RelationsPattern Recognition Lettershttp://dx.doi.org/10.1016/j.patrec.2008.09.00710.1016/j.patrec.2008.09.0072008