Distance Metric

A distance metric is a symmetric metric that maps two items, [math]\displaystyle{ x,y }[/math], to a non-negative distance value where [math]\displaystyle{ \text{Dist}(x,x)=\text{Dist}(y,y)=0 }[/math] and is also constrained the triangle inequality.

AKA: [math]\displaystyle{ \text{Dist}() }[/math], Relatedness Metric, d.
Context:
- Input: Thing, Thing.
- Output: a Distance Value, such that:
  - (p1) Dist(x,y) ≥ 0. Non-Negative Function (can be derived from p2, p3, p4).
  - (p2) Dist(x,y) = Dist(y,x), a Symmetric Function. (See: Quasi Metric).
  - (p3) Dist(x,y) = 0 iff x = y. Reflexive Function.
  - (p4) Dist(x,y) > 0, If x ≠ y. Positive Function. (See: Pseudo Metric).
  - (p5) Dist(x,y) < Dist(x,z) + Dist(z,y), Triangle Inequality. (See: Super Metric).
- ...
- It can range from being a Discrete Distance Function to being a Continuous Distance Function.
- It can range from being an Ultrametric Distance Function to being a Partial Metric Distance Function.
- ...
- It can define a Metric Space (along with a tuple point set).
- It can be used as a Semantic Similarity Measure.
- It can be produced by a Distance Function Creation System (that solves a distance function creation task).
- It can be a Relational Similarity Function.
- It can be a Learned Distance Function, that adjusts the Dimension Weights.
- ...
Example(s):
- a Number Distance Function(?), such as a Physical Distance Function (or Subtraction Function?).
- a Set Distance Function, such as a Jaccard Distance.
- a Multiset Distance Function.
- a String Distance Function, such as: a Gene Distance Function or Levenshtein Distance Function.
- a Word Distance Function, such as a Lexical Semantic Similarity Function.
- a Group Norm Distance Function.
- a Euclidean Distance Metric.
- a Tuple Distance Function, a Vector Distance Function, such as L2 Norm Euclidean Distance or Cosine Distance Function.
- a Tree Distance Function, a Graphs Similarity Measure, a Graph Nodes Similarity Measure, an Edge Distance Function.
- a Geographical Distance Metric.
- …
Counter-Example(s):
- a Dissimilarity Function.
- A Cardinality Function.
- A Semantic Similarity Measure.
See: Vector Space, Similarity Search, Set Measure Function, Distance-based Clustering, Relative Metric.

References

2010

(Wikipedia, 2010) http://en.wikipedia.org/wiki/Metric_(mathematics)
- In mathematics, a metric or distance function is a function which defines a distance between elements of a set. A set with a metric is called a metric space.
- A metric on a set X is a function (called the distance function or simply distance) d : X × X → R (where R is the set of real numbers).
  - For all x, y, z in X, this function is required to satisfy the following conditions:
  - 1. d(x, y) ≥ 0 (non-negativity)
  - 2. d(x, y) = 0 if and only if x = y (identity of indiscernibles. Note that condition 1 and 2 together produce positive definiteness)
  - 3. d(x, y) = d(y, x) (symmetry)
  - 4. d(x, z) ≤ d(x, y) + d(y, z) (subadditivity / triangle inequality).
- These axioms are not independent: Non-negativity follows from the other axioms.
- A metric is called an ultrametric if it satisfies the following stronger version of the triangle inequality:
  - For all x, y, z in M, d(x, z) ≤ max(d(x, y), d(y, z))
- A metric d on X is called intrinsic if any two points x and y in X can be joined by a curve with length arbitrarily close to d(x, y).

2009

(Rajaraman & Ullman, 2009e) ⇒ Anand Rajaraman, and Jeffrey D. Ullman. (2009). “Applications and Variants of LSH." Stanford Course - CS345A, Winter 2009: Data Mining. 2/2.
(Rajaraman & Ullman, 2009f) ⇒ Anand Rajaraman, and Jeffrey D. Ullman. (2009). “Distance Measures, Generalizations of Minhashing and LSH." Stanford Course - CS345A, Winter 2009: Data Mining. 2/2-2/4.

2008

(Xiang et al., 2008) ⇒ Shiming Xiang, Feiping Nie, and Changshui Zhang. (2008). “Learning a Mahalanobis Distance Metric for Data Clustering and Classification.” In: Pattern Recognition 41.
(De Raedt & Ramon, 2008) ⇒ Luc De Raedt, and Jan Ramon. (2008). “Deriving Distance Metrics from Generality Relations.” In: Pattern Recognition Letters 30(3).

2006

(Zezula et al., 2006) ⇒ Pavel Zezula, Giuseppe Amato, Vlastislav Dohnal, and Michal Batko. (2006). “Similarity Search: The Metric Space Approach." Springer, Advances in Database Systems.

(Deza & Deza, 2006) ⇒ Michel-Marie Deza, and Elena Deza. (2006). “Dictionary of Distances." Elsevier Science. ISBN:9780080465548
- QUOTE: The concept of distance is basic to human experience. In everyday life it usually means some degree of closeness of two physical objects or ideas, i.e., length, time interval, gap, rank difference, coolness or remoteness, while the term metric is often used as a standard for a measurement. But here we consider, except for the last two chapters, the mathematical meaning of those terms, which is an abstraction of measurement.
  The mathematical notions of distance metric (i.e., a function d (x, y) from X × X to the set of real numbers satisfying to d (x, y) = 0 with equality only for x = y, d (x, y) = d (y, x), and d (x, y) = d (x, z) +d (z, y)) and of metric space (X, d) were originated a century ago by M. Fréchet (1906) and F. Hausdor (1914) as a special case of an infinite topological space. The triangle inequality above appears already in Euclid. The infinite metric spaces are usually seen as a generalization of the metric|x - y|on the real numbers. Their main classes are the measurable spaces (add measure) and Banach spaces (add norm and completeness).

2004

http://www.nature.com/nrg/journal/v5/n11/glossary/nrg1469_glossary.html
- A measure of similarity or dissimilarity that can be used to organize groups according to their degree of relation to one another. …

2003

(Xing, 2003) ⇒ Eric P. Xing, Andrew Y. Ng , Michael I. Jordan and Stuart J. Russell. (2003). “Distance Metric Learning, with Application to Clustering with Side-Information.” In: Advances in Neural Information Processing Systems 15.

2001

(Ramon & Bruynooghe, 2001) ⇒ Jan Ramon, and M. Bruynooghe. (2001). “A polynomial time computable metric between point sets.” In: Acta Inform. 37, 765–780.

1998

(Bunke & Shearer, 1998) ⇒ Horst Bunke, and Kim Shearer. (1998). “A graph distance metric based on the maximal common subgraph.” In: Pattern Recognition Lett. 19, 255–259.
(Ramon et al., 1998) ⇒ Jan Ramon, M. Bruynooghe, and W. Van Laer. (1998). “Distance Measures Between Atoms.” In: ProceedingsCompulogNet Area Meeting on ’Computational Logic and Machine Learning’, pp. 35–41.

1997

(Mitchell, 1997) ⇒ Tom M. Mitchell. (1997). “Machine Learning.” McGraw-Hill.

1979

(Tai, 1979) ⇒ K. Tai. (1979). “Tree to Tree Correction Problem.” In: ACM 26 (3), 422–433.