2007 SimilaritySearchTheMetricSpaceApproach

From GM-RKB
Jump to navigation Jump to search

Subject Headings: Similarity Search Task, Similarity Search Algorithm.

Notes

Cited By

Quotes

Tutorial Overview

The objective of this tutorial is to explain concepts of the powerful metric space approach to similarity searching, demonstrate its capabilities through case studies, and outline its future perspectives through distributed architectures that can achieve constant scalability by exploiting parallelism.

Abstract

Similarity searching has become afundamental computational task in a variety of application areas, including multimedia information retrieval, data mining, pattern recognition, machine learning, computer vision, biomedical databases, data compression and statistical data analysis. In such environments, an exact match has little meaning, and proximity/distance (similarity/dissimilarity) concepts are typically much more fruitful for searching.

The notion of similarity has long been studied extensively in the field of social psychology, nicely characterized by the following quotation:

An ability to assess similarity lies close to the core of cognition. The sense of sameness is the very keel and backbone of our thinking. An understanding of problem solving, categorization, memory retrieval,inductive reasoning, and other cognitive processes require that we understand how humans assess similarity.” --Robert L. Goldstone, Similarity, 2001

With the increasing diversity of digital data types covering practically all forms of fact representation, computerized data processing must respect these natural principles and provide adequate tools for similarity searching.

In this tutorial, we review the state of the art in developing similarity search mechanisms that accept the metric space paradigm. We explain the high extensibility of the metric space approach and demonstrate its capability with examples of distance functions. After a survey of specialized partitioning and pruning concepts, we introduce the main indexing representatives and provide performance comparison. The efforts to further speed up retrieval are demonstrated by a class of approximated techniques and the very recent proposals of scalable and distributed structures based on the P2P communication paradigm.

References


,

 AuthorvolumeDate ValuetitletypejournaltitleUrldoinoteyear
2007 SimilaritySearchTheMetricSpaceApproachPavel Zezula
Giuseppe Amato
Vlastislav Dohnal
Similarity Search - The Metric Space Approachhttp://www.nmis.isti.cnr.it/amato/similarity-search-book/SAC-07-tutorial.pdf2007