Approximate Nearest Neighbor Search Task

From GM-RKB

An Approximate Nearest Neighbor Search Task is a Nearest Neighbor Search Task in which the requirement to return the exact nearest neighbor is relaxed — typically to returning a point whose distance to the query is within a (1 + ε) factor of the true nearest-neighbor distance.



References

2009

  • (Wikipedia, 2009) ⇒ http://en.wikipedia.org/wiki/Nearest_neighbor_search#Approximate_nearest_neighbor
    • Approximate nearest neighbor: In some applications it may be acceptable to retrieve a "good guess" of the nearest neighbor. In those cases, we can use an algorithm which doesn't guarantee to return the actual nearest neighbor in every case, in return for improved speed or memory savings. Often such an algorithm will find the nearest neighbor in a majority of cases, but this depends strongly on the dataset being queried.
    • Algorithms that support approximate nearest neighbor search include Best Bin First and the kd-trie.
    • ε-approximate nearest neighbor search is becoming an increasingly popular tool for fighting the curse of dimensionality.
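The speed-for-accuracy trade described above can be illustrated with a minimal sketch: an "approximate" search that examines only a random sample of the data set, against a brute-force exact search. This is purely illustrative (it is not Best Bin First or a kd-trie); all function names and parameters below are hypothetical.

```python
import math
import random

def euclidean(a, b):
    """Euclidean (l2) distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def exact_nn(points, q):
    """Brute force: compare the query point q to every data point."""
    return min(points, key=lambda p: euclidean(p, q))

def approximate_nn(points, q, sample_frac=0.2, seed=0):
    """A 'good guess': examine only a random sample of the data set.
    Faster (fewer distance computations), but with no guarantee of
    returning the actual nearest neighbor."""
    rng = random.Random(seed)
    k = max(1, int(len(points) * sample_frac))
    candidates = rng.sample(points, k)
    return min(candidates, key=lambda p: euclidean(p, q))

rng = random.Random(42)
points = [tuple(rng.uniform(0, 1) for _ in range(10)) for _ in range(1000)]
q = tuple(rng.uniform(0, 1) for _ in range(10))

p_exact = exact_nn(points, q)      # examines all 1000 points
p_approx = approximate_nn(points, q)  # examines only ~200 points
# The exact answer is never farther from q than the approximate one:
print(euclidean(p_exact, q) <= euclidean(p_approx, q))
```

As the quoted passage notes, how often the cheap search still finds the true nearest neighbor depends strongly on the dataset being queried.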

1998

  • (Indyk & Motwani, 1998) ⇒ Piotr Indyk, and Rajeev Motwani. (1998). “Approximate Nearest Neighbors: Towards removing the curse of dimensionality.” In: Proceedings of the thirtieth annual ACM symposium on Theory of computing.
    • The nearest neighbor problem is the following: Given a set of [math]\displaystyle{ n }[/math] points P = {p_1, ..., p_n} in some metric space [math]\displaystyle{ X }[/math], preprocess P so as to efficiently answer queries which require finding the point in [math]\displaystyle{ P }[/math] closest to a query point q ∈ X. We focus on the particularly interesting case of the [math]\displaystyle{ d }[/math]-dimensional Euclidean space where X = R^d under some l_p norm. Despite decades of effort, the current solutions are far from satisfactory; in fact, for large [math]\displaystyle{ d }[/math], in theory or in practice, they provide little improvement over the brute-force algorithm which compares the query point to each data point. Of late, there has been some interest in the approximate nearest neighbors problem, which is: Find a point p ∈ P that is an ε-approximate nearest neighbor of the query [math]\displaystyle{ q }[/math] in that for all p′ ∈ P, d(p, q) ≤ (1 + ε) d(p′, q).
    • The nearest neighbor search (NNS) problem is: Given a set of [math]\displaystyle{ n }[/math] points P = {p_1, ..., p_n} in a metric space [math]\displaystyle{ X }[/math] with distance function [math]\displaystyle{ d }[/math], preprocess [math]\displaystyle{ P }[/math] so as to efficiently answer queries for finding the point in [math]\displaystyle{ P }[/math] closest to a query point q ∈ X. We focus on the particularly interesting case of the [math]\displaystyle{ d }[/math]-dimensional Euclidean space where X = R^d under some l_p norm. The low-dimensional case is well-solved [26], so the main issue is that of dealing with the "curse of dimensionality" [16].
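The ε-approximate condition in Indyk & Motwani's definition can be checked directly: p is an ε-approximate nearest neighbor of q iff d(p, q) ≤ (1 + ε)·d(p′, q) for all p′ ∈ P, which is equivalent to comparing against the exact nearest-neighbor distance. A minimal sketch, assuming the l_2 norm and points as tuples (the function name is hypothetical):

```python
import math

def dist(a, b):
    """Euclidean (l_2) distance in R^d."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def is_eps_approx_nn(p, q, P, eps):
    """True iff d(p, q) <= (1 + eps) * d(p', q) for all p' in P,
    i.e. p is within a (1 + eps) factor of the exact nearest neighbor."""
    d_star = min(dist(pp, q) for pp in P)  # exact nearest-neighbor distance
    return dist(p, q) <= (1 + eps) * d_star

P = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
q = (0.4, 0.0)
# The exact NN is (0.0, 0.0) at distance 0.4; (1.0, 0.0) is at distance 0.6.
print(is_eps_approx_nn((1.0, 0.0), q, P, eps=0.5))  # True:  0.6 <= 1.5 * 0.4
print(is_eps_approx_nn((1.0, 0.0), q, P, eps=0.2))  # False: 0.6 >  1.2 * 0.4
```

With ε = 0, the condition degenerates to the exact nearest neighbor problem; larger ε admits more candidate answers, which is what gives approximate algorithms room to be fast in high dimensions.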