Locality-Sensitive Hashing (LSH) Algorithm

From GM-RKB
Jump to navigation Jump to search

A Locality-Sensitive Hashing (LSH) Algorithm is an approximate similarity search algorithm that uses multiple "bucket" hash functions to guarantee that there is a high probability of collision for points which are close to each other and low collision probability for dissimilar points.



References

2020

  • (Wikipedia, 2020) ⇒ https://en.wikipedia.org/wiki/Locality-sensitive_hashing Retrieved:2020-5-5.
    • In computer science, locality-sensitive hashing (LSH) is an algorithmic technique that hashes similar input items into the same "buckets" with high probability. (The number of buckets are much smaller than the universe of possible input items.) Since similar items end up in the same buckets, this technique can be used for data clustering and nearest neighbor search. It differs from conventional hashing techniques in that hash collisions are maximized, not minimized. Alternatively, the technique can be seen as a way to reduce the dimensionality of high-dimensional data; high-dimensional input items can be reduced to low-dimensional versions while preserving relative distances between items.

      Hashing-based approximate nearest neighbor search algorithms generally use one of two main categories of hashing methods: either data-independent methods, such as locality-sensitive hashing (LSH); or data-dependent methods, such as Locality-preserving hashing (LPH).

2017

2016

2015

LSH family for Euclidean distance is accomplished by random Gaussian projection. There are also various versions of LSH family for Jaccard similarity, cosine distance, Hamming distance etc.

 LSH usually applies a concatenation of m hashing functions randomly selected from the hashing family to build the hash table(reducing check-rate), and builds l independent hash tables and takes the union of results (raising recall). m and l are parameters of LSH and the best value is related to the data size.

2014

2012

2004

1999