Definition: A probabilistic algorithm to quickly find points in a high dimensional space near a query point. Preprocessing: put every point in multiple hash tables. Each table has its own locality-sensitive hash function and uses buckets (or chaining) since many collisions are expected. The hash functions come from a family of functions. Finding: look up the query point in each hash table, and compute the distance from the query point of every point in the bucket.
Generalization (I am a kind of ...)
point access method.
Aggregate child (... is a part of or used in me.)
hash function, bucket.
See also Bloom filter.
Note: This algorithm is for high dimensional spaces. For instance, a 1000 x 1000 image can be characterized by a vector in 1,000,000-dimensional space, one dimension for each pixel.
To understand the algorithm, let the "hash functions" be random sets of the coordinates. If many coordinates of a point match the query point, it is likely to be close to the query point, at least closer than those that don't match.
Alexandr Andoni's Locality-Sensitive Hashing home page. Locality sensitive hashing in Wikipedia.
Alexandr Andoni and Piotr Indyk, Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions, CACM, 51(1):117-122, January 2008.
If you have suggestions, corrections, or comments, please get in touch with Paul E. Black.
Entry modified 16 May 2008.
HTML page formatted Tue Dec 6 16:16:32 2011.
Cite this as:
Paul E. Black, "locality-sensitive hashing", in Dictionary of Algorithms and Data Structures [online], Paul E. Black, ed., U.S. National Institute of Standards and Technology. 16 May 2008. (accessed TODAY) Available from: http://www.nist.gov/dads/HTML/localitySensitiveHashing.html