Beyond Locality Sensitive Hashing
Alex Andoni (Microsoft Research)
Joint with: Piotr Indyk (MIT), Huy L. Nguyen (Princeton), Ilya Razenshteyn (MIT)
Nearest Neighbor Search (NNS)
• Preprocess: a set P of points
• Query: given a query point q, report a point p in P with the smallest distance to q
[Figure: query point q and its nearest neighbor p]
Motivation
• Generic setup:
  • Points model objects (e.g. images)
  • Distance models a (dis)similarity measure
• Application areas:
  • machine learning: the k-NN rule
  • image/video/music recognition, deduplication, bioinformatics, etc.
• Distance can be: Hamming, Euclidean, …
• Primitive for other problems: finding similar pairs, clustering, …
• Example: two near points as bit strings, compared in Hamming distance:
  p = 000000011100010100000100010100011111
  q = 000000001100000100000100110100111111
Approximate NNS
• c-approximate r-near neighbor: given a query point q, report a point p with dist(p,q) ≤ cr if there exists a point at distance ≤ r
• Randomized: such a point is returned with 90% probability
• In practice: query time is small as long as there are not too many approximate near neighbors
[Figure: query q with a near neighbor p inside radius r, and the approximation radius cr]
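To make the guarantee concrete, here is a brute-force reference for this specification (a sketch assuming Euclidean distance and NumPy arrays; the name `near_neighbor` is illustrative, and this is the problem statement, not the talk's data structure):

```python
import numpy as np

def near_neighbor(points, q, r, c):
    """Brute-force reference for the c-approximate r-near neighbor problem:
    if some data point lies within distance r of q, we must return a point
    within distance c*r; otherwise any answer (including None) is valid."""
    dists = np.linalg.norm(points - q, axis=1)
    j = int(np.argmin(dists))
    if dists[j] <= c * r:
        return points[j]  # covers the required case dists[j] <= r as well
    return None  # no point lies within r, so "no answer" is acceptable
```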
Locality-Sensitive Hashing [Indyk-Motwani'98]
• Random hash function g on R^d s.t. for any points p, q:
  • Close: if dist(p,q) ≤ r, then Pr[g(p) = g(q)] ≥ P1, which is "high"
  • Far: if dist(p,q) > cr, then Pr[g(p) = g(q)] ≤ P2, which is "small" but "not-so-small"
• Use several hash tables: n^ρ of them, where ρ is defined by P1 = P2^ρ, i.e. ρ = log(1/P1) / log(1/P2)
[Figure: collision probability Pr[g(p) = g(q)] vs. dist(p,q): starts at 1, is at least P1 at distance r, and at most P2 at distance cr]
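For concreteness, a small sketch of the standard parameter setting this definition implies (textbook LSH analysis; the function name and example numbers are illustrative):

```python
import math

def lsh_params(n, p1, p2):
    """Standard LSH parameter setting: concatenate k primitive hashes so a
    far point collides in one table with probability ~1/n, and use L = n^rho
    tables so a near point is found with constant probability."""
    rho = math.log(1 / p1) / math.log(1 / p2)
    k = math.ceil(math.log(n) / math.log(1 / p2))
    L = math.ceil(n ** rho)
    return rho, k, L

# e.g. bit-sampling in Hamming space with r/d = 0.1, c = 2:
# P1 = 1 - 0.1 = 0.9, P2 = 1 - 0.2 = 0.8
print(lsh_params(10**6, 0.9, 0.8))  # rho ~ 0.47, k = 62, L ~ 682
```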
Locality-sensitive hash functions [Indyk-Motwani'98]
• Hash function g is actually a concatenation of "primitive" functions: g(p) = (h1(p), …, hk(p))
• LSH in Hamming space:
  • h(p) = p_j, i.e., choose the j-th bit for a random coordinate j
  • g = projection on a few random bits
[Figure: the same collision-probability plot, with Ham(p,q) on the x-axis]
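As a concrete illustration, here is a minimal sketch of this bit-sampling scheme (class name and structure are mine; k and L would be chosen as in the parameter sketch above):

```python
import random
from collections import defaultdict

class BitSamplingLSH:
    """Bit-sampling LSH for Hamming space: each table hashes a bit vector to
    the concatenation of k randomly chosen coordinates. Points at Hamming
    distance r collide in one table with probability (1 - r/d)^k."""

    def __init__(self, d, k, L, seed=0):
        rng = random.Random(seed)
        # For each of the L tables, fix k random coordinates.
        self.coords = [[rng.randrange(d) for _ in range(k)] for _ in range(L)]
        self.tables = [defaultdict(list) for _ in range(L)]

    def _key(self, t, p):
        return tuple(p[j] for j in self.coords[t])

    def insert(self, p):
        for t in range(len(self.tables)):
            self.tables[t][self._key(t, p)].append(p)

    def query(self, q):
        # Collect points colliding with q in at least one table
        # (may contain duplicates across tables).
        out = []
        for t in range(len(self.tables)):
            out.extend(self.tables[t].get(self._key(t, q), []))
        return out
```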
Algorithms and Lower Bounds

ℓ1 (Hamming):
  Space        Time    Comment                                     Reference
  n^(1+ρ)      n^ρ     ρ = 1/c                                     [IM'98]
  —            —       lower bound: ρ ≥ 1/c − o(1)                 [MNP'06], [OWZ'11]
  near-linear  —       lower bound on memory lookups               [PTW'08, PTW'10]

ℓ2 (Euclidean):
  Space        Time    Comment                                     Reference
  n^(1+ρ)      n^ρ     ρ = 1/c²                                    [IM'98], [DIIM'04, AI'06]
  —            —       lower bound: ρ ≥ 1/c² − o(1)                [MNP'06], [OWZ'11]
  near-linear  —       lower bound on memory lookups               [PTW'08, PTW'10]
Main Result
• NNS in Hamming space (ℓ1) with query time n^(ρ+o(1)), space and preprocessing n^(1+ρ+o(1)), for ρ = 7/(8c) + O(1/c^(3/2)) + o(1)
  • cf. ρ = 1/c of [IM'98]; e.g., for c = 2 the exponent drops from 0.5 to ≈ 0.44
• NNS in Euclidean space (ℓ2) with ρ = 7/(8c²) + O(1/c³) + o(1)
  • cf. ρ = 1/c² of [AI'06]
A look at LSH lower bounds
• LSH lower bounds in Hamming space:
  • [Motwani-Naor-Panigrahy'06]: Fourier analytic
  • [O'Donnell-Wu-Zhou'11]: holds for any distribution over hash functions
    • Far pair: random points at distance εd
    • Close pair: random points at distance εd/c
    • Get ρ ≥ 1/c
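A loose rendering of this hard distribution as code (my sketch; the function name and exact rounding are illustrative):

```python
import random

def sample_hard_pair(d, eps, c, close, rng=random.Random(0)):
    """Loose rendering of the [OWZ'11]-style hard distribution: x is a
    uniformly random bit string; y flips eps*d/c random coordinates for a
    close pair, or eps*d random coordinates for a far pair."""
    x = [rng.randrange(2) for _ in range(d)]
    k = round(eps * d / c) if close else round(eps * d)
    y = x[:]
    for j in rng.sample(range(d), k):
        y[j] ^= 1  # flip this coordinate
    return x, y
```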
Why not an NNS lower bound?
• Suppose we try to generalize [OWZ'11] to NNS
• Pick a random dataset
• All the "far points" are concentrated in a small ball
• Easy to see at preprocessing: the actual near neighbor is close to the center of the minimum enclosing ball
Our algorithm: intuition
• Data-dependent LSH: hashing that depends on the entire given dataset!
• Two components:
  • a "nice" geometric configuration with a good exponent ρ
  • a reduction from the general case to this "nice" geometric configuration
Nice Configuration: "sparsity"
• All points are on a sphere of radius R
• Random pairs of points are at distance ≈ √2·R
• Lemma 1: spherical LSH achieves a better exponent ρ on this configuration
  • "Proof": obtained via "cap carving", similar to "ball carving" [KMS'98, AI'06]
• Lemma 1': a similar bound holds for other sphere radii
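A rough sketch of what a single cap-carving hash could look like (my illustration, not the paper's exact scheme; points are assumed unit-norm, and the threshold would be on the order of 1/√d):

```python
import numpy as np

def cap_carving_hash(points, n_caps, threshold, seed=0):
    """Sketch of "cap carving": sample random unit directions u_1, u_2, ...
    and hash each unit-norm point to the index of the first spherical cap
    {x : <x, u_t> >= threshold} containing it. Close points on the sphere
    tend to land in the same cap; far points tend not to."""
    rng = np.random.default_rng(seed)
    d = points.shape[1]
    hashes = np.full(len(points), -1)
    for t in range(n_caps):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)
        hashes[(hashes == -1) & (points @ u >= threshold)] = t
    return hashes  # -1 = not captured; rehash or use a catch-all bucket
```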
Reduction: into spherical LSH
• Idea: first apply a few rounds of "regular" LSH
  • ball carving [AI'06]
• Intuitively:
  • far points are unlikely to collide
  • the rounds partition the data into buckets of small diameter
  • find the minimum enclosing ball of each bucket
  • finally, apply spherical LSH on this ball!
Two-level algorithm
• n^ρ hash tables, each with hash function g(p) = (h1(p), …, hl(p), s1(p), …, sm(p))
  • the h's are "ball carving LSH" (data independent)
  • the s's are "spherical LSH" (data dependent)
• The final ρ is an "average" of the ρ's from levels 1 and 2
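Schematically, the two-level key could look as follows (a simplification: in the real algorithm the spherical hashes are built per level-1 bucket, after recentering on that bucket's enclosing ball):

```python
def two_level_key(p, ball_hashes, cap_hashes):
    """Schematic two-level LSH key: concatenate l data-independent
    "ball carving" hashes with m data-dependent "spherical" hashes.
    Points sharing the full key land in the same bucket."""
    level1 = tuple(h(p) for h in ball_hashes)  # h_1(p), ..., h_l(p)
    level2 = tuple(s(p) for s in cap_hashes)   # s_1(p), ..., s_m(p)
    return level1 + level2
```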
Details
• Inside a bucket, need to ensure the "sparse" case:
  • 1) drop all "far pairs"
  • 2) find the minimum enclosing ball (MEB)
    • use Jung's theorem: diameter Δ implies MEB radius ≤ Δ/√2
  • 3) partition by "sparsity" (distance from the center) into shells
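A minimal sketch of steps 2–3 (approximating the MEB center by the centroid for illustration; a real implementation would compute the MEB):

```python
import numpy as np

def partition_into_shells(points, shell_width):
    """Sketch of the shell partitioning: approximate the enclosing-ball
    center by the centroid, then group points into spherical shells by
    their distance from that center."""
    center = points.mean(axis=0)
    dists = np.linalg.norm(points - center, axis=1)
    shell_ids = np.floor(dists / shell_width).astype(int)
    shells = {}
    for i, s in enumerate(shell_ids):
        shells.setdefault(int(s), []).append(i)
    return center, shells  # each shell is then handled by spherical LSH
```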
Practice
• Practice uses data-dependent partitions!
  • "wherever theoreticians suggest using random dimensionality reduction, use PCA"
• Lots of variants
  • trees: kd-trees, quad-trees, ball-trees, rp-trees, PCA-trees, sp-trees, …
  • no guarantees: e.g., they are deterministic
• Is there a better way to do partitions in practice?
• Why do PCA-trees work?
  • [Abdullah-A.-Kannan-Krauthgamer]: if the data has more structure
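For intuition, one PCA-tree node split might look like this (the generic idea, not any specific paper's variant):

```python
import numpy as np

def pca_split(points):
    """One PCA-tree node split: project the points onto their top principal
    component and split at the median projection. An rp-tree would use a
    random direction here instead of the top eigenvector."""
    centered = points - points.mean(axis=0)
    # Top principal component = top right singular vector of the data matrix.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    proj = centered @ vt[0]
    median = np.median(proj)
    return points[proj <= median], points[proj > median]
```

Note the split is deterministic given the data, which is exactly why the LSH-style analysis does not directly apply to such trees.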