
Summer School on Hashing’14

Locality Sensitive Hashing

Alex Andoni (Microsoft Research)

Nearest Neighbor Search (NNS)

• Preprocess: a set of points P
• Query: given a query point q, report a point p ∈ P with the smallest distance to q

Approximate NNS

• r-near neighbor: given a new point q, report a point p ∈ P s.t. ‖p − q‖ ≤ r
• Randomized: a point returned with 90% probability
• c-approximate: report a point p with ‖p − q‖ ≤ cr if there exists a point at distance ≤ r

[Figure: query q, a near neighbor p within radius r, and the approximation radius cr]

Heuristic for Exact NNS

• r-near neighbor: given a new point q, report a set L ⊆ P with
  • all points p s.t. ‖p − q‖ ≤ r (each with 90% probability)
  • may contain some approximate neighbors p s.t. ‖p − q‖ ≤ cr
• Can filter out bad answers

Locality-Sensitive Hashing
[Indyk-Motwani'98]

• Random hash function g on R^d s.t. for any points p, q:
  • Close: when dist(p, q) ≤ r, Pr[g(p) = g(q)] is "high": ≥ P₁ ("not-so-small")
  • Far: when dist(p, q) > cr, Pr[g(p) = g(q)] is "small": ≤ P₂
• Use several hash tables: n^ρ of them, where ρ = log(1/P₁) / log(1/P₂)

[Plot: collision probability Pr[g(p) = g(q)] as a function of dist(p, q); equal to 1 at distance 0, dropping to P₁ at r and P₂ at cr]

Locality sensitive hash functions

• Hash function g is usually a concatenation of "primitive" functions: g(p) = ⟨h₁(p), …, h_k(p)⟩
• Example: Hamming space {0,1}^d
  • h(p) = p_j, i.e., choose the j-th bit for a random j
  • g(p) chooses k bits at random
  • Pr[h(p) = h(q)] = 1 − Ham(p, q)/d
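To make the Hamming example concrete, here is a minimal sketch of bit-sampling LSH in Python; the function name, seed, and bit-string representation are my own illustrative choices, not code from the talk.

```python
import random

def make_g(d, k, seed=None):
    """g = <h_1, ..., h_k>: each primitive h_j reads one random bit position."""
    rng = random.Random(seed)
    coords = [rng.randrange(d) for _ in range(k)]  # k random bit positions
    def g(p):  # p: a string (or sequence) of d bits
        return tuple(p[j] for j in coords)
    return g

p, q = "10110100", "10110110"  # Hamming distance 1
g = make_g(d=8, k=3, seed=1)
print(g(p), g(q))  # a single bit h collides with prob. 1 - Ham(p, q)/d = 7/8
```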

Full algorithm

• Data structure is just L hash tables:
  • each hash table uses a fresh random function g_i(p) = ⟨h_{i,1}(p), …, h_{i,k}(p)⟩
  • hash all dataset points into the table
• Query:
  • check for collisions in each of the L hash tables
  • until we encounter a point within distance cr
• Guarantees:
  • Space: O(nL), plus the space to store the points
  • Query time: O(L · (k + d)) (in expectation)
  • 50% probability of success
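A runnable sketch of the whole scheme for Hamming space; the class name, bit-string representation, and toy parameters below are illustrative assumptions, not from the talk.

```python
import random
from collections import defaultdict

class LSHIndex:
    """L hash tables; table i keys each point p by g_i(p) = <h_i1(p), ..., h_ik(p)>."""

    def __init__(self, points, d, k, L, seed=0):
        rng = random.Random(seed)
        # Fresh random function per table: k random bit positions (bit sampling).
        self.coords = [[rng.randrange(d) for _ in range(k)] for _ in range(L)]
        self.tables = [defaultdict(list) for _ in range(L)]
        self.points = points
        for idx, p in enumerate(points):
            for table, cs in zip(self.tables, self.coords):
                table[tuple(p[j] for j in cs)].append(idx)

    def query(self, q, cr):
        """Check collisions table by table; stop at the first point within cr."""
        for table, cs in zip(self.tables, self.coords):
            for idx in table.get(tuple(q[j] for j in cs), []):
                p = self.points[idx]
                if sum(a != b for a, b in zip(p, q)) <= cr:
                    return p
        return None  # may miss: each table succeeds only with probability P1^k

# Hypothetical toy usage on 8-bit strings:
index = LSHIndex(["10110100", "01100011", "11111111"], d=8, k=3, L=10)
print(index.query("10110110", cr=2))  # likely returns "10110100"
```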

Analysis of LSH Scheme

• Choice of parameters k, L?
  • L hash tables with g(p) = ⟨h₁(p), …, h_k(p)⟩
  • set k s.t. Pr[collision of far pair] = P₂^k = 1/n
  • then Pr[collision of close pair] = P₁^k = n^{−ρ}
  • Hence L = n^ρ "repetitions" (tables) suffice!

[Plot: collision probability vs. distance for k = 1 and k = 2; concatenation sharpens the drop, mapping P₁, P₂ at distances r, cr to P₁², P₂²]
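Here is a quick numeric check of this parameter setting for bit-sampling LSH on the Hamming cube; the concrete values of n, d, r, c are made up for illustration.

```python
import math

n, d, r, c = 10**6, 128, 8, 2
P1 = 1 - r / d          # collision prob. of one bit for a close pair
P2 = 1 - c * r / d      # ... for a far pair
rho = math.log(1 / P1) / math.log(1 / P2)

k = math.ceil(math.log(n) / math.log(1 / P2))  # makes P2**k <= 1/n
L = math.ceil(P1 ** -k)                        # approx. n**rho tables

print(f"rho = {rho:.3f}, k = {k}, L = {L}")
print("E[far collisions per table] =", n * P2**k)      # <= 1
print("Pr[miss near neighbor] <=", (1 - P1**k) ** L)   # <= 1/e ~ 0.37
```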

Analysis: Correctness

• Let p* be an r-near neighbor of q
  • if p* does not exist, the algorithm can output anything
• Algorithm fails when: the near neighbor p* is not in the searched buckets g₁(q), …, g_L(q)
• Probability of failure:
  • probability q, p* do not collide in one hash table: ≤ 1 − P₁^k
  • probability they do not collide in L hash tables: at most (1 − P₁^k)^L = (1 − 1/L)^L ≤ 1/e

Analysis: Runtime

• Runtime dominated by:
  • hash function evaluation: O(L · k) time
  • distance computations to points in buckets
• Distance computations:
  • care only about far points, at distance > cr
  • in one hash table, the probability a far point collides with q is at most P₂^k = 1/n
  • expected number of far points in a bucket: n · (1/n) = 1
  • over L hash tables, the expected number of far points is L
• Total: O(L · k + L · d) = O(n^ρ · (k + d)) in expectation

NNS for Euclidean space
[Datar-Immorlica-Indyk-Mirrokni'04]

• Hash function g is a concatenation of "primitive" functions: g(p) = ⟨h₁(p), …, h_k(p)⟩
• LSH function h:
  • pick a random line ℓ, and quantize: h(p) = ⌊(p · ℓ)/w + b⌋
  • project point p onto ℓ
  • ℓ is a random Gaussian vector
  • b is random in [0, 1]
  • w is a parameter (e.g., 4)
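A minimal sketch of this primitive in Python with NumPy; the form h(p) = ⌊p·ℓ/w + b⌋ with b uniform in [0, 1] follows the slide, and the parameter values are illustrative.

```python
import numpy as np

def make_euclidean_h(d, w=4.0, rng=None):
    """One primitive: project onto a random Gaussian line, shift, quantize."""
    if rng is None:
        rng = np.random.default_rng()
    ell = rng.standard_normal(d)   # random Gaussian direction
    b = rng.uniform(0.0, 1.0)      # random shift of the grid
    return lambda p: int(np.floor(p @ ell / w + b))

def make_euclidean_g(d, k, w=4.0, seed=0):
    """g = <h_1, ..., h_k>: concatenation of k primitives."""
    rng = np.random.default_rng(seed)
    hs = [make_euclidean_h(d, w, rng) for _ in range(k)]
    return lambda p: tuple(h(p) for h in hs)

g = make_euclidean_g(d=64, k=10)
p = np.random.default_rng(1).standard_normal(64)
print(g(p))         # bucket key: a tuple of 10 small integers
print(g(p + 0.01))  # a small perturbation usually keeps the same key
```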

Optimal* LSH
[A-Indyk'06]

• Regular grid → grid of balls
  • p can hit empty space, so take more such grids until p is in a ball
• Need (too) many grids of balls
  • start by projecting to dimension t: p → p' ∈ R^t
• Analysis gives ρ = 1/c² + o(1)
• Choice of reduced dimension t?
  • tradeoff between
    • # hash tables, n^ρ, and
    • time to hash, t^{O(t)}
• Total query time: d · n^{1/c² + o(1)}

[Figure: a 2D grid of balls; point p projected to R^t]

Proof idea

• Claim: ρ = log P(r) / log P(cr) ≈ 1/c²
  • P(r) = probability of collision when ‖p − q‖ = r
• Intuitive proof:
  • projection approximately preserves distances [JL]
  • P(r) = intersection / union (of the two balls)
  • P(r) ≈ probability a random point u falls beyond the dashed line
  • Fact (high dimensions): the x-coordinate of u has a nearly Gaussian distribution
    → P(r) ≈ exp(−A·r²)

[Figure: two overlapping balls around p and q at distance r; u is a random point, the dashed line marks the boundary of the intersection]
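The claim then follows from a one-line computation; a sketch in LaTeX, with A the constant from the Gaussian fact above:

```latex
\[
\rho \;=\; \frac{\log P(r)}{\log P(cr)}
\;\approx\; \frac{\log e^{-A r^2}}{\log e^{-A (cr)^2}}
\;=\; \frac{-A r^2}{-A c^2 r^2}
\;=\; \frac{1}{c^2}.
\]
```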

 

LSH Zoo

• Hamming distance
  • h: pick a random coordinate(s) [IM98]
• Manhattan distance:
  • h: random grid [AI'06]
• Jaccard distance between sets A, B: J(A, B) = 1 − |A ∩ B| / |A ∪ B|
  • h: pick a random permutation π on the universe; h(A) = min_{a ∈ A} π(a)
  • min-wise hashing [Bro'97, BGMZ'97]: Pr[h(A) = h(B)] = |A ∩ B| / |A ∪ B|
• Angle: sim-hash [Cha'02, …]

[Example from the slide: "To be or not to be" and "To Simons or not to Simons" become the sets {be, not, or, to} and {not, or, to, Simons} over the universe {be, to, Simons, or, not}; a random permutation of the universe determines each set's min-hash value]
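A small sketch of min-wise hashing on the slide's example; the word-set construction and the seeds are illustrative.

```python
import random

def minhash(universe, seed):
    """One min-wise hash: a random permutation of the universe,
    mapping a set A to min over a in A of pi(a)."""
    pi = list(universe)
    random.Random(seed).shuffle(pi)
    rank = {x: i for i, x in enumerate(pi)}
    return lambda A: min(rank[a] for a in A)

A = {"be", "not", "or", "to"}      # "To be or not to be"
B = {"not", "or", "to", "Simons"}  # "To Simons or not to Simons"
universe = sorted(A | B)
jaccard = len(A & B) / len(A | B)  # = 3/5

# Pr[h(A) == h(B)] over a random permutation equals the Jaccard similarity:
hits = sum(minhash(universe, s)(A) == minhash(universe, s)(B)
           for s in range(10_000))
print(hits / 10_000, "~", jaccard)
```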

LSH in the wild

• If we want exact NNS, what is c?
• Can choose any parameters k, L ("safety not guaranteed"):
  • larger k: fewer false positives
  • smaller L: fewer tables
• Correct as long as the near neighbor collides in at least one table
• Performance:
  • trade-off between # of tables and false positives
  • will depend on dataset "quality"
  • can tune k, L to optimize for a given dataset

Time-Space Trade-offs

  Space    Time     Comment        Reference
  high     low      1 mem lookup   [KOR'98, IM'98, Pan'06]
  medium   medium                  [IM'98], [DIIM'04, AI'06]
  low      high                    [Ind'01, Pan'06]

• Lower bounds for ω(1) memory lookups: [AIP'06], [PTW'08, PTW'10]
• LSH is tight [MNP'06, OWZ'11]… leave the rest to cell-probe lower bounds?

Data-dependent Hashing!
[A-Indyk-Nguyen-Razenshteyn'14]

• NNS in Hamming space with query time n^ρ, space and preprocessing n^{1+ρ}, for
  ρ = 7/(8c) + O(1/c^{3/2}) + o(1)
  • optimal LSH: ρ = 1/c [IM'98]
• NNS in Euclidean space with:
  ρ = 7/(8c²) + O(1/c³) + o(1)
  • optimal LSH: ρ = 1/c² [AI'06]

A look at LSH lower bounds

• LSH lower bounds in Hamming space
  • Fourier analytic
• [Motwani-Naor-Panigrahy'06], [O'Donnell-Wu-Zhou'11]:
  • a distribution over hash functions
  • far pair: random, at distance εdc
  • close pair: random pair at distance εd
  • get ρ ≥ 1/c

Why not NNS lower bound?

• Suppose we try to generalize [OWZ'11] to NNS
• Pick a random dataset
• All the "far points" are concentrated in a small ball
• Easy to see at preprocessing: the actual near neighbor is close to the center of the minimum enclosing ball

Intuition

• Data-dependent LSH:
  • hashing that depends on the entire given dataset!
• Two components:
  • "nice" geometric configuration with ρ < 1/c²
  • reduction from the general case to this "nice" geometric configuration

Nice Configuration: "sparsity"

• All points are on a sphere of radius R
  • random points on the sphere are at distance ≈ R√2
• Lemma 1: cap-carving LSH on this configuration achieves ρ < 1/c²
• "Proof":
  • obtained via "cap carving"
  • similar to "ball carving" [KMS'98, AI'06]
• Lemma 1': the same bound for radius = …

Reduction: into spherical LSH

• Idea: apply a few rounds of "regular" LSH
  • ball carving [AI'06], with parameters set as if the target approximation is 5c ⇒ query-time cost only n^{1/(5c)²}
• Intuitively:
  • far points are unlikely to collide
  • this partitions the data into buckets of small diameter
  • find the minimum enclosing ball of each bucket
  • finally, apply spherical LSH on this ball!

Two-level algorithm

• L hash tables, each with:
  • hash function g(p) = ⟨h₁(p), …, h_l(p), s₁(p), …, s_m(p)⟩
  • the h's are "ball carving LSH" (data independent)
  • the s's are "spherical LSH" (data dependent)
• Final ρ is an "average" of the ρ's from levels 1 and 2
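A structural sketch of this two-level hash in Python. The cap-carving s below (assign a unit vector to the first random spherical cap containing it) is a simplified stand-in for the paper's spherical LSH, and every parameter value is an illustrative assumption.

```python
import numpy as np

def ball_carving_h(d, w=4.0, seed=0):
    """Level 1 (data independent): a grid-style primitive in the spirit of [AI'06]."""
    rng = np.random.default_rng(seed)
    ell, b = rng.standard_normal(d), rng.uniform()
    return lambda p: int(np.floor(p @ ell / w + b))

def cap_carving_s(d, tau=2.0, seed=0):
    """Level 2: map a unit vector to the index of the first random spherical cap
    {x : <u_i, x> >= tau/sqrt(d)} that contains it -- simplified cap carving."""
    def s(p):
        i = 0
        while True:
            u = np.random.default_rng([seed, i]).standard_normal(d)
            u /= np.linalg.norm(u)            # cap center: a random direction
            if u @ p >= tau / np.sqrt(d):     # p falls inside cap i
                return i
            i += 1
    return s

# g(p) concatenates l data-independent and m data-dependent coordinates.
d, l, m = 32, 2, 2
hs = [ball_carving_h(d, seed=i) for i in range(l)]
ss = [cap_carving_s(d, seed=i) for i in range(m)]

def g(p):
    p_unit = p / np.linalg.norm(p)  # spherical LSH assumes points on a sphere
    return tuple(h(p) for h in hs) + tuple(s(p_unit) for s in ss)

x = np.random.default_rng(7).standard_normal(d)
print(g(x))
print(g(x + 0.01))  # nearby points usually share the full key
```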

Details

• Inside a bucket, we need to ensure the "sparse" case:
  • 1) drop all "far pairs"
  • 2) find the minimum enclosing ball (MEB)
  • 3) partition by "sparsity" (distance from the center)

1) Far points

• In a level-1 bucket:
  • set parameters as if looking for approximation 5c
  • probability a pair survives:
    • at distance r: …
    • at distance cr: …
    • at distance 5cr: …
  • expected number of collisions between far pairs is constant
• Throw out all pairs that violate this

2) Minimum Enclosing Ball

• Use Jung's theorem:
  • a pointset with diameter D has an MEB of radius at most D/√2 (in high dimension)
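For reference, the quantitative form of Jung's theorem being used; this is the standard statement, not text from the slide:

```latex
\[
R_{\mathrm{MEB}}(P) \;\le\; D\,\sqrt{\frac{d}{2(d+1)}} \;\le\; \frac{D}{\sqrt{2}},
\qquad D = \operatorname{diam}(P),\ P \subset \mathbb{R}^d .
\]
```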

3) Partition by "sparsity"

• Inside a bucket, points are at pairwise distance at most the (small) bucket diameter
• "Sparse" LSH does not work directly in this case
  • need to partition by the distance to the center
  • partition into spherical shells of width 1

Practice of NNS

• Data-dependent partitions…
• Practice:
  • trees: kd-trees, quad-trees, ball-trees, rp-trees, PCA-trees, sp-trees…
  • often no guarantees
• Theory?
  • assuming more about the data: PCA-like algorithms "work" [Abdullah-A-Kannan-Krauthgamer'14]

Finale

• LSH: geometric hashing
• Data-dependent hashing can do better!
• Better upper bound?
  • multi-level improves a bit, but not too much
  • what is the best achievable ρ for c-approximation?
• Best partitions in theory and practice?
• LSH for other metrics:
  • edit distance, Earth-Mover Distance, etc.
• Lower bounds (cell probe?)

Open question:

• Practical variant of ball-carving hashing?
• Design a space partition of R^t that is:
  • efficient: point location in poly(t) time
  • qualitative: regions are "sphere-like", i.e.,
    Pr[needle of length 1 is not cut] ≥ Pr[needle of length c is not cut]^{1/c²}