Summer School on Hashing’14
Dimension Reduction
Alex Andoni (Microsoft Research)
Nearest Neighbor Search (NNS)
Preprocess: a set P of n points in a d-dimensional space
Query: given a query point q, report a point p* ∈ P with the smallest distance to q
Motivation
Generic setup:
Points model objects (e.g. images)
Distance models (dis)similarity measure
Application areas: machine learning (k-NN rule), speech/image/video/music recognition, vector quantization, bioinformatics, etc…
Distance can be: Hamming, Euclidean, edit distance, Earth-mover distance, etc…
Primitive for other problems: find the similar pairs in a set D, clustering…
Example, two points as bit strings under Hamming distance:
q = 000000011100010100000100010100011111
p = 000000001100000100000100110100111111
2D case
Compute the Voronoi diagram
Given query q, perform point location
Performance:
Space: O(n)
Query time: O(log n)
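A minimal sketch of the 2D case (illustrative data and names, not from the lecture). A KD-tree nearest-neighbor query answers the same question as point location in the Voronoi diagram, since the nearest neighbor of q is exactly the site whose Voronoi cell contains q:

```python
# 2D NNS sketch: KD-tree query = point location in the Voronoi diagram.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
P = rng.random((1000, 2))     # n = 1000 points in 2D
tree = cKDTree(P)             # preprocessing

q = rng.random(2)             # query point
dist, idx = tree.query(q)     # ~O(log n) per query in fixed dimension
print(f"nearest point {P[idx]} at distance {dist:.4f}")
```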
High-dimensional case
All exact algorithms degrade rapidly with the dimension d

Algorithm                    Query time        Space
Full indexing                d^O(1) · log n    n^O(d)  (Voronoi diagram size)
No indexing – linear scan    O(d·n)            O(d·n)
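A minimal sketch of the "no indexing" row (assumed data and names), showing the O(n·d) per-query cost of a linear scan:

```python
# Brute-force linear scan: O(n*d) work per query, O(n*d) space.
import numpy as np

def linear_scan_nn(P: np.ndarray, q: np.ndarray) -> int:
    """Return the index of the point in P closest to q (Euclidean)."""
    # Squared distances suffice for the argmin and avoid n square roots.
    d2 = np.sum((P - q) ** 2, axis=1)
    return int(np.argmin(d2))

rng = np.random.default_rng(1)
P = rng.standard_normal((10_000, 128))   # n = 10000 points, d = 128
q = rng.standard_normal(128)
print("nearest index:", linear_scan_nn(P, q))
```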
Dimension Reduction
If high dimension is an issue, reduce it?!
"flatten" dimension d into dimension k ≪ d
Not possible in general: packing bound
But can if: we only need to preserve distances within a fixed subset of points
Johnson-Lindenstrauss Lemma [JL'84]
Application: NNS in d-dimensional Euclidean space
Trivial scan: O(n·d) query time
Reduce to O(n·k) scan time after preprocessing, plus the O(d·k) time to reduce the dimension of the query point
Johnson Lindenstrauss Lemma [JL'84]
There is a randomized linear map A: ℝ^d → ℝ^k that preserves the norm of any fixed vector up to a 1±ε factor:
‖Ax‖ = (1±ε)·‖x‖ with probability 1 − e^(−C·ε²·k) (C some constant)
Preserves distances among n points for k = O(ε⁻²·log n)
Time to compute the map: O(d·k)
Idea: project onto a random subspace of dimension k!
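A minimal runnable sketch of the Gaussian JL map A = (1/√k)·G. The constant in k = C·log(n)/ε² is an illustrative assumption (C = 8 here), as are the data sizes:

```python
# JL sketch: project n points from R^d to R^k and check the distortion.
import numpy as np

rng = np.random.default_rng(2)
n, d, eps = 500, 10_000, 0.2
k = int(np.ceil(8 * np.log(n) / eps**2))      # k = O(log(n)/eps^2)

X = rng.standard_normal((n, d))               # n points in R^d
A = rng.standard_normal((k, d)) / np.sqrt(k)  # the JL map
Y = X @ A.T                                   # reduced points in R^k

# Distortion of pairwise distances on a random sample of pairs.
i, j = rng.integers(0, n, size=(2, 1000))
mask = i != j
i, j = i[mask], j[mask]
ratio = (np.linalg.norm(Y[i] - Y[j], axis=1)
         / np.linalg.norm(X[i] - X[j], axis=1))
print(f"k = {k}: distance ratios in [{ratio.min():.3f}, {ratio.max():.3f}]")
```

For NNS, the dataset is reduced once in preprocessing; each query then costs O(d·k) to reduce plus O(n·k) to scan, matching the application on the previous slide.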
1D embedding
How about one dimension (k = 1)?
Map f(x) = ⟨g, x⟩ = Σᵢ gᵢ·xᵢ, where the gᵢ are iid normal (Gaussian) random variables
Why Gaussian? Stability property: g₁x₁ + g₂x₂ + … + g_d·x_d is distributed as g·‖x‖, where g is also Gaussian
Equivalently: the vector (g₁, …, g_d) is spherically distributed, i.e., has a random direction, and the projection of x on a random direction depends only on the length of x
pdf(g) = (1/√(2π))·e^(−g²/2)
1D embedding
Map f(x) = ⟨g, x⟩
Linear: f(x + y) = f(x) + f(y) for any x, y
Want: |f(x)| ≈ ‖x‖; OK to consider only ‖x‖ = 1 since f is linear
Claim: for any x, we have
Expectation: E[f(x)²] = ‖x‖²
Standard deviation: √2·‖x‖² (so a single coordinate does not concentrate)
Proof of the expectation: see the derivation below
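Spelling out the expectation step (a standard computation, added here for completeness):

```latex
\mathbb{E}\big[f(x)^2\big]
  = \mathbb{E}\Big[\Big(\sum_i g_i x_i\Big)^2\Big]
  = \sum_i x_i^2\,\mathbb{E}\big[g_i^2\big]
    + \sum_{i \neq j} x_i x_j\,\mathbb{E}\big[g_i g_j\big]
  = \sum_i x_i^2
  = \|x\|^2
```

since E[gᵢ²] = 1 and, by independence and zero mean, E[gᵢ·gⱼ] = E[gᵢ]·E[gⱼ] = 0 for i ≠ j.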
Full Dimension Reduction
Just repeat the 1D embedding k times!
A = (1/√k)·G, where G is a k×d matrix of iid Gaussian random variables
Again, want to prove: ‖Ax‖ = (1±ε)·‖x‖ for fixed x, with probability 1−δ

Concentration
‖Ax‖² is distributed as (‖x‖²/k)·Σᵢ₌₁..k gᵢ², where each gᵢ is distributed as a standard Gaussian
Σᵢ gᵢ² is called the chi-squared distribution with k degrees of freedom
Fact: chi-squared is very well concentrated:
Equal to (1±ε)·k with probability 1 − e^(−Ω(ε²·k))
Akin to the central limit theorem
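A quick Monte Carlo illustration (not from the lecture) of how tightly (1/k)·χ²_k concentrates around 1, i.e., how ‖Ax‖² behaves for a unit vector x:

```python
# Empirical chi-squared concentration: ||Ax||^2 ~ (1/k) * chi^2_k.
import numpy as np

rng = np.random.default_rng(3)
k, trials = 500, 100_000
samples = rng.chisquare(df=k, size=trials) / k   # draws of ||Ax||^2, ||x||=1

for eps in (0.05, 0.1, 0.2):
    frac = np.mean(np.abs(samples - 1.0) <= eps)
    print(f"eps={eps}: within 1±eps in {frac:.4f} of trials")
```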
Johnson Lindenstrauss: wrap-up
‖Ax‖ = (1±ε)·‖x‖ with high probability for k = O(ε⁻²·log 1/δ)
Beyond Gaussians: can use ±1 entries instead of Gaussians [AMS'96, Ach'01, TZ'04…]
Faster JL?
Time to compute Ax: O(d·k). Can we get O(d·log d) time?
Yes! [AC'06, AL'08'11, DKS'10, KN'12…]
Will show: O(d·log d) time, up to lower-order terms in k [Ailon-Chazelle'06]
Fast JL Transform
Costly because the Gaussian matrix G is dense
Meta-approach: use a sparse matrix P instead?
Suppose we sample s entries per row of P, with random signs and a normalization constant of √(d/s)
Analysis of one row z = (Px)₁: want z² = (1±ε)·‖x‖² with probability 1−δ
Expectation of z² is fine: E[z²] = ‖x‖² by the choice of normalization
What about the variance??
Fast JLT: sparse projection
The variance of z² can be large
Bad case: x is sparse, think x = (1, 0, …, 0)
Even for s·k ≈ d (each coordinate of x goes somewhere), a row rarely hits the few nonzero coordinates, so the estimate fails too often; for a failure probability exponentially small in k we really would need s close to d (see the worked bad case below)
But, take away: a sparse P may work if x is "spread around"
New plan: first "spread around" x, then use a sparse P
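A worked version of the bad case, under the assumption (matching the normalization above) that a row of the sparse map picks a uniformly random set S of s coordinates with random signs σᵢ:

```latex
z = \sqrt{\tfrac{d}{s}} \sum_{i \in S} \sigma_i x_i
\quad\Rightarrow\quad
\mathbb{E}\big[z^2\big]
  = \tfrac{d}{s} \sum_{i=1}^{d} \Pr[i \in S]\, x_i^2
  = \tfrac{d}{s}\cdot\tfrac{s}{d}\,\|x\|^2 = \|x\|^2,
\qquad\text{but for } x = e_1:\quad
z^2 = \tfrac{d}{s}\,\mathbf{1}[1 \in S]
    = \begin{cases} d/s & \text{w.p. } s/d \\ 0 & \text{w.p. } 1 - s/d \end{cases}
```

So a single row is zero most of the time and huge otherwise; only s close to d tames the variance, which is exactly what the "spreading around" step avoids.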
FJLT: full
z = P·H·D·x
D = diagonal matrix with iid ±1 random variables on the diagonal ("spreading around")
H = Hadamard matrix (a Fast-Fourier-Transform-like map): H·x can be computed in O(d·log d) time, and H is composed of only ±1/√d entries
P = sparse projection matrix as before, of size k×d, with s sampled entries per row
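A runnable sketch of z = PHDx, assuming d is a power of two and taking P to sample k coordinates uniformly with replacement (one common simplification of the sparse projection; sizes are illustrative):

```python
# FJLT sketch: z = P H D x via a fast Walsh-Hadamard transform.
import numpy as np

def fwht(x: np.ndarray) -> np.ndarray:
    """Fast Walsh-Hadamard transform, normalized so that H is orthonormal."""
    y = x.astype(float).copy()
    d = y.size
    h = 1
    while h < d:                      # butterfly stages, O(d log d) total
        y = y.reshape(-1, 2 * h)
        a, b = y[:, :h].copy(), y[:, h:].copy()
        y[:, :h], y[:, h:] = a + b, a - b
        y = y.ravel()
        h *= 2
    return y / np.sqrt(d)             # entries of H are +-1/sqrt(d)

def fjlt(x: np.ndarray, k: int, rng) -> np.ndarray:
    """z = P H D x with P = uniform coordinate sampling (with replacement)."""
    d = x.size
    Dx = rng.choice([-1.0, 1.0], size=d) * x   # D: random +-1 signs
    z = fwht(Dx)                               # H D x in O(d log d) time
    coords = rng.integers(0, d, size=k)        # P: sample k coordinates
    return np.sqrt(d / k) * z[coords]          # so that E||z||^2 = ||x||^2

rng = np.random.default_rng(4)
d = 2 ** 14
x = np.zeros(d)
x[0] = 1.0                            # the sparse "bad case" input
z = fjlt(x, k=1024, rng=rng)
print("||x||_2 = 1.0, FJLT estimate:", np.linalg.norm(z))
```

For this sparse input, HD spreads the mass perfectly (every coordinate of HDx has magnitude 1/√d), so the sampled estimate is exact; sampling coordinates of x itself would almost always return 0.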
Spreading around: intuition
z = P·H·D·x
Idea for the Hadamard/Fourier transform: "uncertainty principle": if the original x is sparse, then the transform H·x is dense!
Though H can "break" x's that are already dense; the random signs on the diagonal of D protect against that
Spreading around: proof
Suppose ‖x‖ = 1. Ideal spreading around: |(HDx)ⱼ| ≈ 1/√d for every coordinate j
Lemma: with probability at least 1−δ, for each coordinate j:
|(HDx)ⱼ| ≤ O(√(log(d/δ)))/√d
Proof: (HDx)ⱼ = ⟨u, x⟩ for some vector u with iid ±1/√d entries
Hence (HDx)ⱼ is approx. Gaussian with variance 1/d
Hence standard concentration plus a union bound over the d coordinates (details below)
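The concentration step spelled out, via Hoeffding's inequality (constants illustrative):

```latex
(HDx)_j = \sum_{i=1}^{d} u_i\, x_i,
\qquad u_i = \pm\tfrac{1}{\sqrt{d}} \text{ iid signs},\ \ \|x\| = 1
\quad\Rightarrow\quad
\Pr\big[\,|(HDx)_j| > t\,\big] \le 2\,e^{-t^2 d / 2}
```

Setting t = √(2·ln(2d/δ)/d) and taking a union bound over all d coordinates gives |(HDx)ⱼ| ≤ O(√(log(d/δ)))/√d for all j simultaneously, with probability at least 1−δ.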
Why projection?
z = P·H·D·x
Why aren't we done? Choose the first few coordinates of H·D·x: each has the same, approximately Gaussian, distribution
Issue: the coordinates of H·D·x are not independent
Nevertheless: ‖H·D·x‖ = ‖x‖, since H·D is a change of basis (a rotation in ℝ^d)
Projection
z = P·H·D·x
Have: |(HDx)ⱼ| ≤ O(√(log(d/δ)))/√d with probability 1−δ
P = projection onto just k random coordinates!
Proof: standard concentration
Chernoff: since no single term dominates, it is enough to sample O(ε⁻²·log(1/δ)·log(d/δ)) coordinates for a 1±ε approximation of ‖HDx‖²
Hence k = O(ε⁻²·log(1/δ)·log(d/δ)) suffices
FJLT: wrap-up
z = P·H·D·x
Obtain: ‖z‖ = (1±ε)·‖x‖ with probability at least 1−δ
Dimension of z: k = O(ε⁻²·log(1/δ)·log(d/δ)); time: O(d·log d)
Dimension not optimal: apply a regular (dense) JL map to z to reduce further, to O(ε⁻²·log(1/δ))
Final time: O(d·log d) plus lower-order terms [AC'06, AL'08'11, WDLSA'09, DKS'10, KN'12, BN, NPW'14]
Dimension Reduction: beyond Euclidean
Johnson-Lindenstrauss: for Euclidean space, O(ε⁻²·log n) dimension, oblivious to the point set
Other norms, such as ℓ₁? Essentially no: [CS'02, BC'03, LN'04, JN'10…]
For n points and approximation D, the required dimension is between n^Ω(1/D²) and Õ(n/D) [BC'03, NR'10, ANN'10…], even if the map is allowed to depend on the dataset!
But we can do a weak dimension reduction
Towards dimension reduction for ℓ₁
Can we do the "analog" of Euclidean projections?
For ℓ₂, we used the Gaussian distribution, which has the 2-stability property: g₁x₁ + … + g_d·x_d is distributed as g·‖x‖₂
Is there something similar for the 1-norm? Yes: the Cauchy distribution!
1-stable: c₁x₁ + … + c_d·x_d is distributed as c·‖x‖₁, where c, c₁, …, c_d are iid Cauchy
What's wrong then? Cauchy random variables are heavy-tailed… |c| doesn't even have a finite expectation
pdf(s) = 1/(π·(s²+1))
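An empirical check of 1-stability (illustrative Monte Carlo, not a proof): ⟨c, x⟩ with iid Cauchy cᵢ should match a standard Cauchy scaled by ‖x‖₁:

```python
# 1-stability check: <c, x> ~ Cauchy(0, ||x||_1) for iid Cauchy c_i.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
x = rng.standard_normal(50)          # arbitrary fixed vector
scale = np.sum(np.abs(x))            # ||x||_1

C = rng.standard_cauchy((100_000, x.size))
samples = C @ x                      # 100k draws of <c, x>

# Kolmogorov-Smirnov test against Cauchy(0, ||x||_1): large p-value expected.
print(stats.kstest(samples, stats.cauchy(loc=0, scale=scale).cdf))
```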
Weak embedding [Indyk'00]
Still, can consider the map A·x as before, now with iid Cauchy entries in A
Each coordinate of A·x is distributed as ‖x‖₁ times a Cauchy: it does not concentrate at all, but…
Can estimate ‖x‖₁ by: the median of the absolute values of the coordinates!
Concentrates because |Cauchy| is in the correct range 90% of the time
Gives a sketch; OK for nearest neighbor search
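A minimal sketch of this weak ℓ₁ embedding with the median estimator (names and sizes are illustrative). The key fact used in the comment: the median of |standard Cauchy| is exactly 1, since P(|C| ≤ 1) = (2/π)·arctan(1) = 1/2:

```python
# Weak l1 sketch: y = Ax with iid Cauchy entries, estimate ||x||_1 as median|y|.
import numpy as np

def l1_sketch(x: np.ndarray, k: int, rng) -> np.ndarray:
    A = rng.standard_cauchy((k, x.size))
    return A @ x

def estimate_l1(y: np.ndarray) -> float:
    # Each |y_i| / ||x||_1 is |Cauchy|, whose median is 1, so the sample
    # median of |y_i| concentrates around ||x||_1 (means/variances do not
    # exist for Cauchy, which is why the median is used instead).
    return float(np.median(np.abs(y)))

rng = np.random.default_rng(6)
x = rng.standard_normal(1000)
y = l1_sketch(x, k=400, rng=rng)
print("true ||x||_1 =", np.sum(np.abs(x)), " estimate =", estimate_l1(y))
```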