One algorithm to rule them all: One join query at a time
Atri Rudra, University at Buffalo
A brief history of this talk
L2/L2 for-each sparse recovery/compressed sensing
http://www-stat.stanford.edu/~candes/stats330/index.shtml
The key technical problem
Given the three shadows, what is the largest size of the original set of points?
Highly trivial: 4^3 = 64
Still trivial: 4^2 = 16
Correct answer: 4^{1.5} = 8
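The three answers above can be checked on a tiny instance. The sketch below is an illustration only: it assumes the point set is a 2 x 2 x 2 cube, so that each shadow has size 4 as on the slide, and the names `shadows` and `cube` are hypothetical.

```python
from math import sqrt

def shadows(points):
    """Project 3-D points onto the three coordinate planes."""
    r = {(a, b) for a, b, c in points}   # shadow on the A-B plane
    s = {(b, c) for a, b, c in points}   # shadow on the B-C plane
    t = {(c, a) for a, b, c in points}   # shadow on the C-A plane
    return r, s, t

# A 2 x 2 x 2 cube of points: every shadow is a 2 x 2 square of size 4.
cube = {(a, b, c) for a in range(2) for b in range(2) for c in range(2)}
r, s, t = shadows(cube)

# Loomis-Whitney: the number of points is at most sqrt(|R| |S| |T|).
bound = sqrt(len(r) * len(s) * len(t))
print(len(cube), bound)
```

Here the cube has exactly 8 points, so the bound 4^{1.5} = 8 is met with equality.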
The key technical problem
[Figure: a point set in 3-D with axes A, B, C and its three shadows R, S, T, where |R| = k, |S| = k, |T| = k]
Loomis-Whitney: at most k^{3/2} points
Algorithmic Loomis-Whitney?
An equivalent view
[Figure: the three shadows R, S, T on axes A, B, C, redrawn as a triangle on A, B, C with edges R, S, T]
Output all (a,b,c) s.t. (a,b) in R, (b,c) in S and (c,a) in T
Overview of the talk
[Figure: triangle on A, B, C with edges R, S, T]
The take-away message
Join algo
http://welovetumblr.blogspot.com/2012/07/thor-is.html
(Database) Joins
Codd
Attributes/Nodes: [n]
Relations/Hyperedges: e_1, …, e_m ⊆ [n]
[Figure: hypergraph on nodes 1, 2, 3, 4, 5]
Tables/Projections: R_1, …, R_m
Output all a = (a_1, …, a_n) s.t. a projected down to e_i is in R_i for every i in [m]
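To make the definition concrete, here is a brute-force sketch that tries every tuple in the domain and keeps those whose projections all land in the tables. It is only a reference implementation of the definition, not the algorithm of the talk; the function name `naive_join`, the 0-based attribute indices and the toy tables are assumptions made for illustration.

```python
from itertools import product

def naive_join(n, domain, edges, tables):
    """Brute-force join: try every tuple in domain^n.

    edges[i] is a tuple of attribute indices (0-based) and tables[i] is the
    set of allowed projections onto those attributes.
    """
    out = []
    for a in product(domain, repeat=n):
        if all(tuple(a[j] for j in e) in tab for e, tab in zip(edges, tables)):
            out.append(a)
    return out

# Triangle query on attributes (A, B, C) = (0, 1, 2).
edges = [(0, 1), (1, 2), (2, 0)]
R = {(0, 1), (0, 2)}
S = {(1, 5), (2, 5)}
T = {(5, 0)}
result = naive_join(3, range(6), edges, [R, S, T])
print(result)
```

The running time is the full domain size to the power n, which is exactly what the output-size bounds later in the talk are meant to beat.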
The triangle join query
[Figure: triangle on A, B, C with edges R (between A and B), S (between B and C), T (between C and A)]
Output all (a,b,c) s.t. (a,b) in R, (b,c) in S and (c,a) in T
Bounding the output size
Atserias Grohe Marx
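[Figure: triangle on A, B, C with edges R, S, T carrying weights x, y, z; for Loomis-Whitney each weight is ½]
Highly trivial bound: |R| |S| |T|
Still trivial bound: |R| |S|
Loomis-Whitney bound: |R|^{1/2} |S|^{1/2} |T|^{1/2}
AGM bound: |R|^x |S|^y |T|^z, where
x + z ≥ 1 (covers A), x + y ≥ 1 (covers B), y + z ≥ 1 (covers C), and x, y, z ≥ 0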
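The best AGM bound for the triangle can be computed by hand, since the cover constraints above have only four corner points. The sketch below assumes all three table sizes are at least 1, so that the minimum of the (log-linear) objective sits at one of those corners; the function name `agm_triangle` is hypothetical.

```python
def agm_triangle(r, s, t):
    """Smallest value of r^x * s^y * t^z over fractional covers (x, y, z).

    Assumes r, s, t >= 1, so the minimum is attained at a vertex of the
    cover polytope: x+y >= 1, y+z >= 1, x+z >= 1, x, y, z >= 0.
    """
    vertices = [(0.5, 0.5, 0.5), (1, 1, 0), (0, 1, 1), (1, 0, 1)]
    return min(r**x * s**y * t**z for x, y, z in vertices)

balanced = agm_triangle(4, 4, 4)        # Loomis-Whitney corner wins
skewed = agm_triangle(10000, 1, 1)      # the cover (0, 1, 1) wins
print(balanced, skewed)
```

With balanced sizes the Loomis-Whitney weights (½, ½, ½) are best; with one huge table it is better to put weight 0 on it and cover everything with the other two.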
Algorithmic Loomis-Whitney
Loomis-Whitney bound: |R|^{1/2} |S|^{1/2} |T|^{1/2}
[Figure: triangle on A, B, C with edges R, S, T; one vertex c in C is highlighted]
Goal: Count number of triangles
There are |R| choices for edges in R
There are d_S(c) d_T(c) choices for pairs of neighbors of c
http://agilitrix.com/2011/03/red-pill-blue-pill/
[Figure: vertex c in C with d_S(c) neighbors via S and d_T(c) neighbors via T]
Make this choice for every c in C
Run time of algo = Σ_c min( |R|, d_S(c) d_T(c) )
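A minimal sketch of this per-vertex choice, assuming R, S and T are Python sets of pairs (so membership tests take constant expected time). The function name `triangle_join` and the index names are illustrative, not from the talk.

```python
from collections import defaultdict

def triangle_join(R, S, T):
    """List all (a, b, c) with (a, b) in R, (b, c) in S, (c, a) in T."""
    s_by_c = defaultdict(set)   # c -> {b : (b, c) in S}
    t_by_c = defaultdict(set)   # c -> {a : (c, a) in T}
    for b, c in S:
        s_by_c[c].add(b)
    for c, a in T:
        t_by_c[c].add(a)

    out = []
    for c in s_by_c.keys() & t_by_c.keys():
        bs, as_ = s_by_c[c], t_by_c[c]
        if len(R) <= len(bs) * len(as_):
            # Cheaper to scan R and test each edge against c's neighbors.
            out.extend((a, b, c) for a, b in R if a in as_ and b in bs)
        else:
            # Cheaper to try every pair of neighbors of c and look it up in R.
            out.extend((a, b, c) for a in as_ for b in bs if (a, b) in R)
    return out

R = {(0, 1), (0, 2)}
S = {(1, 5), (2, 5)}
T = {(5, 0)}
triangles = sorted(triangle_join(R, S, T))
print(triangles)
```

For each c the work is proportional to min(|R|, d_S(c) d_T(c)), which is the sum that the analysis bounds.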
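Analyzing the algorithm
Loomis-Whitney bound: |R|^{1/2} |S|^{1/2} |T|^{1/2}
Σ_c min( |R|, d_S(c) d_T(c) )
≤ Σ_c ( |R| d_S(c) d_T(c) )^{1/2}    [min(E,F) ≤ (EF)^{1/2}]
= |R|^{1/2} Σ_c d_S(c)^{1/2} d_T(c)^{1/2}
≤ |R|^{1/2} (Σ_c d_S(c))^{1/2} (Σ_c d_T(c))^{1/2}    [Cauchy-Schwarz]
= |R|^{1/2} |S|^{1/2} |T|^{1/2}
[Figure: triangle on A, B, C with edges R, S, T; vertex c in C highlighted]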
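Atserias Grohe Marx? Same algorithm!
AGM bound: |R|^x |S|^y |T|^z
x + z ≥ 1 (covers A), x + y ≥ 1 (covers B), y + z ≥ 1 (covers C)
Σ_c min( |R|, d_S(c) d_T(c) )
≤ Σ_c |R|^x ( d_S(c) d_T(c) )^{1-x}    [min(E,F) ≤ E^x F^{1-x}]
≤ |R|^x Σ_c d_S(c)^y d_T(c)^z    [1 - x ≤ y and 1 - x ≤ z, degrees are ≥ 1]
≤ |R|^x (Σ_c d_S(c))^y (Σ_c d_T(c))^z    [Hölder, using y + z ≥ 1]
= |R|^x |S|^y |T|^z
[Figure: triangle on A, B, C with edges R, S, T; vertex c in C highlighted]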
General Join Result
Attributes/Nodes: [n]
Relations/Hyperedges: e_1, …, e_m ⊆ [n]
[Figure: hypergraph on nodes 1, 2, 3, 4, 5 with edge weights x_1, x_2, x_3, x_4]
Tables/Projections: R_1, …, R_m
Let x_1, …, x_m be a fractional cover
AGM bound: |R_1|^{x_1} ⋯ |R_m|^{x_m}
Our result: O(AGM + Input size)
Provably worst-case optimal join algorithm
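As a small illustration of the bound, the sketch below checks that a weight vector is a fractional cover (every attribute is covered with total weight at least 1) and then evaluates the product. The function name `agm_bound`, the 0-based attributes and the tolerance parameter are assumptions for this example.

```python
from math import prod

def agm_bound(n, edges, sizes, x, tol=1e-9):
    """Return prod |R_i|^{x_i} after checking that x is a fractional cover.

    edges[i] is the set of attributes (0-based) of relation R_i,
    sizes[i] = |R_i|, and x[i] >= 0 is the weight on edge i.
    """
    if any(w < 0 for w in x):
        raise ValueError("weights must be non-negative")
    for v in range(n):
        if sum(w for e, w in zip(edges, x) if v in e) < 1 - tol:
            raise ValueError(f"attribute {v} is not covered")
    return prod(size**w for size, w in zip(sizes, x))

triangle_edges = [{0, 1}, {1, 2}, {2, 0}]
bound = agm_bound(3, triangle_edges, [4, 4, 4], [0.5, 0.5, 0.5])
print(bound)
```

Any valid cover gives an upper bound on the output size; the AGM bound proper is the smallest such value over all fractional covers.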
List recovery
[Figure: input lists S_1, S_2, S_3, …, S_n, each S_i a subset of [q], and a codeword (c_1, c_2, c_3, …, c_n)]
Code C subset of [q]^n
Applications in expanders
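The slide only shows the picture, so the sketch below assumes the standard error-free form of list recovery: given lists S_1, …, S_n, output every codeword c with c_i in S_i for all i. The toy code and the name `list_recover` are hypothetical.

```python
def list_recover(code, lists):
    """Return every codeword c in `code` with c[i] in lists[i] for all i."""
    return [c for c in code if all(sym in s for sym, s in zip(c, lists))]

# Toy code over [q] with q = 3 and block length n = 3: the repetition code
# plus one extra word (a made-up example, not from the talk).
code = [(0, 0, 0), (1, 1, 1), (2, 2, 2), (0, 1, 2)]
lists = [{0, 1}, {0, 1}, {0, 2}]
recovered = list_recover(code, lists)
print(recovered)
```

This scans the whole code, so it is only useful for seeing the definition at work.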
An alternate view of joins
[Figure: triangle on A, B, C with edges R, S, T, next to the three input lists R, S, T]
Msg in [q]^3
Codeword in [q^2]^3
Constant dimension
Constant block length
Large alphabet size
Large input list size
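One way to read this view: a message (a, b, c) is encoded by its three pairwise projections, and the triangle join is list recovery with input lists R, S, T. The sketch below is a brute-force illustration under that reading; `encode` and `join_via_list_recovery` are hypothetical names.

```python
from itertools import product

def encode(msg):
    """Map a message (a, b, c) in [q]^3 to a codeword in [q^2]^3."""
    a, b, c = msg
    return ((a, b), (b, c), (c, a))

def join_via_list_recovery(q, R, S, T):
    """Brute-force list recovery with input lists R, S, T."""
    lists = (R, S, T)
    return [m for m in product(range(q), repeat=3)
            if all(sym in lst for sym, lst in zip(encode(m), lists))]

R = {(0, 1), (0, 2)}
S = {(1, 5), (2, 5)}
T = {(5, 0)}
messages = join_via_list_recovery(6, R, S, T)
print(messages)
```

The dimension (3) and block length (3) are constants, while the alphabet [q^2] and the lists can be large, which matches the four labels on the slide.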
Sparse Recovery/Compressed Sensing
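[Figure: a matrix (to be designed) times an unknown vector gives the observed vector; Decode maps the observation to the output. Example with k=2: the unknown splits into heavy hitters and a tail]
Quantifying the approximation
L2 norm of the error ≤ C · L2 norm of the tail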
(Most of) rest of the talk
Designing the matrix
Designing the matrix (k=2)
[Figure: bipartite graph with N left nodes and m right nodes]
k-expander: < ¼ (neighborhood)
Measurement = value + noise
Heavy tail noise: < ¼ (neighborhood)
> ½ of the neighbors of a heavy hitter have the "correct" value
Count-Sketch style algo (k=2)
[Figure: bipartite graph with N left nodes and m right nodes]
Estimate = median of O(log N) values
Output the top O(k) estimates
O(N log N) decoding
Indyk Ružić
We need a faster algorithm…
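A rough sketch of the median-estimate decoder, for intuition only. It swaps the expander on the slide for random hash tables with random signs, as in the classic Count-Sketch; the parameters (N = 5000, 11 rows, 128 buckets), the seeds and all function names are assumptions chosen for the demo.

```python
import random
from statistics import median

def make_sketch(N, rows, buckets, seed=0):
    """Random bucket and sign tables: one hash row per repetition."""
    rng = random.Random(seed)
    h = [[rng.randrange(buckets) for _ in range(N)] for _ in range(rows)]
    sgn = [[rng.choice((-1, 1)) for _ in range(N)] for _ in range(rows)]
    return h, sgn

def measure(x, h, sgn, buckets):
    """Linear measurements: each bucket holds a signed sum of entries of x."""
    y = [[0.0] * buckets for _ in h]
    for r in range(len(h)):
        for i, xi in enumerate(x):
            y[r][h[r][i]] += sgn[r][i] * xi
    return y

def decode(y, h, sgn, N, k):
    """Estimate every entry by a median over rows, then keep the top k."""
    est = [median(sgn[r][i] * y[r][h[r][i]] for r in range(len(h)))
           for i in range(N)]
    top = sorted(range(N), key=lambda i: abs(est[i]), reverse=True)[:k]
    return {i: est[i] for i in top}

N, rows, buckets, k = 5000, 11, 128, 2
rng = random.Random(1)
x = [rng.uniform(-0.01, 0.01) for _ in range(N)]   # small tail
x[17], x[4242] = 10.0, -7.0                         # two heavy hitters
h, sgn = make_sketch(N, rows, buckets)
found = decode(measure(x, h, sgn, buckets), h, sgn, N, k)
print(sorted(found))
```

The decoder touches every index in [N], which is the linear-time cost the next slides set out to avoid.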
Towards a sub-linear time algo
Estimate = median value
Output the top O(k) estimates in S
O(|S| log N) decoding
All we need to do is to compute a small S quickly
Porat-Strauss Idea: Recursion!
[N] = {0,1}^{log N}, split into [√N] × [√N]
Solve each half in ~ √N time
The problem we now need to solve
Elements of S
Geometrically… [Figure: k candidates on each of two axes; which grid points belong to S?]
Output size ~ k^2
Overall running time ~ √N + k^2
Not sub-linear for k > √N
Use a table look-up to decrease the run time
Finally…
Slightly different recursion
[N] = {0,1}^{log N}, with three subproblems over [N^{2/3}], [N^{2/3}], [N^{2/3}]
Geometric problem to solve
Overall runtime: k^{3/2} + N^{2/3}
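One plausible reading of this slide, offered as an assumption rather than as the talk's exact construction: write each index in [N] as three digits (a, b, c), let the three subproblems recover the surviving pairs (a, b), (b, c) and (c, a), each living in a domain of size N^{2/3}, and combine them with a triangle join, whose output is at most k^{3/2} when each list has size k. The sketch uses a plain join-then-filter for brevity, not the worst-case optimal algorithm; `split` and `combine` are hypothetical names.

```python
def split(i, m):
    """Write an index i in [m^3] as digits (a, b, c), each in [m]."""
    return i // (m * m), (i // m) % m, i % m

def combine(R, S, T, m):
    """Candidate indices whose three pairwise projections all survived."""
    s_by_b = {}
    for b, c in S:
        s_by_b.setdefault(b, []).append(c)
    return sorted(a * m * m + b * m + c
                  for a, b in R
                  for c in s_by_b.get(b, [])
                  if (c, a) in T)

m = 10                                   # N = m^3 = 1000
heavy = [123, 457]
digits = [split(i, m) for i in heavy]
R = {(a, b) for a, b, c in digits}       # surviving (a, b) pairs
S = {(b, c) for a, b, c in digits}       # surviving (b, c) pairs
T = {(c, a) for a, b, c in digits}       # surviving (c, a) pairs
candidates = combine(R, S, T, m)
print(candidates)
```

In general the join can also return a few spurious indices, which the median-estimate step over the candidate set S would then filter out.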
Our Results
L2/L2 sparse recovery with failure prob p
Optimal k log(N/k) measurements*
k^{1+ε} poly-log N decoding + space
p ~ (N/k)^{-k/poly-log k}
Also prove tight lower bound of k log(N/k) + log(1/p)
Only two problems so far…
[Figure: triangle on A, B, C with edges R, S, T]
Albert Meyer (via Dick Lipton)
"Prove it for n=3 and then let 3 go to infinity"
The 3rd problem…
Big (hyper)graph G
http://pigeonsandplanes.com/2010/12/thoughts-on-net-neutrality.html
Small (hyper)graph H
[Figure: hypergraph H on nodes 1, 2, 3, 4, 5]
Compute all copies of H in G
Our join algorithm gives a worst-case optimal algorithm for any constant-sized H
Joins model many more problems, e.g. CSPs
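To see why listing copies of H is a join, treat every edge of H as one table, all equal to the edge set of G. The sketch below does this with plain backtracking on directed graphs; it is not the worst-case optimal algorithm, it returns homomorphic images (it does not force distinct nodes), and the name `copies_of_pattern` is hypothetical.

```python
def copies_of_pattern(h_nodes, h_edges, g_edges):
    """List all maps of pattern nodes to graph nodes that send every
    pattern edge to a graph edge (a join with one table per pattern edge)."""
    g = set(g_edges)
    nodes = sorted({v for e in g_edges for v in e})
    out = []

    def extend(assign):
        if len(assign) == len(h_nodes):
            out.append(tuple(assign[v] for v in h_nodes))
            return
        v = h_nodes[len(assign)]
        for cand in nodes:
            assign[v] = cand
            # Check only pattern edges whose endpoints are both assigned.
            if all((assign[p], assign[q]) in g
                   for p, q in h_edges if p in assign and q in assign):
                extend(assign)
            del assign[v]

    extend({})
    return out

G = [(0, 1), (1, 2), (2, 0), (2, 3)]
H_nodes = ["a", "b", "c"]
H_edges = [("a", "b"), ("b", "c"), ("c", "a")]
copies = copies_of_pattern(H_nodes, H_edges, G)
print(copies)
```

The one directed triangle in G shows up three times, once per rotation of the pattern.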
The take-away message
Join algo
http://welovetumblr.blogspot.com/2012/07/thor-is.html