Transcript
Page 1: Beyond Set Disjointness: The Communication Complexity of Finding the Intersection

Grigory Yaroslavtsev

http://grigory.us

Joint with Brody, Chakrabarti, Kondapally and Woodruff

Page 2: Communication Complexity [Yao'79]

Alice: x        Bob: y

f(x, y) = ?

Shared randomness

… f(x, y)

• R(f) = min. communication (bounded error)
• R^r(f) = min. r-round communication (bounded error)

Page 3: Set Intersection

x = S, y = T, f(x, y) = S ∩ T

S ⊆ [n], |S| ≤ k
T ⊆ [n], |T| ≤ k

(k-Intersection) = ?

k is big, n is huge, where huge ≫ big

Page 4: Our results

Let ilog^r k = log log … log k (r times)

• (k-Intersection) = [Brody, Chakrabarti, Kondapally, Woodruff, Y.; PODC'14]
• (k-Intersection) = [Saglam-Tardos FOCS'13; Brody, Chakrabarti, Kondapally, Woodruff, Y.; RANDOM'14]

(k-Intersection) = for

Page 5: Applications

• Exact Jaccard index (for an approximation, use MinHash [Broder'98; Li-König'11; Pagh-Stöckel-Woodruff'14])
• Rarity, distinct elements, joins, …
• Multi-party set intersection (later)
• Contrast:
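
To make the first application concrete: once the parties know S ∩ T exactly, the Jaccard index follows from three sizes. The snippet below is a minimal sketch; the function name and the convention for two empty sets are assumptions made here, not part of the slides.

```python
def exact_jaccard(intersection_size: int, size_s: int, size_t: int) -> float:
    """Jaccard index |S ∩ T| / |S ∪ T|, computed from the three sizes."""
    union_size = size_s + size_t - intersection_size
    if union_size == 0:
        return 1.0  # convention assumed here for two empty sets
    return intersection_size / union_size


S = {1, 4, 7, 9}
T = {4, 9, 12}
jaccard = exact_jaccard(len(S & T), len(S), len(T))  # 2 shared items, 5 in the union
```

Only |S ∩ T| needs communication; |S| and |T| cost one extra number per player.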

Page 6: 1-round O(k log k)-protocol

h : [n] → [k^3]

Figure: S, T ⊆ [n] are mapped by h to h(S), h(T) ⊆ [k^3].
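
A minimal sketch of how such a one-round hashing protocol could look, assuming both players send their hashed sets simultaneously and each filters its own set. The hash family ((a·x + b) mod P) mod m, the prime P, and all function names are illustrative assumptions, not taken from the slides.

```python
import random

P = (1 << 61) - 1  # prime modulus, assumed larger than every element


def make_hash(m: int, rng: random.Random):
    """Draw h(x) = ((a*x + b) mod P) mod m using the shared randomness."""
    a, b = rng.randrange(1, P), rng.randrange(P)
    return lambda x: ((a * x + b) % P) % m


def one_round(S: set, T: set, k: int, seed: int):
    """Each player sends its hashed set, then filters its own input."""
    h = make_hash(k ** 3, random.Random(seed))  # shared hash into [k^3]
    hS = {h(s) for s in S}  # Alice's message: at most k values, about 3*log2(k) bits each
    hT = {h(t) for t in T}  # Bob's message
    alice_out = {s for s in S if h(s) in hT}
    bob_out = {t for t in T if h(t) in hS}
    return alice_out, bob_out


S = {3, 17, 1000003, 42}
T = {17, 42, 99, 123456789}
alice_out, bob_out = one_round(S, T, k=4, seed=0)
```

Each output always contains S ∩ T; it can contain extra elements only if some element outside the intersection collides under h, which the large range [k^3] makes unlikely.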

Page 7: Hashing

h : [n] → [k / log k]

k / log k = # of buckets

Expected # of elements per bucket = log k
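
The bucket arithmetic can be checked numerically. The sketch below assumes base-2 logarithms, k ≥ 2, and an illustrative hash family; none of these choices come from the slides.

```python
import math
import random
from collections import Counter

P = (1 << 61) - 1  # prime modulus, assumed larger than every element


def bucket_loads(S, k, seed):
    """Hash S into about k / log2(k) buckets and count elements per bucket (k >= 2)."""
    m = max(1, round(k / math.log2(k)))  # number of buckets
    rng = random.Random(seed)            # shared randomness
    a, b = rng.randrange(1, P), rng.randrange(P)
    loads = Counter(((a * x + b) % P) % m for x in S)
    return m, loads


k = 1024
S = random.Random(0).sample(range(10**9), k)
m, loads = bucket_loads(S, k, seed=1)
average = sum(loads.values()) / m  # equals k / m, close to log2(k) = 10
```

With k = 1024 this gives 102 buckets and an average load of about 10 elements, matching the log k figure.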

Page 8: Secondary Hashing

= # of hash functions

log^3 k where

Page 9: 2-Round O(k log log k)-protocol

Figure (labels only): log^3 k, log^3 k

|h_i(S)|, |h_i(T)| = O(log k log log k)

Total communication = (k / log k) · O(log k log log k) = O(k log log k)
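
The total can be checked by multiplying the per-bucket cost by the number of buckets. The derivation below is a sketch under two assumptions read off the preceding slides: each of the k / log k buckets holds O(log k) elements, and each element hashed into a range of size log^3 k costs log(log^3 k) bits.

```latex
\[
\underbrace{\frac{k}{\log k}}_{\text{buckets}}
\cdot
\underbrace{O(\log k)}_{\text{elements per bucket}}
\cdot
\underbrace{\log\!\left(\log^{3} k\right)}_{=\,3\log\log k\ \text{bits per element}}
\;=\; O(k \log\log k).
\]
```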

Page 10: Collisions

k / log k

log^3 k

Pr[collision] = O(1 / log k)
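
One way to see the collision bound is a union bound over pairs of elements in a bucket. The sketch below assumes a bucket contains m = O(log k) elements of S ∪ T and that any two distinct elements collide with probability at most 1 / log^3 k under the secondary hash; both are assumptions made for this illustration.

```latex
\[
\Pr[\text{collision}]
\;\le\; \binom{m}{2}\cdot\frac{1}{\log^{3} k}
\;\le\; \frac{m^{2}}{2\log^{3} k}
\;=\; O\!\left(\frac{1}{\log k}\right)
\qquad\text{for } m = O(\log k).
\]
```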

Page 11: Collisions

log^3 k

log^3 k

Key fact: If then also =

Page 12: Collisions

• Second round:
  – For each bucket send -bit equality check (total -communication)
  – Correct intersection computed in buckets where
  – Expected # items in incorrect buckets
  – Use 1-round protocol for incorrect buckets
  – Total communication
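
The role of the equality check can be illustrated as follows. The sketch assumes one reading of the second round: each player filters its own bucket using the other's hashed bucket, and if the two filtered sets are equal they must both equal the true intersection. The direct comparison `A == B` stands in for a randomized equality check, and all names are hypothetical.

```python
import random

P = (1 << 61) - 1  # prime modulus, assumed larger than every element


def candidates(S, T, m, rng):
    """One hashing attempt on a bucket: each player filters its own set."""
    a, b = rng.randrange(1, P), rng.randrange(P)
    h = lambda x: ((a * x + b) % P) % m
    hS, hT = {h(s) for s in S}, {h(t) for t in T}
    return {s for s in S if h(s) in hT}, {t for t in T if h(t) in hS}


def second_round(S, T, m, rng):
    """Return (intersection, True) if the check passes, else (None, False)."""
    A, B = candidates(S, T, m, rng)
    if A == B:            # stand-in for the randomized equality check
        return A, True    # S∩T ⊆ A ⊆ S and S∩T ⊆ B ⊆ T, so A = B forces A = S∩T
    return None, False    # bucket flagged incorrect: fall back to the 1-round protocol


rng = random.Random(1)
S, T = {1, 2, 3, 5, 8}, {2, 3, 13, 21}
# A tiny hash range (m = 4) is used on purpose so that collisions are frequent.
results = [second_round(S, T, 4, rng) for _ in range(200)]
```

Every attempt that passes the check returns exactly S ∩ T, even with the deliberately small hash range; attempts with collisions are the ones that get flagged.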

Page 13: Main protocol

h : [n] → [k]

k = # of buckets

Expected # of elements per bucket = O(1)

Page 14: Verification tree -degree

… ilog^{r−1} k

buckets = leaves of the verification tree

Page 15: Verification bottom-up

Figure: two leaves hold (S_1, T_1) and (S_2, T_2) with intersections S_1 ∩ T_1 and S_2 ∩ T_2; their parent holds (S_1 ∪ S_2, T_1 ∪ T_2) with intersection (S_1 ∪ S_2) ∩ (T_1 ∪ T_2).

Page 16: Verification bottom-up

EQUALITY CHECK

Figure: the subtree with leaves S_1 ∩ T_1 and S_2 ∩ T_2 and parent (S_1 ∪ S_2) ∩ (T_1 ∪ T_2), drawn twice; the leaves are marked Correct and Incorrect, and the parent is marked Incorrect.

Page 17: Verification bottom-up

EQUALITY CHECK FAILS => RESTART THE SUBTREE

Figure: the same subtree, drawn twice, with leaves marked Correct and Incorrect and the parent marked Correct.
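
The restart rule can be sketched in code. This is an illustration under several assumptions that are not in the slides: a binary tree instead of the slides' degree, buckets formed by `x % n_buckets` as a stand-in for h, a toy fingerprint as the equality check with more bits higher in the tree, and an exact comparison at the root.

```python
import random

P = (1 << 61) - 1  # prime modulus, assumed larger than every element


def candidates(S, T, m, rng):
    """One hashing attempt on a bucket: each player filters its own set."""
    a, b = rng.randrange(1, P), rng.randrange(P)
    h = lambda x: ((a * x + b) % P) % m
    hS, hT = {h(s) for s in S}, {h(t) for t in T}
    return {s for s in S if h(s) in hT}, {t for t in T if h(t) in hS}


def fingerprints_match(A, B, bits, rng):
    """Toy equality check: equal sets always pass; unequal sets may pass by
    chance, more often when bits is small. bits=None means exact comparison."""
    if bits is None:
        return A == B
    salt = rng.randrange(1, P)
    fp = lambda X: sum((salt * x) % P for x in X) % (1 << bits)
    return fp(A) == fp(B)


def verify(leaves, lo, hi, m, rng, stats):
    """Return Alice's and Bob's candidate sets for leaves[lo:hi]."""
    is_root = (lo, hi) == (0, len(leaves))
    bits = None if is_root else 2 * (hi - lo)  # stricter checks higher up
    while True:
        if hi - lo == 1:
            A, B = candidates(*leaves[lo], m, rng)
        else:
            mid = (lo + hi) // 2
            A1, B1 = verify(leaves, lo, mid, m, rng, stats)
            A2, B2 = verify(leaves, mid, hi, m, rng, stats)
            A, B = A1 | A2, B1 | B2
        if fingerprints_match(A, B, bits, rng):
            return A, B
        stats["restarts"] += 1  # equality check failed: restart this subtree


rng = random.Random(7)
S = set(rng.sample(range(10**6), 40))
T = set(rng.sample(sorted(S), 15)) | set(rng.sample(range(10**6), 25))
n_buckets = 8
leaves = [({s for s in S if s % n_buckets == i},
           {t for t in T if t % n_buckets == i}) for i in range(n_buckets)]
stats = {"restarts": 0}
A, B = verify(leaves, 0, n_buckets, m=64, rng=rng, stats=stats)
```

Cheap checks near the leaves let some incorrect buckets slip through; the stricter checks higher up catch them and trigger a restart of only the affected subtree. Because the root check in this sketch is exact, the final answer is always S ∩ T.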

Page 18: Verification bottom-up

Figure: verification tree over the leaf pairs (S_1, T_1), …, (S_i, T_i), …, (S_k, T_k), with levels labelled up to p_{r−1}.

Page 19: Analysis of Stage

• = [node at stage computed correctly]
• Set =
  – Run equality checks and basic intersection protocols with success probability
  – Key lemma: [# of restarts per leaf => Cost of Intersection in leaves =
  – Cost of Equality =
• [protocol succeeds] =

Page 20: Multi-party extensions

players: , where

• Boost error probability of 2-player protocol to
• Average per player (using coordinator): in rounds
• Worst-case per player (using a tournament): in rounds
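
The slides do not spell out the tournament, so the following is only one plausible reading: players are paired up level by level, each pair runs the 2-player protocol, and the winner carries the running intersection forward. The callback `intersect` is a hypothetical stand-in for the 2-player protocol, and `levels` counts tournament levels, not protocol rounds.

```python
def tournament_intersection(sets, intersect):
    """Intersect all sets by pairing players level by level."""
    current = list(sets)
    levels = 0
    while len(current) > 1:
        nxt = [intersect(current[i], current[i + 1])
               for i in range(0, len(current) - 1, 2)]
        if len(current) % 2 == 1:
            nxt.append(current[-1])  # the unpaired player advances unchanged
        current = nxt
        levels += 1
    return current[0], levels


players = [{1, 2, 3, 4, 5}, {2, 3, 4, 5, 6}, {2, 3, 5, 7}, {0, 2, 3, 5}, {2, 5, 9}]
result, levels = tournament_intersection(players, lambda a, b: a & b)
```

In this arrangement no player takes part in more than one 2-player run per level, which is why a tournament bounds the worst-case load per player, whereas a coordinator only bounds the average.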

Page 21: Open Problems

• (k-Intersection) = ?
• Better protocols for the multi-party setting?

Page 22: k-Disjointness

• , iff
• [Razborov'92; Hastad-Wigderson'96]
• [Folklore + Dasgupta, Kumar, Sivakumar; Buhrman'12, Garcia-Soriano, Matsliah, De Wolf'12]
• [Saglam, Tardos'13]
• [Braverman, Garg, Pankratov, Weinstein'13]

