Beyond Set Disjointness: The Communication Complexity of Finding the
Intersection
Grigory Yaroslavtsev, http://grigory.us
Joint with Brody, Chakrabarti, Kondapally and Woodruff
Communication Complexity [Yao’79]
Alice: $x$, Bob: $y$
$f(x, y) = ?$
Shared randomness
… $f(x, y)$
• $R(f)$ = min. communication (bounded error)
• $R^{r}(f)$ = min. $r$-round communication (bounded error)
Set Intersection
$x = S$, $y = T$, $f(x, y) = S \cap T$
$S \subseteq [n]$, $|S| \le k$; $T \subseteq [n]$, $|T| \le k$
$S \cap T = ?$
$R(k\text{-Intersection}) = ?$
$k$ is big, $n$ is huge, where huge $\gg$ big
Our results
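Let $\mathrm{ilog}^{r} k = \underbrace{\log \log \cdots \log}_{r \text{ times}} k$
• $R(k\text{-Intersection}) = \Theta(k)$ [Brody, Chakrabarti, Kondapally, Woodruff, Y.; PODC’14]
• $R^{r}(k\text{-Intersection}) = \Theta(k \, \mathrm{ilog}^{r} k)$ [Saglam-Tardos FOCS’13; Brody, Chakrabarti, Kondapally, Woodruff, Y.; RANDOM’14]
$R^{r}(k\text{-Intersection}) = O(k)$ for $r = O(\log^{*} k)$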
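The iterated logarithm in these bounds can be made concrete with a small sketch. The function names `ilog` and `log_star` and the choice of base 2 are assumptions made here for illustration; the slides do not fix either.

```python
import math

def ilog(r: int, k: float) -> float:
    """Apply log2 to k exactly r times; assumes every intermediate value stays positive."""
    value = float(k)
    for _ in range(r):
        value = math.log2(value)
    return value

def log_star(k: float) -> int:
    """Count how many times log2 must be applied before the value drops to at most 1."""
    count = 0
    value = float(k)
    while value > 1.0:
        value = math.log2(value)
        count += 1
    return count
```

For example, `ilog(2, 16)` applies the logarithm twice, and `log_star(65536)` counts the steps 65536, 16, 4, 2, 1.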
Applications
• Exact Jaccard index (for an approximation, use MinHash [Broder’98; Li-König’11; Pagh-Stöckel-Woodruff’14])
• Rarity, distinct elements, joins, …
• Multi-party set intersection (later)
• Contrast:
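As a small illustration of the Jaccard application: once the exact intersection is known, the exact Jaccard index follows from the set sizes alone. This is a minimal sketch; the function name `jaccard_index` and the convention for two empty sets are assumptions, not part of the talk.

```python
def jaccard_index(s: set, t: set, intersection: set) -> float:
    """Exact Jaccard index |S ∩ T| / |S ∪ T|, using |S ∪ T| = |S| + |T| - |S ∩ T|."""
    union_size = len(s) + len(t) - len(intersection)
    if union_size == 0:
        return 1.0  # convention chosen here for two empty sets
    return len(intersection) / union_size
```

Only the sizes of the two sets and of the intersection are needed, so each player can compute the index locally once the intersection and the other player's set size are known.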
1-round $O(k \log k)$-protocol
$h : [n] \to [k^3]$
Alice: $S \subseteq [n] \mapsto h(S) \subseteq [k^3]$
Bob: $T \subseteq [n] \mapsto h(T) \subseteq [k^3]$
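The one-round idea can be simulated in a few lines. This is a sketch under assumptions: the function name `one_round_intersection` is hypothetical, the shared random hash into a range of size k^3 is modeled by a lazily sampled random table seeded by `seed`, and elements are assumed to be integers. A false positive needs some element of T outside S to collide with an element of S, which a union bound puts at probability at most |S||T|/k^3, so at most 1/k when both sets have size at most k.

```python
import random

def one_round_intersection(s: set, t: set, k: int, seed: int) -> set:
    """Simulate one round: Alice sends h(S); Bob keeps the elements of T whose hash was sent."""
    rng = random.Random(seed)          # stands in for the shared randomness
    table = {}                         # lazily sampled random function into range(k**3)

    def h(x):
        if x not in table:
            table[x] = rng.randrange(k ** 3)
        return table[x]

    message = {h(x) for x in sorted(s)}                 # Alice -> Bob, about 3*log2(k) bits per element
    return {y for y in sorted(t) if h(y) in message}    # Bob's output
```

The output always contains the true intersection and is always a subset of Bob's set; only false positives from hash collisions are possible.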
Hashing
$h : [n] \to [k / \log k]$
$k / \log k$ = # of buckets
Expected # of elements per bucket: $\log k$
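The bucketing step can be sketched as follows. The helper names `shared_hash` and `split_into_buckets` are hypothetical, and a SHA-256 digest is used only as a deterministic stand-in for a shared random hash function; the talk does not specify a hash family.

```python
import hashlib
import math

def shared_hash(x: int, seed: int, m: int) -> int:
    """Deterministic stand-in for a shared random hash function into range(m)."""
    digest = hashlib.sha256(f"{seed}:{x}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % m

def split_into_buckets(items: set, k: int, seed: int) -> list:
    """Place each item into one of about k / log2(k) buckets (assumes k >= 2)."""
    num_buckets = max(1, round(k / math.log2(k)))
    buckets = [set() for _ in range(num_buckets)]
    for x in items:
        buckets[shared_hash(x, seed, num_buckets)].add(x)
    return buckets
```

Because both players use the same hash, a common element always lands in the same bucket index on both sides, so the intersection can be computed bucket by bucket.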
Secondary Hashing
One hash function per bucket, each with range $[\log^3 k]$
$k / \log k$ = # of hash functions
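2-round $O(k \log \log k)$-protocol
Each bucket is hashed into $[\log^3 k]$ by its $h_i$
$|h_i(S)|, |h_i(T)| = O(\log k \log \log k)$ bits
Total communication = $\frac{k}{\log k} \cdot O(\log k \log \log k) = O(k \log \log k)$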
Collisions
$k / \log k$ buckets, each hashed into $[\log^3 k]$
$\Pr[\text{collision}] = O\left(\frac{1}{\log k}\right)$
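One way to see where a bound of this form can come from is a union bound over pairs inside a single bucket. The derivation below is a sketch that assumes both players hold $O(\log k)$ elements in the bucket (the expected load) and that the bucket's hash behaves like a uniformly random function into $[\log^3 k]$; the names $S_i$, $T_i$ for the bucket contents and $h_i$ for its hash are introduced here for illustration.

```latex
\Pr[\text{collision in bucket } i]
  \le \sum_{x \in S_i} \sum_{\substack{y \in T_i \\ y \ne x}} \Pr[h_i(x) = h_i(y)]
  \le \frac{|S_i| \, |T_i|}{\log^3 k}
  = \frac{O(\log k) \cdot O(\log k)}{\log^3 k}
  = O\!\left(\frac{1}{\log k}\right)
```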
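Collisions
Per bucket, both players hash into $[\log^3 k]$
Key fact: if Alice's and Bob's candidate intersections for a bucket are equal, then they also equal the true intersection of that bucket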
Collisions
• Second round:
  – For each bucket, send an $O(\log k)$-bit equality check (total $O(k)$ communication)
  – Correct intersection computed in buckets where the equality check passes
  – Expected # items in incorrect buckets: $O(k / \log k)$
  – Use the 1-round protocol for incorrect buckets
  – Total communication: $O(k \log \log k)$
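The bucketed protocol with per-bucket equality checks and a fallback can be simulated as below. This is a sketch under several assumptions: the names `shared_hash` and `two_round_intersection` are hypothetical, SHA-256 stands in for shared random hash functions, the equality check is simulated by comparing the two candidate sets directly, and the simulation does not track rounds or count bits.

```python
import hashlib
import math

def shared_hash(x, seed, m):
    """Deterministic stand-in for a shared random hash function into range(m)."""
    digest = hashlib.sha256(f"{seed}:{x}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % m

def two_round_intersection(s, t, k, seed=0):
    """Return (output set, number of buckets whose equality check failed)."""
    log_k = max(1.0, math.log2(k))
    num_buckets = max(1, round(k / log_k))
    small_range = max(1, round(log_k ** 3))       # secondary hashes map into about log^3 k values
    output, failed = set(), 0
    for i in range(num_buckets):
        s_i = {x for x in s if shared_hash(x, seed, num_buckets) == i}
        t_i = {y for y in t if shared_hash(y, seed, num_buckets) == i}
        h_i = lambda x, i=i: shared_hash(x, f"{seed}/{i}", small_range)
        hs, ht = {h_i(x) for x in s_i}, {h_i(y) for y in t_i}   # hashed bucket contents
        alice = {x for x in s_i if h_i(x) in ht}  # Alice's candidate intersection
        bob = {y for y in t_i if h_i(y) in hs}    # Bob's candidate intersection
        if alice == bob:                          # equality check, simulated exactly
            output |= alice                       # equal candidates are the true intersection
        else:
            failed += 1                           # fall back to hashing into a range of size k^3
            sent = {shared_hash(x, f"{seed}/fb{i}", k ** 3) for x in s_i}
            output |= {y for y in t_i if shared_hash(y, f"{seed}/fb{i}", k ** 3) in sent}
    return output, failed
```

Both candidates always contain the bucket's true intersection, and each is a subset of its owner's bucket, which is why equal candidates must be exactly the intersection.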
Main protocol
$h : [n] \to [k]$
$k$ = # of buckets
Expected # of elements per bucket: $O(1)$
Verification tree
Degree: … $\mathrm{ilog}^{r-1} k$
Buckets = leaves of the verification tree
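Verification bottom-up
[Figure: a parent node of the verification tree with two children $u$ and $v$. The children hold $(S_u, T_u)$ and $(S_v, T_v)$, with computed intersections $S_u \cap T_u$ and $S_v \cap T_v$. The parent holds $(S_u \cup S_v, T_u \cup T_v)$ and runs an EQUALITY CHECK on $(S_u \cup S_v) \cap (T_u \cup T_v)$.]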
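Verification bottom-up
[Figure: two copies of the same subtree, with children $S_u \cap T_u$ and $S_v \cap T_v$ and parent $(S_u \cup S_v) \cap (T_u \cup T_v)$. In each copy the children are labeled Correct and Incorrect; the parent is labeled Incorrect in the first copy and Correct in the second.]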
Verification bottom-up
[Figure: the same two subtrees, with children labeled Correct and Incorrect. In the first copy: EQUALITY CHECK FAILS => RESTART THE SUBTREE. In the second copy the parent is labeled Correct.]
Verification bottom-up
[Figure: a node of the verification tree and its children, each child holding a pair of sets $(S, T)$.]
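The bottom-up verification with restarts can be sketched as a recursion over the leaves (buckets). Everything below is an illustrative assumption rather than the parameters of the talk: the names `shared_hash`, `leaf_candidates` and `solve_subtree`, the fixed tree degree, the small leaf hash range, and the rule that a node at depth `level` uses a fingerprint of `48 // (level + 1)` bits, so checks closer to the root are more reliable.

```python
import hashlib

def shared_hash(obj, seed, m):
    """Deterministic stand-in for a shared random hash function into range(m)."""
    digest = hashlib.sha256(f"{seed}:{obj}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % m

def leaf_candidates(s_i, t_i, seed, hash_range=64):
    """Basic intersection attempt in one bucket; wrong when a hash collision occurs."""
    hs = {shared_hash(x, seed, hash_range) for x in s_i}
    ht = {shared_hash(y, seed, hash_range) for y in t_i}
    alice = {x for x in s_i if shared_hash(x, seed, hash_range) in ht}
    bob = {y for y in t_i if shared_hash(y, seed, hash_range) in hs}
    return alice, bob

def solve_subtree(leaves, degree, level, seed):
    """Return (alice, bob) candidates for a subtree once its equality check passes."""
    attempt = 0
    while True:
        tag = f"{seed}/{level}/{attempt}"         # fresh randomness for every restart
        if len(leaves) == 1:
            alice, bob = leaf_candidates(*leaves[0], tag)
        else:
            size = -(-len(leaves) // degree)      # ceiling division: leaves per child
            alice, bob = set(), set()
            for c in range(0, len(leaves), size):
                a, b = solve_subtree(leaves[c:c + size], degree, level + 1, f"{tag}/{c}")
                alice |= a
                bob |= b
        modulus = 2 ** max(2, 48 // (level + 1))  # fingerprint length for this level
        if shared_hash(sorted(alice), tag, modulus) == shared_hash(sorted(bob), tag, modulus):
            return alice, bob                     # equality check passed
        attempt += 1                              # check failed: restart the subtree
```

A possible usage is to build `leaves` as a list of `(S_i, T_i)` bucket pairs and call `solve_subtree(leaves, 2, 0, seed)`. Cheap checks near the leaves may pass incorrectly, and the stronger checks higher up are what catch those cases and trigger a restart.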
Analysis of Stage $i$
• Track the probability that a node at stage $i$ is computed correctly
• Run equality checks and basic intersection protocols with the success probability set for stage $i$
  – Key lemma: a bound on the expected # of restarts per leaf => a bound on the cost of Intersection in the leaves
  – Cost of Equality
• Probability that the protocol succeeds
Multi-party extensions
$m$ players, each holding a set of size at most $k$
• Reduce the error probability of the 2-player protocol
• Bound on average communication per player and on rounds (using a coordinator)
• Bound on worst-case communication per player and on rounds (using a tournament)
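The tournament structure can be illustrated with a short sketch in which players are paired up level by level and each pair is merged by a two-party intersection routine. The function name `tournament_intersection` and the `intersect` callback are hypothetical; the default callback computes the intersection directly instead of running a communication protocol, so the sketch shows only the pairing pattern and not the communication or round bounds from the talk.

```python
def tournament_intersection(sets, intersect=lambda a, b: a & b):
    """Merge the players' sets pairwise, level by level; assumes at least one player.

    Returns (intersection of all sets, number of tournament levels used).
    """
    current = list(sets)
    levels = 0
    while len(current) > 1:
        merged = [intersect(current[i], current[i + 1])
                  for i in range(0, len(current) - 1, 2)]
        if len(current) % 2 == 1:
            merged.append(current[-1])   # the unpaired player advances unchanged
        current = merged
        levels += 1
    return current[0], levels
```

With this pairing, no player takes part in more than one two-party run per level, and the number of levels grows logarithmically in the number of players.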
Open Problems
• ($k$-Intersection) = ?
• Better protocols for the multi-party setting?
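$k$-Disjointness
• $f(S, T) = 1$ iff $S \cap T = \emptyset$
• $R(k\text{-Disjointness}) = \Theta(k)$ [Razborov’92; Hastad-Wigderson’96]
• $R^{1}(k\text{-Disjointness}) = \Theta(k \log k)$ [Folklore + Dasgupta, Kumar, Sivakumar’12; Buhrman, Garcia-Soriano, Matsliah, De Wolf’12]
• $R^{r}(k\text{-Disjointness}) = \Theta(k \, \mathrm{ilog}^{r} k)$ [Saglam, Tardos’13]
• [Braverman, Garg, Pankratov, Weinstein’13]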