40
Artur Czumaj DIMAP and Department of Computer Science University of Warwick Testing Cluster Structure of Graphs Joint work with Pan Peng and Christian Sohler (TU Dortmund)

Testing Cluster Structure of Graphs

  • Upload
    ingrid

  • View
    50

  • Download
    6

Embed Size (px)

DESCRIPTION

Testing Cluster Structure of Graphs. Artur Czumaj DIMAP and Department of Computer Science University of Warwick. Joint work with Pan Peng and Christian Sohler (TU Dortmund). TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: A A A A A A A A A A A A A. - PowerPoint PPT Presentation

Citation preview

Page 1: Testing Cluster Structure of Graphs

Artur CzumajDIMAP and Department of Computer Science

University of Warwick

Testing Cluster Structure of Graphs

Joint work with Pan Peng and Christian Sohler (TU Dortmund)

Page 2: Testing Cluster Structure of Graphs

Dealing with “BigData” in Graphs

• We want to process graphs quickly– Detect basic properties– Analyze their structure

• For large graphs, by “quickly” we often would mean: in time constant or sublinear in the size of the graph

Page 3: Testing Cluster Structure of Graphs

Dealing with “BigData” in Graphs

One approach:• How to test basic properties of graphs

in the framework of property testing

Page 4: Testing Cluster Structure of Graphs

Fast Testing of Graph Properties

• Does this graph have a clique of size 11?

• Does it have a given as its subgraph?

• Is this graph planar?• Is it bipartite?• Is it -colorable?• Does it have good

expansion?• Does it have good

clustering?from Fan Chung’s web page

Page 5: Testing Cluster Structure of Graphs

Clustering in graphs

• What is a good clustering?

from Fan Chung’s web page

Page 6: Testing Cluster Structure of Graphs

Clustering in graphs

• Same cluster: points are well-connected

• Different cluster: points are poorly connected

from Fan Chung’s web page

Page 7: Testing Cluster Structure of Graphs

Clustering in graphs

Sublinearalgorithms

parameterizedcomplexity

Page 8: Testing Cluster Structure of Graphs

Clustering in graphs

• Same cluster: points have high conductance

• Different cluster: points are separated by a cut

from Fan Chung’s web page

Page 9: Testing Cluster Structure of Graphs

-clustering

Graph is -clusterable if vertices of can be partitioned into at most sets such that for each :• each has large conductance• each set has low outer-conductance (small cut)

Page 10: Testing Cluster Structure of Graphs

Conductance

has maximum degree

For every , conductance of S is defined as:

Conductance of :

(minimum conductance of any possible subset of size at most )

Page 11: Testing Cluster Structure of Graphs

-clustering

is -clusterable if vertices of can be partitioned into with such that for each :• each has large conductance

• each set has low outer-conductance (small cut)

Page 12: Testing Cluster Structure of Graphs

-clustering

Notion of -clusterable graphs has been around informally for a while;formally introduced by Oveis Gharan and Trevisan 2014

Our goal:• we want to determine if is -clusterable• really fast

– Recognize cluster structure in sublinear time using random sampling

Page 13: Testing Cluster Structure of Graphs

Framework of property testing

• We cannot quickly give 100% precise answer• We need to approximate

• Distinguish graphs that have specific propertyfrom those that are far from having the property

Page 14: Testing Cluster Structure of Graphs

Framework

• Goal:Distinguish between the case when – graph has property P and – is far from having property P

• one has to change at least edges of to obtain a graph with property

We will consider graph of degree bounded by

Page 15: Testing Cluster Structure of Graphs

Goal

Design a sublinear-time algorithm that will distinguish between two cases:• -clusterable graphs• graphs that are -far from being -clusterable (with as

possible)

Page 16: Testing Cluster Structure of Graphs

Basic case : Testing expansion

-clusterable graphs for : expanders

• For graphs of bounded degree, we can distinguish expanders from graphs that are “far” even from poor expanders in time

[C, Sohler ’07, Kale, Seshadhri’07, Nachmias, Shapira’08]• time is needed

[Goldreich, Ron’02]

Page 17: Testing Cluster Structure of Graphs

Basic case : Testing expansion

• For graphs of bounded degree, we can distinguish expanders from graphs that are “far” even from poor expanders in time

[C, Sohler ’07, Kale, Seshadhri’07, Nachmias, Shapira’08]

• We are using basic properties of expanders: random walk of logarithmic length will mix (= will reach a random vertex)

• Similar to testing uniformity of a distribution

Page 18: Testing Cluster Structure of Graphs

Testing expansion

Idea:• If is an expander then end-nodes are random nodes

e we can estimate number of collisions well• If is far from expander then we will have many more

collisions– (requires non-trivial arguments)

Choose nodes at randomFor each chosen node run random walks of length

Count the number of collisions at the end-nodesIf the number of collisions is too large then RejectAccept

Page 19: Testing Cluster Structure of Graphs

Basic case : Testing expansion

• For graphs of bounded degree, we can distinguish expanders from graphs that are “far” even from poor expanders in time

[C, Sohler ’07, Kale, Seshadhri’07, Nachmias, Shapira’08]

• We can distinguish between a graph that is an -expander and any graph that is -far from any -expander in time

Page 20: Testing Cluster Structure of Graphs

Testing expansion and clustering

Can we apply similar approach to test -clusterability for ?

• We don’t know which vertex sets form expanders• We don’t know sizes of subgraphs-expanders

– If we knew, we could try to test distributions of endpoints of random walks …

• How to test small cuts?• We don’t know how to test distributions in time!

– We know this only for uniform distributions – but since we don’t know which vertices are in each set, we won’t get it …

Page 21: Testing Cluster Structure of Graphs

Testing expansion and clustering

Can we apply similar approach to test -clusterability for ?

Still, we will following the following key intuitions:• Randomly sample a constant number of points • Points in will define a “skeleton” of clusters• If two points will have same distribution of end-points of

random walks of logarithmic length are in same cluster• If two points are separated by a cut then they will have

different distribution of end-points of random walks

Page 22: Testing Cluster Structure of Graphs

Testing -clustering

• We would like to have the following algorithm:

Sample set of random verticesFor any

• distributions of endpoints of random walk of length starting at For each pair :

• if distributions , are close thenadd edge to “cluster graph” on vertex set

If H is a union of at most k cliques then AcceptElse Reject

Page 23: Testing Cluster Structure of Graphs

Testing -clustering

Challenges:• Too slow (testing if distribution are closed needs time)• How to analyze it …• We understand random walks in expanders, but we need to understand them also on the

rest of the graph

Sample set of random verticesFor any

• distributions of endpoints of random walk of length starting at

For each pair :• if distributions , are close then

add edge to “cluster graph” on vertex set If H is a union of at most k cliques then AcceptElse Reject

Page 24: Testing Cluster Structure of Graphs

Testing -clustering

Sample set of random verticesFor any

• multiset of endpoints of random walks of length starting at • number of pairwise self-collisions in

If there is with then abort and RejectFor each pair :

• If -distribution-test accepts that distributions of and are close then add edge to “cluster graph”

If H is a union of at most k cliques then AcceptElse Reject

𝜎=𝑂(𝑘6 ln (𝑘/𝜀 )6𝜀− 8)𝑟=𝑂 (𝑘2 (ln𝑘/𝜀 )5 /2√𝑛𝜀− 3)

ℓ=𝑂 (𝑘4 log𝑛𝜙𝑖𝑛− 2)

𝑠=𝑂 (𝑘 ln (𝑘+1 )𝜀−2)

Page 25: Testing Cluster Structure of Graphs

Testing -clustering

Sample set of random verticesFor any

• multiset of endpoints of random walks of length starting at • number of pairwise self-collisions in

If there is with then abort and RejectFor each pair :

• If -distribution-test accepts that distributions of and are close then add edge to “cluster graph”

If H is a union of at most k cliques then AcceptElse Reject

Why do we need to test if is not too large?Because then -distribution testing runs fast!

Page 26: Testing Cluster Structure of Graphs

Key theorem

• The algorithm accepts every -clusterable graph (of maximum degree ) with probability .

• The algorithm rejects every graph (of maximum degree ) that is -far from being -clusterable with probability , assuming that .

• Running time is

Page 27: Testing Cluster Structure of Graphs

Key properties (completeness)

• - vertex distribution of a random walk of length starting at • If is -clusterable then for any with and there is with such that for

large enough and , for any :

• For “short enough” , for any disjoint sets with , there exist , , such that for any

• If is -clusterable then there is with such that for large enough , for any :

Page 28: Testing Cluster Structure of Graphs

Key properties (completeness)

• Convergence within a cluster

How will the distribution of endpoints differ (for clusterable graphs)?For most of starting vertices :

We prove this using higher-order Cheeger’s inequality.Key challenge: even if expect to stay inside the cluster in a large fraction of random walks, we can sometime leave the cluster and then we don’t have “easy” control over the distribution of endpoints.

Page 29: Testing Cluster Structure of Graphs

Key properties (completeness)

• - vertex distribution of a random walk of length starting at • If is -clusterable then for any with and there is with such that for

large enough and , for any :

• For “short enough” , for any disjoint sets with , there exist , , such that for any

• If is -clusterable then there is with such that for large enough , for any :

Page 30: Testing Cluster Structure of Graphs

Key properties (completeness)

• Convergence outside a cluster

How will the distribution of endpoints differ?For most of starting vertices :

Page 31: Testing Cluster Structure of Graphs

Key properties (completeness)

• - vertex distribution of a random walk of length starting at • If is -clusterable then for any with and there is with such that for

large enough and , for any :

• For “short enough” , for any disjoint sets with , there exist , , such that for any

• If is -clusterable then there is with such that for large enough , for any :

Page 32: Testing Cluster Structure of Graphs

Key properties (completeness)

• Bound for distribution for clusterable graphs

Can the distribution vector of endpoints have big values?For most of starting vertices :

Page 33: Testing Cluster Structure of Graphs

Key properties (completeness)

• - vertex distribution of a random walk of length starting at • If is -clusterable then for any with and there is with such that for

large enough and , for any :

• For “short enough” , for any disjoint sets with , there exist , , such that for any

• If is -clusterable then there is with such that for large enough , for any :

Page 34: Testing Cluster Structure of Graphs

Key properties (completeness)

With these properties:• If is clusterable then the cluster graph will consist of at

most disjoint subgraphs, each forming a clique

Page 35: Testing Cluster Structure of Graphs

Key properties (soudness)

• If is -far from -clusterable with then there are disjoint subsets of such that for each :

and

Page 36: Testing Cluster Structure of Graphs

Testing -clustering

Sample set of random verticesFor any

• multiset of endpoints of random walks of length starting at • number of pairwise self-collisions in

If there is with then abort and RejectFor each pair :

• If -distribution-test accepts that distributions of and are close then add edge to “cluster graph”

If H is a union of at most k cliques then AcceptElse Reject

Page 37: Testing Cluster Structure of Graphs

Key properties (soudness)

• If is -far from -clusterable with then there are disjoint subsets of such that for each :

and

With this property, if is -far from clusterable then the cluster graph will have more than k components

Page 38: Testing Cluster Structure of Graphs

Key theorem

• The algorithm accepts every -clusterable graph (of maximum degree ) with probability .

• The algorithm rejects every graph (of maximum degree ) that is -far from being -clusterable with probability , assuming that .

• Running time is

Page 39: Testing Cluster Structure of Graphs

Extensions

Since -clustering is related to some properties of smallest eigenvalues of the relevant Laplacian matrix:• We can recognize graphs with a (large enough) gap

between the kth and k+1st smallest eigenvalue.

Page 40: Testing Cluster Structure of Graphs

Conclusions

Clustering (or clusterability) can be tested fast• by comparing distributions of random walks • drawing conclusions from the distributions

Tools:• Random sampling• Random walks• Spectral analysis