Community detection in stochastic block models via ...stochnet/talks/Massoulie16.pdfSupporting data: e.g. OSN’s friendship graph Variation: NSA’s “co-traveler programme” spot

Community detectionin

stochastic block modelsvia

spectral methods

Laurent Massoulié (MSR-Inria Joint Centre, Inria)

based on joint works with:

Dan Tomozei (EPFL), Marc Lelarge (Inria),

Jiaming Xu (UIUC), Charles Bordenave (IMT)

Community Detection

2

Embedding space

Clustering of items based on their observed interactions Typical approach: embedding (eg in Euclidean space), then K-means

Supporting data: e.g. OSN’s friendship graph

Variation: NSA’s “co-traveler programme” spot groups of suspect persons meeting regularly in unusual places

Application 1: contact recommendation in online social networks

recommend members of user’s implicit community

Supporting data: {user-item} matrixExample: Netflix prize {user-movie} ratings matrix

Use item communities to support recommendation “users who liked this also liked…”

Application 2: item recommendation to users

User / Movie 𝑓1 𝑓2 … 𝑓𝑚

𝑢1 ? ** ***

𝑢2 ? ?

…

𝑢𝑛 ***** ** **

Application 3: categorizing chemical reactivesin biology

Basic data: sets of chemicals jointly involved in specific reaction

More generally

Knowledge graph as generic representation of data

A1 has with B1 interaction of type C1

A2 has with B2 interaction of type C2

…

Outline

• The Stochastic Block Model • With labels

• With general types

• Performance of Spectral Methods • “rich signal” case

• The weak signal case: sparse observations• Phase transition on detectability

• Non-backtracking matrix and the Spectral Redemption

The Stochastic Block Model, aka planted partition model [Holland-Laskey-Leinhardt’83]• n “nodes” partitioned into r categories

• Category 𝜎: 𝛼𝜎 𝑛 nodes

• Edge between nodes u,v present

with probability 𝑠 × 𝑏𝜎 𝑢 𝜎 𝑣 /𝑛

(𝑠 : average degree, or signal strength)

Observation:

adjacency matrix A= + noise

Community Detection: provide clustering 𝜎 based on A that is correlated as much as possible with true planted clustering 𝜎, i.e. maximizing

Overlap:=Max𝜋

1

𝑛

𝑢=1

𝑛

1𝜋( 𝜎 𝑢 )=𝜎 𝑢 − 𝑀𝑎𝑥𝜎𝜖 𝑟 𝛼𝜎

The Labeled Stochastic Block Model

• Edges (u-v) labeled by Luv L (finite set)

• Drawn from distribution 𝜇𝜎 𝑢 𝜎(𝑣)

• Netflix case: labels 1-5 stars

The SBM with general types[Aldous’81; Lovász’12]• User type 𝜎(u) i.i.d. ~𝑃 in general set (e.g. uniform on [0,1])

• Edge (u-v) present w.p. 𝑏𝜎 𝑢 𝜎 𝑣 𝑠/𝑛 for “kernel” b

e.g.



• Technical assumptions: compact type set and continuity of symmetric functions 𝑏 and 𝜇

-1 +1

F(x)

x

yxFb yx ,

Outline






Spectral Clustering

• From Matrix A extract R normed eigenvectors 𝑥𝑖 corresponding to R largest eigenvalues 1 ≥ ⋯ ≥ 𝑅

• Form R-dimensional node representatives𝑦𝑢 = 𝑛(𝑥𝑖(𝑢))

𝑖=1…𝑅

• Group nodes u according to proximity of spectral representatives 𝑦𝑢

Karl Pearson’s PCA:

“On Lines and Planes of Closest Fit

to Systems of Points in Space", 1901

Illustration for R=2

-0.05 -0.04 -0.03 -0.02 -0.01 0-0.03

-0.02

-0.01

0

0.01

0.02

0.03

Principal component 1

Prin

cip

al co

mp

on

en

t 2

Clustering from SVD

SBM with K=4Netflix dataset

Result for “logarithmic” signal strength s

Assume s=(log(n)) and clusters are distinguishable, i.e.

∀𝜎 ≠ 𝜎′ ∃τ such that 𝑏𝜎τ ≠ 𝑏𝜎′τ

Then spectrum of A consists of • R eigenvalues i of order (s) (R K) and

• n-R eigenvalues i of order 𝑂 𝑠

Where 𝑅 = rank 𝑏𝑢𝑣𝛼𝑣 1≤𝑢,𝑣≤𝐾

Node representatives 𝑦𝑢 based on top R eigenvectors 𝑥𝑖 :Cluster according to underlying “blocks” except for negligible fraction of nodes

Proof arguments

Control spectral radius of noise matrix

+ perturbation of matrix eigen-elements

A = + random “noise” matrix

Block matrix non-zero eigenvalues: (s)

Controlling perturbation of eigen-elements of Hermitian matrices

𝐴 = 𝐴 + 𝑊

Where 𝐴 : observed; 𝐴 : signal; 𝑊 : perturbation

Perturbation of eigenvalues: order 1 ≥ ⋯ ≥ 𝑛 , 1 ≥ ⋯ ≥ 𝑛

Then 𝑖 − 𝑖 ≤ 𝜌 𝑊 , 𝑖 = 1,…𝑛 (Weyl)

(a simple consequence of Courant-Fisher’s minimax theorem:

For Hermitian 𝑛 × 𝑛 matrix 𝐻 :

𝑖 𝐻 = maxdim 𝑉 =𝑖

min𝑣∈𝑉, 𝑣 =1

𝑣∗𝐻𝑣

𝑖 𝐻 = mindim 𝑉 =𝑛−𝑖+1

max𝑣∈𝑉, 𝑣 =1

𝑣∗𝐻𝑣

Controlling perturbation of eigen-elements of Hermitian matrices

𝐴 = 𝐴 + 𝑊

Where 𝐴 : observed; 𝐴 : signal; 𝑊 : perturbation

Perturbation of eigenvectors: order 1 ≥ ⋯ ≥ 𝑛 , 1 ≥ ⋯ ≥ 𝑛

Let ∆≔ min𝑖,𝑗: 𝑖≠

𝑗

𝑖 − 𝑗 . If 𝜌 𝑊 < ∆/2

Then for all i and for any normed eigenvector 𝑥𝑖 ↔ 𝑖 , there exists 𝑥𝑖↔ 𝑖 s.t.

𝑥𝑖 ∙ 𝑥𝑖≥ 1 −𝜌 𝑊

∆ − 𝜌 𝑊

2

( A simple version of [Chandler Davis-Kahane’70] )

Application to SBM

Spectrum of 𝐴 : spectrum of 𝑏𝑢𝑣𝛼𝑣 1≤𝑢,𝑣≤𝐾 scaled by 𝑠 , hence ∆= 𝑠

Announced result follows if 𝜌 𝑊 = 𝑜 𝑠

Ramanujan graphs [Lubotzky-Phillips-Sarnak’88]

s-regular graphs such that eigenvalues of adjacency matrix verify

:=max𝑘: 𝑘 < 𝑠 𝑘 ≤ 2 𝑠 − 1

The best possible spectral gap [Alon-Boppana’91]

≥ 2 𝑠 − 1 − 𝑂𝑠

𝑑𝑖𝑎𝑚 𝐺

spectral separation properties “à la Ramanujan”

[Friedman’08] random s-regular graph with 𝑠 ≥ 3 verifies whp

=2 𝑠 − 1 + o(1)

[Feige-Ofek’05]: for Erdős-Rényi graph 𝐺 𝑛, 𝑠/𝑛 and 𝑠 = log 𝑛 , then whp = 𝑂 𝑠

Corollary: for SBM, ρ 𝐴 − 𝐴 = 𝑂 𝑠 when 𝑠 = log 𝑛

spectral separation properties “à la Ramanujan”

Consequence: in SBM with 𝑠 = log 𝑛 , whp

ρ 𝐴 − 𝐴 = 𝑂 𝑠 𝐴’s leading eigen-elements close to those of 𝐴

For 𝑠 = 1 , 𝜌 𝐴 − 𝐴 ~𝐶log 𝑛

log log 𝑛

spectral separation is lost

standard spectral method needs fixing to work in low signal regime

Discrepancy between SBM with small K and Netflix

Eigenvalue distributions

SBM with K=4 Netflix (subset)

motivates consideration of SBM with general types

4 outstandingeigenvalues

SBM with general types

• User types (u) i.i.d. ~𝑃 from general set (e.g. uniform on [0,1])

• Edge (u-v) present w.p. 𝑏𝜎 𝑢 𝜎 𝑣 𝑠/𝑛 for “kernel” b

e.g.



Form matrix {𝐴𝑖𝑗𝑊(𝐿𝑖𝑗)} from random projections 𝑊 𝑙 of labels

-1 +1

F(x)

x

yxFb yx ,

SBM with general types:Spectral properties for logarithmic 𝑠

Define kernel 𝐾(𝑥, 𝑦) ≔ 𝑙 𝑊 𝑙 𝑏𝑥𝑦𝜇𝑥𝑦 𝑙 and integral operator 𝑇𝑓 𝑥 ≔ 𝐾 𝑥, 𝑦 𝑓(𝑦)𝑃 𝑑𝑦

spectrum of 𝑠−1{𝐴𝑖𝑗𝑊(𝐿𝑖𝑗)} ≈ spectrum of 𝑇

Eigenvalue convergence: 𝑠−1𝑖𝑛

→ 𝑖

Eigenvector convergence: 𝑥𝑖(𝑢) → 𝜑𝑖 𝜎𝑢

Associated eigen-functionn

Type of node u

SBM with general types:Spectral properties for logarithmic 𝑠Flexible model

-power-law spectra (convolution operator + Fourier analysis)

-better matches to

Netflix data

-1 +1

F(x)

x

yxFb yx ,

SBM with general types:estimation for logarithmic 𝑠

For fixed R form R-dimensional node representatives 𝑦𝑢 = 𝑛𝑘

1𝑥𝑘(𝑢)

𝑘=1…𝑅

Embedding allows consistent estimation of label distributions

Accuracy controlled by ε and ε𝑅 ≔ 𝑘>𝑅 𝑘2

Prob(label(i,j)=5)?

Node i

Node j

Use empirical distribution of labels L(i,k) for k in -neighborhood of j

Outline






𝑠0: threshold for Giant component

Weak signal strength: s = (1)

• Correct classification of all but negligible fraction of nodes impossible (isolated nodes…)

Performance of clustering 𝜎 assessed by overlap metric:

Overlap:=Max𝜋

1

𝑛

𝑢=1

𝑛

1𝜋( 𝜎 𝑢 )=𝜎 𝑢 − 𝑀𝑎𝑥𝜎𝜖 𝑟 𝛼𝜎

Signal strength s

Overlap

Signal strength s

Overlap

𝑠𝑐𝑠0

Positively correlated community detection

Symmetric two-communities scenario: 𝐸 𝐴 =

Conjecture ([Decelle-Krzakala-Moore-Zdeborova 2011]:

• For 𝜏: =𝑎−𝑏 2

2 𝑎+𝑏< 1 , correlation tends to zero for any 𝜎

Proven by [Mossel-Neeman-Sly 2012]

• For 𝜏 > 1 , positive correlation can be achieved

Proven by [LM 2013] and [Mossel-Neeman-Sly 2013]

“Spectral redemption” conjecture [Krzakala-Moore-Mossel-Neeman-Sly-Zdeborova-Zhang 2013]):

Spectral methods based on non-backtracking matrix can achieve positively correlated detection

a/n b/n

a/nb/n

n/2

n/2

Non-backtracking matrix 𝐵 of graph 𝐺 = (𝑉, 𝐸)

Defined on oriented edges 𝑢𝑣 for u, 𝑣 𝜖𝐸 : 𝐵𝑢𝑣,𝑥𝑦 = 1𝑣=𝑥1𝑢≠𝑦

Asymmetric, such that 𝐵𝑒𝑓𝑘 = number of non-backtracking paths

on G of length k+1 starting at e and ending at f

u v ye f

u ve

f

Non-backtracking matrix 𝐵 of graph 𝐺 = (𝑉, 𝐸)

Defined on oriented edges 𝑢𝑣 for u, 𝑣 𝜖𝐸 : 𝐵𝑢𝑣,𝑥𝑦 = 1𝑣=𝑥1𝑢≠𝑦

Ihara-Bass formula [Ihara 66]for A: adjacency matrix, D: diagonal matrix of node degrees,

1 − 𝑢2 𝑛−𝑚Det 𝐼 − 𝑢𝐵 = 𝐷𝑒𝑡 𝐼 − 𝑢𝐴 + 𝑢2 𝐷 − 𝐼

Hence s-regular graph Ramanujan iff eigenvalues 𝑘 of B verify

′ ≔ max𝑘: 𝑘 <1

𝑘 ≤ 1

A definition that extends to non-regular graphs

([Stark-Terras 96]’s graph theory Riemann hypothesis)

Notations and assumptions

Let 𝜇1, … , 𝜇𝑟 , 𝜇1 ≥ ⋯ ≥ 𝜇𝑟 be the leading eigenvalues of E 𝐴

and 𝑥𝑖 corresponding eigenvectors in 𝑅𝑛

Let 𝑦𝑖: 𝑦𝑖 𝑒 = 𝑥𝑖 𝑒1 , 𝑦𝑖 ∈ 𝑅𝑛 𝑛−1 : lift of eigenvector 𝑥𝑖 to 𝑅𝑛 𝑛−1

In symmetric two-community case, 𝜇 ‘s: eigenvalues of 𝑎/2 𝑏/2𝑏/2 𝑎/2

𝜇1 =𝑎+𝑏

2, 𝜇2=

𝑎−𝑏

2(hence 𝜏 =

𝜇22

𝜇1)

Assume constant per-community average degree: ∀𝑗 ∈ 𝑟 , 𝑖=1𝑟 𝛼𝑖𝑏𝑖𝑗 ≡ 𝜇1

Notations and assumptions

Call eigenvalue 𝜇𝑖 “visible” if 𝜇𝑖 > 𝜇1

Let 𝑟0: 𝜇𝑟0 > 𝜇1 ≥ 𝜇𝑟0+1 : index of smallest “visible” eigenvalue

Kesten-Stigum condition:

𝑟0> 1, i.e. there is some 𝑖∗ > 1 such that 𝜇𝑖∗ is “visible”

In symmetric two-community case, Kesten-Stigum condition:

𝜇22 > 𝜇1 or equivalently, 𝜏 =

𝜇22

𝜇1> 1

Main result: spectrum of non-backtracking matrix for stochastic block model

With high probability as 𝑛 → ∞, eigenvalues 𝑖 of 𝐵 satisfy

𝑖 ≤ 𝑟0 𝑖 − 𝜇𝑖 = 𝑜 1𝑖 > 𝑟0 𝑖 ≤ 𝜇1 + o 1

For 𝑖 ≤ 𝑟0 , if 𝜇𝑖 has multiplicity 1, corresponding eigenvector:

asymptotically parallel to 𝐵𝑘 𝐵𝑘 ′𝑦𝑖 for 𝑘~ log 𝑛

Illustration for 2-community symmetricStochastic block model

𝑖∗

Corollary 1: proof of Spectral Redemption Conjecture

Assume 𝑟0> 1, equal community sizes 𝛼𝑖≡1

𝑟and 𝜇𝑖∗ has multiplicity 1 for

some 𝑖∗: 1 < 𝑖∗ ≤ 𝑟0

Then positive correlation obtained by following procedure

1) Extract eigenvector 𝑧𝑖∗ of 𝐵

2) Form signs s 𝑒 = 𝑠𝑔𝑛 𝑧𝑖∗ 𝑒 − τ

3) To each node 𝑢 assign s 𝑢 = 𝑠(𝑒) for edge 𝑒 picked uniformly amonggraph edges with 𝑒1 = 𝑢

4) Assign community 𝜎 𝑢 at random according to distribution 𝜋 s 𝑢

ue

e’

e’’

Illustration for Erdős-Rényi graph with 𝑛 =500, 𝛼 = 4

Corollary 2:Erdős-Rényi graphs are nearly Ramanujan

For Erdős-Rényi graph 𝐺𝛼

𝑛, 𝑛 , spectrum 𝑖 of associated non-

backtracking matrix 𝐵 satisfies with high probability as 𝑛 → ∞

max𝑘: 𝑘 <1𝑘 ≤ 1 + o(1)

An approximate version of

max𝑘: 𝑘 <1𝑘 ≤ 1

Property generalizes notion of Ramanujan graph to non-regular case

[Ihara’66, Stark-Terras’96]

Hence: Corollary 2 a « non-regular » version of Friedman’s result for regular graphs:

most random graphs (with given number of edges) approximatelyRamanujan according to extended definition

Proof strategy

Key step: for 𝑘~ log𝑛 prove matrix expansion 𝐵𝑘 = 𝑖=1𝑟 𝑖

𝑘 𝑣𝑖𝑤𝑖′ + 𝑅𝑘

where 𝑖~𝜇𝑖 (near-eigenvalue), 𝑣𝑖 ∝ 𝐵𝑘 𝐵𝑘 ′𝑦𝑖 (near- right eigenvector),

𝑤𝑖 ∝ 𝐵𝑘 ′𝑦𝑖 (near- left eigenvector), 𝑅𝑘 a perturbation

and with high probability

𝑖 ≠ 𝑗 𝑣𝑖′𝑣𝑗 , 𝑣𝑖

′𝑤𝑗 , 𝑤𝑖′𝑤𝑗 = 𝑜 1 , (near-orthonormal system)

𝑣𝑖′𝑤𝑖 > 𝑐 > 0 (low condition number)

𝜌 𝑅𝑘 = 𝑂 𝜇1𝑘 (negligible perturbation)

Yields the result, after leveraging Bauer-Fike theorem

(a tool to control impact of additive perturbation on eigenvalues of matrix not necessarily symmetric)

Low-rank

Proof elements (perturbations negligible)

To show 𝜌 𝑅𝑘 = 𝑂 𝜇1𝑘 , establish « small norm in complement »:

𝑠𝑢𝑝 𝑥 =1,𝑥′𝑤𝑖=0,𝑖=1,…,𝑟 𝐵𝑘𝑥 = 𝑂 𝜇1𝑘

Challenge: directions 𝑤𝑖 random and correlated with 𝐵𝑘

Remaining mysteries about SBM’s (1)

Conjectured “phase diagram” for more than 2 blocks

(assuming fixed inter-community parameter b)

Intra-communityparameter a

Number ofcommunities r

Above Kesten-Stigum thresholdDetection “easy”(spectral methods or BP)

Detection hard but feasible(how? In polynomial time?)

Detection infeasible

r=4 r=5

Remaining mysteries about SBM’s (2)

Clique detection problem: add a size-K clique to random graph with edge-probability ½

i.e. a 2-block SBM with unbalanced

block sizes:

for 𝐾 = 𝑛 clique easily detectable (e.g. inspection of node degrees)

are there polynomial-time algorithms for smaller yet large K?

(e.g. 𝐾 = 3 𝑛 )

A notoriously hard problem (“planted clique detection” recently proposed as a new benchmark of algorithmic hardness)

K

n-K½

½

½

Conclusions and Outlook

Variations of basic spectral methods still to be invented: interesting mathematics and practical relevance

Detection in SBM = rich playground for analysis of computational complexity with methods of statistical physics

Computationally efficient methods for “hard” cases (planted clique, intermediate phase for multiple communities)?

Non-regular Ramanujan graphs: theory still in its infancy (proper analogue of Alon-Boppana’s theorem missing)

Thanks!

Documents

Community detection in stochastic block models via ...stochnet/talks/Massoulie16.pdfSupporting data: e.g. OSN’s friendship graph Variation: NSA’s “co-traveler programme” spot