Upload
others
View
0
Download
0
Embed Size (px)
Citation preview
Community detectionin
stochastic block modelsvia
spectral methods
Laurent Massoulié (MSR-Inria Joint Centre, Inria)
based on joint works with:
Dan Tomozei (EPFL), Marc Lelarge (Inria),
Jiaming Xu (UIUC), Charles Bordenave (IMT)
Community Detection
2
Embedding space
Clustering of items based on their observed interactions Typical approach: embedding (eg in Euclidean space), then K-means
Supporting data: e.g. OSN’s friendship graph
Variation: NSA’s “co-traveler programme” spot groups of suspect persons meeting regularly in unusual places
Application 1: contact recommendation in online social networks
recommend members of user’s implicit community
Supporting data: {user-item} matrixExample: Netflix prize {user-movie} ratings matrix
Use item communities to support recommendation “users who liked this also liked…”
Application 2: item recommendation to users
User / Movie 𝑓1 𝑓2 … 𝑓𝑚
𝑢1 ? ** ***
𝑢2 ? ?
…
𝑢𝑛 ***** ** **
Application 3: categorizing chemical reactivesin biology
Basic data: sets of chemicals jointly involved in specific reaction
More generally
Knowledge graph as generic representation of data
A1 has with B1 interaction of type C1
A2 has with B2 interaction of type C2
…
Outline
• The Stochastic Block Model • With labels
• With general types
• Performance of Spectral Methods • “rich signal” case
• The weak signal case: sparse observations• Phase transition on detectability
• Non-backtracking matrix and the Spectral Redemption
The Stochastic Block Model, aka planted partition model [Holland-Laskey-Leinhardt’83]• n “nodes” partitioned into r categories
• Category 𝜎: 𝛼𝜎 𝑛 nodes
• Edge between nodes u,v present
with probability 𝑠 × 𝑏𝜎 𝑢 𝜎 𝑣 /𝑛
(𝑠 : average degree, or signal strength)
Observation:
adjacency matrix A= + noise
Community Detection: provide clustering 𝜎 based on A that is correlated as much as possible with true planted clustering 𝜎, i.e. maximizing
Overlap:=Max𝜋
1
𝑛
𝑢=1
𝑛
1𝜋( 𝜎 𝑢 )=𝜎 𝑢 − 𝑀𝑎𝑥𝜎𝜖 𝑟 𝛼𝜎
The Labeled Stochastic Block Model
• Edges (u-v) labeled by Luv L (finite set)
• Drawn from distribution 𝜇𝜎 𝑢 𝜎(𝑣)
• Netflix case: labels 1-5 stars
The SBM with general types[Aldous’81; Lovász’12]• User type 𝜎(u) i.i.d. ~𝑃 in general set (e.g. uniform on [0,1])
• Edge (u-v) present w.p. 𝑏𝜎 𝑢 𝜎 𝑣 𝑠/𝑛 for “kernel” b
e.g.
• Edges (u-v) labeled by Luv L (finite set)
• Drawn from distribution 𝜇𝜎 𝑢 𝜎(𝑣)
• Technical assumptions: compact type set and continuity of symmetric functions 𝑏 and 𝜇
-1 +1
F(x)
x
yxFb yx ,
Outline
• The Stochastic Block Model • With labels
• With general types
• Performance of Spectral Methods • “rich signal” case
• The weak signal case: sparse observations• Phase transition on detectability
• Non-backtracking matrix and the Spectral Redemption
Spectral Clustering
• From Matrix A extract R normed eigenvectors 𝑥𝑖 corresponding to R largest eigenvalues 1 ≥ ⋯ ≥ 𝑅
• Form R-dimensional node representatives𝑦𝑢 = 𝑛(𝑥𝑖(𝑢))
𝑖=1…𝑅
• Group nodes u according to proximity of spectral representatives 𝑦𝑢
Karl Pearson’s PCA:
“On Lines and Planes of Closest Fit
to Systems of Points in Space", 1901
Illustration for R=2
-0.05 -0.04 -0.03 -0.02 -0.01 0-0.03
-0.02
-0.01
0
0.01
0.02
0.03
Principal component 1
Prin
cip
al co
mp
on
en
t 2
Clustering from SVD
SBM with K=4Netflix dataset
Result for “logarithmic” signal strength s
Assume s=(log(n)) and clusters are distinguishable, i.e.
∀𝜎 ≠ 𝜎′ ∃τ such that 𝑏𝜎τ ≠ 𝑏𝜎′τ
Then spectrum of A consists of • R eigenvalues i of order (s) (R K) and
• n-R eigenvalues i of order 𝑂 𝑠
Where 𝑅 = rank 𝑏𝑢𝑣𝛼𝑣 1≤𝑢,𝑣≤𝐾
Node representatives 𝑦𝑢 based on top R eigenvectors 𝑥𝑖 :Cluster according to underlying “blocks” except for negligible fraction of nodes
Proof arguments
Control spectral radius of noise matrix
+ perturbation of matrix eigen-elements
A = + random “noise” matrix
Block matrix non-zero eigenvalues: (s)
Controlling perturbation of eigen-elements of Hermitian matrices
𝐴 = 𝐴 + 𝑊
Where 𝐴 : observed; 𝐴 : signal; 𝑊 : perturbation
Perturbation of eigenvalues: order 1 ≥ ⋯ ≥ 𝑛 , 1 ≥ ⋯ ≥ 𝑛
Then 𝑖 − 𝑖 ≤ 𝜌 𝑊 , 𝑖 = 1,…𝑛 (Weyl)
(a simple consequence of Courant-Fisher’s minimax theorem:
For Hermitian 𝑛 × 𝑛 matrix 𝐻 :
𝑖 𝐻 = maxdim 𝑉 =𝑖
min𝑣∈𝑉, 𝑣 =1
𝑣∗𝐻𝑣
𝑖 𝐻 = mindim 𝑉 =𝑛−𝑖+1
max𝑣∈𝑉, 𝑣 =1
𝑣∗𝐻𝑣
Controlling perturbation of eigen-elements of Hermitian matrices
𝐴 = 𝐴 + 𝑊
Where 𝐴 : observed; 𝐴 : signal; 𝑊 : perturbation
Perturbation of eigenvectors: order 1 ≥ ⋯ ≥ 𝑛 , 1 ≥ ⋯ ≥ 𝑛
Let ∆≔ min𝑖,𝑗: 𝑖≠
𝑗
𝑖 − 𝑗 . If 𝜌 𝑊 < ∆/2
Then for all i and for any normed eigenvector 𝑥𝑖 ↔ 𝑖 , there exists 𝑥𝑖↔ 𝑖 s.t.
𝑥𝑖 ∙ 𝑥𝑖≥ 1 −𝜌 𝑊
∆ − 𝜌 𝑊
2
( A simple version of [Chandler Davis-Kahane’70] )
Application to SBM
Spectrum of 𝐴 : spectrum of 𝑏𝑢𝑣𝛼𝑣 1≤𝑢,𝑣≤𝐾 scaled by 𝑠 , hence ∆= 𝑠
Announced result follows if 𝜌 𝑊 = 𝑜 𝑠
Ramanujan graphs [Lubotzky-Phillips-Sarnak’88]
s-regular graphs such that eigenvalues of adjacency matrix verify
:=max𝑘: 𝑘 < 𝑠 𝑘 ≤ 2 𝑠 − 1
The best possible spectral gap [Alon-Boppana’91]
≥ 2 𝑠 − 1 − 𝑂𝑠
𝑑𝑖𝑎𝑚 𝐺
spectral separation properties “à la Ramanujan”
[Friedman’08] random s-regular graph with 𝑠 ≥ 3 verifies whp
=2 𝑠 − 1 + o(1)
[Feige-Ofek’05]: for Erdős-Rényi graph 𝐺 𝑛, 𝑠/𝑛 and 𝑠 = log 𝑛 , then whp = 𝑂 𝑠
Corollary: for SBM, ρ 𝐴 − 𝐴 = 𝑂 𝑠 when 𝑠 = log 𝑛
spectral separation properties “à la Ramanujan”
Consequence: in SBM with 𝑠 = log 𝑛 , whp
ρ 𝐴 − 𝐴 = 𝑂 𝑠 𝐴’s leading eigen-elements close to those of 𝐴
For 𝑠 = 1 , 𝜌 𝐴 − 𝐴 ~𝐶log 𝑛
log log 𝑛
spectral separation is lost
standard spectral method needs fixing to work in low signal regime
Discrepancy between SBM with small K and Netflix
Eigenvalue distributions
SBM with K=4 Netflix (subset)
motivates consideration of SBM with general types
4 outstandingeigenvalues
SBM with general types
• User types (u) i.i.d. ~𝑃 from general set (e.g. uniform on [0,1])
• Edge (u-v) present w.p. 𝑏𝜎 𝑢 𝜎 𝑣 𝑠/𝑛 for “kernel” b
e.g.
• Edges (u-v) labeled by Luv L (finite set)
• Drawn from distribution 𝜇𝜎 𝑢 𝜎(𝑣)
Form matrix {𝐴𝑖𝑗𝑊(𝐿𝑖𝑗)} from random projections 𝑊 𝑙 of labels
-1 +1
F(x)
x
yxFb yx ,
SBM with general types:Spectral properties for logarithmic 𝑠
Define kernel 𝐾(𝑥, 𝑦) ≔ 𝑙 𝑊 𝑙 𝑏𝑥𝑦𝜇𝑥𝑦 𝑙 and integral operator 𝑇𝑓 𝑥 ≔ 𝐾 𝑥, 𝑦 𝑓(𝑦)𝑃 𝑑𝑦
spectrum of 𝑠−1{𝐴𝑖𝑗𝑊(𝐿𝑖𝑗)} ≈ spectrum of 𝑇
Eigenvalue convergence: 𝑠−1𝑖𝑛
→ 𝑖
Eigenvector convergence: 𝑥𝑖(𝑢) → 𝜑𝑖 𝜎𝑢
Associated eigen-functionn
Type of node u
SBM with general types:Spectral properties for logarithmic 𝑠Flexible model
-power-law spectra (convolution operator + Fourier analysis)
-better matches to
Netflix data
-1 +1
F(x)
x
yxFb yx ,
SBM with general types:estimation for logarithmic 𝑠
For fixed R form R-dimensional node representatives 𝑦𝑢 = 𝑛𝑘
1𝑥𝑘(𝑢)
𝑘=1…𝑅
Embedding allows consistent estimation of label distributions
Accuracy controlled by ε and ε𝑅 ≔ 𝑘>𝑅 𝑘2
Prob(label(i,j)=5)?
Node i
Node j
Use empirical distribution of labels L(i,k) for k in -neighborhood of j
Outline
• The Stochastic Block Model • With labels
• With general types
• Performance of Spectral Methods • “rich signal” case
• The weak signal case: sparse observations• Phase transition on detectability
• Non-backtracking matrix and the Spectral Redemption
𝑠0: threshold for Giant component
Weak signal strength: s = (1)
• Correct classification of all but negligible fraction of nodes impossible (isolated nodes…)
Performance of clustering 𝜎 assessed by overlap metric:
Overlap:=Max𝜋
1
𝑛
𝑢=1
𝑛
1𝜋( 𝜎 𝑢 )=𝜎 𝑢 − 𝑀𝑎𝑥𝜎𝜖 𝑟 𝛼𝜎
Signal strength s
Overlap
Signal strength s
Overlap
𝑠𝑐𝑠0
Positively correlated community detection
Symmetric two-communities scenario: 𝐸 𝐴 =
Conjecture ([Decelle-Krzakala-Moore-Zdeborova 2011]:
• For 𝜏: =𝑎−𝑏 2
2 𝑎+𝑏< 1 , correlation tends to zero for any 𝜎
Proven by [Mossel-Neeman-Sly 2012]
• For 𝜏 > 1 , positive correlation can be achieved
Proven by [LM 2013] and [Mossel-Neeman-Sly 2013]
“Spectral redemption” conjecture [Krzakala-Moore-Mossel-Neeman-Sly-Zdeborova-Zhang 2013]):
Spectral methods based on non-backtracking matrix can achieve positively correlated detection
a/n b/n
a/nb/n
n/2
n/2
Non-backtracking matrix 𝐵 of graph 𝐺 = (𝑉, 𝐸)
Defined on oriented edges 𝑢𝑣 for u, 𝑣 𝜖𝐸 : 𝐵𝑢𝑣,𝑥𝑦 = 1𝑣=𝑥1𝑢≠𝑦
Asymmetric, such that 𝐵𝑒𝑓𝑘 = number of non-backtracking paths
on G of length k+1 starting at e and ending at f
u v ye f
u ve
f
Non-backtracking matrix 𝐵 of graph 𝐺 = (𝑉, 𝐸)
Defined on oriented edges 𝑢𝑣 for u, 𝑣 𝜖𝐸 : 𝐵𝑢𝑣,𝑥𝑦 = 1𝑣=𝑥1𝑢≠𝑦
Ihara-Bass formula [Ihara 66]for A: adjacency matrix, D: diagonal matrix of node degrees,
1 − 𝑢2 𝑛−𝑚Det 𝐼 − 𝑢𝐵 = 𝐷𝑒𝑡 𝐼 − 𝑢𝐴 + 𝑢2 𝐷 − 𝐼
Hence s-regular graph Ramanujan iff eigenvalues 𝑘 of B verify
′ ≔ max𝑘: 𝑘 <1
𝑘 ≤ 1
A definition that extends to non-regular graphs
([Stark-Terras 96]’s graph theory Riemann hypothesis)
Notations and assumptions
Let 𝜇1, … , 𝜇𝑟 , 𝜇1 ≥ ⋯ ≥ 𝜇𝑟 be the leading eigenvalues of E 𝐴
and 𝑥𝑖 corresponding eigenvectors in 𝑅𝑛
Let 𝑦𝑖: 𝑦𝑖 𝑒 = 𝑥𝑖 𝑒1 , 𝑦𝑖 ∈ 𝑅𝑛 𝑛−1 : lift of eigenvector 𝑥𝑖 to 𝑅𝑛 𝑛−1
In symmetric two-community case, 𝜇 ‘s: eigenvalues of 𝑎/2 𝑏/2𝑏/2 𝑎/2
𝜇1 =𝑎+𝑏
2, 𝜇2=
𝑎−𝑏
2(hence 𝜏 =
𝜇22
𝜇1)
Assume constant per-community average degree: ∀𝑗 ∈ 𝑟 , 𝑖=1𝑟 𝛼𝑖𝑏𝑖𝑗 ≡ 𝜇1
Notations and assumptions
Call eigenvalue 𝜇𝑖 “visible” if 𝜇𝑖 > 𝜇1
Let 𝑟0: 𝜇𝑟0 > 𝜇1 ≥ 𝜇𝑟0+1 : index of smallest “visible” eigenvalue
Kesten-Stigum condition:
𝑟0> 1, i.e. there is some 𝑖∗ > 1 such that 𝜇𝑖∗ is “visible”
In symmetric two-community case, Kesten-Stigum condition:
𝜇22 > 𝜇1 or equivalently, 𝜏 =
𝜇22
𝜇1> 1
Main result: spectrum of non-backtracking matrix for stochastic block model
With high probability as 𝑛 → ∞, eigenvalues 𝑖 of 𝐵 satisfy
𝑖 ≤ 𝑟0 𝑖 − 𝜇𝑖 = 𝑜 1𝑖 > 𝑟0 𝑖 ≤ 𝜇1 + o 1
For 𝑖 ≤ 𝑟0 , if 𝜇𝑖 has multiplicity 1, corresponding eigenvector:
asymptotically parallel to 𝐵𝑘 𝐵𝑘 ′𝑦𝑖 for 𝑘~ log 𝑛
Illustration for 2-community symmetricStochastic block model
𝑖∗
Corollary 1: proof of Spectral Redemption Conjecture
Assume 𝑟0> 1, equal community sizes 𝛼𝑖≡1
𝑟and 𝜇𝑖∗ has multiplicity 1 for
some 𝑖∗: 1 < 𝑖∗ ≤ 𝑟0
Then positive correlation obtained by following procedure
1) Extract eigenvector 𝑧𝑖∗ of 𝐵
2) Form signs s 𝑒 = 𝑠𝑔𝑛 𝑧𝑖∗ 𝑒 − τ
3) To each node 𝑢 assign s 𝑢 = 𝑠(𝑒) for edge 𝑒 picked uniformly amonggraph edges with 𝑒1 = 𝑢
4) Assign community 𝜎 𝑢 at random according to distribution 𝜋 s 𝑢
ue
e’
e’’
Illustration for Erdős-Rényi graph with 𝑛 =500, 𝛼 = 4
Corollary 2:Erdős-Rényi graphs are nearly Ramanujan
For Erdős-Rényi graph 𝐺𝛼
𝑛, 𝑛 , spectrum 𝑖 of associated non-
backtracking matrix 𝐵 satisfies with high probability as 𝑛 → ∞
max𝑘: 𝑘 <1𝑘 ≤ 1 + o(1)
An approximate version of
max𝑘: 𝑘 <1𝑘 ≤ 1
Property generalizes notion of Ramanujan graph to non-regular case
[Ihara’66, Stark-Terras’96]
Hence: Corollary 2 a « non-regular » version of Friedman’s result for regular graphs:
most random graphs (with given number of edges) approximatelyRamanujan according to extended definition
Proof strategy
Key step: for 𝑘~ log𝑛 prove matrix expansion 𝐵𝑘 = 𝑖=1𝑟 𝑖
𝑘 𝑣𝑖𝑤𝑖′ + 𝑅𝑘
where 𝑖~𝜇𝑖 (near-eigenvalue), 𝑣𝑖 ∝ 𝐵𝑘 𝐵𝑘 ′𝑦𝑖 (near- right eigenvector),
𝑤𝑖 ∝ 𝐵𝑘 ′𝑦𝑖 (near- left eigenvector), 𝑅𝑘 a perturbation
and with high probability
𝑖 ≠ 𝑗 𝑣𝑖′𝑣𝑗 , 𝑣𝑖
′𝑤𝑗 , 𝑤𝑖′𝑤𝑗 = 𝑜 1 , (near-orthonormal system)
𝑣𝑖′𝑤𝑖 > 𝑐 > 0 (low condition number)
𝜌 𝑅𝑘 = 𝑂 𝜇1𝑘 (negligible perturbation)
Yields the result, after leveraging Bauer-Fike theorem
(a tool to control impact of additive perturbation on eigenvalues of matrix not necessarily symmetric)
Low-rank
Proof elements (perturbations negligible)
To show 𝜌 𝑅𝑘 = 𝑂 𝜇1𝑘 , establish « small norm in complement »:
𝑠𝑢𝑝 𝑥 =1,𝑥′𝑤𝑖=0,𝑖=1,…,𝑟 𝐵𝑘𝑥 = 𝑂 𝜇1𝑘
Challenge: directions 𝑤𝑖 random and correlated with 𝐵𝑘
Remaining mysteries about SBM’s (1)
Conjectured “phase diagram” for more than 2 blocks
(assuming fixed inter-community parameter b)
Intra-communityparameter a
Number ofcommunities r
Above Kesten-Stigum thresholdDetection “easy”(spectral methods or BP)
Detection hard but feasible(how? In polynomial time?)
Detection infeasible
r=4 r=5
Remaining mysteries about SBM’s (2)
Clique detection problem: add a size-K clique to random graph with edge-probability ½
i.e. a 2-block SBM with unbalanced
block sizes:
for 𝐾 = 𝑛 clique easily detectable (e.g. inspection of node degrees)
are there polynomial-time algorithms for smaller yet large K?
(e.g. 𝐾 = 3 𝑛 )
A notoriously hard problem (“planted clique detection” recently proposed as a new benchmark of algorithmic hardness)
K
n-K½
½
½
Conclusions and Outlook
Variations of basic spectral methods still to be invented: interesting mathematics and practical relevance
Detection in SBM = rich playground for analysis of computational complexity with methods of statistical physics
Computationally efficient methods for “hard” cases (planted clique, intermediate phase for multiple communities)?
Non-regular Ramanujan graphs: theory still in its infancy (proper analogue of Alon-Boppana’s theorem missing)
Thanks!