Estimation of Latent Variable Densities in
Networks
Sharmodeep Bhattacharyya
Department of Statistics
University of California, Berkeley and Oregon State University
Workshop on Theory of Big Data, UCL, January, 2015
(Joint work with Peter J. Bickel, UC Berkeley and Patrick J. Wolfe,
UCL)
Sharmodeep Bhattacharyya (berkeley) Networks January 8, 2015 1 / 32
Outline
1. Introduction and Motivation
2. Features and Models of Networks
   Nonparametric Latent Space Models
   Density Functional Estimation
   Estimation of Latent Variable Density
   Regularization
3. Summary
Introduction and Motivation
Network Data
G = (V, E): undirected graph with V = {v_1, ..., v_n} arbitrarily labeled vertices.
The symmetric adjacency matrix [A_ij]_{i,j=1}^n numerically represents the network data:
A_ij = 1 if node i links to node j, 0 otherwise.
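As a concrete illustration (hypothetical 4-node edge list, not from the slides), the symmetric 0/1 adjacency matrix can be built as:

```python
import numpy as np

n = 4
edges = [(0, 1), (0, 2), (2, 3)]  # hypothetical undirected edge list

# Symmetric 0/1 adjacency matrix: A[i, j] = 1 iff node i links to node j
A = np.zeros((n, n), dtype=int)
for i, j in edges:
    A[i, j] = A[j, i] = 1
```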
Example: Collegiate Social Network
Figure: Facebook network adjacency matrices for two different colleges, shown in two different rows (Traud et al. (2011), SIAM Review).
Features and Models of Networks: Nonparametric Models
Nonparametric Latent Variable Models
Derived from the representation of exchangeable random infinite arrays by Aldous and Hoover (1983).
NP Model
Define P({A_ij}_{i,j=1}^n) conditionally, given latent variables {ξ_i}_{i=1}^n associated with the vertices {v_i}_{i=1}^n respectively (Bickel & Chen (2009), Bollobás et al. (2007), Hoff et al. (2002)):
ξ_1, ..., ξ_n iid ~ U(0, 1)
Pr(A_ij = 1 | ξ_i = u, ξ_j = v) = h_n(u, v) = ρ_n w(u, v),
where w(u, v) is the conditional latent variable density given A_ij = 1.
Define λ_n ≡ n ρ_n as the expected degree parameter and P = [P_ij]_{i,j=1}^n = [ρ_n w(ξ_i, ξ_j)]_{i,j=1}^n.
h_n is not uniquely defined: h_n(φ(u), φ(v)), with measure-preserving φ, gives the same model.
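A minimal sketch of sampling a graph from this model, with the illustrative choice w(u, v) = 4uv (which integrates to 1 on the unit square); the function and parameter names are assumptions, not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_graph(n, rho_n, w):
    """Draw xi_i iid U(0,1), then A_ij ~ Bernoulli(h_n(xi_i, xi_j)) with
    h_n = rho_n * w, keeping A symmetric with no self-loops."""
    xi = rng.uniform(size=n)
    U_, V_ = np.meshgrid(xi, xi, indexing="ij")
    H = np.clip(rho_n * w(U_, V_), 0.0, 1.0)      # edge probabilities
    coin = rng.uniform(size=(n, n))
    A = (np.triu(coin, 1) < np.triu(H, 1)).astype(int)
    return A + A.T, xi                            # symmetrize; zero diagonal

# illustrative conditional density: w(u, v) = 4uv
A, xi = sample_graph(200, rho_n=0.1, w=lambda u, v: 4.0 * u * v)
```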
Stochastic Block Model (Holland, Laskey and Leinhardt 1983)
A K-block stochastic block model with parameters (π, P) is defined as follows. Consider latent variables corresponding to the vertices, z = (z_1, z_2, ..., z_n), with
z_1, ..., z_n iid ~ Multinomial(1; (π_1, ..., π_K))
Pr(A_ij = 1 | z_i, z_j) = P_{z_i z_j},
where P = [P_ab] is a K × K symmetric matrix for undirected networks.
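The block model can be sampled under the same conditional-independence structure; the following sketch uses illustrative π and P and hypothetical names:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_sbm(n, pi, P):
    """z_i ~ Multinomial(1; pi); A_ij ~ Bernoulli(P[z_i, z_j]), undirected."""
    K = len(pi)
    z = rng.choice(K, size=n, p=pi)          # latent block labels
    probs = P[np.ix_(z, z)]                  # n x n matrix of P_{z_i z_j}
    coin = rng.uniform(size=(n, n))
    A = (np.triu(coin, 1) < np.triu(probs, 1)).astype(int)
    return A + A.T, z

# illustrative 2-block assortative example
pi = np.array([0.6, 0.4])
P = np.array([[0.10, 0.02],
              [0.02, 0.08]])
A, z = sample_sbm(300, pi, P)
```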
Parameters of Interest
Density Functional
The integral parameter for a subgraph R is defined as
P(R) = E[ ∏_{(i,j)∈R} h(ξ_i, ξ_j) · ∏_{(i,j)∈R̄} (1 − h(ξ_i, ξ_j)) ],
where R̄ = {(i, j) ∉ R : i ∈ V(G), j ∈ V(G)}.
Density
Estimate a representation of the latent variable density w or h.
Estimate the equivalence class of the latent variable density w or h with respect to norms of the form of the cut metric (Lovász (2006)).
Features and Models of Networks: Density Functional Estimation
Empirical “Moments”/ Count statistics
Count statistics are normalized subgraph counts and smooth functions of them.
The subgraph count P̂(R) for a subgraph R is
P̂(R) = (1 / ( (n choose p) |Hom(R)| )) ∑_{S ⊆ K_n, S ≅ R} 1(S ⊆ G)    (1)
where Hom(R) is the group of homomorphisms of R and K_n is the complete graph on n vertices.
Examples
(a) The average degree of a network is a count statistic: D̄ = (1/n) ∑_{i=1}^n D_i, where D_i = ∑_{j≠i} A_ij.
(b) Another well-known statistic is
Transitivity = (normalized count of ∆) / (normalized count of ∆ + 'V')
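Both example statistics can be computed directly from the adjacency matrix; a sketch with hypothetical helper names (transitivity computed as triangles over triangles plus open 'V' paths, via trace(A^3)):

```python
import numpy as np

def avg_degree(A):
    """D-bar = (1/n) * sum_i D_i with D_i = sum_{j != i} A_ij."""
    return A.sum(axis=1).mean()

def transitivity(A):
    """Triangles / (triangles + open 'V' paths)."""
    deg = A.sum(axis=1)
    triangles = np.trace(A @ A @ A) / 6.0                 # each triangle counted 6x
    wedges = (deg * (deg - 1) / 2.0).sum()                # all 2-paths (closed + open)
    vees = wedges - 3 * triangles                         # open 2-paths only
    denom = 3 * triangles + vees
    return 3 * triangles / denom if denom > 0 else 0.0
```

On a single triangle this gives transitivity 1; on a 3-node path it gives 0.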
Computation of Count and Variance of Count Statistics
Counts:
The worst-case computational complexity of exactly counting copies of a subgraph R in G_n is O(n^p), where p = |V(R)|.
The computational complexity varies with the subgraph and the sparsity of the graph.
For dense graphs and complex patterns, the approximate counts are very crude.
Variances: Finding variances of complex patterns also becomes theoretically challenging.
So, instead of exact counting, we use approximate counting (a similar idea was used by Holmes and Reinert (2004)).
Bootstrap Scheme
For the b-th iterate of the bootstrap, b = 1, ..., B:
1. Fix p = size of R = |V(R)|.
2. Perform the random breadth-first search described in Wernicke (2006) with a set of sampling probabilities (q_1, ..., q_p).
3. Calculate P̂_b(R), given by the formula
P̂_b(R) = (1 / ( ∏_{d=1}^p q_d · (n choose p) |Hom(R)| )) ∑_{S ∈ S_R^p} 1(S ≅ R)
P̂_B(R) = (1/B) ∑_{b=1}^B P̂_b(R)
where S_R^p is the set of all size-p randomly selected subgraphs of G.
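A simplified Monte Carlo stand-in for the scheme above, with R a triangle and uniform triple sampling in place of Wernicke's random breadth-first search (so the sampling weights differ from the slide's q_d formula); all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

def bootstrap_triangle_density(A, B=20, m=200):
    """Estimate the normalized triangle count by sampling m random vertex
    triples per bootstrap iterate and averaging the B iterates P-hat_b(R)."""
    n = A.shape[0]
    estimates = np.empty(B)
    for b in range(B):
        hits = 0
        for _ in range(m):
            i, j, k = rng.choice(n, size=3, replace=False)
            hits += A[i, j] * A[j, k] * A[i, k]   # 1 iff {i,j,k} is a triangle
        estimates[b] = hits / m                   # P-hat_b(R)
    return estimates.mean(), estimates            # P-hat_B(R) and the iterates
```

The spread of the B iterates gives the bootstrap variance estimate behind the confidence intervals below.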
Bootstrap Theorem
Theorem (B. and Bickel (2013))
Suppose R is fixed and acyclic with |V(R)| = p, and ∫_0^∞ ∫_0^∞ w^{2|R|}(u, v) du dv < ∞. Suppose B → ∞ and q_d → 0 for all d = 1, ..., p such that (1/B)(1/q_1 − 1) → 0 and B ∏_{d=2}^p q_d ≥ 1/(n^{p−1} ρ_n^e), with n → ∞, λ_n → ∞, and G generated from (1). Then:
(i) √n ( ρ_n^{−e} P̂_B(R) − ρ_n^{−e} P(R) ) / σ_B(R) ⇒ N(0, 1)    (2)
(ii) Given G, Var( ρ_n^{−e} P̂_b(R) | G ) = O( (1/q_1 − 1)(1/n) + (1/(n ρ_n^{e−p+1})) ∏_{d=2}^p 1/(λ_n q_d) ).
(iii) We can construct a bootstrap confidence interval for P(R).
A General Principle for Estimating Variance
[Figure: panels (a) and (b)]
If p = |V(R)| and e = |E(R)|,
Var( T(R) / ρ_n^e ) = (1 / ( ρ_n^e (n choose p) |Iso(R)| )^2) · E[ ∑_{S,T ⊆ K_n; S,T ≅ R; S∩T ≠ ∅} 1(S, T ⊆ G) ]
A General Principle for Estimating Variance
[Figure: panels (c) and (d)]
If p = |V(R)| and e = |E(R)|,
Var( T(R) / ρ_n^e ) ≈ (1 / ( ρ_n^e (n choose p) |Iso(R)| )^2) · ∑_{W ⊆ K_n; W = S∪T; S,T ≅ R; |S∩T| = 1} 1(W ⊆ G)
Features and Models of Networks: Estimation of Latent Variable Density
Block Model Approximation
For a fixed number of communities K, a community assignment function z assigns communities based on a symmetric matrix M_{n×n}, defined as
z(M)(i) ≡ z_i(M) : {1, ..., n} → {1, ..., K}    (3)
The metrics we will mainly refer to are
(i) ‖w_1 − w_2‖_2^2 = inf_σ ∫_0^1 ∫_0^1 (w_1 − w_2)^2(u, σ(v)) du dv
(ii) ‖z^(1) − z^(2)‖_H = inf_π H(z^(1), π ∘ z^(2)),
where σ : [0, 1] → [0, 1] is a measure-preserving transformation, π is any permutation of {1, ..., K}, and H is the normalized Hamming distance
H(z^(1), z^(2)) = (1/n) ∑_{i=1}^n 1( z_i^(1) ≠ z_i^(2) )    (4)
Block Model Approximation
Given z(M), we can form a K × K mean matrix M̄^z from any symmetric matrix M_{n×n}:
M̄^z_ab ≡ (1/O_ab) ∑_{i=1}^n ∑_{j=1}^n M_ij 1(z_i = a, z_j = b), 1 ≤ a, b ≤ K,    (5)
where
O_ab ≡ n_a n_b for 1 ≤ a, b ≤ K, a ≠ b, and O_aa ≡ n_a(n_a − 1) for 1 ≤ a ≤ K,
with
n_a ≡ ∑_{i=1}^n 1(z_i = a), 1 ≤ a ≤ K.
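Equation (5) translates directly to code; a sketch with a hypothetical function name:

```python
import numpy as np

def block_means(M, z, K):
    """K x K mean matrix M-bar^z of (5): average of M_ij over pairs with
    z_i = a, z_j = b; diagonal blocks use n_a(n_a - 1) to exclude self-pairs."""
    Mz = np.zeros((K, K))
    counts = np.bincount(z, minlength=K).astype(float)   # n_a
    for a in range(K):
        for b in range(K):
            block = M[np.ix_(z == a, z == b)]
            O = counts[a] * counts[b] if a != b else counts[a] * (counts[a] - 1)
            Mz[a, b] = block.sum() / O if O > 0 else 0.0
    return Mz
```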
Estimation of Latent Variable Density
Now, we define the estimate of the latent variable density w, based on the adjacency matrix A_{n×n}, as
ŵ(x, y; ẑ) ≡ ρ̂^{−1} Ā^{ẑ(A)}_{ẑ_G(x)(A), ẑ_G(y)(A)}, (x, y) ∈ [0, 1]^2    (6)
where
ρ̂ = (1/(n choose 2)) ∑_{i>j} A_ij and G(x) ≡ min{ i ∈ [n] : i/n ≥ x }    (7)
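Combining (5)-(7), a sketch of the estimator ŵ for a given assignment (hypothetical names; returns the piecewise-constant function together with the block matrix and ρ̂):

```python
import numpy as np

def w_hat(A, z, K):
    """Estimator (6): piecewise-constant w-hat(x, y; z) built from the block
    means A-bar^z of (5), scaled by rho-hat from (7)."""
    n = A.shape[0]
    rho = A[np.triu_indices(n, 1)].mean()                 # rho-hat
    counts = np.bincount(z, minlength=K).astype(float)    # n_a
    Az = np.zeros((K, K))
    for a in range(K):
        for b in range(K):
            O = counts[a] * counts[b] if a != b else counts[a] * (counts[a] - 1)
            Az[a, b] = A[np.ix_(z == a, z == b)].sum() / O if O > 0 else 0.0

    def w(x, y):
        # G(x) = min{i in [n] : i/n >= x}, converted to a 0-based index
        Gx = min(max(int(np.ceil(x * n)), 1) - 1, n - 1)
        Gy = min(max(int(np.ceil(y * n)), 1) - 1, n - 1)
        return Az[z[Gx], z[Gy]] / rho if rho > 0 else 0.0

    return w, Az, rho
```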
Assumptions
Let w_0 be the true latent variable density.
Define z_0 ≡ z(P), and w_0(·, ·; z_0) by replacing A by P and ρ̂ by ρ in (6).
Define ẑ ≡ z(A).
Assumptions
A1 (on w_0): w_0 ≤ M_0 < ∞.
A2 (on w_0 and z): n_∧(z_0) ≥ ε n/K and n_∨(z_0) ≤ (1/ε) n/K.
A3 (on w_0 and z): ‖w_0(·, ·) − w_0(·, ·; z_0)‖_2 ≤ μ_n → 0. Under conditions we can show μ_n ≤ M_1/(ε^2 K^2).
A4 (on ẑ): ‖ẑ − z_0‖_H = O_P(∆_n(K)), where ∆_n → 0.
Main Theorem
Theorem (B., Bickel and Wolfe (2014))
Let A_{n×n} be the adjacency matrix of a simple random graph under the model equation. Under assumptions A1-A4 and for community assignment function ẑ,
‖w_0(·, ·) − ŵ(·, ·; ẑ)‖_2 = max{ O(μ_n(K)), O_P(K/(n ρ_n)), O_P(K^{3/2} √(ρ_n ∆_n)) }.
Methods of Obtaining ŵ
Existing Methods
Olhede and Wolfe (2013) proposed a scheme using profile likelihood as the estimation method.
Airoldi and Chan (2014) proposed a method using the degree distribution as the estimation method.
Latouche and Robin (2013) and Lloyd et al. (2013) proposed Bayesian methods for exchangeable network model inference.
Gao, Lu and Zhou (2014) give minimax rates for the dense case.
Generalization
Any block model estimation method satisfying the conditions on estimation error can be used to obtain ŵ from (6).
Examples include maximum likelihood, variational likelihood, spectral clustering, SDP relaxation, and other sufficiently accurate clustering schemes.
Special Case: Spectral Clustering
Theorem 2
Let A_{n×n} be the adjacency matrix of a simple random graph under the model equation. Assume A1-A3 for the spectral assignment function ẑ_sp, and let γ_n be the absolute difference between the K-th and (K+1)-th eigenvalues of P. As n → ∞,
‖w_0(·, ·) − ŵ(·, ·; ẑ)‖_2 = max{ O(μ_n(K)), O_P(K/(n ρ_n)), O_P(K^{3/2} √(ρ_n ∆_n)) },
where
∆_n(K) = O( nK ( ‖P − P^{ẑ_sp}(P)‖ + ‖A − P‖ )^2 / γ_n^2 )
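A plain spectral assignment in the spirit of ẑ_sp: Lloyd's k-means on the rows of the leading eigenvector matrix of A. This is a sketch with an assumed farthest-point initialization, not the authors' implementation:

```python
import numpy as np

def spectral_assign(A, K, iters=50):
    """Spectral community assignment: k-means (plain Lloyd iterations) on the
    rows of the matrix of K leading eigenvectors of A."""
    vals, vecs = np.linalg.eigh(A)
    idx = np.argsort(np.abs(vals))[::-1][:K]       # K leading eigenvalues by magnitude
    X = vecs[:, idx]                               # n x K spectral embedding
    centers = X[[0]]                               # greedy farthest-point init
    for _ in range(1, K):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).min(1)
        centers = np.vstack([centers, X[d.argmax()]])
    for _ in range(iters):                         # Lloyd's algorithm
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        z = d.argmin(1)
        for k in range(K):
            if (z == k).any():
                centers[k] = X[z == k].mean(0)
    return z
```

On a graph made of two disjoint cliques this recovers the two components exactly.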
Features and Models of Networks: Regularization
Regularization: Choice of K (Ongoing Work)
One idea is cross-validation using the density functionals.
For size-r subgraph patterns {a_r},
P_r(a_r) ≡ Pr[ A_ij = a_ij : 1 ≤ i, j ≤ r ]
= ∫_0^1 ··· ∫_0^1 ∏_{1≤i<j≤r} [ρ_n w(ξ_i, ξ_j)]^{a_ij} [1 − ρ_n w(ξ_i, ξ_j)]^{1−a_ij} dξ_1 ··· dξ_r
Define
‖P_r − Q_r‖ = ∑_{a_r ∈ {0,1}^{r×r}} | P[A_r = a_r] − Q[A_r = a_r] |.
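The distance ‖P_r − Q_r‖ is a plain L1 sum over patterns; a sketch assuming the two distributions are given as dicts from pattern label to probability (a hypothetical representation):

```python
def pattern_distance(P, Q):
    """||P_r - Q_r|| = sum over patterns a_r of |P[A_r = a_r] - Q[A_r = a_r]|,
    treating patterns missing from a dict as probability zero."""
    keys = set(P) | set(Q)
    return sum(abs(P.get(k, 0.0) - Q.get(k, 0.0)) for k in keys)
```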
Regularization: Choice of K (Ongoing Work)
P̂_r(K) is obtained by using ŵ.
‖P̂_r(K) − P_r‖_2^2 is estimated by ‖P̂_r(K) − P̂_r‖_2^2.
Lemma 4
If d_cut(P(K), P) ≤ ∆_n and 0 < δ ≤ w ≤ 1/δ,
MSE(K) = ‖P̂_r(K) − P_r‖ = O_P( (r choose 2) 2^{r²/2} ∆_n(K) ).    (8)
K_opt = argmin_K ‖P̂_r(K) − P̂_r^B‖_2^2,    (9)
where P̂_r^B is the bootstrap estimate of P_r.
Facebook Data
Figure: Top left panel is the adjacency matrix of the network; the remaining panels show the ŵ generating the network for K = 8, 13, 22.
Figure: The cross-validation test using r = 3 between the actual network and the estimated network with number of clusters K.
Conclusion
Future Work
Work in Progress
Extend the subsampling bootstrap to more general statistics.
Provide a proper regularization scheme and general principles under which block model approximations work.
Extend nonparametric latent space models to more general models.
Verify the usefulness of the method on real network data sets.
References
S. Bhattacharyya (2013). A Study of High-dimensional Clustering and Statistical Inference of Networks. PhD thesis.
S. Bhattacharyya and P. J. Bickel (2013). Subsampling bootstrap of count features of networks. Under revision, Annals of Statistics.
S. Bhattacharyya and P. J. Bickel (2013). Community detection in networks using graph distance. arXiv.
S. Bhattacharyya, P. J. Bickel and P. J. Wolfe (2014). Estimating Latent Variable Densities for Exchangeable Network Models. In progress.
P. J. Bickel and A. Chen (2009). A nonparametric view of network models and Newman-Girvan and other modularities. PNAS.
P. J. Bickel, A. Chen and E. Levina (2011). The method of moments and degree distributions for network models. Annals of Statistics.
P. Wolfe and S. Olhede (2013). Nonparametric graphon estimation. arXiv.