Estimation of Latent Variable Densities in
Networks
Sharmodeep Bhattacharyya
Department of Statistics
University of California, Berkeley and Oregon State University
Workshop on Theory of Big Data, UCL, January, 2015
(Joint work with Peter J. Bickel, UC Berkeley and Patrick J. Wolfe,
UCL)
Sharmodeep Bhattacharyya (berkeley) Networks January 8, 2015 1 / 32
Outline
1. Introduction and Motivation
2. Features and Models of Networks
   Nonparametric Latent Space Models
   Density Functional Estimation
   Estimation of Latent Variable Density
   Regularization
3. Summary
Introduction and Motivation
Network Data
G = (V, E): undirected graph with V = {v_1, ..., v_n} arbitrarily labeled vertices.
The symmetric adjacency matrix [A_ij]_{i,j=1}^n numerically represents the network data:
A_ij = 1 if node i links to node j, 0 otherwise.
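As a concrete illustration (hypothetical 4-node edge list, not from the slides), the symmetric 0/1 adjacency matrix can be built as:

```python
import numpy as np

n = 4
edges = [(0, 1), (0, 2), (2, 3)]  # hypothetical undirected edge list

# Symmetric 0/1 adjacency matrix: A[i, j] = 1 iff node i links to node j
A = np.zeros((n, n), dtype=int)
for i, j in edges:
    A[i, j] = A[j, i] = 1
```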
Example: Collegiate Social Network
Figure: Facebook network adjacency matrices for two different colleges, shown in two different rows (Traud et al. (2011), SIAM Review).
Features and Models of Networks: Nonparametric Models
Nonparametric Latent Variable Models
Derived from the representation of exchangeable random infinite arrays by Aldous and Hoover (1983).
NP Model
Define P({A_ij}_{i,j=1}^n) conditionally, given latent variables {ξ_i}_{i=1}^n associated with the vertices {v_i}_{i=1}^n respectively (Bickel & Chen (2009), Bollobás et al. (2007), Hoff et al. (2002)):
ξ_1, ..., ξ_n iid ~ U(0, 1)
Pr(A_ij = 1 | ξ_i = u, ξ_j = v) = h_n(u, v) = ρ_n w(u, v),
where w(u, v) is the conditional latent variable density given A_ij = 1.
Define λ_n ≡ n ρ_n as the expected degree parameter and P = [P_ij]_{i,j=1}^n = [ρ_n w(ξ_i, ξ_j)]_{i,j=1}^n.
h_n is not uniquely defined: h_n(φ(u), φ(v)), with measure-preserving φ, gives the same model.
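A minimal sketch of sampling a graph from this model, with the illustrative choice w(u, v) = 4uv (which integrates to 1 on the unit square); the function and parameter names are assumptions, not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_graph(n, rho_n, w):
    """Draw xi_i iid U(0,1), then A_ij ~ Bernoulli(h_n(xi_i, xi_j)) with
    h_n = rho_n * w, keeping A symmetric with no self-loops."""
    xi = rng.uniform(size=n)
    U_, V_ = np.meshgrid(xi, xi, indexing="ij")
    H = np.clip(rho_n * w(U_, V_), 0.0, 1.0)      # edge probabilities
    coin = rng.uniform(size=(n, n))
    A = (np.triu(coin, 1) < np.triu(H, 1)).astype(int)
    return A + A.T, xi                            # symmetrize; zero diagonal

# illustrative conditional density: w(u, v) = 4uv
A, xi = sample_graph(200, rho_n=0.1, w=lambda u, v: 4.0 * u * v)
```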
Stochastic Block Model (Holland, Laskey and Leinhardt 1983)
A K-block stochastic block model with parameters (π, P) is defined as follows. Consider latent variables corresponding to the vertices, z = (z_1, z_2, ..., z_n), with
z_1, ..., z_n iid ~ Multinomial(1; (π_1, ..., π_K))
Pr(A_ij = 1 | z_i, z_j) = P_{z_i z_j},
where P = [P_ab] is a K × K symmetric matrix for undirected networks.
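The block model can be sampled under the same conditional-independence structure; the following sketch uses illustrative π and P and hypothetical names:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_sbm(n, pi, P):
    """z_i ~ Multinomial(1; pi); A_ij ~ Bernoulli(P[z_i, z_j]), undirected."""
    K = len(pi)
    z = rng.choice(K, size=n, p=pi)          # latent block labels
    probs = P[np.ix_(z, z)]                  # n x n matrix of P_{z_i z_j}
    coin = rng.uniform(size=(n, n))
    A = (np.triu(coin, 1) < np.triu(probs, 1)).astype(int)
    return A + A.T, z

# illustrative 2-block assortative example
pi = np.array([0.6, 0.4])
P = np.array([[0.10, 0.02],
              [0.02, 0.08]])
A, z = sample_sbm(300, pi, P)
```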
Parameters of Interest
Density Functional
The integral parameter for a subgraph R is defined as
P(R) = E[ ∏_{(i,j)∈R} h(ξ_i, ξ_j) · ∏_{(i,j)∈R̄} (1 − h(ξ_i, ξ_j)) ],
where R̄ = {(i, j) ∉ R : i ∈ V(G), j ∈ V(G)}.
Density
Estimate a representation of the latent variable density w or h.
Estimate the equivalence class of the latent variable density w or h with respect to norms of the form of the cut metric (Lovász (2006)).
Features and Models of Networks: Density Functional Estimation
Empirical “Moments”/ Count statistics
Count statistics are normalized subgraph counts and smooth functions of them.
The subgraph count P̂(R) for a subgraph R is
P̂(R) = (1 / ( (n choose p) |Hom(R)| )) ∑_{S ⊆ K_n, S ≅ R} 1(S ⊆ G)    (1)
where Hom(R) is the group of homomorphisms of R and K_n is the complete graph on n vertices.
Examples
(a) The average degree of a network is a count statistic: D̄ = (1/n) ∑_{i=1}^n D_i, where D_i = ∑_{j≠i} A_ij.
(b) Another well-known statistic is
Transitivity = (normalized count of ∆) / (normalized count of ∆ + 'V')
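Both example statistics can be computed directly from the adjacency matrix; a sketch with hypothetical helper names (transitivity computed as triangles over triangles plus open 'V' paths, via trace(A^3)):

```python
import numpy as np

def avg_degree(A):
    """D-bar = (1/n) * sum_i D_i with D_i = sum_{j != i} A_ij."""
    return A.sum(axis=1).mean()

def transitivity(A):
    """Triangles / (triangles + open 'V' paths)."""
    deg = A.sum(axis=1)
    triangles = np.trace(A @ A @ A) / 6.0                 # each triangle counted 6x
    wedges = (deg * (deg - 1) / 2.0).sum()                # all 2-paths (closed + open)
    vees = wedges - 3 * triangles                         # open 2-paths only
    denom = 3 * triangles + vees
    return 3 * triangles / denom if denom > 0 else 0.0
```

On a single triangle this gives transitivity 1; on a 3-node path it gives 0.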
Computation of Count and Variance of Count Statistics
Counts:
The worst-case computational complexity of exactly counting copies of a subgraph R in G_n is O(n^p), where p = |V(R)|.
The computational complexity varies with the subgraph and the sparsity of the graph.
For dense graphs and complex patterns, the approximate counts are very crude.
Variances: Finding variances of complex patterns also becomes theoretically challenging.
So, instead of exact counting, we use approximate counting (a similar idea was used by Holmes and Reinert (2004)).
Bootstrap Scheme
For the b-th iterate of the bootstrap, b = 1, ..., B:
1. Fix p = size of R = |V(R)|.
2. Perform the random breadth-first search described in Wernicke (2006) with a set of sampling probabilities (q_1, ..., q_p).
3. Calculate P̂_b(R), given by the formula
P̂_b(R) = (1 / ( ∏_{d=1}^p q_d · (n choose p) |Hom(R)| )) ∑_{S ∈ S_R^p} 1(S ≅ R)
P̂_B(R) = (1/B) ∑_{b=1}^B P̂_b(R)
where S_R^p is the set of all size-p randomly selected subgraphs of G.
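A simplified Monte Carlo stand-in for the scheme above, with R a triangle and uniform triple sampling in place of Wernicke's random breadth-first search (so the sampling weights differ from the slide's q_d formula); all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

def bootstrap_triangle_density(A, B=20, m=200):
    """Estimate the normalized triangle count by sampling m random vertex
    triples per bootstrap iterate and averaging the B iterates P-hat_b(R)."""
    n = A.shape[0]
    estimates = np.empty(B)
    for b in range(B):
        hits = 0
        for _ in range(m):
            i, j, k = rng.choice(n, size=3, replace=False)
            hits += A[i, j] * A[j, k] * A[i, k]   # 1 iff {i,j,k} is a triangle
        estimates[b] = hits / m                   # P-hat_b(R)
    return estimates.mean(), estimates            # P-hat_B(R) and the iterates
```

The spread of the B iterates gives the bootstrap variance estimate behind the confidence intervals below.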
Bootstrap Theorem
Theorem (B. and Bickel (2013))
Suppose R is fixed and acyclic with |V(R)| = p, and ∫_0^∞ ∫_0^∞ w^{2|R|}(u, v) du dv < ∞. Suppose B → ∞ and q_d → 0 for all d = 1, ..., p such that (1/B)(1/q_1 − 1) → 0 and B ∏_{d=2}^p q_d ≥ 1/(n^{p−1} ρ_n^e), with n → ∞, λ_n → ∞, and G generated from (1). Then:
(i) √n ( ρ_n^{−e} P̂_B(R) − ρ_n^{−e} P(R) ) / σ_B(R) ⇒ N(0, 1)    (2)
(ii) Given G, Var( ρ_n^{−e} P̂_b(R) | G ) = O( (1/q_1 − 1)(1/n) + (1/(n ρ_n^{e−p+1})) ∏_{d=2}^p 1/(λ_n q_d) ).
(iii) We can construct a bootstrap confidence interval for P(R).
A General Principle for Estimating Variance
[Figure: panels (a) and (b)]
If p = |V(R)| and e = |E(R)|,
Var( T(R) / ρ_n^e ) = (1 / ( ρ_n^e (n choose p) |Iso(R)| )^2) · E[ ∑_{S,T ⊆ K_n; S,T ≅ R; S∩T ≠ ∅} 1(S, T ⊆ G) ]
A General Principle for Estimating Variance
[Figure: panels (c) and (d)]
If p = |V(R)| and e = |E(R)|,
Var( T(R) / ρ_n^e ) ≈ (1 / ( ρ_n^e (n choose p) |Iso(R)| )^2) · ∑_{W ⊆ K_n; W = S∪T; S,T ≅ R; |S∩T| = 1} 1(W ⊆ G)
Features and Models of Networks: Estimation of Latent Variable Density
Block Model Approximation
For a fixed number of communities K, a community assignment function z assigns communities based on a symmetric matrix M_{n×n}, defined as
z(M)(i) ≡ z_i(M) : {1, ..., n} → {1, ..., K}    (3)
The metrics we will mainly refer to are
(i) ‖w_1 − w_2‖_2^2 = inf_σ ∫_0^1 ∫_0^1 (w_1 − w_2)^2(u, σ(v)) du dv
(ii) ‖z^(1) − z^(2)‖_H = inf_π H(z^(1), π ∘ z^(2)),
where σ : [0, 1] → [0, 1] is a measure-preserving transformation, π is any permutation of {1, ..., K}, and H is the normalized Hamming distance
H(z^(1), z^(2)) = (1/n) ∑_{i=1}^n 1( z_i^(1) ≠ z_i^(2) )    (4)
Block Model Approximation
Given z(M), we can form a K × K mean matrix M̄^z from any symmetric matrix M_{n×n}:
M̄^z_ab ≡ (1/O_ab) ∑_{i=1}^n ∑_{j=1}^n M_ij 1(z_i = a, z_j = b), 1 ≤ a, b ≤ K,    (5)
where
O_ab ≡ n_a n_b for 1 ≤ a, b ≤ K, a ≠ b, and O_aa ≡ n_a(n_a − 1) for 1 ≤ a ≤ K,
with
n_a ≡ ∑_{i=1}^n 1(z_i = a), 1 ≤ a ≤ K.
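Equation (5) translates directly to code; a sketch with a hypothetical function name:

```python
import numpy as np

def block_means(M, z, K):
    """K x K mean matrix M-bar^z of (5): average of M_ij over pairs with
    z_i = a, z_j = b; diagonal blocks use n_a(n_a - 1) to exclude self-pairs."""
    Mz = np.zeros((K, K))
    counts = np.bincount(z, minlength=K).astype(float)   # n_a
    for a in range(K):
        for b in range(K):
            block = M[np.ix_(z == a, z == b)]
            O = counts[a] * counts[b] if a != b else counts[a] * (counts[a] - 1)
            Mz[a, b] = block.sum() / O if O > 0 else 0.0
    return Mz
```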
Estimation of Latent Variable Density
Now, we define the estimate of the latent variable density w, based on the adjacency matrix A_{n×n}, as
ŵ(x, y; ẑ) ≡ ρ̂^{−1} Ā^{ẑ(A)}_{ẑ_G(x)(A), ẑ_G(y)(A)}, (x, y) ∈ [0, 1]^2    (6)
where
ρ̂ = (1/(n choose 2)) ∑_{i>j} A_ij and G(x) ≡ min{ i ∈ [n] : i/n ≥ x }    (7)
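Combining (5)-(7), a sketch of the estimator ŵ for a given assignment (hypothetical names; returns the piecewise-constant function together with the block matrix and ρ̂):

```python
import numpy as np

def w_hat(A, z, K):
    """Estimator (6): piecewise-constant w-hat(x, y; z) built from the block
    means A-bar^z of (5), scaled by rho-hat from (7)."""
    n = A.shape[0]
    rho = A[np.triu_indices(n, 1)].mean()                 # rho-hat
    counts = np.bincount(z, minlength=K).astype(float)    # n_a
    Az = np.zeros((K, K))
    for a in range(K):
        for b in range(K):
            O = counts[a] * counts[b] if a != b else counts[a] * (counts[a] - 1)
            Az[a, b] = A[np.ix_(z == a, z == b)].sum() / O if O > 0 else 0.0

    def w(x, y):
        # G(x) = min{i in [n] : i/n >= x}, converted to a 0-based index
        Gx = min(max(int(np.ceil(x * n)), 1) - 1, n - 1)
        Gy = min(max(int(np.ceil(y * n)), 1) - 1, n - 1)
        return Az[z[Gx], z[Gy]] / rho if rho > 0 else 0.0

    return w, Az, rho
```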
Assumptions
Let w_0 be the true latent variable density.
Define z_0 ≡ z(P), and w_0(·, ·; z_0) by replacing A by P and ρ̂ by ρ in (6).
Define ẑ ≡ z(A).
Assumptions
A1 (on w_0): w_0 ≤ M_0 < ∞.
A2 (on w_0 and z): n_∧(z_0) ≥ ε n/K and n_∨(z_0) ≤ (1/ε) n/K.
A3 (on w_0 and z): ‖w_0(·, ·) − w_0(·, ·; z_0)‖_2 ≤ μ_n → 0. Under conditions we can show μ_n ≤ M_1/(ε^2 K^2).
A4 (on ẑ): ‖ẑ − z_0‖_H = O_P(∆_n(K)), where ∆_n → 0.
Main Theorem
Theorem (B., Bickel and Wolfe (2014))
Let A_{n×n} be the adjacency matrix of a simple random graph under the model equation. Under assumptions A1-A4 and for community assignment function ẑ,
‖w_0(·, ·) − ŵ(·, ·; ẑ)‖_2 = max{ O(μ_n(K)), O_P(K/(n ρ_n)), O_P(K^{3/2} √(ρ_n ∆_n)) }.
Methods of Obtaining ŵ
Existing Methods
Olhede and Wolfe (2013) proposed a scheme using profile likelihood as the estimation method.
Airoldi and Chan (2014) proposed a method using the degree distribution as the estimation method.
Latouche and Robin (2013) and Lloyd et al. (2013) proposed Bayesian methods for exchangeable network model inference.
Gao, Lu and Zhou (2014) give minimax rates for the dense case.
Generalization
Any block model estimation method satisfying the conditions on estimation error can be used to obtain ŵ from (6).
Examples include maximum likelihood, variational likelihood, spectral clustering, SDP relaxation, and other sufficiently accurate clustering schemes.
Special Case: Spectral Clustering
Theorem 2
Let A_{n×n} be the adjacency matrix of a simple random graph under the model equation. Assume A1-A3 for the spectral assignment function ẑ_sp, and let γ_n be the absolute difference between the K-th and (K+1)-th eigenvalues of P. As n → ∞,
‖w_0(·, ·) − ŵ(·, ·; ẑ)‖_2 = max{ O(μ_n(K)), O_P(K/(n ρ_n)), O_P(K^{3/2} √(ρ_n ∆_n)) },
where
∆_n(K) = O( nK ( ‖P − P^{ẑ_sp}(P)‖ + ‖A − P‖ )^2 / γ_n^2 )
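A plain spectral assignment in the spirit of ẑ_sp: Lloyd's k-means on the rows of the leading eigenvector matrix of A. This is a sketch with an assumed farthest-point initialization, not the authors' implementation:

```python
import numpy as np

def spectral_assign(A, K, iters=50):
    """Spectral community assignment: k-means (plain Lloyd iterations) on the
    rows of the matrix of K leading eigenvectors of A."""
    vals, vecs = np.linalg.eigh(A)
    idx = np.argsort(np.abs(vals))[::-1][:K]       # K leading eigenvalues by magnitude
    X = vecs[:, idx]                               # n x K spectral embedding
    centers = X[[0]]                               # greedy farthest-point init
    for _ in range(1, K):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).min(1)
        centers = np.vstack([centers, X[d.argmax()]])
    for _ in range(iters):                         # Lloyd's algorithm
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        z = d.argmin(1)
        for k in range(K):
            if (z == k).any():
                centers[k] = X[z == k].mean(0)
    return z
```

On a graph made of two disjoint cliques this recovers the two components exactly.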
Features and Models of Networks: Regularization
Regularization: Choice of K (Ongoing Work)
One idea is cross-validation using the density functionals.
For size-r subgraph patterns {a_r},
P_r(a_r) ≡ Pr[ A_ij = a_ij : 1 ≤ i, j ≤ r ]
= ∫_0^1 ··· ∫_0^1 ∏_{1≤i<j≤r} [ρ_n w(ξ_i, ξ_j)]^{a_ij} [1 − ρ_n w(ξ_i, ξ_j)]^{1−a_ij} dξ_1 ··· dξ_r
Define
‖P_r − Q_r‖ = ∑_{a_r ∈ {0,1}^{r×r}} | P[A_r = a_r] − Q[A_r = a_r] |.
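The distance ‖P_r − Q_r‖ is a plain L1 sum over patterns; a sketch assuming the two distributions are given as dicts from pattern label to probability (a hypothetical representation):

```python
def pattern_distance(P, Q):
    """||P_r - Q_r|| = sum over patterns a_r of |P[A_r = a_r] - Q[A_r = a_r]|,
    treating patterns missing from a dict as probability zero."""
    keys = set(P) | set(Q)
    return sum(abs(P.get(k, 0.0) - Q.get(k, 0.0)) for k in keys)
```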
Regularization: Choice of K (Ongoing Work)
P̂_r(K) is obtained by using ŵ.
‖P̂_r(K) − P_r‖_2^2 is estimated by ‖P̂_r(K) − P̂_r‖_2^2.
Lemma 4
If d_cut(P(K), P) ≤ ∆_n and 0 < δ ≤ w ≤ 1/δ,
MSE(K) = ‖P̂_r(K) − P_r‖ = O_P( (r choose 2) 2^{r²/2} ∆_n(K) ).    (8)
K_opt = argmin_K ‖P̂_r(K) − P̂_r^B‖_2^2,    (9)
where P̂_r^B is the bootstrap estimate of P_r.
Facebook Data
Figure: Top left panel is the adjacency matrix of the network; the remaining panels show the ŵ generating the network for K = 8, 13, 22.
Figure: The cross-validation test using r = 3 between the actual network and the estimated network with number of clusters K.
Conclusion
Future Work
Work in Progress
Extend the subsampling bootstrap to more general statistics.
Provide a proper regularization scheme and general principles under which block model approximations work.
Extend nonparametric latent space models to more general models.
Verify the usefulness of the method on real network data sets.
References
S. Bhattacharyya (2013). A Study of High-dimensional Clustering and Statistical Inference of Networks. PhD thesis.
S. Bhattacharyya and P. J. Bickel (2013). Subsampling bootstrap of count features of networks. Under revision, Annals of Statistics.
S. Bhattacharyya and P. J. Bickel (2013). Community detection in networks using graph distance. arXiv.
S. Bhattacharyya, P. J. Bickel and P. J. Wolfe (2014). Estimating Latent Variable Densities for Exchangeable Network Models. In progress.
P. J. Bickel and A. Chen (2009). A nonparametric view of network models and Newman-Girvan and other modularities. PNAS.
P. J. Bickel, A. Chen and E. Levina (2011). The method of moments and degree distributions for network models. Annals of Statistics.
P. Wolfe and S. Olhede (2013). Nonparametric graphon estimation. arXiv.