1
Stochastic Blockmodel Approximation of a Graphon: Theory and Consistent Estimation Edoardo M. Airoldi 1 , Thiago B. Costa 2,1 , Stanley H. Chan 2,1 1 Department of Statistics, Harvard University 2 Harvard School of Engineering and Applied Sciences Abstract Non-parametric approaches for analyzing network data based on ex- changeable graph models (ExGM) have recently gained interest. The key object that defines an ExGM is often referred to as a graphon. This non- parametric perspective on network modeling poses challenging questions on how to make inference on the graphon underlying observed data. In this paper, we propose a computationally efficient procedure to estimate a graphon from a set of observed networks generated from it. This pro- cedure is based on a stochastic blockmodel approximation (SBA) of the graphon. We show that, by approximating the graphon with a stochas- tic block model, the graphon can be consistently estimated, that is, the estimation error vanishes as the size of the graph approaches infinity. Problem Graphons can be seen as kernel functions for random network models. To construct an n-vertex random graph G (n, w) for a given w, we first assign a random label u i Uniform[0, 1] to each vertex i ∈{1,...,n}, and connect any two vertices i and j with probability w(u i ,u j ), i.e., Pr (G[i, j ]=1 | u i ,u j )= w(u i ,u j ), i, j =1,...,n, u i u j w × (u i ,u j ) w(u i ,u j ) G 1 G 2T Figure 1: [Left] We draw i.i.d. samples u i , u j from Uniform[0,1] and assign G t [i, j ]=1 with probability w(u i ,u j ), for t =1,..., 2T . [Middle] Heat map of a graphon w. [Right] A random graph generated by the graphon shown in the middle. The problem of interest is defined as follows: Given a sequence of 2T observed directed graphs G 1 ,...,G 2T , can we make an estimate b w of w, such that b w w with high probability as n →∞? (In this problem we assume that the observed graphs share the same set of vertices, in a way that the i-th vertex have the same position u i in all graphs) Similarity of graphon slices To measure the similarity between two labels using the graphon slices, we define the following distance d ij = 1 2 Z 1 0 [w(x, u i ) - w(x, u j )] 2 dx + Z 1 0 [w(u i ,y ) - w(u j ,y )] 2 dy = 1 2 h (c ii - c ij - c ji + c jj )+(r ii - r ij - r ji + r jj ) i where c ij = Z 1 0 w(x, u i )w(x, u j )dx and r ij = Z 1 0 w(u i ,y )w(u j ,y )dy. We consider the following estimators for c ij and r ij : c k ij = 1 T 2 X 1t 1 T G t 1 [k, i] X T <t 2 2T G t 2 [k, j ] , r k ij = 1 T 2 X 1t 1 T G t 1 [i, k] X T <t 2 2T G t 2 [j, k] . Summing all possible k ’s yields an estimator ˆ d ij that looks similar to d ij : ˆ d ij = 1 2 " 1 S X k∈S ( ˆ r k ii - ˆ r k ij - ˆ r k ji r k jj ) + ( ˆ c k ii - ˆ c k ij - ˆ c k ji c k jj ) # , where S = {1,...,n}\{i, j } is the set of summation indices. Theorem 1 The estimator ˆ d ij for d ij is unbiased and satisfies P(|d ij - ˆ d ij | >) 8e - S 2 32/T +8/3 , for any > 0. Algorithm (SBA) To cluster the unknown labels {u 1 ,...,u n } we propose a greedy ap- proach as shown in Algorithm 1. Starting with Ω= {u 1 ,...,u n }, we randomly pick a node i p and call it the pivot. Then for all other ver- tices i v Ω\{i p }, we compute the distance b d i p ,i v and check whether b d i p ,i v < Δ 2 for some precision parameter Δ > 0. If b d i p ,i v < Δ 2 , then we assign i v to the same block as i p . Therefore, after scan- ning through Ω once, a block b B = {i p ,i v 1 ,i v 2 ,...} will be defined. By updating Ω as Ω Ω\ b B , the process repeats until Ω = . Algorithm 1: Clustering the vertices Input: Observed graphs G 1 ,...,G 2T and precision parameter Δ Output: Estimated stochastic blocks b B 1 ,..., b B K Initialize: Ω= {1,...,n}, and k =1; while Ω 6= do Randomly choose a vertex i p from Ω and assign it as the pivot for b B k : b B k i p ; for i v Ω\{i p } do Compute the distance estimate b d i p ,i v ; if b d i p ,i v Δ 2 then assign i v as a member of b B k : b B k i v ; end end Update Ω Ω\ b B k ; Update k k +1; end Once the blocks b B 1 ,..., b B K are defined, we can then determine b w(u i ,u j ) by computing the empirical frequency of edges that are present across blocks b B i and b B j : b w(u i ,u j )= 1 | b B i || b B j | X i x b B i X j y b B j 1 2T (G 1 [i x ,j y ]+ G 2 [i x ,j y ]+ ... + G 2T [i x ,j y ]) where b B i is the block containing u i . Consistency The performance of the Algorithm 1 depends on the number of blocks it defines. On the one hand, it is desirable to have more blocks so that the graphon can be finely approximated. But on the other hand, if the num- ber of blocks is too large then each block will contain only few vertices, what might be a problem because in order to estimate the probabilities of connection, a sufficient number of vertices in each block is required. The trade-off between these two cases is controlled by the precision param- eter Δ: a large Δ generates few large clusters, while small Δ generates many small clusters. The following theorems shows how to balance the choice of Δ in order to achieve consistency. Theorem 2 Let Δ be the accuracy parameter and K be the number of blocks estimated by Algorithm 1, then Pr " K> QL 2 Δ # 8n 2 e - SΔ 4 128/T +16Δ 2 /3 , where L is the Lipschitz constant and Q is the number of Lipschitz blocks in w. Theorem 3 If S Θ(n) and Δ ω log(n) n 1 4 o(1), then lim n→∞ E[MAE( b w)] = 0 and lim n→∞ E[MSE( b w)] = 0. where MSE( b w)= 1 n 2 n X i v =1 n X j v =1 (w(u i v ,u j v ) - b w(u i v ,u j v )) 2 MAE( b w)= 1 n 2 n X i v =1 n X j v =1 |w(u i v ,u j v ) - b w(u i v ,u j v )| . Choosing parameter In practice, we estimate Δ using a cross-validation scheme to find the optimal 2D histogram bin width. The idea is to test a sequence of potential values of Δ and seek the one that minimizes the cross validation risk, defined as b J (Δ) = 2 h(n - 1) - n +1 h(n - 1) K X j =1 b p 2 j , where b p j = | ˆ B j |/n and h =1/K . Algorithm 2: Cross Validation Input: Graphs G 1 ,...,G 2T Output: Blocks b B 1 ,..., b B K , and optimal Δ for a sequence of Δ’s do Estimate blocks b B 1 ,..., b B K from G 1 ,...,G 2T . [Algorithm 1]; Compute b p j = | b B j |/n, for j =1,...,K ; Compute b J (Δ) = 2 h(n-1) - n+1 h(n-1) K j =1 b p 2 j , with h =1/K ; end Pick the Δ with minimum b J (Δ), and the corresponding b B 1 ,..., b B K ; Experiments For the purpose of comparison, we consider (i) the universal singu- lar value thresholding (USVT) [Cha2012]; (ii) the largest-gap algorithm (LG) [CRD2012]; (iii) matrix completion from few entries (OptSpace) [KMO2010]. Estimating stochastic blockmodels We generate (arbitrarily) a graphon w = 0.8 0.9 0.4 0.5 0.1 0.6 0.3 0.2 0.3 0.2 0.8 0.3 0.4 0.1 0.2 0.9 , (1) which represents a piecewise constant function with 4 × 4 equi- space blocks. The following figures show the asymptotic behavior of the algorithms when n grows (left), and the estimation error of SBA algorithm as T grows for graphs of size 200 vertices (right). 0 200 400 600 800 1000 -3 -2.5 -2 -1.5 -1 -0.5 n log 10 (MAE) Proposed Largest Gap OptSpace USVT 0 5 10 15 20 25 30 35 40 -3 -2.9 -2.8 -2.7 -2.6 -2.5 -2.4 -2.3 -2.2 -2.1 -2 2T log 10 (MAE) Proposed (a) Growing graph size, n (b) Growing no. observations, 2T Figure 2: [Left] MAE reduces as graph size grows. For the fairness of the amount of data that can be used, we use n 2 × n 2 × 2 observations for SBA, and n × n × 1 observation for USVT and LG. [Right] MAE of the proposed SBA algorithm reduces when more observations T is available. Both plots are averaged over 100 independent trials. Accuracy as a function of growing number of blocks Our second experiment is to evaluate the performance of the al- gorithms as K , the number of blocks, increases. To this end, we consider a sequence of K , and for each K we generate a graphon w of K × K blocks. Each entry of the block is a random number gen- erated from Uniform[0, 1]. Same as the previous experiment, we fix n = 200 and T =1. The experiment is repeated over 100 trials so that in every trial a different graphon is generated. The result shown in (a) indicates that while estimation error increases as K grows, the proposed SBA algorithm still attains the lowest MAE for all K . 0 5 10 15 20 -1.4 -1.3 -1.2 -1.1 -1 -0.9 -0.8 -0.7 K log 10 (MAE) Proposed Largest Gap USVT Figure 3: As K increases, SBA still attains the lowest MAE. Here, we use n 2 × n 2 × 2 observations for SBA, and n × n × 1 observation for USVT and LG Experiments Estimation with missing edges Our next experiment is to evaluate the performance of proposed SBA algorithm when there are miss- ing edges in the observed graph. To model missing edges, we con- struct an n × n binary matrix M with probability Pr[M [i, j ] = 0] = ξ , where 0 ξ 1 defines the percentage of missing edges. Given ξ , 2T matrices are generated with missing edges, and the observed graphs are defined as M 1 G 1 ,...,M 2T G 2T , where denotes the element-wise multiplication. The goal is to study how well SBA can reconstruct the graphon b w in the presence of missing links. 0 5 10 15 20 -1.4 -1.3 -1.2 -1.1 -1 -0.9 -0.8 -0.7 K log 10 (MAE) Proposed Largest Gap USVT Figure 4: Estimation of graphon in the presence of missing links: As the amount of missing links increases, estimation error also increases. Estimating continuous graphons Our final experiment is to evaluate the proposed SBA algorithm in estimating continuous graphons. Here, we consider two of the graphons reported in [Cha 2012]: w 1 (u, v )= 1 1 + exp{-50(u 2 + v 2 )} , and w 2 (u, v )= uv, where u, v [0, 1]. Here, w 2 can be considered as a special 0 200 400 600 800 1000 -3.2 -3.15 -3.1 -3.05 -3 -2.95 -2.9 n log 10 (MAE) Proposed USVT 0 200 400 600 800 1000 -2 -1.8 -1.6 -1.4 -1.2 -1 -0.8 -0.6 n log 10 (MAE) Proposed USVT Figure 5: Comparison between SBA and USVT in estimating two continuous graphons w 1 and w 2 . Evidently, SBA performs better for high-rank w 1 (left) and worse for low-rank w 2 (right). Concluding remarks We presented a new computational tool for estimating graphons. The proposed algorithm approximates the continuous graphon by a stochastic block-model, in which the first step is to cluster the unknown vertex labels into blocks by using an empirical estimate of the distance between two graphon slices, and the second step is to build an empirical histogram to estimate the graphon. Complete consistency analysis of the algorithm is derived. The algorithm was evaluated experimentally, and we found that the algorithm is effective in estimating block structured graphons. References [Cha2012] S. Chatterjee. Matrix estimation by universal singular value thresholding. ArXiv:1212.1247. 2012. [CDR2012] A. Channarond, J. Daudin, and S. Robin. Classification and estimation in the Stochastic Blockmodel based on the empirical degrees. Electronic Journal of Statistics, 6:2574-2601, 2012. [KMO2010] R.H. Keshavan, A. Montanari, and S. Oh. Matrix completion from a few entries. IEEE Trans. Information Theory, 56:2980-2998, Jun. 2010. [LOGR2012] J.R. Lloyd, P. Orbanz, Z. Ghahramani, and D.M. Roy. Random function priors for exchangeable arrays with applications to graphs and relational data. In Neural Information Processing Sys- tems (NIPS), 2012. [MGJ2009] K.T. Miller, T.L. Griffiths, and M.I. Jordan. Nonparamet- ric latent fature models for link prediction. In Neural Information Processing Systems (NIPS), 2009.

Stochastic Blockmodel Approximation of a Graphon: Theory ... · Stochastic Blockmodel Approximation of a Graphon: Theory and Consistent Estimation ... Thiago B. Costa 2;, ... where

Embed Size (px)

Citation preview

Page 1: Stochastic Blockmodel Approximation of a Graphon: Theory ... · Stochastic Blockmodel Approximation of a Graphon: Theory and Consistent Estimation ... Thiago B. Costa 2;, ... where

Stochastic Blockmodel Approximation of a Graphon: Theory and Consistent EstimationEdoardo M. Airoldi1, Thiago B. Costa2,1, Stanley H. Chan2,1

1Department of Statistics, Harvard University 2Harvard School of Engineering and Applied Sciences

AbstractNon-parametric approaches for analyzing network data based on ex-changeable graph models (ExGM) have recently gained interest. The keyobject that defines an ExGM is often referred to as a graphon. This non-parametric perspective on network modeling poses challenging questionson how to make inference on the graphon underlying observed data. Inthis paper, we propose a computationally efficient procedure to estimatea graphon from a set of observed networks generated from it. This pro-cedure is based on a stochastic blockmodel approximation (SBA) of thegraphon. We show that, by approximating the graphon with a stochas-tic block model, the graphon can be consistently estimated, that is, theestimation error vanishes as the size of the graph approaches infinity.

ProblemGraphons can be seen as kernel functions for random network models.To construct an n-vertex random graph G(n,w) for a given w, we firstassign a random label ui ∼ Uniform[0, 1] to each vertex i ∈ 1, . . . , n,and connect any two vertices i and j with probability w(ui, uj), i.e.,

Pr (G[i, j] = 1 | ui, uj) = w(ui, uj), i, j = 1, . . . , n,

ui

uj

w

×

(ui, uj)

w(ui, uj)

G1

G2T

Figure 1: [Left] We draw i.i.d. samples ui, uj from Uniform[0,1] and assignGt[i, j] = 1 with probability w(ui, uj), for t = 1, . . . , 2T . [Middle] Heat mapof a graphon w. [Right] A random graph generated by the graphon shownin the middle.

The problem of interest is defined as follows: Given a sequence of 2Tobserved directed graphs G1, . . . , G2T , can we make an estimate w of w,such that w → w with high probability as n → ∞? (In this problem weassume that the observed graphs share the same set of vertices, in a waythat the i-th vertex have the same position ui in all graphs)

Similarity of graphon slicesTo measure the similarity between two labels using the graphon slices,we define the following distance

dij =1

2

(∫ 1

0

[w(x, ui)− w(x, uj)]2dx+

∫ 1

0

[w(ui, y)− w(uj , y)]2dy

)

=1

2

[(cii − cij − cji + cjj) + (rii − rij − rji + rjj)

]

where

cij =

∫ 1

0

w(x, ui)w(x, uj)dx and rij =

∫ 1

0

w(ui, y)w(uj , y)dy.

We consider the following estimators for cij and rij :

ckij =1

T 2

∑1≤t1≤T

Gt1 [k, i]

∑T<t2≤2T

Gt2 [k, j]

,

rkij =1

T 2

∑1≤t1≤T

Gt1 [i, k]

∑T<t2≤2T

Gt2 [j, k]

.

Summing all possible k’s yields an estimator dij that looks similar to dij :

dij =1

2

[1

S

k∈S

(rkii − rkij − rkji + rkjj

)+(ckii − ckij − ckji + ckjj

)],

where S = 1, . . . , n\i, j is the set of summation indices.

Theorem 1 The estimator dij for dij is unbiased and satisfies

P(|dij − dij | > ε) ≤ 8e−Sε2

32/T+8ε/3 ,

for any ε > 0.

Algorithm (SBA)To cluster the unknown labels u1, . . . , un we propose a greedy ap-proach as shown in Algorithm 1. Starting with Ω = u1, . . . , un, werandomly pick a node ip and call it the pivot. Then for all other ver-tices iv ∈ Ω\ip, we compute the distance dip,iv and check whetherdip,iv < ∆2 for some precision parameter ∆ > 0. If dip,iv < ∆2,then we assign iv to the same block as ip. Therefore, after scan-ning through Ω once, a block B = ip, iv1

, iv2, . . . will be defined.

By updating Ω as Ω ← Ω\B, the process repeats until Ω = ∅.

Algorithm 1: Clustering the verticesInput: Observed graphs G1, . . . , G2T and precision parameter ∆Output: Estimated stochastic blocks B1, . . . , BK

Initialize: Ω = 1, . . . , n, and k = 1;while Ω 6= ∅ do

Randomly choose a vertex ip from Ω and assign it as the pivot for Bk:Bk ← ip;for iv ∈ Ω\ip do

Compute the distance estimate dip,iv ;if dip,iv ≤ ∆2 then

assign iv as a member of Bk: Bk ← iv;end

endUpdate Ω← Ω\Bk;Update k ← k + 1;

end

Once the blocks B1, . . . , BK are defined, we can then determine w(ui, uj)by computing the empirical frequency of edges that are present acrossblocks Bi and Bj :

w(ui, uj) =1

|Bi| |Bj |

∑ix∈Bi

∑jy∈Bj

1

2T(G1[ix, jy] + G2[ix, jy] + . . . + G2T [ix, jy])

where Bi is the block containing ui.

ConsistencyThe performance of the Algorithm 1 depends on the number of blocks itdefines. On the one hand, it is desirable to have more blocks so that thegraphon can be finely approximated. But on the other hand, if the num-ber of blocks is too large then each block will contain only few vertices,what might be a problem because in order to estimate the probabilities ofconnection, a sufficient number of vertices in each block is required. Thetrade-off between these two cases is controlled by the precision param-eter ∆: a large ∆ generates few large clusters, while small ∆ generatesmany small clusters. The following theorems shows how to balance thechoice of ∆ in order to achieve consistency.

Theorem 2 Let ∆ be the accuracy parameter and K be the number ofblocks estimated by Algorithm 1, then

Pr

[K >

QL√

2

]≤ 8n2e

− S∆4

128/T+16∆2/3 ,

where L is the Lipschitz constant and Q is the number of Lipschitz blocksin w.

Theorem 3 If S ∈ Θ(n) and ∆ ∈ ω((

log(n)n

) 14

)∩ o(1), then

limn→∞

E[MAE(w)] = 0 and limn→∞

E[MSE(w)] = 0.

where

MSE(w) =1

n2

n∑

iv=1

n∑

jv=1

(w(uiv , ujv )− w(uiv , ujv ))2

MAE(w) =1

n2

n∑

iv=1

n∑

jv=1

|w(uiv , ujv )− w(uiv , ujv )| .

Choosing parameterIn practice, we estimate ∆ using a cross-validation scheme to find theoptimal 2D histogram bin width. The idea is to test a sequence of potentialvalues of ∆ and seek the one that minimizes the cross validation risk,defined as

J(∆) =2

h(n− 1)− n+ 1

h(n− 1)

K∑

j=1

p2j ,

where pj = |Bj |/n and h = 1/K.

Algorithm 2: Cross ValidationInput: Graphs G1, . . . , G2T

Output: Blocks B1, . . . , BK , and optimal ∆for a sequence of ∆’s do

Estimate blocks B1, . . . , BK from G1, . . . , G2T . [Algorithm 1];Compute pj = |Bj |/n, for j = 1, . . . ,K;Compute J(∆) = 2

h(n−1)− n+1

h(n−1)

∑Kj=1 p

2j , with h = 1/K;

endPick the ∆ with minimum J(∆), and the corresponding B1, . . . , BK ;

ExperimentsFor the purpose of comparison, we consider (i) the universal singu-lar value thresholding (USVT) [Cha2012]; (ii) the largest-gap algorithm(LG) [CRD2012]; (iii) matrix completion from few entries (OptSpace)[KMO2010].

• Estimating stochastic blockmodels We generate (arbitrarily) agraphon

w =

0.8 0.9 0.4 0.50.1 0.6 0.3 0.20.3 0.2 0.8 0.30.4 0.1 0.2 0.9

, (1)

which represents a piecewise constant function with 4 × 4 equi-space blocks. The following figures show the asymptotic behaviorof the algorithms when n grows (left), and the estimation error ofSBA algorithm as T grows for graphs of size 200 vertices (right).

0 200 400 600 800 1000-3

-2.5

-2

-1.5

-1

-0.5

n

log10(M

AE)

ProposedLargest GapOptSpaceUSVT

0 5 10 15 20 25 30 35 40-3

-2.9

-2.8

-2.7

-2.6

-2.5

-2.4

-2.3

-2.2

-2.1

-2

2T

log10(M

AE)

Proposed

(a) Growing graph size, n (b) Growing no. observations, 2T

Figure 2: (a) MAE reduces as graph size grows. For the fairness of the amount of data that can beused, we use n

2 × n2 × 2 observations for SBA, and n × n × 1 observation for USVT [6] and LG

[5]. (b) MAE of the proposed SBA algorithm reduces when more observations T is available. Bothplots are averaged over 100 independent trials.

Accuracy as a function of growing number of blocks. Our second experiment is to evaluate theperformance of the algorithms as K , the number of blocks, increases. To this end, we consider asequence of K , and for each K we generate a graphon w of K × K blocks. Each entry of theblock is a random number generated from Uniform[0, 1]. Same as the previous experiment, we fixn = 200 and T = 1. The experiment is repeated over 100 trials so that in every trial a differentgraphon is generated. The result shown in Figure 3(a) indicates that while estimation error increasesas K grows, the proposed SBA algorithm still attains the lowest MAE for all K .

0 5 10 15 20-1.4

-1.3

-1.2

-1.1

-1

-0.9

-0.8

-0.7

K

log10(M

AE)

ProposedLargest GapUSVT

0 5 10 15 20-1.6

-1.5

-1.4

-1.3

-1.2

-1.1

-1

-0.9

-0.8

-0.7

-0.6

% missing links

log10(M

AE)

ProposedLargest GapOptSpaceUSVT

(a) Growing no. blocks, K (b) Missing links

Figure 3: (a) As K increases, MAE of all three algorithm increases but SBA still attains the lowestMAE. Here, we use n

2 × n2 × 2 observations for SBA, and n× n× 1 observation for USVT [6] and

LG [5]. (b) Estimation of graphon in the presence of missing links: As the amount of missing linksincreases, estimation error also increases.

4.2 Estimation with missing edges

Our next experiment is to evaluate the performance of proposed SBA algorithm when there aremissing edges in the observed graph. To model missing edges, we construct an n× n binary matrixM with probability Pr[M [i, j] = 0] = ξ, where 0 ≤ ξ ≤ 1 defines the percentage of missingedges. Given ξ, 2T matrices are generated with missing edges, and the observed graphs are definedas M1 ⊙ G1, . . . ,M2T ⊙ G2T , where ⊙ denotes the element-wise multiplication. The goal is tostudy how well SBA can reconstruct the graphon w in the presence of missing links.

7

Figure 2: [Left] MAE reduces as graph size grows. For the fairnessof the amount of data that can be used, we use n

2× n

2×2 observations

for SBA, and n× n× 1 observation for USVT and LG. [Right] MAEof the proposed SBA algorithm reduces when more observations T isavailable. Both plots are averaged over 100 independent trials.

• Accuracy as a function of growing number of blocks

Our second experiment is to evaluate the performance of the al-gorithms as K, the number of blocks, increases. To this end, weconsider a sequence of K, and for each K we generate a graphonw ofK×K blocks. Each entry of the block is a random number gen-erated from Uniform[0, 1]. Same as the previous experiment, we fixn = 200 and T = 1. The experiment is repeated over 100 trials sothat in every trial a different graphon is generated. The result shownin (a) indicates that while estimation error increases asK grows, theproposed SBA algorithm still attains the lowest MAE for all K.

0 200 400 600 800 1000-3

-2.5

-2

-1.5

-1

-0.5

n

log10(M

AE)

ProposedLargest GapOptSpaceUSVT

0 5 10 15 20 25 30 35 40-3

-2.9

-2.8

-2.7

-2.6

-2.5

-2.4

-2.3

-2.2

-2.1

-2

2T

log10(M

AE)

Proposed

(a) Growing graph size, n (b) Growing no. observations, 2T

Figure 2: (a) MAE reduces as graph size grows. For the fairness of the amount of data that can beused, we use n

2 × n2 × 2 observations for SBA, and n × n × 1 observation for USVT [6] and LG

[5]. (b) MAE of the proposed SBA algorithm reduces when more observations T is available. Bothplots are averaged over 100 independent trials.

Accuracy as a function of growing number of blocks. Our second experiment is to evaluate theperformance of the algorithms as K , the number of blocks, increases. To this end, we consider asequence of K , and for each K we generate a graphon w of K × K blocks. Each entry of theblock is a random number generated from Uniform[0, 1]. Same as the previous experiment, we fixn = 200 and T = 1. The experiment is repeated over 100 trials so that in every trial a differentgraphon is generated. The result shown in Figure 3(a) indicates that while estimation error increasesas K grows, the proposed SBA algorithm still attains the lowest MAE for all K .

0 5 10 15 20-1.4

-1.3

-1.2

-1.1

-1

-0.9

-0.8

-0.7

K

log10(M

AE)

ProposedLargest GapUSVT

0 5 10 15 20-1.6

-1.5

-1.4

-1.3

-1.2

-1.1

-1

-0.9

-0.8

-0.7

-0.6

% missing links

log10(M

AE)

ProposedLargest GapOptSpaceUSVT

(a) Growing no. blocks, K (b) Missing links

Figure 3: (a) As K increases, MAE of all three algorithm increases but SBA still attains the lowestMAE. Here, we use n

2 × n2 × 2 observations for SBA, and n× n× 1 observation for USVT [6] and

LG [5]. (b) Estimation of graphon in the presence of missing links: As the amount of missing linksincreases, estimation error also increases.

4.2 Estimation with missing edges

Our next experiment is to evaluate the performance of proposed SBA algorithm when there aremissing edges in the observed graph. To model missing edges, we construct an n× n binary matrixM with probability Pr[M [i, j] = 0] = ξ, where 0 ≤ ξ ≤ 1 defines the percentage of missingedges. Given ξ, 2T matrices are generated with missing edges, and the observed graphs are definedas M1 ⊙ G1, . . . ,M2T ⊙ G2T , where ⊙ denotes the element-wise multiplication. The goal is tostudy how well SBA can reconstruct the graphon w in the presence of missing links.

7

Figure 3: As K increases, SBA still attains the lowest MAE. Here,we use n

2× n

2× 2 observations for SBA, and n × n × 1 observation

for USVT and LG

Experiments• Estimation with missing edges Our next experiment is to evaluate

the performance of proposed SBA algorithm when there are miss-ing edges in the observed graph. To model missing edges, we con-struct an n× n binary matrix M with probability Pr[M [i, j] = 0] = ξ,where 0 ≤ ξ ≤ 1 defines the percentage of missing edges. Givenξ, 2T matrices are generated with missing edges, and the observedgraphs are defined as M1 G1, . . . ,M2T G2T , where denotesthe element-wise multiplication. The goal is to study how well SBAcan reconstruct the graphon w in the presence of missing links.

0 200 400 600 800 1000-3

-2.5

-2

-1.5

-1

-0.5

n

log10(M

AE)

ProposedLargest GapOptSpaceUSVT

0 5 10 15 20 25 30 35 40-3

-2.9

-2.8

-2.7

-2.6

-2.5

-2.4

-2.3

-2.2

-2.1

-2

2T

log10(M

AE)

Proposed

(a) Growing graph size, n (b) Growing no. observations, 2T

Figure 2: (a) MAE reduces as graph size grows. For the fairness of the amount of data that can beused, we use n

2 × n2 × 2 observations for SBA, and n × n × 1 observation for USVT [6] and LG

[5]. (b) MAE of the proposed SBA algorithm reduces when more observations T is available. Bothplots are averaged over 100 independent trials.

Accuracy as a function of growing number of blocks. Our second experiment is to evaluate theperformance of the algorithms as K , the number of blocks, increases. To this end, we consider asequence of K , and for each K we generate a graphon w of K × K blocks. Each entry of theblock is a random number generated from Uniform[0, 1]. Same as the previous experiment, we fixn = 200 and T = 1. The experiment is repeated over 100 trials so that in every trial a differentgraphon is generated. The result shown in Figure 3(a) indicates that while estimation error increasesas K grows, the proposed SBA algorithm still attains the lowest MAE for all K .

0 5 10 15 20-1.4

-1.3

-1.2

-1.1

-1

-0.9

-0.8

-0.7

K

log10(M

AE)

ProposedLargest GapUSVT

0 5 10 15 20-1.6

-1.5

-1.4

-1.3

-1.2

-1.1

-1

-0.9

-0.8

-0.7

-0.6

% missing links

log10(M

AE)

ProposedLargest GapOptSpaceUSVT

(a) Growing no. blocks, K (b) Missing links

Figure 3: (a) As K increases, MAE of all three algorithm increases but SBA still attains the lowestMAE. Here, we use n

2 × n2 × 2 observations for SBA, and n× n× 1 observation for USVT [6] and

LG [5]. (b) Estimation of graphon in the presence of missing links: As the amount of missing linksincreases, estimation error also increases.

4.2 Estimation with missing edges

Our next experiment is to evaluate the performance of proposed SBA algorithm when there aremissing edges in the observed graph. To model missing edges, we construct an n× n binary matrixM with probability Pr[M [i, j] = 0] = ξ, where 0 ≤ ξ ≤ 1 defines the percentage of missingedges. Given ξ, 2T matrices are generated with missing edges, and the observed graphs are definedas M1 ⊙ G1, . . . ,M2T ⊙ G2T , where ⊙ denotes the element-wise multiplication. The goal is tostudy how well SBA can reconstruct the graphon w in the presence of missing links.

7

Figure 4: Estimation of graphon in the presence of missing links: Asthe amount of missing links increases, estimation error also increases.

• Estimating continuous graphons

Our final experiment is to evaluate the proposed SBA algorithmin estimating continuous graphons. Here, we consider two of thegraphons reported in [Cha 2012]:

w1(u, v) =1

1 + exp−50(u2 + v2) , and w2(u, v) = uv,

where u, v ∈ [0, 1]. Here, w2 can be considered as a special

The modification of the proposed SBA algorithm for the case missing links is minimal: when com-puting (6), instead of averaging over all ix ∈ Bi and jy ∈ Bj , we only average ix ∈ Bi and jy ∈ Bj

that are not masked out by all M ′s. Figure 3(b) shows the result of average over 100 independenttrials. Here, we consider the graphon given in (12), with n = 200 and T = 1. It is evident that SBAoutperforms its counterparts at a lower rate of missing links.

4.3 Estimating continuous graphons

Our final experiment is to evaluate the proposed SBA algorithm in estimating continuous graphons.Here, we consider two of the graphons reported in [6]:

w1(u, v) =1

1 + exp−50(u2 + v2) , and w2(u, v) = uv,

where u, v ∈ [0, 1]. Here, w2 can be considered as a special case of the Eigenmodel [11] or latentfeature relational model [19].

The results in Figure 4 shows that while both algorithms have improved estimates when n grows, theperformance depends on which of w1 and w2 that we are studying. This suggests that in practice thechoice of the algorithm should depend on the expected structure of the graphon to be estimated: If thegraph generated by the graphon demonstrates some low-rank properties, then USVT is likely to bea better option. For more structured or complex graphons the proposed procedure is recommended.

0 200 400 600 800 1000-3.2

-3.15

-3.1

-3.05

-3

-2.95

-2.9

n

log10(M

AE)

ProposedUSVT

0 200 400 600 800 1000-2

-1.8

-1.6

-1.4

-1.2

-1

-0.8

-0.6

n

log10(M

AE)

ProposedUSVT

(a) graphon w1 (b) graphon w2

Figure 4: Comparison between SBA and USVT in estimating two continuous graphons w1 and w2.Evidently, SBA performs better for w1 (high-rank) and worse for w2 (low-rank).

5 Concluding remarks

We presented a new computational tool for estimating graphons. The proposed algorithm approx-imates the continuous graphon by a stochastic block-model, in which the first step is to clusterthe unknown vertex labels into blocks by using an empirical estimate of the distance between twographon slices, and the second step is to build an empirical histogram to estimate the graphon. Com-plete consistency analysis of the algorithm is derived. The algorithm was evaluated experimentally,and we found that the algorithm is effective in estimating block structured graphons.

Acknowledgments. EMA is partially supported by NSF CAREER award IIS-1149662, ARO MURIaward W911NF-11-1-0036, and an Alfred P. Sloan Research Fellowship. SHC is partially supportedby a Croucher Foundation Post-Doctoral Research Fellowship.

References[1] D.J. Aldous. Representations for partially exchangeable arrays of random variables. Journal of Multi-

variate Analysis, 11:581–598, 1981.

8

Figure 5: Comparison between SBA and USVT in estimating twocontinuous graphons w1 and w2. Evidently, SBA performs better forhigh-rank w1 (left) and worse for low-rank w2 (right).

Concluding remarksWe presented a new computational tool for estimating graphons. Theproposed algorithm approximates the continuous graphon by a stochasticblock-model, in which the first step is to cluster the unknown vertex labelsinto blocks by using an empirical estimate of the distance between twographon slices, and the second step is to build an empirical histogram toestimate the graphon. Complete consistency analysis of the algorithm isderived. The algorithm was evaluated experimentally, and we found thatthe algorithm is effective in estimating block structured graphons.

References• [Cha2012] S. Chatterjee. Matrix estimation by universal singular

value thresholding. ArXiv:1212.1247. 2012.

• [CDR2012] A. Channarond, J. Daudin, and S. Robin. Classificationand estimation in the Stochastic Blockmodel based on the empiricaldegrees. Electronic Journal of Statistics, 6:2574-2601, 2012.

• [KMO2010] R.H. Keshavan, A. Montanari, and S. Oh. Matrixcompletion from a few entries. IEEE Trans. Information Theory,56:2980-2998, Jun. 2010.

• [LOGR2012] J.R. Lloyd, P. Orbanz, Z. Ghahramani, and D.M. Roy.Random function priors for exchangeable arrays with applications tographs and relational data. In Neural Information Processing Sys-tems (NIPS), 2012.

• [MGJ2009] K.T. Miller, T.L. Griffiths, and M.I. Jordan. Nonparamet-ric latent fature models for link prediction. In Neural InformationProcessing Systems (NIPS), 2009.