Conjecture
DPE for Graph Classification consistently finds $J$.

Proof. In progress...


Random Variables
Adjacency Matrix: $A : \Omega \to \mathcal{A} \subseteq \{0,1\}^{n_v \times n_v}$
Latent In-Vectors: $X : \Omega \to \mathcal{X} \subseteq \mathbb{R}_+^{d \times n_v}$
Latent Out-Vectors: $Y : \Omega \to \mathcal{Y} \subseteq \mathbb{R}_+^{d \times n_v}$
Parameter $\theta = (\rho, \tau)$
In- and Out-Vector Likelihoods: $\rho_X, \rho_Y \in \Delta_3$
Block Membership Function: $\tau : [n_v] \to [3]$

Sampling Distribution
$(A, Y), \; \mathcal{D}_{n_s} = \{(A_i, Y_i)\}_{i \in [n_s]} \sim F_{A,Y}$
$F_{A,Y} = \prod_{(u,v) \in E} \mathrm{Bern}(a_{uv}; \langle X_u, Y_v \rangle)\, \rho_X(\tau_u)\, \rho_Y(\tau_v) \in \mathcal{F}_{A,Y}$

Input: $A$, $\mathcal{D}_{n_s}$, $d$
Output: $\hat{y}$, $\hat{\tau}$ (and nuisance parameters $\hat{\rho}_X$ and $\hat{\rho}_Y$)
1: Let $\bar{A}_y = \frac{1}{n_y} \sum_{i : y_i = y} A_i$ be the average adjacency matrix for class $y$.
2: Let $[\tilde{U}_y, \tilde{D}_y, \tilde{V}_y] = \mathrm{SVD}(\bar{A}_y)$, keeping only the $d$ triplets with the largest singular values.
3: Cluster $\tilde{U}$ and $\tilde{V}$ using a perfect $K$-means clustering algorithm, forcing one cluster to have vertices from all classes and one cluster for each class.
4: Let $\hat{\tau}$ be the cluster assignments for each of the vertices.
5: Do a DPE for $A$, and cluster each vertex accordingly. Let $J$ be the cluster of vertices that are informative with regard to the classification task.
6: Let $\hat{y} = \operatorname{argmax}_{y \in \mathcal{Y}} \prod_{u,v \in J} \mathrm{Bern}(a_{uv}; \langle \hat{X}_u, \hat{Y}_v \rangle)\, \hat{\rho}_X(\hat{\tau}_u)\, \hat{\rho}_Y(\hat{\tau}_v)$
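
To make the embedding-and-likelihood steps concrete, here is a minimal numpy sketch of the classifier above, under simplifying assumptions: the informative vertex set $J$ is passed in rather than estimated via clustering (steps 3-5 are omitted). The names `dpe` and `dpe_classify` are illustrative, not from the papers.

```python
# Minimal sketch of steps 1-2 and 6, assuming binary adjacency matrices and a
# known informative vertex set J (estimating J via clustering is omitted).
import numpy as np

def dpe(A, d):
    """Dot product embedding: scaled top-d singular vectors of A."""
    U, sv, Vt = np.linalg.svd(A)
    scale = np.sqrt(sv[:d])
    return U[:, :d] * scale, Vt[:d].T * scale  # in-vectors Xhat, out-vectors Yhat

def dpe_classify(A, train_graphs, train_labels, d, J):
    """Pick the class whose average-graph embedding best explains A on J."""
    labels = np.asarray(train_labels)
    best_y, best_ll = None, -np.inf
    for y in np.unique(labels):
        Abar = np.mean([g for g, yy in zip(train_graphs, labels) if yy == y], axis=0)
        Xhat, Yhat = dpe(Abar, d)                   # steps 1-2
        P = np.clip(Xhat @ Yhat.T, 1e-6, 1 - 1e-6)  # <Xhat_u, Yhat_v> edge probs
        a, p = A[np.ix_(J, J)], P[np.ix_(J, J)]     # restrict to vertices in J
        ll = np.sum(a * np.log(p) + (1 - a) * np.log(1 - p))  # Bernoulli log-lik (step 6)
        if ll > best_ll:
            best_y, best_ll = y, ll
    return best_y
```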

Definitions
Adjacency Matrices: $A, B \in \mathbb{R}^{n \times n}$
Permutation Matrices: $\mathcal{Q} = \{Q : Q\mathbf{1} = \mathbf{1},\, Q^\mathsf{T}\mathbf{1} = \mathbf{1},\, Q \in \{0,1\}^{n \times n}\}$
Doubly Stochastic Matrices: $\mathcal{D} = \{D : D\mathbf{1} = \mathbf{1},\, D^\mathsf{T}\mathbf{1} = \mathbf{1},\, d_{uv} \geq 0\}$

Objective Function
(QAP) $\hat{Q} = \operatorname{argmin}_{Q \in \mathcal{Q}} \|A - QBQ^\mathsf{T}\|_F = \operatorname{argmax}_{Q \in \mathcal{Q}} \langle A, QBQ^\mathsf{T} \rangle$

Input: $A$, $B$
Output: $\hat{Q}$
1: for $i = 1, \ldots, i_{\max}$ do
2:   Let $\hat{Q}^i_1$ be either $\mathbf{1}\mathbf{1}^\mathsf{T}/n$, $I$, or something near $I$.
3:   Use the Frank-Wolfe algorithm to find a local optimum of the following relaxed quadratic assignment problem (rQAP): $\hat{Q}^i_2 = \operatorname{argmax}_{D \in \mathcal{D}} \langle A, DBD^\mathsf{T} \rangle$.
4:   Project $\hat{D}$ onto $\mathcal{Q}$ using the Hungarian Algorithm to obtain $\hat{Q}^i$.
5: end for
6: Let $\hat{Q} = \operatorname{argmax}_{i \in [i_{\max}]} \langle A, \hat{Q}^i B (\hat{Q}^i)^\mathsf{T} \rangle$

Theorem
rQAP has the same optimum as QAP whenever $A$ and $B$ are the adjacency matrices of simple graphs isomorphic to one another.

Proof. The set of doubly stochastic matrices is the convex hull of the set of permutation matrices; thus, if a permutation matrix solves rQAP, then it also solves QAP. Moreover, $\langle A, A \rangle = 2m$ (where $m = \frac{1}{2}\sum_{uv} A_{uv}$ is the number of edges). Thus, it is sufficient to show that $\langle A, DBD^\mathsf{T} \rangle \leq \langle A, A \rangle = 2m$ for every doubly stochastic $D$, with equality attained at the isomorphism. This follows because $(DBD^\mathsf{T})_{uv} \leq 1$.
NB: This is parallel to rLAP being equivalent to LAP.
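
The loop above is straightforward to sketch in Python. Below is a minimal Frank-Wolfe implementation for symmetric $A$ and $B$, maximizing $\langle A, PBP^\mathsf{T} \rangle$ over doubly stochastic $P$ and projecting onto permutations with SciPy's Hungarian solver; it starts only from the barycenter and omits restarts. (SciPy ships a full implementation of this procedure as `scipy.optimize.quadratic_assignment` with `method='faq'`.) The name `faq_match` is illustrative.

```python
# Minimal Frank-Wolfe sketch of the relaxed QAP solver for symmetric A, B.
import numpy as np
from scipy.optimize import linear_sum_assignment

def faq_match(A, B, n_iter=30):
    n = A.shape[0]
    P = np.full((n, n), 1.0 / n)                   # start at the barycenter 11^T / n
    for _ in range(n_iter):
        grad = A @ P @ B.T + A.T @ P @ B           # gradient of <A, P B P^T>
        rows, cols = linear_sum_assignment(-grad)  # Frank-Wolfe direction (Hungarian)
        Q = np.zeros((n, n))
        Q[rows, cols] = 1.0
        R = Q - P
        # The objective along P + t*R is a*t^2 + b*t + const; maximize on [0, 1].
        a = np.sum(A * (R @ B @ R.T))
        b = np.sum(A * (P @ B @ R.T + R @ B @ P.T))
        t = np.clip(-b / (2 * a), 0.0, 1.0) if a < 0 else float(a + b > 0)
        P += t * R
    rows, cols = linear_sum_assignment(-P)         # project onto permutation matrices
    Qhat = np.zeros((n, n))
    Qhat[rows, cols] = 1.0
    return Qhat                                    # approx. argmax_Q <A, Q B Q^T>
```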

Random Variables
Adjacency Matrix: $A : \Omega \to \mathcal{A} \subseteq \{0,1\}^{n_v \times n_v}$
Permutation Matrix: $Q : \Omega \to \mathcal{Q} = \{Q : q_{uv} \in \{0,1\},\, Q\mathbf{1} = \mathbf{1},\, Q^\mathsf{T}\mathbf{1} = \mathbf{1}\}$
Graph Class: $Y : \Omega \to \mathcal{Y} = [n_y]$

Sampling Distribution
$F_{Q,A,Y}(a, y; \theta) = F_Q F_{A|Y} F_Y = F_{A|Y} F_Y \,\mathrm{Uni}(\mathcal{Q})$
$(Q, A, Y), \; \mathcal{D}_{n_s} = \{(Q_i, A_i, Y_i)\}_{i \in [n_s]} \overset{iid}{\sim} F_{Q,A,Y} \in \mathcal{F}_{Q,A,Y}$
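
As a toy illustration of this sampling model, shuffling amounts to conjugating a drawn adjacency matrix by a uniformly random permutation matrix. A minimal sketch (the helper name `shuffle_graph` is illustrative):

```python
# Minimal sketch of the shuffling step: hide vertex labels by conjugating
# the adjacency matrix with a uniformly random permutation matrix Q.
import numpy as np

def shuffle_graph(A, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    n = A.shape[0]
    Q = np.eye(n, dtype=int)[rng.permutation(n)]  # Q ~ Uni(Q)
    return Q @ A @ Q.T, Q                         # observed shuffled adjacency matrix
```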

Random Variables
Adjacency Matrix: $A : \Omega \to \mathcal{A} \subseteq \{0,1\}^{n_v \times n_v}$
Graph Class: $Y : \Omega \to \mathcal{Y} = [n_y]$
Parameter $\theta = (P, \pi, S)$
Edge Probabilities: $P = (p_{uv|y}) \in (0,1)^{n_v \times n_v \times n_y}$
Class Priors: $\pi = (\pi_1, \ldots, \pi_{n_y}) \in \Delta_{n_y}$
Signal Subgraph: $S = \{(u,v) : p_{uv|y_i} \neq p_{uv|y_j} \; \forall\, y_i \neq y_j\} \subseteq [n_v]^2$

Sampling Distribution
$F_{A,Y}(a, y; \theta) = \pi_y \prod_{(u,v) \in S} \mathrm{Bern}(a_{uv}; p_{uv|y}) \times \prod_{(u,v) \in E \setminus S} \mathrm{Bern}(a_{uv}; p_{uv})$
$(A, Y), \; \mathcal{D}_{n_s} = \{(A_i, Y_i)\}_{i \in [n_s]} \overset{iid}{\sim} F_{A,Y} \in \mathcal{F}_{A,Y}$
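
A minimal sketch of drawing $(A, Y)$ from this model, assuming every ordered vertex pair is a potential edge and the parameters are given as arrays (the name `sample_labeled_graph` and the argument layout are illustrative):

```python
# Minimal sketch of sampling (A, Y): class-conditional edge probabilities on
# the signal subgraph S, class-independent probabilities elsewhere.
import numpy as np

def sample_labeled_graph(P, p0, S, pi, rng=None):
    """P: (nv, nv, ny) class-conditional edge probs; p0: (nv, nv) shared probs;
    S: boolean (nv, nv) signal-subgraph mask; pi: class priors."""
    if rng is None:
        rng = np.random.default_rng()
    y = rng.choice(len(pi), p=pi)                 # draw the class from pi
    probs = np.where(S, P[:, :, y], p0)           # p_{uv|y} on S, p_{uv} off S
    A = (rng.random(probs.shape) < probs).astype(int)
    return A, y
```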

Let $\hat{L}_{\tilde{\delta}_s}$ be the misclassification rate of the graph-matched $k_s$-nearest-neighbor algorithm below, and let $\tilde{L}^*$ be the Bayes optimal misclassification rate for shuffled graphs.

Theorem
$\hat{L}_{\tilde{\delta}_s} \to \tilde{L}^*$ as $s \to \infty$.

Proof. Because the joint space of adjacency matrices, permutation matrices, and graph classes has finite cardinality, the law of large numbers ensures that, as $s \to \infty$, the plurality of nearest neighbors to a test graph will eventually be identical to the test graph.

Theorem
$\hat{S} \to S$ as $n_s \to \infty$.

Proof. $\mathcal{A}$ and $\mathcal{Y}$ are finite, so by the law of large numbers, $T_{(i)} \to \varepsilon > 0 \;\; \forall i \in S$ and $T_{(i)} \to 0 \;\; \forall i \notin S$.

Graph-Matched Frobenius Norm $k_s$-Nearest-Neighbor Algorithm
Input: $A$; a rule for $k_s$ such that $k_s/s \to 0$ and $k_s \to \infty$ as $s \to \infty$; $\mathcal{D}_{n_s}$
Output: $\hat{y}$
1: Compute the graph-matched Frobenius norm distance between $A$ and each training graph: $\tilde{\delta}_i = \min_{Q \in \mathcal{Q}} \|A - QA_iQ^\mathsf{T}\|_F^2$
2: Rank the distances in increasing order: $\tilde{\delta}_{(1)} \leq \tilde{\delta}_{(2)} \leq \cdots \leq \tilde{\delta}_{(n_s)}$.
3: Let $\hat{y} = \operatorname{argmax}_{y \in \mathcal{Y}} \sum_{i : y_i = y} \mathbb{I}\{\tilde{\delta}_i \leq \tilde{\delta}_{(k_s)}\}$
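
A minimal sketch of this classifier, using SciPy's `quadratic_assignment` (an approximate graph-matching routine in the FAQ family described above) to compute the matched distances; exact minimization over $\mathcal{Q}$ is intractable, so the distances are approximate. `matched_knn_classify` is an illustrative name.

```python
# Minimal sketch of the graph-matched Frobenius-norm k_s-NN classifier.
import numpy as np
from scipy.optimize import quadratic_assignment

def matched_knn_classify(A, train_graphs, train_labels, k):
    dists = []
    for Ai in train_graphs:
        res = quadratic_assignment(A, Ai, options={"maximize": True})  # FAQ match
        perm = res.col_ind
        dists.append(np.linalg.norm(A - Ai[perm][:, perm]) ** 2)  # delta_i
    order = np.argsort(dists)[:k]               # k smallest matched distances
    votes = np.asarray(train_labels)[order]
    vals, counts = np.unique(votes, return_counts=True)
    return vals[np.argmax(counts)]              # plurality vote
```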

Input: $A$, $\mathcal{D}_{n_s}$, number of signal edges $s$ and signal vertices $m$
Output: $\hat{y}$, $\hat{S}$ (and nuisance parameters $\hat{P}$, $\hat{\pi}$)
1: Compute the significance of each edge using Fisher's exact test on $\mathcal{D}_{n_s}$, yielding $T_{(1)} \geq T_{(2)} \geq \cdots \geq T_{(n_E)}$.
2: Rank edges by significance with respect to each vertex, $E_{k,(1)} \geq E_{k,(2)} \geq \cdots \geq E_{k,(n-1)}$, for all $k \in V$.
3: while not converged do
4:   Increase the critical value $c$ from $T_{(i)}$ to $T_{(i+1)}$.
5:   Compute the vertex score $w_{v;c} = \sum_{u \in [V]} \mathbb{I}\{T_{v,u} > c\}$ for each vertex.
6:   Converge if $\sum_{v \in [m]} w_{v;c} \geq s$, summing over the $m$ best-scoring vertices.
7: end while
8: Let $\hat{S}$ be the set of $s$ most significant edges incident to the $m$ best-scoring vertices.
9: Let $\hat{y} = \operatorname{argmax}_{y \in \mathcal{Y}} \prod_{(u,v) \in \hat{S}} \mathrm{Bern}(a_{uv}; \hat{p}_{uv|y})\, \hat{\pi}_y$
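
A simplified sketch of the coherent estimator for two classes: the threshold sweep of steps 3-7 is collapsed into a direct ranking of vertices by the total significance of their incident edges, which is in the spirit of, but not identical to, the procedure above. `coherent_signal_subgraph` is an illustrative name.

```python
# Simplified sketch of coherent signal-subgraph estimation for two classes:
# per-edge Fisher exact tests, then the s most significant edges incident
# to the m best-scoring vertices.
import numpy as np
from scipy.stats import fisher_exact

def coherent_signal_subgraph(graphs, labels, s, m):
    graphs, labels = np.asarray(graphs), np.asarray(labels)
    nv = graphs.shape[1]
    T = np.zeros((nv, nv))                           # edge significance: -log p
    for u in range(nv):
        for v in range(nv):
            table = []
            for cls in (0, 1):                       # 2x2 edge-vs-class table
                e = int(graphs[labels == cls, u, v].sum())
                table.append([e, int((labels == cls).sum()) - e])
            T[u, v] = -np.log(fisher_exact(table)[1])
    score = T.sum(axis=0) + T.sum(axis=1)            # significance incident to each vertex
    best = np.argsort(score)[-m:]                    # m best-scoring vertices
    mask = np.zeros((nv, nv), dtype=bool)
    mask[best, :] = mask[:, best] = True             # edges incident to them
    flat = np.where(mask.ravel(), T.ravel(), -np.inf)
    top = np.argsort(flat)[-s:]                      # s most significant such edges
    return sorted(zip(*np.unravel_index(top, (nv, nv))))
```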

DPE for Graph Classification

Setting
We observe a collection of graphs and their associated classes. The vertices may be labeled or unlabeled. We assume that only a subset of vertices is informative with regard to the classification task.

Goal
For a novel graph, find its most likely class and which vertices encode the class-conditional signal.

Statistical Connectomics Application
Classify arbitrarily large graphs, including those with and without vertex labels, without necessitating graph matching or estimating $O(n^2)$ parameters.

Fast Inexact Graph Matching

Setting
We observe a pair of unlabeled graphs.

Goal
Find the isomorphism that matches the graphs optimally.

Statistical Connectomics Application
A subroutine of our shuffled graph classifier.

Shuffled Graph Classification

Setting
We observe a collection of graphs without labeled vertices and their associated graph classes.

Goal
For a novel graph, find its most likely class.

Statistical Connectomics Application
Classify brain-graphs for which vertices lack labels. This includes collections of brain-graphs across species, or whenever vertices represent vertebrate neurons.

Labeled Graph Classification

Setting
We observe a collection of graphs with labeled vertices and their associated graph classes. We assume that only a subset of edges/vertices is informative with regard to the classification task.

Goal
For a novel graph, find its most likely class and which edges/vertices encode the class-conditional signal.

Statistical Connectomics Application
Classify brain-graphs for which vertices are labeled (for example, invertebrate brain-graphs where vertices represent neurons, or vertebrate brain-graphs where vertices represent brain regions) and find which edges/vertices encode various cognitive/behavioral properties.

Figure: (Left) Simulation showing missed-edge rate and misclassification rate versus the number of training samples for the coherent (coh), incoherent (inc), and naive Bayes (nb) classifiers; the coherent classifier outperforms the incoherent classifiers as a function of sample size. (Right) MR connectome sex signal-subgraph estimation and analysis: misclassification rate versus the assumed number of signal edges for the incoherent estimator ($\hat{L}_{nb} = 0.41$, $\hat{L}_{inc} = 0.27$, $\hat{L}_{\pi} = 0.5$); misclassification rate over assumed numbers of signal edges and signal vertices for the coherent estimator, with a zoomed-in view at assumed $m = 12$; the coherent signal-subgraph estimate; and the threshold coherogram. By cross-validating over hyperparameters and models, we estimate that the "best" coherent signal subgraph (for this inference task on these data) has $\hat{m}_{coh} = 12$ and $\hat{s}_{coh} = 360$, achieving $\hat{L}_{coh} = 0.16$.

Figure: Approximate QAP performance on the QAP benchmark library (instances chr12c, chr15a, chr15c, chr20b, chr22b, esc16b, rou12, rou15, rou20, tai10a, tai15a, tai17a, tai20a, tai30a, tai35a, tai40a), comparing the error of QAP100, QAP3, QAP1, and PSOA.

                 chemical     electrical    unit
Accuracy         100 (0)      59 (0.30)     %
Restarts         3 (0)        25 (6.7)      #
Solution Time    42 (0.42)    79 (20)       sec.

Figure: Connectome Classifier Comparison: misclassification rate versus number of training samples. 2000 Monte Carlo sub-samples of the data were drawn for each $s$, such that error bars were negligibly small. Five classifiers were compared: $\delta$ is the $k_s$-NN classifier on labeled graphs; $\hat{\delta}$ is the $k_s$-NN on a collection of graph invariants; $\pi$ is chance; $\tilde{\delta}$ is the $k_s$-NN on shuffled graphs with graph matching; and $\delta_0$ is the $k_s$-NN on shuffled graphs (without graph matching).

Large Graph Classification: Theory and Statistical Connectomics Applications
Joshua T. Vogelstein, Donniell E. Fishkind, Daniel L. Sussman & Carey E. Priebe | JHU, Dept of Applied Math & Statistics
Model | Alg | Theory | Task | Data

Theorem
If, for all channels $c$, $d^{(c)}$ is known, then almost always $\varepsilon(n) \in O(\log n / n)$. If, for any color $c$, $d^{(c)}$ is unknown, then almost always $\varepsilon(n) \in O(n^{-1/4})$.

Proof. The proof of this theorem is an extension of the above proof, with the following generalizations: (i) multiple dependent channels are incorporated, (ii) $d^{(c)}$ need not be specified (although doing so speeds up convergence), and (iii) $K$ need not be specified.

Semi-sup. Dot Product Embedding
Input: $A$, $(K, d^{(c)})$
Output: $\hat{\tau}$ (and nuisance parameters $\hat{B}$, $\hat{\rho}$)
1: for $c \in [C]$ do
2:   $[\tilde{U}^{(c)}, \tilde{D}^{(c)}, \tilde{V}^{(c)}] = \mathrm{SVD}(A^{(c)})$, keeping only the $d^{(c)}$ triplets with the largest singular values.
3:   Let $U^{(c)} = \tilde{U}^{(c)} \sqrt{\tilde{D}^{(c)}}$, and similarly for $V^{(c)}$.
4: end for
5: Concatenate all embedded scaled vectors: $[U^{(1)} | \cdots | U^{(C)} | V^{(1)} | \cdots | V^{(C)}]$.
6: Use a perfect K-means to cluster the concatenated vectors.
7: Nominate the vertices which are in the cluster with the plurality of labeled vertices.
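
A minimal sketch of this procedure, with ordinary k-means standing in for the "perfect" K-means of step 6 and a single fixed $d$ shared across channels; the name `nominate` is illustrative.

```python
# Minimal sketch of semi-supervised DPE vertex nomination over C edge colors:
# embed each channel, concatenate the scaled singular vectors, cluster, and
# return the vertices in the cluster holding the plurality of labeled vertices.
import numpy as np
from sklearn.cluster import KMeans

def nominate(channels, d, K, labeled):
    """channels: list of (n, n) adjacency matrices A^(c); labeled: known vertex ids."""
    blocks = []
    for A in channels:
        U, sv, Vt = np.linalg.svd(A)
        scale = np.sqrt(sv[:d])
        blocks += [U[:, :d] * scale, Vt[:d].T * scale]    # U^(c) and V^(c)
    Z = np.hstack(blocks)                                 # step 5: concatenate
    tau = KMeans(n_clusters=K, n_init=10).fit_predict(Z)  # step 6: cluster
    target = np.bincount(tau[labeled]).argmax()           # plurality cluster of labels
    return np.setdiff1d(np.where(tau == target)[0], labeled)  # step 7: nominate
```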

Theorem
It almost always holds that $\varepsilon(n) = |\{u \in V : \hat{\tau}(u) \neq \tau(u)\}|/n \in O(\log n / n)$.

Proof (sketch).
1. Bound $\|AA^\mathsf{T} - (XY^\mathsf{T})(XY^\mathsf{T})^\mathsf{T}\|$, following Rohe et al. (2010).
2. Lower-bound the smallest non-zero singular value of $XY^\mathsf{T}$.
3. Apply the Davis-Kahan theorem.
4. The normalized dot product embedding of $A$ is approximately a rotation of the normalized dot product embedding of $XY^\mathsf{T}$.

Unsuper. Dot Product Embedding
Input: $A$, $(K, d)$
Output: $\hat{\tau}$ (and nuisance parameters $\hat{B}$, $\hat{\rho}$)
1: $[\tilde{U}, \tilde{D}, \tilde{V}] = \mathrm{SVD}(A)$, keeping only the $d$ triplets with the largest singular values.
2: Cluster $\tilde{U}$ and $\tilde{V}$ using a perfect $K$-means clustering algorithm.
3: Let $\hat{\tau}$ be the cluster assignments for each of the vertices.
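
A minimal sketch, again with ordinary k-means in place of the "perfect" K-means; `dpe_cluster` is an illustrative name.

```python
# Minimal sketch of unsupervised DPE: truncated SVD, scale by root singular
# values, and cluster the left and right embeddings jointly to get tau-hat.
import numpy as np
from sklearn.cluster import KMeans

def dpe_cluster(A, d, K):
    U, sv, Vt = np.linalg.svd(A)
    scale = np.sqrt(sv[:d])
    Z = np.hstack([U[:, :d] * scale, Vt[:d].T * scale])    # [U~ | V~], scaled
    return KMeans(n_clusters=K, n_init=10).fit_predict(Z)  # block assignments
```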

Stochastic Blockmodel Graph

Parameters
Number of Blocks: $K \in \mathbb{N}$
Block Membership Probabilities: $\rho \in \Delta_K$
Edge Probabilities: $B \in (0,1)^{K \times K}$

Random Variables
Adjacency Matrix: $A \in \{0,1\}^{n \times n}$
Block Membership Function: $\tau : [n] \to [K]$

Sampling Distribution
$(A, \tau) \sim F_{A,\tau} \in \mathcal{F}_{A,\tau}$
$F_{A,\tau} = \prod_{(u,v) \in E} \mathbb{P}[a_{uv} = 1 \mid \tau_u = i, \tau_v = j]\, \mathbb{P}[\tau_u = i]\, \mathbb{P}[\tau_v = j] = \prod_{(u,v) \in E} b_{\tau_u, \tau_v}\, \rho_{\tau_u}\, \rho_{\tau_v}$
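
A minimal sketch of drawing $(A, \tau)$ from this model (directed, without self-loops); the name `sample_sbm` is illustrative.

```python
# Minimal sketch of sampling (A, tau) from the stochastic blockmodel with
# K blocks, membership probabilities rho, and edge-probability matrix B.
import numpy as np

def sample_sbm(n, B, rho, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    tau = rng.choice(len(rho), size=n, p=rho)  # block memberships
    P = B[np.ix_(tau, tau)]                    # P[u, v] = b_{tau_u, tau_v}
    A = (rng.random((n, n)) < P).astype(int)
    np.fill_diagonal(A, 0)                     # simple graph: no self-loops
    return A, tau
```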

Dot Product Embedding in Large (Errorfully Observed) Graphs with Applications in Statistical Connectomics
σ(Donniell E. Fishkind, Daniel L. Sussman, Minh Tang, Joshua T. Vogelstein), Carey E. Priebe
{def, dsussma3, mtang10, joshuav, cep}@jhu.edu | JHU, Dept of Applied Math & Statistics
Error | Alg | Theory | Task

Figure: Estimating Block Structure in Simulated Errorfully Observed Graphs (block assignment error versus $\epsilon$). We simulated 1000 Monte Carlo replicates of graphs from the affiliation model with $n = 1000$, $p = 0.1$, and $q = 0.05$. We then generated errorful versions of these graphs for $\epsilon \in \{0, 0.05, \ldots, 0.95, 1\}$ and $z = h(\epsilon) = 5000 + \frac{50000}{\sin(\pi/4)} \sin(\epsilon\pi/2)$. Upon dot product embedding, we could calculate the fraction of mis-assigned vertices. That the curve is not flat suggests that there is a quantity/quality tradeoff for estimating vertex labels.

Figure: (a) Number of misclassified nodes and (b) average distance from the true latent positions, each as a function of $n$. We simulated 100 undirected random graphs from the affiliation model for $n \in \{30, 270, 510, 750, 990, 1230\}$, using $p = 0.15$, $q = 0.1$, and $K = d = 3$. The number of nodes in each block is given by $n_i = n/K$. Panel (a) shows the number of misclassified nodes from K-means clustering on $\tilde{U}$ and $\tilde{V}$. Panel (b) shows the mean distance between the true latent vectors and the embedded vectors, given by the rows of $\tilde{U}$ and $\tilde{V}$.

Unsupervised Setting

Setting
We errorfully observe a single graph. We believe that there are sets of vertices that are stochastically equivalent.

Goal
Estimate vertex labels according to which vertices are stochastically equivalent to one another, given the errorfully observed graph.

Statistical Connectomics
1. How many brain regions are there, and where are they?
2. How many cell types are there?
3. Does the number of cell types change if we allow for colored edges? (No)

Errorfully Observed Graph Model

Parameters
Set of edges in the unobserved graph: $E \subset [n] \times [n]$
Probability of an errorful edge: $\epsilon \in [0,1]$
Number of edges observed: $z \in \{1, 2, \ldots\}$
Note: $z = h(\epsilon)$ for some function $h : [0,1] \to \mathbb{N}$

Observation Procedure
We observe the adjacency matrix $A$ as follows:
for $i$ in 1 to $z$ do
  Flip a 0-1 coin that lands on 1 with probability $\epsilon$.
  if the coin lands 0 then
    choose an edge from $E$ and add it to $A$
  else
    choose an edge from $[n]^2$ and add it to $A$
  end if
end for
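
A direct transcription of the observation procedure in Python, assuming $A$ accumulates counts of observed edges (whether repeated draws accumulate or saturate is not specified above); the name `observe_errorfully` is illustrative.

```python
# Minimal sketch of the errorful observation procedure: z draws, each either
# a true edge from E (prob. 1 - eps) or a uniformly random vertex pair (prob. eps).
import numpy as np

def observe_errorfully(E, n, eps, z, rng=None):
    """E: list of true (u, v) edges; returns the observed adjacency matrix A."""
    if rng is None:
        rng = np.random.default_rng()
    A = np.zeros((n, n), dtype=int)
    for _ in range(z):
        if rng.random() < eps:                       # coin lands 1: errorful draw
            u, v = rng.integers(n), rng.integers(n)  # uniform over [n]^2
        else:                                        # coin lands 0: true edge
            u, v = E[rng.integers(len(E))]
        A[u, v] += 1                                 # accumulate observed counts
    return A
```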

Semi-Supervised Setting

Setting
We observe a single graph. Some vertices are labeled good and some bad. Edges are colored.

Goal
Find the unlabeled vertex that is most likely bad.

Statistical Connectomics
Given some cell types, can we estimate the cell type of another?

Figure: Consistency of Vertex Nomination (normalized sum of reciprocal ranks versus $n$). We simulated 1000 Monte Carlo replicates of graphs from the affiliation model with $n \in \{50, 100, 200, 400, 800, 1600\}$, $p = 0.1$, and $q = 0.05$. We then used DPE, computed the sum of distances to the 10 vectors associated with labeled vertices, and ranked vertices according to minimizing this sum of distances. We used the normalized sum of reciprocal ranks (NSRR) metric, given by $\sum_{v=1}^{m - m_0} (1/\mathrm{rank}(v)) \, / \, \sum_{v=1}^{m - m_0} (1/v)$. NSRR close to 1 indicates good performance.
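
For concreteness, a minimal sketch of the NSRR computation as defined in the caption; the name `nsrr` is illustrative.

```python
# Minimal sketch of the normalized sum of reciprocal ranks (NSRR) metric:
# reciprocal ranks of the true targets, normalized by the best achievable sum.
import numpy as np

def nsrr(ranks):
    """ranks: 1-based ranks assigned to the m - m0 true (unlabeled) vertices."""
    ranks = np.asarray(ranks, dtype=float)
    ideal = np.sum(1.0 / np.arange(1, len(ranks) + 1))  # all targets ranked on top
    return np.sum(1.0 / ranks) / ideal                  # 1.0 means perfect nomination
```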

Figure: Vertex Nomination in Simulated Errorfully Observed Graphs (NSRR versus $\epsilon$). We simulated 1000 Monte Carlo replicates of graphs from the affiliation model with $n = 1000$, $p = 0.1$, and $q = 0.05$, and then generated errorful versions using $\epsilon \in \{0, 0.05, \ldots, 0.95, 1\}$ and $z = h(\epsilon) = 5000 + \frac{50000}{\sin(\pi/4)} \sin(\epsilon\pi/2)$. We then computed NSRR to measure the performance of vertex nomination as a function of the edge-sampling error rate. This suggests that there is a quantity/quality trade-off for the vertex nomination problem.