Conjecture
DPE for Graph Classification consistently finds $J$.

Proof. In progress...
References
[1] Vogelstein et al. Are mental properties supervenient on brain properties? Nature Scientific Reports, 2011.
[2] Vogelstein et al. Graph Classification using Signal Subgraphs: Applications in Statistical Connectomics. Submitted to IEEE PAMI (available on arXiv).
[3] Vogelstein et al. Shuffled Graph Classification: Theory and Connectome Applications. To be submitted to IEEE PAMI (and arXiv) any day now (available upon request).
[4] Vogelstein et al. Fast Inexact Graph Matching with Applications in Statistical Connectomics. To be submitted to IEEE PAMI (and arXiv) any day now (available upon request).
[5] Sussman et al. A consistent dot product embedding for stochastic blockmodel graphs. Submitted to JASA (available on arXiv).
[6] Priebe et al. Optimizing the quantity/quality trade-off in connectome inference. Communications in Statistics - Theory and Methods, to appear.
Random Variables
Adjacency Matrix: $A : \Omega \to \mathcal{A} \subset \{0,1\}^{n_v \times n_v}$
Latent In-Vectors: $X : \Omega \to \mathcal{X} \subseteq \mathbb{R}_+^{d \times n_v}$
Latent Out-Vectors: $Y : \Omega \to \mathcal{Y} \subseteq \mathbb{R}_+^{d \times n_v}$

Parameter $\theta = (\rho, \tau)$
In- and Out-Vector Likelihoods: $\rho_X, \rho_Y \in \Delta_3$
Block Membership Function: $\tau : [n_v] \to [3]$
Sampling Distribution
$(A, Y), \mathcal{D}_{n_s} = \{(A_i, Y_i)\}_{i \in [n_s]} \sim F_{A,Y}$
$F_{A,Y} = \prod_{(u,v) \in E} \operatorname{Bern}(a_{uv}; \langle X_u, Y_v \rangle)\, \rho_X(\tau_u)\, \rho_Y(\tau_v) \in \mathcal{F}_{A,Y}$
Input: $A$, $\mathcal{D}_{n_s}$, $d$
Output: $\hat{y}$, $\hat{\tau}$ (and nuisance parameters $\hat{\rho}_X$ and $\hat{\rho}_Y$)
1: Let $\bar{A}_y = \frac{1}{n_y} \sum_{i : y_i = y} A_i$ be the average adjacency matrix for class $y$.
2: Let $[\tilde{U}_y, \tilde{D}_y, \tilde{V}_y] = \mathrm{SVD}(\bar{A}_y)$, keeping only the $d$ triplets with the largest singular values.
3: Cluster $\tilde{U}$ and $\tilde{V}$ using a perfect $K$-means clustering algorithm, forcing one cluster to have vertices from all classes and one cluster for each class.
4: Let $\hat{\tau}$ be the cluster assignments for each of the vertices.
5: Do a DPE for $A$, and cluster each vertex accordingly. Let $J$ be the cluster of vertices that are informative with regard to the classification task.
6: Let $\hat{y} = \operatorname{argmax}_{y \in \mathcal{Y}} \prod_{u,v \in J} \operatorname{Bern}(a_{uv}; \langle X_u, Y_v \rangle)\, \rho_X(\tau_u)\, \rho_Y(\tau_v)$
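Steps 1 and 2 above (class-conditional averaging followed by a truncated SVD) can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' reference implementation, and the function names are ours:

```python
import numpy as np

def class_mean_adjacency(As, ys, y):
    """Step 1: average the adjacency matrices of the training graphs in class y."""
    members = [A for A, yi in zip(As, ys) if yi == y]
    return sum(members) / len(members)

def truncated_svd(M, d):
    """Step 2: keep only the d triplets with the largest singular values."""
    U, s, Vt = np.linalg.svd(M)
    return U[:, :d], s[:d], Vt[:d, :]
```

For a rank-$d$ matrix the truncation is lossless, which is a quick way to sanity-check the step.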
Definitions
Adjacency Matrices: $A, B \in \mathbb{R}^{n \times n}$
Permutation Matrices: $\mathcal{Q} = \{Q : Q\mathbf{1} = \mathbf{1},\, Q^T\mathbf{1} = \mathbf{1},\, Q \in \{0,1\}^{n \times n}\}$
Doubly Stochastic Matrices: $\mathcal{D} = \{D : D\mathbf{1} = \mathbf{1},\, D^T\mathbf{1} = \mathbf{1},\, d_{uv} \geq 0\}$

Objective Function
(QAP) $\hat{Q} = \operatorname{argmin}_{Q \in \mathcal{Q}} \|A - QBQ^T\|_F = \operatorname{argmin}_{Q \in \mathcal{Q}} -\langle A, QBQ^T \rangle$
Input: $A$, $B$
Output: $\hat{Q}$
1: for $i = 1, \ldots, i_{\max}$ do
2: Let $Q_{i,1}$ be either $\mathbf{1}\mathbf{1}^T/n$, $I$, or something near $I$.
3: Use the Frank-Wolfe algorithm to find a local optimum of the following relaxed quadratic assignment problem (rQAP): $Q_{i,2} = \operatorname{argmin}_{D \in \mathcal{D}} -\langle A, DBD^T \rangle$.
4: Project $Q_{i,2}$ onto $\mathcal{Q}$ using the Hungarian Algorithm to obtain $Q_i$.
5: end for
6: Let $\hat{Q} = \operatorname{argmin}_{i \in [i_{\max}]} -\langle A, Q_i B Q_i^T \rangle$
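The loop above can be sketched with NumPy and SciPy's `linear_sum_assignment` playing the Hungarian role. This is an illustrative reimplementation under our own naming, not the authors' code; the exact line search follows the quadratic expansion written in the comments:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def faq(A, B, n_restarts=3, n_iter=30, seed=0):
    """Sketch of the FAQ loop: Frank-Wolfe on the relaxed QAP
    f(D) = -<A, D B D^T> over doubly stochastic D, then a Hungarian
    projection of each local optimum onto the permutation matrices."""
    rng = np.random.default_rng(seed)
    n = len(A)
    J = np.ones((n, n)) / n                      # barycenter of D
    best_P, best_val = None, np.inf
    for r in range(n_restarts):
        if r == 0:
            D = J.copy()
        else:                                    # a point near the barycenter:
            P0 = np.eye(n)[rng.permutation(n)]   # blend with a random permutation
            D = 0.5 * J + 0.5 * P0               # (convex combo stays doubly stochastic)
        for _ in range(n_iter):
            grad = -(A @ D @ B.T + A.T @ D @ B)  # gradient of f at D
            rows, cols = linear_sum_assignment(grad)
            Q = np.zeros((n, n)); Q[rows, cols] = 1.0   # Frank-Wolfe direction
            R = Q - D
            # Exact line search: f(D + a R) = f(D) + b*a + c*a^2 on [0, 1].
            c = -np.sum(A * (R @ B @ R.T))
            b = -np.sum(A * (D @ B @ R.T + R @ B @ D.T))
            alpha = 1.0 if c <= 0 else min(1.0, max(0.0, -b / (2 * c)))
            D = D + alpha * R
        rows, cols = linear_sum_assignment(-D)   # project D onto the permutations
        P = np.zeros((n, n)); P[rows, cols] = 1.0
        val = -np.sum(A * (P @ B @ P.T))
        if val < best_val:
            best_val, best_P = val, P
    return best_P
```

Since the Frank-Wolfe direction minimizes the linearization, the directional derivative $b$ is nonpositive, so taking $\alpha = 1$ whenever the quadratic coefficient is nonpositive never increases the objective.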
Theorem
rQAP has the same minimum as QAP whenever $A$ and $B$ are the adjacency matrices of simple graphs isomorphic to one another.

Proof.
The set of doubly stochastic matrices is the convex hull of the set of permutation matrices. Thus, if a permutation matrix minimizes rQAP then it also minimizes QAP. Moreover, $\langle A, A \rangle = 2m$ (where $m = \frac{1}{2}\sum_{u,v} A_{uv}$ is the number of edges). Thus, it is sufficient to show that $\langle A, DBD^T \rangle \leq \langle A, A \rangle = 2m$. This follows because $(DBD^T)_{uv} \leq 1$.
NB: This is parallel to rLAP being equivalent to LAP.
Random Variables
Adjacency Matrix: $A : \Omega \to \mathcal{A} \subseteq \{0,1\}^{n_v \times n_v}$
Permutation Matrix: $Q : \Omega \to \mathcal{Q} = \{Q : q_{uv} \in \{0,1\},\, Q\mathbf{1} = \mathbf{1},\, Q^T\mathbf{1} = \mathbf{1}\}$
Graph Class: $Y : \Omega \to \mathcal{Y} = [n_y]$

Sampling Distribution
$F_{Q,A,Y}(a, y; \theta) = F_Q F_{A|Y} F_Y = \operatorname{Uni}(\mathcal{Q})\, F_{A|Y}\, F_Y$
$(Q, A, Y), \mathcal{D}_{n_s} = \{(Q_i, A_i, Y_i)\}_{i \in [n_s]} \stackrel{iid}{\sim} F_{Q,A,Y} \in \mathcal{F}_{Q,A,Y}$
Random Variables
Adjacency Matrix: $A : \Omega \to \mathcal{A} \subseteq \{0,1\}^{n_v \times n_v}$
Graph Class: $Y : \Omega \to \mathcal{Y} = [n_y]$

Parameter $\theta = (P, \pi, \mathcal{S})$
Edge Probabilities: $P = (p_{uv|y}) \in (0,1)^{n_v \times n_v \times n_y}$
Class Priors: $\pi = \{\pi_0, \ldots, \pi_{n_Y}\} \in \Delta_{n_y}$
Signal Subgraph: $\mathcal{S} = \{(u,v) : p_{uv|y_i} \neq p_{uv|y_j}\ \forall\, y_i \neq y_j\} \subseteq \binom{[n_v]}{2}$
Sampling Distribution
$F_{A,Y}(a, y; \theta) = \pi_y \prod_{(u,v) \in \mathcal{S}} \operatorname{Bern}(a_{uv}; p_{uv|y}) \times \prod_{(u,v) \in E \setminus \mathcal{S}} \operatorname{Bern}(a_{uv}; p_{uv})$
$(A, Y), \mathcal{D}_{n_s} = \{(A_i, Y_i)\}_{i \in [n_s]} \stackrel{iid}{\sim} F_{A,Y} \in \mathcal{F}_{A,Y}$
Let $\tilde{L}_s$ be the misclassification rate of the above algorithm, and let $\tilde{L}^*$ be the Bayes optimal misclassification rate for shuffled graphs.

Theorem
$\tilde{L}_s \to \tilde{L}^*$ as $s \to \infty$.

Proof.
Because the joint space of adjacency matrices, permutation matrices, and graph classes has finite cardinality, the law of large numbers ensures that eventually, as $s \to \infty$, the plurality of nearest neighbors to a test graph will be identical to the test graph.
Theorem
$\hat{\mathcal{S}} \to \mathcal{S}$ as $n_s \to \infty$.

Proof.
$\mathcal{A}$ and $\mathcal{Y}$ are finite, so by the law of large numbers, $T_{(i)} \to \varepsilon > 0\ \forall\, i \in \mathcal{S}$ and $T_{(i)} \to 0\ \forall\, i \notin \mathcal{S}$.
Graph-Matched Frobenius Norm $k_s$-Nearest-Neighbor Algorithm

Input: $A$; a rule for $k_s$ as $s \to \infty$ such that $k_s/s \to 0$ and $k_s \to \infty$; $\mathcal{D}_{n_s}$
Output: $\hat{y}$
1: Compute the graph-matched Frobenius norm distance between $A$ and each training graph: $\tilde{\delta}_i = \min_{Q \in \mathcal{Q}} \|A - QA_iQ^T\|_F^2$.
2: Rank the distances in increasing order: $\tilde{\delta}_{(1)} \leq \tilde{\delta}_{(2)} \leq \cdots \leq \tilde{\delta}_{(n_s)}$.
3: Let $\hat{y} = \operatorname{argmax}_{y \in \mathcal{Y}} \sum_{i : y_i = y} \mathbb{I}\{\operatorname{rank}(\tilde{\delta}_i) \leq k_s\}$
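With the graph-matched distances $\tilde{\delta}_i$ in hand, steps 2-3 reduce to an ordinary $k_s$-nearest-neighbor plurality vote. A minimal sketch (function name ours):

```python
import numpy as np
from collections import Counter

def knn_classify(dists, ys, k):
    """Steps 2-3: rank training graphs by distance to the test graph
    and take a plurality vote among the k nearest neighbors' classes."""
    nearest = np.argsort(dists)[:k]          # indices of the k smallest distances
    votes = Counter(ys[i] for i in nearest)
    return votes.most_common(1)[0][0]
```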
Input: $A$, $\mathcal{D}_{n_s}$, number of signal edges $s$ and signal vertices $m$
Output: $\hat{y}$, $\hat{\mathcal{S}}$ (and nuisance parameters $\hat{P}$, $\hat{\pi}$)
1: Compute the significance of each edge using Fisher's Exact Test, yielding $T_{(1)} \geq T_{(2)} \geq \cdots \geq T_{(n_E)}$, using $\mathcal{D}_{n_s}$.
2: Rank edges by significance with respect to each vertex: $E_{k,(1)} \geq E_{k,(2)} \geq \ldots \geq E_{k,(n-1)}$ for all $k \in V$.
3: while not converged do
4: Increase the critical value $c$ from $T_{(i)}$ to $T_{(i+1)}$.
5: Compute the vertex score $w_{v;c} = \sum_{u \in [V]} \mathbb{I}\{T_{v,u} > c\}$ for each vertex.
6: Converge if $\sum_{v \in [m]} w_{v;c} \geq s$.
7: end while
8: Let $\hat{\mathcal{S}}$ be the set of $s$ most significant edges incident to the $m$ best-scoring vertices.
9: Let $\hat{y} = \operatorname{argmax}_{y \in \mathcal{Y}} \prod_{(u,v) \in \hat{\mathcal{S}}} \operatorname{Bern}(a_{uv}; \hat{p}_{uv|y})\, \hat{\pi}_y$
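Step 1 can be sketched with SciPy's Fisher's exact test. This illustrative version (names ours) assumes exactly two classes, so each edge yields a 2x2 class-by-presence contingency table:

```python
import numpy as np
from scipy.stats import fisher_exact

def edge_significances(As, ys):
    """Step 1: for each edge, test whether its presence depends on class
    via Fisher's exact test on the 2x2 (class x edge-present) table.
    Assumes ys takes exactly two distinct values."""
    As, ys = np.asarray(As), np.asarray(ys)
    n = As.shape[1]
    T = np.zeros((n, n))
    for u in range(n):
        for v in range(n):
            table = [[np.sum(As[ys == y, u, v] == a) for a in (0, 1)]
                     for y in np.unique(ys)]
            p = fisher_exact(table)[1]
            T[u, v] = 1 - p          # larger = more significant
    return T
```

Edges whose presence perfectly separates the classes get significance near 1, while class-independent edges get significance near 0, which is exactly the ordering $T_{(1)} \geq T_{(2)} \geq \cdots$ used above.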
DPE for Graph Classification
Setting
We observe a collection of graphs and their associated classes. The vertices may be labeled or unlabeled. We assume that only a subset of vertices are informative with regard to the classification task.

Goal
For a novel graph, find its most likely class and which vertices encode the class-conditional signal.
Statistical Connectomics Application
Classify arbitrarily large graphs, including those with and without vertex labels, without necessitating graph matching or estimating $O(n^2)$ parameters.
Fast Inexact Graph Matching
Setting
We observe a pair of unlabeled graphs.
Goal
Find the isomorphism that matches the graphs optimally.
Statistical Connectomics Application
A subroutine of our shuffled graph classifier.
Shuffled Graph Classification
Setting
We observe a collection of graphs without labeled vertices and associated graph classes.

Goal
For a novel graph, find its most likely class.
Statistical Connectomics Application
Classify brain-graphs for which vertices lack labels. This includes collections of brain-graphs across species or whenever vertices represent vertebrate neurons.
Labeled Graph Classification
Setting
We observe a collection of graphs with labeled vertices and associated graph classes. We assume that only a subset of edges/vertices are informative with regard to the classification task.

Goal
For a novel graph, find its most likely class and which edges/vertices encode the class-conditional signal.
Statistical Connectomics Application
Classify brain-graphs for which vertices are labeled (for example, invertebrate brain-graphs where vertices represent neurons, or vertebrate brain-graphs where vertices represent brain regions) and find which edges/vertices encode various cognitive/behavioral properties.
[Figure panels: missed-edge rate and misclassification rate vs. number of training samples (coherent/incoherent/naive Bayes); misclassification rate vs. log assumed number of signal edges for the incoherent estimator ($\hat{L}_{nb} = 0.41$, $\hat{L}_{inc} = 0.27$, $L_\pi = 0.5$) and the coherent estimator ($\hat{L}_{coh} = 0.16$); zoomed-in coherent estimator; coherent signal subgraph estimate (vertex $\times$ vertex); coherogram.]

Figure: (Left) Simulation demonstrates that the coherent classifier outperforms the incoherent classifiers as a function of sample size. (Right) MR connectome sex signal subgraph estimation and analysis. By cross-validating over hyperparameters and models, we estimate that the "best" coherent signal subgraph (for this inference task on these data) has $m_{coh} = 12$ and $s_{coh} = 360$, achieving $\hat{L}_{coh} = 0.16$.
[Figure: Approximate QAP Performance on the QAP Benchmark Library — error on instances chr12c, chr15a, chr15c, chr20b, chr22b, esc16b, rou12, rou15, rou20, tai10a, tai15a, tai17a, tai20a, tai30a, tai35a, tai40a, comparing QAP100, QAP3, QAP1, and PSOA.]

               chemical    electrical  unit
Accuracy       100 (0)     59 (0.30)   %
Restarts       3 (0)       25 (6.7)    #
Solution Time  42 (0.42)   79 (20)     sec.
[Figure: Connectome Classifier Comparison — misclassification rate vs. number of training samples.]

Figure: Connectome misclassification rates for various classifiers. 2000 Monte Carlo sub-samples of the data were performed for each $s$, such that errorbars were negligibly small. Five classifiers were compared, including: the $k_s$-NN classifier on labeled graphs; the $k_s$-NN on a collection of graph invariants; chance; and the $k_s$-NN on shuffled graphs (without graph-matching).
Large Graph Classification: Theory and Statistical Connectomics Applications
Joshua T. Vogelstein, Donniell E. Fishkind, Daniel L. Sussman & Carey E. Priebe | JHU, Dept of Applied Math & Statistics
Theorem
If, for all channels $c$, $d^{(c)}$ is known, then almost always $\varepsilon(n) \in O(\log n / n)$. If, for any channel (color) $c$, $d^{(c)}$ is unknown, then almost always $\varepsilon(n) \in O(n^{-1/4})$.

Proof.
The proof of this theorem is an extension of the above proof, with the following generalizations: (i) multiple dependent channels are incorporated, (ii) $d^{(c)}$ need not be specified (although doing so speeds up convergence), (iii) $K$ need not be specified.
Semi-sup. Dot Product Embedding
Input: $A$, $(K, d^{(c)})$
Output: $\hat{\tau}$ (and nuisance parameters $\hat{B}$, $\hat{\rho}$)
1: for $c \in [C]$ do
2: $[\tilde{U}^{(c)}, \tilde{D}^{(c)}, \tilde{V}^{(c)}] = \mathrm{SVD}(A^{(c)})$, keeping only the $d^{(c)}$ triplets with the largest singular values.
3: Let $U^{(c)} = \tilde{U}^{(c)} \sqrt{\tilde{D}^{(c)}}$, and similarly for $V^{(c)}$.
4: end for
5: Concatenate all embedded scaled vectors: $[U^{(1)} | \cdots | U^{(C)} | V^{(1)} | \cdots | V^{(C)}]$.
6: Use a perfect $K$-means to cluster the concatenated vectors.
7: Nominate the vertices which are in the cluster with the plurality of labelled vertices.
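The nomination step can also be phrased as ranking unlabeled vertices by their total embedding distance to the labeled vertices of interest, as in the vertex nomination simulations. A sketch, with all names ours:

```python
import numpy as np

def nominate(embedding, labeled_idx):
    """Rank unlabeled vertices by total embedding distance to the
    labeled vertices; the top-ranked vertex is the nomination."""
    labeled = embedding[labeled_idx]
    # Sum of distances from every vertex to each labeled vertex.
    d = np.linalg.norm(embedding[:, None, :] - labeled[None, :, :], axis=2).sum(1)
    d[labeled_idx] = np.inf                  # never nominate a labeled vertex
    return np.argsort(d)                     # best candidates first
```

This distance-sum ranking is what the normalized sum of reciprocal ranks (NSRR) metric evaluates.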
TheoremIt almost always holds that
"(n) = |{u 2 V : ⌧(u) 6= ⌧(u)}|/n 2 O(log n/n)
Proof.(sketch)
1. Bound��AAT � (XYT)(XYT)T
�� following Rohe et al (2010).
2. Lower bound the smallest non-zero singular value of XYT.
3. Apply Davis-Kahan Theorem.
4. The normalized dot product embedding of A is approximatelya rotation of the normalized dot product embedding of XYT.
Unsuper. Dot Product Embedding
Input: $A$, $(K, d)$
Output: $\hat{\tau}$ (and nuisance parameters $\hat{B}$, $\hat{\rho}$)
1: $[\tilde{U}, \tilde{D}, \tilde{V}] = \mathrm{SVD}(A)$, keeping only the $d$ triplets with the largest singular values.
2: Cluster $\tilde{U}$ and $\tilde{V}$ using a perfect $K$-means clustering algorithm.
3: Let $\hat{\tau}$ be the cluster assignments for each of the vertices.
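The three steps above can be sketched in NumPy. The toy Lloyd's iteration below stands in for the "perfect" $K$-means the theory assumes, and all function names are ours:

```python
import numpy as np

def dot_product_embed(A, d):
    """Rank-d dot product embedding: scaled left/right singular vectors."""
    U, s, Vt = np.linalg.svd(A)
    scale = np.sqrt(s[:d])
    return U[:, :d] * scale, Vt[:d, :].T * scale   # (Uhat, Vhat)

def kmeans(X, K, iters=50, seed=0):
    """Minimal Lloyd's algorithm standing in for a 'perfect' K-means."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), K, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        for k in range(K):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(0)
    return labels

def unsupervised_dpe(A, K, d):
    """Embed A, concatenate the in- and out-embeddings, cluster vertices."""
    Uh, Vh = dot_product_embed(A, d)
    return kmeans(np.hstack([Uh, Vh]), K)
```

On a noiseless two-block adjacency matrix the embedded vertices collapse onto two distinct points, so the clustering recovers the blocks exactly.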
Stochastic Blockmodel Graph

Parameters
Number of Blocks: $K \in \mathbb{N}$
Block Membership Probabilities: $\rho \in \Delta_K$
Edge Probabilities: $B \in (0,1)^{K \times K}$

Random Variables
Adjacency Matrix: $A \in \{0,1\}^{n \times n}$
Block Membership Function: $\tau : [n] \mapsto [K]$

Sampling Distribution
$(A, \tau) \sim F_{A,\tau} \in \mathcal{F}_{A,\tau}$
$F_{A,\tau} = \prod_{(u,v) \in E} P[a_{uv} = 1 \mid \tau_u = i, \tau_v = j]\, P[\tau_u = i]\, P[\tau_v = j] = \prod_{(u,v) \in E} b_{\tau_u, \tau_v}\, \rho_{\tau_u}\, \rho_{\tau_v}$
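As a concrete illustration, the sampling distribution above can be simulated directly. This sketch (names ours) draws each $\tau_u$ iid from $\rho$ and each directed edge from $\operatorname{Bern}(b_{\tau_u, \tau_v})$, excluding self-loops:

```python
import numpy as np

def sample_sbm(n, B, rho, seed=0):
    """Draw (A, tau) from the stochastic blockmodel: tau_u ~ rho iid,
    a_uv ~ Bern(B[tau_u, tau_v]) independently for each ordered pair."""
    rng = np.random.default_rng(seed)
    K = len(rho)
    tau = rng.choice(K, size=n, p=rho)          # block memberships
    P = B[tau][:, tau]                          # n x n edge-probability matrix
    A = (rng.random((n, n)) < P).astype(int)    # directed adjacency matrix
    np.fill_diagonal(A, 0)                      # simple graph: no self-loops
    return A, tau
```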
Dot Product Embedding in Large (Errorfully Observed) Graphs with Applications in Statistical Connectomics
Donniell E. Fishkind, Daniel L. Sussman, Minh Tang, Joshua T. Vogelstein, Carey E. Priebe
{def, dsussma3, mtang10, joshuav, cep}@jhu.edu | JHU, Dept of Applied Math & Statistics
[Figure: Estimating Block Structure in Simulated Errorfully Observed Graphs — block assignment error vs. $\epsilon$.]

Figure: We simulated 1000 Monte Carlo replicates of graphs from the affiliation model with $n = 1000$, $p = 0.1$ and $q = 0.05$. We then generated errorful versions of these graphs for $\epsilon \in \{0, 0.05, \ldots, 0.95, 1\}$ and $z = h(\epsilon) = 5000 + \frac{50000}{\sin(\pi/4)} \sin(\epsilon\pi/2)$. Upon dot product embedding we could calculate the fraction of mis-assigned vertices. That the curve is not flat suggests that there is a quantity/quality trade-off for estimating vertex labels.
[Figure panels: (a) number of misclassified vertices vs. $n$; (b) average distance from the true latent position vs. $n$, for $n \in \{30, 270, 510, 750, 990, 1230\}$.]

Figure: We simulated 100 undirected random graphs from the affiliation model for $n \in \{30, 270, 510, 750, 990, 1230\}$, using $p = 0.15$, $q = 0.1$, and $K = d = 3$. The number of nodes in each block is given by $n_i = n/K$. Panel (a) shows the number of misclassified nodes by doing K-means clustering on $\tilde{U}$ and $\tilde{V}$. Panel (b) shows the mean distance between the true latent vectors and the embedded vectors, given by the rows of $\tilde{U}$ and $\tilde{V}$.
Unsupervised Setting

Setting
We errorfully observe a single graph. We believe that there are sets of vertices that are stochastically equivalent.

Goal
Estimate vertex labels according to which vertices are stochastically equivalent to one another, given the errorfully observed graph.
Statistical Connectomics
1. How many brain regions are there and where are they?
2. How many cell types are there?
3. Does the number of cell types change if we allow for colored edges? (No)
Errorfully Observed Graph Model

Parameters
Set of edges in the unobserved graph: $E \subset [n] \times [n]$
Probability of an errorful edge: $\epsilon \in [0,1]$
Number of edges observed: $z \in \{1, 2, \ldots\}$
Note: $z = h(\epsilon)$ for some function $h : [0,1] \mapsto \mathbb{N}$
Observation Procedure
We observe the adjacency matrix $A$ as follows:

for $i$ in 1 to $z$ do
  Flip a 0-1 coin that lands on 1 with probability $\epsilon$.
  if the coin lands 0 then
    Choose an edge from $E$ and add it to $A$
  else
    Choose an edge from $[n]^2$ and add it to $A$
  end if
end for
Semi-Supervised Setting
Setting
We observe a single graph. Some vertices are labeled good and some bad. Edges are colored.

Goal
Find the unlabeled vertex that is most likely bad.

Statistical Connectomics
Given some cell types, can we estimate the cell type of another?
[Figure: Consistency of Vertex Nomination — NSRR vs. $n$ for $n \in \{50, 100, 200, 400, 800, 1600\}$.]

Figure: We simulated 1000 Monte Carlo replicates of graphs from the affiliation model with $n \in \{50, 100, 200, 400, 800, 1600\}$, $p = 0.1$ and $q = 0.05$. We then used DPE and computed the sum of distances to the 10 vectors associated with labeled vertices, and ranked vertices according to minimizing this sum of distances. We used the normalized sum of reciprocal ranks (NSRR) metric, given by $\sum_{v=1}^{m - m_0} (1/\operatorname{rank}(v)) \,\big/\, \sum_{v=1}^{m - m_0} (1/v)$. NSRR close to 1 is good performance.
[Figure: Vertex Nomination in Simulated Errorfully Observed Graphs — NSRR vs. $\epsilon$.]

Figure: We simulated 1000 Monte Carlo replicates of graphs from the affiliation model with $n = 1000$, $p = 0.1$ and $q = 0.05$, and then generated errorful versions using $\epsilon \in \{0, 0.05, \ldots, 0.95, 1\}$ and $z = h(\epsilon) = 5000 + \frac{50000}{\sin(\pi/4)} \sin(\epsilon\pi/2)$. We then computed NSRR to measure the performance of VN as a function of the edge sampling error rate. This suggests there is a quantity/quality trade-off for the vertex nomination problem.