Understanding the Mesoscale Structure of the C. elegans Brain Network
Dragana M. Pavlovic1, Dr. Petra E. Vertes2, Prof. Edward T. Bullmore2,3, Dr. Thomas E. Nichols1
1University of Warwick, Dept. of Statistics, Coventry, UK; 2University of Cambridge, Brain Mapping Unit, Dept. of Psychiatry, Cambridge, UK; 3GlaxoSmithKline, Clinical Unit Cambridge, Addenbrooke’s Hospital, Cambridge, UK.
Introduction
Figure 1: Nerve tracts of C. elegans.
Recently, there has been much interest in the mesoscale structure of networks, such as their organisation into communities or into a core and periphery. However, it is often difficult to disambiguate the relationship between these two types of mesoscale structure or, indeed, to summarise the full network in terms of the relationships between its mesoscale constituents. Here, we use a stochastic blockmodel approach, the Erdős-Rényi Mixture Model (ERMM) [1], for community estimation and compare it to the much more widely used deterministic methods, namely the Louvain [3] and Spectral [2] algorithms. We use the Caenorhabditis elegans (C. elegans) [6] connectome (Fig. 1) as a model system in which biological knowledge about each node, or neuron, can be used to validate the functional relevance of the communities obtained.
Methods
The ERMM treats the communities (blocks) and their mutual connections as mini Erdős-Rényi models, represented in the likelihood with different proportions. For a given number of communities Q, a variational approach is used to approximate the likelihood, while the Integrated Classification Likelihood (ICL) is used to compare the optimised likelihoods over different Q. The final result is an estimate of Q and of the partition, visualised as a reorganised adjacency matrix. Deterministic methods such as the Fast Louvain and Spectral algorithms define a community as a group of densely connected nodes with few connections to other groups. Both algorithms are devised to maximise the modularity but use different strategies to find its maximum: the Fast Louvain algorithm uses a greedy approach, while the Spectral algorithm uses the leading eigenvectors of the modularity matrix to find the partition.
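To make the modularity criterion concrete, the following is a minimal sketch in Python with NumPy. It is not the code used for the poster: it assumes the standard Newman-Girvan definition of modularity and shows only a single leading-eigenvector bisection, whereas the full Spectral algorithm subdivides repeatedly. The function names and the toy graph are hypothetical.

```python
import numpy as np


def modularity(adj, labels):
    """Newman-Girvan modularity of a partition of an undirected, unweighted graph."""
    adj = np.asarray(adj, dtype=float)
    labels = np.asarray(labels)
    degrees = adj.sum(axis=1)
    two_m = degrees.sum()  # twice the number of edges
    # Modularity matrix: observed adjacency minus the expectation under
    # a random graph with the same degree sequence.
    b = adj - np.outer(degrees, degrees) / two_m
    same_group = labels[:, None] == labels[None, :]
    return float((b * same_group).sum() / two_m)


def spectral_bisection(adj):
    """Split nodes into two groups by the sign of the leading eigenvector
    of the modularity matrix (one step of the spectral approach)."""
    adj = np.asarray(adj, dtype=float)
    degrees = adj.sum(axis=1)
    b = adj - np.outer(degrees, degrees) / degrees.sum()
    _, eigvecs = np.linalg.eigh(b)  # eigenvalues are returned in ascending order
    leading = eigvecs[:, -1]
    return (leading > 0).astype(int)


def two_cliques(k=4):
    """Toy graph (illustrative only): two k-cliques joined by a single edge."""
    n = 2 * k
    adj = np.zeros((n, n), dtype=int)
    adj[:k, :k] = 1
    adj[k:, k:] = 1
    np.fill_diagonal(adj, 0)
    adj[k - 1, k] = adj[k, k - 1] = 1
    return adj


if __name__ == "__main__":
    toy = two_cliques(4)
    split = spectral_bisection(toy)
    print(split, modularity(toy, split))
```

On the toy graph the bisection separates the two cliques, and placing all nodes in a single group gives a modularity of zero, which is the baseline the maximisation is measured against.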
Analysis
We apply all three methods to the C. elegans neural network, composed of 279 non-pharyngeal neurons and 2287 undirected edges, and we use additional functional and anatomical measures to evaluate the estimates of its community structure. For quantitative ground-truth measures, we use the Intraclass Correlation (ICC) to compare the variance explained by each community estimate. For categorical ground-truth measures, we use the Adjusted Rand Index (ARI) to compare the similarity.
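As an illustration of the categorical comparison, the sketch below computes the Adjusted Rand Index between two labelings using the pair-counting form of Hubert and Arabie. It is a hedged example rather than the analysis code: the function name and the toy labels are hypothetical, and returning 1.0 in the degenerate case is a convention chosen here.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two partitions of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two items to form a pair")

    # Pairs of items placed together in both partitions (contingency table cells).
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together in each partition separately (table margins).
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = together_a * together_b / comb(n, 2)  # agreement expected by chance
    maximum = 0.5 * (together_a + together_b)
    if maximum == expected:
        # Degenerate case (e.g. both partitions trivial): treated as perfect agreement.
        return 1.0
    return (together_both - expected) / (maximum - expected)


if __name__ == "__main__":
    estimated = [0, 0, 1, 1, 2, 2]           # hypothetical community estimate
    ground_truth = ["a", "a", "b", "b", "b", "c"]  # hypothetical categorical labels
    print(adjusted_rand_index(estimated, ground_truth))
```

The index depends only on which items are grouped together, so relabelling the communities leaves it unchanged. It equals 1 for identical partitions and is close to 0 when the agreement is what chance alone would produce.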
References
[1] Daudin, Picard and Robin, A mixture model for random graphs, Statistics and Computing, (2008).
[2] Newman, Detecting community structure in networks, The European Physical Journal B - Condensed Matter and Complex Systems, vol. 38, (2004).
[3] Blondel, Guillaume, Lambiotte and Lefebvre, Fast unfolding of communities in large networks, Journal of Statistical Mechanics: Theory and Experiment, vol. 10, (2008).
[4] Dobson, An introduction to generalised linear models, (2001).
[5] Hubert and Arabie, Comparing partitions, Journal of Classification, vol. 2, (1985).
[6] Varshney, Chen, Paniagua, Hall and Chklovskii, Structural properties of the Caenorhabditis elegans neuronal network, PLoS Computational Biology, vol. 7, (2011).
Differences in Community Estimation
Figure 2: ERMM (reorganised adjacency matrix, blocks 1 to 9).
Figure 3: Louvain (reorganised adjacency matrix, communities 1 to 5).
Figure 4: Spectral (reorganised adjacency matrix, communities 1 to 4).
Network Compression and Degree Distribution with ERMM
Figure 5: ERMM connectivity structure (Blocks 1 to 9).
Figure 6: ERMM’s fitted degree distribution (x-axis: Degrees; y-axis: 1−CDF; series: Empirical, Fitted).
Qualitative Assessment
Figure 7: ICC scores for the Anatomical location (longitudinal) (ALL), Anatomical location (sectional) (ALS), Anatomical distance (AD), Birth time (BT), Birth time difference (BTD) and Lineage distance (LD). Methods: ERMM, Louvain algorithm, Spectral algorithm.
Figure 8: ARI scores for Functional Classification (FC) and Ganglion Classification (GC). Methods: ERMM, Louvain algorithm, Spectral algorithm.
Results
The optimal ERMM fit consists of 9 classes, while the fits of the Louvain and Spectral algorithms consist of 5 and 4 communities, respectively; all three are shown in Figs. 2-4 as reorganised adjacency matrices. The ERMM finds dense blocks on the diagonal, but also a range of off-diagonal patterns. Note how blocks 5 and 6, with tight inter-connections and numerous external connections, form a core-periphery structure. Surprisingly, even though blocks 5 and 6 fit the standard notion of "community", they are not identified by the deterministic algorithms. Furthermore, the ERMM fit provides a compressed view of the C. elegans network (see Fig. 5) and a faithful approximation of the degree distribution (Fig. 6).
To score the quality of each fit, we show the ICC scores (Fig. 7) across the known biological features characterising nodes and edges. Here, we see that the ERMM fit scores consistently higher than the Spectral and Louvain algorithms. In Fig. 8, however, ARI is rather low in general, with the Spectral algorithm showing slightly better similarity with the functional classification and all methods having similar ARI for the ganglion classification.
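The compressed view of the network can be illustrated by collapsing an adjacency matrix onto a given partition. The sketch below is an assumption-laden illustration, not the ERMM estimation itself: it takes the block labels as already known and reports block sizes, edge counts and empirical connection densities between blocks. The function name and toy data are hypothetical.

```python
import numpy as np


def block_summary(adj, labels):
    """Collapse an undirected, unweighted graph onto a partition.

    Returns the block identifiers, block sizes, edge counts between blocks,
    and the fraction of possible edges present for each pair of blocks.
    """
    adj = np.asarray(adj, dtype=float)
    labels = np.asarray(labels)
    blocks = np.unique(labels)
    # One-hot membership matrix: rows are nodes, columns are blocks.
    z = (labels[:, None] == blocks[None, :]).astype(float)
    sizes = z.sum(axis=0)

    edges = z.T @ adj @ z
    diag = np.diag_indices_from(edges)
    edges[diag] /= 2  # within-block edges are counted twice in a symmetric matrix

    possible = np.outer(sizes, sizes)
    possible[diag] = sizes * (sizes - 1) / 2
    density = np.divide(edges, possible, out=np.zeros_like(edges), where=possible > 0)
    return blocks, sizes, edges, density


if __name__ == "__main__":
    # Toy graph (illustrative only): two triangles joined by one edge.
    toy = np.zeros((6, 6), dtype=int)
    for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
        toy[i, j] = toy[j, i] = 1
    blocks, sizes, edges, density = block_summary(toy, [0, 0, 0, 1, 1, 1])
    print(edges)
    print(density)
```

Each block then acts as a super-node. A dense diagonal entry corresponds to a classical community, while large off-diagonal densities indicate patterns such as core-periphery that a modularity-based partition would not report.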
Conclusion
We showed that the Erdős-Rényi Mixture Model not only produces more biologically plausible communities but also provides an integrated picture of the full mesoscale structure (including core-periphery) and allows the network to be compressed into a set of super-nodes and their connectivities. We expect these methods to prove useful for the analysis of other types of networks, such as human brain functional connectivity.