
Draft

Understanding the Mesoscale Structure of the C. elegans Brain Network

Dragana M. Pavlovic¹, Dr. Petra E. Vertes², Prof. Edward T. Bullmore²,³, Dr. Thomas E. Nichols¹

¹University of Warwick, Dept. of Statistics, Coventry, UK; ²University of Cambridge, Brain Mapping Unit, Dept. of Psychiatry, Cambridge, UK; ³GlaxoSmithKline, Clinical Unit Cambridge, Addenbrooke’s Hospital, Cambridge, UK.

Introduction

Figure 1: Nerve tracts of C. elegans.

Recently, there has been much interest in the mesoscale structure of networks, such as their organisation into communities or into a core and a periphery. However, it is often difficult to disambiguate the relationship between these two types of mesoscale structure or, indeed, to summarise the full network in terms of the relationships between its mesoscale constituents. Here, we use a stochastic blockmodel approach, the Erdős-Rényi Mixture Model (ERMM) [1], for community estimation and compare it with the much more widely used deterministic methods, namely the Louvain [3] and Spectral [2] algorithms. We use the Caenorhabditis elegans (C. elegans) [6] connectome (Fig. 1) as a model system in which biological knowledge about each node, or neuron, can be used to validate the functional relevance of the communities obtained.

Methods

The ERMM treats the communities (blocks) and their mutual connections as mini Erdős-Rényi models. Nodes are assigned to blocks with given mixing proportions, and each pair of blocks has its own connection probability. For a given number of communities Q, a variational approach is used to approximate the likelihood, while the Integrated Classification Likelihood (ICL) is used to compare the optimised likelihoods across different values of Q. The final result is an estimate of Q and of the partition, visualised as a reorganised adjacency matrix. By contrast, deterministic methods such as the Fast Louvain and Spectral algorithms define a community as a group of densely interconnected nodes with few connections to the rest of the network. Both algorithms are designed to maximise modularity but use different strategies to find its maximum. The Fast Louvain algorithm uses a greedy approach, while the Spectral algorithm uses the leading eigenvectors of the modularity matrix to search for the partition with maximum modularity.
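To make the modularity criterion concrete, here is a minimal Python/NumPy sketch. It is not the implementation used for this poster. The code computes the modularity matrix B = A − kkᵀ/2m and the modularity of a hard partition. It then performs a single leading-eigenvector bisection step in the spirit of the spectral approach. A full spectral method would recurse on the resulting groups, and the Louvain algorithm's greedy node moves are not shown. The sketch assumes an unweighted, undirected adjacency matrix without self-loops; all function names and the toy graph are hypothetical.

```python
import numpy as np


def modularity_matrix(A):
    """Modularity matrix B_ij = A_ij - k_i * k_j / (2m) of an undirected graph."""
    A = np.asarray(A, dtype=float)
    k = A.sum(axis=1)   # node degrees
    two_m = k.sum()     # 2m: each undirected edge is counted twice
    return A - np.outer(k, k) / two_m


def modularity(A, labels):
    """Newman modularity of a hard partition: (1/2m) * sum_ij B_ij * [c_i == c_j]."""
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    B = modularity_matrix(A)
    same = labels[:, None] == labels[None, :]   # mask of same-community node pairs
    return B[same].sum() / A.sum()


def leading_eigenvector_split(A):
    """One spectral bisection step: split nodes by the sign of B's leading eigenvector."""
    B = modularity_matrix(A)
    eigvals, eigvecs = np.linalg.eigh(B)   # B is symmetric; eigenvalues ascending
    if eigvals[-1] <= 1e-10:
        # No positive eigenvalue: no bisection increases modularity, keep one group.
        return np.zeros(B.shape[0], dtype=int)
    return (eigvecs[:, -1] >= 0).astype(int)


def two_cliques_with_bridge():
    """Hypothetical toy graph: two 4-cliques (nodes 0-3 and 4-7) joined by edge 3-4."""
    A = np.zeros((8, 8))
    for block in (range(0, 4), range(4, 8)):
        for i in block:
            for j in block:
                if i != j:
                    A[i, j] = 1.0
    A[3, 4] = A[4, 3] = 1.0
    return A


if __name__ == "__main__":
    A = two_cliques_with_bridge()
    labels = leading_eigenvector_split(A)
    print(labels)                 # the two cliques get different labels (sign is arbitrary)
    print(modularity(A, labels))  # 11/26, about 0.423
```

On the toy graph, the sign split recovers the two cliques. Because the rows of B sum to zero, putting all nodes in one community always gives modularity 0.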

Analysis

We apply all three methods to the C. elegans neural network, composed of 279 non-pharyngeal neurons and 2287 undirected edges. We use additional functional and anatomical measures to evaluate the estimates of its community structure. For the quantitative ground-truth measures, we use the intraclass correlation (ICC) to compare how much of the variance in each measure is explained by each community estimate. For the categorical ground-truth measures, we use the Adjusted Rand Index (ARI) [5] to measure the similarity between each estimated partition and the known classification.
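The poster does not state which ICC estimator was used. As an illustration only, the sketch below assumes the classic one-way random-effects ANOVA estimator. It treats communities as groups and a node-level measure (for example birth time) as the response, with the usual adjusted group size k0 for unequal community sizes. It also implements the adjusted Rand index of Hubert and Arabie [5], computed from the contingency table of two partitions. For pairwise measures such as anatomical distance, one would presumably group node pairs rather than nodes; that variant is not shown. Function names are hypothetical.

```python
import numpy as np
from math import comb


def icc_oneway(values, labels):
    """One-way random-effects ICC of a node measure grouped by community labels.

    ANOVA estimator (MSB - MSW) / (MSB + (k0 - 1) * MSW), where k0 is the
    adjusted mean group size used when communities have unequal sizes.
    """
    y = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    groups = [y[labels == g] for g in np.unique(labels)]
    n, G = y.size, len(groups)
    sizes = np.array([len(g) for g in groups], dtype=float)
    grand_mean = y.mean()
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    ms_between = ss_between / (G - 1)
    ms_within = ss_within / (n - G)
    k0 = (n - (sizes ** 2).sum() / n) / (G - 1)
    return (ms_between - ms_within) / (ms_between + (k0 - 1) * ms_within)


def adjusted_rand_index(labels_a, labels_b):
    """Hubert-Arabie adjusted Rand index between two hard partitions of the same nodes."""
    a = np.unique(np.asarray(labels_a), return_inverse=True)[1]
    b = np.unique(np.asarray(labels_b), return_inverse=True)[1]
    table = np.zeros((a.max() + 1, b.max() + 1), dtype=int)
    np.add.at(table, (a, b), 1)   # contingency table n_ij
    sum_cells = sum(comb(int(x), 2) for x in table.ravel())
    sum_rows = sum(comb(int(x), 2) for x in table.sum(axis=1))
    sum_cols = sum(comb(int(x), 2) for x in table.sum(axis=0))
    expected = sum_rows * sum_cols / comb(len(a), 2)
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:   # degenerate case, e.g. both partitions trivial
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


if __name__ == "__main__":
    # Hypothetical measure that is constant within communities: ICC = 1.
    print(icc_oneway([0.0, 0.0, 1.0, 1.0], [0, 0, 1, 1]))   # 1.0
    # Identical partitions up to relabelling give ARI = 1; splitting a cluster lowers it.
    print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
    print(adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2]))  # 4/7, about 0.571
```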

References

[1] Daudin, Picard and Robin, A mixture model for random graphs, Statistics and Computing, (2008).

[2] Newman, Detecting community structure in networks, The European Physical Journal B - Condensed Matter and Complex Systems, vol. 38, (2004).

[3] Blondel, Guillaume, Lambiotte and Lefebvre, Fast unfolding of communities in large networks, Journal of Statistical Mechanics: Theory and Experiment, vol. 10, (2008).

[4] Dobson, An introduction to generalized linear models, (2001).

[5] Hubert and Arabie, Comparing partitions, Journal of Classification, vol. 2, (1985).

[6] Varshney, Chen, Paniagua, Hall and Chklovskii, Structural properties of the Caenorhabditis elegans neuronal network, PLoS Computational Biology, vol. 7, (2011).

Differences in Community Estimation

Figure 2: ERMM (reorganised adjacency matrix; axis labels: blocks 1, 2, 3, 4, 5/6, 7, 8, 9).

Figure 3: Louvain (reorganised adjacency matrix; axis labels: communities 1 to 5).

Figure 4: Spectral (reorganised adjacency matrix; axis labels: communities 1 to 4).

Network Compression and Degree Distribution with ERMM

Figure 5: ERMM connectivity structure (Blocks 1 to 9 shown as super-nodes).
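Here is a minimal sketch of the compression step. It assumes a hard partition is already available, for instance the most probable block of each node after the variational fit. Each block becomes a super-node, and each pair of blocks is summarised by its edge count and a plug-in connection probability π̂_ql. The same summaries give the complete-data log-likelihood that enters the ICL used to compare fits with different numbers of blocks Q. The penalty below follows our reading of the undirected-graph ICL in Daudin et al. [1]. We do not claim these are exactly the quantities drawn in Fig. 5, the variational step is omitted, and the function names and toy graph are hypothetical.

```python
import numpy as np


def _possible_pairs(sizes):
    """Node pairs available within (diagonal) and between (off-diagonal) blocks."""
    possible = np.outer(sizes, sizes)
    possible[np.diag_indices_from(possible)] = sizes * (sizes - 1) / 2  # no self-loops
    return possible


def compress_network(A, labels):
    """Summarise an undirected, unweighted graph as super-nodes given a hard partition.

    Returns block sizes n_q, edge counts e_ql within/between blocks, and the
    plug-in connection probabilities pi_ql = e_ql / (number of possible pairs).
    """
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    blocks = np.unique(labels)
    Z = (labels[:, None] == blocks[None, :]).astype(float)  # n x Q one-hot memberships
    sizes = Z.sum(axis=0)
    counts = Z.T @ A @ Z                                     # sums A_ij over block pairs
    counts[np.diag_indices_from(counts)] /= 2               # within-block edges counted twice
    possible = _possible_pairs(sizes)
    with np.errstate(divide="ignore", invalid="ignore"):
        pi = np.where(possible > 0, counts / possible, 0.0)
    return sizes, counts, pi


def icl_hard_partition(A, labels):
    """ICL of an undirected Bernoulli ERMM evaluated at a hard partition.

    Penalty: Q(Q+1)/4 * log(n(n-1)/2) for the Q(Q+1)/2 connection probabilities,
    plus (Q-1)/2 * log(n) for the Q-1 free mixing proportions.
    """
    sizes, counts, pi = compress_network(A, labels)
    n, Q = int(sizes.sum()), len(sizes)
    iu = np.triu_indices(Q)                       # each unordered block pair once
    e, m, p = counts[iu], _possible_pairs(sizes)[iu], pi[iu]
    with np.errstate(divide="ignore", invalid="ignore"):
        # Bernoulli log-likelihood per block pair, with the convention 0 * log 0 = 0.
        ll_edges = (np.where(e > 0, e * np.log(p), 0.0)
                    + np.where(m - e > 0, (m - e) * np.log1p(-p), 0.0))
    ll_labels = np.sum(sizes * np.log(sizes / n))  # sum_i log alpha_{z_i}
    penalty = Q * (Q + 1) / 4 * np.log(n * (n - 1) / 2) + (Q - 1) / 2 * np.log(n)
    return ll_labels + ll_edges.sum() - penalty


if __name__ == "__main__":
    # Hypothetical toy graph: two 4-cliques joined by the single edge 3-4.
    A = np.zeros((8, 8))
    A[:4, :4] = A[4:, 4:] = 1.0
    np.fill_diagonal(A, 0.0)
    A[3, 4] = A[4, 3] = 1.0
    labels = [0] * 4 + [1] * 4
    sizes, counts, pi = compress_network(A, labels)
    print(counts)  # 6 edges inside each block, 1 edge between them
    print(pi)      # within-block probability 1, between-block 1/16
    # The two-block partition scores a higher ICL than lumping all nodes together.
    print(icl_hard_partition(A, labels) > icl_hard_partition(A, [0] * 8))  # True
```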

Figure 6: ERMM’s fitted degree distribution (1 − CDF, from 0.01 to 1.00, against degree, from 1 to 100; empirical and fitted curves).
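The fitted curve in Fig. 6 can be understood as follows. Under the ERMM, a node in block q is linked to each of the other n − 1 nodes independently with probability π̄_q = Σ_l α_l π_ql. Here α_l are the block proportions and π_ql the block connection probabilities. The node's degree is therefore Binomial(n − 1, π̄_q), and the degree of a random node is a mixture of binomials with weights α_q. The sketch below evaluates that mixture and its 1 − CDF, plus the empirical 1 − CDF for comparison. It assumes fitted α and π are available (the poster's fitted values are not reproduced here), and the helper names are hypothetical.

```python
import numpy as np
from math import lgamma


def binomial_pmf(N, p):
    """pmf of Binomial(N, p) at k = 0..N, computed in log space for stability."""
    if p <= 0.0:
        pmf = np.zeros(N + 1)
        pmf[0] = 1.0
        return pmf
    if p >= 1.0:
        pmf = np.zeros(N + 1)
        pmf[N] = 1.0
        return pmf
    ks = np.arange(N + 1)
    log_coef = np.array([lgamma(N + 1) - lgamma(k + 1) - lgamma(N - k + 1) for k in ks])
    return np.exp(log_coef + ks * np.log(p) + (N - ks) * np.log1p(-p))


def ermm_degree_pmf(alpha, pi, n):
    """Degree pmf of a random node under a fitted ERMM: a mixture of binomials.

    A node in block q links to each of the other n - 1 nodes independently with
    probability pbar_q = sum_l alpha_l * pi_ql, so its degree is Binomial(n - 1, pbar_q).
    """
    alpha = np.asarray(alpha, dtype=float)
    pi = np.asarray(pi, dtype=float)
    pbar = pi @ alpha
    return sum(a * binomial_pmf(n - 1, p) for a, p in zip(alpha, pbar))


def survival_from_pmf(pmf):
    """1 - CDF, i.e. P(K > k) for k = 0, 1, ..., clipped to [0, 1] against rounding."""
    return np.clip(1.0 - np.cumsum(pmf), 0.0, 1.0)


def empirical_survival(degrees, kmax):
    """Empirical P(K > k) for k = 0..kmax from observed node degrees."""
    degrees = np.asarray(degrees)
    return np.array([(degrees > k).mean() for k in range(kmax + 1)])


if __name__ == "__main__":
    # Single-block sanity check: the ERMM reduces to Erdos-Renyi, degree ~ Binomial(2, 0.5).
    pmf = ermm_degree_pmf([1.0], [[0.5]], n=3)
    print(pmf)                     # approximately [0.25, 0.5, 0.25]
    print(survival_from_pmf(pmf))  # approximately [0.75, 0.25, 0]
```

With real data, one would call `empirical_survival(A.sum(axis=1), kmax)` on the adjacency matrix and compare it with `survival_from_pmf(ermm_degree_pmf(alpha, pi, n))`, typically on logarithmic axes.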

Qualitative Assessment

Figure 7: ICC scores (0.0 to 0.6) of the ERMM, the Louvain algorithm and the Spectral algorithm for Anatomical location (longitudinal) (ALL), Anatomical location (sectional) (ALS), Anatomical distance (AD), Birth time (BT), Birth time difference (BTD) and Lineage distance (LD).

Figure 8: ARI scores (0.0 to 0.3) of the ERMM, the Louvain algorithm and the Spectral algorithm for Functional Classification (FC) and Ganglion Classification (GC).

Results

The optimal ERMM fit consists of 9 blocks, while the Louvain and Spectral fits consist of 5 and 4 communities, respectively. All three are shown in Figs. 2-4 as reorganised adjacency matrices. The ERMM finds dense blocks on the diagonal, but also a range of off-diagonal patterns. Note how blocks 5 and 6, with tight interconnections and numerous external connections, form a core-periphery structure. Surprisingly, even though blocks 5 and 6 fit the standard notion of a “community”, they are not identified by the deterministic algorithms. Furthermore, the ERMM fit provides a compressed view of the C. elegans network (Fig. 5) and a faithful approximation of the degree distribution (Fig. 6).

To score the quality of each fit, we show the ICC scores (Fig. 7) across the known biological features characterising nodes and edges. Here, the ERMM fit scores consistently higher than the Spectral and Louvain algorithms. In Fig. 8, however, ARI is rather low in general. The Spectral algorithm shows slightly better similarity with the functional classification, and all methods have similar ARI for the ganglion classification.

Conclusion

We showed that the Erdős-Rényi Mixture Model not only produces more biologically plausible communities (as measured by the ICC scores) but also provides an integrated picture of the full mesoscale structure, including core-periphery structure. It also allows the network to be compressed into a set of super-nodes and their connectivities. We expect these methods to prove useful for the analysis of other types of networks, such as human brain functional connectivity networks.