10
RESEARCH ARTICLE Open Access Revisiting the variation of clustering coefficient of biological networks suggests new modular structure Dapeng Hao 1*, Cong Ren 2and Chuanxing Li 1* Abstract Background: A central idea in biology is the hierarchical organization of cellular processes. A commonly used method to identify the hierarchical modular organization of network relies on detecting a global signature known as variation of clustering coefficient (so-called modularity scaling). Although several studies have suggested other possible origins of this signature, it is still widely used nowadays to identify hierarchical modularity, especially in the analysis of biological networks. Therefore, a further and systematical investigation of this signature for different types of biological networks is necessary. Results: We analyzed a variety of biological networks and found that the commonly used signature of hierarchical modularity is actually the reflection of spoke-like topology, suggesting a different view of network architecture. We proved that the existence of super-hubs is the origin that the clustering coefficient of a node follows a particular scaling law with degree k in metabolic networks. To study the modularity of biological networks, we systematically investigated the relationship between repulsion of hubs and variation of clustering coefficient. We provided direct evidences for repulsion between hubs being the underlying origin of the variation of clustering coefficient, and found that for biological networks having no anti-correlation between hubs, such as gene co-expression network, the clustering coefficient doesnt show dependence of degree. Conclusions: Here we have shown that the variation of clustering coefficient is neither sufficient nor exclusive for a network to be hierarchical. Our results suggest the existence of spoke-like modules as opposed to deterministic modelof hierarchical modularity, and suggest the need to reconsider the organizational principle of biological hierarchy. Background The high relevance between functional organization and topological features has motivated the development of statistical measures to characterize cellular networks. In- creasingly, these measures reveal that biological network organization is characterized by the power law of degree distribution, the concept of modularity and the degree correlations on connected nodes [1-3]. Networks with high modularity have dense connections between the nodes within same cellular functions but sparse connec- tions between nodes in different functions. Furthermore, a central theory in biology is the hierarchical organization of cellular processes, which means that high-level processes are build by connecting low-level ones [4,5]. For example, the process mitosis is composed by several low-level functions, such as spindle assembly, centrosome separation and chromosome alignment. Consequently, it is reasonable to suppose that functional modules of interest are hierarchically organized in the same way, that small modules are combined into larger modules and then further combined into even larger ones. This complexity, therefore, poses great challenges to researchers trying to understand the modularity struc- ture of cellular networks. To identify the hierarchical modularity of metabolic networks, Ravasz et al. focused on detecting a global signatureof network architecture [6,7]. In Ravaszs * Correspondence: [email protected]; [email protected] Equal contributors 1 College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China Full list of author information is available at the end of the article © 2012 Hao et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Hao et al. BMC Systems Biology 2012, 6:34 http://www.biomedcentral.com/1752-0509/6/34

Revisiting the variation of clustering coefficient of

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Revisiting the variation of clustering coefficient of

Hao et al. BMC Systems Biology 2012, 6:34http://www.biomedcentral.com/1752-0509/6/34

RESEARCH ARTICLE Open Access

Revisiting the variation of clustering coefficient ofbiological networks suggests new modularstructureDapeng Hao1*†, Cong Ren2† and Chuanxing Li1*

Abstract

Background: A central idea in biology is the hierarchical organization of cellular processes. A commonly usedmethod to identify the hierarchical modular organization of network relies on detecting a global signature knownas variation of clustering coefficient (so-called modularity scaling). Although several studies have suggested otherpossible origins of this signature, it is still widely used nowadays to identify hierarchical modularity, especially in theanalysis of biological networks. Therefore, a further and systematical investigation of this signature for differenttypes of biological networks is necessary.

Results: We analyzed a variety of biological networks and found that the commonly used signature of hierarchicalmodularity is actually the reflection of spoke-like topology, suggesting a different view of network architecture. Weproved that the existence of super-hubs is the origin that the clustering coefficient of a node follows a particularscaling law with degree k in metabolic networks. To study the modularity of biological networks, we systematicallyinvestigated the relationship between repulsion of hubs and variation of clustering coefficient. We provided directevidences for repulsion between hubs being the underlying origin of the variation of clustering coefficient, andfound that for biological networks having no anti-correlation between hubs, such as gene co-expression network,the clustering coefficient doesn’t show dependence of degree.

Conclusions: Here we have shown that the variation of clustering coefficient is neither sufficient nor exclusive for anetwork to be hierarchical. Our results suggest the existence of spoke-like modules as opposed to “deterministicmodel” of hierarchical modularity, and suggest the need to reconsider the organizational principle of biologicalhierarchy.

BackgroundThe high relevance between functional organization andtopological features has motivated the development ofstatistical measures to characterize cellular networks. In-creasingly, these measures reveal that biological networkorganization is characterized by the power law of degreedistribution, the concept of modularity and the degreecorrelations on connected nodes [1-3]. Networks withhigh modularity have dense connections between thenodes within same cellular functions but sparse connec-tions between nodes in different functions. Furthermore,

* Correspondence: [email protected]; [email protected]†Equal contributors1College of Bioinformatics Science and Technology, Harbin MedicalUniversity, Harbin 150081, ChinaFull list of author information is available at the end of the article

© 2012 Hao et al.; licensee BioMed Central LtdCommons Attribution License (http://creativecreproduction in any medium, provided the or

a central theory in biology is the hierarchicalorganization of cellular processes, which means thathigh-level processes are build by connecting low-levelones [4,5]. For example, the process mitosis is composedby several low-level functions, such as spindle assembly,centrosome separation and chromosome alignment.Consequently, it is reasonable to suppose that functionalmodules of interest are hierarchically organized in thesame way, that small modules are combined into largermodules and then further combined into even largerones. This complexity, therefore, poses great challengesto researchers trying to understand the modularity struc-ture of cellular networks.To identify the hierarchical modularity of metabolic

networks, Ravasz et al. focused on detecting a “globalsignature” of network architecture [6,7]. In Ravasz’s

. This is an Open Access article distributed under the terms of the Creativeommons.org/licenses/by/2.0), which permits unrestricted use, distribution, andiginal work is properly cited.

Page 2: Revisiting the variation of clustering coefficient of

Figure 1 The degree distribution of two metabolic networks.For each network (E.coli and S.typhi, colored by green and pink dotrespectively), the two largest super-hubs are pointed out.

Hao et al. BMC Systems Biology 2012, 6:34 Page 2 of 10http://www.biomedcentral.com/1752-0509/6/34

study, they revealed that for metabolic networks and forcertain hierarchical networks the clustering coefficient,C(k), of a node follows a scaling law with degree k C(k) ~ k-1. To explain this, they proposed a network modelwhich possesses both the power law of degree distribu-tion and the scaling law of C(k). The starting point ofthis network model is a small cluster of five fully con-nected nodes; then creates four identical replicas, con-necting the peripheral nodes of each cluster to thecentral node of the old cluster, resulted in a large 25-node cluster. Next, four replicas of this 25-node clusterare generated and the 16 peripheral nodes are connectedto the central node of the old cluster, obtaining a largercluster of 125 nodes. These replication and connectionsteps can be repeated indefinitely to generate a hierarch-ical architecture. In each step i, the number of nodes inthe network is 5i. This network model, which we expli-citly denote by “deterministic hierarchical model”, hassubsequently a great influence on the studies of networkbiology [8,9], and the scaling of C(k) is widely used toidentify whether or not a network is hierarchically orga-nized nowadays.Two former studies have suggested that the decrease

of C(k) can be tentatively attributed to the tendency thatlarge degree nodes are connected to small degree onesin biological networks[1,10]. For example, Soffer andVazquez proposed a novel measurement of clusteringcoefficient taking into account of the neighborhood de-gree of node, which didn’t scale with k. Their work sug-gested that the variation of C(k) can be attributed toneighborhood degree distribution. However, the “deter-ministic model” is also anti-correlated. Thus, it is stillpossible that both the degree anti-correlation and thevariation of C(k) is the reflection of hierarchy, suggestingthat proper “null model” is needed to clarify their rela-tionships. Moreover, metabolic networks is nicelyapproximated by C(k) ~ k-1, providing a strong evidencefor the existence of hierarchy in these networks. How-ever, to our best knowledge, former studies didn’t dir-ectly indicated why C(k) strictly follows this scaling law(k-1) in metabolic networks. These may be the reasonswhy the variation of C(k) is still widely used in assessingbiological network hierarchy. In fact, almost every studyon biological networks that observed the variation of C(k), including protein-protein networks, functional net-works, human disease networks or even ecological net-works, claimed that they have found a hierarchicalmodular structure, for example [11-17]. This situationsuggested that, a further and systematical investigationof clustering coefficient focused on different types ofbiological networks is necessary. In this work werevealed the reason why C(k) scales with k-1 in metabolicnetworks and suggested by “null model” that the vari-ation of C(k) is neither sufficient nor exclusive for a

hierarchical network. Our findings suggest the existenceof spoke-like topology as opposed to “deterministic hier-archical model”.

Results and DiscussionOrigin of the scaling law in metabolic networksWe start by indicating why clustering coefficient distri-bution of metabolic networks strictly follows the particu-lar scaling law (k-1). The clustering coefficient, definedas CðkÞ ¼ 2N=kðk� 1Þ , provides a measure of the levelof interconnectivity in the neighborhood of a node,where N is number of triangles formed by the node anda link between any two direct neighbors of it [7]. In theformer study, Ravasz et al. found a scaling law of C(k).They argued that this scaling law was not expected for arandom scale-free network of similar sizes, indicatingthe absence of hierarchy in random networks. In thestudy, they used the B-A model to generate randomscale-free networks [2]. One problem with their randomnetwork model, however, is that it does not take into ac-count the existence of so-called super-hubs in metabolicnetworks (i.e. ATP and H2O). Drawing the degree distri-bution of metabolic networks shows the existence of“super hubs” that are unexpected from the approximatedpower-law degree distribution (Figure 1). A single superhub can have great impact on topological measures ofnetwork density such as clustering coefficient, as it car-ries a lot of edges. To take into account the super-hubs,here we generate random networks by randomly rewir-ing the links of metabolic networks, which preserves thesame degree distribution [1]. We plot the C(k) curve ofE.coli and the curves averaged over the metabolic net-works of all 43 organisms, along with their randomizedcounterparts (Figure 2A and 2B). As can be seen,

Page 3: Revisiting the variation of clustering coefficient of

Figure 2 Origin of the scaling law of C(k). (A) C(k) for metabolic network of E.coli (green dot) and a randomized network (black circle). (B) TheC(k) curves averaged over 43 organisms (red circle) and over 43 randomized networks (black circle). (C) The degree distribution of a randomnetwork with 4 super-hubs added. (D) The variation of C(k) of the random network before and after 4 super-hubs added (corresponds to blackand red circles respectively). (E) Number of newly added triangles, 5N as a function of k. (F) C(k) for random networks with large degree nodesadded. In (E) and (F), black circles correspond to the original network, red circles correspond to the network after 2 nodes with degree 2,000added and purple squares corresponds to the network after 200 nodes with degree 20 added.

Hao et al. BMC Systems Biology 2012, 6:34 Page 3 of 10http://www.biomedcentral.com/1752-0509/6/34

inconsistent with the former study, the metabolic net-works and random networks show similar dependenceof clustering coefficient on node’s degree. The onlyproperty shared by metabolic networks and their ran-dom counterparts is the degree sequence, while theprominent feature of the degree sequence of metabolicnetworks is the existence of super-hubs. Therefore, thisresult suggests that super hubs is probably the reasonwhy C(k) scales with k-1. Note that the metabolic net-works and random networks are separated when k islow, suggesting that metabolic networks have relativelyhigher modular level than random networks. However,

this difference cannot be used as an evidence ofhierarchy.One may argue that given the degree distribution,

hierarchical structure of a network is largely defined, soit is not surprising that random networks generated thisway have similar dependence of clustering coefficient onnode’s degree. To rule out this possibility, we next inves-tigate whether the scaling law of C(k) can be reproducedin a totally random network. For this purpose, we firstgenerated a random scale-free network of 10,000 nodeswith degree following P(k) ~ k-2.6, and then added severallarge degree nodes unexpected from the degree

Page 4: Revisiting the variation of clustering coefficient of

Hao et al. BMC Systems Biology 2012, 6:34 Page 4 of 10http://www.biomedcentral.com/1752-0509/6/34

distribution (Figure 2C, using preferentially attachment).As shown in Figure 2D, C(k) of the network with severallarge degree nodes added appears to scale with k-1, asopposed to the original network that has no variation.Therefore, several super-hubs is sufficient to give rise tothe scaling law of C(k). This result is reasonable. For ex-ample, the metabolic network of E.coli has 2,409 edgeswith average claustering coefficient �C ¼ 0:21 and degreeof the largest hub k ¼ 253 . To keep the same value ofclustering coefficient for this super hub, its neighborshave to be connected by n ¼ k k � 1ð Þ � �C=2 ¼ 6694edges, which is nearly 3 times of the number of edges inthe network!We then would like to present an analytical investiga-

tion for this result. Consider an undirected random net-work with S nodes, M edges and average clusteringcoefficient �C , the probability that a newly added node j

has a link with a node i is pij ¼ ki=PS

s¼1ks ¼ ki=2M , andthus the expected number of edges that newly addednode j connects to i is mij ¼ min kj:ki=2M; 1

� �, where ki,

kj are the degrees for nodes i and j and the function min() is to make sure at most one edge connecting twonodes. In a random network with no degree correlation,the average degree of the neighbors of a node would bethe average degree of the network, <k>. Thus, theexpected number of edges that the newly added nodeconnecting to a neighbor of node i is mj<k> ¼min kj� < k > =2M; 1

� �. The number of newly added tri-

angles involving node i that generated by node j con-necting to both node i and its neighbors can be roughlyestimated by ΔNi ¼ ki �mij �mj<k>. For node j with smalldegree kj, mij takes the value mij ¼ kj � ki=2M , and thus

ΔNi ¼ mj<k> � k2i kj2M ¼ αjk2i , where αj is determined by kj.Now the clustering coefficient of node i is C0 kið Þ ��C þ 2ΔNi=ki ki þ 1ð Þ � �C þ 2αj , which doesn’t vary withdegree ki. However, for a node with large kj, mij takesthe value mij ¼ 1 and thus ΔNi ¼ αjki . Considering thatthe clustering coefficient of a random scale-free networkis extremely small ( �C � 0, for example, there are thou-sands of triangles in biological networks, whereas thereare only tens of triangles in random networks of similarsize), the C0 kið Þ is now mainly determined by2ΔNi=ki ki þ 1ð Þ , thus C0 kið Þek�1

i . To test this, we con-structed a network with 10,000 nodes following the dis-tribution P(k) ~ k-2.6, which has only 64 triangles in totaland thus the �C � 0. Then we randomly added 2 nodeswith degree 2,000 and 200 nodes with degree 20 intothis network respectively, of which the number of newlyadded triangles ΔN as a function of degree k is counted(Figure 2E). Although the number of newly added edgesis the same, the number of newly added trianglesincreases in different rates as a function of k: ΔNek1 in

the first case and ΔNek2 in the second case respectively!As a result, the clustering coefficient shows a perfectscaling dependence on node’s degree in the first case,whereas it doesn’t vary with k in the second case(Figure 2F). This striking difference comes from the re-striction mij ¼ min kj � ki=2M; 1

� �. For nodes with small

degrees, mij takes the valuemij ¼ kj � ki=2M , whereas fornodes with large degrees, mij takes the value mij ¼ 1 .Notably, this formula reflects the fact that there is atmost one edge connecting two nodes in these biologicalnetworks. Hence, this formula implies that connectionsbetween large degree nodes in metabolic networks arehighly suppressed, compared to a random network withno constraints on edge multiplicity. For example, thetwo largest hubs in metabolic network of E.coli would beconnected by m ¼ ki � kj=2M ¼ 253 � 128= 2 � 2409ð Þ ¼13:4 edges without constraints on edge multiplicity! Inthis case, a large degree node is forced to connect tosmall degree ones; as a consequence, its clustering coef-ficient is relatively small.It should be noted that the clustering coefficient in the

first case is at least an order of magnitude larger thanthat of the network in the second case, suggesting thatthe existence of super-hubs is one of the origins of highclustering of metabolic networks. Thus, when the levelof clustering coefficient is regarded as a measure ofmodularity level, the existence of super-hubs should beconsidered, otherwise the modularity level of biologicalnetworks would be highly overestimated [7,18].

Variation of C(k) is a reflection of degree correlationNext, we ask whether the existence of super-hubs is theonly reason that biological networks show dependenceof clustering coefficient on node’s degree. However, wefound that for other biological networks, the C(k) curvecan be highly different with random networks of samedegree distribution (for example, the protein-proteininteraction network and the genetic synthetic lethal net-work. Additional file 1: Figure S1), suggesting that thevariation of C(k) cannot be solely attributed to the exist-ence of super-hubs. For metabolic networks, we haveshown that the dependence of clustering coefficient onnode’s degree has its origin in the suppression of hub-hub connections (mij ¼ min kj � ki=2M; 1

� �). Hence, it is

possible that even without the existence of super-hubs,the anti-correlation between hubs is enough to cause thevariation of C(k). Former studies have found that manybiological networks are disassortative, indicating that thestrong repulsion between hubs is frequently observed[1,19-21].To investigate the relationship between repulsion of

hubs and the variation of C(k), we plotted the correl-ation profiles for biological networks, as well as their

Page 5: Revisiting the variation of clustering coefficient of

Figure 3 (See legend on next page.)

Hao et al. BMC Systems Biology 2012, 6:34 Page 5 of 10http://www.biomedcentral.com/1752-0509/6/34

Page 6: Revisiting the variation of clustering coefficient of

(See figure on previous page.)Figure 3 Correlation profiles and the C(k) curves. Correlation profile and C(k) curves of (A) protein network, (B) genetic network, (C) genecoexpression network and (D) metabolic network of E.coli with 21 currency metabolites (i.e. ATP and NADH) removed. Green rectangles in (A)and (B) are the C(k) curves of two simulated random networks.

Hao et al. BMC Systems Biology 2012, 6:34 Page 6 of 10http://www.biomedcentral.com/1752-0509/6/34

clustering coefficient distribution (Figure 3). The correl-ation profile compares the joint probability P(ki, kj) offinding a link between any two nodes of degree ki and kjwith the same probability Pr(ki, kj) in random counter-parts [1,20]. The random counterparts are generated byrandomly rewiring the links of original network, thuspreserving the degree sequence. The correlation profilesfor biological networks were generated by comparingwith 100 randomized counterparts. The protein-proteininteraction network and the genetic synthetic lethal net-work pose a higher level of anti-correlation: nodes withlarge degrees favor to link with nodes of small degrees(Figure 3A and 3B). A measure of degree correlation,known as assortative coefficient r (shown in the figure),is consistent with their correlation profiles [22]. Theircorresponding clustering coefficient distributions show adecline with node’s degree, although with a clear devi-ation from scaling law C(k) ~ k-1. Note that there is arapid decrease of C(k) for large node degree, corre-sponding to the highly suppressed region in the upper-right corner of correlation profiles (colored by darkblue). On the other hand, the gene co-expression net-work and the metabolic network with currency metabo-lites removed display a high level of hub affinity asopposed to anti-correlation; nodes with large degreesfavor to link with other nodes of large degrees(Figure 3C and 3D). Their clustering coefficient distribu-tions, therefore, do not decrease with node’s degree.Thus, the variation of C(k) perfectly coincides with thecorrelation profiles of network structure. This result isconsistent with a former study of Soffer and Vazquez[10]. In their study, Soffer and Vazquez proposed a novelmeasurement of clustering coefficient of node that wasnormalized by its neighborhood degree, which didn’tshow dependence on node’s degree. Their results alsosuggest that the degree correlation is probably the originof the variation of C(k).

Simulated annealingHowever, it is still possible that both the anti-correlationbetween hubs and the variation of C(k) are reflections ofhierarchy. A key procedure of generating “deterministichierarchical model” is to connect peripheral nodes to thecentral node of a certain module and to avoid directlinks between central nodes [7,23]. They provided littleinformation about why this procedure is necessary.However, this procedure helps to give rise to both anti-correlation and the variation of C(k). For example, the

assortative coefficient r ¼ �0:077 for “deterministichierarchical model” network of 56 = 15625 nodes. Torule out the possible that both the anti-correlation andthe variation of C(k) are reflections of hierarchy, one hasto investigate whether a random network with similarlevel of repulsion between hubs would have the samevariation of C(k). However, one problem with the ran-dom networks is that they are much less modular thanbiological networks. In other words, the neighbors of asame node are more likely to be linked in biological net-works than in random networks (that is, more likelyforming a triangle). To overcome this, we generated ran-dom networks by combining edge rewiring method andsimulated annealing algorithm. We first generate seednetworks that preserves the joint probability P(ki, kj) (seeMaterials and methods), and then conduct simulatedannealing introducing an effective temperature T to glo-bally minimize the following score function: E ¼Nrandom �Nj j=N, where Nrandom is the number of trian-gles in random network and N is the number of trianglesin real network. Random networks with the same levelof modularity will have the minimum score 0. At eachMonte Carlo step, we select two edges at random fromthis network and replace them with two new ones by re-wiring them with a probability min exp �ΔE=Tð Þ; 1½ �, onone condition that the rewiring step preserves the jointprobability P(ki, kj). Then, This Monte Carlo step isrepeated until E achieves a stationary value. Because theminimum energy is given, it is easy to get a networkwith similar level of modularity as real network by indu-cing an appropriate temperature T. Figure 4 and Bshows the C(k) curves for two random networks anneal-ing to the two biological networks that show variation ofC(k) (the green rectangles). As can be seen, the C(k)curves of the two random networks overlap with bio-logical networks nicely. Since the seed networks are ran-dom networks, this result confirms that the dramaticdecline of C(k) with node’s degree is the reflection of re-pulsion between hubs rather than reflection of rigid hier-archy as characterized by “deterministic hierarchicalmodel”.One concern is that a few of edges of the generated

random networks and biological networks may be over-lapped, and thus hierarchy structure is conserved in nullnetworks. To rule out this possibility, we further gener-ated much more stringent but uncorrelated random net-works of which a large fraction of edges are overlappedwith the biological network. However, we found that the

Page 7: Revisiting the variation of clustering coefficient of

A B

Deterministic hierarchical model

Spoke model

C

The top 6 most connected nuclear proteins

D EMRPS17

RSM19

MRPS18

MRPS8

MRP21

MRP4

MRP1PET123

MRPS5MRP10RSM23

RSM26 RSM22RSM24

MRP13

MRPS28

MRPS9

RSM25

MRP17

MRPS16MRP51

RSM27NAM9RSM10

RSM7

GID7

RMD5

VID24

VID28

FYV10

GID8

VID30

POP7

POP3

POP5

POP4

POP8POP1

POP6

RPP1

mitochondrial ribosomal small subunit

FBP degradation complex nucleolar ribonuclease P complex

SRB4

Kornberg's mediator (SRB) complex

Figure 4 Spoke modules. (A) Deterministic hierarchical model. (B) Spoke model. (C) The connection of the top 6 best connected nuclearproteins and their neighbors displays a spoke-like topology. (D) Spoke-like protein complexes. (E) Gene srb4 is a member of clique and a memberof spoke-like topology.

Hao et al. BMC Systems Biology 2012, 6:34 Page 7 of 10http://www.biomedcentral.com/1752-0509/6/34

variation of C(k) was substantially disappeared (Add-itional file 2: Figure S2). One should also note that theclustering coefficient distribution of Figure 3 shows cleardeviation from any scaling law C(k) ~ k-β, further sug-gesting that biological networks cannot be characterizedby the “deterministic hierarchical model”.

Spoke versus “deterministic hierarchical model”The “deterministic hierarchical model” suggests that thevariation of C(k) is caused by rigid hierarchy that is builtby connecting the external nodes of low-level dispersedmodules to the central nodes of a high-level module.(Figure 4A) [23]. However, our results suggest that thevariation of C(k) in biological networks, is caused by theabundance of large degree nodes connecting to thosewith much smaller degree, which we refer to as a heuris-tic “spoke” model (Figure 4B). The two models can beeasily checked by visualizing the connection of a fewhubs for a real network. Figure 4B shows the connectionof the top 6 best connected proteins and their neighborsin a small protein interaction network formed by pro-teins localized in nucleus according to a high-confidencedataset (Figure 4C) [24]. Apparently, the protein net-work supports the picture of “spoke” model rather thanrigid hierarchy of “deterministic hierarchical model”.What do our results suggest for the conception of

modularity? First of all, they suggest the existence of

functional modules that are spoke-like or built by con-necting spoke-like topologies. This new view will includemany biological modules that can not be revealed byfinding densely connected regions such as cliques or k-cores. For example, the functional module associated tocell wall organization is built by connecting severalspokes (Additional file 3 Figure S3). Many biologicalpathways include enzymes and tens of its substrates maybe better depicted by this view of modularity. We foundthat even protein complexes could be spoke-like as well.Figure 4D shows three protein complexes of S. cerevi-siae, of which FBP degradation complex and nucleolarribonuclease P complex are built by a single spoke, whilemitochondrial ribosomal small subunit is built by con-necting two spokes centered on mrp4 and mrps5 re-spectively. However, we stress that the traditional idea ofmodularity as finding densely connected regions is stilluseful in identifying cellular machines. In fact, the pro-tein network integrates “spoke” topology and denselyconnected regions into a highly interconnected web. Asingle molecule could be both a member of clique and amember of spoke-like topology. For example, srb4encodes a core component of the SRB mediator complexof S. cerevisiae and is required for transcription of mostyeast genes. However, the execution of the function ofsrb4 also relies on the interaction of many poorly con-nected genes outside the complex such as cbs1, a

Page 8: Revisiting the variation of clustering coefficient of

Hao et al. BMC Systems Biology 2012, 6:34 Page 8 of 10http://www.biomedcentral.com/1752-0509/6/34

mitochondrial translational activator of cob mRNA,resulting in a large spoke centered on srb4 (Figure 4E).These explain why C(k) shows negative dependence onnode’s degree in protein network, even though there area large number of protein complexes.

New hierarchical modularity paradigmFinally, our work raises two fundamental questions: aquestion about motivation of spoke-like topology duringevolution and a question about how low-level modulescommunicate with each other to generate high-levelones. A possible answer for the first question is that sup-pression between hubs confines mutational perturba-tions to the local. It is widely accepted that hub genesare more essential than poorly connected genes. Thus,the overabundance of spoke-like topology may reducethe accumulative effect of the mutational perturbationsof two directly connected hubs. Another possible answerfor the first question is that the overabundance ofspoke-like topology shortens the distance between mole-cules, and thus signals propagate more quickly. A mol-ecule connecting with a hub is more easily to propagateits signal than a molecule connecting with a poorly con-nected node. Given that most molecules of biologicalnetworks are poorly connected, this may be one of thereasons why these networks favor spoke-like topology.This speculation is supported by the finding that morenodes in an assortative network (i.e., social network) failto connect to the largest component to propagate its sig-nal than in a disassortative network (i.e., World WideWeb) [19,22].Apparently, cellular processes are hierarchically orga-

nized, so does the biological networks consisting ofinteracting molecules that carry out cellular functions.The second question is about how higher-level cellularfunctions build by connecting low-level ones in bio-logical networks. To answer this question, we studied asubnet related to cellular response to stress, which con-sists of several low-level cellular functions such as re-sponse to heat, starvation, osmotic stress and so on(Figure 5A). This subnet consists of both spoke-like top-ologies (i.e., nodes around gene hog1, pho85 and cdc48)and a clique (i.e., members of nuclear pore complexsuch as nup100, nup133, nup120 and nup84). From thissubnet, one can find that functional modules need notbe rigid, densely interconnected structures. Moreover,genes may belong to different modules at the same levelof the hierarchy, which are in contrast to the “determin-istic hierarchical model”. The overlap between functionalmodules is consist with the fact that genes are alwaysmulti-functional, which allows one function to influenceanother more effectively. For example, cdc28 is not onlya regulator in cellular response to stress (Figure 5A), butalso a regulator in mitosis, which may conduct the

interplay between environmental stress and cell cycle. Ithas been found that osmotic stress causes the downregulation of cdc28 activity and causes a cell cycle delayin Saccharomyces cerevisiae [25]. To include these fea-tures, we introduce a continuous modularity paradigm(Figure 5B), where the border of a module and overlapbetween modules can both be found, allowing eachmodule to accomplish a relatively autonomous functionand to influence the function of other modules. Thisnew paradigm is enriched of spoke-like topology; how-ever, a few cliques can also be seen. A high-level func-tional module is built by connecting these overlappedmodules together, and several high-level modules furtherbuild a higher-level module in a similar way. This pro-cedure can be repeated to generate a hierarchical archi-tecture. This new paradigm may not be as simple andconcise as “deterministic hierarchical model”, but ittakes into account the multi-functionality of biologicalmolecules and flexibility of modular structure. Further-more, it suggests that appropriate overlap could be a pri-mary basis for a high-level cellular function to integrateinformation from its low-level modules and resolve con-flicts between them.

ConclusionsIt is widely accepted that biological hierarchy can be wellcharacterized by a “deterministic hierarchical model”,because it reconciles modularity and scale-freeness, withC(k) following a scaling law [7]. A later study furtherdeveloped a more general power-law of C(k) to identifyhierarchical network [23]. Although the model success-fully shows that C(k) of a “deterministic hierarchicalmodel” network follows the scaling law C(k) ~ k-1, thereis no evidence showing that a network following thisscaling law is necessarily a network of hierarchy. There-fore, it is not sufficient to identify network hierarchy.More evidences comes from the fact that many networkswith no significant variation of C(k) are also hierarchic-ally organized. It has been found that many complex sys-tems have hierarchical organization, including socialnetworks that are known to be assortative and lack thevariation of C(k) [26,27]. These studies further suggestthat the scaling of C(k) is neither a sufficient nor aneeded condition for a network to be hierarchical. Al-though two former studies have suggested the shortcom-ings of using the variation of C(k) in assessing networkhierarchy [10,26], our study provided further and moredirect evidences. Nowadays, many sophisticated modelshave been developed to include the variation of C(k) anddegree distribution. However, since the variation of C(k)is still widely used as a standard indicator of hierarchicalnetwork structure, it is necessary to specifically pointout the limitations of “deterministic hierarchical model”.By doing this, our study suggests the need to reconsider

Page 9: Revisiting the variation of clustering coefficient of

VAM3

YPT1

ATG1

ATG4

COG2

COG4

ATG14

COG3

ATG10

VPS30

SEC18

ATG8

ATG7

STE50

ATG13ATG17

ATG3

OPY2

VAC8

STE20 ACT1ASK10

STE11 NTG1SHP1

NUP84ATG12ATG16 GLR1

NUP60ATG5

SIC1

NUP2PRB1

NUP100 NUP120

YBP1

NUP42

YAP1EDE1

GLE2

NUP133

NUP188

TSA1

ARF1

SIP2

MGS1

UBR1

ATG2

PSY4

ATG18

ARF2

CLG1PHO80SOD1

CCS1 PHO4

SNF1

SAN1

PPH3

UFD1

PCL1

NPL4

PSY2

SIT4

REG1

PRX1

SSK1

SSK2 SSK22KOG1

RIM15

PHO85

PCL5

YPD1

SEC7

SLN1

MSB2SSB1

PTP3SWI6 CDC48

SLT2

GND1 ATG20CDC28

BCK1

SHO1

PTP2

HOT1 PBS2HOG1

RCK2PTC1

SKO1

Response to heat

Response to starvation

Response to osmotic stress

Response to topological incorrect protein

Cellular response to stress

A

B

Modularity paradigm

Figure 5 New hierarchical modularity paradigm. (A) The subnet related to cellular response to stress, which consists of several low-levelcellular functions including response to heat, starvation, osmotic stress and topological incorrect protein. (B) New modularity paradigm. Colorsrepresent modules.

Hao et al. BMC Systems Biology 2012, 6:34 Page 9 of 10http://www.biomedcentral.com/1752-0509/6/34

the modularity nature of biological systems. In particu-lar, we stress the importance of overlap in the communi-cation of different modules. Our study may be applicableto other complex networks as well, such as WWW, ofwhich the variation of C(k) was interpreted as the exist-ence of network hierarchy too [23].

MethodsDatasetsOur analysis includes four types of biological networksof yeast: Physical protein interaction network, geneticsynthetic lethal network, gene co-expression networkand metabolic network. Dataset of protein-protein inter-action was obtained from DIP (version 10/2010) [28]. Todisplay the organization for the top 6 best connected nu-clear proteins, a high-confidence dataset curated fromliteratures and high-throughput sources was used [24],where the subcellular localization information wasaccording to MIPs annotation [29]. Dataset of syntheticlethal interaction was obtained from Biogrid (version3.1.72) [30], and the metabolic networks of 43 organismswere obtained from Jeong H et al. [31]. The gene co-expression network was constructed according to theyeast cell cycle expression data [32]. Arrays wheregreater than 10% of the gene expression information wasmissing were removed and genes where more than 7arrays the expression information was missing wereremoved. Then, the Pearson coefficient was calculatedfor every gene pair, and only gene pairs with absolute

value larger than 0.65 were used to construct the geneco-expression network.

Random networksTo generate seed networks that preserves the joint prob-ability P(ki, kj), we draw N � P kð Þ nodes from the degreedistribution P(k) for each degree k, and then form anode set S containing ki copies of each node i, where Ndenotes the number of nodes in biological network.Then, we select at random two nodes from S, connectthem to generate a new random network and then re-move them from S. At each time, we estimate the jointprobability R(ki, kj) in the random network, and test ifR ki; kj� �

≤P ki; kj� �

. When the condition is not fulfilled,we discard the two nodes and draw two new ones fromS. This step is repeated until R ki; kj

� � ¼¼ P ki; kj� �

for allthe degrees.

Additional files

Additional file 1: Figure S1. Clustering coefficient distribution. (A) C(k)curves of protein interaction network and (B)genetic interaction network,as well as their random counterparts of same degree distribution(generated by randomly rewiring the edges, black circles).

Additional file 2: Figure S2. The variation of C(k) of protein interactionnetwork (red circles) and stringent but uncorrelated random network(black circles). The random network and the protein interaction networkhave at least 30% of edges overlapped.

Page 10: Revisiting the variation of clustering coefficient of

Hao et al. BMC Systems Biology 2012, 6:34 Page 10 of 10http://www.biomedcentral.com/1752-0509/6/34

Additional file 3: Figure S3. The functional module associated to cellwall organization is built by connecting several spokes in yeastinteractome.

Competing interestsThe author(s) declare that they have no competing interests.

Authors’ contributionDP and CL contributed to the design of the study and the writing of themanuscript. DP and CR performed the analysis in Figure 1, 2, 3 4 andAdditional file 1: Figure S1, Additional file 2: Figure S2. Additional file 3:Figure S3, DP and CL contributed to the revision and Figure 5. CRcontributed to the biological discussion in the manuscript. All authors readand approved the final manuscript.

AcknowledgmentsCL was supported by the National Natural Science Foundation of China(Grant Nos.30600367) and Foundation of Harbin Medical University (WLD-QN1107).

Author details1College of Bioinformatics Science and Technology, Harbin MedicalUniversity, Harbin 150081, China. 2The Second Department of Orthopedics,the Second Affiliated Hospital of Harbin Medical University, Harbin 150081,China.

Received: 12 October 2011 Accepted: 16 February 2012Published: 1 May 2012

References1. Maslov S, Sneppen K: Specificity and stability in topology of protein

networks. Science 2002, 296:(5569)910–913.2. Barabasi AL, Albert R: Emergence of scaling in random networks. Science

1999, 286(5439):509–512.3. Girvan M, Newman ME: Community structure in social and biological

networks. Proc Natl Acad Sci USA 2002, 99(12):7821–7826.4. Alon U: Biological networks: the tinkerer as an engineer. Science 2003,

301(5641):1866–1867.5. Hartwell LH, Hopfield JJ, Leibler S, Murray AW: From molecular to modular

cell biology. Nature 1999, 402(6761 Suppl):C47–52.6. Ravasz E: Detecting hierarchical modularity in biological networks.

Methods Mol Biol 2009, 541:145–160.7. Ravasz E, Somera AL, Mongru DA, Oltvai ZN, Barabasi AL: Hierarchical

organization of modularity in metabolic networks. Science 2002, 297(5586):1551–1555.

8. Alm E, Arkin AP: Biological networks. Curr Opin Struct Biol 2003, 13(2):193–202.9. Barabasi AL, Oltvai ZN: Network biology: understanding the cell's

functional organization. Nat Rev Genet 2004, 5(2):101–113.10. Soffer SN, Vazquez A: Network clustering coefficient without degree-

correlation biases. Phys Rev E Stat Nonlin Soft Matter Phys 2005,71(5 Pt 2):057101.

11. Yook SH, Oltvai ZN, Barabasi AL: Functional and topologicalcharacterization of protein interaction networks. Proteomics 2004,4(4):928–942.

12. Hawkins T, Chitale M, Kihara D: Functional enrichment analyses andconstruction of functional similarity networks with high confidencefunction prediction by PFP. BMC Bioinformatics 2010, 11:265.

13. Kumar G, Ranganathan S: Network analysis of human protein location.BMC Bioinformatics 2010, 11(Suppl 7):S9.

14. Zhang X, Zhang R, Jiang Y, Sun P, Tang G, Wang X, Lv H, Li X: Theexpanded human disease network combining protein-proteininteraction information. Eur J Hum Genet 2011.

15. Zhou J, Deng Y, Luo F, He Z, Tu Q, Zhi X: MBio 2010, 1(4):e00169-10.16. Wuchty S, Almaas E: Evolutionary cores of domain co-occurrence

networks. BMC Evol Biol 2005, 5:24.17. Muller H, Acquati F: Topological properties of co-occurrence networks in

published gene expression signatures. Bioinform Biol Insights 2008, 2:203–213.18. Wagner A, Fell DA: The small world inside large metabolic networks. Proc

Biol Sci 2001, 268(1478):1803–1810.

19. Newman ME: Assortative mixing in networks. Phys Rev Lett 2002, 89(20):208701.

20. Song C, Havlin S, Makse HA: Origins of fractality in the growth of complexnetworks. Nature Physics 2006, 2:275–281.

21. Yamada T, Bork P: Evolution of biomolecular networks: lessons frommetabolic and protein interactions. Nat Rev Mol Cell Biol 2009, 10(11):791–803.

22. Newman ME: Mixing patterns in networks. Phys Rev E Stat Nonlin SoftMatter Phys 2003, 67(2 Pt 2):026126.

23. Ravasz E, Barabasi AL: Hierarchical organization in complex networks. PhysRev E Stat Nonlin Soft Matter Phys 2003, 67(2 Pt 2):026112.

24. Batada NN, Reguly T, Breitkreutz A, Boucher L, Breitkreutz BJ, Hurst LD, TyersM: Still stratus not altocumulus: further evidence against the date/partyhub distinction. PLoS Biol 2007, 5(6):e154.

25. Belli G, Gari E, Aldea M, Herrero E: Osmotic stress causes a G1 cell cycledelay and downregulation of Cln3/Cdc28 activity in Saccharomycescerevisiae. Mol Microbiol 2001, 39(4):1022–1035.

26. Sales-Pardo M, Guimera R, Moreira AA, Amaral LA: Extracting thehierarchical organization of complex systems. Proc Natl Acad Sci USA2007, 104(39):15224–15229.

27. Clauset A, Moore C, Newman ME: Hierarchical structure and theprediction of missing links in networks. Nature 2008, 453(7191):98–101.

28. Salwinski L, Miller CS, Smith AJ, Pettit FK, Bowie JU, Eisenberg D: TheDatabase of Interacting Proteins: 2004 update. Nucleic Acids Res 2004, 32:D449–D451. Database issue.

29. Mewes HW, Amid C, Arnold R, Frishman D, Guldener U, Mannhaupt G,Munsterkotter M, Pagel P, Strack N, Stumpflen V, et al: MIPS: analysis andannotation of proteins from whole genomes. Nucleic Acids Res 2004, 32:D41–D44. Database issue.

30. Stark C, Breitkreutz BJ, Reguly T, Boucher L, Breitkreutz A, Tyers M: BioGRID:a general repository for interaction datasets. Nucleic Acids Res 2006, 34:D535–D539. Database issue.

31. Jeong H, Tombor B, Albert R, Oltvai ZN, Barabasi AL: The large-scaleorganization of metabolic networks. Nature 2000, 407(6804):651–654.

32. Spellman PT, Sherlock G, Zhang MQ, Iyer VR, Anders K, Eisen MB, Brown PO,Botstein D, Futcher B: Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarrayhybridization. Mol Biol Cell 1998, 9(12):3273–3297.

doi:10.1186/1752-0509-6-34Cite this article as: Hao et al.: Revisiting the variation of clusteringcoefficient of biological networks suggests new modular structure. BMCSystems Biology 2012 6:34.

Submit your next manuscript to BioMed Centraland take full advantage of:

• Convenient online submission

• Thorough peer review

• No space constraints or color figure charges

• Immediate publication on acceptance

• Inclusion in PubMed, CAS, Scopus and Google Scholar

• Research which is freely available for redistribution

Submit your manuscript at www.biomedcentral.com/submit