    Apprehending Multicellularity: Regulatory Networks, Genomics, and Evolution

    L. Aravind,* Vivek Anantharaman, and Thiago M. Venancio

    The genomic revolution has provided the first glimpses of the architecture of regulatory networks. Combined with evolutionary information, the network view of life processes leads to remarkable insights into how biological systems have been shaped by various forces. This understanding is critical because biological systems, including regulatory networks, are not products of engineering but of historical contingencies. In this light, we attempt a synthetic overview of the natural history of regulatory networks operating in the development and differentiation of multicellular organisms. We first introduce regulatory networks and their organizational principles as can be deduced using ideas from graph theory. We then discuss findings from comparative genomics to illustrate the effects of lineage-specific expansions, gene loss, and non-protein-coding DNA on the architecture of networks. We consider the interaction between expansions of transcription factors, and cis-regulatory and more general chromatin state stabilizing elements in the emergence of morphological complexity. Finally, we consider a case study of the Notch subnetwork, which is present throughout Metazoa, to examine how such a regulatory system has been pieced together in evolution from new innovations and pre-existing components that were originally functionally distinct. Birth Defects Research (Part C) 87:143-164, 2009. © 2009 Wiley-Liss, Inc.†

    INTRODUCTION

    The history of biology has been marked by considerable provincialism, despite the availability of a unifying framework in the form of the evolutionary theory for at least the past 150 years (Darwin, 1859; Mayr, 1982); as Dobzhansky (1973) remarked: "Nothing in Biology Makes Sense Except in the Light of Evolution." For a good portion of this period, most major disciplines within biology emerged and operated in relative isolation before being integrated into the overarching framework of the science. During this phase, evolutionary studies, taxonomy, and ecology formed relatively isolated pursuits of the naturalists in the Darwinian tradition, whereas genetics, developmental biology, and biochemistry followed their own largely independent traditions (Mayr, 1982). However, by the second half of the previous century there were several partial unifications centered on genetics: the neo-Darwinian synthesis that successfully combined genetics and the evolutionary theory, and the rise of developmental genetics that provided the first glimpses of how genes cooperated to specify the forms of multicellular organisms (Huxley, 1942; Raff, 1996; Gould, 2002). The first hints of a more fundamental unification were seen with the beginnings of molecular biology; it provided a means of understanding genes and their products at a molecular level, thereby bridging the gap between the phenotype and its underlying biochemical basis (Morange, 1998). One of the consequences of this was the emergence of the so-called "evo-devo" field, which sought to incorporate evolutionary principles to explain aspects of animal development and the emergence of the diversity in animal forms (Raff, 1996; Arthur, 2002). A primary result from these studies was the identification of numerous evolutionarily conserved pathways that determined tissue differentiation and pattern formation throughout Metazoa, despite their apparent morphological disparity. The flip-side of this discipline was the relatively narrow focus on a few conserved genetic pathways, rather than objectively addressing the mechanisms behind the biological diversity as specified in the total gene complement of organisms (Raff, 1996; Arthur, 2002; Kirschner and Gerhart, 2005).

    Starting in 1995, there was a veritable revolution in biology, with the complete sequencing of the first genome of an organism (Fleischmann et al., 1995). In the coming years not only was the monumental task of sequencing

    REVIEW

    © 2009 Wiley-Liss, Inc.

    Birth Defects Research (Part C) 87:143-164 (2009)

    L. Aravind, Vivek Anantharaman, and Thiago M. Venancio are from National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland

    †This article is a US Government work and, as such, is in the public domain in the United States of America.

    Grant sponsors: Intramural Research Program of the NIH; National Library of Medicine

    *Correspondence to: L. Aravind, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894. E-mail: [email protected]

    Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/bdrc.20153

    the genomes of most model organisms (Goffeau et al., 1996; Blattner et al., 1997; Kunst et al., 1997; C. elegans Sequencing Consortium, 1998; Adams et al., 2000; Arabidopsis Genome Initiative, 2000; Aparicio et al., 2002; Sodergren et al., 2006), including humans (Venter et al., 2001; International Human Genome Sequencing Consortium, 2004), achieved, but sequencing of the genomes of several experimentally less-tractable organisms was also completed (Ivens et al., 2005; Loftus et al., 2005; Abad et al., 2008). These developments opened up unprecedented new avenues: (1) They allowed researchers to break free from the constraints of conventional forward-genetics studies. Organisms could now be studied on the basis of their complete gene sets rather than on the basis of limited prior hints from other model organisms. (2) The use of powerful computational methods to analyze protein and nucleic acid sequences and structures helped develop high-confidence predictions regarding biological function directly from genome sequence. In many cases, such predictions based on evolutionary principles and the statistical power of sequence analysis went far beyond what could be inferred through naive experimental genetic or biochemical explorations of the same protein or nucleic acid molecule (Altschul et al., 1997; Durbin, 1998). (3) The birth of genomics allowed the first robust reconstructions of evolutionary relationships between organisms. It also enabled the identification of the genomic correlates of major morphological transitions in evolution, such as the emergence of eukaryotes and the origins of multicellularity (Aravind and Subramanian, 1999; Doolittle, 1999; Aravind et al., 2000; Lespinet et al., 2002). (4) Genomics also provided the foundation for a whole class of high-throughput studies on cellular and developmental processes that tried to address the function of every gene in a given organism. These studies took several forms: generation of large-scale gene-knockout repositories (Giaever et al., 2002; Moerman and Barstead, 2008), condition/tissue-specific gene expression maps (Hughes et al., 2000; Murray et al., 2004; Jongeneel et al., 2005), determination of complete protein-protein interaction maps of several organisms (Giot et al., 2003; Li et al., 2004; LaCount et al., 2005; Rual et al., 2005; Gavin et al., 2006; Krogan et al., 2006; Ewing et al., 2007), identification of transcription factor-target gene interactions (Lee et al., 2002; Harbison et al., 2004; Luscombe et al., 2004; Teichmann and Babu, 2004; Gama-Castro et al., 2008; Sierro et al., 2008), determination of parts of the proteome subject to various post-translational modifications (Peng et al., 2003; Ptacek et al., 2005), and interactions between genes and regulatory RNAs (Amaral et al., 2008). While these studies are far from complete, they have already produced

    Figure 1. The upper panel shows the main technologies used for generating high-throughput data that are further used to compute the regulatory networks. Some important advantages and disadvantages of each technique are emphasized. Depending on the nature of the data, directed or undirected networks can be inferred. The modular structure of the network can also be explored through the detection of motifs (in directed networks) and cliques (in undirected networks).

    data on an unrivalled scale and are promising to change the way all aspects of biology are addressed.

    One hope is that the combination of these studies might allow a unification of the seemingly independent disciplines within biology, beyond what has been previously achieved (Kirschner and Gerhart, 2005). In particular, it is hoped that evolution, biochemistry, and development could be brought together successfully to explain the diversity of multicellular forms. A notable aspect of moving toward such a unified view has been the development of the network representation of biological data (Shen-Orr et al., 2002; Barabasi and Oltvai, 2004; Balaji et al., 2006a; Gianchandani et al., 2006; Russell and Aloy, 2008). Networks or graphs represent various entities such as genes, proteins, or other metabolites as nodes, which are then connected by edges, which represent an abstraction of a particular type of association or interaction (see Fig. 1). Such interactions between nodes may take many forms, such as regulatory interactions between a gene and a transcription factor or a regulatory RNA, protein-protein interactions, genetic interactions between two genes, an abstraction representing the post-translational modification of one protein by another, a reaction linking two successive compounds in a biochemical pathway, or the linkage of individual domains in a polypeptide (see Fig. 1). The immediate advantage of such representations is that they can be explored for patterns and features by the human eye, while at the same time being amenable to computational operations. This latter set of operations has been inspired by methods from graph theory, and is of enormous value in extracting previously concealed information regarding the system as a whole (Shen-Orr et al., 2002; Barabasi and Bonabeau, 2003; Barabasi and Oltvai, 2004; Balaji et al., 2006a; Gianchandani et al., 2006; Russell and Aloy, 2008). Thus, one can for the first time explore how the surrounding context of pathways affects the behavior of an individual pathway, which might have been put together from painstaking genetic or molecular studies. Another less-appreciated but vital aspect of network representations has been the ability to interface them with conventional evolutionary studies. Such investigations previously concentrated on the evolution of the nodes of the networks, i.e., proteins or nucleic acids. But they can now be integrated with the evolutionary changes relating to their biological roles, i.e., the edges which represent their interactions.
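The node-and-edge abstraction described above can be illustrated with a minimal sketch. The helper names (`directed_network`, `undirected_network`) and the toy node labels are assumptions made for illustration only; they are not taken from the article or from any particular software package.

```python
def directed_network(edges):
    """Directed network: an edge (a, b) means 'a acts on b',
    e.g. a transcription factor acting on a target gene."""
    net = {}
    for source, target in edges:
        net.setdefault(source, set()).add(target)
        net.setdefault(target, set())  # targets are nodes too, even without outgoing edges
    return net

def undirected_network(edges):
    """Undirected network: an edge (a, b) is an interaction without polarity,
    e.g. a protein-protein interaction."""
    net = {}
    for a, b in edges:
        net.setdefault(a, set()).add(b)
        net.setdefault(b, set()).add(a)
    return net

# Hypothetical toy data; the labels are placeholders, not real genes or proteins.
tnet = directed_network([("TF_A", "gene1"), ("TF_A", "gene2"), ("TF_B", "gene2")])
ppinet = undirected_network([("P1", "P2"), ("P2", "P3")])
```

Storing each network as a mapping from a node to the set of its neighbours keeps both the "human-readable" listing of interactions and the computational operations discussed in the text (degree counts, path searches) straightforward.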

    The success of the above approach, often termed the "systems approach," in the past decade has resulted in an abundance of these network representations, especially for the unicellular models such as Saccharomyces cerevisiae and Escherichia coli, and to a certain extent the multicellular animals including humans and parasites, such as Plasmodium falciparum (Shen-Orr et al., 2002; Barabasi and Oltvai, 2004; LaCount et al., 2005; Balaji et al., 2006a; Gianchandani et al., 2006; Russell and Aloy, 2008). However, in multicellular forms, developmental processes and spatial differentiation have presented technical difficulties for the complete application of the systems approach. Despite obvious differences between multicellular forms and the unicellular models, there are underlying commonalities of great significance. Firstly, in the context of development, though multicellular forms exhibit spatially and temporally differentiated states, these have cognates in the temporal differentiation exhibited by the unicellular models, i.e., the same cell of a unicellular organism assumes very different metabolic and physiological states over time (rather than space) in the course of encountering different environmental inputs. Secondly, evolutionary studies show that various multicellular lineages observed in animals, amoebozoans, plants, and fungi have closely related unicellular sister-groups, which appear to approximate the ancestral condition from which multicellularity emerged (James et al., 2006; Ruiz-Trillo et al., 2008; Pawlowski and Burki, 2009). Hence, the principles of biological network structure and dynamics gleaned from unicellular models, when combined with the more sparse data from multicellular forms, could illuminate several aspects pertaining to the provenance and expressions of multicellularity.

    In this review, we attempt to combine concepts related to the organization of various regulatory networks with evolutionary inferences derived from comparative genomics to present a synthetic view of some aspects of the origin and diversification of multicellular forms. Our intention is not to comprehensively list the conclusions of all studies in this direction since the coming to the fore of the systems approach. Instead, we seek to highlight key points, including some that are relatively neglected, and then present their potential in understanding aspects of the biology of multicellular forms. To achieve this, we lay out the review in three broad and apparently distinct sections: (1) We first introduce types of regulatory networks and the principles that can be deduced from them. (2) We then consider the major conclusions emerging from comparative genomics to provide the evolutionary context for the nodes and edges in networks. (3) Finally, we consider a case study to illustrate how an actual regulatory subnetwork pertinent to tissue differentiation in animals has been pieced together in evolution.

    REGULATORY NETWORKS

    Regulatory Networks and Their Types

    As mentioned earlier, a wide range of biological networks have been reconstructed, chiefly differing in the abstraction specified by their edges (see Fig. 1). Among these there are generic networks, which encompass all genes or their protein products, such as the genetic interaction networks (GInet) (Li et al., 2004; Collins

    et al., 2007) or the protein-protein interaction networks (PPInet) (Rual et al., 2005; Gavin et al., 2006; Krogan et al., 2006), and more restricted networks connecting transcription factors to their target genes (Tnet) (Harbison et al., 2004; Luscombe et al., 2004; Balaji et al., 2006a; Vermeirssen et al., 2007) or regulatory RNA-target gene networks (Ke et al., 2003). In this article, we primarily consider regulatory networks. While there is some fuzziness in defining these networks, there is no difficulty in recognizing such a network. A regulatory network can be defined as a network where the nodes are either genes or their products, and the edges signify transcriptional, post-transcriptional, or post-translational control of one node by another. An edge might also more abstractly signify two genes interacting in a regulatory cascade, commonly termed a signaling pathway, due to genetic epistasis or physical interaction involving their products. We usually do not consider certain structural interactions, for example, interactions of proteins and RNAs in constituting the mature ribosome, in a regulatory network. The archetypal examples of regulatory networks are Tnets that capture the regulation of genes at the transcriptional level (Harbison et al., 2004; Luscombe et al., 2004; Balaji et al., 2006a; Vermeirssen et al., 2007). Tnets are directed networks (Barabasi and Bonabeau, 2003): the edges in this network always go from a transcription factor (TF) to a target gene (TG) (see Fig. 1). Comparable to the Tnet is a regulatory network with edges connecting regulatory RNAs to their target genes (Ke et al., 2003). Another similar type of regulatory network is that between kinases, phosphatases, and their target proteins that are subject to phosphorylation or dephosphorylation (Ptacek et al., 2005; Fiedler et al., 2009). In a sense these phosphorylation networks are subnetworks of the conventional protein-protein interaction networks.

    A more complex form of a regulatory network is the ubiquitin network (Venancio et al., 2009), which depicts interactions between components of the Ub-system, i.e., ubiquitin/ubiquitin-like proteins (e.g., SUMO), the conjugation/de-conjugation enzymes, the proteasome, and various other accessory components. This regulatory network too overlaps with the more generic PPInet and GInet (Venancio et al., 2009). Edges in networks such as these are typically depicted as undirected, because there might not be a sense of polarity in all of these interactions (see Fig. 1) (Barabasi and Bonabeau, 2003). In principle, various individual regulatory networks can also be combined to produce composite networks. Other more abstract regulatory networks are derivatives from primary networks that connect different regulatory proteins with edges by virtue of shared targets. The best known of these is the co-regulatory network, derived from the Tnet by connecting transcription factors that share common target genes; it is very useful in understanding cooperation between regulatory proteins (Balaji et al., 2006a,b).
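As a minimal sketch of how a co-regulatory network might be derived from a Tnet, the function below links two transcription factors whenever they share target genes. The function name, the `min_shared` threshold, and the toy data are illustrative assumptions; the cited studies may use different criteria (for example, statistical significance of the overlap).

```python
from itertools import combinations

def coregulatory_network(tnet, min_shared=1):
    """Connect pairs of TFs that share at least `min_shared` target genes.
    `tnet` maps each TF to the set of genes it regulates.
    Returns {(tf_a, tf_b): shared_targets} as an undirected edge list."""
    edges = {}
    for tf_a, tf_b in combinations(sorted(tnet), 2):
        shared = tnet[tf_a] & tnet[tf_b]
        if len(shared) >= min_shared:
            edges[(tf_a, tf_b)] = shared
    return edges

# Hypothetical Tnet: TF_A and TF_B share one target, TF_C shares none.
toy_tnet = {
    "TF_A": {"g1", "g2"},
    "TF_B": {"g2", "g3"},
    "TF_C": {"g4"},
}
coreg = coregulatory_network(toy_tnet)
```

Because the edges arise from shared targets rather than from a direct regulatory action, the derived network is undirected even though the Tnet it comes from is directed.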

    Currently, regulatory network reconstructions with the best coverage and quality in terms of both nodes and edges are only available for unicellular forms, such as S. cerevisiae and E. coli. PPInets for metazoans, such as Caenorhabditis elegans, Drosophila melanogaster, and Homo sapiens, with reasonable coverage (Giot et al., 2003; Li et al., 2004; Rual et al., 2005), derived mainly from high-throughput yeast two-hybrid studies, have become available, but the situation is less satisfactory for Tnets (Vermeirssen et al., 2007). Networks reconstructed from large-scale data are good in terms of coverage, but suffer from false positives to varying extents due to recovery of spurious interactions (see Fig. 1) (Yu et al., 2008). Technical issues, in addition to the inherent complexities of developmental and differentiation processes, which affect the reconstruction of such networks in multicellular systems, are as follows: (1) difficulties due to the complex gene structure, including large introns, alternative splicing, and presence of composite transcription regulatory elements that are often at great distances from the genes they regulate (Maniatis and Reed, 2002); (2) the complexities of chromatin organization, which influence more conventional regulatory interactions between TFs and TGs (Iyer et al., 2008); (3) the still incompletely understood processes, such as DNA modifications, chromatin protein modifications, and signaling pathways (Iyer et al., 2008). On a more positive note, we do possess detailed studies on specific developmental regulatory networks in animals, e.g., the Notch network or the TGF-β network (Kitisin et al., 2007; Kopan and Ilagan, 2009), or in plants, e.g., leaf and floral development (Lewis et al., 2006), with information on interactions between transcription factors and their intricately tangled target elements. Currently, networks reconstructed from unicellular models are best for inferring large-scale or bulk properties of regulatory networks, whereas those from multicellular models are best for detailed case studies.

    General Structural Properties of Regulatory Networks

    Right from the earliest studies in this regard a fundamental unity in the organization of disparate biological networks has been repeatedly noted (Shen-Orr et al., 2002; Barabasi and Bonabeau, 2003; Barabasi and Oltvai, 2004; Balaji et al., 2006a; Gianchandani et al., 2006; Russell and Aloy, 2008). In global terms they have a nested or self-similar structure that appears to hold over several levels of organization, a structure that can be approximately described as fractal (see Fig. 1). The number of edges that connect to a node is termed its degree. When the number of nodes in a network possessing a particular degree is plotted, one gets a distribution (the degree distribution) that is best fitted by the power-law equation of the form $n(x) = a x^{k}$, where n is the number

    of nodes with a particular degree and x is the degree (see Fig. 2). The a and k in the equation are constants unique to each power-law distribution (Shen-Orr et al., 2002; Barabasi and Bonabeau, 2003; Barabasi and Oltvai, 2004; Balaji et al., 2006a; Gianchandani et al., 2006; Russell and Aloy, 2008). This distribution implies that regulatory networks are similar in properties at all levels in which they exist and are hence "scale-free." In reality they are only approximations of the genuinely scale-free structure seen in theoretical networks because, unlike them, biological networks have a well-defined stop: the nodes in the network above or beyond which there are no further levels. A consequence of the power-law distribution of degrees is that there are few nodes with numerous connections (termed hubs), while the other nodes have very few connections; thus hubs dominate the network in terms of connectivity (Barabasi and Albert, 1999; Barabasi and Oltvai, 2004) (see Fig. 2).
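The degree distribution and its power-law fit can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the toy distribution is constructed to follow a power law of the form n(x) = a·x^k exactly, and the fit is a plain least-squares line in log-log space, which is a simple but crude estimator that more careful analyses would replace with a maximum-likelihood approach.

```python
import math
from collections import Counter

def degree_distribution(adj):
    """Return {degree: number of nodes with that degree} for an
    undirected network given as {node: set(neighbours)}."""
    return Counter(len(neighbours) for neighbours in adj.values())

def fit_power_law(dist):
    """Least-squares fit of log n = log a + k * log x; returns (a, k).
    Entries with degree 0 or count 0 are skipped because log 0 is undefined."""
    points = [(math.log(x), math.log(n)) for x, n in dist.items() if x > 0 and n > 0]
    mean_x = sum(px for px, _ in points) / len(points)
    mean_y = sum(py for _, py in points) / len(points)
    sxx = sum((px - mean_x) ** 2 for px, _ in points)
    sxy = sum((px - mean_x) * (py - mean_y) for px, py in points)
    k = sxy / sxx
    a = math.exp(mean_y - k * mean_x)
    return a, k

# Hypothetical toy distribution built to follow n(x) = 64 * x**-2:
# many weakly connected nodes, a single hub-like node of degree 8.
toy_distribution = {1: 64, 2: 16, 4: 4, 8: 1}
a, k = fit_power_law(toy_distribution)

# A small star-shaped network: one hub joined to three peripheral nodes.
star = {"hub": {"n1", "n2", "n3"}, "n1": {"hub"}, "n2": {"hub"}, "n3": {"hub"}}
star_distribution = degree_distribution(star)
```

In this sketch the fitted exponent k comes out negative, which is what a distribution with few hubs and many poorly connected nodes requires.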

    Beyond their distinctive global structure, regulatory networks are also characterized by peculiar structures at their lower levels. In directed regulatory networks, like the Tnet, they are termed motifs (Shen-Orr et al., 2002). Three basic types of network motifs have been identified (see Fig. 1): (1) single input motifs, where a given target gene receives inputs only from a single transcription factor; (2) multiple input motifs, where a given target gene receives inputs from two or more transcription factors; (3) feed-forward motifs, where target genes receive inputs from at least two transcription factors, with the additional condition of one of the TFs in the motif also regulating the other TF (see Fig. 1). Single input motifs specialize in coordinating expression of various genes required in a particular response, enforce an order in gene expression, and are also the basis for immediate transcriptional responses (Shen-Orr et al., 2002). Multiple input motifs are the key players in integrating responses to different signaling pathways with respect to gene expression. Finally, feed-forward motifs are critical for responding to persistent signals and filtering noise (Shen-Orr et al., 2002). Thus, relative proportions of such motifs in a Tnet are of considerable significance in terms of the regulatory flux passing through the network (Shen-Orr et al., 2002). In undirected networks, such as the PPInet, GInet, and their derivative regulatory networks, such as the ubiquitin network, a different kind of low-level structure is observed: the dense subgraph (Yu et al., 2006). These are subsets of nodes in the network that are highly connected relative to the rest of the network. Such regions in networks are determined by identification of structures called cliques (see Fig. 1). Formally, a clique is a set of nodes having all possible edges between themselves (i.e., a group of nodes that forms a polygon with edges corresponding to all its sides and diagonals being present), and the cliques sought are the largest such groups (Yu et al., 2006). The clustering of genes into a clique is suggestive of functional coherence between them or some type of functional interaction between their products. Thus, the identification of cliques in regulatory networks is a useful tool for the prediction of functions of poorly characterized genes linked in a clique with functionally characterized genes by way of the "guilt by association" principle (Balaji et al., 2006a,b; Yu et al., 2006).
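The two kinds of low-level structure described above can be sketched in a few lines. The sketch makes several assumptions: the function names and toy networks are hypothetical, the motif labels follow the three definitions given in the text (applied per target gene), and the clique search uses the basic Bron-Kerbosch recursion without pivoting, which is adequate for small examples but not tuned for genome-scale networks.

```python
def classify_targets(tnet):
    """Label each regulated gene by motif type as defined in the text.
    `tnet` maps every TF to the set of genes it regulates."""
    regulators = {}
    for tf, targets in tnet.items():
        for gene in targets:
            regulators.setdefault(gene, set()).add(tf)
    labels = {}
    for gene, tfs in regulators.items():
        if len(tfs) == 1:
            labels[gene] = "single_input"
        elif any(b in tnet[a] for a in tfs for b in tfs if a != b):
            labels[gene] = "feed_forward"    # one regulator also regulates another
        else:
            labels[gene] = "multiple_input"
    return labels

def maximal_cliques(adj):
    """Enumerate maximal cliques of an undirected network
    {node: set(neighbours)} with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(current)          # cannot be extended any further
            return
        for v in list(candidates):
            expand(current | {v}, candidates & adj[v], excluded & adj[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(adj), set())
    return cliques

# Hypothetical Tnet: TF1 regulates TF2, and both regulate g2 (feed-forward).
toy_tnet = {"TF1": {"TF2", "g1", "g2"}, "TF2": {"g2", "g3"}, "TF3": {"g3"}}
motif_labels = classify_targets(toy_tnet)

# Hypothetical PPInet: a triangle P1-P2-P3 with P4 attached to P3.
toy_ppi = {"P1": {"P2", "P3"}, "P2": {"P1", "P3"}, "P3": {"P1", "P2", "P4"}, "P4": {"P3"}}
largest_clique = max(maximal_cliques(toy_ppi), key=len)
```

In the "guilt by association" setting, an uncharacterized protein appearing in `largest_clique` alongside characterized ones would be a candidate for sharing their function.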

    The concept of centrality developed in graph theory helps in assessing the importance of particular genes or proteins in the structure of regulatory networks. Two common measures of centrality of a node in a network are degree and betweenness (Barabasi and Albert, 1999; Brandes, 2001; Barabasi and Oltvai, 2004). As described above, the degree is a simple description of how connected a node is, and the most central elements by this measure are the hubs (Barabasi and Albert, 1999; Barabasi and Oltvai, 2004). In directed graphs like Tnets there are two types of degrees, namely the out- and in-degrees, that respectively denote the number of genes a TF regulates and the number of regulatory inputs a particular target gene receives from different TFs (Balaji et al., 2006a,b). In Tnets, TFs which are hubs, typically termed global regulators, influence a vast number of genes and thereby set numerous transcriptional programs in motion (Balaji et al., 2008). TFs with lower connectivity, in contrast, appear to be required for the fine

    Figure 2. A: The degree distribution of the regulatory networks is typically well approximated by a power-law equation. In the graph, degree (k) is the number of regulatory connections between the nodes, while P(k) indicates the probability of a gene with a given number of such connections. An example of a hub in a small network is also shown, along with its position in the degree distribution. B: Network susceptibility to attack (loss of hubs, solid lines) and failure (random loss of genes, dashed lines). Three networks are represented for comparison purposes: transcriptional (gray), ubiquitin (pink), and protein-protein interactions (blue).

    tuning of a transcriptional program by regulating smaller sets of genes (Balaji et al., 2006a,b). Betweenness is a different kind of centrality measure that represents the number of shortest paths in the network that include a node (Brandes, 2001). Hence, the shortest paths between all pairs of nodes should be calculated to compute the betweenness score of a node. Although degree and betweenness of a node in a network are typically correlated, they can illuminate different aspects of the networks. Certain nodes with high betweenness may not be hubs, but play a significant role in connecting various disparate parts of the regulatory networks (Yoon et al., 2006).
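The two centrality measures can be sketched as below. The betweenness routine follows the breadth-first accumulation scheme of Brandes (2001) for unweighted networks; the function names and the toy "bridge" network are assumptions for illustration. The network is stored with an entry for every node, and an undirected network is represented by listing each edge in both directions, so every unordered pair of endpoints is counted twice.

```python
from collections import deque

def in_out_degrees(net):
    """In- and out-degree of every node in {node: set(successors)};
    every node must appear as a key."""
    out_deg = {v: len(successors) for v, successors in net.items()}
    in_deg = {v: 0 for v in net}
    for successors in net.values():
        for w in successors:
            in_deg[w] += 1
    return in_deg, out_deg

def betweenness(net):
    """Number of shortest paths passing through each node (endpoints
    excluded), weighted by 1/(number of equally short alternatives)."""
    score = {v: 0.0 for v in net}
    for s in net:
        order, preds = [], {v: [] for v in net}
        sigma = {v: 0 for v in net}          # count of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:                          # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in net[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in net}
        for w in reversed(order):             # accumulate dependencies backwards
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score

# Hypothetical network: two triangles joined only through the node X.
bridge = {
    "A1": {"A2", "A3", "X"}, "A2": {"A1", "A3"}, "A3": {"A1", "A2"},
    "X": {"A1", "B1"},
    "B1": {"B2", "B3", "X"}, "B2": {"B1", "B3"}, "B3": {"B1", "B2"},
}
in_deg, out_deg = in_out_degrees(bridge)
scores = betweenness(bridge)
```

Here X has one of the lowest degrees in the network yet the highest betweenness, which mirrors the point made in the text that a node need not be a hub to be important for connecting disparate parts of a network.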

    Biological Significance of the Structure of Regulatory Networks

    Scale-free structures similar to biological regulatory networks are also encountered in various unrelated systems: the World Wide Web, the physical structure of the Internet, and the network of human sexual relationships (Amaral et al., 2000; Barabasi and Bonabeau, 2003). This led researchers to propose generalized evolutionary explanations for the origin of such structures across these disparate systems. The simplest of these merely assumes that (1) a network grows by addition of new nodes and edges, and (2) the edges show preferential attachment to nodes with higher pre-existing degree. Thus, in such an evolutionary scenario, "the rich get richer" and there is a tendency for formation of few hubs and numerous poorly connected nodes (Amaral et al., 2000; Lee et al., 2002; Yook et al., 2002; Barabasi and Bonabeau, 2003; Harbison et al., 2004; Luscombe et al., 2004; Teichmann and Babu, 2004; Gama-Castro et al., 2008; Sierro et al., 2008). This and related simulations can reproduce the structure of biological networks; however, there is no clear test yet which establishes such a mechanism to indeed be the cause for the emergence of such a structure in regulatory networks. Irrespective of the mechanistic model for their origin, structural properties of regulatory networks have profound implications for biological systems. An aspect of network structure that is important with respect to evolution is the modularity that is observed at the lower levels of organization in the form of motifs or cliques (see Fig. 1). It indicates that particular motifs or cliques can be linked to a set of functionally distinct nodes within a given regulatory network as a natural consequence of its scale-free structure (Lee et al., 2002; Harbison et al., 2004; Luscombe et al., 2004; Teichmann and Babu, 2004; Balaji et al., 2006a,b; Gama-Castro et al., 2008; Sierro et al., 2008). Thus, a particular regulatory mechanism specified by a clique or a motif can be easily recruited in various functional contexts. This provides the basis for understanding the prevalent observation that similar regulatory subnetworks can be used in both unicellular and multicellular forms or reused within different tissues in multicellular forms. Another, more direct consequence of the scale-free structure is the remarkable resilience of such regulatory networks to random loss of nodes or "failure" (Albert et al., 2000) (see Fig. 1). This is because most nodes in a scale-free network have few links; hence, disrupting one of them at random is unlikely to break down the network. In contrast, disruption of hubs, termed "attack," can break apart a regulatory network more easily, as hubs account for a large number of the connections in a network (Albert et al., 2000) (see Fig. 2).
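The growth-by-preferential-attachment scenario and the contrast between failure and attack can both be explored with a small simulation. What follows is a sketch under stated assumptions, not the procedure used in the cited studies: the function names, the network size (300 nodes), the number of links per new node (2), the fraction of nodes removed (10%), and the use of the largest connected component as the measure of network integrity are all choices made here for illustration.

```python
import random

def grow_preferential(n, m, rng):
    """Grow an undirected network to n nodes; each new node links to m
    existing nodes chosen with probability proportional to their degree."""
    adj = {i: set() for i in range(m + 1)}
    endpoints = []                       # each node appears once per edge it has
    for i in range(m + 1):               # seed: a small fully connected core
        for j in range(i + 1, m + 1):
            adj[i].add(j)
            adj[j].add(i)
            endpoints += [i, j]
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:           # sampling endpoints favours high degree
            chosen.add(rng.choice(endpoints))
        adj[new] = set()
        for old in chosen:
            adj[new].add(old)
            adj[old].add(new)
            endpoints += [new, old]
    return adj

def largest_component(adj, removed):
    """Size of the largest connected component once `removed` nodes are lost."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        best = max(best, size)
    return best

def failure_vs_attack(n=300, m=2, fraction=0.1, seed=0):
    """Return (size after random failure, size after attack on hubs)."""
    rng = random.Random(seed)
    adj = grow_preferential(n, m, rng)
    count = int(n * fraction)
    random_loss = rng.sample(sorted(adj), count)
    hubs = sorted(adj, key=lambda v: len(adj[v]), reverse=True)[:count]
    return largest_component(adj, random_loss), largest_component(adj, hubs)

failure_size, attack_size = failure_vs_attack()
```

Comparing `failure_size` with `attack_size` gives a rough numerical counterpart to the contrast between failure and attack drawn in the text; repeating the run with different seeds or removal fractions shows how the gap changes.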

    The biological correlate of this network property is the ability of regulatory systems to withstand random failures from disruption of genes due to mutation or chemical action. This in part explains why across the evolutionary tree the number of genes whose disruption results in lethality is only 15-20% of the total number of genes in an organism (Giaever et al., 2002). While resilience to failure is common to different types of regulatory networks, they still show marked quantitative differences in this particular property (see Fig. 2). Tnets in general are more resilient to failure compared to other regulatory networks that entirely or predominantly depend on protein-protein or genetic interactions (Balaji et al., 2006a). Some regulatory networks, like the ubiquitin network, are also far more susceptible to attacks than others (Venancio et al., 2009). Genetic studies have also suggested that eukaryotes in particular can be quite resilient to mutation of transcription factors, including those lacking close paralogs (Hu et al., 2007; Balaji et al., 2008). This suggests that the Tnet is a particularly robust regulatory network, beyond what would be expected on the basis of gene redundancy. The construction of co-regulatory networks based on Tnets shows that there is an underlying architecture indicative of indirect backup, where multiple unrelated TFs can potentially cover for each other in this regulatory system (Balaji et al., 2006a,b). Existence of such an "over-engineered" backup system in the form of the architecture of the co-regulatory network is likely to be a major determinant of the evolvability of gene expression in organisms (see below). Differences in relative tolerance of networks to failure and attack also have a bearing on the evolution of such networks: versions more tolerant to failure and/or attack appear to evolve more rapidly between organisms and might contribute to regulatory diversity between organisms (Balaji et al., 2006a,b).

    To be able to use the information from regulatory networks in understanding problems pertaining to multicellularity, we need to place them in the context of the evolution of such organisms. For this we turn to comparative genomics in the next section and try to understand how components of regulatory systems have originated and evolved.

    EVOLUTION OF REGULATORY SYSTEMS

    Multiple Origins of Multicellularity

    The presence of multicellular forms across the tree of life suggests that this morphological principle emerged on multiple occasions in the course of evolution (Raff, 1996; Kirschner and Gerhart, 2005; Rokas, 2008). The major independent emergences of multicellularity among eukaryotes include the following: (1) animals; (2) fungi; (3) amoebozoan slime molds; (4) plants; (5) chromist algae (e.g., brown algae); (6) chromist oomycetes (mildews or water molds); and (7) heterolobosean amoeboflagellates (acrasid slime molds). Among bacteria, too, multiple emergences of multicellular forms have been noted, for example, among myxobacteria (δ-proteobacteria), actinobacteria, cyanobacteria, and acidobacteria, and among archaea at least one multicellular lineage has been observed in the form of Methanosarcina (Mayerhofer et al., 1992; Shapiro and Dworkin, 1997). At face value it would appear that these multiple origins have little in common, but comparative genomics has revealed certain shared features at the molecular level. It has been observed that the number of specific transcription factors scales nonlinearly with increase in proteome size (van Nimwegen, 2003; Anantharaman et al., 2007a) (see Fig. 3). In the case of bacteria, multicellular forms typically have both large proteomes and a more than linear increase in the fraction of the proteome that consists of specific transcription factors (Aravind et al., 2005). In the case of eukaryotes, larger proteome size does not necessarily imply a multicellular morphology: the largest eukaryotic proteomes are currently seen in unicellular forms such as ciliates (Eisen et al., 2006) and Trichomonas (Carlton et al., 2007) (see Fig. 3). However, just as in the case of multicellular bacteria, the multicellular eukaryotes have a much higher fraction of their proteome devoted to specific transcription factors than unicellular forms of comparable size (Iyer et al., 2008). This trend is most pronounced in animals, which apparently have the most complex multicellular morphologies. Thus, across the tree of life, a greater than proportional increase in the number of specific transcription factors appears to be a correlate of multicellularity (see Fig. 3).
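    The notion of nonlinear scaling can be made concrete with a short Python sketch. It assumes a power-law model, count = a × (proteome size)^b, fitted by ordinary least squares on log-transformed values; an exponent b above 1 corresponds to a greater than proportional increase. The function name `fit_power_law` and the data points are hypothetical: the counts are generated from a known exponent purely to show that the fit recovers it, and they are not measurements from any genome.

```python
import math


def fit_power_law(proteome_sizes, tf_counts):
    """Fit count = a * size**b by least squares in log-log space; return (a, b)."""
    xs = [math.log(size) for size in proteome_sizes]
    ys = [math.log(count) for count in tf_counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope of the log-log line = exponent
    a = math.exp(mean_y - b * mean_x)  # intercept of the log-log line = log(a)
    return a, b


# Synthetic, made-up data generated from an assumed exponent of 1.8.
sizes = [2000, 4000, 8000, 16000]
counts = [1e-4 * size ** 1.8 for size in sizes]

prefactor, exponent = fit_power_law(sizes, counts)
print(f"fitted exponent: {exponent:.2f}")
print("greater than proportional" if exponent > 1 else "proportional or less")
```

    With real genome data the points scatter around the fitted curve, so the exponent would come with an uncertainty that this sketch does not estimate.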

    In the case of multicellular bacteria, a significantly higher number of serine/threonine/tyrosine kinases (S/T/Y kinases) and phosphopeptide-binding FHA domains has been observed than in their morphologically less organized counterparts (Perez et al., 2008). Thus, a case could be made that in bacteria the emergence of complex phosphorylation networks was a notable correlate of the origin of multicellularity. However, all eukaryotes have expanded protein kinase repertoires. Hence, no trend comparable to that in bacteria is observed with respect to S/T/Y kinases, or for that matter with most families of signaling proteins, which display largely linear or mild power-law scaling with respect to the overall proteome size of eukaryotic organisms (see Fig. 3) (Anantharaman et al., 2007a). Interestingly, bacteria that display multicellular organization or temporal morphological development consistently encode a distinctive array of proteins in their genomes which, in addition to the S/T/Y kinases and FHA domains, includes STAND superfamily ATPases, caspase-like proteases, and TIR (Toll-interleukin receptor) domain proteins (Anantharaman et al., 2007a). Among eukaryotes, too, regulatory proteins with the above set of domains are particularly prevalent in the proteomes of multicellular forms. In both multicellular animals (e.g., nematode Ced4 and human APAF1, caspase-1) and plants (e.g., disease resistance gene N and metacaspases), proteins with the same domains have been implicated in specific signaling pathways pertaining to cell death, pathogen response, and tissue remodeling (Aravind et al., 2001; Chamaillard et al., 2003; Ting et al., 2008). This suggests that apoptosis-related signaling pathways based on these protein domains might be a notable common denominator among several multicellular lineages across the three superkingdoms of life. This can be interpreted within the evolutionary framework under the kin selection hypothesis (Hochberg et al., 2008). In a multicellular organism, the cells, being clonal, are effectively an ensemble of kin. Thus, some cells "sacrificing" themselves via apoptosis for the highly increased fitness of sister cells in the ensemble might provide the dying cells with greater inclusive fitness than if they remained in the unicellular state. In terms of regulatory networks, animal model systems indicate that these proteins functionally interact to form distinct modules in regulatory networks primarily devoted to mediation of apoptosis (Aravind et al., 2001; Chamaillard et al., 2003; Ting et al., 2008). In bacteria, in addition to co-occurring in the genomes of organisms displaying multicellularity or developmental complexity, genes encoding such proteins also tend to cluster together in predicted operons, suggesting functional interactions in similar networks (Aravind et al., 2005). Thus, proteins that mediate apoptosis or related cell-death processes are likely to form related regulatory network modules that are common to phylogenetically distant organisms sharing multicellularity. Based on their phyletic patterns, it appears that lateral transfer of these interacting genes, encoding proteins with functions related to apoptosis, enabled the development of multicellularity in different organisms. Likewise, several extracellular protein domains involved in adhesion also appear to have been disseminated by lateral transfer across bacteria and multicellular eukaryotes, and might have provided a common functional mechanism for cellular assembly (Anantharaman et al., 2007a).

    Other subtle and less-recognized molecular features with considerable bearing on the structure of regulatory networks also distinguish eukaryotic multicellularity. Domain architectures, i.e., the way individual protein domains are linked in a polypeptide, can be converted into a network representation. In this representation, domains are the nodes of the domain architecture network, and the adjacent occurrence of two domains in a polypeptide is indicated as an edge joining the two domains (Anantharaman et al., 2007a; Iyer et al., 2008). The total domain architecture network is the ensemble of all such connections between domains across a set of proteins under consideration. The domain architecture network can tell us how simple or complex the architectures of a particular set of proteins are in a given organism. When these networks were computed for signaling proteins across eukaryotes, it was observed that multicellular forms often tend to have greater complexity in these networks (more nodes and edges between them) than their unicellular relatives (Anantharaman et al., 2007a,b; Iyer et al., 2008). This trend was even more marked for proteins involved in chromatin structure and dynamics, such as histone-modifying and chromatin remodeling proteins (termed chromatin proteins to distinguish them from specific transcription factors) (Anantharaman et al., 2007a; Iyer et al., 2008). The increased complexity of the domain architecture network of signaling and chromatin proteins suggests that there is likely to be a concomitant increase in the number of interactions between proteins, because combining multiple domains in a polypeptide allows for more combinatorial interactions. In particular, in the case of chromatin proteins, increased architectural complexity is suggestive of an increased ability to add specific epigenetic marks via histone and DNA modifications, and subsequently read those marks via specific interactions (Anantharaman et al., 2007a; Iyer et al., 2008). This could have a major role in maintaining multiple differentiated cellular states via epigenetic control.
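    The network representation described above can be sketched in a few lines of Python. Each protein is given as an ordered list of domain names, every distinct domain becomes a node, and every pair of domains that sit next to each other in a polypeptide becomes an undirected edge. The function name `domain_architecture_network` and the two sets of architectures are illustrative assumptions; the architectures are toy examples and are not drawn from any particular proteome.

```python
def domain_architecture_network(architectures):
    """Return (nodes, edges) for a collection of domain architectures.

    `architectures` is an iterable of ordered domain lists, one per protein.
    Edges are undirected and stored as frozensets; two adjacent copies of the
    same domain yield a one-element frozenset, i.e. a self-loop.
    """
    nodes = set()
    edges = set()
    for domains in architectures:
        nodes.update(domains)
        for left, right in zip(domains, domains[1:]):
            edges.add(frozenset((left, right)))
    return nodes, edges


# Toy architectures (hypothetical proteomes, for illustration only).
simple_set = [["Kinase"], ["SH3", "Kinase"]]
complex_set = [["SH3", "SH2", "Kinase"], ["PH", "Kinase"], ["SH2", "PH"]]

for label, proteins in (("simple", simple_set), ("complex", complex_set)):
    nodes, edges = domain_architecture_network(proteins)
    print(f"{label}: {len(nodes)} nodes, {len(edges)} edges")
```

    Counting nodes and edges is only the crudest measure of complexity; it is used here because it is the measure named in the text.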

    Figure 3. A: Nonlinear scaling of the total number of signaling proteins in eukaryotes with proteome size, along with the best-fit curve. One hundred seventy signaling domains were studied in 43 completely sequenced eukaryotic genomes. B: Scaling of serine/threonine/tyrosine (S/T/Y) kinases in eukaryotes with proteome size, with the best-fit curve. C: Scaling of eukaryotic transcription factors with proteome size, with the best-fit curve. Note that the multicellular forms have higher numbers of specific transcription factors. D: Scaling of prokaryotic specific transcription factors with the HTH domain from complete genomes, with the best-fit curve.

    Hence, at least in the case of the better-studied eukaryotic multicellular forms, we notice that two sets of regulatory networks are likely to have concomitantly grown larger and more complex. Firstly, the disproportionate increase in the number of specific transcription factors indicates that the Tnet greatly increased in complexity (see Fig. 3). Secondly, the emergence of more complex domain architectures among signaling and chromatin proteins suggests that the origin of multicellularity was accompanied by expansion of signaling and protein-modification networks relative to their unicellular counterparts (Anantharaman et al., 2007a; Iyer et al., 2008). Nevertheless, given the overall approximately scale-free architecture of regulatory networks, the general principles of network organization (as elucidated above) would remain consistent across unicellular and multicellular forms, despite the expansion in the latter. Importantly, the general results pertaining to the modularity of regulatory networks (see above) suggest how modules, such as those pertaining to apoptotic functions, are likely to have been ported across distantly related lineages via lateral transfer. Finally, it should be noted that despite the multiple origins of eukaryotic multicellularity, four of these are concentrated in a particular monophyletic clade of eukaryotes termed the crown group, which includes animals, fungi, slime molds, and plants (Iyer et al., 2008; Rokas, 2008; Ruiz-Trillo et al., 2008; Pawlowski and Burki, 2009). The chromists, which also display multicellular forms, have emerged through a secondary photosynthetic endosymbiosis with the plant lineage (Bhattacharya et al., 2004). Interestingly, these chromists encode several regulatory proteins, including transcription factors, which appear to have been acquired from the plant endosymbiont (Iyer et al., 2008). These lineages also generally possess a higher normalized count of chromatin proteins and transcription factors than other eukaryotes (Iyer et al., 2008). Taken together, these observations raise the possibility that the genome of the ancestral crown group eukaryote already possessed certain features that enabled some form of facultative multicellularity, perhaps comparable to what is observed today in amoebozoan slime molds. This base-level multicellularity was probably reinforced in some lineages with further expansions of transcription factors, chromatin proteins, and adhesion proteins, while it was attenuated in others via extensive gene loss (for example, in association with the saprophytic lifestyle in fungi).

    Lineage-Specific Gene Expansions

    Figure 4. A: A simplified phylogenetic tree of the POZ domain transcription factors illustrating the concept of the lineage-specific expansion. Note that the transcription factors from a given lineage are closer to their paralogs than to those from other lineages. The domain architectures of selected proteins are shown to the right. C2H2, zinc-finger domain; AT, AT-hook DNA binding domain; Bzip, basic zipper domain; the C-terminal domain in pipsqueak is a helix-turn-helix domain. B: Examples of lineage-specific gene expansions in various transcription factor families in different eukaryotic lineages, which are labeled by genus name. All LSEs from one lineage are colored in the same way. The Myb/SANT domains expanded in Drosophila represent a family of helix-turn-helix DNA transcription factors typified by the Zeste protein. C: The ubiquitin and kinase pathways are shown to illustrate LSEs occurring in component proteins of the pathway that are at termini.

    One of the most striking revelations from the comparative genomics of eukaryotes has been the discovery of the phenomenon of lineage-specific expansions of protein families (LSEs) (Lander et al., 2001; Lespinet et al., 2002). An LSE is defined as the expansion of a family of proteins in a particular lineage after its divergence from a reference sister lineage (Lespinet et al., 2002) (see Fig. 4). One of the classical examples of lineage-specific expansions is that of the family of transcription factors with the POZ (also called BTB) domain (Aravind and Koonin, 1999a; Lespinet et al., 2002). These transcription factors have an N-terminal POZ domain combined with a C-terminal DNA-binding domain that is usually a C2H2 Zn-finger (Spokony and Restifo, 2007). Both vertebrates and insects have large numbers of these transcription factors (over 50 paralogs per genome). However, phylogenetic analysis reveals that these expansions happened independently in the insect and vertebrate lineages, after they separated from their common ancestor (Aravind and Koonin, 1999a): the vertebrate POZ domain transcription factors group with each other to the exclusion of the insect versions, and likewise the insect versions group with themselves to the exclusion of the vertebrate forms (see Fig. 4). Development of an algorithm to systematically detect LSEs and their case-by-case analysis across the eukaryotes revealed that they are one of the most important forces that shape the contours of proteomes (Lespinet et al., 2002). Between 20% (e.g., in yeasts) and 80% (e.g., in plants and vertebrates) of a eukaryotic proteome consists of lineage-specifically expanded families. Further, these LSEs account for nearly one half of all the paralogous clusters of proteins encoded in a eukaryotic proteome (Lespinet et al., 2002). Categorization of the LSEs suggests that they are particularly prevalent in certain cellular functions. These include proteins involved in responses to stress, pathogens/parasites, and xenobiotics, proteins placed at the termini of signaling cascades (e.g., E3s in the ubiquitin-based pathways and MAP kinases of the phosphorylation cascades), transcription factors, and chemoreceptors (Lespinet et al., 2002). This pattern of function-wise enrichment of the LSEs is consistently retained across eukaryotic phylogeny and has thus become a powerful tool for predicting functions among uncharacterized proteins when combined with sequence analysis.
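    The tree-based definition of an LSE given above lends itself to a simple computational reading, sketched here in Python. This is not the published detection algorithm, whose details are not given in the text; it is a minimal illustration that walks a rooted gene tree and reports the largest clades in which every leaf comes from a single lineage. The nested-tuple tree format, the `lineage|gene` leaf naming, the `min_size` threshold, and the toy tree are all assumptions made for the example.

```python
def lineage_specific_clades(tree, min_size=3):
    """Return maximal single-lineage clades as sorted lists of leaf names.

    `tree` is a rooted tree written as nested tuples; leaves are strings of
    the (hypothetical) form "lineage|gene".
    """
    found = []

    def visit(node):
        # Returns (leaf names, lineage), with lineage None for mixed clades.
        if isinstance(node, str):
            return [node], node.split("|")[0]
        results = [visit(child) for child in node]
        leaves = [leaf for child_leaves, _ in results for leaf in child_leaves]
        lineages = {lineage for _, lineage in results}
        if len(lineages) == 1 and None not in lineages:
            return leaves, lineages.pop()
        # Mixed clade: any single-lineage child cannot be extended further.
        for child_leaves, lineage in results:
            if lineage is not None and len(child_leaves) >= min_size:
                found.append(sorted(child_leaves))
        return leaves, None

    leaves, lineage = visit(tree)
    if lineage is not None and len(leaves) >= min_size:
        found.append(sorted(leaves))
    return found


# Toy gene tree (made up): insect and vertebrate paralogs each group together.
toy_tree = (
    (("insect|a", "insect|b"), ("insect|c", "insect|d")),
    (("vertebrate|a", "vertebrate|b"), "vertebrate|c"),
    "nematode|a",
)

expansions = lineage_specific_clades(toy_tree)
for clade in expansions:
    print(len(clade), "paralogs:", ", ".join(clade))
```

    A practical analysis would also have to account for tree uncertainty, the choice of the reference sister lineage, and incomplete genomes, none of which this sketch addresses.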

    One of the most important aspects of LSEs with respect to the evolution of regulatory networks is the preponderance of this phenomenon among transcription factor families (Lander et al., 2001; Lespinet et al., 2002; Coulson and Ouzounis, 2003; Iyer et al., 2008) (see Fig. 4). The most prevalent family of transcription factors in a given proteome differs widely across eukaryotic lineages, including within different animal lineages (see Fig. 4): nuclear hormone receptor-type zinc finger transcription factors are the most prevalent transcription factors among nematodes, the KRAB-type C2H2 zinc fingers in vertebrates, and AP2, VP1, and MYB domain transcription factors in angiosperm plants (Lander et al., 2001; Lespinet et al., 2002; Coulson and Ouzounis, 2003; Iyer et al., 2008). Developmental genetics in model systems has shown that many key developmental processes are regulated by transcription factors belonging to these LSEs, rather than those inherited relatively unchanged from the last common ancestor of all animal or plant lineages. A striking case of this is the POZ domain transcription factors in Drosophila, which as noted above belong to an insect-specific LSE. Gene products of this expansion regulate a diverse range of developmental decisions at the transcription level in contexts such as axonal path-finding (Lola) (Giniger et al., 1994), morphological diversity of neuronal dendrites (abrupt) (Ryner et al., 1996; Kim et al., 2006), specification of neurons that determine sexual orientation (Fruitless) (Ito et al., 1996), specification of cell-fates in the eye (Tramtrack69) (Lai and Li, 1999), the development of distinctive external genitalia (ken and barbie) (Lukacsovich et al., 2003), epithelial morphogenesis (ribbon) (Shim et al., 2001), and early oogenesis (Pipsqueak) (Horowitz and Berg, 1996), to name just a few representatives. This colonization of a large number of disparate functions after the emergence of an LSE suggests that LSEs might have a particularly important role in the diversification of morphology in multicellular forms. The occurrence of such LSEs also implies that Tnets undergo massive reorganization and rewiring with the emergence of new lineages (Babu et al., 2006). This is also consistent with studies on Tnets, which indicate that hubs are routinely displaced by new transcription factors or that hubs are lost and new hubs emerge in their place. This plasticity of Tnets is potentially attributable to their innate robustness due to the presence of internal backup, which allows the replacement of old transcription factors by new ones emerging from an LSE (Balaji et al., 2006b). Further, representatives of the LSE of POZ domain transcription factors in Drosophila are typically positioned downstream of master regulators of anteroposterior patterning (e.g., the Hox proteins), or function with the chromatin proteins (e.g., polycomb and trithorax group proteins) involved in maintaining the boundaries of anteroposterior gene expression (Ghosh et al., 2001; Pagans et al., 2002; Zhang et al., 2006; Zhu et al., 2006). Hence, transcription factors generated by LSEs usually appear to be fitted into terminal locations of the regulatory network hierarchy, which supports their role in the generation of morphological diversity within the framework of an otherwise conserved generic anteroposterior body plan.

    Similar patterns are observed in the case of certain proteins undergoing LSEs in the Ub/Ubl conjugation network (U-net) (Venancio et al., 2009) and the protein phosphorylation networks (Ptacek et al., 2005; Fiedler et al., 2009). In the U-net the most prominent LSEs are concentrated among components of E3 enzymes of the pathway, such as RING, Ub-box, and F-box domains (see Fig. 4) (Lespinet et al., 2002). The E3s are the terminal enzymes in the conjugation cascade which finally transfer the Ub to specific substrates (Hochstrasser, 2009). In contrast, E1 and E2 enzymes tend to show no LSEs and are largely vertically conserved across large sections of the eukaryotic phylogenetic tree (Anantharaman et al., 2007a). This indicates that the E3 LSEs in the U-net aid in directing a conserved stem pathway of E1s and E2s toward a diversity of substrates that differ from lineage to lineage. In the case of the F-boxes, LSEs might have a specific role in targeting various pathogen proteins for Ub-mediated degradation (Thomas, 2006). However, the participation of LSEs of RING domains in developmental regulation is supported by studies in both animals and plants (Serrano et al., 2006). Similarly, in the case of phosphorylation networks, LSEs are noted among kinases of the calcium-dependent and MAP kinase families in plants, casein-kinase and soluble tyrosine kinase families in nematodes, and various receptor kinases in various plant and animal lineages (Lespinet et al., 2002) (see Fig. 4). Protein phosphatases also show LSEs in angiosperm plants and to a certain extent in vertebrates (Lespinet et al., 2002; Popescu et al., 2009). Many of these expanded families of kinases again belong to the termini of signaling cascades: MAP kinases phosphorylate specific substrates, whereas the receptor kinases are usually extreme upstream responders to primary extracellular signals (Lespinet et al., 2002; Popescu et al., 2009). Thus, these kinase LSEs are also likely to have played a major role in harnessing a conserved core pathway to various extrinsic signals or targeting it to different sets of substrates in different lineages (see Fig. 4).

    The prevalence of LSEs in specific transcription factors and termini of signaling cascades, especially in multicellular forms, has been a major factor in the rewiring of parts of regulatory networks. In contrast to this emphasis on lineage-specific adaptations, conventional evo-devo studies have repeatedly shown conserved regulatory cores in networks. These might be shared across Metazoa, and are required for anteroposterior and dorsoventral axis patterning, specification of tissues developing from different germ layers, asymmetric cell-division, and signaling between different germ layers (Raff, 1996; Arthur, 2002; Davidson and Erwin, 2006; De Robertis, 2008; Kopan and Ilagan, 2009). Likewise, in plants certain conserved regulatory networks have been implicated in developmental pathways for morphological elements, such as flowers and leaves, and differentiation of tissues (Krizek and Fletcher, 2005; Reinhardt, 2005; Tsukaya, 2006; Endress and Doyle, 2007). Hence, by combining insights from these evo-devo studies and those offered by LSEs discerned from comparative genomics, we may conclude that: (1) core modules of regulatory networks specifying general morphological landmarks of a major lineage of multicellular organisms (e.g., plants or animals) are indeed more widely conserved; (2) however, beyond these generic modules, the regulatory networks are extensively refashioned due to the generation of new nodes by LSEs, thereby allowing adaptive radiations via lineage-specific alterations of patterning and biochemistry of specific tissues.

    Gene Loss and Horizontal Transfer

    Another force which comparative genomics has revealed to play a major role in the reorganization of regulatory networks is gene loss. Studies on patterns of gene loss suggest that beyond a general background of sporadic gene losses there are discernible patterns of concerted loss in which functionally connected genes tend to be lost as a unit (Aravind et al., 2000; Edvardsen et al., 2005; Miller et al., 2005). This latter form of gene loss is a reflection of the modular network architecture. Loss of a key gene can render an entire module of a regulatory network dysfunctional. By virtue of the scale-free network architecture, other genes in the module typically might not have many extraneous functional connections, and are effectively superfluous as the loss of the key gene has already attenuated the role of the module. Hence, there is a good chance that the other genes in the module are also lost subsequently. Massive losses of this type are observed in fungi, particularly in forms like yeasts (Liti and Louis, 2005; Wapinski et al., 2007). Given that multicellularity was already present in the ancestral fungus (James et al., 2006), such gene loss is likely to have been a major player in the regression of yeasts to a more unicellular condition. Similar losses are also seen across Metazoa (Edvardsen et al., 2005; Miller et al., 2005); in extreme cases, such losses appear to similarly result in regression of the multicellular animal form to a more unicellular condition, as seen in Myxozoa (Kent et al., 2001). In other cases, gene loss might be correlated with different degrees of morphological simplification. The availability of genomic sequences of basal animals, such as cnidarians, Trichoplax, and sponges, shows that nematodes have lost several modules of regulatory subnetworks, such as the hedgehog, NFkB, and certain apoptotic signaling modules (Miller et al., 2007; Zmasek et al., 2007; Burglin, 2008; Srivastava et al., 2008). These losses might have a role in the absence of prominent lateral appendages and developed photoreceptors in nematodes, such as C. elegans.

    A regulatory system that is often subject to gene loss is the RNA-interference (RNAi) network, which performs the key role of negatively regulating genes at the post-transcriptional level. This system is highly developed in plants, and certain fungi and animals, but has been repeatedly lost in many of the intervening sister lineages (Aravind et al., 2000; Anantharaman et al., 2002; Cerutti and Casas-Mollano, 2006). For example, it has been entirely lost in the yeast S. cerevisiae, but is present in a largely intact form in other fungi such as Schizosaccharomyces pombe, Neurospora crassa, and mushrooms. In animals, this system shows an interesting pattern of multiple partial losses. The nematodes have a complete RNAi network comprising the small RNA-processing system (e.g., Dicer and Drosha), the mRNA targeting nucleases of the Argonaute family, and the propagators of siRNAs via RNA-dependent replication (the RNA-dependent RNA polymerase/RdRP) (Aravind et al., 2000; Anantharaman et al., 2002; Cerutti and Casas-Mollano, 2006). Some insects and vertebrates lack the siRNA replicating part of this network encompassing the RdRP system (Anantharaman et al., 2002; Cerutti and Casas-Mollano, 2006). Genomics studies have shown, however, that the RdRP is present in the genome of the basal chordate amphioxus (Branchiostoma), suggesting that there have been at least two independent losses of this subnetwork, one in the line leading to vertebrates and one in insects. A point emphasized by losses in the RNAi network is that regulatory networks might have an intrinsic robustness due to indirect backup from other functionally comparable systems. In the case of the RNAi system, chromatin-level gene silencing and post-translational protein degradation systems perform biochemically unrelated actions that are nevertheless effectively functionally related, i.e., they modulate the levels of gene products (Lorentzen and Conti, 2006; Zofall and Grewal, 2006). The availability of such a backup enables the loss of certain systems depending on the evolutionary forces acting on an organism. These evolutionary forces usually derive from organismal lifestyles that have been termed K-selected or r-selected in evolutionary theory (Stearns, 1976). Typically, organisms that adopt a lifestyle characterized by rapid growth rates and reproductive cycles tend to evolve more streamlined regulatory systems through gene loss (r-selection). On the other hand, organisms that are strong competitors in heavily populated niches and that invest heavily in fewer offspring (K-selection) tend to have more complex regulatory systems with less gene loss. In the latter case, the retention of more regulatory systems potentially allows them to compete better by fine-tuning the regulation of their genes.

    Right from its earliest days, comparative genomics suggested that lateral gene transfer might play an important role in shaping the composition of genes in an organism (Gogarten et al., 2002; Boucher et al., 2003). We have already discussed how the lateral transfer of protein domains involved in apoptotic networks and adhesion might have played a role in the evolution of multicellularity in different lineages. Throughout eukaryotic evolution, lateral transfers, especially from bacteria, have played a role in the emergence of novel regulatory mechanisms. Such transfers have played important roles in the evolution of cell-cell communication and transcription regulation in multicellular organisms (Anantharaman et al., 2007a). For example, key receptors in animals, such as the acetylcholine receptor-type ligand-gated ion channels and the nitric oxide receptor, have their ultimate origins in receptors laterally transferred from bacteria to eukaryotes at different points prior to the radiation of metazoans (Tasneem et al., 2005). Similarly, NMDA and metabotropic glutamate receptors have ligand-binding domains evolutionarily derived from small-molecule sensors in bacterial signaling systems (Tasneem et al., 2005). In plants, the DNA-binding domains of two notable transcription factor families that include several developmental regulators, namely the Vp1 and AP2 families, appear to have been derived from DNA-binding domains found in bacteria (Babu et al., 2006; Iyer et al., 2008). Most bacterial versions are encoded by bacterial transposons or mobile restriction-modification systems. Thus, it is quite likely that the DNA-binding domains of these transcription factors entered the plant lineage early in evolution via transfers from bacteria, probably through the medium of mobile elements (Babu et al., 2006; Iyer et al., 2008). On a related note, across eukaryotes, mobile elements such as transposons have contributed greatly to the evolution of transcription factors (Babu et al., 2006). Transposases of most transposons contain sequence-specific DNA-binding domains that can provide the precursor for the innovation of a novel DNA-binding domain for a transcription factor (Babu et al., 2006; Lin et al., 2007; Iyer et al., 2008). Thus, catalytically inactive transposase remnants, which retain their DNA-binding domains, have been the progenitors of several specific transcription factors, including certain major developmental regulators (Babu et al., 2006; Lin et al., 2007; Iyer et al., 2008). Such lateral transfers or remnants of transposases can deliver entirely functional pre-made regulatory molecules. During the emergence and subsequent elaboration of multicellularity, such transfers appear to have provided important raw material for the origin of new regulatory proteins, which were incorporated at various points in pre-existing regulatory networks.

    The Conundrum of the Dissociation Between Morphology and Proteomic Complexity: Nonprotein Coding Segments of Genomes

    Within multicellular lineages, organizational complexity has developed along very different lines. In some cases there have been regressions to more unicellular-like states (see above), whereas in other cases there has been an enormous increase in complexity in terms of morphology and tissue types. Such trends are visible in the course of the evolution of plant, animal, and fungal lineages, but have been best studied in the metazoans. Genome sequences of early-branching metazoan lineages, particularly the cnidarians, have brought forth a conundrum. Cnidarians are accepted to be organizationally less complex than vertebrates and arthropods (probably even nematodes) from the viewpoint of tissue differentiation and morphology (Schierwater et al., 2009). However, analysis of cnidarian proteomes suggests that they possess all the major regulatory networks seen in vertebrates (Putnam et al., 2007; Zmasek et al., 2007; Burglin, 2008). In fact, certain subsets of these have been lost in insects (aspects of interferon-signaling-type pathways such as the IRF family transcription factors) and nematodes (e.g., certain parts of

    ment or a back-up for the more conserved protein-based regulatory networks involved in silencing, rather than being a novel ensemble of control steps.

    Finally, there is genuine noncoding DNA which exerts its regulatory influence by providing binding sites for specific transcription factors as well as chromatin proteins. The former set of binding sites includes conventional promoter elements and more distant enhancer and silencer elements (Weinstock, 2007; Bonn and Furlong, 2008; Busser et al., 2008; Zinzen and Furlong, 2008). The latter set includes elements such as insulators, boundary elements, and sequences bound by trithorax and polycomb group proteins, all of which play a major role in the higher order dynamics of chromatin structure, and consequently in gene expression (Valenzuela and Kamakaka, 2006; Breiling et al., 2007). A simple comparison of a fungal genome like that of yeast with that of an animal like Drosophila reveals that regulatory elements are close to the gene and typically simple in the former, but can be enormously complex in the latter (Harbison et al., 2004; Bonn and Furlong, 2008). The evidence for this increased complexity of transcription regulatory elements comes from many direct studies on such elements in developmental genes in various animals, as well as indirect evidence from genetic polymorphism-phenotype association studies in humans and other animals (Weinstock, 2007; Bonn and Furlong, 2008; Busser et al., 2008; Zinzen and Furlong, 2008; Verlaan et al., 2009). In humans, such association studies show that a large number of single nucleotide polymorphisms with extensive phenotypic consequences do not affect the protein-coding parts of genes. Rather, they affect noncoding regions, often at great distances from the coding segment, indicating the presence of a rich landscape of regulatory sites that have profound consequences on gene expression (Verlaan et al., 2009). Indeed, the DNA-binding domains of various lineage-specifically expanded TFs in multicellular forms are often very similar and unlikely to have very distinct binding specificities. However, they do differ widely in expression patterns, suggesting that they have notable differences in their regulatory elements and that this differential expression is a major aspect of their functional differentiation (Hoey and Levine, 1988; Babu et al., 2006).

    The importance of cis regulatory elements as major players in organizational complexity has been highlighted by classical case studies such as those on the even-skipped gene (eve) in Drosophila and the Endo16 gene in the sea urchin, Strongylocentrotus purpuratus (Howard and Davidson, 2004; Istrail and Davidson, 2005). The former gene is expressed early in embryonic development in seven circumferential stripes that play a role in defining the territories of the future segments in the Drosophila body plan (Stanojevic et al., 1989, 1991; Veitia, 2008). This expression pattern in the form of seven stripes is exquisitely orchestrated via the action of five cis regulatory elements. Of these, three elements control the emergence of one eve stripe each, whereas the remaining two control two stripes each. One lesson from these eve elements is that the precise spatial emergence of a pattern is controlled via both the positive and negative regulatory actions of specific transcription factors binding their target sites on these elements. For example, in the case of stripe #2, the transcriptional activators Bicoid and Hunchback activate the expression of eve (Hoey and Levine, 1988; Stanojevic et al., 1989, 1991; Howard and Davidson, 2004). However, these transcription factors themselves are broadly distributed, and the precise spatial restriction of eve expression is brought about by two negative regulators, Giant and Krüppel, that respectively limit the anterior and posterior expression boundaries of eve (Stanojevic et al., 1991). Based on these observations, specific transcription factor binding sites in the regulatory elements controlling the expression of the eve gene have been compared to AND and NOT logical operators at the DNA level (Howard and Davidson, 2004).

    It is also clear that different noncoding regulatory elements could exert influence at many different levels in the network hierarchy (Weinstock, 2007; Bonn and Furlong, 2008; Busser et al., 2008; Zinzen and Furlong, 2008). The core promoter elements drive the basic expression of a gene. The cis regulatory elements coordinate with the core promoter to establish a spatial or temporal expression pattern by acting as logic gates that parse the ambient transcription factor concentrations. However, this initial expression pattern could be lost over subsequent cell cycles. The maintenance of these patterns over multiple cell cycles is then mediated by regulatory elements that recruit chromatin-level modifiers (Valenzuela and Kamakaka, 2006; Breiling et al., 2007). These protein complexes fix a certain expression state by maintaining either an open chromatin configuration (usually trithorax group proteins) or a condensed chromatin state (polycomb group proteins). Proteins binding the insulator elements prevent propagation of these chromatin states from one chromosomal region to another (Bushey et al., 2008). Without changes to the actual protein-coding sequence of a gene, alterations to regulatory regions can modify the position of a gene in the regulatory network by changing its linkages to upstream transcription factors (Busser et al., 2008; Veitia, 2008). Thus, the spatial and temporal location in which a protein performs its function can be drastically altered, potentially resulting in changes to morphological complexity while using the same underlying protein complement. Both the extensive LSEs of transcription factors and the increased domain architectural complexity of chromatin proteins seen in multicellular forms could be seen as a pre-adaptation with which the diversifying noncoding regulatory elements interacted to produce organizational diversity in certain lineages (Copley, 2008).
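    The comparison of the eve stripe 2 element to AND and NOT operators can be made concrete with a small sketch. The following is only an illustration under strong assumptions: regulator concentrations are reduced to on/off values, both activators are assumed to be required together, and the expression domains along the ten-position axis are invented for the example rather than taken from measurements. The names Nucleus, eve_stripe2_on, and toy_embryo are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Nucleus:
    """On/off regulator state in one nucleus (a deliberate simplification)."""
    bicoid: bool
    hunchback: bool
    giant: bool
    kruppel: bool


def eve_stripe2_on(n: Nucleus) -> bool:
    """Return True when the toy stripe 2 element would drive eve expression."""
    activated = n.bicoid and n.hunchback   # AND: both activators assumed necessary
    repressed = n.giant or n.kruppel       # either repressor is sufficient
    return activated and not repressed     # NOT: repressors veto the activators


def toy_embryo(n_positions: int = 10) -> list[Nucleus]:
    """Hypothetical anterior (0) to posterior (n-1) axis with invented domains."""
    return [
        Nucleus(
            bicoid=pos < 8,      # broad anterior activator domain (assumed)
            hunchback=pos < 7,   # broad anterior activator domain (assumed)
            giant=pos < 3,       # anterior repressor domain (assumed)
            kruppel=pos >= 5,    # posterior repressor domain (assumed)
        )
        for pos in range(n_positions)
    ]


# Positions where the toy element is active: a narrow band between the
# two repressor domains, inside the broad activator domains.
stripe = [pos for pos, n in enumerate(toy_embryo()) if eve_stripe2_on(n)]
print(stripe)  # [3, 4]
```

    In this toy setting the broadly distributed activators alone would switch the gene on over most of the axis; the narrow stripe appears only because the two repressors trim the anterior and posterior boundaries.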


    THE MAKING OF SUBNETWORKS: A CASE STUDY

    The Notch System

    In the final part of this review, we take up a well-known metazoan regulatory subnetwork as a case study. We discuss the provenance of its various components and how different evolutionary forces and network organization principles have acted together in assembling this subnetwork (see Fig. 5). The Notch subnetwork is conserved throughout Metazoa and appears to be a shared derived character that sets metazoans apart from their closest sister groups such as the choanoflagellate, Monosiga (King et al., 2008). Components of the Notch subnetwork have been worked out in considerable detail across different metazoan models, such as human/mouse, Drosophila, and Caenorhabditis (Kopan and Ilagan, 2009), and it interfaces with other signaling subnetworks, such as epidermal growth factor receptor (EGFR) and Wnt, within the larger regulatory network involved in tissue differentiation (Sahlgren and Lendahl, 2006). These studies have shown that the Notch subnetwork by itself is essentially a selector system that helps in choosing the execution of a particular subnetwork from among different preexisting subnetworks in a cell. This action of the Notch system thus helps in choosing between activation and suppression of cell proliferation, cell death, and survival, and between alternative differentiated states (Kopan and Ilagan, 2009). This last function is especially important in asymmetric cell divisions, wherein the Notch system helps the daughter cells adopt distinct differentiated states; for example, in ectodermal differentiation Notch helps in the asymmetric divisions accompanying the separation of epithelial and neural fates (Guo et al., 1996).

    Figure 5. A graphical representation of the Notch system. The Notch ligands may be membrane-bound or soluble proteins. Upon ligand binding, an intracellular part of the protein (NICD) is released by proteolytic processing, which is shown separately. Other regulatory processes impinging on the Notch subnetwork are shown in boxes to indicate their cofunctional linkage. The network is festooned with labels indicating the evolutionary history of different components.

    The Notch system is centered on the receptor-ligand pair of Notch and one of its many related ligands, prototyped by Drosophila Serrate or Delta (Dsouza et al., 2008; Kopan and Ilagan, 2009). These ligands are also typically membrane-bound proteins, thus making Notch signaling dependent on the mechanical force acting on the Notch protein due to direct cell-cell interactions (Kopan and Ilagan, 2009). There are certain other soluble Notch ligands that either act as negative regulators or cooperate with a membrane-bound version to increase the strength of the ligand-Notch interaction. When associated with its ligand, the Notch protein is processed by multiple membrane-associated cleavage steps mediated by the ADAM family of metalloproteases (Bray, 2006) and presenilins (γ-secretase complex) (Selkoe and Wolfe, 2007) that then release an intracellular fragment of the protein, which translocates to the nucleus (Kopan and Ilagan, 2009). In the nucleus it interacts with a transcription factor of the CSL (CBF1/RBPjκ/Su(H)/Lag-1) family. By default, the CSL transcription factor associates with a corepressor complex to negatively regulate gene expression and also recruits the histone chaperone ASF1 to promote condensed chromatin that further modulates gene expression (Kovall, 2008). Upon interaction with the Notch intracellular fragment, the CSL transcription factor switches its association from the corepressor to recruit the coactivator MAM; together, these recruit chromatin-modifying and remodeling factors to promote transcriptional activation of target genes (Kopan and Ilagan, 2009). Among the targets of the CSL transcription factor conserved throughout Metazoa are the Enhancer of split (En(Spl)) transcription factors with bHLH DNA-binding domains, which initiate a further transcriptional cascade by binding their target sites (Davis and Turner, 2001; Neves and Priess, 2005). This core pathway is dependent on a number of other regulatory inputs (Lai et al., 2000; Kopan and Ilagan, 2009). One of these is O-linked glycosylation of the EGF domains found in the Notch extracellular region by the fucosyltransferases, OFUT1 and Fringe, and the glucosyltransferase, Rumi (Haines and Irvine, 2003; Stanley, 2007; Kopan and Ilagan, 2009). These modifications alter the strength of the ligand-Notch interaction and have an effect on the downstream signal flux through the Notch pathway. The Notch system also intersects with the ubiquitin network in many ways, which results in altered stability or function of different components (Nichols et al., 2007). Ligands of Notch undergo endocytosis due to monoubiquitination mediated by the E3 enzymes, Neuralized and Mindbomb, and this process, through an as yet unclear mechanism, produces a more active form of the ligand (Kopan and Ilagan, 2009).

    The Notch protein is also subject to ubiquitination by E3s, such as Deltex, Itch, Cbl, and Nedd4, which might target it for lysosomal degradation or recycling (Kopan and Ilagan, 2009). This aspect of regulation is also central to the cross-talk between Notch and other signaling subnetworks such as EGFR. In Drosophila, the protein Phyllopod, activated by EGFR signaling, is an adaptor for the E3 Ebi, which helps in directing Notch to the early endocytotic vesicles, thereby favoring its lysosomal degradation (Nagaraj and Banerjee, 2009).
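    The core relay described earlier in this section (ligand binding, two proteolytic steps releasing the intracellular fragment, and the CSL switch from corepressor to coactivator) can be summarized as a small decision function. This is a qualitative sketch only, assuming each step is all-or-nothing; the modulatory inputs from glycosylation and ubiquitination are deliberately left out, and the names TargetState, nicd_released, and csl_target_state are hypothetical.

```python
from enum import Enum


class TargetState(Enum):
    """Qualitative state of a CSL target gene in this toy model."""
    REPRESSED = "CSL bound to corepressor complex"
    ACTIVATED = "CSL bound to Notch intracellular fragment and MAM"


def nicd_released(ligand_bound: bool,
                  adam_cleavage: bool,
                  gamma_secretase_cleavage: bool) -> bool:
    """The intracellular fragment is freed only if every upstream step occurs."""
    return ligand_bound and adam_cleavage and gamma_secretase_cleavage


def csl_target_state(ligand_bound: bool,
                     adam_cleavage: bool = True,
                     gamma_secretase_cleavage: bool = True) -> TargetState:
    """Default is repression; the released fragment flips CSL to activation."""
    if nicd_released(ligand_bound, adam_cleavage, gamma_secretase_cleavage):
        return TargetState.ACTIVATED
    return TargetState.REPRESSED


print(csl_target_state(ligand_bound=False).name)
print(csl_target_state(ligand_bound=True).name)
print(csl_target_state(ligand_bound=True, gamma_secretase_cleavage=False).name)
```

    The point of the sketch is the default: without a ligand-triggered cleavage cascade the same transcription factor holds its targets off, so a single receptor event selects between two preexisting transcriptional outcomes.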

    Origin and Diversification of the Notch System

    Despite the depth of our understanding of components of the Notch system and their operation in different metazoans, their origins and evolutionary diversification have to be determined to understand how the system has come together to play such key roles in differentiation. The principal innovation in the emergence of this system occurred at the base of the metazoan tree in the form of the Notch receptor and its ligands (Kasbauer et al., 2007). The extracellular domains of both Notch and its ligands are composed primarily of EGF domains. EGF domain proteins had already extensively proliferated even before the emergence of metazoans, in the common ancestor shared with their sister group, the choanoflagellates (King et al., 2008). In choanoflagellates we already encounter proteins related to Notch, which have a gigantic extracellular region with numerous EGF repeats; however, they differ from Notch in lacking the distinctive intracellular ankyrin repeat modules (e.g., MONBRDRAFT_27644, gi:167527456). Ankyrin repeats, like those present in Notch, are also found fused to DNA-binding domains in transcription factors, such as the NFκB family and the SPT23 family, which share a common evolutionary origin with the CSL family of transcription factors (Hoppe et al., 2000; Iyer et al., 2008). These transcription factor families are unified by the presence of a specialized immunoglobulin fold domain, the TIG domain, which interacts with the ankyrin repeats (Aravind and Koonin, 1999b; Hoppe et al., 2000; Kovall, 2007). This suggests an ancient functional association between the TIG domain transcription factors and ankyrin repeats. Comparative genomics reveals that CSL transcription factors were already present in the common ancestor of animals and fungi, long before the emergence of Notch and its ligands (Aravind and Subramanian, 1999). The fusion of ankyrin repeats to the cytoplasmic tail of a large EGF repeat protein that had emerged in the animal lineage prior to the radiation of metazoans (e.g., the above version found in choanoflagellates) appears to have given rise to a new signaling receptor that could now communicate with preexisting nuclear TIG domain transcription factors such as CSL. This observation indicates that the Notch subnetwork was built stepwise in evolution by superimposition of a newly derived membrane receptor onto an already existent transcription subnetwork regulated by a CSL transcription factor.

    Linkage to several other subsystems with very distinct origins appears to have played an additional role in transforming this core into the recognizable Notch system. The ubiquitin-network proteins involved in endocytotic processes controlling the Notch system, in part, represent the adaptation of an older eukaryotic protein trafficking and degradation system to regulate this signaling system (Venancio et al., 2009). Presenilins and the associated γ-secretase complex form an ancient membrane-protein processing complex that was inherited by the eukaryotes from their archaeal ancestors (VA and LA, unpublished). In contrast, the ADAM metalloproteases underwent a major expansion in the animal lineage in relation to the extracellular matrix that emerged along with multicellularity (Andreini et al., 2005). Thus, proteolytic systems with very distinct origins appear to have come together in the ancestral animal to mediate the processing of the Notch receptor (see Fig. 5). Similarly, the emergence of multicellularity in animals appears to have been accompanied by the expansion of different O-linked glycosylation enzymes that modified serines and threonines in adhesion-related domains found in cell-surface proteins (Anantharaman et al., 2007b). This expansion might have primarily been an adaptation for regulating cell adhesion through glycosylation, but the presence of this system also allowed it to be adapted as a regulatory influence on the incipient Notch system. Taken together, these observations suggest that major parts of the Notch system either (1) were already available modules minimally adapted in biochemical terms for a new role or (2) emerged as part of a more general series of molecular adaptations that accompanied the origin of multicellularity (see Fig. 5). However, there were components of the Notch system that appear to represent innovations specific to Metazoa: the MAM domain of the MAM coactivators and the specialized SPRY domain found only in the Neuralized family E3s that are specific to this system are the two main instances. Even in these cases, the innovations were not too drastic, because the MAM domain is merely a long bent α-helix and the domains of Neuralized have been derived from preexisting SPRY domains through rapid divergence (Kovall, 2007; Kuang et al., 2009).

    The two prominent genome-shaping forces of lineage-specific expansion and gene loss also play an important role in the history of the Notch pathway in metazoan evolution (see Fig. 5). Gene loss and degradation are observed in the nematode lineage. The coactivator of the CSL transcription factor, MAM, is apparently lost in Caenorhabditis. Likewise, the Notch-ligand endocytosis regulator, Mindbomb, appears to have lost the N-terminal Herc2 and ZZ domains found in other metazoan lineages (Kasbauer et al., 2007) (LA, unpublished). LSEs have impacted the evolution of many nodes in the Notch subnetwork at various points in its organization. A notable LSE of the Notch ligands is seen in Caenorhabditis, with at least 15 distinct ligands (Kopan and Ilagan, 2009). A smaller LSE of Notch ligands and of Notch itself is observed in vertebrates (Kopan and Ilagan, 2009). This LSE of ligands has enabled the transmission of signals encompassing a whole range of relative strengths via the same receptors. In Drosophila, there is a lineage-specific expansion of a remarkable Zn-finger domain of the treble-clef fold, the C4DM domain (also called ZAD), with four conserved cysteines (Lander et al., 2001). Members of this family of domains appear to be adaptors that link a variety of targets to ubiquitination by E3s (Jauch et al., 2003). Phyllopod, which connects the EGFR and Notch networks (Nagaraj and Banerjee, 2009), is a member of this LSE. Thus, Phyllopod offers an example of how the emergence of a neomorphic protein through an LSE has resulted in the strengthening of the linkage between two pathways. One of the conserved targets of the Notch pathway, the En(Spl) transcription factors, also shows striking independent LSEs in various lineages. In Drosophila melanogaster, the LSE in the En(Spl) family has resulted in seven distinct paralogs (Knust et al., 1992). A similar LSE is seen in Caenorhabditis elegans, wherein the En(Spl) cognate has undergone an LSE to spawn six distinct paralogs, many of which contain two bHLH DNA-binding domains (Neves and Priess, 2005). These paralogs are not found in Brugia or even other Caenorhabditis species, suggesting that this is a relatively recent LSE followed by rapid sequence divergence (LA, unpublished). This LSE of the En(Spl) genes has probably played a role in diversifying the targets that are under the control of the Notch signaling system. In part, this diversification of the function of the En(Spl) transcription factors has occurred not via differences in their DNA-binding properties but via differences in their cis regulatory elements (Maeder et al., 2009).

    In conclusion, the evolutionary dissection of the Notch pathway illustrates some of the principles derived from regulatory network structure. Firstly, it emphasizes the modularity of the networks and how the emergence of a key linking element with a high betweenness (in this case Notch) can bring together disparate modules into a single network. The biochemical basis for this property of Notch is the fact that it evolved through fusion of two distinct sets of domains, which respectively mediate intracellular and extracellular interactions. Furthermore, this analysis shows how genes, or parts thereof, that are considered to be critical in one system can be lost in another; when viewed in light of the resistance of regulatory networks to failure (i.e., random node loss), such losses are not unexpected. The example also illustrates how LSEs act at various levels to allow a core module of a network to receive inputs and deliver outputs that greatly differ in their consequences, or to be combined with signals from other subnetworks (Nagaraj and Banerjee, 2009). Finally, these LSEs, combined with the diversification of cis regulatory regions, illustrate how a conserved subnetwork like Notch has been used in generating unique morphologies in different animal lineages.
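    The notion of a linking element with high betweenness can be illustrated on a toy graph. The sketch below is a hypothetical example, not a reconstruction of the actual Notch subnetwork: two small, fully connected modules (labelled E1 to E3 and N1 to N3, standing in for extracellular and nuclear components) are joined only through a single node labelled Notch. Betweenness is computed with a plain implementation of Brandes' algorithm for unweighted, undirected graphs; the function names toy_network and betweenness are assumptions made for this example.

```python
from collections import deque


def toy_network() -> dict[str, set[str]]:
    """Two invented modules joined only through a single linker node."""
    extracellular = ["E1", "E2", "E3"]
    nuclear = ["N1", "N2", "N3"]
    adj: dict[str, set[str]] = {v: set() for v in extracellular + nuclear + ["Notch"]}

    def link(a: str, b: str) -> None:
        adj[a].add(b)
        adj[b].add(a)

    for module in (extracellular, nuclear):
        for i, a in enumerate(module):
            for b in module[i + 1:]:
                link(a, b)          # dense links inside each module
            link(a, "Notch")        # the linker touches every module member
    return adj


def betweenness(adj: dict[str, set[str]]) -> dict[str, float]:
    """Unnormalized shortest-path betweenness (Brandes), undirected graph."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        order = []                              # nodes in order of BFS discovery
        preds = {v: [] for v in adj}            # predecessors on shortest paths
        sigma = {v: 0 for v in adj}             # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:                 # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:      # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):               # accumulate dependencies
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2.0 for v, b in bc.items()}  # each pair was counted twice


scores = betweenness(toy_network())
for node, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(node, score)
```

    In this toy graph every shortest path between the two modules passes through the linker, so it carries all nine cross-module pairs while every other node scores zero. Real regulatory networks are far less clean, but the same measure can be used to rank candidate linking elements.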

    CONCLUDING REMARKS

    The triumphs of the genomic revolution and the consequent impact on the way biology is done have allowed us for the first time to apprehend the architecture and functions of regulatory networks. When combined with evolutionary information, we obtain a remarkable view of how biological systems have actually been shaped by various forces. This information has considerable implications for how processes such as development and differentiation in multicellular forms are addressed. Primarily, it allows one to see these processes not in isolation, but in the context of both their histories and place in the overall network of molecular interactions. This is important because all biological systems, including regulatory networks, are not products of engineering but of the contingencies of historical processes (Raff, 1996; Gould, 2002; Kirschner and Gerhart, 2005; De Robertis, 2008). It has been typical to take an engineering approach to the analysis of regulatory networks in the past, due to lack of evolutionary information. In the postgenomic era this need not be the case (Barabasi and Oltvai, 2004; Balaji et al., 2006a; Busser et al., 2008). Hence, we can transcend the view of regulatory networks restricted to particular model organisms and instead view them as evolving entities across the evolutionary tree. In the first place, this approach helps in objectively tackling features of the model organisms themselves. In the example of the Notch system, it was observed that Phyllopod is a part of an LSE that is unique to insects; hence, there would be no point in searching for it in other metazoan models. Thus, understanding a regulatory network in evolutionary terms helps in discriminating between universal and nonuniversal components of a system and delineating its functional core. Moreover, such an approach also helps us in trying to identify features of regulatory networks that might be of interest in new models or experimentally less-tractable organisms for which genome sequences are available (Grimson et al., 2008; Veitia, 2008). In such cases, using concepts such as LSEs and gene loss, identification of cis regulatory elements, and clues from domain architectures, one could focus on aspects of the network that are likely to be critical in organism-specific functions (Lespinet et al., 2002; Weinstock, 2007). This could potentially lead to uncovering the mechanisms by which subnetworks have been involved in generating a lineage-specific phenotypic novelty.

    However, considerable work still remains to be done. None of the large-scale studies in multicellular eukaryotes or prokaryotes has reached the level of detail achieved by individual efforts such as those that explain the formation of even-skipped stripes in Drosophila or other such spatiotemporal patterns (Howard and Davidson, 2004; Istrail and Davidson, 2005). Achieving that level of detail in regulatory network reconstruction is certainly impeded by technical hurdles, as mentioned in the beginning of this review. However, the rapid advances in high-throughput methods, including sequencing, optics, and microfabrication technology, indicate that these roadblocks may be overcome sooner rather than later (Imelfort et al., 2009; Joos and Bachmann, 2009; Nygaard and Hovig, 2009; Todt and Blohm, 2009). Hence, it does appear that direct comparisons of entire regulatory networks of different multicellular forms are a clear possibility in the near future. While the general principles that have been discussed in this article will remain a guiding force, it is very likely that several unexpected findings will emerge from these comparisons.

    ACKNOWLEDGMENTS

    Given the vast number of articles on the topic under consideration, we have been unable to cite all of them due to obvious space constraints. We do acknowledge the enormous labor of workers in this field and apologize for not being able to cite them all.

    REFERENCES

    Abad P, Gouzy J, Aury JM, et al. 2008. Genome sequence of the metazoan plant-parasitic nematode Meloidogyne incognita. Nat Biotechnol 26:909–915.

    Adams MD, Celniker SE, Holt RA, et al. 2000. The genome sequence of Drosophila melanogaster. Science 287:2185–2195.

    Albert R, J