13
Review Integrated approaches to uncovering transcription regulatory networks in mammalian cells Kai Tan a, , Jesper Tegner b,c, , Timothy Ravasi a,d, a Department of Bioengineering, Jacobs School of Engineering, University of California at San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA b Unit of Computational Medicine, King Gustaf V Research Institute, Department of Medicine, Karolinska Institute, SE-171 76 Stockholm, Sweden c Computational Biology, Department of Physics, Linköping University, SE-581 53 Linköping, Sweden d Scripps NeuroAIDS Preclinical Studies Center, Scripps Research Institute, La Jolla, CA 92037, USA Received 17 September 2007; accepted 16 November 2007 Available online 8 January 2008 Abstract Integrative systems biology has emerged as an exciting research approach in molecular biology and functional genomics that involves the integration of genomics, proteomics, and metabolomics datasets. These endeavors establish a systematic paradigm by which to interrogate, model, and iteratively refine our knowledge of the regulatory events within a cell. Here we review the latest technologies available to collect high- throughput measurements of a cellular state as well as the most successful methods for the integration and interrogation of these measurements. In particular we will focus on methods available to infer transcription regulatory networks in mammals. Published by Elsevier Inc. Keywords: Systems biology; Networks; Gene networks; Transcriptional networks; Genomics; Dynamics; Gene expression; Cellular networks; Personalized medicine; Systems medicine; Bioinformatics; Protein-protein interaction networks; ChIp-chip; Protein interaction maps Contents Introduction ................................................................ 220 Graphic theoretical representation of transcriptional regulatory networks ................................ 220 Experimental technologies to obtain network data ............................................ 221 Capturing the cellular transcriptome ................................................... 221 Acquiring physical interaction data of the network ............................................ 222 Proteinprotein interaction networks ................................................. 222 ProteinDNA interaction networks .................................................. 223 Computational modeling and inference of regulatory networks ..................................... 223 Identification of regulatory DNA elements .............................................. 223 Identification of regulatory DNA modules .............................................. 224 Current databases for regulatory interactions ............................................... 224 Integrating gene expression profiles with molecular interaction data to construct regulatory networks.................. 224 Using integrated networks ........................................................ 225 Integrated approaches to studying tissue-and disease-specific networks ............................... 226 Network dynamics .......................................................... 226 Available online at www.sciencedirect.com Genomics 91 (2008) 219 231 www.elsevier.com/locate/ygeno Corresponding authors. J. Tegner is to be contacted at Unit of Computational Medicine, King Gustaf V Research Institute, Department of Medicine, Karolinska Institute, SE-171 76 Stockholm, Sweden. T. Ravasi, Department of Bioengineering, Jacobs School of Engineering, University of California at San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA. Fax: +1 858 822 4246. E-mail addresses: [email protected] (K. Tan), [email protected] (J. Tegner), [email protected] (T. Ravasi). 0888-7543/$ - see front matter. Published by Elsevier Inc. doi:10.1016/j.ygeno.2007.11.005

Review Integrated approaches to uncovering transcription ...enables network identification provided large enough sam-ples are available. Perturbation algorithms have been success-fully

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Review Integrated approaches to uncovering transcription ...enables network identification provided large enough sam-ples are available. Perturbation algorithms have been success-fully

Available online at www.sciencedirect.com

Genomics 91 (2008) 219–231www.elsevier.com/locate/ygeno

Review

Integrated approaches to uncovering transcription regulatorynetworks in mammalian cells

Kai Tan a,⁎, Jesper Tegner b,c,⁎, Timothy Ravasi a,d,⁎

a Department of Bioengineering, Jacobs School of Engineering, University of California at San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USAb Unit of Computational Medicine, King Gustaf V Research Institute, Department of Medicine, Karolinska Institute, SE-171 76 Stockholm, Sweden

c Computational Biology, Department of Physics, Linköping University, SE-581 53 Linköping, Swedend Scripps NeuroAIDS Preclinical Studies Center, Scripps Research Institute, La Jolla, CA 92037, USA

Received 17 September 2007; accepted 16 November 2007Available online 8 January 2008

Abstract

Integrative systems biology has emerged as an exciting research approach in molecular biology and functional genomics that involves theintegration of genomics, proteomics, and metabolomics datasets. These endeavors establish a systematic paradigm by which to interrogate, model,and iteratively refine our knowledge of the regulatory events within a cell. Here we review the latest technologies available to collect high-throughput measurements of a cellular state as well as the most successful methods for the integration and interrogation of these measurements. Inparticular we will focus on methods available to infer transcription regulatory networks in mammals.Published by Elsevier Inc.

Keywords: Systems biology; Networks; Gene networks; Transcriptional networks; Genomics; Dynamics; Gene expression; Cellular networks; Personalizedmedicine; Systems medicine; Bioinformatics; Protein-protein interaction networks; ChIp-chip; Protein interaction maps

Contents

Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220Graphic theoretical representation of transcriptional regulatory networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220Experimental technologies to obtain network data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221Capturing the cellular transcriptome . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221Acquiring physical interaction data of the network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222

Protein–protein interaction networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222Protein–DNA interaction networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223

Computational modeling and inference of regulatory networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223Identification of regulatory DNA elements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223Identification of regulatory DNA modules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224

Current databases for regulatory interactions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224Integrating gene expression profiles with molecular interaction data to construct regulatory networks. . . . . . . . . . . . . . . . . . 224Using integrated networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225

Integrated approaches to studying tissue-and disease-specific networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226Network dynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226

⁎ Corresponding authors. J. Tegner is to be contacted at Unit of Computational Medicine, King Gustaf V Research Institute, Department of Medicine, KarolinskaInstitute, SE-171 76 Stockholm, Sweden. T. Ravasi, Department of Bioengineering, Jacobs School of Engineering, University of California at San Diego, 9500 GilmanDrive, La Jolla, CA 92093, USA. Fax: +1 858 822 4246.

E-mail addresses: [email protected] (K. Tan), [email protected] (J. Tegner), [email protected] (T. Ravasi).

0888-7543/$ - see front matter. Published by Elsevier Inc.doi:10.1016/j.ygeno.2007.11.005

Page 2: Review Integrated approaches to uncovering transcription ...enables network identification provided large enough sam-ples are available. Perturbation algorithms have been success-fully

220 K. Tan et al. / Genomics 91 (2008) 219–231

Future challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227

Introduction

The genome content of most cells in a multicellular organismis identical. Nevertheless, in the human body over 220 differentcell types can be distinguished based on their morphologicalproperties, localizations, functions, and developmental stages[1,2].

The cell's ability to acquire and maintain its identity requiresa tight control of the temporal and spatial interplay betweenmillions of individual components (nucleic acids, proteins, andmetabolites) encoded by the same genome [3–6]. Althoughbiologists commonly study cells as if they were static entities,individual cells rarely operate in a steady state. The ability of acell to detect and respond to transient signals within and outsidethe cell is crucial for the integrity of a multicellular organism.When error occurs in the fine-tuned cellular regulatory systemor errors are introduced into the genome as mutations, thecellular behaviors and identities can deviate from their normalsituations and thereby induce pathological scenarios such asdiseases [7–9]. To understand these cellular functions in detail,we need to identify the mechanisms by which genomicinformation is tightly controlled in space and time.

In eukaryotes the control of gene expression is a multilevelprocess influenced by the position of each chromosome in thenucleus (nuclear territories) [10–12], the spatial and conforma-tional rearrangements of chromosomal regions (chromatinremodeling), and the fine interplay between regulatory proteinsand DNA sequences [11–13] (Fig. 1). The interactions betweenregulatory proteins and DNA sequences are probably the mostdiverse and complex of all these regulatory mechanisms. In anygiven cell all combinations of transcriptional regulators andtheir specific DNA binding sequences responsible for thecorrect regulation of every transcript encoded in the genome areusually referred to as the transcriptional regulatory network(TRN) of the cell. Yet recent findings add an additionaldimension to the regulatory control of gene expression, in thatevidence has been gathered [14] supporting the notion that

Fig. 1. Transcription regulation. Regulation of gene expression is a complexinterplay between transcription regulators, transcription regulator complexes,and DNA cis-regulatory elements. Black line, DNA; black box, exon; blackarrow, transcription start site; gray box, DNA cis-regulatory element; greensymbol, transcriptional activator; red symbol, transcriptional repressor.

transcription is itself controlled by noncoding RNAs (ncRNAs).The precise mechanism by which this posttranscriptionalcontrol operates may be very complex. For example, RNAmay act directly on the induction, processing, or stability ofanother transcript. Furthermore ncRNAs are a major, regulatedoutput of the mammalian genome [15,16]. The role of ncRNAsin mammalian transcription regulation is beyond the scope ofthis review, although they need to be considered as a part of theregulatory network.

If we know the identity of every transcript produced by thegenome in each cell type, the way transcription changes with timeand space, and, most importantly, the control mechanisms of thesegenome outputs, the stage is set to infer the regulatory network foreach cell under different conditions. This gives us the opportunity tosuggest therapeutic mechanisms to correct for the errors underlyingpathological conditions. Thus research aimed at developingtechniques for uncovering cellular networks from experimentaldata is central for translating genomic discoveries into therapeutics.

During the past 5 years there has been rapid progress in thedevelopment of a systems approach for identifying transcrip-tional regulatory networks from high-throughput data [17,18]. Aparticularly powerful paradigm is the “perturbation” approach.Here the cellular response to a perturbation (typically of aregulator via environmental and/or genetic means) is monitoredby high-throughput assays such as gene expression microarray,and fitting the data to a computational model of the gene networkenables network identification provided large enough sam-ples are available. Perturbation algorithms have been success-fully applied to Escherichia coli and Saccharomyces cerevisiae[19,20]. Although successful in unicellular organisms, theperturbation approach has been difficult to apply to mammaliancells due to the limited number of samples and the difficulties inmodulating gene activity in mammalian cells with siRNA orgene knockout. However, a complementary approach, dataintegration, promises to be a useful strategy for uncoveringmammalian networks. In this paper we will therefore review thecurrent advances in the construction of transcription regulatorynetworks in mammals. We will first focus on the availableexperimental and computational techniques to collect genome-wide measurements and to construct static models of regulatorynetworks. We will then describe some of the current approachesto add temporal and spatial dimensions to these networks.

Graphic theoretical representation of transcriptionalregulatory networks

Agraph is useful for illustrating a network,with its componentsand their interactions, of a complex system [21–26]. Here, thegraph corresponding to the network is composed of “nodes,”which can represent any biomolecules such as proteins, DNA,RNA, and metabolites, and “edges,” representing relationshipsbetween nodes. An edge can denote a physical interaction, such as

Page 3: Review Integrated approaches to uncovering transcription ...enables network identification provided large enough sam-ples are available. Perturbation algorithms have been success-fully

221K. Tan et al. / Genomics 91 (2008) 219–231

protein–protein, protein–DNA, protein–RNA, RNA–RNA, andprotein–metabolite interactions, or a functional relationship suchas coexpression or a genetic interaction such as epistasis.

In a transcriptional regulatory network the nodes generallyare transcription factors, cofactors, and chromatin regulators(from now on in this review we will refer to these genescollectively as transcription regulators or TRs) and the DNAregulatory regions (promoters and enhancers, referred asregulatory elements, REs). Edges in the network are physicalinteractions between TRs (in this case, protein–protein interac-tions) and physical interactions between TRs and REs (protein–DNA interactions, PDIs). Fig. 2 shows a graphical representa-tion of a simple transcription regulatory network.

Experimental technologies to obtain network data

There are two main types of information we have to collect tobe able to infer transcription regulatory networks in a cell. First,we need to identify all the nodes (including the TRs), whichmeans that we have to define all the expressed transcripts underany developmental and growth conditions. Second, we need toidentify all possible physical interactions between nodes, i.e.,between TRs and between TRs and REs.

The completion of a large number of eukaryotic genomesequences, the rapid progress toward complete catalogs ofgenomic transcriptional output (the transcriptome), the capacityto analyze the complete transcriptome of a cell, and the advent

Fig. 2. Transcription regulatory networks. Regulatory events at the gene level (lefregulatory events inside a cell (right). Protein–protein (black edges) and protein–D(usually represented as the regulated genes, squared nodes) are shown. Note that prrepression (red).

of technologies that allow us to capture protein–protein andprotein–DNA interactions within any cellular content offer usthe tools to identify all the nodes and edges in the network.

Capturing the cellular transcriptome

Gene expression profiling using microarray chips is by far themost successful genome-wide technology to capture the genomeoutput of a cell [27,28]. Expression microarray is an RNA-basedmethod that allows the simultaneous measurement of virtually allthe transcripts in a cell. This has been and still is a very powerfultechnique thanks to its relative technical simplicity, low cost, andshort turnover time, which make expression microarrays astandard molecular biology technique available to any laboratory.Moreover, different array-based technologies have now becomecomparable across different platforms [29]. In recent years theadvent of high-density microarray chips has allowed us to definethe entire transcriptome of more complex organisms such ashuman. Computational methods used for the analysis of theselarge collections of data have also been improved andstandardized, making the interpretation of microarray data moreaccessible to those without a strong computational background[30,31]. For these reasons and also thanks to the efforts ofconsortia aiming to standardize microarray datasets [32], we nowhave access to more then 160,000 expression profiles for variousorganisms and cell types (estimate fromNCBI's Gene ExpressionOmnibus, July 2007; http://www.ncbi.nlm.nih.gov/geo).

t) are mapped into a network view (broken arrows) aiming to capture all theNA interactions (green and red edges) involving TRs (circular nodes) and REsotein–DNA interactions are directional and can represent activation (green) or

Page 4: Review Integrated approaches to uncovering transcription ...enables network identification provided large enough sam-ples are available. Perturbation algorithms have been success-fully

222 K. Tan et al. / Genomics 91 (2008) 219–231

Although transcript profiling using microarrays is the mostwidespread functional genomics technique, it is not the onlyone. Other approaches have been developed as alternatives tomicroarray, including serial analysis of gene expression (SAGE)[33,34] and most recently the cap analysis of gene expression(CAGE) [35–37]. More sensitive techniques such as the polonymultiplex analysis of gene expression (PMAGE) allow the mea-surements of mRNAs as rare as one transcript per three cells [38].Since SAGE and CAGE are RNA library based they requiresequencing of millions of cDNA tags from each library andconsequently they are not accessible to many laboratories. UnlikeSAGE, CAGE also enables systematic and genome-widemapping of transcription starting sites (TSSs) for every transcriptexpressed in a cell (more about the properties of CAGE is givenin the following paragraphs).

Although with less throughput than array-based technolo-gies, quantitative real-time PCR (qRT-PCR) is becoming anincreasingly important complementary tool for the constructionof TRNs [39,40], due to its quantitative nature and highersensitivity, which allow more accurate measurements of low-abundance transcripts such as those encoding transcriptionfactors [41] (Table 1).

Acquiring physical interaction data of the network

In a transcriptional regulatory network there are two types ofphysical interactions represented by edges, namely those be-tween the regulatory proteins and their DNA binding sequences(PDIs) and those between regulatory proteins (PPIs).

Protein–protein interaction networks

In eukaryotes the regulation of gene expression oftenrequires more than one TR to ensure the correct expression ofa gene. TRs interact to form protein complexes and in manycases this is a requirement to be able to bind DNA regulatoryelements [42–48]. For example, this is the case for homodimersbinding palindromic transcription factor binding sites (TFBS)[49]. Furthermore, TFBSs tend to cluster together in specificand conserved regions in the genome and TRs targeting these

Table 1Experimental techniques commonly used to collect high-throughputmeasurementsof gene expression and physical interactions

Technique References

Expression measurementDNA microarray [27,28]SAGE [33,34]CAGE [35–37]PMAGE [38]qRT-PCR [39,40]

Protein–DNA measurementChIP-chip [74,75]ChIP-PET [76]

Protein–protein measurementTwo-hybrid systems [59,61,62]Co-IP and mass spectrometry [63–65]

DNA regions also interact at the protein level to form proteincomplexes [50–53]. A dramatic example is the transcriptionalinitiation complex, which is composed of more than 30 proteinsand binds specific regulatory elements via a few core com-ponents such as the TATA box binding protein [54–58].

The interplay between TRs is often referred to as the com-binatorial regulation of gene expression. Therefore capturing allpossible combinatorial interactions between TRs is an essentialstep toward the construction of mammalian transcription regu-latory networks. For this purpose complete maps of PPIs areof utmost value as a first step in mapping putative pairwiseinteractions.

PPIs are usually generated by two-hybrid technologies(Y2H) [59], in which two proteins, a bait fused with a specificDNA binding domain able to bind the yeast GAL4 gene and aprey fused with the GAL4 activator domain, are overexpressedin the same yeast cell. If the two proteins of interest are able tointeract, then the GAL4 activator is reformed and thus able toactivate the expression of a reporter gene that is under thecontrol of the GAL4 gene promoter [60]. A similar approachhas been developed, using a mammalian system, by Suzuki andcolleagues [61,62]. Their technique is at least in principle moreamenable to a systematic screening of binary mammalian PPIsbecause the overexpressed proteins are folded and modified in amore natural environment.

PPI maps can also be constructed using coimmunoprecipita-tion followed by mass spectrometry [63–69]. This technology ismore specific than Y2H (lower false positive rate) and thereforeless scalable. Since the technology relies on coimmunopreci-pitation it is more suitable for identifying protein complexeswith indirect interactions, in contrast to Y2H, which measuresdirect pairwise interactions.

In recent years the number of binary nonredundant humanPPIs has increased dramatically thanks to extensive literaturemining (36,617 in the HPRD database [70]) and also to large-scale experimentally determined PPIs such as the work fromRual and colleagues and Ewing and colleagues [63,71].However, one of the limitations with the current human PPImap is the low coverage of TR interactions because the exper-imental techniques generally are biased toward large macro-molecular complexes (i.e., ribosome, spliceosome, membranechannels, etc.) and because of the low abundance of TRscompared to cytosolic proteins. Suzuki and colleagues of theRIKEN Genome Science Center in Japan have generated forthe first time a nuclear-specific PPI map for mouse [62] andnow they are focusing on the human nuclear PPI map (personalcommunication). Such maps will be very useful resourcesfor the construction of mammalian transcriptional regulatorynetworks.

Another limitation of current PPI networks is the extremelyhigh false discovery rate, which has been estimated to be from40 to 70% for the Rual human PPI network [71,72]. Such higherror rates can be readily reduced with the integration of othertypes of data, for example, gene expression information, so thatall those interacting proteins that are never coexpressed could beremoved from the network. We will discuss more details aboutthe integrated approach in the following sections (Table 1).

Page 5: Review Integrated approaches to uncovering transcription ...enables network identification provided large enough sam-ples are available. Perturbation algorithms have been success-fully

223K. Tan et al. / Genomics 91 (2008) 219–231

Protein–DNA interaction networks

To regulate gene expression, either individual TRs orcomplexes of TRs need first to bind specifically to cis-regulatory DNA sequences. Traditionally, the most commonmethods to infer TR–DNA binding events on a large scale arecomputational ones (discussed in detail later).

New technologies have emerged that enable in vivo genome-wide experimental mapping of TR–DNA binding events. Themost widespread of these techniques is the genome-widelocation analysis (GWLA), also known as chromatin immuno-precipitation coupled with microarray chip (ChIP–chip) orchromatin immunoprecipitation coupled with paired-end ditagsequencing (ChIP–PET) [73–76]. In GWLA, TR–DNAbindingevents are captured and frozen in a specific cellular state by invivo crosslinking. Then the genomic DNA is fragmented and theTR of interest is isolated with a specific antibody, along withthose genomic fragments bounded by the TR. After crosslinkingreversal and protein digestion, the pulled-down DNA is labeledin a manner analogous to a cDNA microarray experiment, buthybridized to an oligo microarray chip whose content is directedtoward regulatory regions rather than exons. These chips arecomposed of 20- to 70-mer probes tiling the entire genome ormore likely, in the case of mammalian genomes, tiling promotersand intergenic regions. In the case of two-colored arrayplatforms, the immunoprecipitated DNA can be compared tothe total DNA input (the same genomic DNA but divided beforethe immunoprecipitation step) and this facilitates the normal-ization of the fluorescent signals [75].

In the case of ChIP–PET, the immunoprecipitated DNA iscloned into a DNA library and then converted into paired-endditags (PETs). The PETs are concatenated and cloned into aChIP–PET library for sequencing, and the number of ditagsshould be proportional to the original amount of immunopre-cipitated DNA fragment and therefore enriched for thosefragment bounded by the TR [76].

GWLAs are powerful techniques since they capture in vivobinding events in a high-throughput fashion and thus TF bindingevents can be compared across several cellular conditions, drugstimulation, developmental stage, etc. GWLAs also facilitatecomputational prediction of TFBSs since the experimentallyidentified TR binding regions drastically reduce the searchspace.

The general limitation of GWLAs is the high noise level (andso high false discovery rate) in the data due to several technicalchallenges in the experimental protocol. The amplification steprequired for the DNA labeling tends to reduce the enrichment ofthe immunoprecipitated DNA, and, most importantly, thequality and specificity of the antibody can change the efficiencyof the immunoprecipitation. The antibody also introduces alimitation in the horizontal throughput of these techniques,meaning it is difficult, particularly in mammalian systems, tomap the binding events of several TRs in parallel so theselection of the TRs is strongly biased toward the availability ofa high-quality antibody. This problem is less accentuated inmodel organisms such as yeast in which all the TRs have beenepitope-tagged using the tandem affinity purification (TAP) tag

[77] and therefore the same anti-TAP antibody can be use toimmunoprecipitate virtually all the TRs in an array of differentconditions [78,79] (Table 1).

Computational modeling and inference ofregulatory networks

Before the advent of high-throughput assays such as GWLAand CAGE, the most common methods to infer TR–DNAbinding events were computational ones. In a network view wecan draw inferred edges from a TR to a gene that bears aputative binding site(s) for the respective TR in its regulatoryregion. All these approaches have several intrinsic limitations,especially when applied to complex organisms such as humans,as the genome size correlates directly with the amount of noisein binding-site predictions. In addition, mammalian promotersare not well or easily defined because a promoter regulating agene can reside a long distance away from the gene start andmultiple promoter regions can contribute to the regulation of thesame gene [80–83]. Taken together these factors make TFBSprediction a very difficult task and it may therefore suffer fromproblems of false positives and false negatives; yet there areseveral tricks (discussed below) that have been used success-fully to reduce the noise in TR binding-site predictions. Theadvantage of TFBS prediction in the construction of TR–DNAedge libraries is that it can be done for virtually all the TRs forwhich a DNA binding motif has been defined.

Identification of regulatory DNA elements

Cross-species sequence comparisons, which rely upon theslow substitution rate of many categories of functional DNArelative to neutral sequence, have emerged as the preeminentmeans of identifying candidate cis-regulatory elements inmammalian genomes [84–89]. These studies involve sequencecomparisons of human (or other mammal) genomic intervals toorthologous regions from organisms separated by varying evo-lutionary distances, ranging from primates to fish. The mostimportant issue of comparative analysis is the choice of species,which depends on the goal being pursued. Previous theoreticalstudies [90,91] have shown that higher-resolution functionalprediction at the level of a transcription-factor binding site (6–12 bp) is likely to require sequence from more than 10 mammalsspread across the clade. In practice, Xie et al. performed acomparative analysis of the human, mouse, rat, and dog ge-nomes to create a systematic catalog of common regulatorymotifs in promoters and 3′ untranslated regions (3′ UTRs). Thepromoter analysis yielded 174 candidate motifs, including mostpreviously known transcription factor binding sites and 105 newmotifs. The 3′-UTR analysis yielded 106 motifs likely to beinvolved in posttranscriptional regulation. On the other hand,Pennachio made use of extreme evolutionary sequence con-servation as a filter to identify putative enhancer activity of alarge group of noncoding elements in the human genome that areconserved in human–pufferfish (Takifugu (Fugu) rubripes) orultraconserved in human–mouse–rat. They tested 167 of theseextremely conserved sequences in a transgenic mouse enhancer

Page 6: Review Integrated approaches to uncovering transcription ...enables network identification provided large enough sam-ples are available. Perturbation algorithms have been success-fully

Table 2Databases of transcription regulatory and biomolecular physical interactions inmetazoan

Type of interaction Web address

Transcription regulatoryTRANSFAC www.gene-regulation.com/pub/databases.html#transfacTRED http://rulai.cshl.edu/cgi-bin/TRED/tred.cgi?process=

homeVISTA Enhancer http://genome.lbl.gov/vista/index.shtmlPReMod http://genomequebec.mcgill.ca/PReMod

Biomolecular physicalDIP http://dip.doe-mbi.ucla.eduBIND www.binddb.orgHPRD www.hprd.orgREACTOME www.reactome.orgMINT http://mint.bio.uniroma2.it/mint/Welcome.doPDZBase http://icb.med.cornell.edu/services/pdz/startAfCS www.signaling-gateway.orgIntAct www.ebi.ac.uk/intact/site/index.jsf

224 K. Tan et al. / Genomics 91 (2008) 219–231

assay and found that 45% of these sequences functionedreproducibly as tissue-specific enhancers of gene expression.By extrapolation, the authors estimated that there were at least anadditional 5500 human–fish conserved noncoding sequences inthe human genome with similar levels of constraint that arestrong candidates for acting as gene enhancers. Sequencing ofadditional mammalian genomes [92] will incrementally facil-itate the identification of large regulatory modules in the humangenome. The advantage of the comparative genomic approachover current high-throughput technologies is that it can identifyconserved regulatory modules.

With the advent of high-throughput technologies such asGWLA and CAGE, many types of functional DNA elements(TFBSs, basal promoters, enhancers, insulators, etc.) are nowbeing experimentally mapped on the genome scale with a typicalresolution of a few hundred nucleotides. Carninci, Nilsson, andTegner, for example, have recently used CAGE-based TSSmapping to restrict the cis-regulatory elements search space andalso to focus the analysis on only active (expressed) REs,thereby reducing the number of false positives.

Raw data from these high-throughput assays are noisy andtypically do not have resolution at the single binding site level.Thus several new computational methods for DNA motiffinding have been developed to take advantage of genome-widelocation data. ChIP–chip binding p values were first used torank and select high-confidence promoter sequences as inputs toexisting motif discovery algorithms [93,94]. Recently, twogroups introduced boosting strategies that take into accountboth binding and nonbinding sequences from ChIP–chip dataduring motif search [95,96].

In addition to narrowing the search space, GWLA data alsoprovide quantitative information about the binding free energyof TR–DNA interaction. Two groups [97,98] have fitted astatistical–mechanical model of TR–DNA interactions tobinding ratios of oligonucleotide probes on the microarrayused in ChIP–chip, to infer DNA motif models for the TRs.

GWLA technologies have also been used to map chromatinmodification states, such as histone modifications. In a recentstudy, Heintzman et al. [74] demonstrated that these epigeneticsignatures could be used to identify promoters and enhancers inthe human genome.

Identification of regulatory DNA modules

For genes in higher eukaryotes the binding of an individualTR is not sufficient to drive context-specific transcription.Rather, interaction and cooperation of several TRs are needed toaffect gene expression at specific times and locations [99]. TheDNA regions targeted by a group of TRs are usually clusteredtogether and form so-called cis-regulatory modules. Comparedto TR binding-site predictions, module predictions are morereliable, and methods have been developed along several lines.In the first category the promoters of a set of coregulated genesobtained from prior experiments are analyzed to identifyoverrepresented motif combinations likely to be responsiblefor the genes' coregulation [100–105]. Other approachesassume that the user provides a small set of TF–DNA binding

motifs that are expected to co-occur in modules and identifygenomic regions densely populated in putative sites for theseTRs [106–110]. A third type of approach is based on thedetection of statistically significant clusters of phylogeneticallyconserved TR binding sites [86,111,112].

Current databases for regulatory interactions

Large regulatory interaction datasets are now available for avariety of metazoan species (Table 2), including Drosophilamelanogaster, Caenorhabditis elegans, and Homo sapiens. Inlight of these vast scientific resources made available throughexperimental and computational analyses, several databasesstoring interaction data are now in wide usage (Table 2). Most ofthese databases contain interaction data derived from both high-throughput analyses and small-scale experiments. In addition tobeing data warehouses, some of these databases have developednew methods for data exchange and visualization to facilitatethe study of molecular interaction networks.

Integrating gene expression profiles with molecularinteraction data to construct regulatory networks

Gene expression profiles describe the steady-state mRNAlevels in the cell—the outcome of the regulatory network.Although network structure could in principle be inferred fromonly expression data, it would be a very challenging taskbecause of the small sample sizes (number of genes greatlyexceeds number of measurements per gene) and large amount ofnoise in expression profiles. However, the integration of large-scale physical interaction datasets with expression data providesa more direct route for reconstructing gene regulatory networks.

Different data sources have their own limitations. Currently,both gene expression and GWLA data are noisy. Differentiallyexpressed genes from replicate microarray experiments typi-cally overlap by 70–75% [113], whereas the overlap betweenreplicate ChIP–chip experiments is even lower, usually lessthan 50%. Because gene expression and TR location data

Page 7: Review Integrated approaches to uncovering transcription ...enables network identification provided large enough sam-ples are available. Perturbation algorithms have been success-fully

225K. Tan et al. / Genomics 91 (2008) 219–231

provide complementary information, integration of the two datasources can emphasize the functional part of the network andthus make the inferred network more biologically relevant. Thisobservation provides the rationale for several recently devel-oped integrative approaches. Bar-Joseph and colleagues devel-oped an iterative method in which coherent gene expressionprofile is used to include low-confidence but true ChIP–chiptargets in regulatory modules [114]. Gao et al. [115] usedmultiple linear regressions to model gene expression ratios andTF binding to gene promoters, as measured with ChIP–chip.Similarly, Liao et al. [116] used two sets of linear equations tomodel simultaneously the regulatory strength of TFs on theirDNA targets and the activity levels of the TF themselves. This isan improvement over the linear regression approach of Gao andcolleagues, which could model only TF activity levels.

Most cellular functions are carried out by protein complexes,such as ribosome, spliceosome, and proteasome. So far little isknown about how protein complexes are regulated. Protein

Fig. 3. An integrated approach to uncovering transcription regulatory networks. Reprein which physical interactions (protein–protein, protein–DNA, etc.) and expressionintegrated network model. Predictions of regulatory mechanisms, obtained from the nand TSRA [39,120,121], are first experimentally validated and then used to refine t

interaction data could also be integrated with gene expressionprofiles to discover regulated protein complexes [117,118] andto study the regulatory dynamics of protein complexes [119].

Finally, gene expression data could also be integrated simul-taneously with both protein–DNA (GWLA) and protein–protein interaction data (Fig. 3). Examples in this categoryinclude the jActiveModule [120] and the physical network[121] approach. jActiveModule, as the name suggests, identifiesa set of genes that are differentially expressed under certainconditions and whose protein products physically interact. Thephysical network approach by Yeang et al. identifies causalphysical pathways leading to differentially expressed genes.

Using integrated networks

A regulatory network, defined from several different datasources as outlined above, is useful in different ways. Clearlythe network structure suggests novel mechanistic hypotheses

sentation of an integrated approach similar to the one used by Nilsson et al. [39],measurements (microarray, qPCR, CAGE, etc.) are combined to generate anetwork interrogation using bioinformatics tools such as jActiveModules, PNM,he original network model.

Page 8: Review Integrated approaches to uncovering transcription ...enables network identification provided large enough sam-ples are available. Perturbation algorithms have been success-fully

226 K. Tan et al. / Genomics 91 (2008) 219–231

that must be experimentally tested as a final validation step.However, before this step is taken it is mandatory to considerthat networks are condition-and state-dependent, that is,different parts of the network will be active under differentconditions. For example, a cell that is exposed to a particularcompound or a physiological condition such as stress willproduce two different activity patterns. Therefore a static net-work defined from different data sources has to be evaluatedand projected onto the specific condition of interest. Such anetwork projection can be performed in space (over differentorgans/tissues) and/or in time (in response to a stimulation forexample).

Integrated approaches to studying tissue-and disease-specificnetworks

The tissue-specific mRNA expression patterns of a gene canoffer important clues to its physiological function. Previously,Su et al. have generated a large compendium of gene expressionprofiles of 79 human and 61 mouse tissues [122]. Apart fromnormal tissues, a large number of disease-specific expressionprofiles (cancer, immune disorders, neurological diseases, etc.)

Fig. 4. Transcription regulatory network dynamics in macrophage activation by lipopapproach developed by Nilsson and colleagues [39]. (A) The integrated network is p(B) Network model inferred using the TSR algorithm. Nodes represent TFs and are co24 h poststimulation). Arrowed edges indicate TF–DNA binding events inferred as de

have also been generated over the past decades. These expres-sion datasets, which in essence constitute an activity-defined“fingerprint” of a disease state, provide unprecedented oppor-tunities to study transcriptional regulatory networks in mam-mals. The key to success is to adopt a system to integrate theseexpression profiles with the static (interactome) network andphenotypic data. Recently, as a proof-of-concept, several groupshave adopted this strategy to study regulatory networks inhuman diseases, including cancer [123,124], innate immunity[39,125], and inflammation [126]. For example Segal andcoworkers used a large compendium of gene expression profilesto defined shared and cancer-subtype-specific modules [124].Similarly, Paulsson and coworkers have defined metabolicinteraction networks and used these static networks togetherwith expression data sampled from different conditions to definedisease states of relevance for diabetes and cardiovasculardiseases [127].

Network dynamics

In addition to spatial, tissue, and disease-dependent activity, itis equally important to investigate system dynamics over a period

olysaccharide (LPS), a bacterial endotoxin, obtained by applying the integratedarsed for time-shifted regulatory events involving TFs and their regulated genes.lored based on the time of maximum expression during the LPS time course (0 toscribed by Nilsson and colleagues. Dotted edges are protein–protein interactions.

Page 9: Review Integrated approaches to uncovering transcription ...enables network identification provided large enough sam-ples are available. Perturbation algorithms have been success-fully

227K. Tan et al. / Genomics 91 (2008) 219–231

of time. This approach has a long tradition based on computa-tional modeling of cellular processes [128]. Those purelycomputational attempts are severely limited by the current lackof information about kinetic parameters describing the networkdynamics. In contrast, using a topological structure defined byintegration of different data types offers a much simpler temporalanalysis without the need for detailed information and the precisekinetics. Here the relevant parts of a static network can insteadbe extracted when the system (cell/populations/tissues/organs)produces a time-dependent response. For example, a human cellor a cell line can be stimulated to mimic a physiological responseof interest. Calvano and coworkers [126] used manual curation toassemble different sources of data that defined a static inflam-matory network in human. Next, they studied the blood leukocytegene expression patterns at different time points (2, 4, 6, 9, 24 h) inhuman subjects receiving an inflammatory stimulus (bacterialendotoxin). The infusion of endotoxin essentially activated aninnate immune response of brief duration. By extracting differentparts of their inflammatory network at each time point the authorsrevealed an initial proinflammatory network and a subsequentcounterregulatory network. A similar analysis of allergic diseasehas been presented by Benson and colleagues [129]. Nilsson andcolleagues [39] used a promoter-based network as a backbone tostudy the LPS-induced response in bone-marrow-derived macro-phages (Fig. 4). Again, by extracting the active part of the networkat different time points, using an algorithm that searches thenetwork for time-shifted regulatory events involving TFs andtheir regulated genes, the authors uncovered central pathwaysinvolved in the inflammatory response and new critical roles forTRs during the inflammatory response (Fig. 4). These particularexamples demonstrate how integrative networks can be used touncover time-dependent and disease-relevant responses, whichprovides a wealth ofmechanistic hypotheses that can be subjectedto subsequent experimental analysis.

Future challenges

A major emerging challenge of network biology is tocompare and contrast biological networks systematically overdifferent species, conditions, cell types, disease states, or pointsin time. For this purpose, methods are being developed tocompare and contrast protein interaction networks to predictregulatory interactions [118,130] and to identify conservedinteraction complexes and pathways [118]. Althoughmost of theprevious research has focused on protein interaction networks,many methods can be extended to compare transcriptional reg-ulatory networks. An intriguing future application of theseintegrated networks may be to probe the functional role of singlenucleotide polymorphisms, which are now rapidly being as-sembled for a range of diseases [131,132].

Finally, it is clear that data-integration methods need to beformulated in a proper statistical framework. For example, foreach data type it is essential to develop statistical measures thatindicate the reliability of the data, thus setting the stage for aweighted integrative method in which each data source con-tributes in proportion to its internal quality and relevance to thequestion at hand.

Acknowledgments

K.T. was funded by a postdoctoral fellowship from IBMCorp., through its Institutes of Innovation Program. J.T. wassupported by Clinical Gene Networks AB and LinköpingUniversity. T.R. was supported by National Institute of MentalHealth Grant 2P30MH062261-07.

References

[1] J.C. Venter, M.D. Adams, E.W. Myers, P.W. Li, R.J. Mural, G.G. Sutton,H.O. Smith, M. Yandell, C.A. Evans, R.A. Holt, J.D. Gocayne, P.Amanatides, R.M. Ballew, D.H. Huson, J.R. Wortman, Q. Zhang, C.D.Kodira, X.H. Zheng, L. Chen, M. Skupski, G. Subramanian, P.D.Thomas, J. Zhang, G.L. Gabor Miklos, C. Nelson, S. Broder, A.G. Clark,J. Nadeau, V.A. McKusick, N. Zinder, A.J. Levine, R.J. Roberts, M.Simon, C. Slayman, M. Hunkapiller, R. Bolanos, A. Delcher, I. Dew, D.Fasulo, M. Flanigan, L. Florea, A. Halpern, S. Hannenhalli, S. Kravitz, S.Levy, C. Mobarry, K. Reinert, K. Remington, J. Abu-Threideh, E.Beasley, K. Biddick, V. Bonazzi, R. Brandon, M. Cargill, I. Chandra-mouliswaran, R. Charlab, K. Chaturvedi, Z. Deng, V. Di Francesco, P.Dunn, K. Eilbeck, C. Evangelista, A.E. Gabrielian, W. Gan, W. Ge, F.Gong, Z. Gu, P. Guan, T.J. Heiman, M.E. Higgins, R.R. Ji, Z. Ke, K.A.Ketchum, Z. Lai, Y. Lei, Z. Li, J. Li, Y. Liang, X. Lin, F. Lu, G.V.Merkulov,N. Milshina, H.M. Moore, A.K. Naik, V.A. Narayan, B. Neelam, D.Nusskern, D.B. Rusch, S. Salzberg, W. Shao, B. Shue, J. Sun, Z. Wang, A.Wang, X. Wang, J. Wang, M. Wei, R. Wides, C. Xiao, C. Yan, et al., Thesequence of the human genome, Science 291 (2001) 1304–1351.

[2] E.S. Lander, L.M. Linton, B. Birren, C. Nusbaum, M.C. Zody, J.Baldwin, K. Devon, K. Dewar, M. Doyle, W. FitzHugh, R. Funke, D.Gage, K. Harris, A. Heaford, J. Howland, L. Kann, J. Lehoczky, R.LeVine, P. McEwan, K. McKernan, J. Meldrim, J.P. Mesirov, C. Miranda,W. Morris, J. Naylor, C. Raymond, M. Rosetti, R. Santos, A. Sheridan, C.Sougnez, N. Stange-Thomann, N. Stojanovic, A. Subramanian, D.Wyman, J. Rogers, J. Sulston, R. Ainscough, S. Beck, D. Bentley, J.Burton, C. Clee, N. Carter, A. Coulson, R. Deadman, P. Deloukas, A.Dunham, I. Dunham, R. Durbin, L. French, D. Grafham, S. Gregory, T.Hubbard, S. Humphray, A. Hunt, M. Jones, C. Lloyd, A. McMurray, L.Matthews, S. Mercer, S. Milne, J.C. Mullikin, A. Mungall, R. Plumb, M.Ross, R. Shownkeen, S. Sims, R.H.Waterston, R.K.Wilson, L.W. Hillier,J.D. McPherson, M.A. Marra, E.R. Mardis, L.A. Fulton, A.T. Chinwalla,K.H. Pepin, W.R. Gish, S.L. Chissoe, M.C. Wendl, K.D. Delehaunty, T.L.Miner, A. Delehaunty, J.B. Kramer, L.L. Cook, R.S. Fulton, D.L.Johnson, P.J. Minx, S.W. Clifton, T. Hawkins, E. Branscomb, P. Predki, P.Richardson, S. Wenning, T. Slezak, N. Doggett, J.F. Cheng, A. Olsen, S.Lucas, C. Elkin, E. Uberbacher, M. Frazier, et al., Initial sequencing andanalysis of the human genome, Nature 409 (2001) 860–921.

[3] E.H. Davidson, D.H. Erwin, Gene regulatory networks and the evolutionof animal body plans, Science 311 (2006) 796–800.

[4] E.H. Davidson, D.R.McClay, L. Hood, Regulatory gene networks and theproperties of the developmental process, Proc. Natl. Acad. Sci. U. S. A.100 (2003) 1475–1480.

[5] M. Levine, E.H. Davidson, Gene regulatory networks for development,Proc. Natl. Acad. Sci. U. S. A. 102 (2005) 4936–4942.

[6] S.C. Materna, E.H. Davidson, Logic of gene regulatory networks, Curr.Opin. Biotechnol. 18 (2007) 351–354.

[7] L.DeLellis,M.C.Curia,G.M.Aceto, S. Toracchio, G. Colucci, A.Russo, R.Mariani-Costantini, A. Cama,Analysis of extended genomic rearrangementsin oncological research, Ann. Oncol. 18 (Suppl 6) (2007) vi173–vi178.

[8] S.J. Tebbutt, A. James, P.D. Pare, Single-nucleotide polymorphisms andlung disease: clinical implications, Chest 131 (2007) 1216–1223.

[9] V.E. Carlton, J.S. Ireland, F. Useche, M. Faham, Functional singlenucleotide polymorphism-based association studies, Hum. Genomics 2(2006) 391–402.

[10] T. Misteli, Beyond the sequence: cellular organization of genomefunction, Cell 128 (2007) 787–800.

Page 10: Review Integrated approaches to uncovering transcription ...enables network identification provided large enough sam-ples are available. Perturbation algorithms have been success-fully

228 K. Tan et al. / Genomics 91 (2008) 219–231

[11] T. Cremer, M. Cremer, S. Dietzel, S. Muller, I. Solovei, S. Fakan,Chromosome territories—a functional nuclear landscape, Curr. Opin.Cell Biol. 18 (2006) 307–316.

[12] N. Gilbert, S. Gilchrist, W.A. Bickmore, Chromatin organization in themammalian nucleus, Int. Rev. Cytol. 242 (2005) 283–336.

[13] T. Cremer, C. Cremer, Chromosome territories, nuclear architecture andgene regulation in mammalian cells, Nat., Rev. Genet. 2 (2001) 292–301.

[14] P. Carninci, T. Kasukawa, S. Katayama, J. Gough, M.C. Frith, N. Maeda,R. Oyama, T. Ravasi, B. Lenhard, C. Wells, R. Kodzius, K. Shimokawa,V.B. Bajic, S.E. Brenner, S. Batalov, A.R. Forrest, M. Zavolan, M.J.Davis, L.G. Wilming, V. Aidinis, J.E. Allen, A. Ambesi-Impiombato, R.Apweiler, R.N. Aturaliya, T.L. Bailey, M. Bansal, L. Baxter, K.W. Beisel,T. Bersano, H. Bono, A.M. Chalk, K.P. Chiu, V. Choudhary, A.Christoffels, D.R. Clutterbuck, M.L. Crowe, E. Dalla, B.P. Dalrymple, B.de Bono, G. Della Gatta, D. di Bernardo, T. Down, P. Engstrom, M.Fagiolini, G. Faulkner, C.F. Fletcher, T. Fukushima, M. Furuno, S. Futaki,M. Gariboldi, P. Georgii-Hemming, T.R. Gingeras, T. Gojobori, R.E.Green, S. Gustincich, M. Harbers, Y. Hayashi, T.K. Hensch, N.Hirokawa, D. Hill, L. Huminiecki, M. Iacono, K. Ikeo, A. Iwama, T.Ishikawa, M. Jakt, A. Kanapin, M. Katoh, Y. Kawasawa, J. Kelso, H.Kitamura, H. Kitano, G. Kollias, S.P. Krishnan, A. Kruger, S.K.Kummerfeld, I.V. Kurochkin, L.F. Lareau, D. Lazarevic, L. Lipovich, J.Liu, S. Liuni, S. McWilliam, M. Madan Babu, M. Madera, L.Marchionni, H. Matsuda, S. Matsuzawa, H. Miki, F. Mignone, S.Miyake, K. Morris, S. Mottagui-Tabar, N. Mulder, N. Nakano, H.Nakauchi, P. Ng, R. Nilsson, S. Nishiguchi, S. Nishikawa, et al., Thetranscriptional landscape of the mammalian genome, Science 309 (2005)1559–1563.

[15] S. Katayama, Y. Tomaru, T. Kasukawa, K. Waki, M. Nakanishi, M.Nakamura, H. Nishida, C.C. Yap, M. Suzuki, J. Kawai, H. Suzuki, P.Carninci, Y. Hayashizaki, C. Wells, M. Frith, T. Ravasi, K.C. Pang, J.Hallinan, J. Mattick, D.A. Hume, L. Lipovich, S. Batalov, P.G. Engstrom,Y. Mizuno, M.A. Faghihi, A. Sandelin, A.M. Chalk, S. Mottagui-Tabar,Z. Liang, B. Lenhard, C. Wahlestedt, Antisense transcription in themammalian transcriptome, Science 309 (2005) 1564–1566.

[16] T. Ravasi, H. Suzuki, K.C. Pang, S. Katayama, M. Furuno, R. Okunishi,S. Fukuda, K. Ru, M.C. Frith, M.M. Gongora, S.M. Grimmond, D.A.Hume, Y. Hayashizaki, J.S. Mattick, Experimental validation of theregulated expression of large numbers of non-coding RNAs from themouse genome, Genome Res. 16 (2006) 11–19.

[17] J. Tegner, J. Bjorkegren, Perturbations to uncover gene networks, TrendsGenet. 23 (2007) 34–41.

[18] M. Bansal, V. Belcastro, A. Ambesi-Impiombato, D. di Bernardo, How toinfer gene networks from expression profiles, Mol. Syst. Biol. 3 (2007)122.

[19] J. Tegner, M.K. Yeung, J. Hasty, J.J. Collins, Reverse engineering genenetworks: integrating genetic perturbations with dynamical modeling,Proc. Natl. Acad. Sci. U. S. A. 100 (2003) 5944–5949.

[20] T.S. Gardner, D. di Bernardo, D. Lorenz, J.J. Collins, Inferring geneticnetworks and identifying compound mode of action via expressionprofiling, Science 301 (2003) 102–105.

[21] C. Christensen, J. Thakar, R. Albert, Systems-level insights into cellularregulation: inferring, analysing, and modelling intracellular networks,IET Syst. Biol. 1 (2007) 61–77.

[22] W. Gesler, The uses of spatial analysis in medical geography: a review,Soc. Sci. Med. 23 (1986) 963–973.

[23] P. Grindrod, M. Kibble, Review of uses of network and graph theoryconcepts within proteomics, Expert. Rev. Proteomics 1 (2004) 229–238.

[24] M.J. Keeling, K.T. Eames, Networks and epidemic models, J. R. Soc.Interface 2 (2005) 295–307.

[25] R.U. Muller, M. Stead, J. Pach, The hippocampus as a cognitive graph,J. Gen. Physiol. 107 (1996) 663–694.

[26] M.E. Newman, S.H. Strogatz, D.J. Watts, Random graphs with arbitrarydegree distributions and their applications, Phys. Rev., E Stat. Nonlin.Soft Matter Phys. 64 (2001).

[27] M. Schena, D. Shalon, R.W. Davis, P.O. Brown, Quantitative monitoringof gene expression patterns with a complementary DNA microarray,Science 270 (1995) 467–470.

[28] J. DeRisi, L. Penland, P.O. Brown, M.L. Bittner, P.S. Meltzer, M. Ray, Y.Chen, Y.A. Su, J.M. Trent, Use of a cDNA microarray to analyse geneexpression patterns in human cancer, Nat. Genet. 14 (1996) 457–460.

[29] R.D. Canales, Y. Luo, J.C. Willey, B. Austermiller, C.C. Barbacioru, C.Boysen, K. Hunkapiller, R.V. Jensen, C.R. Knight, K.Y. Lee, Y. Ma, B.Maqsodi, A. Papallo, E.H. Peters, K. Poulter, P.L. Ruppel, R.R. Samaha,L. Shi, W. Yang, L. Zhang, F.M. Goodsaid, Evaluation of DNAmicroarray results with quantitative gene expression platforms, Nat.Biotechnol. 24 (2006) 1115–1122.

[30] J. Quackenbush, Microarray analysis and tumor classification, N. Engl. J.Med. 354 (2006) 2463–2472.

[31] D.W. Lin, P.S. Nelson, Microarray analysis and tumor classification,N. Engl. J. Med. 355 (2006) 960 (author reply 960).

[32] A. Brazma, P. Hingamp, J. Quackenbush, G. Sherlock, P. Spellman, C.Stoeckert, J. Aach, W. Ansorge, C.A. Ball, H.C. Causton, T. Gaasterland,P. Glenisson, F.C. Holstege, I.F. Kim, V. Markowitz, J.C. Matese, H.Parkinson, A. Robinson, U. Sarkans, S. Schulze-Kremer, J. Stewart, R.Taylor, J. Vilo, M. Vingron, Minimum information about a microarrayexperiment (MIAME)—toward standards for microarray data, Nat.Genet. 29 (2001) 365–371.

[33] V.E. Velculescu, L. Zhang, B. Vogelstein, K.W. Kinzler, Serial analysis ofgene expression, Science 270 (1995) 484–487.

[34] S. Saha, A.B. Sparks, C. Rago, V. Akmaev, C.J.Wang, B. Vogelstein, K.W.Kinzler, V.E. Velculescu, Using the transcriptome to annotate the genome,Nat. Biotechnol. 20 (2002) 508–512.

[35] P. Carninci, Tagging mammalian transcription complexity, Trends Genet.22 (2006) 501–510.

[36] P. Carninci, A. Sandelin, B. Lenhard, S. Katayama, K. Shimokawa, J.Ponjavic, C.A. Semple, M.S. Taylor, P.G. Engstrom, M.C. Frith, A.R.Forrest, W.B. Alkema, S.L. Tan, C. Plessy, R. Kodzius, T. Ravasi, T.Kasukawa, S. Fukuda, M. Kanamori-Katayama, Y. Kitazume, H. Kawaji,C. Kai, M. Nakamura, H. Konno, K. Nakano, S. Mottagui-Tabar, P. Arner,A. Chesi, S. Gustincich, F. Persichetti, H. Suzuki, S.M. Grimmond, C.A.Wells, V. Orlando, C. Wahlestedt, E.T. Liu, M. Harbers, J. Kawai, V.B.Bajic, D.A. Hume, Y. Hayashizaki, Genome-wide analysis of mammalianpromoter architecture and evolution, Nat. Genet. 38 (2006) 626–635.

[37] R. Kodzius, M. Kojima, H. Nishiyori, M. Nakamura, S. Fukuda, M.Tagami, D. Sasaki, K. Imamura, C. Kai, M. Harbers, Y. Hayashizaki, P.Carninci, CAGE: cap analysis of gene expression, Nat. Methods 3 (2006)211–222.

[38] J.B. Kim, G.J. Porreca, L. Song, S.C. Greenway, J.M. Gorham, G.M.Church, C.E. Seidman, J.G. Seidman, Polony multiplex analysis of geneexpression (PMAGE) in mouse hypertrophic cardiomyopathy, Science316 (2007) 1481–1484.

[39] R. Nilsson, V.B. Bajic, H. Suzuki, D. di Bernardo, J. Bjorkegren, S.Katayama, J.F. Reid, M.J. Sweet, M. Gariboldi, P. Carninci, Y.Hayashizaki, D.A. Hume, J. Tegner, T. Ravasi, Transcriptional networkdynamics in macrophage activation, Genomics 88 (2006) 133–142.

[40] J. Tegner, R. Nilsson, V.B. Bajic, J. Bjorkegren, T. Ravasi, Systemsbiology of innate immunity, Cell. Immunol. 244 (2006) 105–109.

[41] M.J. Holland, Transcript abundance in yeast varies over six orders ofmagnitude, J. Biol. Chem. 277 (2002) 14363–14366.

[42] J.V. Falvo, A.M. Uglialoro, B.M. Brinkman,M.Merika, B.S. Parekh, E.Y.Tsai, H.C. King, A.D. Morielli, E.G. Peralta, T. Maniatis, D. Thanos, A.E.Goldfeld, Stimulus-specific assembly of enhancer complexes on the tu-mor necrosis factor alpha gene promoter, Mol. Cell. Biol. 20 (2000)2239–2247.

[43] T.K. Kim, T. Maniatis, The mechanism of transcriptional synergy of an invitro assembled interferon-beta enhanceosome, Mol. Cell 1 (1997)119–129.

[44] M.Bellorini, D.K. Lee, J.C. Dantonel, K. Zemzoumi, R.G. Roeder, L. Tora,R. Mantovani, CCAAT binding NF-Y–TBP interactions: NF-YB and NF-YC require short domains adjacent to their histone fold motifs for asso-ciationwith TBP basic residues, Nucleic Acids Res. 25 (1997) 2174–2181.

[45] A. Hoffmann, T. Oelgeschlager, R.G. Roeder, Considerations oftranscriptional control mechanisms: do TFIID–core promoter complexesrecapitulate nucleosome-like functions? Proc. Natl. Acad. Sci. U. S. A. 94(1997) 8928–8935.

Page 11: Review Integrated approaches to uncovering transcription ...enables network identification provided large enough sam-ples are available. Perturbation algorithms have been success-fully

229K. Tan et al. / Genomics 91 (2008) 219–231

[46] A.B. Lassar, P.L. Martin, R.G. Roeder, Transcription of class III genes:formation of preinitiation complexes, Science 222 (1983) 740–748.

[47] T. Oelgeschlager, C.M. Chiang, R.G. Roeder, Topology and reorganiza-tion of a human TFIID–promoter complex, Nature 382 (1996) 735–738.

[48] M.W. Van Dyke, M. Sawadogo, R.G. Roeder, Stability of transcriptioncomplexes on class II genes, Mol. Cell. Biol. 9 (1989) 342–344.

[49] S.C. Tucker, R. Wisdom, Site-specific heterodimerization by paired classhomeodomain proteins mediates selective transcriptional responses,J. Biol. Chem. 274 (1999) 32325–32332.

[50] J.A. Armstrong, B.M. Emerson, Transcription of chromatin: these arecomplex times, Curr. Opin. Genet. Dev. 8 (1998) 165–172.

[51] M. Gaestel, Molecular chaperones in signal transduction, Handbk. Exp.Pharmacol. (2006) 93–109.

[52] R.X. Luo, D.C. Dean, Chromatin remodeling and transcriptionalregulation, J. Natl. Cancer Inst. 91 (1999) 1288–1294.

[53] T. Tsukiyama, C. Wu, Chromatin remodeling and transcription, Curr.Opin. Genet. Dev. 7 (1997) 182–191.

[54] A. Dvir, J.W. Conaway, R.C. Conaway, Mechanism of transcriptioninitiation and promoter escape by RNA polymerase II, Curr. Opin. Genet.Dev. 11 (2001) 209–214.

[55] M. Kozak, Initiation of translation in prokaryotes and eukaryotes, Gene234 (1999) 187–208.

[56] M. Kunzler, C. Springer, G.H. Braus, The transcriptional apparatusrequired for mRNA encoding genes in the yeast Saccharomycescerevisiae emerges from a jigsaw puzzle of transcription factors, FEMSMicrobiol. Rev. 19 (1996) 117–136.

[57] C.A. Spencer, M. Groudine, Transcription elongation and eukaryoticgene regulation, Oncogene 5 (1990) 777–785.

[58] T. Tamura,Y.Makino, T.Kishimoto,Regulation of gene expression and recentadvance on transcription studies, Nippon Rinsho 53 (1995) 1033–1047.

[59] J.K. Joung, E.I. Ramm, C.O. Pabo, A bacterial two-hybrid selectionsystem for studying protein–DNA and protein–protein interactions, Proc.Natl. Acad. Sci. U. S. A. 97 (2000) 7382–7387.

[60] S.A. Johnston, J.E. Hopper, Isolation of the yeast regulatory gene GAL4and analysis of its dosage effects on the galactose/melibiose regulon,Proc. Natl. Acad. Sci. U. S. A. 79 (1982) 6971–6975.

[61] H. Suzuki, Y. Fukunishi, I. Kagawa, R. Saito, H. Oda, T. Endo, S. Kondo,H. Bono, Y. Okazaki, Y. Hayashizaki, Protein–protein interaction panelusing mouse full-length cDNAs, Genome Res. 11 (2001) 1758–1765.

[62] H. Suzuki, R. Saito, M. Kanamori, C. Kai, C. Schonbach, T. Nagashima,J. Hosaka, Y. Hayashizaki, The mammalian protein–protein interactiondatabase and its viewing system that is linked to the main FANTOM2viewer, Genome Res. 13 (2003) 1534–1541.

[63] R.M. Ewing, P. Chu, F. Elisma, H. Li, P. Taylor, S. Climie, L. McBroom-Cerajewski, M.D. Robinson, L. O'Connor, M. Li, R. Taylor, M. Dharsee,Y. Ho, A. Heilbut, L. Moore, S. Zhang, O. Ornatsky, Y.V. Bukhman, M.Ethier, Y. Sheng, J. Vasilescu, M. Abu-Farha, J.P. Lambert, H.S. Duewel,I.I. Stewart, B. Kuehl, K. Hogue, K. Colwill, K. Gladwish, B. Muskat, R.Kinach, S.L. Adams, M.F. Moran, G.B. Morin, T. Topaloglou, D. Figeys,Large-scale mapping of human protein–protein interactions by massspectrometry, Mol. Syst. Biol. 3 (2007) 89.

[64] R. Aebersold, M. Mann, Mass spectrometry-based proteomics, Nature422 (2003) 198–207.

[65] A.C. Gingras, R. Aebersold, B. Raught, Advances in protein complexanalysis using mass spectrometry, J. Physiol. 563 (2005) 11–21.

[66] B. Blagoev, I. Kratchmarova, S.E. Ong, M. Nielsen, L.J. Foster, M.Mann,A proteomics strategy to elucidate functional protein–protein interactionsapplied to EGF signaling, Nat. Biotechnol. 21 (2003) 315–318.

[67] Y.Ho,A.Gruhler, A.Heilbut, G.D. Bader, L.Moore, S.L.Adams,A.Millar,P. Taylor, K. Bennett, K. Boutilier, L. Yang, C. Wolting, I. Donaldson, S.Schandorff, J. Shewnarane,M.Vo, J. Taggart,M.Goudreault, B.Muskat, C.Alfarano, D. Dewar, Z. Lin, K.Michalickova, A.R. Willems, H. Sassi, P.A.Nielsen, K.J. Rasmussen, J.R. Andersen, L.E. Johansen, L.H. Hansen, H.Jespersen, A. Podtelejnikov, E. Nielsen, J. Crawford, V. Poulsen, B.D.Sorensen, J. Matthiesen, R.C. Hendrickson, F. Gleeson, T. Pawson, M.F.Moran, D. Durocher, M. Mann, C.W. Hogue, D. Figeys, M. Tyers,Systematic identification of protein complexes in Saccharomycescerevisiae by mass spectrometry, Nature 415 (2002) 180–183.

[68] M. Mann, R.C. Hendrickson, A. Pandey, Analysis of proteins and pro-teomes by mass spectrometry, Ann. Rev. Biochem. 70 (2001) 437–473.

[69] D. Lin, D.L. Tabb, J.R. Yates III, Large-scale protein identification usingmass spectrometry, Biochim. Biophys. Acta 1646 (2003) 1–10.

[70] S. Mathivanan, B. Periaswamy, T.K. Gandhi, K. Kandasamy, S. Suresh,R. Mohmood, Y.L. Ramachandra, A. Pandey, An evaluation of humanprotein–protein interaction data in the public domain, BMC Bioinform. 7(Suppl 5) (2006) S19.

[71] J.F. Rual, K. Venkatesan, T. Hao, T. Hirozane-Kishikawa, A. Dricot, N.Li, G.F. Berriz, F.D. Gibbons, M. Dreze, N. Ayivi-Guedehoussou, N.Klitgord, C. Simon, M. Boxem, S. Milstein, J. Rosenberg, D.S. Goldberg,L.V. Zhang, S.L. Wong, G. Franklin, S. Li, J.S. Albala, J. Lim, C.Fraughton, E. Llamosas, S. Cevik, C. Bex, P. Lamesch, R.S. Sikorski, J.Vandenhaute, H.Y. Zoghbi, A. Smolyar, S. Bosak, R. Sequerra, L.Doucette-Stamm, M.E. Cusick, D.E. Hill, F.P. Roth, M. Vidal, Towards aproteome-scale map of the human protein–protein interaction network,Nature 437 (2005) 1173–1178.

[72] G.T. Hart, A.K. Ramani, E.M. Marcotte, How complete are current yeastand human protein-interaction networks? Genome Biol. 7 (2006) 120.

[73] G.M. Euskirchen, J.S. Rozowsky, C.L. Wei, W.H. Lee, Z.D. Zhang, S.Hartman, O. Emanuelsson, V. Stolc, S. Weissman, M.B. Gerstein, Y.Ruan, M. Snyder, Mapping of transcription factor binding regions inmammalian cells by ChIP: comparison of array- and sequencing-basedtechnologies, Genome Res. 17 (2007) 898–909.

[74] N.D. Heintzman, R.K. Stuart, G. Hon, Y. Fu, C.W. Ching, R.D. Hawkins,L.O. Barrera, S. Van Calcar, C. Qu, K.A. Ching, W. Wang, Z. Weng, R.D.Green, G.E. Crawford, B. Ren, Distinct and predictive chromatinsignatures of transcriptional promoters and enhancers in the humangenome, Nat. Genet. 39 (2007) 311–318.

[75] B. Ren, F. Robert, J.J. Wyrick, O. Aparicio, E.G. Jennings, I. Simon, J.Zeitlinger, J. Schreiber, N. Hannett, E. Kanin, T.L. Volkert, C.J. Wilson,S.P. Bell, R.A. Young, Genome-wide location and function of DNAbinding proteins, Science 290 (2000) 2306–2309.

[76] C.L. Wei, Q. Wu, V.B. Vega, K.P. Chiu, P. Ng, T. Zhang, A. Shahab, H.C.Yong, Y. Fu, Z. Weng, J. Liu, X.D. Zhao, J.L. Chew, Y.L. Lee, V.A.Kuznetsov, W.K. Sung, L.D. Miller, B. Lim, E.T. Liu, Q. Yu, H.H. Ng, Y.Ruan, A global map of p53 transcription-factor binding sites in the humangenome, Cell 124 (2006) 207–219.

[77] S. Ghaemmaghami, W.K. Huh, K. Bower, R.W. Howson, A. Belle, N.Dephoure, E.K. O'Shea, J.S. Weissman, Global analysis of proteinexpression in yeast, Nature 425 (2003) 737–741.

[78] C.T. Harbison, D.B. Gordon, T.I. Lee, N.J. Rinaldi, K.D. Macisaac, T.W.Danford, N.M.Hannett, J.B. Tagne, D.B. Reynolds, J. Yoo, E.G. Jennings,J. Zeitlinger, D.K. Pokholok,M. Kellis, P.A. Rolfe, K.T. Takusagawa, E.S.Lander, D.K. Gifford, E. Fraenkel, R.A.Young, Transcriptional regulatorycode of a eukaryotic genome, Nature 431 (2004) 99–104.

[79] T.I. Lee, N.J. Rinaldi, F. Robert, D.T. Odom, Z. Bar-Joseph, G.K. Gerber,N.M. Hannett, C.T. Harbison, C.M. Thompson, I. Simon, J. Zeitlinger,E.G. Jennings, H.L. Murray, D.B. Gordon, B. Ren, J.J. Wyrick, J.B.Tagne, T.L. Volkert, E. Fraenkel, D.K. Gifford, R.A. Young, Transcrip-tional regulatory networks in Saccharomyces cerevisiae, Science 298(2002) 799–804.

[80] C.M. Gorman, Mammalian cell expression, Curr. Opin. Biotechnol. 1(1990) 36–47.

[81] S. Majumder, M.L. DePamphilis, A unique role for enhancers is revealedduring early mouse development, Bioessays 17 (1995) 879–889.

[82] F. Muller, P. Blader, U. Strahle, Search for enhancers: teleost models incomparative genomic and transgenic analysis of cis regulatory elements,Bioessays 24 (2002) 564–572.

[83] L.C. Myers, R.D. Kornberg, Mediator of transcriptional regulation, Ann.Rev. Biochem. 69 (2000) 729–749.

[84] L.A. Pennacchio, E.M. Rubin, Genomic strategies to identify mammalianregulatory sequences, Nat., Rev. Genet. 2 (2001) 100–109.

[85] N. Ahituv, E.M. Rubin, M.A. Nobrega, Exploiting human–fish genomecomparisons for deciphering gene regulation, Hum. Mol. Genet. 13(Spec. No. 2) (2004) R261–R266.

[86] D.C. King, J. Taylor, L. Elnitski, F. Chiaromonte, W. Miller, R.C.Hardison, Evaluation of regulatory potential and conservation scores

Page 12: Review Integrated approaches to uncovering transcription ...enables network identification provided large enough sam-ples are available. Perturbation algorithms have been success-fully

230 K. Tan et al. / Genomics 91 (2008) 219–231

for detecting cis-regulatory modules in aligned mammalian genome se-quences, Genome Res. 15 (2005) 1051–1060.

[87] A. Woolfe, M. Goodson, D.K. Goode, P. Snell, G.K. McEwen, T.Vavouri, S.F. Smith, P. North, H. Callaway, K. Kelly, K. Walter, I.Abnizova, W. Gilks, Y.J. Edwards, J.E. Cooke, G. Elgar, Highlyconserved non-coding sequences are associated with vertebrate develop-ment, PLoS Biol. 3 (2005) e7.

[88] X. Xie, J. Lu, E.J. Kulbokas, T.R. Golub, V. Mootha, K. Lindblad-Toh,E.S. Lander, M. Kellis, Systematic discovery of regulatory motifs inhuman promoters and 3′ UTRs by comparison of several mammals,Nature 434 (2005) 338–345.

[89] S. Prabhakar, F. Poulin, M. Shoukry, V. Afzal, E.M. Rubin, O. Couronne,L.A. Pennacchio, Close sequence comparisons are sufficient to identifyhuman cis-regulatory elements, Genome Res. 16 (2006) 855–863.

[90] G.M. Cooper, E.A. Stone, G. Asimenos, E.D. Green, S. Batzoglou, A.Sidow, Distribution and intensity of constraint in mammalian genomicsequence, Genome Res. 15 (2005) 901–913.

[91] S.R. Eddy, A model of the statistical power of comparative genomesequence analysis, PLoS Biol. 3 (2005) e10.

[92] E.H. Margulies, J.P. Vinson, W. Miller, D.B. Jaffe, K. Lindblad-Toh, J.L.Chang, E.D. Green, E.S. Lander, J.C. Mullikin, M. Clamp, An initialstrategy for the systematic identification of functional elements in thehuman genome by low-redundancy comparative sequencing, Proc. Natl.Acad. Sci. U. S. A. 102 (2005) 4795–4800.

[93] X.S. Liu, D.L. Brutlag, J.S. Liu, An algorithm for finding protein–DNAbinding sites with applications to chromatin-immunoprecipitation micro-array experiments, Nat. Biotechnol. 20 (2002) 835–839.

[94] E.M. Conlon, X.S. Liu, J.D. Lieb, J.S. Liu, Integrating regulatory motifdiscovery and genome-wide expression analysis, Proc. Natl. Acad. Sci.U. S. A. 100 (2003) 3339–3344.

[95] P. Hong, X.S. Liu, Q. Zhou, X. Lu, J.S. Liu, W.H. Wong, A boostingapproach for motif modeling using ChIP–chip data, Bioinformatics 21(2005) 2636–2643.

[96] L.Y. Wang, M. Snyder, M. Gerstein, BoCaTFBS: a boosted cascadelearner to refine the binding sites suggested by ChIP–chip experiments,Genome Biol. 7 (2006) R102.

[97] B.C. Foat, A.V. Morozov, H.J. Bussemaker, Statistical mechanicalmodeling of genome-wide transcription factor occupancy data byMatrixREDUCE, Bioinformatics 22 (2006) e141–e149.

[98] A. Tanay, Extensive low-affinity transcriptional interactions in the yeastgenome, Genome Res. 16 (2006) 962–972.

[99] N.M. Luscombe, M.M. Babu, H. Yu, M. Snyder, S.A. Teichmann, M.Gerstein, Genomic analysis of regulatory network dynamics reveals largetopological changes, Nature 431 (2004) 308–312.

[100] Q. Zhou, W.H. Wong, CisModule: de novo discovery of cis-regulatorymodules by hierarchical mixture modeling, Proc. Natl. Acad. Sci. U. S. A.101 (2004) 12114–12119.

[101] W.W. Wasserman, J.W. Fickett, Identification of regulatory regions whichconfer muscle-specific gene expression, J. Mol. Biol. 278 (1998) 167–181.

[102] S. Aerts, P. Van Loo, Y. Moreau, B. De Moor, A genetic algorithm for thedetection of new cis-regulatory modules in sets of coregulated genes,Bioinformatics 20 (2004) 1974–1976.

[103] E. Segal, R. Sharan, A discriminative model for identifying spatial cis-regulatory modules, J. Comput. Biol. 12 (2005) 822–834.

[104] W. Thompson,M.J. Palumbo,W.W.Wasserman, J.S. Liu, C.E. Lawrence,Decoding human regulatory circuits, Genome Res. 14 (2004) 1967–1974.

[105] M.Gupta, J.S. Liu, De novo cis-regulatorymodule elicitation for eukaryoticgenomes, Proc. Natl. Acad. Sci. U. S. A. 102 (2005) 7079–7084.

[106] O. Johansson, W. Alkema, W.W. Wasserman, J. Lagergren, Identificationof functional clusters of transcription factor binding motifs in genomesequences: the MSCAN algorithm, Bioinformatics 19 (Suppl. 1) (2003)i169–i176.

[107] T.L. Bailey, W.S. Noble, Searching for statistically significant regulatorymodules, Bioinformatics 19 (Suppl. 2) (2003) II16–II25.

[108] M.C. Frith, M.C. Li, Z. Weng, Cluster-Buster: finding dense clusters ofmotifs in DNA sequences, Nucleic Acids Res. 31 (2003) 3666–3668.

[109] S. Sinha, E. van Nimwegen, E.D. Siggia, A probabilistic method to detectregulatory modules, Bioinformatics 19 Suppl. (1) (2003) i292–i301.

[110] W.B. Alkema, O. Johansson, J. Lagergren, W.W. Wasserman, MSCAN:identification of functional clusters of transcription factor binding sites,Nucleic Acids Res. 32 (2004) W195–W198.

[111] D. Kolbe, J. Taylor, L. Elnitski, P. Eswara, J. Li, W.Miller, R. Hardison, F.Chiaromonte, Regulatory potential scores from genome-wide three-wayalignments of human, mouse, and rat, Genome Res. 14 (2004) 700–707.

[112] M. Blanchette, A.R. Bataille, X. Chen, C. Poitras, J. Laganiere, C.Lefebvre, G. Deblois, V. Giguere, V. Ferretti, D. Bergeron, B. Coulombe,F. Robert, Genome-wide computational prediction of transcriptionalregulatory modules reveals new insights into human gene expression,Genome Res. 16 (2006) 656–668.

[113] L. Shi, L.H. Reid,W.D. Jones, R. Shippy, J.A.Warrington, S.C. Baker, P.J.Collins, F. de Longueville, E.S. Kawasaki, K.Y. Lee, Y. Luo, Y.A. Sun,J.C. Willey, R.A. Setterquist, G.M. Fischer, W. Tong, Y.P. Dragan, D.J.Dix, F.W. Frueh, F.M. Goodsaid, D. Herman, R.V. Jensen, C.D.Johnson, E.K. Lobenhofer, R.K. Puri, U. Schrf, J. Thierry-Mieg, C.Wang, M. Wilson, P.K. Wolber, L. Zhang, S. Amur, W. Bao, C.C.Barbacioru, A.B. Lucas, V. Bertholet, C. Boysen, B. Bromley, D.Brown, A. Brunner, R. Canales, X.M. Cao, T.A. Cebula, J.J. Chen, J.Cheng, T.M. Chu, E. Chudin, J. Corson, J.C. Corton, L.J. Croner, C.Davies, T.S. Davison, G. Delenstarr, X. Deng, D. Dorris, A.C. Eklund,X.H. Fan, H. Fang, S. Fulmer-Smentek, J.C. Fuscoe, K. Gallagher, W.Ge, L. Guo, X. Guo, J. Hager, P.K. Haje, J. Han, T. Han, H.C. Harbottle,S.C. Harris, E. Hatchwell, C.A. Hauser, S. Hester, H. Hong, P. Hurban,S.A. Jackson, H. Ji, C.R. Knight, W.P. Kuo, J.E. LeClerc, S. Levy, Q.Z.Li, C. Liu, Y. Liu, M.J. Lombardi, Y. Ma, S.R. Magnuson, B. Maqsodi,T. McDaniel, N. Mei, O. Myklebost, B. Ning, N. Novoradovskaya, M.S.Orr, T.W. Osborn, A. Papallo, T.A. Patterson, R.G. Perkins, E.H. Peters,R. Peterson, et al., The MicroArray Quality Control (MAQC) projectshows inter- and intraplatform reproducibility of gene expressionmeasurements, Nat. Biotechnol. 24 (2006) 1151–1161.

[114] Z. Bar-Joseph, G.K. Gerber, T.I. Lee, N.J. Rinaldi, J.Y. Yoo, F. Robert,D.B. Gordon, E. Fraenkel, T.S. Jaakkola, R.A. Young, D.K. Gifford,Computational discovery of gene modules and regulatory networks, Nat.Biotechnol. 21 (2003) 1337–1342.

[115] F. Gao, B.C. Foat, H.J. Bussemaker, Defining transcriptional networksthrough integrative modeling of mRNA expression and transcriptionfactor binding data, BMC Bioinformatics 5 (2004) 31.

[116] J.C. Liao, R. Boscolo, Y.L. Yang, L.M. Tran, C. Sabatti, V.P.Roychowdhury, Network component analysis: reconstruction of regula-tory signals in biological systems, Proc. Natl. Acad. Sci. U. S. A. 100(2003) 15522–15527.

[117] N. Simonis, J. van Helden, G.N. Cohen, S.J. Wodak, Transcriptionalregulation of protein complexes in yeast, Genome Biol. 5 (2004) R33.

[118] K. Tan, T. Shlomi, H. Feizi, T. Ideker, R. Sharan, Transcriptionalregulation of protein complexes within and across species, Proc. Natl.Acad. Sci. U. S. A. 104 (2007) 1283–1288.

[119] U. de Lichtenberg, L.J. Jensen, S. Brunak, P. Bork, Dynamic complexformation during the yeast cell cycle, Science 307 (2005) 724–727.

[120] T. Ideker, O. Ozier, B. Schwikowski, A.F. Siegel, Discovering regulatoryand signalling circuits in molecular interaction networks, Bioinformatics18 (Suppl. 1) (2002) S233–S240.

[121] C.H. Yeang, T. Ideker, T. Jaakkola, Physical network models, J. Comput.Biol. 11 (2004) 243–262.

[122] A.I. Su, T. Wiltshire, S. Batalov, H. Lapp, K.A. Ching, D. Block, J.Zhang, R. Soden, M. Hayakawa, G. Kreiman, M.P. Cooke, J.R. Walker,J.B. Hogenesch, A gene atlas of the mouse and human protein-encodingtranscriptomes, Proc. Natl. Acad. Sci. U. S. A. 101 (2004) 6062–6067.

[123] S.A. Tomlins, R. Mehra, D.R. Rhodes, X. Cao, L. Wang, S.M.Dhanasekaran, S. Kalyana-Sundaram, J.T. Wei, M.A. Rubin, K.J. Pienta,R.B. Shah, A.M. Chinnaiyan, Integrative molecular concept modeling ofprostate cancer progression, Nat. Genet. 39 (2007) 41–51.

[124] E. Segal, N. Friedman, N. Kaminski, A. Regev, D. Koller, Fromsignatures to models: understanding cancer using microarrays, Nat.Genet. 37 (2005) S38–S45 (Suppl).

[125] M. Gilchrist, V. Thorsson, B. Li, A.G. Rust, M. Korb, K. Kennedy, T. Hai,H. Bolouri, A. Aderem, Systems biology approaches identify ATF3 as anegative regulator of Toll-like receptor 4, Nature 441 (2006) 173–178.

Page 13: Review Integrated approaches to uncovering transcription ...enables network identification provided large enough sam-ples are available. Perturbation algorithms have been success-fully

231K. Tan et al. / Genomics 91 (2008) 219–231

[126] S.E. Calvano, W. Xiao, D.R. Richards, R.M. Felciano, H.V. Baker, R.J.Cho, R.O. Chen, B.H. Brownstein, J.P. Cobb, S.K. Tschoeke, C. Miller-Graziano, L.L. Moldawer, M.N. Mindrinos, R.W. Davis, R.G. Tompkins,S.F. Lowry, A network-based analysis of systemic inflammation inhumans, Nature 437 (2005) 1032–1037.

[127] N.C. Duarte, S.A. Becker, N. Jamshidi, I. Thiele, M.L. Mo, T.D. Vo, R.Srivas, B.O. Palsson, Global reconstruction of the human metabolicnetwork based on genomic and bibliomic data, Proc. Natl. Acad. Sci.U. S. A. 104 (2007) 1777–1782.

[128] A. Ma'ayan, R.D. Blitzer, R. Iyengar, Toward predictive models ofmammalian cells, Annu. Rev. Biophys. Biomol. Struct. 34 (2005)319–349.

[129] M. Benson, L. Carlsson, G. Guillot, M. Jernas, M.A. Langston, M.Rudemo, B. Andersson, A network-based analysis of allergen-challengedCD4+T cells from patients with allergic rhinitis, Genes Immun. 7 (2006)514–521.

[130] H. Yu, N.M. Luscombe, H.X. Lu, X. Zhu, Y. Xia, J.D. Han, N. Bertin, S.Chung, M. Vidal, M. Gerstein, Annotation transfer between genomes:protein–protein interologs and protein–DNA regulogs, Genome Res. 14(2004) 1107–1118.

[131] G.S. Sellick, L.R. Goldin, R.W. Wild, S.L. Slager, L. Ressenti, S.S. Strom,M.J. Dyer, F.R. Mauro, G.E. Marti, S. Fuller, M. Lyttelton, T.J. Kipps, M.J.Keating, T.G. Call, D. Catovsky, N. Caporaso, R.S. Houlston, A high-density SNP genome-wide linkage search of 206 families identifiessusceptibility loci for chronic lymphocytic leukemia, Blood 110 (9)(Nov 1) (2007) 3326–3333.

[132] R. Saxena, B.F. Voight, V. Lyssenko, N.P. Burtt, P.I. de Bakker, H. Chen,J.J. Roix, S. Kathiresan, J.N. Hirschhorn, M.J. Daly, T.E. Hughes, L.Groop, D. Altshuler, P. Almgren, J.C. Florez, J. Meyer, K. Ardlie, K.Bengtsson Bostrom, B. Isomaa, G. Lettre, U. Lindblad, H.N. Lyon, O.Melander, C. Newton-Cheh, P. Nilsson, M. Orho-Melander, L. Rastam,E.K. Speliotes, M.R. Taskinen, T. Tuomi, C. Guiducci, A. Berglund, J.Carlson, L. Gianniny, R. Hackett, L. Hall, J. Holmkvist, E. Laurila, M.Sjogren, M. Sterner, A. Surti, M. Svensson, M. Svensson, R. Tewhey, B.Blumenstiel, M. Parkin, M. Defelice, R. Barry, W. Brodeur, J. Camarata,N. Chia, M. Fava, J. Gibbons, B. Handsaker, C. Healy, K. Nguyen, C.Gates, C. Sougnez, D. Gage, M. Nizzari, S.B. Gabriel, G.W. Chirn, Q.Ma, H. Parikh, D. Richardson, D. Ricke, S. Purcell, Genome-wideassociation analysis identifies loci for type 2 diabetes and triglyceridelevels, Science 316 (2007) 1331–1336.