

LARGE-SCALE BIOLOGY ARTICLE

A Functional and Evolutionary Perspective on Transcription Factor Binding in Arabidopsis thaliana C W

Ken S. Heyndrickx, a,b,1 Jan Van de Velde, a,b,1 Congmao Wang, c Detlef Weigel, c and Klaas Vandepoele a,b,2

a Department of Plant Systems Biology, VIB, 9052 Gent, Belgium
b Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Gent, Belgium
c Department of Molecular Biology, Max Planck Institute for Developmental Biology, 72076 Tübingen, Germany

ORCID IDs: 0000-0001-5831-0536 (K.S.H.); 0000-0001-7742-1266 (J.V.d.V.); 0000-0002-2114-7963 (D.W.); 0000-0003-4790-2725 (K.V.)

Understanding the mechanisms underlying gene regulation is paramount to comprehend the translation from genotype to phenotype. The two are connected by gene expression, and it is generally thought that variation in transcription factor (TF) function is an important determinant of phenotypic evolution. We analyzed publicly available genome-wide chromatin immunoprecipitation experiments for 27 TFs in Arabidopsis thaliana and constructed an experimental network containing 46,619 regulatory interactions and 15,188 target genes. We identified hub targets and highly occupied target (HOT) regions, which are enriched for genes involved in development, stimulus responses, signaling, and gene regulatory processes in the currently profiled network. We provide several lines of evidence that TF binding at plant HOT regions is functional, in contrast to that in animals, and not merely the result of accessible chromatin. HOT regions harbor specific DNA motifs, are enriched for differentially expressed genes, and are often conserved across crucifers and dicots, even though they are not under higher levels of purifying selection than non-HOT regions. Distal bound regions are under purifying selection as well and are enriched for a chromatin state showing regulation by the Polycomb repressive complex. Gene expression complexity is positively correlated with the total number of bound TFs, revealing insights into the regulatory code for genes with different expression breadths. The integration of noncanonical and canonical DNA motif information yields new hypotheses on cobinding and tethering between specific TFs involved in flowering and light regulation.

INTRODUCTION

Unraveling the mechanisms underlying gene regulation is an important premise to understand how the genotype is translated into a functional organism. Transcriptional regulation by transcription factors (TFs) is one of the most investigated mechanisms, as it can be considered the primary level of regulation (Wray et al., 2003). The emergence of chromatin immunoprecipitation (ChIP) followed by genome-wide readout through microarray (ChIP-chip) or deep sequencing (ChIP-Seq) has stimulated the experimental identification and comprehensive characterization of target genes bound by a specific TF (Ren et al., 2000; Johnson et al., 2007). Studying a single TF using ChIP (henceforth referring to both ChIP-chip and ChIP-Seq) is already valuable to examine its DNA binding motif, identify putative target genes, and unravel its biological role through the functional analysis of its targets. Going further, the integration of complementary functional genomics data sets has the potential to provide insights regarding the bound DNA and the mechanisms underlying coregulation by multiple TFs.

While these genome-wide approaches can open many interesting avenues for subsequent studies, the biological interpretation of ChIP studies involves a number of important challenges. First, ChIP data have revealed only weak correlation between TF binding and transcriptional regulation of the potential target genes (Lee et al., 2007). Possible explanations are the dependency on other condition-specific factors, such as cofactors or chromatin remodeling, for the correct regulation of the target gene, or that many of the observed binding events are nonfunctional. In the latter case, such binding events are suggested to be the result of passive thermodynamics instead of active recruitment (MacArthur et al., 2009), and nonfunctional binding events have been linked with highly bound genes (hub target genes) and highly occupied target (HOT) regions (bound by many TFs) in the worm Caenorhabditis elegans and in the yeast Saccharomyces cerevisiae (Teytelman et al., 2013; Van Nostrand and Kim, 2013). Second, some TF-bound regions show enrichment for multiple different DNA sequence motifs, complicating the identification of directly regulated targets. In regions of the genome of Arabidopsis thaliana bound by SEPALLATA3 (SEP3), a TF involved in flower development, enrichment was found for five known TF sequence motifs (Kaufmann et al., 2009). Multiple enriched DNA binding motifs in a ChIP data set can be a sign of cooperative binding by multiple TFs, or of tethering, where the profiled TF associates with the chromatin through a protein-protein interaction with a second TF.

1 These authors contributed equally to this work.
2 Address correspondence to [email protected].
The author responsible for distribution of materials integral to the findings presented in this article in accordance with the policy described in the Instructions for Authors (www.plantcell.org) is: Klaas Vandepoele ([email protected]).
C Some figures in this article are displayed in color online but in black and white in the print edition.
W Online version contains Web-only data.
www.plantcell.org/cgi/doi/10.1105/tpc.114.130591

The Plant Cell, Vol. 26: 3894–3910, October 2014, www.plantcell.org © 2014 American Society of Plant Biologists. All rights reserved.

Some of the first integrative regulatory studies were in the context of the ModENCODE and ENCODE projects in C. elegans (Gerstein et al., 2010; Cheng et al., 2011; Van Nostrand and Kim, 2013), Drosophila melanogaster (Roy et al., 2010; Nègre et al., 2011), and Homo sapiens (Bernstein et al., 2012; Gerstein et al., 2012; Wang et al., 2012). Information on protein-protein interactions, microRNA (miRNA)-target interactions, and gene expression profiles has been harnessed for the identification of master regulators and network motifs (Cheng et al., 2011; Gerstein et al., 2012) and for inferring gene regulatory networks and predictive models of gene expression levels of target genes (Marbach et al., 2012; Van Nostrand and Kim, 2013). Ferrier et al. (2011) and Mejia-Guerra et al. (2012) have already generated an overview of the available TF profiling studies in Arabidopsis. They also listed several challenges related to unraveling TF binding complexity in plants; however, an integrated experimental gene regulatory network describing cooperative TF binding events in plants is currently missing (Ferrier et al., 2011; Mejia-Guerra et al., 2012).

Here, an integrative study of 27 genome-wide TF profiling experiments containing 15,188 potential target genes in Arabidopsis is presented, in combination with complementary TF perturbation information, chromatin states, population genomic data, and various functional data sets. We study the organization and mechanisms underlying TF regulation and uncover the following insights in transcriptional regulation in plants: (1) Grouping potential target genes into modules of functionally related genes offers, complementary to filtering potential target genes using DNA motifs, a valuable approach to identify TF-regulated genes, and provides a computational alternative to differentially expressed genes obtained through TF perturbation experiments. (2) TF binding is organized in distinct islands across the genome that correlate well with DNase I hypersensitive (DH) sites. TF-bound regions have different levels of complexity, ranging from being bound by a single TF to more than half of the profiled TFs. (3) Hub potential target genes are enriched for functions related to signaling and regulation, responses to stimuli, and development and are examples of complexly bound genes. Furthermore, through the integration of miRNA and kinase networks, we confirmed that TFs themselves are complexly targeted through several mechanisms. (4) Broad expression and high gene expression levels are correlated with complex regulation by many TFs, offering insights into how transcriptional control for genes expressed under numerous conditions is encoded in the genome. (5) Cross-species sequence conservation, population sequence diversity, and chromatin states of the bound regions together with functional analysis of the potential target genes indicate that HOT regions are functional and do not reflect spurious binding events due to open chromatin. This pattern is different from results in animals, where it has been reported that HOT-associated genes are less likely to be regulated than other genes. (6) Overlap with chromatin states links a subset of distal upstream bound regions to binding events under regulation by the Polycomb complex, an important repressor complex in plant development. (7) For several TFs, a large number of DNA binding events are associated with noncanonical motifs, generating new testable hypotheses of cobinding TFs and TFs associating with chromatin through tethering.

RESULTS

Construction of an Experimental Arabidopsis Gene Regulatory Network through the Integration of TF ChIP Experiments

At the start of our study, 34 ChIP experiments had been performed in Arabidopsis using the Affymetrix Tiling Array or short read sequencing, profiling 30 different TFs (Table 1). These factors are primarily involved in flowering (AGAMOUS-LIKE15 [AGL15], APETALA1 [AP1], AP2, AP3, SEP3, SCHLAFMUTZE [SMZ], SUPPRESSOR OF OVEREXPRESSION OF CO1 [SOC1], SHORT VEGETATIVE PHASE [SVP], PISTILLATA [PI], LEAFY [LFY], FLOWERING LOCUS C [FLC], WUSCHEL [WUS], FOUR LIPS/MYB88 [FLP/MYB88], and FLOWERING LOCUS M [FLM]), circadian rhythm and light response (PSEUDO RESPONSE REGULATOR5 [PRR5], PRR7, SOC1, TIMING OF CAB EXPRESSION1 [TOC1], PHYTOCHROME INTERACTING FACTOR3 [PIF3], PIF4, PIF5, REVOLUTA [REV], and FAR-RED ELONGATED HYPOCOTYLS3 [FHY3]), cell cycle (WUS), hormone signaling (BRI1-EMS-SUPPRESSOR1 [BES1] and ETHYLENE-INSENSITIVE3 [EIN3]), and other aspects of development (GLABRA1 [GL1], GL3, GT2-LIKE1 [GTL1], FUSCA3 [FUS3], ABORTED MICROSPORES [AMS], and ETHYLENE RESPONSE FACTOR115 [ERF115]). To create comparable data sets, we developed an analysis pipeline consisting of quality control, platform-specific signal processing, and peak calling to reprocess all raw data in a standardized and uniform manner (see Methods; Supplemental Figure 1). Thus, the integrated network comprised 27 unique TFs binding near 15,188 potential target genes, covering 46,619 unique TF-target interactions (Figure 1). For the remainder of this article, we use the terms potential target genes or bound genes for genes that were associated with a TF binding event. Genes that are bound and display differential expression (DE) upon perturbation of the TF will be referred to as TF-regulated genes. The TFs for which DE data were available are listed in Supplemental Table 1.

Genome-wide ChIP experiments can lead to the identification of many potential target genes, some of which have no known functional association with the TF. The integration of DE data, which results in a set of high-confidence, directly regulated target genes, is often used to filter ChIP data. However, TF binding can also be part of a strategy to poise the promoter for fast response to subsequent other signals that lead to a transcriptional response of the target gene. In the latter case, there would be no DE response of the potential target gene in the perturbation experiment (Para et al., 2014). Therefore, as an alternative to using TF perturbation in a single condition, we sought potential target genes that show functional coherence, a sign of bona fide regulated genes. False positive potential target genes will not show functional coherence with other potential target genes, in contrast to genuine regulated genes (Lindemose et al., 2014). To delineate functionally coherent subsets of bound genes per TF, the enrichment of potential targets was determined in 1563 functional gene modules (Heyndrickx and Vandepoele, 2012). The latter comprise 13,142 genes annotated with specific functional descriptions based on coexpression, experimental Gene Ontology (GO) information, experimental protein-protein interaction data, protein-DNA interactions described in AtRegNet (Palaniswamy et al., 2006), or AraNet gene function predictions (Lee et al., 2010).
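The enrichment of a TF's potential targets in a functional module can be illustrated with a one-sided hypergeometric test. This is only a sketch under stated assumptions: the section does not specify which statistical test or multiple-testing correction was used, and the function and parameter names below are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_module, n_targets, n_overlap):
    """One-sided hypergeometric P(X >= n_overlap).

    Assumption (not stated in the text): the n_targets potential target
    genes of a TF are treated as a draw without replacement from
    n_universe genes, of which n_module belong to the functional module.
    """
    upper = min(n_module, n_targets)
    total = comb(n_universe, n_targets)
    tail = sum(
        comb(n_module, k) * comb(n_universe - n_module, n_targets - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy numbers for illustration only (not taken from the article):
# 10 genes in total, 4 in the module, 3 bound by the TF, 2 of them in the module.
p_value = hypergeom_enrichment_p(10, 4, 3, 2)
```

In practice such a test would be repeated for every TF and module pair, so the resulting P values would need correction for multiple testing before any module is called enriched.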

The benefits of this strategy are illustrated by the finding that potential target genes are greatly enriched for DE genes in 10 out of 15 ChIP experiments for which DE data are available and for which >20% of the potential target genes are in modules (Supplemental Figure 2). For an additional four experiments (LFY, FHY3, PI, and AP1), the effect was marginal. The applicability of this approach is by definition dependent on the presence of the potential target genes in the functional gene modules. For GTL1, GL1, BES1, GL3, PIF3, and GL3, there was support for less than 20% of the potential target genes and these were concentrated in very few modules, leading to ineffective subselection.

In addition to the functional module enrichment, de novo motif finding using Peak-Motifs (Thomas-Chollier et al., 2012) was performed on the sequences underneath the bound regions identified after peak calling. Selecting for potential target genes that are associated with a peak containing a significant DNA motif is based on the fact that most TFs are thought to bind at specific DNA sequences, although some bind through protein-protein interactions with other DNA binding factors (Li et al., 2012). The motif-based subset improved the enrichment for DE genes, albeit less consistently than the enrichment in functional modules. The combination of both criteria led to an additional gain in enrichment for some experiments (SOC1 ChIP-Seq, FUS3, PIF5, GL3, both LFY experiments, FHY3, AP3, PI, PRR5, and both AP2 experiments; Supplemental Figure 2). We conclude that the selection of potential target genes based on enrichment in functional modules, and to a lesser extent DNA motif enrichment, complements TF perturbation data to filter genome-wide ChIP data sets toward TF-regulated genes.

We made use of the results described above to extract high-confidence subnetworks. In the multiple-evidences (ME) network, a TF-target gene interaction is kept only when it has additional support of (1) DE or the complementary approach of the functional modules or (2) a significantly enriched DNA motif.

Table 1. Arabidopsis TF ChIP Data Sets Used

| TF (AGI locus) | TF Name | Method | Tissue | Replicates | Reference | Included in Analysis |
|---|---|---|---|---|---|---|
| AT1G14350/AT2G02820 | FLP/MYB88 | ChIP-chip | 10-d-old seedlings | Yes | Xie et al. (2010) | Yes |
| AT5G13790 | AGL15 | ChIP-chip | Embryonic culture | Yes | Zheng et al. (2009) | Yes |
| AT5G41315 | GL3 | ChIP-chip | 3-Week-old green tissue | Yes | Morohashi and Grotewold (2009) | Yes |
| AT3G27920 | GL1 | ChIP-chip | 3-Week-old green tissue | Yes | Morohashi and Grotewold (2009) | Yes |
| AT4G36920 | AP2 | ChIP-chip | Young inflorescences | Yes | Yant et al. (2010) | Yes |
| AT1G24260 | SEP3 | ChIP-chip | 5-Week-old inflorescences | Yes | Kaufmann et al. (2009) | Yes |
| AT2G17950 | WUS | ChIP-chip | Seedling apices | Yes | Busch et al. (2010) | Yes |
| AT3G54990 | SMZ | ChIP-chip | 9-d-old seedlings | Yes | Mathieu et al. (2009) | Yes |
| AT1G19350 | BES1 | ChIP-chip | 14-d-old seedlings | Yes | Yu et al. (2011) | Yes |
| AT2G45660 | SOC1 | ChIP-chip | 9-d-old seedlings | Yes | Tao et al. (2012) | Yes |
| AT2G22540 | SVP | ChIP-chip | 9-d-old seedlings | Yes | Tao et al. (2012) | Yes |
| AT5G61850 | LFY | ChIP-chip | 9-d-old seedlings | Yes | Winter et al. (2011) | Yes |
| AT3G26790 | FUS3 | ChIP-chip | Embryonic culture | Yes | Wang and Perry (2013) | Yes |
| AT1G33240 | GTL1 | ChIP-chip | 2-Week-old whole aerial tissues | Yes | Breuer et al. (2012) | Yes |
| AT2G16910 | AMS | ChIP-Seq | Flower buds | No | Wang et al. (2010) | No |
| AT4G36920 | AP2 | ChIP-Seq | Young inflorescences | Yes | Yant et al. (2010) | Yes |
| AT1G69120 | AP1 | ChIP-Seq | 4-Week-old inflorescences | Yes | Kaufmann et al. (2010) | Yes |
| AT1G24260 | SEP3 | ChIP-Seq | 5-Week-old inflorescences | Yes | Kaufmann et al. (2009) | Yes |
| AT3G22170 | FHY3 | ChIP-Seq | 4-d-old seedling | No | Ouyang et al. (2011) | Yes |
| AT5G60690 | REV | ChIP-Seq | 10-d-old seedlings | No | Brandt et al. (2012) | No |
| AT5G61850 | LFY | ChIP-Seq | 15-d-old seedlings | Yes | Moyroud et al. (2011) | Yes |
| AT2G43010 | PIF4 | ChIP-Seq | 14-d-old seedlings | No | Oh et al. (2012) | Yes |
| AT3G59060 | PIF5 | ChIP-Seq | 10-d-old seedlings | No | Hornitschek et al. (2012) | Yes |
| AT5G10140 | FLC | ChIP-Seq | 12-d-old seedlings | No | Deng et al. (2011) | Yes |
| AT5G61380 | TOC1 | ChIP-Seq | 14-d-old seedlings | No | Huang et al. (2012) | Yes |
| AT2G45660 | SOC1 | ChIP-Seq | 15-d-old shoot apices | Yes | Immink et al. (2012) | Yes |
| AT5G24470 | PRR5 | ChIP-Seq | Whole plants | No | Nakamichi et al. (2012) | Yes |
| AT3G54340 | AP3 | ChIP-Seq | Stage 5 floral buds | No | Wuest et al. (2012) | Yes |
| AT5G20240 | PI | ChIP-Seq | Stage 5 floral buds | No | Wuest et al. (2012) | Yes |
| AT5G07310 | ERF115 | ChIP-Seq | Cell culture | No | Heyman et al. (2013) | Yes |
| AT1G09530 | PIF3 | ChIP-Seq | 2-d-old seedlings | Yes | Zhang et al. (2013) | Yes |
| AT5G02810 | PRR7 | ChIP-Seq | 14-d-old seedlings | No | Liu et al. (2013) | Yes |
| AT1G77080 | FLM | ChIP-Seq | 15-d-old seedlings | Yes | Posé et al. (2013) | Yes |
| AT3G20770 | EIN3 | ChIP-Seq | 3-d-old seedlings | Yes | Chang et al. (2013) | Yes |


The high-confidence (HC) network is filtered for TF-target gene interactions that are supported by both (1) and (2). Whereas the ME network contains all 27 TFs and 10,990 potential target genes (30,072 interactions; Supplemental Figure 3A), the HC network is reduced to 25 TFs and 3957 potential target genes (8872 interactions; Supplemental Figure 3B). The experiments described in this article were performed on these networks in addition to the complete network, and unless mentioned otherwise, results were found to be robust in the subnetworks.
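The ME and HC selection rules can be written as a small boolean filter. The sketch below assumes that each TF-target interaction carries three yes/no evidence flags; the function name and flag names are hypothetical and not part of the published pipeline.

```python
def network_membership(de_support, module_support, motif_support):
    """Return (in_ME, in_HC) for one TF-target gene interaction.

    Criterion (1): differential expression OR functional-module support.
    Criterion (2): a significantly enriched DNA motif in the bound peak.
    ME keeps interactions satisfying (1) or (2); HC requires both.
    """
    criterion_1 = de_support or module_support
    criterion_2 = motif_support
    in_me = criterion_1 or criterion_2
    in_hc = criterion_1 and criterion_2
    return in_me, in_hc


# Hypothetical interactions: (TF, target, DE, module, motif)
interactions = [
    ("TF_A", "gene1", True, False, True),    # ME and HC
    ("TF_A", "gene2", False, True, False),   # ME only
    ("TF_B", "gene3", False, False, False),  # complete network only
]
me_network = [i[:2] for i in interactions if network_membership(*i[2:])[0]]
hc_network = [i[:2] for i in interactions if network_membership(*i[2:])[1]]
```

Under this reading every HC interaction is also an ME interaction, which is consistent with the HC network being the smaller of the two.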

The entire set of peak-called regions can be accessed and downloaded at http://bioinformatics.psb.ugent.be/cig_data/RegNet/ (see Methods). The GenomeView (Abeel et al., 2012) visualization also includes the DH sites (Zhang et al., 2012) discussed below.

TF Binding Properties

There are large differences in the number of potential target genes for different TFs, ranging from 56 (WUS) to 6790 (AGL15) (Figure 1A). While some of this variability might arise from the different experimental conditions, the similarity in the number of potential target genes for TFs that have been profiled using both ChIP-chip and ChIP-Seq indicates that those effects are minor. More important than the overall number of potential target genes is the type of genes that are bound (Supplemental Figure 4A), and more specifically, the number of potential target genes that are gene expression regulators (TFs or miRNAs; Supplemental Figure 4B). The highest fraction of regulators among potential target genes is 18% (for FLC). The fraction gradually lowers to 6% (for GL1), but given the sigmoidal shape of the distribution, the majority of TFs have around 12 to 14% potential target genes that are regulators (compared with the expected 6%). With regard to transcriptional regulation of miRNAs, the fraction of bound miRNAs ranges from 0 to 1.8% (for FUS3). Among the miRNAs that are found as potential target genes of TFs, we find known flowering regulators such as miR172 and miR156 (Higgins et al., 2010). Thus, this network will also be a valuable resource to investigate transcriptional regulation of miRNAs in flowering.

Figure 1. Number of Potential Target Genes per TF and the Distribution of ChIP Peaks across Different Types of Genomic Regions.

(A) Fractions of peaks in each genomic region.
(B) Enrichment/depletion heat map of peak frequencies per genomic region compared with random peaks. The colored bars represent the fractions of peaks in each of the genomic regions (y axis). The exact number of potential target genes is given in the labels at the bottom as n. TFs are ordered following the hierarchical clustering based on potential target gene overlap.


A second important difference between TFs is the distribution of the types of bound genomic regions and how this compares against a random experiment (Figures 1A and 1B). Based on the function of TFs in transcriptional regulation, we would expect to see the majority of binding sites in close proximity of the potential target genes. Although most TFs exhibit depletion of exonic binding (Figure 1B), there are TFs with a substantial amount of intragenic binding in exons (WUS, GL1, FUS3, TOC1, GL3, ERF115, BES1, FHY3, AP3, PI, PRR5, AP2, and PRR7). To ensure that the differences in binding distribution between TFs were not an effect of assigning a bound region based on its 1-bp peak summit, the observed distributions were confirmed based on the overlap using the entire peak regions (Supplemental Figure 5). The robustness of TF binding sites in codons in the ME and HC subnetworks (Supplemental Figure 3) confirms their relevance. They might be instances of what has been termed dual-use codons in plants (Stergachis et al., 2013).

Concerning the position of the binding sites with respect to the gene, we observed that 57 and 28% of the binding events are upstream and downstream, respectively, of the potential target gene. Overall, 89% (23,891/26,717) of all upstream binding sites are within 2 kb of the transcription start site (73% in the 1-kb promoter). At the 3′ end of the gene, 91% (11,687/12,828) of all binding sites are within 2 kb and 72% within 1 kb from the transcription stop. The highest fraction of binding for all TFs is close to the transcription start site (Supplemental Figure 6). To group TFs having similar binding profiles within a locus context, we clustered binding information for the different TFs. Whereas for some TFs binding is restricted to a small region around the gene body (see clusters 1, 6, and 7 in Supplemental Figure 6), the binding landscape of clusters 2, 4, and 5 is more diffuse across the 2-kb upstream region (e.g., AP1). SVP (cluster 3) is unique based on the fact that it is the only TF in the data set with substantial binding at 300 to 400 bp downstream of the transcription termination site.
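As an illustration of how such positional fractions can be derived, the following sketch computes the strand-aware distance of a peak summit to the transcription start site and the share of upstream sites within a window. The coordinate convention and function names are assumptions made for this example, not a description of the authors' annotation procedure.

```python
def upstream_distance(summit, tss, strand):
    """Distance in bp of a peak summit upstream of the TSS.

    Positive values lie upstream of the TSS, negative values downstream
    of it; strand is "+" or "-" (assumed convention).
    """
    return tss - summit if strand == "+" else summit - tss


def fraction_within(distances, window):
    """Fraction of upstream sites (distance > 0) lying within `window` bp."""
    upstream = [d for d in distances if d > 0]
    return sum(d <= window for d in upstream) / len(upstream)


# The percentages quoted in the text follow from the reported counts.
pct_upstream_2kb = round(100 * 23891 / 26717)    # 89
pct_downstream_2kb = round(100 * 11687 / 12828)  # 91
```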

Detection of Hub Targets and HOT Regions

To estimate the complexity of gene regulation in the network, all TF-target gene interactions were integrated for the 27 unique TFs. The majority (63%) of the potential target genes are bound by more than one TF (Figure 2A), but the number of genes decreases rapidly for an increasing number of bound TFs, reaching a maximum of 18 bound TFs per potential target gene. The distribution itself best fits an exponential, seen as a linear relation in a log-y scale (top inset, Figure 2A), instead of the more commonly described power law (which would be linear in a log-log scale; bottom inset). In a network context, hub genes are attributed the important function of providing crosstalk between different processes (Barabási and Oltvai, 2004). To delineate the hub genes in the ChIP gene regulatory network, a random TF-gene target distribution was built (Figure 2A) by randomizing the relationships between TFs and potential target genes while preserving the number of potential target genes per TF (Marbach et al., 2012). Based on the 99th percentile values of the randomized distributions, we defined the 1174 potential target genes that are bound by eight TFs or more as target hubs. Non-hub genes include all other genes.
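The randomization used to set the hub threshold can be sketched as follows. The sketch assumes that each TF's targets are redrawn uniformly without replacement from the set of bound genes and that the threshold is read from the pooled randomized counts; the exact procedure is described in Methods and may differ, and all names here are hypothetical.

```python
import random
from collections import Counter


def randomized_indegrees(targets_per_tf, genes, rng):
    """Number of bound TFs per gene in one randomized network.

    Each TF keeps its original number of targets, but the targets are
    drawn uniformly without replacement from `genes` (assumed null model).
    """
    counts = Counter()
    for n_targets in targets_per_tf.values():
        counts.update(rng.sample(genes, n_targets))
    return [counts[g] for g in genes]


def percentile_threshold(targets_per_tf, genes, n_random=1000, q=0.99, seed=0):
    """q-th percentile of bound-TF counts pooled over n_random randomizations."""
    rng = random.Random(seed)
    pooled = []
    for _ in range(n_random):
        pooled.extend(randomized_indegrees(targets_per_tf, genes, rng))
    pooled.sort()
    return pooled[min(len(pooled) - 1, int(q * len(pooled)))]


# Toy network for illustration: 3 TFs with 4, 2, and 1 targets among 20 genes.
toy_genes = [f"gene{i}" for i in range(20)]
toy_threshold = percentile_threshold({"TF_A": 4, "TF_B": 2, "TF_C": 1},
                                     toy_genes, n_random=100)
```

Genes whose observed number of bound TFs exceeds what this null model produces would then be candidate hubs; the article reports a cutoff of eight or more TFs for the real network.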

In complement to the hub target genes, we delineated HOT regions in the genome as regions in which many TFs bind. HOT regions differ from hub genes as the hub genes can be bound by many TFs each binding at a different position (Figures 2C and 2D). To delineate HOT regions, all peak-called regions from all 27 TFs were merged (see Methods) and collapsed. To avoid chaining of multiple single-bound regions into long stretches based on limited overlap, all peaks were trimmed to regions of 235 bp at each side of the summit (unless original regions were shorter), which is the average length of all peaks (Supplemental Figure 7A). This resulted in conservative “merged regions” with a median length of 349 bp that were used to identify HOT regions (Figure 2C; Supplemental Figure 7B). The region occupancy followed an exponential curve, where ~44% of the regions are bound by more than one TF (Figure 2B). A total of 1185 HOT regions were defined as those being bound by seven or more TFs. Non-HOT regions include all other merged regions.

Whereas hub genes measure TF complexity at the level of the target gene, HOT regions define how many TFs bind to the same region at such close proximity that the ChIP peaks could not be discerned from each other. Similar to peak annotation of individual binding events, each HOT region is assigned to the closest gene to obtain the potential target genes associated with HOT regions. Based on the two gene lists, we observe that of the 1174 hub genes, 355 (30%) are not associated with HOT regions because of the TFs binding at different regions (Figure 2D). The distributions are robust in the ME and HC subnetworks (Supplemental Figure 8).
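The trimming and merging steps described above can be sketched as a simple interval merge. The 235-bp flank and the cutoff of seven TFs come from the text; the merging rule used here (any overlap between trimmed peaks on the same chromosome) and the data layout are assumptions, since the precise criteria are given in Methods.

```python
def trim_peak(start, end, summit, flank=235):
    """Trim a peak to `flank` bp on each side of its summit, unless shorter."""
    return max(start, summit - flank), min(end, summit + flank)


def merge_regions(peaks, flank=235):
    """Merge trimmed peaks that overlap on the same chromosome.

    peaks: iterable of (chrom, start, end, summit, tf) tuples.
    Returns a list of dicts with the merged interval and the set of bound TFs.
    """
    trimmed = sorted((c, *trim_peak(s, e, m, flank), tf) for c, s, e, m, tf in peaks)
    merged = []
    for chrom, start, end, tf in trimmed:
        last = merged[-1] if merged else None
        if last and last["chrom"] == chrom and start <= last["end"]:
            last["end"] = max(last["end"], end)
            last["tfs"].add(tf)
        else:
            merged.append({"chrom": chrom, "start": start, "end": end, "tfs": {tf}})
    return merged


def hot_regions(merged, min_tfs=7):
    """Merged regions bound by at least `min_tfs` distinct TFs."""
    return [r for r in merged if len(r["tfs"]) >= min_tfs]


# Hypothetical peaks for illustration only.
example = merge_regions([
    ("1", 100, 1000, 500, "TF_A"),
    ("1", 700, 900, 800, "TF_B"),
    ("1", 2000, 2100, 2050, "TF_C"),
    ("2", 300, 400, 350, "TF_A"),
])
```

Counting distinct TFs rather than peaks per merged region avoids inflating occupancy when the same TF was profiled in more than one experiment.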

Target Hubs Are Enriched for Regulatory Genes

Through the integration of different data sets, the regulatory complexity was also functionally investigated. Hub genes are significantly enriched for genes involved in stimulus responses, development, signaling, and process regulation. No enrichment for these GO terms was found in the non-hub genes nor in a more specific set of low-complexity genes, defined as potential target genes bound by one or two TFs. While these processes are enriched in hub genes in the currently profiled network, it will be important to see whether this pattern is confirmed in other subsets of the complete Arabidopsis transcriptional network.

To further explore the functional properties of hub genes, other gene function information was collected, including all TFs from AGRIS, miRNAs from “AthaMap MicroRNA targets” (Bülow et al., 2012), embryo-lethal genes (Meinke et al., 2008), and the set of kinases described by PhosPhAt (Zulawski et al., 2013). Although there is a significant enrichment for TFs in the entire set of potential target genes, the enrichment is dependent on the level of target complexity: There is a significant 3-fold enrichment of TFs in hubs, while they are significantly underrepresented among genes bound by fewer than three TFs (fold enrichment = 0.87). Similarly, there is a significant enrichment for kinases in the hub genes (fold enrichment = 3.15). No enrichment could be found for miRNAs or embryo-lethal genes among the hub genes.

3898 The Plant Cell

Figure 2. Organization of Hub Genes and HOT Regions.

(A) and (B) Histogram of the number of bound TFs per potential target gene (A) and per peak region (B). The black line is the cumulative number of potential target genes, and the gray band is the ensemble of 1000 random distributions. The insets are log or double log-transformed representations of the same data.
(C) Four examples of peak region merging.

In addition to evaluating the enrichment of miRNAs and kinases in the set of TF hubs, we determined hub target genes of the miRNAs and kinases in their respective networks in the same manner as in the TF network (Supplemental Figure 9). Kinase hub targets are defined as being phosphorylated by ≥5 kinases, whereas miRNA hub targets are regulated by ≥6 miRNAs. Interestingly, both the miRNA and kinase hubs are significantly enriched for DNA-dependent nucleic acid binding and TF activity. Three kinase hubs (ATBZIP12, BIN2, and ABI5) are also TF target hubs, all of which are involved in brassinosteroid signaling. The enrichment for TF activity in hubs of different network types reveals that genes related to transcriptional regulation are also complexly regulated through other regulatory mechanisms.
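Fold enrichments such as those reported for TFs and kinases among hub genes can be computed as the observed overlap divided by the overlap expected by chance. The sketch below pairs this with a one-sided hypergeometric P value. The choice of test is an assumption, since this section does not name the test used for these enrichments, and the function names are hypothetical.

```python
from math import comb


def fold_enrichment(n_universe, n_annotated, n_selected, n_overlap):
    """Observed overlap divided by the overlap expected by chance."""
    expected = n_selected * n_annotated / n_universe
    return n_overlap / expected


def hypergeom_upper_tail(n_universe, n_annotated, n_selected, n_overlap):
    """P(X >= n_overlap) when n_selected genes are drawn without replacement
    from a universe containing n_annotated genes of the class of interest."""
    total = comb(n_universe, n_selected)
    upper = min(n_annotated, n_selected)
    favourable = sum(
        comb(n_annotated, k) * comb(n_universe - n_annotated, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return favourable / total


# Toy numbers for illustration only (not data from this study):
# 100 genes in total, 10 annotated as TFs, 20 hub genes, 6 of which are TFs.
toy_fold = fold_enrichment(100, 10, 20, 6)
toy_p = hypergeom_upper_tail(100, 10, 20, 6)
```

A fold enrichment below 1, such as the 0.87 reported for genes bound by fewer than three TFs, indicates underrepresentation; its significance would be assessed with the lower tail of the same distribution.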

Expression Levels Are Correlated with the Total Number of Bound TFs

Apart from function, we evaluated expression of the potential target genes in the context of regulatory complexity (see Methods). Because our TF set involved a large number of known flowering regulators (Table 1), we focused on potential target genes associated with flowering based on the functional modules (n = 406 genes). They were divided into low-complexity genes (bound by fewer than three TFs), intermediate-complexity genes (bound by three to seven TFs), and hub or high-complexity genes and compared using the Kolmogorov-Smirnov test.

Expression breadth, defined as the number of conditions in which a gene is expressed, is positively correlated with the number of regulating TFs of the potential target genes (Figure 3; P value < 0.05). Although high-complexity genes also display a U-shaped distribution with some genes being expressed in only a few conditions, genes expressed in only a single condition are most frequently bound by only one or a few TFs. To determine whether the observed correlation was due to the presence of HOT regions or the added complexity of all nearby bound regions, we compared the distributions for the hub genes (Figure 3) and those of the HOT-associated genes and found the shift was not significant when comparing HOT- and non-HOT-associated genes (Supplemental Figure 10). Therefore, we conclude that the total regulatory TF complexity of the potential target genes is the main responsible factor. This is supported by the same analyses performed on the subnetworks, where the shift is consistently larger for hub target genes than for HOT-associated target genes (data not shown). Similarly, using median gene expression levels instead of expression breadth confirms this bias (Supplemental Figure 11).
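The comparison of expression breadth between complexity classes can be sketched as follows. The class boundaries follow the text (fewer than three TFs, three to seven TFs, and eight or more TFs). The function names are hypothetical, and only the two-sample Kolmogorov-Smirnov statistic is computed; the associated P value is omitted, as it would normally be taken from a statistics library.

```python
from bisect import bisect_right


def complexity_class(n_tfs):
    """Assign a gene to a complexity class from its number of bound TFs."""
    if n_tfs < 3:
        return "low"
    if n_tfs < 8:
        return "intermediate"
    return "hub"


def expression_breadth(expressed_flags):
    """Number of conditions in which a gene is called expressed."""
    return sum(1 for flag in expressed_flags if flag)


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest vertical distance between the
    empirical cumulative distributions of the two samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    distance = 0.0
    for x in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        distance = max(distance, abs(cdf_a - cdf_b))
    return distance
```

In this setting, `sample_a` and `sample_b` would hold the expression breadths of the low-complexity and hub genes, respectively, which are the two series compared in Figure 3.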

HOT Regions Are Enriched for DH Sites

A common characteristic of all genomic regions associated with regulatory proteins is a pronounced sensitivity to DNase I digestion (Zhang et al., 2012). We evaluated the overlap between DH sites from flower and leaf (Zhang et al., 2012) with our merged regions describing TF binding (Figure 2E). All bound regions (non-HOT and HOT) are significantly enriched for flower DH sites (P value < 0.001), with the enrichment in HOT regions being twice as high as in non-HOT regions. The fraction of HOT regions that overlap with DH sites is 87%, compared with 55% for non-HOT regions. The same patterns were observed when using the DH sites determined in leaf tissue. The significant overlap of DH sites with bound regions in general confirms their susceptibility to transcriptional regulation, while the higher enrichment for HOT regions suggests a more steady open chromatin state, possibly because of the high number of TF binding events.

Hub and HOT-Associated Genes Respond to TF Perturbation

Next, we investigated how TF perturbation affected potential target genes and how this was reflected by regulatory complexity. Van Nostrand and Kim (2013) reported that HOT-associated potential target genes in C. elegans are less responsive to TF perturbation. For each of the 18 TFs with perturbation data in our data set, we compared the enrichment for DE genes, defined as genes that respond to perturbation of the profiled TF, among non-hub–non-HOT genes (low-complexity binding) and hub-HOT genes (high-complexity binding). Overall, both low- and high-complexity bound genes are significantly enriched for DE genes, and in most (13/18) data sets, there is no significant reduction in expression responsiveness in hub genes or HOT-associated genes (Supplemental Figure 12). Also, TFs display higher DE enrichment in the high-complexity bound gene sets. Deviating patterns are found for some specific TFs: PIF3 potential target genes show higher DE enrichment in non-hub genes and non-HOT-associated genes, while FUS3-, PIF4-, LFY-, and PI-bound genes exhibit almost no difference in enrichment. Only FLC potential target genes have different patterns for hub and HOT-associated genes.

Chromatin States of Bound Regions

The Arabidopsis genome can be divided into nine chromatin states (Sequeira-Mendes et al., 2014) based on nine genome-wide histone modification marks, three histone variants, nucleosome density, genomic G+C content, and CG methylated residues. The combination of these marks into signatures or states holds more power for functional association than different marks in isolation. With regard to our set of TF-bound regions (Figure 2E), we observed significant enrichment for states 1 (associated with transcribed regions and transcription start sites), 2 (similar to state 1, but with lower nucleosome density and located outside the gene body but in the promoter), and 4 (similar to state 2, but with fewer active marks, mostly overlapping with noncoding intergenic regions and upstream promoter). By contrast, the bound regions were significantly depleted for states 3 (transcription elongation), 7 (gene body and intron), 8 (AT-rich heterochromatin), and 9 (GC-rich heterochromatin). The association with states 1 and 2 and the depletion for states 3 and 7 appear to be a direct consequence of the location of most bound regions near genes, and the enrichment for state 4 and depletion for states 8 and 9 confirm the functionality of the intergenic bound regions.

Figure 2. (continued).

(D) Example of a hub gene that is not classified as HOT associated because the set of the regulating TFs bind at two distinct regions (5′ and 3′). The “Peak-Called Regions” track contains all regions called in any of the single TF experiments used for the hub and HOT analysis. Zooming in shows the name of the TF binding at each region based on the name of the region. The “Merged Regions” track shows the result of our merging procedure of separate binding regions into genomic binding regions. The “DH I Flower” track shows the results of the study by Zhang et al. (2012).
(E) Enrichment of bound regions for DH sites in flower and leaf and in the chromatin states delineated by Sequeira-Mendes et al. (2014).

Based on the ChIP peak-gene distance distribution, we defined a set of 195 distal bound regions as those further than 4 kb away from the closest gene. Although a small fraction (11%) of the distal upstream bound regions lies in heterochromatic regions (states 8 and 9), they are significantly depleted for these heterochromatin-typical states. Interestingly, the remainder of the distal upstream bound regions can be split into enrichment toward states 4 and 5 (Polycomb chromatin). The Polycomb pathway is an important repressive pathway in development, including flowering, which is known to act by regulating chromatin accessibility to binding sites. When the repression is overcome, TF binding leads to target gene regulation (Farrona et al., 2011). The enrichment for state 5 suggests that the distal upstream bound regions are candidate distal elements where the chromatin is under regulation by the Polycomb complex, similar to the distal element of FLOWERING LOCUS T (Adrian et al., 2010), which is brought to close association with the proximal promoter through a chromatin loop (Cao et al., 2014). While downstream distal elements appear to show similar enrichment patterns for states 4 and 5, the sample size is too small to obtain significant results.

Population Sequence Diversity and Conservation of Bound DNA

If bound regions are of functional importance for transcriptional regulation, we expect them to be under purifying selection. Based on complete resequencing data of 369 Arabidopsis strains from the 1001 Genomes project (Weigel and Mott, 2009), we assessed the nucleotide diversity within the bound regions using the average number of nucleotide differences per site, π (Nei and Li, 1979). We compared the TF-bound regions with fourfold degenerate (4D) sites and other sets of genomic regions (Figure 4). The 4D sites are thought to be the most neutrally evolving sites in the genome, as such mutations do not affect the encoded amino acid, and coding sequences are less likely to have other regulatory functions. The 4D sites are indeed less constrained than either intergenic regions or 1 kb up- and downstream regions of genes (π of 0.0052, 0.0050, and 0.0034, respectively, versus π of 0.0070 for 4D sites), but bound regions have the lowest diversity (P value < 0.001 based on reshuffling; see Methods). The diversity of bound regions is similar to that of 5′ and 3′ untranslated regions (UTRs) and almost as low as coding sequences. Importantly, the ME and HC subnetworks show only little additional constraint for bound regions (Figure 4).

Figure 3. Expression Breadth as a Function of Regulatory Complexity for Flowering-Associated Genes.

Expression breadth distributions based on a nonredundant expression compendium of 111 conditions for three series of complexity: low, <3 TFs; intermediate, ≥3 TFs and <8 TFs; and hub, ≥8 TFs (n = 406 flowering-associated genes). The lines indicate the cumulative histograms (right y axis). The Kolmogorov-Smirnov (KS) statistic and P value are calculated between the low complexity and the hub series.

In addition, we examined HOT regions and distal bound regions in comparison to the non-HOT regions and proximal bound regions, respectively. HOT regions show reduced π values compared with the non-HOT regions, which can be explained by the necessity to retain binding sites for more TFs than non-HOT regions, further corroborating the functionality of HOT regions. Similarly, distal bound regions show similar π values compared with regions acting proximally, providing evidence for their functionality.
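For a set of aligned sequences from the same region, π is the mean number of pairwise differences divided by the number of sites. The following sketch is a deliberately simplified illustration: it assumes equal-length, gap-free sequences without missing data, which real resequencing data do not guarantee, and the function name is hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(sequences):
    """Average number of nucleotide differences per site between all pairs
    of aligned sequences (pi; Nei and Li, 1979).

    Simplifying assumption: all sequences have equal length and contain
    no gaps or missing calls.
    """
    n = len(sequences)
    if n < 2:
        raise ValueError("at least two sequences are required")
    length = len(sequences[0])
    if length == 0 or any(len(s) != length for s in sequences):
        raise ValueError("sequences must be non-empty and of equal length")

    total_differences = sum(
        sum(a != b for a, b in zip(s1, s2))
        for s1, s2 in combinations(sequences, 2)
    )
    n_pairs = n * (n - 1) // 2
    return total_differences / (n_pairs * length)


# Toy alignment of three strains over four sites (illustrative only).
toy_pi = nucleotide_diversity(["ACGT", "ACGT", "ACGA"])
```

In the toy alignment, two of the three pairs differ at one of four sites, so π = 2 / (3 × 4) ≈ 0.167. Lower values in bound regions than at 4D sites are what is interpreted above as purifying selection.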

Because of their function, bound regions are also often conserved across species, which is the premise of genome-wide studies of conserved noncoding sequences (CNSs). We determined the fraction of bound regions exhibiting conservation within the crucifers (Haudry et al., 2013) and within the dicot lineage (Van de Velde et al., 2014) based on overlap with CNSs. Overall, CNSs supported 35 and 29% of the 24,898 bound regions in the crucifer and dicot data, respectively, and 15% are supported in both sets. Bound regions are significantly enriched for overlap with CNSs in crucifers (3.2-fold) and dicots (1.6-fold). For the set of 1185 HOT regions, we observe that 72 and 52% overlap with a conserved region, which results in a slightly higher enrichment of HOT regions in CNSs of crucifer (3.8-fold) and dicot (1.6-fold) data sets compared with non-HOT regions (3.2- and 1.5-fold, respectively). This result complements the findings of the population sequence diversity analysis regarding the higher constraint on HOT regions.

Hypotheses to Explain the Diversity of Motifs in Bound Regions

Combinatorial control, where different TFs cooperate in a context-dependent manner, is an important principle in transcriptional regulation (Singh, 1998; Vandepoele et al., 2006). For all 27 TFs, we determined the overlap in potential target genes (Figure 5A) and clustered them accordingly. Importantly, when all experiments, including ChIP-chip and ChIP-Seq experiments for the same TF, were taken into account, all experiments of a single TF clustered together, rather than clustering based on the ChIP method used. We observed significant overlaps for 255 out of the 351 TF pairs in our data set, showing that there is a high degree of overlap in the genes that are targeted by TFs involved in flowering, circadian rhythm, and light response. Among the profiled TFs, there are two major protein-protein interaction clusters: a light response cluster (marked in orange) and a flowering cluster (marked in green; Figure 5B). Interacting TFs can be retrieved from the overlap analysis (Figure 5A), although the flowering cluster is split up into three smaller clusters, potentially revealing the more common interactions. Since HOT-associated genes have a large influence on cobinding statistics (Nègre et al., 2011), the same matrix was constructed using only the non-HOT-associated genes. Although fewer significant TF pairs were found (208/351), the cluster structure of the matrix is robust, also when using the subnetworks (Supplemental Figure 13).

Whereas cotargeting of potential target genes reveals possible coregulation, cobinding of TFs in close proximity of each other, i.e., in the same bound region, can identify cobinding complexes. Therefore, we integrated de novo motif finding for each of the profiled TFs (see Methods). An overview of all enriched motif logos per TF, together with their frequency and location within the peak regions, is given in Supplemental Table 2. Importantly, motif definitions were determined stringently, meaning that differences in flanking nucleotides were considered as different motifs, as can be seen for PIF5. Flanking nucleotides have been shown to add important specificity in motif recognition and are therefore not collapsed into a single degenerate consensus binding site (Williams et al., 1992; Catron et al., 1993; Suzuki et al., 2005). Motifs were ranked by occurrence, with the most frequent motif denoted as the primary motif.

For each factor, we evaluated whether any of the de novo motifs corresponded to the canonical motif (the motif that is known to be bound by the TF, as opposed to noncanonical motifs), based on motif alignments against the AGRIS database and comparison with motifs from literature (Supplemental Table 3). Notably, we observed for several TFs that the primary motif is not the canonical motif. For most TFs, such as TOC1, only a single motif fitted the canonical motif description, whereas for others, such as PIF5, multiple motifs matched the canonical motif as motif differences resided in the flanking nucleotides.

In the traditional view, a DNA motif is expected to explain the binding site of the TF in the peak region. However, a single motif rarely covers more than 40% of the ChIP peaks (Supplemental Figure 14A), raising the question of how the TF might be associated with the chromatin in the remainder of the peaks. Interestingly, when taking all significantly enriched motifs into account, the fraction of peaks with at least one motif increased to 45 to 80%. This large increase indicates that different motifs are rarely present in the same subset of peaks. Enrichment for DE genes shows that the sets of genes uniquely associated with the nonprimary motifs likely represent regulated target genes (with the exception of some motifs of GL1, FLC, BES1, PIF4, and GL3; Supplemental Figure 14B). Notably, we did not observe a reduction in the fraction of peaks with motif instances between non-HOT regions and HOT regions (Supplemental Figure 15).

Figure 4. Nucleotide Diversity (π) in Different Sets of Genomic DNA.

Nucleotide diversity values based on 369 Arabidopsis strains for different genomic regions, including bound regions from the complete network (Bound), the subnetworks (Bound ME and Bound HC), and distal bound regions (all, and the subsets lying in the chromatin states 4 and 5). Comp. Interg. is the complete intergenic space and 4D represents fourfold degenerate sites in CDSs. Gray bars denote different types of genomic regions, and bound DNA is represented by black bars.

Cobinding TFs, where one TF binds through association with another TF with a different DNA binding specificity or where TFs modify each other's DNA binding specificity, provide a possible explanation for the widespread occurrence of different DNA motifs for the same TFs. This can only be the case if TFs bind in the same bound region. Considering all identified motifs per TF and the complete set of potential target genes, we systematically categorized binding events via canonical and noncanonical motifs. TFs can bind (1) peaks where only a canonical motif instance is present, (2) peaks where both canonical and noncanonical motif instances are present, or (3) peaks where only noncanonical motif instances are present. Peaks of type I fulfill the traditional view of TF binding, where a TF binds its target directly. Type II represents cobinding, where a second TF binds in cooperation with the profiled TF. The peaks of type III represent tethering, where the profiled TF associates with the chromatin through a partnering TF, an example being TCP binding via a protein-protein interaction with AS2 (Li et al., 2012; Wang et al., 2012). Based on the fraction of these three peak types for a given TF, we observe that most TFs bind a mixture of these peak types (Figure 6). Only a few TFs, such as SEP3 and FLM, tend to bind peaks that almost always include a canonical motif.

To find explanations for the different DNA motifs in a single ChIP experiment, we assessed the co-occurrence of pairs of TFs in bound regions (Supplemental Figure 16). From the perspective of each TF, its entire peak set was divided into the different categories of peaks and the number of co-occurrences was statistically evaluated. Starting from this matrix, we tested whether known cobinding regulators could be recovered and derived new testable hypotheses for several TFs. First, in the ChIP data for the MADS domain protein SEP3, multiple different CArG motifs, typical for MADS domain proteins, are enriched (Supplemental Table 2). Binding of many other MADS box TFs is significantly enriched in the SEP3-bound regions. All TFs that form a protein-protein interaction with SEP3 (de Folter et al., 2005) have high cobinding scores: AGL15, AP1, SOC1, PI, and AP3. Although there is no protein-protein interaction known or predicted between SEP3 and FLM, we observe a highly significant cobinding pattern in the same regions for these TFs as well. Overall, the cobinding of the different MADS TFs is a likely explanation for the different CArG motifs (different flanking nucleotides) found in the peaks of SEP3. Similarly, PIF3-4-5 and PRR5-7 show highly significant cobinding scores within their respective TF family members, fitting with the protein-protein interactions between them. Overall, for TF pairs that have a known protein-protein interaction, the co-occurrence scores are higher compared with pairs without interactions.

Figure 5. Coregulation and Protein Complexes of TFs.

(A) TF cobinding matrix based on common potential target genes and average-linkage hierarchical clustering based on Jaccard index. The lower left half displays the Jaccard index, while the upper right half displays hypergeometric P values of overlap between the two sets of bound genes, corrected using the Bonferroni method. Orange, light response cluster; green, flowering cluster; gray, other.
(B) Experimental and predicted protein-protein interactions between the TFs. Solid lines indicate experimentally determined protein-protein interactions, while dotted lines indicate predicted interactions. The line thickness relates to the number of supporting experiments.

Apart from canonical CArG motifs, many MADS domain TFs have noncanonical G-boxes as secondary motifs. Based on the cobinding of other TFs with MADS TFs, we attempted to identify new cooperative TF interactions. For instance, AP1-bound regions that harbor noncanonical motifs also often bind PIF5, PIF3, PRR5, and PRR7. This suggests a link between the presence of the G-box and the cobinding of these TFs. In the PRR7 peaks, there is a relationship between the presence of the FHY3-FAR1 binding site (FBS) motif (CACGCG; Lin et al., 2007) and FHY3 found in the PRR7 peaks. FHY3, which has an FBS motif as canonical motif, shows very high cobinding scores in the peaks with both motif types and in those with only noncanonical motifs. The fact that PRR7 has high cobinding with FHY3 in its type II and III peaks, but low cobinding in its type I peaks with only canonical motifs, corroborates the hypothesis that the noncanonical FBS in PRR7 is explained by FHY3. A similar signal can be seen for AP1 and PRR7, and LFY and PRR7, where there is only significant cobinding in AP1 peaks where the G-box (PRR7 canonical motif) is found. In both cases, we hypothesize a tethering event.
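The division of a TF's peaks into types I, II, and III can be sketched as below, using PRR7 as the worked case: the G-box as canonical motif and the FBS motif (CACGCG) as noncanonical motif. This is a strong simplification made for illustration. Motifs are matched as exact strings on both strands, whereas the de novo motifs in this study are enriched motif models, and the G-box consensus CACGTG and all function names are assumptions not taken from the text.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def has_motif(seq, motif):
    """True if the motif occurs on either strand (exact match only)."""
    return motif in seq or revcomp(motif) in seq


def classify_peak(seq, canonical, noncanonical):
    """Return 'I' (canonical only), 'II' (both), 'III' (noncanonical only),
    or None when no motif instance is found."""
    has_can = any(has_motif(seq, m) for m in canonical)
    has_non = any(has_motif(seq, m) for m in noncanonical)
    if has_can and has_non:
        return "II"
    if has_can:
        return "I"
    if has_non:
        return "III"
    return None


def peak_type_fractions(seqs, canonical, noncanonical):
    """Fraction of peaks of each type, as summarized per TF in Figure 6."""
    counts = Counter(classify_peak(s, canonical, noncanonical) for s in seqs)
    return {t: counts.get(t, 0) / len(seqs) for t in ("I", "II", "III", None)}


G_BOX = ["CACGTG"]   # assumed G-box consensus (canonical for PRR7)
FBS = ["CACGCG"]     # FBS motif given in the text (noncanonical for PRR7)
```

Cobinding with a partner such as FHY3 would then be tested separately within each peak type, which is how an excess of cobinding in type II and III peaks, but not in type I peaks, supports a tethering hypothesis.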

DISCUSSION

Large-scale analysis of TF binding can provide insights into the organization and complexity underlying transcriptional regulation. To investigate gene regulatory networks in Arabidopsis, we compiled an experimental network comprising 46,619 unique TF-target regulatory interactions based on 27 TF ChIP profiling experiments. Given the different data analysis methodologies of the different source studies, we reprocessed the raw data following a uniform pipeline to obtain an unbiased view on potential target genes for different TFs. Prior to our study, the AtRegNet platform has made great efforts to collect and store all Arabidopsis regulatory information from both small- and large-scale studies (Palaniswamy et al., 2006). However, given the rapid increase in genome-wide ChIP studies in Arabidopsis, the AtRegNet database as of the writing of this article is lacking 21 of the experiments included in this study. In contrast to AtRegNet, we did not include data from small-scale studies, as we were primarily interested in discerning binding patterns and properties of TFs for which global genomic binding information is required. Through the integration of different functional data sets including GO, functional modules, embryo-lethal genes, miRNAs, and kinases, as well as DNA motif finding information, our gene regulatory network provides a functional view of TF regulation in Arabidopsis as well as an entry point to predict functions for unknown genes in the set of potential target genes.

To investigate the organization of regulation and binding sites among the potential target genes, all ChIP data sets were merged, and the distributions of the number of regulators per potential target gene and number of binding events per region were quantified. In both cases, an exponential distribution was observed, which is distinct from the commonly described power law in biological networks (Barabási and Oltvai, 2004). However, the exponential distribution was also reported in the C. elegans gene regulatory network by Cheng et al. (2011). We delineated hub genes and HOT regions, two proxies for complex gene regulation. In contrast to the modENCODE study (Gerstein et al., 2010) where HOT regions had to be bound by more than 65% of the profiled TFs, our definition of HOT regions is based on a percentile score inferred through network randomizations, as was done by Shalgi et al. (2007), avoiding a static ad hoc threshold. Functional analysis of the potential target genes revealed that the genes bound by few TFs are depleted for TFs, while the fold enrichment for TFs was higher for potential target genes with high TF complexity, such as hubs and HOT-associated genes. In addition, TFs were also enriched in the hubs of kinase and miRNA networks, showing that regulatory genes in plants, such as those involved in hormone signaling, are complexly targeted in different types of regulatory networks.

Through overlap analysis with DH sites, all bound regions showed significant enrichment for open chromatin regions. HOT regions consistently exhibited higher enrichments, likely caused by a constraint on the chromatin to maintain an open conformation because of the high number of binding TFs. This open conformation raises concerns about whether the binding in HOT regions truly affects the regulation of the associated target gene or merely represents a state of massive TF binding, due to increased local accessibility of DNA, without any regulatory consequences. There is evidence in non-plant species that (1) at H. sapiens HOT regions, TF occupancy is strongly predictive of transcription preinitiation complex recruitment and moderately predictive of initiating Pol II recruitment, but not of transcript abundance (Foley and Sidow, 2013); (2) highly expressed loci are very amenable to ChIP in yeast, leading to HOT regions (Teytelman et al., 2013); and (3) DNA motifs appear to be of less importance for TF binding in human HOT regions (Yip et al., 2012). To assess whether HOT regions represent functional regulatory elements in plants, we investigated the expression of HOT-associated genes, together with purifying selection patterns, chromatin states, and DNA motifs in HOT regions.

Figure 6. Canonical versus Noncanonical Motifs in TF-Bound Regions.

Circles represent tethering hypotheses based on significant cobinding to explain the fractions of peaks with only noncanonical motifs. PRR7 is thought to associate with the chromatin through binding with FHY3, and the G-box in AP1 noncanonical peaks is hypothesized to be the consequence of tethered binding with PIF3, PIF5, PRR5, and PRR7.
[See online article for color version of this figure.]

First, we found that for most TFs, there is no indication that genes associated with HOT regions are less prone to be responsive upon perturbation of the profiled TF than non-HOT-associated genes. These results differ from those in C. elegans modENCODE, where it has been suggested that HOT-associated genes are less prone to be regulated by the binding TFs. Instead, these C. elegans HOT-associated genes tend to be ubiquitously expressed (Van Nostrand and Kim, 2013), which is not the case for the plant HOT-associated genes delineated here. However, it should be noted that Van Nostrand and Kim (2013) inferred this pattern for only two TFs, raising the question whether this finding represents a global trend that is valid for other TFs as well. Second, the percentage of peaks harboring a motif instance, as well as the distribution of canonical and noncanonical motifs, is similar in HOT regions and non-HOT regions, revealing that sequence-specific TF binding is prevalent in HOT regions as well. This is again in contrast with results found in H. sapiens, where the ENCODE project concluded that open chromatin facilitated TF binding in HOT regions even in the absence of specific binding motifs for the particular TF examined (Yip et al., 2012). Through the integration of genome-wide chromatin states, we explored whether different types of bound regions are enriched for specific states, which could indicate functional differences. Overall, we observed that both HOT and non-HOT regions are strongly enriched for states describing proximal and distal promoters, as well as transcription start sites, and are depleted for heterochromatin. Furthermore, based on nucleotide diversity data from 369 resequenced Arabidopsis strains, we found that bound regions, both HOT and non-HOT, show strong signatures of purifying selection. Combining these different results, we therefore concluded that the binding events occurring in Arabidopsis HOT regions are functional and are mediated by specific DNA binding motifs and are not merely the result of increased accessibility due to an open chromatin configuration.

While we have shown that HOT regions are indicative of functional binding, one of the consistent observations in genome-wide ChIP experiments is poor correlation between binding, DNA motif presence, and transcriptional response for candidate target genes. Possible explanations are the incorrect assignment of a binding site to a potential target gene, functional redundancy among related TFs, conditional differences between ChIP and transcript profiling (different cell type, developmental stage, or physiological condition), or an incompatible chromatin state (Ferrier et al., 2011). Additional hypotheses are that there is a transcriptional response following the binding event, but the mRNA is immediately degraded, or that the binding merely facilitates binding of cofactors essential for activation or repression of the targets (Para et al., 2014). A last explanation is that transcript profiling studies in part capture indirect regulation. With respect to the DNA motif presence in ChIP peak sequences, we have shown that when taking into account significantly enriched noncanonical or nonprimary motifs, the fraction of peaks with a motif instance substantially increased. Furthermore, we observed for some TFs that the most frequent motif does not match the canonical motif, which is consistent with the ENCODE results (Wang et al., 2012). Importantly, the potential target gene sets associated with canonical and noncanonical motifs are similarly enriched for DE genes, implying that both types of motifs mediate TF regulation.

Based on the noncanonical motifs and the TF co-occupancy at merged regions, we inferred cobinding events that are significantly more frequent compared with what would be expected by chance. For example, the different motifs matching CArG boxes in one MADS domain TF ChIP profiling study can be explained by the extensive cobinding among MADS domain family members (de Folter et al., 2005). Furthermore, the G-boxes found enriched in regions bound by the AP1 MADS domain TF can be explained by cobinding of PIF3, PIF5, PRR5, and PRR7. Similarly, we could correlate the significant enrichment of a noncanonical FBS motif in the peaks of PRR7 to the cobinding with FHY3. Because these motifs and cobinding are most strongly enriched in peaks with only noncanonical motifs, we hypothesize that these binding events occur through tethering (Wang et al., 2012). Whereas it has recently been shown, based on in vitro protein binding microarrays (Franco-Zorrilla et al., 2014), that some plant TFs can bind different DNA sequences, based on our cobinding observations, we conclude that the noncanonical DNA motifs can for the most part be explained as the result of cooperative TFs binding the same region.

In conclusion, the integration of different experimental ChIP data sets has revealed a number of insights regarding the organization of binding events on a genome-wide scale. In addition, we showed that bound regions show a clear signal of purifying selection based on population diversity as well as conservation analysis. Finally, we provide testable hypotheses for the cooperative regulation of TFs through tethering based on the integration of DNA motif information for the different binding events.

METHODS

ChIP-Seq Processing

Raw reads were downloaded from the NCBI Sequence Read Archive (SRA; Wheeler et al., 2008; accession IDs are listed below). The quality of the raw data was evaluated with FASTQC (v0.10.0; http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/), and adaptors and other overrepresented sequences were removed using the fastx-toolkit (v0.0.13; http://hannonlab.cshl.edu/fastx_toolkit/). The reads were mapped to the unmasked TAIR10 reference genome of Arabidopsis thaliana (TAIR10_chr_all.fas; ftp.arabidopsis.org) using BWA with default settings for all parameters (v0.5.9; Li and Durbin, 2009). Reads that could not be assigned to a unique position in the genome were removed using samtools (v0.1.18; Li et al., 2009) by setting the mapping quality threshold (-q) to 1. Redundant reads were removed, retaining only one read per start position, using Picard tools (v1.56; http://broadinstitute.github.io/picard/). Peak calling was performed using MACS (v2.0.10; Zhang et al., 2008; default parameters except -g 1.0e8 and false discovery rate [FDR] < 0.05). When replicates were available, the Pearson correlation coefficient (PCC) between the peak fragment per kilobase per million values was calculated for all peak-called regions across the different replicates (Supplemental Figure 17). Since most ChIP-Seq studies were performed without biological replication, the analysis was continued with the better replicate, with the choice of replicate being based on the results of the motif enrichment under the peaks (see Methods on Peak Calling). A few of the older experiments (SRP002328, SRP003928, and SRP000783) had lower PCC values between replicates than recent studies because of lower consistency in quality. Both for experiments with high and low PCC values between replicates, the replicate with better motif enrichment was retained (see Methods on Motif Finding). An overview of which replicates were used for the samples is provided in Supplemental Table 4. For EIN3, the time point at which the maximal number of binding events occurred (4 h) was processed (Chang et al., 2013). REV, AMS, and FLP/MYB88 were removed from the data set due to a very low number of peaks in the results, the lack of paired-end read processing in the computational pipeline, and an abnormally high fraction of peak regions near transposable elements (Supplemental Figure 2A), respectively. All experiments were visually inspected with GenomeView (Abeel et al., 2012), and all figures were made using matplotlib (Hunter, 2007).
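The replicate comparison described above can be illustrated with a minimal sketch. It is not the pipeline used in the study: the function names `fpkm` and `pearson` are hypothetical, and the FPKM definition used here (reads × 10^9 / (region length × total mapped reads)) is the conventional one, assumed rather than taken from the text.

```python
import math


def fpkm(read_count, region_length_bp, total_mapped_reads):
    """Fragments per kilobase per million mapped reads for one peak region."""
    return read_count * 1e9 / (region_length_bp * total_mapped_reads)


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical read counts for the same three peak regions in two replicates.
lengths = [500, 800, 300]
rep1 = [fpkm(c, l, 2_000_000) for c, l in zip([120, 300, 45], lengths)]
rep2 = [fpkm(c, l, 3_000_000) for c, l in zip([170, 460, 70], lengths)]
pcc = pearson(rep1, rep2)
```

A PCC close to 1 would indicate that the two replicates rank and scale the peak regions similarly.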

ChIP-chip Processing

Raw CEL files were downloaded from Gene Expression Omnibus (Barrett et al., 2011; accession IDs are listed below). The Affymetrix Tiling array bpmap files were updated to the current TAIR10 annotation with Starr (Zacher et al., 2010). Normalization and peak calling were performed with the Bioconductor (Gentleman et al., 2004) package rMAT (Droit et al., 2010) in R (R Core Team, 2012). The PairBinned method was used to normalize the arrays, and peaks were called using an FDR cutoff of 0.05 except for the data sets GSE13090, GSE24684, GSE43291, and GSE40519, in which the P value was set at 10⁻³ (in analogy to the original study and necessary to obtain peak calling results). The minimum requirement of consecutive enriched probes was set at eight. Other parameters were left at their default setting. All replicates were taken into account by the rMAT algorithm.

Peak Annotation

Peak regions were annotated based on the location of their summits. A peak was assigned to the closest gene as annotated in the TAIR10 release represented in the PLAZA2.5 database (Van Bel et al., 2012); peaks can be assigned both 5′ and 3′ of a gene. Each assignment is considered as a potential TF-target interaction. The peak locations were categorized by assigning a peak to one of the following genomic regions: intergenic, 1-kb promoter (1 kb upstream of the transcription start site), 5′ UTR, coding, intron, 3′ UTR, and 1 kb downstream of the transcription stop site. For Supplemental Figure 3, the assignment based on the entire peak regions shows the average fraction of the peak lengths assigned to each genomic region. Random peak assignment was performed by BEDtools random (Quinlan and Hall, 2010).
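As an illustration of summit-based assignment, the following sketch assigns a peak summit to the closest gene on the same chromosome. It is a simplified stand-in, not the annotation code used in the study: the names `distance_to_gene` and `assign_peak` are hypothetical, and measuring distance to the nearest gene boundary (zero inside the gene) is an assumption.

```python
def distance_to_gene(summit, start, end):
    """Distance in bp from a summit to a gene; 0 if the summit lies inside it."""
    if start <= summit <= end:
        return 0
    return start - summit if summit < start else summit - end


def assign_peak(summit, genes):
    """Return the ID of the gene closest to the summit.

    genes: iterable of (gene_id, start, end) tuples on one chromosome.
    Both upstream (5') and downstream (3') genes are eligible.
    """
    return min(genes, key=lambda g: distance_to_gene(summit, g[1], g[2]))[0]


# Hypothetical gene models.
genes = [("G1", 100, 200), ("G2", 500, 800)]
closest = assign_peak(260, genes)  # 60 bp from G1, 240 bp from G2
```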

Motif Finding

The sequences of the complete peak regions were masked for coding sequence and submitted to the Peak-Motifs algorithm using default settings (Thomas-Chollier et al., 2012). Motifs that could be aligned with a correlation score ≥75% were collapsed. For each returned DNA motif, enrichment was defined as the ratio of the peak set frequency over the frequency in 1000 random sets of peaks of the same size and length distribution sampled without replacement from the complete noncoding genome space (intergenic + UTR). The motifs from Peak-Motifs were mapped using matrix scan (Turatsinze et al., 2008) using the same parameters as used by Peak-Motifs. To determine whether a motif corresponded with a TF’s canonical DNA motif, de novo motifs were compared with known motifs from the AGRIS database (Palaniswamy et al., 2006) using the STAMP Web tool with default settings (Mahony and Benos, 2007).
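The enrichment ratio defined above can be sketched as follows. This is an illustration under stated assumptions: motif presence is tested with a plain substring match as a stand-in for matrix-based scanning, the background frequency is taken as the mean over the random sets (the text does not state how the 1000 sets are aggregated), and all function names are hypothetical.

```python
def motif_frequency(sequences, has_motif):
    """Fraction of sequences containing at least one motif instance."""
    return sum(1 for s in sequences if has_motif(s)) / len(sequences)


def motif_enrichment(peak_seqs, random_sets, has_motif):
    """Peak-set motif frequency divided by the mean frequency in random sets."""
    observed = motif_frequency(peak_seqs, has_motif)
    background = sum(motif_frequency(rs, has_motif) for rs in random_sets)
    background /= len(random_sets)
    if background == 0:
        return float("inf") if observed > 0 else float("nan")
    return observed / background


def contains_gbox(seq):
    """Toy matcher: exact G-box core, standing in for a matrix scan."""
    return "CACGTG" in seq


# Toy data: 2 of 4 peaks carry the motif; random sets carry it in 1/4 and 0/4.
peaks = ["ACACGTGA", "TTTT", "CACGTG", "GGGG"]
random_sets = [["AAAA", "CACGTG", "CCCC", "GGGG"],
               ["TTTT", "AAAA", "CCCC", "GGGG"]]
ratio = motif_enrichment(peaks, random_sets, contains_gbox)
```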

Population Genomic Analyses

Single nucleotide polymorphism data were downloaded from the 1001 Genomes project (http://1001genomes.org/projects/MPICWang2013/) on April 10, 2014. Positions were only taken into account when they were sequenced in at least 70% of the strains. π values (Nei and Li, 1979) were calculated per site using VCFtools (Danecek et al., 2011) and recalculated into region π values for the different genomic data sets used. For the large intergenic regions (complete, 1 kb up, and 1 kb down), the regions with information in <70% of the accessions were discarded. For the other (smaller) genomic elements, it was required that they were covered completely by regions with at least 70% information. The significance of the difference in π for different regions was determined by shuffling the bound regions across the Arabidopsis intergenic space 1000 times using BEDTools (Quinlan and Hall, 2010) and its python extension Pybedtools (Dale et al., 2011). The P value was empirically determined by counting the number of iterations in which the overlap was larger in the reshuffled than in the real data set.
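A minimal sketch of the two calculations described above follows. It assumes biallelic sites and haploid (inbred) accessions, uses the standard pairwise-difference estimator for per-site nucleotide diversity, and averages over the number of sites with sufficient data to obtain a region value; these choices and all function names are assumptions for illustration, not the exact VCFtools-based procedure.

```python
def site_pi(alt_count, n):
    """Per-site nucleotide diversity for a biallelic site.

    alt_count: number of sampled sequences carrying the alternative allele.
    n: number of sequences with a call at this site.
    Equals the fraction of sequence pairs that differ at the site.
    """
    if n < 2:
        raise ValueError("need at least two sequences")
    return 2.0 * alt_count * (n - alt_count) / (n * (n - 1))


def region_pi(site_pis, n_callable_sites):
    """Region-level diversity: summed per-site values over callable sites."""
    return sum(site_pis) / n_callable_sites


def empirical_p(observed, shuffled, larger_is_extreme=True):
    """Fraction of shuffled statistics more extreme than the observed one."""
    if larger_is_extreme:
        hits = sum(1 for s in shuffled if s > observed)
    else:
        hits = sum(1 for s in shuffled if s < observed)
    return hits / len(shuffled)


# One alternative allele among four sequences: 3 of 6 pairs differ.
example_site = site_pi(1, 4)
```

With 1000 shuffles, the smallest nonzero P value this procedure can report is 0.001.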

Integrated Functional Data Sets

Protein-protein interaction data were taken from the CORNET database (De Bodt et al., 2012), excluding the EVEX and AraNet relations. The functional modules were taken from our previous study (Heyndrickx and Vandepoele, 2012). Phosphorylation data were downloaded from PhosPhAt on March 24, 2013 (Zulawski et al., 2013). Only those interactions were taken into account that describe a verified relationship between the kinase and the target protein itself: protein regulation, activation/inactivation, phosphorylation, dephosphorylation, and autophosphorylation. The miRNA target data were extracted from Supplemental Table 1 of Bülow et al. (2012). miRNA-target relations were filtered for psRNATarget (Dai and Zhao, 2011) expectation scores lower than or equal to 3. DH sites (flower and leaf tissue) were from Zhang et al. (2012). DE data were obtained from the publications as listed in Supplemental Table 1. Genes were removed when they were reported as both up- and downregulated upon perturbation of a TF because of different time points and conditions. The GO and MapMan gene annotations were downloaded on May 15, 2013. Enrichment of a functional category in a set of genes was calculated as the ratio of the set frequency over the genome-wide frequency. All functional enrichment values (GO [Ashburner et al., 2000], MapMan [Thimm et al., 2004], functional modules [Heyndrickx and Vandepoele, 2012], and DE) were validated statistically using the hypergeometric distribution and adjusted using FDR correction for multiple hypotheses testing (Storey and Tibshirani, 2003). The significance level was set at 0.05. For DE enrichment, the potential target genes were filtered for those present on the ATH1 microarray.
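The enrichment ratio and its hypergeometric test can be sketched as follows. Note one deliberate substitution: the multiple-testing step below uses the Benjamini-Hochberg procedure as a simple stand-in, not the q-value method of Storey and Tibshirani (2003) cited above. Function names and the example numbers are hypothetical.

```python
from math import comb


def fold_enrichment(k, N, K, n):
    """Set frequency (k/n) over genome-wide frequency (K/N)."""
    return (k / n) / (K / N)


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n genes from N, of which K are annotated."""
    upper = min(K, n)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical case: 3 of 4 target genes annotated, 3 of 10 genes genome-wide.
ratio = fold_enrichment(3, 10, 3, 4)
p = hypergeom_sf(3, 10, 3, 4)
```

A category would be reported as enriched when its adjusted P value falls below the 0.05 significance level stated above.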

Hub Targets and HOT Regions

Target hub genes were identified as described by Shalgi et al. (2007). For TFs that were profiled by both ChIP-chip and ChIP-Seq, only one of the experiments was taken into account. Hub genes are targeted by more TFs than the 99th percentile of the maximal value in 1000 randomizations of the columns in the TF to gene matrix. The TF-target randomization preserved the number of potential target genes for each TF but reassigned each link. Following this procedure, target hubs are genes that are targeted by ≥8 TFs. For the ME and HC networks, the cutoff values for hub genes were ≥7 and ≥6, respectively. For the determination of the HOT regions, all peak regions of all 27 TF data sets were merged after pruning long peak regions to the median length of all peak regions (470 bp; Supplemental Figure 3). Gene regulatory complexity was defined as the number of TFs that bind to peak regions assigned to a specific gene through peak annotation. The HOT regions were determined using the same strategy as the target hubs, being bound by ≥7 TFs. For the ME and HC networks, the cutoff value was ≥6.
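The randomization-based hub cutoff can be sketched as below. This is one reading of the procedure, not the authors' code: each TF keeps its number of targets but draws them uniformly at random from the gene universe, the maximal in-degree is recorded per randomization, and the cutoff is the nearest-rank 99th percentile of those maxima. Function names, the toy network, and the seed are hypothetical.

```python
import random
from collections import Counter


def max_in_degree(target_counts, genes, rng):
    """Largest number of TFs hitting one gene after reassigning every link."""
    degree = Counter()
    for n_targets in target_counts:
        degree.update(rng.sample(genes, n_targets))
    return max(degree.values(), default=0)


def hub_cutoff(tf_targets, genes, n_iter=1000, percentile=99, seed=0):
    """Nearest-rank percentile of the maximal in-degree over n_iter shuffles."""
    rng = random.Random(seed)
    counts = [len(targets) for targets in tf_targets.values()]
    maxima = sorted(max_in_degree(counts, genes, rng) for _ in range(n_iter))
    rank = max(1, (percentile * n_iter + 99) // 100)  # integer ceiling
    return maxima[rank - 1]


def hub_genes(tf_targets, genes, **kwargs):
    """Genes bound by more TFs than the randomization-based cutoff."""
    cutoff = hub_cutoff(tf_targets, genes, **kwargs)
    degree = Counter(g for targets in tf_targets.values() for g in targets)
    return {g for g, d in degree.items() if d > cutoff}


# Toy network: five TFs share target g0; each has one private target.
genes = ["g%d" % i for i in range(200)]
tf_targets = {"TF%d" % i: {"g0", "g%d" % (i + 1)} for i in range(5)}
hubs = hub_genes(tf_targets, genes, n_iter=200, seed=1)
```

The same function could be applied to merged peak regions instead of genes to obtain a HOT-region cutoff.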

Enrichment Analysis of Bound Regions in Different Genomic Regions

The DH sites in flower and leaf were downloaded from the NCBI SRA database (accession ID SRP009678; Zhang et al., 2012). The chromatin states were downloaded from Supplemental Data Set 2 from Sequeira-Mendes et al. (2014). The CNS data in dicots and crucifers were taken from Van de Velde et al. (2014) and Haudry et al. (2013), respectively. The HOT and non-HOT-bound region files of each TF were formatted as BED files. Overlap analysis was performed using the BEDTools function intersectBed (Quinlan and Hall, 2010). For DH sites and chromatin states, the observed presence was determined with the -u parameter and the -f parameter set to 0.5 (Quinlan and Hall, 2010). Because of the very long CNS regions in the crucifer data set, the overlap requirement was set to 50 bp. By contrast, the dicot CNSs are very short since they resemble actual binding sites, and here CNSs were required to be completely embedded in bound regions. The expected presence in bound regions was determined by shuffling the DH sites data set 1000 times using shuffleBed, excluding the actual positions of the real instances. The overlap was determined using the same parameters for each shuffled file, and the median number of shared elements present over 1000 shuffled files was used as a measure for the expected presence. This was used to calculate enrichment as the ratio between observed presence and expected presence.
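The observed-over-expected calculation can be sketched in plain Python as follows. It is a simplification, not a reimplementation of intersectBed or shuffleBed: a single chromosome is assumed, the real feature positions are not excluded during shuffling, and the 0.5 overlap fraction is applied relative to the bound region (an assumption about which file was the query). All names are hypothetical.

```python
import random
from statistics import median


def count_overlapping(regions, features, min_frac=0.5):
    """Number of regions with >= min_frac of their length covered by one feature.

    Regions are assumed to have positive length.
    """
    n = 0
    for r_start, r_end in regions:
        need = min_frac * (r_end - r_start)
        if any(min(r_end, f_end) - max(r_start, f_start) >= need
               for f_start, f_end in features):
            n += 1
    return n


def shuffle_features(features, chrom_len, rng):
    """Place each feature at a random position, preserving its length."""
    shuffled = []
    for start, end in features:
        length = end - start
        new_start = rng.randint(0, chrom_len - length)
        shuffled.append((new_start, new_start + length))
    return shuffled


def overlap_enrichment(regions, features, chrom_len, n_iter=1000, seed=0):
    """Return (observed, expected, ratio); expected is the median over shuffles."""
    rng = random.Random(seed)
    observed = count_overlapping(regions, features)
    expected = median(
        count_overlapping(regions, shuffle_features(features, chrom_len, rng))
        for _ in range(n_iter)
    )
    ratio = observed / expected if expected else float("inf")
    return observed, expected, ratio


# Toy intervals (start, end) on a 1000-bp chromosome.
bound = [(0, 100), (200, 300)]
dh_sites = [(40, 120), (290, 400)]
observed = count_overlapping(bound, dh_sites)
```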

TF Coregulation and Cobinding

For the coregulatory matrix, the TFs were clustered based on the Jaccard distance (1 – Jaccard Index) between their target sets using average linkage hierarchical clustering. The overlap was validated statistically using the hypergeometric P value, with Bonferroni correction for multiple hypothesis testing. The cutoff for significance was set at 0.001.

The cobinding statistics per type of peak (based on the presence of canonical and noncanonical motifs) were generated per query TF. For each query TF, the entire peak set was divided into the different categories of peaks (only canonical, both canonical and noncanonical, and only noncanonical). Based on the merged regions with which each peak is associated, the number of times each other TF binds in the same merged region was counted. The P value for this overlap (number of merged regions in which they cobind) given the total set of merged regions, the set of merged regions associated with the query TF (split per type), and the set of merged regions associated with the cobinding TF was calculated with the hypergeometric distribution.
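A compact sketch of Jaccard-distance clustering with average linkage is given here for illustration. It is a naive pure-Python agglomeration suitable only for small inputs, not the implementation used in the study; the function names and the toy target sets are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b| for two sets; 0.0 if both are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def average_linkage(targets):
    """Agglomerative clustering of TFs by average Jaccard distance.

    targets: dict mapping TF name -> set of target genes.
    Returns the merges in order as (cluster_a, cluster_b, distance),
    where clusters are tuples of TF names.
    """
    clusters = [(name,) for name in targets]

    def dist(c1, c2):
        # Average linkage: mean of all pairwise member distances.
        total = sum(jaccard_distance(targets[x], targets[y])
                    for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: dist(*pair))
        merges.append((a, b, dist(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Toy target sets: A and B share two of four genes; C shares none.
targets = {"A": {1, 2, 3}, "B": {2, 3, 4}, "C": {7, 8}}
merges = average_linkage(targets)
```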

Expression Values and Condition Specificity

Expression values were determined based on the filtered microarray compendium 2 from the CORNET database (De Bodt et al., 2012). For condition specificity, a gene was considered expressed if the log2 expression value was above 7.5. The Kolmogorov-Smirnov test was executed using Scipy (Jones et al., 2001).
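The expression threshold and the comparison of two distributions can be illustrated with the sketch below. It computes only the two-sample Kolmogorov-Smirnov statistic (the maximal difference between empirical cumulative distributions), not the associated P value that Scipy reports; which samples were compared is not specified in this section, so the example values and function names are hypothetical.

```python
def expression_breadth(log2_values, threshold=7.5):
    """Number of conditions in which a gene's log2 expression exceeds the threshold."""
    return sum(1 for v in log2_values if v > threshold)


def ks_statistic(sample1, sample2):
    """Two-sample KS statistic: maximal absolute difference between ECDFs."""
    xs, ys = sorted(sample1), sorted(sample2)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        v = min(xs[i], ys[j])
        # Advance past all values equal to v in both samples (handles ties).
        while i < n and xs[i] == v:
            i += 1
        while j < m and ys[j] == v:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


# Hypothetical log2 values across four conditions; 7.5 itself does not count.
breadth = expression_breadth([7.5, 8.0, 6.9, 9.1])
```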

Accession Numbers

NCBI SRA and Gene Expression Omnibus accession IDs are as follows: FLP/MYB88, GSE19763; AGL15, GSE17717; GL3, GSE13090; GL1, GSE13090; AP2, E_MEXP_2653, SRP002328; SEP3, GSE14635, SRP000783; WUS, E_MEXP_2499; SMZ, E_MEXP_2068; BES1, GSE24684; SOC1, GSE33297, SRP020612; SVP, GSE33297; LFY, GSE28063, SRP003928; FUS, GSE43291; GTL1, GSE40519; AMS, SRP002566; AP1, SRP002174; FHY3, SRP007485; REV, SRP006211; PIF4, SRP010570; PIF5, SRP010315; FLC, SRP005412; TOC1, SRP010999; PRR5, SRP011389; AP3, SRP013458; PI, SRP013458; ERF115, GSE48793; PIF3, SRP014179; PRR7, SRP028272; FLM, SRP026163; EIN3, SRP017902; DH sites, SRP009678.

Supplemental Data

The following materials are available in the online version of this article.

Supplemental Figure 1. Overview of the Data and Methodology Used in This Study.

Supplemental Figure 2. Enrichment for Differentially Expressed Genes in Network Subcategories.

Supplemental Figure 3. Number of Potential Target Genes per TF and Their Distribution across Different Genomic Regions for the Multiple-Evidence Subnetwork and High-Confidence Subnetwork.

Supplemental Figure 4. Fraction of Different Gene Types Bound by Each TF.

Supplemental Figure 5. Peak Region Annotation Based on the Fraction of Overlap of the Entire Peak Region.

Supplemental Figure 6. Peak Location Binding Preference for the Different TFs.

Supplemental Figure 7. Length Distributions of All Peak Regions for Each TF and the Merged Regions Based on All TF Peak Regions.

Supplemental Figure 8. Histogram of the Number of TFs per Potential Target Gene and per Peak Region for the Multiple-Evidence and High-Confidence Subnetworks.

Supplemental Figure 9. Histogram of the Number of Regulating Kinases and miRNAs per Target Gene.

Supplemental Figure 10. Expression Breadth in Function of Regulatory Complexity.

Supplemental Figure 11. Histogram of the Median Expression Values for Flowering-Associated Genes for Different Series of Regulatory Complexity.

Supplemental Figure 12. Enrichment for Differentially Expressed Genes in Non-HOT Associated and Non-Hub Genes versus HOT-Associated and Hub Genes.

Supplemental Figure 13. Robustness of the Cluster TF Coregulation.

Supplemental Figure 14. DNA Motif Statistics.

Supplemental Figure 15. Canonical versus Noncanonical Motifs in Non-HOT and HOT Regions.

Supplemental Figure 16. TF Cobinding Matrix Based on Merged Regions.

Supplemental Figure 17. Replicate Correlation for ChIP-Seq Data Sets with Replicates.

Supplemental Table 1. Sources of Differential Expression Data for the Profiled TFs.

Supplemental Table 2. Different Significant DNA Motifs per TF in Order of Prevalence.

Supplemental Table 3. Motifs from Supplemental Table 3 That Fit the TF’s Canonical Motif.

Supplemental Table 4. Replicates Used for Each ChIP-Seq Study with Replicates.

ACKNOWLEDGMENTS

This work was supported by the Multidisciplinary Research Partnership “Bioinformatics: from nucleotides to networks” Project (no. 01MR0310W) of Ghent University (K.V.), the Agency for Innovation by Science and Technology (IWT) in Flanders (K.S.H. and J.V.d.V.), an FWO travel grant (K.S.H.), and the Max Planck Society (C.W. and D.W.).

AUTHOR CONTRIBUTIONS

K.S.H., K.V., C.W., and D.W. designed the research methodology. K.S.H., J.V.d.V., and C.W. performed data analyses. K.S.H., J.V.d.V., D.W., and K.V. wrote the article.

Received August 6, 2014; revised October 7, 2014; accepted October 12, 2014; published October 31, 2014.


REFERENCES

Abeel, T., Van Parys, T., Saeys, Y., Galagan, J., and Van de Peer, Y. (2012). GenomeView: a next-generation genome browser. Nucleic Acids Res. 40: e12.

Adrian, J., Farrona, S., Reimer, J.J., Albani, M.C., Coupland, G., and Turck, F. (2010). cis-Regulatory elements and chromatin state coordinately control temporal and spatial expression of FLOWERING LOCUS T in Arabidopsis. Plant Cell 22: 1425–1440.

Ashburner, M., et al., The Gene Ontology Consortium (2000). Gene ontology: tool for the unification of biology. Nat. Genet. 25: 25–29.

Barabási, A.L., and Oltvai, Z.N. (2004). Network biology: understanding the cell’s functional organization. Nat. Rev. Genet. 5: 101–113.

Barrett, T., et al. (2011). NCBI GEO: archive for functional genomics data sets—10 years on. Nucleic Acids Res. 39: D1005–D1010.

Bernstein, B.E., Birney, E., Dunham, I., Green, E.D., Gunter, C., and Snyder, M., ENCODE Project Consortium (2012). An integrated encyclopedia of DNA elements in the human genome. Nature 489: 57–74.

Brandt, R., et al. (2012). Genome-wide binding-site analysis of REVOLUTA reveals a link between leaf patterning and light-mediated growth responses. Plant J. 72: 31–42.

Breuer, C., Morohashi, K., Kawamura, A., Takahashi, N., Ishida, T., Umeda, M., Grotewold, E., and Sugimoto, K. (2012). Transcriptional repression of the APC/C activator CCS52A1 promotes active termination of cell growth. EMBO J. 31: 4488–4501.

Bülow, L., Bolívar, J.C., Ruhe, J., Brill, Y., and Hehl, R. (2012). ‘MicroRNA Targets’, a new AthaMap web-tool for genome-wide identification of miRNA targets in Arabidopsis thaliana. BioData Min. 5: 7.

Busch, W., et al. (2010). Transcriptional control of a plant stem cell niche. Dev. Cell 18: 849–861.

Cao, S., Kumimoto, R.W., Gnesutta, N., Calogero, A.M., Mantovani, R., and Holt III, B.F. (2014). A distal CCAAT/NUCLEAR FACTOR Y complex promotes chromatin looping at the FLOWERING LOCUS T promoter and regulates the timing of flowering in Arabidopsis. Plant Cell 26: 1009–1017.

Catron, K.M., Iler, N., and Abate, C. (1993). Nucleotides flanking a conserved TAAT core dictate the DNA binding specificity of three murine homeodomain proteins. Mol. Cell. Biol. 13: 2354–2365.

Chang, K.N., et al. (2013). Temporal transcriptional response to ethylene gas drives growth hormone cross-regulation in Arabidopsis. eLife 2: e00675.

Cheng, C., Yan, K.K., Hwang, W., Qian, J., Bhardwaj, N., Rozowsky, J., Lu, Z.J., Niu, W., Alves, P., Kato, M., Snyder, M., and Gerstein, M. (2011). Construction and analysis of an integrated regulatory network derived from high-throughput sequencing data. PLOS Comput. Biol. 7: e1002190.

Dai, X., and Zhao, P.X. (2011). psRNATarget: a plant small RNA target analysis server. Nucleic Acids Res. 39: W155–W159.

Dale, R.K., Pedersen, B.S., and Quinlan, A.R. (2011). Pybedtools: a flexible Python library for manipulating genomic datasets and annotations. Bioinformatics 27: 3423–3424.

Danecek, P., Auton, A., Abecasis, G., Albers, C.A., Banks, E., DePristo, M.A., Handsaker, R.E., Lunter, G., Marth, G.T., Sherry, S.T., McVean, G., and Durbin, R., 1000 Genomes Project Analysis Group (2011). The variant call format and VCFtools. Bioinformatics 27: 2156–2158.

De Bodt, S., Hollunder, J., Nelissen, H., Meulemeester, N., and Inzé, D. (2012). CORNET 2.0: integrating plant coexpression, protein-protein interactions, regulatory interactions, gene associations and functional annotations. New Phytol. 195: 707–720.

de Folter, S., Immink, R.G., Kieffer, M., Parenicová, L., Henz, S.R., Weigel, D., Busscher, M., Kooiker, M., Colombo, L., Kater, M.M., Davies, B., and Angenent, G.C. (2005). Comprehensive interaction map of the Arabidopsis MADS Box transcription factors. Plant Cell 17: 1424–1433.

Deng, W., Ying, H., Helliwell, C.A., Taylor, J.M., Peacock, W.J., and Dennis, E.S. (2011). FLOWERING LOCUS C (FLC) regulates development pathways throughout the life cycle of Arabidopsis. Proc. Natl. Acad. Sci. USA 108: 6680–6685.

Droit, A., Cheung, C., and Gottardo, R. (2010). rMAT—an R/Bioconductor package for analyzing ChIP-chip experiments. Bioinformatics 26: 678–679.

Farrona, S., Thorpe, F.L., Engelhorn, J., Adrian, J., Dong, X., Sarid-Krebs, L., Goodrich, J., and Turck, F. (2011). Tissue-specific expression of FLOWERING LOCUS T in Arabidopsis is maintained independently of polycomb group protein repression. Plant Cell 23: 3204–3214.

Ferrier, T., Matus, J.T., Jin, J., and Riechmann, J.L. (2011). Arabidopsis paves the way: genomic and network analyses in crops. Curr. Opin. Biotechnol. 22: 260–270.

Foley, J.W., and Sidow, A. (2013). Transcription-factor occupancy at HOT regions quantitatively predicts RNA polymerase recruitment in five human cell lines. BMC Genomics 14: 720.

Franco-Zorrilla, J.M., López-Vidriero, I., Carrasco, J.L., Godoy, M., Vera, P., and Solano, R. (2014). DNA-binding specificities of plant transcription factors and their potential to define target genes. Proc. Natl. Acad. Sci. USA 111: 2367–2372.

Gentleman, R.C., et al. (2004). Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 5: R80.

Gerstein, M.B., et al. (2012). Architecture of the human regulatory network derived from ENCODE data. Nature 489: 91–100.

Gerstein, M.B., et al., modENCODE Consortium (2010). Integrative analysis of the Caenorhabditis elegans genome by the modENCODE project. Science 330: 1775–1787.

Haudry, A., et al. (2013). An atlas of over 90,000 conserved noncoding sequences provides insight into crucifer regulatory regions. Nat. Genet. 45: 891–898.

Heyman, J., Cools, T., Vandenbussche, F., Heyndrickx, K.S., Van Leene, J., Vercauteren, I., Vanderauwera, S., Vandepoele, K., De Jaeger, G., Van Der Straeten, D., and De Veylder, L. (2013). ERF115 controls root quiescent center cell division and stem cell replenishment. Science 342: 860–863.

Heyndrickx, K.S., and Vandepoele, K. (2012). Systematic identification of functional plant modules through the integration of complementary data sources. Plant Physiol. 159: 884–901.

Higgins, J.A., Bailey, P.C., and Laurie, D.A. (2010). Comparative genomics of flowering time pathways using Brachypodium distachyon as a model for the temperate grasses. PLoS ONE 5: e10065.

Hornitschek, P., Kohnen, M.V., Lorrain, S., Rougemont, J., Ljung, K., López-Vidriero, I., Franco-Zorrilla, J.M., Solano, R., Trevisan, M., Pradervand, S., Xenarios, I., and Fankhauser, C. (2012). Phytochrome interacting factors 4 and 5 control seedling growth in changing light conditions by directly controlling auxin signaling. Plant J. 71: 699–711.

Huang, W., Pérez-García, P., Pokhilko, A., Millar, A.J., Antoshechkin, I., Riechmann, J.L., and Mas, P. (2012). Mapping the core of the Arabidopsis circadian clock defines the network structure of the oscillator. Science 336: 75–79.

Hunter, J.D. (2007). Matplotlib: A 2D graphics environment. Comput. Sci. Eng. 9: 90–95.

Immink, R.G., Posé, D., Ferrario, S., Ott, F., Kaufmann, K., Valentim, F.L., de Folter, S., van der Wal, F., van Dijk, A.D., Schmid, M., and Angenent, G.C. (2012). Characterization of SOC1’s central role in flowering by the identification of its upstream and downstream regulators. Plant Physiol. 160: 433–449.

Johnson, D.S., Mortazavi, A., Myers, R.M., and Wold, B. (2007). Genome-wide mapping of in vivo protein-DNA interactions. Science 316: 1497–1502.


Jones, E., et al. (2001). SciPy: Open Source Scientific Tools for Python, http://www.scipy.org/.

Kaufmann, K., Muiño, J.M., Jauregui, R., Airoldi, C.A., Smaczniak, C., Krajewski, P., and Angenent, G.C. (2009). Target genes of the MADS transcription factor SEPALLATA3: integration of developmental and hormonal pathways in the Arabidopsis flower. PLoS Biol. 7: e1000090.

Kaufmann, K., Wellmer, F., Muiño, J.M., Ferrier, T., Wuest, S.E., Kumar, V., Serrano-Mislata, A., Madueño, F., Krajewski, P., Meyerowitz, E.M., Angenent, G.C., and Riechmann, J.L. (2010). Orchestration of floral initiation by APETALA1. Science 328: 85–89.

Lee, I., Ambaru, B., Thakkar, P., Marcotte, E.M., and Rhee, S.Y. (2010). Rational association of genes with traits using a genome-scale gene network for Arabidopsis thaliana. Nat. Biotechnol. 28: 149–156.

Lee, J., He, K., Stolc, V., Lee, H., Figueroa, P., Gao, Y., Tongprasit, W., Zhao, H., Lee, I., and Deng, X.W. (2007). Analysis of transcription factor HY5 genomic binding sites revealed its hierarchical role in light regulation of development. Plant Cell 19: 731–749.

Li, H., and Durbin, R. (2009). Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25: 1754–1760.

Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., and Durbin, R., 1000 Genome Project Data Processing Subgroup (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics 25: 2078–2079.

Li, Z., Li, B., Shen, W.H., Huang, H., and Dong, A. (2012). TCP transcription factors interact with AS2 in the repression of class-I KNOX genes in Arabidopsis thaliana. Plant J. 71: 99–107.

Lin, R., Ding, L., Casola, C., Ripoll, D.R., Feschotte, C., and Wang, H. (2007). Transposase-derived transcription factors regulate light signaling in Arabidopsis. Science 318: 1302–1305.

Lindemose, S., Jensen, M.K., Van de Velde, J., O’Shea, C., Heyndrickx, K.S., Workman, C.T., Vandepoele, K., Skriver, K., and De Masi, F. (2014). A DNA-binding-site landscape and regulatory network analysis for NAC transcription factors in Arabidopsis thaliana. Nucleic Acids Res. 42: 7681–7693.

Liu, T., Carlsson, J., Takeuchi, T., Newton, L., and Farré, E.M. (2013). Direct regulation of abiotic responses by the Arabidopsis circadian clock component PRR7. Plant J. 76: 101–114.

MacArthur, S., et al. (2009). Developmental roles of 21 Drosophila transcription factors are determined by quantitative differences in binding to an overlapping set of thousands of genomic regions. Genome Biol. 10: R80.

Mahony, S., and Benos, P.V. (2007). STAMP: a web tool for exploring DNA-binding motif similarities. Nucleic Acids Res. 35: W253–W258.

Marbach, D., Roy, S., Ay, F., Meyer, P.E., Candeias, R., Kahveci, T., Bristow, C.A., and Kellis, M. (2012). Predictive regulatory models in Drosophila melanogaster by integrative inference of transcriptional networks. Genome Res. 22: 1334–1349.

Mathieu, J., Yant, L.J., Mürdter, F., Küttner, F., and Schmid, M. (2009). Repression of flowering by the miR172 target SMZ. PLoS Biol. 7: e1000148.

Meinke, D., Muralla, R., Sweeney, C., and Dickerman, A. (2008). Identifying essential genes in Arabidopsis thaliana. Trends Plant Sci. 13: 483–491.

Mejia-Guerra, M.K., Pomeranz, M., Morohashi, K., and Grotewold, E. (2012). From plant gene regulatory grids to network dynamics. Biochim. Biophys. Acta 1819: 454–465.

Morohashi, K., and Grotewold, E. (2009). A systems approach reveals regulatory circuitry for Arabidopsis trichome initiation by the GL3 and GL1 selectors. PLoS Genet. 5: e1000396.

Moyroud, E., Minguet, E.G., Ott, F., Yant, L., Posé, D., Monniaux, M., Blanchet, S., Bastien, O., Thévenon, E., Weigel, D., Schmid, M., and Parcy, F. (2011). Prediction of regulatory interactions from genome sequences using a biophysical model for the Arabidopsis LEAFY transcription factor. Plant Cell 23: 1293–1306.

Nakamichi, N., Kiba, T., Kamioka, M., Suzuki, T., Yamashino, T., Higashiyama, T., Sakakibara, H., and Mizuno, T. (2012). Transcriptional repressor PRR5 directly regulates clock-output pathways. Proc. Natl. Acad. Sci. USA 109: 17123–17128.

Nègre, N., et al. (2011). A cis-regulatory map of the Drosophila genome. Nature 471: 527–531.

Nei, M., and Li, W.H. (1979). Mathematical model for studying genetic variation in terms of restriction endonucleases. Proc. Natl. Acad. Sci. USA 76: 5269–5273.

Oh, E., Zhu, J.Y., and Wang, Z.Y. (2012). Interaction between BZR1 and PIF4 integrates brassinosteroid and environmental responses. Nat. Cell Biol. 14: 802–809.

Ouyang, X., et al. (2011). Genome-wide binding site analysis of FAR-RED ELONGATED HYPOCOTYL3 reveals its novel function in Arabidopsis development. Plant Cell 23: 2514–2535.

Palaniswamy, S.K., James, S., Sun, H., Lamb, R.S., Davuluri, R.V., and Grotewold, E. (2006). AGRIS and AtRegNet. a platform to link cis-regulatory elements and transcription factors into regulatory networks. Plant Physiol. 140: 818–829.

Para, A., et al. (2014). Hit-and-run transcriptional control by bZIP1 mediates rapid nutrient signaling in Arabidopsis. Proc. Natl. Acad. Sci. USA 111: 10371–10376.

Posé, D., Verhage, L., Ott, F., Yant, L., Mathieu, J., Angenent, G.C., Immink, R.G., and Schmid, M. (2013). Temperature-dependent regulation of flowering by antagonistic FLM variants. Nature 503: 414–417.

Quinlan, A.R., and Hall, I.M. (2010). BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26: 841–842.

R Core Team (2012). R: A Language and Environment for Statistical Computing. (Vienna, Austria: R Foundation for Statistical Computing).

Ren, B., et al. (2000). Genome-wide location and function of DNA binding proteins. Science 290: 2306–2309.

Roy, S., et al., modENCODE Consortium (2010). Identification of functional elements and regulatory circuits by Drosophila modENCODE. Science 330: 1787–1797.

Sequeira-Mendes, J., Aragüez, I., Peiró, R., Mendez-Giraldez, R., Zhang, X., Jacobsen, S.E., Bastolla, U., and Gutierrez, C. (2014). The functional topography of the Arabidopsis genome is organized in a reduced number of linear motifs of chromatin states. Plant Cell 26: 2351–2366.

Shalgi, R., Lieber, D., Oren, M., and Pilpel, Y. (2007). Global and local architecture of the mammalian microRNA-transcription factor regulatory network. PLOS Comput. Biol. 3: e131.

Singh, K.B. (1998). Transcriptional regulation in plants: the importance of combinatorial control. Plant Physiol. 118: 1111–1120.

Stergachis, A.B., Haugen, E., Shafer, A., Fu, W., Vernot, B., Reynolds, A., Raubitschek, A., Ziegler, S., LeProust, E.M., Akey, J.M., and Stamatoyannopoulos, J.A. (2013). Exonic transcription factor binding directs codon choice and affects protein evolution. Science 342: 1367–1372.

Storey, J.D., and Tibshirani, R. (2003). Statistical significance for genomewide studies. Proc. Natl. Acad. Sci. USA 100: 9440–9445.

Suzuki, M., Ketterling, M.G., and McCarty, D.R. (2005). Quantitative statistical analysis of cis-regulatory sequences in ABA/VP1- and CBF/DREB1-regulated genes of Arabidopsis. Plant Physiol. 139: 437–447.

Tao, Z., Shen, L., Liu, C., Liu, L., Yan, Y., and Yu, H. (2012). Genome-wide identification of SOC1 and SVP targets during the floral transition in Arabidopsis. Plant J. 70: 549–561.

Teytelman, L., Thurtle, D.M., Rine, J., and van Oudenaarden, A. (2013). Highly expressed loci are vulnerable to misleading ChIP localization of multiple unrelated proteins. Proc. Natl. Acad. Sci. USA 110: 18602–18607.


Thimm, O., Bläsing, O., Gibon, Y., Nagel, A., Meyer, S., Krüger, P.,Selbig, J., Müller, L.A., Rhee, S.Y., and Stitt, M. (2004). MAPMAN: a user-driven tool to display genomics data sets onto diagrams of metabolicpathways and other biological processes. Plant J. 37: 914–939.

Thomas-Chollier, M., Herrmann, C., Defrance, M., Sand, O., Thieffry, D., and van Helden, J. (2012). RSAT peak-motifs: motif analysis in full-size ChIP-seq datasets. Nucleic Acids Res. 40: e31.

Turatsinze, J.V., Thomas-Chollier, M., Defrance, M., and van Helden, J. (2008). Using RSAT to scan genome sequences for transcription factor binding sites and cis-regulatory modules. Nat. Protoc. 3: 1578–1588.

Van Bel, M., Proost, S., Wischnitzki, E., Movahedi, S., Scheerlinck, C., Van de Peer, Y., and Vandepoele, K. (2012). Dissecting plant genomes with the PLAZA comparative genomics platform. Plant Physiol. 158: 590–600.

Vandepoele, K., Casneuf, T., and Van de Peer, Y. (2006). Identification of novel regulatory modules in dicotyledonous plants using expression data and comparative genomics. Genome Biol. 7: R103.

Van de Velde, J., Heyndrickx, K.S., and Vandepoele, K. (2014). Inference of transcriptional networks in Arabidopsis through conserved noncoding sequence analysis. Plant Cell 26: 2729–2745.

Van Nostrand, E.L., and Kim, S.K. (2013). Integrative analysis of C. elegans modENCODE ChIP-seq data sets to infer gene regulatory interactions. Genome Res. 23: 941–953.

Wang, C., Xu, J., Zhang, D., Wilson, Z.A., and Zhang, D. (2010). An effective approach for identification of in vivo protein-DNA binding sites from paired-end ChIP-Seq data. BMC Bioinformatics 11: 81.

Wang, F., and Perry, S.E. (2013). Identification of direct targets of FUSCA3, a key regulator of Arabidopsis seed development. Plant Physiol. 161: 1251–1264.

Wang, J., et al. (2012). Sequence features and chromatin structure around the genomic regions bound by 119 human transcription factors. Genome Res. 22: 1798–1812.

Weigel, D., and Mott, R. (2009). The 1001 genomes project for Arabidopsis thaliana. Genome Biol. 10: 107.

Wheeler, D.L., et al. (2008). Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 36: D13–D21.

Williams, M.E., Foster, R., and Chua, N.H. (1992). Sequences flanking the hexameric G-box core CACGTG affect the specificity of protein binding. Plant Cell 4: 485–496.

Winter, C.M., et al. (2011). LEAFY target genes reveal floral regulatory logic, cis motifs, and a link to biotic stimulus response. Dev. Cell 20: 430–443.

Wray, G.A., Hahn, M.W., Abouheif, E., Balhoff, J.P., Pizer, M., Rockman, M.V., and Romano, L.A. (2003). The evolution of transcriptional regulation in eukaryotes. Mol. Biol. Evol. 20: 1377–1419.

Wuest, S.E., O’Maoileidigh, D.S., Rae, L., Kwasniewska, K., Raganelli, A., Hanczaryk, K., Lohan, A.J., Loftus, B., Graciet, E., and Wellmer, F. (2012). Molecular basis for the specification of floral organs by APETALA3 and PISTILLATA. Proc. Natl. Acad. Sci. USA 109: 13452–13457.

Xie, Z., Lee, E., Lucas, J.R., Morohashi, K., Li, D., Murray, J.A., Sack, F.D., and Grotewold, E. (2010). Regulation of cell proliferation in the stomatal lineage by the Arabidopsis MYB FOUR LIPS via direct targeting of core cell cycle genes. Plant Cell 22: 2306–2321.

Yant, L., Mathieu, J., Dinh, T.T., Ott, F., Lanz, C., Wollmann, H., Chen, X., and Schmid, M. (2010). Orchestration of the floral transition and floral development in Arabidopsis by the bifunctional transcription factor APETALA2. Plant Cell 22: 2156–2170.

Yip, K.Y., Cheng, C., Bhardwaj, N., Brown, J.B., Leng, J., Kundaje, A., Rozowsky, J., Birney, E., Bickel, P., Snyder, M., and Gerstein, M. (2012). Classification of human genomic regions based on experimentally determined binding sites of more than 100 transcription-related factors. Genome Biol. 13: R48.

Yu, X., Li, L., Zola, J., Aluru, M., Ye, H., Foudree, A., Guo, H., Anderson, S., Aluru, S., Liu, P., Rodermel, S., and Yin, Y. (2011). A brassinosteroid transcriptional network revealed by genome-wide identification of BESI target genes in Arabidopsis thaliana. Plant J. 65: 634–646.

Zacher, B., Torkler, P., and Tresch, A. (2011). Analysis of Affymetrix ChIP-chip data using starr and R/Bioconductor. Cold Spring Harb Protoc 2011: top110.

Zhang, W., Zhang, T., Wu, Y., and Jiang, J. (2012). Genome-wide identification of regulatory DNA elements and protein-binding footprints using signatures of open chromatin in Arabidopsis. Plant Cell 24: 2719–2731.

Zhang, Y., Mayba, O., Pfeiffer, A., Shi, H., Tepperman, J.M., Speed, T.P., and Quail, P.H. (2013). A quartet of PIF bHLH factors provides a transcriptionally centered signaling hub that regulates seedling morphogenesis through differential expression-patterning of shared target genes in Arabidopsis. PLoS Genet. 9: e1003244.

Zhang, Y., Liu, T., Meyer, C.A., Eeckhoute, J., Johnson, D.S., Bernstein, B.E., Nusbaum, C., Myers, R.M., Brown, M., Li, W., and Liu, X.S. (2008). Model-based analysis of ChIP-Seq (MACS). Genome Biol. 9: R137.

Zheng, Y., Ren, N., Wang, H., Stromberg, A.J., and Perry, S.E. (2009). Global identification of targets of the Arabidopsis MADS domain protein AGAMOUS-Like15. Plant Cell 21: 2563–2577.

Zulawski, M., Braginets, R., and Schulze, W.X. (2013). PhosPhAt goes kinases—searchable protein kinase target information in the plant phosphorylation site database PhosPhAt. Nucleic Acids Res. 41: D1176–D1184.

Ken S. Heyndrickx, Jan Van de Velde, Congmao Wang, Detlef Weigel, and Klaas Vandepoele
A Functional and Evolutionary Perspective on Transcription Factor Binding in Arabidopsis thaliana
Plant Cell 2014;26;3894-3910; originally published online October 31, 2014; DOI 10.1105/tpc.114.130591
