Genomewide Structural Annotation and Evolutionary Analysis of
the Type I MADS-Box Genes in Plants
Stefanie De Bodt,1 Jeroen Raes,1 Kobe Florquin,1 Stephane Rombauts,1 Pierre Rouze,1,2 Gunter Theißen,3
Yves Van de Peer1
1Department of Plant Systems Biology, Flanders Interuniversity Institute for Biotechnology, Ghent University, K.L. Ledeganckstraat 35, B-9000 Gent, Belgium
2Laboratoire Associé de l'Institut National de la Recherche Agronomique (France), Ghent University, B-9000 Gent, Belgium
3University of Jena, Lehrstuhl for Genetics, Philosophenweg 12, D-07743 Jena, Germany
Received: 1 August 2002 / Accepted: 18 November 2002
Abstract. The type I MADS-box genes constitute a largely unexplored subfamily of the extensively studied MADS-box gene family, well known for its role in flower development. Genes of the type I MADS-box subfamily possess the characteristic MADS box but are distinguished from type II MADS-box genes by the absence of the keratin-like box. In this in silico study, we have structurally annotated all 47 members of the type I MADS-box gene family in Arabidopsis thaliana and performed a thorough analysis of the C-terminal regions of the translated proteins. On the basis of conserved motifs in the C-terminal region, we could classify the gene family into three main groups, two of which could be further subdivided. Phylogenetic trees were inferred to study the evolutionary relationships within this large MADS-box gene subfamily. These suggest for plant type I genes a dynamic of evolution that is significantly different from the mode of both animal type I (SRF) and plant type II (MIKC-type) gene phylogeny. The presence of conserved motifs in the majority of these genes, the identification of Oryza sativa MADS-box type I homologues, and the detection of expressed sequence tags for Arabidopsis thaliana and other plant type I genes suggest that these genes are indeed of functional importance to plants. It is therefore even more intriguing that, from an experimental point of view, almost nothing is known about the function of these MADS-box type I genes.
Key words: Structural annotations; Type I MADS-box gene family; Arabidopsis; Rice; Classification
Introduction
The MADS-box gene family encodes a family of transcription factors involved in diverse aspects of plant development and has been designated by an acronym (Schwarz-Sommer et al. 1990) after a few of its earliest members, namely, MCM1, found in yeast (Passmore et al. 1988), AGAMOUS, in Arabidopsis thaliana (Yanofsky et al. 1990), DEFICIENS, in Antirrhinum majus (Sommer et al. 1990; Schwarz-Sommer et al. 1992), and SRF, in human (Norman et al. 1988). All MADS-box genes encode a strongly conserved MADS domain, found in the N-terminal region, that is responsible for DNA binding to CC(A/T)6GG boxes in the regulatory region of their target genes (Shore and Sharrocks 1995). Recent analyses have shown that this large gene family can be divided into two major lineages, called type I and type II (Alvarez-Buylla et al. 2000b). Since both type I and type II genes are found in plants, animals, and fungi, both types of MADS-box genes are assumed to have originated by duplication before the divergence of these kingdoms. Based on the structure of the MADS domain, type I and type II genes are also referred to as MADS SRF-like and MADS MEF2-like genes, respectively (Alvarez-Buylla et al. 2000b).

In animals, type I genes are involved in response to growth factors, while type II genes are involved in muscle development (Norman et al. 1988; Yu et al. 1992). Besides the highly conserved MADS domain, animal type I (SRF-like) and type II (MEF2-like) genes contain an additionally conserved region, the SAM and MEF2 domain, respectively (Shore and Sharrocks 1995; Riechmann and Meyerowitz 1997; Alvarez-Buylla et al. 2000b). The same is true for fungi.

Plant type II MADS-box genes possess a strongly conserved MEF2-like MADS box, followed by a weakly conserved I (intervening) box, a K (keratin-like) box, and a C box and are therefore termed the MIKC-type (MIKC) genes (Munster et al. 1997). The moderately conserved K domain has been shown to be important for protein–protein interactions and probably forms a coiled-coil structure. The poorly conserved carboxyl-terminal (C) region may function as a trans-activation domain (Riechmann and Meyerowitz 1997). Plant type II MADS-box genes have been extensively studied during the last decade and are best known for their role in flower development (see, e.g., Riechmann and Meyerowitz 1997; Pelaz et al. 2000; Theißen et al. 2000; Ng and Yanofsky 2001; Theißen 2001; Theißen and Saedler 2001). Besides this role, MADS-box genes also have an important function in the development of other plant organs such as fruit (Liljegren et al. 1998, 2000), roots (Zhang and Forde 2000; Alvarez-Buylla et al. 2000a; Burgeff et al. 2002), and ovules (Angenent and Colombo 1996). The type II MADS-box transcription factors provide an excellent genetic toolkit to study the evolution of plant development. Alterations in the expression of genes coding for transcriptional regulators, such as MADS-box genes, are emerging as a major source of the diversity and change that underlie evolution and can be linked to changes in plant body plan or the generation of evolutionary novelties (Riechmann et al. 2000; Theißen 2001).

Unlike the type II MADS-box genes in plants, the type I subfamily has remained largely unexplored. Plant type I MADS-domain proteins are characterized by an SRF-like MADS domain, but the C-terminal region of these genes is still not well defined and is of variable length. Furthermore, type I genes are characterized by the absence of the well-defined K box. Based on phylogenetic tree inference, Alvarez-Buylla et al. (2000b) concluded that this K box arose in plant type II genes after the divergence of plants from animals and fungi. Hitherto, only a few members of this subfamily have been identified by in silico prediction in Arabidopsis thaliana, and their function remains completely unknown (Alvarez-Buylla et al. 2000b). The recent discovery of this new subfamily of MADS-box genes in Arabidopsis thaliana and the lack of knowledge about their function call for the full characterization of this gene family in Arabidopsis thaliana and the identification of homologues in other plants. Moreover, further analysis of the type I MADS-box gene family may be very important in understanding the origin and evolution of the whole MADS-box gene family. To this end, we have analyzed the size and the structural characteristics of the type I subfamily in Arabidopsis thaliana and have identified the first type I MADS-box genes in Oryza sativa. The completion of the Arabidopsis thaliana genome sequence (Arabidopsis Genome Initiative 2000) allows investigation of the full complement of MADS-box type I genes in this model plant. The structural annotation of the gene family was done in a semiautomated way, combining high-throughput gene prediction with a manual control step. By using this approach we tried to combine speed with accuracy, because future research on these sequences depends on the correctness of their annotation. Additionally, we performed a phylogenetic analysis of the type I subfamily of MADS-box genes to study the evolutionary relationships between the newly annotated genes.

J Mol Evol (2003) 56:573–586. DOI: 10.1007/s00239-002-2426-x
Correspondence to: Yves Van de Peer; email: [email protected]. ac.be
Methods
Structural Annotation of Type I MADS-Box Genes
The annotation of the type I MADS-box gene family in Arabidopsis thaliana was based on homology searches with the conserved part of the genes of the family. Hence, the MADS domain of the type I MADS-domain proteins identified by Alvarez-Buylla et al. (2000b) was used as a query sequence in BLAST (tblastn using default parameters) searches (Altschul et al. 1990) against the sequences of the Arabidopsis genome. The E-value cutoff was initially set at 1e-10; hits with higher E-values were selected manually, taking into account the conserved, possibly functionally important residues in the MADS domain. The genomic sequences containing putative type I MADS-box genes were subjected to gene prediction using GeneMark.hmm (Lukashin and Borodovsky 1998). A manual control step of the annotation involved the inspection of the exon–intron structure and the multiple alignment of the MADS-domain protein sequences using Artemis (Rutherford et al. 2000) and BioEdit (Hall 1999). Based on similarity to close relatives of the gene family, wrongly predicted exon borders and over- or underprediction of exons were detected and corrected. To identify more distantly related proteins, we also constructed a HMMer profile (Eddy 1998) based on the already predicted and manually corrected genes. This profile was used to search a nonredundant database containing a collection of Arabidopsis thaliana proteins found through prediction with GeneMark.hmm (Lukashin and Borodovsky 1998) on the Arabidopsis thaliana genome (genome version of January 18, 2001 [v180101], downloaded from the MIPS ftp-site at ftp://ftpmips.gsf.de/cress/). These gene predictions were then checked again manually.
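The E-value triage described above can be illustrated with a minimal sketch. It assumes hits exported in the 12-column tabular layout of current BLAST+ releases (subject identifier in column 2, E-value in column 11), which is not necessarily the output format used in this study. The sample hits and the names `parse_tabular` and `triage` are hypothetical.

```python
from typing import List, NamedTuple, Tuple

E_VALUE_CUTOFF = 1e-10  # initial cutoff stated in the text


class Hit(NamedTuple):
    subject: str
    evalue: float


def parse_tabular(text: str) -> List[Hit]:
    """Parse tab-separated BLAST hits (assumed 12-column tabular layout)."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        hits.append(Hit(subject=fields[1], evalue=float(fields[10])))
    return hits


def triage(hits: List[Hit], cutoff: float = E_VALUE_CUTOFF) -> Tuple[List[Hit], List[Hit]]:
    """Split hits into those at or below the cutoff and those left for manual review."""
    accepted = [h for h in hits if h.evalue <= cutoff]
    manual_review = [h for h in hits if h.evalue > cutoff]
    return accepted, manual_review


# Hypothetical example hits, not data from the study.
SAMPLE = "\n".join([
    "query\tBAC_A\t85.0\t60\t9\t0\t1\t60\t100\t279\t3e-25\t110",
    "query\tBAC_B\t40.0\t55\t33\t0\t3\t57\t500\t664\t2e-04\t42.0",
])

accepted, manual_review = triage(parse_tabular(SAMPLE))
print([h.subject for h in accepted])
print([h.subject for h in manual_review])
```

Hits that fall in the manual-review list are not discarded; in the procedure described above they are inspected for conserved residues of the MADS domain, a step this sketch does not attempt to automate.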
Additionally, we searched for type I MADS-domain proteins in Oryza sativa. Based on the multiple sequence alignment of the Arabidopsis thaliana type I MADS-domain proteins, a HMMer profile (Eddy 1998) was built to search a rice protein database for type I MADS-domain proteins. This database contained 24,305 rice proteins predicted with GeneMark.hmm (Lukashin and Borodovsky 1998) on rice BAC sequences from the Rice Genome Project covering approximately 29% of the rice genome (Oryza sativa ssp. japonica [Sasaki and Burr 2000; http://rgp.dna.affrc.go.jp/]). Furthermore, we screened the draft sequence of Oryza sativa ssp. indica (Yu et al. 2002) for putative type I MADS-box genes using BLAST, with other type I genes as query sequences.

Duplicated blocks (i.e., large regions of colinearity) in the Arabidopsis thaliana genome were detected and dated as described earlier (Vandepoele et al. 2002; Raes et al. 2003).

Table 1. Arabidopsis thaliana and Oryza sativa type I MADS-box genes

| Locus name | Gene | Accession No. (a) | Start | Stop | Length | Strand | Chromosome | EST | Class |
|---|---|---|---|---|---|---|---|---|---|
| At1g28460 | | AC010155 | 35,082 | 35,630 | 182 | - | 1 | | M |
| At1g28450 | | AC010155_2 | 37,337 | 37,894 | 185 | + | 1 | | M |
| At1g60880 | | AC018908_2 | 24,777 | 25,352 | 191 | - | 1 | | M |
| At1g60920 | | AC018908_1 | 6,660 | 7,265 | 201 | + | 1 | | M |
| At3g04100 | | AC016829 | 84,782 | 85,405 | 207 | + | 3 | | M |
| At1g01530 | AGL28 | Y12776 | 6,766 | 7,788 | 247 | + | 1 | | M |
| At1g65360 | AGL23 | AC004512_2 | 47,399 | 48,213 | 226 | + | 1 | | M |
| At2g24840 | | AC006585 | 25,227 | 25,859 | 210 | + | 2 | | M |
| At5g60440 | | AB011483 | 26,829 | 28,020 | 299 | + | 5 | | M |
| At4g36590 | AGL40 | AL161589 | 121,429 | 123,079 | 243 | - | 4 | | M |
| At5g38620 | | AB005231 | 463,826 | 464,875 | 349 | - | 5 | | M |
| At5g49420 | | AB023034 | 34,638 | 36,134 | 402 | - | 5 | | M |
| At2g34440 | AGL29 | AC004077 | 16,781 | 17,299 | 172 | + | 2 | | M |
| At1g48150 | | AC023673 | 497,767 | 498,738 | 323 | + | 1 | | M |
| At5g27130 | AGL39 | AF007271 | 71,901 | 75,618 | 435 | - | 5 | | M |
| At1g47760 | | AC012463 | 70,240 | 70,948 | 184 | - | 1 | | M |
| At3g66656 | | AC036106 | 29,224 | 29,760 | 178 | - | 3 | | M |
| At4g14530 | | AL161539 | 46,973 | 47,658 | 213 | - | 4 | | M |
| At5g49490 | | AB023033 | 10,587 | 11,330 | 247 | + | 5 | | M |
| At5g04640 | | AL162875 | 89,521 | 90,489 | 322 | + | 5 | | M |
| | | Os_AP003951_1 | 28,733 | 29,365 | 633 | - | 6 | | M |
| | | Os_AP003951_2 | 50,199 | 50,771 | 572 | - | 6 | | M |
| | | Os_AP003627 | 102,168 | 102,794 | 627 | + | 1 | | M |
| | | Os_AP004093 | 73,268 | 73,128 | 861 | + | 2 | | M |
| | | Os_Contig2417 | 5,705 | 9,967 | 210 | + | | | M |
| | | Os_Contig4095 | 1,453 | 2,109 | 218 | + | | | M |
| | | Os_Contig4276 | 6,289 | 6,921 | 210 | + | | | M |
| | | Os_Contig28459 | 1,540 | 2,078 | 141 | - | | | M |
| | | Os_Contig18609 | 465 | 1,049 | 194 | + | | | M |
| At5g26580/At5g26575 (b) | | AF058914 | 4,471 | 5,508 | 304 | + | 5 | | N |
| At5g26630/At5g26625 (b) | | AF058914_2 | 40,425 | 47,737 | 315 | - | 5 | | N |
| At5g26650/At5g26575 (b) | | AF058914_3 | 53,688 | 54,794 | 327 | - | 5 | X | N |
| At1g65330 | | AC004512_1 | 32,543 | 33,382 | 279 | - | 1 | | N |
| At1g65300 | | AC004512_3 | 21,003 | 21,827 | 278 | + | 1 | | N |
| At3g05860 | | AC012393 | 56,275 | 57,224 | 260 | - | 3 | | N |
| At2g28700 | | AC007184 | 3,732 | 4,502 | 256 | - | 2 | | N |
| At5g27960 | | AC007627 | 64,277 | 65,368 | 363 | - | 5 | | N |
| At5g48670 | | AB015468 | 59,946 | 60,911 | 321 | - | 5 | | N |
| At1g31630 | | AC074360_2 | 59,951 | 60,970 | 339 | + | 1 | | N |
| At1g31640 | | AC074360_1 | 55,322 | 56,806 | 464 | + | 1 | | N |
| At2g40210 | | AC018721 | 40,137 | 42,106 | 402 | - | 2 | X | N |
| At2g26880 | AGL41 | AC005168 | 51,386 | 52,188 | 260 | - | 2 | | N |
| | | Os_AP002070_1 | 57,225 | 57,947 | 240 | + | 1 | | N |
| | | Os_AP002070_2 | 71,207 | 72,127 | 306 | + | 1 | | N |
| | | Os_Contig28311 | 1,167 | 1,580 | 138 | - | | | N |
| | | Os_Contig603 | 3,973 | 4,776 | 267 | + | | | N |
| | | Os_Contig23118 | 1,479 | 1,904 | 141 | + | | | N |
| | | Os_Contig18573 | 850 | 1,079 | 209 | + | | | N |
| | | Os_Contig119850 | 52 | 667 | 205 | - | | | N |
| | | Os_Contig31610 | 1,805 | 2,215 | 136 | - | | | N |
| | | Os_Contig18149 | 1,420 | 2,035 | 205 | - | | | N |
| At2g03060 (c) | AGL30 | AC004138 | 81,852 | 83,914 | 364 | + | 2 | | O |
| At1g31140 | | AC004793 | 29,171 | 30,813 | 211 | + | 1 | | O |
| At1g22590 | | AC006551 | 24,033 | 24,524 | 163 | - | 1 | X | O |
| At1g77950 | | AC009243 | 50,294 | 52,808 | 244 | + | 1 | X | O |
| At1g72350 | | AC016529 | 73,282 | 73,956 | 224 | + | 1 | | O |
| At1g17310 | | AC026479 | 2,189 | 2,827 | 212 | - | 1 | | O |
| At5g26950 | AGL26 | AF007270 | 84,574 | 85,554 | 292 | - | 5 | | O |
| At2g26320 | AGL33 | AC004484 | 66,595 | 68,743 | 209 | - | 2 | | O |
| At1g18750 (c) | | AC011809 | 60,126 | 62,604 | 440 | + | 1 | | O |
| At1g22130 | | AC0690252 | 2,812,402 | 2,814,274 | 335 | - | 1 | | O |
Structural Analysis of the C-Terminal Region
All type I MADS-box genes possess the strongly conserved MADS box. However, the C-terminal region of these genes is much less conserved and has a variable length. We performed a motif search on all type I MADS-domain protein sequences using MEME (Multiple Expectation Maximization for Motif Elicitation), version 3.0 (Bailey and Elkan 1994). Based on the conserved motifs found by MEME, the type I MADS-box gene family was further subdivided into smaller subgroups, after which these subgroups were realigned, now taking into account additional sites that could be shown to belong to shared and conserved motifs.

HMMer profiles (Eddy 1998) were built from the different motifs identified by MEME (see Results). These profiles were scanned against our in-house Arabidopsis thaliana protein database (see Structural Annotation and Phylogenetic Analysis, below) and the MIPS protein database to search for other proteins that contain similar motifs. The InterPro database (release 4.0, November 2001 [Apweiler et al. 2001]) was also checked for the presence of the C-terminal motifs.

To make sure that no type II MADS-domain proteins were included in our data set, all sequences were analyzed for the presence of the type II-specific K domain using InterPro searches (release 4.0, November 2001 [Apweiler et al. 2001]) and Multicoil (Wolf et al. 1997) for coiled-coil prediction based on the presence of heptad-repeat signature motifs (abcdefg, where a and d are hydrophobic residues pointing to the core of the coiled coil and b, c, e, f, and g are hydrophilic residues) in the sequences (Lupas 1996).
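As a rough illustration of the heptad-repeat signature, the sketch below scores, for each of the seven possible registers, the fraction of a and d positions occupied by hydrophobic residues. This is not the Multicoil algorithm, which relies on trained pairwise residue statistics; the hydrophobic residue set, the function names, and the example peptides are assumptions made for illustration only.

```python
from typing import Tuple

# Assumed hydrophobic set; published definitions differ slightly.
HYDROPHOBIC = set("AILMFVWY")


def core_fraction(seq: str, register: int) -> float:
    """Fraction of a/d positions that are hydrophobic for a given register.

    `register` is the index (0-6) of the first residue treated as position a;
    position d lies three residues after each a.
    """
    core = [res for i, res in enumerate(seq) if (i - register) % 7 in (0, 3)]
    if not core:
        return 0.0
    return sum(res in HYDROPHOBIC for res in core) / len(core)


def best_register(seq: str) -> Tuple[int, float]:
    """Return the register with the highest hydrophobic a/d fraction."""
    scores = {r: core_fraction(seq, r) for r in range(7)}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical idealized peptide: leucine at every a and d position.
ideal = "LEELKKK" * 4
print(best_register(ideal))

# Hypothetical peptide without hydrophobic residues.
print(best_register("GSGSGSG" * 4))
```

A high score in one register over several consecutive heptads is only suggestive of a coiled coil; a dedicated predictor is still needed before a K domain can be called present or absent.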
Table 1. Continued

| Locus name | Gene | Accession No. (a) | Start | Stop | Length | Strand | Chromosome | EST | Class |
|---|---|---|---|---|---|---|---|---|---|
| At1g77980 | | AC009243_2 | 58,477 | 60,332 | 303 | - | 1 | | O |
| At1g69540 | | AC073178 | 88,592 | 90,480 | 359 | - | 1 | | O |
| At5g06500 | | AP002543 | 7,047 | 7,775 | 728 | + | 5 | | O |
| At5g58890 | AGL43 | AB016885 | 33,758 | 34,642 | 294 | + | 5 | | O |
| At5g55690 | | AB009050 | 40,372 | 41,205 | 277 | - | 5 | | O |
| | | Os_AP000616 | 39,576 | 42,209 | 855 | + | 6 | | O |
| | | Os_AP003104 | 53,129 | 54,943 | 1,815 | + | 1 | | O |
| | | Os_AP003331_1 | 86,944 | 88,167 | 1,224 | + | 1 | | O |
| | | Os_AP003331_2 | 89,653 | 90,947 | 975 | + | 1 | | O |
| | | Os_AP003380 | 8,256 | 9,365 | 1,110 | - | 1 | | O |
| | | Os_AP003436 | 171,451 | 172,890 | 1,440 | - | 1 | | O |
| | | Os_AP003763 | 127,881 | 128,597 | 279 | + | 6 | | O |
| | | Os_AP003742 | 63,104 | 64,429 | 645 | - | 7 | | O |
| | | Os_AP004322 | 4,659 | 10,836 | 477 | + | 6 | | O |
| | | Os_AP003331_3 | 95,818 | 98,653 | 1,188 | + | 1 | | O |
| | | Os_Contig19550 | 853 | 1,324 | 375 | + | | | O |
| | | Os_Contig52002 | 790 | ? | ? | + | | | O |
| | | Os_Contig20368 | ? | 405 | ? | - | | | O |
| | | Os_Contig45237 | 1,180 | ? | ? | + | | | O |
| | | Os_Contig11428 | 5,081 | ? | ? | + | | | O |
| | | Os_Contig32902 | 2,555 | ? | ? | - | | | Unassigned (d) |
| | | Os_Contig2175 | 12,725 | ? | ? | - | | | Unassigned (d) |
| | | Os_Contig5668 | ? | 5,842 | ? | - | | | Unassigned (d) |
| | | Os_Contig21589 | ? | 2,624 | ? | - | | | Unassigned (d) |

(a) Rice genes are in boldface.
(b) Genes on BAC AF058914 have different locus names in MIPS and TIGR (http://www.tigr.org), respectively.
(c) Locus names of genes in MIPS (Schoof et al. 2002) that differ in their structural annotation from those presented here.
(d) These genes could not be classified unambiguously because the prediction was incomplete (see text for details).
Table 2. List of MADS-like genes in Arabidopsis thaliana
Locus name Accession No., BAC
At5g27090 AF170760
At5g27070 AF170670
At5g27580 AC007478
At5g26950 AF007270
At4g11250 AL096882
At5g65330 AB011479
At5g40220 AB010699
At5g39750 AB016876
At5g38740 AB011478
At5g40120 AB010699
At5g39810 AB016876
At5g41200 AB010072
At3g18650 AB026654
At5g27050 AF170670
At1g60040 AC005966
At1g59810 AC007258
Phylogenetic Analysis of Type I MADS-Domain Proteins
The complete alignment of all type I MADS-domain proteins was edited and reformatted for phylogenetic analysis using BioEdit (Hall 1999) and ForCon (Raes and Van de Peer 1999), resulting in an alignment of the conserved residues (MADS domain + residues of shared motifs). Neighbor-joining (Saitou and Nei 1987) trees were constructed using TREECON (Van de Peer and De Wachter 1997) based on Poisson-corrected distances. To assess support for the inferred relationships, 500 bootstrap samples (Felsenstein 1985) were generated.
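For readers unfamiliar with the distance measure, the Poisson correction converts the observed proportion p of differing amino acid sites into an estimated number of substitutions per site, d = -ln(1 - p). The sketch below computes this distance and draws one bootstrap replicate by resampling alignment columns with replacement. It is a minimal illustration, not the TREECON implementation; in particular, skipping sites where either sequence has a gap is an assumption, and the example sequences are hypothetical.

```python
import math
import random
from typing import List


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, ignoring columns with a gap (assumption)."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def poisson_distance(a: str, b: str) -> float:
    """Poisson-corrected distance d = -ln(1 - p)."""
    p = p_distance(a, b)
    if p >= 1.0:
        raise ValueError("sequences are saturated; distance undefined")
    return -math.log(1.0 - p)


def bootstrap_columns(alignment: List[str], rng: random.Random) -> List[str]:
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    n = len(alignment[0])
    cols = [rng.randrange(n) for _ in range(n)]
    return ["".join(seq[c] for c in cols) for seq in alignment]


# Hypothetical aligned fragments differing at 1 of 10 sites.
seq1 = "ACDEFGHIKL"
seq2 = "ACDEFGHIKV"
print(round(poisson_distance(seq1, seq2), 4))

replicate = bootstrap_columns([seq1, seq2], random.Random(0))
print(replicate)
```

In a full analysis, a distance matrix would be computed for each of the 500 replicates, a neighbor-joining tree built from each matrix, and the bootstrap support of a node taken as the percentage of replicate trees in which it appears.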
Maximum likelihood trees were constructed for type I MADS-box genes (see below) using TREE-PUZZLE 5.0 (Strimmer and von Haeseler 1996; Schmidt et al. 2002) and PAML (Yang 2000). In TREE-PUZZLE, the mutation probability matrix of Muller and Vingron (2000) was used, whereas the number of puzzling steps was set to 20,000. Bootstrapped maximum parsimony trees for class M and class N genes were constructed with PAUP* (Swofford 1998).

Predicted sequences and multiple alignments are available at our web site, http://www.psb.ac.be/bioinformatics/MADS/.
Results
Structural Annotation and Phylogenetic Analysis
Based on a genomewide analysis, we identified 47 type I MADS-box genes in the genome of Arabidopsis thaliana, of which 14 correspond to genes previously described by Alvarez-Buylla et al. (2000b) and 33 are new (see Table 1). Additionally, we discovered the presence of a new group of MADS-like genes. These genes are different from type I (and also type II) MADS-box genes due to a highly divergent N-terminal region of the MADS box. Furthermore, although most of these genes are overall strongly conserved, they do not possess the C-terminal conserved regions characteristic of type I (or type II) genes. For these reasons, we did not include these genes (listed in Table 2) in our analyses.

Figure 1 shows the distribution of the type I MADS-box genes on the different chromosomes. Seven genes could be linked to block duplications, namely, the gene pair AC016529 and AC026479 and the gene pair AC009243_2 and AC069252, which are all located in an internally duplicated block that contains 172 duplicated genes on chromosome 1 (Simillion et al. 2002; Raes et al. 2003). Additionally, genes AC012393 and AF058914_2 (and its neighbor AF058914_3) belong to a smaller block of 13 genes duplicated between chromosome 3 and chromosome 5 (Fig. 1). The larger block has been dated to 69 ± 17 MYA, while the smaller block duplication was dated to 78 ± 29 MYA, which implies that they could both have originated during the same complete genome duplication event, estimated to have occurred at about that time (Lynch and Conery 2000; Simillion et al. 2002; Raes et al. 2003).

Fig. 1. Chromosomal localization of the type I MADS-box genes in Arabidopsis thaliana. Gray bands denote duplicated blocks (see text for details).

Figure 2a shows the distribution of the number of exons found in type I MADS-box genes. As can be observed, the majority of the type I genes consist of only one or two exons, which is quite different from type II MADS-box genes, where most genes consist of seven exons (Fig. 2b).

Fig. 2. Distribution of the number of exons in the type I (a) and type II (b) MADS-box gene family.

In addition to the Arabidopsis thaliana type I genes, 16 rice type I MADS-box genes were annotated on BAC sequences of the rice consortium (Sasaki and Burr 2000). Preliminary analysis of the draft sequence of rice resulted in the additional identification of 19 putative type I MADS-box genes. Six other genes were found through BLAST searches on the rice draft sequence but could not be ascribed unequivocally to the type I subfamily. Further analysis and manual annotation of these rice genes will be necessary to decide whether these are type I or type II genes. Furthermore, to improve gene prediction in rice, an assembly of the contigs of the draft sequence will be necessary because many MADS-box genes are located at the end of the contigs. We also searched the publicly available databases for type I MADS-box genes of other plants but could not find any other type I homologues. It should be noted that the sequencing and annotation of other plant sequences are still ongoing, which will probably result in the detection of many more type I MADS-domain proteins in the near future.

The construction of reliable phylogenetic trees of the complete type I subfamily of MADS-domain proteins is very difficult due to the small size (60 amino acids) of the conserved MADS domain. Trees constructed on such a low number of residues often turn out to be unreliable and poorly supported by statistical analyses. As shown in Fig. 3, very few nodes are well supported and no conclusion can be drawn about possible subclasses present in the type I MADS-box gene family. Therefore, we applied alternative approaches to resolve the phylogeny of the gene family (see also Methods).

Fig. 3. Phylogenetic distance tree of all type I MADS-box proteins identified in Arabidopsis thaliana and Oryza sativa. Tree construction was based on only 47 conserved residues in the MADS domain. Five hundred bootstrap samples (Felsenstein 1985) were taken and branches are drawn as unresolved when supported by less than 50%. Based on the presence or absence of C-terminal motifs, genes were ascribed to class M, N, or O (see text for more details). Rice proteins are indicated in gray. The scale indicates 0.1 substitution per site.

Detailed structural analysis using MEME (Bailey and Elkan 1994) enabled us to discover several conserved motifs in the C-terminal region of the type I MADS-domain proteins (summarized in Figs. 4 and 5). Two main, distinct classes of type I MADS-domain proteins, which we designate class M and class N, can be identified, each of which can be further subdivided. Class M possesses three types of genes, viz., type I M1 genes, which are characterized by motifs 1, 2, and 3; type I M2 genes, characterized by motifs 1 and 3; and type I M3 genes, which contain only motif 1 (Fig. 4). Class N possesses three types of genes, viz., type I N1 genes, which are characterized by motifs 4, 5, 6, 7, sometimes 8, and 9; type I N2 genes, which possess motifs 4 and 5 and have a degenerated form of motif 6; and, finally, type I N3 genes, which contain only motifs 4 and 5 (Fig. 5). Next to class M and class N genes, there is a third class O of genes that do not possess the same conservation in the C-terminal region as the proteins in the other classes. Thus, although specific motifs could be identified for class M and N genes, it was not possible to find any conserved motif for the proteins that we classified as belonging to class O. It should be noted that type I MADS-box genes of rice have been found for all three classes (see Fig. 1).

Classification of the type I MADS-box genes into classes M and N on the basis of the presence of certain conserved motifs allowed alignment of longer regions of the type I MADS-box genes. Therefore, a phylogenetic tree was constructed for genes belonging to class M from an alignment of 76 conserved residues, including the MADS domain and motif 1 (shared among all the genes belonging to class M), whereas a second tree for class N genes was constructed from an alignment of 116 conserved residues, based on the MADS domain and motifs 4 and 5. These trees are shown in Figs. 6 and 7, respectively. Both trees were artificially rooted based on the presence or absence of certain motifs. As expected, in general there is a clear correlation between the tree topology and the structural characteristics of a group of proteins. In other words, proteins with the same C-terminal motif composition seem to be more closely related. In a few cases, remnants of common ancestry can be found, but the conservation was too low to be picked up by MEME. For example, genes of type I N2 do not contain motif 6 according to MEME, but some residues of the consensus sequence of this motif can still be recognized in these proteins. Therefore, these motifs are represented by hatched boxes (Figs. 6 and 7).

The trees shown in Figs. 6 and 7 are neighbor-joining trees (Saitou and Nei 1987) based on Poisson-corrected distances computed with TREECON (Van de Peer and De Wachter 1997). Overall, maximum likelihood trees and maximum parsimony trees gave similar results and differences were observed only for nonsupported nodes. As expected, the resolution of the trees seems to be correlated with the number of residues that could be taken into account for tree inference. The tree of class M genes, shown in Fig. 6 and based on 76 alignment positions, is still not very well resolved, apart from one subgroup of sequences that also contain additional conserved motifs (type I M1 and type I M2). Although strong conclusions cannot be drawn regarding the rice genes, due to the uncertainty of most branching orders, it seems that none of the rice genes is specifically related to any of the Arabidopsis thaliana genes. This is also observed in the tree of the N genes, based on 116 alignment positions, where the rice genes clearly form a monophyletic group, which is well supported by bootstrap analysis and different methods of tree construction (Fig. 7).
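The motif-based classification into classes M, N, and O described in this section can be summarized as a small decision procedure. The sketch below is one possible reading of the rules stated in the text, not the procedure used by the authors, who classified genes by inspection of the MEME output. The function name, the `degenerate_motif6` flag, and the handling of motif combinations not mentioned in the text are assumptions.

```python
from typing import Iterable


def classify_type_i(motifs: Iterable[int], degenerate_motif6: bool = False) -> str:
    """Assign a type I MADS-box gene to a subclass from its C-terminal motifs.

    `motifs` holds the MEME motif numbers (1-9) detected in the protein.
    `degenerate_motif6` marks a motif 6 remnant that MEME did not report
    but that is recognizable by eye (the hatched boxes in the figures).
    """
    found = set(motifs)

    # Class M: motif 1 is shared by all members.
    if 1 in found:
        if {2, 3} <= found:
            return "M1"
        if 3 in found:
            return "M2"
        return "M3"  # assumption: any other motif 1 combination falls here

    # Class N: motifs 4 and 5 are shared by all members.
    if {4, 5} <= found:
        if {6, 7, 9} <= found:  # motif 8 is present only sometimes
            return "N1"
        if degenerate_motif6:
            return "N2"
        return "N3"

    # Class O: no conserved C-terminal motif could be identified.
    return "O"


print(classify_type_i({1, 2, 3}))
print(classify_type_i({4, 5}, degenerate_motif6=True))
print(classify_type_i(set()))
```

Because class O is defined by the absence of detectable motifs, a gene with an incomplete prediction could be placed there spuriously, which is consistent with some rice contigs being left unassigned in Table 1.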
Fig. 4. Conserved motifs in the C-terminal region of class M proteins of the type I MADS-box gene family found by MEME (Bailey and Elkan 1994). Rice genes are preceded by the prefix Os. The multilevel consensus sequence is calculated from the motif position-specific probability matrix computed by MEME. For each column of the motif, the amino acid residues are sorted in decreasing order by the probability with which they are expected to occur at a certain position of the motif. The most probable amino acid is on top. Only amino acids with probabilities of 0.2 or higher at that position in the motif are listed.
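The construction of the multilevel consensus described in this caption can be sketched as follows. The code follows the caption's description (sort residues per column by decreasing probability, keep those at 0.2 or above) and is not taken from MEME itself; the function names, the tie-breaking by residue letter, and the example matrix are assumptions.

```python
from typing import Dict, List

THRESHOLD = 0.2  # minimum probability stated in the caption


def multilevel_consensus(ppm: List[Dict[str, float]],
                         threshold: float = THRESHOLD) -> List[List[str]]:
    """For each motif column, list residues with probability >= threshold,
    most probable first (ties broken alphabetically, an assumption)."""
    columns = []
    for col in ppm:
        kept = [res for res, prob in col.items() if prob >= threshold]
        kept.sort(key=lambda res: (-col[res], res))
        columns.append(kept)
    return columns


def render(columns: List[List[str]]) -> List[str]:
    """Lay the consensus out as rows, with the most probable residues on top."""
    depth = max((len(col) for col in columns), default=0)
    return [
        "".join(col[level] if level < len(col) else " " for col in columns)
        for level in range(depth)
    ]


# Hypothetical two-column probability matrix, not a motif from this study.
example = [
    {"K": 0.7, "R": 0.25, "Q": 0.05},
    {"L": 0.5, "I": 0.3, "V": 0.2},
]

for row in render(multilevel_consensus(example)):
    print(row)
```

Columns in which no residue reaches the threshold yield an empty list and appear as blanks in every row, which is how weakly conserved positions would show up in such a consensus.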
Functional Annotation
To assign a putative function to the type I MADS-box genes, we analyzed the C-terminal part of these genes in more detail. Genes that encode transcription factors often contain a transcription-activating domain. Three types of trans-activation domains are described in the literature: they are rich in acidic residues, proline residues, or glutamine residues but have low overall conservation on the primary structure level (Latchman 1998). Type I M1 and type I M2 proteins contain an acidic region in their characteristic motif 3. Class N proteins all contain a proline-rich region, starting from approximately position 160. This region shows low conservation on the primary sequence level and does not correlate with any particular C-terminal motif designated by MEME. Nevertheless, the abundance of prolines in this region might mark the trans-activation domain of these proteins (Latchman 1998). Apart from these putative trans-activation domains, little can be said about the C-terminal region. For example, no similarity could be found between the profiles inferred from the conserved motifs and any previously described motifs or domains (InterPro release 4.0, November 2001 [Apweiler et al. 2001]).

Fig. 5. Conserved motifs in the C-terminal region of class N proteins of the type I MADS-box gene family found by MEME. Interpretation is as in Fig. 4.

To get more information on the expression of type I MADS-box genes, and their possible functional annotation, we screened Arabidopsis thaliana ESTs, rice ESTs, and an EST collection containing all publicly available ESTs from diverse plant species. However, the number of ESTs corresponding to type I MADS-box genes of Arabidopsis thaliana was extremely low (see Table 3), in particular, in comparison with ESTs for type II genes, where on average four or five ESTs per gene could be identified. We found one EST (C99890) for type I gene AGL39 (type I M3[b]), which had also been identified previously by Alvarez-Buylla et al. (2000a), and ESTs for four other Arabidopsis thaliana genes. Some ESTs from other plant species could be found that were long enough to demonstrate unambiguously that they are ESTs from type I MADS-box genes (Table 3). ESTs of type I MADS-box genes are found in diverse plant species such as Glycine max, Lycopersicon esculentum, Triticum aestivum, and even Ceratopteris richardii (a fern) and Physcomitrella patens (a moss).
Discussion
Detailed structural and evolutionary analysis of thetype I subfamily of MADS-box genes suggests thatthese genes are indeed of functional importance inplants. The type I subfamily possesses 47 members,which is more than the number of members of thevery well-studied type II subfamily (unpublished re-sults). Moreover, in a first preliminary analysis, 33type I genes have already been identified in Oryzasativa spp. japonica on BAC sequences of the riceconsortium (December 2001) and on the draft se-quence of Oryza sativa spp. indica. Furthermore,Arabidopsis thaliana and rice type I proteins still haveconserved common motifs in their C-terminal region(rice genes are present in the type I M3[a] and type IN3 classes). This conservation is most likely due to
Fig. 6. Pairwise distance tree of the type I MADS-box genes
belonging to class M (see text for details), inferred from a sequence
alignment including sites of the MADS domain and motif 1. The
motif composition of each gene is denoted by a black line (repre-
senting the length of the sequence) and shaded boxes. A hatched box
denotes a degenerated form of the motif. Rice genes are preceded
by the prefix Os. Interpretation of the scale is as in Fig. 3.
582
functional constraints on the C-terminal region,although the overall functional constraint within thetype I genes has probably been lower than that withinthe type II genes. This is, among other things, sup-ported by the higher evolutionary distances betweentype I MADS-box genes (Alvarez-Buylla et al. 2000b;our own observations).Unfortunately, based on in silico analyses, we
cannot assign a putative function to the type IMADS-domain proteins. The low number of ESTsfound for type I MADS-box genes of different plantspecies can probably be attributed to the fact thatmost of the type I genes have a very low expressionlevel or that the genes are expressed under very spe-cific conditions that are not yet monitored in EST-sequencing projects. Strikingly, nearly half of type Igenes are intronless (Fig. 2). This gene structure couldpossibly be interpreted as a result of the evolutionaryhistory of the type I genes through reverse tran-scription, with the possibility that many of them areinactive pseudogenes. However, it should be notedthat gene AC006551, for which we found three ESTs,consists of only one exon, which argues that at leastsome of these genes are expressed and functional, andnot pseudogenes as put forward by Ng and Yanofsky(2001). In maize, transposon-like elements have beenidentified that have recently hijacked AGAMOUS-like (type II) MADS boxes and distributed themthrough the maize genome (Fischer et al. 1995;Montag et al. 1995, 1996). To investigate whether thiscould have been the case for the Arabidopsis thaliana
type I genes, we looked for characteristic transposon-like elements in the flanking and coding regions of thetype I genes. To this end, we searched for similaritywith known (retro)transposons and with proteinsinvolved in their activity such as pol, gag, and RT(Bennetzen 2000). However, no evidence for thepresence of transposable elements could be found inour analyses.As stated previously, all the type I MADS-box
class N rice genes form a well-supported monophy-letic grouping, while a monophyletic origin of the riceclass M genes cannot be ruled out on the basis of treeinference. If true, and provided that the root in Figs.6 and 7 is placed correctly, this would suggest that theexpansion of both the Arabidopsis thaliana and therice class M and N type I MADS-box genes (nothingcan be said about genes from class O) occurred afterthe divergence of these two plants, somewhere be-tween 150 and 200 MYA (Wikstrom et al. 2001). Thisis in clear contrast with observations in MADS typeII phylogenies, according to which the last commonancestor of extant gymnosperms and angiospermsalready contained at least seven different MIKC-typeMADS-box genes (Becker et al. 2000). If type IMADS-box genes were present in the most recentcommon ancestor of plants, animals, and fungi, assuggested by Alvarez-Buylla et al. (2000b), and ourobservations are correct, this would imply that type IMADS-box genes may have remained low-copy (oreven single-copy) for many hundreds of millions ofyears, until the most recent common ancestor of
Fig. 7. Pairwise distance tree of the type I MADS-box genes belonging to class N, inferred from a sequence alignment including sites of the
MADS domain and motifs 4 and 5. Interpretation is as in Fig. 6. Interpretation of the scale is as in Fig. 3.
Arabidopsis thaliana and rice, and then started to multiply independently, giving rise to high gene numbers in both Arabidopsis thaliana and rice. This seems highly unrealistic, given the evolutionary history of type II MADS-box genes (Becker et al. 2000; Krogan and Ashton 2000; Theißen et al. 2001). An alternative explanation is that the type I genes from animals and plants are not monophyletic, i.e., that they originated two times independently in plants and animals, and, at least for plants, much more recently than previously suggested. In line with this, the type I genes from animals (SRF-like genes) have a structure which is significantly different from that of plant type I genes, and obvious sequence similarity between both gene types is restricted to the MADS domain
anyway (Alvarez-Buylla et al. 2000b). Animal type I genes have an evolutionary history which is different from that of plant type I genes: while the gene number of the latter increased dramatically in the lineages that led to extant Arabidopsis thaliana and rice (this work), SRF seems to have remained a single-copy gene throughout the more than 500 million years of animal evolution and represents the evolutionarily most conserved subfamily of MADS-box genes (Escalante and Sastre 1998; Hoffmann and Kroiher 2001; Scheffer et al. 1997). As stated by Alvarez-Buylla et al. (2000b), the type I MADS-box clade in plants is defined by only one putative synapomorphy, while some synapomorphies are shared by all but one or a few sequences; this cannot be
Table 3. List of ESTs found for type I MADS-box genes in different plant species

| EST | Gene | Plant species | Expression^a |
|---|---|---|---|
| AV558219 | AF058914_3 | Arabidopsis thaliana | Organ: green siliques |
| AV823886 | AC018721 | Arabidopsis thaliana | Developmental stage: in various developmental stages, from germination to mature seeds; Treatment: dehydration and cold |
| AV787106 | AC018721 | Arabidopsis thaliana | Idem |
| AV787440 | AC018721 | Arabidopsis thaliana | Idem |
| AV788503 | AC018721 | Arabidopsis thaliana | Idem |
| AV784963 | AC018721 | Arabidopsis thaliana | Idem |
| AU238686 | AC006551 | Arabidopsis thaliana | Treatment: cold |
| Z37169 | AC006551 | Arabidopsis thaliana | Tissue type: green shoots |
| F13558 | AC006551 | Arabidopsis thaliana | Tissue type: green shoots |
| AU236968 | AC009243 | Arabidopsis thaliana | Organ: flowers and siliques |
| AV556667 | AF058914 | Arabidopsis thaliana | Organ: green siliques |
| BE610209 | | Glycine max | Tissue type: immature seed coats of greenhouse-grown plants |
| BE823841 | | Glycine max | From cDNA libraries from various tissues and stages of development of soybean that represent 2639 sequences from immature cotyledons, 1770 from immature seed coats, 3938 from flowers, and 869 from young pods |
| AW508033 | | Glycine max | From a cDNA library that was constructed from mRNA isolated from immature cotyledons of greenhouse-grown plants |
| BE054256 | | Gossypium arboreum | Tissue type: fibers isolated from bolls harvested 7–10 dpa |
| BE999756 | | Medicago truncatula | Tissue type: senescent root nodules; Developmental stage: mixture of effective nodules from 40-day-old plants harvested 36 h post-shoot removal and nodules collected from 2-month-old plants at midpod stage |
| AW029842 | | Lycopersicon esculentum | Tissue type: callus; Developmental stage: 25–40 days old |
| BI929334 | | Lycopersicon esculentum | Tissue type: flower; Developmental stage: 3- to 8-mm buds |
| BG139571 | | Lycopersicon pennellii | Tissue type: pollen; Developmental stage: pollen collected from open flowers |
| BJ247094 | | Triticum aestivum | Tissue type: spike at flowering date; Developmental stage: Feekes’ scale 10.5.1 |
| BJ248139 | | Triticum aestivum | Idem |
| BJ218990 | | Triticum aestivum | Tissue type: spike at meiosis; Developmental stage: Feekes’ scale 9 |
| BG525865 | | Stevia rebaudiana | Tissue type: leaf; Developmental stage: field grown midsize |
| AW010840 | | Pinus taeda | Organ: shoot tips |
| BE643398 | | Ceratopteris richardii | Tissue type: gametophyte; Cell type: spore; Developmental stage: 20 h after germination initiation |
| BJ184681 | | Physcomitrella patens subsp. patens | Tissue type: mixture of chloronemata, caulonemata, and malformed buds |

^a Expression details (e.g., tissue or organ, condition) are as described in the EMBL entries.
considered strong proof for a monophyletic origin of type I MADS-box genes.
On the other hand, it is possible that there are orthologous type I genes in Arabidopsis thaliana and rice but that phylogeny reconstruction, due to the limited number of phylogenetically informative sites, is unable to identify them correctly. Probably, the identification of type I genes from other plants will be necessary to clarify this. This is not yet possible, however, due to the limited genomic data from other plant species.
Hopefully, as suggested by Riechmann and Ratcliffe (2000), in silico studies on the annotation and classification of specific gene families, such as the one described here, can guide future experimental work and enhance the functional characterization of genes.
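As a small illustration of the EST evidence summarized in Table 3, the following minimal Python sketch tallies the Arabidopsis thaliana ESTs per gene. It is not part of the original analysis: the accession and gene identifiers are transcribed from Table 3, whereas the names `ARABIDOPSIS_ESTS` and `ests_per_gene` are hypothetical and introduced here only for illustration.

```python
from collections import Counter

# (EST accession, gene) pairs for Arabidopsis thaliana, transcribed from Table 3.
# The variable and function names are illustrative assumptions, not from the paper.
ARABIDOPSIS_ESTS = [
    ("AV558219", "AF058914_3"),
    ("AV823886", "AC018721"),
    ("AV787106", "AC018721"),
    ("AV787440", "AC018721"),
    ("AV788503", "AC018721"),
    ("AV784963", "AC018721"),
    ("AU238686", "AC006551"),
    ("Z37169", "AC006551"),
    ("F13558", "AC006551"),
    ("AU236968", "AC009243"),
    ("AV556667", "AF058914"),
]


def ests_per_gene(pairs):
    """Count how many ESTs are assigned to each gene."""
    return Counter(gene for _est, gene in pairs)


counts = ests_per_gene(ARABIDOPSIS_ESTS)

# Print genes from most to fewest supporting ESTs.
for gene, n in counts.most_common():
    print(f"{gene}\t{n}")
```

Under these assumptions, the tally reproduces the three ESTs reported for the single-exon gene AC006551, the observation used above to argue that at least some intronless type I genes are expressed.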
Note Added in Proof
After acceptance, novel MADS-box genes were identified in Physcomitrella [Henschel K, Kofuji R, Hasebe M, Saedler H, Munster T, Theißen G (2002) Two ancient classes of MIKC-type MADS-box genes are present in the moss Physcomitrella patens. Mol Biol Evol 19:801–814]. When the MIKC* (type II) genes (PPM3, PPM4, PPMADS2, and PPMADS3) were included in our analysis, some of the Arabidopsis genes that we denoted as being of type I clustered with the Physcomitrella genes. Although these Arabidopsis genes did not seem to possess a conserved K-box (the reason why they were included), a relic of this box could be identified through comparison with the very degenerated K-box found in Physcomitrella. Therefore, some of the genes (i.e., AC011809, AC073178, AC004138, AC069252, AC009243, AC009243_2, and AC004484; Fig. 1) should probably be classified as type II rather than type I genes in our study.
Acknowledgments. The authors want to thank Klaas Vandepoele and Cedric Simillion for technical help. S.D. and K.F. are indebted to the Institute for the Promotion of Innovation by Science and Technology in Flanders (IWT) for a predoctoral fellowship. Annotated sequences have been submitted to the MAtDB (Schoof et al. 2002) and the TAIR (Huala et al. 2001) databases. Supplementary data are available at http://www.psb.rug.ac.be/bioinformatics/MADS/.
References
Altschul SF, Gish GW, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215:403–410
Alvarez-Buylla ER, Liljegren SJ, Pelaz S, Gold SE, Burgeff C, Ditta GS, Vergara-Silva F, Yanofsky MF (2000a) MADS-box gene evolution beyond flowers: Expression in pollen, endosperm, guard cells, roots and trichomes. Plant J 24:457–466
Alvarez-Buylla ER, Pelaz S, Liljegren SJ, Gold SE, Burgeff C, Ditta GS (2000b) An ancestral MADS-box gene duplication occurred before the divergence of plants and animals. Proc Natl Acad Sci USA 97:5328–5333
Angenent GC, Colombo L (1996) Molecular control of ovule development. Trends Plant Sci 1:228–232
Apweiler R, Attwood TK, Bairoch A, Bateman A, Birney E, Biswas M, Bucher P, Marx B, Mulder NJ, Oinn TM, Pagni M, Servant F, Sigrist CJA, Zdobnov EM (2001) The InterPro database, an integrated documentation resource for protein families, domains and functional sites. Nucleic Acids Res 29:37–40
Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408:798–815
Bailey TL, Elkan C (1994) Fitting a mixture model by expectation maximization to discover motifs in biopolymers. In: Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology. AAAI Press, Menlo Park, CA, pp 28–36
Becker A, Winter K-U, Meyer B, Saedler H, Theißen G (2000) MADS-box gene diversity in seed plants 300 million years ago. Mol Biol Evol 17:1425–1434
Bennetzen JL (2000) Transposable elements contributions to plant gene and genome evolution. Plant Mol Biol 42:251–269
Burgeff C, Liljegren SJ, Tapia-Lopez R, Yanofsky MF, Alvarez-Buylla ER (2002) MADS-box gene expression in lateral primordia, meristems and differentiated tissues of Arabidopsis thaliana roots. Planta 214:365–372
Eddy SR (1998) Profile hidden Markov model. Bioinformatics 14:755–763
Escalante R, Sastre L (1998) A serum response factor homolog is required for spore differentiation in Dictyostelium. Development 125:3801–3808
Felsenstein J (1985) Confidence limits on phylogenies: An approach using the bootstrap. Evolution 39:783–791
Fischer A, Baum N, Saedler H, Theißen G (1995) Chromosomal mapping of the MADS-box multigene family in Zea mays reveals dispersed distribution of allelic genes as well as transposed copies. Nucleic Acids Res 23:1901–1911
Hall TA (1999) BioEdit: A user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp Ser 41:95–98
Hoffmann U, Kroiher M (2001) A possible role for the cnidarian homologue of serum response factor in decision making by undifferentiated cells. Dev Biol 236:304–315
Huala E, Dickerman AW, Garcia-Hernandez M, Weems D, Reiser L, LaFond F, Hanley D, Kiphart D, Zhuang M, Huang W, Mueller LA, Bhattacharyya D, Bhaya D, Sobral BW, Beavis W, Meinke DW, Town CD, Somerville C, Rhee SY (2001) The Arabidopsis Information Resource (TAIR): A comprehensive database and web-based information retrieval, analysis, and visualization system for a model plant. Nucleic Acids Res 29:102–105
Krogan NT, Ashton NW (2000) Ancestry of plant MADS-box genes revealed by bryophyte (Physcomitrella patens) homologues. New Phytol 147:505–517
Latchman DS (1998) Eukaryotic transcription factors. Academic Press, San Diego, CA
Liljegren SJ, Ferrandiz C, Alvarez-Buylla ER, Pelaz S, Yanofsky MF (1998) Arabidopsis MADS-box genes involved in fruit dehiscence. Flower News Lett 25:9–19
Liljegren SJ, Ditta GS, Eshed Y, Savidge B, Bowman JL, Yanofsky MF (2000) SHATTERPROOF MADS-box genes control seed dispersal in Arabidopsis. Nature 404:766–770
Lukashin AV, Borodovsky M (1998) GeneMark.hmm: New solutions for gene finding. Nucleic Acids Res 26:1107–1115
Lupas A (1996) Coiled coils: New structures and new functions. Trends Biochem Sci 21:375–382
Lynch M, Conery JS (2000) The evolutionary fate and consequences of duplicate genes. Science 290:1151–1155
Montag K, Salamini F, Thompson RD (1995) ZEMa, a member of a novel group of MADS box genes, is alternatively spliced in maize endosperm. Nucleic Acids Res 23:2168–2177
Montag K, Salamini F, Thompson RD (1996) The ZEM2 family of maize MADS box genes possess features of transposable elements. Maydica 41:241–254
Muller T, Vingron M (2000) Modeling amino acid replacement. J Comput Biol 7:761–776
Munster T, Pahnke J, Di Rosa A, Kim JT, Martin W, Saedler H, Theißen G (1997) Floral homeotic genes were recruited from homologous MADS-box genes preexisting in the common ancestor of ferns and seed plants. Proc Natl Acad Sci USA 94:2415–2420
Norman C, Runswick M, Pollock R, Treisman R (1988) Isolation and properties of cDNA clones encoding SRF, a transcription factor that binds to the c-fos serum response element. Cell 55:989–1003
Ng M, Yanofsky MF (2001) Function and evolution of the plant MADS-box gene family. Nat Rev Genet 2:186–195
Passmore S, Elble R, Tye BK (1989) A protein involved in minichromosome maintenance in yeast binds a transcriptional enhancer conserved in eukaryotes. Genes Dev 3:921–935
Pelaz S, Ditta GS, Baumann E, Wisman E, Yanofsky MF (2000) B and C floral organ identity functions require SEPALLATA MADS-box genes. Nature 405:200–203
Raes J, Van de Peer Y (1999) ForCon, a tool to automatically convert sequence alignment formats. EMBnet.news 6(1); http://www.psb.rug.ac.be/~jerae/ForCon/index.html
Raes J, Vandepoele K, Saeys Y, Simillion C, Van de Peer Y (2003) Investigating ancient duplication events in the Arabidopsis genome. J Struct Funct Genom (in press)
Riechmann JL, Meyerowitz EM (1997) MADS domain proteins in plant development. Biol Chem 378:1079–1101
Riechmann JL, Ratcliffe OJ (2000) A genomic perspective on plant transcription factors. Curr Opin Plant Biol 3(5):423–434
Rutherford K, Parkhill J, Crook J, Horsnell T, Rice P, Rajandream M, Barrell B (2000) Artemis: Sequence visualization and annotation. Bioinformatics 16:944–945
Saitou N, Nei M (1987) The neighbour-joining method: A new method for constructing phylogenetic trees. Mol Biol Evol 4:406–425
Sasaki T, Burr P (2000) International Rice Genome Sequencing Project: The effort to completely sequence the rice genome. Curr Opin Plant Biol 3:138–141
Scheffer U, Krasko A, Pancer Z, Muller WEG (1997) High conservation of the serum response factor within Metazoa: cDNA from the sponge Geodia cydonium. Biol J Linn Soc 61:127–137
Schmidt HA, Strimmer K, Vingron M, von Haeseler A (2002) TREE-PUZZLE: Maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics 18:502–504
Schoof H, Zaccaria P, Gundlach H, Lemcke K, Rudd S, Kolesov G, Arnold R, Mewes HW, Mayer KF (2002) MIPS Arabidopsis thaliana Database (MAtDB): An integrated biological knowledge resource based on the first complete plant genome. Nucleic Acids Res 30:91–93
Schwarz-Sommer Z, Huijser P, Nacken W, Saedler H, Sommer H (1990) Genetic control of flower development by homeotic genes in Antirrhinum majus. Science 250:931–936
Schwarz-Sommer Z, Hue I, Huijser P, Flor PJ, Hansen R, Tetens F, Lonnig WE, Saedler H, Sommer H (1992) Characterization of the Antirrhinum floral homeotic MADS-box gene deficiens: Evidence for DNA binding and autoregulation of its persistent expression throughout flower development. EMBO J 11:251–263
Shore P, Sharrocks AD (1995) The MADS-box family of transcription factors. Eur J Biochem 229:1–13
Simillion C, Vandepoele K, Van Montagu M, Zabeau M, Van de Peer Y (2002) The hidden duplication past of Arabidopsis thaliana. Proc Natl Acad Sci USA 99:13627–13632
Sommer H, Beltran J-P, Huijser P, Pape H, Lonnig W-E, Saedler H, Schwarz-Sommer Z (1990) Deficiens, a homeotic gene involved in the control of flower morphogenesis in Antirrhinum majus: The protein shows homology to transcription factors. EMBO J 9:605–613
Stoesser G, Baker W, van den Broek A, Camon E, Garcia-Pastor M, Kanz C, Kulikova T, Leinonen R, Lin Q, Lombard V, Lopez R, Redaschi N, Stoehr P, Tuli MA, Tzouvara K, Vaughan R (2002) The EMBL Nucleotide Sequence Database. Nucleic Acids Res 30:21–26
Strimmer K, von Haeseler A (1996) Quartet puzzling: A quartet maximum likelihood method for reconstructing tree topologies. Mol Biol Evol 13:964–969
Swofford DL (1998) PAUP*. Phylogenetic analysis using parsimony (*And other methods). Version 4. Sinauer Associates, Sunderland, MA
Theißen G (2001) Development of floral organ identity: Stories from the MADS house. Curr Opin Plant Biol 4:75–85
Theißen G, Saedler H (2001) Floral quartets. Nature 409:469–471
Theißen G, Becker A, Di Rosa A, Kanno A, Kim JT, Munster T, Winter K-U, Saedler H (2000) A short history of MADS-box genes in plants. Plant Biol 42:115–149
Theißen G, Munster T, Henschel K (2001) Why don’t mosses flower? New Phytol 150:1–8
Van de Peer Y, De Wachter R (1997) Construction of evolutionary distance trees with TREECON for Windows: Accounting for variation in nucleotide substitution rate among sites. Comput Appl Biosci 13:227–230
Vandepoele K, Saeys Y, Simillion C, Raes J, Van de Peer Y (2002) The Automatic Detection of Homologous Regions (ADHoRe) and its application to microcolinearity between Arabidopsis and rice. Genome Res 12:1792–1801
Wikstrom N, Savolainen V, Chase MW (2001) Evolution of the angiosperms: Calibrating the family tree. Proc R Soc Lond Ser B Biol Sci 268:2211–2220
Wolf E, Kim PS, Berger B (1997) MultiCoil: A program for predicting two- and three-stranded coiled coils. Protein Sci 6:1179–1189
Yang Z (2000) Phylogenetic Analysis by Maximum Likelihood (PAML), version 3.0. University College London, London
Yanofsky MF, Ma H, Bowman JL, Drews GN, Feldmann KA, Meyerowitz EM (1990) The protein encoded by the Arabidopsis homeotic gene agamous resembles transcription factors. Nature 346:35–39
Yu J, et al. (2002) A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science 296:79–92
Yu YT, Breitbart RE, Smoot LB, Lee Y, Mahdavi V, Nadal-Ginard B (1992) Human myocyte-specific enhancer factor 2 comprises a group of tissue-restricted MADS box transcription factors. Genes Dev 6:1783–1798
Zhang H, Forde BG (2000) Regulation of Arabidopsis root development by nitrate availability. J Exp Bot 51:51–59