Talk by Jonathan Eisen for #SMBEEuks
The Need for a Phylogeny-Driven Genomic Encyclopedia of Eukaryotes
Jonathan A. Eisen, @phylogenomics
University of California, Davis
Talk for SMBE-EUKS
Monday, April 29, 13
I: The Problem
Googling Sequenced Eukaryotic Genomes
Wikipedia On Sequenced Euks
More from Wikipedia
Better Source: GOLD
http://www.genomesonline.org/cgi-bin/GOLD/index.cgi
GOLD by Taxonomy
http://www.genomesonline.org/cgi-bin/GOLD/index.cgi
GOLD: Euks by Phylum
GOLD Version 4.0, phylogenetic distribution page (printed 4/28/13 9:20 AM): http://www.genomesonline.org/cgi-bin/GOLD/phylogenetic_distribution.cgi
Archaeal Phylum Distribution
Phylum Count Percent
Korarchaeota 1 0
Nanoarchaeota 2 0
Thaumarchaeota 30 5
Crenarchaeota 142 25
Euryarchaeota 356 64
Unclassified 28 5
Bacterial Phylum Distribution
Phylum Count Percent
Caldiserica 1 0
Nitrospinae 1 0
Crenarchaeota 2 0
Chrysiogenetes 2 0
Dictyoglomi 2 0
Fibrobacteres 2 0
Armatimonadetes 3 0
Elusimicrobia 3 0
Lentisphaerae 3 0
Poribacteria 4 0
Gemmatimonadetes 6 0
Thermodesulfobacteria 7 0
Ignavibacteria 8 0
Deferribacteres 10 0
Chlorobi 14 0
Synergistetes 21 0
Euryarchaeota 23 0
Nitrospirae 24 0
Aquificae 24 0
Acidobacteria 30 0
Verrucomicrobia 41 0
Planctomycetes 42 0
Thermotogae 50 0
Chloroflexi 51 0
Fusobacteria 80 0
Deinococcus-Thermus 92 0
Chlamydiae 207 1
Cyanobacteria 245 1
Tenericutes 251 1
Spirochaetes 472 2
Bacteroidetes 762 4
Actinobacteria 2,065 10
Firmicutes 5,342 26
Proteobacteria 10,088 50
Unclassified 17 0
Eukaryotic Phylum Distribution
Phylum Count Percent
Phaeophyceae 1 0
Priapulida 1 0
Rotifera 1 0
Hemichordata 1 0
Pinguiophyceae 1 0
Ctenophora 1 0
Bolidophyceae 1 0
Chaetognatha 1 0
Porifera 2 0
Xanthophyceae 2 0
Tardigrada 2 0
Euglenida 2 0
Chromerida 3 0
Placozoa 3 0
Glomeromycota 3 0
Cryptomycota 4 0
Blastocladiomycota 5 0
Echinodermata 6 0
Entomophthoromycota 9 0
Chytridiomycota 12 0
Neocallimastigomycota 12 0
Annelida 13 0
Eustigmatophyceae 13 0
Cnidaria 18 0
Bacillariophyta 21 0
Platyhelminthes 23 0
Mollusca 25 0
Microsporidia 31 1
Chlorophyta 77 1
Nematoda 110 2
Apicomplexa 264 5
Arthropoda 370 7
Chordata 626 12
Streptophyta 796 15
Basidiomycota 976 18
Ascomycota 1,251 23
Unclassified 704 13
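The Percent column on this GOLD page appears to be each phylum's count divided by the domain total, rounded to a whole number. A minimal Python sketch, assuming that reading, re-derives the eukaryotic column from the counts above and checks that they add up to the EUKARYA TOTAL of 5391. The dictionary and the helper `percent_column` are illustrative names, not part of GOLD.

```python
# Eukaryotic phylum counts copied from the GOLD 4.0 page above (printed 4/28/13).
EUK_COUNTS = {
    "Phaeophyceae": 1, "Priapulida": 1, "Rotifera": 1, "Hemichordata": 1,
    "Pinguiophyceae": 1, "Ctenophora": 1, "Bolidophyceae": 1, "Chaetognatha": 1,
    "Porifera": 2, "Xanthophyceae": 2, "Tardigrada": 2, "Euglenida": 2,
    "Chromerida": 3, "Placozoa": 3, "Glomeromycota": 3, "Cryptomycota": 4,
    "Blastocladiomycota": 5, "Echinodermata": 6, "Entomophthoromycota": 9,
    "Chytridiomycota": 12, "Neocallimastigomycota": 12, "Annelida": 13,
    "Eustigmatophyceae": 13, "Cnidaria": 18, "Bacillariophyta": 21,
    "Platyhelminthes": 23, "Mollusca": 25, "Microsporidia": 31,
    "Chlorophyta": 77, "Nematoda": 110, "Apicomplexa": 264, "Arthropoda": 370,
    "Chordata": 626, "Streptophyta": 796, "Basidiomycota": 976,
    "Ascomycota": 1251, "Unclassified": 704,
}


def percent_column(counts):
    """Hypothetical helper: return the domain total and each phylum's
    whole-number percentage of that total (the assumed meaning of 'Percent')."""
    total = sum(counts.values())
    return total, {name: round(100 * n / total) for name, n in counts.items()}


total, pct = percent_column(EUK_COUNTS)
print("Total eukaryotic genome projects:", total)
for name in ("Ascomycota", "Basidiomycota", "Streptophyta", "Chordata", "Porifera"):
    print(f"{name:14s} {EUK_COUNTS[name]:5d} {pct[name]:3d}%")
```

Under this reading the sketch reproduces every entry in the Percent column, and the counts (including Unclassified) sum to 5391. It also makes the skew concrete: the five best-sampled phyla account for about 77% of eukaryotic genome projects, while more than half of the listed phyla round to 0%.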
PHYLOGENETIC DISTRIBUTION
ARCHAEA TOTAL: 559 Phylum: 5/5 Class: 10/9 Order: 18/18 Family: 30/29 Genus: 103/118 Species: 340/673
BACTERIA TOTAL: 20318 Phylum: 35/31 Class: 59/52 Order: 124/118 Family: 280/298 Genus: 1368/2106 Species: 6352/11424
EUKARYA TOTAL: 5391 Phylum: 36/56 Class: 107/182 Order: 330/1037 Family: 689/6689 Genus: 1170/54319 Species: 1769/218222
NUMBER EXPLANATION: Number of classified subdivisions with genome projects over the number of classified subdivisions of this phylogenetic group.
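Read as fractions, the EUKARYA line shows sampling thinning out sharply toward the tips of the tree. A short Python sketch converts each "with genome projects / classified" pair into a percentage for Eukarya and Bacteria. The names `RANKS`, `EUKARYA`, `BACTERIA` and `coverage` are illustrative. The GOLD numerator can exceed its denominator (for example BACTERIA Phylum 35/31), and the page does not explain why, so the sketch reports such values as-is rather than capping them at 100%.

```python
# "With genome projects / classified" pairs copied from the
# GOLD PHYLOGENETIC DISTRIBUTION lines above.
RANKS = ("Phylum", "Class", "Order", "Family", "Genus", "Species")
EUKARYA = [(36, 56), (107, 182), (330, 1037), (689, 6689), (1170, 54319), (1769, 218222)]
BACTERIA = [(35, 31), (59, 52), (124, 118), (280, 298), (1368, 2106), (6352, 11424)]


def coverage(pairs):
    """Percent of classified subdivisions at each rank with >= 1 genome project.

    Values above 100 are kept as reported (GOLD's numerator can exceed its
    denominator); nothing is clipped.
    """
    return {rank: 100 * with_genome / classified
            for rank, (with_genome, classified) in zip(RANKS, pairs)}


euk, bac = coverage(EUKARYA), coverage(BACTERIA)
for rank in RANKS:
    print(f"{rank:8s} Eukarya {euk[rank]:7.2f}%   Bacteria {bac[rank]:7.2f}%")
```

On these numbers roughly 64% of classified eukaryotic phyla have at least one genome project, but only about 2% of genera and under 1% of species do, compared with roughly 65% of bacterial genera and 56% of bacterial species.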
http://www.genomesonline.org/cgi-bin/GOLD/index.cgi
GOLD: Euks by Phylum
Phylum Count Percent
Priapulida 1 0
Phaeophyceae 1 0
Rotifera 1 0
Hemichordata 1 0
Pinguiophyceae 1 0
Ctenophora 1 0
Bolidophyceae 1 0
Chaetognatha 1 0
Porifera 2 0
Xanthophyceae 2 0
Tardigrada 2 0
Euglenida 2 0
Chromerida 3 0
Placozoa 3 0
Glomeromycota 3 0
Cryptomycota 4 0
Blastocladiomycota 5 0
Echinodermata 6 0
Entomophthoromycota 9 0
Chytridiomycota 12 0
Neocallimastigomycota 12 0
Annelida 13 0
Eustigmatophyceae 13 0
Cnidaria 18 0
Bacillariophyta 21 0
Platyhelminthes 23 0
Mollusca 25 0
Microsporidia 31 1
Chlorophyta 77 1
Nematoda 110 2
Apicomplexa 264 5
Arthropoda 370 7
Chordata 626 12
Streptophyta 796 15
Basidiomycota 976 18
Ascomycota 1,251 23
Euks More Resolution
[Figure: 18S rDNA phylogeny of eukaryotes from Zhao et al. 2012, Fig. 1 (scale bar 0.2). Labeled groups: SAR, Excavata, Diphyllatia, Amoebozoa, Opisthokonta, Haptophyta, Telonemia, Apusozoa, Centrohelida, Cryptophyta, Rhodophyta, Glaucophyta, Viridiplantae.]
FIG. 1. 18S rDNA phylogeny of the Diphyllatia species Collodictyon triciliatum (highlighted by black box) and Diphylleia rotans. The topology was reconstructed by MrBayes v3.1.2 under the GTR + GAMMA + I + covarion model. Posterior probabilities (PP) and ML bootstrap supports (BP, inferred by RAxML v7.1.2 under GTR + GAMMA + I model) are shown at the nodes. Thick lines indicate PP > 0.90 and BP > 80%. Dashes "-" indicate PP < 0.5 or BP < 50%. A few long branches are shortened by 50% (/) or 75% (//).
Zhao et al. · doi:10.1093/molbev/mss001 MBE
Collodictyon—An Ancient Lineage in the Tree of Eukaryotes
Sen Zhao,†,1 Fabien Burki,†,2 Jon Brate,1 Patrick J. Keeling,2 Dag Klaveness,1 and Kamran Shalchian-Tabrizi*,1
1Microbial Evolution Research Group, Department of Biology, University of Oslo, Oslo, Norway
2Canadian Institute for Advanced Research, Botany Department, University of British Columbia, Vancouver, British Columbia, Canada
†These authors contributed equally to this work.
*Corresponding author: E-mail: [email protected].
Associate editor: Herve Philippe
Abstract
The current consensus for the eukaryote tree of life consists of several large assemblages (supergroups) that are hypothesized to describe the existing diversity. Phylogenomic analyses have shed light on the evolutionary relationships within and between supergroups as well as placed newly sequenced enigmatic species close to known lineages. Yet, a few eukaryote species remain of unknown origin and could represent key evolutionary forms for inferring ancient genomic and cellular characteristics of eukaryotes. Here, we investigate the evolutionary origin of the poorly studied protist Collodictyon (subphylum Diphyllatia) by sequencing a cDNA library as well as the 18S and 28S ribosomal DNA (rDNA) genes. Phylogenomic trees inferred from 124 genes placed Collodictyon close to the bifurcation of the "unikont" and "bikont" groups, either alone or as sister to the potentially contentious excavate Malawimonas. Phylogenies based on rDNA genes confirmed that Collodictyon is closely related to another genus, Diphylleia, and revealed a very low diversity in environmental DNA samples. The early and distinct origin of Collodictyon suggests that it constitutes a new lineage in the global eukaryote phylogeny. Collodictyon shares cellular characteristics with Excavata and Amoebozoa, such as ventral feeding groove supported by microtubular structures and the ability to form thin and broad pseudopods. These may therefore be ancient morphological features among eukaryotes. Overall, this shows that Collodictyon is a key lineage to understand early eukaryote evolution.
Key words: 18S and 28S rDNA, Collodictyon, Diphyllatia, tree of life, phylogenomics, cDNA, pyrosequencing.
Introduction
Over the last few years, molecular sequence data have addressed some of the most intriguing questions about the eukaryote tree of life. Phylogenomic analyses have confirmed the existence of several major eukaryote groups (supergroups) as well as shown various levels of evidences for the relationships among them (Burki et al. 2007; Parfrey et al. 2010). Recently, two new large assemblages, SAR (Stramenopila, Alveolata, and Rhizaria) and CCTH (Cryptophyta, Centrohelida, Telonemia, and Haptophyta), were proposed to encompass a large fraction of the eukaryote diversity, together with the other supergroups Opisthokonta, Amoebozoa, Archaeplastida, and Excavata (Patron et al. 2007; Burki et al. 2009). Solid phylogenomic evidence supports the monophyly of Amoebozoa, Opisthokonta, Archaeplastida, and SAR (Rodriguez-Ezpeleta et al. 2007; Burki et al. 2009; Minge et al. 2009), but the monophyly of Excavata and CCTH (also called Hacrobia; Okamoto et al. 2009) remains controversial, often dependent on the selection of taxa and gene data set (Burki et al. 2009; Hampl et al. 2009; Baurain et al. 2010). Despite several attempts, the evolutionary relationships between these supergroups are still uncertain because of the ancient and complex genome histories (Simpson and Roger 2004; Parfrey et al. 2006; Roger and Simpson 2009).
Identification of sister lineages to these supergroups is crucial for resolving the eukaryote tree and understanding the early history of eukaryotes. If these key lineages exist, they may be found among the few species that harbor distinct morphological features but are of unknown evolutionary origin in single-gene phylogenies (Patterson 1999; Shalchian-Tabrizi et al. 2006; Kim et al. 2011). Indications that such enigmatic species can be placed in the eukaryote tree come from recent phylogenomic analyses. For instance, Ministeria (Opisthokonta), Breviata (Amoebozoa) and Telonemia, Centroheliozoa, and Picobiliphyta have been shown to constitute deep lineages within their respective supergroups (Shalchian-Tabrizi, Minge, et al. 2008; Burki et al. 2009; Minge et al. 2009; Yoon et al. 2011).
Here, we investigate a member of such a key lineage, Collodictyon, which was first described in 1865 (Carter 1865), but its cellular structure and outer morphology were analyzed only recently (Klaveness 1995; Brugerolle et al. 2002). Collodictyon was originally proposed to be closely related to Diphylleia and Sulcomonas and classified in the family Diphylleidae (Cavalier-Smith 1993; the synonymous family
© The Author(s) 2012. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Open Access. Mol. Biol. Evol. 29(6):1557–1568. 2012. doi:10.1093/molbev/mss001. Advance Access publication January 6, 2012.
Research article
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3351787/
2010 PARFREY ET AL.—BROADLY SAMPLED TREE OF EUKARYOTIC LIFE 523
FIGURE 1. Most likely eukaryotic tree of life reconstructed using all 451 taxa and all 16 genes (SSU-rDNA plus 15 protein genes). Major nodes in this topology are robust to analyses of subsets of taxa and genes, which include varying levels of missing data (Table 1). Clades in bold are monophyletic in analyses with 2 or more members except in all:15 in which taxa represented by a single gene were sometimes misplaced. Numbers in boxes represent support at key nodes in analyses with increasing amounts of missing data (10:16, 6:16, 4:16, and all:16 analyses; see Table 1 for more details). Given uncertainties around the root of the eukaryotic tree of life (see text), we have chosen to draw the tree rooted with the well-supported clade Opisthokonta. Dashed line indicates alternate branching pattern seen for Amoebozoa in other analyses. Long branches, indicated by //, have been reduced by half. The 6 lineages labeled by * represent taxa that are misplaced, probably due to LBA, listed from top to bottom with expected clade in parentheses. These are Protoopalina japonica (Stramenopiles), Aggregata octopiana (Apicomplexa), Mikrocytos mackini (Haplosporidia), Centropyxis laevigata (Tubulinea), Marteilioides chungmuensis (unplaced), and Cochliopodium spiniferum (Amoebozoa).
Syst. Biol. 59(5):518–533, 2010
© The Author(s) 2010. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved.
For Permissions, please email: [email protected]
DOI:10.1093/sysbio/syq037
Advance Access publication on July 23, 2010
Broadly Sampled Multigene Analyses Yield a Well-Resolved Eukaryotic Tree of Life
LAURA WEGENER PARFREY1, JESSICA GRANT2, YONAS I. TEKLE2,6, ERICA LASEK-NESSELQUIST3,4, HILARY G. MORRISON3, MITCHELL L. SOGIN3, DAVID J. PATTERSON5, AND LAURA A. KATZ1,2,*
1Program in Organismic and Evolutionary Biology, University of Massachusetts, 611 North Pleasant Street, Amherst, MA 01003, USA; 2Department of Biological Sciences, Smith College, 44 College Lane, Northampton, MA 01063, USA; 3Bay Paul Center for Comparative Molecular Biology and Evolution, Marine Biological Laboratory, 7 MBL Street, Woods Hole, MA 02543, USA; 4Department of Ecology and Evolutionary Biology, Brown University, 80 Waterman Street, Providence, RI 02912, USA; 5Biodiversity Informatics Group, Marine Biological Laboratory, 7 MBL Street, Woods Hole, MA 02543, USA; 6Present address: Department of Epidemiology and Public Health, Yale University School of Medicine, New Haven, CT 06520, USA;
*Correspondence to be sent to: Laura A. Katz, 44 College Lane, Northampton, MA 01003, USA; E-mail: [email protected]
Laura Wegener Parfrey and Jessica Grant have contributed equally to this work.
Received 30 September 2009; reviews returned 1 December 2009; accepted 25 May 2010
Associate Editor: Cecile Ane
Abstract.—An accurate reconstruction of the eukaryotic tree of life is essential to identify the innovations underlying the diversity of microbial and macroscopic (e.g., plants and animals) eukaryotes. Previous work has divided eukaryotic diversity into a small number of high-level "supergroups," many of which receive strong support in phylogenomic analyses. However, the abundance of data in phylogenomic analyses can lead to highly supported but incorrect relationships due to systematic phylogenetic error. Furthermore, the paucity of major eukaryotic lineages (19 or fewer) included in these genomic studies may exaggerate systematic error and reduce power to evaluate hypotheses. Here, we use a taxon-rich strategy to assess eukaryotic relationships. We show that analyses emphasizing broad taxonomic sampling (up to 451 taxa representing 72 major lineages) combined with a moderate number of genes yield a well-resolved eukaryotic tree of life. The consistency across analyses with varying numbers of taxa (88–451) and levels of missing data (17–69%) supports the accuracy of the resulting topologies. The resulting stable topology emerges without the removal of rapidly evolving genes or taxa, a practice common to phylogenomic analyses. Several major groups are stable and strongly supported in these analyses (e.g., SAR, Rhizaria, Excavata), whereas the proposed supergroup "Chromalveolata" is rejected. Furthermore, extensive instability among photosynthetic lineages suggests the presence of systematic biases including endosymbiotic gene transfer from symbiont (nucleus or plastid) to host. Our analyses demonstrate that stable topologies of ancient evolutionary relationships can be achieved with broad taxonomic sampling and a moderate number of genes. Finally, taxon-rich analyses such as presented here provide a method for testing the accuracy of relationships that receive high bootstrap support (BS) in phylogenomic analyses and enable placement of the multitude of lineages that lack genome scale data. [Excavata; microbial eukaryotes; Rhizaria; supergroups; systematic error; taxon sampling.]
Perspectives on the structure of the eukaryotic tree of life have shifted in the past decade as molecular analyses provide hypotheses for relationships among the approximately 75 robust lineages of eukaryotes. These lineages are defined by ultrastructural identities (Patterson 1999)—patterns of cellular and subcellular organization revealed by electron microscopy—and are strongly supported in molecular analyses (Parfrey et al. 2006; Yoon et al. 2008). Most of these lineages now fall within a small number of higher level clades, the supergroups of eukaryotes (Simpson and Roger 2004; Adl et al. 2005; Keeling et al. 2005). Several of these clades—Opisthokonta, Rhizaria, and Amoebozoa—are increasingly well supported by phylogenomic (Rodríguez-Ezpeleta et al. 2007a; Burki et al. 2008; Hampl et al. 2009) and phylogenetic (Parfrey et al. 2006; Pawlowski and Burki 2009) analyses, whereas support for "Archaeplastida" predominantly comes from some phylogenomic studies (Rodríguez-Ezpeleta et al. 2005; Burki et al. 2007) or analyses of plastid genes (Yoon et al. 2002; Parfrey et al. 2006). In contrast, support for "Chromalveolata" and Excavata is mixed, often dependent on the selection of taxa included in analyses (Rodríguez-Ezpeleta et al. 2005; Parfrey et al. 2006; Rodríguez-Ezpeleta et al. 2007a; Burki et al. 2008; Hampl et al. 2009). We use quotation marks throughout to note groups where uncertainties remain. Moreover, it is difficult to evaluate the overall stability of major clades of eukaryotes because phylogenomic analyses have 19 or fewer of the major lineages and hence do not sufficiently sample eukaryotic diversity (Rodríguez-Ezpeleta et al. 2007b; Burki et al. 2008; Hampl et al. 2009), whereas taxon-rich analyses with 4 or fewer genes yield topologies with poor support at deep nodes (Cavalier-Smith 2004; Parfrey et al. 2006; Yoon et al. 2008).
Estimating the relationships of the major lineages of eukaryotes is difficult because of both the ancient age of eukaryotes (1.2–1.8 billion years; Knoll et al. 2006) and complex gene histories that include heterogeneous rates of molecular evolution and paralogy (Maddison 1997; Gribaldo and Philippe 2002; Tekle et al. 2009). A further issue obscuring eukaryotic relationships is the chimeric nature of the eukaryotic genome—not all genes are vertically inherited due to lateral gene transfer (LGT) and endosymbiotic gene transfer (EGT)—that can also mislead efforts to reconstruct phylogenetic relationships (Andersson 2005; Rannala and Yang 2008; Tekle et al. 2009). This is especially true among photosynthetic lineages that comprise "Chromalveolata" and "Archaeplastida" where a large portion of the host genome (approximately 8–18%) is
http://sysbio.oxfordjournals.org/content/59/5/518.full
Euks More Resolution
530 SYSTEMATIC BIOLOGY VOL. 59
a 97-taxon data set of Rhizaria that included all lineages with previously published data plus additional multigene data for 12 taxa added for this study (Table S1). Three major clades are strongly supported, though the relationships among them are unresolved: (i) Cercozoa, (ii) Foraminifera plus Polycystinea and Acantharea (formerly classified with Phaeodarea as radiolarians), and (iii) the parasitic Haplosporidia and Plasmodiophorida with Gromia and vampyrellids (Fig. 3; Bass et al. 2009). We show that Theratromyxa, a nematode-eating soil amoeba, is related to vampyrellid amoebae (Fig. 3; 100% BS), and together they are sister to the plant parasites plasmodiophorids (100% BS). The SSU-rDNA sequence for Theratromyxa is identical to an amoeba isolated from Siberia where it was identified as Arachnula impatiens (EU567294; Bass et al. 2009).
The topology within the Excavata is consistent with previous hypotheses and clades with ultrastructural identities (Simpson 2003; Fig. 4), when contaminant EST data originally mislabeled as Streblomastix strix are excluded (Slamovits and Keeling 2006). Excavata is often polyphyletic in other analyses because Malawimonas branches outside the other clades of Excavata (Rodríguez-Ezpeleta et al. 2007a; Hampl et al. 2009), whereas in analyses of fewer genes Excavata members fall into 2 or 3 clades (Parfrey et al. 2006; Simpson et al. 2006). Although Malawimonas nests robustly within Excavata in our analyses, it does not have a stable sister group and may represent an independent lineage (Fig. 4). Our analyses confirm that Stephanopogon (unplaced in Patterson 1999) branches within Heterolobosea (Cavalier-Smith and Nikolaev 2008; Yubuki and Leander 2008) and suggest that another enigmatic flagellate, ATCC 50646 (tentatively named Soginia anisocystis), is a basal member of Heterolobosea.
CONCLUSIONS
The robust tree of life emerging from this study demonstrates the benefits of improved taxon sampling for reconstructing deep phylogeny as our analyses produce stable topologies that include a broad representation of eukaryotes. The current study, combined with insights from other studies referenced herein, has refined the eukaryotic tree of life from over 70 major lineages (Patterson 1999) to ~16 major groups (Fig. 5, http://eutree.lifedesks.org/). Most significantly, we attribute the stability of major clades (e.g., Excavata, Amoebozoa, Opisthokonta, and SAR) to broader taxonomic sampling combined with analyses of sufficient characters (16 genes or 6578 characters). In our view, inclusion of more taxa coupled with carefully chosen genes is necessary to further resolve the 16 or so major lineages of microbial eukaryotes for which sister group relationships remain uncertain.
SUPPLEMENTARY MATERIAL
Supplementary material can be found at http://www.sysbio.oxfordjournals.org/.
FIGURE 5. Summary of major findings—the evolutionary relationships among major lineages of eukaryotes. Clades have been collapsed into those that we view to be strongly supported. The many polytomies represent uncertainties that remain.
FUNDING
This work was made possible by the US National Science Foundation Assembling the Tree of Life grant to L.A.K. and D.J.P. (043115) and US National Institutes of Health 5R01AI058054-05 to M.L.S. Funding to collect Foraminifera was provided by a Society of Systematic Biologists MiniPEET grant to L.W.P.
ACKNOWLEDGMENTS
We are grateful to Robert Molestina at ATCC who provided DNAs through a collaborative National Science Foundation grant. We acknowledge the assistance of Kasia Hammar, Leslie Murphy, and Jillian Ward in preparation and sequencing of EST libraries. Our manuscript was improved following detailed comments from the editors, Alastair Simpson, and 1 anonymous reviewer. Thanks to David Hillis for conversations on
Downloaded from http://sysbio.oxfordjournals.org/ by guest on April 28, 2013
http://sysbio.oxfordjournals.org/content/59/5/518.full
Euks More Resolution but Simpler
Mapping GOLD to Tree
Phylum Count Percent
Priapulida 1 0
Phaeophyceae 1 0
Rotifera 1 0
Hemichordata 1 0
Pinguiophyceae 1 0
Ctenophora 1 0
Bolidophyceae 1 0
Chaetognatha 1 0
Porifera 2 0
Xanthophyceae 2 0
Tardigrada 2 0
Euglenida 2 0
Chromerida 3 0
Placozoa 3 0
Glomeromycota 3 0
Cryptomycota 4 0
Blastocladiomycota 5 0
Echinodermata 6 0
Entomophthoromycota 9 0
Chytridiomycota 12 0
Neocallimastigomycota 12 0
Annelida 13 0
Eustigmatophyceae 13 0
Cnidaria 18 0
Bacillariophyta 21 0
Platyhelminthes 23 0
Mollusca 25 0
Microsporidia 31 1
Chlorophyta 77 1
Nematoda 110 2
Apicomplexa 264 5
Arthropoda 370 7
Chordata 626 12
Streptophyta 796 15
Basidiomycota 976 18
Ascomycota 1,251 23
530 SYSTEMATIC BIOLOGY VOL. 59
a 97-taxon data set of Rhizaria that included all lineages with previously published data plus additional multigene data for 12 taxa added for this study (Table S1). Three major clades are strongly supported, though the relationships among them are unresolved: (i) Cercozoa, (ii) Foraminifera plus Polycystinea and Acantharea (formerly classified with Phaeodarea as radiolarians), and (iii) the parasitic Haplosporidia and Plasmodiophorida with Gromia and vampyrellids (Fig. 3; Bass et al. 2009). We show that Theratromyxa, a nematode-eating soil amoeba, is related to vampyrellid amoebae (Fig. 3; 100% BS), and together they are sister to the plant parasites plasmodiophorids (100% BS). The SSU-rDNA sequence for Theratromyxa is identical to that of an amoeba isolated from Siberia, where it was identified as Arachnula impatiens (EU567294; Bass et al. 2009).
Fungi 49%
Mapping GOLD to Tree
Animals: 26%
Phylum Count Percent
Priapulida 1 0
Phaeophyceae 1 0
Rotifera 1 0
Hemichordata 1 0
Pinguiophyceae 1 0
Ctenophora 1 0
Bolidophyceae 1 0
Chaetognatha 1 0
Porifera 2 0
Xanthophyceae 2 0
Tardigrada 2 0
Euglenida 2 0
Chromerida 3 0
Placozoa 3 0
Glomeromycota 3 0
Cryptomycota 4 0
Blastocladiomycota 5 0
Echinodermata 6 0
Entomophthoromycota 9 0
Chytridiomycota 12 0
Neocallimastigomycota 12 0
Annelida 13 0
Eustigmatophyceae 13 0
Cnidaria 18 0
Bacillariophyta 21 0
Platyhelminthes 23 0
Mollusca 25 0
Microsporidia 31 1
Chlorophyta 77 1
Nematoda 110 2
Apicomplexa 264 5
Arthropoda 370 7
Chordata 626 12
Streptophyta 796 15
Basidiomycota 976 18
Ascomycota 1,251 23
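The group percentages on the "Mapping GOLD to Tree" slides appear to come from summing per-phylum genome-project counts over the phyla that fall on each branch of the tree. The minimal Python sketch below shows that arithmetic. It rests on two assumptions, neither of which comes from GOLD or the talk: that "Animals" covers the metazoan phyla in the table, and that "Green algae" covers Chlorophyta plus Streptophyta. The names `GOLD_COUNTS`, `GROUPS` and `group_shares` are hypothetical. The denominator is only the sum of the listed eukaryotic phyla, so the results need not match GOLD's own percent column.

```python
"""Tally GOLD eukaryotic genome-project counts into coarse tree groups.

Counts are the per-phylum numbers from the GOLD table on the slide; the
phylum-to-group assignment below is an illustrative assumption, not GOLD's.
"""

# Per-phylum project counts as listed in the GOLD table (April 2013).
GOLD_COUNTS = {
    "Priapulida": 1, "Phaeophyceae": 1, "Rotifera": 1, "Hemichordata": 1,
    "Pinguiophyceae": 1, "Ctenophora": 1, "Bolidophyceae": 1,
    "Chaetognatha": 1, "Porifera": 2, "Xanthophyceae": 2, "Tardigrada": 2,
    "Euglenida": 2, "Chromerida": 3, "Placozoa": 3, "Glomeromycota": 3,
    "Cryptomycota": 4, "Blastocladiomycota": 5, "Echinodermata": 6,
    "Entomophthoromycota": 9, "Chytridiomycota": 12,
    "Neocallimastigomycota": 12, "Annelida": 13, "Eustigmatophyceae": 13,
    "Cnidaria": 18, "Bacillariophyta": 21, "Platyhelminthes": 23,
    "Mollusca": 25, "Microsporidia": 31, "Chlorophyta": 77, "Nematoda": 110,
    "Apicomplexa": 264, "Arthropoda": 370, "Chordata": 626,
    "Streptophyta": 796, "Basidiomycota": 976, "Ascomycota": 1251,
}

# Hypothetical mapping of phyla onto branches of the eukaryotic tree.
GROUPS = {
    "Animals": [
        "Priapulida", "Rotifera", "Hemichordata", "Ctenophora",
        "Chaetognatha", "Porifera", "Tardigrada", "Placozoa",
        "Echinodermata", "Annelida", "Cnidaria", "Platyhelminthes",
        "Mollusca", "Nematoda", "Arthropoda", "Chordata",
    ],
    "Green algae": ["Chlorophyta", "Streptophyta"],
    "Apicomplexa": ["Apicomplexa"],
}


def group_shares(counts, groups):
    """Return {group: (project count, percent of all listed projects)}."""
    total = sum(counts.values())
    shares = {}
    for name, phyla in groups.items():
        # Phyla missing from the table contribute zero projects.
        n = sum(counts.get(p, 0) for p in phyla)
        shares[name] = (n, 100.0 * n / total)
    return shares


if __name__ == "__main__":
    for name, (n, pct) in group_shares(GOLD_COUNTS, GROUPS).items():
        print(f"{name}: {n} projects, {pct:.1f}%")
```

Run as a script, this prints about 25.7% for Animals and 18.6% for Green algae, which round to the 26% and 19% on the slides. Apicomplexa comes out near 5.6%, slightly above the 5% shown, possibly because the slide used a different denominator or rounding.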
530 SYSTEMATIC BIOLOGY VOL. 59
a 97-taxon data set of Rhizaria that included all lineages with previously published data plus additional multigene data for 12 taxa added for this study (Table S1). Three major clades are strongly supported, though the relationships among them are unresolved: (i) Cercozoa, (ii) Foraminifera plus Polycystinea and Acantharea (formerly classified with Phaeodarea as radiolarians), and (iii) the parasitic Haplosporidia and Plasmodiophorida with Gromia and vampyrellids (Fig. 3; Bass et al. 2009). We show that Theratromyxa, a nematode-eating soil amoeba, is related to vampyrellid amoebae (Fig. 3; 100% BS), and together they are sister to the plant parasites plasmodiophorids (100% BS). The SSU-rDNA sequence for Theratromyxa is identical to an amoeba isolated from Siberia where it was identified as Arachnula impatiens (EU567294; Bass et al. 2009).
The topology within the Excavata is consistent with previous hypotheses and clades with ultrastructural identities (Simpson 2003; Fig. 4), when contaminant EST data originally mislabeled as Streblomastix strix are excluded (Slamovits and Keeling 2006). Excavata is often polyphyletic in other analyses because Malawimonas branches outside the other clades of Excavata (Rodríguez-Ezpeleta et al. 2007a; Hampl et al. 2009), whereas in analyses of fewer genes Excavata members fall into 2 or 3 clades (Parfrey et al. 2006; Simpson et al. 2006). Although Malawimonas nests robustly within Excavata in our analyses, it does not have a stable sister group and may represent an independent lineage (Fig. 4). Our analyses confirm that Stephanopogon (unplaced in Patterson 1999) branches within Heterolobosea (Cavalier-Smith and Nikolaev 2008; Yubuki and Leander 2008) and suggest that another enigmatic flagellate, ATCC 50646 (tentatively named Soginia anisocystis), is a basal member of Heterolobosea.
Mapping GOLD to Tree
Green algae: 19%
Apicomplexa5%
Mapping GOLD to Tree
Phylum Count Percent
Priapulida 1 0
Phaeophyceae 1 0
Rotifera 1 0
Hemichordata 1 0
Pinguiophyceae 1 0
Ctenophora 1 0
Bolidophyceae 1 0
Chaetognatha 1 0
Porifera 2 0
Xanthophyceae 2 0
Tardigrada 2 0
Euglenida 2 0
Chromerida 3 0
Placozoa 3 0
Glomeromycota 3 0
Cryptomycota 4 0
Blastocladiomycota 5 0
Echinodermata 6 0
Entomophthoromycota 9 0
Chytridiomycota 12 0
Neocallimastigomycota 12 0
Annelida 13 0
Eustigmatophyceae 13 0
Cnidaria 18 0
Bacillariophyta 21 0
Platyhelminthes 23 0
Mollusca 25 0
Microsporidia 31 1
Chlorophyta 77 1
Nematoda 110 2
Apicomplexa 264 5
Arthropoda 370 7
Chordata 626 12
Streptophyta 796 15
Basidiomycota 976 18
Ascomycota 1,251 23
530 SYSTEMATIC BIOLOGY VOL. 59
a 97-taxon data set of Rhizaria that included all lineages with previously published data plus additional multigene data for 12 taxa added for this study (Table S1). Three major clades are strongly supported, though the relationships among them are unresolved: (i) Cercozoa, (ii) Foraminifera plus Polycystinea and Acantharea (formerly classified with Phaeodarea as radiolarians), and (iii) the parasitic Haplosporidia and Plasmodiophorida with Gromia and vampyrellids (Fig. 3; Bass et al. 2009). We show that Theratromyxa, a nematode-eating soil amoeba, is related to vampyrellid amoebae (Fig. 3; 100% BS), and together they are sister to the plant parasites plasmodiophorids (100% BS). The SSU-rDNA sequence for Theratromyxa is identical to an amoeba isolated from Siberia where it was identified as Arachnula impatiens (EU567294; Bass et al. 2009).
The topology within the Excavata is consistent with previous hypotheses and clades with ultrastructural identities (Simpson 2003; Fig. 4), when contaminant EST data originally mislabeled as Streblomastix strix are excluded (Slamovits and Keeling 2006). Excavata is often polyphyletic in other analyses because Malawimonas branches outside the other clades of Excavata (Rodríguez-Ezpeleta et al. 2007a; Hampl et al. 2009), whereas in analyses of fewer genes Excavata members fall into 2 or 3 clades (Parfrey et al. 2006; Simpson et al. 2006). Although Malawimonas nests robustly within Excavata in our analyses, it does not have a stable sister group and may represent an independent lineage (Fig. 4). Our analyses confirm that Stephanopogon (unplaced in Patterson 1999) branches within Heterolobosea (Cavalier-Smith and Nikolaev 2008; Yubuki and Leander 2008) and suggest that another enigmatic flagellate, ATCC 50646 (tentatively named Soginia anisocystis), is a basal member of Heterolobosea.
CONCLUSIONS
The robust tree of life emerging from this study demonstrates the benefits of improved taxon sampling for reconstructing deep phylogeny as our analyses produce stable topologies that include a broad representation of eukaryotes. The current study, combined with insights from other studies referenced herein, has refined the eukaryotic tree of life from over 70 major lineages (Patterson 1999) to ~16 major groups (Fig. 5, http://eutree.lifedesks.org/). Most significantly, we attribute the stability of major clades (e.g., Excavata, Amoebozoa, Opisthokonta, and SAR) to broader taxonomic sampling combined with analyses of sufficient characters (16 genes or 6578 characters). In our view, inclusion of more taxa coupled with carefully chosen genes is necessary to further resolve the 16 or so major lineages of microbial eukaryotes for which sister group relationships remain uncertain.
SUPPLEMENTARY MATERIAL
Supplementary material can be found at http://www.sysbio.oxfordjournals.org/.
FIGURE 5. Summary of major findings: the evolutionary relationships among major lineages of eukaryotes. Clades have been collapsed into those that we view to be strongly supported. The many polytomies represent uncertainties that remain.
FUNDING
This work was made possible by the US National Science Foundation Assembling the Tree of Life grant to L.A.K. and D.J.P. (043115) and US National Institutes of Health 5R01AI058054-05 to M.L.S. Funding to collect Foraminifera was provided by a Society of Systematic Biologists MiniPEET grant to L.W.P.
ACKNOWLEDGMENTS
We are grateful to Robert Molestina at ATCC who provided DNAs through a collaborative National Science Foundation grant. We acknowledge the assistance of Kasia Hammar, Leslie Murphy, and Jillian Ward in preparation and sequencing of EST libraries. Our manuscript was improved following detailed comments from the editors, Alastair Simpson, and 1 anonymous reviewer. Thanks to David Hillis for conversations on
Downloaded from http://sysbio.oxfordjournals.org/ by guest on April 28, 2013
A Very Biased Sampling
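The skew is easy to quantify from the GOLD eukaryotic counts shown on these slides. The Python sketch below is a minimal illustration, not part of the talk: it copies the 36 phylum rows listed in the table, and `top_k_share` and `sparse_phyla` are hypothetical helper names. Shares are computed over these listed rows only; GOLD's own Percent column uses a larger denominator (it shows Ascomycota at 23%, while 1,251 of the 4,687 genomes listed here is about 27%), presumably because some entries are not in this list.

```python
# Eukaryotic genome counts per phylum, copied from the GOLD table in this talk
# (snapshot of April 2013). Only the rows shown on the slide are included.
GOLD_EUK_COUNTS = {
    "Priapulida": 1, "Phaeophyceae": 1, "Rotifera": 1, "Hemichordata": 1,
    "Pinguiophyceae": 1, "Ctenophora": 1, "Bolidophyceae": 1, "Chaetognatha": 1,
    "Porifera": 2, "Xanthophyceae": 2, "Tardigrada": 2, "Euglenida": 2,
    "Chromerida": 3, "Placozoa": 3, "Glomeromycota": 3, "Cryptomycota": 4,
    "Blastocladiomycota": 5, "Echinodermata": 6, "Entomophthoromycota": 9,
    "Chytridiomycota": 12, "Neocallimastigomycota": 12, "Annelida": 13,
    "Eustigmatophyceae": 13, "Cnidaria": 18, "Bacillariophyta": 21,
    "Platyhelminthes": 23, "Mollusca": 25, "Microsporidia": 31,
    "Chlorophyta": 77, "Nematoda": 110, "Apicomplexa": 264, "Arthropoda": 370,
    "Chordata": 626, "Streptophyta": 796, "Basidiomycota": 976,
    "Ascomycota": 1251,
}


def top_k_share(counts, k):
    """Fraction of the listed genomes that fall in the k best-sampled phyla."""
    total = sum(counts.values())
    top = sorted(counts.values(), reverse=True)[:k]
    return sum(top) / total


def sparse_phyla(counts, max_genomes):
    """Alphabetical list of phyla with at most `max_genomes` genome projects."""
    return sorted(p for p, n in counts.items() if n <= max_genomes)


if __name__ == "__main__":
    total = sum(GOLD_EUK_COUNTS.values())
    print(f"{len(GOLD_EUK_COUNTS)} phyla, {total} genomes listed")
    print(f"top 3 phyla hold {top_k_share(GOLD_EUK_COUNTS, 3):.1%} of them")
    print(f"{len(sparse_phyla(GOLD_EUK_COUNTS, 5))} phyla have 5 or fewer")
```

Run as a script, it reports 36 phyla and 4,687 genomes, with 64.5% of them in the three best-sampled phyla (Ascomycota, Basidiomycota, Streptophyta) and 17 phyla at five genomes or fewer.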
Solution to Biased Sampling?
Solution: Fill in the Tree
II: Filling in the Tree Example
Big Microbial Sequencing Projects
• Coordinated, top-down efforts
  • Fungal Genome Initiative (Broad/Whitehead)
  • Gordon and Betty Moore Foundation Marine Microbial Genome Sequencing Project
  • Sanger Center Pathogen Sequencing Unit
  • NHGRI Human Gut Microbiome Project
  • NIH Human Microbiome Program
• White paper or grant systems
  • NIAID Microbial Sequencing Centers
  • DOE/JGI Community Sequencing Program
  • DOE/JGI BER Sequencing Program
  • NSF/USDA Microbial Genome Sequencing
• Covers lots of ground and biological diversity
As of 2002
[Figure: bacterial phylum tree based on Hugenholtz 2002, showing Acidobacteria, Bacteroides, Fibrobacteres, Gemmimonas, Verrucomicrobia, Planctomycetes, Chloroflexi, Proteobacteria, Chlorobi, Firmicutes, Fusobacteria, Actinobacteria, Cyanobacteria, Chlamydia, Spirochaetes, Deinococcus-Thermus, Aquificae, Thermotogae, TM6, OS-K, Termite Group, OP8, Marine Group A, WS3, OP9, NKB19, OP3, OP10, TM7, OP1, OP11, Nitrospira, Synergistes, Deferribacteres, Thermodesulfobacteria, Chrysiogenetes, Thermomicrobia, Dictyoglomus, Coprothermobacter]
• At least 40 phyla of bacteria
Tree Based on Hugenholtz, 2002. http://genomebiology.com/2002/3/2/reviews/0003
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Solution I: sequence more phyla
• NSF-funded Tree of Life Project
• A genome from each of eight phyla
Eisen, Ward, Robb, Nelson, et al
Organisms Selected
Phylum | Species selected
Chrysiogenes | Chrysiogenes arsenatis (GCA)
Coprothermobacter | Coprothermobacter proteolyticus (GCBP)
Dictyoglomi | Dictyoglomus thermophilum (GDT)
Thermodesulfobacteria | Thermodesulfobacterium commune (GTC)
Nitrospirae | Thermodesulfovibrio yellowstonii (GTY)
Thermomicrobia | Thermomicrobium roseum (GTR)
Deferribacteres | Geovibrio thiophilus (GGT)
Synergistes | Synergistes jonesii (GSJ)
• Still highly biased in terms of the tree
Eisen & Ward, PIs
Major Lineages of Actinobacteria
2.5.1 Acidimicrobidae
2.5.1.1 Unclassified
2.5.1.2 "Microthrixineae
2.5.1.3 Acidimicrobineae
2.5.1.4 BD2-10
2.5.1.5 EB1017
2.5.2 Actinobacteridae
2.5.2.1 Unclassified
2.5.2.10 Ellin306/WR160
2.5.2.11 Ellin5012
2.5.2.12 Ellin5034
2.5.2.13 Frankineae
2.5.2.14 Glycomyces
2.5.2.15 Intrasporangiaceae
2.5.2.16 Kineosporiaceae
2.5.2.17 Microbacteriaceae
2.5.2.18 Micrococcaceae
2.5.2.19 Micromonosporaceae
2.5.2.2 Actinomyces
2.5.2.20 Propionibacterineae
2.5.2.21 Pseudonocardiaceae
2.5.2.22 Streptomycineae
2.5.2.23 Streptosporangineae
2.5.2.3 Actinomycineae
2.5.2.4 Actinosynnemataceae
2.5.2.5 Bifidobacteriaceae
2.5.2.6 Brevibacteriaceae
2.5.2.7 Cellulomonadaceae
2.5.2.8 Corynebacterineae
2.5.2.9 Dermabacteraceae
2.5.3 Coriobacteridae
2.5.3.1 Unclassified
2.5.3.2 Atopobiales
2.5.3.3 Coriobacteriales
2.5.3.4 Eggerthellales
2.5.4 OPB41
2.5.5 PK1
2.5.6 Rubrobacteridae
2.5.6.1 Unclassified
2.5.6.2 "Thermoleiphilaceae
2.5.6.3 MC47
2.5.6.4 Rubrobacteraceae
2.5 Actinobacteria
2.5.1 Acidimicrobidae
2.5.1.1 Unclassified
2.5.1.2 "Microthrixineae
2.5.1.3 Acidimicrobineae
2.5.1.3.1 Unclassified
2.5.1.3.2 Acidimicrobiaceae
2.5.1.4 BD2-10
2.5.1.5 EB1017
2.5.2 Actinobacteridae
2.5.2.1 Unclassified
2.5.2.10 Ellin306/WR160
2.5.2.11 Ellin5012
2.5.2.12 Ellin5034
2.5.2.13 Frankineae
2.5.2.13.1 Unclassified
2.5.2.13.2 Acidothermaceae
2.5.2.13.3 Ellin6090
2.5.2.13.4 Frankiaceae
2.5.2.13.5 Geodermatophilaceae
2.5.2.13.6 Microsphaeraceae
2.5.2.13.7 Sporichthyaceae
2.5.2.14 Glycomyces
2.5.2.15 Intrasporangiaceae
2.5.2.15.1 Unclassified
2.5.2.15.2 Dermacoccus
2.5.2.15.3 Intrasporangiaceae
2.5.2.16 Kineosporiaceae
2.5.2.17 Microbacteriaceae
2.5.2.17.1 Unclassified
2.5.2.17.2 Agrococcus
2.5.2.17.3 Agromyces
2.5.2.18 Micrococcaceae
2.5.2.19 Micromonosporaceae
2.5.2.2 Actinomyces
2.5.2.20 Propionibacterineae
2.5.2.20.1 Unclassified
2.5.2.20.2 Kribbella
2.5.2.20.3 Nocardioidaceae
2.5.2.20.4 Propionibacteriaceae
2.5.2.21 Pseudonocardiaceae
2.5.2.22 Streptomycineae
2.5.2.22.1 Unclassified
2.5.2.22.2 Kitasatospora
2.5.2.22.3 Streptacidiphilus
2.5.2.23 Streptosporangineae
2.5.2.23.1 Unclassified
2.5.2.23.2 Ellin5129
2.5.2.23.3 Nocardiopsaceae
2.5.2.23.4 Streptosporangiaceae
2.5.2.23.5 Thermomonosporaceae
2.5.2.3 Actinomycineae
2.5.2.4 Actinosynnemataceae
2.5.2.5 Bifidobacteriaceae
2.5.2.6 Brevibacteriaceae
2.5.2.7 Cellulomonadaceae
2.5.2.8 Corynebacterineae
2.5.2.8.1 Unclassified
2.5.2.8.2 Corynebacteriaceae
2.5.2.8.3 Dietziaceae
2.5.2.8.4 Gordoniaceae
2.5.2.8.5 Mycobacteriaceae
2.5.2.8.6 Rhodococcus
2.5.2.8.7 Rhodococcus
2.5.2.8.8 Rhodococcus
2.5.2.9 Dermabacteraceae
2.5.2.9.1 Unclassified
2.5.2.9.2 Brachybacterium
2.5.2.9.3 Dermabacter
2.5.3 Coriobacteridae
2.5.3.1 Unclassified
2.5.3.2 Atopobiales
2.5.3.3 Coriobacteriales
2.5.3.4 Eggerthellales
2.5.4 OPB41
2.5.5 PK1
2.5.6 Rubrobacteridae
2.5.6.1 Unclassified
2.5.6.2 "Thermoleiphilaceae
2.5.6.2.1 Unclassified
2.5.6.2.2 Conexibacter
2.5.6.2.3 XGE514
2.5.6.3 MC47
2.5.6.4 Rubrobacteraceae
• Same trend in Archaea
• Same trend in Eukaryotes
• Same trend in Viruses
• Solution: Really Fill in the Trees
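One way to make "fill in the trees" concrete, assumed here rather than taken from the slides, is phylogenetic diversity (PD): the total branch length spanned by the sequenced genomes. The Python sketch below uses the rooted variant of Faith's PD on a hypothetical four-tip tree (`TREE`, `faith_pd`, and `pd_gain` are illustrative names, not an existing tool) to compare adding a genome from an already-sampled lineage with adding one from an unsampled lineage.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical rooted tree: child -> (parent, length of the branch above child).
# Lineage X is already sampled; lineage Y sits on a long, unsampled branch.
TREE: Dict[str, Tuple[str, float]] = {
    "X": ("root", 1.0), "Y": ("root", 3.0),
    "x1": ("X", 0.5), "x2": ("X", 0.5),
    "y1": ("Y", 2.0), "y2": ("Y", 2.0),
}


def faith_pd(tree: Dict[str, Tuple[str, float]], tips: Iterable[str]) -> float:
    """Rooted Faith's PD: summed length of all branches linking `tips` to the root."""
    counted = set()
    total = 0.0
    for tip in tips:
        node = tip
        # Walk toward the root; stop at the root (not a key in `tree`) or at a
        # branch already counted, since everything above it is counted too.
        while node in tree and node not in counted:
            counted.add(node)
            parent, length = tree[node]
            total += length
            node = parent
    return total


def pd_gain(tree: Dict[str, Tuple[str, float]], sampled: Iterable[str], candidate: str) -> float:
    """Extra PD contributed by adding one candidate genome to `sampled`."""
    sampled = list(sampled)
    return faith_pd(tree, sampled + [candidate]) - faith_pd(tree, sampled)


if __name__ == "__main__":
    have = ["x1"]
    for cand in ("x2", "y1"):
        print(cand, pd_gain(TREE, have, cand))
```

With `x1` already sequenced, adding its sister `x2` contributes 0.5 units of branch length, while adding `y1` from the unsampled lineage contributes 5.0, because it brings in both its own branch and the long branch above lineage Y.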
Filling in the Tree
Figure from Barton, Eisen et al., “Evolution”, CSHL Press, based on a tree from Baldauf et al.
Lots of Plants, Animals, Fungi
Exclude Plants, Animals, Fungi
A Genomic Encyclopedia of Microbes (GEM)
Figure from Barton, Eisen et al. “Evolution”, CSHL Press, based on Baldauf et al. tree
Just Say No to Eukaryotes
Figure from Barton, Eisen et al. “Evolution”, CSHL Press, based on Baldauf et al. tree
GEBA: A Genomic Encyclopedia of Bacteria and Archaea
Figure from Barton, Eisen et al. “Evolution”, CSHL Press, based on Baldauf et al. tree
GEBA
GEBA Pilot Project: Components
• Project overview (Phil Hugenholtz, Nikos Kyrpides, Jonathan Eisen, Eddy Rubin, Jim Bristow)
• Project management (David Bruce, Eileen Dalin, Lynne Goodwin)
• Culture collection and DNA prep (DSMZ, Hans-Peter Klenk)
• Sequencing and closure (Eileen Dalin, Susan Lucas, Alla Lapidus, Mat Nolan, Alex Copeland, Cliff Han, Feng Chen, Jan-Fang Cheng)
• Annotation and data release (Nikos Kyrpides, Victor Markowitz, et al.)
• Analysis (Dongying Wu, Kostas Mavrommatis, Martin Wu, Victor Kunin, Neil Rawlings, Ian Paulsen, Patrick Chain, Patrik D’Haeseleer, Sean Hooper, Iain Anderson, Amrita Pati, Natalia N. Ivanova, Athanasios Lykidis, Adam Zemla)
• Adopt a microbe education project (Cheryl Kerfeld)
• Outreach (David Gilbert)
• $$$ (DOE, Eddy Rubin, Jim Bristow)
rRNA Tree of Life
Figure from Barton, Eisen et al. “Evolution”, CSHL Press.
Based on tree from Pace NR, 2003.
rRNA Tree of Bacteria and Archaea
Figure from Barton, Eisen et al. “Evolution”, CSHL Press.
Based on tree from Pace NR, 2003.
Greengenes
DSMZ
GEBA Pilot Project Overview
• Identify major branches in rRNA tree for which no genomes are available
• Identify those with a cultured representative in DSMZ
• DSMZ grew > 200 of these and prepped DNA
• Sequence and finish 200+
• Annotate, analyze, release data
• Assess benefits of tree-guided sequencing
• 1st paper Wu et al. in Nature Dec 2009
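The selection steps above can be sketched as a walk over the rRNA tree. This is a minimal illustration, not GEBA's actual pipeline. The `Node` class, its `has_genome` and `in_collection` flags, and the rule that a "major branch with no genome" is any maximal clade containing no sequenced genome are all hypothetical simplifications.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Node:
    """Hypothetical rRNA-tree node, for illustration only."""
    name: str
    children: list[Node] = field(default_factory=list)
    has_genome: bool = False      # tips only: a genome sequence already exists
    in_collection: bool = False   # tips only: a culture is available (e.g. at DSMZ)


def tips(node: Node) -> Iterator[Node]:
    """Yield every tip at or below `node`."""
    if not node.children:
        yield node
    for child in node.children:
        yield from tips(child)


def genome_free_clades(node: Node) -> Iterator[Node]:
    """Yield the largest clades that contain no sequenced genome."""
    if not any(tip.has_genome for tip in tips(node)):
        yield node
        return
    for child in node.children:
        yield from genome_free_clades(child)


def pick_targets(root: Node) -> list[tuple[str, str]]:
    """For each genome-free clade that has a cultured tip, propose one tip to sequence."""
    targets = []
    for clade in genome_free_clades(root):
        cultured = [tip.name for tip in tips(clade) if tip.in_collection]
        if cultured:
            targets.append((clade.name, cultured[0]))
    return targets


# Toy tree: clade A already has a genome (a1); clade B has none.
tree = Node("root", [
    Node("A", [Node("a1", has_genome=True), Node("a2")]),
    Node("B", [Node("b1", in_collection=True), Node("b2")]),
])
print(pick_targets(tree))  # [('B', 'b1')]
```

Clade A already has a sequenced genome (a1), so only its unsequenced tip a2 remains a candidate. That tip is skipped because no culture is available for it. Clade B has no genome but does have a cultured tip, so b1 is proposed.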
GEBA Pilot Target List
[Bar chart: GEBA Initial Target List. # of genomes (y-axis, 0 to 35) per phylum or class (x-axis; B = Bacteria, A = Archaea): B: Actinobacteria (High GC), B: Aminanaerobia, B: Aquificae, B: Bacteroidetes, B: Chloroflexi, B: Deferribacteres (two bars), B: Deinococci, B: Delta Proteobacteria, B: Epsilon Proteobacteria, B: Firmicutes, B: Fusobacteria, B: Gamma Proteobacteria, B: Gemmatimonadetes, B: Haloanaerobiales, B: Planctomycetes, B: Spirochaetes, B: Thermodesulfobacteria, B: Thermodesulfobia, B: Thermovenabulae, A: Halobacteria, A: Archaeoglobi, A: Methanobacteria, A: Methanomicrobia, A: Thermococci, A: Thermoprotei]
Assess Benefits of GEBA
• All genomes have some value
• But what, if any, is the benefit of tree-guided sequencing over other selection methods?
• Lessons for other large scale microbial genome projects?
Lessons from GEBA
Lesson 1: rRNA PD IDs novel lineages
From Wu et al. 2009 Nature 462, 1056-1060
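Here, phylogenetic diversity (PD) is read as Faith's PD: the total branch length of the tree spanned by a set of taxa. The sketch below assumes a rooted tree with branch lengths and counts the path to the root. It shows how the PD gained by adding one more genome favors a taxon on a long, unsampled branch. The names `faith_pd` and `pd_gain` are illustrative, and Wu et al. may have computed PD differently.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical rooted-tree node used only for this illustration."""
    name: str
    length: float = 0.0                        # length of the branch above this node
    children: list[Node] = field(default_factory=list)


def faith_pd(root: Node, chosen: set[str]) -> float:
    """Total branch length connecting the chosen tips to the root."""
    def visit(node: Node) -> tuple[bool, float]:
        if not node.children:
            hit = node.name in chosen
            return hit, (node.length if hit else 0.0)
        hit, total = False, 0.0
        for child in node.children:
            child_hit, child_pd = visit(child)
            hit = hit or child_hit
            total += child_pd
        # Count this node's own branch only if it leads to a chosen tip.
        return hit, total + (node.length if hit else 0.0)
    return visit(root)[1]


def pd_gain(root: Node, sequenced: set[str], candidate: str) -> float:
    """Extra PD contributed by sequencing `candidate` on top of `sequenced`."""
    return faith_pd(root, sequenced | {candidate}) - faith_pd(root, sequenced)


# Toy tree ((s1:1, s2:1)x:1, s3:2): s3 sits on a long branch with no genome yet.
tree = Node("root", 0.0, [
    Node("x", 1.0, [Node("s1", 1.0), Node("s2", 1.0)]),
    Node("s3", 2.0),
])
print(pd_gain(tree, {"s1"}, "s2"))  # 1.0
print(pd_gain(tree, {"s1"}, "s3"))  # 2.0
```

Adding s3 gains twice as much PD as adding s2, because no sequenced genome covers the branch leading to s3.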
Concatenated Marker PD
From Wu et al. 2009 Nature 462, 1056-1060
Lesson 2: rRNA Tree is not perfect
Badger et al. 2005 Int J System Evol Microbiol 55: 1021-1026.
[Figure panels: 16S, WGT, 23S]
How to Pick Novel Lineages for Euks?
• Molecular
  • rRNA PD?
  • Conserved markers by PCR?
  • EST shotgun?
• Other data for phylogeny
Lesson 3: Improves annotation
• Took 56 GEBA genomes and compared results vs. 56 randomly sampled new genomes
• Better definition of protein family sequence “patterns”
• Greatly improves “comparative” and “evolutionary” based predictions
• Conversion of hypothetical into conserved hypotheticals
• Linking distantly related members of protein families
• Improved non-homology prediction
Annotation for Euks?
Lesson 4: Metadata Important
Lesson 5: Project management critical
• Tracking samples and status
• Getting permissions
• Shipping samples
• Contacting collaborators
• Data archiving and submission
• Communicating with core facilities
• and more
Lesson 6: Culture Collections Needed
Lesson 7: Data Publications
Lesson 8: Diversity Discovery
• Phylogeny-driven genome selection helps discover new genetic diversity
Protein Family Rarefaction
• Take data set of multiple complete genomes
• Identify all protein families using MCL
• Plot # of genomes vs. # of protein families
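The rarefaction steps above can be sketched as follows. The sketch assumes protein families have already been assigned to each genome, so the MCL clustering step itself is not reproduced. Genomes are added in random order, the number of distinct families seen so far is recorded at each step, and the curves are averaged over permutations. The function name and the default permutation count are assumptions made for illustration.

```python
import random
from statistics import mean


def family_rarefaction(genomes, permutations=100, seed=0):
    """Mean number of distinct protein families after adding 1..N genomes.

    `genomes` maps a genome name to the set of protein-family IDs it contains
    (assumed to come from an earlier clustering step such as MCL).
    """
    rng = random.Random(seed)
    order = list(genomes)
    curves = []
    for _ in range(permutations):
        rng.shuffle(order)                  # random order in which genomes are "added"
        seen, curve = set(), []
        for name in order:
            seen |= genomes[name]
            curve.append(len(seen))         # families discovered so far
        curves.append(curve)
    # Average the permuted curves position by position.
    return [mean(step) for step in zip(*curves)]


genomes = {
    "g1": {"f1", "f2"},
    "g2": {"f2", "f3"},
    "g3": {"f1", "f4"},
}
print(family_rarefaction(genomes))
```

Running this separately on two genome sets, such as a tree-guided set and a randomly chosen set, gives two curves that can be plotted against the number of genomes.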
Wu et al. 2009 Nature 462, 1056-1060
Synapomorphies exist
Wu et al. 2009 Nature 462, 1056-1060
True for Euks?
Lesson 9: Improves metagenomics
Phylotyping
[Bar chart: Sargasso Phylotypes. Weighted % of clones (y-axis, 0 to 0.5) in each major phylogenetic group (x-axis: Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota), estimated separately from the markers EFG, EFTu, HSP70, RecA, RpoB and rRNA]
Venter et al., Science 304: 66-74. 2004
GEBA Project improves metagenomic analysis
Eukaryotic Metagenomics?
GEBA Zoom
GEBA Now
• 300+ genomes
• Rich sampling of major groups of cultured organisms
• Zoomed-in sampling of haloarchaea, cyanobacteria and more
GEBA Cyanobacteria
www.pnas.org/cgi/doi/10.1073/pnas.1217107110
Haloarchaeal GEBA-like
Lynch EA, Langille MGI, Darling A, Wilbanks EG, Haltiner C, et al. (2012) Sequencing of Seven Haloarchaeal Genomes Reveals Patterns of Genomic Flux. PLoS ONE 7(7): e41389. doi:10.1371/journal.pone.0041389
Plan: Sequence multiple Root Nodule Bacteria (RNBs) across the planet. Pilot: 100 RNBs.
Alpha RNB: Bradyrhizobium, Mesorhizobium, Rhizobium, Sinorhizobium, Balneimonas-like, Devosia, Ochrobactrum, Phyllobacterium, Azorhizobium, Allorhizobium
Beta RNB: Cupriavidus, Burkholderia
Goal:
• Understand biogeographical effects on species evolution and understand host specificity.
Rationale:
• N2 fixation by legume pastures and crops provides 65% of the N currently utilized in agricultural production.
• Contributes 25 to 90 million metric tonnes N per annum.
• Symbioses save US$6-10 billion annually on N fertilizer.
• Grain and animal production enhanced by fixed nitrogen supplied by the symbiosis.
Nikos Kyrpides
GEBA RNB
But ...
Phylotyping
[Same Sargasso Phylotypes bar chart as above: weighted % of clones by major phylogenetic group, for the markers EFG, EFTu, HSP70, RecA, RpoB and rRNA]
But not a lot
Venter et al., Science 304: 66-74. 2004
Phylogenomics Future 1
• Need to adapt genomic and metagenomic methods to make better use of data
Improving Metagenomic Analysis
• Methods
  • More automation
  • Better phylogenetic methods for short reads and large data sets
  • Improved tools for using distantly related genomes in metagenomic analysis
• Data sets
  • Rebuild protein family models
  • New phylogenetic markers
  • Need better reference phylogenies, including HGT
  • More simulations
Kembel Correction
Kembel, Wu, Eisen, Green. In press. PLoS Computational Biology. Incorporating 16S gene copy number information improves estimates of microbial diversity and abundance
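The core idea in the title above is that a taxon with many 16S copies per genome contributes more reads than cells. Dividing read counts by copy number and renormalizing corrects for this. The sketch below shows only that correction step. The copy numbers are simply given here, whereas the paper estimates them, so treat `copies` and the function name as assumptions for illustration.

```python
def copy_number_corrected(counts, copies):
    """Convert 16S read counts into relative cell abundances.

    counts: taxon -> number of 16S reads observed
    copies: taxon -> 16S gene copies per genome (assumed known here)
    """
    # Reads per copy approximates the relative number of cells per taxon.
    per_cell = {taxon: n / copies[taxon] for taxon, n in counts.items()}
    total = sum(per_cell.values())
    return {taxon: value / total for taxon, value in per_cell.items()}


reads = {"taxon_A": 60, "taxon_B": 40}
copies = {"taxon_A": 6, "taxon_B": 1}
print(copy_number_corrected(reads, copies))  # {'taxon_A': 0.2, 'taxon_B': 0.8}
```

Although taxon_A contributes 60% of the reads, it makes up only 20% of the cells once its six gene copies are accounted for.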
alignment used to build the profile, resulting in a multiple sequence alignment of full-length reference sequences and metagenomic reads. The final step of the alignment process is a quality control filter that 1) ensures that only homologous SSU-rRNA sequences from the appropriate phylogenetic domain are included in the final alignment, and 2) masks highly gapped alignment columns (see Text S1).
We use this high quality alignment of metagenomic reads and reference sequences to construct a fully-resolved phylogenetic tree and hence determine the evolutionary relationships between the reads. Reference sequences are included in this stage of the analysis to guide the phylogenetic assignment of the relatively short metagenomic reads. While the software can be easily extended to incorporate a number of different phylogenetic tools capable of analyzing metagenomic data (e.g., RAxML [27], pplacer [28], etc.), PhylOTU currently employs FastTree as a default method due to its relatively high speed-to-performance ratio and its ability to construct accurate trees in the presence of highly-gapped data [29]. After construction of the phylogeny, lineages representing reference sequences are pruned from the tree. The resulting phylogeny of metagenomic reads is then used to compute a PD distance matrix in which the distance between a pair of reads is defined as the total tree path distance (i.e., branch length) separating the two reads [30]. This tree-based distance matrix is subsequently used to hierarchically cluster metagenomic reads via MOTHUR into OTUs in a fashion similar to traditional PID-based analysis [31]. As with PID clustering, the hierarchical algorithm can be tuned to produce finer or coarser clusters, corresponding to different taxonomic levels, by adjusting the clustering threshold and linkage method.
To evaluate the performance of PhylOTU, we employed statistical comparisons of distance matrices and clustering results for a variety of data sets. These investigations aimed 1) to compare PD versus PID clustering, 2) to explore overlap between PhylOTU clusters and recognized taxonomic designations, and 3) to quantify the accuracy of PhylOTU clusters from shotgun reads relative to those obtained from full-length sequences.
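The tree-based distance and clustering steps described in this excerpt can be sketched in a few lines. This is a hypothetical illustration, not PhylOTU or MOTHUR code. The tree is assumed to be stored as a `parent` dictionary mapping each node to `(parent, branch_length)`, and reference lineages are assumed to be already pruned. The naive agglomerative loop stands in for MOTHUR's clustering, with the three linkage options named in the excerpt: nearest-neighbor, average and furthest-neighbor.

```python
def tree_distance(a, b, parent):
    """Total branch length on the path between nodes a and b.

    `parent` maps each non-root node to (parent_name, branch_length).
    """
    # Distance from `a` up to each of its ancestors (including `a` itself).
    up, node, dist = {a: 0.0}, a, 0.0
    while node in parent:
        node, length = parent[node]
        dist += length
        up[node] = dist
    # Climb from `b` until reaching a node on a's root path (the common ancestor).
    node, dist = b, 0.0
    while node not in up:
        node, length = parent[node]
        dist += length
    return dist + up[node]


def cluster(items, dist, threshold, linkage="average"):
    """Naive agglomerative clustering: repeatedly merge the closest pair of
    clusters until the closest pair is farther apart than `threshold`."""
    combine = {
        "single": min,                            # nearest-neighbor linkage
        "complete": max,                          # furthest-neighbor linkage
        "average": lambda ds: sum(ds) / len(ds),  # average linkage
    }[linkage]
    clusters = [[item] for item in items]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = combine([dist(x, y) for x in clusters[i] for y in clusters[j]])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break
        merged = clusters[i] + clusters[j]
        del clusters[j]                           # j > i, so index i is unchanged
        clusters[i] = merged
    return clusters


# Hypothetical pruned read tree: r1 and r2 are close; r3 sits on a long branch.
parent = {"r1": ("x", 0.01), "r2": ("x", 0.02), "x": ("root", 0.01), "r3": ("root", 0.2)}


def pd(a, b):
    return tree_distance(a, b, parent)


print(cluster(["r1", "r2", "r3"], pd, threshold=0.05))  # [['r1', 'r2'], ['r3']]
```

Raising `threshold` produces coarser OTUs, and switching `linkage` changes how the distance between multi-read clusters is defined. The loop is cubic in the number of reads, so it only suits small examples.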
PhylOTU Clusters Recapitulate PID Clusters
We sought to identify how PD-based clustering compares to commonly employed PID-based clustering methods by applying the two methods to the same set of sequences. Both PID-based clustering and PhylOTU may be used to identify OTUs from overlapping sequences. Therefore we applied both methods to a dataset of 508 full-length bacterial SSU-rRNA sequences (reference sequences; see above) obtained from the Ribosomal Database Project (RDP) [25]. Recent work has demonstrated that PID is more accurately calculated from pairwise alignments than multiple sequence alignments [32–33], so we used ESPRIT, which implements pairwise alignments, to obtain a PID distance matrix for the reference sequences [32]. We used PhylOTU to compute a PD distance matrix for the same data. Then, we used MOTHUR to hierarchically cluster sequences into OTUs based on both PID and PD. For each of the two distance matrices, we employed a range of clustering thresholds and three different definitions of linkage in the hierarchical clustering algorithm: nearest-neighbor, average, and furthest-neighbor.
To statistically evaluate the similarity of cluster composition between each pair of clustering results, we used two summary statistics that together capture the frequency with which sequences are co-clustered in both analyses: true conjunction rate (i.e., the proportion of pairs of sequences derived from the same cluster in the first analysis that also are clustered together in the second analysis) and true disjunction rate (i.e., the proportion of pairs of sequences derived from different clusters in the first analysis that also are not clustered together in the second analysis) (see Methods
Figure 1. PhylOTU Workflow. Computational processes are represented as squares and databases are represented as cylinders in this generalized workflow of PhylOTU. See Results section for details. doi:10.1371/journal.pcbi.1001061.g001
Finding Metagenomic OTUs
PLoS Computational Biology | www.ploscompbiol.org 3 January 2011 | Volume 7 | Issue 1 | e1001061
Sharpton TJ, Riesenfeld SJ, Kembel SW, Ladau J, O'Dwyer JP, Green JL, Eisen JA, Pollard KS. (2011) PhylOTU: A High-Throughput Procedure Quantifies Microbial Community Diversity and Resolves Novel Taxa from Metagenomic Data. PLoS Comput Biol 7(1): e1001061. doi:10.1371/journal.pcbi.1001061
PhylOTU
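The true conjunction and true disjunction rates defined in the PhylOTU excerpt compare two clusterings pair by pair. Below is a minimal sketch. It assumes each clustering is given as a dictionary from sequence ID to cluster label and that both clusterings cover the same sequences. The function name is hypothetical.

```python
from itertools import combinations


def conjunction_disjunction(first, second):
    """Return (true conjunction rate, true disjunction rate) of `second` relative to `first`.

    Both arguments map the same sequence IDs to cluster labels.
    """
    same = kept_together = 0     # pairs co-clustered in `first`
    apart = kept_apart = 0       # pairs split in `first`
    for x, y in combinations(sorted(first), 2):
        together_in_second = second[x] == second[y]
        if first[x] == first[y]:
            same += 1
            kept_together += int(together_in_second)
        else:
            apart += 1
            kept_apart += int(not together_in_second)
    tcr = kept_together / same if same else float("nan")
    tdr = kept_apart / apart if apart else float("nan")
    return tcr, tdr


pid_clusters = {"a": 1, "b": 1, "c": 2, "d": 2}
pd_clusters = {"a": 1, "b": 1, "c": 1, "d": 2}
print(conjunction_disjunction(pid_clusters, pd_clusters))  # (0.5, 0.5)
```

Of the two pairs grouped together by `pid_clusters`, only (a, b) stays together in `pd_clusters`. Of the four pairs it separates, only (a, d) and (b, d) stay apart. Both rates are therefore 0.5.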
PhyloSift / pplacer
Aaron Darling, Guillaume Jospin, Holly Bik, Erik Matsen, Eric Lowe, and others
Kembel Combiner
cally defined by a sequence similarity threshold) in the sample as equally related. Newer β diversity measures that incorporate phylogenetic information are more powerful because they account for the degree of divergence between sequences (13, 18, 29, 30). Phylogenetic β diversity measures can also be either quantitative or qualitative depending on whether abundance is taken into account. The original, unweighted UniFrac measure (13) is a qualitative measure. Unweighted UniFrac measures the distance between two communities by calculating the fraction of the branch length in a phylogenetic tree that leads to descendants in either, but not both, of the two communities (Fig. 1A). The fixation index (FST), which measures the distance between two communities by comparing the genetic diversity within each community to the total genetic diversity of the communities combined (18), is a quantitative measure that accounts for different levels of divergence between sequences. The phylogenetic test (P test), which measures the significance of the association between environment and phylogeny (18), is typically used as a qualitative measure because duplicate sequences are usually removed from the tree. However, the P test may be used in a semiquantitative manner if all clones, even those with identical or near-identical sequences, are included in the tree (13).
Here we describe a quantitative version of UniFrac that we call “weighted UniFrac.” We show that weighted UniFrac behaves similarly to the FST test in situations where both are applicable. However, weighted UniFrac has a major advantage over FST because it can be used to combine data in which different parts of the 16S rRNA were sequenced (e.g., when nonoverlapping sequences can be combined into a single tree using full-length sequences as guides). We use two different data sets to illustrate how analyses with quantitative and qualitative β diversity measures can lead to dramatically different conclusions about the main factors that structure microbial diversity. Specifically, qualitative measures that disregard relative abundance can better detect effects of different founding populations, such as the source of bacteria that first colonize the gut of newborn mice and the effects of factors that are restrictive for microbial growth such as temperature. In contrast, quantitative measures that account for the relative abundance of microbial lineages can reveal the effects of more transient factors such as nutrient availability.
MATERIALS AND METHODS
Weighted UniFrac. Weighted UniFrac is a new variant of the original unweighted UniFrac measure that weights the branches of a phylogenetic tree based on the abundance of information (Fig. 1B). Weighted UniFrac is thus a quantitative measure of β diversity that can detect changes in how many sequences from each lineage are present, as well as detect changes in which taxa are present. This ability is important because the relative abundance of different kinds of bacteria can be critical for describing community changes. In contrast, the original, unweighted UniFrac (Fig. 1A) is a qualitative β diversity measure because duplicate sequences contribute no additional branch length to the tree (by definition, the branch length that separates a pair of duplicate sequences is zero, because no substitutions separate them).
The first step in applying weighted UniFrac is to calculate the raw weighted UniFrac value (u), according to the first equation:
$$u = \sum_{i=1}^{n} b_i \times \left| \frac{A_i}{A_T} - \frac{B_i}{B_T} \right|$$
Here, n is the total number of branches in the tree, b_i is the length of branch i, A_i and B_i are the numbers of sequences that descend from branch i in communities A and B, respectively, and A_T and B_T are the total numbers of sequences in communities A and B, respectively. In order to control for unequal sampling effort, A_i and B_i are divided by A_T and B_T.
If the phylogenetic tree is not ultrametric (i.e., if different sequences in the sample have evolved at different rates), clustering with weighted UniFrac will place more emphasis on communities that contain quickly evolving taxa. Since these taxa are assigned more branch length, a comparison of the communities that contain them will tend to produce higher values of u. In some situations, it may be desirable to normalize u so that it has a value of 0 for identical communities and 1 for nonoverlapping communities. This is accomplished by dividing u by a scaling factor (D), which is the average distance of each sequence from the root, as shown in the equation as follows:
$$D = \sum_{j=1}^{n} d_j \times \left( \frac{A_j}{A_T} + \frac{B_j}{B_T} \right)$$
Here, d_j is the distance of sequence j from the root, A_j and B_j are the numbers of times the sequences were observed in communities A and B, respectively, and A_T and B_T are the total numbers of sequences from communities A and B, respectively.
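Both weighted UniFrac quantities, the raw value u and the scaling factor D, can be accumulated in a single post-order pass over the tree. The sketch below is a minimal, hypothetical implementation for illustration, not the UniFrac authors' code. It assumes a rooted tree stored as the `Node` class below, with each node's `length` playing the role of b_i. Tip depth (the sum of branch lengths from the root) is taken as d_j.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical rooted-tree node; `length` is b_i for the branch above it."""
    name: str
    length: float = 0.0
    children: list[Node] = field(default_factory=list)


def weighted_unifrac(root, counts_a, counts_b, normalized=True):
    """Raw weighted UniFrac u, or u / D when `normalized` is True.

    counts_a, counts_b map tip names to the number of sequences observed
    in communities A and B; tips that are missing count as zero.
    """
    a_total = sum(counts_a.values())            # A_T
    b_total = sum(counts_b.values())            # B_T
    u = d = 0.0

    def visit(node, depth):
        """Return (A_i/A_T, B_i/B_T) for the subtree below `node`."""
        nonlocal u, d
        if not node.children:
            a = counts_a.get(node.name, 0) / a_total
            b = counts_b.get(node.name, 0) / b_total
            d += depth * (a + b)                # d_j * (A_j/A_T + B_j/B_T)
        else:
            a = b = 0.0
            for child in node.children:
                child_a, child_b = visit(child, depth + child.length)
                a += child_a
                b += child_b
        u += node.length * abs(a - b)           # b_i * |A_i/A_T - B_i/B_T|
        return a, b

    visit(root, 0.0)
    return u / d if normalized else u


tree = Node("root", 0.0, [
    Node("x", 1.0, [Node("s1", 1.0), Node("s2", 1.0)]),
    Node("s3", 2.0),
])
A = {"s1": 2}
B = {"s2": 1, "s3": 1}
print(weighted_unifrac(tree, A, B, normalized=False))  # 3.0
print(weighted_unifrac(tree, A, B))                    # 0.75
```

Identical communities give u = 0. In this toy tree, communities that share no tips and sit on opposite sides of the root give a normalized value of 1.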
Clustering with normalized u values treats each sample equally instead of
TABLE 1. Measurements of diversity
Measure | Measurement of α diversity | Measurement of β diversity
Only presence/absence of taxa considered | Qualitative (species richness) | Qualitative
Additionally accounts for the no. of times that each taxon was observed | Quantitative (species richness and evenness) | Quantitative
FIG. 1. Calculation of the unweighted and the weighted UniFrac measures. Squares and circles represent sequences from two different environments. (a) In unweighted UniFrac, the distance between the circle and square communities is calculated as the fraction of the branch length that has descendants from either the square or the circle environment (black) but not both (gray). (b) In weighted UniFrac, branch lengths are weighted by the relative abundance of sequences in the square and circle communities; square sequences are weighted twice as much as circle sequences because there are twice as many total circle sequences in the data set. The width of branches is proportional to the degree to which each branch is weighted in the calculations, and gray branches have no weight. Branches 1 and 2 have heavy weights since the descendants are biased toward the square and circles, respectively. Branch 3 contributes no value since it has an equal contribution from circle and square sequences after normalization.
VOL. 73, 2007 PHYLOGENETICALLY COMPARING MICROBIAL COMMUNITIES 1577
Kembel SW, Eisen JA, Pollard KS, Green JL (2011) The Phylogenetic Diversity of Metagenomes. PLoS ONE 6(8): e23214. doi:10.1371/journal.pone.0023214
NMF in Metagenomes
Characterizing the niche-space distributions of components
[Figure 3, panels (a) to (c), one row per sampling site (North American East Coast, Eastern Tropical Pacific, Galapagos Islands, Polynesia Archipelagos, Sargasso Sea, Caribbean Sea and Indian Ocean samples from coastal, estuary, embayment, harbor, lagoon reef, coral reef, upwelling, warm seep and open-ocean habitats): (a) niche-space distributions of Components 1 to 5; (b) site-similarity matrix; (c) environmental variables (salinity, sample depth, chlorophyll, temperature, insolation, water depth) coded High/Medium/Low/NA, with water depth binned from 0 to 20 m up to >4000 m]
Figure 3: a) Niche-space distributions for our five components (H^T); b) the site-similarity matrix (H^T H); c) environmental variables for the sites. The matrices are aligned so that the same row corresponds to the same site in each matrix. Sites are ordered by applying spectral reordering to the similarity matrix (see Materials and Methods). Rows are aligned across the three matrices.
Figure 3a shows the estimated niche-space distribution for each of the five components. Components 2 (Photosystem) and 4 (Unidentified) are broadly distributed; Components 1 (Signalling) and 5 (Unidentified) are largely restricted to a handful of sites; and Component 3 shows an intermediate pattern. There is a great deal of overlap between niche-space distributions for different components.
Figure 3b shows the pattern of filtered similarity between sites. We see clear patterns of grouping that do not emerge when we calculate functional distances without filtering, or using PCA rather than NMF filtering (Figure 3 in Text S1). As with the Pfams, we see clusters roughly associated with our components, but there is more overlap than with the Pfam clusters (Figure 2b).
Figure 3c shows the distribution of environmental variables measured at each site. Inspection of Figure 3 reveals qualitative correspondence between environmental factors and clusters of similar sites in the similarity matrix. For example, the “North American East Coast” samples are divided into two groups, one in the top left and the other in the bottom right of the similarity matrix. Inspection of the environmental features suggests that the split in these samples could be mostly due to differences in insolation and water depth.
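The excerpt treats H^T as the sites' niche-space distributions and H^T H as the site-similarity matrix. Those two quantities come from factoring a nonnegative function-by-site matrix V as V ≈ WH. The sketch below uses the Lee-Seung multiplicative updates as one common way to fit NMF, in plain Python. Whether Jiang et al. used this algorithm, the matrix orientation, any normalization of H, and the spectral reordering step are all assumptions not shown here, and the toy matrix is invented.

```python
import random


def matmul(X, Y):
    """Matrix product of two nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def nmf(V, k, iters=200, seed=0, eps=1e-12):
    """Approximate a nonnegative V (functions x sites) as W (functions x k) @ H (k x sites).

    Lee-Seung multiplicative updates keep W and H nonnegative and do not
    increase the squared reconstruction error.
    """
    rng = random.Random(seed)
    W = [[rng.random() for _ in range(k)] for _ in V]
    H = [[rng.random() for _ in V[0]] for _ in range(k)]
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * n / (d + eps) for h, n, d in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * n / (d + eps) for w, n, d in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, H


def site_similarity(H):
    """H^T H: pairwise similarity of sites based on their component loadings."""
    return matmul(transpose(H), H)


def reconstruction_error(V, W, H):
    """Frobenius norm of V - WH."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for vr, xr in zip(V, WH) for v, x in zip(vr, xr)) ** 0.5


# Hypothetical "functions x sites" matrix built from two nonnegative components.
W_true = [[1, 0], [0, 1], [1, 1], [2, 0]]
H_true = [[1, 2, 0, 1, 3], [0, 1, 2, 1, 0]]
V = matmul(W_true, H_true)

W, H = nmf(V, k=2)
S = site_similarity(H)                 # 5 x 5 site-by-site matrix
print(round(reconstruction_error(V, W, H), 4))
```

Each column of H gives one site's loadings on the k components, which is the niche-space view. S compares sites through those loadings rather than through the raw functional profiles.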
We can also examine patterns of similarity between the components themselves,using niche-site distributions or functional profiles (see Figure 5 in Text S1). All 5
Functional biogeography of ocean microbes revealed through non-negative matrix factorization. Jiang et al. In press, PLoS ONE. Comes out 9/18.
w/ Weitz, Dushoff, Langille, Neches, Levin, etc
More Markers
Phylogenetic group | Genome number | Gene number | Marker candidates
Archaea | 62 | 145415 | 106
Actinobacteria | 63 | 267783 | 136
Alphaproteobacteria | 94 | 347287 | 121
Betaproteobacteria | 56 | 266362 | 311
Gammaproteobacteria | 126 | 483632 | 118
Deltaproteobacteria | 25 | 102115 | 206
Epsilonproteobacteria | 18 | 33416 | 455
Bacteroidetes | 25 | 71531 | 286
Chlamydiae | 13 | 13823 | 560
Chloroflexi | 10 | 33577 | 323
Cyanobacteria | 36 | 124080 | 590
Firmicutes | 106 | 312309 | 87
Spirochaetes | 18 | 38832 | 176
Thermi | 5 | 14160 | 974
Thermotogae | 9 | 17037 | 684
Better Reference Tree
Morgan et al. submitted
Sifting Families
[Figure 1 workflow: Representative Genomes → Extract Protein Annotation → All v. All BLAST → Homology Clustering (MCL) → SFams → Align & Build HMMs → HMMs → Screen for Homologs ← Extract Protein Annotation ← New Genomes]
Figure 1, Sharpton et al. submitted
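The "Homology Clustering (MCL)" step in this workflow groups proteins using the graph of all-vs-all BLAST hits. MCL itself is more involved. As a deliberately simpler stand-in, not MCL, the sketch below groups proteins into connected components of the graph of significant hits. This single-linkage grouping tends to merge families more aggressively than MCL. The e-value cutoff and the `(query, subject, evalue)` hit format are assumptions.

```python
from collections import defaultdict


def homology_clusters(proteins, hits, max_evalue=1e-5):
    """Group proteins into families as connected components of the hit graph.

    hits: iterable of (query, subject, evalue) tuples from an all-vs-all search.
    """
    graph = defaultdict(set)
    for query, subject, evalue in hits:
        if evalue <= max_evalue and query != subject:   # keep significant, non-self hits
            graph[query].add(subject)
            graph[subject].add(query)

    seen, families = set(), []
    for protein in proteins:
        if protein in seen:
            continue
        seen.add(protein)
        stack, family = [protein], []
        while stack:                                    # depth-first walk of one component
            current = stack.pop()
            family.append(current)
            for neighbor in graph[current]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        families.append(sorted(family))
    return families


proteins = ["p1", "p2", "p3", "p4"]
hits = [("p1", "p2", 1e-30), ("p2", "p3", 1e-10), ("p3", "p4", 0.1)]
print(homology_clusters(proteins, hits))  # [['p1', 'p2', 'p3'], ['p4']]
```

The weak p3-p4 hit falls above the cutoff, so p4 forms its own family. Chaining through p2 still links p1 and p3 even though they share no direct hit.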
Zorro - Automated Masking
[Plot: distance to true tree (y-axis, 0.0 to 9.0) versus sequence length (x-axis, 200 to 3200), comparing no masking, Zorro and Gblocks]
Wu M, Chatterji S, Eisen JA (2012) Accounting For Alignment Uncertainty in Phylogenomics. PLoS ONE 7(1): e30288. doi:10.1371/journal.pone.0030288
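Zorro weights or masks columns using estimated alignment confidence. The sketch below is a different and much simpler rule, shown only for contrast: it drops columns whose fraction of gap characters exceeds a cutoff, similar in spirit to the "masks highly gapped alignment columns" step in the PhylOTU excerpt. The function name, the 0.5 cutoff and the gap characters are assumptions. This is not Zorro's scoring.

```python
def mask_gappy_columns(alignment, max_gap_fraction=0.5, gap_chars="-."):
    """Drop alignment columns whose fraction of gap characters exceeds the cutoff.

    Returns the masked sequences and the indices of the columns that were kept.
    """
    n_seqs = len(alignment)
    width = len(alignment[0])
    kept = [
        col for col in range(width)
        if sum(seq[col] in gap_chars for seq in alignment) / n_seqs <= max_gap_fraction
    ]
    masked = ["".join(seq[col] for col in kept) for seq in alignment]
    return masked, kept


aln = ["AC-GT", "A--GT", "ACTG-"]
print(mask_gappy_columns(aln))  # (['ACGT', 'A-GT', 'ACG-'], [0, 1, 3, 4])
```

Column 2 is gapped in two of three sequences, so it is removed. Confidence-based methods such as Zorro can instead keep or down-weight such columns based on how reliably they are aligned.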
Phylogenomics Future 2
• We have still only scratched the surface of microbial diversity
rRNA Tree of Life
Figure from Barton, Eisen et al. “Evolution”, CSHL Press. 2007.
Based on tree from Pace 1997 Science 276:734-740
Archaea
Eukaryotes
Bacteria
PD: Genomes
From Wu et al. 2009 Nature 462, 1056-1060
PD: Genomes + GEBA
From Wu et al. 2009 Nature 462, 1056-1060
PD: Isolates
From Wu et al. 2009 Nature 462, 1056-1060
PD: All
From Wu et al. 2009 Nature 462, 1056-1060
Uncultured Lineages: Methods
• Get into culture
• Enrichment cultures
• If abundant in low diversity ecosystems
• Flow sorting
• Microbeads
• Microfluidic sorting
• Single cell amplification
Number of SAGs from Candidate Phyla
Site | OD1 | OP11 | OP3 | SAR406
Site A: Hydrothermal vent | 4 | 1 | - | -
Site B: Gold Mine | 6 | 13 | 2 | -
Site C: Tropical gyres (Mesopelagic) | - | - | - | 2
Site D: Tropical gyres (Photic zone) | 1 | - | - | -
Sample collections at 4 additional sites are underway.
Phil Hugenholtz
GEBA Uncultured
Uncultured Eukaryotes?
Phylogenomics Future 3
• Need Experiments from Across the Tree of Life too
• At least 40 phyla of bacteria
As of 2002
Tree Based on Hugenholtz, 2002. http://genomebiology.com/2002/3/2/reviews/0003
• At least 40 phyla of bacteria
• Experimental studies are mostly from three phyla
As of 2002
Tree Based on Hugenholtz, 2002. http://genomebiology.com/2002/3/2/reviews/0003
• At least 40 phyla of bacteria
• Experimental studies are mostly from three phyla
• Some studies in other phyla
As of 2002
Tree Based on Hugenholtz, 2002. http://genomebiology.com/2002/3/2/reviews/0003
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Same trend in Eukaryotes
As of 2002
Tree Based on Hugenholtz, 2002. http://genomebiology.com/2002/3/2/reviews/0003
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Same trend in Viruses
As of 2002
Tree Based on Hugenholtz, 2002. http://genomebiology.com/2002/3/2/reviews/0003
Tree based on Hugenholtz (2002) with some modifications.
Need experimental studies from across the tree too
Tree Based on Hugenholtz, 2002. http://genomebiology.com/2002/3/2/reviews/0003
Tree based on Hugenholtz (2002) with some modifications.
Adopt a Microbe
Tree Based on Hugenholtz, 2002. http://genomebiology.com/2002/3/2/reviews/0003
What Next?
Acknowledgements
• GEBA:
  • $$: DOE-JGI, DSMZ
  • Eddy Rubin, Phil Hugenholtz, Hans-Peter Klenk, Nikos Kyrpides, Tanya Woyke, Dongying Wu, Aaron Darling, Jenna Lang
• GEBA Cyanobacteria
  • $$: DOE-JGI
  • Cheryl Kerfeld, Dongying Wu, Patrick Shih
• Haloarchaea
  • $$$ NSF
  • Marc Facciotti, Aaron Darling, Erin Lynch
• iSEEM:
  • $$: GBMF
  • Katie Pollard, Jessica Green, Martin Wu, Steven Kembel, Tom Sharpton, Morgan Langille, Guillaume Jospin, Dongying Wu
• aTOL
  • $$: NSF
  • Naomi Ward, Jonathan Badger, Frank Robb, Martin Wu, Dongying Wu
• Others (not mentioned in detail)
  • $$: NSF, NIH, DOE, GBMF, DARPA, Sloan
  • Frank Robb, Craig Venter, Doug Rusch, Shibu Yooseph, Nancy Moran, Colleen Cavanaugh, Josh Weitz
• EisenLab: Srijak Bhatnagar, Russell Neches, Lizzy Wilbanks, Holly Bik