The Need for a Phylogeny-Driven Genomic Encyclopedia of Eukaryotes Jonathan A. Eisen @phylogenomics University of California, Davis Talk for SMBE-EUKS Monday, April 29, 13

The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks


DESCRIPTION

Talk by Jonathan Eisen for #SMBEEuks


Page 1: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

The Need for a Phylogeny-Driven Genomic Encyclopedia of Eukaryotes

Jonathan A. Eisen @phylogenomics

University of California, Davis

Talk for SMBE-EUKS, Monday, April 29, 13

Page 2: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

I: The Problem


Page 3: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

Googling Sequenced Eukaryotic Genomes


Page 4: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

Wikipedia On Sequenced Euks


Page 5: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

More from Wikipedia


Page 6: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

Better Source: GOLD

http://www.genomesonline.org/cgi-bin/GOLD/index.cgi


Page 7: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

GOLD by Taxonomy

http://www.genomesonline.org/cgi-bin/GOLD/index.cgi


Page 8: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

GOLD: Euks by Phylum

GOLD (Version 4.0), phylogenetic distribution page, captured 4/28/13 9:20 AM
http://www.genomesonline.org/cgi-bin/GOLD/phylogenetic_distribution.cgi

Archaeal Phylum Distribution

Phylum                   Count  Percent
Korarchaeota                 1        0
Nanoarchaeota                2        0
Thaumarchaeota              30        5
Crenarchaeota              142       25
Euryarchaeota              356       64
Unclassified                28        5

Bacterial Phylum Distribution

Phylum                   Count  Percent
Caldiserica                  1        0
Nitrospinae                  1        0
Crenarchaeota                2        0
Chrysiogenetes               2        0
Dictyoglomi                  2        0
Fibrobacteres                2        0
Armatimonadetes              3        0
Elusimicrobia                3        0
Lentisphaerae                3        0
Poribacteria                 4        0
Gemmatimonadetes             6        0
Thermodesulfobacteria        7        0
Ignavibacteria               8        0
Deferribacteres             10        0
Chlorobi                    14        0
Synergistetes               21        0
Euryarchaeota               23        0
Nitrospirae                 24        0
Aquificae                   24        0
Acidobacteria               30        0
Verrucomicrobia             41        0
Planctomycetes              42        0
Thermotogae                 50        0
Chloroflexi                 51        0
Fusobacteria                80        0
Deinococcus-Thermus         92        0
Chlamydiae                 207        1
Cyanobacteria              245        1
Tenericutes                251        1
Spirochaetes               472        2
Bacteroidetes              762        4
Actinobacteria           2,065       10
Firmicutes               5,342       26
Proteobacteria          10,088       50
Unclassified                17        0

Eukaryotic Phylum Distribution

Phylum                   Count  Percent
Phaeophyceae                 1        0
Priapulida                   1        0
Rotifera                     1        0
Hemichordata                 1        0
Pinguiophyceae               1        0
Ctenophora                   1        0
Bolidophyceae                1        0
Chaetognatha                 1        0
Porifera                     2        0
Xanthophyceae                2        0
Tardigrada                   2        0
Euglenida                    2        0
Chromerida                   3        0
Placozoa                     3        0
Glomeromycota                3        0
Cryptomycota                 4        0
Blastocladiomycota           5        0
Echinodermata                6        0
Entomophthoromycota          9        0
Chytridiomycota             12        0
Neocallimastigomycota       12        0
Annelida                    13        0
Eustigmatophyceae           13        0
Cnidaria                    18        0
Bacillariophyta             21        0
Platyhelminthes             23        0
Mollusca                    25        0
Microsporidia               31        1
Chlorophyta                 77        1
Nematoda                   110        2
Apicomplexa                264        5
Arthropoda                 370        7
Chordata                   626       12
Streptophyta               796       15
Basidiomycota              976       18
Ascomycota               1,251       23
Unclassified               704       13

PHYLOGENETIC DISTRIBUTION

Domain    Total  Phylum  Class    Order     Family    Genus       Species
ARCHAEA   559    5/5     10/9     18/18     30/29     103/118     340/673
BACTERIA  20318  35/31   59/52    124/118   280/298   1368/2106   6352/11424
EUKARYA   5391   36/56   107/182  330/1037  689/6689  1170/54319  1769/218222

NUMBER EXPLANATION: Number of classifieds subdivisions with genome projects over number of the classified subdivisions of this phylogenetic group.
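The ratios in the PHYLOGENETIC DISTRIBUTION summary are easier to compare as fractions. The following is a minimal Python sketch, not part of the original slides, that transcribes the three summary rows and computes the share of classified taxa with at least one genome project. The names `GOLD_SUMMARY`, `coverage`, and `fractions` are illustrative. Ratios above 1 (for example Archaea, Class 10/9) appear in the source page and are kept as-is here, not corrected.

```python
# Rows transcribed from the GOLD "PHYLOGENETIC DISTRIBUTION" summary (4/28/13).
# Each pair is (taxa with genome projects, classified taxa in the group).
GOLD_SUMMARY = {
    "ARCHAEA": {"Phylum": (5, 5), "Class": (10, 9), "Order": (18, 18),
                "Family": (30, 29), "Genus": (103, 118), "Species": (340, 673)},
    "BACTERIA": {"Phylum": (35, 31), "Class": (59, 52), "Order": (124, 118),
                 "Family": (280, 298), "Genus": (1368, 2106),
                 "Species": (6352, 11424)},
    "EUKARYA": {"Phylum": (36, 56), "Class": (107, 182), "Order": (330, 1037),
                "Family": (689, 6689), "Genus": (1170, 54319),
                "Species": (1769, 218222)},
}


def coverage(summary):
    """Return {domain: {rank: sampled / classified}} for every rank."""
    return {
        domain: {rank: sampled / total for rank, (sampled, total) in ranks.items()}
        for domain, ranks in summary.items()
    }


fractions = coverage(GOLD_SUMMARY)

# Print one line per domain, with each rank shown as a percentage.
for domain, ranks in fractions.items():
    cells = ", ".join(f"{rank} {value:.1%}" for rank, value in ranks.items())
    print(f"{domain}: {cells}")
```

On these numbers, eukaryotic species with a genome project come to under 1% of classified species (1769/218222), compared with roughly half for Archaea and Bacteria.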


http://www.genomesonline.org/cgi-bin/GOLD/index.cgi


Page 9: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

GOLD: Euks by Phylum

Phylum                   Count  Percent
Priapulida                   1        0
Phaeophyceae                 1        0
Rotifera                     1        0
Hemichordata                 1        0
Pinguiophyceae               1        0
Ctenophora                   1        0
Bolidophyceae                1        0
Chaetognatha                 1        0
Porifera                     2        0
Xanthophyceae                2        0
Tardigrada                   2        0
Euglenida                    2        0
Chromerida                   3        0
Placozoa                     3        0
Glomeromycota                3        0
Cryptomycota                 4        0
Blastocladiomycota           5        0
Echinodermata                6        0
Entomophthoromycota          9        0
Chytridiomycota             12        0
Neocallimastigomycota       12        0
Annelida                    13        0
Eustigmatophyceae           13        0
Cnidaria                    18        0
Bacillariophyta             21        0
Platyhelminthes             23        0
Mollusca                    25        0
Microsporidia               31        1
Chlorophyta                 77        1
Nematoda                   110        2
Apicomplexa                264        5
Arthropoda                 370        7
Chordata                   626       12
Streptophyta               796       15
Basidiomycota              976       18
Ascomycota               1,251       23
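How uneven this sampling is can be put into one number. The sketch below is an illustration added here, not something shown in the talk: it transcribes the per-phylum counts from this slide and computes what fraction of the counted projects fall in the largest few phyla. The names `EUK_PHYLUM_COUNTS` and `top_share` are hypothetical.

```python
# Genome project counts per eukaryotic phylum, transcribed from the slide.
# The "Unclassified" row of the GOLD page is not part of this list.
EUK_PHYLUM_COUNTS = {
    "Priapulida": 1, "Phaeophyceae": 1, "Rotifera": 1, "Hemichordata": 1,
    "Pinguiophyceae": 1, "Ctenophora": 1, "Bolidophyceae": 1,
    "Chaetognatha": 1, "Porifera": 2, "Xanthophyceae": 2, "Tardigrada": 2,
    "Euglenida": 2, "Chromerida": 3, "Placozoa": 3, "Glomeromycota": 3,
    "Cryptomycota": 4, "Blastocladiomycota": 5, "Echinodermata": 6,
    "Entomophthoromycota": 9, "Chytridiomycota": 12,
    "Neocallimastigomycota": 12, "Annelida": 13, "Eustigmatophyceae": 13,
    "Cnidaria": 18, "Bacillariophyta": 21, "Platyhelminthes": 23,
    "Mollusca": 25, "Microsporidia": 31, "Chlorophyta": 77, "Nematoda": 110,
    "Apicomplexa": 264, "Arthropoda": 370, "Chordata": 626,
    "Streptophyta": 796, "Basidiomycota": 976, "Ascomycota": 1251,
}


def top_share(counts, n):
    """Fraction of all counted projects that fall in the n largest groups."""
    ranked = sorted(counts.values(), reverse=True)
    return sum(ranked[:n]) / sum(ranked)


total = sum(EUK_PHYLUM_COUNTS.values())
print(f"classified projects: {total}")
print(f"share in the 5 largest phyla: {top_share(EUK_PHYLUM_COUNTS, 5):.1%}")
```

With these counts the five largest phyla (Ascomycota, Basidiomycota, Streptophyta, Chordata, Arthropoda) hold about 86% of the classified projects.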


Page 10: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

Euks More Resolution

[Figure: 18S rDNA phylogenetic tree, scale bar 0.2, with support values at the nodes. Tips include named species (among them Collodictyon triciliatum, Diphylleia rotans, Saccharomyces cerevisiae, Naegleria gruberi, Emiliania huxleyi, Trypanosoma cruzi, and Volvox carteri) and uncultured environmental clones. Labeled clades: SAR, Excavata, Diphyllatia, Amoebozoa, Opisthokonta, Haptophyta, Telonemia, Apusozoa, Centrohelida, Cryptophyta, Rhodophyta, Glaucophyta, Viridiplantae.]

FIG. 1. 18S rDNA phylogeny of the Diphyllatia species Collodictyon triciliatum (highlighted by black box) and Diphylleia rotans. The topology was reconstructed by MrBayes v3.1.2 under the GTR + GAMMA + I + covarion model. Posterior probabilities (PP) and ML bootstrap supports (BP, inferred by RAxML v7.1.2 under GTR + GAMMA + I model) are shown at the nodes. Thick lines indicate PP > 0.90 and BP > 80%. Dashes "-" indicate PP < 0.5 or BP < 50%. A few long branches are shortened by 50% (/) or 75% (//).
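The node labels in this figure pair a posterior probability with a bootstrap value, as in "1.0/77" or "0.89/-". As a minimal sketch, assuming the caption's thresholds read as PP > 0.90 and BP > 80% (the comparison symbols are garbled in the extracted text), such labels could be parsed and classified as below. The function names are hypothetical and are not taken from the paper or the talk.

```python
from typing import Optional, Tuple


def parse_support(label: str) -> Tuple[Optional[float], Optional[float]]:
    """Split a 'PP/BP' node label; a dash means the value is not reported."""
    pp_text, bp_text = label.strip().split("/")
    pp = None if pp_text == "-" else float(pp_text)
    bp = None if bp_text == "-" else float(bp_text)
    return pp, bp


def is_strongly_supported(label: str) -> bool:
    """True when both values clear the assumed thick-line thresholds."""
    pp, bp = parse_support(label)
    return pp is not None and bp is not None and pp > 0.90 and bp > 80


# The first three labels occur in the figure; the last one is made up.
for label in ["1.0/78", "-/84", "0.89/-", "0.95/90"]:
    print(label, parse_support(label), is_strongly_supported(label))
```

Under that reading, a node labeled "1.0/78" would not be drawn as a thick line, because its bootstrap value does not exceed 80%.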

Zhao et al. · doi:10.1093/molbev/mss001 MBE


Collodictyon—An Ancient Lineage in the Tree of Eukaryotes

Sen Zhao,†,1 Fabien Burki,†,2 Jon Brate,1 Patrick J. Keeling,2 Dag Klaveness,1 and Kamran Shalchian-Tabrizi*,1

1 Microbial Evolution Research Group, Department of Biology, University of Oslo, Oslo, Norway
2 Canadian Institute for Advanced Research, Botany Department, University of British Columbia, Vancouver, British Columbia, Canada

†These authors contributed equally to this work.

*Corresponding author: E-mail: [email protected].

Associate editor: Herve Philippe

Abstract

The current consensus for the eukaryote tree of life consists of several large assemblages (supergroups) that are hypothesized to describe the existing diversity. Phylogenomic analyses have shed light on the evolutionary relationships within and between supergroups as well as placed newly sequenced enigmatic species close to known lineages. Yet, a few eukaryote species remain of unknown origin and could represent key evolutionary forms for inferring ancient genomic and cellular characteristics of eukaryotes. Here, we investigate the evolutionary origin of the poorly studied protist Collodictyon (subphylum Diphyllatia) by sequencing a cDNA library as well as the 18S and 28S ribosomal DNA (rDNA) genes. Phylogenomic trees inferred from 124 genes placed Collodictyon close to the bifurcation of the "unikont" and "bikont" groups, either alone or as sister to the potentially contentious excavate Malawimonas. Phylogenies based on rDNA genes confirmed that Collodictyon is closely related to another genus, Diphylleia, and revealed a very low diversity in environmental DNA samples. The early and distinct origin of Collodictyon suggests that it constitutes a new lineage in the global eukaryote phylogeny. Collodictyon shares cellular characteristics with Excavata and Amoebozoa, such as ventral feeding groove supported by microtubular structures and the ability to form thin and broad pseudopods. These may therefore be ancient morphological features among eukaryotes. Overall, this shows that Collodictyon is a key lineage to understand early eukaryote evolution.

Key words: 18S and 28S rDNA, Collodictyon, Diphyllatia, tree of life, phylogenomics, cDNA, pyrosequencing.

Introduction

Over the last few years, molecular sequence data have addressed some of the most intriguing questions about the eukaryote tree of life. Phylogenomic analyses have confirmed the existence of several major eukaryote groups (supergroups) as well as shown various levels of evidences for the relationships among them (Burki et al. 2007; Parfrey et al. 2010). Recently, two new large assemblages, SAR (Stramenopila, Alveolata, and Rhizaria) and CCTH (Cryptophyta, Centrohelida, Telonemia, and Haptophyta), were proposed to encompass a large fraction of the eukaryote diversity, together with the other supergroups Opisthokonta, Amoebozoa, Archaeplastida, and Excavata (Patron et al. 2007; Burki et al. 2009). Solid phylogenomic evidence supports the monophyly of Amoebozoa, Opisthokonta, Archaeplastida, and SAR (Rodriguez-Ezpeleta et al. 2007; Burki et al. 2009; Minge et al. 2009), but the monophyly of Excavata and CCTH (also called Hacrobia; Okamoto et al. 2009) remains controversial, often dependent on the selection of taxa and gene data set (Burki et al. 2009; Hampl et al. 2009; Baurain et al. 2010). Despite several attempts, the evolutionary relationships between these supergroups are still uncertain because of the ancient and complex genome histories (Simpson and Roger 2004; Parfrey et al. 2006; Roger and Simpson 2009).

Identification of sister lineages to these supergroups is crucial for resolving the eukaryote tree and understanding the early history of eukaryotes. If these key lineages exist, they may be found among the few species that harbor distinct morphological features but are of unknown evolutionary origin in single-gene phylogenies (Patterson 1999; Shalchian-Tabrizi et al. 2006; Kim et al. 2011). Indications that such enigmatic species can be placed in the eukaryote tree come from recent phylogenomic analyses. For instance, Ministeria (Opisthokonta), Breviata (Amoebozoa) and Telonemia, Centroheliozoa, and Picobiliphyta have been shown to constitute deep lineages within their respective supergroups (Shalchian-Tabrizi, Minge, et al. 2008; Burki et al. 2009; Minge et al. 2009; Yoon et al. 2011).

Here, we investigate a member of such a key lineage, Collodictyon, which was first described in 1865 (Carter 1865), but its cellular structure and outer morphology were analyzed only recently (Klaveness 1995; Brugerolle et al. 2002). Collodictyon was originally proposed to be closely related to Diphylleia and Sulcomonas and classified in the family Diphylleidae (Cavalier-Smith 1993; the synonymous family

© The Author(s) 2012. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Mol. Biol. Evol. 29(6):1557–1568. 2012 doi:10.1093/molbev/mss001 Advance Access publication January 6, 2012


http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3351787/


Page 11: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

2010 PARFREY ET AL.—BROADLY SAMPLED TREE OF EUKARYOTIC LIFE 523

FIGURE 1. Most likely eukaryotic tree of life reconstructed using all 451 taxa and all 16 genes (SSU-rDNA plus 15 protein genes). Major nodes in this topology are robust to analyses of subsets of taxa and genes, which include varying levels of missing data (Table 1). Clades in bold are monophyletic in analyses with 2 or more members except in all:15 in which taxa represented by a single gene were sometimes misplaced. Numbers in boxes represent support at key nodes in analyses with increasing amounts of missing data (10:16, 6:16, 4:16, and all:16 analyses; see Table 1 for more details). Given uncertainties around the root of the eukaryotic tree of life (see text), we have chosen to draw the tree rooted with the well-supported clade Opisthokonta. Dashed line indicates alternate branching pattern seen for Amoebozoa in other analyses. Long branches, indicated by //, have been reduced by half. The 6 lineages labeled by * represent taxa that are misplaced, probably due to LBA, listed from top to bottom with expected clade in parentheses. These are Protoopalina japonica (Stramenopiles), Aggregata octopiana (Apicomplexa), Mikrocytos mackini (Haplosporidia), Centropyxis laevigata (Tubulinea), Marteilioides chungmuensis (unplaced), and Cochliopodium spiniferum (Amoebozoa).


Syst. Biol. 59(5):518–533, 2010
© The Author(s) 2010. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved.
For Permissions, please email: [email protected]
DOI:10.1093/sysbio/syq037
Advance Access publication on July 23, 2010

Broadly Sampled Multigene Analyses Yield a Well-Resolved Eukaryotic Tree of Life

LAURA WEGENER PARFREY1, JESSICA GRANT2, YONAS I. TEKLE2,6, ERICA LASEK-NESSELQUIST3,4, HILARY G. MORRISON3, MITCHELL L. SOGIN3, DAVID J. PATTERSON5, AND LAURA A. KATZ1,2,*

1 Program in Organismic and Evolutionary Biology, University of Massachusetts, 611 North Pleasant Street, Amherst, MA 01003, USA
2 Department of Biological Sciences, Smith College, 44 College Lane, Northampton, MA 01063, USA
3 Bay Paul Center for Comparative Molecular Biology and Evolution, Marine Biological Laboratory, 7 MBL Street, Woods Hole, MA 02543, USA
4 Department of Ecology and Evolutionary Biology, Brown University, 80 Waterman Street, Providence, RI 02912, USA
5 Biodiversity Informatics Group, Marine Biological Laboratory, 7 MBL Street, Woods Hole, MA 02543, USA
6 Present address: Department of Epidemiology and Public Health, Yale University School of Medicine, New Haven, CT 06520, USA

*Correspondence to be sent to: Laura A. Katz, 44 College Lane, Northampton, MA 01003, USA; E-mail: [email protected]
Laura Wegener Parfrey and Jessica Grant have contributed equally to this work.

Received 30 September 2009; reviews returned 1 December 2009; accepted 25 May 2010
Associate Editor: Cecile Ane

Abstract.—An accurate reconstruction of the eukaryotic tree of life is essential to identify the innovations underlying the diversity of microbial and macroscopic (e.g., plants and animals) eukaryotes. Previous work has divided eukaryotic diversity into a small number of high-level “supergroups,” many of which receive strong support in phylogenomic analyses. However, the abundance of data in phylogenomic analyses can lead to highly supported but incorrect relationships due to systematic phylogenetic error. Furthermore, the paucity of major eukaryotic lineages (19 or fewer) included in these genomic studies may exaggerate systematic error and reduce power to evaluate hypotheses. Here, we use a taxon-rich strategy to assess eukaryotic relationships. We show that analyses emphasizing broad taxonomic sampling (up to 451 taxa representing 72 major lineages) combined with a moderate number of genes yield a well-resolved eukaryotic tree of life. The consistency across analyses with varying numbers of taxa (88–451) and levels of missing data (17–69%) supports the accuracy of the resulting topologies. The resulting stable topology emerges without the removal of rapidly evolving genes or taxa, a practice common to phylogenomic analyses. Several major groups are stable and strongly supported in these analyses (e.g., SAR, Rhizaria, Excavata), whereas the proposed supergroup “Chromalveolata” is rejected. Furthermore, extensive instability among photosynthetic lineages suggests the presence of systematic biases including endosymbiotic gene transfer from symbiont (nucleus or plastid) to host. Our analyses demonstrate that stable topologies of ancient evolutionary relationships can be achieved with broad taxonomic sampling and a moderate number of genes. Finally, taxon-rich analyses such as presented here provide a method for testing the accuracy of relationships that receive high bootstrap support (BS) in phylogenomic analyses and enable placement of the multitude of lineages that lack genome scale data. [Excavata; microbial eukaryotes; Rhizaria; supergroups; systematic error; taxon sampling.]

Perspectives on the structure of the eukaryotic tree of life have shifted in the past decade as molecular analyses provide hypotheses for relationships among the approximately 75 robust lineages of eukaryotes. These lineages are defined by ultrastructural identities (Patterson 1999)—patterns of cellular and subcellular organization revealed by electron microscopy—and are strongly supported in molecular analyses (Parfrey et al. 2006; Yoon et al. 2008). Most of these lineages now fall within a small number of higher level clades, the supergroups of eukaryotes (Simpson and Roger 2004; Adl et al. 2005; Keeling et al. 2005). Several of these clades—Opisthokonta, Rhizaria, and Amoebozoa—are increasingly well supported by phylogenomic (Rodríguez-Ezpeleta et al. 2007a; Burki et al. 2008; Hampl et al. 2009) and phylogenetic (Parfrey et al. 2006; Pawlowski and Burki 2009), analyses, whereas support for “Archaeplastida” predominantly comes from some phylogenomic studies (Rodríguez-Ezpeleta et al. 2005; Burki et al. 2007) or analyses of plastid genes (Yoon et al. 2002; Parfrey et al. 2006). In contrast, support for “Chromalveolata” and Excavata is mixed, often dependent on the selection of taxa included in analyses (Rodríguez-Ezpeleta et al. 2005; Parfrey et al. 2006; Rodríguez-Ezpeleta et al. 2007a; Burki et al. 2008; Hampl et al. 2009). We use quotation marks throughout to note groups where uncertainties remain. Moreover, it is difficult to evaluate the overall stability of major clades of eukaryotes because phylogenomic analyses have 19 or fewer of the major lineages and hence do not sufficiently sample eukaryotic diversity (Rodríguez-Ezpeleta et al. 2007b; Burki et al. 2008; Hampl et al. 2009), whereas taxon-rich analyses with 4 or fewer genes yield topologies with poor support at deep nodes (Cavalier-Smith 2004; Parfrey et al. 2006; Yoon et al. 2008).

Estimating the relationships of the major lineages of eukaryotes is difficult because of both the ancient age of eukaryotes (1.2–1.8 billion years; Knoll et al. 2006) and complex gene histories that include heterogeneous rates of molecular evolution and paralogy (Maddison 1997; Gribaldo and Philippe 2002; Tekle et al. 2009). A further issue obscuring eukaryotic relationships is the chimeric nature of the eukaryotic genome—not all genes are vertically inherited due to lateral gene transfer (LGT) and endosymbiotic gene transfer (EGT)—that can also mislead efforts to reconstruct phylogenetic relationships (Andersson 2005; Rannala and Yang 2008; Tekle et al. 2009). This is especially true among photosynthetic lineages that comprise “Chromalveolata” and “Archaeplastida” where a large portion of the host genome (approximately 8–18%) is


http://sysbio.oxfordjournals.org/content/59/5/518.full

Euks More Resolution


Page 12: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks


530 SYSTEMATIC BIOLOGY VOL. 59

a 97-taxon data set of Rhizaria that included all lineages with previously published data plus additional multigene data for 12 taxa added for this study (Table S1). Three major clades are strongly supported, though the relationships among them are unresolved: (i) Cercozoa, (ii) Foraminifera plus Polycystinea and Acantharea (formerly classified with Phaeodarea as radiolarians), and (iii) the parasitic Haplosporidia and Plasmodiophorida with Gromia and vampyrellids (Fig. 3; Bass et al. 2009). We show that Theratromyxa, a nematode-eating soil amoeba, is related to vampyrellid amoebae (Fig. 3; 100% BS), and together they are sister to the plant parasites plasmodiophorids (100% BS). The SSU-rDNA sequence for Theratromyxa is identical to an amoeba isolated from Siberia where it was identified as Arachnula impatiens (EU567294; Bass et al. 2009).

The topology within the Excavata is consistent with previous hypotheses and clades with ultrastructural identities (Simpson 2003; Fig. 4), when contaminant EST data originally mislabeled as Streblomastix strix are excluded (Slamovits and Keeling 2006). Excavata is often polyphyletic in other analyses because Malawimonas branches outside the other clades of Excavata (Rodríguez-Ezpeleta et al. 2007a; Hampl et al. 2009), whereas in analyses of fewer genes Excavata members fall into 2 or 3 clades (Parfrey et al. 2006; Simpson et al. 2006). Although Malawimonas nests robustly within Excavata in our analyses, it does not have a stable sister group and may represent an independent lineage (Fig. 4). Our analyses confirm that Stephanopogon (unplaced in Patterson 1999) branches within Heterolobosea (Cavalier-Smith and Nikolaev 2008; Yubuki and Leander 2008) and suggests that another enigmatic flagellate, ATCC 50646 (tentatively named Soginia anisocystis) is a basal member of Heterolobosea.

CONCLUSIONS

The robust tree of life emerging from this study demonstrates the benefits of improved taxon sampling for reconstructing deep phylogeny as our analyses produce stable topologies that include a broad representation of eukaryotes. The current study, combined with insights from other studies referenced herein, has refined the eukaryotic tree of life from over 70 major lineages (Patterson 1999) to ~16 major groups (Fig. 5, http://eutree.lifedesks.org/). Most significantly, we attribute the stability of major clades (e.g., Excavata, Amoebozoa, Opisthokonta, and SAR) to broader taxonomic sampling combined with analyses of sufficient characters (16 genes or 6578 characters). In our view, inclusion of more taxa coupled with carefully chosen genes is necessary to further resolve the 16 or so major lineages of microbial eukaryotes for which sister group relationships remain uncertain.

SUPPLEMENTARY MATERIAL

Supplementary material can be found at http://www.sysbio.oxfordjournals.org/.

FIGURE 5. Summary of major findings: the evolutionary relationships among major lineages of eukaryotes. Clades have been collapsed into those that we view to be strongly supported. The many polytomies represent uncertainties that remain.

FUNDING

This work was made possible by the US National Science Foundation Assembling the Tree of Life grant to L.A.K. and D.J.P. (043115) and US National Institutes of Health 5R01AI058054-05 to M.L.S. Funding to collect Foraminifera was provided by a Society of Systematic Biologists MiniPEET grant to L.W.P.

ACKNOWLEDGMENTS

We are grateful to Robert Molestina at ATCC who provided DNAs through a collaborative National Science Foundation grant. We acknowledge the assistance of Kasia Hammar, Leslie Murphy, and Jillian Ward in preparation and sequencing of EST libraries. Our manuscript was improved following detailed comments from the editors, Alastair Simpson, and 1 anonymous reviewer. Thanks to David Hillis for conversations on


http://sysbio.oxfordjournals.org/content/59/5/518.full

Euks More Resolution but Simpler


Page 13: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

Mapping GOLD to Tree

Phylum Count Percent
Priapulida 1 0
Phaeophyceae 1 0
Rotifera 1 0
Hemichordata 1 0
Pinguiophyceae 1 0
Ctenophora 1 0
Bolidophyceae 1 0
Chaetognatha 1 0
Porifera 2 0
Xanthophyceae 2 0
Tardigrada 2 0
Euglenida 2 0
Chromerida 3 0
Placozoa 3 0
Glomeromycota 3 0
Cryptomycota 4 0
Blastocladiomycota 5 0
Echinodermata 6 0
Entomophthoromycota 9 0
Chytridiomycota 12 0
Neocallimastigomycota 12 0
Annelida 13 0
Eustigmatophyceae 13 0
Cnidaria 18 0
Bacillariophyta 21 0
Platyhelminthes 23 0
Mollusca 25 0
Microsporidia 31 1
Chlorophyta 77 1
Nematoda 110 2
Apicomplexa 264 5
Arthropoda 370 7
Chordata 626 12
Streptophyta 796 15
Basidiomycota 976 18
Ascomycota 1,251 23


Page 14: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks


Mapping GOLD to Tree


Page 15: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks


Fungi 49%

Mapping GOLD to Tree


Page 16: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks


Mapping GOLD to Tree


Page 17: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks


Mapping GOLD to Tree


Page 18: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

Priapulida 1 0Phaeophyceae 1 0Rotifera 1 0Hemichordata 1 0Pinguiophyceae 1 0Ctenophora 1 0Bolidophyceae 1 0Chaetognatha 1 0Porifera 2 0Xanthophyceae 2 0Tardigrada 2 0Euglenida 2 0Chromerida 3 0Placozoa 3 0Glomeromycota 3 0Cryptomycota 4 0Blastocladiomycota 5 0Echinodermata 6 0Entomophthoromycota 9 0Chytridiomycota 12 0Neocallimastigomycota 12 0Annelida 13 0Eustigmatophyceae 13 0Cnidaria 18 0Bacillariophyta 21 0Platyhelminthes 23 0Mollusca 25 0Microsporidia 31 1Chlorophyta 77 1Nematoda 110 2Apicomplexa 264 5Arthropoda 370 7Chordata 626 12Streptophyta 796 15Basidiomycota 976 18Ascomycota 1,251 23

530 SYSTEMATIC BIOLOGY VOL. 59

a 97-taxon data set of Rhizaria that included all lin-eages with previously published data plus additionalmultigene data for 12 taxa added for this study (TableS1). Three major clades are strongly supported, thoughthe relationships among them are unresolved: i) Cerco-zoa, ii) Foraminifera plus Polycystinea and Acantharea(formerly classified with Phaeodarea as radiolarians),and (iii) the parasitic Haplosporidia and Plasmodio-phorida with Gromia and vampyrellids (Fig. 3; Basset al. 2009). We show that Theratromyxa, a nematode-eating soil amoeba, is related to vampyrellid amoebae(Fig. 3; 100% BS), and together they are sister to the plantparasites plasmodiophorids (100% BS). The SSU-rDNAsequence for Theratromyxa is identical to an amoeba iso-lated from Siberia where it was identified as Arachnulaimpatiens (EU567294; Bass et al. 2009).

The topology within the Excavata is consistent withprevious hypotheses and clades with ultrastructuralidentities (Simpson 2003; Fig. 4), when contaminantEST data originally mislabeled as Streblomastix strixare excluded (Slamovits and Keeling 2006). Excavatais often polyphyletic in other analyses because Malaw-imonas branches outside the other clades of Excavata(Rodrıguez-Ezpeleta et al. 2007a; Hampl et al. 2009),whereas in analyses of fewer genes Excavata mem-bers fall into 2 or 3 clades (Parfrey et al. 2006; Simp-son et al. 2006). Although Malawimonas nests robustlywithin Excavata in our analyses, it does not have astable sister group and may represent an independentlineage (Fig. 4). Our analyses confirm that Stephanopogon(unplaced in Patterson 1999) branches within Heterolo-bosea (Cavalier-Smith and Nikolaev 2008; Yubuki andLeander 2008) and suggests that another enigmatic flag-ellate, ATCC 50646 (tentatively named Soginia anisocys-tis) is a basal member of Heterolobosea.

CONCLUSIONS

The robust tree of life emerging from this studydemonstrates the benefits of improved taxon samplingfor reconstructing deep phylogeny as our analyses pro-duce stable topologies that include a broad representa-tion of eukaryotes. The current study, combined withinsights from other studies referenced herein, has re-fined the eukaryotic tree of life from over 70 majorlineages (Patterson 1999) to !16 major groups (Fig. 5,http://eutree.lifedesks.org/). Most significantly, weattribute the stability of major clades (e.g., Excavata,Amoebozoa, Opisthokonta, and SAR) to broader taxo-nomic sampling combined with analyses of sufficientcharacters (16 genes or 6578 characters). In our view,inclusion of more taxa coupled with carefully chosengenes is necessary to further resolve the 16 or so majorlineages of microbial eukaryotes for which sister grouprelationships remain uncertain.

SUPPLEMENTARY MATERIAL

Supplementary material can be found at http://www.sysbio.oxfordjournals.org/.

FIGURE 5. Summary of major findings: the evolutionary relationships among major lineages of eukaryotes. Clades have been collapsed into those that we view to be strongly supported. The many polytomies represent uncertainties that remain.

FUNDING

This work was made possible by the US National Science Foundation Assembling the Tree of Life grant to L.A.K. and D.J.P. (043115) and US National Institutes of Health 5R01AI058054-05 to M.L.S. Funding to collect Foraminifera was provided by a Society of Systematic Biologists MiniPEET grant to L.W.P.

ACKNOWLEDGMENTS

We are grateful to Robert Molestina at ATCC who provided DNAs through a collaborative National Science Foundation grant. We acknowledge the assistance of Kasia Hammar, Leslie Murphy, and Jillian Ward in preparation and sequencing of EST libraries. Our manuscript was improved following detailed comments from the editors, Alastair Simpson, and 1 anonymous reviewer. Thanks to David Hillis for conversations on

Downloaded from http://sysbio.oxfordjournals.org/ by guest on April 28, 2013

Animals 26%

Mapping GOLD to Tree

Page 19: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

Phylum Count Percent
Priapulida 1 0
Phaeophyceae 1 0
Rotifera 1 0
Hemichordata 1 0
Pinguiophyceae 1 0
Ctenophora 1 0
Bolidophyceae 1 0
Chaetognatha 1 0
Porifera 2 0
Xanthophyceae 2 0
Tardigrada 2 0
Euglenida 2 0
Chromerida 3 0
Placozoa 3 0
Glomeromycota 3 0
Cryptomycota 4 0
Blastocladiomycota 5 0
Echinodermata 6 0
Entomophthoromycota 9 0
Chytridiomycota 12 0
Neocallimastigomycota 12 0
Annelida 13 0
Eustigmatophyceae 13 0
Cnidaria 18 0
Bacillariophyta 21 0
Platyhelminthes 23 0
Mollusca 25 0
Microsporidia 31 1
Chlorophyta 77 1
Nematoda 110 2
Apicomplexa 264 5
Arthropoda 370 7
Chordata 626 12
Streptophyta 796 15
Basidiomycota 976 18
Ascomycota 1,251 23
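Group labels on the "Mapping GOLD to Tree" slides, such as "Animals 26%", look like sums of per-phylum counts. The talk does not say how they were computed, so the following Python sketch is only one plausible reading. It assumes that the denominator is the total of the rows listed in the table, and that the phyla are grouped as in the two hypothetical sets `ANIMAL_PHYLA` and `GREEN_PHYLA`, which are our choice and not taken from the slides.

```python
# Counts copied from the GOLD eukaryotic phylum table (listed rows only).
GOLD_COUNTS = {
    "Priapulida": 1, "Phaeophyceae": 1, "Rotifera": 1, "Hemichordata": 1,
    "Pinguiophyceae": 1, "Ctenophora": 1, "Bolidophyceae": 1,
    "Chaetognatha": 1, "Porifera": 2, "Xanthophyceae": 2, "Tardigrada": 2,
    "Euglenida": 2, "Chromerida": 3, "Placozoa": 3, "Glomeromycota": 3,
    "Cryptomycota": 4, "Blastocladiomycota": 5, "Echinodermata": 6,
    "Entomophthoromycota": 9, "Chytridiomycota": 12,
    "Neocallimastigomycota": 12, "Annelida": 13, "Eustigmatophyceae": 13,
    "Cnidaria": 18, "Bacillariophyta": 21, "Platyhelminthes": 23,
    "Mollusca": 25, "Microsporidia": 31, "Chlorophyta": 77, "Nematoda": 110,
    "Apicomplexa": 264, "Arthropoda": 370, "Chordata": 626,
    "Streptophyta": 796, "Basidiomycota": 976, "Ascomycota": 1251,
}

# Assumption: which listed phyla count as animals (our grouping, not the talk's).
ANIMAL_PHYLA = {
    "Priapulida", "Rotifera", "Hemichordata", "Ctenophora", "Chaetognatha",
    "Porifera", "Tardigrada", "Placozoa", "Echinodermata", "Annelida",
    "Cnidaria", "Platyhelminthes", "Mollusca", "Nematoda", "Arthropoda",
    "Chordata",
}

# Assumption: the "green" group is Chlorophyta plus Streptophyta.
GREEN_PHYLA = {"Chlorophyta", "Streptophyta"}


def group_share(counts, members):
    """Return the members' share of all listed genome projects, in percent."""
    total = sum(counts.values())
    return 100.0 * sum(counts[phylum] for phylum in members) / total


if __name__ == "__main__":
    print(f"Listed projects: {sum(GOLD_COUNTS.values())}")
    print(f"Animals: {group_share(GOLD_COUNTS, ANIMAL_PHYLA):.0f}%")
    print(f"Green lineage: {group_share(GOLD_COUNTS, GREEN_PHYLA):.0f}%")
```

Under these assumptions the shares come out at about 26% for animals and 19% for Chlorophyta plus Streptophyta. The per-phylum Percent column in the table appears to use a larger denominator than the listed rows, so group shares computed this way are not directly comparable with that column.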

530 SYSTEMATIC BIOLOGY VOL. 59

Mapping GOLD to Tree

Page 20: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

Mapping GOLD to Tree

Page 21: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

Green algae 19%

Mapping GOLD to Tree

Page 22: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

Mapping GOLD to Tree

Page 23: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

Mapping GOLD to Tree

Page 24: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

Priapulida 1 0Phaeophyceae 1 0Rotifera 1 0Hemichordata 1 0Pinguiophyceae 1 0Ctenophora 1 0Bolidophyceae 1 0Chaetognatha 1 0Porifera 2 0Xanthophyceae 2 0Tardigrada 2 0Euglenida 2 0Chromerida 3 0Placozoa 3 0Glomeromycota 3 0Cryptomycota 4 0Blastocladiomycota 5 0Echinodermata 6 0Entomophthoromycota 9 0Chytridiomycota 12 0Neocallimastigomycota 12 0Annelida 13 0Eustigmatophyceae 13 0Cnidaria 18 0Bacillariophyta 21 0Platyhelminthes 23 0Mollusca 25 0Microsporidia 31 1Chlorophyta 77 1Nematoda 110 2Apicomplexa 264 5Arthropoda 370 7Chordata 626 12Streptophyta 796 15Basidiomycota 976 18Ascomycota 1,251 23

530 SYSTEMATIC BIOLOGY VOL. 59

a 97-taxon data set of Rhizaria that included all lin-eages with previously published data plus additionalmultigene data for 12 taxa added for this study (TableS1). Three major clades are strongly supported, thoughthe relationships among them are unresolved: i) Cerco-zoa, ii) Foraminifera plus Polycystinea and Acantharea(formerly classified with Phaeodarea as radiolarians),and (iii) the parasitic Haplosporidia and Plasmodio-phorida with Gromia and vampyrellids (Fig. 3; Basset al. 2009). We show that Theratromyxa, a nematode-eating soil amoeba, is related to vampyrellid amoebae(Fig. 3; 100% BS), and together they are sister to the plantparasites plasmodiophorids (100% BS). The SSU-rDNAsequence for Theratromyxa is identical to an amoeba iso-lated from Siberia where it was identified as Arachnulaimpatiens (EU567294; Bass et al. 2009).

The topology within the Excavata is consistent withprevious hypotheses and clades with ultrastructuralidentities (Simpson 2003; Fig. 4), when contaminantEST data originally mislabeled as Streblomastix strixare excluded (Slamovits and Keeling 2006). Excavatais often polyphyletic in other analyses because Malaw-imonas branches outside the other clades of Excavata(Rodrıguez-Ezpeleta et al. 2007a; Hampl et al. 2009),whereas in analyses of fewer genes Excavata mem-bers fall into 2 or 3 clades (Parfrey et al. 2006; Simp-son et al. 2006). Although Malawimonas nests robustlywithin Excavata in our analyses, it does not have astable sister group and may represent an independentlineage (Fig. 4). Our analyses confirm that Stephanopogon(unplaced in Patterson 1999) branches within Heterolo-bosea (Cavalier-Smith and Nikolaev 2008; Yubuki andLeander 2008) and suggests that another enigmatic flag-ellate, ATCC 50646 (tentatively named Soginia anisocys-tis) is a basal member of Heterolobosea.

CONCLUSIONS

The robust tree of life emerging from this studydemonstrates the benefits of improved taxon samplingfor reconstructing deep phylogeny as our analyses pro-duce stable topologies that include a broad representa-tion of eukaryotes. The current study, combined withinsights from other studies referenced herein, has re-fined the eukaryotic tree of life from over 70 majorlineages (Patterson 1999) to !16 major groups (Fig. 5,http://eutree.lifedesks.org/). Most significantly, weattribute the stability of major clades (e.g., Excavata,Amoebozoa, Opisthokonta, and SAR) to broader taxo-nomic sampling combined with analyses of sufficientcharacters (16 genes or 6578 characters). In our view,inclusion of more taxa coupled with carefully chosengenes is necessary to further resolve the 16 or so majorlineages of microbial eukaryotes for which sister grouprelationships remain uncertain.

SUPPLEMENTARY MATERIAL

Supplementary material can be found at http://www.sysbio.oxfordjournals.org/.

FIGURE 5. Summary of major findings: the evolutionary relationships among major lineages of eukaryotes. Clades have been collapsed into those that we view to be strongly supported. The many polytomies represent uncertainties that remain.

FUNDING

This work was made possible by the US National Science Foundation Assembling the Tree of Life grant to L.A.K. and D.J.P. (043115) and US National Institutes of Health 5R01AI058054-05 to M.L.S. Funding to collect Foraminifera was provided by a Society of Systematic Biologists MiniPEET grant to L.W.P.

ACKNOWLEDGMENTS

We are grateful to Robert Molestina at ATCC who provided DNAs through a collaborative National Science Foundation grant. We acknowledge the assistance of Kasia Hammar, Leslie Murphy, and Jillian Ward in preparation and sequencing of EST libraries. Our manuscript was improved following detailed comments from the editors, Alastair Simpson, and 1 anonymous reviewer. Thanks to David Hillis for conversations on

Downloaded from http://sysbio.oxfordjournals.org/ by guest on April 28, 2013

Apicomplexa 5%

Mapping GOLD to Tree

Page 25: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

Phylum Count Percent
Priapulida 1 0
Phaeophyceae 1 0
Rotifera 1 0
Hemichordata 1 0
Pinguiophyceae 1 0
Ctenophora 1 0
Bolidophyceae 1 0
Chaetognatha 1 0
Porifera 2 0
Xanthophyceae 2 0
Tardigrada 2 0
Euglenida 2 0
Chromerida 3 0
Placozoa 3 0
Glomeromycota 3 0
Cryptomycota 4 0
Blastocladiomycota 5 0
Echinodermata 6 0
Entomophthoromycota 9 0
Chytridiomycota 12 0
Neocallimastigomycota 12 0
Annelida 13 0
Eustigmatophyceae 13 0
Cnidaria 18 0
Bacillariophyta 21 0
Platyhelminthes 23 0
Mollusca 25 0
Microsporidia 31 1
Chlorophyta 77 1
Nematoda 110 2
Apicomplexa 264 5
Arthropoda 370 7
Chordata 626 12
Streptophyta 796 15
Basidiomycota 976 18
Ascomycota 1,251 23
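The skew in these counts can be made concrete with a little arithmetic. The following is a minimal Python sketch and is not part of the talk: the names `GOLD_COUNTS`, `top_share`, and `singletons` are illustrative assumptions, and the shares are computed only over the phyla listed on this slide, so they will not match GOLD's own Percent column, which appears to be taken over a larger total.

```python
# Project counts per eukaryotic phylum, copied from the GOLD rows on this slide.
# Assumption: only the rows shown here are included, so every share below is
# relative to this subset and not to the whole GOLD database.
GOLD_COUNTS = {
    "Priapulida": 1, "Phaeophyceae": 1, "Rotifera": 1, "Hemichordata": 1,
    "Pinguiophyceae": 1, "Ctenophora": 1, "Bolidophyceae": 1,
    "Chaetognatha": 1, "Porifera": 2, "Xanthophyceae": 2, "Tardigrada": 2,
    "Euglenida": 2, "Chromerida": 3, "Placozoa": 3, "Glomeromycota": 3,
    "Cryptomycota": 4, "Blastocladiomycota": 5, "Echinodermata": 6,
    "Entomophthoromycota": 9, "Chytridiomycota": 12,
    "Neocallimastigomycota": 12, "Annelida": 13, "Eustigmatophyceae": 13,
    "Cnidaria": 18, "Bacillariophyta": 21, "Platyhelminthes": 23,
    "Mollusca": 25, "Microsporidia": 31, "Chlorophyta": 77, "Nematoda": 110,
    "Apicomplexa": 264, "Arthropoda": 370, "Chordata": 626,
    "Streptophyta": 796, "Basidiomycota": 976, "Ascomycota": 1251,
}


def top_share(counts, k):
    """Return the fraction of all listed projects held by the k largest phyla."""
    ranked = sorted(counts.values(), reverse=True)
    return sum(ranked[:k]) / sum(ranked)


def singletons(counts):
    """Return the names of phyla represented by exactly one project."""
    return sorted(name for name, n in counts.items() if n == 1)


if __name__ == "__main__":
    print(f"Top 4 phyla hold {top_share(GOLD_COUNTS, 4):.0%} of listed projects")
    print(f"{len(singletons(GOLD_COUNTS))} phyla have a single project")
```

Ranking by count rather than by name keeps the sketch independent of how the phyla are ordered on the slide.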

530 SYSTEMATIC BIOLOGY VOL. 59

A Very Biased Sampling

Page 26: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

Solution to Biased Sampling?

Page 27: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

Solution: Fill in the Tree

Page 28: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

II: Filling in the Tree Example

Page 30: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

As of 2002

Page 31: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

Tree labels (bacterial phyla): Acidobacteria, Bacteroides, Fibrobacteres, Gemmimonas, Verrucomicrobia, Planctomycetes, Chloroflexi, Proteobacteria, Chlorobi, Firmicutes, Fusobacteria, Actinobacteria, Cyanobacteria, Chlamydia, Spirochaetes, Deinococcus-Thermus, Aquificae, Thermotogae, TM6, OS-K, Termite Group, OP8, Marine Group A, WS3, OP9, NKB19, OP3, OP10, TM7, OP1, OP11, Nitrospira, Synergistes, Deferribacteres, Thermodesulfobacteria, Chrysiogenetes, Thermomicrobia, Dictyoglomus, Coprothermobacter

• At least 40 phyla of bacteria

As of 2002

Tree Based on Hugenholtz, 2002. http://genomebiology.com/2002/3/2/reviews/0003

Page 32: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

As of 2002

Tree Based on Hugenholtz, 2002. http://genomebiology.com/2002/3/2/reviews/0003

Page 33: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Some other phyla are only sparsely sampled

As of 2002

Tree Based on Hugenholtz, 2002. http://genomebiology.com/2002/3/2/reviews/0003

Page 34: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Some other phyla are only sparsely sampled

As of 2002

Tree Based on Hugenholtz, 2002. http://genomebiology.com/2002/3/2/reviews/0003

Page 35: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Some other phyla are only sparsely sampled

As of 2002

Tree Based on Hugenholtz, 2002. http://genomebiology.com/2002/3/2/reviews/0003

Page 36: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Some other phyla are only sparsely sampled

• Solution I: sequence more phyla

• NSF-funded Tree of Life Project

• A genome from each of eight phyla

Eisen, Ward, Robb, Nelson, et al.

Tree Based on Hugenholtz, 2002. http://genomebiology.com/2002/3/2/reviews/0003

Page 37: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

Phylum: Species selected
Chrysiogenes: Chrysiogenes arsenatis (GCA)
Coprothermobacter: Coprothermobacter proteolyticus (GCBP)
Dictyoglomi: Dictyoglomus thermophilum (GDT)
Thermodesulfobacteria: Thermodesulfobacterium commune (GTC)
Nitrospirae: Thermodesulfovibrio yellowstonii (GTY)
Thermomicrobia: Thermomicrobium roseum (GTR)
Deferribacteres: Geovibrio thiophilus (GGT)
Synergistes: Synergistes jonesii (GSJ)

Organisms Selected

Page 38: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

Page 39: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Some other phyla are only sparsely sampled

• Still highly biased in terms of the tree

• NSF-funded Tree of Life Project

• A genome from each of eight phyla

Eisen & Ward, PIs

Tree Based on Hugenholtz, 2002. http://genomebiology.com/2002/3/2/reviews/0003

Page 40: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

Major Lineages of Actinobacteria
2.5.1 Acidimicrobidae
2.5.1.1 Unclassified
2.5.1.2 "Microthrixineae"
2.5.1.3 Acidimicrobineae
2.5.1.4 BD2-10
2.5.1.5 EB1017
2.5.2 Actinobacteridae
2.5.2.1 Unclassified
2.5.2.10 Ellin306/WR160
2.5.2.11 Ellin5012
2.5.2.12 Ellin5034
2.5.2.13 Frankineae
2.5.2.14 Glycomyces
2.5.2.15 Intrasporangiaceae
2.5.2.16 Kineosporiaceae
2.5.2.17 Microbacteriaceae
2.5.2.18 Micrococcaceae
2.5.2.19 Micromonosporaceae
2.5.2.2 Actinomyces
2.5.2.20 Propionibacterineae
2.5.2.21 Pseudonocardiaceae
2.5.2.22 Streptomycineae
2.5.2.23 Streptosporangineae
2.5.2.3 Actinomycineae
2.5.2.4 Actinosynnemataceae
2.5.2.5 Bifidobacteriaceae
2.5.2.6 Brevibacteriaceae
2.5.2.7 Cellulomonadaceae
2.5.2.8 Corynebacterineae
2.5.2.9 Dermabacteraceae
2.5.3 Coriobacteridae
2.5.3.1 Unclassified
2.5.3.2 Atopobiales
2.5.3.3 Coriobacteriales
2.5.3.4 Eggerthellales
2.5.4 OPB41
2.5.5 PK1
2.5.6 Rubrobacteridae
2.5.6.1 Unclassified
2.5.6.2 "Thermoleiphilaceae"
2.5.6.3 MC47
2.5.6.4 Rubrobacteraceae

2.5 Actinobacteria
2.5.1 Acidimicrobidae
2.5.1.1 Unclassified
2.5.1.2 "Microthrixineae"
2.5.1.3 Acidimicrobineae
2.5.1.3.1 Unclassified
2.5.1.3.2 Acidimicrobiaceae
2.5.1.4 BD2-10
2.5.1.5 EB1017
2.5.2 Actinobacteridae
2.5.2.1 Unclassified
2.5.2.10 Ellin306/WR160
2.5.2.11 Ellin5012
2.5.2.12 Ellin5034
2.5.2.13 Frankineae
2.5.2.13.1 Unclassified
2.5.2.13.2 Acidothermaceae
2.5.2.13.3 Ellin6090
2.5.2.13.4 Frankiaceae
2.5.2.13.5 Geodermatophilaceae
2.5.2.13.6 Microsphaeraceae
2.5.2.13.7 Sporichthyaceae
2.5.2.14 Glycomyces
2.5.2.15 Intrasporangiaceae
2.5.2.15.1 Unclassified
2.5.2.15.2 Dermacoccus
2.5.2.15.3 Intrasporangiaceae
2.5.2.16 Kineosporiaceae
2.5.2.17 Microbacteriaceae
2.5.2.17.1 Unclassified
2.5.2.17.2 Agrococcus
2.5.2.17.3 Agromyces
2.5.2.18 Micrococcaceae
2.5.2.19 Micromonosporaceae
2.5.2.2 Actinomyces
2.5.2.20 Propionibacterineae
2.5.2.20.1 Unclassified
2.5.2.20.2 Kribbella
2.5.2.20.3 Nocardioidaceae
2.5.2.20.4 Propionibacteriaceae
2.5.2.21 Pseudonocardiaceae
2.5.2.22 Streptomycineae
2.5.2.22.1 Unclassified
2.5.2.22.2 Kitasatospora
2.5.2.22.3 Streptacidiphilus
2.5.2.23 Streptosporangineae
2.5.2.23.1 Unclassified
2.5.2.23.2 Ellin5129
2.5.2.23.3 Nocardiopsaceae
2.5.2.23.4 Streptosporangiaceae
2.5.2.23.5 Thermomonosporaceae
2.5.2.3 Actinomycineae
2.5.2.4 Actinosynnemataceae
2.5.2.5 Bifidobacteriaceae
2.5.2.6 Brevibacteriaceae
2.5.2.7 Cellulomonadaceae
2.5.2.8 Corynebacterineae
2.5.2.8.1 Unclassified
2.5.2.8.2 Corynebacteriaceae
2.5.2.8.3 Dietziaceae
2.5.2.8.4 Gordoniaceae
2.5.2.8.5 Mycobacteriaceae
2.5.2.8.6 Rhodococcus
2.5.2.8.7 Rhodococcus
2.5.2.8.8 Rhodococcus
2.5.2.9 Dermabacteraceae
2.5.2.9.1 Unclassified
2.5.2.9.2 Brachybacterium
2.5.2.9.3 Dermabacter
2.5.3 Coriobacteridae
2.5.3.1 Unclassified
2.5.3.2 Atopobiales
2.5.3.3 Coriobacteriales
2.5.3.4 Eggerthellales
2.5.4 OPB41
2.5.5 PK1
2.5.6 Rubrobacteridae
2.5.6.1 Unclassified
2.5.6.2 "Thermoleiphilaceae"
2.5.6.2.1 Unclassified
2.5.6.2.2 Conexibacter
2.5.6.2.3 XGE514
2.5.6.3 MC47
2.5.6.4 Rubrobacteraceae

Page 41: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Some other phyla are only sparsely sampled

• Same trend in Archaea

• NSF-funded Tree of Life Project

• A genome from each of eight phyla

Eisen & Ward, PIs

Tree Based on Hugenholtz, 2002. http://genomebiology.com/2002/3/2/reviews/0003

Page 42: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Some other phyla are only sparsely sampled

• Same trend in Eukaryotes

• NSF-funded Tree of Life Project

• A genome from each of eight phyla

Eisen & Ward, PIs

Tree based on Hugenholtz, 2002. http://genomebiology.com/2002/3/2/reviews/0003

Page 43: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

[Figure: bacterial phylum tree; labels identical to the preceding slide]

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Some other phyla are only sparsely sampled

• Same trend in Viruses

• NSF-funded Tree of Life Project

• A genome from each of eight phyla

Eisen & Ward, PIs

Tree based on Hugenholtz, 2002. http://genomebiology.com/2002/3/2/reviews/0003

Page 44: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Some other phyla are only sparsely sampled

• Solution: Really Fill in the Trees

[Figure: bacterial phylum tree; labels identical to the preceding slide]

Tree Based on Hugenholtz, 2002. http://genomebiology.com/2002/3/2/reviews/0003

Page 45: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

Filling in the Tree

Figure from Barton, Eisen et al. “Evolution”, CSHL Press based on Baldauf et al Tree

Page 46: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

Figure from Barton, Eisen et al. “Evolution”, CSHL Press based on Baldauf et al Tree

Filling in the Tree

Page 47: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

Figure from Barton, Eisen et al. “Evolution”, CSHL Press based on Baldauf et al Tree

Filling in the Tree

Page 48: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

Figure from Barton, Eisen et al. “Evolution”, CSHL Press based on Baldauf et al Tree

Filling in the Tree

Page 49: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

Lots of Plants, Animals, Fungi

Figure from Barton, Eisen et al. “Evolution”, CSHL Press based on Baldauf et al Tree

Page 50: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

Exclude Plants, Animals, Fungi

Figure from Barton, Eisen et al. “Evolution”, CSHL Press based on Baldauf et al Tree

Page 51: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

A Genomic Encyclopedia of Microbes (GEM)

Figure from Barton, Eisen et al. “Evolution”, CSHL Press based on Baldauf et al Tree

Page 52: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

Just Say No to Eukaryotes

Figure from Barton, Eisen et al. “Evolution”, CSHL Press based on Baldauf et al Tree

Page 53: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

GEBA: A Genomic Encyclopedia of Bacteria and Archaea

Figure from Barton, Eisen et al. “Evolution”, CSHL Press based on Baldauf et al Tree

Page 55: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

GEBA Pilot Project: Components

• Project overview (Phil Hugenholtz, Nikos Kyrpides, Jonathan Eisen, Eddy Rubin, Jim Bristow)

• Project management (David Bruce, Eileen Dalin, Lynne Goodwin)

• Culture collection and DNA prep (DSMZ, Hans-Peter Klenk)

• Sequencing and closure (Eileen Dalin, Susan Lucas, Alla Lapidus, Mat Nolan, Alex Copeland, Cliff Han, Feng Chen, Jan-Fang Cheng)

• Annotation and data release (Nikos Kyrpides, Victor Markowitz, et al.)

• Analysis (Dongying Wu, Kostas Mavrommatis, Martin Wu, Victor Kunin, Neil Rawlings, Ian Paulsen, Patrick Chain, Patrik D’Haeseleer, Sean Hooper, Iain Anderson, Amrita Pati, Natalia N. Ivanova, Athanasios Lykidis, Adam Zemla)

• Adopt a microbe education project (Cheryl Kerfeld)

• Outreach (David Gilbert)

• $$$ (DOE, Eddy Rubin, Jim Bristow)

Page 56: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

rRNA Tree of Life

Figure from Barton, Eisen et al. “Evolution”, CSHL Press.

Based on tree from Pace NR, 2003.

Page 57: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

rRNA Tree of BA

Figure from Barton, Eisen et al. “Evolution”, CSHL Press.

Based on tree from Pace NR, 2003.

Page 58: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

GreenGenes

Page 59: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

Page 60: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

DSMZ

Page 61: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

Page 62: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

GEBA Pilot Project Overview

• Identify major branches in rRNA tree for which no genomes are available

• Identify those with a cultured representative in DSMZ

• DSMZ grew > 200 of these and prepped DNA

• Sequence and finish 200+

• Annotate, analyze, release data

• Assess benefits of tree guided sequencing

• 1st paper Wu et al in Nature Dec 2009
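The first step above (finding branches of the rRNA tree with no sequenced genome) can be illustrated with a small phylogenetic diversity calculation. This is only a sketch under stated assumptions: the toy tree, its branch lengths, and the function names are hypothetical, and the ranking rule (added branch length per candidate) is one simple way to express "tree-guided" selection, not necessarily the procedure used in the GEBA pilot.

```python
# Hypothetical toy tree: child -> (parent, branch length). Not real rRNA data.
TOY_TREE = {
    "n1": ("root", 0.1),
    "n2": ("root", 0.4),
    "A": ("n1", 0.2),
    "B": ("n1", 0.3),
    "C": ("n2", 0.5),
    "D": ("n2", 0.1),
}


def branches_to_root(tree, leaf):
    """Branches (named by their child node) on the path from a leaf to the root."""
    branches = set()
    node = leaf
    while node in tree:
        branches.add(node)
        node = tree[node][0]
    return branches


def phylogenetic_diversity(tree, leaves):
    """Total length of all branches connecting the given leaves to the root."""
    covered = set()
    for leaf in leaves:
        covered |= branches_to_root(tree, leaf)
    return sum(tree[branch][1] for branch in covered)


def rank_by_pd_gain(tree, sequenced, candidates):
    """Sort candidates by how much branch length each adds to the sequenced set."""
    base = phylogenetic_diversity(tree, sequenced)
    gains = {
        c: phylogenetic_diversity(tree, set(sequenced) | {c}) - base
        for c in candidates
    }
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Only "A" has a genome; candidates on the unsampled n2 branch rank first.
ranking = rank_by_pd_gain(TOY_TREE, sequenced={"A"}, candidates=["B", "C", "D"])
print(ranking)
```

In this toy case the close relative "B" ranks last because most of its path to the root is already covered by "A".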

Page 63: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

GEBA Pilot Target List

[Bar chart: "GEBA Initial Target List". x-axis: Phyla; y-axis: # of Genomes (0 to 35). Categories (B = Bacteria, A = Archaea): B: Actinobacteria (High GC), B: Aminanaerobia, B: Aquificae, B: Bacteroidetes, B: Chloroflexi, B: Deferribacteres, B: Deferribacteres, B: Deinococci, B: Delta Proteobacteria, B: Epsilon Proteobacteria, B: Firmicutes, B: Fusobacteria, B: Gamma Proteobacteria, B: Gemmatimonadetes, B: Haloanaerobiales, B: Planctomycetes, B: Spirochaetes, B: Thermodesulfobacteria, B: Thermodesulfobia, B: Thermovenabulae, A: Halobacteria, A: Archaeoglobi, A: Methanobacteria, A: Methanomicrobia, A: Thermococci, A: Thermoprotei]

Page 64: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

Assess Benefits of GEBA

• All genomes have some value

• But what, if any, is the benefit of tree-guided sequencing over other selection methods?

• Lessons for other large scale microbial genome projects?

Page 65: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

Lessons from GEBA

Page 66: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

Lesson 1: rRNA PD IDs novel lineages

From Wu et al. 2009 Nature 462, 1056-1060

Page 67: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

Concatenated Marker PD

From Wu et al. 2009 Nature 462, 1056-1060

Page 69: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

How to Pick Novel Lineages for Euks?

• Molecular

  • rRNA PD?

  • Conserved markers by PCR?

  • EST shotgun?

• Other data for phylogeny

Page 70: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

Lesson 3: Improves annotation

• Took 56 GEBA genomes and compared results vs. 56 randomly sampled new genomes

• Better definition of protein family sequence “patterns”

• Greatly improves “comparative” and “evolutionary” based predictions

• Conversion of hypothetical into conserved hypotheticals

• Linking distantly related members of protein families

• Improved non-homology prediction

Page 71: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

Annotation for Euks?

Page 72: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

Lesson 4 : Metadata Important

Page 73: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

Lesson 5: Project management critical

• Tracking samples and status

• Getting permissions

• Shipping samples

• Contacting collaborators

• Data archiving and submission

• Communicating with core facilities

• and more

Page 74: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

Lesson 6: Culture Collections Needed

Page 75: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

Lesson 7: Data Publications

Page 76: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

Lesson 8: Diversity Discovery

• Phylogeny-driven genome selection helps discover new genetic diversity

Page 77: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

Protein Family Rarefaction

• Take data set of multiple complete genomes

• Identify all protein families using MCL

• Plot # of genomes vs. # of protein families
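The rarefaction procedure listed above can be sketched in a few lines. The sketch assumes the MCL clustering has already been run and that each genome has been reduced to the set of protein family IDs it contains; the genome names, family IDs, and the function name are hypothetical, and averaging over random genome orderings is one common way to smooth such a curve rather than a detail taken from the slides.

```python
import random


def rarefaction_curve(genome_families, n_permutations=100, seed=0):
    """Mean number of distinct protein families seen after adding k genomes.

    genome_families maps a genome name to the set of family IDs it contains.
    Returns a list whose entry k-1 is the mean family count for k genomes.
    """
    rng = random.Random(seed)
    genomes = list(genome_families)
    totals = [0] * len(genomes)
    for _ in range(n_permutations):
        rng.shuffle(genomes)  # random order in which genomes are "sequenced"
        seen = set()
        for k, genome in enumerate(genomes):
            seen |= genome_families[genome]
            totals[k] += len(seen)
    return [total / n_permutations for total in totals]


# Hypothetical toy input standing in for MCL output.
TOY_FAMILIES = {
    "genome_1": {"fam_a", "fam_b", "fam_c"},
    "genome_2": {"fam_a", "fam_b", "fam_d"},
    "genome_3": {"fam_a", "fam_e", "fam_f"},
}

curve = rarefaction_curve(TOY_FAMILIES)
for n_genomes, n_families in enumerate(curve, start=1):
    print(n_genomes, n_families)
```

Plotting the printed pairs gives the "# of genomes vs. # of protein families" curve; a curve that keeps climbing suggests that additional genomes are still contributing new families.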

Page 84: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

True for Euks?

Page 85: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

Lesson 9: Improves metagenomics

Page 86: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

Phylotyping

[Bar chart: "Sargasso Phylotypes". x-axis: Major Phylogenetic Group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota); y-axis: Weighted % of Clones (0 to 0.500); series: EFG, EFTu, HSP70, RecA, RpoB, rRNA]

Venter et al., Science 304: 66-74. 2004

GEBA Project improves metagenomic analysis

Page 87: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

Eukaryotic Metagenomics?

Page 88: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

GEBA Zoom

Page 89: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

GEBA Now

• 300+ genomes

• Rich sampling of major groups of cultured organisms

• Zoomed in sampling of haloarchaea, cyanobacteria and more

Page 90: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

GEBA Cyanobacteria

www.pnas.org/cgi/doi/10.1073/pnas.1217107110

Page 91: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

Haloarchaeal GEBA-like

Lynch EA, Langille MGI, Darling A, Wilbanks EG, Haltiner C, et al. (2012) Sequencing of Seven Haloarchaeal Genomes Reveals Patterns of Genomic Flux. PLoS ONE 7(7): e41389. doi:10.1371/journal.pone.0041389

Page 92: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

Plan: Sequence multiple Root Nodule Bacteria (RNBs) from across the planet. Pilot: 100 RNBs.

Alpha RNB: Bradyrhizobium, Mesorhizobium, Rhizobium, Sinorhizobium, Balneimonas-like, Devosia, Ochrobactrum, Phyllobacterium, Azorhizobium, Allorhizobium

Beta RNB: Cupriavidus, Burkholderia

Goal:

• Understand biogeographical effects on species evolution and understand host specificity.

Rationale:

• N2 fixation by legume pastures and crops provides 65% of the N currently utilized in agricultural production.

• Contributes 25 to 90 million metric tonnes of N per annum.

• Symbioses save $US 6-10 billion annually on N fertilizer.

• Grain and animal production enhanced by fixed nitrogen supplied by the symbiosis.

Nikos Kyrpides

GEBA RNB

Page 93: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

But ...

Page 94: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

Phylotyping

[Bar chart: "Sargasso Phylotypes". x-axis: Major Phylogenetic Group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota); y-axis: Weighted % of Clones (0 to 0.500); series: EFG, EFTu, HSP70, RecA, RpoB, rRNA]

Venter et al., Science 304: 66-74. 2004

GEBA Project improves metagenomic analysis

Page 95: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

Phylotyping

[Bar chart: "Sargasso Phylotypes". x-axis: Major Phylogenetic Group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota); y-axis: Weighted % of Clones (0 to 0.500); series: EFG, EFTu, HSP70, RecA, RpoB, rRNA]

But not a lot

Venter et al., Science 304: 66-74. 2004

Page 96: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

Phylogenomics Future 1

• Need to adapt genomic and metagenomic methods to make better use of data

Page 97: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

Improving Metagenomic Analysis

• Methods

  • More automation

  • Better phylogenetic methods for short reads and large data sets

  • Improved tools for using distantly related genomes in metagenomic analysis

• Data sets

  • Rebuild protein family models

  • New phylogenetic markers

  • Need better reference phylogenies, including HGT

  • More simulations

Page 98: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

Kembel Correction

Kembel, Wu, Eisen, Green. In press. PLoS Computational Biology. Incorporating 16S gene copy number information improves estimates of microbial diversity and abundance
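The idea named in the title can be illustrated with basic arithmetic: a taxon carrying several 16S copies per genome contributes proportionally more reads per cell, so dividing read counts by copy number gives a fairer estimate of relative cell abundance. This is a minimal sketch only, with made-up taxa and copy numbers and a hypothetical function name. It assumes copy numbers are already known and does not reproduce the method of the cited paper.

```python
def copy_number_corrected_abundance(read_counts, copy_numbers):
    """Relative abundance after dividing 16S read counts by gene copy number."""
    # Reads per taxon divided by copies per genome approximates cell counts.
    cells = {taxon: read_counts[taxon] / copy_numbers[taxon] for taxon in read_counts}
    total = sum(cells.values())
    return {taxon: value / total for taxon, value in cells.items()}


# Hypothetical example: taxon_a has 6 copies of the 16S gene, taxon_b has 1.
reads = {"taxon_a": 60, "taxon_b": 40}
copies = {"taxon_a": 6, "taxon_b": 1}
print(copy_number_corrected_abundance(reads, copies))
```

In this toy case taxon_a accounts for 60% of the reads but only about 20% of the estimated cells once its six gene copies are taken into account.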

Page 99: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

alignment used to build the profile, resulting in a multiple sequence alignment of full-length reference sequences and metagenomic reads. The final step of the alignment process is a quality control filter that 1) ensures that only homologous SSU-rRNA sequences from the appropriate phylogenetic domain are included in the final alignment, and 2) masks highly gapped alignment columns (see Text S1).

We use this high quality alignment of metagenomic reads and reference sequences to construct a fully-resolved, phylogenetic tree and hence determine the evolutionary relationships between the reads. Reference sequences are included in this stage of the analysis to guide the phylogenetic assignment of the relatively short metagenomic reads. While the software can be easily extended to incorporate a number of different phylogenetic tools capable of analyzing metagenomic data (e.g., RAxML [27], pplacer [28], etc.), PhylOTU currently employs FastTree as a default method due to its relatively high speed-to-performance ratio and its ability to construct accurate trees in the presence of highly-gapped data [29]. After construction of the phylogeny, lineages representing reference sequences are pruned from the tree. The resulting phylogeny of metagenomic reads is then used to compute a PD distance matrix in which the distance between a pair of reads is defined as the total tree path distance (i.e., branch length) separating the two reads [30]. This tree-based distance matrix is subsequently used to hierarchically cluster metagenomic reads via MOTHUR into OTUs in a fashion similar to traditional PID-based analysis [31]. As with PID clustering, the hierarchical algorithm can be tuned to produce finer or coarser clusters, corresponding to different taxonomic levels, by adjusting the clustering threshold and linkage method.

To evaluate the performance of PhylOTU, we employed statistical comparisons of distance matrices and clustering results for a variety of data sets. These investigations aimed 1) to compare PD versus PID clustering, 2) to explore overlap between PhylOTU clusters and recognized taxonomic designations, and 3) to quantify the accuracy of PhylOTU clusters from shotgun reads relative to those obtained from full-length sequences.
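The PD distance matrix and the clustering step described in the excerpt can be sketched as follows. This is not PhylOTU or MOTHUR code: the toy tree, read names, threshold, and function names are hypothetical, reference-sequence pruning is assumed to have happened already, and a simple nearest-neighbor (single-linkage) grouping stands in for MOTHUR's hierarchical clustering.

```python
from itertools import combinations

# Hypothetical pruned tree of reads: child -> (parent, branch length).
READ_TREE = {
    "n1": ("root", 0.05),
    "read1": ("n1", 0.01),
    "read2": ("n1", 0.02),
    "read3": ("root", 0.30),
}


def branches_to_root(tree, leaf):
    """Branches (named by their child node) between a leaf and the root."""
    branches = set()
    node = leaf
    while node in tree:
        branches.add(node)
        node = tree[node][0]
    return branches


def pd_distance_matrix(tree, leaves):
    """Tree path distance for every pair of leaves, keyed by (leaf_a, leaf_b)."""
    distances = {}
    for a, b in combinations(sorted(leaves), 2):
        # Branches on exactly one of the two root paths form the path a..b.
        path = branches_to_root(tree, a) ^ branches_to_root(tree, b)
        distances[(a, b)] = sum(tree[branch][1] for branch in path)
    return distances


def nearest_neighbor_clusters(distances, leaves, threshold):
    """Group leaves linked by any chain of pairwise distances <= threshold."""
    parent = {leaf: leaf for leaf in leaves}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), distance in distances.items():
        if distance <= threshold:
            parent[find(a)] = find(b)
    groups = {}
    for leaf in leaves:
        groups.setdefault(find(leaf), set()).add(leaf)
    return sorted(groups.values(), key=sorted)


reads = ["read1", "read2", "read3"]
matrix = pd_distance_matrix(READ_TREE, reads)
print(nearest_neighbor_clusters(matrix, reads, threshold=0.1))
```

Raising or lowering the threshold plays the role of the tunable clustering cutoff mentioned in the excerpt: a larger value merges more reads into each OTU.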

PhylOTU Clusters Recapitulate PID Clusters

We sought to identify how PD-based clustering compares to commonly employed PID-based clustering methods by applying the two methods to the same set of sequences. Both PID-based clustering and PhylOTU may be used to identify OTUs from overlapping sequences. Therefore we applied both methods to a dataset of 508 full-length bacterial SSU-rRNA sequences (reference sequences; see above) obtained from the Ribosomal Database Project (RDP) [25]. Recent work has demonstrated that PID is more accurately calculated from pairwise alignments than multiple sequence alignments [32–33], so we used ESPRIT, which implements pairwise alignments, to obtain a PID distance matrix for the reference sequences [32]. We used PhylOTU to compute a PD distance matrix for the same data. Then, we used MOTHUR to hierarchically cluster sequences into OTUs based on both PID and PD. For each of the two distance matrices, we employed a range of clustering thresholds and three different definitions of linkage in the hierarchical clustering algorithm: nearest-neighbor, average, and furthest-neighbor.

To statistically evaluate the similarity of cluster composition between each pair of clustering results, we used two summary statistics that together capture the frequency with which sequences are co-clustered in both analyses: true conjunction rate (i.e., the proportion of pairs of sequences derived from the same cluster in the first analysis that also are clustered together in the second analysis) and true disjunction rate (i.e., the proportion of pairs of sequences derived from different clusters in the first analysis that also are not clustered together in the second analysis) (see Methods

Figure 1. PhylOTU Workflow. Computational processes are represented as squares and databases are represented as cylinders in this generalized workflow of PhylOTU. See Results section for details. doi:10.1371/journal.pcbi.1001061.g001

Finding Metagenomic OTUs

PLoS Computational Biology | www.ploscompbiol.org 3 January 2011 | Volume 7 | Issue 1 | e1001061

Sharpton TJ, Riesenfeld SJ, Kembel SW, Ladau J, O'Dwyer JP, Green JL, Eisen JA, Pollard KS. (2011) PhylOTU: A High-Throughput Procedure Quantifies Microbial Community Diversity and Resolves Novel Taxa from Metagenomic Data. PLoS Comput Biol 7(1): e1001061. doi:10.1371/journal.pcbi.1001061

PhylOTU
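The two summary statistics defined in the excerpt (true conjunction rate and true disjunction rate) follow directly from their definitions. The sketch below assumes each clustering is given as a mapping from sequence ID to cluster label; the sequence IDs, labels, and function name are hypothetical.

```python
from itertools import combinations


def conjunction_disjunction_rates(first, second):
    """Return (true conjunction rate, true disjunction rate).

    first and second map each sequence ID to its cluster label in the
    first and second analysis, respectively.
    """
    together_first = together_both = 0
    apart_first = apart_both = 0
    for a, b in combinations(sorted(first), 2):
        same_first = first[a] == first[b]
        same_second = second[a] == second[b]
        if same_first:
            together_first += 1
            together_both += same_second  # also co-clustered in second analysis
        else:
            apart_first += 1
            apart_both += not same_second  # also separated in second analysis
    tcr = together_both / together_first if together_first else float("nan")
    tdr = apart_both / apart_first if apart_first else float("nan")
    return tcr, tdr


# Hypothetical example with four sequences.
pid_clusters = {"s1": "A", "s2": "A", "s3": "A", "s4": "B"}
pd_clusters = {"s1": "X", "s2": "X", "s3": "Y", "s4": "Y"}
print(conjunction_disjunction_rates(pid_clusters, pd_clusters))
```

Both rates equal 1 when the two clusterings agree on every pair; here only one of the three pairs grouped together in the first clustering stays together in the second.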

Page 100: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

Phylosift/ pplacer

Aaron Darling, Guillaume Jospin, Holly Bik, Erik Matsen, Eric Lowe, and others

Page 101: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

Kembel Combiner

cally defined by a sequence similarity threshold) in the sample as equally related. Newer β diversity measures that incorporate phylogenetic information are more powerful because they account for the degree of divergence between sequences (13, 18, 29, 30). Phylogenetic β diversity measures can also be either quantitative or qualitative depending on whether abundance is taken into account. The original, unweighted UniFrac measure (13) is a qualitative measure. Unweighted UniFrac measures the distance between two communities by calculating the fraction of the branch length in a phylogenetic tree that leads to descendants in either, but not both, of the two communities (Fig. 1A). The fixation index (FST), which measures the distance between two communities by comparing the genetic diversity within each community to the total genetic diversity of the communities combined (18), is a quantitative measure that accounts for different levels of divergence between sequences. The phylogenetic test (P test), which measures the significance of the association between environment and phylogeny (18), is typically used as a qualitative measure because duplicate sequences are usually removed from the tree. However, the P test may be used in a semiquantitative manner if all clones, even those with identical or near-identical sequences, are included in the tree (13).

Here we describe a quantitative version of UniFrac that we call “weighted UniFrac.” We show that weighted UniFrac behaves similarly to the FST test in situations where both are applicable. However, weighted UniFrac has a major advantage over FST because it can be used to combine data in which different parts of the 16S rRNA were sequenced (e.g., when nonoverlapping sequences can be combined into a single tree using full-length sequences as guides). We use two different data sets to illustrate how analyses with quantitative and qualitative β diversity measures can lead to dramatically different conclusions about the main factors that structure microbial diversity. Specifically, qualitative measures that disregard relative abundance can better detect effects of different founding populations, such as the source of bacteria that first colonize the gut of newborn mice and the effects of factors that are restrictive for microbial growth such as temperature. In contrast, quantitative measures that account for the relative abundance of microbial lineages can reveal the effects of more transient factors such as nutrient availability.

MATERIALS AND METHODS

Weighted UniFrac. Weighted UniFrac is a new variant of the original unweighted UniFrac measure that weights the branches of a phylogenetic tree based on the abundance of information (Fig. 1B). Weighted UniFrac is thus a quantitative measure of β diversity that can detect changes in how many sequences from each lineage are present, as well as detect changes in which taxa are present. This ability is important because the relative abundance of different kinds of bacteria can be critical for describing community changes. In contrast, the original, unweighted UniFrac (Fig. 1A) is a qualitative β diversity measure because duplicate sequences contribute no additional branch length to the tree (by definition, the branch length that separates a pair of duplicate sequences is zero, because no substitutions separate them).

The first step in applying weighted UniFrac is to calculate the raw weighted UniFrac value (u), according to the first equation:

\[ u = \sum_{i}^{n} b_i \times \left| \frac{A_i}{A_T} - \frac{B_i}{B_T} \right| \]

Here, n is the total number of branches in the tree, b_i is the length of branch i, A_i and B_i are the numbers of sequences that descend from branch i in communities A and B, respectively, and A_T and B_T are the total numbers of sequences in communities A and B, respectively. In order to control for unequal sampling effort, A_i and B_i are divided by A_T and B_T.

If the phylogenetic tree is not ultrametric (i.e., if different sequences in the sample have evolved at different rates), clustering with weighted UniFrac will place more emphasis on communities that contain quickly evolving taxa. Since these taxa are assigned more branch length, a comparison of the communities that contain them will tend to produce higher values of u. In some situations, it may be desirable to normalize u so that it has a value of 0 for identical communities and 1 for nonoverlapping communities. This is accomplished by dividing u by a scaling factor (D), which is the average distance of each sequence from the root, as shown in the equation as follows:

\[ D = \sum_{j}^{n} d_j \times \left( \frac{A_j}{A_T} + \frac{B_j}{B_T} \right) \]

Here, d_j is the distance of sequence j from the root, A_j and B_j are the numbers of times the sequences were observed in communities A and B, respectively, and A_T and B_T are the total numbers of sequences from communities A and B, respectively.
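The two equations above can be checked numerically on a small tree. The following is a sketch rather than the published UniFrac implementation: the toy tree, the sequence counts, and the function name are hypothetical, and the tree is stored as a simple child-to-parent mapping.

```python
# Hypothetical ultrametric toy tree: child -> (parent, branch length).
TOY_TREE = {
    "n1": ("root", 0.5),
    "x": ("n1", 0.5),
    "y": ("n1", 0.5),
    "z": ("root", 1.0),
}


def weighted_unifrac(tree, counts_a, counts_b):
    """Return (raw u, normalized u / D) for two communities.

    counts_a and counts_b map leaf names to the number of sequences observed
    in communities A and B. Every leaf must be a key of tree.
    """
    total_a = sum(counts_a.values())  # A_T
    total_b = sum(counts_b.values())  # B_T
    below_a = dict.fromkeys(tree, 0)  # A_i: sequences descending from branch i
    below_b = dict.fromkeys(tree, 0)  # B_i
    scale = 0.0  # D
    for leaf in sorted(set(counts_a) | set(counts_b)):
        a = counts_a.get(leaf, 0)
        b = counts_b.get(leaf, 0)
        depth = 0.0  # d_j: distance of this sequence from the root
        node = leaf
        while node in tree:
            parent, length = tree[node]
            below_a[node] += a
            below_b[node] += b
            depth += length
            node = parent
        scale += depth * (a / total_a + b / total_b)
    raw = sum(
        length * abs(below_a[branch] / total_a - below_b[branch] / total_b)
        for branch, (_, length) in tree.items()
    )
    return raw, raw / scale


# Communities that share no branches: A sits on leaf x, B on leaf z.
print(weighted_unifrac(TOY_TREE, {"x": 3}, {"z": 5}))
# Communities with identical proportions despite different sampling effort.
print(weighted_unifrac(TOY_TREE, {"x": 2, "z": 2}, {"x": 1, "z": 1}))
```

Dividing counts by the community totals is what makes the second comparison come out as zero even though community A was sampled twice as deeply.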

Clustering with normalized u values treats each sample equally instead of

TABLE 1. Measurements of diversity

Measure | Measurement of α diversity | Measurement of β diversity
Only presence/absence of taxa considered | Qualitative (species richness) | Qualitative
Additionally accounts for the no. of times that each taxon was observed | Quantitative (species richness and evenness) | Quantitative

FIG. 1. Calculation of the unweighted and the weighted UniFrac measures. Squares and circles represent sequences from two different environments. (a) In unweighted UniFrac, the distance between the circle and square communities is calculated as the fraction of the branch length that has descendants from either the square or the circle environment (black) but not both (gray). (b) In weighted UniFrac, branch lengths are weighted by the relative abundance of sequences in the square and circle communities; square sequences are weighted twice as much as circle sequences because there are twice as many total circle sequences in the data set. The width of branches is proportional to the degree to which each branch is weighted in the calculations, and gray branches have no weight. Branches 1 and 2 have heavy weights since the descendants are biased toward the square and circles, respectively. Branch 3 contributes no value since it has an equal contribution from circle and square sequences after normalization.
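Panel (a) of the caption, the unweighted measure, can be sketched in the same style: only presence or absence of each community below a branch matters. The toy tree, leaf names, and function name are hypothetical, and this is an illustration of the definition rather than the published implementation.

```python
# Hypothetical toy tree: child -> (parent, branch length).
TOY_TREE = {
    "n1": ("root", 0.5),
    "x": ("n1", 0.5),
    "y": ("n1", 0.5),
    "z": ("root", 1.0),
}


def unweighted_unifrac(tree, leaves_a, leaves_b):
    """Fraction of branch length leading to only one of the two communities."""

    def covered(leaves):
        # All branches that have at least one descendant in the community.
        branches = set()
        for leaf in leaves:
            node = leaf
            while node in tree:
                branches.add(node)
                node = tree[node][0]
        return branches

    in_a = covered(leaves_a)
    in_b = covered(leaves_b)
    unique = sum(tree[branch][1] for branch in in_a ^ in_b)  # either, not both
    total = sum(tree[branch][1] for branch in in_a | in_b)   # either or both
    return unique / total


print(unweighted_unifrac(TOY_TREE, {"x", "z"}, {"y", "z"}))
```

Because duplicates add no branch length, passing sets of leaves (rather than counts) is enough for this qualitative measure.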

VOL. 73, 2007 PHYLOGENETICALLY COMPARING MICROBIAL COMMUNITIES 1577

Kembel SW, Eisen JA, Pollard KS, Green JL (2011) The Phylogenetic Diversity of Metagenomes. PLoS ONE 6(8): e23214. doi:10.1371/journal.pone.0023214

Page 102: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

NMF in Metagenomes

Characterizing the niche-space distributions of components

[Figure, three aligned panels (a), (b), (c). Rows: 45 sampling sites labeled by region, GS sample ID, and habitat (for example, "North American East Coast_GS005_Embayment"), drawn from the North American East Coast, Eastern Tropical Pacific, Polynesia Archipelagos, Galapagos Islands, Indian Ocean, Sargasso Sea, and Caribbean Sea. Column labels: Components 1 to 5; Salinity, Sample Depth, Chlorophyll, Temperature, Insolation, Water Depth. Legends: General (High, Medium, Low, NA); Water depth (>4000m, 2000-4000m, 900-2000m, 100-200m, 20-100m, 0-20m).]

Figure 3: a) Niche-space distributions for our five components (H^T); b) the site-similarity matrix (H^T H); c) environmental variables for the sites. The matrices are aligned so that the same row corresponds to the same site in each matrix. Sites are ordered by applying spectral reordering to the similarity matrix (see Materials and Methods). Rows are aligned across the three matrices.

Figure 3a shows the estimated niche-space distribution for each of the five components. Components 2 (Photosystem) and 4 (Unidentified) are broadly distributed; Components 1 (Signalling) and 5 (Unidentified) are largely restricted to a handful of sites; and component 3 shows an intermediate pattern. There is a great deal of overlap between niche-space distributions for different components.

Figure 3b shows the pattern of filtered similarity between sites. We see clear patterns of grouping, that do not emerge when we calculate functional distances without filtering, or using PCA rather than NMF filtering (Figure 3 in Text S1). As with the Pfams, we see clusters roughly associated with our components, but there is more overlapping than with the Pfam clusters (Figure 2b).

Figure 3c shows the distribution of environmental variables measured at each site. Inspection of Figure 3 reveals qualitative correspondence between environmental factors and clusters of similar sites in the similarity matrix. For example, the “North American East Coast” samples are divided into two groups, one in the top left and the other in the bottom right of the similarity matrix. Inspection of the environmental features suggests that the split in these samples could be mostly due to the differences in insolation and water depth.

We can also examine patterns of similarity between the components themselves,using niche-site distributions or functional profiles (see Figure 5 in Text S1). All 5

Functional biogeography of ocean microbes revealed through non-negative matrix factorization. Jiang et al. In press, PLoS One. Comes out 9/18.

w/ Weitz, Dushoff, Langille, Neches, Levin, etc

Page 103: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

More Markers

Phylogenetic group | Genome Number | Gene Number | Marker Candidates
Archaea | 62 | 145415 | 106
Actinobacteria | 63 | 267783 | 136
Alphaproteobacteria | 94 | 347287 | 121
Betaproteobacteria | 56 | 266362 | 311
Gammaproteobacteria | 126 | 483632 | 118
Deltaproteobacteria | 25 | 102115 | 206
Epsilonproteobacteria | 18 | 33416 | 455
Bacteroides | 25 | 71531 | 286
Chlamydiae | 13 | 13823 | 560
Chloroflexi | 10 | 33577 | 323
Cyanobacteria | 36 | 124080 | 590
Firmicutes | 106 | 312309 | 87
Spirochaetes | 18 | 38832 | 176
Thermi | 5 | 14160 | 974
Thermotogae | 9 | 17037 | 684

Page 104: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

Better Reference Tree

Morgan et al. submitted

Page 105: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

Sifting Families

Workflow boxes shown in the figure (Figure 1, Sharpton et al. submitted):
• Representative Genomes
• Extract Protein Annotation
• All v. All BLAST
• Homology Clustering (MCL)
• SFams
• Align & Build HMMs
• HMMs
• Screen for Homologs
• New Genomes
• Extract Protein Annotation
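The homology clustering (MCL) step in the workflow above can be illustrated with a toy version of Markov clustering. This is a sketch under stated assumptions, not the Sharpton et al. pipeline: it uses a small dense similarity matrix in place of real all v. all BLAST output, and the function names, inflation value, iteration count, and membership threshold are hypothetical choices for illustration.

```python
def normalize_columns(m):
    """Scale every column of a square matrix so that it sums to 1."""
    n = len(m)
    out = [row[:] for row in m]
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        for i in range(n):
            out[i][j] = m[i][j] / total if total else 0.0
    return out


def mcl(similarity, inflation=2.0, iterations=50, threshold=1e-6):
    """Cluster items from a symmetric, non-negative similarity matrix.

    Alternates expansion (matrix squaring, which spreads flow along
    paths) with inflation (element-wise powering followed by column
    normalization, which strengthens strong flow and weakens weak flow).
    Returns clusters as sorted lists of item indices.
    """
    n = len(similarity)
    # Add self-loops so every item keeps some flow to itself.
    m = [[similarity[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        # Expansion: one step of matrix squaring.
        m = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        # Inflation: raise entries to a power, then renormalize columns.
        m = normalize_columns([[x ** inflation for x in row] for row in m])
    # Each row that still carries flow lists the members of one cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > threshold)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    # Hypothetical similarities among five proteins: 0-2 hit each other,
    # 3-4 hit each other, and there are no hits between the two groups.
    hits = [[0, 1, 1, 0, 0],
            [1, 0, 1, 0, 0],
            [1, 1, 0, 0, 0],
            [0, 0, 0, 0, 1],
            [0, 0, 0, 1, 0]]
    for family in mcl(hits):
        print("family:", family)
```

At genome scale one would run the MCL program itself on a sparse graph of BLAST scores; the dense matrices here are only practical for a handful of proteins.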

Page 106: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

Zorro - Automated Masking

[Figure: distance to true tree (y-axis, 0.0 to 9.0) against sequence length (x-axis: 200, 400, 800, 1600, 3200), comparing no masking, zorro, and gblocks.]

Wu M, Chatterji S, Eisen JA (2012) Accounting For Alignment Uncertainty in Phylogenomics. PLoS ONE 7(1): e30288. doi:10.1371/journal.pone.0030288
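Masking here means dropping alignment columns judged unreliable before a tree is built. A minimal sketch of that filtering step, assuming per-column confidence scores have already been computed by some tool; the function name, the example scores, and the cutoff of 4.0 are hypothetical and are not taken from the Zorro paper.

```python
def mask_alignment(alignment, column_scores, min_score):
    """Keep only the alignment columns whose score is at least min_score.

    alignment maps sequence names to aligned strings of equal length;
    column_scores holds one confidence value per alignment column.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if lengths != {len(column_scores)}:
        raise ValueError("every sequence must have one score per column")
    keep = [i for i, score in enumerate(column_scores) if score >= min_score]
    return {name: "".join(seq[i] for i in keep)
            for name, seq in alignment.items()}


if __name__ == "__main__":
    # Hypothetical two-sequence alignment with a poorly supported third column.
    aligned = {"taxonA": "MK-LV", "taxonB": "MKALV"}
    scores = [9.5, 8.0, 1.2, 7.0, 9.9]
    print(mask_alignment(aligned, scores, min_score=4.0))
```

The masked alignment is what would then be handed to the tree-building program.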

Page 107: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

Phylogenomics Future 2

• We have still only scratched the surface of microbial diversity

Page 108: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

rRNA Tree of Life

Figure from Barton, Eisen et al. “Evolution”, CSHL Press. 2007.

Based on tree from Pace 1997 Science 276:734-740

Archaea

Eukaryotes

Bacteria

Page 110: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

PD: Genomes + GEBA

From Wu et al. 2009 Nature 462, 1056-1060

Page 113: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

Uncultured Lineages: Methods

• Get into culture
• Enrichment cultures
• If abundant in low diversity ecosystems
• Flow sorting
• Microbeads
• Microfluidic sorting
• Single cell amplification

Page 114: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

Number of SAGs from Candidate Phyla

Site                                    OD1   OP11   OP3   SAR406
Site A: Hydrothermal vent                 4      1     -        -
Site B: Gold Mine                         6     13     2        -
Site C: Tropical gyres (Mesopelagic)      -      -     -        2
Site D: Tropical gyres (Photic zone)      1      -     -        -

Sample collections at 4 additional sites are underway.

Phil Hugenholtz

GEBA Uncultured

Page 115: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

Uncultured Eukaryotes?

Page 116: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

Phylogenomics Future 3

• Need Experiments from Across the Tree of Life too

Page 117: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

[Figure: tree of bacterial phyla. Tip labels: Acidobacteria, Bacteroides, Fibrobacteres, Gemmatimonas, Verrucomicrobia, Planctomycetes, Chloroflexi, Proteobacteria, Chlorobi, Firmicutes, Fusobacteria, Actinobacteria, Cyanobacteria, Chlamydia, Spirochaetes, Deinococcus-Thermus, Aquificae, Thermotogae, TM6, OS-K, Termite Group, OP8, Marine Group A, WS3, OP9, NKB19, OP3, OP10, TM7, OP1, OP11, Nitrospira, Synergistes, Deferribacteres, Thermodesulfobacteria, Chrysiogenetes, Thermomicrobia, Dictyoglomus, Coprothermobacter.]

• At least 40 phyla of bacteria

As of 2002

Tree Based on Hugenholtz, 2002. http://genomebiology.com/2002/3/2/reviews/0003

Page 118: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

[Figure: the same tree of bacterial phyla as on Page 117.]

• At least 40 phyla of bacteria

• Experimental studies are mostly from three phyla

As of 2002

Tree Based on Hugenholtz, 2002. http://genomebiology.com/2002/3/2/reviews/0003

Page 119: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

[Figure: the same tree of bacterial phyla as on Page 117.]

• At least 40 phyla of bacteria

• Experimental studies are mostly from three phyla

• Some studies in other phyla

As of 2002

Tree Based on Hugenholtz, 2002. http://genomebiology.com/2002/3/2/reviews/0003

Page 120: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

[Figure: the same tree of bacterial phyla as on Page 117.]

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Some other phyla are only sparsely sampled

• Same trend in Eukaryotes

As of 2002

Tree Based on Hugenholtz, 2002. http://genomebiology.com/2002/3/2/reviews/0003

Page 121: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

[Figure: the same tree of bacterial phyla as on Page 117.]

• At least 40 phyla of bacteria

• Genome sequences are mostly from three phyla

• Some other phyla are only sparsely sampled

• Same trend in Viruses

As of 2002

Tree Based on Hugenholtz, 2002. http://genomebiology.com/2002/3/2/reviews/0003

Page 122: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

[Figure: the same tree of bacterial phyla as on Page 117, with a 0.1 scale bar.]

Tree based on Hugenholtz (2002) with some modifications.

Need experimental studies from across the tree too

Tree Based on Hugenholtz, 2002. http://genomebiology.com/2002/3/2/reviews/0003

Page 123: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

[Figure: the same tree of bacterial phyla as on Page 117, with a 0.1 scale bar.]

Tree based on Hugenholtz (2002) with some modifications.

Adopt a Microbe

Tree Based on Hugenholtz, 2002. http://genomebiology.com/2002/3/2/reviews/0003

Page 124: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

What Next?

Page 125: The need for a phylogeny driven genomic encyclopedia of eukaryotes #SMBEEuks

Acknowledgements

• GEBA:
  • $$: DOE-JGI, DSMZ
  • Eddy Rubin, Phil Hugenholtz, Hans-Peter Klenk, Nikos Kyrpides, Tanya Woyke, Dongying Wu, Aaron Darling, Jenna Lang
• GEBA Cyanobacteria
  • $$: DOE-JGI
  • Cheryl Kerfeld, Dongying Wu, Patrick Shih
• Haloarchaea
  • $$: NSF
  • Marc Facciotti, Aaron Darling, Erin Lynch
• iSEEM:
  • $$: GBMF
  • Katie Pollard, Jessica Green, Martin Wu, Steven Kembel, Tom Sharpton, Morgan Langille, Guillaume Jospin, Dongying Wu
• aTOL
  • $$: NSF
  • Naomi Ward, Jonathan Badger, Frank Robb, Martin Wu, Dongying Wu
• Others (not mentioned in detail)
  • $$: NSF, NIH, DOE, GBMF, DARPA, Sloan
  • Frank Robb, Craig Venter, Doug Rusch, Shibu Yooseph, Nancy Moran, Colleen Cavanaugh, Josh Weitz
• EisenLab: Srijak Bhatnagar, Russell Neches, Lizzy Wilbanks, Holly Bik
