A phylogeny-driven genomic encyclopedia of bacteria and archaea
(or what is GEBA anyway?)
Jonathan A. Eisen
October 27, 2009
From http://genomesonline.org
rRNA Tree of Life
The Tree is not Happy
Acidobacteria
Bacteroides
Fibrobacteres
Gemmimonas
Verrucomicrobia
Planctomycetes
Chloroflexi
Proteobacteria
Chlorobi
Firmicutes
Fusobacteria
Actinobacteria
Cyanobacteria
Chlamydia
Spirochaetes
Deinococcus-Thermus
Aquificae
Thermotogae
TM6
OS-K
Termite Group
OP8
Marine Group A
WS3
OP9
NKB19
OP3
OP10
TM7
OP1
OP11
Nitrospira
Synergistes
Deferribacteres
Thermodesulfobacteria
Chrysiogenetes
Thermomicrobia
Dictyoglomus
Coprothermobacter
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Same trend in Archaea
As of 2002
Based on Hugenholtz, 2002
Need for Tree Guidance Well Established
• Common approach within some eukaryotic groups
• Many small projects funded to fill in some bacterial or archaeal gaps
• Phylogenetic gaps in bacterial and archaeal projects commonly lamented in literature
[Figure: tree of bacterial phyla, with the same phylum labels as the tree above]
• At least 100 phyla of bacteria
• Genome sequences are mostly from three phyla
• Most phyla with cultured species are sparsely sampled
• Lineages with no cultured taxa even more poorly sampled
• Solution - use tree to really fill gaps
Well sampled phyla
http://www.jgi.doe.gov/programs/GEBA/pilot.html
GEBA Pilot Project Overview
• Identify major branches in rRNA tree for which no genomes are available
• Identify a cultured representative for each group
• Grow > 200 of these and prep. DNA
• Sequence and finish 100
• Annotate, analyze, release data
• Assess benefits of tree guided sequencing
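The first two selection steps above can be sketched in code. This is only an illustrative sketch, not the pilot project's actual pipeline: the `Organism` record, the `pick_candidates` helper, and all strain and lineage names are hypothetical, and a real selection would work from the rRNA tree itself rather than from pre-assigned lineage labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Organism:
    name: str         # hypothetical strain or clone label
    lineage: str      # major rRNA-tree branch the organism falls in
    cultured: bool    # True if a cultured isolate is available
    has_genome: bool  # True if a genome sequence already exists


def pick_candidates(organisms):
    """Return one cultured organism for each lineage that has no genome yet."""
    # Lineages that already have at least one sequenced member are not gaps.
    sequenced = {o.lineage for o in organisms if o.has_genome}
    candidates = {}
    # Sort by name so the choice within a lineage is deterministic.
    for o in sorted(organisms, key=lambda o: o.name):
        if o.lineage in sequenced or not o.cultured:
            continue
        candidates.setdefault(o.lineage, o.name)
    return candidates


# Toy input (assumed data, for illustration only).
organisms = [
    Organism("strain_A1", "lineage_A", cultured=True, has_genome=True),
    Organism("strain_A2", "lineage_A", cultured=True, has_genome=False),
    Organism("strain_B1", "lineage_B", cultured=True, has_genome=False),
    Organism("strain_B2", "lineage_B", cultured=True, has_genome=False),
    Organism("clone_C1", "lineage_C", cultured=False, has_genome=False),
]

print(pick_candidates(organisms))
```

In this toy input, lineage_A is skipped because it already has a genome, and lineage_C is a gap with no cultured representative, so it yields no candidate.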
GEBA Pilot Project: Components
• Project overview (Phil Hugenholtz, Nikos Kyrpides, Jonathan Eisen, Eddy Rubin, Jim Bristow)
• Project management (David Bruce, Eileen Dalin, Lynne Goodwin)
• Culture collection and DNA prep (DSMZ, Hans-Peter Klenk)
• Sequencing and closure (Eileen Dalin, Susan Lucas, Alla Lapidus, Mat Nolan, Alex Copeland, Cliff Han, Feng Chen, Jan-Fang Cheng)
• Annotation and data release (Nikos Kyrpides, Victor Markowitz, et al.)
• Analysis (Dongying Wu, Kostas Mavrommatis, Martin Wu, Victor Kunin, Neil Rawlings, Ian Paulsen, Patrick Chain, Patrik D’Haeseleer, Sean Hooper, Iain Anderson, Amrita Pati, Natalia N. Ivanova, Athanasios Lykidis, Adam Zemla)
• Adopt a microbe education project (Cheryl Kerfeld)
• Outreach (David Gilbert)
• $$$ (DOE, Eddy Rubin, Jim Bristow)
Some Lessons From GEBA
GEBA Lesson 1
rRNA Tree of Life is a Useful Guide and Genomes Improve Resolution
GEBA Lesson 2
Phylogenetically Guided Selection Can Help Annotate Other Genomes
Most/All Functional Prediction Improves w/ Better Phylogenetic Sampling
• Better definition of protein family sequence “patterns”
• Greatly improves “comparative” and “evolutionary” based predictions
• Conversion of hypothetical into conserved hypotheticals
• Linking distantly related members of protein families
• Improved non-homology prediction
Kostas Mavrommatis
Natalia Ivanova
Thanos Lykidis
Nikos Kyrpides
Iain Anderson
GEBA Lesson 3
Phylogenetically Guided Selection Can Help Study Uncultured Organisms
Environmental Shotgun Sequencing
[Figure: schematic of environmental shotgun sequencing]
Binning challenge
Metagenomic Analysis Improves
Sean Hooper
Amrita Pati
• Small but real improvement in metagenomic annotation and analysis
GEBA Lesson 4
We have still only scratched the surface of microbial diversity
Protein Family Rarefaction Curves
• Take data set of multiple complete genomes
• Identify all protein families using MCL
• Plot # of genomes vs. # of protein families
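The plotting step above can be sketched as follows. This is a minimal sketch under stated assumptions: the family assignments are taken as already computed (for example by an MCL clustering run, which is not reproduced here), the `rarefaction_curve` function and the genome and family names are hypothetical, and averaging over random genome orderings is one common way to smooth such a curve, not necessarily the method used for the slides.

```python
import random


def rarefaction_curve(genome_families, n_permutations=100, seed=0):
    """Mean number of distinct protein families after adding 1..N genomes.

    genome_families maps a genome name to the set of protein family IDs
    found in that genome (assumed to come from a prior clustering step).
    """
    rng = random.Random(seed)
    genomes = list(genome_families)
    totals = [0] * len(genomes)
    for _ in range(n_permutations):
        # Add genomes in a random order and track the families seen so far.
        rng.shuffle(genomes)
        seen = set()
        for i, genome in enumerate(genomes):
            seen |= genome_families[genome]
            totals[i] += len(seen)
    return [total / n_permutations for total in totals]


# Toy input (assumed data, for illustration only).
genome_families = {
    "genome_1": {"fam1", "fam2", "fam3"},
    "genome_2": {"fam2", "fam3", "fam4"},
    "genome_3": {"fam1", "fam5"},
}

curve = rarefaction_curve(genome_families)
for n_genomes, mean_families in enumerate(curve, start=1):
    print(n_genomes, round(mean_families, 2))
```

The printed pairs are the x and y values of the curve. A curve that keeps climbing as genomes are added suggests that new protein families are still being discovered, while a plateau suggests saturation.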
Phylogenetic Distribution Novelty: 1st Bacterial Actin Related Protein
Haliangium ochraceum DSM 14365
Victor Kunin
Patrik D’haeseleer
Adam Zemla
Phylogenetic Diversity with GEBA
Phylogenetic Diversity: Isolates
Phylogenetic Diversity: All
[Figure: tree of bacterial phyla, with the same phylum labels as the first tree slide]
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Most phyla with cultured species are sparsely sampled
• Lineages with no cultured taxa even more poorly sampled
Well sampled phyla
Poorly sampled
No cultured taxa
Uncultured Lineages: Technical Approaches
• Get into culture
• Enrichment cultures
• If abundant in low diversity ecosystems
• Flow sorting
• Microbeads
• Microfluidic sorting
• Single cell amplification
GEBA Lesson 6
Need Experiments from Across the Tree of Life too
Adopt a Microbe
MICROBES
A Happy Tree of Life
Related Lesson 1
METADATA ROCKS
SIGS
• The Genomic Standards Consortium
• The GSC is an open-membership working body which formed in September 2005.
• The goal of this international community is to promote mechanisms that standardize the description of genomes and the exchange and integration of genomic data.
• See http://gensc.org/gc_wiki/index.php/Main_Page