Invited lecture at University of Vienna on extracting genomes from metagenomes.
Extracting genomes from metagenomes
Mads Albertsen, PhD Student (2011-2014)
02-12-2013 @ University of Vienna
CENTER FOR MICROBIAL COMMUNITIES
Aalborg
Per H. Nielsen
CENTER FOR MICROBIAL COMMUNITIES | AALBORG UNIVERSITY
Microbial Ecology: Who - when, where and why?
1/13
Sewerage system
Occasional breakdowns
Strike
Microbial Ecology
Nielsen et al., 2012 Curr. Opin. Biotechnol. 23:452-9
Biological wastewater treatment
Aalborg
Hjørring
Århus
Odense
MiDAS (since 2006): 4 samples/year = 7; 2 samples/year = 6; some years = 16
Copenhagen
Nielsen et al., 2012 Curr. Opin. Biotechnol. 23:452-9
30 abundant core genera in all Danish EBPR WWTPs
Functional studies using MAR-FISH
Nielsen et al., 2012 Curr. Opin. Biotechnol. 23:452-9
qFISH
www.midasfieldguide.org
Omics: DNA → metagenomics; mRNA → metatranscriptomics; proteins → metaproteomics; metabolites → metabolomics
Data integration + in situ methods → community structure and microbial functions
P-Removal:
N-Removal:
-Removal:
Foaming:
Ethanol production:
Microbial needs
Ecology
Understanding ecosystems
Albertsen et al., 2012, ISME J 6: 1094-106
Omics requires good reference genomes!
Available genomes (+)
(+)
(+)
Culturing
How do we get the genomes?
Few microorganisms can be easily cultured (<<5%)
Tetrasphaera: Kristiansen et al., 2013, ISME J 7: 543-54
Microthrix: McIlroy et al., 2013, ISME J 7: 1161-72
What you think you study vs. what you actually study
Single cell genomics
How do we get the genomes?
Culturing: few microorganisms can be easily cultured (<<5%)
Single cell genomics: only routinely performed in specialized labs; very incomplete genomes (mean 40%, range 10-90%)
Metagenomics
www.bigelow.org
What is a genome?
Genome = parts list of a single species
Metagenome = Parts list of the community
Photo: D. Kunkel; color, E. Latypova
What is a metagenome?
"...functional analysis of the collective genomes of soil microflora, which we term the metagenome of the soil."
- J. Handelsman et al., 1998
PubMed: metagenom*[Title/Abstract]
Metagenomics is hot!
Sequencing costs
http://www.genome.gov/sequencingcosts/
Sequencing is cheap!
Metagenomics
DNA extraction → Sequencing → Reads (100-150 bp) → Assembly → Contigs (1000+ bp) → Search against database
100++ abundant species (≈3 Mbp each)
Search against database:
Phylogenetic classification: Who is there? (Bacterium A, Bacterium B ... Bacterium X)
Functional classification: What can they do? (Gene A, Gene B ... Gene X)
Omics requires good reference genomes!
"If you want to understand the ecosystem, you need to understand the individual species in the ecosystem"
Metagenomics
Lion + Eagle ≠ Flying Lion
Why not full genomes?
1. Micro-diversity
2. Separation of genomes (Binning)
100++ Abundant species (≈3 Mbp each)
Micro-diversity
Not 1 strain, but many closely related strains:
AAAAAAAAAAAAAA
AAAAAAAAATAAAA
AAAAAAAAACAAAA
Assembly → what you get: one shared piece (AAAAAAAAA) followed by three separate short pieces (AAAAA, TAAAA, CAAAA)
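Why the assembly breaks can be illustrated with a toy de Bruijn graph, the structure used by many short-read assemblers. This is a minimal sketch under two assumptions: the slide's all-A sequences are replaced by three hypothetical strain sequences that differ by a single SNP, and the k-mer size of 5 is arbitrary. Nodes where the graph forks or joins are the points at which an assembler can no longer extend a contig unambiguously.

```python
from collections import defaultdict


def ambiguous_nodes(reads, k):
    """Return the (k-1)-mers where a de Bruijn graph forks or joins.

    A contig can only be extended unambiguously through nodes with exactly
    one successor and one predecessor, so these nodes mark contig breaks.
    """
    successors = defaultdict(set)
    predecessors = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            left, right = kmer[:-1], kmer[1:]  # edge: prefix node -> suffix node
            successors[left].add(right)
            predecessors[right].add(left)
    forks = {node for node, nxt in successors.items() if len(nxt) > 1}
    joins = {node for node, prev in predecessors.items() if len(prev) > 1}
    return sorted(forks | joins)


# Hypothetical strains that are identical except at 0-based position 9.
strains = ["GATTACAGCATGCCA", "GATTACAGCTTGCCA", "GATTACAGCGTGCCA"]
print(ambiguous_nodes(strains, k=5))  # ['CAGC', 'TGCC']
```

A single strain gives no ambiguous nodes; adding the two SNP variants creates a fork before the SNP and a join after it, splitting the sequence into a shared piece, three short variant pieces and a shared tail.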
Low micro-diversity vs. high micro-diversity
Short term enrichment
Micro-diversity
DNA extraction → Sequencing → Reads (100-150 bp) → Assembly → Binning
Binning
Complex sample → PhD student → "Binning"
Genomic signatures (e.g. GC content and codon usage)
Tetranucleotide frequency + statistical method
Short pieces of DNA sequence (1-10 kbp)
Local sequence divergence
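The genomic-signature idea can be sketched as a tetranucleotide frequency profile computed per contig. This is a minimal illustration under three assumptions: each 4-mer is pooled with its reverse complement into one canonical 4-mer (a common convention), 4-mers containing N are skipped, and no normalization or statistical method is layered on top as real binning tools would do. The sketch also shows why short contigs are a problem: a 1 kbp contig yields fewer than 1,000 4-mers to spread over 136 canonical bins.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, rc)


# All 136 canonical tetranucleotides (256 4-mers with reverse complements pooled).
TETRAMERS = sorted({canonical("".join(p)) for p in product("ACGT", repeat=4)})


def tetranucleotide_frequency(seq):
    """Relative frequency of each canonical 4-mer in seq (4-mers with N are skipped)."""
    seq = seq.upper()
    counts = dict.fromkeys(TETRAMERS, 0)
    total = 0
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if set(kmer) <= set("ACGT"):
            counts[canonical(kmer)] += 1
            total += 1
    if total == 0:
        return [0.0] * len(TETRAMERS)
    return [counts[t] / total for t in TETRAMERS]


profile = tetranucleotide_frequency("ACGTACGTNNACGTTTGA")
print(len(profile), round(sum(profile), 6))  # 136 1.0
```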
"Metagenomics can be used to measure the abundance of the organisms in the original sample."
Binning
Original sample → Sequencing → Metagenome reads → Assembly → Scaffolds
Mapping the reads back to the scaffolds gives each scaffold's abundance (e.g. 3x, 1x, 1x coverage)
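The abundance read-out is the average depth of coverage of each scaffold. This is a minimal sketch assuming the number of reads mapped to each scaffold is already known and all reads have the same length. The scaffold names and counts are hypothetical and were chosen only to reproduce the 3x/1x/1x example. Coverage is computed as mapped reads × read length / scaffold length.

```python
def mean_coverage(mapped_reads, read_length, scaffold_length):
    """Average sequencing depth of a scaffold (x-fold coverage)."""
    return mapped_reads * read_length / scaffold_length


# Hypothetical scaffolds: name -> (mapped reads, scaffold length in bp)
scaffolds = {
    "scaffold_1": (300, 10_000),
    "scaffold_2": (100, 10_000),
    "scaffold_3": (50, 5_000),
}
READ_LENGTH = 100  # bp, assumed identical for all reads

for name, (reads, length) in scaffolds.items():
    print(name, f"{mean_coverage(reads, READ_LENGTH, length):.0f}x")
```

Scaffolds from the same genome should share roughly the same coverage. This is what makes coverage usable as a binning signal that is independent of sequence composition.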
Binning
Sequence composition-independent binning
[Figure: abundance of each scaffold in sample 1 vs. sample 2]
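Composition-independent binning groups scaffolds whose coverage co-varies across samples. The sketch below is not the procedure used in the talk, where bins were selected from such plots. It is a hypothetical automatic stand-in that uses single-linkage clustering with a union-find: scaffolds whose log10 coverages in two samples lie within an assumed distance threshold (`max_dist`) are linked, and each connected group becomes a bin.

```python
import math


def coverage_bins(coverages, max_dist=0.1):
    """Single-linkage clustering of scaffolds in log10(coverage) space.

    coverages: dict scaffold -> (coverage in sample 1, coverage in sample 2), all > 0
    max_dist: hypothetical linkage threshold in log10 units
    Returns a sorted list of bins, each a sorted list of scaffold names.
    """
    names = list(coverages)
    points = {n: tuple(math.log10(c) for c in coverages[n]) for n in names}
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    # Link every pair of scaffolds that are close in both samples.
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if math.dist(points[a], points[b]) <= max_dist:
                parent[find(a)] = find(b)

    bins = {}
    for n in names:
        bins.setdefault(find(n), []).append(n)
    return sorted(sorted(b) for b in bins.values())


# Hypothetical coverages: two genomes with opposite abundance patterns.
cov = {
    "c1": (50.0, 10.0), "c2": (55.0, 11.0), "c3": (48.0, 9.5),
    "c4": (5.0, 40.0), "c5": (5.5, 42.0),
}
print(coverage_bins(cov))  # [['c1', 'c2', 'c3'], ['c4', 'c5']]
```

Log-transforming coverage keeps a fixed threshold meaningful for both abundant and rare genomes. A real workflow would combine this signal with genomic signatures and marker genes.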
Binning
1. Reduce micro-diversity
2. Use multiple related samples
Binning
H. Daims & C. Dorninger, DOME, University of Vienna
• Nitrospira enrichment running for years
• 3 dominant species
• No micro-diversity
Binning
1. Reduction of (micro)-diversity: short-term enrichment (full-scale EBPR plant → SBR reactor, days)
Albertsen et al., 2013 Nat. Biotech.
2. Two different DNA extraction methods
Colored using a set of 100 phylogenetic marker genes
TM7-1 (1.6%)
TM7-2 (0.7%)
TM7-3 (0.2%)
TM7-4 (0.06%)
Zoom on target: TM7-2 (0.7%)
[Figure: PCA on genomic signatures (PC1 vs. PC2) highlighting TM7-2]
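A PCA on genomic signatures projects each scaffold's signature vector (for example its tetranucleotide profile) onto the directions of largest variance. This can separate a target bin from scaffolds it overlaps with in coverage space. This is a minimal NumPy sketch, assuming the rows of `X` are per-scaffold signature vectors; the two groups of random data below are hypothetical.

```python
import numpy as np


def pca(X, n_components=2):
    """Project rows of X onto their first principal components via SVD."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


rng = np.random.default_rng(0)
# Two hypothetical groups of scaffolds with different mean 136-dim signatures.
group_a = rng.normal(0.0, 0.001, size=(20, 136)) + 1 / 136
group_b = rng.normal(0.0, 0.001, size=(20, 136)) + np.linspace(0, 2 / 136, 136)
scores = pca(np.vstack([group_a, group_b]))
print(scores.shape)  # (40, 2)
```

Here PC1 follows the difference between the two group means, so the groups fall on opposite sides along the first axis.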
TM7-1 (1.6%)
Candidate phylum TM7
Saccharibacteria
Candidatus Saccharimonas aalborgensis
Genome validation
Essential single-copy genes (HMM models; figure: genes vs. phyla)
Assembly inspection
In situ confirmation
P.L. Larsen, S.J. McIlroy
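Essential single-copy genes give a quick quality estimate for a bin, because each should be present exactly once in a complete, uncontaminated genome. This is a minimal sketch assuming the HMM search has already produced a list of marker-gene hits for the bin. The 5-gene marker set is hypothetical; the talk used a larger HMM set. Counting markers found more than once is one common contamination heuristic, not necessarily the one used in the talk.

```python
from collections import Counter


def bin_quality(marker_hits, marker_set):
    """Estimate completeness and redundancy of a genome bin.

    marker_hits: marker-gene names found in the bin (one entry per hit)
    marker_set: essential single-copy genes expected once per genome
    Returns (completeness in %, number of markers found more than once).
    """
    counts = Counter(hit for hit in marker_hits if hit in marker_set)
    completeness = 100 * len(counts) / len(marker_set)
    duplicated = sum(1 for n in counts.values() if n > 1)
    return completeness, duplicated


# Hypothetical marker set and HMM hits from one bin
markers = {"rpsB", "rpsC", "rplB", "rplC", "gyrB"}
hits = ["rpsB", "rpsC", "rplB", "rplB", "gyrB"]
print(bin_quality(hits, markers))  # (80.0, 1)
```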
http://madsalbertsen.github.io/multi-metagenome/ (short: goo.gl/0ctA3)
• Guides
• Workflow scripts
• Example data
• All the code
• Recommendations
R markdown enables reproducible and transparent genome extractions
Multi-metagenome
It’s just a potential!
..and a poor description of it.
Competibacter
McIlroy and Albertsen et al., 2013, ISME J (AOP).
Competibacter has the potential to negatively influence phosphorus removal in wastewater treatment.
Disagreement in the literature on glycolytic pathways, with consequences for modeling.
Candidatus Competibacter odensis
(44%)
GAO989
Competibacter
FISH with Competibacter specific probe
MAR with ³H-labeled glucose
McIlroy and Albertsen et al., 2013, ISME J (AOP).
Obtaining genomes is easy…
… but they are useless without high-quality annotations, in situ validation and good questions!
Per H. Nielsen, Simon J. McIlroy, Søren M. Karst, EB group
H. Daims, C. Dorninger (University of Vienna)
G.W. Tyson, P. Hugenholtz (University of Queensland)
Questions? @MadsAlbertsen85
Databases
Contigs → Databases → Annotated metagenome
...you only see what is in the database
What is in the databases?
Finished genomes in IMG vs. Greengenes 16S rRNA database
                  Phyla  Class  Order  Species
Genomes (IMG)        29     46    100     1268
16S (Greengenes)     90    249    405   99322*
Note: genomes include only 1 strain per species
* 97% clustering
MG-RAST example
Sludge microbes vs. database genomes
Contigs: 650,000 EBPR proteins with taxonomy assigned
How similar are they to the genomes in the database?
650,000 EBPR proteins vs. 1,260,000 human gut (Qin et al., 2010 Nature; RAST ID: 4448044.3)
Note: not abundance weighted
Sludge microbes vs. Database genomes
The 7 genera with most EBPR proteins assigned
Effect of missing genomes
What is the effect of not having closely related genomes in the database?
1. Remove a genome from the database
2. Search the removed genome against the database
Effect of missing genomes
Accumulibacter phosphatis (4326 proteins), blastp, best hit
Related genomes in the database: Bacteria 1268, Proteobacteria 564, Betaproteobacteria 84, Rhodocyclales 5, Rhodocyclaceae 5
Best hit: Azoarcus
MEGAN LCA
Lowest common ancestor (LCA) approach:
Hit 1: Betaproteobacteria, 80% ID
Hit 2: Gammaproteobacteria, 79% ID
Hit 3: Actinobacteria, 59% ID
→ Assigned to Proteobacteria
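The LCA idea fits in a few lines. This is a minimal sketch assuming each hit carries a taxonomy path and a score. Percent identity stands in for the score here; MEGAN itself works on alignment bit scores. A hypothetical `top_percent` filter, similar in spirit to MEGAN's top-percent parameter (whose defaults and exact behaviour may differ), drops hits scoring well below the best one. With the three example hits, the Actinobacteria hit is filtered out and the remaining two share Bacteria and Proteobacteria.

```python
def lca(hits, top_percent=10.0):
    """Assign a protein to the lowest common ancestor of its best hits.

    hits: list of (taxonomy path as a tuple, score)
    top_percent: keep hits scoring within this percentage of the best score
    Returns the longest taxonomy prefix shared by all retained hits.
    """
    if not hits:
        return ()
    best = max(score for _, score in hits)
    kept = [path for path, score in hits if score >= best * (1 - top_percent / 100)]
    common = []
    # Walk rank by rank; stop at the first rank where the kept hits disagree.
    for ranks in zip(*kept):
        if len(set(ranks)) != 1:
            break
        common.append(ranks[0])
    return tuple(common)


hits = [
    (("Bacteria", "Proteobacteria", "Betaproteobacteria"), 80),
    (("Bacteria", "Proteobacteria", "Gammaproteobacteria"), 79),
    (("Bacteria", "Actinobacteria"), 59),
]
print(lca(hits))  # ('Bacteria', 'Proteobacteria')
```

Loosening the filter (for example `top_percent=50`) keeps the Actinobacteria hit, and the assignment collapses to Bacteria. When no close relative is in the database, hits from distant taxa compete and pull assignments up the tree.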
MEGAN LCA assignments of the 4326 Accumulibacter phosphatis proteins: no hits 261; Bacteria 325; Proteobacteria 860; Betaproteobacteria 853; Rhodocyclaceae 1149
• 27% correctly classified at genus level
• 54% not assigned the correct class
• 101 genera identified
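Percentages such as "27% correctly classified at genus level" come from comparing each protein's assignment with the known lineage of the genome that was removed from the database. This is a minimal sketch of that bookkeeping. It assumes assignments are taxonomy paths, such as those an LCA step returns, and that the rank is given as a depth in the path. The lineage and assignments below are hypothetical and are not the data behind the slide.

```python
def fraction_correct(assignments, true_path, depth):
    """Fraction of assignments that reach `depth` and match true_path there.

    assignments: list of taxonomy paths (tuples); an empty tuple means no hit
    true_path: the known lineage of the left-out genome
    depth: number of ranks to compare (e.g. 2 = domain + phylum)
    """
    if not assignments:
        return 0.0
    target = tuple(true_path[:depth])
    correct = sum(1 for path in assignments if tuple(path[:depth]) == target)
    return correct / len(assignments)


lineage = ("Bacteria", "Proteobacteria", "Betaproteobacteria")
assigned = [
    (),                                                    # no hit
    ("Bacteria",),                                         # too unspecific
    ("Bacteria", "Proteobacteria"),
    ("Bacteria", "Proteobacteria", "Betaproteobacteria"),
]
print(fraction_correct(assigned, lineage, depth=2))  # 0.5
```

Proteins that get no hit, or stop at a higher rank, count as not correctly classified at the deeper rank. As a result, the fraction drops quickly as the depth increases.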
Effect of missing genomes: Nitrospira defluvii
blastp + MEGAN LCA
Related genomes in the database: Bacteria 1268, Nitrospirae 3
4268 proteins:
• 1% correctly classified at phylum level
What about function?
Nitrospira defluvii: blastp + MEGAN LCA + KEGG
Implication of missing genomes
Function A
Function B
Function C
Function D
Pitfalls
You always get billions of data points!