Course: Bioinformatics for Biological Researchers (2014). Session: 3.1 - Introduction to Metagenomics. Applications, Approaches and Tools. Statistics and Bioinformatics Unit (UEB) from Vall d'Hebron Research Institute (www.vhir.org), Barcelona.
Hospital Universitari Vall d’Hebron Institut de Recerca - VHIR
Institut d’Investigació Sanitària de l’Instituto de Salud Carlos III (ISCIII)
Bioinformatics for Biological Researchers
http://eib.stat.ub.edu/2014BBR
Ferran Briansó [email protected]
28/05/2014
INTRODUCTION TO METAGENOMICS
PRESENTATION OUTLINE
1. Introduction
2. Applications
3. Basic Concepts
4. Approaches & Workflows
   1. Whole Genome Shotgun
   2. 16S/ITS Community Surveys
5. Analysis Tools
   1. MEGAN
   2. Mothur
   3. Qiime
   4. Axiome & CloVR
   5. MG-RAST
6. More resources
1 INTRODUCTION
Introduction | Metagenomics definition
First use of the term metagenome, referencing the idea that a collection of genes sequenced from the environment could be analyzed in a way analogous to the study of a single genome.
Handelsman, J.; Rondon, M. R.; Brady, S. F.; Clardy, J.; Goodman, R. M. (1998). "Molecular biological access to the chemistry of unknown soil microbes: A new frontier for natural products". Chemistry & Biology 5 (10): R245–R249. doi:10.1016/S1074-5521(98)90108-9. PMID 9818143
Current definition: “The application of modern genomics techniques to the study of communities of microbial organisms directly in their natural environments, bypassing the need for isolation and lab cultivation of individual species.”
Chen, K.; Pachter, L. (2005). "Bioinformatics for Whole-Genome Shotgun Sequencing of Microbial Communities". PLoS Computational Biology 1 (2): e24. doi:10.1371/journal.pcbi.0010024
Introduction | Historical context
Source: http://howcoolismyresear.ch/#metagenomics
Introduction | Basic purpose
2 APPLICATIONS
Applications | What metagenomics can do
● Global Impacts. The role of microbes is critical in maintaining atmospheric balances, as they are
  ● the main photosynthetic agents
  ● responsible for the generation and consumption of greenhouse gases
  ● involved at all levels in ecosystems and trophic chains
● Bioremediation. Cleaning up environmental contamination, such as
  ● the waste from water treatment facilities
  ● gasoline leaks on land or oil spills in the oceans
  ● toxic chemicals
● Bioenergy. We are harnessing microbial power in order to produce
  ● ethanol (from cellulose), hydrogen, methane, butanol...
● Smart Farming. Microbes help our crops by
  ● the “suppressive soil” phenomenon (buffer effect against disease-causing organisms)
  ● soil enrichment and regeneration
● The World Within. Studying the human microbiome may lead to valuable new tools and guidelines in
  ● human and animal nutrition
  ● better understanding of complex diseases (obesity, cancer, asthma...)
  ● drug discovery
  ● preventative medicine
Grice E.A. & Segre J.A. (2012) The Human Microbiome: Our Second Genome, Annu. Rev. Genomics Human Genet. 13, 151-170
Applications | Mapping the Human Microbiome
3 BASIC CONCEPTS
Concepts | Trimming
● Trimming is the pre-processing step of cleaning sequence data from automated DNA sequencers (removing primers, multiplexing barcodes...) prior to sequence assembly and other downstream uses.
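As a rough illustration of the trimming step, the sketch below strips a leading barcode and primer from each read and discards reads where either is missing. The function name, the barcode `ACGT`, the primer `GGATTC` and the toy reads are all hypothetical; real pipelines also handle mismatches, reverse primers and quality scores, which are ignored here.

```python
def trim_read(read, barcode, primer):
    """Strip a leading barcode and primer; return None if either is absent.

    Assumes exact matches at the 5' end only (a simplification).
    """
    if not read.startswith(barcode):
        return None
    rest = read[len(barcode):]
    if not rest.startswith(primer):
        return None
    return rest[len(primer):]


# Hypothetical toy reads: barcode + primer + insert
reads = ["ACGTGGATTCCTTAGGC", "ACGTGGATTCAAAA", "TTTTGGATTCCCCC"]
trimmed = [
    t for t in (trim_read(r, "ACGT", "GGATTC") for r in reads) if t is not None
]
print(trimmed)
```

The third read is dropped because it does not start with the expected barcode, which is also how reads are typically assigned to (or excluded from) a multiplexed sample.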
Concepts | Binning, OTUs
● Binning is the process of grouping reads or contigs and assigning them to operational taxonomic units (OTUs).
● OTU (Operational Taxonomic Unit): the taxonomic level of sampling selected by the user for a study, typically defined by a percent sequence similarity threshold that decides whether microbes fall within the same or different OTUs.
http://shuixia100.weebly.com/1/post/2011/12/mothur-tutorial-1.html / Wikipedia: Biological classification
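The idea of a similarity threshold can be sketched with a toy greedy clustering: each sequence joins the first OTU whose seed it matches at or above the threshold, otherwise it founds a new OTU. This is only an illustration under strong assumptions: sequences are pre-aligned and of equal length, identity is a simple per-position match fraction, and the function names are made up. The 0.97 default is a common convention for 16S surveys, not something the slides specify, and tools such as Mothur or Qiime use more careful methods.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs, threshold=0.97):
    """Greedy clustering: join the first OTU whose seed is similar enough."""
    seeds = []  # one representative sequence per OTU
    otus = []   # list of member lists, parallel to seeds
    for s in seqs:
        for i, seed in enumerate(seeds):
            if identity(s, seed) >= threshold:
                otus[i].append(s)
                break
        else:
            # no existing seed was close enough: start a new OTU
            seeds.append(s)
            otus.append([s])
    return otus


# Toy 10-base "reads", clustered at 90% identity for readability
toy = ["AAAAAAAAAA", "AAAAAAAAAT", "CCCCCCCCCC", "CCCCCCCCCG", "AAAAATTTTT"]
clusters = greedy_otus(toy, threshold=0.9)
print(len(clusters), clusters)
```

Note that the result depends on input order, since the first sequence seen becomes the seed; this is one reason different tools can report different OTU counts for the same data.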
Concepts | Chimeras
● Chimeras: Artificial sequences formed during PCR amplification. The majority of them are believed to arise from incomplete extension. During subsequent cycles of PCR, a partially extended strand can bind to a template derived from a different but similar sequence. This then acts as a primer that is extended to form a chimeric sequence (Smith et al. 2010, Thompson et al., 2002, Meyerhans et al., 1990, Judo et al., 1998, Odelberg, 1995). A chimeric template is created during one round, then amplified by subsequent rounds to produce chimeric amplicons that are difficult to distinguish from amplicons derived from a single biological sequence.
Haas B.J. et al. (2011) Chimeric 16S rRNA sequence formation and detection in Sanger and 454-pyrosequenced PCR amplicons, Genome Res. 21: 494-504.
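To make the chimera idea concrete, here is a toy check that asks whether a query is explained much better by "left part of one parent + right part of another" than by any single parent. It is a didactic sketch, not the method of any published chimera detector: it assumes all sequences are pre-aligned and of equal length, and the names `looks_chimeric` and `min_gain` are hypothetical.

```python
from itertools import permutations


def mismatches(a, b):
    """Number of differing positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def best_chimera_model(query, parents):
    """Best (mismatches, left_parent, right_parent, breakpoint) two-parent fit."""
    best = None
    for (name_a, a), (name_b, b) in permutations(parents.items(), 2):
        for k in range(1, len(query)):
            # left of the breakpoint comes from parent a, right from parent b
            d = mismatches(query[:k], a[:k]) + mismatches(query[k:], b[k:])
            if best is None or d < best[0]:
                best = (d, name_a, name_b, k)
    return best


def looks_chimeric(query, parents, min_gain=2):
    """Flag the query if a two-parent model beats the best single parent."""
    single = min(mismatches(query, p) for p in parents.values())
    model = best_chimera_model(query, parents)
    return model is not None and single - model[0] >= min_gain


parents = {"A": "AAAAAAAAAA", "B": "CCCCCCCCCC"}
print(best_chimera_model("AAAAACCCCC", parents))
print(looks_chimeric("AAAAACCCCC", parents))
print(looks_chimeric("AAAAAAAAAT", parents))
```

The second query differs from parent A by a single base that matches neither parent, so the two-parent model offers no improvement and it is not flagged.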
Concepts | Diversities
● Alpha diversity: the diversity within a particular area or ecosystem; expressed by the number of species (i.e., species richness) in that ecosystem, or by one or more diversity indices.
● Beta diversity: a comparison of diversity between ecosystems, usually measured as the amount of species change between the ecosystems.
● Gamma diversity: a measure of the overall diversity within a large region. Geographic-scale species diversity according to Hunter (2002:448).
Zinger L. et al. (2012) Two decades of describing the unseen majority of aquatic microbial diversity, Molecular Ecology 21, 1878–1896.
Concepts | Diversity measurement issues
Diversity can virtually never be measured directly, rather it must be estimated or inferred from available data. Our estimates are anchored in the sample itself.
Magurran (Ed.), Biological Diversity, Oxford U.P. 2010. Ch. 16 Microbial Diversity and Ecology
Zhou J. et al. (2013) Random Sampling Process Leads to Overestimation of β-Diversity of Microbial Communities, mBio 4(3):e00324-13. doi:10.1128/mBio.00324-13.
Concepts | Rarefaction
● Rarefaction allows the calculation of species richness for a given number of individual samples, based on the construction of so-called rarefaction curves. This curve is a plot of the number of species as a function of the number of samples.
[Figure: rarefaction curves, annotated "most or all species have been sampled", "species-rich habitat, only a small fraction has been sampled" and "this habitat has not been exhaustively sampled"]
Wooley J.C. et al. (2010) A Primer on Metagenomics, PLoS Computational Biology 6 (2) e1000667
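A rarefaction curve can be approximated by repeated random subsampling. The sketch below reads "samples" as sampled individuals (reads), draws a given number of them without replacement, counts the distinct species seen, and averages over repeats. The species counts, depths and function name are invented for illustration; analysis packages usually offer their own rarefaction routines.

```python
import random


def rarefaction_curve(counts, depths, n_iter=100, seed=0):
    """Mean number of distinct species seen when subsampling each depth.

    counts: dict mapping species name -> number of individuals observed.
    depths: subsample sizes (each must not exceed the total count).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    pool = [sp for sp, n in counts.items() for _ in range(n)]
    curve = []
    for depth in depths:
        if depth > len(pool):
            raise ValueError("depth exceeds total number of individuals")
        total = 0
        for _ in range(n_iter):
            # sample without replacement and count distinct species
            total += len(set(rng.sample(pool, depth)))
        curve.append(total / n_iter)
    return curve


# Hypothetical community: 100 individuals, 5 species, uneven abundances
community = {"sp1": 50, "sp2": 30, "sp3": 15, "sp4": 4, "sp5": 1}
depths = [1, 10, 50, 100]
curve = rarefaction_curve(community, depths)
for d, richness in zip(depths, curve):
    print(d, richness)
```

A curve that flattens before the full depth suggests most species have been sampled; one that is still climbing at the maximum depth suggests the habitat has not been exhaustively sampled.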
Concepts | Diversity indices (α diversity)
Mozzarella project, Michele Iacono http://www.science.gov/topicpages/w/water+buffalo+mozzarella.html
Other indices: berger_parker_d, brillouin_d, dominance, doubles, esty_ci, fisher_alpha, gini_index, goods_coverage, margalef, mcintosh_d, mcintosh_e, menhinick, osd, simpson_reciprocal, robbins, singles, strong...
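A few of the indices named above are simple enough to compute by hand from one sample's OTU counts. The sketch below uses the usual textbook definitions as understood here (dominance as the sum of squared proportions, Good's coverage as 1 minus singletons over total, Menhinick as S/sqrt(N), Margalef as (S-1)/ln(N)); check them against the documentation of whichever tool you use, since implementations can differ in detail. The counts are made up.

```python
import math


def alpha_metrics(counts):
    """Compute a handful of alpha-diversity indices from OTU counts."""
    counts = [c for c in counts if c > 0]  # ignore unobserved OTUs
    n = sum(counts)                        # total individuals (N)
    if n < 2:
        raise ValueError("need at least two individuals")
    s = len(counts)                        # observed OTUs (S)
    singles = sum(1 for c in counts if c == 1)
    doubles = sum(1 for c in counts if c == 2)
    dominance = sum((c / n) ** 2 for c in counts)
    return {
        "observed_species": s,
        "singles": singles,
        "doubles": doubles,
        "dominance": dominance,
        "simpson_reciprocal": 1 / dominance,
        "goods_coverage": 1 - singles / n,
        "menhinick": s / math.sqrt(n),
        "margalef": (s - 1) / math.log(n),
    }


# Hypothetical sample: 5 OTUs, 20 reads
metrics = alpha_metrics([10, 5, 2, 2, 1])
for name, value in metrics.items():
    print(name, round(value, 3))
```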
Concepts | Compositional similarity (β diversity)
Mozzarella project, Michele Iacono http://www.science.gov/topicpages/w/water+buffalo+mozzarella.html
Heat map
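Compositional similarity between samples is usually summarised as a pairwise distance matrix, which is the kind of table a heat map or an ordination can then be drawn from. The sketch below computes two common choices, Bray-Curtis dissimilarity (abundance-based) and Jaccard distance (presence/absence); the slides do not say which metric the project used, so these are assumptions, as are the sample names and counts. It expects every sample to contain at least one read.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two count vectors (0 = identical)."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(u) + sum(v)
    return num / den


def jaccard_distance(u, v):
    """1 - (shared OTUs / OTUs present in either sample)."""
    present_u = {i for i, a in enumerate(u) if a > 0}
    present_v = {i for i, b in enumerate(v) if b > 0}
    return 1 - len(present_u & present_v) / len(present_u | present_v)


def distance_matrix(table, metric):
    """Nested dict of pairwise distances between all samples in the table."""
    names = list(table)
    return {a: {b: metric(table[a], table[b]) for b in names} for a in names}


# Hypothetical OTU table: sample -> counts for the same four OTUs
otu_table = {
    "s1": [10, 0, 5, 0],
    "s2": [8, 2, 5, 0],
    "s3": [0, 0, 0, 15],
}
bc = distance_matrix(otu_table, bray_curtis)
jc = distance_matrix(otu_table, jaccard_distance)
for name, row in bc.items():
    print(name, {k: round(v, 3) for k, v in row.items()})
```

Here s1 and s2 share most of their composition and sit close together, while s3 shares no OTUs with either and is at the maximum distance of 1 under both metrics.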
Concepts | Summary
● Metadata, reads, fasta/fastq files, counts, OTU tables/networks, .biom files, PCoA, p-values, diversity metrics, robustness, scores, jackknifed, clustering, UPGMA, trees, bootstrap, Bi-Plots, ...
4 APPROACHES & WORKFLOWS
Workflows | Microbial ecology approaches
Workflows | Overview
Sample collection
DNA extraction and preparation
Sequencing
Analysis
Experimental design
Sample Quality Controls
Sequence Quality Controls
Biological interpretation
Grice E.A. & Segre J.A. (2012) The Human Microbiome: Our Second Genome, Annu. Rev. Genomics Human Genet. 13, 151-170
4.1 WGS Metagenomics
Workflows | Whole Genome Shotgun (WGS)
Sven-Eric Schelhorn https://bioinf.mpi-inf.mpg.de/homepage/research.php?&account=sven
Surya Saha & Magdalen Lindeberg http://www.slideshare.net/suryasaha/surya-saha-metagenomics-tools
4.2 16S/ITS Metagenomics
Workflows | 16S/ITS Community Surveys
Surya Saha & Magdalen Lindeberg http://www.slideshare.net/suryasaha/surya-saha-metagenomics-tools
5 METAGENOMICS TOOLS
Tools | “The great quest”
Tools | MEGAN
http://ab.inf.uni-tuebingen.de/software/megan5/
Tools | Mothur
http://www.mothur.org/wiki/Main_Page / Kevin R. Theis (Michigan State University)
Tools | Qiime
Tools | Axiome
http://neufeld.github.io/AXIOME
6 MORE RESOURCES
More resources, courses...
Resources & Projects:
MEGAN DB http://www.megan-db.org/megan-db/ (MEtaGenome ANalyzer)
CAMERA http://camera.calit2.net/ (Community Cyberinfrastructure for Advanced Microbial Ecology Research and Analysis)
MG-RAST Search http://metagenomics.anl.gov/metagenomics.cgi?page=MetagenomeSearch
IMG http://img.jgi.doe.gov/ (Integrated Microbial Genomes and metagenomes)
MetaBioME http://metasystems.riken.jp/metabiome/ (Comprehensive Metagenomic BioMining Engine)
BOLD http://www.boldsystems.org/ (Barcode of Life Data Systems)
GOS Expedition http://www.jcvi.org/cms/research/projects/gos/overview (Global Ocean Sampling)
...
Courses:
EBI http://www.ebi.ac.uk/training/course/metagenomics2014
EMBO http://cymeandcystidium.com/?tag=metagenomics
Coursera https://www.coursera.org/course/genomescience
... and a lot of seminars and workshops everywhere
Thanks for your attention
and also thanks to Josep Gregori (VHIR, ROCHE)
for providing some materials