Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013

UEB-VHIR's Metagenomics Training. Session 1. 2013/08/26. An Introduction to Metagenomics Data Analysis. Ferran Briansó (ferran.brianso@vhir.org)

An Introduction to Metagenomics Data Analysis

Metagenomics Training

Ferran Briansó

VHIR - 26/08/2013

ferran.brianso@vhir.org

Outline

Introduction to Metagenomics

Basic Terminology

Computational Approaches & Tools: Whole Genome Shotgun, 16S/ITS Community Surveys

Recommended Tools: MEGAN, mothur, QIIME, AXIOME & CloVR

Introduction to METAGENOMICS

Introduction

First use of the term "metagenome", referring to the idea that a collection of genes sequenced from the environment could be analyzed in a way analogous to the study of a single genome.

Handelsman, J.; Rondon, M. R.; Brady, S. F.; Clardy, J.; Goodman, R. M. (1998). "Molecular biological access to the chemistry of unknown soil microbes: A new frontier for natural products". Chemistry & Biology 5 (10): R245–R249. doi:10.1016/S1074-5521(98)90108-9. PMID 9818143

“The application of modern genomics techniques to the study of communities of microbial organisms directly in their natural environments, bypassing the need for isolation and lab cultivation of individual species.”

Chen, K.; Pachter, L. (2005). "Bioinformatics for Whole-Genome Shotgun Sequencing of Microbial Communities". PLoS Computational Biology 1 (2): e24. doi:10.1371/journal.pcbi.0010024

Source: US Division of Earth & Life Studies of the National Academies, http://dels-old.nas.edu/metagenomics/overview.shtml

Source: Feng Chen, JGI

Performance Comparison for (some) Platforms

Basic TERMINOLOGY

Terminology

Trimming: the pre-processing step of cleaning sequence data produced by automated DNA sequencers (removing primers, multiplexing barcodes, etc.) prior to sequence assembly and other downstream uses.
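As a rough illustration of trimming, the sketch below demultiplexes a read by its 5' barcode and then strips the forward primer that follows it. The barcode table, the primer sequence and the exact-match rule are hypothetical assumptions chosen for brevity; real trimming tools also tolerate mismatches and clip adapters and low-quality tails.

```python
from typing import Optional, Tuple

# Hypothetical barcode-to-sample table and primer, for illustration only.
BARCODES = {"ACGT": "sample_A", "TGCA": "sample_B"}
PRIMER = "GTGCCAGC"

def trim_read(read: str) -> Optional[Tuple[str, str]]:
    """Demultiplex a read by its 5' barcode and strip the primer.

    Returns (sample, trimmed_sequence), or None if the barcode is unknown
    or the primer does not follow the barcode exactly.
    """
    for barcode, sample in BARCODES.items():
        if read.startswith(barcode):
            rest = read[len(barcode):]
            if rest.startswith(PRIMER):
                return sample, rest[len(PRIMER):]
            return None  # barcode found but primer missing: discard the read
    return None  # unknown barcode: discard the read
```

For instance, `trim_read("ACGTGTGCCAGCAAATTT")` keeps only the biological part `"AAATTT"` and assigns it to `sample_A`.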

Binning: the process of grouping reads or contigs and assigning them to operational taxonomic units (OTUs).
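One simple way to bin contigs is by sequence composition. The sketch below computes tetranucleotide (4-mer) frequency profiles and assigns each contig to the reference profile closest in Euclidean distance. This is only one possible strategy, shown under assumptions: the reference bins are hypothetical, and real binning tools usually combine composition with coverage or rely on homology searches against databases.

```python
import math
from itertools import product

K = 4  # tetranucleotides
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]  # 256 possible 4-mers

def kmer_profile(seq: str) -> list:
    """Relative frequency of each 4-mer in seq (k-mers containing N etc. are skipped)."""
    counts = dict.fromkeys(KMERS, 0)
    total = 0
    for i in range(len(seq) - K + 1):
        kmer = seq[i:i + K]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    return [counts[k] / total if total else 0.0 for k in KMERS]

def assign_bin(contig: str, reference_profiles: dict) -> str:
    """Return the name of the reference profile nearest to the contig's profile."""
    profile = kmer_profile(contig)
    return min(reference_profiles,
               key=lambda name: math.dist(profile, reference_profiles[name]))
```

With hypothetical references such as `{"gc_rich": kmer_profile("GCGCGGCC" * 8), "at_rich": kmer_profile("ATATTAAT" * 8)}`, a GC-only contig lands in `gc_rich`.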

OTU (Operational Taxonomic Unit): the taxonomic level of sampling selected by the user for a study. OTUs are typically defined by a percent sequence similarity threshold (e.g., 97% identity for 16S rRNA, commonly used as a species-level proxy), which decides whether microbes are classified within the same OTU or in different OTUs.
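To make the threshold idea concrete, here is a minimal greedy, centroid-based clustering sketch at 97% identity. It assumes the sequences are already aligned to equal length and uses plain per-position identity; tools such as mothur or QIIME use proper alignments and different clustering methods, so treat this only as an illustration of how a similarity cutoff defines OTUs.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def greedy_otus(seqs, threshold=0.97):
    """Each sequence joins the first OTU whose centroid it matches at
    >= threshold identity; otherwise it becomes the centroid of a new OTU."""
    centroids, otus = [], []
    for s in seqs:
        for idx, centroid in enumerate(centroids):
            if identity(s, centroid) >= threshold:
                otus[idx].append(s)
                break
        else:  # no centroid was close enough
            centroids.append(s)
            otus.append([s])
    return otus
```

Note that the result depends on input order (early sequences become centroids), which is one reason real pipelines often sort sequences by abundance first.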

Chimeras: artificial sequences formed during PCR amplification. Most of them are believed to arise from incomplete extension: during subsequent PCR cycles, a partially extended strand can bind to a template derived from a different but similar sequence, and then acts as a primer that is extended to form a chimeric sequence (Smith et al., 2010; Thompson et al., 2002; Meyerhans et al., 1990; Judo et al., 1998; Odelberg, 1995). A chimeric template created during one round is then amplified by subsequent rounds, producing chimeric amplicons that are difficult to distinguish from amplicons derived from a single biological sequence.
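The mechanism above suggests a toy, reference-based check: a chimera is explained much better by "parent A up to a breakpoint, parent B afterwards" than by either parent alone. The sketch below assumes equal-length, pre-aligned sequences and an arbitrary `min_gain` cutoff; it is a simplified illustration of the idea, not the algorithm of any specific tool (e.g., UCHIME or ChimeraSlayer).

```python
def matches(a: str, b: str) -> int:
    """Number of identical positions between two aligned sequences."""
    return sum(x == y for x, y in zip(a, b))

def best_two_parent_fit(query: str, left: str, right: str):
    """Best (score, breakpoint) when the query copies `left` before the
    breakpoint and `right` from the breakpoint on."""
    best = (-1, 0)
    for k in range(len(query) + 1):
        score = matches(query[:k], left[:k]) + matches(query[k:], right[k:])
        if score > best[0]:
            best = (score, k)
    return best

def looks_chimeric(query: str, parent_a: str, parent_b: str, min_gain: int = 3) -> bool:
    """Flag the query if a two-parent model explains at least `min_gain`
    more positions than the best single parent (min_gain is an assumption)."""
    single = max(matches(query, parent_a), matches(query, parent_b))
    ab, _ = best_two_parent_fit(query, parent_a, parent_b)
    ba, _ = best_two_parent_fit(query, parent_b, parent_a)
    return max(ab, ba) - single >= min_gain
```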

Alpha diversity: the diversity within a particular area or ecosystem, expressed either as the number of species in that ecosystem (i.e., species richness) or as one or more diversity indices (e.g., Shannon, Simpson).
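A minimal sketch of three common alpha diversity measures computed from one sample's per-OTU abundances: richness, the Shannon index (natural logarithm) and the Gini-Simpson index (1 − Σp²). Several variants of "Simpson" exist; using Gini-Simpson here is an assumption, as is the input format (a plain list of counts).

```python
import math

def alpha_diversity(counts):
    """Richness, Shannon (ln) and Gini-Simpson indices from per-OTU counts."""
    observed = [c for c in counts if c > 0]  # absent OTUs do not contribute
    n = sum(observed)
    if n == 0:
        raise ValueError("sample has no observations")
    props = [c / n for c in observed]
    return {
        "richness": len(observed),
        "shannon": -sum(p * math.log(p) for p in props),
        "gini_simpson": 1 - sum(p * p for p in props),
    }
```

Four equally abundant OTUs give richness 4, Shannon ln 4 ≈ 1.386 and Gini-Simpson 0.75; a single dominant OTU drives both indices towards 0.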

Beta diversity: a comparison of diversity between ecosystems, usually measured as the amount of change in species composition between them.
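Two classical ways to quantify compositional change between a pair of samples are sketched below: Bray-Curtis dissimilarity (abundance-based) and Jaccard distance (presence/absence). The choice of these two metrics and the dict-of-counts input format are illustrative assumptions; many other beta diversity measures exist (e.g., UniFrac, which also uses a phylogeny).

```python
def bray_curtis(x: dict, y: dict) -> float:
    """1 - 2 * (sum of shared minimum abundances) / (total abundance of both samples)."""
    species = set(x) | set(y)
    shared = sum(min(x.get(s, 0), y.get(s, 0)) for s in species)
    total = sum(x.values()) + sum(y.values())
    return 1 - 2 * shared / total if total else 0.0

def jaccard_distance(x: dict, y: dict) -> float:
    """1 - |shared species| / |all species|, ignoring abundances."""
    a = {s for s, c in x.items() if c > 0}
    b = {s for s, c in y.items() if c > 0}
    union = a | b
    return 1 - len(a & b) / len(union) if union else 0.0
```

Both return 0 for identical communities and 1 for communities with no species in common.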

Gamma diversity: a measure of the overall diversity within a large region, i.e. geographic-scale species diversity according to Hunter (2002:448).
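The three levels can be related with Whittaker's classical multiplicative partition, gamma = mean alpha × beta, so beta = gamma / mean alpha. This partition is not part of the slides; it is used here only as an illustration on presence/absence data, with a hypothetical mapping from site names to observed species.

```python
def whittaker_partition(sites: dict) -> dict:
    """sites: site name -> set of species observed there.
    Returns mean alpha richness, gamma richness and Whittaker's beta."""
    if not sites:
        raise ValueError("need at least one site")
    alphas = [len(species) for species in sites.values()]
    mean_alpha = sum(alphas) / len(alphas)
    if mean_alpha == 0:
        raise ValueError("need at least one non-empty site")
    gamma = len(set().union(*sites.values()))  # species pooled over the region
    return {"mean_alpha": mean_alpha, "gamma": gamma, "beta_w": gamma / mean_alpha}
```

Two sites with {a, b} and {b, c} give mean alpha 2, gamma 3 and beta 1.5; identical sites would give beta 1.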

Rarefaction: a technique for estimating species richness for a given number of sampled individuals (in amplicon surveys, typically sequences), based on the construction of so-called rarefaction curves. A rarefaction curve plots the number of species as a function of the number of individuals sampled.
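The expected number of species in a random subsample of n individuals can be computed analytically instead of by repeated resampling, using the classical hypergeometric expectation E[S_n] = Σ_i [1 − C(N − N_i, n) / C(N, n)], where N is the total count and N_i the count of species i. Evaluating it for increasing n traces the rarefaction curve. This is a sketch assuming the input is one sample's per-OTU counts; pipelines such as mothur or QIIME may rarefy by random subsampling instead.

```python
from math import comb

def expected_richness(counts, n):
    """Expected number of OTUs observed in a random subsample of n individuals."""
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("subsample size must be between 0 and the total count")
    denom = comb(total, n)
    # comb(a, b) is 0 when b > a, so an OTU that must be drawn contributes 1.
    return sum(1 - comb(total - c, n) / denom for c in counts if c > 0)

def rarefaction_curve(counts, step=1):
    """List of (subsample size, expected richness) points from 0 to the total count."""
    total = sum(counts)
    return [(n, expected_richness(counts, n)) for n in range(0, total + 1, step)]
```

A curve that flattens out suggests that additional sequencing would reveal few new OTUs.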

Computational APPROACHES & TOOLS

Approaches & Tools

Whole Genome SHOTGUN

Whole Genome Shotgun

WGS Workflow

Examples of WGS Tools

Analysis of 16S/ITS Community Surveys

16S/ITS community surveys

16S/ITS issues

16S/ITS workflow

Some recommended Tools

Some (recommended) Tools

mothur

MEGAN

MEGAN

2007 → 2011 → ... ... 2012 →

MEGAN 4 for 16S rRNA

mothur

2009 →

QIIME

Integrative Tools/Platforms

AXIOME

CloVR

http://www.edgebio.com

http://clovr.org

Ferran Briansó, MGTraining 26/08/2013

Thanks for your attention

ferran.brianso@vhir.org

An Introduction to Metagenomics Data Analysis

more info at http://ueb.vhir.org/MGT
