ASM 2013 Metagenomic Assembly Workshop Slides


Adina Howe
Michigan State University, Adjunct
Argonne National Laboratory, Postdoc
ASM Workshop, May 2013

Image: Visual Complexity (http://www.flickr.com/photos/maisonbisson)

MSU Lab: Titus Brown, Jim Tiedje, Jason Pell, Qingpeng Zhang, Jordan Fish, Eric McDonald, Chris Welcher, Aaron Garoutte, Jiarong Guo

Collaborators: Janet Jansson, Susannah Tringe

I will upload this to SlideShare (adinachuanghowe).

khmer documentation:
github.com/ged-lab/khmer/
https://khmer.readthedocs.org/en/latest/guide.html
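
As a quick taste of the toolkit, here is a minimal k-mer counting example. It is a sketch only, assuming the 2013-era khmer Python API (khmer.new_counting_hash, consume, get_median_count); names and signatures may differ in your installed version, so check the guide above.

    # Sketch: count k-mers in a read and query its median abundance.
    # Assumes the 2013-era khmer Python API; verify against the docs above.
    import khmer

    K = 20
    ht = khmer.new_counting_hash(K, int(1e8), 4)   # k, table size, number of tables

    read = "ATGGACCAGATGAGAGCATGGACCAGATGACCTT"
    ht.consume(read)                               # count every k-mer in the read
    median, average, stddev = ht.get_median_count(read)
    print("median k-mer abundance of read:", median)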

Manuscripts

Scaling metagenome sequence assembly with probabilistic de Bruijn graphs

http://www.pnas.org/content/early/2012/07/25/1121464109

A reference-free algorithm for computational normalization of shotgun sequencing data

http://arxiv.org/abs/1203.4802

Assembling large, complex metagenomes
http://arxiv.org/abs/1212.2832

A few gotchas of sequencing:

Errors / Artifacts (confusion)

Diversity / Complexity (scale)

1. Digital normalization (lossy compression)

2. Partitioning
3. Enabling usage of previously unusable assembly tools

Reduces data for analysis
Longer sequences (increased accuracy of annotation)
Gene order
Does not rely on known references; access to unknowns
Creates new references
Lots of assembly tools available

But…


High memory requirements
Depends on good (~10x) sequencing coverage

“Coverage” is simply the average number of reads that overlap each true base in the genome. For example, 1,000 reads of 100 bp from a 10 kb genome give 1,000 × 100 / 10,000 = 10x coverage.

Here, the coverage is ~10 – just draw a line straight down from the top through all of the reads.

Note that k-mer abundance is not properly represented here! Each blue k-mer will be present around 10 times.

Each single base error generates ~k new k-mers.
Generally, erroneous k-mers show up only once – errors are random.

Low-abundance peak (errors)

High-abundance peak (true k-mers)
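
To make the two peaks concrete, here is a small self-contained Python sketch (plain dictionaries, not khmer; all names are illustrative). It simulates ~10x reads from a toy genome, plants one substitution error, and shows the error's k-mers piling up as singletons while true k-mers sit near 10:

    # Sketch: k-mer spectrum of simulated ~10x reads plus one base error.
    import random
    from collections import Counter

    random.seed(1)
    K = 8
    G = 1000
    genome = "".join(random.choice("ACGT") for _ in range(G))
    circle = genome + genome   # treat the genome as circular so coverage stays uniform

    # 100 reads of 100 bp over a 1 kb genome = ~10x coverage
    reads = [circle[s:s + 100] for s in (random.randrange(G) for _ in range(100))]

    # plant a single substitution error in one read
    pos, swap = 50, {"A": "C", "C": "G", "G": "T", "T": "A"}
    reads[0] = reads[0][:pos] + swap[reads[0][pos]] + reads[0][pos + 1:]

    counts = Counter(r[i:i + K] for r in reads for i in range(len(r) - K + 1))
    singletons = sum(1 for c in counts.values() if c == 1)
    print("singleton k-mers (mostly the ~K spawned by the error):", singletons)
    print("median k-mer abundance (the true peak):",
          sorted(counts.values())[len(counts) // 2])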

Suppose species A and B are present at a 10:1 dilution. To get 10x coverage of B, you need 100x coverage of A. Overkill!!

That 100x will consume disk space and, because of errors, memory.

We can discard it for you…

Diginorm, a digital analog to cDNA library normalization (a minimal sketch follows this list):

Reference free.
Single pass: looks at each read only once.
Does not “collect” the majority of errors.
Keeps all low-coverage reads.
Smooths out coverage of regions.
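
A minimal sketch of the diginorm rule in plain Python: an exact dict stands in for khmer's probabilistic counting table, and K and the coverage cutoff C are illustrative parameters.

    # Digital normalization sketch (cf. arXiv:1203.4802): single pass,
    # keep a read only if its median k-mer abundance is still below C.
    K, C = 20, 20
    counts = {}

    def median_kmer_abundance(read):
        abunds = sorted(counts.get(read[i:i + K], 0)
                        for i in range(len(read) - K + 1))
        return abunds[len(abunds) // 2]

    def normalize(reads):
        for read in reads:
            if len(read) < K:
                continue
            if median_kmer_abundance(read) < C:
                # Count k-mers only for kept reads; k-mers from discarded
                # (already well-covered) reads are never collected.
                for i in range(len(read) - K + 1):
                    km = read[i:i + K]
                    counts[km] = counts.get(km, 0) + 1
                yield read

Because counting happens only for reads that are kept, the table (and memory) stays modest even as input volume grows.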

Digital normalization produces “good” metagenome assemblies.

Smooths out abundance variation, strain variation.

Reduces computational requirements for assembly.

It also kinda makes sense :)

Split reads into “bins” belonging to different source species.

Can do this based almost entirely on connectivity of sequences.

“Divide and conquer”: a memory-efficient implementation helps to scale assembly.

Pell et al., 2012, PNAS
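
A toy version of that idea in plain Python: union-find over reads that share an exact k-mer. (The real implementation in Pell et al. traverses a Bloom-filter de Bruijn graph rather than exact sets; everything below is illustrative.)

    # Toy partitioning sketch: bin reads into connected components by shared k-mers.
    K = 20

    def partition(reads, k=K):
        parent = list(range(len(reads)))

        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]   # path halving
                x = parent[x]
            return x

        def union(a, b):
            parent[find(a)] = find(b)

        first_seen = {}   # k-mer -> index of the first read containing it
        for idx, read in enumerate(reads):
            for i in range(len(read) - k + 1):
                km = read[i:i + k]
                if km in first_seen:
                    union(idx, first_seen[km])
                else:
                    first_seen[km] = idx

        bins = {}
        for idx in range(len(reads)):
            bins.setdefault(find(idx), []).append(reads[idx])
        return list(bins.values())   # one bin per putative source species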

Low coverage is the dominant problem blocking assembly of your soil metagenome.

In order to build assemblies, each assembler makes choices – uses heuristics – to reach a conclusion.

These heuristics may not be appropriate for your sample! High polymorphism? Mixed population vs. clonal? Genomic vs. metagenomic vs. mRNA?

Low coverage drives differences in assembly.

We can assemble virtually anything but soil ;). Genomes, transcriptomes, MDA, mixtures, etc.

Repeat resolution will be fundamentally limited by sequencing technology (insert size; sampling depth).

Strain variation confuses assembly, but does not prevent useful results.

Diginorm is a systematic strategy to enable assembly.

Banfield has shown how to deconvolve strains at differential abundance.

Kostas K.’s results suggest that there will be a species gap sufficient to prevent contig misassembly.

Most metagenomes require 50-150 GB of RAM.

Many people don’t have access to computers of that size.

Amazon Web Services (aws.amazon.com) will happily rent you such computers for $1-2/hr.

http://ged.msu.edu/angus/2013-hmp-assembly-webinar/index.html

Optimizing our programs => faster.

Building an evaluation framework for metagenome assemblers.

Error correction!

Achieving one or more assemblies is fairly straightforward.

An assembly is a hypothesis. Evaluating assemblies is challenging, however, and is where you should be thinking hardest.

There are relatively few pipelines available for analyzing assembled metagenomic data.

Questions?

How do we study complexity? Interactions? Diversity? Communities? Evolution? Our environment?


• Major efforts of data collection

• Open mind for discoveries
• Willingness to adjust to change
• Multiple efforts
• Well-designed experiments

Workshop example: Illumina deep sequencing and scaling large datasets on soil metagenomes

We receive Gb of sequences. Generally, my data is…

Split by barcodes
Untrimmed
Adapters are present
Two paired-end fastq files

Underestimation of computational requirements:

Quality control steps usually require 2-3 times the amount of hard drive space.

Similarity comparison against known databases is impractical (a soil metagenome would take ~50 years to BLAST).

Home Alone scream: my first slide graphic that I’m scared may date me.

Two ways to reduce the onslaught:

Cluster into known observances (annotate, bin)
Assembly
Some mix of the above

Ten of you upload 1 HiSeq flowcell into MG-RAST

Illumina short reads from soil metagenome (~100 bp)

454 short reads from soil metagenome (~368 bp)

Assembled contigs (Illumina) from soil metagenome (~491 bp)

Read lengths will increase… what about computational requirements?

Assembly is a great way to reduce data.
