Introduction to Microbial Sequencing Matthew L. Settles Genome Center Bioinformatics Core University of California, Davis [email protected]; [email protected]


Introduction to Microbial Sequencing

Matthew L. Settles
Genome Center Bioinformatics Core
University of California, Davis
[email protected]; [email protected]


General rules for preparing an experiment/samples

• Prepare more samples than you are going to need, i.e. expect some will be of poor quality, or fail

• Preparation stages should occur across all samples at the same time (or as close as possible) and by the same person

• Spend time practicing a new technique to produce the highest quality product you can, reliably

• Quality should be established using fragment analysis traces (pseudo-gel images, RNA RIN > 7.0)

• DNA/RNA should not be degraded

• 260/280 ratios for RNA should be approximately 2.0 and 260/230 should be between 2.0 and 2.2. Values over 1.8 are acceptable

• Quantity should be determined with a fluorometer, such as a Qubit.
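The thresholds above can be turned into a simple screening step. The following is a minimal sketch, not part of the original slides: the function name `qc_flags` and the sample readings are hypothetical, and it assumes that "values over 1.8 are acceptable" applies to both the 260/280 and the 260/230 ratio.

```python
def qc_flags(rin, a260_280, a260_230, min_rin=7.0, min_ratio=1.8):
    """Return a list of QC warnings for one RNA sample (empty list = passes).

    Assumption: the 1.8 floor is applied to both absorbance ratios.
    """
    flags = []
    if rin <= min_rin:
        flags.append(f"RIN {rin} is not above {min_rin}")
    if a260_280 <= min_ratio:
        flags.append(f"260/280 ratio {a260_280} is not above {min_ratio}")
    if a260_230 <= min_ratio:
        flags.append(f"260/230 ratio {a260_230} is not above {min_ratio}")
    return flags


# Hypothetical readings (RIN, 260/280, 260/230); not real data.
samples = {
    "sample_A": (8.2, 2.0, 2.1),
    "sample_B": (6.5, 1.7, 2.1),
}
for name, (rin, r280, r230) in samples.items():
    problems = qc_flags(rin, r280, r230)
    print(name, "OK" if not problems else "; ".join(problems))
```

A check like this only flags samples for a closer look; the fragment analysis trace itself should still be inspected.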


Sample preparation

In high throughput biological work (Microarrays, Sequencing, HT Genotyping, etc.), what may seem like small technical details introduced during sample extraction/preparation can lead to large changes, or technical bias, in the data.

This is not to say that such bias doesn't occur with smaller scale analysis such as Sanger sequencing or qRT-PCR, but the effects do become more apparent (seen on a global scale) and may cause significant issues during analysis.


Be Consistent

BE CONSISTENT ACROSS ALL SAMPLES!!!


Illumina sequencing

• Illumina SBS

[Figure: layout of a sequencing library fragment. Labels: P5, BC2, Target Region, BC1, P7; Read 1 (50-300 bp); Read 2 (50-300 bp); BC1 (8 bp); BC2 (8 bp); Insert size; Fragment length]


Illumina MiSeq Sequencing
https://www.illumina.com/systems/sequencing-platforms/miseq/specifications.html


Illumina HiSeq Sequencing
https://www.illumina.com/systems/sequencing-platforms/hiseq-3000-4000/specifications.html


Sequencing Depth

• The first and most basic question is: how many base pairs of sequence data will I get? Factors to consider are:
  1. Number of reads being sequenced
  2. Read length (if paired, consider them as individuals)
  3. Number of samples being sequenced
  4. Expected percentage of usable data

• The number of reads and read length data are best obtained from the manufacturer's website (search for specifications); always use the lower end of the estimate.


Genomic Coverage

Once you have the number of base pairs per sample you can then determine expected coverage.

Factors to consider then are:
  1. Length of the genome
  2. Any extra-genomic sequence (i.e. mitochondria, virus, plasmids, etc.). For bacteria in particular, these can become a significant percentage

$$\text{ExpectedCoverage}_{\text{sample}} = \frac{\dfrac{\text{readLength} \times \text{numReads} \times 0.8}{\text{numSamples}} \times \text{num.lanes}}{\text{TotalGenomicContent}}$$
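The expected-coverage formula above can be sketched in Python as follows. This is an illustration under stated assumptions, not a definitive implementation: the function names are made up for this example, the 0.8 usable fraction is the value that appears in the formula, and the run numbers in the usage example are hypothetical rather than taken from any instrument specification.

```python
def bases_per_sample(read_length, num_reads, num_samples,
                     num_lanes=1, usable_fraction=0.8):
    """Usable base pairs per sample.

    num_reads is the number of reads per lane; for paired-end runs,
    count each read of a pair individually.
    """
    return read_length * num_reads * usable_fraction / num_samples * num_lanes


def expected_coverage(read_length, num_reads, num_samples,
                      total_genomic_content, num_lanes=1, usable_fraction=0.8):
    """Expected fold coverage of total_genomic_content (in bp) per sample."""
    bases = bases_per_sample(read_length, num_reads, num_samples,
                             num_lanes, usable_fraction)
    return bases / total_genomic_content


# Hypothetical run: 20 million 150 bp reads shared by 24 samples,
# each with 5 Mb of genome plus extra-genomic sequence.
cov = expected_coverage(read_length=150, num_reads=20_000_000,
                        num_samples=24, total_genomic_content=5_000_000)
print(f"Expected coverage per sample: about {cov:.1f}x")
```

Remember that `total_genomic_content` should include plasmids and other extra-genomic sequence, not just the chromosome length.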


Metagenomics Sequencing

Considerations (when a literature search turns up nothing)
• Proportion that is host (non-microbial genomic content)
• Proportion that is microbial (genomic content of interest)
• Number of species
• Genome size of each species
• Relative abundance of each species

The back of the envelope calculation

$$\text{numReads}_{\text{sample}} = \frac{\text{Coverage} \times \text{AverageGenomeSize}}{\text{ReadLen} \times \text{DilutionFactor} \times (1 - \text{hostProportion})} \times \frac{1}{0.8}$$
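As a rough sketch, the back-of-the-envelope calculation above translates directly into a small function. The function name and the example values are hypothetical; `dilution_factor` is read here as the fraction of the microbial community made up by the organism you want to reach, which is an interpretation of the formula rather than something the slide defines.

```python
def metagenome_reads_per_sample(coverage, average_genome_size, read_len,
                                dilution_factor, host_proportion,
                                usable_fraction=0.8):
    """Reads needed per sample to reach `coverage` on a target organism.

    dilution_factor: assumed to be the target's share of the microbial
                     community (0-1).
    host_proportion: fraction of reads expected to come from the host (0-1).
    """
    microbial_fraction = 1 - host_proportion
    needed = (coverage * average_genome_size) / (
        read_len * dilution_factor * microbial_fraction)
    return needed / usable_fraction


# Hypothetical scenario: 10x coverage of a 5 Mb genome that makes up 10% of
# the community, 150 bp reads, half of all reads lost to host DNA.
reads = metagenome_reads_per_sample(coverage=10, average_genome_size=5_000_000,
                                    read_len=150, dilution_factor=0.10,
                                    host_proportion=0.50)
print(f"Roughly {reads / 1e6:.1f} million reads per sample")
```

Because the dilution factor and host proportion sit in the denominator, rare targets and host-heavy samples drive the required read count up quickly.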


Sequencing Depth – Counting based experiments

• Coverage is determined differently for "counting" based experiments (RNAseq, amplicons, etc.) where an expected number of reads per sample is typically more suitable.

• The first and most basic question is: how many reads per sample will I get? Factors to consider are (per lane):
  1. Number of reads being sequenced
  2. Number of samples being sequenced
  3. Expected percentage of usable data
  4. Number of lanes being sequenced

$$\frac{\text{reads}}{\text{sample}} = \frac{\text{reads.sequenced} \times 0.8}{\text{samples.pooled}} \times \text{num.lanes}$$

• Read length, or SE vs PE, does not factor into sequencing depth.
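For counting-based experiments the same arithmetic can be sketched as a one-line function. The name `reads_per_sample` and the numbers in the example are hypothetical; the 0.8 default is the usable fraction from the formula above.

```python
def reads_per_sample(reads_sequenced, samples_pooled,
                     num_lanes=1, usable_fraction=0.8):
    """Expected usable reads per sample.

    reads_sequenced is the per-lane read count; read length and SE vs PE
    play no role here.
    """
    return reads_sequenced * usable_fraction / samples_pooled * num_lanes


# Hypothetical lane yielding 300 million reads, shared by 48 samples.
depth = reads_per_sample(reads_sequenced=300_000_000, samples_pooled=48)
print(f"About {depth / 1e6:.1f} million reads per sample")
```

Doubling `num_lanes` doubles the depth per sample, which is often the simplest lever when a pilot shows the depth is too low.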


Amplicon Sequencing (Communities, genotyping)

Considerations
• Number of reads being sequenced
• Proportion that is diversity sample (e.g. PhiX)
• Number of samples being pooled in the run

The back of the envelope calculation

$$\frac{\text{reads}}{\text{sample}} = \frac{\text{reads\_sequenced} \times (1 - \text{diversity\_sample})}{\text{num\_samples}}$$

Example

$$\frac{102{,}000}{\text{sample}} = \frac{18 \times 10^{6} \times (1 - 0.15)}{150}$$

Recommendations
• Illumina 'recommends' 100K per sample
• I've used 30K per sample historically, others are fine with 3K per sample
• Really should have as many reads as your experiment needs
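The amplicon calculation and the per-sample targets mentioned above can be combined in a short sketch. The function name and the `targets` dictionary are made up for this example; the input values (18 million reads, 15% diversity sample, 150 samples) are the ones from the worked example on this slide.

```python
def amplicon_reads_per_sample(reads_sequenced, diversity_fraction, num_samples):
    """Reads per sample after setting aside the diversity spike-in (e.g. PhiX)."""
    return reads_sequenced * (1 - diversity_fraction) / num_samples


# Values from the slide's worked example.
depth = amplicon_reads_per_sample(reads_sequenced=18e6,
                                  diversity_fraction=0.15,
                                  num_samples=150)
print(f"About {depth:,.0f} reads per sample")

# Per-sample targets quoted in the recommendations (labels are illustrative).
targets = {"Illumina": 100_000, "historical": 30_000, "minimal": 3_000}
for label, target in targets.items():
    status = "met" if depth >= target else "not met"
    print(f"{label} target of {target:,} reads: {status}")
```

Which target matters depends on the experiment; the comparison is only a convenience for planning how many samples to pool.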


How Much? Community Rarefaction curves

• 'Deep' sequence a number of test samples
  • amplicons: ~1M+ reads
  • metagenomics: 1 full HiSeq lane

• Plot rarefaction curves of organism identification to determine if saturation is achieved
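One way to compute the points of a rarefaction curve is sketched below. It uses the standard analytic expectation for subsampling without replacement rather than repeated random subsampling, and it is not necessarily how the curves in the original slides were produced. The per-taxon read counts are hypothetical.

```python
from math import comb


def expected_richness(counts, depth):
    """Expected number of taxa seen in a random subsample of `depth` reads.

    A taxon with n reads out of `total` is missed with probability
    C(total - n, depth) / C(total, depth).
    """
    total = sum(counts)
    if not 0 <= depth <= total:
        raise ValueError("depth must be between 0 and the total read count")
    denom = comb(total, depth)
    return sum(1 - comb(total - n, depth) / denom for n in counts)


def rarefaction_curve(counts, depths):
    """Return (depth, expected richness) pairs for each requested depth."""
    return [(d, expected_richness(counts, d)) for d in depths]


# Hypothetical read counts for five taxa in one deeply sequenced test sample.
counts = [500, 300, 150, 45, 5]
for depth, richness in rarefaction_curve(counts, [0, 10, 50, 100, 500, 1000]):
    print(f"{depth:>5} reads -> {richness:.2f} taxa expected")
```

If the curve flattens well before the full depth, fewer reads per sample would likely have identified the same organisms. Exact binomial coefficients become slow for millions of reads, so real data sets usually call for a log-space formulation or an existing ecology package.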


Metagenomics assembly

To determine if you've sequenced 'enough' to re-assemble 'most' of the community members' genetic content, look at what is left over, proportionally.


Amplicons vs. Metagenomics

• Metagenomics
  • Shotgun libraries intended to sequence random genomic sequences from the entire bacterial community.
  • Can be costly per sample ($500 to multiple thousands per sample)
  • Better resolution and sensitivity to characterize the sample
  • Due to cost, can only do relatively few samples

• Amplicon community profiling
  • Sequence only one region of one gene (e.g. 16S, ITS, LSU)
  • Cheap per sample (at scale, down to $20/sample)
  • Due to cost, can do many hundreds of samples and make more global inferences


Community Sequencing Designs

• Taxonomic Identification
  • Amplicon based (e.g. 16S variable regions)
  • Shotgun Metagenomics

• Functional Characterization
  • Shotgun Metagenomics
  • Shotgun Metatranscriptomics (active)

• Genome Assembly, Function and Variation
  • Shotgun Metagenomics
  • Shotgun Metatranscriptomics


Cost Estimation

• DNA/RNA extraction and QA/QC (Bioanalyzer/Gels)

• Metatranscriptomes: Enrichment of RNA of interest and RNA library preparation
  • Library QA/QC (Bioanalyzer and Qubit)
  • Pooling ($10/library)

• Metagenomes: DNA library preparation
  • Library QA/QC (Bioanalyzer and Qubit)
  • Pooling ($10/library)

• Community Profiling: PCR reactions
  • Library QA/QC (Bioanalyzer and Qubit/microplate reader)
  • Pooling

• Sequencing (Number of Lanes/runs)

• Bioinformatics (General rule is to estimate the same amount as data generation, i.e. double your budget)

http://dnatech.genomecenter.ucdavis.edu/prices/


Bioinformatics Costs

Bioinformatics includes:
  1. Storage of data
  2. Access and use of computational resources and software
  3. System Administration time
  4. Bioinformatics Data Analysis time
  5. Back and forth consultation/analysis to extract biological meaning

Rule of thumb: Bioinformatics can and should cost as much (sometimes more) as the cost of data generation.


Cost Estimation

• Amplicons
  • 384 Samples
  • Amplicon generation ($20/sample) = $383/sample = $4,596
  • Sequencing PE300, target 30K reads per sample
  • Bioinformatics

• Metagenome
  • 12 samples (DNA)
  • Expectations: Host Proportion 40%, use average genome size of E. coli, target the 1% and coverage of 20
  • Sequencing PE150
  • Bioinformatics
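To get a feel for the sequencing volume these two scenarios imply, here is a worked estimate using the back-of-the-envelope formula from the Metagenomics Sequencing slide. It rests on assumptions that are not stated in the slides: an E. coli genome size of roughly 4.6 Mb, "target the 1%" read as a DilutionFactor of 0.01, each read of a PE150 pair counted individually, and no allowance for a diversity sample such as PhiX in the amplicon run.

$$
\begin{aligned}
\text{Amplicons: on-target reads} &= 384 \times 30{,}000 = 1.152 \times 10^{7} \\[4pt]
\text{Metagenome: numReads}_{\text{sample}} &= \frac{20 \times 4.6 \times 10^{6}}{150 \times 0.01 \times (1 - 0.40)} \times \frac{1}{0.8} \approx 1.28 \times 10^{8} \\[4pt]
\text{Metagenome: total reads} &\approx 12 \times 1.28 \times 10^{8} \approx 1.5 \times 10^{9}
\end{aligned}
$$

Under these assumptions the 12 metagenome samples need on the order of a hundred times more reads than the 384 amplicon samples, which is the main reason the per-sample costs differ so much.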


Take Homes

• Experience and/or literature searches (other people's experiences) will provide the best justification for estimates on needed depth.

• 'Longer' reads are better than short reads.

• Paired-end reads are more useful than single-end reads.

• Libraries can be sequenced again, so do a pilot, perform a preliminary analysis, then sequence more accordingly.