Microbiome 16S Analysis: A Quick-Start Guide
Amanda Birmingham
Center for Computational Biology & Bioinformatics
University of California at San Diego
Agenda:
• Rapid introduction to 16S microbiome studies
• Summary of analysis steps and software tools
• Minimal instruction on compute environment
• Practicum on 16S analysis with QIIME 2
  ◦ Alternating lecture and tutorial
    § Goal: Any topic I've lectured about, you will get to test live (even if we don't finish all topics)
• Notice an emphasis on speed :)
  ◦ Red dot on slide means I won't be covering it in depth
  ◦ Considerable additional material is described in the Supplemental Slides
[Figure: the 16S rRNA gene with hypervariable regions V1-V9 marked along a position axis from 100 to 1500]
Marker Gene Metagenomics Basics
• Approach: PCR amplicons of a conserved constitutive gene (a "marker gene") to determine identity and abundance of microbes present
  ◦ Usually the "conserved constitutive gene" of choice is an rRNA
    § For bacteria/archaea, usually the 16S rRNA of the small sub-unit (SSU) of the ribosome
      − Excludes eukaryotic DNA, as eukaryotes' SSU is 18S
• 16S rRNA is widely conserved across bacteria/archaea (so shared primer sites)
  ◦ But also has 9 hypervariable regions
    § Can be used to identify different "species" and build phylogenetic trees
• Can't study fungi with 16S (they don't have it) or 18S (evolves too slowly)
  ◦ Internal transcribed spacer (ITS) is the standard fungal marker gene; 28S also used
When to Use Marker Gene Metagenomics
• When your sample is MOSTLY made up of host DNA, e.g. tumor samples
  ◦ Shotgun reads will also be mostly host DNA, with few left over for the microbes
  ◦ Use 16S rRNA instead, as the primers exclude eukaryotic DNA from amplification
• When you're cheap :)
Image modified from Morgan & Huttenhower (2012). PLoS Comput Biol 8(12): e1002808.
• The good news:
  ◦ Target gene studies are slightly cheaper to prep and sequence than shotgun ones
  ◦ Analysis software is mature, and many studies can be analyzed on a laptop
  ◦ Known taxa can be detected with very low (100s of reads) sequence depth
• The bad news:
  ◦ No target gene distinguishes all microbes well
    § And, for a given gene, no primer pair distinguishes all microbes well
  ◦ No other genome information (outside target gene) is captured
Common Issues in Marker Gene Studies
• Neglecting metadata
  ◦ Analysis cannot test for effects of, or discard bias from, features you didn't record!
• Picking novel 16S primers: not all are created equal
  ◦ Earth Microbiome Project recommends 515f-806r primers, error-correcting barcodes
• Not taking precautions to support amplicon sequencing
  ◦ Some Illumina machines require high PhiX, low cluster density
Marker Gene Analysis Workflow
• Most critical analysis choices:
  ◦ Whether to do OTU picking or error correction
  ◦ What α and β metrics to pick
    § Some are phylogenetically aware, some aren't
[Workflow diagram]
• Inputs: sequence file(s), metadata spreadsheet
• Pre-process [remove primers, demultiplex, quality filter] → cleaned sequence file
• Pick features & representative sequences → feature table (taxon abundance)
• Assign taxonomy
• Align sequences & infer phylogeny → phylogenetic tree
• Calculate α diversity
• Calculate β diversity
• Visualize → figures
• Test differences for significance → p-values
Software Selection
• Google "16S analysis <program name>"; main contenders are:
• Mothur
  ◦ Name: not an acronym (play on DOTUR, SONS)
  ◦ Philosophy: single piece of re-implemented software
  ◦ Top pro: easy to install
  ◦ Top con: re-implementations could be buggy
  ◦ Language: C++
  ◦ Model: open-source
  ◦ License: GPL
  ◦ Published: 2009
  ◦ Developed: at University of Michigan
Software Selection (cont.)
• QIIME
  ◦ Name: Quantitative Insights Into Microbial Ecology
  ◦ Philosophy: wrapper of best-in-class software
  ◦ Top pro: extremely flexible
  ◦ Top con: QIIME 2 not yet feature-complete
  ◦ Language: Python (wrapper)
  ◦ Model: open-source
  ◦ License: mixed
  ◦ Published: 2010
  ◦ Developed: at UCSD, NAU
Software Selection (cont.)
• Main contenders are mothur and QIIME
  ◦ Both widely used
  ◦ Both pride themselves on quality of support
• Will discuss only QIIME in this tutorial
• QIIME 1 vs QIIME 2
  ◦ QIIME 1 won't be supported after end of 2017
  ◦ QIIME 2 not yet feature-complete
    § But already much easier to use!
  ◦ This tutorial uses QIIME 2 only
• I'm not a QIIME 2 developer
  ◦ I'm not taking credit for this tool, just demonstrating it!
Getting the Software & Data
• Not covered in this tutorial, for sake of time
• QIIME 2 is very easy to install with the Conda environment and package manager
  ◦ Conda is also very easy to install, in either its Miniconda or Anaconda version
  ◦ Once Conda is installed, the QIIME 2 install is one line, e.g. for 64-bit Linux:

    conda create -n qiime2-2017.6 --file https://data.qiime2.org/distro/core/qiime2-2017.6-conda-linux-64.txt

• Data acquisition method is project-specific
  ◦ Public data can often be pulled down from the internet with wget or curl commands
  ◦ Sequencing data from a core is usually available by ftp
  ◦ If all else fails, use a flash drive :)
Get Ready To Practice!
• "Why are you making me type?!"
  ◦ QIIME 2 has a GUI, but it is still very much under development
  ◦ QIIME 2 command-line interface is easy to install and ready to run
  ◦ Emphasize typing rather than copy/pasting commands because in your real analyses, you will need to type in the appropriate commands for your data
    § Need to make realistic typing mistakes now so you know how to correct them later!
• Will be working in a shell on the Ubuntu Linux operating system
  ◦ Terminal on Mac OS X is very similar; Windows ISN'T
    § Most bioinformatics software doesn't support Windows
    § Use a virtual machine or Cygwin
• Note that you will be training on unusually tractable data
  ◦ Beautiful, clear clustering, significant p-values, etc.
  ◦ If your own data don't give such clear results, that doesn't mean the analysis is wrong
Tips to Help
• When typing a file name or directory path, you can use tab completion
  ◦ Start typing the file/directory path, then hit tab; if only one file/directory matches what you already typed, the shell fills that in
    § Very helpful for correctly entering long file names
    § If >1 matches, the shell fills in as much as it can
• Press the up arrow to get back previous commands you typed
• If you type a command, press enter, and "nothing happens", don't just run it again
  ◦ Many unix commands produce no visible output to the shell; you just get the command prompt back
  ◦ That doesn't mean they do nothing, so running them *again* can screw up results
• Do not store commands in a word processing program (or PowerPoint, etc)
  ◦ E.g., MS Word changes hyphens to "mdash" (em dash) characters, which the command line can't understand
• Shell commands are case-sensitive
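The word-processor pitfall above can be checked mechanically. What follows is a minimal sketch in Python, not part of QIIME 2: the helper name `find_suspicious_chars` and its character list are illustrative assumptions. It flags typographic dashes and curly quotes in a command string before the command is pasted into the shell.

```python
# Hypothetical helper (not a QIIME 2 tool): flag characters that word
# processors commonly substitute for plain ASCII hyphens and quotes.
SUSPICIOUS = {
    "\u2013": "en dash",
    "\u2014": "em dash",
    "\u2018": "left single quote",
    "\u2019": "right single quote",
    "\u201c": "left double quote",
    "\u201d": "right double quote",
}


def find_suspicious_chars(command):
    """Return (position, description) for each typographic dash or quote."""
    return [(i, SUSPICIOUS[ch]) for i, ch in enumerate(command) if ch in SUSPICIOUS]


good = "conda create -n qiime2-2017.6"
bad = "conda create \u2013n qiime2-2017.6"  # en dash instead of a hyphen

print(find_suspicious_chars(good))
print(find_suspicious_chars(bad))
```

An empty list means the command contains none of the listed characters; it does not prove the command is otherwise correct.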
Pre-Acknowledgment
• Please join me in thanking Pedro Fernandes
• Without his impeccably managed training computers, resources, and room, these tutorials would not be possible!
Making a Mapping File
• "Mapping file" contains metadata for study
  ◦ Must contain info needed to process sequences and test YOUR hypotheses
• QIIME 1 required certain columns in a certain order, but QIIME 2 is more flexible
  ◦ Tab-separated text file with column labels in first line + at least one data line
    § Column label values must be unique (i.e. no duplicate values)
  ◦ First column is the "identifier" column (sample ID)
    § All values in the first column must be unique (i.e. no duplicate values)
  ◦ See https://docs.qiime2.org/2017.6/tutorials/metadata/
• The easiest way to make a mapping file is with a spreadsheet
  ◦ But Excel is not your friend!
    § Routinely corrupts gene symbols, anything interpreted as a date, etc, & the change isn't reversible
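The mapping-file rules above (a header line plus at least one data line, unique column labels, unique sample IDs in the first column) can be sketched as a small check. This is an illustration only, not QIIME 2's own validator: the function name `validate_mapping_file` and the example rows are made up.

```python
import csv
import io


def validate_mapping_file(text):
    """Return a list of problems found in tab-separated metadata text."""
    rows = [row for row in csv.reader(io.StringIO(text), delimiter="\t") if row]
    if len(rows) < 2:
        return ["need a header line plus at least one data line"]
    header, data = rows[0], rows[1:]
    problems = []
    if len(set(header)) != len(header):
        problems.append("duplicate column labels")
    sample_ids = [row[0] for row in data]  # first column is the identifier
    if len(set(sample_ids)) != len(sample_ids):
        problems.append("duplicate sample IDs")
    return problems


# Made-up example data (not the tutorial's sample-metadata.tsv)
good = "SampleID\tBodySite\tSubject\nS1\tgut\tsubject-1\nS2\ttongue\tsubject-1\n"
bad = "SampleID\tBodySite\tBodySite\nS1\tgut\tgut\nS1\ttongue\ttongue\n"

print(validate_mapping_file(good))
print(validate_mapping_file(bad))
```

The sketch covers only the three rules listed on the slide; the linked metadata documentation is the authority on everything else.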
Practicum: Viewing a Mapping File
• Open Terminal
  ◦ For below, remember to try tab completion!
source activate qiime2-2017.6
cd tutorial-qiime2
ls
nano sample-metadata.tsv
• Turn on your green light when the mapping file opens for you
Mapping File View
• Stretch the window so you can look at the contents; then, to close, type Ctrl + x
• Mapping file errors can lead to QIIME 2 errors, or worse, garbage results!
  ◦ Keemei (pronounced 'key may') tool checks for errors in Google Sheets
    § Chrome only, and must have a Google account to use
    § See http://keemei.qiime.org/
Importing Data
• After sequence data is on your machine, it must be imported to a QIIME 2 "artifact"
  ◦ Artifact = data + metadata
  ◦ QIIME 2 artifacts have extension .qza
• Different input commands for
  ◦ Different kinds of input data (e.g., single-end vs paired-end)
  ◦ Different formats of input data (e.g., sequences & barcodes in same or different file)
• See "Importing data" tutorial at https://docs.qiime2.org/
https://docs.qiime2.org/2017.6/concepts/#data-files-qiime-2-artifacts
Practicum: Importing Data
• A backslash \ is used to break up a command onto multiple lines
  ◦ If you prefer to type the whole command on one run-on line, you can leave it out

    qiime tools import \
      --type EMPSingleEndSequences \
      --input-path emp-single-end-sequences \
      --output-path emp-single-end-sequences.qza
Practicum: Importing Data (cont.)
• What does this command actually do?
  ◦ Tells qiime to look into the folder emp-single-end-sequences ...
  ◦ For the kind of sequence files expected for EMPSingleEndSequences ...
  ◦ And load them into a new qiime artifact named emp-single-end-sequences.qza
• Note structure of arguments to qiime command
  ◦ Plugin name, then method name, then arguments
    § Order matters
Demultiplexing
• Must assign resulting sequences to samples to analyze
• You may not need to do this!
  ◦ If sequencing was done by a core, results may be demultiplexed before they are returned to you
Image: QIIME 2, https://qiime2.org.
Practicum: Demultiplexing

    qiime demux emp-single \
      --i-seqs emp-single-end-sequences.qza \
      --m-barcodes-file sample-metadata.tsv \
      --m-barcodes-category BarcodeSequence \
      --o-per-sample-sequences demux.qza

• Arguments have a naming convention
  ◦ Inputs (--i-<whatever>), metadata (--m-<whatever>), parameters (--p-<whatever>), outputs (--o-<whatever>)
  ◦ Order of these arguments doesn't matter
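Conceptually, demultiplexing assigns each read to a sample by looking up its barcode in the mapping file. The sketch below illustrates that idea in plain Python. It is not how `qiime demux emp-single` is implemented, the barcodes and reads are made up, and it uses exact matching only, with no barcode error correction.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample):
    """Group sequences by sample using an exact barcode lookup.

    reads: iterable of (barcode, sequence) pairs.
    barcode_to_sample: dict built from the mapping file's barcode column.
    Returns (per-sample sequences, number of reads with unknown barcodes).
    """
    per_sample = defaultdict(list)
    unassigned = 0
    for barcode, sequence in reads:
        sample = barcode_to_sample.get(barcode)
        if sample is None:
            unassigned += 1
        else:
            per_sample[sample].append(sequence)
    return dict(per_sample), unassigned


# Made-up barcodes and reads, for illustration only
barcode_to_sample = {"AAGT": "gut-1", "CCTA": "palm-1"}
reads = [
    ("AAGT", "ACGTACGT"),
    ("CCTA", "TTGACCAA"),
    ("AAGT", "ACGTTCGT"),
    ("GGGG", "NNNNACGT"),  # barcode not in the mapping file
]

per_sample, unassigned = demultiplex(reads, barcode_to_sample)
print(per_sample)
print(unassigned)
```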
Practicum: Demultiplexing (cont.)
• Presumably you'd like to know how your demultiplexing worked
• QIIME 2 artifact files can't be viewed directly (e.g., in nano)
• New concept: QIIME 2 visualization file
  ◦ Has .qzv extension
  ◦ Is intended for human (rather than computer) use
  ◦ Generally provides info via a web browser
Practicum: Demultiplexing (cont.)

    qiime demux summarize \
      --i-data demux.qza \
      --o-visualization demux.qzv

• Now view the visualization, locally

    qiime tools view demux.qzv
• When done examining, in Terminal, type JUST q
  ◦ Don't need to hit Enter afterwards
  ◦ Beware: quitting the visualization doesn't close the web page (but the page becomes unreliable)
Quality Control
Bokulich, N. et al. (2013). Quality-filtering vastly improves diversity estimates from Illumina amplicon sequencing. Nat Methods, 10(1), 57-59.
QIIME defaults:
• r = 3
• q = 3
• p = 0.75
• n = 0
• c = 0.005% or 2
Practicum: Quality Control

    qiime quality-filter q-score \
      --i-demux demux.qza \
      --o-filtered-sequences demux-filtered.qza \
      --o-filter-stats demux-filter-stats.qza

• This runs the command with default values for all the tuneable parameters
  ◦ To see the optional parameters, their descriptions, and their defaults, run just

    qiime quality-filter q-score
Practicum: Quality Control (cont.)

    qiime quality-filter visualize-stats \
      --i-filter-stats demux-filter-stats.qza \
      --o-visualization demux-filter-stats.qzv

• Remember: for the remainder of this tutorial, any time you create a visualization file, you will need to run an additional command to view it!

    qiime tools view yourvisualizationfilename.qzv
Feature Table Creation: The Past
• Last year: OTU (Operational Taxonomic Unit)
  ◦ "an operational definition of a species used when only DNA sequence data is available"
  ◦ Sequences at/above a given similarity threshold are considered part of the same OTU
    § 97% is the usual "species-level" threshold
      − Similarity determined using alignment (time-consuming)
    § Purpose is to minimize impact of sequencing errors
      − But also masks fine (sub-OTU) variation in real biological sequences
  ◦ Results very difficult to compare across studies if done de novo
    § "Closed reference" and "open reference" methods increase comparability but require a reference database
• Output is a "feature table":
  ◦ Rows are samples
  ◦ Columns are OTUs (arbitrary identifiers if de novo, from reference database if closed reference)
  ◦ Values are frequency of reads from that OTU in that sample
Feature Table Creation: The Present
• This year: sOTU (sub-OTU) methods
  ◦ Use error modeling to correct sequencing mistakes in silico
    § Sounds impossible but is actually quite accurate, with the right error model
      − Error model is specific to the sequencing type (e.g., 454, Illumina HiSeq/MiSeq)
  ◦ Result: only sequences likely to have been input to the sequencer
  ◦ Options include (NOT a complete list):
    § DADA2 (2016)
    § Deblur (2017)
• Output is STILL a feature table:
  ◦ Rows are samples
  ◦ Columns are SEQUENCES
  ◦ Values are frequency of reads from that SEQUENCE in that sample
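The layout of a sequence-based feature table (rows are samples, columns are sequences, values are read counts) can be illustrated with a few lines of Python. This is a toy sketch with made-up reads; it does no error correction, so it is not a stand-in for DADA2 or Deblur.

```python
from collections import Counter


def build_feature_table(per_sample_seqs):
    """Return (column sequences, {sample: row of counts}) from per-sample reads."""
    counts = {sample: Counter(seqs) for sample, seqs in per_sample_seqs.items()}
    # Columns: every distinct sequence seen in any sample, in a stable order
    features = sorted({seq for c in counts.values() for seq in c})
    # Rows: one list of counts per sample; a missing sequence counts as 0
    table = {sample: [c[seq] for seq in features] for sample, c in counts.items()}
    return features, table


# Made-up reads, for illustration only
per_sample_seqs = {
    "gut-1": ["ACGT", "ACGT", "TTGA"],
    "palm-1": ["TTGA", "GGCC"],
}

features, table = build_feature_table(per_sample_seqs)
print(features)
print(table)
```

Most cells in a real table are zero, which is why the later slides call feature tables sparse.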
Image: QIIME 2, https://qiime2.org.
Practicum: Feature Table Creation
• This command can take a few minutes to run
  ◦ So don't worry if the command prompt doesn't immediately return after you hit enter
• Where do you guess the number 120 came from?

    qiime deblur denoise-16S \
      --i-demultiplexed-seqs demux-filtered.qza \
      --p-trim-length 120 \
      --o-representative-sequences rep-seqs.qza \
      --o-table table.qza \
      --o-stats deblur-stats.qza
Practicum: Feature Table Creation (cont.)
• Where do you guess the number 120 came from?
  ◦ It is the length to which all sequences will be trimmed
  ◦ It was chosen by viewing the Interactive Quality Plot in demux.qzv
  ◦ You might even choose a more conservative length, like 110
Practicum: Feature Table Creation (cont.)

    qiime feature-table tabulate-seqs \
      --i-data rep-seqs.qza \
      --o-visualization rep-seqs.qzv

Feature Table Tabulation View

Practicum: Feature Table Creation (cont.)

    qiime feature-table summarize \
      --i-table table.qza \
      --o-visualization table.qzv \
      --m-sample-metadata-file sample-metadata.tsv

Feature Table Summary View
Phylogenetic Tree Creation
• Evolution is the core concept of biology
  ◦ There's only so much you can learn from microbes while ignoring evolution!
• Evolution-aware analyses of a dataset need a phylogenetic tree of its sequences
  ◦ De novo: infer tree using only sequences from dataset
  ◦ Reference-based: insert sequences from dataset into an existing phylogenetic tree
    § Not all existing phylogenies are created equal; each has strengths and weaknesses based on its intended purpose when developed
• Phylogenetically based analyses in QIIME 2 need a rooted tree
[Figure: unrooted vs. rooted tree]
Geer, R.C., Messersmith, D.J., Alpi, K., Bhagwat, M., Chattopadhyay, A., Gaedeke, N., Lyon, J., Minie, M.E., Morris, R.C., Ohles, J.A., Osterbur, D.L. & Tennant, M.R. 2002. NCBI Advanced Workshop for Bioinformatics Information Specialists. [Online] http://www.ncbi.nlm.nih.gov/Class/NAWBIS/.
Practicum: Phylogenetic Tree Creation
• Note: here we are doing de novo phylogenetic tree creation
  ◦ Not necessarily the BEST approach, but an easy one to show you :)
• No visualizations will be produced
Practicum: Phylogenetic Tree Creation (cont.)

    qiime alignment mafft \
      --i-sequences rep-seqs.qza \
      --o-alignment aligned-rep-seqs.qza

    qiime alignment mask \
      --i-alignment aligned-rep-seqs.qza \
      --o-masked-alignment masked-aligned-rep-seqs.qza

    qiime phylogeny fasttree \
      --i-alignment masked-aligned-rep-seqs.qza \
      --o-tree unrooted-tree.qza

    qiime phylogeny midpoint-root \
      --i-tree unrooted-tree.qza \
      --o-rooted-tree rooted-tree.qza
Core Metrics
• So how do you actually compare microbial communities?
  ◦ Can't just eyeball the (gigantic, sparse) feature tables and look for differences
  ◦ Instead, calculate metrics that compress a lot of info into a single number
  ◦ Then do statistical tests on metrics to look for significant differences
    § BE CAREFUL: microbiome data is sparse, compositional, etc, so requires unusual tests
    § QIIME 2 uses appropriate tests; if doing your own, MUST check the literature first
• These metrics are lossy!
  ◦ No metric exposes all the information in the full feature table
    § If it did, it would BE the feature table
  ◦ Different metrics capture different aspects of the communities
• Thus ...
  ◦ Don't ask, "Which metric should I use?" UNTIL you know what you're looking for!
Core Metrics (cont.)
• QIIME 2 calculates a smorgasbord of metrics for you with one command
• Alpha diversity
  § Shannon's diversity index (a quantitative measure of community richness)
  § Observed OTUs (a qualitative measure of community richness)
  § Faith's Phylogenetic Diversity (a qualitative measure of community richness that incorporates phylogenetic relationships between the features)
  § Evenness (or Pielou's Evenness; a measure of community evenness)
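Three of the alpha diversity metrics listed above need only one sample's counts, so they can be sketched in plain Python. This sketch uses the textbook formulas. The log base for Shannon (base 2 here) and the value returned for single-feature samples are assumptions, and Faith's Phylogenetic Diversity is omitted because it needs the tree.

```python
import math


def observed_features(counts):
    """Qualitative richness: number of features with a non-zero count."""
    return sum(1 for c in counts if c > 0)


def shannon(counts, base=2):
    """Shannon index: -sum(p * log(p)) over non-zero proportions."""
    total = sum(counts)
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p, base) for p in proportions)


def pielou_evenness(counts):
    """Shannon index divided by its maximum possible value, log(richness)."""
    richness = observed_features(counts)
    if richness < 2:
        return 0.0  # convention chosen for this sketch; evenness is undefined here
    return shannon(counts, base=math.e) / math.log(richness)


even_sample = [5, 5, 5, 5]      # four features, perfectly even
skewed_sample = [17, 1, 1, 1]   # same richness, dominated by one feature

print(observed_features(even_sample), observed_features(skewed_sample))
print(shannon(even_sample), shannon(skewed_sample))
print(pielou_evenness(even_sample), pielou_evenness(skewed_sample))
```

Both example samples have the same observed richness, but the skewed one scores lower on Shannon and evenness, which is the "different metrics capture different aspects" point from the previous slide.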
• Beta diversity
  § Jaccard distance (a qualitative measure of community dissimilarity)
  § Bray-Curtis distance (a quantitative measure of community dissimilarity)
  § Unweighted UniFrac distance (a qualitative measure of community dissimilarity that incorporates phylogenetic relationships between the features)
  § Weighted UniFrac distance (a quantitative measure of community dissimilarity that incorporates phylogenetic relationships between the features)
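The difference between a qualitative and a quantitative beta diversity metric shows up in the two non-phylogenetic distances above. This is a minimal sketch using the standard textbook definitions with made-up counts; the UniFrac variants are left out because they need the phylogenetic tree.

```python
def jaccard_distance(a, b):
    """Qualitative: 1 - (features present in both) / (features present in either)."""
    present_a = {i for i, count in enumerate(a) if count > 0}
    present_b = {i for i, count in enumerate(b) if count > 0}
    union = present_a | present_b
    if not union:
        return 0.0
    return 1 - len(present_a & present_b) / len(union)


def bray_curtis_distance(a, b):
    """Quantitative: sum of absolute count differences over total counts."""
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / total


# Made-up counts for the same four features in two samples
gut = [10, 0, 5, 0]
palm = [1, 4, 5, 0]

print(jaccard_distance(gut, palm))
print(bray_curtis_distance(gut, palm))
```

Jaccard only sees which features are present, so changing `palm` to `[9, 4, 5, 0]` would leave it unchanged while Bray-Curtis would move.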
Normalization for Core Metrics
• Normalization is necessary for valid comparisons of abundance/diversity
  ◦ "But how?!"
    § Longstanding approach: rarefaction (reduce all samples to uniform sampling depth)
    § Recent publication caused concern
      − Waste not, want not: why rarefying microbiome data is inadmissible. McMurdie PJ, Holmes S. PLoS Comput Biol. 2014;10(4).
    § Further work demonstrated concern is excessive
      − Normalization and microbial differential abundance strategies depend upon data characteristics. Weiss S, et al. Microbiome. 2017 Mar 3;5(1):27. (Note: I'm an author, so not objective)
• Calculated metric values depend on sampling depth
• Ex: circled column has more non-zero counts than others
  ◦ Is its community really more diverse, or do we just SEE more?
  ◦ Samples with more sequences (greater sampling depth) show more diversity
Rarefaction
• What is rarefaction?
  ◦ Randomly subsampling the same number of sequences from each sample
  ◦ NB: samples without that number of sequences are discarded
• Concerns:
  ◦ Depth too low: ignore a lot of samples' information
  ◦ Depth too high: ignore a lot of samples
  ◦ Still a good choice for normalization (Weiss S, et al. Microbiome. 2017):
    § "Rarefying more clearly clusters samples according to biological origin than other normalization techniques do for ordination metrics based on presence or absence"
    § "Alternate normalization measures are potentially vulnerable to artifacts due to library size"
• Researcher must choose sampling depth, but how?
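Rarefaction as described above (subsample the same number of sequences from every sample, discard samples that have fewer) can be sketched as follows. This illustrates the idea and is not QIIME 2's implementation. The counts are made up, subsampling is done without replacement, and the fixed seed is only there to make the example repeatable.

```python
import random


def rarefy(table, depth, seed=0):
    """Subsample every sample to `depth` reads without replacement.

    table: {sample: list of per-feature counts}.
    Samples with fewer than `depth` reads are discarded.
    """
    rng = random.Random(seed)
    rarefied = {}
    for sample, counts in table.items():
        if sum(counts) < depth:
            continue  # not enough sequences: sample is dropped
        # One entry per read, labelled with the index of its feature
        pool = [i for i, count in enumerate(counts) for _ in range(count)]
        new_counts = [0] * len(counts)
        for feature_index in rng.sample(pool, depth):
            new_counts[feature_index] += 1
        rarefied[sample] = new_counts
    return rarefied


# Made-up feature table, for illustration only
table = {"gut-1": [50, 30, 20], "palm-1": [3, 2, 0]}
rarefied = rarefy(table, depth=10)

print(rarefied)
```

With a depth of 10, `palm-1` (5 reads) is discarded and `gut-1` keeps exactly 10 of its 100 reads, which shows both concerns from the slide in miniature.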
Sampling Depth Selection
• Don't sweat it too much
  ◦ "Low" depths (10-1000 sequences per sample) capture all but very subtle variations
  ◦ Retaining samples is usually more important than retaining sequences
    § May care not just how many samples are left out but WHICH samples are left out
Fig. 2, Kuczynski, J. et al., "Direct sequencing of the human microbiome readily reveals community differences", Genome Biology, 2010
[Figure panels: full dataset (approximately 1,500 sequences per sample); dataset sampled at only 10 sequences per sample, showing the same pattern]
Exercise: Core Metrics Sampling Depth

    qiime diversity core-metrics \
      --i-phylogeny rooted-tree.qza \
      --i-table table.qza \
      --p-sampling-depth ??? \
      --output-dir metrics

• Do NOT start typing yet!
• Note that the core metrics command requires a sampling depth
Exercise: Core Metrics Sampling Depth (cont.)
• Which sampling depth should we use?
  ◦ How can we decide?
  ◦ Work with your partner to choose a sampling depth, then answer:
    § Why did you choose this value?
    § How many samples will be excluded from your analysis based on this choice?
    § How many total sequences will you be analyzing in the core-metrics command?

    qiime tools view table.qzv
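The exercise questions reduce to simple arithmetic on the per-sample sequence totals shown in table.qzv. The sketch below works through that arithmetic. The helper name `summarize_depth` and the totals are made up for illustration and are not the tutorial dataset's real numbers.

```python
def summarize_depth(sample_totals, depth):
    """Report what a candidate sampling depth keeps and discards.

    sample_totals: {sample: total number of sequences in that sample}.
    """
    kept = sorted(s for s, total in sample_totals.items() if total >= depth)
    dropped = sorted(s for s, total in sample_totals.items() if total < depth)
    retained = depth * len(kept)  # every kept sample contributes exactly `depth`
    return {
        "samples_kept": len(kept),
        "samples_dropped": dropped,
        "sequences_retained": retained,
        "fraction_retained": retained / sum(sample_totals.values()),
    }


# Hypothetical per-sample totals (not the tutorial data)
sample_totals = {"gut-1": 3000, "tongue-1": 2500, "palm-1": 900, "palm-2": 400}

summary = summarize_depth(sample_totals, depth=800)
print(summary)
```

Trying a few candidate depths this way shows the trade-off directly: raising the depth keeps more sequences per sample but drops more samples, and it matters which samples those are.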
Answers: Core Metrics
• My answers:
  ◦ Why did you choose this value?
    § Anything higher excludes >= half of right palm samples
  ◦ How many samples will be excluded from your analysis based on this choice?
    § 4, all from right palm of subject 1
  ◦ How many total sequences will you be analyzing in the core-metrics command?
    § 24,000 (23.40%)

    qiime diversity core-metrics \
      --i-phylogeny rooted-tree.qza \
      --i-table table.qza \
      --p-sampling-depth 800 \
      --output-dir metrics
Practicum: Core Metrics
• Note: there is no single visualization for core metrics
  ◦ We will examine a few different visualizations later

    qiime diversity core-metrics \
      --i-phylogeny rooted-tree.qza \
      --i-table table.qza \
      --p-sampling-depth 800 \
      --output-dir metrics
Beta Diversity
• "Between-sample" diversity
  ◦ Has similar categories, caveats as α diversity
• A popular phylogenetic option is "UniFrac":
  ◦ Measures how different two samples' component sequences are
  ◦ Weighted UniFrac: takes the abundance of each sequence into account
Illustration courtesy of Dr. Rob Knight
Beta Diversity Ordination
• Ordination: multivariate techniques that arrange samples along axes on the basis of composition
• Principal Coordinates Analysis: a way to map non-Euclidean distances into a Euclidean space to enable further investigation
  ◦ Abbreviated as PCoA, not to be confused with PCA (Principal Component Analysis)
  ◦ Starting point is the distance matrix
    § NOT the full set of independent variables for each sample
  ◦ Pairwise distances among n samples are projected into n-1 dimensions
  ◦ PCA is performed to reduce the dimensionality back down
• PCoA axes can't be decomposed into independent variable contributions
  ◦ But results can be compared to metadata to identify patterns
Practicum: Beta Diversity Ordination

    qiime emperor plot \
      --i-pcoa metrics/unweighted_unifrac_pcoa_results.qza \
      --m-metadata-file sample-metadata.tsv \
      --o-visualization metrics/unweighted-unifrac-emperor.qzv

• This is only showing the PCoA visualization of ONE beta diversity metric
  ◦ Not necessarily "the correct one"!
  ◦ Remember that core-metrics alone calculates 3 others
• To view the ordination of a different metric, just input a different PCoA results file
  ◦ To find them:

    cd metrics/
    ls *_pcoa_results.qza
Beta Diversity Ordination View

Exercise: Beta Diversity Ordination
• Initial PCoA view (see previous slide) is completely independent of metadata
  ◦ Clusters/gradients/etc seen in PCoA are produced by unsupervised learning, based on the feature table information without any awareness of metadata
• It's great to see clear, distinct clusters as in this dataset, but even greater if they can be explained by a known metadata category
• Work with your partner to answer the following question:
  ◦ Can you find a metadata category that appears associated with the observed clusters?
    § Hint: Experiment with coloring points by different metadata

Answers: Beta Diversity Ordination
• My answer:
  ◦ Can you find a metadata category that appears associated with the observed clusters?
    § Yep: BodySite. Note left and right palm aren't distinct from each other, unsurprisingly

Practicum: Beta Diversity Group Significance
• Standard caveats apply: not the "one true metric", etc

    qiime diversity beta-group-significance \
      --i-distance-matrix metrics/unweighted_unifrac_distance_matrix.qza \
      --m-metadata-file sample-metadata.tsv \
      --m-metadata-category BodySite \
      --p-pairwise \
      --o-visualization metrics/unweighted-unifrac-bodysite-significance.qzv

Beta Diversity Group Significance View
Exercise: Beta Diversity Group Significance
• Work with your partner to answer these questions:
  ◦ Does the group significance analysis bear out your intuition from the ordination?
    § If so, are the differences statistically significant?
    § Are there specific pairs of BodySite values that are significantly different from each other?
  ◦ How about Subject?
    § Hint: you will need to run a new command!
Answers: Beta Diversity Group Significance
• My answers:
  ◦ Does the group significance analysis bear out your intuition from the ordination?
    § Yes
    § If so, are the differences statistically significant?
      − Yes, with p <= 0.001 (bonus: why do I say "less than or equal to"?)
    § Are there specific pairs of BodySite values that are significantly different from each other?
      − Yes, all of the pairs except left palm/right palm
  ◦ How about Subject?

    qiime diversity beta-group-significance \
      --i-distance-matrix metrics/unweighted_unifrac_distance_matrix.qza \
      --m-metadata-file sample-metadata.tsv \
      --m-metadata-category Subject \
      --p-pairwise \
      --o-visualization metrics/unweighted-unifrac-subject-significance.qzv

    § Nope: significance of difference of distributions of unweighted unifrac metric grouped by subject has p = 0.442
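One way to approach the bonus question: if the p-value comes from a permutation test, it can never be smaller than 1 / (permutations + 1). The sketch below assumes a PERMANOVA-style pseudo-F statistic and 999 permutations. Both are assumptions, since the slides do not state the test or the permutation count, and the distances are made up.

```python
import random


def pseudo_f(dist, groups):
    """PERMANOVA-style pseudo-F from a square distance matrix and group labels."""
    n = len(groups)
    labels = sorted(set(groups))
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    ss_within = 0.0
    for label in labels:
        idx = [i for i in range(n) if groups[i] == label]
        ss_within += sum(
            dist[i][j] ** 2 for pos, i in enumerate(idx) for j in idx[pos + 1:]
        ) / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (len(labels) - 1)) / (ss_within / (n - len(labels)))


def permutation_p_value(dist, groups, permutations=999, seed=0):
    """Fraction of label shufflings whose statistic is >= the observed one."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    # The observed labelling counts as one arrangement, hence the +1 terms
    return (hits + 1) / (permutations + 1)


# Made-up one-dimensional "communities": two well-separated groups of five
positions = [0.0, 0.1, 0.2, 0.3, 0.4, 5.0, 5.1, 5.2, 5.3, 5.4]
dist = [[abs(a - b) for b in positions] for a in positions]
groups = ["gut"] * 5 + ["tongue"] * 5

p = permutation_p_value(dist, groups)
print(p)
```

With 999 permutations the smallest value this function can return is 1/1000 = 0.001, so a reported 0.001 only says the true p-value is at most that.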
Acknowledgements
• Center for Computational Biology & Bioinformatics, University of California at San Diego
• Caporaso lab, Northern Arizona University
• Knight lab, UCSD
• QIIME 2 development team!
  ◦ Especially for the excellent "Moving Pictures" tutorial on which this one is based
Practicum: Beta Diversity Ordination
• But wait, this is time-series data! Maybe we'd like to view it on a time axis:

    qiime emperor plot \
      --i-pcoa metrics/unweighted_unifrac_pcoa_results.qza \
      --m-metadata-file sample-metadata.tsv \
      --p-custom-axis DaysSinceExperimentStart \
      --o-visualization metrics/unweighted-unifrac-emperor-bydayssince.qzv

• Standard caveats apply
Supplemental Slides

Beta Diversity Ordination View (cont.)
Practicum: Beta Diversity Correlation
• Standard caveats apply

    qiime diversity beta-correlation \
      --i-distance-matrix metrics/unweighted_unifrac_distance_matrix.qza \
      --m-metadata-file sample-metadata.tsv \
      --m-metadata-category DaysSinceExperimentStart \
      --o-visualization metrics/unweighted-unifrac-dayssinceexperimentstart-beta-correlation.qzv

Beta Diversity Correlation View
Alpha Diversity
• "Within-sample" diversity
  ◦ Many different metrics exist
    § Taxonomy-based (e.g., number of observed OTUs)
      − Assume everything is equally dissimilar
      − More likely to see differences based on close relatives
    § Phylogeny-based (e.g., phylogenetic diversity over whole tree)
      − Treat less related items as more dissimilar
      − Better at scaling the observed differences
  ◦ The "correct" metric(s) are those relevant to your hypothesis
    § Please do HAVE a hypothesis!
• Testing approach:
  ◦ Examine alpha diversity metric by metadata values
  ◦ Test whether the metric distribution differs between groups (if metadata is categorical) or is correlated with metadata (if metadata is continuous)
Alpha Diversity (cont.)
[Figure: number of OTUs by sampling site]
• High within-sample diversity: why?
Practicum: Alpha Diversity Group Significance
• Note: only showing you the group significance visualization of ONE alpha diversity metric
  ◦ Remember that core-metrics alone calculates 3 others
  ◦ The one I am showing is not "the correct one"; pick the one that fits your hypothesis

    qiime diversity alpha-group-significance \
      --i-alpha-diversity metrics/faith_pd_vector.qza \
      --m-metadata-file sample-metadata.tsv \
      --o-visualization metrics/faith-pd-group-significance.qzv

• To check the group significance of a different metric, just input a different vector file
  ◦ To find them:

    cd metrics/
    ls *_vector.qza
Alpha Diversity Group Significance View
Exercise: Alpha Diversity Group Significance
• Work with your partner to answer these questions:
  ◦ Is BodySite value associated with significant differences in phylogenetic diversity?
  ◦ Which two sites have the most significant difference in phylogenetic diversity distributions?
    § Note the difference between p-value and q-value
  ◦ Is Subject value associated with significant differences in phylogenetic diversity?
Answers: Alpha Diversity Group Significance
• My answers:
  ◦ Is BodySite value associated with significant differences in phylogenetic diversity?
    § Yes, with p < 1E-3
  ◦ Which two sites have the most significant difference in phylogenetic diversity distributions?
    § Left palm is (equally) most significantly different from gut and tongue
      − Consider: any idea why perhaps left palm but not right?
  ◦ Is Subject value associated with significant differences in phylogenetic diversity?
    § No, p-value = 0.24
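On the p-value versus q-value distinction: a q-value is a p-value adjusted for the fact that several pairwise comparisons were tested at once. The sketch below assumes a Benjamini-Hochberg style false discovery rate adjustment. That is an assumption, since the slides do not say which correction the visualization applies, and the p-values are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each q is the minimum of
    # p * m / rank over its own rank and all higher ranks.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        q_values[index] = running_min
    return q_values


# Made-up p-values for four pairwise comparisons
p_values = [0.001, 0.04, 0.03, 0.5]
q_values = benjamini_hochberg(p_values)

print(q_values)
```

In this example two comparisons with p < 0.05 end up with q > 0.05, so they would no longer count as significant at a 5% false discovery rate.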
Practicum: Alpha Diversity Correlation
• Same caveat as before:
  ◦ Only showing the correlation visualization of ONE alpha diversity metric
    § Not necessarily "the correct one"!

    qiime diversity alpha-correlation \
      --i-alpha-diversity metrics/evenness_vector.qza \
      --m-metadata-file sample-metadata.tsv \
      --o-visualization metrics/evenness-alpha-correlation.qzv

Alpha Diversity Correlation View
Taxonomic Assignment
• Sequence features or OTUs have limited utility
  ◦ At some point, you'll want to link your findings to published work
  ◦ That requires identifying the taxonomy of each sequence feature
• Steps:
  ◦ Pick reference database
    § I hear you cry, "Which one should I use?"
  ◦ Train a classifier algorithm to assign taxonomies to sequences
    § Use the reference database as the training set
  ◦ Run the classifier algorithm on your sequence features
Common Issues in Marker Gene Studies (cont.)
• Selecting an inappropriate reference database
  ◦ E.g., Greengenes (16S) reference database when sequencing ITS
Marker Gene Reference Databases
  ◦ NOT a complete list:
    § Greengenes: 16S
    § Silva: 16S/18S
    § RDP: 16S/18S/28S
    § UNITE: ITS
  ◦ Another not complete list at eukref.org/databases (not just eukaryotic)
  ◦ At the very least, choose a database that includes your marker gene!
    § Beyond that, formal guidance is hard to find
    § But off the record you might get some informal guidance :)
Common Issues in Marker Gene Studies (cont.)
• Expecting species-level taxonomy calls
  ◦ Most OTUs/features only specified to family or genus level
Taxonomy: Expectation vs. Reality

Rank      Ideal Result          Real Result
Kingdom   Bacteria              Bacteria
Phylum    Proteobacteria        Proteobacteria
Class     Gammaproteobacteria   Gammaproteobacteria
Order     Enterobacteriales     Enterobacteriales
Family    Enterobacteriaceae    Enterobacteriaceae
Genus     Escherichia           ---
Species   coli                  OTU 2445338
Strain    O157:H7               ---
Practicum: Taxonomic Assignment

    qiime feature-classifier classify-sklearn \
      --i-classifier gg-13-8-99-515-806-nb-classifier.qza \
      --i-reads rep-seqs.qza \
      --o-classification taxonomy.qza

    qiime taxa tabulate \
      --i-data taxonomy.qza \
      --o-visualization taxonomy.qzv

Taxonomic Assignment Tabulation View

Practicum: Taxonomic Assignment (cont.)

    qiime taxa barplot \
      --i-table table.qza \
      --i-taxonomy taxonomy.qza \
      --m-metadata-file sample-metadata.tsv \
      --o-visualization taxa-bar-plots.qzv

Taxonomic Assignment Bar Plot View
Exercise: Taxonomic Assignment
• "Level 1" = kingdom, "Level 2" = phylum, etc
• Work with your partner to:
  ◦ Visualize the taxa at level 2
  ◦ Sort the samples by BodySite
  ◦ Do you see anything suggestive?

Answers: Taxonomic Assignment
• Gut sure seems to have a lot more Bacteroidetes than the other sites
Differential Abundance Analysis
• Why go to the trouble of assigning taxonomies?
  ◦ Probably you want to know whether any particular taxa are differentially abundant
    § In different individuals, environments, timepoints, etc
• How to test for differential abundance?
  ◦ Remember: microbiome datasets are "compositional" (fixed sum)
  ◦ Watch out: "traditional" statistical methods perform badly for this sort of data!
    § E.g., 95% false positives when you expect an FDR of 5%
  ◦ What to use instead?
    § Balance trees (borrowed from geology) are currently the best known option (as of 2017)
      − But they aren't implemented in QIIME 2 yet
      − So until they are, use the previous best known option (as of 2015)
    § ANCOM (ANalysis of Composition Of Microbiomes)
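Why "compositional" data trips up traditional tests can be seen in a toy example: when one taxon blooms, every other taxon's relative abundance falls even though its true count has not changed. The sketch below uses made-up counts and illustrates only that effect and the log-ratio idea behind compositional methods. It is not an implementation of ANCOM or balance trees.

```python
import math


def closure(counts):
    """Convert counts to relative abundances that sum to 1 (the 'fixed sum')."""
    total = sum(counts)
    return [count / total for count in counts]


def log_ratio(values, i, j):
    """Log of the ratio between two components; unaffected by the fixed sum."""
    return math.log(values[i] / values[j])


# Made-up absolute counts for taxa A, B, C before and after taxon A blooms;
# B and C do not change at all.
before = [100, 100, 100]
after = [1000, 100, 100]

rel_before = closure(before)
rel_after = closure(after)

print(rel_before)
print(rel_after)
print(log_ratio(rel_before, 1, 2), log_ratio(rel_after, 1, 2))
```

Taxon B's relative abundance drops from about 33% to about 8% although its count is unchanged, while the B-to-C log-ratio stays the same. Ratios between taxa are what compositional methods compare.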
Practicum: Differential Abundance Analysis

    qiime taxa collapse \
      --i-table table.qza \
      --i-taxonomy taxonomy.qza \
      --p-level 2 \
      --o-collapsed-table table-level2.qza

    qiime composition add-pseudocount \
      --i-table table-level2.qza \
      --o-composition-table comp-table-level2.qza

    qiime composition ancom \
      --i-table comp-table-level2.qza \
      --m-metadata-file sample-metadata.tsv \
      --m-metadata-category BodySite \
      --o-visualization ancom-bodysite-level2.qzv
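The collapse step in the first command above sums the counts of all features that share the same taxonomy down to the chosen level. This sketch shows the idea for one sample's counts. The feature IDs, counts, and the semicolon-separated, Greengenes-style taxonomy strings are assumptions for illustration, and it is not the `qiime taxa collapse` implementation.

```python
from collections import defaultdict


def collapse_table(feature_counts, taxonomy, level):
    """Sum counts of features that share the first `level` taxonomic ranks.

    feature_counts: {feature id: count} for one sample.
    taxonomy: {feature id: 'rank1; rank2; ...'} (assumed string format).
    """
    collapsed = defaultdict(int)
    for feature, count in feature_counts.items():
        ranks = [rank.strip() for rank in taxonomy[feature].split(";")]
        collapsed[";".join(ranks[:level])] += count
    return dict(collapsed)


# Made-up features and taxonomy strings, for illustration only
feature_counts = {"f1": 10, "f2": 5, "f3": 7}
taxonomy = {
    "f1": "k__Bacteria; p__Bacteroidetes; c__Bacteroidia",
    "f2": "k__Bacteria; p__Bacteroidetes; c__Flavobacteriia",
    "f3": "k__Bacteria; p__Firmicutes; c__Bacilli",
}

level2 = collapse_table(feature_counts, taxonomy, level=2)
print(level2)
```

Level 2 corresponds to phylum, so the two Bacteroidetes features merge into a single column while Firmicutes stays separate.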
Differential Abundance Analysis View

Details: Differential Abundance Analysis
• W-statistic
  ◦ # of other items from which a single item is found to be significantly different
    § With alpha = 0.05 by default (can be changed)
• Percentile abundance table:
  ◦ A table of items and their percentile abundances in each group
  ◦ Rows are items
  ◦ Columns are percentiles within a group
  ◦ Values are abundance of reads for the given percentile for that group