Galaxy RNA-Seq Analysis: H. sapiens Tutorial
Research Informatics Solutions, Minnesota Supercomputing Institute, University of Minnesota
Version 3, 10/25/2016

1 Introduction
    1.1 Scope of this tutorial
    1.2 Reference materials
    1.3 Outline of tutorial
2 Starting Galaxy
    2.1 Accessing Galaxy
    2.2 Import Fastq files for one sample into current history
    2.3 Import the GTF file from the iGenomes data library
    2.4 Set file attributes
    2.5 Run FastQC
3 Mapping with Tophat
    3.1 Initial Tophat run
    3.2 Determine insert size
    3.3 Rerun Tophat with correct insert size
    3.4 Review mapping statistics
4 Workflows
5 Visualizing alignments with IGV
    5.1 Load BAM alignment files and GTF into new history
    5.2 Load files into IGV
    5.3 Look at a housekeeping gene
    5.4 Look at a gene with differential expression
6 Computing differential expression with cuffdiff
    6.1 Run cuffdiff
    6.2 Filter cuffdiff output
7 Cuffdiff visualization with CummeRbund
    7.1 Run CummeRbund tool
    7.2 Review CummeRbund plots
    7.3 Additional CummeRbund plots
    7.4 Troubleshooting
8 Appendix A: Workflows
    8.1 Extract workflow from current history
    8.2 Edit the workflow
    8.3 Create new history
    8.4 Run workflow

1 Introduction

1.1 Scope of this tutorial

This is a practical, hands-on tutorial designed to give participants experience with RNA-Seq data analysis using Tophat, Cufflinks, and CummeRbund in Galaxy. The analysis in this tutorial is typical of experiments in eukaryotic species with high-quality genomes and genome annotation available. Participants are expected to be familiar with next-generation sequence data, basic theory of RNA-Seq, and Galaxy. Participants do not need previous experience with Tophat, Cufflinks, or CummeRbund.

1.2 Reference materials

RNA-Seq Lecture PDFs on MSI website: https://www.msi.umn.edu/sites/default/files/RNA-Seq%20Lecture_2016.pdf
Galaxy 101: NGS data analysis hands-on tutorial: www.msi.umn.edu/content/bioinformatics-analysis
Tophat manual: ccb.jhu.edu/software/tophat/manual.shtml
Cufflinks manual: cole-trapnell-lab.github.io/cufflinks/manual/
CummeRbund manual: compbio.mit.edu/cummeRbund

1.3 Outline of tutorial

1 Introduction
2 Starting Galaxy
3 Mapping with Tophat
4 Workflows
5 Visualizing alignments with IGV
6 Computing differential expression with cuffdiff
7 Cuffdiff visualization with CummeRbund
8 Appendix A: Workflows

2 Starting Galaxy

Note: Tutorial Dataset (Sect. 2.2)
This tutorial will identify genes whose expression levels differ between skeletal muscle tissue and heart muscle tissue. The sample dataset used in this tutorial was created from the heart and skeletal muscle samples from the Illumina Bodymap 2.0 Project (www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE30611). The single heart and skeletal muscle samples were each split into three subsamples, and the reads mapping to a 5 Mb region near the distal end of chromosome 19 were extracted along with some unmapped reads. Each fastq file contains about 50,000 paired-end reads of 50 base pairs.
NOTE: This dataset was chosen to allow for fast processing and response times in a classroom setting where dozens of people will be submitting jobs at once to the server. It is not ideal due to the small sample sizes (leading to atypical-looking graphs in some cases and poor statistics) and lack of real biological replicates (resulting in unrealistically good sample separation).

Note: GTF Files (Sect. 2.3)
A GTF file identifies the genomic locations of genes and their exons. If a GTF file for your organism is not listed, send a request to MSI, or find one online at sites such as www.ensembl.org/info/data/ftp/index.html, genome.ucsc.edu/cgi-bin/hgTables?command=start, or NCBI. The GTF files provided in the Illumina iGenomes collection (ccb.jhu.edu/software/tophat/igenomes.shtml) have been specially modified for maximum compatibility with the Cufflinks and Cuffdiff programs.

Note: Quality Control (Sect. 2.5)
It is important to always verify the integrity of a dataset before starting to analyze it. Quantifying dataset quality may uncover problems that might otherwise go undetected. Data quality problems such as sequencing adaptor contamination or low read quality require trimming and filtering not covered in this tutorial. See the Galaxy 101 tutorial handout on the MSI website for detailed instructions on how to clean up a low quality dataset: www.msi.umn.edu/content/bioinformatics-analysis
The graphs generated in this tutorial are not entirely typical due to the small sample datasets used. See examples of output from good and bad Illumina datasets under the “Example Reports” section on this website: www.bioinformatics.babraham.ac.uk/projects/fastqc/. For more information about interpreting FastQC output refer to the RIS tutorial “QC of Illumina Data using Galaxy” handout: www.msi.umn.edu/content/bioinformatics-analysis

2.1 Accessing Galaxy

a) Open a web browser and navigate to the MSI Galaxy website: galaxy.msi.umn.edu
b) Log in with your MSI username and password

[Figure: the Galaxy interface, with the Tools pane, Center pane, and History pane labeled]

2.2 Import Fastq files for one sample into current history (see note: Tutorial Dataset)

a) At the top of the screen select “Shared Data -> Data Libraries”
b) Select “RISS-tutorial-Hsapiens” from the list of data libraries
c) Expand the “Fastq” folder and check the boxes next to the first two files
d) Near the top of the screen click the “to History” button, then click “Import” to import the selected datasets to the selected history (the default is the current history)

2.3 Import the GTF file from the iGenomes data library (see note: GTF Files)

a) At the top of the screen select “Shared Data -> Data Libraries”
b) Select “iGenomes” from the list of data libraries
c) Check the box next to the “hg19_chr19_genes_2012-03-09.gtf” file
d) Near the top of the page click the “to History” button, then click “Import” to import the selected datasets to the current history
e) At the top of the screen click “Analyze Data” to return to your current history
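
For readers who want to see what the imported GTF file contains, the following is a minimal Python sketch of reading one GTF line. It relies only on the standard GTF layout of nine tab-separated columns (sequence name, source, feature, start, end, score, strand, frame, attributes). The example line, including the gene and transcript identifiers, is invented for illustration and is not taken from the iGenomes file.

```python
from typing import Dict, NamedTuple


class GtfRecord(NamedTuple):
    seqname: str
    feature: str
    start: int
    end: int
    strand: str
    attributes: Dict[str, str]


def parse_gtf_line(line: str) -> GtfRecord:
    """Split one GTF line into its nine tab-separated columns."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 columns, got {len(cols)}")
    seqname, _source, feature, start, end, _score, strand, _frame, attr_text = cols

    # Attributes look like: key "value"; key "value";
    attributes = {}
    for item in attr_text.split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attributes[key] = value.strip('"')

    return GtfRecord(seqname, feature, int(start), int(end), strand, attributes)


# Invented example line (hypothetical identifiers).
EXAMPLE = 'chr19\tunknown\texon\t1000\t1200\t.\t+\t.\tgene_id "GENE_A"; transcript_id "TX_A";'
record = parse_gtf_line(EXAMPLE)
print(record.seqname, record.feature, record.start, record.end, record.attributes["gene_id"])
```

GTF coordinates are 1-based and inclusive, so the exon in this example is 201 bases long.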

2.4 Set file attributes

a) In the history pane click on the pencil icon next to the heart-1_R1.fastq file
b) Click the Datatype tab
c) Enter “fastqsanger” in the “New Type” box. A list of available datatypes will appear as you type.
d) Click save
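
The “fastqsanger” datatype tells Galaxy that quality scores are stored in the Sanger convention, where each quality character encodes a Phred score as its ASCII code minus 33. The sketch below, which is not part of the Galaxy tools, shows how such a file could be read in plain Python to count reads and compute per-read mean quality. It assumes simple four-line FASTQ records, and the two example reads are invented.

```python
import io
from statistics import mean


def read_fastq(handle):
    """Yield (name, sequence, quality_string) for each four-line FASTQ record."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header.rstrip("\n")[1:], seq, qual


def phred_scores(quality_string, offset=33):
    """Decode Sanger-encoded qualities (ASCII code minus 33)."""
    return [ord(ch) - offset for ch in quality_string]


# Two invented reads: 'I' encodes Phred 40 and '!' encodes Phred 0.
EXAMPLE = "@read1\nACGT\n+\nIIII\n@read2\nACGA\n+\n!!II\n"
records = list(read_fastq(io.StringIO(EXAMPLE)))
read_count = len(records)
mean_quality = [mean(phred_scores(qual)) for _, _, qual in records]
print(read_count, mean_quality)
```

If a file were instead encoded with a different offset, the decoded scores would be shifted, which is why setting the datatype correctly matters before running downstream tools.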

2.5 Run FastQC (see note: Quality Control)

a) Load the FastQC tool from the tool pane: “NGS: QC and manipulation -> FastQC”
b) Set the input file: select “heart-1_R1.fastq” from the drop-down menu under “Short read data from your current history”
c) Click “Execute”
d) When FastQC has finished running, click on the eye icon on the FastQC Webpage output file to display the file in the center pane

For a real dataset you would need to repeat this step on the R2 fastq file.

See the Galaxy 101 tutorial handout for detailed instructions on how to clean up a low quality dataset: www.msi.umn.edu/content/bioinformatics-analysis

3 Mapping with Tophat

Note: Reference Genomes (Sect. 3.1)
It is important that the reference genome you align against and the GTF you are using come from the same genome build, because the chromosome names and coordinates used in the GTF file must be the same as those used in the reference database. If the reference genome for your organism is not listed, email a request to MSI to have it added.
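
One way to picture the chromosome-name requirement is a quick comparison of the sequence names used in a GTF against those in a reference. The following Python sketch is illustrative only: the function names are hypothetical, the GTF lines are invented, and the reference names are supplied by hand rather than read from a genome index.

```python
def gtf_seqnames(gtf_lines):
    """Collect the sequence names found in column 1 of a GTF."""
    names = set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        names.add(line.split("\t", 1)[0])
    return names


def missing_from_reference(gtf_lines, reference_names):
    """Return GTF sequence names that the reference does not contain."""
    return sorted(gtf_seqnames(gtf_lines) - set(reference_names))


# Invented example: one annotation uses "chr19", another uses "19".
gtf_example = [
    'chr19\tunknown\texon\t1000\t1200\t.\t+\t.\tgene_id "GENE_A";',
    '19\tunknown\texon\t5000\t5300\t.\t-\t.\tgene_id "GENE_B";',
]
missing = missing_from_reference(gtf_example, {"chr19"})
print(missing)
```

Any name reported as missing (here the bare "19") indicates annotations that could not be matched to the reference sequences.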

Note: Mean Inner Distance – Part I (Sect. 3.1)
This is the expected (mean) inner distance between mate pairs. For example, the UMGC’s default fragment selection size is 200, so 200 - (2 * read length) is a good value to use for this parameter. We will determine the exact fragment length in the next section.

Note: Junctions (Sect. 3.1)
Tophat can attempt to identify exon-exon splice junctions solely using your dataset, or you may supply a set of gene model annotations as a GTF or GFF file. In this tutorial we will provide a GTF annotation file because the human genome is well annotated.

Note: Advanced Tophat Parameters (Sect. 3.1)
See the RNA-Seq Lecture 2 handout for more detail on setting parameters properly for other organisms: www.msi.umn.edu/content/bioinformatics-analysis

Note: Mean Inner Distance – Part II (Sect. 3.2)
It is important that the mean inner distance Tophat parameter is set correctly in order to get the best mapping results. The actual average fragment size for each sample can be determined by running Tophat with an estimated inner distance and then calculating the true value from the mapped reads. Rerunning Tophat with the true value will give improved results.

Note: Insert Size Histogram (Sect. 3.2)
The insert size histogram generated from this sample dataset is noisier than a typical histogram, shown here:

[Figure: a typical insert size histogram]

Note: Mapping Statistics (Sect. 3.4)
It is important to determine how well the RNA-Seq reads align to the reference genome. Low mapping rates require further investigation to determine the cause.

3.1 Initial Tophat run (see notes: Reference Genomes, Mean Inner Distance – Part I, Junctions, Advanced Tophat Parameters)

a) Load the Tophat tool from the tool pane: “NGS: RNA Analysis -> Tophat”
b) Is this library mate-paired -> Paired-end (as individual datasets)
c) RNA-Seq FASTQ file, forward reads -> heart-1_R1.fastq
d) RNA-Seq FASTQ file, reverse reads -> heart-1_R2.fastq
e) Mean Inner Distance between Mate Pairs -> 100
f) Select a reference genome -> Human hg19 chr19
g) TopHat settings to use -> Full parameter list
h) Do you want to supply your own junction data -> Yes
i) Use Gene Annotation Model -> Use a gene annotation from history
j) Click “Execute” to submit the job

Only files of type “fastqsanger” will appear in the drop-down list. If your fastq file isn’t shown, the file type is set incorrectly. See step 2.4.

3.2 Determine insert size (see notes: Mean Inner Distance – Part II, Insert Size Histogram)

a) Load the insert size tool: “NGS: Picard -> CollectInsertSizeMetrics”
b) Using reference genome -> hg19-chr19
c) Click Execute
d) Click on the “eye” icon next to the first of the two output files in the history pane to view the output in the central pane
e) Identify the mode (highest frequency) insert size from the program output
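
Step e) can also be done programmatically. The Python sketch below finds the most frequent insert size in a histogram table. It assumes the Picard text output contains a section introduced by a line starting with "## HISTOGRAM", followed by a column-header row and then tab-separated rows of insert size and read count; check this against your own output file before relying on it. The counts in the example are invented.

```python
def insert_size_mode(picard_text):
    """Return the insert size with the highest read count in the histogram section."""
    in_histogram = False
    best_size, best_count = None, -1.0
    for line in picard_text.splitlines():
        if line.startswith("## HISTOGRAM"):
            in_histogram = True
            continue
        if not in_histogram or not line.strip():
            continue
        fields = line.split("\t")
        if not fields[0].isdigit():
            continue  # column-header row such as "insert_size ..."
        size, count = int(fields[0]), float(fields[1])
        if count > best_count:
            best_size, best_count = size, count
    return best_size


# Invented histogram rows in the assumed layout.
EXAMPLE = (
    "## HISTOGRAM\tjava.lang.Integer\n"
    "insert_size\tAll_Reads.fr_count\n"
    "150\t12\n"
    "160\t40\n"
    "170\t25\n"
)
mode = insert_size_mode(EXAMPLE)
print(mode)
```

With a noisy histogram such as the one from this small dataset, the mode can be sensitive to single bins, so it is worth comparing it with the plotted histogram by eye.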

3.3 Rerun Tophat with correct insert size

a) Click on the name of any one of the Tophat output files in the history pane to expand it, and click on the circular arrow icon to display the Tophat tool in the central pane with the parameters preset from the last Tophat run
b) Change the “Mean Inner Distance between Mate Pairs” to the correct value: Picard value - (2 * read length) = 160 - (2 * 50) = 60
c) Click “Execute” to submit the job
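
The arithmetic in step b) is the same formula used for the initial estimate in Section 3.1. As a small Python sketch (the function name is chosen here for illustration), using the numbers from this tutorial:

```python
def mean_inner_distance(fragment_size, read_length):
    """Inner distance between mates: fragment size minus both read lengths."""
    return fragment_size - 2 * read_length


# Initial estimate: 200 bp fragment selection size, 50 bp reads.
initial_estimate = mean_inner_distance(200, 50)
# Corrected value: 160 bp insert size reported by Picard, 50 bp reads.
corrected = mean_inner_distance(160, 50)
print(initial_estimate, corrected)  # prints: 100 60
```

A negative result would mean the two reads of a pair overlap, which can happen when fragments are shorter than twice the read length.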

3.4 Review mapping statistics (see note: Mapping Statistics)

a) Click on the “eye” icon next to the Tophat “align_summary” output file in the history pane to view the output in the central pane
b) Rename the current history: at the top of the history pane click on “Unnamed history” and rename it “heart-1”. (NOTE: you must hit ‘Enter’ after typing the new name, rather than clicking outside the box)
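
When many samples are processed, it can help to pull the mapping rates out of each summary automatically. The Python sketch below assumes the summary lists, for the left and right reads in turn, an "Input : N" line followed by a "Mapped : M" line; that layout and all numbers in the example are assumptions made for illustration, so compare them with your own align_summary file.

```python
import re


def mapping_rates(summary_text):
    """Return the percentage of input reads that mapped, one value per read set."""
    inputs = [int(n) for n in re.findall(r"Input\s*:\s*(\d+)", summary_text)]
    mapped = [int(n) for n in re.findall(r"Mapped\s*:\s*(\d+)", summary_text)]
    return [100.0 * m / i for i, m in zip(inputs, mapped) if i > 0]


# Invented summary text in the assumed layout.
EXAMPLE = """Left reads:
          Input     :     50000
           Mapped   :     47500 (95.0% of input)
Right reads:
          Input     :     50000
           Mapped   :     45000 (90.0% of input)
"""
rates = mapping_rates(EXAMPLE)
print(rates)
```

A rate well below what is seen for the other samples in the same batch is the kind of result that the Mapping Statistics note says needs further investigation.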

4 Workflows

Note: Galaxy Workflows (Sect. 8)
All of the steps that have been performed on the heart-1 sample need to be repeated, in separate histories, for the two other heart samples and the three skeletal samples. Galaxy workflows provide an easy method to automate an analysis pipeline. Appendix A demonstrates how to generate a workflow from your current history and use it to analyze another sample. To save time we will not work through this section in the hands-on workshop, but this section should be completed if working on a real dataset.

5 Visualizing alignments with IGV

Note: Visualization (Sect. 5.3)
Visualizing alignments is a quick and easy way to check for major problems with the data. You may wish to verify that housekeeping genes are indeed roughly evenly covered with reads, or that documented differentially-expressed genes indeed have differential coverage between samples of different groups.

Note: Galaxy Visualization Options (Sect. 5.2)
Galaxy supports three genome browsers for visualizing data:
• The Integrative Genomics Viewer (IGV) is the recommended genome browser because it is fast, powerful, and easy to use.
• Trackster is a genome browser built into Galaxy. Any data file that can be viewed in Trackster will have a Trackster icon displayed with the “Download” and “View details” buttons.
• The Integrated Genome Browser (IGB) is similar to IGV, but most users prefer to use IGV.

Note: Sample Dataset (Sect. 5.1)
In this section we start with Bam alignment files that have already been generated for all six heart and skeletal samples. These Bam files were generated using the workflow previously described in this tutorial.

5.1 Load BAM alignment files and GTF into new history (see note: Sample Dataset)

a) Create a new history by clicking on the gear icon at the top of the history window and selecting “Create New” from the drop-down menu
b) Click on “Shared Data -> Data Libraries” at the top of the window
c) Click on the “RISS-tutorial-Hsapiens” data library
d) Expand the “Bam” folder and check the box next to each bam file
e) Click “to History” near the top of the center pane to import to the current history
f) Import the hg19_chr19 GTF file by clicking on “Shared Data -> Data Libraries” at the top of the screen and selecting “hg19_chr19_genes_2012-03-09.gtf” from the “iGenomes” data library
g) Return to your history by clicking on “Analyze Data” at the top of the screen

5.2 Load files into IGV (see note: Galaxy Visualization Options)

a) Launch the IGV browser on your computer (to download IGV: http://software.broadinstitute.org/software/igv/download).
b) Click on the “heart-1_accepted_hits.bam” file in the history pane to expand it and click on the “local” link next to “display with IGV”. The heart-1.bam file will load into IGV.
c) Repeat b) to load skeletal-1.bam into IGV.

5.3 Look at a housekeeping gene (see note: Visualization)

a) Verify that “Human hg19” is selected as the reference genome from the drop-down menu at the top left of the IGV window
b) Enter “ube2s” in the search box to view the reads aligning to the ubiquitin-conjugating enzyme E2S gene, which is expected to have similar expression levels in both tissue types
c) Right-click on the heart coverage track and select “Set Data Range”
d) Set the “Max” value to 16
e) Repeat for the skeletal coverage track

5.4 Look at a gene with differential expression

a) Enter “tnnt1” in the search box to view the reads aligning to the Troponin T, slow skeletal muscle gene, which is expected to be expressed only in skeletal muscle
b) Adjust the scale of the coverage tracks as needed (try max = 1700)

6 Computing differential expression with cuffdiff

Note: Cuffdiff Output (Sect. 6.2)
Cuffdiff produces many output files. In this tutorial we look at the gene differential expression testing file, which shows which genes are differentially expressed. The other output files also contain important data, including the results of differential expression testing for spliced transcripts, primary transcripts, and coding sequences. See the cufflinks manual for detailed information about what information is in each file: cole-trapnell-lab.github.io/cufflinks/file_formats/index.html#output-formats-used-in-the-cufflinks-suite

Note: Differential Gene Expression (Sect. 6.2)
The gene differential expression testing output file is a tab-delimited text file with one row for each gene. Our sample dataset only covers a small portion of chr19, so most genes will have too few aligned reads for a differential expression test. These genes are indicated with “NOTEST” or “LOWDATA” in column 7.

Note: De novo gene/transcript discovery (Sect. 6.1)
The analysis pipeline used in this tutorial will quantify the expression of known genes in a reference annotation. If you are interested in discovering novel genes or splice forms, more steps need to be added to the pipeline. Refer to the Nature Protocols paper “Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks” for more information: www.ncbi.nlm.nih.gov/pubmed/22383036

6.1 Run cuffdiff (see note: De novo gene/transcript discovery)

a) Load the Cuffdiff tool: “NGS: RNA Analysis -> Cuffdiff”
b) Set parameters:
• Generate SQLite -> Yes
• 1: Condition Name -> Heart
• Replicates -> use shift to select the three heart bam files
• 2: Condition Name -> Skeletal
• Replicates -> use shift to select the three skeletal bam files
c) Click “Execute” to submit the job

6.2 Filter cuffdiff output (see notes: Cuffdiff Output, Differential Gene Expression)

a) Load the text filter tool: “Filter and Sort -> Filter”
b) Click on the output file “gene differential expression testing” to expand it in the history pane (this allows you to see the column names and numbers)
c) Set the Cuffdiff output file “gene differential expression testing” as the file to filter
d) Keep only the genes with a significant change in expression and an absolute log2 fold-change greater than 1 by entering “c14=='yes' and abs(c10)>1” in the “with following condition” text box (use straight quotes around yes)
e) Click “Execute” to submit the job
f) Click on the “eye” icon next to the filter output file name to view the results in the center pane
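
The filter condition in step d) can be read as a plain row test on the tab-delimited file. The Python sketch below mirrors it outside Galaxy, using zero-based indexes for the same columns: column 14 (significant) and column 10 (log2 fold change). The header matches the cuffdiff gene differential expression format as far as we are aware, but the three data rows, including all expression values and p-values, are invented for illustration.

```python
import csv
import io


def filter_significant(diff_text, min_abs_log2_fc=1.0):
    """Keep rows where c14 == 'yes' and abs(c10) > min_abs_log2_fc."""
    rows = csv.reader(io.StringIO(diff_text), delimiter="\t")
    header = next(rows)
    kept = []
    for row in rows:
        significant = row[13]    # c14 in Galaxy's 1-based numbering
        log2_fc = float(row[9])  # c10; float() also accepts "inf" and "-inf"
        if significant == "yes" and abs(log2_fc) > min_abs_log2_fc:
            kept.append(row)
    return header, kept


HEADER = ("test_id\tgene_id\tgene\tlocus\tsample_1\tsample_2\tstatus\t"
          "value_1\tvalue_2\tlog2(fold_change)\ttest_stat\tp_value\tq_value\tsignificant\n")
# Invented rows: one significant gene, one unchanged gene, one untested gene.
ROWS = (
    "TNNT1\tTNNT1\tTNNT1\tchr19:1-2\tHeart\tSkeletal\tOK\t1.0\t900.0\t9.8\t5.1\t0.0001\t0.001\tyes\n"
    "UBE2S\tUBE2S\tUBE2S\tchr19:3-4\tHeart\tSkeletal\tOK\t20.0\t22.0\t0.14\t0.2\t0.8\t0.9\tno\n"
    "GENE_X\tGENE_X\tGENE_X\tchr19:5-6\tHeart\tSkeletal\tNOTEST\t0\t0\t0\t0\t1\t1\tno\n"
)
header, kept = filter_significant(HEADER + ROWS)
print([row[2] for row in kept])
```

Rows with a status of NOTEST or LOWDATA are never marked significant, so they drop out of the result without a separate test on column 7.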

7 Cuffdiff visualization with CummeRbund

Note: CummeRbund
CummeRbund is an easy-to-use R package that takes the output files from a cuffdiff run and creates a SQLite database of the results. This allows the user to explore data for genes, transcripts, transcription start sites, and CDS regions across multiple samples or conditions. CummeRbund implements numerous plotting functions for commonly used visualizations. The CummeRbund wrapper in Galaxy allows easy access to much of CummeRbund’s functionality. For more details about available plots refer to the CummeRbund website: compbio.mit.edu/cummeRbund/

Note: Density Plots
A kernel density plot is interpreted the same way as a histogram. The density plot shows the distribution of gene expression levels across different samples. All samples should have reasonably similar distributions. A log10(FPKM) of 0 corresponds to 1 FPKM, which is very low expression.

Note: MDS Plots
MDS plots are similar to Principal Component Analysis (PCA) plots. They are useful for determining the major sources of variation in the dataset. Ideally samples from the same experimental group will be clustered together in the plot, indicating that experimental condition is the major source of variation. Samples might also cluster by age, batch, date, technician, or other technical aspect of the experiment.

Note: Dendrogram
A dendrogram is a tree diagram showing how samples cluster by similarity. Ideally samples from the same experimental group are clustered together.
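
To make the log10 scale mentioned in the Density Plots note concrete, the short Python sketch below converts a few FPKM values to log10(FPKM). The values are arbitrary examples, and returning negative infinity for an FPKM of zero is a choice made for this sketch, not a description of how CummeRbund treats such genes.

```python
import math


def log10_fpkm(fpkm):
    """Convert an FPKM value to the log10 scale used on a density plot axis."""
    if fpkm == 0:
        return float("-inf")  # log of zero is undefined; choice made for this sketch
    return math.log10(fpkm)


for fpkm in (0.1, 1, 10, 100, 1000):
    print(fpkm, round(log10_fpkm(fpkm), 3))
```

Each step of 1 on the axis is a ten-fold change in expression, so a peak near 2 corresponds to genes at roughly 100 FPKM.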

7.1 Run CummeRbund tool (see note: CummeRbund)

a) Load the CummeRbund tool: NGS: RNA Analysis -> cummeRbund visualize Cuffdiff output
b) Set parameters:
• + Insert Plots (click three times to generate three plots)
• Plot type: Density
• Plot type: Multi Dimensional Scaling (MDS) Plot
• Plot type: Dendrogram
c) Click “Execute” to submit the job

Have patience when setting the CummeRbund parameters. After changing each setting it takes several seconds for the center pane to reload. This is common when working with large histories.

7.2 Review CummeRbund plots (see notes: Density Plots, MDS Plots, Dendrogram)

a) When the cummeRbund job has finished, refresh the history pane by clicking on the refresh icon at the top of the history pane
b) Click the “eye” icon next to each of the three cummeRbund output files to view the plots
c) Verify that:
• The samples have similar density distributions
• The samples cluster by experimental condition in the MDS plot
• The samples cluster by experimental condition in the dendrogram

7.3 Additional CummeRbund plots

a) Volcano, Heatmap, Expression Plot, and Cluster.

7.4 Troubleshooting

If you experience problems using Galaxy, send an email to help@msi.umn.edu with a subject beginning “RIS” and a report of the problem.

8 Appendix A: Workflows

Note: Galaxy Workflows (Sect. 8.1)
All of the steps that have been performed on the heart-1 sample need to be repeated for the two other heart samples and the three skeletal samples. Galaxy workflows provide an easy method to automate an analysis pipeline. Appendix A demonstrates how to generate a workflow from your current history and use it to analyze another sample. To save time we will not work through this section in the hands-on workshop.

Note: Workflow Parameters (Sect. 8.2)
The workflow we set up in this section will run FastQC, Tophat, and Insertion size metrics. Tophat2 will be run just once using the inner mate distance calculated from the first sample. Samples that were sequenced together in the same batch often have very similar average insert sizes, and the same inner mate distance can be used for all samples. Check the Insertion size metrics results after running the workflow to verify that this is the case.

8.1 Extract workflow from current history (see note: Galaxy Workflows)

a) At the top of the history pane click on the small gear icon and select “Extract Workflow” from the pop-up menu
b) In the “Workflow name” box enter “QC and Tophat”
c) Uncheck the second (closest to the bottom) Tophat run
d) Click “Create Workflow” under the workflow name

8.2 Edit the workflow (see note: Workflow Parameters)

a) Click on “Workflow” at the top of the Galaxy window
b) Click on the workflow that was just created and select “Edit” from the drop-down menu
c) Move the elements of the workflow around to make it easier to see how they are connected.
d) Click on the first Input dataset box and set the Name field to ‘R1’. Repeat for the second input dataset (‘R2’).
e) Click on the Tophat box to display the Tophat options in the “Details” pane on the right side.
f) Set the “Mean Inner Distance between Mate Pairs” to 60.
g) Verify the other Tophat parameters are set correctly.
h) Save your changes by selecting “Options -> Save” near the top of the screen
i) Return to your history by clicking on “Analyze Data” at the top of the screen

8.3 Create new history

a) Rename the current history: at the top of the history pane click on “Unnamed history” and rename it “heart-1”. (NOTE: you must hit ‘Enter’ after typing the new name, rather than clicking outside the box.)
b) Create a new history by clicking on the gear icon at the top of the history pane and selecting “Create New” from the pop-up menu
c) Name the new history “heart-2”
d) Import the heart-2 fastq files by clicking on “Shared Data -> Data Libraries” at the top of the screen and selecting the “heart-2_R1.fastq” and “heart-2_R2.fastq” files from the “RISS-tutorial-Hsapiens” data library
e) Import the hg19_chr19 GTF file by clicking on “Shared Data -> Data Libraries” at the top of the screen and selecting “hg19_chr19_genes_2012-03-09.gtf” from the “iGenomes” data library
f) Return to your history by clicking on “Analyze Data” at the top of the screen

8.4 Run workflow

a) Load a workflow by clicking on “Workflow” at the top of the screen
b) Click on the workflow that was just created and select “Run” from the drop-down menu
c) Select the “heart-2_R1.fastq” file in the first drop-down menu and the “heart-2_R2.fastq” file in the second drop-down menu
d) Verify the GTF file is selected in the third drop-down menu
e) Click on “Run workflow” to submit the FastQC, Tophat, and Insertion size metrics jobs.
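
Sections 8.3 and 8.4 are repeated once per sample, so it can help to write out the full list of histories and input files before starting. The Python sketch below builds such a checklist. It does not talk to Galaxy, and the skeletal file names are assumed to follow the same pattern as the heart files named in this tutorial.

```python
def sample_plan(tissues=("heart", "skeletal"), replicates=3):
    """List the history name and paired fastq files for every sample."""
    plan = []
    for tissue in tissues:
        for rep in range(1, replicates + 1):
            sample = f"{tissue}-{rep}"
            plan.append({
                "history": sample,
                "R1": f"{sample}_R1.fastq",  # assumed naming pattern
                "R2": f"{sample}_R2.fastq",  # assumed naming pattern
            })
    return plan


plan = sample_plan()
for entry in plan:
    print(entry["history"], entry["R1"], entry["R2"])
```

The heart-1 entry is already done by the time the workflow exists, so the remaining five entries are the ones to run through Sections 8.3 and 8.4.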