Transcript
Page 1: Worksheet 4.3 - Mapping reads using HISAT2 Authors: Mary Allen …dowell.colorado.edu/.../4_qc/4_3_Worksheet_mapping_2019.pdf · 2019. 7. 10. · Complete the new “mapping.sbatch”

Worksheet 4.3 - Mapping reads using HISAT2
Authors: Mary Allen & Daniel Ramirez
HISAT2 manual: https://ccb.jhu.edu/software/hisat2/manual.shtml
Samtools manual: http://www.htslib.org/doc/samtools.html
Username: Screenshots show ‘daramirez’, though you will see your own username!

1. Using an appropriate terminal, log on to the cluster to use hisat2:

   a. Use pwd to make sure you know where you are and ls to make sure you know what is in this directory.

   b. Change the working directory (cd) to your own scratch directory.

2. Make a new directory/folder (mkdir) named hisat2.

This directory will contain the results from hisat2. The error and output files generated by your batch script jobs will be stored in “eofiles”. The batch script that you will create will live in the “sbatch” directory.
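Steps 1 and 2 can be sketched as the shell commands below. This is only a sketch: the scratch location /scratch/Users/<your username> is an assumption based on the path pattern used later in this worksheet, and “eofiles” and “sbatch” are created here only in case they do not exist yet. If the scratch path is missing (for example, when trying this away from the cluster), the sketch falls back to a temporary directory.

```shell
# Assumed scratch location; adjust if your cluster uses a different layout.
SCRATCH_DIR="/scratch/Users/${USER:-$(id -un)}"

# Illustration only: fall back to a temporary directory when not on the cluster.
[ -d "$SCRATCH_DIR" ] || SCRATCH_DIR="$(mktemp -d)"

pwd                 # where am I?
ls                  # what is in this directory?
cd "$SCRATCH_DIR"   # move to your scratch directory

# -p creates a directory only if it is missing and does not complain if it exists.
mkdir -p hisat2 eofiles sbatch
ls                  # hisat2, eofiles and sbatch should now be listed
```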

3. Go check the fastq data files in the following public directory using cd and ls: /scratch/Workshop/SR2019. In there, there are several folders containing fastq files that have all been aligned using hisat2, from ATAC-seq to ChIP-seq and RNA-seq. In this example, we will map/align sequencing data from a ChIP-seq experiment from a human cell line. To make this example run quickly enough for teaching purposes, a subsample of the whole ChIP-seq file has been produced, corresponding to some sequencing reads from chromosome 1 only. We will work with the file “SRR5855054_chr1.trimmed.fastq”, which has already been trimmed and lives in /scratch/Workshop/SR2019/4_qc/trimmomatic.
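To get a feel for the data in step 3, it can help to look at the first read and count the reads in the file. The sketch below assumes the usual FASTQ layout of four lines per read; count_fastq_reads is a hypothetical helper name introduced here, not part of any tool.

```shell
FASTQ=/scratch/Workshop/SR2019/4_qc/trimmomatic/SRR5855054_chr1.trimmed.fastq

# Hypothetical helper: a FASTQ record is 4 lines, so reads = lines / 4.
count_fastq_reads() {
    echo $(( $(wc -l < "$1") / 4 ))
}

if [ -f "$FASTQ" ]; then
    ls -lh "$FASTQ"              # file size
    head -n 4 "$FASTQ"           # first read: header, sequence, '+', qualities
    count_fastq_reads "$FASTQ"   # number of reads in the subsample
else
    echo "FASTQ file not found; run this on the workshop cluster" >&2
fi
```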

4. Find the batch script template “template.sbatch” in the directory:

/scratch/Workshop/SR2019/scripts

5. Copy the batch script “template.sbatch” that you just looked at to your sbatch directory “/scratch/Users/<YOUR_USERNAME>/sbatch” and change its name to “mapping.sbatch” (mv <original name> <new name>).

6. Complete the new “mapping.sbatch” file with the right content to run hisat2.

(Hint: if you are using vim, switch to insert mode by pressing i.)


   a. Change the name of the batch script from <JOB-NAME> to something more useful, such as “mapping”.

   b. Replace <EMAIL> with your own email address, to which any notifications will be sent.

   c. Replace <USERNAME> with your own username to complete the path to the directory where the error and output files will be stored.

   d. Complete the following fields: nodes, ntasks, mem and time. Hisat2 can use multiple processors per input file, so 1 node, 4 tasks/processors/CPUs, 5 GB of memory and 5 minutes of wall-time should be enough.

   e. Specify the path to the directory containing the input file “SRR5855054_chr1.trimmed.fastq” as the value of the variable “INPUT_DIRECTORY”. Also specify the path that leads to the “hisat2” directory you created earlier in your scratch directory as the value of the variable “OUTPUT_DIRECTORY”.

   f. Load the modules needed to run this batch script job with both hisat2 and samtools. To look for the correct hisat2 and samtools modules, exit vim, saving all changes (press ESC and type :wq!), and in the terminal list all available modules on the computer cluster that contain the word “hisat2” or “samtools”. Do this with the command module spider <string> and look for the ones for hisat2 and samtools.


      Using vim, add “module load hisat2/2.1.0” and “module load samtools/1.8” to the file “mapping.sbatch” in the section “<MODULES_TO_LOAD>”.

   g. The last edit you need to make is the actual block of text that specifies how to run hisat2 and a couple of samtools commands needed to obtain a file ready for visualization using the Integrative Genomics Viewer (IGV).

      1) The syntax to use hisat2 for single-end reads is as follows:
         hisat2 [options] -x <genome_index> -U <input_fastq> > <output_sam>

      Do not forget to specify the full path to all the files, including the human genome index files. Use the variables that you created earlier to make things easier. You could type the whole hisat2 command on a single line, but that is very hard to read. You could instead break the command up over several lines by ending each line with the character \. A \ at the end of a line tells the shell that the command continues on the next line, so the command runs exactly as if it were written on one line, but each part of it is easier to identify.
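The effect of the trailing \ can be checked without running hisat2 at all. In the sketch below, echo stands in for the real command, and genome_index and reads.fastq are placeholder names; the option -p 4 asks hisat2 for four threads, matching the four tasks requested in the batch script.

```shell
# echo prints its arguments, so it shows exactly what the shell would pass on.
one_line=$(echo hisat2 -p 4 -x genome_index -U reads.fastq)

# The same command split over several lines. Each \ must be the very last
# character on its line (no trailing spaces after it).
multi_line=$(echo hisat2 \
    -p 4 \
    -x genome_index \
    -U reads.fastq)

echo "$one_line"
echo "$multi_line"
[ "$one_line" = "$multi_line" ] && echo "identical"
```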


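      2) The syntax to use samtools view is as follows (include the option -b so that the output is written in BAM format):
         samtools view [options] <input.sam> > <output_bam>

      3) The syntax to use samtools sort is as follows:
         samtools sort [options] <input_bam> > <output_sorted.bam>

      4) The syntax to use samtools index is as follows (the index is written to a file, not to standard output, so no > is needed):
         samtools index <input_sorted.bam> <output_sorted.bam.bai>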

So, we will go from having an empty template to having a complete hisat2 & samtools block of commands and a complete batch script.
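A completed “mapping.sbatch” might look roughly like the sketch below. The layout of the real template is not reproduced in this worksheet, so the #SBATCH lines are standard SLURM options chosen to match step 6 rather than a copy of the template. GENOME_INDEX is a hypothetical placeholder path, and the SAMPLE variable is introduced here only to shorten the file names. The file is written with a heredoc so that it can be created and syntax-checked in one go; editing the template in vim, as described above, leads to the same kind of file.

```shell
cat > mapping.sbatch <<'EOF'
#!/bin/bash
# Replace <EMAIL> and <USERNAME> below with your own values.
#SBATCH --job-name=mapping
#SBATCH --mail-type=ALL
#SBATCH --mail-user=<EMAIL>
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --mem=5G
#SBATCH --time=00:05:00
#SBATCH --output=/scratch/Users/<USERNAME>/eofiles/%x.%j.out
#SBATCH --error=/scratch/Users/<USERNAME>/eofiles/%x.%j.err

INPUT_DIRECTORY=/scratch/Workshop/SR2019/4_qc/trimmomatic
OUTPUT_DIRECTORY="/scratch/Users/${USER}/hisat2"

# Hypothetical placeholder: base name of the human genome index
# (the part of the index file names before .1.ht2, .2.ht2, ...).
GENOME_INDEX=/path/to/hisat2_index/genome

# Illustration only: file name without its extension.
SAMPLE=SRR5855054_chr1.trimmed

module load hisat2/2.1.0
module load samtools/1.8

# Align single-end reads; the alignment report goes to the .err file.
hisat2 \
    -p 4 \
    -x "${GENOME_INDEX}" \
    -U "${INPUT_DIRECTORY}/${SAMPLE}.fastq" \
    > "${OUTPUT_DIRECTORY}/${SAMPLE}.sam"

# Convert SAM to BAM, sort by coordinate, then index for IGV.
samtools view -b "${OUTPUT_DIRECTORY}/${SAMPLE}.sam" \
    > "${OUTPUT_DIRECTORY}/${SAMPLE}.bam"
samtools sort "${OUTPUT_DIRECTORY}/${SAMPLE}.bam" \
    > "${OUTPUT_DIRECTORY}/${SAMPLE}.sorted.bam"
samtools index "${OUTPUT_DIRECTORY}/${SAMPLE}.sorted.bam" \
    "${OUTPUT_DIRECTORY}/${SAMPLE}.sorted.bam.bai"
EOF

# Check the script for shell syntax errors without running it.
bash -n mapping.sbatch && echo "syntax OK"
```

Shell variables are not expanded inside #SBATCH lines, which is why the username has to be typed out there, while ${USER} can be used in the body of the script.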


Save all changes to the “mapping.sbatch” file and exit vim.

7. Now that the batch script is ready, submit it to the job manager SLURM to begin processing the ChIP-seq sequencing data. In the terminal, while located in the “sbatch” directory where “mapping.sbatch” lives, type sbatch <sbatch file>. The job manager will give you a job number. Once submitted, you can check on the status of your jobs by typing squeue -u username.
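Step 7 can be sketched as follows. sbatch normally replies with a line of the form “Submitted batch job <number>”; job_id_from_reply is a hypothetical helper introduced here to pull the number out of that reply, and the location of the “sbatch” directory is assumed from step 5.

```shell
# Hypothetical helper: the job number is the 4th word of sbatch's reply.
job_id_from_reply() {
    awk '{ print $4 }'
}

if command -v sbatch >/dev/null 2>&1; then
    cd "/scratch/Users/${USER}/sbatch"    # assumed location of mapping.sbatch
    reply=$(sbatch mapping.sbatch)        # submit the job
    echo "$reply"
    job_id=$(echo "$reply" | job_id_from_reply)
    squeue -u "$USER"                     # all of your jobs
    squeue -j "$job_id"                   # only the job just submitted
else
    echo "sbatch not found; run this on the cluster" >&2
fi
```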

8. Move to the “eofiles” directory and open the error and output files. If your job failed, this is where you should go to figure out what went wrong. If your job succeeded, you can see the hisat2 alignment report in the “.err” file.

9. Check the “hisat2” directory. There should be four files: a sam file, a bam file, a sorted bam file, and a sorted bam index file.
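A quick way to confirm that all four files exist and are not empty is sketched below. missing_outputs is a hypothetical helper, and the file names assume that every output was named after the input file with the extensions .sam, .bam, .sorted.bam and .sorted.bam.bai; adjust them to the names you actually used.

```shell
# Hypothetical helper: print the expected outputs that are missing or empty.
missing_outputs() {
    dir="$1"
    sample="$2"
    for ext in sam bam sorted.bam sorted.bam.bai; do
        # -s is true only if the file exists and has a size greater than zero.
        [ -s "$dir/$sample.$ext" ] || echo "$sample.$ext"
    done
}

OUTPUT_DIRECTORY="/scratch/Users/${USER:-$(id -un)}/hisat2"   # assumed location
ls -lh "$OUTPUT_DIRECTORY" 2>/dev/null || true
missing_outputs "$OUTPUT_DIRECTORY" SRR5855054_chr1.trimmed
```

If nothing is printed by missing_outputs, all four files are in place.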

10. The sorted.bam and bai files are the two files needed to visualize the data using IGV. Open “X2Go”. Log in to a new session window. If you have not configured your session, then you should configure it now. Give your session a meaningful name in the section “Session name”. In “Host”, type the name of the server that you want to connect to; for this class, type “18.222.55.224”. Type your GitHub username in “Login”. Select the option “Try auto login (via SSH Agent or default SSH key)”. Change “Session type” to “XFCE”. Do not change anything else. Save the changes to the new session by clicking “OK”.


11. Click on the created session box on the right, and select “Yes” if asked whether you trust the host key. If successfully connected, a new window will appear. This is the cluster node that you will use to visualize your data using IGV.

12. Increase the size of the window so that IGV can be displayed completely. Open the terminal emulator icon located in the bottom bar of the new window. You can navigate to all the files and directories that you have created so far using the same commands you have learned. Change directory to where your bam files are located. To open IGV along with the bam file, type the following command:

sh /opt/igv/2.4.10/igv.sh SRR5855054_chr1.trimmed.sorted.bam


13. Finally, customize IGV as you please (font size, rename track, track height, track color, color alignments by condition, etc.). Zoom in on your favorite chr1 locus.