Worksheet 4.3 - Mapping reads using HISAT2
Authors: Mary Allen & Daniel Ramirez
HISAT2 manual: https://ccb.jhu.edu/software/hisat2/manual.shtml
Samtools manual: http://www.htslib.org/doc/samtools.html
Username: Screenshots show ‘daramirez’, though you will see your own username!
1. Using an appropriate terminal, log on to the cluster to use hisat2:
   a. Use pwd to make sure you know where you are, and ls to make sure you know what is in this directory.
   b. Change the working directory (cd) to your own scratch directory.
2. Make a new directory/folder (mkdir) named hisat2.
   This directory will contain the results from hisat2. The error and output files generated by your batch script jobs will be stored in “eofiles”, and the batch script that you will create will live in the “sbatch” directory.
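Steps 1 and 2 come down to a handful of commands. The sketch below wraps them in a hypothetical helper, make_hisat2_dir, so the base directory can be passed in; its default assumes your scratch directory is /scratch/Users/<YOUR_USERNAME>, the layout used in step 5. Typing the same commands one at a time works just as well.

```shell
#!/usr/bin/env bash
# Steps 1-2 as commands, wrapped in a hypothetical helper so the base
# directory can be swapped. Default assumes /scratch/Users/<YOUR_USERNAME>.
make_hisat2_dir() {
    local scratch_dir="${1:-/scratch/Users/${USER}}"
    pwd                               # 1a: where am I?
    ls                                # 1a: what is in this directory?
    cd "${scratch_dir}" || return 1   # 1b: move to your own scratch directory
    mkdir -p hisat2                   # 2: will hold the hisat2 results
    ls                                # hisat2 should now be listed
}

# On the cluster, simply run:
#   make_hisat2_dir
```

mkdir -p is used so that rerunning the step does not fail if hisat2 already exists.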
3. Go check the fastq data files in the following public directory using cd and ls: /scratch/Workshop/SR2019. It contains several folders of fastq files from ATAC-seq, ChIP-seq and RNA-seq experiments, all of which have been aligned using hisat2. In this example, we will map/align sequencing data from a ChIP-seq experiment in a human cell line. To make this example run quickly enough for teaching purposes, the whole ChIP-seq file has been subsampled to reads from chromosome 1 only. We will work with the file “SRR5855054_chr1.trimmed.fastq”, which has already been trimmed and lives in /scratch/Workshop/SR2019/4_qc/trimmomatic.
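Before mapping, it can help to look at the reads. The sketch below prints the first FASTQ record and counts the reads; count_fastq_reads is a hypothetical helper that assumes the usual four lines per record (header, sequence, "+", qualities), and the listing only runs where the workshop file is readable.

```shell
#!/usr/bin/env bash
# Step 3: look at the trimmed ChIP-seq reads before mapping them.
# count_fastq_reads is a hypothetical helper; it assumes standard 4-line
# FASTQ records (header, sequence, "+", quality string).
count_fastq_reads() {
    local lines
    lines="$(wc -l < "$1")" || return 1
    echo $(( lines / 4 ))
}

FASTQ=/scratch/Workshop/SR2019/4_qc/trimmomatic/SRR5855054_chr1.trimmed.fastq
if [ -r "${FASTQ}" ]; then           # only on the cluster, where the file exists
    ls /scratch/Workshop/SR2019      # the workshop folders
    head -n 4 "${FASTQ}"             # the first read (one FASTQ record)
    count_fastq_reads "${FASTQ}"     # number of reads in the chr1 subsample
fi
```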
4. Find the batch script template “template.sbatch” in the directory:
   /scratch/Workshop/SR2019/scripts
5. Copy the batch script “template.sbatch” that you just looked at to your sbatch directory (/scratch/Users/<YOUR_USERNAME>/sbatch) and change its name to “mapping.sbatch” (mv <original name> <new name>).
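Steps 4 and 5 can be sketched as a hypothetical helper, copy_template. The default directories are the workshop paths given above, and cat stands in for looking at the template before you copy it.

```shell
#!/usr/bin/env bash
# Steps 4-5 as a hypothetical helper: look at the template, copy it, rename it.
# Defaults assume the workshop layout; pass other directories to override.
copy_template() {
    local template_dir="${1:-/scratch/Workshop/SR2019/scripts}"
    local sbatch_dir="${2:-/scratch/Users/${USER}/sbatch}"
    cat "${template_dir}/template.sbatch" || return 1                  # 4: look
    cp "${template_dir}/template.sbatch" "${sbatch_dir}/" || return 1  # 5: copy
    mv "${sbatch_dir}/template.sbatch" "${sbatch_dir}/mapping.sbatch"  # 5: rename
}

# On the cluster:
#   copy_template
```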
6. Complete the new “mapping.sbatch” file with the right content to run hisat2.
   (Hint: switch to insert mode by pressing i if you are using vim.)
   a. Change the job name from <JOB-NAME> to something more useful, such as “mapping”.
   b. Replace <EMAIL> with the email address at which you want to receive any notifications.
   c. Replace <USERNAME> with your own username to complete the path to the directory where the error and output files will be stored.
   d. Complete the following fields: nodes, ntasks, mem and time. Hisat2 can use multiple processors per input file, so 1 node, 4 tasks/processors/CPUs, 5 GB of memory and 5 minutes of wall time should be enough.
   e. Specify the path to the directory that holds the input file “SRR5855054_chr1.trimmed.fastq” as the value of the variable “INPUT_DIRECTORY”. Also specify the path to the “hisat2” directory you created earlier in your scratch directory as the value of the variable “OUTPUT_DIRECTORY”.
   f. Add the modules needed to run this batch script job with both hisat2 and samtools. To look for the correct hisat2 and samtools modules, exit vim saving all changes (press ESC and type :wq!), and in the terminal, list all modules available on the cluster whose names contain “hisat2” or “samtools”. Do this with the command module spider <string>, and look for the ones for hisat2 and samtools.
      Using vim, add “module load hisat2/2.1.0” and “module load samtools/1.8” to the “<MODULES_TO_LOAD>” section of “mapping.sbatch”.
   g. The last edit you need to make is the actual block of text that specifies how to run hisat2 and the samtools commands needed to obtain a file ready for visualization in the Integrative Genomics Viewer (IGV).
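      1) The syntax for running hisat2 on single-end reads is as follows (hisat2 writes the SAM alignments to standard output, so > sends them to a file):
         hisat2 [options] -x <genome_index> -U <input_fastq> > <output_sam>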
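         Do not forget to specify the full path to all the files, including the human genome index files (for -x, give the index basename: the path up to, but not including, the final .1.ht2-style extension). Use the variables that you created earlier to make things easier. You could type the whole hisat2 command on a single line.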
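         But that is very hard to read. Instead, you can break the command up over several lines by ending each line with the character \. A \ at the very end of a line tells the shell that the command continues on the next line (so nothing, not even a space, may follow it). The pieces still run as one command, but each part is much easier to identify.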
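      2) The syntax for samtools view is as follows (-b makes the output BAM instead of SAM):
         samtools view -b [options] <input.sam> > <output.bam>
      3) The syntax for samtools sort is as follows:
         samtools sort [options] <input.bam> > <output.sorted.bam>
      4) The syntax for samtools index is as follows. It writes the index next to the sorted BAM as <input.sorted.bam>.bai, unless you name the index file as an optional second argument:
         samtools index <input.sorted.bam> [<output.sorted.bam.bai>]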
      So, we will go from having an empty template to having a complete hisat2 & samtools block of commands and a complete batch script.
   Save all changes to the “mapping.sbatch” file and exit vim.
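After steps a to g, the completed file might look roughly like the one below. This is a sketch, not the official answer: the template itself is not reproduced in this worksheet, so the directive layout, --mail-type=ALL and the %x.%j (job name, job ID) file names in eofiles are assumptions, and GENOME_INDEX is a placeholder for the HISAT2 index prefix you were given. It is wrapped in a heredoc so it can be written in one go from your sbatch directory (overwriting the copied template); pasting the lines between the EOF markers into vim gives the same file.

```shell
#!/usr/bin/env bash
# Writes one possible completed mapping.sbatch into the current directory.
# Assumptions: directive set, %x.%j file names, and the GENOME_INDEX placeholder.
cat > mapping.sbatch <<'EOF'
#!/bin/bash
#SBATCH --job-name=mapping
#SBATCH --mail-type=ALL
#SBATCH --mail-user=<YOUR_EMAIL>
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --mem=5G
#SBATCH --time=00:05:00
#SBATCH --output=/scratch/Users/<YOUR_USERNAME>/eofiles/%x.%j.out
#SBATCH --error=/scratch/Users/<YOUR_USERNAME>/eofiles/%x.%j.err

INPUT_DIRECTORY=/scratch/Workshop/SR2019/4_qc/trimmomatic
OUTPUT_DIRECTORY="/scratch/Users/${USER}/hisat2"
GENOME_INDEX=/path/to/hisat2_index_prefix   # placeholder: index basename, no .ht2 suffix
SAMPLE=SRR5855054_chr1.trimmed

module load hisat2/2.1.0
module load samtools/1.8

# 1) Align the single-end reads with 4 threads; SAM goes to stdout,
#    the alignment summary to stderr (and so to the .err file)
hisat2 -p 4 \
    -x "${GENOME_INDEX}" \
    -U "${INPUT_DIRECTORY}/${SAMPLE}.fastq" \
    > "${OUTPUT_DIRECTORY}/${SAMPLE}.sam"

# 2) Convert SAM to BAM
samtools view -b "${OUTPUT_DIRECTORY}/${SAMPLE}.sam" \
    > "${OUTPUT_DIRECTORY}/${SAMPLE}.bam"

# 3) Sort the BAM by genomic coordinate
samtools sort "${OUTPUT_DIRECTORY}/${SAMPLE}.bam" \
    > "${OUTPUT_DIRECTORY}/${SAMPLE}.sorted.bam"

# 4) Index the sorted BAM; writes ${SAMPLE}.sorted.bam.bai next to it
samtools index "${OUTPUT_DIRECTORY}/${SAMPLE}.sorted.bam"
EOF
```

Replace <YOUR_EMAIL>, <YOUR_USERNAME> and the GENOME_INDEX path before submitting. Inside the job, ${USER} is assumed to expand to the same username you use in the #SBATCH lines.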
7. Now that the batch script is ready, submit it to the SLURM job manager to begin processing the ChIP-seq sequencing data. In the terminal, from the “sbatch” directory where “mapping.sbatch” lives, type sbatch <sbatch file>. The job manager will give you a job number. Once the job is submitted, you can check on its status by typing squeue -u <YOUR_USERNAME>.
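A sketch of step 7. The submission only runs where SLURM's sbatch command exists; extract_job_id is a hypothetical helper that takes the last word of sbatch's usual reply, “Submitted batch job <JOBID>”.

```shell
#!/usr/bin/env bash
# Step 7: submit mapping.sbatch to SLURM and check on it.
# extract_job_id is a hypothetical helper: sbatch normally replies
# "Submitted batch job <JOBID>", and the job ID is the last word.
extract_job_id() {
    echo "${1##* }"
}

# Only attempt the submission where SLURM's sbatch command is available.
if command -v sbatch > /dev/null 2>&1; then
    cd "/scratch/Users/${USER}/sbatch" || exit 1
    reply="$(sbatch mapping.sbatch)"
    echo "${reply}"
    echo "Job ID: $(extract_job_id "${reply}")"
    squeue -u "${USER}"      # your pending (PD) and running (R) jobs
fi
```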
8. Move to the “eofiles” directory and open the error and output files. If your job failed, this is where you should go to figure out what went wrong. If your job succeeded, the “.err” file contains the hisat2 alignment report (hisat2 writes its summary to standard error).
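To jump straight to the alignment report, you can open the newest .err file in eofiles and pick out hisat2's summary line, which ends the report as “NN.NN% overall alignment rate”. A sketch, assuming eofiles is /scratch/Users/<YOUR_USERNAME>/eofiles; overall_rate and show_latest_report are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Step 8: show the newest .err file and pull out hisat2's overall rate.
overall_rate() {
    # hisat2 finishes its report with a line like "NN.NN% overall alignment rate"
    grep 'overall alignment rate' "$1" | awk '{print $1}'
}

show_latest_report() {
    local eo_dir="${1:-/scratch/Users/${USER}/eofiles}" latest
    latest="$(ls -t "${eo_dir}"/*.err 2>/dev/null | head -n 1)"
    [ -n "${latest}" ] || { echo "no .err files in ${eo_dir}" >&2; return 1; }
    cat "${latest}"            # full report plus any error messages
    overall_rate "${latest}"
}

# On the cluster:
#   show_latest_report
```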
9. Check the “hisat2” directory. There should be four files: a SAM file, a BAM file, a sorted BAM file and a sorted BAM index (.bai) file.
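A quick way to confirm step 9 is to check that all four files exist and are not empty. The helpers below are hypothetical, and the file names assume the SRR5855054_chr1.trimmed prefix that the IGV command in step 12 uses.

```shell
#!/usr/bin/env bash
# Step 9: confirm the four expected files exist and are not empty.
# Hypothetical helpers; names assume the prefix used for IGV in step 12.
expected_outputs() {
    local prefix="$1"
    printf '%s\n' "${prefix}.sam" "${prefix}.bam" \
        "${prefix}.sorted.bam" "${prefix}.sorted.bam.bai"
}

check_outputs() {
    local dir="$1" prefix="$2" f missing=0
    while read -r f; do
        if [ -s "${dir}/${f}" ]; then
            echo "ok       ${f}"
        else
            echo "MISSING  ${f}"
            missing=1
        fi
    done < <(expected_outputs "${prefix}")
    return "${missing}"
}

# On the cluster:
#   check_outputs "/scratch/Users/${USER}/hisat2" SRR5855054_chr1.trimmed
```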
10. The sorted .bam and .bai files are the two files needed to visualize the data in IGV. Open “X2Go” and log in to a new session window. If you have not configured your session yet, configure it now. Give your session a meaningful name in the “Session name” field. In “Host”, type the name of the server that you want to connect to; for this class, type “18.222.55.224”. Type your GitHub username in “Login”. Select the option “Try auto login (via SSH Agent or default SSH key)”. Change “Session type” to “XFCE”. Do not change anything else. Save the new session by clicking “OK”.
11. Click on the session box you created on the right, and select “Yes” if asked whether you trust the host key. If you connected successfully, a new window will appear. This is the cluster node that you will use to visualize your data with IGV.
12. Increase the size of the window so that IGV can be displayed completely. Open the terminal emulator icon located in the bottom bar of the new window. You can navigate to all the files and directories that you have created so far using the same commands you have learned. Change directory to where your BAM files are located. To open IGV along with the BAM file, type the following command:
sh /opt/igv/2.4.10/igv.sh SRR5855054_chr1.trimmed.sorted.bam
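IGV needs the .bai index to sit next to the sorted BAM. The sketch below wraps the launch command above in a hypothetical helper, open_in_igv, that refuses to start IGV when <file>.bam.bai is missing; it only checks for the .bam.bai name this worksheet produces.

```shell
#!/usr/bin/env bash
# Step 12: launch IGV on a sorted BAM, but only if its index sits next to it.
# open_in_igv is a hypothetical helper; the IGV path is the one given above.
IGV_SH=/opt/igv/2.4.10/igv.sh

open_in_igv() {
    local bam="$1"
    if [ ! -f "${bam}.bai" ]; then
        echo "No index found for ${bam}; run samtools index first." >&2
        return 1
    fi
    sh "${IGV_SH}" "${bam}"
}
```

For example, from your hisat2 directory: open_in_igv SRR5855054_chr1.trimmed.sorted.bam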
13. Finally, customize IGV as you please (font size, track name, track height, track color, color alignments by condition, etc.). Zoom in on your favorite chr1 locus.