Worksheet 4.3 - Mapping reads using HISAT2 Authors: Mary Allen …dowell.colorado.edu/.../4_qc/4_3_Worksheet_mapping_2019.pdf · 2019. 7. 10. · Complete the new “mapping.sbatch”

Worksheet 4.3 - Mapping reads using HISAT2
Authors: Mary Allen & Daniel Ramirez
HISAT2 manual: https://ccb.jhu.edu/software/hisat2/manual.shtml
Samtools manual: http://www.htslib.org/doc/samtools.html
Username: Screenshots show ‘daramirez’, though you will see your own username!

1. Using an appropriate terminal, log on to the cluster to use hisat2:
   a. Use pwd to make sure you know where you are and ls to make sure you know what is in this directory.
   b. Change the working directory (cd) to your own scratch directory.

2. Make a new directory/folder (mkdir) named hisat2. This directory will contain the results from hisat2. The error and output files generated by your batch script jobs will be stored in “eofiles”. The batch script that you will create will live in the “sbatch” directory.
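The directory setup in steps 1 and 2 can be sketched as follows. The scratch path /scratch/Users/<YOUR_USERNAME> is taken from step 5 of this worksheet. The SCRATCH_DIR variable and the fallback to a temporary directory are assumptions, added only so the sketch can also be tried off the cluster. The “eofiles” and “sbatch” directories may already exist from earlier worksheets, in which case mkdir -p leaves them untouched.

```shell
# Assumption: your scratch directory is /scratch/Users/<YOUR_USERNAME>.
# If it does not exist (for example, off the cluster), use a temporary directory.
SCRATCH_DIR="/scratch/Users/${USER:-<YOUR_USERNAME>}"
[ -d "$SCRATCH_DIR" ] || SCRATCH_DIR="$(mktemp -d)"

cd "$SCRATCH_DIR"
pwd    # confirm where you are
ls     # see what is already in this directory

# -p creates a directory only if it is missing, so re-running is harmless.
mkdir -p hisat2 eofiles sbatch
ls -d hisat2 eofiles sbatch
```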

3. Go check the fastq data files in the following public directory using cd and ls: /scratch/Workshop/SR2019. In there, there are several folders containing fastq files that have all been aligned using hisat2, from ATAC-seq to ChIP-seq and RNA-seq. In this example, we will map/align sequencing data from a ChIP-seq experiment from a human cell line. To make this example run quickly enough for teaching purposes, a subsample of the whole ChIP-seq file has been produced, corresponding to some sequencing reads from chromosome 1 only. We will work with the file “SRR5855054_chr1.trimmed.fastq”, which has already been trimmed and lives in /scratch/Workshop/SR2019/4_qc/trimmomatic.

4. Find the batch script template “template.sbatch” in the directory: /scratch/Workshop/SR2019/scripts

5. Copy the batch script “template.sbatch” that you just looked at to your sbatch directory “/scratch/Users/<YOUR_USERNAME>/sbatch” and change its name to “mapping.sbatch” (mv <original name> <new name>).
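Steps 4 and 5 might look like the sketch below. The two cluster paths come from this worksheet. The fallback that creates a stand-in template in temporary directories is an assumption, included only so the commands can be tried where the cluster paths are unavailable.

```shell
# Paths from the worksheet.
TEMPLATE_DIR="/scratch/Workshop/SR2019/scripts"
SBATCH_DIR="/scratch/Users/${USER:-<YOUR_USERNAME>}/sbatch"

# Assumption: off the cluster, practise with a stand-in template instead.
if [ ! -f "$TEMPLATE_DIR/template.sbatch" ] || [ ! -d "$SBATCH_DIR" ]; then
    TEMPLATE_DIR="$(mktemp -d)"
    SBATCH_DIR="$(mktemp -d)"
    printf '#!/bin/bash\n' > "$TEMPLATE_DIR/template.sbatch"
fi

ls "$TEMPLATE_DIR"                                              # step 4: find the template
cp "$TEMPLATE_DIR/template.sbatch" "$SBATCH_DIR/"               # step 5: copy it
mv "$SBATCH_DIR/template.sbatch" "$SBATCH_DIR/mapping.sbatch"   # and rename the copy
ls "$SBATCH_DIR"
```

The copy and rename can also be done in one command by giving cp the new file name as its destination.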

6. Complete the new “mapping.sbatch” file with the right content to run hisat2. (Hint: transition to insert mode by pressing i if using vim.)

   a. Change the job name from <JOB-NAME> to something more useful, such as “mapping”.
   b. Replace <EMAIL> with your own email address, where you want to receive any notifications.
   c. Replace <USERNAME> with your own username to complete the path to the directory where the error and output files will be stored.
   d. Complete the following fields: nnodes, ntasks, mem and time. Hisat2 can use multiple processors per input file, so 1 node, 4 tasks/processors/CPUs, 5 GB of memory and 5 minutes of wall time should be enough.
   e. Specify the path to the input file “SRR5855054_chr1.trimmed.fastq” as the value of the variable “INPUT_DIRECTORY”. Also specify the path that leads to the “hisat2” directory you created earlier in your scratch directory as the value of the variable “OUTPUT_DIRECTORY”.
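As a rough guide to sub-steps a through e, a filled-in header could look like the sketch below. The layout of the real template is not reproduced in this worksheet, so the exact directive lines are an assumption. They use standard Slurm options (--job-name, --mail-user, --output, --error, --nodes, --ntasks, --mem, --time). The file names under “eofiles” are also hypothetical.

```shell
#!/bin/bash
# a. job name; b. notification address; c. where output and error files go.
#SBATCH --job-name=mapping
#SBATCH --mail-type=ALL
#SBATCH --mail-user=<EMAIL>
#SBATCH --output=/scratch/Users/<USERNAME>/eofiles/mapping.%j.out
#SBATCH --error=/scratch/Users/<USERNAME>/eofiles/mapping.%j.err
# d. resources: 1 node, 4 tasks, 5 GB of memory, 5 minutes of wall time.
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --mem=5G
#SBATCH --time=00:05:00

# e. input and output locations (replace <USERNAME> with your own username).
INPUT_DIRECTORY="/scratch/Workshop/SR2019/4_qc/trimmomatic"
OUTPUT_DIRECTORY="/scratch/Users/<USERNAME>/hisat2"
```

Slurm replaces %j in the file names with the job number, so each run writes its own pair of files.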

   f. Assign the modules required to run this batch script job with both hisat2 and samtools. To look for the correct hisat2 and samtools modules, exit vim, saving all changes (press ESC and type :wq!), and in the terminal list all available modules on the computer cluster that contain the word “hisat2” or “samtools” in them. Do this with the command module spider <string> and look for the ones for hisat2 and samtools.

      Using vim, add “module load hisat2/2.1.0” and “module load samtools/1.8” in the file “mapping.sbatch” in the section “<MODULES_TO_LOAD>”.

   g. The last edit you need to make is the actual block of text that specifies how to run hisat2 and a couple of samtools commands needed to obtain a file ready for visualization using the Integrative Genomics Viewer (IGV).

      1) The syntax to use hisat2 for single-end reads is as follows:
         hisat2 [options] -x <genome_index> -U <input_fastq> > <output_sam>

         Do not forget to specify the full path to all the files, including the human genome index files. Use the variables that you created earlier to make things easier. You could type the whole hisat2 command on a single line, but that is very hard to read. You could instead break up the command onto several lines by ending every line except the last with the character \. The shell treats a \ at the very end of a line as a line continuation, so the command runs exactly as if it were written on one line, but each part of the command is easier to identify.
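The effect of the trailing \ can be checked with a small dry run. In the sketch below, echo prints the hisat2 command instead of running it, and the quoted '>' keeps the redirection as plain text. The genome index path and the FILENAME variable are hypothetical placeholders, and -p 4 (the hisat2 thread option) is chosen to match the 4 tasks requested above.

```shell
# Hypothetical values for illustration; replace the index path with the real one.
GENOME_INDEX="/path/to/genome_index"
INPUT_DIRECTORY="/scratch/Workshop/SR2019/4_qc/trimmomatic"
OUTPUT_DIRECTORY="/scratch/Users/<USERNAME>/hisat2"
FILENAME="SRR5855054_chr1.trimmed"

# The whole command on one line (hard to read).
single_line=$(echo hisat2 -p 4 -x "$GENOME_INDEX" -U "$INPUT_DIRECTORY/$FILENAME.fastq" '>' "$OUTPUT_DIRECTORY/$FILENAME.sam")

# The same command, one part per line, each continued line ending in a backslash.
multi_line=$(echo hisat2 -p 4 \
    -x "$GENOME_INDEX" \
    -U "$INPUT_DIRECTORY/$FILENAME.fastq" \
    '>' "$OUTPUT_DIRECTORY/$FILENAME.sam")

echo "$multi_line"
[ "$single_line" = "$multi_line" ] && echo "Both forms give the same command"
```

Nothing may follow the \ on a line, not even a space, or the shell will no longer treat it as a continuation.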

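      2) The syntax to use samtools view is as follows (include -b in the options to write BAM):
         samtools view [options] <input.sam> > <output_bam>

      3) The syntax to use samtools sort is as follows:
         samtools sort [options] <input_bam> > <output_sorted.bam>

      4) The syntax to use samtools index is as follows (the index file is written directly, so no redirection is needed):
         samtools index <input_sorted.bam> <output_sorted.bam.bai>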

      So, we will go from having an empty template to having a complete hisat2 & samtools block of commands and a complete batch script.
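One possible shape for the finished command block is sketched below. It is not the worksheet's own solution. The run helper, the FILENAME variable, and the genome index path are hypothetical. The sketch uses hisat2's -S option and samtools' -o option in place of the > redirection, so that the dry-run helper can print each command in full. With DRY_RUN left unset, nothing is executed.

```shell
# Hypothetical dry-run helper: prints each command unless DRY_RUN=0 is set,
# so the block can be checked without hisat2 or samtools installed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

GENOME_INDEX="/path/to/genome_index"   # assumption: replace with the real index prefix
INPUT_DIRECTORY="/scratch/Workshop/SR2019/4_qc/trimmomatic"
OUTPUT_DIRECTORY="/scratch/Users/<USERNAME>/hisat2"
FILENAME="SRR5855054_chr1.trimmed"

# 1) Align the single-end reads with 4 threads; -S names the SAM output file.
run hisat2 -p 4 \
    -x "$GENOME_INDEX" \
    -U "$INPUT_DIRECTORY/$FILENAME.fastq" \
    -S "$OUTPUT_DIRECTORY/$FILENAME.sam"

# 2) Convert SAM to BAM (-b writes BAM, -o names the output file).
run samtools view -b \
    -o "$OUTPUT_DIRECTORY/$FILENAME.bam" \
    "$OUTPUT_DIRECTORY/$FILENAME.sam"

# 3) Sort the BAM file by coordinate.
run samtools sort \
    -o "$OUTPUT_DIRECTORY/$FILENAME.sorted.bam" \
    "$OUTPUT_DIRECTORY/$FILENAME.bam"

# 4) Index the sorted BAM file; this writes $FILENAME.sorted.bam.bai next to it.
run samtools index "$OUTPUT_DIRECTORY/$FILENAME.sorted.bam"
```

Setting DRY_RUN=0 inside the batch script, after the module load lines, would make the same block run the tools for real.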

   Save all changes to the “mapping.sbatch” file and exit vim.

7. Now that the batch script is ready, submit it to the job manager SLURM to begin processing the ChIP-seq sequencing data. In the terminal, while located in the “sbatch” directory where “mapping.sbatch” lives, type sbatch <sbatch file>. The job manager will give you a job number. Once submitted, you can check on the status of your jobs by typing squeue -u <username>.
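Submission and status checking could be sketched as below. sbatch normally answers with a line of the form “Submitted batch job <number>”. The job_id_from_message helper is hypothetical and simply keeps the last word of that line. The guard around sbatch is an assumption so the sketch does nothing harmful off the cluster.

```shell
# Hypothetical helper: keep only the last word of sbatch's confirmation line.
job_id_from_message() {
    printf '%s\n' "${1##* }"
}

# Run this from the sbatch directory that holds mapping.sbatch.
if command -v sbatch >/dev/null 2>&1; then
    message=$(sbatch mapping.sbatch)
    echo "$message"
    squeue -u "$USER"                                # all of your jobs
    squeue -j "$(job_id_from_message "$message")"    # only the job just submitted
else
    echo "sbatch not found: run this on the cluster"
fi
```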

8. Move to the “eofiles” directory and open the error and output files. If your job failed, here is where you should go to figure out what went wrong. If your job succeeded, you can see the hisat2 alignment report in the “.err” file.

9. Check the “hisat2” directory. There should be four files: a sam file, a bam file, a sorted bam file, and a sorted bam index file.
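A quick way to confirm that all four files are present and non-empty is sketched below. The check_outputs function is hypothetical. The sorted bam name matches the file opened in IGV later in this worksheet, while the other three names are assumed to follow the same prefix.

```shell
# Hypothetical helper: report which of the four expected files exist and are non-empty.
# Usage: check_outputs <directory> <file prefix>
check_outputs() {
    local dir="$1" prefix="$2" missing=0 suffix
    for suffix in sam bam sorted.bam sorted.bam.bai; do
        if [ -s "$dir/$prefix.$suffix" ]; then
            echo "OK       $prefix.$suffix"
        else
            echo "MISSING  $prefix.$suffix"
            missing=1
        fi
    done
    return "$missing"
}

# Replace <USERNAME> with your own username.
check_outputs "/scratch/Users/<USERNAME>/hisat2" "SRR5855054_chr1.trimmed" \
    || echo "Some files are missing or empty: check the files in eofiles"
```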

10. The sorted.bam and bai files are the two files needed to visualize the data using IGV. Open “X2Go”. Log in to a new session window. If you have not configured your session, then you should configure it now. Give your session a meaningful name in the section “Session name”. In “Host”, type the name of the server that you want to connect to; for this class, type “18.222.55.224”. Type your GitHub username in “Login”. Select the option “Try auto login (via SSH Agent or default SSH key)”. Change “Session type” to “XFCE”. Do not change anything else. Save the changes to the new session by clicking “OK”.

11. Click on the created session box on the right, and select “Yes” if asked whether you trust the host key. If successfully connected, a new window will appear. This is the cluster node that you will use to visualize your data using IGV.

12. Increase the size of the window so that IGV can be displayed completely. Open the terminal emulator icon located in the bottom bar of the new window. You can navigate to all the files and directories that you have created so far using the same commands you have learned. Change directory to where your bam files are located. To open IGV along with the bam file, type the following command:

sh /opt/igv/2.4.10/igv.sh SRR5855054_chr1.trimmed.sorted.bam

13. Finally, customize IGV as you please (font size, rename track, track height, track color, color alignments by condition, etc.). Zoom in on your favorite chr1 locus.