Using BioGrids for RNA-Seq on AWS and Your Laptop

TMEC 304, July 25, 2019

James Vincent

BioGrids Consortium, Harvard Medical School

biogrids.org   help@biogrids.org

Today we will...

Install software with BioGrids

Run an RNA-Seq workflow

Replicate above on AWS and laptop

BioGrids on AWS

faithful laptop, AWS EC2 instance

RNA-Seq Workflow

open a terminal

open a browser: biogrids.org/wiki/workshops

Subtle Things

•  Capsule Environment
•  .bashrc/.profile not changed
•  binary installs

Avoid Time Sinks

https://www.biostars.org/p/189261/: This seems to be a bug when installing fastqc using apt-get install fastqc

STARmanual.pdf: …. which creates problems for STAR compilation. One option to avoid this problem is to install gcc …….

http://github.gersteinlab.org/exceRpt/ Manual Installation: …. generally not recommended … <snip> … instructions on how to install exceRpt and its various dependencies will [one day] be listed toward the bottom of this page.

Reproducible Research

$ STAR --sbapp:d

Capsule: STAR using star version 2.5.3a
Version information for: /programs/i386-mac/star

Default version: 2.5.3a
In-use version: 2.5.3a
Other available versions: none
Overrides use this shell variable: STAR_M

Self Documenting

Include in workflow:

STAR --sbapp:d
samtools --sbapp:d
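
One way to fold these version checks into a scripted workflow is sketched below in Python. It assumes only what the slide shows, namely that a BioGrids-wrapped tool accepts `--sbapp:d` and prints its version details. The function names and the report layout are illustrative, not part of BioGrids.

```python
import subprocess
from typing import Callable, Iterable

# Flag taken from the slide; it asks a BioGrids-wrapped tool to describe itself.
VERSION_FLAG = "--sbapp:d"


def run_tool(tool: str) -> str:
    """Run `<tool> --sbapp:d` and return whatever it prints.

    Returns a short note instead of raising if the tool is not installed.
    """
    try:
        result = subprocess.run(
            [tool, VERSION_FLAG],
            capture_output=True,
            text=True,
            check=False,
        )
    except FileNotFoundError:
        return f"{tool}: not found on PATH"
    return result.stdout + result.stderr


def version_report(
    tools: Iterable[str],
    runner: Callable[[str], str] = run_tool,
) -> str:
    """Build one text report with a labelled section per tool."""
    sections = []
    for tool in tools:
        sections.append(f"### {tool} {VERSION_FLAG}\n{runner(tool).strip()}\n")
    return "\n".join(sections)
```

Calling `version_report(["STAR", "samtools"])` at the start of a run and writing the result next to the output files keeps a record of which versions produced them.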

Config File

[installer]
site = biogrid-production
key = 70rYFBTDnmCr93VUklfbf1s3M4jdyC9bFVYHew==
user = jvincent1

[packages]
star@2.5.3a = i386-mac
samtools@1.5 = i386-mac
igv@2.4.10 = i386-mac
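
The config file follows an INI-style layout, so the pinned versions can be read back programmatically. The sketch below uses Python's standard `configparser` and assumes the `name@version = platform` pattern shown on the slide. The key and user values are placeholders, and `pinned_packages` is a hypothetical helper, not a BioGrids API.

```python
import configparser
from typing import Dict, Tuple

# Same layout as the slide; the key and user values here are placeholders.
EXAMPLE_CONFIG = """
[installer]
site = biogrid-production
key = PLACEHOLDER-KEY==
user = exampleuser

[packages]
star@2.5.3a = i386-mac
samtools@1.5 = i386-mac
igv@2.4.10 = i386-mac
"""


def pinned_packages(config_text: str) -> Dict[str, Tuple[str, str]]:
    """Map package name -> (version, platform) from the [packages] section."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    pinned = {}
    for entry, platform in parser["packages"].items():
        # Each entry looks like "star@2.5.3a"; split it at the "@".
        name, _, version = entry.partition("@")
        pinned[name] = (version, platform)
    return pinned


for name, (version, platform) in pinned_packages(EXAMPLE_CONFIG).items():
    print(f"{name}\t{version}\t{platform}")
```

A table like this can go into a methods section or be compared between two machines to confirm that they run the same software stack.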

This Is Handy

biogrids save mysetup.txt
biogrids reactivate mysetup.txt

faithful laptop, new workstation

BioGrids is Portable

faithful laptop

laboratory workstation

HMS O2 compute cluster

BCH compute cluster

BioGrids Benefits

save time - reduce headaches
scale and share workflows
part of reproducible research

BioGrids Consortium

Compute Infrastructure

Personnel: SBGrid, BioGrids

Funding: HMS Tools and Technologies Committee

Why BioGrids?

Between you and the cover of Nature:

compile software, compile libraries, manage dependencies, manage versions, manage paths, change versions, ....

learn to use software, optimize workflow, get science done

RNA-Seq Overview

Harvard Chan Bioinformatics Core (HBC)

http://bioinformatics.sph.harvard.edu/training

RNA-Seq Overview

hbctraining.github.io/Intro-to-rnaseq-hpc-O2

Biological samples / Library prep

sequence reads

quality check

adapter/quality trimming

splice-aware mapping to genome

count reads associated with genes

statistical analysis to identify differentially expressed genes

RNA Prep

Sequencing

RNA-Seq Overview

hbctraining.github.io/Intro-to-rnaseq-hpc-O2

Workflow steps, with the BioGrids app used for each:

Biological samples / Library prep

sequence reads

quality check: FastQC

adapter/quality trimming: (trimmomatic)

splice-aware mapping to genome: STAR

count reads associated with genes: Subread

statistical analysis to identify differentially expressed genes
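
The tool-per-step mapping can be written down as a dry-run command builder. The Python sketch below only assembles command lines and does not run them. It assumes single-end reads, an existing STAR genome index, and the `featureCounts` program from the Subread package for the counting step. The optional trimming step is left out. The file names, directory names, and the `workflow_commands` function are illustrative assumptions, not part of the workshop material.

```python
import shlex
from typing import List


def workflow_commands(
    fastq: str,
    genome_dir: str,
    gtf: str,
    sample: str,
    threads: int = 4,
) -> List[List[str]]:
    """Return one argument list per step: quality check, mapping, counting."""
    # STAR prepends the output prefix to its fixed output file name.
    bam = f"{sample}_Aligned.sortedByCoord.out.bam"
    return [
        # Quality check; the output directory must exist before FastQC runs.
        ["fastqc", "-o", "fastqc_out", fastq],
        # Splice-aware mapping to the genome, written as a sorted BAM.
        [
            "STAR",
            "--runThreadN", str(threads),
            "--genomeDir", genome_dir,
            "--readFilesIn", fastq,
            "--outSAMtype", "BAM", "SortedByCoordinate",
            "--outFileNamePrefix", f"{sample}_",
        ],
        # Count reads associated with genes.
        [
            "featureCounts",
            "-T", str(threads),
            "-a", gtf,
            "-o", f"{sample}_counts.txt",
            bam,
        ],
    ]


# Dry run: print the commands instead of executing them.
for command in workflow_commands("reads.fq", "genome_index", "genes.gtf", "sampleA"):
    print(shlex.join(command))
```

Keeping the commands as data makes it easy to print them for review on a laptop and then pass the same lists to `subprocess.run` on a cluster or an EC2 instance.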

Mapped Reads

Check Results

IGV
1: Genomes / Load Genome from File... (chr1_MOV10.fa)
2: File / Load from file... (.gtf file)
3: File / Load from file... (.bam file)
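
The three menu steps can also be written as an IGV batch script, which IGV can run from Tools > Run Batch Script. The Python sketch below generates such a script using the batch commands `genome`, `load`, and `goto`. Only `chr1_MOV10.fa` comes from the slide. The .gtf and .bam file names and the `igv_batch_script` function are hypothetical, and the locus is optional because the coordinates depend on how the reference FASTA was prepared.

```python
from typing import Optional


def igv_batch_script(
    genome_fasta: str,
    gtf: str,
    bam: str,
    locus: Optional[str] = None,
) -> str:
    """Return IGV batch commands mirroring the three manual loading steps."""
    lines = [
        f"genome {genome_fasta}",  # 1: Genomes / Load Genome from File...
        f"load {gtf}",             # 2: File / Load from file... (.gtf file)
        f"load {bam}",             # 3: File / Load from file... (.bam file)
    ]
    if locus:
        # Optionally jump straight to a region of interest.
        lines.append(f"goto {locus}")
    return "\n".join(lines) + "\n"


# Hypothetical annotation and alignment file names.
print(igv_batch_script("chr1_MOV10.fa", "annotation.gtf", "aligned.bam"))
```

A saved script loads the same three files on any machine, which is convenient when the same check is repeated on a laptop and on AWS.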

DevOps with BioGrids

workflow: bioinformatics
software stack: BioGrids
compute resources: laptop, HMS O2, AWS

AWS Hands On

https://sbgrid.signin.aws.amazon.com/console
username: workshop21
password: Biogrids_Workshop1

AWS - Amazon Web Services

AWS ParallelCluster

aws-parallelcluster.readthedocs.io

scalable HPC cluster

help@biogrids.org

BioGrids is funded by the Harvard Medical School Tools and Technologies Committee.

Additional Resources

ENCODE data files can be found here for Caltech RNA-Seq: http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeCaltechRnaSeq/

Use this bam file: wgEncodeCaltechRnaSeqK562R1x75dAlignsRep1V2

Region of MOV10 gene: chr1:113,214,934-113,243,900

How to download whole genome:
- UCSC ftp site: hgdownload.cse.ucsc.edu
- UCSC website: http://hgdownload.cse.ucsc.edu/goldenPath/hg19/chromosomes/
- UCSC recommends using an ftp client for large file downloads
- chr1 is only 70M
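
The MOV10 region above is written with thousands separators, while many scripts and command-line tools expect plain integers. The Python sketch below parses a region string of this form and reports its length. `Region` and `parse_region` are illustrative helpers, and the coordinates are treated as 1-based and inclusive, which is an assumption about the convention used on the slide.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int
    end: int

    @property
    def length(self) -> int:
        """Number of bases, treating both ends as inclusive."""
        return self.end - self.start + 1

    def plain(self) -> str:
        """Region string without thousands separators."""
        return f"{self.chrom}:{self.start}-{self.end}"


def parse_region(text: str) -> Region:
    """Parse 'chr1:113,214,934-113,243,900' into a Region."""
    match = re.fullmatch(r"(\w+):([\d,]+)-([\d,]+)", text.strip())
    if match is None:
        raise ValueError(f"not a region string: {text!r}")
    chrom, start, end = match.groups()
    region = Region(chrom, int(start.replace(",", "")), int(end.replace(",", "")))
    if region.start > region.end:
        raise ValueError(f"start is after end: {text!r}")
    return region


mov10 = parse_region("chr1:113,214,934-113,243,900")
print(mov10.plain())   # chr1:113214934-113243900
print(mov10.length)    # 28967
```

At about 29 kb, the region is small enough that a BAM subset covering it is quick to copy between a laptop and an EC2 instance.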

References

TRAINING: hbctraining.github.io/Intro-to-rnaseq-hpc-O2
AWS: https://aws.amazon.com/ec2/getting-started
ENCODE: https://www.encodeproject.org
IMAGES: https://www.diagenode.com/en/categories/Library-preparation-for-RNA-seq and https://rnaseq.uoregon.edu
