RNA-Seq: Methods and Applications
Source: barc.wi.mit.edu/education/hot_topics/RNAseq/RNA_Seq.pdf

Prat Thiru

Outline

•  Intro to RNA-Seq
    - Biological Questions
    - Comparison with Other Methods
    - RNA-Seq Protocol
•  RNA-Seq Applications
    - Annotation
    - Quantification
    - Other Applications
•  Expression Profiling Steps and Software
•  Running TopHat and Cufflinks (Commands)

Goals of Sequencing the Transcriptome

•  Annotation
    - Identify genes, exons, splicing events, ncRNAs, etc.
    - Novel genes or transcripts
•  Quantification
    - Abundance of transcripts between different conditions

Transcriptome: RNA World

Figure source: http://finchtalk.geospiza.com/2009/05/small-rnas-get-smaller.html

Transcriptome: Complexity

Figure source: http://www.ncbi.nlm.nih.gov/books/NBK21128/

Comparison of Methods for Studying the Transcriptome

| Technology | Tiling microarray | cDNA or EST sequencing | RNA-Seq |
|---|---|---|---|
| Technology specifications | | | |
| Principle | Hybridization | Sanger sequencing | High-throughput sequencing |
| Resolution | From several to 100 bp | Single base | Single base |
| Throughput | High | Low | High |
| Reliance on genomic sequence | Yes | No | In some cases |
| Background noise | High | Low | Low |
| Application | | | |
| Simultaneously map transcribed regions and gene expression | Yes | Limited for gene expression | Yes |
| Dynamic range to quantify gene expression level | Up to a few-hundredfold | Not practical | >8,000-fold |
| Ability to distinguish different isoforms | Limited | Yes | Yes |
| Ability to distinguish allelic expression | Limited | Yes | Yes |
| Practical issues | | | |
| Required amount of RNA | High | High | Low |
| Cost for mapping transcriptomes of large genomes | High | High | Relatively low |

Wang, Z. et al. RNA-Seq: a revolutionary tool for transcriptomics. Nature Reviews Genetics (2009)

RNA-Seq Experiment

Figure source: Wang, Z. et al. RNA-Seq: a revolutionary tool for transcriptomics. Nature Reviews Genetics (2009)


RNA-Seq Applications - Annotation: Alternative Splicing Events

Figure source: Ozsolak, F. and Milos, P. RNA sequencing: advances, challenges and opportunities. Nature Reviews Genetics (2011)

RNA-Seq Applications - Annotation: Identify Known and Novel Transcripts

•  Unmapped reads: novel splice junctions?
•  Mapped reads: novel exon or gene? Known exons/gene

Guttman, M. et al. Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs. Nature Biotechnology (2010)

Trapnell, C. et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature Biotechnology (2010)

Assembly and Mapping RNA-Seq

•  Options:
    - Align and then assemble
    - Assemble and then align
•  Align to:
    - genome
    - transcriptome

Haas, B.J. and Zody, M.C. Advancing RNA-Seq analysis. Nature Biotechnology (2010)

RNA-Seq Applications - Quantification: Expression Profiling

Figure source: Mortazavi, A., et al. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature Methods (2008)

Need for Normalization

•  More reads are mapped to a transcript if it is
    i) long
    ii) at higher depth of coverage
•  Normalize such that
    i) features of different lengths
    ii) total sequence from different conditions
   can be compared

Quantifying Expression: RPKM

•  RPKM: Reads Per Kilobase per Million mapped reads
•  RPKM = C / (L * N)
    C: Number of mappable reads on a feature (e.g. transcript, exon, etc.)
    L: Length of feature (in kb)
    N: Total number of mappable reads (in millions)

Mortazavi, A., et al. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature Methods (2008)
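The same definition can be restated with raw units, which explains the "per kilobase per million" scaling. This is only a restatement of the formula above; the symbols L_bp (feature length in bases) and N_reads (total mappable reads, unscaled) are introduced here for illustration and do not appear on the slide.

```latex
\[
\mathrm{RPKM}
  \;=\; \frac{C}{L \cdot N}
  \;=\; \frac{C}{\left(\dfrac{L_{\mathrm{bp}}}{10^{3}}\right)\left(\dfrac{N_{\mathrm{reads}}}{10^{6}}\right)}
  \;=\; \frac{10^{9}\, C}{L_{\mathrm{bp}} \cdot N_{\mathrm{reads}}}
\]
```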

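RPKM Example

Gene A: 600 bases, Gene B: 1100 bases, Gene C: 1400 bases

Sample 1 (N = 6M)
•  Gene A: C = 12, RPKM = 12/(0.6*6) = 3.33
•  Gene B: C = 24, RPKM = 24/(1.1*6) = 3.64
•  Gene C: C = 11, RPKM = 11/(1.4*6) = 1.31

Sample 2 (N = 8M)
•  Gene A: C = 19, RPKM = 19/(0.6*8) = 3.96
•  Gene B: C = 28, RPKM = 28/(1.1*8) = 3.18
•  Gene C: C = 16, RPKM = 16/(1.4*8) = 1.43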
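The arithmetic in this example can be checked with a short script. This is a minimal sketch, assuming the counts, gene lengths and totals shown on the slide; the function and variable names (`rpkm`, `rpkm_table`, `GENE_LENGTHS`, `SAMPLES`) are made up for illustration.

```python
def rpkm(count, length_bases, total_mapped_reads):
    """Reads Per Kilobase per Million mapped reads: C / (L * N)."""
    length_kb = length_bases / 1_000                 # L, in kilobases
    total_millions = total_mapped_reads / 1_000_000  # N, in millions
    return count / (length_kb * total_millions)


# Values taken from the slide's example.
GENE_LENGTHS = {"Gene A": 600, "Gene B": 1100, "Gene C": 1400}
SAMPLES = {
    "Sample 1": {"N": 6_000_000, "counts": {"Gene A": 12, "Gene B": 24, "Gene C": 11}},
    "Sample 2": {"N": 8_000_000, "counts": {"Gene A": 19, "Gene B": 28, "Gene C": 16}},
}


def rpkm_table(samples, gene_lengths):
    """Return {sample: {gene: RPKM rounded to two decimals}}."""
    return {
        sample: {
            gene: round(rpkm(count, gene_lengths[gene], info["N"]), 2)
            for gene, count in info["counts"].items()
        }
        for sample, info in samples.items()
    }


if __name__ == "__main__":
    for sample, values in rpkm_table(SAMPLES, GENE_LENGTHS).items():
        print(sample, values)
```

Although Gene A has more reads in Sample 2 than in Sample 1 (19 vs 12), part of that difference disappears once the larger library size is taken into account.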

Quantifying Expression: FPKM

•  FPKM: Fragments Per Kilobase of transcript per Million fragments mapped
    - Analogous to RPKM but does not use read counts.
    - The relative abundances of transcripts are described in terms of the expected biological objects (fragments) observed from an RNA-Seq experiment, which in the future may not be represented by a single read.

Trapnell, C. et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature Biotechnology (2010)
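The difference between counting reads and counting fragments shows up with paired-end data, where two reads come from one fragment. The sketch below is a simplified illustration with made-up fragment IDs and feature names. It counts unique fragment IDs per feature and applies the same length and depth scaling as RPKM. It is not how Cufflinks estimates abundances internally, which relies on a statistical model.

```python
from collections import defaultdict


def count_reads_and_fragments(alignments):
    """alignments: iterable of (fragment_id, feature), one entry per aligned read.

    Two mates of a pair share a fragment_id, so they add two reads
    but only one fragment to their feature.
    """
    reads = defaultdict(int)
    fragment_ids = defaultdict(set)
    for fragment_id, feature in alignments:
        reads[feature] += 1
        fragment_ids[feature].add(fragment_id)
    fragments = {feature: len(ids) for feature, ids in fragment_ids.items()}
    return dict(reads), fragments


def fpkm(fragments, length_bases, total_fragments):
    """Fragments per kilobase of transcript per million fragments mapped."""
    return fragments / ((length_bases / 1_000) * (total_fragments / 1_000_000))


# Hypothetical alignments: frag1 and frag3 are pairs where both mates aligned.
EXAMPLE = [
    ("frag1", "GeneA"), ("frag1", "GeneA"),
    ("frag2", "GeneA"),
    ("frag3", "GeneB"), ("frag3", "GeneB"),
]

if __name__ == "__main__":
    reads, fragments = count_reads_and_fragments(EXAMPLE)
    print("reads:", reads)
    print("fragments:", fragments)
```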

Quantifying Expression: Normalization Methods

•  Total-count (e.g. RPKM)
•  Upper quartile (e.g. 75th percentile): similar to total-count, but uses the per-lane upper quartile of counts for genes with reads in at least one lane.
•  Quantile: for each lane, the distribution of read counts is matched to a reference distribution defined in terms of median counts.

Bullard, J., et al. Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments. BMC Bioinformatics (2010)
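The three approaches can be sketched in a few lines of standard-library Python. This is an illustrative sketch under simplifying assumptions, not the implementation evaluated by Bullard et al.: lanes are given as equal-length lists of counts in the same gene order, ties in quantile normalization are broken by gene order, and all names and toy counts are hypothetical.

```python
import statistics


def total_count_factors(lanes):
    """Total-count: scale each lane by its total number of reads."""
    return {lane: sum(counts) for lane, counts in lanes.items()}


def upper_quartile_factors(lanes):
    """Upper quartile: scale each lane by the 75th percentile of its counts,
    using only genes that have reads in at least one lane."""
    n_genes = len(next(iter(lanes.values())))
    keep = [g for g in range(n_genes)
            if any(counts[g] > 0 for counts in lanes.values())]
    return {
        lane: statistics.quantiles([counts[g] for g in keep], n=4, method="inclusive")[2]
        for lane, counts in lanes.items()
    }


def quantile_normalize(lanes):
    """Quantile: replace each count by the reference value at its rank, where
    the reference is the median across lanes of the sorted counts."""
    sorted_lanes = [sorted(counts) for counts in lanes.values()]
    reference = [statistics.median(values) for values in zip(*sorted_lanes)]
    normalized = {}
    for lane, counts in lanes.items():
        order = sorted(range(len(counts)), key=lambda g: counts[g])
        out = [0.0] * len(counts)
        for rank, gene_index in enumerate(order):
            out[gene_index] = reference[rank]
        normalized[lane] = out
    return normalized


# Hypothetical counts for six genes in two lanes.
LANES = {
    "lane1": [0, 10, 20, 30, 40, 50],
    "lane2": [0, 100, 40, 60, 80, 20],
}

if __name__ == "__main__":
    print(total_count_factors(LANES))
    print(upper_quartile_factors(LANES))
    print(quantile_normalize(LANES))
```

Dividing a lane's counts by its factor, or by the factor relative to a chosen reference lane, puts the lanes on a comparable scale. Quantile normalization goes further and forces every lane to share the same distribution.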

RNA-Seq Applications: Gene Fusion

Figure source: Ozsolak, F. and Milos, P. RNA sequencing: advances, challenges and opportunities. Nature Reviews Genetics (2011)


Expression Profiling Workflow

•  QC: Filter Short Reads
    - (See Hot Topics on Mapping NGS Reads)
    - FASTX Toolkit
    - FastQC
    - R: ShortRead
•  Align and Assemble, or Assemble and Align
    - Align with TopHat, assemble with Cufflinks
•  Computational Analysis: Quantify Expression, or other applications
    - Cuffcompare, Cuffdiff
    - SAMtools, BEDtools
    - R: edgeR, DESeq
•  Visualize Data
    - IGV (See Hot Topics on IGV)
    - UCSC Genome Browser

The Tuxedo Tools

Figure source: http://mged12-deep-sequencing-analysis.wikispaces.com/file/view/Cole_MGED_tutorial_slides.pdf

TopHat Algorithm

Figure source: Trapnell, C., et al. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics (2009)

Cufflinks Algorithm

Figure source: Trapnell, C., et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature Biotechnology (2010)


Running TopHat: Align Reads

•  TopHat Manual: http://tophat.cbcb.umd.edu/manual.html
•  Running TopHat on Tak

Usage: tophat [options] <bowtie_index> <reads1[,reads2,...,readsN]> [reads1[,reads2,...,readsN]]

e.g. bsub "tophat -p 2 --solexa1.3-quals --max-multihits 5 -o s_1_TopHat_Out /nfs/genomes/mouse_gp_jul_07_no_random/bowtie/mm9 s_1_sequence.txt"

Options (see the manual for all available options):
    -o/--output-dir      Sets the name of the directory in which TopHat will write all of its output.
    --solexa-quals       Use the Solexa scale for quality values in FASTQ files.
    --solexa1.3-quals    As of the Illumina GA pipeline version 1.3, quality scores are encoded in Phred-scaled base-64. Use this option for FASTQ files from pipeline 1.3 or later.
    -p/--num-threads     Use this many threads to align reads. The default is 1.
    -g/--max-multihits   Instructs TopHat to allow up to this many alignments to the reference for a given read, and suppresses all alignments for reads with more than this many alignments. The default is 40.
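When many lanes need aligning, the command line can be assembled programmatically. The following is a small sketch that uses only the options listed on this slide; the helper names `tophat_command` and `bsub_command` are hypothetical, and the script only builds and prints the command rather than running it.

```python
import shlex


def tophat_command(bowtie_index, reads, output_dir,
                   threads=1, max_multihits=40, solexa13_quals=False):
    """Build a TopHat argument list; defaults mirror those stated on the slide."""
    cmd = ["tophat",
           "-p", str(threads),
           "--max-multihits", str(max_multihits),
           "-o", output_dir]
    if solexa13_quals:
        cmd.append("--solexa1.3-quals")
    cmd.append(bowtie_index)
    cmd.append(",".join(reads))  # multiple read files are comma-separated
    return cmd


def bsub_command(cmd):
    """Wrap the command as one quoted string for bsub, as in the slide's example."""
    return ["bsub", shlex.join(cmd)]


if __name__ == "__main__":
    cmd = tophat_command(
        "/nfs/genomes/mouse_gp_jul_07_no_random/bowtie/mm9",
        ["s_1_sequence.txt"],
        "s_1_TopHat_Out",
        threads=2,
        max_multihits=5,
        solexa13_quals=True,
    )
    print(shlex.join(bsub_command(cmd)))
```

On a system where `bsub` and `tophat` are installed, the returned list could be passed to `subprocess.run(..., check=True)` to submit the job.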

TopHat Output

•  Output of TopHat is a BAM file: the binary version of a Sequence Alignment/Map (SAM) file
•  Use Integrative Genomics Viewer (IGV) to view the BAM file, or use SAMtools to analyze the BAM file

e.g. SAM file

WICMT-SOLEXA:1:20:670:1533# 137 chr1 3240920 3 30M * 0 0 CTGGATCTGGACCTGGACCTGGATCTATAT ::::::::::::::::-::::::::::::: NM:i:1 NH:i:2 CC:Z:chr6 CP:i:83893005
WICMT-SOLEXA:1:69:135:1285# 89 chr1 3269437 1 30M * 0 0 TGCCTAAACTTATTAAGGCAGGCCATGGGC :((/+:::(+:+':/:+++&+//':++::: NM:i:2 NH:i:4 CC:Z:chr7 CP:i:20934843
WICMT-SOLEXA:1:84:584:747# 153 chr1 3270083 0 30M * 0 0 AGCAAGTTTTTTNTTAGCCCTAGATTCCAG ::::::::::::%::::::::::::::::: NM:i:1 NH:i:5 CC:Z:= CP:i:136301734
WICMT-SOLEXA:1:75:1357:1675# 163 chr1 3522128 255 30M = 3522287 0 GTGGCTTTGTGGTCTTCACCAACCTTTCTC :::::::::::::::::::::::::::::: NM:i:1 NH:i:1
WICMT-SOLEXA:1:75:1357:1675# 83 chr1 3522287 255 30M = 3522128 0 CTGTAGGTGTAATCCTAAATTCTTATTACG :::::::::::::::::::::::::::::: NM:i:0 NH:i:1
WICMT-SOLEXA:1:8:59:283# 153 chr1 3522536 3 30M * 0 0 TTTCTGCTTTGATTATGGTACTGATGTCTG :::::::::::4:::::::::::::::::: NM:i:2 NH:i:2 CC:Z:chr5 CP:i:134317691
WICMT-SOLEXA:1:12:1161:945# 89 chr1 3523371 1 30M * 0 0 TCTACATAGCCCAAACTGGCTTTGGACTCT :::::::::::::::::::::::::::::: NM:i:0 NH:i:3 CC:Z:chr10 CP:i:117172515
WICMT-SOLEXA:1:45:1469:1826# 73 chr1 3620888 3 30M * 0 0 CAAGTATTTAATGTTTTCATTAAATTGTTT ::::::::::::::::::::::::::4::: NM:i:0 NH:i:2 CC:Z:chr11 CP:i:22903295
WICMT-SOLEXA:1:14:536:150# 73 chr1 3620943 3 30M * 0 0 CTGGAAGACAATGTCCAAAAACTCTGAATC :::::::::::::::::::::::::%::&: NM:i:1 NH:i:2 CC:Z:chr11 CP:i:22903240
WICMT-SOLEXA:1:66:646:1188# 137 chr1 3662923 0 30M * 0 0 AAAAAAAAAACACCACCCCCAACAAAAAAA +00++0+0+''0++++:00::.&:::,:,: NM:i:2 NH:i:5 CC:Z:chr10 CP:i:94881279
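Each SAM record is a tab-delimited line with eleven mandatory fields followed by optional TAG:TYPE:VALUE entries. The sketch below parses one of the records above. The slide text runs the fields together, so the field boundaries in `EXAMPLE` are a reconstruction and should be treated as an assumption. The names `SamRecord` and `parse_sam_line` are illustrative, and real analyses would normally use SAMtools instead.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    qname: str
    flag: int
    rname: str
    pos: int
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str
    tags: dict


def parse_sam_line(line):
    """Split one SAM alignment line into its mandatory fields and optional tags."""
    fields = line.rstrip("\n").split("\t")
    tags = {}
    for item in fields[11:]:
        name, value_type, value = item.split(":", 2)
        tags[name] = int(value) if value_type == "i" else value
    return SamRecord(
        fields[0], int(fields[1]), fields[2], int(fields[3]), int(fields[4]),
        fields[5], fields[6], int(fields[7]), int(fields[8]),
        fields[9], fields[10], tags,
    )


# One record from the slide, with assumed (reconstructed) field boundaries.
EXAMPLE = "\t".join([
    "WICMT-SOLEXA:1:75:1357:1675#", "163", "chr1", "3522128", "255", "30M",
    "=", "3522287", "0",
    "GTGGCTTTGTGGTCTTCACCAACCTTTCTC", ":" * 30,
    "NM:i:1", "NH:i:1",
])

if __name__ == "__main__":
    record = parse_sam_line(EXAMPLE)
    print(record.rname, record.pos, record.cigar)
    print("mismatches (NM):", record.tags["NM"], "hits (NH):", record.tags["NH"])
    print("read on reverse strand:", bool(record.flag & 0x10))
```

The NH tag gives the number of reported alignments for the read, so NH:i:1 marks a uniquely aligned read.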

Cufflinks: Assemble and Quantify Reads

•  Cufflinks Manual: http://cufflinks.cbcb.umd.edu/manual.html
•  Running Cufflinks on Tak
•  Optional: Supply annotation in GTF format with the "-G" option

Usage: cufflinks [options] <hits.bam>

e.g. bsub "cufflinks -p 2 -o s_1_Cufflinks_Out s_1_TopHat_Out/accepted_hits.bam"

e.g. Cufflinks will assemble and quantify using the known transcripts in the supplied GTF file:
bsub "cufflinks -p 2 -G transcripts.gtf accepted_hits.bam"

Cufflinks Output

•  Output of Cufflinks is a GTF file with assembled isoforms

e.g.
chr1 Cufflinks transcript 36321447 36330270 1000 - . gene_id "Neurl3"; transcript_id "NM_153408"; FPKM "3.7155221121"; frac "1.000000"; conf_lo "0.000000"; conf_hi "7.570660"; cov "0.649922";
chr1 Cufflinks exon 36321447 36323398 1000 - . gene_id "Neurl3"; transcript_id "NM_153408"; exon_number "1"; FPKM "3.7155221121"; frac "1.000000"; conf_lo "0.000000"; conf_hi "7.570660"; cov "0.649922";
chr1 Cufflinks exon 36325501 36325554 1000 - . gene_id "Neurl3"; transcript_id "NM_153408"; exon_number "2"; FPKM "3.7155221121"; frac "1.000000"; conf_lo "0.000000"; conf_hi "7.570660"; cov "0.649922";
chr1 Cufflinks exon 36326058 36326546 1000 - . gene_id "Neurl3"; transcript_id "NM_153408"; exon_number "3"; FPKM "3.7155221121"; frac "1.000000"; conf_lo "0.000000"; conf_hi "7.570660"; cov "0.649922";
chr1 Cufflinks exon 36330183 36330270 1000 - . gene_id "Neurl3"; transcript_id "NM_153408"; exon_number "4"; FPKM "3.7155221121"; frac "1.000000"; conf_lo "0.000000"; conf_hi "7.570660"; cov "0.649922";
chr1 Cufflinks transcript 36364578 36380874 4 + . gene_id "Arid5a"; transcript_id "NM_145996"; FPKM "0.0015751054"; frac "0.002360"; conf_lo "0.000000"; conf_hi "0.081996"; cov "0.000263";
chr1 Cufflinks exon 36364578 36364681 4 + . gene_id "Arid5a"; transcript_id "NM_145996"; exon_number "1"; FPKM "0.0015751054"; frac "0.002360"; conf_lo "0.000000"; conf_hi "0.081996"; cov "0.000263";
chr1 Cufflinks exon 36373054 36373172 4 + . gene_id "Arid5a"; transcript_id "NM_145996"; exon_number "2"; FPKM "0.0015751054"; frac "0.002360"; conf_lo "0.000000"; conf_hi "0.081996"; cov "0.000263";
chr1 Cufflinks exon 36374929 36375026 4 + . gene_id "Arid5a"; transcript_id "NM_145996"; exon_number "3"; FPKM "0.0015751054"; frac "0.002360"; conf_lo "0.000000"; conf_hi "0.081996"; cov "0.000263";
chr1 Cufflinks exon 36375333 36375498 4 + . gene_id "Arid5a"; transcript_id "NM_145996"; exon_number "4"; FPKM "0.0015751054"; frac "0.002360"; conf_lo "0.000000"; conf_hi "0.081996"; cov "0.000263";
chr1 Cufflinks exon 36375837 36380874 4 + . gene_id "Arid5a"; transcript_id "NM_145996"; exon_number "5"; FPKM "0.0015751054"; frac "0.002360"; conf_lo "0.000000"; conf_hi "0.081996"; cov "0.000263";
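To pull per-transcript FPKM values out of this GTF output, the nine tab-delimited columns and the key "value"; attribute pairs can be parsed directly. This is a minimal sketch that assumes the column layout shown in the example records, with field boundaries reconstructed from the slide; `parse_gtf_line` and `transcript_fpkm` are illustrative names.

```python
import re

# Matches attribute pairs of the form: key "value";
ATTRIBUTE = re.compile(r'(\S+) "([^"]*)";')


def parse_gtf_line(line):
    """Parse one GTF line into a dict of columns plus an attribute dict."""
    cols = line.rstrip("\n").split("\t")
    return {
        "seqname": cols[0],
        "source": cols[1],
        "feature": cols[2],
        "start": int(cols[3]),
        "end": int(cols[4]),
        "score": cols[5],
        "strand": cols[6],
        "frame": cols[7],
        "attributes": dict(ATTRIBUTE.findall(cols[8])),
    }


def transcript_fpkm(lines):
    """Map transcript_id -> FPKM using only the 'transcript' records."""
    result = {}
    for line in lines:
        record = parse_gtf_line(line)
        if record["feature"] == "transcript":
            attrs = record["attributes"]
            result[attrs["transcript_id"]] = float(attrs["FPKM"])
    return result


# Three records from the slide, written with explicit tab separators.
EXAMPLE = [
    'chr1\tCufflinks\ttranscript\t36321447\t36330270\t1000\t-\t.\t'
    'gene_id "Neurl3"; transcript_id "NM_153408"; FPKM "3.7155221121"; frac "1.000000"; '
    'conf_lo "0.000000"; conf_hi "7.570660"; cov "0.649922";',
    'chr1\tCufflinks\texon\t36321447\t36323398\t1000\t-\t.\t'
    'gene_id "Neurl3"; transcript_id "NM_153408"; exon_number "1"; FPKM "3.7155221121"; '
    'frac "1.000000"; conf_lo "0.000000"; conf_hi "7.570660"; cov "0.649922";',
    'chr1\tCufflinks\ttranscript\t36364578\t36380874\t4\t+\t.\t'
    'gene_id "Arid5a"; transcript_id "NM_145996"; FPKM "0.0015751054"; frac "0.002360"; '
    'conf_lo "0.000000"; conf_hi "0.081996"; cov "0.000263";',
]

if __name__ == "__main__":
    for transcript_id, value in transcript_fpkm(EXAMPLE).items():
        print(transcript_id, value)
```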

Local Resources

•  For a description of available files, see /nfs/genomes/BaRC_Genomes_README.txt
    - Bowtie index: /nfs/genomes/<species>/bowtie
      e.g. /nfs/genomes/mouse_gp_jul_07_no_random/bowtie
    - GTF files: /nfs/genomes/<species>/gtf
      e.g. /nfs/genomes/mouse_gp_jul_07/gtf

Further Reading

•  RNA-Seq
    - Mortazavi, A., et al. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature Methods 5(7):621-628 (2008)
    - Wang, Z., et al. RNA-Seq: a revolutionary tool for transcriptomics. Nature Reviews Genetics 10:57-63 (2009)
    - Ozsolak, F. and Milos, P.M. RNA sequencing: advances, challenges, and opportunities. Nature Reviews Genetics 12:87-98 (2011)
•  TopHat
    - Trapnell, C., et al. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25(9):1105-1111 (2009)
•  Cufflinks
    - Trapnell, C., et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature Biotechnology 28(5):511-515 (2010)

Online Community Forum and Discussion

•  http://seqanswers.com/
