Transcriptome Assembly and Quantification from Ion Torrent RNA-Seq Data Alex Zelikovsky Department of Computer Science Georgia State University Joint work with Serghei Mangul, Sahar Al Seesi, Adrian Caciula, Dumitru Brinza, Ion Mandoiu


Transcriptome Assembly and Quantification from Ion Torrent RNA-Seq Data

Alex Zelikovsky Department of Computer Science

Georgia State University

Joint work with Serghei Mangul, Sahar Al Seesi, Adrian Caciula, Dumitru Brinza, Ion Mandoiu


Advances in Next Generation Sequencing

http://www.economist.com/node/16349358

• Roche/454 FLX Titanium: 400-600 million reads/run, 400bp avg. length
• Illumina HiSeq 2000: up to 6 billion PE reads/run, 35-100bp read length
• SOLiD 4/5500: 1.4-2.4 billion PE reads/run, 35-50bp read length
• Ion Proton Sequencer


RNA-Seq

• Make cDNA & shatter into fragments
• Sequence fragment ends
• Map reads
• Downstream tasks: Gene Expression, Transcriptome Reconstruction, Isoform Expression

[Figure: exons A B C D E, with transcripts A B C, A C, and D E]


Transcriptome Assembly

• Given partial or incomplete information (short RNA-Seq reads), use that information to make an informed guess about the missing or unknown data (the full-length transcripts).


Transcriptome Assembly Types

• Genome-independent reconstruction (de novo)
– de Bruijn k-mer graph
• Genome-guided reconstruction (ab initio)
– Spliced read mapping
– Exon identification
– Splice graph
• Annotation-guided reconstruction
– Use existing annotation (known transcripts)
– Focus on discovering novel transcripts
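As a rough illustration of the de Bruijn k-mer graph used by de novo assemblers, the sketch below builds a graph whose nodes are (k-1)-mers and whose edges are the k-mers observed in the reads. The function name `de_bruijn_graph` and the toy reads are assumptions made for this example; real assemblers add error correction, coverage tracking, and graph simplification on top of this.

```python
from collections import defaultdict

def de_bruijn_graph(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges are k-mers.

    Hypothetical helper for illustration only.
    """
    graph = defaultdict(set)
    for read in reads:
        # Slide a window of length k over the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Edge from the k-mer's prefix to its suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return graph

# Toy reads (made up) that overlap on "GTAC".
reads = ["ACGTAC", "GTACGG"]
graph = de_bruijn_graph(reads, k=4)
```

A node with more than one successor, such as `ACG` here, marks a branch that an assembler has to resolve, for example between alternative isoforms or a repeat.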


Previous approaches

• Genome-independent reconstruction
– Trinity (2011), Velvet (2008), Trans-ABySS (2010)
• Genome-guided reconstruction
– Scripture (2010): reports "all" transcripts
– Cufflinks (2010), IsoLasso (2011), SLIDE (2012), CLIIQ (2012), TRIP (2012), Traph (2013): minimize the set of transcripts explaining the reads
• Annotation-guided reconstruction
– RABT (2011), DRUT (2011)


Gene representation

• Pseudo-exons - regions of a gene between consecutive transcriptional or splicing events

• Gene - set of non-overlapping pseudo-exons

[Figure: a gene with exons e1 to e6 in transcripts Tr1, Tr2, and Tr3, split into pseudo-exons pse1 to pse7; Spse_i and Epse_i mark the start and end of pseudo-exon i]
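The pseudo-exon definition can be sketched in code. This is a minimal sketch, assuming each transcript is given as a list of half-open exon intervals `(start, end)` on the genome; the function name `pseudo_exons` and the coordinates are made up for illustration.

```python
def pseudo_exons(transcripts):
    """Split a gene into non-overlapping pseudo-exons.

    transcripts: list of transcripts, each a list of half-open
    exon intervals (start, end). Hypothetical input format.
    """
    exons = [exon for transcript in transcripts for exon in transcript]
    # Every exon start or end is a transcriptional or splicing event.
    cuts = sorted({pos for start, end in exons for pos in (start, end)})
    result = []
    for left, right in zip(cuts, cuts[1:]):
        # Keep a segment only if some exon covers it (drops pure introns).
        if any(start <= left and right <= end for start, end in exons):
            result.append((left, right))
    return result

# Toy gene with three transcripts (coordinates are made up).
transcripts = [
    [(0, 100), (200, 300)],   # Tr1
    [(0, 100), (250, 300)],   # Tr2
    [(50, 100), (200, 300)],  # Tr3
]
pses = pseudo_exons(transcripts)
```

In this toy gene the region between 100 and 200 is intronic in every transcript, so it yields no pseudo-exon.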


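Splice Graph

[Figure: splice graph drawn over the genome, with pseudo-exon nodes 1 to 9, transcription start sites (TSS), and transcription end sites (TES)]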


MaLTA: Maximum Likelihood Transcriptome Assembly

• Map the RNA-Seq reads to the genome
• Construct the splice graph G(V,E)
– V: exons
– E: splicing events
• Candidate transcripts
– depth-first search (DFS)
• Select candidate transcripts
– IsoEM
– greedy algorithm
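The candidate-generation step can be illustrated with a small sketch: enumerate every path from a start node to an end node of the splice graph by depth-first search. The adjacency-dict representation, the function name `enumerate_paths`, and the toy graph are assumptions for this example, not MaLTA's actual data structures.

```python
def enumerate_paths(edges, sources, sinks):
    """Enumerate all source-to-sink paths in an acyclic splice graph by DFS.

    edges: dict mapping a node to the set of its successors
    sources: nodes where a transcript may start (TSS)
    sinks: nodes where a transcript may end (TES)
    """
    paths = []

    def dfs(node, path):
        if node in sinks:
            paths.append(path)
        # Visit successors in sorted order for a deterministic result.
        for nxt in sorted(edges.get(node, ())):
            dfs(nxt, path + [nxt])

    for source in sorted(sources):
        dfs(source, [source])
    return paths

# Toy splice graph: exon 2 is optionally skipped.
edges = {1: {2, 3}, 2: {3}, 3: {4}}
candidates = enumerate_paths(edges, sources={1}, sinks={4})
```

The number of paths can grow exponentially with the number of alternative events, which is one reason a selection step is needed afterwards.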


How to select?

• Select the smallest set of candidate transcripts covering all transcript variants

• Transcript: a set of transcript variants

Sharmistha Pal, Ravi Gupta, Hyunsoo Kim, et al., Alternative transcription exceeds alternative splicing in generating the transcriptome diversity of cerebellar development, Genome Res. 2011 21: 1260-1272

Transcript variant types: alternative first exon, alternative last exon, exon skipping, intron retention, alternative 5' splice junction, alternative 3' splice junction


IsoEM: Isoform Expression Level Estimation

• Expectation-Maximization algorithm
• Unified probabilistic model incorporating
– Single and/or paired reads
– Fragment length distribution
– Strand information
– Base quality scores
– Repeat and hexamer bias correction
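To make the Expectation-Maximization idea concrete, here is a deliberately simplified sketch. It assumes each read comes with a precomputed compatibility weight for every isoform it can originate from, and it estimates the fraction of reads per isoform. It omits isoform-length normalization and the bias corrections listed above, and the function name `em_read_fractions` and the toy weights are hypothetical, so it should not be read as the IsoEM implementation.

```python
def em_read_fractions(weights, n_isoforms, iterations=100):
    """Estimate the fraction of reads originating from each isoform.

    weights: one dict per read, mapping isoform index -> compatibility
    weight of that read with that isoform (hypothetical input format).
    """
    freq = [1.0 / n_isoforms] * n_isoforms
    for _ in range(iterations):
        counts = [0.0] * n_isoforms
        # E-step: split each read among its compatible isoforms in
        # proportion to (current frequency) x (compatibility weight).
        for read_weights in weights:
            total = sum(freq[i] * w for i, w in read_weights.items())
            if total == 0:
                continue  # read incompatible with all expressed isoforms
            for i, w in read_weights.items():
                counts[i] += freq[i] * w / total
        # M-step: renormalize the expected read counts.
        n = sum(counts)
        if n == 0:
            break
        freq = [c / n for c in counts]
    return freq

# Toy data: two reads unique to isoform 0, one unique to isoform 1,
# and one read compatible with both.
weights = [{0: 1.0}, {0: 1.0}, {0: 1.0, 1: 1.0}, {1: 1.0}]
freq = em_read_fractions(weights, n_isoforms=2)
```

In this toy case the ambiguous read is gradually assigned mostly to the isoform with more unique support.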


Read-isoform compatibility graph

$w_{r,i} = \sum_a O_a Q_a F_a$


Fragment length distribution

[Figure: isoforms A B C and A C, with fragment lengths i and j and their probabilities F_a(i) and F_a(j)]
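The role of the fragment length distribution can be sketched as follows: the same read pair implies a different fragment length on each isoform, and the distribution turns that length into a probability. Everything specific here is an assumption for illustration: the exon lengths, the read offsets, the helper names, and the normal distribution with mean 250 and standard deviation 25.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution, standing in for F_a."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

def fragment_length(isoform, exon_len, left, left_off, right, right_off):
    """Length of a fragment that starts left_off bases into exon `left`
    and ends right_off bases into exon `right` on the given isoform.
    Assumes `left` and `right` are distinct exons of the isoform."""
    i, j = isoform.index(left), isoform.index(right)
    # Exons lying strictly between the two read ends on this isoform.
    between = sum(exon_len[e] for e in isoform[i + 1:j])
    return (exon_len[left] - left_off) + between + right_off

exon_len = {"A": 100, "B": 150, "C": 100}  # made-up exon lengths

# A read pair whose ends fall 40 bases into A and 60 bases into C.
len_abc = fragment_length(["A", "B", "C"], exon_len, "A", 40, "C", 60)
len_ac = fragment_length(["A", "C"], exon_len, "A", 40, "C", 60)

f_abc = normal_pdf(len_abc, mean=250, sd=25)
f_ac = normal_pdf(len_ac, mean=250, sd=25)
```

Under these assumed parameters the pair is far more likely to come from isoform A B C, even though both isoforms contain both read ends.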


Greedy algorithm

1. Sort transcripts by inferred IsoEM expression levels in decreasing order

2. Traverse transcripts
– Select a transcript if it contains a novel transcript variant
– Continue traversing until all transcript variants are covered
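The two steps above can be sketched in a few lines. This is a minimal sketch, assuming each candidate transcript is already described by the set of transcript variants it contains and by an expression level estimated beforehand; the names `greedy_select`, `candidates`, and `expression` and the toy values are hypothetical.

```python
def greedy_select(candidates, expression):
    """Greedy selection of transcripts covering all transcript variants.

    candidates: dict transcript id -> set of transcript variants it contains
    expression: dict transcript id -> estimated expression level
    """
    uncovered = set().union(*candidates.values())
    selected = []
    # Step 1: sort transcripts by expression level, highest first.
    ordered = sorted(candidates, key=lambda t: expression[t], reverse=True)
    # Step 2: traverse, keeping transcripts that add a novel variant.
    for transcript in ordered:
        if not uncovered:
            break  # all transcript variants are covered
        if candidates[transcript] & uncovered:
            selected.append(transcript)
            uncovered -= candidates[transcript]
    return selected

# Toy input (made-up variants and expression levels).
candidates = {
    "T1": {"v1", "v2"},
    "T2": {"v2"},
    "T3": {"v3"},
    "T4": {"v1", "v3"},
}
expression = {"T1": 50.0, "T2": 30.0, "T3": 10.0, "T4": 5.0}
selected = greedy_select(candidates, expression)
```

Here T2 is skipped because its only variant is already covered by T1, and the traversal stops before reaching T4.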


Greedy algorithm

[Figure: step-by-step animation of the greedy algorithm, showing transcripts sorted by expression levels against transcript variants]


STOP. All transcript variants are covered.


MaLTA results on GOG-350 dataset

• 4.5M single Ion reads with average read length 121 bp, aligned using TopHat2
• Number of assembled transcripts
– MaLTA: 15385
– Cufflinks: 17378
• Number of transcripts matching annotations
– MaLTA: 4555 (26%)
– Cufflinks: 2031 (13%)


Expression Estimation on Ion Torrent reads

[Figure: bar chart of R² for IsoEM/Cufflinks estimates vs qPCR (y-axis from 0 to 0.8); bars: IsoEM HBR, Cufflinks HBR, IsoEM UHR, Cufflinks UHR]

• Squared correlation
– IsoEM / Cufflinks FPKMs vs qPCR values for 800 genes
– 2 MAQC samples: Human Brain Reference (HBR) and Universal Human Reference (UHR)
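For reference, the squared correlation between two lists of per-gene values can be computed as below. This is a generic sketch of the squared Pearson correlation; the function name `r_squared` and the toy numbers are made up and are not taken from the MAQC data, and the sketch does not assume any particular transformation (such as a log scale) of the FPKM or qPCR values.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)

# Toy values (made up): estimated expression vs. reference measurements.
estimated = [1.0, 2.0, 3.0]
reference = [1.0, 3.0, 2.0]
r2 = r_squared(estimated, reference)
```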


Conclusions

• Novel method for transcriptome assembly
• Validated on Ion Torrent RNA-Seq data
• Compared with Cufflinks:
– similar number of assembled transcripts
– 2x more previously annotated transcripts

• Transcript quantification is useful for transcript assembly
– Better quantification?
