CRC Project on Robust Transcript Discovery and Quantification from Sequencing Data Oct. 7, 2011 live call UCONN: Ion Mandoiu, Sahar Elsisi GSU: Alex Zelikovsky, Serghei Mangul, Adrian Caciula Lifetech PI: Dumitru Brinza



Page 1: CRC Project on  Robust Transcript Discovery and Quantification from Sequencing Data

CRC Project on Robust Transcript Discovery and Quantification from Sequencing Data

Oct. 7, 2011 live call

UCONN: Ion Mandoiu, Sahar Elsisi
GSU: Alex Zelikovsky, Serghei Mangul, Adrian Caciula
Lifetech PI: Dumitru Brinza

Page 2: CRC Project on  Robust Transcript Discovery and Quantification from Sequencing Data

Outline

1. IsoEM plugin for accurate prediction of transcription level
2. Results on ION data
3. Comparison to other tools and platforms
4. Progress on transcript reconstruction
5. Feedback on plugin development and deployment on VM

Page 3: CRC Project on  Robust Transcript Discovery and Quantification from Sequencing Data

Outline

1. IsoEM plugin for accurate prediction of transcription level
2. Results on ION data
3. Comparison to other tools and platforms
4. Progress on transcript reconstruction
5. Feedback on plugin development and deployment on VM

Page 4: CRC Project on  Robust Transcript Discovery and Quantification from Sequencing Data

IsoEM: Isoform Expression Level Estimation

• Expectation-Maximization algorithm
• Unified probabilistic model incorporating
  – Single and/or paired reads
  – Fragment length distribution
  – Strand information
  – Base quality scores
  – Repeat and hexamer bias correction
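The Expectation-Maximization idea can be sketched on a toy input. This is a minimal illustration, not the IsoEM implementation: the function name `em_isoform_frequencies` and the weight-matrix layout are assumptions, and the sketch leaves out isoform length normalization and every bias correction listed above.

```python
import numpy as np

def em_isoform_frequencies(weights, n_iter=200, tol=1e-9):
    """Toy EM for isoform frequencies (hypothetical helper, not IsoEM's code).

    weights[r, i] is the compatibility weight of read r with isoform i
    (0 when the read cannot come from that isoform).
    """
    weights = np.asarray(weights, dtype=float)
    n_reads, n_iso = weights.shape
    freq = np.full(n_iso, 1.0 / n_iso)  # start from uniform frequencies
    for _ in range(n_iter):
        # E-step: split each read among isoforms in proportion to
        # (current frequency) x (compatibility weight).
        joint = weights * freq
        row_sums = joint.sum(axis=1, keepdims=True)
        resp = np.divide(joint, row_sums,
                         out=np.zeros_like(joint), where=row_sums > 0)
        # M-step: new frequency = share of fractional reads per isoform.
        counts = resp.sum(axis=0)
        new_freq = counts / counts.sum()
        converged = np.abs(new_freq - freq).max() < tol
        freq = new_freq
        if converged:
            break
    return freq

# Two reads unique to isoform 0, one shared read, one read unique to isoform 1.
toy_weights = [[1, 0], [1, 0], [1, 1], [0, 1]]
print(em_isoform_frequencies(toy_weights))
```

On this toy input the shared read is split in proportion to the current estimates, and the frequencies settle at about 2/3 and 1/3.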

Page 5: CRC Project on  Robust Transcript Discovery and Quantification from Sequencing Data

Read-isoform compatibility graph

$w_{r,i} = \sum_a O_a Q_a F_a$
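A small sketch of how such an edge weight could be computed, reading the slide's formula as a sum over the alignments a of read r to isoform i of a product of three per-alignment factors. The interpretation of O, Q and F as strand/orientation, base-quality and fragment-length terms is an assumption based on the model components listed earlier, and the class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Alignment:
    """One alignment of a read to an isoform (hypothetical container)."""
    orientation_prob: float  # O_a, assumed to encode strand information
    quality_prob: float      # Q_a, assumed to come from base quality scores
    fragment_prob: float     # F_a, assumed to come from the fragment length distribution

def read_isoform_weight(alignments):
    """Sum O_a * Q_a * F_a over all alignments of one read to one isoform."""
    return sum(a.orientation_prob * a.quality_prob * a.fragment_prob
               for a in alignments)

# A read with two candidate alignments to the same isoform.
example = [Alignment(1.0, 0.9, 0.5), Alignment(0.5, 0.8, 0.25)]
print(read_isoform_weight(example))
```

A read with no alignment to an isoform gets weight 0, which corresponds to a missing edge in the compatibility graph.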

Page 6: CRC Project on  Robust Transcript Discovery and Quantification from Sequencing Data

Fragment length distribution

• Paired reads

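[Figure: a read pair placed on isoforms A B C and A C, with implied fragment lengths i and j and their probabilities Fa(i) and Fa(j) on the fragment length distribution]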
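The point of the paired-read slide, that the same read pair implies a different fragment length on isoform A B C than on isoform A C, can be illustrated as follows. The exon lengths, the read positions, and the choice of a normal fragment length distribution with mean 300 and standard deviation 30 are all made-up assumptions for the example; the helper names are hypothetical.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution (assumed shape of the fragment length model)."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def transcript_position(isoform, exon_lengths, exon, offset):
    """Convert (exon, 0-based offset in exon) to a 0-based position on the isoform."""
    pos = 0
    for name in isoform:
        if name == exon:
            return pos + offset
        pos += exon_lengths[name]
    raise ValueError(f"exon {exon} is not part of isoform {isoform}")

def implied_fragment_length(isoform, exon_lengths, left, right):
    """Fragment length implied by the leftmost and rightmost bases of a read pair."""
    start = transcript_position(isoform, exon_lengths, *left)
    end = transcript_position(isoform, exon_lengths, *right)
    return end - start + 1

exon_lengths = {"A": 200, "B": 150, "C": 200}   # made-up lengths
left, right = ("A", 100), ("C", 49)             # pair spans from exon A to exon C

for isoform in (["A", "B", "C"], ["A", "C"]):
    length = implied_fragment_length(isoform, exon_lengths, left, right)
    print(isoform, length, normal_pdf(length, mean=300, sd=30))
```

With these numbers the pair implies a 300-base fragment on A B C and a 150-base fragment on A C, so the A B C placement receives a much larger fragment length probability.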

Page 7: CRC Project on  Robust Transcript Discovery and Quantification from Sequencing Data

Fragment length distribution

• Single reads

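[Figure: single reads placed on isoforms A B C and A C, with fragment lengths i and j and their probabilities Fa(i) and Fa(j) on the fragment length distribution]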

Page 8: CRC Project on  Robust Transcript Discovery and Quantification from Sequencing Data

IsoEM Plugin

Page 9: CRC Project on  Robust Transcript Discovery and Quantification from Sequencing Data

IsoEM Plugin Outputs

Page 10: CRC Project on  Robust Transcript Discovery and Quantification from Sequencing Data

IsoEM Plugin Outputs: FPKM estimates
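For reference, FPKM (fragments per kilobase of transcript per million mapped fragments) follows the standard definition sketched below. This only shows the unit; how the plugin derives its fragment counts and transcript lengths is not stated on the slide, and the function name is hypothetical.

```python
def fpkm(fragment_count, transcript_length_bp, total_mapped_fragments):
    """Standard FPKM definition: fragments per kilobase per million mapped fragments."""
    return fragment_count * 1e9 / (transcript_length_bp * total_mapped_fragments)

# 500 fragments on a 2,000 bp transcript, 10 million mapped fragments in total.
print(fpkm(500, 2000, 10_000_000))  # 25.0
```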

Page 11: CRC Project on  Robust Transcript Discovery and Quantification from Sequencing Data

IsoEM Plugin Outputs: UCSC tracks

Page 12: CRC Project on  Robust Transcript Discovery and Quantification from Sequencing Data

Outline

1. IsoEM plugin for accurate prediction of transcription level
2. Results on ION data
3. Comparison to other tools and platforms
4. Progress on transcript reconstruction
5. Feedback on plugin development and deployment on VM

Page 13: CRC Project on  Robust Transcript Discovery and Quantification from Sequencing Data

MAQC data

• 2 RNA samples: UHRR, HBRR
• 5 ION runs each
• Gold standard
• Gene expression levels measured in quadruplicate by qPCR for 832 Ensembl genes [MAQC Consortium 06]

Page 14: CRC Project on  Robust Transcript Discovery and Quantification from Sequencing Data

IsoEM results on ION HBR runs

Page 15: CRC Project on  Robust Transcript Discovery and Quantification from Sequencing Data

IsoEM results on ION UHR runs

Page 16: CRC Project on  Robust Transcript Discovery and Quantification from Sequencing Data

Outline

1. IsoEM plugin for accurate prediction of transcription level
2. Results on ION data
3. Comparison to other tools and platforms
4. Progress on transcript reconstruction
5. Feedback on plugin development and deployment on VM

Page 17: CRC Project on  Robust Transcript Discovery and Quantification from Sequencing Data

IsoEM vs. Cufflinks 1.0.3 on ION reads

[Figure: bar chart of R2 for IsoEM/Cufflinks estimates vs. qPCR; bars: IsoEM HBR, Cufflinks HBR, IsoEM UHR, Cufflinks UHR; y-axis from 0 to 0.8]

Page 18: CRC Project on  Robust Transcript Discovery and Quantification from Sequencing Data

HBR GOG-139_281

[Figure: two scatter plots on logarithmic axes]

Cufflinks estimates vs. qPCR estimates: R² = 0.208761354805839

IsoEM estimates vs. qPCR estimates: R² = 0.736479062391342
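A sketch of how an R² between estimated expression levels and qPCR values can be computed. The slides do not say whether R² was computed on raw or log-transformed values, so `log_scale` is an assumed switch, and the function name and sample numbers are hypothetical.

```python
import numpy as np

def r_squared(estimates, qpcr, log_scale=True):
    """Squared Pearson correlation between two expression vectors.

    With log_scale=True, genes with non-positive values in either vector are
    dropped and both vectors are log10-transformed first (an assumption; the
    slides do not state which scale was used).
    """
    x = np.asarray(estimates, dtype=float)
    y = np.asarray(qpcr, dtype=float)
    if log_scale:
        keep = (x > 0) & (y > 0)
        x, y = np.log10(x[keep]), np.log10(y[keep])
    r = np.corrcoef(x, y)[0, 1]
    return r * r

# Made-up values for four genes.
print(r_squared([1.2, 9.5, 80.0, 1100.0], [1.0, 10.0, 100.0, 1000.0]))
```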

Page 19: CRC Project on  Robust Transcript Discovery and Quantification from Sequencing Data

MAQC Illumina datasets

Page 20: CRC Project on  Robust Transcript Discovery and Quantification from Sequencing Data

R2 of IsoEM estimates from ION & Illumina HBR reads

[Figure: R2 (y-axis, 0 to 1) vs. Number of Reads (x-axis: 250k, 500k, 1M, 2M, 4M, 7M, all)]

Legend:
• Average R2 for 5 ION Torrent MAQC HBR Runs (avg. 1,559,842 reads)
• R2 for combined reads from 5 ION Torrent MAQC HBR Runs (7,799,210 reads)

Page 21: CRC Project on  Robust Transcript Discovery and Quantification from Sequencing Data

R2 of IsoEM estimates from ION & Illumina UHR reads

[Figure: R2 (y-axis, 0 to 1) vs. Number of Reads (x-axis: 250k, 500k, 1M, 2M, 4M, 7M, all)]

Legend:
• Average R2 for 5 ION Torrent MAQC UHR Runs (average 1,941,663 reads)
• R2 for combined reads from 5 ION Torrent MAQC UHR Runs (9,708,315 reads)

Page 22: CRC Project on  Robust Transcript Discovery and Quantification from Sequencing Data

Outline

1. IsoEM plugin for accurate prediction of transcription level
2. Results on ION data
3. Comparison to other tools and platforms
4. Progress on transcript reconstruction
5. Feedback on plugin development and deployment on VM

Page 23: CRC Project on  Robust Transcript Discovery and Quantification from Sequencing Data

Virtual Transcript Expectation Maximization (VTEM)

[Flowchart]

1. Input: (partially) annotated genome + virtual transcript, with 0-weights in virtual transcript
2. EM: ML estimates of transcript frequencies
3. Compute expected exon frequencies
4. Update weights of reads in virtual transcript
5. EM
6. Virtual transcript frequency change > ε?
   – YES: repeat with the new ML estimates
   – NO: output overexpressed exons

Page 24: CRC Project on  Robust Transcript Discovery and Quantification from Sequencing Data

Discovery and Reconstruction of Unannotated Transcripts (DRUT)

Page 25: CRC Project on  Robust Transcript Discovery and Quantification from Sequencing Data

Experimental results

Page 26: CRC Project on  Robust Transcript Discovery and Quantification from Sequencing Data

Outline

1. IsoEM plugin for accurate prediction of transcription level
2. Results on ION data
3. Comparison to other tools and platforms
4. Progress on transcript reconstruction
5. Feedback on plugin development and deployment on VM

Page 27: CRC Project on  Robust Transcript Discovery and Quantification from Sequencing Data

What we would have liked…

• Available RNA-Seq runs on the demo VM
• Documentation for XML tag sets used in instance.html
• Installed plugins that run without errors on demo data
• Remote access to production Torrent server dedicated to plugin developers
  – Running tests on VM is very slow

Page 28: CRC Project on  Robust Transcript Discovery and Quantification from Sequencing Data

Work in progress for IsoEM plugin v.2

– More genomes/transcript libraries
– Expose more options to the user
  • Read mapping algorithm
  • Filtering of local alignments
  • Hexamer bias correction
  • Quality scores
  • Inference of fragment length distribution (for PE data)
– Bias correction using ERCC
– Inference of allele specific isoform expression (requires RNA-Seq SNP calling plugin)