30
Manuel Tardaguila, PhD SQANTI: Classification, Curation and Quantification of a PacBio transcriptome Ana Conesa Lab

SQANTI: Classification, Curation and Quantification of a ...€¦ · Disance o reerence TTS ns Upstream 1 b 1 b 1 b 1 b 1 b 1 b Mean Transcri eresion TPM 0 1 2 30 40 50 Read Coerage

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: SQANTI: Classification, Curation and Quantification of a ...€¦ · Disance o reerence TTS ns Upstream 1 b 1 b 1 b 1 b 1 b 1 b Mean Transcri eresion TPM 0 1 2 30 40 50 Read Coerage

Manuel Tardaguila, PhD

SQANTI: Classification, Curation and Quantification of a PacBio transcriptome

Ana Conesa Lab

Page 2: SQANTI: Classification, Curation and Quantification of a ...€¦ · Disance o reerence TTS ns Upstream 1 b 1 b 1 b 1 b 1 b 1 b Mean Transcri eresion TPM 0 1 2 30 40 50 Read Coerage

PacBio in Oligodendrogenesis

Neural Stem Cells (neurospheres)

Oligodendrocyte Progenitor Cells

Immature Oligodendrocytes

Premyelinating mature Oligodendrocytes

Myelinated mature Oligodendrocytes

AAAAA

AAAAA

AAAAA

AAAAA

AAAAA

AAAAA

AAAAA

AAAAA

AAAAA

AAAAA

AAAAA

AAAAA

AAAAA

AAAAA

AAAAA AAAAA

AAAAA

AAAAA

AAAAA

AAAAA

AAAAA

AAAAAAAAAA

AAAAA

AAAAA

AAAAAIndels

SQANTI

PacBio RoIs ToFU pipeline

ToFU transcripts

RefSeq

Illumina Short reads

cDNA

CuratedTranscriptome

QC features + Machine Learning

Global Transcriptome Expressed Transcriptomes

cDNA

Mapping

CorrectedTranscriptome

Tardaguila et al SQANTI: extensive characterization of long read transcript sequences for quality control in full-length transcriptome identification and quantification Pre-print BioRxiv (2017)

Page 3: SQANTI: Classification, Curation and Quantification of a ...€¦ · Disance o reerence TTS ns Upstream 1 b 1 b 1 b 1 b 1 b 1 b Mean Transcri eresion TPM 0 1 2 30 40 50 Read Coerage

1. CLASSIFICATION OF PACBIO TRANSCRIPTS

Page 4: SQANTI: Classification, Curation and Quantification of a ...€¦ · Disance o reerence TTS ns Upstream 1 b 1 b 1 b 1 b 1 b 1 b Mean Transcri eresion TPM 0 1 2 30 40 50 Read Coerage

Splice-based classification (I): Known Transcripts

Full Splice Match (FSM)

Reference

PacBio

Reference

PacBio

Reference

PacBio

PacBio

Reference

Incomplete Splice Match (ISM)

Donor Acceptor

Splice Junction

n= 1

6,10

4

Page 5: SQANTI: Classification, Curation and Quantification of a ...€¦ · Disance o reerence TTS ns Upstream 1 b 1 b 1 b 1 b 1 b 1 b Mean Transcri eresion TPM 0 1 2 30 40 50 Read Coerage

Splice-based classification (II): Novel Transcripts from known genes

n= 1

6,10

4

Novel SJs with known Donors and acceptors

Reference

PacBio

Reference

PacBio

RetainedIntron

New combination of Known SJs

Novel In Catalog (NIC)

Novel Not in Catalog (NNC)

Novel SJs from novel Donors and/or Acceptors

Reference

PacBio

Page 6: SQANTI: Classification, Curation and Quantification of a ...€¦ · Disance o reerence TTS ns Upstream 1 b 1 b 1 b 1 b 1 b 1 b Mean Transcri eresion TPM 0 1 2 30 40 50 Read Coerage

Splice-based classification (III): Novel Transcripts from novel genes

n= 1

6,10

4

Intergenic

Fusion Transcripts

Gene A Gene B

PacBio

Genic Intron

PacBio

Exon A Exon BIntron A-B

Gene A Gene B

PacBio

Page 7: SQANTI: Classification, Curation and Quantification of a ...€¦ · Disance o reerence TTS ns Upstream 1 b 1 b 1 b 1 b 1 b 1 b Mean Transcri eresion TPM 0 1 2 30 40 50 Read Coerage

Splice-based classification (IV): Antisense and Genic Genomic transcripts

n= 1

6,10

4Gene A

Ref 1Ref 2PacBio

Antisense

Antisense

Genic Genomic

PacBio

Exon A Exon BIntron A-B

Page 8: SQANTI: Classification, Curation and Quantification of a ...€¦ · Disance o reerence TTS ns Upstream 1 b 1 b 1 b 1 b 1 b 1 b Mean Transcri eresion TPM 0 1 2 30 40 50 Read Coerage

CLASSIFICATION SUMMARY

• Splice based classification allows to separate known transcripts

(FSM and ISM) from novel transcripts arising from novel splice junctions

or new combinations of already known splice junctions (NIC and NNC)

• The predominant categories in our transcriptome are FSM,ISM, NIC

and NNC and they are mostly coding (oligodT library preparation)

Page 9: SQANTI: Classification, Curation and Quantification of a ...€¦ · Disance o reerence TTS ns Upstream 1 b 1 b 1 b 1 b 1 b 1 b Mean Transcri eresion TPM 0 1 2 30 40 50 Read Coerage

2. CURATION OF PACBIO TRANSCRIPTS

Page 10: SQANTI: Classification, Curation and Quantification of a ...€¦ · Disance o reerence TTS ns Upstream 1 b 1 b 1 b 1 b 1 b 1 b Mean Transcri eresion TPM 0 1 2 30 40 50 Read Coerage

40% novel isoforms in mouse... Are all of them real?

Page 11: SQANTI: Classification, Curation and Quantification of a ...€¦ · Disance o reerence TTS ns Upstream 1 b 1 b 1 b 1 b 1 b 1 b Mean Transcri eresion TPM 0 1 2 30 40 50 Read Coerage

PacBio output needs curation

• NNC transcripts characterized by novel Donors and/or Acceptors of splicing concentrate traits of low quality transcripts

Page 12: SQANTI: Classification, Curation and Quantification of a ...€¦ · Disance o reerence TTS ns Upstream 1 b 1 b 1 b 1 b 1 b 1 b Mean Transcri eresion TPM 0 1 2 30 40 50 Read Coerage

Using Short reads and Long reads to mine QC featuresB)

CorrectedTranscriptome

Page 13: SQANTI: Classification, Curation and Quantification of a ...€¦ · Disance o reerence TTS ns Upstream 1 b 1 b 1 b 1 b 1 b 1 b Mean Transcri eresion TPM 0 1 2 30 40 50 Read Coerage

Features selected for Random Forest Classificationbite

test set

as.fa

ctor

(test

ing$

bite

)

NEG POS

FALS

ETR

UE

0.0

0.2

0.4

0.6

0.8

1.0

NEG POS

25

1020

5010

020

050

020

0050

00

FL

test set

NEG POS

5e+0

15e

+02

5e+0

35e

+04

5e+0

5

geneExp

test set

NEG POS

110

010

000

isoExp

test set

NEG POS

12

510

20

Min_sample_cov

test set

NEG POS1

100

1000

0

MinCov

test set

NEG POS

2050

100

200

500

1000

2000

5000

MinCovPos

test set

NEG POS

12

510

20

nIndels

test set

NEG POS

1.0

1.5

2.0

2.5

3.0

3.5

4.0

nIndelsJunc

test set

NEG POS

1.0

1.2

1.4

1.6

1.8

2.0

ratioExp

test set

NEG POS

110

100

1000

1000

0

sdCov

test set

FSM_class

test set

test

ing[

, i]

NEG POS

AB

C

0.0

0.2

0.4

0.6

0.8

1.0

NEG POS

510

2050

exons

test set

NEG POS

1000

2000

3000

4000

5000

7000

length

test set

Page 14: SQANTI: Classification, Curation and Quantification of a ...€¦ · Disance o reerence TTS ns Upstream 1 b 1 b 1 b 1 b 1 b 1 b Mean Transcri eresion TPM 0 1 2 30 40 50 Read Coerage

PCR validation in an independent set of transcripts

PB.5557.1NNC isoform

Pgam5

463 pb*500 pb

+-

42

-+42

-+50

oligodTR.H.

Temp (°C)

* 895 pb

PB.4207.4NNC isoform

Trp53inp2

500 pb

1000 pb

+-

42

-+42

-+50

oligodTR.H.

Temp (°C)

PB.3658.3NNC isoform

Nolc1

729 pb500 pb

1000 pb

+-

42

-+42

-+50

oligodTR.H.

Temp (°C)

olig

o (d

T) P

CR

R.H

. PC

R

Positive

Negative

Total

Positive

Negative

Total

23 (3nc)

0

23

FSMNIC NNC

Fusion

7

2

9

8 (3 nc)

12 (8 nc)

20

1

2

3

5 (3nc)

0

5

7

0

7

2

6 (3nc)

8

1

0

1

Figure: First line: figure on the complete dataset. Second line: on the all mouse data; application of the classifier on all categories except FSM and ISM. Isoform distribution and quality control, threshold 0.5 . 1.2) Results on the PCR set:

Figure 3: ROC curve obtained on the PCR set. The red curve was obtained on the complete PCR set, the dark red curve on the PCR set reduced to the novel transcripts.

Figure 4: boxplot of the values of the random forest probabilities to be positive obtained on the PCR sets (red for the PCR negative transcripts, green for the PCR positive transcripts).

The area under roc obtained on the PCR sets is comprise between 96.66 (if all the type of sequences are evaluated) and 95.56 if the classifier is only applied to the novels.

Page 15: SQANTI: Classification, Curation and Quantification of a ...€¦ · Disance o reerence TTS ns Upstream 1 b 1 b 1 b 1 b 1 b 1 b Mean Transcri eresion TPM 0 1 2 30 40 50 Read Coerage

Splice Junction independent curation: Intrapriming features

• oligodT can prime outside polyA regions in A rich regions inside transcripts• We looked for transcripts showing >= 80% Adenines in the 20 nts downstream (DNA) the end of thetranscript• If this transcripts lack a consensus poly Adenilation site we filter them out • 605 transcripts filtered out

FSM

% o

f in

trapr

imin

g Fi

ltere

d ou

t in

each

cat

egor

y

0

20

40

60

80

100

ISM NICNNC

Genic

Genom

ic

Antise

nse

Fusion

Genic

Intron

Interg

enic

Coding

Non Coding

n=35 n=81 n=135 n=17 n=108 n=17 n=210

Page 16: SQANTI: Classification, Curation and Quantification of a ...€¦ · Disance o reerence TTS ns Upstream 1 b 1 b 1 b 1 b 1 b 1 b Mean Transcri eresion TPM 0 1 2 30 40 50 Read Coerage

SQANTI filter eliminates transcript with bad QC featuresn=

12,

908

Page 17: SQANTI: Classification, Curation and Quantification of a ...€¦ · Disance o reerence TTS ns Upstream 1 b 1 b 1 b 1 b 1 b 1 b Mean Transcri eresion TPM 0 1 2 30 40 50 Read Coerage

SQANTI filter works across different PacBio datasets: Maize

• Before SQANTI

• AFTER SQANTI

Page 18: SQANTI: Classification, Curation and Quantification of a ...€¦ · Disance o reerence TTS ns Upstream 1 b 1 b 1 b 1 b 1 b 1 b Mean Transcri eresion TPM 0 1 2 30 40 50 Read Coerage

SQANTI filter works across different PacBio datasets: MCF-7

• Before SQANTI

• AFTER SQANTI

Page 19: SQANTI: Classification, Curation and Quantification of a ...€¦ · Disance o reerence TTS ns Upstream 1 b 1 b 1 b 1 b 1 b 1 b Mean Transcri eresion TPM 0 1 2 30 40 50 Read Coerage

CURATION SUMMARY

• PacBio output needs curation, specially in NNC novel acceptor/

donnor transcripts

• SQANTI mines features based in splice junctions, expression and

structure to feed a RF classifier that succeeds in filtering out transcripts

that fail to validate in an independent PCR set

• SQANTI works across different datasets

Page 20: SQANTI: Classification, Curation and Quantification of a ...€¦ · Disance o reerence TTS ns Upstream 1 b 1 b 1 b 1 b 1 b 1 b Mean Transcri eresion TPM 0 1 2 30 40 50 Read Coerage

3. QUANTIFICATION OF PACBIO TRANSCRIPTOME

Page 21: SQANTI: Classification, Curation and Quantification of a ...€¦ · Disance o reerence TTS ns Upstream 1 b 1 b 1 b 1 b 1 b 1 b Mean Transcri eresion TPM 0 1 2 30 40 50 Read Coerage

Quantification: PacBio curated vs RefSeq: Transcripts

21934 36748782

6408

239

6845

6,8456,408

Tran

scrip

t ex

pres

ion

(log(

TPM

))

0.0

2.5

5.0

7.5

10.0

Found by both PacBio Exclusive

OLP

repl

icat

e 1

R2 = 0.39

PacBio

R2 = 0.89

R2 = 0.99

OLP replicate 2

High Expn=6,357

Medium Expn=5,677

Low Expn=422

RefSeq

OLP

repl

icat

e 1

OLP replicate 2

R2 = 0.3R2 = 0.88

R2 = 0.99

Found by both RefSeq Exclusive

High Expn=7,440

Medium Expn=7,440

Low Expn=8,379

% o

f tot

al G

enes

0

20

40

60

80

100

12-34-5>= 6

Isoforms per gene

0

20

40

60

80

100

% o

f tot

al G

enes

ENSEMBL

NIC: n

ew co

mbinati

on of

know

n SJNIC

: mon

oexo

n by I

R

NIC: n

ovel

SJ of k

nown D

/ANNC

antis

ense

fusion

genic

geno

mic

genic

intro

n

interg

enic

% o

f nov

el tr

ansc

ripts

iPac

Bio

Excl

usiv

e

0

10

20

30

40

50

603,674 PacBio exclusive transcripts

12-34-5>= 6

Isoforms per gene

PacBio

RefSeq

239

0.0

2.5

5.0

7.5

10.0

Tran

scrip

t ex

pres

ion

(log(

TPM

))8,78221,934 3,674

Found by both PacBio Exclusive

High Expn=2,729

Medium Expn=4,181

Low Expn=174

Found by both RefSeq Exclusive

High Expn=3,330

Medium Expn=6,589

Low Expn=3,315

OLP

repl

icat

e 1

R2 = 0.92

PacBio

R2 = 0.99

R2 = 0.99

OLP replicate 2

RefSeq

OLP

repl

icat

e 1

OLP replicate 2

R2 = 0.93

R2 = 0.98

R2 = 0.99

• PacBio finds a set of highly expressed transcripts and reproducible transcripts in common with RefSeq• RefSeq detects lots of lowly expressed, difficult to reproduce transcripts. However it also captures exclusively 30% of highly expressed ones (PacBio 18%).

Page 22: SQANTI: Classification, Curation and Quantification of a ...€¦ · Disance o reerence TTS ns Upstream 1 b 1 b 1 b 1 b 1 b 1 b Mean Transcri eresion TPM 0 1 2 30 40 50 Read Coerage

Most of the PacBio exclusive transcripts are new combinations of already known Splice Junction

21934 36748782

6408

239

6845

6,8456,408

Tran

scrip

t ex

pres

ion

(log(

TPM

))

0.0

2.5

5.0

7.5

10.0

Found by both PacBio Exclusive

OLP

repl

icat

e 1

R2 = 0.39

PacBio

R2 = 0.89

R2 = 0.99

OLP replicate 2

High Expn=6,357

Medium Expn=5,677

Low Expn=422

RefSeq

OLP

repl

icat

e 1

OLP replicate 2

R2 = 0.3R2 = 0.88

R2 = 0.99

Found by both RefSeq Exclusive

High Expn=7,440

Medium Expn=7,440

Low Expn=8,379

% o

f tot

al G

enes

0

20

40

60

80

100

12-34-5>= 6

Isoforms per gene

0

20

40

60

80

100

% o

f tot

al G

enes

ENSEMBL

NIC: n

ew co

mbinati

on of

know

n SJNIC

: mon

oexo

n by I

R

NIC: n

ovel

SJ of k

nown D

/ANNC

antis

ense

fusion

genic

geno

mic

genic

intro

n

interg

enic

% o

f nov

el tr

ansc

ripts

iPac

Bio

Excl

usiv

e

0

10

20

30

40

50

603,674 PacBio exclusive transcripts

12-34-5>= 6

Isoforms per gene

PacBio

RefSeq

239

0.0

2.5

5.0

7.5

10.0

Tran

scrip

t ex

pres

ion

(log(

TPM

))

8,78221,934 3,674

Found by both PacBio Exclusive

High Expn=2,729

Medium Expn=4,181

Low Expn=174

Found by both RefSeq Exclusive

High Expn=3,330

Medium Expn=6,589

Low Expn=3,315

OLP

repl

icat

e 1

R2 = 0.92

PacBio

R2 = 0.99

R2 = 0.99

OLP replicate 2

RefSeq

OLP

repl

icat

e 1

OLP replicate 2

R2 = 0.93

R2 = 0.98

R2 = 0.99

Page 23: SQANTI: Classification, Curation and Quantification of a ...€¦ · Disance o reerence TTS ns Upstream 1 b 1 b 1 b 1 b 1 b 1 b Mean Transcri eresion TPM 0 1 2 30 40 50 Read Coerage

Quantification: PacBio curated vs RefSeq: Genes

21934 36748782

6408

239

6845

6,8456,408Tr

ansc

ript

expr

esio

n (lo

g(TP

M))

0.0

2.5

5.0

7.5

10.0

Found by both PacBio Exclusive

OLP

repl

icat

e 1

R2 = 0.39

PacBio

R2 = 0.89

R2 = 0.99

OLP replicate 2

High Expn=6,357

Medium Expn=5,677

Low Expn=422

RefSeq

OLP

repl

icat

e 1

OLP replicate 2

R2 = 0.3R2 = 0.88

R2 = 0.99

Found by both RefSeq Exclusive

High Expn=7,440

Medium Expn=7,440

Low Expn=8,379

% o

f tot

al G

enes

0

20

40

60

80

100

12-34-5>= 6

Isoforms per gene

0

20

40

60

80

100

% o

f tot

al G

enes

ENSEMBL

NIC: n

ew co

mbinati

on of

know

n SJNIC

: mon

oexo

n by I

R

NIC: n

ovel

SJ of k

nown D

/ANNC

antis

ense

fusion

genic

geno

mic

genic

intro

n

interg

enic

% o

f nov

el tr

ansc

ripts

iPac

Bio

Excl

usiv

e

0

10

20

30

40

50

603,674 PacBio exclusive transcripts

12-34-5>= 6

Isoforms per gene

PacBio

RefSeq

239

0.0

2.5

5.0

7.5

10.0

Tran

scrip

t ex

pres

ion

(log(

TPM

))

8,78221,934 3,674

Found by both PacBio Exclusive

High Expn=2,729

Medium Expn=4,181

Low Expn=174

Found by both RefSeq Exclusive

High Expn=3,330

Medium Expn=6,589

Low Expn=3,315

OLP

repl

icat

e 1

R2 = 0.92

PacBio

R2 = 0.99

R2 = 0.99

OLP replicate 2

RefSeq

OLP

repl

icat

e 1

OLP replicate 2

R2 = 0.93

R2 = 0.98

R2 = 0.99

• RefSeq detects lots of lowly expressed genes. However it captures an important fraction of highly expressed ones 19% (PacBio 1.5%)

Page 24: SQANTI: Classification, Curation and Quantification of a ...€¦ · Disance o reerence TTS ns Upstream 1 b 1 b 1 b 1 b 1 b 1 b Mean Transcri eresion TPM 0 1 2 30 40 50 Read Coerage

Imposing a higher EXP threshold highlights advantages and disadvantages of PacBio quantification

5101 21117425

2110 59258385,8382,110

Tran

scrip

t ex

pres

ion

(log(

TPM

))

0.0

2.5

5.0

7.5

10.0

Found by both PacBio Exclusive

PacBio

OLP

repl

icat

e 1

R2 = -0.01R2 = 0.89

R2 = 0.99

OLP replicate 2

High Expn=6,350

Medium Expn=3,167

Low Expn=19

RefSeq

OLP

repl

icat

e 1

OLP replicate 2

R2 = 0.22R2 = 0.87

R2 = 0.99

Found by both RefSeq Exclusive

High Expn=7,339

Medium Expn=5,109

Low Expn=78

% o

f tot

al G

enes

12-34-5>= 6

Isoforms per gene

% o

f tot

al G

enes

12-34-5>= 6

Isoforms per gene

PacBio

RefSeq

592

0.0

2.5

5.0

7.5

10.0

Tran

scrip

t ex

pres

ion

(log(

TPM

))

7,4255,101 2,111

Found by both PacBio Exclusive

High Expn=2,698

Medium Expn=3,725

Low Expn=7

Found by both RefSeq Exclusive

High Expn=3,268

Medium Expn=4,488

Low Expn=186

OLP

repl

icat

e 1

R2 = 0.98

PacBio

R2 = 0.98

R2 = 0.99

OLP replicate 2

RefSeq

OLP

repl

icat

e 1

OLP replicate 2

R2 = 0.95

R2 = 0.98

R2 = 0.99

0

20

40

60

80

100

0

20

40

60

80

100

0

10

20

30

40

50

60

ENSEMBL

NIC: n

ew co

mbinati

on of

know

n SJNIC

: mon

oexo

n by I

R

NIC: n

ovel

SJ of k

nown D

/ANNC

antis

ense

fusion

genic

geno

mic

genic

intro

n

interg

enic

% o

f nov

el tr

ansc

ripts

iPac

Bio

Excl

usiv

e

2,111 PacBio exclusive transcripts

Page 25: SQANTI: Classification, Curation and Quantification of a ...€¦ · Disance o reerence TTS ns Upstream 1 b 1 b 1 b 1 b 1 b 1 b Mean Transcri eresion TPM 0 1 2 30 40 50 Read Coerage

Analysis of Most Expressed Transcript (MET) reveal unaccounted 3’ end variability that is captured by PacBio

1000

500

0

500

1000

Dow

nstre

amU

pstre

am

Dis

tanc

e to

refe

renc

e T

TS (n

ts)

1000 pb1500 pb

1000 pb

1500 pb

1000 pb

1500 pb

Mean Transcript expresion (TPM)0 10 20 30 40 50

Read Coverage

Dhrs7b - Dehydrogenase / reductase (SDR family) member 7B

NM_145428 /PB.1035.1

NM_001172112 /PB.1035.2

XM_006532869

Same M

ET

Differe

nt MET

PB.1035.1

XM_006532869

NM_145428

PB.1035.2

NM_001172112

***

ExR

GlRLo

wes

t SJ

cove

rage

by

sho

rt re

ads

***ns

Same M

ET

Differe

nt MET

ExR

GlR

Same M

ET

Differe

nt MET

***ns

Low

est m

ean

cove

rage

of

an

exon

0

0

500

1000

1500

2000

0

500

1000

1500

2000

Page 26: SQANTI: Classification, Curation and Quantification of a ...€¦ · Disance o reerence TTS ns Upstream 1 b 1 b 1 b 1 b 1 b 1 b Mean Transcri eresion TPM 0 1 2 30 40 50 Read Coerage

CUANTIFICATION SUMMARY

• PacBio captures a robust fraction of transcripts and genes being

expressed and at the same time allow for novel discovery.

• RefSeq captures more transcripts and genes, many lowly expressed

and hardly reproducible ones

• However, PacBio TOFU fails to capture many highly expressed

genes detected by RefSeq.

• PacBio transcriptome reveals unaccounted for 3’ end variability in

known transcripts that hamper RefSeq quantification.

Tardaguila et al SQANTI: extensive characterization of long read transcript sequences for quality control in full-length transcriptome identification and quantification Pre-print BioRxiv (2017)

Page 27: SQANTI: Classification, Curation and Quantification of a ...€¦ · Disance o reerence TTS ns Upstream 1 b 1 b 1 b 1 b 1 b 1 b Mean Transcri eresion TPM 0 1 2 30 40 50 Read Coerage

4. Functional outcome of alternative splicing: Transcript2GO

Page 28: SQANTI: Classification, Curation and Quantification of a ...€¦ · Disance o reerence TTS ns Upstream 1 b 1 b 1 b 1 b 1 b 1 b Mean Transcri eresion TPM 0 1 2 30 40 50 Read Coerage

T2GO combines SQANTI classification with Genomic, transcript and protein annotation to maximize analytical possibilities

Transcript2GO: assessing the functional outcome of alternative splicingmanuscript in preparation

Page 29: SQANTI: Classification, Curation and Quantification of a ...€¦ · Disance o reerence TTS ns Upstream 1 b 1 b 1 b 1 b 1 b 1 b Mean Transcri eresion TPM 0 1 2 30 40 50 Read Coerage

Diferential splicing linked to the appearance of sequence motifs on a transcriptome wide scale

NPC1 NPC2 OPC1 OPC2 MM

INPUT

NPC1 NPC2 OPC1 OPC2 M

Nuclear Fraction

NPC1 NPC2 OPC1OPC2

Cytosolic Fraction

GAPDH

H3-AcORF collapse

ns

***

***

Transcript2GO: assessing the functional outcome of alternative splicingmanuscript in preparation

Page 30: SQANTI: Classification, Curation and Quantification of a ...€¦ · Disance o reerence TTS ns Upstream 1 b 1 b 1 b 1 b 1 b 1 b Mean Transcri eresion TPM 0 1 2 30 40 50 Read Coerage

Thanks!