71
Annotation and differential analysis of alternative splicing using de novo assembly of RNAseq data Clara Benoit-Pilven Team ERABLE, LBBE, Lyon Team GENDEV, CRNL, Lyon ______ 31 March 2017

Annotation and differential analysis of alternative …...Annotation and differential analysis of alternative splicing using de novo assembly of RNAseq dataClara Benoit-Pilven Team

  • Upload
    others

  • View
    20

  • Download
    0

Embed Size (px)

Citation preview

Annotation and differential analysis of alternative splicing using

de novo assembly of RNAseq data

Clara Benoit-Pilven Team ERABLE, LBBE, Lyon Team GENDEV, CRNL, Lyon

______

31 March 2017

Alternative SplicingIntroduction 1

exonintron

gene

transcription

pre-mRNA

mRNA

splicing

protein

translation

Alternative Splicing1

exonintron

gene

transcription

pre-mRNA

mRNA

alternative splicing

protein

translation

Introduction

Alternative Splicing1

exonintron

gene

transcription

pre-mRNA

mRNA

alternative splicing

protein

translation

Introduction

PrematureSTOP Codon

degradation by NMD (Nonsense Mediated Decay)

Alternative Splicing1

exonintron

gene

transcription

pre-mRNA

mRNA

alternative splicing

protein

translation

à Concerns more than 95% of multi-exons human genes

à Deregulation of AS involved in many diseases (like cancer)

Introduction

PrématureSTOP Codon

dégradation par le NMD

(Nonsense Mediated Decay)

Assembly-first and Mapping first approaches 2

adapted from Martin et Wang, Nat Rev Genet 2011

RNA-seq reads

Mapping-first approach

Align readsto the

genome

Alternative splicing event

Introduction

Assembly-first and Mapping first approaches

adapted from Martin et Wang, Nat Rev Genet 2011

RNA-seq reads

Assembly-first approach

De-novo assembly

Align readsto the

genome

Aligncontigs to

the genome

Alternative splicing event

Introduction

Mapping-first approach

2

Assembly-first and Mapping first approaches

adapted from Martin et Wang, Nat Rev Genet 2011

RNA-seq reads

Assembly-first approach

De-novo assembly

Align readsto the

genome

Aligncontigs to

the genome

Alternative splicing event

Introduction

Mapping-first approach

2

Local methods :- MISO- DEXSeq- MATS

Global methods :- Cufflinks- String-Tie- Flip-Flop- Scripture

Local methods :- KisSplice

Global methods :- Trinity- Trans-ABySS

- OASES

3

How much the predictions of these two approaches overlap ?

Introduction

INTERSECTION

Mapping-first

Assembly-first

Identify pros and cons of assembly-first and mapping-first methods

à Comparison done on alternative skipped exon (ASE) events only

Introduction

INTERSECTION

Mapping-first

Assembly-first

Identify pros and cons of assembly-first and mapping-first methods

How much the predictions of these two approaches overlap ?

3

à Comparison done on alternative skipped exon (ASE) events only

à Public dataset (ENCODE) from neuroblastoma SK-N-SH cell line with or without retinoic acid (RA) treatment

How much the predictions of these two approaches overlap ?

Sk-n-sh cell lineSK-N-SH

RA treatmentduring 2 days

Differenciated Sk-n-sh cell lineSK-N-SH RA

Introduction

INTERSECTION

Mapping-first

Assembly-first

Identify pros and cons of assembly-first and mapping-first methods

3

FasterDB RNAseq pipelineMethods 4

Mapped reads

Reads

Mapping(TopHat2)

A BS

FasterDB RNAseq pipelineMethods

Mapped reads

Splicing event

Reads

Mapping(TopHat2)

Identification and annotation of splicing events

A BS

A BS

4

FasterDB RNAseq pipelineMethods

Mapped reads

Splicing event

Reads

Mapping(TopHat2)

Identification and annotation of splicing events

A BS

A BSA BS

150

10 10

Quantification

Quantified event

4

FasterDB RNAseq pipelineMethods

Mapped reads

Splicing event

Significant event

Reads

Mapping(TopHat2)

Identification and annotation of splicing events

A BS

A BS

A BS

150 / 50

10 / 100

Condition 1 Condition 2

10 / 100

A BS

150

10 10

Quantification

Differential analysis (KissDE)

Quantified event

4

FasterDB RNAseq pipelineMethods

Mapped reads

Splicing event

Significant event

Reads

Mapping(TopHat2)

Identification and annotation of splicing events

A BS

A BS

A BS

150 / 50

10 / 100

Condition 1 Condition 2

10 / 100

A BS

150

10 10

Quantification

Differential analysis (KissDE)

Quantified event

Condition 1

Ψ1 = 10

10 + 150= 6%

Condition 2

Ψ2 = 100

100 + 50= 67%

ΔΨ = Ψ2 - Ψ1 = 61 %

4

Differential analysis with KissDEl Count regression with negative binomial distribution l Generalized linear model (GLM)

Methods 5

mean gene expression

contribution of variant i

interaction termcontribution

of condition j

Lopez-Maestre, NAR, 2016

Differential analysis with KissDEl Count regression with negative binomial distribution l Generalized linear model (GLM)

l Target hypothesis

l Likelihood ratio test l Multiple test correction (Benjamini-hochberg procedure)

Methods 5

mean gene expression

contribution of variant i

interaction termcontribution

of condition j

Lopez-Maestre, NAR, 2016

Differential analysis with KissDEl Count regression with negative binomial distribution l Generalized linear model (GLM)

l Target hypothesis

l Likelihood ratio test l Multiple test correction (Benjamini-hochberg procedure)l Significant variants : P-value adjusted < 0,05 et ΔΨ ≥ 10%

Methods 5

mean gene expression

contribution of variant i

interaction termcontribution

of condition j

Lopez-Maestre, NAR, 2016

KisSplice pipelineMethods 6

Assembly, events identification and quantification (KisSplice)

Bubble in thede Bruijn Graph

Reads

http://kissplice.prabi.fr/

Assembly with KisSplice7

ATTCAATGGTAGCTATCTAT

AGTTGTATTCATAGCTATCTGTATTCATAGCTATCTATTA

CAATGGTAGCTATCTATTACTTCATAGCTATCTATTACCA

TTGTATTCAATGGTAGCTAT

Gene structure and sequence : …GTATTCA TAGCTAT…ATGG

Methods

7

ATTCAATGGTAGCTATCTAT

AGTTGTATTCATAGCTATCTGTATTCATAGCTATCTATTA

CAATGGTAGCTATCTATTACTTCATAGCTATCTATTACCA

TTGTATTCAATGGTAGCTAT

…GTATTCA TAGCTAT…ATGGGene structure and sequence :

Methods

Assembly with KisSplice

7

ATTCAATGGTAGCTATCTATATTC

TTCATCAA

CAAT...

GTATTCATAGCTATCTATTAGTAT

TATTATTC

TTCA...

List all k-mers for a chosen value of k

…GTATTCA TAGCTAT…ATGG

ATTCAATGGTAGCTATCTAT

AGTTGTATTCATAGCTATCTGTATTCATAGCTATCTATTA

CAATGGTAGCTATCTATTACTTCATAGCTATCTATTACCA

TTGTATTCAATGGTAGCTAT

Methods

Gene structure and sequence :

Assembly with KisSplice

7

Construct the de Bruijn graphfor that set of k-mers

…GTATTCA TAGCTAT…ATGG

ATTC TTCA AGCT

TCAT CATA ATAG

TAGC

TCAA CAAT AATG ATGG TGGT GGTA GTAG

TATTGTAT GCTA CTAT

Methods

Gene structure and sequence :

ATTCAATGGTAGCTATCTAT

AGTTGTATTCATAGCTATCTGTATTCATAGCTATCTATTA

CAATGGTAGCTATCTATTACTTCATAGCTATCTATTACCA

TTGTATTCAATGGTAGCTAT

ATTCAATGGTAGCTATCTATATTC

TTCATCAA

CAAT...

GTATTCATAGCTATCTATTAGTAT

TATTATTC

TTCA...

List all k-mers for a chosen value of k

Assembly with KisSplice

ATTC TTCA AGCT

TCAT CATA ATAG

TAGC

TCAA CAAT AATG ATGG TGGT GGTA GTAG

TATTGTAT GCTA CTAT

7

…GTATTCA TAGCTAT…ATGG

Methods

Gene structure and sequence :

Construct the de Bruijn graphfor that set of k-mers

ATTCAATGGTAGCTATCTAT

AGTTGTATTCATAGCTATCTGTATTCATAGCTATCTATTA

CAATGGTAGCTATCTATTACTTCATAGCTATCTATTACCA

TTGTATTCAATGGTAGCTAT

ATTCAATGGTAGCTATCTATATTC

TTCATCAA

CAAT...

GTATTCATAGCTATCTATTAGTAT

TATTATTC

TTCA...

List all k-mers for a chosen value of k

Assembly with KisSplice

ATTC TTCA AGCT

TCAT CATA ATAG

TAGC

TCAA CAAT AATG ATGG TGGT GGTA GTAG

TATTGTAT GCTA CTAT

7

…GTATTCA TAGCTAT…ATGG

Methods

Gene structure and sequence :

Construct the de Bruijn graphfor that set of k-mers

ATTCAATGGTAGCTATCTAT

AGTTGTATTCATAGCTATCTGTATTCATAGCTATCTATTA

CAATGGTAGCTATCTATTACTTCATAGCTATCTATTACCA

TTGTATTCAATGGTAGCTAT

ATTCAATGGTAGCTATCTATATTC

TTCATCAA

CAAT...

GTATTCATAGCTATCTATTAGTAT

TATTATTC

TTCA...

List all k-mers for a chosen value of k

Assembly with KisSplice

7

ATTC TTCA AGCT

TCAT CATA ATAG

TAGC

TCAA CAAT AATG ATGG TGGT GGTA GTAG

TATTGTAT GCTA CTAT

…GTATTCA TAGCTAT…ATGG

Methods

Gene structure and sequence :

Construct the de Bruijn graphfor that set of k-mers

ATTCAATGGTAGCTATCTAT

AGTTGTATTCATAGCTATCTGTATTCATAGCTATCTATTA

CAATGGTAGCTATCTATTACTTCATAGCTATCTATTACCA

TTGTATTCAATGGTAGCTAT

ATTCAATGGTAGCTATCTATATTC

TTCATCAA

CAAT...

GTATTCATAGCTATCTATTAGTAT

TATTATTC

TTCA...

List all k-mers for a chosen value of k

Assembly with KisSplice

All bubbles don’t come fromalternative splicing events

8

→ Sequencing errors : filter on edges relative minimum coverage (default : 5 %)

low expression high expression

Methods

8

low expression high expression

→ Repeats : filter on maximum number of ‟branching” nodes (default : 5)

R1

R2

Methods

→ Sequencing errors : filter on edges relative minimum coverage (default : 5 %)

All bubbles don’t come fromalternative splicing events

8

R1

R2

R4

R3

R5

R6

R7

R8

Methods

→ Sequencing errors : filter on edges relative minimum coverage (default : 5 %)

→ Repeats : filter on maximum number of ‟branching” nodes (default : 5)

All bubbles don’t come fromalternative splicing events

low expression high expression

KisSplice pipelineMethods 9

Assembly, events identification and quantification (KisSplice)

Bubble in thede Bruijn Graph

Reads

A BS

Mapping (STAR)

Mapped event

http://kissplice.prabi.fr/

KisSplice pipelineMethods

Assembly, events identification and quantification (KisSplice)

Bubble in thede Bruijn Graph

Reads

A BS

Mapping (STAR)

Mapped event A BS

150

10 10

Annotated and quantified event

Event annotation (Kiss2RefGenome)

http://kissplice.prabi.fr/

9

KisSplice2RefGenomeMethods 10

Exonskipping

Number ofalignment

blocks

3

2

SS SS SS SS

KisSplice2RefGenomeMethods 10

Exonskipping

Number ofalignment

blocks

3

2

SS SS SS SS

SS SS SS

Alternativedonor

2

2

SS SS SS

Alternativeacceptor

2

2

Intronretention

1

2

SS SS

> threshold

Multipleexon skipping

>3

2

Number ofalignment

blocks

SS SS SS SSSS SS

Deletion1

2≤ threshold

Insertion2

1

KisSplice pipelineMethods

Assembly, events identification and quantification (KisSplice)

Bubble in thede Bruijn Graph

Reads

A BS

Mapping (STAR)

Mapped event

Significant eventA BS

150 / 50

10 / 100

Condition 1 Condition 2

10 / 100

A BS

150

10 10

Differential analysis (KissDE)

Annotated and quantified event

Event annotation (Kiss2RefGenome)

Condition 1

Ψ1 = 10

10 + 150= 6%

Condition 2

Ψ2 = 100

100 + 50= 67%

ΔΨ = Ψ2 - Ψ1 = 61 %

http://kissplice.prabi.fr/

11

12

Compared pipelines

Methods

http://kissplice.prabi.fr/

Strength of this comparison :

- Similar rules for annotation and quantification

- Same statistical package to carry out the differential analysis

Comparison done at 2 levels:

- Annotation - Differential analysis

13

9 91830 577 4 676

Results

KisSpliceAssembly-first

approach

FaRLineMapping-first

approach

Comparison at the annotation level

Mapping-first approach finds manyunfrequent variants

expr

essio

n of

the

min

oriso

form

Results

FaRLineonly

KisSpliceonly

common

KisSpliceAssembly-first

approach

FaRLineMapping-first

approach

10

1000

FaRLine only Common KisSplice only

9 91830 577 4 676

13

●●

●●

●●

●●

●●

●●●

●●

●●

●●●

●●●

●●●

●●

●●

●●●

●●

●●

●●

●●

●●●●●●●●●

●●●●●●

●●

●●●

●●

●●●

●●●

●●

●●

●●●●●●●●

●●

●●●●●

●●●●●●●●●

●●

●●●●

●●●●

●●●●●

●●

●●

●●●●●●●●●●

●●●●

●●

●●●

●●

●●

●●

●●●

●●●

10

1000

FaRLine only Common KisSplice onlyFaRLineonly

KisSpliceonly

common

expr

essio

n of

the

min

oriso

form

The overlap between methods increases whenunfrequent variants are filtered out

Results

KisSpliceAssembly-first

approach

5 7821 637 2 384

FaRLineMapping-first

approach

Unfrequent variant = less than 5 reads orrelative abundance < 10 %

14

●●

●●

●●

●●

●●

●●●

●●

●●

●●●

●●●

●●●

●●

●●

●●●

●●

●●

●●

●●

●●●●●●●●●

●●●●●●

●●

●●●

●●

●●●

●●●

●●

●●

●●●●●●●●

●●

●●●●●

●●●●●●●●●

●●

●●●●

●●●●

●●●●●

●●

●●

●●●●●●●●●●

●●●●

●●

●●●

●●

●●

●●

●●●

●●●

10

1000

FaRLine only Common KisSplice onlyFaRLineonly

KisSpliceonly

common

expr

essio

n of

the

min

oriso

form

Strong candidates are however missed by each method

Results

KisSpliceAssembly-first

approach

FaRLineMapping-first

approach

Unfrequent variant = less than 5 reads orrelative abundance < 10 %

14

5 7821 637 2 384

Some events are systematically missed by one approach

Results

KisSpliceAssembly-first

approach

FaRLineMapping-first

approach

Repeats

Complexevents

Not annotated

Paralogs 65

1 620 338

762

15

Results

Events found only by KisSplice16

E8’

97

29 48

HIRA

→ New event

E8 E9E8’E8 E9

SK-N-SH RA

ψ = 0,28

Results

Events found only by KisSplice16

E8’

97

29 48

HIRA

→ New event

E8 E9E8’E8 E9

SK-N-SH RA

ψ = 0,28

- new alternative exon- new flanking exon- new association of exons

Results 17

Reference annotation

Incomplete annotation :exon not present in the annotation

Rich annotation :too many exons annotated

Annotations

Event

Results 17

Reference annotation

Rich annotation :too many exons annotated

Annotations

Event

How to choose the flanking exons ?

- the closest exons in the annotation- the most common flanking exons in the annotation (MISO)- all the possible flanking exons present in the annotation- all the possible flanking exons defined by the reads in the

RNAseq samples analyzed

Incomplete annotation :exon not present in the annotation

Results

Events found only by KisSplice17

E8’

97

29 48

HIRA

→ New event

E8 E9E8’E8 E9

SK-N-SH RA

ψ = 0,28

- new alternative exon- new flanking exon- new association of exons

RASA4 and RASA4B

→ Recent paralogs

Results

Events found only by KisSplice

168

329 412E18/E17

E17 E20E18

E17 E20

SK-N-SHψ = 0,69

18

E8’

97

29 48

HIRA

→ New event

E8 E9E8’E8 E9

SK-N-SH RA

ψ = 0,28

- new alternative exon- new flanking exon- new association of exons

Results

Events found only by FaRLine

→ Exons overlapping repeats

RAB5C164

29 77

ALU E1 E5E2

E1 E5

SK-N-SH RA

ψ = 0,24

19

Results

Events found only by FaRLine

→ Exons overlapping repeats

RAB5C164

29 77

ALU E1 E5E2

E1 E5

SK-N-SH RA

ψ = 0,24

19

Results

Events found only by FaRLine

→ Exons overlapping repeats

RAB5C164

29 77

ALU E1 E5E2

E1 E5

SK-N-SH RA

ψ = 0,24

19

Results

RAB5C

Events found only by FaRLine

164

29 77

ALU E1 E5E2

E1 E5

SK-N-SH RA

ψ = 0,24

→ Complex eventsRPAIN

Both

FaRLineonly

E4 E5 E6 E7

19

→ Exons overlapping repeats

Results

Annotation summary

18

→ Mapping-first approach is stronger for - rare variants - and exonised repeats (like ALU).

→ Assembly-first approach is stronger for - novel variants

- and recent paralogs.

Should I care about these differences ? Does it have an impact on my differential analysis ?

Results

Comparison after differential analysis

FaRLineMapping-first

approach

KisSpliceAssembly-first

approach

19

587287 522

Results

Comparison after differential analysis

FaRLineMapping-first

approach

KisSpliceAssembly-first

approach

ΔΨ (K

isSpl

ice)

ΔΨ (FaRLine)

y = −0.0061 + 0.99 ⋅ x , r2 = 0.944

−0.8

−0.4

0.0

0.4

−0.8 −0.4 0.0 0.4∆Ψ for the mapping−first method

19

587287 522

Complex events

Results

Comparison after differential analysis

FaRLineMapping-first

approach

KisSpliceAssembly-first

approach

ΔΨ (K

isSpl

ice)

ΔΨ (FaRLine)

y = −0.0061 + 0.99 ⋅ x , r2 = 0.944

−0.8

−0.4

0.0

0.4

−0.8 −0.4 0.0 0.4∆Ψ for the mapping−first method

19

587226 25261 270

Inherited fromthe annotation

Differences due to complex

events

Inherited fromthe annotation

Complex events

Results

Comparison to others methods (I)

4231

997

124 155

MISO -FaRLine -KisSplice

MISO -FaRLine

MISOMISO - KisSplice

20

Results

Comparison to others methods (I)20

5 2282 191 279

FaRLine MISO

4231

997

124 155

MISO -FaRLine -KisSplice

MISO -FaRLine

MISOMISO - KisSplice

Results

Comparison to others methods (II)

FaRLine

1 359 1036 060

Cufflinks

Mappingevent

Cufflinkstranscript

GTF2I

SK-N-SH

E12 E15E13

E12 E15

21

Results

Comparison to others methods (II)

KisSplice Trinity

3 855 9724 311

FaRLine

1 359 1036 060

Cufflinks

Mappingevent

Cufflinkstranscript

GTF2I

SK-N-SH

E12 E15E13

E12 E15

Trinitytranscript

KisSpliceevent

RFWD2

SK-N-SH

E8 E10E9

E8 E10

21

ConclusionAnnotating alternative splicing with a single approach leads to

missing a large number of candidates.These candidates cannot be neglected, since many of them are

differentially regulated across conditions.

Conclusion & perspectives 22

ConclusionAnnotating alternative splicing with a single approach leads to

missing a large number of candidates.These candidates cannot be neglected, since many of them are

differentially regulated across conditions.

We advocate for the use of a combination of both mapping-first and assembly-first approaches for annotation and differential

analysis of alternative splicing from RNA-seq data.

Conclusion & perspectives

Mapping-first

Assembly-first

UNION INTERSECTION

Mapping-first

Assembly-first

22

http://kissplice.prabi.fr/sknsh/http://biorxiv.org/content/early/2016/09/12/074807

Perspectives

Third generation sequencing to annotate splicing events

Conclusion & perspectives 23

è Long reads

è High error rateè Low throughtput

Advantage :

But :

AcknowledgmentsVincent LacroixLeandro Lima

Emilie ChautardCamille Marchet

Gustavo SacomotoDidier Auboeuf Cyril Bourgeois Amandine Rey

Marie-Pierre LambertLouis DulaurierSophie Terrone

Jean-Baptiste Claude

Thank you for your attention !

Comparison all vs all

KisSplice

FaRLine

MISO

CufflinksTrinity

Readmapping(TopHat)

Eventidentification & annotation

QuantificationDifferential

analysis(KissDE)

Knownannotations

New annotations

AnnotatedES event

Exon 1 Exon 2 Exon 3Intron 1 Intron 2

Exon 1 Exon 3Intron 1 Intron 2Exon 2

Junctionreads

2 reads 1 read

2 reads

2 reads

FasterDB RNAseq pipeline

KisSplice pipeline

Reads assembly & Event identification &

Quantification (KisSplice)

Event annotation(Kiss2RefGenome)

Differentialanalysis(KissDE)

Event mapping

(STAR)

ES event found by KisSplice(‘‘Bubble’’)

BA

S

Condition 1

Condition 2Exon(s)

ES event mapped onthe reference genome

Exon Intron

A

A S

Intron Exon Exon

B

B

PSI (Ψ) : Percent Spliced In

ΔΨ = Ψ1 - Ψ2

Ψ1 =inclusion1 + exclusion1

inclusion1 Ψ2 =inclusion2 + exclusion2

inclusion2

• Quantification : exon skipping example

A S B A S B

AS1 + SB1inclusion1 = 2

AS2 + SB2inclusion2 = 2

Condition 1 Condition 2

Event containing ALU elements

àMapping-first using a reference transcriptome

à Assembly-first

àMapping-first without using a reference transcriptome

Number of skipping of ALU exon annotated

TopHat with option --transcriptome-index

KisSplice

TopHat run default option

àMapping-first using a reference transcriptome

à Assembly-first

àMapping-first without using a reference transcriptome

TopHat with option --transcriptome-index

Results 18

Complex events

Results 18

Complex events