31
Scoring transcript variation in single cell RNA-seq data Xiuwei Zhang

Scoring transcript variation in single cell RNA-seq data · 2020. 1. 3. · ES-cell1 ES-cell2 …... ES-cellm NPC-cell1 NPC-cell2 … NPC-celln E S-c e l l 1 E S-c e l l 2 …

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Scoring transcript variation in single cell RNA-seq data · 2020. 1. 3. · ES-cell1 ES-cell2 …... ES-cellm NPC-cell1 NPC-cell2 … NPC-celln E S-c e l l 1 E S-c e l l 2 …

Scoring transcript variation in

single cell RNA-seq data

Xiuwei Zhang

Page 2: Scoring transcript variation in single cell RNA-seq data · 2020. 1. 3. · ES-cell1 ES-cell2 …... ES-cellm NPC-cell1 NPC-cell2 … NPC-celln E S-c e l l 1 E S-c e l l 2 …

Single cell RNA-seq provides data at cellular resolution

Single Cell0

2

4

6

8

10

12

Cell 1Cell 2Cell 3Cell 4Cell 5

Bulk result0

0.51

1.52

2.53

3.54

4.5

Gene e

xpre

ssio

n level

Gene e

xpre

ssio

n level

Cell1 Cell2 Cell3 Cell4 Cell5

~100,000 cells

Sin

gle

-cell

RN

A-S

eq

Bulk

RN

A-S

eq

Page 3: Scoring transcript variation in single cell RNA-seq data · 2020. 1. 3. · ES-cell1 ES-cell2 …... ES-cellm NPC-cell1 NPC-cell2 … NPC-celln E S-c e l l 1 E S-c e l l 2 …

Read c

overa

ge in s

ing

le c

ells

In bulk

Single cell RNA-seq also shows variation in read coverage profiles

Single cell RNA-seq provides data at cellular resolutionSin

gle

-cell

RN

A-S

eq

Bu

lk R

NA

-Seq

Page 4: Scoring transcript variation in single cell RNA-seq data · 2020. 1. 3. · ES-cell1 ES-cell2 …... ES-cellm NPC-cell1 NPC-cell2 … NPC-celln E S-c e l l 1 E S-c e l l 2 …

Background

Traditional bulk RNA-seq tools for single cell data

● Calculating exon/intron inclusion scores:

– MISO (Katz et al. 2010) used in Shalek et al. 2013high isoform variation, bimodal distribution of PSI scores

– Bam2ssj (Pervouchine et al. 2013) used in Marinov et al, 2014 number of isoforms for one gene in a cell

● Find novel splice junctions

Global analysis of profile variation across single cellsSources? patterns? sub-populations?

Page 5: Scoring transcript variation in single cell RNA-seq data · 2020. 1. 3. · ES-cell1 ES-cell2 …... ES-cellm NPC-cell1 NPC-cell2 … NPC-celln E S-c e l l 1 E S-c e l l 2 …

Outline

● Method

– Profile Variation (PV) score

● Benchmarking and thresholding

– Various data sets

– Various gene categories and exons

– Compare with bulk RNA-seq

● Applications

– Genes with high isoform variation

– Patterns in isoform usage

– Genes which switch isoforms

Page 6: Scoring transcript variation in single cell RNA-seq data · 2020. 1. 3. · ES-cell1 ES-cell2 …... ES-cellm NPC-cell1 NPC-cell2 … NPC-celln E S-c e l l 1 E S-c e l l 2 …

Profile variation (PV) score

Num

ber

of

reads

(covera

ge)

Num

ber

of

reads

(covera

ge)

Gene g in Cell s

Genomic location 5' → 3'

Gene g in Cell t

Genomic location 5' → 3'

Page 7: Scoring transcript variation in single cell RNA-seq data · 2020. 1. 3. · ES-cell1 ES-cell2 …... ES-cellm NPC-cell1 NPC-cell2 … NPC-celln E S-c e l l 1 E S-c e l l 2 …

Probability distribution P1

Probability distribution P2

P1(i)

i

P2(i)

i

Profile variation (PV) score

Jensen-Shannon Divergence (JSD)

Page 8: Scoring transcript variation in single cell RNA-seq data · 2020. 1. 3. · ES-cell1 ES-cell2 …... ES-cellm NPC-cell1 NPC-cell2 … NPC-celln E S-c e l l 1 E S-c e l l 2 …

H( ): entropy

increases with the number of categories in a discrete probability distribution.

Profile variation (PV) score

Page 9: Scoring transcript variation in single cell RNA-seq data · 2020. 1. 3. · ES-cell1 ES-cell2 …... ES-cellm NPC-cell1 NPC-cell2 … NPC-celln E S-c e l l 1 E S-c e l l 2 …

H( ): entropy

increases with the number of categories in a discrete probability distribution.

PV = JSD/log2(L)

Profile variation (PV) score

gene length

Page 10: Scoring transcript variation in single cell RNA-seq data · 2020. 1. 3. · ES-cell1 ES-cell2 …... ES-cellm NPC-cell1 NPC-cell2 … NPC-celln E S-c e l l 1 E S-c e l l 2 …

PV with gene length regressed out

is the length regressed PV scores .

Page 11: Scoring transcript variation in single cell RNA-seq data · 2020. 1. 3. · ES-cell1 ES-cell2 …... ES-cellm NPC-cell1 NPC-cell2 … NPC-celln E S-c e l l 1 E S-c e l l 2 …

Outline

● Method

– Profile Variation (PV) score --- two versions of PV score

● Benchmarking and thresholding

– Various data sets

– Various gene categories and exons

– Compare with bulk RNA-seq

● Applications

– Genes with high isoform variation

– Patterns in isoform usage

– Genes which switch isoforms

Page 12: Scoring transcript variation in single cell RNA-seq data · 2020. 1. 3. · ES-cell1 ES-cell2 …... ES-cellm NPC-cell1 NPC-cell2 … NPC-celln E S-c e l l 1 E S-c e l l 2 …

Compare between data sets

Differentiating T helper cells

Embryonic stem (ES) cells: different culture conditions

different cell cycle phases

Mahata et al 2014

Kołodziejczyk et al 2015

Buttener et al 2015

Marinov et al 2014

Page 13: Scoring transcript variation in single cell RNA-seq data · 2020. 1. 3. · ES-cell1 ES-cell2 …... ES-cellm NPC-cell1 NPC-cell2 … NPC-celln E S-c e l l 1 E S-c e l l 2 …

Compare between gene categories

Page 14: Scoring transcript variation in single cell RNA-seq data · 2020. 1. 3. · ES-cell1 ES-cell2 …... ES-cellm NPC-cell1 NPC-cell2 … NPC-celln E S-c e l l 1 E S-c e l l 2 …

Thresholding PV scores

PV of AS genes

technical noise

biological noise

AS events

cell 1cell 2cell 3cell 4cell 5

Page 15: Scoring transcript variation in single cell RNA-seq data · 2020. 1. 3. · ES-cell1 ES-cell2 …... ES-cellm NPC-cell1 NPC-cell2 … NPC-celln E S-c e l l 1 E S-c e l l 2 …

Thresholding PV scores

PV of AS genes

technical noise

biological noise

AS events

cell 1cell 2cell 3cell 4cell 5

PV of exons

technical noise

biological noise

Page 16: Scoring transcript variation in single cell RNA-seq data · 2020. 1. 3. · ES-cell1 ES-cell2 …... ES-cellm NPC-cell1 NPC-cell2 … NPC-celln E S-c e l l 1 E S-c e l l 2 …

300~600 genes were found to have highly variable isoform usage

Thresholding PV scores with exons

Page 17: Scoring transcript variation in single cell RNA-seq data · 2020. 1. 3. · ES-cell1 ES-cell2 …... ES-cellm NPC-cell1 NPC-cell2 … NPC-celln E S-c e l l 1 E S-c e l l 2 …

single cell data

Compare with bulk RNA-seq data

ES cellsES cells NPC cellsNPC cells

PV score

bulk data Cuffdiff

Page 18: Scoring transcript variation in single cell RNA-seq data · 2020. 1. 3. · ES-cell1 ES-cell2 …... ES-cellm NPC-cell1 NPC-cell2 … NPC-celln E S-c e l l 1 E S-c e l l 2 …

Compare with bulk RNA-seq data

What is consistent

p=0.006

What is different

● Genes with high PV but not detected by Cuffdiff

● Enriched in cell cycle genes

● Biological variation within one cell type

Page 19: Scoring transcript variation in single cell RNA-seq data · 2020. 1. 3. · ES-cell1 ES-cell2 …... ES-cellm NPC-cell1 NPC-cell2 … NPC-celln E S-c e l l 1 E S-c e l l 2 …

Outline

● Method

– Profile Variation (PV) score -- two versions of PV score

● Benchmarking and thresholding

– Various data sets -- conforms with biological heterogeneity

– Various gene categories and exons -- significant variation

– Compare with bulk RNA-seq -- consistent and more than bulk

● Applications

– Genes with high isoform variation

– Patterns in isoform usage

– Genes which switch isoforms

Page 20: Scoring transcript variation in single cell RNA-seq data · 2020. 1. 3. · ES-cell1 ES-cell2 …... ES-cellm NPC-cell1 NPC-cell2 … NPC-celln E S-c e l l 1 E S-c e l l 2 …

Genes with highly variable isoforms

Isoform variation at two levels

T helper cells ES cells

expression regulationchromatin modification

immunology cell cycle

PV

Page 21: Scoring transcript variation in single cell RNA-seq data · 2020. 1. 3. · ES-cell1 ES-cell2 …... ES-cellm NPC-cell1 NPC-cell2 … NPC-celln E S-c e l l 1 E S-c e l l 2 …

cell

1

cell

2

. .

cell

n

Find representative read coverage patterns

cell 1

cell 2

.

.

cell n

pairwise is a metric√PV

√PV

Page 22: Scoring transcript variation in single cell RNA-seq data · 2020. 1. 3. · ES-cell1 ES-cell2 …... ES-cellm NPC-cell1 NPC-cell2 … NPC-celln E S-c e l l 1 E S-c e l l 2 …

cell

1

cell

2

. .

cell

n

Find representative read coverage patterns

cell 1

cell 2

.

.

cell n

Page 23: Scoring transcript variation in single cell RNA-seq data · 2020. 1. 3. · ES-cell1 ES-cell2 …... ES-cellm NPC-cell1 NPC-cell2 … NPC-celln E S-c e l l 1 E S-c e l l 2 …

Example: Nsf in T cells

Pattern A

Pattern B

Pattern C

Find representative read coverage patterns

Page 24: Scoring transcript variation in single cell RNA-seq data · 2020. 1. 3. · ES-cell1 ES-cell2 …... ES-cellm NPC-cell1 NPC-cell2 … NPC-celln E S-c e l l 1 E S-c e l l 2 …

Find correlated genes in isoform usage

Gene 1 Gene 2 Gene 3

Cell 1 Pattern A Pattern A Pattern A

Cell 2 Pattern A Pattern A

Cell 3 Pattern C

Cell 4 Pattern B Pattern B Pattern B

Cell 5 Pattern B Pattern B

Difficulty: genes are expressed in a small number of cells.

Page 25: Scoring transcript variation in single cell RNA-seq data · 2020. 1. 3. · ES-cell1 ES-cell2 …... ES-cellm NPC-cell1 NPC-cell2 … NPC-celln E S-c e l l 1 E S-c e l l 2 …

Find correlated genes in isoform usage

Cell vs “gene pattern“ binary matrix

Gene1PatternA

Gene1 PatternB

Gene2 PatternA

Gene2 PatternB

Gene3 PatternA

Gene3 PatternB

Gene3 PatternC

Cell 1 1 0 1 0 1 0 0

Cell 2 1 0 0 0 1 0 0

Cell 3 0 0 0 0 0 0 1

Cell 4 0 1 0 1 0 1 0

Cell 5 0 1 0 1 0 0 0

Page 26: Scoring transcript variation in single cell RNA-seq data · 2020. 1. 3. · ES-cell1 ES-cell2 …... ES-cellm NPC-cell1 NPC-cell2 … NPC-celln E S-c e l l 1 E S-c e l l 2 …

Find clusters of genes with Jaccard distance < h

Compare with random binary matrices:

Isoform usage across single cells has high stochasticity

Stochasticity in isoform usage

Page 27: Scoring transcript variation in single cell RNA-seq data · 2020. 1. 3. · ES-cell1 ES-cell2 …... ES-cellm NPC-cell1 NPC-cell2 … NPC-celln E S-c e l l 1 E S-c e l l 2 …

Genes which switch isoforms between cell types

Intra group distances

Intra group distances

Inter group distances

ES-cell1

ES-cell2

...

ES-cellm

NPC-cell1

NPC-cell2

NPC-celln

ES

-cell1

ES

-cell2

… ...

ES

-cellm

NPC

-cell 1

NPC

-cell 2

… NPC

-cell n

Inter group distances

1. high ratio of:

Average(Inter-group distances)

Average(Intra-group distances)

2. high PV(all cells)

ES cellsES cells NPC cellsNPC cells

Page 28: Scoring transcript variation in single cell RNA-seq data · 2020. 1. 3. · ES-cell1 ES-cell2 …... ES-cellm NPC-cell1 NPC-cell2 … NPC-celln E S-c e l l 1 E S-c e l l 2 …

NPC cells

12 ES cells 5 NPC cells

Lig1

Page 29: Scoring transcript variation in single cell RNA-seq data · 2020. 1. 3. · ES-cell1 ES-cell2 …... ES-cellm NPC-cell1 NPC-cell2 … NPC-celln E S-c e l l 1 E S-c e l l 2 …

Lig1protein domain of the short transcript

Lig1protein domains of the long transcript

● Lig1 is a cell cycle gene● Cells slow down cycling ES → NPC● Not detected in bulk data

ES cellsES cells NPC cellsNPC cells

Page 30: Scoring transcript variation in single cell RNA-seq data · 2020. 1. 3. · ES-cell1 ES-cell2 …... ES-cellm NPC-cell1 NPC-cell2 … NPC-celln E S-c e l l 1 E S-c e l l 2 …

Outline

● Method

– Profile Variation (PV) score -- two versions of PV score

● Benchmarking and thresholding

– Various data sets -- conforms with biological heterogeneity

– Various gene categories and exons -- significant variation

– Compare with bulk RNA-seq -- consistent and more than bulk

● Applications

– Genes with high isoform variation -- sources of isoform variation

– Patterns in isoform usage -- high stochasticity

– Genes which switch isoforms -- function change during differentiation

Page 31: Scoring transcript variation in single cell RNA-seq data · 2020. 1. 3. · ES-cell1 ES-cell2 …... ES-cellm NPC-cell1 NPC-cell2 … NPC-celln E S-c e l l 1 E S-c e l l 2 …

Sarah TeichmannSarah TeichmannOliver Stegle (EBI)Oliver Stegle (EBI)

John Marioni (EBI)John Marioni (EBI)

Nick Owens (MRC NIMR)Nick Owens (MRC NIMR)

Kaur Alasoo (Sanger Institute)Kaur Alasoo (Sanger Institute)

Acknowledgements

Jong Kyoung KimJong Kyoung KimValentine SvenssonValentine Svensson