Single-Cell Transcriptome Analysis of Pluripotent Stem Cells
From raw data to insights

Nacho Caballero
Center for Regenerative Medicine, Boston University
Jun 12, 2017

Page 8: Single-Cell Transcriptome Analysis of Pluripotent Stem Cells

Analysis pipeline: Raw data → Initial QC → Alignment and Quantification → Outlier analysis → Gene selection and clustering → Insights

Page 13: Single-Cell Transcriptome Analysis of Pluripotent Stem Cells

Barcoded sequencing files → Demultiplex → One pair of sequencing files per cell

Millions of reads, for example:

@NB500996:64:HNM72BGX2:3:12510:12240:9366 2:N:0:TAGTCATG
CTACTGTCTAGAGCTTGTCTCAATGGATCTAGAACTTCATCGCCCTCTGATC…
+
AAAAAEEEE<E/EEEEEEEEE6EE/6AEEE//E/EEE/AEA/EAEEEE</6A…
…

Metadata file (short, simple names):

Cell_id    Condition1    Condition2
Cell_01    BU3           red
Cell_02    BU3           green
Cell_03    C17           red
Cell_04    C17           green
Cell_05    BU3           red
Cell_06    BU3           green
…
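
As a rough illustration of what the per-cell sequencing files contain, the sketch below reads four-line FASTQ records (header, sequence, "+", quality). It assumes the standard uncompressed FASTQ layout; the names `FastqRecord` and `parse_fastq` and the two toy reads are hypothetical and not part of the talk.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    header: str    # line starting with '@'
    sequence: str  # called bases
    quality: str   # one ASCII-encoded quality character per base


def parse_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream, four lines at a time."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"Malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"Sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header, sequence, quality)


# Toy two-read file (hypothetical reads, not taken from the talk's data).
example = io.StringIO(
    "@read1\nACGTAC\n+\nAAEEE/\n"
    "@read2\nTTGACA\n+\nEEEE6A\n"
)
records = list(parse_fastq(example))
print(len(records), "reads")
```

In practice the same generator could be pointed at an opened file per cell to count reads before any alignment.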

Page 14: Single-Cell Transcriptome Analysis of Pluripotent Stem Cells

Analysis pipeline stage: Initial QC

Page 19: Single-Cell Transcriptome Analysis of Pluripotent Stem Cells

Read length is often inversely correlated with base-pair sequencing quality

[Figure: three panels (Good cDNA quality, Average quality, Bad quality); axes: Position in Read, Avg Sequence Quality]
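
The quality-by-position profile in these plots can be approximated with a few lines of code. This is a minimal sketch that assumes Phred+33 quality encoding; the function name `mean_quality_by_position` and the toy quality strings are hypothetical.

```python
from statistics import mean
from typing import List


def mean_quality_by_position(quality_strings: List[str], offset: int = 33) -> List[float]:
    """Average Phred score at each read position across reads."""
    longest = max(len(q) for q in quality_strings)
    averages = []
    for pos in range(longest):
        # Only reads long enough to cover this position contribute.
        scores = [ord(q[pos]) - offset for q in quality_strings if len(q) > pos]
        averages.append(mean(scores))
    return averages


# Hypothetical quality strings: with offset 33, 'E' is Q36, 'A' is Q32,
# '6' is Q21 and '/' is Q14.
toy_qualities = ["EEEA/", "EEE6/", "EEAA"]
profile = mean_quality_by_position(toy_qualities)
print([round(x, 1) for x in profile])
```

A profile that falls toward the end of the read, as in this toy input, is the pattern the slide describes for longer reads.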

Page 21: Single-Cell Transcriptome Analysis of Pluripotent Stem Cells

More reads are generally better than longer reads (safe target: 200K reads per cell, 150 bp long)

[Figure: Number of reads per cell (y-axis ticks: 1K, 10K, 1M) for cells 0 to 400]

Page 22: Single-Cell Transcriptome Analysis of Pluripotent Stem Cells

The Fluidigm protocol makes it extremely easy to lose entire rows or columns

[Figure: grid with axes Rows and Columns]

Page 24: Single-Cell Transcriptome Analysis of Pluripotent Stem Cells

Analysis pipeline stage: Alignment and Quantification

Page 28: Single-Cell Transcriptome Analysis of Pluripotent Stem Cells

We quantify the gene expression in a cell by counting how many reads align to each gene

Example read: AGGCAGAGGGGCGAGATGCA…
1358 reads aligned to the SFTPC gene in this cell
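
The counting idea can be sketched as follows. This is an illustration only: the gene coordinates, gene names, and reads are hypothetical, and real pipelines rely on an aligner plus a dedicated counting tool rather than a loop like this one.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical gene coordinates (chromosome, start, end), for illustration only.
GENES: Dict[str, Tuple[str, int, int]] = {
    "GENE_A": ("chr1", 100, 500),
    "GENE_B": ("chr1", 450, 900),
    "GENE_C": ("chr2", 50, 300),
}


def assign_gene(chrom: str, start: int, end: int) -> Optional[str]:
    """Return the single gene a read overlaps, or None if it hits zero or several."""
    hits = [
        name
        for name, (g_chrom, g_start, g_end) in GENES.items()
        if g_chrom == chrom and start <= g_end and end >= g_start
    ]
    return hits[0] if len(hits) == 1 else None


def count_reads(alignments: List[Tuple[str, int, int]]) -> Counter:
    """Count reads that can be assigned to exactly one gene."""
    counts: Counter = Counter()
    for chrom, start, end in alignments:
        gene = assign_gene(chrom, start, end)
        if gene is not None:
            counts[gene] += 1
    return counts


# Toy uniquely aligned reads as (chromosome, start, end).
reads = [("chr1", 120, 170), ("chr1", 460, 510), ("chr2", 60, 110),
         ("chr2", 80, 130), ("chr3", 10, 60)]
gene_counts = count_reads(reads)
print(dict(gene_counts))
```

Reads that overlap no gene or more than one gene are dropped here, which mirrors why only part of the aligned reads ends up in the final counts.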

Page 35: Single-Cell Transcriptome Analysis of Pluripotent Stem Cells

Read type                                    Number of reads per cell
Raw                                          333,229
Unaligned                                    81,673
Aligned, but non-uniquely                    28,813
Aligned uniquely, but not to a gene          32,774
Aligned uniquely, but span multiple genes    20,838
Aligned uniquely to a single gene            167,241

40-60% of the raw reads cannot be used to quantify gene expression
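
The arithmetic behind that statement can be checked directly from the table. The sketch below only restates the per-cell numbers shown on the slide and expresses each category as a share of the raw reads; variable names are illustrative.

```python
# Per-cell read counts copied from the table above.
read_counts = {
    "Unaligned": 81_673,
    "Aligned, but non-uniquely": 28_813,
    "Aligned uniquely, but not to a gene": 32_774,
    "Aligned uniquely, but span multiple genes": 20_838,
    "Aligned uniquely to a single gene": 167_241,
}
raw_reads = 333_229

# Share of the raw reads that falls into each category.
for read_type, n in read_counts.items():
    print(f"{read_type:<45} {n:>8,} {n / raw_reads:6.1%}")

# Only reads assigned to exactly one gene are usable for quantification.
usable_fraction = read_counts["Aligned uniquely to a single gene"] / raw_reads
unusable_fraction = 1 - usable_fraction
print(f"Usable: {usable_fraction:.1%}, not usable: {unusable_fraction:.1%}")
```

For this example cell, roughly half of the raw reads end up usable, which sits inside the 40-60% range quoted on the slide.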

Page 36: Single-Cell Transcriptome Analysis of Pluripotent Stem Cells

Analysis pipeline stage: Outlier analysis

Page 37: Single-Cell Transcriptome Analysis of Pluripotent Stem Cells

Filter out cells with fewer than 5K aligned reads

[Figure: Number of aligned reads (y-axis ticks: 1K, 10K, 1M) for cells 0 to 120]

Page 38: Single-Cell Transcriptome Analysis of Pluripotent Stem Cells

Filter out cells with a high percentage of mitochondrial gene counts (indicative of a broken cell membrane)

[Figure: % of Mitochondrial gene counts (y-axis ticks: 25%, 50%, 75%, 100%) for cells 0 to 48]

Page 39: Single-Cell Transcriptome Analysis of Pluripotent Stem Cells

Filter out cells with fewer than 2K expressed genes

[Figure: Number of expressed genes (y-axis ticks: 2K, 4K, 6K) for cells 0 to 30]
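
The three outlier filters can be combined into one pass over per-cell statistics. In this sketch the 5K aligned-read and 2K expressed-gene cutoffs come from the slides, while the mitochondrial cutoff (25%) is an assumed placeholder because the talk does not state a number; the `CellStats` structure and the four toy cells are hypothetical.

```python
from dataclasses import dataclass

MIN_ALIGNED_READS = 5_000    # from the slides
MIN_EXPRESSED_GENES = 2_000  # from the slides
MAX_MITO_FRACTION = 0.25     # assumption: no cutoff is given in the talk


@dataclass
class CellStats:
    cell_id: str
    aligned_reads: int
    expressed_genes: int
    mito_fraction: float  # mitochondrial counts / total counts


def passes_qc(cell: CellStats) -> bool:
    """Keep a cell only if it clears all three thresholds."""
    return (
        cell.aligned_reads >= MIN_ALIGNED_READS
        and cell.expressed_genes >= MIN_EXPRESSED_GENES
        and cell.mito_fraction <= MAX_MITO_FRACTION
    )


# Toy cells (made-up numbers).
cells = [
    CellStats("Cell_01", 250_000, 4_500, 0.05),
    CellStats("Cell_02", 3_200, 2_400, 0.04),    # too few aligned reads
    CellStats("Cell_03", 180_000, 3_900, 0.62),  # high mitochondrial fraction
    CellStats("Cell_04", 90_000, 1_100, 0.08),   # too few expressed genes
]
kept = [c.cell_id for c in cells if passes_qc(c)]
print(kept)
```

The mitochondrial threshold in particular should be chosen by looking at the distribution in the actual data rather than taken from this sketch.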

Page 40: Single-Cell Transcriptome Analysis of Pluripotent Stem Cells

Analysis pipeline stage: Gene selection and clustering

Page 45: Single-Cell Transcriptome Analysis of Pluripotent Stem Cells

Normalization corrects for differences in capture efficiency, sequencing depth and other technical bias

Raw count data
  1. Assume that most genes are not differentially expressed
  2. Calculate scaling factors for each cell
  3. Apply the scaling factors and log
Normalized expression data
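
One way to turn those steps into code is a median-of-ratios scaling factor followed by a log transform. The talk does not name the normalization method it uses, so this choice is an assumption, as are the function names and the toy count matrix.

```python
import math
from statistics import median
from typing import List


def size_factors(counts: List[List[int]]) -> List[float]:
    """Median-of-ratios scaling factors; counts[g][c] is gene g in cell c."""
    n_cells = len(counts[0])
    # Use only genes detected in every cell so the geometric mean is defined.
    usable = [row for row in counts if all(x > 0 for x in row)]
    log_geo_means = [sum(math.log(x) for x in row) / n_cells for row in usable]
    factors = []
    for c in range(n_cells):
        # If most genes are unchanged, the median ratio reflects technical depth.
        log_ratios = [math.log(row[c]) - lgm for row, lgm in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(counts: List[List[int]]) -> List[List[float]]:
    """Divide each cell by its scaling factor, then take log2(x + 1)."""
    factors = size_factors(counts)
    return [[math.log2(x / f + 1) for x, f in zip(row, factors)] for row in counts]


# Toy matrix: cell 2 was sequenced twice as deeply; only the last gene differs.
toy_counts = [[10, 20], [100, 200], [5, 10], [0, 0], [50, 300]]
factors = size_factors(toy_counts)
normalized = normalize(toy_counts)
print([round(f, 3) for f in factors])
```

Real single-cell matrices are sparse, so very few genes may be detected in every cell; this sketch only illustrates the idea and is not tuned for such data.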

Page 51: Single-Cell Transcriptome Analysis of Pluripotent Stem Cells

[Figure: Average expression vs. Variance, with insets showing Expression per cell for example genes labeled "high expression low variance", "low expression low variance" and "high expression high variance"]
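
The two quantities on these axes are simple per-gene summaries. A minimal sketch, assuming normalized expression values stored per gene; the gene names and numbers are made up to mirror the three labeled cases.

```python
from statistics import mean, pvariance
from typing import Dict, List, Tuple

# Hypothetical normalized expression per gene across four cells.
expression: Dict[str, List[float]] = {
    "GENE_HIGH_FLAT": [9.0, 9.1, 8.9, 9.0],  # high expression, low variance
    "GENE_LOW_FLAT": [0.5, 0.4, 0.6, 0.5],   # low expression, low variance
    "GENE_HIGH_VAR": [9.0, 1.0, 8.5, 0.5],   # high expression, high variance
}

# One (average expression, variance) point per gene.
summary: Dict[str, Tuple[float, float]] = {
    gene: (mean(values), pvariance(values)) for gene, values in expression.items()
}

# Rank genes from most to least variable.
most_variable = sorted(summary, key=lambda gene: summary[gene][1], reverse=True)
for gene in most_variable:
    avg, var = summary[gene]
    print(f"{gene:<15} mean={avg:5.2f} variance={var:6.3f}")
```

Ranking by variance like this is the starting point for picking highly variable genes.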

Page 54: Single-Cell Transcriptome Analysis of Pluripotent Stem Cells

Typical questions

What are the expression differences between my experimental groups?

What are the subpopulations in my data?

What are the gene expression patterns in each subpopulation?

Page 57: Single-Cell Transcriptome Analysis of Pluripotent Stem Cells

Treat conditions as groups?
  NO: Select genes, then assign cells to groups

A difference between the populations (signal) should appear among the most variable genes

[Figure: Average expression vs. Variance]

Page 59: Single-Cell Transcriptome Analysis of Pluripotent Stem Cells

Variance is a necessary but insufficient indicator of population differences

[Figure: Average expression vs. Variance]

Page 60: Single-Cell Transcriptome Analysis of Pluripotent Stem Cells

Unique populations consistently over- or under-express a set of genes

[Figure: Average expression vs. Variance]

Page 63: Single-Cell Transcriptome Analysis of Pluripotent Stem Cells

The silhouette coefficient is a useful metric to determine the optimal number of groups

k    Silhouette coefficient
2    0.48
3    0.56
4    0.47
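
For reference, the silhouette coefficient can be computed from pairwise distances alone. The sketch below uses Euclidean distance and hand-written group labels on made-up 2-D points; it does not reproduce the clustering or the values from the talk, and the function name is hypothetical.

```python
import math
from statistics import mean
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def silhouette_coefficient(points: Sequence[Point], labels: Sequence[int]) -> float:
    """Mean silhouette over all points; expects at least two groups."""
    clusters = sorted(set(labels))
    scores: List[float] = []
    for i, (p, own) in enumerate(zip(points, labels)):
        # Mean distance from p to the members of each group (excluding p itself).
        mean_dist = {}
        for cluster in clusters:
            dists = [
                math.dist(p, q)
                for j, (q, lab) in enumerate(zip(points, labels))
                if lab == cluster and j != i
            ]
            mean_dist[cluster] = mean(dists) if dists else None
        a = mean_dist[own]  # cohesion: distance to own group
        if a is None:
            scores.append(0.0)  # convention for single-member groups
            continue
        # Separation: distance to the nearest other group.
        b = min(d for c, d in mean_dist.items() if c != own and d is not None)
        scores.append((b - a) / max(a, b))
    return mean(scores)


# Toy data with three visibly separate groups of cells.
points = [(0.0, 0.0), (0.3, 0.1), (0.1, 0.4),
          (5.0, 5.0), (5.2, 4.9), (4.8, 5.3),
          (10.0, 0.0), (10.3, 0.2), (9.8, 0.4)]
labels_k2 = [0, 0, 0, 0, 0, 0, 1, 1, 1]  # first two groups merged
labels_k3 = [0, 0, 0, 1, 1, 1, 2, 2, 2]
score_k2 = silhouette_coefficient(points, labels_k2)
score_k3 = silhouette_coefficient(points, labels_k3)
print(f"k=2: {score_k2:.2f}  k=3: {score_k3:.2f}")
```

Scoring each candidate k this way and keeping the highest value matches how the slides compare k = 2, 3 and 4.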

Page 71: Single-Cell Transcriptome Analysis of Pluripotent Stem Cells

Treat conditions as groups?
  YES: Test genes for differential expression
  NO: Select genes, then assign cells to groups

[Figure: two Average expression vs. Variance plots, one highlighting Differentially expressed genes and one highlighting Highly variable genes]

Page 72: Single-Cell Transcriptome Analysis of Pluripotent Stem Cells

Analysis pipeline stage: Insights

Page 73: Single-Cell Transcriptome Analysis of Pluripotent Stem Cells

The ideal heatmap

Page 77: Single-Cell Transcriptome Analysis of Pluripotent Stem Cells

Real heatmaps are a rough-draft visualization

[Figure: heatmaps of NKX2-1 and CD47 shown with ROW-SCALING and with GLOBAL SCALING]
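
The difference between the two scaling choices can be shown on a tiny matrix. This sketch assumes that both modes are z-scores, computed per gene for row scaling and over the whole matrix for global scaling; the talk does not spell out the formulas, and the genes and values below are made up.

```python
from statistics import mean, pstdev
from typing import Dict, List

Matrix = Dict[str, List[float]]


def row_scale(matrix: Matrix) -> Matrix:
    """Z-score each gene (row) against its own mean and standard deviation."""
    scaled = {}
    for gene, values in matrix.items():
        mu, sd = mean(values), pstdev(values)
        scaled[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return scaled


def global_scale(matrix: Matrix) -> Matrix:
    """Z-score every value against the mean and deviation of the whole matrix."""
    everything = [v for values in matrix.values() for v in values]
    mu, sd = mean(everything), pstdev(everything)
    return {gene: [(v - mu) / sd for v in values] for gene, values in matrix.items()}


# Two hypothetical genes with the same pattern at very different levels.
toy = {
    "GENE_LOW": [0.0, 0.2, 1.0, 1.2],
    "GENE_HIGH": [8.0, 8.2, 9.0, 9.2],
}
row_scaled = row_scale(toy)
global_scaled = global_scale(toy)
print({g: [round(v, 2) for v in vals] for g, vals in row_scaled.items()})
print({g: [round(v, 2) for v in vals] for g, vals in global_scaled.items()})
```

With row scaling both genes look identical, which hides the difference in level; with global scaling the level dominates and the within-gene pattern is compressed.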

Page 81: Single-Cell Transcriptome Analysis of Pluripotent Stem Cells

Expression patterns are better conveyed by showing individual genes

[Figure: individual-gene expression panels labeled CLUSTERED and RANDOM]

Page 84: Single-Cell Transcriptome Analysis of Pluripotent Stem Cells

Geneset enrichment analysis depends on the quality of the geneset

MSigDB hallmark genesets only contain 4000 genes

Make your own genesets from the literature
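
To make the dependence on the geneset concrete, here is a sketch of a simple over-representation test based on the hypergeometric distribution. This is a swapped-in, simpler technique and not the rank-based GSEA method; the function name and all numbers are hypothetical.

```python
from math import comb


def enrichment_p_value(n_background: int, n_in_set: int,
                       n_selected: int, n_overlap: int) -> float:
    """P(overlap >= n_overlap) when n_selected genes are drawn at random."""
    total = comb(n_background, n_selected)
    upper = min(n_in_set, n_selected)
    # Sum the hypergeometric probabilities of the observed overlap or larger.
    tail = sum(
        comb(n_in_set, k) * comb(n_background - n_in_set, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 20,000 measured genes, a 50-gene set from the
# literature, 200 selected genes, 5 of which fall in the set.
p_value = enrichment_p_value(20_000, 50, 200, 5)
print(f"p = {p_value:.2e}")
```

Only genes that appear in some geneset can contribute to a result like this, so a collection that covers a small part of the genome leaves many selected genes untested.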

Page 94: Single-Cell Transcriptome Analysis of Pluripotent Stem Cells

Takeaways

Remember to provide a metadata file
More reads are usually better than longer reads
Expect only about 50% of your reads to be usable for quantifying gene expression
Assume that 50% of your cells could fail
High variance doesn’t imply subpopulations
Make your own gene lists!

Slides available at: bit.ly/crem_bioinformatics