Festival Of Genomics 2016 - Brain talk


Jean Fan / Festival of Genomics / June 2016

Jean Fan | NSF GRFP | Bioinformatics and Integrative Genomics PhD Candidate | Kharchenko Lab | Department of Biomedical Informatics | Harvard University

Applying single cell transcriptomics: unraveling the complexity of the brain


Motivation: Characterize heterogeneity and identify cell subpopulations with scRNA-seq


Valent P, Bonnet D, De Maria R, et al. Cancer stem cell definitions and terminology: the devil is in the details. Nat Rev Cancer. 2012;12(11):767-75.

Cancer

Kaech SM, Cui W. Transcriptional control of effector and memory CD8+ T cell differentiation. Nat Rev Immunol. 2012;12(11):749-61.

T Cells


Greig LC, Woodworth MB, Galazo MJ, Padmanabhan H, Macklis JD. Molecular logic of neocortical projection neuron specification, development and diversity. Nat Rev Neurosci. 2013;14(11):755-69.

NPCs


Single cell RNA-seq

Food For Thought
◦ How can we identify transcriptional subpopulations in a way that is robust and takes into consideration technical artefacts from single cell RNA-seq?
◦ What are the different ways to group and classify cells in the brain?
◦ In addition to expression heterogeneity, how can we make the most out of single-cell RNA-seq data?


Challenges: scRNA-seq data is highly variable and noisy
◦ Expect high correlation between replicates
◦ Many differences between individual cells (even of the same type)
◦ Biological vs. technical differences
◦ Focus on the biological variability
◦ Control for the technical variability
◦ e.g. measurement failures (drop-outs)

[Figure, panel "Bulk": expression in bulk replicate 1 vs. expression in bulk replicate 2]

[Figure, panel "Single Cell"]

Previous work: SCDE - use error models to get a better handle on technical noise
◦ Estimate true biological variability of a gene
◦ Account for possible drop-out events
◦ Assess variability of expression, taking into consideration expression magnitude dependencies

[Figure panels: Cross-fits (Cell 1 vs. Cell 2), Error Models, Variance Normalization]
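A minimal sketch of how an error model can account for drop-out events. It assumes, following the published SCDE description, that each observed read count comes from a mixture of a low-magnitude Poisson "drop-out" component and a negative binomial "amplified" component, and it applies Bayes' rule to ask how likely an observation is to be a drop-out. The function names and every parameter value (`dropout_rate`, `background_lambda`, `nb_mean`, `nb_size`) are hypothetical and chosen for illustration; they are not fitted values from the talk.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability of observing k reads with mean lam (lam > 0)."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def neg_binomial_pmf(k, mean, size):
    """Negative binomial probability of k reads, parameterized by mean and size."""
    p = size / (size + mean)
    log_pmf = (
        math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
        + size * math.log(p) + k * math.log(1.0 - p)
    )
    return math.exp(log_pmf)


def dropout_posterior(k, dropout_rate, nb_mean, nb_size, background_lambda=0.1):
    """Posterior probability that an observed count k came from the drop-out component.

    dropout_rate is the (assumed) prior probability of a drop-out for this gene
    in this cell; nb_mean and nb_size describe the expected amplified signal.
    """
    like_dropout = dropout_rate * poisson_pmf(k, background_lambda)
    like_amplified = (1.0 - dropout_rate) * neg_binomial_pmf(k, nb_mean, nb_size)
    return like_dropout / (like_dropout + like_amplified)
```

Under these assumed values, `dropout_posterior(0, 0.3, 100, 2)` is close to 1 (a zero for a gene expected to be highly expressed is almost certainly a drop-out), while `dropout_posterior(0, 0.3, 1, 2)` is much more ambiguous.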

Error models and normalization help us understand the data on a probabilistic level:

What is the chance this 0 expression in this cell is due to drop-out or true non-expression?

What is the chance that this gene is really this variable given the expected variability for genes at this average expression magnitude?
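The second question can be illustrated with a deliberately simplified sketch of variance normalization: fit the expected variance as a function of average expression across all genes, then report each gene's observed variance relative to that expectation. The straight-line fit in log-log space is an assumption made for brevity, and the function name and input layout are hypothetical; this is not the fitting procedure used by SCDE or PAGODA.

```python
import math
from statistics import mean, variance


def variance_normalize(expression):
    """Return {gene: observed variance / variance expected at that mean}.

    expression: dict mapping gene name -> list of expression values across cells.
    Every gene must have a positive mean and a positive variance.
    """
    genes = list(expression)
    log_means = [math.log(mean(expression[g])) for g in genes]
    log_vars = [math.log(variance(expression[g])) for g in genes]

    # Ordinary least squares fit of log(variance) on log(mean).
    x_bar, y_bar = mean(log_means), mean(log_vars)
    sxx = sum((x - x_bar) ** 2 for x in log_means)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(log_means, log_vars))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # A ratio above 1 means the gene varies more than expected for its magnitude.
    return {
        g: math.exp(lv - (intercept + slope * lm))
        for g, lm, lv in zip(genes, log_means, log_vars)
    }
```

Genes with a ratio well above 1 are candidates for real biological variability; genes near or below 1 vary about as much as other genes at the same expression magnitude.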

PAGODA (Pathway And Geneset OverDispersion Analysis) applies error models and variance normalization to characterize heterogeneity and identify subpopulations

pklab.med.harvard.edu/scde

PAGODA intuition: Improve statistical sensitivity by taking advantage of pathways and gene sets
◦ Rather than relying on a few genes, look for broader patterns of variability
◦ Coordinated patterns of variability of genes linked to function/phenotype == stronger signal -> increases statistical power
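The intuition can be checked with a toy simulation: if every gene in a set shifts only slightly between two subpopulations, a single gene separates them weakly, but the gene-set average separates them much more clearly because independent noise averages out. The sketch below is an illustration under assumed values (cell counts, gene-set size, shift, Gaussian noise), not the PAGODA procedure; the group labels are known here only so the signal can be measured.

```python
import random
from statistics import mean, stdev


def separation(values_a, values_b):
    """Welch-style t statistic: difference in means scaled by its standard error."""
    se = (stdev(values_a) ** 2 / len(values_a)
          + stdev(values_b) ** 2 / len(values_b)) ** 0.5
    return (mean(values_a) - mean(values_b)) / se


def simulate(n_cells=100, n_genes=20, shift=0.5, seed=0):
    """Compare how well one gene vs. the gene-set average separates two groups."""
    rng = random.Random(seed)
    # Group A: every gene in the set is shifted up slightly. Group B: no shift.
    cells_a = [[rng.gauss(shift, 1.0) for _ in range(n_genes)] for _ in range(n_cells)]
    cells_b = [[rng.gauss(0.0, 1.0) for _ in range(n_genes)] for _ in range(n_cells)]

    single_gene = separation([c[0] for c in cells_a], [c[0] for c in cells_b])
    gene_set = separation([mean(c) for c in cells_a], [mean(c) for c in cells_b])
    return single_gene, gene_set


single_gene, gene_set = simulate()
print(f"single gene: {single_gene:.1f}, gene set: {gene_set:.1f}")
```

With 20 coordinated genes the noise in the set average shrinks by roughly the square root of 20, so the same small per-gene shift produces a much larger statistic.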


PAGODA overview: assess expression within annotated pathways and de novo gene sets


PAGODA overview: Identify pathways and gene sets exhibiting coordinated overdispersion

PAGODA overview: Remove redundant pathways and gene sets, and visualize
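One simple way to think about the redundancy step is to group gene sets whose per-cell score profiles are nearly identical, keeping one representative per group. The sketch below does this greedily with a Pearson correlation threshold. The greedy strategy, the threshold, and the function names are assumptions for illustration; the talk does not specify how PAGODA measures or collapses redundancy.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length score profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def collapse_redundant(scores, threshold=0.9):
    """Greedily group gene sets whose per-cell score profiles are highly correlated.

    scores: dict mapping gene set name -> list of per-cell scores.
    Returns a list of groups (lists of names); the first name in each group
    acts as its representative. Strongly anti-correlated profiles are grouped
    too, since they split the cells in the same way.
    """
    groups = []
    for name, profile in scores.items():
        for group in groups:
            if abs(pearson(profile, scores[group[0]])) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups
```

Each resulting group can then be shown as a single row when visualizing the cells.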


Pathway based approach integrates prior knowledge to increase statistical power and provide interpretability of identified subpopulations

(example next)


PAGODA applied to mouse neural progenitors identifies and characterizes subpopulations

[Figure: cells vs. pathway clusters]

Kun Zhang

Jerold Chun


PAGODA integrated with FISH data spatially placed subpopulations


Allen Brain Atlas; https://github.com/hms-dbmi/brainmapr

PAGODA identifies multiple, potentially overlapping aspects of transcriptional heterogeneity


Allen Brain Atlas; https://github.com/hms-dbmi/brainmapr


Food For Thought
◦ How can we identify transcriptional subpopulations in a way that is robust and takes into consideration technical artefacts from single cell RNA-seq?
◦ What are the different ways to group and classify cells in the brain?
◦ In addition to expression heterogeneity, how can we make the most out of single-cell RNA-seq data?
◦ Alternative splicing

PAGODA applied to human cortical cells identifies and characterizes subpopulations


Xiaochang Zhang

Chris Walsh

Marker genes confirm subpopulations identified by PAGODA

PAGODA integrated with MISO identifies alternative splicing in pure pooled single cells

MISO needs bulk data -> pool single cells
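The pooling idea can be sketched as follows: sum read counts across all single cells assigned to the same subpopulation to get one pseudo-bulk profile per subpopulation, then estimate splicing from the pooled counts. The simple inclusion ratio below is only a stand-in for illustration; MISO fits its own probabilistic model, and the data layout, function names, and event identifiers here are hypothetical.

```python
from collections import defaultdict


def pool_counts(cell_counts, cell_labels):
    """Sum per-cell junction read counts into one pseudo-bulk profile per subpopulation.

    cell_counts: dict cell id -> dict event id -> (inclusion_reads, exclusion_reads)
    cell_labels: dict cell id -> subpopulation label
    Returns: dict label -> dict event id -> [inclusion_reads, exclusion_reads]
    """
    pooled = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for cell, events in cell_counts.items():
        label = cell_labels[cell]
        for event, (inc, exc) in events.items():
            pooled[label][event][0] += inc
            pooled[label][event][1] += exc
    return pooled


def psi(inclusion_reads, exclusion_reads):
    """Naive percent-spliced-in estimate; None when the event has no coverage."""
    total = inclusion_reads + exclusion_reads
    return inclusion_reads / total if total else None
```

A single cell often has too few reads over a given splice junction to say anything, but the pooled counts for a pure subpopulation can support a stable estimate.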

Pure pooled RGs (radial glia) vs. neurons lend credence to potential purity concerns with bulk CP (cortical plate) vs. VZ (ventricular zone)

Food For Thought
◦ How can we identify transcriptional subpopulations in a way that is robust and takes into consideration technical artefacts from single cell RNA-seq?
◦ What are the different ways to group and classify cells in the brain?
◦ In addition to expression heterogeneity, how can we make the most out of single-cell RNA-seq data?
◦ Alternative splicing
◦ Copy number alteration detection / integrative analysis

BADGER quantitatively assesses posterior probabilities of copy number alterations

Bayesian Approach to CNV Detection from single cell RNA-seq (BADGER)
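To make "posterior probability of a copy number alteration" concrete, here is one way such a posterior could be computed for a single region in a single cell. It is a sketch under stated assumptions, not BADGER's actual model: it assumes an allele-based signal (a deleted region should almost never show reads from both alleles of a heterozygous SNP), and the prior and both rates are hypothetical values chosen for illustration.

```python
import math


def binomial_log_pmf(k, n, p):
    """Log probability of k successes in n trials with success probability p."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log(1.0 - p)


def deletion_posterior(n_snps, n_biallelic, prior_deletion=0.1,
                       p_biallelic_neutral=0.3, p_biallelic_deleted=0.01):
    """Posterior probability that a region is deleted in one cell.

    n_snps: heterozygous SNPs in the region with read coverage in this cell
    n_biallelic: how many of those SNPs show reads from both alleles
    p_biallelic_neutral: assumed chance of seeing both alleles without a deletion
        (below 1 because single cell data often captures only one allele)
    p_biallelic_deleted: assumed chance of seeing both alleles despite a deletion
        (sequencing or mapping error)
    """
    log_deleted = math.log(prior_deletion) + binomial_log_pmf(
        n_biallelic, n_snps, p_biallelic_deleted)
    log_neutral = math.log(1.0 - prior_deletion) + binomial_log_pmf(
        n_biallelic, n_snps, p_biallelic_neutral)
    # Normalize in log space for numerical stability.
    top = max(log_deleted, log_neutral)
    w_deleted = math.exp(log_deleted - top)
    w_neutral = math.exp(log_neutral - top)
    return w_deleted / (w_deleted + w_neutral)
```

With no covered SNPs the posterior equals the prior; as more covered SNPs show only one allele, the posterior for a deletion rises, and a handful of biallelic SNPs drives it toward zero.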


BADGER applied to scRNA-seq identified subclonal expansion in progressive MM (multiple myeloma)

Soo Lee

Peter Park

Woong-Yang Park

Hae-Ock Lee

[Figure labels: Initial, Bone Marrow, Ascites; samples MM34, MM34A]


PAGODA integrated with BADGER connects genetic with transcriptional heterogeneity


scRNA-seq contains (noisy) expression as well as (noisy) splicing and some (noisy) genetic information.

Novel statistical and computational methods and techniques are still needed to harness the potential of scRNA-seq data!

Thanks!

Kharchenko Lab: Peter Kharchenko, Joseph Herman

Park Lab: Soo Lee, Semin Lee

SGI: Hae-Ock Lee

Walsh Lab: Xiaochang Zhang

Funding