Gene Expression Deconvolution with Single-cell Data

James Lindsay (1), Caroline Jakuba (2), Ion Mandoiu (1), Craig Nelson (2)

University of Connecticut
(1) Department of Computer Science and Engineering
(2) Department of Molecular and Cell Biology


Page 2:

Mouse Embryo

[Figure: mouse embryo diagram, labeled anterior/head, posterior/tail, somites, neural tube, node, primitive streak]

Page 3:

Unknown Mesoderm Progenitor

• What is the expression profile of the progenitor cell type?

NSB=node-streak border; PSM=presomitic mesoderm; S=somite; NT=neural tube/neurectoderm; EN=endoderm

Page 4:

Characterizing Cell-types

• Goal: Whole transcriptome expression profiles of individual cell-types

• Technically challenging to measure whole transcriptome expression from single-cells

• Approach: Computational deconvolution of cell mixtures
  • Assisted by single-cell qPCR expression data for a small number of genes

Page 5:

Modeling Cell Mixtures

Mixtures (X) are a linear combination of signature matrix (S) and concentration matrix (C)

$$X_{m \times n} = S_{m \times k} \cdot C_{k \times n}$$

• X: genes × mixtures (m × n)
• S: genes × cell types (m × k)
• C: cell types × mixtures (k × n)
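The mixing model on this slide can be illustrated with a small NumPy sketch. The matrix sizes follow the dimensions reported later in the deck (m = 31 genes, k = 3 cell types, n = 12 mixtures); the values of `S` and `C` are random placeholders, not data from the study.

```python
import numpy as np

rng = np.random.default_rng(0)

m, k, n = 31, 3, 12  # genes, cell types, mixtures

# Placeholder signature matrix S: expression of each gene in each cell type.
S = rng.uniform(0.0, 10.0, size=(m, k))

# Placeholder concentration matrix C: each column holds the cell-type
# proportions of one mixture and is scaled to sum to 1.
C = rng.uniform(0.0, 1.0, size=(k, n))
C /= C.sum(axis=0, keepdims=True)

# Each column of X is a proportion-weighted sum of the cell-type signatures.
X = S @ C

print(X.shape)
```

Deconvolution runs this relationship backwards: X is observed, and S, C, or both are inferred.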

Page 6:

Previous Work

1. Coupled Deconvolution
  • Given: X; Infer: S, C
  • NMF: Repsilber, BMC Bioinformatics, 2010
  • Minimum polytope: Schwartz, BMC Bioinformatics, 2010

2. Estimation of Mixing Proportions
  • Given: X, S; Infer: C
  • Quadratic programming: Gong, PLoS One, 2012
  • LDA: Qiao, PLoS Comp Bio, 2012

3. Estimation of Expression Signatures
  • Given: X, C; Infer: S
  • csSAM: Shen-Orr, Nature Brief Com, 2010

Page 7:

Single-cell Assisted Deconvolution

Given: X and single-cell qPCR data
Infer: S, C
Approach:

1. Identify cell types and estimate the reduced signature matrix $\hat{S}$ using single-cell qPCR data
  • Outlier removal
  • K-means clustering followed by averaging

2. Estimate mixing proportions $\hat{C}$ using $\hat{S}$
  • Quadratic programming, 1 mixture at a time

3. Estimate full expression signature matrix S using $\hat{C}$
  • Quadratic programming, 1 gene at a time

Page 8:

Step 1: Outlier Removal + Clustering

[Figure panels: unfiltered, filtered]

Remove cells whose maximum Pearson correlation to any other cell is below 0.95
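A minimal sketch of this step, assuming single-cell profiles are stored as a cells × genes NumPy array. The 0.95 threshold comes from the slide; the function names, the farthest-first initialization of k-means, and the synthetic demo data are assumptions made for illustration and are not taken from the deck.

```python
import numpy as np

def remove_outliers(profiles, threshold=0.95):
    """Keep cells whose best Pearson correlation with another cell is >= threshold."""
    corr = np.corrcoef(profiles)       # cell-by-cell Pearson correlation
    np.fill_diagonal(corr, -np.inf)    # ignore each cell's correlation with itself
    keep = corr.max(axis=1) >= threshold
    return profiles[keep], keep

def farthest_first_centers(profiles, k):
    """ASSUMPTION: deterministic farthest-first initialization (not from the deck)."""
    centers = [profiles[0]]
    for _ in range(1, k):
        d = np.min([((profiles - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(profiles[d.argmax()])
    return np.array(centers)

def assign(profiles, centers):
    """Index of the nearest center (squared Euclidean distance) for every cell."""
    dists = ((profiles[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def kmeans_signatures(profiles, k, n_iter=100):
    """Plain Lloyd's k-means; returns (genes x k) cluster means and cell labels."""
    centers = farthest_first_centers(profiles, k)
    for _ in range(n_iter):
        labels = assign(profiles, centers)
        new_centers = np.array([
            profiles[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers.T, assign(profiles, centers)

# Synthetic demo: 3 cell types with 30 cells each, plus 2 unrelated outlier cells.
rng = np.random.default_rng(0)
prototypes = rng.uniform(0.0, 10.0, size=(3, 31))
true_labels = np.repeat(np.arange(3), 30)
cells = prototypes[true_labels] + rng.normal(0.0, 0.1, size=(90, 31))
outliers = rng.uniform(0.0, 10.0, size=(2, 31))
profiles = np.vstack([cells, outliers])

filtered, keep = remove_outliers(profiles)
S_hat, labels = kmeans_signatures(filtered, k=3)
print(filtered.shape, S_hat.shape)
```

Averaging the cells in each cluster gives one column of the reduced signature matrix per cell type, which is the input to Step 2.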

Page 9:

Step 2: Estimate Mixture Proportions

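For a given mixture i:

$$\min_{c} \; \lVert \hat{S} c - x \rVert^2 \quad \text{s.t.} \quad \sum_{l=1}^{k} c_l = 1, \quad c_l \ge 0 \;\; \forall\, l = 1 \dots k$$

where $c_l = C_{l,i}$ for $l = 1 \dots k$ and $x_j = X_{j,i}$ for $j = 1 \dots m$.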
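The constrained least-squares problem on this slide can be sketched without a dedicated QP solver. The example below swaps in projected gradient descent onto the probability simplex, which targets the same objective and constraints for small k; it is a stand-in, not the solver used by the authors. The function names and the synthetic `S_hat` and `c_true` are assumptions for illustration.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {c : c >= 0, sum(c) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)

def estimate_proportions(S_hat, x, n_iter=2000):
    """Minimize ||S_hat @ c - x||^2 subject to c >= 0 and sum(c) = 1."""
    k = S_hat.shape[1]
    c = np.full(k, 1.0 / k)  # start from equal proportions
    # Step size 1/L, with L = 2 * sigma_max^2 the gradient's Lipschitz constant.
    step = 1.0 / (2.0 * np.linalg.norm(S_hat, 2) ** 2)
    for _ in range(n_iter):
        grad = 2.0 * S_hat.T @ (S_hat @ c - x)
        c = project_to_simplex(c - step * grad)
    return c

# Synthetic check: build one noiseless mixture and recover its proportions.
rng = np.random.default_rng(1)
S_hat = rng.uniform(0.0, 10.0, size=(31, 3))
c_true = np.array([0.2, 0.5, 0.3])
x = S_hat @ c_true

c_est = estimate_proportions(S_hat, x)
print(np.round(c_est, 3))
```

Calling `estimate_proportions` once per column of X fills in the concentration matrix one mixture at a time, as the approach describes.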

Page 10:

Step 3: Estimating Full Expression Signatures

[Figure: matrix diagram of X (genes × mixtures), S (genes × cell types), and C (cell types × mixtures)]

Now solve:

$$\min_{s} \; \lVert s C - x \rVert^2$$

• s: signature of the new gene to estimate
• C: known from step 2
• x: observed signals from the new gene
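A minimal sketch of the per-gene estimate, assuming C is a k × n array and x holds one gene's signal across the n mixtures. It solves the unconstrained objective exactly as written on the slide using ordinary least squares. The deck describes this step as quadratic programming, so any extra constraints the authors may have used (for example non-negativity of s) are not reproduced here. The function name and demo values are hypothetical.

```python
import numpy as np

def estimate_gene_signature(C, x):
    """Least-squares solution of min_s ||s @ C - x||^2 for one gene.

    C: (k, n) mixing proportions; x: (n,) observed signal of the gene.
    """
    # s @ C = x is equivalent to C.T @ s = x, a standard least-squares system.
    s, *_ = np.linalg.lstsq(C.T, x, rcond=None)
    return s

# Synthetic check with k = 3 cell types and n = 12 mixtures.
rng = np.random.default_rng(2)
C = rng.uniform(0.0, 1.0, size=(3, 12))
C /= C.sum(axis=0, keepdims=True)

s_true = np.array([4.0, 0.5, 9.0])   # hypothetical per-cell-type expression
x = s_true @ C                        # noiseless signal of the new gene

s_est = estimate_gene_signature(C, x)
print(np.round(s_est, 3))

# Stacking per-gene estimates gives a full signature matrix, one row per gene.
X_new = np.vstack([x, 2.0 * x])
S_full = np.vstack([estimate_gene_signature(C, row) for row in X_new])
```

Because C is fixed after Step 2, every additional gene measured in the mixtures can be processed independently.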

Page 11:

Experimental Design

Simulated Concentrations
• Sample uniformly at random from [0,1]
• Scale column sum to 1

Simulated Mixtures
• Choose single cells randomly with replacement from each cluster
• Sum to generate mixture

Single Cell Profiles
• 92 profiles
• 31 genes

Actual Mixtures
• 12 mixtures
• 31 genes

Dimensions
• k = 3
• m = 31
• n = 92, 12
• # mixtures = {10…300}
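The simulation described on this slide could be sketched as follows. The deck does not say how many cells are drawn from each cluster, so this example assumes the count is proportional to the simulated concentration, controlled by a hypothetical `cells_per_mixture` parameter. The single-cell profiles and cluster labels are random placeholders.

```python
import numpy as np

def simulate_concentrations(k, n, rng):
    """Sample a k x n matrix uniformly from [0, 1] and scale each column to sum to 1."""
    C = rng.uniform(0.0, 1.0, size=(k, n))
    return C / C.sum(axis=0, keepdims=True)

def simulate_mixture(profiles, labels, c, cells_per_mixture, rng):
    """Sum single-cell profiles drawn with replacement from each cluster.

    ASSUMPTION: cluster l contributes round(c[l] * cells_per_mixture) cells.
    """
    mixture = np.zeros(profiles.shape[1])
    for l, frac in enumerate(c):
        members = profiles[labels == l]
        n_draw = int(round(frac * cells_per_mixture))
        if n_draw == 0:
            continue
        picks = rng.integers(0, len(members), size=n_draw)  # with replacement
        mixture += members[picks].sum(axis=0)
    return mixture

rng = np.random.default_rng(3)
profiles = rng.uniform(0.0, 10.0, size=(92, 31))  # placeholder: 92 cells x 31 genes
labels = np.arange(92) % 3                        # placeholder cluster labels, k = 3

n_mixtures = 10
C_sim = simulate_concentrations(3, n_mixtures, rng)
X_sim = np.column_stack([
    simulate_mixture(profiles, labels, C_sim[:, i], 100, rng)
    for i in range(n_mixtures)
])
print(X_sim.shape)
```

Since the true concentrations are known by construction, the estimates from Step 2 can be scored against `C_sim` for any number of simulated mixtures.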

Page 12:

Data Processing

RT-qPCR

• CT values are the cycle in which the gene was detected

• Relative normalization to housekeeping genes

• Housekeeping genes
  • gapdh, bactin1
  • geometric mean
  • Vandesompele, 2002

• dCT(x) = geometric mean - CT(x)
• expression(x) = 2^dCT(x)
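The normalization above amounts to a few lines of arithmetic. In this sketch the housekeeping gene names come from the slide, while the function name, the target gene names, and the CT values are made up for illustration.

```python
from statistics import geometric_mean

HOUSEKEEPING = ("gapdh", "bactin1")

def relative_expression(ct):
    """Map one sample's CT values (gene -> CT) to relative expression 2^dCT."""
    # Reference level: geometric mean of the housekeeping genes' CT values.
    reference = geometric_mean([ct[g] for g in HOUSEKEEPING])
    return {
        gene: 2.0 ** (reference - value)   # dCT(x) = geometric mean - CT(x)
        for gene, value in ct.items()
        if gene not in HOUSEKEEPING
    }

# Hypothetical CT values for one sample; the reference is sqrt(16 * 25) = 20.
sample = {"gapdh": 16.0, "bactin1": 25.0, "gene_a": 20.0, "gene_b": 22.0}
expr = relative_expression(sample)
print({gene: round(value, 3) for gene, value in expr.items()})
# prints {'gene_a': 1.0, 'gene_b': 0.25}
```

A gene detected two cycles later than the reference comes out at one quarter of the reference level, which matches the doubling-per-cycle assumption behind 2^dCT.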

Page 13:

Accuracy of Inferred Mixing Proportions

Page 14:

Concentration Matrix: Concordance

[Figure: concordance plot; axis label "predicted"]

Page 15:

Leave-one-out Accuracy of Inferred Gene Expression Signatures

Page 16:

Future Work

• Apply gene signature estimation technique using more genes in mixed samples

• Identify PSM-Pr signature

• Confirm the anatomical location of the putative PSM-Pr cell population through exhaustive ISH

Page 17:

Conclusion

Special Thanks to:
• Ion Mandoiu
• Craig Nelson
• Caroline Jakuba
• Mathew Gajdosik
