One-class Differential Expression Analysis using Tensor Decomposition-based Unsupervised Feature Extraction Applied to Integrated Analysis of Multiple Omics Data from 26 Lung Adenocarcinoma Cell Lines

Y-h. Taguchi, Department of Physics, Chuo University, Tokyo, Japan.


Purpose:

Characterize cancer cell lines (CCLs) themselves, without comparison to any reference.

Reasons:

1. CCLs often differ from the tumors from which they were derived, although CCLs are often used as representatives of those tumors.

2. Most studies using CCLs focus on the comparison between treated and control CCLs, not on the CCLs themselves.


Methods:

Collect 26 non-small cell lung (NSCL) CCLs from DBTSS(*) and identify the genes commonly expressed among the 26 NSCL CCLs.

But what is the definition of "commonly expressed" if the references are missing?

(*) dbtss.hgc.jp: Database of Transcription Start Sites. It now includes more data sets, such as histone modification, promoter methylation, RNA-seq, long read, and single-cell RNA-seq.


Solutions:

Use the recently proposed tensor decomposition (TD)-based unsupervised feature extraction (FE).

Q: What is TD?

A: An extension of matrix factorization to tensors.

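$$x_{ijk} = \sum_{l_1, l_2, l_3} G(l_1, l_2, l_3)\, x_{l_1 i}\, x_{l_2 j}\, x_{l_3 k}$$

[Figure: the tensor $x_{ijk}$ (indices i, j, k) decomposed into the core tensor $G$ (indices l1, l2, l3) and the singular value matrices $x_{l_1 i}$, $x_{l_2 j}$, $x_{l_3 k}$.]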
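The decomposition above can be sketched in a few lines of NumPy. This is only an illustration under an assumption: the slide does not say which algorithm computes the TD, so the sketch uses higher-order SVD (HOSVD), which yields a core tensor and one matrix of singular value vectors per mode. The function names `hosvd3` and `reconstruct3` and the toy tensor size are hypothetical.

```python
import numpy as np

def hosvd3(x):
    """HOSVD of a 3-mode tensor x[i, j, k] (assumed algorithm, not stated on the slide).

    Returns the core tensor G and three matrices whose columns are the
    singular value vectors of each mode, so that u1[i, l1] plays the role
    of x_{l1 i} in the formula above.
    """
    factors = []
    for mode in range(3):
        # Unfold: rows index the chosen mode, columns index the other two modes.
        unfolded = np.moveaxis(x, mode, 0).reshape(x.shape[mode], -1)
        u, _, _ = np.linalg.svd(unfolded, full_matrices=False)
        factors.append(u)
    u1, u2, u3 = factors
    # G(l1, l2, l3) = sum_{ijk} x_ijk x_{l1 i} x_{l2 j} x_{l3 k}
    core = np.einsum("ijk,ia,jb,kc->abc", x, u1, u2, u3, optimize=True)
    return core, u1, u2, u3

def reconstruct3(core, u1, u2, u3):
    """Evaluate x_ijk = sum_{l1 l2 l3} G(l1, l2, l3) x_{l1 i} x_{l2 j} x_{l3 k}."""
    return np.einsum("abc,ia,jb,kc->ijk", core, u1, u2, u3, optimize=True)

# Toy tensor with arbitrary (hypothetical) sizes.
rng = np.random.default_rng(0)
x = rng.normal(size=(6, 5, 4))
core, u1, u2, u3 = hosvd3(x)
max_error = np.abs(reconstruct3(core, u1, u2, u3) - x).max()
```

Note that the code uses 0-based indices, so the slide's l1 = 1 corresponds to column 0 of `u1`.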


Overall plan of this study

Proposed methods

Synthetic data sets

Real data sets

Biological validations of selected genes


TD-based unsupervised FE applied to a synthetic data set

[Figure: synthetic tensor with axes genes, cell lines, and omics (Omics A, Omics B, Omics C). Genes are grouped as commonly expressed, omics-specific, or cell-line-specific. Axis labels: genes, expression.]

Task: identify the "commonly expressed" genes


Results of TD:

[Figure: gene singular value vectors $x_{l_1 i}$ (i = 1, ..., 104 genes) for l1 = 1 and l1 = 5, with genes marked as commonly expressed, omics-specific, or cell-line-specific; cell line singular value vector $x_{l_2 j}$ (j = 1, ..., 20 cell lines) for l2 = 1.]

Annotation on the l2 = 1 panel: cell-line-independent expression
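The synthetic experiment can be imitated with a small script. Everything numeric here is an assumption made for illustration (200 genes, 20 cell lines, 3 omics, groups of 20 genes, Gaussian noise with standard deviation 0.1); the slides do not give the actual generative model. The selection rule is also a sketch: pick the cell line and omics singular value vectors closest to a constant vector, find the gene singular value vector most strongly coupled to them through the core tensor, and rank genes by the magnitude of their components. HOSVD is assumed as the decomposition.

```python
import numpy as np

# Hypothetical sizes and noise level; not taken from the slides.
N_GENES, N_CELL_LINES, N_OMICS, GROUP = 200, 20, 3, 20

def make_synthetic(seed=0):
    """Noise plus three gene groups: common, omics-specific, cell-line-specific."""
    rng = np.random.default_rng(seed)
    x = rng.normal(scale=0.1, size=(N_GENES, N_CELL_LINES, N_OMICS))
    x[0:GROUP, :, :] += 1.0                               # commonly expressed
    x[GROUP:2 * GROUP, :, 0] += 1.0                       # only in the first omics
    x[2 * GROUP:3 * GROUP, :N_CELL_LINES // 2, :] += 1.0  # only in half of the cell lines
    return x

def singular_vectors(x, mode):
    """Left singular vectors of the mode-wise unfolding (HOSVD factor matrix)."""
    unfolded = np.moveaxis(x, mode, 0).reshape(x.shape[mode], -1)
    return np.linalg.svd(unfolded, full_matrices=False)[0]

def most_constant(u):
    """Column of u closest, up to sign, to a constant vector (columns have unit norm)."""
    return int(np.argmax(np.abs(u.sum(axis=0))))

def select_common_genes(x, n_select):
    u_gene, u_cell, u_omics = (singular_vectors(x, m) for m in range(3))
    core = np.einsum("ijk,ia,jb,kc->abc", x, u_gene, u_cell, u_omics, optimize=True)
    l2 = most_constant(u_cell)    # cell-line-independent component
    l3 = most_constant(u_omics)   # omics-independent component
    l1 = int(np.argmax(np.abs(core[:, l2, l3])))  # gene vector coupled to both
    ranking = np.argsort(-np.abs(u_gene[:, l1]))
    return np.sort(ranking[:n_select]), (l1, l2, l3)

selected, (l1, l2, l3) = select_common_genes(make_synthetic(), GROUP)
```

In this toy setting the commonly expressed group occupies gene indices 0 to 19, so `selected` can be compared directly against that range. Indices are 0-based, so the slide's l2 = 1 corresponds to `l2 == 0`.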


Overall plan of this study: Real data sets


TD-based unsupervised FE applied to the real data set

26 lung adenocarcinoma cell lines (*)

Three omics: TSS-seq, RNA-seq, and ChIP-seq (H3K27ac)

[Figure: diagram relating the three omics, with the labels "coincidence" and "regulation".]

Target: commonly expressed genes, independent of omics and cell lines

(*) dbtss.hgc.jp


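For chromosomes 1-22, X, and Y, each of the 3 omics is summed within intervals of 25,000 bp along each chromosome.

TD is then applied to the resulting tensor:

$$x_{jki} \in \mathbb{R}^{(3\ \text{omics}) \times (26\ \text{cell lines}) \times (\text{intervals})}$$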
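The interval summation can be sketched as follows. The input layout is an assumption, since the slides do not describe the file format: each omics and cell line pair is taken to provide 0-based genomic positions with a signal value at each position, all positions lying below the chromosome length. The names `bin_signal` and `build_tensor` are hypothetical.

```python
import numpy as np

BIN_SIZE = 25_000  # interval length in bp, as stated on the slide

def bin_signal(positions, values, chrom_length, bin_size=BIN_SIZE):
    """Sum `values` within fixed-length intervals along one chromosome.

    Assumes 0-based positions smaller than `chrom_length` (hypothetical input format).
    """
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    bins = np.asarray(positions, dtype=np.int64) // bin_size
    weights = np.asarray(values, dtype=float)
    return np.bincount(bins, weights=weights, minlength=n_bins)

def build_tensor(signals, chrom_length):
    """Stack binned signals into x[j, k, i]: omics j, cell line k, interval i.

    `signals[j][k]` is a (positions, values) pair for omics j and cell line k.
    """
    return np.array([[bin_signal(p, v, chrom_length) for (p, v) in per_omics]
                     for per_omics in signals])

# Toy input: 3 omics x 2 cell lines on a 100,000 bp chromosome (made-up numbers).
toy = [[([0, 25_000], [1.0, 1.0]), ([10], [5.0])] for _ in range(3)]
x = build_tensor(toy, chrom_length=100_000)
```

For the real data, one such tensor per chromosome could be concatenated along the interval axis before decomposition; whether the original analysis did this per chromosome or genome-wide is not stated on the slide.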


[Figure: the first cell line singular value vector across the 26 lung adenocarcinoma cell lines, for chromosomes 1-22, X, and Y.]


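[Figure: core tensor $G(l_1, l_2, l_3)$, linking the omics singular value vectors, the cell line singular value vectors, and the interval singular value vectors.]

Interval singular value vectors associated with omics-independent expression: $x_{l_3 i}$, $l_3 = 1, 2, 3$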


[Figure: interval singular value vectors $x_{l_3 i}$ for $l_3 = 1$, $l_3 = 2$, and $l_3 = 3$.]

Outliers = intervals associated with cell-line-independent expression

2703 Entrez gene IDs are included in the outlier intervals.
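The slides do not state how outlier intervals are defined, so the following is only one plausible sketch. It assumes that the components of the three interval singular value vectors are roughly Gaussian for non-outlier intervals, scores each interval by the sum of squared standardized components (chi-squared with 3 degrees of freedom under that assumption), and keeps intervals whose Benjamini-Hochberg adjusted P-value falls below a hypothetical threshold of 0.01. The function names and the threshold are illustrative.

```python
import math
import numpy as np

def chi2_sf_df3(x):
    """Upper-tail probability of a chi-squared variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)

def benjamini_hochberg(p):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    p = np.asarray(p, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity from the largest P-value downwards.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted

def select_outliers(u, alpha=0.01):
    """u has shape (3, n_intervals): the three interval singular value vectors.

    Returns indices of intervals flagged as outliers (assumed criterion).
    """
    z = u / u.std(axis=1, keepdims=True)   # standardize each vector
    stat = (z ** 2).sum(axis=0)            # one score per interval
    p = np.array([chi2_sf_df3(s) for s in stat])
    return np.flatnonzero(benjamini_hochberg(p) < alpha)

# Toy check: Gaussian background with 10 planted outlier intervals.
rng = np.random.default_rng(1)
u = rng.normal(size=(3, 2000))
u[:, :10] = 10.0
outliers = select_outliers(u)
```

Genes would then be collected from the flagged intervals by genomic coordinate, which requires an annotation table not shown here.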


Overall plan of this study: Biological validation of selected genes


Biological validation of selected genes

Q1: Are the genes NSCLC specific?

A1: Yes (MSigDB).

[Venn diagram: selected genes and the NSCLC gene set, with region counts 78, 100, and 2625.]

Q2: Are the genes lung specific?

A2: No, but they are glandular-cell specific (g:Profiler).

[Figure: g:Profiler results for tissue specificity (lung) and cell type specificity (glandular cells).]


GO term / Reactome enrichment

SRP-dependent cotranslational protein targeting to membrane

nuclear-transcribed mRNA catabolic process, nonsense-mediated decay (NMD)

SRP and NMD are often reported as cancer-causing factors.

For further enrichment analyses, including PPI and TF binding, see the supporting information of the paper.


Conclusions

TD-based unsupervised FE was applied to a synthetic data set and to a real (multi-omics) data set.

On the synthetic data set, TD-based unsupervised FE successfully identified the commonly expressed genes.

On the real data set, TD-based unsupervised FE successfully identified biologically reasonable genes.


Future directions

Since DBTSS was recently updated (Sep. 2017), it now has more data sets to which TD-based unsupervised FE can be applied.

I am looking for someone who can provide more data sets to which TD-based unsupervised FE can be applied (e.g., paired multi-omics measurements from in vitro studies).