63
Beyond matrices: statistical method for higher-order tensors and its application. II Miaoyan Wang University of Wisconsin, Madison Fudan University, July, 2019

Beyond matrices: statistical method for higher-order …pages.stat.wisc.edu/~miaoyan/tensor_tutorial2_fudan.pdfBeyond matrices: statistical method for higher-order tensors and its

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Beyond matrices: statistical method for higher-order …pages.stat.wisc.edu/~miaoyan/tensor_tutorial2_fudan.pdfBeyond matrices: statistical method for higher-order tensors and its

Beyond matrices: statistical method for higher-ordertensors and its application. II

Miaoyan Wang

University of Wisconsin, Madison

Fudan University,

July, 2019

Page 2: Beyond matrices: statistical method for higher-order …pages.stat.wisc.edu/~miaoyan/tensor_tutorial2_fudan.pdfBeyond matrices: statistical method for higher-order tensors and its

gene expression low-rank binary tensor tensor block model Conclusions and future work

Talk outline

Prohibitive Computational Complexity

Most higher-order tensor problems are NP-hard [Hillar & Lim, 2013].

Topics I will address:

I Application of tensor decomposition to genetics

I Low-rank tensor estimation from binary data

I Multiway clustering via tensor block models

1 / 51

Page 3: Beyond matrices: statistical method for higher-order …pages.stat.wisc.edu/~miaoyan/tensor_tutorial2_fudan.pdfBeyond matrices: statistical method for higher-order tensors and its

gene expression low-rank binary tensor tensor block model Conclusions and future work

Applications of tensor decomposition to biology

I Many biomedical datasets come naturally in a multiway form, e.g.,patients × genes × experimental conditions.

I For example, time-series gene expression measurements could be orga-nized as a order-3 tensor A = JagstK ∈ RN×S×T .

gene

time

individual

I g = 1, . . . ,N indexes (often thou-sands of) the genes

I s = 1, . . . ,S indexes the individu-als/samples

I t = 1, . . . ,T indexes the timepoints.

I Typically, N � ST .

Goal

Identify subsets of genes that are similarly expressed within subsets of indi-viduals, and learn the time trajectories of the associated expression level.

2 / 51

Page 4: Beyond matrices: statistical method for higher-order …pages.stat.wisc.edu/~miaoyan/tensor_tutorial2_fudan.pdfBeyond matrices: statistical method for higher-order tensors and its

gene expression low-rank binary tensor tensor block model Conclusions and future work

Time-series gene expression

We simulate tensorial data A ∈ RN×S×T for time-series gene expression:

I Given a time t, the matrix slide A(·, ·, t) are transposable (i.e., boththe gene and sample indexes are permutable)⇒ simulate a “checkerboard” pattern for the latent bicluster member-ships at the initial time.

I Given a gene g and a sample s, the trajectory A(g , s, ·) is temporallystructured (i.e., the time indexes are meaningful)⇒ for each bicluster, select a time trajectory from a library of candidatefunctions, such as sigmoid, sines, cosines, etc.

time

gene

individual

timegene

individual

3 / 51

Page 5: Beyond matrices: statistical method for higher-order …pages.stat.wisc.edu/~miaoyan/tensor_tutorial2_fudan.pdfBeyond matrices: statistical method for higher-order tensors and its

gene expression low-rank binary tensor tensor block model Conclusions and future work

Tensor Applications to Time-Series Gene Expression Data

I Data (denoted A): noisy time-series gene expression with shuffled geneand sample indexes.

I Goal: Identify subsets of genes that are similarly expressed within sub-sets of individuals, and learn the time trajectories of the associatedexpression level.

We propose to

1 first perform truncated CP decomposition of A via two-mode HOSVDalgorithm:

gene

individual

time

2 then perform K-mean clusterings on the gene factor matrixG = [g1, . . . , gk ] and the sample factor matrix S = [s1, . . . , sk ].

4 / 51

Page 6: Beyond matrices: statistical method for higher-order …pages.stat.wisc.edu/~miaoyan/tensor_tutorial2_fudan.pdfBeyond matrices: statistical method for higher-order tensors and its

gene expression low-rank binary tensor tensor block model Conclusions and future work

Comparison to matrix-based clustering

Tensor-based clustering outperforms the matrix-based clustering.Matrix-based clustering:

I PCA on the time-averaged matrix M1 = 1T

∑Tt=1 A(·, ·, t)

I PCA on the sample-averaged matrix M2 = 1S

∑Ss=1 A(·, s, ·)

5 / 51

Page 7: Beyond matrices: statistical method for higher-order …pages.stat.wisc.edu/~miaoyan/tensor_tutorial2_fudan.pdfBeyond matrices: statistical method for higher-order tensors and its

gene expression low-rank binary tensor tensor block model Conclusions and future work

Comparison to HOSVD-based clustering

In our simulations, clustering based on CP decomposition is more accuratethan that based on Tucker decomposition/HOSVD.

HOSVD:

gene

individual

time

CP decomposition:

gene

individual

time

●●

5 10 15 20

0.10

0.15

0.20

0.25

Classification Error Rate of Gene Clustering

Model Complexity (# of sample groups among 20 samples)

clas

sific

atio

n er

ror

rate

(C

ER

)

● CP Decomposition via Two−mode HOSVD

Tucker/HOSVD Decomposition

6 / 51

Page 8: Beyond matrices: statistical method for higher-order …pages.stat.wisc.edu/~miaoyan/tensor_tutorial2_fudan.pdfBeyond matrices: statistical method for higher-order tensors and its

gene expression low-rank binary tensor tensor block model Conclusions and future work

Multi-tissue gene expression analysis

I Central dogma of genetics: DNAtranscription−−−−−−−→ RNA

translation−−−−−−→ protein.I GTEx RNA-seq data: expression profiles (∼ 25, 000 genes) measured

from 544 individuals across 53 tissues.

gene

individual

individual

gene

Genotype-tissue expression (GTEx) project, Aguet et al. Nature (2017) 550, 204-213.

7 / 51

Page 9: Beyond matrices: statistical method for higher-order …pages.stat.wisc.edu/~miaoyan/tensor_tutorial2_fudan.pdfBeyond matrices: statistical method for higher-order tensors and its

gene expression low-rank binary tensor tensor block model Conclusions and future work

Review: Matrix SVD for biclustering

=

samples

fea

ture

s

I Columns of U describe patterns across samples

I Columns of VT describe patterns across genesY Kluger et al, Genome Research (2003). 13(4): 703-71

Data Science Specialization (COURSERA) by Brian Caffo and Jeff Leek

8 / 51

Page 10: Beyond matrices: statistical method for higher-order …pages.stat.wisc.edu/~miaoyan/tensor_tutorial2_fudan.pdfBeyond matrices: statistical method for higher-order tensors and its

gene expression low-rank binary tensor tensor block model Conclusions and future work

Multi-tissue gene expression analysis

We have developed a semi-nonnegative tensor decomposition for three-wayclustering of multi-tissue multi-individual gene expression data.

entriwise

Wang et al., under review (2017). doi.org/10.1101/229245

9 / 51

Page 11: Beyond matrices: statistical method for higher-order …pages.stat.wisc.edu/~miaoyan/tensor_tutorial2_fudan.pdfBeyond matrices: statistical method for higher-order tensors and its

gene expression low-rank binary tensor tensor block model Conclusions and future work

Performance comparison

Our algorithm identifies 3-way clusters with higher accuracy compared toexisting methods.

0.10

0.20

0.30

rela

tive

err

or

0.6 0.8 1.0 1.2 1.4

HOSVD

5 5.5 6 6.5 7

0.20

0.25

0.30

0.35

rela

tive

err

or

our method

our method

SDA: sparse decomposition of array (Hore et al. Nat. Gen. 2016).HOSVD: higher-order singular value decomposition; Tucker decomposition (Omberg et al. PNAS. 2007).

details10 / 51

Page 12: Beyond matrices: statistical method for higher-order …pages.stat.wisc.edu/~miaoyan/tensor_tutorial2_fudan.pdfBeyond matrices: statistical method for higher-order tensors and its

gene expression low-rank binary tensor tensor block model Conclusions and future work

Component I: shared, global expression

I Both tissue- and individual-loadings are essentially flat ⇒ this compo-nent captures global expression common to all samples.

a. Tissue Loadings (Component 1)

c. Individual Loadings (Component 1)

ranked individuals

b. Gene Loadings (Component 1)

loa

din

g

0.04

0.02

0.00

loa

din

g

0.000

0.010

0.015

loa

din

g

0.00

0.05

0.10

0.15 brain

blood

artery

esophagus

skin

heart

digestion

others

adipos

0.005

ranked genes

11 / 51

Page 13: Beyond matrices: statistical method for higher-order …pages.stat.wisc.edu/~miaoyan/tensor_tutorial2_fudan.pdfBeyond matrices: statistical method for higher-order tensors and its

gene expression low-rank binary tensor tensor block model Conclusions and future work

Component I: shared, global expression

I Both tissue- and individual-loadings are essentially flat ⇒ this compo-nent captures global expression common to all samples.

a. Tissue Loadings (Component 1)

c. Individual Loadings (Component 1)

ranked individuals

b. Gene Loadings (Component 1)

loa

din

g

0.04

0.02

0.00

loa

din

g

0.000

0.010

0.015

loa

din

g

0.00

0.05

0.10

0.15 brain

blood

artery

esophagus

skin

heart

digestion

others

adipos

0.005

ranked genes

11 / 51

Page 14: Beyond matrices: statistical method for higher-order …pages.stat.wisc.edu/~miaoyan/tensor_tutorial2_fudan.pdfBeyond matrices: statistical method for higher-order tensors and its

gene expression low-rank binary tensor tensor block model Conclusions and future work

Component I: shared, global expression

I Top genes are mainly mitochondrial genes (15/20 top genes); othernon-mitochondrial genes include ACTB, EEF1A1, and EEF2.

loa

din

g

0.000

0.015

0 20 40 60

2.5e−11

5.0e−11

7.5e−11

1.0e−10

SRP-dependent cotraslational protein targeting to membrane

d. Enriched gene ontologies (GOs) among top genes (Component 1)

cotraslational protein targeting to membrane

protein targeting to endoplasmic reticulum

ribosomal large subunit biogenesis

cytoplasmic translation

top 200 genes

Benjamini-Hochberg

adjusted pvalue

0.010

Number of top genes in the GO

b. Gene Loadings (Component 1)

0.005

12 / 51

Page 15: Beyond matrices: statistical method for higher-order …pages.stat.wisc.edu/~miaoyan/tensor_tutorial2_fudan.pdfBeyond matrices: statistical method for higher-order tensors and its

gene expression low-rank binary tensor tensor block model Conclusions and future work

Component II: brain tissues

• Tissue vector clearly separates brain tissues from non-brain tissues.

a. Tissue Loadings (Component 2)

c. Individual Loadings (Component 2)

ranked individuals

loa

din

g

0.04

0.02

0.00

loa

din

g

-0.03

0.00

0.03

load

ing

0.00

0.10

0.20

0.30brain

blood

artery

esophagus

skin

heart

digestion

adipos

b. Gene Loadings (Component 2)

ranked genes

13 / 51

Page 16: Beyond matrices: statistical method for higher-order …pages.stat.wisc.edu/~miaoyan/tensor_tutorial2_fudan.pdfBeyond matrices: statistical method for higher-order tensors and its

gene expression low-rank binary tensor tensor block model Conclusions and future work

Component II: brain tissues

• Tissue vector clearly separates brain tissues from non-brain tissues.

a. Tissue Loadings (Component 2)

c. Individual Loadings (Component 2)

ranked individuals

loa

din

g

0.04

0.02

0.00

loa

din

g

-0.03

0.00

0.03

load

ing

0.00

0.10

0.20

0.30brain

blood

artery

esophagus

skin

heart

digestion

adipos

b. Gene Loadings (Component 2)

ranked genes

details

13 / 51

Page 17: Beyond matrices: statistical method for higher-order …pages.stat.wisc.edu/~miaoyan/tensor_tutorial2_fudan.pdfBeyond matrices: statistical method for higher-order tensors and its

gene expression low-rank binary tensor tensor block model Conclusions and future work

Component II: brain tissues

I Enriched GOs are mostly related to glutamate receptor signaling path-way and chemical synaptic transmission.

0 10 20 30

glutamate receptor signaling pathway

d. Enriched GOs among top genes (Component 2)

chemical synaptic transmission, postsynpatic

excitotary postsynaptic potential

memory

synaptic vesicle exocytosis

3e−14

6e−14

9e−14

B-H corrected p-value

Number of top genes in the GO

loa

din

g

-0.03

0.00

0.03 top 899 genes

b. Gene Loadings (Component 2)

ranked genes

14 / 51

Page 18: Beyond matrices: statistical method for higher-order …pages.stat.wisc.edu/~miaoyan/tensor_tutorial2_fudan.pdfBeyond matrices: statistical method for higher-order tensors and its

gene expression low-rank binary tensor tensor block model Conclusions and future work

Component II: brain tissues

I Age explains more variation (24.4%) than gender (0.3%) or ethnicity(4.3%).

d.

c. Individual Loadings (Component 2)

ranked individuals

loa

din

g0.04

0.02

0.00

I How to pinpoint the age-related genes?

15 / 51

Page 19: Beyond matrices: statistical method for higher-order …pages.stat.wisc.edu/~miaoyan/tensor_tutorial2_fudan.pdfBeyond matrices: statistical method for higher-order tensors and its

gene expression low-rank binary tensor tensor block model Conclusions and future work

Tensor projection to detect differentially-expressed genes

I We incorporate information across tissues by projecting the tensorthrough the tissue-vector.

I For a given test gene, we perform the following regression analysis

A(test gene, ·,Ti ) = Wβ0 + Xβ1 + ε,

where W is the covariate, and X is the primary variable of interest, andε ∼ N(0, I). Consider H0 : β1 = 0 vs. Hα : β1 6= 0.

indivdiuals

ge

ne

s

tissues

gene to be tested

The projected gene-by-individual matrix:

16 / 51

Page 20: Beyond matrices: statistical method for higher-order …pages.stat.wisc.edu/~miaoyan/tensor_tutorial2_fudan.pdfBeyond matrices: statistical method for higher-order tensors and its

gene expression low-rank binary tensor tensor block model Conclusions and future work

Power to detect differentially-expressed genes

I Using tensor-projection procedures, we identify 694 age-related genes,of which only 514 were detected by separate analyses.

I Receiver operating characteristic (ROC) plot from simulated expressiondata with age-related genes.

0.0 0.2 0.4 0.6 0.8 1.0

False positive rate

Tru

e p

ositi

ve r

ate

0.0

0.2

0.4

0.6

0.8

1.0

single tissue ( = 10)

Tensor projection with

r = 10 = # of tissues

r = 5

r = 3 = # of tissue groups

17 / 51

Page 21: Beyond matrices: statistical method for higher-order …pages.stat.wisc.edu/~miaoyan/tensor_tutorial2_fudan.pdfBeyond matrices: statistical method for higher-order tensors and its

gene expression low-rank binary tensor tensor block model Conclusions and future work

Gene expression in the brain

I We analyze 13 brain tissues (“subtensor”) and apply our tensor methodto this subtensor.

I Expression modules are spatially restricted to specific brain regions.

18 / 51

Page 22: Beyond matrices: statistical method for higher-order …pages.stat.wisc.edu/~miaoyan/tensor_tutorial2_fudan.pdfBeyond matrices: statistical method for higher-order tensors and its

gene expression low-rank binary tensor tensor block model Conclusions and future work

Gene expression in the brain

I We analyze 13 brain tissues (“subtensor”) and apply our tensor methodto this subtensor.

I Expression modules are spatially restricted to specific brain regions.

loa

din

g 0.6

0.4

0.2

0.0

0.8

0.4

0.0

loa

din

g 0.8

0.4

0.0

ranked tissues

loa

din

g

0.4

0.2

0.0

ranked tissues

loa

din

g

Tissue-vector (comp 3)

Tissue-vector (comp 5)

Cerebellum Cortex Basal ganglia

Others (Hippocampus, Amygdala, Spinal cord, etc)

Tissue-vector (comp 4)

ranked tissues

Tissue-vector (comp 2)Recall: i tensor component, i = 1, 2, ... th

ranked tissues

18 / 51

Page 23: Beyond matrices: statistical method for higher-order …pages.stat.wisc.edu/~miaoyan/tensor_tutorial2_fudan.pdfBeyond matrices: statistical method for higher-order tensors and its

gene expression low-rank binary tensor tensor block model Conclusions and future work

Gene expression in the brain

Table 1: Three-way clustering analysis of gene expression in the brain

Tissue Gene Individual

enriched region enriched ontologyvariance explained by

age gender ethnicity

cerebellum dorsal spinal cord development 0.0% 8.0% 0.2%cortex behavior defense response 16.7% 0.6% 1.4%

basal ganglia forebrain generation of neurons 1.3% 0.8% 1.7%others embryonic skeletal system morphogenesis 10.5% 0.7% 5.2%

Red number indicates p-value < 0.001.

19 / 51

Page 24: Beyond matrices: statistical method for higher-order …pages.stat.wisc.edu/~miaoyan/tensor_tutorial2_fudan.pdfBeyond matrices: statistical method for higher-order tensors and its

gene expression low-rank binary tensor tensor block model Conclusions and future work

Power to detect differentially-expressed genes

loa

din

g

Tissue-vector Top genes

Tensor component 8

0.6

0.4

0.2

0.0

IL1RL1, SERPINA3

etc

variance explained

age 22.1%

gender 0.6%

ethnicity 1.9%

Cerebellum Cortex Basal ganglia Others

Individual-vector

ranked tissues

I The 2nd top gene SERPINA3 is implicated in Alzheimer’s disease [Kam-

boh et al, Neurobiol Aging 2006].

I This gene has a moderate age effect in each single tissue (p rangingfrom 0.05 to 1.1× 10−6 with all effects in the same direction).

I Tensor projection method yields p = 1.0× 10−8 for the age effect ofthis gene ⇒ an increase of significance by two orders of magnitude.

20 / 51

Page 25: Beyond matrices: statistical method for higher-order …pages.stat.wisc.edu/~miaoyan/tensor_tutorial2_fudan.pdfBeyond matrices: statistical method for higher-order tensors and its

gene expression low-rank binary tensor tensor block model Conclusions and future work

Outline

I Application of tensor decomposition to genetics

I Low-rank tensor estimation from binary data

I Multiway clustering via tensor block models

20 / 51

Page 26: Beyond matrices: statistical method for higher-order …pages.stat.wisc.edu/~miaoyan/tensor_tutorial2_fudan.pdfBeyond matrices: statistical method for higher-order tensors and its

gene expression low-rank binary tensor tensor block model Conclusions and future work

Review: Matrix SVD

=

samples

fea

ture

s

I Columns of U describe patterns across samples

I Columns of VT describe patterns across genes

21 / 51

Page 27: Beyond matrices: statistical method for higher-order …pages.stat.wisc.edu/~miaoyan/tensor_tutorial2_fudan.pdfBeyond matrices: statistical method for higher-order tensors and its

gene expression low-rank binary tensor tensor block model Conclusions and future work

Tensor rank decomposition

I Tensor analogue of matrix SVD:

Y =R∑

r=1

λrar ⊗ br⊗cr ,

where λ1 ≥ λ2 ≥ . . . ≥ λR , ‖ar‖2 = ‖br‖2 = ‖cr‖2 = 1.

I Example: a rank-3 order-3 tensor.

+ +=

22 / 51

Page 28: Beyond matrices: statistical method for higher-order …pages.stat.wisc.edu/~miaoyan/tensor_tutorial2_fudan.pdfBeyond matrices: statistical method for higher-order tensors and its

gene expression low-rank binary tensor tensor block model Conclusions and future work

Low-rank approximation

I A noisy rank-R tensor:

Y︸︷︷︸observation

=R∑

r=1

λrar ⊗ br ⊗ cr︸ ︷︷ ︸signal

+ E︸︷︷︸noise

,

I Assume that R is known and the error E = JεijkK ∼ i.i.d. N(0, σ2).I Maximizing the likelihood under the i.i.d. Gaussian error model is equiv-

alent to minimizing the squared loss:

minar ,br ,cr ,λr :r∈[R]

‖Y −R∑

r=1

λrar ⊗ br ⊗ cr‖2F

I CANDECOMP/PARAFAC (CP) tensor approximation (Carroll & Chang, 1970,

Kolda & Bader, 2009, Casey et al, 2018) 23 / 51

Page 29: Beyond matrices: statistical method for higher-order …pages.stat.wisc.edu/~miaoyan/tensor_tutorial2_fudan.pdfBeyond matrices: statistical method for higher-order tensors and its

gene expression low-rank binary tensor tensor block model Conclusions and future work

Bibliography dataset in DBLP

I Three-way bibliography relationships (author, conference, keyword) ex-tracted from DBLP database.

I A tensor entry is 1 if the triplet co-occurs in a bibliography entry.I Could be organized into a tensor of the form Y ∈ {0, 1}10k×200×10k

(10k authors, 200 conferences, and 10k keywords)Conferences

24 / 51

Page 30: Beyond matrices: statistical method for higher-order …pages.stat.wisc.edu/~miaoyan/tensor_tutorial2_fudan.pdfBeyond matrices: statistical method for higher-order tensors and its

gene expression low-rank binary tensor tensor block model Conclusions and future work

Latent variable models for binary tensors

Directly applying rank-R approximation to Y = JYijkK is sub-optimal. why?(symmetry w.r.t. 0↔ 1; non-negativity; non-linearity)

"Preference" tensor

Binary tensor

I Θ is unknown. Θ has low rank.I π : R 7→ [0, 1] is a known function (e.g., the logistic curve).I Θ ∈ Rd1×d2×d3, Y ∈ {0, 1}d1×d2×d3 .

25 / 51

Page 31: Beyond matrices: statistical method for higher-order …pages.stat.wisc.edu/~miaoyan/tensor_tutorial2_fudan.pdfBeyond matrices: statistical method for higher-order tensors and its

gene expression low-rank binary tensor tensor block model Conclusions and future work

Probabilistic models for binary tensors

Latent variable formulation:

I We observe Y = JYijkK = {0, 1}d1×d2×d3 .

I Assume an underlying continuous-valued tensor Z = JZijkK:

Yijk |Zijk =

{1 if Zijk ≥ 0,

0 if Zijk < 0.

I The latent tensor Z is further modeled as:

Z = Θ︸︷︷︸signal

+ E︸︷︷︸noise

, where Rank(Θ) ≤ R,

and the entries in E = JεijkK follow i.i.d. N(0, σ2).

I Rank (Θ) = min{R ∈ N+ : Θ =∑R

r=1 a1r ⊗ · · · ⊗ ak

r }.I Assume R is known or otherwise could be estimated (e.g., using BIC).

26 / 51

Page 32: Beyond matrices: statistical method for higher-order …pages.stat.wisc.edu/~miaoyan/tensor_tutorial2_fudan.pdfBeyond matrices: statistical method for higher-order tensors and its

gene expression low-rank binary tensor tensor block model Conclusions and future work

Connection with GLM

More generally, we can model the binary tensor Y as

Y = JYijkK ∼ind. Bernoulli(π(Θ)), where Rank(Θ) ≤ R,

and π : R 7→ [0, 1] is the link function (we allow π to be applied to tensorsin a point-wise manner):

I Logistic link: π(θ) = 11+e−θ/σ

⇐⇒ latent noise E follows i.i.d. logisticdistribution with scale parameter σ.

I Probit link: π(θ) = Φ(θ/σ), where Φ(·) is the CDF for standard normaldistribution ⇐⇒ latent noise E follows i.i.d. N(0, σ2).

27 / 51

Page 33: Beyond matrices: statistical method for higher-order …pages.stat.wisc.edu/~miaoyan/tensor_tutorial2_fudan.pdfBeyond matrices: statistical method for higher-order tensors and its

gene expression low-rank binary tensor tensor block model Conclusions and future work

Low-rank tensor estimation

I Goal: estimate low-rank Θ ∈ Rd1×d2×d3 and/or π(Θ) ∈ [0, 1]d1×d2×d3 .

I Data: Y ∈ {0, 1}d1×d2×d3 .

Log-likelihood:

LY(Θ) =∑ijk

(1(Yijk=1) log π(θijk) + 1(Yijk=0) log (1− π(θijk))

).

We propose to estimate Θ using the constrained MLE:

ΘMLE = arg maxΘ∈DLY(Θ).

where D(R, α) ={

Θ ∈ Rd1×d2×d3 : rank(Θ) ≤ R, ‖Θ‖∞ ≤ α}

.

28 / 51

Page 34: Beyond matrices: statistical method for higher-order …pages.stat.wisc.edu/~miaoyan/tensor_tutorial2_fudan.pdfBeyond matrices: statistical method for higher-order tensors and its

gene expression low-rank binary tensor tensor block model Conclusions and future work

Convergence rate

Define Loss (Θ1,Θ2) = 1√∏k dk‖Θ1 −Θ2‖F , where Θ1,Θ2 ∈ Rd1×···×dK .

Error bound for constrained MLE (W. and Li, 2019)

Assume that Θtrue ∈ D(R, α) and regularity conditions on the link functionπ. Then with high probability,

Loss(

ΘMLE,Θtrue)≤ C

Lαγα

√log(K )RK−1

∑k dk∏

k dk,

where

Lα := sup|θ|≤α

π(θ)

π(θ)(1− π(θ)), γα = inf

|θ|≤α

(π2(θ)

π2(θ)− π(θ)

π(θ)

)> 0.

Is this bound tight?

29 / 51

Page 35: Beyond matrices: statistical method for higher-order …pages.stat.wisc.edu/~miaoyan/tensor_tutorial2_fudan.pdfBeyond matrices: statistical method for higher-order tensors and its

gene expression low-rank binary tensor tensor block model Conclusions and future work

Special case: Probit link

Lα,σ ≈ασ + 1

σ, γα,σ ≈

ασ + 1

6

σ2e−α

2/σ2.

Theorem (W. and Li, 2019)

I Upper bound for constrained MLE: w.h.p.

Loss(

ΘMLE,Θtrue

)≤ min

{2α,Cσ

(α + σ

6α + σ

)exp

(α2

σ2

)√dmax∏k dk

}.

I Lower bound for any estimator:

infΘ

supΘtrue∈D(R,α)

ELoss(Θ,Θtrue) ≥ min

{α, σC ′

√dmax∏k dk

}.

I The constrained-MLE is rate-optimal in terms of {dk} in the class oflow-rank tensors.

I Tensor vs. “best” matricization: O(d−K+1) vs O(d−bK/2c).

30 / 51

Page 36: Beyond matrices: statistical method for higher-order …pages.stat.wisc.edu/~miaoyan/tensor_tutorial2_fudan.pdfBeyond matrices: statistical method for higher-order tensors and its

gene expression low-rank binary tensor tensor block model Conclusions and future work

Phase Diagram

I Define signal-to-noise ratio (SNR) = Θtrue

σ = ασ .

SNR � O(1) O(1) & SNR� O(d−(K−1)/2) O(d−(K−1)/2) & SNR

Binary tensor σeα2/σ2

d−(K−1)/2 ↘ σd−(K−1)/2 ↗ α

Continuous-valued tensor σd−(K−1)/2 σd−(K−1)/2 α

Table: Error bound for estimating Θtrue ∈ (Rd )⊗K from noisy observations.

I Noise helps in the high SNR regime! (Davenport et al 2014, Cai-Zhou 2013.)

''noise helps'' region

"noise hurts" region

31 / 51

Page 37: Beyond matrices: statistical method for higher-order …pages.stat.wisc.edu/~miaoyan/tensor_tutorial2_fudan.pdfBeyond matrices: statistical method for higher-order tensors and its

gene expression low-rank binary tensor tensor block model Conclusions and future work

Noise helps!

Intuition: If σ = 0, we cannot recover the magnitude of the underlyingtensor Θ ∈ Rd1×···×dK from the observed binary tensor Y ∈ {0, 1}d1×···×dK .

I Consider the rank-1 probit model without noise.

Yijk = 1{θijk>0}, Θ = a ⊗ a ⊗ a.

I Cannot distinguish a = (a1, . . . , ad)T and a = (sign(a1), . . . , sign(ad))T .

I In contrary to many statistical problems: noise is essential for recoveringΘ.

I Our constrained MLE based on binary observations achieves the samedegree of accuracy as if it were given access to the completely unquan-tized measurements.

32 / 51

Page 38: Beyond matrices: statistical method for higher-order …pages.stat.wisc.edu/~miaoyan/tensor_tutorial2_fudan.pdfBeyond matrices: statistical method for higher-order tensors and its

gene expression low-rank binary tensor tensor block model Conclusions and future work

Algorithm

Alternating optimization

maxΘ=JθijkK

LY(Θ) =∑ijk

log π (qijkθijk) ,

subject to Θ =R∑

r=1

ar ⊗ br ⊗ cr , Θ ≤ α,

where qijk = 2Yijk − 1 taking values -1 or 1 and π(·) is the link function.

I Denote by A = [a1, . . . , aR ], B = [b1, . . . ,bR ] and C = [c1, . . . , cR ].I Objective function is multi-convex in A,B and C .I Update At+1 ← maxA LY(A; Bt ,C t); solved by GLM.I Modify At+1 ← γAt + (1−γ)At+1 subject to infinity-norm constraint;

1-dimension optimization over γ ∈ [0, 1].

33 / 51

Page 39: Beyond matrices: statistical method for higher-order …pages.stat.wisc.edu/~miaoyan/tensor_tutorial2_fudan.pdfBeyond matrices: statistical method for higher-order tensors and its

gene expression low-rank binary tensor tensor block model Conclusions and future work

Empirical performance

Interplay between statistical efficiency and computational cost

Let M(t) = [A(t),B(t),C (t)] denote a sequence of estimators generatedfrom alternating GLM Algorithm. Under certain conditions∗ on the initialpoint M(0) and the true point Mtrue, we have, with very high probability,

Loss(

Θ(M(t)),Θtrue

)≤ C1ρ

tε︸ ︷︷ ︸algorithmic error

+C2Lαγα

√RK−1

∑k dk∏

k dk︸ ︷︷ ︸statistical error

,

where ρ ∈ (0, 1) is a contraction parameter, and C1,C2 > 0 are two con-stants.

Stopping criterium:

t ≥ T � log1/ρ

{d (k−1)/2

}.

34 / 51

Page 40: Beyond matrices: statistical method for higher-order …pages.stat.wisc.edu/~miaoyan/tensor_tutorial2_fudan.pdfBeyond matrices: statistical method for higher-order tensors and its

gene expression low-rank binary tensor tensor block model Conclusions and future work

A long but fun journey

The spectral analysis for higher-order tensors is much more dedicated thanthe matrix analysis.

I Norm landscape over all possible tensor unfoldings (W. et al, LAA’17)

I Perturbation analysis of Higher-order SVD algorithm (W. and Song,AISTATS’17)

I Exponential-valued tensor analysis (W and Li’19, Zeng and W.’19)

I Regularity on decomposition algorithm (W. et al’, AOAS’19)

35 / 51

Page 41: Beyond matrices: statistical method for higher-order …pages.stat.wisc.edu/~miaoyan/tensor_tutorial2_fudan.pdfBeyond matrices: statistical method for higher-order tensors and its

gene expression low-rank binary tensor tensor block model Conclusions and future work

Simulation

I Rank selection via BIC:σ = 0.1 σ = 0.01

True rank d = 20 d = 40 d = 60 d = 20 d = 40 d = 60

R = 10 8.7 (0.9) 10 (0) 10 (0) 8.8 (0.4) 10 (0) 10 (0)

R = 40 36.8 (1.1) 39.6(1.7) 40.2 (0.4) 36.0 (1.2) 38.8(1.6) 40.3 (1.1)

I Comparison with alternative methods

BooleanTF (Boolean tensor factorization, Miettinen 2011, Rukat et al 2018); BTF Bayesian

(Rai et al 2014); BTF Gradient (Hong et al 2018)36 / 51

Page 42: Beyond matrices: statistical method for higher-order …pages.stat.wisc.edu/~miaoyan/tensor_tutorial2_fudan.pdfBeyond matrices: statistical method for higher-order tensors and its

gene expression low-rank binary tensor tensor block model Conclusions and future work

Experiments with real data

I Nations (Nickel et al., 2011). 14× 14× 56 binary tensor consisting of56 relations among 14 countries.

I Human connectome project (HCP) (Wang et al., 2017a). 68×68×212 binary tensor consisting of structural connectivity patterns between68 brain regions among 212 individuals.

I Enron (Zhe et al., 2016). 581 × 124 × 48 binary tensor depictingthree-way relationships (sender, receiver, time) from the Enron emaildataset.

I Kinship (Nickel et al., 2011). 104× 104× 26 binary tensor consistingof 26 types of relations among 104 individuals.

37 / 51

Page 43: Beyond matrices: statistical method for higher-order …pages.stat.wisc.edu/~miaoyan/tensor_tutorial2_fudan.pdfBeyond matrices: statistical method for higher-order tensors and its

gene expression low-rank binary tensor tensor block model Conclusions and future work

Political relationships in 1950-1965

I 56 types of relationships among 14 countries between 1950-1965.

I Multi-relational data can be organized as a tensor Y ∈ {0, 1}14×14×56.A tensor entry is 1 if the relation holds between two countries.

I Relations: “sends tourists to”, “exports books to”, “joint membershipof NGOs”, “conferences”, etc.

I Countries: USSR, Poland, China, UK, Brazil, etc.

38 / 51

Page 44: Beyond matrices: statistical method for higher-order …pages.stat.wisc.edu/~miaoyan/tensor_tutorial2_fudan.pdfBeyond matrices: statistical method for higher-order tensors and its

gene expression low-rank binary tensor tensor block model Conclusions and future work

Estimates from binary tensor decomposition

39 / 51

Page 45: Beyond matrices: statistical method for higher-order …pages.stat.wisc.edu/~miaoyan/tensor_tutorial2_fudan.pdfBeyond matrices: statistical method for higher-order tensors and its

gene expression low-rank binary tensor tensor block model Conclusions and future work

Human connectome project (HCP)

I Structural connectivity patterns among 68 brain regions for 212 indi-viduals from HCP.

I Brain images were parcellated to 68 regions-of-interest following theDesikan atlas (Desikan et al 2006).

I Strong spatial patterns in the brain connectivity. e.g. edges capturedby tensor component 2 are located within the cerebral hemisphere.

I Nodes with high connectivity intensity: superior frontal gyrus (sensorysystem), corpus callosum (commissural tract).

Component 6

40 / 51

Page 46: Beyond matrices: statistical method for higher-order …pages.stat.wisc.edu/~miaoyan/tensor_tutorial2_fudan.pdfBeyond matrices: statistical method for higher-order tensors and its

gene expression low-rank binary tensor tensor block model Conclusions and future work

Comparison with continuous-valued tensor decomposition

I Binary tensor decomposition has prediction performance compared toclassical CP tensor decomposition.

I Link prediction from 5-fold cross-validation:

tensor decomposition methoddataset non-zeros binary (logistic link) continuous-valued

AUC MSE AUC MSEHPC 35.3% 0.9860 1.3× 10−3 0.9314 1.4× 10−2

Nations 21.1% 0.9169 1.1× 10−2 0.8619 2.2× 10−2

Kinship 3.8% 0.9708 1.2× 10−4 0.9436 1.4× 10−3

Enron 0.01% 0.9432 6.4× 10−3 0.7956 6.3× 10−5

AUC: area under the receiver operating characteristic (ROC) curve;

MSE: mean squared error

41 / 51

Page 47: Beyond matrices: statistical method for higher-order …pages.stat.wisc.edu/~miaoyan/tensor_tutorial2_fudan.pdfBeyond matrices: statistical method for higher-order tensors and its

gene expression low-rank binary tensor tensor block model Conclusions and future work

Outline

I Application of tensor decomposition to genetics

I Low-rank tensor estimation from binary data

I Multiway clustering via tensor block models

41 / 51

Page 48: Beyond matrices: statistical method for higher-order …pages.stat.wisc.edu/~miaoyan/tensor_tutorial2_fudan.pdfBeyond matrices: statistical method for higher-order tensors and its

gene expression low-rank binary tensor tensor block model Conclusions and future work

Tensor block model

I In many applications, the data tensors are often expected to have un-derlying block structure.

I Examples of tensor block model (TBM):

Input Output Input Output

Figure: (a) Our TBM method is used for multiway clustering and for revealing the underlyingcheckerbox structure in a noisy tensor. (b) The sparse TBM method is used for detecting sub-tensors of elevated means.

Related work:I Low-rank estimation: block structure implies low-rankness, but directly ap-

plying low-rank estimation to a block tensors yields an inferior estimator.I Multiway clustering: typically two steps, by first estimating a low-rank repre-

sentation, and then performing clustering on the resulting factors.42 / 51

Page 49: Beyond matrices: statistical method for higher-order …pages.stat.wisc.edu/~miaoyan/tensor_tutorial2_fudan.pdfBeyond matrices: statistical method for higher-order tensors and its

gene expression low-rank binary tensor tensor block model Conclusions and future work

Tensor block model

I Tensor block model on Y = Jyi1,...,iK K ∈ Rd1×···×dK :

I Suppose the tensor entry yi1,...,iK belongs to the block determined bythe rkth cluster in the mode k for rk ∈ [Rk ], then

yi1,...,iK = cr1,...,rK + εi1,...,iK , for (i1, . . . , iK ) ∈ [d1]× · · · × [dK ],

I Can be viewed as a super sparse Tucker model:

Y = C ×1 M1 ×2 · · · ×K MK + E ,

where C ∈ RR1×···×RK is a core tensor consisting of block means, Mk ∈{0, 1}Rk×dk is a membership matrix for mode k , and E is the sub-Gaussian (σ) noise tensor.

I Special cases: Gaussian tensor block model (real tensors), Stochastictensor block model (binary tensors, σ = 1/4)

43 / 51

Page 50: Beyond matrices: statistical method for higher-order …pages.stat.wisc.edu/~miaoyan/tensor_tutorial2_fudan.pdfBeyond matrices: statistical method for higher-order tensors and its

gene expression low-rank binary tensor tensor block model Conclusions and future work

Identifiability

Irreducible core

The core tensor C is irreducible if it cannot be written as a block tensor withthe number of mode-k clusters smaller than Rk , for any k ∈ [K ].

I K = 2: C has no two identical rows and no two identical columns.

I General K : none of order-(K -1) fibers of C are identical.

I Irreducibility is a weaker assumption than full-rankness.

Identifiability

Under the irreducibility assumption, the factor matrices Mk ’s are identifiableup to permutations of cluster labels.

I Recall that Tucker and many other factor analyses suffer from rotationalinvariance.

44 / 51

Page 51: Beyond matrices: statistical method for higher-order …pages.stat.wisc.edu/~miaoyan/tensor_tutorial2_fudan.pdfBeyond matrices: statistical method for higher-order tensors and its

gene expression low-rank binary tensor tensor block model Conclusions and future work

Least-square estimation

I We propose a least-square estimation:

Θ = arg minΘ∈P

{−2〈Y,Θ〉+ ‖Θ‖2

F

}= arg min

Θ∈P‖Y −Θ‖2

F .

where the parameter space P is

P ={

Θ ∈ Rd1×···×dK : Θ = C ×1 M1 ×2 · · · ×K MK ,with some

membership matrices Mk ’s and a core tensor C ∈ RR1×···×RK}.

I Algorithm: alternating optimization with suitable initialization.

45 / 51

Page 52: Beyond matrices: statistical method for higher-order …pages.stat.wisc.edu/~miaoyan/tensor_tutorial2_fudan.pdfBeyond matrices: statistical method for higher-order tensors and its

gene expression low-rank binary tensor tensor block model Conclusions and future work

Convergence rate (Zeng and W., 2019)

Let Θ be the least-square estimator of under tensor block model. There exists twoconstants C1,C2 > 0 such that, with w.h.p.,

Loss2(Θtrue, Θ) ≤ C1σ2∏

k dk(

∏k

Rk︸ ︷︷ ︸estimating block means

+∑k

dk logRk︸ ︷︷ ︸estimating block allocations

).

In the special case d = d1 = . . . = dK :

Our rate Tucker rate (Zhang & Xia’ 18) Lasso rate (Chi et al ’18)

O(d logR) O(dR) O(dK−1)

Partition consistency (Zeng and W., 2019)

Under mild separation assumption∗ on block means, there exist permutation ma-trices Pk ’s such that, the misclassification rate (MCR) satisfies∑

k

MCR(Mk ,PkMtrue)→ 0, in probability.

∗block-mean gap = 1‖C‖F

minrk ,r′k,I−k|fiber(Crk ,I−k

)− fiber(Cr′k,I−k

)| ≥ O(d) for all k ∈ [K ].46 / 51

Page 53: Beyond matrices: statistical method for higher-order …pages.stat.wisc.edu/~miaoyan/tensor_tutorial2_fudan.pdfBeyond matrices: statistical method for higher-order tensors and its

gene expression low-rank binary tensor tensor block model Conclusions and future work

Extension to sparse estimation

I In many large-scale applications, not every block in a data tensor is ofequal importance.

I For example, in the genome-wise expression data analysis, only a fewentries represent the signals while the majority come from the back-ground noise

I We propose the regularized least-square estimation:

Θsparse = arg minΘ∈P

{‖Y −Θ‖2

F + λ ‖‖ Cρ},

where ρ = 1 (lasso), ρ = 0 (subset sparse), or ρ = Frobenius (ridge),among many others.

I Sparse estimation incurs slight changes to the previous Algorithm.

Input Output Input Output

47 / 51

Page 54: Beyond matrices: statistical method for higher-order …pages.stat.wisc.edu/~miaoyan/tensor_tutorial2_fudan.pdfBeyond matrices: statistical method for higher-order tensors and its

gene expression low-rank binary tensor tensor block model Conclusions and future work

Simulation

I Simulate order-3 Gaussian block tensors with d1 logR1 ≈ d2 logR2 ≈d3 logR3.

I Performance assessment by root mean squared error (RMSE):

Figure: (a) Average RMSE against d1. (b) Average RMSE against rescaled samplesize N =

√d2d3/ logR1.

48 / 51

Page 55: Beyond matrices: statistical method for higher-order …pages.stat.wisc.edu/~miaoyan/tensor_tutorial2_fudan.pdfBeyond matrices: statistical method for higher-order tensors and its

gene expression low-rank binary tensor tensor block model Conclusions and future work

Comparison with alternative methods

We compare our tensor block model (TBM) with two popular low-ranktensor estimation: (i) CP decomposition, and (ii) Tucker decomposition.

I Non-sparse case

TBM

I Sparse case

Sparsity (ρ) Noise (σ) Penalization (λ) Estimated Sparsity Rate Correct Zero Rate Sparsity Error Rate

0.5 4λ = 0 0(0) 0(0) 0.49(0.03)

λ = 136.4 0.56(0.04) 0.99(0.02) 0.06(0.03)

0.5 8λ = 0 0(0) 0(0) 0.49(0.03)

λ = 439.7 0.59(0.05) 0.99(0.01) 0.14(0.06)

0.8 8λ = 0 0(0) 0(0) 0.80(0.05)

λ = 241.3 0.83(0.06) 0.95(0.04) 0.12(0.06)

49 / 51

Page 56: Beyond matrices: statistical method for higher-order …pages.stat.wisc.edu/~miaoyan/tensor_tutorial2_fudan.pdfBeyond matrices: statistical method for higher-order tensors and its

gene expression low-rank binary tensor tensor block model Conclusions and future work

Conclusions and future work

I We have developed a framework of statistical models, scalable algo-rithms, and statistical theory to analyze tensor-valued data.

I The general strategy is to carve out a broad range of specially-structuredtensors that are useful in practice, and to develop efficient algorithmsfor analyzing these high-dimensional tensor data.

I Other structures are also possible, e.g. non-negativity (M. et al, AOAS2019, in press) and smoothness (on-going).

I Extend the modeling for second-order structure in tensors, e.g. tensornormal model with factorized covariance

E ∼ N(0,Φ1 ⊗ Φ2 ⊗ Φ3).

Tensor analogy of matrix normal models (W. et al, PNAS 2018).

50 / 51

Page 57: Beyond matrices: statistical method for higher-order …pages.stat.wisc.edu/~miaoyan/tensor_tutorial2_fudan.pdfBeyond matrices: statistical method for higher-order tensors and its

gene expression low-rank binary tensor tensor block model Conclusions and future work

References:

I Yuchen Zeng and M. Wang. Multiway clustering via tensor block model. Under Review inNIPS. (2019).

I M. Wang and L. Li. Learning from binary multiway data: probabilistic tensor decompositionand its statistical optimality. Under Review in JMLR. (2019).

I M. Wang, J. Fischer, and Y. S. Song. Three-way clustering of multi-tissue multi-individualgene expression data using semi-nonnegative tensor tensor decomposition. Annals of Ap-plied Statistics. (2019), Vol. 13, No. 2, 1124-1148.

I M. Wang et al. Two-way Mixed-Effects Methods for Joint Association Analyses Using BothHost and Pathogen Genomes. PNAS. Vol. 115 (24), E5440-E5449, (2018)

I M. Wang and Y. S. Song. Tensor decomposition via two-mode higher-order SVD (HOSVD).Journal of Machine Learning Research W&CP (AISTATS track), Vol. 54, (2017) 614-622.

I M. Wang, K. Dao Duc, J. Fischer, and Y. S. Song. Operator norm inequalities betweentensor unfoldings on the partition lattice. Linear Algebra and its Applications, Vol. 520(2017) 44-66.

Thank you!

51 / 51

Page 58: Beyond matrices: statistical method for higher-order …pages.stat.wisc.edu/~miaoyan/tensor_tutorial2_fudan.pdfBeyond matrices: statistical method for higher-order tensors and its

1 / 6

Page 59: Beyond matrices: statistical method for higher-order …pages.stat.wisc.edu/~miaoyan/tensor_tutorial2_fudan.pdfBeyond matrices: statistical method for higher-order tensors and its

2 / 6

Page 60: Beyond matrices: statistical method for higher-order …pages.stat.wisc.edu/~miaoyan/tensor_tutorial2_fudan.pdfBeyond matrices: statistical method for higher-order tensors and its

3 / 6

Page 61: Beyond matrices: statistical method for higher-order …pages.stat.wisc.edu/~miaoyan/tensor_tutorial2_fudan.pdfBeyond matrices: statistical method for higher-order tensors and its

4 / 6

Page 62: Beyond matrices: statistical method for higher-order …pages.stat.wisc.edu/~miaoyan/tensor_tutorial2_fudan.pdfBeyond matrices: statistical method for higher-order tensors and its

5 / 6

Page 63: Beyond matrices: statistical method for higher-order …pages.stat.wisc.edu/~miaoyan/tensor_tutorial2_fudan.pdfBeyond matrices: statistical method for higher-order tensors and its

6 / 6