An introduction to the statistical analysis of random variable dependencies using GLASSO.

Frédéric RICHARD
Institute of Mathematics of Marseille, Aix-Marseille University.

Lecture given at the School of Mathematical Sciences, Queensland University of Technology, Brisbane, Australia, July 29th, 2019.

Sections: Introduction; Dependency (Correlation, Independence, Conditional independence); Estimation (MLE, GLASSO); Graphical model (GLASSO path, GLASSO knots, Path construction); References.




Context and Motivation

• Let $X_1, \dots, X_n$ be an observed sample of i.i.d. Gaussian random vectors $\mathcal{N}(\mu, \Sigma)$.
• $X_i = (X_{ij})_{j=1}^{p}$, where $X_{ij}$ is the $j$-th variable of the $i$-th observation.
• Theme: analysis of variable dependencies from observations.
• An application to brain analysis:
  • Data: Positron Emission Tomography (PET) of the brain.
  • PET measures the local glucose consumption → observation of the neural activity within the brain.
  • Modeling: $X_{ij}$ represents the brain activity within the $j$-th brain region as observed in the $i$-th patient.
  • Study of variable dependencies → information about the metabolic connectivity.
  • Characterization of neurodegenerative diseases such as Alzheimer's disease.


Outline of the talk

1 Random variable dependency.

2 Dependency estimation.

3 Test with GLASSO.


Correlation coefficient

• Let $X \sim \mathcal{N}(\mu, \Sigma)$. What information does the covariance matrix $\Sigma$ bring?
• Covariance between $X_j$ and $X_k$:

$$\Sigma_{jk} = \mathrm{Cov}(X_j, X_k).$$

• Correlation coefficient:

$$\rho_{jk} = \frac{\Sigma_{jk}}{\sqrt{\Sigma_{jj}\,\Sigma_{kk}}},$$

a normalized measure of the linear dependency between $X_j$ and $X_k$:
  • $|\rho_{jk}| \simeq 1$: strong linear dependency,
  • $\rho_{jk} \simeq 0$: weak linear dependency.
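As a minimal sketch of the normalization above, the following NumPy snippet converts a covariance matrix into a correlation matrix. The function name `correlation_from_covariance` and the example matrix are illustrative assumptions, not part of the lecture.

```python
import numpy as np

def correlation_from_covariance(sigma):
    """Return the matrix of rho_jk = Sigma_jk / sqrt(Sigma_jj * Sigma_kk)."""
    std = np.sqrt(np.diag(sigma))      # standard deviations sqrt(Sigma_jj)
    return sigma / np.outer(std, std)  # divide entry (j, k) by std_j * std_k

# Hypothetical 2x2 covariance matrix, chosen for illustration only.
sigma = np.array([[4.0, 2.0],
                  [2.0, 9.0]])
rho = correlation_from_covariance(sigma)
print(rho)
```

The diagonal of the result is 1 by construction, and the off-diagonal entries always lie in $[-1, 1]$ for a valid covariance matrix.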


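Independence

Definition
Two variables $X_k$ and $X_l$ are independent ($X_k \perp\!\!\!\perp X_l$) iff

$$\forall A, B \in \mathcal{B}(\mathbb{R}), \quad P(X_k \in A, X_l \in B) = P(X_k \in A)\,P(X_l \in B).$$

Proposition
Let $X_k$, $X_l$ be two variables with densities $f_{X_k}$ and $f_{X_l}$, respectively.
(i) $X_k \perp\!\!\!\perp X_l$ iff $f_{(X_k, X_l)} = f_{X_k} f_{X_l}$.
(ii) $X_k \perp\!\!\!\perp X_l$ iff $f_{X_k | X_l} = f_{X_k}$.

Interpretation: $X_l$ provides "no information" about the distribution of $X_k$.

Proposition
Let $X = (X_j)_{j=1}^{p}$ be a Gaussian random vector $\mathcal{N}(\mu, \Sigma)$. Then,

$$X_k \perp\!\!\!\perp X_l \quad \text{iff} \quad \Sigma_{kl} = 0.$$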


Conditional independence (1/3)

• $\mathrm{Cov}(X_k, X_l) = 0$ ⇒ no (linear) link between $X_k$ and $X_l$.
• But if $\mathrm{Cov}(X_k, X_l) \neq 0$, is there necessarily a direct link between $X_k$ and $X_l$?
• Example:
  • $X_1$: delay in Brisbane of planes coming from Singapore.
  • $X_2$: delay in Tokyo of planes coming from Singapore.
  • $X_3$: waiting time for takeoff in Singapore.

→ There could be a positive correlation between $X_1$ and $X_2$, but this correlation could be due only to the links of $X_1$ and $X_2$ with the third variable $X_3$.


Conditional independence (2/3)

Definition
Two random vectors $X_A$ and $X_B$ are conditionally independent given a third one $X_C$ ($(X_A \perp\!\!\!\perp X_B) \,|\, X_C$) if

$$f_{(X_A, X_B) | X_C} = f_{X_A | X_C}\, f_{X_B | X_C}.$$

Interpretation: once $X_C$ is known, $X_A$ provides "no information" about the distribution of $X_B$, and vice versa.

Question: when $X$ is a Gaussian vector, where can we find information about the conditional dependencies between variables?
Answer: in the precision matrix $\Theta = \Sigma^{-1}$.

Proposition
Let $X$ be a Gaussian vector of size $p$ with precision matrix $\Theta$. Let $A$ and $B$ be two disjoint index subsets of $[\![1,p]\!]$ and $C = [\![1,p]\!] \setminus (A \cup B)$ their complement. Then

$$(X_A \perp\!\!\!\perp X_B) \,|\, X_C \iff \Theta_{ij} = 0, \ \forall (i,j) \in A \times B.$$
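The plane-delay example can be made concrete with a small numerical sketch. The linear model below (unit variances, $X_1 = X_3 + e_1$, $X_2 = X_3 + e_2$) is an assumption introduced here for illustration only; it is not taken from the lecture. It shows a non-zero covariance between $X_1$ and $X_2$ together with a zero entry in the precision matrix.

```python
import numpy as np

# Hypothetical model: X3 ~ N(0, 1), X1 = X3 + e1, X2 = X3 + e2,
# with e1, e2 ~ N(0, 1) independent of X3 and of each other.
# Covariance matrix of (X1, X2, X3) implied by this model:
sigma = np.array([[2.0, 1.0, 1.0],
                  [1.0, 2.0, 1.0],
                  [1.0, 1.0, 1.0]])
theta = np.linalg.inv(sigma)  # precision matrix

# Non-zero covariance: X1 and X2 are correlated.
print("Cov(X1, X2) =", sigma[0, 1])
# Zero up to rounding: no direct link between X1 and X2 once X3 is known.
print("Theta_12    =", round(float(theta[0, 1]), 10))
```

In this sketch the only non-zero off-diagonal entries of the precision matrix are those linking $X_1$ and $X_2$ to $X_3$, which matches the intuition that the delays are related only through the takeoff waiting time.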


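Conditional independence (3/3)

Representation of the conditional dependencies between variables by a graph:
• Vertices: variable indices $[\![1,p]\!]$.
• Edge $j \sim k$ if $\Theta_{jk} \neq 0$.

Measure of the conditional dependencies: partial correlation between $X_j$ and $X_k$,

$$\nu_{jk} = \frac{\gamma(X_j, X_k)}{\sqrt{\gamma(X_j, X_j)\,\gamma(X_k, X_k)}},$$

where $\gamma(X_j, X_k)$ is a partial covariance:

$$\gamma(X_j, X_k) = E\Big( \big(X_j - E(X_j \,|\, X_m, m \neq j, m \neq k)\big)\big(X_k - E(X_k \,|\, X_m, m \neq j, m \neq k)\big) \,\Big|\, X_m, m \neq j, m \neq k \Big).$$

Proposition
If $X$ is a Gaussian random vector with precision matrix $\Theta$, then

$$\nu_{jk} = -\frac{\Theta_{jk}}{\sqrt{\Theta_{jj}\,\Theta_{kk}}}.$$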
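The proposition above translates directly into a few lines of NumPy. This is a sketch under assumptions: the function name `partial_correlations`, the unit-diagonal convention and the example precision matrix are all introduced here for illustration.

```python
import numpy as np

def partial_correlations(theta):
    """Return nu_jk = -Theta_jk / sqrt(Theta_jj * Theta_kk) for j != k."""
    scale = np.sqrt(np.diag(theta))
    nu = -theta / np.outer(scale, scale)
    np.fill_diagonal(nu, 1.0)  # convention: unit diagonal, as for correlations
    return nu

# Hypothetical precision matrix, for illustration only.
theta = np.array([[ 1.0,  0.0, -1.0],
                  [ 0.0,  1.0, -1.0],
                  [-1.0, -1.0,  3.0]])
nu = partial_correlations(theta)
print(nu)
```

A zero entry of $\Theta$ gives a zero partial correlation, so the edges of the dependency graph are exactly the pairs with non-zero $\nu_{jk}$.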


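Statistical setting

• Issue: observing an i.i.d. sample $X_1, \dots, X_n$ of $\mathcal{N}(\mu, \Sigma)$, estimate the precision matrix $\Theta = \Sigma^{-1}$.
• Statistical model:

$$P(X_1 \in A_1, \dots, X_n \in A_n) = \int_{A_1} \cdots \int_{A_n} L(\mu, \Sigma; x)\, dx_1 \cdots dx_n,$$

where $L$ is the likelihood function

$$L(\mu, \Sigma; x) = (2\pi)^{-np/2}\, |\Sigma|^{-n/2} \exp\Big(-\frac{1}{2} \sum_{i=1}^{n} (x_i - \mu)^T \Sigma^{-1} (x_i - \mu)\Big).$$

• Maximum likelihood estimate (MLE):

$$(\hat{\mu}, \hat{\Sigma}) = \arg\max_{(\mu, \Sigma)} L(\mu, \Sigma; x).$$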


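Optimization

• Negative log-likelihood: $\ell(\Sigma) = -\log_e(L(\bar{x}_n, \Sigma; x))$, where $\bar{x}_n$ is the sample mean.
• It can be expressed as

$$\ell(\Sigma) = \frac{n}{2} \log_e(|\Sigma|) + \frac{n}{2}\, \mathrm{trace}(\Sigma^{-1} S_n) + c,$$

where $S_n$ is the sample covariance matrix

$$S_n = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x}_n)(x_i - \bar{x}_n)^T.$$

• Viewed as a function of the precision matrix $\Theta = \Sigma^{-1}$, this function is convex and differentiable. For the matrix inner product $\langle M, P \rangle = \mathrm{trace}(M^T P)$, its gradient with respect to $\Theta$ is given, up to the positive factor $n/2$, by

$$\nabla \ell = -\Sigma + S_n.$$

• Hence, when $S_n$ is invertible, it has a unique minimum reached at $\Sigma = S_n$, which is the unique solution of

$$\nabla \ell = 0.$$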
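The optimality of the sample covariance matrix can be checked numerically. The sketch below is written in terms of the precision matrix and drops the factor $n/2$ and the additive constant; the simulated data, the mixing matrix and the helper `objective` are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 3  # hypothetical sample size and dimension
mixing = np.array([[1.0, 0.5, 0.0],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 1.0]])
X = rng.normal(size=(n, p)) @ mixing  # correlated Gaussian sample

centered = X - X.mean(axis=0)
S_n = centered.T @ centered / n  # sample covariance with 1/n normalisation

def objective(theta, S):
    """-log|theta| + trace(theta S): negative log-likelihood up to n/2 and a constant."""
    _, logdet = np.linalg.slogdet(theta)
    return -logdet + np.trace(theta @ S)

theta_hat = np.linalg.inv(S_n)               # candidate minimiser
gradient = -np.linalg.inv(theta_hat) + S_n   # -Sigma + S_n evaluated at Sigma = S_n

# Sum of quadratic forms, to compare with the trace expression n * trace(theta S_n).
quad = np.einsum("ij,jk,ik->", centered, theta_hat, centered)

# A positive definite perturbation of theta_hat should give a larger objective.
perturbed = theta_hat + 0.1 * np.eye(p)
print(objective(theta_hat, S_n), objective(perturbed, S_n))
```

The gradient vanishes at $\Sigma = S_n$, and the quadratic-form sum equals $n\,\mathrm{trace}(\Theta S_n)$, which is the identity behind the trace expression of $\ell$.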


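Estimator properties

• Due to the law of large numbers,

$$S_n = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x}_n)(x_i - \bar{x}_n)^T \xrightarrow[n \to +\infty]{} \Sigma.$$

• Besides, almost surely, $\mathrm{rank}(S_n) = \min(p, n-1)$, since the observations are centered by their sample mean. So, as long as $p < n$, $\Theta = \Sigma^{-1}$ can be estimated with

$$\hat{\Theta} = S_n^{-1}.$$

• However, such an estimation is not possible in a high-dimensional setting where $n \le p$, because $S_n$ is then singular.
• Question: how can we estimate $\Theta$ in such a case?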
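A quick simulation illustrates why inverting $S_n$ fails in the high-dimensional setting. The sizes `n = 5` and `p = 8` and the standard Gaussian data are hypothetical choices for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 5, 8  # fewer observations than variables (hypothetical sizes)
X = rng.normal(size=(n, p))

centered = X - X.mean(axis=0)
S_n = centered.T @ centered / n  # p x p sample covariance matrix

rank = np.linalg.matrix_rank(S_n)
print("rank(S_n) =", rank, "while p =", p)  # rank deficient, hence not invertible
```

Since the rank is smaller than $p$, $S_n$ has no inverse and the estimate $S_n^{-1}$ of $\Theta$ is not defined.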


Statistical setting

• In a Bayesian approach, set a prior assumption on the coefficients $\Theta_{ij}$ of $\Theta$: the $\Theta_{ij}$ are i.i.d. random variables sampled from a Laplace distribution of parameter $2/(n\lambda)$, for $\lambda > 0$.
• Posterior likelihood (density of $\Theta$ given $X = x$):

$$f_{\Theta | X = x}(\Theta) \propto L(\mu, \Theta; x)\, \pi_{\Theta}(\Theta),$$

with

$$\pi_{\Theta}(\Theta) \propto \exp\Big(-\frac{n\lambda}{2} \sum_{i,j} |\Theta_{ij}|\Big).$$

• Maximum a posteriori (MAP) estimate:

$$\hat{\Theta} = \arg\max_{\Theta} f_{\Theta | X}(\Theta).$$

Reading: J. Friedman, T. Hastie, and R. Tibshirani (2008).


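Optimization

Negative log-posterior likelihood (setting $\mu = \bar{x}_n$):

$$H(\Theta) \propto -\log_e(|\Theta|) + \mathrm{trace}(\Theta S_n) + \lambda |\Theta|_1 + c,$$

where $|\Theta|_1 = \sum_{i,j} |\Theta_{ij}|$.

This function is convex but no longer differentiable. However, it has sub-gradients, given by the elements of the set

$$\partial H(\Theta) = -\Theta^{-1} + S_n + \lambda \Gamma(\Theta),$$

where $\Gamma(\Theta)$ is the set defined by the Cartesian product of the sets

$$\Gamma(\Theta)_{ij} = \mathrm{sign}(\Theta_{ij}) = \begin{cases} \{1\} & \text{if } \Theta_{ij} > 0, \\ \{-1\} & \text{if } \Theta_{ij} < 0, \\ [-1, 1] & \text{if } \Theta_{ij} = 0. \end{cases}$$

The minimum of $H$ is reached at a solution of $0 \in \partial H(\Theta)$.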
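The condition $0 \in \partial H(\Theta)$ can be tested for a candidate matrix without running any solver. The sketch below is a checker only, not a GLASSO algorithm; the function name `satisfies_glasso_optimality`, the tolerances and the $2 \times 2$ example are assumptions made for illustration.

```python
import numpy as np

def satisfies_glasso_optimality(theta, S, lam, tol=1e-8):
    """Check whether 0 belongs to -theta^{-1} + S + lam * Gamma(theta)."""
    # The only possible sub-gradient element is Gamma = (theta^{-1} - S) / lam.
    gamma = (np.linalg.inv(theta) - S) / lam
    nonzero = np.abs(theta) > tol
    # Where theta_ij != 0, Gamma_ij must equal sign(theta_ij).
    ok_nonzero = np.allclose(gamma[nonzero], np.sign(theta[nonzero]), atol=1e-6)
    # Where theta_ij == 0, Gamma_ij must lie in [-1, 1].
    ok_zero = np.all(np.abs(gamma[~nonzero]) <= 1.0 + 1e-6)
    return bool(ok_nonzero and ok_zero)

# Hypothetical example in which the off-diagonal entry of S is below lam.
S = np.array([[1.0, 0.2],
              [0.2, 1.0]])
lam = 0.3
theta_diag = np.diag(1.0 / (np.diag(S) + lam))  # diagonal candidate

print(satisfies_glasso_optimality(theta_diag, S, lam))  # diagonal candidate
print(satisfies_glasso_optimality(np.eye(2), S, lam))   # identity matrix
```

In this example the diagonal candidate with entries $1/(S_{ii} + \lambda)$ passes the check, because the off-diagonal value $|S_{12}| = 0.2$ does not exceed $\lambda = 0.3$; the identity matrix fails it on the diagonal.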


GLASSO path

• The precision matrix $\Theta$ can be associated with a graph $G = \{V, E\}$ whose vertex and edge sets are
  • $V = [\![1,p]\!]$,
  • $E = \{(j, k) \in [\![1,p]\!]^2, \ \Theta_{jk} \neq 0\}$.
• From the GLASSO solution

$$\hat{\Theta}(\lambda) = \arg\min_{\Theta} \{-\log_e(|\Theta|) + \mathrm{trace}(\Theta S_n) + \lambda |\Theta|_1\},$$

the graph $G$ can be estimated by $G(\lambda) = \{V, E(\lambda)\}$ with

$$E(\lambda) = \{(j, k) \in [\![1,p]\!]^2, \ \hat{\Theta}(\lambda)_{jk} \neq 0\}.$$

• GLASSO path: set of graphs $(G(\lambda), \lambda > 0)$.
• The larger $\lambda$, the sparser $\hat{\Theta}(\lambda)$ and the lower the number of edges of $G(\lambda)$.


Connected components

Definition
Two vertices $j$ and $k$ of a graph $G = \{V, E\}$ are connected if there exists a vertex path $(m_0, m_1, \dots, m_L)$ such that

$$m_0 = j, \quad m_L = k, \quad \text{and} \quad (m_i, m_{i+1}) \in E, \ \forall i \in [\![0, L-1]\!].$$

Definition
A vertex set $C$ of a graph $G = \{V, E\}$ is a connected component if every pair of vertices of $C$ is connected.

Theorem (Characterization of connected components)
Let $G(\lambda)$ be a graph estimated by GLASSO for some $\lambda > 0$. Let $C_1, \dots, C_M$ be a partition of $[\![1,p]\!]$. Then, $C_1, \dots, C_M$ are pairwise disconnected components of the graph $G(\lambda)$ iff

$$\forall k, l \in [\![1,M]\!], \ k \neq l, \ \forall (i, j) \in C_k \times C_l, \quad |S_{ij}| \le \lambda.$$

Reading: D. Witten, J. Friedman, and N. Simon (2011).
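Following the theorem, a partition into pairwise disconnected vertex sets can be read off the sample covariance matrix by linking $i$ and $j$ whenever $|S_{ij}| > \lambda$ and taking connected components. The sketch below does this with a plain depth-first search; the function name `thresholded_components` and the $3 \times 3$ matrix are hypothetical.

```python
import numpy as np

def thresholded_components(S, lam):
    """Connected components of the graph with an edge i~j iff |S_ij| > lam (i != j)."""
    p = S.shape[0]
    adjacency = np.abs(S) > lam
    np.fill_diagonal(adjacency, False)  # ignore the diagonal (no self-loops)
    visited = np.zeros(p, dtype=bool)
    components = []
    for start in range(p):
        if visited[start]:
            continue
        # Depth-first search from an unvisited vertex.
        stack, component = [start], []
        visited[start] = True
        while stack:
            v = stack.pop()
            component.append(int(v))
            for w in np.flatnonzero(adjacency[v]):
                if not visited[w]:
                    visited[w] = True
                    stack.append(w)
        components.append(sorted(component))
    return components

# Hypothetical sample covariance matrix, for illustration only.
S = np.array([[1.00, 0.50, 0.10],
              [0.50, 1.00, 0.05],
              [0.10, 0.05, 1.00]])
print(thresholded_components(S, 0.2))
print(thresholded_components(S, 0.6))
```

This only uses $S$ and $\lambda$, so the vertex partition can be obtained before (or without) solving the GLASSO problem itself; the edges inside each component still require the solver.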


GLASSO knots

Characterization (contrapositive): if $|S_{ij}| > \lambda$, then vertices $i$ and $j$ are connected in $G(\lambda)$.

Let $S_{(1)} > \cdots > S_{(P)}$ be the ordered values of $\{|S_{ij}|, \ i < j\}$, with the conventions $S_{(0)} = +\infty$ and $S_{(P+1)} = -\infty$.

Corollary
Let $m \in [\![0,P]\!]$. For all $\lambda$ in $[S_{(m+1)}, S_{(m)})$, the connected components of the graphs $G(\lambda)$ are the same.

→ As $\lambda$ varies, the connected components of the graphs $G(\lambda)$ may only change at some values of $\{S_{(m)}, \ m \in [\![1,P]\!]\}$.

Question: what may happen at $S_{(m)}$?


Figure: Two connected components $C_1^{(m)}$ and $C_2^{(m)}$ of the graphs $G(\lambda)$ for $S_{(m+1)} \le \lambda < S_{(m)}$ (graph on vertices 1 to 6).


Figure: Case $|S_{13}| = S_{(m+1)}$. Connected components $C_1^{(m+1)} = C_1^{(m)}$ and $C_2^{(m+1)} = C_2^{(m)}$ of the graphs $G(\lambda)$ for $S_{(m+2)} \le \lambda < S_{(m+1)}$ (graph on vertices 1 to 6).


Figure: Case $|S_{25}| = S_{(m+1)}$. Connected component $C_1^{(m+1)} = C_1^{(m)} \cup C_2^{(m)}$ of the graphs $G(\lambda)$ for $S_{(m+2)} \le \lambda < S_{(m+1)}$ (graph on vertices 1 to 6). $S_{(m+1)}$ is a GLASSO knot: a critical value where the graph structure changes.


S =

         (0)     (1)     (2)     (3)     (4)     (5)
(0)     1.00    0.09   -0.37    0.00   -0.01   -0.01
(1)     0.09    0.99    0.65    0.01   -0.01   -0.05
(2)    -0.37    0.65    1.00    0.02   -0.01   -0.01
(3)     0.00    0.01    0.02    0.99   -0.68   -0.14
(4)    -0.01   -0.01   -0.01   -0.68    1.00   -0.26
(5)    -0.01   -0.05   -0.01   -0.14   -0.26    1.00
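For the matrix $S$ above, the values of $|S_{ij}|$ at which connected components merge can be listed with a union-find pass over the entries sorted in decreasing order. This is a sketch of one possible construction, not necessarily the one used in the lecture; the function name `component_merge_values` is hypothetical.

```python
import numpy as np

S = np.array([
    [ 1.00,  0.09, -0.37,  0.00, -0.01, -0.01],
    [ 0.09,  0.99,  0.65,  0.01, -0.01, -0.05],
    [-0.37,  0.65,  1.00,  0.02, -0.01, -0.01],
    [ 0.00,  0.01,  0.02,  0.99, -0.68, -0.14],
    [-0.01, -0.01, -0.01, -0.68,  1.00, -0.26],
    [-0.01, -0.05, -0.01, -0.14, -0.26,  1.00],
])

def component_merge_values(S):
    """Values |S_ij| (decreasing) at which two connected components merge."""
    p = S.shape[0]
    parent = list(range(p))

    def find(v):
        # Root of the component containing v, with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    # All off-diagonal magnitudes, largest first.
    edges = sorted(((abs(float(S[i, j])), i, j)
                    for i in range(p) for j in range(i + 1, p)), reverse=True)
    knots = []
    for value, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:          # the edge links two distinct components
            parent[root_i] = root_j
            if not knots or knots[-1] != value:
                knots.append(value)
    return knots

print(component_merge_values(S))
```

On this matrix the merges occur at 0.68, 0.65, 0.37, 0.26 and 0.05; the values 0.14 and 0.09 link vertices that are already in the same component, so the component structure does not change there.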


Where to cut the tree?

• At each level $m$, test the hypothesis $H_0^{(m)}$: the connected components of the graph $G$ of $\Theta$ are within those of $G(\lambda_m)$.
• Test statistic:

$$T_m = n \lambda_m (\lambda_m - \lambda_{m+1}).$$

Theorem
Under $H_0^{(m)}$, $T_m \to \mathcal{E}(1/m)$ as $n, p \to +\infty$ with $\log(p)/n \to 0$.

Reading: M. Grazier G'Sell, J. Taylor, and R. Tibshirani (2013).
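The test statistic is simple to evaluate once the values $\lambda_m$ are available. The sketch below assumes that the $\lambda_m$ are knot values listed in decreasing order; the function name `knot_test_statistics`, the knot values and the sample size are hypothetical. The comparison with the limiting law is left out.

```python
import numpy as np

def knot_test_statistics(knots, n):
    """Return T_m = n * lambda_m * (lambda_m - lambda_{m+1}) for consecutive knots."""
    lam = np.asarray(knots, dtype=float)
    return n * lam[:-1] * (lam[:-1] - lam[1:])

# Hypothetical inputs: decreasing knot values and a sample size n.
knots = [0.68, 0.65, 0.37, 0.26, 0.05]
n = 100
T = knot_test_statistics(knots, n)
print(np.round(T, 2))
```

A large gap between two consecutive knots gives a large statistic, which is the quantity compared with the limiting distribution of the theorem when deciding where to cut.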


Cited references

• J. Friedman, T. Hastie, and R. Tibshirani (2008), Sparse inverse covariance estimation with the graphical lasso, Biostatistics, 9(3), 432-441.
• D. Witten, J. Friedman, and N. Simon (2011), New insights and faster computations for the graphical lasso, Journal of Computational and Graphical Statistics, 20(4), 892-900.
• M. Grazier G'Sell, J. Taylor, and R. Tibshirani (2013), Adaptive testing for the graphical lasso, arXiv:1307.4765v2.
