Introduction
Dependency: Correlation, Independence, Conditional independence
Estimation: MLE, GLASSO
Graphical model: GLASSO Path, GLASSO knots, Path construction
References
An introduction to the statistical analysis of random variable dependencies using GLASSO.
Frederic RICHARD
Institute of Mathematics of Marseille, Aix-Marseille University.
Lecture given at the School of Mathematical Sciences, Queensland University of Technology, Brisbane, Australia, July 29th, 2019.
Context and Motivation
• Let $X_1, \dots, X_n$ be an observed sample of i.i.d. Gaussian random vectors $\mathcal{N}(\mu, \Sigma)$.
• $X_i = (X_{ij})_{j=1}^{p}$, where $X_{ij}$ is the $j$th variable of the $i$th observation.
• Theme: analysis of variable dependencies from observations.
• An application to brain analysis:
  • Data: Positron Emission Tomography (PET) of the brain.
  • PET measures the local glucose consumption → observation of the neural activity within the brain.
  • Modeling: $X_{ij}$ represents the brain activity within the $j$th brain region as observed in the $i$th patient.
  • Study of variable dependencies → information about the metabolic connectivity.
  • Characterization of neurodegenerative diseases such as Alzheimer's disease.
Outline of the talk
1 Random variable dependency.
2 Dependency estimation.
3 Test with GLASSO.
Correlation coefficient
• Let $X \sim \mathcal{N}(\mu, \Sigma)$. What information does the covariance matrix $\Sigma$ bring?
• Covariance between $X_j$ and $X_k$:
$$\Sigma_{jk} = \mathrm{Cov}(X_j, X_k).$$
• Correlation coefficient:
$$\rho_{jk} = \Sigma_{jk} / \sqrt{\Sigma_{jj}\Sigma_{kk}},$$
a normalized measure of the linear dependency between $X_j$ and $X_k$:
  • $|\rho_{jk}| \simeq 1$: strong linear dependency,
  • $\rho_{jk} \simeq 0$: weak linear dependency.
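As a small illustration of the normalization above, the following sketch converts a covariance matrix into the matrix of correlation coefficients $\rho_{jk}$. The function name and the 3x3 matrix are hypothetical and chosen only for the example.

```python
import numpy as np

def correlation_from_covariance(sigma):
    """Return the matrix of rho_jk = Sigma_jk / sqrt(Sigma_jj * Sigma_kk)."""
    sigma = np.asarray(sigma, dtype=float)
    std = np.sqrt(np.diag(sigma))          # standard deviations sqrt(Sigma_jj)
    return sigma / np.outer(std, std)

# Hypothetical covariance matrix (symmetric positive definite), for illustration only.
sigma = np.array([[4.0, 1.8, 0.0],
                  [1.8, 1.0, 0.1],
                  [0.0, 0.1, 9.0]])
rho = correlation_from_covariance(sigma)
print(np.round(rho, 3))
```

Here the first two variables are strongly linearly dependent, while the first and third are uncorrelated.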
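Independence
Definition. Two variables $X_k$ and $X_l$ are independent ($X_k \perp\!\!\!\perp X_l$) iff
$$\forall A, B \in \mathcal{B}(\mathbb{R}), \quad P(X_k \in A, X_l \in B) = P(X_k \in A)\,P(X_l \in B).$$
Proposition. Let $X_k$, $X_l$ be two variables with densities $f_{X_k}$ and $f_{X_l}$, resp.
(i) $X_k \perp\!\!\!\perp X_l$ iff $f_{(X_k, X_l)} = f_{X_k} f_{X_l}$.
(ii) $X_k \perp\!\!\!\perp X_l$ iff $f_{X_k | X_l} = f_{X_k}$.
Interpretation: $X_l$ provides "no information" about the distribution of $X_k$.
Proposition. Let $X = (X_j)_{j=1}^{p}$ be a Gaussian random vector $\mathcal{N}(\mu, \Sigma)$. Then,
$$X_k \perp\!\!\!\perp X_l \text{ iff } \Sigma_{kl} = 0.$$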
Conditional independence (1/3)
• $\mathrm{Cov}(X_k, X_l) = 0 \Rightarrow$ no (linear) link between $X_k$ and $X_l$.
• But, if $\mathrm{Cov}(X_k, X_l) \neq 0$, is there necessarily a direct link between $X_k$ and $X_l$?
• Example:
  • $X_1$: delay in Brisbane of planes coming from Singapore.
  • $X_2$: delay in Tokyo of planes coming from Singapore.
  • $X_3$: waiting time for takeoff in Singapore.
→ There could be positive correlations between $X_1$ and $X_2$. But these correlations could be due only to the links of $X_1$ and $X_2$ to the third variable $X_3$.
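The plane example can be mimicked with a short simulation. The model below is a hypothetical one, assumed only for illustration: both arrival delays are the Singapore waiting time plus independent noise. The two delays are then clearly correlated, but the correlation nearly vanishes once the linear effect of $X_3$ is removed.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000

# Hypothetical model: both arrival delays inherit the Singapore takeoff wait x3.
x3 = rng.normal(size=n)
x1 = x3 + rng.normal(size=n)   # delay in Brisbane
x2 = x3 + rng.normal(size=n)   # delay in Tokyo

corr_12 = np.corrcoef(x1, x2)[0, 1]

def residual(y, x):
    """Residual of the least-squares regression of y on x (with intercept)."""
    slope = np.cov(y, x)[0, 1] / np.var(x, ddof=1)
    return (y - y.mean()) - slope * (x - x.mean())

# Correlation between x1 and x2 after removing what x3 explains in each.
corr_12_given_3 = np.corrcoef(residual(x1, x3), residual(x2, x3))[0, 1]

print(f"corr(X1, X2)            = {corr_12:.3f}")
print(f"corr(X1, X2) without X3 = {corr_12_given_3:.3f}")
```

Under this model the theoretical correlation between $X_1$ and $X_2$ is 1/2, while the correlation of the residuals is 0.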
Conditional independence (2/3)
Definition. Two random vectors $X_A$ and $X_B$ are independent conditionally on another one $X_C$ ($(X_A \perp\!\!\!\perp X_B) | X_C$) if
$$f_{(X_A, X_B) | X_C} = f_{X_A | X_C}\, f_{X_B | X_C}.$$
Interpretation: once $X_C$ is known, $X_A$ provides "no information" about the distribution of $X_B$, and vice versa.
Question: when $X$ is a Gaussian vector, where can we find information about the conditional dependencies between variables?
Answer: in the precision matrix $\Theta = \Sigma^{-1}$.
Proposition. Let $X$ be a Gaussian vector of size $p$ with precision matrix $\Theta$. Let $A$ and $B$ be two disjoint index subsets of $[\![1,p]\!]$ and $C = [\![1,p]\!] \setminus (A \cup B)$ the complement of their union. Then
$$(X_A \perp\!\!\!\perp X_B) | X_C \Leftrightarrow \Theta_{ij} = 0, \ \forall (i,j) \in A \times B.$$
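A minimal numerical illustration of the proposition, using a hypothetical 3x3 precision matrix for a chain $X_1 - X_2 - X_3$: the entry $\Theta_{13}$ is zero, so $X_1$ and $X_3$ are independent given $X_2$, yet the covariance $\Sigma_{13}$ is nonzero, so they are not marginally independent.

```python
import numpy as np

# Hypothetical precision matrix of a chain X1 - X2 - X3 (Theta_13 = 0).
theta = np.array([[2.0, -1.0, 0.0],
                  [-1.0, 2.0, -1.0],
                  [0.0, -1.0, 2.0]])

# Covariance matrix Sigma = Theta^{-1}.
sigma = np.linalg.inv(theta)

print("Theta_13 =", theta[0, 2])            # zero: conditional independence given X2
print("Sigma_13 =", round(sigma[0, 2], 4))  # nonzero: marginal dependence
```

The zero pattern that encodes conditional independence is visible in $\Theta$ but not in $\Sigma$.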
Conditional independence (3/3)
Representation of variable conditional dependencies:
• Vertices: variable indices $[\![1,p]\!]$.
• Edge $j \sim k$ if $\Theta_{jk} \neq 0$.
Measure of the conditional dependencies: partial correlation between $X_j$ and $X_k$,
$$\nu_{jk} = \gamma(X_j, X_k) / \sqrt{\gamma(X_j, X_j)\,\gamma(X_k, X_k)},$$
where $\gamma(X_j, X_k)$ is a partial covariance:
$$\gamma(X_j, X_k) = E\Big( \big(X_j - E(X_j \mid X_m, m \neq j, m \neq k)\big)\big(X_k - E(X_k \mid X_m, m \neq j, m \neq k)\big) \,\Big|\, X_m, m \neq j, m \neq k \Big).$$
Proposition. If $X$ is a Gaussian random vector with precision matrix $\Theta$, then
$$\nu_{jk} = -\frac{\Theta_{jk}}{\sqrt{\Theta_{jj}\Theta_{kk}}}.$$
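The proposition can be checked numerically. The sketch below computes $\nu_{jk}$ from the precision matrix, and compares it with the value obtained from the conditional covariance of $(X_j, X_k)$ given the other variables, computed as a Schur complement of $\Sigma$. Function names and the 4x4 precision matrix are hypothetical.

```python
import numpy as np

def partial_correlations(theta):
    """nu_jk = -Theta_jk / sqrt(Theta_jj * Theta_kk) for j != k (diagonal set to 1)."""
    theta = np.asarray(theta, dtype=float)
    d = np.sqrt(np.diag(theta))
    nu = -theta / np.outer(d, d)
    np.fill_diagonal(nu, 1.0)
    return nu

def partial_correlation_from_sigma(sigma, j, k):
    """Same quantity from the conditional covariance of (X_j, X_k) given the rest."""
    p = sigma.shape[0]
    a = [j, k]
    c = [m for m in range(p) if m not in a]
    # Gaussian conditional covariance: Sigma_aa - Sigma_ac Sigma_cc^{-1} Sigma_ca.
    gamma = sigma[np.ix_(a, a)] - sigma[np.ix_(a, c)] @ np.linalg.solve(
        sigma[np.ix_(c, c)], sigma[np.ix_(c, a)])
    return gamma[0, 1] / np.sqrt(gamma[0, 0] * gamma[1, 1])

# Hypothetical precision matrix (symmetric, diagonally dominant, hence positive definite).
theta = np.array([[2.0, -0.8, 0.0, 0.3],
                  [-0.8, 2.0, 0.5, 0.0],
                  [0.0, 0.5, 1.5, -0.4],
                  [0.3, 0.0, -0.4, 1.0]])
sigma = np.linalg.inv(theta)
nu = partial_correlations(theta)
print(np.round(nu, 3))
```

Zero entries of $\Theta$ give zero partial correlations, which correspond to missing edges in the graph.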
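Statistical setting
• Issue: observing an i.i.d. sample $X_1, \dots, X_n$ of $\mathcal{N}(\mu, \Sigma)$, estimate the precision matrix $\Theta = \Sigma^{-1}$.
• Statistical model:
$$P(X_1 \in A_1, \dots, X_n \in A_n) = \int_{A_1} \cdots \int_{A_n} L(\mu, \Sigma; x)\, dx_1 \cdots dx_n,$$
where $L$ is the likelihood function
$$L(\mu, \Sigma; x) = \big((2\pi)^p |\Sigma|\big)^{-n/2} \exp\Big(-\frac{1}{2} \sum_{i=1}^{n} (x_i - \mu)^T \Sigma^{-1} (x_i - \mu)\Big).$$
• Maximum likelihood estimate (MLE):
$$(\hat{\mu}, \hat{\Sigma}) = \arg\max_{(\mu, \Sigma)} L(\mu, \Sigma; x).$$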
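Optimization
• Negative log-likelihood: $\ell(\Sigma) = -\log_e(L(\bar{x}_n, \Sigma; x))$.
• It can be expressed as
$$\ell(\Sigma) = \frac{n}{2} \log_e(|\Sigma|) + \frac{n}{2}\, \mathrm{trace}(\Sigma^{-1} S_n) + c,$$
where $S_n$ is the sample covariance matrix
$$S_n = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x}_n)(x_i - \bar{x}_n)^T.$$
• Written in terms of the precision matrix $\Theta = \Sigma^{-1}$, this function is convex and differentiable. For the matrix inner product $\langle M, P \rangle = \mathrm{trace}(M^T P)$, its gradient with respect to $\Theta$ is given, up to the positive factor $n/2$, by
$$\nabla \ell = -\Sigma + S_n.$$
• Hence, provided $S_n$ is invertible, it has a unique minimum reached at $\Sigma = S_n$, which is the unique solution of
$$\nabla \ell = 0.$$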
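A quick numerical sanity check of this result, under simulated data chosen only for illustration: the sketch evaluates $\ell$ (without the constant $c$) at $S_n$ and at a few randomly perturbed positive definite matrices, none of which should do better. Function names are hypothetical.

```python
import numpy as np

def sample_covariance(x):
    """S_n = (1/n) * sum_i (x_i - mean)(x_i - mean)^T for an (n, p) data array."""
    centered = x - x.mean(axis=0)
    return centered.T @ centered / x.shape[0]

def neg_log_likelihood(sigma, s_n, n):
    """l(Sigma) = n/2 * log|Sigma| + n/2 * trace(Sigma^{-1} S_n), constant c dropped."""
    _, logdet = np.linalg.slogdet(sigma)
    return 0.5 * n * (logdet + np.trace(np.linalg.solve(sigma, s_n)))

rng = np.random.default_rng(1)
n, p = 200, 3
# Simulated Gaussian sample with correlated coordinates (illustrative mixing matrix).
x = rng.normal(size=(n, p)) @ np.array([[1.0, 0.5, 0.0],
                                        [0.0, 1.0, 0.3],
                                        [0.0, 0.0, 1.0]])
s_n = sample_covariance(x)
at_mle = neg_log_likelihood(s_n, s_n, n)

# Symmetric positive definite perturbations of S_n should never do better.
others = []
for _ in range(5):
    e = 0.1 * rng.normal(size=(p, p))
    others.append(neg_log_likelihood(s_n + e @ e.T, s_n, n))

print(at_mle, min(others))
```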
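Estimator properties
• Due to the law of large numbers,
$$S_n = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x}_n)(x_i - \bar{x}_n)^T \underset{n \to +\infty}{\longrightarrow} \Sigma.$$
• Besides, almost surely, $\mathrm{rank}(S_n) = \min(p, n-1)$, since centering by $\bar{x}_n$ removes one degree of freedom. So, as long as $p < n$, $\Theta = \Sigma^{-1}$ can be estimated with
$$\hat{\Theta} = S_n^{-1}.$$
• However, such an estimation is not possible in a high-dimensional setting where $n \leq p$, since $S_n$ is then singular.
• Question: how can we estimate $\Theta$ in such a case?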
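The rank deficiency is easy to observe numerically. In the sketch below (simulated data, hypothetical sizes), the sample covariance matrix has full rank when $p < n$ but not when $n < p$. Because the data are centered by the sample mean, the rank in the second case is at most $n - 1$.

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_covariance_rank(n, p):
    """Rank of the centered sample covariance matrix of n standard Gaussian vectors of size p."""
    x = rng.normal(size=(n, p))
    centered = x - x.mean(axis=0)
    s_n = centered.T @ centered / n
    return np.linalg.matrix_rank(s_n)

rank_low_dim = sample_covariance_rank(n=50, p=8)    # p < n: S_n is invertible
rank_high_dim = sample_covariance_rank(n=5, p=8)    # n < p: S_n is singular

print(rank_low_dim, rank_high_dim)
```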
Statistical setting
• In a Bayesian approach, set a prior assumption on the coefficients $\Theta_{ij}$ of $\Theta$: the $\Theta_{ij}$ are i.i.d. random variables sampled from a Laplace distribution of parameter $2/(n\lambda)$ for $\lambda > 0$.
• Posterior likelihood (density of $\Theta$ given $X$):
$$f_{\Theta | X = x}(\Theta) \propto L(\mu, \Theta; x)\, \pi_\Theta(\Theta),$$
with
$$\pi_\Theta(\Theta) \propto \exp\Big(-\frac{n\lambda}{2} \sum_{i,j} |\Theta_{ij}|\Big).$$
• Maximum a posteriori estimate (MAP):
$$\hat{\Theta} = \arg\max_{\Theta} f_{\Theta | X}(\Theta).$$
Reading: J. Friedman, T. Hastie, and R. Tibshirani (2008).
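Optimization
Negative log-posterior likelihood (setting $\mu = \bar{x}_n$):
$$H(\Theta) \propto -\log_e(|\Theta|) + \mathrm{trace}(\Theta S_n) + \lambda |\Theta|_1 + c,$$
where $|\Theta|_1 = \sum_{i,j} |\Theta_{ij}|$.
This function is convex but no longer differentiable. However, it has sub-gradients given by the elements of the set
$$\partial H(\Theta) = -\Theta^{-1} + S_n + \lambda \Gamma(\Theta),$$
where $\Gamma(\Theta)$ is the set defined as the Cartesian product of the sets
$$\Gamma(\Theta)_{ij} = \mathrm{sign}(\Theta_{ij}) = \begin{cases} \{1\} & \text{if } \Theta_{ij} > 0, \\ \{-1\} & \text{if } \Theta_{ij} < 0, \\ [-1, 1] & \text{if } \Theta_{ij} = 0. \end{cases}$$
The minimum of $H$ is reached at the solution of $0 \in \partial H(\Theta)$.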
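The optimality condition $0 \in \partial H(\Theta)$ can be checked entry by entry: where $\Theta_{ij} \neq 0$, $(\Theta^{-1} - S_n)_{ij}$ must equal $\lambda\,\mathrm{sign}(\Theta_{ij})$, and where $\Theta_{ij} = 0$ it must lie in $[-\lambda, \lambda]$. The sketch below is not a GLASSO solver; it only tests a candidate. The function name, the matrix and the value of $\lambda$ are hypothetical. With the penalty applied to all entries as in $|\Theta|_1$ above, a diagonal candidate with $\Theta_{ii} = 1/(S_{ii} + \lambda)$ passes the check whenever all off-diagonal $|S_{ij}|$ are at most $\lambda$.

```python
import numpy as np

def satisfies_optimality(theta, s_n, lam, tol=1e-8):
    """Check 0 in -Theta^{-1} + S_n + lam * Gamma(Theta), entry by entry."""
    g = np.linalg.inv(theta) - s_n      # must equal lam * gamma for some gamma in Gamma(Theta)
    nonzero = theta != 0
    ok_nonzero = np.all(np.abs(g[nonzero] - lam * np.sign(theta[nonzero])) <= tol)
    ok_zero = np.all(np.abs(g[~nonzero]) <= lam + tol)
    return bool(ok_nonzero and ok_zero)

# Hypothetical sample covariance whose off-diagonal entries are all below lam in absolute value.
s_n = np.array([[1.0, 0.2, -0.1],
                [0.2, 2.0, 0.05],
                [-0.1, 0.05, 0.5]])
lam = 0.3

# Candidate: diagonal matrix with Theta_ii = 1 / (S_ii + lam).
theta_diag = np.diag(1.0 / (np.diag(s_n) + lam))
# Unpenalized estimate S_n^{-1}, for comparison: it does not satisfy the condition.
theta_naive = np.linalg.inv(s_n)

print(satisfies_optimality(theta_diag, s_n, lam))
print(satisfies_optimality(theta_naive, s_n, lam))
```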
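GLASSO path
• The precision matrix $\Theta$ can be associated with a graph $G = \{\mathcal{V}, \mathcal{E}\}$ whose vertex and edge sets are
  • $\mathcal{V} = [\![1,p]\!]$,
  • $\mathcal{E} = \{(j,k) \in [\![1,p]\!]^2, \Theta_{jk} \neq 0\}$.
• From the GLASSO solution
$$\Theta(\lambda) = \arg\min_{\Theta} \{-\log_e(|\Theta|) + \mathrm{trace}(\Theta S_n) + \lambda |\Theta|_1\},$$
the graph $G$ can be estimated by $G(\lambda) = \{\mathcal{V}, \mathcal{E}(\lambda)\}$ with
$$\mathcal{E}(\lambda) = \{(j,k) \in [\![1,p]\!]^2, \Theta(\lambda)_{jk} \neq 0\}.$$
• GLASSO path: set of graphs $(G(\lambda), \lambda > 0)$.
• The larger $\lambda$, the sparser $\Theta(\lambda)$ and the lower the number of edges of $G(\lambda)$.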
Connected components
Definition. Two vertices $j$ and $k$ of a graph $G = \{\mathcal{V}, \mathcal{E}\}$ are connected if there exists a vertex path $(m_0, m_1, \dots, m_L)$ such that
$$m_0 = j, \ m_L = k, \text{ and } (m_i, m_{i+1}) \in \mathcal{E}, \ \forall i \in [\![0, L-1]\!].$$
Definition. A vertex set $C$ of a graph $G = \{\mathcal{V}, \mathcal{E}\}$ is a connected component if any pair of vertices of $C$ is connected.
Theorem (Characterization of connected components). Let $G(\lambda)$ be a graph estimated by GLASSO for some $\lambda > 0$. Let $C_1, \dots, C_M$ be a partition of $[\![1,p]\!]$. Then, $C_1, \dots, C_M$ are pairwise disconnected components of the graph $G(\lambda)$ iff
$$\forall k, l \in [\![1,M]\!], \ k \neq l, \ \forall (i,j) \in C_k \times C_l, \ |S_{ij}| \leq \lambda.$$
Reading: D. Witten, J. Friedman, and N. Simon (2011).
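One practical reading of the theorem is that the connected components of $G(\lambda)$ can be found without solving the GLASSO problem, by thresholding $|S_{ij}|$ at $\lambda$ and taking the connected components of the resulting graph. The sketch below does this with a small union-find structure. The function name and the 4x4 matrix are hypothetical.

```python
def connected_components(s, lam):
    """Connected components of the graph with an edge i ~ j whenever |S_ij| > lam."""
    p = len(s)
    parent = list(range(p))

    def find(i):
        # Root of the set containing i, with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(p):
        for j in range(i + 1, p):
            if abs(s[i][j]) > lam:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(p):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Hypothetical 4x4 sample covariance matrix, for illustration only.
s = [[1.0, 0.6, 0.05, 0.0],
     [0.6, 1.0, 0.1, 0.02],
     [0.05, 0.1, 1.0, 0.3],
     [0.0, 0.02, 0.3, 1.0]]

components_at_0_2 = connected_components(s, 0.2)
components_at_0_5 = connected_components(s, 0.5)
print(components_at_0_2)   # [[0, 1], [2, 3]]
print(components_at_0_5)   # [[0, 1], [2], [3]]
```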
GLASSO knots
Characterization (contrapositive): if $|S_{ij}| > \lambda$, then vertices $i$, $j$ are connected in $G(\lambda)$.
Let $S_{(1)} > \cdots > S_{(P)}$ be the ordered values of $\{|S_{ij}|, i < j\}$, with $S_{(0)} = +\infty$ and $S_{(P+1)} = -\infty$.
Corollary. Let $m \in [\![0,P]\!]$. For all $\lambda$ in $[S_{(m+1)}, S_{(m)})$, the connected components of the graphs $G(\lambda)$ are the same.
→ As $\lambda$ varies, the connected components of the graphs $G(\lambda)$ may only change at some values of $\{S_{(m)}, m \in [\![1,P]\!]\}$.
Question: what may happen at $S_{(m)}$?
Figure (graph on vertices 1 to 6): Two connected components $C^{(m)}_1$ and $C^{(m)}_2$ of graphs $G(\lambda)$ for $S_{(m+1)} \leq \lambda < S_{(m)}$.
Figure (graph on vertices 1 to 6, with $|S_{13}| = S_{(m+1)}$): Connected components $C^{(m+1)}_1 = C^{(m)}_1$ and $C^{(m+1)}_2 = C^{(m)}_2$ of graphs $G(\lambda)$ for $S_{(m+2)} \leq \lambda < S_{(m+1)}$.
Figure (graph on vertices 1 to 6, with $|S_{25}| = S_{(m+1)}$): Connected component $C^{(m+1)}_1 = C^{(m)}_1 \cup C^{(m)}_2$ of graphs $G(\lambda)$ for $S_{(m+2)} \leq \lambda < S_{(m+1)}$. $S_{(m+1)}$ is a GLASSO knot: a critical value where the graph structure changes.
S =
        (0)     (1)     (2)     (3)     (4)     (5)
(0)    1.00    0.09   -0.37    0.00   -0.01   -0.01
(1)    0.09    0.99    0.65    0.01   -0.01   -0.05
(2)   -0.37    0.65    1.00    0.02   -0.01   -0.01
(3)    0.00    0.01    0.02    0.99   -0.68   -0.14
(4)   -0.01   -0.01   -0.01   -0.68    1.00   -0.26
(5)   -0.01   -0.05   -0.01   -0.14   -0.26    1.00
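For this matrix $S$, the knots can be listed by scanning the values $|S_{ij}|$, $i < j$, in decreasing order and keeping those that join two distinct connected components. The sketch below is one possible way to do it (a Kruskal-like pass with union-find), not necessarily the construction used in the lecture. The function name is hypothetical.

```python
def glasso_knots(s):
    """Values |S_ij| (i < j), scanned in decreasing order, at which two components merge."""
    p = len(s)
    parent = list(range(p))

    def find(i):
        # Root of the set containing i, with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Zero entries are skipped: they never create an edge for lambda > 0.
    edges = sorted(((abs(s[i][j]), i, j)
                    for i in range(p) for j in range(i + 1, p) if s[i][j] != 0),
                   reverse=True)
    knots = []
    for value, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                 # this value joins two distinct components
            parent[ri] = rj
            knots.append((value, i, j))
    return knots

# Matrix S from the slide (indices 0 to 5).
s = [[1.00, 0.09, -0.37, 0.00, -0.01, -0.01],
     [0.09, 0.99, 0.65, 0.01, -0.01, -0.05],
     [-0.37, 0.65, 1.00, 0.02, -0.01, -0.01],
     [0.00, 0.01, 0.02, 0.99, -0.68, -0.14],
     [-0.01, -0.01, -0.01, -0.68, 1.00, -0.26],
     [-0.01, -0.05, -0.01, -0.14, -0.26, 1.00]]

knots = glasso_knots(s)
for value, i, j in knots:
    print(f"knot at {value:.2f}: merge through edge ({i}, {j})")
```

The values 0.14 and 0.09 are not knots here, because the corresponding vertices already belong to the same component when they are reached. Read from the top, the merges describe a tree whose last knot (0.05) joins the blocks {0, 1, 2} and {3, 4, 5}.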
Where to cut the tree?
• At each level $m$, test the hypothesis $H^{(m)}_0$: the connected components of the graph $G$ of $\Theta$ are within those of $G_{\lambda_m}$.
• Test statistic:
$$T_m = n \lambda_m (\lambda_m - \lambda_{m+1}).$$
Theorem. Under $H^{(m)}_0$, $T_m \to \mathcal{E}(1/m)$, as $n, p \to +\infty$, $\log(p)/n \to 0$.
Reading: M. Grazier G'Sell, J. Taylor, and R. Tibshirani (2013).
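A rough sketch of how the statistics $T_m$ could be computed from a decreasing sequence of knots $\lambda_1 > \lambda_2 > \cdots$. The knot values and the sample size below are hypothetical. The p-value line relies on an assumption about the notation: $\mathcal{E}(1/m)$ is read as the exponential distribution with mean $1/m$, so that $P(T > t) = e^{-mt}$. If the intended parametrization differs, only that line changes.

```python
import math

def knot_test_statistics(knots, n):
    """T_m = n * lambda_m * (lambda_m - lambda_{m+1}) for consecutive knots lambda_1 > lambda_2 > ..."""
    results = []
    for m in range(1, len(knots)):
        lam_m, lam_next = knots[m - 1], knots[m]
        t_m = n * lam_m * (lam_m - lam_next)
        # Assumption: E(1/m) is the exponential distribution with mean 1/m,
        # hence the asymptotic p-value exp(-m * T_m).
        p_value = math.exp(-m * t_m)
        results.append((m, t_m, p_value))
    return results

# Hypothetical knot values and sample size, for illustration only.
knots = [0.68, 0.65, 0.37, 0.26, 0.05]
n = 100
stats = knot_test_statistics(knots, n)
for m, t_m, p_value in stats:
    print(f"m = {m}: T_m = {t_m:.2f}, asymptotic p-value = {p_value:.2e}")
```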
Cited references
• J. Friedman, T. Hastie, and R. Tibshirani (2008), Sparse inverse covariance estimation with the graphical lasso, Biostatistics, 9(3), 432-441.
• D. Witten, J. Friedman, and N. Simon (2011), New insights and faster computations for the graphical lasso, Journal of Computational and Graphical Statistics, 20(4), 892-900.
• M. Grazier G'Sell, J. Taylor, and R. Tibshirani (2013), Adaptive testing for the graphical Lasso, arXiv:1307.4765v2.