
Page 1

Estimating the Inverse Covariance Matrix of Independent Multivariate Normally Distributed Random Variables

Dominique Brunet, Hanne Kekkonen, Vitor Nunes, Iryna Sivak

Fields-MITACS Thematic Program on Inverse Problems and Imaging
Group on Bayesian/Statistical Inverse Problems

July 27, 2012

Page 2

Menu

Hanne
• Problem Formulation and Motivation
• Frequentist and Bayesian Origins
• Relation to Imaging

Iryna
• The LASSO Methods

Vitor
• Newton Methods and Quasi-Newton Methods
• The QUIC Trick
• Thoughts on Future Work

Dominique
• Simulation and Results

Conclusions

Page 3

Bayes’ formula brings a priori information and measured data together

Statistical Inversion
We consider the linear measurement model

m = Ax + ε

where x ∈ Rp and m ∈ Rp are treated as random variables.

Bayes’ formula
Bayes’ formula gives us the posterior distribution π(x | m):

π(x | m) ∝ π(m | x)π(x)

We recover x as a point estimate from π(x | m).

Page 4

Likelihood distribution describes the measurement

If we assume that the noise ε is Gaussian, with independent components

εk ∼ N(0, σ²), 1 ≤ k ≤ p,

we can write the likelihood distribution, which measures the data misfit, as

π(m | x) = πε(m − Ax) = (2πσ²)^{−p/2} exp( −(1/(2σ²)) ‖Ax − m‖²₂ ).

Page 5

Prior distribution contains all other information about unknown x

• The prior distribution shouldn’t depend on the measurement.

• The prior distribution should assign a clearly higher probability to those values of x that we expect to see than to unexpected x.

• Designing a good prior distribution is one of the main difficulties in statistical inversion.

Page 6

Gaussian prior distribution

Gaussian model
If we assume that x is also Gaussian

x ∼ Np(µ,Σ)

we can write

π(x) = (det Σ−1)^{1/2} / (2π)^{p/2} · exp( −½ (x − µ)ᵀ Σ−1 (x − µ) )
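A quick numerical check of this density, with made-up placeholder values for µ and Σ; scipy handles the normalising constant:

```python
# Evaluate the Gaussian prior density above at a test point.
# mu, Sigma and x below are hypothetical illustration values.
import numpy as np
from scipy.stats import multivariate_normal

mu, Sigma = np.zeros(3), np.eye(3)
x = np.array([0.1, -0.2, 0.3])
print(multivariate_normal(mean=mu, cov=Sigma).pdf(x))
```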

Page 7

Estimating the inverse covariance matrix Σ−1

We consider the problem of finding a good estimator for the inverse covariance matrix Σ−1, under the constraint that certain given pairs of variables are conditionally independent.

Conditional independence constraints describe the sparsity pattern of the inverse covariance matrix Σ−1: zeros indicate conditional independence between variables.

Page 8

Why inverse covariance matrix?

Simple example
We can think of a line of objects connected by springs.

If we move the leftmost object y1, the rightmost object y5 will also move, so y1 and y5 are clearly not independent.

Page 9

However, y1 and y5 are conditionally independent given all the other variables: once we know how y2 moves, knowing y5 gives us no further information about y1. In fact, each yi is conditionally dependent only on its neighbours.

Page 10

What does this have to do with pictures?

Neighbourhood Rule
We are interested in exploring the assumption that every pixel is conditionally dependent only on its neighbours.

[Figure: sparsity pattern of the 100 × 100 inverse covariance matrix; nz = 838.]

In this case the inverse covariance matrix has nonzero elements only on nine ‘diagonals’.
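A sketch of how such a pattern can be built explicitly (not the authors' code; `neighbourhood_pattern` is a hypothetical helper), using scipy.sparse:

```python
# Build the sparsity pattern of the inverse covariance matrix for an
# nx-by-ny pixel grid under the 8-neighbourhood rule.
import numpy as np
import scipy.sparse as sp

def neighbourhood_pattern(nx, ny):
    """Sparse 0/1 matrix marking pixel pairs that are identical or
    adjacent (sharing an edge or a corner) on the grid."""
    idx = np.arange(nx * ny).reshape(ny, nx)   # row-major pixel indexing
    rows, cols = [], []
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            src = idx[max(0, dy):ny + min(0, dy), max(0, dx):nx + min(0, dx)]
            dst = idx[max(0, -dy):ny + min(0, -dy), max(0, -dx):nx + min(0, -dx)]
            rows.append(src.ravel())
            cols.append(dst.ravel())
    rows, cols = np.concatenate(rows), np.concatenate(cols)
    return sp.csr_matrix((np.ones_like(rows), (rows, cols)),
                         shape=(nx * ny, nx * ny))

P = neighbourhood_pattern(10, 10)   # 100 x 100 pattern
print(P.nnz)                        # nonzeros fall on nine 'diagonals'
```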

Page 11

Log-likelihood function

Likelihood function
We want to estimate the parameters Σ−1 and µ of Np(µ, Σ), based on N independent samples xi. The likelihood function is

∏_{i=1}^{N} π(xi | Σ, µ) = (det Σ)^{−N/2} / (2π)^{Np/2} · exp{ −½ ∑_{j=1}^{N} (xj − µ)ᵀ Σ−1 (xj − µ) }

Log-likelihood function
The log-likelihood of the data is, up to an additive constant and a factor of two,

L(Σ−1, µ) = N log det Σ−1 − ∑_{j=1}^{N} (xj − µ)ᵀ Σ−1 (xj − µ)
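A small numerical sketch (a hypothetical helper, not from the slides) evaluating this log-likelihood with numpy:

```python
# L(Theta, mu) = N * logdet(Theta) - sum_j (x_j - mu)^T Theta (x_j - mu),
# for data X with one sample per row.
import numpy as np

def log_likelihood(Theta, mu, X):
    N = X.shape[0]
    sign, logdet = np.linalg.slogdet(Theta)   # numerically stable log-det
    assert sign > 0, "Theta must be positive definite"
    D = X - mu
    quad = np.einsum('ij,jk,ik->', D, Theta, D)  # sum of quadratic forms
    return N * logdet - quad
```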

Page 12

How to obtain sparse Σ−1

Maximum likelihood estimate
Maximizing the log-likelihood with respect to Σ−1 leads to the maximum likelihood estimate Σ̂−1 = S−1, which usually isn't sparse (here S is the sample covariance obtained from the data). Moreover, when p > N, S is singular, so the maximum likelihood estimate cannot be computed.

Penalised log-likelihood function
To find a sparse Σ−1 we use the penalised log-likelihood function

argmin_{Σ−1 ≻ 0} { − log det Σ−1 + tr(SΣ−1) + ‖P ∗ Σ−1‖₁ }

where S is the sample covariance obtained from the data, P encodes our sparsity constraint, and ∗ denotes the elementwise product.

Page 13

In a nutshell

π(x | m) ∝ π(m | x)π(x)

Expect that

x ∼ Np(µ, Σ).

What is Σ−1?

Estimating Σ−1
Minimise the penalised log-likelihood function; use algorithms to find Σ̂−1.

Back to the original inverse problem
Solve the original inverse problem using Σ̂−1:

argmin_x { ‖Ax − m‖²₂ + σ² (x − µ)ᵀ Σ−1 (x − µ) }
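Once an estimate of Σ−1 is in hand, this MAP estimate is a single linear solve. A minimal sketch (hypothetical helper, with the noise weighting 1/σ² written out):

```python
# MAP estimate for m = Ax + eps: minimise
# ||Ax - m||^2 / (2 sigma^2) + (x - mu)^T Theta (x - mu) / 2,
# whose minimiser solves one symmetric linear system.
import numpy as np

def map_estimate(A, m, Theta, mu, sigma):
    lhs = A.T @ A / sigma**2 + Theta
    rhs = A.T @ m / sigma**2 + Theta @ mu
    return np.linalg.solve(lhs, rhs)
```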

Page 14

Graphical Lasso

Σ−1_1 =
[ a11 a12 a13 ]
[ a12 a22 a23 ]
[ a13 a23 a33 ]

Σ−1_2 =
[ b11  0   b13 ]
[  0   b22 b23 ]
[ b13  b23 b33 ]

The zeros in Σ−1_2 encode that variables 1 and 2 are conditionally independent given the rest.


Page 16

Graphical Lasso

[figure]

Page 17

Graphical Lasso

[figure]

Page 18

GLASSO

Θ := Σ−1

Θ̂ = argmin_{Θ ≻ 0} − log det(Θ) + tr(SΘ) + λ‖Θ‖₁

⇕

Θ−1 − S − λΓ(Θ) = 0,

where (Γ(Θ))ij = γ(Θij) is a subgradient of |Θij|:

γ(Θij) = sign(Θij) if Θij ≠ 0,
γ(Θij) ∈ [−1, 1] if Θij = 0.

Page 19

GLASSO

Θ := Σ−1, W := Θ−1. Partition off the last row/column:

Θ = [ Θ11 θ12 ]   S = [ S11 s12 ]   W = [ W11 w12 ]
    [ θ21 θ22 ],      [ s21 s22 ],      [ w21 w22 ]

Start with W = S + λI (we also added thresholding), and cycle around rows/columns till convergence:

• solve the QP

  β̂ = argmin_{β ∈ R^{p−1}} ½ βᵀW11β + βᵀs12 + λ‖β‖₁,  W11 ≻ 0,

  using as warm start the solution from the previous round for this column;
• update w12 = −W11β̂;
• save β̂ for this column in the matrix B.

For every row/column compute θ22 = (s22 + λ − β̂ᵀw12)−1 and θ12 = β̂θ22.

[J. Friedman, T. Hastie, R. Tibshirani, 2007]
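For experimentation, scikit-learn ships a GLASSO implementation; a minimal usage sketch with made-up data (note the penalty λ is called `alpha` there, and an elementwise pattern P is not supported, only a scalar penalty):

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))          # N = 100 samples, p = 20
model = GraphicalLasso(alpha=0.1).fit(X)    # cycles rows/columns as above
Theta = model.precision_                    # sparse estimate of Sigma^{-1}
print(np.sum(np.abs(Theta) > 1e-8))         # count of nonzero entries
```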

Page 20

GLASSO 1.7

Theorem. The solution to the graphical lasso problem is block diagonal with blocks C1, C2, ..., CK ⇔ |Sij| ≤ λ for all i ∈ Ck, j ∈ Cl, k ≠ l.

Θ = blockdiag(Θ1, Θ2, ..., ΘK)

where Θk solves the graphical lasso problem applied to the submatrix of S consisting of the entries whose indices are in Ck.

[D.M. Witten, J. Friedman, N. Simon, 2011]
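A sketch of how this screening rule might be applied in practice (`glasso_blocks` is a hypothetical helper): threshold |S| at λ and read off connected components; each component can then be solved as an independent, smaller graphical lasso problem.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def glasso_blocks(S, lam):
    adj = csr_matrix(np.abs(S) > lam)                  # keep |S_ij| > lambda
    k, labels = connected_components(adj, directed=False)
    return [np.flatnonzero(labels == c) for c in range(k)]
```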

Page 21

Problem Formulation

Cost Functional Definition
Our goal is to minimize the following functional over the set of positive definite matrices:

min_{Σ−1 ≻ 0} f(Σ−1) = − log(det Σ−1) + tr(Σ−1 S) + (ρ/2) ‖R(Σ−1)‖²₂   (1)

Writing Θ := Σ−1, minimizing (1) is equivalent to minimizing

min_{Θ ≻ 0} f(Θ) = − log(det Θ) + tr(ΘS) + (ρ/2) ‖R(Θ)‖²₂   (2)

Page 22

Model Reduction: Closest Neighbor

Neighborhood Rule
Reduce the model by assuming that each pixel is related only to itself and to those pixels with which it shares a boundary or a corner, as the following figure shows: [figure]

Neighborhood Rule
This rule reduces the problem dimension from n = (nx ny)² to m = 5 nx ny − 3 nx − 3 ny + 2, where nx and ny are the number of pixels along the x- and y-axes respectively.
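A quick brute-force check of this count (an illustrative snippet, not the authors' code): enumerate, for an nx-by-ny grid, the unordered pixel pairs kept by the neighborhood rule.

```python
# Count each pixel paired with itself and its <= 8 neighbours, once per
# unordered pair, and compare against the closed-form dimension m.
nx, ny = 10, 10
count = 0
for y in range(ny):
    for x in range(nx):
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                yy, xx = y + dy, x + dx
                if 0 <= yy < ny and 0 <= xx < nx:
                    if yy * nx + xx >= y * nx + x:   # unordered pairs only
                        count += 1
print(count, 5 * nx * ny - 3 * nx - 3 * ny + 2)      # both print 442
```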

Page 23

Model Reduction: Problem Reformulation

Transformation
Once g : Rm → Rn×n is defined, the problem becomes an unconstrained optimization problem on Rm. The minimization problem (2) can then be written as

min_{v ∈ Rm} J(v) = f(g(v)) = − log(det g(v)) + tr(g(v)S) + (ρ/2) ‖R(g(v))‖²₂   (3)
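One possible g, sketched under the assumption that the sparsity pattern is given as upper-triangular index arrays `rows`, `cols` (a hypothetical helper, not the authors' code):

```python
# Scatter the parameter vector v into a symmetric n x n matrix with the
# fixed sparsity pattern; all other entries stay zero.
import numpy as np

def g(v, rows, cols, n):
    Theta = np.zeros((n, n))
    Theta[rows, cols] = v
    Theta[cols, rows] = v   # enforce symmetry (diagonal written twice)
    return Theta
```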

Page 24

Numerical Approach

Quasi-Newton’s Method
Expanding J(v) to a quadratic function we get:

J(v + ∆) ≈ J(v) + ∇J(v)·∆ + ½ ∆ᵀ H(v) ∆   (4)

Therefore the ∆ that minimizes J(v + ∆) must be the solution of ∆ = −H−1(v) ∇J(v). So, given an initial guess v0, one can solve the iterative sequence

vk = vk−1 − αk Hk−1(vk−1) J′(vk−1)   (5)
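A minimal sketch of such an iteration with scipy's L-BFGS-B, which maintains the quasi-Newton inverse-Hessian approximation internally. It reuses the hypothetical `g` from the previous page and assumes, purely for illustration, the penalty R(Θ) = Θ:

```python
import numpy as np
from scipy.optimize import minimize

def J(v, S, rows, cols, n, rho=1.0):
    Theta = g(v, rows, cols, n)
    sign, logdet = np.linalg.slogdet(Theta)
    if sign <= 0:
        return 1e12                       # crude barrier: leave the PD cone
    return -logdet + np.trace(Theta @ S) + 0.5 * rho * np.sum(Theta**2)

# tiny demo: a 3-variable chain pattern (1,1),(2,2),(3,3),(1,2),(2,3)
rows = np.array([0, 1, 2, 0, 1]); cols = np.array([0, 1, 2, 1, 2])
S = np.eye(3)
v0 = np.array([1.0, 1.0, 1.0, 0.0, 0.0])  # start at Theta = I
res = minimize(J, v0, args=(S, rows, cols, 3), method='L-BFGS-B')
```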

Page 25

Gradient Computation:
The gradient of J(v) is computed exactly by

J′(v) = ∇f(g(v)) · g′(v) ∈ Rm   (6)

where ∇f(Θ) is given by

(∇f)i,j = −(Θ−1)j,i + Sj,i + ρ [R′(Θ) ∂Θ/∂σi,j]i,j   (7)

Hessian Computation:
Since the Hessian is computationally expensive to compute, one should use an approximation method such as BFGS.

Page 26

Numerical Results, n=100

[Figure: error analysis, ‖g(v)S − I‖₂]

[Figure: error analysis, ‖g(v) − Θ‖₂]

Page 27

QUIC Algorithm (Cho-Jui Hsieh, University of Texas):

Cost Function

min_Θ J(Θ) = − log(det Θ) + tr(ΘS) + (ρ/2) ‖Θ‖₁   (8)

Split into a differentiable function and a non-differentiable one:

g(Θ) = − log(det Θ) + tr(ΘS)   (9)

h(Θ) = (ρ/2) ‖Θ‖₁   (10)
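An aside not spelled out on the slide: the non-differentiable part h is typically handled through its proximal operator, which for an ℓ1 term is elementwise soft-thresholding:

```python
import numpy as np

def soft_threshold(Theta, t):
    """prox of t * ||.||_1: shrink each entry towards zero by t."""
    return np.sign(Theta) * np.maximum(np.abs(Theta) - t, 0.0)
```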

Page 28

QUIC Algorithm (Cho-Jui Hsieh, University of Texas):

Second-Order Approximation of g(·)

G(∆) := g(Θ + ∆) ≈ g(Θ) + ∇g(Θ)·∆ + ½ ∆ᵀ H ∆   (11)

New Cost Functional

min_∆ F(∆) = G(∆) + h(Θ + ∆)   (12)

Choice of ∆
Update one symmetric coordinate pair at a time,

∆ = µ [ej eᵢᵀ + ei eⱼᵀ]   (13)

and solve the one-dimensional problem

min_µ F(µ)   (14)

Page 29

Overview / Future Work

Overview
• Model Reduction
• Numerical Approach
• Numerical Results
• QUIC Algorithm

Future Work
• Improve Memory Efficiency
• Better Choice of x0
• Different Rules of Reduction
• Incorporate the Model Reduction into QUIC

Page 30

Simulation

Simulation pipeline:

• Inverse covariance test matrix Σ−1. [Figure: sparsity pattern, 100 × 100, nz = 838]

• Multivariate normal sampling: x1, x2, . . . , xN

• Sample covariance: S = XXᵀ/N

• Inverse covariance estimation: Σ̂−1
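A sketch of this pipeline in Python, with a hypothetical tridiagonal test matrix and GLASSO standing in for the estimation step:

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
d, N = 50, 100
Theta_true = (np.eye(d) + np.diag(0.4 * np.ones(d - 1), 1)
              + np.diag(0.4 * np.ones(d - 1), -1))    # sparse test Sigma^{-1}
Sigma = np.linalg.inv(Theta_true)
X = rng.multivariate_normal(np.zeros(d), Sigma, size=N)  # x_1, ..., x_N
S = X.T @ X / N                                          # sample covariance
Theta_hat = GraphicalLasso(alpha=0.05).fit(X).precision_
print(np.linalg.norm(Theta_hat - Theta_true, 'fro'))     # Frobenius error
```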

Page 31

Timing (s)

Sample data: X ∈ Rd×N with N = 100. Covariance matrix S ∈ Rd×d.

Algorithm      d = 49        d = 100       d = 200       d = 400
S−1            0.0004        0.0018        0.0083        0.0477
GLASSO         0.01 ± 0.01   0.17 ± 0.01   2.03 ± 0.03   23.51 ± 0.22
QUIC           0.04 ± 0.02   0.39 ± 0.03   4.72 ± 0.11   77.15 ± 2.46
quasi-Newton   10.5          56.2          *             *

Page 32

Error Analysis (N = 100)

[Figure: Frobenius norm of the estimation error versus dimension d (up to d = 400), for quasi-Newton, QUIC or GLASSO, and S inverse.]

Page 33

Training Set extracted from … [figure not recovered]

Page 34

Results

[Figure: original; noisy (σ = 5%); GMRF denoising; TV denoising.]

Page 35

Remarks

1. Computed the inverse covariance matrix on 15 × 17 subblocks and aggregated blocks;

2. Assumed Stationarity;

3. Assumed Gaussianity of the image pixels;

4. Performed very basic (poor) alignment of the training set;

5. Had a small (29) (and unhealthy) training set;

Nevertheless, it performs better than TV denoising!

Page 36

Future Challenges

1. Computational challenge: computing the inverse covariance matrix for full images (e.g. 40,000 × 40,000)

• Could Iteratively Re-weighted Least Squares Minimization for SparseRecovery help?

2. Parameter selection: “warm start” and generalized cross-validation;

3. Non-stationary model: use a bigger and better-aligned database;

4. Try the Gaussianity model in a transform domain, or higher-order models;

Page 37

THANK YOU FOR YOUR TIME!!!