

Sparse Inverse Covariance Estimation with the Graphical Lasso

Paper by: Jerome Friedman, Trevor Hastie, and Robert Tibshirani

Presented by: Joseph Lubars


Preliminaries: LASSO

• We have a model 𝑌 = 𝑋𝛽 + 𝐸, where 𝐸 ∼ 𝑁(0, 𝜎²𝐼)

• 𝑌 ∈ ℝⁿ is a vector of observations, 𝑋 ∈ ℝⁿˣᵖ an observation matrix

• Want to estimate 𝛽

• Use MLE (Least Squares):

𝛽 = argmin_𝛽 ‖𝑌 − 𝑋𝛽‖₂²

• Closed-form solution: 𝛽 = (𝑋ᵀ𝑋)⁻¹𝑋ᵀ𝑌

• What if 𝑝 > 𝑛? What if we want sparsity?


LASSO (Continued)

• What if we want sparsity in our solution? One attempt:

𝛽 = argmin_𝛽 ‖𝑌 − 𝑋𝛽‖₂² s.t. ‖𝛽‖₀ = 𝑐

• This problem is not convex and is very difficult to solve.

• We relax the 𝐿₀ norm to the 𝐿₁ norm and use the Lagrangian version:

𝛽 = argmin_𝛽 ‖𝑌 − 𝑋𝛽‖₂² + 𝜆‖𝛽‖₁

• This is called LASSO!

• Can be solved efficiently using coordinate descent (sketched below)
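Not part of the original slides: a minimal sketch of the Lagrangian LASSO using scikit-learn's coordinate-descent solver, on synthetic data of my choosing. Note that sklearn's Lasso minimizes (1/(2𝑛))‖𝑌 − 𝑋𝛽‖₂² + 𝛼‖𝛽‖₁, so its alpha corresponds to 𝜆/(2𝑛) in the notation above.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 50, 100                      # p > n: least squares alone is ill-posed
beta_true = np.zeros(p)
beta_true[:5] = [2.0, -3.0, 1.5, 4.0, -1.0]   # only 5 nonzero coefficients
X = rng.standard_normal((n, p))
y = X @ beta_true + 0.1 * rng.standard_normal(n)

# Coordinate descent recovers a sparse estimate despite p > n
fit = Lasso(alpha=0.1).fit(X, y)
print("nonzero coefficients:", np.sum(fit.coef_ != 0))
```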


Recall: Gaussian Graphical Models

• We have a Gaussian random vector 𝑥 ∈ ℝᵖ

• Covariance Form: 𝑥 ∼ 𝑁(𝜇, Σ)

• Information Form: 𝑥 ∼ 𝑁⁻¹(ℎ, Θ), with ℎ = Θ𝜇

• Θ = Σ⁻¹ encodes conditional independence: Θᵢⱼ = 0 exactly when 𝑥ᵢ and 𝑥ⱼ are conditionally independent given all other variables

• We find the graph structure by estimating the non-zero entries of Θ (illustrated below)
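As a concrete illustration (my example, not from the slides): a Gaussian chain graph has a tridiagonal, hence sparse, precision matrix Θ, while the covariance Σ = Θ⁻¹ is dense.

```python
import numpy as np

# Precision matrix of a 5-node chain graph: tridiagonal, hence sparse
p = 5
Theta = 2.0 * np.eye(p)
for i in range(p - 1):
    Theta[i, i + 1] = Theta[i + 1, i] = -0.8

Sigma = np.linalg.inv(Theta)
# Theta_ij = 0 <=> x_i, x_j conditionally independent given the rest;
# Sigma is dense even though the graph is a simple chain
print(np.round(Theta, 2))
print(np.round(Sigma, 2))
```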


Estimating Θ

• Given data 𝑋 ∈ ℝⁿˣᵖ with 𝑛 observations of 𝑥 ∼ 𝑁(0, Σ)

• We can calculate the empirical covariance matrix: 𝑆 = (1/𝑛) 𝑋ᵀ𝑋

• Goal: Estimate Θ = Σ⁻¹

• Log-likelihood for Θ (up to constants): log det Θ − tr(𝑆Θ)

• Maximized when Θ = 𝑆⁻¹ (checked numerically below)
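A quick numpy sanity check of these two facts (my sketch, not from the slides): evaluate the log-likelihood with slogdet for numerical stability, and confirm that Θ = 𝑆⁻¹ scores higher than a perturbed alternative.

```python
import numpy as np

def log_likelihood(Theta, S):
    # log det(Theta) - tr(S @ Theta), up to constants
    sign, logdet = np.linalg.slogdet(Theta)
    return logdet - np.trace(S @ Theta)

rng = np.random.default_rng(0)
n, p = 1000, 4
X = rng.multivariate_normal(np.zeros(p), np.eye(p), size=n)
S = X.T @ X / n                      # empirical covariance (mean assumed 0)

Theta_mle = np.linalg.inv(S)
print(log_likelihood(Theta_mle, S))                    # the maximizer
print(log_likelihood(Theta_mle + 0.1 * np.eye(p), S))  # strictly smaller
```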


Graphical Lasso Formulation

• Want to encourage sparsity in Θ

• 𝐿₁-penalized formulation:

log det Θ − tr(𝑆Θ) − 𝜆‖Θ‖₁

• We want to maximize this over non-negative definite matrices Θ

• This is an SDP, but we want to solve it more efficiently (a direct evaluation of the objective is sketched below)
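The penalized objective itself is straightforward to evaluate; a minimal sketch, taking ‖Θ‖₁ to be the elementwise 𝐿₁ norm including the diagonal (the convention used in the paper; some implementations penalize only off-diagonal entries).

```python
import numpy as np

def graphical_lasso_objective(Theta, S, lam):
    # log det(Theta) - tr(S @ Theta) - lam * ||Theta||_1 (elementwise)
    sign, logdet = np.linalg.slogdet(Theta)
    if sign <= 0:                      # outside the positive-definite cone
        return -np.inf
    return logdet - np.trace(S @ Theta) - lam * np.abs(Theta).sum()
```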


Optimality Conditions

• Recall the objective:

max_Θ (log det Θ − tr(𝑆Θ) − 𝜆‖Θ‖₁)

• KKT Conditions:

Θ⁻¹ − 𝑆 − 𝜆Γ = 0

• Γ = (𝛾ᵢⱼ) is the subgradient of ‖Θ‖₁: 𝛾ᵢⱼ = sign(Θᵢⱼ) if Θᵢⱼ ≠ 0, 𝛾ᵢⱼ ∈ [−1, 1] otherwise

• We will estimate Θ⁻¹ by 𝑊


Optimality Conditions, Blockwise

• Consider blocks of 𝑊 and 𝑆, separating the last row and column from the rest:

𝑊 = [𝑊₁₁ 𝑤₁₂; 𝑤₂₁ 𝑤₂₂],   𝑆 = [𝑆₁₁ 𝑠₁₂; 𝑠₂₁ 𝑠₂₂]

• Write the conditions from the previous page for the upper-right block:

𝑤₁₂ − 𝑠₁₂ − 𝜆𝛾₁₂ = 0

• Time for the magic…


A Certain Equivalence

• Consider the following Quadratic Program:

min_{𝛽∈ℝᵖ⁻¹} { ½ 𝛽ᵀ𝑊₁₁𝛽 − 𝛽ᵀ𝑠₁₂ + 𝜆‖𝛽‖₁ }

• Its KKT Conditions (𝜌 a subgradient of ‖𝛽‖₁):

𝑊₁₁𝛽 − 𝑠₁₂ + 𝜆𝜌 = 0

• The Conditions for the Upper-Right Block:

𝑤₁₂ − 𝑠₁₂ − 𝜆𝛾₁₂ = 0

• Equivalent if: 𝛽 = 𝑊₁₁⁻¹𝑤₁₂ and 𝜌 = −𝛾₁₂


Exploring the Equivalence

• All right, so what if 𝛽 = 𝑊₁₁⁻¹𝑤₁₂?

• We know how to do block matrix inverses from class:

𝑊 = [𝑊₁₁ 𝑤₁₂; 𝑤₂₁ 𝑤₂₂] = [(Θ₁₁ − Θ₁₂Θ₂₁/Θ₂₂)⁻¹   −𝑊₁₁Θ₁₂/Θ₂₂;   −Θ₂₁𝑊₁₁/Θ₂₂   1/Θ₂₂ + Θ₂₁𝑊₁₁Θ₁₂/Θ₂₂²]

• So 𝛽 = 𝑊₁₁⁻¹𝑤₁₂ = −Θ₁₂/Θ₂₂, and we also have 𝜌 = −𝛾₁₂ (verified numerically below)
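A numerical spot-check of this block-inverse identity (my verification, not in the slides): for a random positive-definite Θ, the last column of 𝑊 = Θ⁻¹ indeed satisfies 𝛽 = 𝑊₁₁⁻¹𝑤₁₂ = −Θ₁₂/Θ₂₂.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 6
A = rng.standard_normal((p, p))
Theta = A @ A.T + p * np.eye(p)        # random positive-definite matrix
W = np.linalg.inv(Theta)

W11, w12 = W[:-1, :-1], W[:-1, -1]     # blocks: last row/column vs. the rest
theta12, theta22 = Theta[:-1, -1], Theta[-1, -1]

beta = np.linalg.solve(W11, w12)       # beta = W11^{-1} w12
print(np.allclose(beta, -theta12 / theta22))   # True
```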


How Does This Help Us?

• We can now solve one block of our objective by solving:

min_{𝛽∈ℝᵖ⁻¹} (½ 𝛽ᵀ𝑊₁₁𝛽 − 𝛽ᵀ𝑠₁₂ + 𝜆‖𝛽‖₁)

• But this is secretly LASSO (checked numerically below):

min_𝛽 (½ ‖𝑊₁₁^(1/2)𝛽 − 𝑊₁₁^(−1/2)𝑠₁₂‖₂² + 𝜆‖𝛽‖₁)

(expanding the square recovers the quadratic program above, up to a constant that does not depend on 𝛽)

• And we can solve LASSO efficiently!
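To see the "secretly LASSO" claim numerically (a sketch, assuming 𝑊₁₁ is positive definite so its square root exists): the two objectives differ only by the constant ½ 𝑠₁₂ᵀ𝑊₁₁⁻¹𝑠₁₂.

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(0)
p = 5
A = rng.standard_normal((p, p))
W11 = A @ A.T + p * np.eye(p)          # stand-in for a positive-definite W11
s12 = rng.standard_normal(p)
beta = rng.standard_normal(p)          # arbitrary test point

qp = 0.5 * beta @ W11 @ beta - beta @ s12
root = np.real(sqrtm(W11))             # W11^(1/2)
resid = root @ beta - np.linalg.solve(root, s12)   # W11^(1/2) b - W11^(-1/2) s
lasso = 0.5 * resid @ resid
const = 0.5 * s12 @ np.linalg.solve(W11, s12)
print(np.isclose(qp, lasso - const))   # True: objectives differ by a constant
```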


Our Strategy

1. Start with 𝑊 = 𝑆 + 𝜆𝐼

2. Solve the LASSO sub-problem (and save the value of 𝛽):

𝛽 = argmin_𝛽 (½ ‖𝑊₁₁^(1/2)𝛽 − 𝑊₁₁^(−1/2)𝑠₁₂‖₂² + 𝜆‖𝛽‖₁)

3. Update 𝑤₁₂ and 𝑤₂₁ using 𝑤₁₂ = 𝑊₁₁𝛽

4. Rearrange 𝑊 so the next row and column are in position 12

5. Repeat steps 2-4 until convergence

6. Calculate the diagonals of Θ (using Θ₂₂ = 1/(𝑤₂₂ − 𝑤₁₂ᵀ𝛽))

7. Use the most recent values of 𝛽 to complete Θ (since 𝛽 = −Θ₁₂/Θ₂₂, set Θ₁₂ = −𝛽Θ₂₂); a code sketch of the full procedure follows
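A compact numpy sketch of steps 1-7 (my implementation of the strategy above, not the authors' reference code; fixed sweep counts and a simple tolerance stand in for a real convergence criterion, and the inner LASSO solver is the coordinate-descent update from the next slide).

```python
import numpy as np

def soft_threshold(x, t):
    # S(x, t) = sign(x) * (|x| - t)_+
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def lasso_cd(W11, s12, lam, n_iter=100):
    # Coordinate descent for: min_b (1/2) b'W11 b - b's12 + lam*||b||_1
    beta = np.zeros(len(s12))
    for _ in range(n_iter):
        for k in range(len(beta)):
            # partial residual, excluding coordinate k's own contribution
            r = s12[k] - W11[k] @ beta + W11[k, k] * beta[k]
            beta[k] = soft_threshold(r, lam) / W11[k, k]
    return beta

def graphical_lasso(S, lam, n_sweeps=100, tol=1e-6):
    p = S.shape[0]
    W = S + lam * np.eye(p)                 # step 1
    betas = np.zeros((p, p - 1))
    for _ in range(n_sweeps):
        W_old = W.copy()
        for j in range(p):                  # steps 2-4: cycle over columns
            idx = np.arange(p) != j
            W11 = W[np.ix_(idx, idx)]
            beta = lasso_cd(W11, S[idx, j], lam)
            W[idx, j] = W[j, idx] = W11 @ beta   # step 3: w12 = W11 beta
            betas[j] = beta
        if np.abs(W - W_old).max() < tol:   # step 5
            break
    Theta = np.zeros((p, p))
    for j in range(p):                      # steps 6-7: recover Theta
        idx = np.arange(p) != j
        theta22 = 1.0 / (W[j, j] - W[idx, j] @ betas[j])
        Theta[j, j] = theta22
        Theta[idx, j] = -betas[j] * theta22
    return (Theta + Theta.T) / 2, W         # symmetrize rounding error
```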


What is Going On?

• We used block coordinate descent:

• Start with a problem of optimizing 𝑊

• Break it into a number of smaller sub-problems (blocks of 𝑊)

• Solve the sub-problems and update the associated variables in 𝑊

• Because our problem is convex, block coordinate descent is guaranteed to converge

• Each of the sub-problems is equivalent to LASSO, so we can solve them efficiently!


Solving LASSO efficiently

• We have the problem:

min_𝛽 (½ ‖𝑊₁₁^(1/2)𝛽 − 𝑊₁₁^(−1/2)𝑠₁₂‖₂² + 𝜆‖𝛽‖₁)

• We don't actually have to compute these matrix square roots and multiplications, since each coordinate has a closed-form update in terms of 𝑊₁₁ and 𝑠₁₂ directly:

𝛽ⱼ ← 𝑆((𝑠₁₂)ⱼ − Σ_{𝑘≠𝑗} (𝑊₁₁)ₖⱼ 𝛽ₖ, 𝜆) / (𝑊₁₁)ⱼⱼ

where 𝑆 is the soft-thresholding operator: 𝑆(𝑥, 𝑡) = sign(𝑥)(|𝑥| − 𝑡)₊
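In practice this entire algorithm is available off the shelf; for example, scikit-learn's GraphicalLasso estimator is an independent implementation of the graphical lasso. A small demo on data simulated from a chain graph (parameter values are my choice, not from the slides).

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
p = 5
Theta_true = 2.0 * np.eye(p)            # chain graph: tridiagonal precision
for i in range(p - 1):
    Theta_true[i, i + 1] = Theta_true[i + 1, i] = -0.8

X = rng.multivariate_normal(np.zeros(p), np.linalg.inv(Theta_true), size=2000)
model = GraphicalLasso(alpha=0.05).fit(X)
print(np.round(model.precision_, 2))    # close to the tridiagonal Theta_true
```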


Questions?