

Minimizing the Difference of L1 and L2 Norms with Applications

Yifei Lou

Department of Mathematical Sciences, University of Texas at Dallas

May 31, 2017

Partially supported by NSF DMS 1522786


Outline

1 A nonconvex approach: L1-L2

2 Minimization algorithms

3 Some applications

4 Conclusions


Background

We aim to find a sparse vector from an under-determined linear system,

x̂0 = argmin_x ‖x‖0   s.t.  Ax = b.

This is NP-hard.

A popular approach is to replace L0 by L1, i.e.,

x̂1 = argmin_x ‖x‖1   s.t.  Ax = b.

The equivalence between the L0 and L1 problems holds when the matrix A satisfies the restricted isometry property (RIP).

Candès-Romberg-Tao (2006)



Coherence

Another sparse recovery guarantee is based on coherence: exact recovery via L1 minimization holds whenever

‖x‖0 ≤ (1/2)(1 + µ(A)⁻¹),

where the coherence of a matrix A = [a1, · · · , aN] is defined as

µ(A) = max_{i≠j} |aiᵀaj| / (‖ai‖‖aj‖).

Two extreme cases are

• µ ∼ 0 ⇒ incoherent matrix

• µ ∼ 1 ⇒ coherent matrix

What if the matrix is coherent?
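Coherence is easy to check numerically. A minimal sketch in Python/NumPy (the function name is ours, not from the talk):

```python
import numpy as np

def mutual_coherence(A):
    """Largest absolute inner product between distinct normalized columns of A."""
    A = A / np.linalg.norm(A, axis=0)   # normalize each column
    G = np.abs(A.T @ A)                 # Gram matrix of normalized columns
    np.fill_diagonal(G, 0.0)            # exclude the i = j terms
    return G.max()
```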



L1-L2 works well for coherent matrices

We consider an over-sampled DCT matrix with each column given by

aj = (1/√N) cos(2πjw/F),  j = 1, · · · , N,

where w is a random vector of length M.

The larger F is, the more coherent the matrix. Take a 100 × 1000 matrix as an example:

F    coherence
1    0.3981
10   0.9981
20   0.9999

P. Yin, Y. Lou, Q. He and J. Xin, SIAM Sci. Comput., 2015

Y. Lou, P. Yin, Q. He and J. Xin, J. Sci. Comput., 2015
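To illustrate, here is a sketch that builds the over-sampled DCT matrix above and estimates its coherence; the slide does not specify the distribution of w, so uniform sampling on [0, 1) is assumed:

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, F = 100, 1000, 20
w = rng.random(M)                       # random vector of length M (uniform [0,1) assumed)
j = np.arange(1, N + 1)
A = np.cos(2 * np.pi * np.outer(w, j) / F) / np.sqrt(N)   # M x N matrix with columns a_j

An = A / np.linalg.norm(A, axis=0)      # column-normalize before forming the Gram matrix
G = np.abs(An.T @ An)
np.fill_diagonal(G, 0.0)
print(f"coherence for F = {F}: {G.max():.4f}")  # should be close to 1, as in the table
```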


Figure: Success rates (vs. sparsity) of incoherent matrices, F = 1, for L1−L2, L1/2, and L1.


Figure: Success rates (vs. sparsity) of coherent matrices, F = 20, for L1−L2, L1/2, and L1.


Comparing metrics

Figure: Level lines of three metrics: L2 (strictly convex), L1 (convex), and L1 − L2 (nonconvex).


Comparing nonconvex metrics

Consider a matrix A of size 17 × 19, and let V ∈ R^{19×2} be a basis of the null space of A, i.e., AV = 0.

So the feasible set is a two-dimensional affine space, i.e.,

{x : Ax = Axg} = { x = xg + V [s, t]ᵀ : s, t ∈ R }.

Visualize the objective functions L0, L1/2, and L1-L2 over the 2D st-plane.

Page 14: L1 L2 Minimizing the Difference of and Norms with Applicationsims.nus.edu.sg/events/2017/data/files/yifei.pdf · Minimizing the Difference of L 1 and L 2 Norms with Applications Yifei

L1-L2 9/36

Yifei Lou

L1-L2 model

Algorithms

Applications

Conclusions

L0: incoherent (left) and coherent (right)


L1/2: incoherent (left) and coherent (right)


L1-L2: incoherent (left) and coherent (right)


Model failure vs. algorithmic failure

Figure: Occurrence rates (vs. sparsity) of model failure, F(x̄) ≥ F(x), and algorithmic failure, F(x̄) < F(x), for four models: Lasso (L1), IRucLq-v (L1/2), l1−2 (L1-L2), and lt,1−2 (truncated L1-L2).


Truncated L1-L2

‖x‖t,1−2 := Σ_{i∉Γx,t} |xi| − √( Σ_{i∉Γx,t} xi² ),

where Γx,t ⊆ {1, . . . , N} with cardinality t is a set containing the indices of the entries of x with the t largest magnitudes.
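A direct NumPy evaluation of this truncated metric (a sketch; the function name is ours):

```python
import numpy as np

def truncated_l1_minus_l2(x, t):
    """||x||_{t,1-2}: L1 - L2 over the entries outside Gamma_{x,t},
    i.e., with the t largest-magnitude entries of x excluded."""
    x = np.asarray(x, dtype=float)
    idx = np.argsort(np.abs(x))         # indices sorted by ascending magnitude
    tail = x[idx[:-t]] if t > 0 else x  # drop the t largest; t = 0 gives plain L1 - L2
    return np.abs(tail).sum() - np.linalg.norm(tail)
```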

Figure: Success rates (vs. sparsity) for Lasso, l1−2, IRucLq-v, ISD, and lt,1−2.

T. Ma, Y. Lou, and T. Huang, SIAM Imaging Sci., to appear 2017


Advantages of L1-L2

• Lipschitz continuous

• Corrects L1's bias by subtracting a term with a smooth gradient a.e.

• Exact recovery of 1-sparse vectors (the truncated version yields exact recovery of t-sparse vectors)

• Good for coherent compressive sensing



Outline

1 A nonconvex approach: L1-L2

2 Minimization algorithms

3 Some applications

4 Conclusions


Algorithms

We consider an unconstrained L1 − L2 formulation, i.e.,

min_{x∈R^N} F(x) = (1/2)‖Ax − b‖₂² + λ(‖x‖1 − ‖x‖2).

Our first attempt uses the difference of convex algorithm (DCA), decomposing F(x) = G(x) − H(x) into

G(x) = (1/2)‖Ax − b‖₂² + λ‖x‖1,   H(x) = λ‖x‖2.

The iterative scheme is

x^{n+1} = argmin_{x∈R^N} (1/2)‖Ax − b‖₂² + λ‖x‖1 − ⟨x, λxⁿ/‖xⁿ‖2⟩.
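A minimal sketch of this DCA iteration, with each convex subproblem solved by plain proximal gradient (ISTA); the fixed iteration counts and the absence of stopping tests are our simplifications, not the authors' exact implementation:

```python
import numpy as np

def soft(x, tau):
    """Soft-thresholding (shrink) operator."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def dca_l1_minus_l2(A, b, lam, outer=20, inner=500):
    x = np.zeros(A.shape[1])
    L = np.linalg.norm(A, 2) ** 2                  # Lipschitz constant of x -> A^T(Ax - b)
    for _ in range(outer):
        nrm = np.linalg.norm(x)
        v = lam * x / nrm if nrm > 0 else np.zeros_like(x)  # subgradient of lam*||x||_2
        for _ in range(inner):  # ISTA on 0.5||Ax-b||^2 + lam*||x||_1 - <x, v>
            grad = A.T @ (A @ x - b) - v
            x = soft(x - grad / L, lam / L)
    return x
```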


We then derive a proximal operator for L1 − αL2 (α ≥ 0),

x∗ = argmin_x λ(‖x‖1 − α‖x‖2) + (1/2)‖x − y‖₂²,

which has a closed-form solution:

1 If ‖y‖∞ > λ, then x∗ = z(‖z‖2 + αλ)/‖z‖2, where z = shrink(y, λ);

2 if ‖y‖∞ = λ, then ‖x∗‖2 = αλ and x∗_i = 0 for |yi| < λ;

3 if (1 − α)λ < ‖y‖∞ < λ, then x∗ is a 1-sparse vector satisfying x∗_i = 0 for |yi| < ‖y‖∞;

4 if ‖y‖∞ ≤ (1 − α)λ, then x∗ = 0.

Y. Lou and M. Yan, J. Sci Comput., to appear 2017
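A sketch of this proximal operator following the four cases above; in the non-unique cases 2-3 it returns one 1-sparse solution supported at a maximal-magnitude index:

```python
import numpy as np

def prox_l1_minus_alpha_l2(y, lam, alpha):
    """One minimizer of lam*(||x||_1 - alpha*||x||_2) + 0.5*||x - y||_2^2."""
    y = np.asarray(y, dtype=float)
    ymax = np.abs(y).max()
    if ymax > lam:                                         # case 1
        z = np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)  # z = shrink(y, lam)
        return z * (np.linalg.norm(z) + alpha * lam) / np.linalg.norm(z)
    if ymax > (1.0 - alpha) * lam:                         # cases 2-3: a 1-sparse solution
        x = np.zeros_like(y)
        i = int(np.argmax(np.abs(y)))
        x[i] = np.sign(y[i]) * (ymax - (1.0 - alpha) * lam)  # magnitude alpha*lam when ymax = lam
        return x
    return np.zeros_like(y)                                # case 4
```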


Remarks

• Most L1 solvers apply to L1-αL2 by replacing soft shrinkage with this proximal operator.

• Combining ADMM with this operator (nonconvex-ADMM) is much faster than the DCA.

• Both nonconvex-ADMM and DCA converge to stationary points.

• However, nonconvex-ADMM does not give better performance than DCA for coherent CS.



Figure: Success rates (vs. sparsity) of coherent matrices, F = 20, for L1 (ADMM), L1-L2 (DCA), L1-L2 (ADMM), and L1-L2 (weighted).


Figure: Success rates (vs. sparsity) of incoherent matrices, F = 5, for L1 (ADMM), L1-L2 (DCA), L1-L2 (ADMM), and L1-L2 (weighted).


• DCA is more stable than nonconvex-ADMM, as each DCA subproblem is convex.

• Since the problem is convex at α = 0, we consider a continuation scheme that gradually increases α from 0 to 1, referred to as "weighted".

• How to update the weight α? (See the sketch after this list.)

• For incoherent matrices, a linear increase of α with a large slope until reaching one;

• for coherent cases, a sigmoid function to update α, which may or may not reach one.
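Two possible α-update rules matching this description; the constants below are illustrative placeholders, since the slides do not give them:

```python
import numpy as np

def alpha_linear(n, slope=1e-3):
    """Linear ramp to 1 (suggested for incoherent matrices); n is the iteration count."""
    return min(1.0, slope * n)

def alpha_sigmoid(n, scale=2e-3, shift=3000.0, top=0.8):
    """Sigmoid ramp (suggested for coherent cases); top < 1 means alpha never reaches one."""
    return top / (1.0 + np.exp(-scale * (n - shift)))
```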



Figure: Different ways of updating α over iterations for incoherent (blue, F = 5) or coherent (red, F = 20) matrices.


Figure: Success rates (vs. sparsity) of coherent matrices, F = 20, for L1 (ADMM), L1-L2 (DCA), L1-L2 (ADMM), and L1-L2 (weighted).


Outline

1 A nonconvex approach: L1-L2

2 Minimization algorithms

3 Some applications

4 Conclusions


Super-resolution

The super-resolution problem discussed here is different from image zooming or magnification; it aims to recover a real-valued signal from its low-frequency measurements.

A mathematical model is expressed as

bk = (1/√N) Σ_{t=0}^{N−1} xt e^{−i2πkt/N},  |k| ≤ fc,

where x ∈ R^N is the vector of interest, and b ∈ C^n is the given low-frequency information with n = 2fc + 1 (n < N).
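The measurement operator is a partial DFT; a minimal sketch of the forward model:

```python
import numpy as np

def low_freq_measurements(x, fc):
    """b_k = (1/sqrt(N)) * sum_t x_t * exp(-2i*pi*k*t/N), for |k| <= fc."""
    N = len(x)
    k = np.arange(-fc, fc + 1)          # n = 2*fc + 1 low frequencies
    t = np.arange(N)
    F = np.exp(-2j * np.pi * np.outer(k, t) / N) / np.sqrt(N)
    return F @ x                        # b lives in C^n with n < N
```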


Point source with minimum separation

Theorem (Candès and Fernandez-Granda, 2012). Let T = {tj} be the support of x. If the minimum separation obeys

Δ(T) ≥ 2 · N/fc,

then x is the unique solution to L1 minimization. If x is real-valued, the minimum gap can be lowered to 1.26 · N/fc.


Figure: Success rates vs. minimum separation factor (MSF) for constrained L1 (SDP), constrained L1−2, unconstrained L1−2, unconstrained CL1, and unconstrained L1/2.

Y. Lou, P. Yin and J. Xin, J. Sci. Comput., 2016


Low-rank recovery

Replacing the nuclear norm with the truncated L1-L2 of the singular values (a sketch of this surrogate follows the figure):

Figure: Success rates vs. rank for FPC, FPCA, LMaFit, IRucLq-M, and lt,1−2.

T. Ma, Y. Lou, and T. Huang, SIAM Imaging Sci., to appear 2017
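A sketch of the rank surrogate used here: the truncated L1-L2 metric applied to the singular values of X (the function name is ours):

```python
import numpy as np

def truncated_l1_l2_of_singular_values(X, t):
    """Nonconvex rank surrogate: ||sigma(X)||_{t,1-2}."""
    s = np.linalg.svd(X, compute_uv=False)   # singular values, in nonincreasing order
    tail = s[t:]                             # exclude the t largest singular values
    return tail.sum() - np.linalg.norm(tail) # nonnegative values, so L1 is a plain sum
```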


Image processing

We consider

J(u) = ‖Dxu‖1 + ‖Dyu‖1 − α‖√(|Dxu|² + |Dyu|²)‖1,

which turns out to be a weighted difference of anisotropic and isotropic TV:

J(u) = Jani − α Jiso,

where α ∈ [0, 1] is a weighting parameter.

Gradient vectors (ux, uy) are mostly 1-sparse, and α takes into account the occurrence of non-sparse gradient vectors.

Y. Lou, T. Zeng, S. Osher and J. Xin, SIAM J. Imaging Sci., 2015
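A sketch of evaluating J(u), using forward differences with replicated boundaries (one common discretization, assumed here; the paper's exact choice may differ):

```python
import numpy as np

def weighted_aniso_iso_tv(u, alpha):
    """J(u) = ||Dx u||_1 + ||Dy u||_1 - alpha * || sqrt(|Dx u|^2 + |Dy u|^2) ||_1."""
    dx = np.diff(u, axis=1, append=u[:, -1:])  # horizontal forward differences (zero at boundary)
    dy = np.diff(u, axis=0, append=u[-1:, :])  # vertical forward differences
    ani = np.abs(dx).sum() + np.abs(dy).sum()  # anisotropic TV
    iso = np.sqrt(dx ** 2 + dy ** 2).sum()     # isotropic TV
    return ani - alpha * iso
```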


MRI reconstruction

Figure: original image, k-space data, and sampling mask (top row); reconstructions FBP (ER = 0.99), L1 (ER = 0.1), and L1 − L2 (ER ∼ 10⁻⁸) (bottom row).


Block-matching

Figure: Illustration of constructing groups by block-matching (BM). For each w × w reference patch from an n1 × n2 image, we use block-matching to search for its n − 1 best-matched patches in terms of Euclidean distance, and then vectorize (w × w → w² × 1) and combine those patches to form a group of size w² × n.
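A brute-force sketch of this grouping step; practical implementations restrict the search to a local window around the reference patch, which is omitted here for brevity:

```python
import numpy as np

def block_match_group(image, top, left, w, n):
    """Return a w^2 x n group: the reference patch at (top, left) plus its
    n - 1 nearest w x w patches in Euclidean distance."""
    H, W = image.shape
    ref = image[top:top + w, left:left + w].ravel()
    patches, dists = [], []
    for i in range(H - w + 1):
        for j in range(W - w + 1):
            p = image[i:i + w, j:j + w].ravel()
            patches.append(p)
            dists.append(float(np.sum((p - ref) ** 2)))
    order = np.argsort(dists)[:n]              # the reference itself has distance 0
    return np.stack([patches[k] for k in order], axis=1)   # group of size w^2 x n
```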


Block-matching inpainting

Barbara, 85% missing pixels:

Method   PSNR     SSIM
BM3D     26.71    0.8419
SAIST    29.53    0.9147
ours     30.65    0.9264

T. Ma, Y. Lou, T. Huang and X. Zhao, ICIP, to appear 2017


Logistic regression

Given a collection of training data xi ∈ R^n with labels yi ∈ {±1} for i = 1, 2, · · · , m, we aim to find a hyperplane defined by {x : wᵀx + v = 0} by minimizing the objective

min_{w∈R^n, v∈R} F(w, v) + lavg(w, v),

where the second term is the averaged loss, defined as

lavg(w, v) = (1/m) Σ_{i=1}^{m} ln(1 + exp(−yi(wᵀxi + v))).
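The averaged loss is straightforward to evaluate; a sketch using logaddexp for numerical stability in the exponential:

```python
import numpy as np

def averaged_logistic_loss(w, v, X, y):
    """l_avg(w, v) = (1/m) * sum_i ln(1 + exp(-y_i * (w^T x_i + v)));
    X is m x n, y has entries in {+1, -1}."""
    margins = y * (X @ w + v)
    return np.logaddexp(0.0, -margins).mean()  # ln(1 + e^{-margin}), computed stably
```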


Preliminary results

Real data of patients with inflammatory bowel disease (IBD) for years 2011 and 2012. For each year, the data set contains 18 types of medical information, such as prescriptions, number of office visits, and whether the patient was hospitalized. We used the 2011 data to train our classifier and the 2012 data to validate its performance.

           L1       L2       L1-L2
Recall     0.6494   0.6585   0.6829
Precision  0.0883   0.0882   0.0912
F-Score    0.0573   0.0581   0.0622
AUC        0.7342   0.7321   0.7491

UCLA REU project in 2015, followed by Q. Jin and Y. Lou for a journal submission


Conclusions

1 L1-L2 is always better than L1, and is better than Lp for highly coherent matrices.

2 The proximal operator can accelerate the minimization, but it tends to yield a suboptimal solution.

3 In general, nonconvex methods show better empirical performance than convex ones, but lack provable guarantees.



Thank you!