48
x Nonsmooth, Nonconvex Minimization Problems with Applications Xiaojun Chen Department of Applied Mathematics Hong Kong Polytechnic University ICCOPT, Santiago, Chile, 26-29 July 2010 Nonsmooth, nonconvex Minimization Problems with Applications 0/47

xNonsmooth, Nonconvex Minimization - PolyU

  • Upload
    others

  • View
    28

  • Download
    0

Embed Size (px)

Citation preview

Page 1: xNonsmooth, Nonconvex Minimization - PolyU

xNonsmooth, Nonconvex Minimization

Problems with Applications

Xiaojun Chen

Department of Applied Mathematics

Hong Kong Polytechnic University

ICCOPT, Santiago, Chile, 26-29 July 2010

Nonsmooth, nonconvex Minimization Problems with Applications 0/47

Page 2: xNonsmooth, Nonconvex Minimization - PolyU

Nonsmooth, nonconvex minimization

minx∈X

f(x),

where X ⊆ Rn is convex but f : Rn → R is

not convex

not differentiable

not locally Lipschitz in some applications

OutlineMathematical models and applications:

Traffic assignment under uncertaintyDistribution of points on the sphereVariable selection, signal reconstruction

Smoothing algorithms

Nonsmooth, nonconvex Minimization Problems with Applications 1/47

Page 3: xNonsmooth, Nonconvex Minimization - PolyU

Part I: Mathematical models and applications

I. Stochastic complementarity problems— Traffic assignment under uncertainty

M. Fukushima (Kyoto Univ.)A. Sumalee (PolyU, Transportation engineering)

C. Zhang (Beijing Jiaotong Univ.), et al.

II. Optimization on the sphere— Distribution of points on the sphere

I. Sloan, R. Womersley (Univ. New South Wales)A. Frommer, B. Lang (Wuppertal Univ.)

J. Ye (Victoria Univ.), et al.

III. The ℓ2-ℓp (0 < p < 1) minimization— Variable selection, signal reconstruction

Y. Ye (Stanford Univ.)

F. Xu (Xi’an Jiaotong Univ.), W. Zhou (PolyU), et al.

Nonsmooth, nonconvex Minimization Problems with Applications 2/47

Page 4: xNonsmooth, Nonconvex Minimization - PolyU

I. Stochastic complementarity problemsNonlinear complementarity problem (NCP): Given F : Rn → Rn,

x ≥ 0, F (x) ≥ 0, xTF (x) = 0.

The NCP can be reformulated as a system of nonlinear equations

Φ(x, F (x)) =

φ(x1, F1(x))...

φ(xn, Fn(x))

= 0

or a minimization problem

minx∈Rn

‖Φ(x, F (x))‖2

by using an NCP function φ.

Nonsmooth, nonconvex Minimization Problems with Applications 3/47

Page 5: xNonsmooth, Nonconvex Minimization - PolyU

NCP functions

A function φ : R2 → R is called an NCP-function if

φ(a, b) = 0 ⇔ ab = 0, a ≥ 0, b ≥ 0.

Example of NCP functions

φNR(a, b) = min(a, b) natural residualφFB(a, b) = a+ b−

√a2 + b2 Fischer-Burmeister function

φCCK(a, b) = λφFB(a, b) + (1− λ)a+b+ penalized FB function

Smoothing Newton methods and semismooth Newton methods areefficient to solve the NCP via the nonsmooth equations Φ(x, F (x)) = 0

or minimization problem min ‖Φ(x, F (x))‖2.Cottle-Pang-Stone (1992), Facchinei-Pang (2000), Ferris-Pang (1997), B.Chen-Harker

(1997), C.Chen-Mangasarian (1996), Chen-Qi-Sun (1998), Chen-Ye (1999),

Chen-Chen-Kanzwo (2000), Luo-Tseng(1997), Yamashita-Fukushima (1997), Qi-Sun

(1993), Ralph (1994) et al.Nonsmooth, nonconvex Minimization Problems with Applications 4/47

Page 6: xNonsmooth, Nonconvex Minimization - PolyU

Stochastic NCP using an NCP functionStochastic NCP: Given F : Rn × Ω→ Rn,

x ≥ 0, F (x, ω) ≥ 0, xTF (x, ω) = 0, for ω ∈ Ω.

Expected value (EV) formulation

x ≥ 0, E[F (x, ω)] ≥ 0, xTE[F (x, ω)] = 0

⇔ minx∈Rn

‖Φ(x,E[F (x, ω)])‖2

Expected residual minimization (ERM) formulation

minx≥0

E[‖Φ(x, F (x, ω))‖2]

Best worst case(BWC) formulation

minx≥0

supω∈Ω‖Φ(x, F (x, ω))‖2

Nonsmooth, nonconvex Minimization Problems with Applications 5/47

Page 7: xNonsmooth, Nonconvex Minimization - PolyU

Traffic assignmentNguyen and Dupuis Network

i

i

i

i

i

i

i

i

i

i

i

i

i

-HHHHHHHHHj- - - -

- - -

-

? ?

? ? ? ?

?

@@

@@@R

@@

@@@R

4

9

5

1

13

10

6

12

3

11

7

2

8

2

18

3 5 7 9

12 14 15

19

1

6

17

8 10

16

114

13

13 nodes, 19 links, 25 paths connecting 4 origin-destination (OD) pairs

1→ 2, 4→ 2, 1→ 3 and 4→ 3.Nonsmooth, nonconvex Minimization Problems with Applications 6/47

Page 8: xNonsmooth, Nonconvex Minimization - PolyU

Sioux Falls network

1

654

7

3

16

2

9 8

19

17

101112 18

1514

20

22

21

23

2413

3

11

1

7

4

15

6

145

9 12

8

37

35

38

33

36

10 31

2313

27

40

25 26

21

48

16 19

2

29

22 47

20

17

55

18 54

50

24

34

41

28 43

3052

53 5657

42

45

3251 49

71

7270

46 6759 61

6675

7673 69 656264

6368

3974

44

6058

24 nodes, 76 links, 528 OD pairs, 1179 paths

Nonsmooth, nonconvex Minimization Problems with Applications 7/47

Page 9: xNonsmooth, Nonconvex Minimization - PolyU

Wardrop’s user equilibriumWardrop’s user equilibrium At the equilibrium point no travelercan change his route to reduce his travel cost.

For one scenario ω ∈ Ω, the static Wardrop’s user equilibrium isequivalent to NCP

x ≥ 0, F (x, ω) ≥ 0, xTF (x, ω) = 0,

where

x =

(

y

u

)

, F (x, ω) =

(

G(y, ω)− ΓTu

Γy −Q(ω)

)

.

y : a path flow pattern, u : a travel cost vector.

G : path travel cost functionΓ : Origin-Destination(OD) route incidence matrixQ : demand on each OD-pair

Nonsmooth, nonconvex Minimization Problems with Applications 8/47

Page 10: xNonsmooth, Nonconvex Minimization - PolyU

ERM formulationExpected residual minimization (ERM) formulation

minx≥0

f(x) := E[‖min(x, F (x, ω))‖2] (ERM)

Chen-Fukushima(MOR2005),Fang-Chen-Fukushima(SIOPT2007).

Error bounds

E[‖x− x∗ω‖] ≤ kE[‖min(x, F (x, ω))‖2]

E[dist(x−X∗ω)] ≤ kE[‖min(x, F (x, ω))‖2]

Chen-Xiang (MP 2006, 2009, SIOPT 2007).

Smoothing algorithms for solving ERMChen-Zhang-Fukushima(MP2009), Zhang-Chen(SIOPT2009).

Applications in traffic assignmentZhang-Chen-Sumalee (2009)

Nonsmooth, nonconvex Minimization Problems with Applications 9/47

Page 11: xNonsmooth, Nonconvex Minimization - PolyU

II. Optimization on the sphere

S2 = z ∈ R3 : ‖z‖2 = 1 , Area |S2| = 4π

Pt: the linear space of restrictions of polynomials

of degree ≤ t in 3 variables to S2.

dim Pt= (t + 1)2

Nonsmooth, nonconvex Minimization Problems with Applications 10/47

Page 12: xNonsmooth, Nonconvex Minimization - PolyU

Distribution of points on the sphere

Pt can be spanned by an orthonormal set of real spherical harmonicswith degree r and order k,

Yrk | k = 1, . . . , 2r + 1, r = 0, 1, . . . , t.

Let XN = z1, . . . , zN ⊂ S2 be a set of N -points on the sphere.

The Gram matrixGt(XN ) = Y (XN )TY (XN ),

where Y (XN ) ∈ R(t+1)2×N and the j-th column of Y (XN ) is given by

Yrk(zj), k = 1, . . . , 2r + 1, r = 0, 1, . . . , t.

Nonsmooth, nonconvex Minimization Problems with Applications 11/47

Page 13: xNonsmooth, Nonconvex Minimization - PolyU

Four sets of points on the sphereSet of points XN = z1, . . . , zN ⊂ S2

minimum energy system argmin

N∑

i 6=j

1

‖zi − zj‖extremal system argmax det(Y (XN )Y (XN )T )

minimum cond points argminλmax(Y (XN )Y (XN )T )

λmin(Y (XN )Y (XN )T )

(Chen-Womersley-Ye 2010)

spherical t−design∫

S2

p(z)dz =4π

N

N∑

i=1

p(zi), ∀p ∈ Pt

⇔ F (XN ) = 0 (Chen-Womersley, SINUM2006)

Well conditioned spherical t-design(Chen-Frommer-Lang, 2009, An-Chen-Sloan-Womersley 2010)

Nonsmooth, nonconvex Minimization Problems with Applications 12/47

Page 14: xNonsmooth, Nonconvex Minimization - PolyU

Spherical 100-design with N = 10201 points

Nonsmooth, nonconvex Minimization Problems with Applications 13/47

Page 15: xNonsmooth, Nonconvex Minimization - PolyU

III. The ℓ2-ℓp (0 < p < 1) minimization

Given a matrix A ∈ Rm×n, a vector b ∈ Rm, a number λ > 0,

minx∈Rn

‖Ax− b‖22 + λ‖x‖pp (ℓ2-ℓp)

Nonsmooth, nonconvex, non-Lipschitz minimization

Compressive sensing, sparse solutions of systems

Signal reconstruction, variable selection, image processing.

‖x‖0 =

n∑

i=1

xi 6=0

|xi|0 ←− ‖x‖pp =

n∑

i=1

|xi|p −→ ‖x‖1 =

n∑

i=1

|xi|

Bruckstein-Donoho-Elad (2009), Candén-Wakin-Boyd (2008),Chartrand-Staneva (2008), Chartrand-Yin (2009), Foucart-Lai (2009),Ge-Jiang-Ye (2010), Lai-Wang (2009), Nikolova et al (2008), Xu et al(2010).

Nonsmooth, nonconvex Minimization Problems with Applications 14/47

Page 16: xNonsmooth, Nonconvex Minimization - PolyU

The lower bound theory I

Chen-Xu-Ye, 2009

Let ai be the ith column of A. Let

Li =

(

λp(1− p)2‖ai‖2

)1

2−p

, i = 1, · · · , n.

Theorem 1 For any local solution x∗ of (ℓ2-ℓp), the followingstatements hold.

x∗i ∈ (−Li, Li) ⇒ x∗i = 0, i ∈ 1, · · · , n.The columns of the sub-matrix B := AΛ ∈ Rm×|Λ| of A are linearlyindependent, where Λ = support x∗.(ℓ2-ℓp) has a finite number of local minimizers.

Nonsmooth, nonconvex Minimization Problems with Applications 15/47

Page 17: xNonsmooth, Nonconvex Minimization - PolyU

The lower bound theory II

Chen-Xu-Ye 2009

For an arbitrarily given point x0, let

L =

(

λp

2‖A‖√

f(x0)

)1

1−p

.

Theorem 2 Let x∗ be any local minimizer of (ℓ2-ℓp) satisfyingf(x∗) ≤ f(x0). Then we have

x∗i ∈ (−L,L) ⇒ x∗i = 0, i ∈ 1, · · · , n.The number of nonzero entries in x∗ is bounded by

‖x∗‖0 ≤ min

(

m,f(x0)

λLp

)

.

Nonsmooth, nonconvex Minimization Problems with Applications 16/47

Page 18: xNonsmooth, Nonconvex Minimization - PolyU

Reweighted ℓ1 minimization algorithm (RL1)

Chen-Zhou 2010

Given ε > 0. The Iterative RL1 (IRL1) for (ℓ2-ℓ1):

xk+1 = arg minx∈Rn

‖Ax− b‖22 + λ‖W kx‖1

W k = diag(wki ), wk

i =p

(|xki |+ ε)1−p

, i = 1, . . . , n.

Extensive numerical experiments (Candén-Wakin-Boyd(2008),Chartrand-Staneva(2008), et al ) have shown that the IRL1 is veryefficient. However no convergence results have been given.

Theorem 3 Let xk be a sequence generated by the IRL1. Then thesequence xk converges to a stationary point x∗ of

minx∈Rn

‖Ax− b‖22 + λ

n∑

i=1

(|xi|+ ε)p, 0 < p < 1.

Nonsmooth, nonconvex Minimization Problems with Applications 17/47

Page 19: xNonsmooth, Nonconvex Minimization - PolyU

Part II: Smoothing algorithms

Definition 1: Let f : Rn → R be locally Lipschitz. We callf : Rn ×R+ → R a smoothing function of f , if f(·, µ) iscontinuously differentiable in Rn for any fixed µ > 0, and

limµ↓0

f(x, µ) = f(x), for anyx ∈ Rn.

Subdifferential associated with f

Gf (x) = V : ∃N ∈ N ♯∞, x

ν−→Nx, µν ↓ 0 with ∇xf(xν , µν) −→

NV .

Rockafellar and Wets (1998): Gf (x) is nonempty and bounded,

∂f(x) = co limxi→ x

xi∈Df

∇f(xi) ⊆ coGf (x).

In many cases: ∂f(x) = coGf

Nonsmooth, nonconvex Minimization Problems with Applications 18/47

Page 20: xNonsmooth, Nonconvex Minimization - PolyU

Smoothing algorithms

Choose a smoothing function f(x, µ) and an algorithm for smoothproblems

Use f(xk, µk) and its gradient ∇f(xk, µk) at each step of thealgorithm

Update the smoothing parameter µk at each step. The updatingscheme plays a key role in convergence analysis of the smoothingmethod.

Challeges:

1 How to choose a smoothing function and an algorithm for theproblem ?

2 How to update the smoothing parameter µk ?

We develop efficient smoothing projected gradient method andsmoothing conjugate gradient method.We prove global convergence of these methods to a stationary point.

Nonsmooth, nonconvex Minimization Problems with Applications 19/47

Page 21: xNonsmooth, Nonconvex Minimization - PolyU

Smoothing gradient method

Step 1. Choose constants σ, ρ ∈ (0, 1), and an initial point x0. Set k = 0.

Step 2. Compute the gradient

gk = ∇f(xk, µk).

Step 3. Compute the step size νk by the Armijo line search, whereνk = maxρ0, ρ1, · · · and ρi satisfies

f(xk − ρigk, µk) ≤ f(xk, µk)− σρigTk gk.

Set xk+1 = xk − νkgk.

Step 4. If ‖∇f(xk+1, µk)‖ ≥ nµk, then set µk+1 = µk; otherwise, chooseµk+1 = σµk.

Smoothing conjugate gradient method Chen-Zhou (SIIMS 2010).Nonsmooth, nonconvex Minimization Problems with Applications 20/47

Page 22: xNonsmooth, Nonconvex Minimization - PolyU

Smoothing model of the ℓ2-ℓp minimization

Let ψµ(x) = (sµ(x1), · · · , sµ(xn))T , and

sµ(t) =

|t| |t| > µ

t2

2µ+µ

2|t| ≤ µ.

minx∈Rn

f(x, µ) := ‖Ax− b‖2 + λ‖ψµ(x)‖pp.

For any µ > 0, the set of local minimizers X ∗µ of the smoothing

model is nonempty and bounded, and fµ is continuouslydifferentiable.

Nonsmooth, nonconvex Minimization Problems with Applications 21/47

Page 23: xNonsmooth, Nonconvex Minimization - PolyU

Smooth version of the lower bound theory

Let

L =

(

λp

2‖A‖√

f(x0)

)1

1−p

and Li =

(

λp(1− p)2‖ai‖2

)1

2−p

, i = 1, · · · , n.

Theorem 4

For any local minimizer x∗µ of the smoothing (ℓ2-ℓp),

(xµ)∗i ∈ (−Li, Li) ⇒ |(x∗µ)i| ≤ µ, i ∈ 1, · · · , n.

For any local minimizer x∗µ of the smoothing (ℓ2-ℓp) satisfyingf(x∗µ) ≤ f(x0),

(xµ)∗i ∈ (−L, L) ⇒ |(x∗µ)i| ≤ µ, i ∈ 1, · · · , n.

Nonsmooth, nonconvex Minimization Problems with Applications 22/47

Page 24: xNonsmooth, Nonconvex Minimization - PolyU

Stationary PointFor x ∈ Rn, let X = diag(x).

(1) x is said to satisfy the first order necessary condition (KKTcondition) of the ℓ2-ℓp problem if

2XAT (Ax− b) + λp|x|p = 0.

(2) x is said to satisfy the second order necessary condition of theℓ2-ℓp problem if

2XATAX + λp(p− 1)diag(|x|p)

is positive semidefinite.

Let X be the set of KKT points of the ℓ2-ℓp problem and Xµ be the setof KKT points of its smoothing problem.Theorem 5 Let xµ ∈ Xµ. We have

limµ↓0

dist(xµ,X ) = 0.

Nonsmooth, nonconvex Minimization Problems with Applications 23/47

Page 25: xNonsmooth, Nonconvex Minimization - PolyU

Properties of smoothing functionf(x, µ) is continuously differentiable and

| f(x, µ)− f(x) |≤ λn(µ

2

)p

.

For any x ∈ Rn, the level set

Sµ(x) = x ∈ Rn∣

∣f(x, µ) ≤ f(x, µ)

is bounded;

The gradient of f(·, µ) is Lipschitz continuous.

Theorem 6 From any initial point x0, the sequence xk generated bythe SG method satisfies

µk ≡ ε, for all large k and limk→∞

inf‖∇f(xk, µk)‖ = 0.

Nonsmooth, nonconvex Minimization Problems with Applications 24/47

Page 26: xNonsmooth, Nonconvex Minimization - PolyU

Error boundTheorem 7 There is µ > 0, such that for any µ ∈ (0, µ] and anyxµ ∈ Xµ, there is x∗ ∈ X such that

Γµ := i∣

∣|(x∗µ)i| ≤ µ, i ∈ N = i∣

∣|x∗i | = 0, i ∈ N =: Γ.

Define

(x∗µ)i =

0 i ∈ Γ

(xµ)i i ∈ N\Γ.

Let B be the submatrix of A whose columns are indexed by N\Γ.Suppose λmin(BTB) >

λp(1− p)2

Lp−2, then

∥x∗µ − x∗∥

∥ ≤∥

∥G−1∥

∥∇f(x∗µ, µ)∥

∥.

where G = 2BTB + λp(p− 1)Lp−2I, and λmin(BTB) denotes thesmallest eigenvalue of BTB.

Nonsmooth, nonconvex Minimization Problems with Applications 25/47

Page 27: xNonsmooth, Nonconvex Minimization - PolyU

Orthogonal Matching Pursuit (OMP)Mallat-Zhang (1993), Chen-Donoho-Saunders (1998), Mruckstein-Donoho-Elad (2009)

Parameters: Given the error threshold β.Initialization: Set the initial point x0 = 0, the initial residual

r0 = b− Ax0 = b, the initial solution support Λ0 = ∅.Main Iteration: Increment k by 1 and perform the following steps:

• Find the index jk that solves the optimization problem

jk = arg max‖(Ak−1xk−1 − b)Taj‖22

‖aj‖for j ∈ N \ Λk−1.

• Let Λk = Λk−1

[jk].

• Find xk = arg min ‖Ax− b‖22 | supportx = Λk .• Calculate the new residual rk = Axk − b.• If ‖AT rk‖ < β, stop.

Output: A point xomp := xk, a set Λ=support(xomp)

and a matrix B = AΛ ∈ Rm×|Λ|.

Nonsmooth, nonconvex Minimization Problems with Applications 26/47

Page 28: xNonsmooth, Nonconvex Minimization - PolyU

OMP-SCG hybrid method

Step 1. Using the OMP to get xomp, Λ =support(xomp), andB = AΛ ∈ Rm×|Λ|.

Step 2. Using the SCG algorithm with the initial point x0 = xomp to find

y∗ = arg min ‖By − b‖22 + λ‖y‖pp.

Step 3. Output an numerical solution x∗, where

x∗j =

y∗j |yj | ≥ L and j ∈ Λ,

0 j 6∈ Λ.

Nonsmooth, nonconvex Minimization Problems with Applications 27/47

Page 29: xNonsmooth, Nonconvex Minimization - PolyU

Other smoothing functions

minx∈Rn

f(x) := ‖Ax− b‖22 + λ

n∑

i=1

ϕ(xi),

ϕ is a potential function ( α ∈ (0, 1) is a parameter)

Convex Non Lipschitz

(f1) ϕ(t) = |t| ϕ(t) = |t|pNon convex Non Lipschitz

(f2) ϕ(t) = |t|α ϕ(t) = (|t|α)p

(f3) ϕ(t) =α|t|

1 + α|t| ϕ(t) =α|t|p

1 + α|t|p(f4) ϕ(t) = log(α|t|+ 1) ϕ(t) = log(α|t|p + 1)

|t| ⇒ sµ(t) =

|t| |t| > µt2

2µ+ µ

2 |t| ≤ µ.

Nonsmooth, nonconvex Minimization Problems with Applications 28/47

Page 30: xNonsmooth, Nonconvex Minimization - PolyU

ReferencesI. Stochastic complementarity problems

1. X. Chen and M. Fukushima, Expected Residual MinimizationMethod for Stochastic Linear Complementarity Problems,Mathematics of Operations Research 30(2005) 1022-1038.

2. H. Fang, X. Chen and M. Fukushima, Stochastic R0 matrix linearcomplementarity problems , SIAM J. Optimization 18(2007)482-506.

3. X. Chen, C. Zhang and M. Fukushima, Robust solution ofmonotone stochastic linear complementarity problems,Mathematical Programming 117(2009) 51-80

4. C. Zhang and X. Chen, Smoothing projected gradient method andits application to stochastic linear complementarity problems,SIAM J. Optimization 20(2009) 627-649.

5. C. Zhang, X. Chen and A Sumalee, Robust Wardrop’s UserEquilibrium Assignment under Stochastic Demand and Supply:Expected Residual Minimization Approach, Submitted toTransportation Research Part B (second revised version, minorchanges, 2010) Nonsmooth, nonconvex Minimization Problems with Applications 29/47

Page 31: xNonsmooth, Nonconvex Minimization - PolyU

References

II Optimization on the sphere

1. X. Chen and R. Womersley, Existence of solutions to systems ofunderdetermined equations and spherical designs , SIAM J.Numerical Analysis 44(2006) 2326-2341

2. X. Chen, A. Frommer and B. Lang, Computational ExistenceProofs for Spherical t-Designs , to appear in NumerischeMathematik.

3. C. An, X. Chen, I. H. Sloan, R. S. Womersley, Well conditionedspherical designs for integration and interpolation on thetwo-Sphere, to appear in SIAM J. Numerical Analysis.

4. X. Chen, R. S. Womersley and J. Ye, Minimizing the ConditionNumber of a Gram Matrix, 2010, accepted for publication in SIAMJ. Optimization subject to a revised version.

Nonsmooth, nonconvex Minimization Problems with Applications 30/47

Page 32: xNonsmooth, Nonconvex Minimization - PolyU

References

III Nonsmooth, nonconvex regularization

1.X.Chen, F. Xu and Y. Ye, Lower bound theory of nonzero entries insolutions of ℓ2-ℓp minimization, to appear in SIAM J. ScientificComputing

2.X. Chen and W. Zhou, Smoothing Nonlinear Conjugate GradientMethod for Image Restoration using Nonsmooth NonconvexMinimization, to appear in SIAM J. Imaging Sciences

3. X. Chen and W. Zhou, Convergence of Reweighted ℓ1 MinimizationAlgorithms and Unique Solution of Truncated ℓp Minimization,April, 2010,http://www.polyu.edu.hk/ama/staff/xjchen/ChenXJ.htm

Nonsmooth, nonconvex Minimization Problems with Applications 31/47

Page 33: xNonsmooth, Nonconvex Minimization - PolyU

Thank you

Note: The remainder of slides is for questions.

Nonsmooth, nonconvex Minimization Problems with Applications 32/47

Page 34: xNonsmooth, Nonconvex Minimization - PolyU

Computational results

LASSO: Solve the ℓ2-ℓ1 problem by the least squares algorithm (2004).

IRL1: At kth iteration, use LASSO to solve the following ℓ2-ℓ1 problem

min ‖Ax− b‖22 + λ

n∑

i=1

|xi|√

|xi|k + ε,

where ε > 0 is a parameter.

OMP-SCG

Nonsmooth, nonconvex Minimization Problems with Applications 33/47

Page 35: xNonsmooth, Nonconvex Minimization - PolyU

Example 1: Variable selection

This example is artificially generated and is used firstly inTibshirani (1996).

True optimal solution x∗ = (3, 1.5, 0, 0, 2, 0, 0, 0)T .

We simulated 100 data sets consisting of m observations from themodel

Ax = b+ σǫ,

where ǫ is standard normal.

MSE: The mean squared errors over the test set;ANZ: The average number of correctly identified zero coefficient;NANZ: The average number of the coefficients erroneously set

to zero over test set.

Nonsmooth, nonconvex Minimization Problems with Applications 34/47

Page 36: xNonsmooth, Nonconvex Minimization - PolyU

Results for variable selection

m σ Approach MSE ANZ NANZ

LASSO 0.4730 4.77 0.23

40 3 IRL1 0.4688 4.83 0.17

OMP-SCG 0.4755 4.88 0.12

LASSO 0.1595 4.77 0.23

40 1 IRL1 0.1541 4.86 0.14

OMP-SCG 0.1511 4.91 0.09

LASSO 0.3582 4.92 0.08

60 1 IRL1 0.3503 4.93 0.07

OMP-SCG 0.3464 4.95 0.05

The OMP-SCG performs the best, followed by LASSO and IRL1.

Nonsmooth, nonconvex Minimization Problems with Applications 35/47

Page 37: xNonsmooth, Nonconvex Minimization - PolyU

Example 2: Prostate cancer

This data sets are from the UCI Standard database.

The data set consists of the medical records of 97 patients whowere about to receive a radical prostatectomy. The predictors areeight clinical measures: lcavol, lweight, age, lbph, svi,lcp, gleasonand pgg45.

One of the main aims here is to identify which predictors are moreimportant in predicting the response.

Nonsmooth, nonconvex Minimization Problems with Applications 36/47

Page 38: xNonsmooth, Nonconvex Minimization - PolyU

Results for prostate cancer

Parameter LASSO IRL1 OMP-SCG

x1(lcavol) 0.545 0.6187 0.6436

x2(lweight) 0.237 0.2362 0.2804

x3(lage) 0 0 0

x4(lbph) 0.098 0.1003 0

x5(svi) 0.165 0.1858 0.1857

x6(lcp) 0 0 0

x7(gleason) 0 0 0

x8(pgg45) 0.059 0 0

Number of nonzreo 5 4 3

Prediction error 0.478 0.468 0.4419

SCG and OMP-SCG succeed in finding three main factors andhave better prediction accuracy than IRL1 and LASSO.

Nonsmooth, nonconvex Minimization Problems with Applications 37/47

Page 39: xNonsmooth, Nonconvex Minimization - PolyU

Error bound

For given ε,

∥x∗µ − x∗∥

∥ ≤∥

∥G−1∥

∥∇f(x∗µ, µ)∥

=: error bound

µ L λ error bound

0.001 0.015 0.1304 1.5793× 10−5

0.0001 0.0119 0.1164 5.7310× 10−6

0.00001 0.0119 0.1164 5.5721× 10−6

Nonsmooth, nonconvex Minimization Problems with Applications 38/47

Page 40: xNonsmooth, Nonconvex Minimization - PolyU

Example 3: signal reconstruction

A real-valued, finite-length signal x ∈ Rn and x is T-sparse;

A ∈ Rm×n is a Gaussian random matrix.

Problem LASSO IRL1 OMP-SCG

(Error,Time) (Error,Time) L λ Error Time

n = 512

T = 60 (5.33 × 10−4, (1.29 × 10−5, 0.8 0.002 1.12 × 10−16 1.02

m = 184 0.653) 6.82)

n = 512

T = 60 (38.64, (2.41 × 10−5, 0.7 0.001 1.03 × 10−16 1.34

m = 182 0.43) 7.84)

n = 512

T = 130 (122.25, (119.43, 0.00001 0.00006 0.41 4.03

m = 225 0.69) 19.99)

Nonsmooth, nonconvex Minimization Problems with Applications 39/47

Page 41: xNonsmooth, Nonconvex Minimization - PolyU

f1(t) = |t|, f2(t) = |t| 12 , f3(t) =α|t|

1 + α|t| , f4(t) = log(α|t|+ 1)

−0.4 −0.3 −0.2 −0.1 0 0.1 0.2 0.3 0.40

0.1

0.2

0.3

0.4

0.5

0.6

0.7

f1f2f3f4f1

mu

f2mu

Nonsmooth, nonconvex Minimization Problems with Applications 40/47

Page 42: xNonsmooth, Nonconvex Minimization - PolyU

Results of prostate cancer by all the PFs

p (L, Number of nonzero, Prediction error)

f1 f2 f3 f4

0.9 (0.0001, 4, 0.4754) (0.011, 4, 0.473) (2.500, 4, 0.475) (2.040, 4, 0.474)

0.8 (0.0015, 4, 0.4740) (0.013, 4, 0.468) (1.990, 4, 0.474) (1.851, 4, 0.474)

0.7 (0.0050, 4, 0.4741) (0.012, 4, 0.465) (1.755, 4, 0.474) (1.550, 4, 0.474)

0.6 (0.0084, 4, 0.4661) (0.015, 3, 0.446) (1.545, 4, 0.475) (1.344, 4, 0.475)

0.5 (0.0119, 3, 0.4419) (0.016, 3, 0.445) (1.420, 3, 0.477) (1.200, 3, 0.483)

0.4 (0.0148, 3, 0.4456) (0.014, 3, 0.445) (1.480, 3, 0.477) (1.114, 3, 0.484)

0.3 (0.0176, 3, 0.4429) (0.012, 3, 0.443) (1.590, 3, 0.484) (1.190, 3, 0.483)

0.2 (0.0196, 3, 0.4359) (0.018, 3, 0.443) (1.955, 3, 0.483) (1.240, 3, 0.482)

Nonsmooth, nonconvex Minimization Problems with Applications 41/47

Page 43: xNonsmooth, Nonconvex Minimization - PolyU

signal reconstruction

0 100 200 300 400 500 600−8

−6

−4

−2

0

2

4

6

(a) Original signal

−4

−2

0

2

4

6

Nonsmooth, nonconvex Minimization Problems with Applications 42/47

Page 44: xNonsmooth, Nonconvex Minimization - PolyU

Remarks on lower bound theory

The theory establishes a theoretical justification for “zeroing”some small entries in an approximate solution.

The theory gives a theoretical explanation why using ‖x‖pp cangenerate more sparse solutions.

The theory shows clearly the relationship between the sparsity ofthe solution and the choice of the regularization parameter andnorm.

It provides a systematic mechanism for selecting theregularization parameter.

Nonsmooth, nonconvex Minimization Problems with Applications 43/47

Page 45: xNonsmooth, Nonconvex Minimization - PolyU

NotationsQr(ω): demand on each OD-pair

Ca(ω): capacity on each link

K: link-route incidence matrix

Γ: OD-route incidence matrix

The generalized Bureau of public road (BPR) link cost function

Ta(v, ω) = t0a

(

1 + ba

( va

Ca(ω)

)na)

,

where t0a, ba and na are given parameters and va is the link flow.

The nonadditive path travel cost function

G(y, ω) = η1KTT (Ky, ω) + Ψ(KTT (Ky, ω)) + Λ(y, ω),

where y is the path flow, η1 > 0 is the time-based operating costsfactor, Ψ is the translation function converting time T to money,and Λ is the perturbed financial cost function.

Nonsmooth, nonconvex Minimization Problems with Applications 44/47

Page 46: xNonsmooth, Nonconvex Minimization - PolyU

Main Contribution for the ℓ2-ℓp minimization

joint work with F. Xu, Y. Ye, W. Zhou

We derive a lower bound theory for nonzero entries in every localminimizer of the ℓ2-ℓp minimization problems. This theory showsclearly the relationship between the sparsity of the solution andthe choice of parameters in the model.

We develop a hybrid orthogonal matching pursuit-smoothingconjugate gradient method.

We prove global convergence of the ℓ1 reweighted minimizationalgorithm.

We prove uniqueness of solution under the truncated null spaceproperty which is weaker than the restricted isometry propertyintroduced by Candés and Tao (2005).

Nonsmooth, nonconvex Minimization Problems with Applications 45/47

Page 47: xNonsmooth, Nonconvex Minimization - PolyU

Uniqueness

minx∈Rn

‖xT ‖pp, s.t. Ax = b, (1)

where ‖xT ‖pp =∑

i∈T

|xi|p and T is a subset of 1, . . . , n.

F = x |Ax = b, S(x) = i | xi 6= 0

Theorem 3 Let x∗ ∈ F and T be a subset of 1, . . . , n. LetS = T ∩ S(x∗). If S = ∅, then x∗ is a solution of (1). If S 6= ∅ and

‖ηS‖p ≤ γ‖η(T∩SC)‖p, γ < 1

for all η ∈ N(A), then x∗ is the unique solution of (1).

Ge-Jiang-Ye (2010) showed that for T = 1, . . . , n, (1) is NP-hard.The set of basic feasible solutions is the set of local minimizers.

Nonsmooth, nonconvex Minimization Problems with Applications 46/47

Page 48: xNonsmooth, Nonconvex Minimization - PolyU

Truncated null space property

A satisfies the t-null space property(NSP) of order K for γ > 0,0 < t ≤ n if

‖ηS‖p ≤ γ‖η(T∩SC)‖p|T | = t, all subsets S ⊂ T with |S| ≤ K, and all η ∈ N(A).

Theorem 4 If A satisfies the restricted isometry property

αs‖x‖2 ≤ ‖Ax‖2 ≤ βs‖x‖2, ∀ ‖x‖0 ≤ s

β22t1

α22t1

− 1 < 4(√

2− 1)( t1

K

)1

p− 1

2

,

for some t1 ≥ K then A satisfies the t-NSP of order K for γ < 1 and|T | = n.

Nonsmooth, nonconvex Minimization Problems with Applications 47/47