Talk given at SampTA 2013. The corresponding paper is: Model Selection with Piecewise Regular Gauges (S. Vaiter, M. Golbabaee, J. Fadili, G. Peyré), Technical report, Preprint hal-00842603, 2013. http://hal.archives-ouvertes.fr/hal-00842603/
Model Selection with Piecewise Regular Gauges
Gabriel Peyré
www.numerical-tours.com
Joint work with: Samuel Vaiter, Charles Deledalle, Jalal Fadili, Joseph Salmon
Overview
• Inverse Problems
• Gauge Decomposition and Model Selection
• L2 Stability Performances
• Model Stability Performances
Inverse Problems
Recovering x0 ∈ ℝ^N from noisy observations y = Φ x0 + w ∈ ℝ^P.
Examples: inpainting, super-resolution, compressed sensing.
Estimators
Observations: y = Φ x0 + w ∈ ℝ^P.
Regularized inversion:
    x(y) ∈ argmin_{x ∈ ℝ^N} 1/2 ‖y − Φ x‖² + λ J(x)
(first term: data fidelity; second term: regularity)
Goal: performance analysis → criteria on (x0, ‖w‖, λ) to ensure:
• L2 error stability: ‖x(y) − x0‖ = O(‖w‖).
• Stability of the promoted subspace ("model").
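For J = ‖·‖₁ this estimator is the Lasso, and a standard way to compute it is proximal gradient descent (ISTA). The following is a minimal sketch, not part of the talk; the sizes, the name `ista`, and the parameter values are our illustrative choices.

```python
import numpy as np

def ista(y, Phi, lam, n_iter=500):
    """Minimize 0.5*||y - Phi x||^2 + lam*||x||_1 by proximal gradient (ISTA)."""
    L = np.linalg.norm(Phi, 2) ** 2                  # Lipschitz constant of the fidelity gradient
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        z = x - Phi.T @ (Phi @ x - y) / L            # gradient step on the data fidelity
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-thresholding: prox of (lam/L)*||.||_1
    return x

rng = np.random.default_rng(0)
Phi = rng.standard_normal((40, 100)) / np.sqrt(40)   # P = 40 measurements, N = 100
x0 = np.zeros(100)
x0[[5, 37, 60]] = [2.0, -1.5, 1.0]                   # sparse vector to recover
y = Phi @ x0 + 0.01 * rng.standard_normal(40)        # noisy observations
x_hat = ista(y, Phi, lam=0.02)
```

With few active coefficients and low noise, the estimate lands close to x0 and on the right support, illustrating the two stability notions above.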
Union of Linear Models for Data Processing
Union of models: T ∈ 𝒯, linear spaces.
• Synthesis sparsity: sparse coefficients x, image Ψ x.
• Structured sparsity: block-sparse coefficients x.
• Analysis sparsity: image x, sparse gradient D* x.
• Low-rank: multi-spectral imaging, x_{i,·} = Σ_{j=1}^r A_{i,j} S_{j,·}.
[Figure: the concept of hyperspectral imagery — measurements at many narrow contiguous wavelength bands yield a complete spectrum for each pixel, unlike the few wide, separated bands of multispectral imagers such as Landsat, SPOT, AVHRR.]
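The low-rank model for multi-spectral data can be sanity-checked in a few lines: mixing r source spectra always yields a data matrix of rank at most r. A toy sketch; the sizes and variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(3)
r, n_bands, n_pixels = 3, 16, 500
A = rng.random((n_bands, r))    # mixing matrix: how each band weights the sources
S = rng.random((r, n_pixels))   # the r source spectra S_{j,.}
X = A @ S                       # x_{i,.} = sum_j A_{i,j} S_{j,.}  =>  rank(X) <= r
```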
Gauges for Union of Linear Models
Convex gauge: J : ℝ^N → ℝ⁺ with J(α x) = α J(x) for all α ∈ ℝ⁺.
Equivalently J(x) = γ_C(x) = inf {ρ > 0 : x ∈ ρ C}, where C = {x : J(x) ≤ 1} (assuming 0 ∈ C).
Union of linear models (T)_{T ∈ 𝒯} ⟷ piecewise regular ball C.
Examples:
• T = sparse vectors: J(x) = ‖x‖₁.
• T = block-sparse vectors: J(x) = |x₁| + ‖x_{2,3}‖.
• T = low-rank matrices: J(x) = ‖x‖_* (nuclear norm).
• T = anti-sparse vectors: J(x) = ‖x‖_∞.
Subdifferentials and Models
∂J(x) = {η ∈ ℝ^N : ∀ y, J(y) ≥ J(x) + ⟨η, y − x⟩}.
Definition: T_x = VectHull(∂J(x))^⊥ and e_x = Proj_{T_x}(∂J(x)).
Example: J(x) = ‖x‖₁, with I = supp(x) = {i : x_i ≠ 0}:
    ∂‖x‖₁ = {η : η_I = sign(x_I) and ∀ j ∉ I, |η_j| ≤ 1},
    T_x = {η : supp(η) ⊆ I},  e_x = sign(x).
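For J = ‖·‖₁ these objects are computable directly. The sketch below (our toy vector, our helper names) checks that projecting a subgradient onto T_x recovers e_x = sign(x).

```python
import numpy as np

x = np.array([0.0, 3.0, 0.0, -1.5])
I = np.flatnonzero(x)            # I = supp(x) = {i : x_i != 0}
e_x = np.sign(x)                 # e_x = sign(x), supported on I

def proj_T(z):
    """Orthogonal projection onto T_x = {z : supp(z) included in I}."""
    out = np.zeros_like(z)
    out[I] = z[I]
    return out

# One element of the subdifferential: eta_I = sign(x_I), |eta_j| <= 1 off I.
eta = np.array([0.2, 1.0, -0.7, -1.0])
```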
Examples
• ℓ¹ sparsity: J(x) = ‖x‖₁; e_x = sign(x), T_x = {z : supp(z) ⊆ supp(x)}.
• Structured sparsity: J(x) = Σ_b ‖x_b‖; e_x = (N(x_b))_{b ∈ B} with N(a) = a/‖a‖, T_x = {z : supp(z) ⊆ supp(x)}.
• Nuclear norm: J(x) = ‖x‖_*; with the SVD x = U Λ V*, e_x = U V*, T_x = {z : U⊥* z V⊥ = 0}.
• Anti-sparsity: J(x) = ‖x‖_∞; with I = {i : |x_i| = ‖x‖_∞}, e_x = |I|⁻¹ sign(x), T_x = {y : y_I ∝ sign(x_I)}.
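For the nuclear norm, the pair (e_x, T_x) comes straight out of an SVD. A sketch with a random rank-2 matrix; the setup and helper names are ours.

```python
import numpy as np

rng = np.random.default_rng(1)
# a rank-2 matrix x in R^{5x4}
x = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))
U, s, Vt = np.linalg.svd(x, full_matrices=True)
r = int(np.sum(s > 1e-10))                  # numerical rank
e_x = U[:, :r] @ Vt[:r, :]                  # e_x = U V^* (all nonzero singular values equal 1)
U_perp, V_perp = U[:, r:], Vt[r:, :].T      # orthogonal complements of the row/column spaces

def in_T(z):
    """z belongs to T_x iff U_perp^* z V_perp = 0."""
    return np.allclose(U_perp.T @ z @ V_perp, 0.0)
```

In particular x itself lies in its own model space T_x, while a generic matrix does not.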
Dual Certificate and L2 Stability
Noiseless recovery: min_{Φ x = Φ x0} J(x)   (P0)
Dual certificates: D = Im(Φ*) ∩ ∂J(x0).
Tight dual certificates: D̄ = Im(Φ*) ∩ ri(∂J(x0)).
Proposition: ∃ η ∈ D ⟺ x0 is a solution of (P0).
Theorem [Fadili et al. 2013]: if ∃ η ∈ D̄ and ker(Φ) ∩ T_{x0} = {0}, then for λ ∼ ‖w‖ one has ‖x⋆ − x0‖ = O(‖w‖).
→ The constants depend on N...
Related results: [Grasmair, Haltmeier, Scherzer 2010]: J = ‖·‖₁; [Grasmair 2012]: J(x⋆ − x0) = O(‖w‖).
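The proposition can be probed numerically for J = ‖·‖₁: solve (P0), i.e. basis pursuit, as a linear program and check that a sparse x0 is recovered exactly. A sketch using SciPy; the dimensions and support are our illustrative choices.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(2)
P, N = 30, 60
Phi = rng.standard_normal((P, N))
x0 = np.zeros(N)
x0[[3, 17, 42]] = [1.0, -2.0, 0.5]     # sparse vector to recover

# Split x = xp - xn with xp, xn >= 0, so that ||x||_1 = sum(xp) + sum(xn)
# and the constraint Phi x = Phi x0 stays linear.
res = linprog(c=np.ones(2 * N),
              A_eq=np.hstack([Phi, -Phi]),
              b_eq=Phi @ x0,
              bounds=(0, None))
x_star = res.x[:N] - res.x[N:]
```

With 30 Gaussian measurements of a 3-sparse vector, exact recovery is expected; this is the regime where a certificate η ∈ D exists.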
Minimal-norm Certificate
Notation: T = T_{x0}, e = e_{x0}. We assume ker(Φ) ∩ T = {0} and J piecewise regular.
η ∈ D ⟺ η = Φ* q, η_T = e and J°(η) = 1 (J° the polar gauge).
Minimal-norm pre-certificate: η0 = argmin_{η = Φ* q, η_T = e} ‖q‖.
Proposition: one has η0 = (Φ_T^+ Φ)* e.
Theorem [Vaiter et al. 2013]: if η0 ∈ D̄, ‖w‖ = O(ν_{x0}) and λ ∼ ‖w‖, then the unique solution x⋆ of P_λ(y) for y = Φ x0 + w satisfies T_{x⋆} = T_{x0} and ‖x⋆ − x0‖ = O(‖w‖).
Special cases: [Fuchs 2004]: J = ‖·‖₁; [Bach 2008]: J = ‖·‖_{1,2} and J = ‖·‖_*; [Vaiter et al. 2011]: J = ‖D* ·‖₁.
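For J = ‖·‖₁, T is the set of vectors supported on I = supp(x0) and Φ_T keeps the columns indexed by I, so the proposition gives η0 in closed form. A sketch checking that η0 interpolates sign(x0) on I and stays below 1 off the support; the dimensions and seed are our choices.

```python
import numpy as np

rng = np.random.default_rng(4)
P, N = 100, 200
Phi = rng.standard_normal((P, N)) / np.sqrt(P)
x0 = np.zeros(N)
x0[[10, 80, 150]] = [1.0, -1.0, 2.0]
I = np.flatnonzero(x0)

# eta0 = (Phi_T^+ Phi)^* e, with Phi_T = Phi[:, I] and e = sign(x0) on T.
Phi_T = Phi[:, I]
eta0 = Phi.T @ np.linalg.pinv(Phi_T).T @ np.sign(x0[I])
off_sup = np.max(np.abs(np.delete(eta0, I)))   # ||eta_{0,I^c}||_inf
```

When `off_sup < 1`, the pre-certificate is a tight certificate (η0 ∈ D̄), which is exactly the hypothesis of the theorem.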
Example: 1-D Sparse Deconvolution
J(x) = ‖x‖₁, Φ x = Σ_i x_i φ(· − Δ i), I = {j : x0(j) ≠ 0}.
Increasing Δ: reduces correlation; reduces resolution.
η0 ∈ D̄(x0) ⟺ ‖η0,I^c‖_∞ < 1 ⟺ support recovery.
[Figure: x0 and Φ x0; ‖η0,I^c‖_∞ plotted as a function of Δ.]
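The effect of the spacing can be reproduced numerically: build a Gaussian blur Φ, place two opposite-sign spikes Δ samples apart, and evaluate ‖η0,I^c‖_∞, which drops below 1 once the spikes are far enough apart. A sketch; the kernel width, signal length, and sign pattern are our choices, not the talk's.

```python
import numpy as np

def certificate_sup_norm(n, sigma, i1, delta):
    """||eta_{0,I^c}||_inf for two opposite-sign spikes at i1 and i1 + delta."""
    t = np.arange(n)
    Phi = np.exp(-(t[:, None] - t[None, :]) ** 2 / (2 * sigma ** 2))  # Gaussian blur
    Phi /= np.linalg.norm(Phi, axis=0)              # unit-norm columns
    I = [i1, i1 + delta]
    e = np.array([1.0, -1.0])                       # sign pattern of the spikes
    eta0 = Phi.T @ np.linalg.pinv(Phi[:, I]).T @ e  # eta0 = (Phi_T^+ Phi)^* e
    return np.max(np.abs(np.delete(eta0, I)))

far = certificate_sup_norm(80, 2.0, 25, 30)    # well-separated spikes
close = certificate_sup_norm(80, 2.0, 25, 4)   # nearly touching spikes
```

Here `far < 1 < close`: large spacing gives a valid tight certificate (support recovery), small spacing does not.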
Example: 1-D TV Denoising
J(x) = ‖∇x‖₁ with (∇x)_i = x_i − x_{i−1}, Φ = Id, I = {i : (∇x0)_i ≠ 0}.
η0 = div(α0) where ∀ j ∉ I, (Δ α0)_j = 0.
‖α0,I^c‖_∞ < 1 ⟹ support stability; ‖α0,I^c‖_∞ = 1 ⟹ ℓ² stability only.
[Figure: two signals x0, with α0 equal to ±1 at the jumps and interpolating in between.]
Conclusion
Gauges: encode linear models as singular points.
Tight dual certificates: enable L2 stability.
Piecewise smooth gauges: enable model recovery T_{x⋆} = T_{x0}.
Open problems:
– Approximate model recovery T_{x⋆} ≈ T_{x0}.
– Infinite dimensional problems (measures, TV, etc.).