Robust Sparse Analysis Recovery

Gabriel Peyré (www.numerical-tours.com)
Joint work with: Samuel Vaiter, Charles Dossal, Jalal Fadili

Slides of the presentation at the "Séminaire Parisien de Statistique", Nov. 21st 2012, IHP, Paris.
Overview
• Synthesis vs. Analysis Regularization
• Risk Estimation
• Local Behavior of Sparse Regularization
• Robustness to Noise
• Numerical Illustrations
Inverse Problems

Recovering $x_0 \in \mathbb{R}^N$ from noisy observations $y = \Phi x_0 + w \in \mathbb{R}^P$.

Examples: inpainting, super-resolution, compressed sensing.

Regularized inversion:
$$x_\lambda(y) \in \underset{x\in\mathbb{R}^N}{\mathrm{argmin}}\ \underbrace{\tfrac{1}{2}\|y - \Phi x\|^2}_{\text{data fidelity}} + \lambda \underbrace{J(x)}_{\text{regularity}}$$
Sparse Regularizations

Synthesis regularization:
$$\min_{\alpha\in\mathbb{R}^Q}\ \tfrac{1}{2}\|y - \Phi\Psi\alpha\|_2^2 + \lambda\|\alpha\|_1$$
Coefficients $\alpha$ $\xrightarrow{\ \Psi\ }$ image $x = \Psi\alpha$.

Analysis regularization:
$$\min_{x\in\mathbb{R}^N}\ \tfrac{1}{2}\|y - \Phi x\|_2^2 + \lambda\|D^* x\|_1$$
Image $x$ $\xrightarrow{\ D^*\ }$ correlations $\alpha = D^* x$.

Unless $D = \Psi$ is orthogonal, the two regularizations produce different results.
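As a concrete illustration (not from the slides), here is a minimal sketch of the synthesis problem solved by iterative soft thresholding (ISTA); the analysis problem is non-smooth in $D^* x$ and typically requires primal-dual or ADMM-type solvers instead. All names and parameter values are illustrative.

```python
import numpy as np

def soft_threshold(a, t):
    # Proximal operator of t * ||.||_1: component-wise shrinkage.
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)

def ista_synthesis(y, Phi, Psi, lam, n_iter=500):
    """Minimize 1/2 ||y - Phi Psi a||^2 + lam ||a||_1 by ISTA."""
    A = Phi @ Psi                      # composite operator
    L = np.linalg.norm(A, 2) ** 2      # Lipschitz constant of the gradient
    a = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ a - y)       # gradient of the data fidelity
        a = soft_threshold(a - grad / L, lam / L)
    return Psi @ a                     # synthesized image x = Psi a

# Toy usage: random Phi, trivial orthogonal Psi (identity basis).
rng = np.random.default_rng(0)
Phi = rng.standard_normal((20, 50))
Psi = np.eye(50)
x0 = np.zeros(50); x0[[5, 17, 33]] = [2.0, -1.5, 1.0]
y = Phi @ x0 + 0.05 * rng.standard_normal(20)
x_rec = ista_synthesis(y, Phi, Psi, lam=0.1)
```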
Variations and Stability

Observations: $y = \Phi x_0 + w$.

Recovery:
$$x_\lambda(y) \in \underset{x\in\mathbb{R}^N}{\mathrm{argmin}}\ \tfrac{1}{2}\|\Phi x - y\|^2 + \lambda\|D^* x\|_1 \qquad (\mathcal{P}_\lambda(y))$$
$$x_{0^+}(y) \in \underset{\Phi x = y}{\mathrm{argmin}}\ \|D^* x\|_1 \qquad (\mathcal{P}_0(y)) \quad (\lambda \to 0^+,\ \text{no noise})$$

Questions:
– Behavior of $x_\lambda(y)$ with respect to $y$ and $\lambda$.
  → Application: risk estimation (SURE, GCV, etc.)
– Criteria to ensure $\|x_\lambda(y) - x_0\| \leq C\|w\|$ (with "reasonable" $C$).

Synthesis case ($D = \mathrm{Id}$): works of Fuchs and Tropp.
Analysis case: [Nam et al. 2011] for $w = 0$.
Overview
• Inverse Problems
• Stein Unbiased Risk Estimators
• Theory: SURE for Sparse Regularization
• Practice: SURE for Iterative Algorithms
Risk Minimization

Average risk: $R(\lambda) = \mathbb{E}_w(\|x_\lambda(y) - x_0\|^2)$

Plug-in estimator: $x_{\lambda^\star(y)}(y)$, where $\lambda^\star(y) = \mathrm{argmin}_\lambda R(\lambda)$.

But: $x_0$ is not accessible → needs risk estimators.
$\mathbb{E}_w$ is not accessible → use one observation.
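In practice the plug-in selection is a grid search over $\lambda$, with the unknown risk $R(\lambda)$ replaced by an estimator such as the SURE/GSURE introduced below. A hedged sketch, where `solver` and `risk_estimate` stand in for any concrete implementations (both hypothetical names):

```python
import numpy as np

def select_lambda(y, lambdas, solver, risk_estimate):
    """Plug-in selection: return x_{lambda*}(y) with lambda* minimizing
    an estimated risk over a grid (the true risk R(lambda) is unknown)."""
    estimates = []
    for lam in lambdas:
        x = solver(y, lam)                  # e.g. a sparse analysis solver
        estimates.append(risk_estimate(y, x, lam))
    best = int(np.argmin(estimates))
    return solver(y, lambdas[best]), lambdas[best]
```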
Unbiased Risk Estimation for Sparse Analysis Regularization (poster)
Charles Deledalle¹, Samuel Vaiter¹, Gabriel Peyré¹, Jalal Fadili³ and Charles Dossal²
¹CEREMADE, Université Paris-Dauphine; ²GREYC, ENSICAEN; ³IMB, Université Bordeaux 1

Problem statement

Consider the convex but non-smooth analysis sparsity regularization problem
$$x^\star(y,\lambda) \in \underset{x\in\mathbb{R}^N}{\mathrm{argmin}}\ \tfrac{1}{2}\|y - \Phi x\|^2 + \lambda\|D^* x\|_1 \qquad (\mathcal{P}_\lambda(y))$$
which aims at inverting $y = \Phi x_0 + w$ by promoting sparsity, with
• $x_0 \in \mathbb{R}^N$ the unknown image of interest,
• $y \in \mathbb{R}^Q$ the low-dimensional noisy observation of $x_0$,
• $\Phi \in \mathbb{R}^{Q\times N}$ a linear operator that models the acquisition process,
• $w \sim \mathcal{N}(0, \sigma^2\,\mathrm{Id}_Q)$ the noise component,
• $D \in \mathbb{R}^{N\times P}$ an analysis dictionary, and
• $\lambda > 0$ a regularization parameter.

How to choose the value of the parameter $\lambda$?

Risk-based selection of $\lambda$

• Risk associated to $\lambda$: measure of the expected quality of $x^\star(y,\lambda)$ with respect to $x_0$,
$$R(\lambda) = \mathbb{E}_w\|x^\star(y,\lambda) - x_0\|^2.$$
• The optimal (theoretical) $\lambda$ minimizes the risk.

The risk is unknown since it depends on $x_0$. Can we estimate the risk solely from $x^\star(y,\lambda)$?

Risk estimation

• Assume $y \mapsto \Phi x^\star(y,\lambda)$ is weakly differentiable (a fortiori uniquely defined).

Prediction risk estimation via SURE

• The Stein Unbiased Risk Estimator (SURE)
$$\mathrm{SURE}(y,\lambda) = \|y - \Phi x^\star(y,\lambda)\|^2 - \sigma^2 Q + 2\sigma^2\,\underbrace{\mathrm{tr}\Big(\tfrac{\partial \Phi x^\star(y,\lambda)}{\partial y}\Big)}_{\text{estimator of the DOF}}$$
is an unbiased estimator of the prediction risk [Stein, 1981]:
$$\mathbb{E}_w(\mathrm{SURE}(y,\lambda)) = \mathbb{E}_w(\|\Phi x_0 - \Phi x^\star(y,\lambda)\|^2).$$

Projection risk estimation via GSURE

• Let $\Pi = \Phi^*(\Phi\Phi^*)^+\Phi$ be the orthogonal projector on $\ker(\Phi)^\perp = \mathrm{Im}(\Phi^*)$,
• Denote $x_{\mathrm{ML}}(y) = \Phi^*(\Phi\Phi^*)^+ y$,
• The Generalized Stein Unbiased Risk Estimator (GSURE)
$$\mathrm{GSURE}(y,\lambda) = \|x_{\mathrm{ML}}(y) - \Pi x^\star(y,\lambda)\|^2 - \sigma^2\,\mathrm{tr}((\Phi\Phi^*)^+) + 2\sigma^2\,\mathrm{tr}\Big((\Phi\Phi^*)^+\tfrac{\partial \Phi x^\star(y,\lambda)}{\partial y}\Big)$$
is an unbiased estimator of the projection risk [Vaiter et al., 2012]:
$$\mathbb{E}_w(\mathrm{GSURE}(y,\lambda)) = \mathbb{E}_w(\|\Pi x_0 - \Pi x^\star(y,\lambda)\|^2)$$
(see also [Eldar, 2009, Pesquet et al., 2009, Vonesch et al., 2008] for similar results).

Illustration of risk estimation

[Figure: risk curves; here $x^\star$ denotes $x^\star(y,\lambda)$ for an arbitrary value of $\lambda$.]

How to estimate the quantity $\mathrm{tr}\big((\Phi\Phi^*)^+\tfrac{\partial \Phi x^\star(y,\lambda)}{\partial y}\big)$?

Main notations and assumptions

• Let $I = \mathrm{supp}(D^* x^\star(y,\lambda))$ be the support of $D^* x^\star(y,\lambda)$,
• Let $J = I^c$ be the co-support of $D^* x^\star(y,\lambda)$,
• Let $D_I$ be the submatrix of $D$ whose columns are indexed by $I$,
• Let $s_I = \mathrm{sign}(D^* x^\star(y,\lambda))_I$ be the sign vector of the entries of $D^* x^\star(y,\lambda)$ indexed by $I$,
• Let $G_J = \ker D_J^*$ be the "cospace" associated to $x^\star(y,\lambda)$,
• To study the local behaviour of $x^\star(y,\lambda)$, we impose $\Phi$ to be "invertible" on $G_J$:
$$G_J \cap \ker\Phi = \{0\},$$
• This allows us to define the matrix
$$A^{[J]} = U(U^*\Phi^*\Phi U)^{-1}U^*,$$
where $U$ is a matrix whose columns form a basis of $G_J$,
• In this case, we obtain an implicit equation:
$$x^\star(y,\lambda) \text{ solution of } \mathcal{P}_\lambda(y) \iff x^\star(y,\lambda) = \hat{x}(y,\lambda) = A^{[J]}\Phi^* y - \lambda A^{[J]} D_I s_I.$$

Is this relation true in a neighbourhood of $(y,\lambda)$?

Theorem (Local Parameterization)

• Even if the solutions $x^\star(y,\lambda)$ of $\mathcal{P}_\lambda(y)$ might not be unique, $\Phi x^\star(y,\lambda)$ is uniquely defined.
• If $(y,\lambda) \notin \mathcal{H}$, then for $(\bar y,\bar\lambda)$ close to $(y,\lambda)$, $\hat{x}(\bar y,\bar\lambda)$ is a solution of $\mathcal{P}_{\bar\lambda}(\bar y)$, where
$$\hat{x}(\bar y,\bar\lambda) = A^{[J]}\Phi^*\bar y - \bar\lambda A^{[J]} D_I s_I.$$
• Hence, it allows writing
$$\frac{\partial \Phi x^\star(y,\lambda)}{\partial y} = \Phi A^{[J]}\Phi^*,$$
• Moreover, the DOF can be estimated by
$$\mathrm{tr}\Big(\frac{\partial \Phi x^\star(y,\lambda)}{\partial y}\Big) = \dim(G_J).$$

[Figure: piecewise affine solution path in the $(x_1, x_2)$ plane, from $x_{\lambda_0}$ at $\lambda_0 = 0$ (solution of $\mathcal{P}_0(y)$) down to $x_{\lambda_k} = 0$.]

Can we compute this quantity efficiently?

Computation of GSURE

• One has, for $Z \sim \mathcal{N}(0,\mathrm{Id}_Q)$,
$$\mathrm{tr}\Big((\Phi\Phi^*)^+\frac{\partial \Phi x^\star(y,\lambda)}{\partial y}\Big) = \mathbb{E}_Z(\langle\nu(Z),\ \Phi^*(\Phi\Phi^*)^+ Z\rangle)$$
where, for any $z \in \mathbb{R}^Q$, $\nu = \nu(z)$ solves the linear system
$$\begin{pmatrix}\Phi^*\Phi & D_J\\ D_J^* & 0\end{pmatrix}\begin{pmatrix}\nu\\ \tilde\nu\end{pmatrix} = \begin{pmatrix}\Phi^* z\\ 0\end{pmatrix}.$$
• In practice, by the law of large numbers, the expectation is replaced by an empirical mean.
• The computation of $\nu(z)$ is achieved by solving the linear system with a conjugate gradient solver.

Numerical example

Super-resolution using (anisotropic) total variation: (a) observation $y$; (b) $x^\star(y,\lambda)$ at the optimal $\lambda$. [Plot: quadratic loss vs. regularization parameter $\lambda$, comparing projection risk, GSURE and true risk.]
Compressed sensing using multi-scale wavelet thresholding: (c) $x_{\mathrm{ML}}$; (d) $x^\star(y,\lambda)$ at the optimal $\lambda$. [Plot: quadratic loss vs. regularization parameter $\lambda$, comparing projection risk, GSURE and true risk.]

Perspectives: how to efficiently minimize $\mathrm{GSURE}(y,\lambda)$ with respect to $\lambda$?

References

Eldar, Y. C. (2009). Generalized SURE for exponential families: applications to regularization. IEEE Transactions on Signal Processing, 57(2):471–481.
Pesquet, J.-C., Benazza-Benyahia, A., and Chaux, C. (2009). A SURE approach for digital signal/image deconvolution problems. IEEE Transactions on Signal Processing, 57(12):4616–4632.
Stein, C. (1981). Estimation of the mean of a multivariate normal distribution. The Annals of Statistics, 9(6):1135–1151.
Vaiter, S., Deledalle, C., Peyré, G., Dossal, C., and Fadili, J. (2012). Local behavior of sparse analysis regularization: applications to risk estimation. arXiv preprint arXiv:1204.3212.
Vonesch, C., Ramani, S., and Unser, M. (2008). Recursive risk estimation for non-linear image deconvolution with a wavelet-domain sparsity constraint. In ICIP, pages 665–668. IEEE.

http://www.ceremade.dauphine.fr/~deledall/   [email protected]
Prediction Risk Estimation

Average risk: $R(\lambda) = \mathbb{E}_w(\|x_\lambda(y) - x_0\|^2)$
Prediction: $\mu_\lambda(y) = \Phi x_\lambda(y)$

Sensitivity analysis: if $\mu_\lambda$ is weakly differentiable,
$$\mu_\lambda(y + \delta) = \mu_\lambda(y) + \partial\mu_\lambda(y)\cdot\delta + O(\|\delta\|^2)$$

Stein Unbiased Risk Estimator:
$$\mathrm{df}_\lambda(y) = \mathrm{tr}(\partial\mu_\lambda(y)) = \mathrm{div}(\mu_\lambda)(y)$$
$$\mathrm{SURE}_\lambda(y) = \|y - \mu_\lambda(y)\|^2 - \sigma^2 P + 2\sigma^2\,\mathrm{df}_\lambda(y)$$

Theorem: [Stein, 1981]
$$\mathbb{E}_w(\mathrm{SURE}_\lambda(y)) = \mathbb{E}_w(\|\Phi x_0 - \mu_\lambda(y)\|^2)$$

Other estimators: GCV, BIC, AIC, ...
SURE: requires $\sigma$ (not always available); unbiased with good practical performance.
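For the simplest case $\Phi = \mathrm{Id}$, $D = \mathrm{Id}$ (soft-thresholding denoising), $\mathrm{df}_\lambda(y)$ is the number of nonzero coefficients and SURE is available in closed form. A minimal sketch (not from the slides), assuming Gaussian noise of known level $\sigma$:

```python
import numpy as np

def sure_soft_threshold(y, lam, sigma):
    """SURE for x_lam(y) = soft-threshold(y, lam), with Phi = Id, D = Id.
    Here df_lam(y) = #{i : |y_i| > lam} (Stein, 1981)."""
    x = np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)
    df = np.count_nonzero(np.abs(y) > lam)
    P = y.size
    return np.sum((y - x) ** 2) - sigma**2 * P + 2 * sigma**2 * df

# Unbiasedness check on a toy signal: the empirical mean of SURE over
# noise draws approximates E_w ||x_0 - x_lam(y)||^2 (prediction = estimation
# risk here since Phi = Id).
rng = np.random.default_rng(1)
x0 = np.concatenate([5 * rng.standard_normal(10), np.zeros(90)])
sigma, lam = 1.0, 2.0
vals = [sure_soft_threshold(x0 + sigma * rng.standard_normal(100), lam, sigma)
        for _ in range(2000)]
mean_sure = np.mean(vals)
```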
Generalized SURE

Problem: $\|\Phi x_0 - \Phi x_\lambda(y)\|$ is a poor indicator of $\|x_0 - x_\lambda(y)\|$.

Generalized SURE: take into account the risk on $\ker(\Phi)^\perp$:
$$\mathrm{Proj}_{\ker(\Phi)^\perp} = \Pi = \Phi^*(\Phi\Phi^*)^+\Phi$$
ML estimator: $\tilde{x}(y) = \Phi^*(\Phi\Phi^*)^+ y$.

Generalized df:
$$\mathrm{gdf}_\lambda(y) = \mathrm{tr}\big((\Phi\Phi^*)^+\partial\mu_\lambda(y)\big)$$
$$\mathrm{GSURE}_\lambda(y) = \|\tilde{x}(y) - \Pi x_\lambda(y)\|^2 - \sigma^2\,\mathrm{tr}((\Phi\Phi^*)^+) + 2\sigma^2\,\mathrm{gdf}_\lambda(y)$$

Theorem: [Eldar 09, Pesquet et al. 09, Vonesch et al. 08]
$$\mathbb{E}_w(\mathrm{GSURE}_\lambda(y)) = \mathbb{E}_w(\|\Pi(x_0 - x_\lambda(y))\|^2)$$

→ How to compute $\partial\mu_\lambda(y)$?
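The projector $\Pi$ and the ML estimator only involve the pseudo-inverse of $\Phi\Phi^*$; a small dense sketch (illustrative only, for explicit matrices):

```python
import numpy as np

def projector_and_ml(Phi, y):
    """Pi = Phi^* (Phi Phi^*)^+ Phi, the orthogonal projector on ker(Phi)^perp,
    and the ML estimator x_tilde(y) = Phi^* (Phi Phi^*)^+ y."""
    gram_pinv = np.linalg.pinv(Phi @ Phi.T)   # (Phi Phi^*)^+
    Pi = Phi.T @ gram_pinv @ Phi
    x_ml = Phi.T @ gram_pinv @ y
    return Pi, x_ml
```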
Overview
• Synthesis vs. Analysis Regularization
• Local Behavior of Sparse Regularization
• SURE Unbiased Risk Estimator
• Numerical Illustrations
Variations and Stability

Observations: $y = \Phi x_0 + w$.
Recovery:
$$x_\lambda(y) \in \underset{x\in\mathbb{R}^N}{\mathrm{argmin}}\ \tfrac{1}{2}\|\Phi x - y\|^2 + \lambda\|D^* x\|_1 \qquad (\mathcal{P}_\lambda(y))$$

Remark: $x_\lambda(y)$ is not always unique, but $\mu_\lambda(y) = \Phi x_\lambda(y)$ is always unique.

Questions:
– When is $y \mapsto \mu_\lambda(y)$ differentiable?
– Formula for $\partial\mu_\lambda(y)$.
TV-1D Polytope

$$x^\star(y) \in \underset{y = \Phi x}{\mathrm{argmin}}\ \|D^* x\|_1$$

$$D^* = \begin{pmatrix} 1 & -1 & 0 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 1 & -1 \\ -1 & 0 & 0 & 1 \end{pmatrix}$$

TV-1D ball: $B = \{x \,:\, \|D^* x\|_1 \leq 1\}$, displayed in $\{x \,:\, \langle x, 1\rangle = 0\} \simeq \mathbb{R}^3$.

[Figure: the polytope $B$, the affine constraint set $\{x \,:\, y = \Phi x\}$, and the solution $x^\star$. Polytope rendering copyright Mael.]
Union of Subspaces Model

$$x_\lambda(y) \in \underset{x\in\mathbb{R}^N}{\mathrm{argmin}}\ \tfrac{1}{2}\|\Phi x - y\|^2 + \lambda\|D^* x\|_1 \qquad (\mathcal{P}_\lambda(y))$$

Support of the solution: $I = \{i \,:\, (D^* x_\lambda(y))_i \neq 0\}$, $J = I^c$.

1-D total variation: $D^* x = (x_i - x_{i-1})_i$.
[Figure: a piecewise-constant signal $x$; the support $I$ marks the jump locations.]

Subspace model: $G_J = \ker(D_J^*) = \mathrm{Im}(D_J)^\perp$
Local well-posedness: $\ker(\Phi) \cap G_J = \{0\}$  $(H_J)$

Lemma: There exists a solution $x^\star$ such that $(H_J)$ holds.
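A sketch (illustrative, dense matrices) of these objects for a given solution $x$: the support $I$, the cosupport $J$, a basis $U$ of the cospace $G_J = \ker(D_J^*)$, and the well-posedness test $(H_J)$:

```python
import numpy as np
from scipy.linalg import null_space

def cospace(D, Phi, x, tol=1e-8):
    """Support/cosupport of D^* x and a basis U of G_J = ker(D_J^*)."""
    corr = D.T @ x                                # analysis coefficients D^* x
    I = np.flatnonzero(np.abs(corr) > tol)        # support
    J = np.setdiff1d(np.arange(D.shape[1]), I)    # cosupport
    U = null_space(D[:, J].T)                     # columns span ker(D_J^*)
    # (H_J): ker(Phi) ∩ G_J = {0}, equivalently Phi U has full column rank.
    HJ = np.linalg.matrix_rank(Phi @ U) == U.shape[1]
    return I, J, U, HJ
```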
Local Sign Stability

$$x_\lambda(y) \in \underset{x}{\mathrm{argmin}}\ \tfrac{1}{2}\|\Phi x - y\|^2 + \lambda\|D^* x\|_1 \qquad (\mathcal{P}_\lambda(y))$$

Lemma: $\mathrm{sign}(D^* x_\lambda(y))$ is constant around $(y,\lambda) \notin \mathcal{H}$.
To be understood: there exists a solution with the same sign.

[Figure: faces of the TV polytope labeled by $\dim(G_J) \in \{0, 1, 2\}$.]

Linearized problem:
$$\hat{x}_\lambda(y) = \underset{x\in G_J}{\mathrm{argmin}}\ \tfrac{1}{2}\|\Phi x - y\|^2 + \lambda\langle D_I^* x, s_I\rangle, \qquad s_I = \mathrm{sign}(D_I^* x_\lambda(y))$$
$$\phantom{\hat{x}_\lambda(y)} = A^{[J]}\Phi^* y - \lambda A^{[J]} D_I s_I, \qquad \text{where } A^{[J]} z = \underset{x\in G_J}{\mathrm{argmin}}\ \tfrac{1}{2}\|\Phi x\|^2 - \langle x, z\rangle.$$

Theorem: If $(y,\lambda) \notin \mathcal{H}$, then for $(\bar y, \bar\lambda)$ close to $(y,\lambda)$,
$$\hat{x}_{\bar\lambda}(\bar y) = A^{[J]}\Phi^*\bar y - \bar\lambda A^{[J]} D_I s_I$$
is a solution of $\mathcal{P}_{\bar\lambda}(\bar y)$.

Local Affine Maps

Local parameterization: $x_\lambda(y) = A^{[J]}\Phi^* y - \lambda A^{[J]} D_I s_I$.

Under a uniqueness assumption, $y \mapsto x_\lambda(y)$ and $\lambda \mapsto x_\lambda(y)$ are piecewise affine functions; the breaking points correspond to changes of the support of $D^* x_\lambda(y)$.

[Figure: piecewise affine path $\lambda \mapsto x_\lambda$, from $x_{0^+}$ at $\lambda_0 = 0$ down to $x_{\lambda_k} = 0$.]
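Given a basis $U$ of $G_J$, $A^{[J]} = U(U^*\Phi^*\Phi U)^{-1}U^*$ (as on the poster), and the local affine map follows directly. A dense sketch assuming $(H_J)$ holds, so the inner matrix is invertible:

```python
import numpy as np

def A_J(Phi, U):
    """A^[J] = U (U^* Phi^* Phi U)^{-1} U^*, with U a basis of G_J.
    (H_J) guarantees that U^* Phi^* Phi U is invertible."""
    M = U.T @ (Phi.T @ (Phi @ U))
    return U @ np.linalg.solve(M, U.T)

def x_hat(y, lam, Phi, U, D_I, s_I):
    # Local affine parameterization of the solution:
    # x_lam(y) = A^[J] Phi^* y - lam * A^[J] D_I s_I.
    A = A_J(Phi, U)
    return A @ (Phi.T @ y) - lam * (A @ (D_I @ s_I))
```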
Application to GSURE

Corollary: For $y \notin \mathcal{H}$, one has locally
$$\mu_\lambda(y) = \Phi A^{[J]}\Phi^* y + \mathrm{cst}.$$

Let $I = \mathrm{supp}(D^* x_\lambda(y))$ be such that $(H_J)$ holds. Then
$$\mathrm{df}_\lambda(y) = \mathrm{tr}(\partial\mu_\lambda(y)) = \mathrm{tr}(\Phi A^{[J]}\Phi^*) = \dim(G_J) \quad \big(= \|x_\lambda(y)\|_0 \text{ for } D = \mathrm{Id} \text{, synthesis}\big)$$
$$\mathrm{gdf}_\lambda(y) = \mathrm{tr}(\Pi A^{[J]})$$
are unbiased estimators of df and gdf.

Trick: $\mathrm{tr}(A) = \mathbb{E}_z(\langle Az, z\rangle)$, $z \sim \mathcal{N}(0,\mathrm{Id}_P)$.

Proposition:
$$\mathrm{gdf}_\lambda(y) = \mathbb{E}_z(\langle\nu(z), \Phi^+ z\rangle), \quad z \sim \mathcal{N}(0,\mathrm{Id}_P),$$
where $\nu(z)$ solves
$$\begin{pmatrix}\Phi^*\Phi & D_J\\ D_J^* & 0\end{pmatrix}\begin{pmatrix}\nu(z)\\ \tilde\nu\end{pmatrix} = \begin{pmatrix}\Phi^* z\\ 0\end{pmatrix}.$$

In practice: $\mathrm{gdf}_\lambda(y) \approx \frac{1}{K}\sum_{k=1}^K \langle\nu(z_k), \Phi^+ z_k\rangle$, $z_k \sim \mathcal{N}(0,\mathrm{Id}_P)$.
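A sketch of the resulting Monte Carlo estimator of $\mathrm{gdf}_\lambda(y)$, with $K$ samples as in the empirical mean above. For readability it solves the saddle-point system densely with numpy; for large problems the slides suggest an iterative (conjugate-gradient type) solver instead. All matrices are assumed explicit.

```python
import numpy as np

def gdf_monte_carlo(Phi, D_J, K=100, rng=None):
    """Estimate gdf = E_z <nu(z), Phi^+ z>, z ~ N(0, Id_P), where nu(z)
    solves [[Phi^* Phi, D_J], [D_J^*, 0]] [nu; nu~] = [Phi^* z; 0].
    The system is assumed invertible (holds under (H_J) with D_J of
    full column rank)."""
    rng = rng or np.random.default_rng()
    P, N = Phi.shape
    m = D_J.shape[1]
    S = np.block([[Phi.T @ Phi, D_J],
                  [D_J.T, np.zeros((m, m))]])   # (N+m) x (N+m) saddle matrix
    Phi_pinv = np.linalg.pinv(Phi)
    acc = 0.0
    for _ in range(K):
        z = rng.standard_normal(P)
        rhs = np.concatenate([Phi.T @ z, np.zeros(m)])
        nu = np.linalg.solve(S, rhs)[:N]
        acc += nu @ (Phi_pinv @ z)               # <nu(z), Phi^+ z>
    return acc / K
```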
CS with Wavelets

$\Phi \in \mathbb{R}^{P\times N}$: realization of a random vector, with $P = N/4$.
$D$: translation-invariant redundant wavelet tight frame.

[Figure: pseudo-inverse reconstruction $\Phi^+ y$ vs. the solution $x_{\lambda^\star(y)}$.]
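For instance, a Gaussian compressed-sensing matrix with $P = N/4$ rows (a plausible instantiation; the slides do not specify the distribution of $\Phi$):

```python
import numpy as np

N = 1024
P = N // 4                                        # P = N/4 measurements
rng = np.random.default_rng(0)
Phi = rng.standard_normal((P, N)) / np.sqrt(P)    # random sensing matrix
```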
[Figure: quadratic loss as a function of $\lambda$; the optimal $\lambda^\star$ is marked.]
Anisotropic Total-Variation

$\Phi$: vertical sub-sampling.
Finite differences gradient: $D = [\partial_1, \partial_2]$.

[Figure: observations $y$.]
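A sketch of the finite-difference analysis operator $D^* x = (\partial_1 x, \partial_2 x)$ applied to an $n\times n$ image; boundary handling is a modeling choice, here zero-padded forward differences:

```python
import numpy as np

def grad_2d(x):
    """Forward finite differences: D^* x = (D_1^* x, D_2^* x)."""
    d1 = np.zeros_like(x); d1[1:, :] = x[1:, :] - x[:-1, :]   # x_{i,j} - x_{i-1,j}
    d2 = np.zeros_like(x); d2[:, 1:] = x[:, 1:] - x[:, :-1]   # x_{i,j} - x_{i,j-1}
    return np.stack([d1, d2], axis=-1)                        # shape (n, n, 2)

def anisotropic_tv(x):
    # Anisotropic TV = l1 norm of the stacked directional derivatives.
    return np.abs(grad_2d(x)).sum()
```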
[Figure: quadratic loss as a function of $\lambda$ with the optimal $\lambda^\star$ marked, and the corresponding reconstruction $x_{\lambda^\star(y)}$.]
Overview
• Synthesis vs. Analysis Regularization
• Local Behavior of Sparse Regularization
• Robustness to Noise
• Numerical Illustrations
Identifiability Criterion
Identifiability criterion of a sign (we suppose (HJ) holds):
IC(s) = min_{u ∈ Ker D_J} ||Ω s_I − u||_∞   (convex → computable)
where Ω = D_J^+ (Id − Φ^*Φ A^[J]) D_I.
Example: Φ = Id (denoising), discrete 1-D derivative D^* x = (x_i − x_{i−1})_i.
Here s_I = sign(D_I^* x), α_J = Ω s_I and IC(s) = ||α_J||_∞.
[Figure: a piecewise constant signal x together with the dual vector α_J, which stays strictly between −1 and +1, so that IC(sign(D^* x)) < 1.]
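Since the criterion minimizes a convex function over a linear space, it reduces to a Chebyshev (ℓ∞) approximation problem, i.e. a linear program. A sketch under simplifying assumptions (dense matrices, generic Φ and D; not the authors' code):

```python
# IC(s) = min_{u in Ker D_J} ||Omega s_I - u||_inf as a linear program in
# (c, t): minimize t subject to |Omega s_I - K c| <= t componentwise, where
# the columns of K span Ker D_J.
import numpy as np
from scipy.linalg import null_space, pinv
from scipy.optimize import linprog

def ic_criterion(Phi, D, s):
    I = np.flatnonzero(s != 0)
    J = np.flatnonzero(s == 0)
    U = null_space(D[:, J].T)                     # basis of G_J = Ker D_J^*
    A_J = U @ np.linalg.inv(U.T @ Phi.T @ Phi @ U) @ U.T
    Omega = pinv(D[:, J]) @ (np.eye(D.shape[0]) - Phi.T @ Phi @ A_J) @ D[:, I]
    v = Omega @ s[I]
    K = null_space(D[:, J])                       # Ker D_J, possibly trivial
    if K.size == 0:
        return np.abs(v).max()                    # then IC(s) = ||Omega s_I||_inf
    m, r = K.shape
    c_obj = np.zeros(r + 1); c_obj[-1] = 1.0      # minimize t
    A_ub = np.block([[-K, -np.ones((m, 1))],      # -(K c) - t <= -v
                     [ K, -np.ones((m, 1))]])     #  (K c) - t <=  v
    res = linprog(c_obj, A_ub=A_ub, b_ub=np.concatenate([-v, v]),
                  bounds=[(None, None)] * r + [(0, None)])
    return res.fun
```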
Robustness to Small Noise
Recall IC(s) = min_{u ∈ Ker D_J} ||Ω s_I − u||_∞ with Ω = D_J^+ (Id − Φ^*Φ A^[J]) D_I.
Theorem: If IC(sign(D^* x_0)) < 1 and, with T = min_{i ∈ I} |(D^* x_0)_i|,
||w||/T is small enough and λ ∼ ||w||, then
x_0 + A^[J] Φ^* w − λ A^[J] D_I s_I
is the unique solution of P_λ(y).
Theorem: If IC(sign(D^* x_0)) < 1, then x_0 is the unique solution of P_0(Φ x_0).
→ When D = Id, results of J.J. Fuchs.
Robustness to Bounded Noise
Robustness criterion:
RC(I) = max_{||p_I||_∞ ≤ 1} min_{u ∈ ker(D_J)} ||Ω p_I − u||_∞ = max_{||p_I||_∞ ≤ 1} IC(p)
Constants:
c_J = ||D_J^+ Φ^* (Φ A^[J] Φ^* − Id)||_{2,∞}
C_J = ||A^[J]||_{2,2} ( ρ ||Φ||_{2,2} + ρ c_J ||D_I||_{2,∞} / (1 − RC(I)) )
Theorem: If RC(I) < 1 for I = Supp(D^* x_0), setting
λ = ρ ||w||_2 c_J / (1 − RC(I))   with ρ > 1,
then P_λ(y) has a unique solution x^⋆ ∈ G_J and
||x_0 − x^⋆||_2 ≤ C_J ||w||_2.
→ When D = Id, results of Tropp (ERC).
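Since RC(I) maximizes a convex function of p_I over the unit ∞-ball, the maximum is attained at a vertex p_I ∈ {−1, +1}^|I|. A brute-force sketch (an illustration only feasible for small |I|; dense matrices are assumptions):

```python
# RC(I) = max over sign vertices p_I of the inner LP, which is IC(p).
import itertools
import numpy as np
from scipy.linalg import null_space, pinv
from scipy.optimize import linprog

def inner_ic(Omega, p_I, K):
    v = Omega @ p_I
    if K.size == 0:
        return np.abs(v).max()
    m, r = K.shape
    c = np.zeros(r + 1); c[-1] = 1.0
    A_ub = np.block([[-K, -np.ones((m, 1))], [K, -np.ones((m, 1))]])
    res = linprog(c, A_ub=A_ub, b_ub=np.concatenate([-v, v]),
                  bounds=[(None, None)] * r + [(0, None)])
    return res.fun

def rc_criterion(Phi, D, I, J):
    U = null_space(D[:, J].T)
    A_J = U @ np.linalg.inv(U.T @ Phi.T @ Phi @ U) @ U.T
    Omega = pinv(D[:, J]) @ (np.eye(D.shape[0]) - Phi.T @ Phi @ A_J) @ D[:, I]
    K = null_space(D[:, J])
    return max(inner_ic(Omega, np.array(p), K)
               for p in itertools.product([-1.0, 1.0], repeat=len(I)))
```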
Source Condition
Noiseless CNS:
x_0 ∈ argmin_{y = Φx} ||D^* x||_1  ⟺  ∃ α ∈ ∂||D^* x_0||_1, Dα ∈ Im(Φ^*)   (SC_α)
[Figure: the ball ||D^* x||_1 ≤ c touching the affine space y = Φx at x_0, with Dα a normal direction.]
Theorem [Grasmair, Inverse Prob., 2011]: If (HJ), (SC_α) and ||α_J||_∞ < 1, then
||D^*(x^⋆ − x_0)|| = O(||w||).
Proposition: Let s = sign(D^* x_0) and set
α_J = Ω s_I − u,   α_I = s_I,
where u achieves the minimum in IC(s) = min_{u ∈ Ker D_J} ||Ω s_I − u||_∞.
Then IC(s) < 1 ⟹ (SC_α) holds, with ||α_J||_∞ = IC(s).
Overview
• Synthesis vs. Analysis Regularization
• Local Behavior of Sparse Regularization
• Robustness to Noise
• Numerical Illustrations
Example: Total Variation in 1-D
Discrete 1-D derivative: D^* x = (x_i − x_{i−1})_i.
Signals with k − 1 steps: {G_J : dim G_J = k}.
Denoising: Φ = Id.
Small noise robustness: α_I = sign(D_I^* x), α_J = Ω sign(D_I^* x), IC(s) = ||α_J||_∞.
[Figure: two piecewise constant signals x with their dual vectors α_I and α_J; for the first, α_J stays strictly inside (−1, +1) and IC(sign(D^* x)) < 1; for the second, α_J touches ±1 and IC(sign(D^* x)) = 1.]
Case IC = 1: support recovery.
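For 1-D TV denoising the computation collapses: the columns of D_J are linearly independent difference atoms, so Ker D_J = {0} and IC(s) = ||Ω s_I||_∞ = ||α_J||_∞ directly. A self-contained sketch (illustrative, not the authors' code) on a boxcar signal:

```python
# IC for 1-D TV denoising (Phi = Id): Ker D_J = {0}, so IC(s) = ||Omega s_I||_inf.
import numpy as np
from scipy.linalg import null_space, pinv

N = 32
Dt = np.diff(np.eye(N), axis=0)       # D^*: (N-1) x N finite differences
D = Dt.T

x = np.zeros(N); x[10:20] = 1.0       # boxcar: one jump up, one jump down
s = np.sign(Dt @ x)
I = np.flatnonzero(s != 0)            # D-support (the two jump locations)
J = np.flatnonzero(s == 0)

U = null_space(D[:, J].T)             # orthonormal basis of G_J = Ker D_J^*
A_J = U @ U.T                         # for Phi = Id, A^[J] projects onto G_J
Omega = pinv(D[:, J]) @ (np.eye(N) - A_J) @ D[:, I]
alpha_J = Omega @ s[I]
print("IC =", np.abs(alpha_J).max())  # < 1 here: the jump signs alternate
```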
Staircasing Example
[Figure: observations y = x_0 + w (samples y_i as a function of i) for two noise realizations, together with the corresponding reconstructions x_λ(y).]
Case IC = 1: support recovery for one noise realization, no support recovery for the other.
Example: Total Variation in 2-D
Directional derivatives:
D_1^* x = (x_{i,j} − x_{i−1,j})_{i,j} ∈ R^N
D_2^* x = (x_{i,j} − x_{i,j−1})_{i,j} ∈ R^N
Gradient: D^* x = (D_1^* x, D_2^* x) ∈ R^{N×2}.
Dual vector: α_I = sign(D_I^* x), α_J = Ω sign(D_I^* x).
[Figure: two images x with the corresponding dual vectors α_I; one satisfies IC(s) < 1, the other IC(s) > 1.]
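A small sketch of the 2-D analysis operator, stacking the two directional finite differences (the zero rows and columns at the boundary are an illustrative convention, not prescribed by the slides):

```python
# D^* x = (D_1^* x, D_2^* x) for 2-D anisotropic TV.
import numpy as np

def grad2d(X):
    G1 = np.zeros_like(X); G1[1:, :] = X[1:, :] - X[:-1, :]   # (x_{i,j} - x_{i-1,j})
    G2 = np.zeros_like(X); G2[:, 1:] = X[:, 1:] - X[:, :-1]   # (x_{i,j} - x_{i,j-1})
    return np.stack([G1, G2], axis=-1)                        # element of R^{N x 2}

X = np.zeros((8, 8)); X[2:6, 3:7] = 1.0
print(np.abs(grad2d(X)).sum())        # anisotropic TV of a square indicator
```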
Example: Fused Lasso
Total variation and ℓ1 hybrid: D^* x = (x_i − x_{i−1})_i ⊕ (ε x_i)_i.
Signals x_0: sums of k interval indicators, {G_J : dim G_J = k}.
Compressed sensing: Gaussian Φ ∈ R^{Q×N}.
[Figure: probability P(η, ε, Q/N) of the event IC < 1, displayed as a function of the interval width η, the weight ε and the undersampling Q/N.]
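The fused-lasso dictionary just stacks finite differences on top of a scaled identity, so that ||D^* x||_1 = TV(x) + ε||x||_1. A tiny illustrative sketch:

```python
# Fused-lasso analysis dictionary: D^* x = (x_i - x_{i-1})_i  ⊕  (eps x_i)_i.
import numpy as np

def fused_lasso_analysis(N, eps):
    Dt_tv = np.diff(np.eye(N), axis=0)            # (N-1) x N finite differences
    return np.vstack([Dt_tv, eps * np.eye(N)])    # D^*, shape (2N-1) x N

x = np.zeros(20); x[5:12] = 1.0
Dt = fused_lasso_analysis(20, eps=0.5)
print(np.abs(Dt @ x).sum())           # = TV(x) + 0.5 ||x||_1 = 2 + 3.5
```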
Example: Invariant Haar Analysis
Haar wavelets:
ψ^(j)_i = (1 / 2^{τ(j+1)}) × { +1 if 0 ≤ i < 2^j ;  −1 if −2^j ≤ i < 0 ;  0 otherwise }.
Haar TI analysis:
||D^* x||_1 = Σ_j ||x ⋆ ψ^(j)||_1 = Σ_j ||x ⋆ φ^(j)||_TV
[Figure: the filters ψ^(j) and φ^(j), supported on 2^{j+1} samples.]
De-blurring: Φx = φ ⋆ x with φ(t) ∝ e^{−t²/(2σ²)}.
[Figure: IC as a function of the blur size σ.]
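A sketch of the shift-invariant Haar analysis coefficients via circular convolutions. The circular wrap-around and the filter alignment below are illustrative assumptions: they shift the coefficients within each scale but leave the ℓ1 value of each scale unchanged.

```python
# D_H^* x = (psi^(j) * x)_{0 <= j <= Jmax}, psi^(j) a Haar filter of support
# 2^{j+1} normalized by 2^{tau (j+1)}; circular convolution via FFT.
import numpy as np

def haar_ti_analysis(x, Jmax, tau=1.0):
    N = len(x)
    coeffs = []
    for j in range(Jmax + 1):
        h = 2 ** j
        psi = np.zeros(N)
        psi[:h], psi[h:2 * h] = 1.0, -1.0          # +1 then -1 blocks (shifted)
        psi /= 2 ** (tau * (j + 1))
        coeffs.append(np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(psi))))
    return np.stack(coeffs)

x = np.zeros(256); x[96:160] = 1.0                 # boxcar test signal
print(np.abs(haar_ti_analysis(x, Jmax=4)).sum())   # ||D_H^* x||_1
```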
For N a multiple of 4, we split {1, . . . , N} into 4 sets l_k = {(k−1)M + 1, . . . , kM} of cardinality M = N/4. Let 1_{l_k} be the boxcar signal whose support is l_k. Consider the staircase signal x_0 = −1_{l_1} + 1_{l_4} degraded by a deterministic noise w of the form w = ε(1_{l_3} − 1_{l_2}), where ε ∈ R. The observation vector y = x_0 + w reads

y = −1_{l_1} − ε 1_{l_2} + ε 1_{l_3} + 1_{l_4}.

Suppose that ε > 0; then the solution x^⋆_λ of (P_λ(y)) is

x^⋆_λ = (−1 + λ/M) 1_{l_1} − ε 1_{l_2} + ε 1_{l_3} + (1 − λ/M) 1_{l_4},

if 0 ≤ λ ≤ λ_1 = M(1 − ε), and

x^⋆_λ = (−ε + (λ − λ_1)/(2M)) (1_{l_1} + 1_{l_2}) + (ε − (λ − λ_1)/(2M)) (1_{l_3} + 1_{l_4}),

if λ_1 ≤ λ ≤ λ_2 = λ_1 + 2εM, and 0 if λ ≥ λ_2. Similarly, if ε < 0, the solution x^⋆_λ reads

x^⋆_λ = (−1 + λ/M) 1_{l_1} − (ε + 2λ/M) (1_{l_2} − 1_{l_3}) + (1 − λ/M) 1_{l_4},

if 0 ≤ λ ≤ λ̄_1 = −εM/2, and

x^⋆_λ = (−1 + λ/M) 1_{l_1} + (1 − λ/M) 1_{l_4},

if λ̄_1 ≤ λ ≤ λ̄_2 = M, and 0 if λ ≥ λ̄_2. Figure 2 displays plots of the coordinates' path for both cases.

[Figure 2 here.]

Fig. 2: Top row: Signals y for ε < 0 (left) and ε > 0 (right). Bottom row: Corresponding coordinates' path of x^⋆_λ as a function of λ. The solid lines correspond to the coordinates in l_1 and l_4, and the dashed ones to the coordinates in l_2 and l_3.

It is worth pointing out that when ε > 0, the D-support of x^⋆_λ is always different from that of x_0 whatever the choice of λ, whereas in the case ε < 0, for any λ̄_1 ≤ λ ≤ λ̄_2, the D-support of x^⋆_λ and the sign of D^* x^⋆_λ are exactly those of x_0.
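The ε > 0 branch of the closed form is easy to sanity-check numerically. A sketch, assuming the cvxpy package is available; the parameter values below are illustrative choices.

```python
# Check the closed form for eps > 0 and 0 <= lam <= lambda_1 = M(1 - eps)
# against a generic solver for (P_lambda(y)) with Phi = Id and D^* = diff.
import numpy as np
import cvxpy as cp

N, M = 64, 16
eps, lam = 0.3, 5.0                                # lam < lambda_1 = 11.2
blocks = [np.arange(k * M, (k + 1) * M) for k in range(4)]
y = np.zeros(N)
for k, v in enumerate([-1.0, -eps, eps, 1.0]):
    y[blocks[k]] = v

x = cp.Variable(N)
cp.Problem(cp.Minimize(0.5 * cp.sum_squares(y - x)
                       + lam * cp.norm1(cp.diff(x)))).solve()

x_closed = np.zeros(N)
for k, v in enumerate([-1 + lam / M, -eps, eps, 1 - lam / M]):
    x_closed[blocks[k]] = v
print(np.max(np.abs(x.value - x_closed)))          # ~ 0 up to solver tolerance
```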
D. Shift-Invariant Haar Deconvolution
Sparse analysis regularization using a 1-D shift-invariant Haar dictionary is efficient to recover piecewise constant signals. This dictionary is built using a set of scaled and dilated Haar filters

ψ^(j)_i = (1 / 2^{τ(j+1)}) × { +1 if 0 ≤ i < 2^j ;  −1 if −2^j ≤ i < 0 ;  0 otherwise },

where τ > 0 is a normalization exponent. For τ = 1, the dictionary is said to be unit-normed. For τ = 1/2, it corresponds to a Parseval tight frame. The action on a signal x of the analysis operator corresponding to the translation-invariant Haar dictionary D_H is

D_H^* x = ( ψ^(j) ⋆ x )_{0 ≤ j ≤ J_max},

where ⋆ stands for the discrete convolution (with appropriate boundary conditions) and J_max < log_2(N), where N is the size of the signal. The analysis regularization ||D_H^* x||_1 can also be written as the sum over scales of the TV semi-norms of filtered versions of the signal. As such, it can be understood as a sort of multiscale total variation regularization. Apart from a multiplicative factor, one recovers the total variation when J_max = 0.

We consider a noiseless convolution setting (for N = 256) where Φ is a circular convolution operator with a Gaussian kernel of standard deviation σ. We first study the impact of σ on the identifiability criterion IC. The blurred signal x_η is a centered boxcar signal with a support of size 2ηN:

x_η = 1_{{⌊N/2 − ηN⌋, . . . , ⌊N/2 + ηN⌋}},   η ∈ (0, 1/2].

Figure 3 displays the evolution of IC(sign(D_H^* x_0)) as a function of σ for three dictionaries (τ = 1, τ = 0.5 and the total variation), where we fixed η = 0.2.

[Figure 3 here.]

Fig. 3: Behaviour of IC for a noiseless deconvolution scenario with a Gaussian blur and ℓ1-analysis sparsity regularization in a shift-invariant Haar dictionary with J_max = 4. IC is plotted as a function of the Gaussian blurring kernel size σ ∈ [0.5, 3.0] for the total variation dictionary and the Haar wavelet dictionary with two normalization exponents τ. Dash-dotted line: τ = 1 (unit-normed). Dashed line: τ = 1/2 (tight frame). Solid line: total variation.

In the identifiability regime, IC(sign(D_H^* x_0)) appears smaller in the case of the unit-normed normalization. However, one should avoid to
Conclusion
SURE risk estimation:
Sensitivity analysis ⟺ computing risk estimates.
Analysis vs. synthesis regularization:
Analysis support is less stable.
Stability without support stability.
Open problem:
Fast algorithms to optimize λ.