Robust Sparse Analysis Recovery

Gabriel Peyré (www.numerical-tours.com)
Joint work with: Samuel Vaiter, Charles Dossal, Jalal Fadili

Slides of the presentation at the "Séminaire Parisien de Statistique", Nov. 21st 2012, IHP, Paris.
Overview
• Synthesis vs. Analysis Regularization
• Risk Estimation
• Local Behavior of Sparse Regularization
• Robustness to Noise
• Numerical Illustrations
Inverse Problems

Recovering $x_0 \in \mathbb{R}^N$ from noisy observations $y = \Phi x_0 + w \in \mathbb{R}^P$.

Examples: inpainting, super-resolution, compressed sensing.

Regularized inversion:
$$x_\lambda(y) \in \underset{x\in\mathbb{R}^N}{\mathrm{argmin}}\ \underbrace{\tfrac{1}{2}\|y - \Phi x\|^2}_{\text{data fidelity}} + \lambda \underbrace{J(x)}_{\text{regularity}}$$
Sparse Regularizations

Synthesis regularization:
$$\min_{\alpha\in\mathbb{R}^Q}\ \tfrac{1}{2}\|y - \Phi\Psi\alpha\|_2^2 + \lambda\|\alpha\|_1$$
Coefficients $\alpha$ $\xrightarrow{\ \Psi\ }$ image $x = \Psi\alpha$.

Analysis regularization:
$$\min_{x\in\mathbb{R}^N}\ \tfrac{1}{2}\|y - \Phi x\|_2^2 + \lambda\|D^* x\|_1$$
Image $x$ $\xrightarrow{\ D^*\ }$ correlations $\alpha = D^* x$.

Unless $D = \Psi$ is orthogonal, the two regularizations produce different results.
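As a concrete illustration (not from the slides), here is a minimal sketch of the synthesis problem solved by iterative soft thresholding (ISTA); the analysis problem is non-smooth in $D^* x$ and typically requires primal-dual or ADMM-type solvers instead. All names and parameter values are illustrative.

```python
import numpy as np

def soft_threshold(a, t):
    # Proximal operator of t * ||.||_1: component-wise shrinkage.
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)

def ista_synthesis(y, Phi, Psi, lam, n_iter=500):
    """Minimize 1/2 ||y - Phi Psi a||^2 + lam ||a||_1 by ISTA."""
    A = Phi @ Psi                      # composite operator
    L = np.linalg.norm(A, 2) ** 2      # Lipschitz constant of the gradient
    a = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ a - y)       # gradient of the data fidelity
        a = soft_threshold(a - grad / L, lam / L)
    return Psi @ a                     # synthesized image x = Psi a

# Toy usage: random Phi, trivial orthogonal Psi (identity basis).
rng = np.random.default_rng(0)
Phi = rng.standard_normal((20, 50))
Psi = np.eye(50)
x0 = np.zeros(50); x0[[5, 17, 33]] = [2.0, -1.5, 1.0]
y = Phi @ x0 + 0.05 * rng.standard_normal(20)
x_rec = ista_synthesis(y, Phi, Psi, lam=0.1)
```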
Variations and Stability

Observations: $y = \Phi x_0 + w$.

Recovery:
$$x_\lambda(y) \in \underset{x\in\mathbb{R}^N}{\mathrm{argmin}}\ \tfrac{1}{2}\|\Phi x - y\|^2 + \lambda\|D^* x\|_1 \qquad (\mathcal{P}_\lambda(y))$$
$$x_{0^+}(y) \in \underset{\Phi x = y}{\mathrm{argmin}}\ \|D^* x\|_1 \qquad (\mathcal{P}_0(y)) \quad (\lambda \to 0^+,\ \text{no noise})$$

Questions:
– Behavior of $x_\lambda(y)$ with respect to $y$ and $\lambda$.
  → Application: risk estimation (SURE, GCV, etc.)
– Criteria to ensure $\|x_\lambda(y) - x_0\| \leq C\|w\|$ (with "reasonable" $C$).

Synthesis case ($D = \mathrm{Id}$): works of Fuchs and Tropp.
Analysis case: [Nam et al. 2011] for $w = 0$.
Overview
• Inverse Problems
• Stein Unbiased Risk Estimators
• Theory: SURE for Sparse Regularization
• Practice: SURE for Iterative Algorithms
Risk Minimization

Average risk: $R(\lambda) = \mathbb{E}_w(\|x_\lambda(y) - x_0\|^2)$

Plug-in estimator: $x_{\lambda^\star(y)}(y)$, where $\lambda^\star(y) = \mathrm{argmin}_\lambda R(\lambda)$.

But: $x_0$ is not accessible → needs risk estimators.
$\mathbb{E}_w$ is not accessible → use one observation.
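In practice the plug-in selection is a grid search over $\lambda$, with the unknown risk $R(\lambda)$ replaced by an estimator such as the SURE/GSURE introduced below. A hedged sketch, where `solver` and `risk_estimate` stand in for any concrete implementations (both hypothetical names):

```python
import numpy as np

def select_lambda(y, lambdas, solver, risk_estimate):
    """Plug-in selection: return x_{lambda*}(y) with lambda* minimizing
    an estimated risk over a grid (the true risk R(lambda) is unknown)."""
    estimates = []
    for lam in lambdas:
        x = solver(y, lam)                  # e.g. a sparse analysis solver
        estimates.append(risk_estimate(y, x, lam))
    best = int(np.argmin(estimates))
    return solver(y, lambdas[best]), lambdas[best]
```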
Unbiased Risk Estimation for Sparse Analysis Regularization (poster)
Charles Deledalle¹, Samuel Vaiter¹, Gabriel Peyré¹, Jalal Fadili³ and Charles Dossal²
¹CEREMADE, Université Paris-Dauphine; ²GREYC, ENSICAEN; ³IMB, Université Bordeaux 1

Problem statement

Consider the convex but non-smooth analysis sparsity regularization problem
$$x^\star(y,\lambda) \in \underset{x\in\mathbb{R}^N}{\mathrm{argmin}}\ \tfrac{1}{2}\|y - \Phi x\|^2 + \lambda\|D^* x\|_1 \qquad (\mathcal{P}_\lambda(y))$$
which aims at inverting $y = \Phi x_0 + w$ by promoting sparsity, with
• $x_0 \in \mathbb{R}^N$ the unknown image of interest,
• $y \in \mathbb{R}^Q$ the low-dimensional noisy observation of $x_0$,
• $\Phi \in \mathbb{R}^{Q\times N}$ a linear operator that models the acquisition process,
• $w \sim \mathcal{N}(0, \sigma^2\,\mathrm{Id}_Q)$ the noise component,
• $D \in \mathbb{R}^{N\times P}$ an analysis dictionary, and
• $\lambda > 0$ a regularization parameter.

How to choose the value of the parameter $\lambda$?

Risk-based selection of $\lambda$

• Risk associated to $\lambda$: measure of the expected quality of $x^\star(y,\lambda)$ with respect to $x_0$,
$$R(\lambda) = \mathbb{E}_w\|x^\star(y,\lambda) - x_0\|^2.$$
• The optimal (theoretical) $\lambda$ minimizes the risk.

The risk is unknown since it depends on $x_0$. Can we estimate the risk solely from $x^\star(y,\lambda)$?

Risk estimation

• Assume $y \mapsto \Phi x^\star(y,\lambda)$ is weakly differentiable (a fortiori uniquely defined).

Prediction risk estimation via SURE

• The Stein Unbiased Risk Estimator (SURE)
$$\mathrm{SURE}(y,\lambda) = \|y - \Phi x^\star(y,\lambda)\|^2 - \sigma^2 Q + 2\sigma^2\,\underbrace{\mathrm{tr}\Big(\tfrac{\partial \Phi x^\star(y,\lambda)}{\partial y}\Big)}_{\text{estimator of the DOF}}$$
is an unbiased estimator of the prediction risk [Stein, 1981]:
$$\mathbb{E}_w(\mathrm{SURE}(y,\lambda)) = \mathbb{E}_w(\|\Phi x_0 - \Phi x^\star(y,\lambda)\|^2).$$

Projection risk estimation via GSURE

• Let $\Pi = \Phi^*(\Phi\Phi^*)^+\Phi$ be the orthogonal projector on $\ker(\Phi)^\perp = \mathrm{Im}(\Phi^*)$,
• Denote $x_{\mathrm{ML}}(y) = \Phi^*(\Phi\Phi^*)^+ y$,
• The Generalized Stein Unbiased Risk Estimator (GSURE)
$$\mathrm{GSURE}(y,\lambda) = \|x_{\mathrm{ML}}(y) - \Pi x^\star(y,\lambda)\|^2 - \sigma^2\,\mathrm{tr}((\Phi\Phi^*)^+) + 2\sigma^2\,\mathrm{tr}\Big((\Phi\Phi^*)^+\tfrac{\partial \Phi x^\star(y,\lambda)}{\partial y}\Big)$$
is an unbiased estimator of the projection risk [Vaiter et al., 2012]:
$$\mathbb{E}_w(\mathrm{GSURE}(y,\lambda)) = \mathbb{E}_w(\|\Pi x_0 - \Pi x^\star(y,\lambda)\|^2)$$
(see also [Eldar, 2009, Pesquet et al., 2009, Vonesch et al., 2008] for similar results).

Illustration of risk estimation

[Figure: risk curves; here $x^\star$ denotes $x^\star(y,\lambda)$ for an arbitrary value of $\lambda$.]

How to estimate the quantity $\mathrm{tr}\big((\Phi\Phi^*)^+\tfrac{\partial \Phi x^\star(y,\lambda)}{\partial y}\big)$?

Main notations and assumptions

• Let $I = \mathrm{supp}(D^* x^\star(y,\lambda))$ be the support of $D^* x^\star(y,\lambda)$,
• Let $J = I^c$ be the co-support of $D^* x^\star(y,\lambda)$,
• Let $D_I$ be the submatrix of $D$ whose columns are indexed by $I$,
• Let $s_I = \mathrm{sign}(D^* x^\star(y,\lambda))_I$ be the sign vector of the entries of $D^* x^\star(y,\lambda)$ indexed by $I$,
• Let $G_J = \ker D_J^*$ be the "cospace" associated to $x^\star(y,\lambda)$,
• To study the local behaviour of $x^\star(y,\lambda)$, we impose $\Phi$ to be "invertible" on $G_J$:
$$G_J \cap \ker\Phi = \{0\},$$
• This allows us to define the matrix
$$A^{[J]} = U(U^*\Phi^*\Phi U)^{-1}U^*,$$
where $U$ is a matrix whose columns form a basis of $G_J$,
• In this case, we obtain an implicit equation:
$$x^\star(y,\lambda) \text{ solution of } \mathcal{P}_\lambda(y) \iff x^\star(y,\lambda) = \hat{x}(y,\lambda) = A^{[J]}\Phi^* y - \lambda A^{[J]} D_I s_I.$$

Is this relation true in a neighbourhood of $(y,\lambda)$?

Theorem (Local Parameterization)

• Even if the solutions $x^\star(y,\lambda)$ of $\mathcal{P}_\lambda(y)$ might not be unique, $\Phi x^\star(y,\lambda)$ is uniquely defined.
• If $(y,\lambda) \notin \mathcal{H}$, then for $(\bar y,\bar\lambda)$ close to $(y,\lambda)$, $\hat{x}(\bar y,\bar\lambda)$ is a solution of $\mathcal{P}_{\bar\lambda}(\bar y)$, where
$$\hat{x}(\bar y,\bar\lambda) = A^{[J]}\Phi^*\bar y - \bar\lambda A^{[J]} D_I s_I.$$
• Hence, it allows writing
$$\frac{\partial \Phi x^\star(y,\lambda)}{\partial y} = \Phi A^{[J]}\Phi^*,$$
• Moreover, the DOF can be estimated by
$$\mathrm{tr}\Big(\frac{\partial \Phi x^\star(y,\lambda)}{\partial y}\Big) = \dim(G_J).$$

[Figure: piecewise affine solution path in the $(x_1, x_2)$ plane, from $x_{\lambda_0}$ at $\lambda_0 = 0$ (solution of $\mathcal{P}_0(y)$) down to $x_{\lambda_k} = 0$.]

Can we compute this quantity efficiently?

Computation of GSURE

• One has, for $Z \sim \mathcal{N}(0,\mathrm{Id}_Q)$,
$$\mathrm{tr}\Big((\Phi\Phi^*)^+\frac{\partial \Phi x^\star(y,\lambda)}{\partial y}\Big) = \mathbb{E}_Z(\langle\nu(Z),\ \Phi^*(\Phi\Phi^*)^+ Z\rangle)$$
where, for any $z \in \mathbb{R}^Q$, $\nu = \nu(z)$ solves the linear system
$$\begin{pmatrix}\Phi^*\Phi & D_J\\ D_J^* & 0\end{pmatrix}\begin{pmatrix}\nu\\ \tilde\nu\end{pmatrix} = \begin{pmatrix}\Phi^* z\\ 0\end{pmatrix}.$$
• In practice, by the law of large numbers, the expectation is replaced by an empirical mean.
• The computation of $\nu(z)$ is achieved by solving the linear system with a conjugate gradient solver.

Numerical example

Super-resolution using (anisotropic) total variation: (a) observation $y$; (b) $x^\star(y,\lambda)$ at the optimal $\lambda$. [Plot: quadratic loss vs. regularization parameter $\lambda$, comparing projection risk, GSURE and true risk.]
Compressed sensing using multi-scale wavelet thresholding: (c) $x_{\mathrm{ML}}$; (d) $x^\star(y,\lambda)$ at the optimal $\lambda$. [Plot: quadratic loss vs. regularization parameter $\lambda$, comparing projection risk, GSURE and true risk.]

Perspectives: how to efficiently minimize $\mathrm{GSURE}(y,\lambda)$ with respect to $\lambda$?

References

Eldar, Y. C. (2009). Generalized SURE for exponential families: applications to regularization. IEEE Transactions on Signal Processing, 57(2):471–481.
Pesquet, J.-C., Benazza-Benyahia, A., and Chaux, C. (2009). A SURE approach for digital signal/image deconvolution problems. IEEE Transactions on Signal Processing, 57(12):4616–4632.
Stein, C. (1981). Estimation of the mean of a multivariate normal distribution. The Annals of Statistics, 9(6):1135–1151.
Vaiter, S., Deledalle, C., Peyré, G., Dossal, C., and Fadili, J. (2012). Local behavior of sparse analysis regularization: applications to risk estimation. arXiv preprint arXiv:1204.3212.
Vonesch, C., Ramani, S., and Unser, M. (2008). Recursive risk estimation for non-linear image deconvolution with a wavelet-domain sparsity constraint. In ICIP, pages 665–668. IEEE.

http://www.ceremade.dauphine.fr/~deledall/   [email protected]
Prediction Risk Estimation

Average risk: $R(\lambda) = \mathbb{E}_w(\|x_\lambda(y) - x_0\|^2)$
Prediction: $\mu_\lambda(y) = \Phi x_\lambda(y)$

Sensitivity analysis: if $\mu_\lambda$ is weakly differentiable,
$$\mu_\lambda(y + \delta) = \mu_\lambda(y) + \partial\mu_\lambda(y)\cdot\delta + O(\|\delta\|^2)$$

Stein Unbiased Risk Estimator:
$$\mathrm{df}_\lambda(y) = \mathrm{tr}(\partial\mu_\lambda(y)) = \mathrm{div}(\mu_\lambda)(y)$$
$$\mathrm{SURE}_\lambda(y) = \|y - \mu_\lambda(y)\|^2 - \sigma^2 P + 2\sigma^2\,\mathrm{df}_\lambda(y)$$

Theorem: [Stein, 1981]
$$\mathbb{E}_w(\mathrm{SURE}_\lambda(y)) = \mathbb{E}_w(\|\Phi x_0 - \mu_\lambda(y)\|^2)$$

Other estimators: GCV, BIC, AIC, ...
SURE: requires $\sigma$ (not always available); unbiased with good practical performance.
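For the simplest case $\Phi = \mathrm{Id}$, $D = \mathrm{Id}$ (soft-thresholding denoising), $\mathrm{df}_\lambda(y)$ is the number of nonzero coefficients and SURE is available in closed form. A minimal sketch (not from the slides), assuming Gaussian noise of known level $\sigma$:

```python
import numpy as np

def sure_soft_threshold(y, lam, sigma):
    """SURE for x_lam(y) = soft-threshold(y, lam), with Phi = Id, D = Id.
    Here df_lam(y) = #{i : |y_i| > lam} (Stein, 1981)."""
    x = np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)
    df = np.count_nonzero(np.abs(y) > lam)
    P = y.size
    return np.sum((y - x) ** 2) - sigma**2 * P + 2 * sigma**2 * df

# Unbiasedness check on a toy signal: the empirical mean of SURE over
# noise draws approximates E_w ||x_0 - x_lam(y)||^2 (prediction = estimation
# risk here since Phi = Id).
rng = np.random.default_rng(1)
x0 = np.concatenate([5 * rng.standard_normal(10), np.zeros(90)])
sigma, lam = 1.0, 2.0
vals = [sure_soft_threshold(x0 + sigma * rng.standard_normal(100), lam, sigma)
        for _ in range(2000)]
mean_sure = np.mean(vals)
```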
Generalized SURE

Problem: $\|\Phi x_0 - \Phi x_\lambda(y)\|$ is a poor indicator of $\|x_0 - x_\lambda(y)\|$.

Generalized SURE: take into account the risk on $\ker(\Phi)^\perp$:
$$\mathrm{Proj}_{\ker(\Phi)^\perp} = \Pi = \Phi^*(\Phi\Phi^*)^+\Phi$$
ML estimator: $\tilde{x}(y) = \Phi^*(\Phi\Phi^*)^+ y$.

Generalized df:
$$\mathrm{gdf}_\lambda(y) = \mathrm{tr}\big((\Phi\Phi^*)^+\partial\mu_\lambda(y)\big)$$
$$\mathrm{GSURE}_\lambda(y) = \|\tilde{x}(y) - \Pi x_\lambda(y)\|^2 - \sigma^2\,\mathrm{tr}((\Phi\Phi^*)^+) + 2\sigma^2\,\mathrm{gdf}_\lambda(y)$$

Theorem: [Eldar 09, Pesquet et al. 09, Vonesch et al. 08]
$$\mathbb{E}_w(\mathrm{GSURE}_\lambda(y)) = \mathbb{E}_w(\|\Pi(x_0 - x_\lambda(y))\|^2)$$

→ How to compute $\partial\mu_\lambda(y)$?
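The projector $\Pi$ and the ML estimator only involve the pseudo-inverse of $\Phi\Phi^*$; a small dense sketch (illustrative only, for explicit matrices):

```python
import numpy as np

def projector_and_ml(Phi, y):
    """Pi = Phi^* (Phi Phi^*)^+ Phi, the orthogonal projector on ker(Phi)^perp,
    and the ML estimator x_tilde(y) = Phi^* (Phi Phi^*)^+ y."""
    gram_pinv = np.linalg.pinv(Phi @ Phi.T)   # (Phi Phi^*)^+
    Pi = Phi.T @ gram_pinv @ Phi
    x_ml = Phi.T @ gram_pinv @ y
    return Pi, x_ml
```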
Overview
• Synthesis vs. Analysis Regularization
• Local Behavior of Sparse Regularization
• SURE Unbiased Risk Estimator
• Numerical Illustrations
Variations and Stability

Observations: $y = \Phi x_0 + w$.
Recovery:
$$x_\lambda(y) \in \underset{x\in\mathbb{R}^N}{\mathrm{argmin}}\ \tfrac{1}{2}\|\Phi x - y\|^2 + \lambda\|D^* x\|_1 \qquad (\mathcal{P}_\lambda(y))$$

Remark: $x_\lambda(y)$ is not always unique, but $\mu_\lambda(y) = \Phi x_\lambda(y)$ is always unique.

Questions:
– When is $y \mapsto \mu_\lambda(y)$ differentiable?
– Formula for $\partial\mu_\lambda(y)$.
TV-1D Polytope

$$x^\star(y) \in \underset{y = \Phi x}{\mathrm{argmin}}\ \|D^* x\|_1$$

$$D^* = \begin{pmatrix} 1 & -1 & 0 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 1 & -1 \\ -1 & 0 & 0 & 1 \end{pmatrix}$$

TV-1D ball: $B = \{x \,:\, \|D^* x\|_1 \leq 1\}$, displayed in $\{x \,:\, \langle x, 1\rangle = 0\} \simeq \mathbb{R}^3$.

[Figure: the polytope $B$, the affine constraint set $\{x \,:\, y = \Phi x\}$, and the solution $x^\star$. Polytope rendering copyright Mael.]
Union of Subspaces Model

$$x_\lambda(y) \in \underset{x\in\mathbb{R}^N}{\mathrm{argmin}}\ \tfrac{1}{2}\|\Phi x - y\|^2 + \lambda\|D^* x\|_1 \qquad (\mathcal{P}_\lambda(y))$$

Support of the solution: $I = \{i \,:\, (D^* x_\lambda(y))_i \neq 0\}$, $J = I^c$.

1-D total variation: $D^* x = (x_i - x_{i-1})_i$.
[Figure: a piecewise-constant signal $x$; the support $I$ marks the jump locations.]

Subspace model: $G_J = \ker(D_J^*) = \mathrm{Im}(D_J)^\perp$
Local well-posedness: $\ker(\Phi) \cap G_J = \{0\}$  $(H_J)$

Lemma: There exists a solution $x^\star$ such that $(H_J)$ holds.
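A sketch (illustrative, dense matrices) of these objects for a given solution $x$: the support $I$, the cosupport $J$, a basis $U$ of the cospace $G_J = \ker(D_J^*)$, and the well-posedness test $(H_J)$:

```python
import numpy as np
from scipy.linalg import null_space

def cospace(D, Phi, x, tol=1e-8):
    """Support/cosupport of D^* x and a basis U of G_J = ker(D_J^*)."""
    corr = D.T @ x                                # analysis coefficients D^* x
    I = np.flatnonzero(np.abs(corr) > tol)        # support
    J = np.setdiff1d(np.arange(D.shape[1]), I)    # cosupport
    U = null_space(D[:, J].T)                     # columns span ker(D_J^*)
    # (H_J): ker(Phi) ∩ G_J = {0}, equivalently Phi U has full column rank.
    HJ = np.linalg.matrix_rank(Phi @ U) == U.shape[1]
    return I, J, U, HJ
```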
Local Sign Stability

$$x_\lambda(y) \in \underset{x}{\mathrm{argmin}}\ \tfrac{1}{2}\|\Phi x - y\|^2 + \lambda\|D^* x\|_1 \qquad (\mathcal{P}_\lambda(y))$$

Lemma: $\mathrm{sign}(D^* x_\lambda(y))$ is constant around $(y,\lambda) \notin \mathcal{H}$.
To be understood: there exists a solution with the same sign.

[Figure: faces of the TV polytope labeled by $\dim(G_J) \in \{0, 1, 2\}$.]

Linearized problem:
$$\hat{x}_\lambda(y) = \underset{x\in G_J}{\mathrm{argmin}}\ \tfrac{1}{2}\|\Phi x - y\|^2 + \lambda\langle D_I^* x, s_I\rangle, \qquad s_I = \mathrm{sign}(D_I^* x_\lambda(y))$$
$$\phantom{\hat{x}_\lambda(y)} = A^{[J]}\Phi^* y - \lambda A^{[J]} D_I s_I, \qquad \text{where } A^{[J]} z = \underset{x\in G_J}{\mathrm{argmin}}\ \tfrac{1}{2}\|\Phi x\|^2 - \langle x, z\rangle.$$

Theorem: If $(y,\lambda) \notin \mathcal{H}$, then for $(\bar y, \bar\lambda)$ close to $(y,\lambda)$,
$$\hat{x}_{\bar\lambda}(\bar y) = A^{[J]}\Phi^*\bar y - \bar\lambda A^{[J]} D_I s_I$$
is a solution of $\mathcal{P}_{\bar\lambda}(\bar y)$.

Local Affine Maps

Local parameterization: $x_\lambda(y) = A^{[J]}\Phi^* y - \lambda A^{[J]} D_I s_I$.

Under a uniqueness assumption, $y \mapsto x_\lambda(y)$ and $\lambda \mapsto x_\lambda(y)$ are piecewise affine functions; the breaking points correspond to changes of the support of $D^* x_\lambda(y)$.

[Figure: piecewise affine path $\lambda \mapsto x_\lambda$, from $x_{0^+}$ at $\lambda_0 = 0$ down to $x_{\lambda_k} = 0$.]
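Given a basis $U$ of $G_J$, $A^{[J]} = U(U^*\Phi^*\Phi U)^{-1}U^*$ (as on the poster), and the local affine map follows directly. A dense sketch assuming $(H_J)$ holds, so the inner matrix is invertible:

```python
import numpy as np

def A_J(Phi, U):
    """A^[J] = U (U^* Phi^* Phi U)^{-1} U^*, with U a basis of G_J.
    (H_J) guarantees that U^* Phi^* Phi U is invertible."""
    M = U.T @ (Phi.T @ (Phi @ U))
    return U @ np.linalg.solve(M, U.T)

def x_hat(y, lam, Phi, U, D_I, s_I):
    # Local affine parameterization of the solution:
    # x_lam(y) = A^[J] Phi^* y - lam * A^[J] D_I s_I.
    A = A_J(Phi, U)
    return A @ (Phi.T @ y) - lam * (A @ (D_I @ s_I))
```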
Application to GSURE

Corollary: For $y \notin \mathcal{H}$, one has locally
$$\mu_\lambda(y) = \Phi A^{[J]}\Phi^* y + \mathrm{cst}.$$

Let $I = \mathrm{supp}(D^* x_\lambda(y))$ be such that $(H_J)$ holds. Then
$$\mathrm{df}_\lambda(y) = \mathrm{tr}(\partial\mu_\lambda(y)) = \mathrm{tr}(\Phi A^{[J]}\Phi^*) = \dim(G_J) \quad \big(= \|x_\lambda(y)\|_0 \text{ for } D = \mathrm{Id} \text{, synthesis}\big)$$
$$\mathrm{gdf}_\lambda(y) = \mathrm{tr}(\Pi A^{[J]})$$
are unbiased estimators of df and gdf.

Trick: $\mathrm{tr}(A) = \mathbb{E}_z(\langle Az, z\rangle)$, $z \sim \mathcal{N}(0,\mathrm{Id}_P)$.

Proposition:
$$\mathrm{gdf}_\lambda(y) = \mathbb{E}_z(\langle\nu(z), \Phi^+ z\rangle), \quad z \sim \mathcal{N}(0,\mathrm{Id}_P),$$
where $\nu(z)$ solves
$$\begin{pmatrix}\Phi^*\Phi & D_J\\ D_J^* & 0\end{pmatrix}\begin{pmatrix}\nu(z)\\ \tilde\nu\end{pmatrix} = \begin{pmatrix}\Phi^* z\\ 0\end{pmatrix}.$$

In practice: $\mathrm{gdf}_\lambda(y) \approx \frac{1}{K}\sum_{k=1}^K \langle\nu(z_k), \Phi^+ z_k\rangle$, $z_k \sim \mathcal{N}(0,\mathrm{Id}_P)$.
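A sketch of the resulting Monte Carlo estimator of $\mathrm{gdf}_\lambda(y)$, with $K$ samples as in the empirical mean above. For readability it solves the saddle-point system densely with numpy; for large problems the slides suggest an iterative (conjugate-gradient type) solver instead. All matrices are assumed explicit.

```python
import numpy as np

def gdf_monte_carlo(Phi, D_J, K=100, rng=None):
    """Estimate gdf = E_z <nu(z), Phi^+ z>, z ~ N(0, Id_P), where nu(z)
    solves [[Phi^* Phi, D_J], [D_J^*, 0]] [nu; nu~] = [Phi^* z; 0].
    The system is assumed invertible (holds under (H_J) with D_J of
    full column rank)."""
    rng = rng or np.random.default_rng()
    P, N = Phi.shape
    m = D_J.shape[1]
    S = np.block([[Phi.T @ Phi, D_J],
                  [D_J.T, np.zeros((m, m))]])   # (N+m) x (N+m) saddle matrix
    Phi_pinv = np.linalg.pinv(Phi)
    acc = 0.0
    for _ in range(K):
        z = rng.standard_normal(P)
        rhs = np.concatenate([Phi.T @ z, np.zeros(m)])
        nu = np.linalg.solve(S, rhs)[:N]
        acc += nu @ (Phi_pinv @ z)               # <nu(z), Phi^+ z>
    return acc / K
```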
CS with Wavelets

$\Phi \in \mathbb{R}^{P\times N}$: realization of a random vector, with $P = N/4$.
$D$: translation-invariant redundant wavelet tight frame.

[Figure: pseudo-inverse reconstruction $\Phi^+ y$ vs. the solution $x_{\lambda^\star(y)}$.]
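For instance, a Gaussian compressed-sensing matrix with $P = N/4$ rows (a plausible instantiation; the slides do not specify the distribution of $\Phi$):

```python
import numpy as np

N = 1024
P = N // 4                                        # P = N/4 measurements
rng = np.random.default_rng(0)
Phi = rng.standard_normal((P, N)) / np.sqrt(P)    # random sensing matrix
```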
[Figure: quadratic loss as a function of $\lambda$; the optimal $\lambda^\star$ is marked.]
Anisotropic Total-Variation

$\Phi$: vertical sub-sampling.
Finite differences gradient: $D = [\partial_1, \partial_2]$.

[Figure: observations $y$.]
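A sketch of the finite-difference analysis operator $D^* x = (\partial_1 x, \partial_2 x)$ applied to an $n\times n$ image; boundary handling is a modeling choice, here zero-padded forward differences:

```python
import numpy as np

def grad_2d(x):
    """Forward finite differences: D^* x = (D_1^* x, D_2^* x)."""
    d1 = np.zeros_like(x); d1[1:, :] = x[1:, :] - x[:-1, :]   # x_{i,j} - x_{i-1,j}
    d2 = np.zeros_like(x); d2[:, 1:] = x[:, 1:] - x[:, :-1]   # x_{i,j} - x_{i,j-1}
    return np.stack([d1, d2], axis=-1)                        # shape (n, n, 2)

def anisotropic_tv(x):
    # Anisotropic TV = l1 norm of the stacked directional derivatives.
    return np.abs(grad_2d(x)).sum()
```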
[Figure: quadratic loss as a function of $\lambda$ with the optimal $\lambda^\star$ marked, and the corresponding reconstruction $x_{\lambda^\star(y)}$.]
Overview
• Synthesis vs. Analysis Regularization
• Local Behavior of Sparse Regularization
• Robustness to Noise
• Numerical Illustrations
Identifiability Criterion
Identifiability criterion of a sign (we suppose (HJ) holds):
IC(s) = min_{u ∈ Ker D_J} ||Ω s_I − u||_∞   (convex → computable)
where Ω = D_J^+ (Id − Φ^*Φ A^[J]) D_I.
Example: Φ = Id (denoising), discrete 1-D derivative D^* x = (x_i − x_{i−1})_i.
Here s_I = sign(D_I^* x), α_J = Ω s_I and IC(s) = ||α_J||_∞.
[Figure: a piecewise constant signal x together with the dual vector α_J, which stays strictly between −1 and +1, so that IC(sign(D^* x)) < 1.]
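Since the criterion minimizes a convex function over a linear space, it reduces to a Chebyshev (ℓ∞) approximation problem, i.e. a linear program. A sketch under simplifying assumptions (dense matrices, generic Φ and D; not the authors' code):

```python
# IC(s) = min_{u in Ker D_J} ||Omega s_I - u||_inf as a linear program in
# (c, t): minimize t subject to |Omega s_I - K c| <= t componentwise, where
# the columns of K span Ker D_J.
import numpy as np
from scipy.linalg import null_space, pinv
from scipy.optimize import linprog

def ic_criterion(Phi, D, s):
    I = np.flatnonzero(s != 0)
    J = np.flatnonzero(s == 0)
    U = null_space(D[:, J].T)                     # basis of G_J = Ker D_J^*
    A_J = U @ np.linalg.inv(U.T @ Phi.T @ Phi @ U) @ U.T
    Omega = pinv(D[:, J]) @ (np.eye(D.shape[0]) - Phi.T @ Phi @ A_J) @ D[:, I]
    v = Omega @ s[I]
    K = null_space(D[:, J])                       # Ker D_J, possibly trivial
    if K.size == 0:
        return np.abs(v).max()                    # then IC(s) = ||Omega s_I||_inf
    m, r = K.shape
    c_obj = np.zeros(r + 1); c_obj[-1] = 1.0      # minimize t
    A_ub = np.block([[-K, -np.ones((m, 1))],      # -(K c) - t <= -v
                     [ K, -np.ones((m, 1))]])     #  (K c) - t <=  v
    res = linprog(c_obj, A_ub=A_ub, b_ub=np.concatenate([-v, v]),
                  bounds=[(None, None)] * r + [(0, None)])
    return res.fun
```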
Robustness to Small Noise
Recall IC(s) = min_{u ∈ Ker D_J} ||Ω s_I − u||_∞ with Ω = D_J^+ (Id − Φ^*Φ A^[J]) D_I.
Theorem: If IC(sign(D^* x_0)) < 1 and, with T = min_{i ∈ I} |(D^* x_0)_i|,
||w||/T is small enough and λ ∼ ||w||, then
x_0 + A^[J] Φ^* w − λ A^[J] D_I s_I
is the unique solution of P_λ(y).
Theorem: If IC(sign(D^* x_0)) < 1, then x_0 is the unique solution of P_0(Φ x_0).
→ When D = Id, results of J.J. Fuchs.
Robustness to Bounded Noise
Robustness criterion:
RC(I) = max_{||p_I||_∞ ≤ 1} min_{u ∈ ker(D_J)} ||Ω p_I − u||_∞ = max_{||p_I||_∞ ≤ 1} IC(p)
Constants:
c_J = ||D_J^+ Φ^* (Φ A^[J] Φ^* − Id)||_{2,∞}
C_J = ||A^[J]||_{2,2} ( ρ ||Φ||_{2,2} + ρ c_J ||D_I||_{2,∞} / (1 − RC(I)) )
Theorem: If RC(I) < 1 for I = Supp(D^* x_0), setting
λ = ρ ||w||_2 c_J / (1 − RC(I))   with ρ > 1,
then P_λ(y) has a unique solution x^⋆ ∈ G_J and
||x_0 − x^⋆||_2 ≤ C_J ||w||_2.
→ When D = Id, results of Tropp (ERC).
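Since RC(I) maximizes a convex function of p_I over the unit ∞-ball, the maximum is attained at a vertex p_I ∈ {−1, +1}^|I|. A brute-force sketch (an illustration only feasible for small |I|; dense matrices are assumptions):

```python
# RC(I) = max over sign vertices p_I of the inner LP, which is IC(p).
import itertools
import numpy as np
from scipy.linalg import null_space, pinv
from scipy.optimize import linprog

def inner_ic(Omega, p_I, K):
    v = Omega @ p_I
    if K.size == 0:
        return np.abs(v).max()
    m, r = K.shape
    c = np.zeros(r + 1); c[-1] = 1.0
    A_ub = np.block([[-K, -np.ones((m, 1))], [K, -np.ones((m, 1))]])
    res = linprog(c, A_ub=A_ub, b_ub=np.concatenate([-v, v]),
                  bounds=[(None, None)] * r + [(0, None)])
    return res.fun

def rc_criterion(Phi, D, I, J):
    U = null_space(D[:, J].T)
    A_J = U @ np.linalg.inv(U.T @ Phi.T @ Phi @ U) @ U.T
    Omega = pinv(D[:, J]) @ (np.eye(D.shape[0]) - Phi.T @ Phi @ A_J) @ D[:, I]
    K = null_space(D[:, J])
    return max(inner_ic(Omega, np.array(p), K)
               for p in itertools.product([-1.0, 1.0], repeat=len(I)))
```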
Source Condition
Noiseless CNS:
x_0 ∈ argmin_{y = Φx} ||D^* x||_1  ⟺  ∃ α ∈ ∂||D^* x_0||_1, Dα ∈ Im(Φ^*)   (SC_α)
[Figure: the ball ||D^* x||_1 ≤ c touching the affine space y = Φx at x_0, with Dα a normal direction.]
Theorem [Grasmair, Inverse Prob., 2011]: If (HJ), (SC_α) and ||α_J||_∞ < 1, then
||D^*(x^⋆ − x_0)|| = O(||w||).
Proposition: Let s = sign(D^* x_0) and set
α_J = Ω s_I − u,   α_I = s_I,
where u achieves the minimum in IC(s) = min_{u ∈ Ker D_J} ||Ω s_I − u||_∞.
Then IC(s) < 1 ⟹ (SC_α) holds, with ||α_J||_∞ = IC(s).
Overview
• Synthesis vs. Analysis Regularization
• Local Behavior of Sparse Regularization
• Robustness to Noise
• Numerical Illustrations
Example: Total Variation in 1-D
Discrete 1-D derivative: D^* x = (x_i − x_{i−1})_i.
Signals with k − 1 steps: {G_J : dim G_J = k}.
Denoising: Φ = Id.
Small noise robustness: α_I = sign(D_I^* x), α_J = Ω sign(D_I^* x), IC(s) = ||α_J||_∞.
[Figure: two piecewise constant signals x with their dual vectors α_I and α_J; for the first, α_J stays strictly inside (−1, +1) and IC(sign(D^* x)) < 1; for the second, α_J touches ±1 and IC(sign(D^* x)) = 1.]
Case IC = 1: support recovery.
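For 1-D TV denoising the computation collapses: the columns of D_J are linearly independent difference atoms, so Ker D_J = {0} and IC(s) = ||Ω s_I||_∞ = ||α_J||_∞ directly. A self-contained sketch (illustrative, not the authors' code) on a boxcar signal:

```python
# IC for 1-D TV denoising (Phi = Id): Ker D_J = {0}, so IC(s) = ||Omega s_I||_inf.
import numpy as np
from scipy.linalg import null_space, pinv

N = 32
Dt = np.diff(np.eye(N), axis=0)       # D^*: (N-1) x N finite differences
D = Dt.T

x = np.zeros(N); x[10:20] = 1.0       # boxcar: one jump up, one jump down
s = np.sign(Dt @ x)
I = np.flatnonzero(s != 0)            # D-support (the two jump locations)
J = np.flatnonzero(s == 0)

U = null_space(D[:, J].T)             # orthonormal basis of G_J = Ker D_J^*
A_J = U @ U.T                         # for Phi = Id, A^[J] projects onto G_J
Omega = pinv(D[:, J]) @ (np.eye(N) - A_J) @ D[:, I]
alpha_J = Omega @ s[I]
print("IC =", np.abs(alpha_J).max())  # < 1 here: the jump signs alternate
```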
Staircasing Example
[Figure: observations y = x_0 + w (samples y_i as a function of i) for two noise realizations, together with the corresponding reconstructions x_λ(y).]
Case IC = 1: support recovery for one noise realization, no support recovery for the other.
Example: Total Variation in 2-D
Directional derivatives:
D_1^* x = (x_{i,j} − x_{i−1,j})_{i,j} ∈ R^N
D_2^* x = (x_{i,j} − x_{i,j−1})_{i,j} ∈ R^N
Gradient: D^* x = (D_1^* x, D_2^* x) ∈ R^{N×2}.
Dual vector: α_I = sign(D_I^* x), α_J = Ω sign(D_I^* x).
[Figure: two images x with the corresponding dual vectors α_I; one satisfies IC(s) < 1, the other IC(s) > 1.]
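A small sketch of the 2-D analysis operator, stacking the two directional finite differences (the zero rows and columns at the boundary are an illustrative convention, not prescribed by the slides):

```python
# D^* x = (D_1^* x, D_2^* x) for 2-D anisotropic TV.
import numpy as np

def grad2d(X):
    G1 = np.zeros_like(X); G1[1:, :] = X[1:, :] - X[:-1, :]   # (x_{i,j} - x_{i-1,j})
    G2 = np.zeros_like(X); G2[:, 1:] = X[:, 1:] - X[:, :-1]   # (x_{i,j} - x_{i,j-1})
    return np.stack([G1, G2], axis=-1)                        # element of R^{N x 2}

X = np.zeros((8, 8)); X[2:6, 3:7] = 1.0
print(np.abs(grad2d(X)).sum())        # anisotropic TV of a square indicator
```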
Example: Fused Lasso
Total variation and ℓ1 hybrid: D^* x = (x_i − x_{i−1})_i ⊕ (ε x_i)_i.
Signals x_0: sums of k interval indicators, {G_J : dim G_J = k}.
Compressed sensing: Gaussian Φ ∈ R^{Q×N}.
[Figure: probability P(η, ε, Q/N) of the event IC < 1, displayed as a function of the interval width η, the weight ε and the undersampling Q/N.]
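The fused-lasso dictionary just stacks finite differences on top of a scaled identity, so that ||D^* x||_1 = TV(x) + ε||x||_1. A tiny illustrative sketch:

```python
# Fused-lasso analysis dictionary: D^* x = (x_i - x_{i-1})_i  ⊕  (eps x_i)_i.
import numpy as np

def fused_lasso_analysis(N, eps):
    Dt_tv = np.diff(np.eye(N), axis=0)            # (N-1) x N finite differences
    return np.vstack([Dt_tv, eps * np.eye(N)])    # D^*, shape (2N-1) x N

x = np.zeros(20); x[5:12] = 1.0
Dt = fused_lasso_analysis(20, eps=0.5)
print(np.abs(Dt @ x).sum())           # = TV(x) + 0.5 ||x||_1 = 2 + 3.5
```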
Example: Invariant Haar Analysis
Haar wavelets:
ψ^(j)_i = (1 / 2^{τ(j+1)}) × { +1 if 0 ≤ i < 2^j ;  −1 if −2^j ≤ i < 0 ;  0 otherwise }.
Haar TI analysis:
||D^* x||_1 = Σ_j ||x ⋆ ψ^(j)||_1 = Σ_j ||x ⋆ φ^(j)||_TV
[Figure: the filters ψ^(j) and φ^(j), supported on 2^{j+1} samples.]
De-blurring: Φx = φ ⋆ x with φ(t) ∝ e^{−t²/(2σ²)}.
[Figure: IC as a function of the blur size σ.]
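A sketch of the shift-invariant Haar analysis coefficients via circular convolutions. The circular wrap-around and the filter alignment below are illustrative assumptions: they shift the coefficients within each scale but leave the ℓ1 value of each scale unchanged.

```python
# D_H^* x = (psi^(j) * x)_{0 <= j <= Jmax}, psi^(j) a Haar filter of support
# 2^{j+1} normalized by 2^{tau (j+1)}; circular convolution via FFT.
import numpy as np

def haar_ti_analysis(x, Jmax, tau=1.0):
    N = len(x)
    coeffs = []
    for j in range(Jmax + 1):
        h = 2 ** j
        psi = np.zeros(N)
        psi[:h], psi[h:2 * h] = 1.0, -1.0          # +1 then -1 blocks (shifted)
        psi /= 2 ** (tau * (j + 1))
        coeffs.append(np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(psi))))
    return np.stack(coeffs)

x = np.zeros(256); x[96:160] = 1.0                 # boxcar test signal
print(np.abs(haar_ti_analysis(x, Jmax=4)).sum())   # ||D_H^* x||_1
```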
For N a multiple of 4, we split {1, . . . , N} into 4 sets l_k = {(k−1)M + 1, . . . , kM} of cardinality M = N/4. Let 1_{l_k} be the boxcar signal whose support is l_k. Consider the staircase signal x_0 = −1_{l_1} + 1_{l_4} degraded by a deterministic noise w of the form w = ε(1_{l_3} − 1_{l_2}), where ε ∈ R. The observation vector y = x_0 + w reads

y = −1_{l_1} − ε 1_{l_2} + ε 1_{l_3} + 1_{l_4}.

Suppose that ε > 0; then the solution x^⋆_λ of (P_λ(y)) is

x^⋆_λ = (−1 + λ/M) 1_{l_1} − ε 1_{l_2} + ε 1_{l_3} + (1 − λ/M) 1_{l_4},

if 0 ≤ λ ≤ λ_1 = M(1 − ε), and

x^⋆_λ = (−ε + (λ − λ_1)/(2M)) (1_{l_1} + 1_{l_2}) + (ε − (λ − λ_1)/(2M)) (1_{l_3} + 1_{l_4}),

if λ_1 ≤ λ ≤ λ_2 = λ_1 + 2εM, and 0 if λ ≥ λ_2. Similarly, if ε < 0, the solution x^⋆_λ reads

x^⋆_λ = (−1 + λ/M) 1_{l_1} − (ε + 2λ/M) (1_{l_2} − 1_{l_3}) + (1 − λ/M) 1_{l_4},

if 0 ≤ λ ≤ λ̄_1 = −εM/2, and

x^⋆_λ = (−1 + λ/M) 1_{l_1} + (1 − λ/M) 1_{l_4},

if λ̄_1 ≤ λ ≤ λ̄_2 = M, and 0 if λ ≥ λ̄_2. Figure 2 displays plots of the coordinates' path for both cases.

[Figure 2 here.]

Fig. 2: Top row: Signals y for ε < 0 (left) and ε > 0 (right). Bottom row: Corresponding coordinates' path of x^⋆_λ as a function of λ. The solid lines correspond to the coordinates in l_1 and l_4, and the dashed ones to the coordinates in l_2 and l_3.

It is worth pointing out that when ε > 0, the D-support of x^⋆_λ is always different from that of x_0 whatever the choice of λ, whereas in the case ε < 0, for any λ̄_1 ≤ λ ≤ λ̄_2, the D-support of x^⋆_λ and the sign of D^* x^⋆_λ are exactly those of x_0.
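The ε > 0 branch of the closed form is easy to sanity-check numerically. A sketch, assuming the cvxpy package is available; the parameter values below are illustrative choices.

```python
# Check the closed form for eps > 0 and 0 <= lam <= lambda_1 = M(1 - eps)
# against a generic solver for (P_lambda(y)) with Phi = Id and D^* = diff.
import numpy as np
import cvxpy as cp

N, M = 64, 16
eps, lam = 0.3, 5.0                                # lam < lambda_1 = 11.2
blocks = [np.arange(k * M, (k + 1) * M) for k in range(4)]
y = np.zeros(N)
for k, v in enumerate([-1.0, -eps, eps, 1.0]):
    y[blocks[k]] = v

x = cp.Variable(N)
cp.Problem(cp.Minimize(0.5 * cp.sum_squares(y - x)
                       + lam * cp.norm1(cp.diff(x)))).solve()

x_closed = np.zeros(N)
for k, v in enumerate([-1 + lam / M, -eps, eps, 1 - lam / M]):
    x_closed[blocks[k]] = v
print(np.max(np.abs(x.value - x_closed)))          # ~ 0 up to solver tolerance
```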
D. Shift-Invariant Haar Deconvolution
Sparse analysis regularization using a 1-D shift-invariant Haar dictionary is efficient to recover piecewise constant signals. This dictionary is built using a set of scaled and dilated Haar filters

ψ^(j)_i = (1 / 2^{τ(j+1)}) × { +1 if 0 ≤ i < 2^j ;  −1 if −2^j ≤ i < 0 ;  0 otherwise },

where τ > 0 is a normalization exponent. For τ = 1, the dictionary is said to be unit-normed. For τ = 1/2, it corresponds to a Parseval tight frame. The action on a signal x of the analysis operator corresponding to the translation-invariant Haar dictionary D_H is

D_H^* x = ( ψ^(j) ⋆ x )_{0 ≤ j ≤ J_max},

where ⋆ stands for the discrete convolution (with appropriate boundary conditions) and J_max < log_2(N), where N is the size of the signal. The analysis regularization ||D_H^* x||_1 can also be written as the sum over scales of the TV semi-norms of filtered versions of the signal. As such, it can be understood as a sort of multiscale total variation regularization. Apart from a multiplicative factor, one recovers the total variation when J_max = 0.

We consider a noiseless convolution setting (for N = 256) where Φ is a circular convolution operator with a Gaussian kernel of standard deviation σ. We first study the impact of σ on the identifiability criterion IC. The blurred signal x_η is a centered boxcar signal with a support of size 2ηN:

x_η = 1_{{⌊N/2 − ηN⌋, . . . , ⌊N/2 + ηN⌋}},   η ∈ (0, 1/2].

Figure 3 displays the evolution of IC(sign(D_H^* x_0)) as a function of σ for three dictionaries (τ = 1, τ = 0.5 and the total variation), where we fixed η = 0.2.

[Figure 3 here.]

Fig. 3: Behaviour of IC for a noiseless deconvolution scenario with a Gaussian blur and ℓ1-analysis sparsity regularization in a shift-invariant Haar dictionary with J_max = 4. IC is plotted as a function of the Gaussian blurring kernel size σ ∈ [0.5, 3.0] for the total variation dictionary and the Haar wavelet dictionary with two normalization exponents τ. Dash-dotted line: τ = 1 (unit-normed). Dashed line: τ = 1/2 (tight frame). Solid line: total variation.

In the identifiability regime, IC(sign(D_H^* x_0)) appears smaller in the case of the unit-normed normalization. However, one should avoid to
Conclusion
SURE risk estimation:
Sensitivity analysis ⟺ computing risk estimates.
Analysis vs. synthesis regularization:
Analysis support is less stable.
Stability without support stability.
Open problem:
Fast algorithms to optimize λ.