
Institute for Computational Mathematics Hong Kong Baptist University

ICM Research Report 08-15


On the total variation dictionary model

Tieyong Zeng*

December 3, 2008

Abstract

The goal of this paper is to investigate the impact of the choice of dictionary in a total variation dictionary model. After a theoretical analysis and a discussion of numerical aspects, we present experiments in which the dictionary contains the curvatures of known forms (letters). The data-fidelity term of this model allows all structures to appear in the residue except the forms used to build the dictionary. Therefore, these forms will remain in the result image while the other structures will disappear. Our experiments are carried out on source separation problems and confirm this impression. The starting image contains letters (known) on a very structured background (an image). We show that it is possible, with this model, to obtain a reasonable separation of these structures. Finally, this work illustrates clearly that the dictionary must sparsely represent the curvatures of the elements which we seek to preserve, and not the elements themselves, as one might naively think.

Index Terms

Curvature, dictionary, source separation, sparse representation, total variation.

EDICS Category: SMR-REP, TEC-RST

*Department of Mathematics, Hong Kong Baptist University, Kowloon Tong, Hong Kong. E-mail: [email protected]; tel: (+852) 3411 2531; fax: (+852) 3411 5811.


I. INTRODUCTION

The task of image denoising is to recover an ideal image u ∈ L2(Ω) from a noisy observation:

v = u + b,

where Ω ⊂ R² is the rectangle on which the image is defined, v ∈ L2(Ω) is the noisy image, and b ∈ L2(Ω) is Gaussian noise of standard deviation σ.

In the past decades, a variety of denoising methods have been developed for this task, among which two approaches, total variation, initiated in [1], and wavelet thresholding, originally introduced in [2], have drawn great attention. Notably, the hybrid approach proposed in [3] may take the form of the following optimization model:

(P∗):  min TV(w)  subject to 〈w − v, ψ〉 ≤ τ, ∀ψ ∈ D,

for a finite dictionary D ⊂ L2(Ω), which is often symmetric, and a positive parameter τ associated with the noise level. Usually, the total variation of an image w ∈ L2(Ω) is defined as:

TV(w) = ∫_Ω |∇w|,  (1)

where the gradient is taken in the sense of distributions. Note that, as pointed out in [3], when the dictionary D is the union of all unit-norm vectors of L2(Ω), the model (P∗) reduces to the Rudin-Osher-Fatemi (ROF) model.

For a fixed small positive ε, applying a steepest descent algorithm to the penalized energy:

∫_Ω |∇w| + (1/ε) Σ_{ψ∈D} ( sup(〈w − v, ψ〉 − τ, 0) )²,  (2)

Malgouyres [3] showed that when the dictionary D contains wavelet/wavelet packet bases and their opposites, the model (P∗) preserves texture better than the ROF model. The theoretical aspects of the penalty method (2) for a symmetric dictionary D are reported in [4].

In [5], the nonlinear programming task (P∗) was solved exactly via a dual Uzawa method, and the authors reaffirmed that this model allows very good structure-preserving reconstructions. However, their experiments were still limited to dictionaries of wavelet/wavelet packet bases and their opposites. As a result, the role of the dictionary D in (P∗) remained obscure, though they had already realized the importance of this role and formulated it as:

Open Problem 1: Given a class of images, if one aims at obtaining optimal results with (P∗), how should the dictionary D be designed?


Note that in this paper, the vague notion of optimality is measured by the peak signal-to-noise ratio (PSNR).

Inspired by this open problem, the authors of [6] investigated twelve Gabor dictionaries for (P∗). Their experiments demonstrated clearly that the choice of dictionary deeply impacts the performance of the model. As Gabor filters are closely related to the representation of textures, their results are rather good for the preservation of textures. However, their answer to the open problem was still somewhat vague.

The goal of this paper, which can be regarded as an extension of [7], is to provide a solid investigation of the open problem. The remainder of the paper is organized as follows. Section II collects some preliminaries from convex analysis. In Section III, a theoretical analysis is conducted to characterize the representation of the curvature of the solution of (P∗) over the dictionary D. This establishes a solid bridge between the total variation dictionary model and the minimization of the l1-norm. In Section IV, some numerical aspects are presented, and in Section V, we report experiments with known features for two typical source separation examples: image decomposition and denoising. Finally, in Section VI, we give some discussion and conclude that the dictionary must sparsely represent the curvatures of the elements which we seek to preserve.

II. PRELIMINARIES

This section is devoted to presenting some classical results of convex analysis. Details and further results can be found in [8], [9].

A. Notations

Let H be a real Hilbert space; we denote by 〈·, ·〉 its scalar product and by ‖·‖₂ the associated norm.

Let f : H → [−∞, ∞] be a function. The domain and the epigraph of f are dom f = {x ∈ H : f(x) < +∞} and epi f = {(x, η) ∈ H × R : f(x) ≤ η}, respectively; f is lower semicontinuous if epi f is closed in H × R, and convex if epi f is convex in H × R. Denote by Γ₀(H) the class of all lower semicontinuous convex functions from H to (−∞, +∞] that are not identically +∞.

B. Elements of convex analysis

Let f ∈ Γ₀(H). The conjugate of f is the function f∗ defined by

∀u ∈ H, f∗(u) = sup_{x∈H} ( 〈x, u〉 − f(x) ).  (3)

Moreover, we have f∗∗ = f. For instance, the conjugate of the indicator function of a nonempty closed convex set C, i.e.,

ι_C : x ↦ 0 if x ∈ C, and +∞ if x ∉ C,


is the support function of C, i.e.,

ι∗_C = σ_C : u ↦ sup_{x∈C} 〈x, u〉.

In consequence,

σ∗_C = ι∗∗_C = ι_C.
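As a worked example (a standard computation, included here because it is used again in Section III), the function f = ½‖·‖₂² is its own conjugate:

```latex
f^*(u) = \sup_{x \in \mathcal{H}} \Big( \langle x, u \rangle - \tfrac{1}{2}\|x\|_2^2 \Big)
       = \langle u, u \rangle - \tfrac{1}{2}\|u\|_2^2
       = \tfrac{1}{2}\|u\|_2^2 ,
```

since the concave function being maximized has gradient u − x, which vanishes at x = u. Hence f∗ = f.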

If f is even, i.e., f(−x) = f(x) for all x ∈ H, then readily, for any u ∈ H, we have:

f∗(−u) = sup_{x∈H} ( 〈−x, u〉 − f(x) ) = sup_{x∈H} ( 〈x, u〉 − f(−x) ) = sup_{x∈H} ( 〈x, u〉 − f(x) ) = f∗(u),  (4)

i.e., f∗ is also even.

The subdifferential of f is the set-valued operator ∂f : H → 2^H whose value at x ∈ H is given by

∂f(x) = {u ∈ H : ∀y ∈ H, 〈y − x, u〉 + f(x) ≤ f(y)}

or, equivalently,

∂f(x) = {u ∈ H : f(x) + f∗(u) = 〈x, u〉}.  (5)

Accordingly, one has Fermat's rule:

∀x ∈ H, f(x) = inf f(H) ⇔ 0 ∈ ∂f(x).

Proposition 2: If f is even, then for any x ∈ H,

∂f(−x) = −∂f(x).  (6)

Proof: Indeed, by Eq. (5) and the evenness of f and f∗ (Eq. (4)), we have:

∂f(−x) = {u ∈ H : f(−x) + f∗(u) = 〈−x, u〉}
       = {u ∈ H : f(x) + f∗(−u) = 〈x, −u〉}
       = −{w ∈ H : f(x) + f∗(w) = 〈x, w〉}
       = −∂f(x).

Moreover, if f is (Gâteaux) differentiable at x with gradient ∇f(x), then ∂f(x) = {∇f(x)}. We also need the following proposition (see Thm. 23.5 of [8], or Prop. 6.1.2 of [9]).

Proposition 3: Let f ∈ Γ₀(H); then for all s, x ∈ H, we have:

s ∈ ∂f(x) ⇔ x ∈ ∂f∗(s).  (7)


C. Bregman distance

The application of Bregman distance [10] in the context of image processing is rather active recently.

In [11], Osher and al. proposed a Bregman distance based iterative regularization approach for the ROF

model; it was then extended to wavelet-based denoising [12], nonlinear inverse scale space in [13],

compressed sensing [14].

Let f : H 7→ [−∞, +∞] be a convex function. The Bregman distance associated with f for points

p, q ∈ H is:

Bf (p, q) = f(p)− f(q)− 〈∂f(q), p− q〉,

where ∂f(q) is a subgradient of f at the point q. Intuitively this can be regarded as the difference between

the value of f at point p and the value of the first-order Taylor expansion of f around point q evaluated at

point p. Because Bf (p, q) 6= Bf (q, p) in general, Bf (p, q) is not a distance in the usual sense. However,

it measures the closeness between p and q in the sense that Bf (p, q) ≥ 0 and Bf (p, q) ≥ Bf (r, q) for

all points r on the line segment connecting p and q.
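As an illustration (again a standard computation, included because it reappears in Section III), take f = ½‖·‖₂², so that ∂f(q) = {q}; the Bregman distance then reduces to half the squared Euclidean distance:

```latex
B_f(p, q) = \tfrac{1}{2}\|p\|_2^2 - \tfrac{1}{2}\|q\|_2^2 - \langle q, p - q \rangle
          = \tfrac{1}{2}\|p\|_2^2 + \tfrac{1}{2}\|q\|_2^2 - \langle q, p \rangle
          = \tfrac{1}{2}\|p - q\|_2^2 .
```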

Proposition 4: If f is even, then for any p, q ∈ H,

B_f(−p, −q) = B_f(p, q).

Proof: Indeed, by definition and Eq. (6) (so that ∂f(−q) = −∂f(q)), we have:

B_f(−p, −q) = f(−p) − f(−q) − 〈∂f(−q), −p + q〉
            = f(p) − f(q) − 〈−∂f(q), q − p〉
            = f(p) − f(q) − 〈∂f(q), p − q〉
            = B_f(p, q).

D. Strong duality theorem

We only present the strong duality theorem here, since it is crucial for our upcoming theoretical analysis. For details of convex programming, we refer to [8].

Theorem 5 (Strong duality theorem): Given an optimization problem with convex domain Ω ⊂ Rⁿ,

min f(λ), λ ∈ Ω,
subject to g_i(λ) ≤ 0, i = 1, …, k,
g_i(λ) = 0, i = k + 1, …, m,

where f is convex and all the g_i are affine functions, i.e., g_i(λ) = (Aλ − c)_i for some matrix A and vector c, the duality gap is zero.

This theorem can be found in the book [15] (Theorem 5.20); another version is Corollary 28.2.2 of [8].

Fig. 1. Left: curvature of the Lena image; right: curvatures of the letters.

III. THEORETICAL ANALYSIS

Suppose w∗ is a solution of (P∗). Using the Kuhn-Tucker theorem (Thm. 28.3, [8]), we know that there exist nonnegative Lagrangian parameters (λ∗_ψ)_{ψ∈D} such that:

∇TV(w∗) + Σ_{ψ∈D} λ∗_ψ ψ = 0.  (8)

From Eq. (1), it is easy to verify that for every w ∈ L2(Ω), we have:

∇TV(w) = −∇·( ∇w / |∇w| ).  (9)

Therefore, by Eqs. (8) and (9), we get:

Σ_{ψ∈D} λ∗_ψ ψ = ∇·( ∇w∗ / |∇w∗| ).  (10)

The right-hand side of Eq. (10) is the curvature of w∗. This shows that the curvature of the solution w∗ of (P∗) is represented positively by the elements of the dictionary D. Moreover, we would like to illustrate that, in a certain sense, this representation should be sparse. For this, we turn to the dual form of the optimization model (P∗). Our consideration is inspired by [16], [17], where the dual form of the ROF model was investigated.


A. Dual form of (P∗)

Indeed, we consider a more general problem:

(GP∗):  min J(w)  subject to 〈w − v, ψ〉 ≤ τ, ∀ψ ∈ D,

where J ∈ Γ₀(H) is a convex function.

Theorem 6: The dual problem of (GP∗) is:

min_{(λ_ψ)_{ψ∈D} ≥ 0}  B_{J∗}( −Σ_{ψ∈D} λ_ψ ψ, q ) + τ Σ_{ψ∈D} |λ_ψ|,  (11)

where q ∈ ∂J(v) and B_{J∗} is the Bregman distance associated with the conjugate function J∗.

Proof: By duality theory (see [8]), the dual form of (GP∗) is

M ≜ min_w max_{λ_ψ ≥ 0}  J(w) + Σ_{ψ∈D} λ_ψ ( 〈ψ, w − v〉 − τ ).

Using the strong duality theorem (Theorem 5, or Corollary 28.2.2 of [8]), we know that the duality gap for the linearly constrained convex problem (GP∗) is zero, so we can exchange the min-max for a max-min. Let us denote:

Let’s denote:

λ = (λψ)ψ∈D, Ψ = (ψ)ψ∈D,

and

λ ·Ψ =∑

ψ∈Dλψψ, |λ|1 =

ψ∈D|λψ|.

With a straightforward calculation, we obtain:

M = max_{λ≥0} min_w { J(w) + 〈λ·Ψ, w − v〉 − τ|λ|₁ }
  = −min_{λ≥0} { max_w { −J(w) − 〈λ·Ψ, w − v〉 } + τ|λ|₁ }
  = −min_{λ≥0} { J∗(−λ·Ψ) + 〈λ·Ψ, v〉 + τ|λ|₁ },

where J∗ is the conjugate function of J defined in Eq. (3). Now take q ∈ ∂J(v); by Eq. (7), this is equivalent to v ∈ ∂J∗(q), so v is a subgradient of J∗ at q. We then have:

M + J∗(q) − 〈q, v〉
  = −min_{λ≥0} { J∗(−λ·Ψ) − J∗(q) − 〈−λ·Ψ − q, v〉 + τ|λ|₁ }
  = −min_{(λ_ψ)_{ψ∈D} ≥ 0} { B_{J∗}( −Σ_{ψ∈D} λ_ψ ψ, q ) + τ Σ_{ψ∈D} |λ_ψ| }.


Therefore, the dual problem of (GP∗) is (11).

In particular, we have:

Theorem 7: The dual problem of (P∗) is:

min_{(λ_ψ)_{ψ∈D} ≥ 0}  B_{TV∗}( Σ_{ψ∈D} λ_ψ ψ, ∇·(∇v / |∇v|) ) + τ Σ_{ψ∈D} |λ_ψ|,  (12)

where B_{TV∗} is the Bregman distance associated with TV∗.

Proof: Taking J = TV in the above theorem, with q = ∇TV(v) = −∇·(∇v / |∇v|) by Eq. (9), we know that the dual form of (P∗) is:

min_{(λ_ψ)_{ψ∈D} ≥ 0}  B_{TV∗}( −Σ_{ψ∈D} λ_ψ ψ, −∇·(∇v / |∇v|) ) + τ Σ_{ψ∈D} |λ_ψ|.  (13)

Noting that TV(w) = TV(−w) for all w, by Eq. (4) we have

∀w, TV∗(−w) = TV∗(w).

By Prop. 4, we then have

B_{TV∗}(−p, −q) = B_{TV∗}(p, q), ∀p, q.

Combining this with Eq. (13), we complete the proof.

The calculation of TV∗ is rather simple. Indeed, if we define the convex set B_G in the G-space of Meyer (see [18]) by:

B_G ≜ { ∇·ψ : ψ = (ψ₁, ψ₂), ψ₁, ψ₂ ∈ C₀^∞(Ω), |ψ|_∞ ≤ 1 },

then we can easily prove that (for details, see [16]):

TV∗(w) = 0 if w ∈ B_G, and +∞ otherwise.

From the above discussion, we know that the vector λ∗ ≜ (λ∗_ψ)_{ψ∈D} of Eq. (10) is a solution of (12), i.e., we are minimizing the l1-norm while keeping the Bregman distance between the synthesis image Σ_{ψ∈D} λ_ψ ψ and the curvature ∇·(∇v/|∇v|) under control. An interesting observation here is that the model (P∗) tries to use the elements of the dictionary, (ψ)_{ψ∈D}, to represent the curvature ∇·(∇v/|∇v|), and not v itself, as one might naively think.

Moreover, if we take J(·) = ½‖·‖₂², we easily have J∗ = J (see the worked example in Section II-B). Therefore, in this case, and after the change of variables ψ ↦ −ψ (recall that D is often symmetric), (11) becomes:

min_{(λ_ψ)_{ψ∈D} ≥ 0}  ½ ‖ Σ_{ψ∈D} λ_ψ ψ − v ‖₂² + τ Σ_{ψ∈D} |λ_ψ|.

This is indeed the Basis Pursuit denoising model (non-negative case), which has been studied extensively (see [19], [20], [21], [22], [14], [23], [24] and references therein). The dual and predual relationship of


2 for (GP∗) is also reported in [25]. Moreover, noting

that the minimization of l1-norm usually leads to the minimization of l0-norm, we know that the vector

(λ∗ψ)ψ∈D of Eq.(10) is sparse. Therefore, by Theorem 6 and theorem 7, we have established a close

bridge between the total variation dictionary model with the study of compressed sensing or precisely,

the minimization of l1-norm. The only difference is that the usual l2-norm term to measure the error

is replaced by Bregman distance on curvatures, thus by this we enriches the research of compressed

sensing.
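For concreteness, the following is a minimal numpy sketch of a projected ISTA (proximal gradient) iteration for this non-negative Basis Pursuit denoising problem. The choice of solver is ours, purely for illustration, and is not the algorithm of the references above; the matrix `D`, whose columns are the atoms ψ, and the iteration count are assumptions of the sketch.

```python
import numpy as np

def nonneg_bpdn(D, v, tau, n_iter=500):
    """Minimize 0.5*||D @ lam - v||_2^2 + tau*sum(lam) over lam >= 0
    by projected ISTA (a proximal-gradient sketch, not the cited methods)."""
    # Step size from the Lipschitz constant ||D||_2^2 of the gradient.
    t = 1.0 / np.linalg.norm(D, 2) ** 2
    lam = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ lam - v)                    # gradient of the quadratic term
        # Prox of t*tau*||.||_1 plus the indicator of the nonnegative orthant:
        lam = np.maximum(lam - t * grad - t * tau, 0.0)
    return lam
```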

B. Ad-hoc dictionary

Recall that if the dictionary D contains all unit-norm vectors of L2(Ω), the model (P∗) reduces to the ROF model. Various experiments have already suggested that the ROF model is not as good as (P∗) with wavelet/wavelet packet bases (and their opposites) or Gabor dictionaries (see [3], [5], [6]). Therefore, the construction of the dictionary D is not simply the union of all possible atoms. Actually, when D is of large size, we cannot neglect the interactions among the elements of D.

Note that the solution of (P∗) only involves the active constraints (where λ∗_ψ > 0 and 〈w∗ − v, ψ〉 = τ). If the vector (λ∗_ψ)_{ψ∈D} is sparse, this will reduce the possibility of interactions among the atoms. Evidently, the non-trivial sparsest case is when the dictionary D contains only one element. By Eq. (10), neglecting a normalization constant, we should take this element to be:

ψ = ∇·( ∇u / |∇u| ),

if one aims at recovering the ideal image u. We refer to this as the ad-hoc dictionary. In the left image of Fig. 1, we show the curvature of the Lena image.
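As an illustration, here is a minimal numpy sketch of the discrete curvature ∇·(∇u/|∇u|) with periodic boundary conditions (matching the discretization of Section IV); the smoothing constant `beta`, which avoids division by zero, is an assumption of the sketch.

```python
import numpy as np

def curvature(u, beta=1e-3):
    """Discrete curvature div(grad(u)/|grad(u)|), periodic boundaries."""
    # Forward differences (periodic), as in the discrete TV of Section IV.
    ux = np.roll(u, -1, axis=1) - u
    uy = np.roll(u, -1, axis=0) - u
    norm = np.sqrt(ux**2 + uy**2 + beta)   # beta regularizes |grad u|
    px, py = ux / norm, uy / norm
    # Divergence = negative adjoint of the forward gradient: backward differences.
    return (px - np.roll(px, 1, axis=1)) + (py - np.roll(py, 1, axis=0))
```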

Now, we add Gaussian noise of standard deviation 20 to the Lena image. Fig. 2 shows the performance of (P∗) with the ad-hoc dictionary and of the ROF model, where the parameters of both models are tuned for the best performance. From this figure, we clearly see that the model (P∗) with this dictionary almost perfectly recovers the ideal image. Not only is the PSNR very high, but the visual quality is also much better than that of the ROF model. The residue image is nearly Gaussian noise, which is an important indicator of the performance of the restoration (see [26]).
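For completeness, a minimal sketch of the noise simulation and of the PSNR computation used to produce such figures, assuming 8-bit images with peak value 255 (the clean image `u` is a placeholder):

```python
import numpy as np

def psnr(u, w, peak=255.0):
    """Peak signal-to-noise ratio between a clean image u and an estimate w."""
    mse = np.mean((np.asarray(u, float) - np.asarray(w, float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(0)
# u = ...                                  # clean image as a float array in [0, 255]
# v = u + rng.normal(0.0, 20.0, u.shape)   # Gaussian noise, standard deviation 20
```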

Fig. 2. Denoising by (P∗) with the ad-hoc dictionary and by ROF. Top: clean image, noisy image (σ = 20, PSNR = 22.11); middle: result of ROF (PSNR = 27.66), result of (P∗) with the ad-hoc dictionary (PSNR = 34.93); bottom: residues of ROF and (P∗).

Interestingly, the success of the ad-hoc dictionary, the curvature of the ideal image, can be explained via the idea of anisotropic diffusion [27]. Indeed, since most existing range image-based edge detection algorithms base their detection criterion on depth or curvature changes [28], curvature itself can be considered a simple edge detector. This phenomenon can also be observed directly in the left image of Fig. 1, where the curvature of the ideal Lena image looks very much like an edge map. In consequence, by taking the curvature as the dictionary, (P∗) can be regarded as a total variation diffusion (see [29], [30]) that keeps the edge pixels, the most important positions for the visual quality of the denoised image. Therefore, it is no wonder that the performance of (P∗) with the curvature dictionary is especially good.

Overall, this section illustrates that, theoretically, we should choose a dictionary D which gives a sparse representation of the curvature of the underlying ideal image. This observation will be re-confirmed by the experimental results in the upcoming section, after the presentation of numerical aspects.


IV. NUMERICAL ASPECTS

The discrete version of the total variation of an image u ∈ R^{N²} can be defined as:

TV(u) = Σ_{i,j=0}^{N−1} √( (u_{i+1,j} − u_{i,j})² + (u_{i,j+1} − u_{i,j})² ),

where we let u_{i,N} = u_{i,0} and u_{N,j} = u_{0,j}. Note that, in practice, it is now a common technique to regularize TV(u) by the smoothed version (see [31]):

TV_β(u) = Σ_{i,j=0}^{N−1} √( (u_{i+1,j} − u_{i,j})² + (u_{i,j+1} − u_{i,j})² + β ),

where β is a small positive number (say, β = 0.001).
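Continuing the numpy sketches above, TV_β and its gradient can be written as follows; by Eq. (9), the gradient is simply minus the (β-smoothed) curvature, so we can reuse the `curvature` routine sketched in Section III.

```python
import numpy as np

def tv_beta(u, beta=1e-3):
    """Smoothed discrete total variation with periodic boundaries."""
    ux = np.roll(u, -1, axis=1) - u    # u_{i,j+1} - u_{i,j}
    uy = np.roll(u, -1, axis=0) - u    # u_{i+1,j} - u_{i,j}
    return np.sum(np.sqrt(ux**2 + uy**2 + beta))

def grad_tv_beta(u, beta=1e-3):
    """Gradient of tv_beta; by Eq. (9) it is minus the smoothed curvature."""
    return -curvature(u, beta)
```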

In order to evaluate our theoretical analysis, we will present some numerical experiments on (P∗) with a translation-invariant dictionary, which is built from a finite set

F₀ = { Ψᵏ }_{1≤k≤r} ⊂ R^{N²}.

In the remainder of the paper, we refer to these elements as "features".

For any k ∈ {1, …, r} and any indices (i, j) ∈ {0, …, N − 1}², we denote by

Ψ^{k,i,j}_{m,n} ≜ Ψᵏ_{m−i,n−j},  (14)

where (m, n) ∈ {0, …, N − 1}², the translation of Ψᵏ by (i, j). We then consider the dictionary

D := { Ψ^{k,i,j} : 1 ≤ k ≤ r and 0 ≤ i, j < N }.
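In terms of arrays, Eq. (14) is just a cyclic shift; a one-line numpy sketch, assuming each feature is stored as an N×N array:

```python
import numpy as np

def translate(psi_k, i, j):
    """The atom Psi^{k,i,j} of Eq. (14): a cyclic shift of the feature psi_k."""
    return np.roll(psi_k, shift=(i, j), axis=(0, 1))
```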

In order to solve (P∗), similarly to [6], we use a penalty method. More precisely, we minimize the unconstrained energy

TV(w) + (1/ε) Σ_{ψ∈D} θ_τ( 〈w − v, ψ〉 ),  (15)

with

θ_τ(t) = ( sup(t − τ, 0) )²,

for a small positive number ε (say, ε = 10⁻⁶). Note that the convergence of this penalty approach can readily be established, similarly to [4].

The optimization problem (15) is solved by a steepest descent algorithm. To obtain such an algorithm, the main difficulty is to compute the gradient of (15). It takes the form

∇TV(w) + (1/ε) Σ_{ψ∈D} θ′_τ( 〈w − v, ψ〉 ) ψ,


where θ′_τ denotes the derivative of θ_τ:

θ′_τ(t) = 2(t − τ) if t ≥ τ, and 0 otherwise.

We do not detail how to compute ∇TV(w); it can easily be found in the literature. In order to compute the gradient of the data-fidelity term for our translation-invariant dictionary, we need to compute a decomposition over D and a recomposition. These two operations are detailed in the next two subsections.
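To fix ideas, here is a minimal sketch of one steepest-descent step on (15), reusing `grad_tv_beta` from the sketch above and the `decompose`/`recompose` routines sketched after the next two subsections; the step size `step` is left as an assumption.

```python
import numpy as np

def theta_prime(t, tau):
    """Derivative of theta_tau, applied entrywise."""
    return np.where(t >= tau, 2.0 * (t - tau), 0.0)

def descent_step(w, v, features, tau, eps, step):
    """One steepest-descent step on the penalized energy (15)."""
    coeffs = decompose(w - v, features)                     # all <w - v, psi>, psi in D
    g_fid = recompose(theta_prime(coeffs, tau), features)   # sum theta'(...) psi
    g = grad_tv_beta(w) + g_fid / eps                       # gradient of (15)
    return w - step * g
```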

A. The decomposition

The decomposition of u ∈ R^{N²} provides the set of values

( 〈u, Ψ^{k,i,j}〉 )_{0≤i,j<N, 1≤k≤#F₀}.

Notice that, using (14), we have, for any u ∈ R^{N²} and any feature Ψᵏ ∈ F₀,

〈u, Ψ^{k,i,j}〉 = Σ_{m,n=0}^{N−1} u_{m,n} Ψᵏ_{m−i,n−j}.

So the set of values ( 〈u, Ψ^{k,i,j}〉 )_{0≤i,j<N} is just u ∗ Ψ̃ᵏ, where ∗ stands for the convolution product and Ψ̃ᵏ_{m,n} = Ψᵏ_{−m,−n} (remember that the images are periodized).

The decomposition can therefore be computed with one Fourier transform and #F₀ inverse Fourier transforms, if we memorize the Fourier transforms of the features.

B. The recomposition

Denoting Λ = (λᵏ_{i,j})_{0≤i,j<N, 1≤k≤#F₀} and m = #F₀·N², the recomposition takes the following form:

T : Λ ∈ Rᵐ ↦ Σ_{k=1}^{#F₀} Σ_{i,j=0}^{N−1} λᵏ_{i,j} Ψ^{k,i,j} ∈ R^{N²}.

Using (14), we get

T(Λ) = Σ_{k=1}^{#F₀} λᵏ ∗ Ψᵏ.

This can be computed with #F₀ Fourier transforms and one inverse Fourier transform.
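To make the two operations concrete, here is a minimal numpy sketch of both, assuming square N×N images, periodic boundaries, and real-valued features stored as an array `features` of shape (#F₀, N, N); `decompose` and `recompose` are adjoint to each other.

```python
import numpy as np

def decompose(u, features):
    """All inner products <u, Psi^{k,i,j}>: the cross-correlation of u
    with each feature, computed in the Fourier domain."""
    Fu = np.fft.fft2(u)
    Ff = np.fft.fft2(features, axes=(-2, -1))
    # Entry (k, i, j) of the result is <u, Psi^{k,i,j}>.
    return np.real(np.fft.ifft2(Fu[None, :, :] * np.conj(Ff)))

def recompose(coeffs, features):
    """T(Lambda) = sum_k lambda^k * Psi^k (periodic convolutions),
    summed in the Fourier domain so that one inverse FFT suffices."""
    Ff = np.fft.fft2(features, axes=(-2, -1))
    Fc = np.fft.fft2(coeffs, axes=(-2, -1))
    return np.real(np.fft.ifft2(np.sum(Fc * Ff, axis=0)))
```

In practice, the feature transforms `Ff` would be precomputed once, matching the operation counts stated above.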


V. EXPERIMENTS

The analysis and experiments of Section III show that when we know the curvature of the ideal image, we can get a nearly perfect restoration. However, the task of obtaining a nearly perfect curvature is equivalent to recovering the ideal image itself.

Fortunately, we sometimes have some prior information about the image. For instance, we may know that the ideal image contains some special structures, and we are especially interested in processing these structures. In this case, we can still apply (P∗) together with a dictionary reflecting the prior information. We remark that this assumption of prior information is analogous to the common Basis Pursuit setting, where one usually assumes that D is already fixed.

We present two examples of source separation: image decomposition and denoising. The numerical aspects have already been presented in the previous section. Note that in each experiment, we tune the parameter τ for both ROF and (P∗) to obtain the best performance.

A. Image decomposition

Suppose that we are interested in processing some letters in a noisy image (right of Fig. 3), obtained by adding 20% impulse noise to the ideal image. Such images are numerous in real-life image processing tasks; for instance, think of the photos widely found on the internet or in videos, where the name of the news agency, the name of the photographer, the photography date, the logo of the video production company, or even a rigid watermark is embedded. Note that the impulse noise is adopted here to demonstrate that the noise accepted by (P∗) is rather general.

We want to separate the noisy image into two parts: one part containing the letters, and the other part containing the noise and the background information. Typically, the letter part can then be used in a pattern recognition process.

Fig. 3. Left: clean image; right: noisy image to decompose, obtained by adding 20% impulse noise to the left image.


Ordinary decomposition methods are not suitable for this task. For instance, the top row of Fig. 4 displays the result of the ROF method. The top-left image is the cartoon part, which is supposed to capture the letters but also contains background information; the top-right image is the texture part, which is supposed to capture the background and the noise but still contains letter information.

Now suppose that we know the letters. Then we can construct a feature dictionary containing the curvatures of the letters. The right image of Fig. 1 displays these curvatures. After normalization, we translate all the filters of the feature dictionary over the plane to obtain the total dictionary D.

Using this dictionary D, the model (P∗) provides a fairly good image decomposition result, displayed in the bottom row of Fig. 4. Clearly, most of the letter information is contained in the letter part, while most of the background and noise information is kept in the residue part. The explanation of this phenomenon is similar to the discussion of the performance of the ad-hoc dictionary in Section III. Indeed, by putting the curvatures of the letters into the feature dictionary, the edges where the noisy image has high correlation with the curvatures of the letters are kept, while the other information is smoothed out by the total variation diffusion. We thus obtain a rather clean image which contains only letter information. Moreover, we can observe that the letters present in the bottom-left image of Fig. 4 are sparse; therefore, the curvature of this image is sparsely represented by the dictionary.

Fig. 4. Image decomposition of the right image of Fig. 3. Top: cartoon part and noisy-texture part of the ROF model; bottom: letter part and background-noise part of the model (P∗).


B. Image denoising

Fig. 5. Image denoising. Top: clean image, noisy image with σ = 20 (PSNR = 22.08); middle: denoising result of ROF (PSNR = 24.56), residue of ROF; bottom: denoising result of (P∗) (PSNR = 31.20), residue of (P∗).

Now we add Gaussian noise of standard deviation 20 to the clean image (top-left of Fig. 5). The noisy image is shown in the top-right of Fig. 5. We still suppose that we know the information about the letters. This time, the feature dictionary is composed of two parts. The first part again contains the curvatures of the letters shown in the right image of Fig. 1. The second part contains 13 filters {d₁, …, d₁₃} from the Daubechies-3 wavelet (see [32]) at level 4, together with their opposites {−d₁, …, −d₁₃}. Note that these 13 Daubechies-3 wavelet filters are shown in Chapter 3 of [33]. Overall, the feature dictionary contains 9 + 2×13 = 35 filters. After normalization, all the filters in the feature dictionary are translated over the plane to build the total dictionary D. The denoising results of the model (P∗) with this dictionary and of the ROF model are shown in Fig. 5. Clearly, with the known features, we obtain much better performance than the ROF model. Indeed, for the ROF model, the letters in the result image are blurred, and some of their information appears


in the residue. For (P∗), however, the letters and the background are well recovered, as the dictionary allows the appearance of the letters and of isotropic information.

VI. CONCLUDING REMARKS

If we neglect the interactions between features, we can conclude that a feature of the form −∇TV(f) in the dictionary D will favor the appearance of the pattern f in the solution w∗ of (P∗); i.e., we have the mechanism:

∇·( ∇f / |∇f| )  ⇝  f.

Thus, if we aim at recovering a special pattern/structure f from the noisy image by (P∗), we should add the feature −∇TV(f) to the feature dictionary (when the position of this feature is not known) or to the total dictionary D (when it has a known position).

Now, turning back to the open problem presented in Section I, our conclusion is that, for a certain class of images, in order to obtain ideal restoration results with (P∗), we should take a dictionary D which gives a sparse representation of the collection of all the curvatures of the images in that class. We mention that the method of [34] might be useful for this task.

Overall, in this paper, after a theoretical analysis characterizing the representation of the curvature of the solution of (P∗) over the dictionary, we presented experiments in which the dictionary contains the curvatures of known forms (letters). The data-fidelity term of this model allows all structures to appear in the residue except the forms used to build the dictionary. Thus, we can expect these forms to remain in the result while the other structures disappear. Our experiments were carried out on source separation problems and confirm this impression. The starting image contains letters (known) on a very structured background (an image). We showed that it is possible, with the model (P∗), to obtain a reasonable separation of these structures. Finally, this work illustrates clearly that the dictionary D must sparsely represent the curvatures of the elements which we seek to preserve.

Future work could involve learning typical patterns from the curvatures of a certain class of images and then using (P∗) for image processing. Moreover, a fast algorithm based on the Bregman distance and the dual form is also of great interest.

REFERENCES

[1] L. Rudin, S. Osher, and E. Fatemi, "Nonlinear total variation based noise removal algorithms," Physica D, vol. 60, pp. 259–268, 1992.

[2] D. Donoho and I. Johnstone, "Ideal spatial adaptation by wavelet shrinkage," Biometrika, vol. 81, no. 3, pp. 425–455, 1994.

[3] F. Malgouyres, "Minimizing the total variation under a general convex constraint for image restoration," IEEE Trans. Image Process., vol. 11, no. 12, pp. 1450–1456, Dec. 2002.

[4] F. Malgouyres, "Mathematical analysis of a model which combines total variation and wavelet for image restoration," Journal of Information Processes, vol. 2, no. 1, pp. 1–10, 2002. Available at http://www.math.univ-paris13.fr/∼malgouy.

[5] S. Lintner and F. Malgouyres, "Solving a variational image restoration model which involves l∞ constraints," Inverse Problems, vol. 20, no. 3, pp. 815–831, June 2004.

[6] T. Zeng and F. Malgouyres, "Using Gabor dictionaries in a TV−l∞ model for denoising," in Proc. ICASSP 2006, vol. 2, Toulouse, France, May 2006, pp. 865–868.

[7] T. Zeng, "Incorporating known features into a total variation dictionary model for source separation," in Proc. IEEE International Conference on Image Processing (ICIP), San Diego, CA, USA, Oct. 12–15, 2008, pp. 577–580.

[8] R. T. Rockafellar, Convex Analysis. Princeton University Press, 1970.

[9] J.-B. Hiriart-Urruty and C. Lemaréchal, Convex Analysis and Minimization Algorithms I: Fundamentals, ser. Grundlehren der mathematischen Wissenschaften, vol. 305. Springer-Verlag, 1993.

[10] L. Bregman, "The relaxation method for finding common points of convex sets and its application to the solution of problems in convex programming," USSR Computational Mathematics and Mathematical Physics, vol. 7, pp. 200–217, 1967.

[11] S. Osher, M. Burger, D. Goldfarb, J. Xu, and W. Yin, "An iterative regularization method for total variation-based image restoration," Multiscale Model. Simul., vol. 4, no. 2, pp. 460–489, 2005.

[12] J. Xu and S. Osher, "Iterative regularization and nonlinear inverse scale space applied to wavelet-based denoising," IEEE Trans. Image Process., vol. 16, no. 2, pp. 534–544, Feb. 2007.

[13] M. Burger, G. Gilboa, S. Osher, and J. Xu, "Nonlinear inverse scale space methods," Commun. Math. Sci., vol. 4, no. 1, pp. 179–212, 2006.

[14] W. Yin, S. Osher, D. Goldfarb, and J. Darbon, "Bregman iterative algorithms for l1-minimization with applications to compressed sensing," SIAM J. Imaging Sci., vol. 1, no. 1, pp. 143–168, 2008.

[15] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge University Press, 2000.

[16] J.-F. Aujol and A. Chambolle, "Dual norms and image decomposition models," International Journal of Computer Vision, vol. 63, no. 1, pp. 85–104, June 2005.

[17] A. Chambolle, "An algorithm for total variation minimization and applications," Journal of Mathematical Imaging and Vision, vol. 20, no. 1–2, pp. 89–97, Jan.–Mar. 2004.

[18] Y. Meyer, Oscillating Patterns in Image Processing and in Some Nonlinear Evolution Equations. Boston, MA, USA: AMS, 2001. The Fifteenth Dean Jacqueline B. Lewis Memorial Lectures.

[19] S. S. Chen, D. L. Donoho, and M. A. Saunders, "Atomic decomposition by basis pursuit," SIAM J. Sci. Comput., vol. 20, no. 1, pp. 33–61, 1999.

[20] D. L. Donoho, M. Elad, and V. Temlyakov, "Stable recovery of sparse overcomplete representations in the presence of noise," IEEE Trans. Inf. Theory, vol. 52, no. 1, pp. 6–18, Jan. 2006.

[21] D. Donoho and J. Tanner, "Sparse nonnegative solution of underdetermined linear equations by linear programming," Proceedings of the National Academy of Sciences, vol. 102, no. 27, pp. 9446–9451, 2005.

[22] E. J. Candès, J. K. Romberg, and T. Tao, "Stable signal recovery from incomplete and inaccurate measurements," Communications on Pure and Applied Mathematics, vol. 59, no. 8, pp. 1207–1223, 2006.

[23] A. Bruckstein, M. Elad, and M. Zibulevsky, "A non-negative and sparse enough solution of an underdetermined linear system of equations is unique," IEEE Trans. Inf. Theory, vol. 54, no. 11, pp. 4813–4820, Nov. 2008.

[24] J. Shtok and M. Elad, "Analysis of the basis pursuit via the capacity sets," Journal of Fourier Analysis and Applications, to appear. Available at http://www.cs.technion.ac.il/∼elad/publications/journals/.

[25] F. Malgouyres and T. Zeng, "A predual proximal point algorithm solving a non negative basis pursuit denoising model," CCSd CNRS, Tech. Rep. ccsd-00133050, Feb. 2007, submitted.

[26] A. Buades, B. Coll, and J.-M. Morel, "A review of image denoising algorithms, with a new one," SIAM Multiscale Model. Simul., vol. 4, no. 2, pp. 490–530, 2005.

[27] P. Perona and J. Malik, "Scale-space and edge detection using anisotropic diffusion," IEEE Trans. Pattern Anal. Mach. Intell., vol. 12, no. 7, pp. 629–639, July 1990.

[28] S. Niitsuma and K. Tokunaga, "Edge detection and curvature calculation in the visual system," Artificial Life and Robotics, vol. 9, no. 3, pp. 135–138, 2005.

[29] D. Strong and T. Chan, "Edge-preserving and scale-dependent properties of total variation regularization," Inverse Problems, vol. 19, no. 6, pp. S165–S187, 2003.

[30] F. Andreu, C. Ballester, V. Caselles, and J. M. Mazón, "Minimizing total variation flow," Comptes Rendus de l'Académie des Sciences, Série I, Mathématique, vol. 331, no. 11, pp. 867–872, 2000.

[31] R. Acar and C. R. Vogel, "Analysis of bounded variation penalty methods for ill-posed problems," Inverse Problems, vol. 10, pp. 1217–1229, 1994.

[32] S. Mallat, A Wavelet Tour of Signal Processing. Boston: Academic Press, 1998.

[33] T. Zeng, "Études de modèles variationnels et apprentissage de dictionnaires" (Studies of variational models and dictionary learning), Ph.D. dissertation, Université Paris 13, 2007.

[34] P. Jost, P. Vandergheynst, S. Lesage, and R. Gribonval, "MoTIF: An efficient algorithm for learning translation invariant dictionaries," in Proc. ICASSP 2006, Toulouse, France, May 2006, pp. 857–860.