Discriminative Approach for Wavelet Denoising Yacov Hel-Or and Doron Shaked I.D.C.- Herzliya HPL-Israel


Motivation – Image denoising

- Can we clean Lena?

Some reconstruction problems

Images of Venus taken by the Russian lander Venera-10 in 1975

Image Inpainting (Sapiro et al.)

- Can we “see” through the missing pixels?

Image De-mosaicing

- Can we reconstruct the color image?

Image De-blurring

- Can we sharpen Barbara?

• All the above deal with degraded images.

• Their reconstruction requires solving an inverse problem:

• Inpainting

• De-blurring

• De-noising

• De-mosaicing

Typical Degradation Sources

Low illumination

Atmospheric attenuation (haze, turbulence, …)

Optical distortions (geometric, blurring)

Sensor distortion (quantization, sampling, sensor noise, spectral sensitivity, de-mosaicing)

Reconstruction as an Inverse Problem

Original image $x$ → Distortion $H$ → $Hx$ → add noise $n$ → measurements $y = Hx + n$ → Reconstruction Algorithm → $\hat{x}$

Years of extensive study

Thousands of research papers

• Typically:

– The distortion $H$ is singular or ill-posed.

– The noise $n$ is unknown; only its statistical properties can be learnt.

$y = Hx + n \quad\Rightarrow\quad \hat{x} = H^{-1}(y - n)$
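To illustrate why direct inversion fails, here is a minimal NumPy sketch. The 1-D signal, the circular Gaussian blur standing in for $H$, and the noise level are all assumptions chosen for illustration; none of them come from the slides. The sketch applies $H^{-1}$ to $y$ while ignoring $n$ and compares the resulting error with the error of the measurement itself.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64
x = np.repeat(rng.uniform(0.0, 1.0, 8), 8)       # piecewise-constant test signal

# Assumed distortion H: circular Gaussian blur, defined by its frequency response.
f = np.fft.fftfreq(n)                             # frequencies in cycles per sample
h_hat = np.exp(-2.0 * (np.pi * 2.0 * f) ** 2)     # strictly positive, tiny at high f

noise = rng.normal(0.0, 0.01, n)
y = np.real(np.fft.ifft(h_hat * np.fft.fft(x))) + noise    # y = Hx + n

# Naive reconstruction: apply H^{-1} to y, ignoring the unknown noise.
x_naive = np.real(np.fft.ifft(np.fft.fft(y) / h_hat))

err_measured = np.linalg.norm(y - x)
err_naive = np.linalg.norm(x_naive - x)
print(f"error of y: {err_measured:.3g}   error of H^-1 y: {err_naive:.3g}")
```

Because $H$ attenuates high frequencies almost to zero, its inverse amplifies the noise at those frequencies by many orders of magnitude, so the naive estimate is far worse than the blurred, noisy measurement.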

Key point: Statistical Prior of Natural Images

The Image Prior

[Figure: the prior $P_X(x)$, with values between 0 and 1, plotted over the image space.]

Bayesian Reconstruction (MAP)

• From amongst all possible solutions, choose the one that maximizes the a-posteriori probability $P_X(x|y)$.

[Figure: the prior $P_X(x)$ and the posterior $P(x|y)$ over the image space, with the measurements and the most probable solution marked.]

So, are we set?

Unfortunately not!

• The p.d.f. $P_X$ defines a prior distribution over natural images:

– Defined over a huge-dimensional space ($10^6$ dimensions for a 1K×1K grayscale image).

– Sparsely sampled.

– Known to be non-Gaussian.

– Complicated to model.

Example: 3D prior of 2x2 image neighborhoods

from Mumford & Huang, 2000

Marginalization of Image Prior

• Observation 1: The wavelet transform tends to de-correlate pixel dependencies of natural images.

$x_W = W x, \qquad P(x_W) \approx \prod_i P_i(x_W^i)$

• Observation 2: The statistics of natural images are homogeneous.

$P_i(x_W^i) = P_{band(i)}(x_W^i)$, i.e. coefficients in the same band share the same statistics.


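Wavelet Shrinkage Denoising

Donoho & Johnstone 94 (unitary case)

• Degradation Model:

$y = x + n, \qquad H = I, \quad n \sim N(0, \sigma^2 I)$

• The MAP estimator:

$\hat{x}_W(y) = \arg\max_{x_W} P(x_W \mid y_W)$

• The MAP estimator gives:

$\hat{x}_W(y) = \arg\max_{x_W} P(x_W \mid y_W)$

$\quad = \arg\max_{x_W} P(y_W \mid x_W)\, P(x_W)$

$\quad = \arg\min_{x_W} \left\{ -\log P(y_W \mid x_W) - \log P(x_W) \right\}$

$\quad = \arg\min_{x_W} \left\{ \frac{1}{2\sigma^2} \sum_i (x_W^i - y_W^i)^2 - \sum_i \log P_i(x_W^i) \right\}$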

• The MAP estimator diagonalizes the system:

$\hat{x}_W^i(y) = \arg\min_{x_W^i} \left\{ \frac{1}{2\sigma^2} (x_W^i - y_W^i)^2 - \log P_i(x_W^i) \right\}$

This leads to a very useful property: scalar mapping functions,

$\hat{x}_W^i = M_i(y_W^i)$

Wavelet Shrinkage Pipe-line

$y$ → Transform $W$ → $y_W^i$ → Mapping functions $M_i(y_W^i)$ (non-linear operation) → $\hat{x}_W^i$ → Inverse Transform $W^T$ → $\hat{x}$

$\hat{x} = W^T M_W(W y)$

How Many Mapping Functions?

• Due to the fact that: $P_i(x_W^i) = P_{band(i)}(x_W^i)$

• N mapping functions are needed for N sub-bands.

Subband Decomposition

• Wavelet transform: $y_B = B y$, where $B = \begin{bmatrix} B_1 \\ \vdots \\ B_N \end{bmatrix}$

• Shrinkage: $\hat{x} = B^T M_B(B y) = \sum_k B_k^T M_k(B_k y)$

[Diagram: $y$ → wavelet transform ($B_1, …, B_i, …$) → $y_B^i$ → shrinkage functions → $x_B^i$ → inverse transform ($B_1^T, …, B_i^T, …$) → sum → $\hat{x}$]

Designing The Mapping Function

• The shape of the mapping function $M_j$ depends solely on $P_j$ and the noise variance $\sigma^2$.

[Diagram: modeling the marginal p.d.f. of band $j$ from its coefficients $x_W^i$, $i \in B_j$, together with $\sigma^2$ (noise variance) → MAP objective → $M_j$]

$\hat{x}_W^i(y) = \arg\min_{x_W^i} \left\{ \frac{1}{2\sigma^2} (x_W^i - y_W^i)^2 - \log P_j(x_W^i) \right\}$

• Commonly the marginals $P_j(x_W)$ are approximated by a generalized Gaussian distribution (GGD):

$P(x_W) \sim e^{-|x_W / s|^p}$ for $p < 1$

[Figure, from Simoncelli 99: MAP estimators for the GGD model with three different exponents, resembling hard thresholding, soft thresholding, and linear Wiener filtering. The noise is additive Gaussian, with variance one third that of the signal.]
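The relation between the GGD exponent and the shape of the mapping function can be checked numerically. The sketch below is an illustration under assumed values ($\sigma = 1$, $s = 2$, and a brute-force grid search), not the procedure used in the talk. It minimizes the scalar MAP objective $(x - y)^2 / (2\sigma^2) + |x/s|^p$ and compares the result with the known closed forms: soft thresholding with threshold $\sigma^2 / s$ for $p = 1$, and the linear gain $1 / (1 + 2\sigma^2 / s^2)$ for $p = 2$.

```python
import numpy as np

GRID = np.linspace(-10.0, 10.0, 20001)   # candidate values of x, step 0.001

def map_shrink(y, sigma, s, p):
    """Scalar MAP estimate under a GGD prior, found by brute-force search."""
    cost = (GRID - y) ** 2 / (2.0 * sigma**2) + np.abs(GRID / s) ** p
    return GRID[np.argmin(cost)]

def soft_threshold(y, t):
    """Shrink y towards zero by t, clamping at zero."""
    return np.sign(y) * np.maximum(np.abs(y) - t, 0.0)

sigma, s = 1.0, 2.0                      # assumed noise and prior scales
ys = np.linspace(-5.0, 5.0, 11)          # noisy coefficients -5, -4, ..., 5

laplace = np.array([map_shrink(y, sigma, s, 1.0) for y in ys])   # p = 1
gauss = np.array([map_shrink(y, sigma, s, 2.0) for y in ys])     # p = 2
sparse = np.array([map_shrink(y, sigma, s, 0.5) for y in ys])    # p < 1

print(np.round(laplace, 3))   # matches soft_threshold(ys, sigma**2 / s)
print(np.round(gauss, 3))     # matches ys / (1 + 2 * sigma**2 / s**2)
print(np.round(sparse, 3))    # small inputs go to 0, large ones are barely shrunk
```

For $p < 1$ the estimator sets small coefficients to zero and shrinks large ones only slightly, which is the hard-threshold-like behaviour referred to in the figure.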

• Due to its simplicity, Wavelet Shrinkage became extremely popular:

– Thousands of applications.

– Hundreds of related papers (984 citations of the D&J paper in Google Scholar).

• What about efficiency?

– Denoising performance of the original Wavelet Shrinkage technique is far from the state-of-the-art results.

• Why?

– Wavelet coefficients are not really independent.

Recent Developments

• Since the original approach suggested by D&J, significant improvements were achieved:

Original Shrinkage → Over-complete

• Overcomplete transform

• Scalar MFs

• Simple

• Not considered state-of-the-art

Original Shrinkage → Joint (Local) Coefficient Modeling

• Multivariate MFs

• Complicated

• Superior results

Joint (Local) Coefficient Modeling

[Timeline with axis marks 94, 97, 2000, 03, 06:]

• HMM: Crouse et al.

• Joint Bayesian: Pizurica et al.

• Context Modeling: Portilla et al.

• Context Modeling: Chang et al.

• HMM: Fan, Xia

• Sparsity: Mallat, Zhang

• Joint Bayesian: Simoncelli

• Bivariate: Sendur, Selesnick

• Co-occurrence: Shan, Aviyente

• Adaptive Thresh.: Li, Orchard

Shrinkage in Over-complete Transforms

[Timeline with axis marks 94, 97, 2000, 03, 06:]

• Shrinkage: D.J. (94)

• Steerable: Simoncelli, Adelson

• Undecimated wavelet: Coifman, Donoho

• Ridgelets: Candes

• Ridgelets: Carre, Helbert

• Ridgelets: Nezamoddini et al.

• Contourlets: Matalon et al.

• Contourlets: Do, Vetterli

• Curvelets: Starck et al.

• K-SVD: Aharon, Elad

Over-Complete Shrinkage Denoising

• Over-complete transform: $y_B = B y$, where $B = \begin{bmatrix} B_1 \\ \vdots \\ B_N \end{bmatrix}$

• Shrinkage: $\hat{x} = (B^T B)^{-1} B^T M_B(B y) = (B^T B)^{-1} \sum_k B_k^T M_k(B_k y)$

• Mapping Functions: Naively borrowed from the Unitary case.
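The over-complete shrinkage formula can be sketched in a few lines of NumPy. Everything concrete here is an assumption made for illustration: a 1-D signal, a two-band undecimated Haar transform built from circulant matrices, and a soft threshold of 15 "naively borrowed" for the high-pass band. The helper names `circulant`, `soft` and `overcomplete_shrink` are hypothetical.

```python
import numpy as np

def circulant(h, n):
    """n x n matrix that applies circular convolution with the filter h."""
    C = np.zeros((n, n))
    for i in range(n):
        for j, hj in enumerate(h):
            C[i, (i - j) % n] += hj
    return C

def soft(v, t):
    """Soft thresholding: shrink v towards zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def overcomplete_shrink(y, bands, mfs):
    """x_hat = (B^T B)^{-1} sum_k B_k^T M_k(B_k y)."""
    B = np.vstack(bands)
    acc = sum(Bk.T @ Mk(Bk @ y) for Bk, Mk in zip(bands, mfs))
    return np.linalg.solve(B.T @ B, acc)

n = 128
rng = np.random.default_rng(1)
x = np.repeat(rng.uniform(0.0, 100.0, 8), 16)     # piecewise-constant signal
y = x + rng.normal(0.0, 10.0, n)                  # additive Gaussian noise

r = np.sqrt(2.0)
bands = [circulant([1 / r, 1 / r], n),            # B_1: low-pass (undecimated Haar)
         circulant([1 / r, -1 / r], n)]           # B_2: high-pass

identity = lambda v: v
x_id = overcomplete_shrink(y, bands, [identity, identity])           # returns y
x_hat = overcomplete_shrink(y, bands, [identity, lambda v: soft(v, 15.0)])

mse_y = np.mean((y - x) ** 2)
mse_hat = np.mean((x_hat - x) ** 2)
print(f"MSE noisy: {mse_y:.1f}   MSE after shrinkage: {mse_hat:.1f}")
```

With identity mapping functions the pipeline reproduces $y$ exactly, which confirms that $(B^T B)^{-1} B^T$ is a left inverse of $B$; all denoising comes from the scalar mapping functions.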

What’s wrong with existing MFs?

1. MAP criterion:

– Solution is biased towards the most probable case.

2. Independence assumption:

– In the overcomplete case, the wavelet coefficients are inherently dependent.

3. Minimization domain:

– For the unitary case, MF optimality is expressed in the transform domain. This is incorrect in the overcomplete case.

4. White noise assumption:

– Image noise is not necessarily white i.i.d.

Why are unitary-based MFs being used?

• Non-marginal statistics.

• Multivariate minimization.

• Multivariate MFs.

• Non-white noise.

Suggested Approach:

• Maintain simplicity – Use scalar LUTs.

• Improve Efficiency – Use Over-complete Transforms.

– Design optimal MFs with respect to a given set of images.

– Express optimality in the spatial domain.

– Attain optimality with respect to MSE.

Optimal Mapping Function:

• Traditional approach: Descriptive

Modeling the wavelet p.d.f. of $x_W^i$, $i \in B_j$ → MAP objective → $M_j$

• Suggested approach: Discriminative

Examples $x^e$, $y^e$ → Optimality criteria → $M_j$

The Optimality Criteria

• Design the MFs with respect to a given set of examples: $\{x_i^e\}$ and $\{y_i^e\}$

$\{\hat{M}_k\} = \arg\min_{\{M_k\}} \sum_i \left\| x_i^e - (B^T B)^{-1} \sum_k B_k^T M_k(B_k y_i^e) \right\|^2$

• Critical problem: How to optimize the non-linear MFs?

The Spline Transform

• Let $x \in \mathbb{R}$ be a real value in a bounded interval $[a,b)$.

• We divide $[a,b)$ into $M$ segments with boundaries $q = [q_0, q_1, ..., q_M]$, where $q_0 = a$ and $q_M = b$.

• W.l.o.g. assume $x \in [q_{j-1}, q_j)$.

• Define the residue $r(x) = (x - q_{j-1}) / (q_j - q_{j-1})$.

$x = r(x)\, q_j + (1 - r(x))\, q_{j-1}$

$x = [0, \cdots, 0, 1 - r(x), r(x), 0, \cdots]\, q = S_q(x)\, q$

The Spline Transform – Cont.

• We define a vectorial extension: $x = S_q(x)\, q$, where the $i$-th row of the matrix $S_q(x)$ is $[\cdots, 0, 1 - r(x_i), r(x_i), 0, \cdots]$.

• We call this the Spline Transform (SLT) of $x$.
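As a concrete reading of the definition, the following sketch builds the matrix $S_q(x)$ for a vector of samples. The function name `slt` and the handling of samples outside $[q_0, q_M)$ (they are assigned to the first or last segment, which amounts to linear extrapolation) are assumptions of this sketch.

```python
import numpy as np

def slt(x, q):
    """Return S_q(x): one row per sample, with the two non-zeros (1 - r, r)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    q = np.asarray(q, dtype=float)
    # Index j such that q[j-1] <= x < q[j]; out-of-range samples are
    # assigned to the first or last segment.
    j = np.clip(np.searchsorted(q, x, side="right"), 1, q.size - 1)
    r = (x - q[j - 1]) / (q[j] - q[j - 1])        # residue r(x)
    S = np.zeros((x.size, q.size))
    rows = np.arange(x.size)
    S[rows, j - 1] = 1.0 - r
    S[rows, j] = r
    return S

q = np.linspace(-1.0, 1.0, 5)       # boundaries q_0..q_4, i.e. M = 4 segments
x = np.array([-0.75, 0.1, 0.5])
S = slt(x, q)
print(S)
print(S @ q)                         # reproduces x, since x = S_q(x) q
```

Each row has at most two non-zero entries that sum to one, so $S_q(x)$ is very sparse and cheap to build even for image-sized inputs.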

The SLT Properties

• Substitution property: Substituting the boundary vector $q$ with a different vector $p$ forms a piecewise linear mapping: $x' = S_q(x)\, p$.

[Figure: the piecewise linear map from $x$ to $x'$ that sends the boundaries $q_0, …, q_4$ to the values $p_0, …, p_4$.]
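The substitution property can be verified against ordinary linear interpolation. In this sketch the helper `slt` is a hypothetical implementation of $S_q(x)$, and the vector $p$ is chosen, purely as an example, to sample a soft-threshold curve with threshold 1 at the boundaries $q$.

```python
import numpy as np

def slt(x, q):
    """Hypothetical helper: the matrix S_q(x) with rows (..., 1 - r, r, ...)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    q = np.asarray(q, dtype=float)
    j = np.clip(np.searchsorted(q, x, side="right"), 1, q.size - 1)
    r = (x - q[j - 1]) / (q[j] - q[j - 1])
    S = np.zeros((x.size, q.size))
    rows = np.arange(x.size)
    S[rows, j - 1] = 1.0 - r
    S[rows, j] = r
    return S

q = np.linspace(-4.0, 4.0, 9)                         # boundaries -4, -3, ..., 4
p = np.sign(q) * np.maximum(np.abs(q) - 1.0, 0.0)     # target values at the boundaries

x = np.array([-3.2, -0.4, 0.0, 1.5, 2.25])
x_mapped = slt(x, q) @ p                               # x' = S_q(x) p

print(x_mapped)
print(np.interp(x, q, p))                              # same piecewise linear map
```

Since $S_q(x)$ depends only on $x$ and $q$, the output is linear in $p$. This is what later allows the mapping function to be fitted by least squares.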

Back to the MFs Design

• We approximate the non-linear $\{M_k\}$ with piece-wise linear functions:

$M_k(B_k y) = S_{q_k}(B_k y)\, p_k$

$\{\hat{p}_k\} = \arg\min_{\{p_k\}} \sum_i \left\| x_i^e - (B^T B)^{-1} \sum_k B_k^T S_{q_k}(B_k y_i^e)\, p_k \right\|^2$

• Finding $\{p_k\}$ is a standard LS problem with a closed form solution!

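Designing the MFs

[Diagram: $y^e$ → $B_1, …, B_k, …$ → $y_B^k$ → $M_k(y; p_k)$ → $x_B^k$ → $B_1^T, …, B_k^T, …$ → sum → $(B^T B)^{-1}$ → estimate of $x^e$. The parameters $p_k$ are obtained from $x^e$ and $y^e$ by the closed form solution.]

Undecimated wavelet: 2D convolutions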
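A minimal sketch of the least-squares fit follows, under assumptions that are not in the slides: a single 1-D training pair, a two-band undecimated Haar transform written as circulant matrices, 15 bins per band over an assumed range of ±200, and `numpy.linalg.lstsq` as the solver. The helper names are hypothetical. The design matrix stacks the blocks $(B^T B)^{-1} B_k^T S_{q_k}(B_k y^e)$ side by side, so the unknown is the concatenation of all $p_k$.

```python
import numpy as np

def circulant(h, n):
    """n x n matrix that applies circular convolution with the filter h."""
    C = np.zeros((n, n))
    for i in range(n):
        for j, hj in enumerate(h):
            C[i, (i - j) % n] += hj
    return C

def slt(x, q):
    """Hypothetical helper: the matrix S_q(x) with rows (..., 1 - r, r, ...)."""
    j = np.clip(np.searchsorted(q, x, side="right"), 1, q.size - 1)
    r = (x - q[j - 1]) / (q[j] - q[j - 1])
    S = np.zeros((x.size, q.size))
    rows = np.arange(x.size)
    S[rows, j - 1] = 1.0 - r
    S[rows, j] = r
    return S

def fit_mfs(x_e, y_e, bands, qs):
    """Jointly fit all p_k by least squares in the spatial domain."""
    B = np.vstack(bands)
    blocks = [Bk.T @ slt(Bk @ y_e, qk) for Bk, qk in zip(bands, qs)]
    L = np.linalg.solve(B.T @ B, np.hstack(blocks))   # x_hat = L @ [p_1; ...; p_K]
    p, *_ = np.linalg.lstsq(L, x_e, rcond=None)
    return L, p

rng = np.random.default_rng(0)
n = 256
x_e = np.repeat(rng.uniform(0.0, 100.0, 16), 16)      # clean training example
y_e = x_e + rng.normal(0.0, 10.0, n)                  # its noisy version

r = np.sqrt(2.0)
bands = [circulant([1 / r, 1 / r], n), circulant([1 / r, -1 / r], n)]
qs = [np.linspace(-200.0, 200.0, 16)] * 2             # 15 bins per band

L, p = fit_mfs(x_e, y_e, bands, qs)
mse_noisy = np.mean((y_e - x_e) ** 2)
mse_fit = np.mean((L @ p - x_e) ** 2)
print(f"MSE noisy: {mse_noisy:.1f}   MSE with fitted MFs: {mse_fit:.1f}")
```

Choosing $p_k = q_k$ reproduces $y^e$ exactly, so the fitted MFs can never do worse than the noisy input on the training example. This sketch only reports the training error; a real evaluation would apply the fitted $p_k$ to images not used for training.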

Results

Training Images

Tested Images

Simulation setup

• Transform used: Undecimated DCT

• Noise: Additive i.i.d. Gaussian

• Number of bins: 15

• Number of bands: 3x3 .. 10x10

Option 1: Transform domain – independent bands

$\hat{p}_k = \arg\min_{p_k} \sum_i \left\| B_k x_i^e - S_{q_k}(B_k y_i^e)\, p_k \right\|^2$

[Diagram: $y^e$ and $x^e$ are both transformed by $B_k$; each $M_k(y; p_k)$ is fitted so that $M_k(B_k y^e)$ matches $B_k x^e$, one band at a time.]

Option 2: Spatial domain – independent bands

$\hat{p}_k = \arg\min_{p_k} \sum_i \left\| B_k^T B_k x_i^e - B_k^T S_{q_k}(B_k y_i^e)\, p_k \right\|^2$

[Diagram: each $M_k(y; p_k)$ is fitted so that $B_k^T M_k(B_k y^e)$ matches $B_k^T B_k x^e$, one band at a time.]

Option 3: Spatial domain – joint bands

$\{\hat{p}_k\} = \arg\min_{\{p_k\}} \sum_i \left\| x_i^e - (B^T B)^{-1} \sum_k B_k^T S_{q_k}(B_k y_i^e)\, p_k \right\|^2$

[Diagram: all $M_k(y; p_k)$ are fitted jointly so that the full reconstruction matches $x^e$.]

[Figure: MFs for UDCT 8x8 (i,i) bands, i=1..4, σ=20. Panels: Option 1, Option 2, Option 3.]

[Figure: Comparing PSNR results for 8x8 undecimated DCT, sigma=20. Bars for Method 1, Method 2 and Method 3 on barbara, boat, fingerprint, house, lena and peppers256; PSNR axis from 27.5 to 33. Further panel labels: 8x8 UDCT σ=10 and 8x8 UDCT σ=20.]

The Role of Quantization Bins

[Figure: PSNR (30.5 to 35.5) versus number of bins (5 to 35) for barbara, boat, fingerprint, house, lena and peppers256; 8x8 UDCT, σ=10.]

The Role of Transform Used

[Figure: PSNR (32 to 36) on barbara, boat, fingerprint, house, lena and peppers256 for 3x3, 5x5, 7x7 and 9x9 DCT; σ=10.]


The Role of Training Image

[Figure: PSNR results, axis from 28 to 36.]

The Role of noise variance

[Figure: MFs for UDCT 8x8 (i,i) bands, i=2..6. Panels: σ=5, σ=10, σ=15, σ=20.]

• Observation: The obtained MFs for different noise variances are similar up to scaling:

$M_\sigma(v) = s\, M_{\sigma_0}(v / s)$, where $s = \sigma / \sigma_0$

Comparison between $M_{10}(v)$ and $0.5\, M_{20}(2v)$ for basis [2:4]×[2:4]
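For a piecewise linear MF stored as a lookup table, the scaling rule $M_\sigma(v) = s\, M_{\sigma_0}(v/s)$ with $s = \sigma / \sigma_0$ amounts to multiplying both the boundaries and the values by $s$. The sketch below checks this with `numpy.interp`. The table `(q20, p20)` is a stand-in soft-threshold curve rather than a trained MF, and `scale_mf` is a hypothetical helper.

```python
import numpy as np

def scale_mf(q0, p0, sigma0, sigma):
    """LUT for noise level sigma derived from the LUT (q0, p0) built at sigma0."""
    s = sigma / sigma0
    return s * np.asarray(q0), s * np.asarray(p0)

# Stand-in MF at sigma0 = 20: a soft-threshold curve sampled at 16 boundaries.
q20 = np.linspace(-100.0, 100.0, 16)
p20 = np.sign(q20) * np.maximum(np.abs(q20) - 40.0, 0.0)

q10, p10 = scale_mf(q20, p20, sigma0=20.0, sigma=10.0)     # s = 0.5

v = np.linspace(-60.0, 60.0, 7)
direct = np.interp(v, q10, p10)                 # M_10(v) from the scaled LUT
scaled = 0.5 * np.interp(2.0 * v, q20, p20)     # s * M_20(v / s)

print(direct)
print(scaled)
```

If the observation holds for trained MFs as well, a table trained at one noise level can be reused at another without retraining, to the extent that the similarity is accurate.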

Comparison with BLS-GSM

[Figure: PSNR (30 to 50) versus noise s.t.d. (1, 2, 5, 10, 15, 20, 25), one panel each for barbara, boat, fingerprint, house, lena and peppers.]

[Figure: PSNR (28 to 50) versus noise s.t.d. (1 to 25), proposed method versus GSM method.]

Other Degradation Models

JPEG Artifact Removal

Image Sharpening


Conclusions

• New and simple scheme for over-complete transform based denoising.

• MFs are optimized in a discriminative manner.

• Linear formulation of non-linear minimization.

• Eliminating the need for modeling complex statistical prior in high-dim. space.

• Seamlessly applied to other degradation problems as long as scalar MFs are used for reconstruction.

Conclusions – cont.

• Extensions:

– Filter-cascade based denoising.

– Multivariate MFs (activity level).

– Non-homogeneous noise characteristics.

• Open problems:

– What is the best transform for a given image?

– How to choose training images that form a faithful representation?

Thank You

[Backup figures: MSE for MF scaling from σ=10 to σ=20, shown as three maps over scale x and scale y (each from 0.5 to 2.5), with MSE levels between about 11 and 22.]

[Backup figure: the scale factor S plotted against sigma / 20, both axes from 0 to 1.5.]




[Backup figures: two PSNR bar charts comparing Traditional and Suggested MFs. The first covers barbara, boat, fingerprint, house, lena and peppers256 with a PSNR axis from 27.5 to 33; the second has a PSNR axis from 32.5 to 36.]

