Computational Algebraic Problems in
Variational PDE Image Processing
Tony F. Chan
Department of Mathematics, UCLA
International Summer School in Numerical Linear Algebra
Chinese University of Hong Kong
Reports: www.math.ucla.edu/~imagers
Supported by ONR, NIH and NSF
Collaborators
Peter Blomgren (Stanford)
Raymond Chan (CUHK)
Gene Golub (Stanford)
Pep Mulet (Valencia)
Jackie Shen (Minnesota)
Luminita Vese (UCLA)
Justin Wan (Waterloo)
C.K. Wong
Jamylle Carter (MSRI)
Berta Sandberg (TechFinity)
Ke Chen (Liverpool)
Xue-Cheng Tai (Bergen, Norway)
Jean-Francois Aujol (Cachan)
Selim Esedoglu (Michigan)
Fred Park (UCLA -> Michigan)
Outline of 4 Lectures
1. Introduction to PDE Image Models & Algorithms:
Denoising, deblurring, active contours, segmentation
2. Algebraic Problems from Denoising & Deblurring
Linear preconditioning techniques
Nonlinear + Optimization (duality) techniques
Multigrid techniques
3. Algebraic Problems from Active Contours/Segmentation
Curve evolution techniques
Direct optimization techniques
Goal of Lectures
• Broad overview rather than latest techniques
• Details on only a few topics --- see papers + reports + web
• No comprehensive referencing
• Limit to PDE aspects (there are non-PDE approaches; see forthcoming SIAM book by Hansen, Nagy, O’Leary)
• Please ask questions (English, Cantonese, Mandarin)
TYPICAL IMAGE PROCESSING TASKS
* Denoising/Inpainting * Object Detection/Identification
* Deblurring * Object/Pattern Recognition
* Enhancement * Gray-Scale vs Vector-Valued
* Compression * Still vs Video
* Segmentation * Registration
Related fields: Computer Graphics, Computer Vision.
IP and Applied Math
• Important applications: medical, astronomy, computer vision/computer graphics, …
• Math Models: standard or create your own
• Math Tools: harmonic analysis, PDEs, Bayesian statistics, differential geometry, CFD, multiscale, optimization, …
• Analysis of Models: existence, uniqueness, properties
• Challenging Computations: 3D + time + multi-components, nonlinearity, non-smoothness
Examples of PDE Image Models
Denoising and Inpainting
The Restoration Problem

A given observed image z is related to the true image u through the blur K and noise n:

  z = Ku + n

(Figure: initial blur; blur + noise.)

Inverse Problem: restore u, given K and statistics for n.
Ill-posed: needs proper regularization.
Keeping edges sharp and in the correct location is a key problem!
Total Variation Regularization

  TV(u) = ∫ |∇u| dx

• Measures “variation” of u, w/o penalizing discontinuities.
• |·| similar to Huber function in robust statistics.
• 1D: If u is monotonic in [a,b], then TV(u) = |u(b) – u(a)|, regardless of whether u is discontinuous or not.
• nD: If u = char fcn of D, then TV(u) = “surface area” of D.
• Coarea formula:  ∫ f |∇u| dx = ∫_ℝ ( ∫_{u=r} f ds ) dr
• Thus TV controls both size of jumps and geometry of boundaries.
• Extensions to vector-valued functions
• Color TV: Blomgren-C 98; Ringach-Sapiro, Kimmel-Sochen
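The 1-D monotonicity property is easy to check numerically. Below is a minimal NumPy sketch of a discrete isotropic TV (forward differences, replicated boundary); the function name and discretization are illustrative choices, not from the slides:

```python
import numpy as np

def total_variation(u):
    """Discrete isotropic TV(u) = sum over pixels of |grad u|,
    using forward differences with a replicated (Neumann) boundary."""
    ux = np.diff(u, axis=1, append=u[:, -1:])  # horizontal forward difference
    uy = np.diff(u, axis=0, append=u[-1:, :])  # vertical forward difference
    return np.sqrt(ux**2 + uy**2).sum()

# 1D property: for monotone profiles, TV(u) = |u(b) - u(a)|,
# whether or not u is discontinuous.
step = np.array([[0.0, 0.0, 1.0, 1.0]])
ramp = np.array([[0.0, 0.5, 0.75, 1.0]])
print(total_variation(step), total_variation(ramp))  # 1.0 1.0
```

Both profiles rise monotonically from 0 to 1, so both have TV = 1, jump or no jump.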
Total Variation Restoration

Regularization:

  TV(u) = ∫ |∇u| dx

Variational Model:

  min_u f(u) = α TV(u) + ½ ||Ku − z||²

Gradient flow:

  u_t = α ∇·( ∇u/|∇u| ) − K*(Ku − z),   ∂u/∂n = 0 on the boundary
        [anisotropic diffusion]  [data fidelity]

* First proposed by Rudin-Osher-Fatemi ’92.
* Allows for edge capturing (discontinuities along curves).
* TVD schemes popular for shock capturing.
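For the denoising case K = I, the gradient flow can be marched explicitly. A hedged sketch (the β-regularized gradient magnitude anticipates a later slide; α, β, Δt are illustrative values, not the slides'):

```python
import numpy as np

def tv_denoise_explicit(z, alpha=0.1, beta=0.01, dt=0.1, iters=150):
    """Explicit time marching for u_t = alpha*div(grad u/|grad u|) - (u - z),
    the TV gradient flow with K = I. |grad u| is regularized by beta."""
    u = z.astype(float).copy()
    for _ in range(iters):
        ux = np.diff(u, axis=1, append=u[:, -1:])
        uy = np.diff(u, axis=0, append=u[-1:, :])
        mag = np.sqrt(ux**2 + uy**2 + beta)
        # divergence: (negative) adjoint of the forward-difference gradient
        div = np.diff(ux / mag, axis=1, prepend=0.0) \
            + np.diff(uy / mag, axis=0, prepend=0.0)
        u += dt * (alpha * div - (u - z))
    return u
```

The explicit step restricts Δt: the effective diffusion coefficient is bounded by α/√β, so roughly Δt ≲ √β/(4α) here.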
Comparison of different methods for signal denoising & reconstruction
Denoising: TV vs H1
Inpainting: Generalized Restoration Models

Scratch removal, disocclusion, graffiti removal.
Examples of TV Inpaintings
Where is the Inpainting Region?
Unified TV Restoration & Inpainting Model

  J[u] = ∫_{E∪D} |∇u| dx dy + (λ/2) ∫_E |u − u⁰|² dx dy

Euler-Lagrange equation:

  −∇·( ∇u/|∇u| ) + λ_e (u − u⁰) = 0,   λ_e(z) = λ for z ∈ E, 0 for z ∈ D.

Here D is the inpainting region and E the surrounding region where the data u⁰ is kept.
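The same explicit flow with a spatially varying fidelity weight λ_e gives a sketch of the unified model: λ_e = 0 on the hole D, λ elsewhere. The hole geometry and parameters below are made up for illustration:

```python
import numpy as np

def tv_inpaint(z, hole, lam=1.0, beta=0.01, dt=0.02, iters=500):
    """Gradient flow for the unified model:
    u_t = div(grad u/|grad u|) - lam_e*(u - z), with lam_e = 0 on the hole D."""
    lam_e = np.where(hole, 0.0, lam)
    u = z.astype(float).copy()
    for _ in range(iters):
        ux = np.diff(u, axis=1, append=u[:, -1:])
        uy = np.diff(u, axis=0, append=u[-1:, :])
        mag = np.sqrt(ux**2 + uy**2 + beta)
        div = np.diff(ux / mag, axis=1, prepend=0.0) \
            + np.diff(uy / mag, axis=0, prepend=0.0)
        u += dt * (div - lam_e * (u - z))
    return u

# fill a 4x4 hole punched into a constant image
z = np.ones((16, 16)); hole = np.zeros_like(z, dtype=bool)
hole[6:10, 6:10] = True; z[hole] = 0.0
u = tv_inpaint(z, hole)
```

Inside D the fidelity term vanishes, so the TV diffusion alone propagates the surrounding values into the hole.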
Examples of PDE Image Models
Blind (unknown blur) and Non-blind Deblurring
- Blurring operator K usually ill-conditioned
- Need to solve systems with differential-convolution operator:
(Figure: original image; out-of-focus blur; blurred image.)
Deblurring
  −α ∇·( ∇u/|∇u| ) + K*Ku = K*z
Deblurring: TV vs H1
TV Blind Deconvolution C. + Wong (98)
* Variational Model:

  min_{u,k} f(u,k) = ½ ||k∗u − u⁰||² + α₁ TV(u) + α₂ TV(k)

* Alternating minimization (AM) algorithm:

  u^{n+1} = arg min_u f(u, k^n)
  k^{n+1} = arg min_k f(u^{n+1}, k)

* Algorithm gives a 1-parameter family of solutions, determined by the SNR.
TV Blind deconvolution
(Figures: recovered image and recovered blurring function for α₂ = 0, 1e-7, 1e-5, 1e-4; original image; out-of-focus and Gaussian blurred images, blind vs non-blind recovery.)
Examples of PDE Image Models
Active Contours & Segmentation
Features:
Automatically detects interior contours!
Works very well for concave objects
Robust w.r.t. noise
Detects blurred contours
The initial curve can be placed anywhere!
Allows for automatic changes of topology
Active Contour w/o Edges (C.-Vese 99)
Evolution of C | Objects Found
Europe nightlights: the model detects contours without gradient (cognitive contours).
MRI brain image
Motion Segmentation (Moelich, Chan 2004)
Olympic Blvd, LA(2 frames per sec)
UCLA(1 frame per sec)
Westwood Blvd
Motion determined by logical AND & OR on frame differences over 3 consecutive frames.
Designed for low frame rate videos.
Another extension of Chan-Vese 2001 model.
Implemented via level sets.
Other Related PDE Image Models

* Geometry driven diffusion (see book by Bart M. ter Haar Romeny 1994)
* Anisotropic diffusion (Perona-Malik 87)
* Fundamental IP PDE (Alvarez-Guichard-Lions-Morel 92)
* Affine invariant flow (Sapiro-Tanenbaum 93)
* TV + Textures (Meyer 2001, Osher-Vese 02, Osher-Sole-Vese 02, Osher-Sapiro-Vese 02, Chambolle 03)
  u_t = F(curvature(u)) |∇u|,   curvature(u) = ∇·( ∇u/|∇u| )   (fundamental equation)
  u_t = ∇·( f(|∇u|) ∇u )   (Perona-Malik)
  u_t = |∇u| (curvature(u))^{1/3}   (affine invariant flow)
Different Frameworks for Image Processing
Statistical/Stochastic Models:
Maximum Likelihood Estimation with uncertain data
Transform-Based Models:
Fourier/Wavelets --- process features of images (e.g. noise)
in transform space (e.g. thresholding)
Variational PDE Models:
Evolve image according to local derivative/geometric info,
e.g. denoising diffusion
Concepts are related mathematically:
Brownian motion – Fourier Analysis --- Diffusion Equation
Features & Advantages of PDE Imaging Models
* Use PDE concepts: gradients, diffusion, curvature, level sets
* Exploit sophisticated PDE and CFD (e.g. shock capturing) techniques
Restoration:
- sharper edges, less edge artifacts, often morphological
Segmentation:
- scale adaptivity, geometry-based, controlled regularity of boundaries, segments can have complex topologies
Newer, less well developed/accepted.
Combining PDE with other techniques.
Computational Challenges

* Size: large # of pixels, color and multi-channel, 3D, videos.
* TV(u) non-differentiable where |∇u| = 0; need numerical regularization:
    |∇u| → √(|∇u|² + β),  β > 0.
* High nonlinearity of ∇·( ∇u/|∇u| ).
* Ill-conditioning: K discretizes a compact operator ⇒ cond(K*K) large.
* Highly varying coefficients: 1/|∇u| = O(1/h) across edges.
* Need to precondition differential + convolution operators.
Some Books/Surveys for PDE Imaging

• Morel-Solimini 94: Variational Meths in Image Segmentation
• Romeny 94: Geometry Driven Diffusion in Computer Vision
• Alvarez-Morel 94: Acta Numerica (review article)
• IEEE Tran. Image Proc. 3/98, Special Issue
• J. Weickert 98: Anisotropic Diffusion in Image Processing
• G. Sapiro 2000: Geometric PDE & Imaging
• Aubert-Kornprost 2002: Math Aspects of Image Processing
• Osher-Fedkiw 2003: “Level Set Bible”
• Chan, Shen & Vese Jan 03, Notices of AMS (review)
• Paragios, Chen, Faugeras 2005: Collection of articles
• Chan-Shen 2005: Image Processing & Analysis
Available since Sept 05 (www.siam.org)
Outline of 4 Lectures
1. Introduction to PDE Image Models & Algorithms:
Denoising, deblurring, active contours, segmentation
2. Algebraic Problems from Denoising & Deblurring
Linear preconditioning techniques
Nonlinear + Optimization (duality) techniques
Multigrid techniques
3. Algebraic Problems from Active Contours/Segmentation
Curve evolution techniques
Direct optimization techniques
Restoration Problem in Discrete Algebraic Form

Continuous model and gradient flow:

  min_u f(u) = α TV(u) + ½ ||Ku − z||²
  u_t = α ∇·( ∇u/|∇u| ) − K*(Ku − z)

Discrete objective (denoising case K = I):

  f(u) = α Σ_{i=1}^N ||A_i^T u||₂ + ½ ||u − z||₂²,

where A_i^T denotes the discrete gradient at pixel i (and A_i the discrete divergence).

Gradient:

  g(u) = α Σ_{i=1}^N A_i A_i^T u / ||A_i^T u|| + (u − z) = 0

Gradient Flow:  u_t = −g(u).

Morphologically scaled flow (Marquina-Osher, next slide):

  u_t = |∇u| [ α ∇·( ∇u/|∇u| ) − K*(Ku − z) ]
Morphological Diagonal Scaling (Marquina, Osher 2000)
Two different but related motivations:
- morphological evolution of level sets --- moves in direction of normal with speed proportional to curvature, independent of contrast.
- diagonally scaled Richardson stationary iteration.
Advantages:
- cost per time step similar to time marching
- much faster convergence to steady state
Unconstrained Modular Solver for Discrepancy Principle
Blomgren-C. 96

Discrepancy Principle Constrained Problem:

  min_u R(u)  subject to  ½ ||Ku − z||² = σ²

Tikhonov Unconstrained Problem:

  min_u f(u) = ½ ||Ku − z||² + α R(u)

Modular Solver: efficient solver u ← S(α, u) for fixed α (e.g. Time Marching, Fixed Point, Primal-Dual).
- Can make use of S to solve the constrained problem efficiently.
- Based on block elimination + Newton’s method for the constrained problem, via calls to S for computing directional derivatives.
A Modular Newton’s Method

System of nonlinear equations: G(u, α) = 0; N(u, α) = 0.

Newton’s Method:

  [ G_u  G_α ] [ δu ]     [ G ]
  [ N_u  N_α ] [ δα ] = − [ N ]

Block elimination: define w = −G_u⁻¹ G,  v = G_u⁻¹ G_α; then

  δα = −(N_α − N_u v)⁻¹ (N_u w + N),   δu = w − v δα.

Main idea: replace computation of w & v by calls to S(u, α).
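The block-elimination formulas can be exercised on a toy scalar system standing in for G and N; the particular G and N below are invented purely to show the mechanics (they are not the TV equations):

```python
def modular_newton(u, a, iters=20):
    """Block-elimination Newton on a toy system standing in for the slide's
    G(u, a) = 0 (state equation) and N(u, a) = 0 (constraint):
        G(u, a) = u^2 + a - 2,   N(u, a) = u - a.
    Mirrors: w = -G_u^{-1} G, v = G_u^{-1} G_a,
             da = -(N_a - N_u v)^{-1} (N_u w + N),  du = w - v da."""
    for _ in range(iters):
        G, Gu, Ga = u * u + a - 2.0, 2.0 * u, 1.0
        N, Nu, Na = u - a, 1.0, -1.0
        w = -G / Gu                 # w = -G_u^{-1} G
        v = Ga / Gu                 # v =  G_u^{-1} G_a
        da = -(N + Nu * w) / (Na - Nu * v)
        du = w - v * da
        u, a = u + du, a + da
    return u, a

print(modular_newton(2.0, 0.0))  # converges to u = a = 1
```

Each step solves the full 2×2 Newton system using only G_u-solves, which is what the modular calls to S provide in the real solver.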
Robustness of Modular Solver
Efficiency of Modular Solver for Constrained Problem
Outline of 4 Lectures
1. Introduction to PDE Image Models & Algorithms:
Denoising, deblurring, active contours, segmentation
2. Algebraic Problems from Denoising & Deblurring
Linear preconditioning techniques
Nonlinear + Optimization (duality) techniques
Multigrid techniques
3. Algebraic Problems from Active Contours/Segmentation
Curve evolution techniques
Direct optimization techniques
Difficulties with Primal TV

• TV norm non-differentiable ⇒ regularize functional:
    TV_β(u) = ∫ √(|∇u|² + β) dx
• Primal: gradient flow also needs regularization
• Problem becomes difficult for small β
• But edges smeared for large β
• Artificial time marching at best linearly convergent
• Nonlinear relaxation (e.g. GS) non-convergent w/o regularization
• Ill-conditioning due to spatial scales (CFL; MG)
Towards Quadratically Convergent Methods

Newton’s Method [R. Chan, T. Chan, Zhou, ‘95]
• As β → 0, size of domain of convergence → 0
• Remedy: continuation on β.
• But efficient continuation not easy to obtain!
Primal-Dual Method (Chan, Golub, and Mulet ‘95)

Introduce auxiliary (dual) variable w := ∇u / |∇u|; note |w| = 1.
Linearize the (w,u)-system instead of the u-system.

• Similar to primal-dual (Conn & Overton ’95, Anderson ’95)
• w = ∇u/|∇u| is normal to the level sets of u; ∇·w = curvature of level sets
• Time marching regarded as curvature driven flow
• Better global convergence of Newton’s method for the (w,u)-system:
  the (w,u)-system is more “globally linear” than the u-system
Why Linearization Works Well
• Linearization of u-system
• Linearization of (w,u)-system
• Similar Structure and Cost for Both Systems
Linearized Systems
CGM Method: Convergence Results
Residual vs. Iterations ||un-utrue|| vs. Iterations
Relatively robust w.r.t. β; but iteration # increases as β → 0.
Introducing Primal-Dual (J. Carter ’01)

TV Model: rewrite by introducing dual variable w:
Swap inf & sup (G strictly convex in u and concave (linear) in w):
For each w, solve inf_u G(u, w):
Back-substitute u, and use:

Primal-Dual ⇒ Dual

Problem reduces to the dual problem:
Optimality Conditions (discrete case):
Complementarity:
Update for u:
Objective quadratic; but many constraints
Potential Difficulties of the Dual Problem

• Many constraints
• Use standard constrained optimization methods:
  – Barrier methods [Carter, Vandenberghe ‘02]
  – Interior point methods?
  – Penalty methods?
• But need to estimate algorithmic parameters
• Other related ideas:
  – Second order cone programming [Yin, Goldfarb, Osher]
  – Graph cut [Zabih, Boykov, Kolmogorov, …]
Chambolle’s Key Observation (‘04)

Optimality conditions from dual TV formulation:
Key observation:
Complementarity:
⇒ Lagrange multipliers eliminated!

Thus the Lagrangian simplifies, and the problem can be solved via semi-implicit gradient descent (Chambolle), which reduces to an explicit scheme.
Convergence of Scheme

Theorem (Chambolle):
• Iterates decrease the energy
• Globally convergent for any initial p
• Convergence: at best linear
• No regularization parameter β needed
• Empirically: # iter ~ # pixels (2-D)
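Chambolle's dual iteration is short enough to write out in full. A sketch for min_u TV(u) + (1/2λ)||u − g||², with forward-difference gradient, matching adjoint divergence, τ = 1/8, and an illustrative λ:

```python
import numpy as np

def grad(u):
    """Forward-difference gradient, replicated boundary (last diff = 0)."""
    return (np.diff(u, axis=1, append=u[:, -1:]),
            np.diff(u, axis=0, append=u[-1:, :]))

def div(px, py):
    """Discrete divergence: negative adjoint of grad above."""
    return np.diff(px, axis=1, prepend=0.0) + np.diff(py, axis=0, prepend=0.0)

def chambolle_tv(g, lam=0.2, tau=0.125, iters=200):
    """Chambolle ('04): solve min_u TV(u) + (1/(2*lam))||u - g||^2 via
    p <- (p + tau*grad(div p - g/lam)) / (1 + tau*|grad(div p - g/lam)|);
    the denoised image is u = g - lam * div p."""
    px = np.zeros_like(g, dtype=float)
    py = np.zeros_like(g, dtype=float)
    for _ in range(iters):
        ex, ey = grad(div(px, py) - g / lam)
        mag = np.sqrt(ex**2 + ey**2)
        px = (px + tau * ex) / (1.0 + tau * mag)
        py = (py + tau * ey) / (1.0 + tau * mag)
    return g - lam * div(px, py)
```

Note that no β-smoothing of |∇u| appears anywhere: the normalization in the p-update is well defined even where the gradient vanishes.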
ROF Dual: Residual vs # Iterations

(Figure: observed image; residuals for Time Marching (TM) vs Chambolle Dual (CD) at 250 and 1500 iterations.)
# Iterations versus λ

(Figure: # iterations grows from ~10² to ~10⁴ as λ ranges over 0-800.)
# Iterations versus Image Size

(Figure: # iterations, ~10²-10⁴, versus image size, 10³-10⁶ pixels.)
Non-Smooth Newton Methods

• Recent works by M. Ng, Qi 2005
• Chambolle’s equation is non-differentiable
• Non-smooth Newton: uses sub-gradients near singularities
• Superlinear convergence is achievable in theory
Outline of 4 Lectures
1. Introduction to PDE Image Models & Algorithms:
Denoising, deblurring, active contours, segmentation
2. Algebraic Problems from Denoising & Deblurring
Linear preconditioning techniques
Nonlinear + Optimization (duality) techniques
Multigrid techniques
3. Algebraic Problems from Active Contours/Segmentation
Curve evolution techniques
Direct optimization techniques
Time Marching vs Fixed Point

• In image denoising, the TV regularization approach leads to (1-D):

    α ( u_x/|u_x| )_x − (u − u⁰) = 0.

• In the fixed-point iteration, one fixes |u_x| = |u_x^n| and needs to solve:

    α ( u_x^{n+1}/|u_x^n| )_x − (u^{n+1} − u⁰) = 0.

• Apply 1 step of Richardson with relaxation parameter Δt & preconditioner B:

    u^{n+1} = u^n + Δt B⁻¹ [ α ( u_x^n/|u_x^n| )_x − (u^n − u⁰) ].
Time Marching vs Fixed Point (cont.)

• B⁻¹ = I gives the time marching scheme of Rudin-Osher-Fatemi (92):

    (u^{n+1} − u^n)/Δt = α ( u_x^n/|u_x^n| )_x − (u^n − u⁰).

• B⁻¹ = |u_x^n| gives the time marching scheme of Marquina-Osher (00), i.e. diagonally preconditioned Richardson:

    (u^{n+1} − u^n)/Δt = |u_x^n| [ α ( u_x^n/|u_x^n| )_x − (u^n − u⁰) ].

• In general, one can choose other B, e.g. multigrid, to speed up convergence.
• If 1 pre- & 1 post- GS smoothing are used, 1 MG cycle ≈ 4 time marching steps.
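In 1-D the exact fixed point (each linearized system solved fully, here by a dense solve standing in for MG) fits in a few lines; α, β, and the assembly details are illustrative:

```python
import numpy as np

def fp_lagged_diffusivity(z, alpha=1.0, beta=1e-3, iters=30):
    """1-D lagged-diffusivity fixed point: freeze |u_x| at u^n and solve the
    linear system  -alpha (u_x^{n+1}/|u_x^n|)_x + u^{n+1} = z  exactly."""
    z = np.asarray(z, dtype=float)
    n = z.size
    u = z.copy()
    for _ in range(iters):
        d = alpha / np.sqrt(np.diff(u) ** 2 + beta)   # edge diffusivities
        A = np.eye(n)                                  # identity: fidelity term
        for i in range(n - 1):                         # assemble -alpha (d u_x)_x
            A[i, i] += d[i];         A[i, i + 1] -= d[i]
            A[i + 1, i + 1] += d[i]; A[i + 1, i] -= d[i]
        u = np.linalg.solve(A, z)
    return u
```

Each outer iteration here costs a full linear solve; the inexact variant on the next slides replaces it by a single MG V-cycle.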
Comparison of Preconditioners
Inner iteration: 1 Richardson step with diag or MG preconditioner
Nonlinear Iteration: TM vs FP
Exact FP vs Inexact FP
Inexact FP: 1 MG V-cycle; Exact FP: ~10 MG V-cycles
A demonstration: video of an X-ray of a hand.
Multigrid for Differential+Convolution Problems
-Linear MG, V-cycles, various smoothers (R. Chan, C., Wan 97)
> works well for I, Laplacian but not as well for TV.
> difficult to find good smoother for diff-conv problems
> spectral properties of diff and conv operators “flipped”.
-Multilevel Additive Schwarz (Hanke, Vogel 98)
> 2 grid levels, projection of diff-conv operator directly to
a coarse grid.
> coarse problem dense, solved using direct method.
  −α ∇·( ∇u/|∇u| ) + K*Ku = K*z
Spectrum of −αΔ | Spectrum of −αΔ + K*K

(Figure: eigenvectors for the smallest, middle and largest eigenvalues, for α = 1, 10⁻⁴, 10⁻⁸.)
Richardson Smoothing

(Figure: initial error and error after 1, 5, 10 iterations, for α = 1, 10⁻⁴, 10⁻⁸.)
Outline
1. PDE Image Models:
Denoising, deblurring, active contours, segmentation
2. Algebraic Problems from Denoising & Deblurring
Nonlinear + Optimization techniques
Linear preconditioning techniques
3. Algebraic Problems from Active Contours/Segmentation
Features:
Automatically detects interior contours!
Works very well for concave objects
Robust w.r.t. noise
Detects blurred contours
The initial curve can be placed anywhere!
Allows for automatic changes of topology
Active Contour w/o Edges (C.-Vese 99)
Evolution of C | Objects Found
  inf_{c₁,c₂,C} F(c₁,c₂,C) = μ·Length(C) + ν·Area(inside(C))
      + λ₁ ∫_{inside(C)} |u⁰ − c₁|² dx dy + λ₂ ∫_{outside(C)} |u⁰ − c₂|² dx dy
An Active Contour model “without edges”
Fitting + Regularization terms (length, area)
Connection with Segmentation:
Active contour model partitions the image into 2 segments –
inside and outside.
Level Sets (Osher - Sethian ‘87)

  C = { (x,y) : φ(x,y) = 0 }
  Inside C: φ > 0.  Outside C: φ < 0.

  Normal:  n = ∇φ / |∇φ|
  Curvature:  κ = div( ∇φ / |∇φ| )

* Allows automatic topology changes, cusps, merging and breaking.
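The normal and curvature formulas can be checked on the signed distance function of a circle, whose level sets have curvature 1/ρ at radius ρ. A sketch with central differences (np.gradient, unit spacing):

```python
import numpy as np

def level_set_curvature(phi):
    """Curvature of level sets: div(grad phi / |grad phi|),
    central differences on a unit grid."""
    py, px = np.gradient(phi)         # np.gradient returns (d/dy, d/dx)
    mag = np.sqrt(px**2 + py**2) + 1e-12
    nx, ny = px / mag, py / mag       # unit normal n = grad phi / |grad phi|
    _, nx_x = np.gradient(nx)         # d(nx)/dx
    ny_y, _ = np.gradient(ny)         # d(ny)/dy
    return nx_x + ny_y

n = 64
y, x = np.mgrid[0:n, 0:n] - n // 2
phi = np.sqrt(x**2 + y**2) - 10.0     # zero level set: circle of radius 10
kappa = level_set_curvature(phi)
```

At a point on the zero level set (radius 10) the computed κ is ≈ 0.1, as expected.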
Main Evolutionary Equation in Active Contour Model

  φ_t = δ(φ) [ μ div( ∇φ/|∇φ| ) − λ₁ (u⁰ − c₁)² + λ₂ (u⁰ − c₂)² ],
  φ(0,x,y) = φ₀(x,y).

(A variant replaces δ(φ) by |∇φ|.)

Possible Approaches:
- Time marching using explicit or implicit schemes
- Solve for steady state directly
- Linearize curvature term using fixed point:  div( ∇φ^{n+1} / |∇φ^n| )
- Use pointwise relaxation for the linear systems
Fast Algorithms for Level Set Segmentation
• Implicit methods (CV ’01): allow larger time steps
• Multigrid (Fedkiw, C, Kang, Vese ’01, Tsai, Willsky, Yezzi ‘00): – interpolate LSF from coarse grid as initial guess for
fine grid
• Direct Optimization (Song-C ’02): – sweep through pixels, decide pixel’s region
membership by value of energy functional.
Two Linear Schemes (Fixed Point)

Evolutionary iterative scheme:

  (φ^{n+1} − φ^n)/Δt = δ_ε(φ^n) [ μ div( ∇φ^{n+1}/|∇φ^n| ) − λ₁ (u⁰ − c₁(φ^n))² + λ₂ (u⁰ − c₂(φ^n))² ]

Because our approximation δ_ε(φ) is strictly positive everywhere, the steady state also satisfies

  0 = μ div( ∇φ/|∇φ| ) − λ₁ (u⁰ − c₁)² + λ₂ (u⁰ − c₂)²,

which suggests the stationary iterative scheme:

  0 = μ div( ∇φ^{n+1}/|∇φ^n| ) − λ₁ (u⁰ − c₁(φ^n))² + λ₂ (u⁰ − c₂(φ^n))²
Typically, we use only 1 step of Gauss-Seidel relaxation for φ^{n+1}.
Evolutionary Scheme (CPU time = 59.13 sec)
0 It 50 It 500 It 1000 It 2000 It 4000 It
Stationary Scheme (CPU time = 0.63 sec)
0 It 10 It 20 It 30 It 40 It 50 It
Comparison of the evolutionary/stationary schemes
(Parameters: Δt = 0.1, h = 1; μ = 0.01·255², ν = 0, λ₁ = λ₂ = 1.)
Multigrid Ideas For Active Contours
- Use MG for solving the linear systems arising in evolution
- Use MG for solving nonlinear steady state equations
- Use full MG to obtain better initial guess for curve:
> Down-sample image to lower resolution
> Solve active contour problem on low resolution image
> Interpolate level set function to fine resolution image.
> Gives a smooth, good approximation from the low resolution solve.
> Evolution on fine resolution image picks up details.
Refs: Tsai, Willsky, Yezzi 2000, C., Fedkiw, Kang, Vese 2000
Original image
256x171
32x22 64x43
128x86 256x171
Multigrid for Active Contours
Animation of Multigrid Active Contours
4 levels: 256x171, 128x86, 64x43, 32x22
Fast Direct Search Algorithm (Song-C ’02)

Insight: segmentation only needs the sign of the LSF, not its value.

1. Initialization. Partition the domain into Ω₁ = {φ > 0} and Ω₂ = {φ < 0}.
2. Advance. For each point x in the domain, if the energy F is lower when we change φ(x) to −φ(x), then flip the point. (F can be updated fast.)
3. Repeat step 2 until the energy F remains unchanged.

(Related to the K-means algorithm, and the “region merging” algorithm of Koepfler, Lopez, Morel ’94.)
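For a clean 2-phase image the sweep (without the length term) is a few lines; running sums make each proposed flip O(1), which is the "can be updated fast" remark. Names and the sweep cap are illustrative:

```python
import numpy as np

def direct_search(img, phi, sweeps=10):
    """Song-Chan-style direct search, no length term: flip phi(x) -> -phi(x)
    whenever it lowers F = sum_in (u0-c1)^2 + sum_out (u0-c2)^2.
    Up to an additive constant, F = -s1^2/n1 - s2^2/n2 for region sums s and
    counts n, so each candidate flip is evaluated from four scalars."""
    phi = phi.copy()
    n, s = img.size, float(img.sum())
    n1 = int((phi > 0).sum()); s1 = float(img[phi > 0].sum())

    def F(k, t):                      # energy minus the constant sum of v^2
        e = 0.0
        if k:     e -= t * t / k
        if n - k: e -= (s - t) ** 2 / (n - k)
        return e

    for _ in range(sweeps):
        changed = False
        for idx in np.ndindex(img.shape):
            v = float(img[idx])
            if phi[idx] > 0: k2, t2 = n1 - 1, s1 - v   # flip: leave region 1
            else:            k2, t2 = n1 + 1, s1 + v   # flip: join region 1
            if F(k2, t2) < F(n1, s1) - 1e-12:
                phi[idx] = -phi[idx]
                n1, s1 = k2, t2
                changed = True
        if not changed:
            break
    return phi
```

Because the global means enter F only through the region sums and counts, the exact global energy change of a local flip is available locally, which is why finite-step convergence is possible at all.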
A 2-phase example
(a), (b), (c), (d) are four different initial conditions. All of them converge in one sweep!
Example with Noise
Converged in 4 steps.
(Gradient Descent on Euler-Lagrange took > 400 steps.)
Convergence of the algorithm
Theorem: For 2-phase images, algorithm (w/o length term) converges in 1 sweep, independent of sweeping order.
Why is 1-step convergence possible?
Problem is global: usually cannot have finite step convergence based on local updates only
But, in our case, we can exactly calculate the global energy change via local update (can update global average locally)
Application to Piecewise Linear CV Model (Vese 2002)

  F(H(φ), c₁, c₂) = μ |∇H(φ)| + λ₁ |u⁰ − (a₀ + a₁x + a₂y)|² H(φ)
      + λ₂ |u⁰ − (b₀ + b₁x + b₂y)|² (1 − H(φ))

(Figure: original; piecewise constant fit, converged in 4 steps; piecewise linear fit, converged in 6 steps.)
Other Fast Algorithms for Level Set Segmentation
• Narrow Band (see Osher-Fedkiw ’03): – only solve PDE near zero LS
• Operator Splitting (Gibou-Fedkiw ’02): – split length term (nonlinear diffusion) from fidelity term (optimized via k-means).
• Threshold Dynamics (Esedoglu & Tsai ’04, +Ruuth ‘05):– Extends Merriman, Bence, Osher ’92 diffusion generated motion
by mean curvature to MS segmentation. Alternates diffusion with thresholding.
– Operator split phase field formulation of Mumford-Shah functional