
OPTIMAL DESIGN IN MEDICAL INVERSION

Lior Horesh, Eldad Haber, Luis Tenorio

IBM TJ Watson Research Center - Toronto, Canada - June 2011

INTRODUCTION

• Aim: infer model $m$

• Given

• Design parameters $y$

• Measurements $d$

• Observation model $F(m; y)$

• Naïve inversion ... Fails...

• Cast as an optimization problem

EXPOSITION - INVERSE PROBLEMS

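Symbols: design $y$, model $m$, data $d(m; y)$

$$F(m; y) + \eta = d(m; y)$$

$$\hat{m} = \arg\min_m \Big\{ \underbrace{\|F(m; y) - d(y)\|_2^2}_{\text{data fit}} + \underbrace{S(m)}_{\text{regularization}} \Big\}$$

$$\|\hat{m} - m\|_2 = \mathcal{O}(1)$$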
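A minimal sketch of why naive inversion fails while the regularized estimate stays bounded. Everything specific here is an assumption for illustration, not taken from the slides: the observation model is a linear 1-D Gaussian blur (`blur_matrix`), the regularizer is $S(m) = \alpha \|m\|_2^2$, and the noise level is 1e-3.

```python
import numpy as np

def blur_matrix(n, width=2.0):
    """Hypothetical linear observation model: a row-normalized 1-D Gaussian blur."""
    x = np.arange(n)
    F = np.exp(-0.5 * ((x[:, None] - x[None, :]) / width) ** 2)
    return F / F.sum(axis=1, keepdims=True)

def tikhonov(F, d, alpha):
    """m_hat = argmin_m ||F m - d||_2^2 + alpha * ||m||_2^2, via the normal equations."""
    return np.linalg.solve(F.T @ F + alpha * np.eye(F.shape[1]), F.T @ d)

rng = np.random.default_rng(0)
n = 64
F = blur_matrix(n)
m_true = np.zeros(n)
m_true[20:30] = 1.0                              # a simple box-shaped model
d = F @ m_true + 1e-3 * rng.standard_normal(n)   # data = F(m) + eta

m_naive = np.linalg.solve(F, d)                  # naive inversion: amplifies the noise
m_reg = tikhonov(F, d, alpha=1e-3)               # regularized inversion

err_naive = np.linalg.norm(m_naive - m_true)
err_reg = np.linalg.norm(m_reg - m_true)
print(f"naive error: {err_naive:.3e}, regularized error: {err_reg:.3e}")
```

The blur matrix is invertible but badly conditioned, so even small noise dominates the naive solution; the penalty term trades a little bias for a large reduction in noise amplification.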

HOW TO IMPROVE MODEL RECOVERY ?

• How can we ...

• Improve the observation model $F$ ?

• Use more meaningful a-priori information $S$ ?

• Extract more information in the measurement procedure (design $y$) ?

• Provide more efficient optimization schemes ($\arg\min$) ?

$$\hat{m} = \arg\min_m \Big\{ \|F(m; y) - d(y)\|_2^2 + S(m) \Big\}$$

PART I - REGULARIZATION DESIGN

REGULARIZATION DESIGN - BACKGROUND

REGULARIZATION APPROACHES

• Why is regularization necessary ?

• Imposes a-priori information

• Stabilizes the inversion process

• Provides a unique solution

• Two approaches

• Explicit

• Sparse representation

$$\hat{m} = \arg\min_m \Big\{ \|F(m; y) - d(y)\|_2^2 + S(m) \Big\}$$

HOW TO REPRESENT SPARSELY ?

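• Principle of parsimony → the true model can be represented by a small number of parameters

• Each column of the over-complete dictionary $D$ is a prototype model ⇔ atom

• Sparse representation vector $u$

$$m = D u$$

[Figure: standard representation $m = Du$ with over-complete dictionary $D$ and sparse vector $u$ (entries $u_i$); dimension labels $l$ and $p$]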

SPARSE REPRESENTATION

• Ideally, the sparsest solution is achieved by an $\ell_0$-'norm' penalty

• Non-convex → NP-hard combinatorial problem

• Instead employ the $\ell_1$-norm (Donoho 2006)

$$\hat{u} = \arg\min_u \|F(D u; y) - d(y)\|_2^2 + \alpha \|u\|_1$$

$$\hat{m} = \arg\min_m \Big\{ \underbrace{\|F(m; y) - d(y)\|_2^2}_{\text{data misfit}} + \underbrace{S(m)}_{\text{regularization}} \Big\}$$
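The $\ell_1$-penalized problem can be sketched with iterative soft thresholding (ISTA). This is an illustrative solver chosen for brevity, not the method used in the talk. The linear operator `F`, the random dictionary `D`, the sizes and the value of `alpha` are all assumptions.

```python
import numpy as np

def soft_threshold(x, tau):
    """Proximal operator of tau * ||.||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def objective(A, d, alpha, u):
    """||A u - d||_2^2 + alpha * ||u||_1."""
    return np.sum((A @ u - d) ** 2) + alpha * np.sum(np.abs(u))

def ista(A, d, alpha, n_iter=500):
    """Minimize ||A u - d||_2^2 + alpha * ||u||_1 by iterative soft thresholding."""
    lip = 2.0 * np.linalg.norm(A, 2) ** 2        # Lipschitz constant of the data-fit gradient
    u = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * A.T @ (A @ u - d)           # gradient of the data-fit term
        u = soft_threshold(u - grad / lip, alpha / lip)
    return u

rng = np.random.default_rng(1)
l, p, n_data = 40, 80, 30                        # model size, number of atoms, number of data
D = rng.standard_normal((l, p))
D /= np.linalg.norm(D, axis=0)                   # unit-norm atoms (columns of D)
F = rng.standard_normal((n_data, l)) / np.sqrt(n_data)   # hypothetical linear operator
u_true = np.zeros(p)
u_true[[3, 17, 42]] = [1.5, -2.0, 1.0]           # sparse representation vector
d = F @ (D @ u_true) + 1e-3 * rng.standard_normal(n_data)

A = F @ D                                        # composite operator u -> F(D u)
alpha = 0.05
u_hat = ista(A, d, alpha)
m_hat = D @ u_hat                                # recovered model m = D u
print("nonzeros in u_hat:", np.count_nonzero(u_hat))
print("objective:", objective(A, d, alpha, u_hat))
```

With step size `1 / lip` the objective cannot increase from one iteration to the next, which makes ISTA a convenient baseline even though faster schemes exist.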

SPARSE REPRESENTATION PERFORMANCE

[Figure panels: model $m$, data $d$, PSF; Total Variation vs. Sparse Representation reconstructions; labels: Smoothness, Energy]

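SPARSE REPRESENTATION PERFORMANCE - DIFFERENT OPERATORS

[Figure panels: true model $m_T$; data from two operators, $F_1(m) + \eta = d_1$ and $F_2(m) + \eta = d_2$; reconstructions $Du_1$ and $Du_2$; further panels labeled $d$, $F$, $D_1 u_1$, $D_2 u_2$]

Fadili et al 2007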

SPARSE REPRESENTATION PERFORMANCE - DIFFERENT DICTIONARIES

• Lanczos Hybrid Bidiagonalization Regularization (HyBR), Chung, Nagy, O’Leary 2006: singular vectors

• Gradient Projection Sparse Representation (GPSR), Figueiredo, Nowak, Wright 2007: wavelets

[Figure panels: model $m$ and data $d$, with the reconstruction from each method]

IMPLICIT REGULARIZATION - RATIONALE

• Sometimes sparse representation performs well, sometimes not... Why?

• Model and operator dependent

• Some dictionaries perform better than others for specific problems

• $D$ should be chosen such that it sparsifies the representations

• One approach: choose $D$ from a known set of transforms (steerable wavelets, curvelets, contourlets, bandlets, singular vectors…)

• Local DCT: locally oscillatory, stationary texture

• Wavelets: piecewise smooth, isotropic structures

• Curvelets: piecewise smooth with $C^2$ contours

• Objective vs. subjective function

• Heuristic choice of regularization functional based on ad-hoc assumptions

• Solutions are intrinsically subjective to the regularization functional choice

• Adaptability - account for the problem’s statistics (model, operator and noise)

• Efficiency and precision - use the right jargon to express a message/model

IMPLICIT REGULARIZATION BY DICTIONARY DESIGN

How to construct more objective regularization functionals ?

Design a dictionary by learning from authentic examples $\{m_1, \dots, m_s\}$

DICTIONARY DESIGN - PREVIOUS WORK

• Approximated Maximum Likelihood (Olshausen & Field 1996, 1997)

• Overcomplete ICA (Lewicki 2000)

• Method of Optimal Directions (Engan et al 2001, 2005)

• Sparse Bayesian Learning (Girolami 2001, Wipf 2005)

• FOCUSS (Delgado et al 2003) - Bayesian MAP & relative complexity

• K-SVD (Aharon & Elad 2006)

• FOCUSS+ (Murray & Delgado 2007)

But

• All addressed sparse coding → the observation operator was the identity, $F = I$

REGULARIZATION LEARNING – STATISTICAL MERIT

DICTIONARY LEARNING - OPTIMALITY CRITERION

• Loss

$$\mathcal{L}(m, D) := \|D\,\hat{u}(D, d(m), \eta) - m\|_2^2$$

← Depends on the noise $\eta$

← Depends on an unknown model $m$

• Mean Square Error

$$\mathrm{MSE}(m, D) := \mathbb{E}_\eta \|D\,\hat{u}(D, d(m), \eta) - m\|_2^2$$

← Depends on an unknown model $m$

DICTIONARY LEARNING - OPTIMALITY CRITERION

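• Bayes risk

$$\mathcal{R}_{\text{true}}(D) := \mathbb{E}_{m, \varepsilon} \|D\,\hat{u}(m, D) - m\|_2^2$$

← Computationally infeasible

• Bayes empirical risk

• Assume a set of feasible authentic model examples $\{m_1, \dots, m_s\}$ is available

$$\mathcal{R}_{\text{empirical}}(D) := \frac{1}{s}\sum_{i=1}^{s} \mathbb{E}_\eta \|D\,\hat{u}_i(m_i, D) - m_i\|_2^2$$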

REGULARIZATION LEARNING – OPTIMIZATION FRAMEWORK

OVER-COMPLETE DICTIONARY DESIGN - FORMULATION

• Bi-level optimization problem

$$\hat{D} = \arg\min_D \frac{1}{s}\sum_{i=1}^{s} \mathbb{E}_\eta \|D\,\hat{u}_i(m_i, D) - m_i\|_2^2$$

$$\text{s.t.}\quad \hat{u}_i = \arg\min_{u_i} \|F(D u_i; y) - d_i(y)\|_2^2 + \alpha \|u_i\|_1$$

• Non-smooth $\ell_1$-norm is replaced by a smooth optimization problem with inequality constraints

$$u = p - q, \qquad p, q \ge 0$$

• Sensitivity by differentiating the necessary conditions of the decomposition

$$\frac{\partial p_I}{\partial D_k} = f(F, D, p), \qquad \frac{\partial q_J}{\partial D_k} = g(F, D, p)$$

• Non-smooth optimization framework → Modified L-BFGS (Overton 2003)

Horesh & Haber 2009

REGULARIZATION DESIGN – NUMERICAL RESULTS

DICTIONARY DESIGN - TRAINING SET

Horesh & Haber 2009

DICTIONARY DESIGN - COMPARISON

[Figure panels: true model $m_T$; data $F(m_T) + \eta = d$; reconstructions $D_0 u_0$, $m_{TV}$ and $D_t u_t$]

Horesh & Haber 2009

DICTIONARY LEARNING – ASSESSMENT WITH NOISE

[Figure: data $F(m) + \eta = d$ at noise levels 0.1%, 1% and 5%; panels $m_{true}$, $D_0 u_0$, $m_{TV}$ and $D_t u_t$]

Horesh & Haber 2009

PART II - OPTIMAL EXPERIMENTAL DESIGN

OPTIMAL EXPERIMENTAL DESIGN - MOTIVATION

MOTIVATION – LIMITED ANGLE TOMOGRAPHY

Clerbout 2000

MOTIVATION – DIFFUSE OPTICAL TOMOGRAPHY

MOTIVATION – ULTRASOUND IMAGING

Clerbout 2000

DESIGN EXPERIMENTAL LAYOUT

Stonehenge, 2500 B.C.

DESIGN EXPERIMENTAL PROCESS

Galileo Galilei, 1564-1642

RESPECT EXPERIMENTAL CONSTRAINTS…

French nuclear test, Mururoa, 1970

OPTIMAL EXPERIMENTAL DESIGN - BACKGROUND

• Previous work

• Well-posed problems - well established (Fedorov 1997, Pukelsheim 2006 )

• Ill-posed problems - under-researched (Curtis 1999, Bardow 2008 )

• Many practical problems in engineering and sciences are ill-posed (under-determined)

What makes non-linear ill-posed problems so special ?

ILL VS. WELL-POSED OPTIMAL EXPERIMENTAL DESIGN

$$\hat{m} = \arg\min_m \Big\{ \|F(m; y) - d(y)\|_2^2 + S(m) \Big\}$$

OPTIMALITY CRITERIA IN OVER-DETERMINED PROBLEMS

• For linear inversion, employ Tikhonov regularized least squares solution

$$\hat{m} = \big(\underbrace{J^\top J + \alpha L^\top L}_{C(y)}\big)^{-1} J^\top d, \qquad J \equiv \frac{\partial F(m, y)}{\partial m}$$

• Bias - variance decomposition

$$\mathbb{E}\,\|\hat{m} - m\|_2^2 = \underbrace{\sigma^2\, \mathrm{trace}\big(J\, C(y)^{-2} J^\top\big)}_{\text{variance}} + \underbrace{\alpha^2 \|C(y)^{-1} L^\top L\, m\|_2^2}_{\text{bias}}$$

• For over-determined problems $\alpha = 0$

• A-optimal design problem

$$\min_y\ \mathrm{trace}\big(C(y)^{-1}\big)$$
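The bias - variance decomposition follows in two lines. The sketch below writes $C = J^\top J + \alpha L^\top L$ and assumes linear data $d = Jm + \eta$ with zero-mean noise of covariance $\sigma^2 I$; the noise model is an assumption, since the slides do not state it explicitly.

```latex
\begin{aligned}
\hat{m} - m &= C^{-1} J^\top (J m + \eta) - m
             = -\alpha\, C^{-1} L^\top L\, m + C^{-1} J^\top \eta, \\
\mathbb{E}\,\|\hat{m} - m\|_2^2
  &= \alpha^2 \|C^{-1} L^\top L\, m\|_2^2
   + \sigma^2\, \mathrm{trace}\!\left(J\, C^{-2} J^\top\right).
\end{aligned}
```

The first term does not depend on the noise (bias) and the second does not depend on the model (variance); the cross term vanishes because the noise has zero mean. For $\alpha = 0$ the bias disappears and the variance reduces to $\sigma^2\, \mathrm{trace}(C^{-1})$.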

OPTIMALITY CRITERIA IN OVER-DETERMINED PROBLEMS

• Optimality criteria of the information matrix, via $C(y)^{-1}$

• A-optimal design ⇔ average variance

$$\min_y\ \mathrm{trace}\big(C(y)^{-1}\big)$$

• D-optimality ⇔ uncertainty ellipsoid

$$\min_y\ \det\big(C(y)^{-1}\big)$$

• E-optimality ⇔ minimax

$$\min_y\ \max\ \mathrm{eig}\big(C(y)^{-1}\big)$$

• Almost a complete alphabet…
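The three criteria can be evaluated directly from the eigenvalues of $C = J^\top J$ (the case $\alpha = 0$). The sketch below compares two designs for a toy over-determined problem that is not from the slides: fitting a straight line $d = a + b\,t$, where the design $y$ is the set of measurement times $t$. The helper name `design_criteria` is hypothetical.

```python
import numpy as np

def design_criteria(J):
    """A-, D- and E-criteria of C^{-1}, with C = J^T J (alpha = 0)."""
    C = J.T @ J
    inv_eig = 1.0 / np.linalg.eigvalsh(C)    # eigenvalues of C^{-1}
    return {
        "A": inv_eig.sum(),                  # trace(C^{-1}): average variance
        "D": inv_eig.prod(),                 # det(C^{-1}): volume of the uncertainty ellipsoid
        "E": inv_eig.max(),                  # max eig(C^{-1}): worst-case direction
    }

def line_fit_jacobian(times):
    """Jacobian of the toy model d = a + b * t with respect to (a, b)."""
    times = np.asarray(times, dtype=float)
    return np.column_stack([np.ones_like(times), times])

clustered = [0.40, 0.45, 0.50, 0.55, 0.60]   # design 1: measurements bunched together
spread = [0.00, 0.25, 0.50, 0.75, 1.00]      # design 2: measurements spread out

crit_clustered = design_criteria(line_fit_jacobian(clustered))
crit_spread = design_criteria(line_fit_jacobian(spread))
for name in ("A", "D", "E"):
    print(name, crit_clustered[name], crit_spread[name])
```

For this toy problem all three criteria prefer the spread design, since bunched measurement times make the slope poorly determined.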

THE PROBLEM...

• For non-linear ill-posed problems → none of these apply !

• Non-linearity → bias-variance decomposition is impossible

• Ill-posedness → controlling variance alone reduces the error only mildly

What strategy can be used ?

Proposition 1 - Common practice so far

Trial and Error…

EXPERIMENTAL DESIGN BY TRIAL AND ERROR

• Pick a model

• Run observation model of different experimental designs, and get data

• Invert and compare recovered models

• Choose the experimental design that provides the best model recovery
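The four steps above can be sketched as a short loop. The observation model (Gaussian-blurred samples at sensor positions), the two candidate designs, the noise level and the Tikhonov inversion are all hypothetical choices made for illustration.

```python
import numpy as np

def blur_rows(n, sensors, width=2.0):
    """Hypothetical observation model F(.; y): blurred samples at sensor positions y."""
    x = np.arange(n)
    F = np.exp(-0.5 * ((np.asarray(sensors)[:, None] - x[None, :]) / width) ** 2)
    return F / F.sum(axis=1, keepdims=True)

def invert(F, d, alpha=1e-2):
    """Tikhonov inversion: argmin_m ||F m - d||_2^2 + alpha * ||m||_2^2."""
    return np.linalg.solve(F.T @ F + alpha * np.eye(F.shape[1]), F.T @ d)

rng = np.random.default_rng(2)
n = 64
m_true = np.zeros(n)
m_true[40:50] = 1.0                                   # step 1: pick a model

designs = {
    "clustered": np.arange(0, 16),                    # 16 sensors on the left edge
    "spread": np.arange(0, 64, 4),                    # 16 sensors evenly spaced
}

errors = {}
for name, y in designs.items():
    F = blur_rows(n, y)                               # step 2: run the observation model
    d = F @ m_true + 1e-3 * rng.standard_normal(len(y))
    m_hat = invert(F, d)                              # step 3: invert
    errors[name] = np.linalg.norm(m_hat - m_true)     # ... and compare with the truth

best = min(errors, key=errors.get)                    # step 4: keep the best design
print(errors, "->", best)
```

The outcome depends on the single model picked in step 1, which is the weakness of the procedure: a design that suits one model may be poor for another.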

[Figure: true model $m_T$; data from two designs, $F(m_T, y_1) + \eta = d_1$ and $F(m_T, y_2) + \eta = d_2$; recovered models $\hat{m}_1$ and $\hat{m}_2$]

THE PROBLEM...

• For non-linear ill-posed problems → none of these apply !

• Non-linearity → bias-variance decomposition is impossible

• Ill-posedness → controlling variance alone reduces the error only mildly

What other strategy can be used ?

Proposition 2 - Minimize bias and variance altogether by some optimality criterion

How to define the optimality criterion ?

Horesh, Haber & Tenorio 2010

OPTIMAL EXPERIMENTAL DESIGN - STATISTICAL MERIT

OPTIMALITY CRITERION

• Loss

$$\mathcal{L}(m, y) := \|\hat{m}(y) - m\|_2^2$$

← Depends on the noise $\eta$

← Depends on an unknown model $m$

• Mean Squared Error

$$\mathrm{MSE}(m, y) := \mathbb{E}_\eta \|\hat{m}(y) - m\|_2^2$$

← Depends on an unknown model $m$

OPTIMALITY CRITERION

• Bayes risk

$$\mathcal{R}_{\text{true}}(y) := \mathbb{E}_{m, \varepsilon} \|\hat{m}(y) - m\|_2^2$$

← Computationally infeasible

• Bayes empirical risk

• Assume a set of feasible authentic model examples $\{m_1, \dots, m_s\}$ is available

$$\mathcal{R}_{\text{empirical}}(y) := \frac{1}{sk}\sum_{i,j=1}^{k,s} \|\hat{m}_{ij}(y) - m_j\|_2^2$$

How can $y$ be regularized ?

OPTIMALITY CRITERION – SPARSITY CONTROLLED DESIGN

• Regularized empirical risk - Direct density penalty for activation

• Assume: fixed number of experiments

• Let $y = \{Q, V\}$

• The data

$$d(m, \underbrace{V, Q}_{y}) = V^\top A(m)^{-1} Q + \eta = F(m, y) + \eta$$

• Regularized risk

$$\mathcal{R}_{\text{reg}}(y) := \frac{1}{sk}\sum_{i,j=1}^{k,s} \|\hat{m}_{ij}(y) - m_j\|_2^2 + \beta \|y\|_p$$

[Figure: two designs, $y_1$ and $y_2$]



• Regularized empirical risk - Direct approach

• Total number of experiments may be large

• Effective when activation of each source and receiver is expensive

• Derivatives of the forward operator w.r.t. $y$ are difficult…




• Regularized empirical risk - Weights formulation

• Density penalty over selected experiments from a predefined set

[Figure: predefined set of candidate experiments $F(m; y_1), F(m; y_2), F(m; y_3), \dots, F(m; y_{999}), F(m; y_{1000})$, from which a subset is selected]

Haber, Horesh & Tenorio 2010




• Let $\{y_1, \dots, y_s\}$ be a discretization of the space $Y$

• Let

$$F = \begin{Bmatrix} F(y_1) \\ \vdots \\ F(y_s) \end{Bmatrix}$$

• The observation operator is weighted

$$W^{1/2}\big(F(m) + \eta\big) = W^{1/2} d(m), \qquad W^{1/2} = \mathrm{diag}\big(w^{1/2}\big)$$

• If experiment $i$ is not conducted, $w_i = 0$

$$\mathcal{R}_{\text{reg}}(w) := \frac{1}{sk}\sum_{i,j=1}^{k,s} \|\hat{m}_{ij}(w) - m_j\|_2^2 + \beta \|w\|_p, \qquad w \ge 0$$




• Suitable when conducting each experiment is costly

• Source and receiver activation may be highly populated

• Fewer degrees of freedom (DOF)

• No explicit access to the observation operator needed
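A minimal sketch of the weights formulation for a linear observation operator. The stacked operator `F` (one row per candidate experiment), the training models, the noise realizations and the Tikhonov term $S(m) = \tfrac{1}{2}\alpha\|m\|_2^2$ are all assumptions; the function names are hypothetical. The weights enter as $W = \mathrm{diag}(w)$ on the squared misfit, with $p = 1$ in the penalty.

```python
import numpy as np

def recover(F, d, w, alpha):
    """argmin_m 0.5 * ||W^{1/2}(F m - d)||^2 + 0.5 * alpha * ||m||^2, with W = diag(w)."""
    FtW = F.T * w                                  # F^T W without forming diag(w)
    return np.linalg.solve(FtW @ F + alpha * np.eye(F.shape[1]), FtW @ d)

def regularized_risk(F, models, noise, w, alpha, beta):
    """(1 / (s k)) * sum_ij ||m_hat_ij(w) - m_j||^2 + beta * ||w||_1."""
    total = 0.0
    for m_j in models:                             # s training models
        for eta_i in noise:                        # k noise realizations
            d = F @ m_j + eta_i                    # simulated data for this pair
            total += np.sum((recover(F, d, w, alpha) - m_j) ** 2)
    return total / (len(models) * len(noise)) + beta * np.sum(np.abs(w))

rng = np.random.default_rng(3)
n_exp, n_model, s, k = 20, 8, 5, 3                 # candidate experiments, model size, s, k
F = rng.standard_normal((n_exp, n_model))          # hypothetical stacked linear operator
models = rng.standard_normal((s, n_model))
noise = 0.05 * rng.standard_normal((k, n_exp))
alpha, beta = 1e-1, 1e-2

w_all = np.ones(n_exp)                             # conduct every experiment
w_half = np.ones(n_exp)
w_half[::2] = 0.0                                  # skip every other experiment

risk_all = regularized_risk(F, models, noise, w_all, alpha, beta)
risk_half = regularized_risk(F, models, noise, w_half, alpha, beta)
print(f"risk (all): {risk_all:.4f}, risk (half): {risk_half:.4f}")
```

Setting $w_i = 0$ gives exactly the same estimate as deleting experiment $i$ from the operator, which is why a sparsity-promoting penalty on $w$ selects experiments.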


OPTIMAL EXPERIMENTAL DESIGN - OPTIMIZATION FRAMEWORK

THE OPTIMIZATION PROBLEMS

• Direct formulation (Horesh, Haber & Tenorio 2011)

$$\min_y\ \frac{1}{sk}\sum_{i,j=1}^{k,s} \|\hat{m}_{ij}(y) - m_j\|_2^2 + \beta \|y\|_p$$

$$\text{s.t.}\quad \hat{m}_{ij} = \arg\min_m\ \tfrac{1}{2}\|F(m_j; y) + \eta_i - F(m; y)\|_2^2 + S(m)$$

• Weights formulation (Haber, Horesh & Tenorio 2010)

$$\min_w\ \frac{1}{sk}\sum_{i,j=1}^{k,s} \|\hat{m}_{ij}(w) - m_j\|_2^2 + \beta \|w\|_p$$

$$\text{s.t.}\quad \hat{m}_{ij} = \arg\min_m\ \tfrac{1}{2}\big\|W^{1/2}\big(F(m_j) + \eta_i - F(m)\big)\big\|_2^2 + S(m), \qquad w \ge 0$$

THE OPTIMIZATION PROBLEM

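• Bi-level optimization problem

$$\min_w\ \frac{1}{sk}\sum_{i,j=1}^{k,s} \|\hat{m}_{ij}(w) - m_j\|_2^2 + \beta \|w\|_p$$

$$\text{s.t.}\quad \hat{m}_{ij} = \arg\min_m\ \tfrac{1}{2}\big\|W^{1/2}\big(F(m_j) + \eta_i - F(m)\big)\big\|_2^2 + S(m), \qquad w \ge 0$$

• Assuming the lower optimization level is:

• Convex with a well defined minimum

• With no inequality constraints

the lower level can be replaced by its necessary condition, with $d_{ij} := F(m_j) + \eta_i$ and $J := \partial F / \partial m$:

$$\min_w\ \frac{1}{sk}\sum_{i,j=1}^{k,s} \|\hat{m}_{ij} - m_j\|_2^2 + \beta \|w\|_p$$

$$\text{s.t.}\quad c_{ij} \equiv c(\hat{m}_{ij}, w) = J(\hat{m}_{ij})^\top W \big(F(\hat{m}_{ij}) - d_{ij}\big) + S'(\hat{m}_{ij}) = 0, \qquad w \ge 0$$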

THE OPTIMIZATION PROBLEM

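• $m$ is eliminated from the equations and viewed as a function of $w$

• Compute gradient by implicit differentiation, with $d_{ij} = F(m_j) + \eta_i$

$$\frac{\partial c_{ij}}{\partial m_{ij}} = J(\hat{m}_{ij})^\top W J(\hat{m}_{ij}) + S''(\hat{m}_{ij}) + K_{ij}$$

$$\frac{\partial c_{ij}}{\partial w} = J(\hat{m}_{ij})^\top \mathrm{diag}\big(F(\hat{m}_{ij}) - d_{ij}\big)$$

• The sensitivity

$$M_{ij} := \frac{\partial \hat{m}_{ij}}{\partial w} = -\left(\frac{\partial c_{ij}}{\partial m_{ij}}\right)^{-1} \frac{\partial c_{ij}}{\partial w}$$

• The reduced gradient (for $p = 1$, with $e$ the vector of ones)

$$\nabla_w \mathcal{R}\big(m(w), w\big) = \frac{2}{sk}\sum_{i,j} M_{ij}^\top \big(\hat{m}_{ij} - m_j\big) + \beta e$$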
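The implicit-differentiation gradient can be checked against finite differences in a small linear setting. This is a sketch under stated assumptions, not the authors' implementation: `F` is a random linear operator (so the second-order term $K$ vanishes), $S(m) = \tfrac{1}{2}\alpha\|m\|_2^2$, $W = \mathrm{diag}(w)$, $p = 1$, and a single model and noise realization are used ($s = k = 1$). All function names are hypothetical.

```python
import numpy as np

def solve_lower(F, d, w, alpha):
    """Lower level: argmin_m 0.5 * ||W^{1/2}(F m - d)||^2 + 0.5 * alpha * ||m||^2."""
    FtW = F.T * w
    return np.linalg.solve(FtW @ F + alpha * np.eye(F.shape[1]), FtW @ d)

def risk(F, d, m_true, w, alpha, beta):
    """Upper level: ||m_hat(w) - m_true||^2 + beta * sum(w), valid for w >= 0."""
    m_hat = solve_lower(F, d, w, alpha)
    return np.sum((m_hat - m_true) ** 2) + beta * np.sum(w)

def reduced_gradient(F, d, m_true, w, alpha, beta):
    m_hat = solve_lower(F, d, w, alpha)
    c_m = (F.T * w) @ F + alpha * np.eye(F.shape[1])   # dc/dm = F^T W F + S'' (K = 0)
    c_w = F.T * (F @ m_hat - d)                        # dc/dw = F^T diag(F m_hat - d)
    M = -np.linalg.solve(c_m, c_w)                     # sensitivity dm_hat/dw
    return 2.0 * M.T @ (m_hat - m_true) + beta * np.ones_like(w)

def finite_difference(F, d, m_true, w, alpha, beta, h=1e-6):
    """Central differences of the risk, one weight at a time."""
    g = np.zeros_like(w)
    for i in range(w.size):
        step = np.zeros_like(w)
        step[i] = h
        g[i] = (risk(F, d, m_true, w + step, alpha, beta)
                - risk(F, d, m_true, w - step, alpha, beta)) / (2.0 * h)
    return g

rng = np.random.default_rng(4)
n_exp, n_model = 12, 6
F = rng.standard_normal((n_exp, n_model))
m_true = rng.standard_normal(n_model)
d = F @ m_true + 0.05 * rng.standard_normal(n_exp)
w = rng.uniform(0.5, 1.5, size=n_exp)                  # strictly positive weights
alpha, beta = 1e-1, 1e-2

grad = reduced_gradient(F, d, m_true, w, alpha, beta)
grad_fd = finite_difference(F, d, m_true, w, alpha, beta)
print("max abs difference:", np.max(np.abs(grad - grad_fd)))
```

The reduced gradient costs one linear solve with $\partial c / \partial m$ per training pair, whereas finite differences need two lower-level solves per weight, which is why the implicit route scales to large candidate sets.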

OPTIMAL EXPERIMENTAL DESIGN – NUMERICAL STUDIES

IMPEDANCE TOMOGRAPHY – OBSERVATION MODEL

• Governing equations

$$\nabla \cdot (m \nabla u) = 0 \ \text{ in } \Omega, \qquad \text{B.C. on } \partial\Omega$$

• Following Finite Element discretization

$$A(m)\, u = Q$$

• Given model $m \in \mathbb{R}^k$ and design settings $y$

• Find data $d \in \mathbb{R}^n$, $n < k$

$$d(m, \underbrace{V, Q}_{y}) = V^\top A(m)^{-1} Q + \eta = F(m, y) + \eta$$

IMPEDANCE TOMOGRAPHY – DESIGNS COMPARISON

[Figure panels: True model; Naive design; Optimized design]


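MAGNETOTELLURICS TOMOGRAPHY – OBSERVATION MODEL

• Governing equations

$$\nabla \times \mu^{-1} \nabla \times \vec{E} - i\omega\, m \vec{E} = i\omega\, \vec{s} \ \text{ in } \Omega$$

$$\nabla \times \vec{E} \times \hat{n} = 0 \ \text{ on } \partial\Omega$$

• Following Finite Volume discretization

• Given: model $m \in \mathbb{R}^k$ and design settings $y$ (frequency $\omega$)

• Find: data $d \in \mathbb{R}^n$, $n < k$

$$d(m; \underbrace{\omega}_{y}) = V^\top A(m; \omega)^{-1} (i\omega\, s) + \eta$$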

MAGNETOTELLURICS TOMOGRAPHY – DESIGNS COMPARISON

[Figure panels: True model; Naive design; Optimized non-linear design (Haber, Horesh & Tenorio 2010); Optimal linearized design (Haber, Horesh & Tenorio 2008)]

THE PARETO CURVE – A DECISION MAKING TOOL


[Figure: Pareto curve of Risk (vertical axis, 0 to 10) versus $\|w\|_1$ (horizontal axis, 0 to 100)]

SUMMARY


• Generic approaches for design in ill-posed inverse problems

• Design of adaptive regularization

• Optimal experimental design

• Only two (important) elements in the big puzzle...

• New frontiers in inverse problems and optimization

• Vast range of applications in medical imaging, that offers:

• Faster

• Safer

• Higher fidelity image reconstructions

ACKNOWLEDGMENTS

DESIGN IN INVERSION – OPEN COLLABORATIVE RESEARCH

• IBM Research

• MITACS

• University of British Columbia

Andy Conn

Michael Henderson

Ulisses Mello

David Nahamoo

ACKNOWLEDGMENTS

Michele Benzi

Eldad Haber

Raya Horesh

Jim Nagy

Questions? Thank you
