C. Olsson Higher-order and/or non-submodular optimization: Yuri Boykov jointly with Western University Canada O. Veksler Andrew Delong L. Gorelick C. NieuwenhuisE

C. Olsson

Higher-order and/or non-submodular optimization:

Yuri Boykov jointly with

Western UniversityCanada

O. Veksler

AndrewDelong

L. Gorelick

C. Nieuwenhuis E. Toppe

I. Ben Ayed

A. Delong

H. Isack

A. Osokin

M. Tang

Tutorial on Medical Image Segmentation: Beyond Level-SetsMICCAI, 2014

Basic segmentation energy E(S)

][ qpNpq

pqpp

p sswsf

boundary smoothness(quadratic / pairwise)

segment appearance(linear / unary)

- such second-order functionscan be minimized exactly via graph cuts

[Greig et al.’91, Sullivan’94, Boykov-Jolly’01]pqw

n-links

s

ta cut

t-lin

k

t-link

Basic segmentation energy E(S)

][ qpNpq

pqpp

p sswsf

boundary smoothness(quadratic / pairwise)

segment appearance(linear / unary)

- submodular second-order functionscan be minimized exactly via graph cuts

[Greig et al.’91, Sullivan’94, Boykov-Jolly’01]pqw

n-links

s

ta cut

t-lin

k

t-link

[Hammer 1968, Pickard&Ratliff 1973]

2any (binary) segmentation functional E(S)is a set function E:

S

Ω

Submodular set functions


Set function is submodular if for any2:E

)()()()( TESETSETSE TS,

Significance: any submodular set function can be globally optimized in polynomial time

[Grotschel et al.1981,88, Schrijver 2000]

S TΩ

)||( 9O

Edmonds 1970 (for arbitrary lattices)


Set function is submodular if for any2:E

)()}{()()}{( SEvSETEvTE TS

S TΩ

equivalent intuitive interpretation via “diminishing returns”

v v

Easily follows from the previous definition: E(T))}{()()}{( vSESEvTESTS TS

Significance: any submodular set function can be globally optimized in polynomial time

[Grotschel et al.1981,88, Schrijver 2000])||( 9O

Assume set Ω and 2nd-order (quadratic) function

Function E(S) is submodular if for any

)()()()( 10011100 ,,,, pqpqpqpq EEEE Nqp )( ,

Significance: submodular 2nd-order boolean (set) function can be globally optimized in polynomial time by graph cuts

[Hammer 1968, Pickard&Ratliff 1973]

N)pq(

qppq s,sESE )()( }{ 10,, qp ssIndicator variables

[Boros&Hammer 2000, Kolmogorov&Zabih2003])|||N(| 2O


Combinatorialoptimization

Continuousoptimization

Global Optimization

submodularity convexity?

Global Optimization

f (x,y) = a∙xy for < 0 a

01

1y

xA C

D

B

EXAMPLE:

01

1y

xA C

D

B

f (0,0) + f (1,1) ≤ f (0,1) + f (1,0)

submodular binary energy convex continuous extension

submodularity convexity?

submodularity convexity

Global Optimization

01

1y

xA C

D

B

f (0,0) + f (1,1) ≤ f (0,1) + f (1,0)

01

1y

xA C

D

B

submodular binary energy concave continuous extension

f (x,y) = a∙xy for < 0 aEXAMPLE:

Assume Gibbs distribution over binary random variables

for

posterior optimization(in Markov Random Fields)

}{ 10,ps

))((),...,( SEexpssPr n1

Theorem [Boykov, Delong, Kolmogorov, Veksler in unpublished book 2014?]

All random variables sp are positively correlated iff set function E(S) is submodular

That is,… submodularity implies MRF with “smoothness” prior

}{ 1s|pS p

second-order submodular energy

][ qpNpq

pqpp

p sswsf

boundary smoothnesssegment region/appearance

wpq< 0 would imply non-submodularity

Now - more difficult energies

• Non-submodular energies

• High-order energies

Beyond submodularity:even second-order is challenging

Example: deconvolution

][)( ||

Npq

qpp Bq

qBp sswsISEp

21 )(

image I blurred with mean kernel

QPBO

TRWS

“partial enumeration”

submodular quadratic term

non-submodular quadratic term

Beyond submodularity:even second-order is challenging

Example: quadratic volumetric prior

E(S)

|| S0V

e.g. 2

02

0 psVSV ||E(S)

NOTE: any convex cardinality potentials are supermodular [Lovasz’83](not submodular)

Many useful higher-order energies

• Cardinality potentials• Curvature of the boundary• Shape convexity• Segment connectivity• Appearance entropy, color consistency• Distribution consistency• High-order shape moments• …

• QPBO [survey Kolmogorov&Rother, 2007]

• LP relaxations [e.g. Schlezinger, Komodakis, Kolmogorov, Savchinsky,…]

• Message passing, e.g. TRWS [Kolmogorov]

• Partial Enumeration [Olsson&Boykov, 2013]

• Trust Region [e.g. Gorelick et al., 2013]

• Bound Optimization [Bilmes et al.’2006, Ben Ayed et al.’2013, Tang et al.’2014]

• Submodularization [e.g. Gorelick et al., 2014]

Non-submodular and/or high-order functions E(S)

Optimization is a very active area of research…

Our recent methods: local submodular approximations (submodularization)

• gradient descent + level sets (in continuous case)• LP relaxations (QPBO, TRWS, etc)• local linear approximations (parallel ICM)

Prior approximation techniques based on linearization

• Fast Trust Region [CVPR 13]• Auxiliary Cuts [CVPR 13]• LSA [CVPR 14] • Pseudo-bounds [ECCV 14]

, [Bilmes et al.2009]

Trust Region Approximation

|S|0

][)Pr(

Npq

qppqp

Sp ssw|Ilnp

submodular terms

appearance log-likelihoods boundary length

non-submodular term

volume constraint

Linear approx.at S0

S0

S0 submodularapprox.

trust region

p

oppp ssd )(

L2 distance to S0

Volume Constraint for Vertebrae segmentation

Log-Lik. + length

20

1

Trust region Vs. bound optimization

Fast iterative techniques

Sub-problem solutions

Convex

continuous formulation

Graph Cuts

(submodular discrete formulation)

Gradient descent and Level sets

Vs.

Vs.

Trust Region vs. Gradient Descent

2

S~

0S

S*

iso-surfaces of

iso-surfaces ofapproximation

L(S)(S)U(S)E 00 ~

L(S)R(S)E(S) E S-

trust region||S – S0|| ≤ d

gradient

Gorelick et al., CVPR 2013

Trust Region vs. Gradient Descent

3

S~

0S

S*

iso-surfaces of

iso-surfaces ofapproximation

L(S)(S)U(S)E 00 ~

L(S)R(S)E(S) E S-

trust region||S – S0|| ≤ d

gradient

e.g., Gorelick et al., CVPR 2013

Can be solved globally with graph cuts

or convex relaxation

Bound Optimization

solution spaceSSt St+1

At(S)E(S)

E(St)

E(St+1)

(Majorize-Minimize, Auxiliary Function, Surrogate Function)e.g., Ben Ayed et al., CVPR 2013

4

Bound Optimization

solution spaceSSt St+1 St+2

At+1(S)E(S)

E(St)

E(St+1)

E(St+2)

local minimum


4

Bound Optimization

solution spaceSSt St+1 St+2

At+1(S)E(S)

E(St)

E(St+1)

E(St+2)

Can be solved globally with

graph cuts or

Convex relaxation


4

Pseudo-Bound Optimization(Majorize-Minimize, Auxiliary Function, Surrogate Function)Tang et al.,

ECCV 2014

Make larger move!

(a)

(b)

(c)

solution space

SSt St+1

E(S)

E(St)

E(St+1)

4

5

An example illustrating the differencebetween the three frameworks

Initialization

Adding volume

Log-likelihood

Gradient Descent

5


Initialization

Adding volume

Log-likelihood

Trust Region

5


Initialization

Adding volume

Log-likelihood

Bound Optimization

5


Initialization

Adding volume

Log-likelihood

Pseudo-Bound Optimization

6

Summary of main differences/similarities

Framework Properties ApplicabilityGradientDescent

(e.g. level-sets)

- Fixed small moves- Linear approximation

Any differentiable functional

Trust region

- Adaptive large moves- Submodular or convex

approximation

Any differentiable functional

Bound optimization

- Unlimited large moves- Submodular or convex

bounds

Need a bound (not always easy)

7

Trust Region Vs. Level sets:

Experimental comparisons

Level set evolution without re initializationC. Li et al., CVPR 2005

No ad hoc initialization procedures

Keep a distance function by adding:

Frequently used by the community (>1500 citations)

Allows relatively large values of dt

High Low

Min for a circle

Compactness (Circularity) prior

Trust region Vs. Level sets: Example 1

Ben Ayed et al., MICCAI 14

8

Trust region

Compactness (Circularity prior)

Trust region Vs. Level sets: Example 1

Without

compactness 8

Trust region Vs. Level sets: Example 2 Volume constraint + Length

Discrete

Continuous

9

Trust Region Vs. Level Sets: Example 2 Volume Constraint + Length

Init LS, t=1 LS, t=5 LS, t=10 LS, t=50 LS, t=1000

FTR, α=10FTR, α=5FTR, α=2 9

Volume Constraint + Length

Trust Region Vs. Level Sets: Example 2

10



10



10

Level-Set, t=50Level-Set, t=1 Level-Set, t=5 Level-Set, t=10 Level-Set, t=1000

Init FTR, α=10FTR, α=2 FTR, α=5

Trust Region Vs. Level Sets: Example 3 Shape moment Constraint + Length

Up to order-2 moments learned

from user-providedellipse

11


Shape moment Constraint + Length


11




11

Level-Set, t=1 Level-Set, t=1000



11

Level-Set, dt=1 Level-Set, dt=50 Level-Set, dt=1000Init Surrogate

12

L2 bin count Constraint

Bound Optimization Vs. Level Sets

2L||- TS||



12



12

Entropy-based segmentation

Interactive segmentation

with box

volume balancing

color consistency

][

Npq

qppq ssw

boundary smoothness

|S| |Si||W||W|/20 |Wi|0 |Wi|/2

+ +

submodular termsnon-submodular term

1.8 sec1.3 sec

11.9 sec

576 sec

Pseudo-Bound optimization [ECCV 2014]

One-Cut [ICCV 2013]

GrabCut (block-coordinate descent)

Dual Decomposition

Entropy-based segmentation

Curvature optimization

dsS

2

instead of boundary length (pairwise submodular term)

(3-rd order non-submodular potential, see next slide)

S

S3-cliques

with configurations(0,1,0) and (1,0,1)

pp+

p-

general intuition example

Nieuwenhuis et al., CVPR 2014

more responses where curvature is higher

Need to penalize 3-clique configurations

(010) and (101)

uses submodular approximation and Trust Region [CVPR13, CVPR14]

Nieuwenhuis et al., CVPR 2014

SegmentationExamples

length-based regularization

elastica [Heber,Ranftl,Pock, 2012]


90-degree curvature [El-Zehiry&Grady, 2010]


our squared curvature


Take-home messages

- submodular or convex approximations are a good alternative to linearization methods (Level-Sets, QPBO, TRWS, etc.,) (Trust Region, Auxiliary Functions, Pseudo-bounds, etc.)[CVPR 2013, CVPR 2014, ECCV 2014]

- code available

Documents

C. Olsson Higher-order and/or non-submodular optimization: Yuri Boykov jointly with Western University Canada O. Veksler Andrew Delong L. Gorelick C. NieuwenhuisE