C. Olsson Higher-order and/or non-submodular optimization: Yuri Boykov jointly with Western...

C. Olsson

Higher-order and/or non-submodular optimization:

Yuri Boykov jointly with

Western UniversityCanada

O. Veksler

AndrewDelong

L. Gorelick

C. Nieuwenhuis E. Toppe

I. Ben Ayed

A. Delong

H. Isack

A. Osokin

M. Tang

Tutorial on Medical Image Segmentation: Beyond Level-SetsMICCAI, 2014

Basic segmentation energy E(S)

][ qpNpq

p sswsf

boundary smoothness(quadratic / pairwise)

segment appearance(linear / unary)

- such second-order functionscan be minimized exactly via graph cuts

[Greig et al.’91, Sullivan’94, Boykov-Jolly’01]pqw

n-links

ta cut

t-link

Basic segmentation energy E(S)

][ qpNpq

p sswsf

boundary smoothness(quadratic / pairwise)

segment appearance(linear / unary)

- submodular second-order functionscan be minimized exactly via graph cuts

[Greig et al.’91, Sullivan’94, Boykov-Jolly’01]pqw

n-links

ta cut

t-link

[Hammer 1968, Pickard&Ratliff 1973]

2any (binary) segmentation functional E(S)is a set function E:

Submodular set functions

Set function is submodular if for any2:E

)()()()( TESETSETSE TS,

Significance: any submodular set function can be globally optimized in polynomial time

[Grotschel et al.1981,88, Schrijver 2000]

)||( 9O

Edmonds 1970 (for arbitrary lattices)

Set function is submodular if for any2:E

)()}{()()}{( SEvSETEvTE TS

equivalent intuitive interpretation via “diminishing returns”

Easily follows from the previous definition: E(T))}{()()}{( vSESEvTESTS TS

Significance: any submodular set function can be globally optimized in polynomial time

[Grotschel et al.1981,88, Schrijver 2000])||( 9O

Assume set Ω and 2nd-order (quadratic) function

Function E(S) is submodular if for any

)()()()( 10011100 ,,,, pqpqpqpq EEEE Nqp )( ,

Significance: submodular 2nd-order boolean (set) function can be globally optimized in polynomial time by graph cuts

[Hammer 1968, Pickard&Ratliff 1973]

qppq s,sESE )()( }{ 10,, qp ssIndicator variables

[Boros&Hammer 2000, Kolmogorov&Zabih2003])|||N(| 2O

Combinatorialoptimization

Continuousoptimization

Global Optimization

submodularity convexity?

Global Optimization

f (x,y) = a∙xy for < 0 a

EXAMPLE:

f (0,0) + f (1,1) ≤ f (0,1) + f (1,0)

submodular binary energy convex continuous extension

submodularity convexity?

submodularity convexity

Global Optimization

f (0,0) + f (1,1) ≤ f (0,1) + f (1,0)

submodular binary energy concave continuous extension

f (x,y) = a∙xy for < 0 aEXAMPLE:

Assume Gibbs distribution over binary random variables

posterior optimization(in Markov Random Fields)

}{ 10,ps

))((),...,( SEexpssPr n1

Theorem [Boykov, Delong, Kolmogorov, Veksler in unpublished book 2014?]

All random variables sp are positively correlated iff set function E(S) is submodular

That is,… submodularity implies MRF with “smoothness” prior

}{ 1s|pS p

second-order submodular energy

][ qpNpq

p sswsf

boundary smoothnesssegment region/appearance

wpq< 0 would imply non-submodularity

Now - more difficult energies

• Non-submodular energies

• High-order energies

Beyond submodularity:even second-order is challenging

Example: deconvolution

][)( ||

qpp Bq

qBp sswsISEp

image I blurred with mean kernel

“partial enumeration”

submodular quadratic term

non-submodular quadratic term

Beyond submodularity:even second-order is challenging

Example: quadratic volumetric prior

|| S0V

e.g. 2

0 psVSV ||E(S)

NOTE: any convex cardinality potentials are supermodular [Lovasz’83](not submodular)

Many useful higher-order energies

• Cardinality potentials• Curvature of the boundary• Shape convexity• Segment connectivity• Appearance entropy, color consistency• Distribution consistency• High-order shape moments• …

• QPBO [survey Kolmogorov&Rother, 2007]

• LP relaxations [e.g. Schlezinger, Komodakis, Kolmogorov, Savchinsky,…]

• Message passing, e.g. TRWS [Kolmogorov]

• Partial Enumeration [Olsson&Boykov, 2013]

• Trust Region [e.g. Gorelick et al., 2013]

• Bound Optimization [Bilmes et al.’2006, Ben Ayed et al.’2013, Tang et al.’2014]

• Submodularization [e.g. Gorelick et al., 2014]

Non-submodular and/or high-order functions E(S)

Optimization is a very active area of research…

Our recent methods: local submodular approximations (submodularization)

• gradient descent + level sets (in continuous case)• LP relaxations (QPBO, TRWS, etc)• local linear approximations (parallel ICM)

Prior approximation techniques based on linearization

• Fast Trust Region [CVPR 13]• Auxiliary Cuts [CVPR 13]• LSA [CVPR 14] • Pseudo-bounds [ECCV 14]

, [Bilmes et al.2009]

Trust Region Approximation

][)Pr(

Sp ssw|Ilnp

submodular terms

appearance log-likelihoods boundary length

non-submodular term

volume constraint

Linear approx.at S0

S0 submodularapprox.

trust region

oppp ssd )(

L2 distance to S0

Volume Constraint for Vertebrae segmentation

Log-Lik. + length

Trust region Vs. bound optimization

Fast iterative techniques

Sub-problem solutions

Convex

continuous formulation

Graph Cuts

(submodular discrete formulation)

Gradient descent and Level sets

Trust Region vs. Gradient Descent

iso-surfaces of

iso-surfaces ofapproximation

L(S)(S)U(S)E 00 ~

L(S)R(S)E(S) E S-

trust region||S – S0|| ≤ d

gradient

Gorelick et al., CVPR 2013

Trust Region vs. Gradient Descent

iso-surfaces of

iso-surfaces ofapproximation

L(S)(S)U(S)E 00 ~

L(S)R(S)E(S) E S-

trust region||S – S0|| ≤ d

gradient

e.g., Gorelick et al., CVPR 2013

Can be solved globally with graph cuts

or convex relaxation

Bound Optimization

solution spaceSSt St+1

At(S)E(S)

E(St+1)

(Majorize-Minimize, Auxiliary Function, Surrogate Function)e.g., Ben Ayed et al., CVPR 2013

Bound Optimization

solution spaceSSt St+1 St+2

At+1(S)E(S)

E(St+1)

E(St+2)

local minimum

Bound Optimization

solution spaceSSt St+1 St+2

At+1(S)E(S)

E(St+1)

E(St+2)

Can be solved globally with

graph cuts or

Convex relaxation

Pseudo-Bound Optimization(Majorize-Minimize, Auxiliary Function, Surrogate Function)Tang et al.,

ECCV 2014

Make larger move!

solution space

SSt St+1

E(St+1)

An example illustrating the differencebetween the three frameworks

Initialization

Adding volume

Log-likelihood

Gradient Descent

Initialization

Adding volume

Log-likelihood

Trust Region

Initialization

Adding volume

Log-likelihood

Bound Optimization

Initialization

Adding volume

Log-likelihood

Pseudo-Bound Optimization

Summary of main differences/similarities

Framework Properties ApplicabilityGradientDescent

(e.g. level-sets)

- Fixed small moves- Linear approximation

Any differentiable functional

Trust region

- Adaptive large moves- Submodular or convex

approximation

Any differentiable functional

Bound optimization

- Unlimited large moves- Submodular or convex

bounds

Need a bound (not always easy)

Trust Region Vs. Level sets:

Experimental comparisons

Level set evolution without re initializationC. Li et al., CVPR 2005

No ad hoc initialization procedures

Keep a distance function by adding:

Frequently used by the community (>1500 citations)

Allows relatively large values of dt

High Low

Min for a circle

Compactness (Circularity) prior

Trust region Vs. Level sets: Example 1

Ben Ayed et al., MICCAI 14

Trust region

Compactness (Circularity prior)

Trust region Vs. Level sets: Example 1

Without

compactness 8

Trust region Vs. Level sets: Example 2 Volume constraint + Length

Discrete

Continuous

Trust Region Vs. Level Sets: Example 2 Volume Constraint + Length

Init LS, t=1 LS, t=5 LS, t=10 LS, t=50 LS, t=1000

FTR, α=10FTR, α=5FTR, α=2 9

Volume Constraint + Length

Trust Region Vs. Level Sets: Example 2

Level-Set, t=50Level-Set, t=1 Level-Set, t=5 Level-Set, t=10 Level-Set, t=1000

Init FTR, α=10FTR, α=2 FTR, α=5

Trust Region Vs. Level Sets: Example 3 Shape moment Constraint + Length

Up to order-2 moments learned

from user-providedellipse

Shape moment Constraint + Length

Level-Set, t=1 Level-Set, t=1000

Level-Set, dt=1 Level-Set, dt=50 Level-Set, dt=1000Init Surrogate

L2 bin count Constraint

Bound Optimization Vs. Level Sets

2L||- TS||

Entropy-based segmentation

Interactive segmentation

with box

volume balancing

color consistency

qppq ssw

boundary smoothness

|S| |Si||W||W|/20 |Wi|0 |Wi|/2

submodular termsnon-submodular term

1.8 sec1.3 sec

11.9 sec

576 sec

Pseudo-Bound optimization [ECCV 2014]

One-Cut [ICCV 2013]

GrabCut (block-coordinate descent)

Dual Decomposition

Entropy-based segmentation

Curvature optimization

instead of boundary length (pairwise submodular term)

(3-rd order non-submodular potential, see next slide)

S3-cliques

with configurations(0,1,0) and (1,0,1)

general intuition example

Nieuwenhuis et al., CVPR 2014

more responses where curvature is higher

Need to penalize 3-clique configurations

(010) and (101)

uses submodular approximation and Trust Region [CVPR13, CVPR14]

Nieuwenhuis et al., CVPR 2014

SegmentationExamples

length-based regularization

elastica [Heber,Ranftl,Pock, 2012]

90-degree curvature [El-Zehiry&Grady, 2010]

our squared curvature

Take-home messages

- submodular or convex approximations are a good alternative to linearization methods (Level-Sets, QPBO, TRWS, etc.,) (Trust Region, Auxiliary Functions, Pseudo-bounds, etc.)[CVPR 2013, CVPR 2014, ECCV 2014]

- code available

C. Olsson Higher-order and/or non-submodular optimization: Yuri Boykov jointly with Western...

Documents

CS4442/9542b Artificial Intelligence II Prof. Olga Veksler · CS4442/9542b Artificial Intelligence II Prof. Olga Veksler . Outline •Introduction to Machine Learning •Basic Linear

Olga Veksler, University of Western Ontario

Draft of the Programme of the Veksler and Baldin ...Draft of the Programme of the Veksler and Baldin Laboratory of High Energies for Next 7 Years A.I.Malakhov Report at the 92nd Session

CS4442/9542b Artificial Intelligence II prof. Olga Veksler · 2016-03-04 · CS4442/9542b Artificial Intelligence II prof. Olga Veksler. Lecture 13 . Natural Language Processing

CS4442/9542b ArtificialIntelligence II prof. Olga Veksler

Service Design: Innovation 2.0 - Olof Schybergson & Claudia Gorelick, Fjord

rOOT gOreLiCK...gummosus rOOT gOreLiCK 2008 volume 80 number 1 Just south of San Quintin, we spent an hour and a half at a site by the Pacific Ocean, where I saw five sepa-rate crested

Yuri Boykov Research Interests

CS4442/9542b Artificial Intelligence II Prof. Olga Veksler

DM B4 Justice Dept 2 of 2 Fdr- 1995 Memos Paper Clipped) Re the Wall- Gorelick

Veksler Engineering Delhi India

20100822 computervision boykov

1 CS4442/9542b Artificial Intelligence II Prof. Olga Veksler Lecture 1 Course Introduction

CS434a/541a: Pattern Recognition Prof. Olga Veksler

Compact Windows for Visual Correspondence via Minimum Ratio Cycle Algorithm Olga Veksler NEC Labs America

SCIENTIFIC ABSTRACT VEKSLER, M.A. - VEKSLER, V.I

Sylvia Mae Gorelick - Olympians, we are breathless

S ROOT GORELICK Cochemiea halei on peninsular Baja

CS4442/9542b Artificial Intelligence II prof. Olga Veksler ...olga/Courses/Winter2018/CS4442_9542b/L4-ML-L… · CS4442/9542b Artificial Intelligence II prof. Olga Veksler Lecture

Hossam Isack Yuri Boykov Olga Veksler arXiv:1602.01006v1 ... · Hossam Isack Yuri Boykov Olga Veksler Computer Science University of Western Ontario, Canada habdelka@csd.uwo.ca yuri@csd.uwo.ca