Fast Training of Pairwise or Higher-order CRFs
Nikos Komodakis (University of Crete)

Slides: cseweb.ucsd.edu/~jmcauley/cvpr2011/slides/komodakis.pdf


Page 1:

Fast Training of Pairwise or Higher-order CRFs

Nikos Komodakis

(University of Crete)

Page 2:

Introduction

Page 3:

Conditional Random Fields (CRFs)

• Ubiquitous in computer vision

• segmentation, stereo matching, optical flow, image restoration, image completion, object detection/localization, …

• and beyond

• medical imaging, computer graphics, digital communications, physics…

• Really powerful formulation

Page 4:

Conditional Random Fields (CRFs)

• Extensive research for more than 20 years

• Key task: inference/optimization for CRFs/MRFs

• Lots of progress

• Graph-cut based algorithms

• Message-passing methods

• LP relaxations

• Dual Decomposition

• ….

• Many state-of-the-art methods

Page 5:

MAP inference for CRFs/MRFs

• Hypergraph $G = (\mathcal{V}, \mathcal{C})$

– Nodes $\mathcal{V}$

– Hyperedges/cliques $\mathcal{C}$

• High-order MRF energy minimization problem:

$$\min_{x}\ \sum_{p \in \mathcal{V}} u_p(x_p) + \sum_{c \in \mathcal{C}} h_c(x_c)$$

where $u_p$ is the unary potential (one per node) and $h_c$ is the high-order potential (one per clique).
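For concreteness, an energy of this form can be evaluated directly. The representation below (per-node unary tables, cliques as (nodes, potential-function) pairs) is one illustrative choice, not the paper's implementation:

```python
def mrf_energy(labeling, unary, cliques):
    """E(x) = sum_p u_p(x_p) + sum_c h_c(x_c)."""
    # unary[p][l]: cost of node p taking label l
    e = sum(unary[p][labeling[p]] for p in range(len(unary)))
    # each clique is (nodes, h) where h maps the clique's label tuple to a cost
    for nodes, h in cliques:
        e += h(tuple(labeling[p] for p in nodes))
    return e

# Tiny example: 3 nodes, 2 labels, one triple clique that charges 1
# unless all its labels agree (a P^n Potts-style potential).
unary = [[0.0, 1.0], [0.5, 0.2], [0.0, 0.0]]
cliques = [((0, 1, 2), lambda xs: 0.0 if len(set(xs)) == 1 else 1.0)]
print(mrf_energy([0, 0, 0], unary, cliques))  # 0.5
```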

Page 6:

CRF training

• But how do we choose the CRF potentials?

• Through training:

• parameterize the potentials by w

• use training data to learn the correct w

• Characteristic example of structured output learning [Taskar], [Tsochantaridis, Joachims]

$f : Z \to X$, where $Z$ can contain any kind of data and $X$ are the CRF variables (a structured object). How do we determine f?

Page 7:

CRF training

• Stereo matching:

• Z: left, right image

• X: disparity map

$f : Z \to X,\quad f(z) = \arg\min_x E(x; z, w)$  (f parameterized by w)

Page 8:

CRF training

• Denoising:

• Z: noisy input image

• X: denoised output image

$f : Z \to X,\quad f(z) = \arg\min_x E(x; z, w)$  (f parameterized by w)

Page 9:

CRF training

• Object detection:

• Z: input image

• X: position of object parts

$f : Z \to X,\quad f(z) = \arg\min_x E(x; z, w)$  (f parameterized by w)

Page 10:

CRF training

• Equally, if not more, important than MAP inference

• Better to optimize the correct energy approximately than to optimize the wrong energy exactly

• Becomes even more important as we move towards:

• complex models

• high-order potentials

• lots of parameters

• lots of training data

Page 11:

Contributions of this work

Page 12-16:

CRF Training via Dual Decomposition

• A very efficient max-margin learning framework for general CRFs

• Key issue: how to properly exploit CRF structure during learning?

• Existing max-margin methods:

• use MAP inference of an equally complex CRF as a subroutine

• have to call this subroutine many times during learning

• Suboptimal:

• computational efficiency?

• accuracy?

• theoretical properties?

Page 17-21:

CRF Training via Dual Decomposition

• Reduces training of a complex CRF to parallel training of a series of easy-to-handle slave CRFs

• Handles arbitrary pairwise or higher-order CRFs

• Uses a very efficient projected subgradient learning scheme

• Allows a hierarchy of structured prediction learning algorithms of increasing accuracy

• Extremely flexible and adaptable

• Easily adjusted to fully exploit additional structure in any class of CRFs (no matter if they contain very high order cliques)

Page 22:

Dual Decomposition for CRF MAP Inference (brief review)

Page 23-26:

MRF Optimization via Dual Decomposition

• Very general framework for MAP inference [Komodakis et al. ICCV07, PAMI11]

• Master = the original problem (MAP-MRF on hypergraph G): $\min_x \sum_{p} u_p(x_p) + \sum_{c} h_c(x_c)$

• Set of slaves = MRFs on sub-hypergraphs $G_i$ whose union covers G (many other choices possible as well)

• Master = coordinator (has global view); slaves = subproblems (have only local view)

• Optimization proceeds in an iterative fashion via master-slave coordination

Page 27-28:

MRF Optimization via Dual Decomposition

• For each choice of slaves, the master solves a (possibly different) convex dual relaxation:

• sum of slave energies = lower bound on the MRF optimum

• dual relaxation = maximum such bound

• Choosing more difficult slaves ⇒ tighter lower bounds ⇒ tighter dual relaxations
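The master-slave coordination reviewed above can be sketched in code. This toy implementation assumes a pairwise MRF decomposed into one slave per edge (just one of the many possible choices of slaves); the master performs projected subgradient updates on the slaves' shares of the unary potentials until the slaves agree. All names here are illustrative:

```python
import itertools

def dd_map(unary, edges, pairwise, labels, iters=100):
    """Toy dual decomposition for MAP inference: one slave per edge."""
    n = len(unary)
    slaves_of = [[i for i, e in enumerate(edges) if p in e] for p in range(n)]
    # theta[i][p][l]: slave i's current share of node p's unary potential
    theta = [{p: [unary[p][l] / len(slaves_of[p]) for l in labels] for p in e}
             for e in edges]
    for it in range(iters):
        step = 1.0 / (1 + it)
        # each slave solves its own tiny MAP problem independently
        argmins = []
        for i, (a, b) in enumerate(edges):
            xa, xb = min(itertools.product(labels, repeat=2),
                         key=lambda xy: theta[i][a][xy[0]] + theta[i][b][xy[1]]
                                        + pairwise(xy[0], xy[1]))
            argmins.append({a: xa, b: xb})
        # master: subgradient step nudging the slaves toward agreement
        for p in range(n):
            k = len(slaves_of[p])
            for l in labels:
                avg = sum(argmins[i][p] == l for i in slaves_of[p]) / k
                for i in slaves_of[p]:
                    theta[i][p][l] += step * ((argmins[i][p] == l) - avg)
    # decode by majority vote over the slaves' last minimizers
    return [max(labels, key=lambda l: sum(argmins[i][p] == l
                                          for i in slaves_of[p]))
            for p in range(n)]
```

On a 3-node chain with Potts pairwise terms and unaries favoring one label, the slaves agree after a few iterations and the decoded labeling is the MAP solution.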

Page 29:

CRF Training via Dual Decomposition

Page 30-31:

Max-margin Learning via Dual Decomposition

• Input:

• training set of K samples; k-th sample: a CRF on hypergraph $G^k$ with ground-truth labeling $\bar{x}^k$

• feature vectors

• constraints: $E(\bar{x}^k; w) \le E(x; w) - \Delta(x, \bar{x}^k)\ \ \forall x$

• $\Delta$ = dissimilarity function ($\Delta(x, \bar{x}^k) \ge 0$, $\Delta(\bar{x}^k, \bar{x}^k) = 0$)

Page 32-36:

Max-margin Learning via Dual Decomposition

• Regularized hinge loss functional:

$$\min_w\ \mu R(w) + \sum_{k} \Big[ E(\bar{x}^k; w) - \min_x \big( E(x; w) - \Delta(x, \bar{x}^k) \big) \Big]$$

• Problem: the learning objective is intractable due to the $\min_x$ term

• Solution: approximate it with the dual relaxation from decomposition

Page 37-40:

Max-margin Learning via Dual Decomposition

• Regularized hinge loss functional after the decomposition: the intractable $\min_x$ over the full CRF (before) is replaced by a sum of independent minimizations over the slave CRFs (now)

Training of the complex CRF was decomposed into parallel training of easy-to-handle slave CRFs!
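Under the common assumption that each slave's energy is linear in w, $E_i(x; w) = w \cdot \phi_i(x)$, the decomposed hinge loss is just a sum of per-slave hinge terms, each computable independently. A minimal sketch, with brute-force enumeration standing in for whatever slave optimizer the user supplies (all names are illustrative):

```python
import itertools

def dot(w, phi):
    return sum(wi * pi for wi, pi in zip(w, phi))

def slave_hinge(w, phi_i, gt, labels, nodes, delta):
    """E_i(gt; w) - min_x [E_i(x; w) - delta(x, gt)] over the slave's nodes."""
    gt_energy = dot(w, phi_i({p: gt[p] for p in nodes}))
    # loss-augmented minimization by enumeration (slaves are tiny)
    best = min(dot(w, phi_i(dict(zip(nodes, xs))))
               - delta(dict(zip(nodes, xs)), gt)
               for xs in itertools.product(labels, repeat=len(nodes)))
    return gt_energy - best

def decomposed_loss(w, slaves, gt, labels, delta):
    # surrogate = sum of per-slave hinges; upper-bounds the exact hinge loss
    return sum(slave_hinge(w, phi, gt, labels, nodes, delta)
               for nodes, phi in slaves)
```

For example, with one edge slave whose only feature is a disagreement indicator, a Hamming dissimilarity, and a ground truth where both nodes agree, the loss is strictly positive until w penalizes disagreement strongly enough.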

Page 41-46:

Max-margin Learning via Dual Decomposition

• Global optimum via projected subgradient learning algorithm

• Input:

• Hypergraphs

• Training samples

• Feature vectors

• Per iteration: each slave $i$ of sample $k$ computes its minimizer $\hat{x}^{i,k}$; the subgradient is fully specified from the $\hat{x}^{i,k}$; w is updated and projected so as to satisfy the constraints
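A sketch of the projected subgradient loop under the same linear-energy assumption: each slave reports its loss-augmented minimizer $\hat{x}^{i,k}$, those minimizers fully specify the subgradient, and w is projected back onto a feasible set. The Euclidean-ball projection and all names here are illustrative choices, not necessarily the paper's:

```python
import itertools

def learn(slaves, gt, labels, dim, delta, iters=100, radius=10.0):
    """Projected subgradient on the decomposed max-margin objective."""
    w = [0.0] * dim
    for t in range(iters):
        step = 1.0 / (1 + t)
        g = [0.0] * dim
        for nodes, phi in slaves:
            def aug(xs):  # loss-augmented slave energy E_i(x; w) - delta(x, gt)
                x = dict(zip(nodes, xs))
                return (sum(wj * pj for wj, pj in zip(w, phi(x)))
                        - delta(x, gt))
            # slave minimization by enumeration (slaves are tiny)
            xhat = min(itertools.product(labels, repeat=len(nodes)), key=aug)
            phi_gt = phi({p: gt[p] for p in nodes})
            phi_hat = phi(dict(zip(nodes, xhat)))
            for j in range(dim):
                g[j] += phi_gt[j] - phi_hat[j]  # subgradient from the x̂_i's
        w = [wj - step * gj for wj, gj in zip(w, g)]
        norm = sum(wj * wj for wj in w) ** 0.5
        if norm > radius:  # projection onto the ball ||w|| <= radius
            w = [wj * radius / norm for wj in w]
    return w
```

With a single edge slave whose features count label-1 nodes and disagreements, and a ground truth of all zeros, the learned w makes the ground truth the slave's MAP labeling after a couple of iterations.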

Page 47:

Max-margin Learning via Dual Decomposition

• Incremental subgradient version:

• Further improves computational efficiency

• Same optimality guarantees & theoretical properties

• Same as before but considers subset of slaves per iteration

• Subset chosen

• deterministically or

• randomly (stochastic subgradient)
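The incremental/stochastic variant only changes which slaves contribute to each update. A sketch of the random subset selection (the fraction and seed are arbitrary choices for illustration):

```python
import random

def pick_slaves(slaves, frac=0.3, rng=None):
    """Choose the subset of slaves processed in one iteration."""
    rng = rng or random.Random(0)  # fixed seed only for reproducibility here
    k = max(1, int(frac * len(slaves)))
    return rng.sample(slaves, k)
```

In the deterministic variant the same role is played by cycling through fixed subsets in round-robin order.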

Page 48:

Max-margin Learning via Dual Decomposition

• Resulting learning scheme:

• Very efficient and very flexible

• Slave problems freely chosen by the user

• Easily adaptable to further exploit the special structure of any class of CRFs

• Requires from the user only an optimizer for the slave MRFs

Page 49:

Choice of decompositions

• $L(w)$ = true loss (intractable)

• $L_{\mathcal{D}}(w)$ = loss from the decomposition $\mathcal{D}$

• Upper-bound property: $L(w) \le L_{\mathcal{D}}(w)$; tighter decompositions give tighter bounds, yielding a hierarchy of learning algorithms

Page 50:

Choice of decompositions

• The per-clique decomposition:

– one slave per clique
– corresponding sub-hypergraph: the clique's nodes together with the single hyperedge c

• Resulting slaves are often easy (or even trivial) to solve, even if the global problem is complex and NP-hard

– leads to a widely applicable learning algorithm

• The corresponding dual relaxation is an LP

– generalizes the well-known LP relaxation for pairwise MRFs (at the core of most state-of-the-art methods)

Page 51:

Choice of decompositions

• But we can do better if CRFs have special structure…

• Structure means:

• a more efficient optimizer for the slaves (speed), or

• an optimizer that handles more complex slaves (accuracy)

(almost all known examples fall into one of the above two cases)

• We adapt the decomposition to the problem at hand to exploit its structure

Page 52:

Choice of decompositions

• But we can do better if CRFs have special structure…

• E.g., pattern-based high-order potentials (for a clique c) [Komodakis & Paragios CVPR09]

• We only assume:

– the set of patterns (a subset of the clique's possible labelings; its vectors are called patterns) is sparse

– pattern costs do not exceed the common cost assigned to all non-pattern labelings

– no other restriction
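Assuming the potential takes a per-pattern cost on the sparse pattern set and a common default cost everywhere else (a hypothetical form consistent with the assumptions above; the paper's exact definition may be more general), the slave for such a clique can be minimized cheaply by checking each pattern against the best unconstrained assignment:

```python
def minimize_pattern_slave(unary, patterns, pattern_cost, c_max, labels):
    """Minimize a pattern-based clique potential plus per-node unary shares.

    Assumed form: h_c(x) = pattern_cost[k] if x equals the k-th pattern,
    else c_max. Runs in O(n*|L| + #patterns * n) for a clique of n nodes.
    """
    n = len(unary)
    # best assignment if we pay the default cost c_max: nodes decouple
    free = [min(labels, key=lambda l: unary[p][l]) for p in range(n)]
    best_x = free
    best_e = sum(unary[p][free[p]] for p in range(n)) + c_max
    # compare against each (sparse) pattern
    for pat, ck in zip(patterns, pattern_cost):
        e = sum(unary[p][pat[p]] for p in range(n)) + ck
        if e < best_e:
            best_x, best_e = list(pat), e
    return best_x, best_e
```

The sparsity assumption is exactly what makes this loop short: only the listed patterns need to be examined explicitly.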

Page 53:

Experimental results

Page 54:

Image denoising

• Piecewise constant images

• Potentials: unary $u_p^k(x_p) = |x_p - z_p^k|$, pairwise $h_{pq}(x_p, x_q) = V(|x_p - x_q|)$

• Goal: learn the pairwise potential V

Page 55:

Image denoising

Page 56:

Stereo matching

• Potentials: unary $u_p^k(x_p) = \big|I^{k,\mathrm{left}}(p) - I^{k,\mathrm{right}}(p - x_p)\big|$, pairwise $h_{pq}(x_p, x_q) = f\big(\nabla I^{k,\mathrm{left}}(p)\big)\,[x_p \ne x_q]$

• Goal: learn the function f(·) for the gradient-modulated Potts model

Page 57:

Stereo matching

"Venus" disparity using f(·) as estimated at different iterations of the learning algorithm

Page 58:

Stereo matching results:

• Sawtooth: 4.9%

• Poster: 3.7%

• Bull: 2.8%

Page 59:

Stereo matching

Page 60:

High-order $P^n$ Potts model [Kohli et al. CVPR07]

• Goal: learn a high-order CRF with $P^n$ Potts potentials

• Cost for optimizing a slave CRF: O(|L|)

• Setup: 100 training samples, 50×50 grid, clique size 3×3, 5 labels (|L| = 5)

• Fast training
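A sketch of why a $P^n$ Potts slave is so cheap to optimize: the clique either takes one uniform label l (paying gamma[l]) or pays gamma_max with the nodes decoupled, so once per-label column sums of the unary shares are available, picking the best option is a single O(|L|) scan. The exact potential form and names below are illustrative:

```python
def minimize_pn_potts_slave(unary, gamma, gamma_max, labels):
    """Minimize a P^n Potts clique potential plus per-node unary shares.

    Assumed form: h_c(x) = gamma[l] if all nodes take label l, else gamma_max.
    """
    n = len(unary)
    # per-label column sums of the unary shares: O(n*|L|) precomputation
    col = {l: sum(unary[p][l] for p in range(n)) for l in labels}
    # best "non-uniform" option: nodes decouple, pay gamma_max
    free = [min(labels, key=lambda l: unary[p][l]) for p in range(n)]
    best_x = free
    best_e = sum(unary[p][free[p]] for p in range(n)) + gamma_max
    # best uniform labeling: a single O(|L|) scan
    for l in labels:
        if col[l] + gamma[l] < best_e:
            best_x, best_e = [l] * n, col[l] + gamma[l]
    return best_x, best_e
```

This is what makes the per-clique decomposition attractive here: each slave call is essentially free, so the projected subgradient iterations dominate the training cost.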

Page 61:

Clustering

• Goal: distance learning for clustering [ICCV’11]

• In this case cliques are of very high order: they contain all variables

• On top of that, there exist unobserved (latent) variables during training

• Novel discriminative formulation

• Significant extension: dual decomposition for training high-order CRFs with latent variables