Conditional Random Fields for Image Labeling
Yilin Wang, 11/5/2009
Background: The Labeling Problem
Labeling: given an observed data set (X) and a label set (L), infer the labels of the data points.
Most vision problems can be posed as labeling problems:
Stereo matching
Image segmentation
Image restoration
Slide 3
Examples of Labeling Problems: Stereo Matching
For a pixel in image 1, where is the corresponding pixel in image 2?
Label set: differences (disparities) between corresponding pixels
Picture source: S. Lazebnik
Slide 4
Examples of Labeling Problems: Image Segmentation
Partition an image into multiple disjoint regions.
Label set: region IDs
Picture source: http://mmlab.ie.cuhk.edu.hk
Slide 5
Examples of Labeling Problems: Image Restoration
"Compensate for" or "undo" defects that degrade an image.
Label set: restored intensities
Picture source: http://www.photorestoration.co.nz
Slide 6
Background: Image Labeling
Given an image, the system should automatically partition it into semantically meaningful areas, each labeled with a specific object class.
Example classes: cow, lawn, plane, sky, building, tree
Slide 7
Image Labeling Problem
Given: the observed data $X = \{x_i\}_{i \in S}$ from an input image, where $x_i$ is the data from site $i$ (a pixel or block) of the image site set $S$, and a pre-defined label set.
Let $L = \{l_i\}_{i \in S}$ be the corresponding labels at the image sites. We want to find the labeling $L$ that maximizes the conditional probability:
$L^{*} = \arg\max_{L} P(L \mid X)$
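To make the objective concrete, here is a minimal brute-force sketch on a toy problem, with a hypothetical unnormalized score standing in for a learned $P(L \mid X)$ (the observations and weights below are all illustrative):

```python
import itertools

# Toy data: 3 sites with assumed binary observations (illustrative only).
observed = [0, 1, 1]

def score(labels):
    """Hypothetical unnormalized posterior score for a labeling L given X."""
    data_term = sum(1.0 if l == x else 0.0 for l, x in zip(labels, observed))
    smoothness = sum(0.25 if a == b else -0.25 for a, b in zip(labels, labels[1:]))
    return data_term + smoothness

# argmax_L P(L | X) by exhaustive enumeration -- feasible only for toy problems;
# real images need the MRF/CRF machinery discussed on the following slides.
best = max(itertools.product([0, 1], repeat=len(observed)), key=score)
print("MAP labeling:", best)  # -> (0, 1, 1)
```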
Slide 8
Which Kinds of Information Can Be Used for Labeling?
Features from individual sites: intensity, color, texture, ...
Interactions with neighboring sites: contextual information
Figure example: knowing a region sits above vegetation helps decide between sky and building.
Slide 9
Two Types of Interactions
Interaction with neighboring labels (spatial smoothness of labels): neighboring sites tend to have similar labels (except at discontinuities).
Interaction with neighboring observed data.
Figure example: building vs. sky.
Slide 10
Information for Image Labeling
Let $l_i$ be the label of site $i$ of the image site set $S$, and $N_i$ be the neighboring sites of site $i$.
Three kinds of information for image labeling:
Features from the local site
Interaction with neighboring labels
Interaction with neighboring observed data
Picture source: S. Xiang
Slide 11
Markov Random Fields (MRF)
Markov Random Fields (MRFs) are the most popular models for incorporating local contextual constraints into labeling problems.
Let $l_i$ be the label of site $i$ of the image site set $S$, and $N_i$ be the neighboring sites of site $i$. The label set $L = \{l_i\}_{i \in S}$ is said to be an MRF on $S$ w.r.t. a neighborhood system $N$ iff the following condition is satisfied:
Markov property: $P(l_i \mid l_{S - \{i\}}) = P(l_i \mid l_{N_i})$
An MRF maintains global spatial consistency by considering only relatively local dependencies.
Slide 12
Markov-Gibbs Equivalence
Let $l$ be a realization of $L$; then $P(l)$ has an explicit formulation (the Gibbs distribution):
$P(l) = \frac{1}{Z} \exp\left(-\frac{U(l)}{T}\right)$
where $U(l) = \sum_{c} V_c(l)$ is the energy function, $Z = \sum_{l} \exp(-U(l)/T)$ is a normalizing factor called the partition function, and $T$ is a constant.
Cliques: $C_k = \{\{i, i', i'', \ldots\} \mid i, i', i'', \ldots \text{ are neighbors of one another}\}$
The potential functions $V_c$ represent a priori knowledge of the interactions between labels of neighboring sites.
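A minimal sketch of this distribution on a tiny 1-D chain, assuming illustrative Potts-style pairwise potentials (none of the numbers come from the slides):

```python
import itertools
import math

# 1-D chain of 4 sites with binary labels; pairwise clique potentials are
# 0 if neighbors agree, 1 otherwise (illustrative values), temperature T = 1.
T = 1.0

def U(l):
    """Energy: sum of pairwise clique potentials over neighboring sites."""
    return sum(0.0 if a == b else 1.0 for a, b in zip(l, l[1:]))

configs = list(itertools.product([0, 1], repeat=4))
# Partition function Z: a sum over *every* configuration -- exactly the
# quantity that becomes intractable for real-sized label fields.
Z = sum(math.exp(-U(l) / T) for l in configs)
P = {l: math.exp(-U(l) / T) / Z for l in configs}
assert abs(sum(P.values()) - 1.0) < 1e-9  # P is a proper distribution
```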
Slide 13
Auto-Model
With clique potentials involving up to two sites, the energy takes the form
$U(l) = \sum_{i \in S} V_1(l_i) + \sum_{i \in S} \sum_{i' \in N_i} V_2(l_i, l_{i'})$
When $V_1(l_i) = l_i G_i(l_i)$ and $V_2(l_i, l_{i'}) = \beta_{i i'}\, l_i\, l_{i'}$, where the $G_i(\cdot)$ are arbitrary functions (or constants) and the $\beta_{i i'}$ are constants reflecting the pairwise interaction between $i$ and $i'$, the energy is
$U(l) = \sum_{i \in S} l_i G_i(l_i) + \sum_{i \in S} \sum_{i' \in N_i} \beta_{i i'}\, l_i\, l_{i'}$
Such models are called auto-models (Besag 1974).
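A sketch of this energy on an image grid, assuming constant $G_i(l_i) = \alpha_i$ and a single shared $\beta$ over 4-connected neighbors (both simplifications are for illustration):

```python
import numpy as np

def auto_model_energy(labels, alpha, beta):
    """U(l) = sum_i alpha_i * l_i + beta * sum_{4-connected pairs} l_i * l_i',
    for labels in {-1, +1}."""
    unary = np.sum(alpha * labels)
    pairwise = beta * (np.sum(labels[:, :-1] * labels[:, 1:]) +   # horizontal pairs
                       np.sum(labels[:-1, :] * labels[1:, :]))    # vertical pairs
    return unary + pairwise

labels = np.random.choice([-1, 1], size=(8, 8))
# beta < 0 favors agreeing neighbors (lower energy = more probable).
print(auto_model_energy(labels, alpha=np.zeros((8, 8)), beta=-1.0))
```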
Slide 14
Parameter Estimation
Given the functional form of the auto-model, how do we specify its parameters $\theta$?
Slide 15
Maximum Likelihood Estimation
Given a realization $l$ of an MRF, the maximum likelihood (ML) estimate maximizes the conditional probability $P(l \mid \theta)$ (the likelihood of $\theta$), that is:
$\theta^{ML} = \arg\max_{\theta} P(l \mid \theta)$
By Bayes' rule: $P(\theta \mid l) \propto P(l \mid \theta)\, P(\theta)$
The prior $P(\theta)$ is assumed to be flat when prior information is totally unavailable. In this case, MAP estimation reduces to ML estimation.
Slide 16
Maximum Likelihood Estimation
The likelihood function is in the Gibbs form
$P(l \mid \theta) = \frac{1}{Z(\theta)} \exp(-U(l \mid \theta))$
where $Z(\theta) = \sum_{l \in \mathbb{L}} \exp(-U(l \mid \theta))$.
However, the computation of $Z(\theta)$ is intractable even for moderately sized problems, because the configuration space $\mathbb{L}$ contains a combinatorial number of elements.
Slide 17
Maximum Pseudo-Likelihood
Assumption: the sites are treated as conditionally independent given their neighborhoods, so the likelihood is approximated by
$PL(l) = \prod_{i \in S} P(l_i \mid l_{N_i})$
Notice that the pseudo-likelihood does not involve the partition function $Z$. The parameters $\{\alpha, \beta\}$ can then be obtained by solving $\max_{\theta} PL(l)$.
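A sketch of the log pseudo-likelihood for the binary ($\pm 1$) auto-model, under the same illustrative $\alpha$/$\beta$ parameterization as the grid example above:

```python
import numpy as np

def log_pseudo_likelihood(labels, alpha, beta):
    """log PL = sum_i log P(l_i | l_Ni) for the binary (+/-1) auto-model;
    note that the partition function Z never appears."""
    padded = np.pad(labels, 1)  # zero padding: border sites just have fewer neighbors
    neighbor_sum = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
                    padded[1:-1, :-2] + padded[1:-1, 2:])
    a = alpha + beta * neighbor_sum
    # log P(l_i | l_Ni) = l_i * a_i - log(exp(a_i) + exp(-a_i))
    return np.sum(labels * a - np.logaddexp(a, -a))
```

Maximizing this in $\alpha$ and $\beta$, e.g. by gradient ascent, gives the pseudo-likelihood estimate without ever evaluating $Z$.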
Slide 18
Inference
Recall that in image labeling we want to find the $L$ that maximizes the posterior. By Bayes' rule:
$P(L \mid X) \propto P(X \mid L)\, P(L)$
where the prior probability is $P(L) = \frac{1}{Z} \exp(-U(L))$. Let $P(X \mid L) \propto \exp(-U(X \mid L))$; then:
$U(L \mid X) = U(X \mid L) + U(L)$
posterior energy = likelihood energy + prior energy
Slide 19
MAP-MRF Labeling
Maximizing the posterior probability is equivalent to minimizing the posterior energy:
$L^{*} = \arg\max_{L} P(L \mid X) = \arg\min_{L} \big( U(X \mid L) + U(L) \big)$
Steps of MAP labeling (figure: S. Xiang), involving the neighborhood system $N$ and the cliques $C$.
Slide 20
MRF for Image Labeling
Difficulties and disadvantages:
Very strict independence assumptions: the interactions among labels are modeled by the prior term $P(L)$ and are independent of the observed data, which prohibits modeling data-dependent interactions between labels.
Slide 21
Conditional Random Fields
Let $G = (S, E)$ be a graph; then $(X, L)$ is said to be a Conditional Random Field (CRF) if, when conditioned on $X$, the random variables $l_i$ obey the Markov property with respect to the graph:
$P(l_i \mid X, l_{S - \{i\}}) = P(l_i \mid X, l_{N_i})$
where $S - \{i\}$ is the set of all sites in the graph except site $i$, and $N_i$ is the set of neighbors of site $i$ in $G$.
Compare with the MRF: $P(l_i \mid l_{S - \{i\}}) = P(l_i \mid l_{N_i})$
Slide 22
CRF
According to the Markov-Gibbs equivalence, if only clique potentials up to pairwise are nonzero, the posterior probability $P(L \mid X)$ has the form
$P(L \mid X) = \frac{1}{Z} \exp\Big( \sum_{i \in S} V_1(l_i, X) + \sum_{i \in S} \sum_{i' \in N_i} V_2(l_i, l_{i'}, X) \Big)$
where $V_1$ and $V_2$ are called the association and interaction potentials, respectively, in the CRF literature.
Slide 23
CRF vs. MRF
MRF is a generative model (two steps): infer the likelihood and the prior, then use Bayes' theorem to determine the posterior.
CRF is a discriminative model (one step): directly infer the posterior.
Slide 24
CRF vs. MRF
More differences between the CRF and the MRF:
MRF: $P(L \mid X) \propto P(X \mid L)\, P(L)$; interactions among labels enter only through the data-independent prior $P(L)$.
CRF: $P(L \mid X) \propto \exp\big( \sum_{i} V_1(l_i, X) + \sum_{i} \sum_{i' \in N_i} V_2(l_i, l_{i'}, X) \big)$
In a CRF, both the association and the interaction potentials are functions of all the observed data as well as of the labels.
Slide 25
Discriminative Random Fields
The Discriminative Random Field (DRF) is a special type of CRF with two extensions. First, a DRF is defined over 2-D lattices (such as the image grid). Second, the unary (association) and pairwise (interaction) potentials are designed using local discriminative classifiers.
Kumar, S. and M. Hebert: 'Discriminative Random Fields: A Discriminative Framework for Contextual Interaction in Classification'. ICCV 2003
Slide 26
DRF
Formulation of the DRF:
$P(L \mid X) = \frac{1}{Z} \exp\Big( \sum_{i \in S} A(l_i, X) + \sum_{i \in S} \sum_{i' \in N_i} I(l_i, l_{i'}, X) \Big)$
where $A(l_i, X)$ and $I(l_i, l_{i'}, X)$ are called the association potential and the interaction potential.
Picture source: S. Xiang
Slide 27
Association Potential
$A(l_i, X)$ is modeled using a local discriminative model that outputs the association of site $i$ with class $l_i$ as $P'(l_i \mid f_i(X))$,
where $f_i(\cdot)$ is a linear function that maps a patch centered at site $i$ to a feature vector.
Picture source: S. Srihari
Slide 28
Association Potential
For binary classification ($l_i = -1$ or $1$), the posterior at site $i$ is modeled using a logistic function:
$P'(l_i = 1 \mid X) = \frac{1}{1 + \exp(-\mathbf{w}^{\top} f_i(X))} = \sigma(\mathbf{w}^{\top} f_i(X))$
Since $l_i = -1$ or $1$, the probability can be compactly expressed as:
$P'(l_i \mid X) = \sigma(l_i\, \mathbf{w}^{\top} f_i(X))$
Finally, the association potential is defined as:
$A(l_i, X) = \log P'(l_i \mid X)$
Picture source: S. Srihari
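A small sketch of this potential, assuming an illustrative weight vector $\mathbf{w}$ and feature vector $f_i(X)$ (both made up here):

```python
import numpy as np

def association_potential(l_i, f_i, w):
    """A(l_i, X) = log sigma(l_i * w^T f_i(X)), computed stably."""
    z = l_i * np.dot(w, f_i)
    return -np.logaddexp(0.0, -z)  # log(1 / (1 + exp(-z)))

w = np.array([0.5, -1.0, 2.0])    # made-up learned weights
f_i = np.array([1.0, 0.2, 0.3])   # made-up feature vector for the patch at site i
print(association_potential(+1, f_i, w), association_potential(-1, f_i, w))
```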
Slide 29
Interaction Potential
The interaction potential can be seen as a measure of how the labels at neighboring sites $i$ and $i'$ should interact given the observed image $X$. Given the features at the two sites, a pairwise discriminative model is defined as:
$P''(l_i = l_{i'} \mid X) = \sigma(l_i\, l_{i'}\, \mathbf{v}^{\top} \mu_{i i'}(X))$
where $\psi_i(\cdot)$ is a function that maps a patch centered at site $i$ to a feature vector, $\mu_{i i'}(X)$ is a new feature vector built from $\psi_i(X)$ and $\psi_{i'}(X)$, and $\mathbf{v}$ are the model parameters.
$P''(l_i = l_{i'} \mid X)$ is a measure of how likely sites $i$ and $i'$ are to have the same label given the image $X$.
Slide 30
Interaction Potential
The interaction potential is modeled using a data-dependent term along with a constant smoothing term:
$I(l_i, l_{i'}, X) = \beta \big( K\, l_i\, l_{i'} + (1 - K)\big( 2\, \sigma(l_i\, l_{i'}\, \mathbf{v}^{\top} \mu_{i i'}(X)) - 1 \big) \big)$
The first term is a data-independent smoothing term, similar to the auto-model. The second term is a $[-1, 1]$ mapping of the pairwise logistic function, which ensures that both terms have the same range. Ideally, the data-dependent term acts as a discontinuity-adaptive model that moderates smoothing when the data from the two sites is 'different'.
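The same potential as a sketch; $\beta$, $K$, $\mathbf{v}$, and $\mu_{i i'}$ are all placeholders standing in for learned parameters and patch-derived features:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def interaction_potential(l_i, l_j, mu_ij, v, beta=1.0, K=0.5):
    """I = beta * (K * l_i*l_j + (1-K) * (2*sigma(l_i*l_j * v^T mu_ij) - 1))."""
    smoothing = K * l_i * l_j                       # data-independent, Ising-like
    data_term = (1 - K) * (2 * sigmoid(l_i * l_j * np.dot(v, mu_ij)) - 1)
    return beta * (smoothing + data_term)

# Dissimilar patches should drive v^T mu_ij negative, weakening the smoothing.
print(interaction_potential(+1, -1, mu_ij=np.array([1.0, -2.0]), v=np.array([0.3, 0.8])))
```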
Slide 31
Discussion of $I(l_i, l_{i'}, X)$
Consider neighboring sites whose observed data differ, as at a true object boundary. If only the data-independent smoothing term $K\, l_i\, l_{i'}$ were considered, the model would never choose the labeling with a discontinuity, and the result would be oversmoothed. The second, data-dependent term compensates for this effect of the smoothness assumption.
Slide 32
Parameter Estimation
$\theta = \{\mathbf{w}, \mathbf{v}, \beta, K\}$
Maximum likelihood estimation: in the conventional maximum-likelihood approach, the evaluation of $Z$ is an NP-hard problem.
Approximate evaluation of the partition function $Z$ by the pseudo-likelihood:
$\theta^{*} = \arg\max_{\theta} \prod_{m=1}^{M} \prod_{i \in S} P(l_i^{m} \mid l_{N_i}^{m}, X^{m}, \theta)$, subject to $0 \le K \le 1$
where $m$ indexes the training images and $M$ is the total number of training images.
Slide 33
Inference
Objective function: $L^{*} = \arg\max_{L} P(L \mid X)$
Iterated Conditional Modes (ICM) algorithm: given an initial label configuration, ICM maximizes the local conditional probabilities iteratively, i.e.
$l_i \leftarrow \arg\max_{l_i} P(l_i \mid X, l_{N_i})$
ICM yields a local maximum of the posterior and has been shown to give reasonably good results.
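A minimal ICM sketch, assuming the local conditional score decomposes into a per-site unary term plus an Ising-style neighbor-agreement bonus (an illustrative stand-in for the DRF potentials):

```python
import numpy as np

def icm(unary, beta=1.0, iters=5):
    """unary: (H, W, K) per-site scores (e.g., log association potentials).
    Each sweep sets l_i to the label maximizing its local conditional score
    given the current labels of its 4-connected neighbors."""
    labels = np.argmax(unary, axis=-1)          # initial configuration
    H, W, K = unary.shape
    for _ in range(iters):
        for i in range(H):
            for j in range(W):
                nbrs = [labels[a, b] for a, b in
                        ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                        if 0 <= a < H and 0 <= b < W]
                scores = unary[i, j] + beta * np.array(
                    [sum(int(k == n) for n in nbrs) for k in range(K)])
                labels[i, j] = int(np.argmax(scores))
    return labels

print(icm(np.random.rand(8, 8, 2)))
```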
Slide 34
Experiment
Task: detecting man-made structures in natural scenes.
Database: Corel (training: 108 images; test: 129 images). Each image was divided into non-overlapping 16×16-pixel blocks.
Compared methods: Logistic, MRF, DRF
Slide 35
Experiment Results
Detection rates (DR) and false positives (FP). The superscript '−' indicates that no neighborhood data interaction was used; K = 0 indicates the absence of the data-independent term in the interaction potential of the DRF. The DRF reduces false positives relative to the MRF by more than 48%.
Slide 36
Experiment Results
For a similar detection rate, the DRF has fewer false positives; for similar false positives, the detection rate of the DRF is higher than that of the MRF.
Slide 37
Conclusion of DRF
Pros: provides the benefits of discriminative models; demonstrates good performance.
Cons: although the model outperforms traditional MRFs, it is not strong enough to capture long-range correlations among the labels, due to the rigid lattice-based structure, which allows only pairwise interactions.
Slide 38
Problem
Local information can be confused when there are large overlaps between different classes (sky or water?).
Solution: utilize global contextual information to improve performance.
Slide 39
Multiscale Conditional Random Field (mCRF)
Considers features at different scales:
Local features (site)
Regional label features (small patch)
Global label features (large patch or the whole image)
The conditional probability $P(L \mid X)$ is formulated from features at the different scales $s$:
$P(L \mid X) \propto \prod_{s} P_s(L \mid X)$
He, X., R. Zemel, and M. Carreira-Perpinan: 2004, 'Multiscale conditional random fields for image labelling'. IEEE Int. Conf. CVPR
Slide 40
Local Features
The local feature of site $i$ is represented by the outputs of several filters. The aim is to associate the patch with one of a predefined set of labels.
Slide 41
Local Classifier
Here a multilayer perceptron (MLP) is used as the local classifier. Independently at each site $i$, the local classifier produces a conditional distribution $P_C(l_i \mid x_i, \lambda)$ over the label variable $l_i$ given the filter outputs $x_i$ within an image patch centered on site (pixel) $i$, where $\lambda$ are the classifier parameters.
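A sketch of such a per-site classifier: a one-hidden-layer MLP mapping filter outputs to a softmax distribution over labels. The layer sizes, the tanh nonlinearity, and the random parameters are illustrative assumptions:

```python
import numpy as np

def local_classifier(x_i, W1, b1, W2, b2):
    """P_C(l_i | x_i, lambda): one hidden tanh layer followed by a softmax."""
    h = np.tanh(W1 @ x_i + b1)
    logits = W2 @ h + b2
    e = np.exp(logits - logits.max())  # subtract max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)
x_i = rng.standard_normal(12)          # outputs of 12 filters at site i (assumed)
probs = local_classifier(x_i,
                         rng.standard_normal((20, 12)), np.zeros(20),
                         rng.standard_normal((7, 20)), np.zeros(7))
print(probs.sum())  # 1.0 -- a distribution over, e.g., the 7 Corel labels
```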
Slide 42
Regional Label Features
Encode a particular constraint between the image and the labels within a region of the image.
Sample pattern: ground pixels (brown) above water pixels (cyan)
Slide 43
Global Label Features
Operate at a coarser resolution, specifying a common value for a patch of sites in the label field.
Sample pattern: sky pixels (blue) at the top of the image, hippo pixels (red) in the middle, and water pixels (cyan) near the bottom
Slide 44
Feature Function
Global label features are trained as a Restricted Boltzmann Machine (RBM):
Two layers: label sites ($L$) and features ($f$)
Features and labels are fully inter-connected, with no intra-layer connections
where $\mathbf{w}_a$ is the parameter vector connecting hidden global label feature $f_a$ and the label sites $L$. The joint distribution of the global label feature model is:
$P_G(L, f) \propto \exp\Big( \sum_{a} f_a\, \mathbf{w}_a^{\top} L \Big)$
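The marginalization used on the next slide follows from a standard RBM identity: with binary hidden features and no intra-layer connections, the sum over $f$ factorizes,
$$\sum_{f \in \{0,1\}^{A}} \exp\Big(\sum_{a} f_a\, \mathbf{w}_a^{\top} L\Big) = \prod_{a} \sum_{f_a \in \{0,1\}} \exp\big(f_a\, \mathbf{w}_a^{\top} L\big) = \prod_{a} \big( 1 + \exp(\mathbf{w}_a^{\top} L) \big)$$
where $A$ is the number of hidden features.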
Slide 45
Feature Function
By marginalizing out the hidden variables ($f$), the global component of the model becomes:
$P_G(L) \propto \prod_{a} \big( 1 + \exp(\mathbf{w}_a^{\top} L) \big)$
Similarly, the regional component of the model, $P_R$, can be represented in the same form over the regional label features. The full model is obtained by multiplicatively combining the component conditional distributions:
$P(L \mid X) \propto P_C(L \mid X)\, P_R(L)\, P_G(L)$
Slide 46
Parameter Estimation and Inference
Parameter estimation: the conditional model is trained discriminatively based on the Conditional Maximum Likelihood (CML) criterion, which maximizes the log conditional likelihood:
$\theta^{*} = \arg\max_{\theta} \sum_{m} \log P(L^{m} \mid X^{m}, \theta)$
Inference: Maximum Posterior Marginals (MPM):
$l_i^{*} = \arg\max_{l_i} P(l_i \mid X)$
Slide 47
Experiment Results
Databases: Corel (100 images with 7 labels); Sowerby (104 images with 8 labels).
Compared methods: single classifier (MLP), MRF, mCRF
Slide 48
Labeling Results
Slide 49
Conclusion of mCRF
Pros: formulates the image labeling problem as a multiscale CRF model; combines local and larger-scale contextual information in a single framework.
Cons: including additional classifiers operating at different scales in the mCRF framework introduces a large number of model parameters; the model assumes conditional independence of the hidden variables given the label field.
Slide 50
More CRF Models
Hierarchical Conditional Random Field (HCRF):
S. Kumar and M. Hebert. A hierarchical field framework for unified context-based classification. 2005.
J. Reynolds and K. Murphy. Figure-ground segmentation using a hierarchical conditional random field. 2007.
Tree-Structured Conditional Random Fields (TCRF):
P. Awasthi, A. Gagrani, and B. Ravindran. Image modeling using tree structured conditional random fields. 2007.
Slide 51
References
S. Z. Li. Markov Random Field Modeling in Image Analysis. Springer, 2009.
S. Kumar and M. Hebert. 'Discriminative Random Fields: A Discriminative Framework for Contextual Interaction in Classification'. In Proc. IEEE International Conference on Computer Vision (ICCV), 2003.
X. He, R. Zemel, and M. Carreira-Perpinan. 'Multiscale conditional random fields for image labelling'. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2004.