Conditional Random Fields for Image Labeling
Yilin Wang, 11/5/2009
Background: The Labeling Problem
Labeling: given an observed data set (X) and a label set (L), infer the labels of the data points.
Most vision problems can be posed as labeling problems:
Stereo matching
Image segmentation
Image restoration
Slide 3
Examples of Labeling Problems: Stereo Matching
For a pixel in image 1, where is the corresponding pixel in image 2?
Label set: differences (disparities) between corresponding pixels
Picture source: S. Lazebnik
Slide 4
Examples of Labeling Problems: Image Segmentation
Partition an image into multiple disjoint regions.
Label set: region IDs
Picture source: http://mmlab.ie.cuhk.edu.hk
Slide 5
Examples of Labeling Problems: Image Restoration
"Compensate for" or "undo" defects that degrade an image.
Label set: restored intensities
Picture source: http://www.photorestoration.co.nz
Slide 6
Background: Image Labeling
Given an image, the system should automatically partition it into semantically meaningful areas, each labeled with a specific object class.
Example classes: cow, lawn, plane, sky, building, tree
Slide 7
Image Labeling Problem
Given: the observed data $X = \{x_i\}_{i \in S}$ from an input image, where $x_i$ is the data from site $i$ (a pixel or block) of the image site set $S$, and a pre-defined label set.
Let $L = \{l_i\}_{i \in S}$ be the corresponding labels at the image sites. We want to find the labeling $L$ that maximizes the conditional probability:
$L^{*} = \arg\max_{L} P(L \mid X)$
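To make the objective concrete, here is a minimal brute-force sketch on a toy problem, with a hypothetical unnormalized score standing in for a learned $P(L \mid X)$ (the observations and weights below are all illustrative):

```python
import itertools

# Toy data: 3 sites with assumed binary observations (illustrative only).
observed = [0, 1, 1]

def score(labels):
    """Hypothetical unnormalized posterior score for a labeling L given X."""
    data_term = sum(1.0 if l == x else 0.0 for l, x in zip(labels, observed))
    smoothness = sum(0.25 if a == b else -0.25 for a, b in zip(labels, labels[1:]))
    return data_term + smoothness

# argmax_L P(L | X) by exhaustive enumeration -- feasible only for toy problems;
# real images need the MRF/CRF machinery discussed on the following slides.
best = max(itertools.product([0, 1], repeat=len(observed)), key=score)
print("MAP labeling:", best)  # -> (0, 1, 1)
```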
Slide 8
Which Kinds of Information Can Be Used for Labeling?
Features from individual sites: intensity, color, texture, ...
Interactions with neighboring sites: contextual information
Figure example: knowing a region sits above vegetation helps decide between sky and building.
Slide 9
Two Types of Interactions
Interaction with neighboring labels (spatial smoothness of labels): neighboring sites tend to have similar labels (except at discontinuities).
Interaction with neighboring observed data.
Figure example: building vs. sky.
Slide 10
Information for Image Labeling
Let $l_i$ be the label of site $i$ of the image site set $S$, and $N_i$ be the neighboring sites of site $i$.
Three kinds of information for image labeling:
Features from the local site
Interaction with neighboring labels
Interaction with neighboring observed data
Picture source: S. Xiang
Slide 11
Markov Random Fields (MRF)
Markov Random Fields (MRFs) are the most popular models for incorporating local contextual constraints into labeling problems.
Let $l_i$ be the label of site $i$ of the image site set $S$, and $N_i$ be the neighboring sites of site $i$. The label set $L = \{l_i\}_{i \in S}$ is said to be an MRF on $S$ w.r.t. a neighborhood system $N$ iff the following condition is satisfied:
Markov property: $P(l_i \mid l_{S - \{i\}}) = P(l_i \mid l_{N_i})$
An MRF maintains global spatial consistency by considering only relatively local dependencies.
Slide 12
Markov-Gibbs Equivalence
Let $l$ be a realization of $L$; then $P(l)$ has an explicit formulation (the Gibbs distribution):
$P(l) = \frac{1}{Z} \exp\left(-\frac{U(l)}{T}\right)$
where $U(l) = \sum_{c} V_c(l)$ is the energy function, $Z = \sum_{l} \exp(-U(l)/T)$ is a normalizing factor called the partition function, and $T$ is a constant.
Cliques: $C_k = \{\{i, i', i'', \ldots\} \mid i, i', i'', \ldots \text{ are neighbors of one another}\}$
The potential functions $V_c$ represent a priori knowledge of the interactions between labels of neighboring sites.
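A minimal sketch of this distribution on a tiny 1-D chain, assuming illustrative Potts-style pairwise potentials (none of the numbers come from the slides):

```python
import itertools
import math

# 1-D chain of 4 sites with binary labels; pairwise clique potentials are
# 0 if neighbors agree, 1 otherwise (illustrative values), temperature T = 1.
T = 1.0

def U(l):
    """Energy: sum of pairwise clique potentials over neighboring sites."""
    return sum(0.0 if a == b else 1.0 for a, b in zip(l, l[1:]))

configs = list(itertools.product([0, 1], repeat=4))
# Partition function Z: a sum over *every* configuration -- exactly the
# quantity that becomes intractable for real-sized label fields.
Z = sum(math.exp(-U(l) / T) for l in configs)
P = {l: math.exp(-U(l) / T) / Z for l in configs}
assert abs(sum(P.values()) - 1.0) < 1e-9  # P is a proper distribution
```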
Slide 13
Auto-Model
With clique potentials involving up to two sites, the energy takes the form
$U(l) = \sum_{i \in S} V_1(l_i) + \sum_{i \in S} \sum_{i' \in N_i} V_2(l_i, l_{i'})$
When $V_1(l_i) = l_i G_i(l_i)$ and $V_2(l_i, l_{i'}) = \beta_{i i'}\, l_i\, l_{i'}$, where the $G_i(\cdot)$ are arbitrary functions (or constants) and the $\beta_{i i'}$ are constants reflecting the pairwise interaction between $i$ and $i'$, the energy is
$U(l) = \sum_{i \in S} l_i G_i(l_i) + \sum_{i \in S} \sum_{i' \in N_i} \beta_{i i'}\, l_i\, l_{i'}$
Such models are called auto-models (Besag 1974).
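A sketch of this energy on an image grid, assuming constant $G_i(l_i) = \alpha_i$ and a single shared $\beta$ over 4-connected neighbors (both simplifications are for illustration):

```python
import numpy as np

def auto_model_energy(labels, alpha, beta):
    """U(l) = sum_i alpha_i * l_i + beta * sum_{4-connected pairs} l_i * l_i',
    for labels in {-1, +1}."""
    unary = np.sum(alpha * labels)
    pairwise = beta * (np.sum(labels[:, :-1] * labels[:, 1:]) +   # horizontal pairs
                       np.sum(labels[:-1, :] * labels[1:, :]))    # vertical pairs
    return unary + pairwise

labels = np.random.choice([-1, 1], size=(8, 8))
# beta < 0 favors agreeing neighbors (lower energy = more probable).
print(auto_model_energy(labels, alpha=np.zeros((8, 8)), beta=-1.0))
```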
Slide 14
Parameter Estimation
Given the functional form of the auto-model, how do we specify its parameters $\theta$?
Slide 15
Maximum Likelihood Estimation
Given a realization $l$ of an MRF, the maximum likelihood (ML) estimate maximizes the conditional probability $P(l \mid \theta)$ (the likelihood of $\theta$), that is:
$\theta^{ML} = \arg\max_{\theta} P(l \mid \theta)$
By Bayes' rule: $P(\theta \mid l) \propto P(l \mid \theta)\, P(\theta)$
The prior $P(\theta)$ is assumed to be flat when prior information is totally unavailable. In this case, MAP estimation reduces to ML estimation.
Slide 16
Maximum Likelihood Estimation
The likelihood function is in the Gibbs form
$P(l \mid \theta) = \frac{1}{Z(\theta)} \exp(-U(l \mid \theta))$
where $Z(\theta) = \sum_{l \in \mathbb{L}} \exp(-U(l \mid \theta))$.
However, the computation of $Z(\theta)$ is intractable even for moderately sized problems, because the configuration space $\mathbb{L}$ contains a combinatorial number of elements.
Slide 17
Maximum Pseudo-Likelihood
Assumption: the sites are treated as conditionally independent given their neighborhoods, so the likelihood is approximated by
$PL(l) = \prod_{i \in S} P(l_i \mid l_{N_i})$
Notice that the pseudo-likelihood does not involve the partition function $Z$. The parameters $\{\alpha, \beta\}$ can then be obtained by solving $\max_{\theta} PL(l)$.
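A sketch of the log pseudo-likelihood for the binary ($\pm 1$) auto-model, under the same illustrative $\alpha$/$\beta$ parameterization as the grid example above:

```python
import numpy as np

def log_pseudo_likelihood(labels, alpha, beta):
    """log PL = sum_i log P(l_i | l_Ni) for the binary (+/-1) auto-model;
    note that the partition function Z never appears."""
    padded = np.pad(labels, 1)  # zero padding: border sites just have fewer neighbors
    neighbor_sum = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
                    padded[1:-1, :-2] + padded[1:-1, 2:])
    a = alpha + beta * neighbor_sum
    # log P(l_i | l_Ni) = l_i * a_i - log(exp(a_i) + exp(-a_i))
    return np.sum(labels * a - np.logaddexp(a, -a))
```

Maximizing this in $\alpha$ and $\beta$, e.g. by gradient ascent, gives the pseudo-likelihood estimate without ever evaluating $Z$.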
Slide 18
Inference
Recall that in image labeling we want to find the $L$ that maximizes the posterior. By Bayes' rule:
$P(L \mid X) \propto P(X \mid L)\, P(L)$
where the prior probability is $P(L) = \frac{1}{Z} \exp(-U(L))$. Let $P(X \mid L) \propto \exp(-U(X \mid L))$; then:
$U(L \mid X) = U(X \mid L) + U(L)$
posterior energy = likelihood energy + prior energy
Slide 19
MAP-MRF Labeling
Maximizing the posterior probability is equivalent to minimizing the posterior energy:
$L^{*} = \arg\max_{L} P(L \mid X) = \arg\min_{L} \big( U(X \mid L) + U(L) \big)$
Steps of MAP labeling (figure: S. Xiang), involving the neighborhood system $N$ and the cliques $C$.
Slide 20
MRF for Image Labeling
Difficulties and disadvantages:
Very strict independence assumptions: the interactions among labels are modeled by the prior term $P(L)$ and are independent of the observed data, which prohibits modeling data-dependent interactions between labels.
Slide 21
Conditional Random Fields
Let $G = (S, E)$ be a graph; then $(X, L)$ is said to be a Conditional Random Field (CRF) if, when conditioned on $X$, the random variables $l_i$ obey the Markov property with respect to the graph:
$P(l_i \mid X, l_{S - \{i\}}) = P(l_i \mid X, l_{N_i})$
where $S - \{i\}$ is the set of all sites in the graph except site $i$, and $N_i$ is the set of neighbors of site $i$ in $G$.
Compare with the MRF: $P(l_i \mid l_{S - \{i\}}) = P(l_i \mid l_{N_i})$
Slide 22
CRF
According to the Markov-Gibbs equivalence, if only clique potentials up to pairwise are nonzero, the posterior probability $P(L \mid X)$ has the form
$P(L \mid X) = \frac{1}{Z} \exp\Big( \sum_{i \in S} V_1(l_i, X) + \sum_{i \in S} \sum_{i' \in N_i} V_2(l_i, l_{i'}, X) \Big)$
where $V_1$ and $V_2$ are called the association and interaction potentials, respectively, in the CRF literature.
Slide 23
CRF vs. MRF
MRF is a generative model (two steps): infer the likelihood and the prior, then use Bayes' theorem to determine the posterior.
CRF is a discriminative model (one step): directly infer the posterior.
Slide 24
CRF vs. MRF
More differences between the CRF and the MRF:
MRF: $P(L \mid X) \propto P(X \mid L)\, P(L)$; interactions among labels enter only through the data-independent prior $P(L)$.
CRF: $P(L \mid X) \propto \exp\big( \sum_{i} V_1(l_i, X) + \sum_{i} \sum_{i' \in N_i} V_2(l_i, l_{i'}, X) \big)$
In a CRF, both the association and the interaction potentials are functions of all the observed data as well as of the labels.
Slide 25
Discriminative Random Fields
The Discriminative Random Field (DRF) is a special type of CRF with two extensions. First, a DRF is defined over 2-D lattices (such as the image grid). Second, the unary (association) and pairwise (interaction) potentials are designed using local discriminative classifiers.
Kumar, S. and M. Hebert: 'Discriminative Random Fields: A Discriminative Framework for Contextual Interaction in Classification'. ICCV 2003
Slide 26
DRF
Formulation of the DRF:
$P(L \mid X) = \frac{1}{Z} \exp\Big( \sum_{i \in S} A(l_i, X) + \sum_{i \in S} \sum_{i' \in N_i} I(l_i, l_{i'}, X) \Big)$
where $A(l_i, X)$ and $I(l_i, l_{i'}, X)$ are called the association potential and the interaction potential.
Picture source: S. Xiang
Slide 27
Association Potential
$A(l_i, X)$ is modeled using a local discriminative model that outputs the association of site $i$ with class $l_i$ as $P'(l_i \mid f_i(X))$,
where $f_i(\cdot)$ is a linear function that maps a patch centered at site $i$ to a feature vector.
Picture source: S. Srihari
Slide 28
Association Potential
For binary classification ($l_i = -1$ or $1$), the posterior at site $i$ is modeled using a logistic function:
$P'(l_i = 1 \mid X) = \frac{1}{1 + \exp(-\mathbf{w}^{\top} f_i(X))} = \sigma(\mathbf{w}^{\top} f_i(X))$
Since $l_i = -1$ or $1$, the probability can be compactly expressed as:
$P'(l_i \mid X) = \sigma(l_i\, \mathbf{w}^{\top} f_i(X))$
Finally, the association potential is defined as:
$A(l_i, X) = \log P'(l_i \mid X)$
Picture source: S. Srihari
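A small sketch of this potential, assuming an illustrative weight vector $\mathbf{w}$ and feature vector $f_i(X)$ (both made up here):

```python
import numpy as np

def association_potential(l_i, f_i, w):
    """A(l_i, X) = log sigma(l_i * w^T f_i(X)), computed stably."""
    z = l_i * np.dot(w, f_i)
    return -np.logaddexp(0.0, -z)  # log(1 / (1 + exp(-z)))

w = np.array([0.5, -1.0, 2.0])    # made-up learned weights
f_i = np.array([1.0, 0.2, 0.3])   # made-up feature vector for the patch at site i
print(association_potential(+1, f_i, w), association_potential(-1, f_i, w))
```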
Slide 29
Interaction Potential
The interaction potential can be seen as a measure of how the labels at neighboring sites $i$ and $i'$ should interact given the observed image $X$. Given the features at the two sites, a pairwise discriminative model is defined as:
$P''(l_i = l_{i'} \mid X) = \sigma(l_i\, l_{i'}\, \mathbf{v}^{\top} \mu_{i i'}(X))$
where $\psi_i(\cdot)$ is a function that maps a patch centered at site $i$ to a feature vector, $\mu_{i i'}(X)$ is a new feature vector built from $\psi_i(X)$ and $\psi_{i'}(X)$, and $\mathbf{v}$ are the model parameters.
$P''(l_i = l_{i'} \mid X)$ is a measure of how likely sites $i$ and $i'$ are to have the same label given the image $X$.
Slide 30
Interaction Potential
The interaction potential is modeled using a data-dependent term along with a constant smoothing term:
$I(l_i, l_{i'}, X) = \beta \big( K\, l_i\, l_{i'} + (1 - K)\big( 2\, \sigma(l_i\, l_{i'}\, \mathbf{v}^{\top} \mu_{i i'}(X)) - 1 \big) \big)$
The first term is a data-independent smoothing term, similar to the auto-model. The second term is a $[-1, 1]$ mapping of the pairwise logistic function, which ensures that both terms have the same range. Ideally, the data-dependent term acts as a discontinuity-adaptive model that moderates smoothing when the data from the two sites is 'different'.
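The same potential as a sketch; $\beta$, $K$, $\mathbf{v}$, and $\mu_{i i'}$ are all placeholders standing in for learned parameters and patch-derived features:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def interaction_potential(l_i, l_j, mu_ij, v, beta=1.0, K=0.5):
    """I = beta * (K * l_i*l_j + (1-K) * (2*sigma(l_i*l_j * v^T mu_ij) - 1))."""
    smoothing = K * l_i * l_j                       # data-independent, Ising-like
    data_term = (1 - K) * (2 * sigmoid(l_i * l_j * np.dot(v, mu_ij)) - 1)
    return beta * (smoothing + data_term)

# Dissimilar patches should drive v^T mu_ij negative, weakening the smoothing.
print(interaction_potential(+1, -1, mu_ij=np.array([1.0, -2.0]), v=np.array([0.3, 0.8])))
```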
Slide 31
Discussion of $I(l_i, l_{i'}, X)$
Consider neighboring sites whose observed data differ, as at a true object boundary. If only the data-independent smoothing term $K\, l_i\, l_{i'}$ were considered, the model would never choose the labeling with a discontinuity, and the result would be oversmoothed. The second, data-dependent term compensates for this effect of the smoothness assumption.
Slide 32
Parameter Estimation
$\theta = \{\mathbf{w}, \mathbf{v}, \beta, K\}$
Maximum likelihood estimation: in the conventional maximum-likelihood approach, the evaluation of $Z$ is an NP-hard problem.
Approximate evaluation of the partition function $Z$ by the pseudo-likelihood:
$\theta^{*} = \arg\max_{\theta} \prod_{m=1}^{M} \prod_{i \in S} P(l_i^{m} \mid l_{N_i}^{m}, X^{m}, \theta)$, subject to $0 \le K \le 1$
where $m$ indexes the training images and $M$ is the total number of training images.
Slide 33
Inference
Objective function: $L^{*} = \arg\max_{L} P(L \mid X)$
Iterated Conditional Modes (ICM) algorithm: given an initial label configuration, ICM maximizes the local conditional probabilities iteratively, i.e.
$l_i \leftarrow \arg\max_{l_i} P(l_i \mid X, l_{N_i})$
ICM yields a local maximum of the posterior and has been shown to give reasonably good results.
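A minimal ICM sketch, assuming the local conditional score decomposes into a per-site unary term plus an Ising-style neighbor-agreement bonus (an illustrative stand-in for the DRF potentials):

```python
import numpy as np

def icm(unary, beta=1.0, iters=5):
    """unary: (H, W, K) per-site scores (e.g., log association potentials).
    Each sweep sets l_i to the label maximizing its local conditional score
    given the current labels of its 4-connected neighbors."""
    labels = np.argmax(unary, axis=-1)          # initial configuration
    H, W, K = unary.shape
    for _ in range(iters):
        for i in range(H):
            for j in range(W):
                nbrs = [labels[a, b] for a, b in
                        ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                        if 0 <= a < H and 0 <= b < W]
                scores = unary[i, j] + beta * np.array(
                    [sum(int(k == n) for n in nbrs) for k in range(K)])
                labels[i, j] = int(np.argmax(scores))
    return labels

print(icm(np.random.rand(8, 8, 2)))
```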
Slide 34
Experiment
Task: detecting man-made structures in natural scenes.
Database: Corel (training: 108 images; test: 129 images). Each image was divided into non-overlapping 16×16-pixel blocks.
Compared methods: Logistic, MRF, DRF
Slide 35
Experiment Results
Detection rates (DR) and false positives (FP). The superscript '−' indicates that no neighborhood data interaction was used; K = 0 indicates the absence of the data-independent term in the interaction potential of the DRF. The DRF reduces false positives relative to the MRF by more than 48%.
Slide 36
Experiment Results
For a similar detection rate, the DRF has fewer false positives; for similar false positives, the detection rate of the DRF is higher than that of the MRF.
Slide 37
Conclusion of DRF
Pros: provides the benefits of discriminative models; demonstrates good performance.
Cons: although the model outperforms traditional MRFs, it is not strong enough to capture long-range correlations among the labels, due to the rigid lattice-based structure, which allows only pairwise interactions.
Slide 38
Problem
Local information can be confused when there are large overlaps between different classes (sky or water?).
Solution: utilize global contextual information to improve performance.
Slide 39
Multiscale Conditional Random Field (mCRF)
Considers features at different scales:
Local features (site)
Regional label features (small patch)
Global label features (large patch or the whole image)
The conditional probability $P(L \mid X)$ is formulated from features at the different scales $s$:
$P(L \mid X) \propto \prod_{s} P_s(L \mid X)$
He, X., R. Zemel, and M. Carreira-Perpinan: 2004, 'Multiscale conditional random fields for image labelling'. IEEE Int. Conf. CVPR
Slide 40
Local Features
The local feature of site $i$ is represented by the outputs of several filters. The aim is to associate the patch with one of a predefined set of labels.
Slide 41
Local Classifier
Here a multilayer perceptron (MLP) is used as the local classifier. Independently at each site $i$, the local classifier produces a conditional distribution $P_C(l_i \mid x_i, \lambda)$ over the label variable $l_i$ given the filter outputs $x_i$ within an image patch centered on site (pixel) $i$, where $\lambda$ are the classifier parameters.
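A sketch of such a per-site classifier: a one-hidden-layer MLP mapping filter outputs to a softmax distribution over labels. The layer sizes, the tanh nonlinearity, and the random parameters are illustrative assumptions:

```python
import numpy as np

def local_classifier(x_i, W1, b1, W2, b2):
    """P_C(l_i | x_i, lambda): one hidden tanh layer followed by a softmax."""
    h = np.tanh(W1 @ x_i + b1)
    logits = W2 @ h + b2
    e = np.exp(logits - logits.max())  # subtract max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)
x_i = rng.standard_normal(12)          # outputs of 12 filters at site i (assumed)
probs = local_classifier(x_i,
                         rng.standard_normal((20, 12)), np.zeros(20),
                         rng.standard_normal((7, 20)), np.zeros(7))
print(probs.sum())  # 1.0 -- a distribution over, e.g., the 7 Corel labels
```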
Slide 42
Regional Label Features
Encode a particular constraint between the image and the labels within a region of the image.
Sample pattern: ground pixels (brown) above water pixels (cyan)
Slide 43
Global Label Features
Operate at a coarser resolution, specifying a common value for a patch of sites in the label field.
Sample pattern: sky pixels (blue) at the top of the image, hippo pixels (red) in the middle, and water pixels (cyan) near the bottom
Slide 44
Feature Function
Global label features are trained as a Restricted Boltzmann Machine (RBM):
Two layers: label sites ($L$) and features ($f$)
Features and labels are fully inter-connected, with no intra-layer connections
where $\mathbf{w}_a$ is the parameter vector connecting hidden global label feature $f_a$ and the label sites $L$. The joint distribution of the global label feature model is:
$P_G(L, f) \propto \exp\Big( \sum_{a} f_a\, \mathbf{w}_a^{\top} L \Big)$
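The marginalization used on the next slide follows from a standard RBM identity: with binary hidden features and no intra-layer connections, the sum over $f$ factorizes,
$$\sum_{f \in \{0,1\}^{A}} \exp\Big(\sum_{a} f_a\, \mathbf{w}_a^{\top} L\Big) = \prod_{a} \sum_{f_a \in \{0,1\}} \exp\big(f_a\, \mathbf{w}_a^{\top} L\big) = \prod_{a} \big( 1 + \exp(\mathbf{w}_a^{\top} L) \big)$$
where $A$ is the number of hidden features.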
Slide 45
Feature Function
By marginalizing out the hidden variables ($f$), the global component of the model becomes:
$P_G(L) \propto \prod_{a} \big( 1 + \exp(\mathbf{w}_a^{\top} L) \big)$
Similarly, the regional component of the model, $P_R$, can be represented in the same form over the regional label features. The full model is obtained by multiplicatively combining the component conditional distributions:
$P(L \mid X) \propto P_C(L \mid X)\, P_R(L)\, P_G(L)$
Slide 46
Parameter Estimation and Inference
Parameter estimation: the conditional model is trained discriminatively based on the Conditional Maximum Likelihood (CML) criterion, which maximizes the log conditional likelihood:
$\theta^{*} = \arg\max_{\theta} \sum_{m} \log P(L^{m} \mid X^{m}, \theta)$
Inference: Maximum Posterior Marginals (MPM):
$l_i^{*} = \arg\max_{l_i} P(l_i \mid X)$
Slide 47
Experiment Results
Databases: Corel (100 images with 7 labels); Sowerby (104 images with 8 labels).
Compared methods: single classifier (MLP), MRF, mCRF
Slide 48
Labeling Results
Slide 49
Conclusion of mCRF
Pros: formulates the image labeling problem as a multiscale CRF model; combines local and larger-scale contextual information in a single framework.
Cons: including additional classifiers operating at different scales in the mCRF framework introduces a large number of model parameters; the model assumes conditional independence of the hidden variables given the label field.
Slide 50
More CRF Models
Hierarchical Conditional Random Field (HCRF):
S. Kumar and M. Hebert. A hierarchical field framework for unified context-based classification. 2005.
J. Reynolds and K. Murphy. Figure-ground segmentation using a hierarchical conditional random field. 2007.
Tree-Structured Conditional Random Fields (TCRF):
P. Awasthi, A. Gagrani, and B. Ravindran. Image modeling using tree structured conditional random fields. 2007.
Slide 51
References
S. Z. Li. Markov Random Field Modeling in Image Analysis. Springer, 2009.
S. Kumar and M. Hebert. 'Discriminative Random Fields: A Discriminative Framework for Contextual Interaction in Classification'. In Proc. IEEE International Conference on Computer Vision (ICCV), 2003.
X. He, R. Zemel, and M. Carreira-Perpinan. 'Multiscale conditional random fields for image labelling'. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2004.