Copyright (c) 2010 IEEE. Personal use is permitted. For any other purposes, Permission must be obtained from the IEEE by emailing [email protected].

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.


Multiregion Image Segmentation by Parametric Kernel Graph Cuts

M. Ben Salah*, Member, IEEE, A. Mitiche*‡, Member, IEEE, and I. Ben Ayed†, Member, IEEE

*INRS-EMT, Institut national de la recherche scientifique, Montréal, H5A 1K6, QC, Canada. Email: [bensalah, mitiche]@emt.inrs.ca

†General Electric (GE) Canada, 268 Grosvenor, E5-137, London, N6A 4V2, ON, Canada. E-mail: [email protected]

‡ Corresponding author

Abstract— The purpose of this study is to investigate multiregion graph cut image partitioning via kernel mapping of the image data. The image data is transformed implicitly by a kernel function so that the piecewise constant model of the graph cut formulation becomes applicable. The objective function contains an original data term to evaluate the deviation of the transformed data, within each segmentation region, from the piecewise constant model, and a smoothness, boundary-preserving regularization term. The method affords an effective alternative to complex modeling of the original image data while taking advantage of the computational benefits of graph cuts. Using a common kernel function, energy minimization typically consists of iterating image partitioning by graph cut iterations and evaluations of region parameters via fixed point computation. A quantitative and comparative performance assessment is carried out over a large number of experiments using synthetic grey level data as well as natural images from the Berkeley database. The effectiveness of the method is also demonstrated through a set of experiments with real images of a variety of types such as medical, SAR, and motion maps.

Index Terms— Image segmentation, graph cuts, kernel k-means.

I. INTRODUCTION

Image segmentation is a fundamental problem in computer vision. It has been the subject of a large number of theoretical and practical studies [1], [2], [3]. Its purpose is to divide an image into regions answering a given description. Many studies have focused on variational formulations because they result in the most effective algorithms [4], [5], [6], [7], [8], [9], [10], [11]. Variational formulations [4] seek an image partition which minimizes an objective functional containing terms that embed descriptions of its regions and their boundaries. The literature abounds with both continuous and discrete formulations.

Continuous formulations view images as continuous functions over a continuous domain [4], [5], [6]. The most effective minimize active curve functionals via level sets [6], [2], [3]. The minimization relies on gradient descent. As a result, the algorithms converge to a local minimum, can be affected by the initialization [6], [12], and are notoriously slow in spite of the various computational artifacts which can speed their execution [13]. The long time of execution is the major impediment in many applications, particularly those which deal with large images and segmentations into a large number of regions [12].

Discrete formulations view images as discrete functions over a positional array [14], [15], [9], [16], [18], [19]. Combinatorial optimization methods which use graph cut algorithms have been the most efficient. They have been of intense interest recently as several studies have demonstrated that graph cut optimization can be useful in image analysis. Very fast methods have been implemented for image segmentation [15], [20], [21], [22], [23], [24], [25], motion and stereo segmentation [26], [27], tracking [28], and restoration [14], [11]. Following the work in [10], objective functionals typically contain a data term to measure the conformity of the image data within the segmentation regions to statistical models and a regularization term for smooth segmentation boundaries. Minimization by graph cuts of objective functionals with a piecewise constant data term produces nearly global optima [11] and, therefore, is less sensitive to initialization.

Unsupervised graph cut methods, which do not require user intervention, have used the piecewise constant model, or its Gaussian generalization, because the data term can be written in the form required by the graph cut algorithm [11], [25], [29], [30]. However, although useful, these models are not generally applicable. For instance, SAR images are best described by the Gamma distribution [31], [32], [33] and polarimetric images by the Wishart distribution or the complex Gaussian [12], [34]. Even within the same image, different regions may require completely different models. For example, the luminance within shadow regions in sonar imagery is well modeled by the Gaussian distribution whereas the Rayleigh distribution is more accurate in the reverberation regions [35]. The parameters of all such models do not depend on the data in a way that would preserve the form of the data term required by the graph cut algorithm and, as a result, the models cannot be used. The form of the objective function must be a sum over all pixels of pixel or pixel-neighborhood dependent data and variables. Variables which are global over the segmentation regions do not apply if they cannot be written in a way that affords such a form. The parameters of a Gaussian are, in discrete approximation, linear combinations of image data and, as a result, can be used in the objective function.

One way to introduce more general models is to allow user interaction. Several interactive graph cut methods have used models more general than the Gaussian by adding a process to learn the region parameters at any step of the graph cut segmentation process. These parameters become part of the data at each step. However, the parameter learning and the segmentation process are only loosely coupled in the sense that they do not result from the objective function optimization. Current interactive methods have treated the case of foreground/background segmentation, i.e., the segmentation into two regions. The studies in [10], [14] have described the regions by their image histograms and [17], [18] used mixtures of Gaussians. Although very effective in several applications, for instance image editing, interactive methods do not extend readily to multiregion segmentation into a large number of regions, where user interactions and general models, such as histograms and mixtures of Gaussians, become intractable. Note that it is always possible to use a given model in unsupervised graph cut segmentation if this model is learned beforehand. In this case, the model becomes part of the data. However, modeling is notoriously difficult and time consuming [36]. Moreover, a model learned using a sample from a class of images is generally not applicable to images of a different class.

The purpose of this study is to investigate kernel mapping to bring the unsupervised graph cut formulation to bear on multiregion segmentation of images more general than Gaussian. The image data is mapped implicitly via a kernel function into data of a higher dimension so that the piecewise constant model, and the unsupervised graph cut formulation thereof, becomes applicable (see Fig. 1 for an illustration). The mapping is implicit because the dot product, and the Euclidean norm thereof, in the higher dimensional space of the transformed data can be expressed via the kernel function without explicit evaluation of the transform [37]. Several studies have shown evidence that the prevalent kernels in pattern classification are capable of properly clustering data of complex structure [38], [39], [40], [41], [42], [43], [37]. In the view that image segmentation is spatially constrained clustering of image data [44], kernel mapping should be quite effective in segmentation of various types of images.

The proposed functional contains two terms: an original kernel-induced term which evaluates the deviation of the mapped image data within each region from the piecewise constant model and a regularization term expressed as a function of the region indices. Using a common kernel function, the objective functional minimization is carried out by iterations of two consecutive steps: (1) minimization with respect to the image segmentation by graph cuts and (2) minimization with respect to the region parameters via fixed point computation. The proposed method shares the advantages of both simple modeling and graph cut optimization. Using a common kernel function, we verified the effectiveness of the method by a quantitative and comparative performance evaluation over a large number of experiments on synthetic images. In comparison to existing graph cut methods, the proposed method brings advantages in regard to segmentation accuracy and flexibility. To illustrate these advantages, we also ran the tests with various classes of real images, including natural images from the Berkeley database, medical images, and satellite data. More complex data, such as color images and motion maps, have also been used.

The remainder of this paper is organized as follows. The next section reviews graph cut image segmentation, commonly stated as a maximum a posteriori (MAP) estimation problem [10], [14], [17], [29]. Section III introduces the kernel-induced data term in the graph cut segmentation functional. It also gives the functional optimization equations and the ensuing algorithm. Section IV describes the validation experiments, and Section V contains a conclusion.

Fig. 1. Illustration of nonlinear 3D data separation with mapping: the data is not linearly separable in the data space. The data is mapped to a higher dimensional feature (kernel) space so as to have a better separability, ideally linear. For the purpose of display, the feature space in this example is of the same dimension as the original data space. In general, the feature space is of higher dimension.

II. UNSUPERVISED PARAMETRIC GRAPH CUT SEGMENTATION

Let I : p ∈ Ω ⊂ R² → I_p = I(p) ∈ I be an image function from a positional array Ω to a space I of photometric variables such as intensity, disparities, color or texture vectors. Segmenting I into N_reg regions consists of finding a partition {R_l}_{l=1}^{N_reg} of the discrete image domain so that each region is homogeneous with respect to some image characteristics.

Graph cut methods state image segmentation as a label assignment problem. Partitioning of the image domain Ω amounts to assigning each pixel a label l in some finite set of labels L. A region R_l is defined as the set of pixels whose label is l, i.e., R_l = {p ∈ Ω | p is labeled l}. The problem consists of finding the labeling which minimizes a given functional describing some common constraints. Following the pioneering work of Boykov and Jolly [10], functionals which are often used and which arise in various computer vision problems are the sum of two characteristic terms: a data term to measure the conformity of image data within the segmentation regions to a statistical model and a regularization term (the prior) for smooth region boundaries. The definition of the data term is crucial to the ensuing algorithms and is the main focus of this study. In this connection, several existing methods state image segmentation as a Maximum A Posteriori (MAP) estimation problem [14], [10], [17], [18], [25], [29], [30], [45], [46], [47], where optimization of the region term amounts to maximizing the conditional probability of pixel data given the assumed model distributions within regions, P(I_p | R_l). In the context of interactive image segmentation, these model distributions are estimated from user interactions [14], [17], [18]. In this connection, histograms [14] and Mixtures of Gaussian Models (MGM) [17] are commonly used to estimate model distributions. In the context of unsupervised image segmentation, i.e., segmentation without user interactions, some parametric distributions, such as the Gaussian model, were amenable to graph cut optimization [29], [30]. To write down the segmentation functional, let λ be an indexing function. λ assigns each point of the image to a region:

$$\lambda : p \in \Omega \longrightarrow \lambda(p) \in L \qquad (1)$$

where L is the finite set of region indices whose cardinality is less than or equal to N_reg. The segmentation functional can then be written as:

$$F(\lambda) = D(\lambda) + \alpha\, R(\lambda) \qquad (2)$$

where D is the data term and R the prior. α is a positive factor. The MAP formulation using a given parametric model defines the data term as follows:

$$D(\lambda) = \sum_{p \in \Omega} D_p(\lambda(p)) = \sum_{l \in L} \sum_{p \in R_l} -\log P(I_p \mid R_l). \qquad (3)$$

In this statement, the piecewise constant segmentation model, a particular case of the Gaussian distribution, has been the focus of several recent studies [11], [29], [30] because the ensuing algorithms are computationally simple. Let µ_l be the piecewise constant model parameter of region R_l. In this case, the data term is given by:

$$D(\lambda) = \sum_{p \in \Omega} D_p(\lambda(p)) = \sum_{l \in L} \sum_{p \in R_l} (\mu_l - I_p)^2. \qquad (4)$$

The prior is expressed as follows:

$$R(\lambda) = \sum_{\{p,q\} \in \mathcal{N}} r(\lambda(p), \lambda(q)) \qquad (5)$$

with N a neighborhood set containing all pairs of neighboring pixels and r(λ(p), λ(q)) a smoothness regularization function given by the truncated squared absolute difference [11], [21], [29], r(λ(p), λ(q)) = min(const², |µ_λ(p) − µ_λ(q)|²), where const is a constant.
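To make the piecewise constant formulation concrete, here is a minimal NumPy sketch (our own illustration, not the authors' code) that evaluates the data term (4) and the truncated-quadratic prior (5) for a given labeling; the 4-connected neighborhood and the array names are illustrative assumptions.

```python
import numpy as np

def piecewise_constant_energy(image, labels, mu, alpha, const):
    """Evaluate F(lambda) = D(lambda) + alpha * R(lambda) of Eqs. (2), (4), (5).

    image  : 2D float array of grey levels I_p
    labels : 2D int array, labels[p] = lambda(p) in {0, ..., Nreg - 1}
    mu     : 1D array, mu[l] is the piecewise constant parameter of region R_l
    """
    model = mu[labels]                               # mu_{lambda(p)} at every pixel
    data = np.sum((model - image) ** 2)              # data term, Eq. (4)

    # Prior, Eq. (5): truncated squared difference over 4-connected neighbor pairs,
    # each horizontal and vertical pair counted once.
    prior = np.sum(np.minimum(const ** 2, (model[:, 1:] - model[:, :-1]) ** 2)) \
          + np.sum(np.minimum(const ** 2, (model[1:, :] - model[:-1, :]) ** 2))

    return data + alpha * prior
```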

Although used most often, the piecewise constant model is, generally, not applicable. For instance, natural images require more general models [48], and the specific, yet important, SAR and polarimetric images require the Rayleigh and Wishart models [12], [31], [32]. In the next section, we will introduce a data term which references the image data transformed via a kernel function, and explain the purpose and advantages of doing so.

III. SEGMENTATION FUNCTIONAL IN THE KERNEL-INDUCED SPACE

In general, image data is complex. Therefore, computationally efficient models, such as the piecewise Gaussian distribution, are not sufficient to partition non-linearly separable data. This study uses kernel functions to transform image data: rather than seeking accurate (complex) image models and addressing a nonlinear problem, we transform the image data implicitly via a mapping function φ, typically nonlinear, so that the piecewise constant model becomes applicable in the mapped space and, therefore, we solve a (simpler) linear problem (refer to Fig. 1). More importantly, there is no need to explicitly compute the mapping φ. Using Mercer's theorem [37], the dot product in the feature space suffices to write the kernel-induced data term as a function of the image, the region parameters, and a kernel function. Furthermore, neither prior knowledge nor user interactions are required for the proposed method.

A. Proposed functional

Let φ(.) be a non-linear mapping from the observation space I to a higher (possibly infinite) dimensional feature/mapped space J. A given labeling assigns each pixel a label and, consequently, divides the image domain into multiple regions. Each region is characterized by one label: R_l = {p ∈ Ω | λ(p) = l}, 1 ≤ l ≤ N_reg. Solving image segmentation in a kernel-induced space with graph cuts consists of finding the labeling which minimizes:

$$F_K(\{\mu_l\}, \lambda) = \sum_{l \in L} \sum_{p \in R_l} \big(\phi(\mu_l) - \phi(I_p)\big)^2 + \alpha \sum_{\{p,q\} \in \mathcal{N}} r(\lambda(p), \lambda(q)) \qquad (6)$$

F_K measures kernel-induced non-Euclidean distances between the observations and the region parameters µ_l, 1 ≤ l ≤ N_reg.

In machine learning, the kernel trick consists of using a linear classifier to solve a non-linear problem by mapping the original non-linear data into a higher-dimensional space. Following Mercer's theorem [37], which states that any continuous, symmetric, positive semi-definite kernel function can be expressed as a dot product in a high-dimensional space, we do not have to know explicitly the mapping φ. Instead, we can use a kernel function, K(y, z), verifying:

$$K(y, z) = \phi(y)^T \cdot \phi(z), \quad \forall\, (y, z) \in \mathcal{I}^2, \qquad (7)$$

where "·" is the dot product in the feature space. Substitution of the kernel function gives

$$
\begin{aligned}
J_K(I_p, \mu) &= \|\phi(I_p) - \phi(\mu)\|^2 \\
&= \big(\phi(I_p) - \phi(\mu)\big)^T \cdot \big(\phi(I_p) - \phi(\mu)\big) \\
&= \phi(I_p)^T \phi(I_p) - \phi(\mu)^T \phi(I_p) - \phi(I_p)^T \phi(\mu) + \phi(\mu)^T \phi(\mu) \\
&= K(I_p, I_p) + K(\mu, \mu) - 2K(I_p, \mu), \qquad \mu \in \{\mu_l\}_{1 \le l \le N_{reg}},
\end{aligned}
\qquad (8)
$$

which is a non-Euclidean distance measure in the original data space corresponding to the squared norm in the feature space.
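As a concrete illustration of (8), the short sketch below (ours, with an arbitrary illustrative value of σ) evaluates the RBF kernel K(y, z) = exp(−‖y − z‖²/σ²) used later in the experiments (see Table I) and the induced distance J_K; since K(x, x) = 1 for the RBF kernel, J_K(I_p, µ) reduces to 2(1 − K(I_p, µ)).

```python
import numpy as np

def rbf_kernel(y, z, sigma=10.0):
    """RBF kernel K(y, z) = exp(-||y - z||^2 / sigma^2) for scalar or vector observations."""
    d = np.atleast_1d(np.asarray(y, dtype=float) - np.asarray(z, dtype=float))
    return float(np.exp(-np.dot(d, d) / sigma ** 2))

def kernel_distance(I_p, mu, kernel=rbf_kernel):
    """J_K(I_p, mu) = K(I_p, I_p) + K(mu, mu) - 2 K(I_p, mu), Eq. (8)."""
    return kernel(I_p, I_p) + kernel(mu, mu) - 2.0 * kernel(I_p, mu)

# With the RBF kernel this equals 2 * (1 - rbf_kernel(I_p, mu)).
```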


Now, the simplifications in equation (8) lead to the following kernel-induced segmentation functional F_K:

$$F_K(\{\mu_l\}, \lambda) = \sum_{l \in L} \sum_{p \in R_l} J_K(I_p, \mu_l) + \alpha \sum_{\{p,q\} \in \mathcal{N}} r(\lambda(p), \lambda(q)) \qquad (9)$$

The functional (9) depends both on the region parameters {µ_l}_{l=1,..,N_reg} and the labeling λ(·). In the next subsection, we describe the segmentation functional optimization strategy.

B. Optimization

Functional (9) is minimized with an iterative two-step optimization strategy. Using a common kernel function, the first step consists of fixing the labeling (or the image partition) and optimizing F_K with respect to the statistical region parameters {µ_l}_{l=1,..,N_reg} via fixed point computation. The second step consists of finding the optimal labeling/partition of the image, given the region parameters provided by the first step, via graph cut iterations. The algorithm iterates these two steps until convergence. Each step decreases F_K with respect to a parameter. Thus, the algorithm is guaranteed to converge at least to a local minimum.
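The alternation just described can be summarized by the following driver sketch; it is our own outline under stated assumptions (the two steps are supplied as callables, and the initialization of the region parameters by region means is an illustrative choice), not the authors' implementation. An illustrative sketch of the parameter update appears after Eq. (13); the relabeling step relies on the swap moves of Section III-B.2.

```python
import numpy as np

def kernel_graph_cut_segmentation(image, init_labels, update_parameters, relabel,
                                  n_iter=20):
    """Alternate the two optimization steps of Section III-B until the labeling is stable.

    update_parameters(image, labels, mu) -> mu     : fixed point step (Section III-B.1)
    relabel(image, labels, mu)           -> labels : graph cut swap-move step (Section III-B.2)
    Both callables are placeholders to be supplied by the user.
    """
    labels = np.asarray(init_labels).copy()
    n_regions = int(labels.max()) + 1
    # Illustrative initialization: mean intensity of each initial region.
    mu = np.array([image[labels == l].mean() if np.any(labels == l) else image.mean()
                   for l in range(n_regions)])
    for _ in range(n_iter):
        mu = update_parameters(image, labels, mu)   # step 1: labeling fixed
        new_labels = relabel(image, labels, mu)     # step 2: parameters fixed
        if np.array_equal(new_labels, labels):      # convergence: no label changed
            break
        labels = new_labels
    return labels, mu
```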

1) Update of the region parameters: Given a partition of the image domain, the derivative of F_K with respect to µ_k, k ∈ L, yields the following equations:

$$
\frac{\partial F_K}{\partial \mu_k}
= \sum_{p \in R_k} \frac{\partial J_K(I_p, \mu_k)}{\partial \mu_k}
+ \alpha \sum_{\{p,q\} \in \mathcal{N}} \frac{\partial r(\lambda(p), \lambda(q))}{\partial \mu_k}
= \sum_{p \in R_k} \frac{\partial}{\partial \mu_k}\Big[K(\mu_k, \mu_k) - 2K(I_p, \mu_k)\Big]
+ \alpha \sum_{\substack{\{p,q\} \in \mathcal{N},\; \lambda(p)=k \\ |\mu_{\lambda(p)} - \mu_{\lambda(q)}| < \mathrm{const}}} \big(\mu_{\lambda(p)} - \mu_{\lambda(q)}\big)
\qquad (10)
$$

For each pixel p, the p-neighborhood system we adopt, N_p, is of size 4, i.e., it is composed of the four pixels horizontally and vertically adjacent to p. Let N̄_p be the set of neighbors q of p verifying λ(q) ≠ λ(p) and |µ_λ(p) − µ_λ(q)| < const. Then (10) can be written as:

$$
\frac{\partial F_K}{\partial \mu_k}
= \sum_{p \in R_k} \frac{\partial}{\partial \mu_k}\Big[K(\mu_k, \mu_k) - 2K(I_p, \mu_k)\Big]
+ \alpha \underbrace{\sum_{p \in R_k}}_{\text{Sum A}} \underbrace{\sum_{q \in \bar{N}_p}}_{\text{Sum B}} \big(\mu_k - \mu_{\lambda(q)}\big), \quad k \in L
\qquad (11)
$$

The sum A in (11) can be restricted to pixels lying on the boundary C_k of each region R_k. Indeed, for a pixel p in the interior of R_k, i.e., p ∈ R_k and p ∉ C_k, all neighbors q belong to R_k so that N̄_p = ∅. This simplifies (11) to:

$$
\frac{\partial F_K}{\partial \mu_k}
= \sum_{p \in R_k} \frac{\partial}{\partial \mu_k}\Big[K(\mu_k, \mu_k) - 2K(I_p, \mu_k)\Big]
+ \alpha \sum_{p \in C_k} \sum_{q \in \bar{N}_p} \big(\mu_k - \mu_{\lambda(q)}\big), \quad k \in L
$$

Table I lists commonly used kernel functions [39]. In all our experiments, a large number, we used the radial basis function (RBF) kernel. The RBF kernel has been prevalent in pattern data clustering [39], [43], [59].

TABLE I
EXAMPLES OF PREVALENT KERNEL FUNCTIONS

RBF Kernel:        K(y, z) = exp(−‖y − z‖² / σ²)
Polynomial Kernel: K(y, z) = (y · z + c)^d
Sigmoid Kernel:    K(y, z) = tanh(c(y · z) + θ)

With this kernel, the necessary condition for a minimum of the segmentation functional F_K with respect to region parameter µ_k, k ∈ L, is the following fixed-point equation:

$$\mu_k - f_{R_k}(\mu_k) = 0, \quad k \in L \qquad (12)$$

where

$$
f_{R_k}(\mu_k) = \frac{\displaystyle \sum_{p \in R_k} I_p\, K(I_p, \mu_k) + \alpha \sum_{p \in C_k} \sum_{q \in \bar{N}_p} \mu_{\lambda(q)}}
{\displaystyle \sum_{p \in R_k} K(I_p, \mu_k) + \alpha \sum_{p \in C_k} \sharp \bar{N}_p}, \quad k \in L
\qquad (13)
$$

and ♯ designates set cardinality. A proof of the existence of a fixed point is given in Appendix A. Therefore, a minimum of F_K with respect to the region parameters can be computed by gradient descent or fixed-point iterations using (12). The RBF kernel, although simple, yields outstanding results on a large set of experiments with images of various types (see the experimental Section IV). Also, note that, obviously, this RBF kernel method does not reduce to the Gaussian model methods commonly used in the literature [17], [29]. The difference can be clearly seen in the fixed point update of the region parameters in (12) which, in the case of the Gaussian model, would be the simple mean update inside each region.
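For concreteness, the following NumPy sketch (our own illustration, not the authors' code) iterates the fixed-point update µ_k ← f_{R_k}(µ_k) of (12)-(13) for a single region with the RBF kernel; the intensities of R_k and the multiset of neighboring parameters µ_λ(q) gathered over the boundary C_k are assumed to be precomputed, and σ is an illustrative value.

```python
import numpy as np

def rbf(x, mu, sigma=10.0):
    """RBF kernel K(x, mu) = exp(-(x - mu)^2 / sigma^2), evaluated elementwise."""
    return np.exp(-(np.asarray(x, dtype=float) - mu) ** 2 / sigma ** 2)

def update_region_parameter(region_pixels, boundary_neighbor_mus, alpha,
                            mu_init, sigma=10.0, n_iter=50, tol=1e-4):
    """Fixed point iterations mu_k <- f_Rk(mu_k) of Eqs. (12)-(13) for one region.

    region_pixels         : 1D array of intensities I_p, p in R_k
    boundary_neighbor_mus : 1D array gathering mu_{lambda(q)} over p in C_k, q in N_bar_p
                            (its size equals the sum over p in C_k of #N_bar_p)
    """
    mu = float(mu_init)
    smooth_num = alpha * boundary_neighbor_mus.sum()   # alpha * sum of neighbor parameters
    smooth_den = alpha * boundary_neighbor_mus.size    # alpha * sum of set cardinalities
    for _ in range(n_iter):
        k = rbf(region_pixels, mu, sigma)
        new_mu = (np.sum(region_pixels * k) + smooth_num) / (np.sum(k) + smooth_den)
        if abs(new_mu - mu) < tol:
            break
        mu = new_mu
    return mu
```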

2) Update of the partition with graph cut optimization: In each iteration of this algorithm, the first step, described above, is followed by a second step which minimizes the segmentation functional with respect to the partition of the image domain. The second step, based on graph cut optimization, seeks the labeling minimizing functional F_K. In the following, we give some basic definitions and notations. For easier referral to the graph cut literature, we will use the word label to mean region index.

Let G = ⟨V, E⟩ be a weighted graph, where V is the set of vertices (nodes) and E the set of edges. V contains a node for each pixel in the image and two additional nodes called terminals. Commonly, one is called the source and the other is called the sink. There is an edge e_{p,q} between any two distinct nodes p and q. A cut C ⊂ E is a set of edges verifying:
• Terminals are separated in the graph G(C) = ⟨V, E \ C⟩
• No subset of C separates terminals in G(C)

This means that a cut is a set of edges the removal of which separates the terminals into two induced sub-graphs. In addition, this cut is minimal in the sense that none of its subsets separates the terminals into the same two sub-graphs. The minimum cut problem consists of finding the cut C, in a given graph, with the lowest cost. The cost of a cut, denoted |C|, equals the sum of its edge weights. By setting the weights of graph G properly, one can use swap moves from combinatorial optimization [11] to compute efficiently minimum cost cuts corresponding to a local minimum of functional F_K. For each pair of labels θ, β, swap moves


find the minimum cut in the sub-graph G_{θ,β} = ⟨V_{θ,β}, E_{θ,β}⟩. The set V_{θ,β} is equal to P_{θ,β} ∪ {θ, β}, i.e., it consists of the set (P_{θ,β}) of vertices labeled θ or β and the terminals θ and β. The set E_{θ,β} consists of edges connecting nodes of P_{θ,β} between them and to the terminals θ and β.

TABLE II
WEIGHTS ASSIGNED TO THE EDGES OF THE GRAPH FOR MINIMIZING THE PROPOSED FUNCTIONAL WITH RESPECT TO THE PARTITION

edge     | weight                                                | for
{p, θ}   | J_K(I_p, µ_θ) + Σ_{q ∈ N_p, q ∉ P_{θ,β}} r(θ, λ(q))   | p ∈ P_{θ,β}
{p, β}   | J_K(I_p, µ_β) + Σ_{q ∈ N_p, q ∉ P_{θ,β}} r(β, λ(q))   | p ∈ P_{θ,β}
e_{p,q}  | r(θ, β)                                               | {p, q} ∈ N, p, q ∈ P_{θ,β}

Given the region parameters provided by the first step, this step consists of iterating the search for the minimum cut on the sub-graph G_{θ,β} over all pairs of labels. To do so, the graph weights need to be set dynamically whenever the region parameters and the pair of labels change. Table II shows the weights assigned to the graph edges for minimizing functional (9) with respect to the partition.
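The weight assignment of Table II can be sketched as follows for one (θ, β) swap move. This is our own reading of the table (with the regularization weight α of (9) written explicitly, and the RBF kernel and truncated r restated locally); it only builds the t-link and n-link weights and leaves the actual min-cut computation to any max-flow solver.

```python
import numpy as np

def swap_move_weights(image, labels, mu, theta, beta, alpha, const, sigma=10.0):
    """Edge weights of Table II for one (theta, beta) swap move (sketch).

    Returns t-link weights to the terminals theta and beta for every pixel of
    P_{theta,beta}, and n-link weights for 4-connected pairs inside P_{theta,beta}.
    """
    def K(y, z):                         # RBF kernel of Table I
        return np.exp(-(y - z) ** 2 / sigma ** 2)

    def JK(I_p, m):                      # kernel-induced distance, Eq. (8)
        return K(I_p, I_p) + K(m, m) - 2.0 * K(I_p, m)

    def r(l1, l2):                       # truncated squared difference, Eq. (5)
        return min(const ** 2, (mu[l1] - mu[l2]) ** 2)

    H, W = labels.shape
    in_P = (labels == theta) | (labels == beta)
    t_theta, t_beta, n_links = {}, {}, {}
    for i in range(H):
        for j in range(W):
            if not in_P[i, j]:
                continue
            w_theta, w_beta = JK(image[i, j], mu[theta]), JK(image[i, j], mu[beta])
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):      # 4-neighborhood N_p
                ni, nj = i + di, j + dj
                if not (0 <= ni < H and 0 <= nj < W):
                    continue
                if in_P[ni, nj]:
                    if (di, dj) in ((1, 0), (0, 1)):               # count each pair once
                        n_links[((i, j), (ni, nj))] = alpha * r(theta, beta)
                else:   # neighbor outside P_{theta,beta}: its label is fixed in this move
                    w_theta += alpha * r(theta, labels[ni, nj])
                    w_beta += alpha * r(beta, labels[ni, nj])
            t_theta[(i, j)], t_beta[(i, j)] = w_theta, w_beta
    return t_theta, t_beta, n_links
```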

IV. EXPERIMENTAL RESULTS

Recall that the intended purpose of the proposed parametric kernel graph cut method (KM) is to afford an alternative to explicit image modeling by implicitly mapping the image data with a kernel function so that the piecewise constant model becomes applicable. Therefore, the purpose of this experimentation is to show how the KM can segment images of a variety of types without an assumption regarding the image model or models.

To this end, we have two types of validation tests:
(1) quantitative verification, via pixel misclassification frequency, using
• Synthetic images of various models
• Multi-model images, i.e., images composed of regions each of a different model
• Simulations of real images such as SAR
(2) real images of three types:
• SAR, which are best modeled by a Gamma distribution
• Medical/polarimetric, which are best modeled by a Wishart distribution
• Natural images of the Berkeley database. These results will be evaluated quantitatively using the provided multi-operator manual segmentations
Additional experiments using vectorial images, namely color and motion, will also be given.

A. Performance evaluation with synthetic data

In this subsection, we first show two typical examples of our large number of experiments with synthetic images and define the measures we adopted for performance analysis: the contrast and the percentage of misclassified pixels (PMP).

Fig. 2. Image intensity distributions within the two regions (distribution vs. intensity): (a) small overlap; (b) significant overlap.

Figs. 3(a,d) depict two different versions of a two-region synthetic image, each perturbed with Gamma noise. In the first version (Fig. 3(a)), the noise parameters result in a small overlap (significant contrast) between the intensity distributions within the regions, as shown in Fig. 2(a). In the second version (Fig. 3(d)), there is a significant overlap between the intensity distributions, as depicted in Fig. 2(b). The larger the overlap, the more difficult the segmentation [50].

The piecewise constant segmentation method and its piecewise Gaussian generalization have been the focus of most studies and applications [11], [29], [30], [51] because of their tractability. In the following, evaluation of the proposed method, referred to as the Kernelized Method (KM), is systematically supported by comparisons with the Piecewise Gaussian Method (PGM).

Fig. 3. Segmentation of two Gamma noisy images with different contrasts: (a,d) noisy images with different contrasts; (b,e) segmentation boundary results with the PGM; (c,f) segmentation results with the KM. Image size: 128×128.

The PGM was used first to segment, as depicted in Figs. 3(b) and 3(e), the two versions of the two-region image with different contrast values. As the actual noise model is a Gamma model, the segmentation quality obtained with the PGM was significantly affected when the contrast is small (Fig. 3(e)). However, the KM yielded almost the same result quality for both images (Figs. 3(c), (f)), although the second image undergoes a significant overlap between the intensity distributions within the two regions, as shown in Fig. 2(b).

The previous example, although simple, showed that, without assumptions as to the image model, the KM can segment images whose noise model is different from piecewise Gaussian. To demonstrate that the KM is a flexible and effective alternative to image modeling, we proceeded to a quantitative and comparative performance evaluation over a very large number of experiments. We generated more than 100 versions of a synthetic two-region image, each of them a combination of a noise model and a contrast value. The noise models we used include the Gaussian, exponential and Gamma distributions. We segmented each image with the PGM, the KM and the actual noise model, i.e., the noise model used to generate the image at hand¹. The contrast values were measured by the Bhattacharyya distance between the intensity distributions within the two regions of the actual image [50]

$$B = -\ln \sum_{I \in \mathcal{I}} \sqrt{P_1(I) P_2(I)}, \qquad (14)$$

where P_1 and P_2 denote the intensity distributions within the two regions and Σ_{I∈I} √(P_1(I) P_2(I)) is the Bhattacharyya coefficient measuring the amount of overlap between these distributions. Small values of the contrast measure B correspond to images with high overlap between the intensity distributions within the regions. We carried out a comparative study by assessing the effect of the contrast on segmentation accuracy for the three segmentation methods. We evaluated the accuracy of segmentations via the percentage of misclassified pixels (PMP), defined in the two-region segmentation case as:

$$\mathrm{PMP} = \left(1 - \frac{|B_O \cap B_S| + |F_O \cap F_S|}{|B_O| + |F_O|}\right) \times 100, \qquad (15)$$

where B_O and F_O denote the background and foreground of the ground truth (correct segmentation) and B_S and F_S denote the background and foreground of the segmented image.
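Both evaluation measures follow directly from their definitions; the sketch below (ours) estimates P_1 and P_2 with normalized intensity histograms and assumes binary foreground masks, which are illustrative implementation choices.

```python
import numpy as np

def bhattacharyya_distance(region1, region2, n_bins=256, value_range=(0, 255)):
    """Contrast measure B of Eq. (14) from the intensities of the two regions."""
    p1, _ = np.histogram(region1, bins=n_bins, range=value_range)
    p2, _ = np.histogram(region2, bins=n_bins, range=value_range)
    p1, p2 = p1 / p1.sum(), p2 / p2.sum()      # normalized intensity distributions
    bc = np.sum(np.sqrt(p1 * p2))              # Bhattacharyya coefficient
    return -np.log(bc)

def pmp(fg_true, fg_seg):
    """Percentage of misclassified pixels, Eq. (15), for binary masks (True = foreground)."""
    bg_true, bg_seg = ~fg_true, ~fg_seg
    agree = np.sum(bg_true & bg_seg) + np.sum(fg_true & fg_seg)
    return (1.0 - agree / fg_true.size) * 100.0
```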

One example from the set of experiments we ran on synthetic data is depicted in Fig. 4. We show two different noisy versions of a piecewise constant two-region image perturbed with Gamma (first row) and exponential (second row) noise. The overlap between the intensity distributions within regions is relatively large in both images of Fig. 4(a). Indeed, the Gamma noisy image has a contrast of B = 0.1, and the exponentially noisy image (second row) a contrast of B = 0.3. The PGM yielded unsatisfactory results (Figs. 4(b)); the percentage of misclassified pixels was 36% for the Gamma noise and 42% for the exponential noise. The white contours are not visible because the regions they enclose are

¹By the method using the actual noise model, we mean the method adapted to the noise model of the image at hand. For example, if we treat a synthetic image perturbed with Gamma noise, the PGM and KM are compared together with the segmentation method where the parametric conditional probability in equation (3) is assumed to be Gamma.

Fig. 4. (a) Synthetic images with Gamma (first row) and exponential (second row) noise; (b) segmentation boundary results with the PGM; (c) segmentation adapted to the actual noise model; and (d) the KM. Image size: 241×183.

very small. When the correct model is assumed² (Figs. 4(c)), better results were obtained. Although no assumption was made as to the noise model, the KM yielded a competitive segmentation quality (Figs. 4(d)). For instance, in the case of the image perturbed with exponential noise, the method adapted to the actual model, i.e., the exponential distribution, yielded a PMP equal to 5.2%, whereas the KM yielded a PMP less than 2.6% for both noisy images. The proposed method allows much more flexibility in practice because the model distribution of image data and its parameters do not have to be fixed.

Our comparative study investigated the effect of the contrast on the PMP over more than 100 experiments. Several synthetic two-region images were generated from the Gaussian, Gamma and exponential noises. For each noise model, the contrast between the two regions was varied gradually, which yielded more than 30 versions. For each image, we evaluated the segmentation accuracy, via the PMP, for the three segmentation methods: the KM, the PGM, and segmentation when the correct model and its parameters are assumed, i.e., the model and parameters used to generate the image of interest. First, we segmented the subset of images perturbed with the Gaussian noise using both the KM and PGM methods, and displayed the PMP as a function of the contrast (Fig. 5(a)). In this case, the actual model is the piecewise Gaussian model, which corresponds to the PGM; therefore, we plotted only two curves. The higher the PMP, the higher the segmentation error and the poorer the segmentation quality. Although fully unsupervised, the KM yielded approximately the same error as segmentation with the correct model whose parameters are learned in advance, i.e., the Gaussian model in this case. Second, we segmented the set of images perturbed with the

²Note that, for comparisons, embedding models such as the Gamma and exponential in the data term is not directly amenable to graph cut optimization. Instead, we assume that these parametric distributions are known beforehand. This is similar to several previous studies which used interactions to estimate the distributions of the segmentation regions. The negative log-likelihood of these distributions is, thus, used to set regional penalties in the data term [14], [10], [23]. Note that assuming that the region distributions are known beforehand is in favor of the methods we compared with and, therefore, would not bias the comparative appraisal of the proposed kernel method.


exponential noise with the PGM, the KM and the method adapted to the actual model, i.e., the exponential model in this case. In Fig. 5(b), the PMP is plotted as a function of the contrast. The PGM undergoes a high error gradient at some Bhattacharyya distance. This is consistent with the level set segmentation experiments in [49], [50]. When B is greater than 0.55, all methods yield a low segmentation error with a PMP less than 5%. When the contrast decreases (B < 0.5), the KM clearly outperforms the PGM and behaves like the correct model over a large range of Bhattacharyya distance values (refer to Fig. 5(b)). Similar experiments were run with the set of images perturbed with Gamma noise, and a similar behavior was observed (refer to Fig. 5(c)). These results demonstrate that the KM can deal with various classes of image noise and very small contrast values. It yields competitive results in comparison to using the correct model, even though neither the model nor its parameters were learned a priori. This is an important advantage of the proposed method: it relaxes assumptions as to the noise model and, therefore, is a flexible alternative to image modeling.

Fig. 5. Evaluation of segmentation error for the different methods over a large number of experiments (PMP as a function of the contrast, measured by the Bhattacharyya distance): comparisons over the subsets of synthetic images perturbed with Gaussian noise (a), exponential noise (b), and Gamma noise (c).

Another important advantage of the KM is the ability to segment multi-model images, i.e., images whose regions require different models. For instance, this can be the case with synthetic aperture radar (SAR) images, where the intensity follows a Gamma distribution in a zone of constant reflectivity and a K distribution in a zone of textured reflectivity [52]. The luminance within shadow regions in sonar imagery is well modeled by the Gaussian distribution while the Rayleigh distribution is more accurate in the reverberation regions [35].

To demonstrate the ability of the KM to process multi-model images, consider the synthetic image of three regions, each with a different noise model, in Fig. 6(a). Gaussian distributed noise has been added to the lightest region, Rayleigh distributed noise to the gray region, and Poisson distributed noise to the darkest region. The final segmentation (i.e., at convergence) with the KM is displayed in Fig. 6(b), where each region is delineated by a colored curve. Fig. 6(c) shows each region, in the final segmentation, represented by its corresponding label, and Figs. 6(d)-(f) show the segmentation regions separately. As expected, the PGM yielded corrupted segmentation results (see the segmentation regions at convergence in Fig. 6(h), and/or the labels at convergence in Fig. 6(i)). This experiment demonstrates that the KM can discriminate different distributions within one single image.

Fig. 6. Image with different noise models: (a) noisy image; (b) segmentation boundary at convergence; (c) region labels at convergence; (d)-(f) segmentation regions separately; (h) segmentation boundary at convergence with the PGM; (i) region labels at convergence with the PGM. α = 2 for both methods. Image size: 163×157. Note that a value of parameter α approximately equal to 2 is shown to be optimal for distributions from the exponential family such as the piecewise Gaussian model [50]. The study in [50] shows that this value corresponds to the minimum of the mean number of misclassified pixels, and has an interesting minimum description length (MDL) interpretation.

The preceding example used a multi-model image with additive noise. The purpose of this next example is to show that the KM is also applicable to multi-model images with multiplicative noise such as SAR. It is generally recognized that the presence of multiplicative speckle noise is the single most complicating factor in SAR segmentation [31], [32], [33]. Fig. 7(a) depicts an amplitude 8-look³ synthetic 512×512 SAR image composed of four regions. The KM was applied with an initialization of four regions with their parameters chosen arbitrarily. The purpose of this experiment is to show the ability of the KM to adapt systematically to this kind of noise model. Fig. 7(b) depicts the final segmentation regions, each represented by its label at convergence. These segmentation regions are displayed separately in Figs. 7(c)-(f).

B. Real data

The following experiments demonstrate the KM applied to three different types of images: SAR, polarimetric/medical, and the images of the Berkeley database. We also include additional experiments with vectorial images, namely color and motion.

³In single-look SAR images, the intensity is given by I = a² + b², where a and b denote the real and imaginary parts of the complex signal acquired from the radar. For multilook SAR images, the L-look intensity is the average of the L intensity images [31].


Fig. 7. (a) Simulated multilook SAR image; (b) segmentation at convergence with the KM; (c)-(f) segmentation regions separately. α = 0.5. Image size: 512×512.

1) SAR and polarimetric/medical images: As mentioned earlier, SAR image segmentation is generally acknowledged to be difficult [31], [48] because of the multiplicative speckle noise. We also recall that SAR images are best modeled by the Gamma distribution [31]. Fig. 8(a) depicts a monolook SAR image where the contrast is very low in many spots of the image. Detecting the edges of the object of interest in such a case is very difficult. In Fig. 8(b), we show the final segmentation of this image into two regions, where the region of interest is delineated by the colored curve. In Fig. 8(c), each region is represented by its parameter µ_k (corresponding to label k) at convergence. As we can see, the PGM produces unsatisfactory results although we used the value of parameter α which gives the best visual result. Another example of real SAR images is given in Fig. 9(a). This SAR image depicts the region of Flevoland in the Netherlands. Figs. 9(b)-(c) depict, respectively, the final segmentation and the segmentation regions represented with their parameters at convergence.

This next example uses medical/polarimetric images, best modeled by a Wishart distribution [12]. The brain image, shown in Fig. 10(a), was segmented into three regions. In this case, the choice of the number of regions is based on prior medical knowledge. The segmentation at convergence and the final labels are displayed as in previous examples. Fig. 10(d) depicts a spot of very narrow human vessels with very small contrast within some regions. The results obtained in both cases are satisfactory. The most important aim of this experiment is to demonstrate the ability of the proposed method to adapt automatically to this class of images acquired with special techniques other than the common ones.

These results with gray level images show that the proposed method is robust and flexible; it can deal with various types of images.

2) Natural images (Berkeley database): We compare the KM against five unsupervised algorithms which are publicly

Fig. 8. (a) Monolook SAR image; (b) segmentation boundary at convergence with the KM; (c) segmentation labels; (d) segmentation boundary at convergence with the PGM. Values of α were set for each method in order to give the best results. Image size: 361×151.

Fig. 9. (a) SAR image of the Flevoland region; (b) segmentation boundary at convergence; (c) segmentation labels. α = 0.8. Image size: 512×800.

available on natural images from the Berkeley database: Mean-Shift [53], NCuts [9], CTM [54], FH [55], and PRIF [56]. As our purpose is to address a variety of image models, we also perform comparative tests on the synthetic dataset used in subsection IV-A to study the ability of these methods to adapt to different noise models. These comparisons are based on two performance measures which seem to be among the most correlated with human segmentation in terms of visual perception: the Probabilistic Rand Index (PRI) and the Variation of Information (VoI)⁴ [54], [57]. Since all methods are unsupervised, we use both the training and test images of the Berkeley database, which consists of 300 color images of size 481×321. To measure the accuracy of the results, a set of benchmark segmentation results provided by 4 to 7 human observers for each image is available [58]. The PRI is conceived to take into account this variability of segmentations between human observers as it not only counts the number of pairs of pixels whose labels are consistent between the automatic and ground truth segmentations, but also

⁴We used the Matlab code provided by Allen Y. Yang to compute the PRI and VoI measures. This code is available online at http://www.eecs.berkeley.edu/~yang/software/lossy_segmentation/.


Fig. 10. Brain and vessel images: (a, d) original images; (b, e) segmentations at convergence; (c, f) final labels.

averages with regard to various human segmentations [57]. For the comparative appraisal, the internal parameters of the five algorithms are set (as in [54]) to optimal values or to values suggested by the authors. These parameters are: (hs = 13, hr = 19) for Mean-Shift [53]; (K = 20) for NCuts [9]; (σ = 0.5, k = 500) for FH [55]; (η = 0.15), which gives the best PRI, for CTM [54]; and (K1 = 18, K2 = 10, K3 = 2) for PRIF [56]. As in [54], [56], the color images are normalized to have the longest side equal to 320 pixels due to memory issues.

As the purpose of our method is to address various image noise models, we compare the KM on two datasets: the Berkeley database for natural images and the synthetic dataset containing images with three noise models: Gaussian, Gamma and exponential. In the first comparative study, we tested the KM against the five unsupervised algorithms which are available online. After that, the two methods which obtained the best PRI scores in the first comparative study, namely PRIF and FH, are tested against the KM on the synthetic database. The KM has been run with an initial maximum number of regions N_reg = 10 and the regularization weight α = 2. Note that, in this work, a region is a set of pixels having the same label, which is different from the works in [54], [56], and [58], where it is rather a set of non-connected pixels belonging to the same class. Experimentally, a value of N_reg = 10 agrees with the average number of segments from the human subjects.
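For reference, the PRI reported in Tables III and IV can be computed directly from its pairwise definition. The quadratic-time sketch below is our own naive illustration, only practical for small label maps; the code of [54] mentioned above uses label co-occurrence counts instead.

```python
import numpy as np
from itertools import combinations

def probabilistic_rand_index(test_labels, ground_truths):
    """Naive PRI of a test segmentation against a set of human segmentations.

    test_labels   : 1D array of labels (flattened segmentation)
    ground_truths : list of 1D label arrays of the same length
    """
    n = test_labels.size
    total = 0.0
    for i, j in combinations(range(n), 2):
        c_ij = float(test_labels[i] == test_labels[j])
        # fraction of human segmentations in which pixels i and j share a label
        p_ij = np.mean([float(g[i] == g[j]) for g in ground_truths])
        total += c_ij * p_ij + (1.0 - c_ij) * (1.0 - p_ij)
    return total / (n * (n - 1) / 2.0)
```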

Table III shows the quantitative comparison of the KM against the five other algorithms on the Berkeley benchmark. Values of the PRI are in [0, 1], higher being better, and the VoI ranges in [0, ∞), lower being better. The KM outperforms CTM, Mean-Shift, and NCuts in terms of the PRI index although it is not conceived to work specifically on natural images, as the CTM does for example. FH yields better performance than the KM in terms of the PRI but not in terms of the VoI. In terms of the VoI, the KM outperforms Mean-Shift, NCuts, and FH. It is not surprising that the CTM yielded a better VoI than the KM as it optimizes

TABLE III
AVERAGE PERFORMANCES OF THE KM METHOD AGAINST FIVE UNSUPERVISED ALGORITHMS IN TERMS OF THE PRI AND VOI MEASURES ON THE BERKELEY DATABASE [58]. THE FIRST LINE CORRESPONDS TO THE PERFORMANCES OF HUMAN OBSERVERS.

Algorithms                | PRI [57] | VoI [54]
Humans (in [54])          | 0.8754   | 1.1040
KM                        | 0.7650   | 2.4091
CTM [54]                  | 0.7627   | 2.2035
Mean-Shift [53] (in [54]) | 0.7550   | 2.477
NCuts [9] (in [54])       | 0.7229   | 2.9329
FH [55] (in [54])         | 0.7841   | 2.6647
PRIF [56]                 | 0.8006   | 1.9793

TABLE IV
AVERAGE PERFORMANCES OF THE KM METHOD AGAINST TWO UNSUPERVISED ALGORITHMS IN TERMS OF THE PRI AND VOI MEASURES ON THE SYNTHETIC DATASET OF SUBSECTION IV-A.

Algorithms | PRI [57] | VoI [54]
KM         | 0.8253   | 0.8930
PRIF [56]  | 0.7748   | 1.1711
FH [55]    | 0.6188   | 1.3817

an information-theoretic criterion (recall that the VoI measures the average conditional entropy of one segmentation given the other). PRIF is the only method which yields the best results in terms of both measures.

Based on these performances in terms of the PRI, we also compared the KM with both PRIF and FH on images of the synthetic multi-model image dataset, images which are very different from natural images. We recall that these images are perturbed with Gaussian, Gamma, and exponential noises and that the contrast between the foreground and the background varies considerably (see subsection IV-A). As shown in Table IV, the KM outperforms the PRIF and FH algorithms in terms of both indices. We kept, for the three algorithms, the same internal parameters used in the first comparison on the Berkeley database. This is mainly because the purpose is to compare the behaviour of the different methods on different kinds of images without any specific tuning for each dataset.

Overall, the KM reached competitive results as it gives relatively good results in terms of two segmentation indices for natural images, and the best results for the synthetic images compared to two state-of-the-art algorithms. Fig. 12 depicts the distribution of the PRI index corresponding to the KM over the 300 images of the Berkeley database. Note the relatively low variance, especially for low scores (standard deviation = 0.12). For visual inspection, we show some examples of these segmentations obtained by the KM in Fig. 11.

3) Vectorial data: In this part, we run experiments on vectorial images in order to test the effectiveness of the method for more complex data. We show a representative sample of the tests with motion sequences. Color images have already been treated in section IV-B.2 with the Berkeley database. In an M-vectorial image, each pixel value I(p) is viewed as a vector of size M. The kernels in Table I support vectorial data and, hence, may be applied directly, as illustrated in the short sketch below.
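A minimal example (ours, with made-up flow values and an illustrative σ) of evaluating the RBF kernel and the distance (8) on two-dimensional motion observations:

```python
import numpy as np

def rbf_vector(y, z, sigma=1.0):
    """RBF kernel for vector-valued observations, K(y, z) = exp(-||y - z||^2 / sigma^2)."""
    d = np.asarray(y, dtype=float) - np.asarray(z, dtype=float)
    return np.exp(-np.dot(d, d) / sigma ** 2)

# Illustrative optical flow vector of a pixel (M = 2) and a motion-region parameter.
flow_p = np.array([0.8, -0.1])
mu_l = np.array([1.0, 0.0])
J_K = 2.0 * (1.0 - rbf_vector(flow_p, mu_l))   # Eq. (8), using K(x, x) = 1 for the RBF kernel
```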


Fig. 11. Examples of segmentations obtained by the KM algorithm on color images from the Berkeley database. Although displayed in gray level, all the images are processed as color images.

Motion sequences are the kind of vectorial data that we consider here for experimentation. In the following, we segment optical flow images into motion regions. The optical flow at each pixel is a two-dimensional vector (M = 2) computed using the method in [60]. We consider two typical examples: the Road image sequence (refer to the first row of Fig. 13) and the Marmor sequence (refer to the second row of Fig. 13). The Road sequence is composed of two regions: a moving vehicle and a background. The Marmor sequence contains three regions: two objects moving in different directions and a background. The optical flow field is represented, at each pixel, by a vector as shown in Figs. 13(a) and (d). The final segmentations are shown in Figs. 13(b) and (e), where each moving object is delineated by a different colored curve. In Figs. 13(c), (f) and (g), we display the segmented moving objects separately.

Fig. 12. Distribution of the PRI measure for the KM applied to the images of the Berkeley database (number of segmentations vs. PRI). N_reg = 10, α = 2. Image size: 320×214.

Fig. 13. Segmentation of the Road and Marmor sequences: (a) the motion map of the Road sequence; (b) segmentation at convergence of the Road sequence; (c) the moving-object segmentation region; (d) the motion map of the Marmor sequence; (e) segmentation at convergence of the Marmor sequence; (f)-(g) moving objects separately.

V. CONCLUSION

This study investigated multiregion graph cut image segmentation in a kernel-induced space. The method consists of minimizing a functional containing an original data term which references the image data transformed via a kernel function. The optimization algorithm iterated two consecutive steps: graph cut optimization and fixed point iterations for updating the region parameters. A quantitative and comparative performance study over a very large number of experiments on synthetic images illustrated the flexibility and effectiveness of the proposed method. We have shown that the proposed approach yields competitive results in comparison with using the correct model learned in advance. Moreover, the flexibility and effectiveness of the method were tested over various types of images, including SAR images (simulated and real), medical and natural images, as well as motion maps.


APPENDIX A

Consider the scalar function:

$$
f_R(\mu) = \frac{\displaystyle \sum_{p \in R} I_p\, K(I_p, \mu) + \alpha \sum_{p \in C} \sum_{q \in \bar{N}_p} \mu_{\lambda(q)}}
{\displaystyle \sum_{p \in R} K(I_p, \mu) + \alpha \sum_{p \in C} \sharp \bar{N}_p},
\qquad \text{(A-1)}
$$

where K is the RBF kernel used in our experiments. In the following, we prove that this function has at least one fixed point.

The continuous variable µ is a continuous grey level which, typically, belongs to a given interval [a, b]. Similarly, I_p and µ_λ(q) lie in the same interval. Hence, the following inequalities are automatically derived:

$$
\sum_{p\in C}\sum_{q\in N_p} \mu_{\lambda(q)} \;\geq\; a \sum_{p\in C} \sharp N_p, \qquad
\sum_{p\in R} I_p\, K(I_p,\mu) \;\geq\; a \sum_{p\in R} K(I_p,\mu),
$$
$$
\sum_{p\in C}\sum_{q\in N_p} \mu_{\lambda(q)} \;\leq\; b \sum_{p\in C} \sharp N_p, \qquad
\sum_{p\in R} I_p\, K(I_p,\mu) \;\leq\; b \sum_{p\in R} K(I_p,\mu). \tag{A-2}
$$

It follows from these inequalities that $f_R(\mu) \in [a, b]$. Consequently, the function $f_R$ maps $[a, b]$ into itself:

$$
f_R : [a, b] \to I \subset [a, b], \qquad \mu \mapsto f_R(\mu).
$$

Let us define the function $g$ as follows:

$$
g : \mu \mapsto g(\mu) = f_R(\mu) - \mu.
$$

Since $f_R(a)$ and $f_R(b)$ both lie in $[a, b]$, we have $g(a) = f_R(a) - a \geq 0$ and $g(b) = f_R(b) - b \leq 0$, so that $g(a) \times g(b) \leq 0$. As $g$ is continuous (a sum of continuous functions) over $[a, b]$, the intermediate value theorem guarantees at least one $\mu_c \in [a, b]$ such that $g(\mu_c) = 0$. Hence, the function $f_R$ admits $\mu_c$ as a fixed point.
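As a numerical illustration of this argument (not part of the paper), the sketch below evaluates f_R as in (A-1) for an illustrative region, smoothness weight α, kernel width σ, and set of neighborhood labels, and locates a root of g(µ) = f_R(µ) − µ on [a, b] by bisection, mirroring the intermediate value reasoning above.

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative setting (all values are assumptions): grey levels in [a, b],
    # one region R, and neighborhood labels mu_lambda(q) for the pixels p in C.
    a, b = 0.0, 1.0
    alpha, sigma = 1.0, 0.3
    I_R = rng.uniform(0.55, 0.75, size=200)              # data of region R
    neigh_labels = rng.uniform(a, b, size=(50, 4))       # mu_lambda(q), 4 neighbours per p
    card_N = neigh_labels.size                           # sum over p in C of #N_p

    def K(x, mu):
        return np.exp(-(x - mu) ** 2 / sigma ** 2)       # RBF kernel

    def f_R(mu):
        # Right-hand side of (A-1).
        num = np.sum(I_R * K(I_R, mu)) + alpha * np.sum(neigh_labels)
        den = np.sum(K(I_R, mu)) + alpha * card_N
        return num / den

    def g(mu):
        return f_R(mu) - mu

    # g(a) >= 0 and g(b) <= 0 because f_R maps [a, b] into itself, so bisection applies.
    lo, hi = a, b
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    mu_c = 0.5 * (lo + hi)
    print("fixed point mu_c ~", round(mu_c, 4), " f_R(mu_c) ~", round(f_R(mu_c), 4))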

ACKNOWLEDGMENTS

The authors would like to thank the two anonymous reviewers whose detailed comments and suggestions have contributed to improving both the content and the presentation quality of this paper.

REFERENCES

[1] J. M. Morel and S. Solimini, Variational Methods in Image Segmentation, Birkhauser Boston, 1995.

[2] G. Aubert and P. Kornprobst, Mathematical Problems in Image Processing: Partial Differential Equations and the Calculus of Variations, Springer-Verlag New York, 2006.

[3] D. Cremers, M. Rousson, and R. Deriche, "A review of statistical approaches to level set segmentation: integrating color, texture, motion and shape," IJCV, vol. 72, no. 2, 2007.

[4] D. Mumford and J. Shah, "Optimal approximation by piecewise smooth functions and associated variational problems," Comm. Pure Appl. Math., vol. 42, pp. 577-685, 1989.

[5] S. C. Zhu and A. Yuille, "Region competition: Unifying snakes, region growing, and Bayes/MDL for multiband image segmentation," IEEE Trans. on PAMI, vol. 18, no. 6, pp. 884-900, Jun. 1996.

[6] T. F. Chan and L. A. Vese, "Active Contours Without Edges," IEEE Trans. on Image Processing, vol. 10, no. 2, pp. 266-277, 2001.

[7] L. A. Vese and T. F. Chan, "A multiphase level set framework for image segmentation using the Mumford and Shah model," Int. J. Comput. Vis., vol. 50, no. 3, pp. 271-293, 2002.

[8] V. Caselles, R. Kimmel, and G. Sapiro, "Geodesic Active Contours," ICCV, pp. 694-699, 1995.

[9] J. Shi and J. Malik, "Normalized cuts and image segmentation," IEEE Trans. on PAMI, vol. 22, no. 8, pp. 888-905, Aug. 2000.

[10] Y. Boykov and M.-P. Jolly, "Interactive Graph Cuts for Optimal Boundary and Region Segmentation of Objects in N-D Images," in Proc. of IEEE ICCV, pp. 105-112, 2001.

[11] Y. Boykov, O. Veksler, and R. Zabih, "Fast approximate energy minimization via graph cuts," IEEE Trans. on Patt. Anal. and Mach. Intell., vol. 23, no. 11, pp. 1222-1239, Nov. 2001.

[12] I. Ben Ayed, A. Mitiche, and Z. Belhadj, "Polarimetric Image Segmentation via Maximum-Likelihood Approximation and Efficient Multiphase Level-Sets," IEEE Trans. Pattern Anal. Mach. Intell. (PAMI), vol. 28, no. 9, pp. 1493-1500, 2006.

[13] J. A. Sethian, Level Set Methods and Fast Marching Methods, second ed., Cambridge Univ. Press, 1999.

[14] Y. Boykov and G. Funka-Lea, "Graph Cuts and Efficient N-D Image Segmentation," IJCV, vol. 70, no. 2, 2006.

[15] Y. Boykov, V. Kolmogorov, D. Cremers, and A. Delong, "An integral solution to surface evolution PDEs via geo-cuts," ECCV 2006, LNCS 3953, vol. 3, pp. 409-422.

[16] Y. G. Leclerc, "Constructing Simple Stable Descriptions for Image Partitioning," International Journal of Computer Vision, vol. 3, no. 1, pp. 73-102, 1989.

[17] A. Blake, C. Rother, M. Brown, P. Perez, and P. Torr, "Interactive image segmentation using an adaptive GMMRF model," ECCV, 2004.

[18] C. Rother, V. Kolmogorov, and A. Blake, "GrabCut: interactive foreground extraction using iterated graph cuts," SIGGRAPH, ACM Trans. on Graphics, 2004.

[19] F. Destrempes, M. Mignotte, and J.-F. Angers, "A stochastic method for Bayesian estimation of Hidden Markov Random Field models with application to a color model," IEEE Transactions on Image Processing, 14(8):1096-1124, August 2005.

[20] Y. Boykov and V. Kolmogorov, "An Experimental Comparison of Min-Cut/Max-Flow Algorithms for Energy Minimization in Vision," IEEE Trans. on PAMI, vol. 26, no. 9, pp. 1124-1137, Sep. 2004.

[21] O. Veksler, "Efficient Graph-Based Energy Minimization Methods in Computer Vision," PhD Thesis, Cornell Univ., July 1999.

[22] D. Freedman and T. Zhang, "Interactive Graph Cut Based Segmentation With Shape Priors," in Proc. IEEE Intern. Conf. on Comput. Vis. and Patt. Recog. (CVPR), 2005.

[23] S. Vicente, V. Kolmogorov, and C. Rother, "Graph cut based image segmentation with connectivity priors," IEEE Intern. Conf. on Comput. Vis. and Patt. Recog. (CVPR), Alaska, June 2008.

[24] P. Felzenszwalb and D. Huttenlocher, "Efficient graph-based image segmentation," Int. Jour. Comput. Vis., vol. 59, pp. 167-181, 2004.

[25] N. Vu and B. S. Manjunath, "Shape Prior Segmentation of Multiple Objects with Graph Cuts," IEEE Intern. Conf. on Comput. Vis. and Patt. Recog. (CVPR), 2008.

[26] T. Schoenemann and D. Cremers, "Near Real-Time Motion Segmentation Using Graph Cuts," DAGM-Symposium, pp. 455-464, 2006.

[27] S. Roy, "Stereo without epipolar lines: A maximum-flow formulation," Int. Jour. of Comp. Vision, vol. 34, no. 2-3, pp. 147-162, 1999.

[28] J. Xiao and M. Shah, "Motion Layer Extraction in the Presence of Occlusion Using Graph Cuts," IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 10, pp. 1644-1659, 2005.

[29] M. Ben Salah, A. Mitiche, and I. Ben Ayed, "A continuous labeling for multiphase graph cut image partitioning," in Advances in Visual Computing, Lecture Notes in Computer Science (LNCS), G. Bebis et al. (Eds.), vol. 5358, pp. 268-277, Springer-Verlag, 2008.

[30] N. Youssry El-Zehiry and A. Elmaghraby, "A Graph Cut based Active Contour for Multiphase Image Segmentation," ICIP 2008.

[31] I. Ben Ayed, A. Mitiche, and Z. Belhadj, "Multiregion Level-Set Partitioning of Synthetic Aperture Radar Images," IEEE Trans. Pattern Anal. Mach. Intell. (PAMI), vol. 27, no. 5, pp. 793-800, 2005.

[32] E. E. Kuruoglu and J. Zerubia, "Modeling SAR images with a generalization of the Rayleigh distribution," IEEE Trans. on Image Processing, vol. 13, no. 4, pp. 527-533, 2004.

[33] A. Achim, E. E. Kuruoglu, and J. Zerubia, "SAR image filtering based on the heavy-tailed Rayleigh model," IEEE Trans. on Image Processing, vol. 15, no. 9, pp. 2686-2693, 2006.

[34] A. A. Nielsen, K. Conradsen, and H. Skriver, "Polarimetric Synthetic Aperture Radar Data and the Complex Wishart Distribution," 13th Scandinavian Conference on Image Analysis (SCIA), pp. 1082-1089, June 2003.


[35] M. Mignotte, C. Collet, P. Perez, and P. Bouthemy, "Sonar image segmentation using an unsupervised hierarchical MRF model," IEEE Trans. Image Process., vol. 9, no. 7, pp. 1216-1231, Jul. 2000.

[36] S. C. Zhu, "Statistical modeling and conceptualization of visual patterns," IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, pp. 691-712, 2003.

[37] K. R. Muller, S. Mika, G. Ratsch, K. Tsuda, and B. Scholkopf, "An introduction to kernel-based learning algorithms," IEEE Transactions on Neural Networks, vol. 12, no. 2, pp. 181-202, 2001.

[38] A. Jain, P. Duin, and J. Mao, "Statistical pattern recognition: A review," IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 1, pp. 4-37, 2000.

[39] I. S. Dhillon, Y. Guan, and B. Kulis, "Weighted Graph Cuts without Eigenvectors: A Multilevel Approach," IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 11, pp. 1944-1957, 2007.

[40] B. Scholkopf, A. J. Smola, and K. R. Muller, "Nonlinear component analysis as a kernel eigenvalue problem," Neural Computation, vol. 10, pp. 1299-1319, 1998.

[41] T. M. Cover, "Geometrical and statistical properties of systems of linear inequalities in pattern recognition," Electron. Comput., vol. EC-14, pp. 326-334, Mar. 1965.

[42] M. Girolami, "Mercer kernel based clustering in feature space," IEEE Transactions on Neural Networks, vol. 13, pp. 780-784, 2001.

[43] D. Q. Zhang and S. C. Chen, "Fuzzy clustering using kernel methods," in Proc. Int. Conf. Control Automation, Xiamen, P.R.C., pp. 123-127, June 2002.

[44] C. Vazquez, A. Mitiche, and I. Ben Ayed, "Image segmentation as regularized clustering: a fully global curve evolution method," Proc. IEEE Int. Conf. Image Processing, pp. 3467-3470, Oct. 2004.

[45] P. Kohli, J. Rihan, M. Bray, and P. H. S. Torr, "Simultaneous Segmentation and Pose Estimation of Humans Using Dynamic Graph Cuts," Int. J. of Computer Vision, 79(3):285-298, 2008.

[46] V. Kolmogorov, A. Criminisi, A. Blake, G. Cross, and C. Rother, "Probabilistic Fusion of Stereo with Color and Contrast for Bilayer Segmentation," IEEE Trans. on Pattern Anal. and Machine Intell., 28(9):1480-1492, 2006.

[47] X. Liu, O. Veksler, and J. Samarabandu, "Graph Cut with Ordering Constraints on Labels and its Applications," IEEE Intern. Conf. on Comput. Vis. and Patt. Recog. (CVPR), 2008.

[48] I. Ben Ayed, N. Hennane, and A. Mitiche, "Unsupervised Variational Image Segmentation/Classification using a Weibull Observation Model," IEEE Trans. on Image Processing, vol. 15, no. 11, pp. 3431-3439, 2006.

[49] G. Delyon, F. Galland, and Ph. Refregier, "Minimal Stochastic Complexity Image Partitioning With Unknown Noise Model," IEEE Trans. on Image Processing, vol. 15, no. 10, pp. 3207-3212, Oct. 2006.

[50] P. Martin, P. Refregier, F. Goudail, and F. Guerault, "Influence of the Noise Model on Level Set Active Contour Segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 6, pp. 799-803, 2004.

[51] X.-C. Tai, O. Christiansen, P. Lin, and I. Skjælaaen, "Image Segmentation Using Some Piecewise Constant Level Set Methods with MBO Type of Projection," Int. J. Comput. Vision, vol. 73, no. 1, pp. 61-76, 2007.

[52] R. Fjortoft, Y. Delignon, W. Pieczynski, M. Sigelle, and F. Tupin, "Unsupervised classification of radar images using hidden Markov chains and hidden Markov random fields," IEEE Trans. Geosci. Remote Sens., vol. 41, no. 3, pp. 675-686, Mar. 2003.

[53] D. Comaniciu and P. Meer, "Mean shift: A robust approach toward feature space analysis," IEEE Trans. Pattern Anal. Machine Intell., 24(5):603-619, May 2002.

[54] A. Y. Yang, J. Wright, Y. Ma, and S. Shankar Sastry, "Unsupervised segmentation of natural images via lossy data compression," Comput. Vis. Image Underst., 110(2):212-225, 2008.

[55] P. Felzenszwalb and D. Huttenlocher, "Efficient Graph-Based Image Segmentation," International Journal of Computer Vision, 59(2):167-181, 2004.

[56] M. Mignotte, "A Label Field Fusion Bayesian Model and Its Penalized Maximum Rand Estimator For Image Segmentation," IEEE Trans. on Image Processing, 19(6):1610-1624, 2010.

[57] R. Unnikrishnan, C. Pantofaru, and M. Hebert, "A measure for objective evaluation of image segmentation algorithms," in Proc. of the 2005 IEEE Intern. Conf. on Comput. Vis. and Patt. Recog. (CVPR), vol. 3, pp. 34-41, June 2005.

[58] D. Martin and C. Fowlkes, "The Berkeley segmentation database and benchmark," image database and source code publicly available at http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/segbench/.

[59] P. J. Huber, Robust Statistics, New York, 1981.

[60] C. Vazquez, A. Mitiche, and R. Laganiere, "Joint Multiregion Segmentation and Parametric Estimation of Image Motion by Basis Function Representation and Level Set Evolution," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, pp. 782-793, 2006.

Mohamed Ben Salah was born in 1980. He received the State engineering degree from the École Polytechnique, Tunis, Tunisia, in 2003, and the M.S. degree in computer science from the National Institute of Scientific Research (INRS-EMT), Montreal, Quebec, Canada, in January 2006, where he is currently pursuing the Ph.D. degree in computer science. His research interests are in image and motion segmentation with focus on level set and graph cut methods, and shape priors.

Amar Mitiche holds a Licence Es Sciences in mathematics from the University of Algiers and a PhD in computer science from the University of Texas at Austin.

He is currently a professor at the Institut National de Recherche Scientifique (INRS), department of telecommunications (INRS-EMT), in Montreal, Quebec, Canada. His research is in computer vision. His current interests include image segmentation, motion analysis in monocular and stereoscopic image sequences (detection, estimation, segmentation, tracking, 3D interpretation) with a focus on level set and graph cut methods, and written text recognition with a focus on neural network methods.

Ismail Ben Ayed received the Ph.D. degree in computer science from the National Institute of Scientific Research (INRS-EMT), Montreal, Quebec, Canada, in May 2007. Since then, he has been a scientist with GE Healthcare, London, ON, Canada. His recent research interests are in machine vision with focus on variational methods for image segmentation and in medical image analysis with focus on cardiac imaging.