
4260 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 22, NO. 11, NOVEMBER 2013

Weighted Color and Texture Sample Selection for Image Matting

Ehsan Shahrian Varnousfaderani and Deepu Rajan

Abstract— Color sampling based matting methods find the best known samples for foreground and background colors of unknown pixels. Such methods do not perform well if there is an overlap in the color distribution of foreground and background regions because color cannot distinguish between these regions and hence, the selected samples cannot reliably estimate the matte. Furthermore, current sampling based matting methods choose samples that are located around the boundaries of foreground and background regions. In this paper, we overcome these two problems. First, we propose texture as a feature that can complement color to improve matting by discriminating between known regions with similar colors. The contribution of texture and color is automatically estimated by analyzing the content of the image. Second, we combine local sampling with a global sampling scheme that prevents true foreground or background samples from being missed during the sample collection stage. An objective function containing color and texture components is optimized to choose the best foreground and background pair among a set of candidate pairs. Experiments are carried out on a benchmark data set and an independent evaluation of the results shows that the proposed method is ranked first among all other image matting methods.

Index Terms— Alpha matting, local and global sampling, color and texture.

I. INTRODUCTION

DIGITAL matting refers to the accurate extraction of foreground objects from an image where part of the regions in the object could have contributions from the background. This contribution is incorporated into a compositing equation in the form of opacity α of a pixel; the equation expresses the observed color value of a pixel as a convex combination of foreground (F) and background (B) colors and is given by

I_z = \alpha_z F_z + (1 - \alpha_z) B_z \qquad (1)

where I_z, F_z and B_z are the observed, foreground and background colors of pixel z, respectively. The opacity, α, takes values in the range [0, 1], with 0 indicating that the pixel is from the background and 1 indicating that it is from the foreground. Estimating the digital matte is useful in image and video editing tasks such as background replacement. As seen from eq. (1), extracting the matte is a highly ill-posed problem since it involves estimation of seven unknowns for each pixel from three compositing equations - one for each color component. The problem is constrained using assumptions on image statistics or by using trimaps that partition the image into three regions - known foreground, known background and unknown regions; this last region consists of a mixture of background and foreground colors. The trimap could be drawn by the user, or generated automatically [1] or semi-automatically [2].

Manuscript received October 11, 2012; revised March 28, 2013; accepted June 18, 2013. Date of publication July 4, 2013; date of current version September 11, 2013. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Dimitrios Tzovaras.

The authors are with the Center for Multimedia and Network Technology, School of Computer Engineering, Nanyang Technological University, 639798 Singapore (e-mail: [email protected]; [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TIP.2013.2271549

Alpha matting approaches can be divided into three categories - α-propagation based, color sampling based and their combination. In the α-propagation approach, the correlation of neighboring pixels with respect to local image statistics is leveraged to propagate alpha from known regions towards unknown ones. The affinity matrix that encodes the local correlation of pixels is used in random walk matting [3], Poisson matting [4], Closed-form matting [5] and Nonlocal matting [2]. The assumptions of large kernels by [2] and the local color line of [5] are relaxed in KNN matting [6] using nonlocal principles and K nearest neighbors. In color sampling approaches, samples are collected from known foreground and known background regions and the best sample that 'represents' the true foreground and background colors of an unknown pixel is estimated by optimizing an objective function. Once the best foreground (F) and best background (B) sample is selected, the α for that pixel is given by

\alpha_z = \frac{(I_z - B)(F - B)}{\|F - B\|^2}, \qquad (2)

where ‖·‖ denotes the Euclidean norm. Examples of the third category include [1], [7] in which the matting problem is cast as an optimization problem whose cost function consists of a data term representing color sampling and a smoothness term representing alpha propagation.
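As a concrete illustration of eqs. (1) and (2), the short Python sketch below (our own example, not code from the paper) composites a pixel from a known (F, B) pair and then recovers its alpha by projecting the observed color onto the line joining F and B:

```python
import numpy as np

def estimate_alpha(I, F, B, eps=1e-8):
    """Estimate alpha for observed color I from a candidate (F, B) pair, as in eq. (2).

    I, F and B are RGB vectors; the result is clipped to the valid range [0, 1].
    """
    I, F, B = (np.asarray(x, dtype=float) for x in (I, F, B))
    num = np.dot(I - B, F - B)            # (I_z - B) . (F - B)
    den = np.dot(F - B, F - B) + eps      # ||F - B||^2, guarded against F == B
    return float(np.clip(num / den, 0.0, 1.0))

# Example: a pixel that is a 30/70 blend of F and B recovers alpha ~ 0.3.
F, B = np.array([200.0, 30.0, 30.0]), np.array([20.0, 20.0, 220.0])
I = 0.3 * F + 0.7 * B                     # compositing equation (1)
print(estimate_alpha(I, F, B))            # -> 0.3
```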

Color sampling approaches can be further sub-divided into parametric and non-parametric methods. Parametric sampling methods like [1], [8], [9] fit a parametric statistical model to known foreground and background samples and then estimate alpha by considering the distance of unknown pixels to the known foreground and background distributions. Non-parametric methods including [10]–[13] simply collect a set of known F and B samples and estimate the alpha matte by finding the best samples for unknown pixels. The quality of the extracted matte degrades when the true foreground and background colors of unknown pixels are not in the sample sets. Hence, the main challenge is to select a comprehensive set of known samples that includes the different F and B colors in the image.


Fig. 1. (a) Original image with highlighted boundaries of foreground (red) and background (blue) regions, (b) Zoomed-in region, (c) Bayesian [8], (d) Closed-Form [5], (e) Robust [7], (f) Shared [12], (g) Global [13], (h) SVR [14], (i) Proposed method.

There are two drawbacks in current color sampling approaches for digital matting despite color being a useful feature in estimating the matte. First, color sampling methods will fail if the foreground and background regions have similar color distributions. Second, if the set of color samples to estimate the true F and B color is not comprehensive, the resulting matte will be erroneous. This could happen if the samples are collected only from the boundary of a trimap or at certain predefined locations distributed over the image. We discuss the first drawback immediately below and postpone the discussion of the second drawback to the next section.

The problem of overlapped color distributions is illustrated in Fig. 1, which shows the original image marked with boundaries of foreground (red) and background (blue) regions and a zoomed area of the image in Fig. 1(a) and (b), respectively. Alpha mattes extracted by the color sampling approaches of Bayesian matting [8], robust matting [7], shared matting [12], and global matting [13] are shown in Fig. 1(c), (e), (f) and (g), and those by the α-propagation based approach of closed-form matting [5] and the learning based approach of SVR matting [14] are shown in Fig. 1(d) and (h), respectively. As shown in the zoomed area of Fig. 1(b), the color similarity between the hairs and the tower of the bridge is high, which results in overlap of local foreground and background color distributions. Even careful selection of known foreground and background pairs results in some portions of the background being wrongly estimated as belonging to the matte, as seen in the Bayesian, robust, shared, and global mattes obtained. At the same time, because of the false correlation between F and B samples, the alpha values are wrongly propagated, resulting in an inaccurate matte for closed-form matting in Fig. 1(d). Besides, the presence of strong edges could inhibit the propagation of alpha if the regions on both sides of an edge belong to the background, as in the edge between the bridge tower and sky in Fig. 1(b). Furthermore, the learning based SVR matting method cannot estimate the matte properly and a big portion of the bridge is considered as foreground, as seen in Fig. 1(h). From the above examples, we note that color alone is insufficient for digital matting.

In this paper, a new non-parametric sampling based method is presented that uses texture as an additional feature for the matting task. Our sampling strategy considers both local boundary as well as global samples, where the former implies samples collected from the boundaries of known regions while the latter implies that samples are collected from diverse locations in the image. Local and global candidate sets of F and B samples are collected and a new objective function is formulated that combines color and texture statistics to determine the best samples that represent the true foreground and background of unknown pixels. Finally, the estimated mattes are refined using the conventional Laplacian approach. The result of the proposed method is shown in Fig. 1(i), which shows improved performance compared to other methods, especially in the challenging parts where the texture information has helped in removing portions of the background bridge that would otherwise have been part of the matte. We show that the texture feature helps to complement color information effectively, leading to more accurate mattes; the proposed method obtains state-of-the-art performance on a benchmark dataset [15].

The paper is organized as follows: in section II, we present a brief review of non-parametric color sampling based matting methods and illustrate the problem of missing true samples. The texture feature extraction and the proposed sampling method are presented in sections III and IV, respectively. The combination of color and texture features to arrive at the final alpha matte is discussed in section V. Experimental results are presented in section VI and finally conclusions are presented in section VII. Part of this paper was presented at the IEEE Computer Vision and Pattern Recognition (CVPR) conference in June, 2012. The main differences between this paper and the CVPR version are 1) inclusion of global samples in addition to local samples, and 2) more elaborate experimentation on a new database of 15 synthesized images. These modifications lead to improved matting results.

Fig. 2. Illustration of missing true samples. (a) Original image. Sampling strategies of (b) Robust [7], (c) Shared [12], (d) Global sampling [13] and (e) the proposed matting method, (f) Trimap. Estimated mattes by (g) Robust, (h) Shared, (i) Global sampling and (j) the proposed matting method.

II. RELATED WORK

Since the proposed matting method belongs to the class of non-parametric sampling methods, we review methods in this category only. A comprehensive survey on image and video matting is presented in [16].

In the blue screen matting of Mishima [10], the foreground object is captured in front of a monochrome background which forces foreground and background samples to form two color clusters. Alpha values of unknown pixels are computed using their proximity to the clusters.

In the Knockout system [11], the F and B colors of unknown pixels are approximated as a weighted sum of nearby known F and B samples; the weights are proportional to the spatial distances of the known samples to the unknown pixel. The alpha value of an unknown pixel is computed in three color channels and their weighted sum is used to estimate the final alpha. The weights are proportional to the distance between the estimated foreground and background colors in the different color channels.

In robust matting [7], a set of known samples that are spatially close to the unknown pixels are collected (Fig. 2(b)) and those that best fit the compositing equation are selected. The geodesic distance is used to collect samples in improved color matting [17]. These two methods perform better than the Knockout system because only good samples that linearly represent the observed color of unknown pixels are used for matting. However, the accuracy of the matte degrades when some of the true F and B samples are not in the set of known samples, i.e., the set is not comprehensive.

In shared matting [12], the image plane is divided into sectors with equal angles and a set of known samples that lie along rays emanating from the unknown pixel are collected, as shown in Fig. 2(c). It collects samples from the boundaries of known foreground and background regions as specified by the trimap and selects the best samples with respect to spatial, photometric and probabilistic characteristics of the image. The selected samples are shared among neighboring pixels to refine the estimated alpha. Just like robust matting, shared matting also suffers from missing true samples. When the true samples do not lie on the emanated rays, the best samples might be missed, resulting in an inaccurate matte.

Global sampling [13] builds large sets of foreground and background samples by collecting all known boundary samples to avoid missing true samples, as shown in Fig. 2(d). The best samples among a huge number of known samples are selected using an efficient random search and a simple cost function. This method collects the largest number of known samples compared to other matting methods. Irrespective of the efficient computation in choosing the samples, the true samples may still be missed if they are not on the boundary of the trimap from where the samples are collected.

The drawback of collecting samples only from regions around the boundaries of known regions is illustrated in Fig. 2. The original image and trimap are shown in Fig. 2(a) and (f). Even for such a simple image with a smooth blue background, sampling based methods like robust and shared matting cannot extract the alpha matte properly; notice the blue parts of the ball are wrongly estimated as background. This problem is further compounded due to the overlapped color distributions of foreground and background. More importantly, when all samples along the boundary are selected as shown in Fig. 2(i) for global sampling, the matte is still inaccurate because the boundary of the known foreground region does not include the blue colors of the ball and hence, they are excluded from the set of candidate samples. Thus, it is important that the set of candidate samples should be comprehensive enough to represent all color variations in foreground and background regions. To this end, the proposed method takes advantage of local boundary samples and global samples that truly encompass the different F and B colors and textures contained in the image, as shown in Fig. 2(e). The result of the proposed method is shown in Fig. 2(j), in which the blue parts are accurately considered as foreground because the global samples include the true foreground samples for the blue part of the ball.

The proposed method addresses the two drawbacks of current sampling methods mentioned in section I in two ways: first, by proposing texture as a complementary feature to color, and second, by combining local and global samples of color as well as texture to obtain a comprehensive set of candidate samples. The contributing weights of texture and color are automatically determined based on image content. In the following, first the texture feature used in the algorithm is described, followed by a description of local boundary and global sample selection. Next, a new cost function is designed to choose the best samples from which the matte is generated. The initial matte is further refined using Laplacian post-processing.

III. TEXTURE AS A COMPLEMENTARY FEATURE

The objective here is not to suggest a texture feature developed especially for image matting, but to show that texture can be a useful additional feature. To this end, one could possibly use any of the texture feature extraction techniques available in the literature, e.g. [18]. Our approach consists of a 2-level Haar wavelet decomposition of the image and estimation of the mean of the coefficients over a 3×3 neighborhood for each of the 4 sub-images at each level. In addition, the gradient and variance over the same neighborhood are calculated to capture the color variation in the approximation image. Note that the sub-images are resized to the original resolution through bicubic interpolation. Thus the texture feature is represented as

FV_T = \left\{ A^{grad}_{(l,c)},\; A^{var}_{(l,c)},\; A^{mean}_{(l,c)},\; H^{mean}_{(l,c)},\; V^{mean}_{(l,c)},\; D^{mean}_{(l,c)} \right\}, \quad c = \{R, G, B\},\; l = 1, 2 \qquad (3)

where A, H, V and D refer to the approximation image and the horizontal, vertical and diagonal sub-images, l is the level number and c is the color channel. In the texture feature vector, the structural aspects of the texture are encoded in the H, V and D sub-images while the color aspects are captured in the approximation image.
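The sketch below shows one plausible way to build this 36-dimensional feature under the stated description. It is our own reading of eq. (3), not code from the paper: it assumes PyWavelets, NumPy and SciPy, uses a cubic spline zoom as a stand-in for bicubic interpolation, and the exact windowing and normalization choices are ours.

```python
import numpy as np
import pywt                      # PyWavelets, for the Haar decomposition
from scipy import ndimage

def texture_features(img):
    """Sketch of the 36-D texture feature of eq. (3).

    For each color channel c and each level l of a 2-level Haar decomposition,
    six maps are kept: gradient magnitude and 3x3 variance of the approximation A,
    and 3x3 neighbourhood means of the A, H, V and D sub-bands, all resized back
    to the input resolution.  `img` is an H x W x 3 float array; returns H x W x 36.
    """
    img = np.asarray(img, dtype=float)
    H, W, _ = img.shape
    feats = []
    for c in range(3):                                       # c in {R, G, B}
        band = img[..., c]
        for level in (1, 2):
            cA, (cH, cV, cD) = pywt.dwt2(band, 'haar')
            gy, gx = np.gradient(cA)
            grad = np.hypot(gx, gy)                          # A^grad
            var = (ndimage.uniform_filter(cA ** 2, 3)
                   - ndimage.uniform_filter(cA, 3) ** 2)     # A^var over a 3x3 window
            maps = [grad, var] + [ndimage.uniform_filter(s, 3) for s in (cA, cH, cV, cD)]
            for m in maps:                                   # resize to original resolution
                feats.append(ndimage.zoom(m, (H / m.shape[0], W / m.shape[1]), order=3))
            band = cA                                        # level 2 decomposes the approximation
    return np.stack(feats, axis=-1)
```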

A two-stage dimensional reduction process is applied on the above 36-dimensional feature vector. In the first stage, principal components analysis is applied to retain at least 95% of the information. The second stage employs linear discriminant analysis (LDA) in which the known F and B pixels are grouped into an optimal number of m and n clusters, where m and n are obtained using the Akaike Information Criterion (AIC). Projections that represent the best separation between clusters constitute the reduced dimensions for known data in such a way that 90% of the information is retained. The selected eigenvectors in PCA and projections in LDA are used to reduce the dimensions of the texture feature vector for unknown pixels to ensure that the same dimension reduction process is applied for both known and unknown pixels. If the reduced-dimension feature vector for all pixels is displayed as an image, it would show the foreground and background regions delineated even though there might be overlap in the color distributions. For consistency with the color image, the texture feature is scaled to [0, 255]. The texture samples are collected from such an image.

Fig. 3. Illustration of overlap in color distributions of foreground and background regions. (a) Red channel of image. (b) Color distribution of foreground and background in (a). (c) First dimension of texture feature as an image. (d) Texture distribution of foreground and background in (c).

Fig. 3 shows the effectiveness of the proposed texture feature in discriminating between foreground and background regions. The red channel of an image is shown in Fig. 3(a) together with its histogram showing overlap of foreground and background color distributions in Fig. 3(b). The first dimension of the texture feature is displayed as an image in Fig. 3(c) and its histogram is shown in Fig. 3(d). As expected, the texture feature has enabled the histograms for the foreground and background regions to be separated. This separation of foreground and background allows robust selection of texture samples, which are then suitably weighted to reflect the fact that color is not a reliable feature for matte extraction in this case.
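A possible realization of the two-stage reduction with scikit-learn is sketched below. The PCA, GMM and LDA calls follow standard scikit-learn APIs, but the cluster-count search range and the way the "90% of information" cut-off is applied to the LDA discriminants are our assumptions rather than the paper's exact procedure.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.mixture import GaussianMixture

def best_gmm_labels(X, max_k=8):
    """Cluster samples with the GMM whose component count minimises the AIC."""
    max_k = min(max_k, len(X))
    models = [GaussianMixture(k, random_state=0).fit(X) for k in range(1, max_k + 1)]
    return min(models, key=lambda m: m.aic(X)).predict(X)

def reduce_texture(fg_feats, bg_feats, unk_feats):
    """Two-stage reduction of the 36-D texture vectors (Sec. III, sketch).

    Stage 1: PCA keeping >= 95% of the variance, fitted on the known F and B pixels.
    Stage 2: LDA fitted on AIC-selected GMM clusters of the known pixels; the
    leading discriminants covering ~90% of the separation are kept.  The same
    transforms are then applied to the unknown pixels.
    """
    known = np.vstack([fg_feats, bg_feats])
    pca = PCA(n_components=0.95).fit(known)
    known_p, unk_p = pca.transform(known), pca.transform(unk_feats)

    # Cluster F and B separately; offset the B labels so the classes stay distinct.
    labels_f = best_gmm_labels(known_p[:len(fg_feats)])
    labels_b = best_gmm_labels(known_p[len(fg_feats):])
    labels = np.concatenate([labels_f, labels_b + labels_f.max() + 1])

    lda = LinearDiscriminantAnalysis().fit(known_p, labels)
    keep = np.searchsorted(np.cumsum(lda.explained_variance_ratio_), 0.90) + 1
    return lda.transform(known_p)[:, :keep], lda.transform(unk_p)[:, :keep]
```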

IV. LOCAL AND GLOBAL SAMPLING

In this section, we describe a novel combination of local and global samples that enables the formation of a comprehensive set of candidate samples. Initially, the known regions are expanded into the unknown regions according to the following condition: an unknown pixel z is considered as foreground if, for a pixel q ∈ F,

\left( \|z, q\| < E_{thr} \right) \wedge \left( \bigcup_{i=\{c,t\}} \left( \left\| I^{i}_{z} - I^{i}_{q} \right\| \le \left( V^{i}_{thr} - \|z, q\| \right) \right) \right) \qquad (4)

where ‖z, q‖ is the Euclidean distance between pixels z and q, and I^c_z and I^t_z refer to the color and texture features of pixel z. E_thr, V^c_thr and V^t_thr are thresholds in the spatial, color and texture spaces, respectively. A similar formulation is applied to compare the unknown pixel with a background pixel. After the trimap is expanded, a candidate set of local and global samples for unknown pixels is selected. Local samples have high correlation with neighboring unknown samples, but there is a possibility that samples lying further away from the boundary also belong to the regions from where the local samples are gathered. Hence, there is a need for a global sampling process that incorporates larger variation in the features.
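A literal reading of eq. (4) can be sketched as follows; the threshold values, the data layout and the interpretation of the union over i ∈ {c, t} as "close enough in either feature space" are our assumptions, not values or code taken from the paper.

```python
import numpy as np

def expand_to_foreground(z_xy, color_z, tex_z, fg_pixels,
                         E_thr=9.0, Vc_thr=25.0, Vt_thr=25.0):
    """Eq. (4) sketch: relabel unknown pixel z as foreground if some known
    foreground pixel q is spatially close and the color or texture difference
    falls below a threshold that shrinks with spatial distance.

    fg_pixels is an iterable of (q_xy, color_q, tex_q) tuples.  The threshold
    values above are placeholders, not those used in the paper.
    """
    for q_xy, color_q, tex_q in fg_pixels:
        d = np.linalg.norm(np.subtract(z_xy, q_xy))          # ||z, q||
        if d >= E_thr:
            continue
        # Union over the color (c) and texture (t) feature spaces, as in eq. (4).
        close_enough = any(
            np.linalg.norm(np.subtract(f_z, f_q)) <= (v_thr - d)
            for f_z, f_q, v_thr in [(color_z, color_q, Vc_thr),
                                    (tex_z, tex_q, Vt_thr)]
        )
        if close_enough:
            return True                                       # z joins the known foreground
    return False
```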


Fig. 4. Local boundary and global sampling scheme. (a) Original image with printed trimap, (b) Selected known foreground and background blocks using 8 rays, (c) Generated samples from the set of selected known blocks, (d) Set of 12 (F,B) pairs, (e) Two-level clustering, (f) Clustered known regions, (g) Set of (F,B) pairs from generated samples of clustered known regions, (h) Combination of local boundary (F,B) pairs and a subset of global (F,B) pairs. Images best viewed in color.

A. Local Sampling

Local sampling is motivated by the sample gathering stage of [12] with the difference that we consider (i) texture in addition to color and (ii) blocks of pixels instead of individual pixels. The image is divided into blocks of size 3 × 3 and a set of at most m known foreground and background blocks are collected for every unknown block through m rays emanating from the unknown block. These rays divide the image into sectors each of angle θ_inc = 2π/m and terminate at the first known block that they encounter.

The local boundary sampling is illustrated for block z in Fig. 4(b) for the original Ball image shown in Fig. 4(a), in which the boundaries of known foreground and background regions are shown in red and blue colors. In the example, eight rays select 3 known foreground blocks shown in red and 4 known background blocks shown in blue. The selected blocks are shown in Fig. 4(c). The larger the number of rays, the more variations in color or texture can be captured. However, note that in this example, increasing the number of rays will still not capture all the diverse regions in the ball. The foreground blocks are taken from the red, green and white regions of the ball and the background blocks are taken from the predominantly blue region and the relatively sparse white region (corresponding to the clouds). In the experiments, we use 45 rays spaced 8° apart. The mean values of color/texture in each block constitute the set of candidate F and B samples. These samples are paired together to build a set of 12 (F, B) pairs as shown in Fig. 4(d). As seen in the example, the local samples cannot capture all the color variations, such as the samples from the blue, yellow and orange parts of the ball. This problem is overcome by global sampling.
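A minimal sketch of this ray-based collection, assuming a hypothetical block-level label map and our own helper names, could look like the following; the candidate (F, B) pairs are then all combinations of the collected blocks.

```python
import numpy as np

def local_boundary_blocks(label_map, z_rc, m=45, max_steps=1000):
    """Collect up to m known blocks along m rays from unknown block z (sketch).

    `label_map` is a 2-D array over 3x3 blocks with values 'F', 'B' or 'U';
    each ray of angle 2*pi*k/m stops at the first known block it meets.
    Returns lists of foreground and background block coordinates.
    """
    H, W = label_map.shape
    fg, bg = [], []
    for k in range(m):
        theta = 2.0 * np.pi * k / m                   # theta_inc = 2*pi/m
        dr, dc = np.sin(theta), np.cos(theta)
        for step in range(1, max_steps):
            r = int(round(z_rc[0] + step * dr))
            c = int(round(z_rc[1] + step * dc))
            if not (0 <= r < H and 0 <= c < W):
                break                                 # the ray left the image
            if label_map[r, c] == 'F':
                fg.append((r, c))
                break
            if label_map[r, c] == 'B':
                bg.append((r, c))
                break
    return fg, bg

# Candidate pairs are all combinations of the collected blocks, as in Fig. 4(d):
# pairs = [(f, b) for f in fg for b in bg]
```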

B. Global Sampling

The first step in global sampling is a two-level hierarchical clustering to partition the foreground and background regions. In the first level, the samples are clustered with respect to the color feature through Gaussian mixture models (GMM), in which the number of components of the GMM is the same as the number of peaks in the color histogram of samples in the region. In the second level, the samples of each cluster are further partitioned into sub-clusters by applying the same clustering process but with respect to the texture feature. The sample in each sub-cluster that is spatially closest to the unknown sample is selected to build the set of known foreground and background samples. The known samples are then paired together to build a global set of (F, B) pairs for unknown samples. Global sampling is done at the same block resolution as local sampling and is illustrated in Fig. 4. The two-level clustering (Fig. 4(e)) partitions the known foreground region into six clusters (white, green, red, yellow, orange and blue) and partitions the known background into two clusters, as shown in Fig. 4(f), to obtain a global set of 12 (F, B) pairs for unknown samples as shown in Fig. 4(g).

If the two sub-clusters from which a particular global (F, B) pair is formed are already represented in the local set, then that global (F, B) pair is deemed redundant and removed from further consideration. Such (F, B) pairs are marked by a cross in Fig. 4. In this way, samples previously missing after local sampling are also incorporated into the framework so that the final set of local boundary and global samples is comprehensive enough to cover all color variations. The final set is shown in Fig. 4(h). Note that Fig. 4 is only a schematic of the proposed sampling scheme.
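The two-level clustering could be sketched as below. Estimating the GMM component count from histogram peaks is implemented here with a smoothed-histogram peak count over a crude scalar summary of each feature, which is our approximation rather than the paper's exact procedure.

```python
import numpy as np
from scipy import ndimage
from scipy.signal import find_peaks
from sklearn.mixture import GaussianMixture

def n_histogram_peaks(values, bins=32):
    """Approximate the number of peaks in a smoothed 1-D histogram."""
    hist, _ = np.histogram(values, bins=bins)
    peaks, _ = find_peaks(ndimage.gaussian_filter1d(hist.astype(float), 1.0))
    return max(1, len(peaks))

def global_samples(colors, textures, positions, z_xy):
    """Two-level clustering for global sampling (sketch of Sec. IV-B).

    Level 1 clusters the known samples by color with a GMM whose component count
    follows the histogram peaks; level 2 sub-clusters each color cluster by
    texture.  The spatially closest sample of every sub-cluster to the unknown
    block z is returned as a global candidate (indices into the input arrays).
    """
    k1 = min(n_histogram_peaks(colors.mean(axis=1)), len(colors))
    color_labels = GaussianMixture(k1, random_state=0).fit_predict(colors)
    chosen = []
    for c_lab in np.unique(color_labels):
        idx = np.where(color_labels == c_lab)[0]
        k2 = min(n_histogram_peaks(textures[idx].mean(axis=1)), len(idx))
        tex_labels = GaussianMixture(k2, random_state=0).fit_predict(textures[idx])
        for t_lab in np.unique(tex_labels):
            sub = idx[tex_labels == t_lab]
            d = np.linalg.norm(positions[sub] - np.asarray(z_xy), axis=1)
            chosen.append(sub[np.argmin(d)])          # spatially closest sample
    return np.array(chosen)
```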


V. SELECTION OF BEST (F, B) PAIRS

The best (F, B) pair is assigned to all pixels in a block and is selected by maximizing the following objective function that takes advantage of color and texture:

O = (C_\alpha)^{e_C} \times (T_\alpha)^{e_T}, \qquad (5)

where C_α is a measure of the color fitness of an (F, B) pair and T_α is a measure of the compatibility of the color and texture features in agreeing to a particular value of α. The contributions of the color and texture features are weighted by e_C and e_T, respectively. Brute-force optimization is used to find the best (F, B) pair. We now describe each term of the objective function.

C_α measures the color fitness by considering the compatibility of the linear model of the compositing equation (1) with the convex combination of the foreground and background components of a particular pair (F_k, B_k), and is given by

C_\alpha = \exp\left( - \frac{ \left\| I^{c}_{z} - \left( \hat{\alpha} F^{c}_{k} + (1 - \hat{\alpha}) B^{c}_{k} \right) \right\| }{ \frac{1}{N^{FB}_{Blk_i}} \sum_{(F_k, B_k) \in S^{FB}_{Blk_i}} \left\| I^{c}_{z} - \left( \hat{\alpha} F^{c}_{k} + (1 - \hat{\alpha}) B^{c}_{k} \right) \right\| } \right) \qquad (6)

where I^c_z is the observed color of pixel z, α̂ is the alpha estimated using equation (2) for a pixel using its selected (F_k, B_k) pair, and S^{FB}_{Blk_i} is the set of (F, B) pairs for block i whose cardinality is N^{FB}_{Blk_i}. The superscript c in F^c_k and B^c_k indicates the color feature. The color fitness is high when the color components of an (F, B) pair accurately estimate the observed color of the pixel.

T_α indicates the compatibility of the α estimated using color information with the similarity of the pixel to foreground or background in texture space. A high compatibility implies that color information can estimate α with high reliability. Thus, when the estimated α using color samples is close to 1 and the probability of the pixel being foreground in texture space is close to 1, then the compatibility is high and the estimated α is reliable. Therefore, T_α is formulated as

T_\alpha = \hat{\alpha} \times P_{FT}(z) + (1 - \hat{\alpha}) \times P_{BT}(z), \qquad (7)

where

P_{FT}(z) = \frac{ \left\| B^{T}_{k} - I^{T}_{z} \right\| }{ \left\| B^{T}_{k} - I^{T}_{z} \right\| + \left\| F^{T}_{k} - I^{T}_{z} \right\| } \qquad (8)

P_{BT}(z) = \frac{ \left\| F^{T}_{k} - I^{T}_{z} \right\| }{ \left\| B^{T}_{k} - I^{T}_{z} \right\| + \left\| F^{T}_{k} - I^{T}_{z} \right\| }. \qquad (9)

F^T_k and B^T_k are the foreground and background texture components of the (F_k, B_k) pair and I^T_z is the texture value of pixel z. P_{FT}(z) and P_{BT}(z) are the probabilities of pixel z being foreground or background in texture space; they are computed using the texture similarities of the pixel with the foreground and background texture samples. The best (F, B) pair for each pixel in a block is obtained through the optimization process and α for the pixel is computed using eq. (2).
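Putting eqs. (5)-(9) together, a brute-force scan over the candidate pairs could look like the sketch below; the data layout of the candidate list, the redefined eq. (2) helper and the epsilon guards are our own choices rather than the paper's implementation.

```python
import numpy as np

def estimate_alpha(Ic, Fc, Bc, eps=1e-8):
    a = np.dot(Ic - Bc, Fc - Bc) / (np.dot(Fc - Bc, Fc - Bc) + eps)   # eq. (2)
    return float(np.clip(a, 0.0, 1.0))

def best_pair(Ic, It, pairs, eC, eT, eps=1e-8):
    """Select the best (F, B) pair for a pixel by maximising eq. (5) (sketch).

    `pairs` is a list of ((Fc, Ft), (Bc, Bt)) candidates with color and texture
    components as NumPy vectors; `Ic`, `It` are the pixel's color and texture.
    """
    # Chromatic distortion of every pair, used to normalise C_alpha (eq. 6).
    alphas = [estimate_alpha(Ic, Fc, Bc) for (Fc, _), (Bc, _) in pairs]
    dists = [np.linalg.norm(Ic - (a * Fc + (1 - a) * Bc))
             for a, ((Fc, _), (Bc, _)) in zip(alphas, pairs)]
    mean_dist = np.mean(dists) + eps

    best, best_score = None, -np.inf
    for a, d, ((Fc, Ft), (Bc, Bt)) in zip(alphas, dists, pairs):
        C_alpha = np.exp(-d / mean_dist)                               # eq. (6)
        dF, dB = np.linalg.norm(Ft - It), np.linalg.norm(Bt - It)
        P_FT = dB / (dB + dF + eps)                                    # eq. (8)
        P_BT = dF / (dB + dF + eps)                                    # eq. (9)
        T_alpha = a * P_FT + (1 - a) * P_BT                            # eq. (7)
        score = (C_alpha ** eC) * (T_alpha ** eT)                      # eq. (5)
        if score > best_score:
            best, best_score = ((Fc, Ft), (Bc, Bt)), score
    return best, best_score
```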

A. Automatic Selection of e_C and e_T

Texture complements color when the foreground and background color distributions overlap, which might result in an erroneous selection of foreground and background samples. The two features are weighted through e_C and e_T so that when there is significant overlap in color, texture is considered more reliable. However, when the color distributions are distinct, color should be assigned a higher weight. The degree of overlap of the distributions is computed as

OL(H_F, H_B) = \frac{ \sum_{i=1}^{n} H_F(i) \times H_B(i) }{ \sum_{i=1}^{n} \left( H_F(i)^2 + H_B(i)^2 \right) / 2 } \qquad (10)

where H_F and H_B are the normalized histograms of foreground and background, respectively, and n is the number of histogram bins. The overlap computation is carried out for the set of candidate (F, B) pairs and is independent of the number of foreground and background samples. It is 1 when F and B have the same distribution and 0 when their distributions are distinct. The overlap is used to determine the weights e_C and e_T as

e_C = \exp\left( - \frac{OL_C}{OL_T + OL_C} \right) \qquad (11)

e_T = \exp\left( - \frac{2 \times OL_T}{OL_T + OL_C} \right) \qquad (12)

where OL_C and OL_T are the overlaps of the foreground and background distributions in the color and texture spaces, computed as the average of histogram overlaps over the color and texture channels, respectively. When the color F and B distributions overlap completely and the corresponding texture distributions are completely separated, e_C = e^{−1} and e_T = 1. In the converse case, e_C = 1 and e_T = e^{−2}. Note that texture by itself may not be very reliable since it is computed at block resolution.
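Eqs. (10)-(12) translate directly into a few lines of Python; the sanity check at the end reproduces the limiting case quoted in the text, and the small epsilon guard for the degenerate zero-overlap case is our addition.

```python
import numpy as np

def histogram_overlap(hf, hb):
    """Eq. (10): overlap of two normalised histograms (1 = identical, 0 = disjoint)."""
    hf, hb = np.asarray(hf, dtype=float), np.asarray(hb, dtype=float)
    return np.sum(hf * hb) / (np.sum(hf ** 2 + hb ** 2) / 2.0)

def feature_weights(OL_C, OL_T):
    """Eqs. (11)-(12): exponents e_C and e_T from the color/texture overlaps."""
    s = max(OL_T + OL_C, 1e-8)        # guard against both overlaps being zero
    eC = np.exp(-OL_C / s)
    eT = np.exp(-2.0 * OL_T / s)
    return eC, eT

# Sanity check from the text: full color overlap, disjoint texture -> (e^-1, 1).
print(feature_weights(OL_C=1.0, OL_T=0.0))   # ~ (0.3679, 1.0)
```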

B. Post-Processing

The alpha matte obtained by estimating α for each pixel using the best (F, B) pair in eq. (2) is further refined to obtain a smooth matte by considering the correlation between neighboring pixels. In particular, we adopt the post-processing method of [12] where a cost function consisting of the data term α̂ and a confidence value f together with a smoothness term consisting of the matting Laplacian [5] is minimized with respect to α. The confidence value is the value of the objective function in eq. (5) for the selected (F, B) pair. The cost function is given by [12]

\alpha = \arg\min_{\alpha} \; \alpha^{T} L \alpha + \lambda (\alpha - \hat{\alpha})^{T} D (\alpha - \hat{\alpha}) + \gamma (\alpha - \hat{\alpha})^{T} \hat{\Gamma} (\alpha - \hat{\alpha}) \qquad (13)

where λ is a large weighting parameter compared to the estimated alpha α̂ and its associated confidence f, while γ is a constant (10^{−1}) that indicates the relative importance of the data and smoothness terms. D is a diagonal matrix with values 1 for known foreground and background pixels and 0 for unknown ones, while the diagonal matrix Γ̂ has values 0 for known foreground and background pixels and f (the confidence value) for unknown pixels.
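Since eq. (13) is quadratic in α, setting its gradient to zero gives the sparse linear system (L + λD + γΓ̂)α = (λD + γΓ̂)α̂, which the sketch below solves. The matting Laplacian L is assumed to be precomputed (e.g. following [5]), and the value of λ is our placeholder, as the paper only states that it is large.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def refine_matte(L, alpha_hat, confidence, known_mask, lam=100.0, gamma=0.1):
    """Closed-form minimiser of eq. (13) (sketch).

    L is a precomputed n x n sparse matting Laplacian; alpha_hat and confidence
    are flattened per-pixel estimates; known_mask is 1 for known F/B pixels and
    0 for unknown ones.  lam is an assumed value; gamma = 0.1 as in the text.
    """
    D = sp.diags(known_mask.astype(float))                    # constraint on known pixels
    Gamma = sp.diags(np.where(known_mask, 0.0, confidence))   # data term on unknown pixels
    A = (L + lam * D + gamma * Gamma).tocsr()
    b = (lam * D + gamma * Gamma) @ alpha_hat
    return np.clip(spsolve(A, b), 0.0, 1.0)
```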

VI. EXPERIMENTAL RESULTS

In this section, we demonstrate the effectiveness of the proposed algorithm to address the main drawbacks of color sampling matting methods. In the first experiment, we illustrate the effectiveness of the proposed local and global sampling scheme that overcomes the problem of missing true samples. Visual as well as quantitative comparison of the proposed method with other matting methods over a benchmark dataset developed by [15] is illustrated in the second experiment. In the third experiment, we evaluate the performance on a new set of images that contain significant overlap in the color distributions of foreground and background and show that the proposed method outperforms other sampling-based methods. Finally, failure cases are presented.

Fig. 5. Visual comparison of the proposed method with other matting methods to illustrate the problem of missing true samples. (a) Original images with boundaries of foreground (red) and background (blue) regions, (b) Zoomed region, (c) Ground truth matte, (d) Proposed method, (e) Global, (f) Shared, (g) Robust.

TABLE I. COMPARISON OF MATTING METHODS WITH THE PROPOSED ONE ON A BENCHMARK SET OF IMAGES AS EVALUATED BY [19] WITH RESPECT TO SAD.

A. Missing True Samples

Current sampling based matting methods fail to estimate the true foreground and background colors of unknown samples when the sets of collected known samples do not contain the true colors of the unknown samples. Fig. 5(a) shows three original images with their corresponding trimap boundaries of known F and B regions. Zoomed regions and the ground truth alpha mattes of the zoomed regions are shown in Fig. 5(b) and (c), respectively. The foreground object in the first image has black and light brown colors. The trimap shows that the known foreground region has both these colors, but its boundaries do not contain the black color. Thus, the sets of known foreground samples collected by the global and shared matting methods do not contain black colors and therefore, the black region of the doll is wrongly estimated as background, as shown in Fig. 5(e) and (f) in row 1. Even the set of spatially close samples cannot solve the problem, as shown for robust matting in Fig. 5(g). A similar situation arises in the second image, whereby global, shared and robust matting miss the true black color in the foreground, as shown in Figs. 5(e), (f), and (g), respectively. In the last image, the true colors of the hat are missed by robust matting and the hat is considered as background, as shown in Fig. 5(g). In this case, the shared and global matting methods perform slightly better, as shown in Fig. 5(e) and (f).

The proposed method uses global samples to complement the set of highly correlated local samples to solve the problem of missing true samples by covering all color variations. Also, the use of texture as a complementary feature to color helps to discriminate between known regions when they have similar color distributions. The visual comparison between the ground truth mattes and the mattes estimated by the proposed method is shown in Fig. 5(c) and (d).

Fig. 6. Visual comparison between various methods. (a) Original image, (b) Zoomed-in region, (c) Closed-form, (d) Robust, (e) Shared, (f) Global, (g) SVR, (h) Proposed method.

TABLE II. RANKS OF MATTING METHODS WITH RESPECT TO SAD AND MSE FOR THE BENCHMARK DATA SET AS EVALUATED BY [19]. SMALL, LARGE AND USER REFER TO THE SIZES OF THE TRIMAPS.

B. Evaluation on Benchmark Dataset

A preview of the effectiveness of using texture in addition to color for accurate matte extraction was shown in Fig. 1. For further visual comparison, Fig. 6 compares the proposed method with five other matting methods on the Donkey, Elephant and Pineapple images obtained from the benchmark dataset of images [15]. The original images and zoomed parts are shown in Fig. 6(a, b). The extracted alpha mattes for the zoomed portion using the closed-form, robust, shared, global sampling, SVR and proposed matting methods are shown in Fig. 6(c)-(h), respectively. These methods are not as effective as the proposed method shown in Fig. 6(h) due to the overlap in colors between foreground and background. Here, the texture information has helped to complement color so that a more accurate matte is extracted.

Fig. 7. Illustration of the new database. (a) Sample image, ground truth matte and three types of trimap (small, large and very large), (b) Set of 15 synthesized images.

For quantitative comparison, we compare the proposed matting method with the other techniques listed on the alpha matting website [15]. Table I shows the sum of absolute differences (SAD) for all the images and for three types of trimaps - small, large and user. The superscript next to the SAD for each method indicates its rank for the particular image and trimap. Methods with first rank are in bold. The SAD of the mattes estimated by the proposed method for the Troll, Elephant and Pineapple images with the large trimap are 16, 3.5 and 7.4, and they are ranked 1st. The overall ranks of the matting methods for all images and trimaps, along with average ranks over images for small, large and user trimaps with respect to SAD and mean square error (MSE), are shown in Table II. The proposed method of using texture to complement color with global and local sampling outperforms all other methods. It performs best with overall ranks of 5.1 and 5 with respect to SAD and MSE, respectively, while the second ranked method, SVR matting, has overall SAD and MSE ranks of 5.8 and 5.5, respectively. The effect of considering global samples can be seen when the proposed method is compared with 'weighted color and texture matting', which is our previous work [20] in which only local samples were used. With SAD and MSE ranks of 6 and 6.6, respectively, the advantage of a comprehensive set of known samples generated from local and global sampling techniques is evident.

Fig. 8. Qualitative evaluation of matting methods on synthesized images. (a) Original image with boundaries of known regions, (b) Ground truth, (c) Proposed, (d) Shared, (e) Robust, (f) Closed form.

C. Matte Extraction in Complex Images

This experiment illustrates the effectiveness of the proposed matting method in dealing with highly textured images that highlight the problem of overlapped color distributions. We synthesize a new set of 15 images with three types of trimaps (Fig. 7). Some of the images are taken from the training set of [15] and the synthesized images are generated by replacing the background with one that has similar colors as the foreground.

The proposed method is compared with the Closed-Form, Robust and Shared matting methods over the new set of images. Six of the fifteen images with ground truth mattes are shown in Fig. 8(a) and (b). The rest are available in the supplementary material. In the first row, the petals of the flower have similar colors as the background, which makes it hard for shared and robust matting to extract an accurate matte, as seen in Fig. 8(d) and (e), respectively. Similar comments can be made for the other images. In the second row, the foreground object contains holes through which the textured background having very similar color is visible. The holes are visible in the ground truth matte, but the color similarity of foreground and background and the strong edges of the foreground make it hard for matting methods to estimate an accurate alpha matte. The holes are considered as foreground by closed-form matting because the propagation of alpha is blocked by strong edges in the foreground.

For further evaluation, the performance of the proposed method is evaluated on more challenging images, like the doll in front of textured regions shown in the third and fourth rows of Fig. 8. The alpha is not accurately propagated in closed-form matting through the background regions which are partially occluded by the doll's hairs (Fig. 8(f)). Moreover, the mattes estimated by the color sampling based matting methods are not accurate. Some parts of the textured background are wrongly estimated as foreground and vice versa, as shown in Fig. 8(d, e) for shared and robust matting. The proposed method estimates the most accurate mattes (Fig. 8(c)).

Fig. 9. Overall ranks of the Proposed, Closed-form, Shared and Robust matting methods with respect to MSE [7] on the new set of images, plotted for small, large and very large trimaps (vertical axis: overall rank).

Fig. 10. Failure cases. (a) Original Image, (b) Trimap, (c) Ground truth, (d) Proposed method.

The most challenging images are fuzzy images in which foreground regions are gradually blended with background ones, as shown in the fifth and sixth rows of Fig. 8(a). The color/texture of foreground and background are blended together, leading to a new color/texture for the fuzzy region that is different from the foreground or background. Here, the background has vertical red lines and the foreground has diagonal red lines, and the fuzzy region has both vertical and diagonal red lines that generate different texture patterns, as shown in the fifth row of Fig. 8(a). This phenomenon makes such fuzzily blended images very challenging for alpha matting, especially when foreground and background have similar color distributions. The robust and shared matting methods cannot properly estimate alpha mattes for the fuzzy regions, as shown in Fig. 8(d, e). False correlations are increased due to color similarity for closed-form matting, and strong edges intensify the problem by blocking the propagation of alpha (Fig. 8(f)). The proposed method smoothly estimates alpha mattes for fuzzy regions, characterized by a gradual transition from background to foreground, as seen in Fig. 8(c). Although the mattes estimated by the proposed method are not perfect, they are better than those of the other color based matting methods shown in Fig. 8.

A quantitative evaluation on the new set of images with three types of trimaps - small, large and very large - is done based on the MSE calculated according to [7]. The overall ranks of the shared, robust, closed-form and proposed matting methods are plotted in Fig. 9 for small, large and very large trimaps, respectively. The proposed sampling method achieves the best ranks, illustrating its effectiveness in dealing with known samples that come from similar color distributions. The performance of the proposed method in estimating high quality mattes for textured images shows the potential power of texture when it is used as a complementary feature with color.

D. Failure Cases

The proposed method relies on texture information to discriminate between known regions when they have overlapped color distributions, but it fails to extract high quality mattes when the known regions have both similar colors and similar textures, as shown in Fig. 10. In the Baby image of Fig. 10(a), whose trimap and ground truth are shown in Fig. 10(b) and (c), the foreground and background regions have similar skin color and texture, which makes it hard for the proposed method to extract the matte, as seen in Fig. 10(d). The problem is the same for the second image in Fig. 10(d). We are currently investigating how these cases can be handled.

VII. CONCLUSION

In this paper, a new sampling based matting method is presented which avoids missing true samples by collecting a comprehensive set of foreground and background samples. This collection includes highly correlated local samples and does not restrict the samples to be near the boundaries of known regions by considering global samples as well. The proposed method uses texture to complement color in extracting accurate mattes in images when foreground and background colors overlap. This is handled by designing an objective function that uses a weighted contribution of color and texture information to choose the best (F, B) pair for unknown samples. The weights corresponding to the contributions of color and texture are determined using an automatic content-based method. The final matte is further refined using the conventional Laplacian method. Experimental results on a benchmark dataset achieve state-of-the-art performance in terms of standard error measures, which reveals the effectiveness of the proposed method in dealing with complex images when foreground and background regions have similar colors.

REFERENCES

[1] J. Wang and M. Cohen, “An iterative optimization approach for unified image segmentation and matting,” in Proc. 10th IEEE ICCV, vol. 2, Jan. 2005, pp. 936–943.

[2] P. Lee and Y. Wu, “Nonlocal matting,” in Proc. IEEE CVPR, Jun. 2011, pp. 2193–2200.

[3] L. Grady, “Random walks for image segmentation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 11, pp. 1768–1783, Nov. 2006.

[4] J. Sun, J. Jia, C. Tang, and H. Shum, “Poisson matting,” ACM Trans. Graph., vol. 23, no. 3, pp. 315–321, 2004.

[5] A. Levin, D. Lischinski, and Y. Weiss, “A closed-form solution to natural image matting,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 30, no. 1, pp. 228–242, Jan. 2007.

[6] Q. Chen, D. Li, and C. Tang, “KNN matting,” in Proc. IEEE CVPR, Jun. 2012, pp. 869–876.

[7] J. Wang and M. Cohen, “Optimized color sampling for robust matting,” in Proc. IEEE CVPR, Jun. 2007, pp. 1–8.

[8] Y. Chuang, B. Curless, D. Salesin, and R. Szeliski, “A Bayesian approach to digital matting,” in Proc. IEEE CVPR, vol. 2, Dec. 2001, pp. 7–15.

[9] M. Ruzon and C. Tomasi, “Alpha estimation in natural images,” in Proc. IEEE CVPR, vol. 1, Jun. 2000, pp. 18–25.

[10] Y. Mishima, “Soft edge chroma-key generation based upon hexoctahedral color space,” U.S. Patent 5,355,174, Oct. 11, 1994.

[11] A. Berman, A. Dadourian, and P. Vlahos, “Method for removing from an image the background surrounding a selected object,” U.S. Patent 6,134,346, Oct. 17, 2000.

[12] E. Gastal and M. Oliveira, “Shared sampling for real-time alpha matting,” Proc. Eurograph., vol. 29, no. 2, pp. 575–584, Dec. 2010.

[13] K. He, C. Rhemann, C. Rother, X. Tang, and J. Sun, “A global sampling method for alpha matting,” in Proc. IEEE CVPR, Jun. 2011, pp. 2049–2056.

[14] Z. Zhanpeng, Z. Qingsong, and X. Yaoqin, “Learning based alpha matting using support vector regression,” in Proc. 19th IEEE ICIP, Sep. 2012, pp. 2109–2112.

[15] C. Rhemann, C. Rother, J. Wang, M. Gelautz, P. Kohli, and P. Rott, “A perceptually motivated online benchmark for image matting,” in Proc. IEEE CVPR, Jun. 2009, pp. 1826–1833.

[16] J. Wang and M. Cohen, “Image and video matting: A survey,” Found. Trends Comput. Graph. Vis., vol. 3, no. 2, pp. 97–175, Jan. 2007.

[17] C. Rhemann, C. Rother, and M. Gelautz, “Improving color modeling for alpha matting,” in Proc. BMVC, 2009, pp. 1155–1164.

[18] M. Mirmehdi, X. Xie, and J. Suri, Handbook of Texture Analysis. London, U.K.: Imperial College Press, 2008.

[19] (2009). Alpha Matting Evaluation Website [Online]. Available: http://www.alphamatting.com

[20] E. Shahrian and D. Rajan, “Weighted color and texture sample selection for image matting,” in Proc. IEEE CVPR, Jun. 2012, pp. 718–726.

Ehsan Shahrian Varnousfaderani received the B.Eng. (Hons.) degree in computer engineering (software engineering) from Karaj Islamic Azad University, Karaj, Iran, in 2006, the master's degree in computer engineering - artificial intelligence and robotics - from the Iran University of Science and Technology, Tehran, Iran, in 2008, and the Ph.D. degree from Nanyang Technological University, Singapore, in 2013. He was awarded the A*STAR Graduate Scholarship in 2008 and an Adobe Research fund in 2012. He is currently a Research Associate with the Vienna Reading Center, Medical University of Vienna, Vienna, Austria. His current research interests include medical image processing, computer graphics, and image and video processing.

Deepu Rajan received the Bachelor of Engineering degree in electronics and communication engineering from the Birla Institute of Technology, Ranchi, India, the M.S. degree in electrical engineering from Clemson University, Clemson, SC, USA, and the Ph.D. degree from the Indian Institute of Technology Bombay, Mumbai, India. He is an Associate Professor with the School of Computer Engineering, Nanyang Technological University, Singapore. From 1992 to 2002, he was a Lecturer with the Department of Electronics, Cochin University of Science and Technology, Cochin, India. His current research interests include image processing, computer vision, and multimedia signal processing.