
IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 18, NO. 1, JANUARY 2009

Independent Component Analysis-Based Background Subtraction for Indoor Surveillance

    Du-Ming Tsai and Shia-Chih Lai

Abstract—In video surveillance, detection of moving objects from an image sequence is very important for target tracking, activity recognition, and behavior understanding. Background subtraction is a very popular approach for foreground segmentation in a still scene image. In order to compensate for illumination changes, a background model updating process is generally adopted, which leads to extra computation time. In this paper, we propose a fast background subtraction scheme using independent component analysis (ICA), aimed particularly at indoor surveillance for possible applications in home-care and health-care monitoring, where moving and motionless persons must be reliably detected. The proposed method is as computationally fast as the simple image difference method, and yet is highly tolerant of changes in room lighting. The proposed background subtraction scheme involves two stages, one for training and the other for detection. In the training stage, an ICA model that directly measures statistical independence, based on estimations of the joint and marginal probability density functions from relative frequency distributions, is first proposed. The proposed ICA model can effectively separate two highly correlated images. In the detection stage, the trained de-mixing vector is used to separate the foreground in a scene image with respect to the reference background image. Two sets of indoor examples that involve switching on/off room lights and opening/closing a door are demonstrated in the experiments. The performance of the proposed ICA model for background subtraction is also compared with that of the well-known FastICA algorithm.

Index Terms—Background subtraction, foreground segmentation, independent component analysis (ICA), indoor surveillance.

    I. INTRODUCTION

In video surveillance, detection of moving objects from an image sequence is very important for the success of object tracking, activity recognition, and behavior understanding. Motion detection aims at segmenting foreground regions corresponding to moving objects from the background. Background subtraction and temporal differencing are two popular approaches to segmenting moving objects in an image sequence under a stationary camera. Background subtraction detects moving objects in an image by evaluating the difference of pixel features of the current scene image against the reference background image. This approach is very sensitive to illumination changes without adaptively updating the reference

Manuscript received July 20, 2008; revised September 04, 2008. First published November 18, 2008; current version published December 12, 2008. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Jenq-Neng Hwang.

The authors are with the Department of Industrial Engineering and Management, Yuan-Ze University, Chung-Li 32003, Taiwan, R.O.C. (e-mail: [email protected]; [email protected]).

    Digital Object Identifier 10.1109/TIP.2008.2007558

background. Temporal differencing calculates the difference of pixel features between consecutive scene frames in an image sequence. It is very effective in accommodating environmental changes, but generally can only recover partial edge shapes of moving objects. In this study, we propose a fast background subtraction scheme using independent component analysis (ICA) and, particularly, aim at indoor surveillance for possible home-care and health-care applications.

Hu et al. [1] categorized motion segmentation approaches as background subtraction, temporal differencing, and optical flow. As aforementioned, background subtraction and temporal differencing are applied to stationary cameras. Optical flow methods can be used for nonstationary cameras; they assign to every pixel a 2-D velocity vector over a sequence of images. Moving objects are then detected based on the characteristics of the velocity vectors. Optical flow methods are computationally intensive, and can only detect partial edge shapes of moving objects.

In order to make the background subtraction adaptive to environmental changes such as illumination variations, many background model updating strategies have been proposed. Wren et al. [2] developed a system called Pfinder for person segmentation, tracking, and interpretation. They modeled each pixel of the background over time with a single Gaussian distribution. Since the estimation of Gaussian parameter values using standard algorithms such as expectation maximization (EM) for each pixel in an image is computationally prohibitive, recursive updating using a simple linear adaptive filter is applied for real-time implementation. The general form of the recursive linear filter can be given by [3]

θ_t = (1 − α) · θ_{t−1} + α · θ̂_t        (1)

where the model parameter θ_t at time t is updated from θ_{t−1} by a local estimate θ̂_t with a learning rate α. The learning rate α used in the linear filter is usually fixed throughout the whole sequence of images and obtained by trial and error. It controls the speed at which the model adapts to changes. A small learning rate results in slow convergence when a Gaussian adapts to background changes. Conversely, a large learning rate may improve the convergence speed, but the new background model will not maintain sufficient historical information from previous images. Single Gaussian updating models were also adopted in [4]–[7] for background subtraction.
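The recursive update in (1) can be sketched as a per-pixel running average. The function name, the array shapes, and the value of alpha below are illustrative, not from the paper:

```python
import numpy as np

def update_background(background, frame, alpha=0.05):
    """Recursive linear filter of (1): b_t = (1 - alpha) * b_{t-1} + alpha * frame.

    A small alpha adapts slowly but preserves history; a large alpha adapts
    quickly but forgets the scene. (Illustrative sketch, not the paper's code.)
    """
    return (1.0 - alpha) * background + alpha * frame

# A static scene change is absorbed into the background over many frames.
b = np.full((4, 4), 100.0)          # current background estimate
frame = np.full((4, 4), 160.0)      # scene suddenly brighter
for _ in range(100):
    b = update_background(b, frame, alpha=0.05)
print(float(b[0, 0]))               # close to 160 after 100 updates
```

This slow absorption is exactly the behavior that, as discussed below, makes adaptive models swallow motionless foreground objects.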

A robust background modeling approach is to represent each pixel of the background image over time by a mixture of Gaussians. This approach was first proposed by Stauffer and Grimson [8], [9], and became a standard background updating procedure for comparison. In their background model, the distribution of recently observed values of each pixel in the scene is characterized by a mixture of several Gaussians. All parameter values of the Gaussian mixture model are updated according to (1) using a fixed learning rate α. The background model can deal with multimodal distributions caused by shadows, swaying branches, etc. It can handle slow lighting changes by slowly adapting the parameter values of the Gaussians. KaewTraKulPong and Bowden [10] and Lee [11] further presented effective background updating schemes to improve the convergence rate of Gaussian mixture models. Li et al. [12] derived a theoretical analysis for the choice of the fixed learning rate α.

1057-7149/$25.00 © 2009 IEEE

Instead of modeling the feature vectors of each pixel by a mixture of several Gaussians, Elgammal et al. [13] proposed to evaluate the probability of a background pixel using nonparametric kernel density estimation based on very recent historical samples in the image sequence. In order to achieve quick adaptation to changes in the scene for sensitive detection and low false positive rates, they proposed a way to combine the results of short-term and long-term background models for better update decisions. Kernel density estimation is computationally very intensive. Elgammal et al. [14] further proposed a fast nonparametric kernel density estimation algorithm for object tracking. By considering a Gaussian kernel function, the Fast Gauss Transform [15], [16] reduces the computation of the kernel density estimation from O(MN) to O(M+N), where N is the number of samples used in the kernel model and M is the number of evaluation points. Ianasi et al. [17] also used nonparametric kernel density estimation for moving object detection and tracking. They proposed a fast algorithm using multiresolution and recursive density estimation with mean-shift-based mode tracking.
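The kernel density idea can be sketched for a single pixel: the background probability of a new intensity is the average of Gaussian kernels centered at the pixel's recent samples. The function name and the fixed bandwidth are assumptions for illustration; real systems such as Elgammal et al.'s adapt the bandwidth per pixel:

```python
import numpy as np

def background_probability(pixel_value, history, bandwidth=5.0):
    """Nonparametric (Gaussian-kernel) density estimate of a pixel's
    background PDF from its most recent samples. Minimal sketch only."""
    d = (pixel_value - np.asarray(history, dtype=float)) / bandwidth
    return float(np.mean(np.exp(-0.5 * d * d) / (bandwidth * np.sqrt(2 * np.pi))))

history = [100, 101, 99, 102, 100, 98, 101, 100]   # recent intensities of one pixel
p_bg = background_probability(100, history)        # consistent with history
p_fg = background_probability(180, history)        # far from history
print(p_bg > p_fg)
```

The cost of evaluating this sum at every pixel for every frame is what motivates the Fast Gauss Transform speedup mentioned above.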

The background subtraction methods reviewed above have worked successfully for indoor and outdoor surveillance, where the environment may change gradually or abruptly over time. Although many relatively fast background modeling methods have been proposed for real-time implementation, the updating process still consumes a significant amount of the total computation time, and restricts the image frame to a small size (e.g., 150 × 200). This causes a low image resolution, and the object of interest may appear as a tiny foreground region in the scene image. Besides the extra computation time of background updating, one of the limitations of adaptive background modeling is that it can wrongly absorb a foreground object into the background if the object remains motionless for a long period of time. For home-care surveillance, a foreground person may watch TV or fall asleep in the living room and remain still for a long duration. Or, for health-care monitoring, a patient may fall off the bed and become unconscious. The foreground objects in such circumstances cannot be detected with the newly updated background model. To cope with the problem of motionless foreground objects, a reference background image that contains no moving objects may be required. Background subtraction based on a single reference image is computationally very fast. However, it is also very sensitive to illumination changes, which deters the use of such an approach in many surveillance applications.

In this paper, we propose a simple and fast background subtraction scheme without background model updating, and yet it is tolerant of changes in room lighting for indoor surveillance, using independent component analysis (ICA). ICA is a statistical signal processing technique that extracts independent sources given only observed data that are mixtures of the unknown sources, without any prior knowledge of the mixing mechanisms [18], [19]. In the ICA model, the observed signals are generally assumed to be a linear mixture of the unknown sources from a mixing matrix. The estimated source signals are termed independent components (ICs), and the inverse of the mixing matrix is called the de-mixing matrix. The de-mixing matrix is solved for by maximizing the independence of the estimated source signals.

In this study, we use ICA to detect foreground objects in a sequence of images. Since the moving objects and the stationary background in an image are considered to be independent, ICA is applied to obtain the de-mixing matrix that can separate a scene image into foreground objects and still background. The proposed ICA-based background subtraction scheme involves two processing stages: a training stage and a detection stage. In the training stage, two sample images, one representing the reference background and the other containing arbitrary foreground objects, are selected to form the data matrix of the ICA model. Then, an ICA algorithm is applied to solve for the de-mixing matrix. In the detection stage, the reference background image used in training and the current scene image in the sequence are used to form the data matrix. One of the two row vectors of the trained de-mixing matrix is associated with foreground objects, and is employed as a filter to extract the foreground objects in the scene image. Since this row vector of the de-mixing matrix is only of a small size of 1 × 2, the detection stage is computationally very efficient. It thus allows high frame rates for large image sizes, or leaves sufficient time for high-level processes such as tracking and activity recognition.

The two sample images that form the data matrix in the training stage are basically identical except for the relatively small foreground region in the whole image. Therefore, these two signals show a high correlation coefficient. The well-known FastICA algorithm [20] cannot effectively recover highly correlated source signals. In this study, an independent component analysis model that directly measures the difference between the joint probability density function (PDF) and the product of marginal probability density functions is proposed. The PDFs are estimated from the relative frequency distributions, and the particle swarm optimization (PSO) algorithm is used to search for the best de-mixing matrix of the ICA model. The proposed ICA model can effectively separate highly correlated data, and the estimated de-mixing matrix can effectively segment foreground objects under illumination changes for indoor surveillance applications.

The organization of this paper is as follows. Section II first overviews the ICA model. The proposed ICA model that directly measures the difference between the joint PDF and the product of marginal PDFs is then discussed. The PSO search algorithm that determines the best de-mixing matrix of the proposed ICA model is finally presented. Section III describes two sets of experiments used to demonstrate the efficiency and effectiveness of the proposed method, and to compare the detection performance with the FastICA algorithm and the image difference method. The paper is concluded in Section IV.


    II. ICA-BASED BACKGROUND SUBTRACTION

A. Basic ICA Model

In the basic ICA model [19], [20], the observed mixture signals x can be expressed as

x = A s

where A is an unknown mixing matrix, and s represents the latent source signals, meaning that they cannot be directly observed from the mixture signals x. The ICA model describes how the observed mixture signals are generated by a process that uses the mixing matrix A to mix the latent source signals s. The source signals are assumed to be mutually statistically independent. Based on this assumption, the ICA solution is obtained in an unsupervised learning process that finds a de-mixing matrix W. The matrix W is used to transform the observed mixture signals to yield the independent signals, i.e., y = W x. The independent signals y are used as the estimates of the latent source signals s. The components of y, called independent components, are required to be as mutually independent as possible.
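The mixing and de-mixing relation can be illustrated with a toy example. Here the 2 × 2 mixing matrix A is assumed known so that W = A⁻¹ recovers the sources exactly; in practice A is unknown and W must be estimated by an ICA algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two independent, non-Gaussian latent sources s, mixed by a matrix A.
s = rng.uniform(-1.0, 1.0, size=(2, 1000))      # latent source signals
A = np.array([[0.7, 0.3],                       # mixing matrix (unknown in practice)
              [0.4, 0.6]])
x = A @ s                                       # observed mixture signals x = A s

# If ICA recovers W = A^{-1} (up to scale and permutation), y = W x restores s.
W = np.linalg.inv(A)
y = W @ x
print(np.allclose(y, s))
```

ICA can only recover the sources up to scaling and permutation of the rows, which is why, later in the paper, the row of W belonging to the foreground must be identified by inspection of its sign pattern.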

The objective of the algorithm for an ICA model is to maximize the statistical independence (non-Gaussianity) of the ICs. The non-Gaussianity of the ICs can be measured by the negentropy [19], which is given by

J(y) = H(y_gauss) − H(y)        (2)

where y_gauss is a Gaussian random vector with the same covariance matrix as y, and H(·) is the entropy of a random vector.

The negentropy is always non-negative and is zero if and only if y has a Gaussian distribution. It is well justified as an estimator of the non-Gaussianity of the ICs. Since computing the negentropy directly is very difficult, an approximation of negentropy is used as follows:

J(y) ≈ [E{G(y)} − E{G(ν)}]²        (3)

where ν is a Gaussian variable of zero mean and unit variance, and G is a nonquadratic function, generally given by G(u) = (1/a₁) log cosh(a₁u) or G(u) = −exp(−u²/2), with 1 ≤ a₁ ≤ 2. The FastICA algorithm [21], [22] has been one of the most popular optimization methods in ICA; it is based on a fixed-point iteration scheme for finding a maximum of the negentropy.
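The approximation in (3) can be sketched numerically with the common log-cosh contrast (a₁ = 1). A unit-variance uniform signal, being non-Gaussian, scores a visibly larger negentropy estimate than a Gaussian signal; the function name and sample sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def negentropy_approx(y):
    """Approximation (3): J(y) ~ (E[G(y)] - E[G(nu)])^2, with the contrast
    G(u) = log cosh(u) and nu a fresh standard Gaussian sample.
    Sketch only; y should be zero-mean and unit-variance."""
    nu = rng.standard_normal(y.size)
    G = lambda u: np.log(np.cosh(u))
    return float((G(y).mean() - G(nu).mean()) ** 2)

n = 200_000
gauss = rng.standard_normal(n)                      # Gaussian signal: J ~ 0
unif = rng.uniform(-np.sqrt(3), np.sqrt(3), n)      # non-Gaussian, unit variance

j_gauss = negentropy_approx(gauss)
j_unif = negentropy_approx(unif)
print(j_unif > j_gauss)
```

Maximizing this contrast drives FastICA toward fully independent components, which is precisely what the proposed method avoids doing to the end, as discussed in Section II-C.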

B. Proposed ICA Model

In order to separate foreground objects from the background in a scene image, we need at least two sample images to construct the mixture signals in the ICA model. Let the sample images be of size M1 × M2. Each sample image is organized as a row vector of N dimensions, where N = M1 · M2. Denote by x1 the reference background image containing no foreground objects, and by x2 the foreground image containing an arbitrary foreground object in the stationary background. In the training stage of the proposed background subtraction scheme, the ICA model is given by

X = A S

where

X = [x1; x2]

is the 2 × N data matrix. ICA aims to find a 2 × 2 de-mixing matrix W of the data matrix X such that one of the two row vectors in the estimated source signals Y = W X will contain only the foreground object in a uniform region, without the detailed contents of the background. The 1 × 2 row vector associated with the source signal of the foreground object in the de-mixing matrix W is then used in the detection stage to separate foreground objects in each scene image of the video sequence.

The two sample images x1 and x2 in the mixture data matrix X can be highly correlated since both contain a large portion of the same stationary background region. The conventional FastICA algorithm cannot effectively separate source signals which are correlated in nature. In this study, statistical independence is directly employed to form the objective of the ICA model, i.e., if two random variables y1 and y2 are mutually independent, the joint PDF can be factorized as

f(y1, y2) = f1(y1) · f2(y2)

where fi(yi) is the marginal PDF of source signal yi, i = 1, 2. Therefore, the objective of the proposed ICA model for background subtraction is to minimize the absolute difference between the joint PDF and the product of marginal PDFs. That is,

minimize | f(y1, y2) − f1(y1) · f2(y2) |        (4)

The difficulty of applying the definition of statistical independence in (4) is that we have no prior knowledge of the joint and marginal PDFs. Since the ICA model applied to background subtraction in this study involves only two source signals of foreground and background, the combinations of (y1, y2) fall within a manageable number. The joint and marginal PDFs are simply estimated from the relative frequencies of the discrete observed data. We start with the construction of a histogram for each individual source signal (i.e., marginal PDF). The range of signal values covered by the data is initially broken up into k disjoint adjacent intervals, each of the same width. Sturges' rule [23] is used as a guideline for choosing the initial number of intervals.

Given k intervals in each of the two marginal density distributions, the number of 2-D grids in the joint density distribution becomes as large as k². When most of the grids have zero entries in the joint density distribution, the estimation of the joint PDF becomes very unstable and leads to false separation of source signals. In order to prevent zero probabilities in most of the grids in the joint PDF, each interval with zero entries in the marginal density distribution is merged with its neighboring intervals. The interval merging process is repeated until all resulting intervals of each marginal density distribution have sufficient entries. In this study, we use 20 entries as the threshold. By the end of the merging, the intervals in an individual marginal density distribution may have different widths. The resulting interval widths used for estimating the marginal PDFs are also used to estimate the joint PDF.

Let R(i, j) be the j-th merged interval of source signal yi, j = 1, 2, ..., ki, where ki is the resulting number of intervals for source signal yi. Also let n(i, j) be the resulting number of entries in the interval R(i, j). The marginal PDF of source signal yi at interval R(i, j) is estimated by the relative frequency

fi(R(i, j)) = n(i, j) / N

The joint PDF of y1 and y2 at grid (R(1, j), R(2, l)) is estimated by

f(R(1, j), R(2, l)) = n(j, l) / N

where n(j, l) is the number of entries falling in the 2-D grid (R(1, j), R(2, l)), and R(2, l) is the l-th merged interval of source signal y2, l = 1, 2, ..., k2.
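The relative-frequency estimates and the independence measure they feed can be sketched together. This simplified version uses fixed equal-width bins and skips the zero-entry merging described above; an independent pair of signals scores a much smaller objective than a strongly dependent pair:

```python
import numpy as np

def independence_objective(y1, y2, bins=8):
    """Sum over the grid of |joint PDF - product of marginal PDFs|, with all
    PDFs estimated as relative frequencies (histograms). Simplified sketch:
    fixed equal-width bins, no sparse-interval merging."""
    joint, _, _ = np.histogram2d(y1, y2, bins=bins)
    joint = joint / joint.sum()                 # joint relative frequencies
    p1 = joint.sum(axis=1)                      # marginal of y1
    p2 = joint.sum(axis=0)                      # marginal of y2
    return float(np.abs(joint - np.outer(p1, p2)).sum())

rng = np.random.default_rng(2)
a = rng.uniform(size=10_000)
b = rng.uniform(size=10_000)

obj_indep = independence_objective(a, b)            # independent pair
obj_dep = independence_objective(a, a + 0.01 * b)   # strongly dependent pair
print(obj_indep < obj_dep)
```

Minimizing this quantity over the de-mixing matrix is exactly the objective formalized in (5) below.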

For a given de-mixing matrix W, the source signals y1 and y2 are estimated by

[y1; y2] = W [x1; x2]

and their joint and marginal PDFs can be estimated accordingly from the relative frequency distributions. The objective of the proposed ICA model is, therefore, reformulated as

minimize E(W) = Σj Σl | f(R(1, j), R(2, l)) − f1(R(1, j)) · f2(R(2, l)) |        (5)

The objective function of the ICA model defined in (5) is not differentiable. In this study, we therefore propose a stochastic optimization procedure based on the particle swarm optimization (PSO) algorithm to determine the de-mixing matrix W that minimizes the objective E(W).

C. PSO Procedure

Particle swarm optimization is an evolutionary computation technique originally developed by Kennedy and Eberhart [24]. PSO resembles the social interaction of a flock of flying birds. Individuals in the flock evolve by cooperation and competition among themselves through generations [25]. Each individual, called a particle, adjusts its flying according to its own flying experience and its companions' flying experience. Each particle, with its current position, represents a potential solution to the problem at hand. In this study, each particle is treated as a point in a 4-D space since the de-mixing matrix W is of size 2 × 2.

In PSO, a number of particles, which simulate a flock of flying birds, are simultaneously used to find the best fitness in the search space. At each iteration, every particle keeps track of its personal best position by dynamically adjusting its flying velocity. The new velocity is evaluated from its current velocity and the distances of its current position with respect to its previous best local position and the global best position. After a sufficient number of iterations, the particles will eventually cluster around the neighborhood of the fittest solution.

Let the i-th particle be represented as Xi = (xi1, xi2, xi3, xi4). The best previous position, i.e., the position giving the best fitness value, of particle i is recorded and represented as Pi = (pi1, pi2, pi3, pi4). Assume there are a total of m particles in the PSO. Let Pg = (pg1, pg2, pg3, pg4) denote the best particle, i.e., the one that gives the current optimal fitness value among all the particles in the population. The rate of position change, i.e., the velocity, of particle i is represented by Vi = (vi1, vi2, vi3, vi4). Each particle moves over the search space with a velocity dynamically adjusted according to its historical behavior and its companions'. The velocity and position of a particle are updated according to the following equations [24]:

vid = vid + c1 · r1 · (pid − xid) + c2 · r2 · (pgd − xid)        (6a)

xid = xid + vid        (6b)

where r1 and r2 are two random numbers in the range between 0 and 1, and c1 and c2 are two positive constants.

The two positive constants c1 and c2 are the weights used to regulate the acceleration of self-cognition (the local best) and social interaction (the global best). Kennedy and Eberhart [24], and Shi and Eberhart [25], suggested c1 = c2 = 2, since these values make the weights of the cognition and social parts equal to 1 on average. They are also the values adopted in this study. The velocity equation in (6a) calculates the particle's new velocity in each dimension according to its previous velocity and the distances of its current position from its own best experience (i.e., position) and the group's best experience. The particle then moves toward a new position according to the position update equation in (6b). The performance of each particle is measured by the fitness value, i.e., the objective function value in (5). After all particles are evaluated, the global best position Pg is updated to the personal best position with the minimum fitness value among all particles.

The procedure is repeated for a sufficient number of iterations, and the final global best position is used as the de-mixing matrix W, which will be used in the detection stage for separating foreground objects from the background.
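The PSO loop of (6a)/(6b) can be sketched over a 4-D search space. The velocity clamp below is a stabilization added for this sketch (it is not part of (6a)), and the quadratic toy fitness merely stands in for objective (5), in which a particle would encode the 2 × 2 de-mixing matrix:

```python
import numpy as np

rng = np.random.default_rng(3)

def pso_minimize(fitness, dim=4, n_particles=20, iters=200, c1=2.0, c2=2.0):
    """Particle swarm search (sketch of update rules (6a)/(6b))."""
    x = rng.uniform(-1, 1, (n_particles, dim))      # particle positions
    v = np.zeros((n_particles, dim))                # particle velocities
    pbest = x.copy()                                # personal best positions
    pbest_val = np.apply_along_axis(fitness, 1, x)
    gbest = pbest[pbest_val.argmin()].copy()        # global best position
    for _ in range(iters):
        r1 = rng.uniform(size=(n_particles, dim))
        r2 = rng.uniform(size=(n_particles, dim))
        v = v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)   # (6a)
        v = np.clip(v, -0.5, 0.5)       # clamp for stability (not in (6a))
        x = x + v                                               # (6b)
        vals = np.apply_along_axis(fitness, 1, x)
        better = vals < pbest_val
        pbest[better], pbest_val[better] = x[better], vals[better]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest, float(fitness(gbest))

# Toy stand-in fitness: squared distance to a known optimum at (1, 2, 3, 4).
target = np.array([1.0, 2.0, 3.0, 4.0])
best, val = pso_minimize(lambda p: float(np.sum((p - target) ** 2)))
print(val)
```

With 20 particles and 200 iterations, matching the parameter settings reported in Section III, the swarm settles near the optimum of this smooth toy objective.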

Experiments have shown that the PSO search process converges after about 150 iterations for the numerous test samples evaluated in this study. Therefore, 200 iterations are considered to be sufficient for our problem. Given the objective of minimizing the difference between the joint PDF and the product of marginal PDFs, the PSO algorithm with a moderate number of iterations searches for an intermediate de-mixing matrix that reduces the fitness value, but not a de-mixing matrix that results in a zero fitness value, i.e., two completely independent source signals. Therefore, the proposed ICA model with the PSO search procedure can recover two highly correlated source signals. The fixed-point algorithm for the negentropy model, in contrast, converges quickly and directly finds a de-mixing matrix that separates two mixture signals into two independent signals, even though they are highly correlated in nature. Note that when the proposed ICA model is solved by the PSO algorithm with an extremely large number of iterations (e.g., 5000 or more), the estimated source signals become similar to those of the FastICA algorithm.

D. Foreground Detection

To demonstrate the proposed ICA model for background subtraction, Fig. 1(a) and (b) presents a reference background image and a scene image containing a foreground object, respectively. The correlation coefficient between the two images is 0.90. Each 2-D image of size M1 × M2 is first reshaped into an N-dimensional row vector by a raster scan of its pixels. The de-mixing matrix W obtained from the PSO search of the proposed ICA model then gives the estimated source signals

[y1; y2] = W [x1; x2]

The resulting source signals y1 and y2 in vector form are converted back to 2-D images Y1(x, y) and Y2(x, y) by the inverse of the raster-scan mapping.
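The reshaping round trip described above is a simple row-major raster scan (the ordering is an assumption for this sketch; any fixed scan order works as long as the inverse mapping matches):

```python
import numpy as np

# Reshape a 2-D image to a 1 x N row vector and back, as used to build the
# 2 x N data matrix X for the ICA model.
M1, M2 = 150, 200
img = np.arange(M1 * M2, dtype=float).reshape(M1, M2)   # stand-in image

vec = img.reshape(1, M1 * M2)      # image -> row vector, N = M1 * M2
back = vec.reshape(M1, M2)         # row vector -> image (inverse mapping)
print(bool(np.array_equal(img, back)))
```

Stacking the background vector and the scene vector row-wise then yields the 2 × N matrix on which the 2 × 2 de-mixing matrix operates.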

In order to visualize the resulting values of Yi in a gray-level image, Yi is transformed by

Gi(x, y) = μb + c · σb · Yi(x, y)        (7)

where μb and σb are the mean gray value and standard deviation of the reference background image, and c is a conversion constant that sets the range of the converted gray values; it is given by c = 0.5 in this study.

To segment foreground objects in an estimated source image Gi (for some i = 1 or 2), an adaptive threshold is used to binarize the image. The threshold for segmentation in image Gi is given by the simple statistical control limits

μi − k · σi ≤ Gi(x, y) ≤ μi + k · σi        (8)

where μi and σi are the mean and standard deviation of image Gi, and k is the control constant, which is given by 1.5 in this study. If Gi(x, y) falls within the control limits in (8),

Fig. 1. Foreground and background separation from the proposed ICA model (training stage): (a) reference background image; (b) scene image containing a walking person; (c) separated background image; (d) separated foreground image; (e) resulting binary image of (d).

then pixel (x, y) is a background point and is shown in white in the binary image. Otherwise, it is classified as a foreground point and shown in black in the segmented image. Fig. 1(c) and (d) depicts the estimated source images, which are respectively derived from the two row vectors of the trained de-mixing matrix W. It is apparent from the separation results that one de-mixing vector corresponds to the foreground object, and the other is associated with the reference background. The estimated source image in Fig. 1(d) clearly shows the foreground object without the details of the reference image. The foreground object can, therefore, be segmented easily in the estimated source image with simple thresholding, as seen in Fig. 1(e).
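The control-limit binarization of (8) can be sketched directly. The synthetic image below (a flat background with a small bright blob) and the function name are illustrative; k = 1.5 follows the paper:

```python
import numpy as np

def segment_foreground(src_img, k=1.5):
    """Binarize an estimated source image with the control limits of (8):
    pixels within mean +/- k*std are background (white, 255); the rest are
    foreground (black, 0). Illustrative sketch with k = 1.5 as in the paper."""
    mu, sigma = src_img.mean(), src_img.std()
    background = (src_img >= mu - k * sigma) & (src_img <= mu + k * sigma)
    return np.where(background, 255, 0).astype(np.uint8)

# Mostly flat "separated" image with a small bright foreground blob.
img = np.full((50, 50), 128.0) + np.random.default_rng(4).normal(0, 2, (50, 50))
img[20:25, 20:25] += 80.0
binary = segment_foreground(img)
print(int((binary == 0).sum()))     # number of detected foreground pixels
```

Because the separated source image is nearly uniform outside the foreground, its standard deviation stays small and even modest foreground responses fall outside the control limits.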

The de-mixing vector corresponding to foreground objects is then used in the detection stage to separate foreground objects in each scene image of a video sequence. The resulting de-mixing vector used for foreground segmentation must have two elements with opposite signs. The de-mixing vector associated with the reference background is discarded from further consideration in the detection stage.

In the detection stage, the reference background image is retained. A scene image at each time frame in the video sequence, along with the background image, is used to construct the data matrix. Given the de-mixing vector corresponding to foreground objects from the training stage, the sole estimated source image is given by

Once the source image is obtained, it can be converted to a gray-level image using (7) for visual display, and thresholded to a binary image using (8). The de-mixing vector is obtained from the training stage, and the ICA modeling and optimization are not required in the detection stage. It is, therefore, computationally very fast to derive the estimated source image, and to detect foreground objects, with or without motion, in the scene image.

The 2-D de-mixing vector with opposite-sign elements can be treated as a convolution filter for image difference. It assigns one weight to the 2-D reference background image, and the other weight to the 2-D scene image. The response amplitude indicates the presence or absence of foreground objects in the scene. If the magnitude of the weighted difference is distinctly large, then there is strong evidence that the pixel is a foreground point.
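As a concrete sketch of this filtering view, the snippet below applies a hypothetical de-mixing vector w = (0.6, −0.4) as a pixel-wise weighted difference between the reference background and a scene image. The vector value is illustrative only; the actual values come from the PSO-trained ICA model:

```python
import numpy as np

def weighted_difference(background, scene, w=(0.6, -0.4)):
    """Treat a 1x2 de-mixing vector with opposite-sign elements as a
    per-pixel filter: one weight on the reference background, the other
    on the scene. A large response magnitude suggests a foreground pixel.
    The default w is hypothetical, not the trained vector from the paper."""
    response = w[0] * background.astype(float) + w[1] * scene.astype(float)
    return np.abs(response)
```

For a uniform background of gray level 100 and an identical scene except for one bright pixel of 250, the response at the changed pixel is |0.6·100 − 0.4·250| = 40, versus |0.6·100 − 0.4·100| = 20 elsewhere.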

III. EXPERIMENTAL RESULTS

This section presents the experimental results from two sets of

indoor image sequences to evaluate the performance of the proposed ICA-based background subtraction method. The test images in the experiments were 150 × 200 pixels with 8-bit gray levels. In the PSO search algorithm, the required parameters were set up as follows: the number of particles was 20, the weights were fixed, and the maximum number of iterations was 200. In the detection stage, the constant in (7) for converting ICA signal values to gray values was set at 0.5, and the control constant in (8) for binary thresholding was given by 1.5 for all scene images. The implementation of the proposed method runs at a very fast speed of 142 fps (frames per second) for 150 × 200 gray-level images on a personal computer with an AMD 64-bit 1.8-GHz processor. In the experiments, we also compare the performance of the proposed method with the FastICA algorithm and the simple image difference method. For a fair comparison, the same parameter values in (7) and (8) were applied for the FastICA and image difference methods as well. To show the primary performance of the proposed method, postprocessing such as noise cleaning and connected component analysis was not applied to the segmented images.

A. Single Reference Background

The first test sample is a scene room with a large window.

The room has three sets of fluorescent lights on the ceiling. It is not uncommon in daily life for a room to have all lights on, all lights off, or only some lights on. The illumination in the room is, therefore, affected by indoor lights and by outdoor sunlight through the window, and the illumination can change gradually or suddenly. Fig. 2(a) and (b) shows, respectively, a reference background image of the room, and a scene image containing a walking person, where all three sets of fluorescent lights were turned on. Note that the window region is brighter than the remaining area in the image. Fig. 2(c) and (d) shows the estimated source images from the respective de-mixing vectors. It is visibly shown in Fig. 2(d) that one vector corresponds to the foreground, and that its separated image distinctly preserves the foreground object without the detailed contents of the reference background.

In the detection stage of the first test sample, the example came from a scenario in which a person walked in the room while all three sets of lights were originally turned on; then one set of the ceiling lights was switched off for a period of time, and finally two sets of lights were turned off. The gray-level images shown in the first column (a) of Fig. 3 display the original sequence at varying frames. The video images were taken at 5 fps. The symbol marked in the figure represents the frame number in the image sequence.

Prior to frame 25, all three sets of lights were turned on. Images between frames 65 and 88 were the results

Fig. 2. Experimental results on a room with a large window (training stage): (a) reference background image; (b) scene image containing a person whose clothes have intensities similar to the background wall; (c) separated background image; (d) separated foreground image.

Fig. 3. Experimental results on a room with a large window (detection stage): The left-most column (a) shows the discrete scene images in a video sequence taken at 5 fps. The symbol indicates the frame number in the sequence. Columns 2, 3, and 4 (b)–(d) give the detected foregrounds from the proposed method, FastICA, and the image difference method, respectively.

of switching off one set of lights. After frame 153, two sets of lights were turned off. Note that the two images at frames 65 and 153 do not present any foreground objects, and they were taken under dramatic changes in scene lighting. The second and third columns (b) and (c) of Fig. 3 are the separation results, shown


as gray-level images using (7), from the proposed method and the FastICA algorithm, respectively. The fourth column (d) of Fig. 3 shows the difference values, also displayed as gray-level images using (7), from the simple image difference method.

Given the training sample images in Fig. 2(a) and (b), the FastICA algorithm generates a de-mixing vector that is very close to the filter used by the simple image difference method. From the second column (b) of Fig. 3, it is apparent that the proposed method can distinctly enhance the walking person and remove the background details in the separated images, even with large changes in illumination. The FastICA algorithm and the image difference method only work well under fixed lighting. When the room lighting is distinctly different from that in the reference background image, both the moving person and the background details are presented in the resulting images, as seen in the third and fourth columns of Fig. 3.

Fig. 4 shows the binarization results from the images in Fig. 3. The first column of Fig. 3 is repeated in the first column (a) of Fig. 4 for visual evaluation of the segmentation results. The second, third, and fourth columns (b)–(d) of Fig. 4 are the thresholding results from the proposed method, the FastICA algorithm, and the image difference method, respectively. It can be seen from the binary images that the FastICA and image difference methods detect the bright window as a foreground object, and the moving person cannot be identified as a foreground object when she passed through the window. The proposed method can effectively detect the walking person even under distinct changes in room lighting.

B. Multiple Reference Backgrounds

In indoor surveillance, it is quite common to have a door

widely open, completely closed, or left half open in a scene room. A person may also be opening or closing a door when he/she walks into or out of the room. One may encounter the same situation when opening or closing a curtain across the window. A door or a curtain is a nonstationary background object, and should not be identified as a foreground object in background subtraction. The problem of such nonstationary background objects can be addressed with multiple reference backgrounds. The second indoor surveillance test sample involves a scene room with the opening and closing of a door. To apply the proposed method for background subtraction, we first select two reference background scenes, one having the door widely open and the other having the door fully closed. We then individually train two ICA models using the two reference background images, and obtain two de-mixing vectors that can be used to separate foreground objects in the image. In the detection stage, a scene image in the video sequence is individually convolved with the two vectors by

    and

Let the respective binary images of the two responses be defined such that a foreground pixel and a background pixel are, respectively, given by the values of 1 and 0. Given a scene image containing a half-open door, or an opening/closing operation of

Fig. 4. Thresholding results of the separated foreground images in Fig. 3: (a) original scene images; (b)–(d) binary results of the recovered images in Fig. 3(b)–(d), respectively.

the door, we expect that the pixels of the nonstationary door will fall within either the reference background with a widely open door, or the one with a fully closed door. A true foreground pixel should be present in both binary images. Therefore, foreground objects are segmented by the intersection of the two binary images, i.e.,
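The intersection rule can be sketched directly; here the two binary maps use 1 for foreground and 0 for background, as defined above, and the function name is an assumption:

```python
import numpy as np

def fuse_foregrounds(f1, f2):
    """Pixel-wise intersection of two binary foreground maps obtained from
    the two reference backgrounds (door open / door closed). A pixel is
    kept as foreground only if both maps flag it, so the nonstationary
    door, which matches one of the references, is suppressed."""
    return np.logical_and(f1 == 1, f2 == 1).astype(np.uint8)
```

The same fusion generalizes to more than two reference backgrounds by intersecting all of the per-reference maps.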

Fig. 5 shows two sets of mixture data used for training the ICA models. The first set in Fig. 5(a1) and (b1) involves the background scene with the door completely closed, and the second set in Fig. 5(c1) and (d1) contains the background scene with the door widely open. Fig. 6(a1)–(c1) shows three scene images for foreground segmentation, in which (a1) contains a person walking through a widely open door, (b1) involves a man passing through a half-open door, and (c1) is the background scene with a half-open door. Fig. 6(a2)–(c2) gives the binary results from the first reference background, and Fig. 6(a3)–(c3) shows the binary results from the second reference background. Fig. 6(a4)–(c4) shows the intersection results. It can be observed from Fig. 6 that


Fig. 5. Training results on a living room with a door opening/closing: (a1) reference background image with the door completely closed; (b1) foreground scene image with respect to (a1); (c1) reference background image with the door widely open; (d1) foreground scene image with respect to (c1); (a2)–(d2) separated backgrounds and foregrounds of (a1)–(d1).

Fig. 6. Detection results of the door opening/closing scenario: (a1)–(c1) scene images; (a2)–(c2) resulting binary images from the reference background in Fig. 5(a1); (a3)–(c3) resulting binary images from the reference background in Fig. 5(c1); (a4)–(c4) intersection results.

the foreground object can be reliably segmented without presenting the nonstationary door as a foreground region.

C. Effect of Varying Foreground Objects

In this subsection, we evaluate the effect of object sizes and

intensity contrast on foreground segmentation. Fig. 7(a) and (b) shows the reference background and the foreground image used in the training process. The foreground image contains a small girl wearing white clothes similar in intensity to the background. The resulting de-mixing vector is similar to those obtained earlier. Fig. 7(c)–(e) displays the detection results of the same girl walking toward the still camera. The object sizes in the scene images are distinctly enlarged as the object gets closer to the camera. Fig. 7(f)–(h) further demonstrates the detection results of a man wearing black clothes and walking away from the camera. The intensity of the foreground object in the scene image is distinctly different from that in the training image. The proposed method also reliably detects this moving object.

In the experiments, we have also evaluated a set of reverse test images, where the man wearing black clothes is used as the

Fig. 7. Detection results of varying foreground objects: (a) reference background image; (b) training scene image; (c1)–(h1) test scene images; (c2)–(h2) detected foreground images of (c1)–(h1); (c3)–(h3) segmented foreground objects of (c2)–(h2), respectively.

training foreground image and the girl wearing white clothes appears in the scene images. The resulting de-mixing vector is similar, and results comparable to those in Fig. 7 are obtained. The detection results reveal that the proposed ICA model with the PSO search generates similar de-mixing vectors, and that these de-mixing vectors can be reliably used to detect various foreground objects, regardless of changes in object sizes and object intensities.

In addition, there is an interesting finding in the detection results in Fig. 7. When the foreground object is relatively small in the scene image, the proposed method is sensitive to object shadows, as seen in Fig. 7(c3) and (h3). As the object size in the scene image gets larger, the shadow effect becomes insignificant, as seen in Fig. 7(d3) and (g3).

D. Analysis of the De-Mixing Vector

As aforementioned, the 2-D de-mixing vector with opposite-sign elements from the proposed ICA model can be used as a convolution filter for the reference background and the


scene image. The conventional image difference method can also be treated as a convolution filter. Foreground objects generally have more distinct gray levels with respect to the background. Both the image difference method and the proposed method can detect foreground objects well in an image sequence, but the image difference method is more sensitive to illumination changes. This is because its filter results in a more distinct gray-level difference, compared to that of the proposed filter. In the experiments, we have evaluated numerous test sample images with varying reference backgrounds. The proposed ICA model with the PSO search consistently converges to a similar de-mixing vector. Therefore, this vector can be treated as an optimal filter for background subtraction.

Consider the gray levels at the same pixel coordinates of the reference background image and the scene image at a given frame. If

    (9)

then the response amplitude from the simple difference filter is distinctly larger than that from the proposed filter and, therefore, is more sensitive to noise and illumination changes. The filter obtained from the proposed ICA model indicates that if the gray-level ratio [the left-hand side of (9)] is less than 0.665 [the right-hand side of (9)], the simple difference filter will generate a more significant amount of gray-level difference. Since most illumination changes in a scene background have a gray-level ratio smaller than 0.665, i.e., a larger gray-level difference, the simple difference filter will detect both foreground objects and regions with significant illumination changes as foreground. This is why the bright window is intensified in Fig. 3(d) and detected as a foreground object in Fig. 4(d) when the simple image difference method is applied for foreground segmentation.
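The sensitivity argument can be checked numerically. The snippet below uses a hypothetical de-mixing vector (0.6, −0.4), chosen only because its element ratio 0.4/0.6 ≈ 0.665 matches the ratio quoted above. For a pure illumination drop that darkens a background pixel to half its value (ratio 0.5 < 0.665), the (1, −1) difference filter responds more strongly than the weighted filter, while for a milder drop (ratio 0.8 > 0.665) it responds less:

```python
# Hypothetical de-mixing weights; ratio 0.4/0.6 ~= 0.665 as in the text.
W1, W2 = 0.6, 0.4

def responses(b, x):
    """Compare the (1,-1) image-difference response with the weighted
    de-mixing response for background gray level b and scene gray level x."""
    diff = abs(b - x)           # simple image difference filter
    ica = abs(W1 * b - W2 * x)  # proposed weighted-difference filter
    return diff, ica

d_low, i_low = responses(100.0, 50.0)    # ratio 0.5 < 0.665
d_high, i_high = responses(100.0, 80.0)  # ratio 0.8 > 0.665
```

Below the 0.665 crossover the plain difference dominates, which is consistent with the bright-window false detections noted above.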

IV. CONCLUSION

Background subtraction is a widely used approach for detecting foreground objects in videos from a static camera. In indoor surveillance applications such as home-care and health-care monitoring, a motionless person should not become part of the background. A reference background image without moving objects is, therefore, required for such applications. In this paper, we have presented an ICA-based background subtraction scheme for foreground segmentation. The proposed ICA model is based on the direct measurement of statistical independence that minimizes the difference between the joint PDF and the product of marginal PDFs, in which the probabilities are simply estimated from the relative frequency distributions. The proposed ICA model performs the separation of highly correlated signals well.

The de-mixing vector derived from the proposed ICA model can be considered as an optimal filter for background subtraction under arbitrary reference backgrounds. Since the de-mixing vector is only of a small size of 1 × 2, its computation is as fast as the image difference method. Computation times in the detection stage of the proposed method are only 0.007 seconds (i.e., 142 fps) for a small 150 × 200 image, and 0.09 seconds (i.e., 11 fps) for a large 640 × 480 image on an AMD 64-bit 1.8-GHz personal computer.

As seen in the experimental results, a part of the person's body was missing due to foreground camouflage in gray-level images. In order to eliminate the effect of camouflage, color may provide more information to discriminate a foreground object from its surrounding background. An ICA model that separates vector-valued signals in RGB color images is worth further investigation.

REFERENCES

[1] W. Hu, T. Tan, L. Wang, and S. Maybank, "A survey on visual surveillance of object motion and behaviors," IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 34, no. 3, pp. 334–352, Aug. 2004.

[2] C. R. Wren, A. Azarbayejani, T. Darrell, and A. P. Pentland, "Pfinder: Real-time tracking of the human body," IEEE Trans. Pattern Anal. Mach. Intell., vol. 19, no. 7, pp. 780–785, Jul. 1997.

[3] D.-S. Lee, "Online adaptive Gaussian mixture learning for video applications," in Proc. ECCV Workshop SMVP, Prague, Czech Republic, May 2004, pp. 105–116.

[4] T. Olson and F. Brill, "Moving object detection and event recognition algorithm for smart cameras," in Proc. DARPA Image Understanding Workshop, 1997, pp. 159–175.

[5] C. Eveland, K. Konolige, and R. C. Bolles, "Background modeling for segmentation of video-rate stereo sequences," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 1998, pp. 266–271.

[6] T. Kanade, R. Collins, A. Lipton, P. Burt, and L. Wixson, "Advances in cooperative multi-sensor video surveillance," in Proc. DARPA Image Understanding Workshop, Nov. 1998, vol. 1, pp. 3–24.

[7] A. Cavallaro and T. Ebrahimi, "Video object extraction based on adaptive background and statistical change detection," in Proc. SPIE Visual Communications and Image Processing, Jan. 2001, pp. 465–475.

[8] C. Stauffer and W. E. L. Grimson, "Adaptive background mixture models for real-time tracking," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 1999, vol. 2, pp. 246–252.

[9] C. Stauffer and W. E. L. Grimson, "Learning patterns of activity using real-time tracking," IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 8, pp. 747–757, Aug. 2000.

[10] P. KaewTraKulPong and R. Bowden, "An improved adaptive background mixture model for real-time tracking with shadow detection," in Proc. 2nd European Workshop on Advanced Video Based Surveillance Systems, Kingston, U.K., Sep. 2001, pp. 149–158.

[11] D.-S. Lee, "Effective Gaussian mixture learning for video background subtraction," IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 5, pp. 827–832, May 2005.

[12] L. Li, W. Huang, I. Y.-H. Gu, and Q. Tian, "Statistical modeling of complex backgrounds for foreground object detection," IEEE Trans. Image Process., vol. 13, no. 11, pp. 1459–1472, Nov. 2004.

[13] A. Elgammal, R. Duraiswami, D. Harwood, and L. S. Davis, "Background and foreground modeling using nonparametric kernel density estimation for visual surveillance," Proc. IEEE, vol. 90, no. 7, pp. 1151–1163, Jul. 2002.

[14] A. Elgammal, R. Duraiswami, and L. Davis, "Efficient kernel density estimation using the fast Gauss transform with applications to color modeling and tracking," IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no. 11, pp. 1499–1504, Nov. 2003.

[15] L. Greengard and J. Strain, "The fast Gauss transform," SIAM J. Sci. Statist. Comput., vol. 12, no. 1, pp. 79–94, 1991.

[16] C. Yang, R. Duraiswami, N. A. Gumerov, and L. Davis, "Improved fast Gauss transform and efficient kernel density estimation," in Proc. IEEE Int. Conf. Computer Vision, Nice, France, Oct. 2003, pp. 464–471.

[17] C. Ianasi, V. Gui, C. I. Toma, and D. Pescaru, "A fast algorithm for background tracking in video surveillance, using nonparametric kernel density estimation," Facta Universitatis Ser.: Elect. Energ., vol. 18, no. 1, pp. 127–144, Apr. 2005.

[18] T. W. Lee, Independent Component Analysis: Theory and Applications. Boston, MA: Kluwer, 1998.

[19] A. Hyvarinen, J. Karhunen, and E. Oja, Independent Component Analysis. New York: Wiley, 2001.

[20] A. Hyvarinen and E. Oja, "Independent component analysis: Algorithms and applications," Neural Netw., vol. 13, pp. 411–430, 2000.

[21] A. Hyvarinen and E. Oja, "A fast fixed-point algorithm for independent component analysis," Neural Comput., vol. 9, pp. 1483–1492, 1997.

[22] J. Hurri, H. Gavert, J. Sarela, and A. Hyvarinen, FastICA Package, 2004 [Online]. Available: http://www.cis.hut.fi/projects/ica/fastica/

[23] D. C. Hoaglin, F. Mosteller, and J. W. Tukey, Understanding Robust and Exploratory Data Analysis. New York: Wiley, 1983, pp. 23–24.

[24] J. Kennedy and R. Eberhart, "Particle swarm optimization," in Proc. IEEE Int. Conf. Neural Networks, Perth, Australia, 1995, pp. 1942–1948.

[25] Y. Shi and R. Eberhart, "A modified particle swarm optimizer," in Proc. IEEE Int. Conf. Evolutionary Computation, Anchorage, AK, 1998, pp. 69–73.

Du-Ming Tsai received the B.S. degree in industrial engineering from Tunghai University, Taiwan, R.O.C., in 1981, and the M.S. and Ph.D. degrees in industrial engineering from Iowa State University, Ames, in 1984 and 1987, respectively.

From 1988 to 1990, he was a Principal Engineer of Digital Equipment Corporation, Taiwan branch, where his work focused on process and automation research and development. Currently, he is a Professor of industrial engineering and management at Yuan-Ze University, Taiwan. His research interests include automated visual inspection, texture analysis, and visual surveillance.

Shia-Chih Lai received the B.S. degree in statistics from Tunghai University, Taiwan, R.O.C., in 2004, and the M.S. degree in industrial engineering and management from Yuan-Ze University, Taiwan, in 2006.

She is currently a researcher with Delta Electronics, Inc., Taiwan.