[IEEE 2005 IEEE International Joint Conference on Neural Networks, 2005. - Montreal, QC, Canada (July 31-Aug. 4, 2005)] Proceedings. 2005 IEEE International Joint Conference on Neural

Proceedings of International Joint Conference on Neural Networks, Montreal, Canada, July 31 - August 4, 2005

Skin Color Segmentation by Histogram-Based Neural Fuzzy Network

Chia-Feng Juang and Hwai-Sheng Perng
Department of Electrical Engineering, National Chung Hsing University, Taichung, 402 Taiwan, R.O.C.

Shin-Kuan Chen
Department of Electrical Engineering, Chung Chou Institute of Technology, Yuan-Lin, Taiwan, R.O.C.

Abstract- Skin color image segmentation by a histogram-based Self-cOnstructing Neural Fuzzy Inference Network (SONFIN) is proposed in this paper. Each color pixel is represented in the Hue-Saturation (HS) space. To represent a block color by a histogram as accurately as possible, a non-uniform quantization of the HS space is considered. Histogram information of HS from images taken under different environments is used to train SONFIN, to make the method as robust as possible. To verify the performance of the proposed method, an experiment on human-hand segmentation is performed. For comparison, other segmentation methods are applied to the same problem.

I. INTRODUCTION

Color is a visual feature that is immediately perceived when looking at an image, and it is an important feature that humans use to cluster desired objects. Color image segmentation has been extensively applied in pattern recognition, image analysis and computer vision. It is the process of dividing an image into different regions such that each region is homogeneous [1]. The color of an image can contain much more information than its gray level. Thus, in many pattern recognition and computer vision applications, the additional information provided by color can help the image analysis process and yield better results than approaches using only gray-scale information.

Based on color information, many techniques have been proposed, including principal component transformation (PCT) [2], fuzzy C-means (FCM) [3], neural networks [4], histogram-based classifiers [5]-[7], and mixture-of-Gaussians classifiers [8]. In histogram-based approaches, similarities of colors are computed according to their allocation to histogram bins. The potential of color histograms for color image indexing was demonstrated in [5]. In [6], the hand is segmented by matching patches of the image to a stored histogram using the histogram intersection (HI) algorithm. Since only one patch of hand histogram is used as reference and background patches are not considered, the approach may be very sensitive to the environment. In [7], a histogram is used to represent the probability density function in a likelihood ratio calculation. A major drawback of this approach is that a considerable amount of training data is required. In this paper, we apply histogram information to color segmentation, and to address the aforementioned problems of histogram-based methods, segmentation by a neural fuzzy network is proposed.

Two colors are considered totally different if they fall into two different bins, even though they might be very similar to each other. This makes color-histogram-based segmentation sensitive to noise. To improve segmentation robustness, segmentation by a histogram-based neural fuzzy network is proposed. In contrast to the PCT and FCM methods, where only one data sample is used, the neural fuzzy network can be trained on several sample patterns collected from different environments. In addition, to represent a block color by a histogram as accurately as possible, a non-uniform quantization approach is proposed in this paper in place of the usual uniform quantization of the color space.

The neural fuzzy network used in this paper is the self-constructing neural fuzzy inference network (SONFIN) [9], although other choices of neural networks or fuzzy systems are possible. The choice of this neural fuzzy network is based on several aspects. First, the SONFIN is a hybrid system of neural networks and fuzzy logic. With its fuzzy-inference-type network structure, the SONFIN can achieve higher learning accuracy than ordinary neural networks. Second, compared with existing neural fuzzy networks, the SONFIN can perform structure and parameter learning simultaneously, so that it can construct itself dynamically on-line.

The remaining sections of this paper are organized as follows. Section II describes the non-uniform color space division approach for histogram computation. Section III introduces SONFIN and how it is applied to color image segmentation. Section IV presents experiments on hand segmentation based on several color image segmentation methods, including HI, PCT, FCM, and histogram-based SONFIN, for skin-color segmentation, together with performance comparisons between these methods. Finally, conclusions are summarized in Section V.

0-7803-9048-2/05/$20.00 ©2005 IEEE

II. COLOR SPACE DIVISION

Color histograms are a traditional and highly efficient way to describe the color properties of images. A histogram is obtained by quantizing the image colors and counting how many pixels belong to each color. In this study, histograms based on the distribution over the H (Hue) and S (Saturation) components of the color model are used. Color quantization plays an important role in representing color content by histograms. Quantization must be fine enough that distant colors do not fall in the same bin. In contrast to the uniform partition, where each color channel is equally partitioned, quantization based on a non-uniform partition of the HS space is proposed in this paper. The main goal of the partition is that the image pixels should be equally distributed among the histogram bins.

For demonstration, suppose there are nine pixels. The partition process of the proposed approach is as follows. In each partition process, a horizontal partition is followed by a vertical partition, and every partition is applied to the bin with the maximum number of pixels. First, a horizontal partition divides that bin into two equally sized bins. Then, the bin with the maximum number of pixels is found and vertically partitioned into two equally sized bins. One partition process ends after the vertical partition is completed. Since each partition process adds two bins, after n partition processes there are a total of 2n + 1 bins. An illustration of the partition process is shown in Fig. 1, where $D_i$ denotes the ith partition.
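The partition procedure of this section can be illustrated with a short Python sketch. It is only one possible reading of the description, under the following assumptions that are not stated in the paper: a "horizontal" partition halves the H range and a "vertical" partition halves the S range, the fullest bin is searched among all current bins before each split, bins are half-open intervals, and H lies in [0, 360) while S lies in [0, 1). The function names are hypothetical.

```python
import numpy as np

def count_in_bin(h, s, b):
    """Number of pixels whose (h, s) falls in the half-open bin b."""
    h_lo, h_hi, s_lo, s_hi = b
    return int(np.sum((h >= h_lo) & (h < h_hi) & (s >= s_lo) & (s < s_hi)))

def split_fullest(h, s, bins, axis):
    """Halve, in place, the bin holding the most pixels along 'h' or 's'."""
    idx = max(range(len(bins)), key=lambda i: count_in_bin(h, s, bins[i]))
    h_lo, h_hi, s_lo, s_hi = bins.pop(idx)
    if axis == "h":  # assumed meaning of a "horizontal" partition
        mid = 0.5 * (h_lo + h_hi)
        bins += [(h_lo, mid, s_lo, s_hi), (mid, h_hi, s_lo, s_hi)]
    else:            # assumed meaning of a "vertical" partition
        mid = 0.5 * (s_lo + s_hi)
        bins += [(h_lo, h_hi, s_lo, mid), (h_lo, h_hi, mid, s_hi)]

def nonuniform_partition(h, s, n_processes,
                         h_range=(0.0, 360.0), s_range=(0.0, 1.0)):
    """Return bins (h_lo, h_hi, s_lo, s_hi) after n_processes partition processes."""
    bins = [(h_range[0], h_range[1], s_range[0], s_range[1])]
    for _ in range(n_processes):
        split_fullest(h, s, bins, "h")  # first split of the process
        split_fullest(h, s, bins, "s")  # second split of the process
    return bins
```

Because every partition process performs two splits, 24 processes applied to a single initial bin give 49 bins, which matches the bin count used in the experiments of Section IV.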

III. SEGMENTATION BY NEURAL FUZZY NETWORK

The neural fuzzy network used for color image segmentation is the self-constructing neural fuzzy inference network (SONFIN) [9]. A key feature of the SONFIN structure is that a high-dimensional fuzzy system is implemented with a small number of rules and fuzzy terms. This is achieved first by partitioning the input and output spaces into clusters efficiently through learning proper fuzzy terms for each input/output variable, and then by constructing fuzzy rules optimally through finding proper mappings between input and output clusters. There are no rules initially in the SONFIN. They are created and adapted as on-line learning proceeds via simultaneous structure and parameter learning. The number of generated rules and membership functions is small even for modeling a sophisticated system. The SONFIN always produces an economical network size, and its learning speed and modeling ability are superior to those of ordinary neural networks.

A. Structure and Learning of SONFIN

The structure of the SONFIN is shown in Fig. 2. This six-layered network realizes a fuzzy model of the following form:

Rule i: IF $x_1$ is $A_1^i$ and $\cdots$ and $x_n$ is $A_n^i$ THEN $y$ is $a_0^i + a_1^i x_1 + \cdots$

where $A_j^i$ is the fuzzy set of the ith linguistic term of input variable $x_j$, and $a_j^i$ is the consequent parameter. Let $u_i^{(k)}$ and $a^{(k)}$ denote, respectively, the ith input and the output of a node in layer k. We shall describe the functions of the nodes in each of the six layers of the SONFIN as follows.

Layer 1: No computation is done in this layer. Each node in this layer, which corresponds to one input variable, only transmits input values to the next layer directly. That is,

$$f = u_i^{(1)} \quad \text{and} \quad a^{(1)} = f. \tag{1}$$

From the above equation, the link weight in Layer 1 ($w_i^{(1)}$) is unity.

Layer 2: Each node in this layer corresponds to one linguistic label (small, large, etc.) of one of the input variables in Layer 1. In other words, the membership value, which specifies the degree to which an input value belongs to a fuzzy set, is calculated in Layer 2. With the choice of Gaussian membership function, the operation performed in this layer is

$$f(u_{ij}^{(2)}) = -\frac{\left[u_i^{(2)} - m_{ij}\right]^2}{\sigma_{ij}^2} \quad \text{and} \quad a^{(2)}(f) = e^{f}, \tag{2}$$

where $m_{ij}$ and $\sigma_{ij}$ are, respectively, the center (or mean) and the width (or variance) of the Gaussian membership function of the jth partition for the ith input variable $u_i$. Hence, the link weight in this layer can be interpreted as $m_{ij}$.

Layer 3: A node in this layer represents one fuzzy logic rule and performs precondition matching of a rule. Here we use the following AND operation for each Layer-3 node:

$$f(u_i^{(3)}) = \prod_{i=1}^{q} u_i^{(3)} \quad \text{and} \quad a^{(3)}(f) = f, \tag{3}$$

where q is the number of Layer-2 nodes participating in the IF part of the rule. The weights of the links in Layer 3 ($w_i^{(3)}$) have the value of one. The output of a Layer-3 node represents the firing strength of the corresponding fuzzy rule.

Layer 4: The number of nodes in this layer is equal to that in Layer 3, and the firing strength calculated in Layer 3 is normalized in this layer by

$$f(u_i^{(4)}) = \sum_{i=1}^{r} u_i^{(4)} \quad \text{and} \quad a^{(4)}(f) = \frac{u_i^{(4)}}{f}, \tag{4}$$

where r is the number of rule nodes in Layer 3. As in Layer 3, the link weight ($w_i^{(4)}$) in this layer is unity.

Layer 5: This layer is called the consequent layer. Two types of nodes are used in this layer, and they are denoted as blank and shaded circles in Fig. 2, respectively. The function of the blank node is

$$f = \sum_{i=1}^{s} u_i^{(5)} \quad \text{and} \quad a^{(5)}(f) = f \cdot a_0^i, \tag{5}$$

where s is the number of nodes in Layer 4 and $a_0^i = m_0^i$ is the center of a Gaussian membership function. One of the inputs to a shaded node is the output delivered from Layer 4, and the other possible inputs (terms) are the input variables from Layer 1. The shaded node function is

$$f = \sum_{j=1}^{n} a_j^i x_j \quad \text{and} \quad a^{(5)}(f) = f \cdot u_i^{(5)}, \tag{6}$$

where the summation is over the significant terms connected to the shaded node only and $a_j^i$ is the corresponding parameter. Combining these two types of nodes in Layer 5, we obtain the whole function performed by this layer as

$$a^{(5)}(f) = \left( \sum_{j} a_j^i x_j + a_0^i \right) u_i^{(5)}. \tag{7}$$

Layer 6: Each node in this layer corresponds to one output variable. The node integrates all the actions recommended by Layer 5 and acts as a defuzzifier with

$$f(u_i^{(6)}) = \sum_{i=1}^{t} u_i^{(6)} \quad \text{and} \quad a^{(6)}(f) = f, \tag{8}$$

where t is the number of nodes in Layer 5.

Learning of SONFIN consists of structure and parameter learning, details of which can be found in [9].

B. Segmentation by SONFIN
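As a compact summary of the six layers of Section III-A, the following Python sketch computes the output of one SONFIN output node for a given input vector. It is a simplified illustration, not the implementation of [9]: it assumes that every rule has its own Gaussian fuzzy set on every input (so the centers and widths are stored per rule), that all input terms appear in every consequent, and that the parameters are already trained. The function and argument names are hypothetical.

```python
import numpy as np

def sonfin_forward(x, centers, widths, a0, a):
    """Forward pass for a single output node.

    x       : (n,)   input vector
    centers : (r, n) Gaussian centers m_ij, one row per rule
    widths  : (r, n) Gaussian widths sigma_ij, one row per rule
    a0      : (r,)   constant consequent parameter of each rule
    a       : (r, n) linear consequent parameters of each rule
    """
    # Layer 1: inputs are passed on unchanged.
    u = np.asarray(x, dtype=float)
    # Layer 2: Gaussian membership values exp(-(u - m)^2 / sigma^2).
    membership = np.exp(-((u - centers) ** 2) / widths ** 2)
    # Layer 3: AND by product gives the firing strength of each rule.
    firing = membership.prod(axis=1)
    # Layer 4: normalize firing strengths (the sum may underflow to zero
    # for very high-dimensional inputs; no guard is included here).
    normalized = firing / firing.sum()
    # Layer 5: weight each rule's linear consequent by its normalized strength.
    consequent = (a0 + a @ u) * normalized
    # Layer 6: sum the rule contributions to defuzzify.
    return float(consequent.sum())
```

With a single rule the normalized firing strength is 1, so the output reduces to the linear consequent $a_0 + \sum_j a_j x_j$, which is a convenient sanity check. A network with two outputs, as used for segmentation, would hold one set of consequent parameters per output.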

To implement SONFIN for color segmentation, suppose our objective is to segment a human hand from an image based on skin color. In training SONFIN, the inputs are color features from the image and the outputs show the category, skin or non-skin color, to which the input feature belongs. There are two outputs in SONFIN. When the inputs are color features representing skin color, the desired output is (1, 0); otherwise, the desired output is (0, 1). During a test, when an unknown color feature vector is fed to the input of SONFIN, we compute the two outputs $(y_1, y_2)$. If $y_1$ is larger than $y_2$, the corresponding pixel(s) is classified as "hand"; otherwise, it is classified as "non-hand".

A histogram on the HS space is used as the feature vector for SONFIN input. For this, an original image is divided into non-overlapping blocks, where each block is of size 10 x 10 pixels. For an image containing 640 x 480 pixels, there will be a total of 3072 non-overlapping blocks. The histogram of a block is then used as the network input. For SONFIN training, histograms of skin color and non-skin color are collected from blocks of images of different people and backgrounds. In this way, we may increase the robustness of the segmentation system.
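The block-based feature extraction and the decision rule described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the image is already converted to an array whose last axis holds (H, S), the list `bins` of half-open rectangles (h_lo, h_hi, s_lo, s_hi) is supplied by the caller, and all function names are hypothetical.

```python
import numpy as np

def split_into_blocks(img_hs, size=10):
    """Split an (height, width, 2) HS image into non-overlapping size x size blocks.

    Returns an array of shape (num_blocks, size * size, 2).
    """
    rows, cols = img_hs.shape[0] // size, img_hs.shape[1] // size
    img_hs = img_hs[:rows * size, :cols * size]
    blocks = img_hs.reshape(rows, size, cols, size, -1).swapaxes(1, 2)
    return blocks.reshape(rows * cols, size * size, -1)

def block_histogram(block_hs, bins):
    """Count the pixels of one block that fall into each (h_lo, h_hi, s_lo, s_hi) bin."""
    h, s = block_hs[:, 0], block_hs[:, 1]
    return np.array([np.sum((h >= h0) & (h < h1) & (s >= s0) & (s < s1))
                     for h0, h1, s0, s1 in bins])

def classify(y1, y2):
    """Decision rule on the two network outputs: 'hand' if y1 > y2."""
    return "hand" if y1 > y2 else "non-hand"
```

For a 640 x 480 image, `split_into_blocks` returns 48 x 64 = 3072 blocks of 100 pixels each, in agreement with the block count given in the text.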

IV. EXPERIMENTS

To illustrate the proposed approach for color image segmentation, experiments on segmenting "hand" from "non-hand" based on skin-color features are performed. To show the generalization ability of the proposed method, only ten images are used for training. Each image consists of 640 x 480 pixels. These images differ not only in skin color but also in background and lighting conditions. For SONFIN-based segmentation, each original image is divided into non-overlapping blocks, where each block is of size 10 x 10 pixels. Thus, each image consists of 3072 blocks. Training data are collected from the ten training images: ten skin-color blocks and ten non-skin-color blocks are collected from each image, giving a total of 200 blocks for training. The 20000 pixels from the 200 blocks are transformed to HS space, and the space is then partitioned into 49 bins by the proposed non-uniform partition approach.

To use the histogram-based SONFIN segmentation method, we first need to train SONFIN. Histograms of the 200 training blocks are used as SONFIN input. The performance of SONFIN with histograms from the four types of divisions that partition the HS space into 49 bins is tested. The input and output dimensions of SONFIN are 49 and 2, respectively. The learning parameters in SONFIN are set as $\eta = 0.02$, $F_{in} = 0.03$, and $\lambda = 1.0$, where $\eta$ is the learning rate for the antecedent-part parameters, $F_{in}$ is a pre-specified threshold that influences the number of rules generated, and $\lambda$ is the parameter for consequent-parameter learning using RLS. The training is performed for only 5 iterations. After training, 4 input clusters (rules) are generated. Then, another 10 images are used for testing.

To evaluate segmentation performance, we simply use the following performance index:

$$\text{Rate} = \frac{p_h}{d_h} \times 0.5 + \frac{p_{nh}}{d_{nh}} \times 0.5, \tag{9}$$

where $d_h$ and $d_{nh}$ are the total numbers of pixels truly belonging to skin and non-skin colors, respectively, while $p_h$ and $p_{nh}$ are the total numbers of correctly segmented pixels for skin and non-skin colors, respectively.

The segmentation performance of the trained SONFIN with the uniform partition and with the proposed non-uniform partition is shown in Table 1, where "error pixels" means the total number of misclassified pixels. The result shows that the proposed partition method performs better than the generally adopted uniform partition. As an illustration, some test images and their segmentation results by SONFIN are shown in Fig. 3 and Fig. 4, respectively.
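The performance index of Eq. (9) averages the per-class accuracy on skin and non-skin pixels, so a large non-skin background cannot dominate the score. A minimal Python sketch, assuming boolean per-pixel masks in which True marks skin (the function name is hypothetical):

```python
import numpy as np

def segmentation_rate(truth, pred):
    """Eq. (9): mean of the skin and non-skin per-class accuracies."""
    truth = np.asarray(truth, dtype=bool)
    pred = np.asarray(pred, dtype=bool)
    d_h = truth.sum()                # pixels truly skin
    d_nh = (~truth).sum()            # pixels truly non-skin
    p_h = (truth & pred).sum()       # skin pixels segmented as skin
    p_nh = (~truth & ~pred).sum()    # non-skin pixels segmented as non-skin
    return 0.5 * p_h / d_h + 0.5 * p_nh / d_nh
```

For example, with two skin pixels of which one is found and two non-skin pixels that are both rejected, the rate is 0.5 x 1/2 + 0.5 x 2/2 = 0.75.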

To verify the performance of the histogram-based SONFIN segmentation method, several other color image segmentation methods are tested and compared. These comparison experiments cover color image segmentation by the histogram intersection (HI) method, principal component transformation (PCT), and fuzzy C-means (FCM).

For the histogram intersection (HI) segmentation method [6], a patch of hand with a size of 10 x 10 pixels is selected from the image, and the histogram $H^p$ of the sample patch is computed. For each image to be segmented, the histogram $H^q$ of each 10 x 10 non-overlapping block of the image is calculated, and the matching score $M_{p,q}$ is calculated by

$$M_{p,q} = \frac{\sum_{h,s} \min\left(H^p(h,s), H^q(h,s)\right)}{\sum_{h,s} H^p(h,s)},$$

where the sums run over the two dimensions h and s. Those blocks whose scores are above a threshold are of skin color. Different threshold values are tested, and the optimal value is found to be 0.35.
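The matching score can be sketched in a few lines of Python, assuming it is the histogram intersection of [5] normalized by the total count of the reference patch histogram; the function name and the array layout (one entry per HS bin) are assumptions made for illustration.

```python
import numpy as np

def histogram_intersection(h_p, h_q):
    """Matching score between reference histogram h_p and block histogram h_q."""
    h_p = np.asarray(h_p, dtype=float)
    h_q = np.asarray(h_q, dtype=float)
    # Sum of bin-wise minima, normalized by the reference histogram total.
    return np.minimum(h_p, h_q).sum() / h_p.sum()
```

A block would then be labeled skin color when its score exceeds the chosen threshold, 0.35 in the experiment reported here.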

As in the general approach [2], [3], we perform PCT and FCM segmentations on the color space. For PCT, we sample one skin-color block from the training image and find its average h and s values. Then, the principal eigenspace is found. The projection of the sample skin-color block onto the principal eigenspace is denoted as $p_d$. For each image, we take the average h and s values of each 10 x 10 non-overlapping block, so we obtain 3072 vectors. Then, the projections $p_j$ of these vectors onto the eigenspace are found. A block is classified as skin color if $0.8 p_d < p_j < 1.3 p_d$; otherwise, it is classified as non-skin color. The performance of PCT for the test images is shown in Table 1.

The FCM algorithm is an iterative unsupervised clustering algorithm that adjusts group representative centers to best partition the image into several distinct classes. For FCM, we also use the average H and S values in each block as color features. First, 10 of the 3072 features in an image are sampled and used as the initial centers of ten clusters. We use Eq. (10) to compute the degree to which each of the 3072 features belongs to each of the ten clusters,

$$u_{ij} = \left[ \sum_{k=1}^{10} \left( \frac{\|x_j - v_i\|^2}{\|x_j - v_k\|^2} \right)^{1/(m-1)} \right]^{-1}, \quad i = 1, 2, \ldots, 10; \; j = 1, 2, \ldots, 3072, \tag{10}$$

where $x_j$ denotes the jth feature, $v$ denotes the cluster center and m = 1.25. The ten clusters are updated iteratively, and the iteration stops when each feature's degree equals or surpasses 0.8. After clustering, we need to decide which clusters are representative of skin color by calculating the degree to which the reference hand feature belongs to the ten clusters. After trials, we find that the best performance is achieved by selecting the two clusters with the highest degree. The performance of FCM for the test images is shown in Table 1.

Overall, the histogram-based SONFIN segmentation method achieves the best performance.
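The membership computation of Eq. (10) can be sketched in Python as below, assuming it is the standard FCM membership update with squared Euclidean distances. The function name is hypothetical, and the small floor on the distances is an added safeguard against division by zero when a feature coincides with a center; it is not part of the paper.

```python
import numpy as np

def fcm_memberships(x, v, m=1.25):
    """Degrees u[i, j] to which feature x[j] belongs to cluster center v[i].

    x : (N, d) feature vectors, v : (c, d) cluster centers.
    """
    x = np.asarray(x, dtype=float)
    v = np.asarray(v, dtype=float)
    # Squared distance from every center i to every feature j, shape (c, N).
    d2 = ((v[:, None, :] - x[None, :, :]) ** 2).sum(axis=2)
    d2 = np.maximum(d2, 1e-12)  # safeguard, not from the paper
    # ratio[i, k, j] = (d2[i, j] / d2[k, j]) ** (1 / (m - 1))
    ratio = (d2[:, None, :] / d2[None, :, :]) ** (1.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=1)
```

For each feature the degrees over all clusters sum to one, and a feature equidistant from two centers receives a degree of 0.5 for each. With m = 1.25 the exponent is 4, which makes the memberships considerably crisper than the common choice m = 2.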

V. CONCLUSION

This paper proposes color image segmentation by histogram-based SONFIN. To represent a color by a histogram as accurately as possible, a non-uniform space partition approach on the HS plane is proposed. For comparison, color image segmentation by the HI method, PCT and FCM is performed, and the performance of these methods is inferior to that of the proposed method. In the future, we will test the proposed approach with a larger database containing skin colors not restricted to the human hand. In the proposed approach, the segmentation is based on non-overlapping 10 x 10 blocks instead of individual pixels. In the future, we will use overlapping blocks to produce finer segmentation results.

ACKNOWLEDGMENT

This work was supported by the National Science Council, Taiwan, R.O.C. under Grant NSC-93-2218-E-005-039.

REFERENCES

[1] N. Pal and S. Pal, "A review on image segmentation techniques," Pattern Recognit., vol. 26, no. 9, pp. 1277-1294, 1993.

[2] R. D. Dony and S. Haykin, "Image segmentation using a mixture of principal components representation," IEE Proc. Vision, Image Signal Process., pp. 73-80, April 1997.

[3] J.-F. Yang, S.-S. Hao and P.-C. Chung, "Color image segmentation using fuzzy C-means and eigenspace projections," Signal Processing, pp. 461-472, March 2002.

[4] E. Littmann and H. Ritter, "Adaptive color segmentation - a comparison of neural and statistical methods," IEEE Trans. Neural Networks, pp. 175-185, January 1997.

[5] M. J. Swain and D. H. Ballard, "Color indexing," Int. J. Comput. Vis., vol. 7, no. 1, pp. 11-32, 1991.

[6] S. Ahmad, "A Usable Real-Time 3D Hand Tracker," Proc. the Twenty-Eighth Asilomar Conf. on Signals, Systems and Computers, vol. 2, Oct. 1994.

[7] M. J. Jones and J. M. Rehg, "Statistical color models with application to skin detection," Int. Journal of Computer Vision, vol. 46, no. 1, pp. 81-96, Jan. 2002.

[8] M. H. Yang and N. Ahuja, "Gaussian mixture model for human skin color and its application in images and video databases," Proc. of the SPIE: Conf. on Storage and Retrieval for Image and Video Databases, vol. 3656, pp. 458-466, 1999.

[9] C. F. Juang and C. T. Lin, "An on-line self-constructing neural fuzzy inference network and its applications," IEEE Trans. Fuzzy Systems, vol. 6, pp. 12-32, 1998.

Fig. 1. The proposed color space partition method. (Figure not reproduced; the plot shows the HS plane with an S axis and the partition labels $D_i$.)

Fig. 2. Structure of the SONFIN. (Figure not reproduced; network diagram with layer labels and input x.)

Fig. 3. Illustrations of the images used for testing.

Fig. 4. Segmentation results for the test images by the histogram-based SONFIN method with non-uniform partition.

Table 1. Segmentation Performances For The Test Images Using Different Algorithms.

| | SONFIN (uniform) | SONFIN (non-uniform) | HI | PCT | FCM |
|---|---|---|---|---|---|
| Rate | 75.90% | 83.15% | 49.83% | 69.26% | 60.59% |
| Error pixels | 88254 | 64921 | 271611 | 128399 | 181200 |
