Y. Zhang et al. (Eds.): IScIDE 2011, LNCS 7202, pp. 599–606, 2012. © Springer-Verlag Berlin Heidelberg 2012

An Improved Generalized Fuzzy C-Means Clustering Algorithm Based on GA

Wenping Ma, Xiaohua Ge, and Licheng Jiao

Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education, Postbox 224, Xidian University, No. 2 of Taibai Road, 710071 Xi’an, P.R. China

[email protected], [email protected]

Abstract. A new generalized clustering algorithm, genetic algorithm based rough-fuzzy possibilistic c-means (GARFPCM), is proposed. It derives from an unsupervised learning algorithm called RFPCM, which is unstable because of its random initialization. GA is introduced into RFPCM to produce the improved version, GARFPCM, which can obtain better clustering quality. A performance evaluation on image segmentation shows that GARFPCM performs excellently.

Keywords: Clustering, GARFPCM, Image Segmentation.

1 Introduction

Clustering partitions a set of objects into several classes such that two objects are assigned to the same class if they are sufficiently similar. K-means [1], FCM [2], and PCM [3] are among the most commonly used clustering methods. Building on rough sets [4], P. Maji and S. K. Pal proposed a method called rough-fuzzy possibilistic c-means (RFPCM) [5]. RFPCM avoids the noise sensitivity of FCM and the coincident clusters of PCM. However, RFPCM may become stuck at a local optimum. Therefore, an evolutionary version of RFPCM based on GA [6] [7], called GARFPCM, is proposed. It can effectively mitigate the sensitivity to initialization and the local-optimum problem.

2 RFPCM Clustering Algorithm

Let $X=\{x_1, \dots, x_j, \dots, x_n\}$ be the set of $n$ objects and $V=\{v_1, \dots, v_i, \dots, v_c\}$ be the set of $c$ cluster centers. Let $\underline{A}(\beta_i)$ and $\overline{A}(\beta_i)$ be the lower and upper approximations of cluster $\beta_i$, and let $B(\beta_i)=\{\overline{A}(\beta_i)-\underline{A}(\beta_i)\}$ denote the boundary region of cluster $\beta_i$. The original RFPCM algorithm partitions $X$ into $c$ clusters by minimizing the objective function

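$$
J =
\begin{cases}
\omega \times A_1 + (1-\omega) \times B_1, & \text{if } \underline{A}(\beta_i) \neq \emptyset,\ B(\beta_i) \neq \emptyset, \\
A_1, & \text{if } \underline{A}(\beta_i) \neq \emptyset,\ B(\beta_i) = \emptyset, \\
B_1, & \text{if } \underline{A}(\beta_i) = \emptyset,\ B(\beta_i) \neq \emptyset,
\end{cases}
$$

$$
A_1 = \sum_{i=1}^{c} \sum_{x_j \in \underline{A}(\beta_i)} \left\{ a(\mu_{ij})^{m_1} + b(\nu_{ij})^{m_2} \right\} \|x_j - v_i\|^2 + \sum_{i=1}^{c} \eta_i \sum_{x_j \in \underline{A}(\beta_i)} (1-\nu_{ij})^{m_2},
$$

$$
B_1 = \sum_{i=1}^{c} \sum_{x_j \in B(\beta_i)} \left\{ a(\mu_{ij})^{m_1} + b(\nu_{ij})^{m_2} \right\} \|x_j - v_i\|^2 + \sum_{i=1}^{c} \eta_i \sum_{x_j \in B(\beta_i)} (1-\nu_{ij})^{m_2}. \tag{1}
$$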

where $m_1$ and $m_2$ are the fuzzifiers (generally $m_1=m_2=2$). The parameter $\omega$ corresponds to the relative importance of the lower region. The constants $a$ and $b$ define the relative importance of the probabilistic membership $\mu_{ij}$ and the possibilistic membership $\nu_{ij}$. The scale parameter $\eta_i$ is computed as

$$
\eta_i = \frac{\sum_{j=1}^{n} (\nu_{ij})^{m_2} \|x_j - v_i\|^2}{\sum_{j=1}^{n} (\nu_{ij})^{m_2}}. \tag{2}
$$
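As a hedged illustration of Eq. (2), the following NumPy sketch computes the scale parameter of every cluster as the membership-weighted mean squared distance to its center. The function name `scale_parameters`, the `(c, n)` storage layout of the memberships, and the toy data are assumptions made for this example only.

```python
import numpy as np

def scale_parameters(X, V, nu, m2=2.0):
    """Eq. (2): eta_i = sum_j nu_ij^m2 * ||x_j - v_i||^2 / sum_j nu_ij^m2.

    X  : (n, d) array of objects
    V  : (c, d) array of cluster centers
    nu : (c, n) array of possibilistic memberships (layout assumed here)
    """
    # Squared Euclidean distance between every center and every object: (c, n)
    d2 = ((V[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    w = nu ** m2
    return (w * d2).sum(axis=1) / w.sum(axis=1)

# Toy data (hypothetical): three 2-D objects, two clusters.
X = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 0.0]])
V = np.array([[1.0, 0.0], [10.0, 0.0]])
nu = np.array([[1.0, 1.0, 0.0],
               [0.0, 0.0, 1.0]])
eta = scale_parameters(X, V, nu)
print(eta)
```

In the toy data the first cluster has two fully typical objects at squared distance 1 from its center, and the second cluster has a single object that coincides with its center.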

It is necessary to determine whether $x_j$ belongs to the lower approximation $\underline{A}(\beta_i)$ or to the upper approximation $\overline{A}(\beta_i)$. First, let $u_{ij}=a\mu_{ij}+b\nu_{ij}$ denote the total membership of $x_j$ in cluster $\beta_i$. After computing $\{u_{1j}, \dots, u_{ij}, \dots, u_{cj}\}$ for all $c$ clusters, the difference between the highest membership $u_{ij}$ and the second-highest membership $u_{kj}$ is compared with a threshold value $\delta$. If $(u_{ij}-u_{kj})>\delta$, then $x_j\in\underline{A}(\beta_i)$ as well as $x_j\in\overline{A}(\beta_i)$; otherwise, $x_j\in\overline{A}(\beta_i)$ and $x_j\in\overline{A}(\beta_k)$. After the update process, the clustering result of each object is obtained. The convergence properties and update formulae are given in [5].
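The threshold rule described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `assign_approximations`, the boolean-mask representation of the approximations, and the values chosen for $a$, $b$, and $\delta$ are all assumptions, and the sketch requires at least two clusters.

```python
import numpy as np

def assign_approximations(mu, nu, a, b, delta):
    """Return boolean (c, n) masks (lower, upper) for the rough approximations.

    mu, nu : (c, n) probabilistic and possibilistic memberships
    """
    u = a * mu + b * nu                     # total membership u_ij
    c, n = u.shape
    order = np.argsort(u, axis=0)           # ascending along the cluster axis
    best, second = order[-1], order[-2]     # highest / second-highest cluster per object
    cols = np.arange(n)
    gap = u[best, cols] - u[second, cols]
    confident = gap > delta

    lower = np.zeros((c, n), dtype=bool)
    upper = np.zeros((c, n), dtype=bool)
    # Confident objects enter the lower (and hence upper) approximation of the best cluster.
    lower[best[confident], cols[confident]] = True
    upper[best, cols] = True
    # Ambiguous objects enter the upper approximations of both competing clusters.
    upper[second[~confident], cols[~confident]] = True
    return lower, upper

# Toy memberships (hypothetical): object 0 is clear-cut, object 1 is ambiguous.
mu = np.array([[0.9, 0.50],
               [0.1, 0.50]])
nu = np.array([[0.8, 0.50],
               [0.2, 0.45]])
lower, upper = assign_approximations(mu, nu, a=0.5, b=0.5, delta=0.1)
boundary = upper & ~lower                   # B = upper approximation minus lower approximation
print(lower, upper, boundary, sep="\n")
```

Storing the approximations as masks makes the boundary region a single element-wise operation, which matches the definition of $B(\beta_i)$ given earlier.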

3 GA Based RFPCM Clustering Algorithm

Here GA is used to search for the optimal solution of the clustering problem. The population is initialized by encoding the initial parameters into chromosomes; the encoded parameters are the cluster centers and the scale parameters ($\eta$). The population is then updated and evolved by means of genetic operators.

The basic steps of GARFPCM are shown in Fig. 1 and they are now described in detail.

[Flow chart boxes: Objects to be clustered; Initialize the parameters and populations; Calculate the fitness functions; Termination condition satisfied? (Y/N); Update the memberships; Update the population; Evolve the population; Clustering result]

Fig. 1. Flow chart of GARFPCM

For an N-dimensional space, each chromosome is made up of two parts. The upper part is a string of N×c real numbers that represent the c cluster centers, while the lower part is a string of c real numbers that represent the c scale parameters. For example, if N=2 and c=3, each individual is generated as shown in Fig. 2.

Fig. 2. Encoding scheme: an individual

For a population of size P, the upper part of each individual is initialized with c distinct, randomly chosen objects, while the lower part is initialized with c positive real numbers.
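The encoding and initialization just described might be sketched as below. The helper names `init_population` and `decode`, the flat row-vector layout, and the sampling range used for the positive scale parameters are assumptions for illustration; the text only requires the scale parameters to be positive.

```python
import numpy as np

def init_population(X, c, pop_size, rng):
    """Build pop_size chromosomes of length N*c + c from the objects X of shape (n, N)."""
    n, dim = X.shape
    population = np.empty((pop_size, dim * c + c))
    for p in range(pop_size):
        idx = rng.choice(n, size=c, replace=False)       # c distinct objects as centers
        population[p, :dim * c] = X[idx].ravel()         # upper part: N*c center coordinates
        # Lower part: c positive scale parameters (the range is an assumption).
        population[p, dim * c:] = rng.uniform(0.1, 1.0, size=c)
    return population

def decode(chromosome, c, dim):
    """Split one chromosome back into (c, N) centers and c scale parameters."""
    centers = chromosome[:dim * c].reshape(c, dim)
    eta = chromosome[dim * c:]
    return centers, eta

rng = np.random.default_rng(0)
X = rng.random((20, 2))                                  # hypothetical 2-D objects (N = 2)
population = init_population(X, c=3, pop_size=5, rng=rng)
centers, eta = decode(population[0], c=3, dim=2)
print(population.shape, centers.shape, eta.shape)
```

With N=2 and c=3, each chromosome holds 2×3 center coordinates followed by 3 scale parameters, so it has 9 genes.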

In every generation, the memberships and the population are each updated exactly once: the two memberships are updated first, and then the upper and lower parts of each chromosome are updated. The evolution process then performs a global search for the optimal solution through three procedures: selection, crossover, and mutation.
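The paper does not specify which selection, crossover, and mutation operators are used, so the sketch below substitutes common real-coded choices: binary tournament selection, arithmetic crossover, and Gaussian mutation. The function name `evolve`, the rates `pc` and `pm`, the mutation width `sigma`, and the convention that a larger fitness is better are all assumptions.

```python
import numpy as np

def evolve(population, fitness, rng, c, pc=0.8, pm=0.05, sigma=0.1):
    """One evolution step on a (P, L) real-coded population.

    fitness : (P,) array, larger is better (assumed convention)
    c       : number of clusters; the last c genes are the scale parameters
    """
    P, L = population.shape
    # Selection: binary tournament between randomly drawn pairs.
    i, j = rng.integers(0, P, size=(2, P))
    winners = np.where(fitness[i] >= fitness[j], i, j)
    parents = population[winners].copy()

    # Crossover: arithmetic blend of consecutive parent pairs.
    children = parents.copy()
    for k in range(0, P - 1, 2):
        if rng.random() < pc:
            alpha = rng.random()
            children[k] = alpha * parents[k] + (1 - alpha) * parents[k + 1]
            children[k + 1] = alpha * parents[k + 1] + (1 - alpha) * parents[k]

    # Mutation: add Gaussian noise to a random subset of genes.
    mask = rng.random(children.shape) < pm
    children = children + mask * rng.normal(0.0, sigma, size=children.shape)

    # Keep the scale parameters strictly positive after mutation.
    children[:, -c:] = np.maximum(children[:, -c:], 1e-12)
    return children

rng = np.random.default_rng(1)
population = rng.uniform(0.5, 1.5, size=(6, 9))   # hypothetical population, N=2, c=3
fitness = rng.random(6)                           # placeholder fitness values
next_population = evolve(population, fitness, rng, c=3)
print(next_population.shape)
```

Arithmetic crossover keeps children inside the range spanned by their parents, which is one reason it is often chosen for chromosomes that encode coordinates.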

4 Image Segmentation Using GARFPCM

Image segmentation [8] partitions a digital image into several salient segments. Various image segmentation techniques have been proposed [9], and clustering is one of the most commonly used [10] [11]. In this paper, GARFPCM is employed to segment gray-level images. The whole procedure is described in detail below.

4.1 Feature Extraction

The image information extracted here is texture, which can be regarded as repeating patterns of local variation in pixel intensity. Methods based on the gray-level co-occurrence matrix (GLCM) [12] [14] [15] [16] and wavelet decomposition [13] are employed in this paper.

4.2 Watershed Pre-segmentation

In this paper, watershed pre-segmentation [17] is used to reduce the computational complexity by dividing each image into many small regions. The basic unit to be clustered thus changes from the numerous image pixels to these regions. Once watershed pre-segmentation has finished, every pixel in the image carries a region label.

4.3 Representation of Region Features

Once feature extraction and pre-segmentation are finished, the features of all regions are computed and serve as the objects to be clustered by GARFPCM. For each region, every pixel belonging to it contributes to the region features: the texture feature of a region is the average of the feature vectors of all its pixels.
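A minimal sketch of this averaging step is given below. It assumes that the per-pixel features are stored in an `(H, W, F)` array and that the watershed labels are integers `0..R-1` with every label present; the function name `region_features` and the toy image are hypothetical.

```python
import numpy as np

def region_features(pixel_features, region_labels):
    """Average the per-pixel feature vectors inside each region.

    pixel_features : (H, W, F) array
    region_labels  : (H, W) integer array with labels 0..R-1 (assumed contiguous)
    Returns an (R, F) array with one mean feature vector per region.
    """
    n_feat = pixel_features.shape[-1]
    flat_feats = pixel_features.reshape(-1, n_feat)
    flat_labels = region_labels.ravel()
    n_regions = int(flat_labels.max()) + 1

    sums = np.zeros((n_regions, n_feat))
    np.add.at(sums, flat_labels, flat_feats)      # unbuffered per-region accumulation
    counts = np.bincount(flat_labels, minlength=n_regions)
    return sums / counts[:, None]

# Toy 2x2 image with one feature per pixel and two regions (hypothetical data).
pixel_features = np.array([[[1.0], [3.0]],
                           [[10.0], [20.0]]])
region_labels = np.array([[0, 0],
                          [1, 1]])
features = region_features(pixel_features, region_labels)
print(features)
```

The resulting rows are the objects handed to the clustering step, so their number equals the number of regions rather than the number of pixels.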

4.4 Clustering with GARFPCM

Region features are taken as the objects and clustered using GARFPCM. Regions that are partitioned into the same cluster are assigned the same class label.

Once every pixel in the image has been assigned the class label of its region, the image segmentation is complete.
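Propagating the cluster labels from regions back to pixels can be done with a single lookup, as in the following sketch. The function name `labels_to_segmentation` and the example arrays are hypothetical; region labels are again assumed to be integers `0..R-1`.

```python
import numpy as np

def labels_to_segmentation(region_labels, region_to_class):
    """Map each pixel's region label to the class label of that region.

    region_labels   : (H, W) integer array with labels 0..R-1
    region_to_class : (R,) integer array, class label found for each region
    """
    return region_to_class[region_labels]         # fancy indexing acts as a lookup table

region_labels = np.array([[0, 0, 1],
                          [2, 2, 1]])
region_to_class = np.array([1, 0, 1])             # e.g. regions 0 and 2 fall in class 1
segmentation = labels_to_segmentation(region_labels, region_to_class)
print(segmentation)
```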

5 Experimental Study

The goal of this section is to evaluate the performance of the proposed GARFPCM clustering algorithm against related algorithms from different perspectives.

This experiment applies clustering to image segmentation. The results of different clustering algorithms are presented to examine the applicability of GARFPCM to image processing.

For each image, region features are generated and clustered by the clustering algorithms (k-means, FCM, RFPCM, and GARFPCM) to obtain the segmentation result. Six images are selected as test images: four synthetic textured images (shown in Fig. 3), one SAR image, and one natural image (shown in Fig. 4). The experimental parameters are the same as those of the first part, except that ω=0.6 and a=0.8 for RFPCM and GARFPCM.

Fig. 5 presents the typical results from the four selected algorithms on all six test images.

Fig. 3. Four synthetic textured images and their true partitioning results (a: Mosaic3_2, b: Mosaic4_new, c: D24D68D16D19_gray, d: Song_spie2003_tex3)

Fig. 4. The other two images (e: SAR image Air_port3 (including 3 regions), f: natural image Moon (including 2 regions))

For the four synthetic textured images, Table 1 presents the statistical results of this experiment, computed against their true partitioning results. On image Mosaic3_2, GARFPCM and RFPCM obtain a higher correct ratio than the other algorithms; on the remaining images, GARFPCM achieves the highest correct ratio and, judging by the variances listed, the most stable segmentation results of all the algorithms. Compared with RFPCM, GARFPCM increases the correct ratio by 7.26 percentage points on average. The visual results are shown in Fig. 5, parts (a) to (d).
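The average improvement over RFPCM can be checked directly from the RFPCM and GARFPCM columns of Table 1. The short script below only repeats that arithmetic; the variable names are chosen here for illustration.

```python
# Correct ratios (%) from Table 1 for the four synthetic textured images.
rfpcm = [96.03, 80.00, 86.09, 91.59]
garfpcm = [96.03, 95.02, 96.04, 95.67]

# Mean per-image difference, in percentage points.
gain = sum(g - r for g, r in zip(garfpcm, rfpcm)) / len(rfpcm)
print(f"{gain:.2f}")
```

The per-image gains are 0, 15.02, 9.95, and 4.08 points, so the figure is an absolute difference in correct ratio rather than a relative increase.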

Part (e) of Fig. 5 gives the segmentation results on the SAR image Air_port3, which contains three target regions: flat ground, airport runways, and buildings. GARFPCM identifies these three regions to some degree and preserves the edges of the runways well. In addition, GARFPCM shows an excellent capability for speckle suppression in the ground region. By comparison, RFPCM loses much of the information about the buildings, while k-means and FCM leave more speckle than GARFPCM does.

Part (f) of Fig. 5 presents the segmentation results on the natural image Moon, which contains two target regions: the moon and the background. K-means, FCM, and GARFPCM successfully separate the moon from the dark background, and of these GARFPCM leaves the fewest misclassified pixels in the moon region. RFPCM produces a poor segmentation by mistakenly assigning the inner part of the moon and the background to the same class.

Fig. 5. Typical segmentation results from 4 algorithms on the 6 selected images

Table 1. Correct ratios and corresponding variances (enclosed in parentheses) obtained from four selected algorithms on the synthetic textured images

| Image label | K-means | FCM | RFPCM | GARFPCM |
|---|---|---|---|---|
| (1) | 95.21% (0) | 95.21% (0) | 96.03% (0) | 96.03% ($2.4\times10^{-7}$) |
| (2) | 88.64% (0.0189) | 84.92% (0.027) | 80.00% (0.0303) | 95.02% ($4.3\times10^{-7}$) |
| (3) | 78.26% (0.0224) | 92.44% (0.0105) | 86.09% (0.0239) | 96.04% ($1.5\times10^{-6}$) |
| (4) | 83.88% (0.0206) | 64.16% (0.0045) | 91.59% (0.0117) | 95.67% ($8.6\times10^{-8}$) |

In summary, when applied to image segmentation, the proposed GARFPCM clustering algorithm gives more satisfactory results than the other common related algorithms, both statistically and visually.

6 Conclusion

In this paper, an improved generalized fuzzy c-means clustering algorithm called GARFPCM is proposed. GA is used in the new algorithm to minimize the objective function of the clustering task. To demonstrate the effectiveness of GARFPCM, several experiments on image segmentation were carried out. The results show that GARFPCM performs significantly better than the original RFPCM and other common related methods according to several evaluation criteria.

Acknowledgments. This work has been partially supported by the Fund for Foreign Scholars in University Research and Teaching Programs (the 111 Project) (No. B07048), the Natural Science Basic Research Plan in Shaanxi Province of China (No. 2009JQ8015 and 2010JQ8023), and the Fundamental Research Funds for the Central Universities (No. JY10000902001 and K50510020011).

References

1. Jain, A.K., Murty, M.N., Flynn, P.J.: Data Clustering: A Review. ACM Computing Surveys 31(3) (1999)

2. Bezdek, J.C.: Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum, New York (1981)

3. Krishnapuram, R., Keller, J.M.: The Possibilistic C-Means Algorithm: Insights and Recommendations. IEEE Trans. Fuzzy Syst. 4, 385–393 (1996)

4. Pawlak, Z.: Rough Sets: Theoretical Aspects of Reasoning about Data. Kluwer, Dordrecht (1991)

5. Maji, P., Pal, S.K.: Rough Set Based Generalized Fuzzy C-Means Algorithm and Quantitative Indices. IEEE Trans. Syst., Man, Cybern.-Part B: Cybernetics 37(6) (2007)

6. Hruschka, E.R., Campello, R.J.B., Freitas, A.A., de Carvalho, A.C.P.L.F.: A Survey of Evolutionary Algorithms for Clustering. IEEE Trans. Syst., Man, Cybern.-Part C: Applications and Reviews 39(2) (2009)

7. Davis, L.: Handbook of Genetic Algorithms. Van Nostrand Reinhold, New York (1991)

8. Pal, N.R., Pal, S.K.: A Review on Image Segmentation Techniques. Pattern Recognition 26(9), 1277–1294 (1993)

9. Fu, K.S., Mui, J.K.: A Survey on Image Segmentation. Pattern Recognition 13(1), 3–16 (1981)

10. Bezdek, J.C., Hall, L.O., Clarke, L.P.: Review of MR Image Segmentation Techniques Using Pattern Recognition. Medical Physics 20(4), 1033–1048 (1993)

11. Pappas, T.: An Adaptive Clustering Algorithm for Image Segmentation. IEEE Trans. Signal Process. 40(4), 901–914 (1992)

12. Haralick, R.M., Shanmugam, K., Dinstein, I.: Textural Features for Image Classification. IEEE Trans. Syst., Man, Cybern. 3(6), 610–621 (1973)

13. Fukuda, S., Hirosawa, H.: A Wavelet-Based Texture Feature Set Applied to Classification of Multifrequency Polarimetric SAR Images. IEEE Trans. Geosci. Remote Sensing 37(8), 2282–2286 (1999)

14. Srinivasan, G.N., Shobha, G.: Statistical Texture Analysis. Proc. of World Academy of Science, Engineering and Technology 36, 1264–1270 (2008)

15. Clausi, D.A., Yue, B.: Comparing Cooccurrence Probabilities and Markov Random Fields for Texture Analysis of SAR Sea Ice Imagery. IEEE Trans. Geosci. Remote Sensing 42(1), 215–228 (2004)

16. Jiao, L., Gong, M., et al.: Natural and Remote Sensing Image Segmentation Using Memetic Computing. IEEE Computational Intelligence Magazine 5(2), 78–91 (2010)

17. Wang, D.: A Multiscale Gradient Algorithm for Image Segmentation Using Watersheds. Pattern Recognition 30(12), 2043–2052 (1997)