

A Maximizing Model of Bezdek-like Spherical Fuzzy c-Means Clustering

Yuchi Kanzawa

Abstract— In this study, a maximizing model of Bezdek-type spherical fuzzy c-means clustering is proposed, which is based on the regularization of the maximizing model of spherical hard c-means. Using theoretical analysis and numerical experiments, it is shown that the proposed method is not equivalent to the minimizing model of Bezdek-type spherical fuzzy c-means, because the effect of its fuzzifier parameter is different from that found in conventional methods.

I. INTRODUCTION

The hard c-means (HCM) clustering algorithm [1] splits datasets into well-separated clusters by minimizing the sum of squared distances between the data and cluster centers. This concept has been extended to fuzzy clustering, where datum membership is shared among all of the cluster centers rather than being restricted to a single cluster. To derive fuzzy clustering, the objective function of HCM is transformed into nonlinear functions. Specifically, Dunn's algorithm replaces linear membership weights with squared ones and creates cluster centers based on weighted means [2]. Bezdek generalized Dunn's method by using a power of the membership as the weight [3], thereby producing what is commonly known as the fuzzy c-means (FCM) algorithm. Pal and Bezdek [4] suggested taking an exponent between 1.5 and 2.5. To distinguish this algorithm from the many variants that have been proposed since, it is referred to as the Bezdek-type FCM (bFCM) in the present study. Another fuzzy approach used for cluster analysis is the regularization of the objective function of HCM. Recognizing that HCM is singular, and that an appropriate cluster cannot be obtained using the Lagrangian multiplier method, Miyamoto and Mukaidono introduced a regularization term, the negative entropy of the membership with a positive parameter, into its objective function [5], thereby producing entropy-regularized FCM (eFCM). Note that these FCM variants are based on a minimization problem with respect to the memberships, by which an object belongs to a cluster, and the cluster centers, i.e., a minimizing model.

In the aforementioned clustering methods, the squared Euclidean distance is assumed to be the dissimilarity measure between an object and a cluster center. However, there are many other similarity and dissimilarity measures. In particular, spherical k-means [6], and its fuzzified methods [7], subtract the cosine correlation between an object and a cluster center from 1 and use the result as the dissimilarity measure between an object and a cluster center. These methods are called spherical clustering because the cosine correlation ignores the magnitudes of objects; hence, all objects are assumed to be on the unit hypersphere. The spherical clustering methods that correspond to HCM, bFCM, and eFCM are referred to as spherical HCM (sHCM), spherical bFCM (sbFCM), and spherical eFCM (seFCM) in the present study.

Yuchi Kanzawa is with the Department of Communication Engineering, Faculty of Engineering, Shibaura Institute of Technology, Tokyo, JAPAN (email: [email protected]).

This study was supported partly by a Grant-in-Aid for Scientific Research from the Japan Society for the Promotion of Science, No. 00298176.

All of the aforementioned clustering methods are minimizing models, i.e., the algorithms are obtained by solving the corresponding minimization problems based on the dissimilarities between objects and clusters, but a maximizing model can also be considered, i.e., maximization problems based on the similarities between objects and clusters. The maximizing models of sHCM and seFCM (msHCM and mseFCM) based on the cosine similarity between objects and clusters are equivalent to the corresponding minimizing models (sHCM and seFCM), whereas a maximizing model of sbFCM is unclear [8]. This is the motivation of this study.

In this study, a maximizing model of sbFCM (msbFCM) is proposed based on the regularization of msHCM. First, similar to the eFCM derived by regularizing HCM, it is shown that seFCM and mseFCM can be derived by regularizing sHCM and msHCM, respectively. Next, bFCM and sbFCM are interpreted as regularizations of HCM and sHCM, respectively. Next, the proposed method, a maximizing model of sbFCM (msbFCM), is constructed by adding a regularization term to msHCM. It must be noted that the proposed method is valid only if the objects are in the first quadrant of the unit hypersphere, unlike conventional spherical clustering methods, which are valid for objects on the whole unit hypersphere. However, this constraint is not considered to restrict the applicability of the proposed method greatly. Example applications, document clustering and kernelized clustering, will be used in a future study to illustrate the performance of the proposed method. In document clustering, each document is described as a normalized term-frequency vector or tf-idf weighted vector, the elements of which are all positive. The best-known kernel, the Gaussian kernel, and its variants have elements in [0, 1] and diagonal elements of one, which means that all of the vectors are positive in the feature space. Theoretical analysis and numerical experiments are conducted to show that the proposed method is not equivalent to sbFCM, because the effect of the fuzzifier parameter for the proposed method is different from that found in other methods. All of the methods mentioned in this paper are summarized in TABLE I.

The rest of this paper is organized as follows. In Section 2, the notation and conventional methods are introduced. In Section 3, the basic concepts and the proposed method are presented. In Section 4, some illustrative examples are provided. Section 5 contains some concluding remarks.

2014 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), July 6-11, 2014, Beijing, China

978-1-4799-2072-3/14/$31.00 ©2014 IEEE

TABLE I
METHODS IN THIS PAPER

Abbreviation  Methods
HCM           hard c-means
bFCM          Bezdek-type fuzzy c-means
eFCM          entropy-regularized fuzzy c-means
sHCM          spherical hard c-means
sbFCM         spherical Bezdek-type fuzzy c-means
seFCM         spherical entropy-regularized fuzzy c-means
msHCM         maximizing model of spherical hard c-means, which is equivalent to sHCM
mseFCM        maximizing model of spherical entropy-regularized fuzzy c-means, which is equivalent to seFCM
msbFCM        maximizing model of spherical Bezdek-type fuzzy c-means, the proposed method

II. PRELIMINARIES

Let X = \{x_k \in \mathbb{R}^p \mid k \in \{1, \dots, N\}\} be a dataset of p-dimensional points. The membership with which x_k belongs to the i-th cluster is denoted by u_{i,k} (i \in \{1, \dots, C\}, k \in \{1, \dots, N\}), and the set of u_{i,k} is denoted by u, which is also known as the partition matrix. The cluster center set is denoted by v = \{v_i \mid v_i \in \mathbb{R}^p, i \in \{1, \dots, C\}\}. The squared Euclidean distance between the k-th datum and the i-th cluster center is denoted by d_{i,k} = \|x_k - v_i\|_2^2. HCM is obtained by solving the following optimization problem:

\min_{u,v} \sum_{i=1}^{C} \sum_{k=1}^{N} u_{i,k} d_{i,k},  (1)

subject to

\sum_{i=1}^{C} u_{i,k} = 1,  (2)

where the algorithm is a two-step iteration, which involves the calculation of the memberships u_{i,k} and the cluster centers v_i [1].
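The two-step iteration can be sketched as follows. This is an illustrative NumPy version with hypothetical toy data, not the paper's implementation; the initialization scheme is an assumption.

```python
import numpy as np

def hcm(X, v_init, iters=50):
    """Hard c-means: alternate the two steps -- crisp membership assignment
    and cluster-center (mean) update -- as in problem (1)-(2)."""
    v = v_init.astype(float).copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # Step 1: memberships u_{i,k} -- each datum joins its nearest center
        d = ((X[:, None, :] - v[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Step 2: each center moves to the mean of its assigned data
        for i in range(len(v)):
            if (labels == i).any():
                v[i] = X[labels == i].mean(axis=0)
    return labels, v

# Hypothetical toy data: two well-separated groups
X = np.array([[0., 0.], [0., 1.], [1., 0.], [10., 10.], [10., 11.], [11., 10.]])
labels, v = hcm(X, v_init=X[[0, 3]])
```

With one center seeded in each group, the iteration converges after a single pass on this dataset.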

The Bezdek-type fuzzy c-means (bFCM) [2], [3] is obtained by solving the following optimization problem:

\min_{u,v} \sum_{i=1}^{C} \sum_{k=1}^{N} u_{i,k}^{m} d_{i,k}  (3)

subject to Eq. (2), where m > 1 is an additional weighting exponent. If m = 1, bFCM reduces to HCM. If m is larger, the memberships will be fuzzier, so m can be treated as the fuzzification parameter.

Another approach for fuzzifying the membership is regularizing the objective function of HCM, which was achieved by Miyamoto and Mukaidono by introducing a regularization term with a positive parameter λ into the objective function. Using the entropy term [5], the entropy-regularized FCM (eFCM) is defined as

\min_{u,v} \sum_{i=1}^{C} \sum_{k=1}^{N} u_{i,k} d_{i,k} + \lambda^{-1} \sum_{i=1}^{C} \sum_{k=1}^{N} u_{i,k} \log(u_{i,k})  (4)

subject to Eq. (2).

If all of the objects are on the unit hypersphere, 1 - x_k^T v_i can be used as the dissimilarity between an object x_k and a cluster center v_i, and the three methods that correspond to Eqs. (1), (3), and (4) are obtained from the following optimization problems:

\min_{u,v} \sum_{i=1}^{C} \sum_{k=1}^{N} u_{i,k} (1 - x_k^T v_i),  (5)

\min_{u,v} \sum_{i=1}^{C} \sum_{k=1}^{N} u_{i,k}^{m} (1 - x_k^T v_i),  (6)

\min_{u,v} \sum_{i=1}^{C} \sum_{k=1}^{N} u_{i,k} (1 - x_k^T v_i) + \lambda^{-1} \sum_{i=1}^{C} \sum_{k=1}^{N} u_{i,k} \log(u_{i,k}),  (7)

respectively, subject to Eq. (2) and

\|v_i\|_2 = 1.  (8)

The equivalent minimization problems (5) and (7) are described using Eq. (2) as the following maximization problems

\max_{u,v} \sum_{i=1}^{C} \sum_{k=1}^{N} u_{i,k} s_{i,k},  (9)

\max_{u,v} \sum_{i=1}^{C} \sum_{k=1}^{N} u_{i,k} s_{i,k} - \lambda^{-1} \sum_{i=1}^{C} \sum_{k=1}^{N} u_{i,k} \log(u_{i,k})  (10)

subject to Eq. (8), where

s_{i,k} = x_k^T v_i,  (11)

but the maximization model that corresponds to Eq. (6) is unclear because its fuzzification is based on the power of the membership. This is the motivation of this study. The spherical clustering algorithms for Eqs. (9), (6), and (10) are represented by the following algorithm:

Algorithm 1 (Spherical Clustering Algorithms)
STEP 1. Give the number of clusters C and the fuzzification parameter, m for sbFCM and λ for mseFCM, and set the initial cluster centers as v.
STEP 2. Calculate s by

s_{i,k} = x_k^T v_i.  (12)

STEP 3. Calculate u by

u_{i,k} = \begin{cases} 1 & (i = \arg\max_{1 \le j \le C} \{s_{j,k}\}), \\ 0 & (\text{otherwise}) \end{cases}  (13)

for msHCM,

u_{i,k} = \left[ \sum_{j=1}^{C} \left( \frac{1 - s_{i,k}}{1 - s_{j,k}} \right)^{\frac{1}{m-1}} \right]^{-1}  (14)

for sbFCM, and

u_{i,k} = \frac{\exp(\lambda s_{i,k})}{\sum_{j=1}^{C} \exp(\lambda s_{j,k})}  (15)

for mseFCM.
STEP 4. Calculate v by

v_i = \frac{\sum_{k=1}^{N} u_{i,k} x_k}{\|\sum_{k=1}^{N} u_{i,k} x_k\|_2}  (16)

for msHCM and mseFCM, and

v_i = \frac{\sum_{k=1}^{N} u_{i,k}^{m} x_k}{\|\sum_{k=1}^{N} u_{i,k}^{m} x_k\|_2}  (17)

for sbFCM.
STEP 5. Check the stopping criterion for (u, v). If the criterion is not satisfied, go to STEP 2. □

It is noted that sbFCM with m→ +∞ produces the fuzziestmembership 1/C.
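As a concrete sketch, one iteration of Algorithm 1 (membership updates (13)–(15) and center updates (16)–(17)) can be written as follows. This is an illustrative NumPy transcription assuming unit-norm data rows, not the author's code; Eq. (14) is applied in the equivalent normalized-power form u_{i,k} ∝ (1 − s_{i,k})^{−1/(m−1)}.

```python
import numpy as np

def spherical_step(X, v, method, m=1.5, lam=1.0):
    """One iteration of Algorithm 1 for unit-norm data rows X (N x p)
    and unit-norm centers v (C x p); s[k, i] = x_k^T v_i."""
    s = X @ v.T
    if method == "msHCM":                          # Eq. (13): crisp assignment
        u = np.zeros_like(s)
        u[np.arange(len(X)), s.argmax(axis=1)] = 1.0
    elif method == "sbFCM":                        # Eq. (14), rewritten as a
        w = (1.0 - s) ** (-1.0 / (m - 1.0))        # normalized power of (1 - s)
        u = w / w.sum(axis=1, keepdims=True)
    elif method == "mseFCM":                       # Eq. (15): softmax of lam * s
        e = np.exp(lam * s)
        u = e / e.sum(axis=1, keepdims=True)
    w = u ** m if method == "sbFCM" else u         # Eqs. (16)-(17): weighted sums,
    vc = w.T @ X                                   # renormalized onto the sphere
    return u, vc / np.linalg.norm(vc, axis=1, keepdims=True)

# Toy unit-norm data and initial centers (hypothetical)
X = np.array([[1., 0.], [0., 1.]])
v0 = np.array([[0.8, 0.6], [0.6, 0.8]])
u, v = spherical_step(X, v0, "sbFCM", m=1.5)
```

Iterating STEP 2–STEP 4 until the stopping criterion holds reproduces the full algorithm.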

A fuzzy classification function [9] shows how prototypical an arbitrary point in the data space is relative to a cluster by extending the membership to the whole space. This method can be used to classify new data into a cluster after clustering is performed initially with the given data, which is a type of supervised classification procedure. A fuzzy classification function is also useful for investigating the features of the corresponding clustering algorithm because it clarifies the classification situation in the whole space, rather than the memberships of only a finite number of data. The fuzzy classification functions u_i(x), for the membership of a datum x that belongs to the i-th cluster, which correspond to the algorithm above, are described as

u_i(x) = \begin{cases} 1 & (i = \arg\max_{1 \le j \le C} \{s_j(x)\}), \\ 0 & (\text{otherwise}) \end{cases}  (18)

for msHCM,

u_i(x) = \left[ \sum_{j=1}^{C} \left( \frac{1 - s_i(x)}{1 - s_j(x)} \right)^{\frac{1}{m-1}} \right]^{-1}  (19)

for sbFCM, and

u_i(x) = \frac{\exp(\lambda s_i(x))}{\sum_{j=1}^{C} \exp(\lambda s_j(x))}  (20)

for mseFCM, where

s_i(x) = x^T v_i.  (21)

These functions are used to compare the corresponding clustering algorithms with the algorithm proposed in this study from the perspective of their clustering rules.

III. PROPOSED METHOD

A. Basic Concept

This study proposes msbFCM based on the regularization of msHCM. First, we recall that eFCM is derived by regularizing HCM using the negative entropy term, where the minimization of the HCM objective function has a crisp solution and the negative entropy is convex in [0, 1] (Fig. 1(a)). Similarly, the maximizing model of spherical eFCM (10) can be derived by regularizing spherical HCM using the entropy term, where the maximization of the spherical HCM objective function has a crisp solution and the entropy is concave in [0, 1] (Fig. 1(b)). Next, we consider that bFCM is a type of HCM regularization with the equivalent form of the bFCM objective function (3) as follows:

\text{Eq. (3)} \Leftrightarrow \min_{u,v} \sum_{i=1}^{C} \sum_{k=1}^{N} u_{i,k} d_{i,k} + \sum_{i=1}^{C} \sum_{k=1}^{N} (u_{i,k}^{m} - u_{i,k}) d_{i,k},  (22)

where the first term is the objective function of HCM, whose minimizer is a crisp membership, and the second term is a convex regularizer (Fig. 1(c)). Similarly, a regularization term \sum_{i=1}^{C} \sum_{k=1}^{N} (u_{i,k}^{1/m} - u_{i,k}) s_{i,k}, which is concave in [0, 1] (Fig. 1(d)) under s_{i,k} \ge 0 and m > 1, is introduced into the maximizing model of spherical HCM (9) as

\max_{u,v} \sum_{i=1}^{C} \sum_{k=1}^{N} u_{i,k} s_{i,k} + \sum_{i=1}^{C} \sum_{k=1}^{N} (u_{i,k}^{1/m} - u_{i,k}) s_{i,k} \Leftrightarrow \max_{u,v} \sum_{i=1}^{C} \sum_{k=1}^{N} u_{i,k}^{1/m} s_{i,k}.  (23)

The following two remarks should be noted. First, this optimization problem is not equivalent to Eq. (6), which will be confirmed numerically in a later section, whereas msHCM and mseFCM are equivalent to sHCM and seFCM, respectively. Second, this optimization problem is valid under s_{i,k} \ge 0, but the applicability, which will be described fully in a future study, is not considered to be restricted greatly by this constraint. The first application of this method is document clustering, where each document is described as a normalized term-frequency vector or tf-idf weighted vector, the elements of which are all positive; hence, any inner product of a pair of objects is positive (x_k^T x_\ell > 0). Therefore,

s_{i,k} = x_k^T v_i = \frac{\sum_{\ell=1}^{N} u_{i,\ell}^{\frac{1}{m}} x_k^T x_\ell}{\|\sum_{\ell=1}^{N} u_{i,\ell}^{\frac{1}{m}} x_\ell\|_2} \ge 0.  (24)

The second application is clustering with a kernel. The best-known kernel, the Gaussian kernel, and its variants have elements in [0, 1] and diagonal elements of one, which means that all of the vectors are positive in the feature space; hence, s_{i,k} \ge 0.

Fig. 1. Regularizers. (a): Negative entropy u_{i,k} \log(u_{i,k}) for eFCM, which is convex. (b): Entropy -u_{i,k} \log(u_{i,k}) for seFCM, which is concave. (c): (u_{i,k}^{m} - u_{i,k}) d_{i,k} for sbFCM (shown with (m, d_{i,k}) = (2.5, 1)), which is convex. (d): (u_{i,k}^{1/m} - u_{i,k}) s_{i,k} for msbFCM (shown with (m, s_{i,k}) = (2.5, 1)), which is concave.
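The nonnegativity argument behind Eq. (24) can be checked quickly in code. The term-frequency vectors and membership values below are hypothetical; the point is only that u^{1/m}-weighted centers of nonnegative unit vectors yield nonnegative similarities.

```python
import numpy as np

# Hypothetical nonnegative term-frequency vectors, normalized onto the unit sphere
docs = np.array([[3., 1., 0.], [0., 2., 2.], [1., 0., 4.]])
X = docs / np.linalg.norm(docs, axis=1, keepdims=True)

# A u^{1/m}-weighted center of nonnegative unit vectors is itself nonnegative,
# so every similarity s_{i,k} = x_k^T v_i is nonnegative, matching Eq. (24)
m = 1.5
u = np.array([[0.7, 0.2, 0.1], [0.3, 0.8, 0.9]])  # arbitrary memberships (C=2, N=3)
vc = (u ** (1.0 / m)) @ X
v = vc / np.linalg.norm(vc, axis=1, keepdims=True)
s = X @ v.T
```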

In the next subsection, we derive a clustering algorithm based on the maximization problem (23) subject to the constraints (2) and (8).

B. Proposed Algorithm and Classification Function

The proposed algorithm is obtained by solving the optimization problem (23) subject to the constraints (2) and (8), where the Lagrangian L(u, v) is described as

L(u, v) = \sum_{i=1}^{C} \sum_{k=1}^{N} u_{i,k}^{\frac{1}{m}} s_{i,k} + \sum_{k=1}^{N} \gamma_k \left( 1 - \sum_{i=1}^{C} u_{i,k} \right) + \sum_{i=1}^{C} \nu_i \left( 1 - \|v_i\|_2^2 \right)  (25)

with the Lagrange multipliers (γ, ν). The necessary conditions of optimality are described as

\frac{\partial L(u, v)}{\partial u_{i,k}} = 0,  (26)

\frac{\partial L(u, v)}{\partial v_i} = 0,  (27)

\frac{\partial L(u, v)}{\partial \gamma_k} = 0,  (28)

\frac{\partial L(u, v)}{\partial \nu_i} = 0.  (29)

The optimal membership is obtained from Eq. (26) as

u_{j,k} = \frac{1}{m \gamma_k} s_{j,k}^{\frac{m}{m-1}}  (30)

with the Lagrange multiplier γ_k. Summing over the cluster index j \in \{1, \dots, C\} and considering Eq. (28) ⇔ Eq. (2), we have

\frac{1}{m \gamma_k} \sum_{j=1}^{C} s_{j,k}^{\frac{m}{m-1}} = 1 \Leftrightarrow \frac{1}{m \gamma_k} = \frac{1}{\sum_{j=1}^{C} s_{j,k}^{\frac{m}{m-1}}}.  (31)

By substituting Eq. (31) into Eq. (30), we can eliminate γ_k, which yields

u_{i,k} = \frac{s_{i,k}^{\frac{m}{m-1}}}{\sum_{j=1}^{C} s_{j,k}^{\frac{m}{m-1}}}.  (32)

The optimal cluster center is obtained from Eq. (27) as

v_i = \frac{1}{2 \nu_i} \sum_{k=1}^{N} u_{i,k}^{\frac{1}{m}} x_k  (33)

with the Lagrange multiplier ν_i. By taking the squared Euclidean norm and considering Eq. (29) ⇔ Eq. (8), we have

\frac{1}{(2 \nu_i)^2} \left\| \sum_{k=1}^{N} u_{i,k}^{\frac{1}{m}} x_k \right\|_2^2 = 1 \Leftrightarrow \frac{1}{2 \nu_i} = \frac{1}{\|\sum_{k=1}^{N} u_{i,k}^{\frac{1}{m}} x_k\|_2}.  (34)

By substituting Eq. (34) into Eq. (33), we can eliminate ν_i, which yields

v_i = \frac{\sum_{k=1}^{N} u_{i,k}^{\frac{1}{m}} x_k}{\|\sum_{k=1}^{N} u_{i,k}^{\frac{1}{m}} x_k\|_2}.  (35)

This analysis is summarized by the following algorithm:

Algorithm 2 (msbFCM)
STEP 1. Give the number of clusters C and the fuzzification parameter m, and set the initial cluster centers as v.
STEP 2. Calculate s by

s_{i,k} = x_k^T v_i.  (36)

STEP 3. Calculate u by

u_{i,k} = \frac{s_{i,k}^{\frac{m}{m-1}}}{\sum_{j=1}^{C} s_{j,k}^{\frac{m}{m-1}}}.  (37)

STEP 4. Calculate v by

v_i = \frac{\sum_{k=1}^{N} u_{i,k}^{\frac{1}{m}} x_k}{\|\sum_{k=1}^{N} u_{i,k}^{\frac{1}{m}} x_k\|_2}.  (38)

STEP 5. Check the stopping criterion for (u, v). If the criterion is not satisfied, go to STEP 2. □
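The steps above can be transcribed directly. The following is an illustrative NumPy sketch; the toy dataset, initialization, and the particular stopping criterion (maximum change in the centers) are assumptions, not part of the paper.

```python
import numpy as np

def msbfcm(X, v, m=1.5, tol=1e-9, max_iter=1000):
    """msbFCM (Algorithm 2) for unit-norm rows X lying in the first quadrant."""
    for _ in range(max_iter):
        s = X @ v.T                                    # STEP 2, Eq. (36)
        p = s ** (m / (m - 1.0))
        u = p / p.sum(axis=1, keepdims=True)           # STEP 3, Eq. (37)
        vc = (u ** (1.0 / m)).T @ X                    # STEP 4, Eq. (38)
        v_new = vc / np.linalg.norm(vc, axis=1, keepdims=True)
        if np.abs(v_new - v).max() < tol:              # STEP 5 (assumed criterion)
            return u, v_new
        v = v_new
    return u, v

# Hypothetical first-quadrant unit vectors: two groups near the two axes
raw = np.array([[1., 0.1], [1., 0.2], [0.1, 1.], [0.2, 1.]])
X = raw / np.linalg.norm(raw, axis=1, keepdims=True)
u, v = msbfcm(X, v=X[[0, 2]].copy(), m=1.5)
```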

Using the cluster centers obtained with this algorithm, the corresponding classification function u_i(x) is described as

u_i(x) = \frac{s_i(x)^{\frac{m}{m-1}}}{\sum_{j=1}^{C} s_j(x)^{\frac{m}{m-1}}},  (39)

where

s_i(x) = x^T v_i.  (40)

Thus, the proposed method converges on sHCM as m - 1 → +0, based on both the optimization problem and the algorithm, as follows. First, the optimization problem (23) of the proposed method with m = 1 obviously coincides with the optimization problem (9) of sHCM. Next, we consider the updating equation of the membership (32) for m - 1 → +0. If i = \arg\max_{j \le C} \{s_{j,k}\}, all of the ratios between s_{j,k} and s_{i,k} are less than one:

\frac{s_{j,k}}{s_{i,k}} < 1 \quad (j \ne i),  (41)

hence we have

\frac{1}{u_{i,k}} = \sum_{j=1}^{C} \left( \frac{s_{j,k}}{s_{i,k}} \right)^{\frac{m}{m-1}} = 1 + \sum_{j=1, j \ne i}^{C} \left( \frac{s_{j,k}}{s_{i,k}} \right)^{\frac{m}{m-1}} \to 1 \quad \left( m - 1 \to +0 \Leftrightarrow \frac{m}{m-1} \to +\infty \right),  (42)

i.e., u_{i,k} = 1. Otherwise, there is at least one cluster index j with s_{j,k} > s_{i,k}, and the ratio of s_{j,k} to s_{i,k} is greater than one:

\frac{s_{j,k}}{s_{i,k}} > 1,  (43)

and we obtain

\frac{1}{u_{i,k}} = \sum_{j=1}^{C} \left( \frac{s_{j,k}}{s_{i,k}} \right)^{\frac{m}{m-1}} \to +\infty \quad \left( m - 1 \to +0 \Leftrightarrow \frac{m}{m-1} \to +\infty \right),  (44)

i.e., u_{i,k} = 0. Therefore, the membership (32) of the proposed algorithm converges on the membership (13) of sHCM.
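This limiting behavior can be checked numerically on the membership update (32)/(37): as m → 1+, the exponent m/(m−1) blows up and the largest similarity dominates. The similarity values below are hypothetical.

```python
import numpy as np

def membership(s, m):
    """Membership update (32)/(37) for one datum, given similarities s to C centers."""
    p = s ** (m / (m - 1.0))
    return p / p.sum()

s = np.array([0.9, 0.8, 0.5])       # hypothetical similarities, C = 3
u_near_crisp = membership(s, 1.01)  # m - 1 -> +0: exponent m/(m-1) = 101
u_fuzzy = membership(s, 2.0)        # moderate fuzzifier: exponent 2
```

With m = 1.01 the membership of the most similar center is already indistinguishable from 1, mirroring Eqs. (42) and (44).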

However, the effect of a larger fuzzifier parameter in the proposed method is different from that in other methods, as follows. The memberships of sbFCM and mseFCM, Eqs. (14) and (15), converge on 1/C as m → +∞ and λ → +0, respectively, whereas that of the proposed method does not; instead,

\text{Eq. (37)} \to \frac{s_{i,k}}{\sum_{j=1}^{C} s_{j,k}} \quad \left( m \to +\infty \Leftrightarrow \frac{m}{m-1} \to 1 \right)  (45)

except for s_{i,k} = s_{j,k} (i ≠ j). The membership equation alone does not produce the fuzziest cluster with a larger fuzzifier parameter, but the overall algorithm produces it because the fraction of factors for x_k in the cluster center equation

\frac{\|\sum_{\ell=1}^{N} u_{i,\ell}^{\frac{1}{m}} x_\ell\|_2}{u_{i,k}^{\frac{1}{m}}} = \left\| \sum_{\ell=1}^{N} \left( \frac{u_{i,\ell}}{u_{i,k}} \right)^{\frac{1}{m}} x_\ell \right\|_2  (46)

converges on \|\sum_{\ell=1}^{N} x_\ell\|_2 as m → +∞, which is independent of the cluster index i. Thus, all of the cluster centers are collected into only one cluster, from which we obtain s_{i,k} = s_{j,k} (i ≠ j). Therefore, all of the memberships produce the fuzziest value 1/C.

IV. NUMERICAL EXAMPLE

This section provides some examples based on two artificial datasets. The first example illustrates the performance of the proposed method using an artificial dataset of three clusters, each of which contains 50 points in the first quadrant of the unit sphere (Fig. 2(a)). Using cluster number C = 3 and fuzzifier parameter m = 1.5, the proposed method partitioned this dataset adequately, as shown in Fig. 2(b), where the maximal memberships of the data are shown by squares, circles, and triangles, respectively.

Fig. 2. Artificial Dataset #1 and the clustering results obtained with msbFCM. (a): Data. (b): Clustering result.

The second set of examples compares the characteristic features of the proposed fuzzification method with conventional algorithms, using an artificial dataset consisting of 231 points scattered in the first quadrant of the unit sphere (Fig. 3(a)). Using cluster number C = 4 and fuzzifier parameter m = 1.5, the proposed method partitioned this dataset as shown in Fig. 3(b). msHCM, mseFCM with λ = 1.0, and sbFCM with m = 1.5 produced similar results. However, there were differences in the classification rules obtained from their classification functions (Figs. 4–7), where "(a)" shows the classification function for the top cluster and "(b)" shows that for the middle cluster. First, msHCM classified data with crisp memberships (Fig. 4), while the other methods (including the proposed method) classified the data with fuzzy memberships (Figs. 5–7). Next, although both the proposed method and sbFCM are fuzzified based on the power of the membership in their optimization problems (Eqs. (6) and (23)), their classification functions (Figs. 5 and 6) were different: sbFCM had the maximal and minimal values of its classification function at the cluster centers, while the proposed method did not. The classification rule obtained with the proposed method was similar to that produced using mseFCM, as shown in Fig. 7.

Fig. 3. Artificial Dataset #2 and the clustering results obtained with msbFCM. (a): Data. (b): Clustering result.

Fig. 4. Classification function results obtained with msHCM. (a): for the top cluster. (b): for the middle cluster.

Fig. 5. Classification function results obtained with msbFCM. (a): for the top cluster. (b): for the middle cluster.

The next set of examples shows the features of the fuzzifier parameter m in the proposed method. Using the same dataset (Fig. 3(a)), the classification function of the proposed method with fuzzifier parameter m = 1.1 is shown in Fig. 8, which is compared with the result with m = 1.5 in Fig. 5; a lower fuzzifier parameter value yielded a crisper classification function. Furthermore, the classification function obtained using the proposed method with m = 1.01 was similar to that produced with msHCM (Fig. 4), which confirmed the theoretical prediction that the proposed method converges on msHCM. The classification function of the proposed method with fuzzifier parameter m = 10^6 is shown in Fig. 9, which is compared with the result obtained with m = 1.5 in Fig. 5; a larger fuzzifier parameter value made the classification function fuzzier. The cluster centers obtained by the proposed method with m = 1.5 and the classification function of the proposed method with fuzzifier parameter m = 10^6 are shown in Fig. 10, which confirmed the theoretical prediction that the membership equation alone does not produce the fuzziest cluster with a larger fuzzifier parameter.

Fig. 6. Classification function results obtained with sbFCM. (a): for the top cluster. (b): for the middle cluster.

Fig. 7. Classification function results obtained with mseFCM. (a): for the top cluster. (b): for the middle cluster.

Fig. 8. Classification function results obtained with msbFCM using m = 1.1. (a): for the top cluster. (b): for the middle cluster.

Fig. 9. Classification function results obtained with msbFCM using m = 10^6. (a): for the top cluster. (b): for the middle cluster.

Fig. 10. Classification function results obtained with msbFCM using m = 10^6, where the cluster centers are obtained using the proposed method with m = 1.5. (a): for the top cluster. (b): for the middle cluster.

V. CONCLUSIONS

This study proposed a maximizing model of sbFCM (msbFCM) based on the regularization of msHCM. First, similar to the derivation of eFCM by regularizing HCM, it was shown that seFCM and mseFCM can be derived by regularizing sHCM and msHCM, respectively. Next, bFCM and sbFCM were interpreted as regularizations of HCM and sHCM, respectively. Next, the proposed method, a maximizing model of sbFCM (msbFCM), was constructed by supplementing msHCM with a regularization term. A theoretical analysis and numerical experiments were performed to show that the proposed method is not equivalent to sbFCM because the effect of its fuzzifier parameter is different from that found in other methods. Numerical examples demonstrated that the proposed method is valid.

In future research, the proposed method will be applied to document clustering and will be compared with conventional methods based on clustering accuracy. Next, the proposed method will be extended as follows. The introduction of a variable to control cluster sizes into the proposed method will capture clusters with different sizes, as HCM and eFCM were extended [8]. The proposed method will be kernelized in a similar manner to [14] in order to cluster data with nonlinear borders. The technique proposed in this study, in which the linear membership weights of msHCM are replaced with a less-than-one power of the membership, will also be applied to other methods, such as fuzzy nonmetric models [15], Windham's AP algorithm [16], possibilistic c-means [17], and co-clustering [18].

REFERENCES

[1] MacQueen, J. B.: "Some Methods of Classification and Analysis of Multivariate Observations," Proc. 5th Berkeley Symposium on Math. Stat. and Prob., pp. 281–297 (1967).

[2] Dunn, J.: "A Fuzzy Relative of the Isodata Process and Its Use in Detecting Compact, Well-Separated Clusters," Journal of Cybernetics, Vol. 3, No. 3, pp. 32–57 (1973).

[3] Bezdek, J.: Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum Press, New York (1981).

[4] Pal, N. R. and Bezdek, J. C.: "On Cluster Validity for Fuzzy c-Means Model," IEEE Trans. Fuzzy Syst., Vol. 1, pp. 370–379 (1995).

[5] Miyamoto, S. and Mukaidono, M.: "Fuzzy c-Means as a Regularization and Maximum Entropy Approach," Proc. 7th Int. Fuzzy Systems Association World Congress (IFSA '97), Vol. 2, pp. 86–92 (1997).

[6] Dhillon, I. S. and Modha, D. S.: "Concept Decompositions for Large Sparse Text Data Using Clustering," Machine Learning, Vol. 42, pp. 143–175 (2001).

[7] Miyamoto, S. and Mizutani, K.: "Fuzzy Multiset Model and Methods of Nonlinear Document Clustering for Information Retrieval," LNCS, Vol. 3131, pp. 273–283 (2004).

[8] Miyamoto, S., Ichihashi, H., and Honda, K.: Algorithms for Fuzzy Clustering, Springer (2008).

[9] Miyamoto, S. and Umayahara, K.: "Methods in Hard and Fuzzy Clustering," in Liu, Z.-Q. and Miyamoto, S. (eds), Soft Computing and Human-centered Machines, Springer-Verlag Tokyo (2000).

[10] Buchta, C., Kober, M., Feinerer, I., and Hornik, K.: "Spherical k-Means Clustering," Journal of Statistical Software, Vol. 50, No. 10 (2012).

[11] Banerjee, A., Dhillon, I. S., Ghosh, J., and Sra, S.: "Clustering on the Unit Hypersphere using von Mises-Fisher Distributions," Journal of Machine Learning Research, Vol. 6, pp. 1345–1382 (2005).

[12] Cai, D. and He, X.: "Manifold Adaptive Experimental Design for Text Categorization," IEEE Transactions on Knowledge and Data Engineering, Vol. 24, No. 4, pp. 707–719 (2012).

[13] Ghosh, G., Strehl, A., and Merugu, S.: "A Consensus Framework for Integrating Distributed Clusterings under Limited Knowledge Sharing," Proc. NSF Workshop on Next Generation Data Mining, pp. 99–108 (2002).

[14] Miyamoto, S. and Suizu, D.: "Fuzzy c-Means Clustering Using Kernel Functions in Support Vector Machines," J. Advanced Computational Intelligence and Intelligent Informatics, Vol. 7, No. 1, pp. 25–30 (2003).

[15] Roubens, M.: "Pattern Classification Problems and Fuzzy Sets," Fuzzy Sets and Syst., Vol. 1, pp. 239–253 (1978).

[16] Windham, M. P.: "Numerical Classification of Proximity Data with Assignment Measures," J. Classification, Vol. 2, pp. 157–172 (1985).

[17] Krishnapuram, R. and Keller, J. M.: "A Possibilistic Approach to Clustering," IEEE Trans. on Fuzzy Systems, Vol. 1, pp. 98–110 (1993).

[18] Oh, C., Honda, K., and Ichihashi, H.: "Fuzzy Clustering for Categorical Multivariate Data," Proc. IFSA World Congress and 20th NAFIPS International Conference, pp. 2154–2159 (2001).
