
Research Article
An Efficient Optimization Method for Solving Unsupervised Data Classification Problems

Computational and Mathematical Methods in Medicine, Volume 2015, Article ID 802754, 9 pages
http://dx.doi.org/10.1155/2015/802754

Parvaneh Shabanzadeh1,2 and Rubiyah Yusof1,2

1 Centre for Artificial Intelligence and Robotics, Universiti Teknologi Malaysia, 54100 Kuala Lumpur, Malaysia
2 Malaysia-Japan International Institute of Technology (MJIIT), Universiti Teknologi Malaysia, 54100 Kuala Lumpur, Malaysia

Correspondence should be addressed to Rubiyah Yusof; [email protected]

Received 13 March 2015; Revised 11 June 2015; Accepted 29 June 2015

Academic Editor: Andrzej Kloczkowski

Copyright © 2015 P. Shabanzadeh and R. Yusof. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Unsupervised data classification (or clustering) analysis is one of the most useful tools and a descriptive task in data mining that seeks to classify homogeneous groups of objects based on similarity. It is used in many medical disciplines and various applications. In general, there is no single algorithm that is suitable for all types of data, conditions, and applications: each algorithm has its own advantages, limitations, and deficiencies. Hence, research on novel and effective approaches for unsupervised data classification is still active. In this paper a heuristic algorithm, the Biogeography-Based Optimization (BBO) algorithm, which is inspired by the natural biogeographic distribution of different species, was adapted for data clustering problems by modifying its main operators. Like other population-based algorithms, the BBO algorithm starts with an initial population of candidate solutions to an optimization problem, for which an objective function is calculated. To evaluate the performance of the proposed algorithm, an assessment was carried out on six medical and real-life datasets, and the results were compared with those of eight well-known and recent unsupervised data classification algorithms. Numerical results demonstrate that the proposed evolutionary optimization algorithm is efficient for unsupervised data classification.

1. Introduction

Unsupervised data classification (or data clustering) is one of the most important and popular data analysis techniques; it refers to the process of grouping a set of data objects into clusters, in which the data within a cluster must have a high degree of similarity and the data of different clusters must have a high degree of dissimilarity [1]. The aim is to minimize the intracluster distance and maximize the intercluster distance. Clustering techniques have been applied in many areas, such as document clustering [2, 3], medicine [4, 5], biology [6], agriculture [7], marketing and consumer analysis [8, 9], geophysics [10], prediction [11], image processing [12–14], security and crime detection [15], and anomaly detection [16].

In a clustering problem, a dataset is divided into $k$ subgroups such that elements in one group are more similar to one another than to elements of another group [17]. Clustering can be defined as finding unknown patterns, knowledge, and information in a given dataset $A$ that were previously undiscovered, using some criterion function [18]. The problem is NP-complete when the number of clusters is greater than three [17]. Over the last two decades, many heuristic algorithms have been suggested, and it has been demonstrated that such algorithms are suitable for solving clustering problems in large datasets. For instance, the Tabu Search Algorithm for clustering is presented in [19], the Simulated Annealing Algorithm in [20], the Genetic Algorithm in [21], and the particle swarm optimization (PSO) algorithm, one of the most powerful optimization methods, in [22]. Fernández Martínez and García-Gonzalo [23–26] clearly explained how the PSO family parameters should be chosen, close to the second-order stability region. Hatamlou et al. [27] introduced the Big Bang Big Crunch algorithm for the clustering problem; this algorithm has its origin in one of the theories of the evolution of the universe, namely, the Big Bang and Big Crunch theory. An Ant Colony Optimization was developed to solve the clustering problem in [28]. Such algorithms are



able to find the global solution to the clustering problem. Application of the Gravitational Search Algorithm (GSA) [29] to the clustering problem was introduced in [30]. A comprehensive review of clustering algorithms can be found in [31–33].

In this paper a new heuristic clustering algorithm is developed. It is based on the evolutionary method called the Biogeography-Based Optimization (BBO) method proposed in [34]. The BBO method is inspired by the science of biogeography; it is a population-based evolutionary algorithm. Convergence results for this method and its practical applications can be found in [35]. The algorithm has demonstrated good performance on various optimization benchmark problems [36]. The proposed clustering algorithm is tested on six datasets from the UCI Machine Learning Repository [37], and the obtained results are compared with those obtained using other similar algorithms.

The rest of this paper is organized as follows. Section 2 describes the clustering problem. A brief overview of the BBO algorithm is given in Section 3. Section 4 presents the clustering algorithm. Experimental results are reported in Section 5. Finally, Section 6 presents conclusions and future research directions.

2. Cluster Analysis

In cluster analysis we suppose that we have been given a set $A$ of a finite number of points in $d$-dimensional space $R^d$, that is, $A = \{a_1, a_2, \ldots, a_n\}$, where $a_i \in R^d$, $i = 1, 2, \ldots, n$.

All clustering algorithms can be classified into two categories, namely hierarchical clustering and partitional clustering. Partitional clustering methods are the most popular class of center-based clustering methods, and partitional algorithms are often preferable to hierarchical ones: their advantage is their viability for applications involving large datasets, where the construction of a nested grouping of patterns is computationally prohibitive [38, 39]. A clustering problem is said to be a hard clustering problem if every data point belongs to exactly one cluster. Unlike hard clustering, in the fuzzy clustering problem the clusters are allowed to overlap and instances have degrees of membership in each cluster [40]. In this paper we will exclusively consider the hard unconstrained clustering problem. The subject of cluster analysis is therefore the partition of the set $A$ into a given number $q$ of disjoint subsets $B_i$, $i = 1, 2, \ldots, q$, with respect to predefined criteria, such that

$$B_i \neq \emptyset, \quad i = 1, 2, \ldots, q,$$
$$B_i \cap B_j = \emptyset, \quad \forall i \neq j,\ i, j = 1, 2, \ldots, q,$$
$$\bigcup_{i=1}^{q} B_i = A. \qquad (1)$$
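To make constraints (1) concrete, the following minimal Python sketch (all names illustrative, not from the paper) checks whether a label assignment describes a valid hard partition:

```python
import numpy as np

def is_valid_partition(labels, n_points, q):
    """Check the hard-clustering constraints of Eq. (1): every point gets
    exactly one of q cluster labels (so the clusters are disjoint and
    cover A), and no cluster B_i is empty."""
    labels = np.asarray(labels)
    if labels.shape != (n_points,):           # one label per point
        return False
    if np.any((labels < 0) | (labels >= q)):  # labels must index real clusters
        return False
    return np.unique(labels).size == q        # B_i nonempty for every i

# Example: 5 points, q = 2
print(is_valid_partition([0, 1, 0, 1, 1], n_points=5, q=2))  # True
print(is_valid_partition([0, 0, 0, 0, 0], n_points=5, q=2))  # False: B_2 is empty
```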

Each cluster $B_i$ can be identified by its center (or centroid). To determine the dissimilarity between objects, many distance metrics have been defined. The most popular distance metric is the Euclidean distance, and in this research we will also use the Euclidean metric to measure the dissimilarity between data objects. For two given objects $a_i$ and $a_j$ in $d$ dimensions, the distance is defined by [38] as

$$d(a_i, a_j) = \sqrt{\sum_{r=1}^{d} \left(a_i^r - a_j^r\right)^2}. \qquad (2)$$

Since there are different ways to cluster a given set of objects, a fitness function (cost function) for measuring the goodness of clustering should be defined. A famous and widely used function for this purpose is the total mean-square quantization error (MSE) [41], defined as follows:

$$\mathrm{MSE} = \sum_{j=1}^{q} \sum_{a_i \in B_j} d\left(a_i, B_j\right)^2, \qquad (3)$$

where $d(a_i, B_j)^2$ is the squared distance between object $a_i$ and the center $C_j$ of cluster $B_j$, found by calculating the mean value of the objects within the respective cluster.
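As an illustration of (2)-(3), here is a minimal Python sketch (names illustrative); it assumes each object is counted in its nearest cluster, which is how the cost of a candidate set of centers is evaluated:

```python
import numpy as np

def mse_fitness(data, centroids):
    """Cost of Eqs. (2)-(3): Euclidean distance from every object to every
    candidate center, each object counted in its nearest cluster, squared
    distances summed."""
    # dists[i, j] = d(a_i, C_j) as in Eq. (2)
    dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
    return np.sum(np.min(dists, axis=1) ** 2)  # MSE of Eq. (3)

# Toy example: six 2-D objects, q = 2 centers
data = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                 [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
centroids = np.array([[0.1, 0.1], [5.0, 5.0]])
print(mse_fitness(data, centroids))  # small value: both clusters are tight
```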

3. Biogeography-Based Optimization Algorithm

In this section we give a brief description of the Biogeography-Based Optimization (BBO) algorithm. BBO is a new evolutionary optimization method based on the study of the geographic distribution of biological organisms (biogeography) [34]. Organisms in BBO are called species, and their distribution is considered over time and space. Species can migrate between islands, which are called habitats. A habitat is characterized by a Habitat Suitability Index (HSI); the HSI in BBO is similar to the fitness in other population-based optimization algorithms and measures the goodness of a solution. The HSI is related to many features of the habitat [34]. Considering a global optimization problem and a population of candidate solutions (individuals), each individual can be considered a habitat and is characterized by its HSI. A habitat with a high HSI is a good solution (in a maximization problem). As in other evolutionary algorithms, good solutions share their features with others to produce a better population in the next generations; conversely, an individual with low fitness is unlikely to share features and likely to accept features. A suitability index variable (SIV) implies the habitability of a habitat. Just as many factors in the real world make a habitat more suitable to reside in than others, a solution has several SIVs which affect its goodness. An SIV is a feature of the solution and can be thought of as a gene in a genetic algorithm (GA). BBO consists of two main steps: migration and mutation. Migration is a probabilistic operator that is intended to improve a candidate solution [42, 43]. In BBO, the migration operator includes two different types, immigration and emigration, where for each solution in each

Computational and Mathematical Methods in Medicine 3

generation the rates of these types are adaptively determined based on the fitness of the solution. In BBO, each candidate solution $h_i$ has its own immigration rate $\lambda_i$ and emigration rate $\mu_i$, defined as follows:

$$\lambda_i = I\left(1 - \frac{k(i)}{n_{\mathrm{pop}}}\right),$$
$$\mu_i = E\left(\frac{k(i)}{n_{\mathrm{pop}}}\right), \qquad (4)$$

where $n_{\mathrm{pop}}$ is the population size and $k(i)$ is the rank of the $i$th individual in a list sorted by fitness from worst to best (1 is worst and $n_{\mathrm{pop}}$ is best). Also, $E$ and $I$ are the maximum possible emigration and immigration rates, which are typically set to one. A good candidate solution has a relatively high emigration rate and a low immigration rate, while the converse is true for a poor candidate solution. Therefore, if a given solution $h_i$ is selected to be modified (in the migration step), its immigration rate $\lambda_i$ is applied to probabilistically modify each SIV in that solution. The emigrating candidate solution $h_j$ is probabilistically chosen based on $\mu_j$. Different methods have been suggested for sharing information between habitats (candidate solutions) in [44], where migration is defined by

$$h_i(\mathrm{SIV}) = \alpha \cdot h_i(\mathrm{SIV}) + (1 - \alpha) \cdot h_j(\mathrm{SIV}), \qquad (5)$$

where $\alpha$ is a number between 0 and 1. It could be random or deterministic, or it could be proportional to the relative fitness of the solutions $h_i$ and $h_j$. Equation (5) means that the SIV (solution feature) of $h_i$ comes from a combination of its own SIV and the emigrating solution's SIV. Mutation is a probabilistic operator that randomly modifies a decision variable of a candidate solution; its purpose is to increase diversity in the population. The mutation rate is calculated as in [34]:

$$m_i = m_{\max}\left(1 - \frac{P_i}{P_{\max}}\right), \qquad (6)$$

where $P_i$ is the solution probability, $P_{\max} = \max_i P_i$, $i = 1, \ldots, n_{\mathrm{pop}}$, $n_{\mathrm{pop}}$ is the population size, and $m_{\max}$ is a user-defined parameter.

If $h_i(\mathrm{SIV})$ is selected for mutation (chosen probabilistically based on $m_i$), then $h_i(\mathrm{SIV})$ is replaced with a randomly generated SIV. Several options can be used for mutation; one option can be defined as

$$h_i(\mathrm{SIV}) = h_i(\mathrm{SIV}) + \rho, \qquad (7)$$

where

$$\rho = \partial \left(\max\left(h_i(\mathrm{SIV})\right) - \min\left(h_i(\mathrm{SIV})\right)\right) \sigma, \qquad (8)$$

Figure 1: The encoding of an example candidate solution: $[\,C_1^1\; C_1^2\; C_2^1\; C_2^2\; C_3^1\; C_3^2\,]$.

$\partial$ is a user-defined parameter near 0; $\max(h_i(\mathrm{SIV}))$ and $\min(h_i(\mathrm{SIV}))$ are the upper and lower bounds of each decision variable; and $\sigma$ is a random number normally distributed in the range (0, 1).
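As a compact illustration of these operators, the following Python sketch (illustrative, not the paper's code) mirrors (4), (5), and (7)-(8); it reads $\sigma$ as a standard normal draw and writes $\partial$ as `delta`:

```python
import numpy as np

rng = np.random.default_rng(0)

def migration_rates(n_pop, E=1.0, I=1.0):
    """Eq. (4): ranks k = 1 (worst) ... n_pop (best), assuming the
    population is sorted from worst to best."""
    k = np.arange(1, n_pop + 1)
    lam = I * (1.0 - k / n_pop)  # immigration: high for poor habitats
    mu = E * (k / n_pop)         # emigration: high for good habitats
    return lam, mu

def migrate(h_i, h_j, alpha):
    """Blended migration of Eq. (5): h_i inherits a mix of its own SIVs
    and the emigrating habitat's SIVs."""
    return alpha * h_i + (1.0 - alpha) * h_j

def mutate(h_i, lower, upper, delta=0.02):
    """Mutation of Eqs. (7)-(8): rho = delta * (variable range) * sigma."""
    sigma = rng.standard_normal(h_i.shape)  # one reading of "normally distributed"
    return h_i + delta * (upper - lower) * sigma

lam, mu = migration_rates(n_pop=5)
h = migrate(np.array([1.0, 2.0]), np.array([3.0, 4.0]), alpha=0.95)
h = mutate(h, lower=np.zeros(2), upper=np.ones(2) * 10)
print(lam, mu, h)
```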

Based on the above description, the main steps of the BBO algorithm can be described as follows.

Step 1 (initialization). First, set the initial parameters: the number of generations (used as the termination criterion), the population size (the number of habitats/islands/solutions), the number of design variables, the maximum immigration and emigration rates, and the mutation coefficient. Then create a random set of habitats (the population).

Step 2 (evaluation). Compute the corresponding HSI value for each habitat and rank the habitats on the basis of fitness.

Step 3 (update parameters). Update the immigration rate $\lambda_i$ and emigration rate $\mu_i$ for each island/solution. Bad solutions have low emigration rates and high immigration rates, whereas good solutions have high emigration rates and low immigration rates.

Step 4 (select islands). Probabilistically select the immigrating islands based on the immigration rates, and select the emigrating islands based on the emigration rates via roulette-wheel selection.

Step 5 (migration phase). Randomly change the selected features (SIVs) based on (4)-(5) and on the islands selected in the previous step.

Step 6 (mutation phase). Probabilistically carry out mutation for each solution based on the mutation probability, that is, based on (6).

Step 7 (check the termination criteria). If the termination criterion is not met, go to Step 2; otherwise, terminate.
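Steps 1-7 translate into a short main loop. The sketch below is a generic BBO skeleton in Python rather than the paper's exact implementation; in particular, it uses a constant mutation probability `m_max` in place of (6):

```python
import numpy as np

rng = np.random.default_rng(1)

def bbo_minimize(fitness, lower, upper, n_pop=100, n_gen=200, m_max=0.05):
    """Generic BBO loop (Steps 1-7). `fitness` maps a habitat (1-D array)
    to a cost to be minimized; `lower`/`upper` bound each variable."""
    dim = lower.size
    pop = lower + rng.random((n_pop, dim)) * (upper - lower)  # Step 1
    for _ in range(n_gen):
        cost = np.array([fitness(h) for h in pop])            # Step 2
        pop = pop[np.argsort(cost)]                           # best first
        k = np.arange(n_pop, 0, -1)     # rank: n_pop = best, 1 = worst
        lam = 1.0 - k / n_pop           # Step 3, Eq. (4) with I = 1
        mu = k / n_pop                  # Step 3, Eq. (4) with E = 1
        new_pop = pop.copy()
        for i in range(n_pop):
            for s in range(dim):
                if rng.random() < lam[i]:                      # Step 4
                    j = rng.choice(n_pop, p=mu / mu.sum())     # roulette wheel
                    alpha = rng.random()
                    new_pop[i, s] = alpha * pop[i, s] + (1 - alpha) * pop[j, s]  # Step 5
            if rng.random() < m_max:                           # Step 6 (simplified)
                s = rng.integers(dim)
                new_pop[i, s] = lower[s] + rng.random() * (upper[s] - lower[s])
        pop = new_pop                                          # Step 7: next generation
    cost = np.array([fitness(h) for h in pop])
    return pop[np.argmin(cost)], cost.min()

# Example: minimize the sphere function in 3-D
best, best_cost = bbo_minimize(lambda h: np.sum(h**2), np.full(3, -5.0), np.full(3, 5.0))
print(best, best_cost)
```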

4. BBO Algorithm for Data Clustering

In order to use the BBO algorithm for data clustering, one-dimensional arrays are used to encode the centres of the desired clusters and thus represent candidate solutions in the proposed algorithm. The length of each array is $q \times d$, where $q$ is the number of clusters and $d$ is the dimensionality of the considered dataset. Figure 1 presents an example of a candidate solution for a problem with 3 cluster centroids and 2 attributes.
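A habitat is thus simply a flat array of length $q \times d$. A tiny Python sketch of this encoding (values illustrative):

```python
import numpy as np

def decode(habitat, q, d):
    """Reshape a flat habitat of length q*d into a (q, d) array of cluster
    centers, matching the encoding of Figure 1 (centers stored one after
    another, each with its d attribute values)."""
    return np.asarray(habitat).reshape(q, d)

# Figure 1's layout: q = 3 clusters, d = 2 attributes
habitat = [1.0, 2.0,   # C_1 = (C_1^1, C_1^2)
           3.0, 4.0,   # C_2
           5.0, 6.0]   # C_3
print(decode(habitat, q=3, d=2))
```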


Then assume $\mathrm{POP}_i = \{C_1, C_2, \ldots, C_q\}$ is the $i$th candidate solution and $C_j = \{C_j^1, C_j^2, \ldots, C_j^d\}$ is the $j$th cluster centre of the $i$th candidate solution ($i = 1, 2, \ldots, n_{\mathrm{pop}}$; $j = 1, 2, \ldots, q$), where $n_{\mathrm{pop}}$ is the number of islands or candidate solutions, whose value in this work is set to 100. Each candidate solution therefore encodes the centers of all clusters.

A good initial population is important to the performance of BBO, as most population-based methods are affected by the quality of the initial population. Therefore, in the proposed algorithm, taking into consideration the nature of the input dataset, a high-quality population is created in a special way, as detailed in the pseudocode. One candidate solution is produced by dividing the whole dataset into $q$ equal sets; three are produced based on the minimum, maximum, and average values of the data objects in the dataset; and the other solutions are created randomly. This procedure creates a high-quality initial population and consequently ensures that the candidate solutions are spread over a wide area of the search space, which in turn increases the chance of finding (near-)global optima.
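A minimal Python sketch of one plausible reading of this seeded initialization (function and variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

def init_population(data, q, n_pop=100):
    """Seeded initialization: one habitat from an equal division of the
    data range, three from the minimum, maximum, and mean data values,
    and the rest uniformly random inside the attribute bounds."""
    d = data.shape[1]
    lo, hi = data.min(axis=0), data.max(axis=0)
    pop = np.empty((n_pop, q * d))
    # (i) q centers stepped evenly from the minimum towards the maximum
    pop[0] = np.linspace(lo, hi, q).ravel()
    # (ii)-(iv) centers replicated from the min, max, and mean of the data
    pop[1] = np.tile(lo, q)
    pop[2] = np.tile(hi, q)
    pop[3] = np.tile(data.mean(axis=0), q)
    # (v) remaining habitats drawn uniformly at random within the bounds
    pop[4:] = np.tile(lo, q) + rng.random((n_pop - 4, q * d)) * np.tile(hi - lo, q)
    return pop

pop = init_population(rng.random((50, 4)), q=3, n_pop=10)
print(pop.shape)  # (10, 12)
```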

To ensure that the best habitats/solutions are preserved, an elitist method is used to save the best individuals found so far into the new population; that is, an elitism strategy is applied in order to retain the best solutions in the population from one generation to the next. In the proposed algorithm, the new population is created by merging the initial population (old population) with the population resulting from the migration and mutation process (new population). Let POP be the entire initial population of candidate solutions, let NewPOP be the population changed by one iteration of BBO, and let $\gamma$ be the percentage of the initial population retained in the next iteration (set to 30% in this work). Then the number of kept habitats of the old population (KHOP) is

$$\mathrm{KHOP} = \mathrm{round}\left(\gamma \times n_{\mathrm{pop}}\right). \qquad (9)$$

And the number of kept habitats of the new population (KHCP) is

$$\mathrm{KHCP} = n_{\mathrm{pop}} - \mathrm{KHOP}. \qquad (10)$$

Hence the population of the next iteration is

$$\mathrm{POP} \leftarrow \left[\mathrm{POP}(1{:}\mathrm{KHOP});\ \mathrm{NewPOP}(1{:}\mathrm{KHCP})\right]. \qquad (11)$$
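A minimal Python sketch of this elitist replacement, following (9)-(11) (names illustrative; both populations are assumed sorted from best to worst):

```python
import numpy as np

def elitist_merge(old_pop, old_cost, new_pop, new_cost, gamma=0.3):
    """Eqs. (9)-(11): keep the best KHOP habitats of the old population and
    the best KHCP of the migrated/mutated population, then re-sort."""
    n_pop = old_pop.shape[0]
    khop = int(round(gamma * n_pop))  # Eq. (9)
    khcp = n_pop - khop               # Eq. (10)
    pop = np.vstack([old_pop[:khop], new_pop[:khcp]])          # Eq. (11)
    cost = np.concatenate([old_cost[:khop], new_cost[:khcp]])
    order = np.argsort(cost)          # best first for the next iteration
    return pop[order], cost[order]

# Example with n_pop = 2 and gamma = 0.5
old = np.array([[0.0, 0.0], [1.0, 1.0]]); old_c = np.array([0.5, 2.0])
new = np.array([[0.2, 0.1], [3.0, 3.0]]); new_c = np.array([0.3, 9.0])
pop, cost = elitist_merge(old, old_c, new, new_c, gamma=0.5)
print(cost)  # [0.3 0.5]
```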

Suppose $\mathrm{POP}_i$ is the $i$th candidate solution and $\mathrm{POP}_i(s)$ is the $s$th decision variable of $\mathrm{POP}_i$ (i.e., $C_r^t$, $t = 1, 2, \ldots, d$ and $r = 1, 2, \ldots, q$). Based on the above description, the pseudocode of the proposed method is shown in Algorithm 1.

Table 1: Summarized characteristics of the test datasets.

| Name of dataset | Number of data objects | Number of features | Number of clusters |
|---|---|---|---|
| Cancer | 683 | 9 | 2 (444, 239) |
| CMC | 1473 | 9 | 3 (629, 334, 510) |
| Glass | 214 | 9 | 6 (70, 76, 17, 13, 9, 29) |
| Iris | 150 | 4 | 3 (50, 50, 50) |
| Vowel | 871 | 3 | 6 (72, 89, 172, 151, 207, 180) |
| Wine | 178 | 13 | 3 (59, 71, 48) |

5. Experimental Results

The proposed method is implemented using MATLAB 7.6 on a T6400, 2 GHz, 2 GB RAM computer. To evaluate the performance of the proposed algorithm, the results obtained have been compared with those of other algorithms by applying them to some well-known datasets taken from the UCI Machine Learning Repository [37]. Six datasets are employed to validate the proposed method. These datasets, named Cancer, CMC, Iris, Glass, Wine, and Vowel, cover examples of data of low, medium, and high dimensions. A brief summary of the characteristics of these datasets is presented in Table 1. They have been used by many authors to study and evaluate the performance of their algorithms, and they can be described as follows.

Wisconsin Breast Cancer Dataset ($n = 683$, $d = 9$, $k = 2$). This dataset has 683 points with nine features, such as cell size uniformity, clump thickness, bare nuclei, shape uniformity, marginal adhesion, single epithelial cell size, bland chromatin, normal nucleoli, and mitoses. There are two clusters in this dataset: malignant and benign.

Contraceptive Method Choice Dataset (denoted as CMC, with $n = 1473$, $d = 10$, $k = 3$). This dataset is a subset of the 1987 National Indonesia Contraceptive Prevalence Survey. The samples are married women who either were not pregnant or did not know whether they were at the time of the interview. The problem is to predict the choice of current contraceptive method (no use, 629 objects; long-term methods, 334 objects; short-term methods, 510 objects) of a woman based on her demographic and socioeconomic characteristics.

Ripley's Glass Dataset ($n = 214$, $d = 9$, $k = 6$). This dataset has 214 points with nine features. The dataset has six different clusters: building windows float processed, building windows non-float processed, vehicle windows float processed, containers, tableware, and headlamps [41].

Iris Dataset ($n = 150$, $d = 4$, $k = 3$). This dataset consists of three different species of iris flower: Iris setosa, Iris virginica, and Iris versicolour. For each species, 50 samples with four


```
Create an initial population POP as follows:
  (i)   POP_1 = {C_j | C_j = Min(dataset) + (j - 1) * ((Max(dataset) - Min(dataset)) / q),
        j = 1, 2, ..., q}, where Min(dataset) and Max(dataset) correspond to the data items
        whose features are the minimum and maximum values over the whole dataset, respectively
  (ii)  POP_2 = a candidate solution built from the minimum of the dataset
  (iii) POP_3 = a candidate solution built from the maximum of the dataset
  (iv)  POP_4 = a candidate solution built from the mean of the dataset
  (v)   POP_5, ..., POP_npop: create all other candidate solutions randomly:
          for i = 5 : npop
            for j = 1 : d
              for g = 0 : q - 1
                POP_i(j + g*d) = random number in the range (LC(j), UC(j)), where LC(j) and
                UC(j) are the lower and upper bounds of the jth decision variable
                (i.e., LC(j) < C_r^j < UC(j), r = 1, ..., q)
              end for
            end for
          end for
Calculate the fitness (cost function) of POP according to (3) and sort the population from the
best (minimum) fitness to the worst (maximum)
for i = 1 : npop (for each solution POP_i)
  Compute the emigration rate mu_i proportional to the fitness of POP_i, with mu_i in [0, 1],
  so that mu_1 = 1 and mu_npop = 0, and the immigration rate lambda_i = 1 - mu_i,
  so that lambda_1 = 0 and lambda_npop = 1
end for
Set delta and gamma; compute KHOP and KHCP from (9) and (10)
while termination conditions are not met
  NewPOP <- POP
  for i = 1 : npop (for each solution NewPOP_i)
    for s = 1 : q*d (for each decision-variable index s)
      Decide probabilistically, with rate lambda_i, whether NewPOP_i immigrates
      if NewPOP_i is selected for immigrating
        Use the rates mu to probabilistically select the emigrating solution POP_j
        if POP_j is selected for emigrating
          NewPOP_i(s) <- POP_i(s) + alpha * (POP_j(s) - POP_i(s)), following (5),
          with alpha = random number in the range (0.9, 1)
        end if
      end if
      Decide probabilistically, with the mutation rate m_i of (6), whether to mutate NewPOP_i
      if NewPOP_i is selected for mutation then, following (7)-(8):
        for g = 1 : q
          if (s - (g - 1)*d) <= d
            NewPOP_i(s) = NewPOP_i(s) + N * sigma(s - (g - 1)*d), where N is a normally
            distributed random number (the sigma of (8)) and sigma(j) = delta * (UC(j) - LC(j)),
            j = 1, ..., d, with delta = 0.02 in this work
            break
          end if
        end for
      end if
    end for (next decision-variable index)
    Recalculate the fitness of NewPOP_i
  end for (next solution)
  Sort the population by fitness
  Build the next population POP by combining the old POP and NewPOP according to (11)
  Sort the new population by fitness
  Update and store the best solution found so far
end while
```

Algorithm 1: Pseudocode of the proposed method.


Table 2: Intracluster distances for real-life datasets.

| Dataset | Criteria | K-means | TS | SA | PSO | BB-BC | GA | GSA | ACO | BBO |
|---|---|---|---|---|---|---|---|---|---|---|
| Cancer | Average | 3032.2478 | 3251.37 | 3239.17 | 2981.7865 | 2964.3880 | 3249.46 | 2972.6631 | 3046.06 | 2964.3879 |
| | Best | 2986.9613 | 2982.84 | 2993.45 | 2974.4809 | 2964.3875 | 2999.32 | 2965.7639 | 2970.49 | 2964.3875 |
| | Worst | 5216.0895 | 3434.16 | 3421.95 | 3053.4913 | 2964.3890 | 3427.43 | 2993.2446 | 3242.01 | 2964.3887 |
| | Std | 315.1456 | 232.217 | 230.192 | 10.43651 | 0.00048 | 229.734 | 8.91860 | 90.50028 | 0.00036 |
| CMC | Average | 5543.4234 | 5993.59 | 5893.48 | 5547.8932 | 5574.7517 | 5756.59 | 5581.9450 | 5819.1347 | 5532.2550 |
| | Best | 5542.1821 | 5885.06 | 5849.03 | 5539.1745 | 5534.0948 | 5705.63 | 5542.2763 | 5701.9230 | 5532.2113 |
| | Worst | 5545.3333 | 5999.80 | 5966.94 | 5561.6549 | 5644.7026 | 5812.64 | 5658.7629 | 5912.4300 | 5532.432 |
| | Std | 1.5238 | 40.845 | 50.867 | 7.35617 | 39.4349 | 50.369 | 41.13648 | 45.63470 | 0.06480 |
| Glass | Average | 227.9779 | 283.79 | 282.19 | 230.49328 | 231.2306 | 255.38 | 233.5433 | 273.46 | 215.2097 |
| | Best | 215.6775 | 279.87 | 275.16 | 223.90546 | 223.8941 | 235.50 | 224.9841 | 269.72 | 210.6173 |
| | Worst | 260.8385 | 286.47 | 287.18 | 246.08915 | 243.2088 | 278.37 | 248.3672 | 280.08 | 233.9314 |
| | Std | 14.1389 | 4.19 | 4.238 | 4.79320 | 4.6501 | 12.47 | 6.13946 | 3.5848 | 3.525 |
| Iris | Average | 105.7290 | 97.8680 | 99.95 | 98.1423 | 96.7654 | 125.1970 | 96.7311 | 97.1715 | 96.5653 |
| | Best | 97.3259 | 97.3659 | 97.45 | 96.8793 | 96.6765 | 113.9865 | 96.6879 | 97.1007 | 96.5403 |
| | Worst | 128.4042 | 98.56949 | 102.01 | 99.7695 | 97.4287 | 139.7782 | 96.8246 | 97.8084 | 96.6609 |
| | Std | 12.3876 | 0.7286 | 2.018 | 0.84207 | 0.20456 | 14.563 | 0.02761 | 0.367 | 0.0394 |
| Vowel | Average | 153660.8071 | 162108.53 | 161566.28 | 153218.23418 | 151010.0339 | 159153.49 | 152931.8104 | 159458.1438 | 149072.9042 |
| | Best | 149394.8040 | 149468.26 | 149370.47 | 152461.56473 | 149038.5168 | 149513.73 | 151317.5639 | 149395.602 | 148967.2544 |
| | Worst | 168474.2659 | 165996.42 | 165986.42 | 158987.08231 | 153090.4407 | 165991.65 | 155346.6952 | 165939.8260 | 153051.96931 |
| | Std | 4123.04203 | 2846.235 | 0.645 | 2945.23167 | 1859.3235 | 3105.544 | 2486.70285 | 3485.3816 | 1377.311 |
| Wine | Average | 16963.0441 | 16785.46 | 17521.09 | 16316.2745 | 16303.4121 | 16530.53 | 16374.3091 | 16530.53381 | 16292.6782 |
| | Best | 16555.6794 | 16666.22 | 16473.48 | 16304.4858 | 16298.6736 | 16530.53 | 16313.8762 | 16530.53381 | 16292.6782 |
| | Worst | 23755.0495 | 16837.54 | 18083.25 | 16342.7811 | 16310.1135 | 16530.53 | 16428.8649 | 16530.53381 | 16292.6782 |
| | Std | 1180.6942 | 52.073 | 753.084 | 12.60275 | 2.6620 | 0 | 34.67122 | 0 | 0 |

Table 3: The obtained best centroid coordinates for the Cancer dataset.

| Cancer data | Cluster 1 | Cluster 2 |
|---|---|---|
| Feature A | 7.1156 | 2.8896 |
| Feature B | 6.6398 | 1.1278 |
| Feature C | 6.6238 | 1.2018 |
| Feature D | 5.6135 | 1.1646 |
| Feature E | 5.2402 | 1.9943 |
| Feature F | 8.0995 | 1.1215 |
| Feature G | 6.0789 | 2.0059 |
| Feature H | 6.0198 | 1.1014 |
| Feature I | 2.3282 | 1.0320 |

features each (sepal length, sepal width, petal length, and petal width) were collected [45].

Vowel Dataset ($n = 871$, $d = 3$, $k = 6$). This dataset consists of 871 Indian Telugu vowel sounds. It has three features, corresponding to the first, second, and third vowel frequencies, and six overlapping classes [45].

Wine Dataset ($n = 178$, $d = 13$, $k = 3$). This dataset describes the quality of wine from physicochemical properties

Table 4: The obtained best centroid coordinates for the CMC dataset.

| CMC data | Cluster 1 | Cluster 2 | Cluster 3 |
|---|---|---|---|
| Feature A | 43.6354 | 33.4957 | 24.4102 |
| Feature B | 3.0140 | 3.1307 | 3.0417 |
| Feature C | 3.4513 | 3.5542 | 3.5181 |
| Feature D | 4.582 | 3.6511 | 1.7947 |
| Feature E | 0.7965 | 0.7928 | 0.9275 |
| Feature F | 0.7629 | 0.6918 | 0.7928 |
| Feature G | 1.8245 | 2.0903 | 2.2980 |
| Feature H | 3.4355 | 3.29183 | 2.9754 |
| Feature I | 0.094 | 0.0573 | 0.037 |

in Italy. There are 178 instances with 13 continuous attributes grouped into 3 classes. There are no missing values for the attributes.

In this paper the performance of the proposed algorithm is compared with that of recent algorithms reported in the literature, including $K$-means [38], TS [19], SA [20], PSO [22, 39], BB-BC [27], GA [21], GSA [30], and ACO [46].

In this paper, two criteria are used to measure the quality of the solutions found by the clustering algorithms:

(i) Sum of intracluster distances. The distance between each data vector in a cluster and the centroid of that


Table 5: The obtained best centroid coordinates for the Glass dataset.

| Glass data | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5 | Cluster 6 |
|---|---|---|---|---|---|---|
| Feature A | 1.5260 | 1.5156 | 1.5228 | 1.5266 | 1.5203 | 1.5243 |
| Feature B | 11.9759 | 13.0863 | 14.6577 | 13.2229 | 13.7277 | 13.8085 |
| Feature C | 0.006 | 3.5272 | 0.0061 | 0.4232 | 3.5127 | 2.3414 |
| Feature D | 1.0514 | 1.3618 | 2.2170 | 1.5242 | 1.0249 | 2.5919 |
| Feature E | 72.0540 | 72.8710 | 73.2504 | 73.0610 | 71.9072 | 71.1423 |
| Feature F | 0.2552 | 0.5768 | 0.0299 | 0.3865 | 0.2067 | 2.5749 |
| Feature G | 14.3566 | 8.3588 | 8.6714 | 11.1471 | 9.4166 | 5.9948 |
| Feature H | 0.1808 | 0.0046 | 1.047 | 0.00979 | 0.0281 | 1.3373 |
| Feature I | 0.1254 | 0.0568 | 0.0196 | 0.1544 | 0.0498 | 0.2846 |

cluster is calculated and summed up, as defined in (3). This is also the fitness function used in this paper. Clearly, the smaller the value, the higher the quality of the clustering.

(ii) Error rate (ER). This is defined as the number of misplaced points over the total number of points in the dataset:

$$\mathrm{ER} = \frac{\sum_{i=1}^{n_{\mathrm{pop}}} \left(\text{if } B_i = C_i \text{ then } 0 \text{ else } 1\right)}{n_{\mathrm{pop}}} \times 100, \qquad (12)$$

where $n_{\mathrm{pop}}$ here denotes the total number of data points, and $B_i$ and $C_i$ denote the clusters to which the $i$th point belongs before and after clustering, respectively.
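The paper does not spell out how found clusters are matched to true classes; a common choice, sketched below in Python (illustrative), is to take the best relabeling over all permutations of cluster indices:

```python
import numpy as np
from itertools import permutations

def error_rate(true_labels, found_labels, q):
    """Eq. (12): fraction of misplaced points, in percent. Because cluster
    indices are arbitrary, the best relabeling of the found clusters is
    searched over all permutations (fine for small q)."""
    true_labels = np.asarray(true_labels)
    found_labels = np.asarray(found_labels)
    best_matches = 0
    for perm in permutations(range(q)):
        mapped = np.array([perm[c] for c in found_labels])
        best_matches = max(best_matches, int(np.sum(mapped == true_labels)))
    return 100.0 * (len(true_labels) - best_matches) / len(true_labels)

# Example: 6 points, q = 2; found labels are flipped and one point is off
print(error_rate([0, 0, 0, 1, 1, 1], [1, 1, 0, 0, 0, 0], q=2))  # ~16.67
```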

Since all the algorithms are stochastic, for each experiment 10 independent runs are carried out to indicate the stability and robustness of the algorithms with respect to their randomized nature. The average, best (minimum), and worst (maximum) solutions and the standard deviation of the solutions over the 10 runs of each algorithm on each dataset are used for the comparison.

Table 2 presents the intracluster distances obtained from the clustering algorithms for the datasets above. For the Cancer dataset, the average, best, and worst solutions of the BBO algorithm are 2964.3879, 2964.3875, and 2964.3887, respectively, which are much better than those of the other algorithms except BB-BC, which performs identically; this means that it provides the optimum value with a small standard deviation compared to the other methods. For the CMC dataset, the proposed method reaches an average of 5532.2550, while the other algorithms were unable to reach this solution. Also, the results obtained on the Glass dataset show that the BBO method converges to 215.2097 on average, while the average solutions of $K$-means, TS, SA, PSO, BB-BC, GA, GSA, and ACO are 227.9779, 283.79, 282.19, 230.49328, 231.2306, 255.38, 233.5433, and 273.46, respectively. For the Iris dataset, the average of the solutions found by BBO is 96.5653, while this value

Table 6: The obtained best centroid coordinates for the Iris dataset.

| Iris data | Cluster 1 | Cluster 2 | Cluster 3 |
|---|---|---|---|
| Feature A | 5.0150 | 5.9338 | 6.7343 |
| Feature B | 3.4185 | 2.7974 | 3.0681 |
| Feature C | 1.4681 | 4.4173 | 5.6299 |
| Feature D | 0.2380 | 1.4165 | 2.1072 |

for $K$-means, TS, SA, PSO, BB-BC, GA, GSA, and ACO is 105.7290, 97.8680, 99.95, 98.1423, 96.7654, 125.1970, 96.7311, and 97.1715, respectively. As seen from the results for the Vowel dataset, the BBO algorithm outperformed the $K$-means, TS, SA, PSO, BB-BC, GA, GSA, and ACO algorithms, with an average solution of 149072.9042. For the Wine dataset, the BBO algorithm achieved the optimum value of 16292.6782, which is significantly better than that of the other tested algorithms.

From Table 2 we can see that the BBO algorithm achieves good performance in terms of the average, best, and worst intracluster distances on these six datasets. This means that BBO can find good-quality solutions.

The best centroid coordinates obtained by the BBO algorithm on the test datasets are shown in Tables 3–8. Finally, Table 9 shows the error-rate values obtained by the algorithms for the real datasets. As seen from the results in Table 9, the BBO algorithm attains the minimum average error rate on all the real datasets. However, the topography of the clustering cost function (3) has a valley shape, so the solutions found by these methods were not global. Overall, the experimental results in the tables demonstrate that the proposed method is a practicable and good technique for data clustering.

6. Conclusions

In summary, this paper presents a new clustering algorithm based on the recently developed BBO heuristic algorithm, which is inspired by mathematical models of the science of biogeography (the study of the distribution of animals and plants over time and space).

To evaluate the performance of the BBO algorithm, it was tested on six real-life datasets and compared with eight other clustering algorithms. The experimental results indicate


Table 7: The obtained best centroid coordinates for the Vowel dataset.

| Vowel data | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5 | Cluster 6 |
|---|---|---|---|---|---|---|
| Feature A | 357.8349 | 375.8459 | 508.1135 | 407.9219 | 623.6778 | 439.6126 |
| Feature B | 2291.6435 | 2148.4110 | 1838.2133 | 1018.20145 | 1309.8038 | 987.4300 |
| Feature C | 2978.2399 | 2678.8524 | 2555.9085 | 2317.2847 | 2332.7767 | 2665.4154 |

Table 8: The obtained best centroid coordinates for the Wine dataset.

| Wine data | Cluster 1 | Cluster 2 | Cluster 3 |
|---|---|---|---|
| Feature A | 13.3856 | 12.7859 | 12.7093 |
| Feature B | 1.9976 | 2.3535 | 2.3219 |
| Feature C | 2.3150 | 2.4954 | 2.4497 |
| Feature D | 16.9836 | 19.5480 | 21.1983 |
| Feature E | 105.2124 | 98.9327 | 92.6449 |
| Feature F | 3.0255 | 2.0964 | 2.1366 |
| Feature G | 3.1380 | 1.4428 | 1.9187 |
| Feature H | 0.51050 | 0.31322 | 0.3520 |
| Feature I | 2.3769 | 1.7629 | 1.4966 |
| Feature J | 5.7760 | 5.8415 | 4.3213 |
| Feature K | 0.8339 | 1.1220 | 1.2229 |
| Feature L | 3.0686 | 1.9611 | 2.5417 |
| Feature M | 1137.4923 | 687.3041 | 463.8856 |

Table 9: Error rates (%) for real-life datasets.

| Dataset | K-means | PSO | GSA | BBO |
|---|---|---|---|---|
| Cancer | 4.08 | 5.11 | 3.74 | 3.70 |
| CMC | 54.49 | 54.41 | 55.67 | 54.22 |
| Glass | 37.71 | 45.59 | 41.39 | 36.47 |
| Iris | 17.80 | 12.53 | 10.04 | 10.03 |
| Vowel | 44.26 | 44.65 | 42.26 | 41.36 |
| Wine | 31.12 | 28.71 | 29.15 | 28.65 |

that the BBO optimization algorithm is a suitable and useful heuristic technique for data clustering. In order to improve the obtained results, as future work we plan to hybridize the proposed approach with other algorithms, and we intend to apply this method to other data mining problems.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors would like to thank the Ministry of Education Malaysia for funding this research project through a Research University Grant of Universiti Teknologi Malaysia (UTM), project titled "Based Syariah Compliance Chicken Slaughtering Process for Online Monitoring and Enforcement" (01G55). Thanks are also due to the Research Management Center (RMC) of UTM for providing an excellent research environment in which to complete this work.

References

[1] W. A. Barbakh, Y. Wu, and C. Fyfe, "Review of clustering algorithms," in Non-Standard Parameter Adaptation for Exploratory Data Analysis, vol. 249 of Studies in Computational Intelligence, pp. 7–28, Springer, Berlin, Germany, 2009.
[2] X. Cai and W. Li, "A spectral analysis approach to document summarization: clustering and ranking sentences simultaneously," Information Sciences, vol. 181, no. 18, pp. 3816–3827, 2011.
[3] M. Carullo, E. Binaghi, and I. Gallo, "An online document clustering technique for short web contents," Pattern Recognition Letters, vol. 30, no. 10, pp. 870–876, 2009.
[4] W. Halberstadt and T. S. Douglas, "Fuzzy clustering to detect tuberculous meningitis-associated hyperdensity in CT images," Computers in Biology and Medicine, vol. 38, no. 2, pp. 165–170, 2008.
[5] L. Liao, T. Lin, and B. Li, "MRI brain image segmentation and bias field correction based on fast spatially constrained kernel clustering approach," Pattern Recognition Letters, vol. 29, no. 10, pp. 1580–1588, 2008.
[6] E. R. Hruschka, R. J. G. B. Campello, and L. N. de Castro, "Evolving clusters in gene-expression data," Information Sciences, vol. 176, no. 13, pp. 1898–1927, 2006.
[7] P. Papajorgji, R. Chinchuluun, W. S. Lee, J. Bhorania, and P. M. Pardalos, "Clustering and classification algorithms in food and agricultural applications: a survey," in Advances in Modeling Agricultural Systems, pp. 433–454, Springer US, New York, NY, USA, 2009.
[8] Y.-J. Wang and H.-S. Lee, "A clustering method to identify representative financial ratios," Information Sciences, vol. 178, no. 4, pp. 1087–1097, 2008.
[9] J. Li, K. Wang, and L. Xu, "Chameleon based on clustering feature tree and its application in customer segmentation," Annals of Operations Research, vol. 168, no. 1, pp. 225–245, 2009.
[10] Y.-C. Song, H.-D. Meng, M. J. O'Grady, and G. M. P. O'Hare, "The application of cluster analysis in geophysical data interpretation," Computational Geosciences, vol. 14, no. 2, pp. 263–271, 2010.
[11] G. Cardoso and F. Gomide, "Newspaper demand prediction and replacement model based on fuzzy clustering and rules," Information Sciences, vol. 177, no. 21, pp. 4799–4809, 2007.
[12] S. Das and S. Sil, "Kernel-induced fuzzy clustering of image pixels with an improved differential evolution algorithm," Information Sciences, vol. 180, no. 8, pp. 1237–1256, 2010.
[13] R. I. John, P. R. Innocent, and M. R. Barnes, "Neuro-fuzzy clustering of radiographic tibia image data using type 2 fuzzy sets," Information Sciences, vol. 125, no. 1–4, pp. 65–82, 2000.
[14] S. Mitra and P. P. Kundu, "Satellite image segmentation with shadowed C-means," Information Sciences, vol. 181, no. 17, pp. 3601–3613, 2011.
[15] T. H. Grubesic, "On the application of fuzzy clustering for crime hot spot detection," Journal of Quantitative Criminology, vol. 22, no. 1, pp. 77–105, 2006.
[16] N. H. Park, S. H. Oh, and W. S. Lee, "Anomaly intrusion detection by clustering transactional audit streams in a host computer," Information Sciences, vol. 180, no. 12, pp. 2375–2389, 2010.
[17] A. J. Sahoo and Y. Kumar, "Modified teacher learning based optimization method for data clustering," in Advances in Signal Processing and Intelligent Recognition Systems, vol. 264 of Advances in Intelligent Systems and Computing, pp. 429–437, Springer, 2014.
[18] M. R. Garey, D. S. Johnson, and H. S. Witsenhausen, "The complexity of the generalized Lloyd-Max problem," IEEE Transactions on Information Theory, vol. 28, no. 2, pp. 255–256, 1982.
[19] K. S. Al-Sultan, "A Tabu search approach to the clustering problem," Pattern Recognition, vol. 28, no. 9, pp. 1443–1451, 1995.
[20] S. Z. Selim and K. Al-Sultan, "A simulated annealing algorithm for the clustering problem," Pattern Recognition, vol. 24, no. 10, pp. 1003–1008, 1991.
[21] C. A. Murthy and N. Chowdhury, "In search of optimal clusters using genetic algorithms," Pattern Recognition Letters, vol. 17, no. 8, pp. 825–832, 1996.
[22] J. Kennedy and R. Eberhart, "Particle swarm optimization," in Proceedings of the IEEE International Conference on Neural Networks, vol. 4, pp. 1942–1948, IEEE, Perth, Australia, December 1995.
[23] J. L. Fernández Martínez, E. García-Gonzalo, and J. P. Fernández-Álvarez, "Theoretical analysis of particle swarm trajectories through a mechanical analogy," International Journal of Computational Intelligence Research, vol. 4, no. 2, pp. 93–104, 2008.
[24] J. L. Fernández-Martínez and E. García-Gonzalo, "Stochastic stability and numerical analysis of two novel algorithms of the PSO family: PP-GPSO and RR-GPSO," International Journal on Artificial Intelligence Tools, vol. 21, no. 3, Article ID 1240011, 20 pages, 2012.
[25] J. L. Fernández Martínez and E. García-Gonzalo, "Stochastic stability analysis of the linear continuous and discrete PSO models," IEEE Transactions on Evolutionary Computation, vol. 15, no. 3, pp. 405–423, 2011.
[26] J. L. Fernández Martínez, E. García Gonzalo, Z. Fernández Muñiz, and T. Mukerji, "How to design a powerful family of particle swarm optimizers for inverse modelling," Transactions of the Institute of Measurement and Control, vol. 34, no. 6, pp. 705–719, 2012.
[27] A. Hatamlou, S. Abdullah, and M. Hatamlou, "Data clustering using big bang–big crunch algorithm," in Innovative Computing Technology, vol. 241 of Communications in Computer and Information Science, pp. 383–388, 2011.
[28] M. Dorigo, G. Di Caro, and L. M. Gambardella, "Ant algorithms for discrete optimization," Artificial Life, vol. 5, no. 2, pp. 137–172, 1999.
[29] E. Rashedi, H. Nezamabadi-pour, and S. Saryazdi, "GSA: a gravitational search algorithm," Information Sciences, vol. 179, no. 13, pp. 2232–2248, 2009.
[30] A. Hatamlou, S. Abdullah, and Z. Othman, "Gravitational search algorithm with heuristic search for clustering problems," in Proceedings of the 3rd IEEE Conference on Data Mining and Optimization (DMO '11), pp. 190–193, Putrajaya, Malaysia, June 2011.
[31] S. Das, A. Abraham, and A. Konar, "Metaheuristic pattern clustering—an overview," in Metaheuristic Clustering, vol. 178 of Studies in Computational Intelligence, pp. 1–62, Springer, Berlin, Germany, 2009.
[32] S.-H. Liao, P.-H. Chu, and P.-Y. Hsiao, "Data mining techniques and applications—a decade review from 2000 to 2011," Expert Systems with Applications, vol. 39, no. 12, pp. 11303–11311, 2012.
[33] S. J. Nanda and G. Panda, "A survey on nature inspired metaheuristic algorithms for partitional clustering," Swarm and Evolutionary Computation, vol. 16, pp. 1–18, 2014.
[34] D. Simon, "Biogeography-based optimization," IEEE Transactions on Evolutionary Computation, vol. 12, no. 6, pp. 702–713, 2008.
[35] W. Guo, M. Chen, L. Wang, S. S. Ge, and Q. Wu, "Design of migration operators for biogeography-based optimization and Markov analysis," submitted to Information Sciences.
[36] D. Simon, M. Ergezer, and D. Du, "Population distributions in biogeography-based optimization algorithms with elitism," in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC '09), pp. 991–996, October 2009.
[37] C. L. Blake and C. J. Merz, "UCI repository of machine learning databases," 1998, http://archive.ics.uci.edu/ml/datasets.html.
[38] A. K. Jain, "Data clustering: 50 years beyond K-means," Pattern Recognition Letters, vol. 31, no. 8, pp. 651–666, 2010.
[39] A. Hatamlou, "Black hole: a new heuristic optimization approach for data clustering," Information Sciences, vol. 222, pp. 175–184, 2013.
[40] S. Rana, S. Jasola, and R. Kumar, "A review on particle swarm optimization algorithms and their applications to data clustering," Artificial Intelligence Review, vol. 35, no. 3, pp. 211–222, 2011.
[41] A. M. Bagirov and J. Yearwood, "A new nonsmooth optimization algorithm for minimum sum-of-squares clustering problems," European Journal of Operational Research, vol. 170, no. 2, pp. 578–596, 2006.
[42] S. Yang, R. Wu, M. Wang, and L. Jiao, "Evolutionary clustering based vector quantization and SPIHT coding for image compression," Pattern Recognition Letters, vol. 31, no. 13, pp. 1773–1780, 2010.
[43] M. Ergezer, D. Simon, and D. Du, "Oppositional biogeography-based optimization," in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC '09), pp. 1009–1014, October 2009.
[44] S. Yazdani, J. Shanbehzadeh, and E. Aminian, "Feature subset selection using constrained binary/integer biogeography-based optimization," ISA Transactions, vol. 52, no. 3, pp. 383–390, 2013.
[45] S. Paterlini and T. Krink, "High performance clustering with differential evolution," in Proceedings of the Congress on Evolutionary Computation (CEC '04), vol. 2, pp. 2004–2011, Piscataway, NJ, USA, June 2004.
[46] P. S. Shelokar, V. K. Jayaraman, and B. D. Kulkarni, "An ant colony approach for clustering," Analytica Chimica Acta, vol. 509, no. 2, pp. 187–195, 2004.


Page 2: Research Article An Efficient Optimization Method for ...downloads.hindawi.com/journals/cmmm/2015/802754.pdf · 4. BBO Algorithm for Data Clustering In order to use BBO algorithm

2 Computational and Mathematical Methods in Medicine

able to find the global solution to the clustering Applica-tion of the Gravitational Search Algorithm (GSA) [29] forclustering problem has been introduced in [30] A compre-hensive review on clustering algorithms can be found in [31ndash33]

In this paper a new heuristic clustering algorithm isdeveloped It is based on the evolutionary method called theBiogeography-Based Optimization (BBO) method proposedin [34] The BBOmethod is inspired from the science of bio-geography it is a population-based evolutionary algorithmConvergence results for this method and its practical applica-tions can be found in [35] The algorithm has demonstratedgood performance on various optimization benchmark prob-lems [36] The proposed clustering algorithm is tested on sixdatasets from UCI Machine Learning Repository [37] andthe obtained results are compared with those obtained usingother similar algorithms

The rest of this paper is organized as follows Section 2describes clustering problem A brief overview of the BBOalgorithm is given in Section 3 Section 4 presents the cluster-ing algorithm Experimental results are reported in Section 5Finally Section 6 presents conclusions with future researchdirection

2 Cluster Analysis

In cluster analysis we suppose that we have been given a set119860of a finite number of points of 119889-dimensional space 119877119889 thatis 1198861 1198862 119886119899 where 119886119894 isin 119877119889 119894 = 1 2 119899

In all clustering algorithms can be classified into twocategories namely hierarchical clustering and partitionalclustering Partitional clustering methods are the most pop-ular class of center based clustering methods It has beenseen that partitional algorithm is more commendable ratherthan hierarchical clustering The advantage of partitionalalgorithm is its visibility in circumstances where applicationinvolving large dataset is used where construction of nestedgrouping of patterns is computationally prohibited [38 39]The clustering problem is said to be hard clustering if everydata point belongs to only one cluster Unlike hard clusteringin the fuzzy clustering problem the clusters are allowed tooverlap and instances have degrees of appearance in eachcluster [40] In this paperwewill exclusively consider the hardunconstrained clustering problem Therefore the subject ofcluster analysis is the partition of the set 119860 into a givennumber 119902 or disjoint subsets 119861119894 119894 = 1 2 119902 with respectto predefined criteria such that

119861119894 = 0 119894 = 1 2 119902

119861119894 cap119861119895 = 0 forall119894 = 119895 119894 119895 = 1 2 119902

119902

119894=1

119861119894 = 119860

(1)

Each cluster 119861119894 can be identified by its center (orcentroid) To determine the dissimilarity between objects

many distance metrics have been defined The most populardistance metric is the Euclidean distance In this researchwe will also use Euclidean metric as a distance metric tomeasure the dissimilarity between data objects So for giventwo objects 119886119894 and 119886119895 with 119889-dimensions the distance isdefined by [38] as

119889 (119886119894 119886119895) = radic

119889

sum

119903=1(119886119903119894minus 119886119903119895)2 (2)

Since there are different ways to cluster a given set ofobjects a fitness function (cost function) for measuring thegoodness of clustering should be defined A famous andwidely used function for this purpose is the totalmean-squarequantization error (MSE) [41] which is defined as follows

MSE =119902

sum

119895=1sum

119886119894isin119861119895

119889 (119886119894 119861119895)

2 (3)

where 119889(119886119894 119861119895)2 is the distance between object 119886119894 and the

center of cluster 119862119895(119861119895) to be found by calculating the meanvalue of objects within the respective cluster

3 Biogeography-BasedOptimization Algorithm

In this section we give a brief description of the Biogeog-raphy-Based Optimization (BBO) algorithm BBO is a newevolutionary optimization method based on the study ofgeographic distribution of biological organisms (biogeogra-phy) [34] Organisms in BBO are called species and theirdistribution is considered over time and space Species canmigrate between islands which are called habitat Habitatis characterized by a Habitat Suitability Index (HSI) HSIin BBO is similar to the fitness in other population-basedoptimization algorithms and measures the solution good-ness HSI is related to many features of the habitat [34]Considering a global optimization problem and a populationof candidate solutions (individuals) each individual can beconsidered as a habitat and is characterized by its HSI Ahabitat with high HSI is a good solution (maximizationproblem) Similar to other evolutionary algorithms goodsolutions share their features with others to produce a betterpopulation in the next generations Conversely an individualwith low fitness is unlikely to share features and likelyaccept features Suitability index variable (SIV) implies thehabitability of a habitat As there are many factors in thereal world which make a habitat more suitable to reside thanothers there are several SIVs for a solution which affectits goodness A SIV is a feature of the solution and can beimagined like a gene in GA BBO consists of two main stepsmigration andmutationMigration is a probabilistic operatorthat is intended to improve a candidate solution [42 43] InBBO the migration operator includes two different typesimmigration and emigration where for each solution in each

Computational and Mathematical Methods in Medicine 3

generation the rates of these types are adaptively determinedbased on the fitness of the solution In BBO each candidatesolution ℎ119894 has its own immigration rate 120582119894 and emigrationrate 120583119894 as follows

120582119894 = 119868(1minus119896 (119894)

119899pop)

120583119894 = 119864(119896 (119894)

119899pop)

(4)

where 119899pop is the population size and 119896(119894) shows the rank of119894th individual in a ranked list which has been sorted basedon the fitness of the population from the worst fitness tothe best one (1 is worst and 119899pop is best) Also 119864 and 119868 arethe maximum possible emigration and immigration rateswhich are typically set to one A good candidate solution haslatively high emigration rate and allows immigration ratewhile the converse is true for a poor candidate solutionTherefore if a given solution ℎ119894 is selected to be modified(in migration step) then its immigration rate 120582119894 is appliedto probabilistically modify each SIV in that solution Theemigrating candidate solution ℎ119895 is probabilistically chosenbased on 120583119895 Different methods have been suggested forsharing information between habitats (candidate solutions)in [44] where migration is defined by

ℎ119894 (SIV) = 120572lowastℎ119894 (SIV) + (1minus120572) lowast ℎ119895 (SIV) (5)

where 120572 is a number between 0 and 1 It could be randomor deterministic or it could be proportional to the relativefitness of the solutions ℎ119894 and ℎ119895 Equation (5) means that(feature solution) SIV of ℎ119894 comes from a combination ofits own SIV and the emigrating solutionrsquos SIV Mutation isa probabilistic operator that randomly modifies a decisionvariable of a candidate solutionThe purpose ofmutation is toincrease diversity among the populationThemutation rate iscalculated in [34]

119898119894 = 119898max (1 minus 119875119894119875max

) (6)

where 119875119894 is the solution probability and119875max = max119894119875119894 119894 =1 119899pop where 119899pop is the population size and 119898max isuser-defined parameter

If ℎ119894(SIV) is selected for mutation then the candidatesolution ℎ119895 is probabilistically chosen based on 119898119894 thusreplace ℎ119894(SIV) with a randomly generated SIV Severaloptions can be used for mutation but one option for imple-menting that can be defined as

ℎ119894 (SIV) = ℎ119894 (SIV) + 120588 (7)

where

$$\rho = \partial \left(\max(h_i(\text{SIV})) - \min(h_i(\text{SIV}))\right)\sigma \quad (8)$$

[Figure 1: The encoding of an example of a candidate solution: a one-dimensional array $(C_1^1, C_1^2, C_2^1, C_2^2, C_3^1, C_3^2)$ holding three cluster centres with two attributes each.]

Here $\partial$ is a user-defined parameter near 0, $\max(h_i(\text{SIV}))$ and $\min(h_i(\text{SIV}))$ are the upper and lower bounds of each decision variable, and $\sigma$ is a random number, normally distributed, in the range $(0, 1)$.
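The mutation of (6)-(8) might be sketched as follows (a Python illustration under our assumptions: per-variable bounds `lo` and `hi`, a constant mutation rate `m_i`, and clipping to the bounds):

```python
import numpy as np

def mutate(h_i, m_i, lo, hi, delta=0.02, rng=None):
    """Perturb a habitat's decision variables, eqs. (6)-(8).

    Each variable mutates with probability m_i by adding
    rho = delta * (upper - lower) * sigma, with sigma drawn from N(0, 1).
    """
    rng = np.random.default_rng() if rng is None else rng
    h = h_i.copy()
    for s in range(h.size):
        if rng.random() < m_i:
            sigma = rng.normal(0.0, 1.0)
            rho = delta * (hi[s] - lo[s]) * sigma
            h[s] = np.clip(h[s] + rho, lo[s], hi[s])  # clipping is our choice
    return h
```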

Based on the above description, the main steps of the BBO algorithm can be described as follows.

Step 1 (initialization). First introduce the initial parameters: the number of generations (used for the termination criterion), the population size (the number of habitats/islands/solutions), the number of design variables, the maximum immigration and emigration rates, and the mutation coefficient; then create a random set of habitats (the population).

Step 2 (evaluation). Compute the corresponding HSI value for each habitat and rank the habitats on the basis of fitness.

Step 3 (update parameters). Update the immigration rate $\lambda_i$ and emigration rate $\mu_i$ for each island/solution. Bad solutions have low emigration rates and high immigration rates, whereas good solutions have high emigration rates and low immigration rates.

Step 4 (select islands). Probabilistically select the immigrating islands based on the immigration rates, and select the emigrating islands based on the emigration rates via roulette-wheel selection.

Step 5 (migration phase). Randomly change the selected features (SIVs), based on (4)-(5) and on the islands selected in the previous step.

Step 6 (mutation phase). Probabilistically carry out mutation for each solution, based on the mutation probability in (6).

Step 7 (check the termination criteria). If the termination criterion is not met, go to Step 2; otherwise, terminate.
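Putting Steps 1-7 together, a compact driver loop could look like the sketch below (a generic BBO skeleton reusing the migration_rates and mutate helpers sketched above; it is not the authors' MATLAB implementation, and the per-solution mutation rate of (6) is simplified to a constant):

```python
import numpy as np

def bbo(cost, lo, hi, n_pop=100, n_gen=200, m_max=0.02,
        rng=np.random.default_rng(0)):
    """Generic BBO loop following Steps 1-7 (illustrative sketch)."""
    dim = lo.size
    pop = rng.uniform(lo, hi, size=(n_pop, dim))              # Step 1
    for _ in range(n_gen):                                    # fixed budget (Step 7)
        fit = np.array([cost(h) for h in pop])                # Step 2
        lam, mu = migration_rates(fit)                        # Step 3, eq. (4)
        new_pop = pop.copy()
        for i in range(n_pop):
            for s in range(dim):
                if rng.random() < lam[i]:                     # Step 4: immigrate?
                    j = rng.choice(n_pop, p=mu / mu.sum())    # roulette wheel
                    alpha = rng.uniform(0.0, 1.0)
                    new_pop[i, s] = (alpha * pop[i, s]        # Step 5, eq. (5)
                                     + (1.0 - alpha) * pop[j, s])
            new_pop[i] = mutate(new_pop[i], m_max, lo, hi, rng=rng)  # Step 6
        pop = new_pop
    fit = np.array([cost(h) for h in pop])
    return pop[np.argmin(fit)]
```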

4. BBO Algorithm for Data Clustering

In order to use the BBO algorithm for data clustering, one-dimensional arrays are used to encode the centres of the desired clusters and represent candidate solutions in the proposed algorithm. The length of each array is $q \times d$, where $q$ is the number of clusters and $d$ is the dimensionality of the considered dataset. Figure 1 presents an example of a candidate solution for a problem with 3 cluster centroids and 2 attributes.
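For instance, with the layout of Figure 1, a habitat can be decoded into its cluster centres by a simple reshape (the helper name and sample numbers are ours):

```python
import numpy as np

def decode(habitat, q, d):
    """Reshape a flat candidate solution into q cluster centres of dimension d."""
    return habitat.reshape(q, d)

# Example: 3 cluster centres with 2 attributes each, as in Figure 1
habitat = np.array([5.0, 3.4, 6.2, 2.9, 4.7, 3.1])
centres = decode(habitat, q=3, d=2)  # centres[j] is the (j+1)th cluster centre
```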


Then assume $\text{POP}_i = \{C_1, C_2, \dots, C_q\}$ is the $i$th candidate solution and $C_j = \{C_j^1, C_j^2, \dots, C_j^d\}$ is the $j$th cluster centre of the $i$th candidate solution ($i = 1, 2, \dots, n_{\text{pop}}$; $j = 1, 2, \dots, q$), where $n_{\text{pop}}$ is the number of islands or candidate solutions, whose value in this work is set to 100. Therefore each of these candidate solutions encodes the centres of all clusters.

A good initial population is important to the performance of BBO, as most population-based methods are affected by the quality of the initial population. Hence, in the proposed algorithm, taking into consideration the nature of the input datasets, a high-quality population is created in special ways, as shown in the pseudocode (Algorithm 1). One of the candidate solutions is produced by dividing the whole dataset into $q$ equal sets, three of them are produced based on the minimum, maximum, and average values of the data objects in the dataset, and the other solutions are created randomly. This procedure creates a high-quality initial population and consequently ensures that the candidate solutions are spread over a wide area of the search space, which in turn increases the chance of finding (near) global optima; see the sketch below.
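A sketch of this seeding scheme (our helper names; the evenly spaced first habitat follows the spirit of line (i) of Algorithm 1):

```python
import numpy as np

def init_population(data, q, n_pop=100, rng=np.random.default_rng(0)):
    """Seeded initial population for clustering; each row is a flat (q*d,) habitat."""
    n, d = data.shape
    lo, hi = data.min(axis=0), data.max(axis=0)
    pop = np.empty((n_pop, q * d))
    pop[0] = np.linspace(lo, hi, q).ravel()         # centres spread across the data range
    pop[1] = np.tile(lo, q)                         # habitat built from the minimum
    pop[2] = np.tile(hi, q)                         # habitat built from the maximum
    pop[3] = np.tile(data.mean(axis=0), q)          # habitat built from the mean
    pop[4:] = rng.uniform(np.tile(lo, q), np.tile(hi, q),
                          size=(n_pop - 4, q * d))  # the rest are random
    return pop
```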

To ensure that the best habitats/solutions are preserved, an elitist method is used to save the best individuals found so far into the new population. That is, an elitism strategy is employed to retain the best solutions in the population from one generation to the next. Therefore, in the proposed algorithm, the new population is created by merging the initial population (old population) and the population resulting from the migration and mutation process (new population). Suppose POP is the entire initial population of candidate solutions, NewPOP is the initial population changed by an iteration of BBO, and $\gamma$ is the percentage of the initial population that is kept for the next iteration (whose value in this work is 30%). Then the number of kept habitats of the old population (KHOP) is as follows:

$$\text{KHOP} = \operatorname{round}(\gamma \times n_{\text{pop}}) \quad (9)$$

and the number of kept habitats of the new population (KHCP) is as follows:

$$\text{KHCP} = n_{\text{pop}} - \text{KHOP} \quad (10)$$

Hence the population of the next iteration is formed as follows:

$$\text{POP} \longleftarrow \left[\text{POP}(1{:}\text{KHOP});\ \text{NewPOP}(1{:}\text{KHCP})\right] \quad (11)$$
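In code, this elitist merge might be written as follows (a sketch assuming both populations are already sorted from best to worst fitness):

```python
import numpy as np

def merge_populations(pop, new_pop, gamma=0.30):
    """Elitist recombination of old and new populations, eqs. (9)-(11)."""
    n_pop = pop.shape[0]
    khop = int(round(gamma * n_pop))   # kept habitats of the old population, eq. (9)
    khcp = n_pop - khop                # kept habitats of the new population, eq. (10)
    return np.vstack([pop[:khop], new_pop[:khcp]])  # eq. (11)
```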

Suppose $\text{POP}_i$ is the $i$th candidate solution and $\text{POP}_i(s)$ is the $s$th decision variable of $\text{POP}_i$ (i.e., $C_r^t$, $t = 1, 2, \dots, d$ and $r = 1, 2, \dots, q$). Based on the above description, the pseudocode of the proposed method is shown in Algorithm 1.

Table 1: Summarized characteristics of the test datasets.

Name of dataset | Number of data objects | Number of features | Number of clusters
Cancer | 683 | 9 | 2 (444, 239)
CMC | 1473 | 9 | 3 (629, 334, 510)
Glass | 214 | 9 | 6 (70, 76, 17, 13, 9, 29)
Iris | 150 | 4 | 3 (50, 50, 50)
Vowel | 871 | 3 | 6 (72, 89, 172, 151, 207, 180)
Wine | 178 | 13 | 3 (59, 71, 48)

5. Experimental Results

The proposed method is implemented using MATLAB 7.6 on a T6400, 2 GHz, 2 GB RAM computer. To evaluate the performance of the proposed algorithm, the results obtained have been compared with those of other algorithms by applying them to some well-known datasets taken from the UCI Machine Learning Repository [37]. Six datasets are employed to validate the proposed method. These datasets, named Cancer, CMC, Iris, Glass, Wine, and Vowel, cover examples of data of low, medium, and high dimensions. A summary of the characteristics of these datasets is presented in Table 1. They have been used by many authors to study and evaluate the performance of their algorithms, and they can be described as follows.

Wisconsin Breast Cancer Dataset ($n = 683$, $d = 9$, $k = 2$). This dataset has 683 points with nine features: cell size uniformity, clump thickness, bare nuclei, cell shape uniformity, marginal adhesion, single epithelial cell size, bland chromatin, normal nucleoli, and mitoses. There are two clusters in this dataset: malignant and benign.

Contraceptive Method Choice Dataset (denoted as CMC, with $n = 1473$, $d = 9$, $k = 3$). This dataset is a subset of the 1987 National Indonesia Contraceptive Prevalence Survey. The samples are married women who either were not pregnant or did not know whether they were at the time of the interview. The problem is to predict the current contraceptive method choice (no use, 629 objects; long-term methods, 334 objects; short-term methods, 510 objects) of a woman based on her demographic and socioeconomic characteristics.

Ripley's Glass Dataset ($n = 214$, $d = 9$, $k = 6$). This dataset has 214 points with nine features. The dataset has six different clusters: building windows float processed, building windows non-float processed, vehicle windows float processed, containers, tableware, and headlamps [41].

Iris Dataset ($n = 150$, $d = 4$, $k = 3$). This dataset consists of three different species of iris flower: Iris setosa, Iris virginica, and Iris versicolour. For each species, 50 samples with four


Create an initial population POP as follows:
(i) POP_1 = {C_j | C_j = Min(dataset) + (j − 1) × ((Max(dataset) − Min(dataset))/q), j = 1, 2, ..., q}, where Min(dataset) and Max(dataset) correspond to the data items whose features are the minimum and maximum values in the whole dataset, respectively
(ii) POP_2 = a candidate solution built from the minimum of the dataset
(iii) POP_3 = a candidate solution built from the maximum of the dataset
(iv) POP_4 = a candidate solution built from the mean of the dataset
(v) POP_5, ..., POP_npop: create all other candidate solutions randomly, as follows:
    for i = 5, ..., npop
        for j = 1, ..., d
            for g = 0, ..., q − 1
                POP_i(j + g × d) = random number in the range (LC(j), UC(j)), where LC(j) and UC(j) are the lower and upper bounds of each decision variable (i.e., LC(j) < C_r^j < UC(j), r = 1, ..., q)
            end for
        end for
    end for
Calculate the fitness of POP (the cost function) according to (3) and sort the population from the best (minimum) fitness to the worst (maximum)
for i = 1 to npop (for each solution POP_i)
    Compute the emigration rate mu_i proportional to the fitness of POP_i, where mu_i ∈ [0, 1], so that mu_1 = 1, mu_npop = 0, and the immigration rate lambda_i = 1 − mu_i, so that lambda_1 = 0, lambda_npop = 1
end for
Set delta and gamma, and compute KHOP and KHCP based on (9)-(10)
while termination conditions are not met
    NewPOP ← POP
    for i = 1 to npop (for each solution NewPOP_i)
        for s = 1 to q × d (for each decision variable index s)
            Probabilistically decide, using lambda_i, whether NewPOP_i immigrates
            if NewPOP_i is selected for immigrating
                Use the rates mu to probabilistically select the emigrating solution POP_j
                if POP_j is selected for emigrating
                    NewPOP_i(s) ← POP_i(s) + alpha × (POP_j(s) − POP_i(s)), based on (5), with alpha = a random number in the range (0.9, 1)
                end if
            end if
            Probabilistically decide, using m_i from (6), whether to mutate NewPOP_i
            if NewPOP_i is selected for mutation, then, based on (7)-(8):
                for g = 1, ..., q
                    if (s − (g − 1) × d) <= d
                        NewPOP_i(s) = NewPOP_i(s) + sigma × sig(s − (g − 1) × d), where sig(j) = delta × (UC(j) − LC(j)), j = 1, ..., d; delta is set to 0.02 in this work, and LC(j) and UC(j) are the lower and upper bounds of each decision variable
                        break
                    end if
                end for
            end if
        end for (repeat for the next decision variable index)
        Recalculate the fitness of NewPOP_i
    end for (repeat for the next solution)
    Sort the new population based on the fitness function
    Make the new population POP by combining the old POP and NewPOP based on (11)
    Sort the new population based on the fitness function
    Update and store the best solution found so far
end while

Algorithm 1: Pseudocode of the proposed method.


Table 2: Intracluster distances for real life datasets.

Dataset | Criteria | K-means | TS | SA | PSO | BB-BC | GA | GSA | ACO | BBO
Cancer | Average | 3032.2478 | 3251.37 | 3239.17 | 2981.7865 | 2964.3880 | 3249.46 | 2972.6631 | 3046.06 | 2964.3879
Cancer | Best | 2986.9613 | 2982.84 | 2993.45 | 2974.4809 | 2964.3875 | 2999.32 | 2965.7639 | 2970.49 | 2964.3875
Cancer | Worst | 5216.0895 | 3434.16 | 3421.95 | 3053.4913 | 2964.3890 | 3427.43 | 2993.2446 | 3242.01 | 2964.3887
Cancer | Std | 315.1456 | 232.217 | 230.192 | 10.43651 | 0.00048 | 229.734 | 8.91860 | 90.50028 | 0.00036
CMC | Average | 5543.4234 | 5993.59 | 5893.48 | 5547.8932 | 5574.7517 | 5756.59 | 5581.9450 | 5819.1347 | 5532.2550
CMC | Best | 5542.1821 | 5885.06 | 5849.03 | 5539.1745 | 5534.0948 | 5705.63 | 5542.2763 | 5701.9230 | 5532.2113
CMC | Worst | 5545.3333 | 5999.80 | 5966.94 | 5561.6549 | 5644.7026 | 5812.64 | 5658.7629 | 5912.4300 | 5532.432
CMC | Std | 1.5238 | 40.845 | 50.867 | 7.35617 | 39.4349 | 50.369 | 41.13648 | 45.63470 | 0.06480
Glass | Average | 227.9779 | 283.79 | 282.19 | 230.49328 | 231.2306 | 255.38 | 233.5433 | 273.46 | 215.2097
Glass | Best | 215.6775 | 279.87 | 275.16 | 223.90546 | 223.8941 | 235.50 | 224.9841 | 269.72 | 210.6173
Glass | Worst | 260.8385 | 286.47 | 287.18 | 246.08915 | 243.2088 | 278.37 | 248.3672 | 280.08 | 233.9314
Glass | Std | 14.1389 | 4.19 | 4.238 | 4.79320 | 4.6501 | 12.47 | 6.13946 | 3.5848 | 3.525
Iris | Average | 105.7290 | 97.8680 | 99.95 | 98.1423 | 96.7654 | 125.1970 | 96.7311 | 97.1715 | 96.5653
Iris | Best | 97.3259 | 97.3659 | 97.45 | 96.8793 | 96.6765 | 113.9865 | 96.6879 | 97.1007 | 96.5403
Iris | Worst | 128.4042 | 98.56949 | 102.01 | 99.7695 | 97.4287 | 139.7782 | 96.8246 | 97.8084 | 96.6609
Iris | Std | 12.3876 | 0.7286 | 2.018 | 0.84207 | 0.20456 | 14.563 | 0.02761 | 0.367 | 0.0394
Vowel | Average | 153660.8071 | 162108.53 | 161566.28 | 153218.23418 | 151010.0339 | 159153.49 | 152931.8104 | 159458.1438 | 149072.9042
Vowel | Best | 149394.8040 | 149468.26 | 149370.47 | 152461.56473 | 149038.5168 | 149513.73 | 151317.5639 | 149395.602 | 148967.2544
Vowel | Worst | 168474.2659 | 165996.42 | 165986.42 | 158987.08231 | 153090.4407 | 165991.65 | 155346.6952 | 165939.8260 | 153051.96931
Vowel | Std | 4123.04203 | 2846.235 | 0.645 | 2945.23167 | 1859.3235 | 3105.544 | 2486.70285 | 3485.3816 | 1377.311
Wine | Average | 16963.0441 | 16785.46 | 17521.09 | 16316.2745 | 16303.4121 | 16530.53 | 16374.3091 | 16530.53381 | 16292.6782
Wine | Best | 16555.6794 | 16666.22 | 16473.48 | 16304.4858 | 16298.6736 | 16530.53 | 16313.8762 | 16530.53381 | 16292.6782
Wine | Worst | 23755.0495 | 16837.54 | 18083.25 | 16342.7811 | 16310.1135 | 16530.53 | 16428.8649 | 16530.53381 | 16292.6782
Wine | Std | 1180.6942 | 52.073 | 753.084 | 12.60275 | 2.6620 | 0 | 34.67122 | 0 | 0

Table 3: The obtained best centroids coordinates for Cancer data.

Cancer data | Cluster 1 | Cluster 2
Feature A | 7.1156 | 2.8896
Feature B | 6.6398 | 1.1278
Feature C | 6.6238 | 1.2018
Feature D | 5.6135 | 1.1646
Feature E | 5.2402 | 1.9943
Feature F | 8.0995 | 1.1215
Feature G | 6.0789 | 2.0059
Feature H | 6.0198 | 1.1014
Feature I | 2.3282 | 1.0320

features each (sepal length, sepal width, petal length, and petal width) were collected [45].

Vowel Dataset ($n = 871$, $d = 3$, $k = 6$). It consists of 871 Indian Telugu vowel sounds. The dataset has three features, corresponding to the first, second, and third vowel frequencies, and six overlapping classes [45].

Wine Dataset ($n = 178$, $d = 13$, $k = 3$). This dataset describes the quality of wine from physicochemical properties

Table 4: The obtained best centroids coordinates for CMC data.

CMC data | Cluster 1 | Cluster 2 | Cluster 3
Feature A | 43.6354 | 33.4957 | 24.4102
Feature B | 3.0140 | 3.1307 | 3.0417
Feature C | 3.4513 | 3.5542 | 3.5181
Feature D | 4.582 | 3.6511 | 1.7947
Feature E | 0.7965 | 0.7928 | 0.9275
Feature F | 0.7629 | 0.6918 | 0.7928
Feature G | 1.8245 | 2.0903 | 2.2980
Feature H | 3.4355 | 3.29183 | 2.9754
Feature I | 0.094 | 0.0573 | 0.037

in Italy. There are 178 instances with 13 continuous attributes, grouped into 3 classes. There are no missing values for the attributes.

In this paper the performance of the proposed algorithm is compared with recent algorithms reported in the literature, including $K$-means [38], TS [19], SA [20], PSO [22, 39], BB-BC [27], GA [21], GSA [30], and ACO [46].

Two criteria are used to measure the quality of the solutions found by the clustering algorithms:

(i) Sum of intracluster distances: the distance between each data vector in a cluster and the centroid of that


Table 5: The obtained best centroids coordinates for Glass data.

Glass data | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5 | Cluster 6
Feature A | 1.5260 | 1.5156 | 1.5228 | 1.5266 | 1.5203 | 1.5243
Feature B | 11.9759 | 13.0863 | 14.6577 | 13.2229 | 13.7277 | 13.8085
Feature C | 0.006 | 3.5272 | 0.0061 | 0.4232 | 3.5127 | 2.3414
Feature D | 1.0514 | 1.3618 | 2.2170 | 1.5242 | 1.0249 | 2.5919
Feature E | 72.0540 | 72.8710 | 73.2504 | 73.0610 | 71.9072 | 71.1423
Feature F | 0.2552 | 0.5768 | 0.0299 | 0.3865 | 0.2067 | 2.5749
Feature G | 14.3566 | 8.3588 | 8.6714 | 11.1471 | 9.4166 | 5.9948
Feature H | 0.1808 | 0.0046 | 1.047 | 0.00979 | 0.0281 | 1.3373
Feature I | 0.1254 | 0.0568 | 0.0196 | 0.1544 | 0.0498 | 0.2846

cluster is calculated and summed up, as defined in (3); this sum is also the evaluation fitness in this paper. Clearly, the smaller the value, the higher the quality of the clustering (see the sketch below).

(ii) Error rate (ER): the percentage of misplaced points over the total number of points in the dataset, computed as

$$\text{ER} = \frac{\sum_{i=1}^{n_{\text{pop}}} \big(\text{if } (B_i = C_i) \text{ then } 0 \text{ else } 1\big)}{n_{\text{pop}}} \times 100 \quad (12)$$

where $n_{\text{pop}}$ here is the total number of data points, and $B_i$ and $C_i$ denote the clusters to which the $i$th point belongs before and after clustering, respectively.
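Both criteria are straightforward to compute; the sketch below illustrates them (our helper names; points are assigned to their nearest centre, and the predicted labels are assumed to be matched to the true label ids):

```python
import numpy as np

def intracluster_sum(data, centres):
    """Criterion (i): sum of distances from each point to its nearest centre."""
    dists = np.linalg.norm(data[:, None, :] - centres[None, :, :], axis=2)
    return dists.min(axis=1).sum()

def error_rate(true_labels, pred_labels):
    """Criterion (ii), eq. (12): percentage of misplaced points."""
    true_labels = np.asarray(true_labels)
    pred_labels = np.asarray(pred_labels)
    return 100.0 * np.mean(true_labels != pred_labels)
```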

Since all the algorithms are stochastic, for each experiment 10 independent runs are carried out to indicate the stability and robustness of the algorithms against their randomized nature. The average, best (minimum), and worst (maximum) solutions, as well as the standard deviation of the solutions over the 10 runs of each algorithm, are reported for each of the comparison datasets.

Table 2 presents the intracluster distances obtained by the clustering algorithms for the datasets above. For the Cancer dataset, the average, best, and worst solutions of the BBO algorithm are 2964.3879, 2964.3875, and 2964.3887, respectively, which are much better than those of the other algorithms except BB-BC, which attains nearly the same values. This means that BBO provides the optimum value with a small standard deviation when compared to those obtained by the other methods. For the CMC dataset, the proposed method reaches an average of 5532.2550, while the other algorithms were unable to reach this solution. Also, the results obtained on the Glass dataset show that the BBO method achieves an average of 215.2097 over all runs, while the average solutions of K-means, TS, SA, PSO, BB-BC, GA, GSA, and ACO are 227.9779, 283.79, 282.19, 230.49328, 231.2306, 255.38, 233.5433, and 273.46, respectively. For the Iris dataset, the average of the solutions found by BBO is 96.5653, while this value

Table 6: The obtained best centroids coordinates for Iris data.

Iris data | Cluster 1 | Cluster 2 | Cluster 3
Feature A | 5.0150 | 5.9338 | 6.7343
Feature B | 3.4185 | 2.7974 | 3.0681
Feature C | 1.4681 | 4.4173 | 5.6299
Feature D | 0.2380 | 1.4165 | 2.1072

for K-means, TS, SA, PSO, BB-BC, GA, GSA, and ACO is 105.7290, 97.8680, 99.95, 98.1423, 96.7654, 125.1970, 96.7311, and 97.1715, respectively. As seen from the results for the Vowel dataset, the BBO algorithm outperformed the K-means, TS, SA, GA, PSO, BB-BC, GSA, and ACO algorithms with the average solution 149072.9042. For the Wine dataset, the BBO algorithm achieved the optimum value of 16292.6782, which is significantly better than the other tested algorithms.

From Table 2 we can see that the BBO algorithm achieves good performance in terms of the average, best, and worst intracluster distances on these six datasets. This means that BBO can find good-quality solutions.

The best centroid coordinates obtained by the BBO algorithm on the test datasets are shown in Tables 3-8. Finally, Table 9 shows the error rate values obtained by the algorithms for the real datasets. As seen from the results in Table 9, the BBO algorithm attains the minimum average error rate on all the real datasets. However, the topography of the clustering cost function (3) has a valley shape; therefore the solutions found by these methods were not global. Nevertheless, the experimental results in the tables demonstrate that the proposed method is a practicable and good technique for data clustering.

6. Conclusions

In summary, this paper presents a new clustering algorithm based on the recently developed BBO heuristic algorithm, which is inspired by mathematical models of the science of biogeography (the study of the distribution of animals and plants over time and space).

To evaluate the performance of the BBO algorithm, it was tested on six real-life datasets and compared with eight other clustering algorithms. The experimental results indicate


Table 7: The obtained best centroids coordinates for Vowel data.

Vowel data | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5 | Cluster 6
Feature A | 357.8349 | 375.8459 | 508.1135 | 407.9219 | 623.6778 | 439.6126
Feature B | 2291.6435 | 2148.4110 | 1838.2133 | 1018.20145 | 1309.8038 | 987.4300
Feature C | 2978.2399 | 2678.8524 | 2555.9085 | 2317.2847 | 2332.7767 | 2665.4154

Table 8: The obtained best centroids coordinates for Wine data.

Wine data | Cluster 1 | Cluster 2 | Cluster 3
Feature A | 13.3856 | 12.7859 | 12.7093
Feature B | 1.9976 | 2.3535 | 2.3219
Feature C | 2.3150 | 2.4954 | 2.4497
Feature D | 16.9836 | 19.5480 | 21.1983
Feature E | 105.2124 | 98.9327 | 92.6449
Feature F | 3.0255 | 2.0964 | 2.1366
Feature G | 3.1380 | 1.4428 | 1.9187
Feature H | 0.51050 | 0.31322 | 0.3520
Feature I | 2.3769 | 1.7629 | 1.4966
Feature J | 5.7760 | 5.8415 | 4.3213
Feature K | 0.8339 | 1.1220 | 1.2229
Feature L | 3.0686 | 1.9611 | 2.5417
Feature M | 1137.4923 | 687.3041 | 463.8856

Table 9: Error rates for real life datasets.

Dataset | K-means | PSO | GSA | BBO
Cancer | 4.08 | 5.11 | 3.74 | 3.70
CMC | 54.49 | 54.41 | 55.67 | 54.22
Glass | 37.71 | 45.59 | 41.39 | 36.47
Iris | 17.80 | 12.53 | 10.04 | 10.03
Vowel | 44.26 | 44.65 | 42.26 | 41.36
Wine | 31.12 | 28.71 | 29.15 | 28.65

that the BBO optimization algorithm is a suitable and useful heuristic technique for data clustering. In order to improve the obtained results, as future work we plan to hybridize the proposed approach with other algorithms, and we intend to apply this method to other data mining problems.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors would like to thank the Ministry of Education Malaysia for funding this research project through a Research University Grant of Universiti Teknologi Malaysia (UTM), project titled "Based Syariah Compliance Chicken Slaughtering Process for Online Monitoring and Enforcement" (01G55). Also, thanks are due to the Research Management Center (RMC) of UTM for providing an excellent research environment in which to complete this work.

References

[1] W. A. Barbakh, Y. Wu, and C. Fyfe, "Review of clustering algorithms," in Non-Standard Parameter Adaptation for Exploratory Data Analysis, vol. 249 of Studies in Computational Intelligence, pp. 7–28, Springer, Berlin, Germany, 2009.
[2] X. Cai and W. Li, "A spectral analysis approach to document summarization: clustering and ranking sentences simultaneously," Information Sciences, vol. 181, no. 18, pp. 3816–3827, 2011.
[3] M. Carullo, E. Binaghi, and I. Gallo, "An online document clustering technique for short web contents," Pattern Recognition Letters, vol. 30, no. 10, pp. 870–876, 2009.
[4] W. Halberstadt and T. S. Douglas, "Fuzzy clustering to detect tuberculous meningitis-associated hyperdensity in CT images," Computers in Biology and Medicine, vol. 38, no. 2, pp. 165–170, 2008.
[5] L. Liao, T. Lin, and B. Li, "MRI brain image segmentation and bias field correction based on fast spatially constrained kernel clustering approach," Pattern Recognition Letters, vol. 29, no. 10, pp. 1580–1588, 2008.
[6] E. R. Hruschka, R. J. G. B. Campello, and L. N. de Castro, "Evolving clusters in gene-expression data," Information Sciences, vol. 176, no. 13, pp. 1898–1927, 2006.
[7] P. Papajorgji, R. Chinchuluun, W. S. Lee, J. Bhorania, and P. M. Pardalos, "Clustering and classification algorithms in food and agricultural applications: a survey," in Advances in Modeling Agricultural Systems, pp. 433–454, Springer US, New York, NY, USA, 2009.
[8] Y.-J. Wang and H.-S. Lee, "A clustering method to identify representative financial ratios," Information Sciences, vol. 178, no. 4, pp. 1087–1097, 2008.
[9] J. Li, K. Wang, and L. Xu, "Chameleon based on clustering feature tree and its application in customer segmentation," Annals of Operations Research, vol. 168, no. 1, pp. 225–245, 2009.
[10] Y.-C. Song, H.-D. Meng, M. J. O'Grady, and G. M. P. O'Hare, "The application of cluster analysis in geophysical data interpretation," Computational Geosciences, vol. 14, no. 2, pp. 263–271, 2010.
[11] G. Cardoso and F. Gomide, "Newspaper demand prediction and replacement model based on fuzzy clustering and rules," Information Sciences, vol. 177, no. 21, pp. 4799–4809, 2007.
[12] S. Das and S. Sil, "Kernel-induced fuzzy clustering of image pixels with an improved differential evolution algorithm," Information Sciences, vol. 180, no. 8, pp. 1237–1256, 2010.
[13] R. I. John, P. R. Innocent, and M. R. Barnes, "Neuro-fuzzy clustering of radiographic tibia image data using type 2 fuzzy sets," Information Sciences, vol. 125, no. 1–4, pp. 65–82, 2000.
[14] S. Mitra and P. P. Kundu, "Satellite image segmentation with shadowed C-means," Information Sciences, vol. 181, no. 17, pp. 3601–3613, 2011.
[15] T. H. Grubesic, "On the application of fuzzy clustering for crime hot spot detection," Journal of Quantitative Criminology, vol. 22, no. 1, pp. 77–105, 2006.
[16] N. H. Park, S. H. Oh, and W. S. Lee, "Anomaly intrusion detection by clustering transactional audit streams in a host computer," Information Sciences, vol. 180, no. 12, pp. 2375–2389, 2010.
[17] A. J. Sahoo and Y. Kumar, "Modified teacher learning based optimization method for data clustering," in Advances in Signal Processing and Intelligent Recognition Systems, vol. 264 of Advances in Intelligent Systems and Computing, pp. 429–437, Springer, 2014.
[18] M. R. Garey, D. S. Johnson, and H. S. Witsenhausen, "The complexity of the generalized Lloyd-max problem," IEEE Transactions on Information Theory, vol. 28, no. 2, pp. 255–256, 1982.
[19] K. S. Al-Sultan, "A Tabu search approach to the clustering problem," Pattern Recognition, vol. 28, no. 9, pp. 1443–1451, 1995.
[20] S. Z. Selim and K. Al-Sultan, "A simulated annealing algorithm for the clustering problem," Pattern Recognition, vol. 24, no. 10, pp. 1003–1008, 1991.
[21] C. A. Murthy and N. Chowdhury, "In search of optimal clusters using genetic algorithms," Pattern Recognition Letters, vol. 17, no. 8, pp. 825–832, 1996.
[22] J. Kennedy and R. Eberhart, "Particle swarm optimization," in Proceedings of the IEEE International Conference on Neural Networks, vol. 4, pp. 1942–1948, IEEE, Perth, Australia, December 1995.
[23] J. L. Fernández Martínez, E. García-Gonzalo, and J. P. Fernández-Álvarez, "Theoretical analysis of particle swarm trajectories through a mechanical analogy," International Journal of Computational Intelligence Research, vol. 4, no. 2, pp. 93–104, 2008.
[24] J. L. Fernández-Martínez and E. García-Gonzalo, "Stochastic stability and numerical analysis of two novel algorithms of the PSO family: PP-GPSO and RR-GPSO," International Journal on Artificial Intelligence Tools, vol. 21, no. 3, Article ID 1240011, 20 pages, 2012.
[25] J. L. Fernández Martínez and E. García-Gonzalo, "Stochastic stability analysis of the linear continuous and discrete PSO models," IEEE Transactions on Evolutionary Computation, vol. 15, no. 3, pp. 405–423, 2011.
[26] J. L. Fernández Martínez, E. García Gonzalo, Z. Fernández Muñiz, and T. Mukerji, "How to design a powerful family of particle swarm optimizers for inverse modelling," Transactions of the Institute of Measurement and Control, vol. 34, no. 6, pp. 705–719, 2012.
[27] A. Hatamlou, S. Abdullah, and M. Hatamlou, "Data clustering using big bang–big crunch algorithm," in Innovative Computing Technology, vol. 241 of Communications in Computer and Information Science, pp. 383–388, 2011.
[28] M. Dorigo, G. Di Caro, and L. M. Gambardella, "Ant algorithms for discrete optimization," Artificial Life, vol. 5, no. 2, pp. 137–172, 1999.
[29] E. Rashedi, H. Nezamabadi-pour, and S. Saryazdi, "GSA: a gravitational search algorithm," Information Sciences, vol. 179, no. 13, pp. 2232–2248, 2009.
[30] A. Hatamlou, S. Abdullah, and Z. Othman, "Gravitational search algorithm with heuristic search for clustering problems," in Proceedings of the 3rd IEEE Conference on Data Mining and Optimization (DMO '11), pp. 190–193, Putrajaya, Malaysia, June 2011.
[31] S. Das, A. Abraham, and A. Konar, "Metaheuristic pattern clustering – an overview," in Metaheuristic Clustering, vol. 178 of Studies in Computational Intelligence, pp. 1–62, Springer, Berlin, Germany, 2009.
[32] S.-H. Liao, P.-H. Chu, and P.-Y. Hsiao, "Data mining techniques and applications – a decade review from 2000 to 2011," Expert Systems with Applications, vol. 39, no. 12, pp. 11303–11311, 2012.
[33] S. J. Nanda and G. Panda, "A survey on nature inspired metaheuristic algorithms for partitional clustering," Swarm and Evolutionary Computation, vol. 16, pp. 1–18, 2014.
[34] D. Simon, "Biogeography-based optimization," IEEE Transactions on Evolutionary Computation, vol. 12, no. 6, pp. 702–713, 2008.
[35] W. Guo, M. Chen, L. Wang, S. S. Ge, and Q. Wu, "Design of migration operators for biogeography-based optimization and Markov analysis," submitted to Information Sciences.
[36] D. Simon, M. Ergezer, and D. Du, "Population distributions in biogeography-based optimization algorithms with elitism," in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC '09), pp. 991–996, October 2009.
[37] C. L. Blake and C. J. Merz, "UCI repository of machine learning databases," 1998, http://archive.ics.uci.edu/ml/datasets.html.
[38] A. K. Jain, "Data clustering: 50 years beyond K-means," Pattern Recognition Letters, vol. 31, no. 8, pp. 651–666, 2010.
[39] A. Hatamlou, "Black hole: a new heuristic optimization approach for data clustering," Information Sciences, vol. 222, pp. 175–184, 2013.
[40] S. Rana, S. Jasola, and R. Kumar, "A review on particle swarm optimization algorithms and their applications to data clustering," Artificial Intelligence Review, vol. 35, no. 3, pp. 211–222, 2011.
[41] A. M. Bagirov and J. Yearwood, "A new nonsmooth optimization algorithm for minimum sum-of-squares clustering problems," European Journal of Operational Research, vol. 170, no. 2, pp. 578–596, 2006.
[42] S. Yang, R. Wu, M. Wang, and L. Jiao, "Evolutionary clustering based vector quantization and SPIHT coding for image compression," Pattern Recognition Letters, vol. 31, no. 13, pp. 1773–1780, 2010.
[43] M. Ergezer, D. Simon, and D. Du, "Oppositional biogeography-based optimization," in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC '09), pp. 1009–1014, October 2009.
[44] S. Yazdani, J. Shanbehzadeh, and E. Aminian, "Feature subset selection using constrained binary/integer biogeography-based optimization," ISA Transactions, vol. 52, no. 3, pp. 383–390, 2013.
[45] S. Paterlini and T. Krink, "High performance clustering with differential evolution," in Proceedings of the Congress on Evolutionary Computation (CEC '04), vol. 2, pp. 2004–2011, Piscataway, NJ, USA, June 2004.
[46] P. S. Shelokar, V. K. Jayaraman, and B. D. Kulkarni, "An ant colony approach for clustering," Analytica Chimica Acta, vol. 509, no. 2, pp. 187–195, 2004.


[30] A Hatamlou S Abdullah and Z Othman ldquoGravitationalsearch algorithm with heuristic search for clustering problemsrdquoin Proceedings of the 3rd IEEE Conference on Data Mining andOptimization (DMO rsquo11) pp 190ndash193 Putrajaya Malaysia June2011

[31] S Das A Abraham and A Konar ldquoMetaheuristic patternclusteringmdashan overviewrdquo inMetaheuristic Clustering vol 178 ofStudies in Computational Intelligence pp 1ndash62 Springer BerlinGermany 2009

[32] S-H Liao P-H Chu and P-Y Hsiao ldquoData mining techniquesand applicationsmdasha decade review from 2000 to 2011rdquo ExpertSystems with Applications vol 39 no 12 pp 11303ndash11311 2012

[33] S J Nanda and G Panda ldquoA survey on nature inspiredmetaheuristic algorithms for partitional clusteringrdquo Swarm andEvolutionary Computation vol 16 pp 1ndash18 2014

[34] D Simon ldquoBiogeography-based optimizationrdquo IEEE Transac-tions on Evolutionary Computation vol 12 no 6 pp 702ndash7132008

[35] W Guo M Chen L Wang S S Ge and Q Wu ldquoDesign ofmigration operators for biogeography-based optimization andmarkov analysisrdquo Submitted to Information Sciences

[36] D Simon M Ergezer and D Du ldquoPopulation distributions inbiogeography-based optimization algorithms with elitismrdquo inProceedings of the IEEE International Conference on SystemsMan and Cybernetics (SMC rsquo09) pp 991ndash996 October 2009

[37] C L Blake and C J Merz ldquoUCI repository of machine learningdatabasesrdquo 1998 httparchiveicsuciedumldatasetshtml

[38] A K Jain ldquoData clustering 50 years beyond K-meansrdquo PatternRecognition Letters vol 31 no 8 pp 651ndash666 2010

[39] A Hatamlou ldquoBlack hole a new heuristic optimizationapproach for data clusteringrdquo Information Sciences vol 222 pp175ndash184 2013

[40] S Rana S Jasola and R Kumar ldquoA review on particle swarmoptimization algorithms and their applications to data cluster-ingrdquo Artificial Intelligence Review vol 35 no 3 pp 211ndash2222011

[41] A M Bagirov and J Yearwood ldquoA new nonsmooth opti-mization algorithm for minimum sum-of-squares clusteringproblemsrdquo European Journal of Operational Research vol 170no 2 pp 578ndash596 2006

[42] S Yang R Wu M Wang and L Jiao ldquoEvolutionary clusteringbased vector quantization and SPIHT coding for image com-pressionrdquo Pattern Recognition Letters vol 31 no 13 pp 1773ndash1780 2010

[43] M Ergezer D Simon and D Du ldquoOppositional biogeography-based optimizationrdquo in Proceedings of the IEEE InternationalConference on Systems Man and Cybernetics (SMC rsquo09) pp1009ndash1014 October 2009

[44] S Yazdani J Shanbehzadeh and E Aminian ldquoFeature subsetselection using constrained binaryinteger biogeography-basedoptimizationrdquo ISATransactions vol 52 no 3 pp 383ndash390 2013

[45] S Paterlini andTKrink ldquoHigh performance clusteringwith dif-ferential evolutionrdquo in Proceedings of the Congress on Evolution-ary Computation (CEC rsquo04) vol 2 pp 2004ndash2011 PiscatawayNJ USA June 2004

[46] P S Shelokar V K Jayaraman and B D Kulkarni ldquoAn antcolony approach for clusteringrdquo Analytica Chimica Acta vol509 no 2 pp 187ndash195 2004



Page 5: Research Article An Efficient Optimization Method for ...downloads.hindawi.com/journals/cmmm/2015/802754.pdf · 4. BBO Algorithm for Data Clustering In order to use BBO algorithm

Computational and Mathematical Methods in Medicine 5

Create an initial population POP as follow(i) POP1 = 119862119895 | 119862119895 = Min(dataset) minus (119895 minus 1)lowast((Max(dataset) minusMin(dataset))119902) 119895 = 1 2 119902Where Min(dataset) and Max(dataset) are correspond with data items that their features arethe minimum and maximum values in whole of the dataset respectively(ii) POP2 = Create a candidate solution using the minimum of the dataset(iii) POP3 = Create a candidate solution using the maximum of the dataset(iv) POP4 = Create a candidate solution using the mean of the dataset(v) POP5 POP119899pop Create all other candidate solutions randomly as followfor 119894 = 5 119899pop

for 119895 = 1 119889for 119892 = 0 119902 minus 1POP119894(119895 + 119892 lowast 119889) = random number in range of (119871119862(119895) 119880119862(119895))Where LC(119895) and UC(119895) are the lower and upper bounds for each decisionvariable (ie LC(119895) lt 119862119895

119903lt UC(119895) 119903 = 1 119902)

end forend for

end forCalculate fitness of POP (cost function) according to (5) and sort it from the Best(minimum) fitness to the worst one (maximum)for 119894 = 1 to 119899pop (for each solution POP119894)Compute emigration rate 120583119894 proportional to fitness of POP119894 where 120583119894 isin rand[0 1] so that 1205831 = 1120583119899pop = 0 and immigration rate 120582119894 = 1 minus 120583i so that 1205821 = 0 120582119899pop = 1

end forSet 120597 120574 KHOP KHCP based on (11)ndash(14)

While termination conditions are not metNewPOPlarr POP

for 119894 = 1 to 119899pop (for each solution NewPOP119894)for 119904 = 1 to 119889 (for each candidate solution decision variable index 119904)

Choose NewPop119894whether to immigrate with probabilistically decide 120582119894

if NewPOP119894is selected for immigrating

Use values 120583 to probabilistically select the emigrating solution POP119895if POP119895 is selected for emigrating

NewPOP119894(119904) larr POP119894(119904) + 120572 times (POP119895(119904) minus POP119894(119904)) based on (8) by setting120572 = random number in range of (09 1)

end ifend if

With probabilistically119898119894 decide whether to mutate NewPOP119894 based on (9)if NewPOP119894 is selected for mutation thus based on (10) (11)for 119892 = 1 119902if (119904 minus (119892 minus 1) lowast 119889) lt= 119889

NewPOP119894(119904) = NewPOP119894(119904) + 120590 times sigmalowast(119904 minus (119892 minus 1))lowastsigma(119895) = 120597 times (119880119862(119895) minus 119871119862(119895)) 119895 = 1 119889 where 120597 its value in this workis 002 also 119880119862(119895) minus 119871119862(119895) are the lower and upper bounds for each decision variable

Breakend if

end forend if

end for (repeat for next candidate solution decision variable index)Recalculate fitness of NewPOP

End for (repeat for next solution)Sort population based on fitness functionMake new population POP based on combinatorial of old POP and New POP based on (10)Sort new population based on fitness functionUpdate and store best solution ever found

end while

Algorithm 1 Pseudocodes of proposed method

6 Computational and Mathematical Methods in Medicine

Table 2 Intracluster distances for real life datasets

Dataset Criteria 119870-means TS SA PSO BB-BC GA GSA ACO BBO

Cancer

Average 30322478 325137 323917 29817865 29643880 324946 29726631 304606 29643879Best 29869613 298284 299345 29744809 29643875 299932 29657639 297049 29643875Worst 52160895 343416 342195 30534913 29643890 342743 29932446 324201 29643887Std 3151456 232217 230192 1043651 000048 229734 891860 9050028 000036

CMC

Average 55434234 599359 589348 55478932 55747517 575659 55819450 58191347 55322550Best 55421821 588506 584903 55391745 55340948 570563 55422763 57019230 55322113Worst 55453333 599980 596694 55616549 56447026 581264 56587629 59124300 5532432Std 15238 40845 50867 735617 394349 50369 4113648 45634700 006480

Glass

Average 2279779 28379 28219 23049328 2312306 25538 2335433 27346 2152097Best 2156775 27987 27516 22390546 2238941 23550 2249841 26972 2106173Worst 2608385 28647 28718 24608915 2432088 27837 2483672 28008 2339314Std 141389 419 4238 479320 46501 1247 613946 35848 3525

Iris

Average 1057290 978680 9995 981423 967654 1251970 967311 971715 965653Best 973259 973659 9745 968793 966765 1139865 966879 971007 965403Worst 1284042 9856949 10201 997695 974287 1397782 968246 978084 966609Std 123876 7286 2018 084207 020456 14563 002761 0367 00394

Vowel

Average 1536608071 16210853 16156628 15321823418 1510100339 15915349 1529318104 1594581438 1490729042Best 1493948040 14946826 14937047 15246156473 1490385168 14951373 1513175639 149395602 1489672544Worst 1684742659 16599642 16598642 15898708231 1530904407 16599165 1553466952 1659398260 15305196931Std 412304203 2846235 0645 294523167 18593235 3105544 248670285 34853816 1377311

Wine

Average 169630441 1678546 1752109 163162745 163034121 1653053 163743091 1653053381 162926782Best 165556794 1666622 1647348 163044858 162986736 1653053 163138762 1653053381 162926782Worst 237550495 1683754 1808325 163427811 163101135 1653053 164288649 1653053381 162926782Std 11806942 52073 753084 1260275 26620 0 3467122 0 0

Table 3 The obtained best centroids coordinate for Cancer data

Cancer data Cluster 1 Cluster 2Feature A 71156 28896Feature B 66398 11278Feature C 66238 12018Feature D 56135 11646Feature E 52402 19943Feature F 80995 11215Feature G 60789 20059Feature H 60198 11014Feature I 23282 10320

features each (sepal length sepal width petal length andpetal width) were collected [45]

Vowel Dataset (119899 = 871 119889 = 3 119896 = 6) It consists of 871 In-dian Telugu vowel soundsThe dataset has three features cor-responding to the first second and third vowel frequenciesand six overlapping classes [45]

Wine Dataset (119899 = 178 119889 = 13 119896 = 3) This dataset de-scribes the quality of wine from physicochemical properties

Table 4 The obtained best centroids coordinate for CMC data

CMC data Cluster 1 Cluster 2 Cluster 3Feature A 436354 334957 244102Feature B 30140 31307 30417Feature C 34513 35542 35181Feature D 4582 36511 17947Feature E 07965 07928 09275Feature F 07629 06918 07928Feature G 18245 20903 22980Feature H 34355 329183 29754Feature I 0094 00573 0037

in Italy There are 178 instances with 13 continues attributesgrouped into 3 classesThere is nomissing value for attributes

In this paper the performance of the proposed algorithmis compared with recent algorithms reported in the literatureincluding 119870-means [38] TS [19] SA [20] PSO [22 39] BB-BC [27] GA [21] GSA [30] and ACO [46]

In this paper two criteria are used to measure the qualityof solutions found by clustering algorithms

(i) Sum of intracluster distances The distance betweeneach data vector in a cluster and the centroid of that

Computational and Mathematical Methods in Medicine 7

Table 5 The obtained best centroids coordinate for Glass data

Glass data Cluster 1 Cluster 2 Cluster 3 Cluster 4 Cluster 5 Cluster 6Feature A 15260 15156 15228 15266 15203 15243Feature B 119759 130863 146577 132229 137277 138085Feature C 0006 35272 00061 04232 35127 23414Feature D 10514 13618 22170 15242 10249 25919Feature E 720540 728710 732504 730610 719072 711423Feature F 02552 05768 00299 03865 02067 25749Feature G 143566 83588 86714 111471 94166 59948Feature H 01808 00046 1047 000979 00281 13373Feature I 01254 00568 00196 01544 00498 02846

cluster is calculated and summed up as defined in (3)It is also the evaluation fitness in this paper Clearlythe smaller the value is the higher the quality of theclustering is

(ii) Error rate (ER) It is defined as the number ofmisplaced points over the total number of points inthe dataset as

ER =(sum119899pop119894=1 (if (119861119894 = 119862119894) then 0 else 1))

119899poplowast 100 (12)

where 119899pop is the total number of data points and 119861119894 and 119862119894denote the datasets of which the 119894th point is a member beforeand after clustering respectively

Since all the algorithms are stochastic algorithms there-fore for each experiment 10 independent runs are carried outto indicate the stability and robustness of the algorithms foragainst with the randomized nature of the algorithms Theaverage best (minimum) and worst (maximum) solutionsand standard deviation of solutions of 10 runs of each algo-rithm are obtained by using algorithms on the datasets whichhave been applied for comparison This process ensuresthat the candidate solutions are spread in the wide area ofthe search space and thus increases the chance of findingoptima

Table 2 presents the intracluster distances obtained fromthe eight clustering algorithms for the datasets above Forthe cancer dataset the average best and worst solutionsof BBO algorithm are 29643879 29643875 and 29643887respectively which are much better than those of otheralgorithms except BB-BC which is the same as it Thismeans that it provides the optimum value and small standarddeviation when compared to those obtained by the othermethods For the CMCdataset the proposedmethod reachesan average of 55322550 while other algorithms were unableto reach this solution Also the results obtained on the glassdataset show that BBO method converges to the optimumof 2152097 in all of runs while the average solutions ofthe 119896-means TS SA GA PSO BB-BC GSA and ACOare 2279779 28379 28219 23049328 2312306 255382335433 and 27346 respectively For the iris dataset theaverage of solutions found by BBO is 965653 while this value

Table 6 The obtained best centroids coordinate for Iris data

Iris data Cluster 1 Cluster 2 Cluster 3Feature A 50150 59338 67343Feature B 34185 27974 30681Feature C 14681 44173 56299Feature D 02380 14165 21072

for the 119896-means TS SA GA PSO BB-BC GSA and ACOis 1057290 978680 9995 981423 967654 1251970 967311and 971715 respectively As seen from the results for the voweldataset the BBO algorithm outperformed the 119870-means TSSA GA PSO BB-BC GSA and ACO algorithms with theaverage solution 1490729042 For theWine dataset the BBOalgorithm achieved the optimum value of 162926782 whichis significantly better than the other tested algorithms

From Table 2 we can see that the BBO algorithm hasachieved the good performance in terms of the average bestandworst intercluster distances on these six datasets Itmeansthat BBO can find good quality solutions

The best centroids coordinates obtained by the BBOalgorithm on the test dataset are shown in Tables 3ndash8 FinallyTable 9 shows the error rate values obtained by algorithms forreal datasets As seen from the results in Table 9 the BBOalgorithm presents a minimum average error rate in all thereal datasets However the topography of the cost function ofclustering (3) has a valley shape therefore the found solutionsby thesemethodswere not globalTherefore the experimentalresults in the tables demonstrate that the proposed method isone of practicable and good techniques for data clustering

6 Conclusions

In summary this paper presents a new clustering algorithmbased on the recently developed BBOheuristic algorithm thatis inspired by mathematical models of science of biogeogra-phy (study of the distribution of animals and plants over timeand space)

To evaluate the performance of the BBO algorithm itwas tested on six real life datasets and compared with othereight clustering algorithmsThe experimental results indicate

8 Computational and Mathematical Methods in Medicine

Table 7 The obtained best centroids coordinate for Vowel data

Vowel data Cluster 1 Cluster 2 Cluster 3 Cluster 4 Cluster 5 Cluster 6Feature A 3578349 3758459 5081135 4079219 6236778 4396126Feature B 22916435 21484110 18382133 101820145 13098038 9874300Feature C 29782399 26788524 25559085 23172847 23327767 26654154

Table 8 The obtained best centroids coordinates forWine data

Wine data Cluster 1 Cluster 2 Cluster 3Feature A 133856 127859 127093Feature B 19976 23535 23219Feature C 23150 24954 24497Feature D 169836 195480 211983Feature E 1052124 989327 926449Feature F 30255 20964 21366Feature G 31380 14428 19187Feature H 051050 031322 03520Feature I 23769 17629 14966Feature J 57760 58415 43213Feature K 08339 11220 12229Feature L 30686 19611 25417Feature M 11374923 6873041 4638856

Table 9 Error rates for real life datasets

Dataset 119870-means PSO GSA BBOCancer 408 511 374 37CMC 5449 5441 5567 5422Glass 3771 4559 4139 3647Iris 1780 1253 1004 1003Vowel 4426 4465 4226 4136Wine 3112 2871 2915 2865

that the BBO optimization algorithm is suitable and usefulheuristic technique for data clustering In order to improvethe obtained results as a future work we plan to hybridizethe proposed approach with other algorithms and we intendto apply this method with other data mining problems

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

The authors would like to thank the Ministry of EducationMalaysia for funding this research project through a ResearchUniversity Grant of Universiti Teknologi Malaysia (UTM)project titled ldquoBased Syariah Compliance Chicken Slaugh-tering Process for Online Monitoring and Enforcementrdquo(01G55) Also thanks are due to the Research ManagementCenter (RMC) of UTM for providing an excellent researchenvironment in which to complete this work

References

[1] W A Barbakh Y Wu and C Fyfe ldquoReview of clustering algo-rithmsrdquo inNon-Standard Parameter Adaptation for ExploratoryData Analysis vol 249 of Studies in Computational Intelligencepp 7ndash28 Springer Berlin Germany 2009

[2] X Cai and W Li ldquoA spectral analysis approach to documentsummarization clustering and ranking sentences simultane-ouslyrdquo Information Sciences vol 181 no 18 pp 3816ndash3827 2011

[3] M Carullo E Binaghi and I Gallo ldquoAn online document clus-tering technique for short web contentsrdquo Pattern RecognitionLetters vol 30 no 10 pp 870ndash876 2009

[4] W Halberstadt and T S Douglas ldquoFuzzy clustering to detecttuberculous meningitis-associated hyperdensity in CT imagesrdquoComputers in Biology and Medicine vol 38 no 2 pp 165ndash1702008

[5] L Liao T Lin and B Li ldquoMRI brain image segmentation andbias field correction based on fast spatially constrained kernelclustering approachrdquo Pattern Recognition Letters vol 29 no 10pp 1580ndash1588 2008

[6] E RHruschka R J G B Campello andLN deCastro ldquoEvolv-ing clusters in gene-expression datardquo Information Sciences vol176 no 13 pp 1898ndash1927 2006

[7] P Papajorgji R Chinchuluun W S Lee J Bhorania and PM Pardalos ldquoClustering and classification algorithms in foodand agricultural applications a surveyrdquo inAdvances inModelingAgricultural Systems pp 433ndash454 Springer US New York NYUSA 2009

[8] Y-J Wang and H-S Lee ldquoA clustering method to identifyrepresentative financial ratiosrdquo Information Sciences vol 178no 4 pp 1087ndash1097 2008

[9] J Li K Wang and L Xu ldquoChameleon based on clusteringfeature tree and its application in customer segmentationrdquoAnnals of Operations Research vol 168 no 1 pp 225ndash245 2009

[10] Y-C Song H-D Meng M J OrsquoGrady and G M P OrsquoHareldquoThe application of cluster analysis in geophysical data interpre-tationrdquo Computational Geosciences vol 14 no 2 pp 263ndash2712010



Table 2: Intracluster distances for the real-life datasets.

Dataset | Criteria | K-means | TS | SA | PSO | BB-BC | GA | GSA | ACO | BBO
Cancer | Average | 30322478 | 325137 | 323917 | 29817865 | 29643880 | 324946 | 29726631 | 304606 | 29643879
Cancer | Best | 29869613 | 298284 | 299345 | 29744809 | 29643875 | 299932 | 29657639 | 297049 | 29643875
Cancer | Worst | 52160895 | 343416 | 342195 | 30534913 | 29643890 | 342743 | 29932446 | 324201 | 29643887
Cancer | Std | 3151456 | 232217 | 230192 | 1043651 | 000048 | 229734 | 891860 | 9050028 | 000036
CMC | Average | 55434234 | 599359 | 589348 | 55478932 | 55747517 | 575659 | 55819450 | 58191347 | 55322550
CMC | Best | 55421821 | 588506 | 584903 | 55391745 | 55340948 | 570563 | 55422763 | 57019230 | 55322113
CMC | Worst | 55453333 | 599980 | 596694 | 55616549 | 56447026 | 581264 | 56587629 | 59124300 | 5532432
CMC | Std | 15238 | 40845 | 50867 | 735617 | 394349 | 50369 | 4113648 | 45634700 | 006480
Glass | Average | 2279779 | 28379 | 28219 | 23049328 | 2312306 | 25538 | 2335433 | 27346 | 2152097
Glass | Best | 2156775 | 27987 | 27516 | 22390546 | 2238941 | 23550 | 2249841 | 26972 | 2106173
Glass | Worst | 2608385 | 28647 | 28718 | 24608915 | 2432088 | 27837 | 2483672 | 28008 | 2339314
Glass | Std | 141389 | 419 | 4238 | 479320 | 46501 | 1247 | 613946 | 35848 | 3525
Iris | Average | 1057290 | 978680 | 9995 | 981423 | 967654 | 1251970 | 967311 | 971715 | 965653
Iris | Best | 973259 | 973659 | 9745 | 968793 | 966765 | 1139865 | 966879 | 971007 | 965403
Iris | Worst | 1284042 | 9856949 | 10201 | 997695 | 974287 | 1397782 | 968246 | 978084 | 966609
Iris | Std | 123876 | 7286 | 2018 | 084207 | 020456 | 14563 | 002761 | 0367 | 00394
Vowel | Average | 1536608071 | 16210853 | 16156628 | 15321823418 | 1510100339 | 15915349 | 1529318104 | 1594581438 | 1490729042
Vowel | Best | 1493948040 | 14946826 | 14937047 | 15246156473 | 1490385168 | 14951373 | 1513175639 | 149395602 | 1489672544
Vowel | Worst | 1684742659 | 16599642 | 16598642 | 15898708231 | 1530904407 | 16599165 | 1553466952 | 1659398260 | 15305196931
Vowel | Std | 412304203 | 2846235 | 0645 | 294523167 | 18593235 | 3105544 | 248670285 | 34853816 | 1377311
Wine | Average | 169630441 | 1678546 | 1752109 | 163162745 | 163034121 | 1653053 | 163743091 | 1653053381 | 162926782
Wine | Best | 165556794 | 1666622 | 1647348 | 163044858 | 162986736 | 1653053 | 163138762 | 1653053381 | 162926782
Wine | Worst | 237550495 | 1683754 | 1808325 | 163427811 | 163101135 | 1653053 | 164288649 | 1653053381 | 162926782
Wine | Std | 11806942 | 52073 | 753084 | 1260275 | 26620 | 0 | 3467122 | 0 | 0

Table 3: The obtained best centroid coordinates for the Cancer data.

Cancer data | Cluster 1 | Cluster 2
Feature A | 71156 | 28896
Feature B | 66398 | 11278
Feature C | 66238 | 12018
Feature D | 56135 | 11646
Feature E | 52402 | 19943
Feature F | 80995 | 11215
Feature G | 60789 | 20059
Feature H | 60198 | 11014
Feature I | 23282 | 10320

Iris Dataset (n = 150, d = 4, k = 3): This dataset consists of 150 instances of three species of iris flowers, for which four features each (sepal length, sepal width, petal length, and petal width) were collected [45].

Vowel Dataset (n = 871, d = 3, k = 6): It consists of 871 Indian Telugu vowel sounds. The dataset has three features, corresponding to the first, second, and third vowel frequencies, and six overlapping classes [45].

Wine Dataset (n = 178, d = 13, k = 3): This dataset describes the quality of wine, from physicochemical properties, in Italy. There are 178 instances with 13 continuous attributes, grouped into 3 classes. There is no missing value for attributes.

Table 4: The obtained best centroid coordinates for the CMC data.

CMC data | Cluster 1 | Cluster 2 | Cluster 3
Feature A | 436354 | 334957 | 244102
Feature B | 30140 | 31307 | 30417
Feature C | 34513 | 35542 | 35181
Feature D | 4582 | 36511 | 17947
Feature E | 07965 | 07928 | 09275
Feature F | 07629 | 06918 | 07928
Feature G | 18245 | 20903 | 22980
Feature H | 34355 | 329183 | 29754
Feature I | 0094 | 00573 | 0037
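As a practical aside, the Iris and Wine datasets among those above are bundled with scikit-learn, so experiments on them can be reproduced without downloading files; the remaining datasets (Cancer, CMC, Glass, and Vowel) can be obtained from the UCI repository [37]. The following minimal sketch uses scikit-learn's load_iris and load_wine loaders; the variable names are ours.

```python
from sklearn.datasets import load_iris, load_wine

# Iris: n = 150 samples, d = 4 features, k = 3 classes
iris = load_iris()
X_iris, y_iris = iris.data, iris.target

# Wine: n = 178 samples, d = 13 features, k = 3 classes
wine = load_wine()
X_wine, y_wine = wine.data, wine.target

print(X_iris.shape, X_wine.shape)  # (150, 4) (178, 13)
```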

In this paper, the performance of the proposed algorithm is compared with recent algorithms reported in the literature, including K-means [38], TS [19], SA [20], PSO [22, 39], BB-BC [27], GA [21], GSA [30], and ACO [46].

Two criteria are used to measure the quality of the solutions found by the clustering algorithms:

(i) Sum of intracluster distances: the distance between each data vector in a cluster and the centroid of that cluster is calculated and summed up, as defined in (3). It is also the evaluation fitness in this paper. Clearly, the smaller this value, the higher the quality of the clustering.

(ii) Error rate (ER): it is defined as the number of misplaced points over the total number of points in the dataset, expressed as a percentage:

\[
\mathrm{ER} = \frac{\sum_{i=1}^{n_{\mathrm{pop}}} \bigl(\text{if } B_i = C_i \text{ then } 0 \text{ else } 1\bigr)}{n_{\mathrm{pop}}} \times 100, \tag{12}
\]

where \(n_{\mathrm{pop}}\) is the total number of data points, and \(B_i\) and \(C_i\) denote the classes of which the \(i\)th point is a member before and after clustering, respectively.

Table 5: The obtained best centroid coordinates for the Glass data.

Glass data | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5 | Cluster 6
Feature A | 15260 | 15156 | 15228 | 15266 | 15203 | 15243
Feature B | 119759 | 130863 | 146577 | 132229 | 137277 | 138085
Feature C | 0006 | 35272 | 00061 | 04232 | 35127 | 23414
Feature D | 10514 | 13618 | 22170 | 15242 | 10249 | 25919
Feature E | 720540 | 728710 | 732504 | 730610 | 719072 | 711423
Feature F | 02552 | 05768 | 00299 | 03865 | 02067 | 25749
Feature G | 143566 | 83588 | 86714 | 111471 | 94166 | 59948
Feature H | 01808 | 00046 | 1047 | 000979 | 00281 | 13373
Feature I | 01254 | 00568 | 00196 | 01544 | 00498 | 02846
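To make the two criteria concrete, the sketch below computes the sum of intracluster distances of (3) and the error rate of (12) for a given partition. It assumes Euclidean distance and maps each cluster to its majority true class before counting mismatches; that mapping is our assumption, since the paper only defines the mismatch count itself.

```python
import numpy as np

def intracluster_distance(X, labels, centroids):
    """Sum of distances between each point and the centroid of its
    cluster (criterion (i), the fitness of eq. (3))."""
    return sum(np.linalg.norm(X[labels == k] - c, axis=1).sum()
               for k, c in enumerate(centroids))

def error_rate(true_labels, cluster_labels, n_clusters):
    """Percentage of misplaced points (criterion (ii), eq. (12)).
    Each cluster is mapped to its majority true class before counting
    mismatches; this mapping is an assumption on our part."""
    mapped = np.empty_like(cluster_labels)
    for k in range(n_clusters):
        members = true_labels[cluster_labels == k]
        if members.size:
            mapped[cluster_labels == k] = np.bincount(members).argmax()
    return 100.0 * np.mean(mapped != true_labels)
```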

Since all the algorithms are stochastic, 10 independent runs are carried out for each experiment to indicate the stability and robustness of the algorithms against their randomized nature. The average, best (minimum), and worst (maximum) solutions, together with the standard deviation of the solutions over the 10 runs of each algorithm on each dataset, are used for comparison. Repeating the runs also ensures that the candidate solutions are spread over a wide area of the search space, which increases the chance of finding the optima.
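A minimal sketch of this protocol is given below; run_clustering is a hypothetical stand-in for one independent run of BBO (or any of the compared algorithms) that returns the best fitness value found in that run.

```python
import numpy as np

def evaluate_over_runs(run_clustering, X, k, n_runs=10):
    # One independent, differently seeded run per iteration;
    # collect the best fitness value returned by each run.
    fitness = np.array([run_clustering(X, k, seed=s) for s in range(n_runs)])
    return {"average": fitness.mean(), "best": fitness.min(),
            "worst": fitness.max(), "std": fitness.std()}
```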

Table 2 presents the intracluster distances obtained by the clustering algorithms on the datasets above. For the Cancer dataset, the average, best, and worst solutions of the BBO algorithm are 29643879, 29643875, and 29643887, respectively, which are much better than those of the other algorithms except BB-BC, whose results match them. This means that BBO provides the optimum value and a small standard deviation when compared to those obtained by the other methods. For the CMC dataset, the proposed method reaches an average of 55322550, while the other algorithms were unable to reach this solution. Also, the results obtained on the Glass dataset show that the BBO method converges to the optimum of 2152097 in all runs, while the average solutions of K-means, TS, SA, PSO, BB-BC, GA, GSA, and ACO are 2279779, 28379, 28219, 23049328, 2312306, 25538, 2335433, and 27346, respectively. For the Iris dataset, the average of the solutions found by BBO is 965653, while this value for K-means, TS, SA, PSO, BB-BC, GA, GSA, and ACO is 1057290, 978680, 9995, 981423, 967654, 1251970, 967311, and 971715, respectively. As seen from the results for the Vowel dataset, the BBO algorithm outperformed the K-means, TS, SA, PSO, BB-BC, GA, GSA, and ACO algorithms, with the average solution 1490729042. For the Wine dataset, the BBO algorithm achieved the optimum value of 162926782, which is significantly better than the other tested algorithms.

Table 6: The obtained best centroid coordinates for the Iris data.

Iris data | Cluster 1 | Cluster 2 | Cluster 3
Feature A | 50150 | 59338 | 67343
Feature B | 34185 | 27974 | 30681
Feature C | 14681 | 44173 | 56299
Feature D | 02380 | 14165 | 21072

From Table 2, we can see that the BBO algorithm has achieved good performance in terms of the average, best, and worst intracluster distances on these six datasets. It means that BBO can find good quality solutions.

The best centroid coordinates obtained by the BBO algorithm on the test datasets are shown in Tables 3–8. Finally, Table 9 shows the error rate values obtained by the algorithms for the real datasets. As seen from the results in Table 9, the BBO algorithm presents the minimum average error rate on all the real datasets. However, the topography of the clustering cost function (3) has a valley shape, and therefore the solutions found by these methods were not global. The experimental results in the tables thus demonstrate that the proposed method is a practicable and effective technique for data clustering.

Table 7: The obtained best centroid coordinates for the Vowel data.

Vowel data | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5 | Cluster 6
Feature A | 3578349 | 3758459 | 5081135 | 4079219 | 6236778 | 4396126
Feature B | 22916435 | 21484110 | 18382133 | 101820145 | 13098038 | 9874300
Feature C | 29782399 | 26788524 | 25559085 | 23172847 | 23327767 | 26654154

Table 8: The obtained best centroid coordinates for the Wine data.

Wine data | Cluster 1 | Cluster 2 | Cluster 3
Feature A | 133856 | 127859 | 127093
Feature B | 19976 | 23535 | 23219
Feature C | 23150 | 24954 | 24497
Feature D | 169836 | 195480 | 211983
Feature E | 1052124 | 989327 | 926449
Feature F | 30255 | 20964 | 21366
Feature G | 31380 | 14428 | 19187
Feature H | 051050 | 031322 | 03520
Feature I | 23769 | 17629 | 14966
Feature J | 57760 | 58415 | 43213
Feature K | 08339 | 11220 | 12229
Feature L | 30686 | 19611 | 25417
Feature M | 11374923 | 6873041 | 4638856

Table 9: Error rates for the real-life datasets.

Dataset | K-means | PSO | GSA | BBO
Cancer | 408 | 511 | 374 | 37
CMC | 5449 | 5441 | 5567 | 5422
Glass | 3771 | 4559 | 4139 | 3647
Iris | 1780 | 1253 | 1004 | 1003
Vowel | 4426 | 4465 | 4226 | 4136
Wine | 3112 | 2871 | 2915 | 2865

6. Conclusions

In summary, this paper presents a new clustering algorithm based on the recently developed BBO heuristic algorithm, which is inspired by mathematical models of biogeography (the study of the distribution of animals and plants over time and space).

To evaluate the performance of the BBO algorithm, it was tested on six real-life datasets and compared with eight other clustering algorithms. The experimental results indicate that the BBO optimization algorithm is a suitable and useful heuristic technique for data clustering. In order to improve the obtained results, as future work we plan to hybridize the proposed approach with other algorithms, and we intend to apply this method to other data mining problems.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors would like to thank the Ministry of Education Malaysia for funding this research project through a Research University Grant of Universiti Teknologi Malaysia (UTM), project titled “Based Syariah Compliance Chicken Slaughtering Process for Online Monitoring and Enforcement” (01G55). Also thanks are due to the Research Management Center (RMC) of UTM for providing an excellent research environment in which to complete this work.

References

[1] W. A. Barbakh, Y. Wu, and C. Fyfe, “Review of clustering algorithms,” in Non-Standard Parameter Adaptation for Exploratory Data Analysis, vol. 249 of Studies in Computational Intelligence, pp. 7–28, Springer, Berlin, Germany, 2009.

[2] X. Cai and W. Li, “A spectral analysis approach to document summarization: clustering and ranking sentences simultaneously,” Information Sciences, vol. 181, no. 18, pp. 3816–3827, 2011.

[3] M. Carullo, E. Binaghi, and I. Gallo, “An online document clustering technique for short web contents,” Pattern Recognition Letters, vol. 30, no. 10, pp. 870–876, 2009.

[4] W. Halberstadt and T. S. Douglas, “Fuzzy clustering to detect tuberculous meningitis-associated hyperdensity in CT images,” Computers in Biology and Medicine, vol. 38, no. 2, pp. 165–170, 2008.

[5] L. Liao, T. Lin, and B. Li, “MRI brain image segmentation and bias field correction based on fast spatially constrained kernel clustering approach,” Pattern Recognition Letters, vol. 29, no. 10, pp. 1580–1588, 2008.

[6] E. R. Hruschka, R. J. G. B. Campello, and L. N. de Castro, “Evolving clusters in gene-expression data,” Information Sciences, vol. 176, no. 13, pp. 1898–1927, 2006.

[7] P. Papajorgji, R. Chinchuluun, W. S. Lee, J. Bhorania, and P. M. Pardalos, “Clustering and classification algorithms in food and agricultural applications: a survey,” in Advances in Modeling Agricultural Systems, pp. 433–454, Springer US, New York, NY, USA, 2009.

[8] Y.-J. Wang and H.-S. Lee, “A clustering method to identify representative financial ratios,” Information Sciences, vol. 178, no. 4, pp. 1087–1097, 2008.

[9] J. Li, K. Wang, and L. Xu, “Chameleon based on clustering feature tree and its application in customer segmentation,” Annals of Operations Research, vol. 168, no. 1, pp. 225–245, 2009.

[10] Y.-C. Song, H.-D. Meng, M. J. O'Grady, and G. M. P. O'Hare, “The application of cluster analysis in geophysical data interpretation,” Computational Geosciences, vol. 14, no. 2, pp. 263–271, 2010.

[11] G. Cardoso and F. Gomide, “Newspaper demand prediction and replacement model based on fuzzy clustering and rules,” Information Sciences, vol. 177, no. 21, pp. 4799–4809, 2007.

[12] S. Das and S. Sil, “Kernel-induced fuzzy clustering of image pixels with an improved differential evolution algorithm,” Information Sciences, vol. 180, no. 8, pp. 1237–1256, 2010.

[13] R. I. John, P. R. Innocent, and M. R. Barnes, “Neuro-fuzzy clustering of radiographic tibia image data using type 2 fuzzy sets,” Information Sciences, vol. 125, no. 1–4, pp. 65–82, 2000.

[14] S. Mitra and P. P. Kundu, “Satellite image segmentation with shadowed C-means,” Information Sciences, vol. 181, no. 17, pp. 3601–3613, 2011.

[15] T. H. Grubesic, “On the application of fuzzy clustering for crime hot spot detection,” Journal of Quantitative Criminology, vol. 22, no. 1, pp. 77–105, 2006.


[16] N. H. Park, S. H. Oh, and W. S. Lee, “Anomaly intrusion detection by clustering transactional audit streams in a host computer,” Information Sciences, vol. 180, no. 12, pp. 2375–2389, 2010.

[17] A. J. Sahoo and Y. Kumar, “Modified teacher learning based optimization method for data clustering,” in Advances in Signal Processing and Intelligent Recognition Systems, vol. 264 of Advances in Intelligent Systems and Computing, pp. 429–437, Springer, 2014.

[18] M. R. Garey, D. S. Johnson, and H. S. Witsenhausen, “The complexity of the generalized Lloyd-Max problem,” IEEE Transactions on Information Theory, vol. 28, no. 2, pp. 255–256, 1982.

[19] K. S. Al-Sultan, “A tabu search approach to the clustering problem,” Pattern Recognition, vol. 28, no. 9, pp. 1443–1451, 1995.

[20] S. Z. Selim and K. Al-Sultan, “A simulated annealing algorithm for the clustering problem,” Pattern Recognition, vol. 24, no. 10, pp. 1003–1008, 1991.

[21] C. A. Murthy and N. Chowdhury, “In search of optimal clusters using genetic algorithms,” Pattern Recognition Letters, vol. 17, no. 8, pp. 825–832, 1996.

[22] J. Kennedy and R. Eberhart, “Particle swarm optimization,” in Proceedings of the IEEE International Conference on Neural Networks, vol. 4, pp. 1942–1948, IEEE, Perth, Australia, December 1995.

[23] J. L. Fernández Martínez, E. García-Gonzalo, and J. P. Fernández-Álvarez, “Theoretical analysis of particle swarm trajectories through a mechanical analogy,” International Journal of Computational Intelligence Research, vol. 4, no. 2, pp. 93–104, 2008.

[24] J. L. Fernández-Martínez and E. García-Gonzalo, “Stochastic stability and numerical analysis of two novel algorithms of the PSO family: PP-GPSO and RR-GPSO,” International Journal on Artificial Intelligence Tools, vol. 21, no. 3, Article ID 1240011, 20 pages, 2012.

[25] J. L. Fernández Martínez and E. García-Gonzalo, “Stochastic stability analysis of the linear continuous and discrete PSO models,” IEEE Transactions on Evolutionary Computation, vol. 15, no. 3, pp. 405–423, 2011.

[26] J. L. Fernández Martínez, E. García Gonzalo, Z. Fernández Muñiz, and T. Mukerji, “How to design a powerful family of particle swarm optimizers for inverse modelling,” Transactions of the Institute of Measurement and Control, vol. 34, no. 6, pp. 705–719, 2012.

[27] A. Hatamlou, S. Abdullah, and M. Hatamlou, “Data clustering using big bang–big crunch algorithm,” in Innovative Computing Technology, vol. 241 of Communications in Computer and Information Science, pp. 383–388, 2011.

[28] M. Dorigo, G. Di Caro, and L. M. Gambardella, “Ant algorithms for discrete optimization,” Artificial Life, vol. 5, no. 2, pp. 137–172, 1999.

[29] E. Rashedi, H. Nezamabadi-pour, and S. Saryazdi, “GSA: a gravitational search algorithm,” Information Sciences, vol. 179, no. 13, pp. 2232–2248, 2009.

[30] A. Hatamlou, S. Abdullah, and Z. Othman, “Gravitational search algorithm with heuristic search for clustering problems,” in Proceedings of the 3rd IEEE Conference on Data Mining and Optimization (DMO '11), pp. 190–193, Putrajaya, Malaysia, June 2011.

[31] S. Das, A. Abraham, and A. Konar, “Metaheuristic pattern clustering—an overview,” in Metaheuristic Clustering, vol. 178 of Studies in Computational Intelligence, pp. 1–62, Springer, Berlin, Germany, 2009.

[32] S.-H. Liao, P.-H. Chu, and P.-Y. Hsiao, “Data mining techniques and applications—a decade review from 2000 to 2011,” Expert Systems with Applications, vol. 39, no. 12, pp. 11303–11311, 2012.

[33] S. J. Nanda and G. Panda, “A survey on nature inspired metaheuristic algorithms for partitional clustering,” Swarm and Evolutionary Computation, vol. 16, pp. 1–18, 2014.

[34] D. Simon, “Biogeography-based optimization,” IEEE Transactions on Evolutionary Computation, vol. 12, no. 6, pp. 702–713, 2008.

[35] W. Guo, M. Chen, L. Wang, S. S. Ge, and Q. Wu, “Design of migration operators for biogeography-based optimization and Markov analysis,” submitted to Information Sciences.

[36] D. Simon, M. Ergezer, and D. Du, “Population distributions in biogeography-based optimization algorithms with elitism,” in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC '09), pp. 991–996, October 2009.

[37] C. L. Blake and C. J. Merz, “UCI repository of machine learning databases,” 1998, http://archive.ics.uci.edu/ml/datasets.html.

[38] A. K. Jain, “Data clustering: 50 years beyond K-means,” Pattern Recognition Letters, vol. 31, no. 8, pp. 651–666, 2010.

[39] A. Hatamlou, “Black hole: a new heuristic optimization approach for data clustering,” Information Sciences, vol. 222, pp. 175–184, 2013.

[40] S. Rana, S. Jasola, and R. Kumar, “A review on particle swarm optimization algorithms and their applications to data clustering,” Artificial Intelligence Review, vol. 35, no. 3, pp. 211–222, 2011.

[41] A. M. Bagirov and J. Yearwood, “A new nonsmooth optimization algorithm for minimum sum-of-squares clustering problems,” European Journal of Operational Research, vol. 170, no. 2, pp. 578–596, 2006.

[42] S. Yang, R. Wu, M. Wang, and L. Jiao, “Evolutionary clustering based vector quantization and SPIHT coding for image compression,” Pattern Recognition Letters, vol. 31, no. 13, pp. 1773–1780, 2010.

[43] M. Ergezer, D. Simon, and D. Du, “Oppositional biogeography-based optimization,” in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC '09), pp. 1009–1014, October 2009.

[44] S. Yazdani, J. Shanbehzadeh, and E. Aminian, “Feature subset selection using constrained binary/integer biogeography-based optimization,” ISA Transactions, vol. 52, no. 3, pp. 383–390, 2013.

[45] S. Paterlini and T. Krink, “High performance clustering with differential evolution,” in Proceedings of the Congress on Evolutionary Computation (CEC '04), vol. 2, pp. 2004–2011, Piscataway, NJ, USA, June 2004.

[46] P. S. Shelokar, V. K. Jayaraman, and B. D. Kulkarni, “An ant colony approach for clustering,” Analytica Chimica Acta, vol. 509, no. 2, pp. 187–195, 2004.

Submit your manuscripts athttpwwwhindawicom

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MEDIATORSINFLAMMATION

of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Behavioural Neurology

EndocrinologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Disease Markers

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

OncologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Oxidative Medicine and Cellular Longevity

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

PPAR Research

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Immunology ResearchHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

ObesityJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational and Mathematical Methods in Medicine

OphthalmologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Diabetes ResearchJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Research and TreatmentAIDS

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Gastroenterology Research and Practice

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Parkinsonrsquos Disease

Evidence-Based Complementary and Alternative Medicine

Volume 2014Hindawi Publishing Corporationhttpwwwhindawicom

Page 7: Research Article An Efficient Optimization Method for ...downloads.hindawi.com/journals/cmmm/2015/802754.pdf · 4. BBO Algorithm for Data Clustering In order to use BBO algorithm

Computational and Mathematical Methods in Medicine 7

Table 5 The obtained best centroids coordinate for Glass data

Glass data Cluster 1 Cluster 2 Cluster 3 Cluster 4 Cluster 5 Cluster 6Feature A 15260 15156 15228 15266 15203 15243Feature B 119759 130863 146577 132229 137277 138085Feature C 0006 35272 00061 04232 35127 23414Feature D 10514 13618 22170 15242 10249 25919Feature E 720540 728710 732504 730610 719072 711423Feature F 02552 05768 00299 03865 02067 25749Feature G 143566 83588 86714 111471 94166 59948Feature H 01808 00046 1047 000979 00281 13373Feature I 01254 00568 00196 01544 00498 02846

cluster is calculated and summed up as defined in (3)It is also the evaluation fitness in this paper Clearlythe smaller the value is the higher the quality of theclustering is

(ii) Error rate (ER) It is defined as the number ofmisplaced points over the total number of points inthe dataset as

ER =(sum119899pop119894=1 (if (119861119894 = 119862119894) then 0 else 1))

119899poplowast 100 (12)

where 119899pop is the total number of data points and 119861119894 and 119862119894denote the datasets of which the 119894th point is a member beforeand after clustering respectively

Since all the algorithms are stochastic algorithms there-fore for each experiment 10 independent runs are carried outto indicate the stability and robustness of the algorithms foragainst with the randomized nature of the algorithms Theaverage best (minimum) and worst (maximum) solutionsand standard deviation of solutions of 10 runs of each algo-rithm are obtained by using algorithms on the datasets whichhave been applied for comparison This process ensuresthat the candidate solutions are spread in the wide area ofthe search space and thus increases the chance of findingoptima

Table 2 presents the intracluster distances obtained fromthe eight clustering algorithms for the datasets above Forthe cancer dataset the average best and worst solutionsof BBO algorithm are 29643879 29643875 and 29643887respectively which are much better than those of otheralgorithms except BB-BC which is the same as it Thismeans that it provides the optimum value and small standarddeviation when compared to those obtained by the othermethods For the CMCdataset the proposedmethod reachesan average of 55322550 while other algorithms were unableto reach this solution Also the results obtained on the glassdataset show that BBO method converges to the optimumof 2152097 in all of runs while the average solutions ofthe 119896-means TS SA GA PSO BB-BC GSA and ACOare 2279779 28379 28219 23049328 2312306 255382335433 and 27346 respectively For the iris dataset theaverage of solutions found by BBO is 965653 while this value

Table 6 The obtained best centroids coordinate for Iris data

Iris data Cluster 1 Cluster 2 Cluster 3Feature A 50150 59338 67343Feature B 34185 27974 30681Feature C 14681 44173 56299Feature D 02380 14165 21072

for the 119896-means TS SA GA PSO BB-BC GSA and ACOis 1057290 978680 9995 981423 967654 1251970 967311and 971715 respectively As seen from the results for the voweldataset the BBO algorithm outperformed the 119870-means TSSA GA PSO BB-BC GSA and ACO algorithms with theaverage solution 1490729042 For theWine dataset the BBOalgorithm achieved the optimum value of 162926782 whichis significantly better than the other tested algorithms

From Table 2 we can see that the BBO algorithm hasachieved the good performance in terms of the average bestandworst intercluster distances on these six datasets Itmeansthat BBO can find good quality solutions

The best centroids coordinates obtained by the BBOalgorithm on the test dataset are shown in Tables 3ndash8 FinallyTable 9 shows the error rate values obtained by algorithms forreal datasets As seen from the results in Table 9 the BBOalgorithm presents a minimum average error rate in all thereal datasets However the topography of the cost function ofclustering (3) has a valley shape therefore the found solutionsby thesemethodswere not globalTherefore the experimentalresults in the tables demonstrate that the proposed method isone of practicable and good techniques for data clustering

6 Conclusions

In summary this paper presents a new clustering algorithmbased on the recently developed BBOheuristic algorithm thatis inspired by mathematical models of science of biogeogra-phy (study of the distribution of animals and plants over timeand space)

To evaluate the performance of the BBO algorithm itwas tested on six real life datasets and compared with othereight clustering algorithmsThe experimental results indicate

8 Computational and Mathematical Methods in Medicine

Table 7 The obtained best centroids coordinate for Vowel data

Vowel data Cluster 1 Cluster 2 Cluster 3 Cluster 4 Cluster 5 Cluster 6Feature A 3578349 3758459 5081135 4079219 6236778 4396126Feature B 22916435 21484110 18382133 101820145 13098038 9874300Feature C 29782399 26788524 25559085 23172847 23327767 26654154

Table 8 The obtained best centroids coordinates forWine data

Wine data Cluster 1 Cluster 2 Cluster 3Feature A 133856 127859 127093Feature B 19976 23535 23219Feature C 23150 24954 24497Feature D 169836 195480 211983Feature E 1052124 989327 926449Feature F 30255 20964 21366Feature G 31380 14428 19187Feature H 051050 031322 03520Feature I 23769 17629 14966Feature J 57760 58415 43213Feature K 08339 11220 12229Feature L 30686 19611 25417Feature M 11374923 6873041 4638856

Table 9 Error rates for real life datasets

Dataset 119870-means PSO GSA BBOCancer 408 511 374 37CMC 5449 5441 5567 5422Glass 3771 4559 4139 3647Iris 1780 1253 1004 1003Vowel 4426 4465 4226 4136Wine 3112 2871 2915 2865

that the BBO optimization algorithm is suitable and usefulheuristic technique for data clustering In order to improvethe obtained results as a future work we plan to hybridizethe proposed approach with other algorithms and we intendto apply this method with other data mining problems

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

The authors would like to thank the Ministry of EducationMalaysia for funding this research project through a ResearchUniversity Grant of Universiti Teknologi Malaysia (UTM)project titled ldquoBased Syariah Compliance Chicken Slaugh-tering Process for Online Monitoring and Enforcementrdquo(01G55) Also thanks are due to the Research ManagementCenter (RMC) of UTM for providing an excellent researchenvironment in which to complete this work

References

[1] W A Barbakh Y Wu and C Fyfe ldquoReview of clustering algo-rithmsrdquo inNon-Standard Parameter Adaptation for ExploratoryData Analysis vol 249 of Studies in Computational Intelligencepp 7ndash28 Springer Berlin Germany 2009

[2] X Cai and W Li ldquoA spectral analysis approach to documentsummarization clustering and ranking sentences simultane-ouslyrdquo Information Sciences vol 181 no 18 pp 3816ndash3827 2011

[3] M Carullo E Binaghi and I Gallo ldquoAn online document clus-tering technique for short web contentsrdquo Pattern RecognitionLetters vol 30 no 10 pp 870ndash876 2009

[4] W Halberstadt and T S Douglas ldquoFuzzy clustering to detecttuberculous meningitis-associated hyperdensity in CT imagesrdquoComputers in Biology and Medicine vol 38 no 2 pp 165ndash1702008

[5] L Liao T Lin and B Li ldquoMRI brain image segmentation andbias field correction based on fast spatially constrained kernelclustering approachrdquo Pattern Recognition Letters vol 29 no 10pp 1580ndash1588 2008

[6] E RHruschka R J G B Campello andLN deCastro ldquoEvolv-ing clusters in gene-expression datardquo Information Sciences vol176 no 13 pp 1898ndash1927 2006

[7] P Papajorgji R Chinchuluun W S Lee J Bhorania and PM Pardalos ldquoClustering and classification algorithms in foodand agricultural applications a surveyrdquo inAdvances inModelingAgricultural Systems pp 433ndash454 Springer US New York NYUSA 2009

[8] Y-J Wang and H-S Lee ldquoA clustering method to identifyrepresentative financial ratiosrdquo Information Sciences vol 178no 4 pp 1087ndash1097 2008

[9] J Li K Wang and L Xu ldquoChameleon based on clusteringfeature tree and its application in customer segmentationrdquoAnnals of Operations Research vol 168 no 1 pp 225ndash245 2009

[10] Y-C Song H-D Meng M J OrsquoGrady and G M P OrsquoHareldquoThe application of cluster analysis in geophysical data interpre-tationrdquo Computational Geosciences vol 14 no 2 pp 263ndash2712010

[11] G Cardoso and F Gomide ldquoNewspaper demand predictionand replacement model based on fuzzy clustering and rulesrdquoInformation Sciences vol 177 no 21 pp 4799ndash4809 2007

[12] S Das and S Sil ldquoKernel-induced fuzzy clustering of imagepixels with an improved differential evolution algorithmrdquo Infor-mation Sciences vol 180 no 8 pp 1237ndash1256 2010

[13] R I John P R Innocent and M R Barnes ldquoNeuro-fuzzyclustering of radiographic tibia image data using type 2 fuzzysetsrdquo Information Sciences vol 125 no 1ndash4 pp 65ndash82 2000

[14] S Mitra and P P Kundu ldquoSatellite image segmentation withshadowed C-meansrdquo Information Sciences vol 181 no 17 pp3601ndash3613 2011

[15] T H Grubesic ldquoOn the application of fuzzy clustering for crimehot spot detectionrdquo Journal of Quantitative Criminology vol 22no 1 pp 77ndash105 2006

Computational and Mathematical Methods in Medicine 9

[16] N H Park S H Oh and W S Lee ldquoAnomaly intrusiondetection by clustering transactional audit streams in a hostcomputerrdquo Information Sciences vol 180 no 12 pp 2375ndash23892010

[17] A J Sahoo and Y Kumar ldquoModified teacher learning basedoptimization method for data clusteringrdquo in Advances in Sig-nal Processing and Intelligent Recognition Systems vol 264 ofAdvances in Intelligent Systems and Computing pp 429ndash437Springer 2014

[18] M R Garey D S Johnson and H S Witsenhausen ldquoThecomplexity of the generalized Lloyd-max problemrdquo IEEETrans-actions on Information Theory vol 28 no 2 pp 255ndash256 1982

[19] K S Al-Sultan ldquoA Tabu search approach to the clusteringproblemrdquo Pattern Recognition vol 28 no 9 pp 1443ndash1451 1995

[20] S Z Selim and K Al-Sultan ldquoA simulated annealing algorithmfor the clustering problemrdquo Pattern Recognition vol 24 no 10pp 1003ndash1008 1991

[21] C A Murthy and N Chowdhury ldquoIn search of optimal clustersusing genetic algorithmsrdquoPatternRecognition Letters vol 17 no8 pp 825ndash832 1996

[22] J Kennedy and R Eberhart ldquoParticle swarm optimizationrdquo inProceedings of the IEEE International Conference on Neural Net-works vol 4 pp 1942ndash1948 IEEE Perth Australia December1995

[23] J L Fernandez Martınez E Garcia-Gonzalo and J PFernandez-Alvarez ldquoTheoretical analysis of particle swarm tra-jectories through a mechanical analogyrdquo International Journalof Computational Intelligence Research vol 4 no 2 pp 93ndash1042008

[24] J L Fernandez-Martınez and E Garcıa-Gonzalo ldquoStochasticstability and numerical analysis of two novel algorithms of thePSO family PP-GPSO and RR-GPSOrdquo International Journal onArtificial Intelligence Tools vol 21 no 3 Article ID 1240011 20pages 2012

[25] J L Fernandez Martınez and E Garcia-Gonzalo ldquoStochasticstability analysis of the linear continuous and discrete PSOmodelsrdquo IEEE Transactions on Evolutionary Computation vol15 no 3 pp 405ndash423 2011

[26] J L Fernandez Martınez E Garcıa Gonzalo Z FernandezMuniz and T Mukerji ldquoHow to design a powerful family ofparticle swarm optimizers for inverse modellingrdquo Transactionsof the Institute of Measurement and Control vol 34 no 6 pp705ndash719 2012

[27] A Hatamlou S Abdullah and M Hatamlou ldquoData clusteringusing big bangndashbig crunch algorithmrdquo in Innovative ComputingTechnology vol 241 of Communications in Computer andInformation Science pp 383ndash388 2011

[28] M Dorigo G Di Caro and LM Gambardella ldquoAnt algorithmsfor discrete optimizationrdquo Artificial Life vol 5 no 2 pp 137ndash172 1999

[29] E Rashedi H Nezamabadi-pour and S Saryazdi ldquoGSA agravitational search algorithmrdquo Information Sciences vol 179no 13 pp 2232ndash2248 2009

[30] A Hatamlou S Abdullah and Z Othman ldquoGravitationalsearch algorithm with heuristic search for clustering problemsrdquoin Proceedings of the 3rd IEEE Conference on Data Mining andOptimization (DMO rsquo11) pp 190ndash193 Putrajaya Malaysia June2011

[31] S Das A Abraham and A Konar ldquoMetaheuristic patternclusteringmdashan overviewrdquo inMetaheuristic Clustering vol 178 ofStudies in Computational Intelligence pp 1ndash62 Springer BerlinGermany 2009

[32] S-H Liao P-H Chu and P-Y Hsiao ldquoData mining techniquesand applicationsmdasha decade review from 2000 to 2011rdquo ExpertSystems with Applications vol 39 no 12 pp 11303ndash11311 2012

[33] S J Nanda and G Panda ldquoA survey on nature inspiredmetaheuristic algorithms for partitional clusteringrdquo Swarm andEvolutionary Computation vol 16 pp 1ndash18 2014

[34] D Simon ldquoBiogeography-based optimizationrdquo IEEE Transac-tions on Evolutionary Computation vol 12 no 6 pp 702ndash7132008

[35] W Guo M Chen L Wang S S Ge and Q Wu ldquoDesign ofmigration operators for biogeography-based optimization andmarkov analysisrdquo Submitted to Information Sciences

[36] D Simon M Ergezer and D Du ldquoPopulation distributions inbiogeography-based optimization algorithms with elitismrdquo inProceedings of the IEEE International Conference on SystemsMan and Cybernetics (SMC rsquo09) pp 991ndash996 October 2009

[37] C L Blake and C J Merz ldquoUCI repository of machine learningdatabasesrdquo 1998 httparchiveicsuciedumldatasetshtml

[38] A K Jain ldquoData clustering 50 years beyond K-meansrdquo PatternRecognition Letters vol 31 no 8 pp 651ndash666 2010

[39] A Hatamlou ldquoBlack hole a new heuristic optimizationapproach for data clusteringrdquo Information Sciences vol 222 pp175ndash184 2013

[40] S Rana S Jasola and R Kumar ldquoA review on particle swarmoptimization algorithms and their applications to data cluster-ingrdquo Artificial Intelligence Review vol 35 no 3 pp 211ndash2222011

[41] A M Bagirov and J Yearwood ldquoA new nonsmooth opti-mization algorithm for minimum sum-of-squares clusteringproblemsrdquo European Journal of Operational Research vol 170no 2 pp 578ndash596 2006

[42] S Yang R Wu M Wang and L Jiao ldquoEvolutionary clusteringbased vector quantization and SPIHT coding for image com-pressionrdquo Pattern Recognition Letters vol 31 no 13 pp 1773ndash1780 2010

[43] M Ergezer D Simon and D Du ldquoOppositional biogeography-based optimizationrdquo in Proceedings of the IEEE InternationalConference on Systems Man and Cybernetics (SMC rsquo09) pp1009ndash1014 October 2009

[44] S Yazdani J Shanbehzadeh and E Aminian ldquoFeature subsetselection using constrained binaryinteger biogeography-basedoptimizationrdquo ISATransactions vol 52 no 3 pp 383ndash390 2013

[45] S Paterlini andTKrink ldquoHigh performance clusteringwith dif-ferential evolutionrdquo in Proceedings of the Congress on Evolution-ary Computation (CEC rsquo04) vol 2 pp 2004ndash2011 PiscatawayNJ USA June 2004

[46] P S Shelokar V K Jayaraman and B D Kulkarni ldquoAn antcolony approach for clusteringrdquo Analytica Chimica Acta vol509 no 2 pp 187ndash195 2004

Submit your manuscripts athttpwwwhindawicom

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MEDIATORSINFLAMMATION

of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Behavioural Neurology

EndocrinologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Disease Markers

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

OncologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Oxidative Medicine and Cellular Longevity

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

PPAR Research

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Immunology ResearchHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

ObesityJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational and Mathematical Methods in Medicine

OphthalmologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Diabetes ResearchJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Research and TreatmentAIDS

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Gastroenterology Research and Practice

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Parkinsonrsquos Disease

Evidence-Based Complementary and Alternative Medicine

Volume 2014Hindawi Publishing Corporationhttpwwwhindawicom

Page 8: Research Article An Efficient Optimization Method for ...downloads.hindawi.com/journals/cmmm/2015/802754.pdf · 4. BBO Algorithm for Data Clustering In order to use BBO algorithm

8 Computational and Mathematical Methods in Medicine

Table 7 The obtained best centroids coordinate for Vowel data

Vowel data Cluster 1 Cluster 2 Cluster 3 Cluster 4 Cluster 5 Cluster 6Feature A 3578349 3758459 5081135 4079219 6236778 4396126Feature B 22916435 21484110 18382133 101820145 13098038 9874300Feature C 29782399 26788524 25559085 23172847 23327767 26654154

Table 8 The obtained best centroids coordinates forWine data

Wine data Cluster 1 Cluster 2 Cluster 3Feature A 133856 127859 127093Feature B 19976 23535 23219Feature C 23150 24954 24497Feature D 169836 195480 211983Feature E 1052124 989327 926449Feature F 30255 20964 21366Feature G 31380 14428 19187Feature H 051050 031322 03520Feature I 23769 17629 14966Feature J 57760 58415 43213Feature K 08339 11220 12229Feature L 30686 19611 25417Feature M 11374923 6873041 4638856

Table 9 Error rates for real life datasets

Dataset 119870-means PSO GSA BBOCancer 408 511 374 37CMC 5449 5441 5567 5422Glass 3771 4559 4139 3647Iris 1780 1253 1004 1003Vowel 4426 4465 4226 4136Wine 3112 2871 2915 2865

that the BBO optimization algorithm is suitable and usefulheuristic technique for data clustering In order to improvethe obtained results as a future work we plan to hybridizethe proposed approach with other algorithms and we intendto apply this method with other data mining problems

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

The authors would like to thank the Ministry of EducationMalaysia for funding this research project through a ResearchUniversity Grant of Universiti Teknologi Malaysia (UTM)project titled ldquoBased Syariah Compliance Chicken Slaugh-tering Process for Online Monitoring and Enforcementrdquo(01G55) Also thanks are due to the Research ManagementCenter (RMC) of UTM for providing an excellent researchenvironment in which to complete this work

References

[1] W A Barbakh Y Wu and C Fyfe ldquoReview of clustering algo-rithmsrdquo inNon-Standard Parameter Adaptation for ExploratoryData Analysis vol 249 of Studies in Computational Intelligencepp 7ndash28 Springer Berlin Germany 2009

[2] X Cai and W Li ldquoA spectral analysis approach to documentsummarization clustering and ranking sentences simultane-ouslyrdquo Information Sciences vol 181 no 18 pp 3816ndash3827 2011

[3] M Carullo E Binaghi and I Gallo ldquoAn online document clus-tering technique for short web contentsrdquo Pattern RecognitionLetters vol 30 no 10 pp 870ndash876 2009

[4] W Halberstadt and T S Douglas ldquoFuzzy clustering to detecttuberculous meningitis-associated hyperdensity in CT imagesrdquoComputers in Biology and Medicine vol 38 no 2 pp 165ndash1702008

[5] L Liao T Lin and B Li ldquoMRI brain image segmentation andbias field correction based on fast spatially constrained kernelclustering approachrdquo Pattern Recognition Letters vol 29 no 10pp 1580ndash1588 2008

[6] E RHruschka R J G B Campello andLN deCastro ldquoEvolv-ing clusters in gene-expression datardquo Information Sciences vol176 no 13 pp 1898ndash1927 2006

[7] P Papajorgji R Chinchuluun W S Lee J Bhorania and PM Pardalos ldquoClustering and classification algorithms in foodand agricultural applications a surveyrdquo inAdvances inModelingAgricultural Systems pp 433ndash454 Springer US New York NYUSA 2009

[8] Y-J Wang and H-S Lee ldquoA clustering method to identifyrepresentative financial ratiosrdquo Information Sciences vol 178no 4 pp 1087ndash1097 2008

[9] J Li K Wang and L Xu ldquoChameleon based on clusteringfeature tree and its application in customer segmentationrdquoAnnals of Operations Research vol 168 no 1 pp 225ndash245 2009

[10] Y-C Song H-D Meng M J OrsquoGrady and G M P OrsquoHareldquoThe application of cluster analysis in geophysical data interpre-tationrdquo Computational Geosciences vol 14 no 2 pp 263ndash2712010

[11] G Cardoso and F Gomide ldquoNewspaper demand predictionand replacement model based on fuzzy clustering and rulesrdquoInformation Sciences vol 177 no 21 pp 4799ndash4809 2007

[12] S Das and S Sil ldquoKernel-induced fuzzy clustering of imagepixels with an improved differential evolution algorithmrdquo Infor-mation Sciences vol 180 no 8 pp 1237ndash1256 2010

[13] R I John P R Innocent and M R Barnes ldquoNeuro-fuzzyclustering of radiographic tibia image data using type 2 fuzzysetsrdquo Information Sciences vol 125 no 1ndash4 pp 65ndash82 2000

[14] S Mitra and P P Kundu ldquoSatellite image segmentation withshadowed C-meansrdquo Information Sciences vol 181 no 17 pp3601ndash3613 2011

[15] T H Grubesic ldquoOn the application of fuzzy clustering for crimehot spot detectionrdquo Journal of Quantitative Criminology vol 22no 1 pp 77ndash105 2006

Computational and Mathematical Methods in Medicine 9

[16] N H Park S H Oh and W S Lee ldquoAnomaly intrusiondetection by clustering transactional audit streams in a hostcomputerrdquo Information Sciences vol 180 no 12 pp 2375ndash23892010

[17] A J Sahoo and Y Kumar ldquoModified teacher learning basedoptimization method for data clusteringrdquo in Advances in Sig-nal Processing and Intelligent Recognition Systems vol 264 ofAdvances in Intelligent Systems and Computing pp 429ndash437Springer 2014

[18] M R Garey D S Johnson and H S Witsenhausen ldquoThecomplexity of the generalized Lloyd-max problemrdquo IEEETrans-actions on Information Theory vol 28 no 2 pp 255ndash256 1982

[19] K S Al-Sultan ldquoA Tabu search approach to the clusteringproblemrdquo Pattern Recognition vol 28 no 9 pp 1443ndash1451 1995

[20] S Z Selim and K Al-Sultan ldquoA simulated annealing algorithmfor the clustering problemrdquo Pattern Recognition vol 24 no 10pp 1003ndash1008 1991

[21] C A Murthy and N Chowdhury ldquoIn search of optimal clustersusing genetic algorithmsrdquoPatternRecognition Letters vol 17 no8 pp 825ndash832 1996

[22] J Kennedy and R Eberhart ldquoParticle swarm optimizationrdquo inProceedings of the IEEE International Conference on Neural Net-works vol 4 pp 1942ndash1948 IEEE Perth Australia December1995

[23] J L Fernandez Martınez E Garcia-Gonzalo and J PFernandez-Alvarez ldquoTheoretical analysis of particle swarm tra-jectories through a mechanical analogyrdquo International Journalof Computational Intelligence Research vol 4 no 2 pp 93ndash1042008

[24] J L Fernandez-Martınez and E Garcıa-Gonzalo ldquoStochasticstability and numerical analysis of two novel algorithms of thePSO family PP-GPSO and RR-GPSOrdquo International Journal onArtificial Intelligence Tools vol 21 no 3 Article ID 1240011 20pages 2012

[25] J L Fernandez Martınez and E Garcia-Gonzalo ldquoStochasticstability analysis of the linear continuous and discrete PSOmodelsrdquo IEEE Transactions on Evolutionary Computation vol15 no 3 pp 405ndash423 2011

[26] J L Fernandez Martınez E Garcıa Gonzalo Z FernandezMuniz and T Mukerji ldquoHow to design a powerful family ofparticle swarm optimizers for inverse modellingrdquo Transactionsof the Institute of Measurement and Control vol 34 no 6 pp705ndash719 2012

[27] A Hatamlou S Abdullah and M Hatamlou ldquoData clusteringusing big bangndashbig crunch algorithmrdquo in Innovative ComputingTechnology vol 241 of Communications in Computer andInformation Science pp 383ndash388 2011

[28] M Dorigo G Di Caro and LM Gambardella ldquoAnt algorithmsfor discrete optimizationrdquo Artificial Life vol 5 no 2 pp 137ndash172 1999

[29] E Rashedi H Nezamabadi-pour and S Saryazdi ldquoGSA agravitational search algorithmrdquo Information Sciences vol 179no 13 pp 2232ndash2248 2009

[30] A Hatamlou S Abdullah and Z Othman ldquoGravitationalsearch algorithm with heuristic search for clustering problemsrdquoin Proceedings of the 3rd IEEE Conference on Data Mining andOptimization (DMO rsquo11) pp 190ndash193 Putrajaya Malaysia June2011

[31] S Das A Abraham and A Konar ldquoMetaheuristic patternclusteringmdashan overviewrdquo inMetaheuristic Clustering vol 178 ofStudies in Computational Intelligence pp 1ndash62 Springer BerlinGermany 2009

[32] S-H Liao P-H Chu and P-Y Hsiao ldquoData mining techniquesand applicationsmdasha decade review from 2000 to 2011rdquo ExpertSystems with Applications vol 39 no 12 pp 11303ndash11311 2012

[33] S J Nanda and G Panda ldquoA survey on nature inspiredmetaheuristic algorithms for partitional clusteringrdquo Swarm andEvolutionary Computation vol 16 pp 1ndash18 2014

[34] D Simon ldquoBiogeography-based optimizationrdquo IEEE Transac-tions on Evolutionary Computation vol 12 no 6 pp 702ndash7132008

[35] W Guo M Chen L Wang S S Ge and Q Wu ldquoDesign ofmigration operators for biogeography-based optimization andmarkov analysisrdquo Submitted to Information Sciences

[36] D Simon M Ergezer and D Du ldquoPopulation distributions inbiogeography-based optimization algorithms with elitismrdquo inProceedings of the IEEE International Conference on SystemsMan and Cybernetics (SMC rsquo09) pp 991ndash996 October 2009

[37] C L Blake and C J Merz ldquoUCI repository of machine learningdatabasesrdquo 1998 httparchiveicsuciedumldatasetshtml

[38] A K Jain ldquoData clustering 50 years beyond K-meansrdquo PatternRecognition Letters vol 31 no 8 pp 651ndash666 2010

[39] A Hatamlou ldquoBlack hole a new heuristic optimizationapproach for data clusteringrdquo Information Sciences vol 222 pp175ndash184 2013

[40] S Rana S Jasola and R Kumar ldquoA review on particle swarmoptimization algorithms and their applications to data cluster-ingrdquo Artificial Intelligence Review vol 35 no 3 pp 211ndash2222011

[41] A M Bagirov and J Yearwood ldquoA new nonsmooth opti-mization algorithm for minimum sum-of-squares clusteringproblemsrdquo European Journal of Operational Research vol 170no 2 pp 578ndash596 2006

[42] S Yang R Wu M Wang and L Jiao ldquoEvolutionary clusteringbased vector quantization and SPIHT coding for image com-pressionrdquo Pattern Recognition Letters vol 31 no 13 pp 1773ndash1780 2010

[43] M Ergezer D Simon and D Du ldquoOppositional biogeography-based optimizationrdquo in Proceedings of the IEEE InternationalConference on Systems Man and Cybernetics (SMC rsquo09) pp1009ndash1014 October 2009

[44] S Yazdani J Shanbehzadeh and E Aminian ldquoFeature subsetselection using constrained binaryinteger biogeography-basedoptimizationrdquo ISATransactions vol 52 no 3 pp 383ndash390 2013

[45] S Paterlini andTKrink ldquoHigh performance clusteringwith dif-ferential evolutionrdquo in Proceedings of the Congress on Evolution-ary Computation (CEC rsquo04) vol 2 pp 2004ndash2011 PiscatawayNJ USA June 2004

[46] P S Shelokar V K Jayaraman and B D Kulkarni ldquoAn antcolony approach for clusteringrdquo Analytica Chimica Acta vol509 no 2 pp 187ndash195 2004

Submit your manuscripts athttpwwwhindawicom

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MEDIATORSINFLAMMATION

of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Behavioural Neurology

EndocrinologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Disease Markers

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

OncologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Oxidative Medicine and Cellular Longevity

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

PPAR Research

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Immunology ResearchHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

ObesityJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational and Mathematical Methods in Medicine

OphthalmologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Diabetes ResearchJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Research and TreatmentAIDS

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Gastroenterology Research and Practice

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Parkinsonrsquos Disease

Evidence-Based Complementary and Alternative Medicine

Volume 2014Hindawi Publishing Corporationhttpwwwhindawicom

Page 9: Research Article An Efficient Optimization Method for ...downloads.hindawi.com/journals/cmmm/2015/802754.pdf · 4. BBO Algorithm for Data Clustering In order to use BBO algorithm

Computational and Mathematical Methods in Medicine 9

