




A Cooperative Multi-population Approach to Clustering Temporal Data

Kristina Georgieva
CIRG, Department of Computer Science
University of Pretoria
Email: [email protected]

Andries P. Engelbrecht
CIRG, Department of Computer Science
University of Pretoria
Email: [email protected]

Abstract—In temporal environments, population-based data clustering algorithms suffer when changes in the data occur during the clustering process. Diversity of the population is lost and the memory of the individuals of the population becomes outdated, making the clusters found before the change non-optimal. This paper proposes a new particle swarm optimisation alternative for clustering temporal data. It combines the dynamic properties of the multi-swarm particle swarm optimisation algorithm with the multi-objective properties of the cooperative particle swarm optimisation algorithm. The proposed alternative is compared to various existing data clustering algorithms, which are briefly described in the paper, and the results are discussed, including a comparison of four performance measures relevant to the clustering of data.

I. INTRODUCTION

Clustering [1] refers to the grouping of similar objects together such that relationships among these objects can be determined and evaluated. Data clustering has been performed using k-means [2], fuzzy c-means [3], differential evolution [4], artificial immune systems [5], neural networks [6] and particle swarm optimisation [7][8]. All these algorithms have been used to cluster static data. This paper focuses on evaluating particle swarm optimisation algorithms applied to the problem of clustering temporal data.

Particle swarm optimisation (PSO) algorithms have previously been developed and used for the clustering of static data [7][8]. A temporal dataset displays a different behaviour to a static one. The dataset changes as time passes, making it more complex to cluster. The complexity arises from the changes that occur in the data, which make the population's current and previous positions no longer optimal. As time passes, the population becomes less diverse, making it more difficult for the particles to explore the search space for new optimal solutions.

In this research, four algorithms have been evaluated on temporal datasets, namely the standard data clustering PSO, the reinitialising data clustering PSO, the multi-swarm PSO and the cooperative PSO. Problems such as the lack of re-diversification of the population and the lack of information exchange between populations in the multi-swarm based algorithms were encountered when evaluating these four algorithms. This paper aims at remedying these problems by combining two of the algorithms into a new clustering algorithm called the cooperative-multipopulation data clustering PSO.

II. PARTICLE SWARM OPTIMISATION

The particle swarm optimisation (PSO) algorithm [9] is a population-based algorithm modelling the social behaviour of birds. It optimises a solution by adapting the position of “particles” across the search space. These particle positions are changed by giving each dimension of the particle a velocity value, a value determined by a social component and a cognitive component. The social component refers to the influence that the best position found within the entire population, or a portion of it, has on the new velocity values, while the cognitive component refers to the influence that the best position found by the particle being adapted has over its own velocity. Algorithm 1 adapts the velocity of each particle using [10]

v_{ij}(t+1) = w v_{ij}(t) + c_1 r_{1j}(t)[y_{ij}(t) - x_{ij}(t)] + c_2 r_{2j}(t)[\hat{y}_j(t) - x_{ij}(t)]   (1)

and changes their position using [9]

x_{ij}(t+1) = x_{ij}(t) + v_{ij}(t+1)   (2)

where w is the inertia weight (how much the previous velocity of a particle contributes to the new velocity), v_{ij}(t+1) is the velocity value of particle i at dimension j at time step t+1, c_1 is the influence of the cognitive component, c_2 is the influence of the social component, y_{ij} is the personal best value of particle i for dimension j, x_{ij} is the current position of particle i for dimension j, \hat{y}_j is the global best particle's position for dimension j, r_{1j} and r_{2j} are two random values per dimension sampled from two independent uniform distributions U(0,1), and x_{ij}(t+1) is particle i's new position for dimension j at time step t+1.
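The two update rules can be sketched in Python. This is an illustrative sketch, not the authors' code: NumPy, the vectorised representation and the function name pso_update are my own choices, while the default parameter values are the ones used in the experimental setup of this paper.

```python
import numpy as np

def pso_update(x, v, y, y_hat, w=0.729844, c1=1.496180, c2=1.496180, rng=None):
    """One PSO velocity/position update per equations (1) and (2).

    x: particle positions (N, d); v: velocities (N, d);
    y: personal bests (N, d); y_hat: global best position (d,).
    Defaults are the parameters from the experimental setup.
    """
    rng = np.random.default_rng() if rng is None else rng
    r1 = rng.random(x.shape)  # r_1j ~ U(0,1), drawn per particle and dimension
    r2 = rng.random(x.shape)  # r_2j ~ U(0,1), independent of r1
    v = w * v + c1 * r1 * (y - x) + c2 * r2 * (y_hat - x)  # equation (1)
    x = x + v                                              # equation (2)
    return x, v
```

Setting c1 = c2 = 0 and w = 1 reduces the update to pure drift, which makes the rule easy to sanity-check.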

III. STANDARD DATA CLUSTERING PSO

This paper chooses to evaluate Van der Merwe and Engelbrecht's data clustering PSO [7] as the principal PSO algorithm for clustering data, and refers to this PSO as the standard data clustering PSO. The standard clustering PSO adds a

2013 IEEE Congress on Evolutionary Computation June 20-23, Cancún, México

978-1-4799-0454-9/13/$31.00 ©2013 IEEE


Algorithm 1 PSO
  Initialize a swarm P of N n-dimensional particles
  while stopping condition has not been reached do
    for each particle x_i in P do
      if fitness(x_i) < fitness(y_i) then
        y_i = x_i
      end if
      if fitness(y_i) < fitness(ŷ) then
        ŷ = y_i
      end if
    end for
    for each particle x_i in P do
      Update velocity using equation (1)
      Update position using equation (2)
    end for
  end while

few changes to the particle swarm optimisation algorithm shown in Algorithm 1 in order for the algorithm to take into consideration a dataset when determining the fitness value of a particle. The first change is the representation of a particle. While the particle swarm optimisation algorithm described by Kennedy and Eberhart [9] represents a particle as a vector, a particle of the standard data clustering PSO holds a set of vectors, as shown in Figure 1. Each vector represents a cluster centroid. The aim of this PSO is to adapt the positions of these centroids to the centre of each cluster, such that the quantization error is minimised. The second change involves the fact that the dataset needs to influence the results in some way. The fitness function ensures this by first assigning each data pattern to the centroid closest to it. Then, by calculating the quantization error,

J_e = \frac{1}{K} \sum_{k=1}^{K} \frac{\sum_{\forall z \in C_k} d(z, c_k)}{|C_k|}   (3)
each particle is assigned a fitness value depending on how close its cluster centroids are to the real clusters, where z is a data pattern, C_k is the set of data patterns assigned to a cluster, d(z, c_k) is the Euclidean distance between data pattern z and cluster centroid c_k, |C_k| is the total number of patterns assigned to the centroid and K is the total number of centroids. The quantization error used in the standard data clustering PSO allows for division by zero. This is due to dividing by the total number of data patterns assigned to a centroid, as centroids that are far from all patterns may result in no patterns being assigned to them, making this number zero. In order to use the algorithm without changing its behaviour, if a division by zero was encountered, the fitness of the particle was approximated to infinity. The standard data clustering PSO then uses the dataset to determine the fitness of each particle and performs the same velocity and position update functions as the standard PSO. The standard data clustering PSO is shown in Algorithm 2 below.
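The quantization error of equation (3), together with the infinity-on-empty-cluster rule described above, can be sketched as follows. The function name and NumPy representation are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def quantization_error(centroids, data):
    """Equation (3): mean, over clusters, of the average distance of the
    patterns assigned to each centroid. An empty cluster would cause a
    division by zero, so (as in the paper) the fitness is then
    approximated to infinity."""
    # Pairwise Euclidean distances: dists[p, k] = d(data[p], centroids[k])
    dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
    nearest = dists.argmin(axis=1)  # assign each pattern to its closest centroid
    per_cluster = []
    for k in range(len(centroids)):
        members = dists[nearest == k, k]
        if members.size == 0:       # empty cluster -> division by zero case
            return float("inf")
        per_cluster.append(members.mean())
    return float(np.mean(per_cluster))
```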

Fig. 1: Particle representation for a 6-centroid clustering problem, where x, y and z are the attributes of the data

Algorithm 2 Standard Data Clustering PSO
  Initialize a swarm, P, of N K×n-dimensional particles
  while stopping condition is not reached do
    for each particle x_i in the population do
      for each data pattern z_p do
        calculate the Euclidean distance between z_p and all centroids
        assign z_p to the centroid that resulted in the smallest distance
      end for
      Calculate the particle's fitness using equation (3)
      Update the global and personal best positions
      Update velocity using equation (1)
      Update position using equation (2)
    end for
  end while

IV. PARTICLE SWARM OPTIMISATION ALGORITHMS FOR CLUSTERING TEMPORAL DATA

The standard data clustering PSO described was developed to cluster stationary data. As this paper focuses on temporal data rather than stationary data, a few algorithms designed for dynamic environments were evaluated and compared against each other in order to determine the most appropriate one for the clustering of temporal data. These algorithms, briefly described below, include the reinitialising data clustering PSO, the multi-swarm PSO, and the cooperative PSO.

A. Reinitialising Data Clustering PSO

The reinitialising PSO [11] is a standard data clustering PSO where a portion of, or the whole, population is re-initialised as soon as a change occurs in the environment. This means that a portion of the population is placed in random positions around the search space in order to increase diversity.
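The reinitialisation step can be sketched as follows. This is an illustrative sketch: the function name, bounds representation and in-place mutation are my own choices; the 10% fraction is the one used later in the experimental setup.

```python
import numpy as np

def reinitialise(positions, fraction=0.1, bounds=(0.0, 1.0), rng=None):
    """On an environment change, place a fraction of the particles at
    uniformly random positions within the search-space bounds (10% is
    the fraction used in this paper's experiments)."""
    rng = np.random.default_rng() if rng is None else rng
    n = max(1, int(round(fraction * len(positions))))
    chosen = rng.choice(len(positions), size=n, replace=False)
    lo, hi = bounds
    positions[chosen] = rng.uniform(lo, hi, size=positions[chosen].shape)
    return positions
```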

B. Multi-swarm Data Clustering PSO

The original multi-swarm PSO described by Blackwell [12] uses a set of populations, each optimising one of the multiple solutions. In the case of data clustering, the particles need information from all cluster centroids in order to calculate their fitness value. For this reason, the original multi-swarm algorithm, where each population optimises one solution, could not be implemented. An alternative was evaluated. Here, Blackwell's multi-swarm was implemented as shown in Algorithm 3, but instead of each population optimising one centroid, each population attempts to optimise all centroids,



using the same representation of a particle as the standard data clustering PSO. This results in a set of populations all working in parallel on the same problem around different areas of the search space. In the end, the best solution among each population's best solutions is the final solution.

Algorithm 3 Data Clustering Multi-swarm PSO
  Initialize s swarms S_k, k = 1, ..., s
  Initialise r, the exclusion radius
  Initialise r_conv, the convergence radius
  while stopping condition has not been reached do
    if all swarms have a diameter which is smaller than r_conv then
      Reinitialize the worst swarm by placing its individuals in random positions of the search space
    end if
    for each swarm S_k do
      Perform an iteration of the standard data clustering PSO for S_k
    end for
    for each swarm S_k do
      for each other swarm S_l do
        if distance between ŷ_k and ŷ_l < r then
          if fitness(ŷ_k) < fitness(ŷ_l) then
            Reinitialize S_k by placing its individuals in random positions of the search space
          else
            Reinitialize S_l by placing its individuals in random positions of the search space
          end if
        end if
      end for
    end for
  end while

The algorithm acts in the same way as the one described by Blackwell, by using repulsion and anti-convergence methods to keep the swarms diverse. If two swarms' global best values are within an exclusion radius of each other, one of these populations is re-initialised. In the same manner, if all the swarms are converging, which is determined by the diameter of a swarm being within a convergence radius, one of the populations is re-initialised.
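The repulsion test can be sketched as follows, following the exclusion rule exactly as written in Algorithm 3 (for minimisation, the pair member with the smaller, i.e. better, fitness is marked for re-initialisation). The function name and set-based return value are my own choices.

```python
import numpy as np

def apply_exclusion(gbests, fitnesses, r):
    """For every pair of swarms whose global bests lie within the
    exclusion radius r of each other, mark one of the pair for
    re-initialisation, as in the repulsion step of Algorithm 3."""
    to_reinit = set()
    s = len(gbests)
    for k in range(s):
        for l in range(k + 1, s):
            if np.linalg.norm(gbests[k] - gbests[l]) < r:
                # smaller fitness = better, per the algorithm's comparison
                to_reinit.add(k if fitnesses[k] < fitnesses[l] else l)
    return to_reinit
```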

C. Cooperative Data Clustering PSO

The cooperative data clustering PSO [13], shown in Algorithm 4, is a multi-population PSO where each population optimises part of the solution. In the case of data clustering, each population optimises a centroid position. What makes this more viable than Blackwell's [12] multi-swarm PSO is the use of a context particle. The context particle is created from the best solution of each swarm and is used in the calculation of the fitness of each particle. To calculate the fitness of entity i in subswarm k, dimension k of the context particle is replaced with entity i's solution and the fitness is calculated on the complete solution. Due to the division by zero problem described in Section III, infinity values in the results were replaced with the worst fitness encountered.
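The context-particle evaluation b(k, z) can be sketched as follows. The function name, the array representation of the context and the sample fitness function in the usage example are assumptions for illustration.

```python
import numpy as np

def context_fitness(context, k, candidate, fitness_fn):
    """b(k, z): replace component k of the context particle with the
    candidate solution from subswarm k, then evaluate the fitness of the
    resulting complete solution. The stored context is left untouched."""
    full = np.array(context, dtype=float, copy=True)
    full[k] = candidate
    return fitness_fn(full)
```

For example, with a sphere fitness and context (1, 1, 1), evaluating candidate 0.0 in component 1 scores the combined solution (1, 0, 1).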

Algorithm 4 Cooperative Data Clustering PSO
  Initialize K swarms of N one-dimensional particles each
  Initialize the context particle b(k, m)
  while stopping condition has not been reached do
    for each swarm S_k do                          ▷ Update the context particle
      for each particle p_i in S_k do
        if fitness(b(k, S_k.x_i)) < fitness(b(k, S_k.y_i)) then
          S_k.y_i = S_k.x_i
        end if
        if fitness(b(k, S_k.y_i)) < fitness(b(k, S_k.ŷ)) then
          S_k.ŷ = S_k.y_i
        end if
      end for
      ▷ Update particles using the new context particle
      for each particle p_i in S_k do
        Update velocity using equation (1)
        Update position using equation (2)
      end for
    end for
  end while

V. COOPERATIVE-MULTIPOPULATION DATA CLUSTERING PSO

The cooperative-multipopulation data clustering PSO was developed in an attempt to adapt Blackwell's multi-swarm approach to the clustering problem, by making each population optimise only one centroid rather than all the centroids in parallel. The cooperative-multipopulation PSO uses K swarms, each optimising a different centroid, as shown in Figure 2, where the best solution of each swarm is one centroid position and the solution to the optimisation problem is the combination of each swarm's global best particle. The adaptation of Blackwell's [12] approach evaluated in this paper consists of various swarms each optimising all centroids at once, as shown in Figure 3, where the best particle of each swarm consists of all the centroids and the particle with the best fitness among all swarms' best particles is selected as the solution to the optimisation problem. In both figures an example of the best solution has been marked by surrounding it with a rectangle.

By separating the centroids into different swarms, the quantization error of each complete solution cannot be calculated without introducing some form of communication between the swarms. This communication was introduced by adding a context particle, exactly like the one described in the cooperative PSO. What makes this algorithm different from the cooperative PSO is the repulsion and anti-convergence mechanisms borrowed from Blackwell's multi-swarm PSO.

The two main problems with PSOs in dynamic environments [12] are the loss of diversity of the swarm and the outdated memory of the individuals of the swarm. The loss of diversity



Fig. 2: Example of centroid positions of global best particles from three swarms in a cooperative-multipopulation PSO

Fig. 3: Example of centroid positions of global best particles from three swarms in a multi-swarm PSO

refers to the fact that PSOs attempt to converge around a good solution. If a change occurs within the environment, the solution towards which the particles are moving may no longer be the best solution, and the PSO may struggle due to the lack of exploration that takes place in the later stages of the algorithm. If particles are not diverse enough, not enough exploration occurs and the PSO can become stuck in a local optimum. The second problem, outdated memory, refers to the global best and personal best values found by particles no longer being applicable once the environment has changed. Blackwell's [12] multi-swarm approach aims at remedying these problems in two ways. If all swarms are converging, the weakest one is re-initialised in order to inject diversity into the algorithm. There is, however, also a chance that swarms attempt to converge around the same solution. To avoid this, Blackwell [12] used repulsion. Here, if two swarms' global bests are within a certain distance of each other, the swarm with the better fitness value is re-initialised.

The cooperative-multipopulation data clustering PSO, shown in Algorithm 5, initialises K populations, where K is the number of cluster centroids to be optimised. Each population is assigned the task of finding an optimal cluster centroid position. A context particle holding all centroid positions is generated using the global best solutions from each population, regardless of whether the new context particle will have a better fitness than the previous one. This context particle is used in order to calculate the fitness of a particle. As the quantization error fitness measure requires information about all clusters, the context particle holds the information about all clusters. When a particle's fitness needs to be calculated, the centroid position that the particle holds is used in combination with the centroid positions held by the context particle, just like in the cooperative PSO.

Each swarm performs the standard data clustering PSO, while the multi-population PSO repulsion and anti-convergence mechanisms are applied to all the swarms. Swarms that are optimising the same centroid position are repelled, and one swarm is always exploring the search space, since a swarm is re-initialised whenever all swarms are converging.

The context particle is a representation of the global best centroids found by each swarm. This means that the final solution to the optimisation problem is this context particle. The anti-convergence introduced by the multi-swarm strategy, however, ensures that as soon as all swarms begin to converge, one is re-initialised. This means that, once swarms have converged, one of the centroids of the context particle is a re-initialised centroid, making it an inaccurate representation of the solution. To avoid this problem an extra swarm, referred to as the “explorer swarm”, was introduced. Once a swarm is re-initialised, it gains the status of “explorer swarm” and the previous “explorer swarm” takes its place. This ensures that the context particle only holds centroid positions which have been exploited by the algorithm, and not random positions caused by the re-initialisation of a swarm.
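The explorer-swarm bookkeeping can be sketched as follows. This is an illustrative sketch of the role swap only, not the authors' implementation: the mapping of centroids to swarm identifiers and the function name are assumptions.

```python
def swap_explorer(assignments, explorer, reinitialised):
    """Explorer-swarm bookkeeping: 'assignments' maps each centroid index
    to the swarm currently optimising it, and 'explorer' is the current
    explorer swarm (not contributing to the context particle). When a
    swarm is re-initialised, the old explorer takes over its centroid and
    the re-initialised swarm becomes the new explorer, so the context
    particle never contains a freshly randomised centroid."""
    for centroid, swarm in assignments.items():
        if swarm == reinitialised:
            assignments[centroid] = explorer
            break
    return assignments, reinitialised  # updated mapping, new explorer
```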

Algorithm 5 Cooperative-Multipopulation PSO
  Initialize K + 1 swarms
  for each swarm S_k do
    initialize particles
  end for
  Initialize the context particle b(k, z)
  while stopping condition is not reached do
    for each swarm S_k in population P do
      for each particle p_i in S_k do
        if fitness(b(k, S_k.x_i)) < fitness(b(k, S_k.y_i)) then
          S_k.y_i = S_k.x_i
        end if
        if fitness(b(k, S_k.y_i)) < fitness(b(k, S_k.ŷ)) then
          S_k.ŷ = S_k.y_i
        end if
      end for
    end for
    ▷ Update particles using the new context particle
    Perform one multi-swarm iteration (Algorithm 3)
  end while

The cooperative-multipopulation data clustering PSO described above changes the context particle to include the best solution of each population, regardless of whether the fitness of the context particle itself will be better than that of the previous context particle. In order to evaluate the effect of only changing the context particle if the new context particle's fitness is better than the old one's, an elitist version of the algorithm was developed. This elitist version only changes the context particle to include the new best solutions if the new context particle will have a better fitness than the old one, introducing additional exploitation.
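The difference between the two variants can be sketched as a single update rule. The function and parameter names are assumptions for illustration; smaller fitness is taken as better, consistent with the minimised quantization error.

```python
def update_context(context, candidate, fitness_fn, elitist):
    """Context-particle update: the non-elitist variant always adopts the
    swarms' newly combined global bests; the elitist variant keeps the
    old context unless the new one has a better (smaller) fitness."""
    if not elitist or fitness_fn(candidate) < fitness_fn(context):
        return candidate
    return context
```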



VI. EXPERIMENTAL SETUP

The datasets used were auto-generated temporal clustering datasets with 8 clusters, used and described by Graaff [14]. The changes in clusters occurred by pattern migration, which means that patterns moved from one cluster to another at each interval of change. The datasets consisted of patterns with either 3, 8 or 15 attributes, with changes occurring at frequencies 1 to 5 and severities 1 to 5. All datasets contained 8000 patterns and 100 timesteps. Each timestep contained 80 data patterns, and a window of size 80 was used to slide from one timestep to the next. The number of iterations per timestep is determined by

iterationsPerTimestep = totalIterations / totalTimesteps

and the timestep at which a change occurs by

timestepOfChange = (f / 10) × totalTimesteps

Therefore, the iteration at which a change occurs is timestepOfChange × iterationsPerTimestep, which results in

iterationOfChange = (f / 10) × totalIterations

Note that a “higher frequency of change” in this paper therefore means that fewer changes occur in the dataset.

Particles were initialised within the bounds of the dataset. The five algorithms described were evaluated on a combination of change frequencies 1, 2, 3, 4 and 5 (where a lower frequency means more changes occur), change severities 1, 2, 3, 4 and 5 (where a higher severity means a larger change) and dimensions 3, 8 and 15. The results of each experiment were averaged over 30 independent runs, and the algorithms ran for 1000 iterations with a swarm size of 50 particles. For the re-initialising algorithms, 10% of the swarm was re-initialised when a change occurred. In order to keep centroids within the bounds of the search space, any centroid that passed the boundaries was reset to stay on the boundary. The PSO parameters used are those described as good parameters by Van den Bergh [15], namely an inertia value of 0.729844, a social component of 1.496180 and a cognitive component of 1.496180.

A few measures were used to determine the clustering capabilities of the algorithms, including the inter-cluster distance, the intra-cluster distance and the Ray-Turi validity index. The inter-cluster distance [14] refers to the distance between the various clusters, a distance that the algorithm must maximise and which is calculated using

J_{inter} = \frac{2}{K(K-1)} \sum_{k=1}^{K-1} \sum_{k_2=k+1}^{K} d(c_k, c_{k_2})   (4)

where K is the total number of cluster centroids, the function d is the Euclidean distance, and c_k and c_{k_2} are the cluster centroids at indices k and k_2 respectively. The intra-cluster distance [14] refers to the average distance of patterns from their clusters, a distance that the algorithm must minimise and which can be calculated using

J_{intra} = \frac{\sum_{k=1}^{K} \sum_{\forall z \in C_k} d(z, c_k)}{|P|}   (5)

where z is a data pattern, C_k is a cluster, c_k is the cluster centroid, |P| is the total number of data patterns in the dataset and d is the Euclidean distance. Lastly, the Ray-Turi validity index [14] was also used in order to have a measure that combines both the intra-cluster and inter-cluster distances. Validity indices are normally used to determine the optimal number of clusters for a dataset, but in this paper the index was used as a more accurate representation of each algorithm's clustering performance. The Ray-Turi validity index is calculated using

Q_{ratio} = \frac{J_{intra}}{inter_{min}}   (6)

where J_{intra} is the intra-cluster distance defined in equation (5) and inter_{min} is the smallest inter-cluster distance found using equation (4).
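The three measures can be sketched together as follows. This is an illustrative sketch (function name, NumPy representation and the explicit assignment list are my own choices); following the text above, the Ray-Turi ratio divides the intra-cluster distance by the smallest inter-centroid distance.

```python
import numpy as np

def clustering_measures(centroids, data, assignments):
    """Equations (4)-(6): average inter-cluster distance, intra-cluster
    distance averaged over all |P| patterns, and the Ray-Turi ratio.
    'assignments[i]' is the centroid index of pattern i."""
    K = len(centroids)
    # All pairwise centroid distances d(c_k, c_k2), k < k2
    pair = [np.linalg.norm(centroids[k] - centroids[l])
            for k in range(K - 1) for l in range(k + 1, K)]
    j_inter = 2.0 * sum(pair) / (K * (K - 1))           # equation (4)
    j_intra = np.mean([np.linalg.norm(data[i] - centroids[assignments[i]])
                       for i in range(len(data))])       # equation (5)
    return j_inter, float(j_intra), float(j_intra) / min(pair)  # (6)
```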

VII. RESULTS AND DISCUSSION

The inter-cluster, intra-cluster and Ray-Turi validity index values were used to analyse the results. These are shown for various frequencies, severities and dimensions in Figure 4, followed by the ranks of the Friedman test in Figure 5. Figure 4a and Figure 5a show the inter-cluster distance value and rank for various frequencies, where the cooperative PSO has the most optimal inter-cluster distance throughout all frequencies, followed by the cooperative-multipopulation PSO and its elitist variation. The same behaviour is shown in Figure 4d, Figure 5d, Figure 4g and Figure 5g, which depict the values and ranks of all the algorithms' inter-cluster distances for various severities and dimensions. The graphs also show that, generally, the impact that changes in the severity and frequency values have on the inter-cluster distance is not as significant as the impact of changing the dimension, where all algorithms gained a more optimal inter-cluster distance in higher dimensions. More dimensions mean that the search space increases, allowing the space between the centroids found to be larger, which would lead to larger inter-cluster distances.

Figure 4b and Figure 5b show that there is a significant difference between the cooperative-multipopulation algorithm's intra-cluster distance and that of the rest of the algorithms. The standard PSO, re-initialising PSO, multi-swarm PSO and the elitist cooperative-multipopulation algorithms all have low values for the intra-cluster distance, which change by small amounts as frequencies increase. The same behaviour is shown for severity changes in Figure 4e and Figure 5e. Lastly, Figure 4h and Figure 5h show that, as dimensions increase, the elitist cooperative-multipopulation PSO improves from being the algorithm with the least optimal intra-cluster distance value to the algorithm with the most optimal value, while most other algorithms obtained a worse intra-cluster value.

In data clustering, the inter-cluster and intra-cluster distances are both important in determining the quality of a solution, as clusters that are compact and separate from each other are needed. For this reason the Ray-Turi validity index was used as a measure which combines the inter-cluster and intra-cluster



Fig. 4: Averages of inter-cluster distance, intra-cluster distance and Ray-Turi validity index per frequency, severity and dimension. Panels: (a) inter-cluster distance per frequency; (b) intra-cluster distance per frequency; (c) Ray-Turi validity per frequency; (d) inter-cluster distance per severity; (e) intra-cluster distance per severity; (f) Ray-Turi validity per severity; (g) inter-cluster distance per dimension; (h) intra-cluster distance per dimension; (i) Ray-Turi validity per dimension.



Fig. 5: Ranks of inter-cluster distance, intra-cluster distance and Ray-Turi validity index per frequency, severity and dimension. Panels: (a) inter-cluster distance per frequency; (b) intra-cluster distance per frequency; (c) Ray-Turi validity per frequency; (d) inter-cluster distance per severity; (e) intra-cluster distance per severity; (f) Ray-Turi validity per severity; (g) inter-cluster distance per dimension; (h) intra-cluster distance per dimension; (i) Ray-Turi validity per dimension.



distances in order to give a more accurate representation of thequality of the cluster centroids found by an algorithm.Figure 4c and Figure 5c both show the Ray-Turi validityindex values for changes in frequency. The elitist cooperative-multipopulation PSO dominates with the lowest Ray-Turivalidity index value throughout the majority of frequencychanges. The re-initialising PSO, standard PSO and multi-swarm PSO all remain with low Ray-Turi validity indexvalues, while the cooperative-multipopulation PSO has asignificantly higher value, which is likely caused by thehigh intra-cluster distance shown in Figure 4b. The samebehaviour can be seen in Figure 4f and Figure 5f, where, asseverity values change, the elitist cooperative-multipopulationPSO remains the first or second ranked algorithm and thecooperative-multipopulation PSO remains the sixth. The al-gorithms displayed a different behaviour when changes indimension took place. As shown in Figure 4i and Figure 5i theelitist cooperative- multipopulation PSO has a low Ray-Turivalidity index value in low dimensions, but as the dimensionsincrease, so does the Ray- Turi validity value. This moves theelitist cooperative-multipopulation PSO from being ranked asthe first algorithm to being ranked as the fifth, leaving there-initialising PSO as the best ranked algorithm for higherdimensions. Lastly, the cooperative PSO improves its Ray-Turi validity index value as dimensions increase, but althoughthis value is close to the standard PSO, re-initialising PSO andmulti-swarm PSO values, it still remains ranked as the fourthalgorithm.Table I contains the average and standard deviation for inter-cluster distance, intra-cluster distance and Ray-Turi validityindex for each algorithm. 
The table shows that the cooperative PSO has the highest inter-cluster distance, the multi-swarm PSO has the lowest intra-cluster distance, and the elitist cooperative-multipopulation PSO has the lowest Ray-Turi validity index value with a low standard deviation due to its elitist component. This confirms what was seen in Figure 4 and Figure 5, but also shows that, depending on the measure used, a different algorithm is shown as the one with the more optimal values. In order to clarify which algorithm performed the clustering task better, Figure 6 shows the actual clusters and cluster centroids for severity 5 and frequency 5 for these three algorithms at timesteps 100, 400, 700 and 1000.

Figure 6a shows that all three algorithms' best solutions are near some clusters. Figure 6b shows that by iteration 400 the multi-swarm and the elitist cooperative-multipopulation PSOs are both moving their centroids closer to the available clusters, while the cooperative PSO is moving its centroids slightly further from the clusters or leaving them in the middle of the search space. A change occurs in iteration 500, and Figure 6c shows the state of the cluster centroids at iteration 700, which is after the change. Here the multi-swarm PSO has been affected by the change and is slowly recovering from it, while the elitist cooperative-multipopulation PSO continues to move its centroids towards the correct clusters and the cooperative PSO remains with its centroids in the middle of the search space. Lastly, in iteration 1000, shown in Figure 6d, the elitist cooperative-multipopulation PSO has found, or is close to finding, all the clusters, while the multi-swarm and cooperative PSOs are far from the solution.
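The three quality measures compared above can be made concrete with a short sketch. The formulation below is an assumption based on common definitions (intra-cluster distance as the average distance of each pattern to its assigned centroid, inter-cluster distance as the minimum separation between centroids, and the Ray-Turi index as the ratio of the two); the function and variable names are hypothetical and not taken from the paper's implementation.

```python
import numpy as np

def clustering_measures(data, labels, centroids):
    """Sketch of the three quality measures (hypothetical implementation).

    intra: average distance of each pattern to its assigned centroid
           (lower is better).
    inter: minimum distance between any pair of centroids
           (higher is better).
    Ray-Turi validity: intra divided by inter (lower is better).
    """
    # Distance of every pattern to the centroid of its own cluster.
    intra = np.mean(np.linalg.norm(data - centroids[labels], axis=1))

    # Smallest separation between any two distinct centroids.
    k = len(centroids)
    inter = min(np.linalg.norm(centroids[i] - centroids[j])
                for i in range(k) for j in range(i + 1, k))

    # Note: if two centroids coincide, inter is zero and the index is
    # undefined -- the division-by-zero issue raised in the conclusion.
    ray_turi = intra / inter
    return intra, inter, ray_turi
```

Two compact, well-separated clusters therefore yield a small Ray-Turi value, while overlapping or coinciding centroids inflate it.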

VIII. CONCLUSION AND FUTURE WORK

This article described the implementation of six data clustering PSOs. It discussed, evaluated and compared the results of applying these PSOs to a temporal data clustering problem where patterns migrate between clusters.

It was shown that, according to the Ray-Turi validity index measure and the depiction of the cluster centroids, the elitist cooperative-multipopulation PSO performed the pattern migration data clustering task more optimally than the other algorithms, but struggled with an increase in dimension. Although the inter-cluster and intra-cluster distances showed the cooperative PSO and the multi-swarm PSO to be the better choices, both measures need to be considered when determining the quality of a clustering solution.

The cooperative-multipopulation algorithm proved to be unsuccessful when considering the Ray-Turi validity index and the intra-cluster distances. This may be caused by too much re-diversification of the swarms, as each re-initialisation may have been too harsh, sending the cluster centroids too far and inherently increasing the inter-cluster distance as well as the intra-cluster distance. The elitist alternative allowed the context particle to change only if the new one was better than the old one, allowing the possibly harsh re-initialisations to have less of an effect.

The pattern migration problem moves data patterns from one cluster to another, leaving the clusters themselves within a certain radius of their original positions and causing only minor changes in the search space. This explains why the elitist cooperative-multipopulation PSO, a more exploitative algorithm, performed the clustering task more successfully than the more exploration-oriented cooperative PSO, multi-swarm PSO and cooperative-multipopulation PSO.
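The elitist replacement rule for the context particle can be sketched in a few lines. This is an illustrative fragment, not the authors' code; the names `replace_context`, `context`, `candidate` and `fitness` are hypothetical, and fitness is assumed to be minimised.

```python
def replace_context(context, candidate, fitness, elitist=True):
    """Return the context particle to use in the next iteration.

    In the elitist variant the candidate replaces the current context
    only when it improves (lowers) the fitness; otherwise the old
    context is kept, dampening the effect of harsh re-initialisations.
    """
    if elitist and fitness(candidate) >= fitness(context):
        return context  # keep the old context: the candidate is no better
    return candidate
```

In the non-elitist cooperative-multipopulation PSO the candidate is accepted unconditionally, which lets a freshly re-initialised swarm drag the context solution away from a good region of the search space.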
The cooperative-multipopulation PSO is likely to perform the clustering task better than its elitist alternative in problems other than pattern migration.

Future work may include determining the best parameter combination for the cooperative-multipopulation PSO, a scalability study on the algorithms described in this paper, testing the cooperative-multipopulation PSO and its elitist alternative on problems other than pattern migration, such as the alternatives described by Graaff [14], and determining a fitness measure that does not allow for division by zero and possibly combines inter-cluster and intra-cluster distances.

REFERENCES

[1] A. K. Jain, “Data clustering: 50 years beyond k-means,” Pattern Recognition Letters, 2009.

[2] J. MacQueen, “Some methods for classification and analysis of multivariate observations,” in Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 281–297, 1967.

[3] K.-S. Chuang, H.-L. Tzeng, S. Chen, J. Wu, and T.-J. Chen, “Fuzzy c-means clustering with spatial information for image segmentation,” Computerized Medical Imaging and Graphics, vol. 30, pp. 9–15, 2006.



TABLE I: Averages and standard deviations for inter-cluster distance, intra-cluster distance and Ray-Turi validity index for each algorithm

Algorithm                            | Inter-cluster Distance | Intra-cluster Distance | Ray-Turi Validity
Standard                             | 3.940 ± 1.3            | 19.987 ± 1.017         | 1.079 ± 0.116
Standard Reinitialising              | 3.935 ± 1.301          | 19.987 ± 0.995         | 1.091 ± 0.12
Multipopulation                      | 3.626 ± 0.821          | 19.935 ± 1.054         | 1.083 ± 0.11
Cooperative                          | 6.979 ± 1.593          | 20.168 ± 0.99          | 1.207 ± 0.188
Cooperative-multipopulation          | 5.935 ± 2.359          | 21.103 ± 1.275         | 1.535 ± 0.233
Elitist Cooperative-multipopulation  | 4.676 ± 2.588          | 19.973 ± 1.551         | 1.051 ± 0.682

Fig. 6: Cluster positions for dataset of frequency 5 and severity 5 during iterations 100, 400, 700 and 1000. Panels: (a) at iteration 100, (b) at iteration 400, (c) at iteration 700, (d) at iteration 1000.

[4] S. Das, A. Abraham, and A. Konar, “Automatic clustering using an improved differential evolution algorithm,” IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans, vol. 38, no. 1, pp. 218–237, 2008.

[5] L. T., Z. Y., H. Z., and W. Z., “A new clustering algorithm based on artificial immune system,” in Fifth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD ’08), vol. 2, pp. 347–351, 2008.

[6] J. Herrero, A. Valencia, and J. Dopazo, “A hierarchical unsupervised growing neural network for clustering gene expression patterns,” Bioinformatics, vol. 17, no. 2, pp. 126–136, 2001.

[7] D. W. van der Merwe and A. P. Engelbrecht, “Data clustering using particle swarm optimization,” in 2003 Congress on Evolutionary Computation (CEC ’03), vol. 1, pp. 215–220, 2003.

[8] P. Zhenkui, H. Xia, and H. Jinfeng, “The clustering algorithm based on particle swarm optimization algorithm,” in International Conference on Intelligent Computation Technology and Automation, vol. 1, p. 148, 2008.

[9] J. Kennedy and R. Eberhart, “Particle swarm optimization,” in IEEE International Joint Conference on Neural Networks, pp. 1942–1948, 1995.

[10] Y. Shi and R. Eberhart, “A modified particle swarm optimizer,” in IEEE Congress on Evolutionary Computation, pp. 69–73, 1998.

[11] R. Eberhart and Y. Shi, “Tracking and optimizing dynamic systems with particle swarms,” in IEEE Congress on Evolutionary Computation, vol. 1, pp. 94–100, 2001.

[12] T. Blackwell, “Particle swarm optimization in dynamic environments,” Evolutionary Computation in Dynamic and Uncertain Environments, vol. 49, pp. 29–49, 2007.

[13] F. van den Bergh and A. P. Engelbrecht, “A cooperative approach to particle swarm optimization,” IEEE Transactions on Evolutionary Computation, vol. 8, pp. 225–239, 2004.

[14] A. Graaff, A Local Network Neighbourhood Artificial Immune System. PhD thesis, University of Pretoria, June 2011.

[15] F. van den Bergh, An Analysis of Particle Swarm Optimizers. PhD thesis, University of Pretoria, November 2001.
