Improved multi-objective clustering with automatic

1

Improved multi-objective clustering with automatic determination of the number of clusters

María-Guadalupe Martínez-Peñaloza*, Efrén Mezura-Montes, Nicandro Cruz-Ramírez, Héctor-Gabriel Acosta-Mesa and Homero-Vladimir Ríos-Figueroa.

Department of Artificial Intelligence, University of Veracruz

Faculty of Physics and Artificial Intelligence, Sebastián Camacho No. 5, Xalapa Ver., México

Phone number: (228)817-29-57 / 842-17-00 ext. 10204. [email protected], {emezura, ncruz, heacosta, hrios}@uv.mx

Abstract. The multi-objective clustering with automatic determination of the number of clusters (MOCK) approach is improved in this work by means of an empirical comparison of three multi-objective evolutionary algorithms added to MOCK instead of the original algorithm used in such approach. The results of two different experiments using seven real data sets from UCI repository are reported: (1) using two multi-objective optimization performance metrics (hypervolume and two-set coverage) and (2) using the F-measure to evaluate the clustering quality. The results are compared against the original version of MOCK and also against other algorithms representative of the state-of-the-art. Such results indicate that the new versions are highly competitive and capable to deal with different types of data sets.

Keywords: Clustering Analysis, Evolutionary algorithms, MOCK, Multi-objective optimization.

1 Introduction

In the field of optimization, there are many problems that consider a single objective, however, most real-world problems involve more than one objective to be optimized while a set of constraints must be satisfied [1]. The presence of multiple objectives is natural and makes the optimization interesting and complex to solve. Data clustering is precisely an example of a problem where good solutions can be better described by considering different criteria, i.e., more than one objective to optimize [2]. On the other hand, multi-objective evolutionary algorithms (MOEAs) are highly competitive when solving complex multi-objective optimization problems [3]. Therefore, there are approaches where MOEAs are adopted to deal with data clustering as will be detailed later.

* Corresponding author

2

Several approaches have been proposed to solve the clustering problem using evolutionary algorithms. In this section we briefly mention some of those studies.

Different evolutionary algorithms have been designed to optimize partitions for fixed (user-defined) values of number of clusters (k). However, in practice clustering algorithms that optimize the number of clusters and the corresponding partitions are often preferred [27]. Cole [4] propose a label-based integer encoding in which a genotype was an integer vector of N positions, where each one was associated with a data object and took a value (cluster label) over the alphabet {1,2,3, ..., k}. In this case k was interpreted as the maximum number of clusters represented in each individual. Ma et al. [5] proposed an evolutionary algorithm for clustering named EvoCluster, which encodes a partition in a way that each gene represents one cluster and contains the labels of the objects grouped into it. Thus a genotype encoding k clusters (C1, C2, ... , Ck) of a data set with N objects was formed by k genes, each of which stored ni labels ("1 +"2+. . . +"( = *). In [28], Bandyopadhyay and Maulik propose a real enconding where genotypes are made up of real numbers that represent the coordinates of the cluster centers. If genotype i encodes k clusters in n dimensional space Rn, then its length is n k. An important advantage of this encoding scheme is that it leads to variable length genotypes, once k is no longer assumed to be constant. Pan and Cheng [29] adopts a binary encoding scheme based on a maximum number of clusters that is determined a priori. Each position of the string corresponds to an initial cluster center. Thus, the total number of 1s in the string corresponds to the number of clusters encoded into the genotype. Casillas et al. [6] adopted a binary vector with N-1 elements encoding scheme where the elements represented the N-1 edges of a Minimum Spanning Tree (MST) [7]. The nodes represented the N data set objects and the edges correspond to proximity indexes between objects. In this representation the value “0” means that the corresponding edge remains, whereas the value “1” means that the corresponding edge is eliminated.

The multi-objective clustering concept was firstly introduced in 1992 [8]. More than a decade after, in 2004, Handl and Knowles proposed a multi-objective clustering algorithm called VI-ENNA[9] which they improved later to obtain the MOCK algorithm [10,11]. This algorithm is the main reference of this work where the focus of the study relies in the multi-objective evolutionary algorithm adopted and called PESA-II [12,13]. The two objectives used in such work were the total intra-cluster variation (which was computed over all clusters) and the number of clusters. Both objectives should be minimized but they are conflicting with each other, because of, enhancing the value of an objective function involves worsening another, i.e., to enhance the value of the intra-cluster variation involves to increase the number of clusters. Hence, by using the concept of Pareto dominance, the algorithm looks for finding a set of different non-dominated clustering solutions with a good trade-off between the number of clusters and a small total intra-cluster variation.

Mukhopadhyay et al. [14] used two objectives where the first one was also a measure of total intra-cluster variation and the second one measured the clusters' separation (essentially, the summation of distances between all pairs of clusters). Ripon et al. [15] also proposed a multi-objective evolutionary clustering algorithm with two objectives, where the first one was an average intra-cluster variation computed over all clusters. The measure value was used to produce a normalized

3

value of the measure taking into account the number of clusters, which varied for different individuals in the evolutionary algorithm's population. The second objective was a measure of inter-cluster distance which measured the average distance separating a pair of clusters.

The background described above shows that evolutionary algorithms have been successfully applied in data clustering. However, to the best of our knowledge, the performance of the MOEA adopted has not been further studied as a source of improvement. Therefore, in this work three state-of-the-art MOEAs are added to MOCK and their performance is analyzed in two ways: (1) based on performance metrics used in evolutionary multi-objective optimization and (2) based on clustering metrics so as to analyze the influence of the search algorithm in the overall performance of this particular clustering technique.

The remaining of this paper is organized as follows. Some important concepts on

evolutionary multi-objective optimization are given in Section 2. A detailed description of MOCK and the proposed approach are explained in Section 3. Section 4 presents the methodology, experiments and the comparison of results. Finally, Section 5 provides the concluding remarks and future work.

2 Basic concepts

A multi-objective optimization problem (MOP) can be mathematically defined as follows [30]: Find , which minimizes: - , ≔ -/(,), -1(,), . . . , -2(,) , subject to m inequality constraints: 34 , ≤ 0, 7 = 1, 2, . . . , 8, and p equality constraints ℎ: , = 0, ; = 1,2, . . . , <, where , = [,/, ,1, … , ,?]

Ais the vector of decision variables; - is the set of objective functions to be minimized. The functions gi and hj, represent the set of constraints, that defines the feasible region in the search space ℱ. Any vector of variables , which satisfies all the constraints is considered a feasible solution. Regarding optimal solutions in MOPs, the following definitions are provided:

Definition 1: A vector of decision variables ,/ dominates another vector of

decision variables ,1, (denoted by ,/ ≺ ,1) if and only if ,/ is partially less than ,1, i.e., ∀4∈ 1,2, . . . , ( : -4 ,/ ≤ -4 ,1 ∃7 ∈ 1, … , ( : , -4 ,/ < -4 ,1 .

4

Definition 2: A vector of decision variables , ∈ Iis nondominated with respect to I, if there does not exist another ,J ∈ Isuch that - ,J ≺ -(,).

Definition 3: A vector of decision variables ,∗ ∈ ℱ is Pareto-optimal if it is

nondominated with respect to ℱ. Definition 4: The Pareto optimal set L∗ is defined by

L∗ = , ∈ ℱ ,7MNOPQRS − S<R78OU Definition 5: The Pareto front Lℱ∗ is defined by

Lℱ∗ = - , , ∈ L∗

The goal on a MOP consists on determining the Pareto-optimal set form the set ℱ of all the decision variable vectors that satisfy gi and hj .

3 Improved MOCK

In multi-objective clustering each objective is evaluated simultaneously for each potential grouping solution [8, 11]. The final result is a set of trade-off grouping solutions, where each solution has a different balance between the objectives that have been optimized. As an example, Figure 1 (b) shows the Pareto set obtained applying multi-objective clustering for the data in Figure 1(a) where the intra-cluster and inter-cluster distances are optimized.

Fig. 1 (a) Elements to group, (b) Pareto solutions obtained by the multi-objective clustering

3.1 MOCK elements

As mentioned in Section 1, MOCK is an algorithm designed for multi-objective clustering. Two objectives are considered: connectivity and compactness of the

5

groups formed (see section 3.1.3). The multi-objective optimization employed by MOCK allows it to get a set of optimal solutions, where each of them represents a possible grouping of data. Each solution found has a different k (number of groups) value. The automatic determination of the number of clusters "k" is one of the most important advantages of the approach, due to the fact that in most existing traditional clustering techniques such k value must be specified [16,17]. The optimization in MOCK is carried out by a particular MOEA, called “Pareto Envelope-based Selection Algorithm II” (PESA-II). To apply PESA-II, it is necessary to define a suitable genetic encoding that represents a possible grouping, one or more genetic variation operators (mutation and crossover), and two or more objectives to optimize. In Figure 2, the general structure of MOCK algorithm is presented, where the input of the algorithm is the data set to be grouped, then, the minimum spanning tree (MST) is calculated to initialize the population of the MOEA, once the population is generated, a MOEA (PESA-II in the original version) is used to optimize the solutions and get a set of optimal solutions represented in a Pareto front, where all the solutions can be seen like good solutions and represent one possible clustering solution. The following sections describe in more detail the operation of MOCK, its solution representation, the variation operators and the objective functions adopted, taking into account the technical report of the MOCK authors [10].

Fig. 2 MOCK’s general algorithm

3.1.1 Initialization and representation

The individual encoding employs a representation where each individual g consists of N genes g1, g2,...,gN, where N is the size of the clustered data set, and each gene gi can take allele values j in the range {1,...,N}. Hence, a value of j assigned to the i-th gene, is interpreted as a link between data items i and j: talking in terms of clustering, this means that data items i and j will be in the same cluster (see Figures 3 (a) and (b)). The decoding of this representation requires the identification of all sub-graphs. All data items belonging to the same sub-graph are then assigned to one cluster. The decodification of solutions inside MOCK can be done in linear time O(n), because a backtracking scheme was used as indicated in [10].

6

For the initialization, for a given data set, the MST is calculated by using Prim's algorithm [18]. The first individual in the population will be that which is obtained by applying the algorithm to find the minimum spanning tree. The i-th individual of the initial population is then formed eliminating the (i-1) largest links, i.e., the edges in the tree with largest distance between the nodes (elements to be clustered). When a link is removed, in the genotype the corresponding genes are changed to link to themselves, Figure 3 (b) shows the third initial member of the population. This initialization scheme provides different solutions in the initial population, each of them with different k values, to begin the search.

Fig. 3 (a) Initialization of the individuals with minimum spanning tree, first individual, (b) Sub-graphs obtained by MOCK, third individual [10]

3.1.2 Variation operators

Uniform crossover is used as the recombination operator in MOCK because it can generate any combination of alleles from the two parents (in a single crossover event). The mutation is a single event where the value of the allele from one position in an individual is changed. MOCK uses a special mutation operator called "nearest neighbor mutation operator" where each data item can only be linked to one of its L nearest neighbor. Hence, 34 ∈ {""4/, . . . , ""4W}, where ""4Y denotes the l-th nearest neighbor of data item i. This allows to reduce the extent of the search space to just LN. In order to apply the operator described above, it is necessary that the nearest neighbor list can be pre-computed in the initialization phase of the algorithm, the L number of neighbors is a user-defined parameter.

7

3.1.3 Objective functions

MOCK deals with two objectives that must be minimized (one based on compactness and the other one based on connectedness of clusters). In order to express cluster compactness, the overall deviation of a partitioning is calculated. This is computed summing the total distances between each data item and their corresponding centroid in a particular group. This objective can be mathematically expressed as follows:

ZQ[(\) = ] 7, ^24∈_`_`∈_

where C is the set of all clusters, ^2 is the centroid of cluster Ck and ] . , . is a distance measure (Euclidean distance). As an objective, overall deviation should be minimized. This criterion is similar to the intra-cluster variance.

For the connectivity among groups, a measure that evaluates the degree to which neighbors of a data item have been placed in the same cluster of the current data item is computed. In this objective function to measure the proximity between the nearest neighbors and one data item, the Euclidian distance was used too. Mathematically it can be expressed as follows:

\S""(\) = ,4,??a(:)

W

:b/

, cℎQPQ,d,e =1

;7-∄\2: P, M ∈ \2

0SRℎQPc7MQ

?

4b/

""4(;) is the jth nearest neighbor of datum i, and L is a parameter determining the number of neighbors that contribute to the connectivity measure. This objective function should be minimized too.

3.2 Proposed approach

As was aforementioned, PESA-II is MOCK’s MOEA (upper right-hand side in Figure 2). However, this works aims to evaluate MOCKs improvement based on using other MOEA while keeping all its other elements. The selected MOEAs for the comparison are: NSGA-II, SPEA-2 and MOEA/D, these three algorithms were selected because of, all of them have different components that help to find good solutions of the Pareto front. A brief description of each MOEA compared in this work is presented below: Pareto Envelope-based Selection Algorithm (PESA): this algorithm was proposed by Corne et al. [19]. This approach uses two populations of solutions: a small internal population (IP) and a larger external population (EP). The purpose of EP and IP is to exploit good solutions and to explore new solutions and achieves this by the standard EA processes of reproduction and variation, respectively. The solutions in EP are stored in “niches”, because PESA uses a hyper-grid division in the objective space to maintain diversity. The selection mechanism is based on the crowding measure used by the hyper-grid. This crowding measure is also used to decide what solutions to introduce into the EP. The second version of PESA, called PESA-II [12] has basically the same performance than PESA, except for the fact that region-based is used in this

8

case. In region-based selection, the unit of selection is a hyper-box rather than an individual. The procedure consists of selecting a hyper-box and then randomly selects an individual within such hyperbox. Algorithm 1 shows PESA-II’s pseudocode. For further details see [12].

Non-dominated sorting genetic algorithm-II (NSGA-II): this is an improved version of the NSGA algorithm [20]. This algorithm builds a population of competing individuals, ranks and sorts each individual according to non-dominated level, then applies the evolutionary operators to create the offspring, and then combines the parents and offspring before partitioning the new combined pool into fronts. The algorithm conducts niching by adding a crowding distance to each member. It uses this crowding distance in its selection operator to keep a diverse front by making sure each member stays a crowding distance apart. NSGA-II keeps the population diverse and helps the algorithm to explore the fitness landscape. Although, NSGA-II was proposed in 2002, it is considered one of the main traditional MOEAs that presents good performance in terms of finding a diverse set of solutions and in converging near to the true Pareto-optimal set. In addition, due to NSGA-II is a genetic algorithm, its implementation is intuitive and easy to adapt to different problem representions. The pseudocode of the NSGA-II is shown in Algorithm 2.

9

Strength Pareto Evolutionary Algorithm 2 (SPEA-2): this is also a revised version of SPEA whose pseudo code is shown in Algorithm 3. The algorithm uses an external archive containing non-dominated solutions previously found. At each generation, non-dominated individuals are copied to the external non-dominated set. For each individual in each external set, a strength value is computed. SPEA-2 has three main differences with respect to its predecessor [21]: (1) it incorporates a fine-grained fitness assignment strategy which takes into account for each individual the number of individuals that dominate it and the number of individuals by which it is dominated (i.e., its strength); (2) it uses a nearest neighbor density estimation technique which guides the search more efficiently, and (3) it has an enhanced archive truncation method that guarantees the preservation of boundary solutions.

10

Multiobjective evolutionary algorithm based on decomposition (MOEA/D): This algorithm requires a decomposition approach for converting an approximation of the PF into a number of single objective optimization problems. In principle, different decomposition approaches can be used for this purpose however, the most traditional approach is known as Tchebycheff approach [31].

According to Tchebycheff approach, let g/, … , ghbe a set of spread weight vectors and i∗ be a reference point, the problem of approximation of the PF can be decomposed into * scalar optimization subproblems and the objective function of the jth subproblem is:

3jk , g:, i∗ = max

/o4opg4: -4 , − i4

∗ where g: = (g/,…,

: gp: )A. In this case 3jk is continuous of g, and the optimal

solution of 3jk , g4, i∗ should be close to that of 3jk , g:, i∗ if g4 and g: are close to each other. Therefore, any information about these 3jk’s with weight vectors close to g4should be helpful for optimizing 3jk , g4, i∗ . In MOEA/D, a neighborhood of weight vector g4 is defined as a set of its several closest weight vector in {g/,…,gh}. The neighborhood of the ith subproblem consists of all the subproblems with the weight vectors from the neighborhood of g4. The population is composed of the best solution found so far for each subproblem. Only the current solutions to its neighboring subproblems are exploited for optimizing a subproblem in MOEA/D. In Algorithm 4 the pseudocode of MOEA/D is presented.

Since, NSGA-II and SPEA-2 are traditional MOEAs based on Pareto dominance, and it is known that a Pareto optimal solution to a MOP could be an optimal solution of a scalar optimization problem. Therefore, the approximation of the Pareto front can be descomposed in different optimization subproblems, a MOEA/D was selected for the comparison because of, it is a very recent algorithm that adopts the idea of descomposition.

For this comparison, the MOCK implementation was based on [10]. Similar to the

original MOCK algorithm, an integer encoding was adopted. Regarding the population handling, once a population of size N is generated, where N is the total data items to be grouped, the number of individuals in the population represent the number of instances of each data set. However, it is necessary to reduce the population size, due to the value assigned to this parameter depends on the number of instances in the same way to the original MOCK implementation (see section 4.3). Therefore, a certain amount of individuals of such population are randomly selected so as to decrease the size of the population.

11

To generate the three new MOCK versions with NSGA-II, SPEA-2 and MOEA/D, the same variation operators used by PESA-II are adopted to promote a fair comparison among the three algorithms. Two constraints were considered, the first refers to the number of clusters in a solution, which must be between 1 and 25. The second constraint states that a group must have at least two elements. The constraint-handling technique employed was a combination of the feasibility rules proposed by Deb [22] with Pareto dominance for non-dominated sorting [20]. Therefore, the criteria used in pair-wise comparisons are the following:

a) Between two feasible solutions, the one which dominates the other is preferred. If both solutions are feasible and non-dominated each other, one is chosen at random.

b) Between one feasible solution and one infeasible solution, the feasible one is preferred.

c) Between two infeasible solutions, the one with the lowest sum of constraint violation is preferred.

12

4 Methodology, experiments and comparison of results

Four algorithms: MOCKPESA-II, (the original MOCK), MOCKNSGA-II, MOCKSPEA-2 and MOCKMOEA/D, are evaluated on seven real data sets from the Machine Learning Repository [23]. The characteristics of each of them are summarized in Table 1 where N is the total number of data items in the data set (instances), Ni is the number of items for cluster i, D is the dimensionality (attributes) and k represents the optimal number of clusters in the data set. The first five data sets in Table 1 were chosen because they are the same data bases solved by the original MOCK algorithm [10,11] and in order to evaluate the performance of the algorithms a fair comparison is required. Seeds and User Knowledge Modeling are data sets commonly used for clustering task, i.e., for unsupervised learning where the class attribute is unknown. However, although in the first five data sets are typically used for classification because the class attribute is known, for the experiments carried out, the class attribute is not considered for the executions of the algorithms, i.e., all the data sets are considered like data sets for unsupervised learning.

Table 1 Summary of the real data sets adopted in the experiments

Data set N Ni D k Type Dermatology 366 112, 61, 72, 49, 52, 20 34 6 Integer Iris 150 3 x 50 4 3 Continuous Wine 178 59, 71, 48 13 3 Continuous Wisconsin 699 485, 241 9 2 Integer

Yeast 1484 463, 429, 244, 163, 51, 44, 37, 30, 20, 5 8 10 Continuous

Seeds 210 3 x 70 7 3 Continuous User Knowledge Modeling 403 50, 129, 122, 102 4 4 Continuous

Two experiments were designed to evaluate the performance of the four

algorithms: a) A comparison of the original MOCK and the three proposed versions (with

NSGA-II, SPEA-2 and MOEA/D) from the evolutionary multi-objective optimization standpoint, taking into account two metrics to determine the performance of the optimization process.

b) A comparison of the most competitive MOCK version against classical clustering algorithms from the machine learning standpoint, taking into account different clustering metrics.

4.1 Evolutionary multi-objective optimization metrics

For comparison purposes, 10 independent runs were carried out by each algorithm on each data set in Table 1. The Pareto fronts generated in the 10 runs are used to compute the values of two performance metrics, which are described next:

13

• Hypervolume (HV): Having a sub-optimal Pareto set PFknown, and a reference point in objective space zref, this performance measure estimates the Hypervolume attained by it [1]. The Hypervolume corresponds to the non-overlapping volume of all the hypercubes formed by the reference point (zref) and every vector in the Pareto set approximation. This is formally defined as:

qr = [SU4 [Qs4 ∈ Nt2?uvw

where veci is a non-dominated vector from the sub-optimal Pareto set, and voli is the volume for the hypercube formed by the reference point and the non-dominated vector veci.

• Two-set Coverage (C-Metric): This binary performance measure estimates the coverage proportion, in terms of percentage of dominated solutions, between two sub-optimal Pareto sets [1]. Given two sets X' and X'', both containing only feasible non-dominated solutions, the C-Metric is formally defined as:

\ xJ, xJJ =OJJ ∈ xJJ; ∃OJ ∈ xJ: O ≽ OJJ

xJJ

If all the points in X' dominate or are equal to all points in X'', then by definition C = 1. CS = 0 otherwise. The C-measure value means the portion of solutions in X'' being dominated by any solution in X'.

4.2 Clustering metrics.

Regarding the comparison of the four versions of MOCK (MOCKPESA-II, MOCKNSGA-II, MOCKSPEA-2 and MOCKMOEA/D) against clustering algorithms (an agglomerative hierarchical clustering algorithm, the k-means algorithm and a meta-clustering algorithm) two different metrics were used to evaluate the performance of algorithms: F-measure and Silhouette Coefficient.

• F-measure: This measure requires knowledge of the correct class labels for

each instance of a given data set. It is compared with the generated model by a given clustering algorithm. ni,j gives the number of elements of class i within cluster j. For each class i and cluster j precision and recall are then defined as: <(7, ;) =

?a,{

?{, and P(7, ;) =

?a,{

?a, respectively, and the

corresponding value under the F-measure is :

t(7, ;) =(|1 + 1) ∗ <(7, ;) ∗ P(7, ;)

|1 ∗ <(7, ;) + P(7, ;)

where b=1, to obtain equal weighting for p(i,j) and r(i,j). The overall F-measure for a partitioning is computed as:

14

t ="4"

4

8O,: t(7, ;)

where n is the total size of the data set. F is limited to the interval [0,1] and it should be maximized. Therefore, a value closer to 1 indicates a better result.

• Silhouette coefficient: This cluster quality measure is based on the idea that a “good” clustering should consist of well-separated, cohesive clusters, it is defined as follows: Given a partitioning P of a set of N objects into k clusters, consider any fixed object i and let Ci denote the set of indices for all objects clustered together with object i. As a cohesion measure for cluster Ci, take the average dissimilarity between object i and all other objects in Ci:

O(7) =1

"4 }4::∈_a

where }4: is the dissimilarity between objects i and j and "4 is the number of objects in cluster Ci . To characterize the separation between clusters, let Kl

denote the lth neighboring cluster, distinct from Ci, for U = 1, 2, … , ( − 1. Define |(7) as the average dissimilarity between object i in cluster Ci and the objects in the closest neighboring cluster.

|(7) = min?Ä

}7;:∈ÅÄ

The silhouette coefficient s(i) for object i is defined as the normalized difference:

M(7) =|(7) − O(7)

8O,{O(7), |(7)}

For the silhouette coefficient M(7) to be well-defined, P must contain at least two clusters and every cluster in P must contain at least two objects. According to the authors of this quality measure, the interpretation for silhouette coefficient should be as follows [32]: Values between 0.70 - 1.00 strong structure. Values between 0.51 – 0.70 reasonable structure. Values between 0.26 – 0.50 weak structure, try other methods. Values between below 0.25 no structure found.

For this experiment 10 independent runs were also carried out by each algorithm. In this case it is worth considering that multi-objective clustering algorithms

15

(MOCKPESA-II, MOCKNSGA-II, MOCKSPEA-2 and MOCKMOEA/D) return many clustering solutions, so that for each independent execution a solution of the Pareto front must be selected. The solution chosen for this experiment is the solution with the highest value of F-measure, therefore the results represent the best model obtained. This experiment is designed to report the statistical results of F-measure and silhouette coefficient for each algorithm. The value of the silhouette coefficient can vary between -1 and 1.

4.3 Parameter Tuning

To obtain suitable parameters for each MOCK version, the Irace tool (Iterated Racing Algorithm for Automatic Configuration) was used [24,25]. The hypervolume value was used as a performance value for such tuning process. The calibration of three MOCK's parameters was carried out: the number of generations, L (amount of nearest neighbors) and the crossover probability. The rest of the parameters of the algorithms were not considered since those values proposed in [10,11] were taken as they depend on the number of instances of each data set.

The two configurations of parameter values obtained after the tuning process using Irace, as well as those values for the rest of the parameters taken from the literature, are shown in Table 2. Note that the value of the external population size is different in MOCKPESA-II and MOCKSPEA-2 and MOCKMOEA/D. In MOCKNSGA-II this parameter is not necessary.

Table 2 Parameter values, those obtained by the Irace tool are shown in bold Algorithm Parameter Value

First configuration

MOCKPESA-II

MOCKNSGA-II

MOCKSPEA-2

MOCKMOEA/D

Number of generations 292

External population size 1000 for MOCKPESA-II

Max(50,N/20) for MOCKSPEA-2

and MOCKMOEA/D

Internal population size Max(50,N/20) L nearest neighbours 22

Crossover rate 0.6 Mutation rate 1/N

Second configuration

MOCKPESA-II MOCKNSGA-II

MOCKSPEA-2

MOCKMOEA/D

Number of generations 246

External population size 1000 for MOCKPESA-II

Max(50,N/20) for MOCKSPEA-2

and MOCKMOEA/D Internal population size Max(50,N/20)

L nearest neighbours 20 Crossover rate 0.7 Mutation rate 1/N

16

4.4 Results

The results shown below are distributed as follows: • Results of the first experiment: For each parameter configuration in Table 2,

the Pareto fronts obtained by each algorithm are shown, as well as the statistical values of the hypervolume and two-set coverage metrics. Finally, the results of the Wilcoxon test on those results are provided to get confidence on the differences observed in the samples of runs.

• Results of the second experiment: Similar to the first experiment, the results with the two parameter configurations are presented, but in this case results of the F-measure and silhouette coefficient are provided.

4.4.1 Results of the first experiment

The Pareto fronts obtained by each algorithm on the run located in the median value of the hypervolume metric are presented in Figures 4 and 5 for each one of the two parameter configurations in Table 2.

From the plots in Figures 4 and 5, it can be seen that the three new versions

proposed in this work (MOCKNSGA-II, MOCKSPEA-2 and MOCKMOEA/D) obtained better overall results compared to the original algorithm (MOCKPESA-II). In this plots it can also be observed that the distribution of solutions in MOCKSPEA-2 and MOCKMOEA/D is similar, but MOCKNSGA-II presents the best distribution of the Pareto front. However, such visual comparison must be enriched in more detail with the metric values.

The results for the hypervolume metric on 10 independent runs are summarized in

Tables 3 and 4, where the best result is remarked in bold. The reference point to calculate that metric was (0, 0). For this metric, a lower value of hypervolume identifies a better result.

17

Fig. 4 Pareto fronts obtained by the four compared MOCK versions on each data set by using the first parameter configuration

18

Fig 5 Pareto fronts obtained by the four compared MOCK versions on each data set by using the second parameter configuration

19

Table 3 Statistical results of the hypervolume metric for each data set using the first parameter configuration

Algorithm Best Mean Median St. dev. Worst

Dermatology

MOCKPESA-II 0.5058 0.50606 0.5060 0.000250333 0.5074 MOCKNSGA-II 0.1796 0.18047 0.1809 0.000561842 0.1809 MOCKSPEA-2 0.2431 0.24517 0.2454 0.000763108 0.2456 MOCKMOEA/D 0.2308 0.23597 0.2361 0.003769423 0.2410

Iris


Wine


Wisconsin


Yeast


Seeds


User Knowledge Modeling


Table 4 Statistical results of the hypervolume metric for each data set using the second parameter configuration

Dermatology


Iris


Wine


20

Wisconsin


Yeast


Seeds


User Knowledge Modeling (UKM)


The results in Tables 3 and 4 suggest that MOCKNSGA-II outperformed the other three algorithms, obtaining better hypervolume values. Instead of the results of MOCKSPEA-2 and MOCKMOEA/D are similar, MOCKMOEA/D obtains better hypervolume results in most data sets. The first configuration obtained better values in the Iris, Wisconsin and User Knowledge Modeling data sets, while the second configuration obtained better results in the Dermatology, Wine and Seeds data sets. The results of the 95%-confidence Wilcoxon test [26] are presented in Tables 5 and 6.

Table 5 Wilcoxon test results for the hypervolume metric using the first parameter configuration

Dermatology Iris Wine Wisconsin Yeast Seeds UKM

MOCKPESA-II vs

MOCKNSGA-II p-value 1.3093e-04 1.2161e-04 1.5028e-04 9.3489e-05 1.2543e-04 1.8267e-04 1.8267e-04

Significant difference

MOCKPESA-II vs MOCKSPEA-2

p-value 1.4332e-04 1.2313e-04 1.5566e-04 1.3993e-04 1.1495e-04 1.8267e-04 1.8267e-04 Significant difference

MOCKPESA-II vs MOCKMOEA/D


MOCKNSGA-II vs MOCKSPEA-2


MOCKNSGA-II vs MOCKMOEA/D


MOCKSPEA-2 vs MOCKMOEA/D

p-value 1.5930e-04 1.4760e-04 1.6492e-04 0.1824 1.4596e-04 1.0830e-05 1.8267e-04 Significant difference

X

21

Table 6 Wilcoxon test results for the hypervolume metric using the second parameter configuration

Dermatology Iris Wine Wisconsin Yeast Seeds Data

MOCKPESA-II vs

MOCKNSGA-II p-value 9.5991e-05 1.1208e-04 1.2934e-04 1.2161e-04 1.5116e-04 1.8267e-04 1.8267e-04

Significant difference

MOCKPESA-II vs MOCKSPEA-2


MOCKPESA-II vs MOCKMOEA/D


MOCKNSGA-II vs MOCKSPEA-2


MOCKNSGA-II vs MOCKMOEA/D


MOCKSPEA-2 vs MOCKMOEA/D


According to the Wilcoxon test results for the hypervolume metric the differences

observed in most samples of runs are significant. Therefore, MOCKNSGA-II outperformed the other two algorithms based on this metric. However, in the case of the first configuration, for Wisconsin data set, the statistical test shows that doesn’t exist significant difference between MOCKSPEA-2 and MOCKMOEA/D, therefore it can be concluded that in this case these algorithms have similar behavior.

To complement the results obtained with the hypervolume metric, in Tables 7 and

8 the statistical results obtained by C-metric are summarized. Such results are based on all the pairwise combinations of the 10 independent runs executed by each algorithm (each one of the ten Pareto fronts of one algorithm was compared with each one the ten fronts obtained by the other algorithm). The 95%-confidence Wilcoxon test was used to validate the differences observed in the samples of runs.

For this metric the optimal value is 1. So that, if all solutions of an algorithm

dominate or are equal to all the solutions of the other algorithm, the value for C-metric will be equal to 1, or 0 otherwise. To say that an algorithm is better than the other, it is preferable to have values close to 1.

22

Table 7 Statistical results of the two-set coverage metric for each data set using the first parameter configuration, st. dev. means standard deviation, the symbol means that there is

significant difference

Algorithm Best Mean Median St. dev. Worst Significant difference

Dermatology

MOCKPESA-II vs MOCKNSGA-II MOCKNSGA-II vs MOCKPESA-II

0 1

0 1

0 1

0 0

0 1

1.5938e-05

MOCKPESA-II vs MOCKSPEA-2 MOCKSPEA-2 vs MOCKPESA-II

0 1

0 1

0 1

0 0

0 1

1.5938e-05

MOCKPESA-II vs MOCKMOEA/D MOCKMOEA/D vs MOCKPESA-II

0 1

0 1

0 1

0 0

0 1

1.5938e-05

MOCKNSGA-II vs MOCKSPEA-2 MOCKSPEA-2 vs MOCKNSGA-II

0.7800 0.2000

0.75600 0.19000

0.7400 0.2000

0.020655911 0.016996732

0.7400 0.1600

9.8543e-05

MOCKNSGA-II vs MOCKMOEA/D MOCKMOEA/D vs MOCKNSGA-II

0.5800 0.3400

0.5640 0.3281

0.5600 0.3300

0.015776213 0.013984118

0.5400 0.3000

1.4581e-04

MOCKSPEA-2 vs MOCKMOEA/D MOCKMOEA/D vs MOCKSPEA-2

0.0400 0.1600

0.0260 0.1500

0.0200 0.1500

0.009660918 0.010540926

0.0200 0.1400

9.9845e-05

Iris


0.0263 0.9868

0.01975 0.98287

0.0198 0.9868

0.006904306 0.006327901

0.0132 0.9737

9.9839e-05


0.0260 0.9870

0.0156 0.9857

0.0131 0.9870

0.005481281 0.004110961

0.0130 0.9740

4.7314e-05


0.0263 0.9870

0.0198 0.9872

0.0197 0.9872

0.004729392 0.004612738

0.0132 0.9746

8.8328e-05


0.6600 0.0600

0.64600 0.05600

0.6400 0.0600

0.009660918 0.008432740

0.6400 0.0400

7.3650e-05


0.8600 0.0800

0.8260 0.0580

0.8300 0.0600

0.026749871 0.011352924

0.7600 0.0400

1.1572e-04


0.8800 0.0400

0.8480 0.0160

0.8500 0.0200

0.023475756 0.015776213

0.8200 0

1.5291e-04

Wine


0 1

0 1

0 1

0 0

0 1

1.5938e-05


0 1

0 1

0 1

0 0

0 1

1.5938e-05


0 1

0 1

0 1

0 0

0 1

1.5938e-05


0.7000 0.3400

0.67400 0.33800

0.6800 0.3400

0.018973666 0.006324555

0.6400 0.3200

5.8493e-05


0.4600 0.3800

0.4460 0.3620

0.4500 0.3600

0.016465452 0.014757296

0.4200 0.3400

1.3661e-04


0.0400 0.0400

0.02400 0.03200

0.0200 0.0400

0.008432740 0.010327956

0.0200 0.0200 .08274 X

Wisconsin


0 1

0 1

0 1

0 0

0 1

1.5938e-05

MOCKPESA-II vs MOCKSPEA-II MOCKSPEA-II vs MOCKPESA-II

0.0594 0.9505

0.05049 0.94357

0.0495 0.9406

0.005619698 0.006681991

0.0396 0.9307

1.0858e-04


0 1

0 1

0 1

0 0

0 1

1.5938e-05


0.9400 0.2400

0.91800 0.22200

0.9200 0.2300

0.011352924 0.019888579

0.9000 0.2000

1.0858e-04

23


0.5000 0.3400

0.4272 0.3040

0.4600 0.3000

0.016865481 0.024585452

0.4600 0.2800

1.3666e-04


0.2400 0.7600

0.2200 0.7400

0.2200 0.7400

0.013333333 0.013333333

0.2200 0.7200

1.1790e-04

Yeast


0 1

0 1

0 1

0 0

0 1

1.5938e-05


0 1

0 1

0 1

0 0

0 1

1.5938e-05


0 1

0 1

0 1

0 0

0 1

1.5938e-05


0.7027 0.1892

0.68380 0.17164

0.6892 0.1622

0.015846135 0.015669588

0.6622 0.1486

1.3251e-04


0.5946 0.2432

0.56895 0.21890

0.5676 0.2162

0.016162457 0.012405643

0.5541 0.2027

1.4760e-04


0.1216 0.1892

0.09595 0.16487

0.0946 0.1622

0.018499324 0.013981499

0.0676 0.1486

1.6400e-04

Seeds


0 1

0 1

0 1

0 0

0 1

1.5938e-05


0 1

0 1

0 1

0 0

0 1

1.5938e-05


0 1

0 1

0 1

0 0

0 1

1.5938e-05


0.5600 0.3800

0.51200 0.34000

0.5100 0.3400

0.028596814 0.031269438

0.4800 0.3000

1.6590e-04


0.3600 0.2600

0.32400 0.23200

0.3200 0.2200

0.022705848 0.021499354

0.3000 0.2000

1.4590e-04


0.0800 0.2600

0.04000 0.20800

0.0400 0.2100

0.021081851 0.028596814

0.0200 0.1800

1.5470e-04

User Knowledge

Modeling Data


0 1

0 1

0 1

0 0

0 1

1.5938e-05


0 1

0 1

0 1

0 0

0 1

1.5938e-05


0 1

0 1

0 1

0 0

0 1

1.5938e-05


0.7200 0.2400

0.68000 0.19600

0.6800 0.1900

0.021081851 0.020655911

0.6600 0.1800

1.4590e-04


0.9000 0.0600

0.86600 0.03000

0.8600 0.0200

0.023190036 0.014142135

0.8400 0.0200

1.3740e-04


0.8000 0.1600

0.77200 0.12600

0.7700 0.1200

0.013984118 0.018973666

0.7600 0.1000

1.4080e-04

24

Table 8 Statistical results of the two-set coverage metric for each data set using the second parameter configuration, st dev means standard deviation, the symbol means that there is

significant difference

Algorithm Best Mean Median St. Dev. Worst Significant difference

Dermatology


0 1

0 1

0 1

0 0

0 1

1.5938e-05


0 1

0 1

0 1

0 0

0 1

1.5938e-05


0 1

0 1

0 1

0 0

0 1

1.5938e-05


0.7800 0.1800

0.74800 0.17600

0.7400 0.1800

0.013984118 0.008432740

0.7400 0.1600

7.5212e-05


0.7200 0.2800

0.67600 0.25200

0.6800 0.2600

0.027968236 0.021499354

0.6400 0.2200

1.5750e-04


0.0800 0.1000

0.04000 0.07000

0.0400 0.0700

0.021081851 0.021602469

0.0200 0.0400 .01065

Iris


0.0411 0.9863

0.02603 0.97671

0.0274 0.9863

0.011995652 0.012996961

0.0137 0.9589

1.3093e-04


0.0541 0.9865

0.03646 0.97029

0.0405 0.9730

0.009133236 0.012427429

0.0270 0.9459

1.3497e-04


0.0400 1

0.01400 0.98600

0.0100 0.9900

0.016465452 0.014628193

0 0.9600

1.3660e-04


0.7600 0.0800

0.71400 0.07800

0.7000 0.0800

0.023190036 0.006324555

0.7000 0.0600

5.8493e-05


0.6800 0.0400

0.64400 0.01800

0.6400 0.0200

0.020655911 0.017511900

0.6200 0

1.5380e-04


0.1400 0.1800

0.09800 0.15000

0.1000 0.1500

0.023944380 0.025385910

0.0600 0.1200

1.1210e-03

Wine


0 1

0 1

0 1

0 0

0 1

1.5938e-05


0 1

0 1

0 1

0 0

0 1

1.5938e-05


0 1

0 1

0 1

0 0

0 1

1.5938e-05


0.6600 0.4400

0.62400 0.39800

0.6400 0.3800

0.030983867 0.028982753

0.5800 0.3800

1.0049e-04


0.5600 0.4800

0.51400 0.44200

0.5200 0.4400

0.029888682 0.028982753

0.4600 0.4000

5.3160e-04


0.1000 0.1600

0.06200 0.12400

0.0600 0.1200

0.023944380 0.022705848

0.0400 0.1000

3.4880e-04

Wisconsin


0 1

0 1

0 1

0 0

0 1

1.5938e-05

MOCKPESA-II vs MOCKSPEA-II MOCKSPEA-II vs MOCKPESA-II

0 1

0 1

0 1

0 0

0 1

1.5938e-05


0 1

0 1

0 1

0 0

0 1

1.5938e-05


0.5800 0.3800

0.56400 0.37400

0.5600 0.3800

0.008432740 0.009660918

0.5600 0.3600

7.3650e-05

25


0.5600 0.2400

0.52400 0.20600

0.5200 0.2000

0.022705848 0.023190036

0.5000 0.1800

1.5840e-04


0.3600 0.2000

0.33400 0.16200

0.3400 0.1600

0.023190036 0.019888578

0.3000 0.1400

1.5840e-04

Yeast


0 1

0 1

0 1

0 0

0 1

1.5938e-05


0 1

0 1

0 1

0 0

0 1

1.5938e-05


0 1

0 1

0 1

0 0

0 1

1.5938e-05


0.6892 0.3514

0.65405 0.32702

0.64865 0.32430

0.022275609 0.017803982

0.6351 0.3108

1.3661e-04


0.4595 0.3108

0.42566 0.28110

0.41890 0.27705

0.017157389 0.016595180

0.4054 0.2568

1.5930e-04


0.0541 0.2703

0.02972 0.25270

0.02700 0.25000

0.016627741 0.014348441

0.0135 0.2297

1.5570e-04

Seeds


0 1

0 1

0 1

0 0

0 1

1.5938e-05


0 1

0 1

0 1

0 0

0 1

1.5938e-05


0 1

0 1

0 1

0 0

0 1

1.5938e-05


0.5200 0.4000

0.48200 0.37600

0.4800 0.3800

0.022010098 0.015776212

0.4600 0.3600

1.4850e-04


0.4200 0.3200

0.39600 0.29200

0.3900 0.3000

0.018378731 0.016865480

0.3800 0.2600

1.3740e-04


0.0800 0.2000

0.04600 0.16800

0.0400 0.1600

0.026749870 0.025298221

0.0200 0.1400

1.5660e-04

User Knowledge

Modeling Data


0 1

0 1

0 1

0 0

0 1

1.5938e-05


0 1

0 1

0 1

0 0

0 1

1.5938e-05


0 1

0 1

0 1

0 0

0 1

1.5938e-05


0.5200 0.3200

0.49818 0.28364

0.5000 0.2800

0.018856180 0.026331223

0.4800 0.2600

6.2670e-05


0.3200 0.2400

0.29636 0.21636

0.3000 0.2200

0.012649110 0.015776212

0.2800 0.2000

5.2280e-05


0.4200 0.4400

0.37273 0.42727

0.3600 0.4200

0.021186998 0.013498971

0.3600 0.4000

1.4441e-04

The results in Tables 7 and 8 show that in all cases the algorithms proposed in

this research (MOCKNSGA-II, MOCKSPEA-2 , MOCKMOEA/D ) outperformed the original algorithm (MOCKPESA-II) based in C-metric. Regarding only the proposed algorithms, MOCKNSGA-II has a slightly better performance with respect to MOCKSPEA-2 and MOCKMOEA/D. The Wilcoxon test indicates that in all comparisons of the second configuration, the differences observed are significant. Regarding the first configuration, in the case of Wine data set MOCKSPEA-2 and MOCKMOEA/D have

26

similar behavior according to C-metric, that means that the solutions of both algorithms have similar distribution in the Pareto front.

4.4.2 Results of the second experiment

In order to evaluate the performance of the algorithms proposed according to the quality of the clustering obtained, a comparison was made against different clustering algorithms using the F-measure and silhouette coefficient. For the three methods used in the comparison, the number of clusters (k) should be specified a priori, so that, the value of k used for the algorithms is shown in Table 1. This experiment is interesting because it allows to obtain information on the performance of algorithms to find solutions of high quality, giving the advantage that unlike other classical clustering algorithms, a scenario where the expert has the opportunity to try a number of alternative solutions is obtained. The results obtained on 10 independent runs by computing the mean value of the F-measure and silhouette coefficient are summarized in Tables 9, 10, 11 and 12. The best result is shown in bold letters.

Table 9 Average F-measure value on ten independent runs, using the first parameter configuration

Table 10 Average silhouette coefficient value on ten independent runs, using the first parameter configuration

Algorithm Database

Hierarchical Clustering k-means Meta-

clustering MOCK

PESA-II MOCK NSGA-II

MOCK SPEA-2

MOCK MOEA/D

Dermatology 0.898637 0.953200 0.782163 0.939852 0.958497 0.953284 0.953967 Iris 0.809857 0.817795 0.774389 0.896429 0.992410 0.988101 0.917362

Wine 0.925527 0.919785 0.925524 0.944720 0.949748 0.949028 0.949375 Wisconsin 0.965966 0.965825 0.849309 0.975968 0.983718 0.971382 0.978927

Yeast 0.448316 0.422493 0.309288 0.513901 0.804620 0.800271 0.801936

Algorithm Database


clustering MOCK


MOCK SPEA-2

MOCK MOEA/D


Wine 0.839589 0.799685 0.830383 0.852838 0.890738 0.870036 0.877925 Wisconsin 0.857125 0.859636 0.729474 0.873748 0.920476 0.909873 0.915293

Yeast 0.479937 0.498068 0.370348 0.550103 0.749739 0.679037 0.710327 Seeds 0.779985 0.780747 0.759757 0.797361 0.908756 0.869173 0.889348

User Knowledge Modeling Data 0.842688 0.827868 0.789553 0.850283 0.897535 0.886383 0.862735

27

Table 11 Average F-measure value of ten independent runs, using the second parameter configuration

Table 12 Average silhouette coefficient value on ten independent runs, using the second parameter configuration

Based on the results observed in Tables 9 and 11, the four versions of MOCK outperformed the other clustering algorithms. According to the F-measure, MOCKNSGA-II had a better performance with respect to MOCKSPEA-2, MOCKMOEA/D and the original algorithm in all data sets. The results obtained in this second experiment agree with those obtained in the first experiment, where the evolutionary multi-objective performance metrics suggested that the three new MOCK versions outperformed the original one. Moreover, based on the F-measure, three new versions also provided highly competitive results with respect to other clustering algorithms by keeping the advantages of MOCK. The results of Seeds and User Knowledge Modeling Data Set were not considered based on F-measure, because this metric consider the class atriibutte to obtain the F-measure value and these data sets are for unsupervised learning. The poor results of F-measure for traditional methods are affected, because of, they can be fail on certain data sets, for instance with databases of various shapes and dimensions, k-means and agglomerative hierarchical clustering fail for data sets with

Algorithm Database


clustering MOCK


MOCK SPEA-2

MOCK MOEA/D


Wine 0.919374 0.910274 0.920837 0.944958 0.951094 0.947928 0.949628 Wisconsin 0.950374 0.963944 0.959837 0.970968 0.973972 0.970988 0.971973

Yeast 0.453938 0.419373 0.310383 0.509658 0.801957 0.800126 0.800392

Algorithm Database


clustering MOCK


MOCK SPEA-2

MOCK MOEA/D


Wine 0.810839 0.780738 0.8099003 0.867559 0.880273 0.869677 0.870119 Wisconsin 0.839668 0.849780 0.730771 0.868011 0.910368 0.879770 0.880064

Yeast 0.429048 0.500739 0.384938 0.571024 0.750749 0.690274 0.719738 Seeds 0.770389 0.790383 0.762037 0.810221 0.910337 0.872037 0.880382

User Knowledge Modeling Data 0.853074 0.819734 0.770748 0.860103 0.900380 0.892141 0.880421

28

elongated cluster shapes. For example, in the case of Iris, Dermatology and Yeast data sets, the F-measure values obtained by the classical clustering algorithms are significantly lower. This is due to the fact that in these data sets, the actual cluster borders (according to the true class labels) are not discernible from the structure, making difficult identify the correctly number of clusters. Otherwise happens in the case of Wine and Wisconsin data sets, where the cluster structure is clear. For silhouette coefficient results, it can be observed that, for two parameter configurations, in most data sets the MOCK versions always found clusters with a good structure, however in the case of Yeast data set, MOCKPESA-II (original algorithm) found clusters with a reasonable structure, the rest of the clustering techniques selected have a poor perfomance compared with evolutionary approaches, it can be see in Tables 10 and 12 that the traditional techniques obtain lower values of silhouette coefficient, therefore, the results suggest that all MOCK versions are good methods for clustering task and are better than traditional methods where the quality of cluster is relatively lower. This quality measure confirms the results previously obtained, where the best algorithm was MOCKNSGA-II. Regarding to the performance of auto k-determination, in all data sets the proposed approaches found the appropriate k perfectly according to Table 1. The correct number of clusters could be reflected in the results obtained by clustering metrics due to, in all cases the evolutionary approaches obtain good values of F-measure and silhouette coeficient.

5 Conclusions and future work

In this work an improvement of the MOCK algorithm based on using different evolutionary multi-objective algorithms was presented. Three new versions of MOCK based on NSGA-II, SPEA-2 and MOEA/D were proposed and analyzed. A comparative study based on two standpoints (evolutionary multi-objective optimization metrics and clustering metrics) was carried out by dealing with data sets with different features. The obtained results indicate that three new MOCK versions are viable alternatives since their performance are highly competitive with respect to the original MOCK based on the hypervolume and C-metric. Furthermore, three proposed versions outperformed other clustering algorithms based on the F-metric and silhouette coefficient metric. The obtained F-metric and silhouette coefficient results, also confirm that the evolutionary approach can be used getting favorable results in clustering task with more advantages than traditional clustering algorithms, like the fact that the value of k (number of groups) doesn’t need to be specified a-priori, the MOEA also performs a better exploration and exploitation of the search space and avoids falling into local optimal solutions, this allows to obtain better solutions and provides to the user a set of clustering solutions instead of a single solution, this enable to the user choose one or more clustering solutions according to different criteria. Finally, between the three new MOCK versions proposed in this research, the one based on NSGA-II provided slightly better results with respect to the one based on SPEA-2 and MOEA/D.

29

Part of the future work relies in the study of different variation operators within the improved MOCK, using in this case other operators for integer encoding, Another recently MOEAs could be used too to improve the performance of MOCK, for example NSGA-III, DEMO, SMS-EMOA, etc. Furthermore, the algorithms will be tested in others data sets and another objective could be included, for example for maximize the distances among clusters to evaluate if the results could be improved.

Acknowledgments

The first author acknowledges economical support from CONACyT through scholarship No. 258800 and the academic support from the University of Veracruz to pursue graduate studies. The second author acknowledges support from CONACyT through project No. 220522.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this article.

References

1. Coello CAC, Van Veldhuizen DA, Lamont GB (2002) Evolutionary Algorithms for Solving Multi-Objective Problems. New York, USA

2. Jin Y (ed) (2006) Multi-Objective Machine Learning. New York, USA 3. Deb K (2001) Multi-objective optimization using evolutionary algorithms.

Chichester, London 4. Cole RM (1998) Clustering with Genetic Algorithms. Western University, MSc

Thesis 5. Ma PCH, Chan KCC, Yao X, Chiu DKY (2006) An Evolutionary Clustering

Algorithm for Gene Expression Microarray Data Analysis. IEEE Trans Evol Comput 10(3): 296–314

6. Casillas A, González de Lena MT, Martínez R (2003) Document Clustering into an Unknown Number of Clusters Using a Genetic Algorithm. In: Matoušek V, Mautner P (eds) TSD 2003: Text, Speech and Dialogue. 6th international conference on Text Speech and Dialogue, České Budéjovice, September 2003. Lecture notes in computer science, vol 2807. Springer, Heidelberg, pp 43–49

7. Cormen TH, Leiserson CE, Rivest RL, Stein C (2001) Introduction to Algorithms. Massachusetts, USA.

8. Ferligoj A, Batagelj V (1992) Direct multicriterion clustering. J Classif 9:43–61 9. Handl J, Knowles J (2004) Evolutionary multiobjective clustering. In: Yao X et al

(eds) PPSN VIII: Parallel Problem Solving from Nature, Birmingham, September 2004. Lecture notes in computer science, vol 3242. Springer, Heidelberg, pp 1081–1091

30

10. Handl J, Knowles J (2004) Multiobjective clustering with automatic determination of the number of clusters, Technical Report TR-COMPSYSBIO-2004-02, UMIST, Manchester, UK.

11. Handl J, Knowles J (2007) An Evolutionary Approach to Multiobjective Clustering. IEEE Trans Evol Comput 11(1): 56–76

12. Corne DW, Jerram NR, Knowles JD, Oates MJ (2001) PESA-II: Region-Based Selection in Evolutionary Multiobjective Optimization. In: Spector L, Goodman D et al (eds) Proceedings of the genetic evolutionary computation conference. Morgan Kaufmann, San Francisco, pp 283–290

13. Korkmaz EE, Du J, Alhajj R, Barker K (2006) Combining advantages of new chromosome representation scheme and multi-objective genetic algorithms for better clustering. J Intell Data Anal 10(2):163–182

14. Mukhopadhyay A, Maulik U, Bandyopadhyay S (2007) Multiobjective genetic fuzzy clustering of categorical attributes. In: Proceedings of the 10th international conference on Information Technology, Orissa, December 2007, IEEE Computer Society, pp 74–79

15. Ripon KSN, Tsang C-H, Kwong S, Ip M-K (2006) Multi-objective evolutionary clustering using variable-length real jumping genes genetic algorithm. In: Proceedings of the 18th international conference on Pattern Recognition (ICPR'06). IEEE Computer Society, vol 1, pp 1200 –1203

16. Jain AK (2010) Data Clustering: 50 Years Beyond K-Means. Pattern Recogn Lett 31(8):651–66.

17. Rokach L (2010) A survey of Clustering Algorithms. In Maimon O, Rokach L (eds) Data Mining and Knowledge Discovery Handbook, Springer, US, pp 269–298

18. Wilson RJ, Watkins JJ (1990) Graphs: an introductory approach: a first course in discrete mathematics. New York, USA

19. Corne DW, Knowles JD, Oates MJ (2000) The Pareto envelope-based selection algorithm for multiobjective optimization. In: Schoenauer M, Deb K, Rudolph G et al (eds) Proceedings of the 6th international conference of Parallel Problem Solving from Nature (PPSN VI), Paris France, September 2000. Lecture notes in computer science, vol 1917. Springer, Heidelberg, pp 839–848

20. Deb K, Pratap A, Agarwal S, Meyarivan T (2002) A Fast and Elitist Multiobjective Genetic Algorithm: NSGA–II. IEEE Trans Evol Comput 6(2): 182–197

21. Zitzler E, Laumanns M Thiele L (2001) SPEA2: Improving the Strength Pareto Evolutionary Algorithm, Technical Report 103, Computer Engineering and Networks Laboratory (TIK), Swiss Federal Institute of Technology (ETH), Zurich, Switzerland

22. Deb K (2000) An Efficient Constraint Handling Method for Genetic Algorithms. Comput Method Appl M 186(2/4): 311–338

23. Blake C, Merz C (1998) UCI repository of machine learning databases. http://www.ics.uci.edu/mlearn/MLRepository.html. Accessed 25 March 2014

24. López-Ibáñez M, Dubois-Lacoste J (2003) The irace package: Iterated Race for Automatic Algorithm Configuration. http://cran.rproject.org/web/packages/irace/inde x.html. Accesed 12 May 2013

25. López-Ibáñez M, Dubois-Lacoste J, Stützle T, Birattari M (2011) The irace Package: Iterated Racing for Automatic Algorithm Configuration, Technical Report TR/IRIDIA/2011-004, IRIDIA, Université Libre de Bruxelles, Belgium

26. Derrac J, García S, Molina D, Herrera Francisco (2011) A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms. Swarm Evol Comput 1(1): 3–18

27. Hruschka ER, Campello RJGB, Freitas AA, de Carvalho ACPLF (2009) A survey of evolutionary algorithms for clustering. IEEE Trans. Syst., Man, Cybern., Syst. 39(2): 133-155

31

28. Bandyopadhyay S, Maulik U (2001) Nonparametric Genetic Clustering: Comparison of Validity Indices. IEEE Trans. Syst., Man, Cybern., Syst. 31(1): 120–125

29. Pan S, Cheng K (2007) Evolution-Based Tabu Search Approach to Automatic Clustering. IEEE Trans. Syst., Man, Cybern., Syst. 37(5): 827–838

30. Arias-Montaño A, Coello CAC, Mezura-Montes E (2012) Multiobjective Evolutionary Algorithms in Aeronautical and Aerospace Engineering. IEEE Trans Evol Comput 16 (5): 662–694

31. Zhang Q, Li H (2007) MOEA/D: A Multiobjective Evolutionary Algorithm Based on Decomposition. IEEE Trans Evol Comput 11 (6): 712–731

32. Pearson RK, Zylkin T, Schwaber JS, Gonye GE (2004) Quantitative evaluation of clustering results using computationl negative controls. In: Proceedings of the 15th international conference on Data Mining (SIAM), Lake Buena Vista, Florida, pp 188–199

Documents

Improved multi-objective clustering with automatic