Nanda and Panda 2013 - A Survey on Nature Inspired Metaheuristic Algorithms for Partitional Clustering



    Review

A survey on nature inspired metaheuristic algorithms for partitional clustering

Satyasai Jagannath Nanda a,⁎, Ganapati Panda b

a Department of Electronics and Communication Engineering, Malaviya National Institute of Technology Jaipur, Rajasthan 302017, India
b School of Electrical Sciences, Indian Institute of Technology Bhubaneswar, Odisha 751013, India

Article info

Article history:
Received 10 October 2012
Received in revised form 23 August 2013
Accepted 20 November 2013

    Keywords:

    Partitional clustering

    Nature inspired metaheuristics

    Evolutionary algorithms

    Swarm intelligence

Multi-objective clustering

Abstract

The partitional clustering concept started with the K-means algorithm, which was published in 1957. Since then many classical partitional clustering algorithms have been reported based on the gradient descent approach. The 1990s kick-started a new era in cluster analysis with the application of nature inspired metaheuristics. After the initial formulation nearly two decades have passed, and researchers have developed numerous new algorithms in this field. This paper embodies an up-to-date review of all major nature inspired metaheuristic algorithms employed till date for partitional clustering. Further, the key issues involved in formulating various metaheuristics as a clustering problem and the major application areas are discussed.

© 2014 Published by Elsevier B.V.

Contents

1. Introduction
2. Single objective nature inspired metaheuristics in partitional clustering
   2.1. Problem formulation
   2.2. Historical developments in nature inspired metaheuristics for partitional clustering
      2.2.1. Evolutionary algorithms in partitional clustering
      2.2.2. Physical algorithms in partitional clustering
      2.2.3. Swarm intelligence algorithms in partitional clustering
      2.2.4. Bio-inspired algorithms in partitional clustering
      2.2.5. Other nature inspired metaheuristics for partitional clustering
   2.3. Fitness functions for partitional clustering
   2.4. Cluster validity indices
3. Multi-objective algorithms for flexible clustering
   3.1. Problem formulation
   3.2. Historical development in multi-objective algorithms for partitional clustering
   3.3. Evaluation methods
4. Real life application areas of nature inspired metaheuristics based partitional clustering
5. Conclusion
6. Future research issues
References

Contents lists available at ScienceDirect
Swarm and Evolutionary Computation
journal homepage: www.elsevier.com/locate/swevo

2210-6502/$ - see front matter © 2014 Published by Elsevier B.V.
http://dx.doi.org/10.1016/j.swevo.2013.11.003

⁎ Corresponding author.
E-mail addresses: [email protected] (S.J. Nanda), [email protected] (G. Panda).

Please cite this article as: S.J. Nanda, G. Panda, A survey on nature inspired metaheuristic algorithms for partitional clustering, Swarm and Evolutionary Computation (2014), http://dx.doi.org/10.1016/j.swevo.2013.11.003



inspired metaheuristics used in partitional clustering, (2) an up-to-date survey on flexible partitional clustering based on multi-objective metaheuristic algorithms, (3) a consolidation of recently developed cluster validation measures, and (4) an exploration of the new application areas of partitional clustering algorithms.

The paper is organized as follows. Section 2 deals with the advances in single objective nature inspired metaheuristics for partitional clustering, which include recent developments in algorithm design, fitness function selection and the cluster validity indices used for verification. The multi-objective metaheuristics used for flexible clustering are discussed in Section 3. The real life application areas of nature inspired partitional clustering are highlighted in Section 4. Finally, the concluding remarks of the survey are presented in Section 5. A number of issues for innovative future research are presented in Section 6.

2. Single objective nature inspired metaheuristics in partitional clustering

2.1. Problem formulation

Given an unlabeled dataset Z_{N×D} = {z_{1D}, z_{2D}, …, z_{ND}} representing N patterns, each having D features, the partitional approach aims to cluster the dataset into K groups (K ≤ N) such that

  C_k ≠ ∅  ∀ k = 1, 2, …, K;
  C_k ∩ C_l = ∅  ∀ k, l = 1, 2, …, K and k ≠ l;
  ⋃_{k=1}^{K} C_k = Z.   (1)

The clustering operation depends on the similarity between the elements present in the dataset. If f denotes the fitness function, then the clustering task is viewed as an optimization problem:

  Optimize_{C_k} f(Z_{N×D}, C_k)  ∀ k = 1, 2, …, K   (2)

Hence the optimization based clustering task is carried out by single objective nature inspired metaheuristic algorithms.
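As a concrete illustration, the partition constraints in (1) and a fitness f of the form in (2) can be sketched in Python. The toy dataset, the label-vector encoding of C_k, and the specific fitness (the centroid-distance function F2 of Table 3) are illustrative choices, not prescribed by the survey:

```python
import numpy as np

def is_valid_partition(labels, K):
    """Check the constraints in (1): every cluster non-empty; disjointness
    and exhaustiveness are implicit in a one-label-per-pattern vector."""
    return set(labels) == set(range(K))

def centroid_distance_fitness(Z, labels, K):
    """F2 of Table 3: sum of squared Euclidean distances of the objects
    from their respective cluster means."""
    total = 0.0
    for k in range(K):
        members = Z[labels == k]
        mu = members.mean(axis=0)
        total += ((members - mu) ** 2).sum()
    return total

# Toy dataset: N = 6 patterns, D = 2 features, a K = 2 candidate partition.
Z = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 5.2], [4.9, 5.1]])
labels = np.array([0, 0, 0, 1, 1, 1])
assert is_valid_partition(labels, 2)
print(centroid_distance_fitness(Z, labels, 2))  # ≈ 0.08 for this tight partition
```

A metaheuristic would search over candidate partitions (or centroid vectors) to minimize this fitness value.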

2.2. Historical developments in nature inspired metaheuristics for partitional clustering

In the last two decades a number of nature inspired metaheuristics have been proposed in the literature and applied to many real life applications. In recent years these metaheuristic algorithms have been successfully used to solve various unsupervised optimization problems. At the present stage, for any unsupervised optimization problem in hand, a user can easily pick a suitable metaheuristic algorithm for the purpose. The achieved solution approaches optimality, as these population based algorithms explore the entire search space with the progress of generations.

The basic steps associated with the core algorithms for partitional clustering are listed in Table 2. The recent works on partitional clustering are outlined in sequence.

     2.2.1. Evolutionary algorithms in partitional clustering 

The evolutionary algorithms are inspired by Darwin's theory of natural selection, which is based on survival of the fittest candidate in a given environment. These algorithms begin with a population (a set of solutions) which tries to survive in an environment (defined by fitness evaluation). The parent population passes its properties of adaptation to the environment on to the children through various mechanisms of evolution, such as genetic crossover and mutation. The process continues over a number of generations (an iterative process) till the solutions are found to be most suitable for the environment. With this concept in mind, Holland initially proposed the Genetic Algorithm (GA) in 1975 [46,47]. It was followed

    by development of Evolution Strategies (ES) by Schwefel in 1981

    [49–51] and Genetic Programming (GP) by Koza [52] in 1992. Storn

    and Price developed another evolutionary concept in 1997 termed

    as Differential Evolution (DE)   [53]. The books   [54,150]   on DE,

    research work on adaptive DE   [55,56]   and opposition-based DE

    [57,58]   made the DE quite popular amongst researchers. The

application of these evolutionary algorithms to partitional clustering is outlined below.

Table 1
Broad classification of nature inspired metaheuristic algorithms. Each row pairs a single objective algorithm with its multi-objective counterpart (single objective | multi-objective).

Evolutionary algorithms
  Genetic Algorithm (GA) [46,47] | NSGA II [305,306]
  Differential Evolution (DE) [53–58] | Multi-objective DE [343]
  Genetic Programming (GP) [52] | Multi-objective GP [317]
  Evolutionary Strategy (ES) [51–139] | Multi-objective ES [318]
  Granular agent evolutionary algo. [358] | SPEA [326], PESA II [325]

Physical algorithms
  Simulated Annealing (SA) [48] | Multi-objective SA [313]
  Memetic Algorithm (MA) [167–170] | Multi-objective MA [314]
  Harmony Search (HS) [173,174] | Multi-objective HS [315]
  Shuffled Frog-Leaping algo. (SFL) [179] | Multi-objective SFL [316]

Swarm intelligence
  Ant Colony Opt. (ACO) [62–67] | Multi-objective ACO [333]
  Particle Swarm Opt. (PSO) [68–72] | Multi-objective PSO [307]
  Artificial Bee Colony (ABC) [73–77] | Multi-objective ABC [310]
  Fish Swarm algo. (FSA) [254,255] | Multi-objective FSA [321]

Bio-inspired algorithms
  Artificial Immune System (AIS) [78–83] | Multi-objective AIS [308]
  Bacterial Foraging Opt. (BFO) [84,85] | Multi-objective BFO [309]
  Dendritic Cell algo. [87,88]
  Krill herd algo. [356]

Other nature inspired algorithms
  Cat Swarm Opt. (CSO) [269,270] | Multi-objective CSO [311]
  Cuckoo Search algo. [272–274] | Multi-objective Cuckoo [319]
  Firefly algo. [275–277] | Multi-objective Firefly [312]
  Invasive Weed Opt. algo. (IWO) [280] | Multi-objective IWO [283]
  Gravitational Search algo. [285,286] | Multi-objective GSA [320]
  River formation dynamics [357]
  Bat algorithm [359,360] | Multi-objective Bat [361]


GA-based approaches: Bezdek et al. [100] initially proposed the use of the basic genetic algorithm for partitional clustering. The standard binary encoding scheme with a fixed number of cluster centers (k) is used for initialization of the chromosomes [100–102]. The reproduction operation is carried out using uniform crossover and cluster-oriented mutation (altering the bits of the binary string).

Subsequently, integer based encoding of chromosomes was used by Murthy and Chowdhury [103]. They suggested the use of single point crossover and the Xiaofeng–Palmieri based mutation scheme [104] for reproduction. However, theoretically this mutation may produce invalid offspring. Maulik and Bandyopadhyay have

proposed the use of a real coded genetic algorithm for partitional clustering [105]. With real coding the computational complexity is reduced to O(k), compared to O(nk) associated with integer or binary encoding. A genetic K-means algorithm is proposed in [106] which replaces the crossover operation with the basic search operation of K-means. Based on this concept Lu et al. have developed the fast genetic K-means [108] and incremental genetic K-means [109] algorithms for gene expression data analysis. Similarly, Sheng and Liu [107] have proposed a genetic based hybrid K-medoid algorithm for accurate clustering of large databases.
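The three encoding schemes discussed above can be contrasted on a small hypothetical example with k = 3 clusters and D = 2 features; all values below are made up for illustration:

```python
import numpy as np

k, D, N = 3, 2, 6  # clusters, features, data points (illustrative sizes)

# Binary encoding [100-102]: every centroid coordinate is a quantized bit
# string, so a chromosome carries k * D bit-string genes.
binary_chromosome = ["0101", "1100", "0010", "1011", "0111", "0001"]

# Integer encoding [103]: one gene per data point, holding its cluster
# label, so chromosome length grows with N.
integer_chromosome = [0, 0, 2, 1, 2, 1]

# Real encoding [105]: the chromosome is the flat vector of k * D centroid
# coordinates, so genetic operators touch O(k) genes instead of O(nk).
real_chromosome = np.array([0.2, 0.1,   # centroid of cluster 1
                            5.0, 5.1,   # centroid of cluster 2
                            9.7, 0.3])  # centroid of cluster 3
centroids = real_chromosome.reshape(k, D)
print(centroids.shape)  # (3, 2)
```

The reshape at the end recovers the k centroids on which the fitness of a chromosome is evaluated.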

All these algorithms are based on a fixed number of clusters. These algorithms work satisfactorily when the suitable number of partitions for a dataset is known a priori. But in many practical scenarios the value of K (the number of clusters) is unknown to the user. The K value directly affects the partition quality; therefore the clustering algorithm should explore the number of partitions along with the process of optimization. Cowgill et al. [110] have developed a hybrid algorithm, COWCLUS, which first uses a non-deterministic genetic algorithm based approach to determine good partitions, then uses a hill-climbing approach to improve these partitions and produce the final best partition. Tseng and Yang [111] have proposed the automatic evolution of clusters with a genetic algorithm. In [112,113] Bandyopadhyay and Maulik have developed a nonparametric genetic algorithm for automatic selection of the number of partitions K. Based upon this concept a self-adaptive genetic algorithm for cluster analysis is reported in [125]. Recently a quantum inspired genetic algorithm for K-means clustering was proposed by Xiao et al. [128], which reports performance superior to that obtained in [112,113]. The automatic evolution of clusters has been successfully applied to image classification [113], document clustering [115], intrusion detection [116], microarray [117] and gene-expression data analyses [118–120].

In genetic based evolutionary approaches a population is normally initialized where each individual searches for the optimal weight vector for all the clusters. Gancarski and Blansche [126,127] developed co-evolutionary approaches (unlike the evolutionary case, several populations are employed here and each population searches for a local weight vector for a cluster) based upon Darwinian theory, Lamarckian theory and the Baldwin effect for feature weighting in K-means algorithms. Based upon the three theories they proposed six genetic approaches for feature weighting in K-means (three based on the evolutionary scheme, DE-LKM, LE-LKM and BE-LKM, and three co-evolutionary schemes, DC-LKM, LC-LKM and BC-LKM). They reported that the co-evolutionary approaches to cluster analysis provide performance superior to the traditional evolutionary ones.

Intuitively, hybrid evolutionary algorithms (formulated by combining the good features of two individual parent processes) provide performance superior to the conventional parent algorithms. A hybrid GA and PSO based algorithm is developed in [129] for order clustering to reduce the surface mount technology

Table 2
Basic steps involved in the single objective standard GA, DE, ACO, PSO, ABC, AIS and BFO algorithms for solving the partitional clustering problem. Each pipeline is repeated over successive generations until the clustering output (Cl. O/p) is produced.

GA:  Initialize chromosomes → Crossover → Mutation → Fitness → Selection → Cl. O/p
DE:  Initialize particles → Mutation → Crossover → Fitness → Selection → Cl. O/p
ACO: Initialize ants → Fitness → Update pheromone intensity → Drop or peak → Short memory → Cl. O/p
PSO: Initialize particles → Velocity and position update → Compute GBst and PBst → Fitness → Selection → Cl. O/p
ABC: Initialize bees → Compute employed bees → Greedy selection and fitness → Onlooker bees → Selection → Cl. O/p
AIS: Initialize immune cells → Fitness → Clone → Mutation → Selection → Cl. O/p
BFO: Initialize bacteria → Chemotaxis → Swarming → Reproduction → Elimination and dispersal → Cl. O/p


(SMT) setup time. Feng-jie and Ye [130] applied the GA and PSO based hybrid clustering algorithm to image segmentation of transmission line pictures to determine faults; this system is helpful for remote video monitoring. Hong and Kwong [131] combined a steady-state genetic algorithm and ensemble learning for cluster analysis. Chaves and Lorena [132] developed a hybrid algorithm, 'Clustering Search' (consisting of a GA along with a local search heuristic), to solve the capacitated centered clustering problem. Recently a two stage genetic algorithm was proposed by He et al. [134] for cluster analysis, in which two-stage selection and mutation operations are incorporated to enhance the search capability

of the algorithm. The two stage genetic algorithm provides accurate results compared to the agglomerative K-means [133] and standard genetic K-means algorithms. The grouping genetic algorithm (GGA) is a compact algorithm proposed by Falkenauer [135] to handle grouping-based problems. The GGA is successfully used for cluster analysis of benchmark UCI datasets in [136]. Recently Tan et al. [137] applied the GGA based clustering technique to improve the spectral efficiency of OFDMA (orthogonal frequency-division multiple access) based multicast systems.

ES-based approaches: Babu and Murty [138] developed partitional and fuzzy clustering algorithms with ES in 1994. They used minimization of the WGSS (within group sum of squared error) objective function for partitional clustering and minimization of the FCM (fuzzy C-means) objective function for fuzzy clustering. The paper by Beyer and Schwefel [139] discusses the fundamental and recent advancements in partitional clustering with ES. A hybrid partitional clustering algorithm based on K-means and ES is developed in [140]. It is observed that the hybrid algorithm provides better performance than the regular ES on cluster analysis of benchmark UCI datasets. The ES based partitional clustering has been suitably used for cluster analysis of DNA

microarray database [141].

GP-based approaches: The GP is related to GA, where it automatically generates computer programs based on the Darwinian principle. Each individual computer program is a solution to the optimization problem and is encoded in the form of a tree comprising functions and terminals. The GP has been widely used for the supervised classification problem, and it is reported that the trees generated by GP have the capability to separate regions with a variety of shapes [142–144]. Falco et al. [145,146] developed the partitional clustering algorithm based on GP. The algorithm starts with a population of program trees generated at random. The algorithm determines the optimal number of clusters by selecting a variable number of trees per individual. The user has to provide a parameter that directly influences the number of clusters present in the dataset. The trees undergo fitness evaluation, and those having higher fitness have a higher probability to serve as parents for the next generation. The genetic operators like crossover and mutation are applied on the parent trees to generate offspring. The process continues till a predefined stopping criterion corresponding to the optimal cluster partition is satisfied. Boric et al. [147] modified the GP based partitional clustering with an information theoretic fitness measure which can determine arbitrarily shaped clusters

present in the dataset.

DE-based approaches: The book on Metaheuristic Clustering by Das et al. [2] in 2009 discusses the fundamentals as well as the advances in DE approaches for cluster analysis [150]. In DE based clustering the individual target solutions (which combine to create a population P) are taken as parameter vectors or genomes. Each target vector x_i = [m_{i1}, m_{i2}, …, m_{ik}, …, m_{iK}], where m_{ik} is the centroid of cluster c_{ik} and K represents the number of clusters. Then DE employs the mutation operation to produce a mutant vector v_i. The five most commonly used mutation strategies are

  v_{1,i} = x_{r1,i} + F·(x_{r2,i} − x_{r3,i})
  v_{2,i} = x_{best} + F·(x_{r1,i} − x_{r2,i})
  v_{3,i} = x_i + F·(x_{best,i} − x_i) + F·(x_{r1,i} − x_{r2,i})
  v_{4,i} = x_{best} + F·(x_{r1,i} − x_{r2,i}) + F·(x_{r3,i} − x_{r4,i})
  v_{5,i} = x_{r1,i} + F·(x_{r2,i} − x_{r3,i}) + F·(x_{r4,i} − x_{r5,i})   (3)

where i varies from 1 to P, and r1, r2, r3, r4, r5 are mutually exclusive integers randomly generated within the range [1, P]. The scale factor F is a control parameter used for amplification of the difference vector, and it normally lies in the range [0, 2]. Then a crossover operation is applied to each pair of the target vector x_i and its corresponding mutant vector v_i to obtain a trial vector u_i as

  u_i = v_i  if (rand1 ≤ CR) or (i = i_rand)
  u_i = x_i  otherwise,  ∀ i = 1, 2, …, P   (4)

where rand1 is a random number in [0, 1], the crossover rate CR is a user defined constant in the range [0, 1], and i_rand is a randomly chosen integer in the range [1, P]. The fitness of every target vector x_i and trial vector u_i is evaluated using one of the fitness functions defined in Table 3. Then the population for the next generation is given by

  x_i^{t+1} = u_i^t  if f(u_i^t) ≤ f(x_i^t)
  x_i^{t+1} = x_i^t  otherwise,  ∀ i = 1, 2, …, P   (5)

where t is the generation number. The algorithm runs for a certain number of generations till it converges and the optimum clusters are achieved.
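The DE clustering loop of (3)-(5) can be sketched as follows, using the DE/rand/1 strategy (v_{1,i}), centroids concatenated into each target vector, and the centroid-distance fitness of Table 3. The population size, F, CR, the toy dataset and the per-gene (binomial) variant of the crossover in (4) are illustrative choices, not the survey's prescriptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(Z, x, K):
    """Assign each pattern to its nearest centroid and return the summed
    squared Euclidean distance (centroid distance, F2 of Table 3)."""
    centroids = x.reshape(K, -1)
    d = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.min(axis=1).sum()

def de_cluster(Z, K, P=20, F=0.8, CR=0.9, generations=150):
    """DE/rand/1 clustering: each target vector is K concatenated centroids."""
    N, D = Z.shape
    lo, hi = Z.min(axis=0), Z.max(axis=0)
    pop = rng.uniform(np.tile(lo, K), np.tile(hi, K), size=(P, K * D))
    fit = np.array([fitness(Z, x, K) for x in pop])
    for _ in range(generations):
        for i in range(P):
            choices = [j for j in range(P) if j != i]
            r1, r2, r3 = rng.choice(choices, 3, replace=False)
            v = pop[r1] + F * (pop[r2] - pop[r3])   # mutation, v_{1,i} in (3)
            cross = rng.random(K * D) < CR          # binomial crossover, cf. (4)
            cross[rng.integers(K * D)] = True       # keep at least one mutant gene
            u = np.where(cross, v, pop[i])          # trial vector u_i
            fu = fitness(Z, u, K)
            if fu <= fit[i]:                        # greedy selection, (5)
                pop[i], fit[i] = u, fu
    return pop[fit.argmin()].reshape(K, D)

# Two well-separated toy blobs; DE should place one centroid in each.
Z = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.2],
              [10.0, 10.0], [10.2, 10.1], [10.1, 10.2]])
best = de_cluster(Z, K=2)
```

The greedy selection in (5) guarantees that the best fitness in the population never worsens from one generation to the next, which is why the loop converges toward a good centroid set.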

The benchmark research article on DE based automatic clustering [16] was published in 2008 by Das et al. Prior to that, the DE based framework introduced for partitional clustering by Paterlini and Krink [148,149] is worth mentioning. Further research work by Das et al. deals with hybridization of kernel-based clustering with DE [151] and application of DE based clustering algorithms to image pixel clustering [152]. Subsequently various hybrid algorithms based on DE were developed by several researchers, which include DE-K-means by Kwedlo [153], DE-K-harmonic means by Tian et al. [154], and DE-possibilistic clustering by Hu et al. [71]. These algorithms have been successfully applied to image classification [156], document clustering [157] and node selection in mobile networks [158].

2.2.2. Physical algorithms in partitional clustering

The physical algorithms are inspired by physical processes such as the heating and cooling of materials (Simulated Annealing, given by Kirkpatrick et al. in 1983 [48]), discrete cultural information treated as lying between genetic and cultural evolution (the Memetic algorithm by Moscato [167] in 1989), the harmony of music played by musicians (Harmony Search by Geem et al. [173] in 2001) and the cultural behavior of frogs (the Shuffled frog-leaping algorithm by Eusuff et al. [179] in 2006). These algorithms have been applied to solve the partitional clustering problem as briefly explained in sequence:

Simulated Annealing (SA) based approaches: Selim and Alsultan first developed the SA based partitional clustering in 1991 [159]. Then in 1992 Brown and Huntley [160] applied the SA based partitional clustering algorithm to solve the multi-sensor fusion problem. The clustering algorithm begins with an initial solution 'x' (cluster centroids) at a large initial temperature 'T'. The fitness of the initial solution 'f(x)' (computed with any function from Table 3) represents the internal energy of the system. The heuristic algorithm moves to a new solution 'x′' (selected from the neighborhood of the current state) or remains in the old state 'x' depending upon an acceptance probability function


given by

  P(accept) = exp( (f(x) − f(x′)) / T )   (6)

where f(x) is the energy and T is the temperature of the present state. The probability function P(accept) is large when f(x′) is lower than f(x), which reflects that smaller energy solutions are preferred over those with a greater energy. The temperature 'T' plays a crucial role in controlling the evolution of the state through the cooling process of the system. The algorithm continues either for a fixed number of iterations or until a state with minimum energy is found (the global solution corresponds to the optimal cluster partition).
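A minimal sketch of this loop with the acceptance rule of (6) follows; the neighborhood move (a Gaussian nudge of one centroid coordinate) and the geometric cooling schedule are illustrative assumptions, not details prescribed by [159]:

```python
import math
import random

def sse(Z, centroids):
    """Internal energy f(x): summed squared distance of each pattern to
    its nearest centroid (any function from Table 3 would serve)."""
    return sum(min(sum((zd - cd) ** 2 for zd, cd in zip(z, c))
                   for c in centroids) for z in Z)

def sa_cluster(Z, K, T0=10.0, alpha=0.95, steps=2000):
    """Simulated annealing over cluster centroids: downhill moves are
    always accepted, uphill moves with probability exp((f(x)-f(x'))/T), per (6)."""
    D = len(Z[0])
    x = [list(random.choice(Z)) for _ in range(K)]   # initial solution x
    fx, T = sse(Z, x), T0
    for _ in range(steps):
        x_new = [c[:] for c in x]                    # neighbor of x
        k, d = random.randrange(K), random.randrange(D)
        x_new[k][d] += random.gauss(0.0, 0.5)        # perturb one coordinate
        f_new = sse(Z, x_new)
        if f_new <= fx or random.random() < math.exp((fx - f_new) / T):
            x, fx = x_new, f_new                     # accept the move
        T *= alpha                                   # cooling schedule
    return x, fx
```

Early in the run the high temperature makes uphill moves likely, allowing escape from poor partitions; as T cools, the loop degenerates into pure descent toward a low-energy (low-fitness) clustering.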

The basic SA has been suitably combined with K-means [161] and K-harmonic means [162] to develop hybrid algorithms which provide superior performance in accurately clustering the UCI datasets. A GA and SA based hybrid clustering algorithm is developed in [164] to solve the dynamic topology management and energy conservation problem in mobile ad hoc networks. Lu et al. [165] developed a fast simulated annealing based clustering approach by combining multiple clusterings based on different agreement measures between partitions. Recently the SA based clustering has been applied to group suppliers for effective management and to fulfill the demands of customers (i.e. to build a good supply chain management system) [166].

Memetic Algorithm-based approaches: The recent survey articles by Chen et al. [169,170] highlight the recent advances in the theory and application areas of the memetic algorithm. This algorithm was used by Merz [168] to perform cluster analysis on gene expression profiles using minimization of the sum-of-squares as the fitness measure. It begins with a population which undergoes a global search (exploration of various areas of the search space), combined with individual solution improvement (performed by a local search heuristic to provide local refinements). A balancing mechanism between the local and global searches ensures that the system neither converges prematurely to a local solution nor consumes more computational resources for achieving the

Table 3
Similarity functions f(·) used by the single objective nature inspired metaheuristic algorithms for cluster analysis. Considering dataset Z_{N×D} = {z_{1D}, z_{2D}, …, z_{ND}} to be divided into K clusters with valid partitions C_k as per (1).

Medoid distance
  Explanation: Minimization of the sum of distances between objects and the medoids of the dataset.
  Representation: F1 = Σ_{i=1..N} min_{j∈{1,…,K}} d(z_i, m_j), where the medoids {m_1, m_2, …, m_K} ⊂ Z and d is any distance.
  Used in: Lucasius et al. [121], Castro and Murray [122], Sheng and Liu [107].

Centroid distance
  Explanation: Minimization of the sum of squared Euclidean distances of objects from their respective cluster means.
  Representation: F2 = Σ_{j=1..K} Σ_{z_i∈c_j} ||z_i − μ_j||², where μ_j is the mean of c_j.
  Used in: Maulik and Bandyopadhyay [105], Zhang and Cao [207], Murthy and Chowdhury [103].

Distortion distance
  Explanation: Minimization of intra-cluster diversity.
  Representation: F3 = F2/(N · D).
  Used in: Krishna and Murty [106], Lu et al. [108,109], Franti et al. [124], Kivijarvi et al. [125].

Variance ratio criterion (VRC)
  Explanation: The ratio of the between-cluster (B) and pooled within-cluster (W) covariance matrices. The VRC should be maximized.
  Representation: F4 = VRC = [trace B/(K − 1)] / [trace W/(N − K)].
  Used in: Cowgill et al. [110], Casillas et al. [115].

Intra- and inter-cluster distance
  Explanation: Difference between the inter-cluster and intra-cluster distances.
  Representation: F5 = Σ_{i=1..K} D_inter(c_i) − w · D_intra(c_i), where w is a parameter.
  Used in: Tseng and Yang [111].

Dunn's index
  Explanation: Dunn's index is to be maximized for the optimal partition.
  Representation: F6 = DI(K) = min_{i∈K} { min_{j∈K, j≠i} { δ(c_i, c_j) / max_{k∈K} Δ(c_k) } }, where δ(c_i, c_j) = min{d(z_i, z_j) : z_i ∈ c_i, z_j ∈ c_j}, Δ(c_k) = max{d(z_i, z_j) : z_i, z_j ∈ c_k} and d is the distance.
  Used in: Dunn [293], Zhang and Cao [207].

Davies–Bouldin (DB) index
  Explanation: Ratio of the sum of within-cluster scatter to between-cluster separation. The DB index is to be minimized.
  Representation: F7 = DB(K) = (1/K) Σ_{i=1..K} R_{i,qt}, where R_{i,qt} = max_{j∈K, j≠i} (S_{i,q} + S_{j,q})/d_{ij,t}. The ith cluster scatter is S_{i,q} = [(1/N_i) Σ_{z∈c_i} ||z − μ_i||^q]^{1/q}, where N_i and μ_i are the number of elements and the center of c_i respectively. The separation distance between the ith and jth clusters is d_{ij,t} = [Σ_{d=1..D} |μ_{i,d} − μ_{j,d}|^t]^{1/t}.
  Used in: Davies and Bouldin [291], Cole [123], Das et al. [16], Bandyopadhyay and Maulik [113], Agustin-Blas et al. [136].

CS measure
  Explanation: The CS measure is to be minimized for optimal partitioning.
  Representation: F8 = CS(K) = [Σ_{i=1..K} (1/N_i) Σ_{z_j∈c_i} max_{z_q∈c_i} d(z_j, z_q)] / [Σ_{i=1..K} min_{j∈K, j≠i} d(m_i, m_j)], with centroid m_i = (1/N_i) Σ_{z_j∈c_i} z_j, where N_i is the number of elements in c_i.
  Used in: Chou et al. [292], Das et al. [16].

Silhouette
  Explanation: A higher silhouette indicates a better assignment of elements.
  Representation: F9 = Σ_{i=1..N} S(z_i)/N, where element z_i ∈ A, with A and B among the clusters c_k; S(z_i) = (b(z_i) − a(z_i))/max{a(z_i), b(z_i)}, with silhouette range −1 ≤ S(z_i) ≤ 1. Here a(z_i) is the average dissimilarity of z_i to the other elements of A, and the neighbor dissimilarity b(z_i) = min diss(z_i, B), A ≠ B.
  Used in: Kaufman and Rousseeuw [294], Hruschka et al. [3,118].
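Most of these criteria are straightforward to evaluate directly. As an illustrative sketch (hypothetical code, not from the paper), the widely used centroid distance F2 can be computed as:

```python
import numpy as np

def centroid_distance(Z, labels, K):
    """F2: sum of squared Euclidean distances of objects from their
    respective cluster means (to be minimized)."""
    total = 0.0
    for j in range(K):
        members = Z[labels == j]
        if len(members) == 0:
            continue                      # empty cluster contributes nothing
        mu = members.mean(axis=0)         # cluster mean of c_j
        total += ((members - mu) ** 2).sum()
    return total

# Two well-separated 1-D clusters: each contributes its within-cluster SSE.
Z = np.array([[0.0], [2.0], [10.0], [12.0]])
labels = np.array([0, 0, 1, 1])
print(centroid_distance(Z, labels, 2))    # 4.0
```

The distortion distance F3 follows immediately by dividing this value by N · D.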

S.J. Nanda, G. Panda / Swarm and Evolutionary Computation

Please cite this article as: S.J. Nanda, G. Panda, A survey on nature inspired metaheuristic algorithms for partitional clustering, Swarm and Evolutionary Computation (2014), http://dx.doi.org/10.1016/j.swevo.2013.11.003


    solution. The memetic based partitional clustering algorithm

has been applied for energy efficient clustering of nodes in

    wireless sensor networks   [171]   and segmentation of natural

and remote sensing images [172].

Harmony Search: The Harmony search algorithm became

    popular after Lee and Geem   [174]   applied it for various

    engineering optimization problems. Mahdavi et al. developed

    the Harmony search based partitional algorithm for web page

clustering [175,176]. The algorithm is inspired by the harmony played by the musicians. Here each musician represents a

decision variable which denotes a solution of the problem. The

    musicians try to match harmony with respect to time by

    incorporating variation and improvisations in the pitch played

by him. The variation in pitch is given by x′ = x + PB · ε, where PB is the pitch bandwidth, a user-defined parameter to control the amount of change, and ε is a random number in the range [−1, 1]. This variation is reflected in the form

    of improvement in the cost function to achieve the global

    solution. Mahdavi and Abolhassani have also formulated

    a hybrid Harmony K-means algorithm for document clustering

    [177]. The clustering algorithm   [178]   has been suitably

    applied for designing clustering protocols for wireless sensor

networks.

Shuffled frog-leaping algorithm (SFL): The SFL algorithm mimics

    the nature of frogs in the memeplexes. The algorithm is used to

    solve partitional clustering problem   [180]   and has been

reported to yield better solutions than the ACO, simulated annealing and genetic K-means [106] approaches on several synthetic and

    real life datasets. The initial population consists of a set of frogs

    (solutions) which is grouped into subsets known as meme-

    plexes. The frogs which belong to different memeplexes are

    assumed to be of different cultures and are allowed to perform

local search. So within each memeplex each individual frog

    shares its ideas with other frogs and thus the group evolves

with new ideas (memetic evolution). After a predefined number of steps, the ideas are shared among the memeplexes using a shuffling process. The local (memetic) and global (shuffling) searches continue till the optimal fitness (accurate clusters) is achieved. The clustering algorithm based on SFL has been used for color image segmentation [181] and web's text mining [182].
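The memeplex mechanics described above can be sketched in a few lines (an illustrative minimization example, not the exact formulation of [180]; in the clustering setting each frog would encode K centroid vectors):

```python
import numpy as np

rng = np.random.default_rng(0)

def sfl_generation(frogs, fitness, n_memeplexes):
    """One shuffled frog-leaping generation (sketch): rank the frogs,
    deal them into memeplexes, let each memeplex's worst frog leap
    toward its best frog (local/memetic search), then merge the
    memeplexes back together (the shuffling process)."""
    frogs = sorted(frogs, key=fitness)                  # best first
    memeplexes = [frogs[m::n_memeplexes] for m in range(n_memeplexes)]
    for mem in memeplexes:
        best, worst = mem[0], mem[-1]
        leap = worst + rng.uniform(0, 1) * (best - worst)
        if fitness(leap) < fitness(worst):              # accept improving leaps
            mem[-1] = leap
    return [f for mem in memeplexes for f in mem]       # shuffle
```

Because the best frog is never modified and leaps are only accepted when they improve, the best fitness in the population is non-increasing over generations.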

     2.2.3. Swarm Intelligence algorithms in partitional clustering 

    Swarm intelligence is the group of natural metaheuristics

    inspired by the   ‘collective intelligence’. The collective intelligence

    is built up through a population of homogeneous agents interact-

ing with each other and with their environment. Examples of such intelligence are found among colonies of ants, flocks of birds, schools of fish, etc. The books [59–61] highlight the fundamentals and developments in swarm intelligence algorithms for solving numerous real life optimization problems. The major such algo-

    rithms include: Ant colony optimization (ACO) by Dorigo  [62]  in

    1992, Particle swarm optimization (PSO) by Kennedy and Eberhart

in 1995 [68,69], Artificial bee colony (ABC) algorithm by Karaboga

    and Basturk in 2006 [73], Fish Swarm Algorithm (FSA) by Li et al.

    in 2002 [254,255]. Application of these algorithms to solve parti-

tional clustering problems is outlined in sequence:

ACO-based approaches: The ACO algorithm is inspired by the ants' behavior in determining the optimal path from the nest to the

food source. The algorithm became popular after Dorigo et al.'s work was standardized in IEEE [63–65]. With the progress

of time Dorigo's book on ACO [66] and survey paper [67] are heavily cited by the researchers and scientists in this field.

    The cluster analysis algorithms based on ACO follow either of 

    the two fundamental natures of real life ants.

The first one is based on the ants' foraging behavior for determining the

    food source. Initially ants wander randomly for food in the surround-

ing regions of the nest. An ant's movement is observed by the neighboring ants through the pheromone intensity it lays down while searching for food. Once a food source is found the pheromone intensity of the path increases due to the movement of the ant from source to nest, and other ants, instead of searching at random, follow the trail. With the

    progress in time the pheromone intensity starts to evaporate and

    reduce its attraction. The amount of time taken for an ant to travel to

    food source and back to the nest is directly proportional to the

    quantity of pheromone evaporation. So with time an optimal shortest

    path is achieved to maintain the high pheromone intensity. With this

    concept the cluster analysis is formulated as an optimization problem

    and solved using ACO to obtain the optimal partitions in [183,184]. A

    constrained ACO (C-ACO)   [185]   was proposed to handle arbitrary

    shaped clusters and outliers present in the data. Then adaptive ACO

    was proposed by several researchers   [186–188]   to improve the

    convergence rate and to determine the optimal number of clusters.

    A variant of ACO, known as APC (aggregation pheromone density-

    based clustering) algorithm is proposed by Ghosh et al. [189,190]. The

novelty of APC is the updating of the pheromone matrix, which helps to avoid the convergence of solutions to a local optimum.
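The foraging loop can be caricatured compactly. The sketch below is an illustration under simplifying assumptions (not the exact algorithm of [183,184]): a pheromone value is kept for every (point, cluster) pair, ants sample assignments in proportion to pheromone, and low-scatter solutions deposit more pheromone, so good partitions are reinforced while evaporation forgets poor ones:

```python
import numpy as np

rng = np.random.default_rng(0)

def within_sse(Z, labels, K):
    """Within-cluster sum of squared errors of a candidate partition."""
    return sum(((Z[labels == k] - Z[labels == k].mean(axis=0)) ** 2).sum()
               for k in range(K) if (labels == k).any())

def aco_clustering(Z, K, n_iter=200, rho=0.1):
    """Toy pheromone-trail clustering: sample assignments proportional
    to pheromone, reward low-SSE solutions, evaporate the rest."""
    N = len(Z)
    tau = np.ones((N, K))                 # pheromone per (point, cluster)
    best_labels, best_sse = None, float("inf")
    for _ in range(n_iter):
        p = tau / tau.sum(axis=1, keepdims=True)
        labels = np.array([rng.choice(K, p=row) for row in p])
        sse = within_sse(Z, labels, K)
        if sse < best_sse:
            best_labels, best_sse = labels, sse
        tau *= (1.0 - rho)                              # evaporation
        tau[np.arange(N), labels] += 1.0 / (1.0 + sse)  # deposit
    return best_labels, best_sse
```

On two well-separated groups the reinforced trails quickly settle on the natural partition.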

The second one imitates the ants' behavior of grouping dead

    bodies. Ants work together to deposit more dead bodies in their

    nest and group them with respect to their size. This grouping

property of ants is first coded in the form of an algorithm for data

    clustering (LF algorithm) by Lumer and Faieta  [191]. The basic LF

    algorithm was followed and improved by several researchers

    [192,193]. Yang et al.   [194,195]   proposed the use of multi-ant

    colonies algorithm for clustering. In this concept, the algorithm

    consists of several independent ant colonies (each having a queen

    ant). The moving speed of ants and parameters of the probability

    conversion function in different colonies differ from each other.

    Each colony produces a clustering result in parallel and sends it to

the queen ant agent. A hypergraph model (through queen ants) is used to combine all the parallel colonies.

    Handel et al. published a number of articles on ACO [196–199]

    which are extensively cited by the researchers. They have incor-

    porated robustness in the standard LF algorithm (known as ACA)

    and applied it for document retrieval   [196]. The performance of 

these methods has been compared with that obtained by ant-

    based clustering with K-means, average link and 1DSOM in  [197].

    They have suggested an improved ACO in [198] which incorporates

    adaptive and heterogeneous ants for better exploration of search

    space. In the survey article [199] both the approaches of ACO along

with other swarm based clustering approaches (like the bird flocking algorithm and PSO) have been dealt with. A modified version of ACA

    (known as ACAM) is proposed by Boryczka et al.  [200] which has

been shown to outperform ACA [196] in terms of accuracy (tested with five cluster validation measures). Recently an automatic

    clustering based on ant dynamics is proposed in   [201], which

    can detect arbitrary shape clusters (both convex and/or non-

    convex). Another algorithm known as chaotic ant swarm (CAS)

    proposed by Wan et al. [202] provides optimal partitions irrespec-

    tive of cluster size and density.

    A number of hybrid algorithms based on ants are available in

the literature. Initially Kuo et al. [203] proposed an ant-based K-

    means algorithm, which is subsequently improved by hybridiza-

    tion of ACO, self-organizing maps(SOM) and k-means in   [204].

    Further, Jiang et al. have developed new hybrid clustering algo-

    rithms by combining the ACO with K-harmonic means algorithm

in [205] and the DBSCAN algorithm in [206]. Recently Zhang and Cao

    [207] have suggested a new one by integrating ACO with kernel


    principal component analysis (KPCA). Here the KPCA is applied on

the dataset to compute efficient features and then ant based

    clustering is performed in the feature space (instead of the input

    space). A multiple cluster detection algorithm based on spatial

    scan statistic and ACO is reported in [208]. It is observed that these

    hybrid algorithms exhibit performances superior to that of the

individual algorithms in terms of efficiency and clustering quality.

The ant based clustering algorithms find applications to web mining [209], text mining [188], texture segmentation [210], intrusion detection [211,212], high dimensional data analysis

    [213], long-term electrocardiogram processing   [214]   and gene

    expression data analysis [215].

PSO-based approaches: The PSO is based on the swarming behavior of particles searching for food in a collaborative

    manner. The algorithm has become popular among the

    researchers [70–72] due to its simple form for implementation,

    easier selection of parameters and faster convergence rate.

    The cluster analysis using PSO was proposed by Omran et al.

[219] for image clustering. Then van der Merwe and Engelbrecht

    [220]   applied it for cluster analysis of arbitrary datasets. The

algorithm in its basic form for cluster analysis consists of a swarm in a D-dimensional search space in which each particle's position x_i = [m_i1, m_i2, …, m_ik, …, m_iK] consists of K cluster centroid vectors. The m_ik is the centroid of cluster c_ik. The position of the ith particle is associated with a velocity V_i = [v_i1, v_i2, …, v_iK], where the v_ik are initialized as random numbers in the search range. Then the fitness of the particles is evaluated with a suitable fitness function f(·) defined in Table 2. Based on the fitness values, the best previous positions achieved by the particles represent the local solutions given by P_i = [p_i1, p_i2, …, p_iK]. For the initial run P_i = x_i. The global solution is the best position achieved by the swarm in a generation, given by P_g = [p_g1, p_g2, …, p_gK] at generation t.

    The cluster centroid positions are updated with the velocity and

    position update of the particles given by

v_ik(t+1) = w · v_ik(t) + c1 · r1 · (p_ik(t) − x_ik(t)) + c2 · r2 · (p_g(t) − x_ik(t))   (7)

x_ik(t+1) = x_ik(t) + v_ik(t+1)   (8)

    where   r 1  and   r 2   represent random numbers between [0, 1],   w   is

    the inertia weight which is taken as 0.4. The   c 1   and   c 2   are

acceleration constants taken as 2.05. The update process continues till the number of data points which belong to each cluster remains constant for certain generations.
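The update in Eqs. (7) and (8) can be sketched as follows (hypothetical code using the constants quoted above, with each particle's K centroids flattened into one vector):

```python
import numpy as np

rng = np.random.default_rng(0)

w, c1, c2 = 0.4, 2.05, 2.05   # inertia weight and acceleration constants

def pso_update(x, v, p_local, p_global):
    """One velocity/position update per Eqs. (7)-(8)."""
    r1, r2 = rng.random(x.shape), rng.random(x.shape)   # r1, r2 in [0, 1]
    v_new = w * v + c1 * r1 * (p_local - x) + c2 * r2 * (p_global - x)
    return x + v_new, v_new

# With the particle already at both best positions, only inertia remains.
x, v = np.zeros(4), np.ones(4)
x_new, v_new = pso_update(x, v, x.copy(), x.copy())
print(v_new)   # [0.4 0.4 0.4 0.4]
```

The cognitive term pulls the particle toward its own best partition, while the social term pulls it toward the swarm's best partition.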

    A number of variants of PSO based clustering algorithms have

    been reported by researchers in the last couple of years. Cohen and

    Castro   [221]   have proposed a particle swarm clustering (PSC)

algorithm in which the particle's velocity update is influenced by the particle's previous position along with a cognitive term, social term and self-organizing term. These terms are helpful to guide the particle for better solutions and to avoid local stagnation.

    A combinatorial particle swarm optimization (CPSO) based parti-

    tional clustering is proposed in   [222]   for solving multi-mode

    resource-constrained project scheduling problem. Chuang et al.

    [223]   developed a chaotic PSO which replaces the convergence

    parameters like  w,  c 1,  c 2,  r 1,  r 2  with chaotic operators. These new

operators incorporate ergodic, irregular, and stochastic properties

    of chaos in PSO to improve its convergence. A selective particle

    regeneration based PSO (SRPSO) and a combination of it with K-

    means (KSRPSO) are proposed in  [224]   for partitional clustering.

    Both algorithms provide faster convergence than PSO and

    K-means, due to particle regeneration operation that enables

    better exploration of search space. Sun et al.   [225]   proposed a

    quantum-behaved PSO (QPSO) algorithm for cluster analysis of 

    gene expression database. Recently a new PSO based partitional

    clustering algorithm is developed by Cura et al.   [226]  to handle

    unknown number of clusters.

    The hybrid algorithm based on K-means and PSO is proposed

by van der Merwe and Engelbrecht [220] in 2003. The PSO has

    been suitably combined with K-harmonic means  [227] and rough

    set theory   [228]   to produce hybrid algorithms for partitional

    clustering. Du et al. have formulated a DK algorithm   [229]   by

hybridizing the particle-pair optimizer (PPO) algorithm (a variation on the traditional PSO) with K-means for microarray data cluster

    analysis. The DK algorithm is reported to be more accurate and

robust than the K-means and Fuzzy K-means (FKM) algorithms. Zhang et al. [230] combined PSO with possibilistic C-means (PCM) for image segmentation which provides superior performance than the fuzzy C-means (FCM) algorithm. Another efficient approach based

    on PSO, ACO and K-means for cluster analysis is reported in [231].

    Recently several researchers have produced new hybrid evolu-

    tionary clustering algorithms by suitably combining PSO with

    differential evolution   [232], genetic algorithm   [233], immune

    algorithms [234,235] and simulated annealing [236]. These hybrid

    algorithms provide superior performance than the individual

traditional evolutionary algorithms in terms of efficiency, robust-

    ness and clustering accuracy.

    The PSO based clustering algorithms have been effectively used

    in several real life applications including node clustering in

    wireless sensor network (WSN) to enhance lifetime of sensors

    and coverage area   [17], energy balanced cluster routing in WSN

    [237], clustering in mobile ad hoc networks to determine the

    cluster heads which becomes responsible for aggregating the

    topology information [238], cluster analysis of stock market data

    for portfolio management   [27], grouping for security assessment

    in power systems [239], gene expression data analysis [240], color

    image segmentation   [241], clustering for manufacturing cell

    design [242], image clustering  [243], document clustering   [244],

    cluster analysis of web usage data   [245]   and network anomaly

    detection [246].

ABC-based approaches: The ABC algorithm mimics the foraging behavior of the honey bee swarm. The algorithm has become

popular after a sequence of publications made by Karaboga

    et al.   [74–77]. Recently the ABC algorithm is used for cluster

    analysis by several researchers like Zhang et al. [247], Zou et al.

    [248], Fathian et al.  [249] and Karaboga et al.  [250].

    The clustering algorithm based on ABC begins with initialization of 

    bee population with randomly selected cluster centroids in the

    dataset. The initial population is categorized into two parts: employed

    bees and the onlookers. The employed bees are always associated with

    a food source. The food source represents the quality of the solution

(in terms of fitness) to the problem and is to be optimized. An employed bee modifies its position (i.e. determines a new food source) depending upon local information and the fitness value of the new source. If the fitness value of the new source is better than the previous one, then the employed bee memorizes the new position and forgets the old one.

    After all employed bees complete the search, they share the informa-

    tion on food sources and their position with the onlooker bees on the

    dance area. Then the onlooker bees are assigned as employed bees

based on a probability which is related to the fitness of the food

    source. These bees now update their position and share their

    information. Every bee colony has scout bees which do random search

in the environment surrounding the nest to discover new food

    sources. This process is helpful for exploration in the search space

    and to avoid the solutions being trapped into a local food source

    (optima). The clustering algorithm based on ABC has been suitably

    applied for solving network routings  [251]   and sensor deployment

    problems   [252]   in wireless sensor networks. Recently a hybrid


    clustering algorithm HABC is proposed by Yan et al.   [253]   by

incorporating the crossover operation of GA in ABC, which provides performance superior to that obtained by each of the PSO, CPSO, GA, ABC and K-means algorithms.
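The employed-bee phase described above can be sketched as follows (an illustrative minimization example, not tied to any particular reference; here lower fitness means a better food source):

```python
import numpy as np

rng = np.random.default_rng(0)

def employed_bee_step(foods, fitness):
    """Employed-bee phase (sketch): each bee perturbs its food source
    relative to a randomly chosen partner and keeps the new position
    only if the fitness improves (greedy selection)."""
    n = len(foods)
    for i in range(n):
        k = rng.choice([j for j in range(n) if j != i])   # random partner
        phi = rng.uniform(-1, 1, size=foods[i].shape)
        candidate = foods[i] + phi * (foods[i] - foods[k])
        if fitness(candidate) < fitness(foods[i]):        # greedy selection
            foods[i] = candidate
    return foods
```

The greedy acceptance rule guarantees that no food source ever worsens, so the best solution in the colony is non-increasing in fitness.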

Fish Swarm Algorithm (FSA): The FSA algorithm is derived from the schooling behavior of fish. Cheng et al. [256] applied the FSA for cluster analysis. The algorithm operates by mimicking three important behaviors of natural fish: searching behavior (the tendency of fish to look for food), swarming behavior (fish assemble in swarms to minimize danger) and following behavior (when a fish identifies a food source, its neighboring individuals follow based on the fish's visual power). Tsai and Lin [257] have reported improved solutions provided by FSA com-

    pared to PSO for several optimization problems.

     2.2.4. Bio-inspired algorithms in partitional clustering 

Bio-inspired, short for biologically inspired, algorithms comprise natural metaheuristics derived from living phenomena and the behavior of biological organisms. The intelligence derived with bio-inspired algorithms is decentralized, distributed, self-organizing and adaptive in nature (under uncertain environments). The major algorithms in this field include Artificial immune systems (AIS) [78–83], Bacterial foraging optimization

    (BFO)   [84–86], Dendritic cell algorithm   [87,88]   and Krill herd

algorithm [356]. The usage of these algorithms to efficiently solve the partitional clustering problem is highlighted for each case:

AIS-based approaches: The books by Dasgupta [78], Castro and Timmis [79] provide the fundamental concepts on artificial immune systems for computing and their potential applications.

    The four core models developed by mimicking the principle of 

    biological immune system include: negative selection algorithm,

    clonal selection algorithm, immune network model and danger

theory. Among these four, the clonal selection principle by Castro and Zuben [80] has become popular for machine learning and optimization purposes. The recent articles by Dasgupta et al. [81,82] and thesis by Nanda [83] highlight the

    major advances in the theory and applications of AIS.

    Initially Nasraoui et al. [258] developed an AIS based model for

    dynamic unsupervised learning. Then the clonal selection algo-

    rithm   [259,260]   has been effectively used for cluster analysis.

    In this algorithm the immune cells (they combine to form a

    population which is responsible to protect the body against

infection) are initialized with K cluster centroid vectors. When an antigen (foreign element) invades the body, the antibodies (immune cells) that recognize these antigens survive (based on the best fitness value). These immune cells undergo clonal reproduction (new immune cells are produced which are copies of efficient parent cells). Then a portion of the cloned population undergoes a mutation mechanism (somatic hypermutation). The mutation mechanism is responsible for diversifying the solutions in the search space, thus avoiding the cells being trapped in the local optima. The best cells among the mutated and cloned ones are kept as the parents for the next generation. The algorithm runs for a fixed number of generations (user defined) till convergence is achieved and an optimal number of clusters is obtained.
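The clone–hypermutate–select loop can be sketched as follows (a hypothetical minimization example; in the clustering setting each cell would encode K centroid vectors):

```python
import numpy as np

rng = np.random.default_rng(0)

def clonal_selection_step(cells, fitness, n_clones=5, mutation=0.5):
    """One clonal-selection generation (sketch): clone each immune
    cell, apply somatic hypermutation to the clones to diversify the
    search, and keep the best member of each cell's family
    (lower fitness is better)."""
    next_gen = []
    for cell in cells:
        clones = [cell + rng.normal(0.0, mutation, size=cell.shape)
                  for _ in range(n_clones)]        # hypermutated clones
        next_gen.append(min([cell] + clones, key=fitness))
    return next_gen
```

Since the parent is always kept in its own family, each cell's fitness never worsens while the mutated clones explore the surrounding search space.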

Li and Tan [261] first developed the hybrid clustering algorithm

    based on AIS by combining it with support vector machine (SVM).

    Then an immune K-means algorithm is developed in  [262] which

    is based on the negative selection principle. Nanda et al.   [234]

    developed an Immunized PSO (IPSO) algorithm in which the global

    best particle is cloned and mutated after the velocity and position

update to enhance the particle's search in a focused manner. In a

    recent work the IPSO has been suitably employed for partitional

    clustering task [234]. Graaff and Engelbrecht [263] initially devel-

    oped a local network neighborhood clustering method based on

    AIS. Later on they have formulated the immune based algorithm

    for cluster analysis under uncertain environments [264].

BFO-based approaches: Passino [84] proposed the bacterial foraging optimization (BFO) algorithm in 2002, which imitates the foraging strategies of E. coli bacteria for finding food. An E. coli bacterium can search for food in its surroundings by two

    types of movements: run or tumble. These movements are

possible with the help of flagella (singular, flagellum) that enable the bacterium to swim. If the flagella move counterclockwise, their effects accumulate in the form of a bundle which pushes the bacterium to move forward in one direction (run). When the flagella rotate clockwise, each flagellum separates itself from the others and the bacterium tumbles (it does not have any

    set direction for movement and there is almost no displace-

    ment). The bacterium alternates between these two modes of 

    operation throughout its entire lifetime. After the initial devel-

    opment by Passino the algorithm gradually has become popular

due to its capability to provide good solutions in dynamic [85]

    and multi-modal [86] environments.

The literature review indicates that this algorithm has recently been applied to cluster analysis [265,266]. The basic clustering

    algorithm based upon the BFO consists of four fundamental steps:

chemotaxis, swarming, reproduction, and elimination–dispersal.

    The initial solution space is created by assigning the bacteria

    positions as the randomly chosen cluster centroids in the dataset.

Then the chemotaxis process defines the movement of bacteria, which represents either a tumble followed by a tumble or a tumble

    followed by a run. The detailed mathematical expression in

    chemotaxis for the movement of bacteria (i.e. cluster head) is

defined in [84]. The swarming operation represents the cell-to-cell

signaling scheme of bacteria via an attractant. The clustering task can also be performed satisfactorily without the swarming scheme (which involves high computational complexity and is thus eliminated in [267]). After performing a fixed number of chemo-

    taxis loops the reproduction is carried out where the population is

sorted with respect to the fitness value. The first half of the bacteria

    is retained and the second half (i.e. least healthy bacteria) is

    allowed to die. Each of the healthiest bacteria splits into two

    bacteria, which are placed at the same location. In order to prevent

    the bacteria from being trapped into local optima the elimination

    and dispersal phases are carried out. Here a bacterium is chosen

    according to a preset probability and is allowed to disperse

(i.e. move to another random position). The dispersal at times

    becomes useful as it may place bacteria near good food sources (i.e.

    optimal cluster partitions). The BFO based clustering algorithm has

been successfully applied to deploying sensor nodes in wireless sensor networks to enhance the coverage and connectivity [268].
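The tumble-and-run movement can be sketched as follows (an illustrative fragment under the assumption that lower fitness is better; in the clustering setting the position would encode candidate cluster centroids):

```python
import numpy as np

rng = np.random.default_rng(0)

def chemotaxis(pos, fitness, step=0.1, max_swim=4):
    """One chemotactic move of a bacterium (sketch): tumble to a
    random unit direction, then keep running in that direction
    while the fitness keeps improving."""
    direction = rng.normal(size=pos.shape)
    direction /= np.linalg.norm(direction)     # tumble: new random heading
    best = fitness(pos)
    for _ in range(max_swim):                  # run while improving
        trial = pos + step * direction
        f_trial = fitness(trial)
        if f_trial < best:
            pos, best = trial, f_trial
        else:
            break
    return pos
```

Since a step is only taken when it improves the fitness, repeated chemotactic moves never worsen a bacterium's solution.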

     2.2.5. Other nature inspired metaheuristics for partitional clustering 

Cat Swarm Optimization (CSO) – The CSO algorithm is proposed by Chu and Tsai [269,270] by observing the natural hunting

    skill of cats. Santosa et al.  [271] used CSO based clustering to

    classify benchmark UCI datasets   [354]. The algorithm deter-

mines the optimal solution based on two modes of operation of cats: seeking mode (represents a global search technique which mimics the resting position of cats with slow movement) and tracing mode (a local search technique which reflects the rapid

    chase of cat behind the target). Recently Pradhan et al.   [20]


    applied the multi-objective CSO algorithm for optimal deploy-

ment of sensor nodes in wireless sensor networks.

Cuckoo Search Algorithm – The cuckoo search algorithm is

    developed by Yang and Deb   [272]   in 2009. The algorithm

    mimics the breeding behavior of cuckoos (to lay their eggs

in the nests of other birds). The three basic operations are: (i) every cuckoo lays one egg at a time, and dumps its egg in a randomly selected nest in the environment; (ii) the nests with good quality eggs will remain for the next generations; (iii) the number of host bird nests is fixed, and the egg laid by a cuckoo is identified by the host bird depending on a probability in the range [0, 1] (in such a situation, the host bird can

    either destroy the egg or destroy the present nest and build a

    new one). Goel et al.  [274] have formulated the cuckoo search

    based clustering algorithm and applied it for extraction of 

    water body information from remote sensing satellite images.   Fire  y algorithm  –  The algorithm is proposed by Yang [275–277]

by observing the rhythmic flashes of fireflies. Senthilnath et al. [278] applied the algorithm for cluster analysis of UCI datasets. The algorithm follows three rules based upon the glowing nature of fireflies: (i) all fireflies are unisex and each firefly is attracted towards other fireflies regardless of their sex; (ii) the attraction is proportional to their brightness, so between any two flashing fireflies the less bright one moves towards the brighter one; since attraction is proportional to brightness, both decrease as the distance between fireflies increases, and if no firefly in the surroundings is brighter than a particular firefly, that firefly moves randomly; (iii) the brightness of a firefly is determined by the nature of the objective function. At the beginning of the clustering algorithm all the fireflies are randomly dispersed across the entire search space. The algorithm then determines the optimal partitions in two phases: (i) variation of light intensity: the brightness of a firefly at its current position is reflected in its fitness value; (ii) movement towards an attractive firefly: a firefly changes its position by observing the light intensity of adjacent fireflies. Hassanzadeh et al. [279] have successfully applied the firefly clustering algorithm for image segmentation.

• Invasive Weed Optimization Algorithm (IWO) – The IWO was

proposed by Mehrabian and Lucas [280] following the colonization of weeds. The weeds reproduce through their seeds, which spread over a given area and grow into new plants in order to find the optimized position. An automatic clustering algorithm based upon IWO was formulated by Chowdhury et al. [281]. The algorithm is based upon four basic steps: (i) initialization of the weeds in the whole search space; (ii) reproduction of the weeds; (iii) distribution of the seeds; (iv) competitive exclusion of the weeds (fitter weeds produce more seeds). Su et al. [282] applied the algorithm for image clustering. The multi-objective IWO was proposed by Kundu et al. [283] and has recently been applied for cluster analysis by Liu et al. [284].
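The four steps above can be sketched in a few lines. This is only an illustrative sketch, not the formulation of [281]: it assumes a weed encodes a candidate solution (e.g. flattened cluster centroids) as a list of floats, that higher fitness is better, and the helper names (`num_seeds`, `iwo_step`) and parameter defaults are invented for illustration.

```python
import random

def num_seeds(fitness, f_min, f_max, s_min=0, s_max=5):
    """IWO reproduction rule: a weed's seed count grows linearly with its
    fitness, from s_min (worst weed) to s_max (best weed)."""
    if f_max == f_min:
        return s_max
    ratio = (fitness - f_min) / (f_max - f_min)
    return int(s_min + ratio * (s_max - s_min))

def iwo_step(population, fitness_fn, sigma, max_pop=20):
    """One IWO generation: reproduction, normal seed dispersal with standard
    deviation sigma, and competitive exclusion down to max_pop weeds."""
    fits = [fitness_fn(w) for w in population]
    f_min, f_max = min(fits), max(fits)
    offspring = []
    for weed, fit in zip(population, fits):
        for _ in range(num_seeds(fit, f_min, f_max)):
            # each seed lands near its parent with spread sigma
            offspring.append([x + random.gauss(0.0, sigma) for x in weed])
    everyone = population + offspring
    everyone.sort(key=fitness_fn, reverse=True)   # fitter weeds survive
    return everyone[:max_pop]
```

Calling `iwo_step` repeatedly with a slowly decreasing `sigma` reproduces the usual IWO schedule in which seed dispersal shrinks over the iterations.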

• Gravitational Search Algorithm (GSA) – Rashedi et al. [285,286] proposed the GSA following Newton's law of gravity, which states that 'every particle in the universe attracts every other particle with a force that is directly proportional to the product of their masses and inversely proportional to the square of the distance between them'. The algorithm was used for cluster analysis by Hatamlou et al. [287]. Recently Yin et al. [288] developed a hybrid algorithm based on K-harmonic means and GSA.
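As a concrete reading of the law quoted above, the sketch below computes the pairwise attraction between two agents. It is illustrative only; the actual GSA update of [285] differs in detail (masses are derived from fitness, and the denominator uses R + ε rather than R²), and the function names are ours.

```python
def newton_force(m1, m2, r, G=1.0):
    """Magnitude of the attraction: proportional to the product of the
    masses, inversely proportional to the squared distance."""
    return G * m1 * m2 / r ** 2

def attraction(xi, xj, mi, mj, G=1.0, eps=1e-12):
    """Force vector on agent i exerted by agent j (points from xi to xj)."""
    diff = [b - a for a, b in zip(xi, xj)]
    r = sum(d * d for d in diff) ** 0.5
    mag = G * mi * mj / (r * r + eps)           # inverse-square law
    return [mag * d / (r + eps) for d in diff]  # scale the unit direction
```

Summing `attraction` over all other agents gives the net force that drives an agent's acceleration in a GSA-style search.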

2.3. Fitness functions for partitional clustering

The similarity function f(·) described in (2) plays a major role in effectively partitioning the dataset. It represents a mathematical function that quantifies the goodness of a partition based on the similarity between the patterns present in it. Various fitness functions used by the nature inspired metaheuristic algorithms for partitional clustering are listed in Table 3.
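One of the most common of these choices, the within-cluster sum of squared errors, can be written directly. The sketch below is illustrative (Table 3 lists many alternatives) and the function name `sse_fitness` is ours.

```python
def sse_fitness(data, centroids):
    """Sum of squared Euclidean distances of each pattern to its nearest
    centroid -- a widely used fitness function for a candidate partition
    encoded by its centroids (lower is better)."""
    total = 0.0
    for x in data:
        # squared distance to the closest centroid
        d2 = min(sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in centroids)
        total += d2
    return total
```

A metaheuristic then evolves candidate centroid sets so as to minimize this value over the dataset.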

2.4. Cluster validity indices

The cluster validity indices represent statistical functions used for quantitative evaluation of the clusters derived from a dataset. The objective is to determine the significance of the cluster structure disclosed by a clustering algorithm. In a recent review article [289] Xu et al. compared the performance of eight major validity indices used by swarm-intelligence-based clustering on synthetic and benchmark UCI datasets. Arbelaitz et al. [372] have demonstrated the use of 30 cluster validity indices on 720 synthetic and 20 real datasets. The books by Gan et al. [95], Berkhin [96] and Maulik et al. [322] present the validity indices used by the evolutionary clustering algorithms. Some popular validity indices like the DB index, Dunn index, CS measure and Silhouette are also used as fitness functions by several researchers (the details are listed in Table 3). Other validity indices used in the bio-inspired clustering literature include the CH index [290,299], I index [112], Rand index [95], Jaccard coefficient [95], Fowlkes and Mallows index [1], Hubert's Γ statistic [95], SD index [298], S_Dbw index [295,296], root-mean-square standard deviation index [295,296], RS index [297], PBM index [300] and SV index [301]. Gurrutxaga et al. [302] suggested a standard methodology to evaluate internal cluster validity indices. Recently Saha and Bandyopadhyay [303] have proposed connectivity based measures to improve the performance of the standard cluster validity indices used by bio-inspired clustering techniques.
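As an example of how such an index is computed, here is a minimal Davies-Bouldin (DB) index for a hard partition given as lists of points. This is an illustrative sketch; the decomposition into helpers is ours.

```python
def db_index(clusters):
    """Davies-Bouldin index for a hard partition given as a list of clusters
    (each cluster a list of points). Lower means compact, well separated."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    # centroid and average within-cluster scatter of every cluster
    cents = [[sum(col) / len(c) for col in zip(*c)] for c in clusters]
    scat = [sum(dist(p, m) for p in c) / len(c) for c, m in zip(clusters, cents)]
    k = len(clusters)
    # for each cluster take its worst (largest) similarity ratio, then average
    return sum(max((scat[i] + scat[j]) / dist(cents[i], cents[j])
                   for j in range(k) if j != i)
               for i in range(k)) / k
```

Because the index rewards small within-cluster scatter relative to between-centroid separation, it can serve either as a post-hoc validity check or directly as a fitness function, as noted above.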

3. Multi-objective algorithms for flexible clustering

A recent survey article by Zhou et al. [304] highlights the basic principles, advancements and applications of multi-objective algorithms to several real world optimization problems. These algorithms are preferred over their single objective counterparts as they incorporate additional knowledge, in terms of objective functions, to achieve the optimal solution. In the last decade researchers have developed many nature inspired multi-objective algorithms, which include the non-dominated sorting GA (NSGA-II) [305,306], Pareto envelope-based selection algorithm (PESA-II) [325], Strength Pareto Evolutionary Algorithm (SPEA) [326], and Voronoi Initialised Evolutionary Nearest-Neighbour Algorithm (VIENNA) [327]. Along with these, other major nature inspired multi-objective algorithms are listed in Table 1. The recent book by Maulik et al. [322] highlights the overview and applicability of these multi-objective algorithms for partitional clustering.

3.1. Problem formulation

The partitional clustering problem can be formulated as a multi-objective problem by simultaneously minimizing M objective functions:

    min_{k ∈ K} F(k) = min [ f_1(k), f_2(k), …, f_M(k) ]        (9)

where K is the set of feasible clusterings derived from the dataset Z_{N×D}. In multi-objective clustering, instead of a single solution (the cluster partition achieved by a single objective algorithm), a group of optimal solutions is obtained (known as Pareto optimal solutions) by a suitable combination of the different objective functions. Each Pareto optimal solution is better than the others in terms of some of the objective functions, and the solutions are therefore known as


non-dominated solutions [305]. The pictorial representation of the Pareto optimal solutions with respect to the objective functions is known as the Pareto optimal front [306].
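The dominance relation underlying Eq. (9) is easy to state in code. The sketch below (function names ours) filters a set of objective vectors, all to be minimized, down to its non-dominated subset, i.e. an empirical Pareto front.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives
    minimized, as in Eq. (9)): no component worse, at least one better."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(solutions):
    """Return the non-dominated subset of a list of objective vectors."""
    return [s for s in solutions
            if not any(dominates(t, s) for t in solutions if t is not s)]
```

In a multi-objective clustering run, each candidate partition is scored on the M objectives and only the non-dominated partitions are retained in the archive.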

3.2. Historical development in multi-objective algorithms for partitional clustering

The survey paper by Bong and Rajeswari [323] reports that the design, development and use of multi-objective bio-inspired algorithms for clustering and classification problems increased exponentially from 2006 to 2010. Research in the area of bio-inspired multi-objective clustering gathered strength after the work on MOCK (Multi-objective clustering with automatic K determination) by Handl and Knowles [330], published in 2007. Prior to that, Corne et al. developed the Pareto envelope-based selection algorithm (PESA) [324] and PESA-II [325] to solve the partitional clustering problem. Then the work by Handl and Knowles on VIENNA (Voronoi Initialised Evolutionary Nearest-Neighbour Algorithm) [327], multi-objective clustering with automatic determination of the number of clusters [328] and improvements in its scalability [329] drew the attention of many evolutionary computing researchers. These articles are considered to have served as the backbone for the development of MOCK [330].

In the same year as MOCK [330], Bandyopadhyay et al. [331] reported multi-objective clustering based on NSGA-II [306] and applied it for classification of remote sensing images. The NSGA-II based multi-objective clustering has recently been applied for MR brain image segmentation in [332]. Santosh et al. [333] have proposed a multi-ant-colony based multi-objective clustering that can effectively group distributed data. Here each colony works in parallel over the same dataset, and the simultaneous optimization of two objectives provides better solutions than those achieved when the individual objectives are optimized separately.

An immune-inspired algorithm to solve multi-objective clustering was initially proposed in [334] to classify the benchmark UCI datasets. Then Ma et al. [335] developed the immunodominance and clonal selection inspired multi-objective clustering for classifying handwritten digits. The immune multi-objective clustering has been suitably applied for SAR image segmentation [336]. Recently Gou et al. [370] have reported the development of a multi-elitist immune clonal quantum clustering algorithm.

An automatic kernel clustering using multi-elitist PSO was proposed by Das et al. in [337]. Paoli et al. [338] have formulated MOPSO based clustering for grouping hyperspectral images. Recently MOPSO has been applied for energy-efficient clustering in mobile ad hoc networks [339].

A simulated annealing based multi-objective clustering algorithm which uses a symmetry distance is reported by Saha and Bandyopadhyay [340,341]. A scatter tabu search algorithm is used for multi-objective clustering problems in [342]. Suresh et al. [343] have proposed a multi-objective differential evolution based automatic clustering for micro-array data analysis. The multi-objective invasive weed optimization (MOIWO) has recently been applied for cluster analysis by Liu et al. [284].

A clustering ensemble developed by Faceli et al. [344] deals with the generation of multiple partitions of the same data. By combining the resulting partitions, a user can obtain a good data partitioning even though the original output clusters are not compact and well separated. Ripon and Siddique have proposed an evolutionary multi-objective tool for the detection of overlapping clusters.

3.3. Evaluation methods

Handl and Knowles [346] initially described the cluster validity indices for multi-objective bio-inspired clustering. Then Brusco and Steinley [347] reported the cross validation issues in multi-objective clustering.

Recently the use of parametric and nonparametric statistical tests has become popular among evolutionary researchers. Usually these tests are carried out to decide whether one evolutionary algorithm can be considered better than another [348]. These tests can therefore be applied to evaluate the performance of new multi-objective clustering algorithms. The parametric tests described by Garcia et al. [349] are popular; the authors selected 14 UCI datasets to compare the performance of five evolutionary algorithms used for classification. They used the Wilcoxon signed-ranks test to evaluate the performance, with classification rate and Cohen's kappa as accuracy measures. However, the parametric tests are based upon the assumptions of independence, normality, and homoscedasticity, which at times are not satisfied in multi-problem analysis. Under such situations a nonparametric test is preferable. The papers by Derrac et al. [348] and Garcia et al. [350] clearly highlight the significance of nonparametric tests, which can perform two classes of analysis: pairwise comparisons and multiple comparisons. The pairwise comparisons include the Sign test, Wilcoxon test, Multiple sign test, and Friedman test. The multiple comparisons consist of Friedman Aligned ranks, the Quade test, and Contrast Estimation. The books [352,353] and the statistical toolbox in MATLAB [351] are helpful in implementing these statistical tests.
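To give a flavor of the pairwise comparisons above, a two-sided exact sign test fits in a few lines of stdlib Python. The accuracy figures are made up purely for illustration and the function name is ours.

```python
from math import comb

def sign_test(a, b):
    """Two-sided exact sign test for paired samples: counts the wins of a
    over b and computes the binomial tail probability under H0: P(win)=0.5."""
    wins = sum(x > y for x, y in zip(a, b))
    losses = sum(x < y for x, y in zip(a, b))
    n = wins + losses                      # ties are discarded
    k = min(wins, losses)
    p = 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return wins, losses, min(p, 1.0)

# Hypothetical accuracies (%) of two clustering algorithms on ten datasets.
alg_a = [91.2, 88.5, 76.4, 95.1, 83.3, 90.0, 71.8, 86.6, 79.9, 92.4]
alg_b = [89.0, 87.9, 74.2, 94.8, 80.1, 88.7, 70.5, 85.9, 78.2, 91.6]
wins, losses, p = sign_test(alg_a, alg_b)
print(f"wins={wins}, losses={losses}, p={p:.5f}")
```

The Wilcoxon test additionally ranks the magnitudes of the paired differences, which makes it more powerful than the sign test on the same data.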

4. Real life application areas of nature inspired metaheuristics based partitional clustering

The nature inspired partitional clustering algorithms have been successfully applied to diversified areas of engineering and science. Many researchers have employed the benchmark UCI datasets to validate the performance of nature inspired clustering algorithms. Some popular UCI datasets and their use in the corresponding algorithms are listed in Table 4. The major applications in the nature inspired clustering literature and the corresponding authors are shown in Table 5. Beyond Table 5, further application areas include character recognition [10,335], the traveling salesman problem [91], blind channel equalizer design [21], human action classification [22,363], book clustering [32], texture segmentation [210], tourism market segmentation [371], analysis of gene expression patterns [365], electrocardiogram processing [214], security assessment in power systems [239], manufacturing cell design [242], clustering of sensor nodes [362], and identification of clusters for accurate analysis of seismic catalogs [364].

5. Conclusion

This paper provides an up-to-date review of nature inspired metaheuristic algorithms for partitional clustering. It is observed that the traditional gradient based partitional algorithms are computationally simpler but often provide inaccurate results, as the solution gets trapped in local minima. The nature inspired metaheuristics explore the entire search space with the population involved and ensure that an optimal partition is achieved. Further, single objective algorithms provide one optimal solution, whereas the multi-objective algorithms provide the flexibility to select the desired solution from a set of optimal solutions. The promising solutions of automatic clustering are very helpful as they do not need a priori information about the number of clusters present in the dataset. It is important to note that although numerous clustering algorithms have been published considering various practical aspects, no single clustering algorithm has been shown to dominate the rest across all application areas.


6. Future research issues

The field of nature inspired partitional clustering is relatively young and is emerging with new concepts and applications. Among the new research directions in this field that need investigation are:

• In order to solve any partitional clustering problem, the success of a particular nature inspired metaheuristic algorithm in achieving the optimal partition depends on its design environment (i.e. encoding scheme, operators, set of parameters, etc.). So for a given complex problem the design choices should be theoretically analyzed before simulation and implementation.

Table 5
Real life application areas of nature inspired metaheuristic based partitional clustering.

Image segmentation: GA – Feng et al. [130]; PSO – Lee et al. [241], Abraham et al. [4], Zhang et al. [230]; ACO – Ghosh et al. [190]; DE – Das et al. [2]; Review – Jain et al. [10]; NSGA-II – Mukhopadhyay et al. [332], Bandyopadhyay et al. [331]; MOCLONAL – Yang et al. [336]; Multi-objective review – Bong and Rajeswari [323]

Image clustering: GA – Bandyopadhyay et al. [113]; DE – Das et al. [151,152], Omran et al. [156]; PSO – Omran et al. [219,243]; NSGA-II – Bandyopadhyay et al. [331]

Document clustering: GA – Casillas et al. [115], Kuo and Lin [129]; PSO – Cui et al. [244]; ACO – Yang et al. [194], Handl and Meyer [196]; DE – Abraham et al. [157]; Review – Steinbach et al. [35], Andrews et al. [33], Jain et al. [34]

Web mining: ACO – Labroche et al. [209], Abraham and Ramos [216]; PSO – Alam et al. [245]

Text mining: ACO – Handl and Meyer [196], Vizine et al. [218]; SA – Chang [163]

Clustering in wireless sensor networks: GA – Tan et al. [137]; ABC – Karaboga et al. [251], Udgata et al. [252]; PSO – Yu et al. [237]; BFO – Gaba et al. [268]; MOCSO – Pradhan and Panda [20]; Review – O. Younis et al. [17], M. Younis et al. [18], Kumarawadu et al. [19]

Clustering in mobile networks: ACO – Merkle et al. [217]; PSO – Ji et al. [238], Ali et al. [339]; DE – Chakraborty et al. [158]; SA – W. Jin et al. [164]

Gene expression data analysis: GA – Lu et al. [109], Ma et al. [117]; ACO – He and Hui [215]; DE – Das et al. [2]; PSO – Sun et al. [225], Du et al. [229], Thangavel et al. [236]; AIS – Lie et al. [260]; Review – Jiang et al. [30], Lukashin et al. [31], Xu and Wunsch [91], Hruschka et al. [3,119,120], Jain et al. [34]; MODE – Suresh et al. [343]

Intrusion detection: GA – Liu et al. [116]; ACO – Ramos and Abraham [211], Tsang and Kwong [212]; PSO – Lima et al. [246]

Computational finance: Review – MacGregor et al. [24], Brabazon et al. [25], Amendola et al. [26], Nanda et al. [27]

Large datasets analysis: GA – Franti et al. [124], Lucasius et al. [121]; ACO – Chen et al. [213]; Evolutionary algorithm NOCEA – Sarafis et al. [29]

Geological data analysis: PSO – Cho [355]; Review – Jain et al. [10,34], Zaliapin et al. [28], Nanda et al. [364]

Table 4
Widely used UCI benchmark data sets for nature inspired metaheuristics based partitional clustering.

Iris [150 × 4], Cl-3 (R.A. Fisher): GA – [100,114,91,128,134]; DE – [16,155,153]; ACO – [184,198,193,202,207]; BFO – [267]; PSO – [223,226,233,224,222,227]; CSO – [271]; ABC – [247,250,248]; Firefly – [278]; Frog – [180]; NSGA-II – [345]; MOAIS – [334]; MOCK – [22]; MODE – [343]

Wine [178 × 13], Cl-3 (Forina et al.): GA – [128]; ACO – [201,189,193,202,207]; PSO – [223,224,233,226,227,231]; DE – [16,155]; BFO – [267]; AIS – [235]; ABC – [247,250,248]; Firefly – [278]; GSA – [288]; Frog – [180]; NSGA-II – [345]; MODE – [343]; MOSA – [340,341]; VIENNA – [327]

Glass [214 × 9], Cl-6 (B. German): GA – [4,128,134]; ACO – [201,189,193,202,231]; PSO – [224,226,233,222,227]; DE – [16,155]; BFO – [267]; ABC – [250,248]; Firefly – [278]; CSO – [271]; GSA – [288]; NSGA-II – [345]

Breast cancer [683 × 9], Cl-2 (W.H. Wolberg, O. Mangasarian): GA – [4,134]; ACO – [201,231]; DE – [16,155]; PSO – [223,236,224,226,227]; BFO – [267]; ABC – [250,248]; Firefly – [278]; GSA – [288]; MODE – [343]; MOAIS – [334]; MOSA – [340,341]; VIENNA – [327]

Thyroid [215 × 5], Cl-3 (R. Quinlan): ACO – [189,193]; PSO – [226]; ABC – [247,250]; Firefly – [278]; Frog