
Classification rule discovery with DE/QDE algorithm

Haijun Su a,*, Yupu Yang a, Liang Zhao b

a Department of Automation, Shanghai Jiaotong University, Shanghai, PR China
b Key Laboratory of Advanced Control and Optimization for Chemical Processes, Ministry of Education, East China University of Science and Technology, Shanghai, PR China

Article info

    Keywords:

Classification; Quantum-inspired; Differential evolution; Data mining; Continuous attribute

Abstract

The quantum-inspired differential evolution algorithm (QDE) is a new optimization algorithm in the binary-valued space. This paper proposes the DE/QDE algorithm for the discovery of classification rules. DE/QDE combines the characteristics of the conventional DE algorithm and the QDE algorithm. Based on some strategies of DE and QDE, DE/QDE can directly cope with continuous and nominal attributes, without discretizing the continuous attributes in the preprocessing step. DE/QDE also has a specific weight mutation for managing the weight value of the individual encoding. DE/QDE is then compared with Ant-Miner and CN2 on six problems from the UCI repository datasets. The results indicate that DE/QDE is competitive with Ant-Miner and CN2 in terms of predictive accuracy.

© 2009 Elsevier Ltd. All rights reserved.

    1. Introduction

Data mining is the process of knowledge discovery, which searches a large volume of data to discover interesting and useful information previously unknown (Collard & Francisci, 2001). Data classification is one of the most common tasks of data mining. It generates, from a set of training examples, a set of rules to classify future test data. Evolutionary algorithms (EAs) have been applied to numerical optimization, combinatorial optimization, neural networks, and data mining.

    1.1. Related work

Genetic algorithms (GAs) have been applied widely to data mining for classification. Holland (1986) proposed the Michigan approach, which represents one rule by one individual, and Smith (1983) proposed the Pittsburgh approach, which represents several rules by one individual. Rule induction is one of the most common forms of knowledge discovery. It is able to convert the data into a set of IF-THEN rules for classification. Algorithms based on GAs for rule discovery have been studied in Jong, Spears, and Gordon (1993), Liu and Kwok (2000), Fidelis, Lopes, and Freitas (2000), Au, Chan, and Yao (2003) and Chiu (2002).

Recently, some algorithms based on other EAs have been developed for rule discovery. Jiao, Liu, and Zhong (2006) proposed the organizational coevolutionary algorithm for classification (OCEC). OCEC uses a bottom-up search mechanism and causes the evolution of sets of examples which form organizations. Three new evolutionary operators and a selection mechanism are devised for realizing the evolutionary operations performed on organizations. OCEC can handle multiclass learning in a natural way because it is inspired by the coevolutionary model.

Genetic programming for discovering comprehensible classification rules has been investigated (Falco, Cioppa, & Tarantino, 2002; Johnson, Gilbert, & Winson, 2000). The algorithm proposed in Falco et al. (2002) can provide compact and comprehensible classification rules and has good robustness.

Particle swarm optimizer (PSO) is a new evolutionary algorithm, which simulates the coordinated motion in flocks of birds. Sousa, Silva, and Neves (2004) proposed the use of PSO for data mining. PSO can achieve the rule discovery process. The rule representation in PSO uses the Michigan approach. PSO needs fewer particles than GA to obtain the same results.

Ant colony optimization (ACO) is a new heuristic algorithm derived from research on the behavior of real ant colonies. Parpinelli, Lopes, and Freitas (2002) first proposed Ant-Miner, an ACO-based algorithm for extracting classification rules from data. Ant-Miner discovers rules referring only to nominal attributes; continuous attributes have to be discretized. In constructing the initial population, Ant-Miner, which uses entropy measures, obtains higher-quality rules than a GA that generates the initial population at random. Ant-Miner adopts a normalized information-theoretic heuristic function which computes the entropy for an attribute-value pair only.

Holden and Freitas (2008) proposed a hybrid PSO/ACO algorithm for discovering classification rules. In PSO/ACO, the rule discovery process is divided into two separate phases. In the first phase, ACO discovers a rule containing nominal attributes only. In the second phase, PSO discovers the rule potentially extended with continuous attributes.

0957-4174/$ - see front matter © 2009 Elsevier Ltd. All rights reserved.
doi:10.1016/j.eswa.2009.06.029

* Corresponding author. Tel.: +86 21 34204261; fax: +86 21 34204427. E-mail addresses: [email protected], [email protected] (H. Su).

Expert Systems with Applications 37 (2010) 1216–1222


    1.2. Proposed algorithm

The differential evolution (DE) algorithm is a population-based, stochastic global optimization approach proposed by Storn and Price (1997). DE is an excellent algorithm in the floating-point search space. Many modified DE algorithms have been developed for solving continuous optimization problems. Although these DE algorithms perform well on continuous problems, they have difficulty solving many practical engineering problems formulated as discrete optimization problems, such as combinatorial problems and scheduling or routing problems. Compared with continuous DE, binary DE has not been researched extensively, and its applications are still limited to a few cases.

Lampinen modified differential evolution to solve non-linear programming problems containing integer, discrete and continuous variables (Lampinen & Zelinka, 1999). A new binary DE, called AMDE, was proposed for numerical optimization in Pampara, Engelbrecht, and Franken (2006).

Han proposed the quantum-inspired evolutionary algorithm (QEA), based on the concepts of Q-bits and superposition of states, in Han and Kim (2002) and Han and Kim (2004). Although QEA performs well on the knapsack problem, changing the initial values of the Q-bits can affect QEA's ability to find the best solutions. Additionally, the strategy for choosing the magnitude of the rotation angle has an effect on the convergence speed of QEA.

We previously proposed a novel binary differential evolution algorithm, called the quantum-inspired differential evolution algorithm (QDE) (Su & Yang, 2008). QDE uses a Q-bit individual as a probabilistic representation, instead of a binary or floating-point representation. The mutation operator and crossover operator of DE are adapted in order to generate new Q-bit strings. The selection operator of DE allows better Q-bit strings and their observed states to enter the next generation's population.

In this paper, we propose a new classification algorithm called DE/QDE. DE/QDE uses ideas from the original DE and a modified DE to cope with continuous and integer attributes, and uses ideas from QDE to cope with binary attributes. Here, the binary and integer attributes belong to the nominal attributes.

    1.3. Structure of the paper

The rest of the paper is organized as follows: Section 2 describes the original DE, the modified DE for integer optimization, and the QDE algorithm for binary optimization. Section 3 discusses the use of DE/QDE for rule discovery. In Section 4 we present the experimental procedure and results. Some conclusions are discussed in Section 5.

    2. The DE algorithm

This section describes three versions of DE, which deal with continuous, integer and binary optimization, respectively.

    2.1. DE for continuous optimization

DE is a novel parallel search method. Because DE is a floating-point encoded evolutionary algorithm, it usually deals with real-valued optimization problems. DE generates new candidate individuals by combining a parent individual and one or several difference vectors. DE has three parameters: the mutation control parameter F, the crossover control parameter CR and the population size NP.

Several versions of DE have provided novel strategies for setting these three parameters.

2.1.1. Mutation

There are several mutation forms of DE at present. The scheme DE/rand/1 is one of the most popular and is described as follows. Target vectors are denoted $X_{i,G} = (x_{1,i}, \ldots, x_{N,i})^T$, and trial vectors are denoted $V_{i,G} = (v_{1,i}, \ldots, v_{N,i})^T$, where $i = 1, \ldots, NP$, $N$ is the dimension of the target function, and the subscript $G$ denotes the $G$-th generation. DE/rand/1 is expressed by the following equation:

$$V_{i,G} = X_{r_1,G} + F (X_{r_2,G} - X_{r_3,G}) \quad (1)$$

where $r_1, r_2, r_3$ are randomly chosen, mutually different integers in the range $\{1, \ldots, NP\}$, which also differ from the running index $i$. $F$ is a real parameter which controls the amplification of the differential variation. Generally, the value of $F$ is set in the range [0, 2], usually less than 1. If $F$ is chosen too low, the diversity of DE deteriorates, and DE is more likely to get stuck in local optima.
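As a concrete illustration, the DE/rand/1 mutation of Eq. (1) can be sketched in a few lines (a minimal sketch; the list-based population, the function name and the toy vectors are ours, not from the paper):

```python
import random

def de_rand_1_mutation(population, i, F):
    """Build the trial vector V_i = X_r1 + F * (X_r2 - X_r3),
    where r1, r2, r3 are distinct indices, all different from i."""
    NP = len(population)
    r1, r2, r3 = random.sample([r for r in range(NP) if r != i], 3)
    x1, x2, x3 = population[r1], population[r2], population[r3]
    return [a + F * (b - c) for a, b, c in zip(x1, x2, x3)]

# Example: a population of four 2-dimensional target vectors.
pop = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.5], [0.5, 2.0]]
v = de_rand_1_mutation(pop, i=0, F=0.5)
```

Note that with four individuals and index `i` excluded, exactly three candidates remain, which is the minimum population size for which DE/rand/1 is defined.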

2.1.2. Crossover

After the mutation step, the final trial vector $U_{i,G} = (u_{1,i}, \ldots, u_{N,i})^T$ is calculated by the following equation:

$$u_{j,i} = \begin{cases} v_{j,i} & \text{if } rand_j[0,1) \le CR \text{ or } j = j_{rand} \\ x_{j,i} & \text{otherwise} \end{cases} \quad (2)$$

where $j = 1, \ldots, N$; $CR \in [0, 1]$; $j_{rand} \in \{1, \ldots, N\}$. CR represents the probability that an element of the final trial vector is chosen from the new mutation vector and not from the old target vector. If CR is set to a high value, DE converges faster. If CR is set to a low value, DE becomes more robust but spends more time finding the minimum of the problem. $j_{rand}$ makes sure that the final trial vector differs from the corresponding target vector by at least one element. Eq. (2) is the binomial crossover operator and is the one adopted in this paper.
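The binomial crossover of Eq. (2) can be sketched as follows (a minimal sketch; function and variable names are illustrative):

```python
import random

def binomial_crossover(target, trial, CR):
    """Eq. (2): take trial[j] with probability CR; the forced index
    j_rand guarantees at least one element comes from the trial vector."""
    N = len(target)
    j_rand = random.randrange(N)
    return [trial[j] if (random.random() <= CR or j == j_rand) else target[j]
            for j in range(N)]

x = [1.0, 2.0, 3.0]   # target vector
v = [9.0, 8.0, 7.0]   # mutation (trial) vector
u = binomial_crossover(x, v, CR=0.3)
```

Even with a very small CR, `u` can never equal `x` element-for-element, because position `j_rand` is always copied from `v`.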

2.1.3. Selection

Each target vector of the next generation is generated as:

$$X_{i,G+1} = \begin{cases} U_{i,G} & \text{if } f(U_{i,G}) < f(X_{i,G}) \\ X_{i,G} & \text{otherwise} \end{cases} \quad (3)$$

This equation is a greedy selection scheme. If the vector $U_{i,G}$ yields a smaller objective function value (for a minimization problem) than $X_{i,G}$, then $U_{i,G}$ replaces $X_{i,G}$ and enters the population of the next generation, i.e. $X_{i,G+1}$ takes the information of $U_{i,G}$; otherwise $X_{i,G}$ is retained in the population for the next generation, i.e. $X_{i,G+1}$ takes the information of $X_{i,G}$.
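The greedy selection of Eq. (3) reduces to a one-line comparison (sketch; the sphere objective is a stand-in test function, not from the paper):

```python
def select(target, trial, f):
    """Eq. (3): keep the trial vector only if it strictly improves
    the objective (minimization); otherwise retain the target."""
    return trial if f(trial) < f(target) else target

sphere = lambda x: sum(xi * xi for xi in x)  # toy minimization objective
assert select([1.0, 1.0], [0.5, 0.5], sphere) == [0.5, 0.5]  # trial wins
assert select([0.1, 0.1], [2.0, 2.0], sphere) == [0.1, 0.1]  # target kept
```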

    2.2. DE for integer optimization

DE is only capable of handling continuous variables in its normal form. Lampinen and Zelinka (1999) discussed how to modify DE for integer variables and extended DE to integer optimization. Two simple modifications are proposed. First, integer values should be used to evaluate the fitness function, but DE may still work internally with continuous values. Thus the fitness function takes the form:

$$f(y_j) \quad (4)$$

where

$$y_j = INT(x_j), \quad x_j \in X, \quad j = 1, \ldots, N$$

INT is a function for converting a real value to an integer value by truncation. Truncated values are not assigned elsewhere; DE performs its operators on a population of continuous variables regardless of the corresponding variable type. Second, in the case of integer variables, the population should be initialized as follows:


$$x_{i,j} = x_j^L + r_{i,j} \left( x_j^U - x_j^L + 1 \right) \quad (5)$$

where $j = 1, \ldots, N$; $x_j^U$ and $x_j^L$ are the upper and the lower bounds of the $j$-th variable, respectively. Using these two modified equations, problems containing integer variables can be handled easily.

    2.3. QDE for binary optimization

The above method is not suited to handling binary variables, because the value of a binary variable is either 0 or 1. Thus the form of DE should be modified properly to implement binary optimization. QDE is a novel evolutionary algorithm based on the concepts and principles of quantum computing (Su & Yang, 2008). It uses a string of Q-bits as an individual. The Q-bit representation has better features than an ordinary binary string, and QDE is designed around it. We present the QDE algorithm in the following.

QDE maintains a population of Q-bit individuals, $Q_G = \{q_1^G, q_2^G, \ldots, q_n^G\}$, at generation $G$, where $n$ is the size of the population, and $q$ is a Q-bit individual defined as

$$q_i^G = \begin{bmatrix} \alpha_{i1}^G & \alpha_{i2}^G & \cdots & \alpha_{im}^G \\ \beta_{i1}^G & \beta_{i2}^G & \cdots & \beta_{im}^G \end{bmatrix} \quad (6)$$

where $m$ is the length of a Q-bit individual. Because $|\alpha_{ij}^G|^2$ or $|\beta_{ij}^G|^2$ denotes a probability toward either the 0 or the 1 state, they can be changed by the mutation operator of DE. Thus the mutation operator can be expressed as

$$v_{i,j}^G = \alpha_{r_1,j}^G + F \left( \alpha_{r_2,j}^G - \alpha_{r_3,j}^G \right) \quad (7)$$

where $i = 1, \ldots, n$; $j = 1, \ldots, m$. The integers $r_1, r_2, r_3$ are randomly chosen, mutually different integers in the range $\{1, \ldots, n\}$, and they also differ from the running index $i$. $F$ is a real factor which controls the amplification of the differential variation. Generally, the value of $F$ is set in the range [0, 2].

The crossover operator is expressed as

$$\alpha'_{i,j} = \begin{cases} v_{i,j}^G & \text{if } rand_j[0,1) \le CR \text{ or } j = j_{rand} \\ \alpha_{i,j}^G & \text{otherwise} \end{cases} \quad (8)$$

where $i = 1, \ldots, n$; $j = 1, \ldots, m$; $CR \in [0, 1]$; $j_{rand} \in \{1, \ldots, m\}$. CR is the crossover probability. $\beta'_{i,j}$ is calculated by the following equation:

$$\beta'_{i,j} = \sqrt{1 - (\alpha'_{i,j})^2} \quad (9)$$

So the new Q-bit individual is

$$q'_i = \begin{bmatrix} \alpha'_{i1} & \alpha'_{i2} & \cdots & \alpha'_{im} \\ \beta'_{i1} & \beta'_{i2} & \cdots & \beta'_{im} \end{bmatrix} \quad (10)$$

The new binary string $u_i^G$ can be obtained by observing each Q-bit state of $q_i^G$. The population at generation $G$ is denoted $X_G = \{x_1^G, \ldots, x_n^G\}$, and the observation vectors are denoted $U_G = \{u'_1, \ldots, u'_n\}$.

The selection operator is expressed as

$$x_i^{G+1} = \begin{cases} u'_i & \text{if } f(u'_i) < f(x_i^G) \\ x_i^G & \text{otherwise} \end{cases} \quad (11)$$

and

$$q_i^{G+1} = \begin{cases} q'_i & \text{if } f(u'_i) < f(x_i^G) \\ q_i^G & \text{otherwise} \end{cases} \quad (12)$$

So the observing population at generation $G+1$ is $X_{G+1} = \{x_1^{G+1}, x_2^{G+1}, \ldots, x_n^{G+1}\}$, and the Q-bit population at generation $G+1$ is $Q_{G+1} = \{q_1^{G+1}, q_2^{G+1}, \ldots, q_n^{G+1}\}$.

After the mutation operation and the crossover operation are performed on a Q-bit individual, the values of the corresponding bits are updated. The individual must then be observed in order to obtain the binary string. Of course, it is unnecessary to observe all of its Q-bits. Observing the entire set of Q-bits of one individual not only increases the computational cost, but may also generate a brand-new binary individual which does not inherit any information from the former binary individual. For binary problems, such as the knapsack problem, the binomial observation approach is proposed.
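The QDE operators of Eqs. (7)-(9), plus an observation step, can be sketched as follows. This is a minimal sketch under our own assumptions: only the $\alpha$ row is stored, since $\beta$ is derived by Eq. (9); crossed-over amplitudes are clipped to $[-1, 1]$ so Eq. (9) stays well defined; a bit is observed as 1 with probability $|\beta|^2 = 1 - \alpha^2$; and, for simplicity, every Q-bit is observed, whereas the paper recommends a binomial observation that re-observes only some bits. The function names are ours.

```python
import math
import random

def qde_mutate(alphas, i, F):
    """Eq. (7): DE/rand/1 mutation applied to the alpha amplitudes.
    alphas is a list of n Q-bit individuals, each a list of m alphas."""
    n = len(alphas)
    r1, r2, r3 = random.sample([r for r in range(n) if r != i], 3)
    return [alphas[r1][j] + F * (alphas[r2][j] - alphas[r3][j])
            for j in range(len(alphas[i]))]

def qde_crossover(alpha_i, v_i, CR):
    """Eq. (8): binomial crossover of alpha amplitudes, clipped to
    [-1, 1] (our assumption) so that Eq. (9) stays well defined."""
    m = len(alpha_i)
    j_rand = random.randrange(m)
    new = [v_i[j] if (random.random() <= CR or j == j_rand) else alpha_i[j]
           for j in range(m)]
    return [max(-1.0, min(1.0, a)) for a in new]

def beta(alpha):
    """Eq. (9): the normalization |alpha|^2 + |beta|^2 = 1."""
    return math.sqrt(1.0 - alpha * alpha)

def observe(alpha_i):
    """Collapse each Q-bit: bit = 1 with probability 1 - alpha^2."""
    return [1 if random.random() < 1.0 - a * a else 0 for a in alpha_i]
```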

    3. The DE/QDE algorithm for rule discovery

In this section, we use the three versions of DE to solve the classification problem. Our algorithm, called DE/QDE, can cope with datasets containing continuous, binary and integer attributes. In particular, continuous attributes can be used directly.

    3.1. Representations of the rule

For a problem with $m$ attributes $A_j$, $j = 1, \ldots, m$, rules can be represented as: IF $cond_1 \wedge \ldots \wedge cond_m$ THEN class $C_k$. The representation of an individual in DE/QDE is $(cond_1, \ldots, cond_m, C_k)$. Here, $C_k$ is the value of the class, and $cond_j$ is a condition on $A_j$. For a continuous attribute, the form of $cond_j$ is $V_{j,lower} \le A_j \le V_{j,upper}$. For a nominal attribute, the form of $cond_j$ is $A_j = V_j$.

It is well known that DE easily deals with continuous variables, but classification problems also contain nominal attributes. If a nominal attribute has more than two values, it is called an integer attribute; if it has only two values, it is called a binary attribute. With simple modifications, DE can deal with integer attributes. For a binary attribute, DE must add some mechanisms, so the QDE algorithm introduced in Section 2.3 is used to handle binary attributes.
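Under this encoding, decoding an individual into a readable rule might look as follows (a hypothetical sketch of the data layout; the field order follows Fig. 1, but the attribute names, class label and function name are ours):

```python
# Each gene is (W_j, V_j1, V_j2): W_j toggles the attribute on/off;
# for a continuous attribute, V_j1/V_j2 are the lower/upper bounds;
# for a nominal (binary or integer) attribute, V_j2 holds the value.
def decode_rule(genes, attr_names, attr_types, class_value):
    conds = []
    for (w, v1, v2), name, kind in zip(genes, attr_names, attr_types):
        if w == 0:
            continue  # attribute removed from the rule
        if kind == "continuous":
            conds.append(f"{v1} <= {name} <= {v2}")
        else:  # binary or integer: condition A_j = V_j
            conds.append(f"{name} = {int(v2)}")
    return "IF " + " AND ".join(conds) + f" THEN class = {class_value}"

genes = [(1, 4.3, 7.9), (0, 0.0, 1.0), (1, 2.7, 2.0)]
rule = decode_rule(genes, ["sepal_len", "flag", "rooms"],
                   ["continuous", "binary", "integral"], "yes")
# rule: "IF 4.3 <= sepal_len <= 7.9 AND rooms = 2 THEN class = yes"
```

The second gene has weight 0, so its attribute does not appear in the rule; this is exactly the mechanism the weight mutation of Section 3.4 manipulates.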

Fig. 1 shows the form of an individual in the population. In a problem with $m$ attributes, Gene $j$ represents the condition of the $j$-th attribute $A_j$, $j = 1, \ldots, m$. Each Gene contains three fields. The first field, the weight $W_j$, is a binary variable taking 0 or 1. When the value of $W_j$ is set to 0 or 1, the $j$-th attribute is removed from or inserted into the individual, respectively. The second field of Gene $j$ takes different forms depending on the type of the corresponding attribute. If an attribute $A_j$ is continuous, the second field represents the lower bound $V_{j,lower}$ of the attribute. If $A_j$ is binary, the second field represents the quantum representation $V_{j,1}$ of the attribute. If $A_j$ is integral, the second field represents the value $V_{j,1}$ of the attribute, and is a real number which is used in the mutation operator. The third field is similar to the second. If $A_j$ is continuous, the third field represents the upper bound $V_{j,upper}$ of the attribute. If $A_j$ is binary, the third field represents the value $V_{j,2}$ of the attribute, i.e. $V_{j,2} = 0$ or 1. If $A_j$ is integral, the third field represents the value $V_{j,2}$ of the attribute, i.e. $V_{j,2} = INT(V_{j,1})$. $C_k$ is the value of the class to which the individual belongs.

Fig. 1. Representation of an individual: $(Gene_1, \ldots, Gene_m, C_k)$, where each $Gene_j = (W_j, V_{j,1}, V_{j,2})$; for a continuous attribute, $V_{j,1} = V_{j,lower}$ and $V_{j,2} = V_{j,upper}$.

Although the encoding of each individual in DE/QDE has a fixed length, the length of its corresponding rule is variable, depending on the value of the weight field $W_j$. How to regulate the weight is described in Section 3.4.

    3.2. Fitness function

The fitness function is used to evaluate the quality of each rule during the training process. Fidelis et al. (2000) proposed a fitness function based on the sensitivity Se and the specificity Sp. The following four quantities must be evaluated first:

(1) True Positives (TP): the number of examples covered by the rule that have the class predicted by the rule.

(2) False Positives (FP): the number of examples covered by the rule that have a class different from the class predicted by the rule.

(3) False Negatives (FN): the number of examples not covered by the rule that have the class predicted by the rule.

(4) True Negatives (TN): the number of examples not covered by the rule that have a class different from the class predicted by the rule.

The accuracy Ac, the sensitivity (coverage) Co and the specificity Sp are defined as follows, respectively:

$$Ac = \frac{TP}{TP + FP} \quad (13)$$

$$Co = \frac{TP}{TP + FN} \quad (14)$$

$$Sp = \frac{TN}{TN + FP} \quad (15)$$

We define the fitness function as follows:

$$Fitness = \omega_1 \cdot Ac \cdot Co \cdot Sp + \omega_2 \cdot Simp \quad (16)$$

where Simp is a measure of rule simplicity, and $\omega_1$ and $\omega_2$ are user-defined weights. The Simp measure can be defined in many different ways. Here, Simp is expressed as follows:

$$Simp = \frac{Term_u}{Term_a} \quad (17)$$

where $Term_a$ is the number of useful attributes of a rule, and $Term_u$ is the number of potentially useful attributes. A given attribute $A_i$ is said to be potentially useful if there is at least one training example having both the $A_i$ value specified in the rule antecedent and the goal attribute value specified in the rule consequent (de Araujo, Lopes, & Freitas, 1999). In our experiments, $\omega_1$ and $\omega_2$ are set to 0.8 and 0.2, respectively. The fitness takes values in the range [0, 1]. In the classification problem, we search for the individual having the maximum fitness value in each optimization process.
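Eqs. (13)-(17) combine into the following fitness sketch, with $\omega_1 = 0.8$ and $\omega_2 = 0.2$ as in the paper (the function name and the toy counts are ours):

```python
def rule_fitness(tp, fp, fn, tn, term_u, term_a, w1=0.8, w2=0.2):
    """Fitness = w1 * Ac * Co * Sp + w2 * Simp  (Eqs. (13)-(17))."""
    ac = tp / (tp + fp)        # accuracy, Eq. (13)
    co = tp / (tp + fn)        # sensitivity/coverage, Eq. (14)
    sp = tn / (tn + fp)        # specificity, Eq. (15)
    simp = term_u / term_a     # rule simplicity, Eq. (17)
    return w1 * ac * co * sp + w2 * simp

# A rule covering 40 positives and 10 negatives, missing 20 positives,
# leaving 30 negatives uncovered, with both of its 2 attributes
# potentially useful: Ac = 0.8, Co = 2/3, Sp = 0.75, Simp = 1.
f = rule_fitness(tp=40, fp=10, fn=20, tn=30, term_u=2, term_a=2)
assert 0.0 <= f <= 1.0
```

Because Ac, Co, Sp and Simp all lie in [0, 1] and the weights sum to 1, the fitness is automatically confined to [0, 1], as the text states.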

    3.3. Rule extraction and prediction method

In DE/QDE, each individual of the population represents a rule. The genome of an individual consists of the antecedent (IF part) and the consequent (THEN part) of the rule. During one run, the antecedent of each individual's genome undergoes the relevant evolutionary operators, but the consequent specifies a preset, fixed class. Each run discovers a single rule which predicts a given class for examples. If a given dataset contains m classes, the algorithm needs to run at least m times.

After a rule is discovered, it goes through a pruning process in order to remove redundant attributes. This process is done by iteratively removing one attribute of the rule at a time. If the newly obtained rule has the same or higher quality than the original rule, the new rule replaces the original. Note that our pruning process only regulates the length of a rule; it does not reduce the number of rules.

When an example is tested, there are three possible outcomes. First, there might be exactly one rule covering the example. In this case, the example is simply classified by that rule. Second, there might be more than one rule covering the example, with consequents belonging to different classes. In this case, the example is classified by the rule having the highest-quality fitness among all the rules covering the example. Third, there might be no rule covering the example. In this case, the example is classified by the rule having the maximum match value among all the rules. When more than one rule has the same maximum match value, the one with the higher-quality fitness is used. The match value is defined as $MV_{ir} = term_{ir} / |term_r|$, where $term_r$ denotes the number of terms in the antecedent of rule $r$, and $term_{ir}$ denotes the number of those terms satisfied by example $i$ (Jiao et al., 2006). According to this definition, the range of the match value is [0, 1], but the match value must be less than 1 in the third case.
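The three-outcome scheme can be sketched as follows, assuming each rule carries its antecedent terms, its fitness and its class (a hypothetical sketch; the representation of rules as tuples of predicates is ours). A single ranking by (match value, fitness) covers all three cases, because a covering rule always has MV = 1:

```python
def match_value(rule_terms, example):
    """MV = (number of terms satisfied by the example) / (number of terms)."""
    satisfied = sum(1 for t in rule_terms if t(example))
    return satisfied / len(rule_terms)

def classify(example, rules):
    """rules: list of (terms, fitness, cls). Covering rules (MV == 1) win;
    ties are broken by fitness; with no covering rule, the highest match
    value (then fitness) decides."""
    best = max(rules, key=lambda r: (match_value(r[0], example), r[1]))
    return best[2]

# Two toy rules over dict-valued examples.
r1 = ([lambda e: e["x"] > 1, lambda e: e["y"] == 0], 0.9, "A")
r2 = ([lambda e: e["x"] <= 1], 0.7, "B")
assert classify({"x": 2, "y": 0}, [r1, r2]) == "A"  # covered by r1 (MV = 1)
assert classify({"x": 0, "y": 5}, [r1, r2]) == "B"  # covered by r2 only
```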

    3.4. Implementation of DE/QDE

There may be continuous, binary, or integer attributes, or a combination of them, in a given dataset. For many classification algorithms, the continuous attributes have to be converted into discrete values by discretization methods in a preprocessing step. Discretization often improves the comprehensibility of the continuous attributes, because the classification algorithm then handles only a few discrete values rather than a set of continuous values.

Our DE/QDE algorithm can cope with continuous attributes directly. Section 2 introduced three mutation operators for the continuous, binary and integer attributes; here, we need to organize these strategies well. Suppose a dataset contains m attributes, each of which may be continuous, binary or integer. We need to clearly mark the kind of each attribute in sequence. The j-th item of an individual denotes the j-th attribute in the dataset, $j = 1, \ldots, m$. If the j-th item is continuous, the mutation strategy of Section 2.1 is performed; if the j-th item is integer, the mutation strategy of Section 2.2 is performed; if the j-th item is binary, the mutation strategy of Section 2.3 is performed. Each continuous item has lower and upper bounds, so the mutation operator must be applied to each bound separately. The following code sums up the value mutation process.

The pseudo-code of the value mutation (there are NP individuals in the population; each individual has m attributes):

begin
  for (i = 0; i < NP; i++) {
    for (j = 0; j < m; j++) {
      if (the j-th attribute is continuous) {
        Change the lower and upper bounds respectively;
      } else if (the j-th attribute is binary) {
        Change the value of the quantum bit;
        Obtain the new value by the observation approach;
      } else if (the j-th attribute is integral) {
        Change the value of the second field V_{j,1};
        Set V_{j,2} = INT(V_{j,1});
      }
    }
  }
end

The weight mutation is developed to change the weight of an attribute in an individual. In the initial population, we usually choose one attribute at random and set its weight to 1, meaning that each initial individual has only one useful attribute. The parameter pw denotes the attribute-insertion or attribute-removal probability. When a random number in the range [0,1] is less than pw and the j-th attribute is useless, the j-th attribute is transformed to useful; when a random number in the range [0,1] is less than pw and the j-th attribute is useful, the j-th attribute is transformed to useless. However, during the weight mutation process of an individual, the transformation from useless to useful is limited to at most two times, and the transformation from useful to useless is limited to at most one time. The following code sums up the weight mutation process.

The pseudo-code of the weight mutation:

begin
  for (i = 0; i < NP; i++) {
    bflag1 = 0; bflag2 = 0;
    for (j = 0; j < m; j++) {
      if (rand < pw) {
        if (W_j == 0 && bflag1 < 2) {
          W_j = 1; bflag1++;
        } else if (W_j == 1 && bflag2 < 1) {
          W_j = 0; bflag2++;
        }
      }
    }
  }
end

We consider that pw should be set to a small value, such as 0.1. Then each attribute has only a small chance of being removed from or inserted into an individual, which ensures that the actual length of an individual changes only slightly in each iteration. This attribute-insertion and attribute-removal strategy iteratively tests whether an attribute is useful to an individual. The basic process of the DE/QDE algorithm is presented as follows:

Step 1: Initialize the population.
Step 2: Perform the value mutation operator according to the kind of the attributes.
Step 3: Perform the weight mutation operator.
Step 4: Perform the crossover operator.
Step 5: Evaluate each individual's fitness.
Step 6: Perform the selection operator.
Step 7: If the iterative generation reaches the preset value, go to Step 8; else go to Step 2.
Step 8: Extract a rule from the best individual.
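Steps 1-8 amount to the following loop skeleton (pure structure: the operator arguments are stubs standing in for the operators of Sections 2 and 3, and the toy instantiation below is ours, not from the paper):

```python
import random

def de_qde_run(init_individual, value_mutation, weight_mutation,
               crossover, fitness, NP=50, max_gen=200):
    """Skeleton of one DE/QDE run: returns the best individual,
    from which a single rule is extracted (Step 8)."""
    pop = [init_individual() for _ in range(NP)]          # Step 1
    for _ in range(max_gen):                              # Step 7 loop
        for i in range(NP):
            trial = value_mutation(pop, i)                # Step 2
            trial = weight_mutation(trial)                # Step 3
            trial = crossover(pop[i], trial)              # Step 4
            if fitness(trial) > fitness(pop[i]):          # Steps 5-6 (maximize)
                pop[i] = trial
    return max(pop, key=fitness)                          # Step 8

# Toy instantiation: individuals are single floats, fitness peaks at 0.5.
best = de_qde_run(
    init_individual=lambda: random.random(),
    value_mutation=lambda pop, i: pop[i] + random.uniform(-0.1, 0.1),
    weight_mutation=lambda x: x,
    crossover=lambda a, b: b,
    fitness=lambda x: -abs(x - 0.5),
    NP=10, max_gen=50)
assert 0.0 <= best <= 1.0  # greedy selection never moves away from 0.5
```

Since the selection of Steps 5-6 is greedy, the best fitness in the population is non-decreasing over generations, which is what makes the final `max(pop, key=fitness)` a sensible rule-extraction point.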

    4. Experiments and results

For the experiments, six datasets from the well-known UCI repository (Blake, Keogh, & Merz, 1998) are used to test the performance of DE/QDE. The basic information of the datasets is presented in Table 1. The attributes are partitioned into three types: continuous, binary and integral. Table 1 indicates the numbers of continuous, binary and integral attributes of each dataset. Breast cancer (L) contains binary and integral attributes. Breast cancer (W) and Tic-tac-toe only contain integral attributes. Dermatology, Hepatitis and Cleveland contain continuous, binary and integral attributes. As mentioned earlier, DE/QDE can cope with continuous attributes directly, so discretization is abandoned completely in the data preprocessing step. We use 10-fold cross-validation as the test method: one dataset is divided into 10 equal partitions; one partition is used as the test set and the remaining nine partitions are used as the training set. The average predictive accuracies of the 10 runs are reported as the predictive accuracy of the discovered rule set.

DE/QDE has 50 individuals and runs for a maximum of 200 iterations to discover one rule. Other important parameters are listed as follows:

Mutation parameter F is generated randomly in the range [0, 0.5];
Crossover parameter CR is generated randomly in the range [0, 0.3];
Attribute-insertion or attribute-removal probability pw is set to 0.1;
Minimum number of training examples covered per rule is set to 4.

We evaluate the performance of DE/QDE in comparison with Ant-Miner (Parpinelli et al., 2002) and CN2 (Clark & Niblett, 1989). Ant-Miner and CN2 are well-known classification algorithms using rule sets. Ant-Miner uses ant colony optimization (ACO) for the discovery of classification rules. CN2 is an induction algorithm combining some fine strategies of ID3 and AQ. Table 2 reports the average predictive accuracy and the standard deviation. Table 3 reports the average number of rules and the standard deviation. Table 4 reports the number of terms per rule.

The results of Ant-Miner without rule pruning are also shown in the tables. It can be seen that the rules discovered by Ant-Miner without rule pruning are longer than those discovered by Ant-Miner, and the rule set discovered by Ant-Miner without rule pruning is much larger than that discovered by Ant-Miner. The reason is that rule pruning is usually beneficial to the predictive accuracy of the rule set, since it can delete redundant rules and reduce the antecedent of each rule. Additionally, in Ant-Miner a default rule is used to simply predict any new example uncovered by the rule list. The default rule contains no conditions and has only a consequence. Rules are removed if they have the same consequence as the default rule. So using the default rule is helpful to

Table 1
UCI repository datasets used in experiments.

No.  Dataset             Examples  Continuous  Binary  Integral  Classes
1    Breast cancer (L)   282       –           6       3         2
2    Breast cancer (W)   683       –           –       9         2
3    Tic-tac-toe         958       –           –       9         2
4    Dermatology         366       1           1       32        6
5    Hepatitis           155       6           13      –         2
6    Cleveland           303       5           3       5         5


reduce the number of rules. In DE/QDE, the match value is used to classify a new example not covered by the rule set. Pruning the antecedent of a rule does not reduce the size of the rule set, so the number of rules is not changed in the rule pruning step.

In Table 2, it can be seen that the predictive accuracies of DE/QDE for Breast cancer (L), Tic-tac-toe and Hepatitis are higher than those of Ant-Miner. DE/QDE has higher predictive accuracies than CN2 for Breast cancer (L), Tic-tac-toe, Dermatology and Hepatitis. But the standard deviations of DE/QDE are often larger than those of Ant-Miner and CN2. In DE/QDE, the continuous attributes are used directly in the process of finding rules, whereas Ant-Miner needs to discretize the continuous attributes. After a continuous attribute is discretized, its values become a few isolated values rather than many continuous values in a range. So discretization is often beneficial, because this step simplifies the distribution of the continuous attribute. Of course, DE/QDE can also accept continuous attributes which have been discretized, but that is not the purpose of this paper.

On each dataset, the number of rules of DE/QDE is obviously smaller than those of Ant-Miner without rule pruning and of CN2, but DE/QDE obtains a better result than Ant-Miner only for Breast cancer (L). Likewise, in Table 4, the number of terms per rule shows similar comparison results, but DE/QDE obtains a better result than Ant-Miner only for Breast cancer (W).

Generally, the performance of a classification method using rules is affected by several aspects, such as the rule discovery algorithm, the fitness evaluation and the rule pruning method. In DE/QDE, the values of F and CR are not discussed specially, because they seldom influence DE/QDE's ability to discover classification rules. But how the weight mutation works is a very important step, which determines the conversion of attributes between useful and useless. The fitness function is another important factor. Designing the fitness function depends on the actual demands, such as predictive accuracy, comprehensibility and interestingness. In many cases, the fitness function needs to take more than one measure into account.

    5. Conclusion

This paper has proposed a new classification algorithm called DE/QDE. DE/QDE combines two DE algorithms, i.e. DE and QDE. QDE is an optimization algorithm based on the strategies of the DE algorithm in the binary-valued space. DE/QDE can deal with datasets containing continuous, binary and integral attributes. Because continuous attributes can be used directly in DE/QDE, discretization can be omitted from the data preprocessing step. The weight mutation operator is used to update the weight of the attributes of an individual. DE/QDE has excellent search ability and can find high-quality rules. The results on six datasets show that DE/QDE obtains competitive predictive accuracies, although it generates slightly larger rule sets than Ant-Miner. Future research will therefore aim to reduce the size of the rule set and improve its comprehensibility.

    References

    Au, W.-H., Chan, K. C. C., & Yao, X. (2003). A novel evolutionary data mining algorithm with applications to churn prediction. IEEE Transactions on Evolutionary Computation, 7(6), 532–545.

    Blake, C., Keogh, E., & Merz, C. J. (1998). UCI repository of machine learning databases. http://www.ics.uci.edu/mlearn/MLRepository.html.

    Chiu, C. (2002). A case-based customer classification approach for direct marketing. Expert Systems with Applications, 22(2), 163–168.

    Table 2
    The predictive accuracy of DE/QDE, Ant-Miner, Ant-Miner without rule pruning and CN2 for the datasets (mean ± standard deviation, %).

    Dataset            DE/QDE         Ant-Miner      Ant-Miner without rule pruning   CN2
    Breast cancer (L)  75.52 ± 4.91   75.28 ± 2.24   70.69 ± 3.87                     67.69 ± 3.59
    Breast cancer (W)  92.68 ± 5.07   96.04 ± 0.93   95.74 ± 0.74                     94.88 ± 0.88
    Tic-tac-toe        98.85 ± 2.07   73.04 ± 2.53   76.83 ± 2.27                     97.38 ± 0.52
    Dermatology        91.53 ± 2.40   94.29 ± 1.20   83.05 ± 1.94                     90.38 ± 1.66
    Hepatitis          90.97 ± 6.34   90.00 ± 3.11   92.50 ± 2.76                     90.00 ± 2.50
    Cleveland          52.15 ± 3.61   59.67 ± 2.50   54.82 ± 2.56                     57.48 ± 1.78

    Table 3
    The average number of rules of DE/QDE, Ant-Miner, Ant-Miner without rule pruning and CN2 for the datasets (mean ± standard deviation).

    Dataset            DE/QDE         Ant-Miner      Ant-Miner without rule pruning   CN2
    Breast cancer (L)   6.30 ± 1.19    7.10 ± 0.31   19.60 ± 0.22                     55.40 ± 2.07
    Breast cancer (W)  11.80 ± 1.08    6.20 ± 0.25   22.80 ± 0.20                     18.60 ± 0.45
    Tic-tac-toe        10.00 ± 0.00    8.50 ± 0.62   68.80 ± 0.32                     39.70 ± 2.52
    Dermatology        11.90 ± 2.02    7.30 ± 0.15   25.90 ± 0.31                     18.50 ± 0.47
    Hepatitis           4.30 ± 0.64    3.40 ± 0.16    6.80 ± 0.13                      7.20 ± 0.25
    Cleveland          11.10 ± 1.37    9.50 ± 0.92   21.80 ± 0.20                     42.40 ± 0.71

    Table 4
    The number of terms per rule of DE/QDE, Ant-Miner, Ant-Miner without rule pruning and CN2 for the datasets.

    Dataset            DE/QDE   Ant-Miner   Ant-Miner without rule pruning   CN2
    Breast cancer (L)  2.80     1.28         3.25                            2.21
    Breast cancer (W)  1.20     1.97         5.72                            2.39
    Tic-tac-toe        2.60     1.18         3.47                            2.90
    Dermatology        3.11     3.16        16.86                            2.47
    Hepatitis          2.98     2.41         6.01                            1.58
    Cleveland          3.38     1.71         4.32                            2.79

    H. Su et al. / Expert Systems with Applications 37 (2010) 12161222 1221



    Clark, P., & Niblett, T. (1989). The CN2 induction algorithm. Machine Learning, 3(4), 261–283.

    Collard, M., & Francisci, D. (2001). Evolutionary data mining: An overview of genetic-based algorithms. In Proceedings of the IEEE congress on evolutionary computation (pp. 3–9).

    de Araujo, D. L. A., Lopes, H. S., & Freitas, A. A. (1999). A parallel genetic algorithm for rule discovery in large databases. In Proceedings of the IEEE congress on systems, man and cybernetics, Tokyo (pp. 940–945).

    Falco, I., Cioppa, A., & Tarantino, E. (2002). Discovering interesting classification rules with genetic programming. Applied Soft Computing, 1, 257–269.

    Fidelis, M. V., Lopes, H. S., & Freitas, A. A. (2000). Discovering comprehensible classification rules with a genetic algorithm. In Proceedings of the IEEE congress on evolutionary computation (pp. 805–810).

    Han, K.-H., & Kim, J.-H. (2002). Quantum-inspired evolutionary algorithm for a class of combinatorial optimization. IEEE Transactions on Evolutionary Computation, 6(6), 580–593.

    Han, K.-H., & Kim, J.-H. (2004). Quantum-inspired evolutionary algorithms with a new termination criterion, H gate and two-phase scheme. IEEE Transactions on Evolutionary Computation, 8(2), 156–169.

    Holden, N., & Freitas, A. A. (2008). A hybrid PSO/ACO algorithm for discovering classification rules in data mining. Journal of Artificial Evolution and Applications, 2008, Article ID 316145, 11 pages. doi:10.1155/2008/316145.

    Holland, J. H. (1986). Escaping brittleness: The possibilities of general purpose learning algorithms applied to parallel rule-based systems. Machine Learning: An Artificial Intelligence Approach, 2, 593–623.

    Jiao, L., Liu, J., & Zhong, W. (2006). An organizational coevolutionary algorithm for classification. IEEE Transactions on Evolutionary Computation, 10(1), 67–80.

    Johnson, H. E., Gilbert, R. J., & Winson, M. K. (2000). Explanatory analysis of the metabolome using genetic programming of simple interpretable rules. Genetic Programming and Evolvable Machines, 1, 243–258.

    Jong, K. A. D., Spears, W. M., & Gordon, D. F. (1993). Using genetic algorithms for concept learning. Machine Learning, 13(2–3), 161–188.

    Lampinen, J., & Zelinka, I. (1999). Mixed integer-discrete-continuous optimization by differential evolution, part 1. In Proceedings of the fifth international congress on soft computing.

    Liu, J. J., & Kwok, J. T.-Y. (2000). An extended genetic rule induction algorithm. In Proceedings of the IEEE congress on evolutionary computation (pp. 458–463).

    Pampara, G., Engelbrecht, A., & Franken, N. (2006). Binary differential evolution. In Proceedings of the IEEE congress on evolutionary computation (pp. 1873–1879).

    Parpinelli, R. S., Lopes, H. S., & Freitas, A. A. (2002). Data mining with an ant colony optimization algorithm. IEEE Transactions on Evolutionary Computation, 6(4), 321–332.

    Smith, S. F. (1983). Flexible learning of problem solving heuristics through adaptive search. In Proceedings of the 8th international congress on artificial intelligence, Karlsruhe, Germany (pp. 422–425).

    Sousa, T., Silva, A., & Neves, A. (2004). Particle swarm based data mining algorithms for classification tasks. Parallel Computing, 30, 767–783.

    Storn, R., & Price, K. (1997). Differential evolution: A simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization, 11(4), 341–359.

    Su, H., & Yang, Y. (2008). Quantum-inspired differential evolution for binary optimization. In The 4th international conference on natural computation (pp. 341–346).

