
Applied Soft Computing 4 (2004) 65–77

Extracting rules from trained neural network using GA for managing E-business

A. Ebrahim Elalfi a,*, R. Haque b, M. Esmel Elalami a

a Department of Computer Instructor Preparation, Faculty of Specific Education, Mansoura University, Mansoura, Egypt
b High Tech. International.com, Montreal, Que., Canada

    Received 23 September 2002; received in revised form 13 August 2003; accepted 19 August 2003

    Abstract

The ability to intelligently collect, manage and analyze information about customers and sellers is a key source of competitive advantage for an e-business. This ability provides an opportunity to deliver real-time marketing or services that strengthen customer relationships. It also enables an organization to gather business intelligence about a customer that can be used for future planning and programs.

This paper presents a new algorithm for extracting accurate and comprehensible rules from databases via a trained artificial neural network (ANN) using a genetic algorithm (GA). The new algorithm does not depend on the ANN training algorithm, nor does it modify the training results. The GA is used to find the optimal values of the input attributes (chromosome), X_m, which maximize the output function \phi_k of output node k. The function \phi_k = f(x_i, (WG1)_{i,j}, (WG2)_{j,k}) is a nonlinear exponential function, where (WG1)_{i,j} and (WG2)_{j,k} are the weight groups between the input and hidden nodes, and the hidden and output nodes, respectively. The optimal chromosome is decoded and used to obtain a rule belonging to class_k.
© 2003 Elsevier B.V. All rights reserved.

Keywords: E-business; Artificial neural network; Genetic algorithms; Personalization; Online shopping; Rule extraction

    1. Introduction

E-commerce has evolved from consumers conducting basic transactions on the Web to a complete retooling of the way partners, suppliers and customers transact. One can now link dealers and suppliers online, reducing both lag time and paperwork. One can move procurement online by setting up an extranet that links directly to vendors, cutting inventory carrying costs and becoming more responsive to one's customers. One can also streamline financial relationships with customers and suppliers by Web-enabling billing and payment systems.

* Corresponding author.
E-mail address: ael [email protected] (A.E. Elalfi).

Recent literature suggests that the Internet and WWW, as business transaction tools, provide both firms and consumers with various benefits, including lower transaction costs, lower search costs, and a greater selection of goods [1].

The ability to provide content and services to individuals on the basis of knowledge about their preferences and behavior has become an important marketing tool [2].

1568-4946/$ – see front matter © 2003 Elsevier B.V. All rights reserved.
doi:10.1016/j.asoc.2003.08.004

A complete customer profile has two parts: factual and behavioral. The factual profile contains information, such as name, gender, and date of birth, that the personalization system obtained from the customer's


factual data. The factual profile can also contain information derived from the transaction data. A behavioral profile models the customer's actions and is usually derived from transactional data.

Personalization begins with collecting customer data from various sources. This data might include histories of customers' Web purchasing and browsing activities, as well as demographic and psychological information. After the data is collected, it must be prepared, cleaned, and stored in a data warehouse.

Real-world data is dirty. Data cleaning, including the removal of contradictory and redundant data items and the elimination of irrelevant attributes, has been an important topic in data mining research and development [3].

Extracting rules from a given database via trained neural networks is important [4]. Although several algorithms have been proposed by several researchers [5,6], there is no algorithm which can be applied to any type of network, to any training algorithm, and to both discrete and continuous values [4]. A method for extracting M-of-N rules from trained artificial neural networks (ANNs) was presented by Setiono [5]. However, the algorithm was based on standard three-layered feed-forward networks, and the attributes of the database are assumed to have binary values 1 or −1. Hiroshi presented a decomposition algorithm that can be applied to multilayer ANNs and recurrent networks [6]. The units of the ANN are approximated by Boolean functions. The computational complexity of the approximation is exponential, so a polynomial algorithm was presented [7]. To reduce the computational complexity, higher-order terms were neglected; consequently, the extraction of accurate rules is not guaranteed.

An approach for extracting rules from trained ANNs for regression was presented in [13]. Each rule in the extracted rule set corresponds to a subregion of the input space, and a linear function involving the relevant input attributes of the data approximates the network output for all data samples in this subregion. However, the method extracts rules from the trained ANN by approximating the hidden activation function h(x) = tanh(x) by either a three-piece or a five-piece linear function. This approximation yields less accuracy and makes the computation burdensome.

This paper presents a new algorithm for extracting rules from a trained neural network using a genetic algorithm. It does not depend on the training algorithm of the ANN and does not modify the training results. The algorithm can also be applied to discrete and continuous attributes. It does not make any approximation to the hidden unit activation function. Additionally, it takes into consideration any number of hidden layers in the trained ANN.

The extracted rules can be used to define a customer profile in order to facilitate online shopping.

    2. Problem formulation

A supervised ANN uses a set of training examples or records. These records include N attributes. Each attribute, A_n (n = 1, 2, ..., N), can be encoded into a fixed-length binary sub-string {x_1 ... x_i ... x_{m_n}}, where m_n is the number of possible values of attribute A_n. The element x_i = 1 if its corresponding attribute value exists, while all the other elements are 0. So, the proposed number of input nodes, I, in the input layer of the ANN is given by

I = \sum_{n=1}^{N} m_n    (1)

The input attribute vectors, X_m, to the input layer can be written as

X_m = {x_1 ... x_i ... x_I}_m    (2)

where m = 1, 2, ..., M, and M is the total number of input training patterns.
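As a concrete sketch of this encoding, the play-tennis attributes of Table 1 (used later in Section 5) can be one-hot encoded as follows; the dictionary layout and function name are illustrative, not from the paper:

```python
# Sketch of the attribute encoding of Eqs. (1)-(2) for the Table 1 attributes.
ATTRIBUTES = {
    "Outlook": ["Sunny", "Overcast", "Rain"],   # m1 = 3
    "Temperature": ["Hot", "Mild", "Cool"],     # m2 = 3
    "Humidity": ["High", "Normal"],             # m3 = 2
    "Wind": ["Weak", "Strong"],                 # m4 = 2
}

# Eq. (1): I = sum of m_n = 3 + 3 + 2 + 2 = 10 input nodes.
I = sum(len(values) for values in ATTRIBUTES.values())

def encode(record):
    """Map a record {attribute: value} to the bit vector X_m of Eq. (2):
    one bit per possible value, set to 1 where the value is present."""
    bits = []
    for attr, values in ATTRIBUTES.items():
        bits += [1 if record[attr] == v else 0 for v in values]
    return bits

# Pattern D1 of Table 1 (Sunny, Hot, High, Weak) encodes to X1 of Table 2.
x1 = encode({"Outlook": "Sunny", "Temperature": "Hot",
             "Humidity": "High", "Wind": "Weak"})
```

Each record thus becomes a binary string whose segments correspond one-to-one to the attributes.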

The output class vector, C_k (k = 1, 2, ..., K), can be encoded as a bit vector of fixed length K as follows:

C_k = {c_1 ... c_k ... c_K}    (3)

where K is the number of different possible classes. If the output vector belongs to class_k, then the element c_k is equal to 1 while all the other elements in the vector are zeros. Therefore, the proposed number of output nodes in the output layer of the ANN is K. Accordingly, the input and output nodes of the ANN are determined, and the structure of the ANN is shown in Fig. 1. The ANN is trained on the encoded vectors of the input attributes and the corresponding vectors of the output classes. The training of the ANN proceeds until convergence between the actual and the desired output is achieved. The convergence rate can be


    Fig. 1. The structure of the ANN.

improved by changing the number of iterations, the number of hidden nodes (J), the learning rate, and the momentum rate.

After training the ANN, two groups of weights can be obtained. The first group, (WG1)_{i,j}, includes the weights between input node i and hidden node j. The second group, (WG2)_{j,k}, includes the weights between hidden node j and output node k. The activation function used in the hidden and output nodes of the ANN is the sigmoid function.

The total input to the jth hidden node, IHN_j, is given by

IHN_j = \sum_{i=1}^{I} x_i (WG1)_{i,j}    (4)

The output of the jth hidden node, OHN_j, is given by

OHN_j = \frac{1}{1 + e^{-\sum_{i=1}^{I} x_i (WG1)_{i,j}}}    (5)

The total input to the kth output node, ION_k, is given by

ION_k = \sum_{j=1}^{J} (WG2)_{j,k} \frac{1}{1 + e^{-\sum_{i=1}^{I} x_i (WG1)_{i,j}}}    (6)

So, the final value of the kth output node, \phi_k, is given by

\phi_k = \frac{1}{1 + e^{-\sum_{j=1}^{J} (WG2)_{j,k} / (1 + e^{-\sum_{i=1}^{I} x_i (WG1)_{i,j}})}}    (7)

The function \phi_k = f(x_i, (WG1)_{i,j}, (WG2)_{j,k}) is an exponential function in x_i, since (WG1)_{i,j} and (WG2)_{j,k} are constants. Its maximum output value is equal to one.
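Eq. (7) is the composition of two sigmoid layers. A minimal sketch of evaluating \phi_k for a candidate bit vector follows; the function names and the toy weights in the comments are illustrative, not the paper's:

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def phi(x, WG1, WG2, k):
    """Eq. (7): value of output node k for a binary input vector x.
    WG1[i][j]: input-to-hidden weights; WG2[j][k]: hidden-to-output weights."""
    J = len(WG1[0])
    # Eq. (5): sigmoid of the total input to each hidden node (Eq. (4))
    hidden = [sigmoid(sum(x[i] * WG1[i][j] for i in range(len(x))))
              for j in range(J)]
    # Sigmoid of Eq. (6): the final value of output node k
    return sigmoid(sum(hidden[j] * WG2[j][k] for j in range(J)))
```

Because both layers are sigmoids, \phi_k is bounded in (0, 1), which is why its maximum possible value is one.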

Definition. An input vector, X_m, belongs to class_k iff the element c_k of C_m is 1 and all other elements of C_m are 0.

Consequently, to extract a relation (rule) between the input attributes X_m and a specific class_k, one must find the input vector which maximizes \phi_k. This is an optimization problem and can be stated as:

Maximize


\phi_k(x_i) = \frac{1}{1 + e^{-\sum_{j=1}^{J} (WG2)_{j,k} / (1 + e^{-\sum_{i=1}^{I} x_i (WG1)_{i,j}})}}    (8)

Subject to:

x_i are binary values (0 or 1)    (9)

Since the objective function \phi_k(x_i) is nonlinear and the constraints are binary, this is a nonlinear integer optimization problem. The genetic algorithm (GA) can be used to solve it. The following algorithm explains how the GA can be used to obtain the best chromosome, which maximizes the objective function \phi_k(x_i):

Begin {
    Assume the fitness function is \phi_k(x_i)
    Create a chromosome structure as follows: {
        Generate a number of slots equal to I, which represent the input vector X
        Put a random value, 0 or 1, in each slot }
    G = 0, where G is the generation number
    Create the initial population, P, of T chromosomes, P(t)_G, where t = 1 to T
    Evaluate the fitness function according to P(t)_G
    While termination conditions are not satisfied Do {
        G = G + 1
        Select a number of chromosomes from P(t)_{G-1} according to the roulette-wheel procedure
        Recombine them using crossover and mutation
        Modify the population from P(t)_{G-1} to P(t)_G
        Evaluate the fitness function according to P(t)_G }
    Display the best chromosome that satisfies the conditions }
End
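The loop above can be sketched in Python as a generic binary GA with roulette-wheel selection, one-point crossover and bit-flip mutation; every name and default parameter here is an illustrative assumption, not the paper's implementation:

```python
import random

def genetic_maximize(fitness, I, pop_size=10, generations=200,
                     p_cross=0.25, p_mut=0.01, seed=0):
    """Minimal GA sketch: binary chromosomes of length I, roulette-wheel
    selection, one-point crossover, bit-flip mutation. Defaults are
    illustrative (the paper uses e.g. T = 10 and 1300 generations)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(I)] for _ in range(pop_size)]

    def roulette(pop, fits):
        # Fitness-proportional selection: spin a wheel sized by total fitness.
        r, acc = rng.uniform(0, sum(fits)), 0.0
        for chrom, f in zip(pop, fits):
            acc += f
            if acc >= r:
                return chrom
        return pop[-1]

    best = max(pop, key=fitness)
    for _ in range(generations):
        fits = [fitness(c) for c in pop]
        new_pop = []
        while len(new_pop) < pop_size:
            a, b = roulette(pop, fits), roulette(pop, fits)
            if rng.random() < p_cross:               # one-point crossover
                cut = rng.randrange(1, I)
                a = a[:cut] + b[cut:]
            a = [bit ^ 1 if rng.random() < p_mut else bit for bit in a]
            new_pop.append(a)
        pop = new_pop
        best = max(pop + [best], key=fitness)        # keep the best seen so far
    return best
```

In the paper's setting the fitness function would be \phi_k(x_i) of Eq. (8) evaluated with the trained weight groups.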

To extract a rule belonging to class_k, the best chromosome must be decoded as follows:

The best chromosome is divided into N segments. Each segment represents one attribute, A_n (n = 1, 2, ..., N), and has a corresponding bit length m_n which represents its values.

An attribute value exists if the corresponding bit in the best chromosome equals one, and vice versa.

The operators OR and AND are used to correlate the existing values of the same attribute and of different attributes, respectively.

After getting the set of rules, perform rule refinement and cancel redundant attributes. For example, if an attribute has three values, such as A, B, and C, and a rule looks like: If att_k has the value A or B or C then class_k, such an attribute can be dropped (redundant).

    The overall methodology of rule extraction is shown

    in Fig. 2.
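The decoding and refinement steps can be sketched as follows, using the play-tennis attributes of Section 5; the chromosome shown is the one listed for rule 1 of Table 5, and the helper names are illustrative:

```python
# Decoding sketch: per-attribute segments, OR within a segment, AND across
# attributes, with all-ones segments dropped as redundant (refinement step).
ATTRIBUTES = [
    ("Outlook", ["Sunny", "Overcast", "Rain"]),
    ("Temperature", ["Hot", "Mild", "Cool"]),
    ("Humidity", ["High", "Normal"]),
    ("Wind", ["Weak", "Strong"]),
]

def decode(chromosome):
    """Translate a best chromosome into a rule antecedent string."""
    clauses, pos = [], 0
    for name, values in ATTRIBUTES:
        seg = chromosome[pos:pos + len(values)]
        pos += len(values)
        present = [v for v, bit in zip(values, seg) if bit]
        # An attribute whose values are all present is redundant; one whose
        # values are all absent contributes no clause.
        if present and len(present) < len(values):
            clauses.append(f"{name} is " + " or ".join(present))
    return "If " + " and ".join(clauses)

rule = decode([1, 0, 0, 1, 1, 1, 1, 0, 0, 1])
# Temperature has all three bits set, so it is dropped as redundant.
```

The same procedure applies to any attribute layout once the segment lengths m_n are known.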

    3. Generalization for multiple hidden layers

The objective function obtained in Eq. (8) can be generalized for an ANN which has more than one hidden layer. Fig. 3 shows an ANN that includes three hidden layers.

The function \phi_k in its final form for the kth output node is given by

\phi_k = \frac{1}{1 + e^{-\sum_{j_3=1}^{J} [1/(1 + e^{-A})] (WG4)_{j_3,k}}}    (10)

where

A = \sum_{j_2=1}^{J} \left[ \frac{1}{1 + e^{-\sum_{j_1=1}^{J} [1/(1 + e^{-\sum_{i=1}^{I} x_i (WG1)_{i,j_1}})] (WG2)_{j_1,j_2}}} \right] (WG3)_{j_2,j_3}    (11)

x_i are the input values, where i = 1, 2, ..., I, and I is the total number of nodes in the input layer; j_1 = 1, 2, ..., J for the first hidden layer; j_2 = 1, 2, ..., J for the second hidden layer; j_3 = 1, 2, ..., J for the third hidden layer; J is the total number of nodes in each hidden layer; and k = 1, 2, ..., K, where K is the total number of nodes in the output layer. (WG1)_{i,j_1} is the weights group between the input layer, i,


[Fig. 2 flowchart summary: the database is coded as a bit string and separated into input vectors X_m and corresponding output vectors C_m. An ANN with input nodes 1, ..., I, hidden nodes 1, ..., J and output nodes 1, ..., K is structured with random parameters (learning and momentum coefficients) and trained, iterating until the error is satisfactory or the maximum iteration count is reached, in which case new random parameters are created. The weight groups {(WG1)_{i,j}, (WG2)_{j,k}} are extracted and the general form of the output function \phi_k(x_i) is created. Then, for each class k = 1, ..., K: an initial GA population is created and the fitness function \phi_k(x_i) is maximized through selection, crossover, mutation, and population update until the maximum iteration count is reached; the fitness values are ranked from the top down to a certain level; and the corresponding population is decoded into the equivalent rules that meet class_k.]

Fig. 2. Overall flowchart for the proposed methodology.


[Fig. 3 shows an ANN with three hidden layers: the input attribute vectors X_1, X_2, ..., X_I feed the input layer i, followed by hidden layers j_1, j_2 and j_3 and the output layer k, which produces the output class vectors; the weight groups (WG1)_{i,j_1}, (WG2)_{j_1,j_2}, (WG3)_{j_2,j_3} and (WG4)_{j_3,k} connect successive layers.]

Fig. 3. ANN with three hidden layers.

and the first hidden layer, j_1. (WG2)_{j_1,j_2} is the weights group between the first hidden layer, j_1, and the second hidden layer, j_2. (WG3)_{j_2,j_3} is the weights group between the second hidden layer, j_2, and the third hidden layer, j_3. (WG4)_{j_3,k} is the weights group between the third hidden layer, j_3, and the output layer, k.
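Eqs. (10) and (11) are the two-layer function of Eq. (7) with extra sigmoid layers composed in between. A sketch that handles any number of hidden layers (function name and weight shapes are illustrative assumptions):

```python
import math

def phi_multi(x, weight_groups, k):
    """Generalization of Eqs. (10)-(11): propagate a binary input vector
    through successive weight groups [WG1, WG2, ...], applying the sigmoid
    at every hidden and output node, and return the value of output node k."""
    sigmoid = lambda a: 1.0 / (1.0 + math.exp(-a))
    layer = [float(v) for v in x]
    for W in weight_groups:          # W[a][b]: weight from node a to node b
        layer = [sigmoid(sum(layer[a] * W[a][b] for a in range(len(W))))
                 for b in range(len(W[0]))]
    return layer[k]
```

With one pair of weight groups [WG1, WG2] this reduces to Eq. (7); with four groups [WG1, WG2, WG3, WG4] it reproduces the three-hidden-layer form of Eq. (10).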

4. Personalized marketing and customer retention strategies

As organizations attempt to develop marketing and customer retention strategies, they will need to collect visitor statistics and integrate data across systems. Additionally, there is a need to improve data about inventories. Personalization is a relatively new field, and different authors provide various definitions of the concept [11]. Fig. 4 shows the stages of personalization as an iterative process [2].

Fig. 4. Stages of the personalization process.

A framework for a system to identify individual user behavior, facilitate online shopping, and maximize user satisfaction has been presented [12]. It is clear that an individual user will act based on his or her preferences, attitude and personality, and each individual's behaviors, such as preferences and attitudes, differ from the others'. An individual's activities or expressions are monitored and captured using sensing devices. The individual user behaviors are recognized by pattern recognition systems (PRSs). Intelligent agents are used to make system strategies or plans based on the individual user behaviors and the product state, so that the system can act on individual behaviors to facilitate online shopping.

A proposed record for products and inventories can have the following attributes: product name, color, store size, city, month, quantity, quantity sold, profit. A record for factual data includes: customer ID, customer name, gender, birth date, nationality.


Table 1
Example for the target concept play tennis [8]

Day  Outlook   Temperature  Humidity  Wind    Play tennis
D1   Sunny     Hot          High      Weak    No
D2   Sunny     Hot          High      Strong  No
D3   Overcast  Hot          High      Weak    Yes
D4   Rain      Mild         High      Weak    Yes
D5   Rain      Cool         Normal    Weak    Yes
D6   Rain      Cool         Normal    Strong  No
D7   Overcast  Cool         Normal    Strong  Yes
D8   Sunny     Mild         High      Weak    No
D9   Sunny     Cool         Normal    Weak    Yes
D10  Rain      Mild         Normal    Weak    Yes
D11  Sunny     Mild         Normal    Strong  Yes
D12  Overcast  Mild         High      Strong  Yes
D13  Overcast  Hot          Normal    Weak    Yes
D14  Rain      Mild         High      Strong  No

A record for transactional data may include the attributes: customer ID, date, time, store, product, coupon used.

    5. Illustrative example

A given database, which has four attributes and two different output classes, is shown in Table 1 [8]. The encoded values of the given database are shown in Table 2. The ANN is trained on the encoded input attribute vectors, X_m, and the corresponding output

Table 2
Encoding database

i/p patt.  Outlook, m1 = 3        Temperature, m2 = 3  Humidity, m3 = 2  Wind, m4 = 2      o/p patt.  Play tennis
X_m        Sunny Overcast Rain    Hot  Mild  Cool      High  Norm        Weak  Strong      C_m        No (1)  Yes (2)
           (x1)  (x2)     (x3)    (x4) (x5)  (x6)      (x7)  (x8)        (x9)  (x10)

X1         1     0        0       1    0     0         1     0           1     0           C1         1       0
X2         1     0        0       1    0     0         1     0           0     1           C2         1       0
X3         0     1        0       1    0     0         1     0           1     0           C3         0       1
X4         0     0        1       0    1     0         1     0           1     0           C4         0       1
X5         0     0        1       0    0     1         0     1           1     0           C5         0       1
X6         0     0        1       0    0     1         0     1           0     1           C6         1       0
X7         0     1        0       0    0     1         0     1           0     1           C7         0       1
X8         1     0        0       0    1     0         1     0           1     0           C8         1       0
X9         1     0        0       0    0     1         0     1           1     0           C9         0       1
X10        0     0        1       0    1     0         0     1           1     0           C10        0       1
X11        1     0        0       0    1     0         0     1           0     1           C11        0       1
X12        0     1        0       0    1     0         1     0           0     1           C12        0       1
X13        0     1        0       1    0     0         0     1           1     0           C13        0       1
X14        0     0        1       0    1     0         1     0           0     1           C14        1       0

Table 3
Group of weights (WG1)_{i,j} between input and hidden nodes

Input   Hidden nodes
nodes   H1        H2        H3        H4
x1      4.09699   3.741246  1.2106    1.42853
x2      6.154562  4.56639   0.349845  1.109533
x3      0.82675   1.114981  0.153325  0.47917
x4      0.42227   0.2961    0.19704   0.55404
x5      4.128692  3.07741   0.15498   0.651919
x6      2.73254   2.595217  0.56767   0.32539
x7      4.93463   4.005334  1.17037   0.89697
x8      5.282225  4.36782   0.235355  0.616702
x9      3.060052  3.11607   1.106763  0.56799
x10     3.63009   2.284223  1.36338   1.02158

Table 4
Group of weights (WG2)_{j,k} between hidden and output nodes

Output  Hidden nodes
nodes   H1       H2        H3        H4
1       9.20896  9.012731  1.2113    0.90564
2       9.22879  9.00487   0.773881  1.218929

class vectors, C_m. The number of input nodes is given by

I = \sum_{n=1}^{N} m_n = m1 + m2 + m3 + m4 = 10

The number of output nodes is K = 2.


Table 5
The rule extraction for class no (\phi_1 is maximum)
(A1: Outlook, x1–x3; A2: Temperature, x4–x6; A3: Humidity, x7–x8; A4: Wind, x9–x10)

Rule no.  Fitness   x1 x2 x3 x4 x5 x6 x7 x8 x9 x10  Directly extracted rules (don't play)                                                                Rules refinement
1         0.99988   1  0  0  1  1  1  1  0  0  1    If Outlook is Sunny And Temperature is Hot or Mild or Cool And Humidity is High And Wind is Strong   If Outlook is Sunny And Humidity is High And Wind is Strong
2         0.999874  1  0  1  0  0  1  1  0  0  1    If Outlook is Sunny or Rain And Temperature is Cool And Humidity is High And Wind is Strong          If Outlook is Sunny or Rain And Temperature is Cool And Humidity is High And Wind is Strong
3         0.999867  1  0  0  1  1  1  1  0  1  1    If Outlook is Sunny And Temperature is Hot or Mild or Cool And Humidity is High And Wind is Weak or Strong   If Outlook is Sunny And Humidity is High
4         0.999849  0  0  1  0  0  1  1  1  0  1    If Outlook is Rain And Temperature is Cool And Humidity is High or Normal And Wind is Strong         If Outlook is Rain And Temperature is Cool And Wind is Strong


Table 6
The rule extraction for class yes (\phi_2 is maximum)

Rule no.  Fitness   x1 x2 x3 x4 x5 x6 x7 x8 x9 x10  Directly extracted rules (play)                          Rules refinement
1         0.99998   0  1  1  0  0  0  0  0  1  0    If Outlook is Overcast or Rain And Wind is Weak          If Outlook is Overcast or Rain And Wind is Weak
2         0.999972  0  1  0  0  0  0  0  0  0  0    If Outlook is Overcast                                   If Outlook is Overcast
3         0.999960  1  1  0  0  0  0  0  1  0  0    If Outlook is Sunny or Overcast And Humidity is Normal   If Outlook is Sunny or Overcast And Humidity is Normal


Table 7
RITIO-induced rule set from Table 1 [9]

Rule no.  Rule
1         If Outlook is Sunny And Humidity is High Then CLASS No
2         If Outlook is Overcast And Humidity is High Then CLASS Yes
3         If Humidity is Normal Then CLASS Yes
4         If Humidity is Normal And Wind is Weak Then CLASS Yes
5         If Outlook is Rain And Humidity is High And Wind is Weak Then CLASS Yes
6         If Outlook is Rain And Humidity is Normal And Wind is Strong Then CLASS No
7         If Outlook is Rain And Humidity is High And Wind is Strong Then CLASS No

Convergence between the actual and the desired output is achieved with 4 hidden nodes, a 0.55 learning coefficient, a 0.65 momentum coefficient and 30,000 iterations. The allowable error equals 0.000001. Table 3 shows the first group of weights, (WG1)_{i,j}, between each input node and the hidden nodes. The second group of weights, (WG2)_{j,k}, between each hidden node and the output nodes is shown in Table 4.

The GA is then applied to solve Eq. (8) in order to get the input attribute vector which maximizes that function.

The GA has a population of 10 individuals evolving over 1300 generations. The crossover and mutation probabilities were 0.25 and 0.01, respectively. The output chromosomes of the play and don't play target classes are sorted in descending order according to their fitness values. The threshold levels of the two target classes are 0.99996 and 0.999849, respectively.

Therefore, both the local and global maxima of the output chromosomes have been determined and will be translated into rules. Tables 5 and 6 present the best sets of rules belonging to the don't play and play targets, respectively.

Table 7 shows the RITIO-induced set of rules for the same database [9]. Although RITIO gives a good indication of algorithm stability over different databases, its rule number 3 is not verified. For the algorithm proposed here, all rules are verified.

    6. Application and results

The MONK's problems are benchmark binary classification tasks in which robots are described in terms of six characteristics, and a rule is given which specifies the attributes that determine membership of the

Table 8
The attributes and their values of the MONK1S database [10]

Robot characteristics (attributes)   Nominal values
Head shape                           Round, square, octagon
Body shape                           Round, square, octagon
Is smiling                           Yes, no
Holding                              Sword, flag, balloon
Jacket colour                        Red, yellow, green, blue
Has tie                              Yes, no

target class [10]. The six attributes and their values are shown in Table 8.

The two rules that determine membership of the target class in the MONK1S database are shown in Table 9.

The ANN is trained on 123 input vectors, X_m. The corresponding output class vectors, C_m, are shown in Table 10. The number of input nodes is I = 17, and the number of output nodes is K = 2. Convergence between the actual and desired output is achieved with 6 hidden nodes, a 0.25 learning coefficient, a 0.85 momentum coefficient and 31,999 iterations. The allowable error equals 0.0000001.

Table 11 shows the first group of weights, (WG1)_{i,j}, between each input node and the hidden nodes. The second group of weights, (WG2)_{j,k}, between each hidden node and the output nodes is shown in Table 12.

Table 9
Two rules satisfy the target

Rule 1: If Head Shape Value = Body Shape Value THEN Robot is in Target Class
Rule 2: If Jacket Color = Red THEN Robot is in Target Class

  • 7/29/2019 Rule Extraction

    11/13

    A.E. Elalfi et al. / Applied Soft Computing 4 (2004) 6577 75

Table 10
The MONK1S database [10]

X_m  Head shape  Body shape  Is smiling  Holding  Jacket colour  Has tie  C_m  Target
1    Round       Round       Yes         Sword    Green          Yes      1    Yes
2    Round       Round       Yes         Flag     Yellow         Yes      2    Yes
3    Round       Square      Yes         Sword    Green          Yes      3    No
4    Round       Octagon     Yes         Flag     Blue           Yes      4    No
...
55   Square      Round       Yes         Sword    Green          Yes      55   No
56   Square      Square      Yes         Sword    Green          Yes      56   Yes
57   Square      Square      Yes         Flag     Red            No       57   Yes
58   Square      Octagon     No          Balloon  Red            Yes      58   Yes
...
120  Octagon     Round       No          Sword    Red            Yes      120  Yes
121  Octagon     Round       No          Balloon  Yellow         No       121  No
122  Octagon     Octagon     No          Flag     Yellow         No       122  Yes
123  Octagon     Octagon     No          Flag     Green          No       123  Yes

Table 11
Group of weights (WG1)_{i,j} between each input node and the hidden nodes

Input   Hidden nodes
nodes   H1        H2        H3        H4        H5        H6
x1      5.08851   6.40872   2.478146  0.53785   3.331379  1.01267
x2      4.094656  0.55311   2.24007   1.00648   6.64513   0.53136
x3      2.711605  7.121283  2.49793   0.15809   0.468151  0.3962
x4      2.9641    7.48084   1.351769  0.69977   6.00667   0.18359
x5      0.929943  7.760751  2.3443    0.53314   5.33333   0.2059
x6      3.494829  0.138298  2.217123  0.63468   1.24655   0.4458
x7      0.475753  0.275564  0.829914  1.09122   1.47744   0.8716
x8      0.358807  0.269779  0.623271  1.23704   1.61803   0.92063
x9      0.10996   0.243966  0.019956  0.29096   1.02741   0.006704
x10     0.385337  0.31376   0.989733  0.58041   0.54741   0.50737
x11     0.13311   0.07916   0.539239  1.02715   0.74859   0.77975
x12     7.31878   12.26899  4.98723   0.279794  4.79433   0.471633
x13     2.941625  3.99095   1.822638  0.49974   0.666357  1.03168
x14     2.469945  4.1919    2.270769  0.57977   0.686182  1.01134
x15     2.658616  3.47783   2.435963  0.62123   1.15922   0.59382
x16     0.48247   0.314717  0.777509  0.83715   1.61191   0.56232
x17     0.878135  0.340808  0.315489  0.77439   2.04905   1.21304

Table 12
Group of weights (WG2)_{j,k} between each hidden node and the output nodes

Output  Hidden nodes
nodes   H1        H2        H3       H4       H5        H6
1       13.3740   14.5207   6.48067  0.40159  11.70462  0.52939
2       13.37457  14.52426  6.48808  0.07072  11.7054   0.33697


Table 13
The set of rules belonging to the target class
(A1: Head shape, x1–x3; A2: Body shape, x4–x6; A3: Is smiling, x7–x8; A4: Holding, x9–x11; A5: Jacket colour, x12–x15; A6: Has tie, x16–x17)

Rule no.  Fitness  x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13 x14 x15 x16 x17  Directly extracted rules
1         0.9999                                                               If Jacket Color is Red
2         0.99947  0  0  1  0  0  1  1  1  1  1   1   1   1   1   1   0   0    If Head Shape is Octagon AND Body Shape is Octagon AND Is Smiling is Yes OR No AND Holding is Sword OR Flag OR Balloon AND Jacket Color is Red OR Yellow OR Green OR Blue
3         0.99946  0  1  0  0  1  0  1  1  1  1   1   0   0   0   0   1   1    If Head Shape is Square AND Body Shape is Square AND Is Smiling is Yes OR No AND Holding is Sword OR Flag OR Balloon AND Has Tie is Yes OR No
4         0.99845  1  0  0  1  0  0  1  1  1  1   1   0   0   0   0   0   0    If Head Shape is Round AND Body Shape is Round AND Is Smiling is Yes OR No AND Holding is Sword OR Flag OR Balloon


Table 14
Accuracy results for different algorithms [9]

Database                 MONK1S
HCV (%)                  100
C4.5 (%)                 83.3
RITIO (%)                97.37
C4.5 rules (%)           100
Proposed algorithm (%)   100

The GA has a population of 10 individuals evolving over 1225 generations. The crossover and mutation probabilities are 0.28 and 0.002, respectively. The output chromosomes for the target class are sorted according to their fitness values down to the level 0.99845. Table 13 presents the best set of rules belonging to the target class according to the fitness values.

From Table 13, the rules extracted by the proposed algorithm and the standard rules given in Table 9 are identical. This is a good indication of the algorithm's stability. The accuracy of the proposed algorithm among different algorithms for the MONK1S database is shown in Table 14 [9].

The discovered rules for hypothetical individual person data and products are in the following format:

IF PRODUCT = Hat THEN Profit = Medium.
IF Color = Blue THEN Profit = High.
IF MONTH = June THEN Profit = Medium.
IF MONTH = December THEN Profit = High.

    7. Conclusions

A novel machine learning algorithm for extracting comprehensible rules has been presented in this paper. It does not incur the computational complexity of the deterministic finite state automata (DNF) algorithm. It takes all input attributes into consideration, so it produces accurate rules, whereas other algorithms such as DNF use only the input attributes up to a certain level. Also, it uses only part of the weights to extract the rules belonging to a certain class, so it requires less computational time compared with other algorithms. The proposed methodology does not make any approximation to the activation function.

The user profile information is stored in a database along with a unique user ID and password. A data warehouse repository with such data can be analyzed. This algorithm can help devise rules to govern which messages are offered to an anonymous prospect, how to counter points of resistance, and when to attempt to close a sale.

Future work should consist of more experiments with other data sets, as well as more elaborate experiments to optimize the GA parameters of the proposed algorithm.

    References

[1] J. Jhang, H. Jain, K. Ramamurthy, Effective design of electronic commerce environments: a proposed theory of congruence and an illustration, IEEE Trans. Systems Man Cybernet. Part A: Syst. Hum. 30 (4) (2000) 456–471.
[2] G. Adomavicius, A. Tuzhilin, Using data mining methods to build customer profiles, IEEE Comput. 34 (2) (2001) 74–82.
[3] X. Wu, D. Urpani, Induction by attribute elimination, IEEE Trans. Knowl. Data Eng. 11 (5) (1999) 805–812.
[4] H. Tsukimoto, Extracting rules from trained neural networks, IEEE Trans. Neural Networks 11 (2) (2000) 377–389.
[5] R. Setiono, Extracting M-of-N rules from trained neural networks, IEEE Trans. Neural Networks 11 (2) (2000) 512–519.
[6] F. Wotawa, G. Wotawa, Deriving qualitative rules from neural networks: a case study for ozone forecasting, AI Communications 14 (2001) 23–33, ISSN 0921-7126, IOS Press.
[7] H. Tsukimoto, Extracting rules from trained neural networks, IEEE Trans. Neural Networks 11 (2) (2000) 377–389.
[8] T.M. Mitchell, Machine Learning, 1997.
[9] X. Wu, D. Urpani, Induction by attribute elimination, IEEE Trans. Knowl. Data Eng. 11 (5) (1999).
[10] http://www.cse.unsw.edu.au/~cs3411/C4.5/Data.
[11] Comm. ACM, Special Issue on Personalization, vol. 43, no. 8, 2000.
[12] A.E. El-Alfy, R. Haque, Y. Al-Ohali, A framework to employ multi AI systems to facilitate easy online shopping, http://www-3.ibm.com/easy/eou_ext.nsf/Publish/2049.
[13] R. Setiono, W.K. Leow, J.M. Zurada, Extraction of rules from artificial neural networks for nonlinear regression, IEEE Trans. Neural Networks 13 (3) (2002) 564–577.
