
Applied Soft Computing 4 (2004) 65–77

Extracting rules from trained neural network using GA for managing E-business

A. Ebrahim Elalfi a,*, R. Haque b, M. Esmel Elalami a

a Department of Computer Instructor Preparation, Faculty of Specific Education, Mansoura University, Mansoura, Egypt
b High Tech. International.com, Montreal, Que., Canada

    Received 23 September 2002; received in revised form 13 August 2003; accepted 19 August 2003

    Abstract

The ability to intelligently collect, manage and analyze information about customers and sellers is a key source of competitive advantage for an e-business. This ability provides an opportunity to deliver real-time marketing or services that strengthen customer relationships. It also enables an organization to gather business intelligence about a customer that can be used for future planning and programs.

This paper presents a new algorithm for extracting accurate and comprehensible rules from databases via a trained artificial neural network (ANN) using a genetic algorithm (GA). The new algorithm does not depend on the ANN training algorithm, nor does it modify the training results. The GA is used to find the optimal values of the input attributes (chromosome), X_m, which maximize the output function \phi_k of output node k. The function \phi_k = f(x_i, (WG1)_{i,j}, (WG2)_{j,k}) is a nonlinear exponential function, where (WG1)_{i,j} and (WG2)_{j,k} are the weight groups between the input and hidden nodes, and the hidden and output nodes, respectively. The optimal chromosome is decoded and used to obtain a rule belonging to class_k.
© 2003 Elsevier B.V. All rights reserved.

Keywords: E-business; Artificial neural network; Genetic algorithms; Personalization; Online shopping; Rule extraction

    1. Introduction

E-commerce has evolved from consumers conducting basic transactions on the Web to a complete retooling of the way partners, suppliers and customers transact. One can now link dealers and suppliers online, reducing both lag time and paperwork. One can move procurement online by setting up an extranet that links directly to vendors, cutting inventory carrying costs and becoming more responsive to one's customers. One can also streamline financial relationships with customers and suppliers by Web-enabling billing and payment systems.

* Corresponding author.
E-mail address: ael [email protected] (A.E. Elalfi).

Recent literature suggests that the Internet and WWW, as business transaction tools, provide both firms and consumers with various benefits, including lower transaction costs, lower search costs, and a greater selection of goods [1].

The ability to provide content and services to individuals on the basis of knowledge about their preferences and behavior has become an important marketing tool [2].

1568-4946/$ – see front matter © 2003 Elsevier B.V. All rights reserved.
doi:10.1016/j.asoc.2003.08.004

A complete customer profile has two parts: factual and behavioral. The factual profile contains information, such as name, gender, and date of birth, that the personalization system obtained from the customer's


factual data. The factual profile can also contain information derived from the transaction data. A behavioral profile models the customer's actions and is usually derived from transactional data.

Personalization begins with collecting customer data from various sources. This data might include histories of customers' Web purchasing and browsing activities, as well as demographic and psychological information. After the data is collected, it must be prepared, cleaned, and stored in a data warehouse.

Real-world data is dirty. Data cleaning, including the removal of contradictory and redundant data items and the elimination of irrelevant attributes, has been an important topic in data mining research and development [3].

Extracting rules from a given database via trained neural networks is important [4]. Although several algorithms have been proposed by several researchers [5,6], there is no algorithm which can be applied to any type of network, to any training algorithm, and to both discrete and continuous values [4]. A method for extracting M-of-N rules from trained artificial neural networks (ANNs) was presented by Setiono [5]. However, the algorithm was based on standard three-layered feed-forward networks, and the attributes of the database are assumed to have binary values 1 or −1. Hiroshi presented a decomposition algorithm that can be applied to multilayer ANNs and recurrent networks [6]. The units of the ANN are approximated by Boolean functions. The computational complexity of the approximation is exponential, so a polynomial algorithm was presented [7]. To reduce the computational complexity, higher-order terms were neglected; consequently, the extraction of accurate rules is not guaranteed.

An approach for extracting rules from trained ANNs for regression was presented in [13]. Each rule in the extracted rule set corresponds to a subregion of the input space, and a linear function involving the relevant input attributes of the data approximates the network output for all data samples in this subregion. However, the method extracts rules from the trained ANN by approximating the hidden activation function h(x) = tanh(x) by either a three-piece or a five-piece linear function. This approximation yields less accuracy and makes the computation burdensome.

This paper presents a new algorithm for extracting rules from a trained neural network using a genetic algorithm. It does not depend on the training algorithm of the ANN and does not modify the training results. The algorithm can also be applied to discrete and continuous attributes. It does not make any approximation to the hidden unit activation function. Additionally, it takes into consideration any number of hidden layers in the trained ANN.

The extracted rules can be used to define a customer profile in order to facilitate online shopping.

    2. Problem formulation

A supervised ANN uses a set of training examples or records. These records include N attributes. Each attribute, A_n (n = 1, 2, ..., N), can be encoded into a fixed-length binary sub-string {x_1 ... x_i ... x_{m_n}}, where m_n is the number of possible values of attribute A_n. The element x_i = 1 if its corresponding attribute value exists, while all the other elements are 0. So, the proposed number of input nodes, I, in the input layer of the ANN is given by

I = \sum_{n=1}^{N} m_n    (1)

The input attribute vectors, X_m, to the input layer can be written as

X_m = {x_1 ... x_i ... x_I}_m    (2)

where m = 1, 2, ..., M, and M is the total number of input training patterns.
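As a concrete sketch of this encoding, the play-tennis attributes of Table 1 (used later in Section 5) can be one-hot encoded as follows; the dictionary layout and function name are illustrative, not from the paper:

```python
# Sketch of the attribute encoding of Eqs. (1)-(2) for the Table 1 attributes.
ATTRIBUTES = {
    "Outlook": ["Sunny", "Overcast", "Rain"],   # m1 = 3
    "Temperature": ["Hot", "Mild", "Cool"],     # m2 = 3
    "Humidity": ["High", "Normal"],             # m3 = 2
    "Wind": ["Weak", "Strong"],                 # m4 = 2
}

# Eq. (1): I = sum of m_n = 3 + 3 + 2 + 2 = 10 input nodes.
I = sum(len(values) for values in ATTRIBUTES.values())

def encode(record):
    """Map a record {attribute: value} to the bit vector X_m of Eq. (2):
    one bit per possible value, set to 1 where the value is present."""
    bits = []
    for attr, values in ATTRIBUTES.items():
        bits += [1 if record[attr] == v else 0 for v in values]
    return bits

# Pattern D1 of Table 1 (Sunny, Hot, High, Weak) encodes to X1 of Table 2.
x1 = encode({"Outlook": "Sunny", "Temperature": "Hot",
             "Humidity": "High", "Wind": "Weak"})
```

Each record thus becomes a binary string whose segments correspond one-to-one to the attributes.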

The output class vector, C_k (k = 1, 2, ..., K), can be encoded as a bit vector of fixed length K as follows:

C_k = {c_1 ... c_k ... c_K}    (3)

where K is the number of different possible classes. If the output vector belongs to class_k, then the element c_k is equal to 1 while all the other elements in the vector are zeros. Therefore, the proposed number of output nodes in the output layer of the ANN is K. Accordingly, the input and output nodes of the ANN are determined, and the structure of the ANN is shown in Fig. 1. The ANN is trained on the encoded vectors of the input attributes and the corresponding vectors of the output classes. The training of the ANN proceeds until convergence between the actual and the desired output is achieved. The convergence rate can be


    Fig. 1. The structure of the ANN.

improved by changing the number of iterations, the number of hidden nodes (J), the learning rate, and the momentum rate.

After training the ANN, two groups of weights can be obtained. The first group, (WG1)_{i,j}, includes the weights between input node i and hidden node j. The second group, (WG2)_{j,k}, includes the weights between hidden node j and output node k. The activation function used in the hidden and output nodes of the ANN is the sigmoid function.

The total input to the jth hidden node, IHN_j, is given by

IHN_j = \sum_{i=1}^{I} x_i (WG1)_{i,j}    (4)

The output of the jth hidden node, OHN_j, is given by

OHN_j = \frac{1}{1 + e^{-\sum_{i=1}^{I} x_i (WG1)_{i,j}}}    (5)

The total input to the kth output node, ION_k, is given by

ION_k = \sum_{j=1}^{J} (WG2)_{j,k} \frac{1}{1 + e^{-\sum_{i=1}^{I} x_i (WG1)_{i,j}}}    (6)

So, the final value of the kth output node, \phi_k, is given by

\phi_k = \frac{1}{1 + e^{-\sum_{j=1}^{J} (WG2)_{j,k} / (1 + e^{-\sum_{i=1}^{I} x_i (WG1)_{i,j}})}}    (7)

The function \phi_k = f(x_i, (WG1)_{i,j}, (WG2)_{j,k}) is an exponential function in x_i, since (WG1)_{i,j} and (WG2)_{j,k} are constants. Its maximum output value is equal to one.
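Eq. (7) is the composition of two sigmoid layers. A minimal sketch of evaluating \phi_k for a candidate bit vector follows; the function names and the toy weights in the comments are illustrative, not the paper's:

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def phi(x, WG1, WG2, k):
    """Eq. (7): value of output node k for a binary input vector x.
    WG1[i][j]: input-to-hidden weights; WG2[j][k]: hidden-to-output weights."""
    J = len(WG1[0])
    # Eq. (5): sigmoid of the total input to each hidden node (Eq. (4))
    hidden = [sigmoid(sum(x[i] * WG1[i][j] for i in range(len(x))))
              for j in range(J)]
    # Sigmoid of Eq. (6): the final value of output node k
    return sigmoid(sum(hidden[j] * WG2[j][k] for j in range(J)))
```

Because both layers are sigmoids, \phi_k is bounded in (0, 1), which is why its maximum possible value is one.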

Definition. An input vector, X_m, belongs to class_k iff the element c_k of C_m is 1 and all other elements of C_m are 0.

Consequently, to extract a relation (rule) between the input attributes X_m and a specific class_k, one must find the input vector which maximizes \phi_k. This is an optimization problem and can be stated as:

Maximize


\phi_k(x_i) = \frac{1}{1 + e^{-\sum_{j=1}^{J} (WG2)_{j,k} / (1 + e^{-\sum_{i=1}^{I} x_i (WG1)_{i,j}})}}    (8)

Subject to:

x_i are binary values (0 or 1)    (9)

Since the objective function \phi_k(x_i) is nonlinear and the constraints are binary, this is a nonlinear integer optimization problem. The genetic algorithm (GA) can be used to solve it. The following algorithm explains how the GA can be used to obtain the best chromosome, which maximizes the objective function \phi_k(x_i):

Begin {
    Assume the fitness function is \phi_k(x_i)
    Create a chromosome structure as follows: {
        Generate a number of slots equal to I, which represent the input vector X
        Put a random value, 0 or 1, in each slot }
    G = 0, where G is the generation number
    Create the initial population, P, of T chromosomes, P(t)_G, where t = 1 to T
    Evaluate the fitness function according to P(t)_G
    While termination conditions are not satisfied Do {
        G = G + 1
        Select a number of chromosomes from P(t)_{G-1} according to the roulette-wheel procedure
        Recombine them using crossover and mutation
        Modify the population from P(t)_{G-1} to P(t)_G
        Evaluate the fitness function according to P(t)_G }
    Display the best chromosome that satisfies the conditions }
End
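The loop above can be sketched in Python as a generic binary GA with roulette-wheel selection, one-point crossover and bit-flip mutation; every name and default parameter here is an illustrative assumption, not the paper's implementation:

```python
import random

def genetic_maximize(fitness, I, pop_size=10, generations=200,
                     p_cross=0.25, p_mut=0.01, seed=0):
    """Minimal GA sketch: binary chromosomes of length I, roulette-wheel
    selection, one-point crossover, bit-flip mutation. Defaults are
    illustrative (the paper uses e.g. T = 10 and 1300 generations)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(I)] for _ in range(pop_size)]

    def roulette(pop, fits):
        # Fitness-proportional selection: spin a wheel sized by total fitness.
        r, acc = rng.uniform(0, sum(fits)), 0.0
        for chrom, f in zip(pop, fits):
            acc += f
            if acc >= r:
                return chrom
        return pop[-1]

    best = max(pop, key=fitness)
    for _ in range(generations):
        fits = [fitness(c) for c in pop]
        new_pop = []
        while len(new_pop) < pop_size:
            a, b = roulette(pop, fits), roulette(pop, fits)
            if rng.random() < p_cross:               # one-point crossover
                cut = rng.randrange(1, I)
                a = a[:cut] + b[cut:]
            a = [bit ^ 1 if rng.random() < p_mut else bit for bit in a]
            new_pop.append(a)
        pop = new_pop
        best = max(pop + [best], key=fitness)        # keep the best seen so far
    return best
```

In the paper's setting the fitness function would be \phi_k(x_i) of Eq. (8) evaluated with the trained weight groups.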

To extract a rule belonging to class_k, the best chromosome must be decoded as follows:

The best chromosome is divided into N segments. Each segment represents one attribute, A_n (n = 1, 2, ..., N), and has a corresponding bit length m_n which represents its values.

An attribute value exists if the corresponding bit in the best chromosome equals one, and vice versa.

The operators OR and AND are used to correlate the existing values of the same attribute and of different attributes, respectively.

After getting the set of rules, perform rule refinement and cancel redundant attributes. For example, if an attribute has three values, such as A, B, and C, and a rule looks like: If att_k has the value A or B or C then class_k, such an attribute can be dropped (redundant).

    The overall methodology of rule extraction is shown

    in Fig. 2.
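The decoding and refinement steps can be sketched as follows, using the play-tennis attributes of Section 5; the chromosome shown is the one listed for rule 1 of Table 5, and the helper names are illustrative:

```python
# Decoding sketch: per-attribute segments, OR within a segment, AND across
# attributes, with all-ones segments dropped as redundant (refinement step).
ATTRIBUTES = [
    ("Outlook", ["Sunny", "Overcast", "Rain"]),
    ("Temperature", ["Hot", "Mild", "Cool"]),
    ("Humidity", ["High", "Normal"]),
    ("Wind", ["Weak", "Strong"]),
]

def decode(chromosome):
    """Translate a best chromosome into a rule antecedent string."""
    clauses, pos = [], 0
    for name, values in ATTRIBUTES:
        seg = chromosome[pos:pos + len(values)]
        pos += len(values)
        present = [v for v, bit in zip(values, seg) if bit]
        # An attribute whose values are all present is redundant; one whose
        # values are all absent contributes no clause.
        if present and len(present) < len(values):
            clauses.append(f"{name} is " + " or ".join(present))
    return "If " + " and ".join(clauses)

rule = decode([1, 0, 0, 1, 1, 1, 1, 0, 0, 1])
# Temperature has all three bits set, so it is dropped as redundant.
```

The same procedure applies to any attribute layout once the segment lengths m_n are known.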

    3. Generalization for multiple hidden layers

The objective function obtained in Eq. (8) can be generalized for an ANN which has more than one hidden layer. Fig. 3 shows an ANN that includes three hidden layers.

The function \phi_k in its final form for the kth output node is given by

\phi_k = \frac{1}{1 + e^{-\sum_{j_3=1}^{J} [1/(1 + e^{-A})] (WG4)_{j_3,k}}}    (10)

where

A = \sum_{j_2=1}^{J} \left[ \frac{1}{1 + e^{-\sum_{j_1=1}^{J} [1/(1 + e^{-\sum_{i=1}^{I} x_i (WG1)_{i,j_1}})] (WG2)_{j_1,j_2}}} \right] (WG3)_{j_2,j_3}    (11)

x_i are the input values, where i = 1, 2, ..., I, and I is the total number of nodes in the input layer; j_1 = 1, 2, ..., J for the first hidden layer; j_2 = 1, 2, ..., J for the second hidden layer; j_3 = 1, 2, ..., J for the third hidden layer; J is the total number of nodes in each hidden layer; and k = 1, 2, ..., K, where K is the total number of nodes in the output layer. (WG1)_{i,j_1} is the weights group between the input layer, i,


[Fig. 2 flowchart summary: the database is coded as a bit string and separated into input vectors X_m and corresponding output vectors C_m. An ANN with input nodes 1, ..., I, hidden nodes 1, ..., J and output nodes 1, ..., K is structured with random parameters (learning and momentum coefficients) and trained, iterating until the error is satisfactory or the maximum iteration count is reached, in which case new random parameters are created. The weight groups {(WG1)_{i,j}, (WG2)_{j,k}} are extracted and the general form of the output function \phi_k(x_i) is created. Then, for each class k = 1, ..., K: an initial GA population is created and the fitness function \phi_k(x_i) is maximized through selection, crossover, mutation, and population update until the maximum iteration count is reached; the fitness values are ranked from the top down to a certain level; and the corresponding population is decoded into the equivalent rules that meet class_k.]

Fig. 2. Overall flowchart for the proposed methodology.


[Fig. 3 shows an ANN with three hidden layers: the input attribute vectors X_1, X_2, ..., X_I feed the input layer i, followed by hidden layers j_1, j_2 and j_3 and the output layer k, which produces the output class vectors; the weight groups (WG1)_{i,j_1}, (WG2)_{j_1,j_2}, (WG3)_{j_2,j_3} and (WG4)_{j_3,k} connect successive layers.]

Fig. 3. ANN with three hidden layers.

and the first hidden layer, j_1. (WG2)_{j_1,j_2} is the weights group between the first hidden layer, j_1, and the second hidden layer, j_2. (WG3)_{j_2,j_3} is the weights group between the second hidden layer, j_2, and the third hidden layer, j_3. (WG4)_{j_3,k} is the weights group between the third hidden layer, j_3, and the output layer, k.
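Eqs. (10) and (11) are the two-layer function of Eq. (7) with extra sigmoid layers composed in between. A sketch that handles any number of hidden layers (function name and weight shapes are illustrative assumptions):

```python
import math

def phi_multi(x, weight_groups, k):
    """Generalization of Eqs. (10)-(11): propagate a binary input vector
    through successive weight groups [WG1, WG2, ...], applying the sigmoid
    at every hidden and output node, and return the value of output node k."""
    sigmoid = lambda a: 1.0 / (1.0 + math.exp(-a))
    layer = [float(v) for v in x]
    for W in weight_groups:          # W[a][b]: weight from node a to node b
        layer = [sigmoid(sum(layer[a] * W[a][b] for a in range(len(W))))
                 for b in range(len(W[0]))]
    return layer[k]
```

With one pair of weight groups [WG1, WG2] this reduces to Eq. (7); with four groups [WG1, WG2, WG3, WG4] it reproduces the three-hidden-layer form of Eq. (10).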

4. Personalized marketing and customer retention strategies

As organizations attempt to develop marketing and customer retention strategies, they will need to collect visitor statistics and integrate data across systems. Additionally, there is a need to improve data about inventories. Personalization is a relatively new field, and different authors provide various definitions of the concept [11]. Fig. 4 shows the stages of personalization as an iterative process [2].

Fig. 4. Stages of the personalization process.

A framework for a system to identify individual user behavior, facilitate online shopping, and maximize user satisfaction has been presented [12]. It is clear that an individual user will act based on his or her preferences, attitude and personality, and each individual's behaviors, such as preferences and attitudes, differ from the others'. An individual's activities or expressions are monitored and captured using sensing devices. The individual user behaviors are recognized by pattern recognition systems (PRSs). Intelligent agents are used to make system strategies or plans based on the individual user behaviors and the product state, so that the system can act on individual behaviors to facilitate online shopping.

A proposed record for products and inventories can have the following attributes: product name, color, store size, city, month, quantity, quantity sold, profit. A record for factual data includes: customer ID, customer name, gender, birth date, nationality.


Table 1
Example for the target concept play tennis [8]

Day  Outlook   Temperature  Humidity  Wind    Play tennis
D1   Sunny     Hot          High      Weak    No
D2   Sunny     Hot          High      Strong  No
D3   Overcast  Hot          High      Weak    Yes
D4   Rain      Mild         High      Weak    Yes
D5   Rain      Cool         Normal    Weak    Yes
D6   Rain      Cool         Normal    Strong  No
D7   Overcast  Cool         Normal    Strong  Yes
D8   Sunny     Mild         High      Weak    No
D9   Sunny     Cool         Normal    Weak    Yes
D10  Rain      Mild         Normal    Weak    Yes
D11  Sunny     Mild         Normal    Strong  Yes
D12  Overcast  Mild         High      Strong  Yes
D13  Overcast  Hot          Normal    Weak    Yes
D14  Rain      Mild         High      Strong  No

A record for transactional data may include the attributes: customer ID, date, time, store, product, coupon used.

    5. Illustrative example

A given database, which has four attributes and two different output classes, is shown in Table 1 [8]. The encoded values of the given database are shown in Table 2. The ANN is trained on the encoded input attribute vectors, X_m, and the corresponding output

Table 2
Encoding database

i/p patt.  Outlook, m1 = 3        Temperature, m2 = 3  Humidity, m3 = 2  Wind, m4 = 2      o/p patt.  Play tennis
X_m        Sunny Overcast Rain    Hot  Mild  Cool      High  Norm        Weak  Strong      C_m        No (1)  Yes (2)
           (x1)  (x2)     (x3)    (x4) (x5)  (x6)      (x7)  (x8)        (x9)  (x10)

X1         1     0        0       1    0     0         1     0           1     0           C1         1       0
X2         1     0        0       1    0     0         1     0           0     1           C2         1       0
X3         0     1        0       1    0     0         1     0           1     0           C3         0       1
X4         0     0        1       0    1     0         1     0           1     0           C4         0       1
X5         0     0        1       0    0     1         0     1           1     0           C5         0       1
X6         0     0        1       0    0     1         0     1           0     1           C6         1       0
X7         0     1        0       0    0     1         0     1           0     1           C7         0       1
X8         1     0        0       0    1     0         1     0           1     0           C8         1       0
X9         1     0        0       0    0     1         0     1           1     0           C9         0       1
X10        0     0        1       0    1     0         0     1           1     0           C10        0       1
X11        1     0        0       0    1     0         0     1           0     1           C11        0       1
X12        0     1        0       0    1     0         1     0           0     1           C12        0       1
X13        0     1        0       1    0     0         0     1           1     0           C13        0       1
X14        0     0        1       0    1     0         1     0           0     1           C14        1       0

Table 3
Group of weights (WG1)_{i,j} between input and hidden nodes

Input   Hidden nodes
nodes   H1        H2        H3        H4
x1      4.09699   3.741246  1.2106    1.42853
x2      6.154562  4.56639   0.349845  1.109533
x3      0.82675   1.114981  0.153325  0.47917
x4      0.42227   0.2961    0.19704   0.55404
x5      4.128692  3.07741   0.15498   0.651919
x6      2.73254   2.595217  0.56767   0.32539
x7      4.93463   4.005334  1.17037   0.89697
x8      5.282225  4.36782   0.235355  0.616702
x9      3.060052  3.11607   1.106763  0.56799
x10     3.63009   2.284223  1.36338   1.02158

Table 4
Group of weights (WG2)_{j,k} between hidden and output nodes

Output  Hidden nodes
nodes   H1       H2        H3        H4
1       9.20896  9.012731  1.2113    0.90564
2       9.22879  9.00487   0.773881  1.218929

class vectors, C_m. The number of input nodes is given by

I = \sum_{n=1}^{N} m_n = m1 + m2 + m3 + m4 = 10

The number of output nodes is K = 2.


Table 5
The rule extraction for class no (\phi_1 is maximum)
(A1: Outlook, x1–x3; A2: Temperature, x4–x6; A3: Humidity, x7–x8; A4: Wind, x9–x10)

Rule no.  Fitness   x1 x2 x3 x4 x5 x6 x7 x8 x9 x10  Directly extracted rules (don't play)                                                                Rules refinement
1         0.99988   1  0  0  1  1  1  1  0  0  1    If Outlook is Sunny And Temperature is Hot or Mild or Cool And Humidity is High And Wind is Strong   If Outlook is Sunny And Humidity is High And Wind is Strong
2         0.999874  1  0  1  0  0  1  1  0  0  1    If Outlook is Sunny or Rain And Temperature is Cool And Humidity is High And Wind is Strong          If Outlook is Sunny or Rain And Temperature is Cool And Humidity is High And Wind is Strong
3         0.999867  1  0  0  1  1  1  1  0  1  1    If Outlook is Sunny And Temperature is Hot or Mild or Cool And Humidity is High And Wind is Weak or Strong   If Outlook is Sunny And Humidity is High
4         0.999849  0  0  1  0  0  1  1  1  0  1    If Outlook is Rain And Temperature is Cool And Humidity is High or Normal And Wind is Strong         If Outlook is Rain And Temperature is Cool And Wind is Strong


Table 6
The rule extraction for class yes (\phi_2 is maximum)

Rule no.  Fitness   x1 x2 x3 x4 x5 x6 x7 x8 x9 x10  Directly extracted rules (play)                          Rules refinement
1         0.99998   0  1  1  0  0  0  0  0  1  0    If Outlook is Overcast or Rain And Wind is Weak          If Outlook is Overcast or Rain And Wind is Weak
2         0.999972  0  1  0  0  0  0  0  0  0  0    If Outlook is Overcast                                   If Outlook is Overcast
3         0.999960  1  1  0  0  0  0  0  1  0  0    If Outlook is Sunny or Overcast And Humidity is Normal   If Outlook is Sunny or Overcast And Humidity is Normal


Table 7
RITIO-induced rule set from Table 1 [9]

Rule no.  Rule
1         If Outlook is Sunny And Humidity is High Then CLASS No
2         If Outlook is Overcast And Humidity is High Then CLASS Yes
3         If Humidity is Normal Then CLASS Yes
4         If Humidity is Normal And Wind is Weak Then CLASS Yes
5         If Outlook is Rain And Humidity is High And Wind is Weak Then CLASS Yes
6         If Outlook is Rain And Humidity is Normal And Wind is Strong Then CLASS No
7         If Outlook is Rain And Humidity is High And Wind is Strong Then CLASS No

Convergence between the actual and the desired output is achieved with 4 hidden nodes, a 0.55 learning coefficient, a 0.65 momentum coefficient and 30,000 iterations. The allowable error equals 0.000001. Table 3 shows the first group of weights, (WG1)_{i,j}, between each input node and the hidden nodes. The second group of weights, (WG2)_{j,k}, between each hidden node and the output nodes is shown in Table 4.

The GA is then applied to solve Eq. (8) in order to get the input attribute vector which maximizes that function.

The GA has a population of 10 individuals evolving over 1300 generations. The crossover and mutation probabilities were 0.25 and 0.01, respectively. The output chromosomes of the play and don't play target classes are sorted in descending order according to their fitness values. The threshold levels of the two target classes are 0.99996 and 0.999849, respectively.

Therefore, both the local and global maxima of the output chromosomes have been determined and will be translated into rules. Tables 5 and 6 present the best sets of rules belonging to the don't play and play targets, respectively.

Table 7 shows the RITIO-induced set of rules for the same database [9]. Although RITIO gives a good indication of algorithm stability over different databases, its rule number 3 is not verified. For the algorithm proposed here, all rules are verified.

    6. Application and results

The MONK's problems are benchmark binary classification tasks in which robots are described in terms of six characteristics, and a rule is given which specifies the attributes that determine membership of the

Table 8
The attributes and their values of the MONK1S database [10]

Robot characteristics (attributes)   Nominal values
Head shape                           Round, square, octagon
Body shape                           Round, square, octagon
Is smiling                           Yes, no
Holding                              Sword, flag, balloon
Jacket colour                        Red, yellow, green, blue
Has tie                              Yes, no

target class [10]. The six attributes and their values are shown in Table 8.

The two rules that determine membership of the target class in the MONK1S database are shown in Table 9.

The ANN is trained on 123 input vectors, X_m. The corresponding output class vectors, C_m, are shown in Table 10. The number of input nodes is I = 17, and the number of output nodes is K = 2. Convergence between the actual and desired output is achieved with 6 hidden nodes, a 0.25 learning coefficient, a 0.85 momentum coefficient and 31,999 iterations. The allowable error equals 0.0000001.

Table 11 shows the first group of weights, (WG1)_{i,j}, between each input node and the hidden nodes. The second group of weights, (WG2)_{j,k}, between each hidden node and the output nodes is shown in Table 12.

Table 9
Two rules satisfy the target

Rule 1: If Head Shape Value = Body Shape Value THEN Robot is in Target Class
Rule 2: If Jacket Color = Red THEN Robot is in Target Class

  • 7/29/2019 Rule Extraction

    11/13

    A.E. Elalfi et al. / Applied Soft Computing 4 (2004) 6577 75

Table 10
The MONK1S database [10]

X_m  Head shape  Body shape  Is smiling  Holding  Jacket colour  Has tie  C_m  Target
1    Round       Round       Yes         Sword    Green          Yes      1    Yes
2    Round       Round       Yes         Flag     Yellow         Yes      2    Yes
3    Round       Square      Yes         Sword    Green          Yes      3    No
4    Round       Octagon     Yes         Flag     Blue           Yes      4    No
...
55   Square      Round       Yes         Sword    Green          Yes      55   No
56   Square      Square      Yes         Sword    Green          Yes      56   Yes
57   Square      Square      Yes         Flag     Red            No       57   Yes
58   Square      Octagon     No          Balloon  Red            Yes      58   Yes
...
120  Octagon     Round       No          Sword    Red            Yes      120  Yes
121  Octagon     Round       No          Balloon  Yellow         No       121  No
122  Octagon     Octagon     No          Flag     Yellow         No       122  Yes
123  Octagon     Octagon     No          Flag     Green          No       123  Yes

Table 11
Group of weights (WG1)_{i,j} between each input node and the hidden nodes

Input   Hidden nodes
nodes   H1        H2        H3        H4        H5        H6
x1      5.08851   6.40872   2.478146  0.53785   3.331379  1.01267
x2      4.094656  0.55311   2.24007   1.00648   6.64513   0.53136
x3      2.711605  7.121283  2.49793   0.15809   0.468151  0.3962
x4      2.9641    7.48084   1.351769  0.69977   6.00667   0.18359
x5      0.929943  7.760751  2.3443    0.53314   5.33333   0.2059
x6      3.494829  0.138298  2.217123  0.63468   1.24655   0.4458
x7      0.475753  0.275564  0.829914  1.09122   1.47744   0.8716
x8      0.358807  0.269779  0.623271  1.23704   1.61803   0.92063
x9      0.10996   0.243966  0.019956  0.29096   1.02741   0.006704
x10     0.385337  0.31376   0.989733  0.58041   0.54741   0.50737
x11     0.13311   0.07916   0.539239  1.02715   0.74859   0.77975
x12     7.31878   12.26899  4.98723   0.279794  4.79433   0.471633
x13     2.941625  3.99095   1.822638  0.49974   0.666357  1.03168
x14     2.469945  4.1919    2.270769  0.57977   0.686182  1.01134
x15     2.658616  3.47783   2.435963  0.62123   1.15922   0.59382
x16     0.48247   0.314717  0.777509  0.83715   1.61191   0.56232
x17     0.878135  0.340808  0.315489  0.77439   2.04905   1.21304

Table 12
Group of weights (WG2)_{j,k} between each hidden node and the output nodes

Output  Hidden nodes
nodes   H1        H2        H3       H4       H5        H6
1       13.3740   14.5207   6.48067  0.40159  11.70462  0.52939
2       13.37457  14.52426  6.48808  0.07072  11.7054   0.33697


Table 13
The set of rules belonging to the target class
(A1: Head shape, x1–x3; A2: Body shape, x4–x6; A3: Is smiling, x7–x8; A4: Holding, x9–x11; A5: Jacket colour, x12–x15; A6: Has tie, x16–x17)

Rule no.  Fitness  x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13 x14 x15 x16 x17  Directly extracted rules
1         0.9999                                                               If Jacket Color is Red
2         0.99947  0  0  1  0  0  1  1  1  1  1   1   1   1   1   1   0   0    If Head Shape is Octagon AND Body Shape is Octagon AND Is Smiling is Yes OR No AND Holding is Sword OR Flag OR Balloon AND Jacket Color is Red OR Yellow OR Green OR Blue
3         0.99946  0  1  0  0  1  0  1  1  1  1   1   0   0   0   0   1   1    If Head Shape is Square AND Body Shape is Square AND Is Smiling is Yes OR No AND Holding is Sword OR Flag OR Balloon AND Has Tie is Yes OR No
4         0.99845  1  0  0  1  0  0  1  1  1  1   1   0   0   0   0   0   0    If Head Shape is Round AND Body Shape is Round AND Is Smiling is Yes OR No AND Holding is Sword OR Flag OR Balloon


Table 14
Accuracy results for different algorithms [9]

Database                 MONK1S
HCV (%)                  100
C4.5 (%)                 83.3
RITIO (%)                97.37
C4.5 rules (%)           100
Proposed algorithm (%)   100

The GA has a population of 10 individuals evolving over 1225 generations. The crossover and mutation probabilities are 0.28 and 0.002, respectively. The output chromosomes for the target class are sorted according to their fitness values down to the level 0.99845. Table 13 presents the best set of rules belonging to the target class according to the fitness values.

From Table 13, the rules extracted by the proposed algorithm and the standard rules given in Table 9 are identical. This is a good indication of the algorithm's stability. The accuracy of the proposed algorithm among different algorithms for the MONK1S database is shown in Table 14 [9].

The discovered rules for hypothetical individual person data and products are in the following format:

IF PRODUCT = Hat THEN Profit = Medium.
IF Color = Blue THEN Profit = High.
IF MONTH = June THEN Profit = Medium.
IF MONTH = December THEN Profit = High.

    7. Conclusions

A novel machine learning algorithm for extracting comprehensible rules has been presented in this paper. It does not incur the computational complexity of the deterministic finite state automata (DNF) algorithm. It takes all input attributes into consideration, so it produces accurate rules, whereas other algorithms such as DNF use only the input attributes up to a certain level. Also, it uses only part of the weights to extract the rules belonging to a certain class, so it requires less computational time compared with other algorithms. The proposed methodology does not make any approximation to the activation function.

The user profile information is stored in a database along with a unique user ID and password. A data warehouse repository with such data can be analyzed. This algorithm can help devise rules to govern which messages are offered to an anonymous prospect, how to counter points of resistance, and when to attempt to close a sale.

Future work should consist of more experiments with other data sets, as well as more elaborate experiments to optimize the GA parameters of the proposed algorithm.

    References

[1] J. Jhang, H. Jain, K. Ramamurthy, Effective design of electronic commerce environments: a proposed theory of congruence and an illustration, IEEE Trans. Systems Man Cybernet. Part A: Syst. Hum. 30 (4) (2000) 456–471.
[2] G. Adomavicius, A. Tuzhilin, Using data mining methods to build customer profiles, IEEE Comput. 34 (2) (2001) 74–82.
[3] X. Wu, D. Urpani, Induction by attribute elimination, IEEE Trans. Knowl. Data Eng. 11 (5) (1999) 805–812.
[4] H. Tsukimoto, Extracting rules from trained neural networks, IEEE Trans. Neural Networks 11 (2) (2000) 377–389.
[5] R. Setiono, Extracting M-of-N rules from trained neural networks, IEEE Trans. Neural Networks 11 (2) (2000) 512–519.
[6] F. Wotawa, G. Wotawa, Deriving qualitative rules from neural networks: a case study for ozone forecasting, AI Communications 14 (2001) 23–33, ISSN 0921-7126, IOS Press.
[7] H. Tsukimoto, Extracting rules from trained neural networks, IEEE Trans. Neural Networks 11 (2) (2000) 377–389.
[8] T.M. Mitchell, Machine Learning, 1997.
[9] X. Wu, D. Urpani, Induction by attribute elimination, IEEE Trans. Knowl. Data Eng. 11 (5) (1999).
[10] http://www.cse.unsw.edu.au/~cs3411/C4.5/Data.
[11] Comm. ACM, Special Issue on Personalization, vol. 43, no. 8, 2000.
[12] A.E. El-Alfy, R. Haque, Y. Al-Ohali, A framework to employ multi AI systems to facilitate easy online shopping, http://www-3.ibm.com/easy/eou_ext.nsf/Publish/2049.
[13] R. Setiono, W.K. Leow, J.M. Zurada, Extraction of rules from artificial neural networks for nonlinear regression, IEEE Trans. Neural Networks 13 (3) (2002) 564–577.
