Applied Soft Computing 4 (2004) 65–77
Extracting rules from trained neural network using GA for managing E-business

A. Ebrahim Elalfi a, R. Haque b, M. Esmel Elalami a

a Department of Computer Instructor Preparation, Faculty of Specific Education, Mansoura University, Mansoura, Egypt
b High Tech. International.com, Montreal, Que., Canada
Received 23 September 2002; received in revised form 13 August 2003; accepted 19 August 2003
Abstract
The ability to intelligently collect, manage and analyze information about customers and sellers is a key source of competitive advantage for an e-business. This ability provides an opportunity to deliver real-time marketing or services that strengthen customer relationships. It also enables an organization to gather business intelligence about a customer that can be used for future planning and programs.

This paper presents a new algorithm for extracting accurate and comprehensible rules from databases via a trained artificial neural network (ANN) using a genetic algorithm (GA). The new algorithm does not depend on the ANN training algorithm, nor does it modify the training results. The GA is used to find the optimal values of the input attributes (chromosome), Xm, which maximize the output function φk of output node k. The function φk = f(xi, (WG1)i,j, (WG2)j,k) is a nonlinear exponential function, where (WG1)i,j and (WG2)j,k are the weight groups between input and hidden nodes, and between hidden and output nodes, respectively. The optimal chromosome is decoded to obtain a rule belonging to class k.

© 2003 Elsevier B.V. All rights reserved.
Keywords: E-business; Artificial neural network; Genetic algorithms; Personalization; Online shopping; Rule extraction
1. Introduction
E-commerce has evolved from consumers conducting basic transactions on the Web to a complete retooling of the way partners, suppliers and customers transact. Now one can link dealers and suppliers online, reducing both lag time and paperwork. One can move procurement online by setting up an extranet that links directly to vendors, cutting inventory carrying costs and becoming more responsive to customers. One can also streamline financial relationships with customers and suppliers by Web-enabling billing and payment systems.

Corresponding author.
E-mail address: ael [email protected] (A.E. Elalfi).
Recent literature suggests that the Internet and WWW as a business transaction tool provide both firms and consumers with various benefits, including lower transaction cost, lower search cost, and greater selection of goods [1].

The ability to provide content and services to individuals on the basis of knowledge about their preferences and behavior has become an important marketing tool [2].
A complete customer profile has two parts: factual and behavioral. The factual profile contains information, such as name, gender, and date of birth, that the personalization system obtained from the customer's factual data. The factual profile can also contain information derived from the transaction data. A behavioral profile models the customer's actions and is usually derived from transactional data.

1568-4946/$ – see front matter © 2003 Elsevier B.V. All rights reserved.
doi:10.1016/j.asoc.2003.08.004

Personalization begins with collecting customer
data from various sources. This data might include histories of customers' Web purchasing and browsing activities, as well as demographic and psychological information. After the data is collected, it must be prepared, cleaned, and stored in a data warehouse.

Real-world data is dirty. Data cleaning, including the removal of contradictory and redundant data items and the elimination of irrelevant attributes, has been an important topic in data mining research and development [3].
Extracting rules from a given database via trained neural networks is important [4]. Although several algorithms have been proposed by several researchers [5,6], there is no algorithm which can be applied to any type of network, to any training algorithm, and to both discrete and continuous values [4]. A method for extracting M-of-N rules from trained artificial neural networks (ANN) was presented by Setiono [5]. However, the algorithm was based on the standard three-layered feed-forward network, and the attributes of the database are assumed to have binary values −1 or 1. Hiroshi presented a decomposition algorithm that can be applied to multilayer ANNs and recurrent networks [6]. The units of the ANN are approximated by Boolean functions. The computational complexity of the approximation is exponential, so a polynomial algorithm was presented [7]. To reduce the computational complexity, higher-order terms were neglected; consequently, the extraction of accurate rules is not guaranteed.
An approach for extracting rules from trained ANNs for regression was presented in [13]. Each rule in the extracted rule set corresponds to a subregion of the input space, and a linear function involving the relevant input attributes of the data approximates the network output for all data samples in this subregion. However, the method extracts rules from the trained ANN by approximating the hidden activation function h(x) = tanh(x) by either a three-piece or a five-piece linear function. This approximation yields less accuracy and makes the computation burdensome.
This paper presents a new algorithm for extracting rules from a trained neural network using a genetic algorithm. It does not depend on the training algorithm of the ANN and does not modify the training results. The algorithm can be applied to discrete and continuous attributes, and it makes no approximation to the hidden-unit activation function. Additionally, it takes into consideration any number of hidden layers in the trained ANN.

The extracted rules can be used to define a customer profile in order to make online shopping easy.
2. Problem formulation
A supervised ANN uses a set of training examples or records. These records include N attributes. Each attribute, An (n = 1, 2, ..., N), can be encoded into a fixed-length binary sub-string {x1 ... xi ... xmn}, where mn is the number of possible values of attribute An. The element xi = 1 if its corresponding attribute value exists, while all the other elements are 0. So, the proposed number of input nodes, I, in the input layer of the ANN is given by

I = \sum_{n=1}^{N} m_n    (1)

The input attribute vectors, Xm, to the input layer can be written as

X_m = \{x_1 ... x_i ... x_I\}_m    (2)

where m = 1, 2, ..., M, and M is the total number of input training patterns.
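This one-hot encoding can be sketched in Python (a minimal illustration; the attribute names and values are taken from the play-tennis example used later in the paper, and the function name is our own):

```python
# One-hot encode a record: each attribute An with mn possible values
# contributes mn bits to the input vector, exactly one of which is set.
ATTRIBUTES = [
    ("Outlook", ["Sunny", "Overcast", "Rain"]),
    ("Temperature", ["Hot", "Mild", "Cool"]),
    ("Humidity", ["High", "Normal"]),
    ("Wind", ["Weak", "Strong"]),
]

def encode(record):
    """Map a list of attribute values to the binary input vector Xm."""
    bits = []
    for (name, values), value in zip(ATTRIBUTES, record):
        bits.extend(1 if v == value else 0 for v in values)
    return bits

# I = sum of mn = 3 + 3 + 2 + 2 = 10 input nodes
X1 = encode(["Sunny", "Hot", "High", "Weak"])
print(X1)  # [1, 0, 0, 1, 0, 0, 1, 0, 1, 0]
```

The output matches row X1 of Table 2 below.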
The output class vector, Ck (k = 1, 2, ..., K), can be encoded as a bit vector of a fixed length K as follows:

C_k = \{\phi_1 ... \phi_k ... \phi_K\}    (3)

where K is the number of different possible classes. If the output vector belongs to class k, then the element φk is equal to 1 while all the other elements in the vector are zeros. Therefore, the proposed number of output nodes in the output layer of the ANN is K. Accordingly, the input and output nodes of the ANN are determined, and the structure of the ANN is shown in Fig. 1. The ANN is trained on the encoded vectors of the input attributes and the corresponding vectors of the output classes. The training of the ANN proceeds until the convergence rate between the actual and the desired output is achieved. The convergence rate can be
Fig. 1. The structure of the ANN.
improved by changing the number of iterations, the
number of hidden nodes (J), the learning rate, and the
momentum rate.
After training the ANN, two groups of weights can be obtained. The first group, (WG1)i,j, includes the weights between input node i and hidden node j. The second group, (WG2)j,k, includes the weights between hidden node j and output node k. The activation function used in the hidden and output nodes of the ANN is the sigmoid function.
The total input to the jth hidden node, IHNj, is given by

IHN_j = \sum_{i=1}^{I} x_i (WG1)_{i,j}    (4)

The output of the jth hidden node, OHNj, is given by

OHN_j = \frac{1}{1 + e^{-\sum_{i=1}^{I} x_i (WG1)_{i,j}}}    (5)

The total input to the kth output node, IONk, is given by

ION_k = \sum_{j=1}^{J} (WG2)_{j,k} \cdot \frac{1}{1 + e^{-\sum_{i=1}^{I} x_i (WG1)_{i,j}}}    (6)

So, the final value of the kth output node, φk, is given by

\phi_k = \frac{1}{1 + e^{-\sum_{j=1}^{J} (WG2)_{j,k} / (1 + e^{-\sum_{i=1}^{I} x_i (WG1)_{i,j}})}}    (7)

The function φk = f(xi, (WG1)i,j, (WG2)j,k) is an exponential function in xi, since (WG1)i,j and (WG2)j,k are constants. Its maximum output value is equal to one.
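Equations (4)–(7) amount to a standard forward pass through a one-hidden-layer sigmoid network. A minimal Python sketch (the small weight matrices here are made-up placeholders, not the trained weights from the paper's tables):

```python
import math

def sigmoid(z):
    # The sigmoid activation used in the hidden and output nodes
    return 1.0 / (1.0 + math.exp(-z))

def phi(x, wg1, wg2, k):
    """Eq. (7): output of node k for binary input vector x.
    wg1[i][j] are input-to-hidden weights, wg2[j][k] hidden-to-output."""
    J = len(wg2)
    # Eqs. (4)/(5): hidden-node outputs OHNj
    ohn = [sigmoid(sum(xi * wg1[i][j] for i, xi in enumerate(x)))
           for j in range(J)]
    # Eqs. (6)/(7): output-node activation
    return sigmoid(sum(ohn[j] * wg2[j][k] for j in range(J)))

# Placeholder example: 3 inputs, 2 hidden nodes, 1 output node
wg1 = [[0.5, -1.0], [1.2, 0.3], [-0.7, 0.8]]
wg2 = [[2.0], [-1.5]]
print(phi([1, 0, 1], wg1, wg2, 0))
```

Because the outer sigmoid is monotone, φk is bounded in (0, 1), which is why it can serve directly as a GA fitness value below.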
Definition. An input vector, Xm, belongs to class k iff the element φk in Cm is equal to 1 and all other elements in Cm are 0.

Consequently, to extract a relation (rule) between the input attributes, Xm, and a specific class k, one must find the input vector which maximizes φk. This is an optimization problem and can be stated as:
Maximize
\phi_k(x_i) = \frac{1}{1 + e^{-\sum_{j=1}^{J} (WG2)_{j,k} / (1 + e^{-\sum_{i=1}^{I} x_i (WG1)_{i,j}})}}    (8)
subject to:

x_i \in \{0, 1\}    (9)

Since the objective function φk(xi) is nonlinear and the constraints are binary, this is a nonlinear integer optimization problem. The genetic algorithm (GA) can be used to solve it. The following algorithm explains how the GA is used to obtain the best chromosome, which maximizes the objective function φk(xi):
Begin {
  Assume the fitness function is φk(xi).
  Create a chromosome structure as follows: {
    Generate a number of slots equal to I, which represent the input vector X.
    Put a random value 0 or 1 in each slot. }
  G = 0, where G is the number of the generation.
  Create the initial population, P, of T chromosomes, P(t)G, where t = 1 to T.
  Evaluate the fitness function according to P(t)G.
  While termination conditions not satisfied Do {
    G = G + 1
    Select a number of chromosomes from P(t)G according to the roulette-wheel procedure.
    Recombine them using crossover and mutation.
    Modify the population from P(t)G−1 to P(t)G.
    Evaluate the fitness function according to P(t)G. }
  Display the best chromosome that satisfies the conditions.
} End
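The GA loop above can be sketched compactly in Python (roulette-wheel selection, one-point crossover, bit-flip mutation, with elitist tracking of the best chromosome). The fitness argument is any callable implementing φk(x), and the function name and default parameter values are illustrative, not the paper's:

```python
import random

def genetic_maximize(fitness, I, pop_size=10, generations=100,
                     p_cross=0.25, p_mut=0.01, seed=0):
    """Return the best binary chromosome of length I found for `fitness`."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(I)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        scores = [fitness(c) for c in pop]
        total = sum(scores)

        def pick():
            # Roulette-wheel selection proportional to fitness
            r = rng.uniform(0, total)
            acc = 0.0
            for c, s in zip(pop, scores):
                acc += s
                if acc >= r:
                    return c[:]
            return pop[-1][:]

        nxt = []
        while len(nxt) < pop_size:
            a, b = pick(), pick()
            if rng.random() < p_cross:      # one-point crossover
                cut = rng.randrange(1, I)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            for c in (a, b):                # bit-flip mutation
                for i in range(I):
                    if rng.random() < p_mut:
                        c[i] ^= 1
            nxt.extend([a, b])
        pop = nxt[:pop_size]
        best = max(pop + [best], key=fitness)
    return best
```

In the paper's setting, `fitness` would be φk from Eq. (8) evaluated with the trained weight groups, and I the number of input nodes.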
To extract a rule belonging to class k, the best chromosome must be decoded as follows:

• The best chromosome is divided into N segments. Each segment represents one attribute, An (n = 1, 2, ..., N), and has a corresponding bit length mn which represents its values.
• An attribute value exists if the corresponding bit in the best chromosome equals one, and vice versa.
• The operators OR and AND are used to correlate the existing values of the same attribute and the different attributes, respectively.
• After getting the set of rules, perform rule refinement and cancel redundant attributes. For example, if an attribute has three values A, B, and C, and a rule looks like "If attk has value A or B or C then class k", such an attribute can be dropped (redundant).
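The decoding and refinement steps can be sketched as follows (attribute metadata as in the play-tennis example; the function name is our own):

```python
ATTRIBUTES = [
    ("Outlook", ["Sunny", "Overcast", "Rain"]),
    ("Temperature", ["Hot", "Mild", "Cool"]),
    ("Humidity", ["High", "Normal"]),
    ("Wind", ["Weak", "Strong"]),
]

def decode_rule(chromosome, target_class):
    """Split the best chromosome into per-attribute segments, OR the values
    inside a segment, AND the segments together, and drop redundant
    attributes (segments where every bit, or no bit, is set)."""
    conditions, pos = [], 0
    for name, values in ATTRIBUTES:
        seg = chromosome[pos:pos + len(values)]
        pos += len(values)
        picked = [v for v, bit in zip(values, seg) if bit]
        if 0 < len(picked) < len(values):   # refinement: skip all-or-nothing
            conditions.append(f"{name} is " + " or ".join(picked))
    return "If " + " And ".join(conditions) + f" Then {target_class}"

# Rule 1 of Table 5: chromosome 1 0 0 1 1 1 1 0 0 1
print(decode_rule([1, 0, 0, 1, 1, 1, 1, 0, 0, 1], "don't play"))
# If Outlook is Sunny And Humidity is High And Wind is Strong Then don't play
```

Here Temperature is dropped because all three of its bits are set, exactly the refinement case described above.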
The overall methodology of rule extraction is shown
in Fig. 2.
3. Generalization for multiple hidden layers
The objective function obtained in Eq. (8) can be generalized for an ANN which has more than one hidden layer. Fig. 3 shows an ANN that includes three hidden layers.
The function φk in its final form for the kth output node is given by

\phi_k = \frac{1}{1 + e^{-\sum_{j_3=1}^{J} (WG4)_{j_3,k} / (1 + e^{-A})}}    (10)

where

A = \sum_{j_2=1}^{J} (WG3)_{j_2,j_3} \cdot \frac{1}{1 + e^{-\sum_{j_1=1}^{J} (WG2)_{j_1,j_2} / (1 + e^{-\sum_{i=1}^{I} x_i (WG1)_{i,j_1}})}}    (11)
Here xi are the input values, i = 1, 2, ..., I, where I is the total number of nodes in the input layer; j1 = 1, 2, ..., J for the first hidden layer; j2 = 1, 2, ..., J for the second hidden layer; j3 = 1, 2, ..., J for the third hidden layer, where J is the total number of nodes in each hidden layer; k = 1, 2, ..., K, where K is the total number of nodes in the output layer. (WG1)i,j1 is the weights group between the input layer, i,
Fig. 2. Overall flowchart for the proposed methodology. (The flowchart covers: coding the database as a bit string; separating it into input vectors Xm and corresponding output vectors Cm; structuring the ANN with random parameters (learning coefficient, momentum coefficient) and training until the error is satisfactory or the iteration limit is reached; extracting the weight groups (WG1)i,j and (WG2)j,k; forming the output function φk(xi); and, for each class k, running the GA loop of fitness evaluation, selection, crossover, mutation and population update, then decoding the best chromosomes into the equivalent rules for class k.)
Fig. 3. ANN with three hidden layers.
and the first hidden layer, j1. (WG2)j1,j2 is the weights group between the first hidden layer, j1, and the second hidden layer, j2. (WG3)j2,j3 is the weights group between the second hidden layer, j2, and the third hidden layer, j3. (WG4)j3,k is the weights group between the third hidden layer, j3, and the output layer, k.
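Eqs. (10) and (11) are the same sigmoid forward pass as Eq. (7), repeated once per layer. A sketch that accepts an arbitrary list of weight groups (all names, sizes and weight values here are illustrative):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, weight_groups):
    """Propagate input vector x through successive weight groups
    (WG1, WG2, ..., WGL) and return the output-layer activations.
    wg[i][j] is the weight from node i of one layer to node j of the next."""
    activations = list(x)
    for wg in weight_groups:
        activations = [
            sigmoid(sum(a * wg[i][j] for i, a in enumerate(activations)))
            for j in range(len(wg[0]))
        ]
    return activations

# Illustrative 2-3-2 network; appending more matrices to the list
# gives the multiple-hidden-layer case of Eqs. (10) and (11).
wg1 = [[0.4, -0.2, 0.9], [-0.6, 0.1, 0.3]]
wg2 = [[1.0, -1.0], [0.5, 0.5], [-0.3, 0.7]]
out = forward([1, 0], [wg1, wg2])
print(out)
```

As in the paper, the raw inputs are passed in directly and the sigmoid is applied at every hidden and output layer.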
4. Personalized marketing and customer retention
strategies
As organizations attempt to develop marketing and customer retention strategies, they will need to collect visitors' statistics and integrate data across systems. Additionally, there is a need to improve data about inventories. Personalization is a relatively new field, and different authors provide various definitions of the concept [11]. Fig. 4 shows the stages of personalization as an iterative process [2].
Fig. 4. Stages of the personalization process.
A framework to identify individual user behavior, used by a system to facilitate easy online shopping and to maximize user satisfaction, has been presented [12]. It is clear that an individual user will act based on his/her preferences, attitude and personality, and each individual's behavior, such as preferences and attitudes, differs from the others'. An individual's activities or expressions are monitored and captured using sensing devices. The individual user behaviors are recognized by pattern recognition systems (PRSs). Intelligent agents are used to make system strategies or plans based on the individual user behaviors and the product state, so that the system can act according to individual behaviors to make online shopping easy.
A proposed record for products and inventories can have the following attributes: product name, color, store size, city, month, quantity, quantity sold, profit. A record for factual data includes: customer ID, customer name, gender, birth date, nationality.
Table 1
Example for the target concept play tennis [8]

Day  Outlook   Temperature  Humidity  Wind    Play tennis
D1   Sunny     Hot          High      Weak    No
D2   Sunny     Hot          High      Strong  No
D3   Overcast  Hot          High      Weak    Yes
D4   Rain      Mild         High      Weak    Yes
D5   Rain      Cool         Normal    Weak    Yes
D6   Rain      Cool         Normal    Strong  No
D7   Overcast  Cool         Normal    Strong  Yes
D8   Sunny     Mild         High      Weak    No
D9   Sunny     Cool         Normal    Weak    Yes
D10  Rain      Mild         Normal    Weak    Yes
D11  Sunny     Mild         Normal    Strong  Yes
D12  Overcast  Mild         High      Strong  Yes
D13  Overcast  Hot          Normal    Weak    Yes
D14  Rain      Mild         High      Strong  No
A record for transactional data may include the at-
tributes: customer ID, date, time, store, product,
coupon used.
5. Illustrative example
A given database (with four attributes and two different output classes) is shown in Table 1 [8]. The encoded values of the given database are shown in Table 2. The ANN is trained on the encoded input attribute vectors, Xm, and the corresponding output
Table 2
Encoding database
i/p
patt.
Outlook, m1 = 3 Temperature, m2 = 3 Humidity, m3 = 2 Wind, m4 = 2 O/P patt. Play tennis
Xm Sunny
(x1)
Overcast
(x2)
Rain
(x3)
Hot
(x4)
Mild
(x5)
Cool
(x6)
High
(x7)
Norm
(x8)
Weak
(x9)
Strong
(x10)
Cm 1No
2Yes
X1 1 0 0 1 0 0 1 0 1 0 C1 1 0
X2 1 0 0 1 0 0 1 0 0 1 C2 1 0
X3 0 1 0 1 0 0 1 0 1 0 C3 0 1X4 0 0 1 0 1 0 1 0 1 0 C4 0 1
X5 0 0 1 0 0 1 0 1 1 0 C5 0 1
X6 0 0 1 0 0 1 0 1 0 1 C6 1 0
X7 0 1 0 0 0 1 0 1 0 1 C7 0 1
X8 1 0 0 0 1 0 1 0 1 0 C8 1 0
X9 1 0 0 0 0 1 0 1 1 0 C9 0 1
X10 0 0 1 0 1 0 0 1 1 0 C10 0 1
X11 1 0 0 0 1 0 0 1 0 1 C11 0 1
X12 0 1 0 0 1 0 1 0 0 1 C12 0 1
X13 0 1 0 1 0 0 0 1 1 0 C13 0 1
X14 0 0 1 0 1 0 1 0 0 1 C14 1 0
Table 3
Group of weights (WG1)i,j between input and hidden nodes

Input nodes   Hidden nodes
              H1         H2         H3         H4
x1 4.09699 3.741246 1.2106 1.42853
x2 6.154562 4.56639 0.349845 1.109533
x3 0.82675 1.114981 0.153325 0.47917
x4 0.42227 0.2961 0.19704 0.55404
x5 4.128692 3.07741 0.15498 0.651919
x6 2.73254 2.595217 0.56767 0.32539
x7 4.93463 4.005334 1.17037 0.89697
x8 5.282225 4.36782 0.235355 0.616702
x9 3.060052 3.11607 1.106763 0.56799
x10 3.63009 2.284223 1.36338 1.02158
Table 4
Group of weights (WG2)j,k between hidden and output nodes

Output nodes  Hidden nodes
              H1        H2        H3        H4
φ1            9.20896   9.012731  1.2113    0.90564
φ2            9.22879   9.00487   0.773881  1.218929
class vectors, Cm. The number of input nodes is given by

I = \sum_{n=1}^{N} m_n = m_1 + m_2 + m_3 + m_4 = 10

The number of output nodes is K = 2.
Table 5
The rule extraction for class no (φ1 is maximum)

Rule no. | Fitness  | Xi vector from GA (x1 ... x10) | Directly extracted rules (don't play) | Rules refinement
1 | 0.99988  | 1 0 0 1 1 1 1 0 0 1 | If Outlook is Sunny And Temperature is Hot or Mild or Cool And Humidity is High And Wind is Strong | If Outlook is Sunny And Humidity is High And Wind is Strong
2 | 0.999874 | 1 0 1 0 0 1 1 0 0 1 | If Outlook is Sunny or Rain And Temperature is Cool And Humidity is High And Wind is Strong | If Outlook is Sunny or Rain And Temperature is Cool And Humidity is High And Wind is Strong
3 | 0.999867 | 1 0 0 1 1 1 1 0 1 1 | If Outlook is Sunny And Temperature is Hot or Mild or Cool And Humidity is High And Wind is Weak or Strong | If Outlook is Sunny And Humidity is High
4 | 0.999849 | 0 0 1 0 0 1 1 1 0 1 | If Outlook is Rain And Temperature is Cool And Humidity is High or Normal And Wind is Strong | If Outlook is Rain And Temperature is Cool And Wind is Strong
Table 6
The rule extraction for class yes (φ2 is maximum)

Rule no. | Fitness  | Xi vector from GA (x1 ... x10) | Directly extracted rules (play) | Rules refinement
1 | 0.99998  | 0 1 1 0 0 0 0 0 1 0 | If Outlook is Overcast or Rain And Wind is Weak | If Outlook is Overcast or Rain And Wind is Weak
2 | 0.999972 | 0 1 0 0 0 0 0 0 0 0 | If Outlook is Overcast | If Outlook is Overcast
3 | 0.999960 | 1 1 0 0 0 0 0 1 0 0 | If Outlook is Sunny or Overcast And Humidity is Normal | If Outlook is Sunny or Overcast And Humidity is Normal
Table 7
RITIO-induced rule set from Table 1 [9]

Rule no.  Rule
1   If Outlook is Sunny And Humidity is High Then Class No
2   If Outlook is Overcast And Humidity is High Then Class Yes
3   If Humidity is Normal Then Class Yes
4   If Humidity is Normal And Wind is Weak Then Class Yes
5   If Outlook is Rain And Humidity is High And Wind is Weak Then Class Yes
6   If Outlook is Rain And Humidity is Normal And Wind is Strong Then Class No
7   If Outlook is Rain And Humidity is High And Wind is Strong Then Class No
The convergence rate between the actual and the desired output is achieved with 4 hidden nodes, a 0.55 learning coefficient, a 0.65 momentum coefficient and 30,000 iterations. The allowable error equals 0.000001. Table 3 shows the first group of weights, (WG1)i,j, between each input node and the hidden nodes. The second group of weights, (WG2)j,k, between each hidden node and the output nodes is shown in Table 4.

The GA is then applied to solve Eq. (8) in order to find the input attributes vector which maximizes that function.
The GA has a population of 10 individuals evolving over 1300 generations. The crossover and mutation probabilities were 0.25 and 0.01, respectively. The output chromosomes of the play and don't play target classes are sorted in descending order of their fitness values. The threshold levels of the two target classes are 0.99996 and 0.999849, respectively.

Therefore, both the local and global maxima of the output chromosomes have been determined and are translated into rules. Tables 5 and 6 present the best sets of rules belonging to the don't play and play targets, respectively.

Table 7 shows the RITIO-induced set of rules for the same database [9]. Although RITIO gives a good indication of algorithm stability over different databases, its rule number 3 is not verified by the data. The algorithm proposed here yields rules that are all verified.
6. Application and results
The MONK's problems are benchmark binary classification tasks in which robots are described in terms of six characteristics and a rule is given which specifies the attributes that determine membership of the
Table 8
The attributes and their values of the MONK's database [10]

Robot characteristics (attributes)   Nominal values
Head shape                           Round, square, octagon
Body shape                           Round, square, octagon
Is smiling                           Yes, no
Holding                              Sword, flag, balloon
Jacket colour                        Red, yellow, green, blue
Has tie                              Yes, no
target class [10]. The six attributes and their values are
shown in Table 8.
The two rules that determine membership of the target class in the MONK's database are shown in Table 9.

The ANN is trained on 123 input vectors, Xm. The corresponding output class vectors, Cm, are shown in Table 10. The number of input nodes is I = 17, and the number of output nodes is K = 2. The convergence rate between the actual and desired output is achieved with 6 hidden nodes, a 0.25 learning coefficient, a 0.85 momentum coefficient and 31,999 iterations. The allowable error equals 0.0000001.

Table 11 shows the first group of weights, (WG1)i,j, between each input node and the hidden nodes. The second group of weights, (WG2)j,k, between each hidden node and the output nodes, is shown in Table 12.
Table 9
Two rules satisfy the target

Rule 1: If Head Shape Value = Body Shape Value THEN Robot is in Target Class.
Rule 2: If Jacket Color = Red THEN Robot is in Target Class.
Table 10
The MONK's database [10]

Xm   Head shape  Body shape  Is smiling  Holding  Jacket colour  Has tie   Cm   Target
1    Round       Round       Yes         Sword    Green          Yes       1    Yes
2    Round       Round       Yes         Flag     Yellow         Yes       2    Yes
3    Round       Square      Yes         Sword    Green          Yes       3    No
4    Round       Octagon     Yes         Flag     Blue           Yes       4    No
...
55   Square      Round       Yes         Sword    Green          Yes       55   No
56   Square      Square      Yes         Sword    Green          Yes       56   Yes
57   Square      Square      Yes         Flag     Red            No        57   Yes
58   Square      Octagon     No          Balloon  Red            Yes       58   Yes
...
120  Octagon     Round       No          Sword    Red            Yes       120  Yes
121  Octagon     Round       No          Balloon  Yellow         No        121  No
122  Octagon     Octagon     No          Flag     Yellow         No        122  Yes
123  Octagon     Octagon     No          Flag     Green          No        123  Yes
Table 11
Group of weights (WG1)i,j between each input node and the hidden nodes

Input nodes  Hidden nodes
             H1        H2        H3        H4        H5        H6
x1   5.08851   6.40872   2.478146  0.53785   3.331379  1.01267
x2   4.094656  0.55311   2.24007   1.00648   6.64513   0.53136
x3   2.711605  7.121283  2.49793   0.15809   0.468151  0.3962
x4   2.9641    7.48084   1.351769  0.69977   6.00667   0.18359
x5   0.929943  7.760751  2.3443    0.53314   5.33333   0.2059
x6   3.494829  0.138298  2.217123  0.63468   1.24655   0.4458
x7   0.475753  0.275564  0.829914  1.09122   1.47744   0.8716
x8   0.358807  0.269779  0.623271  1.23704   1.61803   0.92063
x9   0.10996   0.243966  0.019956  0.29096   1.02741   0.006704
x10  0.385337  0.31376   0.989733  0.58041   0.54741   0.50737
x11  0.13311   0.07916   0.539239  1.02715   0.74859   0.77975
x12  7.31878   12.26899  4.98723   0.279794  4.79433   0.471633
x13  2.941625  3.99095   1.822638  0.49974   0.666357  1.03168
x14  2.469945  4.1919    2.270769  0.57977   0.686182  1.01134
x15  2.658616  3.47783   2.435963  0.62123   1.15922   0.59382
x16  0.48247   0.314717  0.777509  0.83715   1.61191   0.56232
x17  0.878135  0.340808  0.315489  0.77439   2.04905   1.21304
Table 12
Group of weights (WG2)j,k between each hidden and output nodes

Output nodes  Hidden nodes
              H1        H2        H3       H4       H5        H6
φ1            13.3740   14.5207   6.48067  0.40159  11.70462  0.52939
φ2            13.37457  14.52426  6.48808  0.07072  11.7054   0.33697
Table 13
The set of rules belonging to the target class

Rule no. | Fitness | Xi vector from GA (x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13 x14 x15 x16 x17) | Directly extracted rules
1 | 0.9999  | — | If Jacket Color is Red
2 | 0.99947 | 0 0 1 0 0 1 1 1 1 1 1 1 1 1 1 0 0 | If Head Shape is Octagon AND Body Shape is Octagon AND Is Smiling is Yes OR No AND Holding is Sword OR Flag OR Balloon AND Jacket Color is Red OR Yellow OR Green OR Blue
3 | 0.99946 | 0 1 0 0 1 0 1 1 1 1 1 0 0 0 0 1 1 | If Head Shape is Square AND Body Shape is Square AND Is Smiling is Yes OR No AND Holding is Sword OR Flag OR Balloon AND Has Tie is Yes OR No
4 | 0.99845 | 1 0 0 1 0 0 1 1 1 1 1 0 0 0 0 0 0 | If Head Shape is Round AND Body Shape is Round AND Is Smiling is Yes OR No AND Holding is Sword OR Flag OR Balloon
Table 14
Accuracy results for different algorithms [9]

Database                 MONK's
HCV (%)                  100
C4.5 (%)                 83.3
RITIO (%)                97.37
C4.5 rules (%)           100
Proposed algorithm (%)   100
The GA has a population of 10 individuals evolving over 1225 generations. The crossover and mutation probabilities are 0.28 and 0.002, respectively. The output chromosomes for the target class are sorted according to their fitness values down to the level 0.99845. Table 13 presents the best set of rules belonging to the target class according to the fitness values.

From Table 13, the rules extracted by the proposed algorithm and the standard rules given in Table 9 are identical. This is a good indication of the algorithm's stability. The accuracy of the proposed algorithm among different algorithms for the MONK's database is shown in Table 14 [9].

The discovered rules for hypothetical individual person data and products are in the following format:

IF PRODUCT = Hat THEN Profit = Medium.
IF Color = Blue THEN Profit = High.
IF MONTH = June THEN Profit = Medium.
IF MONTH = December THEN Profit = High.
7. Conclusions
A novel machine learning algorithm for extracting comprehensible rules has been presented in this paper. It does not incur the computational complexity of the disjunctive normal form (DNF) algorithm. It takes all input attributes into consideration, so it produces accurate rules, whereas other algorithms such as DNF use only the input attributes up to a certain level. Also, it uses only part of the weights to extract rules belonging to a certain class, so it requires less computational time compared with other algorithms. The proposed methodology does not make any approximation to the activation function.

The user profile information is stored in a database along with a unique user ID and password. A data warehouse repository with such data can be analyzed. This algorithm can help devise rules that govern which messages are offered to an anonymous prospect, how to counter points of resistance, and when to attempt to close a sale.

Future work should consist of more experiments with other data sets, as well as more elaborate experiments to optimize the GA parameters of the proposed algorithm.
References
[1] J. Jhang, H. Jain, K. Ramamurthy, Effective design of electronic commerce environments: a proposed theory of congruence and an illustration, IEEE Trans. Systems Man Cybernet. Part A: Syst. Hum. 30 (4) (2000) 456–471.
[2] G. Adomavicius, A. Tuzhilin, Using data mining methods to build customer profiles, IEEE Comput. 34 (2) (2001) 74–82.
[3] X. Wu, D. Urpani, Induction by attribute elimination, IEEE Trans. Knowl. Data Eng. 11 (5) (1999) 805–812.
[4] H. Tsukimoto, Extracting rules from trained neural networks, IEEE Trans. Neural Networks 11 (2) (2000) 377–389.
[5] R. Setiono, Extracting M-of-N rules from trained neural networks, IEEE Trans. Neural Networks 11 (2) (2000) 512–519.
[6] F. Wotawa, G. Wotawa, Deriving qualitative rules from neural networks: a case study for ozone forecasting, AI Communications 14 (1) (2001) 23–33, ISSN 0921-7126, IOS Press.
[7] H. Tsukimoto, Extracting rules from trained neural networks, IEEE Trans. Neural Networks 11 (2) (2000) 377–389.
[8] T.M. Mitchell, Machine Learning, McGraw-Hill, 1997.
[9] X. Wu, D. Urpani, Induction by attribute elimination, IEEE Trans. Knowl. Data Eng. 11 (5) (1999) 805–812.
[10] http://www.cse.unsw.edu.aucs3411/C4.5/Data.
[11] Comm. ACM, Special Issue on Personalization, vol. 43, no. 8, 2000.
[12] A.E. El-Alfy, R. Haque, Y. Al-Ohali, A framework to employ multi AI systems to facilitate easy online shopping, http://www-3.ibm.com/easy/eou_ext.nsf/Publish/2049.
[13] R. Setiono, W.K. Leow, J.M. Zurada, Extraction of rules from artificial neural networks for nonlinear regression, IEEE Trans. Neural Networks 13 (3) (2002) 564–577.