This article was downloaded by: [McMaster University] on 19 December 2014, at 10:17. Publisher: Taylor & Francis. Informa Ltd, registered in England and Wales, Registered Number: 1072954. Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK.

Cybernetics and Systems: An International Journal. Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/ucbs20

GENERATING FUZZY RULES FROM TRAINING DATA CONTAINING NOISE FOR HANDLING CLASSIFICATION PROBLEMS
SHYI-MING CHEN (a), CHENG-HSUAN KAO (b) & CHENG-HAO YU (b)
(a) Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan, R.O.C.
(b) Department of Electronic Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan, R.O.C.
Published online: 30 Nov 2010.

To cite this article: SHYI-MING CHEN, CHENG-HSUAN KAO & CHENG-HAO YU (2002) GENERATING FUZZY RULES FROM TRAINING DATA CONTAINING NOISE FOR HANDLING CLASSIFICATION PROBLEMS, Cybernetics and Systems: An International Journal, 33:7, 723-748
To link to this article: http://dx.doi.org/10.1080/01969720213935
GENERATING FUZZY RULES FROM TRAINING DATA CONTAINING NOISE FOR HANDLING CLASSIFICATION PROBLEMS

SHYI-MING CHEN
Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan, R.O.C.

CHENG-HSUAN KAO and CHENG-HAO YU
Department of Electronic Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan, R.O.C.
It is obvious that one of the important tasks in a fuzzy system is to find a set of
rules to deal with a specific classification problem. In recent years, many re-
searchers focused on the research topic of generating fuzzy rules from training
data for handling classification problems. In a previous paper, we presented
an algorithm to construct membership functions and to generate fuzzy rules
from training examples. In this paper, we extend that work to propose a new
algorithm to generate fuzzy rules from training data containing noise to deal
with classification problems. The proposed algorithm gets a higher classifi-
cation accuracy rate and generates fewer fuzzy rules and fewer input attributes
in the antecedent portions of the generated fuzzy rules.
This work was supported in part by the National Science Council, Republic of China,
under Grant NSC 89-2213-E-011-100.
Address correspondence to Professor Shyi-Ming Chen, Ph.D., Department of Compu-
ter Science and Information Engineering, National Taiwan University of Science and
Technology, Taipei, Taiwan, R.O.C.
Cybernetics and Systems: An International Journal, 33: 723-748, 2002
Copyright © 2002 Taylor & Francis
0196-9722/02 $12.00 + .00
DOI: 10.1080/01969720290040812
INTRODUCTION
We usually encounter many things that contain imprecision and vague-
ness. In 1965, Zadeh proposed the theory of fuzzy sets (Zadeh 1965). It
can deal properly with the imprecision and vagueness. After Zadeh
proposed the theory of fuzzy sets, there were many theoretic develop-
ments in fuzzy logic, and there are many applications of fuzzy logic. The
fuzzy classification system is an important application. In a fuzzy clas-
sification system, objects can be classified properly by a fuzzy rule base.
One of the important tasks in a fuzzy system is to find a set of rules to
deal with a specific classification problem. In recent years, many re-
searchers focused on the research topic of generating fuzzy rules from
training data for handling classification problems (Castro and Zurita
1997; Castro, Castro-Schez, and Zurita 1999; Chen and Lin 2000; Chen
and Yeh 1998; Chen, Lee, and Lee 1999; Hayashi and Imura 1990; Hong
and Chen 1999, 2000; Hong and Lee 1996; Kao and Chen 2000; Nozaki,
Ishibuchi, and Tanaka 1997; Wang et al. 1999; Wang and Mendel 1992;
Wu and Chen 1999; Yoshinari, Pedrycz, and Hirota 1993; Yuan and
Shaw 1995; Yuan and Zhuang 1996). Castro, Castro-Schez, and Zurita
(1999) presented a machine learning method by using fuzzy logic. Castro
and Zurita (1997) presented an induction-learning algorithm for fuzzy
systems. We presented an algorithm to generate fuzzy rules from rela-
tional database systems to estimate null values (Chen and Yeh 1998), a
method for generating fuzzy rules from numerical data for handling fuzzy
classification problems (Chen, Lee, and Lee 1999), and a method for
constructing fuzzy decision trees and generating fuzzy classification rules
from training examples using compound analysis techniques (Chen and
Lin 2000). Hayashi and Imura (1990) presented a fuzzy neural expert
system to generate fuzzy if-then rules automatically from a trained
neural network. Hong and Chen (1999) presented an algorithm to gen-
erate fuzzy rules and membership functions automatically, and later,
Hong and Chen (2000) presented an algorithm which can deal with the
noise data of a training data set. Hong and Lee (1996) presented an al-
gorithm to automatically generate fuzzy rules and membership functions
from training examples. Nozaki, Ishibuchi, and Tanaka (1997) presented
a method to construct fuzzy rules from numerical data. Wang et al.
(1999) presented a fuzzy inductive learning strategy to generate fuzzy
rules. Wang and Mendel (1992) presented an algorithm to generate fuzzy
rules from training examples. We have presented a method to construct
fuzzy rules and membership functions from training examples (Wu and
Chen 1999). Yoshinari, Pedrycz, and Hirota (1993) presented an algorithm
based on fuzzy decision trees to generate fuzzy classification rules auto-
matically. Yuan and Shaw (1995) presented a method to construct fuzzy
decision trees and to generate fuzzy rules. Yuan and Zhuang (1996)
presented a genetic algorithm for generating fuzzy classification rules.
However, there are some drawbacks in the existing fuzzy rules generation methods for handling classification problems: (1) Most algorithms
are unable to deal with the training data containing noise. (2) The clas-
sification accuracy rate is not high enough. (3) There are too many fuzzy
rules to be generated in the fuzzy rules generation process. (4) There are
too many input attributes in the antecedent portions of the generated
fuzzy rules. Thus, we must develop a better algorithm to overcome the
drawbacks of the existing methods.
In this article, we extend the work we presented in a previous paper
(Wu and Chen 1999) to propose a new algorithm to generate fuzzy rules
and to construct membership functions from training data containing
noise to deal with the Iris data classification problem (Fisher 1936). The
proposed algorithm can get a higher classification accuracy rate, generate
fewer fuzzy rules, and can generate fewer input attributes in the ante-
cedent portions of the generated fuzzy rules than the existing methods.
FUZZY SET THEORY
In 1965, Zadeh proposed the theory of fuzzy sets (Zadeh 1965). It can
properly deal with imprecision and vagueness of information in the real
world. In the following text, we briefly review some basic concepts of
fuzzy sets and fuzzy relations (see Lee 1990a, b; Zadeh 1965).
Let $U$ be the universe of discourse, $U = \{u_1, u_2, \ldots, u_n\}$. A fuzzy set $A$ of $U$ can be represented by

$$A = \mu_A(u_1)/u_1 + \mu_A(u_2)/u_2 + \cdots + \mu_A(u_n)/u_n \qquad (1)$$

where $\mu_A$ is the membership function of the fuzzy set $A$, and $\mu_A(u_i)$ indicates the grade of membership of $u_i$ in the fuzzy set $A$, where $\mu_A(u_i) \in [0, 1]$. Figure 1 shows a triangular fuzzy set $A$ with the membership function $\mu_A$ parameterized by a triplet $(a, b, c)$ (i.e., $A = (a, b, c)$), where $b$ is called the center of the triangular fuzzy set $A$, and $a$ and $c$ are called the left vertex and the right vertex, respectively, of the triangular fuzzy set $A$.
In a fuzzy relation, the degree of relationship between two elements is assigned a real value between zero and one. Let $A_1, A_2, \ldots, A_n$ be fuzzy sets and let $A_1 \times A_2 \times \cdots \times A_n$ be their Cartesian product. Then, we can define an $n$-ary fuzzy relation $R$ as

$$R(A_1, A_2, \ldots, A_n) \subseteq A_1 \times A_2 \times \cdots \times A_n \qquad (2)$$

where

$$A_1 \times A_2 \times \cdots \times A_n = \{(x_1, x_2, \ldots, x_n) \mid x_i \in A_i \text{ and } 1 \le i \le n\} \qquad (3)$$

The membership function of $R(A_1, A_2, \ldots, A_n)$ is denoted by $\mu_R$, where $\mu_R(x_1, x_2, \ldots, x_n) \in [0, 1]$.

Assume that $R_1(X, Y)$ and $R_2(Y, Z)$ are two binary fuzzy relations, where $X$, $Y$, and $Z$ are fuzzy sets. The composition of $R_1(X, Y)$ and $R_2(Y, Z)$ is defined as

$$\mu_{R_1 \circ R_2}(x, z) = \max_{y \in Y} \min[\mu_{R_1}(x, y), \mu_{R_2}(y, z)] \quad \forall (x, z) \in X \times Z \qquad (4)$$

Let $A$ be a fuzzy set. The transitive closure $R_T(A, A)$ of a binary fuzzy relation $R(A, A)$ is defined as

$$R_T(A, A) = R^i, \text{ where } R^i = R^{i-1} \circ R, \; R^{i+1} = R^i, \text{ and } i \ge 2 \qquad (5)$$
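As a concrete sketch of Eqs. (4) and (5), assuming relations are stored as membership matrices (the function names are ours, not the paper's):

```python
def maxmin_compose(r1, r2):
    """Max-min composition of Eq. (4): entry (x, z) is the maximum over y
    of min(r1[x][y], r2[y][z])."""
    return [[max(min(r1[x][y], r2[y][z]) for y in range(len(r2)))
             for z in range(len(r2[0]))] for x in range(len(r1))]

def transitive_closure(r):
    """Eq. (5): compose R with itself until the result stops changing."""
    ri = r
    while True:
        rn = maxmin_compose(ri, r)
        if rn == ri:
            return ri
        ri = rn
```

For a reflexive relation the iteration can only raise entries, so it terminates; e.g. closing [[1, 0.8, 0], [0.8, 1, 0.9], [0, 0.9, 1]] raises the missing corner entries to 0.8.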
Figure 1. A triangular fuzzy set.
The $\alpha$-cut $R_\alpha$ of a binary fuzzy relation $R(A, A)$ is defined as

$$R_\alpha = \{(x, y) \mid \mu_R(x, y) \ge \alpha \text{ and } \alpha \in [0, 1]\} \qquad (6)$$

If a binary fuzzy relation $R$ is reflexive, symmetric, and transitive, then it is called a fuzzy equivalence relation. If a binary fuzzy relation $R$ is reflexive and symmetric, then it is called a compatibility relation. If $R$ is a compatibility relation, then compatibility classes are defined by a specified membership degree $\alpha$, where $\alpha \in [0, 1]$. An $\alpha$-compatibility class $B$ is a subset of $A$ for a given binary fuzzy compatibility relation $R(A, A)$, such that $R(x, y) \ge \alpha$ for all $x, y \in B$.
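A minimal sketch of Eq. (6) under the same matrix representation (the function name is ours):

```python
def alpha_cut(r, alpha):
    """Eq. (6): the crisp set of pairs (x, y) whose membership degree
    reaches alpha."""
    return {(x, y)
            for x, row in enumerate(r)
            for y, degree in enumerate(row)
            if degree >= alpha}
```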
GENERATING FUZZY RULES AND CONSTRUCTING MEMBERSHIP FUNCTIONS FROM TRAINING DATA CONTAINING NOISE
In the following text, we extend our previous work (Wu and Chen 1999)
to present a new algorithm for constructing membership functions and
generating fuzzy rules from training data containing noise. It can de-
crease the number of input attributes appearing in the antecedent por-
tions of the generated fuzzy rules, generate fewer fuzzy rules, and obtain a
higher classification rate than the existing methods.
Because there may be some errors or noise in the training data set due to typing errors, the membership functions and fuzzy rules we get may not be the best. The algorithms presented by Hong and Lee (1996) and Wu and Chen (1999) cannot deal with this kind of problem, where the training data contain noise. Thus, we need to propose a new method to overcome the drawback of those algorithms. If a training data set contains some errors or noise data, then we can find these errors or noise data by using the automatic clustering algorithm which we presented in an earlier work (Yeh and Chen 1994). The automatic clustering algorithm is reviewed as follows:
Automatic Clustering Algorithm
Main() // main program //
{
  // NOD: number of data; Data[ ]: data array //
  sort Data[ ] in an ascending sequence using quick sort (Horowitz and Sahni 1978);
  // compute the difference between every two neighboring elements in Data[ ] //
  for i = 1 to NOD-1 do
    Dif[i] = Data[i+1] - Data[i];

  // for each position, compute the difference between the nearest nonzero
  // neighbors in Dif[ ], where p points to the left one, q points to the
  // right one, and "abs" denotes the absolute-value function //
  for i = 1 to NOD-2 do
  {
    p = i;
    while (Dif[p] = 0 and p > 1) do
      p = p - 1;
    q = i + 1;
    while (Dif[q] = 0 and q < NOD) do
      q = q + 1;
    Dif_Dif[i] = abs(Dif[p] - Dif[q])
  };

  // start clustering //
  Cluster(1, NOD)
} // end of Main() //

Cluster(L, R) // subroutine //
{
  // set the boundary //
  let Dif[L-1] = Dif[R] = Data[R];
  let Dif_Dif[L-1] = Dif_Dif[R-1] = 0;

  // if the two ends of Dif[ ] are equal to 0, then adjust Dif_Dif[ ] //
  temp = R - 1;
  while (Dif[temp] = 0) do
    temp = temp - 1;
  for i = temp to R-2 do
    Dif_Dif[i] = Dif[temp];
  temp = L;
  while (Dif[temp] = 0) do
    temp = temp + 1;
  for i = L to temp-1 do
    Dif_Dif[i] = Dif[temp];

  // Dif_Sum: summation from Dif[L] to Dif[R-1]; S: number of elements
  // from Dif[L] to Dif[R-1] which are not equal to 0 //
  Dif_Mean = Dif_Sum / S;
  let X = Y = L;

  // Dif_Dif_Max: maximal value from Dif_Dif[L] to Dif_Dif[R-2] //
  if (R = L) or (Dif_Mean >= Dif_Dif_Max) then
    Data[L] to Data[R] forms a cluster
  else
  {
    for i = L to R-1 do
      if (i = R-1) or ((Dif[i] > Dif_Mean) and (Dif[i] >= Dif_Dif[i-1]) and
          (Dif[i] >= Dif_Dif[i])) then
      {
        Y = i;
        Cluster(X, Y);
        X = Y + 1
      };
    Cluster(X, R)
  }
} // end of Cluster() //
In the following, we use an example to illustrate the automatic
clustering process of the algorithm. Assume that the input data are as
follows:
2; 8; 3; 21; 36; 27; 28; 28; 4; 10; 88; 35; 21; 15; 23; 36
After applying the quick sort (Horowitz and Sahni 1978), the data can be
sorted in an ascending sequence:
2; 3; 4; 8; 10; 15; 21; 21; 23; 27; 28; 28; 35; 36; 36; 88
[Cluster(1, 16)]

Data[ ]    = 2, 3, 4, 8, 10, 15, 21, 21, 23, 27, 28, 28, 35, 36, 36, 88
Dif[ ]     = 1, 1, 4, 2, 5, 6, 0, 2, 4, 1, 0, 7, 1, 0, 52   (boundary: Dif[0] = Dif[16] = 88)
Dif_Dif[ ] = 0, 3, 2, 3, 1, 4, 4, 2, 3, 6, 6, 6, 51, 51     (boundary: Dif_Dif[0] = Dif_Dif[15] = 0)

Dif_Mean = (1 + 1 + 4 + 2 + 5 + 6 + 2 + 4 + 1 + 7 + 1 + 52)/12 = 7.167;
because Dif[15] = 52 > Dif_Mean, Dif[15] > Dif_Dif[14], and Dif[15] > Dif_Dif[15],
CALL Cluster(1, 15);
CALL Cluster(16, 16);
[Cluster(1, 15)]

Data[ ]    = 2, 3, 4, 8, 10, 15, 21, 21, 23, 27, 28, 28, 35, 36, 36
Dif[ ]     = 1, 1, 4, 2, 5, 6, 0, 2, 4, 1, 0, 7, 1, 0   (boundary: Dif[0] = Dif[15] = 36)
Dif_Dif[ ] = 0, 3, 2, 3, 1, 4, 4, 2, 3, 6, 6, 6, 1      (boundary: Dif_Dif[0] = Dif_Dif[14] = 0)

Dif_Mean = (1 + 1 + 4 + 2 + 5 + 6 + 2 + 4 + 1 + 7 + 1)/11 = 3.09;
because Dif[3] = 4 > Dif_Mean, Dif[3] > Dif_Dif[2], and Dif[3] > Dif_Dif[3],
CALL Cluster(1, 3);
because Dif[5] = 5 > Dif_Mean, Dif[5] > Dif_Dif[4], and Dif[5] > Dif_Dif[5],
CALL Cluster(4, 5);
because Dif[6] = 6 > Dif_Mean, Dif[6] > Dif_Dif[5], and Dif[6] > Dif_Dif[6],
CALL Cluster(6, 6);
because Dif[9] = 4 > Dif_Mean, Dif[9] > Dif_Dif[8], and Dif[9] > Dif_Dif[9],
CALL Cluster(7, 9);
because Dif[12] = 7 > Dif_Mean, Dif[12] > Dif_Dif[11], and Dif[12] > Dif_Dif[12],
CALL Cluster(10, 12);
CALL Cluster(13, 15);
[Cluster(1, 3)]

Data[ ]    = 2, 3, 4
Dif[ ]     = 1, 1      (boundary: Dif[0] = Dif[3] = 4)
Dif_Dif[ ] = 0, 0, 0

Dif_Mean = (1 + 1)/2 = 1;
because Dif_Dif_Max = 0 < Dif_Mean, {2, 3, 4} forms a cluster.
[Cluster(4, 5)]

Data[ ]    = 8, 10
Dif[ ]     = 2         (boundary: Dif[3] = Dif[5] = 10)
Dif_Dif[ ] = 0
Dif_Mean = 2/2 = 1;
because Dif_Dif_Max = 0 < Dif_Mean, {8, 10} forms a cluster.
[Cluster(6, 6)]

Data[ ]    = 15        (boundary: Dif[5] = Dif[6] = 15)
Dif_Dif[ ] = 0

because L = R (i.e., 6 = 6), {15} forms a cluster.
[Cluster(7, 9)]

Data[ ]    = 21, 21, 23
Dif[ ]     = 0, 2      (boundary: Dif[6] = Dif[9] = 23)
Dif_Dif[ ] = 0, 2, 0

Dif_Mean = 2/1 = 2;
because Dif_Mean = Dif_Dif_Max (i.e., 2 = 2), {21, 21, 23} forms a cluster.
[Cluster(10, 12)]

Data[ ]    = 27, 28, 28
Dif[ ]     = 1, 0      (boundary: Dif[9] = Dif[12] = 28)
Dif_Dif[ ] = 0, 1, 0

Dif_Mean = 1/1 = 1;
because Dif_Mean = Dif_Dif_Max (i.e., 1 = 1), {27, 28, 28} forms a cluster.
[Cluster(13, 15)]

Data[ ]    = 35, 36, 36
Dif[ ]     = 1, 0      (boundary: Dif[12] = Dif[15] = 36)
Dif_Dif[ ] = 0, 1, 0
Dif_Mean = 1/1 = 1;
because Dif_Mean = Dif_Dif_Max (i.e., 1 = 1), {35, 36, 36} forms a cluster.
[Cluster(16, 16)]

Data[ ]    = 88        (boundary: Dif[15] = Dif[16] = 88)
Dif_Dif[ ] = 0

because L = R (i.e., 16 = 16), {88} forms a cluster.
Therefore, the clustering results are shown as follows:

Cluster[1] → {2, 3, 4},
Cluster[2] → {8, 10},
Cluster[3] → {15},
Cluster[4] → {21, 21, 23},
Cluster[5] → {27, 28, 28},
Cluster[6] → {35, 36, 36},
Cluster[7] → {88}.
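The whole procedure can be sketched in Python. Where the printed pseudocode is ambiguous (the bracketing of the split test and the (i = R-1) special case), this sketch resolves the ambiguity so that the worked example above is reproduced; the function name and the 1-indexed padding are ours:

```python
def automatic_clustering(values):
    """Sketch of the automatic clustering algorithm (Yeh and Chen 1994):
    sort the data, take neighbor differences Dif, compare each Dif with its
    nearest nonzero neighbors (Dif_Dif), and split where Dif dominates."""
    data = [None] + sorted(values)            # 1-indexed, as in the pseudocode
    nod = len(values)
    dif = [0] * (nod + 1)                     # Dif[i] = Data[i+1] - Data[i]
    for i in range(1, nod):
        dif[i] = data[i + 1] - data[i]
    dif_dif = [0] * (nod + 1)                 # |nearest nonzero Dif on the left
    for i in range(1, nod - 1):               #  - nearest nonzero Dif on the right|
        p = i
        while dif[p] == 0 and p > 1:
            p -= 1
        q = i + 1
        while dif[q] == 0 and q < nod - 1:
            q += 1
        dif_dif[i] = abs(dif[p] - dif[q])
    clusters = []

    def cluster(left, right, dif, dif_dif):
        dif, dif_dif = dif[:], dif_dif[:]     # each call adjusts its own copy
        # set the boundary
        dif[left - 1] = dif[right] = data[right]
        dif_dif[left - 1] = dif_dif[right - 1] = 0
        # re-anchor Dif_Dif next to runs of zeros at the two ends
        t = right - 1
        while t >= left and dif[t] == 0:
            t -= 1
        for i in range(max(t, left), right - 1):
            dif_dif[i] = dif[t]
        t = left
        while t < right and dif[t] == 0:
            t += 1
        for i in range(left, min(t, right)):
            dif_dif[i] = dif[t]
        nonzero = [d for d in dif[left:right] if d != 0]
        dif_mean = sum(nonzero) / len(nonzero) if nonzero else 0
        dif_dif_max = max(dif_dif[left:right - 1], default=0)
        if left == right or dif_mean >= dif_dif_max:
            clusters.append(data[left:right + 1])
            return
        x = left
        for i in range(left, right):          # split where Dif[i] dominates
            if (dif[i] > dif_mean and dif[i] >= dif_dif[i - 1]
                    and dif[i] >= dif_dif[i]):
                cluster(x, i, dif, dif_dif)
                x = i + 1
        cluster(x, right, dif, dif_dif)       # remainder of the segment

    cluster(1, nod, dif, dif_dif)
    return clusters
```

Running it on the example input reproduces the seven clusters listed above.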
We can use the automatic clustering algorithm to find errors or noise data in a training data set. Assume that there are $m$ training instances in a training data set which contains $n$ input attributes (input linguistic variables) $X_1, X_2, \ldots, X_n$, where the input values of the input attribute $X_i$ ($i = 1, 2, \ldots, n$) are denoted by $x_{i,1}, x_{i,2}, \ldots, x_{i,m}$, and $x_{i,p}$ denotes the $p$th input value of the input attribute (input linguistic variable) $X_i$. Furthermore, assume that the input values of each input attribute $X_i$ of the training data set have been sorted in an ascending sequence, i.e., $x_{i,1} \le x_{i,2} \le \cdots \le x_{i,m}$, where $1 \le i \le n$. (Note: For the Iris data (Fisher 1936), $n = 4$ and $m = 150$.) If the training data set contains some errors or noise data, we can use the automatic clustering algorithm we presented previously (Yeh and Chen 1994) to cluster the input values of each input attribute $X_i$ ($i = 1, 2, \ldots, n$), respectively, to find the errors or noise data.
Based on the automatic clustering algorithm, after the clustering process, the input values of each input attribute (input linguistic variable) $X_i$ ($i = 1, 2, \ldots, n$) can be clustered as follows:

$$X_i = \{(cx_{i,1}, cx_{i,2}, \ldots, cx_{i,z}) \mid z \text{ is the number of clusters}\} \qquad (7)$$

where $cx_{i,J}$ ($J = 1, 2, \ldots, z$) is a cluster of the input attribute $X_i$. A cluster $cx_{i,J}$ is a subset of the input values of the input attribute $X_i$. The number of elements in each cluster $cx_{i,J}$ is at least 1 and at most $m$, and the numbers of elements in the clusters of each input attribute $X_i$ sum to $m$. The clusters $cx_{i,1}, cx_{i,2}, \ldots, cx_{i,z}$ of the input attribute $X_i$ are sorted in an ascending sequence as follows:

$$cx_{i,J} = \{(x_{i,1}, x_{i,2}, \ldots, x_{i,j}) \mid j \text{ is the number of elements in the cluster } cx_{i,J}\} \qquad (8)$$

where $J = 1, 2, \ldots, z$. In Eq. (7), if $z$ is larger than or equal to 2, then it indicates that the training data set contains noise, and we should find this noise data. Because there is not too much noise data in the training data set, we find the clusters $cx_{i,J}$ of the input attribute $X_i$ ($i = 1, 2, \ldots, n$) which have the minimum number of elements. If an element in such a cluster $cx_{i,J}$ is larger than $Ave(X_i) \times (1 + t)$ or smaller than $Ave(X_i) \times (1 - t)$, then this datum is regarded as a noise datum, where $t$ is a constant and $Ave(X_i)$ is the average input value of the input attribute $X_i$. (Note that in this paper, we set the parameter $t$ to 0.5.)

Because noise data have a bad influence on constructing membership functions and generating fuzzy rules, we must modify them into proper values. The way we do it is to modify the noise value of an input attribute $X_i$ into the average input value $Ave(X_i)$ of the input attribute $X_i$, where $i = 1, 2, \ldots, n$.
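A minimal sketch of the repair rule for a single attribute follows; the cluster-based pre-selection of suspect values is omitted here, and the function name and that simplification are ours:

```python
def repair_noise(values, t=0.5):
    """Flag values outside [Ave*(1-t), Ave*(1+t)] as noise and replace
    them by the attribute's average, as described in the text."""
    ave = sum(values) / len(values)
    low, high = ave * (1 - t), ave * (1 + t)
    return [ave if (v < low or v > high) else v for v in values]
```

For example, repair_noise([10, 10, 10, 10, 60]) replaces the outlier 60 by the average 20 and leaves the remaining values untouched.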
Furthermore, because some attributes of the training data might not
be useful for classification and because they will influence the training
efficiency and performance, we should remove these attributes before
the training process, and then generate the fuzzy rules to improve the
classification accuracy rate. Thus, we must find the attributes that are useless for classification. We can calculate the degree of fuzziness of each attribute to determine whether it is useful for classification or not. One method to calculate the degree of fuzziness is to calculate the entropy value (Hong and Lee 1999) of each attribute. In this article, we propose a new method to calculate the degree of fuzziness of each attribute.
The entropy method is widely used to calculate the degrees of fuzziness of attributes. Assume that an attribute $A$ with $m$ possible values $\{A_1, A_2, \ldots, A_m\}$ can partition the set of training instances $S$ into $\{S_1, S_2, \ldots, S_i\}$, where $S_i$ may include some $A_i$. Furthermore, assume that there are $k$ classes to be classified. Let the number of training instances with attribute value $A_i$ and class $O_j$ be $n_{ij}$. Then, the entropy value of attribute $A$ is calculated as (Hong and Lee 1999)

$$E(A) = 1 - \sum_{i=1}^{m} \left\{ \sum_{j=1}^{k} \left[ \left( n_{ij} \Big/ \sum_{r=1}^{k} n_{ir} \right) \log \left( n_{ij} \Big/ \sum_{r=1}^{k} n_{ir} \right) \right] \right\} \qquad (9)$$

where $E(A) \in [0, 1]$. The larger the entropy value $E(A)$, the more helpful the attribute $A$ is for classification. Conversely, the smaller the value of $E(A)$, the less helpful the attribute $A$ is for classification.
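Because the printed form of Eq. (9) is hard to transcribe exactly, the sketch below follows the usual entropy-style reading: attribute values whose classes are pure score close to 1, mixed ones close to 0. The instance weighting and the log base are our assumptions, not the paper's:

```python
import math

def attribute_helpfulness(counts):
    """counts[i][j]: number of training instances with attribute value A_i
    and class O_j.  Returns 1 minus the instance-weighted, normalized
    class entropy of the attribute-value groups."""
    total = sum(sum(row) for row in counts)
    k = len(counts[0])                       # number of classes
    mixed = 0.0
    for row in counts:
        n_i = sum(row)
        if n_i == 0:
            continue
        entropy = -sum((c / n_i) * math.log(c / n_i, k) for c in row if c > 0)
        mixed += (n_i / total) * entropy
    return 1.0 - mixed
```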
In this article, we present a new method to calculate the degree of fuzziness $D(X_i)$ of an attribute $X_i$ ($i = 1, 2, \ldots, n$), shown as follows:

$$D_{i,j} = \begin{cases} \dfrac{x'_{i,j} - Ave(X_i)}{\max(x'_{i,j}) - Ave(X_i)} & \text{if } x'_{i,j} \ge Ave(X_i) \\[2ex] \dfrac{Ave(X_i) - x'_{i,j}}{Ave(X_i) - \min(x'_{i,j})} & \text{if } x'_{i,j} < Ave(X_i) \end{cases} \qquad (10)$$

$$D(X_i) = \frac{\sum_{j=1}^{m} D_{i,j}}{m} \qquad (11)$$

where $x'_{i,1}, x'_{i,2}, \ldots, x'_{i,m}$ are the input values of the input attribute $X_i$ ($i = 1, 2, \ldots, n$), $\max(x'_{i,j})$ denotes the maximum value among the values $x'_{i,1}, x'_{i,2}, \ldots, x'_{i,m}$ of the input attribute $X_i$, $\min(x'_{i,j})$ denotes the minimum value among the values $x'_{i,1}, x'_{i,2}, \ldots, x'_{i,m}$ of the input attribute $X_i$, $Ave(X_i)$ denotes the average input value of the input attribute $X_i$, and $D(X_i) \in [0, 1]$. The larger the value of $D(X_i)$, the more helpful the input
attribute $X_i$ ($i = 1, 2, \ldots, n$) is for classification. The smaller the value of $D(X_i)$, the less helpful the input attribute $X_i$ is for classification.

After calculating the degree of fuzziness of each attribute, if the degree of fuzziness of an attribute is smaller than a threshold value $\omega$, where $\omega$ is given by the user and $\omega \in [0, 1]$, then it is regarded as a useless attribute for classification.
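The new measure of Eqs. (10)-(11) can be sketched as follows (the plain-list input and the guard for a constant attribute are our additions):

```python
def degree_of_fuzziness(xs):
    """D(X_i) of Eqs. (10)-(11): each value's deviation from the mean,
    scaled by the attribute's range on that side of the mean, averaged."""
    ave = sum(xs) / len(xs)
    hi, lo = max(xs), min(xs)

    def d(x):
        if x >= ave:
            return (x - ave) / (hi - ave) if hi > ave else 0.0
        return (ave - x) / (ave - lo)

    return sum(d(x) for x in xs) / len(xs)
```

Values spread toward the extremes push the score toward 1; values bunched at the mean push it toward 0.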
In this article, we extend our previous work (Wu and Chen 1999) to present a new method for generating fuzzy rules from a training data set containing noise. Assume that the training data set $P$ contains the following $m$ input-output pairs:

$$P = \{(x_{1,j}, x_{2,j}, \ldots, x_{n,j}, y_j) \mid j = 1, 2, \ldots, m\}$$

where $m$ denotes the number of training instances; $x_{i,j}$ denotes the $j$th input value of the input attribute (input linguistic variable) $X_i$ ($i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, m$); $y_j$ denotes the $j$th output value of the output attribute (output linguistic variable) $Y$, where $j = 1, 2, \ldots, m$. (Note: In the Iris data, there are 150 training instances (i.e., $m = 150$), and there are 4 input attributes, "Sepal Length," "Sepal Width," "Petal Length," and "Petal Width" (i.e., $n = 4$). The output attribute is "the flower"; its possible output values are "Setosa," "Versicolor," and "Virginica," i.e., $y_j \in \{$Setosa, Versicolor, Virginica$\}$, where $j = 1, 2, \ldots, m$.)
First, we consider the values $y'_1, y'_2, \ldots, y'_m$ of the output attribute $Y$ to calculate the fuzzy relation $R(y'_{p_1}, y'_{p_2})$ and the fuzzy equivalence relation $R_T$, where the definition of $R(y'_{p_1}, y'_{p_2})$ is

$$R(y'_{p_1}, y'_{p_2}) = 1 - \frac{1}{d} \, |y'_{p_1} - y'_{p_2}| \qquad (12)$$

where $y'_{p_1}$ and $y'_{p_2}$ are output values of the output attribute and the value $d$ can be obtained as (Wu and Chen 1999)

$$d = \frac{\sum_{i=1}^{m-1} |y_i - y_m|}{m - 1} \qquad (13)$$

where $y_m$ is the maximum output value of the output attribute $Y$. Because $R(y'_{p_1}, y'_{p_2})$ is a fuzzy compatibility relation and not a fuzzy equivalence
relation, after calculating the Max-Min transitive closure, we can get a fuzzy equivalence relation $R_T(y'_{p_1}, y'_{p_2})$.
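A sketch of Eqs. (12)-(13) over a plain list of output values; it assumes at least two distinct output values (otherwise $d = 0$), and clamping negative similarities to 0 is our addition, not stated in the paper:

```python
def output_similarity(ys):
    """Pairwise fuzzy compatibility relation R(y_p1, y_p2) = 1 - |y_p1 - y_p2| / d,
    with d from Eq. (13): the average distance to the maximum output value."""
    m = len(ys)
    y_max = max(ys)
    d = sum(abs(y - y_max) for y in ys) / (m - 1)
    return [[max(0.0, 1 - abs(a - b) / d) for b in ys] for a in ys]
```

The resulting matrix is reflexive and symmetric; its Max-Min transitive closure (as sketched earlier) yields the fuzzy equivalence relation $R_T$.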
After calculating the fuzzy equivalence relation $R_T(y'_{p_1}, y'_{p_2})$, we can divide the training data set $P'$ into many different parts by $\alpha$-cuts (Wu and Chen 1999). We divide the training data set $P'$ into $r$ different subsets $G_j$ ($j = 1, 2, \ldots, r$) as

$$G_j = \{(x'_{1,p}, x'_{2,p}, \ldots, x'_{n,p}, y'_p) \mid R_T(y'_{p_1}, y'_{p_2}) \ge \alpha, \; \alpha \in [0, 1], \; 1 \le p \le m, \; 1 \le p_1 \le m, \text{ and } 1 \le p_2 \le m\} \qquad (14)$$

where $\alpha$ is a threshold value to divide the training data set $P'$, $\alpha \in [0, 1]$, and $r$ is the number of subsets divided by $\alpha$-cuts of the training data set $P'$.
After performing the $\alpha$-cut operations, $O_j$ is the $j$th output-value set of the output attribute $Y$ and $I_{i,j}$ is the $j$th input-value set of a subset $G_j$ of the input attribute $X_i$, where

$$O_j = \{y'_p \mid \forall (x'_{1,p}, x'_{2,p}, \ldots, x'_{n,p}, y'_p) \in G_j, \; 1 \le p \le m\}, \quad 1 \le j \le r \qquad (15)$$

$$I_{i,j} = \{x'_{i,p} \mid \forall (x'_{1,p}, x'_{2,p}, \ldots, x'_{i,p}, \ldots, x'_{n,p}, y'_p) \in G_j, \; 1 \le p \le m\}, \quad 1 \le i \le n, \; 1 \le j \le r \qquad (16)$$
Then, we can construct the membership functions of the output attribute $Y$ from the output-value sets $O_j$, where $j = 1, 2, \ldots, r$. Because we divide the training data set $P'$ into $r$ different output-value sets $O_j$ ($j = 1, 2, \ldots, r$) by $\alpha$-cuts, every output-value set corresponds to a membership function as

$$A_{j,\alpha} = \{y' \mid y' \in O_j \text{ and } \mu_{A_j}(y') \ge \alpha\}, \quad j = 1, 2, \ldots, r \qquad (17)$$

where $\mu_{A_j}$ is the membership function of the output fuzzy set $A_j$. In this article, we use triangular membership functions to represent fuzzy sets. A fuzzy set can be represented by a triangular membership function $(a, b, c)$ as shown in Figure 1. The triangular membership function $(a_j, b_j, c_j)$ of the output fuzzy set $A_j$ can be obtained as

$$b_j = \frac{y'_{\min} + y'_{\max}}{2}, \quad a_j = b_j - \frac{b_j - y'_{\min}}{1 - \alpha}, \quad c_j = b_j + \frac{y'_{\max} - b_j}{1 - \alpha} \qquad (18)$$
where $y'_{\min}$ and $y'_{\max}$ are the minimum element and the maximum element of $A_{j,\alpha}$, respectively, and the membership function can be obtained as

$$\mu_{A_j}(y') = \begin{cases} \dfrac{y' - a_j}{|b_j - a_j|} & \text{if } a_j \le y' \le b_j \\[2ex] \dfrac{c_j - y'}{|c_j - b_j|} & \text{if } b_j \le y' \le c_j \\[1ex] 0 & \text{otherwise} \end{cases} \qquad (19)$$
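Eqs. (18)-(19) in miniature: build the triplet from an $\alpha$-cut's extremes, then evaluate the membership function. The function names are ours, not the paper's:

```python
def triplet_from_alpha_cut(values, alpha):
    """Eq. (18): choose (a, b, c) so that the alpha-cut of the triangle
    spans exactly [min(values), max(values)]."""
    y_min, y_max = min(values), max(values)
    b = (y_min + y_max) / 2
    a = b - (b - y_min) / (1 - alpha)
    c = b + (y_max - b) / (1 - alpha)
    return a, b, c

def membership(y, triplet):
    """Eq. (19): evaluate the triangular membership function at y."""
    a, b, c = triplet
    if a <= y <= b:
        return (y - a) / (b - a) if b > a else 1.0
    if b < y <= c:
        return (c - y) / (c - b)
    return 0.0
```

By construction, the membership degree at $y'_{\min}$ and $y'_{\max}$ is exactly $\alpha$, which is how Eq. (17) is inverted.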
After constructing the membership functions $\mu_{A_1}(y'), \mu_{A_2}(y'), \ldots, \mu_{A_r}(y')$ of $A_1, A_2, \ldots, A_r$, respectively, we can construct the membership functions of the $n$ input attributes (input linguistic variables) $X_1, X_2, \ldots, X_n$, respectively. For the input attribute $X_i$ ($i = 1, 2, \ldots, n$), its corresponding input values in the sorted training data set $P'$ have concurrently been divided into $r$ input-value sets $I_{i,1}, I_{i,2}, \ldots, I_{i,r}$. We calculate the fuzzy compatibility relation $R_{i,j}(x'_{i,p_1}, x'_{i,p_2})$ as (Klir and Yuan 1995; Wu and Chen 1999)

$$R_{i,j}(x'_{i,p_1}, x'_{i,p_2}) = 1 - \frac{1}{d} \, |x'_{i,p_1} - x'_{i,p_2}| \qquad (20)$$

where $x'_{i,p_1} \in I_{i,j}$, $x'_{i,p_2} \in I_{i,j}$, $1 \le i \le n$, $1 \le j \le r$, $1 \le p_1 \le |I_{i,j}|$, $1 \le p_2 \le |I_{i,j}|$, and the value of $d$ is calculated as

$$d = \frac{\sum_{p=1}^{|I_{i,j}|-1} |x'_{i,p} - x'_{i,\max}|}{|I_{i,j}| - 1} \qquad (21)$$
where $x'_{i,\max}$ is the maximum input value of $I_{i,j}$, and $|I_{i,j}|$ denotes the number of elements of the input-value set $I_{i,j}$. Because $R_{i,j}(x'_{i,p_1}, x'_{i,p_2})$ is a fuzzy compatibility relation and not a fuzzy equivalence relation, after calculating the Max-Min transitive closure, we can get a fuzzy equivalence relation $R_T(x'_{i,p_1}, x'_{i,p_2})$. Furthermore, based on the $\alpha$-cuts of fuzzy equivalence relations, we can divide different subsets $I_{i,j,k}$ as

$$I_{i,j,k} = \{x'_{i,p} \mid x'_{i,p} \in I_{i,j}, \; R_{T_{i,j}}(x'_{i,p_1}, x'_{i,p_2}) \ge \alpha, \; \alpha \in [0, 1], \; 1 \le p \le |I_{i,j}|, \; 1 \le p_1 \le |I_{i,j}|, \; 1 \le p_2 \le |I_{i,j}|\} \qquad (22)$$
where $k = 1, 2, \ldots, T_j(X_i)$, and $T_j(X_i)$ is the number of input-value subsets obtained from the $j$th input-value set $I_{i,j}$.

We use $\alpha$-cuts to divide the input attribute (input linguistic variable) $X_i$ ($i = 1, 2, \ldots, n$) into some different $\alpha$-cuts $A_{(i,j,k),\alpha}$ as

$$A_{(i,j,k),\alpha} = \{x'_{i,p} \mid x'_{i,p} \in I_{i,j,k}, \; \mu_{A_{i,j,k}}(x'_{i,p}) \ge \alpha\}, \quad 1 \le j \le r, \text{ and } 1 \le k \le T_j(X_i) \qquad (23)$$

where $\mu_{A_{i,j,k}}$ is the membership function of the input fuzzy set $A_{i,j,k}$. We can define the membership function $\mu_{A_{i,j,k}}$ by a triplet $(a_{i,j,k}, b_{i,j,k}, c_{i,j,k})$ shown as

$$b_{i,j,k} = \frac{x'_{i,\min} + x'_{i,\max}}{2}, \quad a_{i,j,k} = b_{i,j,k} - \frac{b_{i,j,k} - x'_{i,\min}}{1 - \alpha}, \quad c_{i,j,k} = b_{i,j,k} + \frac{x'_{i,\max} - b_{i,j,k}}{1 - \alpha} \qquad (24)$$

where $x'_{i,\min}$ and $x'_{i,\max}$ are the minimum and the maximum elements of the $\alpha$-cut $A_{(i,j,k),\alpha}$, respectively. Furthermore, we can define the membership function $\mu_{A_{i,j,k}}$ as

$$\mu_{A_{i,j,k}}(x'_i) = \begin{cases} \dfrac{x'_i - a_{i,j,k}}{|b_{i,j,k} - a_{i,j,k}|} & \text{if } a_{i,j,k} \le x'_i \le b_{i,j,k} \\[2ex] \dfrac{c_{i,j,k} - x'_i}{|c_{i,j,k} - b_{i,j,k}|} & \text{if } b_{i,j,k} \le x'_i \le c_{i,j,k} \\[1ex] 0 & \text{otherwise} \end{cases} \qquad (25)$$
Thus, we can use these membership functions to generate fuzzy rules. Consider the following two fuzzy rules (Wu and Chen 1999):

IF $X_1$ is $A_{1,1,1}$ and $X_2$ is $A_{2,1,1}$ THEN $Y$ is $y_1$
IF $X_1$ is $A_{1,2,1}$ and $X_2$ is $A_{2,2,1}$ THEN $Y$ is $y_2$

where $X_1$ and $X_2$ are input attributes (input linguistic variables); $Y$ is the output attribute; $A_{1,1,1}$ and $A_{1,2,1}$ are the input fuzzy sets of the input attribute $X_1$; $A_{2,1,1}$ and $A_{2,2,1}$ are the input fuzzy sets of the input attribute $X_2$; and $y_1$ and $y_2$ are the output values of the output attribute (output linguistic variable) $Y$, respectively. If the degree of similarity between the input fuzzy sets $A_{2,1,1}$
and $A_{2,2,1}$ is larger than a threshold value, where the threshold value is given by the user and is between zero and one, then we can merge the input fuzzy sets $A_{2,1,1}$ and $A_{2,2,1}$ into $A_{2,\mathrm{new}}$ and simplify these two fuzzy rules into the following rules:

IF $X_1$ is $A_{1,1,1}$ and $X_2$ is $A_{2,\mathrm{new}}$ THEN $Y$ is $y_1$
IF $X_1$ is $A_{1,2,1}$ and $X_2$ is $A_{2,\mathrm{new}}$ THEN $Y$ is $y_2$
In the following,we briefly review the process of how tomerge the input
fuzzy sets Ai; j1;k1 and Ai; j2;k2 into Ai;new (Wu and Chen 1999). First, we can
merge some very similar membership functions appearing in the fuzzy rule
base to calculate the equality degree of two input fuzzy sets Ai; j1;k1 and
Ai; j2;k2 of input attribute (input linguistic variable)Xi as: (Lin andLee 1996)
$$E(A_{i,j_1,k_1}, A_{i,j_2,k_2}) = \frac{|A_{i,j_1,k_1} \cap A_{i,j_2,k_2}|}{|A_{i,j_1,k_1} \cup A_{i,j_2,k_2}|} \qquad (26)$$

where 1 ≤ j_1 ≤ r, 1 ≤ j_2 ≤ r, 1 ≤ k_1 ≤ T_{j1}(X_i), 1 ≤ k_2 ≤ T_{j2}(X_i), and E(A_{i,j1,k1}, A_{i,j2,k2}) ∈ [0, 1]. Let α_Equality be a threshold value (Wu and Chen 1999), where α_Equality ∈ [0, 1]. If E(A_{i,j1,k1}, A_{i,j2,k2}) > α_Equality, then we can merge the input fuzzy sets A_{i,j1,k1} and A_{i,j2,k2} to get a new input fuzzy set A_{i,new}, which can be represented by a triplet (a_{i,new}, b_{i,new}, c_{i,new}). This triplet is calculated by averaging the triplets (a_{i,j1,k1}, b_{i,j1,k1}, c_{i,j1,k1}) and (a_{i,j2,k2}, b_{i,j2,k2}, c_{i,j2,k2}). Therefore, the triplet (a_{i,new}, b_{i,new}, c_{i,new}) of the merged fuzzy set A_{i,new} is obtained as
$$a_{i,new} = \frac{a_{i,j_1,k_1} + a_{i,j_2,k_2}}{2}, \quad b_{i,new} = \frac{b_{i,j_1,k_1} + b_{i,j_2,k_2}}{2}, \quad c_{i,new} = \frac{c_{i,j_1,k_1} + c_{i,j_2,k_2}}{2} \qquad (27)$$
where the membership function μ_{A_{i,new}} of the input fuzzy set A_{i,new} is obtained as

$$\mu_{A_{i,new}}(x_i') = \begin{cases} \dfrac{x_i' - a_{i,new}}{|b_{i,new} - a_{i,new}|} & \text{if } a_{i,new} \le x_i' \le b_{i,new} \\[4pt] \dfrac{c_{i,new} - x_i'}{|c_{i,new} - b_{i,new}|} & \text{if } b_{i,new} \le x_i' \le c_{i,new} \\[4pt] 0 & \text{otherwise} \end{cases}$$
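As a concrete illustration, the merging step can be sketched in a few lines of Python. This is a minimal sketch, not the paper's implementation: fuzzy sets are the triangular triplets (a, b, c) described above, and the equality degree of Eq. (26) is approximated by discretizing the two membership functions rather than computed in closed form. The function names are illustrative.

```python
def tri_mu(x, a, b, c):
    """Triangular membership function for the triplet (a, b, c)."""
    if a <= x <= b and b != a:
        return (x - a) / (b - a)
    if b <= x <= c and c != b:
        return (c - x) / (c - b)
    return 1.0 if x == b else 0.0

def equality_degree(t1, t2, steps=2000):
    """Approximate Eq. (26), |A1 ∩ A2| / |A1 ∪ A2|, by discretization."""
    lo, hi = min(t1[0], t2[0]), max(t1[2], t2[2])
    inter = union = 0.0
    for i in range(steps + 1):
        x = lo + (hi - lo) * i / steps
        m1, m2 = tri_mu(x, *t1), tri_mu(x, *t2)
        inter += min(m1, m2)   # intersection takes the pointwise min
        union += max(m1, m2)   # union takes the pointwise max
    return inter / union if union else 0.0

def merge_triplets(t1, t2):
    """Eq. (27): average the two triplets component-wise."""
    return tuple((u + v) / 2 for u, v in zip(t1, t2))

A1, A2 = (1.0, 2.0, 3.0), (1.2, 2.2, 3.2)
if equality_degree(A1, A2) > 0.5:          # alpha_Equality = 0.5
    A_new = merge_triplets(A1, A2)         # roughly (1.1, 2.1, 3.1)
```

For the two heavily overlapping triangles above, the approximated equality degree is about 0.68, which exceeds α_Equality = 0.5, so they are merged into their component-wise average.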
In the following, we present a new algorithm to construct membership functions and to generate fuzzy rules from training data containing noise. It is essentially a modification of the algorithm we presented previously (Wu and Chen 1999), and it proceeds as follows.
Input. The training data set P contains m input-output pairs (x_{1,j}, x_{2,j}, ..., x_{n,j}, y_j), i.e., P = {(x_{1,j}, x_{2,j}, ..., x_{n,j}, y_j) | j = 1, 2, ..., m}, where m is the number of training instances; n is the number of input attributes (input linguistic variables); x_{i,1}, x_{i,2}, ..., x_{i,m} are the input values of the input attribute (input linguistic variable) X_i, where 1 ≤ i ≤ n; and y_j is the jth output value of the output attribute (output linguistic variable) Y of the training data set P, where 1 ≤ j ≤ m.
Output. Fuzzy rules.
Step 1. For each input attribute (input linguistic variable) X_i (i = 1, 2, ..., n) in the training data set P, sort the input values x_{i,1}, x_{i,2}, ..., x_{i,m} of each input attribute X_i into an ascending sequence x'_{i,1}, x'_{i,2}, ..., x'_{i,m}, respectively, to obtain the sorted training data set P'.
Step 2. For each input attribute (input linguistic variable) X_i (i = 1, 2, ..., n) in the sorted training data set P', use the automatic clustering algorithm to cluster the sorted input values x'_{i,1}, x'_{i,2}, ..., x'_{i,m} of the input attribute X_i, respectively, where 1 ≤ i ≤ n.
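Step 2 relies on the automatic clustering algorithm of Yeh and Chen (1994), which is not reproduced here. As a rough, hypothetical stand-in for intuition only, the sketch below starts a new cluster wherever the gap between adjacent sorted values exceeds the average gap:

```python
def cluster_sorted(values):
    """Cluster sorted values: cut where a gap exceeds the mean gap.

    A simplified stand-in for the automatic clustering algorithm;
    the actual algorithm is given in Yeh and Chen (1994).
    """
    xs = sorted(values)
    if len(xs) < 2:
        return [xs]
    gaps = [b - a for a, b in zip(xs, xs[1:])]
    avg_gap = sum(gaps) / len(gaps)
    clusters = [[xs[0]]]
    for gap, v in zip(gaps, xs[1:]):
        if gap > avg_gap:
            clusters.append([v])      # large gap: start a new cluster
        else:
            clusters[-1].append(v)
    return clusters

cluster_sorted([5.0, 5.1, 4.9, 9.8, 5.2])   # -> [[4.9, 5.0, 5.1, 5.2], [9.8]]
```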
Step 3. For each input attribute (input linguistic variable) X_i (i = 1, 2, ..., n) in the sorted training data set P',
if the attribute values of the attribute X_i were clustered into two or more clusters,
then find the cluster C_j of the input attribute (input linguistic variable) X_i that has the minimum number of elements, respectively. If the values of the elements in cluster C_j are larger than Ave(X_i) × (1 + t) or smaller than Ave(X_i) × (1 − t), where t is a constant (note that in this paper we set t = 0.5) and Ave(X_i) is the average attribute value of the input attribute (input linguistic variable) X_i, then set the values of these elements to Ave(X_i).
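A sketch of Step 3's noise handling, assuming the clusters produced in Step 2 are plain lists of attribute values (the function and variable names are ours): elements of the smallest cluster lying outside Ave(X_i) × (1 ± t) are pulled back to the attribute average.

```python
def smooth_outliers(clusters, t=0.5):
    """Replace suspected noise in the smallest cluster by the average.

    clusters: list of clusters, each a list of attribute values.
    """
    if len(clusters) < 2:                 # Step 3 needs two or more clusters
        return clusters
    values = [v for c in clusters for v in c]
    ave = sum(values) / len(values)       # Ave(Xi)
    smallest = min(clusters, key=len)
    for idx, v in enumerate(smallest):
        if v > ave * (1 + t) or v < ave * (1 - t):
            smallest[idx] = ave           # pull the outlier to the average
    return clusters

clusters = [[4.9, 5.0, 5.1, 5.2], [9.8]]  # 9.8 looks like injected noise
smooth_outliers(clusters)                  # 9.8 is pulled to the average (~6.0)
```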
Step 4. For each input attribute (input linguistic variable) X_i (i = 1, 2, ..., n) in the sorted training data set P', calculate the degree of fuzziness D(X_i) of each input attribute X_i using Eqs. (10) and (11).
Step 5. for i = 1 to n do
if the degree of fuzziness D(X_i) of an input attribute (input linguistic variable) X_i in the sorted training data set P' is smaller than the threshold value ω given by the user, where ω ∈ [0, 1],
then remove the input attribute (input linguistic variable) X_i from the sorted training data set P';
end.
Step 6. Construct the equivalence relation R_T and divide the sorted training data set P' into r different sets by the following steps:

1. Sort the training data set P' into an ascending sequence based on the output values of the output attribute Y to obtain the sorted training data set P''. Construct the equivalence relation R_T between the output values of the output attribute Y in the sorted training data set P''.
2. Divide the sorted training data in the sorted training set P'' into r different output-value sets O_j of the output attribute Y and r different input-value sets I_{i,j} of the input attribute X_i based on the α-cuts of the equivalence relation R_T derived from the output values of the output attribute Y in the sorted training set P'', where 1 ≤ i ≤ n and 1 ≤ j ≤ r; α is a threshold value given by the user, and α ∈ [0, 1].
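Step 6's α-cut partition can be sketched as follows. The construction of the equivalence relation R_T is given earlier in the paper and is not repeated here; this sketch assumes the common form in which two adjacent sorted output values are related at level 1 − |y_{j+1} − y_j| / (y_max − y_min), and the sorted sequence is cut wherever that level falls below α.

```python
def partition_outputs(sorted_ys, alpha=0.7):
    """Split sorted output values into groups at the alpha-cut.

    Assumes adjacent values y1, y2 are related at level
    1 - |y2 - y1| / (ymax - ymin); a new group starts wherever
    the level drops below alpha.
    """
    ymin, ymax = sorted_ys[0], sorted_ys[-1]
    span = (ymax - ymin) or 1.0            # guard against a zero span
    groups = [[sorted_ys[0]]]
    for prev, cur in zip(sorted_ys, sorted_ys[1:]):
        level = 1 - abs(cur - prev) / span
        if level >= alpha:
            groups[-1].append(cur)
        else:
            groups.append([cur])           # alpha-cut boundary
    return groups

ys = sorted([1.0, 1.0, 1.1, 2.0, 2.1, 3.0])
partition_outputs(ys, alpha=0.7)   # -> [[1.0, 1.0, 1.1], [2.0, 2.1], [3.0]]
```

With α = 0.7 the six outputs split into r = 3 output-value sets; the input values of each attribute are divided into the corresponding input-value sets I_{i,j} the same way.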
Step 7. Derive the membership function μ_{A_j}(y) of the output fuzzy set A_j of the output attribute Y based on the α-cuts A_{j,α} of the output fuzzy set A_j with respect to the output-value sets O_j, where 1 ≤ j ≤ r, α is a threshold value given by the user, and α ∈ [0, 1].
Step 8. Construct the equivalence relation R_{T_{i,j}} and divide the input values of the input-value set I_{i,j} into T_j(X_i) input-value subsets, where 1 ≤ i ≤ n and 1 ≤ j ≤ r, by the following steps:

1. Sort the input values of the input-value set I_{i,j} of the input attribute (input linguistic variable) X_i into an ascending sequence, where 1 ≤ i ≤ n and 1 ≤ j ≤ r, and construct the equivalence relation R_{T_{i,j}} between
the input values of the jth input-value set I_{i,j} of the input attribute (input linguistic variable) X_i.
2. Divide the input values of the input-value set I_{i,j} into T_j(X_i) input-value subsets I_{i,j,k} based on the α-cuts of the equivalence relation R_{T_{i,j}}, where 1 ≤ i ≤ n, 1 ≤ j ≤ r, 1 ≤ k ≤ T_j(X_i), and T_j(X_i) is the number of input-value subsets obtained from the jth input-value set I_{i,j} of the input attribute X_i.
3. Derive the input membership function μ_{A_{i,j,k}}(X_i) of the input fuzzy set A_{i,j,k} of the input attribute (input linguistic variable) X_i based on the α-cuts A_{(i,j,k),α} of the input fuzzy set A_{i,j,k} with respect to the input-value subset I_{i,j,k} of the input attribute (input linguistic variable) X_i, where 1 ≤ i ≤ n, 1 ≤ j ≤ r, 1 ≤ k ≤ T_j(X_i), and T_j(X_i) is the number of input-value subsets obtained from the jth input-value set I_{i,j} of the input attribute (input linguistic variable) X_i.
Step 9. Generate the fuzzy rules based on the hierarchical relationship between the output value y_j of the output attribute (output linguistic variable) Y and the corresponding input fuzzy set A_{i,j,k} of the input attribute (input linguistic variable) X_i, where 1 ≤ i ≤ n, 1 ≤ j ≤ r, and 1 ≤ k ≤ T_j(X_i).
Step 10. Calculate the equality degree between input fuzzy sets and merge the membership functions of the fuzzy rule base by the following steps:

1. Calculate the equality degree E(A_{i,j1,k1}, A_{i,j2,k2}) between the input fuzzy sets A_{i,j1,k1} and A_{i,j2,k2} of the input attribute (input linguistic variable) X_i based on Eq. (26), where 1 ≤ i ≤ n, 1 ≤ j_1 ≤ r, 1 ≤ j_2 ≤ r, 1 ≤ k_1 ≤ T_{j1}(X_i), and 1 ≤ k_2 ≤ T_{j2}(X_i).
2. if E(A_{i,j1,k1}, A_{i,j2,k2}) > α_Equality, where the value of α_Equality is specified by the user and α_Equality ∈ [0, 1], 1 ≤ j_1 ≤ r, 1 ≤ j_2 ≤ r, 1 ≤ k_1 ≤ T_{j1}(X_i), and 1 ≤ k_2 ≤ T_{j2}(X_i),
then
{
Construct the new input fuzzy set A_{i,new} of the input attribute (input linguistic variable) X_i by merging the input fuzzy sets A_{i,j1,k1} and A_{i,j2,k2} of the input attribute X_i;
Replace the input fuzzy sets A_{i,j1,k1} and A_{i,j2,k2} of the input attribute X_i in the antecedent parts of the fuzzy rules by the new input fuzzy set A_{i,new}
}.
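The merging pass of Step 10 can be sketched as below. The rule representation (a dict mapping an attribute name to a triangular triplet per rule) and the injected `equality` function are illustrative assumptions; in the paper, the equality degree is Eq. (26) and the merge is the triplet average of Eq. (27).

```python
def merge_rule_base(rules, equality, alpha_eq=0.5):
    """Repeatedly merge input fuzzy sets whose equality degree > alpha_eq.

    rules: list of (antecedents, consequent) pairs, where antecedents maps
    an attribute name to a triangular triplet (a, b, c).
    equality: function of two triplets implementing Eq. (26).
    """
    attrs = {a for ants, _ in rules for a in ants}
    for attr in attrs:
        changed = True
        while changed:                     # repeat until no pair merges
            changed = False
            triplets = sorted({ants[attr] for ants, _ in rules if attr in ants})
            for i in range(len(triplets)):
                for j in range(i + 1, len(triplets)):
                    t1, t2 = triplets[i], triplets[j]
                    if equality(t1, t2) > alpha_eq:
                        # Eq. (27): component-wise average of the triplets
                        new = tuple((u + v) / 2 for u, v in zip(t1, t2))
                        for ants, _ in rules:
                            if ants.get(attr) in (t1, t2):
                                ants[attr] = new   # rewrite the antecedents
                        changed = True
                        break
                if changed:
                    break
    return rules
```

With two rules whose petal-width sets are (0.0, 1.0, 2.0) and (0.2, 1.2, 2.2) and an equality degree above α_Equality, both antecedents end up sharing the merged set (0.1, 1.1, 2.1).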
EXPERIMENTAL RESULTS
We applied the proposed algorithm to deal with the Iris data classification problem, where 5% noise was added into the Iris data. There are three kinds of flowers in the Iris data (i.e., Setosa, Versicolor, and Virginica), and there are 50 instances of each kind. Thus, the Iris data contains 150 instances, and there are four input attributes in the Iris data, i.e., sepal length (SL), sepal width (SW), petal length (PL), and petal width (PW). We used MATLAB version 5.0 to write a program on a Pentium II PC to deal with the Iris data classification problem (Fisher 1936; Hong and Lee 1996; Wu and Chen 1999). Because 150 × 5% = 7.5, we generate 8 noise values randomly in the input attributes. We randomly chose 75 instances from the Iris data set as the training data set to construct the membership functions and to generate fuzzy rules, and the other 75 instances are used to test the correctness of the classification of the generated fuzzy rules. In this case, we set the parameter t to 0.5 and use Eqs. (10) and (11) to calculate the degree of fuzziness of each attribute. We also set the threshold value ω to 0.5, the threshold value α to 0.7, and the threshold value α_Equality to 0.5.
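The setup of this experiment can be sketched as follows. The data here are random stand-ins for the 150 Iris instances, and the noise model (scaling a randomly chosen attribute value) is our assumption, since the text does not spell out how the 8 noise values are injected.

```python
import math
import random

random.seed(7)
# Stand-in for the 150 Iris instances with 4 input attributes each.
data = [[random.uniform(0.1, 7.9) for _ in range(4)] for _ in range(150)]

# Because 150 * 5% = 7.5, eight noise values are injected.
n_noise = math.ceil(len(data) * 0.05)
for _ in range(n_noise):
    row = random.randrange(len(data))
    col = random.randrange(4)
    data[row][col] *= 3            # assumed noise model: scale the value

# Random 75/75 split into training and testing sets.
random.shuffle(data)
train, test = data[:75], data[75:]
```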
After executing the program, we can see that the degrees of fuzziness of the attributes SL and SW are smaller than the threshold value ω, respectively, where ω = 0.5. In other words, SL and SW are useless for classification, so we remove these attributes from the training data. Now there are only two attributes in the training data set, i.e., PL and PW.
According to the proposed algorithm, we can generate the fuzzy rules shown below:

IF petal length is PL0 AND petal width is PW0 THEN the flower is Setosa,
IF petal length is PL1 AND petal width is PW1 THEN the flower is Versicolor,
IF petal length is PL2 AND petal width is PW2 THEN the flower is Virginica,

where the membership functions of PL0, PL1, and PL2 are shown in Figure 2, and the membership functions of PW0, PW1, and PW2 are shown in Figure 3.
Based on Eq. (26), we can calculate the degree of equality between
the input fuzzy sets PL0, PL1, and PL2. Because all the equality degrees
between the input fuzzy sets PL0, PL1, and PL2 are smaller than the
threshold value α_Equality, where α_Equality = 0.5, we do not need to merge these membership functions. Finally, the fuzzy rules generated by the proposed fuzzy learning algorithm are listed as follows:
IF petal length is PL0 AND petal width is PW0 THEN the flower is Setosa,
IF petal length is PL1 AND petal width is PW1 THEN the flower is Versicolor,
IF petal length is PL2 AND petal width is PW2 THEN the flower is Virginica.

Figure 2. Membership functions of PL0, PL1, and PL2.

Figure 3. Membership functions of PW0, PW1, and PW2.
After using 75 instances as the training data and the other 75 instances as the testing data, the classification results of the Iris data are shown in Table 1. After executing 200 runs of the program, we can get the average classification accuracy rate. We compare the results with Hong and Lee's algorithm (1996) and Wu and Chen's algorithm (1999). The comparison results are shown in Table 2.
From Table 2, we can see that the average classification accuracy rate of the proposed algorithm is 96.676%, which is higher than the accuracy rate of 96.21% for Wu and Chen's algorithm (1999) and the accuracy rate of 95.57% for Hong and Lee's algorithm (1996). The number of fuzzy rules generated by the proposed algorithm is also smaller than that of Hong and Lee's algorithm. We can also see that the number of input attributes appearing in the antecedent portions of the generated fuzzy rules is smaller than in Hong and Lee's algorithm and Wu and Chen's algorithm.
Furthermore, when we add 5% noise into the Iris data and randomly
choose 75 instances from the Iris data as the training data set, and the
other 75 instances are used as the testing data set, the average classification
accuracy rate of the proposed algorithm after 2000 runs is 95.88874%.
Furthermore, when we add 5% noise into the Iris data and randomly
choose 120 instances from the Iris data as the training data set, and the
other 30 instances are used as the testing data set, the average classification
accuracy rate of the proposed algorithm after 2000 runs is 96.604%.
When we also randomly choose 75 instances from the Iris data (no noise) as the training data set, and the other 75 instances are used as the testing data set, the average classification accuracy rate of the proposed algorithm after 2000 runs is 96.33624%. Furthermore, when we also randomly choose 120 instances from the Iris data (no noise) as the training data set, and the other 30 instances are used as the testing data set, the average classification accuracy rate of the proposed algorithm after 2000 runs is 96.90621%.

Table 1. Classification results of the Iris data containing noise

              Number of correct    Number of wrong
              classifications      classifications
Setosa               25                   0
Versicolor           24                   1
Virginica            24                   1
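Table 1 implies that 73 of the 75 test instances on that single split are classified correctly. A quick check of the arithmetic (note this is one run; the rates in Table 2 are averages over 200 runs):

```python
correct = {"Setosa": 25, "Versicolor": 24, "Virginica": 24}
wrong = {"Setosa": 0, "Versicolor": 1, "Virginica": 1}

total = sum(correct.values()) + sum(wrong.values())   # 75 test instances
accuracy = sum(correct.values()) / total              # 73 / 75
print(f"{accuracy:.2%}")                              # prints 97.33%
```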
CONCLUSIONS

We have extended the work we presented previously (Wu and Chen 1999) to present a new algorithm that generates fuzzy rules and constructs membership functions from training data containing noise, based on the α-cuts of equivalence relations. The proposed algorithm achieves a higher average classification accuracy rate, generates fewer fuzzy rules, and uses fewer input attributes in the antecedent portions of the generated fuzzy rules. Based on the proposed algorithm, we implemented a program on a Pentium II PC using MATLAB Version 5.0 to deal with the Iris data
classification problem. The experimental results were compared with the results of Hong and Lee's algorithm (1996) and Wu and Chen's algorithm (1999).

Table 2. A comparison of the number of fuzzy rules, accuracy rate, and the number of input fuzzy sets for different algorithms

Algorithm                                          Average         Number of     Number of
                                                   accuracy rate   fuzzy rules   input fuzzy sets
Wu and Chen's algorithm (1999)
  (training: 75 instances; testing: 75
  instances; 200 runs; no noise)                   96.2100%        3             8.21
Hong and Lee's algorithm (1996)
  (training: 75 instances; testing: 75
  instances; 200 runs; no noise)                   95.5700%        6.21          8
Proposed algorithm
  (training: 120 instances; testing: 30
  instances; 200 runs; no noise)                   96.8400%        3             6
Proposed algorithm
  (training: 75 instances; testing: 75
  instances; 200 runs; no noise)                   96.6760%        3             6

The proposed algorithm is better than the existing methods due to
the following facts:
1. The proposed algorithm can get a higher average classification accuracy rate than the existing methods.
2. The proposed algorithm can deal with noisy data, whereas Hong and Lee's algorithm (1996) and Wu and Chen's algorithm (1999) cannot.
3. The proposed algorithm generates fewer input attributes appearing in the antecedent portions of the generated fuzzy rules than Hong and Lee's algorithm and Wu and Chen's algorithm.
REFERENCES

Castro, J. L., and J. M. Zurita. 1997. An inductive learning algorithm in fuzzy systems. Fuzzy Sets and Systems 89(2):193–203.
Castro, J. L., J. J. Castro-Schez, and J. M. Zurita. 1999. Learning maximal structure rules in fuzzy logic for knowledge acquisition in expert systems. Fuzzy Sets and Systems 101(2):331–342.
Chen, S. M., and S. Y. Lin. 2000. A new method for constructing fuzzy decision trees and generating fuzzy classification rules from training examples. Cybernetics and Systems: An International Journal 31(7):763–785.
Chen, S. M., and M. S. Yeh. 1998. Generating fuzzy rules from relational database systems for estimating null values. Cybernetics and Systems: An International Journal 29(6):363–376.
Chen, S. M., S. H. Lee, and C. H. Lee. 1999. Generating fuzzy rules from numerical data for handling fuzzy classification problems. In Proceedings of the 1999 National Computer Symposium, vol. 2, pp. 336–343, Taipei, Taiwan.
Fisher, R. 1936. The use of multiple measurements in taxonomic problems. Annals of Eugenics 7:179–188.
Giarratano, J., and G. Riley. 1994. Expert Systems: Principles and Programming. Boston: PWS Publishing.
Hayashi, Y., and A. Imura. 1990. Fuzzy neural expert system with automated extraction of fuzzy if-then rules from a trained neural network. In Proceedings of the 1990 First International Symposium on Uncertainty Modeling and Analysis, University of Maryland, College Park, Maryland.
Hong, T. P., and J. B. Chen. 1999. Finding relevant attributes and membership functions. Fuzzy Sets and Systems 103(1):389–404.
Hong, T. P., and J. B. Chen. 2000. Processing individual fuzzy attributes for fuzzy rule induction. Fuzzy Sets and Systems 112(1):127–140.
Hong, T. P., and C. Y. Lee. 1996. Induction of fuzzy rules and membership functions from training examples. Fuzzy Sets and Systems 84(1):33–47.
Hong, T. P., and C. Y. Lee. 1999. Effect of merging order on performance of fuzzy induction. Intelligent Data Analysis 3(2):139–151.
Horowitz, E., and S. Sahni. 1978. Fundamentals of Computer Algorithms. Maryland: Computer Science Press.
Kao, C. H., and S. M. Chen. 2000. A new method to generate fuzzy rules from training data containing noise for handling classification problems. In Proceedings of the Fifth Conference on Artificial Intelligence and Applications, pp. 324–332, Taipei, Taiwan.
Klir, G. J., and B. Yuan. 1995. Fuzzy Sets and Fuzzy Logic: Theory and Applications. Englewood Cliffs, NJ: Prentice Hall.
Lee, C. C. 1990a. Fuzzy logic in control systems: Fuzzy logic controller, Part I. IEEE Transactions on Systems, Man, and Cybernetics 20(2):404–418.
Lee, C. C. 1990b. Fuzzy logic in control systems: Fuzzy logic controller, Part II. IEEE Transactions on Systems, Man, and Cybernetics 20(2):419–440.
Lin, C. T., and C. S. G. Lee. 1996. Neural Fuzzy Systems. Englewood Cliffs, NJ: Prentice Hall.
Nozaki, K., H. Ishibuchi, and H. Tanaka. 1997. A simple but powerful heuristic method for generating fuzzy rules from numerical data. Fuzzy Sets and Systems 86(3):251–270.
Wang, C. H., J. F. Liu, T. P. Hong, and S. S. Tseng. 1999. A fuzzy inductive strategy for modular rules. Fuzzy Sets and Systems 103(1):91–105.
Wang, L. X., and J. M. Mendel. 1992. Generating fuzzy rules by learning from examples. IEEE Transactions on Systems, Man, and Cybernetics 22(6):1414–1427.
Wu, T. P., and S. M. Chen. 1999. A new method for constructing membership functions and fuzzy rules from training examples. IEEE Transactions on Systems, Man, and Cybernetics, Part B 29(1):25–40.
Yeh, M. S., and S. M. Chen. 1994. An automatic clustering algorithm for fuzzy query processing. In Proceedings of the 5th International Conference on Information Management, pp. 241–250, Taipei, Taiwan.
Yoshinari, Y., W. Pedrycz, and K. Hirota. 1993. Construction of fuzzy models through clustering techniques. Fuzzy Sets and Systems 54(2):157–165.
Yuan, Y., and M. J. Shaw. 1995. Induction of fuzzy decision trees. Fuzzy Sets and Systems 69(2):125–139.
Yuan, Y., and H. Zhuang. 1996. A genetic algorithm for generating fuzzy classification rules. Fuzzy Sets and Systems 84(1):1–19.
Zadeh, L. A. 1965. Fuzzy sets. Information and Control 8:338–353.