
GENERATING FUZZY RULES FROM TRAINING DATA CONTAINING NOISE FOR HANDLING CLASSIFICATION PROBLEMS


This article was downloaded by: [McMaster University] On: 19 December 2014, At: 10:17. Publisher: Taylor & Francis. Informa Ltd Registered in England and Wales, Registered Number: 1072954. Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK.

Cybernetics and Systems: An International Journal. Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/ucbs20

GENERATING FUZZY RULES FROM TRAINING DATA CONTAINING NOISE FOR HANDLING CLASSIFICATION PROBLEMS
SHYI-MING CHEN (a), CHENG-HSUAN KAO (b) & CHENG-HAO YU (b)
(a) Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan, R.O.C.
(b) Department of Electronic Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan, R.O.C.
Published online: 30 Nov 2010.

To cite this article: SHYI-MING CHEN, CHENG-HSUAN KAO & CHENG-HAO YU (2002) GENERATING FUZZY RULES FROM TRAINING DATA CONTAINING NOISE FOR HANDLING CLASSIFICATION PROBLEMS, Cybernetics and Systems: An International Journal, 33:7, 723-748

To link to this article: http://dx.doi.org/10.1080/01969720213935


Taylor & Francis makes every effort to ensure the accuracy of all the information (the "Content") contained in the publications on our platform. However, Taylor & Francis, our agents, and our licensors make no representations or warranties whatsoever as to the accuracy, completeness, or suitability for any purpose of the Content. Any opinions and views


expressed in this publication are the opinions and views of the authors, and are not the views of or endorsed by Taylor & Francis. The accuracy of the Content should not be relied upon and should be independently verified with primary sources of information. Taylor and Francis shall not be liable for any losses, actions, claims, proceedings, demands, costs, expenses, damages, and other liabilities whatsoever or howsoever caused arising directly or indirectly in connection with, in relation to or arising out of the use of the Content.

This article may be used for research, teaching, and private study purposes. Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden. Terms & Conditions of access and use can be found at http://www.tandfonline.com/page/terms-and-conditions


GENERATING FUZZY RULES FROM TRAINING DATA CONTAINING NOISE FOR HANDLING CLASSIFICATION PROBLEMS

SHYI-MING CHEN
Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan, R.O.C.

CHENG-HSUAN KAO and CHENG-HAO YU
Department of Electronic Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan, R.O.C.

It is obvious that one of the important tasks in a fuzzy system is to find a set of rules to deal with a specific classification problem. In recent years, many researchers have focused on the research topic of generating fuzzy rules from training data for handling classification problems. In a previous paper, we presented an algorithm to construct membership functions and to generate fuzzy rules from training examples. In this paper, we extend that work to propose a new algorithm to generate fuzzy rules from training data containing noise to deal with classification problems. The proposed algorithm achieves a higher classification accuracy rate and generates fewer fuzzy rules and fewer input attributes in the antecedent portions of the generated fuzzy rules.

This work was supported in part by the National Science Council, Republic of China, under Grant NSC 89-2213-E-011-100.

Address correspondence to Professor Shyi-Ming Chen, Ph.D., Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan, R.O.C.

Cybernetics and Systems: An International Journal, 33: 723-748, 2002
Copyright © 2002 Taylor & Francis
0196-9722/02 $12.00 + .00
DOI: 10.1080/01969720290040812


INTRODUCTION

We usually encounter many things that contain imprecision and vagueness. In 1965, Zadeh proposed the theory of fuzzy sets (Zadeh 1965), which can deal properly with imprecision and vagueness. After Zadeh proposed the theory of fuzzy sets, there were many theoretic developments in fuzzy logic, and there are many applications of fuzzy logic. The fuzzy classification system is an important application. In a fuzzy classification system, objects can be classified properly by a fuzzy rule base.

One of the important tasks in a fuzzy system is to find a set of rules to deal with a specific classification problem. In recent years, many researchers have focused on the research topic of generating fuzzy rules from training data for handling classification problems (Castro and Zurita 1997; Castro, Castro-Schez, and Zurita 1999; Chen and Lin 2000; Chen and Yeh 1998; Chen, Lee, and Lee 1999; Hayashi and Imura 1990; Hong and Chen 1999, 2000; Hong and Lee 1996; Kao and Chen 2000; Nozaki, Ishibuchi, and Tanaka 1997; Wang et al. 1999; Wang and Mendel 1992; Wu and Chen 1999; Yoshinari, Pedrycz, and Hirota 1993; Yuan and Shaw 1995; Yuan and Zhuang 1996). Castro, Castro-Schez, and Zurita (1999) presented a machine learning method using fuzzy logic. Castro and Zurita (1997) presented an induction-learning algorithm for fuzzy systems. We presented an algorithm to generate fuzzy rules from relational database systems to estimate null values (Chen and Yeh 1998), a method for generating fuzzy rules from numerical data for handling fuzzy classification problems (Chen, Lee, and Lee 1999), and a method for constructing fuzzy decision trees and generating fuzzy classification rules from training examples using compound analysis techniques (Chen and Lin 2000). Hayashi and Imura (1990) presented a fuzzy neural expert system to generate fuzzy if-then rules automatically from a trained neural network. Hong and Chen (1999) presented an algorithm to generate fuzzy rules and membership functions automatically, and later, Hong and Chen (2000) presented an algorithm which can deal with the noise data of a training data set. Hong and Lee (1996) presented an algorithm to automatically generate fuzzy rules and membership functions from training examples. Nozaki, Ishibuchi, and Tanaka (1997) presented a method to construct fuzzy rules from numerical data. Wang et al. (1999) presented a fuzzy inductive learning strategy to generate fuzzy rules. Wang and Mendel (1992) presented an algorithm to generate fuzzy rules from training examples. We have presented a method to construct


fuzzy rules and membership functions from training examples (Wu and Chen 1999). Yoshinari, Pedrycz, and Hirota (1993) presented an algorithm based on fuzzy decision trees to generate fuzzy classification rules automatically. Yuan and Shaw (1995) presented a method to construct fuzzy decision trees and to generate fuzzy rules. Yuan and Zhuang (1996) presented a genetic algorithm for generating fuzzy classification rules.

However, there are some drawbacks in the existing fuzzy-rule generation methods for handling classification problems: (1) Most algorithms are unable to deal with training data containing noise. (2) The classification accuracy rate is not high enough. (3) Too many fuzzy rules are generated in the fuzzy rule generation process. (4) There are too many input attributes in the antecedent portions of the generated fuzzy rules. Thus, we must develop a better algorithm to overcome the drawbacks of the existing methods.

In this article, we extend the work we presented in a previous paper (Wu and Chen 1999) to propose a new algorithm to generate fuzzy rules and to construct membership functions from training data containing noise to deal with the Iris data classification problem (Fisher 1936). The proposed algorithm can get a higher classification accuracy rate, generate fewer fuzzy rules, and generate fewer input attributes in the antecedent portions of the generated fuzzy rules than the existing methods.

FUZZY SET THEORY

In 1965, Zadeh proposed the theory of fuzzy sets (Zadeh 1965). It can properly deal with imprecision and vagueness of information in the real world. In the following text, we briefly review some basic concepts of fuzzy sets and fuzzy relations (see Lee 1990a, b; Zadeh 1965).

Let U be the universe of discourse, U = {u_1, u_2, ..., u_n}. A fuzzy set A of U can be represented by

    A = μ_A(u_1)/u_1 + μ_A(u_2)/u_2 + ⋯ + μ_A(u_n)/u_n    (1)

where μ_A is the membership function of the fuzzy set A, and μ_A(u_i) indicates the grade of membership of u_i in the fuzzy set A, where μ_A(u_i) ∈ [0, 1]. Figure 1 shows a triangular fuzzy set A with the membership function μ_A parameterized by a triplet (a, b, c) (i.e., A = (a, b, c)), where b is called the center of the triangular fuzzy set A, and a and c are called the left vertex and the right vertex, respectively, of the triangular fuzzy set A.


In a fuzzy relation, the degree of relationship between two elements is assigned a real value between zero and one. Let A_1, A_2, ..., A_n be fuzzy sets and let A_1 × A_2 × ⋯ × A_n be their Cartesian product. Then, we can define an n-ary fuzzy relation R as

    R(A_1, A_2, ..., A_n) ⊆ A_1 × A_2 × ⋯ × A_n    (2)

where

    A_1 × A_2 × ⋯ × A_n = {(x_1, x_2, ..., x_n) | x_i ∈ A_i and 1 ≤ i ≤ n}    (3)

The membership function of R(A_1, A_2, ..., A_n) is denoted by μ_R, where μ_R(x_1, x_2, ..., x_n) ∈ [0, 1].

Assume that R_1(X, Y) and R_2(Y, Z) are two binary fuzzy relations, where X, Y, and Z are fuzzy sets. The composition of R_1(X, Y) and R_2(Y, Z) is defined as

    μ_{R_1 ∘ R_2}(x, z) = max_{y ∈ Y} min[μ_{R_1}(x, y), μ_{R_2}(y, z)], ∀(x, z) ∈ X × Z    (4)

Let A be a fuzzy set. The transitive closure R_T(A, A) of a binary fuzzy relation R(A, A) is defined as

    R_T(A, A) = R^i, where R^i = R^{i−1} ∘ R, R^{i+1} = R^i, and i ≥ 2    (5)

Figure 1. A triangular fuzzy set.


The α-cut R_α of a binary fuzzy relation R(A, A) is defined as

    R_α = {(x, y) | μ_R(x, y) ≥ α and α ∈ [0, 1]}    (6)

If a binary fuzzy relation R is reflexive, symmetric, and transitive, then it is called a fuzzy equivalence relation. If a binary fuzzy relation R is reflexive and symmetric, then it is called a compatibility relation. If R is a compatibility relation, then compatibility classes are defined by a specified membership degree α, where α ∈ [0, 1]. An α-compatibility class B is a subset of A for a given fuzzy binary compatibility relation R(A, A), such that R(x, y) ≥ α for all x, y ∈ B.
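To make these definitions concrete, the two defining properties of a compatibility relation (reflexivity and symmetry) can be checked on a membership matrix with a small sketch; the function name and tolerance parameter are ours, not from the paper:

```python
def is_compatibility(r, eps=1e-9):
    """True if the fuzzy relation matrix r is reflexive and symmetric."""
    n = len(r)
    reflexive = all(abs(r[i][i] - 1.0) <= eps for i in range(n))
    symmetric = all(abs(r[i][j] - r[j][i]) <= eps
                    for i in range(n) for j in range(n))
    return reflexive and symmetric
```

A fuzzy equivalence relation would additionally require max-min transitivity, which the paper obtains via the transitive closure of Eq. (5).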

GENERATING FUZZY RULES AND CONSTRUCTING MEMBERSHIP FUNCTIONS FROM TRAINING DATA CONTAINING NOISE

In the following text, we extend our previous work (Wu and Chen 1999) to present a new algorithm for constructing membership functions and generating fuzzy rules from training data containing noise. It can decrease the number of input attributes appearing in the antecedent portions of the generated fuzzy rules, generate fewer fuzzy rules, and obtain a higher classification accuracy rate than the existing methods.

Because there may be some errors or noise in the training data set due to typing errors, the membership functions and fuzzy rules we get may not be the best. The algorithms presented by Hong and Lee (1996) and Wu and Chen (1999) cannot deal with this kind of problem, where the training data contain noise. Thus, we need to propose a new method to overcome the drawback of those algorithms. If a training data set contains some errors or noise data, then we can find these errors or noise data by using the automatic clustering algorithm which we presented in an earlier work (Yeh and Chen 1994). The automatic clustering algorithm is reviewed as follows:

Automatic Clustering Algorithm

    Main()  // main program
    {
        // NOD: number of data; Data[ ]: data array
        sort Data[ ] into an ascending sequence using quick sort
            (Horowitz and Sahni 1978);

        // compute the difference between two neighboring elements in Data[ ]
        for i = 1 to NOD-1 do
            Dif[i] = Data[i+1] - Data[i];

        // compute the difference between the two nearest nonzero neighbors of
        // each element of Dif[ ], where p points to the left one, q points to
        // the right one, and "abs" denotes the absolute-value function
        for i = 1 to NOD-2 do
        {
            p = i;
            while (Dif[p] = 0 and p > 1) do
                p = p - 1;
            q = i + 1;
            while (Dif[q] = 0 and q < NOD) do
                q = q + 1;
            Dif_Dif[i] = abs(Dif[p] - Dif[q])
        };

        // start clustering
        Cluster(1, NOD)
    } // end of Main()

    Cluster(L, R)  // subroutine
    {
        // set the boundary
        let Dif[L-1] = Dif[R] = Data[R];
        let Dif_Dif[L-1] = Dif_Dif[R-1] = 0;

        // if the two ends of Dif[ ] are equal to 0, then adjust Dif_Dif[ ]
        temp = R - 1;
        while (Dif[temp] = 0) do
            temp = temp - 1;
        for i = temp to R-2 do
            Dif_Dif[i] = Dif[temp];
        temp = L;
        while (Dif[temp] = 0) do
            temp = temp + 1;
        for i = L to temp-1 do
            Dif_Dif[i] = Dif[temp];

        // Dif_Sum: summation from Dif[L] to Dif[R-1];
        // S: number of elements from Dif[L] to Dif[R-1] which are not equal to 0
        Dif_Mean = Dif_Sum / S;
        let X = Y = L;

        // Dif_Dif_Max: maximal value from Dif_Dif[L] to Dif_Dif[R-2]
        if (R = L) or (Dif_Mean >= Dif_Dif_Max) then
            Data[L] to Data[R] forms a cluster
        else
        {
            for i = L to R-1 do
            {
                if (i = R-1) or ((Dif[i] > Dif_Mean) and
                    (Dif[i] >= Dif_Dif[i-1]) and (Dif[i] >= Dif_Dif[i])) then
                {
                    Y = i;
                    Cluster(X, Y);
                    X = Y + 1
                }
            };
            Cluster(X, R)
        }
    } // end of Cluster()

In the following, we use an example to illustrate the automatic clustering process of the algorithm. Assume that the input data are as follows:

    2, 8, 3, 21, 36, 27, 28, 28, 4, 10, 88, 35, 21, 15, 23, 36

After applying the quick sort (Horowitz and Sahni 1978), the data are sorted into an ascending sequence:

    2, 3, 4, 8, 10, 15, 21, 21, 23, 27, 28, 28, 35, 36, 36, 88

[Cluster(1, 16)]

    Index i:     1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
    Data[ ]:     2   3   4   8  10  15  21  21  23  27  28  28  35  36  36  88
    Dif[ ]     (i = 0..16):  88, 1, 1, 4, 2, 5, 6, 0, 2, 4, 1, 0, 7, 1, 0, 52, 88
    Dif_Dif[ ] (i = 0..15):  0, 0, 3, 2, 3, 1, 4, 4, 2, 3, 6, 6, 6, 51, 51, 0

Dif_Mean = (1 + 1 + 4 + 2 + 5 + 6 + 2 + 4 + 1 + 7 + 1 + 52)/12 = 7.167;
because Dif[15] = 52 > Dif_Mean, Dif[15] > Dif_Dif[14], and Dif[15] > Dif_Dif[15],
CALL Cluster(1, 15);
CALL Cluster(16, 16);


[Cluster(1, 15)]

    Index i:     1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
    Data[ ]:     2   3   4   8  10  15  21  21  23  27  28  28  35  36  36
    Dif[ ]     (i = 0..15):  36, 1, 1, 4, 2, 5, 6, 0, 2, 4, 1, 0, 7, 1, 0, 36
    Dif_Dif[ ] (i = 0..14):  0, 0, 3, 2, 3, 1, 4, 4, 2, 3, 6, 6, 6, 1, 0

Dif_Mean = (1 + 1 + 4 + 2 + 5 + 6 + 2 + 4 + 1 + 7 + 1)/11 = 3.09;
because Dif[3] = 4 > Dif_Mean, Dif[3] > Dif_Dif[2], and Dif[3] > Dif_Dif[3],
CALL Cluster(1, 3);
because Dif[5] = 5 > Dif_Mean, Dif[5] > Dif_Dif[4], and Dif[5] > Dif_Dif[5],
CALL Cluster(4, 5);
because Dif[6] = 6 > Dif_Mean, Dif[6] > Dif_Dif[5], and Dif[6] > Dif_Dif[6],
CALL Cluster(6, 6);
because Dif[9] = 4 > Dif_Mean, Dif[9] > Dif_Dif[8], and Dif[9] > Dif_Dif[9],
CALL Cluster(7, 9);
because Dif[12] = 7 > Dif_Mean, Dif[12] > Dif_Dif[11], and Dif[12] > Dif_Dif[12],
CALL Cluster(10, 12);
CALL Cluster(13, 15);

[Cluster(1, 3)]

    Index i:   1  2  3
    Data[ ]:   2  3  4
    Dif[ ]     (i = 0..3):  4, 1, 1, 4
    Dif_Dif[ ] (i = 0..2):  0, 0, 0

Dif_Mean = (1 + 1)/2 = 1;
because Dif_Dif_Max = 0 < Dif_Mean,
{2, 3, 4} forms a cluster.

[Cluster(4, 5)]

    Index i:   4   5
    Data[ ]:   8  10
    Dif[ ]     (i = 3..5):  10, 2, 10
    Dif_Dif[ ] (i = 3..4):  0, 0


Dif_Mean = 2/1 = 2;
because Dif_Dif_Max = 0 < Dif_Mean,
{8, 10} forms a cluster.

[Cluster(6, 6)]

    Index i:   6
    Data[ ]:  15
    Dif[ ]     (i = 5..6):  15, 15
    Dif_Dif[ ] (i = 5):     0

because L = R (i.e., 6 = 6),
{15} forms a cluster.

[Cluster(7, 9)]

    Index i:    7   8   9
    Data[ ]:   21  21  23
    Dif[ ]     (i = 6..9):  23, 0, 2, 23
    Dif_Dif[ ] (i = 6..8):  0, 2, 0

Dif_Mean = 2/1 = 2;
because Dif_Mean = Dif_Dif_Max (i.e., 2 = 2),
{21, 21, 23} forms a cluster.

[Cluster(10, 12)]

    Index i:   10  11  12
    Data[ ]:   27  28  28
    Dif[ ]     (i = 9..12):  28, 1, 0, 28
    Dif_Dif[ ] (i = 9..11):  0, 1, 0

Dif_Mean = 1/1 = 1;
because Dif_Mean = Dif_Dif_Max (i.e., 1 = 1),
{27, 28, 28} forms a cluster.

[Cluster(13, 15)]

    Index i:   13  14  15
    Data[ ]:   35  36  36
    Dif[ ]     (i = 12..15):  36, 1, 0, 36
    Dif_Dif[ ] (i = 12..14):  0, 1, 0


Dif_Mean = 1/1 = 1;
because Dif_Mean = Dif_Dif_Max (i.e., 1 = 1),
{35, 36, 36} forms a cluster.

[Cluster(16, 16)]

    Index i:   16
    Data[ ]:   88
    Dif[ ]     (i = 15..16):  88, 88
    Dif_Dif[ ] (i = 15):      0

because L = R (i.e., 16 = 16),
{88} forms a cluster.

Therefore, the clustering results are as follows:

Cluster[1] → {2, 3, 4},
Cluster[2] → {8, 10},
Cluster[3] → {15},
Cluster[4] → {21, 21, 23},
Cluster[5] → {27, 28, 28},
Cluster[6] → {35, 36, 36},
Cluster[7] → {88}.
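The pseudocode above can be turned into a runnable sketch. The version below keeps the paper's 1-based indexing via a dummy leading element, assumes positive input values, and, following the worked example rather than the literal `(i = R-1)` clause of the pseudocode, splits only where Dif[i] exceeds Dif_Mean and dominates both neighboring Dif_Dif values; this reconstruction reproduces the clusters listed above:

```python
def auto_cluster(values):
    """Reconstruction sketch of the automatic clustering algorithm (Yeh and Chen 1994)."""
    nod = len(values)
    data = [0] + sorted(values)            # 1-based: data[1..nod]
    dif = [0] * (nod + 1)                  # dif[1..nod-1]; dif[0], dif[nod] hold boundaries
    for i in range(1, nod):
        dif[i] = data[i + 1] - data[i]
    dif_dif = [0] * (nod + 1)              # dif_dif[1..nod-2]
    for i in range(1, nod - 1):
        p = i                              # nearest nonzero Dif to the left
        while dif[p] == 0 and p > 1:
            p -= 1
        q = i + 1                          # nearest nonzero Dif to the right
        while dif[q] == 0 and q < nod:
            q += 1
        dif_dif[i] = abs(dif[p] - dif[q])
    clusters = []

    def cluster(L, R):
        if L == R:                         # a single datum forms a cluster
            clusters.append(data[L:R + 1])
            return
        dif[L - 1] = dif[R] = data[R]      # set the boundary (positive data assumed)
        dif_dif[L - 1] = dif_dif[R - 1] = 0
        temp = R - 1                       # if the ends of Dif are 0, adjust Dif_Dif
        while dif[temp] == 0:
            temp -= 1
        for i in range(temp, R - 1):
            dif_dif[i] = dif[temp]
        temp = L
        while dif[temp] == 0:
            temp += 1
        for i in range(L, temp):
            dif_dif[i] = dif[temp]
        nonzero = [dif[i] for i in range(L, R) if dif[i] != 0]
        dif_mean = sum(nonzero) / len(nonzero) if nonzero else 0
        dif_dif_max = max(dif_dif[L:R - 1], default=0)
        if not nonzero or dif_mean >= dif_dif_max:
            clusters.append(data[L:R + 1])  # the whole range forms one cluster
            return
        X = L
        for i in range(L, R):              # split where Dif[i] dominates its neighbors
            if dif[i] > dif_mean and dif[i] >= dif_dif[i - 1] and dif[i] >= dif_dif[i]:
                cluster(X, i)
                X = i + 1
        cluster(X, R)

    cluster(1, nod)
    return clusters
```

Running it on the example input yields the seven clusters of the trace above.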

We can use the automatic clustering algorithm to find errors or noise data in a training data set. Assume that there are m training instances in a training data set which contains n input attributes (input linguistic variables) X_1, X_2, ..., X_n, where the input values of the input attribute X_i (i = 1, 2, ..., n) are denoted by x_{i,1}, x_{i,2}, ..., x_{i,m}, and x_{i,p} denotes the pth input value of the input attribute (input linguistic variable) X_i (i = 1, 2, ..., n). Furthermore, assume that the input values of each input attribute (input linguistic variable) X_i (i = 1, 2, ..., n) of the training data set have been sorted into an ascending sequence, i.e., x_{i,1} ≤ x_{i,2} ≤ ⋯ ≤ x_{i,m}, where x_{i,1}, x_{i,2}, ..., x_{i,m} are the input values of the input attribute (input linguistic variable) X_i and 1 ≤ i ≤ n. (Note: For the Iris data (Fisher 1936), n = 4 and m = 150.) If the training data set contains some errors or noise data, we can use the automatic clustering algorithm we presented previously (Yeh and Chen 1994) to cluster the input values of each input attribute (input linguistic variable) X_i (i = 1, 2, ..., n) of the training data set, respectively, to find the errors or noise data.


Based on the automatic clustering algorithm, after the clustering process, we can see that the input values of each input attribute (input linguistic variable) X_i (i = 1, 2, ..., n) can be clustered as follows:

    X_i = {(cx_{i,1}, cx_{i,2}, ..., cx_{i,z}) | z is the number of clusters}    (7)

where cx_{i,J} (J = 1, 2, ..., z) is a cluster of the input attribute (input linguistic variable) X_i (i = 1, 2, ..., n). A cluster cx_{i,J} is a subset of the input values of the input attribute (input linguistic variable) X_i (i = 1, 2, ..., n). The number of elements in each cluster cx_{i,J} is larger than or equal to 1 and smaller than or equal to m, and the summation of the numbers of elements in the clusters of each input attribute (input linguistic variable) X_i (i = 1, 2, ..., n) is m. The clusters cx_{i,1}, cx_{i,2}, ..., cx_{i,z} of the input attribute (input linguistic variable) X_i (i = 1, 2, ..., n) are sorted into an ascending sequence as follows:

    cx_{i,J} = {(x_{i,1}, x_{i,2}, ..., x_{i,j}) | j is the number of elements in the cluster cx_{i,J}}    (8)

where J = 1, 2, ..., z. In Eq. (7), if z is larger than or equal to 2, then it indicates that the training data set contains noise. Thus, we should find this noise data. Because there is not too much noise data in the training data set, we find the clusters cx_{i,J} of the input attribute (input linguistic variable) X_i, where i = 1, 2, ..., n, which have the minimum number of elements. If an element in a cluster cx_{i,J} of the input attribute (input linguistic variable) X_i (i = 1, 2, ..., n) is larger than Ave(X_i) × (1 + t) or smaller than Ave(X_i) × (1 − t), then this datum is regarded as a noise datum, where t is a constant and Ave(X_i) is the average input value of the input attribute (input linguistic variable) X_i (i = 1, 2, ..., n). (Note that in this paper, we set the parameter t to 0.5.)

Because noise data have a bad influence on constructing membership functions and generating fuzzy rules, we must modify them into proper values. The way we do it is to modify the noise value of an input attribute (input linguistic variable) X_i into the average input value Ave(X_i) of the input attribute (input linguistic variable) X_i, where i = 1, 2, ..., n.

Furthermore, because some attributes of the training data might not be useful for classification and because they will influence the training efficiency and performance, we should remove these attributes before the training process, and then generate the fuzzy rules to improve the


classification accuracy rate. Thus, we must find the attributes that are useless for classification. We can calculate the degree of fuzziness of each attribute to determine whether it is useful for classification or not. One method to calculate the degree of fuzziness is by calculating the entropy values (Hong and Lee 1999) of each attribute. In this article, we propose a new method to calculate the degree of fuzziness of each attribute.

The entropy method is widely used to calculate the degrees of fuzziness of attributes. Assume that an attribute A with m possible values {A_1, A_2, ..., A_m} can partition the set of training instances S into {S_1, S_2, ..., S_i}, where S_i may include some A_i. Furthermore, assume that there are k classes to be classified. Let the number of training instances with attribute value A_i and class O_j be n_{ij}. Then, the entropy value of attribute A is calculated as (Hong and Lee 1999)

    E(A) = 1 − Σ_{i=1}^{m} { Σ_{j=1}^{k} [ (n_{ij} / Σ_{r=1}^{k} n_{ir}) × log(n_{ij} / Σ_{r=1}^{k} n_{ir}) ] }    (9)

where E(A) ∈ [0, 1]. The larger the entropy value E(A), the more helpful the attribute A is for classification. On the other hand, the smaller the value of E(A), the less helpful the attribute A is for classification.
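For intuition, the inner bracket of Eq. (9) is a Shannon-style p·log p term over the class distribution of one attribute value. A minimal sketch of that per-value term (function name ours; natural logarithm assumed, since the paper does not state the base):

```python
import math

def class_entropy(counts):
    """Shannon entropy -sum(p * log p) of one attribute value's class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c)
```

An attribute value whose instances all share one class contributes 0 (perfectly discriminating), while an even split over two classes contributes log 2, the maximum for two classes.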

In this article, we present a new method to calculate the degree of fuzziness D(X_i) of an attribute X_i (i = 1, 2, ..., n), shown as follows:

    D_{i,j} = (x′_{i,j} − Ave(X_i)) / (Max(x′_{i,j}) − Ave(X_i))   if x′_{i,j} ≥ Ave(X_i)
    D_{i,j} = (Ave(X_i) − x′_{i,j}) / (Ave(X_i) − Min(x′_{i,j}))   if x′_{i,j} < Ave(X_i)    (10)

    D(X_i) = ( Σ_{j=1}^{m} D_{i,j} ) / m    (11)

where x′_{i,1}, x′_{i,2}, ..., x′_{i,m} are the input values of the input attribute X_i (i = 1, 2, ..., n), Max(x′_{i,j}) denotes the maximum value among the values x′_{i,1}, x′_{i,2}, ..., x′_{i,m} of the input attribute X_i, Min(x′_{i,j}) denotes the minimum value among the values x′_{i,1}, x′_{i,2}, ..., x′_{i,m} of the input attribute X_i, Ave(X_i) denotes the average input value of the input attribute X_i, and D(X_i) ∈ [0, 1]. The larger the value of D(X_i), the more helpful the input


attribute X_i (i = 1, 2, ..., n) is for classification. The smaller the value of D(X_i), the less helpful the input attribute X_i (i = 1, 2, ..., n) is for classification.

After calculating the degree of fuzziness of each attribute, if the degree of fuzziness of an attribute is smaller than a threshold value ω, where ω is given by the user and ω ∈ [0, 1], then it is regarded as a useless attribute for classification.
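Equations (10)-(11) can be sketched directly; the function name and the guard for a constant attribute (where Max equals Ave and Eq. (10) would divide by zero) are our additions:

```python
def fuzziness_degree(values):
    """Degree of fuzziness D(X_i) of one attribute, per Eqs. (10)-(11) (sketch)."""
    m = len(values)
    ave = sum(values) / m
    hi, lo = max(values), min(values)

    def d(v):
        if v >= ave:
            return (v - ave) / (hi - ave) if hi != ave else 0.0  # Eq. (10), upper branch
        return (ave - v) / (ave - lo)                            # Eq. (10), lower branch

    return sum(d(v) for v in values) / m                         # Eq. (11)
```

An attribute whose values sit at the extremes scores near 1 (helpful), while a constant attribute scores 0 and would fall below any threshold ω > 0.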

In this article, we extend our previous work (Wu and Chen 1999) to present a new method for generating fuzzy rules from a training data set containing noise. Assume that the training data set P contains the following m input-output pairs:

    P = {(x_{1,j}, x_{2,j}, ..., x_{n,j}, y_j) | j = 1, 2, ..., m}

where n denotes the number of input attributes; x_{i,j} denotes the jth input value of the input attribute (input linguistic variable) X_i (i = 1, 2, ..., n and j = 1, 2, ..., m); y_j denotes the jth output value of the output attribute (output linguistic variable) Y, where j = 1, 2, ..., m. (Note: In the Iris data, there are 150 training data (i.e., m = 150), and there are 4 input attributes: "Sepal Length," "Sepal Width," "Petal Length," and "Petal Width" (i.e., n = 4). The output attribute is "the flower"; the possible output values of the output attribute "the flower" are "Setosa," "Versicolor," and "Virginica," i.e., y_j ∈ {Setosa, Versicolor, Virginica}, where j = 1, 2, ..., m.)

First, we consider the values y′_1, y′_2, ..., y′_m of the output attribute Y to calculate the fuzzy relation R(y′_{p1}, y′_{p2}) and then the fuzzy equivalence relation R_T, where the definition of R(y′_{p1}, y′_{p2}) is

    R(y′_{p1}, y′_{p2}) = 1 − (1/d) |y′_{p1} − y′_{p2}|    (12)

where y′_{p1} and y′_{p2} are output values of the output attribute, and the value d can be obtained as (Wu and Chen 1999)

    d = ( Σ_{i=1}^{m−1} |y_i − y_m| ) / (m − 1)    (13)

where y_m is the maximum output value of the output attribute Y. Because R(y′_{p1}, y′_{p2}) is a fuzzy compatibility relation and not a fuzzy equivalence


relation, after calculating the Max-Min transitive closure, we can get a fuzzy equivalence relation R_T(y′_{p1}, y′_{p2}).
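As a concrete sketch (assuming numeric, ascending output values and the max-min composition of Eq. (4)), the relation of Eqs. (12)-(13) and its transitive closure might be computed as follows. The helper names are ours, and we clamp Eq. (12) at 0 for pairs farther apart than d, which the paper does not state explicitly:

```python
def output_relation(ys):
    """Fuzzy compatibility relation R of Eq. (12), with d from Eq. (13)."""
    m = len(ys)
    y_max = max(ys)
    d = sum(abs(y - y_max) for y in ys) / (m - 1)   # the y = y_max term adds 0
    return [[max(0.0, 1.0 - abs(a - b) / d) for b in ys] for a in ys]

def maxmin_closure(r):
    """Max-min transitive closure R_T of Eq. (5): compose with R until stable."""
    n = len(r)
    cur = [row[:] for row in r]
    while True:
        nxt = [[max(min(cur[i][k], r[k][j]) for k in range(n)) for j in range(n)]
               for i in range(n)]
        if nxt == cur:
            return cur
        cur = nxt
```

Because R is reflexive (R(y, y) = 1), repeated composition converges to the closure, which is then reflexive, symmetric, and max-min transitive, i.e., a fuzzy equivalence relation.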

After calculating the fuzzy equivalence relation R_T(y′_{p1}, y′_{p2}), we can divide the training data set P′ into many different parts by α-cuts (Wu and Chen 1999). We divide the training data set P′ into r different subsets G_j (j = 1, 2, ..., r) as

    G_j = {(x′_{1,p}, x′_{2,p}, ..., x′_{n,p}, y′_p) | R_T(y′_{p1}, y′_{p2}) ≥ α, α ∈ [0, 1], 1 ≤ p ≤ m, 1 ≤ p1 ≤ m, and 1 ≤ p2 ≤ m}    (14)

where α is a threshold value used to divide the training data set P′, α ∈ [0, 1], and r is the number of subsets divided by the α-cuts of the training data set P′.
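Because the α-cut of a fuzzy equivalence relation is a crisp equivalence relation, the split of Eq. (14) is a genuine partition. A small sketch of the grouping step, given the closure matrix (function name ours):

```python
def alpha_partition(n, rt, alpha):
    """Group indices 0..n-1 whose closure degree rt[i][j] is >= alpha."""
    groups, assigned = [], [False] * n
    for i in range(n):
        if assigned[i]:
            continue
        g = [j for j in range(n) if not assigned[j] and rt[i][j] >= alpha]
        for j in g:
            assigned[j] = True
        groups.append(g)
    return groups
```

Raising α splits the training data into more, tighter subsets G_j; lowering it merges them.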

After performing the α-cut operations, we can see that O_j is the jth output-value set of the output attribute Y and I_{i,j} is the jth input-value set of a subset G_j of the input attribute X_i, where

    O_j = {y′_p | ∀(x′_{1,p}, x′_{2,p}, ..., x′_{n,p}, y′_p) ∈ G_j, 1 ≤ p ≤ m}, 1 ≤ j ≤ r    (15)

    I_{i,j} = {x′_{i,p} | ∀(x′_{1,p}, x′_{2,p}, ..., x′_{i,p}, ..., x′_{n,p}, y′_p) ∈ G_j, 1 ≤ p ≤ m}, 1 ≤ i ≤ n, 1 ≤ j ≤ r    (16)

Then, we can construct the membership functions of the output attribute Y from the output-value sets O_j, where j = 1, 2, ..., r. Because we divide the training data set P′ into r different output-value sets O_j (j = 1, 2, ..., r) by α-cuts, every output-value set corresponds to a membership function as

    A_{j,α} = {y′ | y′ ∈ O_j and μ_{A_j}(y′) ≥ α}, j = 1, 2, ..., r    (17)

where μ_{A_j} is the membership function of the output fuzzy set A_j. In this article, we use triangular membership functions to represent fuzzy sets. A fuzzy set can be represented by a triangular membership function (a, b, c), as shown in Figure 1. The triangular membership function (a_j, b_j, c_j) of the output fuzzy set A_j can be obtained as

    b_j = (y′_min + y′_max)/2,  a_j = b_j − (b_j − y′_min)/(1 − α),  c_j = b_j + (y′_max − b_j)/(1 − α)    (18)


where y′_min and y′_max are the minimum element and the maximum element of A_{j,α}, respectively, and the membership function can be obtained as

    μ_{A_j}(y′) = (y′ − a_j)/|b_j − a_j|   if a_j ≤ y′ ≤ b_j
    μ_{A_j}(y′) = (c_j − y′)/|c_j − b_j|   if b_j ≤ y′ ≤ c_j
    μ_{A_j}(y′) = 0                        otherwise    (19)

After constructing the membership functions $\mu_{A_1}(y'), \mu_{A_2}(y'), \ldots, \mu_{A_r}(y')$ of $A_1, A_2, \ldots, A_r$, respectively, we can construct the membership functions of the $n$ input attributes (input linguistic variables) $X_1, X_2, \ldots, X_n$, respectively. For the input attribute (input linguistic variable) $X_i$ ($i = 1, 2, \ldots, n$), its corresponding input values in the sorted training data set $P'$ have also concurrently been divided into $r$ input-value sets $I_{i,1}, I_{i,2}, \ldots, I_{i,r}$. We calculate the fuzzy compatibility relation $R_{i,j}(x'_{i,p_1}, x'_{i,p_2})$ as (Klir and Yuan 1995; Wu and Chen 1999)

$$R_{i,j}(x'_{i,p_1}, x'_{i,p_2}) = 1 - \frac{1}{d}\left| x'_{i,p_1} - x'_{i,p_2} \right|, \qquad (20)$$

where $x'_{i,p_1} \in I_{i,j}$, $x'_{i,p_2} \in I_{i,j}$, $1 \le i \le n$, $1 \le j \le r$, $1 \le p_1 \le |I_{i,j}|$, $1 \le p_2 \le |I_{i,j}|$, and the value of $d$ is calculated as

$$d = \frac{\sum_{p=1}^{|I_{i,j}|-1} \left| x'_{i,p} - x'_{i,\max} \right|}{|I_{i,j}| - 1}, \qquad (21)$$

where $x'_{i,\max}$ is the maximum input value of $I_{i,j}$, and $|I_{i,j}|$ denotes the number of elements of the input-value set $I_{i,j}$. Because $R_{i,j}(x'_{i,p_1}, x'_{i,p_2})$ is a fuzzy compatibility relation and not a fuzzy equivalence relation, after calculating its max-min transitive closure we can get a fuzzy equivalence relation $R_T(x'_{i,p_1}, x'_{i,p_2})$. Furthermore, based on the $\alpha$-cuts of fuzzy equivalence relations, we can divide $I_{i,j}$ into different subsets $I_{i,j,k}$ as

$$I_{i,j,k} = \{\, x'_{i,p} \mid x'_{i,p} \in I_{i,j},\ R_{T_{i,j}}(x'_{i,p_1}, x'_{i,p_2}) \ge \alpha,\ \alpha \in [0, 1],\ 1 \le p \le |I_{i,j}|,\ 1 \le p_1 \le |I_{i,j}|,\ 1 \le p_2 \le |I_{i,j}| \,\}, \qquad (22)$$
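For illustration, Eqs. (20)-(22) can be sketched as follows in Python. This is only an outline of the partitioning step: clamping negative values of Eq. (20) to zero and the handling of $d = 0$ are our assumptions, and the function names are ours:

```python
def compatibility_relation(xs):
    """Eqs. (20)-(21): fuzzy compatibility relation over one sorted input-value set."""
    n = len(xs)
    x_max = max(xs)
    # Eq. (21): d averages the distances of the first |I|-1 values to the maximum.
    d = sum(abs(x - x_max) for x in xs[:-1]) / (n - 1)
    if d == 0:  # all values coincide: everything is fully compatible (our assumption)
        return [[1.0] * n for _ in range(n)]
    # Eq. (20), clamped at 0 (our assumption; the printed formula can go negative).
    return [[max(0.0, 1.0 - abs(xs[p] - xs[q]) / d) for q in range(n)]
            for p in range(n)]

def max_min_closure(R):
    """Max-min transitive closure: repeat R := max(R, R o R) until stable."""
    n = len(R)
    while True:
        Rc = [[max(R[i][j], max(min(R[i][k], R[k][j]) for k in range(n)))
               for j in range(n)] for i in range(n)]
        if Rc == R:
            return R
        R = Rc

def alpha_partition(xs, RT, alpha):
    """Eq. (22): group values whose equivalence degree reaches the alpha-cut."""
    groups, assigned = [], set()
    for i in range(len(xs)):
        if i in assigned:
            continue
        cls = [j for j in range(len(xs)) if RT[i][j] >= alpha]
        assigned.update(cls)
        groups.append([xs[j] for j in cls])
    return groups
```

For example, with the sorted values `[1.0, 1.1, 5.0]` and $\alpha = 0.5$, the $\alpha$-cut of the closed relation separates the outlying value 5.0 into its own subset.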


where $k = 1, 2, \ldots, T_j(X_i)$, and $T_j(X_i)$ is the number of input-value subsets obtained from the $j$th input-value set $I_{i,j}$.

Because we use $\alpha$-cuts to divide the input attribute (input linguistic variable) $X_i$ ($i = 1, 2, \ldots, n$) into some different $\alpha$-cuts $A_{(i,j,k),\alpha}$ as

$$A_{(i,j,k),\alpha} = \{\, x'_{i,p} \mid x'_{i,p} \in I_{i,j,k},\ \mu_{A_{i,j,k}}(x'_{i,p}) \ge \alpha \,\}, \quad 1 \le j \le r, \ 1 \le k \le T_j(X_i), \qquad (23)$$

where $\mu_{A_{i,j,k}}$ is the membership function of the input fuzzy set $A_{i,j,k}$. We can define the membership function $\mu_{A_{i,j,k}}$ by a triplet $(a_{i,j,k}, b_{i,j,k}, c_{i,j,k})$ as

$$b_{i,j,k} = \frac{x'_{i,\min} + x'_{i,\max}}{2}, \quad a_{i,j,k} = b_{i,j,k} - \frac{b_{i,j,k} - x'_{i,\min}}{1 - \alpha}, \quad c_{i,j,k} = b_{i,j,k} + \frac{x'_{i,\max} - b_{i,j,k}}{1 - \alpha}, \qquad (24)$$

where $x'_{i,\min}$ and $x'_{i,\max}$ are the minimum and the maximum elements of the $\alpha$-cut $A_{(i,j,k),\alpha}$, respectively. Furthermore, we can define the membership function $\mu_{A_{i,j,k}}$ as

$$\mu_{A_{i,j,k}}(x'_i) = \begin{cases} \dfrac{x'_i - a_{i,j,k}}{|b_{i,j,k} - a_{i,j,k}|} & \text{if } a_{i,j,k} \le x'_i \le b_{i,j,k}, \\[4pt] \dfrac{c_{i,j,k} - x'_i}{|c_{i,j,k} - b_{i,j,k}|} & \text{if } b_{i,j,k} \le x'_i \le c_{i,j,k}, \\[4pt] 0 & \text{otherwise.} \end{cases} \qquad (25)$$

Thus, we can use these membership functions to generate fuzzy rules.

Consider the following two fuzzy rules (Wu and Chen 1999)

IF $X_1$ is $A_{1,1,1}$ and $X_2$ is $A_{2,1,1}$ THEN $Y$ is $y_1$,

IF $X_1$ is $A_{1,2,1}$ and $X_2$ is $A_{2,2,1}$ THEN $Y$ is $y_2$,

where X1 and X2 are input attributes (input linguistic variables); Y is the

output attribute; A1,1,1 and A1,2,1 are the input fuzzy sets of the input

attribute (input linguistic variable) X1; A2,1,1 and A2,2,1 are the input fuzzy

sets of the input attribute (input linguistic variable) X2; and y1 and y2 are

the output values of the output attribute (output linguistic variable) Y,

respectively. If the degree of similarity between the input fuzzy sets A2,1,1


and A2,2,1 is larger than a threshold value, where the threshold value is

given by the user and is between zero and one, then we can merge the

input fuzzy sets A2,1,1 and A2,2,1 into A2,new and simplify these two fuzzy

rules into the following rules:

IF $X_1$ is $A_{1,1,1}$ and $X_2$ is $A_{2,\text{new}}$ THEN $Y$ is $y_1$,

IF $X_1$ is $A_{1,2,1}$ and $X_2$ is $A_{2,\text{new}}$ THEN $Y$ is $y_2$.

In the following, we briefly review the process of how to merge the input fuzzy sets $A_{i,j_1,k_1}$ and $A_{i,j_2,k_2}$ into $A_{i,\text{new}}$ (Wu and Chen 1999). To merge very similar membership functions appearing in the fuzzy rule base, we first calculate the equality degree of two input fuzzy sets $A_{i,j_1,k_1}$ and $A_{i,j_2,k_2}$ of the input attribute (input linguistic variable) $X_i$ as (Lin and Lee 1996)

$$E(A_{i,j_1,k_1}, A_{i,j_2,k_2}) = \frac{|A_{i,j_1,k_1} \cap A_{i,j_2,k_2}|}{|A_{i,j_1,k_1} \cup A_{i,j_2,k_2}|}, \qquad (26)$$

where $1 \le j_1 \le r$, $1 \le j_2 \le r$, $1 \le k_1 \le T_{j_1}(X_i)$, $1 \le k_2 \le T_{j_2}(X_i)$, and $E(A_{i,j_1,k_1}, A_{i,j_2,k_2}) \in [0, 1]$. Let $\alpha_{\text{Equality}}$ be a threshold value (Wu and Chen 1999), where $\alpha_{\text{Equality}} \in [0, 1]$. If $E(A_{i,j_1,k_1}, A_{i,j_2,k_2}) > \alpha_{\text{Equality}}$, then we can merge the input fuzzy sets $A_{i,j_1,k_1}$ and $A_{i,j_2,k_2}$ to get a new input fuzzy set $A_{i,\text{new}}$, which can be represented by a triplet $(a_{i,\text{new}}, b_{i,\text{new}}, c_{i,\text{new}})$. The triplet $(a_{i,\text{new}}, b_{i,\text{new}}, c_{i,\text{new}})$ is calculated by averaging the triplets $(a_{i,j_1,k_1}, b_{i,j_1,k_1}, c_{i,j_1,k_1})$ and $(a_{i,j_2,k_2}, b_{i,j_2,k_2}, c_{i,j_2,k_2})$. Therefore, the triplet $(a_{i,\text{new}}, b_{i,\text{new}}, c_{i,\text{new}})$ of the merged fuzzy set $A_{i,\text{new}}$ is obtained as

$$a_{i,\text{new}} = \frac{a_{i,j_1,k_1} + a_{i,j_2,k_2}}{2}, \quad b_{i,\text{new}} = \frac{b_{i,j_1,k_1} + b_{i,j_2,k_2}}{2}, \quad c_{i,\text{new}} = \frac{c_{i,j_1,k_1} + c_{i,j_2,k_2}}{2}, \qquad (27)$$

where the membership function $\mu_{A_{i,\text{new}}}$ of the input fuzzy set $A_{i,\text{new}}$ is obtained as

$$\mu_{A_{i,\text{new}}}(x'_i) = \begin{cases} \dfrac{x'_i - a_{i,\text{new}}}{|b_{i,\text{new}} - a_{i,\text{new}}|} & \text{if } a_{i,\text{new}} \le x'_i \le b_{i,\text{new}}, \\[4pt] \dfrac{c_{i,\text{new}} - x'_i}{|c_{i,\text{new}} - b_{i,\text{new}}|} & \text{if } b_{i,\text{new}} \le x'_i \le c_{i,\text{new}}, \\[4pt] 0 & \text{otherwise.} \end{cases}$$
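For illustration, the merging step of Eqs. (26) and (27) can be sketched in Python. Since Eq. (26) is stated in terms of the cardinalities of fuzzy intersection and union, we approximate $|\cdot|$ numerically as the area under the membership curve; this reading, and the function names, are our assumptions:

```python
def tri(x, a, b, c):
    """Triangular membership function (a, b, c), as in Eq. (25)."""
    if a <= x <= b:
        return (x - a) / (b - a) if b > a else 1.0
    if b < x <= c:
        return (c - x) / (c - b) if c > b else 1.0
    return 0.0

def equality_degree(t1, t2, samples=1000):
    """Eq. (26), approximated: area of min(mu1, mu2) over area of max(mu1, mu2)."""
    lo, hi = min(t1[0], t2[0]), max(t1[2], t2[2])
    step = (hi - lo) / samples
    inter = union = 0.0
    for i in range(samples + 1):
        x = lo + i * step
        m1, m2 = tri(x, *t1), tri(x, *t2)
        inter += min(m1, m2) * step
        union += max(m1, m2) * step
    return inter / union if union > 0 else 0.0

def merge_triplets(t1, t2):
    """Eq. (27): average the two triplets component-wise."""
    return tuple((u + v) / 2.0 for u, v in zip(t1, t2))
```

Identical fuzzy sets give an equality degree of 1, disjoint ones give 0, and the merged triplet is simply the component-wise midpoint of the two originals.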


In the following text, we present a new algorithm to construct member-

ship functions and to generate fuzzy rules from training data containing

noise. The algorithm is essentially a modification of the one we presented

previously (Wu and Chen 1999). The algorithm is now presented as

follows.

Input. The training data set $P$ contains $m$ input-output pairs $(x_{1,j}, x_{2,j}, \ldots, x_{n,j}, y_j)$, $P = \{(x_{1,j}, x_{2,j}, \ldots, x_{n,j}, y_j) \mid j = 1, 2, \ldots, m\}$, where $m$ is the number of training instances; $n$ is the number of input attributes (input linguistic variables); $x_{i,1}, x_{i,2}, \ldots, x_{i,m}$ are the input values of the input attribute (input linguistic variable) $X_i$, where $1 \le i \le n$; and $y_j$ is the $j$th output value of the output attribute (output linguistic variable) $Y$ of the training data set $P$, where $1 \le j \le m$.

Output. Fuzzy rules.

Step 1. For each input attribute (input linguistic variable) $X_i$ ($i = 1, 2, \ldots, n$) in the training data set $P$, sort the input values $x_{i,1}, x_{i,2}, \ldots, x_{i,m}$ of the input attribute (input linguistic variable) $X_i$ into an ascending sequence $x'_{i,1}, x'_{i,2}, \ldots, x'_{i,m}$, respectively, to obtain the sorted training data set $P'$.

Step 2. For each input attribute (input linguistic variable) $X_i$ ($i = 1, 2, \ldots, n$) in the sorted training data set $P'$, use the automatic clustering algorithm to cluster the sorted input values $x'_{i,1}, x'_{i,2}, \ldots, x'_{i,m}$ of the input attribute (input linguistic variable) $X_i$, respectively, where $1 \le i \le n$.

Step 3. For each input attribute (input linguistic variable) $X_i$ ($i = 1, 2, \ldots, n$) in the sorted training data set $P'$, if the attribute values of the attribute $X_i$ were clustered into two or more clusters, then find the cluster $C_j$ of the input attribute (input linguistic variable) $X_i$ which has the minimum number of elements, respectively. If the values of the elements in cluster $C_j$ are larger than the value of $\text{Ave}(X_i) \times (1 + t)$ or smaller than the value of $\text{Ave}(X_i) \times (1 - t)$, where $t$ is a constant (note that in this paper we set the parameter $t = 0.5$) and $\text{Ave}(X_i)$ is the average attribute value of the input attribute (input linguistic variable) $X_i$, then set the values of those elements to $\text{Ave}(X_i)$.
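Step 3 can be sketched as follows; an illustrative Python fragment (the clusters are assumed to come from the automatic clustering algorithm of Step 2, and the function name is ours):

```python
def suppress_noise(clusters, t=0.5):
    """Step 3 sketch: if there are at least two clusters, replace values in the
    smallest cluster lying outside Ave(Xi) * (1 +/- t) with the attribute average."""
    values = [v for cluster in clusters for v in cluster]
    ave = sum(values) / len(values)
    if len(clusters) >= 2:
        smallest = min(clusters, key=len)
        for idx, v in enumerate(smallest):
            if v > ave * (1 + t) or v < ave * (1 - t):
                smallest[idx] = ave
    return clusters
```

For example, for one attribute clustered as `[[0.9, 1.0, 1.1, 1.2], [9.0]]`, the isolated value 9.0 lies above $\text{Ave}(X_i) \times 1.5$ and is replaced by the attribute average.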


Step 4. For each input attribute (input linguistic variable) $X_i$ ($i = 1, 2, \ldots, n$) in the sorted training data set $P'$, calculate the degree of fuzziness $D(X_i)$ of each input attribute (input linguistic variable) $X_i$ using Eqs. (10) and (11).

Step 5. for $i = 1$ to $n$ do
  if the degree of fuzziness $D(X_i)$ of an input attribute (input linguistic variable) $X_i$ in the sorted training data set $P'$ is smaller than the threshold value $\omega$ given by the user, where $\omega \in [0, 1]$,
  then remove the input attribute (input linguistic variable) $X_i$ from the sorted training data set $P'$;
end.

Step 6. Construct the equivalence relation $R_T$ and divide the sorted training data set $P'$ into $r$ different sets by the following steps:

1. Sort the training data set $P'$ in an ascending sequence based on the output values of the output attribute $Y$ to obtain the sorted training data set $P''$. Construct the equivalence relation $R_T$ between the output values of the output attribute $Y$ in the sorted training data set $P''$.

2. Divide the sorted training data in the sorted training set $P''$ into $r$ different output-value sets $O_j$ of the output attribute $Y$ and $r$ different input-value sets $I_{i,j}$ of the input attribute $X_i$ based on the $\alpha$-cuts of the equivalence relation $R_T$ derived from the output values of the output attribute $Y$ in the sorted training set $P''$, where $1 \le i \le n$ and $1 \le j \le r$; $\alpha$ is a threshold value given by the user, and $\alpha \in [0, 1]$.

Step 7. Derive the membership function $\mu_{A_j}(y)$ of the output fuzzy set $A_j$ of the output attribute $Y$ based on the $\alpha$-cuts $A_{j,\alpha}$ of the output fuzzy set $A_j$ with respect to the output-value sets $O_j$, where $1 \le j \le r$, $\alpha$ is a threshold value given by the user, and $\alpha \in [0, 1]$.

Step 8. Construct the equivalence relation $R_{T_{i,j}}$ and divide the input values of the input-value set $I_{i,j}$ into $T_j(X_i)$ input-value subsets, where $1 \le i \le n$ and $1 \le j \le r$, by the following steps:

1. Sort the input values of the input-value set $I_{i,j}$ of the input attribute (input linguistic variable) $X_i$ in an ascending sequence, where $1 \le i \le n$ and $1 \le j \le r$, and construct the equivalence relation $R_{T_{i,j}}$ between


the input values of the $j$th input-value set $I_{i,j}$ of the input attribute (input linguistic variable) $X_i$.

2. Divide the input values of the input-value set $I_{i,j}$ into $T_j(X_i)$ input-value subsets $I_{i,j,k}$ based on the $\alpha$-cuts of the equivalence relation $R_{T_{i,j}}$, where $1 \le i \le n$, $1 \le j \le r$, $1 \le k \le T_j(X_i)$, and $T_j(X_i)$ is the number of input-value subsets obtained from the $j$th input-value set $I_{i,j}$ of the input attribute $X_i$.

3. Derive the input membership function $\mu_{A_{i,j,k}}(X_i)$ of the input fuzzy set $A_{i,j,k}$ of the input attribute (input linguistic variable) $X_i$ based on the $\alpha$-cuts $A_{(i,j,k),\alpha}$ of the input fuzzy set $A_{i,j,k}$ with respect to the input-value subset $I_{i,j,k}$ of the input attribute (input linguistic variable) $X_i$, where $1 \le i \le n$, $1 \le j \le r$, $1 \le k \le T_j(X_i)$, and $T_j(X_i)$ is the number of input-value subsets obtained from the $j$th input-value set $I_{i,j}$ of the input attribute (input linguistic variable) $X_i$.

Step 9. Generate the fuzzy rules based on the hierarchical relationship between the output value $y_j$ of the output attribute (output linguistic variable) $Y$ and the corresponding input fuzzy set $A_{i,j,k}$ of the input attribute (input linguistic variable) $X_i$, where $1 \le i \le n$, $1 \le j \le r$, and $1 \le k \le T_j(X_i)$.

Step 10. Calculate the equality degree between input fuzzy sets and merge the membership functions of the fuzzy rule base by the following steps:

1. Calculate the equality degree $E(A_{i,j_1,k_1}, A_{i,j_2,k_2})$ between the input fuzzy sets $A_{i,j_1,k_1}$ and $A_{i,j_2,k_2}$ of the input attribute (input linguistic variable) $X_i$ based on Eq. (26), where $1 \le i \le n$, $1 \le j_1 \le r$, $1 \le j_2 \le r$, $1 \le k_1 \le T_{j_1}(X_i)$, and $1 \le k_2 \le T_{j_2}(X_i)$.

2. if $E(A_{i,j_1,k_1}, A_{i,j_2,k_2}) > \alpha_{\text{Equality}}$, where the value of $\alpha_{\text{Equality}}$ is specified by the user and $\alpha_{\text{Equality}} \in [0, 1]$, $1 \le j_1 \le r$, $1 \le j_2 \le r$, $1 \le k_1 \le T_{j_1}(X_i)$, and $1 \le k_2 \le T_{j_2}(X_i)$,
then
{
  Construct the new input fuzzy set $A_{i,\text{new}}$ of the input attribute (input linguistic variable) $X_i$ by merging the input fuzzy sets $A_{i,j_1,k_1}$ and $A_{i,j_2,k_2}$ of the input attribute (input linguistic variable) $X_i$;
  Replace the input fuzzy sets $A_{i,j_1,k_1}$ and $A_{i,j_2,k_2}$ of the input attribute (input linguistic variable) $X_i$ in the antecedent parts of the fuzzy rules by the new input fuzzy set $A_{i,\text{new}}$.
}


EXPERIMENTAL RESULTS

We applied the proposed algorithm to deal with the Iris data classification problem, where 5% noise was added into the Iris data. There are three kinds of flowers in the Iris data (i.e., Setosa, Versicolor, and Virginica), and there are 50 instances of each kind. Thus, the Iris data contains 150 instances, and there are four input attributes in the Iris data, i.e., sepal length (SL), sepal width (SW), petal length (PL), and petal width (PW). We used MATLAB version 5.0 to write a program on a Pentium II PC to deal with the Iris data classification problem (Fisher 1936; Hong and Lee 1996; Wu and Chen 1999). Because 150 × 5% = 7.5, we generate 8 noise values randomly in the input attributes. We randomly chose 75 instances from the Iris data set as the training data set to construct the membership functions and to generate fuzzy rules, and the other 75 instances were used to test the correctness of the classification of the generated fuzzy rules. In this case, we set the parameter t to 0.5 and used Eqs. (10) and (11) to calculate the degree of fuzziness of each attribute. We also set the threshold value ω to 0.5, the threshold value α to 0.7, and the threshold value α_Equality to 0.5.

After executing the program, we can see that the degrees of fuzziness of the attributes SL and SW are smaller than the threshold value ω, respectively, where ω = 0.5. In other words, SL and SW are useless for classification, so we remove those attributes from the training data. Now, there are only two attributes in the training data set, i.e., PL and PW.

We have randomly chosen 75 instances from the Iris data as the training data set to generate fuzzy rules, and the other 75 instances are used to test the correctness of the classification of the generated fuzzy rules. According to the proposed algorithm, we can generate the fuzzy rules as shown:

IF petal length is PL0 AND petal width is PW0 THEN the flower is Setosa,

IF petal length is PL1 AND petal width is PW1 THEN the flower is Versicolor,

IF petal length is PL2 AND petal width is PW2 THEN the flower is Virginica,

where the membership functions of PL0, PL1, and PL2 are shown in Figure 2.

The membership functions of PW0, PW1, and PW2 are shown in Figure 3.

Based on Eq. (26), we can calculate the degree of equality between

the input fuzzy sets PL0, PL1, and PL2. Because all the equality degrees

between the input fuzzy sets PL0, PL1, and PL2 are smaller than the


threshold value α_Equality, where the threshold value α_Equality = 0.5, we do not need to merge these membership functions. Finally, the fuzzy rules generated by the proposed fuzzy learning algorithm are listed as follows:

IF petal length is PL0 AND petal width is PW0 THEN the flower is

Setosa,

Figure 2. Membership functions of PL0, PL1, and PL2.

Figure 3. Membership functions of PW0, PW1, and PW2.


IF petal length is PL1 AND petal width is PW1 THEN the flower is

Versicolor,

IF petal length is PL2 AND petal width is PW2 THEN the flower is

Virginica.
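One way to apply the generated rules to an unseen instance, not spelled out in this excerpt, is winner-take-all inference: fire each rule with the minimum of its antecedent membership degrees and return the class of the strongest rule. The sketch below uses made-up triplets, not the membership functions of Figures 2 and 3:

```python
def tri(x, a, b, c):
    """Triangular membership function (a, b, c)."""
    if a <= x <= b:
        return (x - a) / (b - a) if b > a else 1.0
    if b < x <= c:
        return (c - x) / (c - b) if c > b else 1.0
    return 0.0

def classify(pl, pw, rules):
    """Winner-take-all: each rule fires with the min of its two antecedent degrees."""
    best_cls, best_deg = None, -1.0
    for cls, (t_pl, t_pw) in rules.items():
        deg = min(tri(pl, *t_pl), tri(pw, *t_pw))
        if deg > best_deg:
            best_cls, best_deg = cls, deg
    return best_cls

# Illustrative triplets only; NOT the membership functions learned in the article.
rules = {
    "Setosa":     ((1.0, 1.5, 2.0), (0.1, 0.25, 0.6)),
    "Versicolor": ((3.0, 4.3, 5.1), (1.0, 1.3, 1.8)),
    "Virginica":  ((4.8, 5.8, 6.9), (1.6, 2.0, 2.5)),
}
```

With these illustrative triplets, an instance with PL = 1.4 and PW = 0.2 fires the Setosa rule most strongly.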

After using 75 instances as the training data and the other 75 instances as the testing data, the classification results for the Iris data containing noise are shown in Table 1.

After executing 200 runs of the program, we can get the average classification accuracy rate. We compare the results with Hong and Lee's algorithm (1996) and Wu and Chen's algorithm (1999). The comparison results are shown in Table 2.

From Table 2, we can see that the average classification accuracy rate of the proposed algorithm is 96.676%, which is higher than the accuracy rate 96.21% of Wu and Chen's algorithm (1999) and the accuracy rate 95.57% of Hong and Lee's algorithm (1996), and the number of fuzzy rules generated by the proposed algorithm is also smaller than that of Hong and Lee's algorithm. We also can see that the number of input attributes appearing in the antecedent portions of the generated fuzzy rules is smaller than that of Hong and Lee's algorithm and Wu and Chen's algorithm.

Furthermore, when we add 5% noise into the Iris data and randomly

choose 75 instances from the Iris data as the training data set, and the

other 75 instances are used as the testing data set, the average classification

accuracy rate of the proposed algorithm after 2000 runs is 95.88874%.

Furthermore, when we add 5% noise into the Iris data and randomly

choose 120 instances from the Iris data as the training data set, and the

other 30 instances are used as the testing data set, the average classification

accuracy rate of the proposed algorithm after 2000 runs is 96.604%.

Table 1. Classification results of the Iris data containing noise

              Number of correct     Number of wrong
              classifications       classifications
Setosa               25                    0
Versicolor           24                    1
Virginica            24                    1

When we also randomly choose 75 instances from the Iris data (no noise) as the training data set, and the other 75 instances are used as the testing data set, the average classification accuracy rate of the proposed algorithm after 2000 runs is 96.33624%. Furthermore, when we also randomly choose 120 instances from the Iris data (no noise) as the training data set, and the other 30 instances are used as the testing data set, the average classification accuracy rate of the proposed algorithm after 2000 runs is 96.90621%.

CONCLUSIONS

We have extended the work we presented previously (Wu and Chen 1999) to present a new algorithm to generate fuzzy rules and construct membership functions from training data containing noise, based on the α-cuts of equivalence relations. The proposed algorithm can get a higher average classification accuracy rate, generate fewer fuzzy rules, and generate fewer input attributes in the antecedent portions of the generated fuzzy rules. Based on the proposed algorithm, we implemented a program on a Pentium II PC using MATLAB Version 5.0 to deal with the Iris data classification problem. The experimental results were compared with the results of Hong and Lee's algorithm (1996) and Wu and Chen's algorithm (1999); the comparison is summarized in Table 2.

Table 2. A comparison of the number of fuzzy rules, accuracy rate, and the number of input fuzzy sets for different algorithms

Algorithm                                                 Average        Number of     Number of input
                                                          accuracy rate  fuzzy rules   fuzzy sets
Wu and Chen's algorithm (1999) (training data set:
  75 instances; testing data set: 75 instances;
  execution: 200 runs; no noise)                          96.2100%       3             8.21
Hong and Lee's algorithm (1996) (training data set:
  75 instances; testing data set: 75 instances;
  execution: 200 runs; no noise)                          95.5700%       6.21          8
Proposed algorithm (training data set: 120 instances;
  testing data set: 30 instances; execution: 200 runs;
  no noise)                                               96.8400%       3             6
Proposed algorithm (training data set: 75 instances;
  testing data set: 75 instances; execution: 200 runs;
  no noise)                                               96.6760%       3             6

The proposed algorithm is better than the existing methods due to the following facts:

1. The proposed algorithm can get a higher average classification accuracy rate than the existing methods.

2. The proposed algorithm can deal with noisy data. However, Hong and Lee's algorithm (1996) and Wu and Chen's algorithm (1999) cannot deal with noisy data.

3. The proposed algorithm generates fewer input attributes appearing in the antecedent portions of the generated fuzzy rules than Hong and Lee's algorithm and Wu and Chen's algorithm.

REFERENCES

Castro, J. L., and J. M. Zurita. 1997. An inductive learning algorithm in fuzzy systems. Fuzzy Sets and Systems 89(2):193–203.
Castro, J. L., J. J. Castro-Schez, and J. M. Zurita. 1999. Learning maximal structure rules in fuzzy logic for knowledge acquisition in expert systems. Fuzzy Sets and Systems 101(2):331–342.
Chen, S. M., and S. Y. Lin. 2000. A new method for constructing fuzzy decision trees and generating fuzzy classification rules from training examples. Cybernetics and Systems: An International Journal 31(7):763–785.
Chen, S. M., and M. S. Yeh. 1998. Generating fuzzy rules from relational database systems for estimating null values. Cybernetics and Systems: An International Journal 29(6):363–376.
Chen, S. M., S. H. Lee, and C. H. Lee. 1999. Generating fuzzy rules from numerical data for handling fuzzy classification problems. In Proceedings of the 1999 National Computer Symposium, vol. 2, pp. 336–343, Taipei, Taiwan.
Fisher, R. 1936. The use of multiple measurements in taxonomic problems. Annals of Eugenics 7:179–188.
Giarratano, J., and G. Riley. 1994. Expert Systems: Principles and Programming. Boston: PWS Publishing.
Hayashi, Y., and A. Imura. 1990. Fuzzy neural expert system with automated extraction of fuzzy if-then rules from a trained neural network. In Proceedings of the 1990 First International Symposium on Uncertainty Modeling and Analysis, University of Maryland, College Park, Maryland.
Hong, T. P., and J. B. Chen. 1999. Finding relevant attributes and membership functions. Fuzzy Sets and Systems 103(1):389–404.
Hong, T. P., and J. B. Chen. 2000. Processing individual fuzzy attributes for fuzzy rule induction. Fuzzy Sets and Systems 112(1):127–140.
Hong, T. P., and C. Y. Lee. 1996. Induction of fuzzy rules and membership functions from training examples. Fuzzy Sets and Systems 84(1):33–47.
Hong, T. P., and C. Y. Lee. 1999. Effect of merging order on performance of fuzzy induction. Intelligent Data Analysis 3(2):139–151.
Horowitz, E., and S. Sahni. 1978. Fundamentals of Computer Algorithms. Maryland: Computer Science Press.
Kao, C. H., and S. M. Chen. 2000. A new method to generate fuzzy rules from training data containing noise for handling classification problems. In Proceedings of the Fifth Conference on Artificial Intelligence and Applications, pp. 324–332, Taipei, Taiwan.
Klir, G. J., and B. Yuan. 1995. Fuzzy Sets and Fuzzy Logic: Theory and Applications. Englewood Cliffs, NJ: Prentice Hall.
Lee, C. C. 1990a. Fuzzy logic in control systems: Fuzzy logic controller, Part I. IEEE Transactions on Systems, Man, and Cybernetics 20(2):404–418.
Lee, C. C. 1990b. Fuzzy logic in control systems: Fuzzy logic controller, Part II. IEEE Transactions on Systems, Man, and Cybernetics 20(2):419–440.
Lin, C. T., and C. S. G. Lee. 1996. Neural Fuzzy Systems. Englewood Cliffs, NJ: Prentice Hall.
Nozaki, K., H. Ishibuchi, and H. Tanaka. 1997. A simple but powerful heuristic method for generating fuzzy rules from numerical data. Fuzzy Sets and Systems 86(3):251–270.
Wang, C. H., J. F. Liu, T. P. Hong, and S. S. Tseng. 1999. A fuzzy inductive learning strategy for modular rules. Fuzzy Sets and Systems 103(1):91–105.
Wang, L. X., and J. M. Mendel. 1992. Generating fuzzy rules by learning from examples. IEEE Transactions on Systems, Man, and Cybernetics 22(6):1414–1427.
Wu, T. P., and S. M. Chen. 1999. A new method for constructing membership functions and fuzzy rules from training examples. IEEE Transactions on Systems, Man, and Cybernetics, Part B 29(1):25–40.
Yeh, M. S., and S. M. Chen. 1994. An automatic clustering algorithm for fuzzy query processing. In Proceedings of the 5th International Conference on Information Management, pp. 241–250, Taipei, Taiwan.
Yoshinari, Y., W. Pedrycz, and K. Hirota. 1993. Construction of fuzzy models through clustering techniques. Fuzzy Sets and Systems 54(2):157–165.
Yuan, Y., and M. J. Shaw. 1995. Induction of fuzzy decision trees. Fuzzy Sets and Systems 69(2):125–139.
Yuan, Y., and H. Zhuang. 1996. A genetic algorithm for generating fuzzy classification rules. Fuzzy Sets and Systems 84(1):1–19.
Zadeh, L. A. 1965. Fuzzy sets. Information and Control 8:338–353.
