



Fuzzy Partitioning Methods*

Christophe Marsala

Université Pierre et Marie Curie, LIP6 - mailbox 169, 4, place Jussieu, 75252 Paris cedex 05, France - email: [email protected]

* Published in the book: “Granular Computing, an Emerging Paradigm”, W. Pedrycz (Ed.), Studies in Fuzziness and Soft Computing, Physica-Verlag, pp. 163-186, 2001.

Abstract. In this chapter, we propose two new algorithms to automatically infer a fuzzy partition of the universe of a set of values, when each of these values is associated with a class. These algorithms are based on mathematical morphology operators, which are used to filter the given set of values and highlight kernels for fuzzy subsets. Their purpose is to be used in an inductive learning algorithm to construct fuzzy partitions on numerical universes of values.

1 Introduction

In this chapter, we focus on discretization methods for a set of numerical values in the context of inductive learning.

Inductive learning proceeds from the particular to the general. A set of classes C is considered, representing a physical or conceptual phenomenon. This phenomenon is described by means of a set of attributes A = {A1, ..., AN}. Each attribute Aj can take a value vjl in a given universe Xj. A description is an N-tuple of attribute-value pairs (Aj, vjl). Each description is associated with a particular class ck from the set C = {c1, ..., cK} to make up an instance (or example, or case) ei of the phenomenon. Inductive learning is a process to generalize from a training set E = {e1, ..., en} of examples to a general law that brings out relations between descriptions and classes in C.

Given an attribute Aj, a training set E defines the ordered set {vj1, ..., vjmj} of possible values for Aj. In this chapter, to simplify our notation, we denote by Aj both the attribute itself and the set {vj1, ..., vjmj} of all values of this attribute in the training set. Hereafter, for each ei from E, we denote by ei(Aj) its value for attribute Aj (i.e. there exists vjl in Aj such that ei(Aj) = vjl).

An attribute Aj is a numerical attribute if its set of values is a subset of a continuous or infinite universe Xj (for instance, Xj = IR or Xj = IN). In this context, given a numerical attribute Aj, the universe Xj of its values has to be partitioned in order to reduce the number of values of Aj.

The problem of the use of numerical attributes in an inductive learning algorithm has been studied in several algorithms (for instance, in [31] or [33]). Several solutions are based on the use of a fuzzy representation of these



values ([5], [7], [37], [43]). However, this solution gives rise to the complex problem of generating a fuzzy representation. A natural idea is to obtain it from experts of the studied domain, but it can be difficult to find experts or expertise for particular kinds of data.

One possible means to obtain a fuzzy representation of the values of an attribute is to infer a fuzzy partition from the data of the training set. There exist various methods to infer a fuzzy partition from a set of data [1]. In an inductive learning scheme, we prefer an automatic method that infers a fuzzy partition at each step of the learning algorithm, which can be, for instance, an algorithm to build decision trees. Thus, the induced fuzzy partition can be related to the training set of the current step of the decision tree construction. We do not want a highly elaborate method that covers the whole space of training examples, such as Krishnapuram's method [21], neural network methods, or genetic algorithm methods. These methods cluster the space covered by all the attributes of the data in one step. We prefer to use an algorithm that builds a fuzzy decision tree to cluster this space.

Thus, we propose to infer a fuzzy partition in an automatic way, in an intermediate step of the construction of the decision tree. This method is easy to implement and gives good results. In the end, the final decision tree takes into account all the attributes involved in the recognition of a class and the dependencies among these attributes.

In this chapter, we present a solution based on the use of mathematical morphology operators, formalized by means of tools from formal language theory. The implementation of this solution is described in [23].

In Section 2, we present the state of the art of fuzzy partitioning methods. In Section 3, we present a new algorithm to infer a fuzzy partition over a set of numerical values. In Section 4, we propose an extension of this algorithm to extract a reduced set of fuzzy values from a set of fuzzy values. Finally, we conclude on this method.

2 Discretization of numerical attributes

Classical methods to discretize a universe of numerical values search for thresholds in this set. These thresholds are then used to test whether values fall below or above them.

In this section, we present several existing methods to find such thresholds: classical methods and fuzzy methods. More details on such methods can be found in several survey articles [13], [20], or [36].

2.1 Searching for crisp thresholds

Methods that do not use the distribution of classes on the universe Xj to discretize it are not very interesting to study.


Such kinds of discretization are, for instance, based on splitting the universe of values into a set of intervals of the same length, or of a length related to a given proportion of examples. In this case, nothing is really done to handle the numerical attribute as such: it is only considered as a symbolic one. We focus here on methods that use the class associated with each value vjl of attribute Aj.

The discretization method implemented in the cart system [8] is based on the optimization of the value of a criterion: the Shannon entropy measure. The chosen splitting points in Xj are those that enable us to obtain the best value (the lowest one) for that measure. A splitting point generates 2 subsets of values from the whole set: the subset of lower values and the subset of greater values. So, the best splitting point minimizes the value of the entropy measure relative to the distribution of classes in the 2 created subsets.

This kind of method can be optimized by noting that it is unnecessary to generate all the possible splitting points of a given universe in order to find the best one. It can be proven that the best splitting point always lies between two values associated with 2 different classes [14,15].

Moreover, the entropy measure can be substituted with another criterion, for instance the minimum description length principle [15], or with a contrast measure [40]. However, a contrast measure alone could be inappropriate in a learning scheme because it does not handle the classes associated with values. Thus, [40] proposed a new measure, composed of a measure of entropy and a contrast measure, to find the best thresholds.

In the literature, there exist a lot of statistics-based methods to find such thresholds. Some of these methods present the advantage of taking into account the distribution of classes over the numerical universe to find the best splitting points.

For instance, the chimerge algorithm introduced by [19] is based on the use of the statistical χ2 measure to merge intervals. Merging is done until a given threshold on the χ2 measure is reached. This algorithm is a bottom-up algorithm that starts from values and ends with intervals.

The fusinter algorithm by [35,45] is based on the use of the statistical test of Mood. This test highlights the separability of the classes by studying their distribution curves. It enables us to build subsets and to find a set of points that can be compared by means of a quadratic entropy measure.

In the Bayesian classifier domain, the discretization algorithm by [30] proceeds by splitting or merging intervals. This process is supervised by a cross-validation method.

This algorithm is similar to the one proposed by [17]. Here, neighboring intervals are merged when they are labeled with the same class. Moreover, the authors introduced a new value for the class in addition to the existing classes. This new value, called the “indecision class”, is used to label intervals in which no majority class exists.


2.2 Searching for fuzzy thresholds

The problem with discretization by means of crisp thresholds lies in the importance such thresholds take on when they are used. The crispness of such a threshold leads to a decisive answer when it is tested, which produces a lack of flexibility in a learning scheme. In fact, we have to keep in mind that such a threshold is artificial and reflects only the result of the discretization method applied to a particular set of values. For instance, the threshold of 1m72 is found on the numerical universe of human height as the average height of French men. This value splits French men into small (< 1m72) and tall (≥ 1m72), but it is rather difficult to justify separating people of height 1m71 from people of height 1m73.

To limit the influence of a threshold on the determination of a decision, [9] proposed introducing flexibility in the result of a comparison with this threshold. In a manner reminiscent of fuzzy set theory based methods, the authors weighted the result of the comparison (below or above) with the estimated probability of this result. This kind of method is also used by [32] to construct probabilistic decision trees.

A fuzzy set theory based method was introduced in [6,22]. In this method, the numerical universe Xj is discretized by means of the Shannon entropy, as in the cart system, into two subsets of values by searching for a threshold t ∈ Xj. This discretization highlights two particular values vl and vh from Aj that flank the threshold t: vl ≤ t < vh. These two values are used to define the trapezoidal membership functions of two fuzzy subsets: vl is the highest value of the kernel of the fuzzy subset lower than t and the lowest value of the support of the subset greater than t; vh is the lowest value of the kernel of the fuzzy subset greater than t and the highest value of the support of the fuzzy subset lower than t. Thus, a fuzzy partition is constructed for Xj.
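As a concrete reading of this construction, the following minimal Python sketch (our illustration; the names are ours, not the authors') builds the two trapezoidal membership functions from the values vl and vh that flank the threshold:

```python
def fuzzy_threshold(v_l: float, v_h: float):
    """Fuzzy partition around a crisp threshold t with v_l <= t < v_h:
    'lower' has kernel ]-inf, v_l] and support ]-inf, v_h[;
    'greater' has kernel [v_h, +inf[ and support ]v_l, +inf[."""
    def lower(x: float) -> float:
        if x <= v_l:
            return 1.0
        if x >= v_h:
            return 0.0
        return (v_h - x) / (v_h - v_l)    # linear slope between v_l and v_h
    def greater(x: float) -> float:
        return 1.0 - lower(x)
    return lower, greater

lower, greater = fuzzy_threshold(171, 173)   # e.g. around the 1m72 threshold
assert lower(170) == 1.0 and greater(174) == 1.0 and lower(172) == 0.5
```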

A similar method is proposed by [18]. However, here, the membership functions are considered as Gaussian and not only piecewise linear.

A rather similar method is proposed by [44]: they consider an interval around a threshold as a fuzzy region.

The limitation of this kind of method lies in the narrowness of the fuzzy region created around a threshold. This region is very limited and, when it is used, few values benefit from it.

That is the reason why the search for fuzzy partitioning methods seems more promising.

2.3 Searching for fuzzy partitions

In general, fuzzy values on a numerical universe can be obtained by several means.

First of all, they can be given by experts of the domain. But often, either no expert exists to give them, or the existing experts cannot express such


knowledge, or express it poorly, even if they use it. Another means is to use a methodology to construct such a fuzzy partition.

Various methods exist to infer a fuzzy partition; a lot of them are based on statistical theory.

Questionnaire-based methods construct a membership function by means of answers to basic questions asked of experts or users of the domain. The questionnaire can consist either of simple questions like “Is this value in the subset or not?”, or of more complex questions: “Is this value greater than this one or not?” or “Can you associate this value with the most representative subset?”. The answers allow the construction of the kernels of the membership functions by means of a given aggregation method, more or less elaborate (see [1] or [2] for more details).

We are interested here in automatic methods to construct a fuzzy partition. We do not develop here neural network based methods or genetic methods used to optimize a fuzzy partition, because this kind of method uses a learning method only to optimize the parameters of fuzzy membership functions. Neural networks provide a fuzzy partition of a numerical universe as a function computed by the network; this is by itself a learning method [28]. In the same way, genetic methods are based on the optimization of the parameters of membership functions given a previous fuzzy partition (obtained either from experts or from any other method) [38]; this is an optimization-based method. They can be useful in a pre-treatment of the data in order to highlight fuzzy subsets of values in the numerical universe.

Similarly, in the safi software, [37] introduces an automatic method to enhance fuzzy partitions given by an expert. The expert can interactively modify the fuzzy partition in relation to the data from the training set. The constructed fuzzy partitions are then optimized by means of an entropy measure of fuzzy events. The method is based on a fixed limit, the minimal spread degree, which ensures that the kernels of the membership functions are distant by at least this degree.

In this chapter, we focus only on methods to construct membership functions that help a fuzzy inductive learning process, for instance in the data mining step of knowledge discovery from data [10], [25]. In this context, it is obvious that an automatic and autonomous method is more satisfactory than a parameterized or expert-based method.

Such a method is introduced by [43] in a fuzzy decision tree construction process. Given intervals of Xj of the same length, a basic triangular function is associated with each interval. An iterative algorithm is used to optimize the kernels by evaluating the distance between each training data point and the middle of the kernel, and by modifying the middles by means of their nearest data. This automatic method is interesting but does not use the distribution of the values of the class on the numerical universe.


Another method is proposed by [42], also in a fuzzy decision tree construction process. This method constructs a fuzzy partition thanks to the Kolmogorov-Smirnov measure¹, normalized by means of a contrast measure. The authors justify the use of this measure because it is a non-convex measure and, consequently, it does not favor crisp partitioning over fuzzy partitioning when it is optimized [41].

¹ Given two class values c1 and c2, the Kolmogorov-Smirnov measure values the maximal distance between the two probability distributions of the classes: K(c1, c2) = max_x(|Pc1(x) − Pc2(x)|), with Pc1 (resp. Pc2) the probability distribution of c1 (resp. c2).

In a learning scheme, another interesting method is introduced by [29]. Given a set of numerical values, called elements, the mean of the elements associated with the same class is computed. This mean is used as a prototype value: the distance of each element to this mean is computed and used to associate a membership degree to the class with the element and, thus, to build a fuzzy partition.

3 Fuzzy partitioning of a set of crisp values

In this section, we propose a new method to construct a fuzzy partition of a set of crisp values associated with crisp classes. This method is based on the use of mathematical morphology operators to filter the distribution of classes on the set Aj of values.

First of all, we present the adaptations of these operators to our context. Then, an algorithm is given to fuzzify a numerical universe. The basic mathematical morphology operators used in this method are recalled in the Annex.

3.1 Smoothing a set of values

Usually, morphological operators are applied to a 2D picture. They have to be adapted to our context of fuzzy partitioning.

Let Aj be the numerical attribute whose universe Xj has to be partitioned, according to a training set E. We introduce the representation of the distribution of classes on the ordered set {vj1, ..., vjmj} as a word, and we propose the use of rewriting systems, from formal language theory [16], to smooth this word in order to obtain fuzzy modalities of the attribute.

Let L be an alphabet, each letter of L representing one of the classes in E. We construct the alphabet Lu = L ∪ {u} with u ∉ L. The letter u is a particular letter in the system; we will use it to mark uncertain sequences. For any alphabet A, we denote by A∗ the set of all possible words composed of letters from A.

For example, let the training set be E = {(17, cheap), (24, expensive), (29, expensive), (33, expensive), (42, expensive)}, with {cheap, expensive} as the set of classes and {17, 24, 29, 33, 42} as the set of values of an attribute (e.g. the


attribute “size”). Let L = {c, e}, with c representing cheap and e representing expensive. The word defined in L∗ by the training set is ceeee.

After this transformation of the training set into a word, we define various ways of using this word.

To construct a fuzzy partition of Xj, we want to obtain sequences of letters from L that are as homogeneous as possible, in order to associate them with fuzzy subsets of Xj (thus constituting a fuzzy partitioning of Xj). Each subset will represent a linguistic modality of the attribute (for instance small and big in the previous example). To obtain such sequences, we present several techniques to alter a word. Our goal is to erase non-representative values within a word in order to smooth it. We use operators inspired by the mathematical morphology theory presented previously. To alter a word, these operators are implemented as rewriting systems. Each rewriting system is given as a transduction, an automaton where each transition is associated with both an input and an output [16]. Basic notions on transductions are recalled in the Annex.

Transduction for the erosion and for the dilatation. Let us define a transduction Erx = 〈Lu, Lu, SEr, IEr, EEr, δEr〉 (see Fig. 1).

(Figure: a four-state automaton; a transition label “a | b” means “read a and replace it by b”.)

Fig. 1. Transduction for the erosion related to x ∈ L, with y ∈ Lu, y ≠ x

This rewriting system is used for the erosion of a word with a particular letter x ∈ L as structuring element. It corresponds to the reduction of the sequences of x in the word. From a word w ∈ L∗u, we obtain the word Erx(w) ∈ L∗u.

For example, with the word w = xyyyy, to find Erx(w) we use the transduction given in Fig. 1, and we obtain Erx(xyyyy) = uyyyy.


Now, let us define Dix = 〈Lu, Lu, SDi, IDi, EDi, δDi〉 (see Fig. 2), another rewriting system. This system dilates a sequence in a word when this sequence is surrounded by letters u.

(Figure: a four-state automaton; a transition label “a | b” means “read a and replace it by b”.)

Fig. 2. Transduction for the dilatation related to x ∈ L, with y ∈ Lu, y ≠ x

It can be proven that for any given word, the computed terminal word is unique [23]. Thus, we are sure that Erx(w) and Dix(w) exist for every word w ∈ L∗u and every letter x ∈ L, and therefore for all training sets.

Moreover, for any word w ∈ L∗u, we call Er^k_x(w) (resp. Di^k_x(w)), with k > 0, the word obtained from w after k consecutive erosions (resp. dilatations).
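To make the effect of these two rewriting systems concrete, here is a minimal Python sketch of our reading of Fig. 1 and Fig. 2 (an illustration, not the author's implementation): erosion turns every x whose neighborhood is not entirely x into u, and dilatation turns every u adjacent to an x into x.

```python
def erode(word: str, x: str, u: str = "u") -> str:
    """Erosion related to x: an x at the border of a run of x (including
    the word boundaries) is replaced by u."""
    out = []
    for i, ch in enumerate(word):
        left = word[i - 1] if i > 0 else None
        right = word[i + 1] if i + 1 < len(word) else None
        out.append(u if ch == x and (left != x or right != x) else ch)
    return "".join(out)

def dilate(word: str, x: str, u: str = "u") -> str:
    """Dilatation related to x: a letter u adjacent to an x becomes x."""
    out = []
    for i, ch in enumerate(word):
        left = word[i - 1] if i > 0 else None
        right = word[i + 1] if i + 1 < len(word) else None
        out.append(x if ch == u and (left == x or right == x) else ch)
    return "".join(out)

assert erode("xyyyy", "x") == "uyyyy"   # the example from the text
```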

Now, with these two rewriting systems, we define the two usual operators from mathematical morphology: the opening and the closure.

The opening and closure operators. The opening is the composition Dix ◦ Erx of the two previous operators (i.e. rewriting systems). The k-opening (k ∈ IN) of a word w ∈ L∗u with respect to x ∈ L is defined as Op^k_x(w) = Di^k_x(Er^k_x(w)). The k-opening of a word allows us to erase sequences of x of length at most 2k. The advantage of this operation is that we can erase all the sequences in w with length smaller than a fixed value.

For example, with w = yyxxxyxyx, we have Op^1_x(w) = yyxxxyuyu and Op^2_x(w) = Di^2_x(Er^2_x(w)) = Di^2_x(yyuuuyuyu) = yyuuuyuyu.

The closure is the composition Erx ◦ Dix of the erosion and dilatation operators. This composition allows us to join sequences of letters in a word when these sequences are separated by only a few letters u. The k-closure (k ∈ IN) of the word w ∈ L∗u with respect to x ∈ L is defined as


Cl^k_x(w) = Er^k_x(Di^k_x(w)). With this operator, two sequences separated by at most 2k letters u are unified.

For example, with the word w = uuxxuuuxxuxuuu, we have Cl^1_x(w) = uuxxuuuxxxxuuu and Cl^2_x(w) = uuxxxxxxxxxuuu.
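Building on the sketch above, the k-opening and k-closure are plain compositions; the asserts below replay the two worked examples of this subsection (again, a sketch under our reading of the operators):

```python
def iterate(op, word, x, k):
    """Apply the operator `op` (erode or dilate) k consecutive times."""
    for _ in range(k):
        word = op(word, x)
    return word

def opening(word, x, k=1):
    """k-opening Op^k_x = Di^k_x o Er^k_x: erases runs of x of length <= 2k."""
    return iterate(dilate, iterate(erode, word, x, k), x, k)

def closure(word, x, k=1):
    """k-closure Cl^k_x = Er^k_x o Di^k_x: merges runs of x separated by
    at most 2k letters u."""
    return iterate(erode, iterate(dilate, word, x, k), x, k)

assert opening("yyxxxyxyx", "x", 1) == "yyxxxyuyu"
assert opening("yyxxxyxyx", "x", 2) == "yyuuuyuyu"
assert closure("uuxxuuuxxuxuuu", "x", 1) == "uuxxuuuxxxxuuu"
assert closure("uuxxuuuxxuxuuu", "x", 2) == "uuxxxxxxxxxuuu"
```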

Finally, we introduce the filter operator, which transforms a word into a sequence of homogeneous series of letters of Lu. In the framework of the utilization of a training set, a filter allows us to smooth the training set in order to deduce a fuzzy partition.

A filter operator. A filter is a composition of the previously described word-transforming operators. Let w ∈ L∗u, x ∈ L and k ∈ IN. The k-filter of the word w with respect to x is defined as:

if k = 1: Fil^1_x(w) = Cl^1_x(Op^1_x(w));
if k > 1: Fil^k_x(w) = Cl^k_x(Op^k_x(Fil^{k−1}_x(w))).

This particular combination of operators has some interesting properties: a filter smooths a fuzzy subset. First, the sequences of length at most 2k are erased (i.e. replaced by the letter u); then, sequences separated by at most 2k letters u are unified.
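The recursion above translates directly into code (same caveats as for the previous sketches):

```python
def word_filter(word, x, k):
    """k-filter w.r.t. x: Fil^1_x = Cl^1_x o Op^1_x and
    Fil^k_x = Cl^k_x o Op^k_x o Fil^(k-1)_x."""
    for i in range(1, k + 1):
        word = closure(opening(word, x, i), x, i)
    return word
```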

3.2 Fuzzy partitioning of a set of values

Now, we present an algorithm to infer a fuzzy partition on a set of numerical values, based on the previous rewriting systems.

When we apply a filter to the word induced by a training set, we are able to translate small sequences of classes into uncertain sequences. To smooth a training set, we apply a k-filter to it. We then obtain a word with large sequences (of length larger than 2k). The value k is fixed empirically and given to the system. In our system, to filter the universe of values of attribute Aj, we choose k equal to 10% of the number of cases in E: k = ⌊0.1 n⌋.

The sequences of u represent uncertain sequences where the classes are highly mixed. Some sequences consist of a single letter x, x ∈ L; these sequences roughly describe a single class and do not contain any u. We call them certain sequences, whatever x may be. We will use these uncertain and certain sequences to build a fuzzy partition of the training set E related to an attribute. The certain sequences of a letter x correspond to the kernels of the fuzzy sets of the partition.

Let r be the number of fuzzy modalities we want for the attribute. We select the r largest certain sequences containing one class (Fig. 3).

To each sequence, we assign an interval of X, for instance [S^min_1, S^max_1] and [S^min_2, S^max_2] when r = 2. In the case where we cannot find r such sequences, we can either reduce the number of applied filters, or select fewer sequences. We summarize this in the following algorithm FPMM, Fuzzy Partitioning using Mathematical Morphology, with r = 2:


(Figure: the universe X, bounded by Binf and Bsup, with the two kernels [S^min_1, S^max_1] and [S^min_2, S^max_2] defining two trapezoidal fuzzy subsets.)

Fig. 3. Fuzzy partition from 2 kernels

Algorithm 3.1 (FPMM) To find a fuzzy partition on X, given a training set E, in 2 fuzzy subsets:

1. Transform E into a word w.
2. For k fixed, smooth w.
3. Find the two largest certain sequences S1 and S2.
4. Denote by S^min_i (resp. S^max_i) the value associated with the first (resp. last) letter of Si in X, i = 1, 2; that is, S1 ≡ [S^min_1, S^max_1] and S2 ≡ [S^min_2, S^max_2], with S^max_1 < S^min_2.
5. The fuzzy partition is defined as a family of two fuzzy subsets. The kernel of the first one is ]−∞, S^max_1] and its support is ]−∞, S^min_2]. The kernel of the second one is [S^min_2, +∞[ and its support is [S^max_1, +∞[.

3.3 An illustration of the algorithm

Let E be a training set with a numerical attribute A (e.g. the age), defined on the universe X, and two classes + and -. E = {(5, -), (7, +), (8, -), (13, -), (14, -), (17, +), (20, -), (21, +), (22, -), (23, -), (25, +), (29, +), (30, +), (35, +), (36, -), (38, +), (40, +)}. We represent E in graphical form (Fig. 4).

(Figure: the 17 values of E plotted on X with their class labels + and -.)

Fig. 4. A training set

To infer a fuzzy partition on X, we apply the algorithm FPMM. First, E is transformed into a word. With L = {+, -}, the word associated with E is w = -+---+-+--++++-++ (see Fig. 4).


Fig. 5. The training set after an erosion related to −

Fig. 6. The training set after an opening related to −

Fig. 7. The training set after an opening related to − followed by an opening related to +

We filter w with k = 1 and obtain a filtered word on Lu in which two non-uncertain sequences appear: a sequence of - and a sequence of +. We use them as the basis of the kernels of two fuzzy subsets (Fig. 8). An illustration of the use of the morphological operators is presented in Fig. 5, Fig. 6, and Fig. 7.

Fig. 8. A fuzzy partition of the training set


3.4 Applications of this algorithm

We have implemented this algorithm to infer fuzzy partitions for numerical attributes during the construction of fuzzy decision trees [23].

The software Salammbô is based on an extension of the ID3 algorithm with a fuzzy measure of entropy, the entropy-star [7]. This measure is usable when a set of numerical values is associated with a fuzzy partition over it. Usually, this fuzzy partition is given by an expert of the considered domain. In Salammbô, fuzzy partitions for the numerical attributes of the training set are constructed by means of the algorithm FPMM.

Applications in several domains of knowledge discovery ([24–27]) show that the results are more interesting with the fuzzy decision tree based method than with a traditional ID3-based method: fuzzy trees are shorter and generalize better to new cases.

For instance (for more details, see [25]), consider Breiman's waveforms problem [8], where examples are described by means of 21 numeric attributes and there are 4 classes to recognize. Fuzzy decision trees constructed with the help of the fuzzy partitioning algorithm have an average classification rate (the proportion of test examples that are well classified by means of the built tree) of 78.2%. In comparison, on the same data, classical decision trees constructed by means of the algorithm C4.5 (see [34]) have an average classification rate of 72.7%.

4 Fuzzy partitioning of a set of fuzzy values

In an inductive learning scheme, the training set can be composed of training cases described by means of fuzzy values for some attributes. Moreover, the class can be a fuzzy value itself, and each case can thus be associated with a degree of membership to each class. The previous algorithm has to be enhanced to take such training sets into account.

Thus, we propose here a new algorithm, based on the same kind of considerations as previously, to find a fuzzy partition of a universe from a fuzzy set of values.

When considering a set of values, a new method is introduced to reduce and to generalize this set. Given an attribute A defined on a numerical universe X, for each (fuzzy) class ck ∈ C, a fuzzy subset µck on X has to be constructed.

This fuzzy subset will be considered as a new fuzzy value vck of A that associates values from the training set with knowledge related to the class ck. This new fuzzy value vck is constructed by aggregating all the fuzzy values vl of A from the training set E. Moreover, in this aggregation, each value vl must be weighted by its membership to the class ck. Thus, a set of new fuzzy values vck, one for each class ck, is defined on X.

In a learning scheme, it appears that such a new value, which reflects the primary set of values perfectly, would be too specialized to this set and would not


handle the imprecision of the values of the training set. So, it should be filtered, as in the previous method, to fuzzify it. A morphological method is introduced after the aggregation of values to obtain a set of more general fuzzy values.

In this section, we present a proposition of such a method of fuzzy partitioning from a set of fuzzy values.

Let E be a training set as previously defined. Let c ∈ C be a (fuzzy) class, and let A ∈ A be a numerical attribute2, with X as its universe of values. Our goal is to construct a fuzzy partition on X according to E.

4.1 Aggregation of fuzzy values

First of all, we construct a single fuzzy subset of values on X associated with E for c.

For each case e from E, the membership degree of e to c is denoted by δc(e). Given a value v ∈ A, several cases ei ∈ E, i ∈ {1, ..., n}, can possess this value, each one associated with a membership degree δc(ei) to c. We define γc(v), the membership degree of v to c, as:

γc(v) = max_{ei | ei(A) = v} δc(ei).

The membership of v to c is thus defined as the union of the membership degrees to c of all the cases that possess this value.

The degree γc(v) enables us to weight the membership function µv. This produces a new membership function µv,c that characterizes the simultaneous occurrence of v and c in E. We define:

∀x ∈ X, µv,c(x) = min(µv(x), γc(v)).

The intersection operator is used to reflect the fact that x belongs to v and to ei, which itself belongs to c with the degree δc(ei).

All the computed degrees µv,c for the modalities v are aggregated to obtain a new fuzzy value vc for attribute A. The membership function µvc of this new modality reflects the membership of the whole set of training cases to c with regard to A. We define µvc as the aggregation of all the membership functions computed for single values:

∀x ∈ X, µvc(x) = max_{v ∈ A} µv,c(x).

In the end, for each class c ∈ C, a single fuzzy value vc is obtained for A. Such a value can be used directly, but it can also be filtered to generalize it: its membership function is too complex because it is too close to the training values present in the training set.
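A minimal sketch of this aggregation on a sampled universe (the data layout and all the names are our assumptions; the chapter only fixes the formulas):

```python
import numpy as np

def aggregate(cases, mu, grid, c):
    """Compute mu_{v_c} on `grid` for class c.
    `cases`: iterable of (value_name, {class: delta}) pairs, one per case;
    `mu`: maps each fuzzy value name to a membership function over `grid`,
    e.g. grid = np.linspace(0, 250, 26) for the speed universe of Section 4.3."""
    # gamma_c(v) = max over the cases e_i with e_i(A) = v of delta_c(e_i)
    gamma = {}
    for value_name, deltas in cases:
        gamma[value_name] = max(gamma.get(value_name, 0.0), deltas[c])
    # mu_{v,c}(x) = min(mu_v(x), gamma_c(v)); mu_{v_c}(x) = max over v
    mu_vc = np.zeros_like(grid, dtype=float)
    for value_name, g in gamma.items():
        mu_vc = np.maximum(mu_vc, np.minimum(mu[value_name](grid), g))
    return mu_vc
```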

In the following, we propose an algorithm to filter and smooth such a fuzzy value in order to enhance its generalization power.

2 We recall that we denote by A both the attribute and its ordered set of values belonging to the training set.


4.2 Fuzzy morphological transformation

The FPMM algorithm cannot be applied directly to filter the obtained fuzzy value. In this case, we cannot use a transduction that modifies letters, because the training set is represented for each class and not for the whole set of classes; thus, no word can be constructed. Here, given a class, the erosion and the dilatation should be applied to the membership degrees.

Definition of a granularity level. In order to use morphological operators, we introduce a granularity level that enables us to define transformation functions on continuous fuzzy sets.

Let vc be the fuzzy value constructed previously and let µvc be its membership function on X. Let a step s ∈ IR be given; s defines the granularity level we want for the transformation. We define the ordered set XS = {x1, x2, ..., xS} as follows:

• ∀i ∈ {1, ..., S}, xi ∈ X, and ∀i ∈ {2, ..., S}, xi = xi−1 + s.
• x1 is defined such that ∀x ∈ X, x < x1 implies µvc(x) = 0.
• xS is defined such that ∀x ∈ X, x > xS implies µvc(x) = 0.

Thus x1 is the lowest value of the support of µvc, and xS is the greatest value of the support of µvc. We define the set of corresponding membership degrees {α1, α2, ..., αS} as: ∀i ∈ {1, ..., S}, αi = µvc(xi).

Transformation of membership degrees. Given a value x ∈ XS, let α be the membership degree of x before the morphological transformation, and let α′ be its membership degree after the transformation.

The study done in Section 3.1 highlights the fact that a morphological transformation applied to xi ∈ XS, i = 2, ..., S − 1, depends only on the membership degrees of xi−1 and xi+1. Thus, α′i depends only on αi, αi−1 and αi+1, and the morphological transformation is a function f : [0, 1] × [0, 1] × [0, 1] −→ [0, 1] such that α′i = f(αi−1, αi, αi+1). The function f has to be defined for each kind of transformation we want to implement.

The properties wanted for such a function are connected to mathematical morphology theory, as in the previous part. Fuzzy mathematical morphology theories exist, for instance [3,4] or [12]. In [3,4], fuzzy subsets are eroded or dilated according to a structuring element, which can be another fuzzy subset; the erosion operator and the dilatation operator are implemented by means of a t-norm or a t-conorm.

In our scheme, the structuring element is implicit and can be associated directly with the definition of f. Thus, we propose two functions, one to erode and one to dilate a fuzzy subset. These functions are based on t-norms and can be viewed as particular cases of the functions from the fuzzy mathematical morphology theory presented in [3,4].


Erosion of a fuzzy set. The function fEr : [0, 1] × [0, 1] × [0, 1] −→ [0, 1], such that α′i = fEr(αi−1, αi, αi+1), that enables us to erode a fuzzy set should satisfy the following properties:

i) α′i ≤ αi;
ii) if αi−1 = αi+1 = 1 then α′i = αi;
iii) |α′i − αi| increases when αi−1 and αi+1 tend to 0.

Property i) is required for an erosion-like function. Property ii) ensures that an element really inside the kernel of a fuzzy set (i.e. surrounded by elements in the kernel) will not be eroded. Property iii) reflects the fact that the power of the erosion increases when αi−1 and αi+1 belong to another class.

An example of a function fEr that satisfies these properties is:

fEr(αi−1, αi, αi+1) = max(0, αi − max(ᾱi−1, ᾱi+1)),

with ᾱ = 1 − α.

Dilatation of a fuzzy set. In the same way, a function fDi : [0, 1] × [0, 1] × [0, 1] −→ [0, 1], such that α′i = fDi(αi−1, αi, αi+1), is defined to implement a dilatation of fuzzy sets. For such a function, the following properties are required:

i) α′i ≥ αi;
ii) if αi−1 = αi+1 = 0 then α′i = αi;
iii) |α′i − αi| increases when αi−1 and αi+1 tend to 1.

Property i) is required for a dilatation-like function. Property ii) ensures that an element really outside the fuzzy set (i.e. surrounded by elements outside the support) will not be dilated. Property iii) reflects the fact that the power of the dilatation increases when αi−1 and αi+1 belong to the same class.

An example of a function fDi is:

fDi(αi−1, αi, αi+1) = min(1, αi + max(αi−1, αi+1)).

An illustration of the use of the erosion function and the dilatation function is given in Fig. 11.

Opening and closure of a fuzzy set. As usual, these morphological operators are defined by combining the erosion and dilatation operators. Here, the opening and the closure of a fuzzy set are compositions of the erosion function and the dilatation function:

fOp = fDi ◦ fEr and fCl = fEr ◦ fDi.

An illustration of the use of the opening and the closure is given in Fig. 12.
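The two example functions and their compositions are easy to state in code; a sketch follows (the endpoint handling is our choice, since f is only defined for the interior points i = 2, ..., S − 1):

```python
def f_er(prev, cur, nxt):
    """Erosion: alpha'_i = max(0, alpha_i - max(1 - alpha_{i-1}, 1 - alpha_{i+1}))."""
    return max(0.0, cur - max(1.0 - prev, 1.0 - nxt))

def f_di(prev, cur, nxt):
    """Dilatation: alpha'_i = min(1, alpha_i + max(alpha_{i-1}, alpha_{i+1}))."""
    return min(1.0, cur + max(prev, nxt))

def transform(alphas, f):
    """Apply f to every interior point; the endpoints are left unchanged."""
    out = list(alphas)
    for i in range(1, len(alphas) - 1):
        out[i] = f(alphas[i - 1], alphas[i], alphas[i + 1])
    return out

def f_open(alphas):
    return transform(transform(alphas, f_er), f_di)   # f_Op = f_Di o f_Er

def f_close(alphas):
    return transform(transform(alphas, f_di), f_er)   # f_Cl = f_Er o f_Di
```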


Filtering a fuzzy set. The FPMM algorithm is extended to apply to fuzzy sets by means of the introduced functions. The new algorithm is:

Algorithm 4.1 (FPMM') To construct a fuzzy partition for a numerical attribute A, defined on X, from a training set E:

1. For each class c ∈ C, aggregate the values of A to compute vc.
2. Filter each vc into v′c by means of the defined morphological functions.
3. Each obtained value v′c is a fuzzy subset of the fuzzy partition of X.
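Combining the aggregation and the fuzzy morphological functions sketched above, FPMM' can be read as follows (the number of filtering passes k is our parameter; the chapter does not fix it here):

```python
def fpmm_prime(cases, mu, grid, classes, k=1):
    """FPMM' sketch: one filtered fuzzy value v'_c per class c."""
    partition = {}
    for c in classes:
        alphas = list(aggregate(cases, mu, grid, c))   # step 1
        for _ in range(k):                             # step 2: open, then close
            alphas = f_close(f_open(alphas))
        partition[c] = alphas                          # step 3
    return partition
```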

4.3 Illustration of the algorithm

Let the training set E be given in Table 1. This toy training set is composed of motorbikes described by means of a single attribute, their estimated average speed (a fuzzy value), and associated with a class, their level of price (cheap or expensive, a fuzzy value).

Motorbike     | 500 GSE | CBR 1100   | 125 Rebel | 900 Bandit
Average speed | normal  | around 150 | slow      | around 180
Cheap         | 0.7     | 0.25       | 1.0       | 0.5
Expensive     | 0.3     | 0.75       | 0.0       | 0.5

Table 1. A fuzzy data base

The attribute average speed takes estimated fuzzy values such as around 150 km/h or slow. The membership functions of these values are given in Fig. 9.

(Figure: left, the fuzzy values slow, normal, around 150 and around 180 on the speed universe, in km/h; right, the fuzzy classes cheap and expensive on the price universe, in French francs.)

Fig. 9. Fuzzy values in the training set

The attribute A is the average speed and is associated with the numerical universe X = [0, 250]. It possesses 4 fuzzy values: slow, normal, around 150 and around 180. Only one case in the training set is associated with


the fuzzy value normal, with the membership degrees µcheap(500 GSE) = 0.7 and µexpensive(500 GSE) = 0.3. Thus, we have γcheap(normal) = 0.7 and γexpensive(normal) = 0.3. The membership function µnormal,cheap and the result of the aggregation for the class cheap are shown in Fig. 10.

(Figure: left, µnormal,cheap, i.e. the fuzzy value normal cut at the degree 0.7; right, the union of the weighted fuzzy values for the class cheap.)

Fig. 10. Aggregation of fuzzy values

The value vcheap has a continuous membership function here. We decompose it with a level of granularity to define points: for instance, with a step s of 10, 16 points are defined from Fig. 10. The results of applying an erosion or a dilatation are given in Fig. 11. The results of applying an opening and a

Fig. 11. Erosion and dilatation of fuzzy subsets

closure are given in Fig. 12. In Fig. 13, we present the results of the filtering for the two classes cheap and expensive. Thus, 2 new modalities of the attribute speed are obtained, one for each class.

5 Conclusion

In this chapter, we have proposed two new algorithms to automatically infer a fuzzy partition of the universe of a set of values, when each of these values is associated with a class.

These algorithms are based on mathematical morphology operators, which are used to filter the given set of values and highlight kernels for


Fig. 12. Opening and closure of fuzzy sets

(Figure: left, the fuzzy speed associated with cheap; right, the fuzzy speed associated with expensive.)

Fig. 13. Filtering a fuzzy set

fuzzy subsets. Their purpose is to be used in an inductive learning algorithm to construct fuzzy partitions on numerical universes of values.

The first algorithm enables us to construct a fuzzy partition from a set of numerical values associated with a class. This algorithm is based on several rewriting systems, represented as transductions, each of which implements a mathematical morphology operator. One operator is defined to reduce a sequence of letters, and another operator is defined to enlarge a sequence of letters. When these two operators are combined, depending on the order of the composition, we obtain two operators to filter the set of values. Finally, we obtain an algorithm to smooth a word induced by a training set and, from this word, a way to define a fuzzy partition. This algorithm has been implemented and is currently used in several applications of fuzzy decision tree construction.

The second algorithm is a proposed extension of the first one. It enables us to construct a fuzzy partition from a set of fuzzy values associated with a fuzzy class. This algorithm is also based on mathematical morphology theory, but the operators are implemented in a different way: functions are defined to filter the membership function associated with a given class. Finally, we obtain an algorithm to smooth a fuzzy subset induced by a training set, and a way to define a fuzzy partition.


References

1. N. Aladenise and B. Bouchon-Meunier. Acquisition de connaissances imparfaites : mise en évidence d'une fonction d'appartenance. Revue Internationale de Systémique, 11(1):109–127, 1997.

2. T. Bilgic and I. B. Turksen. Measurement of membership functions: Theoretical and empirical work. In D. Dubois and H. Prade, editors, Fundamentals of Fuzzy Sets, volume 7 of Handbook of Fuzzy Sets. Kluwer, 2000.

3. I. Bloch and H. Maître. Constructing a fuzzy mathematical morphology: Alternative ways. In Proceedings of the Second IEEE International Conference on Fuzzy Systems, San Francisco, USA, April 1993.

4. I. Bloch and H. Maître. Fuzzy mathematical morphology. Annals of Mathematics and Artificial Intelligence, 9(3–4), 1993.

5. B. Bouchon-Meunier and C. Marsala. Learning fuzzy decision rules. In J. Bezdek, D. Dubois, and H. Prade, editors, Fuzzy Sets in Approximate Reasoning and Information Systems, volume 3 of Handbook of Fuzzy Sets, chapter 4. Kluwer Academic Publishers, 1999.

6. B. Bouchon-Meunier, C. Marsala, and M. Ramdani. Arbres de décision et théorie des sous-ensembles flous. In Actes des 5èmes journées du PRC-GDR d'Intelligence Artificielle, pages 50–53, 1995.

7. B. Bouchon-Meunier, C. Marsala, and M. Ramdani. Learning from imperfect data. In D. Dubois, H. Prade, and R. R. Yager, editors, Fuzzy Information Engineering: a Guided Tour of Applications, pages 139–148. John Wiley and Sons, 1997.

8. L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification And Regression Trees. Chapman and Hall, New York, 1984.

9. C. Carter and J. Catlett. Assessing credit card applications using machine learning. IEEE Expert, Fall issue:71–79, 1987.

10. K. J. Cios, W. Pedrycz, and R. W. Swiniarski. Data Mining - Methods for Knowledge Discovery. Engineering and Computer Science. Kluwer Academic Publishers, 1998.

11. M. Coster and J.-L. Chermant. Précis d'analyse d'images. Presses du CNRS, 1989.

12. B. De Baets, N. Kwasnikowska, and E. Kerre. Fuzzy morphology based on conjunctive uninorms. In M. Mares, R. Mesiar, V. Novak, J. Ramik, and A. Stupnanova, editors, Proceedings of the Seventh International Fuzzy Systems Association World Congress, volume 1, pages 215–220, Prague, Czech Republic, June 1997.

13. J. Dougherty, R. Kohavi, and M. Sahami. Supervised and unsupervised discretization of continuous features. In A. Prieditis and S. Russell, editors, Machine Learning: Proceedings of the Twelfth International Conference, San Francisco, CA, 1995. Morgan Kaufmann.

14. U. M. Fayyad and K. B. Irani. On the handling of continuous-valued attributes in decision tree generation. Machine Learning, 8(1):87–102, January 1992. Technical note.

15. U. M. Fayyad and K. B. Irani. Multi-interval discretization of continuous-valued attributes for classification learning. In Proceedings of the 13th International Joint Conference on Artificial Intelligence, volume 2, pages 1022–1027, 1993.


16. S. Ginsburg. The Mathematical Theory of Context-Free Languages. McGraw-Hill, New York, 1966.

17. E. G. Henrichon and K.-S. Fu. A nonparametric partitioning procedure for pattern classification. IEEE Transactions on Computers, C-18(7):614–624, July 1969.

18. J.-S. R. Jang. Structure determination in fuzzy modeling: a fuzzy CART approach. In Proceedings of the 3rd IEEE Int. Conf. on Fuzzy Systems, volume 1, pages 480–485, Orlando, June 1994. IEEE.

19. R. Kerber. ChiMerge: Discretization of numeric attributes. In Proceedings of the 10th National Conference on Artificial Intelligence, pages 123–128. AAAI, 1992.

20. G. J. Klir and B. Yuan. Fuzzy Sets and Fuzzy Logic: Theory and Applications. Prentice Hall, 1995.

21. R. Krishnapuram. Generation of membership functions via possibilistic clustering. In Proceedings of the 3rd IEEE Int. Conf. on Fuzzy Systems, volume 2, pages 902–908, Orlando, Florida, June 1994.

22. C. Marsala. Arbres de décision et sous-ensembles flous. Rapport 94/21, LAFORIA-IBP, Université Pierre et Marie Curie, Paris, France, November 1994.

23. C. Marsala. Apprentissage inductif en présence de données imprécises : construction et utilisation d'arbres de décision flous. Thèse de doctorat, Université Pierre et Marie Curie, Paris, France, January 1998. Rapport LIP6 no 1998/014.

24. C. Marsala and B. Bouchon-Meunier. Fuzzy partitioning using mathematical morphology in a learning scheme. In Proceedings of the 5th IEEE Int. Conf. on Fuzzy Systems, volume 2, pages 1512–1517, New Orleans, USA, September 1996.

25. C. Marsala and B. Bouchon-Meunier. An adaptable system to construct fuzzy decision trees. In Proc. of NAFIPS'99 (North American Fuzzy Information Processing Society), pages 223–227, New York, USA, June 1999.

26. C. Marsala and N. Martini Bigolin. Spatial data mining with fuzzy decision trees. In N. F. F. Ebecken, editor, Data Mining, pages 235–248. WIT Press, 1998. Proceedings of the International Conference on Data Mining, Rio de Janeiro, Sept. 1998.

27. C. Marsala, M. Ramdani, M. Toullabi, and D. Zakaria. Fuzzy decision trees applied to the recognition of odors. In Proceedings of the IPMU'98 Conference, volume 1, pages 532–539, Paris, July 1998. Editions EDK.

28. X. Ménage. Apprentissage pour le contrôle de qualité, approche basée sur la théorie des possibilités et la théorie des réseaux de neurones. PhD thesis, Université P. et M. Curie, Paris, France, September 1996.

29. H. Narazaki and A. L. Ralescu. An alternative method for inducing a membership function of a category. International Journal of Approximate Reasoning, 11(1):1–28, July 1994.

30. M. J. Pazzani. An iterative improvement approach for the discretization of numeric attributes in Bayesian classifiers. In U. M. Fayyad and R. Uthurusamy, editors, Proceedings of the First International Conference on Knowledge Discovery and Data Mining, pages 228–233, Montreal, Quebec, Canada, August 1995.

31. J. R. Quinlan. Induction of decision trees. Machine Learning, 1(1):86–106, 1986.


32. J. R. Quinlan. Probabilistic decision trees. In R. S. Michalski, J. G. Carbonell, and T. M. Mitchell, editors, Machine Learning, volume 3, chapter 5, pages 140–152. Morgan Kaufmann Publishers, 1990.

33. J. R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, CA, 1993.

34. J. R. Quinlan. Improved use of continuous attributes in C4.5. Journal of Artificial Intelligence Research, 4:77–90, March 1996.

35. S. Rabaseda-Loudcher. Contributions à l'extraction automatique de connaissances. Une application à l'analyse clinique de la marche. PhD thesis, Université Lumière Lyon 2, France, December 1996.

36. S. Rabaseda-Loudcher, M. Sebban, and R. Rakotomalala. Discretisation of continuous attributes: a survey of methods. In Proceedings of the 2nd Annual Joint Conference on Information Sciences, pages 164–166, Wrightsville Beach, North Carolina, USA, September 1995.

37. M. Ramdani. Système d'Induction Formelle à Base de Connaissances Imprécises. PhD thesis, Université P. et M. Curie, Paris, France, February 1994. Also published as LAFORIA-IBP report no TH94/1.

38. E. Sanchez, T. Shibata, and L. A. Zadeh, editors. Genetic Algorithms and Fuzzy Logic Systems: Soft Computing Perspectives, volume 7 of Advances in Fuzzy Systems - Applications and Theory. World Scientific Publishing Co., 1997.

39. J.-P. Serra. Image Analysis and Mathematical Morphology. Academic Press, New York, 1982.

40. T. Van de Merckt. Decision trees in numerical attribute spaces. In Proceedings of the 13th International Joint Conference on Artificial Intelligence (IJCAI-93), volume 2, pages 1016–1021, 1993.

41. L. Wehenkel. On uncertainty measures used for decision tree induction. In Proceedings of the 6th International Conference IPMU, volume 1, pages 413–418, Granada, Spain, July 1996.

42. L. Wehenkel. Discretization of continuous attributes for supervised learning: variance evaluation and variance reduction. In M. Mares, R. Mesiar, V. Novak, J. Ramik, and A. Stupnanova, editors, Proceedings of the Seventh International Fuzzy Systems Association World Congress, volume 1, pages 381–388, Prague, Czech Republic, June 1997.

43. Y. Yuan and M. J. Shaw. Induction of fuzzy decision trees. Fuzzy Sets and Systems, 69:125–139, 1995.

44. J. Zeidler and M. Schlosser. Continuous-valued attributes in fuzzy decision trees. In Proceedings of the 6th International Conference IPMU, volume 1, pages 395–400, Granada, Spain, July 1996.

45. D. A. Zighed, R. Rakotomalala, and S. Rabaseda. A discretization method of continuous attributes in induction graphs. In R. Trappl, editor, Cybernetics and Systems '96, volume 2, pages 997–1002. Austrian Society for Cybernetic Studies, 1996.

Annex

Mathematical morphology theory

The fundamental operators of mathematical morphology theory are the erosion operator and the dilatation operator. They are combined to produce


the opening operator and the closure operator, which can be used to filter a set of bodies. These operators come from the pattern recognition domain and are often used to filter 2D pictures. More details on these operators and on mathematical morphology theory can be found in [39] or in [11].

The basic operators: erosion and dilatation. We consider a space of morphological bodies. These two basic operators enable us to modify a morphological body C (Fig. 14). This modification is related to a structuring element se. The erosion is a particular subtraction of se from C, and the dilatation is a particular addition of se to C.

(Figure: a morphological body, the structuring element, and the results of the erosion and of the dilatation.)

Fig. 14. Mathematical morphology operators

The operators opening and closure. Each of these operators is a combination of the two basic operators.

(Figure: an erosion followed by a dilatation with the same structuring element.)

Fig. 15. Opening operator

The opening is the combination of an erosion followed by a dilatation applied to a morphological body, with the same structuring element (Fig. 15). It enables the destruction of the small bodies in the space, with respect to the size of the chosen structuring element.


(Figure: a dilatation followed by an erosion with the same structuring element.)

Fig. 16. Closure operator

The closure is the combination of a dilatation followed by an erosion applied to a morphological body, with the same structuring element (Fig. 16). It enables the destruction of the small vacuum places occurring in a body, with respect to the size of the chosen structuring element.

The open-close filter. A filter is a combination of openings and closures. It is composed of k successive openings followed by k successive closures (k = 1, 2, ...) applied to all the bodies of the space, with the same structuring element. Thus it enables the destruction of the small bodies present in the space with respect to the chosen structuring element. Simultaneously, it enables the filling of the small vacuum places occurring in bodies. The value of k enables us to control the power of the modification.

Transductions

Let us recall that a transduction is a 6-tuple 〈A, B, S, I, E, δ〉 where A is the input alphabet, B is the output alphabet, S is a (finite) set of states, I ⊆ S is the set of initial states of the transduction, E ⊆ S is the set of terminal states of the transduction, and δ ⊂ S × A∗ × B∗ × S is the transition relation.

A transduction reads a word w ∈ A∗ and rewrites it into a corresponding word wo ∈ B∗. It proceeds sequentially from the first letter of w to the last one, as a reading head moving letter by letter. The rewriting rules that generate wo are based on δ. An element (si, z, t, sj) ∈ δ, with si, sj ∈ S, z ∈ A∗ and t ∈ B∗, is called a transition of the transduction. If si is the current state and we can read z in w (i.e. z is composed of the successive letters coming just after the reading head), we replace it by t and the current state becomes sj. A convention is to use $ to match the end of the input word, and ε (the null word) is written when nothing has to be written.

For example, let A = {a, b}, B = {0, 1}, S = {S1, S2, S3}, I = {S1}, E = {S3} and δ = {(S1, a, 0, S1), (S1, b, ε, S2), (S2, a, 0, S1), (S2, b, 1, S2), (S1, $, $, S3), (S2, $, $, S3)}.


A simple visual representation of this transduction is the graphic form of Fig. 17.

(Figure: the three-state automaton; a transition label “a / 0” means “read a and replace it by 0”.)

Fig. 17. Example of transduction

Let us rewrite w = abbaabaab with this transduction. After the sequence of states (S1, S1, S2, S2, S1, S1, S2, S1, S1, S2, S3), we have wo = 010000 and, as the current state is S3, a terminal state, and there is nothing more to read in w, the rewriting is done.
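A sketch of this run in code, restricted to the deterministic, single-letter case that the example needs (the general definition allows transitions reading whole subwords):

```python
def run_transduction(word, delta, initial, terminals):
    """delta maps (state, letter) to (written_text, next_state); '$' marks
    the end of the input and the empty string plays the role of epsilon."""
    state, out = initial, []
    for letter in word + "$":
        written, state = delta[(state, letter)]
        out.append(written)
    assert state in terminals, "rewriting ended in a non-terminal state"
    return "".join(out).rstrip("$")

delta = {("S1", "a"): ("0", "S1"), ("S1", "b"): ("", "S2"),
         ("S2", "a"): ("0", "S1"), ("S2", "b"): ("1", "S2"),
         ("S1", "$"): ("$", "S3"), ("S2", "$"): ("$", "S3")}

assert run_transduction("abbaabaab", delta, "S1", {"S3"}) == "010000"
```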