




Fuzzy Min-Max Classification with Neural Networks Patrick K. Simpson

General Dynamics Electronics Division P.O. Box 85310; Mail Zone 7202-K

San Diego, CA 92186-5310

ABSTRACT A feedforward neural network classifier that uses min-max vector pairs to define classes is described. This two-layer neural network utilizes a supervised learning rule to build a set of classes. Each node in the output layer of the network represents a class. During recall each class node produces an output value that represents the degree to which the input pattern fits within the represented classes. This fuzzy neural network is ideally suited to applications that have very little data available to define classes. This paper provides a brief overview of fuzzy sets and fuzzy pattern classification, a description of fuzzy min-max classification and its neural network implementation, and an example of the classification operation.

1. FUZZY SETS AND FUZZY PATTERNS
Fuzzy sets were introduced by Zadeh (1965) as a means of representing inexact concepts. Linguistic constructs such as "many," "few," "often," and "sometimes" mean different things depending on the situation and the observer. A fuzzy set, A, is a subset of the universe of discourse, X, that ranges from no membership (the empty set) to full membership. A membership function, mA(x), is used to describe the degree to which the object, x, belongs within the set A. The degree, or grade, of membership in A ranges from 0 to 1, where 0 represents no membership and 1 represents full membership. As an example, assume that A is the set of all people in the world X that are young. The degree to which a 25-year-old person belongs to A is greater than that of a person twice that age.

From another perspective, a fuzzy set is a class that admits the possibility of partial membership. Let X = {x} be the space of all objects of interest (the universe of discourse). Then the fuzzy set A in X is a set of ordered pairs A = {(x, mA(x)) | x ∈ X}, where mA(x) ∈ [0,1] is defined as the degree of membership of x in A (Kandel, 1986).
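To make the idea concrete, a membership function for the "young" example above might be sketched as follows. The breakpoints (full membership up to age 25, none from age 50) are invented for illustration and are not taken from the text:

```python
def young(age: float) -> float:
    """Illustrative membership function for the fuzzy set 'young'.

    Full membership up to age 25, falling linearly to no membership
    at age 50. The breakpoints are assumptions for this sketch.
    """
    if age <= 25:
        return 1.0
    if age >= 50:
        return 0.0
    return (50 - age) / 25.0

# A 25-year-old belongs to the set more strongly than a person twice that age.
print(young(25))  # 1.0
print(young(50))  # 0.0
```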

1.1. Operations on Fuzzy Sets
The power of fuzzy sets is their ability to represent and manipulate imprecise data. As in traditional set theory, there is an entire suite of operations available to fuzzy sets that allows this manipulation. The basic operations are comparison, containment, union, intersection, and complementation. Let X = {x1, x2, ..., xn} be a standard set. Assume Y and Z are each fuzzy subsets of X. The above operations are defined as follows:

Comparison: Y and Z are said to be equal (Y = Z) iff mY(x) = mZ(x) for all x ∈ X. Containment: Y ⊆ Z iff mY(x) ≤ mZ(x) for all x ∈ X. Union: The union of Y and Z, denoted Y ∪ Z, is defined as



Y ∪ Z = {max(mY(x), mZ(x)) | x ∈ X}.

Intersection: The intersection of Y and Z, denoted Y ∩ Z, is defined as

Y ∩ Z = {min(mY(x), mZ(x)) | x ∈ X}.

Complement: The complement of Y, denoted Yc, is defined as

Yc = {1 - mY(x) | x ∈ X}.

In addition to these operations, it is also useful in many instances to know the size of a fuzzy set. The sigma-count, the sum of all the set membership values, provides this measure. Given the fuzzy set Y, the sigma-count of Y is defined as

Σcount(Y) = Σ(x ∈ X) mY(x).
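These operations translate directly into code. A minimal sketch in Python, with membership values stored as plain lists (the function names are this sketch's own):

```python
def fuzzy_union(mY, mZ):
    """Pointwise max of two membership-value sequences."""
    return [max(y, z) for y, z in zip(mY, mZ)]

def fuzzy_intersection(mY, mZ):
    """Pointwise min of two membership-value sequences."""
    return [min(y, z) for y, z in zip(mY, mZ)]

def fuzzy_complement(mY):
    """Pointwise 1 - m(x)."""
    return [1.0 - y for y in mY]

def sigma_count(mY):
    """Sum of all membership values -- the size of the fuzzy set."""
    return sum(mY)

Y = [0.2, 0.8, 0.5]
Z = [0.6, 0.3, 0.5]
print(fuzzy_union(Y, Z))         # [0.6, 0.8, 0.5]
print(fuzzy_intersection(Y, Z))  # [0.2, 0.3, 0.5]
print(fuzzy_complement(Y))       # pointwise 1 - m(x)
print(sigma_count(Y))            # 1.5
```

Note that union and intersection reduce to the classical set operations when every membership value is 0 or 1.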

1.2. Fuzzy Sets as Fuzzy Patterns
An n-dimensional fuzzy pattern is a pattern constructed from the membership values of an n-element ordered fuzzy set. Consider the ordered fuzzy set Y ⊆ X, where Y = (y1, y2, ..., yn). The fuzzy pattern for Y is considered here to be a vector constructed from the membership values of each of the n corresponding fuzzy set elements. There are several ways to represent a fuzzy pattern. Kosko (1990) describes fuzzy patterns as points in the n-dimensional unit hypercube. For low dimensional pattern sets (n < 4), this is an intuitively pleasing way to represent the data. For higher dimensional data, a histograph (a histogram with the bars replaced by a point at the maximum value of each bar) is preferred, as it makes the visualization of the data much easier. The histograph representation of a fuzzy pattern (Figure 1) represents the fuzzy set elements as points along the abscissa of the histograph and the degree of membership for each set element along the ordinate of the histograph.

Figure 1: Histograph Representation of a Fuzzy Set

[Axes: degree of membership (0.0 to 1.0) on the ordinate; fuzzy set element (1 to 4) on the abscissa.]


There are several illustrative examples of fuzzy patterns. One example is frequency spectra. Let the universe of discourse be the n-dimensional frequency space where each frequency value ranges from 0 to 1. Each frequency bin of the frequency spectrum represents the degree of membership in that frequency bin's range of values. The fuzzy set is a collection of frequency bin membership values and the fuzzy pattern is the frequency spectrum. Another example of a fuzzy pattern is a grey scale image. Let the universe of discourse be all possible n×p-dimensional images (spatial patterns) where each pixel ranges from 0 to 1. Then each pixel of the image is an element of the fuzzy set and each pixel value represents the membership in the range of pixel values. The fuzzy set is the collection of pixel values and the fuzzy pattern is the image.
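The grey scale example above amounts to rescaling pixel intensities into [0, 1]. A minimal sketch, assuming 8-bit pixels and using an invented 2×3 image:

```python
# Turn an 8-bit grey scale image into a fuzzy pattern by scaling each
# pixel into [0, 1]. The 2x3 image below is invented data for the sketch.
image = [
    [0, 128, 255],
    [64, 192, 255],
]

# Flatten the image; each scaled pixel is a degree of membership.
fuzzy_pattern = [pixel / 255.0 for row in image for pixel in row]
print(fuzzy_pattern)
```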

2. FUZZY MIN-MAX CLASSIFICATION
Pattern classification determines the class to which an input pattern belongs. It requires that class boundaries be defined for each class. Traditional pattern classification is usually performed by collecting data, extracting features, normalizing the features, and then attempting to find a set of class boundaries that minimizes the classification error (Fukunaga, 1986). Alternatively, a fuzzy class is a subset of the universe of discourse. Fuzzy pattern classification seeks to find the subset of the universe of discourse that best represents a given pattern class. Classification requires boundaries. Decision boundaries (Figure 2(a)) in most classifiers seek to minimize the intraclass distances, x1 and x2, and maximize the interclass distance, y. Techniques for creating the boundaries seek to find the mean and the variance of each class (Figure 2(b)). The mean-variance approach results in a set of n-dimensional hyperspheres (or hyper-ellipsoids), one for each class. Fuzzy min-max classification works from a different perspective. Class boundaries are hyperboxes (boxes pack into cubes much better than spheres and ellipsoids). The min and max points of a hyperbox are all that are required to completely define its size and shape. Hence, fuzzy min-max classification seeks to find the min point and max point for each class (Figure 2(c)).

Figure 2: Overview of Classification Methods. (a) The classification problem; (b) the mean-variance solution; (c) the min-max solution.

2.1. Class Overlap and Decision Boundaries


Each n-dimensional hyperbox represents a class in the n-dimensional hypercube. The hyperboxes, like the hyperspheres, can overlap. In traditional pattern classification, a Bayes theoretic boundary is created that minimizes the misclassification between classes. Although the boundary can be weighted according to the number of data points in each class and the relative importance of misclassifying a data point, the end result is a hard decision that represents a willingness to live with a fixed amount of misclassification. An alternate perspective is taken with fuzzy min-max classification. If classes overlap, a data point that falls within the overlap belongs to both classes. Although at first glance this seems ludicrous, after some thought the motivations become clear. There are many areas in pattern classification where patterns are not strictly of one class or another; rather, they are a mixture of two classes. The overlap between the classes represents that mixture. As an example, assume there is a two class system: one class for circles and the other for squares. Which class does an octagon belong to? It has a circular shape constructed from straight lines. Is it a coarse circle or a sloppy square? The answer is both. There is a degree to which it belongs to both classes.

2.2. Measuring the Degree Of Classification (DOC)
Fuzzy min-max classes are represented by a min point and a max point. Using the histograph representation described above, one fuzzy set defines the max point and another defines the min point (Figure 3(a)). If an input pattern falls completely between the max and min, then the input is a member of the class with degree 1 (Figure 3(b)). If the input falls completely outside the min-

Figure 3: Fuzzy Min-Max Degree Of Classification (DOC)

[Figure 3 panels, each plotting degree of membership against dimension: (a) min and max points define the class; (b) input pattern has DOC = 1; (c) input pattern has DOC = 0; (d) input pattern has 0 < DOC < 1.]


max boundaries, then the input is a member of the class with degree 0 (Figure 3(c)). If the input pattern is neither completely within nor completely outside of the min-max class boundaries, then the degree to which the input is a member of the class falls between 0 and 1 (Figure 3(d)). The Degree of Classification (DOC) is the measure of how well the k'th input pattern, Ak, falls between the min point of the j'th class, Vj, and the max point of the same class, Wj. There are three measures that have been derived to describe the degree of classification for min-max classes: (1) subsethood/supersethood, (2) average underlap/overlap, and (3) biased underlap/overlap.

Subsethood/Supersethood: Kosko (1990) describes a method of measuring the degree to which a fuzzy set Y is a superset (or subset) of another fuzzy set Z. The supersethood measure is the amount of Y's underlap with Z normalized by Y's sigma-count. Underlap is the measure of the amount of Y that is not a superset of Z. The subsethood measure is the complement of the supersethood measure: supersethood(Y,Z) = 1 - subsethood(Y,Z), where Y and Z are fuzzy sets. Using this measure, the degree of classification (DOC) is computed as follows

DOC1(Ak, Vj, Wj) = [1 - (Σ(i=1..n) max(0, vji - aki)) / (Σ(i=1..n) aki)] × [1 - (Σ(i=1..n) max(0, aki - wji)) / (Σ(i=1..n) aki)]    (1)

where Ak = (ak1, ak2, ..., akn) is the k'th n-dimensional fuzzy input pattern, k = 1, 2, ..., m; Vj = (vj1, vj2, ..., vjn) is the j'th class's min point; Wj = (wj1, wj2, ..., wjn) is the j'th class's max point; and j is the index of the class. This measure is clearly biased by the size of the input pattern. If the sigma-count (size) of the input pattern is large, the sum of the violations in the numerator of the two fractions has less of an effect than if the input pattern were small. Hence, points equidistant from the class boundary on the top and bottom produce different responses.

Average Underlap/Overlap: To alleviate the bias created by the input pattern's size, an alternative classification measure is introduced as a modification of (1) that normalizes by the pattern dimensionality. This second degree of classification metric is defined as

DOC2(Ak, Vj, Wj) = [1 - (1/n) Σ(i=1..n) max(0, vji - aki)] × [1 - (1/n) Σ(i=1..n) max(0, aki - wji)]    (2)

This measure is a product of two complements: the complement of the average underlap and the complement of the average overlap. Although this measure does eliminate the pattern size bias, for large dimension patterns the average difference between the input and the class boundaries becomes very small. Hence the relative difference between the classification responses becomes very small. When using this method of measuring misclassification, the DOC values are close to unity most of the time, making it difficult to discriminate class membership from the DOC2 values.



Biased Underlap/Overlap: As a compromise between DOC1 and DOC2, the biased underlap/overlap is introduced. The biased underlap/overlap calculates the amount of underlap and divides that value by the number of components of the fuzzy pattern that were less than the min point (nsub). Similarly, the amount of overlap is divided by the number of components of the fuzzy pattern that were greater than the max point (nsup). The resulting equation is

DOC3(Ak, Vj, Wj) = [1 - (1/nsub) Σ(i=1..n) max(0, vji - aki)] × [1 - (1/nsup) Σ(i=1..n) max(0, aki - wji)]    (3)

The average underlap and overlap are now biased by the number of components that were outside the class boundaries. In practice, this measure has worked well for large dimension patterns (n > 100).
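The three measures can be sketched in code as follows. The helper names (doc1, doc2, doc3) and the max(0, ·) encoding of per-dimension underlap and overlap are this sketch's reading of equations (1)-(3), as is the convention that a term is 1 when no component violates the corresponding boundary:

```python
def _violations(A, V, W):
    """Per-dimension underlap (input below min) and overlap (input above max)."""
    under = [max(0.0, v - a) for a, v in zip(A, V)]
    over = [max(0.0, a - w) for a, w in zip(A, W)]
    return under, over

def doc1(A, V, W):
    """Equation (1): violations normalized by the input's sigma-count."""
    under, over = _violations(A, V, W)
    s = sum(A)
    return (1.0 - sum(under) / s) * (1.0 - sum(over) / s)

def doc2(A, V, W):
    """Equation (2): violations normalized by the pattern dimensionality n."""
    under, over = _violations(A, V, W)
    n = len(A)
    return (1.0 - sum(under) / n) * (1.0 - sum(over) / n)

def doc3(A, V, W):
    """Equation (3): violations normalized by the number of violating components."""
    under, over = _violations(A, V, W)
    nsub = sum(1 for u in under if u > 0.0)
    nsup = sum(1 for o in over if o > 0.0)
    u_term = 1.0 - sum(under) / nsub if nsub else 1.0
    o_term = 1.0 - sum(over) / nsup if nsup else 1.0
    return u_term * o_term

# A pattern entirely inside the hyperbox classifies with degree 1.
V, W = [0.2, 0.2, 0.2], [0.6, 0.6, 0.6]
print(doc1([0.4, 0.4, 0.4], V, W))  # 1.0
print(doc3([0.4, 0.4, 0.8], V, W))  # ~0.8: one component overshoots the max by 0.2
```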

3. FUZZY LOGIC AND NEURAL NETWORKS
A neural network is a distributed processing system that utilizes only local information to carry out useful information processing tasks. Local processing, in the neural network sense, means that all the information a processing element needs to compute an output value is available at the abutting connections. Because of the local processing nature of neural networks, it is possible to implement them in parallel, which can provide real-time processing. The fuzzy min-max classifier can be implemented as a two-layer dual connection feedforward neural network. All of the input necessary to compute a single valued output is available in the connections and the input pattern Ak = (ak1, ak2, ..., akn).

3.1. Dual Connection Network Topology
Most feedforward neural networks have a single connection between any two processing

Figure 4: Single and Dual Connection Neural Networks

[Figure 4 panels, each showing an input layer FA presented with Ak = (ak1, ak2, ak3) and an output layer FB: (a) single connection neural network; (b) dual connection neural network.]


elements in the network (cf. Simpson, 1990). As an example, a two-layer feedforward fully interconnected network has one connection from each FA processing element to each FB processing element (Figure 4(a)). The connection from the i'th FA processing element to the j'th FB processing element is wji. These connections are used to represent class exemplars or to form a span of the input space. Recently, dual connection neural networks have been introduced (cf. Simpson, 1991) with two connections between the processing elements. As an example, a two-layer feedforward fully interconnected network has two connections from each FA processing element to each FB processing element (Figure 4(b)). The two connections between the i'th FA and the j'th FB processing elements are vji and wji. These connections had previously been used only to represent the mean (wji) and variance (vji) of a class. Here we will use these dual connections to store the min and max points of a class. In this framework, the min point for a class is represented by the values on the Vj connections (min connections for the j'th class), Vj = (vj1, vj2, ..., vjn), where vji is the connection from the i'th FA processing element to the j'th FB processing element in a two-layer feedforward neural network. The max point for a class is represented by the values on the Wj connections (max connections for the j'th class), Wj = (wj1, wj2, ..., wjn), where wji is the connection from the i'th FA processing element to the j'th FB processing element.

3.2. Fuzzy Min-Max Adaptation
The fuzzy min-max classifier is a supervised learning classifier. Each fuzzy pattern has a class associated with it. Only the min and max points associated with that class are adjusted during the presentation of that fuzzy pattern. The learning procedure utilizes the fuzzy union operation to adjust the max point and the fuzzy intersection operation to adjust the min point.
As a result, the min point represents the set of lowest values among all the fuzzy patterns associated with a given class. Similarly, the max point represents the set of highest values among all the fuzzy patterns associated with a given class. Assuming the connection topology described in the previous section, the min connections are adjusted using the equation

vji(new) = vji(old) ∩ aki = min(vji(old), aki)    (4)

for all i = 1, 2, ..., n, where j is the class associated with the fuzzy input pattern Ak = (ak1, ak2, ..., akn), k = 1, 2, ..., m. The max connections are adjusted using the equation

wji(new) = wji(old) ∪ aki = max(wji(old), aki)    (5)

for all i = 1’2, ..., n and j is the class associated with the fuzzy input pattern A,.

Two important aspects of these adaptation equations should be emphasized. First, the adaptation does not require several presentations of the input patterns; rather, the learning is immediate. Second, the learning equations are able to learn on-line. New data can be added to the system without complete retraining. These qualities are very appealing in applications where it is difficult to maintain a data set for retraining and training must be done quickly.

3.3. Fuzzy Class Initialization
Initially each class is empty. During the presentation of the first pattern for a class, the min and max points should assume the same values as the input pattern. During successive pattern presentations the min and max points will then separate (providing successive patterns are not



Figure 5: The Six Fuzzy Patterns Used to Form Three Classes

[Six histograph panels show Patterns A1 through A6.]

identical to the first pattern) with the max boundary increasing its values and the min boundary decreasing its values. To achieve this type of learning behavior, the min points are initially set to all 1’s (the FULL point of the unit hypercube) and the max points are initially set to all 0’s (the NULL point of the unit hypercube).
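The initialization described above, together with the adaptation equations (4) and (5), can be sketched as follows. The dictionary layout and function names are this sketch's own, not notation from the paper:

```python
def make_class(n):
    """Empty class: min point starts at the FULL point (all 1's),
    max point starts at the NULL point (all 0's)."""
    return {"V": [1.0] * n, "W": [0.0] * n}

def learn(cls, A):
    """One-shot update for one pattern of this class: the fuzzy
    intersection (min) pulls the min point down -- equation (4) --
    and the fuzzy union (max) pushes the max point up -- equation (5)."""
    cls["V"] = [min(v, a) for v, a in zip(cls["V"], A)]
    cls["W"] = [max(w, a) for w, a in zip(cls["W"], A)]

c = make_class(3)
learn(c, [0.2, 0.5, 0.4])   # first pattern: V and W both become the pattern
learn(c, [0.3, 0.4, 0.6])   # second pattern: V and W separate
print(c["V"])  # [0.2, 0.4, 0.4]
print(c["W"])  # [0.3, 0.5, 0.6]
```

Note how after the first presentation V = W = the input pattern, exactly as the initialization requires.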

3.4. Fuzzy Min-Max Recall
During recall, an input pattern, Ak, is presented to a neural network that has p classes. The neural network produces a set of p values, one for each class, that represents the degree to which the input pattern, Ak, fits within class j. Any one of the three equations described for the degree of classification, (1)-(3), can be used for recall. The selection is problem dependent. The following section gives an example of the recall process using only equation (3).

4. AN EXAMPLE OF MIN-MAX LEARNING AND CLASSIFICATION

4.1. Fuzzy Pattern Data Set
To illustrate how the fuzzy min-max classifier works, the following example is presented. A data set of six 10-dimensional patterns (Figure 5) is associated with 3 classes. The fuzzy patterns and their associated classes are as follows:

A1 = (0.0, 0.2, 0.4, 0.4, 0.4, 0.2, 0.2, 0.2, 0.2, 0.2)  Class 1
A2 = (0.4, 0.4, 0.2, 0.2, 0.2, 0.4, 0.6, 0.6, 0.6, 0.6)  Class 2
A3 = (0.2, 0.4, 0.6, 0.4, 0.2, 0.6, 0.8, 0.4, 0.2, 0.2)  Class 3
A4 = (0.2, 0.2, 0.2, 0.4, 0.6, 0.6, 0.4, 0.2, 0.2, 0.2)  Class 1
A5 = (0.6, 0.4, 0.2, 0.2, 0.2, 0.2, 0.2, 0.6, 0.6, 0.6)  Class 2
A6 = (0.6, 0.4, 0.2, 0.4, 0.6, 0.4, 0.2, 0.4, 0.6, 0.4)  Class 3

4.2. Fuzzy Min-Max Class Formation To store these patterns requires a 10x3 dual


Figure 6: The Three Classes Formed From A1 - A6
[Panels show Class 1 (V1, W1), Class 2 (V2, W2), and Class 3 (V3, W3) with their associated patterns.]

Table 1: Degree Of Classification for Patterns B1 and B2
[The table entries for Classes 1-3 are illegible in this reproduction; the only legible value is 0.36 under Class 1.]

connection two-layer feedforward neural network. The learning procedure described by equations (4) and (5) produces the following sets of min and max connections (Figure 6) for this network: Class 1 min and max points:

V1 = (0.0, 0.2, 0.2, 0.4, 0.4, 0.2, 0.2, 0.2, 0.2, 0.2)
W1 = (0.2, 0.2, 0.4, 0.4, 0.6, 0.6, 0.4, 0.2, 0.2, 0.2)

Class 2 min and max points:
V2 = (0.4, 0.4, 0.2, 0.2, 0.2, 0.2, 0.2, 0.6, 0.6, 0.6)
W2 = (0.6, 0.4, 0.2, 0.2, 0.2, 0.4, 0.6, 0.6, 0.6, 0.6)

Class 3 min and max points:
V3 = (0.2, 0.4, 0.2, 0.4, 0.2, 0.4, 0.2, 0.4, 0.2, 0.2)
W3 = (0.6, 0.4, 0.6, 0.4, 0.6, 0.6, 0.8, 0.4, 0.6, 0.4)

4.3. Classifying an Input Pattern
Two new input patterns will be used to demonstrate classification with the fuzzy min-max classifier. The two input patterns are

B1 = (0.0, 0.2, 0.4, 0.6, 0.8, 0.6, 0.4, 0.2, 0.2, 0.2)
B2 = (0.8, 0.6, 0.4, 0.2, 0.0, 0.2, 0.4, 0.6, 0.6, 0.6)

Figure 7 shows these patterns relative to the three classes formed. Using equation (3), DOC3(), the degree to which each input pattern fits within each of the three previously created classes is computed, yielding the classification values found in Table 1. It is interesting to note that B2 produced an almost identical response from Classes 2 and 3. Both B1 and B2 also produced equivalent DOCs for Class 3.
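The class formation of Section 4.2 and the recall of Section 4.3 can be reproduced in a few lines. The layout and helper names below are this sketch's own, and the printed DOC values are not asserted against Table 1, whose entries are only partially legible here:

```python
# Build the three min-max classes from A1..A6 and score the new
# patterns B1 and B2 with the DOC3 measure of equation (3).
patterns = {
    1: [[0.0, 0.2, 0.4, 0.4, 0.4, 0.2, 0.2, 0.2, 0.2, 0.2],   # A1
        [0.2, 0.2, 0.2, 0.4, 0.6, 0.6, 0.4, 0.2, 0.2, 0.2]],  # A4
    2: [[0.4, 0.4, 0.2, 0.2, 0.2, 0.4, 0.6, 0.6, 0.6, 0.6],   # A2
        [0.6, 0.4, 0.2, 0.2, 0.2, 0.2, 0.2, 0.6, 0.6, 0.6]],  # A5
    3: [[0.2, 0.4, 0.6, 0.4, 0.2, 0.6, 0.8, 0.4, 0.2, 0.2],   # A3
        [0.6, 0.4, 0.2, 0.4, 0.6, 0.4, 0.2, 0.4, 0.6, 0.4]],  # A6
}

classes = {}
for label, pats in patterns.items():
    V, W = [1.0] * 10, [0.0] * 10               # FULL and NULL points
    for A in pats:
        V = [min(v, a) for v, a in zip(V, A)]   # equation (4)
        W = [max(w, a) for w, a in zip(W, A)]   # equation (5)
    classes[label] = (V, W)

def doc3(A, V, W):
    """Equation (3): biased underlap/overlap."""
    under = [max(0.0, v - a) for a, v in zip(A, V)]
    over = [max(0.0, a - w) for a, w in zip(A, W)]
    nsub = sum(1 for u in under if u > 0.0)
    nsup = sum(1 for o in over if o > 0.0)
    u = 1.0 - sum(under) / nsub if nsub else 1.0
    o = 1.0 - sum(over) / nsup if nsup else 1.0
    return u * o

B1 = [0.0, 0.2, 0.4, 0.6, 0.8, 0.6, 0.4, 0.2, 0.2, 0.2]
B2 = [0.8, 0.6, 0.4, 0.2, 0.0, 0.2, 0.4, 0.6, 0.6, 0.6]
for label, (V, W) in classes.items():
    print(label, round(doc3(B1, V, W), 3), round(doc3(B2, V, W), 3))
```

The recovered min and max points for Class 1 match those listed in Section 4.2, and every training pattern scores DOC3 = 1 against its own class, as the definition requires.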

5. CONCLUSION
A two-layer feedforward supervised learning neural network that classifies fuzzy patterns has been described and illustrated. This neural network can learn on-line and in a single pass through the data set. The algorithm provides an output value for each class that represents the degree to which the input pattern falls within the respective class boundaries. This algorithm is currently being explored for applications in pattern recognition and control.



Figure 7: Comparing the Input Patterns with the Classes Formed (Input patterns shown as dashed lines)

[Six panels overlay the input patterns (dashed) on Class 1 (V1, W1), Class 2 (V2, W2), and Class 3 (V3, W3).]

REFERENCES
Fukunaga, K. (1986). Statistical pattern recognition. In Handbook of Pattern Recognition and Image Processing, T. Young and K. Fu, Eds., pp. 3-32. Academic Press: San Diego, CA.
Kandel, A. (1986). Fuzzy Mathematical Techniques with Applications. Addison-Wesley: Reading, MA.
Kosko, B. (1990). Fuzziness vs. probability. International Journal of General Systems, Vol. 17, pp. 211-240.
Simpson, P. (1990). Artificial Neural Systems: Foundations, Paradigms, Applications and Implementations. Pergamon Press: Elmsford, NY.
Simpson, P. (1991). Foundations of neural networks. Chapter 1 in Neural Networks, C. Lau & E. Sanchez-Sinencio, Eds., IEEE Press: Piscataway, NJ.
