

NAFIPS 2005 - 2005 Annual Meeting of the North American Fuzzy Information Processing Society

Pattern Recognition using Multivariate-based Fuzzy Inference Rule Reduction on Neuro Fuzzy System

Deok H. Nam
Division of Engineering and Computer Science
Wilberforce University
Wilberforce, OH 45384
[email protected]

Harpreet Singh
Department of Electrical and Computer Engineering
Wayne State University
Detroit, MI 48202
[email protected]

Abstract - The pattern recognition of the IRIS data using a multivariate-based fuzzy inference rule reduction technique is presented. Many data sets are so imprecise and massive that pattern recognition over the raw data produces unnecessary results, such as undesired rules or clusters with an inappropriate degree of precision. The proposed data reduction technique reduces large and imprecise data into relatively compact and precise data through the consecutive preprocessing of factor analysis and subtractive clustering (SUBCLUST), which then generates the neuro fuzzy system for pattern recognition. As a case study, the performance and accuracy of the proposed technique are examined on the IRIS data, recognizing the pattern of 150 selected flowers from three species with four different measurements each, using statistical measures such as correlation (CORR), total root mean square (TRMS), standard deviation (STD), mean of absolute distance (MAD), and the equally weighted index (EWI).

Index terms - Factor analysis, multivariate-based data reduction, neuro fuzzy system, subtractive clustering (SUBCLUST).

I. INTRODUCTION

There are countless data sets in the real world. In addition, some data sets are very large, ambiguous, imprecise, and noisy. Hence, it is very difficult to recognize or extract the desired information from such ambiguous, imprecise, and huge data sets. This evokes the necessity of data reduction, especially for massive and high-dimensional data. Among the reduction techniques, multivariate analysis methods such as principal component analysis, factor analysis, and clustering analysis are very popular. With the appearance of alternative methods such as neural networks, multivariate analysis techniques were for a time considered practically unsatisfactory, even though they continued to be extended and developed. These methods are now being refocused on various fields of engineering and the natural sciences, such as image processing, data mining, and data fusion.

The purpose of data reduction techniques is mainly to reduce the number of dimensions of the data and then detect the desired structure in the relationships between the original variables, forming a new structure from the reduced variables extracted by the dimensional reduction. Among the multivariate analysis techniques, factor analysis [1][2] analyzes the common variability in each variable. One of the goals of factor analysis is to reduce the dimensions of the observation space, transforming a multidimensional space into another one of the same or fewer dimensions (i.e., the same or a smaller number of axes or variables), depending upon the given conditions. Hence, factor analysis usually converts the normalized data into newly extracted data called 'factor scores'. These scores give new combinations of the variables that describe the major patterns of variation among the data.

Another frequently used technique for data reduction is clustering analysis. The purpose of clustering is to classify a large, ungrouped data set into natural groups, producing a relatively simple, grouped data set based upon the behavioural similarities within the initial data. In other words, clustering generally splits a set of patterns into a number of groups according to a suitable similarity measure. The patterns belonging to any one group are similar, and the patterns of different groups are as dissimilar as possible. It is very important that the clustering results measure similarity through mathematical properties such as distances between data points and intensities; these properties must identify each cluster or pattern as uniquely as possible.

Even though multivariate analysis techniques such as factor analysis and principal component analysis are very popular among the various data reduction techniques, most techniques concern only one-directional reduction of the data's dimensions. For example, factor analysis and principal component analysis are applied only to the reduction of variables or the reduction of observations, respectively. Another technique, clustering analysis, is normally applied to reduce the number of observations by classifying the similarities of the data. Even alternative data analysis methods such as neural networks, which optimize over the data, cannot reduce the size of the data in both directions of its dimensions simultaneously. This can cause difficulty in handling a very large number of observations in input/output format.



Even though new techniques continue to be developed for better performance, no single method satisfies all conditions. Therefore, the proposed algorithm requires the following conditions. First, to reduce the original data into an equivalent reduced system, the variables of the original data set must be highly intercorrelated; if the variables are uncorrelated, the meaning of the reduction fades. Second, if the data set is linearly correlated, the result will likely be better than for a data set without linear correlation, since factor analysis represents the system with linear combinations of reduced embedded components. The main idea of the proposed algorithm is thus based upon linear combinations of the embedded elements of the variables together with a simplification of the groups of observations.

II. FACTOR ANALYSIS (FA)

Factor analysis [1][2] is the study of the interrelationships among variables, evaluating a new set of variables that simplifies the complex and diverse interrelationships existing in a set of observed variables by uncovering common dimensions or factors.

The following steps summarize the factor analysis algorithm (a sketch in code follows the list).
Step 1: Read the input data matrix.
Step 2: Find the standardized matrix of the input data matrix.
Step 3: Find the correlation matrix of the input data matrix.
Step 4: Find the eigenvalues and eigenvectors.
Step 5: Calculate the initial factor loadings by multiplying the square roots of the eigenvalues and the eigenvectors.
Step 6: Find the rotated factor loadings using Varimax rotation.
Step 7: Compute the inverse matrix of the correlation matrix.
Step 8: Obtain the transpose matrix of the rotated factor loadings.
Step 9: Multiply the inverse matrix from Step 7 and the transpose matrix from Step 8.
Step 10: Calculate the factor scores by multiplying the standardized matrix from Step 2 and the result matrix from Step 9.
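As an illustration, the following is a minimal sketch of Steps 1-10 in Python/NumPy; it is not the implementation used in this paper. The varimax routine is a standard formulation, and the score weights are read as W = R^{-1} * Lambda_rot so that Steps 7-9 conform dimensionally (the regression-method estimate of factor scores); all function names are illustrative.

```python
import numpy as np

def varimax(L, gamma=1.0, max_iter=100, tol=1e-6):
    """Varimax rotation of a loading matrix L (p variables x k factors)."""
    p, k = L.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        Lr = L @ R
        u, s, vt = np.linalg.svd(
            L.T @ (Lr ** 3 - (gamma / p) * Lr @ np.diag((Lr ** 2).sum(axis=0))))
        R = u @ vt
        if s.sum() < d * (1 + tol):   # converged: criterion stopped improving
            break
        d = s.sum()
    return L @ R

def factor_scores(X, k):
    """Factor scores of X (n observations x p variables) for k factors."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # Step 2: standardize
    R = np.corrcoef(X, rowvar=False)                   # Step 3: correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)               # Step 4: eigendecomposition
    order = np.argsort(eigvals)[::-1]                  # sort by descending eigenvalue
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs[:, :k] * np.sqrt(eigvals[:k])   # Step 5: initial loadings
    rotated = varimax(loadings)                        # Step 6: Varimax rotation
    W = np.linalg.inv(R) @ rotated                     # Steps 7-9: score weights
    return Z @ W                                       # Step 10: factor scores
```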

III. SUBTRACTIVE CLUSTERING (SUBCLUST) [3]

Clustering of numerical data forms the basis of many classification and system modeling algorithms. The purpose of clustering is to form natural groupings of data from a large data set, producing a concise representation of a system's behaviour. The subtractive clustering method estimates the cluster centres of numerical data in order to distill those natural groupings. It is also used to determine the number of clusters and their initial values when initializing iterative, optimization-based clustering algorithms such as the fuzzy c-means method. The subtractive clustering algorithm partitions the input space and provides the cluster centres that form the rules describing the system's behaviour. The input space partitions are the outputs predicted by the model when it is applied to a checking data set supplied with the training data.

The following steps summarize the subtractive clustering method (a compact sketch in code follows the list).
Step 1: Decide the measure, Mi, of the data points, xi, using (2) from the APPENDIX.
Step 2: Find the first centre point using the highest value from Step 1.
Step 3: Recalculate the measure, Mi, for all data points using (3) from the APPENDIX.
Step 4: Find the next centre point using the highest value from Step 3.
Step 5: Repeat this procedure until the kth cluster centre satisfying condition (5) from the APPENDIX, with the criteria for ε given there, is calculated.
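As a reading aid, here is a compact Python/NumPy sketch of the method, following equations (2)-(5) of the APPENDIX and the acceptance criteria of Chiu [3] quoted there. The defaults (rb = 1.5*ra, upper and lower thresholds 0.5 and 0.15) are the values suggested in [3]; the code itself is illustrative rather than the implementation used in the paper.

```python
import numpy as np

def subclust(X, ra=0.5, eps_hi=0.5, eps_lo=0.15):
    """Subtractive clustering after Chiu [3]. X (n points x M dims) is
    assumed normalized to equal coordinate ranges; returns cluster centres."""
    rb = 1.5 * ra                                  # default rb from [3]
    alpha, beta = 4.0 / ra ** 2, 4.0 / rb ** 2
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    M = np.exp(-alpha * d2).sum(axis=1)            # eq. (2): potential of each point
    centres = []
    k = int(np.argmax(M))
    M1 = M[k]                                      # potential of the first centre
    while True:
        xk, Mk = X[k], M[k]
        if not centres or Mk > eps_hi * M1:        # clearly a centre: accept
            centres.append(xk)
        elif Mk < eps_lo * M1:                     # eq. (5): clearly not: stop
            break
        else:                                      # grey zone: Chiu's distance test
            dmin = min(np.linalg.norm(xk - c) for c in centres)
            if dmin / ra + Mk / M1 >= 1.0:
                centres.append(xk)
            else:                                  # reject, re-test next best point
                M[k] = 0.0
                k = int(np.argmax(M))
                continue
        M -= Mk * np.exp(-beta * ((X - xk) ** 2).sum(-1))  # eqs. (3)/(4): subtract
        k = int(np.argmax(M))
    return np.array(centres)
```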

IV. NEURO FUZZY SYSTEM (NFS)

The neuro fuzzy system [10] combines neural network algorithms with fuzzy logic. In this paper, the Adaptive-Network-Based Fuzzy Inference System (ANFIS) [5] is used to implement both the reduced IRIS data sets and the original IRIS data set. ANFIS originates from the integration of the TSK fuzzy model developed by Takagi, Sugeno, and Kang [6][7][8][9] with the backpropagation learning algorithm and least squares estimation from neural networks. The TSK fuzzy model was proposed to formalize a systematic approach to generating fuzzy rules from an input-output data set. A typical fuzzy rule in a TSK fuzzy model has the format

If x is A and y is B, then z = f(x, y)

where A and B are fuzzy sets in the antecedent and z = f(x, y) is a crisp function in the consequent. Usually f(x, y) is a polynomial in the input variables x and y, but it can be any other function, as long as it appropriately describes the output of the system within the fuzzy region specified by the antecedent of the rule.

To integrate the TSK model with the backpropagation learning algorithm and least squares estimation from neural networks, it is convenient to put the fuzzy model into the framework of adaptive networks, which can compute gradient vectors systematically. An adaptive network is a multilayered feedforward network structure consisting of nodes, each performing a particular function on incoming signals, and directional links through which the nodes are connected [5]. It is trained by a hybrid learning rule [5] to speed up the learning process; this paradigm combines the gradient descent method with the least squares estimate for faster identification of the parameters.
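The short sketch below (illustrative, not the ANFIS code used in the paper) shows how a first-order TSK model of this form produces an output: each rule's firing strength weights its linear consequent, and the normalized weighted average is the crisp output. Gaussian membership functions, the product t-norm, and the example rule parameters are assumptions for the sketch.

```python
import numpy as np

def gauss_mf(x, c, sigma):
    """Gaussian membership function, evaluated componentwise."""
    return np.exp(-0.5 * ((x - c) / sigma) ** 2)

def tsk_predict(x, rules):
    """First-order TSK inference for input vector x. Each rule is a tuple
    (centres, sigmas, coeffs) where coeffs = [p_1, ..., p_m, bias]."""
    w = np.array([np.prod(gauss_mf(x, c, s)) for c, s, _ in rules])  # firing strengths
    z = np.array([p[:-1] @ x + p[-1] for _, _, p in rules])          # z = f(x) per rule
    return w @ z / w.sum()                    # normalized weighted average

# Two illustrative rules over a two-input space:
rules = [(np.array([0.0, 0.0]), np.array([1.0, 1.0]), np.array([1.0, -1.0, 0.5])),
         (np.array([2.0, 2.0]), np.array([1.0, 1.0]), np.array([0.5, 0.5, 0.0]))]
print(tsk_predict(np.array([1.0, 1.0]), rules))
```

In ANFIS [5], the premise parameters (the membership centres and widths) are tuned by gradient descent while the consequent coefficients are identified by least squares, which is the hybrid learning rule mentioned above.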

V. MULTIVARIATE-BASED FUZZY INFERENCE RULE REDUCTION ALGORITHM

In the newly proposed multivariate-based fuzzy inference rule reduction algorithm, factor analysis is first applied as a preprocessing procedure to reduce the number of dimensions of the given data set. Then, using the subtractive clustering algorithm, the potential values of each clustered centre, rather than a membership partition matrix, are used for the clustering. The following steps summarize the algorithm of the multivariate-based data reduction technique; a combined sketch in code follows the list.



Step 1: Read the original data set in matrix format and partition the original data into input data and output data.
Step 2: Normalize the input data from Step 1.
Step 3: Find the correlation matrix of the normalized data from Step 2.
Step 4: Find the eigenvalues and eigenvectors of the correlation matrix from Step 3 using the characteristic equation.
Step 5: Let the eigenvectors from Step 4 be the unrotated factor loadings of the data, i.e., the initial factor loadings.
Step 6: Use Varimax rotation to get the rotated factor loadings from the initial factor loadings.
Step 7: Compute the inverse matrix of the correlation matrix from Step 3.
Step 8: Find the transposed matrix of the rotated factor loadings from Step 6.
Step 9: Multiply the inverse matrix from Step 7 and the transposed matrix from Step 8.
Step 10: Estimate the factor scores by multiplying the standardized matrix (zero mean and unit variance) from Step 2 and the result matrix from Step 9.
Step 11: Combine the result matrix from Step 10 and the output data from Step 1.
Step 12: Determine the measure, Mi, of the data points, xi, using (2) from the APPENDIX.
Step 13: Find the first centre point using the highest value, M1*, from Step 12.
Step 14: Recalculate the measure, Mi, for all data points using (3) from the APPENDIX with M1*.
Step 15: Find the next centre point using the highest value from Step 14.
Step 16: Repeat the procedure of Steps 14 and 15 until the kth cluster centre satisfying (5) from the APPENDIX, with the criteria for ε given there, is obtained.
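Combining the two earlier sketches gives an illustrative end-to-end version of Steps 1-16. It is a sketch, not the paper's implementation: it assumes the factor scores and the output column are rescaled to equal coordinate ranges before clustering (as the APPENDIX assumes), and treats the influence range ra as a tuning parameter (TABLE III reports r = 0.08 yielding 30 clusters for IRIS).

```python
import numpy as np

def multivariate_rule_reduction(X, y, k=2, ra=0.08):
    """Steps 1-16: reduce the variables with factor analysis, then reduce
    the observations with subtractive clustering. Reuses factor_scores()
    and subclust() from the earlier sketches (illustrative code)."""
    F = factor_scores(X, k)                      # Steps 1-10: n x k factor scores
    D = np.column_stack([F, y])                  # Step 11: re-attach the output
    rng = D.max(axis=0) - D.min(axis=0)          # rescale to equal coordinate
    D = (D - D.min(axis=0)) / np.where(rng > 0, rng, 1.0)   # ranges (APPENDIX)
    centres = subclust(D, ra=ra)                 # Steps 12-16: cluster centres
    return centres[:, :k], centres[:, k]         # reduced inputs / reduced output
```

The returned centres then serve as the reduced input-output data from which the neuro fuzzy system is generated.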

VI. EXAMPLE: ANALYSIS AND RESULTS

The IRIS data are the best-known database in the pattern recognition literature. The IRIS data set contains 50 flowers from each of three species (or classes) - IRIS setosa (Class 0), IRIS versicolor (Class 1), and IRIS virginica (Class 2) - and four measurements on each flower: petal length (cm), petal width (cm), sepal length (cm), and sepal width (cm). One class is linearly separable from the other two; the other two classes are not linearly separable from each other.

Before the proposed data reduction algorithm is applied to the IRIS data set, it is necessary to examine whether the data set can eliminate redundancy between its original variables through highly correlated interrelationships. To examine the redundancy, the correlations between the variables of the IRIS data set were calculated. As shown in TABLE I, there is a relatively high correlation between two variables, petal length and petal width, which means those variables may contain redundancy, so the data can be reduced. There are also possibilities of applying the system reduction procedure between sepal length and petal length, or sepal length and petal width, since their correlations are relatively higher than those of the other variable pairs. Hence, for this database, two new reduced variables are considered from the four original variables using the proposed algorithm.

TABLE I
CORRELATION MATRIX FOR THE ORIGINAL IRIS DATA

               sepal length  sepal width  petal length  petal width
sepal length       1           -0.0923      0.83971       0.79439
sepal width       -0.0923       1          -0.48844      -0.3971
petal length       0.83971     -0.48844     1             0.96526
petal width        0.79439     -0.3971      0.96526       1

TABLE II
EIGENVALUES FOR THE ORIGINAL IRIS DATA SET

Eigenvector   Eigenvalue
     1         2.90936
     2         0.92648
     3         0.14969
     4         0.01447

There are many different criteria for selecting the number of new reduced variables. For this example, two criteria are combined: the eigenvalues-greater-than-one rule [12], and the requirement that the accumulated variance of the reduced system exceed 0.9. TABLE II shows the eigenvalues for the IRIS data set, and Fig. 1 shows the break point for the number of factors reduced from the original variables (a sketch of this selection follows).
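The snippet below sketches one way to combine the two criteria; taking the larger factor count demanded by either rule is an assumption of the sketch, but it reproduces the choice of two factors for the TABLE II eigenvalues.

```python
import numpy as np

def num_factors(eigvals, var_target=0.9):
    """Combined criteria: eigenvalues greater than one [12], plus enough
    factors to explain at least var_target of the total variance."""
    eigvals = np.sort(eigvals)[::-1]
    k_kaiser = int((eigvals > 1.0).sum())                   # eigenvalue > 1 rule
    cum = np.cumsum(eigvals) / eigvals.sum()                # cumulative variance
    k_var = int(np.searchsorted(cum, var_target) + 1)       # first k reaching target
    return max(k_kaiser, k_var)

# IRIS eigenvalues from TABLE II: the first two already explain about 95.9 %.
print(num_factors(np.array([2.90936, 0.92648, 0.14969, 0.01447])))  # -> 2
```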

[Figure: scree plot of eigenvalues versus eigenvector index.]

Fig. 1 The relationship between eigenvalues and eigenvectors

In this example, two factors are extracted from the original variables. Simultaneously, the IRIS data are clustered into 30 clusters using the subtractive clustering (SUBCLUST) algorithm. The following statistics evaluate the performance of the neuro fuzzy systems built from the reduced data models [13]; a sketch computing them follows the definitions.

CORR: Correlation between the original output and the estimated output from the fuzzy neural system using the data from each method.


TRMS: Total root mean square of the distances between the original output and the estimated output using the same testing data through the fuzzy neural system,

$$\mathrm{TRMS} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - y_i)^2}{n-1}} \qquad (1)$$

where $x_i$ is the estimated value and $y_i$ is the original output value.

STD: Standard deviation of the distances between the original output and the estimated output using the same testing data through the fuzzy neural system.

MAD: Mean of the absolute distances between the original output and the estimated output using the same testing data through the fuzzy neural system.

EWI: The index obtained by summing the statistical estimation values, each multiplied by its equal weight; a value closer to 0 indicates a better result.
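For reference, here is a sketch of the five statistics in Python. The exact weighting inside EWI is not spelled out in the text; the form (1 - CORR) + TRMS + STD + MAD used below is an assumption, but it reproduces every EWI value in TABLE III (e.g., for the FC row, (1 - 0.9521) + 0.1832 + 0.172 + 0.1767 = 0.5798 ≈ 0.5799).

```python
import numpy as np

def evaluate(y_true, y_est):
    """CORR, TRMS, STD, MAD, and EWI for an estimated output sequence.
    EWI = (1 - CORR) + TRMS + STD + MAD is assumed (consistent with TABLE III)."""
    d = y_est - y_true
    n = len(y_true)
    corr = np.corrcoef(y_true, y_est)[0, 1]          # CORR
    trms = np.sqrt((d ** 2).sum() / (n - 1))         # eq. (1)
    std = d.std(ddof=1)                              # STD of the distances
    mad = np.abs(d).mean()                           # MAD
    ewi = (1.0 - corr) + trms + std + mad            # EWI: lower is better
    return {"CORR": corr, "TRMS": trms, "STD": std, "MAD": mad, "EWI": ewi}
```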

A. Estimation using the neuro fuzzy system with the original data

The output was estimated by the neuro fuzzy system with the original IRIS data, without reducing the number of variables or observations. In this model, all four measurements are used as input variables, and the classification of the three species of 50 flowers each is used as the output variable for developing the neuro fuzzy system. In this case, from TABLE III, the correlation is 0.4832 and the EWI is 2.6219.

B. Estimation using the neuro fuzzy system with the data reduced by the SUBCLUST algorithm only

In this model, the IRIS data set reduced by the SUBCLUST algorithm was used as the input and output data for the neuro fuzzy system. Using only the SUBCLUST algorithm, the number of observations was reduced from 150 to 30. The new observations were therefore grouped by the similarities and dissimilarities of the original IRIS data. From TABLE III, the correlation coefficient between the actual output and the estimated output of the neuro fuzzy system using only the SUBCLUST algorithm was found to be 0.9086, and the EWI of this estimation was 0.9316.

C. Estimation using the neuro fuzzy system with the data reduced by FA only

To implement this model, the data reduced by FA were used for the neuro fuzzy system. For this system, the number of variables of the original IRIS data set was reduced from the four measurements to two new principal components. However, the number of observations remained the same as in the original IRIS data set, which is 150. The correlation of the estimated output against the actual output was 0.9404, and the EWI was 0.6262.

D. Estimation using the neuro fuzzy system with the data reduced by the new proposed algorithm

Using the new proposed algorithm, the neuro fuzzy system was generated from the reduced IRIS data set with two new input variables and 30 new observations, instead of the four inputs and 150 observations of the original IRIS data set. The original IRIS data set was thus reduced to both fewer measurements and fewer observations. In the evaluation of the new proposed algorithm against the conventional algorithms, the correlation between the actual output values and the estimated output values is 0.9521 and the EWI is 0.5799.

TABLE III
STATISTICAL ANALYSIS OF THE PERFORMANCE AND ACCURACY OF THE NEURO FUZZY SYSTEMS FROM THE REDUCED IRIS DATA SETS AND THE ORIGINAL IRIS DATA SET AGAINST THE ACTUAL OUTPUT

           CORR     TRMS     STD      MAD      EWI
FC        0.9521   0.1832   0.172    0.1767   0.5799
FA only   0.9404   0.1821   0.209    0.1756   0.6262
Clust     0.9086   0.285    0.2804   0.2748   0.9316
Org       0.4832   0.5409   1.0425   0.5216   2.6219

Note: r is the influence range for SUBCLUST. The neuro fuzzy systems use 30 clusters from SUBCLUST (r = 0.08) for IRIS.
FC = using the data reduced by the new proposed algorithm; Clust = using the data reduced by SUBCLUST only; Org = using the original data; FA only = using the data reduced by factor analysis only.

TABLE III shows the statistical analysis of the performance and accuracy of the neuro fuzzy systems implemented with the original IRIS data set and the reduced IRIS data sets. The lowest EWI comes from the outputs estimated by the neuro fuzzy system with the data set reduced by the new proposed algorithm.

VII. CONCLUSION

The implementation of a neuro fuzzy approach with the multivariate-based fuzzy inference rule reduction technique to model the relationships between four different measurements - petal length (cm), petal width (cm), sepal length (cm), and sepal width (cm) - has been introduced. The new proposed algorithm yields satisfactory results and requires a fraction of the effort that goes into conventional multivariable techniques. We expect this proposed technique to become increasingly important as the cost of computers decreases and their power increases rapidly.

We have developed a software prototype of the neuro fuzzy approach, using the MATLAB Fuzzy Logic Toolbox [11], for both the new proposed technique and the conventional techniques.

The estimated output from the neuro fuzzy system using the data reduced by the SUBCLUST algorithm only gave 6 unpredictable outputs when defining the species. In this processing, the original IRIS data set was reduced only in its number of observations, without reducing the original variables. This estimation gave a correlation of 0.9086 and an EWI of 0.9316 against the actual output of the IRIS data.

The estimated output from the neuro fuzzy system using the data reduced by FA only was obtained by reducing the original IRIS data to two new measurements without reducing the number of observations. This estimation gave a correlation of 0.9404 and an EWI of 0.6262 against the actual output of the IRIS data.

Fig. 2 Comparison of the actual output values and the estimated output valuesusing the multivariate-based fuzzy inference rule reduction algorithm

The estimated output from the neuro fuzzy system using the data reduced by the new proposed algorithm gave one unexpected prediction when specifying the classes from the testing data. This output, however, gave the best coefficients in all statistical categories when the new proposed algorithm was used as a stand-alone estimating tool. This result suggests that, for a limited number of input-output training data, the proposed algorithm can offer the best performance in comparison with the other techniques for the IRIS data. Fig. 2 shows the graphical comparison between the actual output and the estimated output using the proposed algorithm.

The estimated output generated by the neuro fuzzy system with the multivariate-based fuzzy inference rule reduction algorithm improved the EWI by 0.3517 and 0.0463 over the estimated outputs of the neuro fuzzy systems using SUBCLUST only and FA only, respectively.

In conclusion, the estimated output of the neuro fuzzy system with the reduced two-factor data system shows a very satisfactory estimation compared to the estimated outputs of the neuro fuzzy systems with the original data set or with data reduced by the classical methods. We can predict the classification of the IRIS data better with the proposed algorithm, without losing any significant meaning.

APPENDIX

Subtractive Clustering [3]

Consider a collection of $n$ data points $X = \{x_1, x_2, \ldots, x_n\}$ in an $M$-dimensional space. Assume that the data points are normalized to equal coordinate ranges in each dimension. Since each data point is a candidate cluster centre, we can define a measure of data point $x_i$ as

$$M_i = \sum_{j=1}^{n} e^{-\alpha \|x_i - x_j\|^2} \qquad (2)$$

where

$$\alpha = \frac{4}{r_a^2}$$

and $r_a$ is a positive constant. Hence, $M_i$ is a function of the distances from $x_i$ to all other data points; the potential value of a data point with many neighbouring data points is relatively high. The neighbouring data points are effectively defined by the radius $r_a$. Compared to Mountain clustering, proposed by Yager and Filev [4], there are two differences. One is that a candidate point for a cluster centre in subtractive clustering is an actual data point, not a grid point as in Mountain clustering. The other is that the range of clustering for neighbouring data points is decided by the square of the distance between the centre point and those neighbours, while Mountain clustering uses the distance itself.

Using the values $M_i$ for all data points, we can determine the first cluster centre as the point with the highest value. Let $x_1^*$ be the location of the first cluster centre and $M_1^*$ its measure. Then the measurement of each data point $x_i$ is updated by the formula

$$M_i \leftarrow M_i - M_1^* \, e^{-\beta \|x_i - x_1^*\|^2} \qquad (3)$$

where

$$\beta = \frac{4}{r_b^2}$$

and $r_b$ is a positive constant. In other words, to obtain the new measurement $M_i$, we subtract from each data point's measure a value that is a function of its distance from the first cluster centre. As a result, the data points nearest the first centre will have strongly reduced values and will not be selected as the next candidate centre. The radius $r_b$ decides the measurable reduction for the neighbourhood data points. In general, $r_b$ is set to $1.5 r_a$ [3] to avoid obtaining closely spaced cluster centres.

After all $M_i$ are updated, we can decide the second cluster centre corresponding to the highest remaining potential measurement. The procedure is then repeated; once the $k$th cluster centre has been determined, the update is

$$M_i \leftarrow M_i - M_k^* \, e^{-\beta \|x_i - x_k^*\|^2} \qquad (4)$$

where $x_k^*$ is the position of the $k$th cluster centre and $M_k^*$ is the measurement of its potential value. As in Mountain clustering, the procedure of finding new cluster centres repeats until

$$M_k^* < \varepsilon M_1^* \qquad (5)$$


[Figure: "Comparison between the original output values and the output values by the proposed method", plotting the original outputs and the outputs from FC against observations 1 through 28.]


where $\varepsilon$ is a small fraction; if $\varepsilon$ is too small, many cluster centres may be generated. To decide a suitable value for $\varepsilon$, Chiu [3] proposed the following criteria:

If $M_k^* > \bar{\varepsilon} M_1^*$
    Accept $x_k^*$ as a cluster centre and continue.
Else if $M_k^* < \underline{\varepsilon} M_1^*$
    Reject $x_k^*$ and end the clustering process.
Else
    Let $d_{\min}$ be the shortest of the distances between $x_k^*$ and all previously found cluster centres.
    If $\dfrac{d_{\min}}{r_a} + \dfrac{M_k^*}{M_1^*} \geq 1$
        Accept $x_k^*$ as a cluster centre and continue.
    Else
        Reject $x_k^*$ and set the potential at $x_k^*$ to 0. Select the data point with the next highest potential as the new $x_k^*$ and re-test.
    End if
End if

Here $\bar{\varepsilon}$ specifies a threshold above which the data point's measure makes it definitely accepted as a cluster centre, and $\underline{\varepsilon}$ a threshold below which it is definitely rejected. Chiu suggested $\bar{\varepsilon} = 0.5$ and $\underline{\varepsilon} = 0.15$ [3].

REFERENCES

[1] Richard L. Gorsuch, Factor Analysis, 2nd Ed., Hillsdale, NJ: Lawrence Erlbaum Associates Inc., 1983.

[2] Paul E. Green, Analyzing Multivariate Data, Hinsdale, IL: The Dryden Press, 1978.

[3] S. L. Chiu, "Fuzzy Model Identification based on Cluster Estimation," Journal of Intelligent and Fuzzy Systems, vol. 2, pp. 267-278, 1994.

[4] Ronald Yager and Dimitar P. Filev, Essentials of Fuzzy Modeling and Control, John Wiley & Sons, 1994.

[5] J. S. Jang, "ANFIS: Adaptive Network Based Fuzzy Inference System," IEEE Trans. Systems, Man and Cybernetics, vol. 23, no. 3, pp. 665-684, May/June 1993.

[6] M. Sugeno and G. T. Kang, "Fuzzy Modeling and Control of Multilayer Incinerator," Fuzzy Sets and Systems, vol. 18, pp. 329-346, 1986.

[7] M. Sugeno and G. T. Kang, "Structure Identification of Fuzzy Model," Fuzzy Sets and Systems, vol. 28, no. 1, pp. 15-23, 1988.


[8] T. Takagi and M. Sugeno, "Derivation of Fuzzy Control Rules from Human Operator's Control Actions," in Proc. IFAC Symp. Fuzzy Inform., Knowledge Representation and Decision Analysis, July 1983, pp. 55-60.

[9] T. Takagi and M. Sugeno, "Fuzzy Identification of Systems and its Applications to Modeling and Control," IEEE Trans. on Systems, Man, and Cybernetics, vol. 15, pp. 116-132, 1985.

[10] J. S. Jang, C. T. Sun, and E. Mizutani, Neuro-fuzzy and Soft Computing, Upper Saddle River, NJ: Prentice Hall, 1997.

[11] Fuzzy Logic Toolbox, for use with MATLAB, The MathWorks Inc., March 2001.

[12] Norman Cliff, "The Eigenvalues-Greater-Than-One Rule and the Reliability of Components," Psychological Bulletin, vol. 103, no. 2, pp. 276-279, 1988.

[13] Deok Nam, Simulation of an Equivalent Reduced Order System from Large, Imprecise, and Uncertain Data Systems Using Multistage Multivariate Analysis and Neuro Fuzzy Approach, Ph.D. Dissertation, Dec. 2001.
