

Relief's Application in Handwriting Recognition

HaoMiao Wu, ZhongHang Yin, FuCun Sun
Department of Computer Science and Technology, Tsinghua University, Beijing 100084

Abstract - Relief is a feature subset selection algorithm that selects an optimal feature set by maximizing the hypothesis-margin. In particular, it can rapidly deal with high-dimensional feature selection. This paper studies how multi-class and unbalanced data affect the algorithm. We suggest an extension, ReliefF*, and apply it to Chinese handwriting recognition. The new algorithm not only saves computing time but also yields a substantial improvement in classification accuracy. The experimental results indicate that ReliefF* is especially effective for small-sample classification tasks.

I. INTRODUCTION

Feature subset selection is the process of choosing an optimal subset, out of tens or hundreds of features, to represent a concept or an object. This procedure not only reduces the computational burden of feature extraction and classification, but in some cases it also smooths noise and yields better classification accuracy.

In this paper, we discuss a hypothesis-margin based algorithm, Relief [1], and analyze how multi-class and unbalanced data affect it. When we face several categories, different features, or attributes, show different discriminative ability for each category, and a great diversity in the number of examples per label further increases the difficulty of the selection process. Handwriting recognition is exactly such a task. We apply Relief and its extensions to handwriting recognition and aim to give a good solution.

The paper is organized as follows: Section II briefly presents Relief and its major extension, ReliefF. In Section III, we analyze the effect of multi-class and unbalanced data on the algorithm and propose a new extension, ReliefF*. Empirical evidence on the performance of the algorithms is provided in Section IV, followed by a concluding discussion in Section V.

II. RELIEF AND RELIEFF

Margin can be thought of as a measure of the confidence of a classifier in its predictions. There are two natural ways of defining the margin of an instance (an example) with respect to a classification rule.

The sample-margin, which is used in the Support Vector Machine [2], measures the distance between the instance and the decision boundary. The other is the hypothesis-margin [3], which measures how far an instance can travel before it hits the decision boundary. The hypothesis-margin can be calculated as

$\frac{1}{2}\left(\|x - M(x)\| - \|x - H(x)\|\right)$,

where H(x) and M(x) denote the nearest point to x with the same and with a different label, respectively.

Compared with the sample-margin, the hypothesis-margin is very easy to compute. Moreover, its value is a good measure of the attributes' discriminative ability. If we sum the hypothesis-margins of the training instances, we can approximately estimate which attributes are helpful for classification. That is what Relief does.
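As a minimal, purely illustrative sketch (not code from the paper), the hypothesis-margin of one instance can be computed as follows; the dataset X, labels y, and the use of Euclidean distance are our assumptions.

```python
import numpy as np

def hypothesis_margin(X, y, i):
    """0.5 * (||x - M(x)|| - ||x - H(x)||): H(x) is the nearest same-label
    instance (nearest hit), M(x) the nearest different-label instance (nearest miss)."""
    x = X[i]
    dists = np.linalg.norm(X - x, axis=1)   # Euclidean distance to every instance
    dists[i] = np.inf                       # exclude the instance itself
    same = (y == y[i])
    nearest_hit = dists[same].min()
    nearest_miss = dists[~same].min()
    return 0.5 * (nearest_miss - nearest_hit)

# toy usage: two well-separated classes give positive margins
X = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0]])
y = np.array([0, 0, 1, 1])
print([round(hypothesis_margin(X, y, i), 3) for i in range(len(y))])
```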

The Relief algorithm randomly chooses m instances from all available training instances and updates the weight of each attribute p by

$W_p \leftarrow W_p - \mathrm{diff}(p, x_i, H(x_i))/m + \mathrm{diff}(p, x_i, M(x_i))/m$.

For nominal attributes the function diff is defined as

$\mathrm{diff}(p, x, x') = \begin{cases} 0, & x_p = x'_p \\ 1, & x_p \neq x'_p \end{cases}$, with $x = (x_1, \ldots, x_N)$,

and for numerical attributes as

$\mathrm{diff}(p, x, x') = \dfrac{|x_p - x'_p|}{\max(p) - \min(p)}$.

We say an attribute is easily dividable when, along that attribute, instances with the same label are close together and instances with different labels are far apart; such an attribute receives a high weight. An irrelevant attribute behaves like a random number with respect to the label, so after processing enough instances its weight will be zero or a small value.
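A rough sketch of this basic two-class Relief loop, assuming numerical attributes and Euclidean distance for the neighbor search (the variable names are ours, not the paper's):

```python
import numpy as np

def relief(X, y, m, rng=np.random.default_rng(0)):
    """Two-class Relief: sample m instances; for each one, move every attribute
    weight down by diff to the nearest hit and up by diff to the nearest miss."""
    n, N = X.shape
    span = X.max(axis=0) - X.min(axis=0)     # max(p) - min(p) for the numerical diff()
    span[span == 0] = 1.0
    w = np.zeros(N)
    for i in rng.choice(n, size=m, replace=False):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf
        hit = np.argmin(np.where(y == y[i], d, np.inf))    # nearest same-label instance
        miss = np.argmin(np.where(y != y[i], d, np.inf))   # nearest other-label instance
        w -= np.abs(X[i] - X[hit]) / span / m              # diff(p, x_i, H(x_i)) / m
        w += np.abs(X[i] - X[miss]) / span / m             # diff(p, x_i, M(x_i)) / m
    return w
```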

Kononenko [4][5] provided a more robust extension, ReliefF, which is not limited to two-class problems and can deal with incomplete and noisy data. He changed the weight formula as follows:

$W_p \leftarrow W_p - \sum_{j=1}^{k} \frac{\mathrm{diff}(p, x_i, H_j(x_i))}{m\,k} + \sum_{C \neq \mathrm{class}(x_i)} \frac{P(C)}{1 - P(\mathrm{class}(x_i))} \sum_{j=1}^{k} \frac{\mathrm{diff}(p, x_i, M_j(C))}{m\,k}$   (1)
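For concreteness, a sketch of one ReliefF step in the spirit of Equation (1); the helper nearest_k, the class priors estimated from label frequencies, and all identifiers are our assumptions rather than code from the paper.

```python
import numpy as np

def relieff_step(w, X, y, i, k, m):
    """One ReliefF update for sampled instance i: average over the k nearest hits,
    and over the k nearest misses of every other class C, the latter weighted by
    P(C) / (1 - P(class(x_i)))."""
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    d = np.linalg.norm(X - X[i], axis=1)
    d[i] = np.inf

    def nearest_k(mask):
        idx = np.where(mask)[0]
        return idx[np.argsort(d[idx])[:k]]

    prior = {c: np.mean(y == c) for c in np.unique(y)}
    for j in nearest_k(y == y[i]):                       # hit term
        w -= np.abs(X[i] - X[j]) / span / (m * k)
    for c in prior:                                      # miss term, one sum per class
        if c == y[i]:
            continue
        weight_c = prior[c] / (1.0 - prior[y[i]])
        for j in nearest_k(y == c):
            w += weight_c * np.abs(X[i] - X[j]) / span / (m * k)
    return w
```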


Noise is a small disturbance that moves the data slightly away from their original place. If many attributes are affected, finding the right H(x) and M(x) becomes difficult, and the algorithm's result becomes unreliable. Choosing the k nearest instances and averaging over them is a reasonable way to smooth the effect of noise.

The Relief algorithm can only deal with two-class problems. To handle multiple classes, one can split the multi-class problem into many small one-against-others tasks. In the ReliefF algorithm, the multi-class problem is instead treated as one-against-all-others: ReliefF calculates the distance from x to x's nearest neighbors in every category. This improvement makes prominent the attributes that help separate pairs of categories.

III. ANALYSIS

A. Multi-class

Multi-class problems are common in real tasks. In handwriting recognition, for example, each person's handwriting is naturally regarded as one category. As in Relief and ReliefF, the one-against-others method and the one-against-all-others method are the most popular ways to deal with it.

Unfortunately, the handwriting sample dataset contains an enormous number of handwriting instances from irrelevant people. We only wish to distinguish them from the persons who need to be identified, and we do not care who wrote them. How should they be treated: should they all share one label, or should every writer get a label?

We believe that when the features are expressive enough, each person's handwriting instances congregate in one cluster. If someone's handwriting is similar to another's, the two clusters overlap and produce a confusable region in the feature space. The algorithm's aim is to obtain an optimal subspace in which same-label instances are near and different-label instances are far apart.

In fact, in most cases we have no way of recovering the identity behind an irrelevant instance. If we assign all irrelevant instances the same label, it is very likely that an instance of person A is taken as a same-category near neighbor of person B, which obviously distorts the weight calculation. For the same reason, averaging over k irrelevant instances makes things even worse.

Therefore, we do not compute the hypothesis-margin of irrelevant instances and do not use k irrelevant instances to smooth noisy data.

B. Unbalanced data

The worth of each instance differs, and so does the cost of obtaining it. We may spend much time, money, or energy to obtain one special instance, while collecting some other instances is almost free. As a result, the training data become unbalanced. In handwriting recognition, the handwriting instances belonging to the persons who need to be identified are few compared with the thousands of irrelevant instances. Even among the relevant people the imbalance persists: some persons have only 3 to 4 instances among all available instances, while others may own one or two hundred handwriting samples. Unbalanced data harms Relief and ReliefF in three ways.

1) Unbalanced data affects instance selection. The random sampling strategy in Relief makes selecting instances from small-sample categories improbable, or even impossible.

2) Unbalanced data affects the attribute weights. If many instances share the same label, the attributes that help classify them have more opportunities to increase their weights, whereas the attributes related to small-sample categories are destined to receive lower weights.

3) Unbalanced data affects how the instances are used. On one hand, with only a few instances per class we need not be overly concerned about noise; on the other hand, when instances are plentiful we should not ignore its harmful effect. Using the same k for every class in the algorithm is therefore unhelpful for smoothing the noise.

To solve these problems, we suggest the following (a small sketch of the parameter assignment is given after this list):

Make use of all instances in the small-sample categories, and sample the remaining categories' instances at a certain rate. This gives each class a fair opportunity to influence the feature evaluation.

Replace the single k in Equation (1) with a group of variables, assigned according to the number of instances in each category.

Seek the r nearest neighbor categories to compute the weights.
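The paper does not give concrete formulas for these per-class parameters; the following is only one plausible, hypothetical way to derive a sampling rate s_C and a neighbor count k_C from the class sizes.

```python
def assign_class_params(counts, irrelevant_label, small_class=10, base_rate=0.3):
    """Hypothetical per-class parameters: small classes are used in full (s_C = 1)
    with as many neighbors as they can supply; large classes are subsampled;
    the irrelevant class gets k_C = 1 and s_C = 0, so it is never queued and
    contributes only its single nearest neighbor."""
    k_c, s_c = {}, {}
    for c, n_c in counts.items():
        if c == irrelevant_label:
            k_c[c], s_c[c] = 1, 0.0
        elif n_c <= small_class:
            k_c[c], s_c[c] = max(1, n_c - 1), 1.0
        else:
            k_c[c], s_c[c] = 3, base_rate
    return k_c, s_c

# e.g. counts = {1: 5, 2: 120, "other": 1000}; only class 2 and "other" get subsampled
```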


ReliefF* algorithm
* Set all weights: $W_i = 0.0$.
* Count the instances of every class C and assign it a $k_C$ and an $s_C$. The $k_C$ value of the irrelevant class is 1 and its $s_C$ value is 0. Sample the examples according to the sampling percentage $s_C$ of each class C, and arrange them into a queue.
* For each instance $x_p$ in the queue do:
    * Find the $k_c$ nearest hits $H_j$ of $x_p$, where $c = \mathrm{class}(x_p)$.
    * For each class $C \neq \mathrm{class}(x_p)$, find the $k_C$ nearest misses $M_j(C)$ of $x_p$, and choose the r classes R with the smallest $\sum_{j=1}^{k_C} D(x_p, M_j(C))$, where $D(x, y)$ is the distance from x to y.
    * For i = 1 to N:
      $W_i \leftarrow W_i - \sum_{j=1}^{k_c} \mathrm{diff}(i, x_p, H_j)/(m\,k_c) + \sum_{C \in R} \sum_{j=1}^{k_C} \mathrm{diff}(i, x_p, M_j(C))/(m\,r)$
* Sort the weights and select the several attributes with the maximum weights.


For the irrelevant class, only the single nearest neighbor is used.
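Putting the listing above together, a rough Python sketch of the ReliefF* weight update might look as follows, assuming numerical attributes and Euclidean distance; the per-class parameters k_c and s_c and the choice of the r nearest classes follow the listing, while every identifier is illustrative.

```python
import numpy as np

def relieff_star(X, y, k_c, s_c, r, rng=np.random.default_rng(0)):
    """ReliefF* sketch: per-class sampling rates s_c build the queue, per-class
    neighbor counts k_c bound the hit/miss search, and only the r classes whose
    misses lie closest to x_p contribute to the weight update."""
    n, N = X.shape
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    queue = [i for c in np.unique(y) if s_c[c] > 0
             for i in rng.choice(np.where(y == c)[0],
                                 size=max(1, int(s_c[c] * np.sum(y == c))),
                                 replace=False)]
    m = len(queue)
    w = np.zeros(N)
    for p in queue:
        d = np.linalg.norm(X - X[p], axis=1)
        d[p] = np.inf

        def nearest(c, k):                               # k nearest instances of class c
            idx = np.where(y == c)[0]
            return idx[np.argsort(d[idx])[:k]]

        cp = y[p]
        hits = nearest(cp, k_c[cp])
        others = [c for c in np.unique(y) if c != cp]
        others.sort(key=lambda c: d[nearest(c, k_c[c])].sum())
        R = others[:r]                                   # the r nearest-neighbor classes
        w -= np.abs(X[p] - X[hits]).sum(axis=0) / span / (m * len(hits))
        for c in R:
            misses = nearest(c, k_c[c])
            w += np.abs(X[p] - X[misses]).sum(axis=0) / span / (m * r)
    return w
```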

IV. EXPERIMENTAL RESULTS

We empirically evaluated the algorithms in two different experimental settings.

A. Artificial dataset

This dataset consists of four relevant categories and one irrelevant category. Every relevant category is a spherical region in which points are randomly distributed; the nearer to the center of the sphere, the denser the points. We sample the instances from these spherical regions. Every instance has nine attributes. Three of them are the sphere coordinates, which are the only link between the attributes and the labels; the other attributes are filled with random numbers. An irrelevant instance consists entirely of random numbers.
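A hypothetical generator in the spirit of this artificial dataset (the class sizes, radii, value ranges, and which three columns hold the sphere coordinates are all our assumptions):

```python
import numpy as np

def make_artificial_dataset(class_sizes=(100, 50, 40, 10), n_irrelevant=400,
                            n_attrs=9, coord_cols=(2, 3, 5),
                            rng=np.random.default_rng(0)):
    """Four spherical relevant classes whose only informative attributes are the
    three coordinate columns, plus an irrelevant class of pure noise (label 0)."""
    X_parts, y = [], []
    centers = rng.uniform(-5, 5, size=(len(class_sizes), 3))
    for label, (n_c, center) in enumerate(zip(class_sizes, centers), start=1):
        pts = rng.uniform(-5, 5, size=(n_c, n_attrs))          # filler: random numbers
        directions = rng.normal(size=(n_c, 3))
        directions /= np.linalg.norm(directions, axis=1, keepdims=True)
        radii = rng.uniform(0, 1, size=(n_c, 1)) ** 2           # denser near the center
        pts[:, list(coord_cols)] = center + directions * radii  # informative coordinates
        X_parts.append(pts)
        y.extend([label] * n_c)
    X_parts.append(rng.uniform(-5, 5, size=(n_irrelevant, n_attrs)))  # irrelevant class
    y.extend([0] * n_irrelevant)
    return np.vstack(X_parts), np.array(y)
```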

The right columns of Table I present the relationship between attributes and labels. Attributes 4, 6, 3 and 2 are related to at least two categories, and we expect the weights of these attributes to be high. Attributes 1, 5 and 7 relate to only one category; if the sum of their random components is zero, these attributes may be only weakly relevant to the categories.

TABLE I
Description of the Datasets (for each class label: the number of instances and a mark for each of the nine attributes relevant to that class)

The last step of Relief is ordering the attributes' weights. High weights mean the corresponding attributes are good for classification; if a weight is small or even negative, same-label instances are far apart along that attribute. Not only the order of the weights but also their magnitude is meaningful. An ideal weight queue sorts the attributes into strongly relevant, weakly relevant and irrelevant ones, with enough separation between the groups.

Table II presents the results of the three algorithms in one experiment. From the two Relief rows we see that the Relief algorithm can find the attributes that are relevant to many instances, such as attributes 3, 4 and 6, but it cannot handle the small-sample attribute 2, whose weight is even negative.

The ReliefF algorithm gives better results in the two ReliefF rows of Table II. The weight of attribute 2 is positive, and the attributes 1, 5 and 7 are ahead of attributes 8 and 9. What remains unsatisfactory is that attribute 2 is confused with the weakly relevant attributes 1, 5 and 7.

Compared with the other two algorithms, ReliefF* shows an obvious improvement in the weights and in the attributes' order. The attributes 4, 6 and 3, which are relevant to many instances, are at the top. Next is attribute 2, which relates to a small-sample category. The weakly relevant attributes 5, 1 and 7 follow, and the irrelevant attributes 8 and 9 are at the end. The weights of attributes 4, 6 and 3 are similar, and the difference between the weights of attributes 2 and 5 is clear. Another advantage of ReliefF* is that it saves computing time: in Table II, the algorithm calculates the hypothesis-margins of only 162 instances, yet its result is better than the others.

TABLE II

The Result of the Algorithms (each row lists attribute No. (weight), sorted from lowest to highest weight)

Algorithm        Sample  Attribute No. (weight)
Relief           162     9 (-0.9927), 7 (-0.9432), 1 (-0.4310), 8 (-0.2500), 2 (-0.2205), 5 (0.3421), 3 (1.2588), 6 (3.9442), 4 (4.7656)
Relief           500     1 (-1.0980), 7 (-0.8731), 9 (-0.7252), 2 (-0.5918), 8 (-0.4018), 5 (0.4944), 3 (1.2149), 6 (3.3786), 4 (5.3420)
ReliefF          500     9 (-0.2508), 1 (0.0993), 2 (0.1010), 7 (0.1523), 8 (0.5253), 5 (1.4684), 3 (2.3795), 6 (4.3159), 4 (6.0347)
ReliefF (k=3)    500     9 (-0.0410), 8 (-0.0076), 7 (0.3410), 1 (0.3497), 2 (0.4577), 5 (1.2113), 3 (2.3401), 6 (4.2444), 4 (5.2059)
ReliefF*         162     8 (-1.0079), 9 (-0.9330), 7 (0.3410), 1 (0.3285), 5 (0.4452), 2 (1.1564), 3 (1.9399), 6 (2.6400), 4 (3.2919)

B. Handwriting dataset

To evaluate the performance of the algorithms on a real dataset, we selected a group of 45-dimensional Chinese handwriting samples as the test dataset. The k-Nearest Neighbor classifier is used in the experiment. The kNN classifier is consistent with Relief in some respects: when the Relief algorithm selects the subspace in which the hypothesis-margin is large, same-label instances are near and instances with different labels are far apart, so the recognition ability of kNN in that subspace will be good.
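As an illustration of this evaluation protocol (the train/test split, the pure-NumPy 1NN, and the step of five dimensions are assumptions; the paper does not spell out its exact setup):

```python
import numpy as np

def one_nn_accuracy(X_train, y_train, X_test, y_test, feature_idx):
    """1NN accuracy using only the selected attributes."""
    A, B = X_train[:, feature_idx], X_test[:, feature_idx]
    dists = ((B[:, None, :] - A[None, :, :]) ** 2).sum(axis=2)
    nn = dists.argmin(axis=1)                 # nearest training instance per test instance
    return float(np.mean(y_train[nn] == y_test))

def accuracy_curve(w, X_train, y_train, X_test, y_test, dims=range(45, 19, -5)):
    """Accuracy when keeping only the d highest-weighted attributes, d = 45, 40, ..., 20."""
    order = np.argsort(w)[::-1]               # attributes sorted by decreasing weight
    return {d: one_nn_accuracy(X_train, y_train, X_test, y_test, order[:d]) for d in dims}
```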

Fig. 1. Results of 1NN classification using the attribute queues produced by ReliefF and ReliefF* (x-axis: Dimension, from 45 down to 20).

We applied the 1NN classifier to the handwriting dataset and reduced the number of attributes used in classification according to the rankings produced by the ReliefF and ReliefF* algorithms.


As Figure 1 shows, the curve of ReliefF* is smoother than that of ReliefF, and its drop occurs closer to the right side, where fewer attributes are used. ReliefF* is thus more effective at revealing the potential relationship between the attributes and the categories.

V. CONCLUSIONS

By introducing Relief into the multi-class, unbalanced-data setting, we seek a good feature subset selection algorithm for handwriting recognition. We show that the multi-class problem and the unbalanced-data problem affect the selection process and favor attributes that are relevant to large-sample categories. We suggested several countermeasures, such as assigning the neighborhood parameters according to the number of examples per class and using only relevant examples. The experimental results on the artificial dataset and the handwriting dataset indicate that the ReliefF* algorithm is more effective than Relief or ReliefF, especially when the dataset contains many small-sample categories.

REFERENCES

[1] K. Kira and L. Rendell, "A practical approach to feature selection," Proc. 9th International Workshop on Machine Learning, 1992, pp. 249-256.

[2] V.N. Vapnik, "Statistical Learning Theory," John Wiley & Sons, 1998.

[3] K. Crammer, R. Gilad-Bachrach, A. Navot, et al., "Margin analysis of the LVQ algorithm," Proceedings of the 17th Conference on Neural Information Processing Systems, 2002.

[4] I. Kononenko, "Estimating attributes: Analysis and extensions of RELIEF," European Conference on Machine Learning, 1994, pp. 171-182.

[5] M. Robnik-Šikonja and I. Kononenko, "Theoretical and empirical analysis of ReliefF and RReliefF," Machine Learning, 2003, vol. 53(1), pp. 23-69.

[6] R. Kohavi and G. John, "Wrappers for feature subset selection," Artificial Intelligence, 1997, vol. 97, pp. 273-324.

[7] L.C. Molina, L. Belanche and A. Nebot, "Feature selection algorithms: A survey and experimental evaluation," Proceedings of the International Conference on Data Mining, 2002.

[8] R. Gilad-Bachrach, A. Navot and N. Tishby, "Margin based feature selection - theory and algorithms," Proceedings of the Twenty-first International Conference on Machine Learning, 2004.
