Timo Ahonen, Matti Pietikäinen

Image Description using Joint Distribution of Filter Bank Responses

Timo Ahonen *, Matti Pietikäinen

Machine Vision Group, University of Oulu, PL 4500, FI-90014 University of Oulu, FINLAND

Abstract

This paper presents a unified framework for image descriptors based on quantized joint distribution of filter bank responses and evaluates the significance of filter bank and vector quantizer selection. First, a filter bank based representation of the Local Binary Pattern (LBP) operator is introduced, which shows that LBP can also be presented as an operator producing vector quantized filter bank responses. Maximum Response 8 (MR8) and Gabor filters are widely used alternatives to the derivative filters which are used to implement LBP, and the performance of these three sets is compared in the texture categorization and face recognition tasks. Despite their small spatial support, the local derivative filters are shown to outperform Gabor and MR8 filters in texture categorization with the KTH-TIPS2 images. In the face recognition task with CMU PIE images, the Gabor filter based representation achieves the best recognition rate. Furthermore, it is shown that when the filter response vectors are quantized for histogram based joint density estimation, thresholding is clearly faster than using learned codebooks and, being robust to gray level changes, it yields a better recognition rate in most cases. Third, automatic selection of the filter bank is discussed, and excellent face recognition performance is achieved with the optimized filter bank.

Key words: texture, face image description, local binary pattern, LBP, MR8, Gabor filters

* Corresponding author
Email addresses: tahonen@ee.oulu.fi (Timo Ahonen), mkp@ee.oulu.fi (Matti Pietikäinen).
Preprint submitted to Elsevier 15 August 2008

1 Introduction

Quantitative description of local image appearance has a wide range of applications in image analysis and computer vision. Describing the appearance locally, e.g., using co-occurrences of gray values or with filter bank responses, and then forming a global description by computing statistics of them over the image area is a well established technique in texture analysis (Tuceryan and Jain, 1998). On the other hand, recent findings in applying texture methods to face image analysis, for example, indicate that texture might have applications in new fields of computer vision that have not been considered texture analysis problems. In this work we extend the findings of our preliminary work (Ahonen and Pietikäinen, 2008) to more general image analysis.

Because of the importance of texture analysis, a wide variety of different texture descriptors have been presented in the literature. However, there is no formal definition of the phenomenon of texture itself that researchers would agree upon. This is possibly one of the reasons that so far no unified theory or framework of texture descriptors has been presented.

The Local Binary Pattern (LBP) (Ojala et al., 2002), Maximum Response 8 (Varma and Zisserman, 2005) and Gabor filter based texture descriptors are among the most studied and best known recent texture analysis techniques. Despite the large number of publications discussing and applying these methods, the connections and differences between them are not well understood. This paper presents a new unified framework for these texture descriptors, which allows for a systematic comparison of these widely used descriptors and the parts that they are built of.

LBP is an operator for image description that is based on the signs of differences of neighboring pixels. It is fast to compute and invariant to monotonic gray-scale changes of the image. Despite being simple, it is very descriptive, which is attested by the wide variety of different tasks it has been successfully applied to.
The LBP histogram has proven to be a widely applicable image feature for, e.g., texture classification, face analysis, video background subtraction, etc. (The Local Binary Pattern Bibliography, 2008).

Another frequently used approach in texture description is using distributions of quantized filter responses to characterize the texture (Leung and Malik, 2001), (Varma and Zisserman, 2005). In the field of texture analysis, filtering and pixel value based texture operators have been seen as somewhat contradictory. However, in this paper we show that the local binary pattern operator can be seen as a filter operator based on local derivative filters at different orientations and a special vector quantization function. Apart from clarifying the connections between LBP and filter based methods, this also helps in analyzing the properties of the LBP operator.

The estimated distribution of local image appearance is widely used in image or image patch description, and different implementations of this idea have resulted in excellent performance in a wide range of applications, e.g., (Schiele and Crowley, 2000), (Ojala et al., 2002), (Varma and Zisserman, 2005), (Lowe, 2004), (Ahonen et al., 2006). There are still a number of open questions regarding how to describe the local appearance, how to estimate the distribution, how to use the estimated distribution in the selected application, and whether the optimal methods are application specific or more generic. This paper contributes to these questions by setting a framework and providing systematic experimental results in two different applications, namely texture categorization and illumination invariant face recognition. Commonly used filter sets with different characteristics and of varying spatial support are tested in local appearance description. Then two different methods for quantizing the filter responses are compared. Finally, a method for selecting a subset of filters from a large filter bank is proposed.

2 Image descriptors

This paper discusses image descriptors that are based on estimating the distribution of local characteristics of the image. In the texture analysis literature, a variety of such local characteristics have been studied. The well known co-occurrence matrix introduced by Haralick (1979) is based on gray values of pixel pairs defined by a displacement vector. Another local characteristic computed directly from pixel gray values is the LBP label, which is computed from gray level differences of neighboring pixels. On the other hand, at the core of many texture descriptors is a filter bank or wavelet coefficient based description of local image appearance.
In the following we take a closer look at the three image descriptors that are studied in this paper.

2.1 The local binary pattern operator

The local binary pattern operator (Ojala et al., 2002) is a powerful means of texture description. The original version of the operator labels the pixels of an image by thresholding the 3x3-neighborhood of each pixel with the center value and summing the thresholded values weighted by powers of two. Then the histogram of the labels can be used as a texture descriptor. See Fig. 1 for an illustration of the basic LBP operator.

[Figure 1: an example 3x3 neighborhood with pixel values 5 9 1 / 4 4 6 / 7 2 3, the result of thresholding it with the center value (1 1 0 / 1 . 1 / 1 0 0), and the weights 1, 2, 4, 8, 16, 32, 64, 128. LBP code: 1+2+8+64+128 = 203.]

Fig. 1. The basic LBP operator.

[Figure 2: sampling points on circles around the center pixel.]

Fig. 2. Three circular neighborhoods: (8,1), (16,2), (6,1). The pixel values are bilinearly interpolated whenever the sampling point is not in the center of a pixel.

The operator can also be extended to use neighborhoods of different sizes (Ojala et al., 2002). Using circular neighborhoods and bilinearly interpolating the pixel values allows any radius and number of pixels in the neighborhood. For neighborhoods we will use the notation $(P, R)$, which means $P$ sampling points on a circle of radius $R$. See Fig. 2 for an example of different circular neighborhoods.

Let us denote the center pixel value by $g_c$ and the gray values of the $P$ sampling points by $g_1, g_2, \ldots, g_P$. Now the generic $\mathrm{LBP}_{P,R}$ operator is defined as

$$\mathrm{LBP}_{P,R} = \sum_{n=1}^{P} s(g_n - g_c)\, 2^{n-1}, \tag{1}$$

where

$$s\{z\} = \begin{cases} 1, & z \geq 0 \\ 0, & z < 0. \end{cases} \tag{2}$$

Further extensions to the original operator are uniform and rotationally invariant binary patterns (Ojala et al., 2002). A local binary pattern is called uniform if the binary pattern contains at most two bitwise transitions from 0 to 1 or vice versa when the bit pattern is considered circular. For example, the patterns 00000000 (0 transitions), 01110000 (2 transitions) and 11001111 (2 transitions) are uniform, whereas the patterns 11001001 (4 transitions) and 01010011 (6 transitions) are not. In the computation of the LBP histogram, uniform patterns are used so that the histogram has a separate bin for every uniform pattern and all non-uniform patterns are assigned to a single bin.

In the context of LBPs, rotational invariance is achieved by circularly rotating each bit pattern to the minimum value. For instance, the bit sequences 1000011, 1110000 and 0011100 arise from different rotations of the same local pattern and they all correspond to the normalized sequence 0000111.
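The basic operator, the uniformity test and the rotation normalization described above can be illustrated with a minimal sketch. The function names are hypothetical, and the clockwise neighbor ordering starting from the top-left pixel is an assumption, chosen because it reproduces the code 203 of the Fig. 1 example.

```python
# Neighbor offsets (row, col) in clockwise order starting from the top-left
# pixel. This ordering is an assumption of the sketch: with it, the weights
# 1, 2, 4, ..., 128 reproduce the code of the Fig. 1 example.
CLOCKWISE_3X3 = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                 (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code_3x3(patch):
    """LBP label of the center pixel of a 3x3 patch, following Eqs. (1)-(2)."""
    center = patch[1][1]
    code = 0
    for n, (dr, dc) in enumerate(CLOCKWISE_3X3):
        # s(z) = 1 if z >= 0, else 0; the n-th neighbor has weight 2^n.
        if patch[1 + dr][1 + dc] - center >= 0:
            code += 1 << n
    return code


def transitions(bits):
    """Number of 0/1 changes when the bit string is read circularly."""
    return sum(bits[i] != bits[(i + 1) % len(bits)] for i in range(len(bits)))


def is_uniform(bits):
    """A pattern is uniform if it has at most two circular transitions."""
    return transitions(bits) <= 2


def rotation_invariant(bits):
    """Circularly rotate the bit string to its minimum value."""
    return min(bits[i:] + bits[:i] for i in range(len(bits)))


example = [[5, 9, 1],
           [4, 4, 6],
           [7, 2, 3]]
print(lbp_code_3x3(example))           # code of the Fig. 1 example
print(transitions("11001001"))         # a non-uniform pattern
print(rotation_invariant("1110000"))   # normalized rotation
```

Representing the patterns as bit strings keeps the circular rotation explicit; an implementation working on whole images would more likely use integer codes and a lookup table.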

Fig. 3. The MR8 filter bank. The filter bank consists of an edge and a bar filter, both at 3 scales and 6 orientations, and a Gaussian and a Laplacian of Gaussian filter.

2.2 Maximum response 8 filters

The second descriptor considered here is the Maximum Response 8 descriptor (Varma and Zisserman, 2005). At the core of that descriptor is a filter set consisting of 38 filters: two isotropic filters, a Gaussian and a Laplacian of Gaussian, both at scale $\sigma = 10$ pixels, and an edge and a bar filter, both at 3 scales $(\sigma_x, \sigma_y) = \{(1,3), (2,6), (4,12)\}$ and 6 orientations. The filter kernels are shown in Figure 3.

After the image has been convolved with the filter bank, the maximum of the 6 responses at different orientations is computed. This results in a total of 8 responses, 2 from the isotropic filters and 6 from the edge and bar filters at different scales. Finally, the response vector is labeled with the nearest codebook vector (texton) and the histogram of these labels is used to represent the texture. In the learning stage, the codebook is obtained by clustering a set of training samples with the k-means algorithm.

2.3 Gabor filters

Another type of filter kernel that is widely used in image description is the Gabor filter. The complex Gabor functions can be defined as

$$g(x, y) = \frac{1}{2\pi\sigma_x\sigma_y}\, e^{-\left(x^2/2\sigma_x^2 + y^2/2\sigma_y^2\right)}\, e^{2\pi j(ux + vy)}, \tag{3}$$

in which $\sigma_x$ and $\sigma_y$ define the scale of the Gabor function and $(u, v)$ defines the frequency of the complex sinusoid. Thus, the Gabor function is a product of an elliptical Gaussian and a complex plane wave.

The typical way Gabor filters are applied in texture description is to convolve the input image with a bank of Gabor filters at different scales and frequencies and compute a set of features from the output images. In texture description, the best known method applying Gabor filters is the one proposed by Manjunath and Ma (1996), in which a vector of means and standard deviations of Gabor filter responses is used for texture description. Another classic work, applying Gabor filters in face recognition, is the Elastic Bunch Graph Matching method (Wiskott et al., 1997), which is based on Gabor filter bank responses at certain facial landmarks.

In a recent work by Zou et al. (2007), Gabor filters and Local Binary Patterns were compared in the face recognition task and a face descriptor based on Gabor filter responses computed at points spaced one wavelength apart on the whole face area was developed. In that work, the Gabor filter based descriptor was shown to produce better recognition rates than LBPs, especially on difficult image sets containing lighting variation and aging of the subjects.

3 Framework for filter bank and vector quantization based texture descriptors

A widely used approach to texture analysis is to convolve an image with $N$ different filters whose responses at a certain position $(x, y)$ form an $N$-dimensional vector. At the learning stage, a set of such vectors is collected from training images and the set is clustered using, e.g., k-means to form a codebook. Then each pixel of a texture image is labeled with the label of the nearest cluster center, and the histogram of these labels over a texture image is used to describe the texture (Leung and Malik, 2001), (Varma and Zisserman, 2005).

More formally, let $I(x, y)$ be the image to be described by the texture operator. Now the vector valued image obtained by convolving the original image with filter kernels $F_1, F_2, \ldots, F_N$ is

$$I_f(x, y) = \begin{bmatrix} I_1(x, y) = I(x, y) \star F_1 \\ I_2(x, y) = I(x, y) \star F_2 \\ \vdots \\ I_N(x, y) = I(x, y) \star F_N \end{bmatrix}. \tag{4}$$

The labeled image $I_{lab}(x, y)$ is obtained with a vector quantizer $f : \mathbb{R}^N \mapsto \{0, 1, 2, \ldots, M-1\}$, where $M$ is the number of different labels produced by the quantizer. Thus, the labeled image is

$$I_{lab}(x, y) = f(I_f(x, y)) \tag{5}$$

and the histogram of labels is

$$H_i = \sum_{x,y} \delta\{i, I_{lab}(x, y)\}, \quad i = 0, \ldots, M-1, \tag{6}$$

in which $\delta$ is the Kronecker delta

$$\delta\{i, j\} = \begin{cases} 1, & i = j \\ 0, & i \neq j. \end{cases} \tag{7}$$

If the task is classification or categorization as in this work, several possibilities exist for classifier selection. The most typical strategy is to use a nearest neighbor classifier using, e.g., the $\chi^2$ distance to measure the distance between histograms (Leung and Malik, 2001), (Varma and Zisserman, 2005). In (Varma and Zisserman, 2004), the nearest neighbor classifier was compared to Bayesian classification but no significant difference in performance was found. In (Caputo et al., 2005) it was shown that the performance of a material categorization system can be enhanced by using a suitably trained support vector machine based classifier. In this work, the main interest is not in the classifier design but in the local descriptors, and thus the nearest neighbor classifier with $\chi^2$ distance was selected for the experimental part.

The following two subsections discuss in more detail the two parts that define an image descriptor in the proposed framework. These parts are the filter bank $F_1, F_2, \ldots, F_N$ and the quantization function $f$.

3.1 Filter bank

In this paper we compare three different types of filter kernels that are commonly used in texture description. The first filter bank is a set of oriented derivative filters whose thresholded output is shown to be equivalent to the local binary pattern operator. The other two filter banks included in the comparison are Gabor filters and the Maximum Response 8 filter set.

A novel way to look at the LBP operator, proposed in this paper, is to see it as a special filter-based texture operator. The filters for implementing LBP are approximations of image derivatives computed at different orientations.
The filter coefficients are computed so that they are equal to the weights of bilinear interpolation of pixel values at the sampling points of the LBP operator, and the coefficient at the filter center is obtained by subtracting 1 from the center value. For example, the kernels shown in Fig. 4 can be used for a filter based implementation of the local binary pattern operator in the circular (8,1) neighborhood.
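The construction described above can be sketched as follows for a circular $(P, R)$ neighborhood. The function names, the starting angle and the direction in which the sampling points are enumerated are assumptions of this sketch rather than details given in the text. For the diagonal sampling points of the (8,1) neighborhood it yields the coefficients 0.207, 0.207, 0.5 and -0.914.

```python
import math


def local_derivative_filters(P=8, R=1.0):
    """Build P zero-sum kernels for the circular (P, R) neighborhood."""
    half = math.ceil(R)
    size = 2 * half + 1
    filters = []
    for n in range(P):
        angle = 2.0 * math.pi * n / P
        # Sampling point in kernel coordinates (row, col). Starting at the
        # right-hand neighbor and going counter-clockwise is an assumption.
        r = round(half - R * math.sin(angle), 12)
        c = round(half + R * math.cos(angle), 12)
        r0, c0 = math.floor(r), math.floor(c)
        fr, fc = r - r0, c - c0
        kernel = [[0.0] * size for _ in range(size)]
        # Bilinear interpolation weights of the four surrounding pixels.
        for rr, wr in ((r0, 1.0 - fr), (r0 + 1, fr)):
            for cc, wc in ((c0, 1.0 - fc), (c0 + 1, fc)):
                if wr * wc > 0.0:
                    kernel[rr][cc] += wr * wc
        # Subtract 1 at the center so the response is g_n - g_c.
        kernel[half][half] -= 1.0
        filters.append(kernel)
    return filters


def response(kernel, patch):
    """Filter response at the patch center (sum of elementwise products)."""
    return sum(k * p for krow, prow in zip(kernel, patch)
               for k, p in zip(krow, prow))


filters = local_derivative_filters()
for row in filters[1]:                      # one of the diagonal filters
    print(["%.3f" % v for v in row])
```

Because the bilinear weights of each sampling point sum to one, every kernel sums to zero after the center coefficient is reduced by 1. Note that `response` computes a correlation; a true convolution would apply the mirrored kernel.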

[Figure 4: three 3x3 kernels. Two of them contain a single coefficient 1 at the sampling point and -1 at the center, with zeros elsewhere. The third contains the center coefficient -0.914 and the interpolation weights 0.207, 0.207 and 0.5, with zeros elsewhere.]

Fig. 4. Filters $F_1 \ldots F_3$ of the total of 8 local derivative filters at the (8,1) neighborhood. The remaining 5 filters are obtained by mirroring the filters shown here.

The response of such a filter at location $(x, y)$ gives the signed difference between the sampling point corresponding to the filter and the center pixel. These filters, which will be called local derivative filters in the following, can be constructed for any radius and any number of sampling points.

Applying the Maximum Response 8 descriptor in this framework is straightforward. In the filter bank design we follow the procedure of Varma and Zisserman (2005). This filter bank produces a 38-dimensional vector valued image. Selecting the maximum over orientations is handled in the vector quantizer and is described in more detail in the following section.

For Gabor filters, a lot of work has been devoted to designing the filter bank and feature computation methods; see, e.g., (Manjunath and Ma, 1996), (Clausi and Jernigan, 2000), (Grigorescu et al., 2002). In this work we apply the Gabor filters in the proposed image description framework, which is to say that the responses of the filter bank at a certain position are stacked into a vector which is used as an input for the vector quantizer. This resembles the Gabor filter based face description suggested by Zhang et al. (2007), in which the Gabor filter responses are quantized and a histogram of them is then formed to encode a facial image.

In the design of the filter bank, i.e. the selection of the scale and frequency parameters, the procedure of Manjunath and Ma (1996) is applied.
Furthermore, at each chosen scale and frequency, the real and imaginary parts of the complex Gabor filter are treated as separate filters, so the total number of filters is $2 n_s n_f$, in which $n_s$ is the number of scales and $n_f$ is the number of orientations at each scale.

3.2 Vector quantization

The assumption on which the proposed texture description framework is based is that the joint distribution of filter responses can be used to describe the image appearance. Depending on the size of the filter bank, the dimension of the vectors in the image $I_f(x, y)$ can be high, and quantization of the vectors is needed for reliable estimation of the histogram.

A simple, non-adaptive way to quantize the filter responses is to threshold them and to compute the sum of thresholded values multiplied by powers of two:

$$I_{lab}(x, y) = \sum_{n=1}^{N} s_t\{I_n(x, y)\}\, 2^{n-1}, \tag{8}$$

where $s_t\{z\}$ is the thresholding function

$$s_t\{z\} = \begin{cases} 1, & z \geq t \\ 0, & z < t, \end{cases} \tag{9}$$

in which the parameter $t$ is the threshold.

Thresholding divides each dimension of the filter bank output into two bins. The total number of different labels produced by threshold quantization is $2^N$, where $N$ is the number of filters.

If the threshold $t = 0$ and the coefficients of each of the filters in the bank have zero mean (i.e. they sum up to zero), the value of $s_t\{I_n(x, y)\}$ and thus the value of $I_{lab}(x, y)$ is not affected by affine gray level changes $I'(x, y) = \alpha I(x, y) + \beta$, $\alpha > 0$. If the filter coefficients $F_n(x, y)$ sum up to zero, the constant $\beta$ is cancelled, so $I'(x, y) \star F_n = \alpha (I(x, y) \star F_n)$ and, assuming $\alpha > 0$, the sign is not changed. Such global gray level changes could also be easily normalized, but the invariance holds locally as well, in areas where changes in pixel values, e.g., due to lighting changes, can be modeled as an affine gray level change within the filter support area.

Now, let us consider the case in which the filter bank used to obtain the image $I_f(x, y)$ is the set of local derivative filters (e.g. the filters presented in Fig. 4), designed so that the filter responses are equal to the signed differences, i.e. at each pixel location

$$I_n = g_n - g_c,$$

where $I_n$ is the response of the $n$-th filter at a given location, and $g_c$ and $g_n$ are the center pixel and $n$-th sampling point gray values (see Eq. (1)) at the same location. When the quantizer (8) with threshold $t = 0$ is applied to $I_f$, it follows that the resulting label $I_{lab}$ is equal to that resulting from the local binary pattern operator $\mathrm{LBP}_{P,R}$ using the same neighborhood. Therefore, the LBP operator can be represented in the proposed framework.

It should be noted that for LBPs, even stronger invariance to gray level changes holds than presented above. As discussed by Ojala et al. (2002), the LBP labels are invariant to any monotonic mapping of gray values.
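The threshold quantizer of Eq. (8) and its invariance to affine gray level changes can be checked on a small example. This is a sketch under stated assumptions: the four kernels below are hypothetical zero-sum difference filters chosen for illustration, not the filter bank used in the experiments, and the values of `alpha` and `beta` are arbitrary.

```python
def filter_responses(patch, kernels):
    """Responses of each 3x3 kernel at the patch center."""
    return [sum(k[i][j] * patch[i][j] for i in range(3) for j in range(3))
            for k in kernels]


def threshold_label(responses, t=0):
    """Eq. (8): thresholded responses weighted by powers of two."""
    return sum(1 << n for n, r in enumerate(responses) if r >= t)


# Four illustrative zero-sum kernels (hypothetical, not the paper's bank):
# each takes the difference between one neighbor and the center pixel.
KERNELS = [
    [[0, 0, 0], [0, -1, 1], [0, 0, 0]],   # right neighbor minus center
    [[0, 1, 0], [0, -1, 0], [0, 0, 0]],   # top neighbor minus center
    [[0, 0, 0], [1, -1, 0], [0, 0, 0]],   # left neighbor minus center
    [[0, 0, 0], [0, -1, 0], [0, 1, 0]],   # bottom neighbor minus center
]

patch = [[5, 9, 1], [4, 4, 6], [7, 2, 3]]
alpha, beta = 3, 10                        # affine change with alpha > 0
changed = [[alpha * v + beta for v in row] for row in patch]

label = threshold_label(filter_responses(patch, KERNELS))
label_changed = threshold_label(filter_responses(changed, KERNELS))

# The two labels agree, and N filters give at most 2^N distinct labels.
print(label, label_changed, 2 ** len(KERNELS))
```

With a non-zero threshold, or with kernels whose coefficients do not sum to zero, the two labels would in general differ.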

Due to these reasons, the choice of threshold $t = 0$ has been common, especially with the LBP operator. Still, under some circumstances a different choice may yield better results. For instance, when LBP features are used for background subtraction, choosing a non-zero threshold was observed to result in more stable codes in nearly flat gray areas and, consequently, increased performance (Heikkilä and Pietikäinen, 2006).

Another method for quantizing the filter responses is to construct a codebook of them at the learning stage and then use the nearest codeword to represent the filter bank output at each location:

$$I_{lab}(x, y) = \arg\min_m \left\| I_f(x, y) - c_m \right\|, \tag{10}$$

in which $c_m$ is the $m$-th vector (codeword) in the codebook. This approach is used in (Leung and Malik, 2001) and (Varma and Zisserman, 2005), which use k-means to construct the codebook whose elements are called textons. Codebook based quantization of signed differences of neighboring pixels (which correspond to local derivative filter outputs) was presented in (Ojala et al., 2001).

When comparing these two methods for quantizing the filter responses, one might expect that if the number of labels produced by the quantizers is kept roughly the same, the codebook based quantizer handles the possible statistical dependencies between the filter responses better. On the other hand, since codebook based quantization requires a search for the closest codeword at each pixel location, it is clearly slower than simple thresholding, even though a number of both exact and approximate techniques have been proposed for finding the nearest codeword without an exhaustive search through the codebook (Gray and Neuhoff, 1998, p. 2362), (Chávez et al., 2001).

3.3 Rotational invariance

It is important to note that a clever co-design of the filter bank and the vector quantizer can also make the texture descriptor rotationally invariant. Again, two different strategies have been proposed.
Rotationally invariant LBP codes are obtained by circularly shifting an LBP binary code to its minimum value (Ojala et al., 2002). In the joint framework this can be represented as further combining the labels of threshold quantization (8) so that all the different labels that can arise from rotations of the local gray pattern are joined to form a single label.

On the other hand, the approach chosen for the MR8 descriptor to achieve rotational invariance is to select only the maximum of the 6 different rotations of each of the bar and edge filters. Only these maximum values and the responses of the two isotropic filters are used in further quantization, so the 8-dimensional response of the filter is invariant to rotations of the gray pattern.

4 Experiments

The proposed framework and the relative descriptiveness of the different filter banks and vector quantization methods were systematically tested in two different application areas: material categorization and illumination invariant face recognition. Both of these are very challenging unsolved problems, so they clearly highlight the performance differences of the operators.

For material categorization, the challenging KTH-TIPS2 database (Mallikarjuna et al., 2006) was utilized.

The widely used CMU PIE (Pose, Illumination, and Expression) database by Sim et al. (2003) was selected to serve as test material in the face recognition experiments. It is especially interesting how the different filter banks and vector quantizers respond to changes in lighting conditions. The images in the PIE database contain systematic lighting variation, so it is very well suited for these experiments.

The same filter banks were utilized in both experiments. The proposed framework allowed testing the performance of different filters and different quantization methods independently. The filter banks that were included in the tests were local derivative filters, two different banks of Gabor filters and MR8 filters. The local derivative filter bank was chosen to match the $\mathrm{LBP}_{8,1}$ operator, which resulted in 8 filters (see Fig. 4). Two very different types of Gabor filter banks were tested, one with only 1 scale and 4 orientations and small spatial support (7 × 7) and another one with 4 scales and 6 orientations and larger spatial support. The properties of the tested filter kernels are listed in Table 1.

Table 1
Properties of the tested filter kernels

Filter bank                 Size      Number of filters
Local derivative filters    3 × 3     8
Gabor(1,4)                  7 × 7     8
Gabor(4,6)                  49 × 49   48
MR8                         49 × 49   38

Fig. 5. Examples of images from the KTH-TIPS2 database. Figures in each column belong to the same texture category.

4.1 Material categorization

The KTH-TIPS2 database (Mallikarjuna et al., 2006) was utilized to test the performance of the descriptors in the material categorization task. The database contains 4 samples of 11 different materials, each sample imaged at 9 different scales and 12 lighting and pose setups, totaling 4572 images. Examples of texture images from the KTH-TIPS2 database are shown in Figure 5.

Caputo et al. performed material categorization tests using KTH-TIPS2 and considered especially the significance of classifier selection (Caputo et al., 2005). In that paper, the main conclusions were that state-of-the-art descriptors such as LBP and MR8 have relatively small differences in performance, but significant gains in classification rate can be obtained by using a support vector machine classifier instead of nearest neighbor. Moreover, the classification rates can be enhanced by increasing the number of samples used for training.

In this work, the main interest is to examine the relative descriptiveness of different setups of the filter bank based texture descriptors. To facilitate this task, we chose a very challenging test setup that resembles the most difficult setup used in (Caputo et al., 2005). Using each of the descriptors to be tested, a nearest neighbor classifier using Chi square distance was trained with one sample (i.e. 9*12 images) per material category. The remaining 3*9*12 images were used for testing. This was repeated with 10000 random combinations as training and testing data, and the mean and standard deviations over the permutations were used to assess the performance.
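The classification step used here, a label histogram as in Eq. (6) followed by a nearest neighbor search under the Chi square distance, can be sketched as below. The text does not spell out the distance formula, so the common form $\sum_i (a_i - b_i)^2 / (a_i + b_i)$ is assumed (some definitions include a factor of 1/2, which does not change the nearest neighbor). The function names and toy data are hypothetical.

```python
def label_histogram(labels, num_labels):
    """Eq. (6): count how often each label occurs in the labeled image."""
    hist = [0] * num_labels
    for label in labels:
        hist[label] += 1
    return hist


def chi_square(h1, h2):
    """Chi square distance between two histograms (assumed form).

    Bins that are empty in both histograms are skipped to avoid 0/0.
    """
    return sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)


def nearest_neighbor(query, training):
    """Return the category of the training histogram closest to the query.

    `training` is a list of (histogram, category) pairs.
    """
    return min(training, key=lambda item: chi_square(query, item[0]))[1]


# Toy data: flattened label images with M = 4 possible labels.
training = [
    (label_histogram([0, 0, 0, 1, 1, 2], 4), "material A"),
    (label_histogram([3, 3, 3, 2, 2, 1], 4), "material B"),
]
query = label_histogram([0, 0, 1, 1, 1, 2], 4)
print(nearest_neighbor(query, training))
```

In the experiments, each training sample contributes one histogram per image, so `training` would hold 9*12 histograms per material category.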

Fig. 6. Example images of 2 out of the 68 subjects in the CMU PIE database.

4.2 Illumination invariant face recognition

To test the performance of the descriptors in illumination invariant face recognition, the CMU PIE database was used. In total, the database contains 41368 images of 68 subjects taken at different angles, under different lighting conditions and with varying expression. For our experiments, we selected a set of 23 images of each of the 68 subjects. 2 of these are taken with the room lights on and the remaining 21 each with a flash at varying positions. Preprocessed example images from the database are shown in Figure 6.

In obtaining a descriptor for the facial image, the procedure of Ahonen et al. (2006) was followed. The faces were first normalized so that the eyes are at fixed positions. The selected filter bank and vector quantization method were then applied and the resulting label image was cropped to size 128×128 pixels. Thus, for further analysis, the size of the vector valued image was the same irrespective of the filter kernel size.

The labeled image was further divided into blocks of size 16×16 pixels, and histograms were computed in each block individually and then concatenated to form the spatially enhanced histogram describing the face.

A nearest neighbor classifier with Chi square distance was utilized for classification. One image per person was used for training and the remaining 22 images for testing. Again, 10000 random selections into training and testing data were used.

4.3 Codebook Based Vector Quantization

All 4 filter banks were tested using two types of vector quantization: thresholding and codebook based quantization. For codebook based quantization, the selected approach was to aim for compact, universal texton codebooks, i.e. codebooks of rather small size that are not tailored for this specific set of textures or faces. Therefore, images from the CUReT texture database (Dana et al., 1999) were used to learn the codebooks for the texture categorization test, and images from the Yale B face database (Georghiades et al., 2001) for the face recognition test. The codebook sizes that were tested were 16 ... 256 codewords.

[Figure 7: two line plots, (a) and (b). Horizontal axis: codebook size (16, 32, 64, 128, 256). Vertical axis: rate. Curves: Local derivatives, Gabor(4,6), Gabor(1,4), MR8.]

Fig. 7. (a) The KTH-TIPS2 categorization rates and (b) CMU PIE recognition rates for k-means based vector quantization of filter bank responses as a function of codebook size.

The texture categorization rates and face recognition rates as a function of the codebook size obtained with each filter bank and codebook based quantization are plotted in Figure 7 (a) and (b). Figure 7 (a) shows that in most cases using a larger codebook enhances the texture categorization rate, but the selection of the filter bank is clearly a more dominant factor than the codebook size. For example, local derivative filters achieve a higher categorization rate with the smallest codebook size than the MR8 filters with any codebook size.

The same applies to the face recognition results, Figure 7 (b). Here it is seen that Gabor(4,6) filters with a large support area achieve better recognition rates than Gabor(1,4) and local derivatives, and each of these performs better than the MR8 descriptor. A probable cause for this is the rotational invariance built into the MR8 descriptor, which might actually lose some useful information.

4.4 Thresholding Based Vector Quantization

In the next experiment the material categorization and face recognition tests were performed using the same filter banks but thresholding based vector quantization. The Gabor(4,6) filter bank was omitted from this experiment due to the large number of filters in the filter bank (the resulting histograms would have been of length $2^{48}$).

For the local derivative and Gabor filters, which have zero mean, the thresholding function (9) was applied directly with different choices of threshold $t$. For the edge and bar filters in the MR8 filter set, only the maximum of responses over different orientations is measured, and therefore in that case the mean of the 8-dimensional response vectors over all the training images was computed and subtracted from the response before applying thresholding.

[Figure 8: two line plots, (a) and (b). Horizontal axis: threshold value (-0.1 to 0.1). Vertical axis: rate. Curves: Local derivatives, Gabor(1,4), MR8.]

Fig. 8. (a) The KTH-TIPS2 categorization rates and (b) CMU PIE recognition rates for thresholding quantization of filter bank responses as a function of threshold value.

Table 2
Recognition rates for different filter banks and quantization methods

                   Texture, CB   Texture, thresh.   Face, CB   Face, thresh.
Local der. filt.   0.521         0.528              0.482      0.742
Gabor(1,4)         0.458         0.366              0.502      0.883
Gabor(4,6)         0.447         -                  0.704      -
MR8 filters        0.455         0.492              0.204      0.348

Figure 8 shows the texture categorization and face recognition rates as a function of the threshold value. Interestingly, the shapes of the curves are somewhat different in the two tasks. Non-zero threshold values provide the best results in texture categorization (Figure 8 (a)) with Gabor(1,4) and local derivative filters, but in general the performance differences caused by changing the threshold are small. In face recognition, on the other hand, the situation is reversed. With all the filter sets, the best results are obtained with threshold $t = 0$. This effect is likely to be due to lighting effects since, as discussed in Section 3.2, using threshold $t = 0$ with filters having zero mean yields a descriptor that is invariant to affine changes of gray values. Both in the texture and in the face experiments there are lighting changes, but in the texture experiments those are compensated in part by the larger number of training images under different lighting conditions, whereas in the face recognition experiment there is only one training image per subject.

Table 2 and Figures 9 and 10 show the texture categorization and face recognition rates using thresholding based quantization with $t = 0$ and codebook based quantization with codebook size 256.

[Figure 9: bar chart. Groups: Local derivatives, Gabor(1,4), MR8, Gabor(4,6). Bars: Codebook (256), Thresholding. Vertical axis: rate.]

Fig. 9. The KTH-TIPS2 categorization rates

In texture categorization, codebook based quantization yields a slightly worse categorization rate than thresholding when using local derivative filters, but the difference is smaller than the standard deviation in the rates. With the Gabor(1,4) filter bank thresholding performs worse than codebook based quantization, but interestingly, with MR8 filters, thresholding yields a better rate. The local derivative filters give the best categorization rate over the tested filter sets with both quantization functions.

The results obtained in these experiments differ slightly from those presented in Ahonen and Pietikäinen (2008) and those presented earlier by Ojala et al. (2001), which showed that codebook based quantization of signed gray-level differences yields slightly better recognition than LBPs, however at the cost of higher computational complexity. We believe that this is due to different training data for the k-means algorithm. Our earlier experiments suggested that codebook based quantization might perform slightly better than thresholding, but in those experiments the material that was used to learn the codebook had some overlap with the testing material. In the present experiments the training and testing data sets were completely separate, so it might be that the codebook learned from training images is not well enough suited for describing the test images. Moreover, the images used for testing by Ojala et al. (2001) had less variation in lighting conditions than the KTH-TIPS2 images, thus the robustness of thresholding to gray level variations discussed in Section 3.2 also explains these results.
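For reference, the codebook based quantizer of Eq. (10) that is compared with thresholding above can be sketched as a k-means codebook followed by a nearest codeword search. This is a minimal illustration with hypothetical function names; the initialization from the first `k` samples and the fixed number of iterations are assumptions of the sketch, not the settings used in the experiments.

```python
def nearest_codeword(vector, codebook):
    """Eq. (10): index m of the codeword closest to the response vector."""
    def squared_distance(codeword):
        return sum((v - c) ** 2 for v, c in zip(vector, codeword))
    return min(range(len(codebook)),
               key=lambda m: squared_distance(codebook[m]))


def kmeans(samples, k, iterations=10):
    """Plain k-means; the first k samples serve as initial centers."""
    centers = [list(s) for s in samples[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for s in samples:
            groups[nearest_codeword(s, centers)].append(s)
        for m, group in enumerate(groups):
            if group:  # keep the old center if no sample was assigned
                centers[m] = [sum(col) / len(group) for col in zip(*group)]
    return centers


# Toy 2-dimensional "filter response vectors" forming two clusters.
samples = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
codebook = kmeans(samples, k=2)
print(nearest_codeword((9, 9), codebook), nearest_codeword((1, 1), codebook))
```

The exhaustive search in `nearest_codeword` costs one distance computation per codeword at every pixel, which is the reason this quantizer is slower than thresholding.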

[Figure 10: bar chart, vertical axis from 0 to 1; filter banks: Local derivatives, Gabor(1,4), MR8, Gabor(4,6); series: Codebook (256), Thresholding]

Fig. 10. The CMU PIE recognition rates

In illumination invariant face recognition, the Gabor filters show better performance than the local derivative filters. With all the filter banks, thresholding based quantization yields better recognition rates than codebook based quantization. Here the performance gain from changing codebook based quantization to thresholding is apparent. Again, this somewhat surprising result is probably explained to some degree by the invariance of thresholding based quantization to gray level changes.

Considering the computational cost of the presented methods, thresholding based quantization is much faster than codebook based quantization. As for the filter bank operations, the computational cost grows with the size and number of filters, but using FFT based convolution can make the operations faster. Still, at the two extremes, local derivative filtering and thresholding based labeling of an image of size 256 × 256 takes 0.04 seconds, whereas codebook based labeling of the same image using Gabor(4,6) filters (with convolutions performed using FFT) takes 10.98 seconds. Both running times were measured using unoptimized Matlab implementations of the methods on a PC with an AMD Athlon 2200 MHz processor.

4.5 Filter subset selection

The third experiment tested whether it is possible to select a representative subset of filters from a large filter bank for thresholding based quantization. The number of labels produced by the quantizer is 2^N, in which N is the number of filters, which means that the length of the label histograms grows exponentially with respect to the number of filters; for example, 8 filters give 2^8 = 256 labels. Thus a small filter bank is desirable for the thresholding quantization.

In this experiment, the Sequential Floating Forward Selection (SFFS) (Pudil et al., 1994) algorithm was used to select a maximum of 8 filters from a larger filter bank. The optimization criterion was the recognition rate over the training set (KTH-TIPS1). Two different initial filter banks were tested. First, 8 filters were selected from the 48 filters in the Gabor(4,6) filter bank. However, the resulting 8-filter bank did not perform well on the testing database, yielding a categorization rate of only 0.295.

The same experiment was done for the face recognition problem, using Yale B images as training data for SFFS. Again, 8 filters were selected from the full Gabor(4,6) filter bank using the recognition rate in the Yale B set as the optimization criterion. Then the recognition rate in the test set, the CMU PIE dataset, was recorded. Here the filter subset selection performed superbly, achieving a recognition rate of 1.000.

In the face recognition literature, there are findings that LBP and Gabor filter based information are complementary. In (Zhang et al., 2005), LBP histograms were extracted from Gabor filtered images and in (Yan et al., 2007), score level fusion of LBP and Gabor filter based similarity scores was done. Motivated by these findings, SFFS was used to select 8 filters from the union of the local derivative and Gabor(1,4) filter banks. This resulted in a set of 6 local derivative and 2 Gabor filters, and the resulting filter bank reached a categorization rate of 0.544, which is significantly higher than the rate of the Gabor(1,4) filter bank (0.366 with thresholding) and slightly higher than the rate of the local derivative filter bank (0.528). Unfortunately we were not able to achieve performance gains in face recognition by combining the two types of filters.

5 Discussion and conclusion

In this paper we have presented a novel unified framework under which the histogram based image description methods such as the well-known local binary pattern and MR8 descriptors can be explained and analyzed.
Even though this is still far from a complete unified theory of statistical image description, the framework makes the differences and similarities between the methods apparent.

Moreover, the presented framework allows for systematic comparison of different descriptors and the parts that they are built of. Such an analytic approach can be useful in analyzing texture descriptors, as they are usually presented in the literature as a sequence of steps whose relation to other description methods is unclear. The framework presented in this work allows for explicitly illustrating the connection between the parts of the LBP and MR8 descriptors and experimenting with the performance of each part.

The filter sets and vector quantization techniques for LBP, MR8 and Gabor filter based texture descriptors were compared in this paper. In this comparison it was found that the local derivative filter responses are both fastest to compute and most descriptive in the texture categorization task. This somewhat surprising result further attests the previous findings that texture descriptors relying on small-scale pixel relations yield comparable or even superior results to those based on filters of larger spatial support (Ojala et al., 2002), (Varma and Zisserman, 2003).

On the other hand, in face recognition, the Gabor filters showed better performance than the local derivatives or the MR8 descriptor. It seems that larger spatial support is beneficial in face description: with codebook quantization, Gabor(4,6), having the largest spatial support of 49 × 49 pixels, performed better than Gabor(1,4) (7 × 7 pixels), and the local derivatives with the smallest support (3 × 3 pixels) gave the worst performance of the three. Furthermore, there is evidence that Gabor filters suppress the effects of lighting variation in facial images and for this reason they have even been used as preprocessing for the LBP descriptor (Zhang et al., 2005).

When comparing the different vector quantization methods, the experimental results show that thresholding is a faster and in most cases also a more descriptive method for quantization than codebooks. The likely explanation for this is the robustness of thresholding with t = 0 to illumination changes. This is especially apparent in the face description experiments, where the performance drops when the threshold t = 0 is replaced with a different value or with codebook quantization.

Finally, the experiments on filter subset selection from the union of the local derivative and Gabor(1,4) filter banks showed that these filter sets may be complementary and may yield better performance than either of the sets alone in texture description. In face recognition, a significant performance gain was achieved by selecting a subset of Gabor(4,6) filters and applying it to threshold based description.

Not only does the presented framework contribute to the understanding and comparison of existing texture descriptors, but it can also be utilized for more systematic development of new, even better performing methods. The framework is simple to implement and, together with the publicly available KTH-TIPS2 and CMU PIE image databases, it can easily be used for comparing novel descriptors with the current state-of-the-art methods. We believe that further advances in both the filter bank and vector quantizer design are possible, especially as new invariance properties of the descriptors are aimed for.
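As an illustration of how little code the framework needs, the following sketch chains the three parts discussed in this paper: a filter bank, a vector quantizer and a label histogram. It is only a sketch under stated assumptions and not the authors' Matlab implementation. The function names, the kernel ordering, the skipping of border pixels and the synthetic test image are hypothetical choices.

```python
def local_derivative_bank():
    """Eight 3x3 kernels, each with +1 at one neighbour and -1 at the centre (zero mean)."""
    bank = []
    # Neighbour ordering is an assumption; any fixed ordering gives a valid labelling.
    for i, j in [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]:
        kernel = [[0, 0, 0], [0, -1, 0], [0, 0, 0]]
        kernel[i][j] = 1
        bank.append(kernel)
    return bank


def responses_at(image, y, x, bank):
    """Correlate every 3x3 kernel in the bank with the patch centred at (y, x)."""
    return [
        sum(k[i][j] * image[y - 1 + i][x - 1 + j] for i in range(3) for j in range(3))
        for k in bank
    ]


def sign_quantizer(response, t=0.0):
    """Thresholding quantizer: one bit per filter, packed into an integer label."""
    return sum(1 << n for n, r in enumerate(response) if r >= t)


def describe(image, bank, quantizer, n_labels):
    """Histogram of quantized filter bank responses over the interior pixels."""
    hist = [0] * n_labels
    for y in range(1, len(image) - 1):          # border pixels are skipped
        for x in range(1, len(image[0]) - 1):
            hist[quantizer(responses_at(image, y, x, bank))] += 1
    return hist


# Hypothetical 5 x 6 image (list of rows) used only to exercise the code.
image = [[(3 * x + 5 * y) % 17 for x in range(6)] for y in range(5)]
bank = local_derivative_bank()
hist = describe(image, bank, sign_quantizer, 2 ** len(bank))
print(len(hist), sum(hist))
```

Because `describe` takes the bank and the quantizer as arguments, a different filter set or a codebook quantizer can be swapped in without touching the histogram step, which is the kind of part-by-part comparison the framework is meant to support. Larger kernels would need a correspondingly wider border margin than the fixed 3 x 3 support assumed here.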

References

Ahonen, T., Hadid, A., Pietikäinen, M., 2006. Face description with local binary patterns: Application to face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 28 (12), 2037-2041.
Ahonen, T., Pietikäinen, M., 2008. A framework for analyzing texture descriptors. In: 3rd International Conference on Computer Vision Theory and Applications (VISAPP 2008). pp. 1:507-512.
Caputo, B., Hayman, E., Mallikarjuna, P., 2005. Class-specific material categorisation. In: ICCV. IEEE Computer Society, pp. 1597-1604.
Chávez, E., Navarro, G., Baeza-Yates, R. A., Marroquín, J. L., 2001. Searching in metric spaces. ACM Computing Surveys 33 (3), 273-321.
Clausi, D. A., Jernigan, M. E., 2000. Designing Gabor filters for optimal texture separability. Pattern Recognition 33 (11), 1835-1849.
Dana, K. J., van Ginneken, B., Nayar, S. K., Koenderink, J. J., 1999. Reflectance and texture of real-world surfaces. ACM Transactions on Graphics 18 (1), 1-34.
Georghiades, A., Belhumeur, P., Kriegman, D., 2001. From few to many: Illumination cone models for face recognition under variable lighting and pose. IEEE Transactions on Pattern Analysis and Machine Intelligence 23 (6), 643-660.
Gray, R. M., Neuhoff, D. L., 1998. Quantization. IEEE Transactions on Information Theory 44 (6), 2325-2383.
Grigorescu, S. E., Petkov, N., Kruizinga, P., 2002. Comparison of texture features based on Gabor filters. IEEE Transactions on Image Processing 11 (10), 1160-1167.
Haralick, R. M., 1979. Statistical and structural approaches to texture. Proceedings of IEEE 67 (5), 786-804.
Heikkilä, M., Pietikäinen, M., 2006. A texture-based method for modeling the background and detecting moving objects. IEEE Trans. Pattern Anal. Mach. Intell. 28 (4), 657-662.
Leung, T., Malik, J., 2001. Representing and recognizing the visual appearance of materials using three-dimensional textons. International Journal of Computer Vision 43 (1), 29-44.
Lowe, D. G., 2004. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision 60 (2), 91-110.
Mallikarjuna, P., Fritz, M., Targhi, A. T., Hayman, E., Caputo, B., Eklundh, J.-O., 2006. The KTH-TIPS and KTH-TIPS2 databases. http://www.nada.kth.se/cvap/databases/kth-tips/.
Manjunath, B. S., Ma, W.-Y., 1996. Texture features for browsing and retrieval of image data. IEEE Trans. Pattern Anal. Mach. Intell. 18 (8), 837-842.
Ojala, T., Pietikäinen, M., Mäenpää, T., Jul 2002. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence 24 (7), 971-987.
Ojala, T., Valkealahti, K., Oja, E., Pietikäinen, M., 2001. Texture discrimination with multidimensional distributions of signed gray-level differences. Pattern Recognition 34 (3), 727-739.
Pudil, P., Novovičová, J., Kittler, J., 1994. Floating search methods in feature selection. Pattern Recognition Letters 15 (10), 1119-1125.
Schiele, B., Crowley, J. L., 2000. Recognition without correspondence using multidimensional receptive field histograms. International Journal of Computer Vision 36 (1), 31-50.
Sim, T., Baker, S., Bsat, M., 2003. The CMU pose, illumination, and expression database. IEEE Transactions on Pattern Analysis and Machine Intelligence 25 (12), 1615-1618.
The Local Binary Pattern Bibliography, 2008. http://www.ee.oulu.fi/research/imag/texture/lbp/bibliography/.
Tuceryan, M., Jain, A. K., 1998. Texture analysis. In: Chen, C. H., Pau, L. F., Wang, P. S. P. (Eds.), The Handbook of Pattern Recognition and Computer Vision (2nd Edition). World Scientific Publishing Co., pp. 207-248.
Varma, M., Zisserman, A., Jun. 2003. Texture classification: Are filter banks necessary? In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2003). Vol. 2. pp. 691-698.
Varma, M., Zisserman, A., 2004. Unifying statistical texture classification frameworks. Image and Vision Computing 22 (14), 1175-1183.
Varma, M., Zisserman, A., April 2005. A statistical approach to texture classification from single images. International Journal of Computer Vision 62 (1-2), 61-81.
Wiskott, L., Fellous, J.-M., Krüger, N., von der Malsburg, C., 1997. Face recognition by elastic bunch graph matching. IEEE Transactions on Pattern Analysis and Machine Intelligence 19, 775-779.
Yan, S., Wang, H., Tang, X., Huang, T. S., 2007. Exploring feature descriptors for face recognition. In: International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2007), Honolulu, Hawaii, USA. IEEE Signal Processing Society, pp. I:629-632.
Zhang, B., Shan, S., Chen, X., Gao, W., 2007. Histogram of Gabor phase patterns (HGPP): A novel object representation approach for face recognition. IEEE Transactions on Image Processing 16 (1), 57-68.
Zhang, W., Shan, S., Gao, W., Chen, X., Zhang, H., 2005. Local Gabor binary pattern histogram sequence (LGBPHS): A novel non-statistical model for face representation and recognition. In: Proc. Tenth IEEE International Conference on Computer Vision (ICCV 05). Vol. 1. pp. 786-791.
Zou, J., Ji, Q., Nagy, G., 2007. A comparative study of local matching approach for face recognition. IEEE Transactions on Image Processing 16 (10), 2617-2628.
