
INVARIANT TEXTURE MATCHING FOR CONTENT-BASED IMAGE RETRIEVAL

Seow Yong Lai, Wee Kheng Leow
Department of Information Systems and Computer Science,
National University of Singapore, Lower Kent Ridge Road, Singapore 119260

(Appears in Proceedings of International Conference on Multimedia Modelling, 1997.)

Texture provides a very useful cue for retrieving images from a database. An image of a natural scene typically contains several distinct textured regions. The patterns in different parts of a region are perceived by humans to be the same texture even though they are often non-uniform; that is, their scales and orientations vary throughout the region due to perspective distortion. Existing texture-based image retrieval methods handle only images containing a single, uniform texture and are inappropriate for retrieving natural images. This paper presents a texture matching method for content-based image retrieval that is invariant to both scale and orientation. It uses the outputs of a set of Gabor filters to describe the spatial frequencies and orientations of a texture pattern. It can thus take into account possible differences in scale and orientation when it matches the query texture with those in the database.

1 Introduction

Texture provides a very useful cue for retrieving images from a database. Many objects, such as brick walls and tiled roofs, can be recognized by their distinctive texture patterns. Images containing these objects can be identified using texture as the search key.

To retrieve images based on texture, it is necessary to devise a matching method that returns a strong match between two texture patterns that are perceived to be similar by humans. Texture matching turns out to be difficult for images of natural scenes. Take, for example, Fig. 1. The image contains several textured regions, such as plants, brick walls and window panes. The brick patterns further away from the viewer have finer scales than those nearby, and the patterns on the left have different orientations from those on the right. Nevertheless, these brick patterns are perceptually similar even though they vary in scale and orientation. At the same time, they are perceptually distinct from the plant and window textures. Therefore, texture matching must be scale- and orientation-invariant so that perceptually similar patterns will have a strong match.

Existing texture matching methods handle only images containing plain texture taken from the Brodatz album or aerial photographs. Each image contains a single texture pattern that is uniform in scale and orientation. These methods are thus inappropriate for matching natural images containing multiple texture patterns that can vary in scale and orientation.


Figure 1: An image of a natural scene containing multiple textured regions such as plants, brick walls and window panes.

This paper presents a novel scale- and orientation-invariant texture matching method for content-based image retrieval. The method is also used in the image indexing process to identify perceptually uniform textured regions and their approximate locations, which can be useful for image retrieval. For example, users can specify in their queries the type of texture they want as well as the location of the textured region. The addition of location information can help to increase retrieval accuracy.

2 Related Works

Three types of features are commonly used in existing texture-based image retrieval systems, namely statistical measures, Wold features and Gabor features.

Statistical measures [1-3] are obtained by computing the local statistical distribution of image intensity. For example, IBM's Query By Image Content (QBIC) system [2] uses measures such as coarseness, contrast and directionality as texture features. Coarseness measures the texture's apparent grain size, contrast measures the difference in the gray-level distribution, and directionality measures the local edge probabilities along various directions.


Although these features are easy to compute, they are not scale- and orientation-invariant. There are newer statistical approaches [4, 5] that are rotation-invariant, but their application to content-based image retrieval has yet to be seen.

The Wold model [6, 7] extracts the periodicity, randomness and directionality of a texture. Periodicity and directionality are related respectively to the spatial frequency and orientation of the texture, and randomness measures how uniform the texture is. Test results of retrieving Brodatz images show that texture matching based on the Wold features can tolerate a variety of rotational and local inhomogeneities. However, the method has not been shown to be truly rotation- or scale-invariant, and no result on retrieving images of natural scenes has been reported.

Ma and Manjunath [8] used Gabor texture features for image retrieval. Gabor features are obtained by convolving an image with a set of Gabor filters, and a simple learning algorithm is used to match them. Test results indicate that this method of retrieving Brodatz images and aerial photographs compares favorably with methods using other texture features. However, its performance on retrieving images of natural scenes has not been reported.

Existing methods [8-10] use Gabor filters with minimal overlap in their spatial frequencies and orientations. In contrast, the texture matching method described in this paper uses overlapping filters to minimize the variation of the feature pattern due to changes in texture frequency and orientation (see Sect. 3.2). Moreover, it uses all the outputs of the Gabor filters to provide as much orientation and frequency information as possible for texture matching; existing methods use only some of the Gabor filters' outputs.

3 Texture Feature Extraction

To perform texture matching, the texture features in an image must be extracted. This section discusses how Gabor texture features are extracted. It also describes how neighbouring regions with similar features are grouped together to reduce the amount of information to be stored for each image. The invariant matching method is described in Sect. 4.

3.1 Gabor Texture Feature

The Gabor function h(x, y) at image position (x, y) is a complex sinusoidal grating modulated by an oriented Gaussian function g(x', y') [11]:

    h(x, y) = g(x', y') \exp(2\pi j f x')                                 (1)


    g(x', y') = \frac{1}{2\pi\lambda\sigma^2} \exp\left[-\frac{(x'/\lambda)^2 + y'^2}{2\sigma^2}\right]   (2)

where (x', y') = (x cos θ + y sin θ, −x sin θ + y cos θ) are coordinates rotated by angle θ from the x-axis, λ is the aspect ratio, and σ is the scale parameter. The Gabor function has radial frequency f and orientation θ:

    f = \sqrt{U^2 + V^2}, \qquad \theta = \tan^{-1}(V/U)                  (3)

where U and V are the spatial frequencies in the x- and y-directions. Its frequency (octave) bandwidth B and orientation (radian) bandwidth Ω at half-peak are given by

    B = \log_2 \frac{\pi\sigma\lambda f + \alpha}{\pi\sigma\lambda f - \alpha}, \qquad \Omega = 2\tan^{-1}\frac{\alpha}{\pi\sigma f}   (4)

where α = \sqrt{(\ln 2)/2}. The range of spatial frequencies within the frequency and orientation bandwidths is called the half-peak support.

A multi-channel approach is adopted to extract texture features. An input image I(x, y) is filtered by a set of Gabor filters with different frequencies f and orientations θ:

    k_{c,f\theta}(x, y) = I(x, y) * h_{c,f\theta}(x, y)                   (5)
    k_{s,f\theta}(x, y) = I(x, y) * h_{s,f\theta}(x, y)                   (6)

where h_{c,fθ}(x, y) and h_{s,fθ}(x, y) are the real and imaginary components of the Gabor function (Eq. 1). The Gabor channels' output energy is given by

    E_{f\theta}(x, y) = k_{c,f\theta}^2(x, y) + k_{s,f\theta}^2(x, y).    (7)

After Gabor filtering, the channels' outputs are smoothed by Gaussian filters to remove the local variations introduced by the sinusoidal terms of the Gabor functions. The Gaussian filters have the same aspect ratios and orientations as the Gabor functions but larger scale parameters σ; in the current implementation, the Gaussians' scale parameters are set at four times those of the corresponding Gabors.

Notice from Eqs. 5-7 that two identical texture patterns with different intensities would produce filter outputs with different energy levels: the brighter pattern has higher energy than the dimmer one even though the two are perceptually similar. It is thus necessary to remove the difference in channel outputs due to intensity variation. This is accomplished by normalizing the outputs by their largest component:

    k_{f\theta}(x, y) = \frac{E_{f\theta}(x, y)}{\max_{f'\theta'} E_{f'\theta'}(x, y)}.   (8)
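
To make the extraction pipeline concrete, the sketch below builds the real and imaginary Gabor kernels of Eqs. 1-2 and computes the smoothed, normalized channel outputs of Eqs. 5-8. It is a minimal illustration rather than the authors' implementation: the kernel truncation width and the use of isotropic (rather than oriented) Gaussian smoothing are simplifying assumptions.

```python
import numpy as np
from scipy.signal import fftconvolve
from scipy.ndimage import gaussian_filter

def gabor_kernels(f, theta, sigma, lam=1.0):
    """Real and imaginary parts of the Gabor function (Eqs. 1-2)."""
    half = int(np.ceil(3 * sigma * max(lam, 1.0)))  # ~3-sigma truncation (assumption)
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)      # rotated coordinates
    yr = -x * np.sin(theta) + y * np.cos(theta)
    g = np.exp(-((xr / lam) ** 2 + yr ** 2) / (2 * sigma ** 2))
    g /= 2 * np.pi * lam * sigma ** 2               # Eq. 2
    return g * np.cos(2 * np.pi * f * xr), g * np.sin(2 * np.pi * f * xr)

def channel_energy(image, f, theta, sigma):
    """Gabor channel output energy (Eqs. 5-7)."""
    hc, hs = gabor_kernels(f, theta, sigma)
    kc = fftconvolve(image, hc, mode="same")        # Eq. 5
    ks = fftconvolve(image, hs, mode="same")        # Eq. 6
    return kc ** 2 + ks ** 2                        # Eq. 7

def texture_features(image, freqs, thetas, sigma):
    """Smoothed, intensity-normalized feature vectors k(x, y) (Eq. 8)."""
    E = np.stack([gaussian_filter(channel_energy(image, f, th, sigma),
                                  4 * sigma)        # isotropic stand-in for
                  for f in freqs for th in thetas]) # the oriented Gaussians
    return E / np.maximum(E.max(axis=0), 1e-12)     # Eq. 8, per pixel
```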


Figure 2: A uniform texture (a) and its corresponding feature vector (b) in the f-θ space. The feature vector is shown as a 2-D pattern, with frequency increasing from bottom up and orientation changing from left to right.

The normalized outputs now form the texture feature vector k(x, y) in the f-θ space (Fig. 2).

3.2 Overlapping of Filter Channels

Existing methods use Gabor filters with minimal overlap in their half-peak supports. In contrast, filters with significantly overlapping supports are used here. The filters are designed using the following parameters:

    f_i = f_m 2^{-i\rho B}, \qquad \theta_i = i\rho\Omega                 (9)

where f_m is the maximum spatial frequency, B and Ω are the spatial frequency and orientation bandwidths, and ρ is inversely proportional to the amount of overlap in the filters' supports. In the current implementation, f_m = 0.3, B = 0.5 octave, Ω = 45°, and ρ = 0.5. A total of 48 filters are used, with 6 spatial frequencies and 8 orientations.

With overlapping filter supports, several filter channels can respond to the same texture, but with different output strengths. Each channel responds most strongly to the texture that contains its matching frequency and orientation. As a result, the filters respond to different texture features with different output patterns, which can be used to represent the texture. This method of representing texture features is known as distributed representation in the neural networks literature [12]. Distributed representation has the advantage of encoding a large number of different texture types using a small number of channel outputs. The overlapping of the filters' supports ensures that the distributed representation is not severely altered by slight changes in texture frequency and orientation. This characteristic facilitates a closer match between similar texture patterns that vary in orientation and scale.
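
Under this reading of Eq. 9, the 48 channel parameters can be enumerated directly from the stated constants. A brief sketch; the single sigma value passed to the filter bank above is an illustrative assumption, since the paper ties the scale parameter to the filter bandwidths rather than fixing one number.

```python
import numpy as np

F_MAX, B, OMEGA, RHO = 0.3, 0.5, np.deg2rad(45.0), 0.5
N_FREQS, N_ORIENTS = 6, 8

# Eq. 9: 6 frequencies spaced rho*B = 0.25 octave apart, and
# 8 orientations spaced rho*Omega = 22.5 degrees apart.
freqs = [F_MAX * 2.0 ** (-i * RHO * B) for i in range(N_FREQS)]
thetas = [i * RHO * OMEGA for i in range(N_ORIENTS)]

# features = texture_features(image, freqs, thetas, sigma=4.0)
# sigma=4.0 is illustrative only, not a value given in the paper.
```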


3.3 Texture Region Grouping

Neighbouring regions with similar texture features are grouped together to reduce the amount of data that needs to be stored for an image. The region grouping process consists of two stages: region growing and region merging.

In the region growing stage, the input image is first divided into seed regions of 8 × 8 pixels. Regions with small magnitudes in all the Gabor channels are discarded because they lack significant information for texture matching. Each remaining region i is characterized by an average feature vector k_i. Two regions i and j are merged if

  • they share a common boundary, and
  • they are similar enough: S_A(k_i, k_j) > τ_A

where S_A is the absolute matching criterion, defined as the normalized dot product between k_i and k_j:

    S_A(k_i, k_j) = \frac{k_i \cdot k_j}{\|k_i\| \|k_j\|}                 (10)

and τ_A is a constant threshold, currently set at cos 21.5°. The feature vector k of the merged region is then computed as the weighted average of k_i and k_j:

    k = \frac{1}{n_i + n_j}(n_i k_i + n_j k_j)                            (11)

where n_i and n_j are the numbers of pixels in regions i and j.

After the growing process, regions with fewer than 50 × 8 × 8 pixels are discarded because they are too small to contain significant texture. In addition, feature vectors that contain many large components are also discarded because they are either non-uniform or noisy.

In natural images, the texture in a perceptually uniform region often varies gradually in frequency and orientation. The absolute matching criterion S_A cannot combine such parts into a single region; instead, they are combined in the region merging stage. Two regions i and j are merged if

  • they share a common boundary, and
  • they are similar enough: S_I(k_i, k_j) > τ_I

where S_I is a scale- and orientation-invariant matching criterion, described in more detail in Sect. 4, and τ_I is a constant threshold set at cos 31.8°.
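
The growing step reduces to a cosine-similarity test and a size-weighted average. The following minimal sketch expresses Eqs. 10-11 directly; the adjacency bookkeeping of an actual region grower is omitted.

```python
import numpy as np

TAU_A = np.cos(np.deg2rad(21.5))   # absolute-similarity threshold

def s_abs(ki, kj):
    """Absolute matching criterion S_A (Eq. 10): normalized dot product."""
    return (ki @ kj) / (np.linalg.norm(ki) * np.linalg.norm(kj))

def merge_features(ki, ni, kj, nj):
    """Size-weighted average feature of a merged region (Eq. 11)."""
    return (ni * ki + nj * kj) / (ni + nj)

# Two adjacent regions are merged whenever s_abs(ki, kj) > TAU_A.
```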


All the feature vectors in the two regions are collected into the merged region. They are not averaged because, in a large region, the texture patterns at the extreme ends may differ significantly in scale and orientation even though they vary gradually over the entire region. After region merging, the regions and their feature vectors are indexed and stored in the database.

4 Invariant Texture Matching

The invariant matching method exploits a property of the feature vectors in the 2-D f-θ space. When a texture is isotropically scaled, the 2-D feature pattern (Fig. 2b) remains approximately unchanged except for a shift along the f-axis (vertical axis). Similarly, when the texture is rotated, the feature pattern simply shifts along the θ-axis (horizontal axis).

This property is used to achieve orientation- and scale-invariant matching by shifting and matching the feature patterns. The invariant matching criterion S_I between texture vectors k_1 and k_2 is defined as follows:

    S_I(k_1, k_2) = \max_{\Delta f, \Delta\theta} S_B(k'_1, k_2)          (12)

where k'_1 is the pattern k_1 shifted by an amount Δf along the frequency axis and Δθ along the orientation axis. The matching criterion S_B is similar to S_A:

    S_B(k_i, k_j) = \frac{k_i \odot k_j}{\|k_i\| \|k_j\|}                 (13)

where k_i ⊙ k_j is a modified vector dot product:

    k_i \odot k_j =
    \begin{cases}
      \sum_l \max_{m \in N(l)} k_{il} k_{jm} & \text{if } \gamma < \dfrac{|k_{il} - k_{jl}|}{\max(k_{il}, k_{jl})} < \delta \\[1ex]
      k_i \cdot k_j & \text{otherwise}
    \end{cases}                                                           (14)

where N(l) is the neighbourhood of vector component l in the f-θ space. Due to perspective distortion, the texture in a perceptually uniform region often does not vary isotropically in scale. As a result, different parts of the feature pattern are shifted by different amounts in the f-θ space. To handle such non-uniform shifts, Eq. 14 computes the sum of products of the best-matching vector components instead of the ordinary vector dot product. To avoid false matches, the modified dot product is used only when the vectors are similar, i.e., γ < |k_{il} − k_{jl}|/max(k_{il}, k_{jl}) < δ. In the current implementation, γ = 0.10 and δ = 0.20.
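
A sketch of the shift-and-match scheme of Eqs. 12-14 over (n_freq, n_orient) feature grids. Several details here are assumptions rather than the paper's specification: the similarity test of Eq. 14 is applied to the whole pattern at once (one reading of the per-component condition), N(l) is taken as a 3 × 3 neighbourhood, orientation shifts wrap around, and frequency shifts are limited to ±2 bins with zero-filling.

```python
import numpy as np

GAMMA, DELTA = 0.10, 0.20   # similarity band for the modified product

def shift_freq(k, df):
    """Shift a pattern along the frequency axis, zero-filling (no wrap)."""
    out = np.zeros_like(k)
    if df >= 0:
        out[df:] = k[:k.shape[0] - df]
    else:
        out[:df] = k[-df:]
    return out

def modified_dot(ki, kj):
    """Modified vector dot product (Eq. 14)."""
    ratio = np.abs(ki - kj) / np.maximum(np.maximum(ki, kj), 1e-12)
    if np.all((ratio > GAMMA) & (ratio < DELTA)):
        # match each component of ki against its best 3x3 neighbour in kj
        best = np.stack([np.roll(np.roll(kj, df, 0), dt, 1)
                         for df in (-1, 0, 1)
                         for dt in (-1, 0, 1)]).max(axis=0)
        return float((ki * best).sum())
    return float((ki * kj).sum())   # ordinary dot product otherwise

def s_b(ki, kj):
    """Matching criterion S_B (Eq. 13)."""
    return modified_dot(ki, kj) / (np.linalg.norm(ki) * np.linalg.norm(kj))

def s_inv(k1, k2, max_df=2):
    """Invariant criterion S_I (Eq. 12): best S_B over pattern shifts."""
    return max(s_b(shift_freq(np.roll(k1, dt, axis=1), df), k2)
               for dt in range(k1.shape[1])          # cyclic theta shifts
               for df in range(-max_df, max_df + 1)) # bounded f shifts
```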


For two feature vectors k_1 and k_2 that are strongly correlated in their 2-D patterns, the maximum of the shifted S_B will be large regardless of the amount of shift. This method thus permits the matching of two identical textures that differ only in scale and orientation.

During image retrieval, the invariant matching method is used to match a query texture with those in the database. An image may contain several different texture feature vectors located in different regions. The goodness of match S between a query texture k and an image J is given by the largest matching value between k and image J's feature vectors k_i:

    S(k, J) = \max_{k_i \text{ of } J} S_I(k, k_i).                       (15)

5 Test Results

This section illustrates some results of applying the invariant matching method to image retrieval. The test database contains 60 natural images of various sizes with a total of 239 textured regions. The retrieved images are ranked according to the texture matching value S (Eq. 15). Due to space constraints, only the twelve top-ranking images are shown.

Figures 3 and 4 compare image retrieval using the invariant method and a typical non-invariant matching method. The non-invariant method is derived from Eq. 15 by replacing the invariant matching criterion S_I with the absolute one, S_A. This type of matching is typical of the methods used in existing work, and it is unable to match identical texture patterns that differ substantially in scale and orientation. With the non-invariant method, only one relevant image (Fig. 3, image 1) has a high matching value; the others (Fig. 3, images 8, 9 and 11) have very low matching values. Moreover, only 4 of the 7 images with identical tiled roofs are ranked within the top 12. This result shows that the non-invariant method cannot tolerate differences in scale and orientation between perceptually similar texture patterns. In contrast, the invariant method gives high matching values to relevant images (Fig. 4, images 1, 3, 8, 9 and 12). Six of the seven images with identical tiled roofs are ranked within the top 13 positions (the 13th image is not shown in Fig. 4). In addition, 2 images with similar tiled roofs (Fig. 4, images 10 and 11) are also retrieved. The other images in Fig. 4 (images 2, 4, 5, 6 and 7) are ranked highly because their texture patterns have dominant lines in one direction, similar to the query texture. An interesting retrieval result is shown in Fig. 4, image 6: despite the presence of the fence in the foreground, the segmentation algorithm can still isolate the sliding door texture because it has higher contrast and is therefore more prominent. This texture is similar to the query texture and is ranked highly by the texture matching algorithm.
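
Retrieval then reduces to ranking images by Eq. 15. A short sketch, assuming a hypothetical `database` container that maps each image id to its list of per-region f-θ feature patterns:

```python
def image_score(query, region_features):
    """Goodness of match S(k, J) (Eq. 15): the best S_I over an
    image's stored per-region feature patterns."""
    return max(s_inv(query, ki) for ki in region_features)

def retrieve(query, database, top=12):
    """Rank images by matching value against the query texture.
    'database' is a hypothetical dict: image id -> region patterns."""
    ranked = sorted(database,
                    key=lambda j: image_score(query, database[j]),
                    reverse=True)
    return ranked[:top]
```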


Query texture (image not reproduced)

Figure 3: Images retrieved by the non-invariant method, ranked according to matching value: (1) 0.791222, (2) 0.711969, (3) 0.740562, (4) 0.720556, (5) 0.708253, (6) 0.700136. The white dots indicate the centres of the regions where the image texture best matches the query texture. Due to space constraints, the retrieved images are displayed at reduced scales.


Figure 3 (cont.): Images retrieved by the non-invariant method: (7) 0.644191, (8) 0.638823, (9) 0.526878, (10) 0.495400, (11) 0.478321, (12) 0.474476.


Figure 4: Images retrieved by the invariant method: (1) 0.993354, (2) 0.989987, (3) 0.970841, (4) 0.962419, (5) 0.944764, (6) 0.943385. The query texture is shown in Fig. 3.


Figure 4 (cont.): Images retrieved by the invariant method: (7) 0.942859, (8) 0.938972, (9) 0.934341, (10) 0.927344, (11) 0.927009, (12) 0.925974.



Query texture (image not reproduced)

Figure 5: Images retrieved by the invariant method: (1) 0.960227, (2) 0.950997, (3) 0.948761, (4) 0.945986, (5) 0.945163, (6) 0.940494.


Figure 5 (cont.): Images retrieved by the invariant method: (7) 0.939497, (8) 0.936643, (9) 0.933211, (10) 0.928799, (11) 0.926664, (12) 0.926617.


Figure 5 shows another retrieval result using invariant matching. There are 17 images with brick texture in the database. Eight of them have brick patterns similar to the query texture. Five have similar brick patterns, but their contrasts are not uniform and they appear different from the query texture. The remaining 4 have different brick patterns. Of the 8 images with similar brick texture, 5 are ranked within the top 8. Compared with the result of the non-invariant method (not shown), the fifth and eighth images have risen from positions 9 and 49 respectively under invariant matching. This result shows that the invariant method is able to handle differences in scale and orientation. However, the seventh image has dropped in ranking (from position 4) because other images with similar texture (Fig. 5, images 4 and 6) have higher rankings. Nevertheless, it is still ranked among the top 12.

The above retrieval results show that relevant images are retrieved with high matching values, but other, irrelevant images may also rank highly. In other words, the recall of relevant images is good, but the retrieval accuracy needs to be improved. The current method works well for retrieving structured texture but less well for less structured patterns such as grass and sand. The method is currently being revised to improve retrieval accuracy and to handle less structured texture. Several approaches are being considered, including extracting additional texture features for more accurate texture discrimination, devising a better invariant matching method, and using learning techniques to fine-tune the invariant matching criterion.

6 Conclusion

This paper presented a novel scale- and orientation-invariant matching method for texture-based image retrieval. The method uses the output patterns of Gabor filters to represent different texture types as well as the scale and orientation of the texture. As a result, similar texture patterns that differ in scale and orientation can be closely matched. Compared with non-invariant methods, which cannot match perceptually similar patterns that differ in scale and orientation, the invariant method has been shown to perform better in retrieving images of natural scenes.

Acknowledgments

This research is supported by NUS Academic Research Grant RP950656 and NUS Research Scholarship HD96-0672W.


References

1. P.M. Kelly, M. Cannon, and D.R. Hush. Query by image example: The CANDID approach. In SPIE Storage and Retrieval for Image and Video Databases III, 2420:238-248, 1995.
2. W. Niblack, R. Barber, W. Equitz, M. Flickner, E. Glasman, D. Petkovic, P. Yanker, C. Faloutsos, and G. Taubin. The QBIC project: Querying images by content using color, texture and shape. In Proceedings of SPIE Storage and Retrieval for Image and Video Databases, pages 173-181, Feb 1993.
3. T. Pun and D. Squire. Statistical structuring of pictorial databases for content-based image retrieval systems. Pattern Recognition Letters, 17(12):1299-1310, Oct 1996.
4. S.V.R. Madiraju and C.-C. Liu. Rotation invariant texture classification using covariance. In Proceedings of International Conference on Image Processing, volume 2, pages 655-659, 1994.
5. S.V.R. Madiraju, T.M. Caelli, and C.-C. Liu. On the covariance technique for robust and rotation invariant texture processing. In Proceedings of Asian Conference on Computer Vision, pages 171-174, 1993.
6. F. Liu and R.W. Picard. Periodicity, directionality and randomness: Wold features for image modeling and retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(7):722-733, July 1996.
7. A. Pentland, R.W. Picard, and S. Sclaroff. Photobook: Tools for content-based manipulation of image databases. In SPIE Storage and Retrieval for Image and Video Databases II, volume 2, pages 34-47, 1994.
8. W.Y. Ma and B.S. Manjunath. Texture features and learning similarity. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pages 1160-1169, June 1996.
9. A.K. Jain and F. Farrokhnia. Unsupervised texture segmentation using Gabor filters. Pattern Recognition, 24(12):1167-1186, 1991.
10. A. Perry and D.G. Lowe. Segmentation of textured images. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pages 319-325, 1989.
11. A.C. Bovik, M. Clark, and W.S. Geisler. Multichannel texture analysis using localized spatial filters. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(1):55-73, 1990.
12. G.E. Hinton, J.L. McClelland, and D.E. Rumelhart. Distributed representations. In D.E. Rumelhart and J.L. McClelland, editors, Parallel Distributed Processing. MIT Press, Cambridge, Massachusetts, 1986.