
The method for image retrieval based on multi-factors correlation utilizing block truncation coding

Wang Xingyuan*, Wang Zongyu
Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian 116024, China
*Corresponding author. E-mail addresses: [email protected] (X. Wang), [email protected] (Z. Wang).

Article info

Article history:
Received 16 December 2013
Received in revised form 18 March 2014
Accepted 18 April 2014
Available online 2 May 2014

Keywords:
Content based image retrieval
Block truncation coding
Multi-factors correlation
Structure element correlation
Gradient value correlation
Gradient direction correlation

Abstract

In this paper, we propose multi-factors correlation (MFC) to describe an image, comprising structure element correlation (SEC), gradient value correlation (GVC) and gradient direction correlation (GDC). First, the RGB color space image is converted to a bitmap image and a mean color component image utilizing block truncation coding (BTC). Then, the three correlations are used to extract the image feature. The structure elements can effectively represent the bitmap generated by BTC, and SEC can effectively denote the bitmap's structure and the correlation of the blocks in the bitmap. GVC and GDC can effectively denote the gradient relation, which is computed from the mean color component image. Formed by SEC, GVC and GDC, the image feature vectors can effectively represent the image. Finally, experimental results demonstrate that the method performs better than other image retrieval methods.

© 2014 Elsevier Ltd. All rights reserved.

1. Introduction

In recent years, image databases have grown rapidly. It is difficult to find useful images in a huge image database, and the problem is aggravated by the fact that we frequently obtain images through the ever-expanding Internet. Existing methods are usually difficult to adapt, so a new way to retrieve images accurately is urgently needed. Image retrieval has thus become an important topic in image processing and pattern recognition. Generally speaking, images can be retrieved in three ways: text based, content based and semantic based [1-8]. The text based retrieval approach is widely used, for example by Baidu and Google. With this method we can retrieve images using keywords that are annotated on the images. However, we often obtain images that are unrelated to our expected results, and the results of the image retrieval rely on our understanding of the query images. There are two drawbacks to this approach. Firstly, images in the database are annotated manually; this is time-consuming for a huge image database and requires much human labor. Secondly, the results of the retrieval are inaccurate, because they depend on an understanding of the query images. The second approach, i.e. content based image retrieval (CBIR), was proposed in the early 1990s [9-16]. This approach retrieves images using low-level features like color, texture and shape that can represent an image. It involves querying an example

image first, extracting the low-level features of the example image, then computing the similarity between the query image and the images in the image dataset and, finally, sorting the images by similarity and displaying the top results. CBIR has been shown to be much more effective than the text based approach [17-19]. The final approach is the semantic based method. CBIR fails to describe semantic concepts, so researchers have proposed methods for image retrieval using relevance feedback algorithms. Relevance feedback algorithms capture a user's preferences and bridge the semantic gap [20,21]; the results of methods based on relevance feedback are closer to human perception.

In CBIR, color, texture and shape are the most important features. The HSV color space is widely used for extracting color features. In this space, hue distinguishes color, saturation is the percentage of white light added to a pure color, and value refers to the perceived light intensity [4,22]. The advantage of the HSV color space is that it is closer to the human conceptual understanding of colors. In order to cut down computational complexity and extract the color features efficiently, the HSV color space is in general quantized to 72 bins [22].

Generally, in CBIR, the descriptors representing the image are based on color, shape or texture. Various algorithms have been designed to extract features for image retrieval. The multi-texton histogram (MTH) proposed for image retrieval integrates the advantages of a co-occurrence matrix and a histogram by representing the attributes of the co-occurrence matrix using a histogram [1]. A novel image feature detecting and describing method has been proposed, called the micro-structure descriptor (MSD).


The MSD is built on the underlying colors in micro-structures with similar edge orientations [2]. In [3], the author uses the block color co-occurrence matrix and the block pattern histogram to represent an image. In [8], a structure elements' descriptor is proposed, and a structure histogram is used to extract the image feature. In [9], three types of image features are proposed to describe the color and spatial distributions of an image; in these features, the K-means algorithm is adopted to classify all the pixels of an image into several clusters according to their colors. The MPEG-7 edge histogram descriptor (EHD) is extracted from the spatial distribution of edges, and it is an efficient texture descriptor for images [23]. The edge orientation autocorrelogram (EOAC) is proposed as a shape based descriptor [24]. A very effective method, called the scale-invariant feature transform (SIFT), has been proposed to detect and describe local features in images [25]. The texton co-occurrence matrix (TCM), proposed in [26], can describe the spatial correlation of textons for image retrieval. In [27], an adaptive color feature extraction scheme is proposed by considering the distribution of an image; the binary quaternion-moment-preserving (BQMP) threshold technique is used.

In this paper, we propose multi-factors correlation (MFC) to describe an image, comprising structure element correlation (SEC), gradient value correlation (GVC) and gradient direction correlation (GDC). In the first place, an RGB color space image is converted to a bitmap image and a mean color component image utilizing block truncation coding (BTC). Then, the three correlations are used to extract the image feature. The structure elements can effectively represent the bitmap generated by BTC, and SEC can effectively denote the bitmap's structure and the correlation of the blocks in the bitmap. GVC and GDC can effectively denote the gradient relation, which is computed from the mean color component image. The image feature vectors are formed by SEC, GVC and GDC, and they can effectively represent the image. In the end, experiments show that the MFC method has higher retrieval precision and recall than the MSD [2], SED [8] and CSD3 [9] methods.

The rest of the paper is organized as follows. Section 2 details BTC for a color image. Section 3 describes SEC and the method for extracting the feature vector from a bitmap. In Section 4, GVC and GDC are described. The similarity measure is defined in Section 5. Experimental results and comparisons are presented in Section 6. Section 7 concludes the paper.

2. BTC for color image

Block truncation coding (BTC) was first proposed in 1979 [28] and is mainly used for compressing gray-scale images. BTC divides the image into small nonoverlapping blocks. After block coding, every block of the original image yields a bitmap and two mean values. In BTC, firstly, the mean of every block is computed. Then, compared with this mean value, a pixel in the block is assigned the value 1 in the bitmap if the pixel value is greater than the mean value; otherwise it is assigned the value 0. The two mean values are the mean of the pixels greater than the block mean and the mean of the pixels less than the block mean; they are called the max mean value and the min mean value, respectively. At the decoding stage, a value of 1 in the bitmap is replaced by the max mean value; otherwise it is replaced by the min mean value. Therefore, BTC is a lossy compression algorithm.

BTC was used on gray-scale images in the early period and then gradually extended to multispectral images, such as color images [3,29,30]. Most color images use the RGB color space, which is widely used for representing color images. Besides RGB, there are many other color spaces, such as YCbCr and HSV.

The steps of BTC for a color image can be described as follows [3,29]:

1. Divide the image into small nonoverlapping blocks of size $m \times n$; compute the mean value $a(i,j)$, $i \in [1,m]$, $j \in [1,n]$, of every pixel in the block by Eq. (1):

$$a(i,j) = \tfrac{1}{3}\left(r(i,j) + g(i,j) + b(i,j)\right), \quad i \in [1,m],\ j \in [1,n] \qquad (1)$$

2. Compute the mean value $Th$ of every block as the threshold:

$$Th = \frac{1}{m \times n}\sum_{i=1}^{m}\sum_{j=1}^{n} a(i,j) \qquad (2)$$

3. The bitmap of every block can be computed as follows:

$$bimp(i,j) = \begin{cases} 1, & a(i,j) > Th \\ 0, & a(i,j) \le Th \end{cases} \qquad (3)$$

4. Compute the max mean value $M_{max}$ and the min mean value $M_{min}$ by

$$M_{max} = \left\{\begin{aligned} &\frac{1}{\sum_{i=1}^{m}\sum_{j=1}^{n} bimp(i,j)} \sum_{i=1}^{m}\sum_{j=1}^{n} bimp(i,j)\, r(i,j) \\ &\frac{1}{\sum_{i=1}^{m}\sum_{j=1}^{n} bimp(i,j)} \sum_{i=1}^{m}\sum_{j=1}^{n} bimp(i,j)\, g(i,j) \\ &\frac{1}{\sum_{i=1}^{m}\sum_{j=1}^{n} bimp(i,j)} \sum_{i=1}^{m}\sum_{j=1}^{n} bimp(i,j)\, b(i,j) \end{aligned}\right. \qquad (4)$$

Fig. 1. Bitmap of the image. (a) Original image. (b) Bitmap of the image.


$$M_{min} = \left\{\begin{aligned} &\frac{1}{m \times n - \sum_{i=1}^{m}\sum_{j=1}^{n} bimp(i,j)} \sum_{i=1}^{m}\sum_{j=1}^{n} \left(1 - bimp(i,j)\right) r(i,j) \\ &\frac{1}{m \times n - \sum_{i=1}^{m}\sum_{j=1}^{n} bimp(i,j)} \sum_{i=1}^{m}\sum_{j=1}^{n} \left(1 - bimp(i,j)\right) g(i,j) \\ &\frac{1}{m \times n - \sum_{i=1}^{m}\sum_{j=1}^{n} bimp(i,j)} \sum_{i=1}^{m}\sum_{j=1}^{n} \left(1 - bimp(i,j)\right) b(i,j) \end{aligned}\right. \qquad (5)$$

Since the image is composed of small blocks and the bitmap and the mean values of every block have been computed, we can obtain the bitmap and the mean value images of the whole image.

Fig. 1 shows the bitmap of an image using BTC; Fig. 1(a) is the original image and Fig. 1(b) is the bitmap of the image.
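To make the encoding concrete, below is a minimal Python/NumPy sketch of BTC for a color image following Eqs. (1)-(5); the function name btc_encode, its interface and the fallback for uniform blocks are our own illustration, not taken from the paper.

```python
import numpy as np

def btc_encode(img, m=2, n=2):
    """Block truncation coding of an RGB image (H x W x 3 float array).

    Returns the bitmap (Eq. (3)) and the per-block max/min mean color
    images (Eqs. (4)-(5)). Assumes H % m == 0 and W % n == 0.
    """
    h, w, _ = img.shape
    a = img.mean(axis=2)                       # Eq. (1): per-pixel channel mean
    bitmap = np.zeros((h, w), dtype=np.uint8)
    m_max = np.zeros((h // m, w // n, 3))
    m_min = np.zeros((h // m, w // n, 3))
    for bi in range(0, h, m):
        for bj in range(0, w, n):
            blk_a = a[bi:bi + m, bj:bj + n]
            th = blk_a.mean()                  # Eq. (2): block threshold
            bmp = blk_a > th                   # Eq. (3): block bitmap
            bitmap[bi:bi + m, bj:bj + n] = bmp
            blk = img[bi:bi + m, bj:bj + n]
            for c in range(3):                 # Eqs. (4)-(5), per channel
                ch = blk[:, :, c]
                # Guard for uniform blocks (one side empty): fall back to th.
                m_max[bi // m, bj // n, c] = ch[bmp].mean() if bmp.any() else th
                m_min[bi // m, bj // n, c] = ch[~bmp].mean() if (~bmp).any() else th
    return bitmap, m_max, m_min
```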

3. Structure element correlation

In image retrieval, it is difficult to represent an image effectively by a feature vector. In this paper, first, by using BTC, we obtain the bitmap and the mean value image. Then, we extract the feature vector and use the method proposed in this paper to represent the image effectively by the feature vector.

In Fig. 1, the bitmap can effectively identify the shape and edges of the image, so we can conclude that the image's shape and edges are effectively represented if the bitmap is effectively represented. In BTC, the image is divided into many 2×2 blocks, and the shape and edges are composed of these small blocks. In [8], the authors proposed a structure element descriptor (SED) to represent the image's texture. In our paper, the structure element is applied: in a bitmap, the image's shape and edges are effectively represented by describing the relation of the structure elements.

Based on the above analysis, we propose a novel method to extract the feature from the bitmap: structure element correlation (SEC). In BTC, the size of a block is 2×2 ($m = 2$, $n = 2$). Fig. 2 displays the 8 kinds of structure element that can appear in a bitmap. The relation of the structure elements can be defined through structure element correlation; therefore, the structure element correlation can represent the image's shape and edges effectively.

SEC is defined as the probability of two structure elements appearing together in the bitmap and is described as

$$SP_{ij} = \Pr\left(S(x,y) = S_i \mid S(x',y') = S_j\right), \quad i, j \in [1,8] \qquad (6)$$

In Eq. (6), $\max(|x-x'|, |y-y'|) = D$; $SP_{ij}$ is the probability that one structure element is $S_i$ given that the other structure element is $S_j$ and the distance between them is not greater than $D$. In practice, $i = j$ and $D = 1$ are generally used; this is the probability that, when a structure element is $S_i$, a neighboring structure element is the same.

The procedure for computing SEC can be described as follows:

1. Match the structure elements defined in Fig. 2 in the bitmap. Starting from the origin (0, 0), move the structure element from left to right and top to bottom with a step length of 2. In BTC, the size of a block is 2×2; therefore, a structure element exactly matches a block of the bitmap. In Fig. 3, the bitmap is converted to the structure element correlation map; the bitmap's size is 10×10, so the size of the structure element correlation map is 5×5.

2. In Fig. 3, the red dotted lines indicate the structure elements' correlation. A red dotted line between two structure elements indicates that the neighboring structure element is also $S_i$ when the structure element is $S_i$.

3. Compute the structure element correlation using Eq. (6). In Fig. 3, $S_1$ appears 6 times, and $S_1$ appears 4 times in the neighboring area when $S_1$ appears. Therefore, the correlation of structure element $S_1$ is $p_1 = 4/6 = 0.67$. Similarly, $p_2 = 0$, $p_3 = 0$, $p_4 = 0.5$, $p_5 = 0.5$, $p_6 = 0.6$, $p_7 = 0$, and $p_8 = 0.5$.

Computing the feature vector of the bitmap with the above procedure, the feature vector of the structure element correlation is $SP = \{0.67, 0, 0, 0.5, 0.5, 0.6, 0, 0.5\}$.
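As a sketch of this procedure, under one plausible reading of the example ($p_i$ is the fraction of occurrences of $S_i$ that have at least one identical structure element in their 8-neighborhood), the following Python code computes SEC from a BTC bitmap. The mapping from the 16 possible 2×2 binary patterns to the 8 structure elements of Fig. 2 must be read off the figure, so it is passed in as an assumption.

```python
import numpy as np

def sec_features(bitmap, se_table):
    """Structure element correlation (SEC) of a BTC bitmap (D = 1, i = j).

    se_table: length-16 list mapping each 2x2 binary pattern id to a
    structure-element index 0..7 (taken from Fig. 2; an assumption here).
    """
    h, w = bitmap.shape
    se = np.empty((h // 2, w // 2), dtype=int)    # SE correlation map (Fig. 3)
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            b = bitmap[i:i + 2, j:j + 2]
            pid = b[0, 0] * 8 + b[0, 1] * 4 + b[1, 0] * 2 + b[1, 1]
            se[i // 2, j // 2] = se_table[pid]
    sp = np.zeros(8)
    for s in range(8):
        occ = np.argwhere(se == s)
        if len(occ) == 0:
            continue                              # SE absent: probability stays 0
        hits = 0
        for r, c in occ:
            nb = se[max(r - 1, 0):r + 2, max(c - 1, 0):c + 2]
            if np.count_nonzero(nb == s) > 1:     # another s besides the center
                hits += 1
        sp[s] = hits / len(occ)                   # e.g. p1 = 4/6 in Fig. 3
    return sp
```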

4. Gradient correlation

The structure element correlation denotes the shape and edge features of the image, but shape and edges describe only part of an image's features. Texture, one of the most important features of an image, describes the objects and their characteristics. In image retrieval, it is important to represent the image's texture effectively.

In the description of an image, the gradient is an effective texture feature.

Fig. 2. Structure element.


By computing the gradient, the change value and the direction of every pixel can be obtained; they are called the gradient value and the gradient direction, respectively. The image texture is effectively described if the gradient value and gradient direction of every pixel are described.

4.1. Compute gradient for color image

In the process of computing the image gradient, a gray-scale image is generally used. But in image retrieval, most images are color images. In [31], Di Zenzo proposed a gradient computing method for color images. Most color images are based on the RGB color space. Gradient computation for color images can be described as follows:

Let r, g and b be the unit vectors of the R-, G- and B-axes. Let f(x,y) be a color image; the vectors u and v can be computed as follows [2,32]:

$$\mathbf{u} = \frac{\partial R}{\partial x}\mathbf{r} + \frac{\partial G}{\partial x}\mathbf{g} + \frac{\partial B}{\partial x}\mathbf{b} \qquad (7)$$

$$\mathbf{v} = \frac{\partial R}{\partial y}\mathbf{r} + \frac{\partial G}{\partial y}\mathbf{g} + \frac{\partial B}{\partial y}\mathbf{b} \qquad (8)$$

The dot products of vectors u and v are defined as follows:

$$g_{xx} = \mathbf{u} \cdot \mathbf{u} = \left|\frac{\partial R}{\partial x}\right|^2 + \left|\frac{\partial G}{\partial x}\right|^2 + \left|\frac{\partial B}{\partial x}\right|^2 \qquad (9)$$

$$g_{yy} = \mathbf{v} \cdot \mathbf{v} = \left|\frac{\partial R}{\partial y}\right|^2 + \left|\frac{\partial G}{\partial y}\right|^2 + \left|\frac{\partial B}{\partial y}\right|^2 \qquad (10)$$

$$g_{xy} = \mathbf{u} \cdot \mathbf{v} = \frac{\partial R}{\partial x}\frac{\partial R}{\partial y} + \frac{\partial G}{\partial x}\frac{\partial G}{\partial y} + \frac{\partial B}{\partial x}\frac{\partial B}{\partial y} \qquad (11)$$

Let p(x,y) be the vector at the point (x,y) in the image; the gradient direction of p(x,y) can be defined as follows:

$$\theta(x,y) = \arctan\left(\frac{2 g_{xy}}{g_{xx} - g_{yy}}\right) \qquad (12)$$

The gradient value at the point (x,y) is defined as follows:

$$G(x,y) = \left(\tfrac{1}{2}\left[(g_{xx} + g_{yy}) + (g_{xx} - g_{yy})\cos\theta + 2 g_{xy}\sin\theta\right]\right)^{1/2} \qquad (13)$$

In Eq. (12), $\tan(\theta) = \tan(\theta \pm \pi)$; therefore, if $\theta_0$ is a solution to Eq. (12), then $\theta_0 + \pi$ is also a solution. Therefore, we can obtain Eqs. (14) and (15):

$$G_1(x,y) = \left(\tfrac{1}{2}\left[(g_{xx} + g_{yy}) + (g_{xx} - g_{yy})\cos\theta_0 + 2 g_{xy}\sin\theta_0\right]\right)^{1/2} \qquad (14)$$

$$G_2(x,y) = \left(\tfrac{1}{2}\left[(g_{xx} + g_{yy}) + (g_{xx} - g_{yy})\cos(\theta_0 + \pi) + 2 g_{xy}\sin(\theta_0 + \pi)\right]\right)^{1/2} \qquad (15)$$

Compute the max gradient value $G_{max}(x,y)$ of the pixel using Eq. (16), where $G_1(x,y)$ and $G_2(x,y)$ are computed by Eqs. (14) and (15):

$$G_{max}(x,y) = \max\left(G_1(x,y),\, G_2(x,y)\right) \qquad (16)$$

Taking the max gradient value $G_{max}(x,y)$ as the pixel's gradient value, the gradient direction $\theta(x,y)$ is defined as

$$\theta(x,y) = \begin{cases} \theta_0, & G_{max}(x,y) = G_1(x,y) \\ \theta_0 + \pi, & G_{max}(x,y) = G_2(x,y) \end{cases} \qquad (17)$$

In BTC, two mean values are generated. For image retrieval, the two mean values are nearly equivalent in our view. Therefore, we use the max mean value image, namely $M_{max}$ in Section 2, to compute the image's gradient.

Fig. 4 displays an image's gradient; the gradient image can effectively represent the image's texture.
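A minimal sketch of the color gradient of Eqs. (7)-(17) follows. The paper does not state which derivative operator it uses, so np.gradient (central differences) is an assumption here, as is the use of arctan2 to evaluate Eq. (12) over all quadrants.

```python
import numpy as np

def color_gradient(img):
    """Di Zenzo gradient value and direction of an RGB image, Eqs. (7)-(17)."""
    dx, dy = [], []
    for c in range(3):                            # Eqs. (7)-(8): per-channel derivatives
        gy, gx = np.gradient(img[:, :, c].astype(float))
        dx.append(gx)
        dy.append(gy)
    gxx = sum(d * d for d in dx)                  # Eq. (9)
    gyy = sum(d * d for d in dy)                  # Eq. (10)
    gxy = sum(a * b for a, b in zip(dx, dy))      # Eq. (11)
    theta0 = np.arctan2(2 * gxy, gxx - gyy)       # Eq. (12)

    def g_at(t):                                  # Eqs. (13)-(15)
        v = 0.5 * ((gxx + gyy) + (gxx - gyy) * np.cos(t) + 2 * gxy * np.sin(t))
        return np.sqrt(np.maximum(v, 0.0))        # clamp tiny negative round-off

    g1, g2 = g_at(theta0), g_at(theta0 + np.pi)
    g_max = np.maximum(g1, g2)                    # Eq. (16)
    theta = np.where(g1 >= g2, theta0, theta0 + np.pi)  # Eq. (17)
    return g_max, theta
```

In the paper's pipeline, img here would be the max mean value image $M_{max}$ produced by BTC.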

4.2. Gradient value correlation

The gradient can effectively represent the image's texture feature. In this paper, gradient value correlation (GVC) is used to represent the relations within the gradient image. Section 4.1 described how the gradient value is computed. In this section, the gradient values are uniformly quantized into 32 levels in order to reduce the complexity of computing GVC.

GVC is defined as the probability of two gradient values appearing together in the gradient image and is described as

$$GP_{ij} = \Pr\left\{G(x,y) = G_i \mid G(x',y') = G_j\right\}, \quad i, j \in [1,32] \qquad (18)$$

In Eq. (18), $\max(|x-x'|, |y-y'|) = D$; $GP_{ij}$ is the probability that one gradient value is $G_i$ given that the other gradient value is $G_j$ and the distance between them is not greater than $D$. In practice, $i = j$ and $D = 1$ are generally used; this is the probability that, when a gradient value is $G_i$, a neighboring gradient value is the same.

In Fig. 5, the gradient values are uniformly quantized into 32 levels; therefore, the gradient values are limited to [1,32].

Fig. 3. Bitmap and the structure element correlation map. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)


The red dotted lines indicate that there is correlation between two gradient values. In Fig. 5, gradient value 1 appears once, and $p_1 = 0$; gradient value 2 appears 0 times, and $p_2 = 0$. Gradient value 6 appears 6 times and appears 3 times in adjacent positions when gradient value 6 appears, so $p_6 = 3/6 = 0.5$.

Computing the feature vector of GVC using the above procedure, $GP = \{0, 0, 0, 0, 0.75, 0.5, 0.33, 0.2, 0.4, 0.5, 0.33, 0.33, 0, 0, 0.43, 0.5, 0.57, 0.43, 0.4, 0.25, 0.57, 1, 0.5, 0, 0.33, 0, 0, 0, 0.5, 0.33, 0, 0\}$.
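GVC follows the same self-correlation pattern as SEC, only over a quantized gradient value map; a sketch under the same "at least one identical 8-neighbor" reading is below, with a small quantization helper. The helper names and the normalization by the maximum gradient value are our assumptions, not from the paper.

```python
import numpy as np

def self_correlation(qmap, levels):
    """Per level v: probability that an occurrence of v has a neighbor
    (Chebyshev distance <= 1) with the same level. Uses D = 1, i = j."""
    p = np.zeros(levels)
    for v in range(1, levels + 1):
        occ = np.argwhere(qmap == v)
        if len(occ) == 0:
            continue
        hits = 0
        for r, c in occ:
            nb = qmap[max(r - 1, 0):r + 2, max(c - 1, 0):c + 2]
            if np.count_nonzero(nb == v) > 1:     # another v besides the center
                hits += 1
        p[v - 1] = hits / len(occ)
    return p

def quantize(values, levels, vmax):
    """Uniformly quantize values in [0, vmax] into integer levels 1..levels."""
    q = np.floor(values / vmax * levels).astype(int) + 1
    return np.clip(q, 1, levels)

# Example: GVC from the gradient value image of Section 4.1 (sketch):
# g_max, theta = color_gradient(m_max)
# gvc = self_correlation(quantize(g_max, 32, g_max.max()), 32)
```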

4.3. Gradient direction correlation

In Section 4.1, the gradient direction of every pixel was computed. In this paper, gradient direction correlation (GDC) is used to denote the relations between the gradient directions of pixels. The computed gradient direction satisfies $\theta(x,y) \in [1,180)$. In order to reduce the complexity of computing GDC, the gradient directions are uniformly quantized into 90 levels; therefore, $\theta(x,y) \in [1,90]$.

GDC is defined as the probability of two gradient directions appearing together in the gradient image, and is described as

$$\theta P_{ij} = \Pr\left\{\theta(x,y) = \theta_i \mid \theta(x',y') = \theta_j\right\}, \quad i, j \in [1,90] \qquad (19)$$

In Eq. (19), $\max(|x-x'|, |y-y'|) = D$; $\theta P_{ij}$ is the probability that one gradient direction is $\theta_i$ given that the other gradient direction is $\theta_j$ and the distance between them is not greater than $D$. In practice, $i = j$ and $D = 1$ are generally used; this is the probability that, when a gradient direction is $\theta_i$, a neighboring gradient direction is the same.

The gradient direction is limited to [1,90]. In order to reduce the complexity of the description and illustrate the principle, in Fig. 6 the gradient direction is limited to [1,10]. In Fig. 6, the red dotted lines indicate two gradient directions that have correlation. Computing the GDC using Eq. (19): in Fig. 6, gradient direction 1 appears 3 times, and appears twice in adjacent positions when gradient direction 1 appears, so $p_1 = 0.67$. Gradient direction 2 appears twice, and appears once in adjacent positions when gradient direction 2 appears, so $p_2 = 0.5$.

Computing the feature vector of GDC using the above procedure, $\theta P = \{0.67, 0.5, 0.5, 0.5, 0.2, 0.33, 0.33, 0.5, 0.33, 0.5\}$.
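GDC can reuse the same self_correlation helper sketched in Section 4.2 (our helper, not a routine from the paper); only the quantization differs. Mapping the direction returned by the gradient sketch into [0, 180) is our reading of the paper's $\theta(x,y) \in [1,180)$.

```python
import numpy as np

# theta from color_gradient() plays the role of 2*theta' and has period
# 2*pi, so the physical edge direction theta' = theta / 2 is mapped into
# [0, 180) degrees before quantizing into 90 levels (an assumption).
direction_deg = (np.degrees(theta) / 2.0) % 180.0
gdc = self_correlation(quantize(direction_deg, 90, 180.0), 90)
```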

5. Similarity measure

The feature vector is used to represent the image; therefore, the similarity of two images can be converted to the similarity of two feature vectors. If two feature vectors are similar, then the two images are similar.

In this paper, the image's feature vector F contains three portions.

Fig. 5. Gradient correlation map. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 6. Gradient direction map. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 4. Gradient image. (a) Original image. (b) Gradient image.


The feature vector of SEC is $SP = \{p_1, p_2, \ldots, p_8\}$, the feature vector of GVC is $GP = \{p_1, p_2, \ldots, p_{32}\}$, and the feature vector of GDC is $\theta P = \{p_1, p_2, \ldots, p_{90}\}$; therefore, the feature vector of the image has 130 dimensions.

Let the query image's feature vector be $F = \{SP, GP, \theta P\}$ and a dataset image's feature vector be $F' = \{SP', GP', \theta P'\}$. The distance between the two feature vectors is described as follows:

$$Sim(SP, SP') = \left(\sum_{i=1}^{8}\left(p_i - p_i'\right)^2\right)^{1/2} \qquad (20)$$

$$Sim(GP, GP') = \left(\sum_{i=1}^{32}\left(p_i - p_i'\right)^2\right)^{1/2} \qquad (21)$$

$$Sim(\theta P, \theta P') = \left(\sum_{i=1}^{90}\left(p_i - p_i'\right)^2\right)^{1/2} \qquad (22)$$

$$SIM(F, F') = w_1 \cdot Sim(SP, SP') + w_2 \cdot Sim(GP, GP') + w_3 \cdot Sim(\theta P, \theta P') \qquad (23)$$

In Eq. (23), $w_1$, $w_2$ and $w_3$ are the weights of the similarity of each feature vector, and $w_1 + w_2 + w_3 = 1$. The value of each weight is given in the experiments.
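A minimal sketch of the weighted distance of Eqs. (20)-(23); the default weights (0.1, 0.3, 0.6) are the best-performing values reported in Section 6, and the tuple interface is our own.

```python
import numpy as np

def similarity(f_query, f_db, weights=(0.1, 0.3, 0.6)):
    """Weighted distance SIM(F, F') of Eq. (23).

    f_query, f_db: tuples (SP, GP, thetaP) of NumPy arrays with lengths
    8, 32 and 90. Smaller return values mean more similar images.
    """
    return sum(w * np.linalg.norm(a - b)          # Eqs. (20)-(22): Euclidean terms
               for w, a, b in zip(weights, f_query, f_db))
```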

6. Experimental results

Many image datasets are used in CBIR systems, such as UCID [33], NISTER [34], and the Corel image dataset, with the Corel image dataset being the most widely used. In our experiments, two Corel image datasets are used. The first is the Corel-1000 dataset, which is divided into 10 categories including landscapes, horses, elephants, human beings, buses, flowers, buildings, mountains, food and dragons, with each category containing 100 images. The other is the Corel-10000 dataset, which contains 100 categories including beach, car, fish, outdoor and sunset, with each category containing 100 images. The experimental images cover a variety of contents.

In the experiment on the Corel-1000 dataset, we randomly choose 10 images from each category and use them as query images. In each category, we compute the precision and recall percentages; the mean precision-recall percentages are then computed over the 10 retrieval results obtained with the 10 random query images. In the Corel-10000 dataset, 20 categories are chosen randomly from the 100 categories; 10 images are chosen from each chosen category and used as query images. Then, the mean precision-recall percentages are computed.

The commonly used performance measure, the precision-recall pair, is used for the evaluation of retrieval performance. The precision P is defined as the ratio between the number of retrieved relevant images M and the total number of retrieved images N; it measures the accuracy of the retrieval. The recall R is defined as the ratio between the number of retrieved relevant images M and the total number of relevant images S in the whole database; it measures the robustness of the retrieval. Precision P and recall R are computed by the following equations:

$$P = \frac{M}{N} \qquad (24)$$

$$R = \frac{M}{S} \qquad (25)$$
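For completeness, a small sketch of Eqs. (24) and (25) for a single query; the identifiers are illustrative.

```python
def precision_recall(retrieved, relevant):
    """Precision (Eq. (24)) and recall (Eq. (25)) for one query.

    retrieved: ranked list of returned image ids (length N).
    relevant: set of ids relevant to the query in the whole database (size S).
    """
    m = sum(1 for img in retrieved if img in relevant)   # M: relevant retrieved
    return m / len(retrieved), m / len(relevant)
```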

Fig. 7 displays the experimental results comparing the gradient value quantized into 16, 32 and 64 levels. In the experiment, the performance with more levels is better than with fewer because more details are extracted at higher resolution. However, in Fig. 7, the performance of 64 levels roughly equals that of 32 levels, while the performance of 32 levels is obviously higher than that of 16. Therefore, 32 levels are utilized in our method.

Fig. 8 displays the experimental results comparing the gradient direction quantized into 45, 90 and 180 levels. Again, more levels perform better than fewer because more details are extracted. However, in Fig. 8, the performance of 180 levels roughly equals that of 90 levels, and the performance of 90 levels is obviously higher than that of 45. Therefore, 90 levels are used in our method.

Fig. 9 displays the retrieval performance for different weight values in the similarity measure. The retrieval performance is best when the weights are $(w_1, w_2, w_3) = (0.1, 0.3, 0.6)$. Because GDC can effectively represent edges and texture in image retrieval, and GVC is related to GDC, GDC is the prominent factor in our method, so the weight of GDC should be larger than the others.

Fig. 10 displays the experimental comparison between our method and the methods MSD [2], SED [8] and CSD3 [9]. The retrieval results show that the proposed method performs better in image retrieval than the other methods. In Fig. 10(a), all of the top 10 images are relevant to the query image in our proposed method.

Fig. 7. The comparison results of quantizing the gradient value into 16, 32 and 64 levels.


In MSD, SED and CSD3, the precision of the top 10 images is 85%, 97% and 81%, respectively. On the other hand, when the recall reaches 100%, the precision reaches only 39%, 45% and 51% in MSD, SED and CSD3, respectively. SED has better precision than our method in portions of the curve, but our method performs better on the whole. Fig. 10(b) shows that the performance of the proposed method is higher than that of the other methods when the Corel-10000 dataset is used.

Fig. 8. The comparison results of quantizing the gradient directions into 45, 90 and 180 levels. (a) Corel-1000 dataset. (b) Corel-10000 dataset.

Fig. 9. The results comparing different weight values. (a) Corel-1000 dataset. (b) Corel-10000 dataset.

Fig. 10. The results comparing the four methods. (a) Corel-1000 dataset. (b) Corel-10000 dataset.


Therefore, we can conclude that the performance of our proposed method is higher than that of the other three methods.

In Fig. 10, the proposed method outperforms MSD [2], SED [8] and CSD3 [9]. In the proposed method, we use multi-factors correlation (MFC) to describe the image, comprising structure element correlation (SEC), gradient value correlation (GVC) and gradient direction correlation (GDC). The structure elements can effectively represent the bitmap generated by BTC, and SEC can effectively denote the bitmap's structure and the correlation of the blocks in the bitmap. GVC and GDC can effectively denote the gradient relation, which is computed from the mean color component image. Therefore, the feature vectors can effectively represent the image and outperform the other methods.

In MSD [2], the micro-structure descriptor is proposed. The MSD is defined by the underlying colors in micro-structures with similar edge orientations, and the feature vector is extracted from the edge orientation; the feature vector therefore captures only part of the image's features. In the proposed method, the feature vector is a global feature of the image. In SED [8], the feature vector is extracted by the structure element histogram (SEH), which combines color information with the SED; the SED is composed of five structure elements denoting five different directions, 72 main colors are quantized in the HSV color space, and the SEH is extracted for every bin. In CSD3 [9], the method describes the image by color and spatial distributions, and the feature vector is extracted from a single feature, shape or color. In the proposed method, the feature vector is extracted from color and texture features simultaneously and can represent them more effectively. Therefore, the proposed method outperforms MSD [2], SED [8] and CSD3 [9].

In the proposed method, let the image's size be m×n. Firstly, every pixel of the image is scanned when computing the block truncation coding; the complexity is O(mn). Secondly, the bitmap is scanned when computing the SEC; the complexity is O(mn). Thirdly, the max mean value color component is scanned twice; the complexity is O(2 × m/2 × n/2), which can be considered O(mn). Therefore, the computational complexity of the algorithm is O(mn).

In the experiment, the Corel-1000 dataset was used to measure the computational cost. The feature vector was extracted from every image in Corel-1000, the time was recorded, and the mean time was computed. The mean times of our method and the SED, MSD and CSD3 methods are 0.227 s, 0.324 s, 0.258 s and 0.475 s, respectively. The computer configuration is as follows: the CPU is a dual-core 2.2 GHz and the RAM is 2 GB.

Finally, three examples of image retrieval results are displayed, using the Corel-1000 dataset. The similarity was computed between the query image and the image database, and the images were then ranked. The top 11 images are displayed in our experiment. Three different kinds of images were tested with our proposed retrieval method. The results are displayed in Figs. 11-14. In Fig. 14, the results show that objects with less texture can also achieve good performance and high precision.

In the experiment, the retrieval precision for buses, flowers and dragons is higher than the precision for food. The top 11 retrieved images for buses, flowers and dragons are all relevant to the query image, whereas only 8 of the top 11 images retrieved for food are relevant to the query image. The reason why the precision for buses, flowers and dragons is higher than for food is that these images have significant regions and significant texture, whereas the texture of food is not significant in the image. Therefore, we can conclude that the precision is higher when the query image has more significant regions and stronger texture than for images whose regions are not significant.

In summary, the proposed method is most suitable for images that have significant regions, significant objects and strong texture, such as buses, flowers and dragons. On such images, our method achieves better performance and higher precision.

Fig. 11. Image retrieval for bus.


Fig. 12. Image retrieval for flower.

Fig. 13. Image retrieval for food.

Fig. 14. Image retrieval for dragons.


7. Conclusion

In this paper, we proposed MFC, formed by SEC, GVC and GDC, to describe an image. Firstly, the RGB color space image is converted to a bitmap image and a mean color component image utilizing BTC. Then, the three correlations are used to extract the image features. The structure elements can effectively represent the bitmap generated by BTC, and SEC can effectively denote the bitmap's structure and the correlation of the blocks in the bitmap. GVC and GDC can effectively denote the gradient relation, which is computed from the mean color component image. The image feature vectors are formed by SEC, GVC and GDC, and they can effectively represent the image. For the similarity measure, we use a weighted Euclidean distance to calculate the similarity between the query image and the images in the database. The retrieval performance of the MFC-based approach was tested and compared with several other retrieval methods. The experimental results show that the algorithm performs better than the other retrieval methods.

Conflict of interest statement

We declare that we have no financial or personal relationships with other people or organizations that could inappropriately influence our work, and that there is no professional or other personal interest of any nature in any product, service or company that could be construed as influencing the position presented in, or the review of, this manuscript.

Acknowledgments

This research was supported by the National Natural Science Foundation of China (Nos. 61370145, 61173183, and 60973152), the Doctoral Program Foundation of Institution of Higher Education of China (No. 20070141014), the Program for Liaoning Excellent Talents in University (No. LR2012003), the National Natural Science Foundation of Liaoning Province (No. 20082165) and the Fundamental Research Funds for the Central Universities (No. DUT12JB06).

References

[1] G.H. Liu, L. Zhang, Y.K. Hou, Z.Y. Li, J.Y. Yang, Image retrieval based on multi-texton histogram, Pattern Recognit. 43 (7) (2010) 2380–2389.
[2] G.H. Liu, Z.Y. Li, L. Zhang, Y. Xu, Image retrieval based on micro-structure descriptor, Pattern Recognit. 44 (9) (2011) 2123–2133.
[3] G.P. Qiu, Color image indexing using BTC, IEEE Trans. Image Process. 12 (1) (2003) 93–101.
[4] R. Kwitt, P. Meerwald, A. Uhl, Efficient texture image retrieval using copulas in a Bayesian framework, IEEE Trans. Image Process. 20 (7) (2011) 2063–2077.
[5] E. Aptoula, S. Lefèvre, Morphological description of color images for content-based image retrieval, IEEE Trans. Image Process. 18 (11) (2009) 2505–2517.
[6] G. Quellec, M. Lamard, G. Cazuguel, B. Cochener, C. Roux, Fast wavelet-based image characterization for highly adaptive image retrieval, IEEE Trans. Image Process. 21 (4) (2012) 1613–1623.
[7] M. Heikkilä, M. Pietikäinen, C. Schmid, Description of interest regions with local binary patterns, Pattern Recognit. 42 (3) (2009) 425–436.
[8] X.Y. Wang, Z.Y. Wang, A novel method for image retrieval based on structure elements' descriptor, J. Vis. Commun. Image Represent. 24 (1) (2013) 63–74.
[9] C.H. Lin, D.C. Huang, Y.K. Chan, K.H. Chen, Y.J. Chang, Fast color-spatial feature based image retrieval methods, Expert Syst. Appl. 38 (9) (2011) 11412–11420.
[10] G. Michèle, Z. Bertrand, Body color sets: a compact and reliable representation of images, J. Vis. Commun. Image Represent. 22 (1) (2011) 48–60.
[11] Z. Konstantinos, E. Kavallieratou, P. Nikos, Image retrieval systems based on compact shape descriptor and relevance feedback information, J. Vis. Commun. Image Represent. 22 (5) (2011) 378–390.
[12] A. El-ghazal, O. Basir, S. Belkasim, Invariant curvature-based Fourier shape descriptors, J. Vis. Commun. Image Represent. 23 (4) (2012) 622–633.
[13] T. Celik, T. Tjahjadi, Bayesian texture classification and retrieval based on multiscale feature vector, Pattern Recognit. Lett. 32 (2) (2011) 159–167.
[14] N.D. Thang, T. Rasheed, Y.K. Lee, S.Y. Lee, T.S. Kim, Content-based facial image retrieval using constrained independent component analysis, Inf. Sci. 181 (15) (2011) 3162–3174.



[15] K. Zhan, H.J. Zhang, Y.D. Ma, New spiking cortical model for invariant texture retrieval and image processing, IEEE Trans. Neural Netw. 20 (12) (2009) 1980–1986.
[16] R. Kwitt, A. Uhl, Lightweight probabilistic texture retrieval, IEEE Trans. Image Process. 19 (1) (2010) 241–253.
[17] G. Quellec, M. Lamard, G. Cazuguel, B. Cochener, C. Roux, Adaptive nonseparable wavelet transform via lifting and its application to content-based image retrieval, IEEE Trans. Image Process. 19 (1) (2010) 25–35.
[18] S. Murala, R.P. Maheshwari, R. Balasubramanian, Local tetra patterns: a new feature descriptor for content-based image retrieval, IEEE Trans. Image Process. 21 (5) (2012) 2874–2886.
[19] X.F. He, Laplacian regularized D-optimal design for active learning and its application to image retrieval, IEEE Trans. Image Process. 19 (1) (2010) 254–263.
[20] W. Bian, D.C. Tao, Biased discriminant Euclidean embedding for content-based image retrieval, IEEE Trans. Image Process. 19 (2) (2010) 545–554.
[21] D.S. Zhang, M.M. Islam, G.J. Lu, A review on automatic image annotation techniques, Pattern Recognit. 45 (1) (2012) 346–362.
[22] J.L. Liu, D.G. Kong, Image retrieval based on weighted blocks and color feature, in: Proceedings of the International Conference on Mechatronic Science, Electric Engineering and Computer, Jilin, 2011, pp. 921–924.
[23] B.S. Manjunath, J.R. Ohm, V.V. Vasudevan, A. Yamada, Color and texture descriptors, IEEE Trans. Circuits Syst. Video Technol. 11 (6) (2001) 703–715.
[24] F. Mahmoudi, J. Shanbehzadeh, Image retrieval based on shape similarity by edge orientation autocorrelogram, Pattern Recognit. 36 (8) (2003) 1725–1736.
[25] D.G. Lowe, Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vis. 60 (2) (2004) 91–110.
[26] G.H. Liu, J.Y. Yang, Image retrieval based on the texton co-occurrence matrix, Pattern Recognit. 41 (12) (2008) 3521–3527.
[27] W.T. Chen, W.C. Liu, M.S. Chen, Adaptive color feature extraction based on image color distributions, IEEE Trans. Image Process. 19 (8) (2010) 2005–2016.
[28] E.J. Delp, O.R. Mitchell, Image coding using block truncation coding, IEEE Trans. Commun. 27 (3) (1979) 1335–1342.
[29] T. Kurita, N. Otsu, A method of block truncation coding for color image compression, IEEE Trans. Commun. 41 (9) (1993) 1270–1274.
[30] Y. Wu, D.C. Coll, Single bit-map block truncation coding of color images, IEEE J. Sel. Areas Commun. 10 (5) (1992) 952–959.
[31] S. Di Zenzo, A note on the gradient of a multi-image, Comput. Vis. Graph. Image Process. 33 (1) (1986) 116–125.
[32] D.G. Lowe, Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vis. 60 (2) (2004) 91–110.
[33] G. Schaefer, M. Stich, UCID – an uncompressed colour image database, in: Proceedings of the Conference on Storage and Retrieval Methods and Applications for Multimedia, San Jose, 2004, pp. 472–480.
[34] D. Nistér, H. Stewénius, Scalable recognition with a vocabulary tree, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Washington, 2006, pp. 2161–2168.

Xingyuan Wang is currently a professor with the School of Computer Science and Technology, Dalian University of Technology, China. He received a PhD degree from Northeastern University. His current research interests are in the areas of image processing, chaos theory and the application of fractal theory in image processing.

Zongyu Wang is currently a graduate student for a Master's degree with the School of Computer Science and Technology, Dalian University of Technology, China. His current research interests include image processing, image retrieval and pattern recognition.
