

Image retrieval based on a statistical model with effective machine learning strategy and similarity measure

Publication History
Received: 31 July 2016
Accepted: 02 September 2016
Online First: 5 September 2016
Published: October-December 2016

Citation
Sathiamoorthy S, Saravanan A. Image retrieval based on a statistical model with effective machine learning strategy and similarity measure. Discovery Engineering, 2016, 4(14), 534-539

Publication License
This work is licensed under a Creative Commons Attribution 4.0 International License.

General Note

Article is recommended to print as digital color version in recycled paper.

Discovery Engineering Analysis An International Journal ISSN 2320 – 6675 EISSN 2320 –6853 © 2016 Discovery Publication. All Rights Reserved


Image retrieval based on a statistical model with effective machine learning strategy and similarity measure

S. Sathiamoorthy*, A. Saravanan

Computer Science and Engineering Wing, DDE

Annamalai University Annamalai Nagar-608002 *[email protected]

Abstract— Image retrieval has been an important research interest over the past decade. To retrieve similar images effectively from huge image repositories, this paper uses the color autocorrelogram and the edge orientation autocorrelogram (EOAC), which represent the global distribution of local spatial correlation between identical colors and between identical edge orientations, respectively. Moreover, micro-textures are used to represent texture at both the local and the global level. A support vector machine (SVM) with vector valued decision (VVD) is used to categorize images accurately. The Canberra metric is used to measure the distance between query and target images. The experimental results indicate that the proposed method outperforms other methods in terms of retrieval accuracy.

Keywords—Color autocorrelogram; micro-textures; edge orientation autocorrelogram; support vector machine; vector valued decision.

I. INTRODUCTION

Since the 1970s, image retrieval has been a very active and important research field in various domains. The earliest image retrieval methods were text based: each image in an image database is annotated with keywords, which are then used to retrieve similar images. This approach has two main problems [1]. First, two different people may annotate the same image with different keywords. Second, annotating huge image repositories is time consuming and tedious. These drawbacks of text-based image retrieval were later addressed by content based image retrieval (CBIR) methods, in which low-level visual contents of an image such as color, texture, shape, and spatial and temporal features are extracted automatically and summarized to perform image retrieval [2-12].

Since color is the most important feature that humans perceive when viewing an image, numerous methods for describing the color feature of an image have been presented [13], such as color moments, color histogram, color coherence vector, scalable color descriptor, dominant color descriptor, color structure descriptor, spatial chromatic histogram, color anglogram, chromaticity moments, spatial information of colors using entropy, and so on. However, the literature reports that the color autocorrelogram presented by Huang et al. [14], which captures the global distribution of local spatial correlation between identical color elements, is significantly superior in accuracy; is robust to changes in color, appearance, contrast and brightness; and has lower computational and storage cost than the existing methods [2,13]. Thus, the color autocorrelogram has been chosen in this paper as the color feature.

Texture features play a noteworthy role in understanding the structural arrangement of surfaces and their relationship to the surrounding surfaces in an image. Hence, several techniques [13] have been introduced and successfully used for image classification and retrieval, such as the texture energy approach introduced by Laws, gray level co-occurrence matrix, gray level spatial dependence of texture, fractal dimension, local binary pattern, scale invariant feature transform, texton co-occurrence matrix, multi-texton histogram, local ternary co-occurrence matrix, and texture analysis in the frequency domain using wavelet transforms, namely orthogonal, tree-structured, pyramid, Gabor, Daubechies, etc. Recently, the method in [15] characterizes and represents texture at the local as well as the global level; it is employed in the proposed method because it significantly outperforms the existing texture analysis techniques.

As humans can recognize objects solely from their shapes, shape features provide potentially useful information for image classification. Since human beings recognize shapes mainly by their boundary features, the contour is of greater interest in many applications than region based shape methods. Thus, a large number of algorithms have been presented for boundary based shape representation [10,12,16-20], such as the chain code method; chain code histogram; simple global contour descriptors like area, eccentricity, circularity, bending energy, major axis orientation, convexity, ratio of principal axes, elliptic variance and circular variance; shape signatures, namely centroidal profile, cumulative angle, complex coordinates, tangent angle, curvature and chord-length; boundary moments; the edge histogram descriptor and its improved version; the edge orientation autocorrelogram (EOAC); the scale space method; and so on. Recently, Seetharaman and Sathiamoorthy [12] introduced a new version of EOAC, which describes the global distribution of local spatial correlation between similar edge orientations in HSV color space using the very minute and fine edges extracted by the Full Range Autoregressive (FRAR) model with a Bayesian approach, and which is significantly better than the conventional contour based techniques. Hence, this new version of EOAC is adopted in this paper to represent the shape feature of an image.

The extracted features of an image are combined, represented as a vector of finite dimension, and normalized using the Z-score normalization method [12], which removes outliers in the data and keeps the same range of values for all the features in the composite feature vector.
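The normalization step can be illustrated with a minimal Python sketch that applies Z-score normalization feature by feature to a small set of composite feature vectors. The function name `zscore_normalize` and the toy data are assumptions made for this example; the use of the population standard deviation and the mapping of constant features to zero are also choices of this sketch, which the paper does not specify.

```python
import math

def zscore_normalize(vectors):
    """Z-score normalize each feature (column) across a list of feature vectors."""
    n = len(vectors)
    dims = len(vectors[0])
    # Per-feature mean and population standard deviation.
    means = [sum(v[d] for v in vectors) / n for d in range(dims)]
    stds = [
        math.sqrt(sum((v[d] - means[d]) ** 2 for v in vectors) / n)
        for d in range(dims)
    ]
    # Constant features (std == 0) are mapped to 0.0 to avoid division by zero.
    return [
        [(v[d] - means[d]) / stds[d] if stds[d] > 0 else 0.0 for d in range(dims)]
        for v in vectors
    ]

# Toy data: three feature vectors whose features have very different ranges.
features = [[1.0, 200.0, 5.0], [2.0, 400.0, 5.0], [3.0, 600.0, 5.0]]
normalized = zscore_normalize(features)
```

After normalization, the first two features take the same values even though their raw ranges differ by two orders of magnitude, which is the property the composite feature vector relies on.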

In the human visual perception system, color, shape and texture features do not carry equal weight in differentiating images. Therefore, the proposed method considers the degree of importance of each computed feature when determining the class of an image. It assigns weights to each computed feature using a training data set in the classification phase, and it is compared with the existing approach [12], which was reported to be effective for medical image retrieval using an adaptive binary tree based SVM.

In the literature, numerous linear and non-linear classifiers [10-12,15,21] based on statistical and mathematical models and on neural networks have been used, such as the Bayes classifier, graphs, backpropagation neural network, genetic algorithm, radial basis function neural network, decision tree algorithm, support vector machines, etc. Because conventional classification methods are less capable of dealing with very large, complicated, non-linear and dependent data, whereas artificial neural networks offer more flexibility in modeling and reasonable accuracy in classification problems, artificial neural networks have been widely used in diverse domains [10-12,21].

Over the decades, the support vector machine (SVM), developed by Vapnik on the basis of statistical learning theory, has been used extensively for non-linear and non-separable problems in diverse domains such as medicine and engineering, owing to its high generalization ability, robustness to high-dimensional data and well-defined learning theory. Since the SVM provides a potential solution for classification tasks even when the datasets are unbalanced, it has attracted much attention from researchers and has led to the successful development of various refined SVM approaches such as one-against-all (OAA), one-against-one (OAO), all-and-one (A&O), directed acyclic graph SVM (DAGSVM), decision directed acyclic graph SVM (DDAGSVM), adaptive directed acyclic graph SVM (ADAGSVM), binary tree based SVM (BTS), adaptive binary tree based SVM (ABTSVM), decision tree SVM (DTS), etc. [21]. Recently, an SVM based on vector valued decision (SVM-VVD) was introduced in [21]; its classification efficiency is superior to that of the existing SVM approaches, it reduces the computational complexity of both the training and testing phases, and it offers better interpretability with respect to domain knowledge in image classification. Hence, this paper exploits SVM-VVD to filter out the irrelevant classes of images before the final stage, in which the similarity between the query and target images is measured. We believe that the combination of the proposed composite feature vector and SVM-VVD overcomes the shortcomings of existing image retrieval methods [2-12,14].

The rest of the paper is organized as follows. The full range autoregressive (FRAR) model is described in Section II, and the feature extraction method is explained in Section III. Section IV describes image classification, and Section V provides the experimental results and discussion. Finally, the conclusion is given in Section VI.

II. FRAR MODEL

Recently, a refined autoregressive model called the full range autoregressive (FRAR) model was used in [10,12] to characterize texture and shape features more effectively. Thus, we employ the FRAR model with a Bayesian approach (BA) [10,12] to extract the texture and shape features in the proposed retrieval method.

Let I(x, y) be a random variable that represents the intensity value of the pixel at location (x, y) in an image of size L × L. The FRAR model [10,12] is expressed in equation (1).

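$$I(x, y) = \sum_{i=-M/2}^{M/2} \; \sum_{j=-M/2}^{M/2} P_o \, I(x+i,\, y+j) + \varepsilon(x, y), \qquad (i, j) \neq (0, 0) \tag{1}$$

where $P_o = K \sin(o\theta) \cos(o\phi) / \alpha^{o}$ and $o = |i| + |j| + M(M-1)/2$.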

In equation (1), the term I(x+i, y+j) is the spatial variation due to image properties, ε(x, y) is the spatial variation due to additive noise, and the model coefficients P_o (o = 1, 2) are the variation among the low-level primitives in the sub-image region of size M × M. The model coefficients are interrelated. The interrelationship is established through the model parameters K, α, θ and ϕ, which are estimated using the Bayesian approach (BA) [10,12]. The initial assumptions about the model parameters are K ∈ R, α > 1, and θ, ϕ ∈ [0, 2π]. The model parameter K is real valued; α represents the strength of the linear dependency of a pixel on its neighbouring pixels in terms of spatial orientation; and θ and ϕ represent the directions of the pixels, as they are associated with the circular functions sine and cosine.
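To make the role of the four parameters concrete, the following minimal Python sketch evaluates coefficients of the form P_o = K sin(oθ) cos(oϕ) / α^o. This coefficient form is an assumption of the sketch, as are the function name `frar_coefficient` and the parameter values, which are arbitrary illustrative choices rather than Bayesian estimates from [10,12].

```python
import math

def frar_coefficient(o, K, alpha, theta, phi):
    """Assumed coefficient form: P_o = K * sin(o*theta) * cos(o*phi) / alpha**o."""
    if alpha <= 1:
        raise ValueError("the model assumes alpha > 1")
    return K * math.sin(o * theta) * math.cos(o * phi) / alpha ** o

# Arbitrary illustrative parameter values (not estimated from image data).
K, alpha, theta, phi = 1.0, 2.0, math.pi / 2, 0.0
coeffs = [frar_coefficient(o, K, alpha, theta, phi) for o in range(1, 5)]
```

Under this form, the magnitude of each coefficient is bounded by |K| / α^o, so with α > 1 the coefficients shrink as the index o grows; all of them are driven by just four parameters, which is what makes the coefficients interrelated.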

III. FEATURE EXTRACTION

A. Color

The HSV (hue, saturation, value) color space is the most commonly used because it is more intuitive to the human visual system and yields higher classification performance than RGB in both noisy and noise-free conditions [10,12]; this motivates the use of the HSV color space in this work. The image in HSV color space is separated into H, S and V components. Since the H and S components contain the chromatic information, each of them is uniformly quantized into 8 levels using the generalized Lloyd algorithm [2,14]. The global distribution of local spatial correlation between identical color elements at distance 1, called the color autocorrelogram, is then computed.

Let I be an image of size n × n whose colors are quantized into m colors c1, c2, c3, …, cm, and let p1 and p2 be the pixels at locations (x1, y1) and (x2, y2), respectively. The color autocorrelogram [2,14] is expressed as in equation (2).

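$$\alpha_{c}^{(k)}(I) = \gamma_{c,c}^{(k)}(I) \tag{2}$$

where $\gamma_{c_i,c_j}^{(k)}(I) = \Pr\left[\, p_2 \in I_{c_j},\ |p_1 - p_2| = k \mid p_1 \in I_{c_i} \,\right]$.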
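The autocorrelogram can be sketched in a few lines of Python. The sketch below is illustrative only: the function name `color_autocorrelogram` and the toy image are assumptions, and distance is measured with the chessboard (L-infinity) norm, which the paper does not state explicitly. For each quantized color c, it estimates the probability that a pixel at distance k from a pixel of color c also has color c.

```python
def color_autocorrelogram(image, num_colors, k=1):
    """Estimate, for each color c, the probability that a pixel at distance k
    from a pixel of color c also has color c.

    `image` is a 2-D list of quantized color indices in range(num_colors).
    Distance uses the chessboard (L-infinity) norm, an assumption of this sketch.
    """
    rows, cols = len(image), len(image[0])
    same = [0] * num_colors   # pairs at distance k that share color c
    total = [0] * num_colors  # all in-image pairs at distance k starting at color c
    # Offsets on the square ring at chessboard distance exactly k.
    ring = [(dy, dx) for dy in range(-k, k + 1) for dx in range(-k, k + 1)
            if max(abs(dy), abs(dx)) == k]
    for y in range(rows):
        for x in range(cols):
            c = image[y][x]
            for dy, dx in ring:
                ny, nx = y + dy, x + dx
                if 0 <= ny < rows and 0 <= nx < cols:
                    total[c] += 1
                    if image[ny][nx] == c:
                        same[c] += 1
    return [same[c] / total[c] if total[c] else 0.0 for c in range(num_colors)]

# Toy 3 x 3 image already quantized into two colors (0 and 1).
toy = [[0, 0, 1],
       [0, 0, 1],
       [1, 1, 1]]
acg = color_autocorrelogram(toy, num_colors=2, k=1)
```

Applied separately to the H and S components, each quantized into 8 levels at distance 1, such a routine would yield 8 values per component, that is, a 16-dimensional color feature.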

B. Texture

The texture feature is computed from the V component image because it contains the achromatic information. The V component image is divided into a number of sub-image regions of size 3 × 3. The autocorrelation coefficient is computed for each sub-image region using the parameters of the FRAR model [10,12], which are estimated using the BA. A statistical test of hypothesis [15] is performed on the computed autocorrelation coefficients to identify the textures in an image. Each identified texture is represented as a decimal number through the transformation (autocorrelation coefficient × 100) + 100, which yields a value between 0 and 200; these values are called texnums and serve as the local descriptor. The frequency of occurrence of the texnums is called the texspectrum, which is the global descriptor. The texnums versus texspectrum relationship is represented using a histogram.
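The texnum transformation and the texspectrum histogram can be sketched as follows. This is a simplified illustration: the function names and the sample coefficients are hypothetical, rounding to the nearest integer is an assumption, and the FRAR-based estimation of the autocorrelation coefficients and the hypothesis test of [15] are omitted.

```python
def texnum(autocorrelation):
    """Map an autocorrelation coefficient in [-1, 1] to an integer texnum in [0, 200]."""
    if not -1.0 <= autocorrelation <= 1.0:
        raise ValueError("autocorrelation coefficient must lie in [-1, 1]")
    # (coefficient * 100) + 100, rounded to the nearest integer (an assumption).
    return int(round(autocorrelation * 100)) + 100

def texspectrum(coefficients):
    """Histogram of texnums: 201 bins, one per possible texnum value."""
    hist = [0] * 201
    for rho in coefficients:
        hist[texnum(rho)] += 1
    return hist

# Hypothetical autocorrelation coefficients, one per 3 x 3 sub-image region.
rhos = [-1.0, -0.25, 0.0, 0.0, 0.5, 1.0]
spectrum = texspectrum(rhos)
```

Because texnums range over the 201 integers from 0 to 200, the resulting histogram has 201 bins, matching the 201-dimensional texture feature.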

C. Shape

The shape feature is extracted from the H, S and V component images in order to avoid the loss of edges due to spectral variations and chromatic changes [12]. The image in HSV color space is converted from the cylindrical to the Cartesian coordinate system and then separated into H, S and V component images. Very minute and fine edges are then detected in the H, S and V component images using the FRAR model with BA [12], which is robust to noise to some extent owing to the basic behavior of the employed model. The orientation of the edges is computed as in [12] and is expressed as in equation (3).

$$\theta = \arccos\left(\frac{a \cdot b}{|a|\,|b|}\right) \tag{3}$$

where a and b represent the gradients along the x and y directions, respectively, and are expressed as follows:

$$a = (H_x, S_x, V_x), \qquad b = (H_y, S_y, V_y),$$

and

$$a \cdot b = H_x H_y + S_x S_y + V_x V_y,$$

where $H_x$ is the gradient of the H component image along the horizontal direction, $S_x$ is the gradient of the S component image along the horizontal direction, and so on.
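As a rough illustration of the orientation computation, the Python sketch below takes the six gradient values at one pixel and returns the angle between the vectors a = (Hx, Sx, Vx) and b = (Hy, Sy, Vy), assuming the orientation is defined as arccos(a·b / (|a||b|)). The function name `edge_orientation` is hypothetical, and how the gradients themselves are obtained (the paper uses the FRAR model with BA) is outside the scope of the sketch.

```python
import math

def edge_orientation(hx, sx, vx, hy, sy, vy):
    """Angle in radians between a = (Hx, Sx, Vx) and b = (Hy, Sy, Vy)."""
    dot = hx * hy + sx * sy + vx * vy
    norm_a = math.sqrt(hx * hx + sx * sx + vx * vx)
    norm_b = math.sqrt(hy * hy + sy * sy + vy * vy)
    if norm_a == 0.0 or norm_b == 0.0:
        return None  # no gradient along one direction: orientation undefined
    # Clamp to [-1, 1] to guard against floating-point round-off.
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cosine)

# Orthogonal gradient vectors give an angle of pi / 2.
theta = edge_orientation(1.0, 0.0, 0.0, 0.0, 1.0, 0.0)
```

The clamping step matters in practice: for nearly parallel vectors the computed cosine can exceed 1 by a rounding error, which would otherwise make `math.acos` raise a `ValueError`.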

The edge pixels are uniformly quantized into 72 levels, based on their orientation, using the generalized Lloyd algorithm. The global distribution of local spatial correlation between similar edge pixels is then computed at distances {1, 3}; this is the new version of EOAC [12], which results in a 72-dimensional feature.

IV. IMAGE CLASSIFICATION

This work uses the multiclass SVM-VVD [21] in the classification phase because it reduces the computational complexity of both training and testing by using the least number of classifiers, maintains comparable accuracy, and eliminates unclassifiable regions by partitioning the feature space more effectively than the existing refined SVM approaches.

The SVM parameter C can be regarded as a regularization parameter that balances the minimization of the error function against the maximization of the margin of the optimal hyperplane. SVMs use different kernel functions, such as linear, polynomial and Gaussian RBF, which implicitly map the data from the input space into a higher dimensional feature space and determine the success of the SVM learning process. However, the choice of an appropriate kernel function depends on the data, and there is still no general method for choosing a suitable one. In this paper, the choice of kernel function was studied empirically, and the best results were achieved using the widely used Gaussian RBF kernel with the kernel parameter γ, which controls the trade-off between the complexity of the SVM and the number of non-separable points. The pair (γ, C) that achieved the highest validation rate is used to train on the whole training dataset.
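For reference, the Gaussian RBF kernel in its common parameterization, K(x, y) = exp(−γ‖x − y‖²), can be written in a few lines of Python. The sketch below also builds a small candidate grid of (γ, C) pairs; the grid values are hypothetical, since the paper does not list the values it searched, and the SVM-VVD training itself is not reproduced here.

```python
import math

def rbf_kernel(x, y, gamma):
    """Gaussian RBF kernel: K(x, y) = exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

# Hypothetical candidate grid for (gamma, C); not the values used in the paper.
gammas = [2.0 ** e for e in range(-5, 2, 2)]   # 2^-5, 2^-3, 2^-1, 2^1
costs = [2.0 ** e for e in range(-1, 6, 2)]    # 2^-1, 2^1, 2^3, 2^5
grid = [(g, c) for g in gammas for c in costs]
```

A larger γ makes the kernel value fall off faster with distance, so each training vector influences a smaller neighbourhood; each pair in `grid` would be scored by cross-validation and the best-scoring pair retained.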

In the experiments, five-fold cross-validation is used to attain the best generalization and to reduce over-fitting. The N feature vectors in the feature vector dataset are randomly divided into five subsets of approximately equal size. Each multi-class model is trained using four subsets and tested using the remaining subset. Thus, training is repeated five times, i.e., in five independent runs.
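The five-fold protocol described above can be sketched in Python as follows. The function names, the fixed random seed and the sample count of 1000 are assumptions for illustration; the classifier training call is left out.

```python
import random

def k_fold_indices(n_samples, n_folds=5, seed=0):
    """Randomly partition range(n_samples) into folds of approximately equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding assigns every n_folds-th shuffled index to the same fold.
    return [indices[i::n_folds] for i in range(n_folds)]

def train_test_splits(n_samples, n_folds=5, seed=0):
    """Yield (train_indices, test_indices) pairs, one per held-out fold."""
    folds = k_fold_indices(n_samples, n_folds, seed)
    for held_out in range(n_folds):
        test = folds[held_out]
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, test

# Example with 1000 feature vectors: five runs, each with 800 training and 200 test vectors.
splits = list(train_test_splits(n_samples=1000))
```

Every feature vector appears in exactly one test subset, so each vector is used for testing once and for training four times across the five runs.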

V. EXPERIMENTS

The proposed method is tested on the Corel 1k [22] and Freefoto [23] benchmark datasets. The Corel 1k dataset consists of 10 classes of images, namely buses, beaches, buildings, flowers, horses, elephants, food, mountains, dinosaurs and Africans; the images are in JPEG format and of size 256 × 384 or 384 × 256. The Freefoto dataset contains 5,226, 2,062, 685, 837, 1,067, 2,014, 2,256, 2,308 and 2,296 images in the nature, animals, clouds, floods, textures, gardens, seasons, sunrise and sunset, and agriculture categories, respectively; the images are in JPEG format and of size 125 × 83 or 83 × 125. Sample images from the experimental datasets are shown in Fig. 1.

Fig. 1. Sample images from (a) Corel 1k database (b) Freefoto database

The color, shape and texture features are computed as explained in Section III. The computed features are combined to form a composite feature vector of 289 dimensions (color: 16 (H component: 8, S component: 8), shape: 72 and texture: 201), which is stored, along with the images, in a separate space in the database called the feature vector space. Each feature vector in the feature vector space is normalized using Z-score normalization, and the normalized composite feature vector is then fed to the multiclass SVM-VVD to perform classification. The advantage of using SVM-VVD here is that it provides improved performance over other existing methods [21] while requiring only modest computation.

Fig. 2. (a) Average precision for Corel 1k dataset; (b) Average precision for Freefoto dataset; (c) Average recall for Corel 1k dataset; (d) Average recall for Freefoto dataset

The Canberra distance [24] is used to measure the similarity between the query feature vector and the target feature vectors. For each pair of features, the absolute difference is normalized by dividing it by the sum of the pair; each such term therefore lies between 0 and 1, with 0 for identical features and 1 for completely dissimilar features. The Canberra distance measure is expressed as in equation (4).

$$D = \sum_{i=1}^{n} \frac{|Q_i - T_i|}{|Q_i| + |T_i|} \tag{4}$$

where $Q_i$ and $T_i$ are the i-th features of the query and target feature vectors, respectively, and n is the number of features in the feature vector. The retrieval performance of the proposed and existing methods is evaluated using precision and recall [2,12]. Precision measures the accuracy of the retrieval, recall measures the robustness of the retrieval, and they are computed as in equations (5) and (6).

$$\text{Precision} = \frac{\text{Number of retrieved relevant images}}{\text{Total number of retrieved images}} \tag{5}$$

$$\text{Recall} = \frac{\text{Number of retrieved relevant images}}{\text{Total no. of relevant images of the whole dataset}} \tag{6}$$
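The similarity ranking and the two evaluation measures can be put together in a short Python sketch. The function names, the toy database and the choice to treat a 0/0 term as zero are assumptions of this example, not details taken from the paper.

```python
def canberra_distance(q, t):
    """Sum over features of |q_i - t_i| / (|q_i| + |t_i|); 0/0 terms count as 0."""
    total = 0.0
    for qi, ti in zip(q, t):
        denom = abs(qi) + abs(ti)
        if denom > 0:
            total += abs(qi - ti) / denom
    return total

def precision_recall(retrieved, relevant):
    """Precision and recall for a list of retrieved ids and a set of relevant ids."""
    hits = sum(1 for image_id in retrieved if image_id in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical toy database: image id -> feature vector.
database = {"a": [1.0, 2.0, 3.0], "b": [1.0, 2.0, 4.0], "c": [9.0, 0.0, 1.0]}
query = [1.0, 2.0, 3.0]
# Rank target images in ascending order of distance to the query.
ranked = sorted(database, key=lambda k: canberra_distance(query, database[k]))
precision, recall = precision_recall(ranked[:2], relevant={"a", "b"})
```

Sorting in ascending order of distance places the most similar images first, so truncating the ranked list at a given length gives the retrieved set on which precision and recall are computed.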

The average precision and recall of the proposed method and the existing method [12] for the Corel 1k and Freefoto datasets are shown in Fig. 2. The results show that the proposed method is significantly better than the existing approach [12] in terms of accuracy, taken here as the average of precision and recall. We believe that this is due to the effective combination of the multiclass SVM-VVD and the Canberra distance measure, in addition to the efficiency and effectiveness of the composite feature vector used in both the proposed and existing approaches. The retrieval results of the proposed method for the Corel 1k and Freefoto datasets, with the top 5 matches in ascending order of distance between the query and target images, are shown in Fig. 3 and Fig. 4, respectively.

Fig. 3. Top 5 retrieval results (b)-(f) obtained with the proposed method corresponding to the query image (a) from the Corel 1k dataset.

Fig. 4. Top 5 retrieval results (b)-(f) obtained with the proposed method corresponding to the query image (a) from the Freefoto dataset.


The experimental results indicate that the proposed image retrieval method attains high accuracy and will be very useful for retrieving images from personal photograph collections, enterprise image repositories, and domain specific image repositories such as satellite, agriculture, military, medicine and web repositories, which are huge in volume.

The proposed CBIR system is implemented with the system configuration: Pentium® Dual core personal computer with 2.20 GHz processor; 2 GB RAM; Java, MATLAB, Oracle and Windows 7.

VI. CONCLUSION

In this paper, a novel approach for image retrieval using the color autocorrelogram, the FRAR model with BA, and SVM-VVD is proposed. The retrieval performance of the proposed method is tested and compared with that of the existing method. The experimental results confirm that the proposed method significantly outperforms the existing method. In the future, the proposed framework can be utilized for video retrieval.

References

1. Stricker, M., Orengo, M., 1995. Similarity of color images. In: Proc. SPIE Storage and Retrieval for Image and Video Databases, San Jose, pp. 381–392.

2. Chun, Y.D., Kim, N.C. and Jang, I.H., 2008. Content-Based Image Retrieval using Multiresolution Color and Texture Features, IEEE Transactions on Multimedia, 10(6):1073-1084

3. Liu, G.H., Yang, J.Y., 2008. Image retrieval based on the texton co-occurrence matrix, Pattern Recognition. 41 (12): 3521–3527.

4. Liu G, Zhang L, Hou Y, Li Z, Yang J., 2010. Image retrieval based on multi-texton histogram. J. Pattern Recognit. 43: 2380–2389.

5. Liu G.H., Li, Z.Y., Zhang, L., Xu. Y., 2011. Image retrieval based on micro-structure descriptor. Pattern Recognition. 44(9): 2123-2133.

6. Krishnamoorthi, R., Sathiya devi, S., 2012. A multiresolution approach for rotation invariant texture image retrieval with orthogonal polynomials model, J. Vis. Commun. Image R. 23: 18–30.

7. Wang Xingyuan, Wang Zongyu, 2013. A novel method for image retrieval based on structure elements’ descriptor, J. Vis. Commun. Image R., 24 : 63–74.

8. Subrahmanyam Murala, Q.M. Jonathan Wu, 2013. Local ternary co-occurrence patterns: A new feature descriptor for MRI and CT image retrieval, Neurocomputing, 119: 399–412.

9. Seetharaman, K., Kamarasan, M, 2014. Statistical framework for image retrieval based on multiresolution features and similarity method. Multimed. Tools Appl.3 (1), 53–66.

10. Seetharaman. K., Sathiamoorthy. S., 2014. Color image retrieval using statistical model and radial basis function neural network. Egyptian Informatics Journal, 15, 59-68.

11. Seetharaman, K., Sathiamoorthy, S., 2014. A framework for color image retrieval using full range Gaussian Markov random field model and multi-class SVM learning approach. International Journal of Innovative Research in Advanced Engineering, 1(7): 53-63.

12. Seetharaman. K., Sathiamoorthy. S, 2016. A unified learning framework for content based medical image retrieval using a statistical model. Journal of King Saud University-Computer and Information Sciences, 28(1), 110-124.

13. Penatti, O. A. B., Valle, E., Torres, R. D. S., 2012. Comparative study of global color and texture descriptors for web image retrieval, J. Vis. Commun. Image R., vol. 23, pp. 359–380.

14. Huang, C.-L., Huang, D.-H., 1998. A content-based image retrieval system, Image Vision Comp. 16:149–163.

15. Seetharaman, K., and Palanivel. N, 2013. Texture characterization, representation, description, and classification based on full range Gaussian markov random field model with Bayesian approach. International journal of image and data fusion. pp. 1-24.

16. Andalo′. F.A., Miranda, P.A.V., Torres, R.S., Falcao, A.X., 2010. Shape feature extraction and description based on tensor scale, Pattern Recognition. 43(1): 26-36.

17. Michel, D., Oikonomidis, I., Argyros, A., 2011. Scale invariant and deformation tolerant partial shape matching, Image Vision. Comput. 29(7): 459-469.

18. Wang, Z., Yan, Z., Chen, G., 2011. Lattice Boltzmann method of active contour for image segmentation. In: Sixth Inter. Conf. on Image and Graphic, pp. 338–343.

19. Collins, M., Xu, J., Grady, L., Singh, V., 2012. Random walks based multi-image segmentation: quasiconvexity results and GPU-based solutions. In: IEEE Conf. on Comp. Vision and Pattern Recognition, pp. 1656–1663.

20. Quellec, G., Lamard, M., Cazuguel, G., Cochener, B. Roux, C., 2010. Wavelet optimization for content-based image retrieval in medical databases, Medical Image Analysis. 14: 227–241.

21. Ran Wang, Sam Kwong, Degang Chen, Jingjing Cao, 2013. A vector-valued support vector machine model for multiclass problem. Information Sciences, 235: 174-194.

22. WWW.Corel.com

23. WWW.Freefoto.com

24. Fazal Malik, Baharum Baharudin, 2013. Analysis of distance metrics in content-based image retrieval using statistical quantized histogram texture features in the DCT domain. Journal of King Saud University – Computer and Information Sciences, 25: 207–218.
