Computationally Intelligent Retrieval of Images Based on the Actual Image Contents
Author: REHAN ASHRAF
11F-UET/PhD-CP-29

Supervisor: Dr. Khalid Bashir Bajwa

Department of Computer Engineering
University of Engineering and Technology
Taxila, Pakistan
(2016)
Computationally Intelligent Retrieval of Images Based on the Actual Image Contents
REHAN ASHRAF
11F-UET/PhD-CP-29
A thesis submitted in partial fulfilment of the requirements for the degree of
Doctor of Philosophy
Thesis Supervisor: Dr. Khalid Bashir Bajwa

Department of Computer Engineering
University of Engineering and Technology Taxila,
Pakistan
(2016)
Computationally Intelligent Retrieval of Images Based on the Actual Image Contents
A dissertation submitted in partial fulfilment of the requirements for the degree of Doctor
of Philosophy in Computer Engineering by:
11F-UET/PhD-CP-29

Checked and recommended by:
Members of Research Monitoring Committee
Dr. Hafiz Adnan Habib Dr. Zulfiqar Hasan Khan Dr. Tabassam Nawaz
Foreign Experts
Dr. Jonathan Kok-Keong Loo, Middlesex University London, UK
Dr. Rupert Young, University of Sussex, UK
Approved by:
Dr. Khalid Iqbal (External Examiner-1)
Dr. Umair Abdullah (External Examiner-2)

Dr. Khalid Bashir Bajwa (Supervisor/Internal Examiner)
Author’s Declaration
I, Rehan Ashraf, affirm that my PhD thesis titled Computationally Intelligent Retrieval of Images Based on the Actual Image Contents contains no material which has been accepted for the award of any other degree or diploma in any university or other institution, and confirm that, to the best of my knowledge, the thesis contains no material previously published or written by another person, except where due reference is made in the text of the thesis.
Rehan Ashraf
Dated:
Plagiarism Undertaking
I take full responsibility for the research work conducted during the PhD thesis titled Computationally Intelligent Retrieval of Images Based on the Actual Image Contents. I solemnly declare that the research work presented in the thesis was done solely by me, with no significant help from any other person; small help, wherever taken, is duly acknowledged. I have also written the complete thesis myself. Moreover, I have not previously presented this thesis (or substantially similar research work), or any part of it, to any other degree-awarding institution within Pakistan or abroad.
Therefore, I, as the author of the above-mentioned thesis, solemnly declare that no portion of my thesis has been plagiarized and that any material used in the thesis from other sources is properly referenced. Moreover, the thesis does not contain any literal citation of more than 70 words (total), even with a reference, unless I have the written permission of the publisher to do so. Furthermore, the work presented in the thesis is my own original work, and I have duly cited the related work of other researchers, clearly differentiating my work from theirs.
I further understand that if I am found guilty of any form of plagiarism in my thesis
work even after my graduation, the University reserves the right to revoke my PhD degree.
Moreover, the University will also have the right to publish my name on its website that
keeps a record of the students who plagiarized in their thesis work.
Rehan Ashraf
I dedicate this thesis to my loving parents, elder brothers, sister, my wife and kids.
Acknowledgements
Primarily, I thank Almighty Allah for giving me the strength and ability to
pursue this work to its conclusion. During this research, I have worked with
many people who have contributed in a variety of ways to my research. It is
a pleasure to convey my gratitude to all of them in my humble acknowledgment. In my opinion, writing a thesis is a difficult task without any guidance.
I would like to express my gratitude to all those who gave me the possibility
to complete this thesis. First of all, I would like to thank Dr. Khalid Bashir
Bajwa for his help, support, encouragement, teachings and supervision. His
wise academic advice and ideas have played an extremely important role in
the work presented in this thesis. I gratefully thank the members of my Pro-
posal defense committee for their constructive comments about this thesis.
UET Taxila provided the opportunity to meet amazing people and make new friends, and also provided an amazing atmosphere inside the institution.
I would also like to pay my gratitude to my teachers, Dr. Adeel Akram, Dr. Iram Baig, Dr. Hafiz Adnan Habib, Dr. Zulfiqar Hassan Khan and Dr. Tabassam Nawaz; without their guidance and motivation during my PhD, none of this would have been possible.
I am also thankful to Dr. Syed Aun Irtaza, who gave me an opportunity to
work with him and was always there to help me out. He provided the freedom
to pursue my own ideas and, at the same time, was rigorous in reviewing my work. For that, and for granting me the opportunity to work in such a remarkable research field, I am thankful to him. He had a fundamental role in bringing this document to life. Lots of thanks to my friends, especially Toqeer Mehmood, Zahid Mahmood and Farhan Aadil, and to my colleagues at the University of Engineering and Technology Taxila, for their cooperation. During the whole
period of research, I spent a big part of each day in the UET research cen-
tre, so UET has become a part of my precious memories. Last but not the
least, my heartiest gratitude to my beloved parents, father-in-law, my elder
brothers (Professor Imran Ashraf, Engr. Rizwan Ashraf, Dr. Kamran Ashraf,
Professor Rabnawaz, Irfan Ashraf) and my wife for their unconditional support in every aspect of my life. They extended their helpful hands whenever I needed them. Without their patience and inspiration, it would not have been possible for
me to start and continue my research. Above all, I thank Allah for blessing
me with all these resources, favours, and enabling me to complete this thesis.
Abstract
Images serve as a significant medium for human communication, and they deliver a rich amount of information for people to understand the digital world. With the widespread use of the internet and the availability of digital imaging techniques, more and more images are accessible to the world. As a result, the need for efficient image indexing and retrieval has grown exponentially. The current form of image retrieval is based on the textual annotations that are used to describe the image content; but in today's global world, textual annotation of images is becoming impractical, unfeasible and inefficient for representing and retrieving images. A key demand imposed on a CBIR technique is that it should effectively carry out the evaluation and analysis of the image content and generate an output in the form of image collections that share similar semantics. The targeted output can be generated if we train and evaluate the semantic classifiers on the actual objects of interest, but this is a challenging problem, as the image content is very diverse, especially in natural image scenes. Targeting the problem by applying image segmentation techniques to map what constitutes an image is a well-developed approach in the domain of CBIR, but the results of these approaches remain unsatisfactory. Some of the reasons are: (1) currently there is no robust way to perform segmentation that can figure out all objects, or even the complete structure of objects, in most images; (2) in Computer Vision (CV), segmentation is still an open problem; (3) segmentation slows down the image representation process, which is not feasible in real-time systems like CBIR. However, despite a great deal of research work, the retrieval performance of CBIR is not satisfactory because of the gap between the semantic representation of the image at a low level and visual concepts at a high level.
Therefore, based on these points, we believe that applying the currently existing segmentation techniques for image representation is not a promising direction. We have addressed image representation by analyzing the image contents through bandelets. The benefit of the scheme is that the image representations are very powerful, as we target the actual objects of interest present in an image in a far more effective way than is possible with segmentation-based techniques. Secondly, image analysis through the bandelet transform is a novel way of performing image retrieval, and we have applied it to generate meaningful image representations and to train the semantic classifiers. Consistency enhancement in the semantic association process addresses the two main reasons why the conventional CBIR framework is unable to achieve effective retrieval results: the lack of output verification and the avoidance of neighborhood similarity. Due to these problems, the image retrieval response is very inconsistent and the target output contains more wrong results than right ones. In this thesis, we concentrate on these issues by applying Neural Networks over the bag of images (BOI) and exploring the query's semantic association space. The semantic association process involves Artificial Neural Networks (ANN) that are trained over the resultant representations and guarantees an impressive image retrieval output.
Contents
1 Introduction 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Research Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 Research Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.4 Thesis Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2 CBIR Background 10
2.1 Content Based Image Retrieval (CBIR) . . . . . . . . . . . . . . . . . . 10
2.2 User Query . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.3 Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.3.1 Spectral Clustering Approach . . . . . . . . . . . . . . . . . . . 15
2.3.2 An Unsupervised Approach (JSEG approach) . . . . . . . . . . . 16
2.3.3 Multiclass Image Semantic Segmentation (MCISS) . . . . . . . . 17
2.4 Feature Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.4.1 Texture Feature Extraction . . . . . . . . . . . . . . . . . . . . . 18
2.4.1.1 Spatial Method . . . . . . . . . . . . . . . . . . . . . . 20
2.4.1.2 Tamura Feature . . . . . . . . . . . . . . . . . . . . . 20
2.4.1.3 Markov Random Fields . . . . . . . . . . . . . . . . . 21
2.4.1.4 Co-occurrence Matrix . . . . . . . . . . . . . . . . . . 21
2.4.1.5 Edge Histogram . . . . . . . . . . . . . . . . . . . . . 22
2.4.1.6 Fractals in Spatial Feature . . . . . . . . . . . . . . . . 23
2.4.2 Spectral Domain . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.4.2.1 Fourier Transform . . . . . . . . . . . . . . . . . . . . 23
2.4.2.2 Discrete Cosine Transform and Wavelet Transform . . . 25
2.4.2.3 Gabor Filters Transform . . . . . . . . . . . . . . . . . 26
2.4.2.4 Curvelet Transform . . . . . . . . . . . . . . . . . . . 27
2.4.3 Color Feature . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.4.3.1 Color Histogram . . . . . . . . . . . . . . . . . . . . . 31
2.4.3.2 Color Coherence Vector . . . . . . . . . . . . . . . . . 32
2.4.3.3 HSV Color Space . . . . . . . . . . . . . . . . . . . . 33
2.4.3.4 Hue Minimum Maximum Difference (HMMD) . . . . . 34
2.4.3.5 Dominant Color Descriptor (DCD) . . . . . . . . . . . 34
2.4.3.6 YCbCr Color Space . . . . . . . . . . . . . . . . . . . 35
2.4.4 Shape Feature . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.4.4.1 Shape Extraction based on Contour Method . . . . . . 37
2.4.4.2 Simple shape descriptors . . . . . . . . . . . . . . . . 37
2.4.4.3 Shape signature . . . . . . . . . . . . . . . . . . . . . 38
2.4.4.4 Stochastic method . . . . . . . . . . . . . . . . . . . . 38
2.4.4.5 Spectral transform . . . . . . . . . . . . . . . . . . . 39
2.4.5 Interest point detector . . . . . . . . . . . . . . . . . . . . . . . 39
2.4.5.1 Harris detector . . . . . . . . . . . . . . . . . . . . . . 40
2.4.5.2 Speeded up robust feature (SURF) . . . . . . . . . . . 40
2.4.5.3 Scale Invariant Feature Transform (SIFT) . . . . . . . . 42
2.5 Similarity Measure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
2.5.1 Metric Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
2.5.1.1 Manhattan Distance . . . . . . . . . . . . . . . . . . . 44
2.5.1.2 Euclidean Distance . . . . . . . . . . . . . . . . . . . 45
2.5.1.3 Minkowski Distance . . . . . . . . . . . . . . . . . . . 45
2.5.1.4 Hausdorff Distance . . . . . . . . . . . . . . . . . . . 45
2.5.2 Histogram Distance . . . . . . . . . . . . . . . . . . . . . . . . . 46
2.5.2.1 Earth Movers Distance . . . . . . . . . . . . . . . . . 46
2.5.2.2 Kullback-Leibler (KL) Divergence . . . . . . . . . . . 46
2.6 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
2.6.1 Accuracy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.6.1.1 Precision . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.6.1.2 Recall . . . . . . . . . . . . . . . . . . . . . . . . . . 48
2.6.2 Precision-Recall Graphs . . . . . . . . . . . . . . . . . . . . . . 49
2.6.3 Mean average precision . . . . . . . . . . . . . . . . . . . . . . 49
2.7 Data Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
2.8 Multimedia Information Retrieval . . . . . . . . . . . . . . . . . . . . . 50
2.8.1 Image representations and similarity detection . . . . . . . . . . . 50
2.8.2 Image block based representation and salient points . . . . . . . . 53
2.8.3 Image classification and similarity detection . . . . . . . . . . . . 54
2.9 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3 Bandelet Transform 58
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.2 Surface Compression through Geometric Bandelet . . . . . . . . . . . . 59
3.3 Bandelet Image Compression . . . . . . . . . . . . . . . . . . . . . . . . 61
3.3.1 Geometric image model . . . . . . . . . . . . . . . . . . . . . . 62
3.3.2 Geometric Image Flow with Bandelet Bases . . . . . . . . . . . . 63
3.3.3 Image Compression Through Bandelet . . . . . . . . . . . . . . 65
3.4 Orthogonal Bandelets . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
3.4.1 Block Based Bandelet Basis . . . . . . . . . . . . . . . . . . . . 67
3.4.2 Fast Discrete Bandelet Transform . . . . . . . . . . . . . . . . . 69
3.5 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
4 Feature Extraction Using Bandelet Transform 74
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
4.2 Image Representation using Bandletized Regions in YCbCr Color Space . . 76
4.2.1 Modified Bandelet Transform . . . . . . . . . . . . . . . . . . . 77
4.2.1.1 Alpert bases in bandelet transform . . . . . . . . . . . 80
4.2.1.2 Texture Feature Extraction using Bandelet . . . . . . . 84
4.2.1.3 Artificial Neural Network . . . . . . . . . . . . . . . . 85
4.2.1.4 Gabor Feature . . . . . . . . . . . . . . . . . . . . . . 87
4.2.2 Color Feature Extraction . . . . . . . . . . . . . . . . . . . . . . 89
4.2.3 Fusion Vector . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
4.3 Image Representation using HSV . . . . . . . . . . . . . . . . . . . . . . 91
4.3.1 Color Feature HSV Domain . . . . . . . . . . . . . . . . . . . . 92
4.3.2 Combining texture and HSV color features . . . . . . . . . . . . . 93
4.4 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
5 State-of-the-Art Classifier for Image Retrieval 95
5.1 Semantic Association . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
5.2 Content based image retrieval using ANN . . . . . . . . . . . . . . . . . 98
5.3 Content based image retrieval using SVM . . . . . . . . . . . . . . . . . 99
5.4 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
5.4.1 Image Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
5.4.2 Retrieval Precision/Recall Evaluation . . . . . . . . . . . . . . . 102
5.4.3 Comparison on Corel Image Set . . . . . . . . . . . . . . . . . . 104
5.4.4 Comparison with State-of-the-Art Methods . . . . . . . . . . . . 109
5.4.5 Comparison on Coil Image Set . . . . . . . . . . . . . . . . . . . 114
5.5 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
6 Conclusions and Future Work 116
6.1 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
6.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
6.2.1 Extensions of the Work . . . . . . . . . . . . . . . . . . . . . . . 117
6.2.2 Practical Implications of CBIR . . . . . . . . . . . . . . . . . . . 118
6.2.3 Anomalous Events Monitoring in Surveillance Videos . . . . . . 118
6.2.4 Traffic Management . . . . . . . . . . . . . . . . . . . . . . . . 119
6.2.5 CBIR using Hadoop . . . . . . . . . . . . . . . . . . . . . . . . 119
References 139
List of Publications
1. R. Ashraf, K. Bashir, A. Irtaza, M. T. Mahmood; Content Based Image Retrieval Using Embedded Neural Networks with Bandletized Regions, Entropy, Vol. 17, No. 6, pp. 3552-3580, 2015. (IF=1.5)

2. R. Ashraf, K. Bashir, T. Mehmood; Content-based Image Retrieval by Exploring Bandletized Regions through Support Vector Machines, Journal of Information Science and Engineering, J INF SCI ENG, 2015. Accepted Id: 150159. (IF=0.5)

3. R. Ashraf, K. Bashir, A. Irtaza, T. Mehmood; A Novel Approach for the Gender Classification through Trained Neural Networks, Journal of Basic and Applied Scientific Research, Vol. 4, No. 6, pp. 136-144, 2014. (ISI Indexed)
List of Figures
1.1 Query Image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2 Image of cluttered scene . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3 Putatively Matched Points . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.4 Query Result:1st image is query image while remaining images in the
group are retrieved images . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.1 Typical CBIR structure . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2 JSEG algorithm for color image segmentation . . . . . . . . . . . . . . . 16
2.3 JSEG segmentation results . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.4 MCISS framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.5 Classification of texture feature extraction methods . . . . . . . . . . . . 19
2.6 Image and its Fourier Transform . . . . . . . . . . . . . . . . . . . . . . 24
2.7 Different images with same Fourier Transform . . . . . . . . . . . . . . . 24
2.8 Five-level curvelet digital tiling of an image . . . . . . . . . . . . . . . . 28
2.9 RGB Color Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.10 Image and its color histogram . . . . . . . . . . . . . . . . . . . . . . . . 32
2.11 Images and their CCV color feature vectors . . . . . . . . . . . . . . . . 33
2.12 The RGB and HSV Color Space . . . . . . . . . . . . . . . . . . . . . . 34
2.13 HMMD Color Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.14 Classification of shape representation and description technique . . . . . 36
2.15 Shape eccentricity and circularity . . . . . . . . . . . . . . . . . . . . . . 37
2.16 Surf interest point . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.17 Surf interest point . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
2.18 SIFT interest point . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
2.19 Accuracy Parameters - Correct and incorrect labeling of an image . . . . 47
3.1 Horizon model with a flow . . . . . . . . . . . . . . . . . . . . . . . . . 64
3.2 Flow in a region of an image . . . . . . . . . . . . . . . . . . . . . . . . 67
3.3 Dyadic square segmentation of an image . . . . . . . . . . . . . . . . . . 68
3.4 Left column gives zooms of noisy images having a PSNR = 20.19 dB. The middle and right columns are obtained, respectively, with bandelet and wavelet estimators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
4.1 Proposed Method. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
4.2 Bandelet Transform [1; 2]. (a) Dyadic segmentation depends on local
directionality of the image; (b) Bandelet segmentation square which con-
tains a regularity function as shown by the red dash; (c) sampling posi-
tion and Geometric flow; (d) Sampling position adapted to the warped
geometric flow; (e) warping example. . . . . . . . . . . . . . . . . . . . 79
4.3 Geometric flow representation using different block sizes: (a) small size 4×4; (b) medium size 8×8 . . . . . . . . . . . . . . . . . . . . . . . . . . 79
4.4 Object categorization on the basis of geometric flow obtained through Bandletization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
4.5 The structure of neural network. . . . . . . . . . . . . . . . . . . . . . . 86
4.6 Types of texture. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
4.7 (a) RGB Original Image; (b) Y matrix Luminance Image; (c) Canny
Luma Image; (d) Canny RGB Image. . . . . . . . . . . . . . . . . . . . 90
4.8 Proposed Method HSV. . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
4.9 (a) RGB Original Image; (b) H matrix Hue Image; (c) Canny Hue Image;
(d) Canny RGB Image. . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
5.1 Verification inconsistency: similarity of the visual features, although both images belong to different semantic classes . . . . . . . . . . . . . . . . . 97
5.2 Query Performance on Corel image dataset with Top 10 to Top 40 Retrievals . 105
5.3 Query Performance on Caltech image dataset with Top 10 to Top 60 Re-
trievals in terms of Precision. . . . . . . . . . . . . . . . . . . . . . . . . 106
5.4 Query Performance on Caltech image dataset with Top 10 to Top 60 Re-
trievals in terms of Recall. . . . . . . . . . . . . . . . . . . . . . . . . . 106
5.5 Comparison of mean precision obtained by proposed method with other
standard retrieval systems. . . . . . . . . . . . . . . . . . . . . . . . . . 107
5.6 Comparison of mean recall obtained by proposed method with other stan-
dard retrieval systems. . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
5.7 Comparison of mean precision obtained by proposed method with state
of art retrieval systems. . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
5.8 Comparison of mean recall obtained by proposed method with state of art
retrieval systems. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
5.9 Mean Precision of Proposed Method-HSV compared with other standard
retrieval systems on top 20 retrievals. . . . . . . . . . . . . . . . . . . . . 113
5.10 Comparison of mean recall obtained by proposed method-HSV with other
standard retrieval systems. . . . . . . . . . . . . . . . . . . . . . . . . . 114
5.11 Comparison of precision and recall obtained by proposed method with
ICTEDCT. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
List of Tables
2.1 Popular CBIR systems implemented in commercial and academic settings . . 13
2.2 Features calculated by using matrix of normalized co-occurrence P (Q, R) 22
4.1 Summary of Neural network structure for every image category used in
this work. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
5.1 Comparison of mean precision obtained by proposed method with other
standard retrieval systems on top 20 retrievals. . . . . . . . . . . . . . . . 107
5.2 Comparison of mean recall obtained by proposed method with other stan-
dard retrieval systems on top 20 retrievals. . . . . . . . . . . . . . . . . . 108
5.3 Comparison of mean precision obtained by proposed method with state-
of-art methods on top 20 retrievals. . . . . . . . . . . . . . . . . . . . . . 110
5.4 Comparison of mean recall obtained by proposed method with state-of-art
methods on top 20 retrievals. . . . . . . . . . . . . . . . . . . . . . . . . 111
5.5 Mean Precision of Proposed Method-HSV compared with other standard
retrieval systems on top 20 retrievals. . . . . . . . . . . . . . . . . . . . . 112
5.6 Comparison of mean recall obtained by proposed method-HSV with other
standard retrieval systems on top 20 retrievals. . . . . . . . . . . . . . . . 113
Nomenclature

Symbol Description
CBIR Content based image retrieval
RF Relevance feedback
SVM support vector machine
QBE Query by example
EM Expectation Maximization
CCV Color Coherence Vector
HMMD hue-min-max-difference
HSV Hue, saturation, value
FD Fourier Descriptor
YCbCr Luminance and chrominance model
DoG Difference of Gaussian
L1 distance Manhattan distance
L2 distance Euclidean distance
MCM Motif Co-occurrence Matrix
DDC Dynamic Dominant Color
MIR Multimedia Information Retrieval
LBP Local Binary Patterns
CHAPTER 1
INTRODUCTION
There are many assets and resources on the Web that can be used to build public image collections. This has generated the need for a way to search these images, and consequently finding productive image retrieval systems has become a wide area of interest for researchers. The task of an image retrieval technique is to search for and retrieve images from an extensive repository of digital images. In the last few decades, researchers have been working on image retrieval techniques, and two types of promising techniques have been developed: Content Based Image Retrieval (CBIR) and Text Based Image Retrieval (TBIR). In TBIR methods, users supply keywords or descriptions of the images as a query and retrieve the images that are pertinent to the keyword.
Text-based retrieval has several disadvantages. Primarily, there is inconsistency in the labels produced by image annotators, caused by diverse understandings of the image contents; e.g., an image consisting of grass and flowers might be labeled as either grass, flower or nature by different people. Second, it takes considerable time to annotate the images of a large database, and this makes the process subjective [3]. Third, there is a high likelihood of mistakes occurring during the image labeling or tagging process when the database is large. Consequently, text-based image retrieval cannot achieve a high level of productivity and effectiveness. CBIR has several advantages over traditional text-based retrieval. Because CBIR uses the visual contents of the query image, it is a more efficient and effective
way of finding relevant images than searching based on text annotations. CBIR also avoids the time wasted in the manual annotation process of the text-based approach. These advantages have motivated us to employ a CBIR technique for our research. CBIR is an automated technique based on low-level image features such as color, texture, shape and their combinations, which are extracted from the images of the repository.
This chapter starts with a discussion of the factors that motivated us to explore the domain of CBIR. This is followed by the research objectives, discussed in section 1.2. The original contributions of the dissertation are reported in section 1.3. Finally, a short overview of the remaining chapters is provided in section 1.4.
1.1 Motivation
Due to the increase in multimedia applications and the widespread use of the internet, image libraries with digital contents have seen dramatic expansion. The need to explore these libraries and access the appropriate information automatically has therefore motivated research in the domain of image retrieval. Automatic image retrieval has found vast application in many fields like geographical information systems, surveillance systems, remote sensing, data mining, architectural design, fabric design, internet image search, medical image retrieval, satellite imaging, video search and communication systems. To cope with the challenges of image retrieval, many commercial search services like Google, Bing etc. have become indispensable tools in people's work and daily life [4; 5]. But as these search services usually retrieve images through keywords and metadata, they cannot help in the retrieval of images that do not have such associated information. These image retrieval systems also suffer from other critical issues like keyword limitations, lack of appropriate metadata association, and the high cost of manual text annotation. So, to avoid these shortcomings and to enhance the capabilities of image retrieval systems, an important focus of research is on CBIR.
Several application areas like surveillance systems, architectural design, data mining, GIS and remote sensing, fabric design, and video search can also obtain the benefit
of automatic image content analysis and association in their particular domains. The existing techniques for image retrieval, based on content specification through manual descriptions in the form of metadata, are no longer able to deal with such massive repositories because of their size, the diversity of the content, the high cost of the manual annotations, and the unavailability of globally acceptable keywords. Therefore, over the past few years a novel and exciting field of study, CBIR, has emerged in the domain of computer vision.
CBIR performs image content analysis by using visual attributes such as color, shape, texture, and salient or key points; the textual descriptions associated with the images are never considered for image retrieval. The principal working scheme of CBIR is to represent the actual images by their reduced-level characteristics and attributes in the form of low-level visual features, and then find the semantically similar images in the image repository by applying metric or non-metric norms. Hence, the relevant output depends on two factors: the ability to effectively represent the images based on their actual visual contents, and a norm that is not only capable of matching the feature vectors but also able to bring an output that is similar in terms of semantics.
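The two-factor pipeline just described (a low-level feature representation plus a similarity norm) can be sketched in a few lines. The color histogram and Euclidean (L2) distance below are deliberately simple stand-ins for the bandelet-based features and learned classifiers developed later in this thesis; all function names and the toy data are illustrative only.

```python
import numpy as np

def color_histogram(image, bins=8):
    """Low-level global feature: a normalized per-channel intensity
    histogram -- one simple instance of the color/texture/shape
    descriptors discussed above."""
    feats = []
    for ch in range(image.shape[2]):
        hist, _ = np.histogram(image[:, :, ch], bins=bins, range=(0, 256))
        feats.append(hist / hist.sum())
    return np.concatenate(feats)

def retrieve(query, repository, top_k=3):
    """Rank repository images by Euclidean (L2) distance between
    feature vectors; a smaller distance means higher visual similarity."""
    q = color_histogram(query)
    dists = [(idx, np.linalg.norm(q - color_histogram(img)))
             for idx, img in enumerate(repository)]
    return [idx for idx, _ in sorted(dists, key=lambda t: t[1])[:top_k]]

# Toy repository: dark, mid-gray and bright synthetic images.
rng = np.random.default_rng(0)
dark = rng.integers(0, 60, (32, 32, 3))
gray = rng.integers(100, 160, (32, 32, 3))
bright = rng.integers(200, 256, (32, 32, 3))
query = rng.integers(0, 60, (32, 32, 3))   # visually closest to `dark`
print(retrieve(query, [dark, gray, bright], top_k=1))  # [0]
```

Note that a plain L2 norm over histograms matches visual statistics, not semantics: the dark query is matched to the dark image regardless of what either depicts, which is precisely the semantic gap discussed next.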
If both of these factors are not addressed properly, the result is improper output. One of the key reasons found in the CBIR literature for this imperfect output is the inherent semantic gap between the low-level representation of an image and the high-level semantic concepts in the image. Therefore, as a motivation for this research work, we have addressed this issue by introducing a new, powerful image representation scheme based on the bandelet transform. The novel feature representation retains the benefits of local object-level detection and representation, and generates a combinatorial version by considering the global image semantics as well. The powerful image representation is systematically combined with Artificial Neural Networks to generate a powerful image retrieval output that lowers the semantic gap as well [6].
Thus, as a motivation of this thesis, we address these issues and report ways through which the retrieval of images can be performed more robustly. Although the last decades have produced remarkable research work in the territory of CBIR, yet at
the same time CBIR is not yet a fully grown research area. The major motivation of this thesis is to contribute to the CBIR field by addressing the existing challenges and presenting new ideas.
1.2 Research Objectives
Powerful image representations are required for efficient retrieval. Many research works have addressed this problem by computing global frequency statistics of the image (i.e. the variance, mean etc. over texture), while many others address it by considering the object frequencies that compose an image, through segmentation or keypoint detection (e.g. SIFT, SURF). The benefit of global frequencies as image features is that they give us an opportunity to sense the overall semantics of the image, but they cannot work effectively in object searching, as elaborated in figures 1.1 and 1.2. If we try to match the contents of figure 1.2 against figure 1.1, it is not possible to find the relevant information, although figure 1.2 contains the complete contents of figure 1.1.
The reason is that the global image frequencies of the two images are different. For this
problem, if searching is performed through local features (e.g. SIFT, SURF), it works
efficiently, as described in figure 1.3. Local features work impressively for object
detection, but the results are not optimal when the overall image semantics must be
determined, as described in figure 1.4. Therefore, in this research work we present a
novel way of feature extraction through the Bandelet transform that is able to reap the
benefits of both schemes (i.e. global and local). The Bandelet transform provides an
opportunity to analyze the image deeply, down to the image objects, and also overcomes
the drawbacks found in segmentation based techniques.
For feature extraction and image representation purposes, we consider the image objects,
and on the basis of these objects we compute the overall (i.e. global) image energy, which
is then used for image representation. Artificial Neural Networks (ANN) and Support
Vector Machines (SVM) are trained over these representations so that the ultimate target
of reducing the semantic gap can be achieved.
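This last step can be sketched as follows. The feature vectors below are synthetic stand-ins for the bandelet/energy representations (the real extraction stage is described in chapter 4), and scikit-learn's SVC plays the role of the SVM:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical precomputed feature vectors for two semantic classes;
# in practice these come from the bandelet-based extraction stage.
rng = np.random.default_rng(0)
class_a = rng.normal(loc=0.0, scale=0.3, size=(40, 8))
class_b = rng.normal(loc=2.0, scale=0.3, size=(40, 8))
X = np.vstack([class_a, class_b])
y = np.array([0] * 40 + [1] * 40)

# Train a semantic classifier over the feature vectors.
clf = SVC(kernel="rbf").fit(X, y)

# Two unseen vectors, one drawn from each class.
probe = np.vstack([rng.normal(0.0, 0.3, (1, 8)),
                   rng.normal(2.0, 0.3, (1, 8))])
pred = clf.predict(probe)
print(pred)  # [0 1]: each probe is assigned its semantic class
```

Mapping feature vectors to semantic classes before ranking is what narrows the semantic gap: distances are then computed within a class rather than across the whole repository.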
Figure 1.1: Query Image
Figure 1.2: Image of cluttered scene
Figure 1.3: Putatively Matched Points
1.3 Research Contributions
Several methods have been described for CBIR using both spatial and spectral ap-
proaches. Most of them are not capable of effective texture representation. The Bandelet
transform provides in-depth details about an image in the spectral domain by using more
orientation information at each scale. Hence, our main contribution and principal goal is
to use Bandelet texture features for CBIR.
The following key points are the original contributions of this thesis:
• A new scheme for object detection in images based on bandlets.
• A new feature extraction scheme by combining the object representations with
global energy of an image.
• Application of texture-specific parameters for Gabor filters.
• A new feature extraction scheme that combines the texture features with color fea-
tures for better image representations.
• Adaptation of the neural network parameters to ensure the minimum mean squared error.
• Instead of the relevant image search on the basis of query image, a probabilistic
interpretation of the multiple images for image search is presented.
Figure 1.4: Query result: the 1st image is the query image, while the remaining images
in the group are the retrieved images
1.4 Thesis Outline
The rest of this thesis is organized into five chapters. A brief summary of each of these
chapters follows:
An overview of Content Based Image Retrieval (CBIR) is presented in chapter 2. This
chapter reviews the relevant topics in CBIR and provides comprehensive background for
the current research. Different paradigms of user query, segmentation, choice of visual
features (color, texture, shape), salient points, similarity measures, image benchmarks
(datasets), and approaches to result presentation and performance evaluation are discussed
in this chapter.
Chapter 3 specifically focuses on the Bandelet transform in the image retrieval process.
Bandelet transforms are widely used in image compression, noise removal, multi-scale
structuring, and geometry compression. The Bandelet transform captures the geometric
details of an image by removing the redundancy of the wavelet transform along geometric
structures.
In chapter 4, we focus on the generation of powerful feature vectors. To achieve this,
the Bandelet transform is first applied to the repository images, which returns the geometric
boundaries of the significant items, i.e. the major objects found in an image. We apply
a back-propagation neural network to determine the texture estimation parameters, and
then apply Gabor filters with these focused parameters (as will be depicted) to estimate
the texture content around the boundaries with maximum accuracy. To increase the power
of the feature vector, color components in the YCbCr and HSV domains are also included,
after approximation through wavelet decomposition over the color histogram. CBIR
performs image content analysis using visual properties such as shape, color, texture,
and salient points or keypoints; textual descriptions associated with the images are never
considered for image retrieval. For classification purposes, SVM and ANN are used to
determine the semantic classes.
Chapter 5 concentrates more deeply on the design of a stable semantic classifier through
ANN and SVM. When several categories are enrolled, the samples of a particular category
are naturally far fewer than the comparative samples from all other categories. In the
proposed framework, for a query image of any semantic category, the images sharing the
query's semantic category are returned to the user, ranked on the basis of the distance
between their similarity features and those of the query image. All this guarantees
performance enhancement in the image retrieval process.
Chapter 6 summarizes the major research results of the dissertation. The key findings
and future direction are highlighted in this chapter.
CHAPTER 2
CBIR BACKGROUND
This chapter introduces content based image retrieval and reviews relevant topics to
provide the basis for this thesis. The chapter is arranged in the following order: Section
2.1 discusses content based image retrieval (CBIR). Section 2.2 provides the details
of user queries in a CBIR environment. Section 2.3 presents the details of segmentation
in a CBIR context. Different ways of feature extraction and image representation are
described in section 2.4. Section 2.5 covers the measures used to compute the relevance
against a query image. Section 2.6 provides the details of performance evaluation. Section
2.7 discusses image benchmarks used in CBIR. Section 2.8 presents a review of
multimedia information retrieval in the field of CBIR. Section 2.9 summarizes the chapter.
2.1 Content Based Image Retrieval (CBIR)
Data and information presented as images continue to grow astronomically, as a result of
the intensive use of multimedia devices equipped with digital cameras, and because of the
way the internet has transformed access to information. Search engines such as Google
retrieve documents on the basis of their textual content. In any case, the quality of the
outcome is frequently a long way from ideal when it comes to finding images, because
words have only limited scope for describing a picture. The well-known saying "a picture
is worth a thousand words" points out one reason why it is so difficult to track down the
images someone is looking for. Understanding an image and interpreting its semantics
are challenging tasks for both computers and human perception, because meaning is
comparative and relative, and changes from one person to another [7; 8; 9]. For example,
what one person might comprehend as a holiday image featuring a mountain could be
seen by someone else as a landscape of Iceland, while a third person may describe the
same picture as a volcano on the edge of eruption. Search engines need to be capable of
providing pictures that are open to multiple descriptions and interpretations.
Consequently, resolving this problem depends on making image search independent of
the keywords and metadata associated with images. This objective has guided scientists
and researchers to find new horizons in the CBIR field. Moreover, an effective and strong
CBIR system is needed to overcome the semantic problems in image retrieval. The
dilemma is that images are hosted in an unorganized way and are searched through
associated keywords and meta-data, which makes it very difficult for a user to view all
the related images, due to the inherent limitations of textual data for describing and
representing visual content.
Figure 2.1: Typical CBIR structure - [10]
For this reason, a lot of general purpose image access systems have been produced, both
text based and content based. The text based methodology dates back to the 1970s. In
such frameworks, images are manually annotated with textual descriptors, which are then
used by a database management system (DBMS) to accomplish image access. This method
has two disadvantages. The first is the sizeable amount of human time required for manual
annotation. The second is annotation inaccuracy, a result of the subjectivity of human
perception. To overcome these disadvantages of the text based access method, CBIR was
introduced in the early 1980s.
CBIR recognizes images on the basis of their visual contents. Computer vision techniques
refer to low-level features such as color, texture, shape, and spatial layout. The real
meaning of content based is to evaluate the actual contents of an image, where the term
content describes the color, texture, shapes and other information related to the image.
The goal of CBIR is to find an image or group of images that are related to the query
image. Figure 2.1 [10] shows a general CBIR system. Various researchers have described
the relationship between low-level and high-level features [11; 12]. Features are classified
into three categories: low-level, middle-level and high-level. The gap between the low
and high levels, i.e. the limited descriptive power of low-level features with respect to
user intent, is known as the semantic gap [13; 14].
CBIR is an automated technique in which low-level features such as texture, shape and
color are extracted from the repository images. CBIR is a more efficient and effective
way of finding relevant images than manual text annotation techniques. Many CBIR
systems have been introduced [3; 15] throughout the years. Some of these frameworks
are commercial and most are academic (scholarly); their points of interest appear in
table 2.1.
This research is based on the feature extraction techniques involved in image retrieval,
machine learning techniques, classification, and the computationally intelligent retrieval
of images, through which an efficient CBIR system can be proposed.
Table 2.1: Popular CBIR systems implemented commercially and in academia

CBIR System — Characteristics — Category

VisualSEEK — Image assessment through matching of salient color regions with respect
to their hues, sizes, and relative spatial locations. — Academic

Photobook — Incorporates retrieval mechanisms for two dimensional shapes, face
recognition and texture images. — Academic

Multimedia Analysis and Retrieval Systems (MARS) — Consolidates relevance feedback
from the user for consequent result refinements. — Academic

NeTra — Uses a segmentation technique: an image is first divided into regions of
homogeneous color, and then features are extracted from those regions using color,
texture, shape and spatial location. — Academic

QBIC — Driven by several features that can be selected by the user; extracts color
features for individual objects or the complete image in several color spaces. It also
includes texture structure and shape features. — Commercial

Virage — Introduced simple features such as global and local color, shape and texture.
For similarity measurement, an indexing technique is provided for use when developing
an application. — Commercial
2.2 User Query
How to query the CBIR system and describe the information needs of the user is an
inherent problem. In many CBIR systems, the user provides an example image to the
retrieval system, and its content is taken to represent the information the user is trying
to find. This query hypothesis is known as the Query by Example (QBE) scheme. Whatever
the search algorithm, the retrieval results must share at least some elements with the
input image that serves as the search example. The image retrieval process starts by
computing the visual features of the query image; the CBIR system then returns the
response images whose features are similar, i.e. close to the user's query in terms of
distance. Instead of taking the whole image as the expression of the user's search
intention, some CBIR systems allow users to mark the image regions they would like to
search for. This approach is known as query by image region. The approach must allow
the system users to mark the regions in the input image which they intend to search, and
for this the CBIR system must be equipped with an unsupervised segmentation method.
Blobworld [16] and NeTra [17] are examples of this query approach. In this query scheme,
the retrieval results are much better, as the irrelevant image portions are not considered
in the repository search.
To deal more effectively with the automatic image retrieval problem, some systems do not
depend on a single example image only. Instead, they take several example images from
the user [18] and compute the commonalities amongst them. The most representative
images make the major contributions to the image search; the contribution of each
example image is determined through an assigned weight. The image repository is then
searched on the basis of these images, with the commonalities serving as the query [19].
Another approach to taking multiple example images is relevance feedback. In this
paradigm, the system iteratively learns the user's information need and improves itself
as the number of relevant example images increases. Some query paradigms merge
different modalities (such as body gestures, touch and voice) to query the CBIR system.
One example investigating multi-modal interfaces is presented in [20]; this work exhibits
a dynamic interface for retrieval purposes. In this thesis, we employ the Query by Example
paradigm to query the CBIR system.
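The QBE pipeline can be sketched end to end with a deliberately simple feature, a global grey-level histogram standing in for a full visual signature, and synthetic "images":

```python
import numpy as np

def grey_histogram(image, bins=8):
    """Global grey-level histogram as a normalized feature vector."""
    hist, _ = np.histogram(image, bins=bins, range=(0, 256))
    return hist / hist.sum()

def query_by_example(query_img, repo_imgs):
    """QBE: describe the example image by its features, then rank the
    repository by Euclidean distance in feature space."""
    q = grey_histogram(query_img)
    feats = np.array([grey_histogram(im) for im in repo_imgs])
    dists = np.linalg.norm(feats - q, axis=1)
    return np.argsort(dists)

rng = np.random.default_rng(1)
dark = rng.integers(0, 100, (32, 32))      # dark synthetic "image"
bright = rng.integers(150, 256, (32, 32))  # bright synthetic "image"
query = rng.integers(0, 100, (32, 32))     # dark query
print(query_by_example(query, [bright, dark]))  # [1 0]: dark image first
```

Everything the user sees depends on the feature: with a histogram signature, the dark repository image is returned first because its distribution overlaps the query's, regardless of what either image depicts.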
2.3 Segmentation
Automatic image segmentation is a challenging errand. Image segmentation is a key
role in CBIR and extracts the region based image manifestation. Segmentation algorithm
splits the image into small parts or into altered components. These components based on
feature homogeneity. Many techniques and approaches of segmentation are developed for
example graph based, grid based, clustering, model based, region growing based method
and contour based. In computer vision, different segmentation techniques are used.
2.3.1 Spectral Clustering Approach
Spectral clustering is another technique that is widely used for segmentation [16]. In
this technique, a mixture of texture and color characteristics is obtained by clustering
pixels. Initially, the joint distribution of texture, color and position components is
modeled with a mixture of Gaussians. To calculate the parameters of the model, the
Expectation Maximization (EM) algorithm is used. The consequent pixel cluster
assignments provide a segmentation of the image, and the resulting regions roughly
correspond to objects. Several methods design their own segmentations according to the
preferred region attributes, be it color, texture, or a combination of both [21]. These
algorithms are in many cases based on k-means clustering of pixel/block characteristic
features. In the work of [21; 22], the image is divided into trivial blocks of dimension
4 ∗ 4, from which texture as well as color features are extracted. At that point, k-means
clustering is applied to group the feature vectors into several categories and classes.
Each class represents one region, and blocks classified into the same category belong to
the same region. Clustering based segmentation has therefore proved effective in image
retrieval.
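The clustering idea can be sketched in a few lines. The sketch below uses plain k-means on (intensity, x, y) pixel features with a deterministic initialization, a minimal stand-in for the EM-fitted colour/texture/position mixture model, not the Blobworld algorithm itself:

```python
import numpy as np

def cluster_segment(image, iters=10):
    """Cluster pixels on (intensity, x, y) features into two groups with
    k-means -- a minimal stand-in for the EM-fitted Gaussian mixture over
    colour, texture and position described above."""
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # One feature vector per pixel: intensity plus down-weighted position.
    feats = np.stack([image.ravel().astype(float),
                      0.1 * xs.ravel(), 0.1 * ys.ravel()], axis=1)
    # Deterministic init: darkest and brightest pixels as cluster centres.
    centres = np.stack([feats[feats[:, 0].argmin()],
                        feats[feats[:, 0].argmax()]])
    for _ in range(iters):
        d = np.linalg.norm(feats[:, None, :] - centres[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(2):
            centres[j] = feats[labels == j].mean(axis=0)
    return labels.reshape(h, w)

# A two-region test image: left half dark, right half bright.
img = np.zeros((20, 20)); img[:, 10:] = 200
seg = cluster_segment(img)
print(seg[0, 0] != seg[0, 19])  # True: the halves fall in different clusters
```

Including (down-weighted) pixel position in the feature vector is what makes the clusters spatially coherent; EM with Gaussians generalizes this by also modelling each cluster's covariance.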
2.3.2 An Unsupervised Approach (JSEG approach)
The important idea of the JSEG procedure is to separate the segmentation process into
two stages, color quantization and spatial segmentation [22]. In the first stage, the colors
in an image are quantized into several classes that can be used to distinguish between
areas of the image. This quantization is carried out in color space without taking into
consideration the spatial distribution of the colors. As a result, the image pixel values
are replaced by their corresponding color class labels, which yields a class map that can
be viewed as a special kind of texture composition. In the second stage, spatial
segmentation is carried out directly on this class map without taking into consideration
the underlying pixel color similarity.
The real advantage of this two stage separation is clear: it is a struggle to handle color
similarity and the spatial distribution of color at the same time. Separating color
similarity from spatial distribution makes the development of algorithms more manageable
for each of the two processing stages [23; 24]. Figure 2.2 [24] shows the schematic of
color image segmentation by the JSEG algorithm.
Figure 2.2: JSEG algorithm for color image segmentation - [24]
Figure 2.3: JSEG segmentation results - [22]
Despite the fact that JSEG is an established segmentation methodology, it fails for some
classes of images and delivers lower quality segmentation results, such as the over-
segmentation shown in figure 2.3 [22]. With the aim of enhancing segmentation accuracy
and decreasing the computational complexity of the bottom-up approach, the MCISS
methodology was developed.
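The two-stage structure of JSEG can be sketched compactly. The sketch below substitutes uniform grey-level binning for JSEG's learned colour quantizer and connected-component labelling for its region growing, so it illustrates only the separation of the two stages:

```python
import numpy as np
from scipy import ndimage

def jseg_like(image, n_classes=2):
    """Two-stage sketch in the spirit of JSEG: (1) quantize grey values
    into a few classes to form a class map, ignoring pixel positions;
    (2) segment the class map spatially (here via connected components),
    ignoring the underlying pixel similarity."""
    # Stage 1: colour quantization -> class map.
    edges = np.linspace(image.min(), image.max() + 1, n_classes + 1)
    class_map = np.digitize(image, edges[1:-1])
    # Stage 2: spatial segmentation on the class map only.
    regions = np.zeros(image.shape, dtype=int)
    next_label = 0
    for c in np.unique(class_map):
        lab, n = ndimage.label(class_map == c)
        regions[lab > 0] = lab[lab > 0] + next_label
        next_label += n
    return class_map, regions

# Bright / dark / bright stripes: 2 colour classes but 3 spatial regions.
img = np.zeros((9, 9)); img[:3] = 100; img[6:] = 100
cmap, regs = jseg_like(img)
print(len(np.unique(cmap)), len(np.unique(regs)))  # 2 3
```

The example shows why the two stages are kept apart: the same colour class can occur in several disconnected spatial regions, and only the second stage can tell them apart.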
2.3.3 Multiclass Image Semantic Segmentation (MCISS)
Multi Class Image Semantic Segmentation (MCISS) is a productive methodology because
it can be utilized for images of all classes, and not only for images of a specific class
[16]. Class specific means that the framework is only able to segment images of some
particular class, for example horses, buildings or pedestrians. Such class specific methods
are useful for segmenting images containing structured objects. Since many images
contain both structured and non-structured elements, such as rivers and sky, that do not
follow any particular shape or composition, class specific techniques fail for such classes
of images [25].
The multiclass image semantic segmentation technique combines top-down and bottom-up
approaches. The bottom-up method utilizes both fractal dimension and the j-value as
homogeneity measures. The aim of the bottom-up approach is to give a region based
segmentation approach that can be utilized for all classes of pictures, whereas JSEG
fails for some classes of images [25; 26]. The MCISS framework is shown in figure 2.4
[25].
Figure 2.4: MCISS Frame work - [25]
Image segmentation is still an open research problem. Selection of a segmentation tech-
nique depends on the type of application or domain. In image retrieval, a region boundary
is not considered proper until the region is homogeneous.
2.4 Feature Extraction
An image is a group of pixels arranged in the form of a matrix and represented by
low level features. Selection of proper features is the most important step in designing
an efficient image retrieval system. Features are extracted by global or region based
techniques. In global extraction, segmentation is not required, while for region based
extraction, segmentation is done as the first step. In any CBIR system, efficient retrieval
strongly depends on how effectively the visual contents of the image are represented in
the form of a visual signature (a composition of multiple features). Retrieval results
are adversely affected by improper features. For example, if only color features are
extracted from a repository of similarly colored images, one can imagine that the retrieval
results will never be appropriate. This is the reason that selection of features is a
significant design phase of a CBIR system.
2.4.1 Texture Feature Extraction
Texture is an imperative and prominent low level visual property of an image and is
extensively used in CBIR. Real world images are composed of different kinds of objects,
and these objects have different surface patterns. The surface pattern of an object, or of
the whole image, is known as texture. A condensed definition of texture features, as given
in [8; 27], is: the representation of the spatial arrangement of the grey levels of the pixels
in an image or region. The commonly known texture descriptors are co-occurrence
matrices [28], Wavelet Transform [29; 30], Gabor filters [31; 32] and Tamura features
[33; 34].

Figure 2.5: Classification of texture feature extraction methods
Texture features have two main components, as shown in Figure 2.5:
1. Spatial method.
2. Spectral method.
2.4.1.1 Spatial Method
Spatial texture features, including co-occurrence matrices [35; 36], Fourier power spectra
[37; 38], shift-invariant principal component analysis (SPCA) [38], Wold decomposition
[39], Markov random fields [40; 41], fractal models [42], Tamura features [43; 44], the
Haar wavelet transform [45; 46] and Gabor filters, characterize texture by the statistical
distribution of the image intensity and have been used frequently. Spatial methods are
widely used and highly effective in the area of CBIR. In a CBIR system, feature extraction
is a key step because its output is used in all subsequent modules of the system. Spatial
features have shown good performance when applied to irregular shapes. The drawbacks
of these techniques are their sensitivity to noise and distortion, and the fact that they
are complex and require a lot of computation.
2.4.1.2 Tamura Feature
Tamura texture features are typical spatial domain features. They consist of six compo-
nents: coarseness, contrast, directionality, line-likeness, regularity and roughness [22; 44].
Among these, coarseness, contrast and directionality are considered the most important.
The Tamura feature descriptor is not effective for representing distorted or deformed
images because it is sensitive to scale and orientation [44]. Coarseness relates directly to
scale and to the repetition rate of the texture elements. An image may contain texture at
several scales; coarseness aims to identify the largest scale at which texture exists, even
when smaller scale texture is also present. Contrast measures the dynamic range of grey
levels in an image, together with the polarization of the distribution of black and white.
Directionality does not intend to differentiate between particular orientations or patterns,
but measures the overall degree of directionality in the image.
2.4.1.3 Markov Random Fields
Markov Random Fields (MRF) are used in various image processing applications such as
texture synthesis, classification, image segmentation, restoration and compression. The
MRF model successfully represents textures that consist of small primitives. A Markov
Random Field is a probabilistic process [47]: the probability of a cell being in a given
state is determined by the probabilities of the neighboring cells, so all interactions are
local. Widely used models include the Gaussian MRF (GMRF) and the simultaneous
autoregressive (SAR) model. If the model noise is independent, identically distributed,
zero mean and of unit variance, i.e. white noise, the model is called the simultaneous
autoregressive (SAR) model. The SAR model uses fewer parameters than other MRF
models; however, it is not rotation invariant [48; 49]. The multi-resolution simultaneous
autoregressive model (MRSAR) is proposed in [50]. A multi-resolution Gaussian pyramid
is obtained by low pass filtering and sub-sampling the image over several successive
levels; at each level, the SAR model can be applied. Although it shows better performance
than other texture features, such as the wavelet transform and Wold decomposition, the
MRSAR cannot distinguish images when structured patterns are involved.
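The core SAR idea, predicting each pixel from its neighbours and keeping the fitted weights as texture features, can be sketched with a least-squares fit. This is a minimal single-scale illustration with a four-neighbour window, not the full multiresolution MRSAR estimator of [50]:

```python
import numpy as np

def sar_features(image):
    """Least-squares fit of a simultaneous autoregressive (SAR) model:
    each interior pixel is predicted as a weighted sum of its four nearest
    neighbours plus a bias and noise. The fitted weights and the residual
    variance serve as texture features."""
    s = image.astype(float)
    centre = s[1:-1, 1:-1].ravel()
    X = np.stack([s[:-2, 1:-1].ravel(),    # neighbour above
                  s[2:, 1:-1].ravel(),     # neighbour below
                  s[1:-1, :-2].ravel(),    # neighbour left
                  s[1:-1, 2:].ravel(),     # neighbour right
                  np.ones_like(centre)],   # bias term
                 axis=1)
    theta, *_ = np.linalg.lstsq(X, centre, rcond=None)
    resid = centre - X @ theta
    return theta[:4], float(resid.var())

# Vertical stripes: up/down neighbours predict the pixel exactly, so they
# receive larger weights than the left/right neighbours.
stripes = np.tile([0.0, 1.0], (8, 4))
weights, noise_var = sar_features(stripes)
print(weights[0] > weights[2], noise_var < 1e-8)  # True True
```

The asymmetry of the fitted weights is exactly what makes the SAR parameters a texture descriptor, and also why the plain model is not rotation invariant: rotating the stripes by 90 degrees swaps the large and small weights.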
2.4.1.4 Co-occurrence Matrix
The co-occurrence matrix is one of the earliest methods, using statistical features of grey
levels to classify texture. Haralick [33] proposed the Grey Level Co-occurrence Matrix
(GLCM), a second order statistical method, and extracted features from images using it.
It has been applied very effectively and successfully to texture classification in evalua-
tions [51]. The features obtained from the normalized co-occurrence matrix P (Q, R)
are given in table 2.2.
The GLCM records the frequencies with which pairs of pixel values occur in an image
separated by a certain vector; the matrix distribution thus depends on the distance and
angular relationship between the pixels. Different texture characteristics are captured by changing
and varying the vectors [52]. GLCM features can be categorized into four groups:

1. Visual texture characteristics.

2. Statistics and linear algebra.

3. Correlation information.

4. Information theory.

Table 2.2: Features calculated from the normalized co-occurrence matrix P (Q, R)

Feature       Formula

Energy        Σ_Q Σ_R P²(Q, R)

Contrast      Σ_Q Σ_R (Q − R)² P (Q, R)

Entropy       −Σ_Q Σ_R P (Q, R) log P (Q, R)

Homogeneity   Σ_Q Σ_R P (Q, R) / (1 + (Q − R)²)
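The descriptors of Table 2.2 can be computed directly from a normalized co-occurrence matrix; a small numpy sketch (tiny grey-level count, toy image):

```python
import numpy as np

def glcm(image, dx=1, dy=0, levels=4):
    """Normalized grey-level co-occurrence matrix P(Q, R) for pixel pairs
    separated by the offset (dy, dx). `image` must hold ints < levels."""
    P = np.zeros((levels, levels))
    h, w = image.shape
    for y in range(h - dy):
        for x in range(w - dx):
            P[image[y, x], image[y + dy, x + dx]] += 1
    return P / P.sum()

def glcm_features(P):
    """The four descriptors of Table 2.2 computed from P."""
    Q, R = np.indices(P.shape)
    nz = P > 0  # avoid log(0) in the entropy term
    return {
        "energy":      float((P ** 2).sum()),
        "contrast":    float(((Q - R) ** 2 * P).sum()),
        "entropy":     float(-(P[nz] * np.log(P[nz])).sum()),
        "homogeneity": float((P / (1.0 + (Q - R) ** 2)).sum()),
    }

flat = np.zeros((8, 8), dtype=int)          # perfectly uniform texture
f = glcm_features(glcm(flat))
print(f["energy"], f["contrast"])  # 1.0 0.0
```

A uniform image concentrates all mass in one matrix cell, giving maximal energy and homogeneity and zero contrast and entropy; rough textures spread mass away from the diagonal and push contrast and entropy up.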
2.4.1.5 Edge Histogram
Edges in an image are a basic feature for the representation of its content, and a histogram
is used to represent the edge features. Edge histograms convey the frequency and the
directionality of the intensity changes in the image; this is a unique feature of images
which cannot be replicated by a color histogram or by homogeneous texture features. To
capture this characteristic, MPEG-7 includes a dedicated descriptor for the edge distri-
bution in the image: the Edge Histogram Descriptor (EHD), which reflects the distribution
of the local edges over the image [53; 54]. To extract the feature, edges are detected at a
spatial scale by applying digital filters to image blocks. The edge histogram descriptor
represents five forms of edges: four directional edges and one non-directional edge. For
image recognition, edges play a key role in feature extraction and in the retrieval of
images with similar semantic meaning.
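The block-filtering idea behind the EHD can be sketched with the five standard 2×2 edge filters. This is a simplified global histogram; the full MPEG-7 descriptor computes such a 5-bin histogram per sub-image, 80 bins in total:

```python
import numpy as np

# The five 2x2 edge filters of the MPEG-7 EHD: vertical, horizontal,
# 45-degree, 135-degree, and non-directional.
FILTERS = np.array([[[1, -1], [1, -1]],
                    [[1, 1], [-1, -1]],
                    [[2 ** 0.5, 0], [0, -(2 ** 0.5)]],
                    [[0, 2 ** 0.5], [-(2 ** 0.5), 0]],
                    [[2, -2], [-2, 2]]])

def edge_histogram(image, thresh=10):
    """Classify each 2x2 block by its strongest edge filter response and
    accumulate a 5-bin edge-type histogram."""
    hist = np.zeros(5)
    h, w = image.shape
    for y in range(0, h - 1, 2):
        for x in range(0, w - 1, 2):
            block = image[y:y + 2, x:x + 2].astype(float)
            resp = np.abs((FILTERS * block).sum(axis=(1, 2)))
            if resp.max() >= thresh:      # only count real edges
                hist[resp.argmax()] += 1
    return hist

img = np.zeros((8, 8)); img[:, 3:] = 100   # one strong vertical edge
print(edge_histogram(img))  # [4. 0. 0. 0. 0.]: only the vertical bin fires
```

Because the histogram counts edge *types* rather than positions, it is compact, yet it still separates, say, a skyline (horizontal edges) from a forest (vertical edges), which a color histogram cannot do.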
2.4.1.6 Fractals in Spatial Feature
A fractal texture is characterized by self-similarity, which is quantified by its fractal
dimension, defined as D = logN/ log(1/r). The fractal dimension describes the roughness
of a texture image, and many researchers have used fractal techniques to obtain texture
features from an image [55; 56]. The main problems are the difficulty of estimating the
fractal dimension and the lack of self-similarity in most real-life textures. Another
problem is that visually different textures may have equal fractal dimensions [57].
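The definition D = log N / log(1/r) is usually estimated by box counting: cover the pattern with boxes of relative size r, count the occupied boxes N, and fit the slope of log N against log(1/r). A minimal sketch on a binary mask:

```python
import numpy as np

def box_counting_dimension(mask):
    """Estimate the fractal dimension D = log N / log(1/r) of a binary
    pattern by counting occupied boxes at halving box sizes and fitting
    the slope of log N versus log(1/r)."""
    size = mask.shape[0]
    sizes, counts = [], []
    box = size
    while box >= 1:
        n = 0
        for y in range(0, size, box):
            for x in range(0, size, box):
                if mask[y:y + box, x:x + box].any():
                    n += 1
        sizes.append(box); counts.append(n)
        box //= 2
    r = np.array(sizes, dtype=float) / size        # relative box size
    slope, _ = np.polyfit(np.log(1.0 / r), np.log(counts), 1)
    return slope

filled = np.ones((64, 64), dtype=bool)   # a filled square is 2-dimensional
print(round(box_counting_dimension(filled), 2))  # 2.0
```

A filled square yields D = 2 and a straight line yields D = 1; a rough texture mask falls between the two, which is exactly what makes D usable as a roughness feature.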
Spectral domain methods such as the multiresolution wavelet transform [58], discrete
cosine transform [59], Gabor filters [60], and the multiresolution simultaneous autore-
gressive model [40] have the advantage of being less sensitive to noise. Consequently,
most of these transforms have been employed extensively to represent image textures.
Another multiresolution system, the discrete curvelet transform, was created by Candès
and Donoho and is a powerful technique for representing edges; it has been successfully
employed for image denoising [61]. Even so, spatial methods remain efficient compared
with spectral methods and are widely used in CBIR.
2.4.2 Spectral Domain
Spectral domain methods analyze the density function in the frequency domain, which
means that frequency plays an important role in CBIR texture analysis. For large images,
spectral features are useful, while for small images with irregular shapes, spatial features
are more reliable. Innumerable researchers have used spectral features for CBIR. A human
can separate two different images at a glance, but when a machine tries to perform the
same job, a lot of image discriminatory information needs to be preprocessed. We now
discuss the effectiveness of several well-known transforms in representing image texture
features in CBIR techniques.
2.4.2.1 Fourier Transform
The Fourier Transform (FT) uses Fourier analysis to measure the frequency components
of a signal. The purpose of the FT is to convert a time-domain signal into the frequency
domain [62]. The FT provides the pattern information of an image, collected from its
frequency components [17], as shown in Figure 2.6 [17]. The frequency components at
specific locations of an image are utilized to represent the texture features of that image.
Texture features computed from high frequency components are the main distinguishing
factors between images used in CBIR [63]; therefore, frequency information at specific
locations is required to distinguish images. However, the disadvantage of the Fourier
transform is that it captures only the global spectral features and does not provide any
information about the exact location of these features, as shown in Figure 2.7 [62].
Figure 2.6: Image and its Fourier Transform - [17]
Figure 2.7: Different images with same Fourier Transform - [62]
The Fourier transform fails to provide proper texture pattern discrimination. Two com-
pletely different images may have similar patterns in their Fourier domain (shown in
figure 2.7): the original images on the left look different, yet their Fourier spectra have
similar patterns. Based on this spectral pattern, these images may be considered similar
in a CBIR process, although they can easily be differentiated by human perception [63].
Therefore, the Fourier transform is functional only when the spectral features of the signal
matter but not their exact location of occurrence. In [64], shape features are used for
image retrieval. However, when using texture for CBIR, the exact locations of the
different frequency components are just as important as the components themselves in
distinguishing images. Thus, the Fourier transform is not directly applicable in texture
based CBIR.
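The loss of location information can be demonstrated directly: circularly shifting a pattern changes the image but leaves the Fourier magnitude spectrum untouched (shift theorem, the shift moves into the phase). A short numpy check:

```python
import numpy as np

rng = np.random.default_rng(0)
patch = rng.random((8, 8))
img = np.zeros((32, 32)); img[0:8, 0:8] = patch          # pattern at top-left
shifted = np.roll(np.roll(img, 12, axis=0), 12, axis=1)  # same pattern, moved

mag = np.abs(np.fft.fft2(img))
mag_shifted = np.abs(np.fft.fft2(shifted))
# The magnitude spectra are identical: the FT records WHICH frequencies
# are present, not WHERE the pattern sits in the image.
print(np.allclose(mag, mag_shifted))  # True
```

This is precisely the situation of figure 2.7: two different images sharing one magnitude spectrum, indistinguishable to any purely global spectral feature.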
2.4.2.2 Discrete Cosine Transform and Wavelet Transform
The Discrete Cosine Transform (DCT) is similar to the discrete Fourier transform: a signal or image is converted from the spatial domain to the frequency domain, yielding DCT coefficients that can be used for various image processing purposes. The DCT has been adopted as an effective technique for image and video compression. It preserves the image energy in the low-frequency DCT coefficients, which makes it popular for data compression [65]. Different approaches for shape, texture and color feature extraction and indexing using the DCT can be found in [66]. Similar to the FT, DCT texture features capture only global features while ignoring local details; therefore, the DCT is not suitable for CBIR.
Wavelet transforms are used effectively for analyzing the texture information of images. A wavelet transform decomposes an image spectrum into multi-scale, oriented sub-bands; the wavelet packet transform generalizes this decomposition so that all possible combinations of the sub-band tree decomposition are obtained [67]. Wavelet packets are well localized in both time and frequency and therefore provide an attractive alternative to pure frequency analysis [61; 67].
The work of [68] builds a wavelet transform for texture analysis based on the four-tap Daubechies wavelet filter coefficients. In this strategy, the texture is decomposed into ten channels, obtained through three levels of wavelet decomposition.
In each level, the texture is split into four channels, represented as LL, LH, HL and HH. For example, the LH channel represents low horizontal and high vertical frequency. Since the HH channel contains most of the image noise, it can be discarded at each decomposition stage. The horizontal and vertical channels at each frequency are combined to obtain rotational invariance. Experiments on 16 Brodatz textures at six orientations were performed to test the performance of this scheme. The mean wavelet coefficient is extracted from each of the remaining four channels and used as an invariant feature for classification. The drawback of this procedure is that the directional information is lost when the channels are merged.
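The LL/LH/HL/HH decomposition described above can be sketched with the simplest wavelet, the Haar filter (band-naming conventions vary between authors; the labels below follow the order row-filter/column-filter, and the striped test image is an arbitrary choice, not the cited Daubechies scheme):

```python
import numpy as np

def haar_level(img):
    """One level of the 2-D Haar wavelet transform (a minimal sketch)."""
    a = img[0::2, :] + img[1::2, :]        # pairs of rows: low-pass
    d = img[0::2, :] - img[1::2, :]        # pairs of rows: high-pass
    ll = (a[:, 0::2] + a[:, 1::2]) / 4.0   # approximation band
    lh = (a[:, 0::2] - a[:, 1::2]) / 4.0   # horizontal detail
    hl = (d[:, 0::2] + d[:, 1::2]) / 4.0   # vertical detail
    hh = (d[:, 0::2] - d[:, 1::2]) / 4.0   # diagonal detail
    return ll, lh, hl, hh

# Horizontal stripes: intensity varies vertically, not horizontally.
img = np.tile(np.array([[1.0, 1.0], [0.0, 0.0]]), (4, 4))
ll, lh, hl, hh = haar_level(img)

# Energy lands in the vertical-detail band, none in the horizontal one:
print(np.abs(hl).sum() > 0, np.abs(lh).sum() == 0)   # True True
```

Repeating `haar_level` on the LL band gives the multi-level decomposition used in [68].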
2.4.2.3 Gabor Filters Transform
The Gabor filter transform is an effective multiresolution approach. Gabor filters represent the edges of an image well through the use of multiple orientations and different scales, and many researchers have exploited them in image processing. Gabor filtering produces a filter bank composed of Gabor filters with a variety of parameters, scales and orientations. Gabor filters use multiple window sizes at different levels, whereas the STFT uses only one window.
Gabor filters have been extensively used in texture representation and compared with the TWT, PWT and MR-SAR models [39; 69]. Gabor texture features were found to be the most promising and robust. Gabor filters address the problem of retrieving similar and rotated images from an image database [70]. Wavelets are not very effective at representing edge discontinuities in images [63; 71]. In the work of [46; 72], several spectral approaches were compared, and Gabor filters were observed to perform better than wavelet transforms, including orthogonal and bi-orthogonal wavelets. Gabor texture features have also been found suitable and useful in CBIR of biomedical images, such as cervicographic images for cancer detection [73]. Gabor filters use only the half-peak magnitudes in the frequency domain and do not involve image down-sampling.
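A Gabor filter bank of the kind described above can be sketched as follows (the kernel size, wavelengths and orientations are illustrative choices, not values from the cited works):

```python
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma):
    """Real part of a 2-D Gabor filter: Gaussian envelope times cosine carrier."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)    # rotate coordinates by theta
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr ** 2 + yr ** 2) / (2 * sigma ** 2))
    carrier = np.cos(2 * np.pi * xr / wavelength)
    return envelope * carrier

# A small bank: two scales (wavelengths) x four orientations.
bank = [gabor_kernel(15, wl, th, sigma=4.0)
        for wl in (4.0, 8.0)
        for th in (0.0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)]
print(len(bank), bank[0].shape)   # 8 filters of shape (15, 15)
```

Convolving an image with each kernel and taking per-filter response statistics (e.g., mean and variance) yields a Gabor texture feature vector.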
The drawback of these methods is that they do not capture the edge information of an image effectively, which motivates the search for better multiresolution spectral approaches that can. Both the discrete wavelet and Gabor filter transforms struggle with edge and orientation information. A new multiresolution scheme has been introduced by E. J. Candes and D. L. Donoho [61].
2.4.2.4 Curvelet Transform
To overcome the limitations of Gabor and wavelet filters, a new technique, the curvelet transform, was developed. The first curvelet transform strategy was implemented via the discrete ridgelet transform [61; 74]. It has proven an effective tool for image denoising [61], texture classification [75], image convolutions [76], contrast enhancement [77], etc. In the work of [62; 74], two new curvelet transform techniques depending on different operations on Fourier samples were introduced; they are given below.
1. Unequally Spaced Fast Fourier Transform (USFFT)
2. Wrapping Based Fast Curvelet Transform (WBFCT)
WBFCT is faster in computation and more robust than the ridgelet-based and USFFT-based implementations, as shown in Figure 2.8 [62]. We start with the definition of the ridgelet transform: for a given image f(x, y), the continuous ridgelet coefficients are expressed as [76; 78]
ℜ_f(a, b, θ) = ∫∫ ψ_{a,b,θ}(x, y) f(x, y) dx dy (2.1)

In equation 2.1, a > 0 is the scale parameter, b ∈ R is the translation parameter and θ ∈ [0, 2π) is the orientation parameter. The ridgelet ψ_{a,b,θ} is defined as

ψ_{a,b,θ}(x, y) = a^{−1/2} ψ((x cos θ + y sin θ − b)/a) (2.2)
Here θ is the orientation of the ridgelet. Ridgelets are constant along the lines x cos θ + y sin θ = constant, and transverse to these ridges they are wavelets [76]. A ridgelet gives information about edge direction and is much faster than a conventional sinusoidal wavelet. In this approach, the input image is first decomposed into a set of sub-bands, which are then divided into several blocks for ridgelet analysis. The process generates a large amount of redundant data, so it is slow, time consuming and not effective for large databases.
Therefore, the fast discrete curvelet transform (FDCT) based on wrapping of Fourier samples has a lower computational complexity, because it uses the FFT instead of the complex ridgelet transform. In this approach, the curvelets are tightly supported in the frequency domain to reduce data redundancy. Normally, ridgelets use a fixed length equal to the image size with a variable width, whereas curvelets have both variable length and width, giving more anisotropy [79]. WBFCT is therefore simpler, less redundant, more straightforward and faster in computation than the ridgelet-based curvelet transform. The frequency plane is fully covered by the curvelet spectra, so no spectral information is lost or wasted, unlike with the Gabor filter.
Figure 2.8: Five-level curvelet digital tiling of an image - [62]
Majumder has described a method to automate Bangla basic character recognition using the ridgelet-based curvelet transform [80]. There are fifty characters in the Bangla language, and all existing Bangla fonts use these characters. For his experiment, Majumder morphologically altered each character by thinning and thickening the original characters twice. In the training phase, curvelet coefficients have been extracted from all
these characters to generate texture feature descriptors, and 5 sets of classifiers have been created for each character. The characters are altered to capture the variation across different fonts by slightly varying their edge positions. Curvelet texture features of the query character are then compared with the training sets to find matching characters. He performed the experiment on only twenty well-known Bangla fonts; therefore, there is no guarantee that this application will recognize all characters in complex formats as well.
In the work of [80], Joutel et al. created an assistance tool for the identification of ancient handwritten manuscripts using the ridgelet-based curvelet transform. The curvature and orientation of handwritten scripts are the two main morphological shape properties used to generate discrete curvelet features. Joutel et al. focused on characterizing handwritings and classifying them into visual writer families. Problems in historical manuscript classification include difficulty in segmenting lines and words,
nonlinear text size differences, irregular handwritten shapes, difficulty in the recognition
of spaces or edges due to lack of pen pressure, unpredictable page layouts, etc. Moreover,
backgrounds of many ancient documents have noisy texture patterns. Although the clas-
sification and writer recognition tests computed on two separate databases obtain a high
level of accuracy, this approach has some shortcomings. One orientation representation
and one curvature representation have been generated in this approach from each script,
which is not enough to classify and characterize all ancient handwritten scripts. Texture patterns of the image are not represented in this approach, so it will not be effective for natural image retrieval.
In the work of [75], texture classification by statistical and co-occurrence features using the ridgelet-based discrete curvelet transform is presented. In this work, texture classification depends on three different feature descriptors. The first consists of curvelet statistical features (CSF), i.e., mean and standard deviation. The second consists of curvelet co-occurrence features (CCF), i.e., cluster shade, contrast, local homogeneity and cluster prominence. The last involves the combination of the CCF and CSF descriptors. All the related works on the curvelet transform described above use the ridgelet-based curvelet transform. So far, we have found only one application of wrapping
based curvelet transform, in a texture classification method for analyzing medical images gathered from computed tomography [75]. The main problems in this approach are the existence of a large number of similar images in the database and the small image dimensions. Because the database contains only human tissue images, it has less variation in its domain than natural image databases. Tissue texture of the same human organ is expected to show negligible differences in such small images (32 x 32), which makes the classification task simple. Natural images are quite different in nature; therefore, this process will not be effective for CBIR in a large database with large natural images.
We find that the ridgelet-based curvelet transform has some drawbacks. The newer curvelet transforms, the USFFT-based and wrapping-based fast discrete curvelet transforms, have several advantages over ridgelets as well as over the ridgelet-based curvelet transform. Among these new approaches, the wrapping-based discrete curvelet transform provides additional benefits such as robustness, simplicity and good edge-capturing capability in image texture representation. We have also described how the wrapping-based curvelet transform works, detailing curvelet structures in the spatial and spectral domains and how these curvelets provide better texture discrimination in representing edges.
2.4.3 Color Feature
Color is an important visual feature and is pervasive in digital images. The extraction of color features from digital images depends on a thorough understanding of color theory for digital images. A color space specifies a three-dimensional coordinate color system and a subspace within that system in which colors are represented. The most widely used color space for digital photos and computer graphics is the RGB color space, where colors are linear combinations of the red, green and blue channels. Moreover, most digital images store pixel values in RGB format. The geometry of the RGB color space is depicted in figure 2.9 [81].
Different color spaces are analyzed based on three properties. The properties of interest for a color space are uniformity, completeness, and uniqueness. The
Figure 2.9: RGB Color Space - [81]
RGB color space is not perceptually uniform. Among the color spaces, HSV and HMMD are more useful for measuring perceptual similarity. Popular techniques for color feature extraction include the color histogram, the color coherence vector, HSV, YCbCr, HMMD and the dominant color descriptor.
2.4.3.1 Color Histogram
The color histogram is a very simple color feature that represents the color distribution of an image. It has many applications in image retrieval and object recognition [63; 82]. The color space is divided into bins, and each bin has its own frequency. The color histogram is invariant to translation and rotation. Since colors are grouped into bins, every color occurrence contributes to the score of the bin it belongs to in the image. The bins generally indicate the quantity of red, green and blue found in the pixels rather than identifying individual colors [82]. RGB color histograms are usually normalized so that images of different sizes can be compared, as shown in figure 2.10 [82].
However, the color histogram lacks spatial information about pixels, so relationships between different image parts cannot be maintained: two different objects or visuals with the same colors can have the same color histogram, and important information such as shape and texture is ignored.
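For illustration, a quantized joint RGB histogram and its translation invariance can be sketched as follows (the bin count and the two-tone test image are arbitrary choices):

```python
import numpy as np

def rgb_histogram(img, bins=4):
    """Quantize each channel into `bins` levels and count joint occurrences.
    `img` is an (H, W, 3) uint8 array; the result is normalized to sum to 1."""
    q = (img.astype(np.int64) * bins) // 256            # per-channel bin index
    idx = (q[..., 0] * bins + q[..., 1]) * bins + q[..., 2]
    hist = np.bincount(idx.ravel(), minlength=bins ** 3)
    return hist / hist.sum()

img = np.zeros((8, 8, 3), dtype=np.uint8)
img[:, :4] = [255, 0, 0]                                # left half pure red
img[:, 4:] = [0, 255, 0]                                # right half pure green
h = rgb_histogram(img)

# Translating the image does not change its histogram:
print(np.allclose(h, rgb_histogram(np.roll(img, 3, axis=1))))   # True
```

Any rearrangement of the same pixels yields this identical histogram, which is precisely the loss of spatial information discussed above.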
Figure 2.10: Image and its color histogram - [82]
2.4.3.2 Color Coherence Vector
The Color Coherence Vector (CCV) differs from the color histogram in that it captures spatial information in an image [83]. It was constructed to improve on the performance of the global color histogram (GCH), although the computational complexity of a CCV is higher than that of a color histogram. In the work of [83], pixels are classified as coherent or incoherent according to their membership of contiguous color regions: color regions larger than 1% of the image size are considered coherent, and smaller ones are taken as incoherent. Owing to the separation of coherent and incoherent pixels, with significant regions mapping to coherent pixels, CCV performance is better than that of a color histogram. A color histogram can be the same for two different images, while with CCV the difference can be distinguished by separating the histograms of the coherent and incoherent regions. In [84], an efficient image retrieval system using interior pixel classification was proposed. In the image analysis step, the image is quantized in RGB space and pixels are classified as interior or border based on their location, using the 4-neighbor rule for the sake of simplicity. Two histograms are calculated, for interior and border pixels, and a binary classification of image pixels has also been proposed. Owing to the classification of interior and border pixels, the resulting histograms are discriminative. If there are fewer interior pixels for the same color, there must be some visual property of the image that is useful for creating the difference [85; 86]. Three images with their CCV color feature vectors are shown in
figure 2.11 [86].
Figure 2.11: Images and their CCV color feature vectors - [86]
2.4.3.3 HSV Color Space
In the HSV color space, images are treated as groups of pixels comprising red, green and blue values [86; 87]. HSV addresses the color distribution problem by representing color in a way that is close to human perception [88]. In the HSV model, Hue represents the color, Saturation is the amount of the color, and Value represents the amount of light [88; 89], as shown in figure 2.12 [86].

Hue represents the dominant spectral component, such as red, yellow or green. Saturation (S) expresses the purity of the color, while V represents the brightness. The subspace is a cylindrical coordinate system, commonly depicted as an inverted six-sided pyramid. Changes in each color component are perceived approximately linearly, and similarity measures can be defined directly on the color components. To convert primary RGB colors to HSV, the minimum and maximum values of the RGB triplet are first computed:
H = cos⁻¹ { ½[(R − G) + (R − B)] / √[(R − G)² + (R − B)(G − B)] } (2.3)

S = 1 − 3 min(R, G, B) / (R + G + B) (2.4)
Figure 2.12: The RGB and HSV Color Space - [86]
V = (1/3)(R + G + B) (2.5)
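A direct implementation of equations 2.3-2.5 for a single normalized RGB triplet can be sketched as follows (the small epsilon guarding against division by zero is our addition):

```python
import numpy as np

def rgb_to_hsv(r, g, b):
    """HSV from normalized RGB following Eqs. (2.3)-(2.5);
    inputs in [0, 1], H returned in radians."""
    num = 0.5 * ((r - g) + (r - b))
    den = np.sqrt((r - g) ** 2 + (r - b) * (g - b)) + 1e-12  # avoid /0
    h = np.arccos(np.clip(num / den, -1.0, 1.0))
    if b > g:                       # hue lies in [pi, 2*pi] when B > G
        h = 2 * np.pi - h
    s = 1.0 - 3.0 * min(r, g, b) / (r + g + b + 1e-12)
    v = (r + g + b) / 3.0
    return h, s, v

h, s, v = rgb_to_hsv(1.0, 0.0, 0.0)            # pure red
print(round(h, 3), round(s, 3), round(v, 3))   # 0.0 1.0 0.333
```

Pure red yields H = 0 (red is the hue origin), full saturation and V = 1/3, as the equations predict.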
2.4.3.4 HUE MINIMUM MAXIMUM DIFFERENCE (HMMD)
The HMMD (hue-min-max-difference) color space is a newer color scheme supported in MPEG-7, together with simple monochrome grayscale and intensity-only spaces. Hue is defined and calculated as in the HSV color space, and min and max are the minimum and maximum among the R, G and B values. The difference component is defined as the difference between the maximum and minimum values [28; 90]. This color space can be represented using a double-cone structure, as shown in figure 2.13 [28].
2.4.3.5 Dominant Color Descriptor (DCD)
The Dominant Color Descriptor (DCD) is a variation of the color histogram that extracts colors from the highest bins of the histogram [91]. A bin-height threshold is used to select the color bins. According to the MPEG-7 standard, 1-8 colors are sufficient for the representation
Figure 2.13: HMMD Color Space - [28]
of a region. The selected color bins are adapted to the regions rather than to the color space. The performance of DCD is more accurate than that of the color histogram [82]. Many-to-many matching is used for distance calculation and similarity checking.
2.4.3.6 YCbCr Color Space
The difference between YCbCr and RGB is that YCbCr represents color as a brightness component plus two color-difference signals, while RGB consists of red, green and blue components. In the YCbCr color space, Y represents the brightness component, called luma; Cb is blue minus luma (B-Y) and Cr is red minus luma (R-Y). This color space exploits the characteristics of the human eye, which is more sensitive to changes in light intensity and less sensitive to changes in hue. When the amount of information must be minimized, the Y component can therefore be stored more accurately than the Cb and Cr components.
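The conversion can be sketched with the common BT.601 weights (the exact coefficients depend on the standard in use; those below are an assumption on our part, not values taken from this thesis):

```python
def rgb_to_ycbcr(r, g, b):
    """Luma plus two color-difference signals (BT.601-style weights)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b   # luma: weighted brightness
    cb = 0.564 * (b - y)                    # blue-difference chroma (B - Y)
    cr = 0.713 * (r - y)                    # red-difference chroma (R - Y)
    return y, cb, cr

# A gray input carries no chroma: Cb and Cr vanish when r = g = b.
y, cb, cr = rgb_to_ycbcr(0.5, 0.5, 0.5)
print(round(y, 6), round(abs(cb), 6), round(abs(cr), 6))   # 0.5 0.0 0.0
```

Since gray pixels produce zero chroma, the Cb and Cr channels can be subsampled or coarsely quantized with little perceptual loss, which is what the paragraph above describes.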
2.4.4 Shape Feature
Shape is another important visual component of an image, used to recognize real-world objects. Shape features have been used in many applications, including image retrieval. The work of [92] broadly classifies shape extraction techniques into two groups: contour-based and region-based methods. Contour-based techniques compute shape features only from the boundary of the shape, while region-based approaches extract features from the entire region of an image. Since contour-based schemes use only a small portion of the region in an image, they are more vulnerable to noise than region-based methods. For that reason, region-based shape features are preferred for extracting shape features in image retrieval.
Shape descriptors normally used in color image retrieval include moments, circularity, area and eccentricity. Area-based descriptors are employed in several image processing works. In the work of [93], eccentricity, or elongation, is used: eccentricity is the ratio of the major axis length to the minor axis length. Descriptors are therefore usually merged to form a more effective shape descriptor. The type of method determines whether the shape is extracted on the basis of contours or regions; structural and global approaches have been defined within both. Figure 2.14 [92] shows the classification of shape representation and description techniques.
Figure 2.14: Classification of shape representation and description technique - [92]
2.4.4.1 Shape Extraction based on Contour Method
Object boundary is used in contour shape extraction. Two different types of approaches
are used in contour shape methods that are global approach (continuous) and structural
approach (discrete). In global /continuous approach, a shape is not divided into parts and
obtained feature vector from integral boundary for the shape description. Shape similarity
is considered as a distance metric amongst the feature vectors. In discrete shape boundary
approach is broken into small segments which are known primitives by using particular
criterion and final feature representation is build a string [92]. Multidimensional feature
vector of numeric from shape boundary are extracted in global contour shape represen-
tation. The matching process is based on metric distance e.g. city block distance or
Euclidean distance.
2.4.4.2 Simple shape descriptors
The Simple shape descriptor consist of Area, eccentricity, major axis orientation and cir-
cularity [94]. The simple shape descriptors are used for large differences of shape dis-
crimination. Filter are used to remove the false hits or other combined shape descriptors
for discrimination.
Figure 2.15: Shape eccentricity and circularity - [28]
Figure 2.15 [28] illustrates shape eccentricity and circularity. Figure 2.15 (a) shows a parabola, but eccentricity alone does not describe the shape correctly: it appears elongated, yet treating it as circular may seem the better option. Likewise, the shapes in figure 2.15 (b) and figure 2.15 (c) have the same circularity despite being different shapes.
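For illustration, eccentricity can be computed from the central second moments of a binary region (this ratio-of-axes formulation is a common textbook choice, not necessarily the one used in [93]):

```python
import numpy as np

def eccentricity(mask):
    """Eccentricity of a binary region via its central second moments:
    0 for a circle/square, approaching 1 for elongated shapes."""
    ys, xs = np.nonzero(mask)
    x = xs - xs.mean()
    y = ys - ys.mean()
    mxx, myy, mxy = (x * x).mean(), (y * y).mean(), (x * y).mean()
    common = np.sqrt((mxx - myy) ** 2 + 4 * mxy ** 2)
    l1 = (mxx + myy + common) / 2          # variance along the major axis
    l2 = (mxx + myy - common) / 2          # variance along the minor axis
    return np.sqrt(1 - l2 / l1)

square = np.ones((11, 11), dtype=int)      # symmetric region
bar = np.ones((3, 21), dtype=int)          # elongated region
print(round(eccentricity(square), 3), round(eccentricity(bar), 3))   # 0.0 0.991
```

As the figure discussion suggests, a single such number cannot fully characterize a shape, which is why these descriptors are merged in practice.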
More informative simple shape descriptors include convexity, elliptic variance and circular variance. Similarity can also be checked by means of point-to-point matching, and
every point is treated as a feature point [94]. The Hausdorff distance is a classical example of correspondence-based shape matching; it is used for object location in an image and can compute the similarity between shapes [92]. An advantage of the Hausdorff distance is that it supports partial matching of shapes; however, it is not invariant to rotation, translation or scaling. For matching, the shape model is overlapped on the image in different positions, orientations and scales, which makes matching expensive.
2.4.4.3 Shape signature
A shape signature is a one-dimensional function derived from the shape boundary. Centroid profile, centroid distance, complex coordinates, tangent angle, chord length, area and curvature are examples of shape signatures [95]. The best match between shapes is obtained from a shift-matching factor that depends on orientation changes in 1-D space. Some signature matching, for example with centroid profiles, requires shift matching in 2-D space, and the matching cost is too high for online retrieval. Shape signatures are sensitive to noise: a very small change in the boundary can produce a large matching error, which makes shape signatures on their own a poor way to describe shape. Further work is required to increase performance and reduce machine load. A shape signature can be simplified by quantization into a signature histogram, which is invariant.
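The centroid-distance signature, one of the simplest signatures listed above, can be sketched as follows (the sampling density and the circular test shape are arbitrary choices):

```python
import numpy as np

def centroid_distance(boundary, n=64):
    """Centroid-distance shape signature: distance from the region centroid
    to n points sampled along the boundary (a minimal sketch)."""
    c = boundary.mean(axis=0)                          # centroid
    idx = np.linspace(0, len(boundary) - 1, n).astype(int)
    pts = boundary[idx]
    return np.hypot(*(pts - c).T)                      # per-point distances

# Boundary of a circle of radius 5: the signature is (nearly) constant.
t = np.linspace(0, 2 * np.pi, 200, endpoint=False)
circle = np.stack([5 * np.cos(t), 5 * np.sin(t)], axis=1)
sig = centroid_distance(circle)
print(sig.std() < 1e-6, round(sig.mean(), 2))   # True 5.0
```

A small boundary perturbation directly perturbs individual signature entries, which is the noise sensitivity noted above.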
2.4.4.4 Stochastic method
Time-series techniques, especially autoregressive (AR) modelling, have been used to compute shape descriptors. Techniques in this class are mostly based on stochastic modelling of a 1-D function f obtained from the shape. A linear regression model expresses the value of the function as a linear combination of some of its previous values [96]. The autoregressive model predicts the boundary radius from a combination of M previous observations and an error term. AR methods fail for complex boundaries, where the AR parameters are not adequate for description.
2.4.4.5 Spectral transform
Spectral transform shape descriptor overcomes the problem of boundary variations by an-
alyzing the shape and noise sensitivity in spectral domain. Spectral descriptor includes
wavelet and fourier descriptor, both are derived from 1D shape signature. Researchers
have proposed the wavelet descriptor regarding shape description which includes a ben-
efit over Fourier descriptor and multi resolution have been presented in both spatial and
spectral space. The increase of spatial resolution will absolutely give up frequency res-
olution. In the work of [92; 95; 96] low frequencies of wavelet coefficient are used for
shape representation.
In global contour shape methods, the representation depends on the contour of the object, and matching between shapes is done in the feature space. Accuracy and low computational cost are the main requirements for any shape representation system. Global shape descriptors are compact but inaccurate; combining them with other useful shape descriptors can help. Signature-based and correspondence-based shape matching are not adequate for online shape matching because they involve 2-D matching of two shapes. When only partial matching is required, the Hausdorff distance is a better choice. Autoregressive methods involve complex matrix operations and hence more computation. Fourier descriptors are simple to implement and computationally cheap thanks to the fast Fourier transform.
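A minimal Fourier-descriptor construction along these lines (complex boundary signature; invariances obtained by dropping the DC term and normalizing by the first harmonic, a standard textbook recipe, not a method from the cited works) might look like:

```python
import numpy as np

def fourier_descriptor(boundary, k=10):
    """Fourier descriptor from a complex boundary signature.
    Dropping the DC term gives translation invariance; normalizing by the
    first harmonic magnitude gives scale invariance."""
    z = boundary[:, 0] + 1j * boundary[:, 1]   # complex coordinates signature
    f = np.fft.fft(z)
    mags = np.abs(f[1:k + 1])                  # skip DC -> translation invariant
    return mags / mags[0]                      # normalize -> scale invariant

t = np.linspace(0, 2 * np.pi, 128, endpoint=False)
ellipse = np.stack([3 * np.cos(t), np.sin(t)], axis=1)
shifted_scaled = 2.0 * ellipse + np.array([10.0, -4.0])   # moved and enlarged

fd1 = fourier_descriptor(ellipse)
fd2 = fourier_descriptor(shifted_scaled)
print(np.allclose(fd1, fd2))   # True: descriptor ignores shift and scale
```

The normalization assumes a nonzero first harmonic, which holds for the closed curves considered here; taking magnitudes also discards the starting-point dependence.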
2.4.5 Interest point detector
Interest point detectors play a significant role in feature extraction. Most CBIR frameworks use global features such as texture, shape and color, or combinations of these, to extract the required features from images. Similar images can also be retrieved using local features that are robust to various transformations; such features describe a pixel in the image through the content of its local neighborhood. Local descriptors such as SIFT, the Harris detector and SURF search for distinctive locations, called interest points, in an image.
2.4.5.1 Harris detector
The Harris detector has been widely used in object detection and image retrieval for its repeatable detection performance. The basic idea of this detector is to use the autocorrelation function to determine locations where the signal changes in one or two directions [97]. The interest points detected by the Harris detector are not invariant to scale changes. The Harris-Laplace detector can find scale-invariant features. The first step of this approach is to compute the interest points, called Harris points, at different scales; points that are local maxima of the Harris measure are selected by Harris-Laplace due to its high detection rate [98]. The scale at which the Laplacian attains its maximum is taken as the characteristic scale of the interest point.
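The Harris measure R = det(M) − k·trace(M)² on the structure tensor M can be sketched as follows (a deliberately minimal version: a 3 x 3 box average stands in for the usual Gaussian window, and k = 0.04 is the customary choice):

```python
import numpy as np

def harris_response(img, k=0.04):
    """Harris corner measure from the gradient structure tensor."""
    iy, ix = np.gradient(img.astype(float))
    def box(a):                      # 3x3 neighborhood average
        p = np.pad(a, 1, mode='edge')
        return sum(p[i:i + a.shape[0], j:j + a.shape[1]]
                   for i in range(3) for j in range(3)) / 9.0
    ixx, iyy, ixy = box(ix * ix), box(iy * iy), box(ix * iy)
    det = ixx * iyy - ixy ** 2
    return det - k * (ixx + iyy) ** 2

img = np.zeros((20, 20))
img[10:, 10:] = 1.0                  # a single corner at (10, 10)
r = harris_response(img)
peak = tuple(map(int, np.unravel_index(np.argmax(r), r.shape)))
print(peak)                          # at (or next to) the corner (10, 10)
```

Along the straight edges the response is negative (one dominant gradient direction), and only at the corner, where the signal changes in both directions, does R peak.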
2.4.5.2 Speeded up robust feature (SURF)
The Speeded Up Robust Features (SURF) algorithm is a scale- and rotation-invariant interest point detector and descriptor that is computationally very fast. The detector finds points of interest in the image; the descriptor represents the features of the interest points by building the distribution of Haar wavelet responses in the neighborhood of each interest point. An integral image is computed from the input image so that these responses can be evaluated quickly [99; 100]. The main steps of the SURF technique are as follows:
1. Interest point detection
2. Interest point description
3. Feature matching
Interest point detection
Integral images are used to reduce computation time. Owing to its good accuracy, a Hessian matrix approximation computed with box convolution filters is used to detect the interest points, as shown in figure 2.16. The Hessian matrix determines the location and scale of the descriptor.
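The integral image trick behind this speed can be sketched in a few lines (the array size and the tested box are arbitrary choices):

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero top row/column for easy indexing."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def box_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] in four lookups, independent of box size --
    the property SURF's box filters rely on."""
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64))
ii = integral_image(img)
print(box_sum(ii, 8, 8, 40, 40) == img[8:40, 8:40].sum())   # True
```

Because every box sum costs four lookups regardless of box size, the box filters approximating the Hessian can be evaluated at any scale for a constant cost.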
Figure 2.16: SURF interest point -
Interest Point Description
SURF describes the distribution of intensity content in the neighborhood of each interest point. The distribution of first-order Haar wavelet responses is used so that integral images can be exploited for speed. To make the descriptor invariant to image rotation, a reproducible orientation is identified for each interest point: the Haar wavelet responses are computed in the x and y directions inside a neighborhood of radius 6s around the interest point, where s is the detected scale. Six operations are required to compute the response in the x or y direction at any scale.

In the first step, a square region centered on the interest point and aligned with the selected orientation is constructed. To preserve important spatial information, the region is split into 4 x 4 square sub-regions. The Haar wavelet responses in the horizontal and vertical directions are called dx and dy respectively. To increase robustness to geometric distortions and localization errors, dx and dy are first weighted with a Gaussian of standard deviation 3.3s centered at the interest point. The sums of the wavelet responses dx and dy over each sub-region then form a first set of entries in the feature vector [99; 100].
SURF Descriptors Matching
SURF descriptors of two images are matched with a nearest-neighbor (NN) algorithm, which classifies objects based on the closest training examples in feature space. First, a training process builds a repository of objects whose correct classes are already known. A query is then given to the system, and the objects are classified by finding the nearest neighbors of the query in the repository. The system classifies the query as belonging to the same class as its
nearest neighbor, as shown in figure 2.17. Figure 2.17 (a) shows matching between an original image and a cropped image, and (b) shows matching between an original image and its scaled and rotated version.
Figure 2.17: SURF interest point -
2.4.5.3 Scale Invariant Feature Transform (SIFT)
David Lowe introduced the Scale Invariant Feature Transform (SIFT) to describe local image features [101]. The SIFT procedure consists of four stages:

1. Scale space extrema detection

2. Key point localization

3. Orientation assignment

4. Key point descriptor
Scale-space extrema detection
The local interest points detected by the SIFT algorithm are called key points. In this stage, the algorithm searches over all scales and image locations; this can be implemented efficiently using a difference-of-Gaussians function to identify potential interest points that are invariant to scale and orientation.
Key point localization

The next stage performs a detailed fit to the nearby data for location, peak magnitude and edge response. A location is identified in image scale space that is invariant to rotation, scaling and translation of the image. At each candidate location, a detailed model is fit to determine location, scale and contrast. Key points are selected on the basis of measures of their reliability and stability. To characterize the image at each key point location, the image is processed to extract image gradients and orientations [101; 102].
Orientation assignment
One or more orientations are assigned to each key point location based on local image properties. The key point's scale is used to select the Gaussian-smoothed image L with the closest scale, so that all computations are performed in a scale-invariant manner.
Key-point descriptor
The local image gradients are measured at the selected scale in the region around each key point and transformed into a representation that tolerates local shape distortion and changes in illumination. First, the gradient magnitudes and orientations are computed in the image region around the key-point location. The samples are weighted by a Gaussian window, indicated by the overlaid circle, and accumulated into orientation histograms summarizing the contents over 4x4 sub-regions, as shown in figure 2.18; the length of each arrow corresponds to the gradient magnitudes along that direction in the image region. SIFT features are highly distinctive, so object matching works efficiently even for large databases [101; 102]. Figure 2.18 illustrates the image gradients and the key-point descriptor.
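The orientation-histogram construction described above can be sketched in a simplified form. This is a single histogram cell rather than the full Gaussian-weighted 4x4 grid of SIFT; the function name, bin count and test patch are illustrative, not part of the original algorithm specification.

```python
import math

def orientation_histogram(gray, bins=8):
    """Gradient-orientation histogram of an image patch, weighted by
    gradient magnitude (a simplified version of one SIFT sub-region cell)."""
    hist = [0.0] * bins
    h, w = len(gray), len(gray[0])
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # central-difference gradients
            dx = gray[y][x + 1] - gray[y][x - 1]
            dy = gray[y + 1][x] - gray[y - 1][x]
            mag = math.hypot(dx, dy)
            theta = math.atan2(dy, dx) % (2 * math.pi)
            hist[int(theta / (2 * math.pi) * bins) % bins] += mag
    return hist

# A patch with a purely horizontal intensity ramp: all gradient energy
# falls into the bin around 0 radians.
patch = [[x for x in range(5)] for _ in range(5)]
print(orientation_histogram(patch))
```

In the real descriptor, sixteen such histograms (one per 4x4 sub-region) are concatenated and normalized to form the 128-dimensional SIFT vector.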
Figure 2.18: SIFT interest points [101]
2.5 Similarity Measure
In CBIR, finding a similarity measure that is consistent with human perception and maps the user's information need is a fundamental problem. A simple way to detect similarity between images is through distance measures. In particular, a specific distance measure could be designed for a single visual feature in a certain space to match the perceptual similarity [103; 104]. However, simple distance measures are not always effective; therefore more complex methods, which may also be more effective, are desirable for CBIR. Several distance measures have been introduced in the literature, categorized either for metric spaces or for histograms.
2.5.1 Metric Space
When the feature vectors used to represent images correspond to points in a metric space, similarity is usually determined by computing the distance between the corresponding points in that space. Various metric distances can be used for this purpose [105].
2.5.1.1 Manhattan Distance
Manhattan distance is also known as the L1, taxicab or city-block distance. This similarity metric is computed between two points $P = (p_1, p_2, \ldots, p_n)$ and $Q = (q_1, q_2, \ldots, q_n)$ as the sum of their absolute coordinate differences:

$$\mathrm{Dist}_{L1}(P,Q) = \sum_{i=1}^{n} |p_i - q_i| \qquad (2.6)$$
2.5.1.2 Euclidean Distance
Euclidean distance is frequently referred to as the L2 distance, and measures the shortest path between two points. It is computed as:

$$\mathrm{Dist}_{L2}(X,Y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \qquad (2.7)$$

Some applications like MARS [106] add a weight component to the Euclidean distance to compute the similarity. The modification can be represented as:

$$\mathrm{Dist}_{L2}(X,Y) = \sqrt{\sum_{i=1}^{n} w_i (x_i - y_i)^2} \qquad (2.8)$$

and is known as the weighted Euclidean distance.
2.5.1.3 Minkowski Distance
The Minkowski distance is a generalization of the L1 and L2 metrics, in which a parameter P controls how the distance is calculated [105].

$$\mathrm{Dist}_{LP}(X,Y) = \left(\sum_{i=1}^{n} |x_i - y_i|^P\right)^{1/P} \qquad (2.9)$$

Choosing P = 1 yields the Manhattan distance, P = 2 yields the Euclidean distance, and the limit P → ∞ yields the Chebyshev distance. Fractional distances can be obtained by choosing 0 < P < 1; as these violate the triangle inequality, they are not metric distances.
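A minimal sketch covering equations (2.6), (2.7) and (2.9) through the single Minkowski form (the function name and vectors are illustrative):

```python
def minkowski(x, y, p):
    """L_p distance between two equal-length feature vectors."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

x, y = [1.0, 2.0, 3.0], [4.0, 6.0, 3.0]
print(minkowski(x, y, 1))  # Manhattan (L1): |1-4| + |2-6| + |3-3| = 7.0
print(minkowski(x, y, 2))  # Euclidean (L2): sqrt(9 + 16) = 5.0
```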
2.5.1.4 Hausdorff Distance
In region-based image retrieval the Hausdorff distance is used, which can be computed as:

$$\mathrm{Dist}_{H}(X,Y) = \max\left(\max_i \min_j D(x_i, y_j),\; \max_j \min_i D(y_j, x_i)\right) \qquad (2.10)$$

where $D(x_i, y_j)$ and $D(y_j, x_i)$ are the underlying distances between the vectors of $X = (x_1, x_2, \ldots, x_i)$ and $Y = (y_1, y_2, \ldots, y_j)$.
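Equation (2.10) can be sketched directly; the underlying distance D is taken here as L1, and the point sets are illustrative:

```python
def hausdorff(X, Y, d):
    """Symmetric Hausdorff distance between two point sets X and Y."""
    h_xy = max(min(d(x, y) for y in Y) for x in X)   # max_i min_j D(x_i, y_j)
    h_yx = max(min(d(y, x) for x in X) for y in Y)   # max_j min_i D(y_j, x_i)
    return max(h_xy, h_yx)

l1 = lambda a, b: sum(abs(u - v) for u, v in zip(a, b))
X = [(0, 0), (1, 0)]
Y = [(0, 1), (3, 0)]
print(hausdorff(X, Y, l1))  # 2
```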
2.5.2 Histogram Distance
In image retrieval histograms are frequently used, especially when color features are used for image representation. Histograms can alternatively be treated as probability distributions, in which case the likelihood of an image matching the query concept is often considered.
2.5.2.1 Earth Mover's Distance
This distance metric computes the distance between two weighted distributions by converting values of the first distribution into those of the second. The distance can be computed as [107]:

$$\mathrm{Dist}_{EMD}(X,Y) = \frac{\sum_{i=1}^{m}\sum_{j=1}^{n} f_{ij}\, c_{ij}}{\sum_{i=1}^{m}\sum_{j=1}^{n} f_{ij}} \qquad (2.11)$$

In the above equation $X = \{(x_1, w_{x_1}), \ldots, (x_m, w_{x_m})\}$ and $Y = \{(y_1, w_{y_1}), \ldots, (y_n, w_{y_n})\}$, where $x_i$ and $y_j$ are the cluster representatives and $w_{x_i}$ and $w_{y_j}$ are the corresponding cluster weights. Furthermore, $c_{ij}$ represents the distance between clusters $i$ and $j$, and $f_{ij}$ represents the optimal flow in converting distribution X into Y.
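Computing the optimal flow in equation (2.11) generally requires solving a transportation (linear programming) problem. In the special case of two 1-D histograms with equal total weight, however, the optimal cost reduces to the accumulated bin-to-bin difference, which the following sketch illustrates (the function name and histograms are illustrative):

```python
def emd_1d(p, q):
    """EMD between two 1-D histograms with equal total weight:
    cumulative-difference form (no LP solver needed in this special case)."""
    assert abs(sum(p) - sum(q)) < 1e-9
    emd, carry = 0.0, 0.0
    for pi, qi in zip(p, q):
        carry += pi - qi      # mass that must flow to the next bin
        emd += abs(carry)     # cost of moving that mass one bin
    return emd

print(emd_1d([0.0, 1.0, 0.0], [0.0, 0.0, 1.0]))  # move 1 unit one bin: 1.0
```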
2.5.2.2 Kullback-Leibler (KL) Divergence
For two probability distributions X and Y, the KL divergence is an asymmetric dissimilarity measure, often used for measuring the similarity of texture features. One way of interpreting it is as the number of extra bits needed when a code optimized for Y is used to encode events sampled from X. The KL divergence between two distributions can be computed, in the continuous and discrete cases respectively, as:

$$D_{KL}(X \,\|\, Y) = \int_{-\infty}^{\infty} X(t)\,\log\frac{X(t)}{Y(t)}\,dt, \qquad D_{KL}(X \,\|\, Y) = \sum_{i} X(i)\,\log\frac{X(i)}{Y(i)} \qquad (2.12)$$
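A minimal sketch of the discrete form of equation (2.12); the distributions are illustrative, and bins where X is zero contribute nothing by convention:

```python
import math

def kl_divergence(p, q):
    """Discrete KL divergence D_KL(P || Q); asymmetric and non-negative."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl_divergence(p, q))   # != kl_divergence(q, p): the measure is asymmetric
```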
2.6 Performance Evaluation
When evaluating a CBIR system, the main objective is to determine its performance.
Performance can be evaluated in terms of accuracy, e.g. how many mistakes the image
retrieval algorithm makes, or in terms of computational complexity, e.g. how quickly the system presents the results. In this section, a number of popular performance measures that are frequently used to evaluate CBIR systems are presented.
2.6.1 Accuracy
The major emphasis of most researchers is on assessing how well their proposed method works, especially when comparing it with the work of other researchers. In the domain of CBIR, image retrieval algorithms are designed so that they increase the correct categorization rate, or at least ensure that the top M images returned as the system response are relevant to the user. Four cases can be distinguished when an algorithm assigns a label to an image [105], as shown in figure 2.19. From the user's point of view, retrieval accuracy should consider only the true positives, as these are the related images for any query image. But since the displayed output contains a limited number of images, false positives can influence the number of correctly labeled relevant images, as one or more of the displayed images may actually be incorrectly labeled as relevant. This is the reason that image retrieval performance is usually reported in terms of precision and recall values.
Figure 2.19: Accuracy Parameters - Correct and incorrect labeling of an image -
[105]
2.6.1.1 Precision
Precision indicates how exact an algorithm is in retrieving the relevant images. Using the terminology of true and false positives and negatives shown in figure
2.19, precision can be expressed as follows:

Precision = true positives / (true positives + false positives)

But if a scope factor is also present in the system response, then the precision of image retrieval is defined as:

Precision = true positives / images in scope of response
Since the number of relevant images in a specific category can be large, it is not possible to display all of them at once. Therefore, a constraint is kept that a fixed number of images is displayed in a single go; this limit is known as the scope. For example, suppose there are 100 relevant images for a query image and we present 20 images as the system output in a single go: these 20 images constitute the retrieval scope, and in this case the precision is known as the precision rate. When precision values are plotted at multiple scope levels, the resulting graph is called a precision-scope graph. In relevance-feedback-based CBIR, the precision is often plotted against the number of iterations; to create this precision-iteration graph the scope is fixed at a certain value, usually the number of images that are displayed in a single screen.
2.6.1.2 Recall
Recall indicates the completeness of a CBIR system in terms of returning the relevant images as its response, i.e. the percentage of relevant images retrieved at different scope levels. Recall can be calculated as:

Recall = true positives / (true positives + false negatives)

Recall is also known as the sensitivity of the system. In recall, it does not matter how many images are displayed on screen or how many incorrect images are wrongly considered relevant, because the measure only focuses on the relevant images found so far. For this reason, recall is also known as the true positive rate. In a relevance-feedback-based CBIR system, the recall can also be plotted against the number of iterations as a recall-iteration graph.
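Both measures can be sketched together; the retrieved set plays the role of the scope, and the image identifiers are illustrative:

```python
def precision_recall(retrieved, relevant):
    """Precision and recall for a retrieved set against the relevant set."""
    tp = len(set(retrieved) & set(relevant))
    precision = tp / len(retrieved)   # TP / (TP + FP): exactness
    recall = tp / len(relevant)       # TP / (TP + FN): completeness
    return precision, recall

# 20 images shown (the scope), 100 relevant images in the database, 15 hits:
retrieved = list(range(20))
relevant = list(range(5, 105))
print(precision_recall(retrieved, relevant))  # (0.75, 0.15)
```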
2.6.2 Precision-Recall Graphs
For increasing numbers of retrieved images, precision-recall value pairs can be computed. In a precision-recall graph these values are plotted as a curve, with the recall of the retrieval algorithm on the x-axis and its precision on the y-axis. An ideal goal for image retrieval system development is to improve the system so that both precision and recall values increase.
2.6.3 Mean average precision
By averaging the precision values obtained every time a relevant image is encountered, one gets a good sense of how well a method performs overall [105]:

$$AP = \frac{\sum_{i=1}^{N} \mathrm{Precision}(i)}{N}$$

where N = true positives + false negatives. Calculating the average precision for multiple queries and taking the mean of these values yields the mean average precision (MAP). The MAP value is usually considered equal to the area under the precision-recall graph.
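Average precision as defined above can be sketched as follows; the rankings and relevant sets are made-up examples:

```python
def average_precision(ranked, relevant):
    """AP: average the precision values at each rank where a relevant
    image appears, divided by the total number of relevant images."""
    relevant = set(relevant)
    hits, total = 0, 0.0
    for rank, image in enumerate(ranked, start=1):
        if image in relevant:
            hits += 1
            total += hits / rank   # precision at this hit
    return total / len(relevant)

# MAP is the mean of AP over several queries:
ap1 = average_precision(["a", "x", "b"], {"a", "b"})   # (1/1 + 2/3) / 2
ap2 = average_precision(["x", "a", "b"], {"a", "b"})   # (1/2 + 2/3) / 2
print((ap1 + ap2) / 2)
```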
2.7 Data Sets
For content-based image retrieval evaluation, many image sets are used, such as the Corel stock photos, Caltech101, TRECVID, OLIVIA, and ImageCLEF. The Corel stock photos are the most popular for performance evaluation in the domain of CBIR. The Caltech101 dataset includes 101 image categories and has been extended to 256 categories; Caltech is used for object recognition and classification. ImageCLEF is used for cross-language evaluation and provides 20,000 images that can be used for CBIR performance evaluation. The TRECVID benchmark is also used by CBIR researchers to validate their image retrieval algorithms. Some research works also use the OLIVIA image set, which contains eight semantic categories.
2.8 Multimedia Information Retrieval
Multimedia Information Retrieval (MIR) is a field that deals with the search for knowledge in all forms of media, such as video, audio and images. Content-based strategies are an essential method for properly accessing the required information in media data sources, and current research concentrates on improving such search. Work on extracting digital information began as soon as the idea of digitizing content that existed in physical media, such as books and vinyl records, gained a foothold. From a theoretical point of view, areas such as artificial intelligence, computational vision, optimization theory, and pattern recognition have contributed significantly to the basic scientific foundations of MIR. Psychology and related fields such as aesthetics and ergonomics provided essential fundamentals for the interaction with the user.
Furthermore, applications of pictorial search over a database of images already existed in specialized forms, such as face recognition in biometric applications, robotic guidance for traversing terrain while avoiding obstacles and taking the simplest path, and character recognition in textual data. At the forefront, however, was the field of computer vision, which provided some of the first algorithms for searching features in video, audio, and images. With the growth of the internet, Web engines caught on and started to provide image searches. Efforts were also made to integrate such systems directly into commercial database systems. During the course of developing media information systems, scientists realized that there was a widening semantic gap between the low-level features, like colors and textures, used in their computations and the high-level concepts, like objects in an image, that users generally search for using words from their daily language.
2.8.1 Image representations and similarity detection
Image retrieval (IR) is one of the most active research trends in MIR. The process of computing a precise re-encoded form of an image suitable for comparison is referred to as feature extraction. Features can be classified as low-level or high-level characteristics; low-level features are usually extracted from the actual pixels. A CBIR system sees the query image and the images in the database as sets of features, and ranks the relevance of the target images to the query in proportion to their feature similarity.
IBM's QBIC system [47] was the first CBIR system and opened avenues for research in the field. Subsequently, many CBIR systems emerged that aimed to make image search more effective by adopting new similarity measures and new ways of detecting image similarity. In CBIR, the image signature plays an imperative role in fabricating an efficient image search. Signature development is usually performed through the analysis of color [82; 108], texture [63; 109], or shape [110; 111; 112], or by generating a combination of these and representing them mathematically [113]. Traditional CBIR techniques rely on two types of visual properties: global and local features. Algorithms based on global features focus on the visual content of the whole picture, such as color, texture and shape, while local-feature algorithms are mainly based on key points or salient patches.
In CBIR, color features are extensively used; they have better discriminative potential in the three-dimensional color domain than in the single-dimensional gray-level domain. Texture features are powerful visual features used to capture the repetitive patterns of a surface in images. Shape is known to be an important cue by which humans identify and recognize real-world objects, and shape features have been used for the purpose of retrieving images in many applications [114]. Shape feature extraction techniques are classified into contour-based and region-based methods: contour-based methods extract features from the shape boundary (the contours), while region-based methods extract features from the entire region.
In the work of [115] a color-texture and dominant color based image retrieval system (CTDCIRS) is proposed, and three different image features are offered: the Motif Co-occurrence Matrix (MCM), Dynamic Dominant Color (DDC), and Difference Between Pixels of Scan Pattern (DBPSP). Initially a color quantization algorithm is used to divide the image into eight coarse partitions, from which eight dominant colors are obtained. Next, MCM and DBPSP are used to represent the texture of the image. These three types of features are integrated for image representation and then used by the image retrieval system.
In the work of [116], a dominant color is defined as the perceptual color of a region in the HSV color space. In this technique, the author calculates the dominant color feature using an HSV color histogram (10 x 4 x 4 bins) over a region and selects the bin with the maximum count; the mean of the HSV values of all pixels in the designated bin is then obtained. This value is known as the dominant color value in HSV color space. It was noted that in most cases the average color and the dominant color are very similar.
In the work of [89], an image retrieval system using texture features is presented. The technique combines features obtained through the curvelet transform with Region-Based vector codebook Sub-band Clustering (RBSC) to obtain the dominant colors and sub-band textures. In this approach, the user-defined query image and the target images are compared using the principle of most similar highest priority (MSHP) and evaluated for retrieval performance. Lin et al. [2] proposed three image features for efficient automatic image retrieval: the difference between pixels of scan pattern (DBPSP) is used to extract the texture feature, the color co-occurrence matrix (CCM) is used to obtain the color features, and the last feature depends on the color distribution and is called the color histogram for K-means (CHKM).
Jhanwar et al. [117] presented an MCM-based technique for CBIR. In this work, a motif-transformed image is computed and used for the derivation of the motif co-occurrence matrix. The whole image is divided into 2x2 pixel grids, and each grid is replaced by the motif that minimizes the local gradient while traversing the 2x2 grid, forming the motif-transformed image. The MCM is defined as a 3D matrix whose (i, j, k) entry represents the probability of finding a motif i at a distance k from the motif j in the transformed image. The concept of the MCM is very similar to the color co-occurrence matrix (CCM), but since the MCM also captures third-order image statistics in the local neighborhood, retrieval using the MCM is better than with the CCM.
Wang et al. [107] offered a semantics classification method, which uses a wavelet-based approach to extract features and then compare images. In the work of [118], a method for cluster-based retrieval of images by unsupervised learning (CLUE) is proposed. This unsupervised clustering-based method generates multiple sets of retrieved results and gives more accurate results compared to previous work, but it suffers from issues such as identifying the number of clusters and segmentation uncertainty, which make its results unreliable.
ElAlami [85] proposed a model based on three different techniques. The first is concerned with extracting the features from the image repository; for this purpose, the color histogram and Gabor filter are used to extract combined color and texture features. The second technique depends on a genetic algorithm and obtains the optimal boundaries of these discrete values. In the last technique, feature selection consists of two successive functions, called preliminary and intense reduction, for extracting the most similar features from the original feature repository sets.
2.8.2 Image block based presentation and salient points
In computer vision, Local Binary Patterns (LBP) are widely used for classification. The LBP operator was first introduced as a complement to a local image contrast measure [119; 120], and has been observed to have strong advantages for texture classification. The primary form of the LBP operator is based on the eight neighbouring pixels, with the value of the center pixel used as the threshold. An LBP neighborhood code is produced by multiplying the thresholded values with the weights given to the corresponding pixels and summing the result.
Some other visual features have also been proposed for CBIR systems, such as salient points and spatial features. SIFT [121] and SURF [122], based on the salient points found in an image, are familiar visual features, and researchers have done a lot of work using these salient points in CBIR. Velmurugan and Baboo [100] applied SURF features, combining them with color features to improve retrieval accuracy. In [123] a human detection algorithm using histograms of oriented gradients (HOG) is introduced; HOG features are similar to the features used in the SIFT descriptor and are calculated by taking histograms of edge-intensity directions in a local area. The method is inspired by the tradition of visual information processing in the brain and is robust to local changes of appearance and position. The researchers showed that grids of HOG descriptors outperform existing feature sets for human detection.
Mallat and Peyre [124] introduced bandelet approaches to geometric image representations. Orthogonal bandelets using an adaptive segmentation are well suited to capture the regularity of edge structure; an orthogonal transformation based on bandeletization is applied to the wavelet coefficients. In the work of [125] a system based on the bandelet transform is proposed, which represents sharp image transitions such as edges by taking advantage of the geometric regularity of the image structure in image fusion. To create the fused image, a max rule is applied to select the geometric flow and bandelet coefficients of the source images. The work of [126] demonstrated significant improvements in large-scale image retrieval performance while maintaining high retrieval speed, through three improved models. RootSIFT, among the established models, gives better performance without increasing processing and storage requirements; the second model targets inverted indexes and query expansion, whereas the last model retains augmented features consistent with augmented images to speed up image retrieval. The combination of these three complementary models produces enhanced accuracy and efficient retrieval speed. Finally, to enhance the image finding rate and simplify the computation of image retrieval, a dimensionality reduction technique such as feature selection should be adopted [127].
2.8.3 Image classification and similarity detection
Neural networks are used as statistical tools in different fields, including architecture, statistics, engineering, psychology, economics and physics. The aim of a neural network is to learn or discover some correlation between the input and output patterns, or to analyze or discover the structure of the input patterns. Neural networks have been used to improve image classification because of their 'black box' learning property [128]. Techniques using ANNs are implemented in two phases: a training phase and a testing phase. In the training phase, the input data and target data must be fed into the network; the color and texture features extracted from the images form the input data, and the category designation of the images forms the target data. The back-propagation learning rule is applied until network convergence is reached.
In the testing phase, the same procedure and techniques are used to extract the features of the query image and create a feature vector, which then becomes the input to the trained neural network for the retrieval process. The network assigns one or more similar categories to the query. Finding a good measure of similarity between images on the basis of some feature set is a difficult task, and it strongly affects the effectiveness and efficiency of the retrieval technique. The feature values of the database images are calculated and stored, and can then be queried. The type of feature vector selected determines the type of measurement that will be used to compare similarity: if the extracted features present images as multi-dimensional points, the distances between the corresponding multi-dimensional points can be calculated. Euclidean distance, Manhattan distance, weighted Euclidean distance, minimum-rule distance, cross-correlation and statistical distances are the most common metrics for measuring the distance between two points in a multi-dimensional space [104].
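The distance-based ranking loop described above can be sketched as follows; the feature vectors and image names are illustrative, and Euclidean distance stands in for any of the metrics listed:

```python
def retrieve(query_vec, database, k=3):
    """Rank database images by Euclidean distance of their feature
    vectors to the query vector and return the k nearest."""
    dist = lambda v: sum((a - b) ** 2 for a, b in zip(query_vec, v)) ** 0.5
    return sorted(database, key=lambda item: dist(item[1]))[:k]

db = [("img1", [0.9, 0.1]), ("img2", [0.2, 0.8]), ("img3", [0.85, 0.2])]
print([name for name, _ in retrieve([1.0, 0.0], db, k=2)])  # ['img1', 'img3']
```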
In the work of [129] the classification of multi-spectral remote sensing data using a Back-propagation Neural Network [130] is presented. A combined learning technique for mid-level feature representation is presented in [131]. The procedure unites the benefits of semantic features and highly expressive non-semantic vector representations to infer closed-form solutions of the optimization problems. An auto-encoder model with a large-margin principle was utilized to augment attribute-based features with additional dimensions to form an efficient retrieval model. The constructed model permits smooth transitions between zero-data learning without training samples, unsupervised learning with training samples but without class labels, and supervised learning with training samples, to achieve higher precision than the semantic or syntactic representations alone. Support vector machines (SVM) and Relevance Feedback (RF) are used to solve the classification problem, in which relevant images and irrelevant images serve as two separate training sets. SVM active learning [132] considers the samples near the SVM boundary and takes labeled input from the users; the most informative samples are those near the boundary. The constrained similarity measure support vector machine (CSVM) [133] considers the repository images as belonging to two different clusters separated by the boundary, and the results are obtained after sorting. Asymmetric bagging and random subspace for support vector machines (ABRS-SVM) [134] addresses the issue of imbalanced training sets by generating multiple versions of the SVM classifier, replacing the negative samples with positive duplicates. This approach improves image retrieval, but not significantly when the feedbacks are severely imbalanced.
Irtaza and Jaffar [87] presented a possible solution for retrieving semantically similar images from large image repositories for any query image. The algorithm uses support vector machines and genetic algorithms to reduce the gap between high-level and low-level features. To avoid the risk of result degradation, relevance feedback is also incorporated in their work.
2.9 Chapter Summary
Some of the basics required for this research have been presented in this chapter. The chapter started with an overview of content-based image retrieval; then we discussed the different query scenarios in a CBIR system, such as query by example, image-region-based query, sketch-based query, query by multiple examples, and query by multiple modalities. We also discussed the role of feature extraction and the different ways in which visual features can be extracted from images. Another focus of the chapter was similarity computation and performance measurement; in this regard, we discussed popular similarity computation methods and machine learning techniques. The chapter also covered the performance evaluation measures and the standard image benchmarks that are commonly used in CBIR research. Finally, a comprehensive review of previous work in the field of CBIR was presented.
CHAPTER 3
BANDELET TRANSFORM
3.1 Introduction
Wavelet bases are suboptimal for approximating regular images because they cannot take advantage of the geometric regularity of image structures. Indeed, wavelets have square supports on a lattice, which are not adapted to the anisotropic geometric regularity of elements such as edges. Several frames, for example the curvelets of Candes and Donoho [135] and the warped bandelets of Le Pennec and Mallat [136], have been introduced to enhance the approximation performance of wavelets. The image is decomposed over vectors that are elongated in different directions and have vanishing moments, to exploit the regularity of the image along these preferred directions. Asymptotic theory gives better approximation error decay in these frames compared to wavelet bases, yet curvelets and warped bandelets do not appear to clearly improve the numerical approximation capabilities of wavelets for most natural images.
From a scientific perspective, the question raised by these physiological models is to understand whether one can develop hierarchical representations of the image from wavelet coefficients that exploit geometric image regularity. Characterizing a geometry on wavelet coefficients provides the ability to adapt this geometry to the image scale. This can be important for surfaces having multi-scale structures following distinct geometries at every scale.
We show that representing the geometry on the wavelet coefficients has a number of numerical and computational advantages over decompositions in directionally adapted bases such as curvelets and warped bandelets. In contrast to these previous constructions, the resulting bandelet bases are orthogonal and inherit the regularity of the wavelets. These bases are obtained from a wavelet basis through a cascade of orthogonal operators that define a discrete bandeletization, which leads to a fast algorithm.
3.2 Surface Compression through Geometric Bandelet
The geometry of natural surfaces is irregular and inherently multi-scale [137]. Diverse methods are applied to describe the geometry of a variety of structures at distinct levels of detail: large-scale structures (the customary 3D mesh representation), meso-structures (bump maps or displacement maps), and micro-scale material structures (reflectance functions). Image surfaces are usually decomposed in a bandelet basis, which has a fast bandeletization algorithm. Finding these geometric components is an ill-posed problem, and researchers try to address the estimation of the regularity direction for the purpose of surface compression. In image processing, discrete data are used; sampling is the first step before any processing and can model various acquisition processes such as range scanning, reconstruction, and remeshing of a 3D model.
Several strategies have already been proposed to address the problem of 3D geometry compression; see the recent survey of [138]. In order to perform the compression, semi-regular remeshing may be utilized. The first construction of wavelets over triangulations was developed with the lifting scheme, which provides this structure and supplies a good tool for surface analysis. In practice, the best known coders use the normal multi-resolution structure of [139]. For image and geometry-image compression, the most effective algorithms follow the transform-coding idea: decompose the signal in an orthonormal basis and quantize the resulting coefficients. Well-known examples of such algorithms are JPEG and JPEG2000. Using regular basis functions is also essential to avoid introducing blocking artifacts in the compressed signal. Notable constructions include curvelets [135], contourlets [140], wedgeprints [141] and non-linear subdivision schemes [142]. However, none of these methods is able to build a basis of regular orthogonal functions, which is highly desirable for image compression.
The bandelet approximation [136] takes advantage of irregular image geometry by eliminating the redundant information of the warped wavelet transform through bandeletization. However, the resulting transform is non-orthogonal and warped, with border effects. As an alternative, second-generation bandelets are constructed over a standard orthogonal wavelet transform; the result is simpler, orthogonal, and free of border effects. This second-generation bandeletization first rearranges the 2D wavelet coefficients and then performs a 1D wavelet transform. The steps of the bandelet algorithm are:
1. Input image.
2. 2D wavelet transform.
3. Selecting each dyadic square.
4. Selecting each geometry.
5. Projection of the sampling locations.
6. 1D wavelet transform.
7. Selection of the best geometry.
8. Output of the transform.
9. Build the quadtree.
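The steps above can be sketched numerically. The following minimal Python sketch (our own illustration, not the thesis implementation) runs steps 2-6 for a single dyadic square, using an orthonormal Haar filter and a flattening order standing in for the geometry; since every step is orthonormal, the energy of the image is preserved:

```python
import numpy as np

def haar1d(x):
    """Full orthonormal 1D Haar transform (length must be a power of 2)."""
    x = x.astype(float).copy()
    n = len(x)
    while n > 1:
        a = (x[0:n:2] + x[1:n:2]) / np.sqrt(2.0)  # averages
        d = (x[0:n:2] - x[1:n:2]) / np.sqrt(2.0)  # details
        x[: n // 2], x[n // 2 : n] = a, d
        n //= 2
    return x

def haar2d(img):
    """Separable orthonormal Haar transform along rows, then columns."""
    out = np.apply_along_axis(haar1d, 1, img)
    return np.apply_along_axis(haar1d, 0, out)

def bandeletize_square(square, direction):
    """Reorder the coefficients of a dyadic square along a sampling
    direction (0 = row-major, 1 = column-major) and apply a 1D Haar
    transform, mimicking the projection + 1D transform steps."""
    flat = square.flatten(order="C" if direction == 0 else "F")
    return haar1d(flat)

rng = np.random.default_rng(0)
img = rng.standard_normal((8, 8))
coeffs = haar2d(img)                  # step 2: 2D wavelet transform
band = bandeletize_square(coeffs, 0)  # steps 3-6 on one square
# All steps are orthonormal, so the energy is preserved exactly.
print(np.allclose(np.sum(img**2), np.sum(band**2)))
```

In a full implementation this square-by-square processing is repeated over the quadtree of dyadic squares (steps 7-9).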
The user gives an image to store as 3-channels by taking any geometry image tech-
nique. The compression rate of algorithm depend upon the value of threshold T. The
transform orthogonal or biorthogonal are applied on 3-channels (RGB) of the given im-
age and the results is a collection of images in 3-tupples (fHj , fVj , fDj ). The new images
f sj , for each scale 2j and orientation S ∈ (V,H,D) are stowed in single image with same
size of the original image f. Some sort of dyadic rectangular is actually by simply descrip-
tion the rectangular attained by simply recursively removing the original wavelet changed
impression f sj , into 4 sub-squares of similar size. To choose an appropriate geometry, the
direction (d) must be selected in a way that helps in minimizing the Lagrangian equation:
$\mathcal{L}(f_d, R) = \|f_d - f_{d,R}^{q}\|^2 + \lambda T^2 (R_G + R_B)$ (3.1)
where $f_{d,R}^{q}$ is the signal regenerated using the inverse 1D wavelet transform, and $R_G$ represents the total number of bits used to code the geometric parameter $d$ with an entropy coder. The transformation of $f$ by this procedure is equivalent to the decomposition of $f$ in a bandelet basis $B$. A bandelet function $b_u$ is indexed by $u = (j, S, k, m)$, where $2^j$ is the scale of the 2D wavelet transform, $S$ is a dyadic square of width $L$ pixels with $1 \le L \le 2^{-j}$, and $k \in \{0, \ldots, 2\log_2(L)\}$ and $m \in \{1, \ldots, 2^k\}$ are the scale and index in the 1D wavelet transform. The bandelet basis depends on the wavelet, on a quadtree segmentation into non-overlapping squares, and on the geometry fixed at each scale.
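The minimization of the Lagrangian (3.1) can be made concrete with a toy example. The sketch below is an illustrative assumption, not the thesis code: Haar wavelets, two candidate sampling orders standing in for the geometric directions $d$, and the count of above-threshold coefficients as a crude proxy for $R_G + R_B$. The direction with the smallest cost is selected:

```python
import numpy as np

def haar1d(x):
    """Full orthonormal 1D Haar transform (length a power of 2)."""
    x = x.astype(float).copy()
    n = len(x)
    while n > 1:
        a = (x[0:n:2] + x[1:n:2]) / np.sqrt(2.0)
        d = (x[0:n:2] - x[1:n:2]) / np.sqrt(2.0)
        x[: n // 2], x[n // 2 : n] = a, d
        n //= 2
    return x

def ihaar1d(c):
    """Inverse of haar1d."""
    c = c.astype(float).copy()
    n, N = 1, len(c)
    while n < N:
        a, d = c[:n].copy(), c[n : 2 * n].copy()
        c[0 : 2 * n : 2] = (a + d) / np.sqrt(2.0)
        c[1 : 2 * n : 2] = (a - d) / np.sqrt(2.0)
        n *= 2
    return c

def lagrangian(square, direction, T, lam=1.0):
    """Distortion + lambda * T^2 * (bit-cost proxy) for one ordering."""
    order = "C" if direction == 0 else "F"
    c = haar1d(square.flatten(order=order))
    kept = np.where(np.abs(c) > T, c, 0.0)   # keep only significant coeffs
    recon = ihaar1d(kept)
    dist = np.sum((square.flatten(order=order) - recon) ** 2)
    R = np.count_nonzero(kept)               # crude proxy for R_G + R_B
    return dist + lam * T**2 * R

# A block that is constant along rows: the row-major order yields a
# smoother 1D signal, hence fewer significant Haar coefficients.
block = np.tile(np.arange(8.0)[:, None], (1, 8))
costs = [lagrangian(block, d, T=1.0) for d in (0, 1)]
best = int(np.argmin(costs))
print(best)  # row-major ordering (direction 0) wins for this block
```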
3.3 Bandelet Image Compression
When a function defined over $[0,1]^2$ has singularities that belong to regular curves, one may exploit this geometric regularity to improve the $M$-term approximation. Most techniques, such as wavelet approximations, exploit the $M$-term properties only locally and therefore cannot take advantage of such geometric regularity. This thesis presents a new class of bases, built from multiscale elongated bandelet functions that follow the actual geometry, in order to improve the convergence of the approximation.
3.3.1 Geometric image model
The first basic model of geometrically regular functions consists of functions that are regular almost everywhere outside a set of regular edge curves. Let $C^\alpha(\Lambda)$ denote the space of Hölder functions of order $\alpha$ over $\Lambda \subset \mathbb{R}^n$, for $\alpha > 0$:

$C^\alpha(\Lambda) = \{ f : \mathbb{R}^n \to \mathbb{R} \;:\; \forall\, |\beta| = \lfloor \alpha \rfloor,\ \frac{\partial^{|\beta|} f}{\partial x_1^{\beta_1} \cdots\, \partial x_n^{\beta_n}} \text{ exists and satisfies (3.3) and (3.4)} \}$ (3.2)
When $\alpha$ is an integer, $C^\alpha$ is the space of functions with bounded derivatives up to order $\alpha$. The norm $\|f\|_{C^\alpha(\Lambda)}$ is obtained as the maximum of:

$\|f\|_{C^\alpha(\Lambda)} = \max \sup_{x \in \Lambda} \max_{|\beta| \le \lfloor\alpha\rfloor} \left| \frac{\partial^{|\beta|} f}{\partial x_1^{\beta_1} \cdots\, \partial x_n^{\beta_n}}(x) \right|$ (3.3)

$\|f\|_{C^\alpha(\Lambda)} = \max \sup_{(x,y) \in \Lambda^2} \max_{|\beta| = \lfloor\alpha\rfloor} \left| \frac{\partial^{|\beta|} f}{\partial x_1^{\beta_1} \cdots\, \partial x_n^{\beta_n}}(x) - \frac{\partial^{|\beta|} f}{\partial x_1^{\beta_1} \cdots\, \partial x_n^{\beta_n}}(y) \right| \cdot \|x - y\|^{\lfloor\alpha\rfloor - \alpha}$ (3.4)
A function $f$ has geometrically regular edges if $f \in C^\alpha(\Lambda)$ for $\Lambda = [0,1]^2 - \{\mathcal{C}_\gamma\}_{1 \le \gamma \le G}$, where the edge curves $\mathcal{C}_\gamma$ are themselves Hölder curves of order $\alpha$. Blurring along the edges may be modeled by a convolution with an unknown kernel $h(x)$ of compact support. Thus, we have:

$f(x) = \tilde{f} * h(x)$ (3.5)

where $\tilde{f} \in C^\alpha(\Omega)$ for $\Omega = [0,1]^2 - \{\mathcal{C}_\gamma\}_{1 \le \gamma \le G}$. The goal is to compute an approximation $f_M$ with $M$ parameters satisfying:

$\|f - f_M\|^2 \le C M^{-\alpha}$ (3.6)

where $C$ is a constant that does not depend on the blurring kernel $h$. Candès and Donoho [135] proposed an image model using curvelets that satisfies, for an approximation $f_M$ with $M$ curvelets:

$\|f - f_M\|^2 \le C M^{-2} (\log_2 M)^3$ (3.7)
However, no algorithm of known polynomial complexity computes such an approximation $f_M$ with an error that always decays like $M^{-2}$. Incorporating an unknown blurring kernel into the geometric model produces a harder problem: smooth blurred contours are more difficult to distinguish from sharp ones, and the basis has to adapt to the actual size of the blur in order to approximate the image transitions along the edges with precision.
3.3.2 Geometric Image Flow with Bandelet Bases
The previous image model describes geometrically regular edges. A function has sharp transitions across the edges but regular variations when moving parallel to them. This displacement parallel to the edges defines a geometric flow: a field of vectors giving the local directions in which $f$ has regular variations. Bandelet bases are constructed by warping orthogonal wavelet bases with this geometric flow.

“A geometric flow is a vector field $\vec{\tau}(x_1, x_2)$ that gives, at every point $(x_1, x_2)$, a direction in which the function $f$ has regular variations. In the neighborhood of an edge, the flow is parallel to the tangents of the edge contour. To build an orthogonal basis with a geometric flow, we require the flow to be locally either parallel to the vertical direction, and hence constant along that direction, or parallel to the horizontal direction. Suppose $f$ contains one edge $C$ whose angle with the horizontal or vertical direction remains smaller than $\pi/3$; then $C$ can be parameterized horizontally or vertically by a function. Assume that $f$ follows the horizontal horizon model shown in figure 3.1, and define a vertically parallel flow whose angle with the horizontal direction is smaller than $\pi/3$. Such a flow can be written as:”
~τ(x1, x2) = ~τ(x1) = (1, g′(x1)) (3.8)
Figure 3.1 shows that a flow line is an integral curve whose tangent at $(x_1, x_2)$ is collinear to $\vec{\tau}(x_1, x_2)$. Let $g(x)$ be the primitive of $g'(x)$, defined by

$g(x) = \int_0^x g'(t)\, dt$ (3.9)

The flow lines are the sets of points $(x_1, x_2) \in \Omega$ that satisfy $x_2 = g(x_1) + \mathrm{cst}$. The flow is parallel over a band $B$ defined as:

$B = \{(x_1, x_2) : x_1 \in [a_1, b_1],\ x_2 \in [g(x_1) + a_2,\ g(x_1) + b_2]\}$ (3.10)
Figure 3.1: Horizon model with a flow - [135]
If the flow directions are sufficiently parallel to the edge directions, then $f(x)$ has regular variations along each flow line $(x_1, g(x_1) + \mathrm{cst})$. The warped image is then defined as:

$Wf(x_1, x_2) = f(x_1, x_2 + g(x_1))$ (3.11)

Applying the warping to the band $B$ gives:

$WB = \{(x_1, x_2) : (x_1, x_2 + g(x_1)) \in B\} = \{(x_1, x_2) : x_1 \in [a_1, b_1],\ x_2 \in [a_2, b_2]\}$ (3.12)
If $\Psi(x_1, x_2)$ is a function with vanishing moments along $x_1$ for fixed $x_2$, then, since $Wf(x_1, x_2)$ is regular along $x_1$, the inner product

$\langle Wf, \Psi \rangle = \langle f, W^* \Psi \rangle$ (3.13)

has a small amplitude, where $W^*$ is the adjoint of $W$; $W$ is an orthogonal operator, so its adjoint equals its inverse:
W ∗f(x1, x2) = W−1f(x1, x2) = f(x1, x2 − g(x1)) (3.14)
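The operators of equations (3.11)-(3.14) can be checked on a discrete grid. In this hedged sketch, integer shifts per column stand in for the continuous warp $g(x_1)$, with periodic boundaries for simplicity; the function names are ours:

```python
import numpy as np

def warp(f, g):
    """Wf(x1, x2) = f(x1, x2 + g(x1)): shift each column x1 by g[x1]."""
    out = np.empty_like(f)
    for x1 in range(f.shape[0]):
        out[x1] = np.roll(f[x1], -g[x1])   # periodic boundary for simplicity
    return out

def unwarp(f, g):
    """W* f = W^{-1} f(x1, x2) = f(x1, x2 - g(x1))."""
    out = np.empty_like(f)
    for x1 in range(f.shape[0]):
        out[x1] = np.roll(f[x1], g[x1])
    return out

rng = np.random.default_rng(1)
f = rng.standard_normal((8, 8))
h = rng.standard_normal((8, 8))
g = np.array([0, 1, 1, 2, 2, 3, 3, 4])    # discretized flow primitive g(x1)

# W is invertible, W^{-1} W f = f, and orthogonal: <Wf, h> = <f, W* h>.
print(np.allclose(unwarp(warp(f, g), g), f))
print(np.isclose(np.sum(warp(f, g) * h), np.sum(f * unwarp(h, g))))
```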
From a 1D wavelet $\Psi(t)$ and a scaling function $\Phi(t)$, dilated and translated families are obtained as:

$\Psi_{j,m}(t) = \frac{1}{\sqrt{2^j}}\, \Psi\!\left(\frac{t - 2^j m}{2^j}\right)$ (3.15)

$\Phi_{j,m}(t) = \frac{1}{\sqrt{2^j}}\, \Phi\!\left(\frac{t - 2^j m}{2^j}\right)$ (3.16)
As the wavelet scale $2^j$ decreases, the index $j$ goes to negative infinity ($j \to -\infty$), and we obtain a family of separable wavelets:

$\{\phi_{j,m_1}(x_1)\,\psi_{j,m_2}(x_2),\ \psi_{j,m_1}(x_1)\,\phi_{j,m_2}(x_2),\ \psi_{j,m_1}(x_1)\,\psi_{j,m_2}(x_2)\}_{(j,\, m_1,\, m_2) \in I_{WB}}$ (3.17)
“The index set $I_{WB}$ depends upon the length and width of the rectangle $WB$. Since $W$ is orthogonal, applying its inverse to each of these wavelets yields an orthonormal basis of $L^2(B)$ known as a warped wavelet basis. After applying the inverse warping $W^{-1}$, the resulting functions $\psi_{l,m_1}(x_1)\,\psi_{j,m_2}(x_2 - g(x_1))$ are called bandelets because their support is parallel to the flow lines and is more elongated ($2^l > 2^j$) in the direction of the geometric flow. Inserting these bandelets in the warped wavelet basis yields a bandelet orthonormal basis of $L^2(B)$:”

$\{\phi_{j,m_1}(x_1)\,\psi_{j,m_2}(x_2 - g(x_1)),\ \psi_{j,m_1}(x_1)\,\phi_{j,m_2}(x_2 - g(x_1)),\ \psi_{l,m_1}(x_1)\,\psi_{j,m_2}(x_2 - g(x_1))\}_{(j,\, l > j,\, m_1,\, m_2)}$ (3.18)
3.3.3 Image Compression Through Bandelet
An image is compressed in a bandelet basis by first segmenting the image and encoding the geometric flow in every region of the segmentation. The bandelet decomposition of the image in the resulting frame is then quantized and stored with a binary code. In this image compression scheme, $R$ is the total number of bits used to encode the bandelet frame and the bandelet coefficients of $f$ in this frame:

$R = R_S + R_G + R_B$ (3.19)
“where $R_S$ is the number of bits for the dyadic square segmentation, $R_G$ for the flow in each square region, and $R_B$ for the bandelet coefficients. For a square of size $2^\lambda$ in an image, the geometric flow is parameterized at a scale $2^k$ by $2^{\lambda-k}$ quantized coefficients $a_m = qT^2$ with $qT \le C\theta$. The flow cost $R_G$ is the sum, over all squares that define a flow, of the number of bits required at each scale. In a bandelet frame $\mathcal{F} = (b_{i,m})_{i,m}$, all bandelet coefficients $\langle f, b_{i,m} \rangle$ are uniformly quantized with a uniform quantizer $Q_T$ of step $T$”:

$Q_T(x) = qT \quad \text{if } (q - 1/2)T < x < (q + 1/2)T$ (3.20)

The total number of bits needed to encode the quantized bandelet coefficients satisfies:

$R_B \le M_B \log_2(\|f\|/T) + \log_2(C_\Psi^2 \|f\|_\infty^2 T^{-2})$ (3.21)
The image restored from its bandelet coefficients is:

$f_R = \sum_{i,m} Q_T(\langle f, b_{i,m} \rangle)\, b'_{i,m}$ (3.22)

In the bandelet scheme, the resulting distortion is $D(R) = \|f - f_R\|^2$ and satisfies:

$D(R) = \|f - f_R\|^2 \le \sum_{i,m} |\langle f, b_{i,m} \rangle - Q_T(\langle f, b_{i,m} \rangle)|^2$ (3.23)

For the $M_B$ nonzero quantized coefficients:

$D(R) \le \sum_{|\langle f, b_{i,m} \rangle| < T} |\langle f, b_{i,m} \rangle|^2 + M_B T^2 / 4$ (3.24)

$D(R) \le \mathcal{L}(f, T/2, \mathcal{F})$ (3.25)

A small distortion rate is obtained by finding, in the dictionary $\mathcal{D}_T$, the best bandelet frame that minimizes $\mathcal{L}(f, T/2, \mathcal{F})$.
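The quantizer (3.20) and the distortion bound of (3.24) are easy to verify numerically. A small sketch with an arbitrary coefficient vector of our own choosing:

```python
import numpy as np

def quantize(x, T):
    """Uniform quantizer Q_T(x) = qT for (q - 1/2)T < x < (q + 1/2)T."""
    return np.round(x / T) * T

T = 0.5
coeffs = np.array([0.10, -0.30, 0.74, 1.26, -2.05])
q = quantize(coeffs, T)

# Each quantized value differs from the input by at most T/2, so the
# M_B nonzero coefficients contribute at most M_B * T^2 / 4 in total.
err = np.sum((coeffs - q) ** 2)
MB = np.count_nonzero(q)
print(q)  # [ 0.  -0.5  0.5  1.5 -2. ]
print(err <= len(coeffs) * T**2 / 4)
```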
3.4 Orthogonal Bandelets
The bandelet transform is an analysis tool which aims at taking advantage of sharp image transitions. A geometric flow, which indicates the directions in which the image gray levels have regular variations, is used to construct bandelet bases in the bandelet transform. These bandelet bases yield optimal approximation rates for geometrically regular images and have proven effective in still-image compression, video compression, and noise-removal algorithms.
3.4.1 Block Based Bandelet Basis
“This section describes the construction of a bandelet basis from a wavelet basis that is warped along the geometric flow, to exploit the image regularity along this flow and obtain orthonormal bandelet bases. Bandelets provide information about the blocks of an image, where each block is a small region. The geometric flow in a region $\Omega$ is a vector field $\vec{\tau}(x_1, x_2)$ giving a direction in which the image has regular variations in the neighborhood of each point $(x_1, x_2) \in \Omega$. Orthogonal bases adapted to the resulting flow are then obtained. As a first regularity condition, we enforce that the flow is either vertically parallel, $\vec{\tau}(x_1, x_2) = \vec{\tau}(x_1)$, or horizontally parallel, $\vec{\tau}(x_1, x_2) = \vec{\tau}(x_2)$. The image support $S$ is partitioned into regions $S = \cup_i \Omega_i$, with the flow in each $\Omega_i$ parallel either horizontally or vertically. Figure 3.2 [143] shows the vertically parallel geometric flow of a real image in a region.”

Figure 3.3 [143] shows an example: the image is divided into square regions, and each region $\Omega_i$ includes at most one contour.
Figure 3.2: Flow in a region of an image - [143]
In each region that contains a contour piece, the parallel flow can be chosen tangent to the contour curve and corresponds to its flow lines. Bandelets are constructed in these regions by warping separable wavelet bases, a process known as bandeletization. When a geometric flow can be determined in $\Omega$, the wavelet basis is replaced by a bandelet basis. A flow line is an integral curve whose tangents are parallel to $\vec{\tau}(x_1)$. The flow line associated with a fixed translation parameter $x_2$ is the set of points $(x_1, x_2 + c(x_1)) \in \Omega$ for varying $x_1$, where
$c(x) = \int_{x_{\min}}^{x} c'(t)\, dt$ (3.26)
By construction of the flow, the image gray level has regular variations along these flow lines; thus the warped image is regular along horizontal lines, for $x_2$ fixed and $x_1$ varying.

Figure 3.3: Dyadic square segmentation of an image - [143]
Wf(x1, x2) = f(x1, x2 + c(x1)) (3.27)
If $\Psi(x_1, x_2)$ is a wavelet with several vanishing moments along $x_1$ for each fixed $x_2$, then the inner product has a small amplitude:

$\langle Wf, \Psi \rangle = \langle f, W^* \Psi \rangle$ (3.28)
The warping operator W is an orthogonal operator since its adjoint is equal to its inverse
W ∗f(x1, x2) = W−1f(x1, x2) = f(x1, x2 − c(x1)) (3.29)
Therefore, the warping operator satisfies $W^* = W^{-1}$, and applying the inverse operator to each wavelet of an orthonormal basis of $L^2(W\Omega)$,

$\{\phi_{j,m_1}(x_1)\,\psi_{j,m_2}(x_2),\ \psi_{j,m_1}(x_1)\,\phi_{j,m_2}(x_2),\ \psi_{j,m_1}(x_1)\,\psi_{j,m_2}(x_2)\}_{(j,\, m_1,\, m_2) \in I_W(\Omega)}$ (3.30)

yields, since $W^{-1}$ is orthogonal, a warped wavelet orthonormal basis of $L^2(\Omega)$:

$\{\phi_{j,m_1}(x_1)\,\psi_{j,m_2}(x_2 - c(x_1)),\ \psi_{j,m_1}(x_1)\,\phi_{j,m_2}(x_2 - c(x_1)),\ \psi_{j,m_1}(x_1)\,\psi_{j,m_2}(x_2 - c(x_1))\}_{(j,\, m_1,\, m_2) \in I_W(\Omega)}$ (3.31)
If the geometric flow in $\Omega$ is horizontally parallel, meaning that

$\vec{\tau}(x_1, x_2) = \vec{\tau}(x_2) = (c'(x_2), 1)$ (3.32)

then let $x_{\min} = \inf\{x_2 : (x_1, x_2) \in \Omega\}$ and $c(x) = \int_{x_{\min}}^{x} c'(t)\, dt$. A warped wavelet basis is constructed from a wavelet basis of

$W\Omega = \{(x_1, x_2) : (x_1 + c(x_2), x_2) \in \Omega\}$ (3.33)

$\{\phi_{j,m_1}(x_1 - c(x_2))\,\psi_{j,m_2}(x_2),\ \psi_{j,m_1}(x_1 - c(x_2))\,\phi_{j,m_2}(x_2),\ \psi_{j,m_1}(x_1 - c(x_2))\,\psi_{j,m_2}(x_2)\}_{(j,\, m_1,\, m_2) \in I_W(\Omega)}$ (3.34)
The bandeletization replaces each family of scaling functions $\{\phi_{j,m_2}(x_2)\}_{m_2}$ by a family of orthonormal wavelets that generates the same space. The resulting bandelet orthonormal basis of $L^2(\Omega)$ is:

$\{\phi_{j,m_1}(x_1 - c(x_2))\,\psi_{j,m_2}(x_2),\ \psi_{j,m_1}(x_1 - c(x_2))\,\psi_{l,m_2}(x_2),\ \psi_{j,m_1}(x_1 - c(x_2))\,\psi_{j,m_2}(x_2)\}_{(j,\, l > j,\, m_1,\, m_2)}$ (3.35)
3.4.2 Fast Discrete Bandelet Transform
Bandelets in a region are computed by applying the bandeletization to warped wavelets, which are separable along the fixed direction (horizontal or vertical) and along the flow lines, as long as these remain away from the image boundary. A fast discrete bandelet transform can therefore be computed from a fast separable wavelet transform along this fixed direction and along the image flow lines. The block bandelet basis of the previous section is built from warped wavelet bases inside each region. “In image processing applications, discontinuities appear along the region boundaries when bandelet coefficients are modified. To avoid this situation (boundary effects), a discrete bandelet transform or discrete warped wavelet transform is used. The fast discrete bandelet transform associated with an image partition $\bigcup_i \Omega_i$ consists of three steps:
1. Image resampling in each region, which computes the image sample values along the flow lines of each region $\Omega_i$ of the partition.

2. Warped wavelet transform, a subband filtering along the flow lines, which goes across the region boundaries.

3. Bandeletization, which transforms the warped wavelet coefficients into bandelet coefficients along the flow lines.
The fast inverse bandelet transform consists of the three inverse steps:

1. Inverse bandeletization, which recovers the warped wavelet coefficients from the bandelet coefficients along the flow lines.

2. Inverse warped wavelet transform, an inverse subband filtering along the flow lines.

3. Inverse resampling, which computes the image samples on the original grid from the samples along the flow lines in each region $\Omega_i$.
This segmented geometric flow is optimized for image compression and noise removal
applications.”
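The three forward steps and their inverses can be sketched end to end. The following simplification (a single region, periodic integer flow resampling, a full 1D Haar transform as the subband filtering, and an identity bandeletization) verifies that the inverse transform reconstructs the image exactly:

```python
import numpy as np

def haar1d(x):
    """Full orthonormal 1D Haar transform (length a power of 2)."""
    x = x.astype(float).copy()
    n = len(x)
    while n > 1:
        a = (x[0:n:2] + x[1:n:2]) / np.sqrt(2.0)
        d = (x[0:n:2] - x[1:n:2]) / np.sqrt(2.0)
        x[: n // 2], x[n // 2 : n] = a, d
        n //= 2
    return x

def ihaar1d(c):
    """Inverse of haar1d."""
    c = c.astype(float).copy()
    n, N = 1, len(c)
    while n < N:
        a, d = c[:n].copy(), c[n : 2 * n].copy()
        c[0 : 2 * n : 2] = (a + d) / np.sqrt(2.0)
        c[1 : 2 * n : 2] = (a - d) / np.sqrt(2.0)
        n *= 2
    return c

def forward(img, flow):
    # 1. resample along the flow lines (integer shift per row, periodic)
    warped = np.array([np.roll(row, -s) for row, s in zip(img, flow)])
    # 2. warped wavelet transform: subband filtering along the flow lines
    # 3. bandeletization: identity in this simplified sketch
    return np.apply_along_axis(haar1d, 1, warped)

def inverse(coeffs, flow):
    # inverse steps 3-1 in reverse order
    warped = np.apply_along_axis(ihaar1d, 1, coeffs)
    return np.array([np.roll(row, s) for row, s in zip(warped, flow)])

rng = np.random.default_rng(2)
img = rng.standard_normal((8, 8))
flow = np.array([0, 1, 1, 2, 2, 3, 3, 4])
print(np.allclose(inverse(forward(img, flow), flow), img))
```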
Noise Removal Application
Threshold estimators in an orthonormal basis are especially effective at removing additive noise when the basis can approximate the original signal with few nonzero coefficients. For bandelet bases, this requires estimating and optimizing the geometric flow in the presence of additive noise. The penalized estimation finds the best bandelet basis, which minimizes an empirical risk penalized by the complexity of the geometric flow. The signal $f[n]$ is estimated from the noisy data:
X[n] = f [n] +W [n] (3.36)
where $W[n]$ is Gaussian white noise of variance $\sigma^2$. A thresholding in a bandelet basis $B = \{g_m\}_{1 \le m \le N^2}$ can be written as:

$F = \sum_{m=1}^{N^2} \rho_T(\langle X, g_m \rangle)\, g_m$ (3.37)

where $\rho_T(x)$ is a hard thresholding at $T$: $\rho_T(x) = x\, 1_{|x| > T}$, and $\sigma$ is the noise standard deviation. According to Donoho and Johnstone [144], the threshold is set to $T = \gamma \sigma \sqrt{2 \log_e(N^2)}$, where $\gamma$ remains constant.
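The thresholding estimator (3.37) with this threshold can be sketched in a plain orthonormal Haar basis, standing in for a bandelet basis ($\gamma = 1$ assumed):

```python
import numpy as np

def haar1d(x):
    """Full orthonormal 1D Haar transform (length a power of 2)."""
    x = x.astype(float).copy()
    n = len(x)
    while n > 1:
        a = (x[0:n:2] + x[1:n:2]) / np.sqrt(2.0)
        d = (x[0:n:2] - x[1:n:2]) / np.sqrt(2.0)
        x[: n // 2], x[n // 2 : n] = a, d
        n //= 2
    return x

def ihaar1d(c):
    """Inverse of haar1d."""
    c = c.astype(float).copy()
    n, N = 1, len(c)
    while n < N:
        a, d = c[:n].copy(), c[n : 2 * n].copy()
        c[0 : 2 * n : 2] = (a + d) / np.sqrt(2.0)
        c[1 : 2 * n : 2] = (a - d) / np.sqrt(2.0)
        n *= 2
    return c

rng = np.random.default_rng(3)
N2 = 256
f = np.repeat([0.0, 4.0, -3.0, 1.0], N2 // 4)   # piecewise-constant signal
sigma = 0.5
X = f + sigma * rng.standard_normal(N2)         # X[n] = f[n] + W[n]

c = haar1d(X)
T = sigma * np.sqrt(2.0 * np.log(N2))           # universal threshold
F = ihaar1d(np.where(np.abs(c) > T, c, 0.0))    # rho_T(x) = x * 1_{|x|>T}

print(np.sum((F - f) ** 2) < np.sum((X - f) ** 2))  # thresholding reduces risk
```

Because the piecewise-constant signal is sparse in the Haar basis, few coefficients survive the threshold and most of the noise energy is discarded.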
The best bandelet basis is the one that minimizes this risk among all possible bandelet bases. This requires optimizing the geometric flow of the bandelet basis in the presence of noise. When the noisy data $X$ is obtained by the addition of Gaussian white noise, the best basis is found by minimizing an appropriate penalized cost function, obtained from the Lagrangian of the distortion rate:

$D + \lambda \rho^2 R \quad \text{with} \quad D = \|X - F\|^2$ (3.38)

“where $R$ measures the complexity of the model as the number of bits expected to code the chosen basis $B$ and the quantized coefficients of $X$ in $B$, for a quantization step equal to the threshold $T$. In the context of image compression, given an image segmentation, the flow in each region $\Omega_i$ is calculated by minimizing the quadratic image variation within the
Figure 3.4: The left column gives zooms of noisy images having a PSNR of 20.19 dB. The middle and right columns are obtained, respectively, with bandelet and wavelet estimators - [143]
flow. To calculate the flow in the presence of white Gaussian noise, the variance $\beta^2$ of the Gaussian filter $\theta$ and the displacement parameters $c'_i[p]$ are parameterized in a family of B-splines dilated by $2^l$. Optimizing the image segmentation and the geometric flow in each region through the thresholding estimator requires $O(N^2 (\log_2 N)^2)$ operations.” Figure 3.4 [143] compares bandelet and wavelet threshold estimators in terms of PSNR; the bandelet transform removes the noise in an image better than the wavelet transform.
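PSNR, the comparison metric of figure 3.4, is computed from the mean squared error. A quick sketch, assuming an 8-bit dynamic range:

```python
import numpy as np

def psnr(ref, est, peak=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    mse = np.mean((ref.astype(float) - est.astype(float)) ** 2)
    return 10.0 * np.log10(peak**2 / mse)

ref = np.zeros((16, 16))
est = ref + 10.0                  # constant error, so MSE = 100
print(round(psnr(ref, est), 2))   # 28.13
```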
3.5 Chapter Summary
Some of the basics required for this research were presented in this chapter. The chapter started with an overview of the bandelet transform; we then discussed different techniques widely used in image compression, noise-removal applications, multiscale structuring, and geometry compression. We also discussed the role of the bandelet transform in feature extraction, and different ways in which visual features can be extracted from images. “We showed that building a hierarchical geometric representation from wavelet coefficients has a number of numerical and algorithmic advantages over direct decompositions such as curvelet and warped bandelet frames. In contrast to these previous constructions, the resulting bandelet bases are orthogonal and inherit the regularity of the wavelets they are constructed from. The geometry can also be adapted at every scale. These bases are obtained from a wavelet basis through a cascade of orthogonal operators that define a discrete bandeletization, which leads to a fast algorithm. Bandelets are an orthonormal basis that is adapted to geometric boundaries, and they can be interpreted as a warped wavelet basis. The motivation behind bandelets is to perform a transform on functions defined as smooth functions on smoothly bounded domains.” Another important focus of the chapter was to cover the previous work done so far on the bandelet transform.
CHAPTER 4
FEATURE EXTRACTION USING
BANDELET TRANSFORM
One of the major requirements of CBIR systems is to ensure meaningful image retrieval against query images. The performance of these systems is severely degraded by the inclusion of image contents that do not comprise the objects of interest in an image during the image representation phase. Segmentation of the images is considered as a solution, but no technique can guarantee object extraction in a robust way. Another limitation of segmentation is that most image segmentation techniques are slow and their results are not reliable. To overcome these problems, a bandelet transform based image representation technique is presented in this research, which reliably returns information about the major objects found in an image. For image retrieval purposes, ANN and SVM are applied, and the performance of the system is evaluated on three standard data sets used in the domain of CBIR [6].
4.1 Introduction
CBIR systems generate meaningful image representations by considering the visual characteristics of images, and bring closely resembling images, in terms of distance, as the semantic response. In this regard, one of the major challenges is the semantic gap, i.e., features at the low level are not sufficient to characterize the high-level image semantics [22]. To bridge this gap to some extent, an important focus of research is on the enhancement of these features, so that machine learning algorithms can make significant improvements to bridge this gap. To reap the benefits of segmentation based image representations and overcome the associated drawbacks, the focus of this research is on the identification of the image segments that contain major image objects by applying the bandelet transform. The bandelet transform returns the geometric representation of the texture of the object regions, which can be used to discriminate the objects of interest in a fast way. The detailed procedure will be described in section 4.2.1. The major problem with the geometric output is that its interpretation is complicated due to the close resemblance of the connected regions. In order to ensure the actual association, artificial neural networks are applied and correct texture classification is performed. We then apply the Gabor filter and generate the texture representation based on the classification output. To further enhance the image representation capabilities, we have also estimated the color content in the YCbCr and HSV color domains and fused it with the texture features.
As described by Irtaza et al. [87], the major drawbacks faced by a CBIR system, or query by image content, that severely impact the retrieval performance are: (1) the lack of output verification, and (2) the avoidance of neighborhood similarity for semantic association purposes. Therefore, we have followed their findings and also included the neighborhood in the semantic association process. Content based image retrieval is then performed by the artificial neural networks after training them with the obtained features. The bandelet transform has been used for medical image retrieval [145]; however, that approach to feature extraction differs from ours. Before this, researchers utilized the bandeletization property for image compression [146], image enhancement [143] and gender classification [147].
The technique presented in this research considers the most prominent objects in an image, using the object geometric representation obtained by the bandelet transform in a precise manner. The texture information found at the object boundaries is then used as a component of the feature vectors, after applying the targeted parameters to the Gabor transform based on the artificial neural network suggestions. The features are further improved by incorporating the color information in the YCbCr domain. Image semantics are then obtained by the artificial neural networks. Other features capture the color information in the HSV domain, and support vector machines are used for the semantic association.
4.2 Image Representation using Bandletized Regions in the YCbCr Color Space
The most important capability of the proposed method is its ability to identify the most prominent objects in an image. These objects are then considered as the core outcomes used for the generation of feature vectors. To achieve this, image transformations are first generated through the bandelet transform, which returns the geometric boundaries of the major objects found in an image. We apply the Gabor filter with targeted parameters (as will be described) to estimate the texture content around these boundaries. These geometric boundaries are vague in the sense that they could easily be mistaken as belonging to unwanted texture classes, since all of them closely resemble one another; if not carefully handled, this results in wrong parameter estimation and, consequently, unsatisfactory image retrieval output. Therefore, to avoid this situation, geometric classification is performed through backpropagation neural networks, which ensures that the texture estimation parameters used to apply the Gabor filter are approximated with maximum accuracy.
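The Gabor texture estimation step can be sketched as follows; the kernel parameters below are illustrative placeholders, not the values suggested by the neural network:

```python
import numpy as np

def gabor_kernel(theta, lam=8.0, sigma=3.0, size=15):
    """Real Gabor kernel: Gaussian envelope times an oriented cosine."""
    half = size // 2
    y, x = np.mgrid[-half : half + 1, -half : half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return np.exp(-(xr**2 + yr**2) / (2 * sigma**2)) * np.cos(2 * np.pi * xr / lam)

def gabor_features(img, thetas):
    """Mean and standard deviation of the filter response per orientation."""
    feats = []
    for theta in thetas:
        k = gabor_kernel(theta)
        # periodic convolution via FFT, zero-padding the kernel
        resp = np.abs(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(k, img.shape)).real)
        feats += [resp.mean(), resp.std()]
    return np.array(feats)

# Vertical stripes of period 8 respond most strongly to the theta = 0 kernel.
img = np.cos(2 * np.pi * np.arange(64) / 8.0)[None, :] * np.ones((64, 1))
fv = gabor_features(img, [0.0, np.pi / 4, np.pi / 2])
print(fv.shape)  # (6,)
```

In the proposed method the orientation and frequency parameters would be chosen per region from the network's texture-class decision, rather than fixed as here.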
To increase the discriminative power of the feature vectors, color components in the YCbCr domain are also included, after approximating them through a wavelet decomposition over the color histograms. The proposed features are computed for all images in the repository, and their semantic classes are determined through ground-truth training with artificial neural networks and the finer neighborhood of every image. We generate an inverted index over the semantic sets, which guarantees fast image retrieval once the semantic class of the user's query image has been determined. The complete process of the proposed method is represented in figure 4.1, and the details of the process follow in the subsequent sections.
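The color component of the feature vector can be sketched as an RGB to YCbCr conversion followed by per-channel histograms and a one-level Haar approximation of each histogram. The full-range BT.601 conversion below is our assumption; the thesis does not specify the exact variant:

```python
import numpy as np

def rgb_to_ycbcr(img):
    """Full-range BT.601 RGB -> YCbCr for a float image with values in [0, 255]."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return np.stack([y, cb, cr], axis=-1)

def color_feature(img, bins=32):
    """Histogram each YCbCr channel, then keep a one-level Haar approximation."""
    ycc = rgb_to_ycbcr(img)
    feats = []
    for ch in range(3):
        h, _ = np.histogram(ycc[..., ch], bins=bins, range=(0, 256), density=True)
        approx = (h[0::2] + h[1::2]) / np.sqrt(2.0)  # Haar low-pass of the histogram
        feats.append(approx)
    return np.concatenate(feats)

white = np.full((4, 4, 3), 255.0)
ycc = rgb_to_ycbcr(white)
print(np.round(ycc[0, 0], 2))     # [255. 128. 128.]
print(color_feature(white).shape) # (48,)
```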
4.2.1 Modified Bandelet Transform
The issue with wavelet bases is that the same texture values can have different directions in an image. To overcome this limitation, Le Pennec and Mallat [124; 136] proposed exploiting geometric regularity in an anisotropic way, eliminating the redundancy of the wavelet transform using the concept of bandeletization. The bandelet transform is a major self-adaptive multiscale geometry analysis method which exploits the geometric information of images, in contrast to non-adaptive algorithms such as the curvelet [62; 89] and contourlet transforms [148]. The bandelet transform not only has the properties of multiscale analysis, directionality and anisotropy, but also offers critical sampling and adaptability for image representation. Bandelet basis vectors have supports elongated in the direction of maximum regularity of the function, as shown in figure 4.2. The Alpert transform is used for the bandeletization, which closely follows the geometry of the underlying images. The main objective is to take advantage of sharp image transitions by computing the geometric flow to form bandelet bases, which capture the constantly changing directions in grayscale images.
Figure 4.1: Proposed Method.
Figure 4.2: Bandelet Transform [1; 2]. (a) Dyadic segmentation depending on the local directionality of the image; (b) bandelet segmentation square containing a regular function, shown by the red dash; (c) geometric flow and sampling positions; (d) sampling positions adapted to the warped geometric flow; (e) warping example.
As shown in figure 4.3, the bandelet transform divides the image into square blocks and obtains at most one contour per region ($\Omega_i$). If a small region of an image does not contain any contour, the image intensity is uniform and regular in that region and, therefore, no flow line is defined.
Figure 4.3: Geometric flow representation using different block sizes: (a) small size 4×4; (b) medium size 8×8.
4.2.1.1 Alpert bases in bandelet transform
As per the work of [149], the Alpert transform is applied to compute the bandelet bases that approximate images having some geometric regularity. For this, the image is divided into square blocks, represented by $S$, and the geometric flow is estimated in every block. In our implementation, the block size is $8 \times 8$. As elaborated in figure 4.3, using smaller blocks, i.e., $4 \times 4$, divides the image into more chunks; the drawback is that the bandelet transform is then not able to capture the sharp edges. Similarly, if the block size is larger, i.e., $16 \times 16$ or $32 \times 32$, the geometric flow exceeds the object boundaries. Hence, through experimental observations, we used a block size of $8 \times 8$ for appropriate object estimation. The Alpert transform operates parallel to the geometric flow and is developed over the space $l^2(S)$ of wavelet coefficients in $S$, using piecewise polynomials over bands of dyadic widths. The direction of the geometric flow $\gamma$ is estimated in the domain of $S$, and the warping operator $W$ warps $S$ into $\bar{S}$. Under the warping, any point $x_n = 2^j n$ is mapped to $\bar{x}_n = W(2^j n)$. Similarly, $l^2(\bar{S})$ represents the functions sampled in the warped domain, i.e., at the points $\bar{x}_n \in \bar{S}$. To define the multiresolution, for each scale $2^l$ the warped square $\bar{S}$ is recursively subdivided into $2^{-l}$ horizontal bands:
$\bar{S} = \bigcup_{i=0}^{2^{-l}-1} \bar{\beta}_{l,i}$ (4.1)
In equation (4.1), the bands satisfy $\bar{\beta}_{l,i} = \bar{\beta}_{l-1,2i} \cup \bar{\beta}_{l-1,2i+1}$. The band in the original square $S$ is recovered as $\beta_{l,i} \stackrel{\text{def}}{=} W^{-1}(\bar{\beta}_{l,i}) \subset S$, with width roughly equal to $\lambda 2^l$ and a number of sampling points equal to $2^l(\lambda 2^l)^2$, using the Alpert multiresolution space $V_l \subset l^2(S)$. The Alpert space is obtained through equations (4.2) and (4.3):

$V_l = \{ g \in l^2(S) \;:\; \forall i,\ \forall x_n \in \beta_{l,i},$ (4.2)

$g(x_n) = P_i(x_n) \}$ (4.3)

where each $P_i$ is a polynomial.
Orthogonal bases $(h_{l,i,k})_{i,k}$ of each multiresolution space are acquired by applying Gram-Schmidt orthogonalization to the monomial vectors:

$p_k(x_n) = (x_1)^{k_1}(x_2)^{k_2}$ (4.4)
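Equation (4.4) together with the Gram-Schmidt step can be sketched directly: sample the monomials $(x_1)^{k_1}(x_2)^{k_2}$ on the points of a band and orthonormalize them (the sampling grid here is our own toy choice):

```python
import numpy as np

def monomials(points, p):
    """Vectors p_k(x_n) = x1^k1 * x2^k2 sampled on points, total degree < p."""
    cols = []
    for k1 in range(p):
        for k2 in range(p - k1):
            cols.append(points[:, 0] ** k1 * points[:, 1] ** k2)
    return np.array(cols, dtype=float)

def gram_schmidt(vectors):
    """Classical Gram-Schmidt, returning orthonormal rows."""
    basis = []
    for v in vectors:
        w = v - sum(np.dot(v, b) * b for b in basis)
        basis.append(w / np.linalg.norm(w))
    return np.array(basis)

# Sampling points of one warped band (toy 4x4 grid).
pts = np.array([(i, j) for i in range(4) for j in range(4)], dtype=float)
H = gram_schmidt(monomials(pts, p=3))  # analogues of the h_{l,i,k}

print(np.allclose(H @ H.T, np.eye(len(H))))  # the family is orthonormal
```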
The Alpert wavelets $(\Psi_{l,i,k})_{i,k}$ are the orthogonal bases of the orthogonal complement $W_l$ of $V_l$. The Alpert wavelet vectors $(\Psi_{l,i,k})_k$ are therefore computed by applying Gram-Schmidt orthogonalization to:

$\{ h_{l-1,2i,k} - h_{l-1,2i+1,k} \}_{k_1 + k_2 < p} \subset V_{l-1}$ (4.5)

The resulting multiwavelet vectors $(\Psi_{l,i,k})$, being orthogonal to $V_l$, have vanishing moments over the warped domain:

$\sum_{x_n} \Psi_{l,i,k}(x_n)\,(x_n)^k = 0$ (4.6)
In the above equation, $(x)^k = (x_1)^{k_1}(x_2)^{k_2}$ for each point $x_n = (x_1, x_2)$ in the warped domain. The orthogonal bases $(\bar{\Psi}_{l,i,k})_{l,i,k}$ of $l^2(\bar{S})$ define orthogonal Alpert bases of the $l^2(S)$ domain:

$\psi_{l,i,k}(x_n) = \bar{\psi}_{l,i,k}(\bar{x}_n)$ (4.7)
For each square block S at scale 2^l, the orthogonal Alpert basis β(S, γ) of l²(S) is obtained by:

β(S, γ) =def { ψ_{l,m} : L ≤ l ≤ 0 and 0 ≤ m < p(p + 1) 2^{l−1} }   (4.8)
The bandelet transform provides this information for each square block; when the flow of a square S is undefined, the square is not bandeletized and its wavelet coefficients are kept unchanged. Over a dyadic segmentation S_j, the bandelets thus provide bandeletization bases β(Γ_j) of the whole space of wavelet coefficients at a scale 2^j:
β(Γ_j) = ⋃_{S ∈ S_j} β(S, γ′_S)   (4.9)
After applying the Alpert transform, we get a vector for each square, i.e.,

ψ_v[n] = ψ_{l,k}[n]   (4.10)
In the equation above, ψ_v[n] are the coordinates of the bandelet functions, which belong to the space L²([0, 1]²). These coordinates are further used to construct the bandelet basis, a process called bandeletization:

β(Γ) = ⋃_{j ≤ 0} { b_v : ψ_v ∈ β(Γ_j) },  where Γ = ⋃_{j ≤ 0} Γ_j   (4.11)
The choice of bandelet basis is the key factor in capturing the geometry of images. Therefore, in the bandelet transform the best basis is obtained by minimizing the Lagrangian:

β(Γ*) = argmin_{β(Γ) ∈ D_{T²}} L(f, β(Γ), T)   (4.12)

The bandelet transform uses the above equation to extract the geometries of an image. The value of the threshold T affects the granularity of the image estimation; different values can be adopted, e.g., 32, 48, 56, etc. In our implementation, we use a threshold value of 70, chosen after detailed experimentation, which is able to estimate the theme object in an image. In the work of [125], each block is estimated in the discrete wavelet bases of the L²(Ω) domain, i.e.,
φ_{j,m}(x) = φ_{j,m1}(x_1) φ_{j,m2}(x_2)
ψ^H_{j,m}(x) = φ_{j,m1}(x_1) ψ_{j,m2}(x_2)
ψ^V_{j,m}(x) = ψ_{j,m1}(x_1) φ_{j,m2}(x_2)
ψ^D_{j,m}(x) = ψ_{j,m1}(x_1) ψ_{j,m2}(x_2),  where j, m_1, m_2 ∈ I(Ω)   (4.13)
where I(Ω) is the index set of the image region, which depends upon the geometry of the boundary of Ω, and x_1, x_2 denote the pixel location in the image. The above equations represent the wavelets that are modified according to the geometric flow calculated in the region Ω. These wavelet bases are replaced by the bandelet orthonormal bases of L²(Ω). Then,
φ_{j,m1}(x_1) ψ_{j,m2}(x_2 − c(x_1))
ψ_{j,m1}(x_1) φ_{j,m2}(x_2 − c(x_1))
ψ_{j,m1}(x_1) ψ_{j,m2}(x_2 − c(x_1)),  where j, m_1, m_2 ∈ I(Ω)   (4.14)
where c(x_1) defines the flow line: for a fixed translation parameter x_2, the point (x_1, x_2 + c(x_1)) belongs to Ω and follows the direction of the geometric flow. Then c(x) is obtained as:

c(x) = ∫_{x_min}^{x} c′(u) du   (4.15)
This flow is parallel, and c′(x) is computed as an expansion over a translated function b dilated by a scale factor 2^l. The flow at this scale is then characterized by:

c′(x) = ∑_{n=1}^{2^{k−l}} a_n b(2^{−l} x − n)   (4.16)
The bandeletization of the wavelet coefficients uses the Alpert transform to define a set of bandelet coefficients, which can be written as inner products of the original image f with the bandelets:

b^k_{j,l,n}(x) = ∑_p a_{l,n}[p] ψ^k_{j,p}(x)   (4.17)
The local geometric flow depends upon these coefficients and scales; therefore, for each scale 2^j and orientation k a different geometry is obtained. After the bandeletization process, we obtain a multiscale low-pass and high-pass filtering structure similar to the wavelet transform. Equations (4.12) and (4.17) are used to calculate the geometry of the images, as shown in figure 4.4. The regions containing contours are further used for texture classification, where the features are computed with an Artificial Neural Network.
Figure 4.4: Object categorization on the basis of the geometric flow obtained through bandeletization.
4.2.1.2 Texture Feature Extraction using Bandelet
Texture is a significant component of human visual perception, and many researchers have worked on how to characterize it effectively in images. In this regard, we propose a new method to identify the most prominent texture areas in the image, which constitute the major image objects. In the proposed method, first of all, image transformations are generated through the bandelet transform, which returns the geometric boundaries of the major objects found in an image. Secondly, we apply a Gabor filter with targeted parameters to estimate the texture content around these boundaries. These geometric boundaries are ambiguous in the sense that they can easily be mistaken for undesired texture classes, as all of them closely resemble one another; if not carefully considered, this results in wrong parameter estimation and, in consequence, unsatisfactory image retrieval output. Therefore, to avoid this situation, geometric classification is performed through backpropagation neural networks, which ensures that the texture estimation parameters for the Gabor filter are approximated with maximum accuracy. The following are the main steps of texture feature extraction:
(1) Convert the input RGB image (I) of size M × N into a gray scale image.

(2) Apply the bandelet transform to calculate the geometry of the image and obtain the directional edges.

(3) An Artificial Neural Network is used to classify the blocks having directional edges, after training on the sample edge set described in Figure 4.6. Once the network is trained, every geometric shape obtained in step 2 is classified for parameter estimation. These parameters are described further in the Gabor filter section.

(4) After parameter estimation, the blocks with geometric contents are passed to the Gabor filter to estimate the texture.

(5) Steps 1 to 4 are repeated for the whole image repository.
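The block-level part of steps 1–3 can be sketched as follows. This is a minimal illustration, not the thesis implementation: the bandelet geometry and the trained ANN are replaced by a crude gradient-based direction test, and all function names (`classify_block_direction`, `block_directions`) are hypothetical.

```python
import numpy as np

def classify_block_direction(block, thresh=10.0):
    """Crude stand-in for the bandelet + ANN step: label the dominant
    edge direction of a block from its mean absolute gradients."""
    gy, gx = np.gradient(block.astype(float))   # row- and column-wise gradients
    ex, ey = np.abs(gx).mean(), np.abs(gy).mean()
    if max(ex, ey) < thresh:
        return "none"                           # no significant contour
    if ex > 1.5 * ey:
        return "vertical"                       # strong horizontal variation -> vertical edge
    if ey > 1.5 * ex:
        return "horizontal"
    return "diagonal"

def block_directions(gray, bs=8):
    """Steps 1-3: tile the grayscale image into bs x bs blocks and
    classify the geometry of each block."""
    h, w = gray.shape
    return {(i, j): classify_block_direction(gray[i:i+bs, j:j+bs])
            for i in range(0, h - bs + 1, bs)
            for j in range(0, w - bs + 1, bs)}

# toy image: vertical stripes on the left half, flat right half
img = np.zeros((16, 16))
img[:, :8] = np.tile([0, 0, 255, 255] * 2, (16, 1))
labels = block_directions(img)
```

In the actual pipeline, the direction label of each block would be produced by the trained backpropagation network rather than this gradient heuristic.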
4.2.1.3 Artificial Neural Network
Neural Networks (NN) are renowned as powerful and dominant tools in the area of pattern recognition and are inspired by the biological neurons found in the human brain. The least mean square rule and the gradient search method are used to minimize the average difference between input and target values of the neural network [150]. Backpropagation neural networks are applied to classify the texture on the basis of the geometry returned by the bandelet transform. In this regard, we classify the texture directions into horizontal, vertical, right/left diagonal, or no-contour blocks, by training on a small set developed manually. For this, we have placed 14 block samples representing the mentioned geometric types in every category, as described in Figure 4.6. To generate these samples, we consider only the image geometry and suppress the original image part. Once the network is trained, we apply it to classify every block present in the image. The reason to perform this task with an ANN instead of a kernel (window-based operations used in image processing) is that the geometry is not fixed and has different variations within the same category; in this situation, the performance of kernel-based operations is poor. Therefore, the ANN is applied, and it classifies the texture with maximum accuracy. Figure 4.5 shows the structure of the neural network. The neural network structure is defined with one hidden layer having 20 neurons and four output units. The sigmoid function is used in the hidden layer and the output layer as the transfer function, i.e.,
f(x) = g(x) = 1 / (1 + exp(−x/x_0))   (4.18)
After training the neural network, all blocks of an image are tested against it, and their texture type is determined as:

m↓ = argmax_m (y_{f_m})   (4.19)

where m indexes the output units of the neural network structure and y_{f_m} returns the association factor of a particular output unit. The texture type m↓ is the class whose output unit has the highest association factor. Details of the neural network structure are summarized in table 4.1.
Figure 4.5: The structure of neural network.
Figure 4.6: Types of texture.
Table 4.1: Summary of Neural network structure for every image category used in this
work.
INPUT
Input: a⃗ = (a_1, ..., a_N), dim(a⃗) = N

MIDDLE (HIDDEN) LAYER
Input: b⃗ = U·a⃗, dim(b⃗) = M
Output: c⃗ = f(b⃗ − s⃗), dim(c⃗) = M
U: M×N weight matrix; f: hidden layer activation function; s⃗: thresholds

OUTPUT LAYER
Input: d⃗ = W·c⃗, dim(d⃗) = K
Output: e⃗ = g(d⃗ − t⃗), dim(e⃗) = K
W: K×M weight matrix; g: output layer activation function; t⃗: thresholds

ERROR CORRECTION
MSE: E = (1/2)(p⃗ − e⃗)²
ΔW_ij = −α ∂E/∂W_ij = α δ_i c_j
Δt_i = α δ_i
ΔU_ji = −β ∂E/∂U_ji
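A forward pass through the Table 4.1 structure, with the sigmoid transfer function of equation (4.18) and the argmax rule of equation (4.19), can be sketched as follows. The weights here are random placeholders standing in for a trained network, and the names `forward` and `texture_type` are hypothetical.

```python
import numpy as np

def sigmoid(x, x0=1.0):
    # transfer function of eq. (4.18)
    return 1.0 / (1.0 + np.exp(-x / x0))

def forward(a, U, s, W, t):
    """Forward pass through the Table 4.1 structure:
    hidden c = f(U a - s), output e = g(W c - t)."""
    c = sigmoid(U @ a - s)
    return sigmoid(W @ c - t)

def texture_type(e):
    # eq. (4.19): index of the output unit with the highest association factor
    return int(np.argmax(e))

rng = np.random.default_rng(0)
N, M, K = 64, 20, 4                  # input dim, 20 hidden neurons, 4 texture classes
U = 0.1 * rng.standard_normal((M, N))
W = 0.1 * rng.standard_normal((K, M))
s, t = np.zeros(M), np.zeros(K)
e = forward(rng.standard_normal(N), U, s, W, t)
```

The backpropagation updates listed under ERROR CORRECTION would adjust U, W, s⃗, and t⃗ during training; only the inference path is shown here.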
4.2.1.4 Gabor Feature
Gabor filters are extensively used in the fields of computer vision and pattern recognition. Successful applications of the Gabor wavelet filter include feature extraction, texture segmentation, face recognition, fingerprint identification, edge detection, contour detection, directional image enhancement, hierarchical image representation, compression, and recognition. Gabor filtering is a strong technique to reduce noise and can easily reduce image redundancy and repetition [151]. Gabor filters can be convolved with a small portion of an image or with the full image. An image region is expressed by the different Gabor responses generated through different orientations, frequencies, and angles [87; 152]. For an image I(x, y) of size M × N, its discrete Gabor wavelet transform is given by the convolution:
G_mn(x, y) = ∑_s ∑_t I(x − s, y − t) ψ*_mn(s, t)   (4.20)

where s and t are the filter mask dimensions, and ψ*_mn is the complex conjugate of ψ_mn, which is a self-similar function generated by rotation and dilation of the following mother wavelet:
ψ(x, y) = (1 / (2π σ_x σ_y)) exp[ −(1/2)( x²/σ_x² + y²/σ_y² ) ] exp(j 2π λ x)   (4.21)
where λ is the modulation frequency. The self-similar Gabor wavelets are obtained through the generating function:

ψ_mn(x, y) = a^{−m} ψ(x̃, ỹ)   (4.22)

where m and n specify the scale and orientation of the wavelet, with m = 0, 1, ..., M − 1 and n = 0, 1, ..., N − 1, and the rotated and dilated coordinates x̃, ỹ are:

x̃ = a^{−m}(x cos θ + y sin θ)   (4.23)

ỹ = a^{−m}(−x sin θ + y cos θ)   (4.24)
where a > 1 and θ = nπ/N. In the Gabor filter, σ is the standard deviation of the Gaussian function, λ is the wavelength of the harmonic function, and θ is the orientation.

In our implementation, blocks having a bandelet-based geometric response are passed to the Gabor filter and, based on the neural network classification, we select the parameters for the application of the Gabor filter [152].
For horizontal texture portions: θ = π and λ = 0.3.
For vertical texture portions: θ = π/2 and λ = 0.4.
For left diagonal texture portions: θ = π/4 and λ = 0.5.
For right diagonal texture portions: θ = 3π/4 and λ = 0.5.
The energy computation is performed using the following equation:

F_v = μ((A − λ_E I) X)   (4.25)

where F_v is the feature vector, λ_E are the eigenvalues, X is the eigenvector matrix, and A is the Gabor response on a particular block.
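The Gabor machinery of equations (4.20)–(4.24) with the direction-specific parameter choices above can be sketched as follows. This is an illustrative implementation, assuming a single σ in place of the separate σ_x, σ_y of equation (4.21) and a full-overlap product in place of the sliding convolution of equation (4.20); the scalar energy is only a stand-in for equation (4.25).

```python
import numpy as np

def gabor_kernel(theta, lam, sigma=2.0, size=9, a=2.0, m=0):
    """Gabor wavelet of eqs. (4.21)-(4.24): a Gaussian envelope modulated
    by a complex harmonic, rotated by theta and dilated by a^-m."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = a ** (-m) * (x * np.cos(theta) + y * np.sin(theta))    # eq. (4.23)
    yr = a ** (-m) * (-x * np.sin(theta) + y * np.cos(theta))   # eq. (4.24)
    env = np.exp(-0.5 * (xr ** 2 + yr ** 2) / sigma ** 2) / (2 * np.pi * sigma ** 2)
    return env * np.exp(1j * 2 * np.pi * lam * xr)

# direction-specific (theta, lambda) pairs from the text
params = {"horizontal": (np.pi, 0.3), "vertical": (np.pi / 2, 0.4),
          "left_diagonal": (np.pi / 4, 0.5), "right_diagonal": (3 * np.pi / 4, 0.5)}

def gabor_energy(block, direction):
    """Apply the direction-specific kernel to one block and return a
    scalar response magnitude as a stand-in for the feature of eq. (4.25)."""
    k = gabor_kernel(*params[direction], size=block.shape[0])
    return float(np.abs((block * np.conj(k)).sum()))
```

In the actual pipeline, the direction passed to `gabor_energy` comes from the ANN classification of the bandelet geometry.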
4.2.2 Color Feature Extraction
In CBIR, color is the most imperative and significant visual attribute. It has been extensively studied, and the motivation is that color estimation is not sensitive to rotation, translation, and scale changes. A variety of color spaces are available and serve effectively for different applications [153; 154]. The color features in our work are extracted on the basis of edge detection in the YCbCr color space. Edges are extracted by applying the Canny edge detector to the Y luminance component. The main steps of color feature extraction are as follows:
(1) The RGB image (I) is converted into the YCbCr color space.

(2) After conversion, we separate the Y, Cb, and Cr components and apply the Canny edge detector to the Y component of the image.

(3) In the next step, we combine the edges obtained in the previous step with the unchanged Cb and Cr components.

(4) After step (3), the combined image is converted back into a single RGB image.

(5) Now the individual R, G, and B components are separated and the histogram of each component is calculated. 256 bins are obtained from each of HR, HG, and HB.
(6) To improve the feature performance, we apply a wavelet transform to each histogram obtained in the previous step: the discrete wavelet transform of HR at level 2, and of HG and HB at level 3. After this step, we have 128 bins, i.e., 64 bins from HR, 32 bins from HG, and 32 bins from HB.

(7) The feature vector is calculated for every image in the repository.
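Steps (1)–(5) can be sketched as follows, assuming a BT.601 RGB↔YCbCr conversion and a plain gradient-magnitude edge map as a stand-in for the Canny detector; all function names are hypothetical.

```python
import numpy as np

def rgb_to_ycbcr(img):
    # ITU-R BT.601 full-range conversion
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    y  = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return np.stack([y, cb, cr], axis=-1)

def ycbcr_to_rgb(img):
    y, cb, cr = img[..., 0], img[..., 1] - 128, img[..., 2] - 128
    r = y + 1.402 * cr
    g = y - 0.344136 * cb - 0.714136 * cr
    b = y + 1.772 * cb
    return np.clip(np.stack([r, g, b], axis=-1), 0, 255)

def edge_map(y):
    """Gradient-magnitude edges as a simple stand-in for Canny."""
    gy, gx = np.gradient(y)
    mag = np.hypot(gx, gy)
    return np.where(mag > mag.mean() + mag.std(), 255.0, 0.0)

def color_feature(img):
    """Steps (1)-(5): edge map of Y recombined with the unchanged Cb/Cr,
    converted back to RGB, then one 256-bin histogram per channel."""
    ycc = rgb_to_ycbcr(img.astype(float))
    ycc[..., 0] = edge_map(ycc[..., 0])
    rgb = ycbcr_to_rgb(ycc)
    return [np.histogram(rgb[..., c], bins=256, range=(0, 256))[0]
            for c in range(3)]

img = np.random.default_rng(1).integers(0, 256, (32, 32, 3)).astype(float)
hists = color_feature(img)
```

Each of the three histograms sums to the pixel count of the image; step (6) then compresses them by wavelet decomposition.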
Figure 4.7 illustrates how the color features are obtained in the above-mentioned steps.
Figure 4.7: (a) RGB Original Image; (b) Y matrix Luminance Image; (c) Canny Luma
Image; (d) Canny RGB Image.
4.2.3 Fusion Vector
Application of the above-mentioned procedure generates two feature vectors, representing the texture features obtained from the bandelet transform and the color features obtained from the YCbCr color space. The aggregation of these feature vectors into a single vector represents the final feature vector for any image.
4.3 Image Representation using HSV
The most significant capability of the proposed algorithm is its characteristic of identifying the most prominent objects in an image, which are considered the core outcomes used to obtain the feature vectors. For this purpose, first of all, image transformations are generated through the bandelet transform, which returns the geometric boundaries of the major objects found in an image; this particular information is then used for image representation, which ensures the retrieval of images in a more precise way. The main attribute of the bandelet-based object estimation is its speed, as it needs only n operations to achieve this goal, where n is the total number of non-overlapping blocks into which the image is divided. Then, the Gabor filter is applied with targeted parameters (as will be described) to estimate the texture content around these boundaries. These geometric boundaries can be erroneously selected, in that they can easily be mistaken for redundant and unwanted texture classes, as all of them closely resemble one another; if not carefully considered, this results in wrong parameter estimation and, consequently, in unsatisfactory image retrieval output. Therefore, to avoid this situation, geometric classification is performed through backpropagation neural networks, as described in section 4.2, which ensures that the texture estimation parameters for the Gabor filter are approximated with maximum accuracy. To strengthen the feature vectors, color components are incorporated in the HSV domain after approximating them by wavelet decomposition of the color histograms. The proposed features are applied to all images present in the image repository, and their semantic classes are determined through ground-truth training with a Support Vector Machine and the finer neighborhood of every image. We also generate an inverted index over the semantic sets, which guarantees fast image retrieval after determining the semantic class of the query image. The complete process of the proposed method is represented in figure 4.8.
Figure 4.8: The proposed method using the HSV color space.
4.3.1 Color Feature HSV Domain
A variety of color spaces are available and serve effectively for different applications [153; 154]. In our method, the color features are extracted on the basis of edge detection in the HSV color space; to extract the edges on the H (hue) component, the Canny edge detector is used. The main steps to obtain the color features are:
(1) The RGB image (I) is converted into the HSV color space.

(2) After the conversion of step 1, we separate the H, S, and V components and apply the Canny edge detector to the H component of the image.

(3) In the next step, we combine the edges obtained in the previous step with the unchanged S and V components.

(4) After step (3), the combined image is converted back into a single RGB image.

(5) Now the individual R, G, and B components are separated and the histogram of each component is calculated. 256 bins are obtained from each of HR, HG, and HB.

(6) To improve the feature performance, we apply a wavelet transform to each histogram obtained in the previous step: the discrete wavelet transform of HR at level 2, and of HG and HB at level 3. After this step, we have 128 bins, i.e., 64 bins from HR, 32 bins from HG, and 32 bins from HB.

(7) The feature vector is calculated for every image in the repository.
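Step (6), which compresses the three 256-bin histograms to 128 bins, can be sketched with a Haar approximation as a stand-in for the discrete wavelet transform (each decomposition level keeps the low-pass band and halves the length); the function names are hypothetical.

```python
import numpy as np

def haar_approx(signal, levels):
    """Keep only the approximation (low-pass) band of a Haar DWT,
    halving the length at each level."""
    out = np.asarray(signal, dtype=float)
    for _ in range(levels):
        out = (out[0::2] + out[1::2]) / np.sqrt(2.0)
    return out

def compress_histograms(hr, hg, hb):
    """Step (6): level-2 DWT of HR (256 -> 64 bins) and level-3 DWT of
    HG and HB (256 -> 32 bins each), concatenated into 128 bins."""
    return np.concatenate([haar_approx(hr, 2),
                           haar_approx(hg, 3),
                           haar_approx(hb, 3)])

f = compress_histograms(np.ones(256), np.ones(256), np.ones(256))
```

This reproduces the bin arithmetic of the text: 256/2² = 64 and 256/2³ = 32, giving 64 + 32 + 32 = 128 bins in total.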
Figure 4.9 shows how the color features are obtained in the above-described steps.
Figure 4.9: (a) RGB Original Image; (b) H matrix Hue Image; (c) Canny Hue Image; (d)
Canny RGB Image.
4.3.2 Combining Texture and HSV Color Features
Application of the above-mentioned procedure generates two feature vectors, representing the texture features obtained from the bandelet transform and the color features obtained from the HSV color space. The aggregation of these feature vectors into a single vector represents the final feature vector for any image.
4.4 Chapter Summary
Content-based image retrieval systems can perform consistently well if they are able to verify their output once the images are retrieved. Efficient and semantically correct image retrieval requires the images to be represented in a powerful way that is robust to the feature variations which can cause undesired output when matched against the images present in the repository. To meet these goals, an image representation scheme is introduced in this chapter that performs an in-depth analysis of texture through the bandelet transform, an Artificial Neural Network, and Gabor filters. To further enhance the image representation capabilities, color features are also incorporated. All of this guarantees the retrieval of images in a more systematic way.
CHAPTER 5
STATE-OF-THE-ART CLASSIFIERS FOR
IMAGE RETRIEVAL
This chapter compares the proposed method against several existing state-of-the-art CBIR techniques; details of the experiments and a comparative analysis are provided in the following subsections. For image retrieval purposes, Artificial Neural Networks (ANN) and Support Vector Machines (SVM) are applied, and the performance of the system is evaluated on three standard datasets used in the domain of CBIR.
5.1 Semantic Association
Semantic image retrieval refers to the capability of a system to understand the meaning of the visual content that constitutes an image and, when searched, to bring back the images in which the same concept dominates. Efforts are carried out to determine the ways through which the rate of correct semantic association for queries can be increased; research in the area of CBIR ultimately revolves around this one intention. But the real challenge lies in the interpretation of semantics, as this is challenging even for human beings, and machines suffer the same problem with far more severity. A two-fold strategy is normally followed in standard CBIR systems for improving the semantic association:
• Generate powerful image representations that ensure meaningful images are retrieved in response to queries.

• Diminish the semantic gap between the low-level image descriptions (here, image descriptions are used interchangeably with image representations) and the high-level image semantics. Many CBIR systems apply different machine learning techniques to achieve this goal [155].
Hence, it is believed in CBIR research that meaningful image representations lead to high semantic retrieval performance. Therefore, much research in the domain of content-based image retrieval has been carried out to improve the visual features and to determine new feature types. These approaches usually match the feature vector of the query image with the feature vectors of the images present in the repository and rank the images on the basis of similarity.
But this approach suffers from two main flaws, due to which its results are not satisfactory. (1) Such CBIR systems rely solely on image features: they rank the images by their feature distance to the query image and generate the output without verifying it. The problem with this methodology is that many images may appear in the response while not necessarily being relevant at all. For example, as described in figure 5.1, the CBIR system may be deceived by the girl's image when the dog image is the query, because the two images are very close to each other in terms of feature distance. (2) Secondly, such systems ignore the similarity amongst the neighbors of the query image for output finalization, which is a major reason for inconsistent output. It is our experimental observation that the probability of a correct semantic association made on the basis of a single image is far smaller than that of the correct semantic association of multiple images [156].
Figure 5.1: Verification inconsistency: the visual features are similar, but the two images belong to different semantic classes.
Keeping the above-mentioned points in mind, the research in this thesis focuses on finding ways in which such defects can be avoided and the retrieval performance of CBIR systems can be improved. For this, we have focused on the following points:
• To achieve semantically correct retrieval of images, an effective feature extraction method is introduced.

• To evaluate the proposed feature sets, and to reduce the inherent gap between image features and semantics, a neural network based architecture is presented.

• To raise the probability of correct semantic association and to provide output verification, the true neighborhood of the query image is detected and utilized for the finalization of the semantic class.

• The Pearson correlation is used as a distance measure for the neighborhood selection, which also verifies the effectiveness of the proposed scheme.

• To evaluate the image representation, and to reduce the semantic gap, a Support Vector Machine is presented.
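The Pearson-correlation distance and the neighborhood selection mentioned above can be sketched as follows; `true_neighborhood` is a hypothetical name and the toy repository is illustrative only.

```python
import numpy as np

def pearson_distance(u, v):
    """Distance derived from the Pearson correlation coefficient:
    0 for perfectly correlated vectors, 2 for anti-correlated ones."""
    u = np.asarray(u, dtype=float) - np.mean(u)
    v = np.asarray(v, dtype=float) - np.mean(v)
    return 1.0 - (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

def true_neighborhood(query, repo, k=5):
    """Indices of the k repository vectors nearest to the query under
    the Pearson distance; these neighbors later vote on the class."""
    d = [pearson_distance(query, f) for f in repo]
    return np.argsort(d)[:k]

repo = [[1, 2, 3], [3, 2, 1], [2, 4, 6], [1, 1, 2]]   # toy feature vectors
nn = true_neighborhood([1, 2, 3], repo, k=2)
```

Because the Pearson correlation is invariant to shifting and scaling of the feature vector, two proportional feature vectors have distance 0 even when their Euclidean distance is large.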
5.2 Content based image retrieval using ANN
Once the images present in the image repository are represented in the form of low-level features, we can determine their semantic class. To determine the actual semantic class, a sub-repository of images is generated containing representatives of M known classes, where every class contains R ≥ 2 images. In our implementation, the value of R is set to 30, which means that 30 images from every semantic class of a ground-truth image repository are used for the development of the training repository. On this sub-repository, neural networks with class-specific association parameters are trained. The one-against-all-classes (OAA) association rule is used for the network development, with the target of decreasing the mean squared error between the actual association and the association obtained from the NN. The class-specific training set can be defined as Ω_tr = Ω_pos ∪ Ω_neg, where Ω_pos represents the R images from a particular class, and Ω_neg contains all other images in the training repository. Once the training is complete, the semantic class of every image present in the image repository is determined on the basis of the decision function and the association rules.
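The construction of the class-specific OAA training sets Ω_tr = Ω_pos ∪ Ω_neg can be sketched as follows, with a toy repository of R = 2 images per class in place of the R = 30 used in the thesis; `oaa_training_sets` is a hypothetical name.

```python
import numpy as np

def oaa_training_sets(features, labels):
    """One binary training set per known class: the R images of the
    class are positives (target 1), all other training images are
    negatives (target 0), following the OAA rule."""
    return {c: (np.asarray(features),
                np.array([1 if y == c else 0 for y in labels]))
            for c in sorted(set(labels))}

# toy repository: 3 classes with R = 2 images each
feats = np.arange(12).reshape(6, 2)
labs = ["beach", "beach", "bus", "bus", "horse", "horse"]
sets = oaa_training_sets(feats, labs)
```

One class-specific network is then trained per entry of `sets`, targeting the binary vector of that class.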
Due to the object composition present in the images, many images may tend to show association with other classes, e.g., mountain images sometimes associate with beach images. Therefore, a mechanism is required to reduce such associations. This is the reason that the class finalization process also involves the top K neighbors in the semantic association process, using the majority voting rule (MVR) [157]:

C*(x) = sgn( ∑_i C_i(X) − (K − 1)/2 )   (5.1)
where C_i(X) is the class-wise association of the input image and its top neighbors:

C_i(X) = y_f^l   (5.2)

where l = 1, 2, 3, ..., n indexes the neural network structures, and y_f^l returns the association factor of a particular neural network structure with a specific class. The MVR counts the largest number of classifiers that agree with each other [157]. Therefore, the class association can be determined by:
C*_F(x) = argmax( ∑_i C*(x) )   (5.3)
Once the semantic association of all images present in the image repository is determined, we store the semantic association values in a file that serves as the semantic association database. Therefore, after determining the semantic class of the query image through the trained neural networks, we compute the Euclidean distance of the query image to all images of the same semantic class, taking into account the values of the semantic association database, and generate the output on the basis of the feature similarities.
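The majority voting rule of equations (5.1) and (5.3) can be sketched as follows; the interpretation that the winning class must collect strictly more than (K − 1)/2 votes is an assumption drawn from the sgn form of equation (5.1), and `majority_vote` is a hypothetical name.

```python
import numpy as np

def majority_vote(votes, k):
    """Eqs. (5.1)/(5.3): the k collected class votes are tallied and the
    top class wins only with a strict majority, i.e. more than (k-1)/2
    votes; otherwise no class is finalized."""
    classes, counts = np.unique(votes, return_counts=True)
    best = classes[np.argmax(counts)]
    return best if counts.max() > (k - 1) / 2 else None

# query's own network answer says "mountain", but 4 of the 5 votes say "beach"
winner = majority_vote(["mountain", "beach", "beach", "beach", "beach"], k=5)
```

This is how a mountain image that individually associates with the beach class can still be corrected (or confirmed) by its neighborhood.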
5.3 Content based image retrieval using SVM
Application of the above-mentioned procedure returns two feature vectors, representing the texture features obtained from the bandelet transform and the color features obtained from the HSV color space. The aggregation of these feature vectors into a single vector represents the final feature vector for any image. Support Vector Machines (SVMs) belong to the family of supervised learning techniques used to classify images. They view a given image database as two sets of vectors in an n-dimensional space and construct a separating hyperplane that maximizes the margin between the images relevant to the query and the images not relevant to it. The SVM is a kernel method, and the kernel function used in an SVM is crucial in determining its performance. The basic principle of SVMs is the maximum-margin classifier: using kernel methods, the data are first mapped implicitly into a high-dimensional feature space, and the decision function, which may be non-linear in the original space, is determined by the maximum-margin hyperplane in that feature space. Now, suppose a set of inputs belongs to two categories [158]:
{(X_i, Y_i)}_{i=1}^{N},  Y_i ∈ {+1, −1}   (5.4)
99
5.3 Content based image retrieval using SVM
where X_i are the input vectors and Y_i the class labels. The separating hyperplane is created by finding efficient values of the weight vector W and bias b as follows:

W^T · X + b = 0   (5.5)

The two classes can be separated from each other; therefore, we first find the hyperplanes bounding the maximum margin 2/‖W‖:

W^T · X_i + b ≥ +1   (5.6)

W^T · X_i + b ≤ −1   (5.7)
Binary classification is performed through the kernel version of the Wolfe dual problem with Lagrange multipliers α_i:

W(α) = ∑_{i=1}^{m} α_i − (1/2) ∑_{i,j=1}^{m} α_i α_j y_i y_j K(X_i, X_j)   (5.8)
subject to the constraints α_i ≥ 0 and ∑_{i=1}^{m} α_i y_i = 0. After obtaining the optimal values of α_i, the decision function based on the kernel of the SVM classifier is given by:

F(X) = sgn[g(x)]   (5.9)

g(x) = ∑_{i=1}^{m} α_i Y_i K(X_i, x) + b   (5.10)
The above output equation is known as the hyperplane decision function of the SVM. High values of g(x) represent high prediction confidence, and vice versa. After generating a sub-repository of images, as well as generating the new feature through a combination of color and texture features, we train category-specific support vector machines on this sub-repository following the one-against-all-classes (OAA) scheme: all feature vectors found in the positive training class of a certain category are labeled 1, and all other feature vectors, which do not belong to this specific category, are labeled 0. In this way, we build the training sets for all categories and train the SVMs using quadratic programming for the optimization, with the maximum number of iterations set to 1000. After training these support vector machines, all images present in the image repository are tested against all trained support vector machines, and on the basis of the decision function they are associated with their specific semantic class. Our decision function is as follows:
I*_hsv = argmax_l (Y′_{f_l})   (5.11)
where l = 1, 2, ..., n indexes the support vector machines, argmax selects the argument for which the given function attains its maximum value, Y′_{f_l} returns the association of the corresponding support vector machine, and I*_hsv represents the obtained associated class. Due to the object composition present in the images, many images may tend to show association with other classes, e.g., mountain images sometimes associate with beach images; therefore, a mechanism is required to reduce such associations. This is the reason that the class finalization process also involves the top K neighbors in the semantic association process, using the majority voting rule (MVR):
C*(X) = sgn( ∑_i C_i(X) − (K − 1)/2 )   (5.12)
where C_i(X) is the class-wise association of the input image with its top neighbors. The MVR counts the largest number of classifiers that agree with each other [159]. So, according to equation 5.12, the class association can be determined by:

C_fe(X) = argmax( ∑_i C*(X) )   (5.13)
The aforementioned process is applied to all images present in the image repository, and their semantic class is determined. In this manner, when the system suggests an output or semantic class for any query image, only images having the same semantic class are returned to the user, after ranking on the basis of their feature-similarity distance with respect to the query image.
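The decision function of equations (5.9)–(5.11) can be sketched as follows. The support vectors, multipliers, and RBF kernel below are hand-set placeholders, not the result of the quadratic-programming training described above:

```python
import numpy as np

def rbf_kernel(a, b, gamma=0.5):
    return np.exp(-gamma * np.sum((np.asarray(a) - np.asarray(b)) ** 2))

def svm_decision(x, support, alphas, ys, b, kernel=rbf_kernel):
    """Eqs. (5.9)-(5.10): g(x) = sum_i alpha_i y_i K(x_i, x) + b and
    F(x) = sgn(g(x)); a larger |g(x)| means higher confidence."""
    g = sum(a * y * kernel(s, x) for a, y, s in zip(alphas, ys, support)) + b
    return np.sign(g), float(g)

def oaa_predict(x, machines):
    """Eq. (5.11): evaluate every class-specific SVM and keep the class
    whose decision value is largest."""
    scores = {c: svm_decision(x, *m)[1] for c, m in machines.items()}
    return max(scores, key=scores.get)

# hand-set placeholder machines: (support vectors, alphas, labels, bias)
machines = {"beach": ([np.array([0.0, 0.0])], [1.0], [1], 0.0),
            "bus":   ([np.array([5.0, 5.0])], [1.0], [1], 0.0)}
pred = oaa_predict(np.array([0.1, 0.1]), machines)
```

In the actual system, the `machines` entries would come from the OAA quadratic-programming training, and the predicted class would then be refined by the MVR of equation (5.12).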
5.4 Performance Evaluation
To elaborate the retrieval capabilities of the proposed method, numerous experiments were performed on three different image datasets. For implementation purposes, we used Matlab 2010 in the Windows 7 environment on a Dell Core i3 machine. The details of the experiments are presented in the following subsections. Section 5.4.1 describes the datasets used for image retrieval purposes. Section 5.4.2 is about the retrieval precision and recall on randomly selected queries. Sections 5.4.3 and 5.4.5 describe the comparison of the proposed method with some state-of-the-art works in CBIR.
5.4.1 Image Datasets
For our experiments, we used three image datasets, namely Corel, COIL, and Caltech 101. The Corel dataset contains 10,908 images, each of size 384 × 256 or 256 × 384. For this dataset, we report results on ten semantic categories having 100 images each: Africa, Buses, Beach, Dinosaurs, Buildings, Elephants, Horses, Mountains, Flowers, and Food. The reason for reporting results on these categories is that they are the same semantic groups used by most researchers working in the domain of CBIR to report the effectiveness of their work [2; 85; 89; 115; 117], so a clear performance comparison is possible in terms of the reported results. To further elaborate the performance of the proposed system, experiments were also performed on the Columbia Object Image Library (COIL) [89], which contains 7200 images from 100 different categories. Finally, we used the Caltech 101 image set. This dataset consists of 101 image categories, and every category has a different number of images. For simplification purposes, we manually selected 30 categories which contain at least 100 images each.
5.4.2 Retrieval Precision/Recall Evaluation
For our experiments, we wrote a computer simulation that randomly selects 300 images from the image repository and uses them as query images. As already described above, we use image datasets in which the images are grouped into semantic concepts, so on the basis of their labels we can automatically determine their semantic association. We run this simulation on all three datasets mentioned previously and determine the performance by counting how many correct results are obtained for each query image. In the proposed work, we report the average result over five repetitions of the experiment for each image category. For our experiments, an inverted index mechanism is proposed which, after determining the semantic class of the query image, returns the relevant images against it, i.e., the method followed by Google for text document search. According to the proposed method, we apply the trained neural networks to every image present in the image repository and determine its semantic class. The class association information is stored in a file which serves as the semantic association database. The benefit of this approach is that, after determining the semantic information once, we only need to determine the semantic class of the query image and the relevance information for the predetermined semantic cluster. Overall, the class association accuracy is determined in terms of precision and recall using the following formulas:
Precision = NA(q) / NR(q)    (5.14)

Recall = NA(q) / Nt    (5.15)
where NA(q) represents the number of relevant images matching the query image, NR(q)
represents the number of images retrieved against the query image, and Nt is the total
number of relevant images available in the database. Precision, or specificity,
measures the ability of the system to retrieve only those images which are relevant to
a query image among all of the retrieved images, while recall, also known as
sensitivity or true positive rate, measures the ability of the classifier to associate
samples with their actual class. For the elaboration of results, the top 20 retrieved
images against each query image are used to compute precision and recall. As mentioned
previously, we report the average of the results after running our technique five
times. To elaborate the performance of the proposed system, we randomly selected three
images from each of the 10 previously mentioned image categories of the Corel image
set and display their results in Figure 5.2. The retrieval results shown represent the
precision obtained by our method against these query images, reported from the top 10
to the top 40 retrievals.
The quantitative analysis of the proposed method suggests that the quality of the
system is good in terms of precision, as reliable results appear against these random
selections. The most reliable results appear in the range of 10 to 30 retrievals per
query image, as there are 100 images in a single category. It is important to note
that these results are achieved without any kind of external supervision by the user,
as most relevance-feedback-based CBIR techniques require.
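The precision and recall computation above can be sketched in code as follows. This is a minimal illustration of Equations 5.14 and 5.15 at a top-20 cut-off, using placeholder labels rather than the actual experimental data:

```python
def precision_recall_at_k(retrieved_labels, query_label, total_relevant, k=20):
    """Precision and recall over the top-k retrieved images (Eqs. 5.14-5.15)."""
    top_k = retrieved_labels[:k]
    relevant_retrieved = sum(1 for label in top_k if label == query_label)  # NA(q)
    precision = relevant_retrieved / len(top_k)      # NA(q) / NR(q)
    recall = relevant_retrieved / total_relevant     # NA(q) / Nt
    return precision, recall

# Example: 14 of the top 20 results share the query's class ("horses"),
# and the database holds 100 images of that class.
ranked = ["horses"] * 14 + ["beach"] * 6
p, r = precision_recall_at_k(ranked, "horses", total_relevant=100)
print(p, r)  # 0.7 0.14
```

Averaging these values over the 300 random queries and the five experiment runs yields the figures reported in the tables below.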
In Figures 5.3 and 5.4, the same experiment is performed on the Caltech 101 image set
by randomly selecting 4 images. Precision and recall are reported for the top 10 to
top 60 retrievals. Hence, on the basis of retrieval accuracy, we can say the proposed
method is quite efficient. Another important point is that the results reported here
represent the retrieval against these random queries, while the overall accuracy is
reported as the average over 100 query images with the experiments performed five
times.
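The inverted-index mechanism described in this subsection can be sketched roughly as below. This is a simplified illustration: the class labels are assumed to have been produced offline by the trained neural networks, which are not shown here, and the image ids are placeholders.

```python
from collections import defaultdict

def build_semantic_index(image_labels):
    """Map each semantic class to the image ids assigned to it.

    image_labels: dict of image_id -> predicted class, computed once by
    running the trained networks over the whole repository.
    """
    index = defaultdict(list)
    for image_id, semantic_class in image_labels.items():
        index[semantic_class].append(image_id)
    return index

def retrieve(index, query_class):
    """At query time only the query image needs classification; the
    relevant images are read directly from the precomputed index."""
    return index.get(query_class, [])

repository_labels = {"img1": "buses", "img2": "horses", "img3": "buses"}
index = build_semantic_index(repository_labels)
print(retrieve(index, "buses"))  # ['img1', 'img3']
```

Because the repository is classified only once, the per-query cost reduces to a single classification plus a dictionary lookup, which is what makes the retrieval fast.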
5.4.3 Comparison on Corel Image Set
To determine the usability of the proposed method, it is compared with some
state-of-the-art methods in CBIR. In this regard, the technique (texture + Color YCbCr
and ANN) is compared with [2; 85; 89; 115; 117]. We chose to compare with these
techniques because these systems report their results on the common denomination of
the ten semantic categories of the Corel dataset described earlier; therefore, a clear
performance comparison is possible. Table 5.1 presents the class-wise comparison of
the proposed system with the other systems in terms of precision. The results show
that the proposed system performs better than all the other systems in terms of the
average precision obtained. Table 5.2 presents the performance comparison in terms of
recall with the same systems, from which it can easily be observed that the proposed
system has the highest recall rates. Figure 5.5 shows the performance of the proposed
method in terms of precision against the other state-of-the-art systems, and Figure
5.6 shows the corresponding comparison in terms of recall, matching the values in
Table 5.2.
Figure 5.2: Query Performance on Corel image dataset with Top 10 to Top 40 Retrievals.
Figure 5.3: Query Performance on Caltech image dataset with Top 10 to Top 60 Retrievals
in terms of Precision.
Figure 5.4: Query Performance on Caltech image dataset with Top 10 to Top 60 Retrievals
in terms of Recall.
Table 5.1: Comparison of mean precision obtained by the proposed method with other
standard retrieval systems on top 20 retrievals.
Class Proposed Method [117] [115] [85] [89] [2]
Africa 0.65 0.45 0.56 0.70 0.64 0.68
Beach 0.70 0.39 0.53 0.56 0.64 0.54
Buildings 0.75 0.37 0.61 0.57 0.70 0.54
Buses 0.95 0.74 0.89 0.87 0.92 0.88
Dinosaurs 1.00 0.91 0.98 0.97 0.99 0.99
Elephants 0.80 0.30 0.57 0.67 0.78 0.65
Flowers 0.95 0.85 0.89 0.91 0.95 0.89
Horses 0.90 0.56 0.78 0.83 0.95 0.80
Mountains 0.75 0.29 0.51 0.53 0.74 0.52
Food 0.75 0.36 0.69 0.74 0.81 0.73
Mean 0.820 0.522 0.701 0.735 0.812 0.722
Figure 5.5: Comparison of mean precision obtained by the proposed method with other
standard retrieval systems.
Table 5.2: Comparison of mean recall obtained by the proposed method with other
standard retrieval systems on top 20 retrievals.
Class Proposed Method [117] [115] [85] [89] [2]
Africa 0.13 0.11 0.15 0.15 0.13 0.14
Beach 0.14 0.12 0.19 0.19 0.13 0.19
Buildings 0.15 0.12 0.18 0.18 0.14 0.17
Buses 0.19 0.09 0.11 0.11 0.18 0.12
Dinosaurs 0.20 0.07 0.09 0.09 0.20 0.10
Elephants 0.16 0.13 0.15 0.15 0.16 0.14
Flowers 0.19 0.08 0.11 0.11 0.19 0.11
Horses 0.18 0.10 0.13 0.13 0.19 0.13
Mountains 0.15 0.13 0.22 0.22 0.15 0.21
Food 0.15 0.12 0.13 0.13 0.16 0.13
Mean 0.164 0.107 0.146 0.146 0.163 0.144
Figure 5.6: Comparison of mean recall obtained by the proposed method with other
standard retrieval systems.
5.4.4 Comparison with State-of-the-Art Methods
The retrieval results are compared with state-of-the-art image retrieval methods,
including Efficient content-based image retrieval using Multiple Support Vector
Machines Ensemble (EMSVM) [160], Simplicity [107], CLUE [118], patch-based Histogram
of Oriented Gradients-Local Binary Pattern (patch-based HOG-LBP) [161], and Edge
orientation difference histogram and color-SIFT (EODH and Color-SIFT) [162]. We chose
to compare with these techniques because these systems report their results on the
common denomination of the ten semantic categories of the Corel dataset described
earlier; hence, a clear performance comparison with these state-of-the-art methods is
possible. Table 5.3 presents the comparison of the proposed system with the other
systems in terms of average precision. The results show that the proposed system
performs better than all the other systems in terms of the average precision obtained.
The same results are graphically illustrated in Figure 5.7. Table 5.4 presents the
performance comparison in terms of recall with the same systems, from which it can
easily be observed that the proposed system has the better recall rates. Figure 5.8
shows the corresponding comparison in terms of recall.
Table 5.3: Comparison of mean precision obtained by the proposed method with
state-of-the-art methods on top 20 retrievals.
Class Proposed EMSVM Simplicity CLUE HOG-LBP SIFT
[160] [107] [118] [161] [162]
Africa 0.65 0.5 0.4 0.5 0.55 0.75
Beach 0.70 0.7 0.3 0.35 0.47 0.38
Buildings 0.75 0.2 0.4 0.45 0.56 0.54
Buses 0.95 0.8 0.6 0.65 0.91 0.97
Dinosaurs 1.00 0.9 0.96 0.95 0.94 0.99
Elephants 0.80 0.6 0.3 0.3 0.49 0.66
Flowers 0.95 1.00 0.6 0.75 0.85 0.92
Horses 0.90 0.8 0.6 0.7 0.52 0.87
Mountains 0.75 0.5 0.25 0.3 0.37 0.59
Food 0.75 0.6 0.45 0.6 0.55 0.62
Mean 0.820 0.661 0.486 0.555 0.621 0.729
Figure 5.7: Comparison of mean precision obtained by the proposed method with
state-of-the-art retrieval systems.
Table 5.4: Comparison of mean recall obtained by the proposed method with
state-of-the-art methods on top 20 retrievals.
Class Proposed Method [160] [107] [118] [161] [162]
Africa 0.13 0.1 0.08 0.1 0.11 0.15
Beach 0.14 0.14 0.06 0.07 0.09 0.08
Buildings 0.15 0.04 0.07 0.09 0.11 0.11
Buses 0.19 0.14 0.12 0.13 0.18 0.19
Dinosaurs 0.20 0.18 0.19 0.19 0.1 0.13
Elephants 0.16 0.12 0.06 0.06 0.1 0.13
Flowers 0.19 0.2 0.12 0.15 0.17 0.18
Horses 0.18 0.16 0.12 0.14 0.1 0.17
Mountains 0.15 0.1 0.05 0.06 0.08 0.12
Food 0.15 0.12 0.09 0.12 0.11 0.13
Mean 0.164 0.130 0.096 0.111 0.124 0.146
Figure 5.8: Comparison of mean recall obtained by the proposed method with
state-of-the-art retrieval systems.
The second technique (Bandelet texture + Color HSV and SVM) is compared with
[6; 85; 89; 115; 159; 160; 162]. Table 5.5 presents the class-wise comparison of the
proposed system with the other systems in terms of precision. The results show that
the proposed system performs better than all the other systems in terms of the average
precision obtained. Table 5.6 presents the performance comparison in terms of recall
with the same systems, from which it can easily be observed that the proposed system
has the highest recall rates. Figures 5.9 and 5.10 show the performance of the
proposed method in terms of precision and recall against the other state-of-the-art
systems.
Table 5.5: Mean precision of the proposed Method-HSV compared with other standard
retrieval systems on top 20 retrievals.
Class Proposed Method-HSV [115] [89] [85] [159] [160] [162] [6]
Africa 0.8 0.56 0.64 0.70 0.42 0.5 0.75 0.65
Beach 0.75 0.53 0.64 0.56 0.45 0.7 0.38 0.7
Buildings 0.75 0.61 0.70 0.57 0.41 0.2 0.54 0.75
Buses 0.9 0.89 0.92 0.87 0.85 0.8 0.97 0.95
Dinosaurs 1.00 0.98 0.99 0.97 0.59 0.9 0.99 1.0
Elephants 0.90 0.57 0.78 0.67 0.43 0.6 0.66 0.8
Flowers 0.8 0.89 0.95 0.91 0.90 1.0 0.92 0.95
Horses 0.90 0.78 0.95 0.83 0.59 0.80 0.87 0.9
Mountains 0.70 0.51 0.74 0.53 0.27 0.5 0.59 0.75
Food 0.8 0.69 0.81 0.74 0.43 0.6 0.62 0.75
Mean 0.830 0.701 0.812 0.735 0.534 0.660 0.729 0.820
Figure 5.9: Mean precision of the proposed Method-HSV compared with other standard
retrieval systems on top 20 retrievals.
Table 5.6: Comparison of mean recall obtained by the proposed Method-HSV with other
standard retrieval systems on top 20 retrievals.
Class Proposed Method-HSV [115] [89] [85] [159] [160] [162] [6]
Africa 0.16 0.15 0.13 0.15 0.08 0.1 0.15 0.13
Beach 0.15 0.19 0.13 0.19 0.09 0.14 0.08 0.14
Buildings 0.15 0.18 0.14 0.18 0.08 0.04 0.11 0.15
Buses 0.18 0.11 0.18 0.11 0.17 0.14 0.19 0.19
Dinosaurs 0.20 0.09 0.20 0.09 0.12 0.18 0.13 0.2
Elephants 0.18 0.15 0.16 0.15 0.09 0.12 0.13 0.16
Flowers 0.16 0.11 0.19 0.11 0.18 0.2 0.18 0.19
Horses 0.18 0.13 0.19 0.13 0.12 0.16 0.17 0.18
Mountains 0.14 0.22 0.15 0.22 0.05 0.1 0.12 0.15
Food 0.16 0.13 0.16 0.13 0.09 0.12 0.13 0.15
Mean 0.166 0.146 0.163 0.146 0.107 0.130 0.139 0.164
Figure 5.10: Comparison of mean recall obtained by the proposed Method-HSV with other
standard retrieval systems.
5.4.5 Comparison on Coil Image Set
From the precision and recall results described for the Corel dataset, we can observe
that ICTEDCT has the second highest precision and recall rates. Therefore, we report
the performance comparison on the Coil dataset at different retrieval rates against
ICTEDCT [89]. For this experiment, five images are selected from each image category,
and the performance of both systems is compared on each category. From the results
elaborated in Figure 5.11, it can clearly be observed that the proposed method gives
higher recall and precision rates compared to ICTEDCT [89]. Hence, from the results of
the proposed method on the Coil and Corel datasets, we can say that the proposed
method is considerably more precise and effective than the other CBIR systems.
Figure 5.11: Comparison of precision and recall obtained by the proposed method with
ICTEDCT.
5.5 Chapter Summary
With its many application benefits, content-based image retrieval has attracted a
great deal of research attention. This research work has introduced a mechanism for
automatic image retrieval. The major consideration of the research was the finding
that the most prominent image results appear if we generate image representations that
emphasize the core image objects instead of considering every image patch. Therefore,
we applied the bandelet transform for feature extraction, which considers the core
objects found in an image. To further enhance the image representation capabilities,
color features are also incorporated. Semantic association is performed through
artificial neural networks, and an inverted index mechanism is used to return the
images against queries to ensure fast retrieval. The results of the proposed method
are reported on three image datasets, namely Corel, Coil, and Caltech-101. The
comparison with other standard CBIR systems reveals that the proposed system
outperforms them in terms of average precision and recall values.
CHAPTER 6
CONCLUSIONS AND FUTURE WORK
6.1 Conclusion
This chapter presents the major research contributions of the thesis and provides an
insight into future work. The focus of the research is on identifying ways to achieve
content-based image retrieval in a stable manner. The ultimate goal was to reduce the
semantic gap, which is a major source of improper image retrieval output. The reasons
identified in this regard were: (1) improper representation of the images, and (2)
improper similarity measures incorporated to bring out the semantically correct
output. Therefore, to overcome these limitations, the research focused on a novel
feature extraction scheme that computes the global image energy on the basis of the
core objects found in an image. For feature extraction, we performed an in-depth image
analysis with the bandelet transform and identified the objects that convey the image
theme as the core objects. Once these objects are identified, we computed the image
energy by confining the feature extraction process to the blocks that contain the
objects of interest. For this, the texture of the blocks is considered,
texture-specific parameters are applied, and image representations are generated by
computing the eigenvalues of the Gabor filters. The texture features rely heavily on
the artificial neural networks, which give precise information about the type of
texture so that the appropriate texture-specific parameters can be applied. To further
improve the capabilities of the feature vectors, the color features obtained in the
YCbCr domain are fused with the bandelet- and Gabor-based texture representations.
This results in very powerful image representations that are invariant to scale and
rotation. Over these representations, semantic sensors (category-specific artificial
neural networks) are trained that guarantee robust output against image queries [6].
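The fusion step described above, concatenating the texture representation with the color features before classification, could be sketched as follows. This is only an illustrative outline with placeholder vectors and a simple L2 normalization per modality; it is not the thesis implementation itself, and the vector dimensions are hypothetical.

```python
import numpy as np

def fuse_features(texture_vec, color_vec):
    """Concatenate texture and color descriptors into one feature vector,
    normalizing each part so that neither modality dominates."""
    t = texture_vec / (np.linalg.norm(texture_vec) + 1e-12)
    c = color_vec / (np.linalg.norm(color_vec) + 1e-12)
    return np.concatenate([t, c])

# Placeholder descriptors standing in for the bandelet/Gabor texture
# energies and the YCbCr color features.
texture = np.ones(32)
color = np.ones(16)
fused = fuse_features(texture, color)
print(fused.shape)  # (48,)
```

The fused vector would then be the input on which the category-specific neural networks are trained.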
The work presented in this thesis has some limitations as well. We performed
content-based image retrieval for a fixed number of semantic categories. Therefore, we
admit that the methods described cannot handle datasets whose categories are
unsupervised, unbounded, or not covered by the training. Another limitation is
elaborated in Section 1.2 using Figures 1.1 and 1.2. If we try to match the contents
of Figure 1.2 against Figure 1.1, it is not possible to find the relevant information
even though Figure 1.2 contains the complete contents of Figure 1.1. The reason is
that the global image frequencies of the two images are different. In this thesis, we
used three datasets, namely Corel, Coil, and Caltech 101, to evaluate the image
retrieval performance.
6.2 Future Work
There is tremendous scope for research in the domain of CBIR. The major emphasis would
be on the development of a method able to overcome the limitations present in the
current work. Secondly, we can extend our work toward application development with
practical implications. Here, we would like to emphasize some of them.
6.2.1 Extensions of the Work
• In the current research work, the problem of automatic image retrieval is addressed;
in this regard, the emphasis was on the development of a robust way of image
representation and retrieval. The results of the techniques are comparatively higher
than those of the other state-of-the-art methods. Still, for many images, the system
will not be able to find the exact semantic associations. The major reason for this
improper output is the diversity of the content found in the images. Therefore,
situations like this need to be handled during image retrieval. The simplest solution
is the involvement of the user in the image retrieval process to guide the system
through feedback when it is not able to produce the desired output. Hence, in future
work, we will extend our method to address the problem of mis-associations through
relevance feedback.
• We can further extend the capabilities of the proposed image retrieval technique by
including evolutionary algorithms such as GA and PSO, treating image retrieval as an
optimization problem in which we seek to maximize the quality of the retrieval
results. Currently, we are working on the application of fuzzy genetic algorithms for
obtaining the semantic groups in the data.
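Treating retrieval quality as an objective to maximize, as suggested in the second point, could look roughly like the toy genetic-algorithm sketch below. The fitness function here is entirely synthetic (a stand-in for mean precision over validation queries), and the three weights are hypothetical feature-channel weights, not parameters from this thesis.

```python
import random

TARGET = [0.6, 0.3, 0.1]  # hypothetical "good" feature weighting

def fitness(weights):
    """Synthetic stand-in for retrieval quality; a real fitness would
    evaluate mean precision of the system under these feature weights."""
    return -sum((w - t) ** 2 for w, t in zip(weights, TARGET))

def evolve(pop_size=20, generations=60, seed=0):
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(3)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]          # elitist selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)              # crossover + mutation
            children.append([(x + y) / 2 + rng.gauss(0, 0.02)
                             for x, y in zip(a, b)])
        population = parents + children
    return max(population, key=fitness)

best = evolve()
print([round(w, 2) for w in best])
```

A PSO variant would follow the same outer loop, replacing crossover and mutation with velocity updates toward personal and global bests.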
6.2.2 Practical Implications of CBIR
We can extend the CBIR work presented in this dissertation toward application
development with impact on society. Here we would like to present a few examples.
6.2.3 Anomalous Events Monitoring in Surveillance Videos
The Government of Pakistan is planning to install over 1.5 million operational CCTV
cameras, virtually one camera for every 100th person, and this figure is increasing
rapidly throughout the world. Effectively monitoring hundreds or even thousands of
cameras at the same time is challenging, if not impossible. Research shows that humans
have an attention span of about twenty minutes, and even less when doing mundane
tasks. Security personnel can often become bored or distracted and have other duties
that keep them from efficiently monitoring cameras. Given the high stakes inherent in
security operations, a proactive approach to intelligent video analysis is vital.
Therefore, the basic theme of a video-analysis product would be complete
self-learning.
Security concerns are by no means the only factor driving the rapid growth of CCTV
cameras. Another essential reason is access to the hidden knowledge extracted from
CCTV footage for effective business decision making, such as customer service, store
design, reducing store shrinkage, and product marketing. The activities happening
within observed scenes are generally the most crucial semantic entities that can be
extracted from videos and recordings. Most of the work presented in the past on
finding patterns or recurring events deals with the discovery of normal events. In
contrast, the framework here can detect anomalous events that do not belong to the
recurring series of events, i.e., events that do not follow the repeated sequence.
This information can be very useful for detecting unknown abnormal events and could
provide early intelligence for the redistribution of resources to specific areas
under surveillance.
6.2.4 Traffic Management
We can apply the ideas of the content-based image retrieval research presented in this
thesis to effectively monitoring traffic. Cameras would be mounted on the highways,
and based on the ideas discussed above, we could automatically identify the places
where traffic jams occur. With such information, resources could be deployed to clear
the traffic jams. We could also stream this information over the Internet in real
time, so that people receive suggestions in advance and can choose an alternate path
to reach their destination. Hence, this research would be a great step towards the
concept of "Intelligent Cities".
6.2.5 CBIR using Hadoop
With information technology developing rapidly, the variety and quantity of image data
are increasing quickly. How to retrieve desired images from massive image sets
containing millions of images is an open problem. For these very large image sets,
where images are retrieved in a content-based way, we can speed up the image retrieval
process by utilizing the MapReduce distributed and parallel computing model. It is
important to mention that MapReduce is the abstract model, while Hadoop serves as the
tool for its implementation. Through Hadoop, we can design and implement a system that
overcomes the performance bottlenecks caused by computational complexity and the large
amount of data involved in constructing a CBIR system.
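As a rough local illustration (plain Python, not actual Hadoop code), the map and reduce stages of such a system might be shaped as follows: the map step scores each image against the query independently, so the repository can be sharded across nodes, while the reduce step merges the scores and keeps the closest matches. The image ids and feature vectors are placeholders.

```python
def map_stage(feature_records, query):
    """Map: emit (image_id, distance) for each (image_id, features) record.
    Each record is scored independently, so shards can run in parallel."""
    for image_id, features in feature_records:
        distance = sum((a - b) ** 2 for a, b in zip(features, query))
        yield image_id, distance

def reduce_stage(scored_records, top_k=10):
    """Reduce: merge the per-shard scores and keep the k closest images."""
    return sorted(scored_records, key=lambda pair: pair[1])[:top_k]

# Toy repository of precomputed feature vectors.
repository = [("img1", [0.1, 0.9]), ("img2", [0.8, 0.2]), ("img3", [0.15, 0.85])]
results = reduce_stage(map_stage(repository, query=[0.1, 0.9]), top_k=2)
print([image_id for image_id, _ in results])  # ['img1', 'img3']
```

In a real deployment these two functions would become the Mapper and Reducer of a Hadoop job reading feature records from a distributed file system.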
References
[1] G.-H. Liu, Z.-Y. Li, L. Zhang, and Y. Xu, “Image retrieval based on micro-structure
descriptor,” Pattern Recognition, vol. 44, no. 9, pp. 2123–2133, 2011.
[2] C.-H. Lin, R.-T. Chen, and Y.-K. Chan, “A smart content-based image retrieval
system based on color and texture feature,” Image and Vision Computing, vol. 27,
no. 6, pp. 658–665, 2009.
[3] F. Long, H. Zhang, and D. D. Feng, “Fundamentals of content-based image re-
trieval,” in Multimedia Information Retrieval and Management. Springer, 2003,
pp. 1–26.
[4] N. Gandal, “The dynamics of competition in the internet search engine market,”
International Journal of Industrial Organization, vol. 19, no. 7, pp. 1103–1117,
2001.
[5] E. S. McIntyre, “Search engine optimization,” 2015.
[6] R. Ashraf, K. Bashir, A. Irtaza, and M. T. Mahmood, “Content based image
retrieval using embedded neural networks with bandletized regions,” Entropy,
vol. 17, no. 6, pp. 3552–3580, 2015.
[7] V. N. Gudivada and V. V. Raghavan, “Content based image retrieval systems,”
Computer, vol. 28, no. 9, pp. 18–22, 1995.
[8] A. W. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain, “Content-based
image retrieval at the end of the early years,” Pattern Analysis and Machine Intel-
ligence, IEEE Transactions on, vol. 22, no. 12, pp. 1349–1380, 2000.
[9] R. Datta, J. Li, and J. Z. Wang, “Content-based image retrieval: approaches and
trends of the new age,” in Proceedings of the 7th ACM SIGMM international work-
shop on Multimedia information retrieval. ACM, 2005, pp. 253–262.
[10] A. Singla and M. Garg, “Qbic, mars and viper: A review on content based image
retrieval techniques.”
[11] M. A. Stricker and M. Orengo, “Similarity of color images,” in IS&T/SPIE’s Sym-
posium on Electronic Imaging: Science & Technology. International Society for
Optics and Photonics, 1995, pp. 381–392.
[12] W. Forstner, “A framework for low level feature extraction,” in Computer Vision -
ECCV’94. Springer, 1994, pp. 383–394.
[13] I. K. Sethi, I. L. Coman, and D. Stan, “Mining association rules between low-level
image features and high-level concepts,” in Aerospace/Defense Sensing, Simula-
tion, and Controls. International Society for Optics and Photonics, 2001, pp.
279–290.
[14] M. Rehman, M. Iqbal, M. Sharif, and M. Raza, “Content based image retrieval:
survey,” World Applied Sciences Journal, vol. 19, no. 3, pp. 404–412, 2012.
[15] D. G. Thakore and A. Trivedi, “Content based image retrieval techniques–issues,
analysis and the state of the art,” BVM Engineering College, Gujarat, 2010.
[16] C. Carson, S. Belongie, H. Greenspan, and J. Malik, “Blobworld: Image segmen-
tation using expectation-maximization and its application to image querying,” Pat-
tern Analysis and Machine Intelligence, IEEE Transactions on, vol. 24, no. 8, pp.
1026–1038, 2002.
[17] W.-Y. Ma and B. S. Manjunath, “Netra: A toolbox for navigating large image
databases,” Multimedia systems, vol. 7, no. 3, pp. 184–198, 1999.
[18] O. Marques, L. M. Mayron, G. B. Borba, and H. R. Gamba, “An attention-driven
model for grouping similar images with image retrieval applications,” EURASIP
Journal on Applied Signal Processing, vol. 2007, no. 1, pp. 116–116, 2007.
[19] L. M. Mayron, “Image retrieval using visual attention,” Ph.D. dissertation, Boca
Raton, Florida, 2008.
[20] S. Mavandadi, P. Aarabi, A. Khaleghi, and R. Appel, “Predictive dynamic user
interfaces for interactive visual search,” in Multimedia and Expo, 2006 IEEE Inter-
national Conference on. IEEE, 2006, pp. 381–384.
[21] V. Mezaris, I. Kompatsiaris, and M. G. Strintzis, “An ontology approach to object-
based image retrieval,” in Image Processing, 2003. ICIP 2003. Proceedings. 2003
International Conference on, vol. 2. IEEE, 2003, pp. II–511.
[22] Y. Liu, D. Zhang, G. Lu, and W.-Y. Ma, “A survey of content-based image retrieval
with high-level semantics,” Pattern Recognition, vol. 40, no. 1, pp. 262–282, 2007.
[23] K. Madhu and R. Minu, “Image segmentation using improved jseg,” in Pattern
Recognition, Informatics and Mobile Engineering (PRIME), 2013 International
Conference on. IEEE, 2013, pp. 37–42.
[24] Y. Deng and B. Manjunath, “Unsupervised segmentation of color-texture regions in
images and video,” Pattern Analysis and Machine Intelligence, IEEE Transactions
on, vol. 23, no. 8, pp. 800–810, 2001.
[25] C. Gao, X. Zhang, and H. Wang, “A combined method for multi-class image se-
mantic segmentation,” Consumer Electronics, IEEE Transactions on, vol. 58, no. 2,
pp. 596–604, 2012.
[26] E. Borenstein and S. Ullman, “Combined top-down/bottom-up segmentation,” Pat-
tern Analysis and Machine Intelligence, IEEE Transactions on, vol. 30, no. 12, pp.
2109–2125, 2008.
[27] Y. Rui, T. S. Huang, and S.-F. Chang, “Image retrieval: Current techniques, promis-
ing directions, and open issues,” Journal of visual communication and image rep-
resentation, vol. 10, no. 1, pp. 39–62, 1999.
[28] B. S. Manjunath, J.-R. Ohm, V. V. Vasudevan, and A. Yamada, “Color and texture
descriptors,” Circuits and Systems for Video Technology, IEEE Transactions on,
vol. 11, no. 6, pp. 703–715, 2001.
[29] M. Lamard, G. Cazuguel, G. Quellec, L. Bekri, C. Roux, and B. Cochener, “Con-
tent based image retrieval based on wavelet transform coefficients distribution,”
in Engineering in Medicine and Biology Society, 2007. EMBS 2007. 29th Annual
International Conference of the IEEE. IEEE, 2007, pp. 4532–4535.
[30] I. J. Sumana, G. Lu, and D. Zhang, “Comparison of curvelet and wavelet texture
features for content based image retrieval,” in Multimedia and Expo (ICME), 2012
IEEE International Conference on. IEEE, 2012, pp. 290–295.
[31] J. R. Smith and S.-F. Chang, “Transform features for texture classification and
discrimination in large image databases,” in Image Processing, 1994. Proceedings.
ICIP-94., IEEE International Conference, vol. 3. IEEE, 1994, pp. 407–411.
[32] D. Zhang, A. Wong, M. Indrawan, and G. Lu, “Content-based image retrieval using
gabor texture features,” in IEEE Pacific-Rim Conference on Multimedia, University
of Sydney, Australia, 2000.
[33] R. M. Haralick, “Statistical and structural approaches to texture,” Proceedings of
the IEEE, vol. 67, no. 5, pp. 786–804, 1979.
[34] A. Khokher and R. Talwar, “Content-based image retrieval: Feature extraction
techniques and applications,” in Conference proceedings, 2012.
[35] H. Tamura, S. Mori, and T. Yamawaki, “Textural features corresponding to visual
perception,” Systems, Man and Cybernetics, IEEE Transactions on, vol. 8, no. 6,
pp. 460–473, 1978.
[36] C. C. Gotlieb and H. E. Kreyszig, “Texture descriptors based on co-occurrence
matrices,” Computer Vision, Graphics, and Image Processing, vol. 51, no. 1, pp.
70–86, 1990.
[37] P. D. Welch, “The use of fast fourier transform for the estimation of power spec-
tra: A method based on time averaging over short, modified periodograms,” IEEE
Transactions on audio and electroacoustics, vol. 15, no. 2, pp. 70–73, 1967.
[38] R. Milanese and M. Cherbuliez, “A rotation, translation, and scale-invariant ap-
proach to content-based image retrieval,” Journal of visual communication and
image representation, vol. 10, no. 2, pp. 186–196, 1999.
[39] B. S. Manjunath and W.-Y. Ma, “Texture features for browsing and retrieval of
image data,” Pattern Analysis and Machine Intelligence, IEEE Transactions on,
vol. 18, no. 8, pp. 837–842, 1996.
[40] J. Mao and A. K. Jain, “Texture classification and segmentation using multiresolu-
tion simultaneous autoregressive models,” Pattern recognition, vol. 25, no. 2, pp.
173–188, 1992.
[41] R. Datta, D. Joshi, J. Li, and J. Z. Wang, “Image retrieval: Ideas, influences, and
trends of the new age,” ACM Computing Surveys (CSUR), vol. 40, no. 2, p. 5, 2008.
[42] R. S. Choras, “Image feature extraction techniques and their applications for cbir
and biometrics systems,” International journal of biology and biomedical engi-
neering, vol. 1, no. 1, pp. 6–16, 2007.
[43] M. Madugunki, D. Bormane, S. Bhadoria, and C. Dethe, “Comparison of different
cbir techniques,” in Electronics Computer Technology (ICECT), 2011 3rd Interna-
tional Conference on, vol. 4. IEEE, 2011, pp. 372–375.
[44] T. Deselaers, D. Keysers, and H. Ney, “Features for image retrieval: an experimen-
tal comparison,” Information Retrieval, vol. 11, no. 2, pp. 77–107, 2008.
[45] H. Jalab et al., “Image retrieval system based on color layout descriptor and gabor
filters,” in Open Systems (ICOS), 2011 IEEE Conference on. IEEE, 2011, pp.
32–36.
[46] R. Ashraf, “A novel approach for the gender classification through trained neural
networks,” Journal of Basic and Applied Scientific Research, 2014.
[47] M. Flickner, H. Sawhney, W. Niblack, J. Ashley, Q. Huang, B. Dom, M. Gorkani,
J. Hafner, D. Lee, D. Petkovic et al., “Query by image and video content: The qbic
system,” Computer, vol. 28, no. 9, pp. 23–32, 1995.
[48] P. Vacha and M. Haindl, “Image retrieval measures based on illumination invariant
textural mrf features,” in Proceedings of the 6th ACM international conference on
Image and video retrieval. ACM, 2007, pp. 448–454.
[49] P. Vacha, M. Haindl, and T. Suk, “Colour and rotation invariant textural features
based on markov random fields,” Pattern Recognition Letters, vol. 32, no. 6, pp.
771–779, 2011.
[50] D. Zhang, M. M. Islam, G. Lu, and I. J. Sumana, “Rotation invariant curvelet
features for region based image retrieval,” International journal of computer vision,
vol. 98, no. 2, pp. 187–201, 2012.
[51] P. P. Ohanian and R. C. Dubes, “Performance evaluation for four classes of textural
features,” Pattern recognition, vol. 25, no. 8, pp. 819–833, 1992.
[52] P. Howarth and S. Ruger, “Evaluation of texture features for content-based image
retrieval,” in Image and Video Retrieval. Springer, 2004, pp. 326–334.
[53] A. Amato and V. D. Lecce, “Edge detection techniques in image retrieval: the
semantic meaning of edge,” in Video/Image Processing and Multimedia Commu-
nications, 2003. 4th EURASIP Conference focused on, vol. 1. IEEE, 2003, pp.
143–148.
[54] S. Nandagopalan, D. B. Adiga, and N. Deepak, “A universal model for content-
based image retrieval,” JIP, vol. 1, p. 5, 2008.
[55] L. M. Kaplan, R. Murenzi, and K. R. Namuduri, “Fast texture database retrieval
using extended fractal features,” in Photonics West’98 Electronic Imaging. Inter-
national Society for Optics and Photonics, 1997, pp. 162–173.
[56] L. M. Kaplan, “Extended fractal analysis for texture classification and segmenta-
tion,” Image Processing, IEEE Transactions on, vol. 8, no. 11, pp. 1572–1585,
1999.
[57] M. O. M. Dyla and H. Tairi, “Texture-based image retrieval based on FABEMD,”
IJCSI, 2011.
[58] S. G. Mallat, “A theory for multiresolution signal decomposition: the wavelet rep-
resentation,” Pattern Analysis and Machine Intelligence, IEEE Transactions on,
vol. 11, no. 7, pp. 674–693, 1989.
[59] Y.-L. Huang, “A fast method for textural analysis of DCT-based image,” Journal
of Information Science and Engineering, 2005.
[60] J. G. Daugman, “Complete discrete 2-D Gabor transforms by neural networks for
image analysis and compression,” Acoustics, Speech and Signal Processing, IEEE
Transactions on, vol. 36, no. 7, pp. 1169–1179, 1988.
[61] E. Candes, L. Demanet, D. Donoho, and L. Ying, “Fast discrete curvelet trans-
forms,” Multiscale Modeling & Simulation, vol. 5, no. 3, pp. 861–899, 2006.
[62] I. J. Sumana, M. M. Islam, D. Zhang, and G. Lu, “Content based image retrieval
using curvelet transform,” in Multimedia Signal Processing, 2008 IEEE 10th Work-
shop on. IEEE, 2008, pp. 11–16.
[63] S. Selvarajah and S. Kodituwakku, “Analysis and comparison of texture features
for content based image retrieval,” International Journal of Latest Trends in Com-
puting, vol. 2, no. 1, 2011.
[64] K. Kosnar, V. Vonasek, M. Kulich, and L. Preucil, “Comparison of shape match-
ing techniques for place recognition,” in Mobile Robots (ECMR), 2013 European
Conference on. IEEE, 2013, pp. 107–112.
[65] Y. Li, X. Chen, X. Fu, and S. Belkasim, “Multi-level discrete cosine transform for
content-based image retrieval by support vector machines,” in Image Processing,
2007. ICIP 2007. IEEE International Conference on, vol. 6. IEEE, 2007, pp.
VI–21.
[66] C.-W. Ngo, T.-C. Pong, and R. T. Chin, “Exploiting image indexing techniques in
DCT domain,” Pattern Recognition, vol. 34, no. 9, pp. 1841–1851, 2001.
[67] M. Banerjee and M. K. Kundu, “Content-based image retrieval using wavelet pack-
ets and fuzzy spatial relations,” in Computer Vision, Graphics and Image Process-
ing. Springer, 2006, pp. 861–871.
[68] R. Porter and N. Canagarajah, “Robust rotation-invariant texture classification:
wavelet, Gabor filter and GMRF based schemes,” in Vision, Image and Signal Pro-
cessing, IEE Proceedings-, vol. 144, no. 3. IET, 1997, pp. 180–188.
[69] W.-Y. Ma and B. S. Manjunath, “Texture features and learning similarity,” in Com-
puter Vision and Pattern Recognition, 1996. Proceedings CVPR’96, 1996 IEEE
Computer Society Conference on. IEEE, 1996, pp. 425–430.
[70] P. Sarkar, C. Chakraborty, and M. Ghosh, “Content-based leukocyte image retrieval
ensembling quaternion Fourier transform and Gabor-wavelet features,” in Intelligent
Systems Design and Applications (ISDA), 2012 12th International Conference on.
IEEE, 2012, pp. 345–350.
[71] S. Selvarajah and S. Kodithuwakku, “Combined feature descriptor for content
based image retrieval,” in Industrial and Information Systems (ICIIS), 2011 6th
IEEE International Conference on. IEEE, 2011, pp. 164–168.
[72] Y.-H. Lee, S.-B. Rhee, and B. Kim, “Content-based image retrieval using wavelet
spatial-color and Gabor normalized texture in multi-resolution database,” in Inno-
vative Mobile and Internet Services in Ubiquitous Computing (IMIS), 2012 Sixth
International Conference on. IEEE, 2012, pp. 371–377.
[73] Z. Xue, S. Antani, L. R. Long, J. Jeronimo, and G. R. Thoma, “Investigating CBIR
techniques for cervicographic images,” in AMIA Annual Symposium Proceedings,
vol. 2007. American Medical Informatics Association, 2007, p. 826.
[74] E. J. Candes, D. L. Donoho et al., Curvelets: A surprisingly effective nonadaptive
representation for objects with edges. DTIC Document, 1999.
[75] S. Arivazhagan, L. Ganesan, and S. T. Kumar, “Texture classification using curvelet
statistical and co-occurrence features,” in Pattern Recognition, 2006. ICPR 2006.
18th International Conference on, vol. 2. IEEE, 2006, pp. 938–941.
[76] J.-L. Starck, M. K. Nguyen, and F. Murtagh, “Wavelets and curvelets for image de-
convolution: a combined approach,” Signal Processing, vol. 83, no. 10, pp. 2279–
2283, 2003.
[77] J.-L. Starck, F. Murtagh, E. J. Candes, and D. L. Donoho, “Gray and color image
contrast enhancement by the curvelet transform,” Image Processing, IEEE Trans-
actions on, vol. 12, no. 6, pp. 706–717, 2003.
[78] J.-L. Starck, E. J. Candes, and D. L. Donoho, “The curvelet transform for image
denoising,” Image Processing, IEEE Transactions on, vol. 11, no. 6, pp. 670–684,
2002.
[79] J. Fadili and J.-L. Starck, “Curvelets and ridgelets,” in Computational Complexity.
Springer, 2012, pp. 754–773.
[80] G. Joutel, V. Eglin, S. Bres, and H. Emptoz, “Curvelets based feature extraction
of handwritten shapes for ancient manuscripts classification,” in Electronic Imag-
ing 2007. International Society for Optics and Photonics, 2007, pp. 65 000D–
65 000D.
[81] A. Irtaza, M. A. Jaffar, and E. Aleisa, “Correlated networks for content based im-
age retrieval,” International Journal of Computational Intelligence Systems, vol. 6,
no. 6, pp. 1189–1205, 2013.
[82] A. N. Fierro-Radilla, M. Nakano-Miyatake, H. Perez-Meana, M. Cedillo-
Hernandez, and F. Garcia-Ugalde, “An efficient color descriptor based on global
and local color features for image retrieval,” in Electrical Engineering, Computing
Science and Automatic Control (CCE), 2013 10th International Conference on.
IEEE, 2013, pp. 233–238.
[83] G. Pass, R. Zabih, and J. Miller, “Comparing images using color coherence vec-
tors,” in Proceedings of the fourth ACM international conference on Multimedia.
ACM, 1997, pp. 65–73.
[84] R. O. Stehling, M. A. Nascimento, and A. X. Falcao, “A compact and efficient im-
age retrieval approach based on border/interior pixel classification,” in Proceedings
of the eleventh international conference on Information and knowledge manage-
ment. ACM, 2002, pp. 102–109.
[85] M. E. ElAlami, “A novel image retrieval model based on the most relevant fea-
tures,” Knowledge-Based Systems, vol. 24, no. 1, pp. 23–32, 2011.
[86] A. Al-Hamami and H. Al-Rashdan, “Improving the effectiveness of the color co-
herence vector.” Int. Arab J. Inf. Technol., vol. 7, no. 3, pp. 324–332, 2010.
[87] A. Irtaza and M. A. Jaffar, “Categorical image retrieval through genetically opti-
mized support vector machines (GOSVM) and hybrid texture features,” Signal, Image
and Video Processing, pp. 1–17, 2014.
[88] S. Sural, G. Qian, and S. Pramanik, “Segmentation and histogram generation using
the HSV color space for image retrieval,” in Image Processing. 2002. Proceedings.
2002 International Conference on, vol. 2. IEEE, 2002, pp. II–589.
[89] S. M. Youssef, “ICTEDCT-CBIR: Integrating curvelet transform with enhanced dom-
inant colors extraction and texture analysis for efficient content-based image re-
trieval,” Computers & Electrical Engineering, vol. 38, no. 5, pp. 1358–1376, 2012.
[90] R. N. Sowmya Rani and S. Reddy, “Comparative study on content based image
retrieval,” International Journal of Future Computer and Communication, vol. 1,
no. 4, pp. 366–368, 2012.
[91] N.-C. Yang, W.-H. Chang, C.-M. Kuo, and T.-H. Li, “A fast MPEG-7 dominant
color extraction with new similarity measure for image retrieval,” Journal of Visual
Communication and Image Representation, vol. 19, no. 2, pp. 92–105, 2008.
[92] D. Zhang and G. Lu, “Review of shape representation and description techniques,”
Pattern Recognition, vol. 37, no. 1, pp. 1–19, 2004.
[93] E. Chang, K. Goh, G. Sychay, and G. Wu, “CBSA: Content-based soft annotation
for multimodal image retrieval using Bayes point machines,” Circuits and Systems
for Video Technology, IEEE Transactions on, vol. 13, no. 1, pp. 26–38, 2003.
[94] J. Iivarinen, M. Peura, J. Sarela, and A. Visa, “Comparison of combined shape
descriptors for irregular objects.” in BMVC. Citeseer, 1997.
[95] A. El-ghazal, O. Basir, and S. Belkasim, “Farthest point distance: A new shape sig-
nature for Fourier descriptors,” Signal Processing: Image Communication, vol. 24,
no. 7, pp. 572–586, 2009.
[96] D. Zhang and G. Lu, “Content-based shape retrieval using different shape descrip-
tors: A comparative study,” in IEEE International Conference on Multimedia and
Expo (ICME), 2001, p. 289.
[97] J. Wang, H. Zha, and R. Cipolla, “Combining interest points and edges for content-
based image retrieval,” in Image Processing, 2005. ICIP 2005. IEEE International
Conference on, vol. 3. IEEE, 2005, pp. III–1256.
[98] K. Mikolajczyk and C. Schmid, “Indexing based on scale invariant interest points,”
in Computer Vision, 2001. ICCV 2001. Proceedings. Eighth IEEE International
Conference on, vol. 1. IEEE, 2001, pp. 525–531.
[99] M. Saad, H. Saleh, H. Konber, and M. Ashour, “CBIR system based on integration
between SURF and global features,” 2013.
[100] K. Velmurugan and L. D. S. S. Baboo, “Content-based image retrieval using
SURF and colour moments,” Global Journal of Computer Science and Technology,
vol. 11, no. 10, 2011.
[101] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” Interna-
tional journal of computer vision, vol. 60, no. 2, pp. 91–110, 2004.
[102] S. A. Bakar, M. S. Hitam, W. Yussof, and W. N. J. Hj, “Content-based image re-
trieval using SIFT for binary and greyscale images,” in Signal and Image Processing
Applications (ICSIPA), 2013 IEEE International Conference on. IEEE, 2013, pp.
83–88.
[103] J. Zhang, “Robust content-based image retrieval of multi-example queries,” 2011.
[104] G. Qian, S. Sural, Y. Gu, and S. Pramanik, “Similarity between Euclidean and
cosine angle distance for nearest neighbor queries,” in Proceedings of the 2004
ACM symposium on Applied computing. ACM, 2004, pp. 1232–1237.
[105] B. Thomee et al., “A picture is worth a thousand words: content-based image re-
trieval techniques,” Ph.D. dissertation, Leiden Institute of Advanced Computer Sci-
ence (LIACS), Faculty of Science, Leiden University, 2010.
[106] Y. Rui, T. S. Huang, M. Ortega, and S. Mehrotra, “Relevance feedback: a power
tool for interactive content-based image retrieval,” Circuits and Systems for Video
Technology, IEEE Transactions on, vol. 8, no. 5, pp. 644–655, 1998.
[107] J. Z. Wang, J. Li, and G. Wiederhold, “SIMPLIcity: Semantics-sensitive integrated
matching for picture libraries,” Pattern Analysis and Machine Intelligence, IEEE
Transactions on, vol. 23, no. 9, pp. 947–963, 2001.
[108] G.-H. Liu and J.-Y. Yang, “Content-based image retrieval using color difference
histogram,” Pattern Recognition, vol. 46, no. 1, pp. 188–198, 2013.
[109] M. R. Hejazi and Y.-S. Ho, “An efficient approach to texture-based image retrieval,”
International Journal of Imaging Systems and Technology, vol. 17, no. 5, pp. 295–
302, 2007.
[110] B. Prasad, K. K. Biswas, and S. Gupta, “Region-based image retrieval using inte-
grated color, shape, and location index,” Computer Vision and Image Understanding,
vol. 94, no. 1, pp. 193–233, 2004.
[111] D. Zhang, G. Lu et al., “A comparative study on shape retrieval using Fourier de-
scriptors with different shape signatures,” in Proc. of international conference on
intelligent multimedia and distance education (ICIMADE01), 2001, pp. 1–9.
[112] M. Yang, K. Kpalma, J. Ronsin et al., “A survey of shape feature extraction tech-
niques,” Pattern Recognition, pp. 43–90, 2008.
[113] A. Barley and C. Town, “Combinations of feature descriptors for texture image
classification,” Journal of Data Analysis and Information Processing, vol. 2014,
2014.
[114] P. J. Besl and N. D. McKay, “Method for registration of 3-D shapes,” in Robotics-
DL tentative. International Society for Optics and Photonics, 1992, pp. 586–606.
[115] M. B. Rao, B. P. Rao, and A. Govardhan, “CTDCIRS: Content-based image retrieval
system based on dominant color and texture features,” International Journal of
Computer Applications, vol. 18, no. 6, pp. 40–46, 2011.
[116] Y. Liu, D. Zhang, G. Lu, and W.-Y. Ma, “Region-based image retrieval with per-
ceptual colors,” in Advances in Multimedia Information Processing-PCM 2004.
Springer, 2005, pp. 931–938.
[117] N. Jhanwar, S. Chaudhuri, G. Seetharaman, and B. Zavidovique, “Content based
image retrieval using motif cooccurrence matrix,” Image and Vision Computing,
vol. 22, no. 14, pp. 1211–1220, 2004.
[118] Y. Chen, J. Z. Wang, and R. Krovetz, “CLUE: Cluster-based retrieval of images by
unsupervised learning,” Image Processing, IEEE Transactions on, vol. 14, no. 8,
pp. 1187–1201, 2005.
[119] T. Ojala, M. Pietikainen, and D. Harwood, “A comparative study of texture mea-
sures with classification based on featured distributions,” Pattern Recognition,
vol. 29, no. 1, pp. 51–59, 1996.
[120] T. Maenpaa, “The local binary pattern approach to texture analysis,” 2003.
[121] X. Yuan, J. Yu, Z. Qin, and T. Wan, “A SIFT-LBP image retrieval model based on bag
of features,” in IEEE International Conference on Image Processing, 2011.
[122] J. Yue, Z. Li, L. Liu, and Z. Fu, “Content-based image retrieval using color and
texture fused features,” Mathematical and Computer Modelling, vol. 54, no. 3, pp.
1121–1127, 2011.
[123] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,”
in Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer
Society Conference on, vol. 1. IEEE, 2005, pp. 886–893.
[124] S. Mallat and G. Peyre, “A review of bandlet methods for geometrical image rep-
resentation,” Numerical Algorithms, vol. 44, no. 3, pp. 205–234, 2007.
[125] X. Qu, J. Yan, G. Xie, Z. Zhu, and B. Chen, “A novel image fusion algorithm based
on bandelet transform,” Chinese Optics Letters, vol. 5, no. 10, pp. 569–572, 2007.
[126] R. Arandjelovic and A. Zisserman, “Three things everyone should know to improve
object retrieval,” in Computer Vision and Pattern Recognition (CVPR), 2012 IEEE
Conference on. IEEE, 2012, pp. 2911–2918.
[127] W. Jiang, G. Er, Q. Dai, and J. Gu, “Similarity-based online feature selection in
content-based image retrieval,” Image Processing, IEEE Transactions on, vol. 15,
no. 3, pp. 702–712, 2006.
[128] B. Jyothi and U. Shanker, “Neural network approach for image retrieval based on
preference elicitation,” International Journal on Computer Science and Engineer-
ing, vol. 2, no. 4, pp. 934–941, 2010.
[129] P. D. Heermann and N. Khazenie, “Classification of multispectral remote sensing
data using a back-propagation neural network,” Geoscience and Remote Sensing,
IEEE Transactions on, vol. 30, no. 1, pp. 81–88, 1992.
[130] G. Hepner, T. Logan, N. Ritter, and N. Bryant, “Artificial neural network classifica-
tion using a minimal training set: Comparison to conventional supervised classifi-
cation,” Photogrammetric Engineering and Remote Sensing, vol. 56, pp. 469–473,
1990.
[131] V. Sharmanska, N. Quadrianto, and C. H. Lampert, “Augmented attribute represen-
tations,” in Computer Vision–ECCV 2012. Springer, 2012, pp. 242–255.
[132] E. Yildizer, A. M. Balci, M. Hassan, and R. Alhajj, “Efficient content-based image
retrieval using multiple support vector machines ensemble,” Expert Systems with
Applications, vol. 39, no. 3, pp. 2385–2396, 2012.
[133] M. R. Azimi-Sadjadi, J. Salazar, and S. Srinivasan, “An adaptable image retrieval
system with relevance feedback using kernel machines and selective sampling,”
Image Processing, IEEE Transactions on, vol. 18, no. 7, pp. 1645–1659, 2009.
[134] D. Tao, X. Tang, X. Li, and X. Wu, “Asymmetric bagging and random subspace
for support vector machines-based relevance feedback in image retrieval,” Pattern
Analysis and Machine Intelligence, IEEE Transactions on, vol. 28, no. 7, pp. 1088–
1099, 2006.
[135] E. J. Candes and D. L. Donoho, “New tight frames of curvelets and optimal rep-
resentations of objects with piecewise C² singularities,” Communications on Pure
and Applied Mathematics, vol. 57, no. 2, pp. 219–266, 2004.
[136] E. Le Pennec and S. Mallat, “Bandelet image approximation and compression,”
Multiscale Modeling & Simulation, vol. 4, no. 3, pp. 992–1039, 2005.
[137] K. J. Dana, B. Van Ginneken, S. K. Nayar, and J. J. Koenderink, “Reflectance and
texture of real-world surfaces,” ACM Transactions on Graphics (TOG), vol. 18,
no. 1, pp. 1–34, 1999.
[138] P. Alliez and C. Gotsman, “Recent advances in compression of 3D meshes,” in
Advances in multiresolution for geometric modelling. Springer, 2005, pp. 3–26.
[139] A. Khodakovsky, N. Litke, and P. Schroder, “Globally smooth parameterizations
with low distortion,” in ACM Transactions on Graphics (TOG), vol. 22, no. 3.
ACM, 2003, pp. 350–357.
[140] M. N. Do and M. Vetterli, “The contourlet transform: an efficient directional
multiresolution image representation,” Image Processing, IEEE Transactions on,
vol. 14, no. 12, pp. 2091–2106, 2005.
[141] M. F. Duarte, S. Sarvotham, D. Baron, M. B. Wakin, and R. G. Baraniuk, “Dis-
tributed compressed sensing of jointly sparse signals,” in Asilomar Conf. Signals,
Sys., Comput, 2005, pp. 1537–1541.
[142] A. Cohen and B. Matei, “Nonlinear subdivision schemes: applications to image
processing,” in Tutorials on Multiresolution in Geometric Modelling. Springer,
2002, pp. 93–97.
[143] E. Le Pennec and S. Mallat, “Sparse geometric image representations with ban-
delets,” Image Processing, IEEE Transactions on, vol. 14, no. 4, pp. 423–438,
2005.
[144] D. L. Donoho, I. M. Johnstone et al., “Ideal denoising in an orthonormal basis
chosen from a library of bases,” Comptes Rendus de l’Academie des Sciences-Serie
I-Mathematique, vol. 319, no. 12, pp. 1317–1322, 1994.
[145] S. G. Sajidaparveen and B. Chandramohan, “Medical image retrieval using bandelet,”
Int. J. Sci. Eng. Technol., vol. 2, pp. 1103–1115, 2014.
[146] G. Peyre and S. Mallat, “Surface compression with geometric bandelets,” in ACM
Transactions on Graphics (TOG), vol. 24, no. 3. ACM, 2005, pp. 601–608.
[147] F. A. Alomar, G. Muhammad, H. Aboalsamh, M. Hussain, A. M. Mirza, and G. Be-
bis, “Gender recognition from faces using bandlet and local binary patterns,” in
Systems, Signals and Image Processing (IWSSIP), 2013 20th International Confer-
ence on. IEEE, 2013, pp. 59–62.
[148] N. Chitaliya and A. Trivedi, “Comparative analysis using fast discrete curvelet
transform via wrapping and discrete contourlet transform for feature extraction
and recognition,” in Intelligent Systems and Signal Processing (ISSP), 2013 In-
ternational Conference on. IEEE, 2013, pp. 154–159.
[149] G. Peyre and S. Mallat, “Orthogonal bandelet bases for geometric images approx-
imation,” Communications on Pure and Applied Mathematics, vol. 61, no. 9, pp.
1173–1212, 2008.
[150] M. Weber, P. Crilly, and W. E. Blass, “Adaptive noise filtering using an error-
backpropagation neural network,” Instrumentation and Measurement, IEEE Trans-
actions on, vol. 40, no. 5, pp. 820–825, 1991.
[151] T. Andrysiak and M. Choras, “Image retrieval based on hierarchical Gabor filters,”
International Journal of Applied Mathematics and Computer Science, vol. 15, pp.
471–480, 2005.
[152] M. Lam, T. Disney, M. Pham, D. Raicu, J. Furst, and R. Susomboon, “Content-
based image retrieval for pulmonary computed tomography nodule images,” in
Medical Imaging. International Society for Optics and Photonics, 2007, pp.
65 160N–65 160N.
[153] K. N. Plataniotis and A. N. Venetsanopoulos, Color image processing and appli-
cations. Springer, 2000.
[154] T. Acharya and A. K. Ray, Image processing: principles and applications. John
Wiley & Sons, 2005.
[155] M. M. Rahman, P. Bhattacharya, and B. C. Desai, “Probabilistic similarity mea-
sures in image databases with svm based categorization and relevance feedback,”
pp. 601–608, 2005.
[156] Y. Chen, J. Z. Wang, and R. Krovetz, “CLUE: Cluster-based retrieval of images by
unsupervised learning,” Image Processing, IEEE Transactions on, vol. 14, no. 8,
pp. 1187–1201, 2005.
[157] D. Tao, X. Tang, X. Li, and X. Wu, “Asymmetric bagging and random subspace
for support vector machines-based relevance feedback in image retrieval,” Pattern
Analysis and Machine Intelligence, IEEE Transactions on, vol. 28, no. 7, pp. 1088–
1099, 2006.
[158] C. Campbell, “Algorithmic approaches to training support vector machines: a sur-
vey.” in ESANN, 2000, pp. 27–36.
[159] P.-W. Huang and S. Dai, “Image retrieval by texture similarity,” Pattern Recogni-
tion, vol. 36, no. 3, pp. 665–679, 2003.
[160] E. Yildizer, A. M. Balci, M. Hassan, and R. Alhajj, “Efficient content-based image
retrieval using multiple support vector machines ensemble,” Expert Systems with
Applications, vol. 39, no. 3, pp. 2385–2396, 2012.
[161] J. Yu, Z. Qin, T. Wan, and X. Zhang, “Feature integration analysis of bag-of-
features model for image retrieval,” Neurocomputing, vol. 120, pp. 355–364, 2013.
[162] X. Tian, L. Jiao, X. Liu, and X. Zhang, “Feature integration of EODH and Color-
SIFT: Application to image retrieval based on codebook,” Signal Processing: Image
Communication, vol. 29, no. 4, pp. 530–545, 2014.