Computationally Intelligent Retrieval of Images Based on the Actual Image Contents
Author: REHAN ASHRAF
11F-UET/PhD-CP-29

Supervisor: Dr. Khalid Bashir Bajwa

Department of Computer Engineering
University of Engineering and Technology
Taxila, Pakistan
(2016)
Computationally Intelligent Retrieval of Images Based on the Actual Image Contents
REHAN ASHRAF
11F-UET/PhD-CP-29
A thesis submitted in partial fulfilment of the requirements for the degree of
Doctor of Philosophy
Thesis Supervisor: Dr. Khalid Bashir Bajwa

Department of Computer Engineering
University of Engineering and Technology Taxila,
Pakistan
(2016)
Computationally Intelligent Retrieval of Images Based on the Actual Image Contents
A dissertation submitted in partial fulfilment of the requirements for the degree of Doctor
of Philosophy in Computer Engineering by:
11F-UET/PhD-CP-29

Checked and recommended by:
Members of Research Monitoring Committee
Dr. Hafiz Adnan Habib Dr. Zulfiqar Hasan Khan Dr. Tabassam Nawaz
Foreign Experts
Dr. Jonathan Kok-Keong Loo, Middlesex University London, UK
Dr. Rupert Young, University of Sussex, UK
Approved by:
Dr. Khalid Iqbal (External Examiner-1)
Dr. Umair Abdullah (External Examiner-2)

Dr. Khalid Bashir Bajwa (Supervisor/Internal Examiner)
Author’s Declaration
I, Rehan Ashraf, affirm that my PhD thesis titled Computationally Intelligent Retrieval of Images Based on the Actual Image Contents contains no material which has been accepted for the award of any other degree or diploma in any university or other institution, and confirm that, to the best of my knowledge, the thesis contains no material previously published or written by another person, except where due reference is made in the text of the thesis.
Rehan Ashraf
Dated:
Plagiarism Undertaking
I take full responsibility for the research work conducted during the PhD thesis titled Computationally Intelligent Retrieval of Images Based on the Actual Image Contents. I solemnly declare that the research work presented in the thesis was done solely by me, with no significant help from any other person; small help, wherever taken, is duly acknowledged. I have also written the complete thesis myself. Moreover, I have not previously presented this thesis (or substantially similar research work), or any part of it, to any other degree-awarding institution within Pakistan or abroad.
Therefore, I, as the author of the above-mentioned thesis, solemnly declare that no portion of my thesis has been plagiarized and that any material used in the thesis from other sources is properly referenced. Moreover, the thesis does not contain any literal citation of more than 70 words (total), even with a reference, unless I have the written permission of the publisher to do so. Furthermore, the work presented in the thesis is my own original work, and I have duly cited the related work of other researchers, clearly differentiating my work from theirs.
I further understand that if I am found guilty of any form of plagiarism in my thesis
work even after my graduation, the University reserves the right to revoke my PhD degree.
Moreover, the University will also have the right to publish my name on its website that
keeps a record of the students who plagiarized in their thesis work.
Rehan Ashraf
I dedicate this thesis to my loving parents, elder brothers, sister, my wife and kids.
Acknowledgements
Primarily, I thank Almighty Allah for giving me the strength and ability to
pursue this work to its conclusion. During this research, I have worked with
many people who have contributed in a variety of ways to my research. It is
a pleasure to convey my gratitude to all of them in my humble acknowledgment. In my opinion, writing a thesis is a difficult task without any guidance.
I would like to express my gratitude to all those who gave me the possibility
to complete this thesis. First of all, I would like to thank Dr. Khalid Bashir
Bajwa for his help, support, encouragement, teachings and supervision. His
wise academic advice and ideas have played an extremely important role in
the work presented in this thesis. I gratefully thank the members of my Pro-
posal defense committee for their constructive comments about this thesis.
UET Taxila provided the opportunity to meet amazing people and make new friends, and also provided an amazing atmosphere inside the institution.
I would also like to pay my gratitude to my teachers, Dr. Adeel Akram, Dr. Iram Baig, Dr. Hafiz Adnan Habib, Dr. Zulfiqar Hassan Khan and Dr. Tabassam Nawaz; without their guidance and motivation during my PhD, none of this would have been possible.
I am also thankful to Dr. Syed Aun Irtaza, who gave me an opportunity to
work with him and was always there to help me out. He provided the freedom
to pursue my own ideas and, at the same time, was rigorous in reviewing my work. For that, and for granting me the opportunity to work in such a remarkable research field, I am thankful to him. He had a fundamental role in bringing this document to life. Lots of thanks to my friends, especially Toqeer Mehmood, Zahid Mahmood and Farhan Aadil, and to my colleagues at the University of Engineering and Technology Taxila, for their cooperation. During the whole
period of research, I spent a big part of each day in the UET research cen-
tre, so UET has become a part of my precious memories. Last but not the
least, my heartiest gratitude to my beloved parents, father-in-law, my elder
brothers (Professor Imran Ashraf, Engr. Rizwan Ashraf, Dr. Kamran Ashraf,
Professor Rabnawaz, Irfan Ashraf) and my wife for their unconditional support in every aspect of my life. They extended their helpful hands whenever I needed them. Without their patience and inspiration, it would not have been possible for
me to start and continue my research. Above all, I thank Allah for blessing
me with all these resources, favours, and enabling me to complete this thesis.
Abstract
Images serve as a significant medium for human communication, and they deliver a rich amount of information for people to understand the digital world. With the widespread use of the internet and the availability of digital imaging techniques, more and more images are accessible to the world. As a result, the need for efficient image indexing and retrieval has grown exponentially. The current form of image retrieval is based on the textual annotations that are used to describe the image content; but in today's global world, textual annotation of images is becoming impractical, unfeasible and inefficient for representing and retrieving images. A key demand imposed on a CBIR technique is that it should effectively carry out the evaluation and analysis of the image content and generate an output in the form of image collections that share similar semantics. The targeted output can be generated if we train and evaluate the semantic classifiers on the actual objects of interest, but this is a challenging problem, as the image content is very diverse, especially in natural image scenes. Targeting the problem by applying image segmentation techniques to map what constitutes an image is a well-developed approach in the domain of CBIR, but the results of these approaches remain unsatisfactory. Some of the reasons are: (1) currently there is no robust way to perform segmentation that can figure out all objects, or even the complete structure of objects, in most images; (2) in Computer Vision (CV), segmentation is still an open problem; (3) segmentation slows down the image representation process, which is not feasible in real-time systems like CBIR. However, despite a great deal of research work, the retrieval performance of CBIR is not satisfactory because of the gap between the semantic representation of the image at a low level and visual concepts at a high level.
Therefore, based on these points, we believe that applying the currently existing segmentation techniques for image representation is not a promising direction. We have addressed image representation by analyzing the image contents through bandelets. The benefit of the scheme is that the image representations are very powerful, as we target the actual objects of interest present in an image in a far more effective way than is possible with segmentation-based techniques. Secondly, image analysis through the bandelet transform is a novel way of performing image retrieval, and we have applied it to generate meaningful image representations and to train the semantic classifiers. Consistency enhancement in the semantic association process addresses the two main reasons why the conventional CBIR framework is unable to achieve effective retrieval results: the lack of output verification and the avoidance of neighborhood similarity. Due to these problems, the image retrieval response is very inconsistent and the target output contains more wrong results than right ones. In this thesis, we concentrate on these issues by applying Neural Networks over the bag of images (BOI) and exploring the query's semantic association space. The semantic association process involves Artificial Neural Networks (ANN) that are trained over the resultant representations and guarantees an impressive image retrieval output.
Contents
1 Introduction 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Research Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 Research Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.4 Thesis Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2 CBIR Background 10
2.1 Content Based Image Retrieval (CBIR) . . . . . . . . . . . . . . . . . . 10
2.2 User Query . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.3 Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.3.1 Spectral Clustering Approach . . . . . . . . . . . . . . . . . . . 15
2.3.2 An Unsupervised Approach (JSEG approach) . . . . . . . . . . . 16
2.3.3 Multiclass Image Semantic Segmentation (MCISS) . . . . . . . . 17
2.4 Feature Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.4.1 Texture Feature Extraction . . . . . . . . . . . . . . . . . . . . . 18
2.4.1.1 Spatial Method . . . . . . . . . . . . . . . . . . . . . . 20
2.4.1.2 Tamura Feature . . . . . . . . . . . . . . . . . . . . . 20
2.4.1.3 Markov Random Fields . . . . . . . . . . . . . . . . . 21
2.4.1.4 Co-occurrence Matrix . . . . . . . . . . . . . . . . . . 21
2.4.1.5 Edge Histogram . . . . . . . . . . . . . . . . . . . . . 22
2.4.1.6 Fractals in Spatial Feature . . . . . . . . . . . . . . . . 23
2.4.2 Spectral Domain . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.4.2.1 Fourier Transform . . . . . . . . . . . . . . . . . . . . 23
2.4.2.2 Discrete Cosine Transform and Wavelet Transform . . . 25
2.4.2.3 Gabor Filters Transform . . . . . . . . . . . . . . . . . 26
2.4.2.4 Curvelet Transform . . . . . . . . . . . . . . . . . . . 27
2.4.3 Color Feature . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.4.3.1 Color Histogram . . . . . . . . . . . . . . . . . . . . . 31
2.4.3.2 Color Coherence Vector . . . . . . . . . . . . . . . . . 32
2.4.3.3 HSV Color Space . . . . . . . . . . . . . . . . . . . . 33
2.4.3.4 Hue Minimum Maximum Difference (HMMD) . . . . . 34
2.4.3.5 Dominant Color Descriptor (DCD) . . . . . . . . . . . 34
2.4.3.6 YCbCr Color Space . . . . . . . . . . . . . . . . . . . 35
2.4.4 Shape Feature . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.4.4.1 Shape Extraction based on Contour Method . . . . . . 37
2.4.4.2 Simple shape descriptors . . . . . . . . . . . . . . . . 37
2.4.4.3 Shape signature . . . . . . . . . . . . . . . . . . . . . 38
2.4.4.4 Stochastic method . . . . . . . . . . . . . . . . . . . . 38
2.4.4.5 Spectral transform . . . . . . . . . . . . . . . . . . . 39
2.4.5 Interest point detector . . . . . . . . . . . . . . . . . . . . . . . 39
2.4.5.1 Harris detector . . . . . . . . . . . . . . . . . . . . . . 40
2.4.5.2 Speeded up robust feature (SURF) . . . . . . . . . . . 40
2.4.5.3 Scale Invariant Feature Transform (SIFT) . . . . . . . . 42
2.5 Similarity Measure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
2.5.1 Metric Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
2.5.1.1 Manhattan Distance . . . . . . . . . . . . . . . . . . . 44
2.5.1.2 Euclidean Distance . . . . . . . . . . . . . . . . . . . 45
2.5.1.3 Minkowski Distance . . . . . . . . . . . . . . . . . . . 45
2.5.1.4 Hausdorff Distance . . . . . . . . . . . . . . . . . . . 45
2.5.2 Histogram Distance . . . . . . . . . . . . . . . . . . . . . . . . . 46
2.5.2.1 Earth Movers Distance . . . . . . . . . . . . . . . . . 46
2.5.2.2 Kullback-Leibler (KL) Divergence . . . . . . . . . . . 46
2.6 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
2.6.1 Accuracy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.6.1.1 Precision . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.6.1.2 Recall . . . . . . . . . . . . . . . . . . . . . . . . . . 48
2.6.2 Precision-Recall Graphs . . . . . . . . . . . . . . . . . . . . . . 49
2.6.3 Mean average precision . . . . . . . . . . . . . . . . . . . . . . 49
2.7 Data Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
2.8 Multimedia Information Retrieval . . . . . . . . . . . . . . . . . . . . . 50
2.8.1 Image representations and similarity detection . . . . . . . . . . . 50
2.8.2 Image block based representation and salient points . . . . . . . . 53
2.8.3 Image classification and similarity detection . . . . . . . . . . . . 54
2.9 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3 Bandelet Transform 58
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.2 Surface Compression through Geometric Bandelet . . . . . . . . . . . . 59
3.3 Bandelet Image Compression . . . . . . . . . . . . . . . . . . . . . . . . 61
3.3.1 Geometric image model . . . . . . . . . . . . . . . . . . . . . . 62
3.3.2 Geometric Image Flow with Bandelet Bases . . . . . . . . . . . . 63
3.3.3 Image Compression Through Bandelet . . . . . . . . . . . . . . 65
3.4 Orthogonal Bandelets . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
3.4.1 Block Based Bandelet Basis . . . . . . . . . . . . . . . . . . . . 67
3.4.2 Fast Discrete Bandelet Transform . . . . . . . . . . . . . . . . . 69
3.5 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
4 Feature Extraction Using Bandelet Transform 74
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
4.2 Image Representation using Bandletized Regions in YCbCr Color Space . . 76
4.2.1 Modified Bandelet Transform . . . . . . . . . . . . . . . . . . . 77
4.2.1.1 Alpert bases in bandelet transform . . . . . . . . . . . 80
4.2.1.2 Texture Feature Extraction using Bandelet . . . . . . . 84
4.2.1.3 Artificial Neural Network . . . . . . . . . . . . . . . . 85
4.2.1.4 Gabor Feature . . . . . . . . . . . . . . . . . . . . . . 87
4.2.2 Color Feature Extraction . . . . . . . . . . . . . . . . . . . . . . 89
4.2.3 Fusion Vector . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
4.3 Image Representation using HSV . . . . . . . . . . . . . . . . . . . . . . 91
4.3.1 Color Feature HSV Domain . . . . . . . . . . . . . . . . . . . . 92
4.3.2 Combining texture and HSV color features . . . . . . . . . . . . . 93
4.4 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
5 State-of-the-Art Classifier for Image Retrieval 95
5.1 Semantic Association . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
5.2 Content based image retrieval using ANN . . . . . . . . . . . . . . . . . 98
5.3 Content based image retrieval using SVM . . . . . . . . . . . . . . . . . 99
5.4 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
5.4.1 Image Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
5.4.2 Retrieval Precision/Recall Evaluation . . . . . . . . . . . . . . . 102
5.4.3 Comparison on Corel Image Set . . . . . . . . . . . . . . . . . . 104
5.4.4 Comparison with State-of-the-Art Methods . . . . . . . . . . . . 109
5.4.5 Comparison on Coil Image Set . . . . . . . . . . . . . . . . . . . 114
5.5 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
6 Conclusions and Future Work 116
6.1 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
6.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
6.2.1 Extensions of the Work . . . . . . . . . . . . . . . . . . . . . . . 117
6.2.2 Practical Implications of CBIR . . . . . . . . . . . . . . . . . . . 118
6.2.3 Anomalous Events Monitoring in Surveillance Videos . . . . . . 118
6.2.4 Traffic Management . . . . . . . . . . . . . . . . . . . . . . . . 119
6.2.5 CBIR using Hadoop . . . . . . . . . . . . . . . . . . . . . . . . 119
References 139
List of Publications
1. R. Ashraf, K. Bashir, A. Irtaza, M. T. Mahmood; Content Based Image Retrieval Using Embedded Neural Networks with Bandletized Regions, Entropy, Vol. 17, No. 6, pp. 3552-3580, 2015. (IF=1.5)

2. R. Ashraf, K. Bashir, T. Mehmood; Content-based Image Retrieval by Exploring Bandletized Regions through Support Vector Machines, Journal of Information Science and Engineering, J INF SCI ENG, 2015. Accepted Id: 150159. (IF=0.5)

3. R. Ashraf, K. Bashir, A. Irtaza, T. Mehmood; A Novel Approach for the Gender Classification through Trained Neural Networks, Journal of Basic and Applied Scientific Research, Vol. 4, No. 6, pp. 136-144, 2014. (ISI Indexed)
List of Figures
1.1 Query Image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2 Image of cluttered scene . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3 Putatively Matched Points . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.4 Query Result:1st image is query image while remaining images in the
group are retrieved images . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.1 Typical CBIR structure . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2 JSEG algorithm for color image segmentation . . . . . . . . . . . . . . . 16
2.3 JSEG segmentation results . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.4 MCISS framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.5 Classification of texture feature extraction methods . . . . . . . . . . . . 19
2.6 Image and its Fourier Transform . . . . . . . . . . . . . . . . . . . . . . 24
2.7 Different images with same Fourier Transform . . . . . . . . . . . . . . . 24
2.8 Five-level curvelet digital tiling of an image . . . . . . . . . . . . . . . . 28
2.9 RGB Color Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.10 Image and its color histogram . . . . . . . . . . . . . . . . . . . . . . . . 32
2.11 Images and their CCV color feature vectors . . . . . . . . . . . . . . . . 33
2.12 The RGB and HSV Color Space . . . . . . . . . . . . . . . . . . . . . . 34
2.13 HMMD Color Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.14 Classification of shape representation and description technique . . . . . 36
2.15 Shape eccentricity and circularity . . . . . . . . . . . . . . . . . . . . . . 37
2.16 Surf interest point . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.17 Surf interest point . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
2.18 SIFT interest point . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
2.19 Accuracy Parameters - Correct and incorrect labeling of an image . . . . 47
3.1 Horizon model with a flow . . . . . . . . . . . . . . . . . . . . . . . . . 64
3.2 Flow in a region of an image . . . . . . . . . . . . . . . . . . . . . . . . 67
3.3 Dyadic square segmentation of an image . . . . . . . . . . . . . . . . . . 68
3.4 Left column gives zooms of noisy images having a PSNR = 20.19 dB. The middle and right columns are obtained, respectively, with bandelet and wavelet estimators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
4.1 Proposed Method. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
4.2 Bandelet Transform [1; 2]. (a) Dyadic segmentation depends on local
directionality of the image; (b) Bandelet segmentation square which con-
tains a regularity function as shown by the red dash; (c) sampling posi-
tion and Geometric flow; (d) Sampling position adapted to the warped
geometric flow; (e) warping example. . . . . . . . . . . . . . . . . . . . 79
4.3 Geometric flow representation using different block sizes: (a) small size 4×4; (b) medium size 8×8 . . . . . . . . . . . . . . . . . . . . . . . . . . 79
4.4 Object categorization on the basis of geometric flow obtained through Bandletization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
4.5 The structure of neural network. . . . . . . . . . . . . . . . . . . . . . . 86
4.6 Types of texture. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
4.7 (a) RGB Original Image; (b) Y matrix Luminance Image; (c) Canny
Luma Image; (d) Canny RGB Image. . . . . . . . . . . . . . . . . . . . 90
4.8 Proposed Method HSV. . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
4.9 (a) RGB Original Image; (b) H matrix Hue Image; (c) Canny Hue Image;
(d) Canny RGB Image. . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
5.1 Verification inconsistency: similarity of the visual features, although both images belong to different semantic classes . . . . . . . . . . . . . . . . . 97
5.2 Query Performance on Corel image dataset with Top 10 to Top 40 Retrievals . 105
5.3 Query Performance on Caltech image dataset with Top 10 to Top 60 Re-
trievals in terms of Precision. . . . . . . . . . . . . . . . . . . . . . . . . 106
5.4 Query Performance on Caltech image dataset with Top 10 to Top 60 Re-
trievals in terms of Recall. . . . . . . . . . . . . . . . . . . . . . . . . . 106
5.5 Comparison of mean precision obtained by proposed method with other
standard retrieval systems. . . . . . . . . . . . . . . . . . . . . . . . . . 107
5.6 Comparison of mean recall obtained by proposed method with other stan-
dard retrieval systems. . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
5.7 Comparison of mean precision obtained by proposed method with state
of art retrieval systems. . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
5.8 Comparison of mean recall obtained by proposed method with state of art
retrieval systems. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
5.9 Mean Precision of Proposed Method-HSV compared with other standard
retrieval systems on top 20 retrievals. . . . . . . . . . . . . . . . . . . . . 113
5.10 Comparison of mean recall obtained by proposed method-HSV with other
standard retrieval systems. . . . . . . . . . . . . . . . . . . . . . . . . . 114
5.11 Comparison of precision and recall obtained by proposed method with
ICTEDCT. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
List of Tables
2.1 Popular CBIR systems implemented in commercial and academic settings . . 13
2.2 Features calculated by using matrix of normalized co-occurrence P (Q, R) 22
4.1 Summary of Neural network structure for every image category used in
this work. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
5.1 Comparison of mean precision obtained by proposed method with other
standard retrieval systems on top 20 retrievals. . . . . . . . . . . . . . . . 107
5.2 Comparison of mean recall obtained by proposed method with other stan-
dard retrieval systems on top 20 retrievals. . . . . . . . . . . . . . . . . . 108
5.3 Comparison of mean precision obtained by proposed method with state-
of-art methods on top 20 retrievals. . . . . . . . . . . . . . . . . . . . . . 110
5.4 Comparison of mean recall obtained by proposed method with state-of-art
methods on top 20 retrievals. . . . . . . . . . . . . . . . . . . . . . . . . 111
5.5 Mean Precision of Proposed Method-HSV compared with other standard
retrieval systems on top 20 retrievals. . . . . . . . . . . . . . . . . . . . . 112
5.6 Comparison of mean recall obtained by proposed method-HSV with other
standard retrieval systems on top 20 retrievals. . . . . . . . . . . . . . . . 113
Nomenclature

Symbol Description
CBIR Content based image retrieval
RF Relevance feedback
SVM support vector machine
QBE Query by example
EM Expectation Maximization
CCV Color Coherence Vector
HMMD hue-min-max-difference
HSV Hue, saturation, value
FD Fourier Descriptor
YCbCr Luminance and chrominance model
DoG Difference of Gaussian
L1 distance Manhattan distance
L2 distance Euclidean distance
MCM Motif Co-occurrence Matrix
DDC Dynamic Dominant Color
MIR Multimedia Information Retrieval
LBP Local Binary Patterns
CHAPTER 1
INTRODUCTION
There are many assets and resources on the Web that can be used to build public image collections. This has generated the need for a way to search these images, and consequently finding productive image retrieval systems has become a wide area of interest for researchers. The task of an image retrieval technique is to search for and retrieve images from an extensive repository of digital images. In the last few decades, researchers have been working on image retrieval techniques, and two types of promising techniques have been developed: Content Based Image Retrieval (CBIR) and Text Based Image Retrieval (TBIR). In TBIR methods, users supply keywords or descriptions of the images as a query and retrieve the images that are pertinent to the keyword.
Text-based retrieval has several disadvantages. Primarily, there is inconsistency in the labels produced by image annotators, caused by diverse understandings of the image contents; e.g., an image consisting of grass and flowers might be labeled as either grass, flower or nature by different people. Second, it takes considerable time to annotate the images of a large database, and this makes the process subjective [3]. Third, there is a high likelihood of mistakes occurring during the image labeling or tagging process when the database is large. Consequently, text-based image retrieval cannot achieve a high level of productivity and effectiveness. CBIR has several advantages over traditional text-based retrieval. Because CBIR uses the visual contents of the query image, it is a more efficient and effective
way of finding relevant images than searching based on text annotations. CBIR also avoids the time wasted in the manual annotation process of the text-based approach. These advantages have motivated us to employ a CBIR technique for our research. CBIR is an automated technique based on low-level image features such as color, texture, shape and their combinations, which are extracted from the images of the repository.
This chapter starts with a discussion of the factors that motivated us to explore the domain of CBIR. This is followed by the research objectives, discussed in section 1.2. The original contributions of the dissertation are reported in section 1.3. Finally, a short overview of the remaining chapters is provided in section 1.4.
1.1 Motivation
Due to the increase in multimedia applications and the widespread use of the internet, image libraries with digital contents have seen dramatic expansion. The need to explore these libraries and access the appropriate information automatically has therefore motivated research in the domain of image retrieval. Automatic image retrieval has found vast application in many fields like geographical information systems, surveillance systems, remote sensing, data mining, architectural design, fabric design, internet image search, medical image retrieval, satellite imaging, video search and communication systems. To cope with the challenges of image retrieval, many commercial search services like Google, Bing etc. have become indispensable tools in people's work and daily life [4; 5]. But as these search services usually retrieve images through keywords and metadata, they cannot help in the retrieval of images that do not have such associated information. These image retrieval systems also suffer from other critical issues like keyword limitations, lack of appropriate metadata association, and the high cost of manual text annotation. So, to avoid these shortcomings and to enhance the capabilities of image retrieval systems, an important focus of research is on CBIR.
Several application areas like surveillance systems, architectural design, data mining, GIS and remote sensing, fabric design, and video search can also obtain the benefit
of automatic image content analysis and association in their particular domains. The existing techniques for image retrieval, based on content specification through manual descriptions in the form of metadata, are no longer able to deal with such massive repositories because of their size, the diversity of the content, the high cost of the manual annotations, and the unavailability of globally acceptable keywords. Therefore, over the past few years a novel and exciting field of study, CBIR, has emerged in the domain of computer vision.
CBIR performs image content analysis by using visual attributes such as color, shape, texture, and salient or key points; the textual descriptions associated with the images are never considered for image retrieval. The principal working scheme of CBIR is to represent the actual images by their reduced-level characteristics and attributes in the form of low-level visual features, and then find the semantically similar images in the image repository by applying metric or non-metric norms. Hence, the relevant output depends on two factors: the ability to effectively represent the images based on their actual visual contents, and a norm that is not only capable of matching the feature vectors but also able to bring an output that is similar in terms of semantics.
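The two-factor pipeline just described (a low-level feature representation plus a similarity norm) can be sketched in a few lines. The color histogram and Euclidean (L2) distance below are deliberately simple stand-ins for the bandelet-based features and learned classifiers developed later in this thesis; all function names and the toy data are illustrative only.

```python
import numpy as np

def color_histogram(image, bins=8):
    """Low-level global feature: a normalized per-channel intensity
    histogram -- one simple instance of the color/texture/shape
    descriptors discussed above."""
    feats = []
    for ch in range(image.shape[2]):
        hist, _ = np.histogram(image[:, :, ch], bins=bins, range=(0, 256))
        feats.append(hist / hist.sum())
    return np.concatenate(feats)

def retrieve(query, repository, top_k=3):
    """Rank repository images by Euclidean (L2) distance between
    feature vectors; a smaller distance means higher visual similarity."""
    q = color_histogram(query)
    dists = [(idx, np.linalg.norm(q - color_histogram(img)))
             for idx, img in enumerate(repository)]
    return [idx for idx, _ in sorted(dists, key=lambda t: t[1])[:top_k]]

# Toy repository: dark, mid-gray and bright synthetic images.
rng = np.random.default_rng(0)
dark = rng.integers(0, 60, (32, 32, 3))
gray = rng.integers(100, 160, (32, 32, 3))
bright = rng.integers(200, 256, (32, 32, 3))
query = rng.integers(0, 60, (32, 32, 3))   # visually closest to `dark`
print(retrieve(query, [dark, gray, bright], top_k=1))  # [0]
```

Note that a plain L2 norm over histograms matches visual statistics, not semantics: the dark query is matched to the dark image regardless of what either depicts, which is precisely the semantic gap discussed next.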
If both of these factors are not addressed properly, the result is improper output. One of the key reasons found in the CBIR literature for this imperfect output is the inherent semantic gap between the low-level representation of an image and the high-level semantic concepts in the image. Therefore, as a motivation for this research work, we have addressed this issue by introducing a new, powerful image representation scheme based on the bandelet transform. The novel feature representation retains the benefits of local object-level detection and representation, and generates a combinatorial version by considering the global image semantics as well. The powerful image representation is systematically combined with Artificial Neural Networks to generate a powerful image retrieval output that lowers the semantic gap as well [6].
Thus, as a motivation of this thesis, we address these issues and report ways through which the retrieval of images can be performed more robustly. Although the last decades have produced remarkable research work in the territory of CBIR, yet at
the same time CBIR is not yet a fully grown research area. The major motivation of this thesis is to contribute to the CBIR field by addressing the existing challenges and presenting new ideas.
1.2 Research Objectives
Powerful image representations are required for efficient retrieval. Many research works have addressed this problem by computing global frequency statistics of the image (i.e. the variance, mean etc. over texture), while many others address it by considering the object frequencies that compose an image, through segmentation or keypoint detection (e.g. SIFT, SURF). The benefit of global frequencies as image features is that they give us an opportunity to sense the overall semantics of the image, but they cannot work effectively in object searching, as elaborated in figures 1.1 and 1.2. If we try to match the contents of figure 1.2 against figure 1.1, it is not possible to find the relevant information, although figure 1.2 contains the complete contents of figure 1.1.
The reason is that the global image frequencies of the two images are different. For this
problem, if searching is performed through local features (e.g. SIFT, SURF), it works
efficiently, as described in figure 1.3. Local features work impressively for object
detection, but the results are not optimal when the overall image semantics must be
determined, as described in figure 1.4. Therefore, in this research work we present a
novel way of feature extraction through the Bandelet transform that is able to reap the
benefits of both schemes (i.e. global and local). The Bandelet transform provides an
opportunity to analyze the image deeply, down to the image objects, and also overcomes
the drawbacks found in segmentation based techniques.
For feature extraction and image representation purposes, we consider the image objects,
and on the basis of these objects we compute the overall (i.e. global) image energy, which
is then used for image representation. Artificial Neural Networks (ANN) and Support
Vector Machines (SVM) are trained over these representations so that the ultimate target
of reducing the semantic gap can be achieved.
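This last step can be sketched as follows. The feature vectors below are synthetic stand-ins for the bandelet/energy representations (the real extraction stage is described in chapter 4), and scikit-learn's SVC plays the role of the SVM:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical precomputed feature vectors for two semantic classes;
# in practice these come from the bandelet-based extraction stage.
rng = np.random.default_rng(0)
class_a = rng.normal(loc=0.0, scale=0.3, size=(40, 8))
class_b = rng.normal(loc=2.0, scale=0.3, size=(40, 8))
X = np.vstack([class_a, class_b])
y = np.array([0] * 40 + [1] * 40)

# Train a semantic classifier over the feature vectors.
clf = SVC(kernel="rbf").fit(X, y)

# Two unseen vectors, one drawn from each class.
probe = np.vstack([rng.normal(0.0, 0.3, (1, 8)),
                   rng.normal(2.0, 0.3, (1, 8))])
pred = clf.predict(probe)
print(pred)  # [0 1]: each probe is assigned its semantic class
```

Mapping feature vectors to semantic classes before ranking is what narrows the semantic gap: distances are then computed within a class rather than across the whole repository.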
Figure 1.1: Query Image
Figure 1.2: Image of cluttered scene
Figure 1.3: Putatively Matched Points
1.3 Research Contributions
Several methods have been described for CBIR using both spatial and spectral ap-
proaches. Most of them are not capable of effective texture representation. The Bandelet
transform provides in-depth details about an image in the spectral domain by using more
orientation information at each scale. Hence, our main contribution and principal goal is
to use Bandelet texture features for CBIR.
The following key points are the original contributions of this thesis:
• A new scheme for object detection in images based on bandlets.
• A new feature extraction scheme by combining the object representations with
global energy of an image.
• Application of texture-specific parameters for Gabor filters.
• A new feature extraction scheme that combines the texture features with color fea-
tures for better image representations.
• Adaptation of the neural network parameters to ensure the minimum mean squared error.
• Instead of the relevant image search on the basis of query image, a probabilistic
interpretation of the multiple images for image search is presented.
Figure 1.4: Query result: the 1st image is the query image, while the remaining images
in the group are the retrieved images
1.4 Thesis Outline
The rest of this thesis is organized into five chapters. A brief summary of each of these
chapters follows:
An overview of Content Based Image Retrieval (CBIR) is presented in chapter 2. This
chapter reviews the relevant topics in CBIR and provides comprehensive background for
the current research. Different paradigms of user query, segmentation, choice of visual
features (color, texture, shape), salient points, similarity measures, image benchmarks
(datasets), and approaches to result presentation and performance evaluation are discussed
in this chapter.
Chapter 3 specifically focuses on the Bandelet transform in the image retrieval process.
Bandelet transforms are widely used in image compression, noise removal, multi-scale
structuring, and geometry compression. The Bandelet transform captures the geometric
details of an image by removing the redundancy of the wavelet transform along geometric
structures.
In chapter 4, we focus on the generation of powerful feature vectors. To achieve this,
the Bandelet transform is first applied to the repository images, which returns the geometric
boundaries of the significant items, i.e. the major objects found in an image. We apply
a back-propagation neural network to determine the texture estimation parameters, and
then apply Gabor filters with these focused parameters (as will be depicted) to estimate
the texture content around the boundaries with maximum accuracy. To increase the power
of the feature vector, color components in the YCbCr and HSV domains are also included,
after approximation through wavelet decomposition over the color histogram. CBIR
performs image content analysis using visual properties such as shape, color, texture,
and salient points or keypoints; textual descriptions associated with the images are never
considered for image retrieval. For classification purposes, SVM and ANN are used to
determine the semantic classes.
Chapter 5 concentrates more deeply on the design of a stable semantic classifier through
ANN and SVM. When several categories are enrolled, the samples of a particular category
are naturally far fewer than the comparative samples from all other categories. In the
proposed framework, for a query image of any semantic category, the images sharing the
query's semantic category are returned to the user, ranked on the basis of the distance
between their similarity features and those of the query image. All this guarantees
performance enhancement in the image retrieval process.
Chapter 6 summarizes the major research results of the dissertation. The key findings
and future direction are highlighted in this chapter.
CHAPTER 2
CBIR BACKGROUND
This chapter introduces content based image retrieval and reviews relevant topics to
provide the basis for this thesis. The chapter is arranged in the following order: Section
2.1 discusses content based image retrieval (CBIR). Section 2.2 provides the details
of user queries in a CBIR environment. Section 2.3 presents the details of segmentation
in a CBIR context. Different ways of feature extraction and image representation are
described in section 2.4. Section 2.5 covers the measures used to compute the relevance
against a query image. Section 2.6 provides the details of performance evaluation. Section
2.7 discusses image benchmarks used in CBIR. Section 2.8 presents a review of
multimedia information retrieval in the field of CBIR. Section 2.9 summarizes the chapter.
2.1 Content Based Image Retrieval (CBIR)
Data and information presented as images continue to grow astronomically, as a result of
the intensive use of multimedia devices equipped with digital cameras, and because of the
way the internet has transformed access to information. Search engines such as Google
retrieve documents on the basis of their textual content. In any case, the quality of the
outcome is frequently a long way from ideal when it comes to finding images, because
words have only limited scope for describing a picture. The well-known saying "a picture
is worth a thousand words" points out one reason why it is so difficult to track down the
images someone is looking for. Understanding an image and interpreting its semantics
are challenging tasks for both computers and human perception, because meaning is
comparative and relative, and changes from one person to another [7; 8; 9]. For example,
what one person might comprehend as a holiday image featuring a mountain could be
seen by someone else as a landscape of Iceland, while a third person may describe the
same picture as a volcano on the edge of eruption. Search engines need to be capable of
providing pictures that are open to multiple descriptions and interpretations.
Consequently, resolving this problem depends on making image search independent of
the keywords and metadata associated with images. This objective has guided scientists
and researchers to find new horizons in the CBIR field. Moreover, an effective and strong
CBIR system is needed to overcome the semantic problems in image retrieval. The
dilemma is that images are hosted in an unorganized way and are searched through
associated keywords and meta-data, which makes it very difficult for a user to view all
the related images, due to the inherent limitations of textual data for describing and
representing visual content.
Figure 2.1: Typical CBIR structure - [10]
For this reason, a lot of general purpose image access systems have been produced, both
text based and content based. The text based methodology dates back to the 1970s. In
such frameworks, images are manually annotated with textual descriptors, which are then
used by a database management system (DBMS) to accomplish image access. This method
has two disadvantages. The first is the sizeable amount of human time required for manual
annotation. The second is annotation inaccuracy, a result of the subjectivity of human
perception. To overcome these disadvantages of the text based access method, CBIR was
introduced in the early 1980s.
CBIR recognizes images on the basis of their visual contents. Computer vision techniques
refer to low-level features such as color, texture, shape, and spatial layout. The real
meaning of content based is to evaluate the actual contents of an image, where the term
content describes the color, texture, shapes and other information related to the image.
The goal of CBIR is to find an image or group of images that are related to the query
image. Figure 2.1 [10] shows a general CBIR system. Various researchers have described
the relationship between low-level and high-level features [11; 12]. Features are classified
into three categories: low-level, middle-level and high-level. The gap between the low
and high levels, i.e. the limited descriptive power of low-level features with respect to
user intent, is known as the semantic gap [13; 14].
CBIR is an automated technique in which low-level features such as texture, shape and
color are extracted from the repository images. CBIR is a more efficient and effective
way of finding relevant images than manual text annotation techniques. Many CBIR
systems have been introduced [3; 15] throughout the years. Some of these frameworks
are commercial and most are academic (scholarly); their points of interest appear in
table 2.1.
This research is based on the feature extraction techniques involved in image retrieval,
machine learning techniques, classification, and the computationally intelligent retrieval
of images, through which an efficient CBIR system can be proposed.
Table 2.1: Popular CBIR systems implemented commercially and in academia

CBIR System — Characteristics — Category

VisualSEEK — Image assessment through matching of salient color regions with respect
to their hues, sizes, and relative spatial locations. — Academic

Photobook — Incorporates retrieval mechanisms for two dimensional shapes, face
recognition and texture images. — Academic

Multimedia Analysis and Retrieval Systems (MARS) — Consolidates relevance feedback
from the user for consequent result refinements. — Academic

NeTra — Uses a segmentation technique: an image is first divided into regions of
homogeneous color, and then features are extracted from those regions using color,
texture, shape and spatial location. — Academic

QBIC — Driven by several features that can be selected by the user; extracts color
features for individual objects or the complete image in several color spaces. It also
includes texture structure and shape features. — Commercial

Virage — Introduced simple features such as global and local color, shape and texture.
For similarity measurement, an indexing technique is provided for use when developing
an application. — Commercial
2.2 User Query
How to query the CBIR system and describe the information needs of the user is an
inherent problem. In many CBIR systems, the user provides an example image to the
retrieval system, and its content is taken to represent the information the user is trying
to find. This query hypothesis is known as the Query by Example (QBE) scheme. Whatever
the search algorithm, the retrieval results must share at least some elements with the
input image that serves as the search example. The image retrieval process starts by
computing the visual features of the query image; the CBIR system then returns the
response images whose features are similar, i.e. close to the user's query in terms of
distance. Instead of taking the whole image as the expression of the user's search
intention, some CBIR systems allow users to mark the image regions they would like to
search for. This approach is known as query by image region. The approach must allow
the system users to mark the regions in the input image which they intend to search, and
for this the CBIR system must be equipped with an unsupervised segmentation method.
Blobworld [16] and NeTra [17] are examples of this query approach. In this query scheme,
the retrieval results are much better, as the irrelevant image portions are not considered
in the repository search.
To deal more effectively with the automatic image retrieval problem, some systems do not
depend on a single example image only. Instead, they take several example images from
the user [18] and compute the commonalities amongst them. The most representative
images make the major contributions to the image search; the contribution of each
example image is determined through an assigned weight. The image repository is then
searched on the basis of these images, with the commonalities serving as the query [19].
Another approach to taking multiple example images is relevance feedback. In this
paradigm, the system iteratively learns the user's information need and improves itself
as the number of relevant example images increases. Some query paradigms merge
different modalities (such as body gestures, touch and voice) to query the CBIR system.
One example investigating multi-modal interfaces is presented in [20]; this work exhibits
a dynamic interface for retrieval purposes. In this thesis, we employ the Query by Example
paradigm to query the CBIR system.
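The QBE pipeline can be sketched end to end with a deliberately simple feature, a global grey-level histogram standing in for a full visual signature, and synthetic "images":

```python
import numpy as np

def grey_histogram(image, bins=8):
    """Global grey-level histogram as a normalized feature vector."""
    hist, _ = np.histogram(image, bins=bins, range=(0, 256))
    return hist / hist.sum()

def query_by_example(query_img, repo_imgs):
    """QBE: describe the example image by its features, then rank the
    repository by Euclidean distance in feature space."""
    q = grey_histogram(query_img)
    feats = np.array([grey_histogram(im) for im in repo_imgs])
    dists = np.linalg.norm(feats - q, axis=1)
    return np.argsort(dists)

rng = np.random.default_rng(1)
dark = rng.integers(0, 100, (32, 32))      # dark synthetic "image"
bright = rng.integers(150, 256, (32, 32))  # bright synthetic "image"
query = rng.integers(0, 100, (32, 32))     # dark query
print(query_by_example(query, [bright, dark]))  # [1 0]: dark image first
```

Everything the user sees depends on the feature: with a histogram signature, the dark repository image is returned first because its distribution overlaps the query's, regardless of what either image depicts.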
2.3 Segmentation
Automatic image segmentation is a challenging errand. Image segmentation is a key
role in CBIR and extracts the region based image manifestation. Segmentation algorithm
splits the image into small parts or into altered components. These components based on
feature homogeneity. Many techniques and approaches of segmentation are developed for
example graph based, grid based, clustering, model based, region growing based method
and contour based. In computer vision, different segmentation techniques are used.
2.3.1 Spectral Clustering Approach
Spectral clustering is another technique that is widely used for segmentation [16]. In
this technique, a mixture of texture and color characteristics is obtained by clustering
pixels. Initially, the joint distribution of texture, color and position components is
modeled with a mixture of Gaussians. To calculate the parameters of the model, the
Expectation Maximization (EM) algorithm is used. The consequent pixel cluster
assignments provide a segmentation of the image, and the resulting regions roughly
correspond to objects. Several methods design their own segmentations according to the
preferred region attributes, be it color, texture, or a combination of both [21]. These
algorithms are in many cases based on k-means clustering of pixel/block characteristic
features. In the work of [21; 22], the image is divided into trivial blocks of dimension
4 ∗ 4, from which texture as well as color features are extracted. At that point, k-means
clustering is applied to group the feature vectors into several categories and classes.
Each class represents one region, and blocks classified into the same category belong to
the same region. Clustering based segmentation has therefore proved effective in image
retrieval.
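The clustering idea can be sketched in a few lines. The sketch below uses plain k-means on (intensity, x, y) pixel features with a deterministic initialization, a minimal stand-in for the EM-fitted colour/texture/position mixture model, not the Blobworld algorithm itself:

```python
import numpy as np

def cluster_segment(image, iters=10):
    """Cluster pixels on (intensity, x, y) features into two groups with
    k-means -- a minimal stand-in for the EM-fitted Gaussian mixture over
    colour, texture and position described above."""
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # One feature vector per pixel: intensity plus down-weighted position.
    feats = np.stack([image.ravel().astype(float),
                      0.1 * xs.ravel(), 0.1 * ys.ravel()], axis=1)
    # Deterministic init: darkest and brightest pixels as cluster centres.
    centres = np.stack([feats[feats[:, 0].argmin()],
                        feats[feats[:, 0].argmax()]])
    for _ in range(iters):
        d = np.linalg.norm(feats[:, None, :] - centres[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(2):
            centres[j] = feats[labels == j].mean(axis=0)
    return labels.reshape(h, w)

# A two-region test image: left half dark, right half bright.
img = np.zeros((20, 20)); img[:, 10:] = 200
seg = cluster_segment(img)
print(seg[0, 0] != seg[0, 19])  # True: the halves fall in different clusters
```

Including (down-weighted) pixel position in the feature vector is what makes the clusters spatially coherent; EM with Gaussians generalizes this by also modelling each cluster's covariance.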
2.3.2 An Unsupervised Approach (JSEG approach)
The important idea of the JSEG procedure is to separate the segmentation process into
two stages, color quantization and spatial segmentation [22]. In the first stage, the colors
in an image are quantized into several classes that can be used to distinguish between
areas of the image. This quantization is carried out in color space without taking into
consideration the spatial distribution of the colors. As a result, the image pixel values
are replaced by their corresponding color class labels, which yields a class map that can
be viewed as a special kind of texture composition. In the second stage, spatial
segmentation is carried out directly on this class map without taking into consideration
the underlying pixel color similarity.
The real advantage of this two stage separation is clear: it is a struggle to handle color
similarity and the spatial distribution of color at the same time. Separating color
similarity from spatial distribution makes the development of algorithms more manageable
for each of the two processing stages [23; 24]. Figure 2.2 [24] shows the schematic of
color image segmentation by the JSEG algorithm.
Figure 2.2: JSEG algorithm for color image segmentation - [24]
Figure 2.3: JSEG segmentation results - [22]
Despite the fact that JSEG is an established segmentation methodology, it fails for some
classes of images and delivers lower quality segmentation results, such as the over-
segmentation shown in figure 2.3 [22]. With the aim of enhancing segmentation accuracy
and decreasing the computational complexity of the bottom-up approach, the MCISS
methodology was developed.
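The two-stage structure of JSEG can be sketched compactly. The sketch below substitutes uniform grey-level binning for JSEG's learned colour quantizer and connected-component labelling for its region growing, so it illustrates only the separation of the two stages:

```python
import numpy as np
from scipy import ndimage

def jseg_like(image, n_classes=2):
    """Two-stage sketch in the spirit of JSEG: (1) quantize grey values
    into a few classes to form a class map, ignoring pixel positions;
    (2) segment the class map spatially (here via connected components),
    ignoring the underlying pixel similarity."""
    # Stage 1: colour quantization -> class map.
    edges = np.linspace(image.min(), image.max() + 1, n_classes + 1)
    class_map = np.digitize(image, edges[1:-1])
    # Stage 2: spatial segmentation on the class map only.
    regions = np.zeros(image.shape, dtype=int)
    next_label = 0
    for c in np.unique(class_map):
        lab, n = ndimage.label(class_map == c)
        regions[lab > 0] = lab[lab > 0] + next_label
        next_label += n
    return class_map, regions

# Bright / dark / bright stripes: 2 colour classes but 3 spatial regions.
img = np.zeros((9, 9)); img[:3] = 100; img[6:] = 100
cmap, regs = jseg_like(img)
print(len(np.unique(cmap)), len(np.unique(regs)))  # 2 3
```

The example shows why the two stages are kept apart: the same colour class can occur in several disconnected spatial regions, and only the second stage can tell them apart.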
2.3.3 Multiclass Image Semantic Segmentation (MCISS)
Multi Class Image Semantic Segmentation (MCISS) is a productive methodology because
it can be utilized for images of all classes, and not only for images of a specific class
[16]. Class specific means that the framework is only able to segment images of some
particular class, for example horses, buildings or pedestrians. Such class specific methods
are useful for segmenting images containing structured objects. Since many images
contain both structured and non-structured elements, such as rivers and sky, that do not
follow any particular shape or composition, class specific techniques fail for such classes
of images [25].
The multiclass image semantic segmentation technique combines top-down and bottom-up
approaches. The bottom-up method utilizes both fractal dimension and the j-value as
homogeneity measures. The aim of the bottom-up approach is to give a region based
segmentation approach that can be utilized for all classes of pictures, whereas JSEG
fails for some classes of images [25; 26]. The MCISS framework is shown in figure 2.4
[25].
Figure 2.4: MCISS Frame work - [25]
Image segmentation is still an open research problem. Selection of a segmentation tech-
nique depends on the type of application or domain. In image retrieval, a region boundary
is not considered proper until the region is homogeneous.
2.4 Feature Extraction
An image is a group of pixels arranged in the form of a matrix and represented by
low level features. Selection of proper features is the most important step in designing
an efficient image retrieval system. Features are extracted by global or region based
techniques. In global extraction, segmentation is not required, while for region based
extraction, segmentation is done as the first step. In any CBIR system, efficient retrieval
strongly depends on how effectively the visual contents of the image are represented in
the form of a visual signature (a composition of multiple features). Retrieval results
are adversely affected by improper features. For example, if only color features are
extracted from a repository of similarly colored images, one can imagine that the retrieval
results will never be appropriate. This is the reason that selection of features is a
significant design phase of a CBIR system.
2.4.1 Texture Feature Extraction
Texture is an imperative and prominent low level visual property of an image and is
extensively used in CBIR. Real world images are composed of different kinds of objects,
and these objects have different surface patterns. The surface pattern of an object, or of
the whole image, is known as texture. A condensed definition of texture features, as given
in [8; 27], is: the representation of the spatial arrangement of the grey levels of the pixels
in an image or region. The commonly known texture descriptors are co-occurrence
matrices [28], Wavelet Transform [29; 30], Gabor filters [31; 32] and Tamura features
[33; 34].

Figure 2.5: Classification of texture feature extraction methods
Texture features have two main components, as shown in Figure 2.5:
1. Spatial method.
2. Spectral method.
2.4.1.1 Spatial Method
Spatial texture features, including co-occurrence matrices [35; 36], Fourier power spectra
[37; 38], shift-invariant principal component analysis (SPCA) [38], Wold decomposition
[39], Markov random fields [40; 41], fractal models [42], Tamura features [43; 44], the
Haar wavelet transform [45; 46] and Gabor filters, characterize texture by the statistical
distribution of the image intensity and have been used frequently. Spatial methods are
widely used and highly effective in the area of CBIR. In a CBIR system, feature extraction
is a key step because its output is used in all subsequent modules of the system. Spatial
features have shown good performance when applied to irregular shapes. The drawbacks
of these techniques are their sensitivity to noise and distortion, and the fact that they
are complex and require a lot of computation.
2.4.1.2 Tamura Feature
Tamura texture features are typical spatial domain features. They consist of six compo-
nents: coarseness, contrast, directionality, line-likeness, regularity and roughness [22; 44].
Among these, coarseness, contrast and directionality are considered the most important.
The Tamura feature descriptor is not effective for representing distorted or deformed
images because it is sensitive to scale and orientation [44]. Coarseness relates directly to
scale and to the repetition rate of the texture elements. An image may contain texture at
several scales; coarseness aims to identify the largest scale at which texture exists, even
when smaller scale texture is also present. Contrast measures the dynamic range of grey
levels in an image, together with the polarization of the distribution of black and white.
Directionality does not intend to differentiate between particular orientations or patterns,
but measures the overall degree of directionality in the image.
2.4.1.3 Markov Random Fields
Markov Random Fields (MRF) are used in various image processing applications such as
texture synthesis, classification, image segmentation, restoration and compression. The
MRF model successfully represents textures that consist of small primitives. A Markov
Random Field is a probabilistic process [47]: the probability of a cell being in a given
state is determined by the probabilities of the neighboring cells, so all interactions are
local. Widely used models include the Gaussian MRF (GMRF) and the simultaneous
autoregressive (SAR) model. If the model noise is independent, identically distributed,
zero mean and of unit variance, i.e. white noise, the model is called the simultaneous
autoregressive (SAR) model. The SAR model uses fewer parameters than other MRF
models; however, it is not rotation invariant [48; 49]. The multi-resolution simultaneous
autoregressive model (MRSAR) is proposed in [50]. A multi-resolution Gaussian pyramid
is obtained by low pass filtering and sub-sampling the image over several successive
levels; at each level, the SAR model can be applied. Although it shows better performance
than other texture features, such as the wavelet transform and Wold decomposition, the
MRSAR cannot distinguish images when structured patterns are involved.
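The core SAR idea, predicting each pixel from its neighbours and keeping the fitted weights as texture features, can be sketched with a least-squares fit. This is a minimal single-scale illustration with a four-neighbour window, not the full multiresolution MRSAR estimator of [50]:

```python
import numpy as np

def sar_features(image):
    """Least-squares fit of a simultaneous autoregressive (SAR) model:
    each interior pixel is predicted as a weighted sum of its four nearest
    neighbours plus a bias and noise. The fitted weights and the residual
    variance serve as texture features."""
    s = image.astype(float)
    centre = s[1:-1, 1:-1].ravel()
    X = np.stack([s[:-2, 1:-1].ravel(),    # neighbour above
                  s[2:, 1:-1].ravel(),     # neighbour below
                  s[1:-1, :-2].ravel(),    # neighbour left
                  s[1:-1, 2:].ravel(),     # neighbour right
                  np.ones_like(centre)],   # bias term
                 axis=1)
    theta, *_ = np.linalg.lstsq(X, centre, rcond=None)
    resid = centre - X @ theta
    return theta[:4], float(resid.var())

# Vertical stripes: up/down neighbours predict the pixel exactly, so they
# receive larger weights than the left/right neighbours.
stripes = np.tile([0.0, 1.0], (8, 4))
weights, noise_var = sar_features(stripes)
print(weights[0] > weights[2], noise_var < 1e-8)  # True True
```

The asymmetry of the fitted weights is exactly what makes the SAR parameters a texture descriptor, and also why the plain model is not rotation invariant: rotating the stripes by 90 degrees swaps the large and small weights.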
2.4.1.4 Co-occurrence Matrix
The co-occurrence matrix is one of the earliest methods, using statistical features of grey
levels to classify texture. Haralick [33] proposed the Grey Level Co-occurrence Matrix
(GLCM), a second order statistical method, and extracted features from images using it.
It has been applied very effectively and successfully to texture classification in evalua-
tions [51]. The features obtained from the normalized co-occurrence matrix P (Q, R)
are given in table 2.2.
The GLCM records the frequencies with which pairs of pixel values occur in an image
separated by a certain vector; the matrix distribution thus depends on the distance and
angular relationship between the pixels. Different texture characteristics are captured by changing
and varying the vectors [52]. GLCM features can be categorized into four groups:

1. Visual texture characteristics.

2. Statistics and linear algebra.

3. Correlation information.

4. Information theory.

Table 2.2: Features calculated from the normalized co-occurrence matrix P (Q, R)

Feature       Formula

Energy        Σ_Q Σ_R P²(Q, R)

Contrast      Σ_Q Σ_R (Q − R)² P (Q, R)

Entropy       −Σ_Q Σ_R P (Q, R) log P (Q, R)

Homogeneity   Σ_Q Σ_R P (Q, R) / (1 + (Q − R)²)
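The descriptors of Table 2.2 can be computed directly from a normalized co-occurrence matrix; a small numpy sketch (tiny grey-level count, toy image):

```python
import numpy as np

def glcm(image, dx=1, dy=0, levels=4):
    """Normalized grey-level co-occurrence matrix P(Q, R) for pixel pairs
    separated by the offset (dy, dx). `image` must hold ints < levels."""
    P = np.zeros((levels, levels))
    h, w = image.shape
    for y in range(h - dy):
        for x in range(w - dx):
            P[image[y, x], image[y + dy, x + dx]] += 1
    return P / P.sum()

def glcm_features(P):
    """The four descriptors of Table 2.2 computed from P."""
    Q, R = np.indices(P.shape)
    nz = P > 0  # avoid log(0) in the entropy term
    return {
        "energy":      float((P ** 2).sum()),
        "contrast":    float(((Q - R) ** 2 * P).sum()),
        "entropy":     float(-(P[nz] * np.log(P[nz])).sum()),
        "homogeneity": float((P / (1.0 + (Q - R) ** 2)).sum()),
    }

flat = np.zeros((8, 8), dtype=int)          # perfectly uniform texture
f = glcm_features(glcm(flat))
print(f["energy"], f["contrast"])  # 1.0 0.0
```

A uniform image concentrates all mass in one matrix cell, giving maximal energy and homogeneity and zero contrast and entropy; rough textures spread mass away from the diagonal and push contrast and entropy up.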
2.4.1.5 Edge Histogram
Edges in an image are a basic feature for the representation of its content, and a histogram
is used to represent the edge features. Edge histograms convey the frequency and the
directionality of the intensity changes in the image; this is a unique feature of images
which cannot be replicated by a color histogram or by homogeneous texture features. To
capture this characteristic, MPEG-7 includes a dedicated descriptor for the edge distri-
bution in the image: the Edge Histogram Descriptor (EHD), which reflects the distribution
of the local edges over the image [53; 54]. To extract the feature, edges are detected at a
spatial scale by applying digital filters to image blocks. The edge histogram descriptor
represents five forms of edges: four directional edges and one non-directional edge. For
image recognition, edges play a key role in feature extraction and in the retrieval of
images with similar semantic meaning.
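The block-filtering idea behind the EHD can be sketched with the five standard 2×2 edge filters. This is a simplified global histogram; the full MPEG-7 descriptor computes such a 5-bin histogram per sub-image, 80 bins in total:

```python
import numpy as np

# The five 2x2 edge filters of the MPEG-7 EHD: vertical, horizontal,
# 45-degree, 135-degree, and non-directional.
FILTERS = np.array([[[1, -1], [1, -1]],
                    [[1, 1], [-1, -1]],
                    [[2 ** 0.5, 0], [0, -(2 ** 0.5)]],
                    [[0, 2 ** 0.5], [-(2 ** 0.5), 0]],
                    [[2, -2], [-2, 2]]])

def edge_histogram(image, thresh=10):
    """Classify each 2x2 block by its strongest edge filter response and
    accumulate a 5-bin edge-type histogram."""
    hist = np.zeros(5)
    h, w = image.shape
    for y in range(0, h - 1, 2):
        for x in range(0, w - 1, 2):
            block = image[y:y + 2, x:x + 2].astype(float)
            resp = np.abs((FILTERS * block).sum(axis=(1, 2)))
            if resp.max() >= thresh:      # only count real edges
                hist[resp.argmax()] += 1
    return hist

img = np.zeros((8, 8)); img[:, 3:] = 100   # one strong vertical edge
print(edge_histogram(img))  # [4. 0. 0. 0. 0.]: only the vertical bin fires
```

Because the histogram counts edge *types* rather than positions, it is compact, yet it still separates, say, a skyline (horizontal edges) from a forest (vertical edges), which a color histogram cannot do.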
2.4.1.6 Fractals in Spatial Feature
A fractal texture is characterized by self-similarity, which is quantified by its fractal
dimension, defined as D = logN/ log(1/r). The fractal dimension describes the roughness
of a texture image, and many researchers have used fractal techniques to obtain texture
features from an image [55; 56]. The main problems are the difficulty of estimating the
fractal dimension and the lack of self-similarity in most real-life textures. Another
problem is that visually different textures may have equal fractal dimensions [57].
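The definition D = log N / log(1/r) is usually estimated by box counting: cover the pattern with boxes of relative size r, count the occupied boxes N, and fit the slope of log N against log(1/r). A minimal sketch on a binary mask:

```python
import numpy as np

def box_counting_dimension(mask):
    """Estimate the fractal dimension D = log N / log(1/r) of a binary
    pattern by counting occupied boxes at halving box sizes and fitting
    the slope of log N versus log(1/r)."""
    size = mask.shape[0]
    sizes, counts = [], []
    box = size
    while box >= 1:
        n = 0
        for y in range(0, size, box):
            for x in range(0, size, box):
                if mask[y:y + box, x:x + box].any():
                    n += 1
        sizes.append(box); counts.append(n)
        box //= 2
    r = np.array(sizes, dtype=float) / size        # relative box size
    slope, _ = np.polyfit(np.log(1.0 / r), np.log(counts), 1)
    return slope

filled = np.ones((64, 64), dtype=bool)   # a filled square is 2-dimensional
print(round(box_counting_dimension(filled), 2))  # 2.0
```

A filled square yields D = 2 and a straight line yields D = 1; a rough texture mask falls between the two, which is exactly what makes D usable as a roughness feature.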
Spectral domain methods such as the multiresolution wavelet transform [58], discrete
cosine transform [59], Gabor filters [60], and the multiresolution simultaneous autore-
gressive model [40] have the advantage of being less sensitive to noise. Consequently,
most of these transforms have been employed extensively to represent image textures.
Another multiresolution system, the discrete curvelet transform, was created by Candès
and Donoho and is a powerful technique for representing edges; it has been successfully
employed for image denoising [61]. Even so, spatial methods remain efficient compared
with spectral methods and are widely used in CBIR.
2.4.2 Spectral Domain
Spectral domain methods analyze the density function in the frequency domain, which
means that frequency plays an important role in CBIR texture analysis. For large images,
spectral features are useful, while for small images with irregular shapes, spatial features
are more reliable. Innumerable researchers have used spectral features for CBIR. A human
can separate two different images at a glance, but when a machine tries to perform the
same job, a lot of image discriminatory information needs to be preprocessed. We now
discuss the effectiveness of several well-known transforms in representing image texture
features in CBIR techniques.
2.4.2.1 Fourier Transform
The Fourier Transform (FT) uses Fourier analysis to measure the frequency components
of a signal. The purpose of the FT is to convert a time-domain signal into the frequency
domain [62]. The FT provides the pattern information of an image, collected from its
frequency components [17], as shown in Figure 2.6 [17]. The frequency components at
specific locations of an image are utilized to represent the texture features of that image.
Texture features computed from high frequency components are the main distinguishing
factors between images used in CBIR [63]; therefore, frequency information at specific
locations is required to distinguish images. However, the disadvantage of the Fourier
transform is that it captures only the global spectral features and does not provide any
information about the exact location of these features, as shown in Figure 2.7 [62].
Figure 2.6: Image and its Fourier Transform - [17]
Figure 2.7: Different images with same Fourier Transform - [62]
The Fourier transform fails to provide proper texture pattern discrimination. Two com-
pletely different images may have similar patterns in their Fourier domain (shown in
figure 2.7): the original images on the left look different, yet their Fourier spectra have
similar patterns. Based on this spectral pattern, these images may be considered similar
in a CBIR process, although they can easily be differentiated by human perception [63].
Therefore, the Fourier transform is functional only when the spectral features of the signal
matter but not their exact location of occurrence. In [64], shape features are used for
image retrieval. However, when using texture for CBIR, the exact locations of the
different frequency components are just as important as the components themselves in
distinguishing images. Thus, the Fourier transform is not directly applicable in texture
based CBIR.
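The loss of location information can be demonstrated directly: circularly shifting a pattern changes the image but leaves the Fourier magnitude spectrum untouched (shift theorem, the shift moves into the phase). A short numpy check:

```python
import numpy as np

rng = np.random.default_rng(0)
patch = rng.random((8, 8))
img = np.zeros((32, 32)); img[0:8, 0:8] = patch          # pattern at top-left
shifted = np.roll(np.roll(img, 12, axis=0), 12, axis=1)  # same pattern, moved

mag = np.abs(np.fft.fft2(img))
mag_shifted = np.abs(np.fft.fft2(shifted))
# The magnitude spectra are identical: the FT records WHICH frequencies
# are present, not WHERE the pattern sits in the image.
print(np.allclose(mag, mag_shifted))  # True
```

This is precisely the situation of figure 2.7: two different images sharing one magnitude spectrum, indistinguishable to any purely global spectral feature.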
2.4.2.2 Discrete Cosine Transform and Wavelet Transform
The Discrete Cosine Transform (DCT) is similar to the discrete Fourier transform: a signal or image is converted from the spatial domain to the frequency domain, yielding DCT coefficients that can be used for various image processing purposes. The DCT has been adopted as an effective technique for image and video compression. It preserves the image energy in the low-frequency DCT coefficients, which makes it popular for data compression [65]. Different approaches for shape, texture and color feature extraction and indexing using the DCT can be found in [66]. Similar to the FT, DCT texture features capture only global features while ignoring local details; therefore, the DCT is not suitable for CBIR.
Wavelet transforms are used effectively for analyzing the texture information of images. A wavelet transform decomposes an image spectrum into multi-scale, oriented sub-bands; the wavelet packet transform generalizes this decomposition so that all possible combinations of the sub-band tree decomposition are obtained [67]. Wavelet packets are well localized in both time and frequency and therefore provide an attractive alternative to pure frequency analysis [61; 67].
The work of [68] builds a wavelet transform for texture analysis based on the four-tap Daubechies wavelet filter coefficients. In this strategy, the texture is decomposed into ten channels, obtained through three levels of wavelet decomposition.
In each level, the texture is split into four channels, represented as LL, LH, HL and HH. For example, the LH channel represents low horizontal and high vertical frequency. Since the HH channel contains most of the image noise, it can be discarded at each decomposition stage. The horizontal and vertical channels at each frequency are combined to obtain rotational invariance. Experiments on 16 Brodatz textures at six orientations were performed to test the performance of this scheme. The mean wavelet coefficient is extracted from each of the remaining four channels and used as an invariant feature for classification. The drawback of this procedure is that the directional information is lost when the channels are merged.
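The LL/LH/HL/HH decomposition described above can be sketched with the simplest wavelet, the Haar filter (band-naming conventions vary between authors; the labels below follow the order row-filter/column-filter, and the striped test image is an arbitrary choice, not the cited Daubechies scheme):

```python
import numpy as np

def haar_level(img):
    """One level of the 2-D Haar wavelet transform (a minimal sketch)."""
    a = img[0::2, :] + img[1::2, :]        # pairs of rows: low-pass
    d = img[0::2, :] - img[1::2, :]        # pairs of rows: high-pass
    ll = (a[:, 0::2] + a[:, 1::2]) / 4.0   # approximation band
    lh = (a[:, 0::2] - a[:, 1::2]) / 4.0   # horizontal detail
    hl = (d[:, 0::2] + d[:, 1::2]) / 4.0   # vertical detail
    hh = (d[:, 0::2] - d[:, 1::2]) / 4.0   # diagonal detail
    return ll, lh, hl, hh

# Horizontal stripes: intensity varies vertically, not horizontally.
img = np.tile(np.array([[1.0, 1.0], [0.0, 0.0]]), (4, 4))
ll, lh, hl, hh = haar_level(img)

# Energy lands in the vertical-detail band, none in the horizontal one:
print(np.abs(hl).sum() > 0, np.abs(lh).sum() == 0)   # True True
```

Repeating `haar_level` on the LL band gives the multi-level decomposition used in [68].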
2.4.2.3 Gabor Filters Transform
The Gabor filter transform is an effective multiresolution approach. Gabor filters represent the edges of an image well through the use of multiple orientations and different scales, and many researchers have exploited them in image processing. Gabor filtering produces a filter bank composed of Gabor filters with a variety of parameters, scales and orientations. Gabor filters use multiple window sizes at different levels, whereas the STFT uses only one window.
Gabor filters have been extensively used in texture representation and compared with the TWT, PWT and MR-SAR models [39; 69]. Gabor texture features were found to be the most promising and robust. Gabor filters address the problem of retrieving similar and rotated images from an image database [70]. Wavelets are not very effective at representing edge discontinuities in images [63; 71]. In the work of [46; 72], several spectral approaches were compared, and Gabor filters were observed to perform better than wavelet transforms, including orthogonal and bi-orthogonal wavelets. Gabor texture features have also been found suitable and useful in CBIR of biomedical images, such as cervicographic images for cancer detection [73]. Gabor filters use only the half-peak magnitudes in the frequency domain and do not involve image down-sampling.
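A Gabor filter bank of the kind described above can be sketched as follows (the kernel size, wavelengths and orientations are illustrative choices, not values from the cited works):

```python
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma):
    """Real part of a 2-D Gabor filter: Gaussian envelope times cosine carrier."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)    # rotate coordinates by theta
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr ** 2 + yr ** 2) / (2 * sigma ** 2))
    carrier = np.cos(2 * np.pi * xr / wavelength)
    return envelope * carrier

# A small bank: two scales (wavelengths) x four orientations.
bank = [gabor_kernel(15, wl, th, sigma=4.0)
        for wl in (4.0, 8.0)
        for th in (0.0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)]
print(len(bank), bank[0].shape)   # 8 filters of shape (15, 15)
```

Convolving an image with each kernel and taking per-filter response statistics (e.g., mean and variance) yields a Gabor texture feature vector.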
The drawback of these methods is that they do not capture the edge information of an image effectively, which motivates the search for better multiresolution spectral approaches that can. Both the discrete wavelet and Gabor filter transforms struggle with edge and orientation information. A new multiresolution scheme has been introduced by E. J. Candes and D. L. Donoho [61].
2.4.2.4 Curvelet Transform
To overcome the limitations of Gabor and wavelet filters, a new technique, the curvelet transform, was developed. The first curvelet transform strategy was implemented via the discrete ridgelet transform [61; 74]. It has proven an effective tool for image denoising [61], texture classification [75], image convolutions [76], contrast enhancement [77], etc. In the work of [62; 74], two new curvelet transform techniques depending on different operations on Fourier samples were introduced; they are given below.
1. Unequally Spaced Fast Fourier Transform (USFFT)
2. Wrapping Based Fast Curvelet Transform (WBFCT)
WBFCT is faster in computation and more robust than the ridgelet-based and USFFT-based implementations, as shown in Figure 2.8 [62]. We start with the definition of the ridgelet transform: for a given image f(x, y), the continuous ridgelet coefficients are expressed as [76; 78]
ℜ_f(a, b, θ) = ∫∫ ψ_{a,b,θ}(x, y) f(x, y) dx dy (2.1)

In equation 2.1, a > 0 is the scale parameter, b ∈ R is the translation parameter and θ ∈ [0, 2π) is the orientation parameter. The ridgelet ψ_{a,b,θ} is defined as

ψ_{a,b,θ}(x, y) = a^{−1/2} ψ((x cos θ + y sin θ − b)/a) (2.2)
Here θ is the orientation of the ridgelet. Ridgelets are constant along the lines x cos θ + y sin θ = constant, and transverse to these ridges they are wavelets [76]. A ridgelet gives information about edge direction and is much faster than a conventional sinusoidal wavelet. In this approach, the input image is first decomposed into a set of sub-bands, which are then divided into several blocks for ridgelet analysis. The process generates a large amount of redundant data, so it is slow, time consuming and not effective for large databases.
Therefore, the fast discrete curvelet transform (FDCT) based on wrapping of Fourier samples has a lower computational complexity, because it uses the FFT instead of the complex ridgelet transform. In this approach, the curvelets are tightly supported in the frequency domain to reduce data redundancy. Normally, ridgelets use a fixed length equal to the image size with a variable width, whereas curvelets have both variable length and width, giving more anisotropy [79]. WBFCT is therefore simpler, less redundant, more straightforward and faster in computation than the ridgelet-based curvelet transform. The frequency plane is fully covered by the curvelet spectra, so no spectral information is lost or wasted, unlike with the Gabor filter.
Figure 2.8: Five-level curvelet digital tiling of an image - [62]
Majumder has described a method to automate Bangla basic character recognition using the ridgelet-based curvelet transform [80]. There are fifty characters in the Bangla language, and all existing Bangla fonts use these characters. For his experiment, Majumder morphologically altered each character by thinning and thickening the original characters twice. In the training phase, curvelet coefficients have been extracted from all
these characters to generate texture feature descriptors, and 5 sets of classifiers have been created for each character. The characters are altered to capture the variation across different fonts by slightly varying their edge positions. Curvelet texture features of the query character are then compared with the training sets to find matching characters. He performed the experiment on only twenty well-known Bangla fonts; therefore, there is no guarantee that this application will recognize all characters in complex formats as well.
In the work of [80], Joutel et al. created an assistance tool for the identification of ancient handwritten manuscripts using the ridgelet-based curvelet transform. The curvature and orientation of handwritten scripts are the two main morphological shape properties used to generate discrete curvelet features. Joutel et al. focused on characterizing handwritings and classifying them into visual writer families. Problems in historical manuscript classification include difficulty in segmenting lines and words,
nonlinear text size differences, irregular handwritten shapes, difficulty in the recognition
of spaces or edges due to lack of pen pressure, unpredictable page layouts, etc. Moreover,
backgrounds of many ancient documents have noisy texture patterns. Although the clas-
sification and writer recognition tests computed on two separate databases obtain a high
level of accuracy, this approach has some shortcomings. One orientation representation
and one curvature representation have been generated in this approach from each script,
which is not enough to classify and characterize all ancient handwritten scripts. Texture patterns of the image are not represented in this approach, so it will not be effective for natural image retrieval.
In the work of [75], texture classification by statistical and co-occurrence features using the ridgelet-based discrete curvelet transform is presented. In this work, texture classification depends on three different feature descriptors. The first consists of curvelet statistical features (CSF), i.e., mean and standard deviation. The second consists of curvelet co-occurrence features (CCF), i.e., cluster shade, contrast, local homogeneity and cluster prominence. The last involves the combination of the CCF and CSF descriptors. All the related works on the curvelet transform described above use the ridgelet-based curvelet transform. So far, we have found only one application of wrapping
based curvelet transform, in a texture classification method for analyzing medical images gathered from computed tomography [75]. The main problems in this approach are the existence of a large number of similar images in the database and the small image dimensions. Because the database contains only human tissue images, it has less variation in its domain than natural image databases. Tissue texture of the same human organ is expected to show negligible differences in such small images (32 x 32), which makes the classification task simple. Natural images are quite different in nature; therefore, this process will not be effective for CBIR in a large database with large natural images.
We find that the ridgelet-based curvelet transform has some drawbacks. The newer curvelet transforms, the USFFT-based and wrapping-based fast discrete curvelet transforms, have several advantages over ridgelets as well as over the ridgelet-based curvelet transform. Among these new approaches, the wrapping-based discrete curvelet transform provides additional benefits such as robustness, simplicity and good edge-capturing capability in image texture representation. We have also described how the wrapping-based curvelet transform works, detailing curvelet structures in the spatial and spectral domains and how these curvelets provide better texture discrimination in representing edges.
2.4.3 Color Feature
Color is an important visual feature and is pervasive in digital images. The extraction of color features from digital images depends on a thorough understanding of color theory for digital images. A color space specifies a three-dimensional coordinate color system and a subspace within that system in which colors are represented. The most widely used color space for digital photos and computer graphics is the RGB color space, where colors are linear combinations of the red, green and blue channels. Moreover, most digital images store pixel values in RGB format. The geometry of the RGB color space is depicted in figure 2.9 [81].
Different color spaces are analyzed based on three properties. The properties of interest for a color space are uniformity, completeness, and uniqueness. The
Figure 2.9: RGB Color Space - [81]
RGB color space is not perceptually uniform. Among the color spaces, HSV and HMMD are more useful for measuring perceptual similarity. Popular techniques for color feature extraction include the color histogram, the color coherence vector, HSV, YCbCr, HMMD and the dominant color descriptor.
2.4.3.1 Color Histogram
The color histogram is a very simple color feature that represents the color distribution of an image. It has many applications in image retrieval and object recognition [63; 82]. The color space is divided into bins, and each bin has its own frequency. The color histogram is invariant to translation and rotation. Since colors are grouped into bins, every color occurrence contributes to the score of the bin it belongs to in the image. The bins generally indicate the quantity of red, green and blue found in the pixels rather than identifying individual colors [82]. RGB color histograms are usually normalized so that images of different sizes can be compared, as shown in figure 2.10 [82].
However, the color histogram lacks spatial information about pixels, so relationships between different image parts cannot be maintained: two different objects or visuals with the same colors can have the same color histogram, and important information such as shape and texture is ignored.
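For illustration, a quantized joint RGB histogram and its translation invariance can be sketched as follows (the bin count and the two-tone test image are arbitrary choices):

```python
import numpy as np

def rgb_histogram(img, bins=4):
    """Quantize each channel into `bins` levels and count joint occurrences.
    `img` is an (H, W, 3) uint8 array; the result is normalized to sum to 1."""
    q = (img.astype(np.int64) * bins) // 256            # per-channel bin index
    idx = (q[..., 0] * bins + q[..., 1]) * bins + q[..., 2]
    hist = np.bincount(idx.ravel(), minlength=bins ** 3)
    return hist / hist.sum()

img = np.zeros((8, 8, 3), dtype=np.uint8)
img[:, :4] = [255, 0, 0]                                # left half pure red
img[:, 4:] = [0, 255, 0]                                # right half pure green
h = rgb_histogram(img)

# Translating the image does not change its histogram:
print(np.allclose(h, rgb_histogram(np.roll(img, 3, axis=1))))   # True
```

Any rearrangement of the same pixels yields this identical histogram, which is precisely the loss of spatial information discussed above.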
Figure 2.10: Image and its color histogram - [82]
2.4.3.2 Color Coherence Vector
The Color Coherence Vector (CCV) differs from the color histogram in that it captures spatial information in an image [83]. It was constructed to improve on the performance of the global color histogram (GCH), although the computational complexity of a CCV is higher than that of a color histogram. In the work of [83], pixels are classified as coherent or incoherent according to their membership of contiguous color regions: color regions larger than 1% of the image size are considered coherent, and smaller ones are taken as incoherent. Owing to the separation of coherent and incoherent pixels, with significant regions mapping to coherent pixels, CCV performance is better than that of a color histogram. A color histogram can be the same for two different images, while with CCV the difference can be distinguished by separating the histograms of the coherent and incoherent regions. In [84], an efficient image retrieval system using interior pixel classification was proposed. In the image analysis step, the image is quantized in RGB space and pixels are classified as interior or border based on their location, using the 4-neighbor rule for the sake of simplicity. Two histograms are calculated, for interior and border pixels, and a binary classification of image pixels has also been proposed. Owing to the classification of interior and border pixels, the resulting histograms are discriminative. If there are fewer interior pixels for the same color, there must be some visual property of the image that is useful for creating the difference [85; 86]. Three images with their CCV color feature vectors are shown in
figure 2.11 [86].
Figure 2.11: Images and their CCV color feature vectors - [86]
2.4.3.3 HSV Color Space
In the HSV color space, images are treated as groups of pixels comprising red, green and blue values [86; 87]. HSV addresses the color distribution problem by representing color in a way that is close to human perception [88]. In the HSV model, Hue represents the color, Saturation is the amount of the color, and Value represents the amount of light [88; 89], as shown in figure 2.12 [86].

Hue represents the dominant spectral component, such as red, yellow or green. Saturation (S) expresses the purity of the color, while V represents the brightness. The subspace is a cylindrical coordinate system, commonly depicted as an inverted six-sided pyramid. Changes in each color component are perceived approximately linearly, and similarity measures can be defined directly on the color components. To convert primary RGB colors to HSV, the minimum and maximum values of the RGB triplet are first computed:
H = cos⁻¹ { ½[(R − G) + (R − B)] / √[(R − G)² + (R − B)(G − B)] } (2.3)

S = 1 − 3 min(R, G, B) / (R + G + B) (2.4)
Figure 2.12: The RGB and HSV Color Space - [86]
V = (1/3)(R + G + B) (2.5)
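A direct implementation of equations 2.3-2.5 for a single normalized RGB triplet can be sketched as follows (the small epsilon guarding against division by zero is our addition):

```python
import numpy as np

def rgb_to_hsv(r, g, b):
    """HSV from normalized RGB following Eqs. (2.3)-(2.5);
    inputs in [0, 1], H returned in radians."""
    num = 0.5 * ((r - g) + (r - b))
    den = np.sqrt((r - g) ** 2 + (r - b) * (g - b)) + 1e-12  # avoid /0
    h = np.arccos(np.clip(num / den, -1.0, 1.0))
    if b > g:                       # hue lies in [pi, 2*pi] when B > G
        h = 2 * np.pi - h
    s = 1.0 - 3.0 * min(r, g, b) / (r + g + b + 1e-12)
    v = (r + g + b) / 3.0
    return h, s, v

h, s, v = rgb_to_hsv(1.0, 0.0, 0.0)            # pure red
print(round(h, 3), round(s, 3), round(v, 3))   # 0.0 1.0 0.333
```

Pure red yields H = 0 (red is the hue origin), full saturation and V = 1/3, as the equations predict.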
2.4.3.4 HUE MINIMUM MAXIMUM DIFFERENCE (HMMD)
The HMMD (hue-min-max-difference) color space is a newer color scheme supported in MPEG-7, together with simple monochrome grayscale and intensity-only spaces. Hue is defined and calculated as in the HSV color space, and min and max are the minimum and maximum among the R, G and B values. The difference component is defined as the difference between the maximum and minimum values [28; 90]. This color space can be represented using a double-cone structure, as shown in figure 2.13 [28].
2.4.3.5 Dominant Color Descriptor (DCD)
The Dominant Color Descriptor (DCD) is a variation of the color histogram that extracts colors from the highest bins of the histogram [91]. A bin-height threshold is used to select the color bins. According to the MPEG-7 standard, 1-8 colors are sufficient for the representation
Figure 2.13: HMMD Color Space - [28]
of a region. The selected color bins are adapted to the regions rather than to the color space. The performance of DCD is more accurate than that of the color histogram [82]. Many-to-many matching is used for distance calculation and similarity checking.
2.4.3.6 YCbCr Color Space
The difference between YCbCr and RGB is that YCbCr represents color as a brightness component plus two color-difference signals, while RGB consists of red, green and blue components. In the YCbCr color space, Y represents the brightness component, called luma; Cb is blue minus luma (B-Y) and Cr is red minus luma (R-Y). This color space exploits the characteristics of the human eye, which is more sensitive to changes in light intensity and less sensitive to changes in hue. When the amount of information must be minimized, the Y component can therefore be stored more accurately than the Cb and Cr components.
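The conversion can be sketched with the common BT.601 weights (the exact coefficients depend on the standard in use; those below are an assumption on our part, not values taken from this thesis):

```python
def rgb_to_ycbcr(r, g, b):
    """Luma plus two color-difference signals (BT.601-style weights)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b   # luma: weighted brightness
    cb = 0.564 * (b - y)                    # blue-difference chroma (B - Y)
    cr = 0.713 * (r - y)                    # red-difference chroma (R - Y)
    return y, cb, cr

# A gray input carries no chroma: Cb and Cr vanish when r = g = b.
y, cb, cr = rgb_to_ycbcr(0.5, 0.5, 0.5)
print(round(y, 6), round(abs(cb), 6), round(abs(cr), 6))   # 0.5 0.0 0.0
```

Since gray pixels produce zero chroma, the Cb and Cr channels can be subsampled or coarsely quantized with little perceptual loss, which is what the paragraph above describes.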
2.4.4 Shape Feature
Shape is another important visual component of an image, used to recognize real-world objects. Shape features have been used in many applications, including image retrieval. The work of [92] broadly classifies shape extraction techniques into two groups: contour-based and region-based methods. Contour-based techniques compute shape features only from the boundary of the shape, while region-based approaches extract features from the entire region of an image. Since contour-based schemes use only a small portion of the region in an image, they are more vulnerable to noise than region-based methods. For that reason, region-based shape features are preferred for extracting shape features in image retrieval.
Shape descriptors normally used in color image retrieval include moments, circularity, area and eccentricity. Area-based descriptors are employed in several image processing works. In the work of [93], eccentricity, or elongation, is used: eccentricity is the ratio of the major axis length to the minor axis length. Descriptors are therefore usually merged to form a more effective shape descriptor. The type of method determines whether the shape is extracted on the basis of contours or regions; structural and global approaches have been defined within both. Figure 2.14 [92] shows the classification of shape representation and description techniques.
Figure 2.14: Classification of shape representation and description technique - [92]
2.4.4.1 Shape Extraction based on Contour Method
Object boundary is used in contour shape extraction. Two different types of approaches
are used in contour shape methods that are global approach (continuous) and structural
approach (discrete). In global /continuous approach, a shape is not divided into parts and
obtained feature vector from integral boundary for the shape description. Shape similarity
is considered as a distance metric amongst the feature vectors. In discrete shape boundary
approach is broken into small segments which are known primitives by using particular
criterion and final feature representation is build a string [92]. Multidimensional feature
vector of numeric from shape boundary are extracted in global contour shape represen-
tation. The matching process is based on metric distance e.g. city block distance or
Euclidean distance.
2.4.4.2 Simple shape descriptors
The Simple shape descriptor consist of Area, eccentricity, major axis orientation and cir-
cularity [94]. The simple shape descriptors are used for large differences of shape dis-
crimination. Filter are used to remove the false hits or other combined shape descriptors
for discrimination.
Figure 2.15: Shape eccentricity and circularity - [28]
Figure 2.15 [28] illustrates shape eccentricity and circularity. Figure 2.15 (a) shows a parabola, but eccentricity alone does not describe the shape correctly: it appears elongated, yet treating it as circular may seem the better option. Likewise, the shapes in figure 2.15 (b) and figure 2.15 (c) have the same circularity despite being different shapes.
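For illustration, eccentricity can be computed from the central second moments of a binary region (this ratio-of-axes formulation is a common textbook choice, not necessarily the one used in [93]):

```python
import numpy as np

def eccentricity(mask):
    """Eccentricity of a binary region via its central second moments:
    0 for a circle/square, approaching 1 for elongated shapes."""
    ys, xs = np.nonzero(mask)
    x = xs - xs.mean()
    y = ys - ys.mean()
    mxx, myy, mxy = (x * x).mean(), (y * y).mean(), (x * y).mean()
    common = np.sqrt((mxx - myy) ** 2 + 4 * mxy ** 2)
    l1 = (mxx + myy + common) / 2          # variance along the major axis
    l2 = (mxx + myy - common) / 2          # variance along the minor axis
    return np.sqrt(1 - l2 / l1)

square = np.ones((11, 11), dtype=int)      # symmetric region
bar = np.ones((3, 21), dtype=int)          # elongated region
print(round(eccentricity(square), 3), round(eccentricity(bar), 3))   # 0.0 0.991
```

As the figure discussion suggests, a single such number cannot fully characterize a shape, which is why these descriptors are merged in practice.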
More informative simple shape descriptors include convexity, elliptic variance and circular variance. Similarity can also be checked by means of point-to-point matching, and
every point is treated as a feature point [94]. The Hausdorff distance is a classical example of correspondence-based shape matching; it is used for object location in an image and can compute the similarity between shapes [92]. An advantage of the Hausdorff distance is that it supports partial matching of shapes; however, it is not invariant to rotation, translation or scaling. For matching, the shape model is overlapped on the image in different positions, orientations and scales, which makes matching expensive.
2.4.4.3 Shape signature
A shape signature is a one-dimensional function derived from the shape boundary. Centroid profile, centroid distance, complex coordinates, tangent angle, chord length, area and curvature are examples of shape signatures [95]. The best match between shapes is obtained from a shift-matching factor that depends on orientation changes in 1-D space. Some signature matching, for example with centroid profiles, requires shift matching in 2-D space, and the matching cost is too high for online retrieval. Shape signatures are sensitive to noise: a very small change in the boundary can produce a large matching error, which makes shape signatures on their own a poor way to describe shape. Further work is required to increase performance and reduce machine load. A shape signature can be simplified by quantization into a signature histogram, which is invariant.
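The centroid-distance signature, one of the simplest signatures listed above, can be sketched as follows (the sampling density and the circular test shape are arbitrary choices):

```python
import numpy as np

def centroid_distance(boundary, n=64):
    """Centroid-distance shape signature: distance from the region centroid
    to n points sampled along the boundary (a minimal sketch)."""
    c = boundary.mean(axis=0)                          # centroid
    idx = np.linspace(0, len(boundary) - 1, n).astype(int)
    pts = boundary[idx]
    return np.hypot(*(pts - c).T)                      # per-point distances

# Boundary of a circle of radius 5: the signature is (nearly) constant.
t = np.linspace(0, 2 * np.pi, 200, endpoint=False)
circle = np.stack([5 * np.cos(t), 5 * np.sin(t)], axis=1)
sig = centroid_distance(circle)
print(sig.std() < 1e-6, round(sig.mean(), 2))   # True 5.0
```

A small boundary perturbation directly perturbs individual signature entries, which is the noise sensitivity noted above.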
2.4.4.4 Stochastic method
Time-series techniques, especially autoregressive (AR) modelling, have been used to compute shape descriptors. Techniques in this class are mostly based on stochastic modelling of a 1-D function f obtained from the shape. A linear regression model expresses the value of the function as a linear combination of some of its previous values [96]. The autoregressive model predicts the boundary radius from a combination of M previous observations and an error term. AR methods fail for complex boundaries, where the AR parameters are not adequate for description.
2.4.4.5 Spectral transform
Spectral transform shape descriptor overcomes the problem of boundary variations by an-
alyzing the shape and noise sensitivity in spectral domain. Spectral descriptor includes
wavelet and fourier descriptor, both are derived from 1D shape signature. Researchers
have proposed the wavelet descriptor regarding shape description which includes a ben-
efit over Fourier descriptor and multi resolution have been presented in both spatial and
spectral space. The increase of spatial resolution will absolutely give up frequency res-
olution. In the work of [92; 95; 96] low frequencies of wavelet coefficient are used for
shape representation.
In global contour shape methods, the representation depends on the contour of the object, and matching between shapes is done in the feature space. Accuracy and low computational cost are the main requirements for any shape representation system. Global shape descriptors are compact but inaccurate; combining them with other useful shape descriptors can help. Signature-based and correspondence-based shape matching are not adequate for online shape matching because they involve 2-D matching of two shapes. When only partial matching is required, the Hausdorff distance is a better choice. Autoregressive methods involve complex matrix operations and hence more computation. Fourier descriptors are simple to implement and computationally cheap thanks to the fast Fourier transform.
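A minimal Fourier-descriptor construction along these lines (complex boundary signature; invariances obtained by dropping the DC term and normalizing by the first harmonic, a standard textbook recipe, not a method from the cited works) might look like:

```python
import numpy as np

def fourier_descriptor(boundary, k=10):
    """Fourier descriptor from a complex boundary signature.
    Dropping the DC term gives translation invariance; normalizing by the
    first harmonic magnitude gives scale invariance."""
    z = boundary[:, 0] + 1j * boundary[:, 1]   # complex coordinates signature
    f = np.fft.fft(z)
    mags = np.abs(f[1:k + 1])                  # skip DC -> translation invariant
    return mags / mags[0]                      # normalize -> scale invariant

t = np.linspace(0, 2 * np.pi, 128, endpoint=False)
ellipse = np.stack([3 * np.cos(t), np.sin(t)], axis=1)
shifted_scaled = 2.0 * ellipse + np.array([10.0, -4.0])   # moved and enlarged

fd1 = fourier_descriptor(ellipse)
fd2 = fourier_descriptor(shifted_scaled)
print(np.allclose(fd1, fd2))   # True: descriptor ignores shift and scale
```

The normalization assumes a nonzero first harmonic, which holds for the closed curves considered here; taking magnitudes also discards the starting-point dependence.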
2.4.5 Interest point detector
Interest point detectors play a significant role in feature extraction. Most CBIR frameworks use global features such as texture, shape and color, or combinations of these, to extract the required features from images. Similar images can also be retrieved using local features that are robust to various transformations; such features describe a pixel in the image through the content of its local neighborhood. Local descriptors such as SIFT, the Harris detector and SURF search for distinctive locations, called interest points, in an image.
2.4.5.1 Harris detector
The Harris detector has been widely used in object detection and image retrieval for its repeatable detection performance. The basic idea of this detector is to use the autocorrelation function to determine locations where the signal changes in one or two directions [97]. The interest points detected by the Harris detector are not invariant to scale changes. The Harris-Laplace detector can find scale-invariant features. The first step of this approach is to compute the interest points, called Harris points, at different scales; points that are local maxima of the Harris measure are selected by Harris-Laplace due to its high detection rate [98]. The scale at which the Laplacian attains its maximum is taken as the characteristic scale of the interest point.
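The Harris measure R = det(M) − k·trace(M)² on the structure tensor M can be sketched as follows (a deliberately minimal version: a 3 x 3 box average stands in for the usual Gaussian window, and k = 0.04 is the customary choice):

```python
import numpy as np

def harris_response(img, k=0.04):
    """Harris corner measure from the gradient structure tensor."""
    iy, ix = np.gradient(img.astype(float))
    def box(a):                      # 3x3 neighborhood average
        p = np.pad(a, 1, mode='edge')
        return sum(p[i:i + a.shape[0], j:j + a.shape[1]]
                   for i in range(3) for j in range(3)) / 9.0
    ixx, iyy, ixy = box(ix * ix), box(iy * iy), box(ix * iy)
    det = ixx * iyy - ixy ** 2
    return det - k * (ixx + iyy) ** 2

img = np.zeros((20, 20))
img[10:, 10:] = 1.0                  # a single corner at (10, 10)
r = harris_response(img)
peak = tuple(map(int, np.unravel_index(np.argmax(r), r.shape)))
print(peak)                          # at (or next to) the corner (10, 10)
```

Along the straight edges the response is negative (one dominant gradient direction), and only at the corner, where the signal changes in both directions, does R peak.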
2.4.5.2 Speeded up robust feature (SURF)
The Speeded Up Robust Features (SURF) algorithm is a scale- and rotation-invariant interest point detector and descriptor that is computationally very fast. The detector finds points of interest in the image; the descriptor represents the features of the interest points by building the distribution of Haar wavelet responses in the neighborhood of each interest point. An integral image is computed from the input image so that these responses can be evaluated quickly [99; 100]. The main steps of the SURF technique are as follows:
1. Interest point detection
2. Interest point description
3. Feature matching
Interest point detection
Integral images are used to reduce computation time. Owing to its good accuracy, a Hessian matrix approximation computed with box convolution filters is used to detect the interest points, as shown in figure 2.16. The Hessian matrix determines the location and scale of the descriptor.
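The integral image trick behind this speed can be sketched in a few lines (the array size and the tested box are arbitrary choices):

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero top row/column for easy indexing."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def box_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] in four lookups, independent of box size --
    the property SURF's box filters rely on."""
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64))
ii = integral_image(img)
print(box_sum(ii, 8, 8, 40, 40) == img[8:40, 8:40].sum())   # True
```

Because every box sum costs four lookups regardless of box size, the box filters approximating the Hessian can be evaluated at any scale for a constant cost.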
Figure 2.16: SURF interest point -
Interest Point Description
SURF describes the distribution of intensity content in the neighborhood of each interest point. The distribution of first-order Haar wavelet responses is used so that integral images can be exploited for speed. To make the descriptor invariant to image rotation, a reproducible orientation is identified for each interest point: the Haar wavelet responses are computed in the x and y directions inside a neighborhood of radius 6s around the interest point, where s is the detected scale. Six operations are required to compute the response in the x or y direction at any scale.

In the first step, a square region centered on the interest point and aligned with the selected orientation is constructed. To preserve important spatial information, the region is split into 4 x 4 square sub-regions. The Haar wavelet responses in the horizontal and vertical directions are called dx and dy respectively. To increase robustness to geometric distortions and localization errors, dx and dy are first weighted with a Gaussian of standard deviation 3.3s centered at the interest point. The sums of the wavelet responses dx and dy over each sub-region then form a first set of entries in the feature vector [99; 100].
SURF Descriptors Matching
SURF descriptors of two images are matched with a nearest-neighbor (NN) algorithm, which classifies objects based on the closest training examples in feature space. First, a training process builds a repository of objects whose correct classes are already known. A query is then given to the system, and the objects are classified by finding the nearest neighbors of the query in the repository. The system classifies the query as belonging to the same class as its
nearest neighbor, as shown in figure 2.17. Figure 2.17 (a) shows matching between an original image and a cropped image, and (b) shows matching between an original image and its scaled and rotated version.
Figure 2.17: SURF interest point -
2.4.5.3 Scale Invariant Feature Transform (SIFT)
David Lowe introduced the Scale Invariant Feature Transform (SIFT) to describe local image features [101]. The SIFT procedure consists of four stages:

1. Scale space extrema detection

2. Key point localization

3. Orientation assignment

4. Key point descriptor
Scale-space extrema detection
The local interest points detected by the SIFT algorithm are called key points. In this stage, the algorithm searches over all scales and image locations; this can be implemented efficiently using a difference-of-Gaussians function to identify potential interest points that are invariant to scale and orientation.
Key point localization

The next stage performs a detailed fit to the nearby data for location, peak magnitude and edge response. A location is identified in image scale space that is invariant to rotation, scaling and translation of the image. At each candidate location, a detailed model is fit to determine location, scale and contrast. Key points are selected on the basis of measures of their reliability and stability. To characterize the image at each key point location, the image is processed to extract image gradients and orientations [101; 102].
Orientation assignment
One or more orientations are assigned to each key point location based on local image properties. The key point's scale is used to select the Gaussian-smoothed image L with the closest scale, so that all computations are performed in a scale-invariant manner.
Key-point descriptor
The local image gradients are measured at the selected scale in the region around each key point and transformed into a representation that tolerates local shape distortion and changes in illumination. First, the gradient magnitudes and orientations are computed in the image region around the key-point location. The samples are weighted by a Gaussian window, indicated by the overlaid circle, and accumulated into orientation histograms summarizing the contents over 4x4 sub-regions, as shown in figure 2.18; the length of each arrow corresponds to the gradient magnitudes along that direction in the image region. SIFT features are highly distinctive, so object matching works efficiently even for large databases [101; 102]. Figure 2.18 illustrates the image gradients and the key-point descriptor.
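The orientation-histogram construction described above can be sketched in a simplified form. This is a single histogram cell rather than the full Gaussian-weighted 4x4 grid of SIFT; the function name, bin count and test patch are illustrative, not part of the original algorithm specification.

```python
import math

def orientation_histogram(gray, bins=8):
    """Gradient-orientation histogram of an image patch, weighted by
    gradient magnitude (a simplified version of one SIFT sub-region cell)."""
    hist = [0.0] * bins
    h, w = len(gray), len(gray[0])
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # central-difference gradients
            dx = gray[y][x + 1] - gray[y][x - 1]
            dy = gray[y + 1][x] - gray[y - 1][x]
            mag = math.hypot(dx, dy)
            theta = math.atan2(dy, dx) % (2 * math.pi)
            hist[int(theta / (2 * math.pi) * bins) % bins] += mag
    return hist

# A patch with a purely horizontal intensity ramp: all gradient energy
# falls into the bin around 0 radians.
patch = [[x for x in range(5)] for _ in range(5)]
print(orientation_histogram(patch))
```

In the real descriptor, sixteen such histograms (one per 4x4 sub-region) are concatenated and normalized to form the 128-dimensional SIFT vector.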
Figure 2.18: SIFT interest points [101]
2.5 Similarity Measure
In CBIR, finding a similarity measure that is consistent with human perception and maps the user's information need is a fundamental problem. A simple way to detect similarity between images is through distance measures. In particular, a specific distance measure could be designed for a single visual feature in a certain space to match the perceptual similarity [103; 104]. However, simple distance measures are not always effective; therefore more complex methods, which may also be more effective, are desirable for CBIR. Several distance measures have been introduced in the literature, categorized either for metric spaces or for histograms.
2.5.1 Metric Space
When the feature vectors used to represent images correspond to points in a metric space, similarity is usually determined by computing the distance between the corresponding points in that space. Various metric distances can be used for this purpose [105].
2.5.1.1 Manhattan Distance
Manhattan distance is also known as the L1, taxicab or city-block distance. This similarity metric is computed between two points $P = (p_1, p_2, \ldots, p_n)$ and $Q = (q_1, q_2, \ldots, q_n)$ as the sum of their absolute coordinate differences:

$$\mathrm{Dist}_{L1}(P,Q) = \sum_{i=1}^{n} |p_i - q_i| \qquad (2.6)$$
2.5.1.2 Euclidean Distance
Euclidean distance is frequently referred to as the L2 distance, and measures the shortest path between two points. It is computed as:

$$\mathrm{Dist}_{L2}(X,Y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \qquad (2.7)$$

Some applications like MARS [106] add a weight component to the Euclidean distance to compute the similarity. The modification can be represented as:

$$\mathrm{Dist}_{L2}(X,Y) = \sqrt{\sum_{i=1}^{n} w_i (x_i - y_i)^2} \qquad (2.8)$$

and is known as the weighted Euclidean distance.
2.5.1.3 Minkowski Distance
The Minkowski distance is a generalization of the L1 and L2 metrics, in which a parameter P controls how the distance is calculated [105].

$$\mathrm{Dist}_{LP}(X,Y) = \left(\sum_{i=1}^{n} |x_i - y_i|^P\right)^{1/P} \qquad (2.9)$$

Choosing P = 1 yields the Manhattan distance, P = 2 yields the Euclidean distance, and the limit P → ∞ yields the Chebyshev distance. Fractional distances can be obtained by choosing 0 < P < 1; as these violate the triangle inequality, they are not metric distances.
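A minimal sketch covering equations (2.6), (2.7) and (2.9) through the single Minkowski form (the function name and vectors are illustrative):

```python
def minkowski(x, y, p):
    """L_p distance between two equal-length feature vectors."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

x, y = [1.0, 2.0, 3.0], [4.0, 6.0, 3.0]
print(minkowski(x, y, 1))  # Manhattan (L1): |1-4| + |2-6| + |3-3| = 7.0
print(minkowski(x, y, 2))  # Euclidean (L2): sqrt(9 + 16) = 5.0
```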
2.5.1.4 Hausdorff Distance
In region-based image retrieval the Hausdorff distance is used, which can be computed as:

$$\mathrm{Dist}_{H}(X,Y) = \max\left(\max_i \min_j D(x_i, y_j),\; \max_j \min_i D(y_j, x_i)\right) \qquad (2.10)$$

where $D(x_i, y_j)$ and $D(y_j, x_i)$ are the underlying distances between the vectors of $X = (x_1, x_2, \ldots, x_i)$ and $Y = (y_1, y_2, \ldots, y_j)$.
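Equation (2.10) can be sketched directly; the underlying distance D is taken here as L1, and the point sets are illustrative:

```python
def hausdorff(X, Y, d):
    """Symmetric Hausdorff distance between two point sets X and Y."""
    h_xy = max(min(d(x, y) for y in Y) for x in X)   # max_i min_j D(x_i, y_j)
    h_yx = max(min(d(y, x) for x in X) for y in Y)   # max_j min_i D(y_j, x_i)
    return max(h_xy, h_yx)

l1 = lambda a, b: sum(abs(u - v) for u, v in zip(a, b))
X = [(0, 0), (1, 0)]
Y = [(0, 1), (3, 0)]
print(hausdorff(X, Y, l1))  # 2
```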
2.5.2 Histogram Distance
In image retrieval histograms are frequently used, especially when color features are used for image representation. Histograms can alternatively be treated as probability distributions, in which case the likelihood of an image matching the query concept is often considered.
2.5.2.1 Earth Mover's Distance
This distance metric computes the distance between two weighted distributions by converting values of the first distribution into those of the second. The distance can be computed as [107]:

$$\mathrm{Dist}_{EMD}(X,Y) = \frac{\sum_{i=1}^{m}\sum_{j=1}^{n} f_{ij}\, c_{ij}}{\sum_{i=1}^{m}\sum_{j=1}^{n} f_{ij}} \qquad (2.11)$$

In the above equation $X = \{(x_1, w_{x_1}), \ldots, (x_m, w_{x_m})\}$ and $Y = \{(y_1, w_{y_1}), \ldots, (y_n, w_{y_n})\}$, where $x_i$ and $y_j$ are the cluster representatives and $w_{x_i}$ and $w_{y_j}$ are the corresponding cluster weights. Furthermore, $c_{ij}$ represents the distance between clusters $i$ and $j$, and $f_{ij}$ represents the optimal flow in converting distribution X into Y.
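Computing the optimal flow in equation (2.11) generally requires solving a transportation (linear programming) problem. In the special case of two 1-D histograms with equal total weight, however, the optimal cost reduces to the accumulated bin-to-bin difference, which the following sketch illustrates (the function name and histograms are illustrative):

```python
def emd_1d(p, q):
    """EMD between two 1-D histograms with equal total weight:
    cumulative-difference form (no LP solver needed in this special case)."""
    assert abs(sum(p) - sum(q)) < 1e-9
    emd, carry = 0.0, 0.0
    for pi, qi in zip(p, q):
        carry += pi - qi      # mass that must flow to the next bin
        emd += abs(carry)     # cost of moving that mass one bin
    return emd

print(emd_1d([0.0, 1.0, 0.0], [0.0, 0.0, 1.0]))  # move 1 unit one bin: 1.0
```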
2.5.2.2 Kullback-Leibler (KL) Divergence
For two probability distributions X and Y, the KL divergence is an asymmetric dissimilarity measure, often used for measuring the similarity of texture features. One way of interpreting it is as the number of extra bits needed when a code optimized for Y is used to encode events sampled from X. The KL divergence between two distributions can be computed, in the continuous and discrete cases respectively, as:

$$D_{KL}(X \,\|\, Y) = \int_{-\infty}^{\infty} X(t)\,\log\frac{X(t)}{Y(t)}\,dt, \qquad D_{KL}(X \,\|\, Y) = \sum_{i} X(i)\,\log\frac{X(i)}{Y(i)} \qquad (2.12)$$
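A minimal sketch of the discrete form of equation (2.12); the distributions are illustrative, and bins where X is zero contribute nothing by convention:

```python
import math

def kl_divergence(p, q):
    """Discrete KL divergence D_KL(P || Q); asymmetric and non-negative."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl_divergence(p, q))   # != kl_divergence(q, p): the measure is asymmetric
```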
2.6 Performance Evaluation
When evaluating a CBIR system, the main objective is to determine its performance.
Performance can be evaluated in terms of accuracy, e.g. how many mistakes the image
retrieval algorithm makes, or in terms of computational complexity, e.g. how quickly the system presents the results. In this section, a number of popular performance measures that are frequently used to evaluate CBIR systems are presented.
2.6.1 Accuracy
The major emphasis of most researchers is on assessing how well their proposed method works, especially when comparing it with the work of other researchers. In the domain of CBIR, image retrieval algorithms are designed so that they increase the correct categorization rate, or at least ensure that the top M images returned as the system response are relevant to the user. Four cases can be distinguished when an algorithm assigns a label to an image [105], as shown in figure 2.19. From the user's point of view, retrieval accuracy should consider only the true positives, as these are the related images for any query image. But since the displayed output contains a limited number of images, false positives can influence the number of correctly labeled relevant images, as one or more of the displayed images may actually be incorrectly labeled as relevant. This is the reason that image retrieval performance is usually reported in terms of precision and recall values.
Figure 2.19: Accuracy Parameters - Correct and incorrect labeling of an image -
[105]
2.6.1.1 Precision
Precision indicates how exact an algorithm is in retrieving the relevant images. Using the terminology of true and false positives and negatives shown in figure
2.19, precision can be expressed as follows:

Precision = true positives / (true positives + false positives)

But if a scope factor is also present in the system response, then the precision of image retrieval is defined as:

Precision = true positives / images in scope of response
Since the number of relevant images in a specific category can be large, it is not possible to display all of them at once. Therefore, a constraint is kept that a fixed number of images is displayed in a single go; this limit is known as the scope. For example, suppose there are 100 relevant images for a query image and we present 20 images as the system output in a single go: these 20 images constitute the retrieval scope, and in this case the precision is known as the precision rate. When precision values are plotted at multiple scope levels, the resulting graph is called a precision-scope graph. In relevance-feedback-based CBIR, the precision is often plotted against the number of iterations; to create this precision-iteration graph the scope is fixed at a certain value, usually the number of images that are displayed in a single screen.
2.6.1.2 Recall
Recall indicates the completeness of a CBIR system in terms of returning the relevant images as its response, i.e. the percentage of relevant images retrieved at different scope levels. Recall can be calculated as:

Recall = true positives / (true positives + false negatives)

Recall is also known as the sensitivity of the system. In recall, it does not matter how many images are displayed on screen or how many incorrect images are wrongly considered relevant, because the measure only focuses on the relevant images found so far. For this reason, recall is also known as the true positive rate. In a relevance-feedback-based CBIR system, the recall can also be plotted against the number of iterations as a recall-iteration graph.
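Both measures can be sketched together; the retrieved set plays the role of the scope, and the image identifiers are illustrative:

```python
def precision_recall(retrieved, relevant):
    """Precision and recall for a retrieved set against the relevant set."""
    tp = len(set(retrieved) & set(relevant))
    precision = tp / len(retrieved)   # TP / (TP + FP): exactness
    recall = tp / len(relevant)       # TP / (TP + FN): completeness
    return precision, recall

# 20 images shown (the scope), 100 relevant images in the database, 15 hits:
retrieved = list(range(20))
relevant = list(range(5, 105))
print(precision_recall(retrieved, relevant))  # (0.75, 0.15)
```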
2.6.2 Precision-Recall Graphs
For increasing numbers of retrieved images, precision-recall value pairs can be computed. In a precision-recall graph these values are plotted as a curve, with the recall of the retrieval algorithm on the x-axis and its precision on the y-axis. An ideal goal for image retrieval system development is to improve the system so that both precision and recall values increase.
2.6.3 Mean average precision
By averaging the precision values obtained every time a relevant image is encountered, one gets a good sense of how well a method performs overall [105]:

$$AP = \frac{\sum_{i=1}^{N} \mathrm{Precision}(i)}{N}$$

where N = true positives + false negatives. Calculating the average precision for multiple queries and taking the mean of these values yields the mean average precision (MAP). The MAP value is usually considered equal to the area under the precision-recall graph.
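Average precision as defined above can be sketched as follows; the rankings and relevant sets are made-up examples:

```python
def average_precision(ranked, relevant):
    """AP: average the precision values at each rank where a relevant
    image appears, divided by the total number of relevant images."""
    relevant = set(relevant)
    hits, total = 0, 0.0
    for rank, image in enumerate(ranked, start=1):
        if image in relevant:
            hits += 1
            total += hits / rank   # precision at this hit
    return total / len(relevant)

# MAP is the mean of AP over several queries:
ap1 = average_precision(["a", "x", "b"], {"a", "b"})   # (1/1 + 2/3) / 2
ap2 = average_precision(["x", "a", "b"], {"a", "b"})   # (1/2 + 2/3) / 2
print((ap1 + ap2) / 2)
```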
2.7 Data Sets
For content-based image retrieval evaluation, many image sets are used, such as the Corel stock photos, Caltech101, TRECVID, OLIVIA, and ImageCLEF. The Corel stock photos are the most popular for performance evaluation in the domain of CBIR. The Caltech101 dataset includes 101 image categories and has been extended to 256 categories; Caltech is used for object recognition and classification. ImageCLEF is used for cross-language evaluation and provides 20,000 images that can be used for CBIR performance evaluation. The TRECVID benchmark is also used by CBIR researchers to validate their image retrieval algorithms. Some research works also use the OLIVIA image set, which contains eight semantic categories.
2.8 Multimedia Information Retrieval
Multimedia Information Retrieval (MIR) is a field that deals with the search for knowledge in all forms of media, such as video, audio and images. Content-based strategies are an essential method for properly accessing the required information in media data sources, and current research concentrates on improving such search. Work on extracting digital information began as soon as the idea of digitizing content that existed in physical media, such as books and vinyl records, gained a foothold. From a theoretical point of view, areas such as artificial intelligence, computational vision, optimization theory, and pattern recognition have contributed significantly to the basic scientific foundations of MIR. Psychology and related fields such as aesthetics and ergonomics provided essential fundamentals for the interaction with the user.
Furthermore, applications of pictorial search over a database of images already existed in specialized forms, such as face recognition in biometric applications, robotic guidance for traversing terrain while avoiding obstacles and taking the simplest path, and character recognition in textual data. At the forefront, however, was the field of computer vision, which provided some of the first algorithms for searching features in video, audio, and images. With the growth of the internet, Web engines caught on and started to provide image searches. Efforts were also made to integrate such systems directly into commercial database systems. During the course of developing media information systems, scientists realized that there was a widening semantic gap between the low-level features, like colors and textures, used in their computations and the high-level concepts, like objects in an image, that users generally search for using words from their daily language.
2.8.1 Image representations and similarity detection
Image retrieval (IR) is one of the most active research trends in MIR. The process of computing a precise re-encoded form of an image suitable for comparison is referred to as feature extraction. Features can be classified as low-level or high-level characteristics; low-level features are usually extracted from the actual pixels. A CBIR system sees the query image and the images in the database as sets of features, and ranks the relevance of the target images to the query in proportion to their feature similarity.
IBM's QBIC system [47] was the first CBIR system and opened avenues for research in the field. Subsequently, many CBIR systems emerged that aimed to make image search more effective by adopting new similarity measures and new ways of detecting image similarity. In CBIR, the image signature plays an imperative role in fabricating an efficient image search. Signature development is usually performed through the analysis of color [82; 108], texture [63; 109], or shape [110; 111; 112], or by generating a combination of these and representing them mathematically [113]. Traditional CBIR techniques rely on two types of visual properties: global and local features. Algorithms based on global features focus on the visual content of the whole picture, such as color, texture and shape, while local-feature algorithms are mainly based on key points or salient patches.
In CBIR, color features are extensively used; they have better discriminative potential in the three-dimensional color domain than in the single-dimensional gray-level domain. Texture features are powerful visual features used to capture the repetitive patterns of a surface in images. Shape is known to be an important cue by which humans identify and recognize real-world objects, and shape features have been used for the purpose of retrieving images in many applications [114]. Shape feature extraction techniques are classified into contour-based and region-based methods: contour-based methods extract features from the shape boundary (the contours), while region-based methods extract features from the entire region.
In the work of [115] a color-texture and dominant color based image retrieval system (CTDCIRS) is proposed, and three different image features are offered: the Motif Co-occurrence Matrix (MCM), Dynamic Dominant Color (DDC), and Difference Between Pixels of Scan Pattern (DBPSP). Initially a color quantization algorithm is used to divide the image into eight coarse partitions, from which eight dominant colors are obtained. Next, MCM and DBPSP are used to represent the texture of the image. These three types of features are integrated for image representation and then used by the image retrieval system.
In the work of [116], a dominant color is defined as the perceptual color of a region in the HSV color space. In this technique, the author calculates the dominant color feature using an HSV color histogram (10 x 4 x 4 bins) over a region and selects the bin with the maximum count; the mean of the HSV values of all pixels in the designated bin is then obtained. This value is known as the dominant color value in HSV color space. It was noted that in most cases the average color and the dominant color are very similar.
In the work of [89], an image retrieval system using texture features is presented. The technique combines features obtained through the curvelet transform with Region-Based vector codebook Sub-band Clustering (RBSC) to obtain the dominant colors and sub-band textures. In this approach, the user-defined query image and the target images are compared using the principle of most similar highest priority (MSHP) and evaluated for retrieval performance. Lin et al. [2] proposed three image features for efficient automatic image retrieval: the difference between pixels of scan pattern (DBPSP) is used to extract the texture feature, the color co-occurrence matrix (CCM) is used to obtain the color features, and the last feature depends on the color distribution and is called the color histogram for K-means (CHKM).
Jhanwar et al. [117] presented an MCM-based technique for CBIR. In this work, a motif-transformed image is computed and used for the derivation of the motif co-occurrence matrix. The whole image is divided into 2x2 pixel grids, and each grid is replaced by the motif that minimizes the local gradient while traversing the 2x2 grid, forming the motif-transformed image. The MCM is defined as a 3D matrix whose (i, j, k) entry represents the probability of finding a motif i at a distance k from the motif j in the transformed image. The concept of the MCM is very similar to the color co-occurrence matrix (CCM), but since the MCM also captures third-order image statistics in the local neighborhood, retrieval using the MCM is better than with the CCM.
Wang et al. [107] offered a semantics classification method, which uses a wavelet-based approach to extract features and then compare images. In the work of [118], a method for cluster-based retrieval of images by unsupervised learning (CLUE) is proposed. This unsupervised clustering-based method generates multiple sets of retrieved results and gives more accurate results compared to previous work, but it suffers from issues such as identifying the number of clusters and segmentation uncertainty, which make its results unreliable.
ElAlami [85] proposed a model based on three different techniques. The first is concerned with extracting the features from the image repository; for this purpose, the color histogram and Gabor filter are used to extract combined color and texture features. The second technique depends on a genetic algorithm and obtains the optimal boundaries of these discrete values. In the last technique, feature selection consists of two successive functions, called preliminary and intense reduction, for extracting the most similar features from the original feature repository sets.
2.8.2 Image block based presentation and salient points
In computer vision, Local Binary Patterns (LBP) are widely used for classification. The LBP operator was first introduced as a complement to a local image contrast measure [119; 120], and has been observed to have strong advantages for texture classification. The primary form of the LBP operator is based on the eight neighbouring pixels, with the value of the center pixel used as the threshold. An LBP neighborhood code is produced by multiplying the thresholded values with the weights given to the corresponding pixels and summing the result.
Some other visual features have also been proposed for CBIR systems, such as salient points and spatial features. SIFT [121] and SURF [122], based on the salient points found in an image, are familiar visual features, and researchers have done a lot of work using these salient points in CBIR. Velmurugan and Baboo [100] applied SURF features, combining them with color features to improve retrieval accuracy. In [123] a human detection algorithm using histograms of oriented gradients (HOG) is introduced; HOG features are similar to the features used in the SIFT descriptor and are calculated by taking histograms of edge-intensity directions in a local area. The method is inspired by the tradition of visual information processing in the brain and is robust to local changes of appearance and position. The researchers showed that grids of HOG descriptors outperform existing feature sets for human detection.
Mallat and Peyre [124] introduced bandelet approaches to geometric image representations. Orthogonal bandelets using an adaptive segmentation are well suited to capture the regularity of edge structure; an orthogonal transformation based on bandeletization is applied to the wavelet coefficients. In the work of [125] a system based on the bandelet transform is proposed, which represents sharp image transitions such as edges by taking advantage of the geometric regularity of the image structure in image fusion. To create the fused image, a max rule is applied to select the geometric flow and bandelet coefficients of the source images. The work of [126] demonstrated significant improvements in large-scale image retrieval performance while maintaining high retrieval speed, through three improved models. RootSIFT, among the established models, gives better performance without increasing processing and storage requirements; the second model targets inverted indexes and query expansion, whereas the last model retains augmented features consistent with augmented images to speed up image retrieval. The combination of these three complementary models produces enhanced accuracy and efficient retrieval speed. Finally, to enhance the image finding rate and simplify the computation of image retrieval, a dimensionality reduction technique such as feature selection should be adopted [127].
2.8.3 Image classification and similarity detection
Neural networks are used as statistical tools in different fields, including architecture, statistics, engineering, psychology, economics and physics. The aim of a neural network is to learn or discover some correlation between the input and output patterns, or to analyze or discover the structure of the input patterns. Neural networks have been used to improve image classification because of their 'black box' learning property [128]. Techniques using ANNs are implemented in two phases: a training phase and a testing phase. In the training phase, the input data and target data must be fed into the network; the color and texture features extracted from the images form the input data, and the category designation of the images forms the target data. The back-propagation learning rule is applied until network convergence is reached.
In the testing phase, the same procedure and techniques are used to extract the features of the query image and create a feature vector, which then becomes the input to the trained neural network for the retrieval process. The network assigns one or more similar categories to the query. Finding a good measure of similarity between images on the basis of some feature set is a difficult task, and it strongly affects the effectiveness and efficiency of the retrieval technique. The feature values of the database images are calculated and stored, and can then be queried. The type of feature vector selected determines the type of measurement that will be used to compare similarity: if the extracted features present images as multi-dimensional points, the distances between the corresponding multi-dimensional points can be calculated. Euclidean distance, Manhattan distance, weighted Euclidean distance, minimum-rule distance, cross-correlation and statistical distances are the most common metrics for measuring the distance between two points in a multi-dimensional space [104].
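The distance-based ranking loop described above can be sketched as follows; the feature vectors and image names are illustrative, and Euclidean distance stands in for any of the metrics listed:

```python
def retrieve(query_vec, database, k=3):
    """Rank database images by Euclidean distance of their feature
    vectors to the query vector and return the k nearest."""
    dist = lambda v: sum((a - b) ** 2 for a, b in zip(query_vec, v)) ** 0.5
    return sorted(database, key=lambda item: dist(item[1]))[:k]

db = [("img1", [0.9, 0.1]), ("img2", [0.2, 0.8]), ("img3", [0.85, 0.2])]
print([name for name, _ in retrieve([1.0, 0.0], db, k=2)])  # ['img1', 'img3']
```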
In the work of [129] the classification of multi-spectral remote sensing data using a Back-propagation Neural Network [130] is presented. A combined learning technique for mid-level feature representation is presented in [131]. The procedure unites the benefits of semantic features and highly expressive non-semantic vector representations to infer closed-form solutions of the optimization problems. An auto-encoder model with a large-margin principle was utilized to augment attribute-based features with additional dimensions to form an efficient retrieval model. The constructed model permits smooth transitions between zero-data learning without training samples, unsupervised learning with training samples but without class labels, and supervised learning with training samples, to achieve higher precision than the semantic or syntactic representations alone. Support vector machines (SVM) and Relevance Feedback (RF) are used to solve the classification problem, in which relevant images and irrelevant images serve as two separate training sets. SVM active learning [132] considers the samples near the SVM boundary and takes labeled input from the users; the most informative samples are those near the boundary. The constrained similarity measure support vector machine (CSVM) [133] considers the repository images as belonging to two different clusters separated by the boundary, and the results are obtained after sorting. Asymmetric bagging and random subspace for support vector machines (ABRS-SVM) [134] addresses the issue of imbalanced training sets by generating multiple versions of the SVM classifier, replacing the negative samples with positive duplicates. This approach improves image retrieval, but not significantly when the feedbacks are severely imbalanced.
Irtaza and Jaffar [87] presented a possible solution for retrieving semantically similar images from large image repositories for any query image. The algorithm uses support vector machines and genetic algorithms to reduce the gap between high-level and low-level features. To avoid the risk of result degradation, relevance feedback is also incorporated in their work.
2.9 Chapter Summary
Some of the basics required for this research have been presented in this chapter. The chapter started with an overview of content-based image retrieval; then we discussed the different query scenarios in a CBIR system, such as query by example, image-region-based query, sketch-based query, query by multiple examples, and query by multiple modalities. We also discussed the role of feature extraction and the different ways in which visual features can be extracted from images. Another focus of the chapter was similarity computation and performance measurement; in this regard, we discussed popular similarity computation methods and machine learning techniques. The chapter also covered the performance evaluation measures and the standard image benchmarks that are commonly used in CBIR research. Finally, a comprehensive review of previous work in the field of CBIR was presented.
CHAPTER 3
BANDELET TRANSFORM
3.1 Introduction
Wavelet bases are suboptimal for approximating regular images because they cannot take advantage of the geometric regularity of image structures. Indeed, wavelets have square supports on a lattice, which are not adapted to the anisotropic geometric regularity of elements such as edges. Several frames, for example the curvelets of Candes and Donoho [135] and the warped bandelets of Le Pennec and Mallat [136], have been introduced to enhance the approximation performance of wavelets. The image is decomposed over vectors that are elongated in different directions and have vanishing moments, to exploit the regularity of the image along these preferred directions. Asymptotic theory gives better approximation error decay in these frames compared to wavelet bases, yet curvelets and warped bandelets do not appear to clearly improve the numerical approximation capabilities of wavelets for most natural images.
From a scientific perspective, the question raised by these physiological models is to understand whether one can develop hierarchical representations of the image from wavelet coefficients that exploit geometric image regularity. Characterizing a geometry on wavelet coefficients provides the ability to adapt this geometry to the image scale. This can be important for surfaces having multi-scale structures following distinct geometries at every scale.
We show that representing the geometry on the wavelet coefficients has a number of numerical and computational advantages over decompositions in directionally adapted bases such as curvelets and warped bandelets. In contrast to these previous constructions, the resulting bandelet bases are orthogonal and inherit the regularity of the wavelets. These bases are obtained from a wavelet basis through a cascade of orthogonal operators that define a discrete bandeletization, which leads to a fast algorithm.
3.2 Surface Compression through Geometric Bandelet
The geometry of natural surfaces is irregular and inherently multi-scale [137]. Diverse methods are applied to describe the geometry of a variety of structures at distinct levels of detail: large-scale structures (the customary 3D mesh representation), meso-structures (bump maps or displacement maps), and micro-scale material structures (reflectance functions). Image surfaces are usually decomposed in a bandelet basis, which has a fast bandeletization algorithm. Finding these geometric components is an ill-posed problem, and researchers try to address the estimation of the regularity direction for the purpose of surface compression. In image processing, discrete data are used; sampling is the first step before any processing and can model various acquisition processes such as range scanning, reconstruction, and remeshing of a 3D model.
Several strategies have already been proposed to address the problem of 3D geometry compression; see the recent survey of [138]. In order to perform the compression, semi-regular remeshing may be utilized. The first construction of wavelets over triangulations was developed with the lifting scheme, which provides this structure and supplies a good tool for surface analysis. In practice, the best known coders use the normal multi-resolution structure of [139]. For image and geometry-image compression, the most effective algorithms follow the transform-coding idea: decompose the signal in an orthonormal basis and quantize the resulting coefficients. Well-known examples of such algorithms are JPEG and JPEG2000. Using regular basis functions is also essential to avoid introducing blocking artifacts in the compressed signal. Notable constructions include curvelets [135], contourlets [140], wedgeprints [141] and non-linear subdivision schemes [142]. However, none of these methods is able to build a basis of regular orthogonal functions, which is highly desirable for image compression.
The bandelet approximation [136] takes advantage of irregular image geometry by eliminating the redundant information of the warped wavelet transform through bandeletization. However, the resulting transform is non-orthogonal and warped, with border effects. As an alternative, second-generation bandelets are constructed over a standard orthogonal wavelet transform; the result is simpler, orthogonal, and free of border effects. This second-generation bandeletization first rearranges the 2D wavelet coefficients and then performs a 1D wavelet transform. The steps of the bandelet algorithm are:
1. Input image.
2. 2D wavelet transform.
3. Selecting each dyadic square.
4. Selecting each geometry.
5. Projection of the sampling locations.
6. 1D wavelet transform.
7. Selection of the best geometry.
8. Output of the transform.
9. Build the quadtree.
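The steps above can be sketched numerically. The following minimal Python sketch (our own illustration, not the thesis implementation) runs steps 2-6 for a single dyadic square, using an orthonormal Haar filter and a flattening order standing in for the geometry; since every step is orthonormal, the energy of the image is preserved:

```python
import numpy as np

def haar1d(x):
    """Full orthonormal 1D Haar transform (length must be a power of 2)."""
    x = x.astype(float).copy()
    n = len(x)
    while n > 1:
        a = (x[0:n:2] + x[1:n:2]) / np.sqrt(2.0)  # averages
        d = (x[0:n:2] - x[1:n:2]) / np.sqrt(2.0)  # details
        x[: n // 2], x[n // 2 : n] = a, d
        n //= 2
    return x

def haar2d(img):
    """Separable orthonormal Haar transform along rows, then columns."""
    out = np.apply_along_axis(haar1d, 1, img)
    return np.apply_along_axis(haar1d, 0, out)

def bandeletize_square(square, direction):
    """Reorder the coefficients of a dyadic square along a sampling
    direction (0 = row-major, 1 = column-major) and apply a 1D Haar
    transform, mimicking the projection + 1D transform steps."""
    flat = square.flatten(order="C" if direction == 0 else "F")
    return haar1d(flat)

rng = np.random.default_rng(0)
img = rng.standard_normal((8, 8))
coeffs = haar2d(img)                  # step 2: 2D wavelet transform
band = bandeletize_square(coeffs, 0)  # steps 3-6 on one square
# All steps are orthonormal, so the energy is preserved exactly.
print(np.allclose(np.sum(img**2), np.sum(band**2)))
```

In a full implementation this square-by-square processing is repeated over the quadtree of dyadic squares (steps 7-9).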
The user gives an image to store as 3-channels by taking any geometry image tech-
nique. The compression rate of algorithm depend upon the value of threshold T. The
transform orthogonal or biorthogonal are applied on 3-channels (RGB) of the given im-
age and the results is a collection of images in 3-tupples (fHj , fVj , fDj ). The new images
f sj , for each scale 2j and orientation S ∈ (V,H,D) are stowed in single image with same
size of the original image f. Some sort of dyadic rectangular is actually by simply descrip-
tion the rectangular attained by simply recursively removing the original wavelet changed
impression f sj , into 4 sub-squares of similar size. To choose an appropriate geometry, the
direction (d) must be selected in a way that helps in minimizing the Lagrangian equation:
$\mathcal{L}(f_d, R) = \|f_d - f_{d,R}^{q}\|^2 + \lambda T^2 (R_G + R_B)$ (3.1)
where $f_{d,R}^{q}$ is the signal regenerated using the inverse 1D wavelet transform, and $R_G$ represents the total number of bits used to code the geometric parameter $d$ with an entropy coder. The transformation of $f$ by this procedure is equivalent to the decomposition of $f$ in a bandelet basis $B$. A bandelet function $b_u$ is indexed by $u = (j, S, k, m)$, where $2^j$ is the scale of the 2D wavelet transform, $S$ is a dyadic square of width $L$ pixels with $1 \le L \le 2^{-j}$, and $k \in \{0, \ldots, 2\log_2(L)\}$ and $m \in \{1, \ldots, 2^k\}$ are the scale and index in the 1D wavelet transform. The bandelet basis depends on the wavelet, on a quadtree segmentation into non-overlapping squares, and on the geometry fixed at each scale.
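The minimization of the Lagrangian (3.1) can be made concrete with a toy example. The sketch below is an illustrative assumption, not the thesis code: Haar wavelets, two candidate sampling orders standing in for the geometric directions $d$, and the count of above-threshold coefficients as a crude proxy for $R_G + R_B$. The direction with the smallest cost is selected:

```python
import numpy as np

def haar1d(x):
    """Full orthonormal 1D Haar transform (length a power of 2)."""
    x = x.astype(float).copy()
    n = len(x)
    while n > 1:
        a = (x[0:n:2] + x[1:n:2]) / np.sqrt(2.0)
        d = (x[0:n:2] - x[1:n:2]) / np.sqrt(2.0)
        x[: n // 2], x[n // 2 : n] = a, d
        n //= 2
    return x

def ihaar1d(c):
    """Inverse of haar1d."""
    c = c.astype(float).copy()
    n, N = 1, len(c)
    while n < N:
        a, d = c[:n].copy(), c[n : 2 * n].copy()
        c[0 : 2 * n : 2] = (a + d) / np.sqrt(2.0)
        c[1 : 2 * n : 2] = (a - d) / np.sqrt(2.0)
        n *= 2
    return c

def lagrangian(square, direction, T, lam=1.0):
    """Distortion + lambda * T^2 * (bit-cost proxy) for one ordering."""
    order = "C" if direction == 0 else "F"
    c = haar1d(square.flatten(order=order))
    kept = np.where(np.abs(c) > T, c, 0.0)   # keep only significant coeffs
    recon = ihaar1d(kept)
    dist = np.sum((square.flatten(order=order) - recon) ** 2)
    R = np.count_nonzero(kept)               # crude proxy for R_G + R_B
    return dist + lam * T**2 * R

# A block that is constant along rows: the row-major order yields a
# smoother 1D signal, hence fewer significant Haar coefficients.
block = np.tile(np.arange(8.0)[:, None], (1, 8))
costs = [lagrangian(block, d, T=1.0) for d in (0, 1)]
best = int(np.argmin(costs))
print(best)  # row-major ordering (direction 0) wins for this block
```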
3.3 Bandelet Image Compression
When a function defined over $[0,1]^2$ has singularities that belong to regular curves, one may exploit this geometric regularity to improve the $M$-term approximation. Most techniques, such as wavelet approximations, exploit the $M$-term properties only locally and therefore cannot take advantage of such geometric regularity. This thesis presents a new class of bases, built from multiscale elongated bandelet functions that follow the actual geometry, in order to improve the convergence of the approximation.
3.3.1 Geometric image model
The first basic model of geometrically regular functions consists of functions that are regular almost everywhere outside a set of regular edge curves. Let $C^\alpha(\Lambda)$ denote the space of Hölder functions of order $\alpha$ over $\Lambda \subset \mathbb{R}^n$, for $\alpha > 0$:

$C^\alpha(\Lambda) = \{ f : \mathbb{R}^n \to \mathbb{R} \;:\; \forall\, |\beta| = \lfloor \alpha \rfloor,\ \frac{\partial^{|\beta|} f}{\partial x_1^{\beta_1} \cdots\, \partial x_n^{\beta_n}} \text{ exists and satisfies (3.3) and (3.4)} \}$ (3.2)
When $\alpha$ is an integer, $C^\alpha$ is the space of functions with bounded derivatives up to order $\alpha$. The norm $\|f\|_{C^\alpha(\Lambda)}$ is obtained as the maximum of:

$\|f\|_{C^\alpha(\Lambda)} = \max \sup_{x \in \Lambda} \max_{|\beta| \le \lfloor\alpha\rfloor} \left| \frac{\partial^{|\beta|} f}{\partial x_1^{\beta_1} \cdots\, \partial x_n^{\beta_n}}(x) \right|$ (3.3)

$\|f\|_{C^\alpha(\Lambda)} = \max \sup_{(x,y) \in \Lambda^2} \max_{|\beta| = \lfloor\alpha\rfloor} \left| \frac{\partial^{|\beta|} f}{\partial x_1^{\beta_1} \cdots\, \partial x_n^{\beta_n}}(x) - \frac{\partial^{|\beta|} f}{\partial x_1^{\beta_1} \cdots\, \partial x_n^{\beta_n}}(y) \right| \cdot \|x - y\|^{\lfloor\alpha\rfloor - \alpha}$ (3.4)
A function $f$ has geometrically regular edges if $f \in C^\alpha(\Lambda)$ for $\Lambda = [0,1]^2 - \{\mathcal{C}_\gamma\}_{1 \le \gamma \le G}$, where the edge curves $\mathcal{C}_\gamma$ are themselves Hölder curves of order $\alpha$. Blurring along the edges may be modeled by a convolution with an unknown kernel $h(x)$ of compact support. Thus, we have:

$f(x) = \tilde{f} * h(x)$ (3.5)

where $\tilde{f} \in C^\alpha(\Omega)$ for $\Omega = [0,1]^2 - \{\mathcal{C}_\gamma\}_{1 \le \gamma \le G}$. The goal is to compute an approximation $f_M$ with $M$ parameters satisfying:

$\|f - f_M\|^2 \le C M^{-\alpha}$ (3.6)

where $C$ is a constant that does not depend on the blurring kernel $h$. Candès and Donoho [135] proposed an image model using curvelets that satisfies, for an approximation $f_M$ with $M$ curvelets:

$\|f - f_M\|^2 \le C M^{-2} (\log_2 M)^3$ (3.7)
However, no algorithm of known polynomial complexity computes such an approximation $f_M$ with an error that always decays like $M^{-2}$. Incorporating an unknown blurring kernel into the geometric model produces a harder problem: smooth blurred contours are more difficult to distinguish from sharp ones, and the basis has to adapt to the actual size of the blur in order to approximate the image transitions along the edges with precision.
3.3.2 Geometric Image Flow with Bandelet Bases
The previous image model describes geometrically regular edges. A function has sharp transitions across the edges but regular variations when moving parallel to them. This displacement parallel to the edges defines a geometric flow: a field of vectors giving the local directions in which $f$ has regular variations. Bandelet bases are constructed by warping orthogonal wavelet bases with this geometric flow.

“A geometric flow is a vector field $\vec{\tau}(x_1, x_2)$ that gives, at every point $(x_1, x_2)$, a direction in which the function $f$ has regular variations. In the neighborhood of an edge, the flow is parallel to the tangents of the edge contour. To build an orthogonal basis with a geometric flow, we require the flow to be locally either parallel to the vertical direction, and hence constant along that direction, or parallel to the horizontal direction. Suppose $f$ contains one edge $C$ whose angle with the horizontal or vertical direction remains smaller than $\pi/3$; then $C$ can be parameterized horizontally or vertically by a function. Assume that $f$ follows the horizontal horizon model shown in figure 3.1, and define a vertically parallel flow whose angle with the horizontal direction is smaller than $\pi/3$. Such a flow can be written as:”
~τ(x1, x2) = ~τ(x1) = (1, g′(x1)) (3.8)
Figure 3.1 shows that a flow line is an integral curve whose tangent at $(x_1, x_2)$ is collinear to $\vec{\tau}(x_1, x_2)$. Let $g(x)$ be the primitive of $g'(x)$, defined by

$g(x) = \int_0^x g'(t)\, dt$ (3.9)

The flow lines are the sets of points $(x_1, x_2) \in \Omega$ that satisfy $x_2 = g(x_1) + \mathrm{cst}$. The flow is parallel over a band $B$ defined as:

$B = \{(x_1, x_2) : x_1 \in [a_1, b_1],\ x_2 \in [g(x_1) + a_2,\ g(x_1) + b_2]\}$ (3.10)
Figure 3.1: Horizon model with a flow - [135]
If the flow directions are sufficiently parallel to the edge directions, then $f(x)$ has regular variations along each flow line $(x_1, g(x_1) + \mathrm{cst})$. The warped image is then defined as:

$Wf(x_1, x_2) = f(x_1, x_2 + g(x_1))$ (3.11)

Applying the warping to the band $B$ gives:

$WB = \{(x_1, x_2) : (x_1, x_2 + g(x_1)) \in B\} = \{(x_1, x_2) : x_1 \in [a_1, b_1],\ x_2 \in [a_2, b_2]\}$ (3.12)
If $\Psi(x_1, x_2)$ is a function with vanishing moments along $x_1$ for fixed $x_2$, then, since $Wf(x_1, x_2)$ is regular along $x_1$, the inner product

$\langle Wf, \Psi \rangle = \langle f, W^* \Psi \rangle$ (3.13)

has a small amplitude, where $W^*$ is the adjoint of $W$; $W$ is an orthogonal operator, so its adjoint equals its inverse:
W ∗f(x1, x2) = W−1f(x1, x2) = f(x1, x2 − g(x1)) (3.14)
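The operators of equations (3.11)-(3.14) can be checked on a discrete grid. In this hedged sketch, integer shifts per column stand in for the continuous warp $g(x_1)$, with periodic boundaries for simplicity; the function names are ours:

```python
import numpy as np

def warp(f, g):
    """Wf(x1, x2) = f(x1, x2 + g(x1)): shift each column x1 by g[x1]."""
    out = np.empty_like(f)
    for x1 in range(f.shape[0]):
        out[x1] = np.roll(f[x1], -g[x1])   # periodic boundary for simplicity
    return out

def unwarp(f, g):
    """W* f = W^{-1} f(x1, x2) = f(x1, x2 - g(x1))."""
    out = np.empty_like(f)
    for x1 in range(f.shape[0]):
        out[x1] = np.roll(f[x1], g[x1])
    return out

rng = np.random.default_rng(1)
f = rng.standard_normal((8, 8))
h = rng.standard_normal((8, 8))
g = np.array([0, 1, 1, 2, 2, 3, 3, 4])    # discretized flow primitive g(x1)

# W is invertible, W^{-1} W f = f, and orthogonal: <Wf, h> = <f, W* h>.
print(np.allclose(unwarp(warp(f, g), g), f))
print(np.isclose(np.sum(warp(f, g) * h), np.sum(f * unwarp(h, g))))
```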
From a 1D wavelet $\Psi(t)$ and a scaling function $\Phi(t)$, dilated and translated families are obtained as:

$\Psi_{j,m}(t) = \frac{1}{\sqrt{2^j}}\, \Psi\!\left(\frac{t - 2^j m}{2^j}\right)$ (3.15)

$\Phi_{j,m}(t) = \frac{1}{\sqrt{2^j}}\, \Phi\!\left(\frac{t - 2^j m}{2^j}\right)$ (3.16)
As the wavelet scale $2^j$ decreases, the index $j$ goes to negative infinity ($j \to -\infty$), and we obtain a family of separable wavelets:

$\{\phi_{j,m_1}(x_1)\,\psi_{j,m_2}(x_2),\ \psi_{j,m_1}(x_1)\,\phi_{j,m_2}(x_2),\ \psi_{j,m_1}(x_1)\,\psi_{j,m_2}(x_2)\}_{(j,\, m_1,\, m_2) \in I_{WB}}$ (3.17)
“The index set $I_{WB}$ depends upon the length and width of the rectangle $WB$. Since $W$ is orthogonal, applying its inverse to each of these wavelets yields an orthonormal basis of $L^2(B)$ known as a warped wavelet basis. After applying the inverse warping $W^{-1}$, the resulting functions $\psi_{l,m_1}(x_1)\,\psi_{j,m_2}(x_2 - g(x_1))$ are called bandelets because their support is parallel to the flow lines and is more elongated ($2^l > 2^j$) in the direction of the geometric flow. Inserting these bandelets in the warped wavelet basis yields a bandelet orthonormal basis of $L^2(B)$:”

$\{\phi_{j,m_1}(x_1)\,\psi_{j,m_2}(x_2 - g(x_1)),\ \psi_{j,m_1}(x_1)\,\phi_{j,m_2}(x_2 - g(x_1)),\ \psi_{l,m_1}(x_1)\,\psi_{j,m_2}(x_2 - g(x_1))\}_{(j,\, l > j,\, m_1,\, m_2)}$ (3.18)
3.3.3 Image Compression Through Bandelet
An image is compressed in a bandelet basis by first segmenting the image and encoding the geometric flow in every region of the segmentation. The bandelet decomposition of the image in the resulting frame is then quantized and stored with a binary code. In this image compression scheme, $R$ is the total number of bits used to encode the bandelet frame and the bandelet coefficients of $f$ in this frame:

$R = R_S + R_G + R_B$ (3.19)
“where $R_S$ is the number of bits for the dyadic square segmentation, $R_G$ for the flow in each square region, and $R_B$ for the bandelet coefficients. For a square of size $2^\lambda$ in an image, the geometric flow is parameterized at a scale $2^k$ by $2^{\lambda-k}$ quantized coefficients $a_m = qT^2$ with $qT \le C\theta$. The flow cost $R_G$ is the sum, over all squares that define a flow, of the number of bits required at each scale. In a bandelet frame $\mathcal{F} = (b_{i,m})_{i,m}$, all bandelet coefficients $\langle f, b_{i,m} \rangle$ are uniformly quantized with a uniform quantizer $Q_T$ of step $T$”:

$Q_T(x) = qT \quad \text{if } (q - 1/2)T < x < (q + 1/2)T$ (3.20)

The total number of bits needed to encode the quantized bandelet coefficients satisfies:

$R_B \le M_B \log_2(\|f\|/T) + \log_2(C_\Psi^2 \|f\|_\infty^2 T^{-2})$ (3.21)
The image restored from its bandelet coefficients is:

$f_R = \sum_{i,m} Q_T(\langle f, b_{i,m} \rangle)\, b'_{i,m}$ (3.22)

In the bandelet scheme, the resulting distortion is $D(R) = \|f - f_R\|^2$ and satisfies:

$D(R) = \|f - f_R\|^2 \le \sum_{i,m} |\langle f, b_{i,m} \rangle - Q_T(\langle f, b_{i,m} \rangle)|^2$ (3.23)

For the $M_B$ nonzero quantized coefficients:

$D(R) \le \sum_{|\langle f, b_{i,m} \rangle| < T} |\langle f, b_{i,m} \rangle|^2 + M_B T^2 / 4$ (3.24)

$D(R) \le \mathcal{L}(f, T/2, \mathcal{F})$ (3.25)

A small distortion rate is obtained by finding, in the dictionary $\mathcal{D}_T$, the best bandelet frame that minimizes $\mathcal{L}(f, T/2, \mathcal{F})$.
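The quantizer (3.20) and the distortion bound of (3.24) are easy to verify numerically. A small sketch with an arbitrary coefficient vector of our own choosing:

```python
import numpy as np

def quantize(x, T):
    """Uniform quantizer Q_T(x) = qT for (q - 1/2)T < x < (q + 1/2)T."""
    return np.round(x / T) * T

T = 0.5
coeffs = np.array([0.10, -0.30, 0.74, 1.26, -2.05])
q = quantize(coeffs, T)

# Each quantized value differs from the input by at most T/2, so the
# M_B nonzero coefficients contribute at most M_B * T^2 / 4 in total.
err = np.sum((coeffs - q) ** 2)
MB = np.count_nonzero(q)
print(q)  # [ 0.  -0.5  0.5  1.5 -2. ]
print(err <= len(coeffs) * T**2 / 4)
```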
3.4 Orthogonal Bandelets
The bandelet transform is an analysis tool which aims at taking advantage of sharp image transitions. A geometric flow, which indicates the directions in which the image gray levels have regular variations, is used to construct bandelet bases in the bandelet transform. These bandelet bases yield optimal approximation rates for geometrically regular images and have proven effective in still-image compression, video compression, and noise-removal algorithms.
3.4.1 Block Based Bandelet Basis
“This section describes the construction of a bandelet basis from a wavelet basis that is warped along the geometric flow, to exploit the image regularity along this flow and obtain orthonormal bandelet bases. Bandelets provide information about the blocks of an image, where each block is a small region. The geometric flow in a region $\Omega$ is a vector field $\vec{\tau}(x_1, x_2)$ giving a direction in which the image has regular variations in the neighborhood of each point $(x_1, x_2) \in \Omega$. Orthogonal bases adapted to the resulting flow are then obtained. As a first regularity condition, we enforce that the flow is either vertically parallel, $\vec{\tau}(x_1, x_2) = \vec{\tau}(x_1)$, or horizontally parallel, $\vec{\tau}(x_1, x_2) = \vec{\tau}(x_2)$. The image support $S$ is partitioned into regions $S = \cup_i \Omega_i$, with the flow in each $\Omega_i$ parallel either horizontally or vertically. Figure 3.2 [143] shows the vertically parallel geometric flow of a real image in a region.”

Figure 3.3 [143] shows an example: the image is divided into square regions, and each region $\Omega_i$ includes at most one contour.
Figure 3.2: Flow in a region of an image - [143]
In each region that contains a contour piece, the parallel flow can be chosen tangent to the contour curve and corresponds to its flow lines. Bandelets are constructed in these regions by warping separable wavelet bases, a process known as bandeletization. When a geometric flow can be determined in $\Omega$, the wavelet basis is replaced by a bandelet basis. A flow line is an integral curve whose tangents are parallel to $\vec{\tau}(x_1)$. The flow line associated with a fixed translation parameter $x_2$ is the set of points $(x_1, x_2 + c(x_1)) \in \Omega$ for varying $x_1$, where
$c(x) = \int_{x_{\min}}^{x} c'(t)\, dt$ (3.26)
By construction of the flow, the image gray level has regular variations along these flow lines; thus the warped image is regular along horizontal lines, for $x_2$ fixed and $x_1$ varying.

Figure 3.3: Dyadic square segmentation of an image - [143]
Wf(x1, x2) = f(x1, x2 + c(x1)) (3.27)
If $\Psi(x_1, x_2)$ is a wavelet with several vanishing moments along $x_1$ for each fixed $x_2$, then the inner product has a small amplitude:

$\langle Wf, \Psi \rangle = \langle f, W^* \Psi \rangle$ (3.28)
The warping operator W is an orthogonal operator since its adjoint is equal to its inverse
W ∗f(x1, x2) = W−1f(x1, x2) = f(x1, x2 − c(x1)) (3.29)
Therefore, the warping operator satisfies $W^* = W^{-1}$, and applying the inverse operator to each wavelet of an orthonormal basis of $L^2(W\Omega)$,

$\{\phi_{j,m_1}(x_1)\,\psi_{j,m_2}(x_2),\ \psi_{j,m_1}(x_1)\,\phi_{j,m_2}(x_2),\ \psi_{j,m_1}(x_1)\,\psi_{j,m_2}(x_2)\}_{(j,\, m_1,\, m_2) \in I_W(\Omega)}$ (3.30)

yields, since $W^{-1}$ is orthogonal, a warped wavelet orthonormal basis of $L^2(\Omega)$:

$\{\phi_{j,m_1}(x_1)\,\psi_{j,m_2}(x_2 - c(x_1)),\ \psi_{j,m_1}(x_1)\,\phi_{j,m_2}(x_2 - c(x_1)),\ \psi_{j,m_1}(x_1)\,\psi_{j,m_2}(x_2 - c(x_1))\}_{(j,\, m_1,\, m_2) \in I_W(\Omega)}$ (3.31)
If the geometric flow in $\Omega$ is horizontally parallel, meaning that

$\vec{\tau}(x_1, x_2) = \vec{\tau}(x_2) = (c'(x_2), 1)$ (3.32)

then let $x_{\min} = \inf\{x_2 : (x_1, x_2) \in \Omega\}$ and $c(x) = \int_{x_{\min}}^{x} c'(t)\, dt$. A warped wavelet basis is constructed from a wavelet basis of

$W\Omega = \{(x_1, x_2) : (x_1 + c(x_2), x_2) \in \Omega\}$ (3.33)

$\{\phi_{j,m_1}(x_1 - c(x_2))\,\psi_{j,m_2}(x_2),\ \psi_{j,m_1}(x_1 - c(x_2))\,\phi_{j,m_2}(x_2),\ \psi_{j,m_1}(x_1 - c(x_2))\,\psi_{j,m_2}(x_2)\}_{(j,\, m_1,\, m_2) \in I_W(\Omega)}$ (3.34)
The bandeletization replaces each family of scaling functions $\{\phi_{j,m_2}(x_2)\}_{m_2}$ by a family of orthonormal wavelets that generates the same space. The resulting bandelet orthonormal basis of $L^2(\Omega)$ is:

$\{\phi_{j,m_1}(x_1 - c(x_2))\,\psi_{j,m_2}(x_2),\ \psi_{j,m_1}(x_1 - c(x_2))\,\psi_{l,m_2}(x_2),\ \psi_{j,m_1}(x_1 - c(x_2))\,\psi_{j,m_2}(x_2)\}_{(j,\, l > j,\, m_1,\, m_2)}$ (3.35)
3.4.2 Fast Discrete Bandelet Transform
Bandelets in a region are computed by applying the bandeletization to warped wavelets, which are separable along the fixed direction (horizontal or vertical) and along the flow lines, as long as these remain away from the image boundary. A fast discrete bandelet transform can therefore be computed from a fast separable wavelet transform along this fixed direction and along the image flow lines. The block bandelet basis of the previous section is built from warped wavelet bases inside each region. “In image processing applications, discontinuities appear along the region boundaries when bandelet coefficients are modified. To avoid this situation (boundary effects), a discrete bandelet transform or discrete warped wavelet transform is used. The fast discrete bandelet transform associated with an image partition $\bigcup_i \Omega_i$ consists of three steps:
1. Image resampling in each region, which computes the image sample values along the flow lines of each region $\Omega_i$ of the partition.

2. Warped wavelet transform, a subband filtering along the flow lines, which goes across the region boundaries.

3. Bandeletization, which transforms the warped wavelet coefficients into bandelet coefficients along the flow lines.
The fast inverse bandelet transform consists of the three inverse steps:

1. Inverse bandeletization, which recovers the warped wavelet coefficients from the bandelet coefficients along the flow lines.

2. Inverse warped wavelet transform, an inverse subband filtering along the flow lines.

3. Inverse resampling, which computes the image samples on the original grid from the samples along the flow lines in each region $\Omega_i$.
This segmented geometric flow is optimized for image compression and noise removal
applications.”
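The three forward steps and their inverses can be sketched end to end. The following simplification (a single region, periodic integer flow resampling, a full 1D Haar transform as the subband filtering, and an identity bandeletization) verifies that the inverse transform reconstructs the image exactly:

```python
import numpy as np

def haar1d(x):
    """Full orthonormal 1D Haar transform (length a power of 2)."""
    x = x.astype(float).copy()
    n = len(x)
    while n > 1:
        a = (x[0:n:2] + x[1:n:2]) / np.sqrt(2.0)
        d = (x[0:n:2] - x[1:n:2]) / np.sqrt(2.0)
        x[: n // 2], x[n // 2 : n] = a, d
        n //= 2
    return x

def ihaar1d(c):
    """Inverse of haar1d."""
    c = c.astype(float).copy()
    n, N = 1, len(c)
    while n < N:
        a, d = c[:n].copy(), c[n : 2 * n].copy()
        c[0 : 2 * n : 2] = (a + d) / np.sqrt(2.0)
        c[1 : 2 * n : 2] = (a - d) / np.sqrt(2.0)
        n *= 2
    return c

def forward(img, flow):
    # 1. resample along the flow lines (integer shift per row, periodic)
    warped = np.array([np.roll(row, -s) for row, s in zip(img, flow)])
    # 2. warped wavelet transform: subband filtering along the flow lines
    # 3. bandeletization: identity in this simplified sketch
    return np.apply_along_axis(haar1d, 1, warped)

def inverse(coeffs, flow):
    # inverse steps 3-1 in reverse order
    warped = np.apply_along_axis(ihaar1d, 1, coeffs)
    return np.array([np.roll(row, s) for row, s in zip(warped, flow)])

rng = np.random.default_rng(2)
img = rng.standard_normal((8, 8))
flow = np.array([0, 1, 1, 2, 2, 3, 3, 4])
print(np.allclose(inverse(forward(img, flow), flow), img))
```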
Noise Removal Application
Threshold estimators in an orthonormal basis are especially effective at removing additive noise when the basis can approximate the original signal with few nonzero coefficients. For bandelet bases, this requires estimating and optimizing the geometric flow in the presence of additive noise. The penalized estimation finds the best bandelet basis, which minimizes an empirical risk penalized by the complexity of the geometric flow. The signal $f[n]$ is estimated from the noisy data:
X[n] = f [n] +W [n] (3.36)
where $W[n]$ is Gaussian white noise of variance $\sigma^2$. A thresholding in a bandelet basis $B = \{g_m\}_{1 \le m \le N^2}$ can be written as:

$F = \sum_{m=1}^{N^2} \rho_T(\langle X, g_m \rangle)\, g_m$ (3.37)

where $\rho_T(x)$ is a hard thresholding at $T$: $\rho_T(x) = x\, 1_{|x| > T}$, and $\sigma$ is the noise standard deviation. According to Donoho and Johnstone [144], the threshold is set to $T = \gamma \sigma \sqrt{2 \log_e(N^2)}$, where $\gamma$ remains constant.
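The thresholding estimator (3.37) with this threshold can be sketched in a plain orthonormal Haar basis, standing in for a bandelet basis ($\gamma = 1$ assumed):

```python
import numpy as np

def haar1d(x):
    """Full orthonormal 1D Haar transform (length a power of 2)."""
    x = x.astype(float).copy()
    n = len(x)
    while n > 1:
        a = (x[0:n:2] + x[1:n:2]) / np.sqrt(2.0)
        d = (x[0:n:2] - x[1:n:2]) / np.sqrt(2.0)
        x[: n // 2], x[n // 2 : n] = a, d
        n //= 2
    return x

def ihaar1d(c):
    """Inverse of haar1d."""
    c = c.astype(float).copy()
    n, N = 1, len(c)
    while n < N:
        a, d = c[:n].copy(), c[n : 2 * n].copy()
        c[0 : 2 * n : 2] = (a + d) / np.sqrt(2.0)
        c[1 : 2 * n : 2] = (a - d) / np.sqrt(2.0)
        n *= 2
    return c

rng = np.random.default_rng(3)
N2 = 256
f = np.repeat([0.0, 4.0, -3.0, 1.0], N2 // 4)   # piecewise-constant signal
sigma = 0.5
X = f + sigma * rng.standard_normal(N2)         # X[n] = f[n] + W[n]

c = haar1d(X)
T = sigma * np.sqrt(2.0 * np.log(N2))           # universal threshold
F = ihaar1d(np.where(np.abs(c) > T, c, 0.0))    # rho_T(x) = x * 1_{|x|>T}

print(np.sum((F - f) ** 2) < np.sum((X - f) ** 2))  # thresholding reduces risk
```

Because the piecewise-constant signal is sparse in the Haar basis, few coefficients survive the threshold and most of the noise energy is discarded.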
The best bandelet basis is the one that minimizes this risk among all possible bandelet bases. This requires optimizing the geometric flow of the bandelet basis in the presence of noise. When the noisy data $X$ is obtained by the addition of Gaussian white noise, the best basis is found by minimizing an appropriate penalized cost function, obtained from the Lagrangian of the distortion rate:

$D + \lambda \rho^2 R \quad \text{with} \quad D = \|X - F\|^2$ (3.38)

“where $R$ measures the complexity of the model as the number of bits expected to code the chosen basis $B$ and the quantized coefficients of $X$ in $B$, for a quantization step equal to the threshold $T$. In the context of image compression, given an image segmentation, the flow in each region $\Omega_i$ is calculated by minimizing the quadratic image variation within the
Figure 3.4: The left column gives zooms of noisy images having a PSNR of 20.19 dB. The middle and right columns are obtained, respectively, with bandelet and wavelet estimators - [143]
flow. To calculate the flow in the presence of white Gaussian noise, the variance $\beta^2$ of the Gaussian filter $\theta$ and the displacement parameters $c'_i[p]$ are parameterized in a family of B-splines dilated by $2^l$. Optimizing the image segmentation and the geometric flow in each region through the thresholding estimator requires $O(N^2 (\log_2 N)^2)$ operations.” Figure 3.4 [143] compares bandelet and wavelet threshold estimators in terms of PSNR; the bandelet transform removes the noise in an image better than the wavelet transform.
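PSNR, the comparison metric of figure 3.4, is computed from the mean squared error. A quick sketch, assuming an 8-bit dynamic range:

```python
import numpy as np

def psnr(ref, est, peak=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    mse = np.mean((ref.astype(float) - est.astype(float)) ** 2)
    return 10.0 * np.log10(peak**2 / mse)

ref = np.zeros((16, 16))
est = ref + 10.0                  # constant error, so MSE = 100
print(round(psnr(ref, est), 2))   # 28.13
```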
3.5 Chapter Summary
Some of the basics required for this research were presented in this chapter. The chapter started with an overview of the bandelet transform; we then discussed different techniques widely used in image compression, noise-removal applications, multiscale structuring, and geometry compression. We also discussed the role of the bandelet transform in feature extraction, and different ways in which visual features can be extracted from images. “We showed that building a hierarchical geometric representation from wavelet coefficients has a number of numerical and algorithmic advantages over direct decompositions such as curvelet and warped bandelet frames. In contrast to these previous constructions, the resulting bandelet bases are orthogonal and inherit the regularity of the wavelets they are constructed from. The geometry can also be adapted at every scale. These bases are obtained from a wavelet basis through a cascade of orthogonal operators that define a discrete bandeletization, which leads to a fast algorithm. Bandelets are an orthonormal basis that is adapted to geometric boundaries, and they can be interpreted as a warped wavelet basis. The motivation behind bandelets is to perform a transform on functions defined as smooth functions on smoothly bounded domains.” Another important focus of the chapter was to cover the previous work done so far on the bandelet transform.
CHAPTER 4
FEATURE EXTRACTION USING
BANDELET TRANSFORM
One of the major requirements of CBIR systems is to ensure meaningful image retrieval against query images. The performance of these systems is severely degraded by the inclusion of image contents that do not comprise the objects of interest in an image during the image representation phase. Segmentation of the images is considered as a solution, but no technique can guarantee object extraction in a robust way. Another limitation of segmentation is that most image segmentation techniques are slow and their results are not reliable. To overcome these problems, a bandelet transform based image representation technique is presented in this research, which reliably returns information about the major objects found in an image. For image retrieval purposes, ANN and SVM are applied, and the performance of the system is evaluated on three standard data sets used in the domain of CBIR [6].
4.1 Introduction
CBIR systems generate meaningful image representations by considering the visual characteristics of images, and bring closely resembling images, in terms of distance, as the semantic response. In this regard, one of the major challenges is the semantic gap, i.e., features at the low level are not sufficient to characterize the high-level image semantics [22]. To bridge this gap to some extent, an important focus of research is on the enhancement of these features, so that machine learning algorithms can make significant improvements to bridge this gap. To reap the benefits of segmentation based image representations and overcome the associated drawbacks, the focus of this research is on the identification of the image segments that contain major image objects by applying the bandelet transform. The bandelet transform returns the geometric representation of the texture of the object regions, which can be used to discriminate the objects of interest in a fast way. The detailed procedure will be described in section 4.2.1. The major problem with the geometric output is that its interpretation is complicated due to the close resemblance of the connected regions. In order to ensure the actual association, artificial neural networks are applied and correct texture classification is performed. We then apply the Gabor filter and generate the texture representation based on the classification output. To further enhance the image representation capabilities, we have also estimated the color content in the YCbCr and HSV color domains and fused it with the texture features.
As described by Irtaza et al. [87], the major drawbacks faced by a CBIR system, or query by image content, that severely impact the retrieval performance are: (1) the lack of output verification, and (2) the avoidance of neighborhood similarity for semantic association purposes. Therefore, we have followed their findings and also included the neighborhood in the semantic association process. Content based image retrieval is then performed by the artificial neural networks after training them with the obtained features. The bandelet transform has been used for medical image retrieval [145]; however, that approach to feature extraction differs from ours. Before this, researchers utilized the bandeletization property for image compression [146], image enhancement [143] and gender classification [147].
The technique presented in this research considers the most prominent objects in an image, using the object geometric representation obtained by the bandelet transform in a precise manner. The texture information found at the object boundaries is then used as a component of the feature vectors, after applying the targeted parameters to the Gabor transform based on the artificial neural network suggestions. The features are further improved by incorporating the color information in the YCbCr domain. Image semantics are then obtained by the artificial neural networks. Other features capture the color information in the HSV domain, and support vector machines are used for the semantic association.
4.2 Image Representation using Bandletized Regions in the YCbCr Color Space
The most important capability of the proposed method is its ability to identify the most prominent objects in an image. These objects are then considered as the core outcomes used for the generation of feature vectors. To achieve this, image transformations are first generated through the bandelet transform, which returns the geometric boundaries of the major objects found in an image. We apply the Gabor filter with targeted parameters (as will be described) to estimate the texture content around these boundaries. These geometric boundaries are vague in the sense that they could easily be mistaken as belonging to unwanted texture classes, since all of them closely resemble one another; if not carefully handled, this results in wrong parameter estimation and, consequently, unsatisfactory image retrieval output. Therefore, to avoid this situation, geometric classification is performed through backpropagation neural networks, which ensures that the texture estimation parameters used to apply the Gabor filter are approximated with maximum accuracy.
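The Gabor texture estimation step can be sketched as follows; the kernel parameters below are illustrative placeholders, not the values suggested by the neural network:

```python
import numpy as np

def gabor_kernel(theta, lam=8.0, sigma=3.0, size=15):
    """Real Gabor kernel: Gaussian envelope times an oriented cosine."""
    half = size // 2
    y, x = np.mgrid[-half : half + 1, -half : half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return np.exp(-(xr**2 + yr**2) / (2 * sigma**2)) * np.cos(2 * np.pi * xr / lam)

def gabor_features(img, thetas):
    """Mean and standard deviation of the filter response per orientation."""
    feats = []
    for theta in thetas:
        k = gabor_kernel(theta)
        # periodic convolution via FFT, zero-padding the kernel
        resp = np.abs(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(k, img.shape)).real)
        feats += [resp.mean(), resp.std()]
    return np.array(feats)

# Vertical stripes of period 8 respond most strongly to the theta = 0 kernel.
img = np.cos(2 * np.pi * np.arange(64) / 8.0)[None, :] * np.ones((64, 1))
fv = gabor_features(img, [0.0, np.pi / 4, np.pi / 2])
print(fv.shape)  # (6,)
```

In the proposed method the orientation and frequency parameters would be chosen per region from the network's texture-class decision, rather than fixed as here.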
To increase the discriminative power of the feature vectors, color components in the YCbCr domain are also included, after approximating them through a wavelet decomposition over the color histograms. The proposed features are computed for all images in the repository, and their semantic classes are determined through ground-truth training with artificial neural networks and the finer neighborhood of every image. We generate an inverted index over the semantic sets, which guarantees fast image retrieval once the semantic class of the user's query image has been determined. The complete process of the proposed method is represented in figure 4.1, and the details of the process follow in the subsequent sections.
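The color component of the feature vector can be sketched as an RGB to YCbCr conversion followed by per-channel histograms and a one-level Haar approximation of each histogram. The full-range BT.601 conversion below is our assumption; the thesis does not specify the exact variant:

```python
import numpy as np

def rgb_to_ycbcr(img):
    """Full-range BT.601 RGB -> YCbCr for a float image with values in [0, 255]."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return np.stack([y, cb, cr], axis=-1)

def color_feature(img, bins=32):
    """Histogram each YCbCr channel, then keep a one-level Haar approximation."""
    ycc = rgb_to_ycbcr(img)
    feats = []
    for ch in range(3):
        h, _ = np.histogram(ycc[..., ch], bins=bins, range=(0, 256), density=True)
        approx = (h[0::2] + h[1::2]) / np.sqrt(2.0)  # Haar low-pass of the histogram
        feats.append(approx)
    return np.concatenate(feats)

white = np.full((4, 4, 3), 255.0)
ycc = rgb_to_ycbcr(white)
print(np.round(ycc[0, 0], 2))     # [255. 128. 128.]
print(color_feature(white).shape) # (48,)
```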
4.2.1 Modified Bandelet Transform
The issue with wavelet bases is that the same texture values can have different directions in an image. To overcome this limitation, Le Pennec and Mallat [124; 136] proposed exploiting geometric regularity in an anisotropic way, eliminating the redundancy of the wavelet transform using the concept of bandeletization. The bandelet transform is a major self-adaptive multiscale geometry analysis method which exploits the geometric information of images, in contrast to non-adaptive algorithms such as the curvelet [62; 89] and contourlet transforms [148]. The bandelet transform not only has the properties of multiscale analysis, directionality and anisotropy, but also offers critical sampling and adaptability for image representation. Bandelet basis vectors have supports elongated in the direction of maximum regularity of the function, as shown in figure 4.2. The Alpert transform is used for the bandeletization, which closely follows the geometry of the underlying images. The main objective is to take advantage of sharp image transitions by computing the geometric flow to form bandelet bases, which capture the constantly changing directions in grayscale images.
Figure 4.1: Proposed Method.
Figure 4.2: Bandelet Transform [1; 2]. (a) Dyadic segmentation depending on the local directionality of the image; (b) bandelet segmentation square containing a regular function, shown by the red dash; (c) geometric flow and sampling positions; (d) sampling positions adapted to the warped geometric flow; (e) warping example.
As shown in figure 4.3, the bandelet transform divides the image into square blocks and obtains at most one contour per region ($\Omega_i$). If a small region of an image does not contain any contour, the image intensity is uniform and regular in that region and, therefore, no flow line is defined.
Figure 4.3: Geometric flow representation using different block sizes: (a) small size 4×4; (b) medium size 8×8.
4.2.1.1 Alpert bases in bandelet transform
As per the work of [149], the Alpert transform is applied to compute the bandelet bases that approximate images having some geometric regularity. For this, the image is divided into square blocks, represented by $S$, and the geometric flow is estimated in every block. In our implementation, the block size is $8 \times 8$. As elaborated in figure 4.3, using smaller blocks, i.e., $4 \times 4$, divides the image into more chunks; the drawback is that the bandelet transform is then not able to capture the sharp edges. Similarly, if the block size is larger, i.e., $16 \times 16$ or $32 \times 32$, the geometric flow exceeds the object boundaries. Hence, through experimental observations, we used a block size of $8 \times 8$ for appropriate object estimation. The Alpert transform operates parallel to the geometric flow and is developed over the space $l^2(S)$ of wavelet coefficients in $S$, using piecewise polynomials over bands of dyadic widths. The direction of the geometric flow $\gamma$ is estimated in the domain of $S$, and the warping operator $W$ warps $S$ into $\bar{S}$. Under the warping, any point $x_n = 2^j n$ is mapped to $\bar{x}_n = W(2^j n)$. Similarly, $l^2(\bar{S})$ represents the functions sampled in the warped domain, i.e., at the points $\bar{x}_n \in \bar{S}$. To define the multiresolution, for each scale $2^l$ the warped square $\bar{S}$ is recursively subdivided into $2^{-l}$ horizontal bands:
$\bar{S} = \bigcup_{i=0}^{2^{-l}-1} \bar{\beta}_{l,i}$ (4.1)
In equation (4.1), the bands satisfy $\bar{\beta}_{l,i} = \bar{\beta}_{l-1,2i} \cup \bar{\beta}_{l-1,2i+1}$. The band in the original square $S$ is recovered as $\beta_{l,i} \stackrel{\text{def}}{=} W^{-1}(\bar{\beta}_{l,i}) \subset S$, with width roughly equal to $\lambda 2^l$ and a number of sampling points equal to $2^l(\lambda 2^l)^2$, using the Alpert multiresolution space $V_l \subset l^2(S)$. The Alpert space is obtained through equations (4.2) and (4.3):

$V_l = \{ g \in l^2(S) \;:\; \forall i,\ \forall x_n \in \beta_{l,i},$ (4.2)

$g(x_n) = P_i(x_n) \}$ (4.3)

where each $P_i$ is a polynomial.
Orthogonal bases $(h_{l,i,k})_{i,k}$ of each multiresolution space are acquired by applying Gram-Schmidt orthogonalization to the monomial vectors:

$p_k(x_n) = (x_1)^{k_1}(x_2)^{k_2}$ (4.4)
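Equation (4.4) together with the Gram-Schmidt step can be sketched directly: sample the monomials $(x_1)^{k_1}(x_2)^{k_2}$ on the points of a band and orthonormalize them (the sampling grid here is our own toy choice):

```python
import numpy as np

def monomials(points, p):
    """Vectors p_k(x_n) = x1^k1 * x2^k2 sampled on points, total degree < p."""
    cols = []
    for k1 in range(p):
        for k2 in range(p - k1):
            cols.append(points[:, 0] ** k1 * points[:, 1] ** k2)
    return np.array(cols, dtype=float)

def gram_schmidt(vectors):
    """Classical Gram-Schmidt, returning orthonormal rows."""
    basis = []
    for v in vectors:
        w = v - sum(np.dot(v, b) * b for b in basis)
        basis.append(w / np.linalg.norm(w))
    return np.array(basis)

# Sampling points of one warped band (toy 4x4 grid).
pts = np.array([(i, j) for i in range(4) for j in range(4)], dtype=float)
H = gram_schmidt(monomials(pts, p=3))  # analogues of the h_{l,i,k}

print(np.allclose(H @ H.T, np.eye(len(H))))  # the family is orthonormal
```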
The Alpert wavelets $(\Psi_{l,i,k})_{i,k}$ are the orthogonal bases of the orthogonal complement $W_l$ of $V_l$. The Alpert wavelet vectors $(\Psi_{l,i,k})_k$ are therefore computed by applying Gram-Schmidt orthogonalization to:

$\{ h_{l-1,2i,k} - h_{l-1,2i+1,k} \}_{k_1 + k_2 < p} \subset V_{l-1}$ (4.5)

The resulting multiwavelet vectors $(\Psi_{l,i,k})$, being orthogonal to $V_l$, have vanishing moments over the warped domain:

$\sum_{x_n} \Psi_{l,i,k}(x_n)\,(x_n)^k = 0$ (4.6)
In the above equation, $(x)^k = (x_1)^{k_1}(x_2)^{k_2}$ for each point $x_n = (x_1, x_2)$ in the warped domain. The orthogonal bases $(\bar{\Psi}_{l,i,k})_{l,i,k}$ of $l^2(\bar{S})$ define orthogonal Alpert bases of the $l^2(S)$ domain:

$\psi_{l,i,k}(x_n) = \bar{\psi}_{l,i,k}(\bar{x}_n)$ (4.7)
For each square block S at scale 2^l, the orthogonal Alpert basis β(S, γ) of l²(S) is obtained by:

β(S, γ) =def { ψ_{l,m} : L ≤ l ≤ 0 and 0 ≤ m < p(p + 1) 2^{l−1} }   (4.8)
The bandelet transform provides this information for each square block; when the flow of a square S is undefined, the square is not bandeletized and its wavelet coefficients are kept unchanged. Over a dyadic segmentation S_j, the bandelets thus provide bandeletization bases β(Γ_j) of the whole space of wavelet coefficients at a scale 2^j:
β(Γ_j) = ⋃_{S ∈ S_j} β(S, γ′_S)   (4.9)
After applying the Alpert transform, we get a vector for each square, i.e.,

ψ_v[n] = ψ_{l,k}[n]   (4.10)
In the equation above, ψ_v[n] are the coordinates of the bandelet functions, which belong to the space L²([0, 1]²). These coordinates are further used to construct the bandelet basis, a process called bandeletization:

β(Γ) = ⋃_{j ≤ 0} { b_v : ψ_v ∈ β(Γ_j) },  where Γ = ⋃_{j ≤ 0} Γ_j   (4.11)
The choice of bandelet basis is the key factor in capturing the geometry of images. Therefore, in the bandelet transform the best basis is obtained by minimizing the Lagrangian:

β(Γ*) = argmin_{β(Γ) ∈ D_{T²}} L(f, β(Γ), T)   (4.12)

The bandelet transform uses the above equation to extract the geometries of an image. The value of the threshold T affects the granularity of the image estimation; different values can be adopted, e.g., 32, 48, 56, etc. In our implementation, we use a threshold value of 70, chosen after detailed experimentation, which is able to estimate the theme object in an image. In the work of [125], each block is estimated in the discrete wavelet bases of the L²(Ω) domain, i.e.,
φ_{j,m}(x) = φ_{j,m1}(x_1) φ_{j,m2}(x_2)
ψ^H_{j,m}(x) = φ_{j,m1}(x_1) ψ_{j,m2}(x_2)
ψ^V_{j,m}(x) = ψ_{j,m1}(x_1) φ_{j,m2}(x_2)
ψ^D_{j,m}(x) = ψ_{j,m1}(x_1) ψ_{j,m2}(x_2),  where j, m_1, m_2 ∈ I(Ω)   (4.13)
where I(Ω) is the index set of the image region, which depends upon the geometry of the boundary of Ω, and x_1, x_2 denote the pixel location in the image. The above equations represent the wavelets that are modified according to the geometric flow calculated in the region Ω. These wavelet bases are replaced by the bandelet orthonormal bases of L²(Ω). Then,
φ_{j,m1}(x_1) ψ_{j,m2}(x_2 − c(x_1))
ψ_{j,m1}(x_1) φ_{j,m2}(x_2 − c(x_1))
ψ_{j,m1}(x_1) ψ_{j,m2}(x_2 − c(x_1)),  where j, m_1, m_2 ∈ I(Ω)   (4.14)
where c(x_1) defines the flow line: for a fixed translation parameter x_2, the point (x_1, x_2 + c(x_1)) belongs to Ω and follows the direction of the geometric flow. Then c(x) is obtained as:

c(x) = ∫_{x_min}^{x} c′(u) du   (4.15)
This flow is parallel, and c′(x) is computed as an expansion over a translated function b dilated by a scale factor 2^l. The flow at this scale is then characterized by:

c′(x) = ∑_{n=1}^{2^{k−l}} a_n b(2^{−l} x − n)   (4.16)
The bandeletization of the wavelet coefficients uses the Alpert transform to define a set of bandelet coefficients, which can be written as inner products of the original image f with the bandelets:

b^k_{j,l,n}(x) = ∑_p a_{l,n}[p] ψ^k_{j,p}(x)   (4.17)
The local geometric flow depends upon these coefficients and scales; therefore, for each scale 2^j and orientation k a different geometry is obtained. After the bandeletization process, we obtain a multiscale low-pass and high-pass filtering structure similar to the wavelet transform. Equations (4.12) and (4.17) are used to calculate the geometry of the images, as shown in figure 4.4. The regions containing contours are further used for texture classification, where the features are computed with an Artificial Neural Network.
Figure 4.4: Object categorization on the basis of the geometric flow obtained through bandeletization.
4.2.1.2 Texture Feature Extraction using Bandelet
Texture is a significant component of human visual perception, and many researchers have worked on how to characterize it effectively in images. In this regard, we propose a new method to identify the most prominent texture areas in the image, which constitute the major image objects. In the proposed method, first of all, image transformations are generated through the bandelet transform, which returns the geometric boundaries of the major objects found in an image. Secondly, we apply a Gabor filter with targeted parameters to estimate the texture content around these boundaries. These geometric boundaries are ambiguous in the sense that they can easily be mistaken for undesired texture classes, as all of them closely resemble one another; if not carefully considered, this results in wrong parameter estimation and, in consequence, unsatisfactory image retrieval output. Therefore, to avoid this situation, geometric classification is performed through backpropagation neural networks, which ensures that the texture estimation parameters for the Gabor filter are approximated with maximum accuracy. The following are the main steps of texture feature extraction:
(1) Convert the input RGB image (I) of size M × N into a gray scale image.

(2) Apply the bandelet transform to calculate the geometry of the image and obtain the directional edges.

(3) An Artificial Neural Network is used to classify the blocks having directional edges, after training on the sample edge set described in Figure 4.6. Once the network is trained, every geometric shape obtained in step 2 is classified for parameter estimation. These parameters are described further in the Gabor filter section.

(4) After parameter estimation, the blocks with geometric contents are passed to the Gabor filter to estimate the texture.

(5) Steps 1 to 4 are repeated for the whole image repository.
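The block-level part of steps 1–3 can be sketched as follows. This is a minimal illustration, not the thesis implementation: the bandelet geometry and the trained ANN are replaced by a crude gradient-based direction test, and all function names (`classify_block_direction`, `block_directions`) are hypothetical.

```python
import numpy as np

def classify_block_direction(block, thresh=10.0):
    """Crude stand-in for the bandelet + ANN step: label the dominant
    edge direction of a block from its mean absolute gradients."""
    gy, gx = np.gradient(block.astype(float))   # row- and column-wise gradients
    ex, ey = np.abs(gx).mean(), np.abs(gy).mean()
    if max(ex, ey) < thresh:
        return "none"                           # no significant contour
    if ex > 1.5 * ey:
        return "vertical"                       # strong horizontal variation -> vertical edge
    if ey > 1.5 * ex:
        return "horizontal"
    return "diagonal"

def block_directions(gray, bs=8):
    """Steps 1-3: tile the grayscale image into bs x bs blocks and
    classify the geometry of each block."""
    h, w = gray.shape
    return {(i, j): classify_block_direction(gray[i:i+bs, j:j+bs])
            for i in range(0, h - bs + 1, bs)
            for j in range(0, w - bs + 1, bs)}

# toy image: vertical stripes on the left half, flat right half
img = np.zeros((16, 16))
img[:, :8] = np.tile([0, 0, 255, 255] * 2, (16, 1))
labels = block_directions(img)
```

In the actual pipeline, the direction label of each block would be produced by the trained backpropagation network rather than this gradient heuristic.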
4.2.1.3 Artificial Neural Network
Neural Networks (NN) are renowned as powerful and dominant tools in the area of pattern recognition and are inspired by the biological neurons found in the human brain. The least mean square rule and the gradient search method are used to minimize the average difference between input and target values of the neural network [150]. Backpropagation neural networks are applied to classify the texture on the basis of the geometry returned by the bandelet transform. In this regard, we classify the texture directions into horizontal, vertical, right/left diagonal, or no-contour blocks, by training on a small set developed manually. For this, we have placed 14 block samples representing the mentioned geometric types in every category, as described in Figure 4.6. To generate these samples, we consider only the image geometry and suppress the original image part. Once the network is trained, we apply it to classify every block present in the image. The reason to perform this task with an ANN instead of a kernel (window-based operations used in image processing) is that the geometry is not fixed and has different variations within the same category; in this situation, the performance of kernel-based operations is poor. Therefore, the ANN is applied, and it classifies the texture with maximum accuracy. Figure 4.5 shows the structure of the neural network. The neural network structure is defined with one hidden layer having 20 neurons and four output units. The sigmoid function is used in the hidden layer and the output layer as the transfer function, i.e.,
f(x) = g(x) = 1 / (1 + exp(−x/x_0))   (4.18)
After training the neural network, all blocks of an image are tested against it, and their texture type is determined as:

m↓ = argmax_m (y_{f_m})   (4.19)

where m indexes the output units of the neural network structure and y_{f_m} returns the association factor of a particular output unit. The texture type m↓ is the class whose output unit has the highest association factor. Details of the neural network structure are summarized in table 4.1.
Figure 4.5: The structure of neural network.
Figure 4.6: Types of texture.
Table 4.1: Summary of Neural network structure for every image category used in this
work.
INPUT
Input: a⃗ = (a_1, ..., a_N), dim(a⃗) = N

MIDDLE (HIDDEN) LAYER
Input: b⃗ = U·a⃗, dim(b⃗) = M
Output: c⃗ = f(b⃗ − s⃗), dim(c⃗) = M
U: M×N weight matrix; f: hidden layer activation function; s⃗: thresholds

OUTPUT LAYER
Input: d⃗ = W·c⃗, dim(d⃗) = K
Output: e⃗ = g(d⃗ − t⃗), dim(e⃗) = K
W: K×M weight matrix; g: output layer activation function; t⃗: thresholds

ERROR CORRECTION
MSE: E = (1/2)(p⃗ − e⃗)²
ΔW_ij = −α ∂E/∂W_ij = α δ_i c_j
Δt_i = α δ_i
ΔU_ji = −β ∂E/∂U_ji
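A forward pass through the Table 4.1 structure, with the sigmoid transfer function of equation (4.18) and the argmax rule of equation (4.19), can be sketched as follows. The weights here are random placeholders standing in for a trained network, and the names `forward` and `texture_type` are hypothetical.

```python
import numpy as np

def sigmoid(x, x0=1.0):
    # transfer function of eq. (4.18)
    return 1.0 / (1.0 + np.exp(-x / x0))

def forward(a, U, s, W, t):
    """Forward pass through the Table 4.1 structure:
    hidden c = f(U a - s), output e = g(W c - t)."""
    c = sigmoid(U @ a - s)
    return sigmoid(W @ c - t)

def texture_type(e):
    # eq. (4.19): index of the output unit with the highest association factor
    return int(np.argmax(e))

rng = np.random.default_rng(0)
N, M, K = 64, 20, 4                  # input dim, 20 hidden neurons, 4 texture classes
U = 0.1 * rng.standard_normal((M, N))
W = 0.1 * rng.standard_normal((K, M))
s, t = np.zeros(M), np.zeros(K)
e = forward(rng.standard_normal(N), U, s, W, t)
```

The backpropagation updates listed under ERROR CORRECTION would adjust U, W, s⃗, and t⃗ during training; only the inference path is shown here.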
4.2.1.4 Gabor Feature
Gabor filters are extensively used in the fields of computer vision and pattern recognition. Successful applications of the Gabor wavelet filter include feature extraction, texture segmentation, face recognition, fingerprint identification, edge detection, contour detection, directional image enhancement, hierarchical image representation, compression, and recognition. Gabor filtering is a strong technique to reduce noise and can easily reduce image redundancy and repetition [151]. Gabor filters can be convolved with a small portion of an image or with the full image. An image region is expressed by the different Gabor responses generated through different orientations, frequencies, and angles [87; 152]. For an image I(x, y) of size M × N, its discrete Gabor wavelet transform is given by the convolution:
G_mn(x, y) = ∑_s ∑_t I(x − s, y − t) ψ*_mn(s, t)   (4.20)

where s and t are the filter mask dimensions, and ψ*_mn is the complex conjugate of ψ_mn, which is a self-similar function generated by rotation and dilation of the following mother wavelet:
ψ(x, y) = (1 / (2π σ_x σ_y)) exp[ −(1/2)( x²/σ_x² + y²/σ_y² ) ] exp(j 2π λ x)   (4.21)
where λ is the modulation frequency. The self-similar Gabor wavelets are obtained through the generating function:

ψ_mn(x, y) = a^{−m} ψ(x̃, ỹ)   (4.22)

where m and n specify the scale and orientation of the wavelet, with m = 0, 1, ..., M − 1 and n = 0, 1, ..., N − 1, and the rotated and dilated coordinates x̃, ỹ are:

x̃ = a^{−m}(x cos θ + y sin θ)   (4.23)

ỹ = a^{−m}(−x sin θ + y cos θ)   (4.24)
where a > 1 and θ = nπ/N. In the Gabor filter, σ is the standard deviation of the Gaussian function, λ is the wavelength of the harmonic function, and θ is the orientation.

In our implementation, blocks having a bandelet-based geometric response are passed to the Gabor filter and, based on the neural network classification, we select the parameters for the application of the Gabor filter [152].
For horizontal texture portions: θ = π and λ = 0.3.
For vertical texture portions: θ = π/2 and λ = 0.4.
For left diagonal texture portions: θ = π/4 and λ = 0.5.
For right diagonal texture portions: θ = 3π/4 and λ = 0.5.
The energy computation is performed using the following equation:

F_v = μ((A − λ_E I) X)   (4.25)

where F_v is the feature vector, λ_E are the eigenvalues, X is the eigenvector matrix, and A is the Gabor response on a particular block.
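The Gabor machinery of equations (4.20)–(4.24) with the direction-specific parameter choices above can be sketched as follows. This is an illustrative implementation, assuming a single σ in place of the separate σ_x, σ_y of equation (4.21) and a full-overlap product in place of the sliding convolution of equation (4.20); the scalar energy is only a stand-in for equation (4.25).

```python
import numpy as np

def gabor_kernel(theta, lam, sigma=2.0, size=9, a=2.0, m=0):
    """Gabor wavelet of eqs. (4.21)-(4.24): a Gaussian envelope modulated
    by a complex harmonic, rotated by theta and dilated by a^-m."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = a ** (-m) * (x * np.cos(theta) + y * np.sin(theta))    # eq. (4.23)
    yr = a ** (-m) * (-x * np.sin(theta) + y * np.cos(theta))   # eq. (4.24)
    env = np.exp(-0.5 * (xr ** 2 + yr ** 2) / sigma ** 2) / (2 * np.pi * sigma ** 2)
    return env * np.exp(1j * 2 * np.pi * lam * xr)

# direction-specific (theta, lambda) pairs from the text
params = {"horizontal": (np.pi, 0.3), "vertical": (np.pi / 2, 0.4),
          "left_diagonal": (np.pi / 4, 0.5), "right_diagonal": (3 * np.pi / 4, 0.5)}

def gabor_energy(block, direction):
    """Apply the direction-specific kernel to one block and return a
    scalar response magnitude as a stand-in for the feature of eq. (4.25)."""
    k = gabor_kernel(*params[direction], size=block.shape[0])
    return float(np.abs((block * np.conj(k)).sum()))
```

In the actual pipeline, the direction passed to `gabor_energy` comes from the ANN classification of the bandelet geometry.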
4.2.2 Color Feature Extraction
In CBIR, color is the most imperative and significant visual attribute. It has been extensively studied, and the motivation is that color estimation is not sensitive to rotation, translation, and scale changes. A variety of color spaces are available and serve effectively for different applications [153; 154]. The color features in our work are extracted on the basis of edge detection in the YCbCr color space. Edges are extracted by applying the Canny edge detector to the Y luminance component. The main steps of color feature extraction are as follows:
(1) The RGB image (I) is converted into the YCbCr color space.

(2) After conversion, we separate the Y, Cb, and Cr components and apply the Canny edge detector to the Y component of the image.

(3) In the next step, we combine the edges obtained in the previous step with the unchanged Cb and Cr components.

(4) After step (3), the combined image is converted back into a single RGB image.

(5) Now the individual R, G, and B components are separated and the histogram of each component is calculated. 256 bins are obtained from each of HR, HG, and HB.
(6) To improve the feature performance, we apply a wavelet transform to each histogram obtained in the previous step: the discrete wavelet transform of HR at level 2, and of HG and HB at level 3. After this step, we have 128 bins, i.e., 64 bins from HR, 32 bins from HG, and 32 bins from HB.

(7) The feature vector is calculated for every image in the repository.
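Steps (1)–(5) can be sketched as follows, assuming a BT.601 RGB↔YCbCr conversion and a plain gradient-magnitude edge map as a stand-in for the Canny detector; all function names are hypothetical.

```python
import numpy as np

def rgb_to_ycbcr(img):
    # ITU-R BT.601 full-range conversion
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    y  = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return np.stack([y, cb, cr], axis=-1)

def ycbcr_to_rgb(img):
    y, cb, cr = img[..., 0], img[..., 1] - 128, img[..., 2] - 128
    r = y + 1.402 * cr
    g = y - 0.344136 * cb - 0.714136 * cr
    b = y + 1.772 * cb
    return np.clip(np.stack([r, g, b], axis=-1), 0, 255)

def edge_map(y):
    """Gradient-magnitude edges as a simple stand-in for Canny."""
    gy, gx = np.gradient(y)
    mag = np.hypot(gx, gy)
    return np.where(mag > mag.mean() + mag.std(), 255.0, 0.0)

def color_feature(img):
    """Steps (1)-(5): edge map of Y recombined with the unchanged Cb/Cr,
    converted back to RGB, then one 256-bin histogram per channel."""
    ycc = rgb_to_ycbcr(img.astype(float))
    ycc[..., 0] = edge_map(ycc[..., 0])
    rgb = ycbcr_to_rgb(ycc)
    return [np.histogram(rgb[..., c], bins=256, range=(0, 256))[0]
            for c in range(3)]

img = np.random.default_rng(1).integers(0, 256, (32, 32, 3)).astype(float)
hists = color_feature(img)
```

Each of the three histograms sums to the pixel count of the image; step (6) then compresses them by wavelet decomposition.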
Figure 4.7 illustrates how the color features are obtained in the above-mentioned steps.
Figure 4.7: (a) RGB Original Image; (b) Y matrix Luminance Image; (c) Canny Luma
Image; (d) Canny RGB Image.
4.2.3 Fusion Vector
Application of the above-mentioned procedure generates two feature vectors, representing the texture features obtained from the bandelet transform and the color features obtained from the YCbCr color space. The aggregation of these feature vectors into a single vector represents the final feature vector for any image.
4.3 Image Representation using HSV
The most significant capability of the proposed algorithm is its characteristic of identifying the most prominent objects in an image, which are considered the core outcomes used to obtain the feature vectors. For this purpose, first of all, image transformations are generated through the bandelet transform, which returns the geometric boundaries of the major objects found in an image; this particular information is then used for image representation, which ensures the retrieval of images in a more precise way. The main attribute of the bandelet-based object estimation is its speed, as it needs only n operations to achieve this goal, where n is the total number of non-overlapping blocks into which the image is divided. Then, the Gabor filter is applied with targeted parameters (as will be described) to estimate the texture content around these boundaries. These geometric boundaries can be erroneously selected, in that they can easily be mistaken for redundant and unwanted texture classes, as all of them closely resemble one another; if not carefully considered, this results in wrong parameter estimation and, consequently, in unsatisfactory image retrieval output. Therefore, to avoid this situation, geometric classification is performed through backpropagation neural networks, as described in section 4.2, which ensures that the texture estimation parameters for the Gabor filter are approximated with maximum accuracy. To strengthen the feature vectors, color components are incorporated in the HSV domain after approximating them by wavelet decomposition of the color histograms. The proposed features are applied to all images present in the image repository, and their semantic classes are determined through ground-truth training with a Support Vector Machine and the finer neighborhood of every image. We also generate an inverted index over the semantic sets, which guarantees fast image retrieval after determining the semantic class of the query image. The complete process of the proposed method is represented in figure 4.8.
Figure 4.8: The proposed method using the HSV color space.
4.3.1 Color Feature HSV Domain
A variety of color spaces are available and serve effectively for different applications [153; 154]. In our method, the color features are extracted on the basis of edge detection in the HSV color space; to extract the edges on the H (hue) component, the Canny edge detector is used. The main steps to obtain the color features are:
(1) The RGB image (I) is converted into the HSV color space.

(2) After the conversion of step 1, we separate the H, S, and V components and apply the Canny edge detector to the H component of the image.

(3) In the next step, we combine the edges obtained in the previous step with the unchanged S and V components.

(4) After step (3), the combined image is converted back into a single RGB image.

(5) Now the individual R, G, and B components are separated and the histogram of each component is calculated. 256 bins are obtained from each of HR, HG, and HB.

(6) To improve the feature performance, we apply a wavelet transform to each histogram obtained in the previous step: the discrete wavelet transform of HR at level 2, and of HG and HB at level 3. After this step, we have 128 bins, i.e., 64 bins from HR, 32 bins from HG, and 32 bins from HB.

(7) The feature vector is calculated for every image in the repository.
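Step (6), which compresses the three 256-bin histograms to 128 bins, can be sketched with a Haar approximation as a stand-in for the discrete wavelet transform (each decomposition level keeps the low-pass band and halves the length); the function names are hypothetical.

```python
import numpy as np

def haar_approx(signal, levels):
    """Keep only the approximation (low-pass) band of a Haar DWT,
    halving the length at each level."""
    out = np.asarray(signal, dtype=float)
    for _ in range(levels):
        out = (out[0::2] + out[1::2]) / np.sqrt(2.0)
    return out

def compress_histograms(hr, hg, hb):
    """Step (6): level-2 DWT of HR (256 -> 64 bins) and level-3 DWT of
    HG and HB (256 -> 32 bins each), concatenated into 128 bins."""
    return np.concatenate([haar_approx(hr, 2),
                           haar_approx(hg, 3),
                           haar_approx(hb, 3)])

f = compress_histograms(np.ones(256), np.ones(256), np.ones(256))
```

This reproduces the bin arithmetic of the text: 256/2² = 64 and 256/2³ = 32, giving 64 + 32 + 32 = 128 bins in total.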
Figure 4.9 shows how the color features are obtained in the above-described steps.
Figure 4.9: (a) RGB Original Image; (b) H matrix Hue Image; (c) Canny Hue Image; (d)
Canny RGB Image.
4.3.2 Combining Texture and HSV Color Features
Application of the above-mentioned procedure generates two feature vectors, representing the texture features obtained from the bandelet transform and the color features obtained from the HSV color space. The aggregation of these feature vectors into a single vector represents the final feature vector for any image.
4.4 Chapter Summary
Content-based image retrieval systems can perform consistently well if they are able to verify their output once the images are retrieved. Efficient and semantically correct image retrieval requires the images to be represented in a powerful way that is robust to the feature variations which can cause undesired output when matched against the images present in the repository. To meet these goals, an image representation scheme is introduced in this chapter that performs an in-depth analysis of texture through the bandelet transform, an Artificial Neural Network, and Gabor filters. To further enhance the image representation capabilities, color features are also incorporated. All of this guarantees the retrieval of images in a more systematic way.
CHAPTER 5
STATE-OF-THE-ART CLASSIFIERS FOR
IMAGE RETRIEVAL
This chapter compares the proposed method against several existing state-of-the-art CBIR techniques; details of the experiments and a comparative analysis are provided in the following subsections. For image retrieval purposes, Artificial Neural Networks (ANN) and Support Vector Machines (SVM) are applied, and the performance of the system is evaluated on three standard datasets used in the domain of CBIR.
5.1 Semantic Association
Semantic image retrieval refers to the capability of a system to understand the meaning of the visual content that constitutes an image and, when searched, to bring back the images in which the same concept dominates. Efforts are carried out to determine the ways through which the rate of correct semantic association for queries can be increased; research in the area of CBIR ultimately revolves around this one intention. But the real challenge lies in the interpretation of semantics, as this is challenging even for human beings, and machines suffer the same problem with far more severity. A two-fold strategy is normally followed in standard CBIR systems for improving the semantic association:
• Generate powerful image representations that ensure meaningful images are retrieved in response to queries.

• Diminish the semantic gap between the low-level image descriptions (here, image descriptions are used interchangeably with image representations) and the high-level image semantics. Many CBIR systems apply different machine learning techniques to achieve this goal [155].
Hence, it is believed in CBIR research that meaningful image representations lead to high semantic retrieval performance. Therefore, much research in the domain of content-based image retrieval has been carried out to improve the visual features and to determine new feature types. These approaches usually match the feature vector of the query image with the feature vectors of the images present in the repository and rank the images on the basis of similarity.
But this approach suffers from two main flaws, due to which its results are not satisfactory. (1) Such CBIR systems rely solely on image features: they rank the images by their feature distance to the query image and generate the output without verifying it. The problem with this methodology is that many images may appear in the response while not necessarily being relevant at all. For example, as described in figure 5.1, the CBIR system may be deceived by the girl's image when the dog image is the query, because the two images are very close to each other in terms of feature distance. (2) Secondly, such systems ignore the similarity amongst the neighbors of the query image for output finalization, which is a major reason for inconsistent output. It is our experimental observation that the probability of a correct semantic association made on the basis of a single image is far smaller than that of the correct semantic association of multiple images [156].
Figure 5.1: Verification inconsistency: the visual features are similar, but the two images belong to different semantic classes.
Keeping the above-mentioned points in mind, the research in this thesis focuses on finding ways in which such defects can be avoided and the retrieval performance of CBIR systems can be improved. For this, we have focused on the following points:
• To achieve semantically correct retrieval of images, an effective feature extraction method is introduced.

• To evaluate the proposed feature sets, and to reduce the inherent gap between image features and semantics, a neural network based architecture is presented.

• To raise the probability of correct semantic association and to provide output verification, the true neighborhood of the query image is detected and utilized for the finalization of the semantic class.

• The Pearson correlation is used as a distance measure for the neighborhood selection, which also verifies the effectiveness of the proposed scheme.

• To evaluate the image representation, and to reduce the semantic gap, a Support Vector Machine is presented.
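The Pearson-correlation distance and the neighborhood selection mentioned above can be sketched as follows; `true_neighborhood` is a hypothetical name and the toy repository is illustrative only.

```python
import numpy as np

def pearson_distance(u, v):
    """Distance derived from the Pearson correlation coefficient:
    0 for perfectly correlated vectors, 2 for anti-correlated ones."""
    u = np.asarray(u, dtype=float) - np.mean(u)
    v = np.asarray(v, dtype=float) - np.mean(v)
    return 1.0 - (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

def true_neighborhood(query, repo, k=5):
    """Indices of the k repository vectors nearest to the query under
    the Pearson distance; these neighbors later vote on the class."""
    d = [pearson_distance(query, f) for f in repo]
    return np.argsort(d)[:k]

repo = [[1, 2, 3], [3, 2, 1], [2, 4, 6], [1, 1, 2]]   # toy feature vectors
nn = true_neighborhood([1, 2, 3], repo, k=2)
```

Because the Pearson correlation is invariant to shifting and scaling of the feature vector, two proportional feature vectors have distance 0 even when their Euclidean distance is large.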
5.2 Content based image retrieval using ANN
Once the images present in the image repository are represented in the form of low-level features, we can determine their semantic class. To determine the actual semantic class, a sub-repository of images is generated containing representatives of M known classes, where every class contains R ≥ 2 images. In our implementation, the value of R is set to 30, which means that 30 images from every semantic class of a ground-truth image repository are used for the development of the training repository. On this sub-repository, neural networks with class-specific association parameters are trained. The one-against-all-classes (OAA) association rule is used for the network development, with the target of decreasing the mean squared error between the actual association and the association obtained from the NN. The class-specific training set can be defined as Ω_tr = Ω_pos ∪ Ω_neg, where Ω_pos represents the R images from a particular class, and Ω_neg contains all other images in the training repository. Once the training is complete, the semantic class of every image present in the image repository is determined on the basis of the decision function and the association rules.
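The construction of the class-specific OAA training sets Ω_tr = Ω_pos ∪ Ω_neg can be sketched as follows, with a toy repository of R = 2 images per class in place of the R = 30 used in the thesis; `oaa_training_sets` is a hypothetical name.

```python
import numpy as np

def oaa_training_sets(features, labels):
    """One binary training set per known class: the R images of the
    class are positives (target 1), all other training images are
    negatives (target 0), following the OAA rule."""
    return {c: (np.asarray(features),
                np.array([1 if y == c else 0 for y in labels]))
            for c in sorted(set(labels))}

# toy repository: 3 classes with R = 2 images each
feats = np.arange(12).reshape(6, 2)
labs = ["beach", "beach", "bus", "bus", "horse", "horse"]
sets = oaa_training_sets(feats, labs)
```

One class-specific network is then trained per entry of `sets`, targeting the binary vector of that class.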
Due to the object composition present in the images, many images may tend to show association with other classes, e.g., mountain images sometimes associate with beach images. Therefore, a mechanism is required to reduce such associations. This is the reason that the class finalization process also involves the top K neighbors in the semantic association process, using the majority voting rule (MVR) [157]:

C*(x) = sgn( ∑_i C_i(X) − (K − 1)/2 )   (5.1)
where C_i(X) is the class-wise association of the input image and its top neighbors:

C_i(X) = y_f^l   (5.2)

where l = 1, 2, 3, ..., n indexes the neural network structures, and y_f^l returns the association factor of a particular neural network structure with a specific class. The MVR counts the largest number of classifiers that agree with each other [157]. Therefore, the class association can be determined by:
C*_F(x) = argmax( ∑_i C*(x) )   (5.3)
Once the semantic association of all images present in the image repository is determined, we store the semantic association values in a file that serves as the semantic association database. Therefore, after determining the semantic class of the query image through the trained neural networks, we compute the Euclidean distance of the query image to all images of the same semantic class, taking into account the values of the semantic association database, and generate the output on the basis of the feature similarities.
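The majority voting rule of equations (5.1) and (5.3) can be sketched as follows; the interpretation that the winning class must collect strictly more than (K − 1)/2 votes is an assumption drawn from the sgn form of equation (5.1), and `majority_vote` is a hypothetical name.

```python
import numpy as np

def majority_vote(votes, k):
    """Eqs. (5.1)/(5.3): the k collected class votes are tallied and the
    top class wins only with a strict majority, i.e. more than (k-1)/2
    votes; otherwise no class is finalized."""
    classes, counts = np.unique(votes, return_counts=True)
    best = classes[np.argmax(counts)]
    return best if counts.max() > (k - 1) / 2 else None

# query's own network answer says "mountain", but 4 of the 5 votes say "beach"
winner = majority_vote(["mountain", "beach", "beach", "beach", "beach"], k=5)
```

This is how a mountain image that individually associates with the beach class can still be corrected (or confirmed) by its neighborhood.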
5.3 Content based image retrieval using SVM
Application of the above-mentioned procedure returns two feature vectors, representing the texture features obtained from the bandelet transform and the color features obtained from the HSV color space. The aggregation of these feature vectors into a single vector represents the final feature vector for any image. Support Vector Machines (SVMs) belong to the family of supervised learning techniques used to classify images. They view a given image database as two sets of vectors in an n-dimensional space and construct a separating hyperplane that maximizes the margin between the images relevant to the query and the images not relevant to it. The SVM is a kernel method, and the kernel function used in an SVM is crucial in determining its performance. The basic principle of SVMs is the maximum-margin classifier: using kernel methods, the data are first mapped implicitly into a high-dimensional feature space, and the decision function, which may be non-linear in the original space, is determined by the maximum-margin hyperplane in that feature space. Now, suppose a set of inputs belongs to two categories [158]:
{(X_i, Y_i)}_{i=1}^{N},  Y_i ∈ {+1, −1}   (5.4)
99
5.3 Content based image retrieval using SVM
where X_i are the input vectors and Y_i the class labels. The separating hyperplane is created by finding efficient values of the weight vector W and bias b as follows:

W^T · X + b = 0   (5.5)

The two classes can be separated from each other; therefore, we first find the hyperplanes bounding the maximum margin 2/‖W‖:

W^T · X_i + b ≥ +1   (5.6)

W^T · X_i + b ≤ −1   (5.7)
Binary classification is performed through the kernel version of the Wolfe dual problem with Lagrange multipliers α_i:

W(α) = ∑_{i=1}^{m} α_i − (1/2) ∑_{i,j=1}^{m} α_i α_j y_i y_j K(X_i, X_j)   (5.8)
subject to the constraints α_i ≥ 0 and ∑_{i=1}^{m} α_i y_i = 0. After obtaining the optimal values of α_i, the decision function based on the kernel of the SVM classifier is given by:

F(X) = sgn[g(x)]   (5.9)

g(x) = ∑_{i=1}^{m} α_i Y_i K(X_i, x) + b   (5.10)
The above output equation is known as the hyperplane decision function of the SVM. High values of g(x) represent high prediction confidence, and vice versa. After generating a sub-repository of images, as well as generating the new feature through a combination of color and texture features, we train category-specific support vector machines on this sub-repository following the one-against-all-classes (OAA) scheme: all feature vectors found in the positive training class of a certain category are labeled 1, and all other feature vectors, which do not belong to this specific category, are labeled 0. In this way, we build the training sets for all categories and train the SVMs using quadratic programming for the optimization, with the maximum number of iterations set to 1000. After training these support vector machines, all images present in the image repository are tested against all trained support vector machines, and on the basis of the decision function they are associated with their specific semantic class. Our decision function is as follows:
I*_hsv = argmax_l (Y′_{f_l})   (5.11)
where l = 1, 2, ..., n indexes the support vector machines, argmax selects the argument for which the given function attains its maximum value, Y′_{f_l} returns the association of the corresponding support vector machine, and I*_hsv represents the obtained associated class. Due to the object composition present in the images, many images may tend to show association with other classes, e.g., mountain images sometimes associate with beach images; therefore, a mechanism is required to reduce such associations. This is the reason that the class finalization process also involves the top K neighbors in the semantic association process, using the majority voting rule (MVR):
C*(X) = sgn( ∑_i C_i(X) − (K − 1)/2 )   (5.12)
where C_i(X) is the class-wise association of the input image with its top neighbors. The MVR counts the largest number of classifiers that agree with each other [159]. So, according to equation 5.12, the class association can be determined by:

C_fe(X) = argmax( ∑_i C*(X) )   (5.13)
The aforementioned process is applied to all images present in the image repository, and their semantic class is determined. In this manner, when the system suggests an output or semantic class for any query image, only images having the same semantic class are returned to the user, after ranking on the basis of their feature-similarity distance with respect to the query image.
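The decision function of equations (5.9)–(5.11) can be sketched as follows. The support vectors, multipliers, and RBF kernel below are hand-set placeholders, not the result of the quadratic-programming training described above:

```python
import numpy as np

def rbf_kernel(a, b, gamma=0.5):
    return np.exp(-gamma * np.sum((np.asarray(a) - np.asarray(b)) ** 2))

def svm_decision(x, support, alphas, ys, b, kernel=rbf_kernel):
    """Eqs. (5.9)-(5.10): g(x) = sum_i alpha_i y_i K(x_i, x) + b and
    F(x) = sgn(g(x)); a larger |g(x)| means higher confidence."""
    g = sum(a * y * kernel(s, x) for a, y, s in zip(alphas, ys, support)) + b
    return np.sign(g), float(g)

def oaa_predict(x, machines):
    """Eq. (5.11): evaluate every class-specific SVM and keep the class
    whose decision value is largest."""
    scores = {c: svm_decision(x, *m)[1] for c, m in machines.items()}
    return max(scores, key=scores.get)

# hand-set placeholder machines: (support vectors, alphas, labels, bias)
machines = {"beach": ([np.array([0.0, 0.0])], [1.0], [1], 0.0),
            "bus":   ([np.array([5.0, 5.0])], [1.0], [1], 0.0)}
pred = oaa_predict(np.array([0.1, 0.1]), machines)
```

In the actual system, the `machines` entries would come from the OAA quadratic-programming training, and the predicted class would then be refined by the MVR of equation (5.12).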
5.4 Performance Evaluation
To elaborate the retrieval capabilities of the proposed method, numerous experiments were performed on three different image datasets. For implementation purposes, we used Matlab 2010 in the Windows 7 environment on a Dell Core i3 machine. The details of the experiments are presented in the following subsections. Section 5.4.1 describes the datasets used for image retrieval purposes. Section 5.4.2 is about the retrieval precision and recall on randomly selected queries. Sections 5.4.3 and 5.4.5 describe the comparison of the proposed method with some state-of-the-art works in CBIR.
5.4.1 Image Datasets
For our experiments, we used three image datasets, namely Corel, COIL, and Caltech 101. The Corel dataset contains 10,908 images, each of size 384 × 256 or 256 × 384. For this dataset, we report results on ten semantic categories having 100 images each: Africa, Buses, Beach, Dinosaurs, Buildings, Elephants, Horses, Mountains, Flowers, and Food. The reason for reporting results on these categories is that they are the same semantic groups used by most researchers working in the domain of CBIR to report the effectiveness of their work [2; 85; 89; 115; 117], so a clear performance comparison is possible in terms of the reported results. To further elaborate the performance of the proposed system, experiments were also performed on the Columbia Object Image Library (COIL) [89], which contains 7200 images from 100 different categories. Finally, we used the Caltech 101 image set. This dataset consists of 101 image categories, and every category has a different number of images. For simplification purposes, we manually selected 30 categories which contain at least 100 images each.
5.4.2 Retrieval Precision/Recall Evaluation
For our experiments, we wrote a computer simulation that randomly selects 300 images from the image repository and uses them as query images. As already described above, we use image datasets in which the images are grouped into semantic concepts, so on the basis of their labels we can automatically determine their semantic association. We run this simulation on all three datasets mentioned previously and determine the performance by counting how many correct results are obtained for each query image. In the proposed work, we report the average result over five repetitions of the experiment for each image category. For our experiments, an inverted index mechanism is proposed which, after determining the semantic class of the query image, returns the relevant images against it, i.e., the method followed by Google for text document search. According to the proposed method, we apply the trained neural networks to every image present in the image repository and determine its semantic class. The class association information is stored in a file which serves as the semantic association database. The benefit of this approach is that, after determining the semantic information once, we only need to determine the semantic class of the query image and the relevance information for the predetermined semantic cluster. Overall, the class association accuracy is determined in terms of precision and recall using the following formulas:
Precision = NA(q) / NR(q)    (5.14)

Recall = NA(q) / Nt    (5.15)
where NA(q) represents the number of relevant images matching the query image, NR(q)
represents the number of images retrieved against the query image, and Nt is the total
number of relevant images available in the database. Precision, or specificity,
measures the ability of the system to retrieve only those images which are relevant to
a query image among all of the retrieved images, while recall, also known as
sensitivity or true positive rate, measures the ability of the classifier to associate
samples with their actual class. For the elaboration of results, the top 20 retrieved
images against each query image are used to compute precision and recall. As mentioned
previously, we report the average of the results after running our technique five
times. To elaborate the performance of the proposed system, we randomly selected three
images from each of the 10 previously mentioned image categories of the Corel image
set and display their results in Figure 5.2. The retrieval results shown represent the
precision obtained by our method against these query images, reported from the top 10
to the top 40 retrievals.
The quantitative analysis of the proposed method suggests that the quality of the
system is good in terms of precision, as reliable results appear against these random
selections. The most reliable results appear in the range of 10 to 30 retrievals per
query image, as there are 100 images in a single category. It is important to note
that these results are achieved without any kind of external supervision by the user,
as most relevance-feedback-based CBIR techniques require.
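The precision and recall computation above can be sketched in code as follows. This is a minimal illustration of Equations 5.14 and 5.15 at a top-20 cut-off, using placeholder labels rather than the actual experimental data:

```python
def precision_recall_at_k(retrieved_labels, query_label, total_relevant, k=20):
    """Precision and recall over the top-k retrieved images (Eqs. 5.14-5.15)."""
    top_k = retrieved_labels[:k]
    relevant_retrieved = sum(1 for label in top_k if label == query_label)  # NA(q)
    precision = relevant_retrieved / len(top_k)      # NA(q) / NR(q)
    recall = relevant_retrieved / total_relevant     # NA(q) / Nt
    return precision, recall

# Example: 14 of the top 20 results share the query's class ("horses"),
# and the database holds 100 images of that class.
ranked = ["horses"] * 14 + ["beach"] * 6
p, r = precision_recall_at_k(ranked, "horses", total_relevant=100)
print(p, r)  # 0.7 0.14
```

Averaging these values over the 300 random queries and the five experiment runs yields the figures reported in the tables below.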
In Figures 5.3 and 5.4, the same experiment is performed on the Caltech 101 image set
by randomly selecting 4 images. Precision and recall are reported for the top 10 to
top 60 retrievals. Hence, on the basis of retrieval accuracy, we can say the proposed
method is quite efficient. Another important point is that the results reported here
represent the retrieval against these random queries, while the overall accuracy is
reported as the average over 100 query images with the experiments performed five
times.
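The inverted-index mechanism described in this subsection can be sketched roughly as below. This is a simplified illustration: the class labels are assumed to have been produced offline by the trained neural networks, which are not shown here, and the image ids are placeholders.

```python
from collections import defaultdict

def build_semantic_index(image_labels):
    """Map each semantic class to the image ids assigned to it.

    image_labels: dict of image_id -> predicted class, computed once by
    running the trained networks over the whole repository.
    """
    index = defaultdict(list)
    for image_id, semantic_class in image_labels.items():
        index[semantic_class].append(image_id)
    return index

def retrieve(index, query_class):
    """At query time only the query image needs classification; the
    relevant images are read directly from the precomputed index."""
    return index.get(query_class, [])

repository_labels = {"img1": "buses", "img2": "horses", "img3": "buses"}
index = build_semantic_index(repository_labels)
print(retrieve(index, "buses"))  # ['img1', 'img3']
```

Because the repository is classified only once, the per-query cost reduces to a single classification plus a dictionary lookup, which is what makes the retrieval fast.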
5.4.3 Comparison on Corel Image Set
To determine the usability of the proposed method, it is compared with some
state-of-the-art methods in CBIR. In this regard, the technique (texture + Color YCbCr
and ANN) is compared with [2; 85; 89; 115; 117]. We chose to compare with these
techniques because these systems report their results on the common denomination of
the ten semantic categories of the Corel dataset described earlier; therefore, a clear
performance comparison is possible. Table 5.1 presents the class-wise comparison of
the proposed system with the other systems in terms of precision. The results show
that the proposed system performs better than all the other systems in terms of the
average precision obtained. Table 5.2 presents the performance comparison in terms of
recall with the same systems, from which it can easily be observed that the proposed
system has the highest recall rates. Figure 5.5 shows the performance of the proposed
method in terms of precision against the other state-of-the-art systems, and Figure
5.6 shows the corresponding comparison in terms of recall, matching the values in
Table 5.2.
Figure 5.2: Query Performance on Corel image dataset with Top 10 to Top 40 Retrievals.
Figure 5.3: Query Performance on Caltech image dataset with Top 10 to Top 60 Retrievals
in terms of Precision.
Figure 5.4: Query Performance on Caltech image dataset with Top 10 to Top 60 Retrievals
in terms of Recall.
Table 5.1: Comparison of mean precision obtained by the proposed method with other
standard retrieval systems on top 20 retrievals.
Class Proposed Method [117] [115] [85] [89] [2]
Africa 0.65 0.45 0.56 0.70 0.64 0.68
Beach 0.70 0.39 0.53 0.56 0.64 0.54
Buildings 0.75 0.37 0.61 0.57 0.70 0.54
Buses 0.95 0.74 0.89 0.87 0.92 0.88
Dinosaurs 1.00 0.91 0.98 0.97 0.99 0.99
Elephants 0.80 0.30 0.57 0.67 0.78 0.65
Flowers 0.95 0.85 0.89 0.91 0.95 0.89
Horses 0.90 0.56 0.78 0.83 0.95 0.80
Mountains 0.75 0.29 0.51 0.53 0.74 0.52
Food 0.75 0.36 0.69 0.74 0.81 0.73
Mean 0.820 0.522 0.701 0.735 0.812 0.722
Figure 5.5: Comparison of mean precision obtained by the proposed method with other
standard retrieval systems.
Table 5.2: Comparison of mean recall obtained by the proposed method with other
standard retrieval systems on top 20 retrievals.
Class Proposed Method [117] [115] [85] [89] [2]
Africa 0.13 0.11 0.15 0.15 0.13 0.14
Beach 0.14 0.12 0.19 0.19 0.13 0.19
Buildings 0.15 0.12 0.18 0.18 0.14 0.17
Buses 0.19 0.09 0.11 0.11 0.18 0.12
Dinosaurs 0.20 0.07 0.09 0.09 0.20 0.10
Elephants 0.16 0.13 0.15 0.15 0.16 0.14
Flowers 0.19 0.08 0.11 0.11 0.19 0.11
Horses 0.18 0.10 0.13 0.13 0.19 0.13
Mountains 0.15 0.13 0.22 0.22 0.15 0.21
Food 0.15 0.12 0.13 0.13 0.16 0.13
Mean 0.164 0.107 0.146 0.146 0.163 0.144
Figure 5.6: Comparison of mean recall obtained by the proposed method with other
standard retrieval systems.
5.4.4 Comparison with State-of-the-Art Methods
The retrieval results are compared with state-of-the-art image retrieval methods,
including Efficient content-based image retrieval using Multiple Support Vector
Machines Ensemble (EMSVM) [160], Simplicity [107], CLUE [118], patch-based Histogram
of Oriented Gradients-Local Binary Pattern (patch-based HOG-LBP) [161], and Edge
orientation difference histogram and color-SIFT (EODH and Color-SIFT) [162]. We chose
to compare with these techniques because these systems report their results on the
common denomination of the ten semantic categories of the Corel dataset described
earlier; hence, a clear performance comparison with these state-of-the-art methods is
possible. Table 5.3 presents the comparison of the proposed system with the other
systems in terms of average precision. The results show that the proposed system
performs better than all the other systems in terms of the average precision obtained.
The same results are graphically illustrated in Figure 5.7. Table 5.4 presents the
performance comparison in terms of recall with the same systems, from which it can
easily be observed that the proposed system has the better recall rates. Figure 5.8
shows the corresponding comparison in terms of recall.
Table 5.3: Comparison of mean precision obtained by the proposed method with
state-of-the-art methods on top 20 retrievals.
Class Proposed EMSVM Simplicity CLUE HOG-LBP SIFT
[160] [107] [118] [161] [162]
Africa 0.65 0.5 0.4 0.5 0.55 0.75
Beach 0.70 0.7 0.3 0.35 0.47 0.38
Buildings 0.75 0.2 0.4 0.45 0.56 0.54
Buses 0.95 0.8 0.6 0.65 0.91 0.97
Dinosaurs 1.00 0.9 0.96 0.95 0.94 0.99
Elephants 0.80 0.6 0.3 0.3 0.49 0.66
Flowers 0.95 1.00 0.6 0.75 0.85 0.92
Horses 0.90 0.8 0.6 0.7 0.52 0.87
Mountains 0.75 0.5 0.25 0.3 0.37 0.59
Food 0.75 0.6 0.45 0.6 0.55 0.62
Mean 0.820 0.661 0.486 0.555 0.621 0.729
Figure 5.7: Comparison of mean precision obtained by the proposed method with
state-of-the-art retrieval systems.
Table 5.4: Comparison of mean recall obtained by the proposed method with
state-of-the-art methods on top 20 retrievals.
Class Proposed Method [160] [107] [118] [161] [162]
Africa 0.13 0.1 0.08 0.1 0.11 0.15
Beach 0.14 0.14 0.06 0.07 0.09 0.08
Buildings 0.15 0.04 0.07 0.09 0.11 0.11
Buses 0.19 0.14 0.12 0.13 0.18 0.19
Dinosaurs 0.20 0.18 0.19 0.19 0.1 0.13
Elephants 0.16 0.12 0.06 0.06 0.1 0.13
Flowers 0.19 0.2 0.12 0.15 0.17 0.18
Horses 0.18 0.16 0.12 0.14 0.1 0.17
Mountains 0.15 0.1 0.05 0.06 0.08 0.12
Food 0.15 0.12 0.09 0.12 0.11 0.13
Mean 0.164 0.130 0.096 0.111 0.124 0.146
Figure 5.8: Comparison of mean recall obtained by the proposed method with
state-of-the-art retrieval systems.
The second technique (Bandelet texture + Color HSV and SVM) is compared with
[6; 85; 89; 115; 159; 160; 162]. Table 5.5 presents the class-wise comparison of the
proposed system with the other systems in terms of precision. The results show that
the proposed system performs better than all the other systems in terms of the average
precision obtained. Table 5.6 presents the performance comparison in terms of recall
with the same systems, from which it can easily be observed that the proposed system
has the highest recall rates. Figures 5.9 and 5.10 show the performance of the
proposed method in terms of precision and recall against the other state-of-the-art
systems.
Table 5.5: Mean precision of the proposed Method-HSV compared with other standard
retrieval systems on top 20 retrievals.
Class Proposed Method-HSV [115] [89] [85] [159] [160] [162] [6]
Africa 0.8 0.56 0.64 0.70 0.42 0.5 0.75 0.65
Beach 0.75 0.53 0.64 0.56 0.45 0.7 0.38 0.7
Buildings 0.75 0.61 0.70 0.57 0.41 0.2 0.54 0.75
Buses 0.9 0.89 0.92 0.87 0.85 0.8 0.97 0.95
Dinosaurs 1.00 0.98 0.99 0.97 0.59 0.9 0.99 1.0
Elephants 0.90 0.57 0.78 0.67 0.43 0.6 0.66 0.8
Flowers 0.8 0.89 0.95 0.91 0.90 1.0 0.92 0.95
Horses 0.90 0.78 0.95 0.83 0.59 0.80 0.87 0.9
Mountains 0.70 0.51 0.74 0.53 0.27 0.5 0.59 0.75
Food 0.8 0.69 0.81 0.74 0.43 0.6 0.62 0.75
Mean 0.830 0.701 0.812 0.735 0.534 0.660 0.729 0.820
Figure 5.9: Mean precision of the proposed Method-HSV compared with other standard
retrieval systems on top 20 retrievals.
Table 5.6: Comparison of mean recall obtained by the proposed Method-HSV with other
standard retrieval systems on top 20 retrievals.
Class Proposed Method-HSV [115] [89] [85] [159] [160] [162] [6]
Africa 0.16 0.15 0.13 0.15 0.08 0.1 0.15 0.13
Beach 0.15 0.19 0.13 0.19 0.09 0.14 0.08 0.14
Buildings 0.15 0.18 0.14 0.18 0.08 0.04 0.11 0.15
Buses 0.18 0.11 0.18 0.11 0.17 0.14 0.19 0.19
Dinosaurs 0.20 0.09 0.20 0.09 0.12 0.18 0.13 0.2
Elephants 0.18 0.15 0.16 0.15 0.09 0.12 0.13 0.16
Flowers 0.16 0.11 0.19 0.11 0.18 0.2 0.18 0.19
Horses 0.18 0.13 0.19 0.13 0.12 0.16 0.17 0.18
Mountains 0.14 0.22 0.15 0.22 0.05 0.1 0.12 0.15
Food 0.16 0.13 0.16 0.13 0.09 0.12 0.13 0.15
Mean 0.166 0.146 0.163 0.146 0.107 0.130 0.139 0.164
Figure 5.10: Comparison of mean recall obtained by the proposed Method-HSV with other
standard retrieval systems.
5.4.5 Comparison on Coil Image Set
From the precision and recall results described for the Corel dataset, we can observe
that ICTEDCT has the second highest precision and recall rates. Therefore, we report
the performance comparison on the Coil dataset at different retrieval rates against
ICTEDCT [89]. For this experiment, five images are selected from each image category,
and the performance of both systems is compared on each category. From the results
elaborated in Figure 5.11, it can clearly be observed that the proposed method gives
higher recall and precision rates compared to ICTEDCT [89]. Hence, from the results of
the proposed method on the Coil and Corel datasets, we can say that the proposed
method is considerably more precise and effective than the other CBIR systems.
Figure 5.11: Comparison of precision and recall obtained by the proposed method with
ICTEDCT.
5.5 Chapter Summary
With its many application benefits, content-based image retrieval has attracted a
great deal of research attention. This research work has introduced a mechanism for
automatic image retrieval. The major consideration of the research was the finding
that the most prominent image results appear if we generate image representations that
emphasize the core image objects instead of considering every image patch. Therefore,
we applied the bandelet transform for feature extraction, which considers the core
objects found in an image. To further enhance the image representation capabilities,
color features are also incorporated. Semantic association is performed through
artificial neural networks, and an inverted index mechanism is used to return the
images against queries to ensure fast retrieval. The results of the proposed method
are reported on three image datasets, namely Corel, Coil, and Caltech-101. The
comparison with other standard CBIR systems reveals that the proposed system
outperforms them in terms of average precision and recall values.
CHAPTER 6
CONCLUSIONS AND FUTURE WORK
6.1 Conclusion
This chapter presents the major research contributions of the thesis and provides an
insight into future work. The focus of the research is on identifying ways to achieve
content-based image retrieval in a stable manner. The ultimate goal was to reduce the
semantic gap, which is a major source of improper image retrieval output. The reasons
identified in this regard were: (1) improper representation of the images, and (2)
improper similarity measures incorporated to bring out the semantically correct
output. Therefore, to overcome these limitations, the research focused on a novel
feature extraction scheme that computes the global image energy on the basis of the
core objects found in an image. For feature extraction, we performed an in-depth image
analysis with the bandelet transform and identified the objects that convey the image
theme as the core objects. Once these objects are identified, we computed the image
energy by confining the feature extraction process to the blocks that contain the
objects of interest. For this, the texture of the blocks is considered,
texture-specific parameters are applied, and image representations are generated by
computing the eigenvalues of the Gabor filters. The texture features rely heavily on
the artificial neural networks, which give precise information about the type of
texture so that the appropriate texture-specific parameters can be applied. To further
improve the capabilities of the feature vectors, the color features obtained in the
YCbCr domain are fused with the bandelet- and Gabor-based texture representations.
This results in very powerful image representations that are invariant to scale and
rotation. Over these representations, semantic sensors (category-specific artificial
neural networks) are trained that guarantee robust output against image queries [6].
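The fusion step described above, concatenating the texture representation with the color features before classification, could be sketched as follows. This is only an illustrative outline with placeholder vectors and a simple L2 normalization per modality; it is not the thesis implementation itself, and the vector dimensions are hypothetical.

```python
import numpy as np

def fuse_features(texture_vec, color_vec):
    """Concatenate texture and color descriptors into one feature vector,
    normalizing each part so that neither modality dominates."""
    t = texture_vec / (np.linalg.norm(texture_vec) + 1e-12)
    c = color_vec / (np.linalg.norm(color_vec) + 1e-12)
    return np.concatenate([t, c])

# Placeholder descriptors standing in for the bandelet/Gabor texture
# energies and the YCbCr color features.
texture = np.ones(32)
color = np.ones(16)
fused = fuse_features(texture, color)
print(fused.shape)  # (48,)
```

The fused vector would then be the input on which the category-specific neural networks are trained.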
The work presented in this thesis has some limitations as well. We performed
content-based image retrieval for a fixed number of semantic categories. Therefore, we
admit that the methods described cannot handle datasets whose categories are
unsupervised, unbounded, or not covered by the training. Another limitation is
elaborated in Section 1.2 using Figures 1.1 and 1.2. If we try to match the contents
of Figure 1.2 against Figure 1.1, it is not possible to find the relevant information
even though Figure 1.2 contains the complete contents of Figure 1.1. The reason is
that the global image frequencies of the two images are different. In this thesis, we
used three datasets, namely Corel, Coil, and Caltech 101, to evaluate the image
retrieval performance.
6.2 Future Work
There is tremendous scope for research in the domain of CBIR. The major emphasis would
be on the development of a method able to overcome the limitations present in the
current work. Secondly, we can extend our work toward application development with
practical implications. Here, we would like to emphasize some of them.
6.2.1 Extensions of the Work
• In the current research work, the problem of automatic image retrieval is addressed;
in this regard, the emphasis was on the development of a robust way of image
representation and retrieval. The results of the techniques are comparatively higher
than those of the other state-of-the-art methods. Still, for many images, the system
will not be able to find the exact semantic associations. The major reason for this
improper output is the diversity of the content found in the images. Therefore,
situations like this need to be handled during image retrieval. The simplest solution
is the involvement of the user in the image retrieval process to guide the system
through feedback when it is not able to produce the desired output. Hence, in future
work, we will extend our method to address the problem of mis-associations through
relevance feedback.
• We can further extend the capabilities of the proposed image retrieval technique by
including evolutionary algorithms such as GA and PSO, treating image retrieval as an
optimization problem in which we seek to maximize the quality of the retrieval
results. Currently, we are working on the application of fuzzy genetic algorithms for
obtaining the semantic groups in the data.
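Treating retrieval quality as an objective to maximize, as suggested in the second point, could look roughly like the toy genetic-algorithm sketch below. The fitness function here is entirely synthetic (a stand-in for mean precision over validation queries), and the three weights are hypothetical feature-channel weights, not parameters from this thesis.

```python
import random

TARGET = [0.6, 0.3, 0.1]  # hypothetical "good" feature weighting

def fitness(weights):
    """Synthetic stand-in for retrieval quality; a real fitness would
    evaluate mean precision of the system under these feature weights."""
    return -sum((w - t) ** 2 for w, t in zip(weights, TARGET))

def evolve(pop_size=20, generations=60, seed=0):
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(3)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]          # elitist selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)              # crossover + mutation
            children.append([(x + y) / 2 + rng.gauss(0, 0.02)
                             for x, y in zip(a, b)])
        population = parents + children
    return max(population, key=fitness)

best = evolve()
print([round(w, 2) for w in best])
```

A PSO variant would follow the same outer loop, replacing crossover and mutation with velocity updates toward personal and global bests.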
6.2.2 Practical Implications of CBIR
We can extend the CBIR work presented in this dissertation toward application
development with impact on society. Here we would like to present a few examples.
6.2.3 Anomalous Events Monitoring in Surveillance Videos
The Government of Pakistan is planning to install over 1.5 million operational CCTV
cameras, virtually one camera for every 100th person, and this figure is increasing
rapidly throughout the world. Effectively monitoring hundreds or even thousands of
cameras at the same time is challenging, if not impossible. Research shows that humans
have an attention span of about twenty minutes, and even less when doing mundane
tasks. Security personnel can often become bored or distracted and have other duties
that keep them from efficiently monitoring cameras. Given the high stakes inherent in
security operations, a proactive approach to intelligent video analysis is vital.
Therefore, the basic theme of a video-analysis product would be complete
self-learning.
Security concerns are by no means the only factor driving the rapid growth of CCTV
cameras. Another essential reason is access to the hidden knowledge extracted from
CCTV footage for effective business decision making, such as customer service, store
design, reducing store shrinkage, and product marketing. The activities happening
within observed scenes are generally the most crucial semantic entities that can be
extracted from videos and recordings. Most of the work presented in the past on
finding patterns or recurring events deals with the discovery of normal events. In
contrast, the framework here can detect anomalous events that do not belong to the
recurring series of events, i.e., events that do not follow the repeated sequence.
This information can be very useful for detecting unknown abnormal events and could
provide early intelligence for the redistribution of resources to specific areas
under surveillance.
6.2.4 Traffic Management
We can apply the ideas of the content-based image retrieval research presented in this
thesis to effectively monitoring traffic. Cameras would be mounted on the highways,
and based on the ideas discussed above, we could automatically identify the places
where traffic jams occur. With such information, resources could be deployed to clear
the traffic jams. We could also stream this information over the Internet in real
time, so that people receive suggestions in advance and can choose an alternate path
to reach their destination. Hence, this research would be a great step towards the
concept of "Intelligent Cities".
6.2.5 CBIR using Hadoop
With information technology developing rapidly, the variety and quantity of image data
are increasing quickly. How to retrieve desired images from massive image sets
containing millions of images is an open problem. For these very large image sets,
where images are retrieved in a content-based way, we can speed up the image retrieval
process by utilizing the MapReduce distributed and parallel computing model. It is
important to mention that MapReduce is the abstract model, while Hadoop serves as the
tool for its implementation. Through Hadoop, we can design and implement a system that
overcomes the performance bottlenecks caused by computational complexity and the large
amount of data involved in constructing a CBIR system.
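As a rough local illustration (plain Python, not actual Hadoop code), the map and reduce stages of such a system might be shaped as follows: the map step scores each image against the query independently, so the repository can be sharded across nodes, while the reduce step merges the scores and keeps the closest matches. The image ids and feature vectors are placeholders.

```python
def map_stage(feature_records, query):
    """Map: emit (image_id, distance) for each (image_id, features) record.
    Each record is scored independently, so shards can run in parallel."""
    for image_id, features in feature_records:
        distance = sum((a - b) ** 2 for a, b in zip(features, query))
        yield image_id, distance

def reduce_stage(scored_records, top_k=10):
    """Reduce: merge the per-shard scores and keep the k closest images."""
    return sorted(scored_records, key=lambda pair: pair[1])[:top_k]

# Toy repository of precomputed feature vectors.
repository = [("img1", [0.1, 0.9]), ("img2", [0.8, 0.2]), ("img3", [0.15, 0.85])]
results = reduce_stage(map_stage(repository, query=[0.1, 0.9]), top_k=2)
print([image_id for image_id, _ in results])  # ['img1', 'img3']
```

In a real deployment these two functions would become the Mapper and Reducer of a Hadoop job reading feature records from a distributed file system.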
References
[1] G.-H. Liu, Z.-Y. Li, L. Zhang, and Y. Xu, “Image retrieval based on micro-structure
descriptor,” Pattern Recognition, vol. 44, no. 9, pp. 2123–2133, 2011.
[2] C.-H. Lin, R.-T. Chen, and Y.-K. Chan, “A smart content-based image retrieval
system based on color and texture feature,” Image and Vision Computing, vol. 27,
no. 6, pp. 658–665, 2009.
[3] F. Long, H. Zhang, and D. D. Feng, “Fundamentals of content-based image re-
trieval,” in Multimedia Information Retrieval and Management. Springer, 2003,
pp. 1–26.
[4] N. Gandal, “The dynamics of competition in the internet search engine market,”
International Journal of Industrial Organization, vol. 19, no. 7, pp. 1103–1117,
2001.
[5] E. S. McIntyre, “Search engine optimization,” 2015.
[6] R. Ashraf, K. Bashir, A. Irtaza, and M. T. Mahmood, “Content based image
retrieval using embedded neural networks with bandletized regions,” Entropy,
vol. 17, no. 6, pp. 3552–3580, 2015.
[7] V. N. Gudivada and V. V. Raghavan, “Content based image retrieval systems,”
Computer, vol. 28, no. 9, pp. 18–22, 1995.
[8] A. W. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain, “Content-based
image retrieval at the end of the early years,” Pattern Analysis and Machine Intel-
ligence, IEEE Transactions on, vol. 22, no. 12, pp. 1349–1380, 2000.
[9] R. Datta, J. Li, and J. Z. Wang, “Content-based image retrieval: approaches and
trends of the new age,” in Proceedings of the 7th ACM SIGMM international work-
shop on Multimedia information retrieval. ACM, 2005, pp. 253–262.
[10] A. Singla and M. Garg, “Qbic, mars and viper: A review on content based image
retrieval techniques.”
[11] M. A. Stricker and M. Orengo, “Similarity of color images,” in IS&T/SPIE’s Sym-
posium on Electronic Imaging: Science & Technology. International Society for
Optics and Photonics, 1995, pp. 381–392.
[12] W. Forstner, “A framework for low level feature extraction,” in Computer Vision -
ECCV’94. Springer, 1994, pp. 383–394.
[13] I. K. Sethi, I. L. Coman, and D. Stan, “Mining association rules between low-level
image features and high-level concepts,” in Aerospace/Defense Sensing, Simula-
tion, and Controls. International Society for Optics and Photonics, 2001, pp.
279–290.
[14] M. Rehman, M. Iqbal, M. Sharif, and M. Raza, “Content based image retrieval:
survey,” World Applied Sciences Journal, vol. 19, no. 3, pp. 404–412, 2012.
[15] D. G. Thakore and A. Trivedi, “Content based image retrieval techniques–issues,
analysis and the state of the art,” BVM Engineering College, Gujarat, 2010.
[16] C. Carson, S. Belongie, H. Greenspan, and J. Malik, “Blobworld: Image segmen-
tation using expectation-maximization and its application to image querying,” Pat-
tern Analysis and Machine Intelligence, IEEE Transactions on, vol. 24, no. 8, pp.
1026–1038, 2002.
[17] W.-Y. Ma and B. S. Manjunath, “Netra: A toolbox for navigating large image
databases,” Multimedia systems, vol. 7, no. 3, pp. 184–198, 1999.
[18] O. Marques, L. M. Mayron, G. B. Borba, and H. R. Gamba, “An attention-driven
model for grouping similar images with image retrieval applications,” EURASIP
Journal on Applied Signal Processing, vol. 2007, no. 1, pp. 116–116, 2007.
[19] L. M. Mayron, “Image retrieval using visual attention,” Ph.D. dissertation, Boca
Raton, Florida, 2008.
[20] S. Mavandadi, P. Aarabi, A. Khaleghi, and R. Appel, “Predictive dynamic user
interfaces for interactive visual search,” in Multimedia and Expo, 2006 IEEE Inter-
national Conference on. IEEE, 2006, pp. 381–384.
[21] V. Mezaris, I. Kompatsiaris, and M. G. Strintzis, “An ontology approach to object-
based image retrieval,” in Image Processing, 2003. ICIP 2003. Proceedings. 2003
International Conference on, vol. 2. IEEE, 2003, pp. II–511.
[22] Y. Liu, D. Zhang, G. Lu, and W.-Y. Ma, “A survey of content-based image retrieval
with high-level semantics,” Pattern Recognition, vol. 40, no. 1, pp. 262–282, 2007.
[23] K. Madhu and R. Minu, “Image segmentation using improved jseg,” in Pattern
Recognition, Informatics and Mobile Engineering (PRIME), 2013 International
Conference on. IEEE, 2013, pp. 37–42.
[24] Y. Deng and B. Manjunath, “Unsupervised segmentation of color-texture regions in
images and video,” Pattern Analysis and Machine Intelligence, IEEE Transactions
on, vol. 23, no. 8, pp. 800–810, 2001.
[25] C. Gao, X. Zhang, and H. Wang, “A combined method for multi-class image se-
mantic segmentation,” Consumer Electronics, IEEE Transactions on, vol. 58, no. 2,
pp. 596–604, 2012.
[26] E. Borenstein and S. Ullman, “Combined top-down/bottom-up segmentation,” Pat-
tern Analysis and Machine Intelligence, IEEE Transactions on, vol. 30, no. 12, pp.
2109–2125, 2008.
[27] Y. Rui, T. S. Huang, and S.-F. Chang, “Image retrieval: Current techniques, promis-
ing directions, and open issues,” Journal of visual communication and image rep-
resentation, vol. 10, no. 1, pp. 39–62, 1999.
[28] B. S. Manjunath, J.-R. Ohm, V. V. Vasudevan, and A. Yamada, “Color and texture
descriptors,” Circuits and Systems for Video Technology, IEEE Transactions on,
vol. 11, no. 6, pp. 703–715, 2001.
[29] M. Lamard, G. Cazuguel, G. Quellec, L. Bekri, C. Roux, and B. Cochener, “Con-
tent based image retrieval based on wavelet transform coefficients distribution,”
in Engineering in Medicine and Biology Society, 2007. EMBS 2007. 29th Annual
International Conference of the IEEE. IEEE, 2007, pp. 4532–4535.
[30] I. J. Sumana, G. Lu, and D. Zhang, “Comparison of curvelet and wavelet texture
features for content based image retrieval,” in Multimedia and Expo (ICME), 2012
IEEE International Conference on. IEEE, 2012, pp. 290–295.
[31] J. R. Smith and S.-F. Chang, “Transform features for texture classification and
discrimination in large image databases,” in Image Processing, 1994. Proceedings.
ICIP-94., IEEE International Conference, vol. 3. IEEE, 1994, pp. 407–411.
[32] D. Zhang, A. Wong, M. Indrawan, and G. Lu, “Content-based image retrieval using
gabor texture features,” in IEEE Pacific-Rim Conference on Multimedia, University
of Sydney, Australia, 2000.
[33] R. M. Haralick, “Statistical and structural approaches to texture,” Proceedings of
the IEEE, vol. 67, no. 5, pp. 786–804, 1979.
[34] A. Khokher and R. Talwar, “Content-based image retrieval: Feature extraction
techniques and applications,” in Conference proceedings, 2012.
[35] H. Tamura, S. Mori, and T. Yamawaki, “Textural features corresponding to visual
perception,” Systems, Man and Cybernetics, IEEE Transactions on, vol. 8, no. 6,
pp. 460–473, 1978.
[36] C. C. Gotlieb and H. E. Kreyszig, “Texture descriptors based on co-occurrence
matrices,” Computer Vision, Graphics, and Image Processing, vol. 51, no. 1, pp.
70–86, 1990.
[37] P. D. Welch, “The use of fast fourier transform for the estimation of power spec-
tra: A method based on time averaging over short, modified periodograms,” IEEE
Transactions on audio and electroacoustics, vol. 15, no. 2, pp. 70–73, 1967.
[38] R. Milanese and M. Cherbuliez, “A rotation, translation, and scale-invariant ap-
proach to content-based image retrieval,” Journal of visual communication and
image representation, vol. 10, no. 2, pp. 186–196, 1999.
[39] B. S. Manjunath and W.-Y. Ma, “Texture features for browsing and retrieval of
image data,” Pattern Analysis and Machine Intelligence, IEEE Transactions on,
vol. 18, no. 8, pp. 837–842, 1996.
[40] J. Mao and A. K. Jain, “Texture classification and segmentation using multiresolu-
tion simultaneous autoregressive models,” Pattern recognition, vol. 25, no. 2, pp.
173–188, 1992.
[41] R. Datta, D. Joshi, J. Li, and J. Z. Wang, “Image retrieval: Ideas, influences, and
trends of the new age,” ACM Computing Surveys (CSUR), vol. 40, no. 2, p. 5, 2008.
[42] R. S. Choras, “Image feature extraction techniques and their applications for cbir
and biometrics systems,” International journal of biology and biomedical engi-
neering, vol. 1, no. 1, pp. 6–16, 2007.
[43] M. Madugunki, D. Bormane, S. Bhadoria, and C. Dethe, “Comparison of different
cbir techniques,” in Electronics Computer Technology (ICECT), 2011 3rd Interna-
tional Conference on, vol. 4. IEEE, 2011, pp. 372–375.
[44] T. Deselaers, D. Keysers, and H. Ney, “Features for image retrieval: an experimen-
tal comparison,” Information Retrieval, vol. 11, no. 2, pp. 77–107, 2008.
[45] H. Jalab et al., “Image retrieval system based on color layout descriptor and gabor
filters,” in Open Systems (ICOS), 2011 IEEE Conference on. IEEE, 2011, pp.
32–36.
[46] R. Ashraf, “A novel approach for the gender classification through trained neural
networks,” Journal of Basic and Applied Scientific Research, 2014.
[47] M. Flickner, H. Sawhney, W. Niblack, J. Ashley, Q. Huang, B. Dom, M. Gorkani,
J. Hafner, D. Lee, D. Petkovic et al., “Query by image and video content: The qbic
system,” Computer, vol. 28, no. 9, pp. 23–32, 1995.
[48] P. Vacha and M. Haindl, “Image retrieval measures based on illumination invariant
textural mrf features,” in Proceedings of the 6th ACM international conference on
Image and video retrieval. ACM, 2007, pp. 448–454.
[49] P. Vacha, M. Haindl, and T. Suk, “Colour and rotation invariant textural features
based on markov random fields,” Pattern Recognition Letters, vol. 32, no. 6, pp.
771–779, 2011.
[50] D. Zhang, M. M. Islam, G. Lu, and I. J. Sumana, “Rotation invariant curvelet
features for region based image retrieval,” International journal of computer vision,
vol. 98, no. 2, pp. 187–201, 2012.
[51] P. P. Ohanian and R. C. Dubes, “Performance evaluation for four classes of textural
features,” Pattern recognition, vol. 25, no. 8, pp. 819–833, 1992.
[52] P. Howarth and S. Ruger, “Evaluation of texture features for content-based image
retrieval,” in Image and Video Retrieval. Springer, 2004, pp. 326–334.
[53] A. Amato and V. D. Lecce, “Edge detection techniques in image retrieval: the
semantic meaning of edge,” in Video/Image Processing and Multimedia Commu-
nications, 2003. 4th EURASIP Conference focused on, vol. 1. IEEE, 2003, pp.
143–148.
[54] S. Nandagopalan, D. B. Adiga, and N. Deepak, “A universal model for content-
based image retrieval,” JIP, vol. 1, p. 5, 2008.
[55] L. M. Kaplan, R. Murenzi, and K. R. Namuduri, “Fast texture database retrieval
using extended fractal features,” in Photonics West’98 Electronic Imaging. Inter-
national Society for Optics and Photonics, 1997, pp. 162–173.
[56] L. M. Kaplan, “Extended fractal analysis for texture classification and segmenta-
tion,” Image Processing, IEEE Transactions on, vol. 8, no. 11, pp. 1572–1585,
1999.
[57] M. O. M. Dyla and H. Tairi, “Texture-based image retrieval based on FABEMD,”
IJCSI, 2011.
[58] S. G. Mallat, “A theory for multiresolution signal decomposition: the wavelet rep-
resentation,” Pattern Analysis and Machine Intelligence, IEEE Transactions on,
vol. 11, no. 7, pp. 674–693, 1989.
[59] Y.-L. Huang, “A fast method for textural analysis of DCT-based image,” Journal
of Information Science and Engineering, 2005.
[60] J. G. Daugman, “Complete discrete 2-D Gabor transforms by neural networks for
image analysis and compression,” Acoustics, Speech and Signal Processing, IEEE
Transactions on, vol. 36, no. 7, pp. 1169–1179, 1988.
[61] E. Candes, L. Demanet, D. Donoho, and L. Ying, “Fast discrete curvelet trans-
forms,” Multiscale Modeling & Simulation, vol. 5, no. 3, pp. 861–899, 2006.
[62] I. J. Sumana, M. M. Islam, D. Zhang, and G. Lu, “Content based image retrieval
using curvelet transform,” in Multimedia Signal Processing, 2008 IEEE 10th Work-
shop on. IEEE, 2008, pp. 11–16.
[63] S. Selvarajah and S. Kodituwakku, “Analysis and comparison of texture features
for content based image retrieval,” International Journal of Latest Trends in Com-
puting, vol. 2, no. 1, 2011.
[64] K. Kosnar, V. Vonasek, M. Kulich, and L. Preucil, “Comparison of shape match-
ing techniques for place recognition,” in Mobile Robots (ECMR), 2013 European
Conference on. IEEE, 2013, pp. 107–112.
[65] Y. Li, X. Chen, X. Fu, and S. Belkasim, “Multi-level discrete cosine transform for
content-based image retrieval by support vector machines,” in Image Processing,
2007. ICIP 2007. IEEE International Conference on, vol. 6. IEEE, 2007, pp.
VI–21.
[66] C.-W. Ngo, T.-C. Pong, and R. T. Chin, “Exploiting image indexing techniques in
DCT domain,” Pattern Recognition, vol. 34, no. 9, pp. 1841–1851, 2001.
[67] M. Banerjee and M. K. Kundu, “Content-based image retrieval using wavelet pack-
ets and fuzzy spatial relations,” in Computer Vision, Graphics and Image Process-
ing. Springer, 2006, pp. 861–871.
[68] R. Porter and N. Canagarajah, “Robust rotation-invariant texture classification:
wavelet, Gabor filter and GMRF based schemes,” in Vision, Image and Signal Pro-
cessing, IEE Proceedings-, vol. 144, no. 3. IET, 1997, pp. 180–188.
[69] W.-Y. Ma and B. S. Manjunath, “Texture features and learning similarity,” in Com-
puter Vision and Pattern Recognition, 1996. Proceedings CVPR’96, 1996 IEEE
Computer Society Conference on. IEEE, 1996, pp. 425–430.
[70] P. Sarkar, C. Chakraborty, and M. Ghosh, “Content-based leukocyte image retrieval
ensembling quaternion Fourier transform and Gabor-wavelet features,” in Intelligent
Systems Design and Applications (ISDA), 2012 12th International Conference on.
IEEE, 2012, pp. 345–350.
[71] S. Selvarajah and S. Kodithuwakku, “Combined feature descriptor for content
based image retrieval,” in Industrial and Information Systems (ICIIS), 2011 6th
IEEE International Conference on. IEEE, 2011, pp. 164–168.
[72] Y.-H. Lee, S.-B. Rhee, and B. Kim, “Content-based image retrieval using wavelet
spatial-color and Gabor normalized texture in multi-resolution database,” in Inno-
vative Mobile and Internet Services in Ubiquitous Computing (IMIS), 2012 Sixth
International Conference on. IEEE, 2012, pp. 371–377.
[73] Z. Xue, S. Antani, L. R. Long, J. Jeronimo, and G. R. Thoma, “Investigating CBIR
techniques for cervicographic images,” in AMIA Annual Symposium Proceedings,
vol. 2007. American Medical Informatics Association, 2007, p. 826.
[74] E. J. Candes, D. L. Donoho et al., Curvelets: A surprisingly effective nonadaptive
representation for objects with edges. DTIC Document, 1999.
[75] S. Arivazhagan, L. Ganesan, and S. T. Kumar, “Texture classification using curvelet
statistical and co-occurrence features,” in Pattern Recognition, 2006. ICPR 2006.
18th International Conference on, vol. 2. IEEE, 2006, pp. 938–941.
[76] J.-L. Starck, M. K. Nguyen, and F. Murtagh, “Wavelets and curvelets for image de-
convolution: a combined approach,” Signal Processing, vol. 83, no. 10, pp. 2279–
2283, 2003.
[77] J.-L. Starck, F. Murtagh, E. J. Candes, and D. L. Donoho, “Gray and color image
contrast enhancement by the curvelet transform,” Image Processing, IEEE Trans-
actions on, vol. 12, no. 6, pp. 706–717, 2003.
[78] J.-L. Starck, E. J. Candes, and D. L. Donoho, “The curvelet transform for image
denoising,” Image Processing, IEEE Transactions on, vol. 11, no. 6, pp. 670–684,
2002.
[79] J. Fadili and J.-L. Starck, “Curvelets and ridgelets,” in Computational Complexity.
Springer, 2012, pp. 754–773.
[80] G. Joutel, V. Eglin, S. Bres, and H. Emptoz, “Curvelets based feature extraction
of handwritten shapes for ancient manuscripts classification,” in Electronic Imag-
ing 2007. International Society for Optics and Photonics, 2007, pp. 65 000D–
65 000D.
[81] A. Irtaza, M. A. Jaffar, and E. Aleisa, “Correlated networks for content based im-
age retrieval,” International Journal of Computational Intelligence Systems, vol. 6,
no. 6, pp. 1189–1205, 2013.
[82] A. N. Fierro-Radilla, M. Nakano-Miyatake, H. Perez-Meana, M. Cedillo-
Hernandez, and F. Garcia-Ugalde, “An efficient color descriptor based on global
and local color features for image retrieval,” in Electrical Engineering, Computing
Science and Automatic Control (CCE), 2013 10th International Conference on.
IEEE, 2013, pp. 233–238.
[83] G. Pass, R. Zabih, and J. Miller, “Comparing images using color coherence vec-
tors,” in Proceedings of the fourth ACM international conference on Multimedia.
ACM, 1997, pp. 65–73.
[84] R. O. Stehling, M. A. Nascimento, and A. X. Falcao, “A compact and efficient im-
age retrieval approach based on border/interior pixel classification,” in Proceedings
of the eleventh international conference on Information and knowledge manage-
ment. ACM, 2002, pp. 102–109.
[85] M. E. ElAlami, “A novel image retrieval model based on the most relevant fea-
tures,” Knowledge-Based Systems, vol. 24, no. 1, pp. 23–32, 2011.
[86] A. Al-Hamami and H. Al-Rashdan, “Improving the effectiveness of the color co-
herence vector.” Int. Arab J. Inf. Technol., vol. 7, no. 3, pp. 324–332, 2010.
[87] A. Irtaza and M. A. Jaffar, “Categorical image retrieval through genetically opti-
mized support vector machines (GOSVM) and hybrid texture features,” Signal, Image
and Video Processing, pp. 1–17, 2014.
[88] S. Sural, G. Qian, and S. Pramanik, “Segmentation and histogram generation using
the HSV color space for image retrieval,” in Image Processing. 2002. Proceedings.
2002 International Conference on, vol. 2. IEEE, 2002, pp. II–589.
[89] S. M. Youssef, “ICTEDCT-CBIR: Integrating curvelet transform with enhanced dom-
inant colors extraction and texture analysis for efficient content-based image re-
trieval,” Computers & Electrical Engineering, vol. 38, no. 5, pp. 1358–1376, 2012.
[90] R. N. Sowmya Rani and S. Reddy, “Comparative study on content based image
retrieval,” International Journal of Future Computer and Communication, vol. 1,
no. 4, pp. 366–368, 2012.
[91] N.-C. Yang, W.-H. Chang, C.-M. Kuo, and T.-H. Li, “A fast MPEG-7 dominant
color extraction with new similarity measure for image retrieval,” Journal of Visual
Communication and Image Representation, vol. 19, no. 2, pp. 92–105, 2008.
[92] D. Zhang and G. Lu, “Review of shape representation and description techniques,”
Pattern Recognition, vol. 37, no. 1, pp. 1–19, 2004.
[93] E. Chang, K. Goh, G. Sychay, and G. Wu, “CBSA: Content-based soft annotation
for multimodal image retrieval using Bayes point machines,” Circuits and Systems
for Video Technology, IEEE Transactions on, vol. 13, no. 1, pp. 26–38, 2003.
[94] J. Iivarinen, M. Peura, J. Sarela, and A. Visa, “Comparison of combined shape
descriptors for irregular objects.” in BMVC. Citeseer, 1997.
[95] A. El-ghazal, O. Basir, and S. Belkasim, “Farthest point distance: A new shape sig-
nature for Fourier descriptors,” Signal Processing: Image Communication, vol. 24,
no. 7, pp. 572–586, 2009.
[96] D. Zhang and G. Lu, “Content-based shape retrieval using different shape descrip-
tors: A comparative study,” in IEEE International Conference on Multimedia and
Expo (ICME), 2001, p. 289.
[97] J. Wang, H. Zha, and R. Cipolla, “Combining interest points and edges for content-
based image retrieval,” in Image Processing, 2005. ICIP 2005. IEEE International
Conference on, vol. 3. IEEE, 2005, pp. III–1256.
[98] K. Mikolajczyk and C. Schmid, “Indexing based on scale invariant interest points,”
in Computer Vision, 2001. ICCV 2001. Proceedings. Eighth IEEE International
Conference on, vol. 1. IEEE, 2001, pp. 525–531.
[99] M. Saad, H. Saleh, H. Konber, and M. Ashour, “CBIR system based on integration
between SURF and global features,” 2013.
[100] K. Velmurugan and L. D. S. S. Baboo, “Content-based image retrieval using
SURF and colour moments,” Global Journal of Computer Science and Technology,
vol. 11, no. 10, 2011.
[101] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” Interna-
tional journal of computer vision, vol. 60, no. 2, pp. 91–110, 2004.
[102] S. A. Bakar, M. S. Hitam, W. Yussof, and W. N. J. Hj, “Content-based image re-
trieval using SIFT for binary and greyscale images,” in Signal and Image Processing
Applications (ICSIPA), 2013 IEEE International Conference on. IEEE, 2013, pp.
83–88.
[103] J. Zhang, “Robust content-based image retrieval of multi-example queries,” 2011.
[104] G. Qian, S. Sural, Y. Gu, and S. Pramanik, “Similarity between Euclidean and
cosine angle distance for nearest neighbor queries,” in Proceedings of the 2004
ACM symposium on Applied computing. ACM, 2004, pp. 1232–1237.
[105] B. Thomee et al., “A picture is worth a thousand words: content-based image re-
trieval techniques,” Ph.D. dissertation, Leiden Institute of Advanced Computer Sci-
ence (LIACS), Faculty of Science, Leiden University, 2010.
[106] Y. Rui, T. S. Huang, M. Ortega, and S. Mehrotra, “Relevance feedback: a power
tool for interactive content-based image retrieval,” Circuits and Systems for Video
Technology, IEEE Transactions on, vol. 8, no. 5, pp. 644–655, 1998.
[107] J. Z. Wang, J. Li, and G. Wiederhold, “SIMPLIcity: Semantics-sensitive integrated
matching for picture libraries,” Pattern Analysis and Machine Intelligence, IEEE
Transactions on, vol. 23, no. 9, pp. 947–963, 2001.
[108] G.-H. Liu and J.-Y. Yang, “Content-based image retrieval using color difference
histogram,” Pattern Recognition, vol. 46, no. 1, pp. 188–198, 2013.
[109] M. R. Hejazi and Y.-S. Ho, “An efficient approach to texture-based image retrieval,”
International Journal of Imaging Systems and Technology, vol. 17, no. 5, pp. 295–
302, 2007.
[110] B. Prasad, K. K. Biswas, and S. Gupta, “Region-based image retrieval using inte-
grated color, shape, and location index,” Computer Vision and Image Understanding,
vol. 94, no. 1, pp. 193–233, 2004.
[111] D. Zhang, G. Lu et al., “A comparative study on shape retrieval using Fourier de-
scriptors with different shape signatures,” in Proc. of international conference on
intelligent multimedia and distance education (ICIMADE01), 2001, pp. 1–9.
[112] M. Yang, K. Kpalma, J. Ronsin et al., “A survey of shape feature extraction tech-
niques,” Pattern Recognition, pp. 43–90, 2008.
[113] A. Barley and C. Town, “Combinations of feature descriptors for texture image
classification,” Journal of Data Analysis and Information Processing, vol. 2014,
2014.
[114] P. J. Besl and N. D. McKay, “Method for registration of 3-D shapes,” in Robotics-
DL tentative. International Society for Optics and Photonics, 1992, pp. 586–606.
[115] M. B. Rao, B. P. Rao, and A. Govardhan, “CTDCIRS: Content-based image retrieval
system based on dominant color and texture features,” International Journal of
Computer Applications, vol. 18, no. 6, pp. 40–46, 2011.
[116] Y. Liu, D. Zhang, G. Lu, and W.-Y. Ma, “Region-based image retrieval with per-
ceptual colors,” in Advances in Multimedia Information Processing-PCM 2004.
Springer, 2005, pp. 931–938.
[117] N. Jhanwar, S. Chaudhuri, G. Seetharaman, and B. Zavidovique, “Content based
image retrieval using motif cooccurrence matrix,” Image and Vision Computing,
vol. 22, no. 14, pp. 1211–1220, 2004.
[118] Y. Chen, J. Z. Wang, and R. Krovetz, “CLUE: Cluster-based retrieval of images by
unsupervised learning,” Image Processing, IEEE Transactions on, vol. 14, no. 8,
pp. 1187–1201, 2005.
[119] T. Ojala, M. Pietikainen, and D. Harwood, “A comparative study of texture mea-
sures with classification based on featured distributions,” Pattern Recognition,
vol. 29, no. 1, pp. 51–59, 1996.
[120] T. Maenpaa, “The local binary pattern approach to texture analysis,” 2003.
[121] X. Yuan, J. Yu, Z. Qin, and T. Wan, “A SIFT-LBP image retrieval model based on bag
of features,” in IEEE International Conference on Image Processing, 2011.
[122] J. Yue, Z. Li, L. Liu, and Z. Fu, “Content-based image retrieval using color and
texture fused features,” Mathematical and Computer Modelling, vol. 54, no. 3, pp.
1121–1127, 2011.
[123] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,”
in Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer
Society Conference on, vol. 1. IEEE, 2005, pp. 886–893.
[124] S. Mallat and G. Peyre, “A review of bandlet methods for geometrical image rep-
resentation,” Numerical Algorithms, vol. 44, no. 3, pp. 205–234, 2007.
[125] X. Qu, J. Yan, G. Xie, Z. Zhu, and B. Chen, “A novel image fusion algorithm based
on bandelet transform,” Chinese Optics Letters, vol. 5, no. 10, pp. 569–572, 2007.
[126] R. Arandjelovic and A. Zisserman, “Three things everyone should know to improve
object retrieval,” in Computer Vision and Pattern Recognition (CVPR), 2012 IEEE
Conference on. IEEE, 2012, pp. 2911–2918.
[127] W. Jiang, G. Er, Q. Dai, and J. Gu, “Similarity-based online feature selection in
content-based image retrieval,” Image Processing, IEEE Transactions on, vol. 15,
no. 3, pp. 702–712, 2006.
[128] B. Jyothi and U. Shanker, “Neural network approach for image retrieval based on
preference elicitation,” International Journal on Computer Science and Engineer-
ing, vol. 2, no. 4, pp. 934–941, 2010.
[129] P. D. Heermann and N. Khazenie, “Classification of multispectral remote sensing
data using a back-propagation neural network,” Geoscience and Remote Sensing,
IEEE Transactions on, vol. 30, no. 1, pp. 81–88, 1992.
[130] G. Hepner, T. Logan, N. Ritter, and N. Bryant, “Artificial neural network classifica-
tion using a minimal training set: Comparison to conventional supervised classifi-
cation,” Photogrammetric Engineering and Remote Sensing, vol. 56, pp. 469–473,
1990.
[131] V. Sharmanska, N. Quadrianto, and C. H. Lampert, “Augmented attribute represen-
tations,” in Computer Vision–ECCV 2012. Springer, 2012, pp. 242–255.
[132] E. Yildizer, A. M. Balci, M. Hassan, and R. Alhajj, “Efficient content-based image
retrieval using multiple support vector machines ensemble,” Expert Systems with
Applications, vol. 39, no. 3, pp. 2385–2396, 2012.
[133] M. R. Azimi-Sadjadi, J. Salazar, and S. Srinivasan, “An adaptable image retrieval
system with relevance feedback using kernel machines and selective sampling,”
Image Processing, IEEE Transactions on, vol. 18, no. 7, pp. 1645–1659, 2009.
[134] D. Tao, X. Tang, X. Li, and X. Wu, “Asymmetric bagging and random subspace
for support vector machines-based relevance feedback in image retrieval,” Pattern
Analysis and Machine Intelligence, IEEE Transactions on, vol. 28, no. 7, pp. 1088–
1099, 2006.
[135] E. J. Candes and D. L. Donoho, “New tight frames of curvelets and optimal rep-
resentations of objects with piecewise C² singularities,” Communications on Pure
and Applied Mathematics, vol. 57, no. 2, pp. 219–266, 2004.
[136] E. Le Pennec and S. Mallat, “Bandelet image approximation and compression,”
Multiscale Modeling & Simulation, vol. 4, no. 3, pp. 992–1039, 2005.
[137] K. J. Dana, B. Van Ginneken, S. K. Nayar, and J. J. Koenderink, “Reflectance and
texture of real-world surfaces,” ACM Transactions on Graphics (TOG), vol. 18,
no. 1, pp. 1–34, 1999.
[138] P. Alliez and C. Gotsman, “Recent advances in compression of 3D meshes,” in
Advances in multiresolution for geometric modelling. Springer, 2005, pp. 3–26.
[139] A. Khodakovsky, N. Litke, and P. Schroder, “Globally smooth parameterizations
with low distortion,” in ACM Transactions on Graphics (TOG), vol. 22, no. 3.
ACM, 2003, pp. 350–357.
[140] M. N. Do and M. Vetterli, “The contourlet transform: an efficient directional
multiresolution image representation,” Image Processing, IEEE Transactions on,
vol. 14, no. 12, pp. 2091–2106, 2005.
[141] M. F. Duarte, S. Sarvotham, D. Baron, M. B. Wakin, and R. G. Baraniuk, “Dis-
tributed compressed sensing of jointly sparse signals,” in Asilomar Conf. Signals,
Sys., Comput, 2005, pp. 1537–1541.
[142] A. Cohen and B. Matei, “Nonlinear subdivision schemes: applications to image
processing,” in Tutorials on Multiresolution in Geometric Modelling. Springer,
2002, pp. 93–97.
[143] E. Le Pennec and S. Mallat, “Sparse geometric image representations with ban-
delets,” Image Processing, IEEE Transactions on, vol. 14, no. 4, pp. 423–438,
2005.
[144] D. L. Donoho, I. M. Johnstone et al., “Ideal denoising in an orthonormal basis
chosen from a library of bases,” Comptes Rendus de l’Academie des Sciences-Serie
I-Mathematique, vol. 319, no. 12, pp. 1317–1322, 1994.
[145] S. G. Sajidaparveen and B. Chandramohan, “Medical image retrieval using bandelet,”
Int. J. Sci. Eng. Technol., vol. 2, pp. 1103–1115, 2014.
[146] G. Peyre and S. Mallat, “Surface compression with geometric bandelets,” in ACM
Transactions on Graphics (TOG), vol. 24, no. 3. ACM, 2005, pp. 601–608.
[147] F. A. Alomar, G. Muhammad, H. Aboalsamh, M. Hussain, A. M. Mirza, and G. Be-
bis, “Gender recognition from faces using bandlet and local binary patterns,” in
Systems, Signals and Image Processing (IWSSIP), 2013 20th International Confer-
ence on. IEEE, 2013, pp. 59–62.
[148] N. Chitaliya and A. Trivedi, “Comparative analysis using fast discrete curvelet
transform via wrapping and discrete contourlet transform for feature extraction
and recognition,” in Intelligent Systems and Signal Processing (ISSP), 2013 In-
ternational Conference on. IEEE, 2013, pp. 154–159.
[149] G. Peyre and S. Mallat, “Orthogonal bandelet bases for geometric images approx-
imation,” Communications on Pure and Applied Mathematics, vol. 61, no. 9, pp.
1173–1212, 2008.
[150] M. Weber, P. Crilly, and W. E. Blass, “Adaptive noise filtering using an error-
backpropagation neural network,” Instrumentation and Measurement, IEEE Trans-
actions on, vol. 40, no. 5, pp. 820–825, 1991.
[151] T. Andrysiak and M. Choras, “Image retrieval based on hierarchical Gabor filters,”
International Journal of Applied Mathematics and Computer Science, vol. 15, pp.
471–480, 2005.
[152] M. Lam, T. Disney, M. Pham, D. Raicu, J. Furst, and R. Susomboon, “Content-
based image retrieval for pulmonary computed tomography nodule images,” in
Medical Imaging. International Society for Optics and Photonics, 2007, pp.
65 160N–65 160N.
[153] K. N. Plataniotis and A. N. Venetsanopoulos, Color image processing and appli-
cations. Springer, 2000.
[154] T. Acharya and A. K. Ray, Image processing: principles and applications. John
Wiley & Sons, 2005.
[155] M. M. Rahman, P. Bhattacharya, and B. C. Desai, “Probabilistic similarity mea-
sures in image databases with svm based categorization and relevance feedback,”
pp. 601–608, 2005.
[156] Y. Chen, J. Z. Wang, and R. Krovetz, “CLUE: Cluster-based retrieval of images by
unsupervised learning,” Image Processing, IEEE Transactions on, vol. 14, no. 8,
pp. 1187–1201, 2005.
[157] D. Tao, X. Tang, X. Li, and X. Wu, “Asymmetric bagging and random subspace
for support vector machines-based relevance feedback in image retrieval,” Pattern
Analysis and Machine Intelligence, IEEE Transactions on, vol. 28, no. 7, pp. 1088–
1099, 2006.
[158] C. Campbell, “Algorithmic approaches to training support vector machines: a sur-
vey.” in ESANN, 2000, pp. 27–36.
[159] P.-W. Huang and S. Dai, “Image retrieval by texture similarity,” Pattern Recogni-
tion, vol. 36, no. 3, pp. 665–679, 2003.
[160] E. Yildizer, A. M. Balci, M. Hassan, and R. Alhajj, “Efficient content-based image
retrieval using multiple support vector machines ensemble,” Expert Systems with
Applications, vol. 39, no. 3, pp. 2385–2396, 2012.
[161] J. Yu, Z. Qin, T. Wan, and X. Zhang, “Feature integration analysis of bag-of-
features model for image retrieval,” Neurocomputing, vol. 120, pp. 355–364, 2013.
[162] X. Tian, L. Jiao, X. Liu, and X. Zhang, “Feature integration of EODH and Color-
SIFT: Application to image retrieval based on codebook,” Signal Processing: Image
Communication, vol. 29, no. 4, pp. 530–545, 2014.