Technical Report, IDE0803, January 2008 Comparison Of Salient Feature Descriptors Master’s Thesis in Computer Systems Engineering Sara Farzaneh School of Information Science, Computer and Electrical Engineering, Halmstad University





Abstract: In robot navigation and image content search, reliable salient features are of pivotal importance, and they are increasingly used in biometric human recognition as well. Regardless of the application, image matching is one of the central problems in computer vision, including object recognition. This report investigates salient features for matching sub-images of different images. An underlying assumption is that sub-images, also called image objects or simply objects, can be recognized by salient features that are detectable independently of each other. Since image objects are 2D images of 3D objects, the salient features must be invariant to reasonably large changes in viewing direction and distance (scale). These changes are typically due to 3D rotations and translations of the object with respect to the camera. Other changes that influence the matching of two image objects are illumination changes and image acquisition noise. This thesis discusses how to find salient features and compares them with respect to their matching performance. It also explores how invariant these features are to rotation and scaling.


Table of Contents

Abstract
Acknowledgment
1. Introduction
   1.1 Aim of the study
   1.2 Related works
2. Background
   2.1 Comparing two images in general
   2.2 Comparing two images in this study—assumptions
   2.3 Image descriptor candidates: SIFT, TODAI
       2.3.1 SIFT (Scale Invariant Feature Transform) descriptor
       2.3.2 Orientation Radiogram descriptor
3. SIFT descriptors
   3.1 Performance evaluation
       3.1.1 Comparing a disturbed image with a large set of images
       3.1.2 The studied disturbance
   3.2 The results
       3.2.1 Different sigma
4. TODAI descriptors
   4.1 Linear symmetry
   4.2 Orientation radiogram
   4.3 Discrete Fourier Transform (DFT) or Karhunen-Loève Transform
   4.4 Image descriptor
       4.4.1 Rotation and flip invariance
   4.5 Performance evaluation of the TODAI descriptor
5. Suggestion: Linear symmetry histograms as modified SIFT features
   5.1 Gradient in double angle versus single angle
   5.2 Histograms of double angle gradients versus single angle gradients
   5.3 Performance evaluation of the modified SIFT descriptor
6. Discussion and conclusion
7. References


Acknowledgments I would like to express my appreciation to my supervisor, Professor Josef Bigün for his friendly and inspiring guidance, support, useful advice and comments during this project.


1 Introduction
In many computer vision applications, including object and scene recognition and motion tracking, image matching is one of the most important problems. When matching two images among many, applying one uniform distance adjustment to all pixels will not produce a correct match, because the distances between objects, and hence between pixels, change non-uniformly. Instead, local image descriptors, also called features, should be used to match sub-images. Robot navigation applications have attempted to use natural tags to obtain invariant navigation. Likewise, when retrieving one image from a large database, such as a biometric database for human recognition, finding features and comparing them is useful. This thesis discusses how to match images by means of their features. Because image features can be made rich, they are suitable for matching different images of the same object. Image comparison is about finding the differences or similarities between two or more images, and about quantifying these differences. Image processing provides collections of methods for finding points where relevant features can be extracted for a comparison. A central task, both for computer image processing in general and for this project, is:
1. to suggest a set of image points;
2. to extract a set of features, a feature vector, from each of these points;
3. to compare collections of feature vectors, or individual feature vectors, depending on the application.

1.1 Aim of the study
In this thesis, the representation power of image descriptors at salient points, also known as key points, is more important than how to detect such points. Two different salient features will be studied: the SIFT features [1] and the Orientation Radiograms [Michel 1996]. Accordingly, the study assumes that the key point is given and that it is the same for both methods. The salient features will be compared by studying their recognition performance in an image database application: a typographic image database containing images such as those illustrated in Fig (3-1). The performance under relevant transformations, including different directions, will be investigated. Since rotation and scale are estimated outside of the feature description, during key-point extraction, the project will in practice focus on small rotational changes of images when comparing the recognition performance of the two salient feature descriptors. The project will adapt and complete existing software so that it can extract these salient features, given a neighborhood.


Only one key point per image will be considered. This point will be the center of the image, and the neighborhood around it (the local image) will be the entire image. In the next step, the feature vector, representing the salient features of the entire image, will be extracted. Finally, these vectors will be used to measure how similar two images are.

1.2 Related works
In this field, which is related to corner detection, there has been a significant number of studies. An early report is that of Moravec (1981), who suggested a detector for use in stereo image matching. The approach was enhanced by Harris and Stephens (1988) to make it more repeatable under small image variations and near edges [4]. This method, called the Harris detector, indicates the lack of lines, defined via linear symmetry (Bigün and Granlund, 1987), to deliver a measure for the presence of corners. The Harris detector has been used in several applications, including 3D structure estimation from motion and stereo images. In 1995, Zhang et al. tried to improve it: they showed that the Harris detector can be used even for large neighborhoods by using a correlation window around each corner to select the best match. Schmid and Mohr (1997) showed that feature matching can be extended to general image recognition problems in which recognition by parts is employed; the idea has been suggested as a method to retrieve images from large image databases. Lowe (2004) argues that the Harris corner detector is sensitive to changes in image scale and therefore should not be used to match image objects that are the same but appear at different geometric scales (different sizes).


2 Background

2.1 Comparing two images in general
In general, image descriptors are used for matching and comparing two different images of one object or scene. Applications are diverse and include robot navigation as well as image database searches. In database searches, the query information, being an image, is usually similar but not exactly the same as the one that possibly exists in the database. In robot navigation, the robot usually has not seen the object at the same viewing direction and distance as in the image of the same object that it has seen before. A common technique is to extract key points and check whether the neighborhoods around the key points match between the image that has been seen before (in the database) and the present one (the query image). The neighborhood around each such key point, sometimes called the local image here, is studied, and a feature vector is extracted to represent it compactly. The feature vectors from all salient points then represent the entire image. Sometimes the local neighborhood of a salient point can be very large and rich in detail. Two different views (images) of the same scene contain common as well as non-common salient points. If sufficiently many key points are common between two images, the two images are said to share a common scene. A salient point is common between two images if the corresponding feature vectors are sufficiently close to each other. Ideally, to decide whether two images match, the extracted features should be invariant to scaling and rotation, including non-planar 3D rotation. This would mean that if the images were scaled by different factors or rotated by different angles, the salient points' features would remain the same. In practice, however, and especially in the case of the SIFT and TODAI features studied here, the descriptor features are not strictly rotation and scale invariant.
This is because there is a division of labor between key-point detection and feature vector extraction: key-point detection estimates the overall rotation and overall scale of the local image around a key point and hands this information to the feature vector extraction, which rotates and scales the local image accordingly before extracting its feature vector. This is rotation normalization. For example, if two local images have 45 and -60 degrees as overall orientation, the first one is rotated by -45 degrees and the second by +60 degrees, so that both have the same overall direction of 0 degrees before their respective feature vectors are computed. A similar strategy is applied with respect to scale: the key-point detector has already estimated the scale and handed it over (together with the orientation estimate) to the feature vector extraction, which rescales the local image to a standard size before extracting the feature vector. This is scale normalization. However, the rotation and scale estimates handed over by key-point detection can contain errors, hopefully small ones. For this reason, the feature vector need in practice only be invariant to small changes in rotation and scale.
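The bookkeeping behind this normalization can be sketched in a few lines. This is only an illustrative sketch of the idea described above, not the thesis software; `normalize_pose` is a hypothetical helper name, and the 45 and -60 degree figures are the ones from the example in the text.

```python
# Sketch of rotation and scale normalization, assuming the key-point
# detector has already estimated an overall orientation (in degrees) and
# an overall scale for each local image. normalize_pose (hypothetical
# name) returns the rotation and zoom to apply to the local image before
# its feature vector is extracted.
def normalize_pose(est_angle_deg, est_scale, std_scale=1.0):
    rotation = -est_angle_deg          # rotate back to 0 degrees
    zoom = std_scale / est_scale       # rescale to a standard size
    return rotation, zoom

# The example from the text: local images at 45 and -60 degrees are
# rotated by -45 and +60 degrees respectively before feature extraction.
print(normalize_pose(45.0, 2.0))     # (-45.0, 0.5)
print(normalize_pose(-60.0, 0.5))    # (60.0, 2.0)
```

After this step both local images have orientation 0 and a standard size, so the descriptors only need to tolerate the small residual errors of the estimates.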


Finding key points is the first step in comparing two images, and scale-space extreme-point detection is one way to locate them. The method searches the scale space for extreme values over all scales and image locations; the search amounts to finding stable points in the scale space (Witkin 1983). The scale space of an image is defined as a function, L(x, y, σ), produced by convolving a Gaussian function with standard deviation σ, G(x, y, σ), with an input image, I(x, y):

L(x, y, σ) = G(x, y, σ) * I(x, y)    (1)

G(x, y, σ) = 1/(2πσ²) · exp(−(x² + y²)/(2σ²))    (2)
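The construction in equations (1) and (2) can be sketched in a few lines of numpy. This is an illustrative implementation, assuming a sampled Gaussian kernel truncated at about 3σ and zero padding at the borders; it is not the thesis software.

```python
import numpy as np

def gaussian_kernel1d(sigma):
    # Sampled 1D Gaussian of eq. (2), truncated at ~3*sigma, normalized
    # so the weights sum to 1.
    r = max(1, int(3 * sigma + 0.5))
    x = np.arange(-r, r + 1)
    k = np.exp(-x**2 / (2.0 * sigma**2))
    return k / k.sum()

def scale_space_level(image, sigma):
    # L(x, y, sigma) of eq. (1): since the 2D Gaussian is separable,
    # convolve the rows and then the columns with the 1D kernel
    # (zero padding at the borders).
    k = gaussian_kernel1d(sigma)
    rows = np.apply_along_axis(np.convolve, 1, image, k, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="same")
```

Smoothing a noisy image with increasing sigma progressively reduces its variance, which is exactly the blurring seen when moving up in scale space.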

To detect stable point locations efficiently, Lowe (1999) uses the difference of Gaussians in the convolution above, because it approximates the derivative of L with respect to the scale variable σ. For a scale σ, the next closest scale is kσ, where k is a scalar larger than 1, and the derivative of L with respect to σ is then approximated as:

D(x, y, σ) = L(x, y, kσ) − L(x, y, σ)    (3)

= G(x, y, kσ) * I(x, y) − G(x, y, σ) * I(x, y)

There are several reasons for this procedure. First, convolving an image with a Gaussian is computationally effective because Gaussians are separable. Second, when σ is approximately 1, it is possible to compute L only at every second pixel (in both row and column directions). This yields an image half as large in each direction, and the missing L values can be interpolated from the computed ones essentially without loss of information. The size change obtained by halving the side lengths (both row and column size) of an image is called an octave size change. Repeating the octave size change results in the so-called Gaussian pyramid. However, in scale-space applications σ = 1 is too large as a step size. Smaller step sizes are needed, typically a fraction such as σ/√3 or σ/√4, so that one applies 3 or 4 Gaussian convolutions before down-sampling. This is possible because repeated convolutions with Gaussians are equivalent to a single convolution with a larger Gaussian, since the variances of the Gaussians add, i.e.

G(x, y, σ₁) * G(x, y, σ₂) * I(x, y) = G(x, y, √(σ₁² + σ₂²)) * I(x, y)    (4)

Third, L must be computed in any case to record at which scale a detected key point was found [1], so that the key-point detection module can hand this information to the feature vector extraction module. The scale-space construction by smoothing and down-sampling (pyramids) is illustrated in Figure 2-1.
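Equations (3) and (4) can be checked numerically. The sketch below uses circular (FFT-based) Gaussian smoothing, an assumption made here only because it makes the variance-addition property of eq. (4) exact; it is not the thesis implementation.

```python
import numpy as np

def smooth_fft(img, sigma):
    # Gaussian smoothing in the Fourier domain (circular boundary); the
    # transfer function of G(x, y, sigma) is exp(-2*pi^2*sigma^2*(fx^2+fy^2)).
    fy = np.fft.fftfreq(img.shape[0])[:, None]
    fx = np.fft.fftfreq(img.shape[1])[None, :]
    G = np.exp(-2.0 * np.pi**2 * sigma**2 * (fx**2 + fy**2))
    return np.real(np.fft.ifft2(np.fft.fft2(img) * G))

def difference_of_gaussians(img, sigma, k=2 ** (1 / 3)):
    # eq. (3): D(x, y, sigma) = L(x, y, k*sigma) - L(x, y, sigma)
    return smooth_fft(img, k * sigma) - smooth_fft(img, sigma)

rng = np.random.default_rng(0)
img = rng.standard_normal((64, 64))

# eq. (4): two successive smoothings equal one smoothing whose variance
# is the sum of the two variances: sqrt(1.2^2 + 1.6^2) = 2.0
twice = smooth_fft(smooth_fft(img, 1.2), 1.6)
once = smooth_fft(img, np.hypot(1.2, 1.6))
print(np.max(np.abs(twice - once)))  # numerically zero
```

With a spatial, truncated kernel the agreement is only approximate near the borders, which is why the Fourier form is used for this check.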


Figure (2-1): For each octave of scale space, the initial image is repeatedly convolved with a Gaussian to produce the set of scale-space images shown on the left. Adjacent Gaussian images are subtracted to produce the difference-of-Gaussian images on the right. After each octave, the Gaussian image is down-sampled by a factor of 2, and the process is repeated. Adapted from [LOWE2004].

After computing the scale-space extrema, the location and the scale of each key point must be determined. Key point candidates are found by comparing a pixel of D to its neighbors across space and scale and keeping the local extrema. The next step is a more detailed investigation of the 3D neighborhood (in the row, column and scale directions) to make sure that the key point candidate 1) has sufficient strength and 2) is not poorly localized along an edge.


This information allows points to be rejected as key points if they have low contrast or are poorly localized along an edge. A point that is accepted as a key point will then have a well-defined scale as a result of this investigation. Likewise, a unique orientation for the key point must be determined. The orientation of a key point is obtained as the predominant orientation of the gradients around it. By assigning a scale and an orientation to each key point, the subsequent feature vector (image descriptor) can be aligned (normalized) relative to this orientation and scale to achieve invariance. The descriptor itself represents the neighborhood around the key point, based on statistics of the image gradient around it.
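The orientation assignment described above can be sketched as a magnitude-weighted histogram of gradient angles whose peak is taken as the key point's orientation. This is an illustrative toy version only; the actual SIFT implementation additionally smooths the histogram and interpolates the peak position.

```python
import numpy as np

def dominant_orientation(patch, nbins=36):
    # Gradient of the local image; gy is the row (y) derivative.
    gy, gx = np.gradient(patch.astype(float))
    angles = np.arctan2(gy, gx)          # in (-pi, pi]
    weights = np.hypot(gx, gy)           # gradient magnitudes
    hist, edges = np.histogram(angles, bins=nbins,
                               range=(-np.pi, np.pi), weights=weights)
    peak = int(np.argmax(hist))
    # Return the center of the strongest bin, in radians.
    return 0.5 * (edges[peak] + edges[peak + 1])

# A patch of horizontal stripes has vertical gradients, so the dominant
# orientation comes out near +/- pi/2.
stripes = np.tile(np.sin(np.linspace(0, 4 * np.pi, 32))[:, None], (1, 32))
print(dominant_orientation(stripes))
```

The returned angle is what the feature extraction stage would use to rotation-normalize the local image before computing the descriptor.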

2.2 Comparing two images in this study—assumptions
In this thesis, the comparison of two images proceeds as follows. The study is based on 3491 images of the Passé-Partout database, an international bank of printers' ornaments at the University of Lausanne [Passé-Partout]. The goal is to evaluate whether two different images have consistently different descriptors, and whether two images that are the same have the same descriptors despite small changes. The SIFT descriptor is studied first. In this thesis, one SIFT feature vector is extracted at one point per image, and the feature vector has 128 elements. The other descriptor studied is the Orientation Radiogram descriptor [Michel 1996] used in the TODAI search engine [PASSEPARTOUT]. As for the SIFT descriptor, the Orientation Radiogram feature vector is extracted at one point per image; this vector has 120 elements. In contrast with the usual use of SIFT features in applications, there is no need to extract key points here, because we are only interested in the descriptive power of the feature vectors, not in how effectively key points are found. This is motivated by the fact that the performance figures would be disturbed if a key point were not found where it should be, causing its neighborhood not to be recognized: a recognition error would be produced even though the features are powerful enough to describe and recognize the neighborhood had the key point been found. For convenience, the unique key point used for every image is the center of the image. TODAI is an image search engine that is already a full application. It uses one feature vector, the Orientation Radiograms, per image. Accordingly, it can be assumed that its feature vector is computed for the center of the image as key point. This database search application will be used to evaluate both the Orientation Radiogram features and the SIFT features.


TODAI has 120 values in its descriptor, whereas this number is 128 for SIFT, so the tests should show whether 120 features are enough to recognize an image in comparison to 128 features. Accordingly, we needed software that computes only one SIFT descriptor at the center of an image, whereby the descriptor vector is computed as if the entire image were the neighborhood of the key point (the center). To obtain this functionality, we chose to adapt existing software [Vedaldi] that did not function exactly in the way described above. Since the source code was available, we could change it to obtain the required functionality. The images in the PASSEPARTOUT database are not always quadratic. We chose to make them quadratic by extracting the largest possible square around the center of each image, and we saved these squares to use instead of the original PASSEPARTOUT images. This ensures that all directions are treated in the same way by the SIFT descriptors, which would otherwise automatically scale the horizontal and vertical directions differently; in a local neighborhood, SIFT descriptors apply the scale parameter in both directions in the same way. The feature vector is extracted at the center of the query image too. In other words, the feature vector represents the salient features of the entire query image. This is assumed to have been done for all images in the database already. The vectors of database images will indicate a match with the query image if and only if the feature vectors are similar. The SIFT features produced by the original software are not optimized for the purposes of our study, but for extracting features for neighborhoods of key points with radii in the order of tens of pixels. By contrast, in our study we use them to describe neighborhoods with radii of tens to hundreds of pixels.
For the Orientation Radiograms this optimization is already achieved, since they are already used to find images in the PASSEPARTOUT database. Accordingly, we will attempt to make the SIFT features as good as possible when the feature vector is 128-dimensional. To find out whether two descriptor vectors are the same, we use the distance between them, which, when both vectors have the same norm, e.g. 1, is fully equivalent to using the "angle" between them. The angle is available if the scalar product between the vectors is available. For example, consider two vectors f and g:


f = (f1, f2, f3, …, f128)^T,    g = (g1, g2, g3, …, g128)^T

Cos(θ) = <f, g> / (||f|| ||g||)    (5)

||f − g||² = <f − g, f − g> = <f, f> + <g, g> − 2<f, g> = 2(1 − <f, g>)    (6)

||f − g||² = (f1 − g1)² + (f2 − g2)² + … + (f128 − g128)²    (7)

The last equality in (6) holds when the vectors have unit norm, ||f|| = ||g|| = 1.

If one image is a query image to be compared with the images in PASSEPARTOUT, its distances to all images in PASSEPARTOUT (equivalently, the scalar products between them) need to be computed. The query image can be any image that is a distorted version of an image in PASSEPARTOUT. Accordingly, a confusion matrix representing the scalar products between the feature vectors of distorted images and the non-distorted images of the entire PASSEPARTOUT database (3491 images) can be computed. Ideally, the confusion matrix would be the identity matrix if the query images were identical to those in the database, as shown below.

                 OR1   OR2   OR3   … (database images) …   OR3491
  OR1             1     0     0                              0
  OR2             0     1     0                              0
  OR3             0     0     1                              0
   .
   .  (query images)
   .
  OR3491          0     0     0                              1

Figure (2-2): The confusion matrix. It would be the identity matrix if the query images were identical to the images in the database.
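The matching rule of equations (5)-(7) can be sketched numerically. The descriptors below are random stand-ins for real SIFT or radiogram vectors, and the database size is reduced to 100 for illustration; the point is that, for unit-norm vectors, ranking by scalar product and ranking by Euclidean distance are equivalent.

```python
import numpy as np

rng = np.random.default_rng(1)

# 100 stand-in unit-norm 128-element descriptors (the "database").
db = rng.random((100, 128))
db /= np.linalg.norm(db, axis=1, keepdims=True)

# A query: a slightly distorted copy of database entry 42, renormalized.
query = db[42] + 0.01 * rng.standard_normal(128)
query /= np.linalg.norm(query)

# One row of the confusion matrix: scalar products with every database image.
scores = db @ query
best = int(np.argmax(scores))            # best match: largest scalar product
top10 = np.argsort(scores)[::-1][:10]    # the 10 best matches on the row

# eq. (6): for unit vectors, the largest scalar product is the smallest
# Euclidean distance, so both criteria pick the same match.
dists = np.linalg.norm(db - query, axis=1)
print(best, int(np.argmin(dists)))  # both 42
```

Counting how often `best` (or `top10`) misses the true entry over all distorted queries gives exactly the error figures discussed in the text.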


However, this will not be the case, because each query image is a distorted version of an image in the database. When the distances between the query image and the images in the database are known, one can obtain the best match, since the scalar product with the best-matching image is the largest on the row of the query image. Likewise, the 10 best matches are found by noting the 10 largest scalar products on the row. Accordingly, the errors can be counted, since they appear as scalar products larger than the diagonal element of the confusion matrix. To compare the searched image with the others and find their similarities, a disturbance (distortion) is applied; here, rotation and scaling are used as disturbances. The images are rotated by small amounts and compared with the undistorted images. Likewise, images scaled by small amounts are compared to their undistorted versions. The SIFT descriptor consists of histograms of gradient orientations, typically 16 histograms, each with a few direction bins, typically 8, resulting in 128 scalars. In the SIFT descriptor, every local image that is "a line" is on average represented by two entries (bins), because a line generates two opposite gradient directions in a local neighborhood, Fig (2-3). This means that for each line, 2 bins are needed to represent its direction. One might suspect a systematic waste of resources in this redundant representation. If the original orientation angles are multiplied by 2, a line can be represented by 1 bin, which should avoid wasting the bins. The other descriptor considered in this thesis, the orientation radiograms, uses a few 1D signals, just like the 1D histogram graphs of SIFT. The 1D signals of the orientation radiograms are obtained from quantized linear-symmetry orientation angles, typically quantized to 6 angles, but each 1D graph has 20 bins on the x-axis, resulting in 120 scalars.
The orientation radiograms are in double-angle representation by construction, because linear-symmetry angles are in double-angle representation.

Figure (2-3): Each line has two opposing gradient vectors, at angles θ and θ+π. By doubling the angles, each line occupies one direction bin instead of two.
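The doubling trick of Fig. (2-3) is easy to verify numerically; `double_angle` is a hypothetical helper name used only for this sketch.

```python
import numpy as np

def double_angle(theta):
    # Map a gradient angle to the double-angle representation: theta and
    # theta + pi (the two opposing gradients of one line) coincide after
    # doubling modulo 2*pi, so one histogram bin per line suffices.
    return np.mod(2.0 * theta, 2.0 * np.pi)

theta = 0.7
print(double_angle(theta), double_angle(theta + np.pi))  # equal values
```

A histogram built on doubled angles therefore needs only half as many bins to represent line directions as one built on raw gradient angles.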



Figure (2-4): As sigma grows, the result of the convolution becomes less sharp (blurred): a sharp line becomes a soft edge after a small sigma and a strongly blurred one after a large sigma. Sigma represents the scale level.

We now turn to σ (sigma), a SIFT parameter that represents the scale level. In the SIFT descriptor, each level of the scale space is obtained by smoothing the previous scale level with a Gaussian. Every level thus has its own effective sigma, which represents the amount of smoothing that must be applied to the original image to obtain the image at that scale level. Consequently, details (such as the lines and edges shown in the figure) become less sharp (blurred) as sigma grows. The scale space is implemented by means of the Gaussian pyramid.

2.3 Image descriptor candidates: SIFT, TODAI

2.3.1 SIFT (Scale Invariant Feature Transform) descriptor
The SIFT software detects key points and matches them between two images. Typically it detects numerous key points, as in Figure (2-6), and puts the key points of the two images in correspondence by joining them with arcs. This, however, is not what we want, as we wish to evaluate the description power of the SIFT



features at one key point at a time. What interests us is thus how close these features are to each other when we have two different (local) images versus two similar (local) images. The best match between two image objects should occur even in cluttered and noisy images. Many features in cluttered images will not have a good match in the database, yielding false matches (false acceptances). False rejections are also possible; these occur when two images are declared not to match when in reality they do. In this thesis these errors are quantified indirectly, via the errors made in the best-match and 10-best-match queries. The software that we adapted was written by A. Vedaldi and is freely available for research purposes on the internet [Vedaldi]. It runs under MS-Windows and works with the PGM format.

Examples. Two output pictures of SIFT features from an application are shown below, side by side. These images illustrate the matching of key points: features are extracted at key points in both images, and arcs are drawn between key points whose features match closely. In this example, many matches are correct, but a non-negligible fraction is not; incorrect matches typically show up as crossing arcs between the two images. To produce this example, the demonstration software from D. Lowe was used [LOWE-DEMO]. Initially we wanted to adapt this software to our needs; however, we were unable to do so, as its source code was not freely available on the internet.

Figure (2-6-a): Matching 2 images. The blue lines show the corresponding points in the respective image.


Figure (2-6-b): Matching 2 images. The blue lines show the corresponding points in the respective image.

2.3.2 Orientation Radiogram descriptor
TODAI is an ornament database search engine implemented as a web page at the University of Lausanne. The database contains ornaments scanned from books published in past centuries, especially the eighteenth [3]. A more detailed description of the database and the search engine is given below. TODAI is the search engine for Passé-Partout (the name of the ornament database). TODAI stands for "Typographic Ornament Database Identification" and was developed at EPFL in 1996. When an image is to be searched for, it is uploaded via a web page. This query image can be any image on a local disc anywhere in the world. TODAI then searches for the best matches to the received image. When the search is finished, the ten best matches are displayed in descending order of closeness. For each ornament there is information about where it (the book) was printed, the size of the image in pixels, the ornament's use (e.g. vignette) and the ornament's nature (e.g. woodcut/cliché).


3 SIFT descriptors


Figure (3-1): The SIFT descriptor illustrated for the fmtest image. The key point is the center of the image and the local neighborhood is the entire image. The sub-squares show the histogram supports, and the graphs drawn in them represent the orientation histograms.


As Figure (3-1) shows, a SIFT descriptor is computed for a region, here called the neighborhood (see also Figure (3-3)). This region is the neighborhood around a key point, which is selected automatically in Lowe 2004, though it will be selected manually here. Once the key point is known, its descriptor is computed from the orientations of the image gradients in the neighborhood. The gradients are weighted by a Gaussian window, as the circle in the left image illustrates. The right image displays the gradient histograms of the left image; the gradients are computed by linear filtering. The bottom image represents the gradient histograms for a real "neighborhood", the fmtest image, which contains all directions and most frequencies. One can see which directions occur most frequently by consulting the histogram graphs drawn in each yellow square. The right image illustrates histograms in 16 sub-images tiled as 4x4; each contains a set of arrows representing the orientation histogram of the gradients in one sub-image, the histogram support, Figure (3-2). The length of each arrow is proportional to the total number of gradients having a particular orientation in the support. The SIFT descriptor includes parameters such as sigma, which determines the width of the histogram support; sigma is also used to determine the size of the gradient filters. In Figure (3-1) there are 16 orientation histograms, and each histogram has 8 angular bins, so there are 16*8=128 elements in the descriptor vector. This vector will be called d. A database of image neighborhoods can be used to study the descriptive power of such vectors d; the vectors d coming from all images in the database can be stacked as the rows of a matrix D. The last processing step consists of adjusting the descriptor vector for brightness variations by normalization.
Non-linear illumination changes in images, such as camera saturation or illumination changes due to the 3D shapes of objects (e.g. shadows), are not handled by this simple adjustment. The normalization does not change the relative magnitudes of the gradients participating in the SIFT histograms. Nor does it change the gradient orientations, which in any case are less influenced by the non-linear illumination effects mentioned above. However, a few gradients with extreme magnitudes can bias the descriptive power of the SIFT feature vector negatively, because they will dominate the local histograms. To minimize this effect, no orientation bin is allowed to be larger than 0.2 in the normalized feature; that is, large values are saturated at 0.2. After this the SIFT feature vector is renormalized. This operation reduces the emphasis of large gradient magnitudes when matching SIFT feature vectors.
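The histogram collection and the two-stage normalization just described can be sketched as follows. This is a minimal Python illustration, not Lowe's full implementation: Gaussian weighting and bin interpolation are omitted, and the function name and toy inputs are our own.

```python
import math

def normalize(v):
    # scale a vector to unit Euclidean length (brightness invariance)
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else v

def sift_like_descriptor(mag, ang, grid=4, bins=8):
    """Collect a grid x grid array of orientation histograms (bins each)
    from gradient magnitudes `mag` and angles `ang` (radians), both given
    as square 2D lists, then apply SIFT-style normalization."""
    h = len(mag)
    cell = h // grid
    desc = [0.0] * (grid * grid * bins)
    for y in range(grid * cell):
        for x in range(grid * cell):
            cy, cx = y // cell, x // cell                   # which support
            b = int((ang[y][x] % (2 * math.pi)) / (2 * math.pi) * bins) % bins
            desc[(cy * grid + cx) * bins + b] += mag[y][x]
    desc = normalize(desc)                 # first normalization
    desc = [min(d, 0.2) for d in desc]     # saturate large bins at 0.2
    return normalize(desc)                 # renormalize

# toy example: an 8x8 patch with a single dominant gradient direction
mag = [[1.0] * 8 for _ in range(8)]
ang = [[math.pi / 4] * 8 for _ in range(8)]
d = sift_like_descriptor(mag, ang)
print(len(d))  # 4*4*8 = 128 components
```

With every gradient in the same direction, each of the 16 supports contributes one bin; the saturation caps those bins and the renormalization redistributes the weight.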


Figure (3-2): SIFT features consist of gradient histograms. To avoid statistical bias from outliers (orientations of very strong edges), the feature vector components are saturated at 0.2.

The SIFT descriptor consists of histograms of weighted gradient directions around the key point. The descriptor has the parameters shown below.


Figure (3-3): The graph shows the SIFT parameters.

Accordingly, the SIFT descriptor of a point is the local statistics of the gradient orientations in the neighborhood of that point. Such a point, around which the statistics are collected, is called a key point. How key points are obtained is not the focus of our evaluation here.

[Figure (3-3) labels: Neighborhood, Center (key point), Histogram support, Extension, D. Figure (3-2) labels: orientation bins at 0°, 45°, 90°, 135°, 180°, 225°, 270°, 315°; saturation level 0.2.]

Page 24: Comparison Of Salient Feature Descriptorshh.diva-portal.org/smash/get/diva2:238363/FULLTEXT01.pdfsalient features will be studied, the SIFT features [1] and the Orientation Radiograms

16

Figure (3-4): In this figure sigma determines the width of the support of the histogram.

Histogram layout

As Figure (3-1) shows, the SIFT descriptor is a collection of histograms of the gradient orientation in local supports around the key point. In the figure there are 128 scalars constituting the SIFT descriptor, because there are 16 (4x4) supports and at each support 8 orientations are considered.

Histogram support

Sigma, σ, determines the width of the histogram supports, D, of the descriptor, see Figure (3-4). There is a so-called magnification parameter, m, which provides this direct coupling to sigma via

D = m*σ (8)

By default m=3.0, and sigma is constant for the entire image in our evaluation. This means that the extension is always 3x3 if sigma is fixed at 1.0, irrespective of the image size. Instead of growing with the image size, the extension of the histogram supports thus remains 3x3, or another fixed size for a different sigma. It is worth noting that when the confusion matrix was computed using the standard setting (m=3), the results were not favorable to SIFT descriptors (nearly 2000 errors for BST=1), because the histogram supports mostly had a much smaller diameter than they were supposed to have (a quarter of the image width/height). We wanted the histogram support to vary with the image size, not with sigma, because we know the size of the neighborhood exactly: it amounts to the entire image, so the entire image should be considered when SIFT descriptors are computed. In our evaluation we changed this behavior so that the extension of the histogram support was always Height/4, allowing the histogram support to grow with the image size. This was achieved by choosing m such that


m = Height/(4σ), yielding D = m*σ = Height/4 (9)

This steering of D gave clearly better results for SIFT descriptors, as will be presented in the results section further below.

Feature vector normalization. To achieve invariance to illumination changes, the feature vector f is normalized so that its Euclidean length is 1:

||f|| = 1 (10)

Weighting and binning. The gradient magnitudes in the neighborhood are multiplied by weights coming from a Gaussian window appropriately covering the extension before the histograms are collected:

• ∇G is weighted by a Gaussian window.

Furthermore, during the histogram collection, gradient directions will generally not agree exactly with the histogram bins (the 8 discrete directions of Figure (3-2)). Gradient directions must be assigned to bins in a "democratic" way. In the SIFT descriptors, interpolation is used to distribute the values of orientations falling between the bins: if such a direction has normalized distance r from the closest bin, it has distance 1-r from the next closest bin. The values (1-r)*||∇G|| and r*||∇G|| are then added to the closest and next closest bins, respectively, where ||∇G|| is the gradient magnitude and ∠(∇G), the argument of ∇G, is the gradient angle. In Figure (3-5) this is illustrated for a gradient having 70 degrees as its direction, whose neighboring bins are at 45 and 90 degrees; accordingly r is 25/45.

Figure (3-5): Weighting and binning for gradient angles that do not coincide with the pre-defined bin angles.
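The interpolation described above can be sketched as follows (our own Python illustration; `soft_bin` is an invented name):

```python
def soft_bin(angle_deg, magnitude, bins=8):
    """Distribute a gradient of the given angle (degrees) and magnitude
    between the two nearest of `bins` equally spaced orientation bins:
    the bin at normalized distance r receives (1-r) of the magnitude
    and the other neighboring bin receives r."""
    spacing = 360.0 / bins                   # 45 degrees for 8 bins
    lo = int(angle_deg // spacing) % bins    # bin at or below the angle
    hi = (lo + 1) % bins                     # next bin (wraps around)
    r = (angle_deg % spacing) / spacing      # normalized distance to `lo`
    hist = [0.0] * bins
    hist[lo] += (1.0 - r) * magnitude
    hist[hi] += r * magnitude
    return hist

# the example from the text: a gradient at 70 degrees, bins at 45 and 90
h = soft_bin(70.0, 1.0)
print(h[1], h[2])  # the 45-degree bin gets 1 - 25/45, the 90-degree bin 25/45
```

The total magnitude is preserved, only split between the two neighboring bins.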


3.1 Performance evaluation

3.1.1 Comparing a disturbed image with a large set of images

In this thesis, a query image (the searched image) is compared with the other images in a large database. The query image is distorted gradually, and its visual features are compared to those of the database. We use scalar products to compare the feature vector of the query image to those of the 3491 images. The function below extracts the maximum, x, and the position of the maximum, p:

[x, p] = max(conf) (11)

The maximum should occur at the diagonal elements of the confusion matrix, which should be as close as possible to 1. The errors can be plotted by a Matlab command such as:

plot(abs(p - linspace(1,3491,3491)) > 0) (12)

They can also be counted separately by software. Without disturbing the images we obtained 17 errors, instead of the expected zero errors. Since the scalar product of a normalized vector with itself is always 1, which is the maximum, a query image should always best-match itself; the errors arose because another image was equally similar to the query image as the one with exactly the same name. We checked these images visually and found that they were not only similar, they were identical to the respective query images, even at pixel level. We noted the names of these images for future reference.

Table (3-1): The identical images in the PASSEPARTOUT database (3491 images)

Filename      Index   Filename      Index   Filename      Index
OR1066.jpg       74   OR1106.jpg      118   OR1070.jpg       79
OR1107.jpg      119   OR1261.jpg      285   OR1263.jpg      286
OR1262.jpg      287   OR1240.jpg      263   OR1272.jpg      296
OR1239.jpg      262   OR1273.jpg      297   OR1270.jpg      294
OR1275.jpg      299   OR1294.jpg      319   OR1300.jpg      325
OR5.jpg           2   OR5.jpg        2971   OR305.jpg      2246
OR63.jpg       3107   OR402.jpg      2873   OR64.jpg       3118
OR1424.jpg      457   OR700.jpg      3181   OR1104.jpg      116
OR871.jpg      3358   OR1103.jpg      115   OR872.jpg      3359
OR1102.jpg      114   OR873.jpg      3360   OR1078.jpg       87
OR946.jpg      3437   OR1086.jpg       96   OR961.jpg      3452
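The best-match search of equations (11) and (12), and the duplicate phenomenon of Table (3-1), can be illustrated in Python. This is a sketch with an invented four-element toy database; `normalize` and the vectors are ours, not the thesis software.

```python
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else v

# toy database of four descriptors; image 3 is an exact duplicate of image 0
db = [normalize(v) for v in ([1, 2, 3], [3, 1, 0], [0, 2, 2], [1, 2, 3])]

# confusion matrix: scalar products between every query and database vector
conf = [[sum(a * b for a, b in zip(q, d)) for d in db] for q in db]

# equation (11): position of the maximum in every row
p = [row.index(max(row)) for row in conf]

# equation (12): count rows where the maximum is off the diagonal
errors = sum(1 for i, pi in enumerate(p) if pi != i)
print(p, errors)
```

Query 3 ties with its duplicate (both scalar products are 1), and the first maximum found is off the diagonal, which is exactly how the 17 "errors" of the undisturbed database arise.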


The images in this database should be rotated, to compare the confusion matrices between the rotated and unrotated images. The rotated images were put in a different directory, and we implemented software that would:

• compute SIFT features for every image in both directories and store them there,
• compute the confusion matrix between the directories,
• count the occurrences when the query image is not found as the best match (BST=1),
• count the occurrences when the query image is not among the 10 best matches (BST=10).

Each element of the confusion matrix is a scalar product between normalized vectors with positive elements; accordingly, the best-case scalar product should be close to one and never negative. In this thesis, directories of rotated and flipped images were compared, first with large and then with small rotations. As a first trial, to detect possible anomalies in our software, the images were rotated by 90°, 180°, and 270°, flipped versions were produced at the same angles, and all were saved in different folders. There are thus 8 directories: rotations of 0°, 90°, 180°, and 270°, each with and without flipping. However, only small rotations, scaling, and additions of noise are used in the evaluation. The large rotations and scalings are implemented but not used in the conclusions, since large rotations and scale changes will be captured by the key-point extractor. Flipping is only of interest for the Passe-partout application, not for general image processing applications; flip invariance can be implemented for Passe-partout separately by simply having two entries in the database, one for the flipped and one for the unflipped version, which are put into equivalence after the search. When the images are rotated by large amounts and flipped, the features of the rotated images are compared, and the confusion matrix of each directory is compared with that of the unrotated images to find the errors. Some examples of images rotated by 90 degrees, with and without flips, are shown below.
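The BST=1 and BST=10 error counts in the list above can be sketched as follows (a Python illustration with an invented toy confusion matrix; `count_errors` is our own name, not part of the adapted software):

```python
def count_errors(conf, bst):
    """Count queries whose true match (the diagonal entry) is not among
    the `bst` highest scalar products in the query's row of `conf`."""
    errors = 0
    for i, row in enumerate(conf):
        # indices of the bst best matches, highest scalar product first
        best = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:bst]
        if i not in best:
            errors += 1
    return errors

# toy confusion matrix between a directory and its rotated copy:
# query 2's best match is image 0, so it fails at BST=1 but not at BST=3
conf = [
    [0.99, 0.10, 0.20],
    [0.15, 0.97, 0.05],
    [0.90, 0.30, 0.85],
]
print(count_errors(conf, 1), count_errors(conf, 3))
```

As the text requires, relaxing BST can only decrease the error count, never increase it.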

Page 28: Comparison Of Salient Feature Descriptorshh.diva-portal.org/smash/get/diva2:238363/FULLTEXT01.pdfsalient features will be studied, the SIFT features [1] and the Orientation Radiograms

20

Figure (3-6): Experimental images showing a large rotation and flipping

3.1.2 The studied disturbances

For small rotations, angles of 2°, 3°, 4°, and 10° are considered; all images were rotated by these angles. Small rotations are the more realistic case in a real application, where the SIFT features will work together with an appropriate key-point detector. Some examples of rotation by 2° are shown below:

Figure (3-7): Experimental images of small rotation with flipping and without flipping

[Panel labels, Figures (3-6) and (3-7): original image with no rotation and no flip; Rotate=2, Flip=0; Rotate=2, Flip=1; Rotate=90, Flip=0; Rotate=90, Flip=1]


Applying both small and large rotations in the SIFT program shows that a rotated image and its original are no longer judged identical; this is expected behavior, not a bug. With a rotation of 2° the errors should be fewer than for the other (larger) rotations (and if not, then there is a bug); this was indeed the case. As mentioned, for every rotation we measure the errors in two ways: in the first case BST equals one, and in the second case BST equals ten. When BST=1, only an exact first hit is accepted, because the goal is to find the query image first in the returned list of found images. BST=10 amounts to a less severe evaluation: it is recorded as an error only if the rotated image is not even among the ten best matches, and not recorded as an error if the query image is among the 10 best images returned. Evidently, for BST=10 the matching errors should always be fewer (else there is a bug), which was the case. The confusion matrix between the directory of images without rotation (and no flip) is compared with the directory of rotated images (and no flip). The result will be given in the next section.

The second experiment uses scaling as the disturbance for comparing the features. Scaling can make images larger or smaller. First the disturbance consisted of enlarging the images by 2%, 4%, 6%, and 10%, and comparing these with the originals. However, this turned out not to be a real disturbance, because when an image is made larger there is virtually no loss of information, and the global gradient statistics remain the same. Accordingly, the errors were very low even at a 10% size increase: 17 for BST=1. Furthermore, on closer inspection these were not errors at all, because the non-matching images were the duplicates of the original images. A more significant difference could be observed when we scaled the other way around, namely a scale change with information loss: image size reduction to 65%, 70%, 85%, 90%, and 95%.
The result of scaling as disturbance will be presented in the next section.

3.2 The results (SIFT)

Rotation distortion experiments

Below we show the query results when the images are distorted by various rotation angles. The fact that we obtain 17 errors with BST=1 is attributable to duplicates. We have kept these duplicates to allow a comparison with the Orientation Radiogram features, since the results of the experiments reported in Henningsson and Willems 2002 contained the same duplicates.


Rotation (degrees)   Errors at BST=1   Errors at BST=10
 0                    17                  0
 1                    17                  0
 2                    22                  0
 3                    49                  1
 4                    80                  6
10                   983                380

Table (3-2): The result of rotation by small angles. The 17 errors shown are not actual errors; they represent the duplicated images.

We can see that genuine errors start to appear at rotations of 2 degrees and upwards. At 4 degrees rotation the errors are not far from 100 at BST=1, and 6 at BST=10, meaning that there were 6 images that were not even found among the 10 best matches. Growth of the errors is to be expected as the disturbance increases. At 10 degrees the disturbance is so significant that 380 images (more than 10% of the database) are not found at all in the list of 10 best matches.

Scale distortion experiments

In all experiments with scale as the disturbance, we express the scale of the resulting image as its size relative to the original. Here size changes refer to the linear dimensions, not the area, and the size is changed uniformly across rows and columns of an image. Accordingly, if the relative size of the new image is 1.1 and the original image has 100x100 pixels, the resulting size is 110x110 pixels. Likewise, the resulting image size will be 60x60 if the relative size is 0.60 for the same (100x100) image. The table below shows the result of disturbing the original images by increasing their size. As can be seen from the table, there are no errors at all when BST=10, and the errors at BST=1 are constant, 17, up to and including a 10% size enlargement. On closer inspection it turns out that these images are duplicates, i.e. there is no error up to a 110% relative size. There was no degradation even at size amplifications of 160% (not shown). As explained before, this can be attributed to the fact that there is no information loss when enlarging images.


Enlargement   BST=1   BST=10
1.02           17       0
1.04           17       0
1.06           17       0
1.10           17       0

Table (3-3): The query errors for scaling (up-sampling)

When using size reduction as the image disturbance, we see an immediate degradation of the performance compared to enlargement. At 0.90 relative size we obtain 19 errors, of which 2 are genuine errors (not duplicates), and at 0.60 relative size the errors jump to 193 when BST=1, and to 52 when BST=10.

Reduction   BST=1   BST=10
0.60         193      52
0.70          57       7
0.85          19       0
0.90          19       0
0.95          19       0

Table (3-4): The query errors for scaling (down-sampling)


3.2.1 Different sigma

Finally, we verify that the gradients we extract for every image of the database are not too blurred. This too is controlled by sigma in the SIFT features, because the gradient is obtained by applying the same filter

h = [-1, 1]

at every scale σ, in both the x and y directions. A Gaussian convolved with the filter h in the x direction is equivalent to ∂G(x, y, σ)/∂x, the derivative of a Gaussian filter with respect to x. In other words, this procedure is equivalent to convolving the original image directly with ∂G(x, y, σ)/∂x, implying that σ also controls the width of the derivative filters. We will attempt to find the best σ for all images. We use the same sigma for all images because all images were scanned with approximately the same equipment, always at the same scanning resolution. We know this sigma already for the Orientation Radiograms (sigma=1), but we wanted to make sure that the σ we use for the SIFT descriptors is also optimal for them. For each sigma the confusion matrix is computed and the errors are counted as before. The σ is varied over σ=1, σ=2, and σ=4. In the experiment the query image undergoes a rotation of 2 degrees in all 3 cases. The result shows that sigma=1 works better than the other values, which is not very surprising, because this value also works best for the Orientation Radiograms. It confirms that the scale of the edges appearing in the database is estimated well by σ=1. When σ=1 the number of errors was 20, but 17 of them were not errors; they were due to duplicate images.
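The equivalence between convolving with h = [-1, 1] after Gaussian smoothing and filtering with the Gaussian derivative can be checked numerically. The following sketch (our own, pure Python) compares the forward difference of Gaussian samples with the analytic derivative sampled at the midpoints:

```python
import math

def gaussian(x, sigma):
    return math.exp(-x * x / (2 * sigma * sigma)) / (sigma * math.sqrt(2 * math.pi))

def gaussian_dx(x, sigma):
    # analytic derivative of the Gaussian with respect to x
    return -x / (sigma * sigma) * gaussian(x, sigma)

sigma = 2.0
xs = list(range(-10, 11))
g = [gaussian(x, sigma) for x in xs]

# Convolving the samples with h = [-1, 1] gives a forward difference,
# which approximates the Gaussian derivative sampled at midpoints x + 0.5.
diff = [g[i + 1] - g[i] for i in range(len(g) - 1)]
analytic = [gaussian_dx(x + 0.5, sigma) for x in xs[:-1]]

max_err = max(abs(d - a) for d, a in zip(diff, analytic))
print(max_err)  # small: the two filters are nearly equivalent
```

The residual error is the usual finite-difference discretization error, which shrinks as σ grows relative to the pixel spacing.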

Sigma   Errors at BST=1   Errors at BST=10
1        20                 0
2        41                 3
4       146                21

Table (3-5): The query errors when using different sigma in the SIFT descriptor. In each case the directory with rotation=0, flip=0 is compared against the directory with rotation=2, flip=0.


4 TODAI descriptor

4.1 Linear symmetry

Linear symmetry can be computed using separable spatial filtering. It investigates the neighborhood of each pixel for a dominant orientation and provides estimates of several parameters: the dominant orientation, and two error measures related to the best and worst directions that can be fitted. Figure (4-1) illustrates three different neighborhoods. Linear symmetry will produce the orientation represented by the dotted axis in all three cases. In the left image, the best fit produces a nearly zero error, whereas the worst fit, the vertical direction, produces a very high error. This means that it pays off to fit the line in the dotted-axis direction, because the error is much smaller than for the alternative (the vertical axis, which is worst). The difference between the worst and best errors is the certainty. The middle image contains lines with many different orientations; the orientation fit would still indicate the dotted axis as optimal, but the error of the best fit would not be much smaller than that of the worst axis (vertical, not shown). This difference between the best and worst errors, the certainty, is small, indicating that the confidence in the best fit is low. In the right image, the neighborhood is covered by a single gray level, so the dominant orientation is arbitrary; the certainty of the best axis is zero.

Figure (4-1): Applying linear symmetry to these 3 neighborhoods will yield the same orientation, but the variance will differ.
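The best/worst orientation fit and the certainty can be sketched with the standard structure-tensor formulation of linear symmetry. This is our own illustration: the eigenvalue gap of the tensor plays the role of the difference between the worst-fit and best-fit errors, and the returned angle is the dominant gradient axis.

```python
import math

def linear_symmetry(gradients):
    """Estimate a dominant orientation and a certainty from a list of
    (gx, gy) gradients via the structure tensor; the certainty is the
    gap between the largest and smallest eigenvalues."""
    jxx = sum(gx * gx for gx, gy in gradients)
    jyy = sum(gy * gy for gx, gy in gradients)
    jxy = sum(gx * gy for gx, gy in gradients)
    # double-angle form: orientation of the axis with the least fit error
    certainty = math.hypot(jxx - jyy, 2 * jxy)     # lambda1 - lambda2
    angle = 0.5 * math.atan2(2 * jxy, jxx - jyy)   # in radians
    return angle, certainty

# left image of Figure (4-1): all gradients share one axis -> high certainty
a1, c1 = linear_symmetry([(1.0, 1.0), (-1.0, -1.0), (2.0, 2.0)])
# middle image: many different orientations -> certainty collapses
a2, c2 = linear_symmetry([(1.0, 0.0), (0.0, 1.0), (0.7, 0.7), (-0.7, 0.7)])
print(c1 > c2)
```

With a single gradient axis the certainty is large; with evenly spread orientations the tensor becomes isotropic and the certainty drops to zero, matching the middle and right cases of Figure (4-1).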


Orientation radiograms

Orientation radiograms [Michel et al. 1996] are the image descriptors used by TODAI, the search engine for typographic ornament images in the [PASSEPARTOUT] image database [5]. To obtain the orientation radiograms, the original image is decomposed into several linear-symmetry orientations, each represented as a decomposition image, Figure (4-2). A decomposition image contains the lines and edges of a fixed orientation; the other orientations are suppressed and appear in the other decomposition images. The linear-symmetry orientations are different from gradient orientations and will be discussed further below. The example of Figure (4-2), which shows the image of a rectangle, illustrates the decomposition using 6 directions. Each of the 6 decomposition images contains the edges and lines of the original, but only in one of the 6 directions. Orientations falling between the 6 (fixed) orientation bins appear in the closest orientation decomposition images, with edge intensities adjusted according to their distance to the nearby bins.

Figure (4-2): Decomposition of a rectangle in six orientations.


Figure (4-3): The radiogram X-rayed for 6 different orientations (orientation radiograms). The panels show the X-ray projections for the orientations 0°, 30°, 60°, 90°, 120°, and 150°.


Each decomposition image is "X-rayed", i.e. projected, in the direction corresponding to the direction of the decomposition, Figure (4-3). The resulting graphs are called the orientation radiograms. They are not used directly as features, because the number of discrete bins on their x-axes is usually high. This is discussed next.
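The "X-ray" projection of a decomposition image can be sketched as follows. This is our own nearest-bin illustration, not TODAI's implementation: each pixel is summed into the bin indexed by its perpendicular offset from the ray through the origin.

```python
import math

def radiogram(image, angle_deg):
    """Project a 2D image (a list of rows) along the given ray direction
    by summing pixel values into bins indexed by the perpendicular
    offset; nearest-bin rounding."""
    h, w = len(image), len(image[0])
    t = math.radians(angle_deg)
    n = int(math.hypot(h, w)) + 1          # enough bins for any offset
    bins = [0.0] * (2 * n + 1)
    for y in range(h):
        for x in range(w):
            # signed distance of (x, y) from the ray through the origin
            offset = int(round(-x * math.sin(t) + y * math.cos(t)))
            bins[offset + n] += image[y][x]
    return bins

img = [[1, 2],
       [3, 4]]
r0 = radiogram(img, 0)    # horizontal rays: the non-zero bins are row sums
r45 = radiogram(img, 45)  # diagonal rays: pixels (0,0) and (1,1) share a bin
print([v for v in r0 if v], sum(r45))
```

The total image mass is preserved for every projection angle, only redistributed over the bins; TODAI's real radiograms use far finer sampling (512 bins per graph, see below).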

4.2 Discrete Fourier Transform (DFT) or Karhunen-Loève Transform (KLT)

One way to compact the orientation radiograms is to use the first coefficients of the (1D) Fourier transform of the radiogram graph. Every radiogram then needs only a few real scalars for its representation. In the implementation, 10 Fourier coefficients (which are complex, hence 20 real values) were sufficient to keep most of the information in the orientation radiograms, Figure (4-4). In the implementation of Michel et al. 1996, each graph had 512 discrete values (on the x-coordinate). Before compaction all frequencies exist in the transformed radiograms, but only the 10 lowest frequencies (20 real coefficients) are kept after compaction. This means that the radiogram graphs also undergo a low-pass filtering, because high-frequency information, usually representing noise, is left out. Another way is to apply the Karhunen-Loève Transform to a subset of the radiogram graphs to obtain a KLT basis, and then to use the projection coefficients on the 20 most significant KLT basis vectors. This is very similar to the DFT approach, because the 20 real DFT coefficients are also projection coefficients (on the Fourier basis). This approach was implemented and tested by Henningsson and Willems 2002. The 20 projection coefficients (DFT or KLT) of the 6 radiograms are concatenated to form a descriptor vector used when searching for similar images. Since 6 directions were used in the decomposition, there are 6x20=120 scalars representing the image. TODAI uses these 120-dimensional feature vectors to estimate the distance between two images and judge whether they are similar. The distance measurements are then used to pull out the 10 closest matches to a query image from the database.

Figure (4-4): The 20 real variables (DFT or KLT) used in the TODAI descriptor represent the original orientation radiogram (left panel) by its KLT or DFT (right panel).

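The DFT compaction, keeping only the 10 lowest frequencies of a radiogram graph, can be sketched in pure Python. This is our illustration; note that for a real-valued reconstruction the conjugate mirror frequencies must be kept alongside the low ones.

```python
import cmath
import math

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(c):
    n = len(c)
    return [sum(c[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def compact(signal, keep=10):
    """Keep only the `keep` lowest frequencies (plus their conjugate
    mirror, needed for a real-valued reconstruction); zero the rest.
    This acts as a low-pass filter on the radiogram graph."""
    c = dft(signal)
    n = len(c)
    out = [0j] * n
    for k in list(range(keep)) + list(range(n - keep + 1, n)):
        out[k] = c[k]
    return idft(out)

# a smooth radiogram-like graph survives the compaction almost unchanged
graph = [math.sin(2 * math.pi * t / 64) + 0.5 * math.cos(4 * math.pi * t / 64)
         for t in range(64)]
approx = compact(graph, keep=10)
err = max(abs(a - b) for a, b in zip(graph, approx))
print(err)  # tiny: only frequencies 1 and 2 are present in the graph
```

A graph dominated by low frequencies, as smooth radiograms are, loses almost nothing; only high-frequency content, usually noise, is discarded.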


4.3 Image descriptors

In the [PASSEPARTOUT] web application, an image to be searched for is uploaded via a web page. This query image can be any image on the user's local disc anywhere in the world. TODAI then searches its collection of images (PASSEPARTOUT) for the best matches to the received image. When the search is finished, the ten best matches are displayed in descending order of closeness. For each ornament there is information about where it was printed, the size of the image in pixels, the ornament's use (e.g. vignette) and the ornament's nature (e.g. woodcut/cliché).

4.3.1 Rotation and flip invariance

TODAI implements rotation invariance by global orientation alignment. Just as with SIFT features, rotation invariance is handled outside of the descriptors: a global image orientation is estimated and the image is rotated to 0 degrees, for both the database images and the query image. By this procedure two similar images will be aligned to the same angle, since both are aligned along the same direction, the global rotation axis. To achieve this, the global rotation axis with the smallest inertia is computed. The approach works properly only if exactly one such axis exists in the image; accordingly, there is a risk of failure if this axis is not unique. Rotation and flip invariance is important in the web application for the purposes of book historians. It is likewise important for many users of SIFT features that these are both rotation and scale invariant, for example in robot navigation. However, it is worth noting that rotation, scale, and flip invariance are not desired in every application: robot navigation should not have flip invariance, and for book historians the size of the ornaments is an important variable that should not be discarded. Accordingly, it makes sense to keep the invariances outside of the descriptors, e.g. to pre-rotate or pre-scale images to rotation and scale values obtained from the image pair being tested (the query and the database image).

4.4 Performance evaluation of the TODAI descriptor

In 1996 TODAI was tested on a database of 39 different images [Michel et al. 1996], with the requirement that the search should not be sensitive to rotation and flipping. The hit ratio (i.e., the number of correct searches divided by the total number of searches) was 97%. This result was higher than those of the other methods tested in the same work. An example of such a query is shown in Figure (4-5).


When the website was presented, the database had grown to a few hundred images; by 2002 this number had increased to more than 3500 images. Accordingly, there was a clear need to estimate the performance and, if possible, to improve it. This work was undertaken in Henningsson and Willems 2002.

Figure (4-5): An example of a query result [Bigun et al.]; top left: the input image; the remaining images: the closest matches in descending order.

In order to search for all images, the existing software had to be run many times: distorted versions of all pictures in the database are compared with all images in the database, just as we did for the SIFT features. The results in this section are due to Henningsson and Willems 2002.


Table (4-1): Search results for two images. At left: image OR23 is a successful search. At right: image OR230 is a less successful search, with image OR61 in rank 1 and the query image itself in rank 3.

In order to test rotation and flip invariance, their search was applied to different transformations (rotations of 2°, 10°, 90°, and a left-right flip) of the original images [Henningsson and Willems 2002]. As said before, large rotations, large scale changes, and flipping are not of general interest, but only application dependent; such invariance is usually handled outside of the feature description. Accordingly, we only recite their results concerning distortions of 2 and 10 degrees, and leave out the performance at larger rotations as well as the flip performance. We also think that the performance at 2 degrees distortion is the more important, because small deviations (of 2 degrees) are more likely than larger ones (10 degrees). Their study found that the DFT did not perform as well as the KLT at very small distortions (2 degrees), but on the other hand the KLT was worse than the DFT at 10 degrees distortion. It can nevertheless be concluded that the radiogram features contained sufficient information to classify even the larger database.

Method   Rotation   BST=1   BST=10   Table in [HW]
DFT       2          56      27      Table 3
DFT      10         229      93      Table 3
KLT       2          22       1      Table 4
KLT      10         485     256      Table 4

Searched image: OR23.pgm Searched image: OR230.pgm

Rank Found images Distance Found images Distance

1 OR23.pgm 35842.842000 OR61.pgm 82848.972868

2 OR24.pgm 60467.913330 OR438.pgm 90922.324504

3 OR770.pgm 67417.096744 OR230.pgm 91203.430983

4 OR696.pgm 67502.382107 OR606.pgm 93982.690192

5 OR536.pgm 68539.403090 OR400.pgm 95452.346785

6 OR167.pgm 70949.404165 OR74.pgm 96185.371502

7 OR154.pgm 71157.903313 OR171.pgm 96875.623614

8 OR58.pgm 72069.984311 OR264.pgm 96972.164222

9 OR766.pgm 72192.8111 03 OR479.pgm 97103.112813

10 OR628.pgm 72286.192833 OR387.pgm 97677.126605

Table 1: Search results for two different images. At left: image OR23 is a

successful search (rank 1). At right: image OR230 is a faulty search with

image OR61 in rank 1 and it self in rank 3.

Page 42: Comparison Of Salient Feature Descriptorshh.diva-portal.org/smash/get/diva2:238363/FULLTEXT01.pdfsalient features will be studied, the SIFT features [1] and the Orientation Radiograms

34

Table (2-4): Search result for different transformations on the 3490 images according to [HW] (Henningsson and Willem 2002)
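The BST=1 and BST=10 error counts reported in these tables can be described schematically: a query counts as an error when its own (undistorted) database image is not among the BST nearest neighbours of the distorted query. The following is a minimal sketch of this evaluation protocol under that assumption; the function name and the toy distance matrix are our own illustration, not taken from the Todai or SIFT software.

```python
import numpy as np

def query_errors(dist, bst):
    """Count queries whose correct database image is not among the
    bst nearest neighbours; dist[i, j] is the distance between
    distorted query i and database image j, and the correct match
    for query i is database image i."""
    errors = 0
    for i in range(dist.shape[0]):
        nearest = np.argsort(dist[i])[:bst]  # indices of bst closest images
        if i not in nearest:
            errors += 1
    return errors

# Toy 3-image "database": query 0 matches itself best, while queries
# 1 and 2 are each beaten by another image at rank 1.
dist = np.array([[0.1, 5.0, 6.0],
                 [4.0, 9.0, 0.2],
                 [1.0, 2.0, 3.0]])
print(query_errors(dist, 1))  # 2
print(query_errors(dist, 3))  # 0
```

A larger BST is more forgiving, which is why the BST=10 columns always show fewer errors than the BST=1 columns.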


5 Suggestion: Linear symmetry histograms as modified SIFT features

5.1 Gradient in double angle versus single angle

Around edges and lines, the gradient angle alternates in the single angle representation, whereas it is unchanged in the double angle representation.

Figure (5-1): Each line has two opposing gradients. By doubling the gradient angles, each line will occupy one bin instead of two in orientation histograms.
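The claim in Figure (5-1) can be illustrated with a small binning sketch, assuming 8 orientation bins as in the SIFT descriptor. The function below is our own illustration and is not part of any of the studied software.

```python
import numpy as np

def orientation_bin(angle, nbins=8, double=False):
    """Return the histogram bin of a gradient angle (radians).
    With double=True the angle is doubled before binning, so the two
    opposing gradients of a line (theta and theta + pi) share a bin."""
    a = 2.0 * angle if double else angle
    return int(np.mod(a, 2 * np.pi) / (2 * np.pi) * nbins)

th = np.deg2rad(30.0)
# Single angle: the line's two opposing gradients occupy two bins.
print(orientation_bin(th), orientation_bin(th + np.pi))  # 0 4
# Double angle: both gradients map to one and the same bin.
print(orientation_bin(th, double=True),
      orientation_bin(th + np.pi, double=True))          # 1 1
```

This is exactly the redundancy discussed in the next subsection: in the single angle representation a line consumes two of the eight bins, while the double angle representation stores the same line direction in one bin.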

5.2 Histograms of double angle gradients versus single angle gradients

The SIFT descriptors consist of histograms of gradient orientations, typically over 16 sub-squares of a neighborhood, but each histogram has few direction bins, typically 8, resulting in 128 scalars. By contrast, the Todai search engine uses few radiograms of linear symmetry orientations, typically 6, but each has 20 bins, resulting in 120 scalars. Common to both is that they use 1D graphs in their image description. In the SIFT descriptor, every line will on average be represented by two bin entries, because a line generates two opposite gradient directions in a local neighborhood, Fig. (5-1). This means that two bins are needed to represent one line direction. One might suspect that there is a systematic waste of resources in this representation, as it contains redundancy. If we multiply the original orientation angles by 2, a line can be represented by one bin, which is one of the ideas behind the linear symmetry orientation estimation procedure. In a neighborhood around an edge, there will be one angle-bin in the double angle representation that receives votes; in the same situation, two angle-bins would receive votes with SIFT features. With 8 angle-bins, the actual orientation difference between two adjacent bins in the double angle representation is 22.5 degrees, which is to be contrasted with the 45 degrees of the single angle representation of the SIFT features, with the same amount of bin resources. To implement this within the existing software we proceeded as follows.

d_x = a cos(Φ)   (13)

d_y = a sin(Φ)   (14)

The components d_x and d_y are already computed in the SIFT software by means of Gaussian filtering and application of the h filter (difference filter) discussed previously. What we need is to replace cos Φ with cos 2Φ, and sin Φ with sin 2Φ, in the software; then the descriptors will be in the double angle representation, everything else being unchanged (using the same bin resources, and resulting in 128 elements of the feature vector). To do this we observed that:

a² cos(2Φ) = a² cos²Φ − a² sin²Φ = d_x² − d_y²   (15)

a² sin(2Φ) = 2 a² sin Φ cos Φ = 2 d_x d_y   (16)

and implemented the demanded changes in the software.
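As a small numerical check of Eqs. (15) and (16), the double angle components can be computed directly from d_x and d_y, without ever evaluating an angle explicitly. The function below is our own sketch of these two identities, not code from the SIFT software.

```python
import math

def double_angle(dx, dy):
    """Compute (a^2 cos 2F, a^2 sin 2F) from dx = a cos F and
    dy = a sin F using only Eqs. (15) and (16); no angle is
    ever evaluated explicitly."""
    return dx * dx - dy * dy, 2.0 * dx * dy

# Two opposite gradients (dx, dy) and (-dx, -dy), as they occur on
# the two sides of a line, map to identical double angle components:
a, phi = 2.0, math.pi / 6
dx, dy = a * math.cos(phi), a * math.sin(phi)
print(double_angle(dx, dy) == double_angle(-dx, -dy))  # True
```

Because only squares and products of d_x and d_y appear, the sign ambiguity of the gradient cancels, which is precisely why the double angle representation merges the two opposing gradients of a line.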

5.3 Performance evaluation of the modified SIFT descriptor

The results of the experiment with changing the SIFT descriptor angles to double angles are shown below.

Sigma = 1            BST=1   BST=10
Rotate=0,  flip=0    17      0
Rotate=1,  flip=0    17      0
Rotate=2,  flip=0    18      0
Rotate=3,  flip=0    22      0
Rotate=4,  flip=0    38      0
Rotate=10, flip=0    1186    451
Rotate=0,  flip=1    1638    1109

Table (5-1): The number of query errors when the modified SIFT descriptors are used as features.


6 Discussion and Conclusion

The goal of this thesis was to study the performance of image descriptors. In particular, we studied and compared two kinds of descriptors using a common large database. The idea was to test their descriptive power independently of the particular needs of the application with respect to invariance to large image distortions such as flip, scale, and rotation. Small distortions in scale and rotation are considered in the comparisons because scale and rotation are typically estimated before image features are extracted, in order to align the scales and rotation angles, and this estimation can be assumed to contain small errors.

For small image rotations (approximately 2 degrees) we could conclude that Todai and SIFT perform approximately the same, giving very low errors, in the range of 0-5 errors (BST=1 and BST=10), when the duplicates are removed from the error counts. For larger image rotations (approximately 10 degrees) the worst Orientation Radiogram results, 485 and 256 errors (with KLT, for BST=1 and BST=10 respectively), were better than the respective SIFT results, 983 and 383 errors. Most interestingly, however, we could observe a significant improvement of the SIFT feature performance when we used the same (double) angle representation as the Orientation Radiograms: the errors were reduced to approximately half or better for rotations up to 4 degrees. Beyond that the improvement decreased and eventually disappeared; by the time we reached 10 degrees we could no longer observe a statistically significant difference between the two. This suggests that there is a systematic waste of resources in the SIFT feature descriptor (128 real parameters) that can be remedied.

For SIFT features (single angle representation), scale reductions down to 0.85 of the original (perimeter) size gave virtually no errors: 1 error when the 17 duplicates were removed at BST=1, and zero at BST=10. Only at 0.70 of the original size could we start to observe a serious increase in errors (57 at BST=1), suggesting that a 15% size reduction (giving nearly no errors) is treated as a small size change by SIFT features, which can be a strength in many applications. It was not interesting to attempt to improve these results by using the double angle representation, because they are already very good. For Orientation Radiograms, the performance evaluation under scale changes was not available in the study of Henningsson and Willems 2002, and the scope of this project did not permit implementing these disturbances.


7 References

[1] David G. Lowe, "Distinctive image features from scale-invariant keypoints", International Journal of Computer Vision, 60, 2 (2004), pp. 91-110.

[2] S. Michel, B. Karoubi, J. Bigün, and S. Corsini, "Orientation radiograms for indexing and identification in image databases", European Conference on Signal Processing (Eusipco), Trieste, Sept. 10-13, 1996, pp. 1693-1696.

[3] Weisstein, Eric W., "Salient Point", from MathWorld, a Wolfram Web Resource. http://mathworld.wolfram.com/SalientPoint.html

[4] David G. Lowe, "Method and apparatus for identifying scale invariant features in an image and use of same for locating an object in an image", US Patent 6,711,293 (March 23, 2004). Provisional application filed March 8, 1999. Assignee: The University of British Columbia.

[5] [PASSEPARTOUT] Passé-Partout database, TODAI, Lausanne. http://www.unil.ch/BCU/docs/collecti/res_prec/en/todai_intro.html

[6] http://www.umiacs.umd.edu/~shekhar/ps/pr96.ps.gz

[7] J. Bigun and G. H. Granlund, "Optimal orientation detection of linear symmetry", in First International Conference on Computer Vision, ICCV, London, June 8-11, pages 433-438. IEEE Computer Society, 1987.

[8] [Vedaldi] http://vision.ucla.edu/~vedaldi/code/sift/sift.html

[9] [LOWE-DEMO] http://www.cs.ubc.ca/~lowe/keypoints/

[10] Laine Henningsson and Jana Willems, "Image database search engine", Master's Thesis, Halmstad University, IDE, 2002.