

  • SURF: FEATURE DETECTION & DESCRIPTION, Q4 2011 1

    Study group SURF: Feature detection & description

    Jacob Toft Pedersen 19983275 [email protected]

    Abstract: A technical report on feature detection and on implementing the Speeded-Up Robust Features (SURF) algorithm.

    Index Terms: SURF, SIFT, feature detection, feature description, integral image

    Fig. 1. Detecting common features between two images with different scale and cropping.

    1 INTRODUCTION

    Feature detection is the process of automatically examining an image to extract features that are unique to the objects in the image, in such a manner that an object can be detected in different images based on its features. This detection should ideally be possible when the image shows the object under different transformations, mainly scale and rotation, or when parts of the object are occluded.

    The process can be divided into three overall steps.

    Detection: Automatically identify interesting features, the interest points. This must be done robustly: the same feature should always be detected regardless of viewpoint.

    Description: Each interest point should have a unique description that does not depend on the feature's scale and rotation.

    Matching: Given an input image, determine which objects it contains, and possibly a transformation of the object, based on predetermined interest points.

    This report focuses on the details of the first two steps with the SURF algorithm (footnote 1).

    2 DETECTION

    The Scale-Invariant Feature Transform (SIFT) is a successful approach to feature detection introduced by Lowe [1]. The SURF algorithm [2] is based on the same principles and steps, but it uses a different scheme and should provide better results, faster.

    To detect feature points in a scale-invariant manner, SIFT uses a cascading filtering approach in which the Difference of Gaussians (DoG) is calculated on progressively downscaled images.

    In general, the technique for achieving scale invariance is to examine the image at different scales, the scale space, using Gaussian kernels. Both SIFT and SURF divide the scale space into levels and octaves. An octave corresponds to a doubling of $\sigma$, and each octave is divided into uniformly spaced levels.

    1. Speeded-Up Robust Features

    Fig. 2. Three octaves with three levels each. The neighborhood for the 3×3×3 non-maximum suppression used to detect features is highlighted. (The levels in the figure are labeled with scales that are multiples of k.)

    Both approaches build a pyramid of response maps, with different levels within octaves. A response map is the result of an operation on the image. The interest points are the points that are extrema among their 8 neighbors in the current level and their 2×9 neighbors in the levels below and above. This is a non-maximum suppression in a 3×3×3 neighborhood. The relation between levels, octaves and neighborhood is illustrated in Figure 2.
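The 3×3×3 test described above can be sketched in a few lines. This is a minimal illustration, not the report's C++ code: the names `is_local_maximum` and `find_maxima`, the nested-list layout `levels[level][row][col]`, and the `threshold` parameter are assumptions made for the example, and it only looks for maxima.

```python
def is_local_maximum(levels, s, r, c):
    """True if levels[s][r][c] is strictly greater than all 26 neighbours."""
    centre = levels[s][r][c]
    for ds in (-1, 0, 1):
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if (ds, dr, dc) == (0, 0, 0):
                    continue  # skip the candidate itself
                if levels[s + ds][r + dr][c + dc] >= centre:
                    return False
    return True


def find_maxima(levels, threshold=0.0):
    """Scan interior points of the middle levels; return (level, row, col)."""
    found = []
    for s in range(1, len(levels) - 1):
        rows, cols = len(levels[s]), len(levels[s][0])
        for r in range(1, rows - 1):
            for c in range(1, cols - 1):
                value = levels[s][r][c]
                if value > threshold and is_local_maximum(levels, s, r, c):
                    found.append((s, r, c))
    return found
```

Only the levels that have a level both above and below are scanned, which is why the top and bottom level of an octave never produce interest points in this sketch.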

    2.1 Hessian matrix interest points

    SURF uses a Hessian-based blob detector to find interest points. The determinant of the Hessian matrix expresses the extent of the response and is a measure of the local change around the area [3].

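    $$H(\mathbf{x}, \sigma) = \begin{bmatrix} L_{xx}(\mathbf{x}, \sigma) & L_{xy}(\mathbf{x}, \sigma) \\ L_{xy}(\mathbf{x}, \sigma) & L_{yy}(\mathbf{x}, \sigma) \end{bmatrix} \tag{1}$$

    where

    $$L_{xx}(\mathbf{x}, \sigma) = I(\mathbf{x}) * \frac{\partial^2}{\partial x^2} g(\sigma) \tag{2}$$

    $$L_{xy}(\mathbf{x}, \sigma) = I(\mathbf{x}) * \frac{\partial^2}{\partial x \partial y} g(\sigma) \tag{3}$$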

    $L_{xx}(\mathbf{x}, \sigma)$ in equation 2 is the convolution of the image with the second derivative of the Gaussian. The heart of the SURF detection is non-maximum suppression of the determinants of the Hessian matrices. The convolutions are very costly to calculate, so they are approximated and sped up with the use of integral images and approximated kernels.

    Fig. 3. Four memory look-ups are sufficient to calculate the sum of a rectangular area with an integral image.

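    An integral image (footnote 2) $I_\Sigma(\mathbf{x})$ is an image where each point $\mathbf{x} = (x, y)^T$ stores the sum of all pixels in the rectangular area between the origin and $\mathbf{x}$ (see equation 4).

    $$I_\Sigma(\mathbf{x}) = \sum_{i=0}^{i \le x} \sum_{j=0}^{j \le y} I(i, j) \tag{4}$$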

    The use of integral images makes it possible to calculate the response in a rectangular area of arbitrary size using 4 look-ups, as illustrated in Figure 3.
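As a small illustration of the four look-ups, here is a sketch in plain Python. The function names and the nested-list image are assumptions for the example, and the table is padded with a leading row and column of zeros, which is a common implementation convenience rather than something the report specifies (equation 4 uses inclusive sums without padding).

```python
def integral_image(img):
    """Return a (h+1) x (w+1) table where ii[r][c] = sum of img[0:r][0:c]."""
    h, w = len(img), len(img[0])
    ii = [[0.0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        row_sum = 0.0
        for c in range(w):
            row_sum += img[r][c]
            # sum above (previous row of the table) plus the current row so far
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def box_sum(ii, r, c, rows, cols):
    """Sum of img[r:r+rows][c:c+cols] using exactly four look-ups."""
    r1, c1 = r + rows, c + cols
    return ii[r1][c1] - ii[r][c1] - ii[r1][c] + ii[r][c]
```

The cost of `box_sum` does not depend on `rows` or `cols`, which is the property the box filters rely on.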

    The second-order Gaussian kernels $\frac{\partial^2}{\partial y^2} g(\sigma)$ used for the Hessian matrix must be discretized and cropped before we can apply them; a 9×9 kernel is illustrated in Figure 4. The SURF algorithm approximates these kernels with rectangular boxes, box filters. In the illustration, grey areas correspond to 0 in the kernel, whereas white areas are positive and black areas are negative. This way the approximated convolution can be calculated efficiently for arbitrarily sized kernels using the integral image.

    $$\mathrm{Det}(H_{approx}) = D_{xx} D_{yy} - (w D_{xy})^2 \tag{5}$$

    2. An integral image is also known as a summed-area table (SAT).

    Fig. 4. $L_{yy}(\mathbf{x}, \sigma)$ and $L_{xy}(\mathbf{x}, \sigma)$: discretized Gaussians and the approximations $D_{yy}$ and $D_{xy}$.

    Fig. 5. Where SIFT (left) downscales the image, SURF (right) uses larger and larger filters.

    The approximated and discrete kernels are referred to as $D_{yy}$ for $L_{yy}(\mathbf{x}, \sigma)$ and $D_{xy}$ for $L_{xy}(\mathbf{x}, \sigma)$. The illustrated kernels correspond to a $\sigma$ of 1.2 and are the lowest scale that the SURF algorithm can handle. When using the approximated kernels to calculate the determinant of the Hessian matrix, we have to weight it with $w$ in equation 5; this is to ensure energy conservation for the Gaussians. The $w$ term is theoretically sensitive to scale, but it can be kept constant at 0.9 [2].
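The following sketch shows how equation 5 could be evaluated at one pixel with box filters on an integral image. It is an illustration under stated assumptions, not the report's implementation: the lobe layout (lobe = size / 3, three stacked lobes weighted 1, -2, 1 for $D_{yy}$, four square lobes for $D_{xy}$) follows the usual 9×9 layout as I understand it from [2] and [5], the division by the filter area is one possible normalization, and all function names are hypothetical. The sign convention chosen for $D_{xy}$ does not affect the determinant because the term is squared.

```python
def integral_image(img):
    """Padded summed-area table: ii[r][c] = sum of img[0:r][0:c]."""
    h, w = len(img), len(img[0])
    ii = [[0.0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        row_sum = 0.0
        for c in range(w):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def box_sum(ii, r, c, rows, cols):
    """Sum of img[r:r+rows][c:c+cols]; the box is clamped to the image."""
    h, w = len(ii) - 1, len(ii[0]) - 1
    r0, r1 = min(max(r, 0), h), min(max(r + rows, 0), h)
    c0, c1 = min(max(c, 0), w), min(max(c + cols, 0), w)
    return ii[r1][c1] - ii[r0][c1] - ii[r1][c0] + ii[r0][c0]


def hessian_response(ii, r, c, size=9, w=0.9):
    """Return (Det(H_approx), D_xx + D_yy) at pixel (r, c) for an odd filter size."""
    lobe = size // 3            # 3 for the 9x9 filter
    border = (size - 1) // 2    # 4 for the 9x9 filter
    area = float(size * size)   # assumed normalization by filter area
    # whole band minus 3x the middle lobe gives weights 1, -2, 1
    dyy = (box_sum(ii, r - border, c - lobe + 1, size, 2 * lobe - 1)
           - 3.0 * box_sum(ii, r - lobe // 2, c - lobe + 1, lobe, 2 * lobe - 1)) / area
    dxx = (box_sum(ii, r - lobe + 1, c - border, 2 * lobe - 1, size)
           - 3.0 * box_sum(ii, r - lobe + 1, c - lobe // 2, 2 * lobe - 1, lobe)) / area
    # four square lobes in the quadrants around the centre pixel
    dxy = (box_sum(ii, r - lobe, c - lobe, lobe, lobe)
           + box_sum(ii, r + 1, c + 1, lobe, lobe)
           - box_sum(ii, r - lobe, c + 1, lobe, lobe)
           - box_sum(ii, r + 1, c - lobe, lobe, lobe)) / area
    det = dxx * dyy - (w * dxy) ** 2
    return det, dxx + dyy
```

For a small bright blob on a dark background the determinant is positive and the trace is negative, while a flat image gives a zero response.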

    To detect features across scale we have to examine several octaves and levels. Where SIFT scales the image down for each octave and uses progressively larger Gaussian kernels, the integral image allows the SURF algorithm to calculate the responses with arbitrarily large kernels (Figure 5).

    Fig. 6. Increasing the size of kernels while keeping the lobes correctly scaled.

    This does pose two challenges: how to scale the approximated kernels, and how this influences the possible values for $\sigma$. The kernels must have an odd size so that they have a central pixel, and the rectangular areas, the lobes, must have the same size (Figure 6).

    The SURF paper [2] goes into detail with these considerations. The result is that the division of scale space into levels and octaves becomes fixed, as illustrated in Figure 7. The filter sizes become large, and if the convolution were done with a regular Gaussian kernel this would be prohibitively expensive. The use of integral images not only makes this feasible, it also makes it fast, and without the need to downscale the image. It should be noted that this approach with large box filters can preserve, and be sensitive to, high-frequency noise.
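To make the fixed division concrete, the sketch below generates filter sizes per octave. The closed-form expression and the resulting sizes (9, 15, 21, 27 for the first octave, 15, 27, 39, 51 for the second) are how I recall them from [2] and the openSURF notes [5], so treat them as an assumption to check against Figure 7. The scale is assumed to grow linearly with the filter size from the 9×9 filter at $\sigma$ = 1.2; the function names are hypothetical.

```python
def filter_sizes(octave, levels=4):
    """Odd filter sizes, divisible by 3, for a zero-based octave index."""
    return [3 * (2 ** (octave + 1) * (level + 1) + 1) for level in range(levels)]


def scale_for(size):
    """Scale sigma matching a filter size, anchored at 9x9 <-> sigma = 1.2."""
    return 1.2 * size / 9.0


for octave in range(4):
    sizes = filter_sizes(octave)
    print(octave, sizes, [round(scale_for(s), 2) for s in sizes])
```

Every size is odd, so the filter has a central pixel, and divisible by 3, so the three lobes can have the same size.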

    When an extremum is found at one of the higher octaves, the area covered by the filter is rather large, and this introduces a significant error in the position of the interest point. To remedy this, the exact location of the interest point is interpolated by fitting a 3D quadratic in scale space [4]. An interest point is located in scale space by $(x, y, s)$, where $x, y$ are relative coordinates ($x, y \in [0; 1]$) and $s$ is the scale.

    Fig. 7. The scale ($\sigma$), the filter sizes and the octaves on a logarithmic scale.

    3 DESCRIPTION

    The purpose of a descriptor is to provide a unique and robust description of a feature. A descriptor can be generated based on the area surrounding an interest point. The SURF descriptor is based on Haar wavelet responses and can be calculated efficiently with integral images. SIFT uses another scheme for descriptors, based on histograms of local gradient orientations. Common to both schemes is the need to determine the orientation. By determining a unique orientation for an interest point, it is possible to achieve rotational invariance. Before the descriptor is calculated, the interest area surrounding the interest point is rotated to its direction.

    The SURF descriptors are robust to rotations, and an upright version, U-SURF, should be robust for rotations of ±15° without performing an orientation assignment [2]. I have implemented the upright version and will not go into further detail on orientation assignment.

    The SURF descriptor describes an interest area of size 20s. The interest area is divided into 4×4 subareas, each described by the values of a wavelet response in the x and y directions. The wavelet responses in the x and y directions are referred to as dx and dy respectively; the wavelets used to calculate the responses are illustrated in Figure 9. The interest area is weighted with a Gaussian centered at the interest point to give some robustness to deformations and translations. The components involved in the calculations are illustrated in Figure 8.

    Fig. 8. A 20s area is divided into 4×4 subareas that are sampled 5×5 times to get the wavelet response (Figure 9). Labels in the figure: Gaussian $\sigma$ = 3.3s, w = 20s.

    Fig. 9. The wavelet responses dy and dx. Black and white areas correspond to weights of -1 and 1 for the Haar kernels. They are used with a filter size of 2s.

    $$v = \left\{ \sum dx, \sum |dx|, \sum dy, \sum |dy| \right\} \tag{6}$$

    For each subarea a vector $v$ (equation 6) is calculated, based on 5×5 samples. The descriptor for an interest point is the concatenation of the 16 vectors for the subareas. Finally the descriptor is normalized to achieve invariance to contrast variations, which show up as a linear scaling of the descriptor.
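A minimal sketch of the assembly step, assuming the Haar responses have already been sampled on a 20×20 grid and weighted by the Gaussian. The function name `surf_descriptor` and the nested-list input are assumptions for the example; the ordering of the four sums follows equation 6.

```python
import math


def surf_descriptor(dx, dy):
    """Build the 64-value descriptor from two 20x20 grids of Haar responses."""
    values = []
    for block_r in range(4):          # 4x4 subareas
        for block_c in range(4):
            sum_dx = sum_adx = sum_dy = sum_ady = 0.0
            for r in range(block_r * 5, block_r * 5 + 5):      # 5x5 samples each
                for c in range(block_c * 5, block_c * 5 + 5):
                    sum_dx += dx[r][c]
                    sum_adx += abs(dx[r][c])
                    sum_dy += dy[r][c]
                    sum_ady += abs(dy[r][c])
            values.extend([sum_dx, sum_adx, sum_dy, sum_ady])
    # normalize to unit length so a linear contrast change cancels out
    norm = math.sqrt(sum(v * v for v in values))
    return [v / norm for v in values] if norm > 0 else values
```

Multiplying every response by the same positive factor leaves the result unchanged, which is the contrast invariance mentioned above.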

    Several schemes varying the size, the number of samples and the wavelet function have been tested. This setup has been found experimentally to be optimal, taking performance and precision into account [2].

    Once the descriptors have been calculated, finding a match between two descriptors is a matter of testing whether the distance between the two vectors is sufficiently small. The SURF algorithm adds another detail to speed up matching: the sign of the Laplacian.

    $$\nabla^2 L = \mathrm{tr}(H) = L_{xx}(\mathbf{x}, \sigma) + L_{yy}(\mathbf{x}, \sigma) \tag{7}$$

    The Laplacian is the trace of the Hessian matrix (equation 7), and these values are already available when the determinant of the Hessian matrix is calculated, so it is only a matter of storing the sign. The reason to store the sign of the Laplacian is that it distinguishes bright blobs on dark backgrounds from dark blobs on bright backgrounds. It is only necessary to compare the full descriptor vectors if they have the same sign, which can lower the computational cost of matching.
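The sign test can be sketched as an early exit before the distance calculation. The representation of an interest point as a `(laplacian_sign, descriptor)` tuple, the function name `match_points`, the nearest-candidate selection and the default `tolerance` are assumptions for the illustration; the report itself only states that the distance must be sufficiently small.

```python
import math


def match_points(points_a, points_b, tolerance=0.4):
    """Return index pairs (i, j) of matches between two lists of interest points.

    Each point is a tuple (laplacian_sign, descriptor).
    """
    matches = []
    for i, (sign_a, desc_a) in enumerate(points_a):
        best_j, best_dist = None, tolerance
        for j, (sign_b, desc_b) in enumerate(points_b):
            if sign_a != sign_b:
                continue  # different blob polarity: skip the full comparison
            dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(desc_a, desc_b)))
            if dist < best_dist:
                best_j, best_dist = j, dist
        if best_j is not None:
            matches.append((i, best_j))
    return matches
```

With roughly equal numbers of bright and dark blobs, about half of the descriptor comparisons are skipped.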

    4 IMPLEMENTATION

    In the previous sections I have outlined the theoretical background for the SURF algorithm. In this section I will go into detail about the implementation I have made.

    I have used the openCV framework, a C++ framework for computer vision. It ships with its own implementation of SURF and SIFT and several other computer vision algorithms. It was chosen because it provides good low-level routines for working with images and easy loading and saving of different image formats.

    When implementing I have studied both an imageJ plugin (footnote 3) for SURF and openSURF (footnote 4) for comparison and inspiration. Evans, who initiated openSURF, has published a technical article detailing the individual steps in the algorithm [5].

    The focus when implementing has been on getting to a state where it was possible to start automated testing, such that the effects of additional refinements could be examined. It has not been a priority to optimize for speed or memory usage.

    4.1 Detection

    The example image used throughout the report is the barbecue setting (Figure 1), but during implementation and testing other images have been used to verify the results (footnote 5).

    3. www.labun.com/imagej-surf

    4. http://www.chrisevansdev.com/

    5. All images have been processed as 64-bit greyscale floating point. As the input has been normal 3-channel color images this is overkill, but working with double-precision arithmetic reduces precision errors.

    Fig. 10. The Hessian determinant ($\mathrm{Det}(H_{approx})$) response maps for the 2nd octave. Positive values are green and negative values blue.

    The first step is to create an integral image; this is done using a built-in openCV routine. Then the response maps for 4 levels in 4 octaves are created. The sizes of the filters are fixed, and when calculating the response maps the exact size of the lobes and their offsets are calculated from the filter size (see Figure 6).

    Box filters are used to find $D_{xx}$, $D_{yy}$ and $D_{xy}$, from which the approximated Hessian determinant and Laplacian are calculated. Before the three $D$ responses are used they are normalized, such that the responses do not depend on the filter size.

    The dimensions of the response maps are equal within an octave and are halved when going up an octave. This is implemented by increasing the distance between samples.
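The halving can be expressed as a sampling step that doubles per octave. This is only a sketch: the function name and the `initial_step` parameter are hypothetical, and the report does not state which step is used in the first octave.

```python
def response_map_shape(height, width, octave, initial_step=1):
    """Return (rows, cols, step) of the response map for a zero-based octave."""
    step = initial_step * 2 ** octave   # distance between sampled pixels
    return height // step, width // step, step


for octave in range(4):
    print(octave, response_map_shape(480, 640, octave))
```

A response at map position (r, c) then corresponds to image pixel (r * step, c * step), which matters when comparing positions across octaves.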

    When implementing this part it was critical to have some meaningful debugging output. Figures 10 and 11 show a montage of some of this debugging output (footnote 6). This type of output coupled with simple test images proved to be a valuable tool for inspecting the behavior of the detector.

    6. Images using the green/blue coloring have been preprocessed with histogram equalization, as this presents the data better for visual inspection.

    Fig. 11. The $D_{xx}$ response map for the largest filter in each octave. The images are to scale; it can be observed that not only are the images smaller, the filters are also larger, thus removing more noise.

    After the response maps have been built, the next step is to do a non-maximum suppression on the Hessian determinant values ($\mathrm{Det}(H_{approx})$). Neubeck et al. [6] provide practical considerations for an efficient algorithm, which weigh the theoretical bounds against real-life access times. Time did not permit optimizing this part. The neighborhood for non-maximum suppression is illustrated in Figure 2. To filter out noise, the responses are thresholded before they are considered.

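    Having identified a maximum, the next step is to interpolate the position in scale space using the method from Lowe [4]; Evans [5] provides good notes on the details of this step. In essence we have to fit a 3D quadratic expressing the Hessian response as a function $H(x, y, \sigma)$ by using a Taylor expansion (8), and then find the extremum by setting the derivative to zero and solving equation (9) to find $\hat{\mathbf{x}} = (x, y, \sigma)$.

    $$H(\mathbf{x}) = H + \left(\frac{\partial H}{\partial \mathbf{x}}\right)^T \mathbf{x} + \frac{1}{2} \mathbf{x}^T \frac{\partial^2 H}{\partial \mathbf{x}^2} \mathbf{x} \tag{8}$$

    $$\hat{\mathbf{x}} = -\left(\frac{\partial^2 H}{\partial \mathbf{x}^2}\right)^{-1} \frac{\partial H}{\partial \mathbf{x}} \tag{9}$$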

    The derivatives are calculated from finite differences in the response maps, and the inverse is found using an openCV function with an SVD decomposition. This step proved to be an exercise in rigor and indexing discipline. It was implemented such that it is possible to compile with or without interpolation. Figure 12 shows the scales at which interest points are found.

    Fig. 12. Interest points and the scale s at which they are detected.

    Unfortunately the results are consistently better without interpolation. The interpolation is the result of fitting a 3D quadratic in scale space, and the interpolation itself might give wrong results, that is, results that are unreasonably far from the initial point. When studying the source for openSURF and the imageJ plugin it was obvious that the exact criteria for a bad result were open to interpretation. The result of the interpolation is a set of relative offsets from the original point, and to weed out the bad results the offset is discarded if any of its components is > 0.5 (footnote 7).
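The interpolation step can be sketched with central finite differences on the 3×3×3 neighbourhood of a detected maximum. This is an illustration only: it solves the 3×3 system with Gaussian elimination rather than the SVD-based openCV call used in the report, the function names are hypothetical, and comparing the absolute value of each component against 0.5 in `is_reasonable` is my reading of the discard rule.

```python
def solve3(a, b):
    """Solve the 3x3 system a x = b by Gaussian elimination with pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return None  # singular: no usable quadratic fit
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for k in range(col, 4):
                m[r][k] -= f * m[col][k]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][k] * x[k] for k in range(r + 1, 3))) / m[r][r]
    return x


def interpolate_offset(v):
    """v[s][y][x] is the 3x3x3 response neighbourhood; returns (dx, dy, ds)."""
    c = v[1][1][1]
    # first derivatives (central differences)
    gx = (v[1][1][2] - v[1][1][0]) / 2.0
    gy = (v[1][2][1] - v[1][0][1]) / 2.0
    gs = (v[2][1][1] - v[0][1][1]) / 2.0
    # second derivatives
    hxx = v[1][1][2] - 2.0 * c + v[1][1][0]
    hyy = v[1][2][1] - 2.0 * c + v[1][0][1]
    hss = v[2][1][1] - 2.0 * c + v[0][1][1]
    hxy = (v[1][2][2] - v[1][2][0] - v[1][0][2] + v[1][0][0]) / 4.0
    hxs = (v[2][1][2] - v[2][1][0] - v[0][1][2] + v[0][1][0]) / 4.0
    hys = (v[2][2][1] - v[2][0][1] - v[0][2][1] + v[0][0][1]) / 4.0
    hessian = [[hxx, hxy, hxs], [hxy, hyy, hys], [hxs, hys, hss]]
    # equation (9): x_hat = -(d2H/dx2)^-1 dH/dx
    return solve3(hessian, [-gx, -gy, -gs])


def is_reasonable(offset, limit=0.5):
    """Assumed discard rule: every component within +/- limit."""
    return offset is not None and all(abs(c) <= limit for c in offset)
```

For a response that really is quadratic the finite differences are exact, so the recovered offset equals the true sub-sample position of the peak; for real response maps it is only an estimate.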

    4.2 Description

    In the previous section I described how to identify interest points; this section goes into detail about describing these points. The overall theoretical background is in Section 3.

    I have implemented the U-SURF variant, that is, I do not assign an orientation to the interest point. This was chosen because U-SURF should be fairly rotationally invariant, and orientation assignment could be implemented at a later stage to improve the performance.

    7. This is similar to the tactic used in openSURF; the imageJ plugin checks whether the point is still inside the scale-space pyramid.

    Fig. 13. The interest points detected in the barbecue setting.

    Fig. 14. An image representation of the descriptors, together with the areas described. The areas are weighted by a Gaussian as described in Section 3.

    To calculate the descriptors, each of the 16 subareas surrounding the interest point is sampled to get the wavelet responses dx and dy together with their absolute values, using the scheme illustrated in Figure 8. The wavelet response is calculated using the integral image with box filters of filter size 2s. For dx this is the value of a square to the right minus the value of a square to the left of the interest point (see Figure 9). The responses are then weighted by a Gaussian ($\sigma$ = 3.3s) centered at the interest point.
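A sketch of the Haar response at a single sample position, using box sums on an integral image. The function names are hypothetical, the filter size is assumed to be even, and taking dy as the lower half minus the upper half is an assumption about the sign convention in Figure 9. The Gaussian weighting is left out.

```python
def integral_image(img):
    """Padded summed-area table: ii[r][c] = sum of img[0:r][0:c]."""
    h, w = len(img), len(img[0])
    ii = [[0.0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        row_sum = 0.0
        for c in range(w):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def box_sum(ii, r, c, rows, cols):
    """Sum of img[r:r+rows][c:c+cols]; the box is clamped to the image."""
    h, w = len(ii) - 1, len(ii[0]) - 1
    r0, r1 = min(max(r, 0), h), min(max(r + rows, 0), h)
    c0, c1 = min(max(c, 0), w), min(max(c + cols, 0), w)
    return ii[r1][c1] - ii[r0][c1] - ii[r1][c0] + ii[r0][c0]


def haar_responses(ii, r, c, size):
    """Return (dx, dy) for a size x size Haar filter around (r, c)."""
    half = size // 2
    top, left = r - half, c - half
    # dx: right half minus left half
    dx = box_sum(ii, top, c, size, half) - box_sum(ii, top, left, size, half)
    # dy: lower half minus upper half (assumed sign convention)
    dy = box_sum(ii, r, left, half, size) - box_sum(ii, top, left, half, size)
    return dx, dy
```

On an image whose intensity increases from left to right, dx is positive and dy is zero, which is a quick sanity check for the indexing.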

    The vector consisting of the concatenated 64 descriptor values is normalized as a last step. The interest points can be saved to a basic ASCII list, where precision does matter. It was interesting to note that with the default precision, saving and loading a list and testing against the same image would not give a 100% match. Figure 14 shows the sampled areas together with the descriptors, according to the layout in Figure 15.

    Fig. 15. The layout of the image representation of a descriptor (Figure 14), with panels for dx, dy, |dx| and |dy|.

    Fig. 16. Matching with a 10° rotated version. The circles have a diameter of 10s.

    The program, together with instructions on compiling and running it, is available in Appendix A.

    4.3 Further work

    Focus and time have been put into getting good quality results. There are still things that could be optimized in this regard, just as there are optimizations that could be implemented to increase performance. At the moment the structure of the code is such that the response maps for each octave are built in isolation. As can be seen in Figure 6, several filter sizes are used across octaves, so reusing the results should be possible. Furthermore, the calculation of the response maps may be done in parallel, as suggested by Gossow et al. [7], and the non-maximum suppression could also be done in parallel. In general there is potential for a considerable performance gain by using GPU computations (footnote 8).

    5 RESULTS

    It was very satisfying when the program showed the first matching of interest points, as illustrated in Figure 1, which incidentally is the same test data used for the first match. From there the next step was to set up automated testing, making it possible to test and measure how changes and additions affected the quality.

    5.1 Testing

    When testing the algorithm there are several key factors to take into account. Gossow et al. [7] have published a report comparing several open source implementations of the SURF algorithm, in which they introduce three performance criteria: repeatability, precision and recall. Repeatability is how good the detector is at detecting the same features under different transformations of the images. As some transformations, for example a rotation, may move interest points out of the image, the measure must take this into account. Thus repeatability is defined as

    $$\mathrm{repeatability} = \frac{\#\mathrm{correspondences}}{\min(n_1, n_2)}$$

    where $n_1$ and $n_2$ are the numbers of interest points in the images to be compared, and a correspondence indicates that the points are close to the correct result (footnote 9).
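A small sketch of the repeatability measure. Note the substitution: Gossow et al. define correspondence through the overlap of interest regions, whereas this example uses a plain distance threshold on relative coordinates, which is an assumption made to keep the example short. The function name, the greedy one-to-one pairing and the default `max_distance` are likewise illustrative.

```python
def repeatability(points_a, points_b, max_distance=0.01):
    """Fraction of correspondences relative to the smaller point set.

    Points are (x, y) tuples in relative coordinates in [0, 1].
    """
    if not points_a or not points_b:
        return 0.0
    used = set()          # indices in points_b that are already paired
    correspondences = 0
    for ax, ay in points_a:
        for j, (bx, by) in enumerate(points_b):
            if j in used:
                continue
            if (ax - bx) ** 2 + (ay - by) ** 2 <= max_distance ** 2:
                used.add(j)
                correspondences += 1
                break
    return correspondences / min(len(points_a), len(points_b))
```

Pairing each point at most once keeps the value between 0 and 1 even when the two images contain very different numbers of interest points.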

    Precision measures how many correct matches are found, and recall is the relation between correct matches and correspondences. Gossow et al. use a fixed pool of images and have calculated the transformations between them, so they are able to calculate correspondence and verify whether a match is correct.

    As I have implemented the U-SURF variant, I have mainly focused the testing on invariance to scale. This also removed the need to calculate the position of interest points after the transformations, as the position is stored as relative coordinates, and correspondence is linear with respect to the distance between the match and the original point when there is no skewing involved.

    8. Debugging GPU algorithms is notoriously difficult, and as the focus has been on quality this was not considered a reasonable starting point for this study group. There exists an open source GPU implementation: http://www.d2.mpi-inf.mpg.de/surf.

    9. Correspondence is defined based on the overlapping area between interest regions [7].

    5.2 Setup

    The tests have been carried out on an Intel Core i5 processor with 8 GB RAM running 64-bit Ubuntu Linux (Natty Narwhal), using the GCC compiler version 4.5 with optimization level 2 and openCV version 2.2.

    I created a script that takes a single image as input and rotates and scales this image (footnote 10), and then tries to match interest points from the original image to the transformed versions (instructions in Appendix A). It is a synthetic case, where the actual implementation of the scaling could influence the results. It is on the other hand a reasonable test setup, and it is easy to rerun the test to investigate the effects of different parameters. In the future a setup where different images are tested automatically would be beneficial, as the image tested does influence how many interest points are found.

    The statistics gathered are the number of interest points found and how they match interest points from the original image. Matching is performed by simple thresholding of the distance between the descriptor vectors. To investigate this threshold, the minimal distance is found for each interest point and the values are put into bins of size 0.1 to create histograms; an interest point is placed into every bin whose threshold is above its distance.
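The binning described above is cumulative: a point counts in every bin whose threshold exceeds its minimal distance. A minimal sketch, where the function name and the `bins` parameter are assumptions for the example:

```python
def cumulative_histogram(distances, bin_size=0.1, bins=10):
    """counts[i] = number of distances below the threshold (i + 1) * bin_size."""
    counts = [0] * bins
    for d in distances:
        for i in range(bins):
            if d < (i + 1) * bin_size:
                counts[i] += 1
    return counts


print(cumulative_histogram([0.05, 0.15, 0.15, 0.42, 1.3], bins=5))
```

Each entry then reads directly as "how many interest points would match at this tolerance", which is the quantity needed when choosing the threshold.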

    Furthermore the interest points are verified: the distance between the relative coordinates is compared (footnote 11), and points that pass are considered correct matches, which can be used to determine the precision of the implementation. Since the verification only looks at relative coordinates it is not meaningful for rotations.

    The SURF paper [2] states that U-SURF is robust to ±15°. Looking at Figure 17, this would suggest that a threshold of 0.4 or 0.5 would be reasonable; Figure 16 shows a matching with a tolerance of 0.4. If the threshold is higher, false positives start occurring; here there are no visually noticeable false positives. This comparison is made using the Euclidean distance between the descriptor vectors; it could be interesting to examine the effect of using the Mahalanobis distance (footnote 12).

    Fig. 17. Distance between descriptor matches for different thresholds.

    10. Using ImageMagick's convert utility.

    11. I use a basic threshold of .01; ideally it should have taken the scale s and the image dimensions into account.

    When scaling it is possible to measure the repeatability; this is illustrated in Figure 18. There is a spike at 100% scaling, where the recognition of course is 100%. At other scales it hovers around 70%, which is a bit worse than the measurements in the SURF article [2] and Gossow et al. [7]. As mentioned in Section 4.1 it is possible to compile without interpolation. The results without interpolation are shown in Figure 19, and it is frustrating to observe that complex calculations to interpolate in scale space apparently give worse results.

    The reason for this discrepancy could be as simple as an implementation bug; it could also be a side effect of the scaling. Without interpolation the set of possible positions in scale space is fixed, and it is entirely possible that the better results are simply the result of the positions snapping to a grid. This suspicion is further fueled when inspecting the data with different thresholds in Figures 20 and 21. The spikes could very well be a snap-to-grid side effect. It is most likely a combination of several details.

    It may also be a fault in the descriptor's robustness to translations. There is reason to believe that adjusting the parameters for the calculation of the descriptors will influence the results: the spikes in the plots were more pronounced before the parameters for calculating the descriptors were changed (footnote 13). There are several options that could be explored in more detail, and Gossow et al. do mention that algorithms using another interpolation scheme for the descriptors than the original consistently showed better results (footnote 14).

    Besides exploring different techniques, it could be interesting to explore the parameters for the current implementation in more depth. There are many different parameters that can influence the type and number of interest points. For instance the thresholding before non-maximum suppression, the number of octaves and levels, and the initial size of the response maps could be investigated further.

    Fig. 18. Repeatability for scaling.

    Fig. 19. Repeatability for scaling without interpolation.

    Fig. 20. The scaling data.

    Fig. 21. The scaling data without interpolation.

    12. This is suggested by Bay et al. [2].

    13. Correctly centering the Gaussian on the interest point and not the subarea did improve the results; however, both increasing and decreasing $\sigma$ gave worse results.

    14. openSURF is using another scheme; I have not been able to find the paper they reference: Agrawal, ECCV 08.

    6 CONCLUSION

    Implementing the SURF algorithm has proven to be a challenge. The results are not optimal, but as the preceding sections show, there are vast possibilities for tweaking and tuning, and there are many places where a subtle bug might hide. Gossow et al. [7] show that the implementation details do matter, and that there is more than one way to implement the algorithm.

    It has been interesting and time consuming to implement the algorithm from the ground up. If I were to employ the SURF algorithm on a real-world problem in the future, this experience will be valuable when adapting an open source implementation to my needs. Having more eyes on the code can help optimize details and ensure correct implementations. This would free resources to investigate different variations of parameters and strategies.

    In conclusion, I consider this implementation successful. It is able to detect and describe with consistent results and demonstrates the core principles of the SURF algorithm's detection and description scheme.

    Jacob Toft Pedersen, Computer Science, Aarhus.

    REFERENCES

    [1] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, 2004.

    [2] H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool, "Speeded-Up Robust Features (SURF)," Elsevier preprint, 2008.

    [3] Wikipedia, "Blob detection," Wikipedia, the free encyclopedia, 2011, [Online; accessed 14-July-2010]. Available: https://secure.wikimedia.org/wikipedia/en/wiki/Blob_detection

    [4] M. Brown and D. Lowe, "Invariant features from interest point groups," BMVC, 2002.

    [5] C. Evans, "Notes on the OpenSURF library," University of Bristol, Tech. Rep. CSTR-09-001, January 2009. Available: http://www.chrisevansdev.com

    [6] A. Neubeck and L. Van Gool, "Efficient non-maximum suppression," ICPR, 2006.

    [7] D. Gossow, P. Decker, and D. Paulus, "An evaluation of open source SURF implementations," Active Vision Group, University of Koblenz-Landau, Tech. Rep., 2009?

    [8] Wikipedia, "Feature detection," Wikipedia, the free encyclopedia, 2011, [Online; accessed 14-July-2010]. Available: https://secure.wikimedia.org/wikipedia/en/wiki/Feature_detection_%28computer_vision%29

    CONTENTS

    1 Introduction
    2 Detection
    3 Description
    4 Implementation
    5 Results
    6 Conclusion
    Biographies
    References
    Appendix A: Program
    Appendix B: Resources

    APPENDIX A: PROGRAM

    The source code is available at http://cs.au.dk/~jtp/SURF. There is a Makefile, and it should compile on any Linux platform with the openCV libraries installed.

    The basic usage is:

    bin$ ./main needles.png haystack.jpg

    This will search needles.png for interest points and try to match them to interest points found in haystack.jpg. It will then display the result as in Figure 16.

    The main program accepts several options; the most important are:

    -o --output filename will save the output instead of displaying it.

    -c --create filename will save the description of needles.png and not try to match.

    -l --load filename will load a description of interest points and try to match.

    -t --tolerance dist will only consider it a match if the distance between descriptor vectors is less than dist.

    -v --verbose will be verbose; several v's, as in -vvvvv, will increase the verbosity level.

    There are further options to produce debugging output, statistical output or run tests. These options are documented in main.cpp.

    To run automated tests there is a bash script, createcase.sh, in the data directory. It can be run as

    data$ ./createcase.sh bbq.jpg suffix

    This will create plots in the directory data/bbq/plots. This was used to generate the plots used in Section 5.

    APPENDIX B: RESOURCES

    The main resources for the implementation have been the SURF paper [2] and Evans [5].

    To get a general feel for computer vision and the techniques used, Wikipedia has a good section on feature detection [8]. The SIFT article [1] and http://www.aishack.in/2010/05/sift-step-1-constructing-a-scale-space/ made it possible to make a rudimentary SIFT detector. It is no longer part of the codebase, as the initial plan to have both algorithms was not feasible.

    In general http://www.aishack.in/ has some interesting articles on computer vision and working with openCV, with descriptions of various concepts: Hough transform, features, corner detection, scale space et cetera.

    I was led on an interesting but nonetheless wild goose chase with optimizing non-maximum suppression. It was an interesting article, but it was not an essential part of the algorithm.

    The documentation supplied with openCV is good and thorough. It was unfortunate that I did not realize that their addressing convention was col, row and not row, col; this was not noticeable before it caused the x and y coordinates to be switched. This particular misunderstanding, and rigorously going through the code to fix it, was particularly annoying.

    LIST OF FIGURES

    1 Detecting common features between two images with different scale and cropping.
    2 Three octaves with three levels each. The neighborhood for the 3×3×3 non-maximum suppression used to detect features is highlighted.
    3 Four memory look-ups are sufficient to calculate the sum of a rectangular area with an integral image.
    4 $L_{yy}(\mathbf{x}, \sigma)$ and $L_{xy}(\mathbf{x}, \sigma)$: discretized Gaussians and the approximations $D_{yy}$ and $D_{xy}$.
    5 Where SIFT (left) downscales the image, SURF (right) uses larger and larger filters.
    6 Increasing the size of kernels while keeping the lobes correctly scaled.
    7 The scale ($\sigma$), the filter sizes and the octaves on a logarithmic scale.
    8 A 20s area is divided into 4×4 subareas that are sampled 5×5 times to get the wavelet response (Figure 9).
    9 The wavelet responses. Black and white areas correspond to weights of -1 and 1 for the Haar kernels. They are used with a filter size of 2s.
    10 The Hessian determinant ($\mathrm{Det}(H_{approx})$) response maps for the 2nd octave. Positive values are green and negative values blue.
    11 The $D_{xx}$ response map for the largest filter in each octave. The images are to scale; not only are the images smaller, the filters are also larger, thus removing more noise.
    12 Interest points and the scale s at which they are detected.
    13 The interest points detected in the barbecue setting.
    14 An image representation of the descriptors, together with the areas described. The areas are weighted by a Gaussian as described in Section 3.
    15 The layout of the image representation of a descriptor (Figure 14).
    16 Matching with a 10° rotated version. The circles have a diameter of 10s.
    17 Distance between descriptor matches for different thresholds.
    18 Repeatability for scaling.
    19 Repeatability for scaling without interpolation.
    20 The scaling data.
    21 The scaling data without interpolation.
