
Institutionen för systemteknik
Department of Electrical Engineering

Master's Thesis

Image Completion Using Local Images

Thesis carried out in Information Coding at the Institute of Technology, Linköping University

by

Mikael Dalkvist

LiTH-ISY-EX--11/4506--SE

Linköping 2011

Department of Electrical Engineering
Linköpings universitet
SE-581 83 Linköping, Sweden

Image Completion Using Local Images

Thesis carried out in Information Coding at the Institute of Technology in Linköping

by

Mikael Dalkvist

LiTH-ISY-EX--11/4506--SE

Supervisor: Jens Ogniewski, ISY, Linköpings universitet

Examiner: Ingemar Ragnemalm, ISY, Linköpings universitet

Linköping, 20 September, 2011

Avdelning, Institution / Division, Department:
Division of Information Coding, Department of Electrical Engineering, Linköpings universitet, SE-581 83 Linköping, Sweden

Datum / Date: 2011-09-20

Språk / Language: Engelska/English

Rapporttyp / Report category: Examensarbete

URL för elektronisk version: http://www.icg.isy.liu.se, http://www.ep.liu.se

ISBN: —

ISRN: LiTH-ISY-EX--11/4506--SE

Serietitel och serienummer / Title of series, numbering: —

ISSN: —

Titel / Title: Image Completion Using Local Images

Författare / Author: Mikael Dalkvist


Nyckelord / Keywords: hole filling, image completion, scene completion, image compositing, image database, inpainting, GPU, CUDA

Abstract

Image completion is a process of removing an area from a photograph and replacing it with suitable data. Earlier methods either search for this relevant data within the image itself, or extend the search to some form of additional data, usually some form of database.

Methods that search for suitable data within the image itself have problems when no suitable data can be found in the image. Methods that extend their search have in earlier work used either some form of database with labeled images or a massive database with photos from the Internet. For the labels in a database to be useful they typically need to be entered manually, which is a very time consuming process. Methods that use databases with millions of images from the Internet have issues with copyrighted images, storage of the photographs and computation time.

This work shows that a small database of the user's own private, or professional, photos can be used to improve the quality of image completions. A photographer today typically takes many similar photographs of similar scenes during a photo session. Therefore a smaller number of images is needed to find images that are visually and structurally similar than when random images downloaded from the Internet are used.

Thus, this approach gains most of the advantages of using additional data for the image completions, while at the same time minimizing the disadvantages. It gains a better ability to find suitable data without having to process millions of irrelevant photos.


Acknowledgments

I would like to thank Ingemar Ragnemalm and Jens Ogniewski at Linköping University for their valuable support and inspiration during the work on this thesis.


Contents

1 Introduction
  1.1 Background
  1.2 Motivation
  1.3 Aim

2 Background and Related Work
  2.1 Local Algorithms
    2.1.1 Texture Synthesis
    2.1.2 Image Inpainting
  2.2 Global Algorithms
  2.3 Recent Progress
  2.4 Commercial and Other End User Software

3 Method
  3.1 Overview
  3.2 Semantic Scene Matching
  3.3 Local Context Matching
  3.4 Adjustments

4 Implementation
  4.1 Precomputation
  4.2 Finding Similar Images
    4.2.1 Semantic Scene Matching
    4.2.2 Local Context Matching
    4.2.3 Texture Matching
    4.2.4 Graph Cut Seam Finding
  4.3 Poisson Blending
  4.4 GPU Implementation

5 Results
  5.1 System
  5.2 Performance
  5.3 Results

6 Evaluation
  6.1 Quality
  6.2 Performance
  6.3 Commercial Potential

7 Conclusion
  7.1 Future work

Bibliography

Chapter 1

Introduction

1.1 Background

Image completion algorithms are a group of digital image manipulation algorithms that aim to fill or replace an area in a photograph with new data. The data should be added in a way that makes it impossible for human vision to detect that the image has been modified. This procedure can be used to remove unwanted objects from a scene or to restore damaged photos.

This is a problem that researchers have been trying to solve for years. Image completion algorithms can roughly be divided into two categories: algorithms that search for suitable data to complete the image within the image itself, and algorithms that search in other images for this data. Each has its own set of strengths and weaknesses, which are described in more detail later in this thesis.

1.2 Motivation

Most research on this subject has explored algorithms that search for suitable data within the source image itself. The same is true for commercial software. These algorithms are relatively fast and produce good results in areas with reasonably uniform textures. However, when the scenes get more complex and the algorithm cannot find suitable data, the results tend to become unusable.

Another approach, by Hays et al. [13], that has gained some attention in the research community is to make use of millions of images downloaded from the Internet when searching for suitable data. This approach has proven to be more robust, even for more complex scenes. It does however require the user to download and store a massive amount of images. Another problem is the computation time needed. Maybe most important, at least for a commercial implementation, are the copyright issues: the user, or developer, would need to arrange licensing deals for all of the millions of photographs, a time consuming and costly process.

Most work up until now has been done in one of these two categories of algorithms. Either the search for suitable data is done only in the image itself, at a high risk of not finding enough data, or it is extended to a large database filled with data irrelevant to the current image completion.

1.3 Aim

The primary aim of this thesis is to explore the gap between the two categories described in section 1.2 and perform the search within a small amount of relevant data.

A more concrete goal is to find an algorithm that makes use of the personal photos already stored on the computer, and to implement a program that uses this algorithm. This algorithm would be somewhere in between the algorithm presented by Hays et al., which downloads millions of images from the Internet, and algorithms like the one presented by Criminisi et al. [6] that only use the source image. This algorithm should use thousands of images rather than millions. This could potentially produce more robust results than algorithms that only use the source image, while avoiding many of the problems associated with using millions of photographs from the Internet.

Another goal is to make the program run as fast as possible. The computation time and processing power required by Hays et al. make their algorithm unsuitable for day-to-day use. A fast implementation of this algorithm would thus make it more competitive against relatively fast algorithms like Criminisi et al.'s, especially for use in commercial software.

Chapter 2

Background and Related Work

This chapter will explain some of the concepts used in this thesis, and provide an overview of related work in this field prior to this thesis.

Image completion algorithms aim to fill or replace an area in a photograph with new data. The data should be added in a way that makes it impossible for human vision to detect that the image has been modified. Early work focused on extending textures [11] and removing text and thin scratches from images [4]. Researchers have since explored the possibilities of removing larger objects [6] and the use of databases to enable more semantically valid image compositions [7, 16, 13].

Previous work in this field can be divided into two groups of algorithms: local and global algorithms. Local algorithms search the input image for suitable image data to fill the missing region with. Global algorithms extend their search for suitable data beyond the source image. The search can be extended to find similar images, or to look in a database of semantically labeled images, for example.

2.1 Local Algorithms

2.1.1 Texture Synthesis

Texture synthesis is a process of automatically generating an arbitrarily sized image (texture) from a sample image, such that the new texture is structurally similar to the sample image. The new texture should not have visible artifacts such as seams or visible edges, nor have repeating structures.

A number of different techniques have been proposed for texture synthesis over the years; a short summary of some of them follows. Figure 2.1 shows a comparison of four texture synthesis methods [26, 33, 30, 10].

Figure 2.1. Various image synthesis methods. Images in column a: input image, b: Portilla and Simoncelli [26], c: Xu et al. [33], d: Wei and Levoy [30], e: Efros et al. [10]. Comparison from Efros et al. [10].

Tiling methods simply put several copies of the sample texture side by side. This is the simplest texture synthesis technique and rarely produces satisfying results, due to highly visible seams and repetitive structures.

Stochastic texture synthesis algorithms generate textures by assigning random color values to each pixel. The color values are controlled by basic parameters such as brightness, color and contrast. These algorithms can only generate stochastic (noise-like) textures.

Single purpose structured texture synthesis algorithms are highly specialized in generating a specific type of texture. E.g. an algorithm may generate excellent brick textures, but fail miserably if the sample image is a grass texture.

Chaos mosaic is an extension of tiling. First the output is filled by tiling the sample texture. Then parts of the sample are randomly copied and pasted into the output texture. Finally, a smoothing filter is applied to remove visible seams. This method produces non-repetitive textures, but the smoothing step results in a blurred image.

Pixel-based texture synthesis algorithms [24, 11, 30, 14] are among the most successful general purpose texture synthesis algorithms. The synthesis generates a texture one pixel at a time by finding color values in the sample texture with neighborhoods similar to the generated texture. They aim at preserving the local structures of the sample. The algorithms are typically accelerated using approximate nearest neighbor search to speed up the search for suitable pixel values. Early work by Efros et al. [11] used pixel-based texture synthesis to, among other things, extend the boundaries of photographs.

Patch-based texture synthesis works by stitching together small patches of the sample texture at various offsets. These methods tend to be faster and more effective than pixel-based methods. Efros et al. presented a patch-based texture synthesis method [10] that enables both texture synthesis and texture transfer. Kwatra et al. [17] use graph cuts to find better seams between the patches in their method, seen in figure 2.2.

Figure 2.2. Comparison between image quilting [10] (center) and image and video synthesis using graph cuts [17] (right). Sample texture on the left. Image from Kwatra et al. [17].

These methods are generally very effective at generating consistent textures. But hole filling in real world photographs is a different story, due to the complex nature of spatially interacting textures. It is primarily the boundary regions between image regions, where the different textures mutually influence each other, that cause problems.

2.1.2 Image Inpainting

Image inpainting is a process of reconstructing parts of an image that have been lost due to scratches, dust, speckles etc. This was traditionally done by image and art restoration artists, but a move towards digital processes has taken place in recent years.

Bertalmio et al. introduced the term digital inpainting at the SIGGRAPH conference in 2000 [4]. Their work was an attempt to digitally replicate some of the basic methods used by professional art restorers. After the user has selected which regions are to be removed, their algorithm fills in the regions using the data surrounding them. This allows removal of dust, scratches, speckles and even overlaid text such as subtitles or dates, figure 2.3.

The algorithm [4] by Bertalmio et al. that pioneered the field of digital image inpainting is based on partial differential equations (PDEs), and treats the red, green and blue color channels individually. Here follows a short summary of the algorithm.


Figure 2.3. Result from the original inpainting method by Bertalmio et al. [3]. Completed images on the right, original images on the left. Image from Bertalmio et al. [3].

• The user marks which area(s) of the image are to be removed.

• The algorithm fills the region in each color channel by propagating data along isophotes from the neighboring region outside of the marked area. This aims to preserve edges during the propagation.

• Isophote directions are calculated using gradient vectors at each position along the contours of the marked region(s).

• The color smoothness variation is calculated locally.

• The inpainted region is smoothed.

Bertalmio et al. later refined their work using fluid dynamics and the Navier-Stokes equations [3]. The refined algorithm also works for digital inpainting in video.

The work of Bertalmio et al. [4, 3] was later extended by Criminisi et al. [6] to enable removal of larger objects, seen in figure 2.4. Their algorithm aims to combine the advantages of texture synthesis and image inpainting.

This method uses an exemplar-based texture synthesis technique. The fill order of the missing region is controlled by prioritizing the pixels. The priority of a pixel is determined by a confidence value together with the image isophotes.

The algorithm is able to propagate both linear structures and two-dimensional textures into the missing region. They found that the fill order was of utmost importance for this. Many previous methods used some form of simple linear fill order.


Figure 2.4. Image from Criminisi et al. [6]. Completed image on the right, original on the left.

2.2 Global Algorithms

Suppose the user would like to replace an overexposed sky in a photograph. If the whole sky is damaged, local algorithms have no suitable data to propagate into the damaged region, resulting in very unsatisfying image completions, as seen in figure 5.1. To avoid these problems, researchers have explored the possibilities of extending the search for suitable image data beyond the source image. When extending the search, it is important to make sure that the completed image is still semantically valid. Copying semantically invalid objects or image regions into a scene, even though it may be a perfect match locally, is no better for the final image than what a local algorithm could achieve.

One possible approach to this problem is to use a database with labeled images [7, 16]. An image typically consists of several different regions (sky, a road and a field, for example), and all of these regions, in all the photos in the database, are manually labeled with suitable keywords. This allows the user to import a certain type of imagery into a specified region.

However, this approach has some problems. In image completion, usually a very specific type of object is needed in order for it to fit in the image. In an alleyway, for example, not just any type of road will fit. This is a problem that is hard to solve with broad and general keywords. On the other hand, using too specific keywords will make the system more complex to use. It can also be hard to describe what type of road is needed using only a few keywords.

Another problem with this approach is that labeling and maintaining a large database is time consuming and exhausting work.

Hays et al. saw these problems and decided to take a different approach in their work [13]. Instead of using a labeled database, they downloaded a massive amount of images from the Internet and applied sophisticated algorithms to find semantically similar scenes, figure 2.5. The main parts of these algorithms are based on earlier work by Oliva and Torralba [22, 28, 23]. This method is described in greater detail in chapter 3.

Figure 2.5. Hays et al. [13] use millions of photographs to find suitable scenes to complete the input image with. Image from Hays et al. [13].

2.3 Recent Progress

Image completion is a very active research field and a vast number of papers have been published in the last few months. Here follows a short summary of some of these.

Image Inpainting by Patch Propagation Using Patch Sparsity

In May 2010 Xu et al. published an exemplar-based image inpainting algorithm [34] that investigates the sparsity of natural image patches. This approach results in better separation between structure and texture, and ensures that the inpainted region is sharp and consistent with surrounding areas.

Inpainting by Joint Optimization of Linear Combinations of Exemplars

Wohlberg [32] proposed an alternative exemplar-based image inpainting approach in January 2011 that targets some of the problems associated with earlier approaches. These methods typically fill the missing region progressively inwards from the boundary. This renders the final result very dependent on the fill order. Even though significant progress has been made in this area over the years, some cases still cause artifacts due to the greedy nature of the approach. Wohlberg proposes an alternative method that simultaneously assigns an estimated value to the entire missing region.

An Image Completion Method Using Decomposition

In February 2011 Dizdaroglu [8] presented a hybrid method that combines inpainting and texture synthesis techniques. The proposed method completes images using geometry and texture components of the input data. The main contribution of the paper is the use of decomposition and montage stages together, which gives better results compared to existing methods.

Image Segmentation and Inpainting Using Hierarchical Level Set and Texture Mapping

Du et al. presented a novel approach to image inpainting in April 2011 [9]. This approach aims to estimate the structure of the missing region using the Mumford-Shah model [20] and a level set model. The approach can detect and preserve edges in the inpainting area, and exemplar-based texture synthesis is used to fill textured regions.

Image Inpainting With Salient Structure Completion and Texture Propagation

Published in July 2011, Li et al. [18] propose a new approach to image inpainting based on human visual characteristics. The proposed method uses salient structure completion and texture propagation to fill missing regions. Li et al. show that this can effectively remove objects from real and synthetic images and that the results compare favorably with existing methods.

2.4 Commercial and Other End User Software

Apart from being an extremely active research field, image completion has been making its way into commercial software over the last few years. Here follows a brief summary of some of these products.

Adobe Photoshop

Adobe Systems’ market leading photo editing software Photoshop features a num-ber of image completion tools. Most of them are traditional inpainting tools suchas the various healing brush-tools. Most relevant for this thesis is however thecontent aware fill function that allows the user to remove large objects in the im-age, introduced in the CS5 version in 2010. This fully automated function seemsto work similar to recent image completion publications, though Adobe Systemsdoes not release any technical details.

Corel PaintShop Photo Pro

Corel Corporation’s PaintShop Photo Pro has a similar feature set as Adobe Pho-toshop. Introduced with PaintShop Photo Pro X3 in 2010, the Smart Carverfunction does, similar to content aware fill in Adobe Photoshop, allow the user toremove large objets in the image.


AKVIS Retoucher

AKVIS Retoucher is a photo restoration software that enables removal of large objects via an extensive hands-on process.

GIMP, GNU Image Manipulation Program

GIMP is a freely distributed photo editing program and not commercial software. It does however support impressive image completion tools via user-created plug-ins.

Chapter 3

Method

This chapter will describe the algorithm developed for this project. It is mainly based on the work by Hays et al. [13], though with adjustments to make it work with a smaller database and to increase performance at run time.

3.1 Overview

As a first step it is desirable to find the images that are semantically most similar (see section 3.2) to the incomplete image that is to be completed, the source image. Further calculations are then only performed on the semantically most similar images instead of the whole database. This heavily reduces the computing time of the image completions. Hays et al. [13] use the top 200 matches, out of 2.3 million images, for further calculations and thus decrease computation time by orders of magnitude. This project uses a much smaller database, approximately 7,000 images, so it is likely that the algorithm will not find as many semantically valid photos. Therefore only the top 40 matches are used for further calculations. The subsequent steps will only use these semantically most similar images in their calculations.

The algorithm then continues by searching for the scale and alignment of these images that best correlates with the area around the missing region. It also compares the texture in this area using a simple texture descriptor, figure 3.3.

The final step is to composite the source image with the best matching images from the database. To make the blend as seamless as possible and to avoid cutting through objects, a form of graph cut seam finding [17] is applied. After this, the twenty images with the lowest combined cost are blended with the source image using Poisson image blending [25]. All twenty blended images are then presented to the user to choose from, since the alternative with the lowest completion cost is not always the one that best resembles what the user intended.


3.2 Semantic Scene Matching

There are some potential problems that may occur when the search for suitable image data to fill the missing region with is expanded beyond the source image. One is that the matched image may be a good match locally, with similar color and texture to the neighborhood of the missing region, but semantically invalid; e.g. a pond transferred into a city street.

Another problem is that doing all of the matching computations on thousands, or even millions (as done by Hays [13]), of images to find suitable image data is not computationally feasible. Thus, some method of finding the semantically and structurally most similar images to the source image is needed. More extensive matching computations can then be done on the best matches.

One way of finding semantically similar images is to let the user manually label the images with keywords. However, when the database consists of millions of images, this becomes an unreasonable task. It is also not desirable for automated end-user tools, especially for non-professionals, where manual labeling would not be appreciated and might be performed inappropriately. Instead, a gist scene descriptor [22, 28, 23] is used to describe the semantics of the images.

This descriptor consists of the impulse responses from several Gabor filters [12], with different angles and scales, applied to the image. These impulse responses are then aggregated to an x by y spatial resolution. This results in an x * y * M * N dimensional descriptor vector, where M is the number of scales used and N the number of different orientations used per scale. Hays et al. [13] found that a gist scene descriptor with edge responses at 6 orientations and 5 scales, aggregated to a 4 * 4 spatial resolution, worked best for this purpose (4 * 4 * 5 * 6 = 480 dimensions). That setup is used in this thesis as well.

Torralba et al. [28] suggest the use of Principal Component Analysis, PCA, to reduce the descriptor even further and only use the most relevant dimensions of the vector. However, Hays et al. [13] argue that due to the arbitrary missing dimensions in the source image, it will not work in this usage case. Thus it is not used in this thesis either.

As a complement to the gist scene descriptor, a downsampled 4 by 4 version of the image in CIELAB color space is used to ensure that the matched images have a color similar to the source image, figure 3.1. CIELAB is a color space designed to approximate human vision and can represent all colors visible to the human eye.
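As an illustration, such a color descriptor can be computed in a few lines of Python. This is a minimal sketch assuming scikit-image is available for the color conversion; the thesis computes the same thing in CUDA (see chapter 4):

import numpy as np
from skimage.color import rgb2lab  # any RGB-to-CIELAB conversion would do

def color_descriptor(rgb):
    # Downsample the image to 4 by 4 spatial bins in CIELAB space.
    lab = rgb2lab(rgb)
    h, w, _ = lab.shape
    bins = lab[:h - h % 4, :w - w % 4].reshape(4, h // 4, 4, w // 4, 3)
    return bins.mean(axis=(1, 3))  # shape (4, 4, 3): one Lab triple per bin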

Since the source image has missing regions, the descriptors need to be weighted by a mask holding the ratio of valid pixels in each spatial bin. An example of such a mask is visualized in figure 3.1. If this mask is applied during the search instead of when the descriptors are calculated, all of the descriptors can be precomputed. This limits the computation required at runtime and thus increases the performance by orders of magnitude.

Figure 3.1. Left: 4 by 4 color image. Right: weighted mask that represents the ratio of valid pixels in each spatial bin. White: 1, black: 0, gray: values in between.

The SSD error (Sum of Squared Differences) between the gist descriptor of the source image and the gist descriptors of all of the images in the database is calculated in order to find the most similar images. The same thing is done with the color descriptors. The differences are then weighted so that the gist descriptor contributes approximately twice as much as the color descriptor to the final score. The SSD error in this context is defined as:

$$\mathrm{ssd} = \sum_{i=0}^{n} \big(f_s(i) - f_m(i)\big)^2 \qquad (3.1)$$

where $f_s$ and $f_m$ are the descriptors being compared for the source image and the matched image (from the database) respectively, and $n$ is the number of elements in the descriptors.

The final error for each image is defined as:

$$e_{\mathrm{scene}} = \mathrm{SSD}_{\mathrm{gist}} + x \cdot \mathrm{SSD}_{\mathrm{color}} \qquad (3.2)$$

where $x$ is a factor that makes the gist scene descriptor contribute roughly twice as much as the color descriptor to the final error. The 40 images with the lowest error are then used for further calculations, hereby referred to as the matched images.

Not only does this method find the semantically most similar images and thus the best potential matches, it also increases the performance of the algorithm as a result of limiting most of the computations to only a fraction of the dataset. With this operation Hays et al. [13] constrained further calculations to a fraction less than 1/10,000 of the dataset. Since this thesis already uses a small database, the difference is not as large, 1/175.
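To make the weighted comparison concrete, here is a minimal Python sketch of equations 3.1 and 3.2. The descriptor layout (one weight per spatial bin, broadcast over the feature channels) and the value x = 0.5 are illustrative assumptions; in the thesis x is tuned so that the gist term contributes roughly twice as much as the color term:

import numpy as np

def weighted_ssd(f_s, f_m, w):
    # Equation 3.1, with each spatial bin weighted by its ratio of valid pixels.
    return float(np.sum(w * (f_s - f_m) ** 2))

def scene_error(gist_s, gist_m, lab_s, lab_m, bin_weights, x=0.5):
    # Equation 3.2. bin_weights has shape (4, 4, 1) so it broadcasts over the
    # channel axis of both descriptors; x = 0.5 is an illustrative value only.
    return (weighted_ssd(gist_s, gist_m, bin_weights)
            + x * weighted_ssd(lab_s, lab_m, bin_weights))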

3.3 Local Context Matching

On the 40 images selected by the previous step, local context matching is applied. The goal of the local context matching is to find the scale and alignment of the matched images that best fit the source image. It is most critical that the matched image is a good fit to the source image in the area close to the missing region. Therefore a local context c, consisting of all pixels in the source image within a 20 pixel radius from the boundary of the missing region, is defined, see figure 3.2. A comparison between c and the matched images, at all valid positions and three scales (1, .9, .81), is done with a pixel-wise SSD error in CIELAB color space. A position is considered valid if the matched image can fill the whole missing region.

Figure 3.2. A: missing region. B: local context c.

Since the images are assumed to be roughly aligned with the source image, a small penalty based on the magnitude of the offset is added to the error at each position. This encourages the algorithm to pick matches with a small translational offset. The alignment with the lowest error for each image is from this point on used in all calculations. The error values are saved as $e_{\mathrm{context}}$.
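The search itself can be sketched in Python as below. This is an exhaustive, translation-only version; the search radius max_shift and the penalty constant k are illustrative assumptions, and the thesis additionally repeats the search at the scales 0.9 and 0.81 and implements it as a correlation in CUDA (section 4.2.2):

import numpy as np
from scipy.ndimage import shift as nd_shift

def best_placement(source_lab, match_lab, context, hole, max_shift=32, k=0.001):
    # Pixel-wise SSD in CIELAB over the context mask, with a multiplicative
    # chessboard-distance penalty on the translation (cf. section 4.2.2).
    best = (np.inf, 0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            moved = nd_shift(match_lab.astype(float), (dy, dx, 0),
                             order=0, cval=np.nan)
            if np.isnan(moved[hole]).any():
                continue  # invalid: the match does not cover the missing region
            err = np.nansum(((source_lab - moved) ** 2)[context])
            err *= 1.0 + k * max(abs(dy), abs(dx))
            if err < best[0]:
                best = (err, dy, dx)
    return best  # (e_context, dy, dx)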

A texture descriptor, a 5 by 5 median filter of the image gradient magnitude at each pixel, is also computed for all of the matched images, see figure 3.3. These are then compared to the texture descriptor of the source image with a pixel-wise SSD error, referred to as $e_{\mathrm{texture}}$.
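This descriptor is only a few lines in Python (a sketch assuming scipy; the thesis computes the gradient with the Sobel operator, see section 4.2.3):

import numpy as np
from scipy.ndimage import sobel, median_filter

def texture_descriptor(gray):
    # Gradient magnitude via Sobel operators, then a 5 by 5 median filter.
    gx = sobel(gray.astype(float), axis=1)
    gy = sobel(gray.astype(float), axis=0)
    return median_filter(np.hypot(gx, gy), size=5)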

A form of graph cut seam finding [17] is then applied to find a seam between the source image and the matched images that minimizes visible artifacts from the Poisson image blending [25] used in the final compositing, see figure 3.4.

It can however be argued that it is better to make use of all the valid pixels in the source image in image completion algorithms [6]. But this could lead to obvious artifacts in the following Poisson image blending if the seam passes through high frequency edges.

Figure 3.3. Texture descriptor. 5 by 5 median filter applied to the image gradient magnitude of a photo.

Figure 3.4. Left: before graph cut seam finding. Right: after graph cut seam finding.

In standard graph cut seam finding the goal is to minimize the intensity differences between the images. Hays et al. [13] argue that it is better to minimize the gradient of the image difference along the seam when the graph cut is followed by Poisson image blending. This results in a seam between the images that tends to avoid cutting through objects as much as possible, and thus minimizes obvious blending artifacts.

The seam is found by minimizing the following cost function:

$$C(L) = \sum_{p} C_d\big(p, L(p)\big) + \sum_{p,q} C_i\big(p, q, L(p), L(q)\big) \qquad (3.3)$$

where $C_d(p, L(p))$ is the cost of assigning pixel $p$ to a specific label $L(p)$ and $C_i(p, q, L(p), L(q))$ is the higher-order cost of assigning pixels $p$ and $q$ to labels $L(p)$ and $L(q)$ respectively. The label of each pixel has two possible values:

$$L(p) = \begin{cases} \text{patch}, & \text{if } p \text{ is taken from the scene match} \\ \text{exists}, & \text{if } p \text{ is taken from the source image} \end{cases} \qquad (3.4)$$

Figure 3.5. Left: original image, with the missing region inside the red border. Center: matched image aligned to fit the area around the red border. Black pixels are outside of the matched image. Right: composited image. $L(p) = \text{exists}$ outside of the red border, $L(p) = \text{patch}$ inside of the border.

The cost $C_d(p, L(p))$ is defined as

$$C_d(p, \text{exists}) = \begin{cases} \infty, & \text{if } p \subseteq \text{missing region} \\ 0, & \text{otherwise} \end{cases} \qquad (3.5)$$

$$C_d(p, \text{patch}) = \begin{cases} \infty, & \text{if } p \not\subseteq \text{matched scene} \\ \big(k \cdot \mathrm{Dist}(p, \text{hole})\big)^3, & \text{otherwise} \end{cases} \qquad (3.6)$$

where $\mathrm{Dist}(p, \text{hole})$ is the distance from $p$ to the missing region. This is used as a small penalty to discourage the algorithm from assigning pixels that are distant from the missing region to exists. $k$ is a constant that adjusts this penalty. Hays et al. [13] found that $k = .002$ works best; Wilczkowiak et al. [31] use a similar penalty. But since this thesis uses images with a lower resolution than Hays et al. [13], $k = .075$ has empirically been found to be a better choice. The different areas of the images are visualized in figure 3.5.

$C_i(p, q, L(p), L(q))$ is defined as

$$C_i(p, q, L(p), L(q)) = \begin{cases} 0, & \text{if } L(p) = L(q) \\ \nabla\mathrm{diff}(p, q), & \text{if } p \text{ is along the seam} \\ 0, & \text{otherwise} \end{cases} \qquad (3.7)$$

where $\nabla\mathrm{diff}(p, q)$ is the gradient magnitude of the SSD error between the source image and the scene match at pixels $p$ and $q$. This cost function can efficiently be minimized globally using a graph cut [5].
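As an illustration, the cost terms of equations 3.5-3.7 can be precomputed as images before they are handed to a graph cut solver. The sketch below is Python, not the CudaCuts-based code of section 4.2.4, and uses a large finite value as a stand-in for infinity:

import numpy as np
from scipy.ndimage import distance_transform_edt

INF = 1e9  # stand-in for the infinite costs of equations 3.5 and 3.6

def data_costs(hole, match_coverage, k=0.075):
    # hole, match_coverage: boolean masks; k = .075 as in section 3.3.
    dist = distance_transform_edt(~hole)          # Dist(p, hole) per pixel
    cost_exists = np.where(hole, INF, 0.0)        # equation 3.5
    cost_patch = np.where(match_coverage,
                          (k * dist) ** 3, INF)   # equation 3.6
    return cost_exists, cost_patch

def seam_cost(source, match):
    # Gradient magnitude of the squared-difference image (equation 3.7).
    diff = np.sum((source.astype(float) - match.astype(float)) ** 2, axis=-1)
    gy, gx = np.gradient(diff)
    return np.hypot(gx, gy)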


A final score for the matching of a scene is constructed as the sum of the scene matching error $e_{\mathrm{scene}}$, the local context (alignment) error $e_{\mathrm{context}}$, the local texture distance $e_{\mathrm{texture}}$, and the graph cut cost. The different components are weighted to contribute roughly the same to the final error. The 20 scenes with the lowest overall error are then composited using standard Poisson image blending [25] and presented to the user to choose from.

3.4 Adjustments

Here follows a summary of the adjustments and improvements on Hays' work made in this project.

• Smaller database. This work uses a smaller database with the user's own private photos.

• Due to the smaller database, the top 40 instead of top 200 scene matches are used for further calculations.

• Precomputations. Gist scene descriptors and color descriptors are precomputed to improve runtime performance.

• The implementation, see chapter 4, uses a smaller image size, 256 by 256 pixels instead of at most 1024 pixels in any direction, due to limitations on the developer machine. Therefore the context radius was changed from 80 pixels to 20 pixels, and k in the graph cut calculations was changed from .002 to .075.

Chapter 4

Implementation

In this thesis, a computer program that performs image completion using a database of local images has been created. The algorithm was first implemented in Mathworks' MATLAB. This implementation had some performance issues, and a decision was made to create a new implementation using CUDA [21] and CUFFT from Nvidia to accelerate computations. This implementation uses GLUT to display the results and handle some of the user input. The Corona library is used to read JPEG compressed image files.

Both implementations are limited to images with a resolution of 256 by 256 pixels, due to limitations in storage and performance on the developer machine.

The following sections will describe the CUDA implementation in greater detail.

4.1 Precomputation

In order to minimize the computations needed at runtime, some characteristics of each image in the database are precomputed.

The gist scene descriptor [22, 28, 23] is a relatively compact descriptor that can be used to find semantically and structurally similar scenes. It uses a set of edge responses. These are then aggregated into a number of very coarse spatial bins.

The edge responses can be calculated [15] using the impulse responses of Gabor filters [12] with different orientations and scales. The impulse response is defined as a Gaussian function, $f_g$, multiplied by a harmonic function, $f_h$. The Fourier transform of the impulse response can therefore be calculated efficiently as the Fourier transform of $f_h$, $F_h$, convolved with the Fourier transform of $f_g$, $F_g$:

$$I(g) = F_h * F_g \qquad (4.1)$$

These impulse responses are then aggregated to an x by y spatial resolution. This results in an x * y * M * N dimensional descriptor vector, where M is the number of scales used and N the number of different orientations used per scale. Hays et al. [13] found that a gist scene descriptor with edge responses at 6 orientations and 5 scales, aggregated to a 4 * 4 spatial resolution, worked best for this purpose. That setup is used in this thesis as well. This descriptor is calculated for all images in the database and saved to disc.

Aude Oliva and Antonio Torralba have published their MATLAB code for calculating the gist descriptor of an image. This code has been rewritten in CUDA C and adapted to allow parallel execution.

In addition to the gist descriptor, the color values of the images, downsampled to the same 4 * 4 resolution, are also saved. These color descriptors are calculated in CIELAB color space.

All calculations are also done on horizontally flipped images in order to virtually extend the dataset and increase the probability of finding suitable matches. A possible problem with this is that mirrored text or characteristic objects etc. may break the illusion and reveal that the image has been manipulated.

4.2 Finding Similar Images

The search for suitable data to fill the missing region is divided into three steps: find the semantically and structurally most similar images in the database, align them to find the best local match, and compare the texture with the source image.

4.2.1 Semantic Scene Matching

This implementation is quite straightforward. First, gist and color descriptors are calculated for the source image in the same way as in section 4.1. These are then compared to the precomputed descriptors using an SSD error measurement.

Since the user wants to remove a part of the image, there is no need to compare that region. This is solved by weighting each spatial bin by the ratio of valid pixels in that bin.

The errors are combined and weighted so that the gist descriptor contributes about twice as much as the color descriptor to the resulting error value.

When all precomputed descriptors have been compared to the source image, the differences are sorted using a simple quicksort function. Finally, the 40 images with the lowest cost and their corresponding IDs are returned and used in further calculations.

Pseudo code for this is given in algorithm 1.

Algorithm 1 Semantic Scene Matching
  σ ← number of images in the database
  Calculate the gist descriptor for the source image.
  Convert the source image to CIELAB color space and downsample to 4 by 4 resolution.
  for all entries i ∈ database do
    database → gist_db
    database → Lab_db
    diff[i] ← ssd(gist_source, gist_db) + ssd(Lab_source, Lab_db)
    // Flipped image
    database → gist_db
    database → Lab_db
    diff[i + σ] ← ssd(gist_source, gist_db) + ssd(Lab_source, Lab_db)
  end for
  quicksort(diff)
  topmatches ← diff[0 . . . 39]
  return topmatches

4.2.2 Local Context Matching

The local context matching algorithm calculates the correlation between each of the top 40 images from section 4.2.1 and a 20 pixel context around the missing region in the source image. This is done to find the placement of the matched images that best correlates with the source image.

The correlation function calculates an array of error measurements for each valid placement. A placement is considered valid only if the context fits entirely in the matched scene. For each placement, a pixel-wise SSD error between the context of the source image and the corresponding pixels in the matched scene is calculated.

The images are assumed to already be roughly aligned with the source image, so for each placement a small penalty, based on the chessboard distance of the translation, is multiplied into the correlation error. The chessboard distance is demonstrated in table 4.1.

This procedure is repeated with the matched scenes scaled to 90% and 81% of their original size.

Finally, the scale and placement that turned out to be the best match for each image are returned, along with the corresponding error values.

Pseudo code is given in algorithm 2.

Table 4.1. Chessboard distance from element X.

2 2 2 2 2
2 1 1 1 2
2 1 X 1 2
2 1 1 1 2
2 2 2 2 2


Algorithm 2 Local Context Matching(sceneMatches[ ])
  distance ← calculateChessboardDistance()
  sourceContext ← getContext(sourceImage)
  for all 40 most semantically similar images do
    image ← readImage(sceneMatches[i].id)
    image.toCIELAB
    correlationScore[ ] ← calculateCorrelation(image, sourceContext, distance)
    x, y, score ← findMin(correlationScore[ ])
    localResults[i] ← x, y, score

    image_scaled ← scale(image, 0.9)
    correlationScore[ ] ← calculateCorrelation(image_scaled, sourceContext, distance)
    x, y, score ← findMin(correlationScore[ ])
    if score < localResults[i].score then
      localResults[i] ← x, y, score
    end if

    image_scaled ← scale(image, 0.81)
    correlationScore[ ] ← calculateCorrelation(image_scaled, sourceContext, distance)
    x, y, score ← findMin(correlationScore[ ])
    if score < localResults[i].score then
      localResults[i] ← x, y, score
    end if
  end for
  return localResults


4.2.3 Texture Matching

The texture matching function compares the texture of the source image with the matched scenes, within the context around the missing region. It creates a texture descriptor for each image. The descriptor is constructed by applying the Sobel operator [27] to the image, followed by a 5 by 5 median filter. These are then compared using an SSD error measurement.

Pseudo code is given in algorithm 3.

Algorithm 3 Texture Matching
  sourceContext ← getContext(sourceImage)
  texture_s ← applySobelOperator(sourceImage)
  texture_s ← apply5x5medianFilter(texture_s)
  for all 40 most semantically similar images do
    Adjust image to be aligned with the position calculated in section 4.2.2
    texture_m ← applySobelOperator(image)
    texture_m ← apply5x5medianFilter(texture_m)
    localResults[i] ← SSD(texture_s, texture_m, sourceContext)
  end for
  return localResults

4.2.4 Graph Cut Seam Finding

The graph cut seam finding algorithm makes use of the CudaCuts interface by Vineet et al. [29]. In ten iterations, it first calculates a data cost and a smoothness cost and then calls the CudaCuts graphCut() function. Equations 3.5 and 3.6 are used to calculate the data cost and equation 3.7 to calculate the smoothness cost. This is repeated for all of the top 40 images from section 4.2.1, and the results are returned for use in the final blending of the images.

Pseudo code is given in algorithm 4.

Algorithm 4 Graph Cut Seam Finding
  for all 40 most semantically similar images do
    for j = 1 → 10 do
      dataTerm ← calcDataCost()
      smoothTerm ← calcSmoothCost()
      tempCut ← graphCut(dataTerm, smoothTerm)
    end for
    graphCuts[i] ← tempCut
  end for
  return graphCuts


4.3 Poisson Blending

The twenty images with the lowest combined score from sections 4.2.1-4.2.4 are blended with standard Poisson blending. This implementation uses Agrawal et al.'s [1, 2] 2D Poisson solver.

The Poisson blending function starts by calculating the image gradient of the composited image (the source image with the matched image aligned in the missing region) in both the x and y directions. It then calls the Poisson solver and normalizes the result by subtracting the minimum value and dividing by the maximum. This is done separately for the red, green and blue color channels.

Pseudo code is given in algorithm 5.

Algorithm 5 Poisson Blending
  grad_x ← ∂f/∂x
  grad_y ← ∂f/∂y
  for all color channels do
    SolvePoissonNeumann(grad_x.colorChannel, grad_y.colorChannel, result)
    minval ← min(result)
    maxval ← max(result − minval)
    finalImage.colorChannel ← (result − minval)/maxval
  end for
  return finalImage

4.4 GPU Implementation

Most algorithms described earlier in this chapter typically perform operations like filtering, comparing, multiplying, adding and subtracting on highly parallel data like images and arrays. They can thus be parallelized to a large extent and implemented on the GPU using CUDA [21] from Nvidia.

A version of the image completion method was therefore implemented on the GPU using CUDA, since a parallelized implementation could potentially lead to a significant performance boost. Almost every part of the method was to some extent parallelized, although how much of each part was parallelized differs. Most of the parallelized parts are some form of image filtering or some form of mathematical operation on the images.

In addition to that, the graph cut seam finding algorithm, section 4.2.4, uses the GPU to calculate the parameters for the cost function and uses the CudaCuts interface by Vineet et al. to solve it. The Poisson blending implementation, section 4.3, uses Agrawal et al.'s CUDA implemented solver.

Parallelizing the algorithms and implementing them using CUDA C required extensive work, much more so than the initial MATLAB implementation. This is mostly due to the lack of built-in tools, primarily for basic image processing methods.

For evaluation of the GPU implementation, see chapter 5.

Chapter 5

Results

In this thesis, a simple piece of software that successfully uses a small database of local images for image completion was developed. The software is more of a tech demo than a final application. It is limited to images with a resolution of 256 by 256 pixels due to storage limitations on one of the developer machines, but the method could theoretically work with arbitrarily sized images.

5.1 System

Final testing was done on a standard desktop computer with an Intel Core 2 Quad CPU running at 2.33 GHz, 8 GB of memory (RAM) and an NVIDIA GeForce GT 130 graphics card.

5.2 Performance

The implementations in this thesis have a precalculation step for the gist and color descriptors. These descriptors are static and only need to be calculated once for each image. Table 5.1 shows a comparison of the execution time of this step between the different implementations. The time measurements are for the entire database. Hays et al. [13] do not mention any similar step in their implementation; Photoshop uses some form of local method and thus has no need for this step.

Implementation   Execution time
CUDA             20 min
MATLAB           ~4 h
Hays et al.      ?
Photoshop        -

Table 5.1. Comparison between the precalculation part of the implementations.


Table 5.2 shows a comparison between different image completion implementations. The time measurement is the time it takes a user to complete an image; any precalculations are excluded. Hays et al. publish performance data from two implementations, one running on a single CPU and one running on 15 parallel CPUs. There is no mention of the performance of these CPUs.

Implementation                       Execution time
CUDA                                 25-70 s
MATLAB                               2-3 min
Hays et al. on one CPU               > 1 h
Hays et al. on 15 CPUs in parallel   5 min
Photoshop                            near instant

Table 5.2. Comparison between the image completion part of the implementations.

5.3 Results

This section presents a number of images completed with the method proposed in this thesis. They are accompanied by the original image, the mask that specifies the missing region (the white region is to be removed) and the result from completing the image using content aware fill in Photoshop.


Figure 5.1. Sky replacement. Top left: original image. Top right: mask. Middle left: composited image before blending. Middle right: composited image after blending. Bottom left: alternative result. Bottom right: result from Content Aware Fill in Adobe Photoshop CS5.1.


Figure 5.2. Removing a nearby road. Top left: original image. Top right: mask. Bottom left: result from this thesis. Bottom right: result from Content Aware Fill in Adobe Photoshop CS5.1.


Figure 5.3. Replacing a shore. Top left: original image. Top right: mask. Bottom left: result from this thesis. Bottom right: result from Content Aware Fill in Adobe Photoshop CS5.1.


Figure 5.4. Removing a large building. Top left: original image. Top right: mask. Bottom left: result from this thesis. Bottom right: result from Content Aware Fill in Adobe Photoshop CS5.1.


Figure 5.5. Removing a distant road. Top left: original image. Top right: mask. Bottom left: result from this thesis. Bottom right: result from Content Aware Fill in Adobe Photoshop CS5.1.

Chapter 6

Evaluation

This chapter will compare the results and performance from this thesis with previous work by Hays et al. [13] and the market leading commercial software Adobe Photoshop CS5, and evaluate the method's commercial potential.

6.1 Quality

It is hard to directly compare the quality of the image completions done in this thesis with those made by Hays et al., since the hardware available during development did not allow storage of 2.3 million photographs. However, the biggest difference between this thesis and their work is the size of the database. Thus it can be assumed that both produce similar results when a good match is found. Due to the larger database used by Hays et al., it is more likely that their scene completion method will find a good match in the general case with a random image, while this thesis is more tailored to the specific use case of completing one of the user's own photos.

Figures 5.3 and 5.4 show that both this thesis and Photoshop are capable of producing plausible image completions when there is plenty of data around the missing region to analyze. One could argue that if the user removes a foreground object in 5.3, he may not want another one transferred there instead. On the other hand, Photoshop produces texture discontinuities and a slightly skewed perspective. Also in 5.4, Photoshop produces slightly more obvious seams and discontinuities.

However, the biggest advantage of this thesis compared to Photoshop is truly obvious in figures 5.1 and 5.2. If the user wants to replace the dull and overexposed sky in figure 5.1, Photoshop's local image completion algorithm has no suitable data to transfer into the region, and thus the result is not satisfying. The algorithm proposed in this thesis, however, analyzes the scene and finds several suitable skies to complete the image with. Two of them are shown in figure 5.1.

In figure 5.2 the user wants to remove the road in the foreground. Even though there is still plenty of texture and color information from the field left after the road has been removed, Photoshop cannot produce a usable completion of the image. On the other hand, the method proposed in this thesis finds a closeup shot of the same field in another photo and transfers it seamlessly to the missing region.

Figure 5.5 shows a case where this method cannot find a suitable image to complete the image with, and replaces the road with an icy lake. The result looks quite plausible, but it is likely not what the user intended. However, Photoshop also fails to complete this image.

6.2 Performance

Though not as fast as the content aware fill in Photoshop, with its nearly instant operations on images with 256 by 256 resolution, the performance of the implementation in this thesis far exceeds the performance that Hays et al. achieve. The current completion time of between thirty seconds and one minute is viable for day-to-day use, and is five to ten times faster than what Hays et al. achieved using fifteen CPUs.

6.3 Commercial Potential

The image completion method proposed in this thesis has shown that a local database of private photos can successfully be used in image completion methods to gain significant quality in the completions compared to commercial software such as Adobe Photoshop. This is done while avoiding many of the problems associated with the method proposed by Hays et al.

Since the photos are already on the computer, no extra storage is needed, nor does the user have to download millions of images from the Internet. By using these local photos, the user also avoids the copyright issues.

A photographer today typically takes many similar photographs of similar scenes during a photo session. Therefore a smaller number of images is needed to find images that are visually and structurally similar than when random images downloaded from the Internet are used, since the database contains a higher ratio of relevant data.

The performance of the algorithm is still worse than that of local algorithms, but a processing time of one minute on a standard desktop computer is viable for commercial use. By using a smaller database and modern technologies such as CUDA, this method is 60-120 times faster than the work by Hays et al.

Chapter 7

Conclusion

The results from our implementation are promising. It has been shown that a small database of photos can be used to improve the quality of image completions, contrary to the findings of Hays et al. [13]. The reason for this is likely the type of images used in the databases. Hays et al. use random images downloaded from the Internet. These images, even though they represent similar scenes, have little visual and structural correlation with each other. Thus a vast number of images is needed in order to find images that are visually and structurally similar to other random images that the user may want to edit.

Our method instead uses the user's own private, or professional, images for the completions. A photographer today typically takes many similar photographs of similar scenes during a photo session. Thus a smaller number of images is needed to find images that are visually and structurally similar, given a usage case where the user edits his or her own photos.

This approach also seems more suitable for a future commercial application. The approach taken by Hays has copyright issues associated with licensing millions of photographs. It also requires large amounts of storage and computing capability in order to store and analyze those files. The approach in this thesis requires no extra storage of images, and the computing time for the image completions is commercially viable using standard desktop computers. And there is no need to license the images, since the algorithm only uses the photographer's own images.

Compared to existing commercial implementations of image completion algorithms, this method produces plausible results where market leading software like Adobe Photoshop cannot, as seen in figure 5.1. The computation is not as fast as for local methods, but a computing time of one minute on low end graphics hardware should not be a big problem.

However, in retrospect the CUDA implementation did not give the performance boost over the initial MATLAB implementation on the actual image completions that I was hoping for. This is probably due to my lack of experience with CUDA and other methods for general purpose computing on the GPU, which has resulted in some poorly optimized code. It would likely have been a wiser decision to stick with the original implementation and spend more time on optimizing it and perfecting the results.

7.1 Future work

The completion time could likely see a significant improvement with a more optimized implementation and better use of the graphics hardware.

The image completions could probably benefit from some form of hybrid with a local method. For example, it could be a good idea in some cases to fill some of the missing region using this method and the rest using a local method, or to use some form of local texture propagation in conjunction with this method to get a more seamless result, as sketched below. Another possibility is to fill the missing region using multiple images.
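As a rough illustration of such a hybrid, the sketch below (Python with OpenCV; the function and its parameters are my own illustrative assumptions, not part of the thesis implementation) hands a thin band along the seam of a scene-matched completion over to a local inpainting method:

```python
import cv2
import numpy as np

def smooth_seam(completed, hole_mask, band_px=8):
    """Re-fill a thin band along the hole boundary with a local method.

    completed : 8-bit BGR image already filled by the scene-matching step.
    hole_mask : 8-bit mask, 255 inside the originally missing region.
    band_px   : approximate half-width of the seam band, in pixels.
    """
    # The band consists of pixels close to the hole boundary, on both sides.
    kernel = np.ones((3, 3), np.uint8)
    dilated = cv2.dilate(hole_mask, kernel, iterations=band_px)
    eroded = cv2.erode(hole_mask, kernel, iterations=band_px)
    seam_band = cv2.subtract(dilated, eroded)

    # Local texture propagation over the band only (Telea's method here).
    return cv2.inpaint(completed, seam_band, 5, cv2.INPAINT_TELEA)
```

The bulk of the completion is kept as-is; only the seam region is re-synthesized locally, which should reduce visible transitions between the inserted scene and the original image.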

More advanced techniques for aligning image features, like the Scale-Invariant Feature Transform, SIFT [19], could be used to better align the matched scenes with the surrounding region.
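A minimal sketch of such an alignment step, using the SIFT implementation in OpenCV (the wiring below is my own assumption, not part of this work): keypoints are matched between the input image and the matched scene, and a homography warps the scene into place.

```python
import cv2
import numpy as np

def align_scene(target_bgr, scene_bgr):
    """Warp the matched scene onto the target image using SIFT + RANSAC."""
    target = cv2.cvtColor(target_bgr, cv2.COLOR_BGR2GRAY)
    scene = cv2.cvtColor(scene_bgr, cv2.COLOR_BGR2GRAY)

    sift = cv2.SIFT_create()
    kp_t, des_t = sift.detectAndCompute(target, None)
    kp_s, des_s = sift.detectAndCompute(scene, None)

    # Lowe's ratio test [19] to keep only distinctive matches.
    matcher = cv2.BFMatcher()
    pairs = matcher.knnMatch(des_s, des_t, k=2)
    good = [m for m, n in (p for p in pairs if len(p) == 2)
            if m.distance < 0.75 * n.distance]

    src = np.float32([kp_s[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp_t[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

    h, w = target.shape
    return cv2.warpPerspective(scene_bgr, H, (w, h))
```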

The Poisson blending can from time to time lead to visible artifacts that make it very obvious that the image has been modified. It could therefore be a good idea to explore other blending techniques.
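One well-known alternative would be multi-band blending with Laplacian pyramids. The sketch below is only meant to illustrate the idea (Python with OpenCV, assuming float32 images of equal size and a soft single-channel mask in [0, 1]):

```python
import cv2
import numpy as np

def laplacian_pyramid(img, levels):
    """Decompose an image into `levels` band-pass levels plus a residual."""
    gp = [img]
    for _ in range(levels):
        gp.append(cv2.pyrDown(gp[-1]))
    lp = [gp[i] - cv2.pyrUp(gp[i + 1], dstsize=gp[i].shape[1::-1])
          for i in range(levels)]
    lp.append(gp[-1])
    return lp

def pyramid_blend(a, b, mask, levels=5):
    """Blend float32 images a and b with a soft mask, band by band."""
    # Gaussian pyramid of the mask, matching the image pyramid sizes.
    gm = [mask]
    for _ in range(levels):
        gm.append(cv2.pyrDown(gm[-1]))

    la = laplacian_pyramid(a, levels)
    lb = laplacian_pyramid(b, levels)

    # Blend each frequency band separately, then collapse the pyramid.
    blended = [m[..., None] * x + (1.0 - m[..., None]) * y
               for x, y, m in zip(la, lb, gm)]
    out = blended[-1]
    for band in blended[-2::-1]:
        out = cv2.pyrUp(out, dstsize=band.shape[1::-1]) + band
    return out
```

Unlike Poisson blending, this requires no equation solving, and color mismatches tend to be spread smoothly across scales rather than bleeding in from the boundary.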

Since this method uses the user's local photo library, it would be possible to make use of the metadata to find similar images. For example, GPS coordinates could be used to find photos taken within the same area or at places with similar geology. Time stamps could be used to find images taken at the same time of day or in the same part of the year. That could help to find images with similar lighting conditions and ensure that nature photos are within the right season.
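As a hedged sketch of what such a metadata filter could look like (plain Python; it assumes that latitude, longitude and capture time have already been read from each photo's EXIF data, for example with a library such as Pillow, and all names are my own):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two GPS coordinates, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = (math.sin(dp / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def filter_candidates(photos, query, max_km=10.0, max_month_diff=1):
    """Keep photos taken near the query photo and in the same season.

    Each photo is a dict with 'lat', 'lon' and 'time' (a datetime),
    assumed to have been extracted from the EXIF metadata beforehand.
    """
    def month_diff(t1, t2):
        # Cyclic distance, so December and January count as adjacent.
        d = abs(t1.month - t2.month)
        return min(d, 12 - d)

    return [p for p in photos
            if haversine_km(p['lat'], p['lon'],
                            query['lat'], query['lon']) <= max_km
            and month_diff(p['time'], query['time']) <= max_month_diff]
```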

It could also be beneficial to use labels, comments, ratings, etc. that the user has already entered for the photos.

Bibliography

[1] Amit Agrawal, Rama Chellappa, and Ramesh Raskar. An algebraic approach to surface reconstruction from gradient fields. In Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV'05) - Volume 1, pages 174–181, Washington, DC, USA, 2005. IEEE Computer Society.

[2] Amit Agrawal and Ramesh Raskar. What is the range of surface reconstructions from a gradient field? In ECCV, pages 578–591. Springer, 2006.

[3] M. Bertalmio, A. L. Bertozzi, and G. Sapiro. Navier-Stokes, fluid dynamics, and image and video inpainting. In Proc. IEEE Computer Vision and Pattern Recognition (CVPR), pages 355–362, 2001.

[4] Marcelo Bertalmio, Guillermo Sapiro, Vincent Caselles, and Coloma Ballester. Image inpainting. In Proceedings of the 27th annual conference on Computer graphics and interactive techniques, SIGGRAPH '00, pages 417–424, New York, NY, USA, 2000. ACM Press/Addison-Wesley Publishing Co.

[5] Yuri Boykov, Olga Veksler, and Ramin Zabih. Fast approximate energy minimization via graph cuts. IEEE Trans. Pattern Anal. Mach. Intell., 23:1222–1239, November 2001.

[6] A. Criminisi, P. Pérez, and K. Toyama. Object removal by exemplar-based inpainting. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2:721, 2003.

[7] Nicholas Diakopoulos, Irfan Essa, and Ramesh Jain. Content based image synthesis. In CIVR, pages 299–307, 2004.

[8] Bekir Dizdaroglu. An image completion method using decomposition. EURASIP J. Adv. Sig. Proc., 2011, 2011.

[9] Xiaojun Du, Dongwook Cho, and Tien D. Bui. Image segmentation and inpainting using hierarchical level set and texture mapping. Signal Process., 91:852–863, April 2011.

[10] Alexei A. Efros and William T. Freeman. Image quilting for texture synthesis and transfer. In Proceedings of the 28th annual conference on Computer graphics and interactive techniques, SIGGRAPH '01, pages 341–346, New York, NY, USA, 2001. ACM.

[11] Alexei A. Efros and Thomas K. Leung. Texture synthesis by non-parametric sampling. In Proceedings of the International Conference on Computer Vision - Volume 2, ICCV '99, pages 1033–, Washington, DC, USA, 1999. IEEE Computer Society.

[12] D. Gabor. Theory of communication. Part 1: The analysis of information. Journal of the Institution of Electrical Engineers - Part III: Radio and Communication Engineering, 93(26):429–441, 1946.

[13] James Hays and Alexei A. Efros. Scene completion using millions of photographs. ACM Trans. Graph., 26, July 2007.

[14] Aaron Hertzmann, Charles E. Jacobs, Nuria Oliver, Brian Curless, and David H. Salesin. Image analogies. In Proceedings of the 28th annual conference on Computer graphics and interactive techniques, SIGGRAPH '01, pages 327–340, New York, NY, USA, 2001. ACM.

[15] Yiming Ji, Kai H. Chang, and Chi-Cheng Hung. Efficient edge detection and object segmentation using Gabor filters. In Proceedings of the 42nd annual Southeast regional conference, ACM-SE 42, pages 454–459, New York, NY, USA, 2004. ACM.

[16] M. Johnson, G. J. Brostow, J. Shotton, O. Arandjelovic, V. Kwatra, and R. Cipolla. Semantic photo synthesis. Computer Graphics Forum, 2006.

[17] Vivek Kwatra, Arno Schödl, Irfan Essa, Greg Turk, and Aaron Bobick. Graphcut textures: image and video synthesis using graph cuts. ACM Trans. Graph., 22:277–286, July 2003.

[18] Shutao Li and Ming Zhao. Image inpainting with salient structure completion and texture propagation. Pattern Recognition Letters, 32(9):1256–1266, 2011.

[19] David G. Lowe. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vision, 60:91–110, November 2004.

[20] David Mumford and Jayant Shah. Optimal approximations by piecewise smooth functions and associated variational problems. Communications on Pure and Applied Mathematics, 42(5):577–685, 1989.

[21] NVIDIA. CUDA C Programming Guide. March 2011.

[22] Aude Oliva and Antonio Torralba. Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42:145–175, 2001.


[23] Aude Oliva and Antonio Torralba. Building the gist of a scene: the role of global image features in recognition. Progress in Brain Research, 155:23–36, 2006.

[24] Rupert Paget and I. D. Longstaff. Texture synthesis via a noncausal nonparametric multiscale Markov random field, 1998.

[25] Patrick Pérez, Michel Gangnet, and Andrew Blake. Poisson image editing. ACM Trans. Graph., 22:313–318, July 2003.

[26] Javier Portilla and Eero P. Simoncelli. A parametric texture model based on joint statistics of complex wavelet coefficients. Int. J. Comput. Vision, 40:49–70, October 2000.

[27] Irwin Edward Sobel. Camera models and machine perception. PhD thesis, Stanford, CA, USA, 1970. AAI7102831.

[28] Antonio Torralba, Kevin P. Murphy, William T. Freeman, and Mark A. Rubin. Context-based vision system for place and object recognition. In Proceedings of the Ninth IEEE International Conference on Computer Vision - Volume 2, ICCV '03, pages 273–, Washington, DC, USA, 2003. IEEE Computer Society.

[29] Vibhav Vineet and P. J. Narayanan. CUDA cuts: Fast graph cuts on the GPU. Computer Vision and Pattern Recognition Workshop, 0:1–8, June 2008.

[30] Li-Yi Wei and Marc Levoy. Fast texture synthesis using tree-structured vector quantization. In Proceedings of the 27th annual conference on Computer graphics and interactive techniques, SIGGRAPH '00, pages 479–488, New York, NY, USA, 2000. ACM Press/Addison-Wesley Publishing Co.

[31] Marta Wilczkowiak, Gabriel J. Brostow, Ben Tordoff, and Roberto Cipolla. Hole filling through photomontage. In 16th British Machine Vision Conference 2005 - BMVC'2005, Oxford, United Kingdom, pages 492–501, July 2005.

[32] B. Wohlberg. Inpainting by joint optimization of linear combinations of exemplars. IEEE Signal Processing Letters, 18:75–78, January 2011.

[33] Ying-Qing Xu, Baining Guo, and Harry Shum. Chaos mosaic: Fast and memory efficient texture synthesis. Technical report, Microsoft Research, April 2002.

[34] Zongben Xu and Jian Sun. Image inpainting by patch propagation using patch sparsity. IEEE Transactions on Image Processing, 19(5):1153–1165, 2010.

Copyright

The publishers will keep this document online on the Internet — or its possible replacement — for a period of 25 years from the date of publication barring exceptional circumstances.

The online availability of the document implies a permanent permission for anyone to read, to download, to print out single copies for his/her own use and to use it unchanged for any non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional on the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility.

According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/

© Mikael Dalkvist