
International Journal of Computer Science & Information Technology (IJCSIT) Vol 5, No 2, April 2013

DOI : 10.5121/ijcsit.2013.5202

OBTAINING SUPER-RESOLUTION IMAGES BY COMBINING LOW-RESOLUTION IMAGES WITH HIGH-FREQUENCY INFORMATION DERIVED FROM TRAINING IMAGES

Emil Bilgazyev1, Nikolaos Tsekos2, and Ernst Leiss3

1,2,3 Department of Computer Science, University of Houston, TX, USA
[email protected], [email protected], [email protected]

ABSTRACT

In this paper, we propose a new algorithm to estimate a super-resolution image from a given low-resolution image, by adding high-frequency information that is extracted from natural high-resolution images in the training dataset. The selection of the high-frequency information from the training dataset is accomplished in two steps: a nearest-neighbor search algorithm selects the closest images from the training dataset (and can be implemented on the GPU), and a sparse-representation algorithm estimates a weight parameter to combine the high-frequency information of the selected images. This simple but very powerful super-resolution algorithm can produce state-of-the-art results. Qualitatively and quantitatively, we demonstrate that the proposed algorithm outperforms existing state-of-the-art super-resolution algorithms.

KEYWORDS

Super-resolution, face recognition, sparse representation.

1. INTRODUCTION

Recent advances in electronics, sensors, and optics have led to a widespread availability of video-based surveillance and monitoring systems. Some imaging devices, such as cameras, camcorders, and surveillance cameras, have limited achievable resolution due to factors such as the quality of the lenses, the limited number of sensors in the camera, etc. Increasing the quality of the lenses or the number of sensors in the camera also increases the cost of the device; in some cases the desired resolution may still not be achievable with current technology. However, many applications, ranging from security to broadcasting, are driving the need for higher-resolution images and videos for better visualization [1].

The idea behind super-resolution is to enhance the low-resolution input image such that both the spatial resolution (total number of independent pixels within the image) and the pixel resolution (total number of pixels) are improved.


In this paper, we propose a new approach to estimate a super-resolution image by combining a given low-resolution image with high-frequency information obtained from training images (Fig. 1). A nearest-neighbor search algorithm is used to select the closest images from the training dataset, and a sparse-representation algorithm is used to estimate a weight parameter to combine the high-frequency information of the selected images. The main motivation of our approach is that the high-frequency information helps to obtain sharp edges in the reconstructed images (see Fig. 1).

Figure 1: Depiction of (a) the super-resolution image obtained by combining (b) a given low-resolution image and (c) high-frequency information estimated from a natural high-resolution training dataset.

The rest of the paper is organized as follows: previous work is presented in Section 2, a description of our proposed method is presented in Section 3, the implementation details are presented in Section 4, and experimental results of the proposed algorithm as well as other algorithms are presented in Section 5. Finally, Section 6 summarizes our findings and concludes the paper.

2. PREVIOUS WORK

In this section, we briefly review existing techniques for super-resolution of low-resolution images for general and domain-specific purposes. In recent years, several methods have been proposed that address the issue of image resolution. Existing super-resolution (SR) algorithms can be classified into two classes: multi-frame-based and example-based algorithms [8]. Multi-frame-based methods compute a high-resolution (HR) image from a set of low-resolution (LR) images from any domain [6]. The key assumption of multi-frame-based super-resolution methods is that the input LR images overlap and each LR image contains information not present in the other LR images. Multi-frame-based SR methods then combine this set of LR images into one image so that all information is contained in a single output SR image. Additionally, these methods perform super-resolution with the general goal of improving the quality of the image so that the resulting higher-resolution image is also visually pleasing. The example-based methods compute an HR counterpart of a single LR image from a known domain [2, 13, 18, 10, 14]. These methods learn observed information targeted to a specific domain and thus can exploit prior knowledge to obtain superior results specific to that domain. Our approach belongs to this category, where we use a training database to improve the reconstruction output.

Moreover, the domain-specific SR methods targeting the same domain differ considerably from each other in the way they model and apply a priori knowledge about natural images. Yang et al. [22] introduced a method to reconstruct SR images using a sparse representation of the input LR images. However, the performance of these example-based SR methods degrades rapidly if the magnification factor is more than 2. In addition, the performance of these SR methods is highly dependent on the size of the training database.



Freeman et al. [8] proposed an example-based learning strategy that applies to generic images, where the LR-to-HR relationship is learned using a Markov Random Field (MRF). Sun et al. [17] extended this approach by using Primal Sketch priors to reconstruct edge regions and corners by deblurring them. The main drawback of these methods is that they require a large database of LR and HR image pairs to train the MRF. Chang et al. [3] used the Locally Linear Embedding (LLE) manifold learning approach to map the local geometry of HR images to LR images, with the assumption that the manifolds of LR and HR images are similar. In addition, they reconstructed an SR image using K neighbors. However, the manifold of synthetic LR images generated from HR images is not similar to the manifold of real-world LR images, which are captured under different environments and camera settings. Also, using a fixed number of neighbors to reconstruct an SR image usually results in blurring effects, such as artifacts at the edges, due to over- or under-fitting.

Figure 2: Depiction of the pipeline for the proposed super-resolution algorithm.

Another approach is derived from the multi-frame-based approach to reconstruct an SR image from a single LR image [7, 12, 19]. These approaches learn the co-occurrence of a patch within the image, where the correspondence between LR and HR is predicted. They cannot be used to reconstruct an SR image from a single LR facial image, due to the limited number of similar patches within a facial image.

SR reconstruction based on wavelet analysis has been shown to be well suited for reconstruction, denoising, and deblurring, and has been used in a variety of application domains including biomedical imaging [11], biometrics [5], and astronomy [20]. In addition, it provides an accurate and sparse representation of images that consist of smooth regions with isolated abrupt changes [16]. In our method, we propose to take advantage of the wavelet-decomposition-based approach in conjunction with compressed-sensing techniques to improve the quality of the super-resolution output.


3. METHODOLOGY

Table 1. Notations used in this paper

Symbol        Description
X             collection of training images
X_i           i-th training image (∈ R^{m×n})
x_{i,j}       j-th patch of the i-th image (∈ R^{k×l})
ε             threshold value
α_i           sparse representation of the i-th patch
‖·‖_p         l_p-norm
D             dictionary
W, W^{-1}     forward and inverse wavelet transforms
L(·), H(·)    low- and high-frequency information of an image
[·, ·]        concatenation of vectors or matrices

Let X_i ∈ R^{m×n} be the i-th image of the training dataset X = {X_i : i = 0 ... N}, and x_{i,j} ∈ R^{k×l} be the j-th patch of an image X_i = {x_{i,j} : j = 0 ... M}. The wavelet transform of an image patch x returns low- and high-frequency information:

W(x) = [L(x), H(x)] ,   (1)

where W is the forward wavelet transform, L(x) is the low-frequency information, and H(x) is the high-frequency information of an image patch x. Taking the inverse wavelet transform of the high- and low-frequency information of the original image (without any processing on them) returns the original image:

x = W^{-1}([L(x), H(x)]) ,   (2)

where W^{-1} is the inverse wavelet transform. If we use the Haar wavelet transform with its coefficients being 0.5 instead of 1/√2 (not a quadrature-mirror filter), then the low-frequency information of an image x is actually a low-resolution version of x in which four neighboring pixels are averaged; in other words, it is similar to down-sampling x by a factor of 2 with nearest-neighbor interpolation, and the high-frequency information H(x) of an image x is similar to the horizontal, vertical, and diagonal gradients of x.
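This scaled Haar decomposition can be sketched in a few lines of NumPy (a minimal illustration, not the authors' code; function names are ours):

```python
import numpy as np

def haar_decompose(x):
    """One-level 2-D Haar-like decomposition with filter taps of 0.5
    (instead of the orthonormal 1/sqrt(2)), as described in the text."""
    a = x[0::2, 0::2]; b = x[0::2, 1::2]
    c = x[1::2, 0::2]; d = x[1::2, 1::2]
    low   = 0.25 * (a + b + c + d)   # low band L(x): exact 2x2 average
    horiz = 0.25 * (a + b - c - d)   # high bands H(x)
    vert  = 0.25 * (a - b + c - d)
    diag  = 0.25 * (a - b - c + d)
    return low, (horiz, vert, diag)

def haar_reconstruct(low, highs):
    """Inverse transform; the synthesis taps of 1 (= 4 x 0.25) absorb the
    factor of 4 required because the filters are not quadrature-mirror."""
    h, v, d = highs
    x = np.empty((2 * low.shape[0], 2 * low.shape[1]))
    x[0::2, 0::2] = low + h + v + d
    x[0::2, 1::2] = low + h - v - d
    x[1::2, 0::2] = low - h + v - d
    x[1::2, 1::2] = low - h - v + d
    return x
```

Note that the low band is exactly the 2×2 average of the input, which is what makes it comparable to a nearest-neighbor-downsampled image in the search step below.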

Assume that, for a given low-resolution image patch y_i, which is the i-th patch of an image y, we can find a similar patch x_j, j = {0 ... N·M}, among the natural image patches; then, by combining y_i with the high-frequency information H(x_j) of a high-resolution patch x_j, and taking the


inverse wavelet transform, we will get the super-resolution patch y_i* (see Fig. 1):

y_i* = W^{-1}([y_i, H(x_j)]) ,   s.t.  ‖y_i − L(x_j)‖_2^2 ≤ ε_0 ,   (3)

where ε_0 is a small nonnegative value.

It is not guaranteed that we will always find an x_j such that ‖y_i − L(x_j)‖_2^2 ≤ ε_0; thus, we introduce an approach that estimates a few of the closest low-resolution patches L(x_j) from the training dataset and then estimates a weight for each patch L(x_j), which is used to combine the high-frequency information H(x_j) of the training patches.

To find the closest matches to the low-resolution input patch y_i, we use a nearest-neighbor search algorithm:

c = {c_i : ‖y_i − L(x_{c_i})‖_2^2 ≤ ε_1 , ∀ x_{c_i} ∈ X} ,   (4)

where c is a vector containing the indexes c_i of the training patches closest to the input patch y_i, and ε_1 is the radius threshold of the nearest-neighbor search. After selecting the closest matches to y_i, we build two dictionaries from the selected patches x_j: the first dictionary is the concatenation of the low-frequency information of the training patches, L(x_j), and is used to estimate a weight parameter; the second dictionary is the concatenation of their high-frequency information, H(x_j):

D_i^L = {L(x_j) : j ∈ c} ,   D_i^H = {H(x_j) : j ∈ c} .   (5)
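As a sketch, the radius search of Eq. 4 over vectorized low-frequency patches reduces to a squared-distance test (a hypothetical helper; we assume the patches are stored row-wise in a matrix):

```python
import numpy as np

def radius_neighbors(y_i, low_patches, eps1):
    """Indexes c of training patches whose low-frequency band lies within
    squared L2 radius eps1 of the input patch y_i (Eq. 4).
    low_patches: (num_patches, patch_dim) array of vectorized L(x_j)."""
    d2 = np.sum((low_patches - y_i) ** 2, axis=1)
    return np.flatnonzero(d2 <= eps1)

# The two dictionaries of Eq. 5 are then simply
#   D_L = low_patches[c].T   and   D_H = high_patches[c].T
# with atoms stored as columns.
```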

We use a sparse-representation algorithm [21] to estimate the weight parameter. The sparse representation α_i of an input image patch y_i with respect to the dictionary D_i^L is used as the weight for fusing the high-frequency information of the training patches (D_i^H):

α_i = argmin_{α_i} ‖y_i − D_i^L α_i‖_2^2 + λ‖α_i‖_1 .   (6)

The sparse-representation algorithm (Eq. 6) tries to estimate y_i by fusing a few atoms (columns) of the dictionary D_i^L, assigning non-zero weights to these atoms. The result is the sparse representation α_i, which has only a few non-zero elements. In other words, the input image patch y_i can be represented by combining a few atoms of D_i^L (y_i ≈ D_i^L α_i) with the weight parameter α_i; similarly, the high-frequency information of the training patches, D_i^H, can also be


combined with the same weight parameter α_i to estimate the unknown high-frequency information of the input image patch y_i:

y_i* = W^{-1}([y_i, D_i^H α_i]) ,   (7)

where y_i* is the output (super-resolution) image patch, and W^{-1} is the inverse wavelet transform.
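Eq. 6 can be solved with any l_1 solver; as an illustration, here is a basic iterative shrinkage-thresholding (ISTA) loop, which is one standard choice and not necessarily the solver of [21]:

```python
import numpy as np

def ista(y, D, lam=0.1, step=None, iters=500):
    """Sparse representation of Eq. 6 via iterative shrinkage-thresholding.
    y: (m,) input patch; D: (m, k) dictionary with atoms as columns."""
    if step is None:
        step = 1.0 / np.linalg.norm(D, 2) ** 2  # 1/L, L = Lipschitz constant
    alpha = np.zeros(D.shape[1])
    for _ in range(iters):
        grad = D.T @ (D @ alpha - y)            # gradient of the L2 term
        z = alpha - step * grad
        alpha = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return alpha

# The fused high-frequency estimate used in Eq. 7 is then D_H @ ista(y_i, D_L).
```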

Figure 2 depicts the pipeline of the proposed algorithm. In the training step, we extract patches from the high-resolution training images, and then compute the low-frequency information (which becomes the low-resolution training image patches) and the high-frequency information for each patch in the training dataset. In the reconstruction step, given an input low-resolution image y, we extract a patch y_i and find its nearest neighbors c within the given radius ε_1 (this can be sped up using a GPU); from the selected neighbors c we construct the low-frequency dictionary D_i^L and the high-frequency dictionary D_i^H. The low-frequency dictionary is used to estimate the sparse representation α_i of the input low-resolution patch y_i with respect to the selected neighbors, and the atoms (columns) of the high-frequency dictionary are fused using the sparse representation α_i as the weight parameter. Finally, by taking the inverse wavelet transform W^{-1} of the given low-resolution image patch y_i together with the fused high-frequency information, we obtain the super-resolution patch y_i*. Iteratively repeating the reconstruction step (red-dotted block in Fig. 2) for each patch in the low-resolution image y, we obtain the super-resolution image y*.
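For concreteness, the reconstruction loop can be condensed into one self-contained sketch, with the neighbor search and a small ISTA solver inlined; names and default values are ours, not the paper's:

```python
import numpy as np

def super_resolve_patches(patches, low_dict, high_dict, eps1=0.5, lam=0.1):
    """Sketch of the per-patch reconstruction loop (cf. Fig. 2).
    patches: iterable of vectorized low-resolution input patches y_i;
    low_dict / high_dict: (p, d) arrays of vectorized L(x_j) / H(x_j).
    Returns, per patch, the fused high-frequency vector fed to W^{-1}
    (Eq. 7), or None when no neighbor lies inside the radius (in which
    case the paper falls back to bicubic interpolation)."""
    fused = []
    for y_i in patches:
        # Eq. 4: radius-limited nearest-neighbor search.
        c = np.flatnonzero(np.sum((low_dict - y_i) ** 2, axis=1) <= eps1)
        if c.size == 0:
            fused.append(None)
            continue
        D_L, D_H = low_dict[c].T, high_dict[c].T
        # Eq. 6: sparse weights via iterative shrinkage-thresholding.
        step = 1.0 / max(np.linalg.norm(D_L, 2) ** 2, 1e-12)
        alpha = np.zeros(D_L.shape[1])
        for _ in range(300):
            z = alpha - step * (D_L.T @ (D_L @ alpha - y_i))
            alpha = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
        fused.append(D_H @ alpha)  # high-frequency estimate for Eq. 7
    return fused
```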

4. IMPLEMENTATION DETAILS

In this section we explain the implementation details. As pointed out in Sec. 3, we extract patches X_i = {x_{i,j} : j = 0 ... M} for each training image, with x_{i,j} ∈ R^{k×l}. The number M depends on the window function, which determines how the patches are selected. There are two ways to select patches from an image: one is to select distinct patches, where two consecutive patches do not overlap, and the other is to select overlapping patches (sliding window), where two consecutive patches overlap. Since the l_2-norm in the nearest-neighbor search is sensitive to shifts, we slide the window by one pixel in the horizontal or vertical direction, so that two consecutive patches overlap each other by (k−1)×l or k×(l−1) pixels, where x_{i,j} ∈ R^{k×l}. Storing these patches requires an enormous amount of storage space, N × (m−k) × (n−l) × k × l, where N is the number of training images and X_i ∈ R^{m×n}. For example, if we have 1000 natural images in the training dataset, each with a resolution of 1000×1000 pixels, storing the 40×40 patches requires 1.34TB of storage space, which would be inefficient and computationally expensive. To reduce the number of patches, we removed patches which do not contain any gradients, or contain very few gradients:

‖∇x_{i,j}‖_2^2 ≥ ε_2 ,

where ∇x_{i,j} = ∂x_{i,j}/∂x + ∂x_{i,j}/∂y is the sum of the gradients along the vertical and horizontal directions, and ε_2 is the threshold value used to filter out patches with little gradient variation. Similarly, we calculate the gradients of the input low-resolution patches y_i, and if they are below the threshold ε_2, we upsample them using bicubic interpolation; no super-resolution reconstruction is performed on those patches. To improve the computation speed, the nearest-neighbor search can be performed on the GPU, and since all given low-resolution patches are processed independently of each other, multi-threaded processing can be used for each super-resolution patch reconstruction.
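The gradient-based patch filtering can be sketched as follows; finite differences stand in for the partial derivatives, the energy measure is our interpretation, and the threshold value is illustrative:

```python
import numpy as np

def gradient_energy(patch):
    """Squared L2 norm of the horizontal and vertical finite-difference
    gradients, used to discard near-flat training patches."""
    gx = np.diff(patch, axis=1)  # horizontal differences
    gy = np.diff(patch, axis=0)  # vertical differences
    return np.sum(gx ** 2) + np.sum(gy ** 2)

def keep_patch(patch, eps2=1e-3):
    """Keep only patches whose gradient energy reaches the threshold eps2."""
    return gradient_energy(patch) >= eps2
```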

In the wavelet transform, we used [0.5, 0.5] for the low-pass filter and [−0.5, 0.5] for the high-pass filter, from which the 2-D filters are created. These filters are not quadrature-mirror filters (they are nonorthogonal); thus, during the inverse wavelet transform we need to multiply the output by 4. The reason for choosing these filter values is that the low-frequency information (analysis part) of the forward wavelet transform is then the same as down-sampling the signal by a factor of 2 with nearest-neighbor interpolation, which is what the nearest-neighbor search uses. During the experiments, all color images are converted to YCbCr, and only the luminance component (Y) is used. For display, the blue- and red-difference chroma components (Cb and Cr) of the input low-resolution image are up-sampled and combined with the super-resolution image to obtain the color image y* (Fig. 2).
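The luminance-only processing can be sketched as below; we assume the full-range ITU-R BT.601 YCbCr definition, since the paper does not specify which variant it uses:

```python
import numpy as np

def rgb_to_ycbcr(img):
    """Full-range ITU-R BT.601 RGB -> YCbCr conversion (an assumption;
    the paper does not state its exact YCbCr definition).
    img: (..., 3) float array with channels R, G, B in [0, 255]."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    y  =  0.299 * r + 0.587 * g + 0.114 * b          # luminance, SR input
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128.0
    cr =  0.5 * r - 0.418688 * g - 0.081312 * b + 128.0
    return y, cb, cr
```

Super-resolution is then applied to `y` only, while `cb` and `cr` are simply up-sampled for display.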

Note that we can reduce the storage space for the patches to zero by extracting the patches of the training images during reconstruction. This can be accomplished by changing the neighbor-search algorithm, and can be implemented on the GPU. During the neighbor search, each GPU thread is assigned to extract the low-frequency information L(x_{l,j}) and high-frequency information H(x_{l,j}) at an assigned position j of a training image X_l; we compute the distance to the input low-resolution image patch y_i, and if the distance is less than the threshold ε_1, the GPU thread returns the high-frequency information H(x_{l,j}), which is used to construct D_i^H:

D_i^H = {H(x_{c_i}) : ‖y_i − L(x_{c_i})‖_2^2 ≤ ε_1 , ∀ x_{c_i} ∈ X} .   (8)

As the threshold (radius) value for the nearest-neighbor search algorithm we used 0.5 for natural images and 0.3 for facial images. The low-frequency information of both the training image patches and the input image patches is normalized before calculating the Euclidean distance. We selected these values experimentally; at these values we obtained the highest SNR and lowest MSE. The Euclidean distance (in the nearest-neighbor search) is known to be sensitive to noise, but in our approach the main goal is only to reduce the number of training patches considered close to the input patch. Thus, we take a relatively high threshold value for the nearest-neighbor search to select the closest matches, and sparse representation is then performed on them. Note that the sparse-representation estimation (Eq. 6) tends to estimate the input patch from the training patches, where noise is taken care of [14]. Reducing the storage space slightly increases the super-resolution reconstruction time, since the wavelet transform is computed during reconstruction.


5. EXPERIMENTAL RESULTS

We performed experiments on a variety of images to test the performance of our approach (HSR) as well as of other super-resolution algorithms: BCI [1], SSR [22], and MSR [8]. We conducted two types of experiments.

For the first, we performed the experiment on the Berkeley Segmentation Dataset 500 [15]. It contains natural images, which are divided into two groups: the first group of images (Fig. 3(a)) is used to train the super-resolution algorithms (except BCI), and the second group (Fig. 3(b)) is used to test the performance of the super-resolution algorithms. To measure the performance of the algorithms, we use the mean squared error (MSE) and the signal-to-noise ratio (SNR) as distance metrics. These metrics measure the difference between the ground-truth and the reconstructed images.
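The two metrics can be computed directly; the definitions below follow the usual conventions, since the paper does not give its exact SNR formula, so this is an assumption:

```python
import numpy as np

def mse(ref, rec):
    """Mean squared error between ground truth and reconstruction."""
    return np.mean((ref.astype(float) - rec.astype(float)) ** 2)

def snr_db(ref, rec):
    """Signal-to-noise ratio in dB: signal power over error power."""
    err = np.sum((ref.astype(float) - rec.astype(float)) ** 2)
    return 10.0 * np.log10(np.sum(ref.astype(float) ** 2) / err)
```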

The second type of experiment is performed on facial images (Fig. 4), where a face recognition system is used as the distance metric to demonstrate the performance of the super-resolution algorithms.

Figure 3: Depiction of the Berkeley Segmentation Dataset 500 images used for (a) training and (b) testing the super-resolution algorithms.

Figure 4: Depiction of the facial images used for (a) training and (b) testing the super-resolution algorithms.


5.1 Results on Natural Images

In Figure 5, we show the output of the proposed super-resolution algorithm, BCI, SSR, and MSR. The red rectangle is zoomed in and displayed in Figure 6. In this figure we focus on the effect of super-resolution algorithms on low-level patterns (the fur of the bear). Most super-resolution algorithms tend to improve the sharpness of the edges along the borders of objects, which looks good to human eyes, while the low-level patterns are ignored. One can see that the output of BCI is smooth (Fig. 5(b)), and from the zoomed-in region (Fig. 6(b)) it can be noticed that the edges along the border of the object are smoothed; similarly, the pattern inside the region is also smooth. This is because BCI interpolates the neighboring pixel values in the low-resolution image to introduce a new pixel value in the high-resolution image. This is equivalent to taking the inverse wavelet transform of a given low-resolution image with its high-frequency information set to zero; thus the reconstructed image will not contain any sharp edges. The result of MSR has sharp edges; however, it contains block artifacts (Fig. 5(c)). One can see that the edges around the border of an object are sharp, but the patterns inside the region are smoothed, and block artifacts are introduced (Fig. 6(c)). On the other hand, the result of SSR does not contain sharp edges along the border of the object, but it contains sharper patterns compared to BCI and MSR (Fig. 5(d)). The result of the proposed super-resolution algorithm has sharp edges, sharp patterns, and fewer artifacts compared to the other methods (Fig. 5(e) and Fig. 6(e)), and visually it looks more similar to the ground-truth image (Fig. 5(f) and Fig. 6(f)).
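The equivalence noted above, that smooth interpolation behaves like an inverse wavelet transform whose high-frequency (detail) sub-bands are all zero, can be sketched with a single orthonormal 2-D Haar synthesis step in NumPy. The function name is ours, not from the paper; this is an illustration of the idea, not the BCI implementation.

```python
import numpy as np

def inverse_haar_zero_details(approx):
    """Inverse 2-D orthonormal Haar step with the three detail
    sub-bands (horizontal, vertical, diagonal) set to zero.
    Each approximation coefficient spreads evenly over a 2x2 output
    block, so the result is piecewise constant: no new sharp edges."""
    up = np.repeat(np.repeat(approx, 2, axis=0), 2, axis=1)
    return up / 2.0  # orthonormal Haar scaling: each block pixel = cA / 2

low_res = np.array([[4.0, 8.0],
                    [2.0, 6.0]])
high_res = inverse_haar_zero_details(low_res)
# high_res is 4x4 and constant within every 2x2 block
```

Because every 2x2 block holds a single value, the "upsampled" image gains pixel count but no high-frequency content, which is exactly why BCI-style results look smooth.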

Figure 7 shows the performance of the super-resolution algorithms on a different image with fewer patterns. One can see that the output of BCI is still smooth along the borders, and inside the region it is clearer. The output of MSR looks better for images with fewer patterns, where it tends to reconstruct the edges along the borders.

Figure 5: Depiction of low-resolution, super-resolution, and original high-resolution images. (a) Low-resolution image, (b) output of BCI, (c) output of SSR, (d) output of MSR, (e) output of the proposed algorithm, and (f) original high-resolution image. The solid red rectangles represent the regions that are magnified and displayed in Figure 6 for better visualization. One can see that the output of the proposed algorithm has sharper patterns compared to other SR algorithms.


Figure 6: Depiction of a region (red rectangle in Figure 5) for (a) the low-resolution image, and the output of (b) BCI, (c) SSR, (d) MSR, (e) the proposed algorithm, and (f) the original high-resolution image. Notice that the proposed algorithm produces sharper patterns compared to other SR algorithms.

Figure 7: Depiction of low-resolution, super-resolution, and original high-resolution images. (a) Low-resolution image, (b) output of BCI, (c) output of SSR, (d) output of MSR, (e) output of the proposed algorithm, and (f) original high-resolution image. The solid yellow and red rectangles represent the regions that are magnified and displayed on the right side of each image for better visualization. One can see that the output of the proposed algorithm has better visual quality compared to other SR algorithms.


In the output of SSR, one can see that the edges on the borders are smooth, and inside the regions there are ringing artifacts. The SSR algorithm builds dictionaries from high-resolution and low-resolution image patches by reducing the number of atoms (columns) of the dictionaries under the constraint that these dictionaries can still represent the image patches in the training dataset with minimal difference. This is similar to compression or dimension reduction, where we try to preserve the structure of the signal rather than its details, and sometimes we get artifacts during the reconstruction.¹
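The structure-versus-detail trade-off described above can be illustrated with a truncated SVD standing in for a learned, reduced dictionary (this is only an analogy for the compression effect, not the SSR training procedure; all variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)

# A smooth rank-1 "structure" plus low-amplitude fine "detail"
structure = np.outer(np.linspace(0.0, 1.0, 64), np.linspace(0.0, 1.0, 64))
detail = 0.01 * rng.standard_normal((64, 64))
patches = structure + detail

# Keep only k singular components, analogous to a dictionary with few atoms
U, s, Vt = np.linalg.svd(patches, full_matrices=False)
k = 4
reconstructed = (U[:, :k] * s[:k]) @ Vt[:k]

# By Eckart-Young optimality, the reconstruction error is bounded by the
# detail energy: the smooth structure survives, the fine detail is lost.
residual = np.linalg.norm(patches - reconstructed)
```

The reconstruction preserves the low-frequency structure almost exactly while the high-frequency detail lands in the residual, which is the same mechanism that produces the smoothed textures and reconstruction artifacts seen in the SSR output.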

We also computed the average SNR and MSE to quantitatively measure the performance of the super-resolution algorithms. Table 5.1 depicts the average SNR and MSE values for BCI, MSR, SSR, and HSR. Notice that the proposed algorithm has the highest signal-to-noise ratio and the lowest mean-square error.

Table 5.1: Experimental Results

5.2. Results on Facial Images

We conducted experiments on the surveillance camera facial images database (SCFace) [9]. This database contains 4,160 static images of 130 subjects. The images were acquired in an uncontrolled indoor environment using five surveillance cameras of various qualities and ranges. For each of these cameras, one image of each subject was acquired at three distances: 4.2 m, 2.6 m, and 1.0 m. Another set of images was acquired by a mug-shot camera. Nine images per subject provide nine discrete views ranging from left to right profile in equal steps of 22.5 degrees, including a frontal mug-shot image at 0 degrees. The database contains images in the visible and infrared spectrum. Images from cameras of different quality mimic real-world conditions. The high-resolution images are used as a gallery (Fig. 4(a)), while the images captured by a visible-spectrum camera from a distance of 4.2 m are used as a probe (Fig. 4(b)). Since the SCFace dataset consists of two types of images, high-resolution images and surveillance images, we used the high-resolution images to train the SR methods and the surveillance images as a probe.

We used the Sparse Representation-based Face Recognition method proposed by Wright et al. [21] to test the performance of the super-resolution algorithms. It has been shown that the performance of face recognition systems relies on the low-level information (high-frequency information) of facial images [4]. The high-level information, i.e., the structure of the face, affects the performance of face recognition systems less than the low-level information, unless we compare human faces with other objects such as a monkey, a lion, or a car, where the structures between them

¹ The lower frequencies of a signal affect the difference between the original and reconstructed signals more than the higher frequencies. For example, if we remove the DC component (0 Hz) from one of the signals, original or reconstructed, the difference between them will be very large. Thus, keeping the lower frequencies of the signal helps to preserve its structure and keep the difference between the original and reconstructed signals minimal.

Dist. Metric \ SR Algorithm    BCI      SSR      MSR      HSR
SNR (dB)                       23.08    24.76    18.46    25.34
MSE                             5.45     5.81    12.01     3.95
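The paper does not spell out its exact SNR formula; a common definition, which we assume here, measures signal energy against reconstruction-error energy in decibels, with MSE as the mean per-pixel squared error:

```python
import numpy as np

def mse(reference, estimate):
    """Mean squared error between two images of equal shape."""
    return np.mean((reference - estimate) ** 2)

def snr_db(reference, estimate):
    """Signal-to-noise ratio in dB: signal energy over error energy.
    This is an assumed definition; the paper does not state its formula."""
    noise = np.sum((reference - estimate) ** 2)
    return 10.0 * np.log10(np.sum(reference ** 2) / noise)

ground_truth = np.array([[3.0, 4.0], [0.0, 0.0]])
output = np.array([[3.0, 3.0], [0.0, 0.0]])
# mse(...) -> 0.25; snr_db(...) -> 10*log10(25/1), about 13.98 dB
```

Under these definitions a higher SNR and a lower MSE both indicate a reconstruction closer to the ground truth, which is the direction of the comparison in Table 5.1.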


are very different. Most human faces have similar structures: two eyes, one nose, two eyebrows, etc., and in low-resolution facial images the edges (high-frequency information) around the eyes, eyebrows, nose, mouth, etc., are lost, which decreases the performance of face recognition systems [21]. Even for humans it is very difficult to recognize a person from a low-resolution image (see Fig. 4). Figure 8 depicts the given low-resolution images and the output super-resolution images. The rank-1 face recognition accuracies for LR, BCI, SSR, MSR, and our proposed algorithm are 2%, 18%, 13%, 16%, and 21%, respectively. Overall, the face recognition accuracy is low, but compared to the face recognition performance on the low-resolution images, we can conclude that the super-resolution algorithms improve the recognition accuracy.
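The rank-1 accuracies above count how often the closest gallery match carries the probe's identity. A minimal sketch of that computation follows; the Euclidean distance and all variable names are our illustrative choices, not the classifier of [21]:

```python
import numpy as np

def rank1_accuracy(probe_feats, gallery_feats, probe_ids, gallery_ids):
    """Fraction of probes whose nearest gallery image (Euclidean
    distance here, for illustration) has the same subject identity."""
    # Pairwise distances: probes along rows, gallery along columns
    dists = np.linalg.norm(
        probe_feats[:, None, :] - gallery_feats[None, :, :], axis=2)
    nearest = np.argmin(dists, axis=1)  # rank-1 match per probe
    return np.mean(gallery_ids[nearest] == probe_ids)

gallery = np.array([[0.0, 0.0], [10.0, 10.0]])
gallery_ids = np.array([1, 2])
probes = np.array([[1.0, 0.0], [9.0, 9.0], [0.5, 0.5]])
probe_ids = np.array([1, 2, 2])  # last probe deliberately far from its subject
# rank1_accuracy(...) -> 2/3: the first two probes match, the third does not
```

In the experiment above, the probe features come from the super-resolved surveillance images and the gallery features from the high-resolution mug shots, so a sharper super-resolution output moves each probe closer to the correct gallery subject.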

6. CONCLUSION

We have proposed a novel approach to reconstruct super-resolution images for better visual quality as well as for better face recognition, which can also be applied to other fields. We presented a sparse representation-based SR method to recover the high-frequency components of an

Figure 8: Depiction of the LR image and the output of the SR algorithms. (a) Low-resolution image, and the output of (b) BCI, (c) SSR, and (d) the proposed method. For this experiment we used a patch size of 10×10 pixels; thus, when we increase the patch size we introduce ringing artifacts, which can be seen in the reconstructed image (d). Quantitatively, in terms of face recognition, our proposed super-resolution algorithm outperforms the other super-resolution algorithms.

SR image. We demonstrated the superiority of our method over existing state-of-the-art super-resolution methods for the task of face recognition in low-resolution images obtained from real-world surveillance data, as well as better performance in terms of MSE and SNR. We conclude that by having more than one training image per subject we can significantly improve the visual quality of the proposed super-resolution output, as well as the recognition accuracy.

REFERENCES

[1] M. Ao, D. Yi, Z. Lei, and S. Z. Li. Handbook of Remote Biometrics, chapter Face Recognition at a Distance: System Issues, pages 155–167. Springer London, 2009.

[2] S. Baker and T. Kanade. Hallucinating faces. In Proc. IEEE International Conference on Automatic Face and Gesture Recognition, pages 83–88, Grenoble, France, March 28-30, 2002.

[3] H. Chang, D. Y. Yeung, and Y. Xiong. Super-resolution through neighbor embedding. In Proc. IEEE International Conference on Computer Vision and Pattern Recognition, pages 275–282, Washington DC, 27 June-2 July 2004.

[4] G. Chen and W. Xie. Pattern recognition using dual-tree complex wavelet features and SVM. In Proc. Canadian Conference on Electrical and Computer Engineering, pages 2053–2056, 2008.


[5] A. Elayan, H. Ozkaramanli, and H. Demirel. Complex wavelet transform-based face recognition. EURASIP Journal on Advances in Signal Processing, 10(1):1–13, Jan 2008.

[6] S. Farsiu, M. D. Robinson, M. Elad, and P. Milanfar. Fast and robust multiframe super resolution. IEEE Transactions on Image Processing, 13(10):1327–1344, 2004.

[7] W. T. Freeman, T. R. Jones, and E. C. Pasztor. Learning low-level vision. International Journal of Computer Vision, 40(1):25–47, 2000.

[8] W. T. Freeman and C. Liu. Markov Random Fields for Super-resolution and Texture Synthesis, chapter 10, pages 1–30. MIT Press, 2011.

[9] M. Grgic, K. Delac, and S. Grgic. SCface - surveillance cameras face database. Multimedia Tools and Applications Journal, 51(3):863–879, 2011.

[10] P. H. Hennings-Yeomans. Simultaneous super-resolution and recognition. PhD thesis, Carnegie Mellon University, Pittsburgh, PA, USA, 2008.

[11] J. T. Hsu, C. C. Yen, C. C. Li, M. Sun, B. Tian, and M. Kaygusuz. Application of wavelet-based POCS super-resolution for cardiovascular MRI image enhancement. In Proc. International Conference on Image and Graphics, pages 572–575, Hong Kong, China, Dec. 18-20, 2004.

[12] K. I. Kim and Y. Kwon. Single-image super-resolution using sparse regression and natural image prior. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(6):1127–1133, 2010.

[13] C. Liu, H. Y. Shum, and C. S. Zhang. A two-step approach to hallucinating faces: global parametric model and local nonparametric model. In Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 192–198, San Diego, CA, USA, Jun. 20-26, 2005.

[14] J. Mairal, M. Elad, and G. Sapiro. Sparse representation for color image restoration. IEEE Transactions on Image Processing, 17(1):53–69, Jan. 2008.

[15] D. Martin, C. Fowlkes, D. Tal, and J. Malik. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proc. 8th International Conference on Computer Vision, volume 2, pages 416–423, July 2001.

[16] G. Pajares and J. M. Cruz. A wavelet based image fusion tutorial. Pattern Recognition, 37(9):1855–1872, Sep. 2004.

[17] J. Sun, N. N. Zheng, H. Tao, and H. Shum. Image hallucination with primal sketch priors. In Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages II-729–736, vol. 2, Madison, WI, Jun. 18-20, 2003.

[18] J. Wang, S. Zhu, and Y. Gong. Resolution enhancement based on learning the sparse association of image patches. Pattern Recognition Letters, 31(1):1–10, Jan. 2010.

[19] Q. Wang, X. Tang, and H. Shum. Patch based blind image super-resolution. In Proc. IEEE International Conference on Computer Vision, Beijing, China, Oct. 17-20, 2005.

[20] R. Willet, I. Jermyn, R. Nowak, and J. Zerubia. Wavelet based super resolution in astronomy. In Proc. Astronomical Data Analysis Software and Systems, volume 314, pages 107–116, Strasbourg, France, 2003.

[21] J. Wright, A. Yang, A. Ganesh, S. Sastry, and Y. Ma. Robust face recognition via sparse representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(2):210–227, February 2009.

[22] J. Yang, J. Wright, T. Huang, and Y. Ma. Image super-resolution as sparse representation of raw image patches. In Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Anchorage, AK, Jun. 23-28, 2008.