

IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 19, NO. 5, MAY 2010 1201

New Learning Based Super-Resolution: Use of DWT and IGMRF Prior

Prakash P. Gajjar and Manjunath V. Joshi, Member, IEEE

Abstract—In this paper, we propose a new learning-based approach for super-resolving an image captured at low spatial resolution. Given the low spatial resolution test image and a database consisting of low and high spatial resolution images, we obtain super-resolution for the test image. We first obtain an initial high-resolution (HR) estimate by learning the high-frequency details from the available database. A new discrete wavelet transform (DWT) based approach is proposed for learning that uses a set of low-resolution (LR) images and their corresponding HR versions. Since the super-resolution is an ill-posed problem, we obtain the final solution using a regularization framework. The LR image is modeled as the aliased and noisy version of the corresponding HR image, and the aliasing matrix entries are estimated using the test image and the initial HR estimate. The prior model for the super-resolved image is chosen as an inhomogeneous Gaussian Markov random field (IGMRF) and the model parameters are estimated using the same initial HR estimate. A maximum a posteriori (MAP) estimation is used to arrive at the cost function which is minimized using a simple gradient descent approach. We demonstrate the effectiveness of the proposed approach by conducting the experiments on gray scale as well as on color images. The method is compared with the standard interpolation technique and also with existing learning-based approaches. The proposed approach can be used in applications such as wildlife sensor networks and remote surveillance, where the memory, the transmission bandwidth, and the camera cost are the main constraints.

Index Terms—Discrete wavelet transform (DWT), gradient descent, inhomogeneous Gaussian Markov random field (IGMRF), learning, regularization, super-resolution.

I. INTRODUCTION

IN many imaging applications like remote surveillance, wildlife sensor networks and remote sensing, it is not feasible to capture high-resolution (HR) images even if the camera is capable of it. This is mainly due to application-specific limitations such as memory, transmission bandwidth, power and camera cost. Since HR imaging leads to better analysis, classification and interpretation, algorithmic approaches that obtain the HR image using the given low-resolution (LR) observation and a database consisting of LR and HR images can be useful in such applications. The standard interpolation techniques like

Manuscript received January 25, 2009; revised September 14, 2009. First published January 26, 2010; current version published April 16, 2010. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Peyman Milanfar.

The authors are with the Dhirubhai Ambani Institute of Information and Communication Technology, Gandhinagar-382 007, Gujarat, India (e-mail: [email protected]; [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TIP.2010.2041408

pixel replication and bilinear interpolation increase the pixel count without actually adding detail. These techniques perform well in smoother regions of the images and tend to blur edges and other sharp details. Super-resolution techniques attempt to obtain the HR image from one or more LR observations. In this work, we show that it is possible to super-resolve an LR image using an available database of LR and HR image pairs. Many low-cost cameras have limited optical zoom and are fitted with low memory. With such a camera, one is forced to capture images and video at limited resolution. Present-day high-end cameras have options to capture images and video at different spatial resolutions. Hence, it is possible to capture a large number of LR-HR images offline and store them on a computer. This is a one-time operation. Once the database is created, the images captured by an LR camera can be super-resolved using our approach. This is definitely advantageous, as one can obtain HR images/video by using an LR camera fitted with limited memory. The proposed technique can also be used in image/video compression for transmission over a channel with limited bandwidth. One can transmit the compressed LR images and obtain HR at the receiver end by using a set of training pairs. Another important application lies in video surveillance, where the video from different cameras can be captured at low spatial resolution and, by the use of an already available LR-HR database, an HR video can be obtained for analysis at a later time.
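The blurring noted above is easy to see on a one-dimensional step edge; the following numpy sketch (illustrative, not from the paper) upsamples a step by pixel replication and by linear interpolation:

```python
import numpy as np

# A sharp step edge sampled at low resolution.
lr = np.array([0.0, 0.0, 1.0, 1.0])

# Pixel replication: each sample repeated twice; the pixel count
# doubles but no new detail is added.
replicated = np.repeat(lr, 2)

# Linear interpolation to twice the sample count: the step is smeared
# across intermediate values, i.e., the edge is blurred.
x_hr = np.linspace(0, 3, 8)
bilinear = np.interp(x_hr, np.arange(4), lr)

print(replicated)  # [0. 0. 0. 0. 1. 1. 1. 1.]
print(bilinear)    # intermediate values appear around the step
```

Neither result contains frequency content beyond what the LR samples carry, which is precisely the gap that learning-based super-resolution tries to fill.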

Many researchers have attempted to solve the problem of super-resolution. Tsai and Huang [1] were the first researchers to improve the resolution of an image. They demonstrate the reconstruction of a single improved-resolution image from several down-sampled noise-free versions of it. The authors in [2] use a maximum a posteriori (MAP) framework for jointly estimating the registration parameters and the HR image from severely aliased observations. Elad and Feuer in [3] propose a unified methodology to super-resolve an image from several observations which are geometrically warped, blurred, noisy and downsampled. They combine maximum likelihood, MAP and projection onto convex sets approaches. Capel and Zisserman [4] have employed fusion of information from several planar views for mosaicing and super-resolution. Recently, the authors in [5] propose a joint MAP formulation combining motion estimation, segmentation, and super-resolution (SR) together. They solve the super-resolution problem by a cyclic coordinate descent process that treats the motion and segmentation fields as well as the HR image as unknowns and estimates them jointly using the available data. In the letter [6], the authors present an approach to reconstruct high spatial-resolution and high-dynamic-range images from multiple and differently exposed images simultaneously. They propose a stochastic super-resolution

1057-7149/$26.00 © 2010 IEEE



reconstruction algorithm that models the nonlinear camera response function, exposure time, sensor noise, and quantization error in addition to spatial blurring and sampling. The authors in [7] propose a technique to enhance license plate numbers of moving vehicles in real traffic videos. They obtain an HR image of the number plate by fusing the information derived from multiple, sub-pixel shifted, and noisy LR observations. They model the super-resolved image as a Markov random field and estimate it using a graduated nonconvexity optimization procedure. The article [8] reviews a variety of super-resolution methods. Farsiu et al. [9] propose a unified approach to demosaicing and super-resolution of a set of LR color images. They employ bilateral regularization of the luminance term for reconstruction of sharp edges, and that of the chrominance term and intercolor dependencies term to remove color artifacts from the HR estimate. They use the L1 norm for the data error term to make the method robust to errors in data and modeling. Most common methods for color image super-resolution involve application of a super-resolution algorithm to each of the color components independently, or transform the problem to a different color space where chrominance layers are separated from luminance and super-resolution is applied to the luminance component only. All the above approaches use motion as a cue for solving the super-resolution problem, which requires accurate registration between the observations.

In recent times, learning-based super-resolution algorithms have attracted much attention. In these algorithms, the a priori information is derived from the training database. Freeman et al. [10] propose an example-based super-resolution technique. They estimate missing high-frequency details by interpolating the input LR image into the desired scale. The super-resolution is performed by nearest neighbor-based estimation of high-frequency patches based on the corresponding patches of the input low-frequency image. The authors in [11] present a learning-based approach to recognize digits of vehicle registration plates. They super-resolve and restore the image patches using an undirected graphical model. A learning-based image hallucination technique is proposed in [12]. Here, the authors use primal sketch priors for primitive layers (edges and junctions) and employ patch-based learning using a large image database. In [13], Baker and Kanade proposed a hallucination technique based on the recognition of generic local features. These local features are then used to predict a recognition-based prior rather than a smoothness prior, as is the case with most super-resolution techniques. The authors in [14] present a learning-based method to super-resolve face images using a kernel principal component analysis (PCA) based prior model. They regularize the solution using a prior probability based on the energy lying outside the span of principal components identified in a higher dimensional feature space. Ni and Nguyen utilize support vector regression (SVR) in the frequency domain and pose the super-resolution problem as a kernel learning problem [15]. The drawback of SVR is that it increases the computational complexity. Based on the framework of Freeman et al. [16], Kim and Kwon investigate a regression-based approach for single-image super-resolution [17]. Here the authors generate a set of candidates for each pixel using patch-wise regression and combine them based on the estimated confidence for each pixel. In a post-processing step, they employ a regularization technique using a discontinuity-preserving prior. Brandi et al. [18] propose an example-based approach for video super-resolution. They restore the high-frequency information of an interpolated block by searching in a database for a similar block, and by adding the high frequency of the chosen block to the interpolated one. In [19], the authors address the problem of super-resolution from a single image using a multiscale tensor voting framework. They consider all three color channels simultaneously to produce a multiscale edge representation to guide the process of HR color image reconstruction, which is subjected to the back projection constraint. Liu and Shum [20] present a two-step hybrid approach for super-resolving face images by combining Freeman's image primitive technique [10] and a PCA model-based approach. They propose a global parametric model called "global face image" carrying the common facial properties and a local nonparametric model called "local feature image" that records the local individualities. The HR face image is obtained by composition of the global face image and the local feature image. The authors in [21] recover the super-resolution image through a neighbor embedding algorithm. They employ histogram matching for selecting more reasonable training images having related contents. In [22], the authors propose a neighbor embedding-based super-resolution through edge detection and feature selection (NeedFS). They propose a combination of appropriate features for preserving edges as well as smoothing the color regions. The training patches are learned with different neighborhood sizes depending on edge detection.

The disadvantage of all the above approaches is that they either obtain the LR images in the database by downsampling the HR images, i.e., simulate the LR images, or use an interpolated version of the LR image while searching. Such a database does not represent the true spatial feature relationship between LR-HR pairs, as the images do not correspond to those captured by a real camera. Jiji et al. demonstrate super-resolution of a single-frame gray scale image using a training database consisting of HR images downloaded from the internet [23]. They learn the high-frequency details of the SR image from the database and obtain a regularized solution by employing a Markov random field (MRF) prior model and a wavelet prior. Our work in this paper is based on their work. However, we use a different approach for learning as well as for regularization.

In this paper, we present a learning-based approach for super-resolving an image using a single observation. This paper is based on our work in [24]. First, we learn the high-frequency contents of the super-resolved image from a database and obtain an initial estimate of the super-resolved image. We use a DWT-based method to learn the high-frequency contents. We construct the database by capturing both the LR images as well as their HR versions using a real camera. Thus, we make use of the true transformation that exists between the LR and HR images while learning. We then model the unknown HR image as an IGMRF and estimate the model parameters using the initial HR estimate. The aliasing (decimation) matrix entries used in the image formation model are also estimated using the same initial estimate. The final HR estimate is obtained by using the MAP formulation. The method is extended to color


GAJJAR AND JOSHI: NEW LEARNING BASED SUPER-RESOLUTION 1203

image super-resolution, where we super-resolve the luminance component using the proposed learning-based approach and then interpolate the chrominance components in the wavelet domain in order to obtain the super-resolved color image.
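The color pipeline just described can be sketched structurally in Python; the conversion matrix is the usual BT.601-style YCbCr approximation, and both processing routines are passed in as stand-ins (the paper's actual luminance SR and wavelet-domain chrominance interpolation are developed in later sections):

```python
import numpy as np

# Approximate BT.601 RGB -> YCbCr matrix (luma plus scaled color differences).
M = np.array([[ 0.299,  0.587,  0.114],   # Y
              [-0.169, -0.331,  0.500],   # Cb
              [ 0.500, -0.419, -0.081]])  # Cr
M_inv = np.linalg.inv(M)

def super_resolve_color(rgb, sr_luma, up_chroma):
    """Split an RGB image, apply a (supplied) SR routine to luminance
    and a cheaper upsampler to chrominance, then recombine."""
    ycbcr = rgb @ M.T
    y  = sr_luma(ycbcr[..., 0])          # learning-based SR on Y
    cb = up_chroma(ycbcr[..., 1])        # e.g., wavelet-domain interpolation
    cr = up_chroma(ycbcr[..., 2])
    return np.stack([y, cb, cr], axis=-1) @ M_inv.T

# Toy usage: pixel replication stands in for both routines.
up2 = lambda ch: np.repeat(np.repeat(ch, 2, axis=0), 2, axis=1)
rgb = np.random.rand(4, 4, 3)
out = super_resolve_color(rgb, up2, up2)
print(out.shape)  # (8, 8, 3)
```

The split exploits the fact that the human visual system is far more sensitive to luminance detail than to chrominance detail, so the expensive learning step is spent only on Y.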

The outline of the paper is as follows. In Section II, a general description of the proposed method is presented. Section III describes the approach for learning the initial estimate of the super-resolved image using the DWT. The forward model for the image formation process and the technique of decimation estimation are discussed in Section IV. Section V addresses the IGMRF prior model for the super-resolved image and the estimation of the model parameters, while Section VI describes the MAP estimation of the super-resolved image using a simple optimization technique. Application of the proposed technique to super-resolve color images is described in Section VII. The experimental results and the performance of the proposed approach are dealt with in Section VIII. Some concluding remarks are drawn in Section IX.

II. BLOCK DIAGRAM DESCRIPTION OF THE APPROACH

The proposed technique of learning-based super-resolution is illustrated by the block diagram shown in Fig. 1. Given an LR observation (test image), we learn its high-frequency contents from a database consisting of a set of LR images and their HR versions. It may be noted that the LR images are not constructed by downsampling the HR images, as is done by most of the learning-based approaches. Instead, they are captured by using a real camera with various resolution settings and hence represent the true LR-HR versions of the scenes. In order to learn the high-frequency components, we consider a discrete wavelet transform-based method. The transform coefficients corresponding to the high-frequency contents are learned from the database, and an initial estimate for the HR version of the observation is obtained by taking the inverse DWT. This initial HR estimate is used for decimation estimation as well as for estimating the IGMRF parameters. The estimated decimation models the aliasing due to undersampling, and the IGMRF parameters inject into the test image the geometrical properties corresponding to the HR image. We then use MAP estimation to arrive at a cost function consisting of a data fitting term and the prior term. A suitable optimization is exploited to minimize the cost function. The minimization leads to the final super-resolved image. We extend this method to color image super-resolution, where we super-resolve the luminance component using the proposed method and use interpolation in the wavelet domain for the chrominance components. The luminance and chrominance components are then combined to obtain the super-resolution. We mention here that although we use a large number of LR-HR images in the database, it is not possible to capture the true spatial features for the HR image using the learning alone. Also, the use of the wavelet transform limits the learning of edges to the horizontal, vertical and diagonal directions only. Hence, we need regularization in order to obtain a better solution.

Fig. 1. Schematic representation of the proposed approach for image super-resolution. Here LR, HR, and SR stand for low-resolution, high-resolution, and super-resolution, respectively. IGMRF represents inhomogeneous Gaussian Markov random field.

The main highlights of the paper are as follows.

• We learn the wavelet coefficients that correspond to the high-frequency contents of the super-resolved image from a database consisting of a set of LR images and their HR versions. Since the construction of the database is a one-time and offline operation, we can use computer memory for the storage of the database images. This allows us to capture a large number of images, even when the memory of the camera is limited.

• The database consists of LR and HR images, both captured by varying the resolution setting of a camera. Such pairs truly represent the spatial feature relationship between an LR image and its HR versions.

• An inhomogeneous Gaussian MRF is used to model the super-resolved image field. The advantage of using this model is that it is adaptive to the local structures in an image and hence eliminates the need for a separate edge-preserving prior.

• We estimate the aliasing matrix entries and the IGMRF parameters using the initial HR estimate. Since we use true LR-HR pairs for learning the initial HR estimate, we expect that the estimated parameters are close to their true values.

• For color image super-resolution, we apply the proposed method for super-resolving the luminance component and suggest a wavelet domain interpolation approach for the chrominance components.

• While using the edge-preserving IGMRF prior, we employ a simple gradient descent approach and thus avoid the use of computationally taxing optimization techniques such as simulated annealing.
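As a rough illustration of the last point, a quadratic data term plus a quadratic smoothness prior can be minimized by plain gradient descent. The sketch below uses a single global smoothness weight `lam` in place of the paper's spatially adaptive IGMRF parameters, and a 1-D signal for brevity; it is a simplified stand-in, not the paper's exact cost function:

```python
import numpy as np

def map_gradient_descent(y, D, sigma2, lam, steps=500, lr=0.1):
    """Minimize ||y - D z||^2 / (2 sigma2) + lam * ||G z||^2 by plain
    gradient descent, where G is a first-difference operator standing
    in for the (spatially adaptive) IGMRF prior."""
    z = D.T @ y                                  # crude initialization
    n = z.size
    G = np.eye(n, k=1)[:-1] - np.eye(n)[:-1]     # first differences
    for _ in range(steps):
        grad = D.T @ (D @ z - y) / sigma2 + 2.0 * lam * (G.T @ (G @ z))
        z -= lr * grad
    return z

# Toy usage: recover a ramp from its 2:1 block-averaged version.
D = np.kron(np.eye(4), [[0.5, 0.5]])
z_true = np.linspace(0.0, 1.0, 8)
y = D @ z_true
z_hat = map_gradient_descent(y, D, sigma2=1.0, lam=0.1)
```

Because both terms are quadratic, the cost is convex and a fixed small step size suffices, which is exactly why the expensive stochastic optimizers mentioned in the bullet can be avoided.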

III. LEARNING THE INITIAL HR ESTIMATE

In this section, we discuss the new learning technique used to obtain the initial HR estimate. It may be noted that the learning-based approach proposed in [23] uses a database consisting of HR images only, and these images are downloaded from the internet. The main drawback of this approach is that the use of images downloaded from the internet does not guarantee that they indeed represent an HR database. They may represent upsampled versions of LR images obtained using standard interpolation techniques. Also, they may represent a collection of images captured using different cameras with different hardware configurations. All this contributes to errors in the SR estimation. Our approach differs from their approach. We use a training set of LR-HR images covering a wide range of scenes. These images are captured by adjusting the resolution setting of a real camera. It is of interest to note that our database does not contain LR images synthesized using the downsampling operation, as used by the other learning-based approaches. Truly, this makes the



algorithm capable of super-resolution for the cases when the HR ground truth is not available.

In this paper, we learn the transform coefficients for the initial estimate of the super-resolved image for decimation (upsampling) factors of q = 2 and q = 4. The database for learning consists of a large number of sets of LR and HR images covering indoor scenes as well as outdoor scenes taken at different times and under different lighting conditions. For a decimation factor of q = 2, a set consists of two images (LR and HR) for each of the captured scenes. Similarly, for a decimation factor of q = 4, there are three images in a set for each scene, having different resolutions. Before we describe the method for learning, it is important to mention the following points. For a decimation factor of q = 4, we first learn the initial estimate for q = 2 using the database consisting of LR-HR pairs with a resolution factor of 2. We then use this estimate as the test image for the next factor-of-2 step. We thus apply the single octave learning algorithm in two steps in order to obtain image super-resolution for q = 4. The reason for the two-step operation is as follows. In a multiresolution system, every coefficient at a given scale can be related to a set of coefficients at the next coarser scale of similar orientation [25]. The Lipschitz property states that near sharp edges, the wavelet coefficients change exponentially over the scales [26], [27]. Hence, the error between a coefficient in the test image and its best matching coefficient from the LR training image increases exponentially when one learns the wavelet coefficients using a database of LR-HR pairs with a resolution difference of four. In the proposed single octave approach this error adds up linearly over the scales. Thus, the error propagation in the proposed approach is linear, and the learning of wavelet coefficients is more accurate in comparison with learning using a one-step operation. It may be noted here that the two-step operation of the single octave algorithm restricts the super-resolution to powers of 2 only.

Now we describe the approach for DWT-based learning for a decimation factor of q = 2. We use a two-level wavelet decomposition of the test image for learning the wavelet coefficients at the finer scale. The LR training images in the database are also decomposed into two levels, while their HR versions are decomposed into one level. The reason for taking a one-level decomposition for the HR training images is as follows. With a one-level decomposition, the subband LL represents the scaled version of the HR image and the subbands LH, HL, and HH represent the detail coefficients (vertical, horizontal and diagonal edges) at the high resolution. This means that for q = 2, both the LR image (subband LL) and the edge details at finer scales (subbands LH, HL, and HH) are available in the transformed HR image. This motivates us to compare the edges in the test image with those present in the LR training set and choose the best matching wavelet coefficients from the HR images. Thus, given an LR test image, we learn its edges (high-frequency content) at the finer scale using these LR-HR training images. Fig. 2 illustrates the block schematic for learning the wavelet coefficients of the test image at finer scales using a set of training image pairs for a decimation factor of 2. Fig. 2(a) shows the subbands 0 to 9 of the LR test image, where subbands 7, 8 and 9 are to be learned, and Fig. 2(b) displays the subbands 0 to 6 of the LR training images and subbands 7, 8 and 9 of the HR training images. We compare

Fig. 2. Illustration of learning of wavelet coefficients at a finer scale using a database of LR-HR image pairs. (a) Test image (LR observation) with a two-level wavelet decomposition. Wavelet coefficients (marked as hollow squares) are to be estimated for the subbands shown with the dotted lines. (b) A training set of LR and HR images in the wavelet domain (LR training images are decomposed into two levels and the HR training images into one level).

the coefficients in subbands 0 to 6 of the test image with those in subbands 0 to 6 of the LR training images and obtain the best matching coefficients for subbands 7, 8, and 9 of the test image. Here the test image and the LR training images are of size M x M pixels. The HR training images have a size of 2M x 2M pixels. N is the number of sets of LR-HR training images in the database.

The learning procedure is as follows. Let a(i, j) be the wavelet coefficient at a location (i, j) in subband 0, where 0 ≤ i, j < M/4. The wavelet coefficients b_1(i, j), b_2(i, j) and b_3(i, j) are the wavelet coefficients corresponding to subbands 1, 2 and 3, respectively. These wavelet coefficients and a 2 x 2 block consisting of c_k(2i, 2j), c_k(2i, 2j + 1), c_k(2i + 1, 2j) and c_k(2i + 1, 2j + 1) in each of the subbands k = 4, 5, 6 are then considered to learn a 4 x 4 wavelet block in each of the subbands 7, 8 and 9. For each of the subbands 7, 8 and 9 in the test image, we need to learn 4 x 4 = 16 coefficients for the corresponding location. Thus, for every location (i, j) in subband 0, we learn a total of 48 coefficients for the subbands 7, 8 and 9. We search for the LR training image that has the best match with the test image by comparing the wavelet coefficients in subbands 0 to 6 in the minimum absolute difference (MAD) sense. The corresponding wavelet coefficients from subbands 7, 8 and 9 of the HR training image are then copied into subbands 7, 8 and 9 of the test image. For a given location (i, j), the best matching LR training image over subbands 0 to 6 is found by using the following equation for the MAD:

m̂(i, j) = arg min_{1 ≤ m ≤ N} [ |a(i, j) − a^m(i, j)| + Σ_{k=1}^{3} |b_k(i, j) − b_k^m(i, j)| + Σ_{k=4}^{6} Σ_{s=0}^{1} Σ_{t=0}^{1} |c_k(2i + s, 2j + t) − c_k^m(2i + s, 2j + t)| ].   (1)



In (1), m = 1, 2, ..., N, and m̂(i, j) is an index to the best matching LR image in the database for the location (i, j). Here, a^m, b_k^m and c_k^m denote the wavelet coefficients of training image m at the corresponding locations. For each location (i, j) in subband 0 of the LR observation, the best-fit 4 x 4 block of wavelet coefficients in subbands 7, 8 and 9 from the HR image of the training pair given by m̂(i, j) is copied into subbands 7, 8 and 9 of the test image. Thus, we have

d_k(4i + s, 4j + t) = d_k^{m̂(i,j)}(4i + s, 4j + t).   (2)

Here, k = 7, 8, 9, 0 ≤ s ≤ 3, 0 ≤ t ≤ 3, and d_k and d_k^{m̂(i,j)} denote the learned coefficients of the test image and the coefficients of the best matching HR training image in subband k, respectively. This completes the learning process. The inverse wavelet transform of the learned HR image then gives the initial HR estimate.
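The per-location MAD search and block copy can be sketched as follows, assuming the wavelet decompositions have already been computed (e.g., with a 2-D DWT routine) and the subbands are held in plain numpy arrays; the container layout and the names `test_sb`, `lr_sbs`, `hr_sbs` are illustrative, not from the paper:

```python
import numpy as np

def mad_learn(test_sb, lr_sbs, hr_sbs):
    """Per-location minimum-absolute-difference search over N training pairs.

    test_sb: dict of subbands 0..6 of the test image (0..3 of size m x m,
             4..6 of size 2m x 2m).
    lr_sbs:  list of N dicts with the same layout (LR training images).
    hr_sbs:  list of N dicts with subbands 7..9 (size 4m x 4m each).
    Returns a dict of learned subbands 7..9 for the test image.
    """
    m = test_sb[0].shape[0]
    learned = {k: np.zeros((4 * m, 4 * m)) for k in (7, 8, 9)}
    for i in range(m):
        for j in range(m):
            best, best_err = 0, np.inf
            for idx, lr in enumerate(lr_sbs):
                # coarse-level coefficients (subbands 0..3)
                err = sum(abs(test_sb[k][i, j] - lr[k][i, j]) for k in range(4))
                # 2x2 blocks in the finer detail subbands 4..6
                for k in (4, 5, 6):
                    err += np.abs(test_sb[k][2*i:2*i+2, 2*j:2*j+2]
                                  - lr[k][2*i:2*i+2, 2*j:2*j+2]).sum()
                if err < best_err:
                    best, best_err = idx, err
            # copy the matching 4x4 blocks from the chosen HR image
            for k in (7, 8, 9):
                learned[k][4*i:4*i+4, 4*j:4*j+4] = \
                    hr_sbs[best][k][4*i:4*i+4, 4*j:4*j+4]
    return learned
```

An inverse DWT of subbands 0-9 (approximation plus all learned details) would then give the initial HR estimate.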

We now analyze the computational complexity of the proposed approach for single-octave operation. The wavelet coefficients corresponding to the high-frequency details are learned using a database consisting of pairs of LR-HR images. Let the test image be decomposed into several levels of the wavelet transform. This decomposition yields one approximation subband and, at each level, one detail subband for each of the vertical, horizontal, and diagonal orientations. Consider the subbands corresponding to the vertical details. We learn the vertical details at the finer scale using the vertical details across all coarser scales; in other words, the wavelet coefficients in all the vertical detail subbands are used to find the best matching coefficients from the database. For each coefficient in the coarsest subband, the best match at the finer level is found by comparing the coefficients at the corresponding locations of the vertical detail subbands of every LR image in the database. Similar comparisons are used to find the best matching coefficients at the finer levels of the horizontal and diagonal subbands, so the total number of comparison operations for learning all the details is the sum over the three orientations. In our experiments, we use a two-level decomposition of a 64×64 test image and learn the coefficients using a database of 750 pairs of LR-HR images. Although this involves a large number of comparisons, it is not computationally taxing for present-day high-performance computers, since the process is not iterative.

Fig. 3. Image formation model; the decimation operator downsamples the HR image by the decimation factor.
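As an illustration of the learning step, the following minimal numpy sketch copies, for each coefficient of a coarse test subband, the corresponding finer-scale HR coefficient block from the best matching training image. It simplifies the paper's scheme: matching here uses a single subband and a single coefficient per location (the paper compares coefficients across all coarser scales), and the function name and the scale factor of 2 are illustrative assumptions.

```python
import numpy as np

def learn_finer_subband(test_sub, db_lr_subs, db_hr_subs, s=2):
    """For each coefficient of the LR test subband, find the training image
    whose coefficient at that location matches best, and copy the
    corresponding s x s block of finer-scale HR wavelet coefficients.
    test_sub:   (h, w) coarse detail subband of the test image
    db_lr_subs: (N, h, w) same subband for the N training LR images
    db_hr_subs: (N, s*h, s*w) finer-scale subband of the HR training images"""
    N, h, w = db_lr_subs.shape
    learned = np.zeros((s * h, s * w))
    for i in range(h):
        for j in range(w):
            # squared distance to each training image's coefficient here
            d = (db_lr_subs[:, i, j] - test_sub[i, j]) ** 2
            m = int(np.argmin(d))  # index of the best matching LR image
            learned[s*i:s*(i+1), s*j:s*(j+1)] = \
                db_hr_subs[m, s*i:s*(i+1), s*j:s*(j+1)]
    return learned
```

In the full method this copy is repeated for every detail orientation, after which the inverse DWT of the learned coefficients yields the initial HR estimate.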

IV. FORWARD MODEL AND DECIMATION ESTIMATION

We propose a super-resolution algorithm that attempts to estimate the HR image from a single LR observation. This is an inverse problem, and solving it requires a forward model that represents the image formation process. We consider a linear forward model for the image formation, which can be written as

$y = Dz + n$   (3)

Here, the observed image is of size $M \times M$ pixels and $y$ represents the lexicographically ordered vector of size $M^2 \times 1$ containing its pixels. Similarly, $z$ is the lexicographically ordered actual HR image. The decimation matrix $D$ takes care of aliasing; for an integer decimation factor of $q$, it has $q^2$ nonzero elements along each row at appropriate locations. Here, $n$ is the independent and identically distributed (i.i.d.) noise vector with zero mean and variance $\sigma_n^2$, of the same size as $y$. The pictorial representation of this observation model is shown in Fig. 3. The multivariate noise probability density is given by

$P(n) = \frac{1}{(2\pi\sigma_n^2)^{M^2/2}} \exp\left(-\frac{n^T n}{2\sigma_n^2}\right)$   (4)

Our problem is to estimate $z$ given $y$, which is an ill-posed inverse problem.
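The forward model in (3) can be made concrete with a small sketch that builds the averaging decimation matrix for an integer factor q and applies it to a lexicographically ordered HR image; the function name is illustrative.

```python
import numpy as np

def decimation_matrix(M, q):
    """Build the q^2-average decimation matrix D that maps an
    (M*q x M*q) HR image, lexicographically ordered, to an (M x M) LR
    image: each row holds q*q entries of value 1/q^2 at the HR pixels
    falling inside one LR pixel."""
    N = M * q
    D = np.zeros((M * M, N * N))
    for i in range(M):
        for j in range(M):
            row = i * M + j
            for di in range(q):
                for dj in range(q):
                    D[row, (i * q + di) * N + (j * q + dj)] = 1.0 / q**2
    return D

# forward model y = D z (+ n, omitted here): a 6x6 HR image down to 3x3
q, M = 2, 3
z = np.arange((M * q) ** 2, dtype=float)  # lexicographically ordered HR image
y = decimation_matrix(M, q) @ z
```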

Generally, the decimation model used to obtain the aliased pixel intensities from the HR pixels, for a decimation factor of $q$, has the form [28]

$D = \frac{1}{q^2}\begin{bmatrix} 1 \cdots 1 & & & 0 \\ & 1 \cdots 1 & & \\ & & \ddots & \\ 0 & & & 1 \cdots 1 \end{bmatrix}$   (5)

The decimation matrix in (5) indicates that an LR pixel intensity is obtained by averaging the intensities of the $q^2$ pixels corresponding to the same scene point in the HR image and adding the noise intensity. This decimation model simulates the integration of the light intensity that falls on the HR detector. It assumes that the entire area of a pixel acts as the light-sensing area, i.e., that there is no space in the pixel area for wiring or insulation; in other words, the fill factor of the CCD array is unity. However, in practice, the observed intensity at a pixel captured due to LR sampling depends on various factors such as the fill factor, camera gain, zoom factor, and noise. Hence, the aliased LR pixel intensity of an image point is not always an equally weighted sum of the HR intensities, and it has to be estimated. The estimation of aliasing needs the true HR image, which is not available. Since the initial HR estimate is already available (discussed in Section III), we make use of it to estimate the decimation matrix entries and thus learn the aliasing. The decimation matrix of the form shown in (5) can now be modified as

$D = \begin{bmatrix} a_1 \cdots a_{q^2} & & & 0 \\ & a_1 \cdots a_{q^2} & & \\ & & \ddots & \\ 0 & & & a_1 \cdots a_{q^2} \end{bmatrix}$   (6)

where the $a_i$ are the decimation weights to be estimated. When (5) is used, each $a_i$ has a value of $1/q^2$. In the above equation, however, the estimates of the $a_i$ are based on the LR observation and the initial HR estimate; thus, the estimated $a_i$ are more accurate than the fixed $1/q^2$ weights in (5) and are closer to the true values for the chosen model. We use a simple least squares approach to estimate the decimation. It may be noted that this form of decimation matrix implicitly contains a moving-average (space-invariant) blur in the downsampling process.
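A minimal sketch of this least squares estimation: assuming the q² weights in (6) are shared by all rows (the space-invariant model the text indicates), each LR pixel contributes one linear equation in the weights, and the overdetermined system is solved with ordinary least squares. The function name and array layout are assumptions.

```python
import numpy as np

def estimate_decimation_weights(y_lr, z_hr, q):
    """Least-squares estimate of the q*q shared decimation weights a_i of
    eq. (6), from the LR observation and the initial HR estimate.
    y_lr: (M, M) LR image; z_hr: (M*q, M*q) initial HR estimate."""
    M = y_lr.shape[0]
    # each row of A stacks the q*q HR pixels mapping to one LR pixel
    A = (z_hr.reshape(M, q, M, q)
             .transpose(0, 2, 1, 3)
             .reshape(M * M, q * q))
    a, *_ = np.linalg.lstsq(A, y_lr.ravel(), rcond=None)
    return a
```

With M² equations and only q² unknowns, the weights are well determined whenever the initial HR estimate has enough local variation.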

V. IGMRF PRIOR MODEL

As discussed in Section IV, super-resolution is an ill-posed inverse problem: there are infinitely many solutions to (3). A reasonable assumption about the nature of the true image makes the ill-posed problem better posed, and selecting an appropriate prior model leads to a better solution. In practice, real images are made up of various textures, sharp edges, and smooth areas. The edges in an image give rise to high-frequency details, whereas smooth regions correspond to low-frequency content. In our work, we use a prior that imposes a spatial correlation constraint on the HR image in order to get a better solution. The Markov random field (MRF) model is widely used to characterize such spatial dependency. An MRF prior for the unknown HR image can be described by an energy function expressed as the Gibbsian density given by

$P(z) = \frac{1}{Z_p}\exp\left(-U(z)\right)$   (7)

where $z$ is the super-resolved image to be estimated and $Z_p$ is the partition function. One can choose the energy function $U(z)$ as a quadratic form with a single global parameter, assuming that the images are globally smooth. However, a more efficient model chooses $U(z)$ such that only the homogeneous regions are smooth and discontinuities are preserved in the form of edges. In order to take care of discontinuities, Geman and Geman [29] introduced the concept of line fields, but the estimation of the line field parameters is computationally taxing. To avoid an MRF with line fields and to reduce the computational complexity, one can use a homogeneous AR model as in [30]. However, this prior tends to blur the edges and other sharp details in the super-resolved image.

In general, a real image cannot be represented efficiently using homogeneous models. A more efficient model is one which considers that only homogeneous regions are smooth and that edges must remain sharp. This motivates us to consider an inhomogeneous prior which can adapt to the local structure of the image in order to provide a better reconstruction, eliminating the need for separate priors for edges and for smooth regions. We model the super-resolved image by an inhomogeneous Gaussian MRF with an energy function that allows us to adjust the amount of regularization locally. The corresponding energy function is defined in [31] as

$U(z) = \sum_{i,j}\left[\,b^x_{i,j}\,(z_{i,j}-z_{i,j-1})^2 + b^y_{i,j}\,(z_{i,j}-z_{i-1,j})^2\,\right]$   (8)

where the differences are first-order derivative approximations and $z$ is the super-resolved image; $b^x_{i,j}$ and $b^y_{i,j}$ are the IGMRF parameters at location $(i,j)$ for the vertical and horizontal directions, respectively. In the above energy function, the authors model the spatial dependency at a pixel location by considering a first-order neighborhood, thus accounting only for edges in the horizontal and vertical directions. In practice, however, the reconstructed image may also contain diagonal edges. In order to take care of these edges, we consider a second-order neighborhood and modify the energy function by adding analogous clique terms for the two diagonal directions; see the equation shown at the bottom of the page, where $b^{d1}_{i,j}$ and $b^{d2}_{i,j}$ are the IGMRF parameters at location $(i,j)$ for the diagonal directions. A low value of a parameter indicates the presence of an edge between the two corresponding pixels. These parameters help to obtain a solution which is less noisy in smooth


areas and preserves sharp details in other areas. Now, in order to estimate the IGMRF parameters, we need the true super-resolved image, which is not available. The authors in [31] use the IGMRF prior to regularize the solution for satellite image deblurring; they first obtain a close approximation of the final solution using a different approach and use it for estimating the IGMRF parameters. In our work, the learned initial HR estimate is a close approximation to the SR image, and, hence, we make use of it for estimating the IGMRF parameters. These prior parameters are estimated as discussed in [31] using the following equations:

(9)

where the pixel intensities are those of the initial estimate. Thus, we estimate four parameters at each pixel location. These parameters cannot be approximated from degraded versions of the original image: parameters estimated from a blurred image have high values, which leads to an oversmooth solution, while parameters estimated from a noisy image have very low values, which leads to a noisy solution. Hence, we use the already learned HR estimate in order to obtain a better estimate of these parameters. In order to avoid computational difficulties, we set an upper bound on the parameters whenever the gradient becomes zero, i.e., whenever the neighboring pixel intensities are the same. In other words, we set a minimum spatial difference of 1 for practical reasons. This avoids a high regularization parameter that would slow down the optimization, and it ensures that pixels with zero intensity difference are weighted almost the same as those with a small intensity difference (in this case, a pixel intensity difference of one).
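The following sketch illustrates this per-pixel parameter estimation with the clamping discussed above. Since (9) is not reproduced here, the estimator form b = 1/(4 d²) is an assumption standing in for the paper's exact expression; the clamping of the spatial difference to a minimum of 1 follows the text.

```python
import numpy as np

def igmrf_params(z0):
    """Four per-pixel IGMRF parameter maps from the initial HR estimate z0.
    ASSUMPTION: b = 1 / (4 * d^2) stands in for eq. (9); the difference d
    is clamped to a minimum of 1 so the parameters stay bounded where
    neighboring pixels are equal (zero gradient)."""
    def b(d):
        d = np.maximum(np.abs(d), 1.0)   # minimum spatial difference of 1
        return 1.0 / (4.0 * d * d)
    bx  = b(np.diff(z0, axis=1))          # horizontal differences
    by  = b(np.diff(z0, axis=0))          # vertical differences
    bd1 = b(z0[1:, 1:] - z0[:-1, :-1])    # one diagonal
    bd2 = b(z0[1:, :-1] - z0[:-1, 1:])    # other diagonal
    return bx, by, bd1, bd2
```

Across a strong edge the parameter is small (weak smoothing), while in flat regions it sits at the clamped upper bound, exactly the adaptive behavior described above.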

VI. SUPER-RESOLUTION ESTIMATION

We now explain how an MAP estimate of the dense intensity field (the super-resolved image) can be obtained. The IGMRF model on the super-resolved image serves as the prior for the MAP estimation, in which the prior parameters are already known. The data-fitting term is derived from the forward model describing the image formation process, and it contains the decimation matrix estimated using the initial HR image and the test image. In order to use maximum a posteriori estimation to super-resolve the test image, we need to obtain the estimate as

$\hat{z} = \arg\max_{z} P(z \mid y)$   (10)

Using Bayes’ rule, we can write

$P(z \mid y) = \frac{P(y \mid z)\,P(z)}{P(y)}$   (11)

Since the denominator is not a function of $z$, (11) can be written as

$\hat{z} = \arg\max_{z} P(y \mid z)\,P(z)$   (12)

Now, taking the logarithm, we can write

$\hat{z} = \arg\max_{z}\left[\log P(y \mid z) + \log P(z)\right]$   (13)

Finally, using (3) and (7), the final cost function to be minimized can be expressed as

$\hat{z} = \arg\min_{z}\left[\frac{\|y - Dz\|^2}{2\sigma_n^2} + U(z)\right]$   (14)

In (14), the first term ensures the fidelity of the final solution to the observed data through the image formation model, and the second term is the inhomogeneous smoothness prior. Since this cost function is convex, it can be minimized using a simple gradient descent optimization technique, which quickly leads to the minimum. The optimization is iterative, and the choice of the initial solution determines the speed of convergence: using a close approximation to the solution as the initial estimate speeds up the process. In order to provide a good initial guess, we use the already learned HR estimate.
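The minimization above can be sketched as plain gradient descent on a cost of the form (14), with the IGMRF energy restricted to horizontal and vertical cliques for brevity; the step size and iteration count are illustrative, not the paper's settings.

```python
import numpy as np

def sr_gradient_descent(y, D, z0, bx, by, sigma2=1.0, step=0.1, iters=200):
    """Minimize C(z) = ||y - D z||^2 / (2 sigma^2) + U(z), where U(z) is
    the IGMRF energy over horizontal/vertical cliques (diagonal cliques
    omitted here for brevity).
    y: (m,) LR vector; D: (m, n) decimation matrix; z0: (H, W) learned
    initial HR estimate; bx, by: per-clique IGMRF parameter maps."""
    H, W = z0.shape
    z = z0.astype(float).copy()
    for _ in range(iters):
        # data-term gradient: D^T (D z - y) / sigma^2
        g = (D.T @ (D @ z.ravel() - y)).reshape(H, W) / sigma2
        # prior gradient: derivative of sum b * (finite difference)^2
        tx = 2.0 * bx * (z[:, 1:] - z[:, :-1])
        g[:, 1:] += tx
        g[:, :-1] -= tx
        ty = 2.0 * by * (z[1:, :] - z[:-1, :])
        g[1:, :] += ty
        g[:-1, :] -= ty
        z -= step * g
    return z
```

Because the cost is convex, any sufficiently small step size converges; starting from the learned initial estimate (rather than, say, an interpolated image) reduces the number of iterations required.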

VII. APPLYING THE ALGORITHM TO COLOR IMAGES

Applying the monochrome super-resolution technique to each of the color components separately unbalances the natural correspondence between the color components in the solution, producing artifacts in the super-resolved color image; it also increases the computational burden. To avoid these drawbacks, we separate the luminance and chrominance components by applying a color space transformation. Since the human eye is more sensitive to the details in the luminance component of an image than to those in the chrominance components, we super-resolve the luminance component using the proposed approach and expand the chrominance components using a simple interpolation technique. The authors in [32] propose an interpolation technique for gray scale images that interpolates the wavelet coefficients at the finer scale; we apply their approach for expanding the chrominance components. The frequency-domain interpolation of these components leads to enhanced edges compared to spatial interpolation methods such as bilinear or bicubic interpolation. We then use the super-resolved luminance component and the interpolated chrominance components to obtain the super-resolved color image by converting back to the original color space. In the following, we describe the frequency-domain interpolation of the chrominance components of a color image.
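The color pipeline can be sketched as follows. The paper names only a luminance-chrominance transformation, so the ITU-R BT.601 YCbCr matrices used here are an assumption, and the sr_luma and interp_chroma arguments are placeholders for the proposed super-resolution and the wavelet-domain interpolation, respectively.

```python
import numpy as np

# ASSUMPTION: full-range BT.601 YCbCr; the paper only says
# "luminance/chrominance", so the exact color space is not confirmed.
def rgb_to_ycbcr(rgb):
    m = np.array([[ 0.299,     0.587,     0.114],
                  [-0.168736, -0.331264,  0.5],
                  [ 0.5,      -0.418688, -0.081312]])
    out = rgb @ m.T
    out[..., 1:] += 0.5
    return out

def ycbcr_to_rgb(ycc):
    m = np.array([[1.0,  0.0,       1.402],
                  [1.0, -0.344136, -0.714136],
                  [1.0,  1.772,     0.0]])
    ycc = ycc.copy()
    ycc[..., 1:] -= 0.5
    return ycc @ m.T

def super_resolve_color(rgb, sr_luma, interp_chroma):
    """Super-resolve only the luminance channel (sr_luma is a stand-in
    for the proposed method) and interpolate the two chrominance
    channels (interp_chroma stands in for the wavelet-domain scheme)."""
    ycc = rgb_to_ycbcr(rgb)
    y_sr = sr_luma(ycc[..., 0])
    cb = interp_chroma(ycc[..., 1])
    cr = interp_chroma(ycc[..., 2])
    return ycbcr_to_rgb(np.stack([y_sr, cb, cr], axis=-1))
```

Handling only one channel with the expensive method follows the text's rationale: the eye is most sensitive to luminance detail, and the split keeps the color components in balance.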

Now we describe the frequency-domain interpolation of the chrominance components. We take a two-level wavelet decomposition of the chrominance components, as shown in Fig. 4, and interpolate the wavelet coefficients for the finer scale. We exploit the zero-tree concept, i.e., in a multiresolution system, every coefficient at a given scale


Fig. 4. Interpolation of the wavelet coefficients at finer scales. The wavelet coefficients in the finest-scale subbands are interpolated using the zero tree concept.

can be related to a set of coefficients of similar orientation at the next coarser scale [25]. This relation can be used to find the wavelet coefficients at the finer scale. Thus, to interpolate the wavelet coefficients in a finest-scale subband, we relate the coefficients in the two coarser subbands of the same orientation and calculate the ratios between them, as described below.

Let the wavelet coefficients at a given location of the three decomposition levels belong to the finest subband (to be synthesized) and the two available coarser subbands of the same orientation. Consider the two coarser subbands for interpolating the coefficients of the finest subband. The four wavelet coefficients in the intermediate subband are related to a single coefficient in the coarsest subband through the zero-tree parent-child relation, and we define the four ratios of these four coefficients to that single related coefficient. A single coefficient in the intermediate subband in turn corresponds to four coefficients in the finest subband; we calculate these four coefficients by multiplying that coefficient with the four ratios:

(15)

Similarly, the wavelet coefficients in the remaining two finest-scale subbands are interpolated by calculating the ratios from the respective coarser subbands of the same orientation and multiplying them with the corresponding coefficients. Once the coefficients in all three finest-scale subbands are interpolated, we take the inverse wavelet transform to get the expanded image. We apply this interpolation technique to both chrominance components.
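One consistent reading of the ratio scheme can be sketched in numpy: the four ratios between each coarsest-scale coefficient and its four children in the intermediate subband are reused to spawn the four children of every coefficient in that group. The exact index convention of (15) is not legible in this copy, so this grouping is an assumption.

```python
import numpy as np

def interpolate_finer_subband(s2, s1):
    """Predict the next-finer detail subband from two coarser subbands of
    the same orientation via the zero-tree parent/child relation.
    s2: (h, w) coarsest subband; s1: (2h, 2w) intermediate subband;
    returns the predicted (4h, 4w) finest subband."""
    h, w = s2.shape
    eps = 1e-12  # guard against division by zero
    parents = np.kron(s2, np.ones((2, 2)))  # each s2 coeff over its 4 children
    r = s1 / (parents + eps)                # four ratios per zero-tree group
    # reuse each group's four ratios for the children of every coeff in it
    ratios = np.broadcast_to(
        r.reshape(h, 1, 2, w, 1, 2), (h, 2, 2, w, 2, 2)).reshape(4 * h, 4 * w)
    return np.kron(s1, np.ones((2, 2))) * ratios
```

Running this for the three orientations and taking the inverse DWT yields the expanded chrominance component, as described above.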

VIII. EXPERIMENTAL RESULTS

In this section, we demonstrate the efficacy of the proposed method in super-resolving an LR observation. We first show the results of our learning-based technique for obtaining the initial estimate of the super-resolved image. We then illustrate the results of the optimization using the initial HR estimate for super-resolving gray scale as well as color images. All the experiments are conducted on real-world images. The test images are of size 64×64, and the super-resolution is shown for upsampling (decimation) factors of q = 2 and q = 4, respectively; thus, the super-resolved images are of size 128×128 and 256×256, respectively. We use the same wavelet family in all our experiments, both while estimating the initial HR image and while interpolating the chrominance components. In order to compare the results quantitatively, we use the mean squared error (MSE) as the criterion, given by

(16)

where $z$ represents the original HR image and $\hat{z}$ represents the estimated initial HR image or the super-resolved image. In order to compare the results based on the MSE, we choose an LR image from the database as the test image so that the true HR image is available for comparison; the HR versions of the test images are removed from the database during the learning process. We also conduct an experiment in which the observed image is taken from an LR camera, for which we show a qualitative assessment only. All the experiments were conducted on a computer with a Pentium M, 1.70-GHz processor.
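For reference, a plain per-pixel MSE; the exact normalization used in (16) is not legible in this copy, so the unnormalized mean is an assumption.

```python
import numpy as np

def mse(z_true, z_est):
    """Mean squared error between the original HR image z_true and the
    estimate z_est, used for the quantitative comparisons (cf. eq. (16);
    a plain per-pixel mean is assumed here)."""
    z_true = np.asarray(z_true, dtype=float)
    z_est = np.asarray(z_est, dtype=float)
    return float(np.mean((z_true - z_est) ** 2))
```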

A. Experimental Results on Initial Estimates

Since we use the initial HR image to estimate the aliasing as well as the IGMRF parameters, it is important to discuss the quality of the learned HR image. In this section, we show the results of the initial estimates obtained using our new learning strategy. We compare these results with the learning-based approach presented in [23], as it also uses a DWT-based learning strategy. In that paper, the authors learn the high-frequency details from a database consisting of HR images and obtain the super-resolution in a regularization framework, using an MRF prior and a wavelet prior; their training images are downloaded from the internet. In our approach, we construct a database of LR images and their HR versions. It is worth mentioning again that our database does not contain simulated images generated by downsampling or upsampling. The construction of this database is a one-time, offline operation. For each scene there are two or three images, depending on the upsampling factor: for an upsampling factor of q = 2, each scene has two images, an LR image and its HR version; for q = 4, there are three images, an LR image and two HR versions of the same scene. All the


images in the database are of real-world scenes. A computer-controlled camera was used to capture the images for the database. In order to avoid motion of the camera while capturing images of a scene at different resolutions, a stable and isolated physical setup was used. The camera was triggered by a MATLAB program at three successive time instances for capturing images at three different resolutions; the resolution setting of the camera was changed by the program before each trigger, and the time between two successive triggers was less than a millisecond. The images of live subjects were captured under a controlled environment, and we assume that the motion of subjects such as humans and other moving objects during one thousandth of a second is negligible. We applied a mean correction to compensate for the intensity variations among the images of each scene. Once the database is ready, it can be used for super-resolving images captured by the same camera or by a different LR camera. Our database consists of images of 750 scenes, including indoor as well as outdoor scenes captured at different times. The LR and HR images are of sizes 64×64, 128×128, and 256×256, respectively, and the test image (to be super-resolved) is of size 64×64. It may be noted that the size of the test image needs to be an integer power of 2 and should not exceed that of the LR training images. Since we show the results for q = 2 and q = 4, we make use of 750 × 2 = 1500 and 750 × 3 = 2250 images, respectively, where the multiplication factors 2 and 3 correspond to the number of images per scene. Fig. 5 shows randomly selected training images from the database; we make use of the images in columns (a) and (b) while learning for q = 2, and the images in columns (b) and (c) for q = 4. The gray scale test images used in the experiments are displayed in Fig. 6. These test images are made up of different textures and contain sharp edges as well as smooth areas. The size of test images Image 1, Image 2, and Image 3 is 64×64, and that of Image 4 and Image 5 is 128×128.

Figs. 7 and 8 show the results of the proposed learning technique for the upsampling factor q = 2. Fig. 7 displays results for the test images of size 64×64: Fig. 7(a) shows the ground truth images, Fig. 7(b) the images expanded using bicubic interpolation, and Fig. 7(c) and (d) the images upsampled using the approach of Jiji et al. [23] and the proposed learning technique, respectively. Similar results for the test images of size 128×128 are shown in Fig. 8. It can be seen that the characters on the keys in Image 1 and Image 4 upsampled using the proposed approach are clearly visible; similarly, the window panes in Image 2 and Image 5 in Figs. 7(d) and 8(d) appear sharp. Next, we show the results of the proposed learning technique for the upsampling factor q = 4. We use Image 1, Image 2, and Image 3 displayed in Fig. 6 as the test images and obtain initial HR estimates of size 256×256. The results are shown in Fig. 9, where the first column shows the images expanded using bicubic interpolation, and columns (b) and (c) show the images learned using the approach of Jiji et al. and the proposed approach, respectively. Comparison of these images shows that the images learned using the proposed technique exhibit sharper edges and better texture. This is expected, as we learn the edges using a data set

Fig. 5. Randomly selected sets of training images in the database. (a) LR images, (b) HR images with an upsampling (decimation) factor of 2 (q = 2), and (c) HR images with an upsampling factor of 4 (q = 4).

Fig. 6. LR observed images (test images). The size of Image 1, Image 2, and Image 3 is 64×64 and that of Image 4 and Image 5 is 128×128.

that has true LR-HR image pairs. The boundary regions in the images are not reconstructed well, as we assume zero values for the pixels outside the image; this can be reduced by mirroring the boundaries prior to processing. The quantitative comparison of these images is given in Table I, where it can be clearly observed that the MSE for the proposed learning approach is lower than that for the approach of Jiji et al. [23]. Table II shows the computational complexity of the proposed learning algorithm in terms of the time required for learning the initial HR estimates for q = 2 and q = 4. From the values given in the table, we observe that the learning time for q = 4 is higher than that for q = 2, because we use a two-step operation for q = 4 while learning. A one-step operation would reduce the computation time but, as already pointed out, suffers from exponential error propagation during learning; thus, we trade speed for better accuracy.

B. Experimental Results on Super-Resolution

Let us now see how well we can super-resolve a given LR observation. First, we present the results of super-resolution for gray scale images. The observed images are super-resolved using the proposed regularization framework formulated as an MAP estimate. The regularization makes use of the decimation


Fig. 7. Results of learning the initial HR estimates for the test images of size 64×64 shown in Fig. 6 (q = 2). (a) Ground truth images, (b) images expanded using bicubic interpolation, (c) initial HR estimates obtained using the approach proposed in [23], and (d) initial HR estimates obtained using the proposed learning technique.

Fig. 8. Results of learning the initial HR estimates for the test images of size 128×128 shown in Fig. 6 (q = 2). (a) Ground truth images, (b) images expanded using bicubic interpolation, (c) initial HR estimates obtained using the approach proposed in [23], and (d) initial HR estimates obtained using the proposed learning technique.

matrix entries and the IGMRF parameters, both of which are estimated using the learned initial HR estimate. We use the same LR observations as in the previous experiment. Once again, the results are compared with the approach proposed in [23]. A comparison is also shown with the use of the bicubically interpolated image as the initial HR estimate; in that case, the decimation and the IGMRF parameters are estimated from the interpolated image. Fig. 10 shows the results for the upsampling factors q = 2 and q = 4: Fig. 10(a)-(c) shows the results for q = 2, and Fig. 10(d)-(f) those for q = 4. Fig. 10(a) and (d) shows the super-resolved images using the MRF and wavelet priors [23]. In Fig. 10(b)

Fig. 9. Results of learning the initial HR estimates for the test images of size 64×64 shown in Fig. 6 (q = 4). (a) Images expanded using bicubic interpolation, (b) initial HR estimates obtained using the approach proposed in [23], and (c) initial HR estimates obtained using the proposed wavelet-based learning technique.

TABLE I: MEAN SQUARED ERROR COMPARISON FOR THE INITIAL HR ESTIMATE OBTAINED USING DIFFERENT TECHNIQUES

TABLE II: COMPUTATIONAL COMPLEXITY OF THE PROPOSED ALGORITHM IN TERMS OF THE TIME REQUIRED FOR LEARNING THE INITIAL HR ESTIMATE FOR q = 2 AND q = 4

and (e), we display the images upsampled using bicubic interpolation as the initial HR estimate, while Fig. 10(c) and (f) correspond to the results obtained using the learned image as the initial HR estimate. We can observe that in Image 1, the characters on the keyboard are clearly visible in Fig. 10(c) and (f). The window grills are well preserved in Image 2 for the proposed method. Similarly, we can see improvements in Image 3 with the proposed method: the eye and mouth regions have better details, and the text behind the person is clearly readable. It can be seen from the comparison of the images in Fig. 10(b) and (c), as well as those in Fig. 10(e) and (f), that the


Fig. 10. Results for gray scale images: (a)-(c) for q = 2 and (d)-(f) for q = 4. Images in (a) and (d) correspond to super-resolved (SR) images using the MRF prior and wavelet prior as in [23]. Images in (b) and (e) correspond to upsampled images obtained during regularization using the bicubically interpolated image as the initial HR estimate. Images in (c) and (f) correspond to super-resolved (SR) images using the proposed approach.

learned initial HR estimate produces a better final solution than the bicubically interpolated image used as the initial estimate. This is because the decimation entries and the IGMRF parameters estimated from the learned initial HR estimate are closer to the true values than those estimated from the bicubically interpolated initial estimate. The comparison of the performance in terms of the mean squared error is shown in Table III: the MSE using the proposed learning-based approach is lower than that obtained using the bicubic interpolation as the initial estimate, as well as that of the method proposed in [23].

We now show the experiments on color image super-resolution. As discussed earlier, the luminance component of the color image is super-resolved using the proposed learning-based method, and the chrominance components are interpolated in the wavelet domain. The super-resolved color image is then obtained by applying the color space transformation to the super-resolved luminance component and the interpolated chrominance components. The results are compared with bicubic interpolation, the approach proposed in [23], and the single-frame super-resolution approach proposed by Kim and Kwon [17]. In the latter method, the database consists of HR images and their downsampled versions.

TABLE III: PERFORMANCE COMPARISON IN TERMS OF MEAN SQUARED ERROR FOR THE GRAY SCALE IMAGE SUPER-RESOLUTION

Fig. 11. LR observed color images. Image 1, Image 2, and Image 3 correspond to LR database images, while Image 4 is captured using a different LR camera.

For this experiment, we show the results for an upsampling factor of 4 only. Fig. 11 shows the observed color images. Image 1, Image 2, and Image 3 are chosen as test images from the database; hence, the true HR images are available for comparison. Image 4 is captured using an LR camera other than the one used to capture the images for the database. Fig. 12(a) shows the images obtained using bicubic expansion, and Fig. 12(b) shows the super-resolved images obtained using the MRF and wavelet priors proposed in [23]. The results of super-resolution using the method proposed in [17] are shown in Fig. 12(c). Finally, in Fig. 12(d), we display the results obtained using our approach. Comparison of the figures shows clearer details in the super-resolved images using the proposed approach. It can be seen from Fig. 12(d) that the edges of the window panes in Image 1 are well preserved by the proposed approach. Once again, as in the gray scale super-resolution, the text behind the person in Image 2 is clearly readable compared to that in Fig. 12(a)-(c); in the same image, the spectacle glasses and the lips also look clearer. In Image 3, the eyes and nostrils look sharper in Fig. 12(d), and smooth regions such as the cheeks look better. As mentioned earlier, Image 4 is captured using an LR camera other than the one used to construct the database; in this image, the horizontal and vertical bars of the gate appear clearer than in Fig. 12(a)-(c). From this result, it can be seen that the algorithm works for images captured using different cameras as well. In each of the images super-resolved using the proposed method, the natural correspondence between the color components is preserved and there are no color artifacts. The comparison using the mean squared error for color super-resolution is shown in Table IV; the super-resolved images using the proposed technique have MSE comparable to the other methods.

IX. CONCLUSION

We have presented a new approach to super-resolve an image using a single observation. The missing high-frequency details are learned from a database consisting of LR images and their HR versions, all captured by varying the resolution settings of a


Fig. 12. Results on the color images for q = 4. Images in each column correspond to (a) images expanded using bicubic interpolation, (b) super-resolved (SR) images using the MRF prior and wavelet prior as in [23], (c) SR images using the example-based single-image super-resolution proposed by Kim and Kwon in [17], and (d) SR images using the proposed method.

TABLE IV: PERFORMANCE COMPARISON IN TERMS OF MEAN SQUARED ERROR FOR THE COLOR SUPER-RESOLUTION (q = 4). WE HAVE NOT SHOWN THE MSE COMPARISON FOR IMAGE 4 BECAUSE THE ACTUAL HR IMAGE IS NOT AVAILABLE

real camera. A DWT-based learning is used to obtain an initial HR estimate, and the super-resolved image is obtained using an MAP estimate with an inhomogeneous Gaussian MRF model as the prior. Both the model parameters and the decimation are estimated using the learned HR estimate. We have extended the algorithm to super-resolve color images, where the luminance component is super-resolved using the proposed technique and the chrominance components are interpolated using the wavelet transform. The super-resolved gray scale as well as color images obtained using the proposed method are less noisy in constant areas and preserve the textures and sharp details in other regions.

ACKNOWLEDGMENT

The authors would like to thank the anonymous reviewers and the editor for their constructive suggestions that have greatly improved the contents of the paper. P. P. Gajjar would like to thank Prof. J. Zhang of Fudan University, Shanghai, China, and Kim and Kwon for their code used in comparing the results.

REFERENCES

[1] R. Y. Tsai and T. S. Huang, “Multiframe image restoration and registration,” Adv. Comput. Vis. Image Process., pp. 317–339, 1984.

[2] R. C. Hardie, K. J. Barnard, and E. E. Armstrong, “Joint MAP registration and high-resolution image estimation using a sequence of undersampled images,” IEEE Trans. Image Process., vol. 6, pp. 1621–1633, 1997.

[3] M. Elad and A. Feuer, “Restoration of a single super-resolution image from several blurred, noisy and undersampled measured images,” IEEE Trans. Image Process., vol. 6, pp. 1646–1658, 1997.

[4] D. Capel and A. Zisserman, “Automated mosaicing with super-resolution zoom,” in Proc. IEEE Int. Conf. Computer Vision and Pattern Recognition, 1998, pp. 885–891.

[5] H. Shen, L. Zhang, B. Huang, and P. Li, “A MAP approach for joint motion estimation, segmentation, and super resolution,” IEEE Trans. Image Process., vol. 16, pp. 479–490, 2007.

[6] B. K. Gunturk and M. Gevrekci, “High-resolution image reconstruction from multiple differently exposed images,” IEEE Signal Process. Lett., vol. 13, no. 4, pp. 197–200, Apr. 2006.

[7] K. V. Suresh, G. M. Kumar, and A. N. Rajagopalan, “Superresolution of license plates in real traffic videos,” IEEE Trans. Intell. Transport. Syst., vol. 8, no. 2, pp. 321–331, Feb. 2007.

[8] S. Farsiu, D. Robinson, M. Elad, and P. Milanfar, “Advances and challenges in super-resolution,” Int. J. Imag. Syst. Technol., vol. 14, pp. 47–57, 2004.

[9] S. Farsiu, M. Elad, and P. Milanfar, “Multi-frame demosaicing and super-resolution of color images,” IEEE Trans. Image Process., vol. 15, no. 1, pp. 141–159, Jan. 2006.

[10] W. T. Freeman, T. R. Jones, and E. C. Pasztor, “Example-based super-resolution,” IEEE Comput. Graph. Appl., vol. 22, no. 2, pp. 56–65, Feb. 2002.

[11] R. Shyamsundar, M. D. Gupta, N. Petrovic, and T. S. Huang, “Learning-based nonparametric image super-resolution,” EURASIP J. Appl. Signal Process., pp. 1–11, 2006.

[12] J. Sun, N. Zheng, H. Tao, and H. Shum, “Image hallucination with primal sketch priors,” in Proc. IEEE Int. Conf. Computer Vision and Pattern Recognition, 2003, vol. II, pp. 729–736.

[13] S. Baker and T. Kanade, “Limits on super-resolution and how to break them,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 9, pp. 1167–1183, Sep. 2002.

[14] A. Chakrabarti, A. N. Rajagopalan, and R. Chellappa, “Super-resolution of face images using kernel PCA-based prior,” IEEE Trans. Multimedia, vol. 9, pp. 888–892, 2007.

[15] K. Ni and T. Q. Nguyen, “Image superresolution using support vector regression,” IEEE Trans. Image Process., vol. 16, pp. 1596–1610, 2007.

[16] W. T. Freeman, E. C. Pasztor, and O. T. Carmichael, “Learning low-level vision,” Int. J. Comput. Vis., vol. 40, no. 1, pp. 25–47, 2000.

[17] K. I. Kim and Y. Kwon, “Example-based learning for single-image super-resolution,” in Proc. 30th Annu. Symp. Deutsche Arbeitsgemeinschaft für Mustererkennung, 2008, pp. 456–465.

[18] F. Brandi, R. de Queiroz, and D. Mukherjee, “Super resolution of video using key frames,” in Proc. IEEE Int. Symp. Circuits Syst., 2008, pp. 1608–1611.

[19] Y. W. Tai, W. S. Tong, and C. K. Tang, “Perceptually-inspired and edge-directed color image super-resolution,” in Proc. IEEE Int. Conf. Computer Vision and Pattern Recognition, 2006, vol. 2, pp. 1948–1955.

[20] C. Liu, H. Shum, and C. Zhang, “A two-step approach to hallucinating faces: Global parametric model and local nonparametric model,” in Proc. IEEE Int. Conf. Computer Vision and Pattern Recognition, 2001, pp. 192–198.

[21] T. M. Chan and J. P. Zhang, “An improved super-resolution with manifold learning and histogram matching,” in Proc. IAPR Int. Conf. Biometrics, 2006, pp. 756–762.

[22] T. M. Chan, J. P. Zhang, J. Pu, and H. Huang, “Neighbor embedding based super-resolution algorithm through edge detection and feature selection,” Pattern Recognit. Lett., vol. 30, no. 5, pp. 494–502, 2009.

[23] C. V. Jiji, M. V. Joshi, and S. Chaudhuri, “Single-frame image superresolution using learned wavelet coefficients,” Int. J. Imag. Syst. Technol., vol. 14, no. 3, pp. 105–112, 2004.

[24] P. P. Gajjar and M. V. Joshi, “Single frame super-resolution: A new learning based approach and use of IGMRF prior,” in Proc. Indian Conf. Computer Vision, Graphics and Image Processing, 2008, pp. 636–643.

[25] J. M. Shapiro, “Embedded image coding using zerotrees of wavelet coefficients,” IEEE Trans. Signal Process., vol. 41, pp. 3445–3462, 1993.

[26] S. G. Mallat and W. Hwang, “Singularity detection and processing with wavelets,” IEEE Trans. Inf. Theory, vol. 38, pp. 617–643, 1992.

[27] S. G. Mallat and S. Zhong, “Characterization of signals from multiscale edges,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 14, no. 7, pp. 710–732, Jul. 1992.

[28] R. R. Schultz and R. L. Stevenson, “A Bayesian approach to image expansion for improved definition,” IEEE Trans. Image Process., vol. 3, pp. 233–242, 1994.

[29] S. Geman and D. Geman, “Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images,” IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-6, no. 6, pp. 721–741, 1984.

[30] M. V. Joshi, L. Bruzzone, and S. Chaudhuri, “Model-based approach to multiresolution fusion in remotely sensed images,” IEEE Trans. Geosci. Remote Sens., vol. 44, no. 9, pp. 2549–2562, Sep. 2006.

[31] A. Jalobeanu, L. Blanc-Féraud, and J. Zerubia, “An adaptive Gaussian model for satellite image deblurring,” IEEE Trans. Image Process., vol. 13, pp. 613–621, 2004.

[32] S. Chaudhuri, Ed., Super-Resolution Imaging. Norwell, MA: Kluwer, 2001, ch. 2.

Prakash P. Gajjar received the B.E. degree from Sardar Patel University, Vallabh Vidyanagar, India, and the M.E. degree from the Dharmasinh Desai Institute of Technology, Nadiad, India, in 1991 and 2003, respectively. He is currently pursuing the Ph.D. degree at the Dhirubhai Ambani Institute of Information and Communication Technology (DA-IICT), Gandhinagar, India.

At present, he is a senior lecturer at Government Polytechnic, Ahmedabad, India.

Manjunath V. Joshi (M’06) was born in Ranebennur, Karnataka, India. He received the B.E. degree from Mysore University and the M.Tech. and Ph.D. degrees from IIT Bombay, India.

At present, he is a Professor at the Dhirubhai Ambani Institute of Information and Communication Technology (DA-IICT), Gandhinagar, India. He has been involved in active research in the areas of signal processing, image processing, and computer vision for the last five years. He has coauthored a book entitled Motion-Free Super Resolution (New York: Springer).

Dr. Joshi received the outstanding researcher award in the Engineering Section from the Research Scholars Forum of IIT Bombay, India. He also received the Best Ph.D. Thesis Award from Infineon India, and the Dr. Vikram Sarabhai Award for the year 2006–2007 in the field of Information Technology, constituted by the Government of Gujarat, India. He serves as a reviewer for many international conferences and journals in his areas of expertise. He is a life member of the IETE (India).