
Example-Based Super-Resolution


William T. Freeman, Thouis R. Jones, and Egon C. Pasztor
Mitsubishi Electric Research Labs

To address the lack of resolution independence in most models, we developed a fast and simple one-pass, training-based super-resolution algorithm for creating plausible high-frequency details in zoomed images.

Polygon-based representations of 3D objects offer resolution independence over a wide range of scales. With this approach, object boundaries remain sharp when we zoom in on an object until very close range, where faceting appears due to finite polygon size (see Figure 1). However, constructing polygon models for complex, real-world objects can be difficult. Image-based rendering (IBR), a complementary approach for representing and rendering objects, uses cameras to obtain rich models directly from real-world data. Unfortunately, these representations no longer have resolution independence. When we enlarge a bitmapped image, we get a blurry result. Figure 2 shows the problem for an IBR version of a teapot image, rich with real-world detail. Standard pixel interpolation methods, such as pixel replication (Figures 2b and 2c) and cubic-spline interpolation (Figures 2d and 2e), introduce artifacts or blur edges. For images enlarged three octaves (factors of two) such as these, sharpening the interpolated result has little useful effect (Figures 2f and 2g).

1 (a) When we model an object with traditional polygon techniques, it lacks some of the richness of real-world objects but behaves properly under enlargement. (b) The teapot’s edge remains sharp when we enlarge it.

We call methods for achieving high-resolution enlargements of pixel-based images super-resolution algorithms. Many applications in graphics or image processing could benefit from such resolution independence, including IBR, texture mapping, enlarging consumer photographs, and converting NTSC video content to high-definition television. We built on another training-based super-resolution algorithm1 and developed a faster and simpler algorithm for one-pass super-resolution. (The one-pass, example-based algorithm gives the enlargements in Figures 2h and 2i.) Our algorithm requires only a nearest-neighbor search in the training set for a vector derived from each patch of local image data. This one-pass super-resolution algorithm is a step toward achieving resolution independence in image-based representations. We don’t expect perfect resolution independence—even the polygon representation doesn’t have that—but increasing the resolution independence of pixel-based representations is an important task for IBR.

Example-based approaches
Super-resolution relates to image interpolation—how should we interpolate between the digital samples of a photograph? Researchers have long studied this problem, although only recently with machine learning or sampling approaches. (See the “Related Approaches” sidebar for more details.)

Three complementary ways exist for increasing an image’s apparent resolution:




■ Sharpening by amplifying existing image details. This is the change in the spatial-frequency amplitude spectrum of an image associated with image sharpening. Existing high frequencies in the image are amplified. This is often useful to do, provided noise isn’t amplified.

■ Aggregating from multiple frames. Extracting a single high-resolution frame from a sequence of low-resolution video images adds value and is sometimes referred to as super-resolution.

■ Single-frame super-resolution. The goal of this article is to estimate missing high-resolution detail that isn’t present in the original image, and which we can’t make visible by simple sharpening.

We feel researchers should use each method wherever applicable, but in this article, we focus on single-frame super-resolution. Although integrating resolution information over multiple frames is sometimes called super-resolution, for the purposes of this article, super-resolution will refer to the single-frame enlargement problem.

Because the richness of real-world images is difficult to capture analytically, for the past several years, we’ve been exploring a learning-based approach for enlarging images.1-3 In a training set, the algorithm learns the fine details that correspond to different image regions seen at low resolution and then uses those learned relationships to predict fine details in other images. (Recently, Hertzmann et al.4 also used a training-based method to perform super-resolution, in the context of analogies between images. Baker and Kanade5 focused on enlarging images of a known model class—for example, faces. Liu et al.6 built on their and our work.)


2 (a) An image (100 × 100 pixels) of a real-world teapot shows a richness of texture but yields a blocky or blurred image when we enlarge it by a factor of 8 in each dimension by (b, c) pixel replication or (d, e) cubic-spline interpolation. ((b) through (i) were 32 × 32 pixel original subimages, zoomed by 8 to 256 × 256 images.) Sharpening the cubic-spline interpolation might not help to increase the perceptual sharpness; we used the “sharpen more” option in Adobe Photoshop (f, g). (h, i) The results of our one-pass super-resolution algorithm, maintaining edge and line sharpness as well as inventing plausible texture details.


Related Approaches
The cubic spline1 is a common image interpolation function, but it suffers from blurring edges and image details. Recent attempts to improve on cubic-spline interpolation2-4 have met with limited success. Schreiber3 proposed a sharpened Gaussian interpolator function to minimize information spillover between pixels and optimize flatness in smooth areas. Schultz and Stevenson5 used a Bayesian method for super-resolution, but it hypothesizes rather than learns the prior probability.

These analytic approaches often suffer from perceived loss of detail in textured regions. A proprietary, undisclosed algorithm, Altamira Genuine Fractals 2.0 (an Adobe Photoshop plug-in, http://www.altamira.com), does as well as any of the nontraining-based methods, but can cause blurring in texture regions and at fine lines. Recently, image interpolation-based level-set methods6 have shown excellent results for edges.

References
1. R. Keys, “Cubic Convolution Interpolation for Digital Image Processing,” IEEE Trans. Acoustics, Speech, and Signal Processing, vol. 29, no. 6, 1981, pp. 1153-1160.
2. F. Fekri, R.M. Mersereau, and R.W. Schafer, “A Generalized Interpolative VQ Method for Jointly Optimal Quantization and Interpolation of Images,” Proc. Int’l Conf. Acoustics, Speech, and Signal Processing (ICASSP), vol. 5, IEEE Press, Piscataway, N.J., 1998, pp. 2657-2660.
3. W.F. Schreiber, Fundamentals of Electronic Imaging Systems, Springer-Verlag, New York, 1986.
4. S. Thurnhofer and S. Mitra, “Edge-Enhanced Image Zooming,” Optical Engineering, vol. 35, no. 7, July 1996, pp. 1862-1870.
5. R.R. Schultz and R.L. Stevenson, “A Bayesian Approach to Image Expansion for Improved Definition,” IEEE Trans. Image Processing, vol. 3, no. 3, May 1994, pp. 233-242.
6. B. Morse and D. Schwartzwald, “Image Magnification Using Levelset Reconstruction,” Proc. Int’l Conf. Computer Vision (ICCV), IEEE CS Press, Los Alamitos, Calif., 2001, pp. 333-341.


To understand why this approach should work at all, consider that collections of image pixels are special signals that have much less variability than corresponding sets of completely random variables. Researchers have studied these regularities to account for the early processing stages of the mammalian visual system.7,8

We exploit these regularities in our algorithms as well. We use small pieces of training images, modified for generalization by appropriate preprocessing, to create plausible image information in other images. Without restriction to a specific class of training images, it’s unreasonable to expect to generate the correct high-resolution information. We aim for the more attainable goal of synthesizing visually plausible image details, such as sharp edges, and plausible-looking texture.

Training set generation
To generate our training set, we start from a collection of high-resolution images and degrade each of them in a manner corresponding to the degradation we plan to undo in the images we later process. Typically, we blur and subsample them to create a low-resolution image of one-half the number of original pixels in each dimension (one-quarter the total number of pixels). To change resolution by higher factors, we typically use the single-octave algorithm recursively.

We apply an initial analytic interpolation, such as cubic spline, to the low-resolution image. This generates an image of the desired number of pixels that lacks high-resolution detail. In our training set, we only need to store the differences between the image’s cubic-spline interpolation and the original high-resolution image. Figures 3a and 3c show low- and high-resolution versions of an image. Figure 3b is the initial interpolation (bilinear for this example).
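To make this concrete, here is a minimal sketch of the degradation step in Python for a single-channel image (the language and libraries are our choice for illustration; the blur filter and even-index subsampling come from the “Algorithm details” section later):

```python
import numpy as np
from scipy import ndimage

def downscale(image):
    """Blur with the separable [0.25, 0.5, 0.25] filter, then keep the
    even-index samples, halving the resolution in each dimension."""
    kernel = np.array([0.25, 0.5, 0.25])
    blurred = ndimage.convolve1d(image, kernel, axis=0, mode='reflect')
    blurred = ndimage.convolve1d(blurred, kernel, axis=1, mode='reflect')
    return blurred[::2, ::2]

def training_pair(high_res):
    """Return (initial interpolation, missing detail) for one training image.
    Only the difference between the cubic-spline interpolation and the true
    high-resolution image needs to be stored."""
    low_res = downscale(high_res)
    interpolated = ndimage.zoom(low_res, 2.0, order=3)  # cubic spline
    interpolated = interpolated[:high_res.shape[0], :high_res.shape[1]]
    return interpolated, high_res - interpolated
```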

We want to store the high-resolution patch corresponding to every possible low-resolution image patch; these patches are typically 5 × 5 and 7 × 7 pixels, respectively. Even restricting ourselves to plausible image information, this is a huge amount of information to store, so we must preprocess the images to remove variability and make the training sets as generally applicable as possible.

We believe that the highest spatial-frequency components of the low-resolution image (Figure 3b) are most important in predicting the extra details in Figure 3c. We filter out the lowest frequency components in Figure 3b so that we don’t have to store example patches for all possible lowest frequency component values. We also believe that the relationship between high- and low-resolution image patches is essentially independent of local image contrast. We don’t want to have to store examples of that underlying relationship for all possible values of the local image contrast. Therefore, we apply a local contrast normalization, which we describe later in the “Prediction” section. In Figures 3d and 3e, we used the resulting band-pass filtered and contrast normalized image pairs for training. We undo the contrast normalization step when we reconstruct the high-resolution image.
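The preprocessing can be sketched the same way; the Gaussian band-pass filter and the ε value below are assumptions, since the article doesn’t pin them down:

```python
import numpy as np
from scipy import ndimage

def highest_frequencies(interpolated, sigma=2.0):
    """Drop the lowest frequency components of the initial interpolation by
    subtracting a Gaussian-blurred copy (the filter choice is an assumption)."""
    return interpolated - ndimage.gaussian_filter(interpolated, sigma)

def contrast_normalize(low_patch, high_patch, eps=0.01):
    """Normalize a patch pair by the mean absolute value of the low-frequency
    patch; eps is the contrast floor described in the 'Prediction' section.
    Returns the normalized pair and the norm needed to undo the step."""
    norm = np.mean(np.abs(low_patch)) + eps
    return low_patch / norm, high_patch / norm, norm
```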

Super-resolution algorithms
If local image information alone were sufficient to predict the missing high-resolution details, we would be able to use the training set patches by themselves for super-resolution. For a given input image we want to enlarge, we would apply the preprocessing steps, break the image into patches, and look up the missing high-resolution detail. Unfortunately, that approach doesn’t work, as Figure 4a illustrates. The resulting high-resolution detail image looks like oatmeal. The local patch alone isn’t sufficient to estimate plausible looking high-resolution detail.

Figure 4b illustrates why the local method doesn’t work. For a given low-resolution input patch, we searched a typical training database of approximately 100,000 patches to find the 16 closest examples to the input patch (see the second line in Figure 4b). Each of these looks fairly similar to the input patch. The bottom row shows the high-resolution detail corresponding to each of these training examples; each of those looks fairly different from the others.

3 Image preprocessing steps for training images. (a) We start from a low-resolution image and (c) its corresponding high-resolution source. (b) We form an initial interpolation of the low-resolution image to the higher pixel sampling resolution. In the training set, we store corresponding pairs of patches from (d) and (e), which are the band-pass or high-pass filtered and contrast normalized versions of (b) and (c), respectively. This processing allows the same training examples to apply in different image contrasts and low-frequency offsets.

This illustrates that local patch information alone is insufficient for super-resolution; we must also take into account spatial neighborhood effects.

We explored two different approaches to exploit neighborhood relationships in super-resolution algorithms. The first uses a Markov network to probabilistically model the relationships between high- and low-resolution patches, and between neighboring high-resolution patches.1-3 It uses an iterative algorithm, which usually converges quickly. The second approach, which we describe in detail in this article, is a one-pass algorithm that uses the same local relationship information as the Markov network. It’s a fast, approximate solution to the Markov network.

Markov network
We model the spatial relationships between patches using a Markov network, which has many well-known uses in image processing.9 In Figure 5, the circles represent network nodes, and the lines indicate statistical dependencies between nodes. We let the low-resolution image patches be observation nodes, y. We select the 16 or so closest examples to each input patch as the different states of the hidden nodes, x, that we seek to estimate. For this network, the probability of any given high-resolution patch choice for each node is proportional to the product of all sets of compatibility matrices ψ relating the possible states of each pair of neighboring hidden nodes, and vectors φ relating each observation to the underlying hidden states:

\[ P(x \mid y) = \frac{1}{Z} \prod_{(i,j)} \psi_{ij}(x_i, x_j) \prod_i \phi_i(x_i, y_i) \tag{1} \]

Z is a normalization constant, and the first product is over all neighboring pairs of nodes, i and j. Here, yi and xi are the observed low-resolution and estimated high-resolution patches at node i, respectively.
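In code form, equation 1 amounts to scoring one joint choice of candidate patches; psi and phi stand for the compatibility functions defined next (a direct transcription, for illustration only):

```python
import numpy as np

def log_posterior(x, y, psi, phi, edges):
    """Unnormalized log of equation 1. x[i] is the chosen high-resolution
    patch at node i, y[i] the observed low-resolution patch; edges lists the
    neighboring node pairs (i, j). The constant log Z is omitted."""
    pairwise = sum(np.log(psi(x[i], x[j])) for i, j in edges)
    unary = sum(np.log(phi(x[i], y[i])) for i in range(len(x)))
    return pairwise + unary
```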

4 (a) Estimated high frequencies for the tiger image (Figure 3e shows the true high frequencies) formed by substituting the high frequencies of the closest training patch to Figure 3d. The lack of a recognizable image indicates that an algorithm using only local low-resolution information is insufficient; we must also use spatial context. (b) An input patch and similar low-resolution (middle rows) and paired high-resolution (bottom rows) patches. For many of these similar low-resolution patches, the high-resolution patches are different, reinforcing the lesson from (a).

5 Markov network model for the super-resolution problem. The low-resolution patches at each node yi are the observed input. The high-resolution patch at each node xi is the quantity we want to estimate.

To specify the Markov network’s ψij(xi, xj) functions, we use a simple trick.1 We sample the input image’s nodes so that the high-resolution patches overlap with each other by one or more pixels. In the overlap region, the pixel values of compatible neighboring patches should agree. We measure dij(xi, xj), the sum of squared differences between patch candidates xi and xj in their overlap regions at nodes i and j. The compatibility matrix between nodes i and j is then

\[ \psi_{ij}(x_i, x_j) = \exp\!\left( -\frac{d_{ij}(x_i, x_j)}{2\sigma^2} \right) \]

where σ is a noise parameter. We use a similar quadratic penalty on differences between the observed low-resolution image patch, yi, and the candidate low-resolution patch found from the training set, xi, to specify the Markov network compatibility function, φi(xi, yi).
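Both compatibilities are thus Gaussian penalties on squared differences; in the sketch below, the overlap indexing and the σ values are illustrative placeholders:

```python
import numpy as np

def psi(patch_i, patch_j, overlap_i, overlap_j, sigma=1.0):
    """Pairwise compatibility: overlap_i and overlap_j index the pixels where
    two neighboring high-resolution patch candidates overlap."""
    d = np.sum((patch_i[overlap_i] - patch_j[overlap_j]) ** 2)
    return np.exp(-d / (2.0 * sigma ** 2))

def phi(candidate_low, observed_low, sigma=1.0):
    """Observation compatibility: the analogous penalty between the observed
    low-resolution patch and a candidate's low-resolution patch."""
    d = np.sum((candidate_low - observed_low) ** 2)
    return np.exp(-d / (2.0 * sigma ** 2))
```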

The optimal high-resolution patches at each node are the collection that maximizes the Markov network’s probability. Finding the exact solution can be computationally intractable, but we’ve found good results using the approximate solution obtained by running a fast, iterative algorithm called belief propagation. Typically, three or four iterations of the algorithm are sufficient (see Figure 6).

The belief-propagation algorithm updates “messages,” mij, from node i to node j, which are vectors of the dimensionality of the state we estimate at node j. For example, with Figure 4b, the incoming messages would have dimension 16—one component to modify the probability of each candidate high-resolution patch. Using mij(xj) to indicate the component of the vector mij corresponding to the patch candidate xj, the rule1,10 for updating the message from node i to node j is

\[ m_{ij}(x_j) = \sum_{x_i} \psi_{ij}(x_i, x_j)\, \phi_i(x_i, y_i) \prod_{k \neq j} m_{ki}(x_i) \tag{2} \]

The sum is over all patch candidates xi at node i, and the product is over all neighbors of node i except for node j. Upon convergence, the belief-propagation estimate of the marginal probability bi for each high-resolution patch xi at each node i is

\[ b_i(x_i) = \phi_i(x_i, y_i) \prod_k m_{ki}(x_i) \tag{3} \]

(Yedidia, Freeman, and Weiss11 show the connection between this estimate and an approximation used in physics due to Bethe. Freeman, Pasztor, and Carmichael1 provide details of the belief-propagation implementation.)


6 Belief-propagation solution to the Markov network for super-resolution. (a, b, c) Estimated high frequencies after 0, 1, and 3 iterations of belief propagation. (d) Estimated full-resolution image. We applied the inverse of the contrast normalization used in Figure 3d to (c) and added the result to Figure 3b to obtain (d). The training set for this image was two categories of the Corel database, including other tigers, but not this image.1

7 Block diagram showing raster-order per-patch processing. At each step, we use local low- and high-frequency details (in green and red, respectively) to search the training set for a new high-frequency patch, which we add to the high-frequency image.

One-pass algorithm
The fact that belief propagation converged to a solution of the Markov network so quickly led us to believe that simpler machinery might suffice. We found a one-pass algorithm that gives results that are nearly as good as the iterative solution to the Markov network.

In the one-pass algorithm, we only compute high-resolution patch compatibilities for neighboring high-resolution patches that are already selected, typically the patches above and to the left, in raster-scan order processing. If we prestructure the training data properly (see Figure 7), we can match the local low-resolution image data and select the compatible high-resolution patch candidate in a single operation—finding the nearest neighbor to a given input vector in the training set. The simplification avoids various steps in setting up and solving the Markov random field (MRF) of the previous section: finding the candidate set at each node, finding the compatibility matrices between all pairs of nodes, and using the iterative belief-propagation algorithm. Figure 8 shows a section of an image enlarged by both methods. We find that the one-pass algorithm is of approximately the same quality as the MRF-based algorithm for this problem.

Algorithm details
In the simplest terms, one-pass super-resolution generates the missing high-frequency content of a zoomed image as a sequence of predictions from local image information. We subdivide the input image into low-frequency patches that are traversed in raster-scan order. At each step, a high-frequency patch is selected by a nearest-neighbor search from the training set based on the local low-frequency details and adjacent, previously determined high-frequency patches.

As we already mentioned, it takes two steps to create an enlarged image with the desired number of pixels and corresponding additional image details. First, we double the number of pixels in the image, using a conventional image interpolation method such as cubic-spline or bilinear interpolation. Then, we predict missing image details in the interpolated image to create the super-resolution output.

In the algorithms we describe next, we perform the initial interpolation via cubic-spline interpolation. We scale an image down by convolving with a [0.25 0.5 0.25] blurring filter followed by subsampling on the even indices. (Freeman, Pasztor, and Carmichael1 use linear interpolation for the upsampling, which puts slightly more interpolation burden on the rest of the algorithm.)
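Putting the two steps together, a sketch of the zoom driver (predict_high_freq is the one-pass predictor sketched in the “Prediction” section below):

```python
from scipy import ndimage

def zoom_one_octave(image, predict_high_freq):
    """Cubic-spline interpolation to twice the pixels in each dimension,
    plus the predicted missing high frequencies."""
    interpolated = ndimage.zoom(image, 2.0, order=3)
    return interpolated + predict_high_freq(interpolated)

def zoom(image, predict_high_freq, octaves=3):
    """Higher zoom factors apply the single-octave algorithm recursively."""
    for _ in range(octaves):
        image = zoom_one_octave(image, predict_high_freq)
    return image
```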

Prediction
Given the highest frequencies in an input image, the super-resolution algorithm predicts the next octave up—that is, the frequencies missing from an image zoomed with cubic interpolation. The algorithm’s output is the sum of its input and the high-frequency predictions.

We predict the high frequencies for the N × N pixel patches in raster-scan order. Each prediction is based on two competing requirements. First, the high-frequency patch should come from a location in the training image that has a similar low-frequency appearance. Second, the high-frequency prediction should agree at the edges of the patch with the overlapping pixels of its neighbors to ensure that the high-frequency predictions are compatible with those of the neighboring patches.

We fulfill the first requirement by extracting a low-frequency patch—M × M, not necessarily the same size as the high-frequency patch—from the image we’re looking at to find a match in the training set, which is made up of pairs of low- and high-frequency patches. To meet the second requirement, we overlap predicted patches at their borders. When searching the training set, we also use the high-frequency data previously predicted to select the best pair. A user-controlled weighting factor α adjusts the relative importance of matching the low-frequency patch versus matching the neighboring high-frequency patches in the overlap regions.

The super-resolution algorithm operates under the assumption that the predictive relationship between low- and high-resolution images is independent of local image contrast. Because of this, we normalize patch pairs by the average absolute value of the low-frequency patch across the color channels. (We add a small ε to avoid the denominator becoming zero at very low contrasts; ε effectively defines a floor of local image contrast below which we assume patch variability is due to noise.)

The pixels in the low-frequency patch and the high-frequency overlap are concatenated to form a search vector. The training set is also stored as a set of such vectors, so we search for a match by finding the nearest neighbor in the training set. When we find a match, we reverse the contrast normalization on the high-frequency patch and add it to the initial interpolation to obtain the output image (see Figure 7).
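The complete per-patch loop can be sketched as follows; a brute-force linear scan stands in for the tree search we describe next, and the α value is computed from the formula given in “Training set and parameters”:

```python
import numpy as np

def predict_high_freq(low_band, train_vectors, train_high,
                      M=7, N=5, overlap=1, eps=0.01):
    """One-pass prediction for a single-channel image. low_band is the
    band-pass filtered initial interpolation; train_vectors and train_high
    hold the search vectors and high-frequency patches built as in 'Training
    set and parameters'. A linear scan replaces the approximate tree search."""
    alpha = 0.1 * M * M / (2 * N - 1)  # weighting factor (as reconstructed)
    H, W = low_band.shape
    high = np.zeros((H, W))
    padded = np.pad(low_band, M, mode='reflect')
    for r in range(0, H - N + 1, N - overlap):
        for c in range(0, W - N + 1, N - overlap):
            # M x M low-frequency patch centered on the N x N output patch
            cr, cc = r + N // 2 + M, c + N // 2 + M
            lo = padded[cr - M // 2:cr + M // 2 + 1,
                        cc - M // 2:cc + M // 2 + 1]
            # high-frequency pixels already predicted above and to the left
            ov = np.concatenate([high[r:r + overlap, c:c + N].ravel(),
                                 high[r + overlap:r + N, c:c + overlap].ravel()])
            norm = np.mean(np.abs(lo)) + eps  # contrast normalization
            query = np.concatenate([lo.ravel(), alpha * ov]) / norm
            best = np.argmin(((train_vectors - query) ** 2).sum(axis=1))
            # undo the normalization and paste the chosen detail patch
            high[r:r + N, c:c + N] = train_high[best].reshape(N, N) * norm
    return high
```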

Search algorithm
We search for matches using an L2 norm. Due to the high dimension of the search space, finding the absolute best match would be computationally prohibitive. Instead, we use a tree-based, approximate nearest-neighbor search. The tree is built by recursively splitting the training set in the direction of higher variation. At each step, we divide the set of tiles in half to maintain a balanced tree.

We use a best-first tree search to find a good match. This allows for a speed–quality trade-off: by searching more tree branches, we can find a better match. Because best-first search is unlikely to give the true best match without searching most or all of the tree, we improve the best-first match with a greedy downhill search in the graph connecting approximate nearest neighbors in the training set.

8 (a) Teapot image we enlarged by one octave using the belief-propagation method.1 (b) Teapot image we enlarged by one octave using our one-pass method with the search method we describe in the “Search algorithm” section. The output of our simpler algorithm resembles that of the first.

This improves the match with negligible cost. In all one-pass algorithm examples in this article, we connect each patch pair to its 32 approximate nearest neighbors, which we compute with a method similar to Nene and Nayar.12
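A sketch of the search structure follows; a standard k-d tree stands in for our variance-split tree, and because its query here is exact, the greedy graph walk is redundant in this toy version; it becomes worthwhile when the initial match comes from a truncated best-first search:

```python
import numpy as np
from scipy.spatial import cKDTree

class PatchSearcher:
    """Approximate nearest-neighbor search sketch: a k-d tree for the initial
    match, then a greedy downhill walk on a precomputed neighbor graph."""
    def __init__(self, train_vectors, n_links=32):
        self.vectors = train_vectors
        self.tree = cKDTree(train_vectors)
        # offline: link each training vector to its nearest training neighbors
        _, links = self.tree.query(train_vectors, k=n_links + 1)
        self.links = links[:, 1:]  # drop the self-match

    def query(self, v):
        _, best = self.tree.query(v)  # initial match
        best_d = np.sum((self.vectors[best] - v) ** 2)
        improved = True
        while improved:  # greedy walk: move to any closer linked neighbor
            improved = False
            for cand in self.links[best]:
                d = np.sum((self.vectors[cand] - v) ** 2)
                if d < best_d:
                    best, best_d, improved = cand, d, True
        return best
```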

Training set and parameters
We build training sets for the super-resolution algorithm from band-pass and high-pass pairs taken from a set of training images. Spatially corresponding M × M low-frequency and N × N high-frequency patches are taken from image pairs.

Patch pairs are contrast normalized, as we described earlier. We create the search vector for a patch pair by concatenating the low-frequency patch and the region that will be overlapped in the high-frequency patch during the prediction phase, adjusted by the weighting factor α (see Figure 9).
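In sketch form, building the searchable training set from one preprocessed image pair looks like this (patch geometry per the parameters below; the concentric alignment of the two patch sizes is our assumption):

```python
import numpy as np

def training_vectors(low_band, high_band, M=7, N=5, overlap=1, eps=0.01):
    """Build (search vector, high-frequency patch) pairs from one preprocessed
    image pair, sampling at 1-pixel offsets. The stored overlap pixels are the
    ones a raster-order pass will already have predicted: the high-frequency
    patch's top rows and left columns."""
    alpha = 0.1 * M * M / (2 * N - 1)
    vectors, patches = [], []
    for r in range(high_band.shape[0] - M + 1):
        for c in range(high_band.shape[1] - M + 1):
            lo = low_band[r:r + M, c:c + M]
            hr, hc = r + (M - N) // 2, c + (M - N) // 2  # concentric patches
            hi = high_band[hr:hr + N, hc:hc + N]
            ov = np.concatenate([hi[:overlap, :].ravel(),
                                 hi[overlap:, :overlap].ravel()])
            norm = np.mean(np.abs(lo)) + eps
            vectors.append(np.concatenate([lo.ravel(), alpha * ov]) / norm)
            patches.append(hi.ravel() / norm)
    return np.array(vectors), np.array(patches)
```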

We used the same set of training images for all the super-resolution examples in this article (see Figure 9). We took them with a Nikon Coolpix 950 digital camera at 640 × 480 resolution and used the highest quality compression settings.

Paying attention to parameter settings can improve image quality. For both levels of zooming, we used 5 × 5 pixel high-resolution patches (N = 5) with 7 × 7 pixel low-resolution patches (M = 7). The overlap between adjacent high-resolution patches was 1 pixel. These patch sizes capture small details well.

For a more conservative estimate of the higher resolution detail (not used here), we apply the algorithm four times at staggered offsets relative to the patch sampling grid. This gives four independent estimates of the high frequencies, which we can then average together, smoothing some image details but potentially reducing artifacts.
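A sketch of this conservative variant; the particular stagger offsets are our guess, since the article doesn’t list them:

```python
import numpy as np

def staggered_estimate(low_band, predict,
                       shifts=((0, 0), (0, 2), (2, 0), (2, 2))):
    """Average four high-frequency predictions made at staggered offsets of
    the patch sampling grid. predict is the one-pass predictor above; the
    shift amounts are illustrative assumptions."""
    acc = np.zeros_like(low_band)
    for dr, dc in shifts:
        shifted = np.roll(low_band, (-dr, -dc), axis=(0, 1))
        acc += np.roll(predict(shifted), (dr, dc), axis=(0, 1))
    return acc / len(shifts)
```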

The parameter α controls the trade-off between matching the low-resolution patch data and finding a high-resolution patch that is compatible with its neighbors. The value

\[ \alpha = 0.1\,\frac{M^2}{2N - 1} \]

gave good quality results in our experiments. The fraction compensates for the different relative areas of the low-frequency patches and overlapped high-frequency pixels as a function of M and N.

Results
Figure 10 shows our algorithm applied to a man’s face. The training set is from the images in Figure 9.

The resulting zooms are significantly sharper than those from cubic-spline interpolation, preserving sharp edges and image details.

Figure 11 shows an example where our low-level training set alone isn’t enough to distinguish JPEG compression noise from correct image data. The algorithm interprets the artifacts as image data and enhances them. Extensions of specialized high-level models5 might properly handle images like this.

It might seem that to enlarge an image of one class—for example, a flower—we would need a training set that contained images of that same class—for example, other flowers. However, this isn’t the case. Generic images can be a good training set for other generic images. Figure 12 shows an image (blurred and down-sampled from an original high-resolution image) zoomed with the one-pass super-resolution algorithm along with the same image zoomed with cubic spline and the original high-resolution image.


9 Training images we used for the examples in this article (unless otherwise stated). We sampled patches at 1-pixel offsets over each of these images and over their synthetically generated low-resolution counterparts (after preprocessing steps). These six 200 × 200 images yielded a training set of slightly more than 200,000 high- and low-resolution image patch pairs.

10 (a) Original image. (b) Cubic-spline interpolation. (c) One-pass super-resolution interpolation.

11 Failure example. (a) Original image. (b) Cubic-spline interpolation by a factor of 4 in each dimension. Note the visible JPEG compression artifacts. (c) One-pass super-resolution interpolation. Without high-level information, the algorithm treats the JPEG noise as a signal and amplifies it.

12 Example showing how the (one-pass) algorithm uses patches in the training image to create detail in the test image. (a) Test image. (b) Patch matches. (c) Training images with location of best-match patches marked.

Figure 12c shows the images we used from the training set in the super-resolution zoom. Figure 12b shows the details of a few patches in the zoomed image and their corresponding best matches in the training set. Figure 12b’s top and bottom rows show the image content of the patches in the super-resolution image and the training set. The second and fifth rows in Figure 12b show the low-resolution, contrast normalized patches. Figure 12b’s third row shows the high-resolution content of the original high-resolution image, and the fourth shows the high-resolution patch chosen by the super-resolution algorithm. Although not perfect, the matches between the original and estimated high-resolution patches are reasonably good. Note that the algorithm can use training patch examples from source image regions that look different than the regions where they are inserted into the zoomed image. For example, the orange-bordered patch corresponds to a shadow boundary on wood in the training image (of three girls), but the algorithm applies it to zoom up a green plant occlusion boundary. The band-pass filtering and contrast normalization allows for this reuse, which makes the training set more powerful.

Although the training set doesn’t have to be very similar to the image to be enlarged, it should be in the same image class—such as text or color image. In Figure 13, we enlarged the image in Figure 12a using a pathological training set of images of text. (Freeman, Pasztor, and Carmichael1 give related experiments.) The algorithm does its best to explain the observed low-resolution image in its vocabulary of text examples, resulting in a zoom with high-resolution detail formed out of concatenated characters.

Discussion and conclusions
Recent patch-based texture synthesis models13,14 also use spatial consistency constraints similar to those we applied here. Our method differs from that of Hertzmann et al.4 because it operates on tiles rather than per pixel, providing a performance benefit. It also normalizes the training set according to contrast and assumes that the highest frequency details in an image can be predicted using only the next lower octave. These two generalizations let us enlarge a wider class of images using a single, generic training set, rather than restricting us to operating on images that are very similar to the training image. Pentland and Horowitz15 used a training-based approach to super-resolution, but they didn’t use the spatial consistency constraints that we believe are necessary for good image quality.

If well-known objects are sparsely sampled in the image, an image extrapolation based on local image evidence alone won’t produce the new details that the viewer expects. Very small face images are susceptible to this problem. To address these properly, we would have to add higher level reasoning to the algorithm. (See Baker and Kanade5 or Liu, Shum, and Zhang6 for super-resolution algorithms tuned to a particular class of images, such as faces or text.)

In the zoomed-up images, low-contrast details next to high-contrast edges can be lost because of the contrast normalization fixing on the level of the high-contrast edge. Independent contrast normalization for different image orientations, each zoomed separately, might address this problem. However, it isn’t clear that a one-pass implementation would suffice for that modification.

Finally, our algorithm works best when the data’s resolution or noise degradations match those of the images to which it’s applied. Numerically, the root-mean-squared error from the true high frequencies tends to be approximately the same as for the original cubic-spline interpolation. Unfortunately, this metric has only a loose correlation with perceived image quality.16 Typical processing time for the single-pass algorithm is 2 seconds to enlarge a 100 × 100 image up to 200 × 200 pixels.

We’ve focused on enlarging single images.

13 Super-resolution example using a pathological training set composed entirely of text in one font. (a) An example image from the training set. (b) Zoomed image and (c) close-up. The algorithm does its best to invent plausible detail for this image, forming contours by concatenated letters.

Enlarging moving images is different in two respects. More input data exists, so multiple observations of the same pixel could be used for super-resolution. Also, we must take care to ensure coherence across subsequent frames so that the made-up image details don’t scintillate in the moving image.

Our algorithms are an instance of a general training-based approach that can be useful for image-processing or graphics applications. Training sets can help enlarge images, remove noise, estimate 3D surface shapes, and attack other imaging applications. ■

Acknowledgments
We thank Ted Adelson, Owen Carmichael, and John Haddon for helpful discussions.

References
1. W.T. Freeman, E.C. Pasztor, and O.T. Carmichael, “Learning Low-Level Vision,” Int’l J. Computer Vision, vol. 40, no. 1, Oct. 2000, pp. 25-47.
2. W.T. Freeman and E.C. Pasztor, “Learning to Estimate Scenes from Images,” Advances in Neural Information Processing Systems, M.S. Kearns, S.A. Solla, and D.A. Cohn, eds., vol. 11, MIT Press, Cambridge, Mass., 1999, pp. 775-781.
3. W.T. Freeman and E.C. Pasztor, “Markov Networks for Superresolution,” Proc. 34th Ann. Conf. Information Sciences and Systems (CISS 2000), Dept. of Electrical Eng., Princeton Univ., 2000.
4. A. Hertzmann et al., “Image Analogies,” Computer Graphics (Proc. Siggraph 2001), ACM Press, New York, 2001, pp. 327-340.
5. S. Baker and T. Kanade, “Limits on Super-Resolution and How to Break Them,” Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), vol. II, IEEE CS Press, Los Alamitos, Calif., 2000, pp. 372-379.
6. C. Liu, H. Shum, and C. Zhang, “A Two-Step Approach to Hallucinating Faces: Global Parametric Model and Local Non-Parametric Model,” Proc. Int’l Conf. Computer Vision (ICCV), vol. I, IEEE CS Press, Los Alamitos, Calif., 2001, pp. 192-198.
7. D.J. Field, “What Is the Goal of Sensory Coding,” Neural Computation, vol. 6, no. 4, July 1994, pp. 559-601.
8. O. Schwartz and E.P. Simoncelli, “Natural Signal Statistics and Sensory Gain Control,” Nature Neuroscience, vol. 4, no. 8, Aug. 2001, pp. 819-825.
9. S. Geman and D. Geman, “Stochastic Relaxation, Gibbs Distribution, and the Bayesian Restoration of Images,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 6, no. 4, Nov. 1984, pp. 721-741.
10. J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann, San Francisco, 1988.
11. J.S. Yedidia, W.T. Freeman, and Y. Weiss, “Generalized Belief Propagation,” Advances in Neural Information Processing Systems, T.K. Leen, T.G. Dietterich, and V. Tresp, eds., vol. 13, MIT Press, Cambridge, Mass., 2001, pp. 689-695.
12. S.A. Nene and S.K. Nayar, “A Simple Algorithm for Nearest Neighbor Search in High Dimensions,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 19, no. 9, Sept. 1997, pp. 989-1003.
13. A.A. Efros and W.T. Freeman, “Image Quilting for Texture Synthesis and Transfer,” Computer Graphics (Proc. Siggraph 2001), ACM Press, New York, 2001, pp. 341-346.
14. L. Liang et al., “Real-Time Texture Synthesis by Patch-Based Sampling,” ACM Trans. Graphics, vol. 20, no. 3, July 2001, pp. 127-150.
15. A. Pentland and B. Horowitz, “A Practical Approach to Fractal-Based Image Compression,” Digital Images and Human Vision, A.B. Watson, ed., MIT Press, Cambridge, Mass., 1993.
16. W.F. Schreiber, Fundamentals of Electronic Imaging Systems, Springer-Verlag, New York, 1986.

William T. Freeman is an associate professor of computer science at the Massachusetts Institute of Technology in the Artificial Intelligence Lab. His research interests include machine learning applied to computer vision and computer graphics, Bayesian models of visual perception, and interactive applications of computer vision. He has a PhD in computer vision from MIT and worked at Mitsubishi Electric Research Labs for nine years.

Thouis R. Jones is a graduate student in the Computer Graphics Group at the MIT Laboratory for Computer Science. His research interests include shape representation for computer graphics, antialiasing, and super-resolution. He has a BS in computer science from the University of Utah.

Egon C. Pasztor is a graduate student at the MIT Media Lab, where he has worked on computer vision and is currently working on interfaces for computer-assisted musical composition. His research interests include human–computer interfaces and interaction technologies that make machine interaction a more natural and productive experience. He has a BS in computer science from the California Institute of Technology.

Readers may contact William Freeman at the MIT Artificial Intelligence Lab, 200 Technology Square, Cambridge, MA 02139, email [email protected].

For further information on this or any other computing topic, please visit our Digital Library at http://computer.org/publications/dlib.
