
PAPER TO APPEAR IN IEEE TRANSACTIONS ON MULTIMEDIA

Image Retargeting Using Mesh Parametrization

Yanwen Guo, Feng Liu, Jian Shi, Zhi-Hua Zhou, Senior Member, IEEE, and Michael Gleicher

Abstract—Image retargeting aims to adapt images to displays of small sizes and different aspect ratios. Effective retargeting requires emphasizing the important content while retaining surrounding context with minimal visual distortion. In this paper, we present such an effective image retargeting method using saliency-based mesh parametrization. Our method first constructs a mesh image representation that is consistent with the underlying image structures. Such a mesh representation enables easy preservation of image structures during retargeting. Based on this mesh representation, we formulate the problem of retargeting an image to a desired size as a constrained image mesh parametrization problem that aims at finding a homomorphous target mesh with the desired size. Specifically, to emphasize salient objects and minimize visual distortion, we associate image saliency with the image mesh and regard image structure as constraints for mesh parametrization. Through a stretch-based mesh parametrization process we obtain the homomorphous target mesh, which is then used to render the target image by texture mapping. The effectiveness of our algorithm is demonstrated by experiments.

Index Terms—Image retargeting, mesh parametrization, attention model.

1 INTRODUCTION

Mobile devices, such as cellular phones and PDAs, are increasingly common. These devices often have image display functionality. Normally, images have much higher resolutions and different aspect ratios than the small screens of these mobile devices. Those images need to be adapted to fit the target displays. We define the problem of adapting an image to various target screens as image retargeting.

A common solution to image retargeting is to uniformly rescale the original image according to the target screen size. This naive scaling is problematic. The important objects in the image may become too small to be recognized. Moreover, if the original image has a different aspect ratio than the target screen, distortion is introduced into the result. Alternatively, an important image region is cropped and displayed [3], [21], [25]. The disadvantage of cropping is the loss of contextual content, which is important for appreciating the image. When there exist multiple important objects that are far away from each other, cropping inevitably loses some of them in order to keep a necessary resolution for the selected object.

Recent methods achieve focus+context image retargeting by non-uniform warping [12] or segmentation-based image composition [24], [23]. The former suffers from image distortion and the latter is subject to the performance of segmentation. The seam carving method achieves image resizing by iteratively carving out less noticeable seams [2].

Yanwen Guo, Jian Shi, and Zhi-Hua Zhou are with the National Key Lab for Novel Software Technology, Nanjing University, Nanjing 210093, P. R. China. Email: [email protected], [email protected], [email protected]
Feng Liu and Michael Gleicher are with the Department of Computer Sciences, University of Wisconsin-Madison. Email: {fliu, gleicher}@cs.wisc.edu
Manuscript received Sep. 1st, 2008; revised Mar. 2nd, 2009.

Although it tries to remove less noticeable seams, it still produces artifacts such as broken objects since image structures are not explicitly considered. Similarly, the non-homogeneous retargeting method [29] tries to achieve seamless retargeting results without resorting to image segmentation. It, however, fails to take image structures into account, and sometimes breaks important object shapes.

Effective image retargeting should emphasize important content while retaining surrounding context with minimal visual distortion. Given the limited target size and different aspect ratios, we necessarily have to introduce distortion into the target image in order to emphasize important content while retaining the context. The challenge is to minimize the visual distortion. In this paper, we propose to represent the image by a mesh that is consistent with the underlying image structures, and then adapt images via mesh transformation. This mesh representation enables us to easily preserve image structures during retargeting. In addition, adapting images by transforming the mesh ensures a smooth image transformation, which helps to achieve effective retargeting. Specifically, we define image retargeting as a mesh parametrization problem, which aims to find a homomorphous mesh with the target size. We first build a feature-relevant controlling mesh from the source image. To emphasize salient objects, we associate image saliency with the mesh, and accordingly encourage the mesh to morph in such a way that the size of a salient mesh cell (triangle) is reduced less than that of non-salient ones. To minimize visual distortion, our method encodes the preservation of salient objects and image structure as constraints during mesh parametrization. The target mesh is solved using a stretch-minimizing parametrization scheme. The target image is finally rendered by texture mapping.


The main contribution of this paper is an effective image retargeting method with the following benefits.

• Our method emphasizes important image content while retaining the surrounding context with minimal visual distortion. The mesh representation endows salient objects and strong structures with compact representations. Emphasis of salient objects as well as preservation of structures are simultaneously achieved by encoding them as parametrization constraints during retargeting.

• Images with multiple salient objects can be easily retargeted via mesh parametrization. Retargeted positions of salient objects are automatically determined by the mesh parametrization process, rather than being placed beforehand as previous methods have done. Furthermore, exaggerated scales of salient objects can be determined automatically or controlled by the user with a simple parameter setting.

The rest of this paper is organized as follows. We give a brief overview of previous work in Section 2. In Section 3, we describe the retargeting approach using saliency-based mesh parametrization in detail. We evaluate our method in Section 4 and conclude the paper in the last section.

2 RELATED WORK

The problem of retargeting images to small screens has been considered by many researchers. Almost all image browsers on small-screen devices provide scaling functionality: a large image is uniformly downsampled to fit the target screen. Important objects in the image then often become small and difficult to identify. To account for different aspect ratios, either the distortion caused by the aspect ratio change is introduced or black letterboxing is used to fill in the blank space, thus wasting the precious screen space.

Many methods identify an important region in the image and crop this region to fit the target screen. These methods usually rely on image analysis, such as attention model extraction and object detection, to identify the important region [3], [7], [9], [8], [17]. Eye-tracking systems can also be used to infer the important region [21]. After the important region is obtained, it is cropped out [3], [21], [25]. These cropping-based retargeting methods can effectively preserve the important information of the original image, at the expense of losing valuable context information. Moreover, their performance depends on the accuracy of important region detection. Alternatively, when multiple important objects are identified, a tour of the image can be created by jumping through each object [10], [16]. While this scheme is effective in some situations, it is infeasible when a viewer must examine many images within a limited time.

Recently, Liu and Gleicher [12] presented a fisheye-view warping scheme to emphasize the important region while retaining the surrounding context. This method can effectively achieve a focus+context view of the original image. However, it creates noticeable distortion, especially in the contextual region. Setlur et al. [24] proposed to segment the prominent objects and to re-compose them onto a resized background. This method can achieve focus+context with minimal distortion; however, its performance is subject to that of image segmentation. In [13], Gal et al. presented a Laplacian-editing based texturing method that allows any image to be warped in a feature-aware manner. One difference between their method and the proposed approach is that they employ a regular grid to represent the input image, while we use triangular meshes. Using irregular triangular meshes enables a better approximation of structured edges, thus facilitating their preservation, especially for slanted edges. A similar quad-grid based method is given by Wang et al. [14], which computes the optimal target grid by distributing warping scales non-uniformly among grid lines according to image content. An efficient formulation of the non-linear optimization, which allows interactive image resizing, has also been developed. Most recently, Avidan and Shamir [2] proposed a nice "seam carving" method for resizing images. Their method iteratively carves out unnoticeable seams to reduce the image size. However, it often breaks image structures since the carving procedure relies only on low-level saliency information and fails to consider image structures.

Retargeting videos has also been considered [5], [11], [15], [20], [27], [29]. For example, Fan et al. [5] used a visual attention model to find important regions, and provided automatic, semi-automatic and manual modes for users to select and zoom into these important regions while browsing. Liu and Gleicher [15] utilized cinematography rules and designed a video retargeting system to automate the pan-and-scan operation. Both Setlur et al. [23] and Cheng et al. [4] proposed context re-composition for preserving salient information. These methods first segment salient objects based on a detected saliency prior. They then fill in the holes left after object segmentation, and finally compose the salient objects back onto the resized background video frames based on a set of aesthetic criteria. Wolf et al. introduced an optimization algorithm for non-homogeneous video retargeting [29]. Considering spatio-temporal constraints, a per-pixel discrete transformation that shrinks less important pixels is used to resize video sequences.

3 RETARGETING USING MESH PARAMETRIZATION

Image retargeting attempts to adapt a source image to a display of smaller size and different aspect ratio than originally intended. We formulate image retargeting as a mesh parametrization problem that emphasizes the important content while retaining the surrounding background with slight distortion. Fig. 1 illustrates the outline of our algorithm.


Fig. 1. Algorithm overview. Our algorithm first builds a feature-relevant controlling mesh (b) from the input (a). The source mesh is then associated with image saliency information (c) to emphasize the salient objects, marked with blue edges in (b). Afterwards, the target mesh with the desired resolution is produced by solving a mesh parametrization problem (d). Standard texture mapping is finally used to render the target image, as shown in (e).

To emphasize salient objects in the image, we first calculate a saliency map of the source image (Section 3.1). Then we build a controlling mesh of the input image that is consistent with the underlying image structures, and associate the saliency information with the source mesh (Section 3.2). In this way, retargeting is transformed into a parametrization problem of finding a target mesh with the desired resolution. In order to improve the visual quality of the target image, we interpret the preservation of saliency and image structure information as constraints of the parametrization (Section 3.3). Afterwards, the target mesh is solved by a constrained stretch-based mesh parametrization scheme (Section 3.4). The retargeting result is finally rendered using the standard texture mapping algorithm.

3.1 Importance Computation

The information about important objects in an image is necessary for image retargeting to emphasize those objects. Although semantic image understanding is beyond the state of the art, heuristic rules have been used successfully as an alternative (cf. [16], [3], [23], [12], [25]). Like previous work, we use low-level saliency information as well as high-level object information to infer important objects.

Saliency has been used in many applications to detect the important region in an image, and it has been extensively explored in the past several years [9], [8], [3], [17]. We use the contrast-based saliency detection method proposed by Ma et al. [17] to compute a normalized saliency value for each image pixel. Based on heuristic rules about human visual perception, this method calculates visual-feature contrast as saliency.

Human faces and bodies are usually important in photos. We employ the Viola-Jones face detector to detect faces [26]. In addition, we have developed a method that uses the located face to automatically estimate the body: we estimate a rough head-shoulder mask covering the human body and then extract the body using an iterative Graph Cut algorithm [28]. Saliency values for pixels inside a human body are set to 1. Users can also specify important objects that are hard to detect automatically.
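To make this importance computation concrete, the sketch below combines a simple local-contrast saliency term with Viola-Jones face detection in Python/OpenCV. The contrast measure only stands in for Ma et al.'s method [17], the GrabCut body-extraction step is omitted, and the function name and parameter values are ours, not the authors'.

```python
import cv2
import numpy as np

def importance_map(img_bgr):
    """Per-pixel importance in [0, 1]: a local-contrast saliency term with
    detected face regions forced to full importance. The contrast measure
    is a stand-in for Ma et al. [17]; body extraction is omitted."""
    gray8 = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    gray = gray8.astype(np.float32) / 255.0

    # Simple contrast: difference between a pixel and its blurred neighbourhood.
    blurred = cv2.GaussianBlur(gray, (0, 0), sigmaX=8)
    saliency = np.abs(gray - blurred)
    saliency /= max(float(saliency.max()), 1e-6)

    # Viola-Jones face detection [26]; face regions get saliency 1.
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    for (x, y, w, h) in cascade.detectMultiScale(gray8, 1.1, 4):
        saliency[y:y + h, x:x + w] = 1.0
    return saliency
```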

3.2 Image Representation

Generating a mesh from an image has been used in multimedia applications such as video compression [1]. The key issue is to generate a mesh that is consistent with the input image structures. Inspired by previous work [1], we first detect feature points in the input image $I_s$, and then use the Delaunay triangulation algorithm [22] to generate the mesh as follows:

• We first evenly discretize the input image boundary, and use all of the points there as part of the feature points.

• We employ an edge detection operator, for example the Canny operator, to extract additional feature points. Note that only part of the Canny-detected pixels are kept as mesh points, using a distance threshold. To keep the point density uniform, some auxiliary points are usually added, as a mesh with nearly uniform density normally facilitates its processing.

• Finally, we use the constrained Delaunay triangulation algorithm [22] to generate a feature-consistent mesh $M_s$ (Fig. 2); a code sketch of this procedure is given after Fig. 2.


Fig. 2. Mesh generation. (a) Input image with boundary points (yellow dots) and feature points (orange dots) evenly distributed. (b) Delaunay triangulation result. The remaining mesh points are added automatically with a distance threshold to maintain mesh uniformity.
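The mesh-generation procedure above can be sketched as follows. SciPy's unconstrained Delaunay triangulation is used as an approximation; the paper relies on a constrained Delaunay triangulation [22] so that mesh edges exactly follow detected line segments. Parameter values and the function name are illustrative.

```python
import cv2
import numpy as np
from scipy.spatial import Delaunay

def build_mesh(img_bgr, boundary_step=40, min_dist=25):
    """Sketch of the feature-consistent mesh generation. Note: this uses an
    unconstrained Delaunay triangulation, an approximation of the constrained
    triangulation used in the paper."""
    h, w = img_bgr.shape[:2]

    # 1. Evenly discretized image boundary points (corners included).
    pts = [(x, 0) for x in range(0, w, boundary_step)]
    pts += [(x, h - 1) for x in range(0, w, boundary_step)]
    pts += [(0, y) for y in range(0, h, boundary_step)]
    pts += [(w - 1, y) for y in range(0, h, boundary_step)]
    pts += [(w - 1, 0), (w - 1, h - 1)]

    # 2. Canny feature points, thinned to roughly one point per grid cell.
    edges = cv2.Canny(cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY), 80, 160)
    occupied = set()
    for y, x in zip(*np.nonzero(edges)):
        cell = (x // min_dist, y // min_dist)
        if cell not in occupied:
            occupied.add(cell)
            pts.append((int(x), int(y)))

    # 3. Auxiliary points in empty cells to keep the density nearly uniform.
    for cx in range(w // min_dist + 1):
        for cy in range(h // min_dist + 1):
            if (cx, cy) not in occupied:
                pts.append((min(cx * min_dist + min_dist // 2, w - 1),
                            min(cy * min_dist + min_dist // 2, h - 1)))

    P = np.unique(np.array(pts, dtype=np.float64), axis=0)
    tri = Delaunay(P)
    return P, tri.simplices            # mesh points and triangle index triples
```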

For clarity of exposition, we define some notation for the mesh. Let $M_t$ be the target mesh to be solved. $\{P_i = (x_i, y_i) \mid i = 1, \ldots, n\}$ and $\{Q_i = (u_i, v_i) \mid i = 1, \ldots, n\}$ represent the points of $M_s$ and $M_t$ respectively. $\{Q_i \mid i = 1, \ldots, n\}$ are to be solved in $M_t$, and $Q_i$ is the counterpart of $P_i$ in $M_s$. $\triangle_P = \triangle(P_i, P_j, P_k)$ and $\triangle_Q = \triangle(Q_i, Q_j, Q_k)$ are corresponding triangle pairs of $M_s$ and $M_t$. Since $M_t$ and $M_s$ share the same topology, edge $e(Q_iQ_j)$ in $M_t$ corresponds to edge $e(P_iP_j)$ in $M_s$.


In Fig. 1, (b) is $M_s$ produced using the above scheme and (d) is $M_t$ solved by parametrization.

In order to treat image regions discriminatively, we associate the source mesh $M_s$ with the saliency map of $I_s$. The saliency value $S_{\triangle_P}$ of a triangle $\triangle_P$ is calculated by averaging the saliency values of all pixels in $\triangle_P$. With this associated saliency information, we can perform mesh parametrization in such a way that salient edges shrink less than less salient ones. In this way, salient objects are emphasized.
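A direct way to compute the per-triangle saliency $S_{\triangle_P}$ is to rasterize each triangle and average the saliency map inside it; the short sketch below does exactly that (rasterizing with OpenCV is our convenience choice, not taken from the paper).

```python
import cv2
import numpy as np

def triangle_saliency(saliency, P, tris):
    """Average the per-pixel saliency over each triangle.
    P: n x 2 point array, tris: m x 3 vertex-index array."""
    s = np.zeros(len(tris))
    for t, (i, j, k) in enumerate(tris):
        mask = np.zeros(saliency.shape[:2], dtype=np.uint8)
        cv2.fillConvexPoly(mask, np.round(P[[i, j, k]]).astype(np.int32), 1)
        inside = mask.astype(bool)
        if inside.any():
            s[t] = float(saliency[inside].mean())
    return s
```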

3.3 Constraints in Parametrization

Retargeting should completely preserve salient objects, keeping them immune from even slight distortion, while emphasizing them to some extent. Furthermore, image structures in the remaining regions, which show up as strong edges, are important visual features and should be retained as rigidly as possible. To achieve these goals, we define several constraints and incorporate them into the mesh parametrization. The constraints include a boundary constraint ensuring boundary consistency, a saliency constraint emphasizing salient objects while avoiding their distortion, and a structure constraint preserving strong edges as rigidly as possible.

3.3.1 Boundary constraint

The source image boundary shall be mapped onto that of the target screen. Therefore, we require that the target mesh boundary adhere to the target screen boundary by defining the following constraints.

First, we map the points of $M_t$ that correspond to the four corner points of $M_s$ onto the target screen's corners by fixing those corner points. For the other non-corner points on the top and bottom boundaries, the $v$-coordinates of $M_t$'s boundary points are fixed to the corresponding $y$-coordinates of the screen's top and bottom boundaries respectively, while their $u$-coordinates are to be solved by the parametrization. Similarly, for the non-corner points on the left and right boundaries, we fix their $u$-coordinates while solving for their $v$-coordinates (Fig. 3).

Fig. 3. Boundary constraint. (a) Boundary points $P_U$, $P_B$, $P_L$, and $P_R$ of $M_s$ should be mapped onto the boundary of the target screen. (b) Accordingly, the $v$-coordinates of $Q_U$ and $Q_B$ are set to $y_U$ and $y_B$ respectively, and we solve for their $u$-coordinates. For $Q_L$ and $Q_R$, the opposite holds.

Suppose that $M_t$'s top, bottom, left, and right boundary points are $Q_{U,i}|_{i=1,\ldots,n_U}$, $Q_{B,i}|_{i=1,\ldots,n_B}$, $Q_{L,i}|_{i=1,\ldots,n_L}$, and $Q_{R,i}|_{i=1,\ldots,n_R}$ respectively. Correspondingly, the target screen's top and bottom $y$-coordinates are $y_U$ and $y_B$, and its left and right $x$-coordinates are $x_L$ and $x_R$. We express the above boundary constraints as:

$$F_B = \sum_{i=1}^{n_U} |v_{U,i} - y_U| + \sum_{i=1}^{n_B} |v_{B,i} - y_B| + \sum_{i=1}^{n_L} |u_{L,i} - x_L| + \sum_{i=1}^{n_R} |u_{R,i} - x_R| = 0. \quad (1)$$
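In an implementation, the hard constraint of Eq. (1) is most easily satisfied by pinning the constrained coordinate of every boundary point rather than penalizing deviations. A minimal sketch follows; the index arrays and function name are illustrative, not from the paper.

```python
import numpy as np

def apply_boundary_constraint(Q, top_idx, bottom_idx, left_idx, right_idx,
                              y_U, y_B, x_L, x_R):
    """Enforce F_B = 0 of Eq. (1) by pinning the appropriate coordinate of
    each boundary point; corner points appear in two index lists and thus
    have both coordinates fixed."""
    Q = Q.copy()
    Q[top_idx, 1] = y_U       # v-coordinates of top boundary points
    Q[bottom_idx, 1] = y_B    # v-coordinates of bottom boundary points
    Q[left_idx, 0] = x_L      # u-coordinates of left boundary points
    Q[right_idx, 0] = x_R     # u-coordinates of right boundary points
    return Q
```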

3.3.2 Saliency constraint

We aim to emphasize salient objects in an image and minimize their distortion. Since automatically obtaining an accurate object boundary is difficult, we use a conservative strategy. Specifically, we cluster triangles with saliency above a given threshold $S_\mu$ into an object. By using a small threshold, we generally prefer to include non-object area rather than miss a triangle that is part of the object. An object is defined as follows:

$$\{O_{s_j} = \bigcup_{i=1}^{n_j} (S_{\triangle_i} > S_\mu) \mid j = 1, \ldots, J\}, \quad (2)$$

where $O_{s_j}$ is a salient object in the source image with $n_j$ triangles and $J$ is the number of salient objects. Based on extensive experiments, $S_\mu$ is set to 0.6 empirically.
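The conservative clustering of Eq. (2) can be realized by thresholding the per-triangle saliency and grouping the surviving triangles into connected components. The sketch below groups over shared-edge adjacency, which is our assumption, since the paper does not spell out the grouping step.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components

def salient_objects(tris, tri_saliency, s_mu=0.6):
    """Group triangles with saliency above S_mu into salient objects O_sj
    (Eq. (2)), using edge-connectivity between salient triangles."""
    salient = np.nonzero(tri_saliency > s_mu)[0]
    index = {int(t): i for i, t in enumerate(salient)}
    rows, cols, edge_owner = [], [], {}
    for t in salient:
        a, b, c = sorted(int(v) for v in tris[t])
        for e in ((a, b), (b, c), (a, c)):
            if e in edge_owner:                   # shared edge => adjacent
                u, v = index[edge_owner[e]], index[int(t)]
                rows += [u, v]
                cols += [v, u]
            else:
                edge_owner[e] = int(t)
    n = len(salient)
    graph = coo_matrix((np.ones(len(rows)), (rows, cols)), shape=(n, n))
    n_obj, labels = connected_components(graph, directed=False)
    return [salient[labels == j] for j in range(n_obj)]
```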

Fig. 4. Saliency constraint. (a) The input image contains two salient objects. (b) Each salient object undergoes a rigid scale-and-translation transform. Its retargeted part is determined by the position of its mass center, together with a scale factor encoded in a polar coordinate transformation.

We define the following constraint to avoid distorting objects. Specifically, only a scaling transformation is allowed for every salient object $O_{s_j}$. To achieve this goal, a mass-center point $P_{O_{s_j}}$ is first calculated by averaging all points of $O_{s_j}$ in $M_s$. Afterwards, the polar coordinates $(r_i, \theta_i)$ of each $P_i$ in $O_{s_j}$ are obtained by taking $P_{O_{s_j}}$ as the pole. During retargeting, $O_{s_j}$ undergoes a rigid transformation with a scale factor $z_{O_{s_j}}$; hence the transformed polar coordinates of $P_i$ equal $(z_{O_{s_j}} \cdot r_i, \theta_i)$. Given $Q_{O_{s_j}}$, the target position of $P_{O_{s_j}}$ after retargeting, we can convert the polar coordinates into the target Euclidean coordinates of $Q_i$, the corresponding target point of $P_i$ (Fig. 4).


Assuming that the formula transforming polar coordinates to Euclidean coordinates is $f$, we then have:

$$Q_i = f(Q_{O_{s_j}}, z_{O_{s_j}} \cdot r_i, \theta_i). \quad (3)$$

We can represent the above saliency constraint as:

$$F_{SA} = \sum_{j=1}^{J} \left( \sum_{i=1}^{n_j} \left| Q_i - f(Q_{O_{s_j}}, z_{O_{s_j}} \cdot r_i, \theta_i) \right| \right) = 0, \quad (4)$$

where the retargeted part of $O_{s_j}$ has $n_j$ points $Q_i|_{i=1,\ldots,n_j}$.

To emphasize salient objects, the relative scales of salient objects should be exaggerated with respect to other regions so that the salient ones have higher resolution. We calculate the scale factor $z_{O_{s_j}}$ as follows. For a single salient object in an image, $z_{O_{s_j}}$ can be determined by restricting the retargeted salient object to lie within the target screen. If several salient objects exist, the total area of the retargeted objects should not exceed that of the target screen; furthermore, the summed width and height of the retargeted objects should not exceed those of the target screen either.

In the parametrization, $Q_{O_{s_j}}$ is the only unknown parameter of salient object $O_{s_j}$ to be solved for. All points of the retargeted salient object are encoded by it in (3); once $Q_{O_{s_j}}$ is known, they can be recovered using (3).

3.3.3 Structure constraint

Strong edges are important visual features. They are vital clues for understanding image content, and should be maintained as rigidly as possible. As salient objects are already preserved completely by the above saliency constraint, here we only address strong edge segments in the remaining regions. We detect those strong edge segments using the Hough transform. The detected segments are:

$$\{L_k = \bigcup_{i=1}^{m_k} P_{k,i} \mid k = 1, \ldots, K\}, \quad (5)$$

where $L_k$ is the segment passing through points $P_{k,i}|_{i=1,\ldots,m_k}$ of $M_s$, as illustrated in Fig. 5(a), and $K$ is the number of segments. As the Hough transform may produce too many trivial segments, we filter out segments by requiring that the length of each segment exceed a given threshold. The default value of this threshold is 0.1 times the minimum of $w_s$ and $h_s$. Furthermore, only one segment is retained if several segments are close enough. In our implementation, the Hough transform is applied first and the resulting segments are discretized into points before the Delaunay triangulation, ensuring that mesh edges adhere to strong edges.

We require that $Q_{k,i}|_{i=1,\ldots,m_k}$, the target points of $P_{k,i}|_{i=1,\ldots,m_k}$, still pass through a line. We call this the linearity constraint. Assume the retargeted lines are:

$$\{RL_k = \bigcup_{i=1}^{m_k} Q_{k,i} \mid k = 1, \ldots, K\}. \quad (6)$$

Fig. 5. Structure constraint. (a) Two feature point sets $P_{1,i}, i=1,\ldots,5$ and $P_{2,i}, i=1,\ldots,8$ adhere to strong edges detected in the input image. (b) Their retargeted sets $Q_{1,i}, i=1,\ldots,5$ and $Q_{2,i}, i=1,\ldots,8$ should pass through lines as well. We interpret this as a soft constraint.

$RL_k$ is the retargeted line of $L_k$, expressed as $y = a_k x + b_k$, so $Q_{k,i}$ satisfies $v_{k,i} = a_k u_{k,i} + b_k$. Here $a_k$ and $b_k$ can be easily represented by two points on $RL_k$.

In practice, however, some line segments have to violate the linearity constraint in order to emphasize important objects in the input image. We therefore define the following soft energy term, which encourages the linearity constraint to be satisfied as far as possible:

$$E_{ST} = \sum_{k=1}^{K} \sum_{i=1}^{m_k} \left( v_{k,i} - a_k u_{k,i} - b_k \right)^2. \quad (7)$$
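A sketch of the soft structure term follows. It departs from the paper in one detail: instead of representing $a_k$ and $b_k$ through two designated points of $RL_k$, it fits the line to the current point positions by least squares, which keeps the example short.

```python
import numpy as np

def structure_energy(Q, line_groups):
    """Soft linearity term E_ST of Eq. (7). Each entry of line_groups holds
    the point indices of one detected segment."""
    e = 0.0
    for idx in line_groups:
        u, v = Q[idx, 0], Q[idx, 1]
        if np.ptp(u) < 1e-6:                 # near-vertical: penalize spread in u
            e += float(np.sum((u - u.mean()) ** 2))
            continue
        a, b = np.polyfit(u, v, 1)           # fit v ~ a*u + b to current positions
        e += float(np.sum((v - (a * u + b)) ** 2))
    return e
```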

We have now defined several constraints. In the following, we show how to integrate them into the parametrization as soft or hard constraints.

3.4 Constrained Mesh Parametrization

Given the source mesh $M_s$, we formulate retargeting as a parametrization problem that finds a target mesh $M_t$ with the same topology as $M_s$ and the target screen size. Mesh parametrization has been extensively studied in computer graphics. Its original purpose is to establish a correspondence between an unordered 3D mesh and a mesh on a parametric domain, e.g., a plane or sphere. Most methods seek solutions by minimizing the metric deformation involved in the parametrization. Many different metric criteria have been proposed; among them, stretch-based methods usually work well at reducing global mesh distortion [18], [30]. The basic observation is that, from the perspective of discrete computational geometry, a mesh can be fully determined by the lengths of all its edges. We extend traditional 3D-to-2D mesh parametrization to 2D-to-2D, and solve for $M_t$ using a constrained stretch-based parametrization scheme.

3.4.1 Computation of edge lengths of the target mesh

$M_t$ is completely characterized by the lengths of its edges. As the edges of salient objects are determined by their scale factors, we describe here the method of computing the ideal lengths of the remaining edges of $M_t$.


Fig. 6. The scale factor for each edge is determined by the resolution change of the retargeting, as well as by its position with respect to the salient objects.

These ideal lengths will serve as the objective stretches in the parametrization. A reasonable length setting normally facilitates the parametrization process, leading to a topologically valid target mesh.

Intuitively, the edges of $M_t$ are transformed from their counterparts in $M_s$ according to the resolution change of the retargeting. The transformation scale of each edge is also subject to the exaggerated scales of the salient objects. We address this scale in the $x$ and $y$ directions separately. To determine the transformation scale in the $x$ direction, the mesh is divided according to the salient objects' positions, as shown in Fig. 6(a). We first compute a virtual scale for each mesh point, then compute the transformation scale for an edge by averaging the scales of its two endpoints.

Let $w_s \times h_s$ and $w_t \times h_t$ represent the resolutions of the source image and the target screen respectively. If a point $P_i = (x_i, y_i)$ lies in the red rectangle in Fig. 6(a), its ideal scale is simply $w_t / w_s$. Otherwise, if it lies in the green or blue rectangle, the scale is also related to the scale factors of the salient objects. Assume that the segment lengths formed by the line $y = y_i$ intersecting the salient objects are $\{l_{O_{s_j}} \mid j = 1, \ldots, J\}$. The ideal scale factor of $P_i$ is

$$\left( w_t - \sum_{j=1}^{J} z_{O_{s_j}} \cdot l_{O_{s_j}} \right) \Big/ \left( w_s - \sum_{j=1}^{J} l_{O_{s_j}} \right), \quad (8)$$

where $z_{O_{s_j}}$ is the scale factor of $O_{s_j}$. For instance, in the mesh of Fig. 6(a), if $P_i$ lies in the red rectangle, $l_{O_{s_1}} = l_{O_{s_2}} = 0$. If it lies in the green rectangle, $l_{O_{s_1}} = 0$ and $l_{O_{s_2}}$ is positive. Otherwise, both $l_{O_{s_1}}$ and $l_{O_{s_2}}$ are positive.

We similarly compute the virtual scale of $P_i$ in the $y$ direction. For each edge $e(P_iP_j)$, its scale factor is calculated by averaging the scale factors of $P_i$ and $P_j$ in the $x$ and $y$ directions separately. Let $(s_{x_{ij}}, s_{y_{ij}})$ be $e(P_iP_j)$'s scale factor, and assume that the $x$- and $y$-directional lengths of edge $e(P_iP_j)$ are $(l_{x_{ij}}, l_{y_{ij}})$. Taking this scale factor into account, the reasonable length configuration of edge $e(Q_iQ_j)$ in $M_t$ is:

$$l_{ij} = l_{e(Q_iQ_j)} = \sqrt{(s_{x_{ij}} l_{x_{ij}})^2 + (s_{y_{ij}} l_{y_{ij}})^2}. \quad (9)$$
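Given the per-point virtual scales of Eq. (8), the ideal edge lengths of Eq. (9) are computed as in the following sketch (the per-point scales are assumed to be precomputed).

```python
import numpy as np

def target_edge_lengths(P, edges, scale_x, scale_y):
    """Eq. (9): ideal target length for every edge. P: n x 2 source points,
    edges: m x 2 index pairs, scale_x / scale_y: length-n per-point scales."""
    i, j = edges[:, 0], edges[:, 1]
    sx = 0.5 * (scale_x[i] + scale_x[j])      # edge scale = mean of endpoint scales
    sy = 0.5 * (scale_y[i] + scale_y[j])
    lx = np.abs(P[i, 0] - P[j, 0])
    ly = np.abs(P[i, 1] - P[j, 1])
    return np.sqrt((sx * lx) ** 2 + (sy * ly) ** 2)
```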

3.4.2 Mesh parametrization

$M_t$ can be inferred from its edge lengths by minimizing the following energy function:

$$E_l = \sum_{e(Q_iQ_j) \in \text{edges}} \left( \|Q_i - Q_j\|^2 - l_{ij}^2 \right)^2 / l_{ij}^2. \quad (10)$$

For a salient object $O_{s_j}$, the stretching terms for edges within its retargeted part are excluded from (10). Only the terms for edges connecting its boundary points with outer points take effect, and these boundary points are encoded by the retargeted pole of $O_{s_j}$ as described in Section 3.3.2.

Taking the symmetry of the above equation into account, the energy gradients at point $Q_i$ are:

$$\frac{\partial E_l}{\partial u_i} = 8 \sum_{e(Q_iQ_j) \in \text{edges}} \left( \|Q_i - Q_j\|^2 - l_{ij}^2 \right) (u_i - u_j) / l_{ij}^2, \quad (11)$$

$$\frac{\partial E_l}{\partial v_i} = 8 \sum_{e(Q_iQ_j) \in \text{edges}} \left( \|Q_i - Q_j\|^2 - l_{ij}^2 \right) (v_i - v_j) / l_{ij}^2. \quad (12)$$
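The stretch energy and its gradient can be evaluated vectorially, as in the sketch below. Note one bookkeeping difference: with every edge listed once, the analytic gradient carries a factor of 4 per edge; Eqs. (11)-(12) show a factor of 8 because the symmetric edge terms are counted from both endpoints. The optional signs $w_{ij}$ anticipate Eq. (14) below.

```python
import numpy as np

def stretch_energy_and_grad(Q, edges, l, w=None):
    """Stretch energy of Eq. (10)/(14) with its analytic gradient.
    Q: n x 2 current positions, edges: m x 2 indices, l: m ideal lengths,
    w: optional orientation signs w_ij (all ones reproduces Eq. (10))."""
    if w is None:
        w = np.ones(len(edges))
    i, j = edges[:, 0], edges[:, 1]
    diff = Q[i] - Q[j]
    resid = w * (diff ** 2).sum(axis=1) - l ** 2
    energy = float(np.sum(resid ** 2 / l ** 2))

    coef = (4.0 * resid * w / l ** 2)[:, None]
    grad = np.zeros_like(Q)
    np.add.at(grad, i, coef * diff)            # contribution to endpoint i
    np.add.at(grad, j, -coef * diff)           # opposite-sign contribution to j
    return energy, grad
```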

The above equations can be solved easily. However, when $M_t$ is excessively dense, directly solving them may cause adjacent triangles of $M_t$ to flip over, leading to an invalid topology; this corresponds to inverting the orientation of a triangle's points. To tackle this issue, we revise the energy function by penalizing reversal of triangle orientation with a sign function [18].

Fig. 7. The orientations of the triangles in the retargeted mesh (b) should be kept consistent with the orientations of their corresponding triangles in the original mesh (a).

Assume that $\triangle_{Q_1} = \triangle(Q_iQ_{k_1}Q_j)$ and $\triangle_{Q_2} = \triangle(Q_iQ_jQ_{k_2})$ are two adjacent triangles incident upon edge $e(Q_iQ_j)$, and that their corresponding triangles in $M_s$ are $\triangle_{P_1} = \triangle(P_iP_{k_1}P_j)$ and $\triangle_{P_2} = \triangle(P_iP_jP_{k_2})$ (Fig. 7). For each pair of corresponding triangles, the orientations of the points should be equal. To achieve this, we define:

$$w_{ij} = \operatorname{sign} \min\Big( \det(\overrightarrow{Q_iQ_{k_1}}, \overrightarrow{Q_jQ_{k_1}}) \cdot \det(\overrightarrow{P_iP_{k_1}}, \overrightarrow{P_jP_{k_1}}),\; \det(\overrightarrow{Q_iQ_{k_2}}, \overrightarrow{Q_jQ_{k_2}}) \cdot \det(\overrightarrow{P_iP_{k_2}}, \overrightarrow{P_jP_{k_2}}) \Big). \quad (13)$$
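Eq. (13) reduces to two 2x2 determinants per adjacent triangle; a minimal sketch follows (treating a zero determinant, i.e., a degenerate triangle, as non-flipped is our convention).

```python
import numpy as np

def orientation_sign(P, Q, i, j, k1, k2):
    """w_ij of Eq. (13) for the interior edge (i, j) shared by triangles
    (i, k1, j) and (i, j, k2). P, Q: source and current target positions."""
    def det2(a, b):
        return a[0] * b[1] - a[1] * b[0]

    d1 = det2(Q[k1] - Q[i], Q[k1] - Q[j]) * det2(P[k1] - P[i], P[k1] - P[j])
    d2 = det2(Q[k2] - Q[i], Q[k2] - Q[j]) * det2(P[k2] - P[i], P[k2] - P[j])
    return 1.0 if min(d1, d2) >= 0 else -1.0   # -1 when either triangle flipped
```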

Accordingly, the new energy function is defined as:

$$E_l = \sum_{e(Q_iQ_j) \in \text{edges}} \left( w_{ij} \|Q_i - Q_j\|^2 - l_{ij}^2 \right)^2 / l_{ij}^2, \quad (14)$$


where the coefficient $w_{ij}$ penalizes a triangle in $M_t$ whose orientation flips over: in that case $w_{ij}$ is chosen as $-1$, and $+1$ otherwise. This scheme guarantees a valid target mesh.

Fig. 8. Our retargeting results. The first column shows the input images with the detected lines. The middle two columns are the meshes resulting from triangulation and the target meshes. The last column shows our retargeting results.

Overall, taking into account the constraints of Section 3.3, the constrained mesh parametrization is finally defined as

$$\arg\min_{Q_i,\, i=1,\ldots,n} \left( E_l + \lambda \cdot E_{ST} \right), \quad \text{s.t. } F_B = 0,\; F_{SA} = 0. \quad (15)$$

The boundary constraint and the saliency constraint are hard constraints in the above equation, while the structure constraint is treated as a soft constraint. $\lambda$ is a weighting factor balancing the influence of the parametrization energy and the structure constraint. Based on extensive experiments, it is set to 0.5 for all our results in Section 4.

The energy of (15) is minimized using the multidimensional Newton's method. For each Newton iteration, a multigrid solver [19] is used to solve the sparse linear equations. In practice, the initial solution is set to the positions of the mesh points obtained by uniformly scaling the input image. For a sparse mesh density such as that shown in Fig. 1(b), the optimization converges to the final solution very quickly.

The points of $M_t$ in less salient regions and the retargeted poles of the salient objects are obtained by minimizing (15). Afterwards, the points of the retargeted salient objects are produced using equation (3). Once the target mesh $M_t$ is obtained, we render the resulting image using a standard texture mapping algorithm [6]. This process can generally be implemented efficiently, since texture mapping is a basic function supported by modern graphics processing units (GPUs) and by the graphics hardware on cellular phones.
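Putting the pieces together, the following driver sketches the optimization of Eq. (15). It reuses the stretch_energy_and_grad and structure_energy helpers sketched earlier, handles the hard constraints by simply not optimizing the fixed coordinates, and, for brevity, omits the re-parametrization of salient-object points through their poles (Eq. (3)). SciPy's L-BFGS stands in for the paper's Newton/multigrid solver [19]; all names and the free-mask interface are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def solve_target_mesh(Q_init, edges, l, free_mask, lam=0.5, line_groups=()):
    """Minimize E_l + lambda * E_ST of Eq. (15) over the free coordinates.
    Q_init: n x 2 initial guess (uniformly scaled source mesh),
    free_mask: n x 2 booleans, False for coordinates fixed by hard constraints."""
    Q_fixed = Q_init.reshape(-1).copy()
    free = np.nonzero(free_mask.reshape(-1))[0]

    def unpack(x):
        full = Q_fixed.copy()
        full[free] = x
        return full.reshape(-1, 2)

    def objective(x):
        Q = unpack(x)
        e, _ = stretch_energy_and_grad(Q, edges, l)
        return e + lam * structure_energy(Q, line_groups)

    res = minimize(objective, Q_fixed[free], method="L-BFGS-B")
    return unpack(res.x)
```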

4 EXPERIMENTS

We experimented with our algorithm on a variety of images. Some representative results are shown in Figs. 8, 9, 10, 11, and 15.

Fig. 8 demonstrates some results together with the intermediate meshes. The second and third columns give the meshes generated by triangulation and the target meshes produced by our algorithm, respectively.


Fig. 9. Results for horses in grassland with different aspect ratios. (a) Original image with 743 × 512 resolution. (b), (c), and (d) Our results at sizes 320 × 320, 480 × 270, and 320 × 240, with aspect ratios of 1:1, 16:9, and 4:3 respectively.


Fig. 10. Results for the cow and goat in grassland with different aspect ratios. (a) Original image with 800 × 534 resolution. (b), (c), (d), and (e) Our results at sizes 240 × 320, 240 × 240, 664 × 240, and 428 × 240, with aspect ratios of 3:4, 1:1, 25:9, and 16:9 respectively.

Fig. 11. Example of retargeting salient objects with different target scales. From left to right: input image of size 910 × 720, and resulting images of size 320 × 320 with the salient people at scales of 0.25, 0.35, and 0.5 relative to the original.

For images with background structures, the lines detected and used as structure constraints are shown in the first column. Although such lines are sparse, they play a crucial role in preserving the background structures. For the input image in the fourth row, the retargeted positions of the two girls are computed automatically by our algorithm.

Figs. 9 and 10 show several examples that contain multiple important objects. The original images are adapted to different sizes and aspect ratios of popular mobile devices, e.g., 1:1, 3:4, 4:3, 16:9, and even 25:9. As can be seen, our method effectively emphasizes all the important objects by non-uniformly shrinking the contextual regions, and the distortion in these results is barely visible. Using our approach, the target scales of salient objects can also be controlled by the user. Fig. 11 gives such an example, where the retargeting scales of the salient objects vary according to user-specified scale parameters.

To demonstrate the effectiveness of our algorithm, we also compare against the results of several representative methods, as shown in Fig. 15. The original images in the first, third and fifth rows are of sizes 1024 × 768, 685 × 512, and 800 × 586 respectively. All resulting images are of size 320 × 320 except the uniform scaling results.


Fig. 12. Comparison between our approach and quad-grid based methods. (a) Input image. (b) Gal et al.'s result [13]. (c) Wang et al.'s result [14]. (d) Our result, which is comparable to theirs. (a), (b), and (c) are taken from [14].

Clearly, uniform scaling usually makes the faces too small to identify. While automatic cropping can effectively emphasize the important content [25], the context is lost. The results of fisheye warping (last column in odd rows) achieve the goal of focus+context; however, fisheye warping may introduce noticeable distortion into background structures, and it only works well for images with a single important object. The results in the first column of even rows are produced by segmentation-based composition [23]. For images with similar foreground and background color distributions, accurate segmentation is a big challenge; moreover, inpainting a background with complex structure is difficult. The seam carving method [2] may also destroy the persons in the images (second column of even rows). We therefore integrated saliency and face detection into the seam carving algorithm so that humans are preserved completely (third column of even rows), but then objects in the background are destroyed (see the artifacts highlighted by the red rectangles).

The last column of even rows in Fig. 15 gives our retargeting results. In our results, the persons are given high resolution and are free from distortion. Since our method preserves image structures as much as possible, little visual distortion is introduced into the background; for example, the structure of the Golden Gate Bridge is well preserved.

We further compare our result with those produced by quad-grid based methods [13], [14]. As Fig. 12 shows, our result is comparable to theirs.

The computational complexity of our approach mainly depends on the size of the mesh, namely the number of mesh points and edges. The mesh size in turn depends on the source image size and the mesh density. The major computation is spent on minimizing the energy in Equation (15). Normally, a source image is divided into a sparse mesh with several hundred mesh points. For example, the mesh of Fig. 1(b) has around 150 points, and the number of variables in equation (15) for this example is around 120. Such a problem can be solved efficiently using the multidimensional Newton's method. Given the target mesh, rendering the target image using standard texture mapping is trivial, especially when this operation is accelerated by a common PC or cellular phone graphics chip.


Fig. 13. Example of the image display for the user study. Participants were shown two versions of each image on the image panels of a virtual Palm Treo. The input image was also given as a reference.

4.1 User Study

To further evaluate the effectiveness of our approach, we carried out a user study. The objective was to determine whether the results of our method are preferred by users to those of other methods, including uniform scaling, automatic cropping [25], fisheye warping [12], and seam carving [2].

TABLE 1
Statistical data of the user study.

Comparison         Images   Participants   Mean wins (of 25)   Std. dev.   p
Cropping             25         28          18.5 (73.86%)        2.05      0.0001
Scaling              25         23          19.7 (78.8%)         1.61      0.0001
Fisheye warping      25         22          16.27 (65.1%)        3.91      0.0041
Seam carving         25         24          15.3 (61.2%)         2.58      0.0126


Fig. 14. Images (odd rows) and our retargeting results (even rows) used in the user study. Another image is shown in Fig. 13.

Twenty-five representative images, together with the results of the tested algorithms, were used in our experiments. The images cover a wide diversity of types, e.g., images with simple and complex backgrounds, people, and multiple salient objects. All the target images are of size 320 × 320, which fits a Palm screen resolution. The input high-resolution images and our retargeting results are shown in Fig. 14. The user study consisted of four parallel parts, in which we compare the results of our method with those of scaling, cropping, fisheye warping, and seam carving separately. For a fair comparison, seam carving is executed with saliency and face detection incorporated.

The study was conducted off-line and on-line simultaneously. For the off-line study, a participant was shown a pair of retargeted results of each input image, displayed on the image panels of a virtual Palm Treo as shown in Fig. 13. Each pair consisted of one result of our method and one of the other methods; whether our result was on the left or right was randomized. The input image was also displayed for comparison. The participant was asked to select the result from the pair that they preferred, and to write the answer on an answer sheet. The on-line study was conducted similarly. The link to the study was made available to our computer science department and one consumer electronics company during the study. Subjects voluntarily responded to advertisements posted to the mailing lists and were not compensated for their time. In total, 97 different participants took part in our user study. They were between 20 and 35 years old, and were mainly undergraduate and graduate students of our department as well as IT employees. To our knowledge, they were unaware of our project.

The statistical data of the user study are shown in Table 1. For each part of the study, we count the total number of times that our result was preferred by participants. Overall, participants selected our results over cropped images 18.5 out of 25 times (73.86%) and over scaled images 19.7 times (78.8%). The p values of the t-test given in the last column of the table show that the comparisons were statistically significant. We further examined the test images, and found that for images with a simple background, users normally chose cropping over ours.


For images with obvious structures in the background, however, they usually preferred ours, as we preserve more image structures and avoid distortion as much as possible. This phenomenon indicates that, in addition to the important objects, visual features in the surrounding context still have a strong effect on the viewing experience.

Participants selected our results over those of fisheye-view warping 65.1% of the time, and over those of seam carving 61.2% of the time. Both fisheye warping and seam carving achieve "focus+context" image retargeting with different schemes. Participants reported that for most of the results, they could not distinguish our effect from the effects of fisheye warping and seam carving at first glance; our results visually resemble those of the two methods, especially when fitting images with a simple background to a Palm screen of size 320 × 320. However, for images with complex background structures, fisheye warping tends to shrink objects near the background too much, and seam carving inevitably destroys some background structures, while our results preserve the original background structures as rigidly as possible and meanwhile emphasize the important objects to some extent. As a result, users normally preferred our results when referring to the input images.

4.2 Limitations

Our method retargets images with salient objects by a process of saliency-based mesh parametrization. Salient object extraction relies on image saliency and face detection. Since 2D images are subject to variations in body pose, facial expression, and illumination, robust human face and body detection is still challenging. This makes accurate salient object extraction difficult. In our test examples, our method fails to capture some important human bodies with varied face poses or illumination. This is one drawback of our approach.

In addition, we emphasize each salient object with an emphasis scale that can be determined automatically or set by the user. Emphasizing the relative scale of a salient object, however, inevitably distorts its nearby objects. We try to relieve this artifact with the structure constraint, which is imposed by the detected structured segments distributed over the background. Nevertheless, if the background contains few detected segments, the structure constraint is insufficient. One example can be seen in the Universal Studios scene near the Globe in Fig. 15. As the left part of the image contains few structured segments, the structure constraint cannot fully control the deformation of this area during retargeting, and the people standing near the Globe in the background are distorted as a result. This is another limitation of our current implementation. In fact, this image has many other structured curves, e.g., the contour of the Globe. In the future, we intend to integrate the preservation of more complex curved structures into our parametrization formulation, to better relieve obvious structure deformation. Furthermore, we have so far mainly evaluated the effectiveness of our structure constraint on images with a varied amount of structure, ranging from simple to complex backgrounds such as some of the examples shown in Fig. 8 and Fig. 14. We have not fully evaluated it on overly complex images; it would be helpful to further evaluate our approach on more images with more complex backgrounds.

5 CONCLUSIONS AND FUTURE WORK

We have proposed an effective image retargeting method based on a mesh image representation. For the first time, we formulate image retargeting as a mesh parametrization problem that aims to find a homomorphous mesh with the desired size of the target display. Our method emphasizes important objects while retaining the surrounding context. Our scheme of using a mesh representation for image adaptation enables easy preservation of image structures, so little visual distortion is introduced into the target image.

Representing images with a mesh provides a feasible approach to image retargeting with a solid foundation in mesh parametrization. Under the mesh representation, salient objects have a compact structure composed of triangles, which makes emphasizing them more convenient and controllable. However, for computational efficiency, we usually triangulate the image with a sparse mesh density, so the object shape is not precise. To relieve this problem, we include more triangles in the object rather than missing some parts of it; in this way, we guarantee the least distortion to the object. Extra valuable screen space, however, is allocated to some non-important regions that are taken as parts of the object. This is a limitation of our current implementation. To account for this, we will explore the use of multi-resolution meshes for representing images in the future: salient objects would be precisely approximated by dense triangles, while the background would be endowed with coarse ones for efficiency.

Our approach tries to adapt images with little distortion. Nevertheless, image layout in terms of the semantic and topological relations between objects is not addressed. Such aspects are important to visual perception; for instance, the positional relations of players in a football match, as well as their locations relative to the ball, are important. In the future, we shall take such relations into consideration and investigate solutions to the problem. Moreover, extending our current method to video is important.

ACKNOWLEDGMENT

We would like to thank the reviewers for their constructive comments, which helped improve this manuscript. Tongwei Ren and Xiang Wang provided the Matlab code for seam carving. This work was supported by the National Natural Science Foundation of China (Grants 60703084, 60723003, 60635030, and 60721002), NSF grant IIS-0416284, the Jiangsu Science Foundation (BK2008018), and the Jiangsu 333 High-Level Talent Cultivation Program.


REFERENCES

[1] Y. Altunbasak and A. Tekalp. Closed-form connectivity-preserving solutions for motion compensation using 2-D meshes. IEEE Trans. Image Processing, 6(9):1255–1269, 1997.
[2] S. Avidan and A. Shamir. Seam carving for content-aware image resizing. ACM Trans. on Graphics, 26(3):267–276, 2007.
[3] L.-Q. Chen, X. Xie, X. Fan, W.-Y. Ma, H.-J. Zhang, and H.-Q. Zhou. A visual attention model for adapting images on small displays. Multimedia Systems, 9(4):353–364, 2003.
[4] W.-H. Cheng, C.-W. Wang, and J.-L. Wu. Video adaptation for small display based on content recomposition. IEEE Trans. Circuits and Systems for Video Technology, 17(1):43–58, 2007.
[5] X. Fan, X. Xie, H.-Q. Zhou, and W.-Y. Ma. Looking into video frames on small displays. In ACM Multimedia, pages 247–250, 2003.
[6] D. Hearn and M. P. Baker. Computer Graphics with OpenGL. Pearson Prentice Hall, 2004.
[7] Y. Hu, D. Rajan, and L.-T. Chia. Robust subspace analysis for detecting visual attention regions in images. In ACM Multimedia, pages 716–724, 2005.
[8] L. Itti and C. Koch. Computational modeling of visual attention. Nature Reviews Neuroscience, 2(3):194–203, March 2001.
[9] L. Itti, C. Koch, and E. Niebur. A model of saliency-based visual attention for rapid scene analysis. IEEE Trans. Pattern Analysis and Machine Intelligence, 20(11):1254–1259, 1998.
[10] S. Jiang, H. Liu, Z. Zhao, Q. Huang, and W. Gao. Generating video sequence from photo image for mobile screens by content analysis. In IEEE International Conference on Multimedia and Expo, pages 1475–4178, 2007.
[11] Y. Li, Y.-F. Ma, and H.-J. Zhang. Salient region detection and tracking in video. In IEEE International Conference on Multimedia and Expo, pages 269–272, 2003.
[12] F. Liu and M. Gleicher. Automatic image retargeting with fisheye-view warping. In ACM UIST, pages 153–162, 2005.
[13] R. Gal, O. Sorkine, and D. Cohen-Or. Feature-aware texturing. In Proceedings of Eurographics Symposium on Rendering, pages 297–303, 2006.
[14] Y.-S. Wang, C.-L. Tai, O. Sorkine, and T.-Y. Lee. Optimized scale-and-stretch for image resizing. ACM Trans. on Graphics (ACM SIGGRAPH Asia), 27(5), 2008.
[15] F. Liu and M. Gleicher. Video retargeting: Automating pan and scan. In ACM Multimedia, pages 241–250, 2006.
[16] H. Liu, X. Xie, W.-Y. Ma, and H.-J. Zhang. Automatic browsing of large pictures on mobile devices. In ACM Multimedia, pages 148–155, 2003.
[17] Y.-F. Ma and H.-J. Zhang. Contrast-based image attention analysis by using fuzzy growing. In ACM Multimedia, pages 374–381, 2003.
[18] J. Maillot, H. Yahia, and A. Verroust. Interactive texture mapping. In ACM Siggraph, pages 27–34, 1993.
[19] W. Press, S. Teukolsky, W. Vetterling, and B. Flannery. Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press, 1992.
[20] Y. Rui, L. He, A. Gupta, and Q. Liu. Building an intelligent camera management system. In ACM Multimedia, pages 2–11, 2001.
[21] A. Santella, M. Agrawala, D. DeCarlo, D. Salesin, and M. Cohen. Gaze-based interaction for semi-automatic photo cropping. In Proc. of CHI, 2006.
[22] R. Seidel. Constrained Delaunay triangulations and Voronoi diagrams with obstacles. Technical Report 260, Inst. for Information Processing, Graz, Austria, 1988.
[23] V. Setlur, T. Lechner, M. Nienhaus, and B. Gooch. Retargeting images and video for preserving information saliency. IEEE Computer Graphics and Applications, 27(5):80–88, 2007.
[24] V. Setlur, S. Takagi, R. Raskar, M. Gleicher, and B. Gooch. Automatic image retargeting. In Proceedings of the 4th International Conference on Mobile and Ubiquitous Multimedia, 2005.
[25] B. Suh, H. Ling, B. Bederson, and D. Jacobs. Automatic thumbnail cropping and its effectiveness. In ACM UIST, pages 95–104, 2003.
[26] P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. In IEEE Conf. CVPR, pages 511–518, 2001.
[27] J. Wang, M. Reinders, R. Lagendijk, J. Lindenberg, and M. Kankanhalli. Video content representation on tiny devices. In IEEE International Conference on Multimedia and Expo, pages 1711–1714, 2004.
[28] J. Wang, Y. Ying, Y. Guo, and Q.-S. Peng. Automatic foreground extraction of head shoulder images. In Computer Graphics International (Springer LNCS 4035), pages 385–396, 2006.
[29] L. Wolf, M. Guttmann, and D. Cohen-Or. Non-homogeneous content-driven video-retargeting. In IEEE International Conference on Computer Vision, pages 1–6, 2007.
[30] K. Zhou, J. Snyder, B. Guo, and H.-Y. Shum. Iso-charts: stretch-driven mesh parameterization using spectral analysis. In ACM Symp. Geometry Processing, pages 45–54, 2004.

Yanwen Guo received his Ph.D. degree in applied mathematics from the State Key Lab of CAD&CG, Zhejiang University, in 2006. He is currently an associate professor at the National Laboratory of Novel Software Technology, Department of Computer Science and Technology, Nanjing University. His main research interests include image and video processing, geometry processing, and face-related applications. He worked as a visiting researcher in the Department of Computer Science and Engineering, the Chinese University of Hong Kong, in 2006, and as a visiting researcher in the Department of Computer Science, the University of Hong Kong, in 2008.

Feng Liu received the B.S. and M.S. degrees from Zhejiang University, Hangzhou, China, in 2001 and 2004, respectively, both in computer science. He is currently a Ph.D. candidate in the Department of Computer Sciences at the University of Wisconsin-Madison, USA. His research interests are in the areas of multimedia, computer vision, and graphics.

Jian Shi received the B.Sc. degree in computer science from Nanjing University, China, in 2007. He is now a master's student in the Department of Computer Science & Technology at Nanjing University. His research interest is in image and video processing.


Zhi-Hua Zhou (S'00-M'01-SM'06) received the B.Sc., M.Sc. and Ph.D. degrees in computer science from Nanjing University, China, in 1996, 1998 and 2000, respectively, all with the highest honors. He joined the Department of Computer Science & Technology at Nanjing University as an assistant professor in 2001, and is currently Cheung Kong Professor and Director of the LAMDA group. His research interests are in artificial intelligence, machine learning, data mining, pattern recognition, information retrieval, evolutionary computation, and neural computation. In these areas he has published over 70 papers in leading international journals or conference proceedings.
Dr. Zhou has won various awards and honors, including the National Science & Technology Award for Young Scholars of China (2006), the Award of the National Science Fund for Distinguished Young Scholars of China (2003), the National Excellent Doctoral Dissertation Award of China (2003), and the Microsoft Young Professorship Award (2006). He is an Associate Editor of IEEE Transactions on Knowledge and Data Engineering, Associate Editor-in-Chief of Chinese Science Bulletin, and on the editorial boards of Artificial Intelligence in Medicine, Intelligent Data Analysis, Science in China, etc. He is/was a PAKDD Steering Committee member, Program Committee Chair/Co-Chair of PAKDD'07, PRICAI'08 and ACML'09, Vice Chair or Area Chair of IEEE ICDM'06, IEEE ICDM'08, SIAM DM'09, ACM CIKM'09, etc., Program Committee member of various international conferences including AAAI, ICML, ECML, ACM SIGKDD, IEEE ICDM, ACM Multimedia, etc., and General Chair/Co-Chair or Program Committee Chair/Co-Chair of a dozen domestic conferences. He is a senior member of the China Computer Federation (CCF), the Vice Chair of the CCF Artificial Intelligence & Pattern Recognition Society, an Executive Committee member of the Chinese Association of Artificial Intelligence (CAAI), the Chair of the CAAI Machine Learning Society, and the Chair of the IEEE Computer Society Nanjing Chapter. He is a member of AAAI and ACM, and a senior member of IEEE and the IEEE Computer Society.

Michael Gleicher is an Associate Professor in the Department of Computer Sciences at the University of Wisconsin, Madison. Prof. Gleicher is founder and leader of the Department's Computer Graphics group. His current research falls into three categories: character animation; automated multimedia processing and production; and visualization and analysis tools for life sciences applications. Prior to joining the university, Prof. Gleicher was a researcher at The Autodesk Vision Technology Center and at Apple Computer's Advanced Technology Group. He earned his Ph.D. in Computer Science from Carnegie Mellon University, and holds a B.S.E. in Electrical Engineering from Duke University.

Fig. 15. Comparisons between our approach and previous methods. Odd rows, from left to right: input image, uniform scaling, automatic cropping, and fisheye warping. Even rows: segmentation-based composition, seam carving, seam carving with saliency detection, and our result. All results are produced to fit a 320 × 320 PDA screen.