16
1624 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 17, NO. 9, SEPTEMBER 2008 Rate Bounds on SSIM Index of Quantized Images Sumohana S. Channappayya, Member, IEEE, Alan Conrad Bovik, Fellow, IEEE, and Robert W. Heath, Jr., Senior Member, IEEE Abstract—In this paper, we derive bounds on the structural simi- larity (SSIM) index as a function of quantization rate for fixed-rate uniform quantization of image discrete cosine transform (DCT) coefficients under the high-rate assumption. The space domain SSIM index is first expressed in terms of the DCT coefficients of the space domain vectors. The transform domain SSIM index is then used to derive bounds on the average SSIM index as a func- tion of quantization rate for uniform, Gaussian, and Laplacian sources. As an illustrative example, uniform quantization of the DCT coefficients of natural images is considered. We show that the SSIM index between the reference and quantized images fall within the bounds for a large set of natural images. Further, we show using a simple example that the proposed bounds could be very useful for rate allocation problems in practical image and video coding applications. I. INTRODUCTION T HE mean squared error (MSE) is a popular metric in the design of algorithms ranging from image quality assessment to quantization to restoration. The popularity of the MSE can be attributed to two main reasons: amenability to analysis and a lack of competitive perceptual distortion metrics. The importance of designing image processing algorithms opti- mized for perceptual quality measures, as opposed to the MSE, has been long recognized [7], [13]. Image coding algorithms that are optimized for perceptual distortion measures have been proposed by several authors and have become a part of image coding standards [10], [23], [24], [26], [43]. These algorithms use distortion measures such as those based on models of the human visual system (HVS), variants of the MSE such as the weighted MSE, or other empirical measures of quality. Advances in understanding the HVS [29], and improved mod- eling of the statistics of natural scenes [36] have fostered the development of powerful image quality assessment (IQA) algo- rithms. These new IQA algorithms include, among others, the structural similarity (SSIM) index [39], or Wang-Bovik index, and its variants [41], [42], the Visual Information Fidelity Crite- rion (VIF) [30] and the Visual Signal to Noise Ratio (VSNR) [4]. These IQA algorithms outperform the measures currently used Manuscript received August 23, 2007; revised April 16, 2008. Published August 13, 2008 (projected). This work was supported in part by the Texas Advanced Technology Program under Grant 003658-0380-2003 and in part by the National Science Foundation under Grant 0728748. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Giovanni Poggi. S. S. Channappayya was with the Department of Electrical and Computer En- gineering, The University of Texas at Austin, Austin TX 78712-0240 USA. He is now with the PacketVideo Corporation, San Diego, CA 92121 USA (e-mail: [email protected]). A. C. Bovik and R. W. Heath, Jr. are with the Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX 78712- 0240 USA (e-mail: [email protected]; [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TIP.2008.2001400 in perceptual image coding algorithms such as the just-notice- able distortion (JND) metric [22], in addition to being better than the MSE in measuring perceptual image quality [32]. Image pro- cessing algorithms that optimize for these new IQA algorithms are only recently being developed [5], [6], [40]. The emergence of these IQA algorithms present the possibility of improved de- sign of image coding algorithms. In this paper, we analyze the relation between the quantiza- tion rate and the distortion measured by that SSIM index. The SSIM index is considered as opposed to other recent IQA al- gorithms since its performance is very competitive while being highly amenable to analysis. Nevertheless, a closed form rela- tion between rate and SSIM index for fixed rate uniform quan- tization is extremely difficult, if not impossible, to derive. In- stead, we derive upper and lower bounds on the SSIM index as a function of quantization rate (under a high rate assumption). It is shown that the SSIM index between the reference and quan- tized versions of the input falls within these bounds not only for uniform, Gaussian, and Laplacian sources, but also for nat- ural images. The usefulness of the bounds in a practical image coding scenario is demonstrated using a simple rate allocation example. A. Related Work A brief overview of IQA algorithms is presented, followed by a discussion of image coding algorithms whose design is influ- enced by the properties of the HVS. The human eye is the ulti- mate receiver of all visual information. It is, therefore, natural to take into account the properties of the human visual system (HVS) in the design of an objective perceptual distortion mea- sure. Several full-reference IQA algorithms have incorporated important properties of the HVS in their design. Yet, it is im- portant to note that the HVS is still only weakly understood. This makes image quality assessment (even full-reference) a very challenging task that is still being actively researched. The inadequacy of global MSE as a measure of image quality is well documented [8], [13], [21], [30], [31], [34], [37], [38], [39], [45]. In the context of image coding, it has been shown that incorporating models for several important aspects of the HVS such as the contrast sensitivity function (CSF), eye movement, edge masking etc. in the design, in addition to using distortion measures such as frequency weighted mean squared error, result in substantial quality gains when compared to using only the MSE [13], [43]. We briefly discuss perceptual distortion mea- sures, followed by a discussion of perceptually motivated image coding systems. Lubin’s mechanistic model for the HVS [21], Daly’s visual difference predictor (VDP) [8], Teo and Heeger’s normaliza- tion model for the visual cortex [34], and Winkler’s model of the HVS for video stimulus [45] are all examples of IQA and VQA algorithms that explicitly model the HVS in their design. All these algorithms use some form of the norm and error- 1057-7149/$25.00 © 2008 IEEE

1624 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 17, …live.ece.utexas.edu/publications/2008/ssc_ratebounds_tip_sep08.pdf · 1624 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 17,

  • Upload
    others

  • View
    8

  • Download
    0

Embed Size (px)

Citation preview

Page 1: 1624 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 17, …live.ece.utexas.edu/publications/2008/ssc_ratebounds_tip_sep08.pdf · 1624 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 17,

1624 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 17, NO. 9, SEPTEMBER 2008

Rate Bounds on SSIM Index of Quantized ImagesSumohana S. Channappayya, Member, IEEE, Alan Conrad Bovik, Fellow, IEEE, and

Robert W. Heath, Jr., Senior Member, IEEE

Abstract—In this paper, we derive bounds on the structural simi-larity (SSIM) index as a function of quantization rate for fixed-rateuniform quantization of image discrete cosine transform (DCT)coefficients under the high-rate assumption. The space domainSSIM index is first expressed in terms of the DCT coefficients ofthe space domain vectors. The transform domain SSIM index isthen used to derive bounds on the average SSIM index as a func-tion of quantization rate for uniform, Gaussian, and Laplaciansources. As an illustrative example, uniform quantization of theDCT coefficients of natural images is considered. We show thatthe SSIM index between the reference and quantized images fallwithin the bounds for a large set of natural images. Further, weshow using a simple example that the proposed bounds could bevery useful for rate allocation problems in practical image andvideo coding applications.

I. INTRODUCTION

T HE mean squared error (MSE) is a popular metric inthe design of algorithms ranging from image quality

assessment to quantization to restoration. The popularity ofthe MSE can be attributed to two main reasons: amenability toanalysis and a lack of competitive perceptual distortion metrics.The importance of designing image processing algorithms opti-mized for perceptual quality measures, as opposed to the MSE,has been long recognized [7], [13]. Image coding algorithmsthat are optimized for perceptual distortion measures have beenproposed by several authors and have become a part of imagecoding standards [10], [23], [24], [26], [43]. These algorithmsuse distortion measures such as those based on models of thehuman visual system (HVS), variants of the MSE such as theweighted MSE, or other empirical measures of quality.

Advances in understanding the HVS [29], and improved mod-eling of the statistics of natural scenes [36] have fostered thedevelopment of powerful image quality assessment (IQA) algo-rithms. These new IQA algorithms include, among others, thestructural similarity (SSIM) index [39], or Wang-Bovik index,and its variants [41], [42], the Visual Information Fidelity Crite-rion (VIF) [30] and the Visual Signal to Noise Ratio (VSNR) [4].These IQA algorithms outperform the measures currently used

Manuscript received August 23, 2007; revised April 16, 2008. PublishedAugust 13, 2008 (projected). This work was supported in part by the TexasAdvanced Technology Program under Grant 003658-0380-2003 and in part bythe National Science Foundation under Grant 0728748. The associate editorcoordinating the review of this manuscript and approving it for publicationwas Dr. Giovanni Poggi.

S. S. Channappayya was with the Department of Electrical and Computer En-gineering, The University of Texas at Austin, Austin TX 78712-0240 USA. Heis now with the PacketVideo Corporation, San Diego, CA 92121 USA (e-mail:[email protected]).

A. C. Bovik and R. W. Heath, Jr. are with the Department of Electrical andComputer Engineering, The University of Texas at Austin, Austin, TX 78712-0240 USA (e-mail: [email protected]; [email protected]).

Color versions of one or more of the figures in this paper are available onlineat http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TIP.2008.2001400

in perceptual image coding algorithms such as the just-notice-able distortion (JND) metric [22], in addition to being better thanthe MSE in measuring perceptual image quality [32]. Image pro-cessing algorithms that optimize for these new IQA algorithmsare only recently being developed [5], [6], [40]. The emergenceof these IQA algorithms present the possibility of improved de-sign of image coding algorithms.

In this paper, we analyze the relation between the quantiza-tion rate and the distortion measured by that SSIM index. TheSSIM index is considered as opposed to other recent IQA al-gorithms since its performance is very competitive while beinghighly amenable to analysis. Nevertheless, a closed form rela-tion between rate and SSIM index for fixed rate uniform quan-tization is extremely difficult, if not impossible, to derive. In-stead, we derive upper and lower bounds on the SSIM index asa function of quantization rate (under a high rate assumption). Itis shown that the SSIM index between the reference and quan-tized versions of the input falls within these bounds not onlyfor uniform, Gaussian, and Laplacian sources, but also for nat-ural images. The usefulness of the bounds in a practical imagecoding scenario is demonstrated using a simple rate allocationexample.

A. Related Work

A brief overview of IQA algorithms is presented, followed bya discussion of image coding algorithms whose design is influ-enced by the properties of the HVS. The human eye is the ulti-mate receiver of all visual information. It is, therefore, naturalto take into account the properties of the human visual system(HVS) in the design of an objective perceptual distortion mea-sure. Several full-reference IQA algorithms have incorporatedimportant properties of the HVS in their design. Yet, it is im-portant to note that the HVS is still only weakly understood.This makes image quality assessment (even full-reference) avery challenging task that is still being actively researched.

The inadequacy of global MSE as a measure of image qualityis well documented [8], [13], [21], [30], [31], [34], [37], [38],[39], [45]. In the context of image coding, it has been shown thatincorporating models for several important aspects of the HVSsuch as the contrast sensitivity function (CSF), eye movement,edge masking etc. in the design, in addition to using distortionmeasures such as frequency weighted mean squared error, resultin substantial quality gains when compared to using only theMSE [13], [43]. We briefly discuss perceptual distortion mea-sures, followed by a discussion of perceptually motivated imagecoding systems.

Lubin’s mechanistic model for the HVS [21], Daly’s visualdifference predictor (VDP) [8], Teo and Heeger’s normaliza-tion model for the visual cortex [34], and Winkler’s model ofthe HVS for video stimulus [45] are all examples of IQA andVQA algorithms that explicitly model the HVS in their design.All these algorithms use some form of the norm and error-

1057-7149/$25.00 © 2008 IEEE

Page 2: 1624 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 17, …live.ece.utexas.edu/publications/2008/ssc_ratebounds_tip_sep08.pdf · 1624 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 17,

CHANNAPPAYYA et al.: RATE BOUNDS ON SSIM INDEX OF QUANTIZED IMAGES 1625

pooling to arrive at a measure of quality. These measures per-form consistently better than the MSE in measuring the percep-tual quality of images and videos [32].

In a departure from earlier quality measures, Wang and Bovik[37] proposed the universal image quality index (UQI). The UQIdiffers from the earlier philosophy in that there is no explicitHVS modeling and the error is no longer measured using annorm. The idea behind UQI is to measure the distortion of threeimage features (locally)—luminance, contrast, and correlationbetween the reference and the distorted image. The correlationbetween UQI and the mean opinion score (MOS) of subjectivestudies is significantly better than the correlation between MSEand MOS [37]. In addition to better perceptual correlation, thismetric is intuitive, computationally efficient, and analyticallyamenable. Wang et al., [39] proposed an improvement to theUQI in a measure called the structural similarity (SSIM) index.The SSIM index introduces stabilizing constants to handle theinstability issues associate with the UQI. A detailed discussionof the SSIM index is presented in Section II-A. Some of theother recent IQA algorithms include the information fidelity cri-terion (IFC) [31], and its improved version, the visual informa-tion fidelity criterion [32]. Both these measures use empiricalstatistical models for natural scenes in their design [33], [36]. Aperformance analysis of these recent algorithms can be found in[30]. Another recent IQA algorithm is the visual signal to noiseratio (VSNR) [4]. A detailed discussion of the state-of-the-artimage quality metrics can be found in [38].

The importance of optimizing image codecs for perceptualdistortion measures has also been long recognized. A nat-ural way to design such codecs is to use HVS models (suchas the aforementioned ones) in the bit-allocation process.Mannos and Sakrison [23] provided the first-ever analysis ofrate versus a perceptual distortion measure. The distortionmeasure used here is a weighted mean squared error and theoptimal weighting function is determined empirically so thatit maximizes perceptual quality. The general approach to thedesign of a perceptually optimal encoder is that the image isdecomposed using transforms such as the DCT or wavelets.Visual models [21], [8] are used to mask the coefficients in sucha way that the visibility of quantization errors is minimized.Examples of such codecs include Nill’s perceptually weightedcosine transform approach [24], Eggerton and Srinath’s per-ceptually weighted quantization of DCT coefficients subject toan entropy constraint [10], the Safranek–Johnston perceptualimage coder [28], Watson’s work on perceptually optimizedDCT quantization matrices [44], Buccigrossi and Simoncelli’simage codec based on the statistics of wavelet coefficients [2],Hontsch and Karam’s adaptive coder with perceptual distortioncontrol [16], Chandler and Hemami’s dynamic contrast-basedimage coder [3], and Liu et al.’s JPEG2000 compliant encodingwith a perceptual distortion control mechanism [20]. This listis nonexhaustive but covers several significant contributionsmade towards the design of perceptually optimal image codecs.

Eckert and Bradley [9] provide a thorough review of percep-tually optimized image coding techniques. Pappas and Safranek[25] summarize several popular image quality measures withparticular reference to those used for image compression. Theflavor of our work presented here is most closely related to [23]and [44] in that we analyze the relation between the quantizationrate and a popular and successful perceptual distortion measure(SSIM index).

B. Proposed Work

Next, we discuss the nature and relevance of the proposedwork, followed by a brief outline of the rest of the paper. TheMSE between a random variable (RV) and its quantizedversion at a given quantization rate for fixed rate uni-form scalar quantization is well known and is approximatedas MSE [12], [15], where is thequantization step size. This is valid under the high resolutionassumption, where the rate is large, and the contribution ofthe overload region is ignored. This result holds well for mostpractical scenarios in image and video coding and has beenused to estimate the MSE (PSNR) of quantized images as afunction of quantization rate (e.g., Sabir et al. [27]). Currently,a similar relation between the SSIM index and quantization ratedoes not exist. As with the MSE, such a relation would be veryuseful in bit-allocation problems in image and video coding.In a broader context, such a relation (between rate and SSIMindex) could be used in the design of image and video codecsthat can guarantee a desired level of perceptual quality.

In this paper, upper and lower bounds on the SSIM indexas a function of quantization rate are proposed. Fixed rate uni-form quantization under the high-rate assumption is considered.Since the discrete cosine transform (DCT) is commonly used inthe transform coding of images, our analysis is carried out inthe DCT domain. It is shown for a large set of natural images,in addition to uniform, Gaussian, and Laplacian sources, thatthe SSIM index between the reference and quantized versionsof the input lie within the proposed bounds. We demonstrate theusefulness of the proposed bounds in a practical image codingscenario with a simple rate allocation example.

The paper is organized as follows. Section II presents a briefoverview of the SSIM index, and uniform quantization, and for-mulates the SSIM index versus rate problem. Bounds on theSSIM index as a function of quantization rate are presentedin Section III, followed by a discussion of its properties. InSection IV, we present results and highlight the usefulness of theproposed bounds, followed by concluding remarks in Section V.

II. PROBLEM FORMULATION

In this section, we provide an overview of the SSIM index,and express the space domain SSIM index in terms of the DCTcoefficients of the space domain vectors. We briefly discuss uni-form quantizers, and then the notation used in the sequel. Fi-nally, the expression for the average SSIM index as a function ofquantization step size is presented, and the motivation to boundthis expression is discussed.

A. Structural Similarity Index

The most general form of the metric that is used to measurethe structural similarity between two signal vectors and(both in ) is

SSIM (1)

The term comparesthe luminance of the signals,

compares the contrast of the signals, andmeasures the structural correlation

of the signals. The quantities are the sample means of

Page 3: 1624 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 17, …live.ece.utexas.edu/publications/2008/ssc_ratebounds_tip_sep08.pdf · 1624 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 17,

1626 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 17, NO. 9, SEPTEMBER 2008

and respectively, are the sample variances of andrespectively, and is the sample cross-covariance betweenand . The constants are used to stabilize the metricfor the case where the means and variances become small. Theparameters , and , are used to adjustthe relative importance of the three components. We use thefollowing simplified form of the SSIM index in our work (where

, and )

SSIM (2)

In image quality assessment, image blocks from the referenceand distorted image constitute the vectors and respectively.The average of the SSIM values across the image (also calledmean SSIM or MSSIM) gives the final quality measure. Thedesign philosophy of the SSIM index is to acknowledge the factthat natural images are highly structured, and that the measureof structural correlation (between the reference and the distortedimage) is important for deciding overall visual quality. Further,the SSIM index measures quality locally and is able to capturelocal dissimilarities better, unlike global quality measures suchas MSE (and, hence, PSNR). Though (2) has a form that is morecomplicated than MSE, it remains analytically tractable. Thesefeatures make the SSIM index attractive to work with.

B. Measuring SSIM index From DCT Coefficients

The DCT is widely used in the transform coding of imagesand videos and is central to several popular image (JPEG) andvideo coding standards (MPEG-x) [26], [11]. Highly efficientsoftware and hardware implementations of the DCT form thecore of several of these standards. The DCT is popular due toits energy compaction property, combined with efficient imple-mentations. These reasons motivate us to perform our analysisin the DCT domain. The SSIM index in (2) is defined in thespace domain, however. In the following, we derive a simpleyet useful formula for measuring the SSIM index between twovectors from their DCT coefficients. Similar expressions can beobtained for the Fourier transform, as well.

In order to measure the SSIM index from DCT coefficients,the space domain mean, variance, and cross correlation are ex-pressed in terms of DCT coefficients. The DCT of a vector

is [18]

whereif

(3)

The DCT is a unitary transform and obeys the Parseval’s the-orem [35]. Using this property and (3), the following relationsbetween space domain mean, variance, cross correlation, andthe DCT coefficients are established

from (4)

Fig. 1. Three-bit uniform quantizer. Shown are the quantization intervals andquantization levels. The figure also shows the notation used in the paper.

(5)

(6)

Substituting the space domain mean, variance, and cross corre-lation terms in the definition of SSIM (2) with the expressionsin (4)–(6)

SSIM

(7)

This expression can be particularly useful when perform qualityassessment of JPEG compressed images without having to de-compress the images to the space domain (for computing theSSIM index from nonoverlapping blocks). We use the DCT do-main expression for the SSIM index in the following analysis.

C. Uniform Quantization

Uniform quantization [1], [15] is the earliest, simplest, andmost common form of quantization. It is used in a range ofaudio, image, and video coding applications [11], [26] mainlydue to its simplicity. While other forms of quantization are wellstudied [15], asymptotic analysis of the relation between rateand distortion (mean squared error) for fixed-rate uniform quan-tization of symmetric sources with infinite support was reportedonly as recently as 2001 by Hui and Neuhoff [17]. We use re-sults from this work in our implementation.

A uniform quantizer is illustrated in Fig. 1, with its granularregion highlighted. The following notation is used in our anal-ysis. The range of the granular region is denoted by , thenumber of quantization levels , where is the quantiza-tion rate. The quantizer step size is denoted by .The quantization levels are denoted by , with

. An interval in the granular region isdenoted by .

The relation between SSIM index and rate is derived underthe high rate assumption and includes contributions only fromthe granular region. We assume that the DCT coefficients areindependent [19], and that they are quantized at different stepsizes [26]. In the sequel, we use the term rate and quantiza-tion step size interchangeably (for notational convenience) sincethey are related by , where and are as de-fined above.

Page 4: 1624 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 17, …live.ece.utexas.edu/publications/2008/ssc_ratebounds_tip_sep08.pdf · 1624 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 17,

CHANNAPPAYYA et al.: RATE BOUNDS ON SSIM INDEX OF QUANTIZED IMAGES 1627

D. Relation Between SSIM Index and Quantization Rate

Let denote a random vector com-posed of DCT coefficients. In the sequel, we assume that theelements are independent and have a jointdensity . Each element ofthe random vector is uniformly quantized at rate . Underthese assumptions, an interval in the joint granular region of thequantizers can be indexed by a vector ,where varies between 0 and . The vectoris quantized by a point . We ig-nore the contribution of the overload region to the average SSIMbetween and , and consider only the granular region.The average SSIM index between and is computed asshown in (8) at the bottom of the page. In practice, the mostcommon DCT block size used in image and video coding ap-plications is 8 8. The expression in (8) however, is quite for-midable to evaluate and implement even for DCT block sizes assmall as 2 2. Therefore, directly using (8) in a practical sce-nario appears extremely difficult, if not impossible. To make thisproblem tractable, we develop upper and lower bounds on (8).These bounds are shown to be accurate in estimating the rangeof the average SSIM index between the reference and quantizedversions of a variety of sources. Further, it is also shown thatthe bounds are easier to implement and evaluate than an explicitsolution to (8).

III. BOUNDS ON THE SSIM INDEX

In this section, we present upper and lower bounds on theaverage SSIM index as a function of quantization rate, eval-uate these bounds for uniform, Gaussian, and Laplacian sources,and discuss several properties of these bounds. We assume thatthe DCT coefficients are independent, and each coefficientis quantized separately at step size . The high-resolution as-sumption is made, and only the contribution of the granular re-gion is considered.

Theorem 3.1: For a random vector with independentcomponents, the average SSIM index [as defined in (7)and (8)] between and its uniformly quantized version

is boundedwith probability by

SSIM

(9)

where is the step size assigned to quantizer to quan-tize random variable

is the average value of the contribution fromthe mean term, are quantities defined below, is therange of the granular region of the quantizer with the largestspan, and is a stabilizing constant [from (7)].

The terms and for a given probability are

(10)

where are dependent on the source distribu-tion. For the case of uniform sources,

, whichmakes the bounds hold with probability .

The main idea used to arrive at the bounds is to relate theSSIM index in (7) to MSE. Once this relation is established,the average SSIM index can be bounded with terms that area function of the MSE, and the standard high resolution MSEresult for fixed rate uniform quantization can be applied. Thedetailed proof can be found in the Appendix. The use of thestandard MSE result gives the bounds several useful propertiesthat are discussed in the following, and in Section IV.

The term is evaluated next for uniform, Gaussian, andLaplacian sources. The uniform source is considered as it is bestsuited to uniform quantization. Its upper bound provides an es-timate of the highest average SSIM index that is achievable ata given quantization rate. Gaussian and Laplacian sources areconsidered as they are commonly used to model DCT coeffi-cients [19]. The expressions for these bounds can be very easilyimplemented for these sources for any DCT block size. Most

SSIM

SSIM

SSIM

(8)

Page 5: 1624 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 17, …live.ece.utexas.edu/publications/2008/ssc_ratebounds_tip_sep08.pdf · 1624 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 17,

1628 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 17, NO. 9, SEPTEMBER 2008

importantly, we show that the bounds are indeed accurate notonly for these sources, but also for a large set of natural images.

Before presenting expressions for , we revisit the notationused. These expressions correspond to the DC coefficient ,quantized by quantizer . The rate assigned to this quantizeris . The quantization levels of are indexed using . Theupper and lower limits of an interval are notated by and

respectively. Finally, a quantization level is denoted by .The number of DCT coefficients is denoted by , and is aconstant from (7). The steps in arriving at the expressions forare presented in the Appendix.

A. Uniform Source

Suppose that the DC coefficient is uniformly distributedover . For this case, the expression for isgiven by

(11)

B. Gaussian Source

If the DC coefficient is Gaussian distributed with zeromean and variance , the expression for is given by

(12)

where , is the exponential in-tegral. The expression for is an approximation in this casesince we consider the contribution of only one term in the nu-merator (see Appendix for details).

If the AC coefficients are indepen-dent and Gaussian distributed with zero mean and variance

, respectively

(13)

C. Laplacian Source

If the DC coefficient is Laplacian distributed with zeromean and variance , the expression for is a combinationof three terms depending on the values of the upper and lowerlimits of the interval . Suppose that there are intervals

corresponding to Case 1 intervalsin Case 2 , and intervals in Case 3

, with . Each case isevaluated as follows.

Case 1:

(14)

where isthe exponential integral.

Case 2:

(15)

where isthe exponential integral.

Case 3:

(16)

with as above evaluated over the interval and alsoas above, evaluated over

(17)

If the AC coefficients are indepen-dent and Laplacian distributed with zero mean and variance

, respectively

(18)

are a conservative set of parameters that satisfy the bounds.The distribution of is assumed to have zero mean mainly

to simplify notation. The essence of these results is the sameirrespective of the mean.

D. Properties of the Bounds

The bounds in (9) possess several useful properties. (a) Theterms can be easily evaluated for several

Page 6: 1624 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 17, …live.ece.utexas.edu/publications/2008/ssc_ratebounds_tip_sep08.pdf · 1624 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 17,

CHANNAPPAYYA et al.: RATE BOUNDS ON SSIM INDEX OF QUANTIZED IMAGES 1629

Fig. 2. Uniform source: For i.i.d. uniform sources (over [�0:5; 0:5]) of different lengths, that have been uniformly quantized at the same rate, shown are the upperand lower bounds on SSIM index. Also shown is the true SSIM index. (a) Source size 4� 4. (b) Source size 8� 8.

Fig. 3. Gaussian source: For i.i.d. Gaussian sources (zero mean, unit variance) of different lengths, that have been uniformly quantized at the same rate, shownare the upper and lower bounds on SSIM index with p = 0:9. Also shown is the true SSIM index. (a) Source size 4� 4. (b) Source size 8� 8.

commonly used unbounded source types (as shown in the pre-vious subsections). (b) The second term in the product is easy toevaluate. These two properties make the bounds tractable whencompared to (8). (c) In practice, different DCT coefficients arequantized at different rates in order to optimally allocate bits.The bounds hold for any combination of rates, thereby makingthem attractive in practical rate allocation problems. (d) Fromthe expression for the bounds, we see that they can be imple-mented efficiently and easily (even for the complex lookingLaplacian case). Note that the second term in the bound involvesonly summation and division operations. This property couldbe very useful if these bounds were to be used in real-time codecimplementations. This property also allows for fast computationat any practical DCT block size. (e) These bounds can easilybe extended to SSIM index’s predecessor—the universal imagequality index (UQI).

A point to note is that though the analysis considers a 1-DDCT, it is easy to show that the results carry over to the 2-DDCT case. The 2-D DCT obeys the Parseval’s theorem, and therelation between the space domain and DCT domain means andinner products also hold. In the following section, we presentseveral results that illustrate the useful properties of the pro-posed bounds.

IV. RESULTS

In this section, simulation results for a variety of examplesare presented. Results for uniform, Gaussian, and Laplacian arepresented first, followed by results for natural images. Finally,to illustrate the usefulness of these bounds in a practical imagecoding scenario, a bit-allocation problem is solved using theproposed bounds.

A. Uniform, Gaussian, and Laplacian Sources

The results are classified into two parts: equal rate and un-equal rate allocation. As the names suggest, in the equal ratecase, all the elements of the random vector are quantized at thesame rate, while in the unequal rate case, different rates are as-signed to different elements of the random vector.

1) Equal Rate: The first set of results are presented inFigs. 2–4. These results illustrate several properties of thebounds. The true SSIM index lies within the bounds overa range of rates for all three source types, over a range ofcommonly used DCT block sizes. In these results, zero meani.i.d source have been used with the uniform source in therange , and the Gaussian and Laplacian sources bothhaving unit variance. The source sizes correspond to random

Page 7: 1624 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 17, …live.ece.utexas.edu/publications/2008/ssc_ratebounds_tip_sep08.pdf · 1624 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 17,

1630 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 17, NO. 9, SEPTEMBER 2008

Fig. 4. Laplacian source: For i.i.d. Laplacian sources (zero mean, unit variance) of different lengths, that have been uniformly quantized at the same rate, shownare the upper and lower bounds on SSIM index with p = 0:9. Also shown is the true SSIM index. (a) Source size 4� 4. (b) Source size 8� 8.

Fig. 5. General variance example: Upper and lower bounds on SSIM index for sources whose variance is different from unity, along with true SSIM Indices.(a) 64 independent uniform random variables (RVs) that have been divided into four groups. The first group is uniformly distributed over [�2; 2], the second groupover [�1:5; 1:5], the third group over [�1; 1], and the last group over [�0:5;0:5]. (b) 64 zero mean independent Gaussian RVs divided into four groups, with thefirst group having standard deviation of 4, the second group having a standard deviation of 3, the third having a standard deviation of 2, and the last group having astandard deviation of 1. (c) 64 zero mean independent Laplacian RVs also divided into four groups, and assigned variances identical to the Gaussian case. p = 0:9.(a) Uniform source. (b) Gaussian source. (c) Laplacian source.

vectors of size 4, 16, and 64, respectively. It is important to notethat these results are general in the sense that they can be usedfor comparison with any normalized data set.

The bounds presented in the previous section hold for anysource variance. Fig. 5 shows the results for the case of equalrate allocation to the three sources with zero mean, and variancedifferent from one. In this case, the source vector contains 64zero mean independent random variables (RVs). For simplicity,the component RVs are divided into four groups, and each groupis i.i.d. For the uniform source, the first group is distributed over

, the second group over , the third group over, and the last group over . For the Gaussian

source, the first group has a standard deviation of 4, the secondgroup has a standard deviation of 3, the third has a standarddeviation of 2, and the last group has a standard deviation of1. Finally, the four groups in the Laplacian case have variancesidentical to the Gaussian case. From the figure, we see that thetrue SSIM index lies within these bounds over a range of rates.These plots point to the fact that the proposed bounds could beused as-is in a practical image coding scenario where there is norestriction on the variance of the DCT coefficients. This exampleis still not general enough since we assume that equal rate isassigned to all the RVs. We present more general results in thefollowing.

2) Unequal Rate: We now present results for the more prac-tically relevant case of unequal rates being assigned to the ele-

ments of the random vectors. As before, both unit-variance, anda general variance source set is considered.

The results for the unit-variance (unit step for the uniformsource) are shown in Fig. 6. In this example, the source vector iscomposed of 64 zero mean i.i.d RVs. For the uniform source, theRVs are distributed over . The Gaussian and Lapla-cian sources have zero mean and unit variance. The 64 RVsare divided into four groups, and each group is assigned thesame rate. The division is motivated by the practice of groupingDCT coefficients based on their perceptual importance [43]. Thecombinations of rates considered are (8, 6, 4, 2), (5, 5, 3, 3), (4, 3,2, 1), and (3, 3, 1, 1). These rates were chosen as they are a rep-resentative set of practical rate combinations. The figure revealsthat the true SSIM index lies within the bounds, and provide agood estimate of the range of the average SSIM index for all thecombinations considered. This example further strengthens thecase for the applicability of these bounds in a practical setting.

The results for the most general case of unequal rates beingassigned to a source composed of independent RVs is shownin Fig. 7. As with the unit variance case above, the source iscomposed of 64 zero mean independent RVs that are dividedinto four groups. Each group has the same variance, and is as-signed the same quantization rate. The sources considered herehave the same distributions as in the equal-rate general varianceexample. The same rate combinations as in the unit variancecase are used. The figure shows that the true SSIM index again

Page 8: 1624 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 17, …live.ece.utexas.edu/publications/2008/ssc_ratebounds_tip_sep08.pdf · 1624 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 17,

CHANNAPPAYYA et al.: RATE BOUNDS ON SSIM INDEX OF QUANTIZED IMAGES 1631

Fig. 6. Upper and lower bounds on the SSIM index for a set of 64 i.i.d zero-mean sources that have been divided into four groups, and each group quantized ata different rate. The rate profiles include (8, 6, 4, 2), (5, 5, 3, 3), (4, 3, 2, 1), and (3, 3, 1, 1). (a) Uniform source distributed over [�0:5; 0:5]. (b) Zero mean, unitvariance Gaussian source. (c) Zero mean, unit variance Laplacian source. p = 0:9. (a) Uniform. (b) Gaussian. (c) Laplacian.

Fig. 7. Upper and lower bounds on the SSIM index for a set of 64 independent zero-mean sources that have been divided into four groups, and each groupquantized at a different rate. The rate profiles include (8, 6, 4, 2), (5, 5, 3, 3), (4, 3, 2, 1), and (3, 3, 1, 1). (a) 64 independent uniform RVs that have been dividedinto four groups. The first group is uniformly distributed over [�2; 2], the second group over [�1:5;1:5], the third group over [�1; 1], and the last group over[�0:5;0:5]. (b) 64 zero mean independent Gaussian RVs divided into four groups, with the first group having standard deviation of 4, the second group havinga standard deviation of 3, the third having a standard deviation of 2, and the last group having a standard deviation of 1. (c) 64 zero mean independent LaplacianRVs also divided into four groups, and assigned variances identical to the Gaussian case. p = 0:9. (a) Uniform. (b) Gaussian. (c) Laplacian.

lies within the bounds for all the combinations considered. Thisexample demonstrates that the bounds can indeed be used in apractical image coding scenario where there is no restriction onthe variance of the source or on the rate that is assigned to eachDCT coefficient. We highlight this with examples of quantiza-tion of DCT coefficient of natural images in the following.

All the simulation results presented so far have the followingexperimental setup. For each distribution, a realization con-sisting of 100 000 samples is used. Each point in the plot is theaverage of ten iterations. The value for the granular regionfor the Gaussian and Laplacian source have been chosen basedon the optimal values presented in Hui and Neuhoff [17].

B. Natural Images

In case of natural images, the DCT coefficients are quantizedat various rates, and the SSIM index between the reference andquantized images is computed. Since DCT coefficients of imagepatches are well approximated either by Gaussian or Lapla-

cian probability density functions (pdf’s), the bounds for thesesources are compared with the true SSIM Indices. The mean andvariance of the DCT coefficients are used to estimate bounds forGaussian and Laplacian sources. In all our examples, an 8 8DCT block size is used. As with the distributions above, we con-sider two sets of results for natural images as well, one whereequal rate is assigned to all the coefficients, and the other wherethere is no such restriction.

1) Equal Rate: Two examples are presented, one where theDCT coefficients of the image are normalized (subtract meanand divide by variance), and the other where no normalizationis performed. In both cases, like frequency DCT coefficients aregrouped together, and quantized at the same rate.

The bounds for the zero-mean unit variance case for Gaussianand Laplacian sources are general in the sense they could becompared with the quantization results for any normalized dataset (and saved as a lookup table). The results for equal rate allo-cation to normalized DCT coefficients is shown in Fig. 8. It pro-

Page 9: 1624 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 17, …live.ece.utexas.edu/publications/2008/ssc_ratebounds_tip_sep08.pdf · 1624 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 17,

1632 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 17, NO. 9, SEPTEMBER 2008

Fig. 8. Equal rate normalized coding. DCT coefficients are first normalized, and then quantized at the rates shown. The yellow plot (squares) shows the SSIMindex between the reference and quantized normalized coefficients. Note that the true SSIM Indices fall nicely within the the bounds for all three images. p = 0:9.(a) Boats. (b) Mandrill. (c) Goldhill.

Fig. 9. Equal rate general coding. The DCT coefficients are quantized at the rates shown. The yellow plot (squares) shows the SSIM index between the referenceand quantized images. p = 0:9. (a) Boats. (b) Mandrill. (c) Goldhill.

vides initial validation that the true SSIM index of normalizedDCT coefficients of natural images also lies within the bounds.It also shows that the Gaussian and Laplacian models for DCTcoefficients are indeed a good fit. The SSIM index values re-ported in these plots are measured between the normalized DCTcoefficients, and their quantized versions.

In a practical setting, DCT coefficients are not usually nor-malized. The proposed bounds are general, and do not imposeany normalization restrictions. In our next example, the DCTcoefficients are quantized after their mean is removed (forsimplicity). The variance of the DCT coefficients (groupedaccording to their frequency) is used to determine the granularregion of the quantizer, based on the results from Hui andNeuhoff [17]. The results for this case are shown in Fig. 9. TheSSIM Indices for the images are true values measured betweenthe reference and quantized images. These plots reiterate thatthe bounds provide a good estimate of the range of the expectedSSIM index for natural images. Observe that the Laplacianlower bound and Gaussian upper bound form good lower andupper bounds for natural images. Note that this example isstill restricted by the equal rate assumption. We present moregeneral results in the following.

2) Unequal Rate: We now present results for unequal ratequantization of DCT coefficients of natural images. The resultsare presented for both normalized and regular DCT coefficients.To present the results succinctly, yet retain the flavor of a prac-tical setting, the following experimental setup is used. The 64DCT coefficients are lexicographically ordered, and divided into

four groups. The first group contains the DC coefficient, and isgiven the highest precedence in rate allocation. The remainingthree groups are given progressively decreasing importance. Allthe coefficients in a group are assigned the same rate. Two casesare considered—one at high rate, and the other at low rate.

The results are presented first for the normalized case. As be-fore, the normalized case serves as a good first test to validatethe proposed bounds. Fig. 10 presents the results for the highrate case where the four DCT coefficient groups are assigned 8,6, 4, and 2 bits/coefficient respectively, according to their impor-tance. The results for the low rate case where the four DCT co-efficients are assigned 3, 2, 1, and 1 bit/coefficient respectivelyare presented in Fig. 11. From these results, it is seen that thetrue SSIM Indices lie well within the Gaussian and Laplacianbounds for all the images. One main reason for the accuracy ofthese bounds is the goodness of Gaussian and Laplacian fits tothe DCT coefficients. The constants that go into the bounds havebeen chosen empirically so that the bounds are as tight as pos-sible for as many images as possible. This is discussed in detailin the following subsections.

Finally, we present the most general example. In this ex-ample, the DCT coefficients are quantized (after mean subtrac-tion), at two rates, as before. The variance of the DCT coeffi-cients is used to determine the extent of the granular regions ofthe quantizer, and to form the bounds. The results for the highrate case is shown in Fig. 12, and for the low rate case in Fig. 13.From the figures, we can conclude that the bounds are indeeduseful even in the most practical case. The true SSIM Indices

Page 10: 1624 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 17, …live.ece.utexas.edu/publications/2008/ssc_ratebounds_tip_sep08.pdf · 1624 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 17,

CHANNAPPAYYA et al.: RATE BOUNDS ON SSIM INDEX OF QUANTIZED IMAGES 1633

Fig. 10. Upper and lower bounds on the SSIM index of quantized normalized DCT coefficients of six natural images—Boats, Mandrill, Goldhill, Lena, Peppers,and Barbara. The DCT coefficients have been lexicographically ordered, and divided into four groups. The first group is assigned 8 bits/coefficient, the next group6 bits/coefficient, the third group is allocated 4 bits/coefficient, and the last group is assigned 2 bits/coefficient. (a) Upper and lower bounds for a zero mean unitvariance i.i.d Gaussian source consisting of 64 components. (b) Upper and lower bounds for a zero mean unit variance i.i.d Laplacian source consisting of 64components. The bounds vary across images since the constants C ;C are image dependent. p = 0:9. (a) Gaussian. (b) Laplacian.

Fig. 11. Upper and lower bounds on the SSIM index of quantized normalized DCT coefficients of six natural images—Boats, Mandrill, Goldhill, Lena, Peppers,and Barbara. The DCT coefficients have been lexicographically ordered, and divided into four groups. The first group is assigned 3 bits/coefficient, the next group2 bits/coefficient, the third group is allocated 1 bit/coefficient, and the last group is assigned 1 bit/coefficient. (a) Upper and lower bounds for a zero mean unitvariance i.i.d Gaussian source consisting of 64 components. (b) Upper and lower bounds for a zero mean unit variance i.i.d Laplacian source consisting of 64components. The bounds vary across images since the constants C ;C are image dependent. p = 0:9. (a) Gaussian. (b) Laplacian.

lie well within the bounds for both the high and low rate cases.The bounds are tighter for the high rate case as compared to thelow rate case, which can be attributed to the underlying high res-olution result. As with the normalized case, our aim is to useand in the bounds that are as general as possible, across rates,and images. This aspect is discussed further in the following.

C. Bit-Allocation Example

So far, examples that highlight several useful properties of thebounds have been presented. Now, an example that illustratesthe practical applicability of the bounds is presented. Consider

the following rate allocation problem, and the associated con-straints. Suppose that a bit budget of 128 bits is to be allocated tothe 64 DCT coefficients. To make this problem tractable, the fol-lowing constraints are introduced. The DCT coefficients are di-vided into four groups, each containing 16 coefficients. Further,the first group is assumed to contain the most important coeffi-cients, the next group to contain the next most important coeffi-cients, and so on. Finally, the more important group is always as-signed bits greater than or equal to the number of bits assigned tothe group immediately lower in importance. Though this setupis simple, it is a fair reflection of a true coding scenario. Underthese assumptions, four combinations are possible—(5, 1, 1, 1),

Page 11: 1624 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 17, …live.ece.utexas.edu/publications/2008/ssc_ratebounds_tip_sep08.pdf · 1624 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 17,

1634 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 17, NO. 9, SEPTEMBER 2008

Fig. 12. Upper and lower bounds on the SSIM index of six quantized natural images—Boats, Mandrill, Goldhill, Lena, Peppers, and Barbara. The DCT coefficientshave been lexicographically ordered, and divided into four groups. The first group is assigned 8 bits/coefficient, the next group 6 bits/coefficient, the third groupis allocated 4 bits/coefficient, and the last group is assigned 2 bits/coefficient. (a) Upper and lower bounds for a zero mean Gaussian source consisting of 64independent components. The variance of the components is equal to the variance of the 64 DCT coefficients, respectively. (b) Upper and lower bounds for azero mean Laplacian source consisting of 64 independent components. Again, the variance of the components is equal to the variance of the 64 DCT coefficients,respectively. Also shown is the true SSIM index. p = 0:9. (a) Gaussian. (b) Laplacian.

Fig. 13. Upper and lower bounds on the SSIM index of six quantized natural images—Boats, Mandrill, Goldhill, Lena, Peppers, and Barbara. The DCT coefficientshave been lexicographically ordered, and divided into four groups. The first group is assigned 3 bits/coefficient, the next group 2 bits/coefficient, the third group isallocated 1 bit/coefficient, and the last group is assigned 1 bit/coefficient. (a) Upper and lower bounds for a zero mean Gaussian source consisting of 64 independentcomponents. The variance of the components is equal to the variance of the 64 DCT coefficients, respectively. (b) Upper and lower bounds for a zero mean Laplaciansource consisting of 64 independent components. Again, the variance of the components is equal to the variance of the 64 DCT coefficients, respectively. Alsoshown is the true SSIM index. p = 0:9. (a) Gaussian. (b) Laplacian.

(4, 2, 1, 1), (3, 3, 1, 1), and (2, 2, 2, 2). The problem is to findthe rate combination that results in the highest SSIM index ofthe quantized image.

The proposed bounds give a range over which the averageSSIM index can be expected to lie. This however, is not directlyuseful in a bit-allocation problem, where a single score for theexpected SSIM index is desired. In this example, the average ofLaplacian upper and lower bounds is used as a coarse estimateof the expected SSIM index. The decision rule is to choose thecombination of rates that gives the highest estimate of the SSIMindex from the Laplacian bounds. This decision rule is empir-ical, based on the results for over 50 training images from the

‘Austin and its Vicinity’ database. The results are reported forsix test images (popularly used in the literature) in Table I.

From the table we see that the correct combination is chosen(based on true SSIM Indices), five of the six times. Even theerroneous choice is not very expensive in terms of the reductionof the SSIM index. The most important test for the bit-allocationtechnique is the visual quality check. The main motivation toderive these bounds is the fact that the SSIM index is a powerfulIQA algorithm. It means that high SSIM Indices correspond tohigh visual quality of the images in question (and low scorescorrespond to poor visual quality). The bit-allocation results forthe Boats image is shown in Fig. 14. We see that the image

Page 12: 1624 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 17, …live.ece.utexas.edu/publications/2008/ssc_ratebounds_tip_sep08.pdf · 1624 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 17,

CHANNAPPAYYA et al.: RATE BOUNDS ON SSIM INDEX OF QUANTIZED IMAGES 1635

TABLE IESTIMATE OF THE SSIM INDEX OF SIX QUANTIZED NATURAL IMAGES BASED ON THE LAPLACIAN BOUNDS. THE DCT COEFFICIENTS OF THESE

IMAGES HAVE BEEN DIVIDED INTO FOUR GROUPS, AND QUANTIZED AT DIFFERENT RATE COMBINATIONS [(5, 1, 1, 1), (4, 2, 1, 1),(3, 3, 1, 1), AND (2, 2, 2, 2)] SUCH THAT A BIT BUDGET OF 128 BITS IS SATISFIED. THE COMBINATION THAT GIVES

THE HIGHEST SSIM INDEX ESTIMATE IS CHOSEN TO BE SSIM-OPTIMAL AT THE GIVEN BIT BUDGET p = 0:9

quantized using the rate combination (5, 1, 1, 1) in Fig. 14(b) hasthe highest visual quality and SSIM index. The fact that our bit-allocation technique does indeed pick this image provides thestrongest validation of the usefulness of the proposed bounds.Note that the true score is reported only to validate the choicebased on the estimates. Through this example, we demonstratethat the proposed bounds could be used in a practical imagecoding scenario.

D. Discussion

The results have shown that the proposed bounds work wellover a range of sources. It must be noted that the bounds werenot accurate for a very small percentage of examples, espe-cially when working with natural images. The bounds are di-rectly impacted by the choice of the numbers and . It isimportant that these are as general as possible, while remainingas tight as possible. A straightforward choice for is 0, andfor is (for the case of bounded sources), where isthe extent of the granular region of the quantizer. This choicewas found to give loose bounds, especially the lower bound.For the case of uniform sources, it is easy to see that

not only make the bounds hold with , but also give good re-sults over a large set of examples. The minimum and maximumvalues are determined from the data that is actually quantized,

and the quantization levels of the quantizers used. The maximaand minima are global over the set of random variable realiza-tions or DCT coefficients.

For the case of unbounded inputs (Gaussian and Laplaciansources), it was found that choosing in (10) givesbounds that are tight over a large set of images. Further, theexpressions in (10) are conservative in the sense that letting

actually gives bounds that hold with a probability closeto 1.

The results have demonstrated several useful properties ofthe bounds. The key to arriving at the bounds is the relationbetween the SSIM index and the MSE (see Appendix for de-tails). Once the relation between the SSIM index and MSE isestablished, the well-known result for fixed rate uniform quan-tization of a general random variable MSE isused to formulate the bounds. This MSE result is very strongover the range of quantization rates generally used in image andvideo coding applications. This is one of the main reasons for thestrength of the proposed bounds. It also explains the relativelypoorer performance of the bounds at lower rates. Further, theexpression for is accurate for all the three distributions con-sidered in this analysis. Finally, since the distribution of DCTcoefficients of natural images is well approximated by either aGaussian or Laplacian pdf, the bounds for Gaussian and Lapla-cian sources are well-behaved even for natural images. Oneother useful property that the proposed bounds possess is their

Page 13: 1624 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 17, …live.ece.utexas.edu/publications/2008/ssc_ratebounds_tip_sep08.pdf · 1624 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 17,

1636 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 17, NO. 9, SEPTEMBER 2008

Fig. 14. Rate allocation example. (a) The original Boats image. (b) Original quantized using the 5, 1, 1, 1 rate profile. SSIM index = 0:7743, Laplacian estimateof SSIM index = 0:7865; MSE = 130:28. (c) Original quantized using the 4, 2, 1, 1, rate profile. SSIM index = 0:7551, Laplacian estimate of SSIM index =

0:7676;MSE = 191:62. (d) Original quantized using the 3, 3, 1, 1 rate profile. SSIM index = 0:6689, Laplacian estimate of SSIM index = 0:6704; MSE =

301:39. p = 0:9. (a) Original. (b) 5111. (c) 4211. (d) 3311.

ease of implementation, even for large DCT block sizes [as com-pared to (8)]. This is again a byproduct of the underlying MSEresult.

V. CONCLUSION AND FUTURE WORK

In this paper, we presented bounds on the SSIM index as afunction of quantization rate for fixed-rate uniform quantiza-tion. The proposed bounds make use of a well-known relationbetween MSE and quantization rate for fixed rate uniform quan-tization under the high rate assumption. We have demonstratedthe strength of the proposed bounds using a wide variety of ex-amples, and their usefulness in a practical scenario. Through

these results, we have taken a step in the path of designing per-ceptually optimal image coding algorithms, and more generallyin designing perceptually optimal image processing algorithmsthat leverage the strength of the emerging IQA algorithms.

Several improvements to the proposed bounds could be madefor future research. It is well known that one of the main rea-sons for the compression efficiency of transform based codingis the unequal perceptual importance of the transform coeffi-cients (DCT or wavelet). Our results currently do not incorpo-rate this feature of DCT coefficients. The SSIM index wouldhave to be modified to do so, and we hope to address this soon.Similar analysis for the wavelet transform would also be veryuseful. We believe that such an analysis would be similar in

Page 14: 1624 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 17, …live.ece.utexas.edu/publications/2008/ssc_ratebounds_tip_sep08.pdf · 1624 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 17,

CHANNAPPAYYA et al.: RATE BOUNDS ON SSIM INDEX OF QUANTIZED IMAGES 1637

flavor to the current analysis. An extension to video would in-volve applying the current results on a frame-by-frame basis.Finally, the bounds could be improved by a more careful choiceof constants.

APPENDIX

In the following, we present the Proof of Theorem 3.1 and thederivation of for uniform, Gaussian, and Laplacian sources.

Let andrepresent a vector and its quan-

tized version in the DCT domain, respectively. The subscriptsindex the appropriate code points in the joint

granular region of the quantizer denoted by . We assumethat is the DC coefficient. Further, let and represent thespace domain versions of and respectively.

Proof: Let vectorbe a set of quantization levels corresponding to

SSIM with

(19)

where corresponds to the mean term, and corresponds tothe structure term. It is easy to show that (for naturalimages), and . Now

(20)

To simplify the denominator in the above equation, a variableis introduced. Since is large for

the most interesting case of an 8 8 DCT, the distribution ofcan be approximated well by a Gaussian distribution due to thecentral limit theorem. For a specified probability , chooseand such that . The term

is subtracted from since it is larger than the highest valuethat any of quantizer levels can take. Since is Gaussian, itfollows that the expression for that make the bounds holdwith probability is given by (10), where the first and secondmoments of are computed based on the source distribution.Since , the following bound holds with probability

(21)

Applying the expectation operator, using the high-rate uniformquantization result MSE [1], [15], and the inde-pendence assumption

(22)

Replacing with and rearranging terms

SSIM

(23)

The derivation of the term for different source types isgiven below. For simplicity, we assume that is zero-mean.The results hold irrespective of this assumption.

A. Uniform Source

The source pdf is given by

ifelsewhere

and the expression

substituting in the first integral

and using the standard result in the second

(24)

Page 15: 1624 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 17, …live.ece.utexas.edu/publications/2008/ssc_ratebounds_tip_sep08.pdf · 1624 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 17,

1638 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 17, NO. 9, SEPTEMBER 2008

B. Gaussian Source

For this case, we present the expression for the contributionto only from the first term in the numerator of . The contri-bution from in the numerator cannot be evaluated in closedform. Note that the presence of in the denominator helpsremain stable. Here, , and

substituting

and simplifying

(25)

where , is the exponentialintegral.

C. Laplacian Source

Here, . We present thesteps for the case where , and , and assumethat there are intervals that satisfy this case. The other casesfollow similar steps

using partial fraction expansions for both terms

with

and simplifying

(26)

ACKNOWLEDGMENT

The authors would like to thank Prof. C. Caramanis for usefuldiscussions on the probabilistic bounds.

REFERENCES

[1] W. R. Bennett, “Spectra of quantized signals,” Bell Syst. Tech. J., vol.27, pp. 446–472, Jul. 1948.

[2] R. W. Buccigrossi and E. P. Simoncelli, “Image compression via jointstatistical characterization in the wavelet domain,” IEEE Trans. ImageProcess., vol. 8, no. 12, pp. 1688–1701, Dec. 1999.

[3] D. M. Chandler and S. S. Hemami, “Dynamic contrast-based quan-tization for lossy wavelet image compression,” IEEE Trans. ImageProcess., vol. 14, no. 4, pp. 2284–2298, Apr. 2005.

[4] D. M. Chandler and S. S. Hemami, “VSNR: A wavelet-based visualsignal-to-noise ratio for natural images,” IEEE Trans. Image Process.,vol. 16, pp. 397–410, 2007.

[5] S. S. Channappayya, A. C. Bovik, C. Caramanis, and R. W. Heath,Jr., “Design of linear equalizers optimized for the structural similarityindex,” IEEE Trans. Image Process., vol. 16, no. 7, pp. 857–872, Jul.2007.

[6] S. S. Channappayya, A. C. Bovik, and R. W. Heath, Jr., “A linear es-timator optimized for the structural similarity index and its applicationto image denoising,” Proc. IEEE Int. Conf. Image Processing, 2006.

[7] T. M. Cover and J. A. Thomas, Elements of Information Theory. NewYork: Wiley, 1991.

[8] S. Daly, “The visible difference predictor: An algorithm for the assess-ment of image fidelity,” in Digital Images and Human Vision, A. B.Watson, Ed. Cambridge, MA: MIT Press, 1993, pp. 179–206.

[9] M. P. Eckert and A. P. Bradley, “Perceptual quality metrics applied tostill image compression,” Signal Process., vol. 70, pp. 177–200, Nov.1998.

[10] J. D. Eggerton and M. D. Srinath, “A visually weighted quantizationscheme for image bandwidth compression at low data rates,” IEEETrans. Commun., vol. COM-34, no. 12, pp. 840–847, Aug. 1986.

[11] C. Fogg, D. J. LeGall, J. L. Mitchell, and W. B. Pennebaker, MPEGVideo Compression Standard.. New York: Springer, 1996.

[12] A. Gersho and R. M. Gray, Vector Quantization and Signal Compres-sion.. New York: Springer, 1991.

[13] B. Girod, , A. B. Watson, Ed., “What’s Wrong with Mean-squaredError?,” in Digital Images and Human Vision. Cambridge, MA: MITPress, 1993, pp. 207–220.

[14] I. S. Gradshteyn and I. M. Ryzhik, Table of Integrals, Series, and Prod-ucts.. San Diego, CA: Academic, 2000.

[15] R. M. Gray and D. L. Neuhoff, “Quantization,” IEEE Trans. Inf.Theory, vol. 44, no. 6, pp. 2325–2383, Oct. 1998.

[16] I. Hontsch and L. J. Karam, “Adaptive image coding with perceptualdistortion control,” IEEE Trans. Image Process., vol. 11, no. 3, pp.213–222, Mar. 2002.

[17] D. Hui and D. L. Neuhoff, “Asymptotic analysis of optimal fixed-rateuniform scalar quantization,” IEEE Trans. Inf. Theory, vol. 47, no. 3,pp. 957–977, Mar. 2001.

[18] A. K. Jain, Fundamentals of Digital Image Processing. EnglewoodCliffs, NJ: Prentice-Hall, 1989.

[19] E. Y. Lam and J. W. Goodman, “A mathematical analysis of the DCTcoefficient distributions for images,” IEEE Trans. Image Process., vol.9, no. 10, pp. 1661–1666, Oct. 2000.

[20] Z. Liu, L. J. Karam, and A. B. Watson, “JPEG2000 encoding with per-ceptual distortion control,” IEEE Trans. Image Process., vol. 15, no. 7,pp. 1763–1778, Jul. 2006.

Page 16: 1624 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 17, …live.ece.utexas.edu/publications/2008/ssc_ratebounds_tip_sep08.pdf · 1624 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 17,

CHANNAPPAYYA et al.: RATE BOUNDS ON SSIM INDEX OF QUANTIZED IMAGES 1639

[21] J. Lubin, , A. B. Watson, Ed., “The use of psychophysical data andmodels in the analysis of display system performance,” in DigitalImages and Human Vision. Cambridge, MA: MIT Press, 1993, pp.163–178.

[22] J. Lubin, Visual Models for Target Detection and Recognition. Sin-gapore: World Scientific, 1995, ch. 10, pp. 245–283.

[23] J. L. Mannos and D. J. Sakrison, “The effects of a visual fidelity crite-rion on the encoding of images,” IEEE Trans. Inf. Theory, vol. 20, no.7 , pp. 525–536, Jul. 1974.

[24] N. B. Nill, “A visual model weighted cosine transform for imagecompression and quality assessment,” IEEE Trans. Commun., vol.COM-33, pp. 551–557, Jul. 1985.

[25] T. N. Pappas and R. J. Safranek, , A. C. Bovik, Ed., “Perceptual Cri-teria for Image Quality Evaluation,” in Handbook of Image and VideoProcessing. New York: Academic, 2000, pp. 669–684.

[26] W. B. Pennebaker and J. L. Mitchell, JPEG Still Image Data Compres-sion Standard. New York: Van Nostrand Reinhold, 1993.

[27] M. F. Sabir, H. R. Sheikh, R. W. Heath, Jr., and A. C. Bovik, “A jointsource-channel distortion model for JPEG compressed images,” IEEETrans. Image Process., vol. 15, no. 6, pp. 1349–1364, Jun. 2006.

[28] R. Safranek and J. Johnston, “A perceptually tuned sub-band imagecoder with image dependent quantization and post-quantization datacompression,” in Proc. Int. Conf. Acoustics, Speech, and Signal Pro-cessing, May 23–26, 1989, vol. 3, pp. 1945–1948.

[29] O. Schwartz and E. P. Simoncelli, “Natural signal statistics and sensorygain control,” Nature Neurosci., vol. 4, no. 8, pp. 819–825, Aug. 2001.

[30] H. R. Sheikh and A. C. Bovik, “Image information and visual quality,”IEEE Trans. Image Process., vol. 15, no. 2, pp. 430–444, Feb. 2006.

[31] H. R. Sheikh, A. C. Bovik, and G. de Veciana, “An information fidelitycriterion for image quality assessment using natural scene statistics,”IEEE Trans. Image Process., vol. 14, no. 12, pp. 2117–2128, Dec. 2005.

[32] H. R. Sheikh, M. F. Sabir, and A. C. Bovik, “A statistical evaluationof recent full reference image quality assessment algorithms,” IEEETrans. Image Process., vol. 15, no. 11, pp. 3440–3451, Nov. 2006.

[33] E. Simoncelli, W. Freeman, E. Adelson, and D. Heeger, “Shiftablemultiscale transforms,” IEEE Trans. Inf. Theory, vol. 38, no. 2, pp.587–607, Mar. 1992.

[34] P. C. Teo and D. J. Heeger, “Perceptual image distortion,” Proc. SPIE,vol. 2179, pp. 127–141, 1994.

[35] M. Vetterli and J. Kovacevic, Wavelets and Subband Coding. Engle-wood Cliffs, NJ: Prentice-Hall, 1995.

[36] M. J. Wainwright and E. P. Simoncelli, “Scale mixtures of gaussiansand the statistics of natural scenes,” Adv. Neural Inf. Process. Syst., vol.12, pp. 855–861, 2000.

[37] Z. Wang and A. C. Bovik, “A universal image quality index,” IEEESignal Process. Lett., vol. 9, no. 3, pp. 81–84, Mar. 2002.

[38] Z. Wang and A. C. Bovik, Modern Image Quality Assessment. SanRafael, CA: Morgan and Claypool, 2006.

[39] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Imagequality assessment: From error visibility to structural similarity,” IEEETrans. Image Process., vol. 13, no. 4, pp. 600–612, Apr. 2004.

[40] Z. Wang, Q. Li, and X. Shang, “Perceptual image coding based on amaximum of minimal structural similarity criterion,” presented at theIEEE Int. Conf. Image Processing, Sep. 2007.

[41] Z. Wang and E. P. Simoncelli, “Translation insensitive image simi-larity in complex wavelet domain,” in Proc. IEEE Int. Conf. Acoustics,Speech, and Signal Processing, 2005, vol. 2, pp. 573–576.

[42] Z. Wang, E. P. Simoncelli, and A. C. Bovik, “Multi-scale structuralsimilarity for image quality assessment,” in Proc. IEEE Asilomar Conf.Signals, Systems, Computers, 2003, vol. 2, pp. 1398–1402.

[43] A. Watson, “Visually optimal DCT quantization matrices for individualimages,” in Proc. Data Compression Conf., Mar. 30–Apr. 2, 1993, pp.178–187.

[44] A. B. Watson, Digital Images and Human Vision. Cambridge, MA:MIT Press, 1993.

[45] S. Winkler, “A perceptual distortion mertic for digital color video,”Proc. SPIE, vol. 3644, pp. 175–184, 1999.

Sumohana S. Channappayya (S’01–M’08) re-ceived the B. E. degree from the University ofMysore, India, in 1998, the M.S. degree in electricalengineering from the Arizona State University,Tempe, AZ, in 2000, and the Ph.D. degree in elec-trical and computer engineering from The Universityof Texas at Austin in 2007.

He worked for PacketVideo Corporation, SanDiego, CA, from 2001 to 2003. He is currently withPacketVideo Corporation. His research interestsinclude image restoration, image and video coding,

image and video quality assessment, and multimedia communication.

Alan Conrad Bovik (S’80–M’81–SM’89–F’96) re-ceived the B.S., M.S., and Ph.D. degrees in electricaland computer engineering from the University of Illi-nois at Urbana-Champaign, Urbana, in 1980, 1982,and 1984, respectively.

He is currently the Curry/Cullen Trust EndowedProfessor at The University of Texas at Austin,where he is the Director of the Laboratory for Imageand Video Engineering (LIVE) in the Center forPerceptual Systems. His research interests includeimage and video processing, computational vision,

digital microscopy, and modeling of biological visual perception. He has pub-lished over 450 technical articles in these areas and holds two U.S. patents. Heis also the author of The Handbook of Image and Video Processing (Elsevier,2005, 2nd ed.) and Modern Image Quality Assessment (Morgan & Claypool,2006).

Dr. Bovik has received a number of major awards from the IEEE SignalProcessing Society, including: the Education Award (2007); the TechnicalAchievement Award (2005), the Distinguished Lecturer Award (2000); and theMeritorious Service Award (1998). He is also a recipient of the DistinguishedAlumni Award from the University of Illinois at Urbana-Champaign (2008),the IEEE Third Millennium Medal (2000), and two journal paper awardsfrom the international Pattern Recognition Society (1988 and 1993). He isa Fellow of the Optical Society of America and a Fellow of the Societyof Photo-Optical and Instrumentation Engineers. He has been involved innumerous professional society activities, including: Board of Governors, IEEESignal Processing Society, 1996–1998; Editor-in-Chief, IEEE TRANSACTIONS

ON IMAGE PROCESSING, 1996–2002; Editorial Board, PROCEEDINGS OF THE

IEEE, 1998–2004; Series Editor for Image, Video, and Multimedia Processing,Morgan and Claypool Publishing Company, 2003–present; and FoundingGeneral Chairman, First IEEE International Conference on Image Processing,Austin, TX, November 1994. He is a registered Professional Engineer in theState of Texas and is a frequent consultant to legal, industrial, and academicinstitutions.

Robert W. Heath, Jr. (S’96–M’01–SM’06) re-ceived the B.S. and M.S. degrees from the Universityof Virginia, Charlottesville, in 1996 and 1997,respectively, and the Ph.D. degree from StanfordUniversity, Stanford, CA, in 2002, all in electricalengineering.

From 1998 to 2001, he was a Senior Member ofthe Technical Staff then a Senior Consultant at IospanWireless, Inc., San Jose, CA where he worked on thedesign and implementation of the physical and linklayers of the first commercial MIMO-OFDM com-

munication system. In 2003, he founded MIMO Wireless, Inc., a consultingcompany dedicated to the advancement of MIMO technology. Since January2002, he has been with the Department of Electrical and Computer Engineering,The University of Texas at Austin, where he is currently an Associate Professorand member of the Wireless Networking and Communications Group. His re-search interests include several aspects of MIMO communication: limited feed-back techniques, multihop networking, multiuser MIMO, antenna design, andscheduling algorithms, as well as 60-GHz communication techniques and mul-timedia signal processing.

Dr. Heath has been an Editor for the IEEE TRANSACTIONS ON

COMMUNICATIONS and an Associate Editor for the IEEE TRANSACTIONS

ON VEHICULAR TECHNOLOGY. He is a member of the Signal Processingfor Communications Technical Committee in the IEEE Signal ProcessingSociety. He was a Technical Co-Chair for the 2007 Fall Vehicular TechnologyConference, the General Chair of the 2008 Communication Theory Workshop,and a co-organizer of the 2009 Signal Processing for Wireless CommunicationsWorkshop. He is the recipient of the David and Doris Lybarger EndowedFaculty Fellowship in Engineering and is a registered Professional Engineerin Texas.