
Relations between local and global perceptual image quality and visual masking

Md Mushfiqul Alam, Pranita Patil, Martin T. Hagan, and Damon M. Chandler

School of Electrical and Computer Engineering, Oklahoma State University, Stillwater, OK 74078

ABSTRACT

Perceptual quality assessment of digital images and videos is important for various image-processing applications. For assessing image quality, researchers have often used the idea of visual masking (or distortion visibility) to design image-quality predictors specifically for near-threshold distortions. However, it is still unknown how the local distortion visibilities relate to the local quality scores when the quality of natural images is assessed. Furthermore, the summing mechanism of the local quality scores to predict the global quality scores is also crucial for better prediction of the perceptual image quality. In this paper, the local and global qualities of six images and six distortion levels were measured using subjective experiments. A Gabor-noise target was used as the distortion in the quality-assessment experiments to be consistent with our previous study [Alam, Vilankar, Field, and Chandler, Journal of Vision, 2014], in which the local root-mean-square contrast detection thresholds for detecting the Gabor-noise target were measured at each spatial location of the undistorted images. Comparison of the results of this quality-assessment experiment and the previous detection experiment shows that masking predicted the local quality scores more than 95% correctly above 15 dB threshold within 5% subject scores. Furthermore, it was found that an approximate squared summation of local-quality scores predicted the global quality scores suitably (Spearman rank-order correlation 0.97).

Keywords: Image quality, local image quality, visual masking, local detection thresholds, natural scenes.

1. INTRODUCTION

Perceptual quality assessment of digital images is important for maintaining quality services to digital media consumers. Even though the quality assessment of images and videos has become more challenging with the increasing variety of display forms,1 consumer demand for high-quality images and videos has recently increased owing to better compression schemes, such as H.264 and HEVC. Furthermore, several internet-based media providers, such as Netflix and Hulu Plus, and display device manufacturers, such as Sony, Samsung, and LG, have made it possible for consumers to watch high-quality ultra HD videos. Because the demand for and availability of high-quality media are increasing, better assessment of the perceptual quality of high-quality images and videos is becoming more important.

Although the quality assessment of high-quality images is important, most images in the current image quality databases are heavily distorted. The subjective quality ratings in the image quality databases are generally expressed in terms of mean-opinion-scores (MOS) or difference-mean-opinion-scores (DMOS), and the higher the MOS or the lower the DMOS, the better the quality of the image. Although it is difficult to set a specific threshold to classify an image as low-quality or high-quality, roughly 70% of the images in the current image-quality databases are heavily distorted: in the LIVE image quality database,2 67.2% of the images (660 images out of 982 images) have DMOS above 25; in the CSIQ image quality database,3 63.4% of the images (549 images out of 866 images) have DMOS above 20; and in the TID database,4 86.8% of the images (1475 images out of 1700 images) have MOS above 2.5. Even though such heavily distorted images are important for better understanding the perceptual strategies used to evaluate quality at various distortion levels, it is still an open research question whether the human observer employs a single strategy or multiple strategies at different distortion levels.3,5,6 Furthermore, if the human observer employs different strategies at different distortion levels, current image-processing algorithms can benefit from a study exploring such strategies.

Further author information: E-mail: {mdma, pranita, mhagan, damon.chandler}@okstate.edu

Invited Paper

Human Vision and Electronic Imaging XX, edited by Bernice E. Rogowitz, Thrasyvoulos N. Pappas, Huib de Ridder, Proc. of SPIE-IS&T Electronic Imaging, SPIE Vol. 9394, 93940M · © 2015 SPIE-IS&T

CCC code: 0277-786X/15/$18 · doi: 10.1117/12.2084935


The visual masking7 phenomenon has been used to estimate the relative distortion visibility in images and videos, specifically for near-threshold distortions.3,8–15 For example, Damera-Venkata et al.11 used a traditional contrast masking model to develop a noise quality measure for natural images. Similarly, Chandler and Hemami13 presented a visual signal-to-noise ratio (VSNR), which applied two different schemes for near-threshold and supra-threshold distortions. Specifically, VSNR used a wavelet-based model of visual masking and visual summation to measure the visual fidelity for near-threshold distortion. Recently, Laparra et al.16 used an improved divisive-normalization-based masking model for better assessment of image quality. Similarly, Larson and Chandler3 proposed the most-apparent-distortion (MAD) image-quality assessment algorithm, which adopted a two-stage strategy, namely, a detection-based strategy for high-quality images and an appearance-based strategy for low-quality images. In the detection-based strategy of MAD, local luminance and contrast masking were used to model the distortion visibility.

Although the masking phenomenon has often been used in image-quality assessment models, two issues have not yet been addressed properly. First, it is generally assumed that masking is more effective for assessing the quality of near-threshold distortions. However, this assumption has not been tested experimentally, which is crucial for better assessment of image quality at different distortion levels. Second, it is still an open question how masking in the local regions of an image might affect the global quality of the image. This paper addresses these two issues by presenting the results of a series of controlled subjective experiments for assessing the local and global quality of natural images. The local and global quality scores, along with the local detection thresholds (masking thresholds) obtained from our previous study,17,18 were used to analyze the relationships between masking and image quality, and between local and global quality.

It should be noted that the experiments presented in this paper used a small but diverse set of images with only a single type of distortion. Although testing with more images and more distortion types would be interesting, we believe this study represents an important first step towards a better understanding of the relationships between masking and local quality, and between local and global quality.

2. EXPERIMENT METHODS

We performed two experiments to measure (a) the local quality and (b) the global quality of the images. This section provides details of the experiment apparatus, stimuli, procedure, and subjects employed to measure the local and global qualities of the natural images. Note that in our previous study17,18 on masking, we measured local detection thresholds within natural scenes. In the experiments described in this paper, we kept the experiment environment and the stimuli consistent with our previous study17,18 so that the relationship between masking and image quality could be explored consistently.

2.1 Apparatus

Stimuli were displayed on a Samsung SyncMaster S24B240 LED monitor. The monitor was driven by a computer equipped with an Intel Core 2 Quad Q6600 processor, 3.0 GB of RAM, and an NVIDIA GeForce 8600 GT graphics card. The screen dimensions were 23.6 inches diagonally, 20.5 inches horizontally, and 11.5 inches vertically. The display resolution was 1920 × 1080 pixels at a frame rate of 60 Hz. The total angle subtended by the display was around 47 degrees. The maximum possible radial frequency was 20.4 cycles/degree (c/deg). The minimum and maximum luminances of the display were set to 0.09 and 114 cd/m², respectively. The relationship between the digital pixel value and displayed luminance was linearized in software by using a lookup table with luminance measurements made via a Konica Minolta Chroma Meter (CS-100A). The subjects viewed the stimuli binocularly through natural pupils in a darkened room at a distance of approximately 60 cm. The local-quality experiment was conducted using a single monitor, and the global-quality experiment was conducted by placing two similar monitors with similar settings side by side.
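The display figures quoted above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, assuming a flat screen viewed head-on from exactly 60 cm and an average pixels-per-degree value taken over the full screen width; the function name `display_geometry` is hypothetical, and the paper does not state how its values were derived.

```python
import math

def display_geometry(width_in=20.5, h_pixels=1920, distance_cm=60.0):
    """Return (total horizontal angle in deg, pixels per degree, Nyquist limit in c/deg).

    Assumes a flat screen viewed head-on; all defaults come from Section 2.1.
    """
    width_cm = width_in * 2.54
    # Angle subtended by the full screen width at the viewing distance.
    total_angle = 2.0 * math.degrees(math.atan((width_cm / 2.0) / distance_cm))
    # Average number of pixels per degree across the screen.
    pixels_per_degree = h_pixels / total_angle
    # One cycle needs at least two pixels, so the limit is half the pixel density.
    nyquist = pixels_per_degree / 2.0
    return total_angle, pixels_per_degree, nyquist

angle, ppd, nyq = display_geometry()
print(f"{angle:.1f} deg, {ppd:.1f} px/deg, {nyq:.1f} c/deg")
```

Under these assumptions the result lands close to the reported 47 degrees and 20.4 c/deg; small differences are expected because the viewing distance is only approximate.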

2.2 Stimuli

This section describes the stimuli generation steps for (a) the global-quality experiment and (b) the local-quality experiment.



[Figure 1 image: the Gabor noise $G_N$ is scaled by the multipliers $m_1, ..., m_5$ and added to the reference image $D_0$ to produce the distorted images $D_1, ..., D_5$; the bottom row shows the reference patch $D_{p,0}$ and the distorted patches $D_{p,1}, ..., D_{p,5}$.]

Figure 1. Stimuli of the experiments. The full-sized distorted images $D_l$ ($l = 0, ..., 5$) were the stimuli of the global-quality assessment experiment. The total angle subtended by each stimulus of the global-quality assessment experiment was 12.5 degrees. The distorted patches $D_{p,l}$ ($p = 1, ..., 36$ and $l = 0, ..., 5$) were the stimuli of the local-quality assessment experiment. The total angle subtended by each stimulus of the local-quality assessment experiment was 2.1 degrees without the context, and 5.2 degrees with the context. The details of the stimuli generation steps are given in Section 2.2.

2.2.1 Stimuli of Global-Quality Assessment Experiment

Six reference images (denoted $D_0$ or $R$, where $R$ is one of log_seaside, swarm, elk, native_american, monument, and aerial_city) were chosen as mask images from the CSIQ database.3 Each of the six images was chosen from a different category. The top row of Figure 4 shows the mask images along with their names and corresponding categories. It should be noted that each of these images was normalized to span the 8-bit digital range of 0 to 255, which produced sharp, high-contrast images that are useful for image-quality assessment; this also means that the contrasts were not necessarily identical to those of the original scenes. The dimension of each image was 510 × 510 pixels (12.5 degrees).

The target (distortion) was a vertically oriented Gabor-noise pattern which had a center radial frequency of 3.7 c/deg and a one-octave bandwidth. The top image of Figure 1 shows the Gabor-noise pattern ($G_N$), which was 510 × 510 pixels in dimension and ranged from −1 to +1 with zero mean. The details of the target generation steps can be found in the methods section of our previous study on masking.18

Figure 1 shows how the stimuli were generated. Note that, including the reference image, there are six distorted images $D_l$ ($l = 0, ..., 5$). The distorted images were calculated via:

$$
D_l(x, y) = \left\lfloor m_l \times G_N(x, y) + D_0(x, y) \right\rceil, \qquad
D_l(x, y) =
\begin{cases}
0 & \text{if } D_l(x, y) < 0, \\
255 & \text{if } D_l(x, y) > 255, \\
D_l(x, y) & \text{otherwise,}
\end{cases}
\tag{1}
$$



Table 1. Target contrasts ($C_l$, $l = 0, ..., 5$) and the multipliers ($m_l$) at six distortion levels ($l = 0, ..., 5$) of six images (log_seaside, swarm, elk, native_american, monument, and aerial_city).

Target contrasts     C_0      C_1      C_2      C_3      C_4      C_5
  linear             0        0.052    0.103    0.207    0.413    0.826
  decibel (dB)       −∞       −25.74   −19.72   −13.69   −7.68    −1.66

Multipliers          m_0      m_1      m_2      m_3      m_4      m_5
  log_seaside        0        14.24    28.79    59.19    127.08   293.61
  swarm              0        11.12    22.48    45.32    90.82    178.34
  elk                0        11.63    23.39    46.78    93.57    187.18
  native_american    0        14.92    30.53    63.13    133.62   319.15
  monument           0        13.82    27.88    56.31    116.41   269.81
  aerial_city        0        10.96    22.06    44.04    87.25    172.81

where $G_N$ is the Gabor-noise pattern, $m_l$ ($l = 0, ..., 5$) are the multipliers, $\lfloor \cdot \rceil$ denotes the rounding operation, $D_0$ is the undistorted mask image, $x = 1, ..., 510$, and $y = 1, ..., 510$. The values of the multipliers ($m_l$) were chosen such that the RMS contrasts19 of the target were certain multiples of the maximum RMS contrast detection threshold found in our previous study.18 In that study, we found that the maximum local detection threshold for patches having luminance > 3 cd/m² was $C_{T,max} = 0.207$ (−13.69 dB). The top three rows of Table 1 show the target RMS contrasts at the six distortion levels ($C_l$, $l = 0, ..., 5$). The relationship between $C_{T,max}$ and $C_l$ (in linear RMS contrast units) is given by:

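$$
\begin{aligned}
C_0 &= 0 \times C_{T,max}, & C_1 &= 0.25 \times C_{T,max}, & C_2 &= 0.5 \times C_{T,max}, \\
C_3 &= 1.0 \times C_{T,max}, & C_4 &= 2.0 \times C_{T,max}, & C_5 &= 4.0 \times C_{T,max}.
\end{aligned}
\tag{2}
$$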
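As a quick numerical check of Equation 2, the sketch below recomputes the six target contrasts and their decibel values. The conversion $20 \log_{10}(C)$ is an assumption inferred from the linear/dB pairs in Table 1 (for example, 0.207 and −13.69 dB), and the names `C_T_MAX`, `FACTORS`, and `to_db` are hypothetical.

```python
import math

C_T_MAX = 0.207                              # maximum local detection threshold (linear RMS contrast)
FACTORS = [0.0, 0.25, 0.5, 1.0, 2.0, 4.0]    # multiples of C_T_MAX from Equation 2

def to_db(contrast):
    """Convert a linear RMS contrast to dB; zero contrast maps to -infinity."""
    return -math.inf if contrast == 0 else 20.0 * math.log10(contrast)

target_contrasts = [k * C_T_MAX for k in FACTORS]
target_contrasts_db = [to_db(c) for c in target_contrasts]

for level, (c, c_db) in enumerate(zip(target_contrasts, target_contrasts_db)):
    print(f"l = {level}: C = {c:.3f}, {c_db:.2f} dB")
```

Because $C_{T,max}$ is quoted to three decimals, the recomputed values may differ from Table 1 in the last digit.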

The bottom seven rows of Table 1 show the values of the multipliers ($m_l$) for the six distortion levels ($l = 0, ..., 5$) of the six images (log_seaside, swarm, elk, native_american, monument, and aerial_city) needed to achieve the corresponding target contrasts. Thus, we generated a total of 36 distorted images, each having 510 × 510 pixels (12.5 degrees), and used these 36 distorted images for the global-quality assessment experiment.
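The generation of a full-sized distorted image in Equation 1 can be sketched as follows. This is an illustrative sketch only: the function name `distort_image` is hypothetical, the arrays are synthetic stand-ins for a CSIQ image and the Gabor-noise pattern, and `np.rint` (which rounds exact halves to the nearest even integer) is assumed to be an acceptable stand-in for the rounding operation.

```python
import numpy as np

def distort_image(d0, gabor_noise, m):
    """Scale the noise by m, add it to the reference, round, and clip to 8 bits (Equation 1)."""
    distorted = np.rint(m * gabor_noise + d0.astype(np.float64))
    return np.clip(distorted, 0, 255).astype(np.uint8)

# Synthetic stand-ins: a random 8-bit "reference" and zero-mean noise in [-1, 1].
rng = np.random.default_rng(0)
reference = rng.integers(0, 256, size=(510, 510), dtype=np.uint8)
noise = rng.uniform(-1.0, 1.0, size=(510, 510))

# Multiplier m_3 for log_seaside taken from Table 1.
distorted = distort_image(reference, noise, 59.19)
```

With `m = 0` the function returns the reference image unchanged, which matches the definition of $D_0$.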

2.2.2 Stimuli of Local-Quality Assessment Experiment

Each of the reference images ($R$ or $D_0$) and the Gabor-noise target ($G_N$) was of size 510 × 510 pixels. For measuring the local quality scores, each of the reference images was divided into 36 patches ($D_{p,0}$, $p = 1, ..., 36$) of size 85 × 85 pixels (around 2.1 degrees). The Gabor-noise target ($G_N$) was also divided into 36 corresponding patches ($G_{N_p}$, $p = 1, ..., 36$) of size 85 × 85 pixels (2.1 degrees). The stimuli ($D_{p,l}$) were generated by multiplying the Gabor-noise patch ($G_{N_p}$) by the scalar $m_l$ corresponding to the image $R$ and the distortion level $l$, and adding the result to the undistorted image patch $D_{p,0}$ via:

$$
D_{p,l}(x, y) = m_l \times G_{N_p}(x, y) + D_{p,0}(x, y), \tag{3}
$$

where p = 1, ..., 36, x = 1, ..., 85, and y = 1, ..., 85.
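The patch decomposition and Equation 3 can be illustrated with a short sketch. The row-major ordering of the patches and the function names `split_into_patches` and `distort_patch` are assumptions made for illustration; the paper does not specify how the index $p$ maps to a spatial location.

```python
import numpy as np

def split_into_patches(image, patch_size=85):
    """Split a 2-D image into non-overlapping square patches, in row-major order."""
    rows, cols = image.shape
    n_r, n_c = rows // patch_size, cols // patch_size
    cropped = image[:n_r * patch_size, :n_c * patch_size]
    return (cropped.reshape(n_r, patch_size, n_c, patch_size)
                   .swapaxes(1, 2)
                   .reshape(n_r * n_c, patch_size, patch_size))

def distort_patch(reference_patch, noise_patch, m):
    """Equation 3: add the scaled Gabor-noise patch to the undistorted patch."""
    return m * noise_patch + reference_patch

# A 510 x 510 image yields 6 x 6 = 36 patches of 85 x 85 pixels.
image = np.arange(510 * 510, dtype=np.float64).reshape(510, 510)
patches = split_into_patches(image)
print(patches.shape)
```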

To better simulate the spatially localized condition, the patches ($D_{p,l}$) were additionally padded with 64 pixels (around 1.6 degrees) of context from the reference image. The angle subtended by the stimuli was 5.2 degrees with the context, and 2.1 degrees without the context. Let us denote the stimuli with context as $\tilde{D}_{p,l}$. To reduce edge effects, before being added to the undistorted image patch $D_{p,0}$, the Gabor-noise patch ($G_{N_p}$) was multiplied by a circular window (2.1 deg or 85 × 85 pixels) given by

$$
w(r) = 1 - \frac{1}{1 + \exp\left(\gamma - r/\beta\right)}, \tag{4}
$$

where $r = \sqrt{u^2 + v^2}$, $\beta = 3$, $\gamma = 10$, $u = -42.5, -41.5, ..., 41.5$, and $v = -42.5, -41.5, ..., 41.5$. Similarly, the context-padded stimuli ($\tilde{D}_{p,l}$) were gradually alpha-blended with the background luminance (14 cd/m²) via

$$
\hat{D}_{p,l} = w \times \tilde{D}_{p,l} + (1 - w) \times \Gamma, \tag{5}
$$



[Figure 2 image: (a) procedure of the global-quality assessment experiment; (b) procedure of the local-quality assessment experiment, showing the within-patch, within-image, and inter-image local quality steps with example patches from aerial_city and swarm.]

Figure 2. Illustration of the procedure of (a) the global-quality assessment experiment, and (b) the local-quality assessment experiment.

where $\hat{D}_{p,l}$ denotes the windowed, padded stimuli and $\Gamma$ is a digital value of 105, yielding a background luminance of 14 cd/m². The circular window $w$ (5.2 deg or 213 × 213 pixels) was generated via Equation 4 with the parameters $\beta = 3$, $\gamma = 30$, $u = -106.5, -105.5, ..., 105.5$, and $v = -106.5, -105.5, ..., 105.5$. The windowed, padded stimuli ($\hat{D}_{p,l}$) were viewed by the subjects during the local quality-assessment experiment. Example stimuli are shown in the bottom row of Figure 1. The total number of stimuli for the local-quality assessment experiment was 1296 (6 images × 6 distortion levels per image × 36 patches per distortion level).
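The window of Equation 4 and the blending step of Equation 5 can be sketched as below, using the coordinate grids and parameters quoted in the text. The function names `circular_window` and `blend_with_background` are hypothetical, and the sketch operates on digital pixel values rather than on calibrated luminance.

```python
import numpy as np

def circular_window(size, beta, gamma):
    """Equation 4 evaluated on a size x size grid with u, v = -size/2, ..., size/2 - 1."""
    coords = np.arange(size) - size / 2.0
    u, v = np.meshgrid(coords, coords)
    r = np.sqrt(u ** 2 + v ** 2)
    return 1.0 - 1.0 / (1.0 + np.exp(gamma - r / beta))

def blend_with_background(padded_stimulus, window, background=105.0):
    """Equation 5: alpha-blend the padded stimulus with the background digital value."""
    return window * padded_stimulus + (1.0 - window) * background

w_small = circular_window(85, beta=3.0, gamma=10.0)    # window applied to the Gabor-noise patch
w_large = circular_window(213, beta=3.0, gamma=30.0)   # window applied to the context-padded stimulus
```

Both windows are close to 1 at the center and fall smoothly to nearly 0 at the corners, so the blended stimulus fades into the uniform background without a hard edge.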

2.3 Procedure

The subjective scores of the images and image patches were measured based on a linear displacement of the images and patches across calibrated monitors placed side by side at equal viewing distance from the observer.3 In this section, the procedure of the global-quality assessment experiment is described first, followed by the procedure of the local-quality assessment experiment.

2.3.1 Procedure of Global-Quality Assessment Experiment

The distorted images ($D_l$) were displayed on two monitors placed side by side, such that the six distorted images ($l = 0, ..., 5$) of the same image appeared in the same row, and different images appeared in different rows. Figure 2(a) shows the initial arrangement of the images shown on the displays.

The scores of the images were indicated only by the horizontal positions of the images. Each of the 36 images could be dragged and dropped to another position on the display by using mouse inputs. Subjects were instructed to place poorer quality images to the right, and better quality images to the left within the display. Subjects were also instructed to carefully compare the relative horizontal distance of one image with the different distortion levels of the same image, as well as with the distortion levels of other images.

During horizontal movement of one image, the position of only that image changed horizontally. However, for better comparison between two different images, when one image was moved vertically, all six distorted versions of that image moved vertically together, keeping their horizontal positions the same. For better viewing, if one image was selected using mouse input, all six distortion levels of that image appeared in front of the other images. Subjects viewed the images in a darkened room, and were not given any time limit for the quality assessment task. The horizontal positions of the images were saved at the end of the experiment.



2.3.2 Procedure of Local-Quality Assessment Experiment

The number of patches to score in the local-quality assessment experiment was significantly higher (1296 patches) than in the global-quality assessment experiment. Thus, we adopted a three-step procedure to measure the local quality scores of the patches: within-patch local quality assessment, within-image local quality assessment, and inter-image local quality assessment. The three-step procedure is illustrated in Figure 2(b). The steps are discussed in the following:

First step: Within-patch local quality. The $p$th patch of an image $R$ had six distortion levels ($l = 0, ..., 5$). We refer to the local quality scores of the six distortion levels of a fixed patch $p$ and a fixed image $R$ as within-patch local quality. In this step, the six distortion levels of the same patch $p$ of the same image $R$ were displayed on a single monitor. The initial horizontal positions of the six patches were random (the patches were not placed according to their distortion levels), and the vertical positions of the patches were the same. Subjects were instructed to place the better quality patches to the left, and the poorer quality patches to the right using mouse inputs. Subjects could score new sets of patches, or change the scores of previous sets of patches, by using two separate push buttons. At the end of this step, the horizontal positions of the patches were saved as within-patch local quality scores.

Second step: Within-image local quality. In this step, the six distortion levels of different patches (different $p$) of the same image (same $R$) were viewed together. The initial horizontal positions of the patches were set from the results of the previous within-patch quality step. Subjects were instructed to carefully move the patches such that the relative horizontal positions of the patches reflected the quality variations due to the distortion levels as well as the patch contents. At the end of this step, the horizontal positions of the patches were saved as within-image local quality scores.

Third step: Inter-image local quality. The first two steps captured the quality variations due to local contents and distortion levels. However, the local quality may also vary due to the global content variations in different images (different $R$). To account for the global content variations, in this step, four sets of distorted-level patches were chosen from each image $R$. The four sets contained the poorest quality patches measured in the previous two steps. Subjects viewed the sets of patches coming from different images (different $R$) together. Subjects were instructed to move the poorest quality patch of each set by comparing it with the other sets. Subjects were not allowed to change the positions of any patches other than the poorest patch in a set. At the end of this step, the horizontal positions of the patches were saved. The average movements of the four sets per image $R$ were used to calculate six multiplication factors, by which the scores found in the second step were multiplied to incorporate the inter-image local quality variations.

2.4 Subjects

Four adults (MA, YZ, TP, and JH), including one of the authors (MA), participated in the experiment. All subjects had normal or corrected-to-normal visual acuity. All subjects were experienced with subjective image-quality assessment experiments. Each of the distorted images and distorted patches was scored at least once by each subject.

3. EXPERIMENT RESULTS AND ANALYSIS

In this section, the subject consistency of the experiment is shown first. Then, qualitative observations on the local quality scores and detection thresholds are given, followed by a quantitative analysis of the local quality scores and detection thresholds. At the end of this section, the relationship between the local quality scores and global quality scores is discussed.

3.1 Subject Consistency

Subjects were consistent in judging both the local and global image quality scores. Table 2 shows the Pearson correlation coefficient (CC) and the Spearman rank-order correlation coefficient (SROCC) between each pair of subjects for both the local quality and global quality experiments. Note that before calculating the CC, the scores were transformed through a logistic transform3,20 to remove any nonlinearity between the scores. The average CC and SROCC values for the local quality experiment were 0.931 and 0.907, respectively, and the average CC and SROCC



Table 2. Subject consistency in terms of Pearson correlation coefficient (CC) and Spearman rank-order correlation coefficient (SROCC).

                         MA/YZ   MA/TP   MA/JH   YZ/TP   YZ/JH   TP/JH   Average
Local quality    CC      0.959   0.930   0.925   0.941   0.927   0.903   0.931
                 SROCC   0.951   0.938   0.884   0.935   0.876   0.862   0.907
Global quality   CC      0.995   0.990   0.993   0.995   0.993   0.985   0.992
                 SROCC   0.986   0.981   0.988   0.995   0.987   0.977   0.986

[Figure 3 image: one row of LQMs per image (log_seaside, swarm, elk, native_american, monument, aerial_city) and one column per distortion level ($C_0 = -\infty$ dB, $C_1 = -25.7$ dB, $C_2 = -19.7$ dB, $C_3 = -13.7$ dB, $C_4 = -7.7$ dB, $C_5 = -1.7$ dB); the LQM colorbar runs from 0 to 1, with an arrow marking the direction of better quality.]

Figure 3. Local Quality Maps (LQM) of six images (log_seaside, swarm, elk, native_american, monument, and aerial_city) and six different distortion levels ($l = 0, ..., 5$). Note that the distortion levels are defined by the target contrasts over the full-sized images as described in Table 1. The local quality (LQ) scores were calculated by averaging the scores from four different subjects. The scores corresponding to the gray-scale values in the LQMs are indicated by the colorbar at the right.

[Figure 4 image: each mask image with its masking map (DVM) beneath it. Categories and images: Landscape (log_seaside), Plant (swarm), Animal (elk), People (native_american), Structure (monument), Urban (aerial_city). The threshold colorbar (dB) spans −50 to 10, with an arrow marking the direction of lower distortion visibility.]

Figure 4. Masking maps drawn from the results of our previous study.18 Each patch in the masking map denotes the RMS contrast detection threshold for detecting a Gabor-noise target (as shown in Figure 1) placed over the corresponding patch in the mask image. The thresholds corresponding to the gray-scale values in the masking maps are indicated by the colorbar at the right. Note that in this paper, masking maps are also referred to as distortion visibility maps (DVM).

values for the global-quality experiment were 0.992 and 0.986, respectively, indicating that the subjects were very consistent with each other.
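The pairwise consistency analysis can be sketched as follows. This is a simplified illustration on synthetic scores: it omits the logistic transform that the paper applies before computing the CC, and the subject labels, score values, and function names (`average_ranks`, `pearson`, `spearman`, `pairwise_consistency`) are hypothetical.

```python
import itertools
import numpy as np

def average_ranks(x):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        tied = x == value
        ranks[tied] = ranks[tied].mean()
    return ranks

def pearson(a, b):
    """Pearson linear correlation coefficient (CC)."""
    return float(np.corrcoef(a, b)[0, 1])

def spearman(a, b):
    """Spearman rank-order correlation (SROCC): Pearson correlation of the ranks."""
    return pearson(average_ranks(a), average_ranks(b))

def pairwise_consistency(scores):
    """Return {(subject_1, subject_2): (CC, SROCC)} for every pair of subjects."""
    return {
        (s1, s2): (pearson(scores[s1], scores[s2]), spearman(scores[s1], scores[s2]))
        for s1, s2 in itertools.combinations(scores, 2)
    }

# Synthetic scores for two hypothetical subjects (not data from the experiment).
example = {"S1": [0.1, 0.4, 0.5, 0.9], "S2": [1.0, 16.0, 25.0, 81.0]}
consistency = pairwise_consistency(example)
```

In this synthetic example the two score vectors are monotonically but not linearly related, which is the situation in which SROCC and an untransformed CC differ and in which a nonlinear mapping before the CC becomes useful.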

3.2 Local Quality and Masking: Qualitative Observations

The local quality scores are presented in the form of Local Quality Maps (LQMs). Figure 3 shows the average LQMs of six reference images and six distortion levels. The local quality (LQ) scores were calculated by first averaging



Table 3. The number of patches in which the target contrasts ($C_p$) are above the target detection thresholds ($C_{T,p}$). $l$ denotes the distortion level; in this table, $l = 1, ..., 5$. The image names are shown in the top row.

                 log_seaside   swarm   elk   native_american   monument   aerial_city
l = 1            32            33      15    36                36         36
l = 2            35            36      31    36                36         36
l = 3, ..., 5    36            36      36    36                36         36

the scores from four different subjects, and then normalizing to fall within the range 0 to 1. Note that in Figure 3, the local quality scores corresponding to the gray-scale values of the LQMs are indicated by the colorbar at the right.

The standard deviations of the local quality scores across the four subjects are shown in Figure 5 in the form of standard deviation (LQM std) maps. From Figure 5, note that for the lowest and highest distortion levels ($l = 0, 1$, and $l = 5$) the standard deviations were lower than for the other distortion levels ($l = 2, ..., 4$). Overall, the average, minimum, and maximum standard deviations of the local quality scores were 0.076, 0, and 0.338, respectively.
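The averaging and normalization of the subject scores can be sketched as below. Several details are assumptions rather than the authors' stated procedure: min-max scaling is used to bring the averaged scores into the range 0 to 1, the same scaling is applied to each subject's scores before the standard deviation is taken, and the population standard deviation is used. The function name `local_quality_maps` is hypothetical.

```python
import numpy as np

def local_quality_maps(subject_scores):
    """Return (mean LQ map scaled to [0, 1], per-patch standard deviation across subjects).

    subject_scores has shape (n_subjects, ...), e.g. (4, 6, 6) for one image
    at one distortion level, or (4, 6, 6, 6) for all six levels of an image.
    """
    scores = np.asarray(subject_scores, dtype=float)
    mean_scores = scores.mean(axis=0)
    lo, hi = mean_scores.min(), mean_scores.max()
    # Assumed min-max normalization of the averaged scores.
    lqm = (mean_scores - lo) / (hi - lo)
    # Assumed: the same affine scaling is applied before measuring the spread.
    lqm_std = ((scores - lo) / (hi - lo)).std(axis=0)
    return lqm, lqm_std

# Synthetic example: four subjects who agree exactly on a 6 x 6 grid of patches.
base = np.linspace(0.0, 2.0, 36).reshape(6, 6)
lqm, lqm_std = local_quality_maps(np.stack([base, base, base, base]))
```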

To observe the qualitative relation between local quality and masking, Figure 4 shows the local detection thresholds ($C_{T,p}$) of the six images in the form of masking maps or distortion visibility maps (DVM). The detection thresholds were measured in our previous study18 using the same Gabor-noise target used in this study. Note that the detection thresholds corresponding to the gray-scale values in the masking maps are indicated by the colorbar at the right.

Score saturation at very low distortions. At distortion level $l = 0$ ($C_0 = -\infty$ dB), almost all the patches of all six images show the best scores (LQs very close to 1.0) in the LQMs of Figure 3. Note that many patches at distortion levels $l = 1$ ($C_1 = -25.7$ dB) and $l = 2$ ($C_2 = -19.7$ dB) also show scores very close to 1.0. For example, the scores of most of the patches in the elk image have saturated to values close to 1.0 at distortion levels $l = 0, ..., 2$. Such saturation is also visible in the center region of the LQMs of the swarm image, and in the building regions of the LQMs of the aerial_city image.

The score saturation is less visible at the higher distortion levels than at the lower distortion levels. One possible reason for score saturation at the lower distortion levels is that the local target contrasts at those levels were below the detection thresholds. Table 3 shows the number of patches in which the target contrasts ($C_p$) are above the target detection thresholds ($C_{T,p}$). Note that except for the elk, log_seaside, and swarm images at distortion levels $l = 1$ and $l = 2$, all the patches contained supra-threshold distortions. However, the LQMs in Figure 3 suggest that many patches showed score saturation even in the supra-threshold regime (see the bright patches at distortion levels $l = 1$ and $l = 2$ of monument, aerial_city, and native_american).
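The counts in Table 3 amount to comparing each patch's target contrast with its detection threshold. A minimal sketch of that comparison, assuming both quantities are available in dB, is given below; the function name `count_suprathreshold` and the example numbers are hypothetical, and the per-patch contrasts and thresholds themselves come from measurements not reproduced here.

```python
import numpy as np

def count_suprathreshold(target_contrast_db, thresholds_db):
    """Count patches whose target contrast exceeds the detection threshold.

    target_contrast_db may be a scalar or a per-patch array; thresholds_db is
    a per-patch array. Returns the count and the threshold elevation in dB.
    """
    elevation = np.asarray(target_contrast_db, dtype=float) - np.asarray(thresholds_db, dtype=float)
    return int(np.count_nonzero(elevation > 0)), elevation

# Hypothetical thresholds for four patches, compared against a -25.7 dB target.
count, elevation = count_suprathreshold(-25.7, [-30.0, -20.0, -26.0, -10.0])
print(count)
```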

Map similarities at medium target contrasts. A visual inspection of the LQMs in Figure 3 and the masking maps in Figure 4 reveals that the patterns of the LQMs of an image follow the pattern of the masking map of the same image. For example, the masking map of the image swarm shows lower distortion visibility at the center of the map. Similarly, the LQMs of swarm (distortion levels $l = 1, ..., 5$) show higher quality regions at the center. However, for most of the images, the pattern similarity is visible at the medium to higher distortion levels ($l = 3, 4, 5$; $C_l > -13.7$ dB). For example, for the image elk, the pattern similarity between the masking maps and the LQMs is more visible at the distortion levels $l = 3$, 4, and 5. Other images also show such pattern similarities.

Score uncertainties at very low distortions. During the local-quality experiment, subjects scored the reference patches along with the distorted patches. Notice that for several reference patches, the scores were less than the maximum score of 1.0. For example, a careful visual examination of the LQMs of monument, elk, log_seaside, and swarm at distortion level $l = 0$ shows that a few undistorted patches are slightly gray (scores smaller than 1.0) compared to the other bright patches of the corresponding LQMs.

During the experiment, subjects were aware of the fact that the undistorted patches were present among the stimuli. However, because the horizontal positions of the patches at different distortion levels were randomized, during the experiment subjects had to identify the undistorted patch by visual examination of the patches. It should be noted that the initial randomization was crucial to remove the possibility of locational bias to the



[Figure 5 image: one row of LQM std maps per image (log_seaside, swarm, elk, native_american, monument, aerial_city) and one column per distortion level ($C_0 = -\infty$ dB, $C_1 = -25.7$ dB, $C_2 = -19.7$ dB, $C_3 = -13.7$ dB, $C_4 = -7.7$ dB, $C_5 = -1.7$ dB); the colorbar runs from 0 to 1, with an arrow marking the direction of higher standard deviation.]

Figure 5. The standard deviations of the local quality maps (LQM) of six different images with six distortion levels. The standard deviations were calculated from the scores of four different subjects. The standard deviations corresponding to the gray-scale values in the LQM-std maps are indicated by the colorbar at the right.

scores. However, the initial randomization also added uncertainty about the identification of the undistorted patches during the experiment, which resulted in scores of slightly less than 1.0 for some undistorted patches.

3.3 Local Quality and Masking: Quantitative Analysis

In this section, the relationship between the local qualities and the detection thresholds is discussed using quantitative measures.

3.3.1 Local Quality Prediction using Threshold Elevation

Figure 6 shows the log local-image-quality versus target threshold elevation plots at five distortion levels ($l = 1, ..., 5$). The log of the local quality scores is shown for better visualization. The left side of the green dotted line denotes the below-threshold region, and the right side of the green dotted line denotes the supra-threshold region. Data were fitted using a sigmoid function,

$$
\log(LQ) = \frac{\tau_1 - \tau_2}{1 + \exp\left(-(\Delta C - \tau_3)/\tau_4\right)} + \tau_2, \tag{6}
$$

and the fitted curves are shown as red solid lines in Figure 6. The fit parameters $\tau_1$, $\tau_2$, $\tau_3$, and $\tau_4$ are also shown in Figure 6. From Figure 6, note that for the lower distortion levels ($C_l \le -13.7$ dB), many patches were scored as perfect quality ($LQ \approx 1$) even in the supra-threshold region. Specifically, note that at target contrasts $C_1 = -25.7$ dB and $C_2 = -19.7$ dB, many patches were scored as perfect quality up to 10 dB of threshold elevation. At target contrast $C_3 = -13.7$ dB, patches were scored as perfect quality up to 5 dB of threshold elevation. Beyond a target contrast of −13.7 dB ($C_4$ and $C_5$), very few patches were scored as perfect quality.

Furthermore, from Figure 6, note that as the target contrast increases, the fall-off from the highest quality to lower quality becomes wider. For example, for $C_1$ and $C_2$ the fall-off occurs at threshold elevations of around 17 dB and 20 dB, respectively. However, for $C_l$ with $l > 2$ the fall-off becomes wider and the relation between log local quality and threshold elevation becomes more linear.
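The behavior of the fitted sigmoid in Equation 6 can be examined with a short sketch. The parameter values below are those reported in Figure 6 for $C_1 = -25.7$ dB; the function name `log_lq_sigmoid` is hypothetical, and the base of the logarithm is left as in the paper, which does not state it.

```python
import numpy as np

def log_lq_sigmoid(delta_c, tau1, tau2, tau3, tau4):
    """Equation 6: predicted log local quality as a function of threshold elevation (dB)."""
    delta_c = np.asarray(delta_c, dtype=float)
    return (tau1 - tau2) / (1.0 + np.exp(-(delta_c - tau3) / tau4)) + tau2

# Fit parameters (tau1, tau2, tau3, tau4) reported in Figure 6 for C_1 = -25.7 dB.
PARAMS_C1 = (-0.25, -0.018, 17.48, 0.19)

for elevation_db in (0.0, 17.48, 40.0):
    value = float(log_lq_sigmoid(elevation_db, *PARAMS_C1))
    print(f"delta C = {elevation_db:5.2f} dB -> log LQ = {value:.3f}")
```

For positive $\tau_4$, the curve approaches $\tau_2$ well below $\tau_3$ and $\tau_1$ well above it, so $\tau_3$ marks the threshold elevation at which the fall-off is centered and $\tau_4$ controls its width. This is consistent with the fall-off near 17 dB described above for $C_1$.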

3.3.2 Local Quality Prediction Performance using Threshold Elevation

The first research goal of this paper is to measure how far beyond the near-threshold regime masking remains valid for image quality prediction. We quantified how well threshold elevation alone predicts local quality



[Figure 6 plots: log LQ versus threshold elevation ΔC (−10 to 40 dB) in five panels, (a) to (e). A dotted line at ΔC = 0 separates the below-threshold region (ΔC < 0) from the supra-threshold region (ΔC > 0).]

Panel annotations and sigmoid fit parameters:

(a) C_{l=1} = −25.7 dB: τ1 = −0.25, τ2 = −0.018, τ3 = 17.48, τ4 = 0.19. Perfect quality below threshold (ΔC < 0); perfect quality even at ΔC = 0 to 10 dB; fall-off at approximately 17 dB.
(b) C_{l=2} = −19.7 dB: τ1 = −0.58, τ2 = −0.052, τ3 = 20.67, τ4 = 0.76. Perfect quality even at ΔC = 0 to 10 dB; fall-off at approximately 20 dB.
(c) C_{l=3} = −13.7 dB: τ1 = −1.14, τ2 = −0.032, τ3 = 25.42, τ4 = 4.61. Perfect quality below threshold (ΔC < 0); perfect quality even at ΔC = 0 to 5 dB; wider fall-off between 10 dB and 20 dB.
(d) C_{l=4} = −7.7 dB: τ1 = −0.11, τ2 = −2.02, τ3 = 30.55, τ4 = −7.81. Few patches with perfect quality at ΔC > 0 dB; much wider fall-off than at C_{l=3}.
(e) C_{l=5} = −1.7 dB: τ1 = −4.61, τ2 = 0.23, τ3 = 36.84, τ4 = 13.07. Few patches below threshold (ΔC < 0); no perfect quality at ΔC > 0 dB; much wider fall-off than at C_{l=4}.

Figure 6. Relation between local image quality and target threshold elevation at five distortion levels (l = 1, ..., 5). The region to the left of the green dotted line is the below-threshold region, and the region to the right is the supra-threshold region. Data were fitted using a sigmoid function, and the fitted curves are shown as red solid lines. The fit parameters τ1, τ2, τ3, and τ4 are also shown.

by using a “percent-correct-prediction” measure. First, both the experiment local-quality scores and predicted



[Figure 7 maps: for each of the six images, four rows (distorted image, experiment LQM, predicted LQM (I), predicted LQM (II)) at target contrasts C_{l=1} = −25.7 dB, C_{l=2} = −19.7 dB, C_{l=3} = −13.7 dB, C_{l=4} = −7.7 dB, and C_{l=5} = −1.7 dB. The percent-correct prediction is annotated for each predicted LQM. The LQM colorbar runs from 0 to 1, with higher values indicating better quality.]

Figure 7. The predicted and experimental local quality maps (LQMs) for six images: log_seaside, swarm, elk, native_american, monument, and aerial_city. For each image, the top row shows the distorted images and the second row shows the local quality maps from the experiment. The third row shows the predicted local quality maps created by using sigmoid fitting on the threshold elevations measured from our previous masking experiment [18]. The fourth row shows the predicted local quality maps created by using sigmoid fitting on the threshold elevations measured from Watson and Solomon’s masking model [21], which was fitted to predict our previous masking experiment data [18]. The experiment data along with model codes are available at http://vision.okstate.edu/masking/.

[Figure 8 plot: percent correct local quality prediction (0 to 100%) versus threshold elevation ΔC (0 to 45 dB), with one curve for prediction using the experiment detection thresholds and one curve for prediction using the masking model.]

Figure 8. Percent correct prediction of the local quality scores using only the threshold elevation. The green line indicates the prediction using experimental detection thresholds, and the red line indicates the prediction using Watson and Solomon’s masking model [21], which was fitted on our masking database.

local-quality (via the sigmoid fitting) scores for the below-threshold region were set to perfect quality via,

$$\mathrm{LQ} = \begin{cases} 1, & \Delta C \le 0 \\ \mathrm{LQ}, & \Delta C > 0, \end{cases} \tag{7}$$



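[Figure 9 plots: (a) scatter of experiment global quality GQ against predicted global quality $\widehat{\mathrm{GQ}}$ (both axes 0 to 1) for β = 0.1, 1.8, and 35; (b) RMSE between GQ and $\widehat{\mathrm{GQ}}$ versus β on logarithmic axes, with the minimum marked at β = 1.8 (RMSE 0.072).]

Figure 9. The global quality prediction for varying β: (a) scatter plot between GQ and $\widehat{\mathrm{GQ}}$, and (b) root-mean-square error between GQ and $\widehat{\mathrm{GQ}}$ for varying β.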

and,

$$\widehat{\mathrm{LQ}} = \begin{cases} 1, & \Delta C \le 0 \\ \widehat{\mathrm{LQ}}, & \Delta C > 0. \end{cases} \tag{8}$$

In the “percent-correct-prediction” measure, a correct prediction occurs when the predicted score $\widehat{\mathrm{LQ}}$ is within ±5% of the experimental score LQ, and a wrong prediction occurs when $\widehat{\mathrm{LQ}}$ is outside ±5% of LQ.
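The percent-correct-prediction measure described above can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the function names are hypothetical, and the ±5% band is interpreted here as relative to the experimental score, which is an assumption (the text does not say whether the band is relative or absolute).

```python
def clamp_below_threshold(scores, elevations):
    """Eqs. (7)-(8): set the score to 1 wherever threshold elevation <= 0 dB."""
    return [1.0 if dc <= 0.0 else s for s, dc in zip(scores, elevations)]


def percent_correct(experiment, predicted, elevations, tolerance=0.05):
    """Percentage of patches whose predicted score lies inside the tolerance band.

    Assumption: the band is relative to the experimental score, i.e. a
    prediction counts as correct when |predicted - experiment| is at most
    tolerance * experiment.
    """
    exp_scores = clamp_below_threshold(experiment, elevations)
    pred_scores = clamp_below_threshold(predicted, elevations)
    hits = sum(
        1
        for e, p in zip(exp_scores, pred_scores)
        if abs(p - e) <= tolerance * abs(e)
    )
    return 100.0 * hits / len(exp_scores)


if __name__ == "__main__":
    # Toy values for four patches (not data from the experiment).
    experiment = [0.98, 0.90, 0.50, 0.20]
    predicted = [0.70, 0.92, 0.60, 0.20]
    elevations_db = [-3.0, 5.0, 12.0, 30.0]
    print(percent_correct(experiment, predicted, elevations_db))
```

With 36 patches per image, each patch contributes about 2.8 percentage points, which matches the granularity of the per-image values annotated in Figure 7 (for example, 97.2% corresponds to 35 of 36 patches).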

We predicted the local quality via sigmoid fitting by using the threshold elevations found directly from our previous masking study [18]. Furthermore, we predicted the local quality scores via sigmoid fitting by using the threshold elevations found from Watson and Solomon’s masking model, which was trained on our masking database [18].

Figure 7 shows the predicted and experimental local quality maps (LQMs) for six images: log_seaside, swarm, elk, native_american, monument, and aerial_city. For each image, the top row shows the distorted images and the second row shows the local quality maps from the experiment. The third row shows the predicted local quality maps from the experimental detection thresholds. The fourth row shows the predicted local quality maps from the fitted Watson and Solomon masking model [21]. The experiment data along with model codes are available at http://vision.okstate.edu/masking/.

From Figure 7, note that the percent-correct prediction using both the experiment and the model threshold elevations is high at lower target contrast levels and decreases at higher target contrast levels. However, for most of the images, even at the fourth distortion level (C_{l=4} = −7.7 dB), masking thresholds alone predicted the local quality scores more than 80% correctly.

Figure 8 summarizes the prediction performance of the local quality scores using only the threshold elevation. The green line indicates the prediction using experimental detection thresholds, and the red line indicates the prediction using Watson and Solomon’s masking model [21], which was fitted on our masking database. From Figure 8, we can summarize that masking predicted the local quality scores more than 95% correctly above 15 dB threshold, to within 5% of the subject scores.

3.4 Relation between Local Quality and Global Quality

From the global-quality assessment experiment, we measured global quality scores of the 36 full-size images. The global quality scores were measured specifically to explore the summing mechanism by which the local quality scores generate global quality scores. For each of the 36 full-size images, we measured 36 local quality scores from the local quality assessment experiment. Although the local quality scores can be summed using various schemes [22, 23], we used a simple summation of powered local-quality scores:

$$\widehat{\mathrm{GQ}} = \sum_{p=1}^{36} \left( |\mathrm{LQ}_p|^{\beta} \right), \tag{9}$$

where β is the power. β was optimized to achieve minimum prediction error. Figure 9(a) shows the scatter plots of experiment global quality (GQ) and predicted global quality ($\widehat{\mathrm{GQ}}$) for three different values of β. Figure 9(b) shows the root-mean-square error (RMSE) between the experiment and predicted global quality scores for varying β. We found that the best prediction performance occurs at β = 1.8 (Spearman rank-order correlation of 0.97 between GQ and $\widehat{\mathrm{GQ}}$ at β = 1.8). Thus, an approximate squared summation of local-quality scores predicted the global quality scores suitably. However, we currently have only 36 global-quality scores and 1296 local quality scores, using only one distortion type. More data at each distortion level would help explore the summation rule at varying distortion levels.
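The pooling rule of Eq. (9) and the search for β can be sketched as below. This is a hedged illustration only: the function names are hypothetical, the example values are synthetic, and dividing the pooled sum by the number of patches (so that it shares the 0 to 1 range of the global scores) is an assumption, since the text does not state how the sum is scaled before the RMSE is computed.

```python
import math


def pooled_quality(local_scores, beta):
    """Eq. (9): sum of the local-quality magnitudes raised to the power beta."""
    return sum(abs(lq) ** beta for lq in local_scores)


def rmse(a, b):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def best_beta(local_score_sets, global_scores, betas):
    """Grid search for the beta that minimizes RMSE against the global scores.

    Assumption (not in the text): the pooled sum is divided by the number
    of patches so that predictions and global scores share the same range.
    """
    best = None
    for beta in betas:
        predicted = [pooled_quality(s, beta) / len(s) for s in local_score_sets]
        err = rmse(predicted, global_scores)
        if best is None or err < best[1]:
            best = (beta, err)
    return best


if __name__ == "__main__":
    # Synthetic toy data: three images with four identical patches each.
    local_sets = [[0.5] * 4, [0.9] * 4, [0.2] * 4]
    global_scores = [0.25, 0.81, 0.04]
    print(best_beta(local_sets, global_scores, [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]))
```

A finer or logarithmically spaced grid of candidate β values, as suggested by the logarithmic β axis in Figure 9(b), could be passed in the same way.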

4. CONCLUSIONS AND FUTURE WORKS

The local and global qualities of six images with six distortion levels were measured using subjective experiments. Gabor noise was used as the distortion in the quality-assessment experiments to be consistent with our previous study, in which the local RMS contrast detection thresholds for detecting the Gabor-noise target were measured at each spatial location of the undistorted images. Results of the experiment showed that masking predicted the local quality scores more than 95% correctly above 15 dB threshold, to within 5% of the subject scores. Furthermore, we found that an approximate squared summation of local-quality scores predicted the global quality scores suitably (Spearman rank-order correlation of 0.97). Our future work includes designing a perceptual model to predict the local image-quality scores via neural networks.

ACKNOWLEDGMENTS

This material is based upon work supported by, or in part by, the National Science Foundation, Grant Number 1054612, and the U.S. Army Research Laboratory (USARL) and the U.S. Army Research Office (USARO) under contract/grant number W911NF-10-1-0015.

REFERENCES

[1] Bovik, A. C., “Automatic prediction of perceptual image and video quality,” (2013).

[2] Sheikh, H. R., Sabir, M. F., and Bovik, A. C., “A statistical evaluation of recent full reference image quality assessment algorithms,” Image Processing, IEEE Transactions on 15(11), 3440–3451 (2006).

[3] Larson, E. C. and Chandler, D. M., “Most apparent distortion: full-reference image quality assessment and the role of strategy,” Journal of Electronic Imaging 19(1), 011006 (2010).

[4] Ponomarenko, N., Lukin, V., Zelensky, A., Egiazarian, K., Carli, M., and Battisti, F., “Tid2008-a database for evaluation of full-reference visual quality assessment metrics,” Advances of Modern Radioelectronics 10(4), 30–45 (2009).

[5] Chandler, D. M., Alam, M. M., and Phan, T. D., “Seven challenges for image quality research,” in [IS&T/SPIE Electronic Imaging], 901402–901402, International Society for Optics and Photonics (2014).

[6] Chandler, D. M., “Seven challenges in image quality assessment: past, present, and future research,” ISRN Signal Processing 2013 (2013).

[7] Legge, G. E. and Foley, J. M., “Contrast masking in human vision,” J. of Opt. Soc. Am. 70, 1458–1470 (1980).

[8] Daly, S. J., “Visible differences predictor: an algorithm for the assessment of image fidelity,” in [Digital Images and Human Vision], Watson, A. B., ed., 179–206 (1993).

[9] Heeger, D. J. and Teo, P. C., “A model of perceptual image fidelity,” in [Proceedings of the International Conference on Image Processing, 1995], 2, 343–345, IEEE (1995).



[10] Watson, A. B., Borthwick, R., and Taylor, M., “Image quality and entropy masking,” Proceedings of SPIE 3016 (1997).

[11] Damera-Venkata, N., Kite, T. D., Geisler, W. S., Evans, B. L., and Bovik, A. C., “Image quality assessment based on a degradation model,” Image Processing, IEEE Transactions on 9(4), 636–650 (2000).

[12] Wang, Z., Bovik, A. C., Sheikh, H. R., and Simoncelli, E. P., “Image quality assessment: from error visibility to structural similarity,” Image Processing, IEEE Transactions on 13(4), 600–612 (2004).

[13] Chandler, D. M. and Hemami, S. S., “Vsnr: A wavelet-based visual signal-to-noise ratio for natural images,” IEEE Transactions on Image Processing 16(9), 2284–2298 (2007).

[14] Ninassi, A., Meur, O. L., Callet, P. L., Barba, D., et al., “On the performance of human visual system based image quality assessment metric using wavelet domain,” in [Proceedings of the SPIE Conference Human Vision and Electronic Imaging XIII], 6806 (2008).

[15] Aydin, T. O., Čadík, M., Myszkowski, K., and Seidel, H.-P., “Video quality assessment for computer graphics applications,” in [ACM Transactions on Graphics (TOG)], 29(6), 161, ACM (2010).

[16] Laparra, V., Muñoz-Marí, J., and Malo, J., “Divisive normalization image quality metric revisited,” JOSA A 27(4), 852–864 (2010).

[17] Alam, M. M., Vilankar, K. P., and Chandler, D. M., “A database of local masking thresholds in natural images,” in [IS&T/SPIE Electronic Imaging], 86510G–86510G, International Society for Optics and Photonics (2013).

[18] Alam, M. M., Vilankar, K. P., Field, D. J., and Chandler, D. M., “Local masking in natural images: A database and analysis,” Journal of Vision 14(8), 22 (2014). Database available at: http://vision.okstate.edu/masking/.

[19] Moulden, B., Kingdom, F. A. A., and Gatley, L. F., “The standard deviation of luminance as a metric for contrast in random-dot images,” Perception 19, 79–101 (1990).

[20] VQEG, “Final report from the video quality experts group on the validation of objective models of video quality assessment, phase ii,” (August 2003). http://www.vqeg.org.

[21] Watson, A. B. and Solomon, J. A., “A model of visual contrast gain control and pattern masking,” J. of Opt. Soc. Am. A 14(9), 2379–2391 (1997).

[22] Wang, Z. and Shang, X., “Spatial pooling strategies for perceptual image quality assessment,” in [Image Processing, 2006 IEEE International Conference on], 2945–2948, IEEE (2006).

[23] Chandler, D. M. and Hemami, S. S., “Effects of natural images on the detectability of simple and compound wavelet subband quantization distortions,” J. Opt. Soc. Am. A 20 (July 2003).


