
IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 25, NO. 8, AUGUST 2016

Sparse Representation-Based Image Quality Index With Adaptive Sub-Dictionaries

Leida Li, Hao Cai, Yabin Zhang, Weisi Lin, Fellow, IEEE, Alex C. Kot, Fellow, IEEE, and Xingming Sun

Abstract— Distortions cause structural changes in digital images, leading to degraded visual quality. Dictionary-based sparse representation has been widely studied recently due to its ability to extract inherent image structures. Meanwhile, it can extract image features with slightly higher-level semantics. Intuitively, sparse representation can be used for image quality assessment, because visible distortions can cause significant changes to the sparse features. In this paper, a new sparse representation-based image quality assessment model is proposed based on the construction of adaptive sub-dictionaries. An overcomplete dictionary trained from natural images is employed to capture the structure changes between the reference and distorted images by sparse feature extraction via adaptive sub-dictionary selection. Based on the observations that image sparse features are invariant to weak degradations and that perceived image quality is generally influenced by diverse factors, three auxiliary quality features are added, including gradient, color, and luminance information. The proposed method is not sensitive to training images, so a universal dictionary can be adopted for quality evaluation. Extensive experiments on five public image quality databases demonstrate that the proposed method produces state-of-the-art results and performs consistently well across different image quality databases.

Index Terms— Quality evaluation, overcomplete synthesis dictionary, sparse coding, adaptive sub-dictionary.

I. INTRODUCTION

IMAGES are easily subject to distortions during their acquisition and processing. Image quality assessment (IQA) can be used to benchmark image processing algorithms and judge how an image is degraded [1]. IQA is a fundamental issue in many image processing problems, such as image/video coding [2], image restoration and image forensics [3]–[5].

Manuscript received November 30, 2014; revised January 20, 2016 and April 26, 2016; accepted June 5, 2016. Date of publication June 7, 2016; date of current version June 23, 2016. This work was supported in part by the Qing Lan Project of Jiangsu Province, in part by the Fundamental Research Funds for the Central Universities under Grant 2015QNA66, and in part by the National Natural Science Foundation of China under Grant 61379143. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Patrick Le Callet.

L. Li is with the School of Information and Electrical Engineering, China University of Mining and Technology, Xuzhou 221116, China (e-mail: [email protected]).

H. Cai is with the Department of Computer Science, Memorial University of Newfoundland, St. John’s, NL A1B 3X5, Canada (e-mail: [email protected]).

Y. Zhang and W. Lin are with the School of Computer Engineering, Nanyang Technological University, Singapore 639798 (e-mail: [email protected]; [email protected]).

A. C. Kot is with the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798 (e-mail: [email protected]).

X. Sun is with the School of Computer and Software, Nanjing University of Information Science & Technology, Nanjing 210044, China (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TIP.2016.2577891

In modern IQA, mathematical models are built for evaluating image distortions, and they are expected to perform consistently with human perception. While great efforts have recently been made on the quality assessment of new multimedia signals, such as High Dynamic Range (HDR) images [6], Ultra High Definition (UHD) [7] and 3D [8] videos, traditional IQA is still far from being a solved problem.

The current IQA models can be classified into full-reference (FR), reduced-reference (RR) and no-reference (NR) metrics [9]. FR metrics evaluate the quality of a distorted image with the full information of an undistorted reference image. RR metrics employ partial information of a reference image to achieve quality assessment [10], [11]. Different from FR and RR metrics, NR metrics evaluate the quality using only the distorted image. NR-IQA metrics can be further classified into distortion-specific and general-purpose metrics. Distortion-specific metrics evaluate the quality of a single distortion, typically blocking artifacts [12], [13], blur [14], [15] and ringing artifacts [16]. By contrast, general-purpose metrics evaluate the overall quality of an image without knowing the distortion types [17]. In this paper, we focus on FR-IQA.

The widely utilized peak signal-to-noise ratio (PSNR) model is calculated by computing pixel-to-pixel differences between the reference and distorted images. While having a clear physical meaning, it has been criticized for not being consistent with human perception [18]. This is easy to understand: distortions in different regions of an image have varying impacts on the visual quality, but PSNR treats them equally. Furthermore, human eyes are only sensitive to suprathreshold distortions, but PSNR changes even when an image undergoes the slightest, invisible changes. Therefore, it is highly desirable to develop IQA models that behave similarly to the human eye.

In the past few years, great advances have been achieved in the IQA community. In [19], Wang et al. presented the pioneering structural similarity (SSIM) model. SSIM was mainly based on the observation that the perceived quality of images is largely affected by structure changes. In SSIM, structure, luminance and contrast changes are quantified and combined to define the objective quality score of an image. A multi-scale version of SSIM, namely MS-SSIM, was also proposed [20]. In MS-SSIM, the impact of viewing conditions on image quality is considered using a multi-resolution representation of images. In [21], Wang et al. improved SSIM by incorporating information-content-based weighting, producing the IW-SSIM metric.



In [22], an information fidelity criterion (IFC) was proposed for quality evaluation, based on natural scene statistics. An improved version of IFC, named visual information fidelity (VIF), was later proposed with better performance [23]. Larson and Chandler [24] proposed the most apparent distortion (MAD) metric by imitating the different strategies of human eyes when viewing images of different quality levels. The feature similarity (FSIM) framework was designed by extracting perceptual features using phase congruency [25]. In another work, Liu et al. [26] claimed that gradient similarity (GSM) can capture both structure and contrast distortions simultaneously. In [27], a highly efficient quality model was reported based on the gradient magnitude similarity deviation (GMSD). In that work, the quality of an image was defined as the standard deviation of the gradient similarity map between the reference and distorted images. More recently, Zhang et al. [28] proposed a visual saliency-based index (VSI) for FR-IQA, where visual saliency maps are employed to measure the distortions.

In existing image quality models, low-level features are commonly adopted. However, high-level features are what humans use in the general understanding of images [29]. As a result, higher-level features carrying more semantics are expected to be more suitable for visual quality evaluation. In the past few years, sparse representation has been widely used in recognition and restoration problems with promising results [30]–[32]. Generally, the basis vectors in dictionaries behave somewhat like the simple cells in the primary visual cortex [33], [34]. It is generally accepted that slightly higher-level image features can be obtained using such dictionaries, which are usually trained from natural images. Recently, a few works have moved in this direction, among which two are representative [35], [36]. In [35], Chang et al. employed independent component analysis (ICA) to train a feature detector; image structures were then represented using the feature detector, and the detector outputs were finally compared to produce the sparse feature fidelity (SFF) metric. In Guha’s work [36], [37], a dictionary was first trained using the reference image. The dictionary was then used to decompose both the reference and distorted images, and the quality score was generated by computing the similarity between the sparse coefficients. While these models have brought new perspectives on how to use sparse representation for IQA, several important problems remain to be solved. In SFF [35], an undercomplete dictionary of size 8 × 192 is used to compute the sparse features. Such a small dictionary may not be able to effectively represent the diversified image contents encountered in practice. By contrast, overcomplete dictionaries are used in [36] and [37], which are more effective for feature extraction. However, the basis vectors used for representing the reference and distorted images are selected by the orthogonal matching pursuit (OMP) algorithm [38], so it is very likely that different basis vectors are adopted for representing the two images. Therefore, the obtained sparse coefficients are not fairly comparable, leading to an inaccurate representation of image distortions. Furthermore, a separate dictionary must be trained for each reference image, so the computational cost is very high.

In this work, a new sparse representation-based FR-IQA model is presented, which is mainly based on the proposed Adaptive Sub-Dictionary selection strategy (QASD). Firstly, a number of natural images are utilized to train an overcomplete visual dictionary. In the feature extraction stage, this dictionary is used to represent the reference image. The distorted image, however, is not represented using the same overcomplete dictionary. Instead, it is represented using sub-dictionaries, which are built from the basis vectors already used in the representation of the reference image. By this means, the reference image and the corresponding distorted version are represented using exactly the same basis vectors. As a result, image distortions can be accurately represented. Having noticed that weak distortions usually have very little influence on sparse representation-based features and that diverse aspects tend to affect image quality in practice, three additional features are employed for more effective IQA. The proposed method has the advantage that it is not sensitive to training images, so a universal dictionary can be used to evaluate the quality of common images.

The proposed QASD model differs from the two aforementioned representative sparse representation-based FR-IQA metrics [35], [36] in that: (1) a universal overcomplete dictionary is employed for sparse feature extraction, which is more effective and efficient; (2) adaptive sub-dictionaries are constructed and used for feature extraction in distorted images, so that image distortions can be correctly captured; (3) unlike the metrics [35], [36] that compare the coefficients directly, the proposed method employs the sparse coefficients to construct two feature maps for quality evaluation. The performance of the proposed method is verified on five public image quality databases. Experimental results demonstrate that the proposed method produces state-of-the-art performance and performs consistently well across databases.

A preliminary version of this work was published in PCM’15 [39]. In this journal version, we provide a more in-depth theoretical analysis of the proposed quality model. Furthermore, extensive experimental validation has been conducted using more subjectively-rated image quality databases, together with more comprehensive comparisons with state-of-the-art models. A thorough analysis of the impact of different parameter settings on the overall performance of the proposed method is also presented, which is important for understanding the characteristics of the proposed model.

II. SPARSE REPRESENTATION

In sparse representation, an input signal is represented using only a few basis vectors in a pre-trained dictionary. Given an overcomplete dictionary $D \in \mathbb{R}^{n \times K}$ ($n < K$), which consists of $K$ $n$-dimensional basis vectors $\{d_i\}_{i=1}^{K}$, a signal $y$ is represented as a linear combination of the selected best basis vectors:

$$y = Dx = \sum_{i=1}^{K} x_i d_i, \quad \text{s.t. } \|y - Dx\|_2 \le \varepsilon, \tag{1}$$

where $x \in \mathbb{R}^{K}$ is the representation vector, and $\|\cdot\|_2$ denotes the $\ell_2$ norm.


Fig. 1. Flowchart of the proposed QASD model.

In sparse representation, a signal is expected to be best approximated with the fewest basis vectors. Therefore, we have the following objective function:

$$\min_{x} \|x\|_0, \quad \text{s.t. } \|y - Dx\|_2 \le \varepsilon, \tag{2}$$

where $\|\cdot\|_0$ denotes the $\ell_0$ norm, which counts the number of nonzero elements in a vector.

To obtain the representation vectors, current work often utilizes pursuit algorithms [40], which determine the most informative basis vectors for representing a signal while satisfying a sparsity constraint. In the proposed method, we use the OMP approach [38].
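For concreteness, the following minimal NumPy sketch illustrates the greedy OMP idea of [38]; it is our own illustrative re-implementation (function and variable names are ours), not the authors' code.

```python
import numpy as np

def omp(D, y, sparsity, eps=1e-6):
    """Greedy orthogonal matching pursuit: approximate y using at most
    `sparsity` columns (atoms) of the dictionary D (shape n x K)."""
    residual = y.copy()
    support = []                       # indices of the selected atoms
    x = np.zeros(D.shape[1])
    coef = np.zeros(0)
    for _ in range(sparsity):
        # Select the atom most correlated with the current residual.
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k not in support:
            support.append(k)
        # Jointly re-fit all selected atoms by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
        if np.linalg.norm(residual) <= eps:   # stop once ||y - Dx||_2 <= eps
            break
    x[support] = coef
    return x, support
```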

The training of an overcomplete dictionary is usually based on a number of natural images. In practice, a great number of local patches are selected randomly from the training images. The principle of dictionary training is to find a set of basis vectors, a.k.a. atoms, that can produce the best approximation for all input image patches. Given a total number of $N_p$ randomly selected natural image patches $\{y_i\}_{i=1}^{N_p}$, the dictionary can be formulated by iteratively updating an initial dictionary and the representation vector as follows [31]:

$$D = \arg\min_{D, x} \|y - Dx\|_2^2 + \lambda \|x\|_1, \quad \text{s.t. } \|D_i\|_2^2 \le 1,\ i = 1, 2, \ldots, K, \tag{3}$$

where $\|\cdot\|_1$ denotes the $\ell_1$ norm, and the parameter $\lambda$ balances the sparsity and fidelity of the approximation to the input signal $y$. Several effective dictionary learning algorithms have been proposed in the literature [40]–[42]. In the proposed method, the dictionary is trained using the method in [41]; the reader is referred to [41] for further details of dictionary learning.
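The paper trains its dictionary with the efficient sparse coding algorithm of [41]. As a rough stand-in, the sketch below uses scikit-learn's DictionaryLearning, which minimizes the same ℓ1-penalized objective as Eq. (3); the patch-sampling step and the parameter values mirror Section IV-A but are otherwise our assumptions.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

# Np vectorized 8x8 patches, one per row (shape: Np x 64). In practice these
# would be sampled randomly from the training images; random data stands in here.
patches = np.random.randn(10000, 64)

learner = DictionaryLearning(
    n_components=256,              # K = 256 atoms -> a 64 x 256 overcomplete D
    alpha=1.0,                     # lambda in Eq. (3): sparsity/fidelity trade-off
    transform_algorithm='omp',
    transform_n_nonzero_coefs=2,   # sparsity L = 2, as in Section IV-A
)
learner.fit(patches)
D = learner.components_.T          # columns are the learned atoms
```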

III. PROPOSED QASD MODEL

Fig. 1 shows a flowchart of the proposed QASD metric, which contains a sparse feature module and an auxiliary feature module. The sparse feature module is the main contribution of this paper; it aims to extract slightly higher-level features for distortion analysis. The auxiliary feature module is included to account for the invariance of sparse features to weak degradations and for other perceptual issues in IQA. In this work, contrast, color and luminance are considered in the auxiliary feature module.

A. Adaptive Sub-Dictionaries

In this work, we use a universal overcomplete dictionary to capture the similarity/distance between the reference and distorted images. To achieve this goal, the reference and distorted images need to be represented using the same set of basis vectors. However, in the OMP algorithm, the optimal basis vectors used to represent an input signal are determined with the objective of producing the minimal representation error. This indicates that when representing a reference image patch and the corresponding distorted version, the selected basis vectors are very likely different, depending on the distortion strength. If this occurs, the representation vectors cannot accurately capture the differences between the two images.

To solve the aforementioned problem, we propose an adaptive sub-dictionary selection strategy. The idea of the proposed approach is illustrated in Fig. 2.

Fig. 2. Adaptive sub-dictionary. In this example, the reference signal $y^r_{ij}$ is represented as the linear combination of six basis vectors in the overcomplete dictionary $D$. An undercomplete sub-dictionary $\Phi_{ij}$ is constructed using the six basis vectors. The distorted signal $y^d_{ij}$ is represented as the linear combination of all six basis vectors in $\Phi_{ij}$.

Given a block from the reference image, denoted by $y^r_{ij}$, where $i, j$ denote the row and column indices of the block, the overcomplete dictionary $D$ is used to represent it, producing the sparse coefficients $x^r_{ij}$ and the set of used basis vectors $\{d_k\}_{k=1}^{L}$, where $L$ denotes the sparsity. A sub-dictionary $\Phi_{ij}$ is then constructed from all $L$ already used basis vectors, and the distorted version of the block is forced to be represented using all the basis vectors in the corresponding sub-dictionary $\Phi_{ij}$, generating the sparse coefficients $x^d_{ij}$. By this means, the same basis vectors are utilized to represent the reference and the corresponding distorted image blocks. As a result, the similarity/difference between the sparse coefficients $x^r_{ij}$ and $x^d_{ij}$ can correctly capture the distortions. It should be noted that the sub-dictionary $\Phi_{ij}$ is adaptive to the reference image block $y^r_{ij}$: for different reference image blocks, different sub-dictionaries are generated for representing the corresponding distorted image blocks.
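A minimal sketch of this per-block procedure (reusing the `omp` helper from Section II; names are ours): the reference block fixes the support, and the distorted block is then fit on exactly that support, which for a full support reduces to ordinary least squares.

```python
import numpy as np

def code_block_pair(D, y_ref, y_dst, L=2):
    """Represent a reference block with OMP (Eq. (4)), then force the
    distorted block onto the same L atoms, i.e., the adaptive
    sub-dictionary Phi_ij (Eq. (5))."""
    x_ref, support = omp(D, y_ref, sparsity=L)
    Phi = D[:, support]                                  # adaptive sub-dictionary
    # All L atoms of Phi are used, so the fit is a least-squares solve.
    x_dst, *_ = np.linalg.lstsq(Phi, y_dst, rcond=None)
    return x_ref[support], x_dst                         # comparable L-dim vectors
```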

B. Sparse Feature

Sparse feature similarity is the main contribution of the proposed method, which we believe is beneficial for modern IQA. In this paper, an image is represented by sparse representation block-wise. Given an image pair $F_1$ and $F_2$, both of size $M \times N$, we first partition them into nonoverlapping patches, all of the same size $R \times R$. The blocks of the reference and distorted images are denoted by $\{y^r_{ij}\}$ and $\{y^d_{ij}\}$, respectively, where $i = 1, 2, \ldots, \lfloor M/R \rfloor$, $j = 1, 2, \ldots, \lfloor N/R \rfloor$, and $\lfloor\cdot\rfloor$ is the floor operation. Given a reference block $y^r_{ij}$, it is represented using the dictionary $D$:

$$y^r_{ij} = D x^r_{ij}, \quad \text{s.t. } \|y^r_{ij} - D x^r_{ij}\|_2 \le \varepsilon, \tag{4}$$

where $x^r_{ij}$ is the vector of sparse coefficients with sparsity (number of nonzero elements) $L$. The $L$ used basis vectors then constitute a sub-dictionary, denoted by $\Phi_{ij}$, $i = 1, 2, \ldots, \lfloor M/R \rfloor$, $j = 1, 2, \ldots, \lfloor N/R \rfloor$. The distorted block $y^d_{ij}$ is then represented based on the sub-dictionary $\Phi_{ij}$ as follows:

$$y^d_{ij} = \Phi_{ij} x^d_{ij}, \quad \text{s.t. } \|y^d_{ij} - \Phi_{ij} x^d_{ij}\|_2 \le \varepsilon. \tag{5}$$

It should be noted that in Eq. (5) the sparsity is set to $L$, namely all the basis vectors in $\Phi_{ij}$ are used to represent $y^d_{ij}$, producing the representation vector $x^d_{ij}$. As aforementioned, the objective is to ensure that the reference and corresponding distorted blocks are represented using exactly the same set of basis vectors, so that the produced sparse features are fair for comparison. This is crucial to the success of the proposed method.

With the sparse coefficients $\{x^r_{ij}\}$ and $\{x^d_{ij}\}$, $i = 1, 2, \ldots, \lfloor M/R \rfloor$, $j = 1, 2, \ldots, \lfloor N/R \rfloor$, two feature maps are generated and used to evaluate the similarity/difference between the reference and distorted images. In practice, these two feature maps are generated by computing the root inner product of the representation vectors:

$$FM_r(i, j) = \sqrt{\langle x^r_{ij}, x^r_{ij} \rangle}, \tag{6}$$

$$FM_d(i, j) = \sqrt{\langle x^d_{ij}, x^d_{ij} \rangle}, \tag{7}$$

where $FM_r$ and $FM_d$ are the two feature maps, respectively, and $\langle\cdot,\cdot\rangle$ represents the inner product. Note that the size of the feature maps is $\lfloor M/R \rfloor \times \lfloor N/R \rfloor$. We then resize the feature maps to the same size as the original image before further processing.

In Fig. 3, we show an example of feature map extraction from a reference image and three distorted counterparts, all taken from the TID2013 database [47]. The three distortions are Gaussian blur, JPEG compression and sparse sampling with reconstruction, respectively. It is obvious that the feature maps are representative of the distortions: different distortions produce different feature maps, which can be clearly seen from the marked regions. This confirms that the feature maps capture the distortion characteristics. As a result, it is feasible to employ the feature maps for measuring image distortions.

The two feature maps are compared to produce the sparse feature similarity. With two feature maps $FM_r$ and $FM_d$, the feature similarity is computed as follows:

$$S_{FM}(i, j) = \frac{2\, FM_r(i, j)\, FM_d(i, j) + c_1}{[FM_r(i, j)]^2 + [FM_d(i, j)]^2 + c_1}, \tag{8}$$

where $i = 1, 2, \ldots, M$, $j = 1, 2, \ldots, N$, and $c_1$ is a constant. It should be noted that the similarity between the feature maps is calculated in a pixel-wise manner, so $S_{FM}$ is actually a quality map, indicating the distortion strength in local image areas.
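Putting Eqs. (6)–(8) together: the root inner product of a coefficient vector with itself is simply its ℓ2 norm, and the block-level maps are upsampled to image size before the pixel-wise similarity. A sketch under the assumptions above (the nearest-neighbour resize and the value of c1 are our placeholders):

```python
import numpy as np

def sparse_feature_maps(D, ref, dst, R=8, L=2):
    """Block-wise feature maps FM_r and FM_d of Eqs. (6)-(7)."""
    M, N = ref.shape
    rows, cols = M // R, N // R
    FMr = np.zeros((rows, cols))
    FMd = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            yr = ref[i*R:(i+1)*R, j*R:(j+1)*R].ravel()
            yd = dst[i*R:(i+1)*R, j*R:(j+1)*R].ravel()
            xr, xd = code_block_pair(D, yr, yd, L)
            FMr[i, j] = np.sqrt(xr @ xr)   # = ||x_ij^r||_2
            FMd[i, j] = np.sqrt(xd @ xd)   # = ||x_ij^d||_2
    # Nearest-neighbour upsampling back to (approximately) image size.
    return np.kron(FMr, np.ones((R, R))), np.kron(FMd, np.ones((R, R)))

def sfm_map(FMr, FMd, c1=0.09):
    """Pixel-wise sparse feature similarity map of Eq. (8);
    c1 here is a placeholder, not the paper's setting."""
    return (2 * FMr * FMd + c1) / (FMr**2 + FMd**2 + c1)
```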

Fig. 3. Example images and their corresponding feature maps. (a) Reference; (b) Gaussian blur; (c) JPEG compression; (d) sparse sampling and reconstruction; (e)-(h) are the corresponding feature maps.

C. Auxiliary Features

In the proposed model, the sparse representation-based feature maps extract slightly higher-level features that capture inherent image structures [41]. This also indicates that these sparse features are mainly affected by moderate and heavy distortions; in other words, weak distortions have very limited influence on the sparse feature maps. Meanwhile, the visual quality of an image is usually influenced by diverse factors, so a single sparse feature is not expected to perform well on its own. With these considerations, three auxiliary features are integrated for more effective IQA, i.e., gradient, color and luminance.

Image gradient is very sensitive to distortions, both heavy and weak. Therefore, we employ gradient as a complementary feature to measure image structures, especially for weak distortions. Furthermore, it can also capture contrast changes in images [25]. In our work, an image is first converted from the RGB space to the YCbCr space [43], which has the capacity to distinguish intensity and color components, where Y denotes the intensity, and Cb and Cr denote the color. With this model, color distortion can be simultaneously computed using the Cb and Cr channels, which constitutes the second auxiliary feature.

For an image pair, the Y, Cb, Cr components are represented by $Y_r$, $Cb_r$, $Cr_r$ and $Y_d$, $Cb_d$, $Cr_d$, respectively. The Scharr operator [25] is first used to compute the gradients of $Y_r$ and $Y_d$, which are represented by $G_r$ and $G_d$, respectively. Then the gradient similarity is computed as:

$$S_G(i, j) = \frac{2\, G_r(i, j)\, G_d(i, j) + c_2}{[G_r(i, j)]^2 + [G_d(i, j)]^2 + c_2}. \tag{9}$$

Similarly, the color similarity can be calculated as:

$$S_C(i, j) = \frac{2\, Cb_r(i, j)\, Cb_d(i, j) + c_3}{[Cb_r(i, j)]^2 + [Cb_d(i, j)]^2 + c_3} \cdot \frac{2\, Cr_r(i, j)\, Cr_d(i, j) + c_3}{[Cr_r(i, j)]^2 + [Cr_d(i, j)]^2 + c_3}. \tag{10}$$

In Eqs. (9) and (10), $i = 1, 2, \ldots, M$, $j = 1, 2, \ldots, N$, and $c_2$ and $c_3$ are constants.
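A sketch of Eqs. (9)–(10) with the 3×3 Scharr kernels written out explicitly; the boundary handling and the values of c2 and c3 are our placeholders rather than the paper's settings.

```python
import numpy as np
from scipy.signal import convolve2d

SCHARR_X = np.array([[3, 0, -3], [10, 0, -10], [3, 0, -3]]) / 16.0
SCHARR_Y = SCHARR_X.T

def gradient_magnitude(Y):
    gx = convolve2d(Y, SCHARR_X, mode='same', boundary='symm')
    gy = convolve2d(Y, SCHARR_Y, mode='same', boundary='symm')
    return np.sqrt(gx**2 + gy**2)

def gradient_similarity(Yr, Yd, c2=160.0):           # Eq. (9)
    Gr, Gd = gradient_magnitude(Yr), gradient_magnitude(Yd)
    return (2 * Gr * Gd + c2) / (Gr**2 + Gd**2 + c2)

def color_similarity(Cbr, Cbd, Crr, Crd, c3=130.0):  # Eq. (10)
    s_cb = (2 * Cbr * Cbd + c3) / (Cbr**2 + Cbd**2 + c3)
    s_cr = (2 * Crr * Crd + c3) / (Crr**2 + Crd**2 + c3)
    return s_cb * s_cr
```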

Luminance is the final auxiliary feature in the proposed model. Here, luminance similarity is computed similarly to the SFF model [35]. First, the block pairs that have large mean differences are selected and denoted by $B_r$ and $B_d$. Then the luminance similarity is computed as the correlation between the selected mean-value pairs. The mean values of the selected block pairs are first calculated, which are denoted by $m^r = \{m^r_i\}_{i=1}^{T}$ and $m^d = \{m^d_i\}_{i=1}^{T}$, where $T$ is the number of selected block pairs. Then the luminance similarity score is defined as [35]:

$$Q_L = \frac{\sum_{i=1}^{T} \big(m^r_i - \mu(m^r)\big) \cdot \big(m^d_i - \mu(m^d)\big) + c_4}{\sqrt{\sum_{i=1}^{T} \big(m^r_i - \mu(m^r)\big)^2 \cdot \sum_{i=1}^{T} \big(m^d_i - \mu(m^d)\big)^2} + c_4}, \tag{11}$$

where $\mu(\cdot)$ is the mean operation, and $c_4$ is a constant.
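A sketch of Eq. (11), given the per-block mean values; the selection threshold and c4 are illustrative values of ours, not taken from the paper.

```python
import numpy as np

def luminance_score(ref_means, dst_means, thresh=2.0, c4=1e-3):
    """Eq. (11): correlation of block means over the block pairs whose
    mean difference is large (greater than `thresh`)."""
    mask = np.abs(ref_means - dst_means) > thresh
    mr, md = ref_means[mask], dst_means[mask]
    num = np.sum((mr - mr.mean()) * (md - md.mean())) + c4
    den = np.sqrt(np.sum((mr - mr.mean())**2) *
                  np.sum((md - md.mean())**2)) + c4
    return num / den
```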

D. Sparse Feature-Based Weighting

It has been widely accepted that, in the perception of image quality, different regions have different contributions [21]. Specifically, distortions in textured regions have more impact on the perceptual quality than those in smooth regions. In order to obtain an overall quality score, the local quality map is usually weighted, which aims to adapt to the characteristics of the HVS. It is observed from Fig. 3 that the feature map also indicates local visual importance, with textured regions assigned larger values than smooth regions. As a result, we propose to generate the weighting map from the feature maps as follows:

$$W(i, j) = \max\big(FM_r(i, j), FM_d(i, j)\big), \tag{12}$$

where $W$ is the proposed weighting map, which has the same size as the feature maps. Examples of the weighting maps are shown in Fig. 4. It can be seen from the figure that larger weights are assigned to the textured regions, which are more important in the perception of image quality.

Fig. 4. Sparse feature-based weighting maps. (a) Original images; (b) Distorted images (top: Gaussian blur, bottom: JPEG); (c) Weighting maps.

With the sparse feature-based weighting map $W$, the local quality map $S_{FM}$, local gradient similarity map $S_G$ and local color similarity map $S_C$ are weighted, producing three overall quality scores:

$$Q_{FM} = \frac{\sum_{i=1}^{M} \sum_{j=1}^{N} S_{FM}(i, j) \cdot W(i, j)}{\sum_{i=1}^{M} \sum_{j=1}^{N} W(i, j)}, \tag{13}$$

$$Q_G = \frac{\sum_{i=1}^{M} \sum_{j=1}^{N} S_G(i, j) \cdot W(i, j)}{\sum_{i=1}^{M} \sum_{j=1}^{N} W(i, j)}, \tag{14}$$

$$Q_C = \frac{\sum_{i=1}^{M} \sum_{j=1}^{N} S_C(i, j) \cdot W(i, j)}{\sum_{i=1}^{M} \sum_{j=1}^{N} W(i, j)}. \tag{15}$$

So far, we have obtained the sparse feature similarity score $Q_{FM}$, gradient similarity score $Q_G$, color similarity score $Q_C$ and luminance score $Q_L$. The overall quality score is calculated as:

$$Q = Q_{FM} \cdot (Q_G)^{\alpha} \cdot (Q_C)^{\beta} \cdot (Q_L)^{\gamma}, \tag{16}$$

where the parameters $\alpha, \beta, \gamma \in [0, 1]$ balance the relative contributions of the different components.
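Eqs. (12)–(16) then reduce to element-wise operations over same-sized maps; a short sketch, with the exponents taken from Section IV-A:

```python
import numpy as np

def qasd_score(SFM, SG, SC, QL, FMr, FMd,
               alpha=0.25, beta=0.03, gamma=0.65):
    W = np.maximum(FMr, FMd)                        # Eq. (12): weighting map
    pool = lambda S: np.sum(S * W) / np.sum(W)      # Eqs. (13)-(15)
    QFM, QG, QC = pool(SFM), pool(SG), pool(SC)
    return QFM * QG**alpha * QC**beta * QL**gamma   # Eq. (16)
```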

IV. EXPERIMENTAL RESULTS

A. Experimental Settings

Five public image quality databases are used for the performance evaluation of the proposed QASD model, namely IVC [44], LIVE [19], [45], CSIQ [24], TID2008 [46] and TID2013 [47]. The characteristics of these databases are listed in Table I, including the numbers of reference images, distorted images, distortion types, etc. In the table, MOS denotes the mean opinion score, and DMOS denotes the difference mean opinion score. A total of 6530 images with different distortion types are tested in our experiments.

The proposed QASD model is compared with eleven FR-IQA metrics, including PSNR, SSIM [19], MS-SSIM [20], IW-SSIM [21], VIF [23], MAD [24], FSIM [25], GSM [26], SFF [35], GMSD [27] and VSI [28]. Three commonly used criteria are adopted for quantitative performance evaluation, including the Spearman rank-order correlation coefficient (SRCC), the Pearson linear correlation coefficient (PLCC) and the root mean squared error (RMSE) [21]. SRCC is calculated according to the rank of the scores, and it is used to evaluate the prediction monotonicity. PLCC and RMSE are used to evaluate the prediction accuracy. To compute PLCC and RMSE, a logistic mapping is first conducted between the subjective and predicted scores. In this paper, a five-parameter logistic function is used [48]:

$$f(x) = \tau_1 \left( \frac{1}{2} - \frac{1}{1 + e^{\tau_2 (x - \tau_3)}} \right) + \tau_4 x + \tau_5, \tag{17}$$

where $\tau_i$, $i = 1, 2, \ldots, 5$, are the parameters to be fitted.

TABLE I

DATABASES FOR PERFORMANCE EVALUATION

Fig. 5. Ten training images and the trained dictionary.

In our implementation, 8 × 8 ($R = 8$) blocks are adopted for sparse representation. An overcomplete dictionary of size 64 × 256 is used, which is trained using ten natural images. To be more specific, we first randomly select 10,000 blocks from the ten images. Then the dictionary learning method [41] is adopted to train the dictionary. The ten training images and the corresponding trained dictionary are shown in Fig. 5. Furthermore, we set the sparsity to $L = 2$ in our implementation. The parameters $\alpha, \beta, \gamma$ in Eq. (16) are determined on the LIVE database. Specifically, these parameters are first tuned by maximizing the SRCC and PLCC values on the LIVE database; they are then used in the performance evaluation on the other four databases, namely IVC, CSIQ, TID2008 and TID2013. In this paper, we set $\alpha = 0.25$, $\beta = 0.03$, $\gamma = 0.65$, which achieves consistently good performance across all databases.
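The five-parameter mapping of Eq. (17) can be fitted by ordinary nonlinear least squares before computing PLCC and RMSE; a sketch using SciPy (the initial guesses are our own heuristic):

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic5(x, t1, t2, t3, t4, t5):   # Eq. (17)
    return t1 * (0.5 - 1.0 / (1.0 + np.exp(t2 * (x - t3)))) + t4 * x + t5

def fit_and_map(objective, subjective):
    """Map raw metric scores onto the subjective scale."""
    p0 = [np.ptp(subjective), 0.1, np.mean(objective),
          0.0, np.mean(subjective)]
    params, _ = curve_fit(logistic5, objective, subjective,
                          p0=p0, maxfev=20000)
    return logistic5(objective, *params)
```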


TABLE II

SUMMARY OF EXPERIMENTAL RESULTS OF QASD AND COMPARED FR-IQA MODELS

B. Performance Evaluation

In this subsection, the performances of QASD and the eleven compared metrics are evaluated in terms of PLCC, SRCC and RMSE. The experimental results on the five databases are summarized in Table II. For each performance criterion and database, the three best results are marked in boldface.

It is clear from Table II that the proposed QASD performs consistently well on the five databases. On IVC, QASD outperforms all the other metrics in terms of both prediction accuracy and monotonicity. On TID2008 and TID2013, the two largest databases, QASD outperforms most of the state-of-the-art metrics. Specifically, on TID2008, QASD produces the best prediction accuracy, while its prediction monotonicity is only slightly worse than that of GMSD and VSI. On TID2013, the performance of QASD is second only to VSI, and it outperforms the other ten metrics. On LIVE and CSIQ, although not the best, QASD performs only slightly worse than the best metrics. Based on these results, we can conclude that QASD produces state-of-the-art performance that is consistent across databases.

C. Statistical Significance Analysis

We further evaluate the statistical significance of the proposed model using the F-test [45], which is based on the prediction residuals between the mapped objective scores (after the logistic mapping) and the subjective scores. Let the variances of the prediction residuals of two metrics A and B be denoted by $\sigma_A^2$ and $\sigma_B^2$; then the F score is defined as:

$$F = \sigma_A^2 / \sigma_B^2. \tag{18}$$

The statistical significance can be obtained by comparing $F$ with a threshold $F_{critical}$, which is determined by the number of prediction residuals and a confidence level. If $F > F_{critical}$ (or $F < 1/F_{critical}$), metric A has significantly larger (or smaller) prediction residuals than metric B, which indicates that metric B performs significantly better (or worse) than metric A. Otherwise, their performances are comparable. Table III lists the statistical significance between QASD and the eleven compared metrics, where “+1”, “0” and “−1” mean that QASD performs statistically better than, comparably to, and worse than the corresponding metric (with 95% confidence), respectively.

TABLE III

SUMMARY OF STATISTICAL PERFORMANCES BETWEEN QASD AND THE COMPARED METRICS ON FIVE DATABASES
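A sketch of this residual-variance test; SciPy's F distribution supplies the critical value, and the mapping to the {+1, 0, −1} labels of Table III follows the description above.

```python
import numpy as np
from scipy.stats import f as f_dist

def compare_metrics(resid_a, resid_b, alpha=0.05):
    """Eq. (18): return +1 if metric B is significantly better than A,
    -1 if significantly worse, and 0 if comparable."""
    F = np.var(resid_a, ddof=1) / np.var(resid_b, ddof=1)
    dfa, dfb = len(resid_a) - 1, len(resid_b) - 1
    F_crit = f_dist.ppf(1.0 - alpha, dfa, dfb)
    if F > F_crit:          # A's residual variance is significantly larger
        return 1
    if F < 1.0 / F_crit:    # A's residual variance is significantly smaller
        return -1
    return 0
```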

In Table III, it is clear that our method is superior to the existing metrics in most cases. Among the 55 cases (combinations of metrics and databases), our metric performs significantly better 41 times, comparably 10 times and worse only 4 times. To be specific, on the IVC database, only FSIM, GSM and SFF are comparable to our method, and QASD outperforms all other metrics. On LIVE, only MAD performs better, and our method performs better than or comparably to the remaining ten metrics. On CSIQ, our method performs better than 8 metrics, and only SFF and GMSD outperform ours. On TID2008, only SFF and GMSD are comparable to our method, and QASD is superior to the other 9 metrics. On TID2013, our method performs better than the 10 metrics other than VSI. Although VSI performs better on TID2013, our method outperforms VSI on the other four databases. This indicates that the proposed method performs consistently well across all databases, which is desired in real-world applications.

Fig. 6. Scatter plots of subjective scores against predicted scores by different metrics on the TID2013 database. The red lines are fitted with the logistic function and the black dashed lines are 95% confidence intervals. (a) SSIM. (b) VIF. (c) IW-SSIM. (d) MAD. (e) FSIM. (f) GSM. (g) SFF. (h) VSI. (i) Proposed QASD.

Fig. 7. Distortions represented by different shapes and colors in Fig. 6.

Fig. 6 shows the scatter plots of the subjective scores against the objective scores predicted by the different quality metrics on the largest database, TID2013, where the distortion types represented by different shapes and colors are shown in Fig. 7. A good metric is expected to produce scatter plots with good convergence and monotonicity. Convergence means that the sample points are close to the fitted curve, while monotonicity means that the objective scores should increase (or decrease) with the decrease (or increase) of the corresponding subjective scores. It is intuitively observed from Fig. 6 that SFF, VSI and the proposed QASD exhibit slightly better convergence and monotonicity than the other metrics. Following them are IW-SSIM, MAD, FSIM and GSM. By comparison, the two earlier metrics SSIM and VIF do not produce very satisfactory fitting results, because neither the convergence nor the monotonicity is sufficiently good. From the fitting result of QASD, it can be seen that the convergence for moderate and high-quality images (corresponding to high MOS values) is better than that for low-quality images, which also holds for most of the compared metrics. This indicates that most IQA metrics are more effective in evaluating high-quality images than low-quality images.

TABLE IV

SRCC COMPARISON OF 12 IQA MODELS IN INDIVIDUAL DISTORTION TYPES

D. Performance on Individual Distortions

In order to understand how the proposed method performs on different distortion types, we conduct the same experiments on individual distortions. Table IV summarizes the SRCC results of the twelve quality models on the IVC, LIVE and CSIQ databases. In the table, the metrics that produce the best three results for each distortion are marked in boldface.

It is observed from the table that none of the metrics achieves the best performance for all distortion types. In each database, QASD always has several cases that rank among the top three metrics. This indicates that QASD not only produces very good overall performance, but also performs well on individual distortion types.

E. Impact of Training Images

The dictionary is fundamental to the proposed method, and it is trained using natural images. Therefore, it is meaningful to investigate the impact of the training images on the performance of the proposed metric. To this end, we train two other dictionaries using different images, and then use the new dictionaries to test the performance of the proposed method. Fig. 8 shows the two sets of training images, together with the dictionaries. Table V lists the PLCC and SRCC results on the five databases. Note that in the table, Dict. 1 denotes the dictionary trained using the images in Fig. 5.

Fig. 8. Two other training image sets and the corresponding dictionaries. (a) Training image set 2. (b) Training image set 3. (c) Dictionary 2 (Dict. 2). (d) Dictionary 3 (Dict. 3).

From Table V, it is observed that the three dictionaries produce very similar results. This indicates that the proposed method is not sensitive to the training images used during dictionary learning. Therefore, a universal dictionary can be used to predict the quality of common images. This is helpful for practical applications.


TABLE V

PERFORMANCES OF QASD WITH DIFFERENT DICTIONARIES

TABLE VI

IMPACT OF SPARSITIES ON THE PERFORMANCES OF QASD

F. Impact of Sparsities

In dictionary-based sparse representation, sparsity refers to the number of basis vectors used to represent a given signal. Specifically, in this paper, sparsity is the number of nonzero elements ($L$) in $x^r_{ij}$ of Eq. (4). In this subsection, we test the performance of the proposed method using different sparsities. Table VI lists the experimental results.

It can be seen from the table that the overall performance of QASD improves gradually as the sparsity increases. When the sparsity is higher than two, the performance improves only very slightly. Considering that a large sparsity incurs a high computational cost, we set the sparsity to two in this work, which achieves a good trade-off between computational time and metric performance.

V. CONCLUSIONS

In this work, we have presented a novel full-reference image quality assessment model using sparse representation. A universal overcomplete dictionary is utilized to extract sparse features, which are used to measure the structural distortions in images. In particular, we propose an adaptive sub-dictionary selection approach to achieve this goal. Instead of comparing the sparse coefficients directly, we construct two feature maps and evaluate image quality based on their similarity. The feature maps are also adopted to construct a weighting map. The final quality score is generated by combining the sparse feature similarity with auxiliary features, including gradient, color and luminance. We have conducted extensive experiments and comparisons on five publicly available image quality databases, and the experimental results demonstrate that the proposed method produces state-of-the-art performance.

In real-world applications, apart from the distortions themselves, several external factors also have a great impact on perceived image quality, such as viewing distance and display characteristics [6]. In order to design more advanced image quality models, viewing distance and display characteristics should be considered. Furthermore, how to handle image color and structure distortions in a holistic manner [49] needs more exploration.

ACKNOWLEDGEMENT

The authors would like to thank both the editor and the anonymous reviewers for their invaluable comments and suggestions, which helped a lot in the revisions of this paper.

REFERENCES

[1] W. Lin and C.-C. Jay Kuo, “Perceptual visual quality metrics: A survey,” J. Visual Commun. Image Represent., vol. 22, no. 4, pp. 297–312, 2011.
[2] Z. Pan, Y. Zhang, and S. Kwong, “Efficient motion and disparity estimation optimization for low complexity multiview video coding,” IEEE Trans. Broadcast., vol. 61, no. 2, pp. 166–176, Jun. 2015.
[3] J. Li, X. Li, B. Yang, and X. Sun, “Segmentation-based image copy-move forgery detection scheme,” IEEE Trans. Inf. Forensics Security, vol. 10, no. 3, pp. 507–518, Mar. 2015.
[4] Z. Xia, X. Wang, X. Sun, Q. Liu, and N. Xiong, “Steganalysis of LSB matching using differences between nonadjacent pixels,” Multimedia Tools Appl., vol. 75, no. 4, pp. 1947–1962, Feb. 2016.
[5] Z. Xia, X. Wang, X. Sun, and B. Wang, “Steganalysis of least significant bit matching using multi-order differences,” Security Commun. Netw., vol. 7, no. 8, pp. 1283–1291, Aug. 2014.
[6] M. Narwaria, R. K. Mantiuk, M. P. Da Silva, and P. Le Callet, “HDR-VDP-2.2: A calibrated method for objective quality prediction of high-dynamic range and standard images,” J. Electron. Imag., vol. 24, no. 1, Jan. 2015, Art. no. 010501.
[7] K. Berger, Y. Koudota, M. Barkowsky, and P. Le Callet, “Subjective quality assessment comparing UHD and HD resolution in HEVC transmission chains,” in Proc. 7th Int. Conf. Quality Multimedia Exper. (QoMEX), May 2015, pp. 1–6.
[8] F. Battisti, E. Bosc, M. Carli, P. Le Callet, and S. Perugia, “Objective image quality assessment of 3D synthesized views,” Signal Process., Image Commun., vol. 30, pp. 78–88, Jan. 2015.
[9] D. M. Chandler, “Seven challenges in image quality assessment: Past, present, and future research,” ISRN Signal Process., vol. 2013, Feb. 2013, Art. no. 905685.
[10] A. Rehman and Z. Wang, “Reduced-reference image quality assessment by structural similarity estimation,” IEEE Trans. Image Process., vol. 21, no. 8, pp. 3378–3389, Aug. 2012.
[11] J. Wu, W. Lin, G. Shi, and A. Liu, “Reduced-reference image quality assessment with visual information fidelity,” IEEE Trans. Multimedia, vol. 15, no. 7, pp. 1700–1705, Nov. 2013.
[12] L. Li, H. Zhu, G. Yang, and J. Qian, “Referenceless measure of blocking artifacts by Tchebichef kernel analysis,” IEEE Signal Process. Lett., vol. 21, no. 1, pp. 122–125, Jan. 2014.
[13] S. A. Golestaneh and D. M. Chandler, “No-reference quality assessment of JPEG images via a quality relevance map,” IEEE Signal Process. Lett., vol. 21, no. 2, pp. 155–158, Feb. 2014.
[14] R. Hassen, Z. Wang, and M. M. A. Salama, “Image sharpness assessment based on local phase coherence,” IEEE Trans. Image Process., vol. 22, no. 7, pp. 2798–2810, Jul. 2013.
[15] L. Li, W. Lin, X. Wang, G. Yang, K. Bahrami, and A. C. Kot, “No-reference image blur assessment based on discrete orthogonal moments,” IEEE Trans. Cybern., vol. 46, no. 1, pp. 39–50, Jan. 2016.
[16] H. Liu, N. Klomp, and I. Heynderickx, “A no-reference metric for perceived ringing artifacts in images,” IEEE Trans. Circuits Syst. Video Technol., vol. 20, no. 4, pp. 529–539, Apr. 2010.
[17] A. Mittal, R. Soundararajan, and A. C. Bovik, “Making a ‘completely blind’ image quality analyzer,” IEEE Signal Process. Lett., vol. 20, no. 3, pp. 209–212, Mar. 2013.
[18] Z. Wang and A. C. Bovik, “Mean squared error: Love it or leave it? A new look at signal fidelity measures,” IEEE Signal Process. Mag., vol. 26, no. 1, pp. 98–117, Jan. 2009.
[19] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: From error visibility to structural similarity,” IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, Apr. 2004.
[20] Z. Wang, E. P. Simoncelli, and A. C. Bovik, “Multiscale structural similarity for image quality assessment,” in Proc. 37th IEEE Asilomar Conf. Signals, Syst. Comput., Nov. 2003, pp. 1398–1402.
[21] Z. Wang and Q. Li, “Information content weighting for perceptual image quality assessment,” IEEE Trans. Image Process., vol. 20, no. 5, pp. 1185–1198, May 2011.
[22] H. R. Sheikh, A. C. Bovik, and G. de Veciana, “An information fidelity criterion for image quality assessment using natural scene statistics,” IEEE Trans. Image Process., vol. 14, no. 12, pp. 2117–2128, Dec. 2005.
[23] H. R. Sheikh and A. C. Bovik, “Image information and visual quality,” IEEE Trans. Image Process., vol. 15, no. 2, pp. 430–444, Feb. 2006.
[24] E. C. Larson and D. M. Chandler, “Most apparent distortion: Full-reference image quality assessment and the role of strategy,” J. Electron. Imag., vol. 19, no. 1, p. 011006, Mar. 2010.
[25] L. Zhang, L. Zhang, X. Mou, and D. Zhang, “FSIM: A feature similarity index for image quality assessment,” IEEE Trans. Image Process., vol. 20, no. 8, pp. 2378–2386, Aug. 2011.
[26] A. Liu, W. Lin, and M. Narwaria, “Image quality assessment based on gradient similarity,” IEEE Trans. Image Process., vol. 21, no. 4, pp. 1500–1512, Apr. 2012.
[27] W. Xue, L. Zhang, X. Mou, and A. C. Bovik, “Gradient magnitude similarity deviation: A highly efficient perceptual image quality index,” IEEE Trans. Image Process., vol. 23, no. 2, pp. 684–695, Feb. 2014.
[28] L. Zhang, Y. Shen, and H. Li, “VSI: A visual saliency-induced index for perceptual image quality assessment,” IEEE Trans. Image Process., vol. 23, no. 10, pp. 4270–4281, Oct. 2014.
[29] L.-J. Li, H. Su, Y. Lim, and L. Fei-Fei, “Object bank: An object-level image representation for high-level visual recognition,” Int. J. Comput. Vis., vol. 107, no. 1, pp. 20–39, Mar. 2014.
[30] J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, and Y. Ma, “Robust face recognition via sparse representation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 2, pp. 210–227, Feb. 2009.
[31] J. Yang, J. Wright, T. S. Huang, and Y. Ma, “Image super-resolution via sparse representation,” IEEE Trans. Image Process., vol. 19, no. 11, pp. 2861–2873, Nov. 2010.
[32] J. Mairal, M. Elad, and G. Sapiro, “Sparse representation for color image restoration,” IEEE Trans. Image Process., vol. 17, no. 1, pp. 53–69, Jan. 2008.
[33] B. A. Olshausen and D. J. Field, “Emergence of simple-cell receptive field properties by learning a sparse code for natural images,” Nature, vol. 381, no. 6583, pp. 607–609, 1996.
[34] B. A. Olshausen and D. J. Field, “Sparse coding with an overcomplete basis set: A strategy employed by V1?” Vis. Res., vol. 37, no. 23, pp. 3311–3325, 1997.
[35] H.-W. Chang, H. Yang, Y. Gan, and M.-H. Wang, “Sparse feature fidelity for perceptual image quality assessment,” IEEE Trans. Image Process., vol. 22, no. 10, pp. 4007–4018, Oct. 2013.
[36] T. Guha, E. Nezhadarya, and R. K. Ward, “Sparse representation-based image quality assessment,” Signal Process., Image Commun., vol. 29, no. 10, pp. 1138–1148, Nov. 2014.
[37] T. Guha, E. Nezhadarya, and R. K. Ward, “Learning sparse models for image quality assessment,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., May 2014, pp. 151–155.
[38] Y. C. Pati, R. Rezaiifar, and P. S. Krishnaprasad, “Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition,” in Proc. 27th Annu. Asilomar Conf. Signals, Syst., Comput., Nov. 1993, pp. 40–44.
[39] L. Li, H. Cai, Y. Zhang, and J. Qian, “Sparse representation based image quality assessment with adaptive sub-dictionary selection,” in Proc. 16th Pacific-Rim Conf. Multimedia, 2015, pp. 53–62.
[40] M. Aharon, M. Elad, and A. Bruckstein, “K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation,” IEEE Trans. Signal Process., vol. 54, no. 11, pp. 4311–4322, Nov. 2006.
[41] H. Lee, A. Battle, R. Raina, and A. Y. Ng, “Efficient sparse coding algorithms,” in Proc. 19th Annu. Conf. Neural Inf. Process. Syst. (NIPS), 2007, pp. 801–808.
[42] J. Mairal, F. Bach, J. Ponce, and G. Sapiro, “Online dictionary learning for sparse coding,” in Proc. 26th Annu. Int. Conf. Mach. Learn., Montreal, QC, Canada, 2009, pp. 689–696.
[43] M. S. Nixon and A. S. Aguado, Feature Extraction & Image Processing for Computer Vision. New York, NY, USA: Academic, 2012.
[44] P. Le Callet and F. Autrusseau. (2005). Subjective Quality Assessment IRCCyN/IVC Database, accessed 2005. [Online]. Available: http://www.irccyn.ecnantes.fr/ivcdb/
[45] H. R. Sheikh, M. F. Sabir, and A. C. Bovik, “A statistical evaluation of recent full reference image quality assessment algorithms,” IEEE Trans. Image Process., vol. 15, no. 11, pp. 3440–3451, Nov. 2006.
[46] N. Ponomarenko et al., “TID2008—A database for evaluation of full-reference visual quality assessment metrics,” Adv. Modern Radioelectron., vol. 10, no. 4, pp. 30–45, 2009.
[47] N. Ponomarenko et al., “Color image database TID2013: Peculiarities and preliminary results,” in Proc. 4th Eur. Workshop Vis. Inf. Process., Jun. 2013, pp. 106–111.
[48] Final Report from the Video Quality Experts Group on the Validation of Objective Models of Video Quality Assessment, Phase II, Aug. 2003. [Online]. Available: http://www.vqeg.org
[49] B. Chen, H. Shu, G. Coatrieux, G. Chen, X. Sun, and J. L. Coatrieux, “Color image analysis by quaternion-type moments,” J. Math. Imag. Vis., vol. 51, no. 1, pp. 124–144, 2015.

Leida Li received the B.S. and Ph.D. degrees from Xidian University, Xi’an, China, in 2004 and 2009, respectively. In 2008, he was a visiting Ph.D. student with the Department of Electronic Engineering, National Kaohsiung University of Applied Sciences, Taiwan. From 2014 to 2015, he was a Visiting Research Fellow with the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore. He is currently a Professor with the School of Information and Electrical Engineering, China University of Mining and Technology, China. His research interests include multimedia quality assessment, information hiding, and image forensics.

Hao Cai received the B.E. degree from the China University of Geosciences, Wuhan, China, in 2013, and the master’s degree from the School of Information and Electrical Engineering, China University of Mining and Technology, China, in 2015. He is currently pursuing the Ph.D. degree with the Department of Computer Science, Memorial University of Newfoundland, Canada. His research interests include image quality assessment and sparse representation.

Yabin Zhang received the B.E. degree in electronic information engineering from the Honors School, Harbin Institute of Technology, in 2013. He is currently pursuing the Ph.D. degree with the School of Computer Engineering, Nanyang Technological University, Singapore. His research interests include video coding, image/video processing, and computer vision.


Weisi Lin (M’92–SM’98–F’16) received the B.S. and M.S. degrees from Zhongshan University, Guangzhou, China, in 1982 and 1985, respectively, and the Ph.D. degree from King’s College, London University, London, U.K., in 1992. He was the Lab Head, Visual Processing, and the Acting Department Manager, Media Processing, with the Institute for Infocomm Research, Singapore. He is currently an Associate Professor with the School of Computer Engineering, Nanyang Technological University, Singapore. He is a Chartered Engineer in the U.K. He has authored over 200 refereed papers in international journals and conferences. His areas of expertise include image processing, perceptual modeling, video compression, multimedia communication, and computer vision.

Dr. Lin is a fellow of the Institution of Engineering and Technology, and an Honorary Fellow of the Singapore Institute of Engineering Technologists. He co-chairs the IEEE MMTC Special Interest Group on Quality of Experience. He was the Lead Technical Program Chair for the Pacific-Rim Conference on Multimedia 2012, and was a Technical Program Chair of the IEEE International Conference on Multimedia and Expo 2013. He serves on the editorial boards of the IEEE TRANSACTIONS ON IMAGE PROCESSING, the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, the IEEE SIGNAL PROCESSING LETTERS and the Journal of Visual Communication and Image Representation. He has been elected as a Distinguished Lecturer of APSIPA (2012/3). He served as the Lead Guest Editor for a special issue on perceptual signal processing of the IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING in 2012.

Alex C. Kot (S’85–M’89–SM’98–F’06) has been with Nanyang Technological University, Singapore, since 1991. He headed the Division of Information Engineering with the School of Electrical and Electronic Engineering for eight years and served as an Associate Chair/Research and the Vice Dean Research of the School of Electrical and Electronic Engineering. He is currently a Professor and an Associate Dean with the College of Engineering and the Director of the Rapid-Rich Object Search (ROSE) Lab. He has published extensively in the areas of signal processing for communication, biometrics, data hiding, image forensics, and information security.

Dr. Kot is a member of the IEEE Fellow Evaluation Committee and a fellow of the Academy of Engineering, Singapore. He received the Best Teacher of the Year Award and is a co-author of several best paper awards, including ICPR, IEEE WIFS, ICEC, and IWDW. He has served the IEEE SP Society in various capacities, such as the General Co-Chair of the 2004 IEEE International Conference on Image Processing, Chair of the worldwide SPS Chapter Chairs, and the Distinguished Lecturer Program. He serves as the Vice-President of the IEEE Signal Processing Society. He also served as a Guest Editor of Special Issues of the IEEE TRANSACTIONS ON CSVT and the EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING, and he is an Editor of the EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING. He is an IEEE SPS Distinguished Lecturer. He served as an Associate Editor of the IEEE TRANSACTIONS ON IMAGE PROCESSING, the IEEE TRANSACTIONS ON SIGNAL PROCESSING, the IEEE TRANSACTIONS ON MULTIMEDIA, the IEEE SIGNAL PROCESSING LETTERS, the IEEE Signal Processing Magazine, the IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, and the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I and II. He is currently an Associate Editor of the IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY.

Xingming Sun received the B.S. degree in mathematics from Hunan Normal University, Hunan, China, in 1984, the M.S. degree in computing science from the Dalian University of Science and Technology, Dalian, China, in 1988, and the Ph.D. degree in computer science from Fudan University, Shanghai, China, in 2001. He is currently a Professor with the School of Computer and Software, Nanjing University of Information Science and Technology, Nanjing, China. His research interests include network and information security, digital watermarking, cloud computing security, and wireless network security.