Image Quality Assessment: Unifying Structure and Texture ...valser.org/webinar/slide/slides/20200527/kede-IQA-0524.pdf · Deep Image Structure and Texture Similarity (DISTS) Goal:

Image Quality Assessment:

Unifying Structure and Texture Similarity

May 27, 2020

Kede Ma

Collaborators

Keyan Ding

PhD student

City University of Hong Kong

Shiqi Wang

Assistant Professor

City University of Hong Kong

Eero P. Simoncelli

Professor

New York University

Outline

➢ Review of Full-Reference Image Quality Assessment (FR-IQA)

➢ Deep Image Structure and Texture Similarity Metric (DISTS)

➢ Model Comparison by “Perceptual Optimization”

Full-Reference IQA Review

• Error visibility methods: MSE (PSNR), VSNR, MAD, PAMSE, NLPD, …

• Structural similarity methods: SSIM, MS-SSIM, IW-SSIM, FSIM, GSIM, GMSD, VSI, …

• Information theoretic methods: IFC, VIF, …

• Learning based methods: DNN-based: WaDIQaM-FR, DeepQA, LPIPS, PieAPP, …

• Fusion based methods: MAE + VGG Loss, …

MSE?

Image Credit: Berardino

MSE?

Image Credit: Wang and Simoncelli

SSIM?

• Not “accurate” enough → MS-SSIM, IW-SSIM, VIF, MAD, FSIM, VSI, NLPD, LPIPS, …

• Not “computationally efficient” enough → PAMSE, GMSD, …

• Not misalignment-aware → Adaptive linear system, CW-SSIM, GTI-IQA, …

• Not color-aware → Adaptive linear system, FSIM_c, LPIPS, PieAPP, …

• Not texture-aware → STSIM, …

Texture Similarity

Existing full-reference IQA models are over-sensitive to texture resampling

× PSNR, SSIM   ✓ LPIPS, DISTS

[Figure panels: Reference, Blurred, Resampled]

Texture Similarity

Existing full-reference IQA models are over-sensitive to texture resampling

× PSNR, SSIM, LPIPS   ✓ DISTS

[Figure panels: High-resolution, EDSR, SRGAN]

A Common Problem of Recent Full-Reference IQA Models

They do not satisfy the uniqueness property (identity of indiscernibles):

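$D(x, y) = 0 \Leftrightarrow x = y$ (× not satisfied)

[Diagram: IQA models grouped by the type of mapping they rely on (surjective, injective, bijective). Models shown: SSIM, MSE; MS-SSIM, NLPD, DISTS; FSIM, VSI, GMSD; VIF, CW-SSIM, MAD; DeepIQA, PieAPP]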

Uniqueness is very important for “perceptual optimization”!
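To make the uniqueness property concrete, here is a minimal sketch on 1-D signals. The `gradient_distance` function is a hypothetical toy that compares gradients only; it is not any of the models named on this slide. It reports zero distance for two different signals, while MSE does not.

```python
def mse(x, y):
    """Mean squared error between two equal-length signals."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def gradient(x):
    """Forward differences, a crude 1-D gradient operator."""
    return [b - a for a, b in zip(x, x[1:])]


def gradient_distance(x, y):
    """Toy distance that compares gradients only (hypothetical, for illustration)."""
    return mse(gradient(x), gradient(y))


reference = [0.2, 0.4, 0.9, 0.5]
shifted = [v + 0.3 for v in reference]  # same structure, brighter overall

d_mse = mse(reference, shifted)                 # positive: MSE sees the shift
d_grad = gradient_distance(reference, shifted)  # about zero: the shift is invisible
```

Because `d_grad` is (numerically) zero although `reference != shifted`, an optimizer driven by such a distance has no reason to correct the brightness shift.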

Reference Image Recovery

[Figure: recovered images. Panels: Initialization, SSIM, FSIM, VIF, GMSD, Reference, NLPD, PieAPP, LPIPS, DISTS]
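The recovery experiment can be read as solving y* = argmin_y D(x, y) from a given initialization. A minimal sketch, assuming plain gradient descent and using MSE as the distance because its gradient is simple to write by hand (the step size and iteration count are arbitrary choices, and the four-sample "image" is made up):

```python
def mse(x, y):
    """Mean squared error between two equal-length signals."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def mse_grad(x, y):
    """Gradient of mse(x, y) with respect to y."""
    n = len(x)
    return [2.0 * (b - a) / n for a, b in zip(x, y)]


def recover(reference, init, grad_fn, step=0.5, iters=200):
    """Gradient descent on y, starting from `init`, to minimize D(reference, y)."""
    y = list(init)
    for _ in range(iters):
        g = grad_fn(reference, y)
        y = [v - step * gi for v, gi in zip(y, g)]
    return y


reference = [0.1, 0.8, 0.3, 0.6]
init = [0.5, 0.5, 0.5, 0.5]
recovered = recover(reference, init, mse_grad)
```

With MSE the minimizer is unique, so `recovered` converges to `reference`. A distance that violates uniqueness can stop at a different signal with zero distance.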

Deep Image Structure and Texture Similarity (DISTS)

Goal:

Develop a full-reference IQA metric that is

1) sensitive to structural distortions (e.g., artifacts due to noise, blur, or compression)

2) tolerant to texture resampling (exchanging a texture region with a new sample)

Two steps:

1. Transform an image to a perceptual representation

2. Measure the distance on the representation

DISTS — Representation

• Use pretrained VGG features: $\tilde{x} = f(x)$

• Satisfy the injective property (distinct inputs should map to distinct outputs)

• Replace max pooling with L2 pooling (translation-invariant)

[Diagram: input $x$ passed through VGG stages Conv_1 to Conv_5, with a Hanning window used in the pooling]
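A minimal 1-D sketch of L2 pooling, assuming the common form sqrt(g * x²) where g is a normalized Hanning window applied with a stride. The kernel size, stride, and zero padding here are illustrative assumptions, not settings confirmed by the slides.

```python
import math


def hann_kernel(size):
    """Normalized Hann (Hanning) window, excluding the zero endpoints."""
    w = [0.5 - 0.5 * math.cos(2.0 * math.pi * (k + 1) / (size + 1))
         for k in range(size)]
    total = sum(w)
    return [v / total for v in w]


def l2_pool_1d(x, size=3, stride=2):
    """Square the signal, blur it with the window, subsample, take the root."""
    g = hann_kernel(size)
    pad = size // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad  # zero padding (assumption)
    out = []
    for start in range(0, len(x), stride):
        window = padded[start:start + size]
        out.append(math.sqrt(sum(gk * v * v for gk, v in zip(g, window))))
    return out


pooled = l2_pool_1d([1.0, 1.0, 1.0, 1.0, 1.0, 1.0])
```

Unlike max pooling, every sample in the window contributes smoothly to the output, which is what makes the downsampling less sensitive to small shifts.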

DISTS — Quality Measurements

1. Design texture similarity using global means

We synthesize textures by solving

(a) Statistics of wavelet subbands (Portilla & Simoncelli): 710 parameters

(b) Gram matrices of VGG features (Gatys et al.): ~306K parameters

(c) Global means of VGG features (ours): 1,475 parameters, the global mean of each feature map

[Figure: reference texture and textures synthesized with (a), (b), and (c)]
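A minimal sketch of a texture comparison built on global means, assuming the SSIM-style ratio (2·μx·μy + c1) / (μx² + μy² + c1). The constant `c1` and the tiny 2×2 "feature maps" are illustrative assumptions; real inputs would be VGG feature maps.

```python
def global_mean(feature_map):
    """Spatial average of one feature map given as a list of rows."""
    values = [v for row in feature_map for v in row]
    return sum(values) / len(values)


def texture_similarity(fx, fy, c1=1e-6):
    """SSIM-style comparison of two global means; equals 1 when they match."""
    mx, my = global_mean(fx), global_mean(fy)
    return (2.0 * mx * my + c1) / (mx * mx + my * my + c1)


patch = [[0.1, 0.9], [0.9, 0.1]]
resampled = [[0.9, 0.1], [0.1, 0.9]]     # same values, different arrangement
darker = [[0.05, 0.45], [0.45, 0.05]]    # different global mean

same_texture = texture_similarity(patch, resampled)
changed_mean = texture_similarity(patch, darker)
```

Rearranging the values leaves the global mean untouched, so `same_texture` stays at 1, which is the kind of tolerance to resampling the slide describes. Changing the mean lowers the score.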

DISTS — Quality Measurements

2. Design structure similarity using global covariance (inspired by SSIM)

Use normalized “global mean”:

3. Combine texture and structure terms:

Positive learnable weights (1475*2)
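The equations on this slide were lost in extraction. For reference, the texture term, structure term, and their combination can be written as below. This is a restatement of the published DISTS formulation as I understand it, so treat the exact notation (index ranges, the constants $c_1$ and $c_2$, the normalization of the weights) as an assumption rather than a transcription of the slide.

```latex
l(\tilde{x}_j^{(i)}, \tilde{y}_j^{(i)}) =
  \frac{2\,\mu_{\tilde{x}_j}^{(i)}\,\mu_{\tilde{y}_j}^{(i)} + c_1}
       {\bigl(\mu_{\tilde{x}_j}^{(i)}\bigr)^2 + \bigl(\mu_{\tilde{y}_j}^{(i)}\bigr)^2 + c_1}

s(\tilde{x}_j^{(i)}, \tilde{y}_j^{(i)}) =
  \frac{2\,\sigma_{\tilde{x}_j \tilde{y}_j}^{(i)} + c_2}
       {\bigl(\sigma_{\tilde{x}_j}^{(i)}\bigr)^2 + \bigl(\sigma_{\tilde{y}_j}^{(i)}\bigr)^2 + c_2}

D(x, y; \alpha, \beta) = 1 - \sum_{i}\sum_{j}
  \Bigl(\alpha_{ij}\, l(\tilde{x}_j^{(i)}, \tilde{y}_j^{(i)})
      + \beta_{ij}\, s(\tilde{x}_j^{(i)}, \tilde{y}_j^{(i)})\Bigr),
\qquad \sum_{i}\sum_{j} (\alpha_{ij} + \beta_{ij}) = 1
```

Here $i$ indexes the network stage and $j$ the feature map within that stage; $\mu$, $\sigma^2$, and $\sigma_{\tilde{x}\tilde{y}}$ are the global mean, variance, and covariance of the feature maps.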

DISTS — Transferring to a Metric

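[Diagram: each pair of feature maps $\tilde{x}_j^{(i)}$, $\tilde{y}_j^{(i)}$ from $x$ and $y$ goes through a texture comparison $l$ and a structure comparison $s$; the results are weighted by $\alpha_{ij}$ and $\beta_{ij}$ and combined into $D(x, y)$]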

Code is available at https://github.com/dingkeyan93/DISTS
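A minimal sketch of how texture and structure comparisons can be combined into a single distance, assuming a weighted sum of SSIM-style terms subtracted from one. It runs on hand-made flattened "feature maps" rather than VGG features, and the weights and constants are made up; see the repository above for the authors' implementation.

```python
def stats(fx, fy):
    """Global means, variances, and covariance of two flattened feature maps."""
    n = len(fx)
    mx, my = sum(fx) / n, sum(fy) / n
    vx = sum((a - mx) ** 2 for a in fx) / n
    vy = sum((b - my) ** 2 for b in fy) / n
    cxy = sum((a - mx) * (b - my) for a, b in zip(fx, fy)) / n
    return mx, my, vx, vy, cxy


def dists_like_distance(feats_x, feats_y, alpha, beta, c1=1e-6, c2=1e-6):
    """1 minus the weighted sum of texture (l) and structure (s) similarities."""
    total = 0.0
    for fx, fy, a, b in zip(feats_x, feats_y, alpha, beta):
        mx, my, vx, vy, cxy = stats(fx, fy)
        l = (2 * mx * my + c1) / (mx ** 2 + my ** 2 + c1)  # texture term
        s = (2 * cxy + c2) / (vx + vy + c2)                # structure term
        total += a * l + b * s
    return 1.0 - total


# Two toy feature maps per image; weights are positive and sum to one.
feats_x = [[0.1, 0.9, 0.9, 0.1], [0.3, 0.3, 0.7, 0.7]]
feats_flat = [[0.5, 0.5, 0.5, 0.5], [0.5, 0.5, 0.5, 0.5]]  # structure removed
alpha = [0.25, 0.25]
beta = [0.25, 0.25]

d_same = dists_like_distance(feats_x, feats_x, alpha, beta)
d_flat = dists_like_distance(feats_x, feats_flat, alpha, beta)
```

In this toy case the flattened maps keep the global means, so the texture terms stay at 1 while the structure terms collapse, and the distance rises to about 0.5.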

DISTS — Training

The weights $\{\alpha_{ij}, \beta_{ij}\}$ are jointly optimized for human perception of image quality (KADID-10k dataset)

and texture invariance (two patches $(z_1, z_2)$ sampled from the same texture image)

The final objective:
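The objective itself did not survive extraction. A minimal sketch of one plausible form, assuming an absolute-error term against a normalized human score `q` plus a weighted texture-invariance term. The function name, the stand-in distance, and the value of `lam` are all hypothetical.

```python
def training_loss(distance, x, y, q, z1, z2, lam=1.0):
    """Quality-prediction error plus lam times the distance between two
    patches of the same texture (which should ideally be zero)."""
    quality_term = abs(distance(x, y) - q)
    texture_term = distance(z1, z2)
    return quality_term + lam * texture_term


def mean_gap(a, b):
    """Stand-in distance for the demo: gap between global means."""
    return abs(sum(a) / len(a) - sum(b) / len(b))


loss = training_loss(
    mean_gap,
    x=[0.2, 0.8], y=[0.1, 0.5], q=0.25,   # reference, distorted, human score
    z1=[0.1, 0.9], z2=[0.9, 0.3],         # two patches of one texture
    lam=0.5,
)
```

The first term pulls the model toward human ratings; the second penalizes any distance the model assigns to two samples of the same texture.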


DISTS — Connections to Existing IQA Measures

• SSIM and its variants

MS-SSIM, CW-SSIM

• The adaptive linear system framework (Wang and Simoncelli, 2005)

Separating structural and non-structural distortions

• Content and style losses

MSE on VGG features, Gram matrix

• Image restoration losses

Weighted sum of L1/L2 distances computed on the raw

pixels and several stages of VGG feature maps

DISTS — Performance on Quality Prediction

• Three standard IQA databases

DISTS — Performance on Quality Prediction

• Image generation/restoration quality databases

DISTS — Performance on Texture Similarity

• Two texture quality databases

DISTS — Texture Classification and Retrieval

• Brodatz texture dataset

DISTS — Invariance to Geometric Transformations

• A visual example

[Figure: a reference image and six altered versions (translation, 5%; dilation, 1.05; cloud movement; blur; JPEG; JP2K), each scored by DISTS, PSNR, SSIM, and FSIM]

DISTS — Summary

• A new full-reference IQA method, which is the first of its kind with built-in invariance to texture resampling

• DISTS unifies structure and texture similarity, is robust to mild geometric distortions, and performs well in texture-relevant tasks

• DISTS can be employed as an objective function in various optimization problems

A Perceptual Optimization Tour of Full-Reference IQA Models

IQA Model Comparison

1. Compute correlation with human judgments (PLCC, SRCC)

1) Huge budget to build a large-scale database

2) With potential risk of overfitting

2. MAximum Differentiation competition (MAD) methodology

1) MAD (Wang and Simoncelli, 2008) synthesizes counter-examples to falsify a model (the generated images may be highly unnatural)

2) gMAD (Ma et al., 2016) searches counter-examples from a large unlabeled image set

3. Compare the IQA-based optimization results
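For the first approach, a minimal sketch of the two correlation measures in plain Python. The scores are made up for illustration, and ties are handled with average ranks.

```python
import math


def pearson(a, b):
    """Pearson linear correlation coefficient (PLCC)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(a, b):
    """Spearman rank correlation coefficient (SRCC)."""
    return pearson(average_ranks(a), average_ranks(b))


human_scores = [1.0, 2.0, 3.0, 4.0, 5.0]      # hypothetical subjective ratings
model_scores = [1.0, 4.0, 9.0, 16.0, 25.0]    # monotonic but non-linear

plcc = pearson(human_scores, model_scores)
srcc = spearman(human_scores, model_scores)
```

SRCC only looks at ordering, so it is 1 here, while PLCC is below 1 because the relationship is not linear.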

“Analysis by Synthesis”

“Perceptual Optimization”

• Diagram of IQA-based Optimization:

[Diagram: Input → Image processing system (denoising, compression, …) → Output. The output and the Reference go to IQA model evaluation (MSE, SSIM, …), whose result is fed back to the system]

A highly promising but relatively under-studied application of objective IQA measures
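A minimal sketch of this feedback loop on a 1-D toy problem. The "processing system" is a hypothetical one-parameter denoiser, the IQA model is MSE, and the feedback is a finite-difference gradient step. The experiments described in these slides train neural networks instead, so every name and setting below is an illustrative assumption.

```python
def smooth(x):
    """3-tap moving average with edge replication."""
    padded = [x[0]] + list(x) + [x[-1]]
    return [(padded[i] + padded[i + 1] + padded[i + 2]) / 3.0
            for i in range(len(x))]


def process(noisy, t):
    """Toy processing system: blend the input with its smoothed version."""
    s = smooth(noisy)
    return [(1 - t) * a + t * b for a, b in zip(noisy, s)]


def mse(x, y):
    """IQA model used for feedback (lower is better)."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def optimize(noisy, reference, iqa=mse, t=0.0, step=20.0, iters=100, eps=1e-4):
    """Tune the system parameter t by descending the IQA score."""
    for _ in range(iters):
        up = iqa(reference, process(noisy, t + eps))
        down = iqa(reference, process(noisy, t - eps))
        t -= step * (up - down) / (2 * eps)  # feedback from the IQA model
    return t


reference = [0.5] * 8
noisy = [0.6, 0.4, 0.6, 0.4, 0.6, 0.4, 0.6, 0.4]
t_opt = optimize(noisy, reference)
output = process(noisy, t_opt)
```

Swapping `iqa` for a different model changes what the system learns to produce, which is the basis of the comparison that follows.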

Optimization Objective

• Select 11 representative IQA models:

MAE, MS-SSIM, VIF, CW-SSIM,

MAD, FSIM, GMSD, VSI, NLPD,

LPIPS, DISTS

• Four low-level vision tasks:

– Image denoising

– Blind image deblurring

– Single image super-resolution

– Lossy image compression

Code is available at https://github.com/dingkeyan93/IQA-optimization

Optimization Network

• Denoising and Deblurring:

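[Diagram: network from Input to Output built from Conv layers and a chain of ResBlocks, with an additive skip connection (+). Each ResBlock consists of Conv, ReLU, Conv, with an additive skip connection (+)]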
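A minimal 1-D sketch of a residual block of the Conv, ReLU, Conv, add form shown in the diagram. The kernels are arbitrary placeholders and the zero padding is an assumption; the actual networks operate on 2-D images with learned filters.

```python
def conv1d(x, kernel):
    """Same-size 1-D convolution (cross-correlation) with zero padding."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(k * padded[i + j] for j, k in enumerate(kernel))
            for i in range(len(x))]


def relu(x):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in x]


def res_block(x, k1, k2):
    """Conv -> ReLU -> Conv, then add the input back (skip connection)."""
    y = conv1d(relu(conv1d(x, k1)), k2)
    return [a + b for a, b in zip(x, y)]


x = [1.0, -2.0, 3.0]
identity = [0.0, 1.0, 0.0]
zeros = [0.0, 0.0, 0.0]

passthrough = res_block(x, identity, zeros)    # residual branch contributes 0
rectified = res_block(x, identity, identity)   # x + relu(x)
```

With a zero residual branch the block returns its input unchanged, which is why stacking many such blocks remains easy to optimize.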

Optimization Network

• Super-resolution:

[Diagram: Input → Conv layers and a chain of ResBlocks with an additive skip connection (+), followed by two Upsample + Conv stages → Output]

• Compression:

[Diagram: Input → Analysis Transform (Conv and ResBlock stages with downsampling, repeated n times) → quantizer Q → Synthesis Transform (ResBlock and Conv stages with upsampling, repeated n times) → Output]

Optimization Performance

• Subjective Testing

Two-alternative forced choice (2AFC) method

The Bradley-Terry model is employed to convert paired comparison results to global rankings

The paired t-test is conducted to investigate whether the differences between the optimization results of the IQA models are statistically significant

[Figure: test images (from the validation set of DIV2K)]
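A minimal sketch of turning paired-comparison counts into global scores with the Bradley-Terry model, assuming the standard minorization-maximization update. The win counts are made up, the log scores are centered for convenience, and the sketch assumes every method wins at least once.

```python
import math


def bradley_terry(wins, iters=200):
    """wins[i][j] = number of times method i was preferred over method j.
    Returns zero-mean log-strengths (higher means preferred more often)."""
    n = len(wins)
    p = [1.0] * n
    for _ in range(iters):
        new_p = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            denom = sum((wins[i][j] + wins[j][i]) / (p[i] + p[j])
                        for j in range(n) if j != i)
            new_p.append(total_wins / denom)
        scale = sum(new_p)
        p = [v * n / scale for v in new_p]  # fix the arbitrary overall scale
    logs = [math.log(v) for v in p]
    mean_log = sum(logs) / n
    return [v - mean_log for v in logs]


# Hypothetical 2AFC results for three methods, 10 comparisons per pair.
wins = [
    [0, 8, 9],
    [2, 0, 7],
    [1, 3, 0],
]
scores = bradley_terry(wins)
```

Sorting the methods by `scores` gives a global ranking from the pairwise votes.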

Optimization Performance

• Performance ranking and grouping (ordered best to worst, score in parentheses):

Denoising: MS-SSIM (0.70), MAE (0.65), MAD (0.45), LPIPS (0.45), DISTS (0.39), NLPD (0.37), CW-SSIM (0.36), VSI (-0.44), VIF (-0.51), FSIM (-0.58), GMSD (-2.04)

Deblurring: DISTS (3.23), LPIPS (3.10), MAD (0.48), MS-SSIM (0.32), MAE (0.20), CW-SSIM (0.16), VIF (-0.79), NLPD (-0.94), FSIM (-1.54), VSI (-1.73), GMSD (-2.75)

Super-res: DISTS (2.50), LPIPS (1.88), MS-SSIM (1.20), MAE (1.02), NLPD (0.65), MAD (0.53), FSIM (-0.70), VIF (-1.37), VSI (-1.81), GMSD (-1.85), CW-SSIM (-2.04)

Compression: DISTS (2.61), LPIPS (2.35), MS-SSIM (1.58), MAE (1.53), MAD (0.68), NLPD (0.29), FSIM (-0.37), VIF (-1.64), VSI (-2.00), GMSD (-2.06), CW-SSIM (-4.26)

Visual Example — Denoising

Visual Example — Deblurring

Visual Example — Super-Resolution

Visual Example — Compression

Artifacts Analysis

• Blurring

MAE, MS-SSIM and NLPD, relying on simple injective mappings, prefer to make a more conservative estimate, producing something akin to a superposition of all possible outcomes

[Figure (super-resolution): GT, MAE, MS-SSIM, NLPD]
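A small worked example of why a conservative estimate looks blurred. It uses squared error as the simplest stand-in (the slide's claim covers MAE, MS-SSIM, and NLPD, which this toy does not model) and assumes two equally likely sharp outcomes that differ only in where an edge falls.

```python
def mse(x, y):
    """Mean squared error between two equal-length signals."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def expected_mse(estimate, outcomes):
    """Average error of one estimate over all equally likely outcomes."""
    return sum(mse(o, estimate) for o in outcomes) / len(outcomes)


# Two plausible sharp signals: the edge sits at index 2 or at index 4.
outcomes = [
    [0.0, 0.0, 1.0, 1.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 0.0, 1.0, 1.0],
]

# The per-sample mean minimizes expected squared error: a soft ramp, not an edge.
mean_estimate = [sum(col) / len(col) for col in zip(*outcomes)]

risk_of_mean = expected_mse(mean_estimate, outcomes)
risk_of_sharp = expected_mse(outcomes[0], outcomes)
```

Committing to either sharp outcome costs more on average than the blurred superposition, so an optimizer driven by such a loss settles on the blur.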

Artifacts Analysis

• Ringing

FSIM, VSI and GMSD rely heavily on local gradient magnitude for feature similarity comparison. This leads to numerous “fake edge” lines that are imperceptible to the gradient operator

[Figure (deblurring): GT, FSIM, VSI, GMSD]

Artifacts Analysis

• Over-Enhancement

VIF (and IFC) does not fully respect reference information when normalizing the covariance, and can take a value larger than unity (indicating an enhancement of visual quality). But this “improvement” often goes too far

[Figure (super-resolution): GT, VIF]

Artifacts Analysis

• Luminance and color

GMSD, NLPD, and similar models discard luminance information, leaving a huge “null space” to accommodate luminance distortions

[Figure (compression): GT, NLPD, GMSD]

Conclusions

Some findings:

1. Optimization comparison provides an alternative means of testing the perceptual relevance of IQA models in a more realistic setting

2. Through perceptual optimization, a number of novel distortions are generated, which can easily fool many competing models

3. MAE / MSE, SSIM / MS-SSIM will continue to play a central role in optimizing image processing systems

4. Recent IQA models with surjective mappings (e.g., FSIM, VSI, GMSD, etc.) may still be used to monitor image quality in a limited space, but are not suitable for optimization

5. Two DNN-based models, LPIPS and DISTS, seem to stand out in our experiments, but their high computational cost and lack of interpretability may hinder their application

Thanks!
