Image Quality Assessment:
Unifying Structure and Texture Similarity
May 27, 2020
Kede Ma
Collaborators
Keyan Ding
PhD student
City University of Hong Kong
Shiqi Wang
Assistant Professor
City University of Hong Kong
Eero P. Simoncelli
Professor
New York University
Outline
➢ Review of Full-Reference Image Quality Assessment (FR-IQA)
➢ Deep Image Structure and Texture Similarity Metric (DISTS)
➢ Model Comparison by “Perceptual Optimization”
Full-Reference IQA Review
• Error visibility methods: MSE (PSNR), VSNR, MAD, PAMSE, NLPD, …
• Structural similarity methods: SSIM, MS-SSIM, IW-SSIM, FSIM, GSIM, GMSD, VSI, …
• Information theoretic methods: IFC, VIF, …
• Learning based methods (DNN-based): WaDIQaM-FR, DeepQA, LPIPS, PieAPP, …
• Fusion based methods: MAE + VGG Loss, …
MSE?
Image Credit: Berardino
MSE?
Image Credit: Wang and Simoncelli
SSIM?
• Not “accurate” enough (addressed by MS-SSIM, IW-SSIM, VIF, MAD, FSIM, VSI, NLPD, LPIPS, …)
• Not “computationally efficient” enough (addressed by PAMSE, GMSD, …)
• Not misalignment-aware (addressed by Adaptive linear system, CW-SSIM, GTI-IQA, …)
• Not color-aware (addressed by Adaptive linear system, FSIM_c, LPIPS, PieAPP, …)
• Not texture-aware (addressed by STSIM, …)
Texture Similarity
Existing full-reference IQA models are over-sensitive to texture resampling
[Figure panels: Reference, Blurred, Resampled]
× PSNR, SSIM    ✓ LPIPS, DISTS
Texture Similarity
[Figure panels: High-resolution, EDSR, SRGAN]
× PSNR, SSIM, LPIPS    ✓ DISTS
Existing full-reference IQA models are over-sensitive to texture resampling
A Common Problem of Recent Full-Reference IQA Models
They do not satisfy the uniqueness property (identity of indiscernibles):
$D(x, y) = 0 \Leftrightarrow x = y$ (×, not satisfied)
[Diagram: IQA models grouped by the type of mapping they rely on]
Bijective: SSIM, MSE
Injective: MS-SSIM, NLPD, DISTS
Surjective: FSIM, VSI, GMSD, VIF, CW-SSIM, MAD, DeepIQA, PieAPP
Uniqueness is very important for “perceptual optimization”!
Reference Image Recovery
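[Figure: images recovered from the initialization by optimizing each IQA model toward the reference. Panels: Initialization, SSIM, FSIM, VIF, GMSD, Reference, NLPD, PieAPP, LPIPS, DISTS]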
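The recovery experiment can be illustrated with a minimal sketch: start from an initialization and run gradient descent on D(x, reference). The two toy distances below (MSE and a difference of global means) are stand-ins chosen for illustration and are not the models on the slide; the step sizes and iteration count are likewise assumptions.

```python
import numpy as np

def mse(x, y):
    """Mean squared error and its gradient with respect to x."""
    diff = x - y
    return np.mean(diff ** 2), 2.0 * diff / diff.size

def mean_distance(x, y):
    """Squared difference of global means (not injective) and its gradient."""
    d = x.mean() - y.mean()
    return d ** 2, np.full_like(x, 2.0 * d / x.size)

def recover(distance, reference, init, step, iters=500):
    """Gradient descent on x, starting from init, to minimize D(x, reference)."""
    x = init.copy()
    for _ in range(iters):
        _, grad = distance(x, reference)
        x -= step * grad
    return x

rng = np.random.default_rng(0)
reference = rng.random((16, 16))
init = rng.random((16, 16))

# The step is scaled by the pixel count so that each iteration halves the error.
x_mse = recover(mse, reference, init, step=0.25 * reference.size)
x_mean = recover(mean_distance, reference, init, step=0.25 * reference.size)

print("MSE recovery error:      ", np.abs(x_mse - reference).max())
print("mean-only recovery error:", np.abs(x_mean - reference).max())
```

The MSE run returns to the reference, while the mean-only distance reaches zero at an image that is still far from the reference, which is the failure of uniqueness in miniature.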
Deep Image Structure and Texture Similarity (DISTS)
Goal:
Develop a full-reference IQA metric that is
1) sensitive to structural distortions (e.g., artifacts due to noise, blur, or compression)
2) tolerant to texture resampling (exchanging a texture region with a new sample)
Two steps:
1. Transform an image to a perceptual representation
2. Measure the distance on the representation
DISTS — Representation
• Use pretrained VGG features: $\tilde{x} = f(x)$, taken from five convolution stages (Conv_1 to Conv_5)
• Satisfy the injective property (distinct inputs should map to distinct outputs)
• Replace max pooling with L2 pooling using a Hanning window (translation-invariant)
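As a rough illustration of L2 pooling, the sketch below squares a feature map, blurs it with a normalized Hanning window, subsamples, and takes the square root. The kernel size (5), the stride (2), zero padding, and the helper names `hanning_kernel` and `l2_pool` are assumptions made for this example, not settings taken from the slides.

```python
import numpy as np

def hanning_kernel(size=5):
    """Normalized 2-D Hanning window built from the interior samples of np.hanning."""
    w = np.hanning(size + 2)[1:-1]        # drop the two zero endpoints
    g = np.outer(w, w)
    return g / g.sum()

def l2_pool(x, size=5, stride=2):
    """L2 pooling of a 2-D feature map: sqrt of the blurred squared map, subsampled."""
    g = hanning_kernel(size)
    pad = size // 2
    sq = np.pad(x ** 2, pad, mode="constant")   # zero padding at the borders
    h, w = x.shape
    out = np.empty(((h - 1) // stride + 1, (w - 1) // stride + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            r, c = i * stride, j * stride
            out[i, j] = np.sqrt(np.sum(g * sq[r:r + size, c:c + size]))
    return out

feature_map = 3.0 * np.ones((8, 8))
pooled = l2_pool(feature_map)
print(pooled.shape)
```

Away from the borders, a constant map keeps its magnitude after pooling, and the output depends only on the squared responses, so flipping the sign of the input leaves it unchanged.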
DISTS — Quality Measurements
1. Design texture similarity using global means (the global mean of each feature map)
We synthesize textures by solving an optimization problem that matches a set of statistics of the reference:
(a) Statistics of wavelet subbands (Portilla & Simoncelli): 710 parameters
(b) Gram matrices of VGG features (Gatys et al.): ~306K parameters
(c) Global means of VGG features (ours): 1,475 parameters
[Figure: the reference texture and the textures synthesized with (a), (b), and (c)]
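Why global means tolerate resampling can be seen in a small sketch. A spatial shuffle is used here as a crude stand-in for texture resampling, and the random array `features` is a hypothetical placeholder for VGG feature maps; neither comes from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.random((8, 32, 32))      # hypothetical stand-in: 8 feature maps

# Rearrange the spatial positions of every feature map with one permutation.
perm = rng.permutation(32 * 32)
shuffled = features.reshape(8, -1)[:, perm].reshape(8, 32, 32)

global_means = features.mean(axis=(1, 2))     # one number per feature map
shuffled_means = shuffled.mean(axis=(1, 2))

pointwise_mse = np.mean((features - shuffled) ** 2)
print("max change in global means:", np.abs(global_means - shuffled_means).max())
print("point-by-point MSE:        ", pointwise_mse)
```

The global means are unchanged by the rearrangement, whereas a point-by-point comparison reports a large error.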
DISTS — Quality Measurements
2. Design structure similarity using global covariance (inspired by SSIM)
Use normalized “global mean”:
3. Combine texture and structure terms:
Positive learnable weights (1475*2)
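The formulas for these three steps did not survive on the slide. The block below is a reconstruction based on the published DISTS formulation as best recalled, so it should be checked against the paper; $c_1$ and $c_2$ are small stabilizing constants, and $\mu$, $\sigma^2$, and $\sigma_{\tilde{x}\tilde{y}}$ denote the global mean, variance, and covariance of feature map $j$ at stage $i$.

```latex
l\left(\tilde{x}_j^{(i)}, \tilde{y}_j^{(i)}\right)
  = \frac{2\,\mu_{\tilde{x}_j}^{(i)}\,\mu_{\tilde{y}_j}^{(i)} + c_1}
         {\left(\mu_{\tilde{x}_j}^{(i)}\right)^2 + \left(\mu_{\tilde{y}_j}^{(i)}\right)^2 + c_1},
\qquad
s\left(\tilde{x}_j^{(i)}, \tilde{y}_j^{(i)}\right)
  = \frac{2\,\sigma_{\tilde{x}_j \tilde{y}_j}^{(i)} + c_2}
         {\left(\sigma_{\tilde{x}_j}^{(i)}\right)^2 + \left(\sigma_{\tilde{y}_j}^{(i)}\right)^2 + c_2}

D(x, y; \alpha, \beta)
  = 1 - \sum_{i} \sum_{j}
    \left( \alpha_{ij}\, l\left(\tilde{x}_j^{(i)}, \tilde{y}_j^{(i)}\right)
         + \beta_{ij}\, s\left(\tilde{x}_j^{(i)}, \tilde{y}_j^{(i)}\right) \right),
\qquad
\sum_{i} \sum_{j} \left( \alpha_{ij} + \beta_{ij} \right) = 1
```

With positive weights that sum to one, $D(x, x) = 0$ because both $l$ and $s$ equal one when the two inputs coincide.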
DISTS — Transferring to a Metric
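[Diagram: x and y are mapped to feature maps $\tilde{x}_j^{(i)}$ and $\tilde{y}_j^{(i)}$; for each pair, a texture comparison l and a structure comparison s are computed, weighted by $\alpha_{ij}$ and $\beta_{ij}$, and combined into D(x, y)]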
Code is available at https://github.com/dingkeyan93/DISTS
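For readers who want to see the combination step in code, here is a minimal single-stage sketch. It assumes SSIM-style ratios for the texture term l (global means) and the structure term s (global covariance), uniform weights, and random arrays in place of VGG features; the function name `dists_like` and the constants `c1`, `c2` are hypothetical, and the repository above remains the reference implementation.

```python
import numpy as np

def dists_like(feats_x, feats_y, alpha, beta, c1=1e-6, c2=1e-6):
    """Weighted texture/structure distance for one stage of feature maps.

    feats_x, feats_y: arrays of shape (channels, H, W)
    alpha, beta:      positive weights of shape (channels,), summing to 1 in total
    """
    fx = feats_x.reshape(feats_x.shape[0], -1)
    fy = feats_y.reshape(feats_y.shape[0], -1)
    mu_x, mu_y = fx.mean(axis=1), fy.mean(axis=1)
    var_x, var_y = fx.var(axis=1), fy.var(axis=1)
    cov_xy = ((fx - mu_x[:, None]) * (fy - mu_y[:, None])).mean(axis=1)

    l = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)   # texture term
    s = (2 * cov_xy + c2) / (var_x + var_y + c2)                # structure term
    return 1.0 - np.sum(alpha * l + beta * s)

rng = np.random.default_rng(0)
x = rng.random((4, 16, 16))                      # stand-in feature maps
y = x + 0.1 * rng.standard_normal(x.shape)       # perturbed version
alpha = beta = np.full(4, 1.0 / 8.0)             # uniform weights, total sum 1

print("D(x, x) =", dists_like(x, x, alpha, beta))
print("D(x, y) =", dists_like(x, y, alpha, beta))
```

In the full model the sum would also run over all stages of the representation, with learned rather than uniform weights.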
DISTS — Training
The weights $\alpha_{ij}$ and $\beta_{ij}$ are jointly optimized for human perception of image quality (KADID-10k dataset)
and texture invariance (two patches (z1, z2) sampled from the same texture image)
The final objective:
DISTS — Connections to Existing IQA Measures
• SSIM and its variants
MS-SSIM, CW-SSIM
• The adaptive linear system framework (Wang and Simoncelli, 2005)
Separating structural and non-structural distortions
• Content and style losses
MSE on VGG features, Gram matrix
• Image restoration losses
Weighted sum of L1/L2 distances computed on the raw
pixels and several stages of VGG feature maps
DISTS — Performance on Quality Prediction
• Three standard IQA databases
DISTS — Performance on Quality Prediction
• Image generation/restoration quality databases
DISTS — Performance on Texture Similarity
• Two texture quality databases
DISTS — Texture Classification and Retrieval
• Brodatz texture dataset
DISTS — Invariance to Geometric Transformations
• A visual example
[Figure: a reference image and its distorted versions (translation, 5%; dilation, 1.05; cloud movement; blur; JPEG; JP2K), scored by DISTS, PSNR, SSIM, and FSIM]
DISTS — Summary
• A new full-reference IQA method, which is the first of its kind with
built-in invariance to texture resampling
• DISTS unifies structure and texture similarity, is robust to mild geometric
distortions, and performs well in texture relevant tasks
• DISTS can be employed as an objective function in various optimization
problems
A Perceptual Optimization Tour of Full-Reference IQA Models
IQA Model Comparison
1. Compute correlation with human judgments (PLCC, SRCC)
1) Huge budget to build a large-scale database
2) With potential risk of overfitting
2. MAximum Differentiation competition (MAD) methodology
1) MAD (Wang and Simoncelli, 2008) synthesizes counter-examples to falsify a model
(the generated images may be highly unnatural)
2) gMAD (Ma et al., 2016) searches counter-examples from a large unlabeled image set
3. Compare the IQA-based optimization results
“Analysis by Synthesis”
“Perceptual Optimization”
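The correlation criteria in item 1 can be computed as in the sketch below: PLCC measures linear agreement and SRCC measures agreement in rank order. The scores are made-up numbers for illustration, and the rank helper assumes there are no tied values.

```python
import numpy as np

def plcc(pred, mos):
    """Pearson linear correlation coefficient."""
    return float(np.corrcoef(pred, mos)[0, 1])

def ranks(v):
    """Ranks 0..n-1 of the entries of v; assumes no tied values."""
    r = np.empty(len(v))
    r[np.argsort(v)] = np.arange(len(v))
    return r

def srcc(pred, mos):
    """Spearman rank correlation coefficient: PLCC of the ranks."""
    return plcc(ranks(pred), ranks(mos))

mos = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical human scores
pred = mos ** 3                              # monotonic but nonlinear predictions

print("PLCC:", plcc(pred, mos))
print("SRCC:", srcc(pred, mos))
```

A model whose predictions are monotonic in the human scores gets a perfect SRCC even when the relation is nonlinear, which is why the two numbers are usually reported together.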
• Diagram of IQA-based Optimization:
[Diagram: Input → image processing system (denoising, compression, …) → Output; an IQA model (MSE, SSIM, …) evaluates the Output against the Reference and sends Feedback to the system]
A highly promising but relatively under-studied application of objective IQA measures
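The feedback loop can be sketched with a deliberately tiny example: a one-parameter "system" (a scalar gain applied to a noisy input) is tuned by gradient descent on an IQA model, here MSE. The system, the noise level, and the learning rate are all assumptions chosen for illustration; the experiments in this talk use deep networks instead.

```python
import numpy as np

rng = np.random.default_rng(0)
reference = rng.random(1000)
noisy = reference + 0.3 * rng.standard_normal(1000)   # input to the system

def system(inp, w):
    """Hypothetical one-parameter image processing system: a scalar gain."""
    return w * inp

def iqa_mse(output, reference):
    """IQA model used as the objective (MSE in this sketch)."""
    return np.mean((output - reference) ** 2)

w = 1.0                                   # start from the identity system
for _ in range(200):
    output = system(noisy, w)
    grad = 2.0 * np.mean((output - reference) * noisy)   # d MSE / d w (feedback)
    w -= 0.5 * grad

w_closed_form = noisy @ reference / (noisy @ noisy)
print("learned gain:", w, "closed form:", w_closed_form)
print("MSE before:", iqa_mse(noisy, reference), "after:", iqa_mse(system(noisy, w), reference))
```

Swapping `iqa_mse` and its gradient for another differentiable IQA model changes what the system learns to produce, which is the comparison the following slides carry out.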
Optimization Objective
• Select 11 representative IQA models:
MAE, MS-SSIM, VIF, CW-SSIM,
MAD, FSIM, GMSD, VSI, NLPD,
LPIPS, DISTS
• Four low-level vision tasks:
– Image denoising
– Blind image deblurring
– Single image super-resolution
– Lossy image compression
Code is available at https://github.com/dingkeyan93/IQA-optimization
Optimization Network
• Denoising and Deblurring:
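[Diagram: a residual network from Input to Output built from Conv layers and a chain of ResBlocks, with an additive skip connection (+). Each ResBlock consists of Conv, ReLU, and Conv, with an additive skip connection (+)]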
Optimization Network
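• Super-resolution:
[Diagram: Conv layers and a chain of ResBlocks from the Input, with an additive skip connection (+), followed by two Upsample and Conv stages that produce the Output]
• Compression:
[Diagram: an analysis transform (Conv, ResBlock, and Downsample stages, repeated n times), a quantizer Q, and a synthesis transform (Conv, ResBlock, and Upsample stages, repeated n times) that produces the Output]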
Optimization Performance
• Subjective Testing
Two-alternative forced choice (2AFC) method
The Bradley-Terry model is employed to convert
paired comparison results to global rankings
The paired t-test is conducted to investigate whether
the differences between the optimization results of the IQA models are
statistically significant
[Figure: test images (from the validation set of DIV2K)]
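A minimal sketch of how paired comparisons can be turned into a global ranking with the Bradley-Terry model is given below. It uses the standard minorization-maximization update for maximum likelihood fitting; the win counts are invented, and reporting zero-mean log-scores is one common convention assumed here, not necessarily the one used in the study.

```python
import numpy as np

def bradley_terry(wins, iters=1000, tol=1e-12):
    """Fit Bradley-Terry scores from a matrix of paired comparison counts.

    wins[i, j] = number of times item i was preferred over item j.
    Returns log-scores shifted to zero mean (larger is better).
    """
    n = wins.shape[0]
    total = wins + wins.T                  # number of comparisons per pair
    w = wins.sum(axis=1)                   # total wins per item
    p = np.ones(n) / n
    for _ in range(iters):
        denom = total / (p[:, None] + p[None, :])
        np.fill_diagonal(denom, 0.0)       # an item is never compared with itself
        p_new = w / denom.sum(axis=1)
        p_new /= p_new.sum()
        converged = np.max(np.abs(p_new - p)) < tol
        p = p_new
        if converged:
            break
    scores = np.log(p)
    return scores - scores.mean()

# Hypothetical 2AFC counts for three methods, 10 comparisons per pair.
wins = np.array([[0, 8, 9],
                 [2, 0, 7],
                 [1, 3, 0]], dtype=float)
scores = bradley_terry(wins)
print(scores)
```

Every item needs at least one win for the log-scores to be finite, so real studies often add a small prior count or check for this before fitting.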
Optimization Performance
• Performance ranking and grouping (models ordered from best to worst, score in parentheses):
Denoising: MS-SSIM (0.70), MAE (0.65), MAD (0.45), LPIPS (0.45), DISTS (0.39), NLPD (0.37), CW-SSIM (0.36), VSI (-0.44), VIF (-0.51), FSIM (-0.58), GMSD (-2.04)
Deblurring: DISTS (3.23), LPIPS (3.10), MAD (0.48), MS-SSIM (0.32), MAE (0.20), CW-SSIM (0.16), VIF (-0.79), NLPD (-0.94), FSIM (-1.54), VSI (-1.73), GMSD (-2.75)
Super-res: DISTS (2.50), LPIPS (1.88), MS-SSIM (1.20), MAE (1.02), NLPD (0.65), MAD (0.53), FSIM (-0.70), VIF (-1.37), VSI (-1.81), GMSD (-1.85), CW-SSIM (-2.04)
Compression: DISTS (2.61), LPIPS (2.35), MS-SSIM (1.58), MAE (1.53), MAD (0.68), NLPD (0.29), FSIM (-0.37), VIF (-1.64), VSI (-2.00), GMSD (-2.06), CW-SSIM (-4.26)
Visual Example — Denoising
Visual Example — Deblurring
Visual Example — Super-Resolution
Visual Example — Compression
Artifacts Analysis
• Blurring
MAE, MS-SSIM and NLPD, which rely on simple injective mappings,
tend to make a conservative estimate, producing something
akin to a superposition of all possible outcomes
[Figure (super-resolution): GT, MAE, MS-SSIM, NLPD]
Artifacts Analysis
• Ringing
FSIM, VSI and GMSD rely heavily on local gradient magnitude for
feature similarity comparison. This leads to a large number of “fake edge” lines
that are imperceptible to the gradient operator
[Figure (deblurring): GT, FSIM, VSI, GMSD]
Artifacts Analysis
• Over-enhancement
VIF (and IFC) does not fully respect reference information when normalizing
the covariance, so it can take a value larger than unity (indicating an enhancement
of visual quality). This “improvement” often goes too far
[Figure (super-resolution): GT, VIF]
Artifacts Analysis
• Luminance and color
Models such as GMSD and NLPD discard luminance information, leaving
a huge “null space” that accommodates luminance distortions
[Figure (compression): GT, NLPD, GMSD]
Conclusions
Some findings:
1. Optimization comparison provides an alternative means of testing the perceptual
relevance of IQA models in a more realistic setting
2. Through perceptual optimization, a number of novel distortions are generated, which
can easily fool many competing models
3. MAE / MSE, SSIM / MS-SSIM will continue to play a central role in optimizing image
processing systems
4. Recent IQA models with surjective mappings (e.g., FSIM, VSI, GMSD, etc.) may still
be used to monitor image quality in a limited space, but are not suitable for optimization
5. Two DNN-based models, LPIPS and DISTS, seem to stand out in our experiments, but
their high computational cost and lack of interpretability may hinder their application
Thanks!