Combined scalability coding based on the scalable extension of H.264/AVC


Sangseok Park, PhD candidate

June 13, 2008


Abstract

Scalable Video Coding (SVC) of H.264/AVC
  Annex G of the latest SVC draft [JVT-X201]
  Spatial, temporal, and quality scalabilities
  Final Draft International Standard (FDIS) in July 2007

Simplified Fine-Granular Scalability (FGS) [JVT-W111]
  Combination of significant and refinement coding passes
  Introduction of code type method
  SVC phase 2

Bit-Depth Scalability (BDS) design
  Based on Inter-Layer Prediction (ILP) [Schwarz07]
  Reverse tone-mapping process
  Low-complexity motion-compensation structure
  SVC phase 2

[JVT-X201] T. Wiegand et al., “Joint Draft ITU-T Rec. H.264 | ISO/IEC 14496-10 / Amd.3 Scalable video coding,” JVT Document, JVT-X201, Geneva, Switzerland, Jul. 2007.
[JVT-W111] M. Karczewicz, S. Park, and H. Chung, “Report of core experiment on FGS simplification (CE1),” JVT Document, JVT-W111, San Jose, California, Apr. 2007.
[Schwarz07] H. Schwarz, D. Marpe, and T. Wiegand, “Overview of the Scalable Video Coding Extension of the H.264/AVC standard,” IEEE Trans. CSVT, vol. 17(9), pp. 1103-1120, Sept. 2007.


Scalability: one bit-stream can adapt to different networks or terminals by dropping or truncating parts of the bit-stream, without severe degradation of the content

Good coding efficiency
Low decoder complexity
Much better than simulcast, the simplest scalability approach, which combines several independent bit-streams

[HHI website] HHI Image Communication, “The Scalable Video Coding Amendment of the H.264/AVC Standard,” http://ip.hhi.de/imagecom_G1/savce/

[HHI website]


Quality scalability

CGS (Coarse Grain Scalability) coding
  Small number of bit-extraction points
  Coarse bit rate variation
  Decreasing quantization steps from the SNR base layer to the enhancement layer

FGS (Fine Grain Scalability) coding
  Progressive refinement (PR) slices
  Can be truncated at arbitrary extraction points
  Cyclical scanning order of transform coefficients is used instead of zig-zag scanning
  High complexity

[Figure labels: Macroblock (MB), Slice]


SVC encoder block diagram

[Figure: H.264/AVC compatible scalable video encoder. The input video is reduced by 2D spatial decimation to feed the base layer, while the 1st and 2nd spatial enhancement layers are coded at higher resolutions. Each layer uses hierarchical motion-compensated prediction followed by transform/quantization and an entropy encoder for texture and motion, with inter-layer prediction of intra, motion, and residual data. Progressive refinement coding (FGS) produces the FGS enhancement layers. The base layer yields an H.264/AVC compatible base layer bitstream, and a multiplexer combines all layers into the scalable bit-stream. Labels: Spatial Scalability, CGS.]


Simplified FGS encoder block diagram

EOB: End of block, VLC: Variable length coder, CBP: Coded block pattern
BL: Base layer, EL: Enhancement layer

[Figure: simplified FGS encoder flow for an FGS layer, from frame to bit-stream, with VLC writers feeding the bit-stream. Blocks shown:]
  Prescan MBs to find the VLC code table for coding of significant coefficients
  Encode luma & chroma CBP
  Prescan MBs to find refinement coefficients
    SIGNIFICANT & NOT CODED coefficients: refinement coefficients
  Encode MB header (luma scan index == 0)
  Encode luma & chroma significant & refinement coefficients in case CBP4x4 != 0
    EOB shift array, statistically obtained
    Select the best codebook out of 5
  Encode luma refinement coefficients

Combining refinement coefficients
  Sign of BL = coeffBL < 0 ? 1 : 0
  Sign of EL = coeffEL < 0 ? 1 : 0
  Sign = XOR(sign of BL, sign of EL)
  Symbol = Sign ? 2 : 1

The parts marked x in the figure are skipped for simplification
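The sign-combining rule for refinement coefficients on this slide can be sketched in a few lines of Python. This is only an illustration of the four stated expressions; the function name `refinement_symbol` is hypothetical, and the slide does not say how a zero-valued refinement is signalled, so that case is not modelled here.

```python
def refinement_symbol(coeff_bl: int, coeff_el: int) -> int:
    """Return the refinement symbol (1 or 2) for a BL/EL coefficient pair.

    Follows the slide: each sign bit is 1 for a negative coefficient,
    the two sign bits are XORed, and the symbol is 2 if the signs differ.
    """
    sign_bl = 1 if coeff_bl < 0 else 0   # Sign of BL
    sign_el = 1 if coeff_el < 0 else 0   # Sign of EL
    sign = sign_bl ^ sign_el             # XOR of the two sign bits
    return 2 if sign else 1              # Symbol = Sign ? 2 : 1


if __name__ == "__main__":
    # Hypothetical coefficient pairs covering all four sign combinations.
    for bl, el in [(5, 1), (5, -1), (-5, 1), (-5, -1)]:
        print(f"BL={bl:3d} EL={el:3d} -> symbol {refinement_symbol(bl, el)}")
```

Pairs with matching signs give symbol 1 and pairs with opposite signs give symbol 2, so the refinement sign is coded relative to the base-layer sign rather than on its own.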


Results for FGS (Fine Granular Scalability)

Performed on the basis of JSVM 7.10 in C++ [JSVM]
The proposed method was verified and accepted into the SVC extension of H.264/AVC at the 23rd JVT meeting, San Jose, CA [JVT-W200]
The average improvement over all tested CIF sequences is a 0.46% bitrate reduction, while the complexity of the original FGS encoder is reduced so much that the high-level syntax decreased up to one-third of that of the original FGS encoder [JVT-W200]

[JSVM] JSVM 8.10, RWTH CVS server
[JVT-W200] T. Wiegand et al., “Meeting Report, Draft 7,” JVT Document, JVT-W200, San Jose, CA, Apr. 2007


Bit-Depth Scalability (BDS)

A new scalability, called bit-depth scalability, is needed for high dynamic range (HDR) content such as high-accuracy video, remote sensing, medical applications, and digital animation movies, since HDR cameras and display devices have been developed.

A bit depth of 8 covers two to three orders of magnitude of dynamic range (e.g., 256 levels)

A bit depth of 10 covers three to four orders of magnitude of dynamic range (e.g., 1024 levels)

Backward compatibility should be considered.

The content can be viewed simultaneously on both current low dynamic range (LDR) devices and HDR devices.

However, the current SVC does not support bit-depth scalability.
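The level counts quoted for 8 and 10 bits follow directly from the bit depth. The short sketch below reproduces them, under the assumption that "orders of dynamic range" means the base-10 logarithm of the number of code levels; the function names are illustrative only.

```python
import math


def code_levels(bit_depth: int) -> int:
    """Number of distinct sample values at a given bit depth."""
    return 2 ** bit_depth


def orders_of_magnitude(bit_depth: int) -> float:
    """log10 of the number of code levels (assumed meaning of 'orders')."""
    return math.log10(code_levels(bit_depth))


if __name__ == "__main__":
    for bits in (8, 10, 12):
        print(bits, code_levels(bits), round(orders_of_magnitude(bits), 2))
```

For 8 bits this gives 256 levels (about 2.41 orders) and for 10 bits 1024 levels (about 3.01 orders), matching the ranges on the slide.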

[Figure: an HDR camera produces an HDR sequence; TM processing in postproduction derives the LDR sequence for backward compatibility; both are coded into one SVC bit-stream, and bit-stream extraction serves HDR devices and LDR devices.]


Tone-mapping (TM) and inverse tone-mapping (iTM) reduce or extend the dynamic range.

TM: converts HDR sequences into LDR sequences, e.g., 10 bpp to 8 bpp

iTM: converts LDR sequences into HDR sequences, e.g., 8 bpp to 10 bpp; it is not an exact mathematical inverse of TM because information is lost

Both should preserve the human perception of the scene.

HDR images cannot be viewed on conventional monitors directly, but can be viewed after TM processing.
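Why iTM cannot be an exact inverse is easy to see with the crudest possible operator pair. The sketch below assumes a plain bit shift for both directions (10 bpp to 8 bpp and back); this is a stand-in chosen for illustration, not the perceptual tone-mapping operator used in production, and the function names are hypothetical.

```python
def tone_map(hdr_pixel: int, hdr_bits: int = 10, ldr_bits: int = 8) -> int:
    """Assumed TM: drop the least significant bits (10 bpp -> 8 bpp)."""
    return hdr_pixel >> (hdr_bits - ldr_bits)


def inverse_tone_map(ldr_pixel: int, ldr_bits: int = 8, hdr_bits: int = 10) -> int:
    """Assumed iTM: shift back up (8 bpp -> 10 bpp); the dropped bits stay lost."""
    return ldr_pixel << (hdr_bits - ldr_bits)


if __name__ == "__main__":
    hdr = [560, 561, 562, 563, 1023]           # hypothetical 10 bpp samples
    ldr = [tone_map(p) for p in hdr]            # 8 bpp versions
    restored = [inverse_tone_map(p) for p in ldr]
    for h, l, r in zip(hdr, ldr, restored):
        print(f"HDR {h:4d} -> LDR {l:3d} -> iTM {r:4d} (error {h - r})")
```

Four neighbouring HDR values collapse onto one LDR value, so the round trip recovers only one of them exactly; the enhancement layer has to code the remaining difference.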


The coding flow of the enhancement layer for each macroblock is arranged as follows


[Figure: SVC structure for BDS. The high bit-depth (10 bit) input data is tone mapped to the low bit-depth (8 bit) input data. The base layer codes the 8 bit data with inter/intra prediction, transform, quantization, entropy coding, scaling and inverse transform, and a deblocking filter. The enhancement layer codes the 10 bit data with inter prediction, using the inverse tone mapped base layer as a 10 bit prediction, followed by transform, quantization, entropy coding, scaling and inverse transform, and a deblocking filter, under RD optimization. Both layers are multiplexed into one bit stream.
Inter-Layer Intra Prediction: for intra MBs in the base layer, the reconstructed pixels of the collocated MB in frame n are passed to the enhancement layer.
Inter-Layer Motion Prediction: for inter MBs in the base layer, the MVs and reference index of the collocated MB in frame n are passed to the enhancement layer.]

[Park08] S. Park and K. R. Rao, “Bit-Depth Scalable Video Coding Based on H.264/AVC,” IEICE Trans. Fundamentals Letter, Vol. E91-A, No. 6, pp. 1541-1544, June 2008


Generate a mapping function

Inverse tone-mapping (iTM) expands the dynamic range of LDR (Low Dynamic Range) sequences

Linear scaling is the simplest approach

$\hat{L}^{10}(x,y) = L^{8}_{LDR}(x,y) \ll 2$ (a left shift by two bits, i.e. multiplication by 4)

$Residual(x,y) = L^{10}_{HDR}(x,y) - \hat{L}^{10}(x,y)$

It causes severe noise at the borders between bright and dark areas and makes the contrast change sharply

Mapping Function (MF) approach

The arithmetic mean is not computationally expensive and is easy to use to obtain a one-to-many mapping function [Mantiuk06]

$sum(j) = \sum_{i \,:\, L_{LDR}(i) = j} L_{HDR}(i)$

$MF(j) = \frac{1}{hist(j)} \cdot sum(j)$

i: pixel index, running over the number of pixels per frame; j: bin index, running over the $2^{\text{bits per pixel in the base layer}}$ possible LDR values

$L_{LDR}(i)$ is a pixel value in an LDR sequence, $L_{HDR}(i)$ is a pixel value in an HDR (High Dynamic Range) sequence, and $hist(j)$ is the number of pixels that fall into bin j.

Mapping information is sent in a sequence parameter set (SPS) for the entire sequence

It can be overridden, depending on the features of each frame, by being sent in a picture parameter set (PPS) or a slice header.
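A minimal sketch of the arithmetic-mean mapping function described on this slide, assuming collocated LDR and HDR pixel lists of equal length. The function name, the rounding to integers, and the fallback to linear scaling for LDR values that never occur in the frame are assumptions made for this illustration; the slide does not specify them.

```python
def build_mapping_function(ldr_pixels, hdr_pixels, ldr_bits=8, hdr_bits=10):
    """Return MF as a list with one entry per possible LDR value j."""
    bins = 1 << ldr_bits                  # 2^(bits per pixel in the base layer)
    acc = [0] * bins                      # sum(j): accumulated HDR values
    hist = [0] * bins                     # hist(j): number of pixels in bin j
    for ldr, hdr in zip(ldr_pixels, hdr_pixels):
        acc[ldr] += hdr
        hist[ldr] += 1
    shift = hdr_bits - ldr_bits
    # MF(j) = sum(j) / hist(j); empty bins use linear scaling (assumption).
    return [round(acc[j] / hist[j]) if hist[j] else j << shift
            for j in range(bins)]


if __name__ == "__main__":
    # Hypothetical collocated samples: LDR 150 pairs with HDR 560 and 562,
    # LDR 50 pairs with HDR 200.
    mf = build_mapping_function([150, 150, 50], [560, 562, 200])
    print(mf[150], mf[50], mf[0], mf[255])
```

With these samples the entry for LDR value 150 becomes the mean of 560 and 562, which is how several HDR values that share one LDR value are reduced to a single table entry.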


Bit stream in H.264/AVC [Wiegand03]

A VCL (Video Coding Layer) NAL (Network Abstraction Layer) unit contains the values of the samples in the video pictures

A non-VCL NAL unit contains associated additional information such as parameter sets

One frame can be one slice or be split into several slices; in this research, one frame corresponds to one slice

[Figure: building the mapping function from a 4CIF (704 x 576) frame pair. Pixel index i runs from 0 to 704*576-1 = 405503 over the 8 bpp frame, giving L_LDR(i), and over the 10 bpp frame, giving L_HDR(i). In the example, LDR values 150 and 50 are collocated with HDR values 560 and 200. For each LDR value j from 0 to 255, the HDR values are added to an accumulated sum, each bin of hist(j) contains the number of counts, and the average per count gives MF(j).]

[Figure: one bit stream: SPS#1, PPS#1, Slice #1 or Frame #1, PPS#2, Slice #2 or Frame #2, ... The SPS and PPS are non-VCL NAL units; the slices are VCL NAL units.]
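The idea that a sequence-level mapping table can be overridden per picture or per slice can be sketched as a simple lookup. Everything here is hypothetical: the function name, the use of `None` for "not sent", and in particular the precedence of a slice-header table over a PPS table, which the slides do not state.

```python
def select_mapping_table(sps_table, pps_table=None, slice_table=None):
    """Pick the mapping table in effect for one slice.

    Assumed precedence (most specific wins): slice header, then PPS,
    then the sequence-wide table from the SPS.
    """
    if slice_table is not None:
        return slice_table
    if pps_table is not None:
        return pps_table
    return sps_table


if __name__ == "__main__":
    sps = [j << 2 for j in range(256)]   # sequence-wide default (linear scaling)
    pps = list(sps)
    pps[150] = 561                       # frame-specific refinement
    print(select_mapping_table(sps)[150])             # SPS table only
    print(select_mapping_table(sps, pps)[150])        # overridden by the PPS
```

Sending the table once in the SPS keeps the overhead low, while the override path covers frames whose statistics differ from the sequence-wide table.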

[Wiegand03] T. Wiegand et al, “Overview of the H.264/AVC video coding standard,” IEEE Trans. CSVT, vol. 13(7), pp. 560-576, July 2003.


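$\hat{L}^{10}(x,y) = scaling\_factor \cdot L^{8}_{LDR}(x,y) + offset(x,y)$    (1)

$Residual(x,y) = L^{10}_{HDR}(x,y) - scaling\_factor \cdot L^{8}_{LDR}(x,y)$

$offset(x,y) = \frac{1}{W \cdot H} \sum_{i=m}^{m+W-1} \sum_{j=n}^{n+H-1} Residual(i,j)$

$\hat{L}^{10}(x,y) = MF\big(L^{8}_{LDR}(x,y)\big)$    (2)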
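The scaling-plus-offset prediction on this slide can be illustrated on a single block. The sketch assumes a scaling factor of 4 (8 bit to 10 bit) and takes the offset as the mean residual over the whole W x H block passed in; the block boundaries, the function name, and the sample values are assumptions for illustration.

```python
def predict_scale_offset(ldr_block, hdr_block, scaling_factor=4):
    """Return (prediction, offset) for one block given as lists of rows."""
    height, width = len(ldr_block), len(ldr_block[0])
    # Residual(x, y) = L_HDR(x, y) - scaling_factor * L_LDR(x, y)
    residual = [[hdr_block[r][c] - scaling_factor * ldr_block[r][c]
                 for c in range(width)] for r in range(height)]
    # offset = mean of the residual over the W x H block
    offset = sum(sum(row) for row in residual) / (width * height)
    # Prediction = scaling_factor * L_LDR(x, y) + offset
    prediction = [[scaling_factor * ldr_block[r][c] + offset
                   for c in range(width)] for r in range(height)]
    return prediction, offset


if __name__ == "__main__":
    ldr = [[50, 50], [150, 150]]          # hypothetical 8 bit block
    hdr = [[202, 204], [604, 606]]        # hypothetical collocated 10 bit block
    prediction, offset = predict_scale_offset(ldr, hdr)
    print(offset)
    print(prediction)
```

In this example the residuals are 2, 4, 4, and 6, so the offset is 4.0 and the remaining prediction error per pixel is at most 2, smaller than with scaling alone.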


Experimental results

Video sequence                 ΔPSNR (dB)   ΔBitrate (%)
CapitolRecords                    -0.14          2.98
Night                             -0.12          2.62
Freeway                            1.41        -22.22
Staples                           -0.01          0.02
Plane                             -0.11          3.06
Waves                             -0.18          5.99
Average of 10bit sequences         0.14         -1.26
Sunrise                            3.81        -44.86
library                            4.72        -52.37
Average of 12bit sequences         4.27        -48.61

Rate points

           HHI                          Proposed
Bitrate (kbps)  PSNR-Y (dB)    Bitrate (kbps)  PSNR-Y (dB)

10bit sequences

     3215.13       43.85           3132.28       43.70
     7644.33       46.90           7533.29       46.71
    15829.76       50.58          15699.18       50.36
    29116.24       54.53          29399.27       54.37

     6935.62       38.94           5131.48       39.52
    12931.06       42.66           9982.40       42.87
    24121.86       46.44          19280.89       46.29
    45838.05       50.34          37380.40       50.10

     3650.80       43.13           3584.24       42.93
     8993.74       46.01           8961.20       45.87
    19258.09       49.60          19330.82       49.53
    36366.92       53.50          36799.18       53.50

     3472.78       41.76           3343.08       41.61
     7662.78       44.47           7304.21       44.22
    16603.69       47.33          15805.15       46.98
    35965.69       50.71          34531.20       50.39

     5141.98       40.72           5004.92       40.64
    14637.22       43.63          14440.12       43.55
    32431.34       47.35          32152.78       47.32
    60242.00       51.49          60407.29       51.53

     2446.87       42.85           2694.82       42.84
     5914.61       44.96           6051.15       44.83
    14376.38       47.63          14523.98       47.50
    33552.38       51.08          33785.44       50.97

12bit sequences

    63264.40       43.67          40684.40       47.70
   107487.60       47.75          72171.92       50.68
   175330.16       51.49         124266.40       53.43
   288465.84       54.97         219020.96       56.29

    82232.00       42.12          58211.76       44.76
   138702.08       45.78         100101.04       47.88
   223707.36       49.16         167135.76       50.80
   364473.12       52.59         283502.64       53.78


A coding gain of 0.14 dB, or a 1.26% reduction in bit rate, is obtained on average for the 10 bits/pixel test sequences.
The coding gain for the 12 bits/pixel test sequences averages 4.27 dB, or a 48.61% reduction in bit rate.
This approach keeps the increase in complexity minimal by avoiding motion estimation in the enhancement layer.
It increases the robustness of quality when the mapping function table is not updated frequently.
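The average gains quoted here can be cross-checked against the per-sequence differences in the results table. The sketch below recomputes them as plain arithmetic means; the grouping of sequences into 10 bit and 12 bit sets is taken from the position of the two "Average" rows in the table, and the variable and function names are illustrative.

```python
# (delta PSNR in dB, delta bitrate in %) per sequence, copied from the table.
DELTA_10BIT = {
    "CapitolRecords": (-0.14, 2.98),
    "Night": (-0.12, 2.62),
    "Freeway": (1.41, -22.22),
    "Staples": (-0.01, 0.02),
    "Plane": (-0.11, 3.06),
    "Waves": (-0.18, 5.99),
}
DELTA_12BIT = {
    "Sunrise": (3.81, -44.86),
    "library": (4.72, -52.37),
}


def average_gain(rows):
    """Arithmetic mean of the PSNR and bitrate differences over all rows."""
    count = len(rows)
    mean_psnr = sum(psnr for psnr, _ in rows.values()) / count
    mean_rate = sum(rate for _, rate in rows.values()) / count
    return mean_psnr, mean_rate


if __name__ == "__main__":
    for label, rows in (("10 bit", DELTA_10BIT), ("12 bit", DELTA_12BIT)):
        psnr, rate = average_gain(rows)
        print(f"{label}: {psnr:.3f} dB, {rate:.3f} %")
```

The means agree with the tabulated averages to within rounding, and they show that the 10 bit average is dominated by the Freeway sequence while the other five sequences are close to neutral.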


Future Works related to H.265

H.265 design project from VCEG meeting in Geneva, Apr. 2008 [Lee08]
  Progressive-scan (only)
  Picture sizes: QVGA, VGA, 1080p60, 2kx4k
  Frame rate: 12.5/15, 24/25/30, 50/60, 100/120
  Picture size/grid conversion within the design (e.g. 4:4:4 and 4:2:0, 8bpp, 10bpp, 12bpp)
  Sampling grid: 4:2:0, 4:4:4, Bayer Color Array
  Views: 1, N > 1
  Complexity issues: portable encoders, parallelism, memory bandwidth, asymmetry (can shift balance from encoder to decoder for videoconferencing, surveillance, and mobile camcorders)

[Lee08] From Dr. Yung-Lyul Lee of Sejong University in Korea, presently a visiting professor at UTA
