Video coding (Part 5): Microsoft Windows Media and VC-1
Yi-Shin Tung, National Taiwan University (NTU)

9. Video Coding (VC-1)



Outline

Windows Media family and its evolution
WMV applications
Video coding tools
Comparison with MPEG-2, H.264/AVC
Performance evaluations
Conclusions


Goal and applications

Focus on streaming compressed audio and video over the Internet to personal computers.
The vision is to enable effective delivery of digital media over any network to any device.
Applications include:
– Internet-based applications such as web broadcasting and VOD
– Consumer electronics such as DVD, car audio and mobile phones
– Terrestrial and satellite broadcast (DVB-T and DVB-S)


WM end-to-end delivery

[Figure: WM end-to-end delivery, with components labeled Windows SDK and WM porting kit]


Windows Media codecs

Audio codec
– Windows Media Audio 9 (mono/stereo, 8-48 kHz, 5-320 kbps, CD quality at 48-128 kbps)
– Windows Media Audio 9 Professional (5.1 or 7.1 channels, up to 96 kHz, up to 24 bits/sample, 128 kbps and above)
– Windows Media Audio 9 Lossless (2:1 compression ratio for stereo)
– Windows Media Audio 9 Voice (mono, 4-20 kbps, hybrid CELP/transform coding)
Video codec
– Windows Media Video 7 and 8 (non-standard version of MPEG-4)
– Windows Media Video 9 (VC-9, VC-1) (160x120 at 10 kbps, BT.601 at 2 Mbps, 720p at 4-6 Mbps, 1080i at 6-20 Mbps)
– Windows Media Video 9 Screen (generally 28 kbps, 100 kbps for images)
– Windows Media Video 9 Image (slide shows and transitions)


Encoding operational modes

One-pass CBR (live encoding and transmission)
Two-pass CBR (offline encoding for on-demand streaming)
One-pass VBR (live capture)
Two-pass VBR (download-and-play applications)
Peak-constrained VBR (constrained reading speed)
– Average, maximum and minimum bitrates are specified.
Multiple bitrate encoding (MBR)
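As a rough illustration of what "peak-constrained" means in practice, the sketch below checks a sequence of encoded frame sizes against an average and a peak bitrate. It assumes the peak is measured over a sliding window of frames; the function names and the window definition are hypothetical, not part of the Windows Media encoder, and the minimum-bitrate target is not modeled.

```python
def bitrate_stats(frame_bits, fps, window_frames):
    """Return (average_bps, peak_bps) for per-frame sizes given in bits.

    The peak is the highest bitrate over any run of `window_frames`
    consecutive frames (a sliding-window simplification).
    """
    if not 0 < window_frames <= len(frame_bits):
        raise ValueError("window must fit inside the sequence")
    average_bps = sum(frame_bits) / (len(frame_bits) / fps)
    window_s = window_frames / fps
    peak_bps = max(
        sum(frame_bits[i:i + window_frames]) / window_s
        for i in range(len(frame_bits) - window_frames + 1)
    )
    return average_bps, peak_bps


def meets_peak_constrained_vbr(frame_bits, fps, avg_bps, max_bps, window_frames):
    """True if both the average and the windowed peak stay within the targets."""
    average_bps, peak_bps = bitrate_stats(frame_bits, fps, window_frames)
    return average_bps <= avg_bps and peak_bps <= max_bps


# Usage: four frames at 2 fps, peak measured over 1-second (2-frame) windows.
avg, peak = bitrate_stats([100, 200, 300, 400], fps=2, window_frames=2)
print(avg, peak)  # 500.0 700.0
```

A two-pass VBR encoder would be free to exceed the peak here; the peak constraint is what keeps the reading speed bounded.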


WMV status

HD movies were commercially released in 2003.
WMV-9 came under consideration by SMPTE (C24 group) for standardization as VC-1 in Sep 2003, and was promoted to CD in March 2004.

– previously named “Proposed SMPTE Standard for Television: VC-9 Compressed Video Bitstream Format and Decoding Process”

VC-1 became a mandatory codec for the two major HD video disc formats:
– HD-DVD: "Microsoft on every DVD", Feb 2004 (http://news.com.com/2100-1041_3-5166786.html?tag=nefd_top)
– BD (Blu-ray): H.264 and VC-1 added to the Blu-ray standard (http://www.digitmag.co.uk/news/index.cfm?NewsID=4382)

MPEG-LA announced a plan for a joint VC-1 license.
– A call for essential patents is the first step (http://www.mpegla.com/pid/vc9/)


Decoding process block diagram

[Figure: decoding process block diagram. Conforming implementation: bit-stream parsing, inverse VLC, inverse quantization, inverse transform, prediction inverse VLC, motion compensation (sub-pel interpolation, 4MV), intensity compensation & range re-mapping, overlap smoothing & loop filter, decoded frame buffer (1-frame delay). Implementation-specific out-of-loop processing: post-filtering, color conversion, re-sizing.]


Same structure

Internal color format is 8-bit 4:2:0.
Block-based motion compensation and spatial transform.
I/P/B picture definitions are similar to MPEG-4 (unlike H.264).


Design criteria

Design metrics
– Rate-distortion curve
– Visual feedback from cinema testing
– Drift-free design for bit-exact reconstruction
– Computational complexity vs. coding gain

Floating-point arithmetic is ruled out.
A 16-bit word size is preferred.
Conditional statements should be minimized.

Guideline: any inefficiency in signal processing operations tends to have a big impact on R-D performance at high rates, whereas any inefficiency in entropy coding has more impact at low rates.
– Signal processing operations: motion compensation, transform, loop filtering
– Entropy coding: zigzag scanning, motion vector prediction


Salient innovations of WMV-9

Adaptive block size transform
Limited precision transform set
Adaptive motion compensation
Adaptive quantization
Advanced entropy coding
Loop filtering
Advanced B frame coding
Interlace coding
Overlap smoothing
Low-rate tools
Fading compensation


Adaptive block size transform

Large transform vs. small transform
– Pros: good at capturing trends and periodicities
– Cons: local transients spread over the whole block; ringing effects

Trends and textures are better preserved by a large transform, while areas of discontinuity are better coded by a small transform.
A block can be coded with one 8x8, two 8x4, two 4x8 or four 4x4 transforms, which lets the encoder use the size best suited to the underlying data.
The transform type can be signaled at the frame, macroblock or block level.
Intra blocks always use the 8x8 transform.
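A minimal sketch of how an encoder might choose among the 8x8, 8x4, 4x8 and 4x4 options for one 8x8 block. It assumes a floating-point orthonormal DCT and uses the count of nonzero quantized coefficients as a stand-in for rate; the real WMV-9 encoder uses the integer transforms and its own decision rule, which the slides do not describe. The WxH naming (width first) and all function names are assumptions.

```python
import math

# Sub-block (width, height) for each transform type; width-first naming
# is an assumption made for this sketch.
PARTITIONS = {"8x8": (8, 8), "8x4": (8, 4), "4x8": (4, 8), "4x4": (4, 4)}


def dct_1d(values):
    """Orthonormal DCT-II (floating point, unlike the integer WMV-9 transforms)."""
    n = len(values)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                               for i, v in enumerate(values)))
    return out


def dct_2d(block):
    """Separable 2-D DCT: transform each row, then each column."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def partition_cost(block, width, height, qstep):
    """Rate proxy: nonzero quantized coefficients summed over all sub-blocks."""
    cost = 0
    for top in range(0, 8, height):
        for left in range(0, 8, width):
            sub = [row[left:left + width] for row in block[top:top + height]]
            cost += sum(1 for row in dct_2d(sub) for c in row if round(c / qstep) != 0)
    return cost


def choose_transform(block, qstep, is_intra=False):
    """Return (type, cost) with the lowest cost; intra blocks always use 8x8."""
    if is_intra:
        return "8x8", partition_cost(block, 8, 8, qstep)
    costs = {name: partition_cost(block, w, h, qstep) for name, (w, h) in PARTITIONS.items()}
    best = min(costs, key=costs.get)
    return best, costs[best]


# Usage: a horizontal edge halfway down the block favors the 8x4 split.
block = [[0] * 8 for _ in range(4)] + [[100] * 8 for _ in range(4)]
print(choose_transform(block, qstep=10))  # ('8x4', 1)
```

The 8x4 split puts the edge exactly on a sub-block boundary, so each half is flat and only one DC coefficient survives; the 8x8 transform has to spend several odd-frequency coefficients on the same edge.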


Adaptive block size transform (cont’d)

Large transforms retain texture information.
Although the R-D gain is modest, the adaptive block size transform provides major subjective quality benefits, especially for subtle textures, film details and film grain.
H.264 High profile adds an adaptive transform size in recognition of this benefit.


16-bit transform
Design constraints
– A full 16-bit operation, where both sums and products of two 16-bit values produce results within 16 bits.
– Forward and inverse transforms form an orthogonal pair: V×U = diag(D).
– The transform approximates a DCT.
– Norms of the basis functions within one transform type are identical.
– Norms of the basis functions across transform types are identical.

The 8x8 inverse transform places the tightest constraint.
WMV-9 relaxes the last two constraints: the norms are in the ratio 288:289:292 (about 1% difference), which is compensated during encoding.
Inverse transform order: row inverse transform => rounding => column inverse transform => rounding
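The norm ratio can be checked directly. The sketch below uses the integer 8-point and 4-point matrices as commonly published for VC-1 (transcribed here, so verify them against SMPTE 421M), confirms that the basis rows are orthogonal, and shows that the squared norms divided by 4 give 288, 289 and 292. It also sketches the row-then-column inverse with a rounding step after each pass; the offsets 4 (>> 3) and 64 (>> 7) are assumptions for illustration, and the standard's exact rounding terms may differ.

```python
# Integer inverse-transform matrices as commonly published for VC-1 / WMV-9.
# Transcribed for illustration; verify against SMPTE 421M before relying on them.
T8 = [
    [12, 12, 12, 12, 12, 12, 12, 12],
    [16, 15, 9, 4, -4, -9, -15, -16],
    [16, 6, -6, -16, -16, -6, 6, 16],
    [15, -4, -16, -9, 9, 16, 4, -15],
    [12, -12, -12, 12, 12, -12, -12, 12],
    [9, -16, 4, 15, -15, -4, 16, -9],
    [6, -16, 16, -6, -6, 16, -16, 6],
    [4, -9, 15, -16, 16, -15, 9, -4],
]
T4 = [
    [17, 17, 17, 17],
    [22, 10, -10, -22],
    [17, -17, -17, 17],
    [10, -22, 22, -10],
]


def matmul(a, b):
    """Plain integer matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(r) for r in zip(*m)]


def squared_norms(t):
    """Diagonal of T * T^T, i.e. the squared norm of each basis row."""
    g = matmul(t, transpose(t))
    return [g[i][i] for i in range(len(t))]


def is_orthogonal(t):
    """True if every off-diagonal entry of T * T^T is zero."""
    g = matmul(t, transpose(t))
    return all(g[i][j] == 0 for i in range(len(t)) for j in range(len(t)) if i != j)


def inverse_8x8(coeffs):
    """Row inverse transform, round, column inverse transform, round.

    The offsets 4 (>> 3) and 64 (>> 7) are assumptions for this sketch.
    """
    rows = [[(v + 4) >> 3 for v in r] for r in matmul(coeffs, T8)]
    return [[(v + 64) >> 7 for v in r] for r in matmul(transpose(T8), rows)]


# Usage: squared norms divided by 4 give the 288:289:292 ratio quoted above.
print(sorted({n // 4 for n in squared_norms(T8) + squared_norms(T4)}))  # [288, 289, 292]
```

With these offsets, a DC-only block with coefficient 64 reconstructs to a flat block of 9s.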


Motion compensation

8x8 or 16x16 prediction
Motion vectors have up to ¼-pel resolution.
An adaptive motion mode, derived from three criteria (MV resolution, block size, filter type), is signaled at the frame level:
– Mixed block size (16x16 and 8x8), ¼-pel, bicubic [high bitrate]
– 16x16, ¼-pel, bicubic
– 16x16, ½-pel, bicubic
– 16x16, ½-pel, bilinear [low bitrate]


Bicubic filtering
Direct filtering approach, where the 4-tap filters are (r is the rounding control):
– (-1*P1 + 9*P2 + 9*P3 - 1*P4 + 8 - r) >> 4 (½-pel position)
– (-4*P1 + 53*P2 + 18*P3 - 3*P4 + 32 - r) >> 6 (¼-pel position)
– (-3*P1 + 18*P2 + 53*P3 - 4*P4 + 32 - r) >> 6 (¾-pel position)

¼-pel bilinear filtering is applied to chrominance components. ½-pel bilinear is optional for low complexity applications.

[Figure: sub-pel interpolation positions, Case 1 to Case 8, around the integer pixel locations]
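A one-dimensional sketch of the three bicubic filters listed above, dispatched by the fractional part of a quarter-pel motion vector. The tap values and shifts are taken from the slide; the clipping to 0..255, the assumption that r is 0 or 1, and the helper names are illustrative, and the two-dimensional (separable) case is not shown.

```python
# Tap sets from the slide, keyed by quarter-pel fraction: (taps, offset, shift).
FILTERS = {
    1: ((-4, 53, 18, -3), 32, 6),  # 1/4-pel position
    2: ((-1, 9, 9, -1), 8, 4),     # 1/2-pel position
    3: ((-3, 18, 53, -4), 32, 6),  # 3/4-pel position
}


def split_quarter_pel(mv):
    """Split a quarter-pel MV component into (integer part, fraction 0..3)."""
    return mv >> 2, mv & 3


def bicubic_1d(p1, p2, p3, p4, frac, r=0):
    """Interpolate between p2 and p3 at frac/4 of a pel.

    r is the rounding-control value from the slide (0 or 1 assumed).
    """
    if frac == 0:
        return p2  # integer position: no filtering needed
    taps, offset, shift = FILTERS[frac]
    acc = sum(t * p for t, p in zip(taps, (p1, p2, p3, p4)))
    return max(0, min(255, (acc + offset - r) >> shift))  # clip to 8 bits


# Usage: on a linear ramp the filters land on the ideal sub-pel values.
print([bicubic_1d(0, 16, 32, 48, f) for f in range(4)])  # [16, 20, 24, 28]
```

Each tap set sums to its normalizer (16 or 64), so flat areas pass through unchanged; the negative outer taps are what give the sharper response compared with bilinear interpolation.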


Adaptive quantization

The same quantization rule applies to the coefficients of all four transform sizes.
Two quantization modes, decided for each frame:
– Dead-zone quantization, suitable for low bitrates, with reconstruction levels {-kQ-D, 0, kQ+D}
– Regular uniform quantization, for high bitrates, with reconstruction levels {kQ}
– The mode adapts to the running QP.
On the encoder side, a dead zone is always present.
[Figure: dead-zone vs. regular uniform quantizer, with levels marked at 3/2×QP and 5/2×QP]
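The two reconstruction sets can be compared with a small sketch. Here q is the step size and d the extra offset of the dead-zone quantizer, as in the sets above; the nearest-level decision rule is an assumption made for illustration, not the WMV-9 encoder's actual rule, and the adaptation to the running QP is not modeled.

```python
import math


def quantize_uniform(x, q):
    """Index of the nearest reconstruction point in {k*q}."""
    k = math.floor(abs(x) / q + 0.5)
    return k if x >= 0 else -k


def dequantize_uniform(k, q):
    return k * q


def quantize_deadzone(x, q, d):
    """Index of the nearest point in {0, +/-(k*q + d) for k >= 1}.

    The zero bin widens to |x| < (q + d) / 2, which is the dead zone.
    """
    if abs(x) < (q + d) / 2:
        return 0
    k = max(1, math.floor((abs(x) - d) / q + 0.5))
    return k if x >= 0 else -k


def dequantize_deadzone(k, q, d):
    if k == 0:
        return 0
    return k * q + d if k > 0 else k * q - d


# Usage: with q = 10 and d = 5, a small input of 7 survives uniform
# quantization but falls into the dead zone.
print(dequantize_uniform(quantize_uniform(7, 10), 10))          # 10
print(dequantize_deadzone(quantize_deadzone(7, 10, 5), 10, 5))  # 0
```

Zeroing more small coefficients is what makes the dead-zone mode attractive at low bitrates, where runs of zeros are cheap to code.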


Entropy coding: context-adaptive multiple VLCs

In WMV-9, up to 8 tables (coding sets) are available for coding each symbol, and the table is selected per frame. For example, there are 8 tables for transform AC coefficients. This differs from H.264 CAVLC, which switches VLC tables according to local context within a block (for example, the run_before table below is indexed by zerosLeft); in WMV-9, symbols are coded adaptively by choosing among several tables designed for different symbol distributions.

Coding set correspondence for PQINDEX <= 8
Index | Y blocks | Cb and Cr blocks
0 | High Rate Intra | High Rate Inter
1 | High Motion Intra | High Motion Inter
2 | Mid Rate Intra | Mid Rate Inter

Coding set correspondence for PQINDEX > 8
Index | Y blocks | Cb and Cr blocks
0 | High Rate Intra | High Rate Inter
1 | High Motion Intra | High Motion Inter
2 | Mid Rate Intra | Mid Rate Inter

H.264 CAVLC run_before codes (columns: zerosLeft)
run_before | 1 | 2 | 3 | 4 | 5 | 6 | >6
0 | 1 | 1 | 11 | 11 | 11 | 11 | 111
1 | 0 | 01 | 10 | 10 | 10 | 000 | 110
2 | - | 00 | 01 | 01 | 011 | 001 | 101
3 | - | - | 00 | 001 | 010 | 011 | 100
4 | - | - | - | 000 | 001 | 010 | 011
5 | - | - | - | - | 000 | 101 | 010
6 | - | - | - | - | - | 100 | 001
… | - | - | - | - | - | - | …
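Frame-level table selection can be illustrated with two toy code-length tables. The tables are hypothetical (the real coding sets are defined by the standard), and choosing the table with the fewest total bits is an assumed encoder strategy; the point is only that one index per frame selects the table used for all symbols of that frame.

```python
# Hypothetical code-length tables (bits per symbol value). The real WMV-9
# coding sets are defined in the standard; these only illustrate the idea.
CODING_SETS = [
    {0: 1, 1: 2, 2: 3, 3: 3},  # skewed: cheap when small values dominate
    {0: 2, 1: 2, 2: 2, 3: 2},  # flat: better when values are spread out
]


def frame_cost(symbols, table):
    """Total bits to code all of a frame's symbols with one table."""
    return sum(table[s] for s in symbols)


def select_coding_set(symbols, coding_sets=CODING_SETS):
    """Return (index, bits) of the cheapest table; the index is signaled once per frame."""
    costs = [frame_cost(symbols, t) for t in coding_sets]
    best = min(range(len(costs)), key=costs.__getitem__)
    return best, costs[best]


print(select_coding_set([0, 0, 0, 1]))  # (0, 5)
print(select_coding_set([3, 2, 3, 2]))  # (1, 8)
```

Both toy tables satisfy the Kraft equality, so each corresponds to a valid prefix code; only the distribution they are tuned for differs.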


Entropy coding: bitplane coding

Some symbols, e.g. the MB type, are spatially correlated. Bitplane coding encodes such symbols efficiently by exploiting the spatial dependency between these bits.
7 modes: Raw, RowSkip, ColSkip, Norm-2, Norm-6, Diff-2 and Diff-6

[Figure: MB type map of a P-VOP (skip, intra, inter) and the Norm-2, Diff-2, Norm-6, Diff-6, Row-skip and Col-skip modes]
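Two of the seven modes, Row-skip and Col-skip, are simple enough to sketch. The version below codes a binary plane (1 bit per macroblock) in Raw, Row-skip or Col-skip mode and keeps the shortest result. It leaves out the mode signaling, the Norm and Diff modes and any other bitstream details, so treat it as an illustration of the idea rather than the VC-1 syntax; the function names are hypothetical.

```python
def raw_bits(plane):
    """Raw mode: one bit per macroblock, in raster order."""
    return "".join(str(b) for row in plane for b in row)


def rowskip_bits(plane):
    """Row-skip: '0' for an all-zero row, else '1' followed by the row's bits."""
    out = []
    for row in plane:
        if any(row):
            out.append("1" + "".join(str(b) for b in row))
        else:
            out.append("0")
    return "".join(out)


def colskip_bits(plane):
    """Col-skip: the same idea applied to columns."""
    return rowskip_bits([list(col) for col in zip(*plane)])


def best_mode(plane):
    """Pick the cheapest of the three modes (ties go to the earlier mode)."""
    candidates = {
        "raw": raw_bits(plane),
        "rowskip": rowskip_bits(plane),
        "colskip": colskip_bits(plane),
    }
    name = min(candidates, key=lambda m: len(candidates[m]))
    return name, candidates[name]


# Usage: only one macroblock row is active, so Row-skip halves the cost.
plane = [[0, 0, 0, 0], [1, 1, 0, 1], [0, 0, 0, 0], [0, 0, 0, 0]]
print(best_mode(plane))  # ('rowskip', '01110100')
```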


Loop filtering

Independent block coding leads to:
– Visible “blocky” artifacts
– Reduced quality of reference frames

An in-loop deblocking filter is used, as in H.264.
Filtering is applied at every 4th, 8th, 12th, etc. pixel row or column, depending on the transform type.
Adaptive filtering rule, with a shortcut to save computation.
The amount of filtering is smaller than in H.264.
[Figure: loop-filter shortcut]
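Which edges get filtered follows from the transform type of each block. The sketch below lists the vertical edge positions along one row of 8x8 blocks, given the transform width (8 or 4) used in each block; the same rule applied to transform heights gives the horizontal edges. The function name and the choice to skip picture borders are assumptions, and the filter itself and its shortcut are not modeled.

```python
def vertical_edges(transform_widths, frame_width):
    """Vertical edge positions filtered along one row of 8x8 blocks.

    transform_widths[i] is the transform width (8 or 4) used in block i.
    Position x means the edge between pixel x - 1 and pixel x; picture
    borders are skipped since there is nothing on the other side.
    """
    edges = []
    for i, w in enumerate(transform_widths):
        start = 8 * i
        for x in range(start + w, start + 9, w):
            if x < frame_width:
                edges.append(x)
    return edges


# Usage: an 8-wide, a 4-wide and an 8-wide transform across a 24-pixel row.
print(vertical_edges([8, 4, 8], frame_width=24))  # [8, 12, 16]
```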


Interlace coding

Field picture coding mode
– Intra MBs are coded as in the progressive case.
– Inter MBs may be predicted by either one 16x16 MV or four 8x8 MVs, where each MV can refer to either of the two previously encoded fields.


Interlace coding (cont’d)

Frame picture coding mode
– Intra MBs may be coded with frame DCT or field DCT.
– Inter MBs may use frame prediction (1 or 4 MVs) or field prediction (2 or 4 MVs), in addition to the choice of DCT type.


Advanced B-frame coding

Explicit coding of the B frame’s temporal position relative to its two reference frames (variable velocity model)
Intra-coded B frames
Improved MV coding efficiency
The bottom B-field may refer to the top B-field.
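One way an explicitly coded temporal position can be used is to split a co-located motion vector into forward and backward vectors for the B frame. The sketch assumes linear motion between the two reference frames and rounds with Python's round(); the function name is hypothetical, and it illustrates the scaling idea rather than the exact VC-1 direct-mode rule.

```python
from fractions import Fraction


def scale_direct_mvs(colocated_mv, b_fraction):
    """Forward/backward MVs for a B frame at temporal position b_fraction.

    b_fraction is the B frame's position between the previous (0) and the
    next (1) reference; linear motion is assumed. Results use round().
    """
    f = Fraction(b_fraction)
    forward = tuple(round(c * f) for c in colocated_mv)
    backward = tuple(round(c * (f - 1)) for c in colocated_mv)
    return forward, backward


# Usage: a B frame one third of the way between its references.
print(scale_direct_mvs((9, -6), Fraction(1, 3)))  # ((3, -2), (-6, 4))
```

Coding the fraction explicitly, instead of assuming the B frame sits midway, keeps this scaling correct when B frames are unevenly spaced.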


Overlap smoothing

Another technique to reduce blocking artifacts in intra areas.
Drawbacks of deblocking filtering:
– It is purely a decoder process, which operates equally on true edges that happen to be block-aligned and on apparent block edges.
– It is usually disabled in the less complex profiles.
The lapped transform is another way to remove blocking effects.
A spatial-domain approach implements the lapped transform as pre- and post-processing around the block transform.
Adaptive application rule: applied at lower bitrates, and can also be switched on or off on a macroblock basis.

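$$
\begin{pmatrix} y_0 \\ y_1 \\ y_2 \\ y_3 \end{pmatrix}
= \left(
\begin{pmatrix} 7 & 0 & 0 & 1 \\ -1 & 7 & 1 & 1 \\ 1 & 1 & 7 & -1 \\ 1 & 0 & 0 & 7 \end{pmatrix}
\begin{pmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \end{pmatrix}
+ \begin{pmatrix} r_0 \\ r_1 \\ r_0 \\ r_1 \end{pmatrix}
\right) \gg 3
$$
[Figure: pixels a0 a1 | b1 b0 and p0 p1 | q1 q0 on either side of block edges]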
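The smoothing matrix on this slide can be applied to four pixels that straddle a block edge (the edge lies between x1 and x2). Each row of the matrix sums to 8, so a flat area passes through unchanged after the >> 3. The default r0 = 4 and r1 = 3 are assumptions; how the standard alternates these rounding values is not shown on the slide, and the function name is hypothetical.

```python
def overlap_smooth(x, r0=4, r1=3):
    """Apply the 4x4 overlap smoothing matrix to pixels x0..x3 across an edge.

    The edge lies between x1 and x2; r0/r1 are rounding values (defaults assumed).
    """
    x0, x1, x2, x3 = x
    return [
        (7 * x0 + x3 + r0) >> 3,
        (-x0 + 7 * x1 + x2 + x3 + r1) >> 3,
        (x0 + x1 + 7 * x2 - x3 + r0) >> 3,
        (x0 + 7 * x3 + r1) >> 3,
    ]


# Usage: a hard step of 80 across the edge becomes a gentler ramp.
print(overlap_smooth([0, 0, 80, 80]))  # [10, 20, 60, 70]
```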


Low-rate tools (rate control tools)
Dynamic range reduction (intensity resolution)
– Luminance and chrominance values may be scaled down by a factor of 2 before coding.
Dynamic frame resizing (spatial resolution)
– The coded frame size may be halved vertically, horizontally or both, to further reduce the rate and meet a constant bitrate requirement.

[Figure: original frame compared with intensity range reduction and frame re-sizing]
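A sketch of both tools. The range-reduction mapping halves and later doubles the excursion around mid-gray (128); this is a plausible form consistent with "scaled down by a factor of 2", not a quote from the standard. The resizing helpers use pair averaging and pixel repetition as placeholder filters, since the actual resampling filters are not given on the slide. All function names are hypothetical.

```python
def range_reduce(sample):
    """Encoder side: halve the excursion around mid-gray (128)."""
    return ((sample - 128) >> 1) + 128


def range_expand(sample):
    """Decoder side: double the excursion back and clip to 8 bits."""
    return max(0, min(255, ((sample - 128) << 1) + 128))


def halve_width(frame):
    """Downsample horizontally by averaging pixel pairs (placeholder filter)."""
    return [[(row[i] + row[i + 1] + 1) >> 1 for i in range(0, len(row) - 1, 2)]
            for row in frame]


def double_width(frame):
    """Upsample horizontally by pixel repetition (placeholder filter)."""
    return [[p for p in row for _ in (0, 1)] for row in frame]


# Usage: range reduction loses the least significant bit of the excursion.
print(range_reduce(201), range_expand(range_reduce(201)))  # 164 200
print(double_width(halve_width([[10, 20, 30, 41]])))       # [[15, 15, 36, 36]]
```

Both tools trade detail for rate; the decoder restores the original range or size before display, so the loss shows up as coarser intensity steps or softer detail rather than as a smaller picture.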


Fading compensation

Effective with global illumination changes:
– Natural illumination changes
– Artificial transition effects, such as fade-to-black, fade-from-black, dissolves, blending, cross-fades and morphing

The encoder detects fading before motion compensation by comparing an error measure with a threshold.
Encoder and decoder then apply a first-order linear function with quantized fading parameters to map the original reference frame to a new reference frame.
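The fading model can be sketched as a first-order fit: estimate cur ≈ a·ref + b, quantize a and b, and map the reference through a lookup table. The least-squares fit, the 1/64 precision for a, the integer shift, the mean-absolute-error measure and the threshold value are all assumptions for illustration; the slide only states that the parameters are quantized and that detection compares an error measure with a threshold.

```python
def estimate_fading(ref, cur):
    """Least-squares fit of cur ~= a * ref + b over co-located pixels."""
    n = len(ref)
    mr = sum(ref) / n
    mc = sum(cur) / n
    var = sum((r - mr) ** 2 for r in ref)
    cov = sum((r - mr) * (c - mc) for r, c in zip(ref, cur))
    a = cov / var if var else 1.0
    return a, mc - a * mr


def quantize_fading(a, b):
    """Quantize the scale to 1/64 steps and the shift to integers (assumed precision)."""
    return round(a * 64), round(b)


def fading_lut(scale_q, shift):
    """Map every 8-bit reference value through the quantized linear function."""
    return [max(0, min(255, (scale_q * p + shift * 64 + 32) >> 6)) for p in range(256)]


def detect_and_compensate(ref, cur, threshold=2.0):
    """Flag fading when the model lowers the mean absolute error by more than threshold."""
    before = sum(abs(c - r) for r, c in zip(ref, cur)) / len(ref)
    scale_q, shift = quantize_fading(*estimate_fading(ref, cur))
    lut = fading_lut(scale_q, shift)
    after = sum(abs(c - lut[r]) for r, c in zip(ref, cur)) / len(ref)
    if before - after > threshold:
        return True, scale_q, shift, [lut[r] for r in ref]
    return False, 64, 0, list(ref)


# Usage: the current frame is the reference at half brightness plus 10.
ref = list(range(0, 201, 20))
cur = [r // 2 + 10 for r in ref]
found, scale_q, shift, _ = detect_and_compensate(ref, cur)
print(found, scale_q, shift)  # True 32 10
```

Because only two quantized parameters are sent, the side information is tiny compared with the residual a fade would otherwise leave after motion compensation.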


Video smoothing
Interpolates missing frames after decoding; also referred to as frame interpolation.
Uses an advanced optical flow estimation technique (on a per-pixel basis), along with warping, to synthesize new frames.
Requires a 733 MHz CPU to interpolate a 320x240 video clip from 10 to 30 fps.
J. Ribas-Corbera and J. Sklansky, “Interframe interpolation of cinematic sequences,” Journal of Visual Communication and Image Representation, Dec 1993.


Profiles and levels

Simple profile
Main profile
Advanced profile


Comparison among HD-DVD video candidates
Feature | MPEG-2 Video | SMPTE VC-9 | H.264/AVC
Motion resolution & interpolation | ½-pel bilinear | Adaptive: ½-pel bilinear, ½-pel 4-tap FIR, ¼-pel 4-tap FIR/direct | ¼-pel 6-tap FIR/cascaded
Motion block size | 16x16 | 16x16, 8x8 | 16x16, 16x8, …, 4x4
Brightness change | N/A | Intensity compensation (P/B) | Weighted prediction (B)
Intra prediction | Frequency-domain prediction (DC) | Frequency-domain prediction (DC/AC) | Spatial-domain prediction
Context-adaptive multiple VLCs | N/A | Y | Y
Bitplane coding | N/A | Y | N/A
Dynamic frame resizing, dynamic range reduction | N/A | Y | N/A
Data partitioning | N/A | N/A | Slice-level partitioning
Bitstream switching | N/A | System level | SI/SP frames
Post-processing | Optional | In-the-loop deblocking, overlapped transform | In-the-loop deblocking
Quantization | Uniform | Adaptive uniform and non-uniform | Log scale
Arithmetic coding | N/A | N/A | Y (Main profile)
Reference frames (P/B) | 1/2 | 1/2 | M/M
Generalized B | N/A | N/A | Y
Inter-intra mixed | N/A | Y | N/A
Transform size & type | 8x8 float | 8x8, 8x4, 4x8, 4x4 integer | 4x4 integer (only +, >>)
The slide groups these features under prediction coding; transform coding, entropy coding & post-processing; streaming & error resilience; and rate control.


WMV vs. MPEG-2


WMV vs. MPEG-4 SP


WMV vs. H.264
[Figure: PSNR-Y vs. bitrate (kbps) for Glasgow_qcif_15fps, comparing WMV9 and H264-1ref]


Conclusions

Software and hardware components can be developed based on the SDKs or the WM hardware porting kits.
WM 9 provides a variety of state-of-the-art audio and video codecs for different applications.
The quality of WMV-9 is competitive with H.264/AVC, and arguably superior according to several independent tests, with significantly lower computational complexity.
This paper explains why some of the tools unique to WMV-9 provide an intrinsic quality benefit over H.264/AVC.


Reading assignment

Mandatory
– Sridhar Srinivasan et al., Windows Digital Media Division, Microsoft Corporation, “Windows Media Video 9: overview and applications,” Signal Processing: Image Communication, Oct 2004.


Homework

7. A composite symbol represents different properties of one MB and tries to exploit their joint occurrence probability. Bitplane coding collects the same symbol for all MBs and removes the correlation between them. Can you devise a way to take advantage of both simultaneously?