
Video Coding

C.M. Liu

Perceptual Signal Processing Lab

College of Computer Science

National Chiao-Tung University

Office: EC538

(03)5731877

cmliu@cs.nctu.edu.tw


http://www.csie.nctu.edu.tw/~cmliu/Courses/Compression/

1. Color Fundamentals

Sir Isaac Newton in 1666

A glass prism

Six Broad Regions

Violet, Blue, Green, Yellow, Orange, and Red.

2. Color Fundamentals

Light Properties

Visible light is composed of a relatively narrow band of frequencies (400-700 nm) in the electromagnetic spectrum.

Achromatic light has only one attribute: its intensity, or amount.

Three basic quantities are used to describe the quality of a chromatic light source: radiance, luminance, and brightness.

2. Color Fundamentals

Radiance

The total amount of energy that flows from the light source.

It is measured in watts (W).

Luminance

A measure of the amount of energy an observer perceives from a light source.

It is measured in lumens (lm).

Brightness

A subjective descriptor that is practically impossible to measure.

It embodies the achromatic notion of intensity and is one of the key factors in describing color sensation.

2. Color Fundamentals

Human Eye

Cones (6-7 million in each eye)

65% of all cones are sensitive to red light.

33% are sensitive to green light.

2% are sensitive to blue light.

Primary Colors of Light

Primary Colors

Red, Green, Blue.

Secondary Colors

Cyan = G + B

Magenta = B + R

Yellow = R + G

White = R + G + B

2. Color Fundamentals

Primary Colors of Pigments

Primary colors: C, M, Y

C absorbs R; M absorbs G; Y absorbs B

Secondary Colors

C + M absorbs R and G: C + M = B

M + Y absorbs G and B: M + Y = R

Y + C absorbs B and R: Y + C = G

C+M+Y = K (black)

2. Color Fundamentals

Color

Specified by brightness and chromaticity (hue and saturation). Chromaticity can be regarded as brightness-normalized color.

Hue is an attribute associated with the dominant wavelength in a mixture of light waves.

Saturation refers to the relative purity, or the amount of white light mixed with a hue. The degree of saturation is inversely proportional to the amount of white light added.

Tristimulus Values (R, G, B)

The amounts of red, green, and blue required to form a specific color.

Trichromatic Coefficients

x = R/(R+G+B), y = G/(R+G+B), z = B/(R+G+B)

Note that x + y + z = 1; thus, the trichromatic coefficients can be represented by x and y alone (a 2D space).
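A minimal Python sketch of the trichromatic coefficients defined above (the function name is illustrative, not from the course material). Scaling R, G, and B by a common factor leaves x, y, z unchanged, which is why chromaticity is described as brightness-normalized color.

```python
def trichromatic_coefficients(r, g, b):
    """Return (x, y, z) = (R, G, B) / (R + G + B)."""
    total = r + g + b
    if total == 0:
        raise ValueError("chromaticity is undefined when R + G + B = 0")
    return r / total, g / total, b / total


x, y, z = trichromatic_coefficients(2, 4, 2)
print(x, y, z)                               # 0.25 0.5 0.25
print(trichromatic_coefficients(1, 2, 1))    # half the brightness, same chromaticity
print(1 - x - y == z)                        # z is recoverable from x and y
```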

2. Color Fundamentals-- CIE Chromaticity Diagram

Pure (spectral) colors lie on the boundary of the tongue-shaped region.

Suppose that we have two colors (two points within the tongue).

Any point on the line segment joining them can be obtained by mixing the two colors.

2. Color Fundamentals

Color Gamut

Any point within the triangle formed by three primaries can be obtained by mixing those three colors.

Not all colors can be reproduced using only three primary colors.

The boundary of the printing gamut is irregular because color printing is a combination of additive and subtractive mixing.

The irregular region represents the colors that can be reproduced by a typical printing device based on the CMY space.
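To make the gamut triangle concrete, the sketch below tests whether a chromaticity point (x, y) lies inside the triangle spanned by three primaries, using the sign of 2D cross products along each edge. The ITU-R BT.709 primaries are used only as an example set; the function name is hypothetical.

```python
def in_gamut(p, primaries):
    """True if chromaticity point p lies inside (or on) the triangle of three primaries."""
    (x1, y1), (x2, y2), (x3, y3) = primaries
    px, py = p

    def cross(ax, ay, bx, by, cx, cy):
        # z-component of (b - a) x (c - a): its sign tells which side of edge a->b point c is on
        return (bx - ax) * (cy - ay) - (by - ay) * (cx - ax)

    d1 = cross(x1, y1, x2, y2, px, py)
    d2 = cross(x2, y2, x3, y3, px, py)
    d3 = cross(x3, y3, x1, y1, px, py)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)     # inside when all non-zero signs agree


BT709 = ((0.64, 0.33), (0.30, 0.60), (0.15, 0.06))   # R, G, B chromaticities (example)
print(in_gamut((0.3127, 0.3290), BT709))             # D65 white point
print(in_gamut((0.10, 0.80), BT709))                 # saturated green outside the triangle
```

A point outside the triangle can still be a visible color; it simply cannot be produced by mixing these three primaries.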

3. Color Models

Definition

A color model is a specification of a coordinate system and a subspace within that system where each color is represented by a single point.

Purpose

Facilitate the specification of colors in some standard way.

RGB (red, green, blue)

Display devices

CMY (cyan, magenta, yellow) and CMYK(+black)

Printing devices

HSI (hue, saturation, intensity)

Intuitive description of colors.

Hue (red, green, … violet)

Saturation

Red: high saturation

Pink: less saturation

3. Color Models-- RGB

Full Color (True Color)

Each of the R, G, B components is represented by 8 bits.

An RGB pixel is said to have a depth of 24 bits.

There are 2^24 (= 16,777,216) colors.

Safe RGB Colors

A subset of colors that are likely to be reproduced faithfully, reasonably independently of viewer hardware capabilities.

6^3 = 216 safe colors

6 reproduction levels for each of R, G, B:

0, 51, 102, 153, 204, or 255.
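The 216 safe colors are just every combination of the six levels above. A short sketch that builds the palette and snaps an arbitrary 8-bit value to the nearest safe level (the helper names are illustrative):

```python
LEVELS = (0, 51, 102, 153, 204, 255)   # the 6 reproduction levels per channel


def safe_palette():
    """All 6 x 6 x 6 = 216 safe RGB colors."""
    return [(r, g, b) for r in LEVELS for g in LEVELS for b in LEVELS]


def nearest_safe(value):
    """Snap an 8-bit channel value to the closest safe level (the levels are multiples of 51)."""
    return 51 * round(value / 51)


palette = safe_palette()
print(len(palette))                          # 216
print("#%02X%02X%02X" % palette[1])          # #000033
print(tuple(nearest_safe(v) for v in (200, 30, 128)))
```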

3. Color Models– RGB

3. Color Models– RGB Full Color

3. Color Models– RGB Safe Color

3. Color Models-- CMY

CMY and CMYK Models

Color printers and copiers require CMY data input or perform an RGB-to-CMY conversion internally.

C + M + Y = K

Black is the color used most frequently in typical printing; thus, a separate black ink is added, yielding the CMYK color model.
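A sketch of the conversions described above, assuming all values are normalized to [0, 1]. The RGB-to-CMY step is the usual complement (each ink removes its complementary primary). The CMY-to-CMYK step is a naive under-color removal that pulls the common gray component out as black; it is an illustrative assumption, since real printers use device-specific color profiles.

```python
def rgb_to_cmy(r, g, b):
    """RGB in [0, 1] -> CMY in [0, 1]: C absorbs R, M absorbs G, Y absorbs B."""
    return 1.0 - r, 1.0 - g, 1.0 - b


def cmy_to_cmyk(c, m, y):
    """Naive under-color removal: replace the gray part common to C, M, Y with black ink."""
    k = min(c, m, y)
    if k == 1.0:                          # pure black: no colored ink needed
        return 0.0, 0.0, 0.0, 1.0
    return (c - k) / (1 - k), (m - k) / (1 - k), (y - k) / (1 - k), k


print(rgb_to_cmy(1, 0, 0))                     # red: no cyan, full magenta and yellow
print(cmy_to_cmyk(*rgb_to_cmy(0.5, 0.5, 0.5)))  # mid gray printed with black ink only
```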

3. Color Models-- HSI

HSI Model

Intensity axis: the main diagonal from black to white

A plane perpendicular to this axis contains the colors with the same intensity

Saturation: the distance from the intensity axis

Hue: the angle on the plane with respect to the red color


3. Color Models– RGB to HSI

Hue

Saturation

Intensity
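The formulas for this slide did not survive extraction. As a sketch, the function below uses the commonly used textbook RGB-to-HSI formulas, which are consistent with the HSI-to-RGB sector equations on the next slide (this choice of variant is an assumption): I = (R+G+B)/3, S = 1 - 3 min(R,G,B)/(R+G+B), and H = θ if B ≤ G, else 360° - θ, with θ = arccos{½[(R-G)+(R-B)] / sqrt((R-G)² + (R-B)(G-B))}.

```python
import math


def rgb_to_hsi(r, g, b):
    """RGB in [0, 1] -> (H in degrees, S in [0, 1], I in [0, 1])."""
    i = (r + g + b) / 3.0
    if i == 0:
        return 0.0, 0.0, 0.0                 # black: hue and saturation undefined
    s = 1.0 - min(r, g, b) / i               # same as 1 - 3 min(R, G, B) / (R + G + B)
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if den == 0:
        return 0.0, s, i                     # gray: hue undefined, reported as 0
    # clamp guards against rounding pushing the ratio slightly outside [-1, 1]
    theta = math.degrees(math.acos(max(-1.0, min(1.0, num / den))))
    h = theta if b <= g else 360.0 - theta
    return h, s, i


print(rgb_to_hsi(1, 0, 0))    # pure red: H = 0, S = 1, I = 1/3
print(rgb_to_hsi(0, 0, 1))    # pure blue: H close to 240
```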

3. Color Models– HSI to RGB

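Converting colors from HSI to RGB

There are three sectors of interest, corresponding to the 120° intervals in the separation of primaries (H in degrees; S and I in [0, 1]).

RG sector (0° ≤ H < 120°):

$$B = I(1 - S), \qquad R = I\left[1 + \frac{S\cos H}{\cos(60^\circ - H)}\right], \qquad G = 3I - (R + B)$$

GB sector (120° ≤ H < 240°): first let $H = H - 120^\circ$, then

$$R = I(1 - S), \qquad G = I\left[1 + \frac{S\cos H}{\cos(60^\circ - H)}\right], \qquad B = 3I - (R + G)$$

BR sector (240° ≤ H ≤ 360°): first let $H = H - 240^\circ$, then

$$G = I(1 - S), \qquad B = I\left[1 + \frac{S\cos H}{\cos(60^\circ - H)}\right], \qquad R = 3I - (G + B)$$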
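The three sector formulas share one pattern: after shifting H into [0°, 120°), one primary gets I(1 - S), the next gets the cosine-ratio term, and the last is whatever keeps R + G + B = 3I. A sketch under that reading (the helper names are illustrative; combinations of H, S, I outside the RGB cube can yield values above 1):

```python
import math


def hsi_to_rgb(h, s, i):
    """(H in degrees, S and I in [0, 1]) -> (R, G, B), using the three sector formulas."""
    h = h % 360.0

    def sector(hh):
        # hh is the hue offset within the sector, in [0, 120)
        low = i * (1.0 - s)
        high = i * (1.0 + s * math.cos(math.radians(hh)) / math.cos(math.radians(60.0 - hh)))
        return low, high

    if h < 120.0:                 # RG sector
        b, r = sector(h)
        g = 3.0 * i - (r + b)
    elif h < 240.0:               # GB sector
        r, g = sector(h - 120.0)
        b = 3.0 * i - (r + g)
    else:                         # BR sector
        g, b = sector(h - 240.0)
        r = 3.0 * i - (g + b)
    return r, g, b


print(hsi_to_rgb(0, 1, 1 / 3))     # approximately (1, 0, 0)
print(hsi_to_rgb(200, 0, 0.4))     # zero saturation gives gray
```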

3. Color Models– HSI

RGB with HSI representation

Discontinuity.

3. Color Models– HSI

Manipulation Example

Analog Video

Video: a sequence of images played back fast enough to produce the illusion of motion.

Early movies:

16 frames per second (fps)

Updated to 24 fps

Double-/triple-blade shutters: artificial 48/72 fps (each frame is flashed 2 or 3 times)

European standards PAL/SECAM

50 Hz electricity: 25 fps

US/Japan/etc.

60 Hz electricity: 30 fps

29.97 fps (since 1953, with color TV)

TV

Television, a medium. So called because it is neither rare nor well done.

--Anonymous

The CRT

Interlacing

Odd lines

Retrace (horizontal blanking)

Even lines

The Color CRT

The NTSC Standard

525 scan lines, of which 482/483/487(?) are visible

Aspect ratio: 4:3 (by T. Edison); 4/3 x 483 ≈ 644 pixels

The image is actually continuous horizontally

Other aspect ratios:

PAL/SECAM: 1.33

16/35mm film: 1.33

HDTV: 1.78

Widescreen film: 1.85

70mm film: 2.10

Cinemascope film: 2.35

Pel aspect ratio: a picture element is not a dot but more like a rectangle

Various Resolutions

TV Formats

Video Parameters

Progressive Scan, Frame Rate.

The RGB-to-YUV Conversion

PAL:

Y = 0.299R + 0.587G + 0.114B

U = -0.147R - 0.289G + 0.436B = 0.492 (B - Y)

V = 0.615R - 0.515G - 0.100B = 0.877 (R - Y)

NTSC:

Y = 0.299R + 0.587G + 0.114B

I = 0.596R - 0.274G - 0.322B = -sin33° U + cos33° V

Q = 0.211R - 0.523G + 0.311B = cos33° U + sin33° V
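A sketch that computes YUV from the difference-signal forms U = 0.492(B - Y) and V = 0.877(R - Y), then rotates (U, V) by 33° to obtain I and Q. Expanding the products reproduces the per-channel coefficients listed above, including the positive B coefficient of U and Q. R, G, B are assumed normalized to [0, 1]; the function names are illustrative.

```python
import math


def rgb_to_yuv(r, g, b):
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.492 * (b - y)
    v = 0.877 * (r - y)
    return y, u, v


def yuv_to_yiq(y, u, v):
    a = math.radians(33.0)
    i = -math.sin(a) * u + math.cos(a) * v
    q = math.cos(a) * u + math.sin(a) * v
    return y, i, q


print(yuv_to_yiq(*rgb_to_yuv(1, 0, 0)))   # I and Q of pure red: about 0.596 and 0.211
print(yuv_to_yiq(*rgb_to_yuv(0, 0, 1)))   # Q of pure blue is about +0.311
```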

CCIR 601/ITU-R BT.601-2

Standard sampling rates

Multiples of 3.375 MHz

Sampling patterns Y:Cb:Cr

4:4:4: all components sampled at 13.5 MHz

Typical 4:2:2

Luminance sampled at 13.5 MHz

Chrominance components sampled at 6.75 MHz

RGB to CCIR 601

After RGB-to-YCbCr conversion:

Y normalized as Ys ∈ [0, 1]

Cb normalized as Cbs ∈ [-½, ½]

Cr normalized as Crs ∈ [-½, ½]

8-bit integer conversion:

Y = 219Ys + 16, Y ∈ [16, 235]

U = 224Cbs + 128, U ∈ [16, 240]

V = 224Crs + 128, V ∈ [16, 240]
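A sketch of the 8-bit mapping above. The slide gives only the normalized ranges, so the normalization Cbs = (B - Y)/1.772 and Crs = (R - Y)/1.402 is an assumption here; these are the BT.601 factors that scale B - Y and R - Y into [-½, ½]. Inputs are gamma-corrected R, G, B in [0, 1].

```python
def rgb_to_ycbcr601(r, g, b):
    """R, G, B in [0, 1] -> 8-bit (Y, Cb, Cr) with head- and foot-room, as on the slide."""
    ys = 0.299 * r + 0.587 * g + 0.114 * b   # Ys in [0, 1]
    cbs = (b - ys) / 1.772                   # Cbs in [-1/2, 1/2]
    crs = (r - ys) / 1.402                   # Crs in [-1/2, 1/2]
    y = round(219 * ys + 16)                 # Y in [16, 235]
    cb = round(224 * cbs + 128)              # Cb in [16, 240]
    cr = round(224 * crs + 128)              # Cr in [16, 240]
    return y, cb, cr


print(rgb_to_ycbcr601(0, 0, 0))   # black: (16, 128, 128)
print(rgb_to_ycbcr601(1, 1, 1))   # white: (235, 128, 128)
```

Black and white land at the ends of the luma range with neutral chroma; the 16 and 20 codes left at each end serve as head- and foot-room.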

Common Intermediate Format (CIF)

Teleconferencing standard

PAL/NTSC-based: YUV/30 fps

Dimensions are multiples of 16 x 16:

SQCIF: 128 x 96

QCIF: 176 x 144

CIF: 352 x 288

4CIF: 704 x 576

16CIF: 1408 x 1152

Pixel aspect ratio: 1.222:1

SIF: Source Input Format

MPEG-1's parlance for CIF

625-line (PAL) and 525-line (NTSC) versions
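Because every format is a multiple of 16 x 16, each frame divides exactly into macroblocks. A quick count, assuming 16 x 16 luminance macroblocks and 4:2:0 chroma planes (half size in each direction). CIF gives the 396 macroblocks and the 525-line SIF (352 x 240) gives the 330 that appear in the MPEG-1 constrained parameters later in these notes.

```python
FORMATS = {
    "SQCIF": (128, 96),
    "QCIF": (176, 144),
    "CIF": (352, 288),
    "4CIF": (704, 576),
    "16CIF": (1408, 1152),
}


def macroblocks(width, height, mb=16):
    """Number of mb x mb luminance macroblocks in a width x height frame."""
    assert width % mb == 0 and height % mb == 0, "dimensions must be multiples of 16"
    return (width // mb) * (height // mb)


for name, (w, h) in FORMATS.items():
    print(f"{name:6s} {w}x{h}: {macroblocks(w, h)} macroblocks, 4:2:0 chroma {w // 2}x{h // 2}")
print("SIF (525-line) 352x240:", macroblocks(352, 240), "macroblocks")
```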

Motion Compensation

The use of previous frames as a prediction of the current frame

i.e., exploitation of temporal redundancy

Rationale:

Most of the time, frame-to-frame changes will be 'small'

Idea:

Identify 'objects' that have moved and include a motion compensation vector
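The block-based version of this idea searches a reference frame for the displacement that best matches each block of the current frame. A minimal full-search sketch: the sum of absolute differences (SAD) as the matching criterion, the ±7 search window, and the function names are assumptions for illustration (SAD is a common choice, not one the slides prescribe).

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block at (bx, by) in cur
    and the block displaced by (dx, dy) in ref."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
    return total


def full_search(cur, ref, bx, by, n=16, search=7):
    """Exhaustive block matching over [-search, search]^2, skipping displacements that
    would leave the reference frame; returns (dx, dy, cost) of the best match."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            if not (0 <= by + dy and by + dy + n <= h and 0 <= bx + dx and bx + dx + n <= w):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best
```

Full search visits (2 x search + 1)² candidates per block, which is why the slides below note that many faster methods trade prediction accuracy for computation time.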

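Motion Compensation Example

Frame #1

Frame #2

Frames 1 & 2 overlaid

Motion vectors

Motion Compensation Example

Block-based Motion Compensation

'Pixel-splitting' Motion Estimation

Observation

Best fit may not be pixel aligned

Idea:

"Double" the image size

i.e., introduce intermediate pixels with interpolated values

With A, B the upper and C, D the lower of four neighboring integer pixels:

$$h_1 = \frac{A+B}{2} + 0.5, \qquad h_2 = \frac{C+D}{2} + 0.5$$

$$v_1 = \frac{A+C}{2} + 0.5, \qquad v_2 = \frac{B+D}{2} + 0.5$$

$$c = \frac{A+B+C+D}{4} + 0.5$$

(the + 0.5 term rounds the result when it is truncated to an integer)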
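A sketch of the "double the image size" step with integer arithmetic: (A + B + 1) // 2 equals ⌊(A + B)/2 + 0.5⌋ and (A + B + C + D + 2) // 4 equals ⌊(A + B + C + D)/4 + 0.5⌋ for non-negative pixels. The output is (2H - 1) x (2W - 1) because no values are extrapolated past the last row and column; that border handling is an assumption of this sketch.

```python
def half_pel_upsample(img):
    """Original pixels at even positions, interpolated h, v, c values at half-pixel positions."""
    h, w = len(img), len(img[0])
    out = [[0] * (2 * w - 1) for _ in range(2 * h - 1)]
    for y in range(h):
        for x in range(w):
            a = img[y][x]
            out[2 * y][2 * x] = a
            if x + 1 < w:                          # h: between A and its right neighbor B
                out[2 * y][2 * x + 1] = (a + img[y][x + 1] + 1) // 2
            if y + 1 < h:                          # v: between A and the pixel C below it
                out[2 * y + 1][2 * x] = (a + img[y + 1][x] + 1) // 2
            if x + 1 < w and y + 1 < h:            # c: center of A, B, C, D
                b, c, d = img[y][x + 1], img[y + 1][x], img[y + 1][x + 1]
                out[2 * y + 1][2 * x + 1] = (a + b + c + d + 2) // 4
    return out


for row in half_pel_upsample([[10, 20], [30, 41]]):
    print(row)
```

Block matching on the upsampled image then yields motion vectors in half-pixel units.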

Motion Estimation Considerations

Observations:

Smaller blocks: more possibilities to explore

Larger blocks: higher chance of not finding a match

Note: numerous methods exist for balancing prediction accuracy (compression) and computation time

Motion Estimation Example

Subpixel Estimation

MPEG-1/2

MPEG-1 (ISO/IEC 11172), completed in 1991

digital storage media at bit rates up to about 1.5 Mbps

removes intra- and inter-frame redundancy with block-based DCT and motion compensation (I, P, and B-frames)

progressive pictures only, optimized for SIF (352x240) resolution

fixed 4:2:0 color format

MPEG-2 (ISO/IEC 13818), completed in 1994

extensions that allow for greater input format flexibility, higher data rates, and better error resilience

field/frame prediction modes for interlace format support

field/frame DCT coding syntax

downloadable quantization matrix

scalability extensions (spatial, temporal, SNR)

display syntax (e.g., 3:2 pull-down, pan-and-scan, color formats)

MPEG-3, -4, -7

MPEG-3

– Originally intended for HDTV coding; dropped when the MPEG-2 application domain was extended to HDTV

MPEG-4

– Originally intended for very low bit rate audio/visual coding

– It may be extended to both low and high bit rate applications

– Object-oriented coding algorithm

MPEG-7

– There is no particular reason the number 7 was chosen instead of 5, 6, or another number

– Intended to set a standard, "Multimedia Content Description Interface", that specifies a standardized description of various types of multimedia information.

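Bitrates and Resolutions

[Figure: bit rate axis (64K, 1M, 1.5M, 15M, 300M, over 600M bps) against resolution (QCIF, CIF, Standard TV, HDTV, 1080P, over 1080P), showing the ranges covered by MPEG-2, the MPEG-2 4:2:2 Profile, MPEG-4, and the MPEG-4 Studio Profile]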

MPEG1

Introduction

Video Formats

Frame Reorder

Data Hierarchy

Syntax

Compression Ratio

MPEG1-- Introduction

Background

ISO/IEC Draft Standard CD 11172, Dec., 1991

Compression and Decompression of Video & Audio Signals

Synchronization of Audio and Video

Lossy Coding Techniques

Features

A Toolkit

Supports intra and interframe modes.

Only progressive-format data is supported.

The algorithm specifies the bit stream syntax and semantics and a method for decoding it.

MPEG-1 Video

Overall structure very similar to H.261

… with some non-trivial differences

Focus on stored as opposed to live video

Random access

In H.261, potentially every frame after the first may depend on the previous one

MPEG-1 provides random access by requiring periodic independently encoded frames (I-frames)

The distance between I-frames is a trade-off between convenience and compression

Also

P-frames: predictively coded

B-frames: bidirectionally predictively coded

MPEG1-- Introduction (c.1)

Features

The algorithm does not specify preprocessing of the video, encoding steps (e.g., motion estimation), or postprocessing.

The algorithm does not specify parameters such as the coded bit rate, lines per picture (<4096), pels per line (<4096), picture rate (e.g., 24, 25, or 30), and pel aspect ratio (14 choices).

A special subset of the parameter space (constrained parameters)

pels/line <= 768

lines <= 576

macroblocks per picture <= 396

macroblocks per sec. <= 396x25

picture rate <= 30

bit rate <= 1.86 Mbits/sec.

4:2:0 Format

Chrominance components have 1/2 the resolution of the Y component (in both directions)

The MPEG1 format

4:2:0 macroblocks

MPEG1-- Video Formats

[Figure: 4:2:0 sampling grid showing the Y, Cb, and Cr sample positions]

MPEG1-- Video Formats(c.1)

4:1:1 Format

Same bits per pel area as 4:2:0, but the vertical resolution is higher

Used in DVI

[Figure: 4:1:1 sampling grid]

MPEG1-- Video Formats(c.2)

4:2:2 Format

4:2:2 Macroblock

[Figure: 4:2:2 macroblock showing the Y, Cb, and Cr sample positions]

MPEG1-- Video Formats(c.3)

4:4:4 Macroblock

[Figure: 4:4:4 macroblock showing the Y, Cb, and Cr sample positions]

MPEG1-- Frame Reorder

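GOP1 is a closed GOP, while GOP2 is an open GOP.

Encoder input (display order):

1(I) 2(B) 3(B) 4(P) 5(B) 6(B) 7(P) 8(B) 9(B) 10(I) 11(B) 12(B) 13(P)

Encoder output (coded order):

1(I) 4(P) 2(B) 3(B) 7(P) 5(B) 6(B) 10(I) 8(B) 9(B) 13(P) 11(B) 12(B)

Decoder output (display order):

1(I) 2(B) 3(B) 4(P) 5(B) 6(B) 7(P) 8(B) 9(B) 10(I) 11(B) 12(B) 13(P)

In coded order, GOP2 starts at 10(I); 8(B) and 9(B) follow it but still reference 7(P) in GOP1, which is what makes GOP2 open.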

MPEG1-- Data Hierarchy

Video Sequence

The highest syntactic structure of the coded video bitstream.

Sequence header, sequence extension.

Group of Pictures (GOP)

Picture, Slice, Macroblock, Block

MPEG1-- Data Hierarchy (GOP)

I Pictures

Not dependent on other pictures

P Pictures

Predicted from I or P pictures

Example GOP: 1(I) 2(B) 3(B) 4(B) 5(P) 6(B) 7(B) 8(B), followed by 1(I) of the next GOP

Forward Prediction

Bidirectional Prediction

B Pictures

Predicted from nearby I and/or P pictures

MPEG-1 Frame Types

I: all information for the frame is present.

P: predictively encoded from the previous I or P.

B: predictively encoded from the previous I or P and the next I or P.

Example sequence (display order): I B B P B B P B B P B B P B B I

MPEG-1 Display vs. Bitstream Order

Problem: B-frames depend on the future; how can they be decoded?

Solution: reorder the frames!

Display order: 1(I) 2(B) 3(B) 4(P) 5(B) 6(B) 7(P) 8(B) 9(B) 10(P) 11(B) 12(B) 13(P) 14(B) 15(B) 16(I)

Bitstream order: 1(I) 4(P) 2(B) 3(B) 7(P) 5(B) 6(B) 10(P) 8(B) 9(B) 13(P) 11(B) 12(B) 16(I) 14(B) 15(B)
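The reordering rule can be stated in one line: hold B pictures back until the anchor (I or P) that follows them in display order has been sent. A minimal sketch (the function name is illustrative; frames are (number, type) pairs):

```python
def display_to_coded(frames):
    """Reorder (number, type) pairs from display order into coded (bitstream) order."""
    coded, pending_b = [], []
    for frame in frames:
        if frame[1] == "B":
            pending_b.append(frame)      # needs the *next* anchor, so hold it back
        else:
            coded.append(frame)          # send the anchor (I or P) first ...
            coded.extend(pending_b)      # ... then the B pictures that precede it in time
            pending_b = []
    coded.extend(pending_b)              # trailing B pictures with no later anchor in the list
    return coded


display = [(n + 1, t) for n, t in enumerate("IBBPBBPBBPBBPBBI")]
print([f"{n}({t})" for n, t in display_to_coded(display)])
```

The decoder undoes this by displaying each anchor only after the B pictures that precede it in time, i.e., by sorting back on the frame number.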

MPEG1-- Functional Blocks

[Figure: encoder block diagram with Video Input, DCT, Q, VLC, Buffer, Output, Q^-1 & DCT^-1, Frame Store, and ME & MC]

Control signals, changeable per macroblock:

• Macroblock type

• Coded Block Pattern

• Quantizer Scale Factor

MPEG1-- Motion Compensation & Estimation

Objectives

Reduce the temporal redundancy

Motion Compensation

The process of compensating for the displacement of moving objects from one frame to another.

Motion Estimation

The process of finding corresponding pixels between frames.

[Figure: frames at times t, t+1, t+2 (I, B, P) with 16x16 macroblocks and their search areas]

Motion/No motion: check whether a motion vector is transmitted or assumed to be zero.

Intra/Non Intra: check the variance of the estimation errors using the vector from step 1.

Coded/Not coded: check whether the residual is large enough to be coded using the DCT.

Quant/No Quant: check whether the quantizer scale is satisfactory or should be changed.

MPEG1-- Encoded Tree for Macroblocks in P-picture

[Figure: encoding tree for P-picture macroblocks, branching on Motion / No Motion, Intra / Non Intra, Coded / Not Coded, and Quant / No Quant; a non-intra, not-coded macroblock without motion is a skipped MB]

MPEG-1 Rate Control

Sequence level: within a GOP (Group of Pictures)

B-frames are easiest to eliminate

Frame level:

Quantization step adjustment

Dropping of higher-order coefficients

Constrained Parameter Bitstream (CPB)

Horizontal size <= 768 pixels

Vertical size <= 576 pixels

396 macroblocks/frame @ 25 fps (352 x 288 pixels)

330 macroblocks/frame @ 30 fps (352 x 240 pixels)

Rate is 1-1.5 Mbit/s

CPB is understood to be the MPEG-1 typical setup

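MPEG1-- Tree Decision for Macroblocks in P-picture

VAR: variance of original

VAROR: variance of reconstructed error

[Figure: two decision charts. Motion compensation vs. no motion compensation is decided from the block differences BD (divided by 256) with the motion vector and with the zero vector, using the breakpoints 0.5, 1, 1.5, 2.7, 3 and the line y = x/1.1. Intra vs. non-intra is decided from VAR and VAROR, with the threshold 64.]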

MPEG1-- Macroblock Types in B-picture

Macroblock Types

Intracoded

Forward predictive coded

Backward predictive coded

Bidirectional predictive coded

Bidirectional prediction

Yields accurate prediction in the case of covered/uncovered regions.

Two pictures are needed to decode a B picture.

Significant compression relative to unidirectional prediction (7 kbits/picture versus 100 kbits/picture).

MPEG1-- Encoded Tree for Macroblocks in B-picture

[Figure: encoding tree for B-picture macroblocks, branching on Forward / Backward / Forward & Backward / Intra, then Coded / Not Coded and MQuant / No MQuant]

MPEG-1 Bitstream Syntax

Macroblock and Slice

MPEG-1 Bitstream Syntax (2)

extension.header 0000 01B5

GOP.start 0000 01B8

picture.start 0000 0100

reserved 0000 01B0

reserved 0000 01B1

reserved 0000 01B6

sequence.end 0000 01B7

sequence.error 0000 01B4

sequence.header 0000 01B3

slice.start.1 to slice.start.175 0000 0101 to 0000 01AF

user.data.start 0000 01B2
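Every header in the table above starts with the 3-byte prefix 0x000001, so a decoder can resynchronize by scanning for it. A sketch that lists the start codes in a byte string using the names from the table (the function name is hypothetical). Codes 0xB9 to 0xFF are not in this table; they belong to the MPEG-1 systems layer and are labeled generically here.

```python
START_CODES = {
    0x00: "picture.start",
    0xB2: "user.data.start",
    0xB3: "sequence.header",
    0xB4: "sequence.error",
    0xB5: "extension.header",
    0xB7: "sequence.end",
    0xB8: "GOP.start",
}


def find_start_codes(data: bytes):
    """Yield (offset, name) for every 0x000001xx start code in data."""
    i = data.find(b"\x00\x00\x01")
    while i != -1 and i + 3 < len(data):
        code = data[i + 3]
        if 0x01 <= code <= 0xAF:
            name = f"slice.start.{code}"          # slice vertical position 1 to 175
        elif code >= 0xB9:
            name = "system layer code"            # outside the video table above
        else:
            name = START_CODES.get(code, "reserved")
        yield i, name
        i = data.find(b"\x00\x00\x01", i + 3)


stream = bytes([0, 0, 1, 0xB3, 0x16, 0x00, 0xF0, 0, 0, 1, 0xB8,
                0, 0, 1, 0x00, 0, 0, 1, 0x01])
print(list(find_start_codes(stream)))
```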

MPEG1-- Discrete Cosine Transform

Objectives

Orthogonal transform

Filter-bank-oriented, with a frequency-domain interpretation

Fast algorithms exist, and it is a close approximation to the optimal transform for a large class of images

Transforms a block in the spatial domain into another domain suitable for removing spatial and psychovisual redundancy

$$DCT(u,v) = \frac{C(u)\,C(v)}{4} \sum_{j=0}^{7} \sum_{k=0}^{7} I(j,k) \cos\left[\frac{(2j+1)u\pi}{16}\right] \cos\left[\frac{(2k+1)v\pi}{16}\right]$$

where $C(x) = 1/\sqrt{2}$ for $x = 0$ and $C(x) = 1$ for $x > 0$.
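A direct, unoptimized sketch of the 8x8 DCT formula above, evaluated term by term (O(N⁴)). It is meant to make the formula concrete; practical codecs use the fast algorithms mentioned above. Because the transform is orthonormal, the energy (sum of squares) of the block is preserved.

```python
import math


def dct_8x8(block):
    """2D DCT of an 8x8 block: DCT(u, v) = C(u)C(v)/4 * sum_j sum_k I(j, k) cos(...) cos(...)."""
    def c(x):
        return 1 / math.sqrt(2) if x == 0 else 1.0

    out = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            s = 0.0
            for j in range(8):
                for k in range(8):
                    s += (block[j][k]
                          * math.cos((2 * j + 1) * u * math.pi / 16)
                          * math.cos((2 * k + 1) * v * math.pi / 16))
            out[u][v] = c(u) * c(v) / 4 * s
    return out


flat = [[100] * 8 for _ in range(8)]
print(round(dct_8x8(flat)[0][0], 6))   # a flat block has only a DC term: 8 x 100 = 800
```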

MPEG1-- Quantization

Concepts

Combined with run-length coding, quantization contributes most of the compression

Visual quality is achieved through adaptive quantization

Coarser quantizer for higher frequencies

Application-specific quantization matrix

8 16 19 22 26 27 29 34

16 16 22 24 27 29 34 37

19 22 26 27 29 34 34 38

22 22 26 27 29 34 37 40

22 26 27 29 32 35 40 48

26 27 29 32 35 40 48 58

26 27 29 34 38 46 56 69

27 29 35 38 46 56 69 83

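For intra blocks (both luminance and chrominance)

For non-intra blocks, all entries are 16

[Quantization formula garbled in extraction: the quantized value QF(v,u) is computed from the DCT coefficient I(v,u), the matrix entry W(v,u), and quantizer_scale, with a rounding term k that depends on the block type (k = sign(I(u,v)) for non-intra blocks).]

Quantizer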

MPEG1-- VLC & Runlength Coding

Reduce the coding redundancies

Run-length coding in zigzag scan order

Variable-length coding of the runs and values

3 0 0 0 0 0 0 0

2 -2 0 0 0 0 0 0

4 0 20 0 0 0 0 0

0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0

Each nonzero coefficient is represented by the run of zeros before it and its value (level)
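A sketch of the zigzag scan and (run, level) pairing applied to the example block above. It assumes an intra block, whose DC coefficient is coded separately, so pairs start at the first AC coefficient; the trailing zeros are left to an end-of-block code. The variable-length code tables themselves are not reproduced.

```python
def zigzag_order(n=8):
    """(row, col) positions in zigzag order: walk the anti-diagonals, alternating direction."""
    order = []
    for s in range(2 * n - 1):
        diag = [(r, s - r) for r in range(min(s, n - 1), max(-1, s - n), -1)]  # upward
        if s % 2 == 1:
            diag.reverse()                     # odd anti-diagonals run downward
        order.extend(diag)
    return order


def run_level_pairs(block):
    """Return the DC value and the (zero run, level) pairs of the AC coefficients."""
    scan = [block[r][c] for r, c in zigzag_order(len(block))]
    dc, pairs, run = scan[0], [], 0
    for level in scan[1:]:
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    return dc, pairs                            # remaining zeros are implied by EOB


example = [[3, 0, 0, 0, 0, 0, 0, 0],
           [2, -2, 0, 0, 0, 0, 0, 0],
           [4, 0, 20, 0, 0, 0, 0, 0]] + [[0] * 8 for _ in range(5)]
print(run_level_pairs(example))   # (3, [(1, 2), (0, 4), (0, -2), (7, 20)])
```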

MPEG1-- Syntax

Sequence Layer

Sequence_header_code

Horizontal_size

Vertical_size

Pel_aspect_ratio

Picture_Rate

Bit_Rate

Quantizer specification

User Data

MPEG1-- Syntax (c.1)

Group of Picture Layer

Group start code

Time code

Open/closed GOP

User data

Picture Layer

Picture header

temporal_reference

picture_coding_type (I, P, B, DC only)

MPEG1-- Syntax (c.2)

Slice (resynchronization)

Slice_start_code

Macroblock Layer (motion compensation)

Macroblock header

Macroblock_address_increment

Macroblock_type

Quantizer scale

Motion_vector

Block Layer-Block Data (DCT Unit)

DCT data

MPEG1-- Compression Ratios at Each Stage

Single frame at 640x480, 24 bpp: 921.6 KB

Preprocessing (filtering to reduce noise): ??

YUV (4:2:0) conversion (from RGB): 460 KB

Scaling to CIF: 115 KB

DCT: 115 KB

Quantization: 115 KB

Run-length + Huffman coding: 24 KB

Interframe compression: 5 KB

23:1 compression ratio (relative to the 115 KB scaled frame), or 184:1 relative to the original RGB frame
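A quick check of the arithmetic behind these figures, assuming decimal kilobytes and 4:2:0 sampling (12 bits per pixel). The 460 KB figure matches a 640x480 4:2:0 frame, and the 115 KB figure matches a quarter-area 320x240 4:2:0 frame; a full 352x288 CIF frame in 4:2:0 would be about 152 KB.

```python
KB = 1000                                   # decimal kilobytes

rgb_frame = 640 * 480 * 3                   # 24 bits per pixel
yuv420_frame = 640 * 480 * 3 // 2           # full-size Y plus two quarter-size chroma planes
scaled_frame = 320 * 240 * 3 // 2           # quarter-area frame, still 4:2:0
cif_frame = 352 * 288 * 3 // 2              # true CIF, for comparison

print(rgb_frame / KB, yuv420_frame / KB, scaled_frame / KB, cif_frame / KB)
print(scaled_frame / (5 * KB), rgb_frame / (5 * KB))   # ratios against the final 5 KB
```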

MPEG2-- MPEG Working Group

MPEG

Formed in 1988 to establish standards for coding of moving pictures and associated audio for various applications such as storage media, distribution, and communication.

ITU-T SGXV

The Experts Group for ATM Video Coding was formed in 1990 to develop video coding standards appropriate for B-ISDN using ATM transport.

MPEG2 Draft

The draft was prepared jointly by MPEG and ITU-T SG15

MPEG2 Standards

Nine Parts

13818-1 System

13818-2 Video

13818-3 Audio

13818-4 Conformance Testing

13818-5 Simulation Software

13818-6 Digital Storage Media Command and Control (DSM-CC, July, 1996)

13818-7 Nonbackwards Compatible Audio (April 1997)

13818-8 10 bit Video

13818-9 Real-Time Interface (July 1996)

Profiles

4:2:2 Profile (Jan. 1996)

Multiview Profile (Nov., 1996)

MPEG2 Applications

BSS Broadcasting Satellite Service (to the home)

CATV Cable TV Distribution on Optical Networks, Copper, etc.

CDAD Cable Digital Audio Distribution

DAB Digital Audio Broadcasting (terrestrial and satellite broadcasting)

DTTB Digital Terrestrial Television Broadcasting

EC Electronic Cinema

ENG Electronic News Gathering (including SNG, Satellite News Gathering)

FSS Fixed Satellite Service (e.g. to head end)

MPEG2 Applications (c.1)

HTT Home Television Theatre

IPC Interpersonal Communications (videoconferencing, videophone, etc.)

ISM Interactive Storage Media (optical disks, etc)

MMM Multimedia Mailing

NCA News and Current Affairs

NDB Networked Database Services (via ATM, etc.)

RVS Remote Video Surveillance

SSM Serial Storage Media (digital VTR, etc.)

MPEG2-- System

Video audio synchronization

Multiplexing multiple programs

Transporting over communication channels

Multi-media on CD-ROM

Broadcasting

Digital Storage Media Command and Control (DSM-CC) protocol

MPEG2-- Audio

Three Layers Coding (I, II, III)

3/2 stereo (3 front / 2 surround) plus a Low Frequency Enhancement (LFE) channel

Downwards and Backwards Compatibility

Multi-Lingual Capability

Multi-Channel Audio Coding

MPEG2-- Video

Scalable and Nonscalable Syntax

Profiles and Levels

Progressive and Interlaced Sequences

Frame and Field Picture Processing

Error Concealment

MPEG2-- Video

MPEG-2 Enhancements

– Basic coding mode is interframe DCT with I, P, B pictures

– New field/frame prediction modes for interlace support

– Quantization/coding extensions to MPEG-1 syntax for improved quality

Improved quantization with greater range/adaptivity

New intra-frame VLCs

– Scalability extensions for hierarchical service, robustness, etc.

Spatial scalability modes for compatibility

Temporal scalability

SNR scalability

Data partitioning (frequency scalability)

– New system layer for multiplexing, transport, etc.

MPEG2-- Video

MPEG-2 Prediction Modes

– MB syntax extended to include a number of alternative prediction modes for better compression of interlaced video

– Frame-based prediction (identical to that of MPEG-1)

[Figure: I, B, P frames; 1 or 2 vectors per macroblock]

MPEG2-- Video

– Field-based prediction

Each field of a MB is predicted separately in this mode

– Adaptive field/frame selection based on the better match (more precisely, on better compression performance)

[Figure: I, B, P fields; 2 or 4 vectors for B, 2 vectors for P]

MPEG2-- Video

– Special prediction mode - dual prime

Basically a set of field motion vectors with a scaling to the near or far field, plus a transmitted delta

[Figure: reference and prediction fields]

MPEG2-- Video

Field/frame DCT coding syntax

[Figure: a luminance macroblock under field DCT coding and under frame DCT coding]

Note: Chrominance blocks in 4:2:0 mode are always DCT coded in Frame order

MPEG2-- Video

Alternative Zig-Zag scan

8x8 block of quantized DCT coefficients

Normal zigzag scan

Mandatory in MPEG-1

Optional in MPEG-2

Alternate zigzag scan

Not used in MPEG-1

Optional in MPEG-2

For frame DCT coding of interlaced video, more energy exists in the region the alternate scan visits early (the vertical frequencies), so run-length coding is more efficient.

MPEG2-- Profile and Levels

Concepts

MPEG2 is a generic standard and it is not practical to implement the full specification at the early stages of its adoption.

A limited number of subsets have been defined by means of "profile" and "level".

Profile

A subset of the bitstream syntax. Within this subset, it is still possible to have a large variation in encoders and decoders on values taken by parameters in the bitstream.

Levels

Levels are defined within each profile to deal with the variation in a profile.

A level within a profile is a defined set of constraints imposed on parameters in the bitstream.

[Figure: nested sets of syntax, profile, and level]

Profiles and levels (max./min. resolution as pels/lines/frames per second, followed by Y samples/sec; # of layers; bit rates in Mbps):

High level

Main (4:2:0): max. 1920/1152/60 (62.7 M); layers 1/1/1; 80

High (scalable): max. 1920/1152/60 (62.7/83.6 M); min. 960/576/30 (14.8/19.7 M); layers 3/2/2; 100 (a), 40 (m+b), 15 (b)

High-1440 level

Main (4:2:0): max. 1440/1152/60 (47.0 M); layers 1/1/1; 60

Main+ (4:2:0, scalable): max. 1440/1152/60 (47.0 M); min. 720/576/30 (10.4 M); layers 3/2/2; 60 (a), 40 (m+b), 15 (b)

High (scalable): max. 1440/1152/60; min. 720/576/30 (11.1/14.8 M); layers 3/2/2; 80 (a), 60 (m+b), 20 (b)

Main level

Simple (4:2:0): max. 720/576/30 (10.4 M); layers 1/1/1; 15

Main (4:2:0): max. 720/576/30 (10.4 M); layers 1/1/1; 15

Main+ (4:2:0, scalable): max. 720/576/30 (10.4 M); layers 2/1/2; 15 (a), 10 (b)

High (scalable): max. 720/576/30 (11.1/14.8 M); min. 352/288/30 (3.05 M); layers 3/2/2; 20 (a), 15 (m+b), 4 (b)

Low level

Main (4:2:0): max. 352/288/30 (3.05 M); layers 1/1/1; 4

Main+ (4:2:0, scalable): max. 352/288/30 (3.05 M); layers 2/1/2; 4 (a), 3 (b)

MPEG2-- Profile and Level (c.1)

MPEG2-- Compatibility between Different Profiles/Levels

[Figure: compatibility matrix among NP@HL, NP@H-14, NP@ML, M+@H-14, M+@ML, M+@LL, MP@HL, MP@H-14, MP@ML, MP@LL, and SP@ML; an x marks a compatible combination]

MPEG2-- Scalable Extensions

Motivation

Support applications such as video on ATM, interworking of video standards, HDTV with embedded TV, etc.

Four Modes of Scalability

Data Partitioning

SNR Scalability

Spatial Scalability

Temporal Scalability

HDTV Compression

"Grand Alliance"

FCC-encouraged partnership to define HDTV standard

MPEG-2 compression

HDTV == MP@HL

H.263

Based on H.261

Focus on non-interlaced video

GOBs/slices

Strip of pixels whose height is a multiple of 16

Bottom strip may have fewer

GOBs are made of macroblocks

Main upgrades

Works with P and I frames

Motion compensation range [-16, 15.5]

Motion vector prediction: the median of the neighbors' motion vectors

Half-pixel motion compensation

H.263: Bitstream Structure

H.263 Optional Modes

Unrestricted motion vector: range [-31.5, 31.5], useful for higher resolutions

Motion vectors can point outside the picture

Syntax-based arithmetic coding: variable-length codes replaced with arithmetic coding (AC), m = 16

Specifies various CC tables:

MVector, intra-DC, intra-/inter-AC coefficients

Advanced prediction: four luminance vectors (vs. one for baseline)

Overlapped Block Motion Compensation (OBMC)

Weighted sum of predictions

PB-frames: a P and a B picture coded together as one unit

H.263+ Modes

Advanced intra coding

Prediction-based encoding for coefficients

Deblocking filter

Smoothing of block boundaries for better prediction

Reference picture selection

Selection of a reference frame other than the preceding one

Temporal, SNR, & spatial scalability

Similar to MPEG-2

Temporal achieved through separate B frames

SNR through layering

Spatial through upsampling

H.263+ Modes (2)

Reference picture resampling

Resizing/warping of reference picture to obtain better prediction

Reduced resolution update

For highly active scenes

Macroblock is assumed twice as high/wide

Alternate VLC

Enables the use of intra frame codes for inter coding

Helps during high-activity periods

Modified quantization

Split luminance/chrominance quantization

Escape sequences for overload situations

Enhanced reference picture selection

H.264/MPEG-4 Part 10

Same baseline macroblock structure, plus sub-macroblocks:

8x4, 4x8, 4x4, 8x8, 16x8, 8x16

Motion compensation: variable-granularity motion tracking at various levels of detail

Quarter-pixel accuracy

Block-edge filters

Up to 32 possible reference pictures

B pictures: up to two motion vectors (as before)

P_Skip: neither a residual nor a motion vector difference is transmitted; the motion vector is inferred from the neighbors

H.264 Transform

New 4x4 DCT/DWHT combination

Pros

Simpler implementation

Better for small stationary

Less noise & noise propagation

Cons

Not normalized--compensated through scaling during quantization
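The "not normalized" point is easy to see from the forward core transform matrix of H.264's 4x4 integer transform: its rows are orthogonal but have squared norms 4, 10, 4, 10, so the missing per-coefficient normalization is folded into the quantizer scaling. A sketch of the core transform Y = Cf · X · Cfᵀ only (the Hadamard stage for DC coefficients and the quantization scaling are not shown):

```python
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def core_transform(x):
    """Y = CF . X . CF^T using integer arithmetic only (no floating-point rounding mismatch)."""
    return matmul(matmul(CF, x), transpose(CF))


print(matmul(CF, transpose(CF)))              # diagonal: rows orthogonal, norms not equal
print(core_transform([[1] * 4 for _ in range(4)])[0][0])   # flat block: DC = 16
```

Since the transform is exact integer arithmetic, encoder and decoder inverse transforms cannot drift apart, which is one reason for the "less noise propagation" advantage listed above.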

H.264 Intra Prediction

H.261-H.263: no de-correlation for I frames

H.264: prediction for intra coding (9 modes):

Components in an MPEG-4 Terminal

[Figure: MPEG-4 terminal. Elementary streams arrive from the network layer and are demultiplexed; primitive AV objects are decompressed, then composed and rendered into a hierarchical, interactive audiovisual scene using composition information and the scene description (script or classes); upstream data (user events, class requests, ...) flows back.]

Basics of MPEG-4

A scene is constructed of multiple independent objects

Audio or visual, natural or synthetic

Objects can be encoded separately with scene description information

This allows different object types to be combined, e.g., animation with natural video, 3D meshes, Web pages, ...

Objects are composited in a scene at the decoder side:

MPEG-4 has standardized a binary format for scene description, referred to as BIFS, which is based on VRML

This allows the data associated with objects to be multiplexed and synchronized, so that they can be transported over the network with a QoS appropriate for the nature of each object

It also allows interaction with the audiovisual scene generated at the receiver's side

Parts of the Standard

Part 1: Systems

Part 2: Visual

Part 3: Audio

Part 4: Conformance

Part 5: Software framework

Part 6: DMIF

Part 7: Optimized encoder tools

Part 8: MPEG-4 on IP

Part 9: Reference hardware

Part 10: Advanced Video Coding

Video Object and Video Object Plane

[Figure: a scene decomposed into VO1 (stationary object), VO2 (moving object), and VO3 (background)]

VOP: instance of a video object at a given time

Syntax Hierarchy

[Figure: syntax tree with nodes VOS0, VOS1, VO0, VO1, VOL0, VOL1, GOV0, GOV1, VOP0 ... VOPn, VOPn+1 ... VOPm]

Video Object Sequence (VOS)

Visual Object (VO)

Video Object Layer (VOL)

Group of Video Object Planes (GOV)

Video Object Plane (VOP)

Encoder/Decoder Structure

[Figure: input, VOP definition, VOP 0/1/2 coding, MUX, bitstream; then DEMUX, VOP 0/1/2 decoding, composition, output]

VOP Encoder

[Figure: VOP encoder with shape coding, motion estimation, motion compensation, motion coding, texture coding, a previous reconstructed VOP buffer, and a MUX]

VOP Formation

[Figure: an object with its tightest rectangle, the extended bounding box built from control macroblocks, control points, and the intelligently generated VOP]

Code the boundary of each block with shape coding

Binary Shape Coding

Context-based arithmetic encoder (CAE)

Basic idea

operates on the macroblock level

compute a context from neighboring pixels: a 10-bit (intra) or 9-bit (inter) integer

based on the context, use a LUT to get the probability of the pixel (which is either 0 or 1)

obtain a sequence of probabilities that drive an arithmetic encoder

Intra template (current frame):

      C9 C8 C7
   C6 C5 C4 C3 C2
   C1 C0 x

Inter template, previous frame (centered on the co-located pixel):

      C8
   C7 C6 C5
      C4

Inter template, current frame:

   C3 C2 C1
   C0 x

Context number: $C = \sum_k c_k 2^k$
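A sketch of the intra context computation using the template above: each neighbor c_k contributes bit k, giving a number in [0, 1023] that would index the probability table (the table itself is not reproduced here). Treating pixels outside the array as 0 is a simplification of this sketch; the standard defines its own border handling. The names are illustrative.

```python
# Intra template as (dy, dx) offsets from the current pixel x, listed in bit order c0 ... c9.
INTRA_TEMPLATE = [
    (0, -1), (0, -2),                                # c0, c1: same row, to the left
    (-1, 2), (-1, 1), (-1, 0), (-1, -1), (-1, -2),   # c2 ... c6: one row up, right to left
    (-2, 1), (-2, 0), (-2, -1),                      # c7, c8, c9: two rows up, right to left
]


def intra_context(mask, y, x):
    """Context number C = sum_k c_k * 2**k for binary alpha pixel (y, x)."""
    h, w = len(mask), len(mask[0])
    ctx = 0
    for k, (dy, dx) in enumerate(INTRA_TEMPLATE):
        yy, xx = y + dy, x + dx
        bit = mask[yy][xx] if 0 <= yy < h and 0 <= xx < w else 0   # outside: assume 0
        ctx |= bit << k
    return ctx


ones = [[1] * 5 for _ in range(5)]
print(intra_context(ones, 2, 2))   # every template pixel set: 1023
```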

Lossy Shape Coding

CAE is able to achieve a lossless representation

For rate reduction, MPEG-4 allows the encoder to sub-sample the blocks by a factor of 2 or 4: lossy shape coding

distortion is the difference between the original and the up-sampled block

must also transmit the conversion ratio

[Figure: an M x M block is downsampled to (M x CR) x (M x CR) and upsampled back to M x M; the difference is the conversion error]

P-VOP Motion Estimation/Compensation

Basic techniques

motion estimation (ME) modified for arbitrarily shaped VOP

full and half pixel motion vectors, Intra/Inter decisions

Padding Process

pixels outside of VOP boundary must be padded before ME

padded pixels are not included in matching process

Advanced prediction

16x16, 8x8, and field predictions

[Figure: 16x16 mode (1 MV) and 8x8 mode (4 MVs)]

Differential Coding of MVs for P-VOPs

[Figure: candidate predictors MV1, MV2, MV3 around the current vector MV, for the 16x16 mode and for each block in the 8x8 mode]

MVDx = MVx - Px

Px = median(MV1x, MV2x, MV3x), and likewise for the y component
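A sketch of the differential coding rule above: the predictor is the component-wise median of the three candidates, and only the difference is coded. Vectors are integer pairs here (for example, in half-pixel units); the function names are illustrative.

```python
def median3(a, b, c):
    """Median of three values without sorting."""
    return max(min(a, b), min(max(a, b), c))


def mv_difference(mv, mv1, mv2, mv3):
    """MVD = MV - P, where P is the component-wise median of MV1, MV2, MV3."""
    px = median3(mv1[0], mv2[0], mv3[0])
    py = median3(mv1[1], mv2[1], mv3[1])
    return mv[0] - px, mv[1] - py


print(mv_difference((3, -1), (2, 0), (4, -2), (3, 5)))   # P = (3, 0), so MVD = (0, -1)
```

The median discards a single outlying neighbor, so one badly matched neighbor does not spoil the prediction.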

Texture Coding

Basic techniques

DCT with motion compensation as in MPEG-2

VLC tables for DC coefficients

run-length coding and VLC tables for AC coefficients

New methods

Intra DC/AC prediction for I- and P-VOPs

texture coding for arbitrarily shaped VOPs

low-pass extrapolation (LPE) technique

shape-adaptive DCT (SA-DCT)

supports both H.263 and MPEG quantization methods

Adaptive DC Prediction

Choose best DC predictor based on gradients of the DC values

if (|QDCA - QDCB| < |QDCB - QDCC|) QDCX’ = QDCC

else

QDCX’ = QDCA

Obtain differential DC value from this best predictor

[Figure: neighboring blocks A, B, C, D and the current block X in a macroblock]
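A sketch of the gradient rule above. It assumes the usual MPEG-4 layout, with A to the left of X, B above-left, and C directly above (the slide's figure did not survive extraction). When A and B are similar there is little vertical change, so X is predicted from the block above (C); otherwise it is predicted from the block on the left (A).

```python
def predict_dc(dc_a, dc_b, dc_c):
    """Choose the DC predictor for block X (assumed: A = left, B = upper-left, C = above)."""
    if abs(dc_a - dc_b) < abs(dc_b - dc_c):
        return dc_c          # little change between A and B: predict from above
    return dc_a              # otherwise predict from the left


def dc_residual(dc_x, dc_a, dc_b, dc_c):
    """Differential DC value actually coded for block X."""
    return dc_x - predict_dc(dc_a, dc_b, dc_c)


print(dc_residual(148, 100, 102, 150))   # predictor C = 150, residual -2
```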

Adaptive AC Prediction

Either the coefficients from the first row or those from the first column of a previously coded block are used to predict the co-sited coefficients of the current block

The best direction is chosen from the direction of the DC prediction

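[Figure: AC prediction of block X in the Y macroblock from the first column of block A or the first row of the block above, following the DC prediction direction.]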


Q-Step Scaling for AC Prediction

To compensate for differences in the quantization of the previous horizontally or vertically adjacent blocks used in AC prediction of the current block, the prediction coefficients must be scaled

For example, if block A was chosen to predict block X, its first-column coefficients are rescaled as

$QAC'_{i0,X} = \dfrac{QAC_{i0,A} \cdot QP_A}{QP_X}$

Note the increase in complexity due to the division and the storage of the previous row of AC coefficients
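The rescaling costs one integer division per predicted coefficient, which is the complexity the note refers to. The sketch below assumes round-half-away-from-zero division as the rounding rule; the standard defines its own rounding division, and this only stands in for it.

```python
def scale_ac_predictors(qac_a, qp_a, qp_x):
    """Rescale quantized AC predictors from block A to block X's quantizer.

    Implements QAC'_X = QAC_A * QP_A / QP_X with round-half-away-from-zero
    integer division (an assumption about the exact rounding rule).
    """
    if qp_a <= 0 or qp_x <= 0:
        raise ValueError("quantizer step sizes must be positive")
    scaled = []
    for q in qac_a:
        num = q * qp_a
        mag = (abs(num) + qp_x // 2) // qp_x  # rounded magnitude
        scaled.append(mag if num >= 0 else -mag)
    return scaled


print(scale_ac_predictors([10, -7, 3], qp_a=8, qp_x=4))  # [20, -14, 6]
print(scale_ac_predictors([10, -7, 3], qp_a=4, qp_x=8))  # [5, -4, 2]
```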

Low-Pass Extrapolation Padding

There are 3 types of MBs in a VOP with arbitrary shape:

completely located inside the VOP (no special treatment)

completely located outside the VOP (skipped, 1 bit)

blocks that lie on the boundary (need padding before the DCT)

LPE padding is used for intra blocks only

Step 1: assign the pixels outside the VOP boundary the mean value of the pixels inside the VOP

Step 2: beginning from the upper-left, process the outside pixels row by row, replacing each with the average of its 4 neighboring pixel values

This process is intended to fill in the undefined pixel values while not adding significant high-frequency (HF) energy
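The two LPE steps can be sketched as below. This version assumes that neighbors falling outside the block are simply left out of the average. It updates pixels in place during the raster scan, so later pixels see values already padded earlier. Both choices are assumptions about details the slide does not spell out.

```python
def lpe_pad(block, mask):
    """Low-pass extrapolation padding of one boundary block.

    block: list of rows of pixel values; mask: same shape, 1 = inside the VOP.
    Returns a new padded block; inside pixels are left untouched.
    """
    n_rows, n_cols = len(block), len(block[0])
    inside = [block[r][c] for r in range(n_rows) for c in range(n_cols) if mask[r][c]]
    if not inside:
        raise ValueError("block has no pixels inside the VOP")
    mean = sum(inside) / len(inside)
    # Step 1: every outside pixel gets the mean of the inside pixels.
    out = [[block[r][c] if mask[r][c] else mean for c in range(n_cols)]
           for r in range(n_rows)]
    # Step 2: raster scan from the upper-left; each outside pixel becomes the
    # average of its up/down/left/right neighbors that lie within the block.
    for r in range(n_rows):
        for c in range(n_cols):
            if mask[r][c]:
                continue
            neigh = [out[rr][cc]
                     for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                     if 0 <= rr < n_rows and 0 <= cc < n_cols]
            out[r][c] = sum(neigh) / len(neigh)
    return out


block = [[50, 50, 0, 0],
         [50, 50, 0, 0],
         [90, 90, 0, 0],
         [90, 90, 0, 0]]
mask = [[1, 1, 0, 0]] * 4
padded = lpe_pad(block, mask)
```

Every padded value is an average of values already in the block, so the result stays within the range of the inside pixels and introduces no sharp edges.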


Shape-Adaptive DCT (SA-DCT)

The SA-DCT algorithm is based on predefined orthonormal sets of DCT basis functions

Apply 1D DCTs vertically and then horizontally, with lengths given by the number of active pixels in each column and row of the block

The final number of SA-DCT coefficients is identical to the number of active pixels in the block

The zigzag scan is modified so that non-active coefficients are skipped

[Figure: active image pixels, the coefficients after the column DCTs, and the SA-DCT result after the row DCTs.]
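A small sketch of the SA-DCT shows why the coefficient count equals the active-pixel count. Each column's active pixels are shifted to the top and transformed with a DCT of their own length. The resulting coefficients in each row are then shifted left and transformed again. The plain orthonormal DCT-II used here is an assumption; the exact basis scaling in the standard may differ.

```python
import math


def dct_1d(v):
    """Orthonormal DCT-II of length N = len(v); returns [] for an empty input."""
    n = len(v)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                               for i, x in enumerate(v)))
    return out


def sa_dct(block, mask):
    """Shape-adaptive DCT sketch; returns {(row, col): coefficient}."""
    n_rows, n_cols = len(block), len(block[0])
    # Vertical pass: collect each column's active pixels (shifting them to the
    # top) and transform them with a DCT whose length equals their count.
    cols = []
    for c in range(n_cols):
        active = [block[r][c] for r in range(n_rows) if mask[r][c]]
        cols.append(dct_1d(active))
    # Horizontal pass: row k gathers the k-th coefficient of every column long
    # enough to have one (shifting them left) and transforms that row.
    coeffs = {}
    for k in range(n_rows):
        row = [col[k] for col in cols if len(col) > k]
        for j, value in enumerate(dct_1d(row)):
            coeffs[(k, j)] = value
    return coeffs


mask = [[1 if c <= r else 0 for c in range(8)] for r in range(8)]
block = [[10 * (r + c) for c in range(8)] for r in range(8)]
coeffs = sa_dct(block, mask)
print(len(coeffs), sum(map(sum, mask)))  # 36 36
```

For a fully opaque 8x8 block, the same code reduces to the ordinary separable 2D DCT.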


Error Resilience: Resynchronization

Enable resynchronization between the decoder and the bitstream after an error has been detected

Packet approach: provide periodic resync markers based on the number of bits within a packet, not the number of MBs in a packet

header information is contained at the start of a packet so that decoding can be restarted

all predictively coded information must be contained in one packet to prevent error propagation

Interval synchronization to avoid start code emulation: start codes appear only at legal, fixed interval locations


Error Resilience: Data Recovery

After synchronization has been reestablished, data recovery attempts to recover data that would otherwise be lost

Reversible Variable Length Codes (RVLC)

variable-length codes are designed so that they can be read in both directions

some loss of coding efficiency, but a substantial increase in error resilience

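[Figure: video packet structure: Resync Marker | macroblock_number | quant_scale | HEC | Motion & Header Information | Motion Marker | Texture Information | Resync Marker. When errors hit the texture coefficients (TCOEF), the decoder decodes forward from the texture header and backward from the next resync marker.]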
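The forward/backward decoding in the figure can be illustrated with a toy reversible code. The table {0, 11, 101, 1001} below is hypothetical and is not the MPEG-4 RVLC table. Every codeword is a palindrome, so the set is both prefix-free and suffix-free, and the same table decodes the reversed bitstream. In this sketch an error is detected when the accumulated bits reach the longest codeword length without matching. Real decoders also use other consistency checks.

```python
# Hypothetical symmetric RVLC: palindromic codewords are prefix- and suffix-free.
CODE = {"a": "0", "b": "11", "c": "101", "d": "1001"}
MAX_LEN = max(len(w) for w in CODE.values())


def encode(symbols):
    return "".join(CODE[s] for s in symbols)


def _decode(bits):
    lookup = {w: s for s, w in CODE.items()}
    out, word = [], ""
    for bit in bits:
        word += bit
        if word in lookup:
            out.append(lookup[word])
            word = ""
        elif len(word) >= MAX_LEN:
            break  # no codeword can match: error detected, stop decoding
    return out


def decode_forward(bits):
    return _decode(bits)


def decode_backward(bits):
    # Palindromic codewords let the same table decode the reversed stream.
    return _decode(bits[::-1])[::-1]


corrupted = encode("ab") + "10001" + encode("ca")
print(decode_forward(corrupted))   # ['a', 'b']
print(decode_backward(corrupted))  # ['c', 'a']
```

Decoding forward recovers the symbols before the damaged segment, and decoding backward from the next marker recovers the symbols after it. An ordinary one-directional VLC would lose everything after the error.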


Error Concealment

Assuming resynchronization can localize errors, attempt to conceal them using the available information

Data partitioning: separate the motion and texture bits

place a resync marker in between

if the texture is lost, use the motion to conceal the error

Outside of the standard, error concealment can be done in a number of other ways; for example, if a motion vector is lost, try to predict it from those that have already been decoded


Sprite Coding

A sprite is an image composed of pixels that are visible throughout an entire video segment

e.g., sprite contains all the background pixels in a panning sequence

Initial sprite coded with I-VOP techniques, then updated

Sprites can be used to reconstruct and predict VOPs

Need to estimate warping parameters that define the relation between the sprite and pixels in a VOP (global motion parameters)

Good coding efficiency for scenes with global motion


Object-Based Scalable Encoding

Spatial Scalability

Temporal Scalability

[Figure: spatial scalability with a base layer of I/P VOPs and an enhancement layer of P/B VOPs; temporal scalability with base layer VOL0 at frames 0, 6, 12 and enhancement layer VOL1 adding frames 2, 4, 8, 10.]


Wavelet-based Texture Coding

Still Texture Object

Decomposition of the image using the DWT

high coding efficiency

excellent for spatial and SNR scalability

Quantization of the wavelet coefficients

the LL band is coded using DPCM

higher-order bands are coded using zerotrees

Entropy coding using an adaptive arithmetic encoder

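[Figure: encoder block diagram. Input → DWT; the low-low band goes through prediction, quantization and arithmetic coding (AC); the other bands go through quantization, zerotree scanning and arithmetic coding; both feed the bitstream.]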


2D Mesh and Face Animation

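[Figure: facial animation parameter units on a neutral face: MW0 (mouth width), MNS0 (mouth-nose separation), ENS0 (eye-nose separation), ES0 (eye separation) and IRISD0 (iris diameter).]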

Mesh Objects

tessellation of a 2D planar region into polygon patches

the vertices of the mesh are referred to as node points

node points are warped from one frame to the next; their motion information is coded

Face Objects

the shape, texture and expressions of the face are controlled by Facial Animation Parameters (FAPs)

FAPs can also be used for accurate speech articulation, where visemes are used to code the lip configurations and the mood of the speaker


Amendments to Visual Part

The 2000 Edition, issued in Jan '01 (version 1 + version 2), includes:

all tools/techniques discussed so far

additional coding efficiency tools (global motion, 1/4-pel motion, SA-DCT)

increased flexibility in object-based scalable coding

improved error robustness; the NEWPRED tool, which switches reference frames

dynamic resolution conversion

Amendment 1: tools to support the Studio Profile; the spec was frozen in Jan '01

Amendment 2: tools to support the Streaming Video Profile; the spec was frozen in Jan '01

major addition: Fine Granularity Scalability (FGS), where the enhancement layer is bit-plane coded


Amendment 1 - Studio Profile

Objectives

object-based techniques for video creation

higher coding efficiency for studio storage

Applications: professional broadcast, studio and post-production, inter-studio transmission

Requirements

Formats: 4:2:2, 4:4:4 (YUV and RGB), progressive and interlaced

Resolutions: up to 2048 by 2048 pixels per VOP

Bit rates: up to 1.2 Gbps with up to 12 bits of pixel depth

Lossless coding capability

Support for binary shape, grayscale shape for alpha transparency, depth, displacement, and sprites


New Tools in Studio Profile

Highly efficient VLC for high bit rates

grouping of DCT coefficients based on their values

recursive selection of the VLC table for each group of coefficients as a function of the previously coded group

coded data = group indicator + fixed-length code determining the actual value within the group

Flexible access by a special slice structure

New tools for lossless coding


Profile Definitions

Simple Studio Profile

To be applied for image acquisition and editing

Only Intra coding for independent processing of frames

Lossless transcoding from MPEG-2 4:2:2 Profile

Support for Arbitrary Shape (binary or grayscale)

Core Studio Profile

To be applied for inter-studio transmission

Inter (P-VOP) coding for more efficient compression

Support for Sprites


Amendment 2 - Streaming Video Profiles

[Figure: the traditional model of a communication system (Encoder → Channel → Decoder) compared with Internet streaming applications, where an encoder feeds a server that serves many decoders over different channels.]

Basic assumptions of the traditional model:

Encoder knows channel capacity

Decoder is able to process all received bits


New Objective for Video Coding

[Figure: received quality (bad, moderate, good) versus channel bandwidth (low to high), comparing traditional source coding, the traditional distortion-rate curve, and the new objective.]


Fine Granularity Scalability

Motion-compensated DCT coding in the base layer to reach the lower bound of the bit-rate range

Bit-plane coding of DCT coefficients in the enhancement layer to cover the bit-rate range

The enhancement-layer bitstream may be truncated to any number of bits per frame

The decoder may ignore some enhancement bits

Reconstructed video quality is proportional to the number of decoded bits


Basic Encoder Structure

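[Figure: FGS encoder structure. Base layer: input video → DCT → Q → VLC → base-layer bitstream, with a Q^-1, IDCT, clipping, frame memory, motion estimation and motion compensation loop. Enhancement layer encoding: DCT of the residual → find maximum → bit-plane shift → bit-plane VLC → enhancement bitstream.]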


Basic Decoder Structure

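[Figure: FGS decoder structure. Base layer: base-layer bitstream → VLD → Q^-1 → IDCT → clipping, with motion compensation and frame memory, giving the base-layer video (optional output). Enhancement layer decoding: enhancement bitstream → bit-plane VLD → bit-plane shift → IDCT → clipping, added to the base layer to give the enhancement video.]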


Basic Bitplane Coding Technique

[Figure: a block of 8x8 DCT coefficient differences is zigzag ordered, and the magnitudes are split into bit-planes from MSB to LSB. Each bit-plane is coded as (RUN, EOP) symbols, where RUN is the number of zeros before a 1 and EOP marks the last 1 in the plane. The example block yields symbols including (0, 1), (28, 1), (6, 0), (0, 0), (26, 1), (2, 0) and (31, 1).]
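Generating (RUN, EOP) symbols from zigzag-ordered coefficient differences takes only a few lines. In this sketch an all-zero bit-plane produces an empty list, whereas a real FGS coder signals it with a dedicated symbol. Sign bits, which are sent separately, are ignored, and the symbols are not VLC coded. The example coefficients are made up for illustration.

```python
def bitplane_symbols(values, n_planes=None):
    """Split zigzag-ordered magnitudes into bit-planes and emit (RUN, EOP) pairs.

    Returns one list per plane, MSB first. RUN counts the zeros before each 1;
    EOP is 1 for the last 1 in the plane. Signs are not handled here.
    """
    mags = [abs(v) for v in values]
    if n_planes is None:
        n_planes = max(mags).bit_length() if any(mags) else 0
    planes = []
    for p in range(n_planes - 1, -1, -1):
        ones = [i for i, m in enumerate(mags) if (m >> p) & 1]
        symbols, prev = [], -1
        for j, pos in enumerate(ones):
            symbols.append((pos - prev - 1, 1 if j == len(ones) - 1 else 0))
            prev = pos
        planes.append(symbols)
    return planes


zigzag = [10, 0, 6, 0, 0, 3] + [0] * 58
for plane in bitplane_symbols(zigzag):
    print(plane)
# [(0, 1)]
# [(2, 1)]
# [(0, 0), (1, 0), (2, 1)]
# [(5, 1)]
```

Because each plane is self-contained, the enhancement stream can be cut after any plane, or even mid-plane, and still decode to a coarser approximation.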


Application: Varying BW Environment

[Figure: a video source serving several consumers; bandwidth differs from consumer to consumer (user variation) and changes over time (temporal variation).]

User Variation: bandwidth varies from user to user

Temporal Variation: bandwidth varies with time


Application: Statistical Multiplexing

[Figure: a data server holds Videos 1 to N, each coded with a VBR base-layer encoder and an FGS enhancement-layer encoder; a multiplexer combines the base layers (Base 1 to N), the enhancement layers (Enh 1 to N) and other data onto a CBR channel.]


MPEG-4 System Layer Model

[Figure: MPEG-4 system layer model. Elementary streams pass through the Elementary Stream Interface into the Sync Layer (SL), producing SL-packetized streams. The DMIF Application Interface leads to the DMIF layer with the optional FlexMux (FlexMux channels and streams), and the DMIF Network Interface leads to the TransMux (delivery) layer, which is not specified in MPEG-4. Example TransMux stacks: (RTP)/UDP/IP, (PES)/MPEG-2 TS, AAL2/ATM, H.223/PSTN, DAB mux, and file, broadcast or interactive delivery.]


Profile Definitions of Version 1

Simple Profile

Basic tools of I/P VOP, AC/DC prediction, 4 MV and unrestricted MV

Short header and Error Resilience tools

Core Profile

Simple + Binary Shape, Quantization Method 1/2 and B-VOP

Main Profile

Core + Grey Shape, Interlace and Sprite

Simple Scalable Profile

Simple + spatial and temporal scalability and B-VOP

N-Bit Profile

Core + N-Bit

Animated 2D Mesh

Core + Scalable Still Texture, 2D dynamic Mesh

Basic Animated Texture

Binary Shape, Scalable Still Texture and 2D Dynamic Mesh

Still Scalable Texture - Scalable Still Texture

Simple Face - Face Animation Parameters


Profile Definitions of Version 2

Advanced Real Time Simple Profile

Simple +

Advanced error resilience with back channel,

improved temporal scalability with low buffering delay

Core Scalable Profile

Simple Scalable +

Core +

SNR and spatial/temporal scalability for a region or object of interest

Advanced Coding Efficiency Profile

Tools for improving coding efficiency for both rectangular and arbitrary shaped objects

For applications such as mobile broadcast reception

Advanced Scalable Texture Profile

Tools for decoding arbitrary shaped texture and still image including scalable shape coding

Advanced Core Profile

Core Profile +

Tools for decoding arbitrary shaped video objects and arbitrary shaped scalable still image

Simple Face and Body Animation Profile

Simple face animation + body animation


Profile Definitions in subsequent version

Advanced Simple Profile

Simple Profile +

Several tools to make it more efficient:

B-frames

1/4-pel motion compensation

Extra quantization tables

Global Motion Compensation

Fine Granularity Scalable Profile

Use Advanced Simple Profile as base layer

Fine granularity scalability (FGS)

Fine granularity scalability - temporal (FGST)

Simple Studio Profile

I-frames only

Arbitrary shape

Multiple alpha channels

Up to 2 Gbps

Core Studio Profile

Simple Studio Profile + P-frames


MPEG-4 Video Profiles @ Levels

[Figure: MPEG-4 video profiles arranged by capability (rectangular frame vs. arbitrary shape; no scalability, spatial & temporal scalability, quality & temporal scalability; higher error resilience; additional tools): Simple, Core, Main, Simple Scalable, Core Scalable, Advanced Simple, Advanced Coding Efficiency, Fine Granularity Scalable, Advanced Realtime Simple, Simple Studio and Core Studio, introduced in the IS, AMD-1 and AMD-2.]

Profiles are used to limit the set of tools in a decoding device

Levels are used to place limits on complexity


Visual Object Types/Tools of V.2

Visual Tools vs. Visual Object Types: Advanced Real Time Simple, Advanced Coding Efficiency, Advanced Scalable Texture, Core Scalable, Simple FBA

Basic

•I/P-VOP

•AC/DC Prediction

•4-MV, Unrestricted MV

X X X

Error Resilience

•Slice resynchronization

•Data partitioning

•Reversible VLC

X X X

Short Header X X X

B-VOP X X

P-VOP with OBMC (Texture) X X

Method 1/Method 2

Quantization

X X

P-VOP based Temporal

Scalability

•Rectangular

•Arbitrary Shape

X X

Binary Shape X X

Grey Shape X

Interlace X

Sprite


Visual Object Types/Tools of V.2 (Cont'd)

Visual Tools vs. Visual Object Types: Advanced Real Time Simple, Advanced Coding Efficiency, Advanced Scalable Texture, Core Scalable, Simple FBA

Temporal Scalability (Rectangular) X

Spatial Scalability (Rectangular) X

N-Bit

Scalable Still Texture X

2D Dynamic Mesh with uniform topology

2D Dynamic Mesh with Delaunay topology

Facial Animation Parameters X

Body Animation Parameters X

Dynamic Resolution Conversion X

NEWPRED X

Global Motion Compensation X

¼-Pel Motion Compensation X

SA-DCT X

Error Resilience for Visual Texture Coding X

Wavelet Tiling X

Scalable Shape Coding for Still Texture X

Object Based Spatial Scalability X


Visual Profiles of V.2

Object Types: Simple, Simple Scalable, Core, Core Scalable, Advanced Real Time Simple, Advanced Coding Efficiency, Advanced Scalable Texture, Simple FBA

Profiles:

V2-1 Advanced Real Time Simple X X

V2-2 Core Scalable X X X X

V2-3 Advanced Coding Efficiency X X X

V2-4 Advanced Core X X X

V2-5 Advanced Scalable Texture X

V2-6 Simple FBA X


What‘s New in MPEG-4 Visual

New Video Codec: "MPEG-4 Advanced Video Coding"

It is developed by JVT (Joint Video Team) of ITU-T and MPEG

Major task is the coding performance improvement

It is based on H.26L

It will be Part 10 of MPEG-4 at MPEG

It probably will be H.264 at ITU-T

Animation Framework eXtension - AFX

High-level description of animation

Enhanced rendering

Compact representations

Low bit rate animations

Scalability based on terminal capabilities

Interactivity at user level, scene level and client-server session level

Compression of representations for static and dynamic tools

3D Video Coding

Interframe Wavelet Coding


MPEG-4 Advanced Video Coding (1)

Summary of Fairfax meeting

A total of 160 proposals have been submitted to JVT

Working draft WD.1 has been created

Two profiles: baseline profile and main profile have been decided

Baseline profile to be royalty free, main decoding features include:

I, P pictures

In loop deblocking filters

Interlace support (Level dependent, Level 2.1 or above)

1/4 pel motion prediction

Tree-structured motion segmentation down to 4x4

VLC-based entropy coding

Flexible Macroblock ordering

Main profile

Including all features in baseline profile and adds

B-pictures,

CABAC (Context-based Adaptive Binary Arithmetic Coding)

Adaptive Block-size Transforms

1/8-sample motion compensation


MPEG-4 Advanced Video Coding (2)

Notes on Profiles and Levels

Motion vector range will be limited

A limit is imposed on extreme aspect ratios

Number of reference pictures increases with picture size, never exceeding 15

TBDs

Exact values of motion vector range limit

Smaller than 8x8 bi-predictive motions in B-pictures for Main profile

Adaptive B-picture interpolation in Main profile

Unfulfilled requirements

No 4:2:2 source format support (pending further study)

Mixing Intra and Inter coding type within macroblocks

Data partitioning

SP & SI ("switching" pictures)

Levels, summarized with typical formats:

Level 1 = QCIF @ 15 (Intermediate levels 1.1 = CIF @ 7.5, 1.2 = CIF @ 15)

Level 2 = CIF @ 30 (Intermediate levels 2.1 = HHR and 2.2)

Level 3 = SDTV (Intermediate levels 3.1, 3.2)

Level 4 = HDTV

Level 5 = SHDTV (1920x1088 @ 60p)


MPEG-4 Advanced Video Coding (3)

Technical summary

Order of the bitstream within an MB: a total of seven modes

[Figure: MB modes 16x16, 16x8, 8x16 and 8x8 with partition numbering 0 to 3; 8x8 sub-modes 8x8, 8x4, 4x8 and 4x4; CBPY 8x8 block order 0 to 3; luma residual coding 4x4 block order 0 to 15; chroma residual coding order with the 2x2 DC blocks 16 and 17 followed by the 4x4 AC blocks 18 to 25 for U and V.]


MPEG-4 Advanced Video Coding (4)

Motion compensation

Motion vector data for 1-16 blocks are transmitted

Motion vector prediction

Median prediction is used except for 16x8 or 8x16 blocks

• The prediction of E is formed as the median of A, B and C

Directional segmentation prediction

• Vector block size 8x16:

Left block: A is used if it has the same reference picture as E; otherwise median prediction is used

Right block: C is used if it has the same reference picture as E; otherwise median prediction is used

• Vector block size 16x8:

Upper block: B is used as the prediction if it has the same reference picture as E; otherwise median prediction is used

Lower block: A is used as the prediction if it has the same reference picture as E; otherwise median prediction is used

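[Figure: neighboring blocks D (above-left), B (above), C (above-right) and A (left) of the current block E, shown for the 16x8 and 8x16 partitions.]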
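The rule above amounts to a directional shortcut with a median fallback. In this sketch the caller is assumed to have already picked the correct neighbors A, B, C for the partition being coded. Each neighbor is represented as a hypothetical `(mv, ref)` pair, and the handling of unavailable neighbors is left out.

```python
def median3(a, b, c):
    return max(min(a, b), min(max(a, b), c))


# Neighbor tried first by the directional rule, per (partition, index);
# index 0 = upper/left partition, 1 = lower/right partition.
DIRECTIONAL = {("8x16", 0): "A", ("8x16", 1): "C",
               ("16x8", 0): "B", ("16x8", 1): "A"}


def predict_mv(partition, index, ref_e, nbrs):
    """Predict the motion vector of block E.

    nbrs maps "A" (left), "B" (above) and "C" (above-right) to (mv, ref) pairs.
    """
    key = DIRECTIONAL.get((partition, index))
    if key is not None:
        mv, ref = nbrs[key]
        if ref == ref_e:
            return mv  # directional segmentation prediction
    mvs = [nbrs[k][0] for k in ("A", "B", "C")]
    return (median3(*(v[0] for v in mvs)), median3(*(v[1] for v in mvs)))


nbrs = {"A": ((4, 0), 0), "B": ((1, 1), 0), "C": ((-2, 5), 1)}
print(predict_mv("16x8", 1, 0, nbrs))  # (4, 0): A shares E's reference picture
print(predict_mv("8x16", 1, 0, nbrs))  # (1, 1): C does not, so median is used
```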


MPEG-4 Advanced Video Coding (5)

Reference pictures

Default reference field number assignment when the current picture is the first field coded

Default reference field number assignment when the current picture is the second field coded

[Figure: for each case, the reference frame (field) buffer of field pairs f1/f2, with reference field numbers 0 to 11 assigned outward from the current field.]


MPEG-4 Advanced Video Coding (6)

Intra prediction: two types of intra prediction (4x4 and 16x16 luma)

Intra prediction modes for 4x4 of luma

Mode 0: DC prediction

Mode 1: Vertical Prediction

Mode 2: Horizontal prediction

Mode 3: Diagonal Down/Right prediction

Mode 4: Diagonal Down/Left prediction

Mode 5: Vertical-Left prediction

Mode 6: Vertical-Right prediction

Mode 7: Horizontal-Up prediction

Mode 8: Horizontal-Down prediction

Intra prediction for 16x16 mode for luma

Mode 0: Vertical

Mode 1: Horizontal

Mode 2: DC prediction

Mode 3: Plane prediction

[Figure: prediction directions of the 4x4 intra modes 1 to 8.]
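Three of the 4x4 luma modes can be sketched in a few lines. The mode numbers follow the draft listed above: 0 = DC, 1 = vertical, 2 = horizontal. The final H.264 standard numbers these modes differently. The rounding `(sum + 4) >> 3` and the assumption that both neighbor rows are available are simplifications. The SAD-based `best_mode` is a simple, hypothetical encoder heuristic and is not part of the standard.

```python
def intra4x4_predict(mode, above, left):
    """Build a 4x4 prediction from the 4 pixels above and the 4 to the left.

    Mode numbers follow the draft listed above: 0 = DC, 1 = vertical,
    2 = horizontal. Both neighbor rows are assumed to be available.
    """
    if mode == 0:
        dc = (sum(above) + sum(left) + 4) >> 3  # rounded mean of 8 neighbors
        return [[dc] * 4 for _ in range(4)]
    if mode == 1:
        return [list(above) for _ in range(4)]   # copy the row above downwards
    if mode == 2:
        return [[left[r]] * 4 for r in range(4)]  # copy left pixels across
    raise ValueError("only modes 0 to 2 are sketched here")


def best_mode(block, above, left):
    """Pick the sketched mode with the smallest sum of absolute differences."""
    def sad(pred):
        return sum(abs(block[r][c] - pred[r][c]) for r in range(4) for c in range(4))
    return min(range(3), key=lambda m: sad(intra4x4_predict(m, above, left)))


above, left = [10, 20, 30, 40], [5, 5, 5, 5]
stripes = [[10, 20, 30, 40] for _ in range(4)]
print(best_mode(stripes, above, left))  # 1: vertical prediction matches exactly
```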


MPEG-4 Advanced Video Coding (7)

Adaptive Block size Transforms (ABT)

Use of ABT to increase coding efficiency

ABT is synchronized with motion compensation:

for frame motion compensation, ABT is applied to frame MBs

for field motion compensation, ABT is applied to field MBs

ABT transform coefficient decoding

[Figure: progressive and interlaced scan patterns for 4x4, 4x8, 8x4 and 8x8 blocks.]

Scan position of each coefficient, by block size (width x height):

4x4
 1  3  9 13
 2  6 10 14
 4  7 11 15
 5  8 12 16

4x8
 1  5 13 21
 2  6 14 22
 3  7 15 23
 4 12 20 28
 8 16 24 29
 9 17 25 30
10 18 26 31
11 19 27 32

8x4
 1  3  7 11 15 19 23 27
 2  6 10 14 18 22 26 30
 4  8 12 16 20 24 28 31
 5  9 13 17 21 25 29 32

8x8
 1  4  9 16 23 31 39 53
 2  5 15 22 30 38 46 54
 3  8 17 24 32 40 47 59
 6 10 21 29 37 45 52 60
 7 14 25 33 41 48 55 61
11 18 26 34 42 49 56 62
12 19 27 35 43 50 57 63
13 20 28 36 44 51 58 64


MPEG-4 Advanced Video Coding (8)

Context-based Adaptive Binary Arithmetic Coding (CABAC)

Context modeling: provides estimates of the conditional probabilities of the coding symbols. With suitable context models, inter-symbol redundancy can be exploited by switching between different probability models according to the already coded symbols

Arithmetic codes: permit a non-integer number of bits to be assigned to each symbol of the alphabet. This is extremely beneficial for symbol probabilities much greater than 0.5, which often occur with efficient context modeling. In this case, a variable-length code has to spend at least one bit, in contrast to arithmetic codes, which may use a fraction of one bit

Adaptive arithmetic codes: permit the entropy coder to adapt itself to non-stationary symbol statistics. For instance, the statistics of motion vector magnitudes vary over space and time, as well as for different sequences and bit rates. Hence, an adaptive model taking into account the cumulative probabilities of already coded motion vectors leads to a better fit of the arithmetic codes to the current symbol statistics
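Two short calculations make the fractional-bit argument concrete. The first gives the ideal cost of a binary source with P(0) = 0.95, well below the 1 bit per symbol a VLC must spend. The second gives the ideal total cost of a sequence under a simple adaptive model. CABAC itself estimates probabilities with a table-driven state machine. The count-based Laplace estimator here only stands in for that adaptation and is an assumption of the sketch.

```python
import math


def binary_entropy(p):
    """Ideal bits per symbol for a binary source with P(symbol 0) = p, 0 < p < 1."""
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def adaptive_code_length(bits):
    """Ideal total code length of a 0/1 sequence under an adaptive count model."""
    counts = [1, 1]  # Laplace-style initial counts (an assumption)
    total = 0.0
    for b in bits:
        total += -math.log2(counts[b] / (counts[0] + counts[1]))
        counts[b] += 1  # adapt the model to the symbols seen so far
    return total


print(round(binary_entropy(0.95), 3))                  # 0.286
print(round(adaptive_code_length([0] * 99 + [1]), 1))  # 13.3
```

A VLC would need at least 100 bits for those 100 binary symbols. An arithmetic coder driven by the adaptive model can approach the 13.3-bit ideal.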


MPEG-4 Advanced Video Coding (9)

Other techniques for image quality and encoding performance improvement

In-loop deblocking filter

conditional filtering is applied to the boundaries of the 4x4 blocks of a reconstructed MB

in the first step, the 4 vertical edges of the 4x4 raster (16 pels each) are filtered (horizontal filtering)

after that, the 4 horizontal edges are filtered (vertical filtering)

Encoder optimization

using R-D optimizations

finding the optimum prediction mode

the best reference frame

the best motion vectors

fractional-pel accuracy

macroblock-level optimum mode decision

decision between intra and inter

adaptive block size
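R-D optimized mode decision is commonly done by minimizing the Lagrangian cost $J = D + \lambda R$ over the candidate modes. In the sketch below, the distortion and rate values are hypothetical numbers an encoder would measure, for example the SSD after reconstruction and the bits each mode costs. The mode names and λ values are illustrative only.

```python
def choose_mode(candidates, lam):
    """Return the name of the candidate with the smallest J = D + lam * R.

    candidates: iterable of (name, distortion, rate_bits) tuples.
    """
    return min(candidates, key=lambda c: c[1] + lam * c[2])[0]


modes = [("intra16x16", 1200, 40), ("inter16x16", 900, 60), ("inter8x8", 700, 110)]
print(choose_mode(modes, 5))   # inter16x16
print(choose_mode(modes, 1))   # inter8x8
print(choose_mode(modes, 20))  # intra16x16
```

A small λ favors low distortion, so finer block sizes win. A large λ favors cheap modes. Encoders usually tie λ to the quantizer step so that the trade-off follows the target bit rate.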


Conclusions

The MPEG-4 Visual standard has been overviewed

MPEG-4 is the first standard that can be used for object-based coding

The Simple Profile of MPEG-4 has been used for wireless and Internet video transmission

MPEG-4 is expected to be a major coding scheme for multimedia applications

Several parts of MPEG-4 are designed for special applications, such as FGS for video streaming and AVC for increased coding efficiency (which may be used for HD DVD with red lasers)


For Further Information

MPEG-4 Industry Forum

http://www.m4if.org/

MPEG Home Page

http://mpeg.nist.gov/

IEEE Trans. CSVT special issues:

Feb '97: on MPEG-4

Nov '98: on representation/coding of images/video (part I)

Feb '99: on representation/coding of images/video (part II)

Dec '99: on object-based coding

Mar '01: on streaming video

Recommended