Video Compression: Conferencing & Internet Video
Hamid R. Rabiee, PhD, Intel Corporation & Oregon Graduate Institute


  • Video Compression: Conferencing & Internet Video

    Hamid R. Rabiee, PhD
    Intel Corporation & Oregon Graduate Institute

  • Objectives

    The student should be able to:
    • Describe the basic components of the H.263 video codec and how it differs from H.261.
    • Describe and understand the improvements of H.263+ over H.263.
    • Understand enough about Internet and WWW protocols to see how they affect video.
    • Understand the basics of streaming video over the Internet, as well as error resiliency and concealment techniques.

  • Section 1: Conferencing Video

    Section 2: Internet Review

    Section 3: Internet Video

    Outline

  • Section 1: Conferencing Video

    • Video Compression Review
    • Chronology of Video Standards
    • The Input Video Format
    • H.263 Overview
    • H.263+ Overview

  • Video Compression Review

  • [Block diagram: Frames of Digital Video → Motion Estimation & Compensation → Transform, Quantization, Zig-Zag Scan & Run-Length Encoding → Symbol Encoder → Bit Stream]

    Video codecs have three main functional blocks.

    Video Compression Review

    Garden Variety Video Coder


    The symbol encoder exploits the statistical properties of its input by using shorter code words for more common symbols. Examples: Huffman & arithmetic coding.

    Video Compression Review

    Symbol Encoding
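
    As a minimal illustration of that idea, here is a tiny Huffman coder sketch (the symbol stream and frequencies below are made up for the example):

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a Huffman code: frequent symbols get shorter code words."""
    freq = Counter(symbols)
    # Each heap entry: (frequency, tie-breaker, {symbol: code-so-far})
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:                      # degenerate single-symbol input
        return {s: "0" for s in freq}
    tie = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

# Toy stream: zero-valued coefficients dominate.
stream = [0] * 20 + [1] * 6 + [2] * 3 + [5] * 1
codes = huffman_code(stream)
print(codes)   # the most frequent symbol (0) gets a 1-bit code, rare ones get longer codes
```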


    This block is the basis for most lossless image coders (in conjunction with DPCM, etc.).

    Video Compression Review

    Symbol Encoding


    A transform (usually the DCT) is applied to the input data for better energy compaction, which decreases the entropy and improves the performance of the symbol encoder.

    Video Compression Review

    Transform & Quantization
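
    A rough sketch of an 8×8 DCT followed by uniform quantization (not the normative H.263 transform; the step size q is an illustrative free parameter):

```python
import numpy as np
from scipy.fftpack import dct, idct

def dct2(block):
    """2-D type-II DCT of an 8x8 block (orthonormal scaling)."""
    return dct(dct(block, axis=0, norm='ortho'), axis=1, norm='ortho')

def idct2(coeffs):
    return idct(idct(coeffs, axis=0, norm='ortho'), axis=1, norm='ortho')

# A smooth ramp block: its energy compacts into a few low-frequency coefficients.
block = np.add.outer(np.arange(8.0), np.arange(8.0)) * 16.0

coeffs = dct2(block)
q = 16                                    # quantizer step (illustrative)
quantized = np.round(coeffs / q)
reconstructed = idct2(quantized * q)

print(np.count_nonzero(quantized), "nonzero coefficients out of 64")
```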


    The DCT also decomposes the input into its frequency components so that perceptual properties can be exploited. For example, we can throw away high-frequency content first.

    Video Compression Review

    Transform & Quantization


    Quantization lets us reduce the representation size of each symbol, improving compression but at the expense of added errors. It’s the main tuning knob for controlling data rate.

    Video Compression Review

    Transform & Quantization


    Zig-zag scanning and run-length encoding order the data into 1-D arrays and replace long runs of zeros with run-length symbols.

    Video Compression Review

    Transform & Quantization
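
    A minimal sketch of the zig-zag scan plus a (zero-run, value) encoding for an 8×8 coefficient block; this is illustrative, not the exact H.263 symbol set:

```python
import numpy as np

def zigzag_order(n=8):
    """Return (row, col) pairs in zig-zag order for an n x n block."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

def run_length_encode(block):
    """Scan in zig-zag order and emit (zero_run, value) pairs."""
    pairs, run = [], 0
    for r, c in zigzag_order(block.shape[0]):
        v = int(block[r, c])
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    pairs.append(("EOB", None))            # end-of-block marker
    return pairs

block = np.zeros((8, 8), dtype=int)
block[0, 0], block[0, 1], block[2, 0] = 12, -3, 5
print(run_length_encode(block))            # [(0, 12), (0, -3), (1, 5), ('EOB', None)]
```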


    These two components form the basis for many still image compression algorithms such as JPEG, PhotoCD, M-JPEG, and DV.

    Video Compression Review

    Still Image Compression


    Finally, because video is a sequence of pictures with high temporal correlation, we add motion estimation/compensation to try to predict as much of the current frame as possible from the previous frame.

    Video Compression Review

    Motion Estimation/Compensation


    The most common method is to predict each block in the current frame by a (possibly translated) block of the previous frame.

    Video Compression Review

    Motion Estimation/Compensation
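
    A minimal full-search block-matching sketch (sum of absolute differences over a small search window); real encoders use faster search strategies, and the names here are illustrative:

```python
import numpy as np

def motion_estimate(prev, cur, top, left, block=16, search=7):
    """Find the displacement of the block at (top, left) in `cur` that best
    matches a block in `prev`, by exhaustive SAD search."""
    target = cur[top:top + block, left:left + block].astype(int)
    best, best_sad = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            r, c = top + dy, left + dx
            if r < 0 or c < 0 or r + block > prev.shape[0] or c + block > prev.shape[1]:
                continue
            cand = prev[r:r + block, c:c + block].astype(int)
            sad = np.abs(target - cand).sum()
            if sad < best_sad:
                best_sad, best = sad, (dy, dx)
    return best, best_sad

# The prediction error (target minus the matched block) is what gets DCT-coded.
```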


    These three components form the basis for most of the standard video compression algorithms: MPEG-1, -2, & -4, H.261, H.263, and H.263+.

    Video Compression Review

    Garden Variety Video Coder

  • Section 1: Conferencing Video

    • Video Compression Review
    • Chronology of Video Standards (this section)
    • The Input Video Format
    • H.263 Overview
    • H.263+ Overview

  • Chronology of Video Standards

    [Timeline, 1990-2002: ITU-T standards H.261 (1990), H.263 (1996), H.263+ (1998), then H.263++ and H.263L; ISO standards MPEG-1 (1992), MPEG-2 (1994), MPEG-4 (1998), MPEG-7 (2001).]

  • Chronology of Video Standards

    • (1990) H.261, ITU-T
      – Designed to work at multiples of 64 kb/s (p×64).
      – Operates on standard frame sizes CIF, QCIF.
    • (1992) MPEG-1, ISO “Storage & Retrieval of Audio & Video”
      – Evolution of H.261.
      – Main application is CD-ROM based video (~1.5 Mb/s).

  • Chronology continued

    • (1994-5) MPEG-2, ISO “Digital Television”
      – Evolution of MPEG-1.
      – Main application is video broadcast (DirecTV, DVD, HDTV).
      – Typically operates at data rates of 2-3 Mb/s and above.

  • Chronology continued

    • (1996) H.263, ITU-T
      – Evolution of all of the above.
      – Supports more standard frame sizes (SQCIF, QCIF, CIF, 4CIF, 16CIF).
      – Targeted at low bit rate video.

  • Chronology continued

    • (1/99) MPEG-4, ISO “Multimedia Applications”
      – MPEG-4 video is based on H.263, similar to H.263+.
      – Adds more sophisticated binary and multi-bit transparency support.
      – Supports multi-layered, non-rectangular video display.
    • (2H/’00) H.263++ (H.263 V3), ITU-T
      – Tentative work item.
      – Addition of features to H.263.
      – Maintains backward compatibility with H.263 V.1.

  • Chronology continued

    • (2001) MPEG-7, ISO “Content Representation for Info Search”
      – Specifies a standardized description of various types of multimedia information. This description is associated with the content itself, to allow fast and efficient searching for material that is of interest to a user.
    • (2002) H.263L, ITU-T
      – Call for Proposals, early ’98.
      – Proposals reviewed through 11/98; decision to proceed.
      – Final form to be determined in 2001.

  • Section 1: Conferencing Video

    • Video Compression Review
    • Chronology of Video Standards
    • The Input Video Format (this section)
    • H.263 Overview
    • H.263+ Overview

  • Video Format for Conferencing

    • Input color format is YCbCr (a.k.a. YUV). Y is the luminance component; U & V are the chrominance (color difference) components.

    • Chrominance is subsampled by two in each direction.

    • Input frame size is based on the Common Intermediate Format (CIF), which is 352x288 pixels for luminance and 176x144 for each of the chrominance components.

    Input Format

    [Figure: a full-size Y plane alongside quarter-size Cb and Cr planes.]

  • YCbCr (YUV) Color Space

    • Defined as the input color space for H.263, H.263+, H.261, MPEG, etc.
    • It’s a 3x3 transformation from RGB:

        | Y  |   |  0.299  0.587  0.114 |   | R |
        | Cb | = | -0.169 -0.331  0.500 | × | G |
        | Cr |   |  0.500 -0.419 -0.081 |   | B |

    Y represents the luminance of a pixel; Cb and Cr represent the color difference, or chrominance, of the pixel.

    Input Format
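
    A small sketch of the conversion, directly applying the 3x3 matrix above (output kept in floating point, without the usual offsets or clamping):

```python
import numpy as np

# Rows: Y, Cb, Cr; columns: R, G, B (coefficients from the slide above).
RGB_TO_YCBCR = np.array([[ 0.299,  0.587,  0.114],
                         [-0.169, -0.331,  0.500],
                         [ 0.500, -0.419, -0.081]])

def rgb_to_ycbcr(rgb):
    """Convert an (H, W, 3) RGB image to YCbCr via the 3x3 transform."""
    return rgb.astype(float) @ RGB_TO_YCBCR.T

pixel = np.array([[[255, 0, 0]]])          # pure red
print(rgb_to_ycbcr(pixel))                  # Y ~ 76, Cb ~ -43, Cr ~ 128
```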

  • [Plot: sensitivity (log scale) vs. spatial frequency for the luminance (Y) and chrominance (C) channels.]

    Input Format

    • The human eye is more sensitive to spatial detail in luminance than in chrominance.

    • Hence, it doesn’t make sense to have as many pixels in the chrominance planes.

    Subsampled Chrominance

  • [Figure: legend shows luminance pels, chrominance pels, and block edges. Note: the chroma sample positions are different than MPEG-2 4:2:0.]

    Input Format

    Spatial relation between luma and chroma pels for CIF 4:2:0
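
    A rough sketch of 4:2:0 chroma subsampling by averaging 2x2 neighborhoods (one common choice; the exact filter and sample siting are codec-specific, as noted above):

```python
import numpy as np

def subsample_420(chroma):
    """Subsample a full-resolution chroma plane by 2 in each direction,
    averaging each 2x2 neighborhood."""
    h, w = chroma.shape
    c = chroma[:h - h % 2, :w - w % 2].astype(float)
    return (c[0::2, 0::2] + c[0::2, 1::2] + c[1::2, 0::2] + c[1::2, 1::2]) / 4

cb_full = np.arange(288 * 352, dtype=float).reshape(288, 352)
print(subsample_420(cb_full).shape)   # (144, 176) for a CIF-sized plane
```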

  • Common Intermediate Format

    • The input video format is based on the Common Intermediate Format, or CIF.

    • It is called the Common Intermediate Format because it is derivable from both 525 line/60 Hz (NTSC) and 625 line/50 Hz (PAL) video signals.

    • CIF is defined as 352 pels per line and 288 lines per frame.

    • The picture area for CIF is defined to have an aspect ratio of about 4:3. However,

        352 × 3/4 = 264 ≠ 288

    Input Format

  • Picture & Pixel Aspect Ratios

    [Figure: a 352x288 CIF picture with a 4:3 picture aspect ratio and a 12:11 pixel aspect ratio.]

    Pixels are not square in CIF.

    Input Format

  • Hence, on a square pixel display such as a computer screen, the video will look slightly compressed horizontally. The solution is to spatially resample the video frames to

    384 x 288 or 352 x 264

    This corresponds to a 4:3 aspect ratio for the picture area on a square pixel display.

    Input Format

    Picture & Pixel Aspect Ratios

  • The luma and chroma planes are divided into 8x8 pixel blocks. Every four luma blocks are associated with a corresponding Cb and Cr block to create a macroblock.

    [Figure: a macroblock is four 8x8 Y blocks plus one 8x8 Cb block and one 8x8 Cr block.]

    Input Format

    Blocks and Macroblocks

  • Section 1: Conferencing Video

    • Video Compression Review
    • Chronology of Video Standards
    • The Input Video Format
    • H.263 Overview (this section)
    • H.263+ Overview

  • ITU-T Recommendation H.263

  • ITU-T Recommendation H.263

    • H.263 targets low data rates (< 28 kb/s). For example, it can compress QCIF video to 10-15 fps at 20 kb/s.

    • For the first time there is a standard video codec that can be used for video conferencing over normal phone lines (H.324).

    • H.263 is also used in ISDN-based VC (H.320) and network/Internet VC (H.323).

  • Composed of a baseline plus four negotiable options:

    • Baseline Codec
    • Unrestricted/Extended Motion Vector Mode
    • Advanced Prediction Mode
    • PB Frames Mode
    • Syntax-based Arithmetic Coding Mode

    ITU-T Recommendation H.263

  • Frame Formats

    Format   Y (luma)     U,V (chroma)
    SQCIF    128x96       64x48
    QCIF     176x144      88x72
    CIF      352x288      176x144
    4CIF     704x576      352x288
    16CIF    1408x1152    704x576

    Always 12:11 pixel aspect ratio.

    H.263 Baseline

  • Picture & Macroblock Types

    • Two picture types:
      – INTRA (I-frame): no temporal prediction is performed.
      – INTER (P-frame): may employ temporal prediction.

    • Macroblock (MB) types:
      – INTRA & INTER MB types (even in P-frames).
        • INTER MBs have shorter symbols in P-frames.
        • INTRA MBs have shorter symbols in I-frames.
      – Not coded: MB data is copied from the previous decoded frame.

    H.263 Baseline

  • [Figure: the predictor for the motion vector of block X uses neighboring blocks A, B, and C.]

    ΔMV_X = MV_X − median(MV_A, MV_B, MV_C)

    H.263 Baseline

    • Motion vectors have 1/2-pixel granularity. Reference frames must be interpolated by two.

    • MVs are not coded directly; rather, a median predictor is used.

    • The predictor residual is then coded using a VLC table.

    Motion Vectors
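
    A tiny sketch of the median prediction step described above (component-wise median of the three neighboring vectors; the special cases at picture edges are omitted):

```python
def median(a, b, c):
    return sorted((a, b, c))[1]

def mv_delta(mv_x, mv_a, mv_b, mv_c):
    """Residual actually coded: current MV minus the component-wise
    median of the neighboring A, B, and C motion vectors."""
    pred = (median(mv_a[0], mv_b[0], mv_c[0]),
            median(mv_a[1], mv_b[1], mv_c[1]))
    return (mv_x[0] - pred[0], mv_x[1] - pred[1])

print(mv_delta((1.5, -0.5), (1.0, 0.0), (2.0, -1.0), (0.5, 0.0)))  # (0.5, -0.5)
```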

  • Motion Vector Delta (MVD) Symbol Lengths

    [Plot: code length in bits (0-14) vs. MVD absolute value (0 to 15.5); code length grows with the MVD magnitude.]

    H.263 Baseline

  • H.263 Baseline

    Assign a variable length code according to three parameters (3-D VLC):

    1 - Length of the run of zeros preceding the current nonzero coefficient.
    2 - Amplitude of the current coefficient.
    3 - Indication of whether the current coefficient is the last one in the block.

    The most common combinations are variable length coded (3-13 bits); the rest are coded with escape sequences (22 bits).

    Transform Coefficient Coding
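
    A small sketch of forming the (last, run, level) events that the 3-D VLC would then map to code words (the table lookup itself is omitted; the names are illustrative):

```python
def coefficient_events(zigzag_coeffs):
    """Turn a zig-zag-ordered coefficient list into (last, run, level) events:
    run   = number of zeros before this nonzero coefficient,
    level = the coefficient value,
    last  = 1 if no nonzero coefficients follow, else 0."""
    nonzero_positions = [i for i, v in enumerate(zigzag_coeffs) if v != 0]
    events, prev = [], -1
    for n, pos in enumerate(nonzero_positions):
        run = pos - prev - 1
        last = 1 if n == len(nonzero_positions) - 1 else 0
        events.append((last, run, zigzag_coeffs[pos]))
        prev = pos
    return events

print(coefficient_events([12, -3, 0, 0, 0, 5] + [0] * 58))
# [(0, 0, 12), (0, 0, -3), (1, 3, 5)]
```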

  • [Figure: input-output characteristic of a center-clipping scalar quantizer with steps at ±Q, ±2Q, ...]

    H.263 Baseline

    • H.263 uses a scalar quantizer with center clipping.
    • The quantizer varies from 2 to 62, by 2’s.
    • It can be varied by ±1, ±2 at macroblock boundaries (2 bits), or set to 2-62 at row and picture boundaries (5 bits).

    Quantization
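
    A minimal sketch of a center-clipping (dead-zone) scalar quantizer of the kind described above; the exact H.263 level and reconstruction rules are not reproduced here:

```python
import numpy as np

def quantize(coeffs, Q):
    """Center-clipping quantizer: values in the dead zone around zero map
    to level 0; larger values map to integer levels of step 2*Q."""
    return np.sign(coeffs) * (np.abs(coeffs) // (2 * Q))

def dequantize(levels, Q):
    """Reconstruct toward the middle of each quantizer bin (0 stays 0)."""
    return np.where(levels == 0, 0, np.sign(levels) * (2 * Q * np.abs(levels) + Q))

coeffs = np.array([-70, -20, -5, 0, 5, 20, 70])
levels = quantize(coeffs, Q=8)
print(levels)                  # [-4 -1  0  0  0  1  4]
print(dequantize(levels, 8))   # [-72 -24   0   0   0  24  72]
```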

  • Hierarchy of three layers.

    Picture Layer

    GOB* Layer

    MB Layer

    *A GOB is usually a row of macroblocks, except for frame sizes greater than CIF.

    Picture Hdr GOB Hdr MB MB ... GOB Hdr ...

    H.263 Baseline

    Bit Stream Syntax

  • Picture Start Code | Temporal Reference | Picture Type | Picture Quant

    H.263 Baseline

    • PSC - A sequence of bits that cannot be emulated anywhere else in the bit stream.

    • TR - A 29.97 Hz counter indicating the time reference for a picture.

    • PType - Denotes INTRA, INTER-coded, etc.

    • PQuant - Indicates which quantizer (2…62) is used initially for the picture.

    Picture Layer Concepts

  • GOB Start Code | GOB Number | GOB Quant

    H.263 Baseline

    • GSC - Another unique start code (17 bits).
    • GOB Number - Indicates which GOB, counting vertically from the top (5 bits).
    • GOB Quant - Indicates which quantizer (2…62) is used for this GOB (5 bits).

    GOB Layer Concepts
    GOB headers are optional.
    A GOB can be decoded independently from the rest of the frame.

  • Coded Flag | MB Type | Coded Block Pattern | DQuant | MV Deltas | Transform Coefficients

    H.263 Baseline

    • COD - If set, indicates an empty INTER MB.
    • MB Type - Indicates INTER, INTRA, whether an MV is present, etc.
    • CBP - Indicates which blocks, if any, are empty.
    • DQuant - Indicates a quantizer change by +/- 2, 4.
    • MV Deltas - The MV prediction residuals.
    • Transform coefficients - The 3-D VLCs for the coefficients.

    Macroblock Layer Concepts

  • Unrestricted/Extended Motion Vector Mode

    • Motion vectors are permitted to point outside the picture boundaries.
      – Non-existent pixels are created by replicating the edge pixels.
      – Improves compression when there is movement across the edge of a picture boundary or when there is camera panning.

    • It is also possible to extend the range of the motion vectors from [-16, 15.5] to [-31.5, 31.5] with some restrictions. This better addresses high-motion scenes.

    H.263 Options

  • Motion Vectors Over Picture Boundaries

    [Figure: a motion vector in target frame N points partly outside reference frame N-1; edge pixels are repeated to fill the missing area.]

    H.263 Options

  • Extended MV Range

    [Figure: the base motion vector range is [-16, 15.5] in each dimension; the extended range is [-16, 15.5] around the MV predictor, reaching up to (31.5, 31.5).]

    H.263 Options

  • H.263 Options

  • Includes motion vectors across picture boundaries from the previous mode.

    • Option of using four motion vectors for 8x8 blocks instead of one motion vector per 16x16 block as in baseline.

    • Overlapped motion compensation to reduce blocking artifacts.

    Advanced Prediction Mode

  • Overlapped Motion Compensation

    • In normal motion compensation, the current block is composed of
      – the predicted block from the previous frame (referenced by the motion vectors), plus
      – the residual data transmitted in the bit stream for the current block.

    • In overlapped motion compensation, the prediction is a weighted sum of three predictions.

    H.263 Options

  • Overlapped Motion Compensation

    H.263 Options

    • Let (m, n) be the column & row indices of an 8×8 pixel block in a frame.

    • Let (i, j) be the column & row indices of a pixel within an 8×8 block.

    • Let (x, y) be the column & row indices of a pixel within the entire frame, so that:

    (x, y) = (m×8 + i, n×8 + j)

  • Overlapped Motion Comp.

    • Let (MV0x, MV0y) denote the motion vectors for the current block.

    • Let (MV1x, MV1y) denote the motion vectors for the block above (below) if the current pixel is in the top (bottom) half of the current block.

    • Let (MV2x, MV2y) denote the motion vectors for the block to the left (right) if the current pixel is in the left (right) half of the current block.

    H.263 Options

    [Figure: the current block uses MV0; MV1 comes from the blocks above/below and MV2 from the blocks left/right.]

  • Overlapped Motion Comp.

    The summed, weighted prediction is:

    P(x,y) = (q(x,y)·H0(i,j) + r(x,y)·H1(i,j) + s(x,y)·H2(i,j) + 4) / 8

    where q, r, s are previous-frame pixels displaced by each motion vector:
    q(x,y) = prev(x + MV0x, y + MV0y),
    r(x,y) = prev(x + MV1x, y + MV1y),
    s(x,y) = prev(x + MV2x, y + MV2y)

    H.263 Options

  • Overlapped Motion Comp.

    H0(i, j) =

    4 5 5 5 5 5 5 4
    5 5 5 5 5 5 5 5
    5 5 6 6 6 6 5 5
    5 5 6 6 6 6 5 5
    5 5 6 6 6 6 5 5
    5 5 6 6 6 6 5 5
    5 5 5 5 5 5 5 5
    4 5 5 5 5 5 5 4

    H.263 Options

  • Overlapped Motion Comp.

    H1(i, j) =

    1 2 2 2 2 2 2 1
    1 1 2 2 2 2 1 1
    1 1 1 1 1 1 1 1
    1 1 1 1 1 1 1 1
    1 1 1 1 1 1 1 1
    1 1 1 1 1 1 1 1
    1 1 2 2 2 2 1 1
    1 2 2 2 2 2 2 1

    H2(i, j) = ( H1(i, j) )T

    H.263 Options
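
    A sketch of the weighted-sum prediction described above, using the H0/H1/H2 weights shown on these slides; integer motion vectors only, and picture-boundary handling is ignored for brevity:

```python
import numpy as np

H0 = np.array([[4,5,5,5,5,5,5,4], [5,5,5,5,5,5,5,5], [5,5,6,6,6,6,5,5],
               [5,5,6,6,6,6,5,5], [5,5,6,6,6,6,5,5], [5,5,6,6,6,6,5,5],
               [5,5,5,5,5,5,5,5], [4,5,5,5,5,5,5,4]])
H1 = np.array([[1,2,2,2,2,2,2,1], [1,1,2,2,2,2,1,1], [1,1,1,1,1,1,1,1],
               [1,1,1,1,1,1,1,1], [1,1,1,1,1,1,1,1], [1,1,1,1,1,1,1,1],
               [1,1,2,2,2,2,1,1], [1,2,2,2,2,2,2,1]])
H2 = H1.T

def fetch(prev, top, left, mv):
    """8x8 block of the previous frame displaced by an integer MV (dx, dy)."""
    r, c = top + mv[1], left + mv[0]
    return prev[r:r+8, c:c+8].astype(int)

def obmc_predict(prev, top, left, mv_cur, mv_above, mv_below, mv_left, mv_right):
    """Overlapped MC for the 8x8 block at (top, left): a weighted sum of the
    prediction with the current MV, the above/below MV (per pixel half), and
    the left/right MV (per pixel half)."""
    q = fetch(prev, top, left, mv_cur)
    # r uses the block-above MV for the top half, the block-below MV for the bottom half
    r = np.vstack([fetch(prev, top, left, mv_above)[:4],
                   fetch(prev, top, left, mv_below)[4:]])
    # s uses the block-left MV for the left half, the block-right MV for the right half
    s = np.hstack([fetch(prev, top, left, mv_left)[:, :4],
                   fetch(prev, top, left, mv_right)[:, 4:]])
    return (q * H0 + r * H1 + s * H2 + 4) // 8
```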

  • H.263 Options

  • Permits two pictures to be coded as one unit: a P frame as in baseline, and a bi-directionally predicted frame, or B frame.

    • B frames provide more efficient compression at times.

    • Can increase the frame rate 2X with only about a 30% increase in bit rate.

    • Restriction: the backward predictor cannot extend outside the current MB position of the future frame. See diagram.

    PB Frames Mode

  • [Figure: Picture 1 (P or I frame), Picture 2 (B frame), Picture 3 (P or I frame); the B frame is predicted with scaled vectors V/2 and -V/2.]

    2X frame rate for only ~30% more bits.

    H.263 Options

    PB Frames

  • H.263 Options

  • In this mode, all the variable length coding and decoding of baseline H.263 is replaced with arithmetic coding/decoding. This removes the restriction that each symbol must be represented by an integer number of bits, thus improving compression efficiency.

    • Experiments indicate that compression can be improved by up to 10% over variable length coding/decoding.

    • The complexity of arithmetic coding is higher than that of variable length coding, however.

    Syntax-based Arithmetic Coding Mode

  • • H.261 only accepts QCIF and CIF formats.

    • There is no 1/2-pel motion estimation in H.261; instead it uses a spatial loop filter.

    • H.261 does not use median predictors for motion vectors but simply uses the motion vector of the MB to the left as the predictor.

    • H.261 does not use a 3-D VLC for transform coefficient coding.

    • GOB headers are mandatory in H.261.

    • Quantizer changes at MB granularity require 5 bits in H.261 and only 2 bits in H.263.

    H.263 Improvements over H.261

  • H.261 H.263

    Demo: QCIF, 8 fps @ 28 Kb/s

  • Video Conferencing Demonstration

  • Section 1: Conferencing Video

    • Video Compression Review
    • Chronology of Video Standards
    • The Input Video Format
    • H.263 Overview
    • H.263+ Overview (this section)

  • ITU-T Recommendation H.263 Version 2 (H.263+)

  • H.263 Ver. 2 (H.263+)

    • H.263+ was standardized in January 1998.

    • H.263+ is the working name for H.263 Version 2.

    • It adds negotiable options and features while still retaining a backwards-compatibility mode.

    H.263+

  • H.263 “plus” more negotiable options:

    • Arbitrary frame size, pixel aspect ratio (including square), and picture clock frequency
    • Advanced INTRA frame coding
    • Loop de-blocking filter
    • Slice structures
    • Supplemental enhancement information
    • Improved PB-frames

    H.263+: Overview

  • H.263+: Overview
    H.263 “plus” more negotiable options (continued):

    • Reference picture selection
    • Temporal, SNR, and spatial scalability modes
    • Reference picture resampling
    • Reduced resolution update mode
    • Independently segmented decoding
    • Alternative INTER VLC
    • Modified quantization

  • H.263+

    • In addition to the multiples of CIF, H.263+ permits any frame size from 4x4 to 2048x1152 pixels in increments of 4.

    • Besides the 12:11 pixel aspect ratio (PAR), H.263+ supports square (1:1), 525-line 4:3 picture (10:11), CIF for 16:9 picture (16:11), 525-line for 16:9 picture (40:33), and other arbitrary ratios.

    • In addition to the picture clock frequency of 29.97 Hz (NTSC), H.263+ supports 25 Hz (PAL), 30 Hz, and other arbitrary frequencies.

    Arbitrary Frame Size, Pixel Aspect Ratio, Clock Frequency

  • H.263+

    • In this mode, either the DC coefficient, the 1st column, or the 1st row of coefficients is predicted from neighboring blocks.

    • The prediction is determined on a MB-by-MB basis.

    • Essentially DPCM of INTRA DCT coefficients.

    • Can save up to 40% of the bits on INTRA frames.

    Advanced INTRA Coding Mode

  • Advanced INTRA Mode

    [Figure: DCT blocks illustrating row prediction and column prediction of coefficients from neighboring blocks.]

    H.263+

  • [Figure: four adjacent pixels A, B, C, D straddling the boundary between block1 and block2.]

    H.263+

    • Filters pixels along block boundaries while preserving edges in the image content.

    • The filter is in the coding loop, which means it filters the decoded reference frame used for motion compensation.

    • Can be used in conjunction with a post-filter to further reduce coding artifacts.

    Deblocking Filter Mode

  • Deblocking Filter Mode

    [Figure: the pixels A, B, C, D are filtered across both horizontal and vertical block boundaries.]

    H.263+

  • Deblocking Filter Mode

    • A, B, C, and D are replaced by new values A1, B1, C1, and D1 based on a set of non-linear equations.

    • The strength of the filter is proportional to the quantization strength.

    H.263+

  • Deblocking Filter Mode
    A, B, C, D are replaced by A1, B1, C1, D1:

    d1 = Filter((A - 4B + 4C - D)/8, Strength(QUANT))
    d2 = clipd1((A - D)/4, d1/3)
    B1 = clip(B + d1)
    C1 = clip(C - d1)
    A1 = A - d2
    D1 = D + d2

    Filter(x, Strength) = SIGN(x) * (MAX(0, abs(x) - MAX(0, 2*(abs(x) - Strength))))

    H.263+
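
    A direct sketch of the equations above; the clip range, the Strength(QUANT) mapping, and the clipd1 limit behavior are assumptions here, and only the arithmetic from the slide is reproduced:

```python
def _filter(x, strength):
    """Filter(x, Strength) from the slide: small boundary differences pass,
    large ones (likely real edges) are attenuated to zero."""
    sign = 1 if x >= 0 else -1
    return sign * max(0, abs(x) - max(0, 2 * (abs(x) - strength)))

def _clip(v, lo=0, hi=255):               # assumed 8-bit sample range
    return max(lo, min(hi, v))

def _clipd1(v, lim):                       # assumed: clip v to [-|lim|, +|lim|]
    lim = abs(lim)
    return max(-lim, min(lim, v))

def deblock(a, b, c, d, strength):
    """Filter four pixels A, B, C, D straddling a block boundary."""
    d1 = _filter((a - 4 * b + 4 * c - d) / 8, strength)
    d2 = _clipd1((a - d) / 4, d1 / 3)
    return a - d2, _clip(b + d1), _clip(c - d1), d + d2

print(deblock(100, 96, 88, 84, strength=6))   # the step between B and C is smoothed
```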

  • Post-Filter

  • Filter the decoded frame first horizontally, then vertically, using a 1-D filter.

    • The post-filter strength is proportional to the quantization: Strength(QUANT).

    D1 = D + Filter((A+B+C+E+F+G - 6D)/8, Strength)

    H.263+

  • Deblocking Filter Demo

    H.263+

    No Filter | Deblocking Loop Filter

  • Deblocking Filter Demo

    H.263+

    No Filter | Loop & Post Filter

  • Filter Demo Videos

    No Filter | Loop Filter | Loop & Post Filter

  • H.263+

  • Allows insertion of resynchronization markers at macroblock boundaries to improve network packetization and reduce overhead. More on this later.

    • Allows more flexible tiling of video frames into independently decodable areas to support “view ports,” a.k.a. “local decode.”

    • Improves error resiliency by reducing intra-frame dependence.

    • Permits out-of-order transmission to reduce latency.

    Slice Structured Mode

  • Slice Structured Mode

    [Figure: a frame divided by slice boundaries.]

    • No INTRA or MV prediction across slice boundaries.
    • Slices start and end on macroblock boundaries.

    H.263+

  • Slice Structured Mode: Independent Segments

    [Figure: a frame divided by slice boundaries.]

    • No INTRA or MV prediction across slice boundaries.
    • Slice sizes remain fixed between INTRA frames.

    H.263+

  • H.263+

    Backwards compatible with H.263, but permits indication of supplemental information for features such as:

    • Partial and full picture freeze requests
    • Partial and full picture snapshot tags
    • Video segment start and end tags for off-line storage
    • Progressive refinement segment start and end tags
    • Chroma keying info for transparency

    Supplemental Enhancement Information

  • H.263+

  • Allows frame size changes of a compressed video sequence without inserting an INTRA frame.

    • Permits the warping of the reference frame via affine transformations to address special effects such as zoom, rotation, and translation.

    • Can be used for emergency rate control by dropping frame sizes adaptively when the bit rate gets too high.

    Reference Picture Resampling

  • Reference Picture Resampling with Warping

    Specify arbitrary warping parameters via displacement vectors from the corners.

    H.263+

  • Reference Picture Resampling: Factor of 4 Size Change

    [Figure: a sequence of P frames whose size changes; no INTRA frame is required when changing video frame sizes.]

    H.263+

  • [Figure: a bit stream made of a Base Layer, Enhancement Layer 1, and Enhancement Layer 2.]

    H.263+

    • A scalable bit stream consists of layers representing different levels of video quality.

    • Everything can be discarded except for the base layer and still have reasonable video.

    • If bandwidth permits, one or more enhancement layers can also be decoded, which refine the base layer in one of three ways:

    temporal, SNR, or spatial

    Scalability Mode

  • Layered Video Bit Streams

    [Figure: an H.263+ encoder producing a base layer plus four enhancement layers; rates shown: 20, 40, 90, 200, and 320 kb/s.]

    H.263+

  • H.263+

  • Scalability is typically used when one bit stream must support several different transmission bandwidths simultaneously, or when some process downstream needs to change the data rate unbeknownst to the encoder.

    • Example: a conferencing Multipoint Control Unit (we’ll see another example in Internet Video).

    Scalability Mode

  • Layered Video Bit Streams in Multipoint Conferencing

    [Figure: participants connected at different rates: 384 kb/s, 384 kb/s, 128 kb/s, and 28.8 kb/s.]

    H.263+

  • Temporal Enhancement

    [Figure: Base Layer vs. Base Layer + B Frames; adding B frames gives a higher frame rate.]

    H.263+

  • Temporal scalability means that two or more frame rates can be supported by the same bit stream. In other words, frames can be discarded (to lower the frame rate) and the bit stream remains usable.

    H.263+

    Temporal Scalability

    [Figure: an I or P frame, followed by B, B, P, ...]

  • H.263+

  • The discarded frames are never used for prediction.

    • In the previous diagram, the I and P frames form the base layer and the B frames form the temporal enhancement layer.

    • This is usually achieved using bidirectionally predicted frames, or B-frames.

    Temporal Scalability

  • [Figure: Picture 1 (P or I frame), Picture 2 (B frame), Picture 3 (P or I frame); the B frame is predicted with scaled vectors V/2 and -V/2.]

    2X frame rate for only ~30% more bits

    H.263+

    B Frames

  • Temporal Scalability Demonstration

    • layer 0, 3.25 fps, P-frames

    • layer 1, 15 fps, B-frames

    H.263+

  • SNR Enhancement

    [Figure: Base Layer vs. Base Layer + SNR Layer; the SNR layer gives better spatial quality.]

    H.263+

  • H.263+

  • Base layer frames are coded just as they would be in a normal coding process.

    • The SNR enhancement layer then codes the difference between the decoded base layer frames and the originals.

    • The SNR enhancement MBs may be predicted from the base layer, or from the previous frame in the enhancement layer, or both.

    • The process may be repeated by adding another SNR enhancement layer, and so on.

    SNR Scalability

  • SNR Scalability

    [Figure: a base layer (15 kbit/s) of I, P, P frames with an enhancement layer (40 kbit/s) of EI, EP, EP frames predicted from it.]

    Legend:
    I - Intra-coded or key frame
    P - Predicted frame
    EI - Enhancement layer key frame
    EP - Enhancement layer predicted frame

    H.263+

  • SNR Scalability Demonstration

    • layer 0, 10 fps, 40 kbps

    • layer 1, 10 fps, 400 kbps

    H.263+

  • Spatial Enhancement

    [Figure: Base Layer vs. Base Layer + Spatial Layer; the spatial layer gives more spatial resolution.]

    H.263+

  • H.263+

  • For spatial scalability, the video is down-sampled by two horizontally and vertically prior to encoding as the base layer.

    • The enhancement layer is 2X the size of the base layer in each dimension.

    • The base layer is interpolated by 2X before predicting the spatial enhancement layer.

    Spatial Scalability

  • H.263+

    Spatial Scalability

    [Figure: a base layer of I, P, P frames and an enhancement layer of EI, EP, EP frames at twice the resolution.]

  • Spatial Scalability Demonstration

    • layer 0, QCIF, 10 fps, 60 kbps

    • layer 1, CIF, 10 fps, 300 kbps

    H.263+

  • It is possible to combine temporal, SNR, and spatial scalability into a flexible layered framework with many levels of quality.

    H.263+

    Hybrid Scalability

  • H.263+

    Hybrid Scalability

    [Figure: a base layer of I, B, P frames, an Enhancement Layer 1 of EI/EP frames, and an Enhancement Layer 2 of EI/EP frames built on top of it.]

  • Scalability Demonstration

    • SNR/Spatial Scalability, 10 fps
      – layer 0, 88x72, ~5 kbit/s
      – layer 1, 176x144, ~15 kbit/s
      – layer 2, 176x144, ~40 kbit/s
      – layer 3, 352x288, ~80 kbit/s
      – layer 4, 352x288, ~200 kbit/s

    H.263+

  • H.263+

    Other Miscellaneous Features

    • Improved PB-frames
      – Improves upon the previous PB-frames mode by permitting forward prediction of the “B” frame with a new vector.

    • Reference picture selection (discussed later)
      – A lower-latency method for dealing with error-prone environments by using some type of back-channel to indicate to an encoder when a frame has been received and can be used for motion estimation.

    • Reduced resolution update mode
      – Used for bit rate control by reducing the size of the residual frame adaptively when the bit rate gets too high.

  • Other Miscellaneous Features

    • Independently decodable segments
      – When signaled, this restricts the use of data outside of the current Group-of-Blocks segment or slice segment. Useful for error resiliency.

    • Alternate INTER VLC
      – Permits use of an alternative VLC table that is better suited for INTRA-coded blocks, or blocks with low quantization.

    H.263+

  • Other Miscellaneous Features

    • Modified Quantization
      – Allows more flexibility in adapting quantizers on a macroblock-by-macroblock basis by enabling large quantizer changes through the use of escape codes.
      – Reduces the quantizer step size for chrominance blocks, compared to luminance blocks.
      – Modifies the allowable DCT coefficient range to avoid clipping, yet disallows illegal coefficient/quantizer combinations.

    H.263+

  • Outline

    Section 1: Conferencing Video
    Section 2: Internet Review
    Section 3: Internet Video

  • The Internet

  • Phone lines are “circuit-switched.” A (virtual) circuit is established at call initiation and remains for the duration of the call.

    [Figure: a source connected to a destination through a fixed path of switches.]

    Internet Review

    Internet Basics

  • Computer networks are “packet-switched.” Data is fragmented into packets, and each packet finds its way to the destination, possibly using different routes. Lots of implications...

    [Figure: packets from the source take different paths through the switches to the destination.]

    Internet Review

    Internet Basics

  • The Internet is heterogeneous [V. Cerf]

    [Figure: a network diagram connecting corporate LANs, dial-up IP hosts (SLIP, PPP), mail systems and gateways (SMTP, X.25, MCI Mail, TYMNET, LAN Mail), online services (AOL), and carrier networks (HyperStream FR, SMDS, ATM) through routers to the global public Internet.]

  • Layers in the Internet Protocol Architecture

    1. Network Access Layer: routines for accessing physical networks.
    2. Internet Layer: defines the datagram and handles the routing of data.
    3. Host-to-Host Transport Layer: provides end-to-end data delivery services.
    4. Application Layer: applications and processes that use the network.

    Internet Review

  • Data Encapsulation

    [Figure: application data gains a header at each layer: the Transport Layer, Internet Layer, and Network Access Layer each prepend their own header to the data passed down from above.]

    Internet Review

  • Internet Protocol Architecture

    [Figure: the protocol stack. Network access layer: FDDI, Ethernet, Token Ring, HDLC, SMDS, X.25, ATM, FR, ... Internet layer: IP. Host-to-host transport layer: TCP, UDP, RTP. Utility/application layer: SNMP, DNS, TELNET, FTP, SMTP, MIME, ..., plus MBone tools such as VIC/VAT.]

    Internet Review

  • Specific Protocols for Multimedia

    [Figure: RTP runs over UDP, which runs over IP, which runs over the physical network; the data is carried as an RTP payload header plus payload, encapsulated in turn by the UDP and IP headers.]

    Internet Review

  • Internet Review

  • IP implements two basic functions: addressing & fragmentation.

    • IP treats each packet as an independent entity.

    • Internet routers choose the best path to send each packet based on its address. Each packet may take a different route.

    • Routers may fragment and reassemble packets when necessary for transmission on smaller-packet networks.

    The Internet Protocol (IP)

  • Internet Review

    The Internet Protocol (IP)

    • IP packets have a Time-to-Live, after which they are deleted by a router.

    • IP does not ensure secure transmission.

    • IP only error-checks headers, not payload.

    • Summary: there is no guarantee a packet will reach its destination, and no guarantee of when it will get there.

  • Transmission Control Protocol (TCP)

    Internet Review

    • TCP is a connection-oriented, end-to-end reliable, in-order protocol.

    • TCP does not make any reliability assumptions about the underlying networks.

    • An acknowledgment is sent for each packet.

    • A transmitter places a copy of each packet sent in a timed buffer. If no “ack” is received before the time runs out, the packet is re-transmitted.

    • TCP has inherently large latency; it is not well suited for streaming multimedia.

  • Internet Review

    User Datagram Protocol (UDP)

    • UDP is a simple protocol for transmitting packets over IP.

    • It has a smaller header than TCP, hence lower overhead.

    • It does not re-transmit packets. This is OK for multimedia, since a late packet usually must be discarded anyway.

    • It performs a checksum of the data.

  • Internet Review

    Real-time Transport Protocol (RTP)

    • RTP carries data that has real-time properties.
    • It typically runs on UDP/IP.
    • It does not ensure timely delivery or QoS.
    • It does not prevent out-of-order delivery.
    • Profiles and payload formats must be defined.
    • Profiles define extensions to the RTP header for a particular class of applications, such as audio/video conferencing (IETF RFC 1890).

  • Internet Review

    Real-time Transport Protocol (RTP)

    • Payload formats define how a particular kind of payload, such as H.261 video, should be carried in RTP.

    • RTP is used by Netscape LiveMedia, Microsoft NetMeeting®, Intel VideoPhone, and ProShare® Video Conferencing applications, and by public-domain conferencing tools such as VIC and VAT.

  • Internet Review

    Real-time Transport Control Protocol (RTCP)

    • RTCP is a companion protocol to RTP which monitors the quality of service and conveys information about the participants in an ongoing session.

    • It allows participants to send transmission and reception statistics to other participants. It also sends information that allows participants to associate media types, such as audio/video, for lip-sync.

  • Internet Review

    Real-time Transport Control Protocol (RTCP)

    • Sender reports allow senders to derive round-trip propagation times.

    • Receiver reports include a count of lost packets and the inter-arrival jitter.

    • RTCP scales to a large number of users because it reduces the rate of reports as the number of participants increases.

    • Most products today don’t use the information to avoid congestion, but that will change in the next year or two.

  • Internet Review

    Multicast Backbone (MBone)

    • Most IP-based communication is unicast: a packet is intended for a single destination. For multi-participant applications, streaming multimedia to each destination individually can waste network resources, since the same data may be travelling along the same sub-networks.

    • A multicast address is designed to enable the delivery of packets to a set of hosts that have been configured as members of a multicast group across various subnetworks.

  • Unicast Example: Streaming media to multiple participants

    [Figure: source S1 sends duplicate packets because there are two participants, D1 and D2; D2 sees excess traffic on its subnet.]

    Internet Review

  • Multicast Example: Streaming media to multiple participants

    [Figure: S1 sends a single set of packets to a multicast group; both D1 receivers subscribe to the same multicast group, and D2 doesn’t see any excess traffic on its subnet.]

    Internet Review

  • Multicast Backbone (MBone)

  • Most routers sold in the last 2-3 years support multicast.

    • It is not turned on yet in the Internet backbone.

    • Currently there is an MBone overlay which uses a combination of multicast (where supported) and tunneling.

    • Multicast at your local ISP may be 1-2 years away.

    Internet Review

  • Internet Review

    ReSerVation Protocol (RSVP), Internet Draft

    • Used by hosts to obtain a certain QoS from underlying networks for a multimedia stream.

    • At each node, an RSVP daemon attempts to make a resource reservation for the stream.

    • It communicates with two local modules: admission control and policy control.

    • Admission control determines whether the node has sufficient resources available (“the Internet busy signal”).

    • Policy control determines whether the user has administrative permission to make the reservation.

  • Internet Review

    Real-time Streaming Protocol (RTSP), Internet Draft

    • A “network remote control” for multimedia servers.

    • Establishes and controls either a single or several time-synchronized streams of continuous media such as audio and video.

    • Supports the following operations:
      – Request a presentation from a media server.
      – Invite a media server to join a conference and play back or record.
      – Notify clients that additional media is available for an existing presentation.

  • Internet Review

    Hyper-Text Transport Protocol (HTTP)

    • HTTP generally runs on TCP/IP and is the protocol upon which World-Wide-Web data is transmitted.

    • It defines a “stateless” connection between receiver and sender.

    • It sends and receives MIME-like messages and handles caching, etc.

    • It has no provisions for latency or QoS guarantees.

  • Outline

    Section 1: Conferencing Video
    Section 2: Internet Review
    Section 3: Internet Video

  • Internet Video

  • How do we stream video over the Internet?

    • How do we handle the special cases of unicasting? Multicasting?

    • What about packet loss? Quality of service? Congestion?

    We’ll look at some solutions...

    Internet Video

  • Internet Video

    HTTP Streaming

    • HTTP was not designed for streaming multimedia; nevertheless, because of its widespread deployment via Web browsers, many applications stream via HTTP.

    • It uses a custom browser plug-in which can start decoding video as it arrives, rather than waiting for the whole file to download.

    • It operates on TCP, so it doesn’t have to deal with errors, but the side effect is high latency and large inter-arrival jitter.

  • Internet Video

    HTTP Streaming

    • Usually a receive buffer is employed which can buffer enough data (usually several seconds) to compensate for latency and jitter.

    • It is not applicable to two-way communication!

    • Firewalls are not a problem with HTTP.

  • Internet Video

    RTP Streaming

    • RTP was designed for streaming multimedia.

    • It does not resend lost packets, since this would add latency, and a late packet might as well be lost in streaming video.

    • It is used by Intel VideoPhone, Microsoft NetMeeting, Netscape LiveMedia, RealNetworks, etc.

    • It forms the basis for network video conferencing systems (ITU-T H.323).

  • Internet Video

    RTP Streaming

    • RTP is subject to packet loss and has no quality of service guarantees.

    • It can deal with network congestion via RTCP reports under some conditions:
      – The encoding should be real time so the video rate can be changed dynamically.

    • It needs a payload defined for each media type it carries.

  • Internet Video

  • H.263 Payload for RTP

    • Payloads must be defined in the IETF for all media carried by RTP.

    • A payload has been defined for H.263 and is now an Internet RFC.

    • A payload has been defined for H.263+ as an ad-hoc group activity in the ITU and is now an Internet Draft.

    • An RTP packet typically consists of:

    RTP Header | H.263 Payload Header | H.263 Payload (bit stream)
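
    For orientation, a sketch of parsing the 12-byte fixed RTP header that precedes the payload header (field layout per the RTP spec, RFC 1889); the H.263-specific payload header is not decoded here:

```python
import struct

def parse_rtp_header(packet: bytes):
    """Parse the fixed 12-byte RTP header preceding the payload header."""
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version":      b0 >> 6,
        "padding":      (b0 >> 5) & 1,
        "extension":    (b0 >> 4) & 1,
        "csrc_count":   b0 & 0x0F,
        "marker":       b1 >> 7,          # e.g. set on the last packet of a frame
        "payload_type": b1 & 0x7F,
        "sequence":     seq,              # detects loss and reordering
        "timestamp":    timestamp,        # media sampling instant
        "ssrc":         ssrc,             # identifies the sender
    }

# Example: version 2, payload type 34 (H.263 in the RTP A/V profile), sequence 7.
pkt = struct.pack("!BBHII", 0x80, 34, 7, 90000, 0x1234ABCD) + b"\x00" * 8
print(parse_rtp_header(pkt))
```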

  • Internet Video

    H.263 Payload for RTP

    • The H.263 payload header contains redundant information about the H.263 bit stream, which can assist a payload handler and decoder in the event that related packets are lost.

    • The slice mode of H.263+ aids RTP packetization by allowing fragmentation on MB boundaries (instead of MB rows) and restricting data dependencies between slices.

    • But what do we do when packets are lost or arrive too late to use?

  • Error Resiliency: Redundancy & Concealment Techniques

    Internet Video

  • Internet Packet Loss

    • Packet loss depends on network topology.
    • On the MBone:
      – 2-5% packet loss
      – single packet loss is most common
    • For end-to-end transmission, loss rates of 10% are not uncommon.
    • For ISPs, loss rates may be even higher during periods of high congestion.

    Internet Video

  • Packet Loss Burst Lengths

    [Plot: distribution of the length of loss bursts observed at a receiver; probability of a burst of length b (log scale, 1 down to 0.0001) vs. burst length b (0 to 50).]

    Internet Video

  • Conditional Loss Probability

    [Plot: probability of losing packet n+1 given that the previous n consecutive packets were lost (0 to 0.8) vs. n (0 to 12).]

    Internet Video

  • First Order Loss Model: 2-State Gilbert Model

    [Figure: two states, No Loss and Loss; transition probability p from No Loss to Loss and q from Loss to No Loss; self-loop probabilities 1-p and 1-q.]

    p = 0.083, q = 0.823

    Internet Video
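
    A quick simulation sketch of the two-state Gilbert loss model with the p and q values quoted above, useful for generating synthetic loss traces when testing resiliency techniques:

```python
import random

def gilbert_losses(n, p=0.083, q=0.823, seed=1):
    """Simulate n packets; True means the packet is lost.
    p: P(No Loss -> Loss), q: P(Loss -> No Loss)."""
    rng = random.Random(seed)
    lost, trace = False, []
    for _ in range(n):
        lost = (rng.random() < p) if not lost else (rng.random() >= q)
        trace.append(lost)
    return trace

trace = gilbert_losses(100_000)
print("loss rate ~", sum(trace) / len(trace))   # ~ p / (p + q), about 0.09
```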

  • Error Resiliency

    [Figure: a balance between compression and resiliency; adding redundancy trades compression efficiency for resiliency.]

    • Error resiliency and compression have conflicting requirements.

    • Video compression attempts to remove as much redundancy from a video sequence as possible.

    • Error resiliency techniques at some point must reconstruct data that has been lost, and must rely on extrapolations from redundant data.

    Internet Video

  • Error Resiliency

    Errors tend to propagate in video compression because of its predictive nature.

    [Figure: one block is lost in an I or P frame; the error propagates to two blocks in the next P frame.]

    Internet Video

  • Error Resiliency

    Internet Video

    There are essentially two approaches to dealing with errors from packet loss:

    – Error redundancy methods are preventative measures that add extra information at the encoder to make it easier to recover when data is lost. The extra overhead decreases compression efficiency but should improve overall quality in the presence of packet loss.

    – Error concealment techniques are methods used to hide errors that occur once packets are lost.

    Usually both methods are employed.

  • Simple INTRA Coding & Skipped Blocks

    Internet Video

    • Increasing the number of INTRA-coded blocks that the encoder produces will reduce error propagation, since INTRA blocks are not predicted.

    • Blocks that are lost at the decoder are simply treated as empty INTER-coded blocks: the block is simply copied from the previous frame.

    • Very simple to implement.

  • Intra Coding Resiliency

    [Plot: average PSNR (20-45 dB) vs. data rate (20-180 kbps) for resiliency settings of 0, 5, and 10, with 0% and 10-20% packet loss.]

    Internet Video

  • Reference Picture Selection Mode of H.263+

    [Figure: a chain of frames (I or P, P, P); the last acknowledged error-free frame is used for prediction, while frames for which no acknowledgment has been received yet are not.]

    In RPS mode, a frame is not used for prediction in the encoder until it has been acknowledged to be error free.

    Internet Video

    Internet Video

  • Reference Picture Selection

    Internet Video

    • ACK-based: a picture is assumed to contain errors, and thus is not used for prediction unless an ACK is received, or...

    • NACK-based: a picture will be used for prediction unless a NACK is received, in which case the previous picture that didn’t receive a NACK will be used.

  • Multi-threaded Video

    [Figure: frames 1-10 interleaved into two independently decodable prediction threads (odd frames 1, 3, 5, 7, 9 and even frames 2, 4, 6, 8, 10), with periodic I (sync) frames.]

    Internet Video

    • Reference pictures are interleaved to create two or more independently decodable threads.

    • If a frame is lost, the frame rate drops to 1/2 rate until a sync frame is reached.

    • Uses the same syntax as Reference Picture Selection, but without ACK/NACK.

    • Adds some overhead, since prediction is not based on the most recent frame.

  • Conditional Replenishment

    [Figure: the encoder contains ME/MC and DCT stages plus a loop decoder; its output feeds the real decoder.]

    Internet Video

    • A video encoder contains a decoder (called the loop decoder) to create decoded previous frames, which are then used for motion estimation and compensation.

    • The loop decoder must stay in sync with the real decoder; otherwise errors propagate.

  • Conditional Replenishment

    Internet Video

    • One solution is to discard the loop decoder.
    • We can do this if we restrict ourselves to just two macroblock types:
      – INTRA coded, and
      – empty (just copy the same block from the previous frame).
    • The technique is to check whether the current block has changed substantially since the previous frame and code it as INTRA if it has; otherwise mark it as empty. (A small sketch of this decision follows.)
    • A periodic refresh of INTRA-coded blocks ensures all errors eventually disappear.

  • Error Tracking (Appendix II, H.263)

    Internet Video

    • Lost macroblocks are reported back to the encoder using a reliable back-channel.

    • The encoder catalogs the spatial propagation of each macroblock over the last M frames.

    • When a macroblock is reported missing, the encoder calculates the accumulated error in each MB of the current frame.

    • If an error threshold is exceeded, the block is coded as INTRA.

    • Additionally, the erroneous macroblocks are not used as prediction for future frames, in order to contain the error.

  • Prioritized Encoding

    [Figure: bit stream elements ordered by increasing error protection: AC coefficients, DC coefficients, MB information, motion vectors, picture header.]

    Internet Video

    • Some parts of a bit stream contribute more to image artifacts than others if lost.

    • The bit stream can be prioritized, and more protection can be added for higher-priority portions.

  • Prioritized Encoding Demo

    Internet Video

    Unprotected Encoding vs. Prioritized Encoding (23% overhead)

    Videos used with permission of ICSI, UC Berkeley

  • Error Concealment by Interpolation

    [Figure: a lost block surrounded by neighboring blocks, at distances d1 and d2 from the neighboring pixels.]

    Take the weighted average of the 4 neighboring pixels.

    Internet Video
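
    A simple sketch of this kind of spatial concealment: each pixel of the lost block becomes a distance-weighted average of the nearest pixels in the four neighboring blocks (the exact weighting used in any particular codec may differ):

```python
import numpy as np

def conceal_block(frame, top, left, size=8):
    """Fill the lost size x size block at (top, left) in-place using a
    distance-weighted average of the pixels just outside its four edges.
    Assumes the lost block is not on the frame border."""
    above   = frame[top - 1,        left:left + size].astype(float)
    below   = frame[top + size,     left:left + size].astype(float)
    left_c  = frame[top:top + size, left - 1].astype(float)
    right_c = frame[top:top + size, left + size].astype(float)
    for i in range(size):            # row inside the block
        for j in range(size):        # column inside the block
            d_top, d_bot = i + 1, size - i
            d_lft, d_rgt = j + 1, size - j
            w = np.array([1 / d_top, 1 / d_bot, 1 / d_lft, 1 / d_rgt])
            v = np.array([above[j], below[j], left_c[i], right_c[i]])
            frame[top + i, left + j] = (w @ v) / w.sum()
    return frame
```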

  • Other Error Concealment Techniques

    Internet Video

    • Error concealment with least-squares constraints
    • Error concealment with Bayesian estimators
    • Error concealment with polynomial interpolation
    • Error concealment with edge-based interpolation
    • Error concealment with a multi-directional recursive nonlinear filter (MRNF)

    See references for more information...

  • Example: MRNF Filtering

    … bpp, block loss: 10%; MRNF-GMLOS, PSNR = 34.94 dB

    Internet Video

  • Network Congestion

  • Most multimedia applications place the burden of rate adaptivity on the source.

    • For multicasting over heterogeneous networks and receivers, it’s impossible to meet the conflicting requirements, which forces the source to encode at a least-common-denominator level.

    • The smallest network pipe dictates the quality for all the other participants of the multicast session.

    • If congestion occurs, the quality of service degrades as more packets are lost.

    Internet Video

  • Receiver-driven Layered Multicast

    • If the responsibility of rate adaptation is moved to the receiver, heterogeneity is preserved.

    • One method of receiver-based rate adaptivity is to combine a layered source with a layered transmission system.

    • Each bit stream layer belongs to a different multicast group.

    • In this way, a receiver can control its rate by subscribing to multicast groups, and thus to layers of the video bit stream.

    Internet Video

  • Receiver-driven Layered Multicast

    [Figure: source S sends layers 1, 2, and 3 to different multicast groups; receivers D1, D2, and D3 subscribe to different subsets of the layers, and routers forward only the subscribed layers. Multicast groups are not transmitted on networks that have no subscribers.]

    Internet Video