CYH/MMT/CmpAV/p.1
Audio and video compression
4.1 Introduction
• Unlike text and images, both audio and most video signals are continuously varying analog signals.
• Compression algorithms associated with digitized audio and video are different from those associated with text and images.
4.2 Audio compression
• Speech and non-speech signals are encoded using different approaches.
4.2.1 Speech coding
• Differential pulse code modulation (DPCM) is a derivative of standard PCM. It exploits the fact that, for most audio signals, the range of the differences in amplitude between successive samples of the audio waveform is less than the range of the actual sample amplitudes. (Standard PCM with companding is specified in G.711.)
• In adaptive differential PCM (ADPCM), fewer bits are used to encode smaller difference values than larger ones. (G.721, G.722 & G.726)
• DPCM and ADPCM can also be used to encode non-speech signals.
• In linear predictive coding (LPC), a speech signal is analyzed to extract its perceptual features, including pitch and formant frequencies, and these features are then encoded. (LPC-10, G.728, G.723.1 & G.729)
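The DPCM idea described above can be sketched in a few lines. This is a minimal illustration, not the coder of any G-series standard: the function names, the fixed quantizer step and the sample values are all assumptions made for the example.

```python
def dpcm_encode(samples, step=1):
    """Encode each sample as the quantized difference from the previous reconstruction."""
    codes = []
    prediction = 0  # assumed starting value, shared with the decoder
    for sample in samples:
        code = round((sample - prediction) / step)
        codes.append(code)
        prediction += code * step  # track what the decoder will reconstruct
    return codes


def dpcm_decode(codes, step=1):
    """Rebuild the samples by accumulating the decoded differences."""
    samples = []
    prediction = 0
    for code in codes:
        prediction += code * step
        samples.append(prediction)
    return samples


# Hypothetical sample values: after the first sample, the differences
# span a much smaller range than the amplitudes themselves.
samples = [100, 102, 105, 104, 101, 99]
codes = dpcm_encode(samples)
print(codes)               # [100, 2, 3, -1, -3, -2]
print(dpcm_decode(codes))  # [100, 102, 105, 104, 101, 99]
```

After the first sample, every code lies between -3 and 3, so it needs far fewer bits than the original amplitudes. An adaptive scheme such as ADPCM would additionally vary the number of bits or the step size with the size of the differences.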
• Summary of speech compression standards and their applications (where two bit rates are listed, the quality entries correspond in order):

| Standard | Compression technique | Compressed bit rate (kbps) | Quality | Example applications |
|---|---|---|---|---|
| G.711 | PCM + companding | 64 | Good | PSTN/ISDN telephony |
| G.721 | ADPCM | 32; 16 | Good; Fair | Telephony at reduced bit rates |
| G.722 | ADPCM with subband coding | 64; 56/48 | Excellent; Good | Audio conferencing |
| G.726 | ADPCM with subband coding | 40/32; 24/16 | Good; Fair | General telephony at reduced bit rates |
| LPC-10 | LPC | 2.4/1.2 | Poor | Telephony in military networks |
| G.728 | Code-excited LPC (CELP) | 16 | Good | Low delay/low bit rate telephony |
| G.729 | CELP | 8 | Good | Telephony in cellular networks |
| G.729(A) | CELP | 8 | Good | Simultaneous telephony and data (fax) |
| G.723.1 | CELP | 6.3; 5.3 | Good; Fair | Video and internet telephony |
4.2.2 Perceptual coding
• The audio signal is coded based on a psychoacoustic model which describes the limitations of the human ear.
• The ear is more sensitive to some signals than others.
• Frequency masking: A strong signal may reduce the level of sensitivity of the ear to other signals which are near to it in frequency.
• Temporal masking: When the ear hears a loud sound, it takes a short but finite time before it can hear a quieter sound.
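Frequency masking can be illustrated with a deliberately crude toy model. The function name, the fixed masking bandwidth and the fixed level difference below are all assumptions chosen for the example; real psychoacoustic models use frequency-dependent masking curves rather than a single threshold.

```python
def audible_components(components, mask_bandwidth_hz=200.0, mask_drop_db=20.0):
    """Return the (frequency_hz, level_db) components not masked by a stronger neighbor.

    Toy rule (an assumption, not a standard): a component is masked when another
    component lies within mask_bandwidth_hz of it and is at least mask_drop_db louder.
    """
    kept = []
    for freq, level in components:
        masked = any(
            other_level - level >= mask_drop_db
            and abs(other_freq - freq) <= mask_bandwidth_hz
            for other_freq, other_level in components
            if (other_freq, other_level) != (freq, level)
        )
        if not masked:
            kept.append((freq, level))
    return kept


# Hypothetical tones: a loud 1 kHz tone, a quiet tone close to it, a quiet tone far away.
tones = [(1000.0, 80.0), (1100.0, 50.0), (4000.0, 50.0)]
print(audible_components(tones))  # [(1000.0, 80.0), (4000.0, 50.0)]
```

The quiet 1.1 kHz tone is dropped because it sits next to the loud 1 kHz tone, while the equally quiet 4 kHz tone survives. A perceptual coder uses this kind of decision to avoid spending bits on components the ear would not hear.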
MPEG audio coders
• An international standard based on this approach is defined in ISO Recommendation 11172-3.
• Summary of MPEG layer 1, 2 and 3 perceptual encoders:

| Layer | Application | Compressed bit rate | Quality | Example input-to-output delay |
|---|---|---|---|---|
| 1 | Digital audio cassette | 32-448 kbps | Hi-fi quality at 192 kbps per channel | 20 ms |
| 2 | Digital audio and digital video broadcasting | 32-192 kbps | Near CD-quality at 128 kbps per channel | 40 ms |
| 3 | CD-quality audio over low bit rate channel | 64 kbps | CD-quality at 64 kbps per channel | 60 ms |
• A higher layer makes better use of the psychoacoustic model and hence achieves a higher compression ratio.
• The 3 layers require increasing levels of complexity (and hence cost) to achieve a particular perceived quality. The choice of layer and bit rate is therefore often a compromise between the desired perceived quality and the available bit rate.
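To put the per-channel bit rates in the layer table into perspective, the sketch below compares them with uncompressed CD audio. The CD parameters (44.1 kHz sampling, 16 bits per sample) are not stated in these notes and are brought in as an assumption; the function name is illustrative.

```python
CD_SAMPLE_RATE_HZ = 44_100  # assumed CD sampling rate
CD_BITS_PER_SAMPLE = 16     # assumed CD sample size

# Uncompressed PCM bit rate of one CD audio channel, in kbps.
PCM_KBPS_PER_CHANNEL = CD_SAMPLE_RATE_HZ * CD_BITS_PER_SAMPLE / 1000


def compression_ratio(compressed_kbps):
    """Ratio of the uncompressed per-channel rate to the compressed rate."""
    return PCM_KBPS_PER_CHANNEL / compressed_kbps


for layer, kbps in [(1, 192), (2, 128), (3, 64)]:
    print(f"Layer {layer}: {kbps} kbps per channel, about {compression_ratio(kbps):.1f}:1")
```

Under these assumptions the ratios come out at roughly 3.7:1, 5.5:1 and 11:1 for layers 1, 2 and 3, which matches the statement that a higher layer achieves more compression for a similar perceived quality.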
Dolby audio coders
• In AC-1, the bit allocation information of the quantized subband samples is directly encoded and embedded in the bit-stream.
• In AC-2, this information is indirectly encoded and has to be estimated at the decoder.
• In AC-3, additional information is transmitted to compensate for the estimation error.
• The acoustic quality of the MPEG and Dolby audio coders was found to be comparable.
• Summary of compression standards for general audio:

| Standard | Compressed bit rate | Quality | Example applications |
|---|---|---|---|
| MPEG Audio Layer 1 | 32-448 kbps | Hi-fi quality at 192 kbps | Digital audio cassettes |
| MPEG Audio Layer 2 | 32-192 kbps | Near CD at 128 kbps | Digital audio and digital video broadcasting |
| MPEG Audio Layer 3 | 64 kbps | CD quality | CD-quality over low bit rate channels |
| Dolby AC-1 | 512 kbps | Hi-fi quality | Radio and television satellite relays |
| Dolby AC-2 | 256 kbps | Hi-fi quality | PC sound cards |
| Dolby AC-3 | 192 kbps | Near CD quality | Digital video broadcasting |
4.3 Video compression
• There is not just a single standard associated with video but rather a range of standards, each targeted at a particular application domain.
4.3.1 Video compression principles
• Video is simply a sequence of digitized pictures and is also referred to as moving pictures.
• A video sequence can be encoded with the JPEG algorithm frame by frame; this approach is known as motion JPEG.
• In addition to the spatial redundancy present in each frame, considerable redundancy is often present between successive frames.
• Frames are classified as one of three basic frame types (I-, P- and B-frames) and encoded differently.
• I-frames:
• I-frames are encoded independently using the JPEG algorithm.
• I-frames are inserted into the output stream relatively frequently.
• I-frames are used as access points for random access and FF/FR functionality in the bit stream.
• P-frames:
• Frames are partitioned into blocks of size 16x16 (macroblocks).
• To encode a P-frame, the contents of each macroblock in the target frame are compared on a pixel-by-pixel basis with the contents of the reference frame to find a best-matched block of equal size.
• The reference frame can be a P- or I-frame.
• The (x, y) offset between the macroblock being encoded and the best-matched block is known as the motion vector.
• This motion-vector-searching process is known as motion estimation.
• A prediction of the target frame is made from the reference frame based on the motion vectors obtained.
• The difference between the predicted frame and the actual target frame is known as the prediction error.
• Motion compensation: Additional bits are required to encode the prediction error so as to compensate for the difference if necessary.
• B-frames:
• To encode a B-frame, any motion is estimated with reference to both the immediately preceding I- or P-frame and the immediately succeeding P- or I-frame.
• B-frames provide the highest level of compression.
• B-frames are not involved in the coding of other frames and hence they do not propagate errors.
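The motion estimation step described for P-frames can be sketched as an exhaustive block-matching search. This is a minimal illustration under stated assumptions: the matching cost is the sum of absolute differences (SAD), the search window is a small square around the macroblock position, and the frames are tiny grayscale arrays with a 2x2 block standing in for a 16x16 macroblock. The function names and test frames are hypothetical.

```python
def sad(target, reference, tx, ty, rx, ry, n):
    """Sum of absolute pixel differences between two n x n blocks."""
    return sum(
        abs(target[ty + j][tx + i] - reference[ry + j][rx + i])
        for j in range(n)
        for i in range(n)
    )


def find_motion_vector(target, reference, tx, ty, n=16, search=7):
    """Search the reference frame around (tx, ty) for the best-matching block.

    Returns ((dx, dy), cost), where (dx, dy) is the offset from the block in the
    target frame to its best match in the reference frame.
    """
    height, width = len(reference), len(reference[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = tx + dx, ty + dy
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue  # candidate block would fall outside the frame
            cost = sad(target, reference, tx, ty, rx, ry, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    cost, dx, dy = best
    return (dx, dy), cost


def make_frame(width, height, patch_x, patch_y):
    """Black frame with one distinctive 2x2 patch (test data for this sketch)."""
    frame = [[0] * width for _ in range(height)]
    patch = [[10, 20], [30, 40]]
    for j in range(2):
        for i in range(2):
            frame[patch_y + j][patch_x + i] = patch[j][i]
    return frame


reference = make_frame(8, 8, patch_x=2, patch_y=3)
target = make_frame(8, 8, patch_x=4, patch_y=4)  # same patch, moved by (+2, +1)
vector, cost = find_motion_vector(target, reference, tx=4, ty=4, n=2, search=3)
print(vector, cost)  # (-2, -1) 0
```

A cost of 0 means the prediction error for this block is zero, so only the motion vector would need to be sent. When the best match is imperfect, the remaining differences form the prediction error that motion compensation encodes.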
• The number of frames between successive I-frames is known as a group of pictures (GOP).
• The number of frames between a P-frame and the immediately preceding I- or P-frame is called the prediction span.
• The order of encoding and transmission of the frames is changed to minimize the time required to decode the frames.
• A 4th type of frame known as a PB-frame has also been defined. Two neighboring P- and B-frames are encoded as if they were a single frame.
• A 5th type of frame known as a D-frame has been defined for use in movie/video-on-demand applications.
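The reordering of frames for encoding and transmission can be illustrated as follows. Because a B-frame depends on the I- or P-frame that follows it in display order, that reference frame is sent first. The function name and the example frame pattern are assumptions made for this sketch.

```python
def transmission_order(display_order):
    """Reorder frames so each B-frame follows both of its reference frames.

    display_order is a string of frame types such as "IBBPBBP"; frames are
    labeled with their display index so the reordering is visible.
    """
    ordered = []
    pending_b = []
    for index, frame_type in enumerate(display_order):
        label = f"{frame_type}{index}"
        if frame_type == "B":
            pending_b.append(label)  # wait for the next I- or P-frame
        else:
            ordered.append(label)      # send the reference frame first
            ordered.extend(pending_b)  # then the B-frames that depend on it
            pending_b.clear()
    ordered.extend(pending_b)  # trailing B-frames with no later reference in this input
    return ordered


print(transmission_order("IBBPBBP"))
# ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']
```

With this order the decoder always holds both reference frames by the time a B-frame arrives, so it can decode each frame as soon as it is received.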
• Basic bitstream format:
• Type: the type of frame (I, P or B)
• Address: identifies the location of the macroblock in the frame
• Quantization value: the threshold value used to quantize all DCT coefficients in the macroblock
• Motion vector: the encoded vector
• Block present: indicates which blocks in the macroblock are present
• Typical figures of the compression ratios
• I-frames: 10~20:1
• P-frames: 20~30:1
• B-frames: 30~50:1
4.3.2 H.261
• H.261 has been defined by the ITU-T for the provision of video telephony and videoconferencing services over an ISDN.
• Supports I- and P-frames only.
• Encoding format:
• Type: indicates whether the macroblock is intracoded or intercoded
• Address: identifies the location of the macroblock in the frame
• Quantization value: the threshold value used to quantize all DCT coefficients in the macroblock
• Motion vector: the encoded vector
• Coded block pattern: indicates which blocks in the macroblock are present
• Picture start code: indicates the start of a new frame
• Temporal reference: a timestamp that lets the decoder synchronize the video information with the audio information
• Picture type: indicates whether the frame is encoded as an I- or P-frame
• GOB start code: a marker used for resynchronization in case of error
• A group of (macro)blocks (GOB) is a structure consisting of 3x11 macroblocks.
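As a quick check on the GOB structure, the sketch below counts macroblocks and GOBs per frame. The frame sizes for CIF (352x288) and QCIF (176x144) are not given at this point in the notes and are brought in as assumptions, as is the reading of "3x11" as 3 rows of 11 macroblocks; the function name is illustrative.

```python
MACROBLOCK_SIZE = 16       # pixels per macroblock side
GOB_COLS, GOB_ROWS = 11, 3  # a GOB read as 3 rows of 11 macroblocks


def macroblocks_and_gobs(width, height):
    """Return (macroblocks per frame, GOBs per frame) for a luminance frame size."""
    mb_cols = width // MACROBLOCK_SIZE
    mb_rows = height // MACROBLOCK_SIZE
    gobs = (mb_cols // GOB_COLS) * (mb_rows // GOB_ROWS)
    return mb_cols * mb_rows, gobs


print(macroblocks_and_gobs(352, 288))  # CIF (assumed 352x288):  (396, 12)
print(macroblocks_and_gobs(176, 144))  # QCIF (assumed 176x144): (99, 3)
```

Each GOB start code therefore gives the decoder a resynchronization point every 33 macroblocks.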
4.3.3 H.263
• H.263 has been defined by the ITU-T for use in a range of real-time video applications over wireless networks and PSTNs.
• The applications include video telephony, videoconferencing, security surveillance, interactive game playing and so on.
• The H.263 standard has a number of advanced coding options compared with H.261:
• Progressive scanning with a refresh rate of either 15 or 7.5 fps.
• Supports I-, P-, B- and PB-frames.
• Motion vectors, if necessary, are allowed to point outside the frame area.
• Schemes such as error tracking, independent segment decoding and reference picture selection are included in the standard to minimize the effects of errors on neighboring GOBs.
• An error concealment scheme is incorporated into the decoder to mask errors from the viewer.
4.3.4 MPEG
• The Moving Picture Experts Group (MPEG) was formed by the ISO to formulate a set of standards relating to a range of multimedia applications that involve the use of video with sound.
MPEG-1: ISO Recommendation 11172
• Uses a video compression technique similar to that of H.261.
• Progressive scanning with a refresh rate of 30 Hz (for NTSC) and 25 Hz (for PAL).
• Supports I-, P- and B-frames.
• I-frames must be used for the various random-access functions associated with VCRs.
• Improvements with respect to H.261:
1. A new layer called a slice is added to the structure of the stream so that the decoder can resynchronize more quickly in case of error.
2. Support for B-frames.
3. A larger search window for motion vectors and a finer resolution in their representation.
• Typical figures of the compression ratios
• I-frames: 10:1
• P-frames: 20:1
• B-frames: 50:1
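The typical per-frame figures above can be combined into an overall ratio for a whole group of pictures. The sketch assumes every uncompressed frame has the same size and uses a hypothetical frame pattern, "IBBPBBPBBPBB", that is not taken from the standard; the function name is illustrative.

```python
TYPICAL_RATIO = {"I": 10.0, "P": 20.0, "B": 50.0}  # figures listed above


def overall_compression_ratio(frame_pattern, ratios=TYPICAL_RATIO):
    """Overall ratio for a frame pattern, assuming equal-sized uncompressed frames.

    Each frame compresses to 1/ratio of its original size, so the overall
    ratio is the frame count divided by the sum of those fractions.
    """
    compressed_size = sum(1.0 / ratios[frame_type] for frame_type in frame_pattern)
    return len(frame_pattern) / compressed_size


pattern = "IBBPBBPBBPBB"  # hypothetical pattern: 1 I-, 3 P- and 8 B-frames
print(f"{overall_compression_ratio(pattern):.1f}:1")
```

For this pattern the result is about 29:1. The single I-frame accounts for roughly a quarter of the compressed data, which is why the spacing of I-frames has a large effect on the bit rate.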
• Bitstream format:
• Sequence start code: indicates the start of a sequence
  • Video parameters: specify the screen size and aspect ratio
  • Bitstream parameters: indicate the bit rate and the size of the memory/frame buffers that are required
  • Quantization parameters: contain the contents of the quantization tables that are to be used
• GOP start code: indicates the start of a GOP
  • Time stamp: used for synchronization purposes
  • Parameters: define the particular sequence of frame types that are used in each GOP (e.g. IPPBPP)
• Picture start code: indicates the start of a frame
  • Type: indicates whether it is an I-, P- or B-frame
  • Buffer parameters: indicate how full the buffer should be before the decoding operation starts
  • Encode parameters: indicate the resolution of a motion vector
• Slice start code: indicates the start of a slice
  • Vertical position: indicates the scan line in which the slice is located
  • Quantization parameters: indicate the scaling factor that applies to this slice
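The nesting of sequence, GOP, picture and slice described above can be modeled as a small set of data structures. This is only a sketch of the hierarchy: the class and field names are hypothetical, the field types are assumptions, and it says nothing about how the real bitstream lays the fields out in bits.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class Slice:
    vertical_position: int   # scan line in which the slice is located
    quantization_scale: int  # scaling factor that applies to this slice


@dataclass
class Picture:
    frame_type: str  # "I", "P" or "B"
    slices: List[Slice] = field(default_factory=list)


@dataclass
class GroupOfPictures:
    time_stamp: float   # used for synchronization
    frame_pattern: str  # sequence of frame types, e.g. "IPPBPP"
    pictures: List[Picture] = field(default_factory=list)


@dataclass
class Sequence:
    width: int         # screen size
    height: int
    bit_rate_bps: int  # bitstream parameter
    gops: List[GroupOfPictures] = field(default_factory=list)


def count_frame_types(sequence):
    """Count the pictures of each type across all GOPs in a sequence."""
    return Counter(p.frame_type for gop in sequence.gops for p in gop.pictures)


# Build one GOP that follows the example pattern given above.
pattern = "IPPBPP"
gop = GroupOfPictures(
    time_stamp=0.0,
    frame_pattern=pattern,
    pictures=[Picture(t, [Slice(vertical_position=0, quantization_scale=8)]) for t in pattern],
)
sequence = Sequence(width=352, height=288, bit_rate_bps=1_500_000, gops=[gop])
print(count_frame_types(sequence))
```

Each level carries only the parameters that apply to everything beneath it, which is what allows a decoder to resynchronize at the next slice, picture or GOP start code after an error.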
MPEG-2: ISO Recommendation 13818
• It supports four levels (low, main, high 1440 and high), each targeted at a particular application domain.
• There are 5 profiles associated with each level: simple, main, spatial resolution, quantization accuracy and high.
• The different combinations of levels and profiles form a framework for all standards activities associated with MPEG-2.
• One of the most popular settings is the MP@ML standard, which is for digital television broadcasting.
• There are 3 standards associated with HDTV: advanced television (ATV) in North America, digital video broadcast (DVB) in Europe, and multiple sub-Nyquist sampling encoding (MUSE) in Japan.

| | ATV | DVB | MUSE |
|---|---|---|---|
| Aspect ratio | 16/9 | 4/3 | 16/9 |
| Resolution | 1280x720 | 1440x1152 | 1920x1035 |
| Compression (video) | MP@HL of MPEG-2 | SSP@H1440 of MPEG-2 | Similar to MP@HL |
| Compression (audio) | Dolby AC-3 | MP2 | |
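• Summary of video compression standards:

| Standard | Level | Digitization format | Compressed bit rate | Example applications |
|---|---|---|---|---|
| H.261 | | CIF/QCIF | p x 64 kbps | Video telephony/conferencing over ISDN and LANs |
| H.263 | | S-QCIF/QCIF | <64 kbps | Video telephony/conferencing and security surveillance over low bit rate channels |
| MPEG-1 / ISO 11172 | | SIF | <1.5 Mbps | Storage of VHS-quality video on CD-ROMs |
| MPEG-2 / ISO 13818 | Low | SIF | <4 Mbps | Recording of VHS-quality video |
| MPEG-2 / ISO 13818 | Main | 4:2:0 | <15 Mbps | Digital video broadcasting |
| MPEG-2 / ISO 13818 | Main | 4:2:2 | <20 Mbps | Digital video broadcasting |
| MPEG-2 / ISO 13818 | High 1440 | 4:2:0 | <60 Mbps | HDTV (4/3 aspect ratio) |
| MPEG-2 / ISO 13818 | High 1440 | 4:2:2 | <80 Mbps | HDTV (4/3 aspect ratio) |
| MPEG-2 / ISO 13818 | High | 4:2:0 | <80 Mbps | HDTV (16/9 aspect ratio) |
| MPEG-2 / ISO 13818 | High | 4:2:2 | <100 Mbps | HDTV (16/9 aspect ratio) |
| MPEG-4 | | Various | 5 kbps to tens of Mbps | Versatile multimedia coding standard |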