
September 25, 2013 Sam Siewert

CS A490 Digital Media and Interactive Systems

Lecture 5 - Deeper Dive into MPEG Digital Video Encoding

Overview
– MPEG at a High Level
  – Pixel/Color Encoding in YUV (e.g. YCrCb 4:2:2)
  – Macro-Block Encoding (DCT, Quantization, Zig-Zag, Huffman)
  – Group of Pictures
– Deeper Dive into 13818-1 and 13818-2
  – 13818-1: Transport Streams for Video & Audio
    – Container for Program Streams (188-Byte Packets)
    – Multiplexed Video and Audio Elementary Streams
    – PSI – Program Specific Information
    – System Clock (PCR, PTS/DTS)
  – 13818-2: Elementary Video Stream Encode/Decode
    – Video DCT Macro-Blocks
    – Color Format
    – GoP (I-Frame, B-Frame, P-Frame)
    – Motion Compensation and Vector Quantization
– Differences Between MPEG-2 and MPEG-4
– Trick-Play Operations on Program and Transport Streams

Sam Siewert 2

MPEG-2: Order of Operations

#1: Point (Pixel) Encoding
#2A-C: Macro-Block Lossy Intra-Frame Compression
#3: Motion-Based Compression in a Group of Pictures


Step #1 – RGB to YCrCb 4:4:4 24-bit/30-bit (Lossless)

For every Y sample in a scan-line, there is also one Cr and one Cb sample
– Each Y (Y7:Y0), Cr (Cr7:Cr0), and Cb (Cb7:Cb0) Sample is 8 bits
– No compression between RGB and YCrCb 4:4:4 (both 24 bits/pixel)
– For DCI and Post Production, Each Sample can be 10 bits (30 bits/pixel)

Typically a Post Production, CEDIA, or DCI format

[Figure: 4:4:4 sampling of a 320x240 frame, pixels numbered 0 to 319 on the first scan-line through 76,480 to 76,799 on the last; legend: Y, Cr, and Cb sample; Y sample only]
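The per-pixel color conversion can be sketched in C as shown below. This is a minimal sketch under stated assumptions, not the course code. It assumes full-range ITU-R BT.601 coefficients as used by JPEG/JFIF. MPEG-2 sources usually use studio range (Y 16 to 235), so treat the constants as illustrative. The helper names `clamp_u8` and `rgb_to_ycbcr` are hypothetical. Because the outputs are rounded to 8-bit integers, an RGB round trip can be off by a code value even though 4:4:4 drops no samples.

```c
#include <stdint.h>

/* Round to nearest and clamp to the 8-bit range 0..255 (no libm needed). */
static uint8_t clamp_u8(double v) {
    if (v <= 0.0) return 0;
    if (v >= 255.0) return 255;
    return (uint8_t)(v + 0.5);
}

/* Hypothetical helper: RGB -> YCrCb 4:4:4 for one pixel.
   Assumption: full-range BT.601 (JFIF-style) coefficients; MPEG-2 material
   is normally studio range, so these constants are illustrative only. */
static void rgb_to_ycbcr(uint8_t r, uint8_t g, uint8_t b,
                         uint8_t *y, uint8_t *cb, uint8_t *cr) {
    *y  = clamp_u8(        0.299    * r + 0.587    * g + 0.114    * b);
    *cb = clamp_u8(128.0 - 0.168736 * r - 0.331264 * g + 0.5      * b);
    *cr = clamp_u8(128.0 + 0.5      * r - 0.418688 * g - 0.081312 * b);
}
```

White and black both map to neutral chroma (Cb = Cr = 128). Pure red maps to about Y = 76 and Cb = 85, with Cr clamped at 255.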

Step #1 – RGB to YCrCb 4:2:2 (Lossy)
For every 2 Y samples in a scan-line, there is one Cr and one Cb sample
– Each Y (Y7:Y0), Cr (Cr7:Cr0), and Cb (Cb7:Cb0) Sample is 8 bits
– Two RGB Pixels = 48 bits, Whereas Two YCrCb 4:2:2 Pixels = 32 bits, or 16 bits per pixel vs. 24 bits per pixel (33% smaller frame size)

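[Figure: 4:2:2 sampling of a 320x240 frame, pixels numbered 0 to 319 on the first scan-line through 76,480 to 76,799 on the last; legend: Y, Cr, and Cb sample; Y sample only]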

Pixel-0 = Y0, Cb0; Pixel-1 = Y1, Cr0
Pixel-2 = Y2, Cb1; Pixel-3 = Y3, Cr1
Pixel-4 = Y4, Cb2; Pixel-5 = Y5, Cr2
(each sample is 8 bits, bits 7:0)
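The pixel layout above can be packed from a full-resolution scan-line with a routine like the one below. This is a minimal sketch, and the function name `pack_422_line` is hypothetical. It emits bytes in the slide's order: Y0 Cb0, Y1 Cr0, Y2 Cb1, and so on, for 2 bytes per pixel. It assumes each horizontal pair of chroma samples is averaged. An encoder could instead simply drop every other sample or use a longer filter.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: pack one 4:4:4 scan-line into 4:2:2 in the order
   Y0 Cb0 Y1 Cr0 Y2 Cb1 Y3 Cr1 ... (2 bytes per pixel).
   Assumptions: width is even; each Cb/Cr pair is averaged with rounding. */
static void pack_422_line(const uint8_t *y, const uint8_t *cb, const uint8_t *cr,
                          size_t width, uint8_t *out) {
    for (size_t x = 0; x + 1 < width; x += 2) {
        out[2 * x + 0] = y[x];                                     /* even pixel: Y  */
        out[2 * x + 1] = (uint8_t)((cb[x] + cb[x + 1] + 1) / 2);   /* shared Cb      */
        out[2 * x + 2] = y[x + 1];                                 /* odd pixel: Y   */
        out[2 * x + 3] = (uint8_t)((cr[x] + cr[x + 1] + 1) / 2);   /* shared Cr      */
    }
}
```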

Step #1 – RGB to YCrCb 4:2:0 (Lossy)
For every 2x2 block of Y samples (2 samples on each of 2 adjacent scan-lines), there is one Cr and one Cb sample
– Each Y (Y7:Y0), Cr (Cr7:Cr0), and Cb (Cb7:Cb0) Sample is 8 bits
– Four RGB Pixels = 96 bits, Whereas Four YCrCb 4:2:0 Pixels = 48 bits (4 Y + 1 Cr + 1 Cb), or 12 bits per pixel on average vs. 24 bits per pixel (50% smaller)

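[Figure: 4:2:0 sampling of a 320x240 frame, pixels numbered 0 to 319 on the first scan-line through 76,480 to 76,799 on the last; legend: Cr, Cb sample; Y sample only]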
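The three sampling formats differ only in how many chroma samples accompany each luma sample. The small calculator below makes the bits-per-pixel figures concrete. It is a sketch that assumes 8-bit samples and even frame dimensions. The enum and function names are hypothetical.

```c
#include <stddef.h>

typedef enum { YUV444, YUV422, YUV420 } ChromaFormat;   /* hypothetical names */

/* Bytes per frame for 8-bit samples: one Y plane plus Cb and Cr planes.
   Assumption: width and height are even. */
static size_t frame_bytes(size_t width, size_t height, ChromaFormat fmt) {
    size_t luma = width * height;
    size_t chroma = luma / 4;               /* 4:2:0: half width, half height */
    if (fmt == YUV444) chroma = luma;       /* full-resolution chroma */
    else if (fmt == YUV422) chroma = luma / 2;  /* half width only */
    return luma + 2 * chroma;               /* Y + Cb + Cr */
}
```

For the 320x240 frames in the figures, this gives 230,400 bytes (24 bits/pixel), 153,600 bytes (16), and 115,200 bytes (12).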

Convolution Concepts
– A Math Operation on 2 Functions that Produces a 3rd
– A PSF (Point Spread Function) Sharpen Meets this Definition
– So do Many Mask Operations Applied to Pixel Neighborhoods

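[Figure: two pulses f(t) and g(x – t); the area inside their intersection, as g slides across t, is f convolved with g]

$(f * g)(x) = \int_{-\infty}^{\infty} f(t)\, g(x - t)\, dt$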
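The integral above has a discrete counterpart for sampled signals: y[n] = sum over k of f[k] g[n - k]. The function below sketches that full 1D convolution. The name `convolve` is hypothetical. A 2D image mask, such as a sharpen kernel, applies the same idea over a pixel neighborhood.

```c
#include <stddef.h>

/* Hypothetical helper: full discrete convolution
   y[n] = sum_k f[k] * g[n - k], output length nf + ng - 1. */
static void convolve(const int *f, size_t nf, const int *g, size_t ng, int *y) {
    for (size_t n = 0; n < nf + ng - 1; n++) {
        int sum = 0;
        for (size_t k = 0; k < nf; k++) {
            if (n >= k && n - k < ng)      /* keep g's index inside its range */
                sum += f[k] * g[n - k];
        }
        y[n] = sum;
    }
}
```

Convolving {1, 2, 3} with {0, 2, 1} gives {0, 2, 5, 8, 3}. A 1D sharpen kernel such as {-1, 3, -1} (an illustrative choice) leaves a constant signal's interior unchanged.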

DCT – Discrete Cosine Transform
– Transforms the Image by Correlating It with Discrete Cosine Basis Functions, see http://www.cse.uaa.alaska.edu/~ssiewert/a490dmis_code/example-dct1/
– The Inverse DCT Restores the Image from the DCT Coefficients

[Figure: DCT and Inverse DCT of an example image]

DCT Concepts
– F(x) is a Sum of Sinusoids (Each with a Specific Frequency and Amplitude)
– DCT Operates on a Discrete Number of Samples
– We Can Evaluate the Sum at Any x, Even Where F(x) Is Not Known
– An N x N Macro-block Has:
  – Zero-Frequency DC Term at (0,0)
  – Increasing Horizontal Frequency to the Right
  – Increasing Vertical Frequency Downward
– Can Be Inverted (iDCT)
– Can Eliminate High-Frequency Horizontal and Vertical Terms with Minimal Loss on Inversion (Loss of High-Frequency Image Features)
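The 1D concepts above can be tried on an 8-sample scan-line segment with the sketch below. It is not the course's dct2.c. It uses the orthonormal DCT-II and its inverse (DCT-III). It assumes N = 8, so every cosine needed is cos(k·π/16). Those values come from a fixed table plus cosine symmetries, which avoids a libm dependency. `truncate_hot` is a hypothetical helper that zeroes higher-order terms to show the lossy step.

```c
/* cos(k*pi/16) for k = 0..8 (enough for an 8-point DCT via symmetry) */
static const double COS16[9] = {
    1.0, 0.98078528040323043, 0.92387953251128674, 0.83146961230254524,
    0.70710678118654752, 0.55557023301960218, 0.38268343236508977,
    0.19509032201612826, 0.0
};

/* cos(k*pi/16) for any k >= 0 */
static double cos16(int k) {
    k %= 32;                                      /* period 2*pi */
    if (k > 16) k = 32 - k;                       /* cos(2pi - a) = cos(a) */
    return (k <= 8) ? COS16[k] : -COS16[16 - k];  /* cos(pi - a) = -cos(a) */
}

#define DCT_N 8

/* Orthonormal weight: sqrt(1/8) for k = 0, sqrt(2/8) = 0.5 otherwise */
static double dct_weight(int k) { return k == 0 ? 0.5 * COS16[4] : 0.5; }

/* Forward DCT-II: X[k] = c(k) * sum_n x[n] cos((2n+1) k pi / 16) */
static void dct8(const double *x, double *X) {
    for (int k = 0; k < DCT_N; k++) {
        double sum = 0.0;
        for (int n = 0; n < DCT_N; n++) sum += x[n] * cos16((2 * n + 1) * k);
        X[k] = dct_weight(k) * sum;
    }
}

/* Inverse (DCT-III): x[n] = sum_k c(k) X[k] cos((2n+1) k pi / 16) */
static void idct8(const double *X, double *x) {
    for (int n = 0; n < DCT_N; n++) {
        double sum = 0.0;
        for (int k = 0; k < DCT_N; k++)
            sum += dct_weight(k) * X[k] * cos16((2 * n + 1) * k);
        x[n] = sum;
    }
}

/* Hypothetical helper: drop higher-order terms (index >= keep) before the iDCT */
static void truncate_hot(double *X, int keep) {
    for (int k = keep; k < DCT_N; k++) X[k] = 0.0;
}
```

A constant input puts all its energy in X[0]. A forward and inverse pair reproduces the input to within floating-point error. Calling `truncate_hot` before `idct8` shows the high-frequency loss described above.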

Basic Concept of Waveforms
– Complex Waveform is a Sum of Simple Fundamentals
– Simple Fundamentals Can Be Derived from the Complex Waveform

Scanline DCT Example
– Small Losses Due to DCT/iDCT Numerical Truncation
– Larger Losses Due to Quantization and Truncation of Higher-Order Terms (H.O.T.)
http://www.cse.uaa.alaska.edu/~ssiewert/a490dmis_doc/1D-DCT-N-Fundamentals.xlsx

What Is Lost with DCT Quantization?
– Noise, More Than Anything Else
– Complex XY Variable Patterns (Real Science Data?)

Complex Tiling: Higher-Frequency X and Higher-Frequency Y Terms Can Still Be Ignored

Complex Wood Texture: Most Detail in X, Far Less in Y

Randomized Cactus Image: High X Detail and High Y Detail; Most Loss of Detail, But Noisy

Step #2A: Macro-block Discrete Cosine Transform

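8x8 Pixel Block – Macro-block (in 13818-2 terms, a macroblock is 16x16 luma samples made of four 8x8 luma DCT blocks plus chroma blocks; these slides use the term for the 8x8 DCT block)
– SD NTSC 720x480 (90x60 Macro-blocks), 3:2 Storage Aspect Ratio (Displayed as 4:3 or 16:9 with Non-Square Pixels)
– HD 720 1280x720 (160x90 Macro-blocks), 16:9 AR
– HD 1080 1920x1080 (240x135 Macro-blocks), 16:9 AR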
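The block counts above come from simple division. The sketch below generalizes it and rounds partial blocks up. That rounding is why a 1080-line frame needs 68 rows of 16x16 macroblocks, since it is coded as 1088 lines. The function name is hypothetical.

```c
/* Hypothetical helper: number of n x n blocks needed to tile a w x h frame,
   rounding any partial block at the right or bottom edge up. */
static void block_grid(int w, int h, int n, int *cols, int *rows) {
    *cols = (w + n - 1) / n;
    *rows = (h + n - 1) / n;
}
```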

Step #2B: Macro-block Quantization (Lossy)

Apply an 8x8 Weighting Matrix and a Scale Factor to the DCT Coefficients
– Produces Lots of Repeated Values (and Zeros) Compared to the Original
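The quantization step can be sketched with a weighting matrix and a scale factor, as below. The weight formula `16 + 2*(u+v)` is a hypothetical stand-in that gives coarser steps to higher frequencies. It is not the 13818-2 default matrix. The arithmetic is a simplified JPEG-style `round(F / (W * scale))`, not the exact MPEG-2 quantizer, which also treats the intra DC term separately.

```c
/* Hypothetical weighting matrix: larger steps for higher frequencies (u + v) */
static int weight(int u, int v) { return 16 + 2 * (u + v); }

/* Round half away from zero without libm */
static int round_int(double x) { return (int)(x >= 0.0 ? x + 0.5 : x - 0.5); }

/* Quantize an 8x8 block (row-major, index u*8+v): level = round(F / (W * scale)).
   Returns how many levels became zero. */
static int quantize(const double *F, int scale, int *level) {
    int zeros = 0;
    for (int u = 0; u < 8; u++)
        for (int v = 0; v < 8; v++) {
            int i = u * 8 + v;
            level[i] = round_int(F[i] / (weight(u, v) * scale));
            if (level[i] == 0) zeros++;
        }
    return zeros;
}

/* Decoder side: F' = level * W * scale; the rounding error is what is lost */
static void dequantize(const int *level, int scale, double *F) {
    for (int u = 0; u < 8; u++)
        for (int v = 0; v < 8; v++)
            F[u * 8 + v] = (double)level[u * 8 + v] * weight(u, v) * scale;
}
```

Take a coefficient of -50 at (0,1), where the weight is 18. It quantizes to -3 and decodes to -54. A small high-frequency coefficient at (7,7) becomes 0.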

Decode Process for #2A-B


How Lossy is the Decode Macro-Block?


OpenCV Macroblock DCT Example
– Same Cactus, 320x240, with 80x80 DCT Macroblocks

[Figure: DCT and iDCT panels]

– Same Cactus, 320x240, Again with 8x8 DCT Macroblocks

[Figure: DCT and iDCT panels]

Mathematics for 2D DCT
– Frequency Varies on the X and Y Axes from Top Left (DC) to Bottom Right (Highest Frequency)
– A Straightforward Algorithm Based on the 2D Equation is O(n^2) per Dimension
– Like Cooley-Tukey for the DFT, an O(n*log2(n)) DCT Algorithm Has Been Formulated (Arai, Agui, and Nakajima; see also Numerical Recipes: The Art of Scientific Computing, 3rd ed.)
http://www.cse.uaa.alaska.edu/~ssiewert/a490dmis_code/dct2/dct2.c

http://en.wikipedia.org/wiki/File:Dctjpeg.png
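The 2D equation can be evaluated directly, which is the straightforward O(n^2)-per-dimension approach, not the fast Arai-Agui-Nakajima algorithm. The form used is F(u,v) = 1/4 C(u) C(v) Σx Σy f(x,y) cos((2x+1)uπ/16) cos((2y+1)vπ/16), with C(0) = 1/√2 and C(k) = 1 otherwise. The sketch below is not the linked dct2.c. It assumes 8x8 blocks stored row-major and a fixed table of cos(k·π/16) values instead of libm. The function names are hypothetical.

```c
#define BLOCK_N 8

/* cos(k*pi/16) for k = 0..8; other k follow from symmetry */
static const double COS16[9] = {
    1.0, 0.98078528040323043, 0.92387953251128674, 0.83146961230254524,
    0.70710678118654752, 0.55557023301960218, 0.38268343236508977,
    0.19509032201612826, 0.0
};

static double cos16(int k) {
    k %= 32;                                      /* period 2*pi */
    if (k > 16) k = 32 - k;                       /* cos(2pi - a) = cos(a) */
    return (k <= 8) ? COS16[k] : -COS16[16 - k];  /* cos(pi - a) = -cos(a) */
}

/* C(0) = 1/sqrt(2), C(k) = 1 otherwise */
static double c_norm(int k) { return k ? 1.0 : COS16[4]; }

/* Forward 8x8 DCT, direct evaluation; f and F are row-major [x*8 + y] */
static void fdct8x8(const double *f, double *F) {
    for (int u = 0; u < BLOCK_N; u++)
        for (int v = 0; v < BLOCK_N; v++) {
            double sum = 0.0;
            for (int x = 0; x < BLOCK_N; x++)
                for (int y = 0; y < BLOCK_N; y++)
                    sum += f[x * BLOCK_N + y] * cos16((2 * x + 1) * u) * cos16((2 * y + 1) * v);
            F[u * BLOCK_N + v] = 0.25 * c_norm(u) * c_norm(v) * sum;
        }
}

/* Inverse 8x8 DCT: f(x,y) = 1/4 sum_u sum_v C(u) C(v) F(u,v) cos(..) cos(..) */
static void idct8x8(const double *F, double *f) {
    for (int x = 0; x < BLOCK_N; x++)
        for (int y = 0; y < BLOCK_N; y++) {
            double sum = 0.0;
            for (int u = 0; u < BLOCK_N; u++)
                for (int v = 0; v < BLOCK_N; v++)
                    sum += c_norm(u) * c_norm(v) * F[u * BLOCK_N + v]
                         * cos16((2 * x + 1) * u) * cos16((2 * y + 1) * v);
            f[x * BLOCK_N + y] = 0.25 * sum;
        }
}
```

A flat block of 128s yields a single DC coefficient of 1024 at (0,0), with every other coefficient essentially zero.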


Step #2C: Macro-block Run-Length and Huffman Encoding

Zig-Zag Run-Length Encoding to Exploit Repeated Data and Zeros found in H.O.T. of Quantized DCT

– 86, 1, 7, -5, -1, 0, 1, 0, 0, 2, -1, 1, 0, -1, 0, 0, 0, 0, -1, 0, 0, …

Becomes:
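Separately from the slide's figure, the zig-zag scan and run-length step can be sketched as shown below. `zigzag_order` builds the classic zig-zag scan, and 13818-2 also defines an alternate scan. `run_length_encode` turns the AC coefficients into (run of zeros, nonzero level) pairs. It assumes scan[0] is the DC term, which MPEG-2 codes separately, and that trailing zeros are implied by an end-of-block code. All names are hypothetical.

```c
/* Hypothetical helper: zig-zag scan order for an n x n block; zz[k] = row*n + col */
static void zigzag_order(int n, int *zz) {
    int k = 0;
    for (int s = 0; s <= 2 * (n - 1); s++) {     /* s = row + col (anti-diagonal) */
        if (s % 2 == 0) {                         /* even: bottom-left to top-right */
            for (int row = (s < n ? s : n - 1); row >= 0 && s - row < n; row--)
                zz[k++] = row * n + (s - row);
        } else {                                  /* odd: top-right to bottom-left */
            for (int col = (s < n ? s : n - 1); col >= 0 && s - col < n; col--)
                zz[k++] = (s - col) * n + col;
        }
    }
}

/* Read a row-major block out in zig-zag order */
static void zigzag_scan(const int *block, const int *zz, int n, int *out) {
    for (int k = 0; k < n * n; k++) out[k] = block[zz[k]];
}

typedef struct { int run; int level; } RunLevel;

/* Encode AC terms scan[1..len-1] as (zero-run, level) pairs; returns pair count.
   Trailing zeros produce no pair (an end-of-block code stands in for them). */
static int run_length_encode(const int *scan, int len, RunLevel *out) {
    int count = 0, run = 0;
    for (int i = 1; i < len; i++) {
        if (scan[i] == 0) { run++; continue; }
        out[count].run = run;
        out[count].level = scan[i];
        count++;
        run = 0;
    }
    return count;
}
```

Applied to the example sequence above, DC = 86 is followed by 10 pairs: (0,1), (0,7), (0,-5), (0,-1), (1,1), (2,2), (0,-1), (0,1), (1,-1), (4,-1), then end-of-block.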


Huffman Applied to RLE Data
– Huffman Tables for MPEG-2 Macro-Blocks Defined in 13818-2 (Lossless)
– Compression Based on Probability of Occurrence
– Shannon’s Source Coding Theorem: an Ideal Code Spends -log2(P) Bits on a Symbol, P = Probability of Occurrence; Binary Encoding of Symbols
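MPEG-2 decoders use the fixed tables in 13818-2 rather than building codes at run time. The principle behind those tables can still be sketched by building Huffman code lengths from symbol counts, as below. The function name and the `MAX_SYMBOLS` limit are hypothetical. The routine repeatedly merges the two least frequent nodes. Each symbol's code length is its depth in the resulting tree.

```c
#define MAX_SYMBOLS 16   /* hypothetical limit for this sketch */

/* Compute Huffman code lengths (bits per symbol) from symbol counts. */
static void huffman_code_lengths(const unsigned *count, int n, int *len) {
    unsigned weight[2 * MAX_SYMBOLS];
    int parent[2 * MAX_SYMBOLS], active[2 * MAX_SYMBOLS];
    if (n < 1 || n > MAX_SYMBOLS) return;
    int nodes = n;
    for (int i = 0; i < n; i++) { weight[i] = count[i]; parent[i] = -1; active[i] = 1; }
    for (;;) {
        int a = -1, b = -1;                      /* two lightest active nodes */
        for (int i = 0; i < nodes; i++) {
            if (!active[i]) continue;
            if (a < 0 || weight[i] < weight[a]) { b = a; a = i; }
            else if (b < 0 || weight[i] < weight[b]) { b = i; }
        }
        if (b < 0) break;                        /* only the root is left */
        weight[nodes] = weight[a] + weight[b];   /* merge into a new parent node */
        parent[nodes] = -1;
        active[nodes] = 1;
        parent[a] = parent[b] = nodes;
        active[a] = active[b] = 0;
        nodes++;
    }
    for (int i = 0; i < n; i++) {                /* code length = depth in tree */
        int depth = 0;
        for (int p = parent[i]; p >= 0; p = parent[p]) depth++;
        len[i] = (n == 1) ? 1 : depth;
    }
}
```

For counts {8, 4, 2, 2}, with probabilities 1/2, 1/4, 1/8, and 1/8, the code lengths come out as {1, 2, 3, 3} bits. These match Shannon's -log2(P) exactly because every probability is a power of two.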

Step #3: Group of Pictures Concept – Transmit Change-Only Data
– I-Frame: Compressed Only Intra-Frame, by Methods #2A-2C Applied to Macro-Blocks; an I-Frame Can Be Decoded Alone
– P-Frame: Differences Only, Predicted from the Preceding I-Frame or P-Frame in the GoP
– B-Frame: Differences Only, Predicted from Both the Preceding and the Following I-Frame or P-Frame
– Difference Data Can Be Further Encoded with Lossless Methods
– Without Steps 2A-C, Specifically Quantization, Difference Data for High-Motion Video Could Blow Up (Grow Larger than the Original)
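B-frames depend on the *following* I- or P-frame, so an encoder must send that anchor frame before the B-frames that reference it. The routine below sketches the reordering from display order to decode (transmission) order. It works on a string of frame types. It assumes at most 64 consecutive B-frames and simply flushes any trailing B-frames at the end. The function name is hypothetical.

```c
/* Hypothetical helper: reorder display-order frame types ("IBBPBBP...") into
   decode order; order[] receives display indices. Returns the frame count.
   Assumption: no more than 64 consecutive B-frames. */
static int decode_order(const char *types, int *order) {
    int pending[64], npending = 0, k = 0;
    for (int i = 0; types[i] != '\0'; i++) {
        if (types[i] == 'B') {                   /* hold B until its future anchor is sent */
            if (npending < 64) pending[npending++] = i;
            continue;
        }
        order[k++] = i;                          /* I or P anchor goes first */
        for (int j = 0; j < npending; j++) order[k++] = pending[j];
        npending = 0;
    }
    for (int j = 0; j < npending; j++) order[k++] = pending[j];   /* trailing Bs */
    return k;
}
```

Display order I0 B1 B2 P3 B4 B5 P6 becomes decode order I0 P3 B1 B2 P6 B4 B5. Each frame's PTS still reflects display order, while its DTS follows this decode order.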

Group of Pictures: High Level View


Overall MPEG YCrCb Compression Performance

Standard Definition 720x480x2 (675KB/frame) @ 30fps
– Requires ≈20MB/sec (≈166 Mbps) Uncompressed
– Typical MPEG-2 @ 3.75 Mbps, ≈44x Compression
– Typical MPEG-4 @ 1.5 Mbps, ≈110x Compression
– 10 to 20 Programs on QAM 256 (≈38.8 Mbps, 6 MHz/Ch)
– ≈10 MPEG-4 Programs on ATSC 8VSB (19.39 Mbps, 6 MHz/Ch)

HD 720p (1280x720x2, 1800KB/frame) @ 30fps
– Requires ≈53MB/sec (≈442 Mbps) Uncompressed
– Typical MPEG-2 @ 20 Mbps, ≈22x Compression
– Typical MPEG-4 @ 10 Mbps, ≈44x Compression

HD 1080p (1920x1080x2, 4050KB/frame) @ 30fps
– Requires ≈120MB/sec (≈995 Mbps) Uncompressed
– Typical MPEG-2, VC-1 @ 45 Mbps, ≈22x Compression
– Typical MPEG-4 @ 20 Mbps, ≈50x Compression
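The figures above follow from width × height × bytes-per-pixel × 8 bits × frame rate, divided by the encoded rate. The helpers below are a sketch of that arithmetic, with hypothetical names. They assume the slides' 2 bytes/pixel (4:2:2) and a nominal 30 fps rather than 29.97.

```c
#include <stdint.h>

/* Uncompressed bit rate in bits/s (x8 converts bytes to bits) */
static uint64_t uncompressed_bps(uint64_t w, uint64_t h, uint64_t bytes_per_pixel,
                                 uint64_t fps) {
    return w * h * bytes_per_pixel * 8 * fps;
}

/* Compression ratio = uncompressed rate / encoded rate (given in Mbps) */
static double compression_ratio(uint64_t raw_bps, double encoded_mbps) {
    return (double)raw_bps / (encoded_mbps * 1e6);
}
```

For SD this gives 165,888,000 bits/s, which is about 44x at 3.75 Mbps. For 1080p it gives 995,328,000 bits/s, which is about 22x at 45 Mbps.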

13818-2 Defines Elementary/Program Streams

13818-2: Elementary Video Stream Encode/Decode
– Defines Color Sub-Sampling Formats
– 8x8 Macro-Block Encoding
– Video DCT
– Post-DCT Macro-Block Quantization Weighting and Scaling Coefficients
– RLE Zig-Zag Macro-Block Sampling
– Huffman Encoding Table
– Group of Pictures: I-Frame, B-Frame, P-Frame
– Presentation and Decode Time Stamps (PTS/DTS)
– Order of Encode and Decode Operations
– Format for Video and Audio Elementary Streams

Not Suitable for Transport over Lossy Networks, but Sufficient for Local Playback (DVD, PC HDD, Flash-Memory Media)

13818-1

13818-1: Transport Streams for Video & Audio
– Container for Program Streams (188-Byte Packets)
– Multiplexed Video and Audio Elementary Streams
– PSI – Program Specific Information
  PID – Packet ID
  Guide Data
  Emergency Broadcast
– System Clock (PCR, PTS/DTS)
– Sequence Headers
  Resolution and Format Information, Bit-Rate
  GoP Header, Frame Header
  Slices of Macro-Blocks for Resolution
  Decoder Information (Color, Quantization Tables)
– Can Be Multiple Programs or Combined Audio and Video as a Program
  MPEG-2 Video Elementary Stream
  AC-3 Audio Elementary Stream
  Secondary Audio Stream (Different Language)
  Up to 10 or More Audio+Video in One Transport Stream for Virtual Channels

Parsing an Elementary Video Stream


Many 188-Byte Packet Types; the Packet Header Allows Multiplexing of Many Video and Audio Streams on a Carrier
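The 4-byte header at the start of every 188-byte packet is what makes this multiplexing work. The sketch below decodes that header and then demultiplexes a buffer by counting packets per PID. The field layout follows 13818-1: the sync byte 0x47, then the 13-bit PID, adaptation_field_control, and continuity_counter. The struct and function names are hypothetical. The sketch assumes the buffer is already aligned on packet boundaries.

```c
#include <stddef.h>
#include <stdint.h>

#define TS_PACKET_SIZE 188
#define TS_SYNC_BYTE   0x47

typedef struct {          /* hypothetical struct for the decoded header */
    int      pusi;        /* payload_unit_start_indicator */
    uint16_t pid;         /* 13-bit packet identifier */
    int      afc;         /* adaptation_field_control: 1 payload, 2 adaptation, 3 both */
    int      cc;          /* 4-bit continuity_counter */
} TsHeader;

/* Decode the 4-byte TS header; returns 0 on success, -1 if the sync byte is missing */
static int parse_ts_header(const uint8_t *pkt, TsHeader *h) {
    if (pkt[0] != TS_SYNC_BYTE) return -1;
    h->pusi = (pkt[1] >> 6) & 0x01;
    h->pid  = (uint16_t)(((pkt[1] & 0x1F) << 8) | pkt[2]);
    h->afc  = (pkt[3] >> 4) & 0x03;
    h->cc   = pkt[3] & 0x0F;
    return 0;
}

/* Demultiplex by PID: tally packets per PID; returns the number of valid packets.
   Assumption: buf starts on a packet boundary; counts has 8192 entries. */
static size_t count_pids(const uint8_t *buf, size_t len, unsigned *counts) {
    size_t good = 0;
    for (size_t off = 0; off + TS_PACKET_SIZE <= len; off += TS_PACKET_SIZE) {
        TsHeader h;
        if (parse_ts_header(buf + off, &h) == 0) { counts[h.pid]++; good++; }
    }
    return good;
}
```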

PSI (Program Specific Information) Data

There are 4 PSI tables: Program Association (PAT), Program Map (PMT), Conditional Access (CAT), and Network Information (NIT). The MPEG-2 specification does not specify the format of the CAT and NIT.

PAT stands for Program Association Table. It lists all programs available in the transport stream. Each of the listed programs is identified by a 16-bit value called program_number, and each has an associated PID value for its Program Map Table (PMT). A program_number of 0x0000 is reserved: its entry gives the PID where the Network Information Table (NIT) is found. If no such entry is present in the PAT, the default PID value (0x0010) shall be used for the NIT. TS packets containing PAT information always have PID 0x0000.
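The PAT layout described above can be read with a parser like the one below. It follows the 13818-1 section syntax: table_id 0x00, a 12-bit section_length, 5 fixed header bytes, then 4-byte entries that each hold program_number and a 13-bit PID, and a trailing CRC_32. The sketch assumes the section starts at table_id, so the caller has already skipped the pointer_field. It does *not* verify the CRC. The names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { uint16_t program_number; uint16_t pid; } PatEntry;  /* hypothetical */

/* Parse a PAT section starting at table_id. Returns the entry count, or -1.
   program_number 0 maps to the NIT PID; others map to PMT PIDs. CRC_32 is not checked. */
static int parse_pat(const uint8_t *sec, size_t len, PatEntry *out, int max_out) {
    if (len < 12 || sec[0] != 0x00) return -1;                 /* table_id 0x00 = PAT */
    size_t section_length = ((size_t)(sec[1] & 0x0F) << 8) | sec[2];
    if (section_length < 9 || 3 + section_length > len) return -1;
    int n = (int)((section_length - 9) / 4);                   /* minus 5 header + 4 CRC bytes */
    if (n > max_out) n = max_out;
    for (int i = 0; i < n; i++) {
        const uint8_t *e = sec + 8 + 4 * i;                    /* entries follow byte 7 */
        out[i].program_number = (uint16_t)((e[0] << 8) | e[1]);
        out[i].pid = (uint16_t)(((e[2] & 0x1F) << 8) | e[3]);  /* 13-bit PID */
    }
    return n;
}
```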

Program Map Tables (PMTs) contain information about programs. For each program, there is one PMT. While the MPEG-2 standard permits more than one PMT section to be transmitted on a single PID, most MPEG-2 "users" such as ATSC and SCTE require each PMT to be transmitted on a separate PID that is not used for any other packets. The PMTs provide information on each program present in the transport stream, including the program_number, and list the elementary streams that comprise the described MPEG-2 program. There are also locations for optional descriptors that describe the entire MPEG-2 program, as well as an optional descriptor for each elementary stream. Each elementary stream is labeled with a stream_type value.

http://en.wikipedia.org/wiki/Program_Specific_Information

PSI (Program Specific Information) Data

PCR: To enable a decoder to present synchronized content, such as audio tracks matching the associated video, a Program Clock Reference (PCR) is transmitted at least once every 100 ms in the adaptation field of an MPEG-2 transport stream packet. The PID carrying the PCR for an MPEG-2 program is identified by the pcr_pid value in the associated Program Map Table. The PCR value, when properly used, drives the System Time Clock (STC) in the decoder. The STC, when properly implemented, provides a highly accurate time base that is used to synchronize audio and video elementary streams. Timing in MPEG-2 references this clock; for example, the presentation time stamp (PTS) is intended to be relative to the PCR. The first 33 bits of the PCR (the base) count a 90 kHz clock; the last 9 bits (the extension) count a 27 MHz clock. The maximum jitter permitted for the PCR is +/- 500 ns.
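The 33-bit base and 9-bit extension combine as PCR = base × 300 + extension, in 27 MHz ticks. The extractor below sketches that calculation for a 188-byte packet. It checks adaptation_field_control and the PCR_flag, then reads the 6 PCR bytes that follow the flags byte. The function name is hypothetical. Discontinuity handling and OPCR are not covered.

```c
#include <stdint.h>

/* Hypothetical helper: extract the 27 MHz PCR from a TS packet.
   Returns 1 and fills *pcr_27mhz if a PCR is present, else 0. */
static int read_pcr(const uint8_t *pkt, uint64_t *pcr_27mhz) {
    if (pkt[0] != 0x47) return 0;                   /* sync byte */
    int afc = (pkt[3] >> 4) & 0x03;                 /* adaptation_field_control */
    if (afc != 2 && afc != 3) return 0;             /* no adaptation field */
    if (pkt[4] < 7 || !(pkt[5] & 0x10)) return 0;   /* too short, or PCR_flag clear */
    const uint8_t *p = pkt + 6;                     /* 6 PCR bytes follow the flags */
    uint64_t base = ((uint64_t)p[0] << 25) | ((uint64_t)p[1] << 17) |
                    ((uint64_t)p[2] << 9)  | ((uint64_t)p[3] << 1)  | (p[4] >> 7);
    uint64_t ext  = ((uint64_t)(p[4] & 0x01) << 8) | p[5];   /* 9-bit extension */
    *pcr_27mhz = base * 300 + ext;                  /* 90 kHz base scaled to 27 MHz */
    return 1;
}
```

A base of 90,000 with an extension of 0 corresponds to 27,000,000 ticks, which is exactly one second.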

Null packets: Some transmission schemes, such as those in ATSC and DVB, impose strict constant bitrate requirements on the transport stream. To keep the stream at a constant bitrate, a multiplexer may need to insert additional packets. PID 0x1FFF is reserved for this purpose. The payload of a null packet carries no meaningful data, and the receiver is expected to ignore its contents.
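The stuffing described above can be sketched with two pieces: building a null packet on PID 0x1FFF, and computing how many are needed to fill the packet slots of a constant mux rate over an interval. Both helper names are hypothetical. The fill count is a simplified model that ignores the PCR accuracy constraints a real multiplexer must also meet.

```c
#include <stdint.h>
#include <string.h>

#define TS_PACKET_SIZE 188
#define NULL_PID       0x1FFF

/* Hypothetical helper: build a null packet (payload content is ignored by receivers) */
static void make_null_packet(uint8_t pkt[TS_PACKET_SIZE]) {
    memset(pkt, 0xFF, TS_PACKET_SIZE);
    pkt[0] = 0x47;                        /* sync byte */
    pkt[1] = (uint8_t)(NULL_PID >> 8);    /* TEI = 0, PUSI = 0, priority = 0, PID[12:8] */
    pkt[2] = (uint8_t)(NULL_PID & 0xFF);  /* PID[7:0] */
    pkt[3] = 0x10;                        /* not scrambled, payload only, CC = 0 */
}

/* Hypothetical helper: null packets needed so data + nulls fill every slot
   of a constant mux rate over interval_ms */
static long long null_packets_needed(long long mux_bps, long long interval_ms,
                                     long long data_packets) {
    long long slots = mux_bps * interval_ms / (TS_PACKET_SIZE * 8 * 1000LL);
    return slots > data_packets ? slots - data_packets : 0;
}
```

At 1,504,000 bits/s, a 100 ms interval holds 100 packet slots. If 60 carry data, 40 null packets keep the rate constant.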

http://en.wikipedia.org/wiki/Presentation_time_stamp

http://en.wikipedia.org/wiki/ATSC_Standards

http://en.wikipedia.org/wiki/Digital_Video_Broadcasting

http://en.wikipedia.org/wiki/Multiplexer

MPEG-4 vs. MPEG-2

MPEG-2
– Defined by ISO 13818-1, 13818-2
– Leverages MPEG-1 (Moving Picture Experts Group, Formed 1988)
– Widely Used for Digital Video – Digital Cable TV, DVD
– Transport Stream Designed for Broadcast (Lossy, No Beginning or End of Stream)
  ATSC – Advanced Television Systems Committee (HDTV Broadcast)
  – 8VSB Modulation – 8-Level Vestigial Sideband Modulation, 6 MHz Channel, 19.39 Mbps, Reed-Solomon Error Correction
  – Up to 1080p (1920x1080) Video Resolution
  – AC-3 (Dolby) Audio
  DVB – Digital Video Broadcasting (Europe, Satellite)
– Program Stream Designed for Playback Media (DVD, Flash, HDD, etc.)

MPEG-4
– Defined by ISO 14496 (1998)
– Leverages MPEG-2 Standards for Program/Transport, Encode/Decode
– Better Compression Rates (Improved Motion Prediction for P, B Frames), MPEG-4 Part 10 (H.264), e.g. Blu-ray
– Extensions for Digital Rights Management
– Advanced Audio Coding (AAC)
– Becoming More Widely Deployed for HD Because of Lower Bit-Rate Transport Streams

Trick-Play Program Stream Operations
– Pause – Single Frame Display, Silent
– Fast-Forward – 60fps (2x), 120fps (4x), etc., Silent
– Rewind – 2x, 4x, etc. Frame Rate in Reverse, Silent
– Start-Over – Back to First Frame
– Play – 30fps (29.97 NTSC), Audio

Program Stream
– Video and Audio Elementary Streams
– PTS/DTS – Presentation and Decode Time Stamps – Adjust as Needed
– Bit-rate Control? (Doubles with 2x FF?)

Transport Stream
– Multiple Programs Containing Multiple Elementary Streams
– PCR – Program Clock Reference – Adjust As Needed
– Discontinuity Bit – Set As Needed
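One way to "adjust PTS as needed" for fast-forward is to compress the time elapsed since the trick-play operation started by the speed factor. The sketch below does that. It is one possible approach under stated assumptions, not a method from the standard or the course code. It works in 90 kHz PTS units and masks to 33 bits so that counter wrap-around is handled. The helper name `retime_pts` is hypothetical.

```c
#include <stdint.h>

#define PTS_MASK ((1ULL << 33) - 1)   /* PTS/DTS are 33-bit counters at 90 kHz */

/* Hypothetical helper: retime a PTS for playback at `speed` times normal rate,
   relative to start_pts, the PTS where the trick-play operation began. */
static uint64_t retime_pts(uint64_t pts, uint64_t start_pts, unsigned speed) {
    uint64_t elapsed = (pts - start_pts) & PTS_MASK;   /* survives 33-bit wrap */
    return (start_pts + elapsed / speed) & PTS_MASK;
}
```

At 30 fps the PTS advances by 3000 per frame. At 2x, a frame two periods after the start (PTS 6000) is presented one period after it (PTS 3000).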
