
  • Multimedia Systems

    Entropy Coding

    Mahdi Amiri

    October 2015

    Sharif University of Technology

    Course Presentation

  • Page 1 Multimedia Systems, Entropy Coding

    Source and Channel Coding: Shannon's Separation Principle

    Assumptions: single source and user; unlimited complexity and delay.

    Information source: generates the information we want to transmit or store.

    Source coding: reduces the number of bits needed to store or transmit the relevant information.

    Channel coding: increases the number of bits, or changes them, to protect against channel errors.

    What about joint source and channel coding?

    Figure: coding-related elements in a communication system. (Pictured: Claude E. Shannon, 1916-2001.)

    Ref.: en.wikipedia.org/wiki/Information_theory

    Information source: en.wikipedia.org/wiki/Information_source

    Source coding: en.wikipedia.org/wiki/Data_compression

    Channel coding: en.wikipedia.org/wiki/Forward_error_correction

  • Page 2 Multimedia Systems, Entropy Coding

    Source Coding: Motivation

    Data storage and transmission cost money.

    Goal: use the fewest bits possible to represent the information source.

    Pros:

    Less memory, less transmission time.

    Cons:

    Extra processing required.

    Distortion (if using lossy compression).

    Data has to be decompressed before it can be presented, which may cause delay.

  • Page 3 Multimedia Systems, Entropy Coding

    Source Coding: Principles

    Example: the source coder shall represent the video signal with the minimum number of (binary) symbols without exceeding an acceptable level of distortion.

    Two principles are utilized:

    1. Properties of the information source that are known a priori result in redundant information that need not be transmitted ("redundancy reduction").

    2. The human observer does not perceive certain deviations of the received signal from the original ("irrelevancy reduction").

    Approaches:

    Lossless coding: completely reversible; exploits principle 1 only.

    Lossy coding: not reversible; exploits principles 1 and 2.

  • Page 4 Multimedia Systems, Entropy Coding

    Data Compression: Lossless and Lossy

    Lossless

    Exact reconstruction is possible.

    Applied to general data.

    Lower compression rates.

    Examples: Run-length, Huffman, Lempel-Ziv.

    Lossy

    Higher compression rates.

    Applied to audio, image and video.

    Examples: CELP, JPEG, MPEG-2.

  • Page 5 Multimedia Systems, Entropy Coding

    Data Compression: Codec (Encoder and Decoder)

    General structure of a codec:

    Encoder: original signal → T (transform, prediction) → Q (quantization) → E (entropy encoder) → compressed bit-stream

    Decoder: compressed bit-stream → E⁻¹ (entropy decoder) → Q⁻¹ (dequantization) → T⁻¹ (inverse transform) → reconstructed signal

    In information theory, an entropy encoding is a lossless data compression scheme that is independent of the specific characteristics of the medium.

    Ref.: en.wikipedia.org/wiki/Entropy_(information_theory)

    en.wikipedia.org/wiki/Entropy_encoding

  • Page 6 Multimedia Systems, Entropy Coding

    Entropy Coding: Selected Topics and Algorithms

    Run-length encoding

    Fixed Length Coding (FLC)

    Variable Length Coding (VLC)

    Huffman Coding Algorithm

    Entropy, Definition

    Lempel-Ziv (LZ77)

    Lempel-Ziv-Welch (LZW)

    Arithmetic Coding

  • Page 7 Multimedia Systems, Entropy Coding

    Lossless Compression: Run-Length Encoding (RLE)

    BBBBHHDDXXXXKKKKWWZZZZ → 4B2H2D4X4K2W4Z

    Image of a rectangle, coded row by row as (value, run-length) pairs:

    0,40
    0,40
    0,10 1,20 0,10
    0,10 1,1 0,18 1,1 0,10
    0,10 1,1 0,18 1,1 0,10
    0,10 1,1 0,18 1,1 0,10
    0,10 1,20 0,10
    0,40

    RLE is used in fax machines (a sketch in code follows below).
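    A minimal run-length codec sketch in Python, matching the text example above; it assumes runs shorter than 10, so a single digit suffices (as in the slide):

    def rle_encode(s: str) -> str:
        out = []
        i = 0
        while i < len(s):
            j = i
            while j < len(s) and s[j] == s[i]:
                j += 1
            out.append(f"{j - i}{s[i]}")  # run length, then the symbol
            i = j
        return "".join(out)

    def rle_decode(s: str) -> str:
        # Input is a sequence of (digit, symbol) pairs.
        return "".join(ch * int(count) for count, ch in zip(s[::2], s[1::2]))

    assert rle_encode("BBBBHHDDXXXXKKKKWWZZZZ") == "4B2H2D4X4K2W4Z"
    assert rle_decode("4B2H2D4X4K2W4Z") == "BBBBHHDDXXXXKKKKWWZZZZ"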

  • Page 8 Multimedia Systems, Entropy Coding

    Lossless Compression: Fixed Length Coding (FLC)

    A simple example.

    The message to code: ►♣♣♠☻►♣☼►☻

    5 different symbols → at least 3 bits per symbol

    Message length: 10 symbols

    Total bits required to code: 10 × 3 = 30 bits

    Codeword table: each of the 5 symbols is assigned its own 3-bit code.
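    The bit count follows directly; a tiny Python check, using ASCII letters as stand-ins for the five dingbat symbols:

    from math import ceil, log2

    msg = "RCCSFRCURF"                          # stand-in for the 10-symbol message
    bits_per_symbol = ceil(log2(len(set(msg)))) # 5 symbols -> 3 bits each
    print(bits_per_symbol * len(msg))           # 10 * 3 = 30 bits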

  • Page 9 Multimedia Systems, Entropy Coding

    Lossless Compression: Variable Length Coding (VLC)

    Intuition: symbols that occur more frequently should get shorter codes; but since the code lengths are no longer equal, there must be a way of telling where each codeword ends.

    The message to code: ►♣♣♠☻►♣☼►☻

    Total bits required to code: 3×2 + 3×2 + 2×2 + 1×3 + 1×3 = 22 bits

    Codeword table. To identify the end of a codeword as soon as it arrives, no codeword may be a prefix of another codeword (a decoding sketch follows below).

    How to find the optimal codeword table?
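    A sketch of prefix-code decoding. The codeword table here is an assumption (the slide's actual table is in a figure), chosen so the code lengths add up to the 22-bit total, again with ASCII letters standing in for the dingbats:

    code = {"R": "00", "C": "01", "F": "10", "S": "110", "U": "111"}  # assumed table

    def encode(msg: str) -> str:
        return "".join(code[ch] for ch in msg)

    def decode(bits: str) -> str:
        inverse = {v: k for k, v in code.items()}
        out, cur = [], ""
        for b in bits:
            cur += b
            if cur in inverse:            # recognized as soon as it ends, because
                out.append(inverse[cur])  # no codeword is a prefix of another
                cur = ""
        return "".join(out)

    msg = "RCCSFRCURF"                    # R=3, C=3, F=2, S=1, U=1 occurrences
    assert len(encode(msg)) == 22
    assert decode(encode(msg)) == msg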

  • Page 10 Multimedia Systems, Entropy Coding

    Lossless Compression: VLC, Example Application

    Morse code is a non-prefix code: it needs a separator symbol (a pause) between letters for unique decodability.

  • Page 11 Multimedia Systems, Entropy Coding

    Lossless Compression: Huffman Coding Algorithm

    Step 1: Take the two least probable symbols in the alphabet (these receive the longest codewords, of equal length, differing only in the last digit).

    Step 2: Combine these two symbols into a single symbol, and repeat (sketched in code below).

    P(n): probability of symbol number n. Here there are 9 symbols; e.g. the symbols can be the letters 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i'.
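    A minimal sketch of this two-step procedure using a heap of subtrees; it assumes at least two distinct symbols and takes frequencies from the data itself:

    import heapq
    from collections import Counter

    def huffman_code(freqs: dict) -> dict:
        # Heap entries: (total frequency, tie-breaker, {symbol: code so far}).
        heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freqs.items())]
        heapq.heapify(heap)
        tie = len(heap)
        while len(heap) > 1:
            f1, _, c1 = heapq.heappop(heap)   # the two least probable
            f2, _, c2 = heapq.heappop(heap)   # subtrees are combined ...
            merged = {s: "0" + c for s, c in c1.items()}
            merged.update({s: "1" + c for s, c in c2.items()})
            heapq.heappush(heap, (f1 + f2, tie, merged))  # ... and repeated
            tie += 1
        return heap[0][2]

    table = huffman_code(Counter("abracadabra"))
    # Frequent symbols ('a') get short codes, rare ones ('c', 'd') long ones.
    print(table)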

  • Page 12 Multimedia Systems, Entropy Coding

    Lossless Compression: Huffman Coding Algorithm

    David A. Huffman, 1925-1999

    Paper: "A Method for the Construction of Minimum-Redundancy Codes", 1952.

    Results in "prefix-free codes".

    Most efficient: no other symbol-by-symbol mapping will produce a smaller average output size, provided the actual symbol frequencies agree with those used to create the code.

    Cons:

    Has to run through the entire data in advance to find the frequencies.

    'Minimum redundancy' is not favorable for error-correction techniques (the bits are not predictable if, e.g., one is missing).

    Does not support blocks of symbols: Huffman is designed to code single characters only, so at least one bit is required per character; e.g. a word of 8 characters requires a code of at least 8 bits.

  • Page 13 Multimedia Systems, Entropy Coding

    Entropy Coding: Entropy, Definition

    The entropy, H, of a discrete random variable X is a measure of the amount of uncertainty associated with the value of X.

    A measure of information content (in bits).

    A quantitative measure of the disorder of a system.

    It is impossible to compress the data such that the average number of bits per symbol is less than the Shannon entropy of the source (over a noiseless channel).

    H(X) = \sum_{x \in X} P(x) \log_2 \frac{1}{P(x)}

    X → the information source; P(x) → the probability that symbol x in X will occur.

    The intuition behind the formula (the information-theory point of view):

    P(x) ↑ ⇒ uncertainty ↓, so the amount of uncertainty behaves like 1/P(x).

    Bringing it to the world of bits: log_2(1/P(x)) = I(x), the information content of x.

    H(X) is then the weighted average, \sum P(x) \cdot I(x), of the number of bits required to encode each possible value (computed in code below).

    Claude E. Shannon, 1916-2001
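    A direct computation of the formula in Python, applied to the 10-symbol message from page 9 (ASCII stand-ins again); this sketch estimates P(x) from the message's own symbol frequencies:

    from collections import Counter
    from math import log2

    def entropy(message: str) -> float:
        # H(X) = sum over symbols x of P(x) * log2(1 / P(x)).
        n = len(message)
        return sum((c / n) * log2(n / c) for c in Counter(message).values())

    print(entropy("RCCSFRCURF"))  # ~2.17 bits/symbol, so 10 symbols need at
                                  # least ~21.7 bits; the 22-bit VLC is near-optimal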

  • Page 14 Multimedia Systems, Entropy Coding

    Lossless Compression: Lempel-Ziv (LZ77)

    An algorithm for the compression of character sequences.

    Assumption: sequences of characters are repeated.

    Idea: replace a character sequence by a reference to an earlier occurrence (a sketch follows below).

    1. Define a search buffer = (a portion of) the recently encoded data, and a look-ahead buffer = the data not yet encoded.

    2. Find the longest match between the first characters of the look-ahead buffer and an arbitrary character sequence in the search buffer.

    3. Produce as output: offset + length = the reference to the earlier occurrence, and next_character = the first character following the match in the look-ahead buffer.
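    A compact sketch of these three steps; the window size is an illustrative choice, and the triple format (offset, length, next_char) follows the slide:

    def lz77_encode(data: str, window: int = 16) -> list:
        out, i = [], 0
        while i < len(data):
            start = max(0, i - window)        # search-buffer boundary
            best_off, best_len = 0, 0
            for j in range(start, i):         # candidate match positions
                k = 0
                while i + k < len(data) - 1 and data[j + k] == data[i + k]:
                    k += 1
                if k > best_len:
                    best_off, best_len = i - j, k
            out.append((best_off, best_len, data[i + best_len]))
            i += best_len + 1
        return out

    def lz77_decode(triples: list) -> str:
        out = []
        for off, length, nxt in triples:
            for _ in range(length):
                out.append(out[-off])         # copy from 'off' positions back
            out.append(nxt)
        return "".join(out)

    msg = "abracadabra abracadabra"
    assert lz77_decode(lz77_encode(msg)) == msg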

  • Page 15 Multimedia Systems, Entropy Coding

    Lossless Compression: Lempel-Ziv-Welch (LZW)

    Drops the search buffer and keeps an explicit dictionary.

    Produces only one kind of output: dictionary indices.

    Used by Unix "compress", GIF, V.42bis, TIFF.

    Example: wabba␣wabba␣wabba␣wabba␣woo␣woo␣woo (␣ denotes a space)

    Progress clip (dictionary table) at the 12th entry.

    Encoder output sequence so far: 5 2 3 3 2 1

  • Page 16 Multimedia Systems, Entropy Coding

    Lossless Compression: Lempel-Ziv-Welch (LZW)

    Example: wabba␣wabba␣wabba␣wabba␣woo␣woo␣woo

    Progress clip (dictionary table) at the end of the example above.

    Encoder output sequence: 5 2 3 3 2 1 6 8 10 12 9 11 7 16 5 4 4 11 21 23 4
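    A sketch of the LZW encoder. The initial dictionary order (␣=1, a=2, b=3, o=4, w=5) is inferred from the slide's output sequence, which this sketch reproduces:

    def lzw_encode(data: str, alphabet: str) -> list:
        # Dictionary indices start at 1, as in the slide's table.
        dictionary = {ch: i for i, ch in enumerate(alphabet, start=1)}
        out, cur = [], ""
        for ch in data:
            if cur + ch in dictionary:
                cur += ch                     # extend the current match
            else:
                out.append(dictionary[cur])   # emit index of the longest match
                dictionary[cur + ch] = len(dictionary) + 1  # add a new entry
                cur = ch
        out.append(dictionary[cur])
        return out

    msg = "wabba wabba wabba wabba woo woo woo"
    print(lzw_encode(msg, " abow"))
    # [5, 2, 3, 3, 2, 1, 6, 8, 10, 12, 9, 11, 7, 16, 5, 4, 4, 11, 21, 23, 4]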

  • Page 17 Multimedia Systems, Entropy Coding

    Lossless Compression: Arithmetic Coding

    Encodes a whole block of symbols into a single number: a fraction n where 0.0 ≤ n < 1.0.
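    A toy sketch of the idea: each symbol narrows the interval [low, high) in proportion to its probability, and any fraction inside the final interval identifies the whole block. The probabilities here are assumed for illustration, and float precision limits this sketch to short messages:

    def intervals(probs: dict) -> dict:
        # Cumulative sub-interval [c_lo, c_hi) for each symbol.
        out, c = {}, 0.0
        for sym, p in probs.items():
            out[sym] = (c, c + p)
            c += p
        return out

    def arith_encode(msg: str, probs: dict) -> float:
        cum = intervals(probs)
        low, high = 0.0, 1.0
        for sym in msg:
            span = high - low
            c_lo, c_hi = cum[sym]
            low, high = low + span * c_lo, low + span * c_hi
        return (low + high) / 2       # any n with low <= n < high works

    def arith_decode(n: float, probs: dict, length: int) -> str:
        cum = intervals(probs)
        out, low, high = [], 0.0, 1.0
        for _ in range(length):       # the decoder must know the block length
            span = high - low
            for sym, (c_lo, c_hi) in cum.items():
                if low + span * c_lo <= n < low + span * c_hi:
                    out.append(sym)
                    low, high = low + span * c_lo, low + span * c_hi
                    break
        return "".join(out)

    probs = {"a": 0.5, "b": 0.25, "c": 0.25}   # assumed symbol probabilities
    n = arith_encode("abca", probs)
    assert 0.0 <= n < 1.0 and arith_decode(n, probs, 4) == "abca"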