Multimedia Communications
Information Theory, Entropy Coding
Dr.-Ing. Aljoscha Smolic
FOR CLASS USE ONLY. DO NOT DISTRIBUTE.



  • Overview

    Basic terms and principles of information theory and signal processing for multimedia:

    Information theory, entropy coding
    Communication channel
    Sampling
    Quantisation
    Transformation
    Signal processing (filtering)
    Statistics
    Prediction

  • Materials

    J.-R. Ohm, Multimedia Communication Technology, Springer-Verlag

    Google, Wikipedia: Information theory, Entropy coding

    C. E. Shannon, "A Mathematical Theory of Communication", Bell System Technical Journal, vol. 27, pp. 379-423, 623-656, July, October 1948
    http://cm.bell-labs.com/cm/ms/what/shannonday/shannon1948.pdf

  • Relevance - Irrelevance

    Relevance: parts of a message that have an impact on the receiver.

    Irrelevance: unimportant parts of a message
    Things we cannot perceive (tones > 20 kHz, infrared light, masking in mp3)
    Things the presentation device (e.g. display) cannot handle (conversion cinema -> TV)

    Fundamental instrument of compression: irrelevance reduction, i.e. detection and removal of the irrelevant parts of messages.
    Example: mp3; frequencies we cannot hear are detected and not transmitted.

  • JPEG

    (Image pair: original vs. compressed at 1:150.)

  • Redundancy

    Redundancy: parts of a message that follow from the rest, i.e. that can be reconstructed given the rest.

    E.g. "der weiße Schimmel" (a Schimmel is by definition a white horse), "He jumped into the lake and got wet."

    We can find another representation which is more compact:
    red, red, red, red, red, blue, blue, blue, red, red, red -> 5 x red, 3 x blue, 3 x red
    Run-length coding, fax (black, white)

    Fundamental instrument of compression: redundancy reduction, i.e. detection and removal of the redundant parts of messages.
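    A minimal run-length coding sketch in Python (an illustration added here, not part of the slides): runs of identical symbols are collapsed into (count, symbol) pairs, as in the red/blue example above, and can be expanded again without loss.

        from itertools import groupby

        def rle_encode(symbols):
            """Collapse runs of identical symbols into (count, symbol) pairs."""
            return [(len(list(run)), value) for value, run in groupby(symbols)]

        def rle_decode(pairs):
            """Expand (count, symbol) pairs back into the original sequence."""
            return [value for count, value in pairs for _ in range(count)]

        sequence = ["red"] * 5 + ["blue"] * 3 + ["red"] * 3
        encoded = rle_encode(sequence)          # [(5, 'red'), (3, 'blue'), (3, 'red')]
        assert rle_decode(encoded) == sequence  # redundancy reduction is fully reversible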

  • Reversibility

    Redundancy reduction is fully reversible, i.e. the complete, exact information can be reconstructed.

    Irrelevance reduction is irreversible, i.e. the discarded parts cannot be reconstructed later.

    Redundancy relates to statistics and models.
    Irrelevance relates to human perception.

  • Communication Plane

    (Diagram: communication plane spanned by the axes Relevancy and Non-Redundancy, with a region labelled Interesting.)

  • Information Content

    Information content (IC): a numerical measure for the information that is finally included within a message; a numerical measure for the predictability of an event within its context.

    Measured via the probability of a message/event:
    "Die Tagesschau starts at 20:00" -> low IC
    "Die Tagesschau starts today at 20:30" -> high IC

    The more surprise, the higher the numerical value of the IC.
    Information is surprise, uncertainty.

  • Information Content

    (Figures: example signals over time t, one with low IC and one with high IC; an image pair illustrating low vs. high IC.)

  • Information Content

    j: event from the event space J, e.g. Kopf (heads) from {Kopf, Zahl}.
    Probability of the result Kopf: P(j) = 1/2.

    Definition of information content (unit: bit):

    $i(j) = \log_2 \frac{1}{P(j)}$

    $i(\text{"Kopf"}) = \log_2 \frac{1}{1/2} = 1$
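    The definition can be evaluated directly; a small Python sketch (the function name is mine) for the fair-coin value above:

        from math import log2

        def information_content(p):
            """i(j) = log2(1 / P(j)), in bit."""
            return log2(1 / p)

        print(information_content(1 / 2))   # Kopf on a fair coin: 1.0 bit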

  • Information Content

    Manipulated coin: probability of Kopf: P(j) = 1/4; probability of Zahl: P(j) = 3/4.

    The event Kopf is more surprising -> higher IC:

    $i(\text{"Kopf"}) = \log_2 \frac{1}{1/4} = 2$

    $i(\text{"Zahl"}) = \log_2 \frac{1}{3/4} = 0.42$

  • Entropy

    Entropy = mean information content of a source.
    Determines how many bits per symbol are necessary to encode a source binary (theoretical bound).

    $H(J) = \sum_{j=1}^{J} P(j)\, i(j)$

    $H(\text{"Coin"}) = \tfrac{1}{2} \cdot 1 + \tfrac{1}{2} \cdot 1 = 1$

    $H(\text{"Manipulated Coin"}) = \tfrac{3}{4} \cdot 0.42 + \tfrac{1}{4} \cdot 2 = 0.815$
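    The entropy values above can be checked numerically; a sketch (the helper name is mine) using the two coin distributions:

        from math import log2

        def entropy(probabilities):
            """H(J) = sum_j P(j) * log2(1 / P(j)); zero-probability events contribute nothing."""
            return sum(p * log2(1 / p) for p in probabilities if p > 0)

        print(entropy([1/2, 1/2]))   # fair coin: 1.0 bit
        print(entropy([3/4, 1/4]))   # manipulated coin: ~0.811 bit (0.815 on the slide, which rounds i("Zahl") to 0.42)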

  • Entropy

    The source Coin has a higher entropy (mean information content) than the source Manipulated Coin.
    The source Manipulated Coin contains redundancy; it is predictable.

    In theory: coding of the source Coin requires a mean of 1 bit per event (0 = Kopf, 1 = Zahl).
    Coding of the source Manipulated Coin requires a mean of 0.815 bit per event (see below how to do the code assignment).

  • Entropy

    Exercise: calculate the information content and entropy for the source Würfel (a fair die), assuming the same probability for all events:
    P(j) = 1/6

    $i(\text{"1"}) = \ldots = i(\text{"6"}) = \log_2 \frac{1}{1/6} = 2.58$

    $H(\text{"Würfel"}) = \sum_{j=1}^{6} \tfrac{1}{6} \cdot 2.58 = 2.58$

  • Entropy

    Exercise: calculate the information content and entropy for the source gezinkter Würfel (a loaded die), assuming the following probabilities:
    P(1) = P(2) = P(3) = P(4) = 1/8; P(5) = P(6) = 2/8 = 1/4

    $i(\text{"1"}) = \ldots = i(\text{"4"}) = \log_2 \frac{1}{1/8} = 3$

    $i(\text{"5"}) = i(\text{"6"}) = \log_2 \frac{1}{1/4} = 2$

  • Entropy

    $i(\text{"1"}) = \ldots = i(\text{"4"}) = \log_2 \frac{1}{1/8} = 3$

    $i(\text{"5"}) = i(\text{"6"}) = \log_2 \frac{1}{1/4} = 2$

    $H(\text{"gezinkter Würfel"}) = \sum_{j=1}^{4} \tfrac{1}{8} \cdot 3 + \sum_{j=5}^{6} \tfrac{1}{4} \cdot 2 = 1.5 + 1 = 2.5$

  • Entropy: Würfel

    (Figure: for the fair die, P(j) = 1/6 and i(j) = 2.58 for all j = 1 ... 6; H(J) = 2.58 = log2 6.)

    $i(\text{"1"}) = \ldots = i(\text{"6"}) = \log_2 6 = 2.58$

    $H(J) = \sum_{j=1}^{6} \tfrac{1}{6} \cdot 2.58 = 2.58$

    $i(j) = \log_2 \frac{1}{P(j)} \qquad H(J) = \sum_{j=1}^{J} P(j)\, i(j)$

  • Decision Content

    Uniformly distributed source: all events have the same probability.

    Maximum entropy of a source with N events: decision content
    $H_0 = \log_2 N = \max(H)$

    Equal to the information content of every possible event.
    None of the events is favoured.
    The source is not predictable, completely random.
    No redundancy.

    (Figure: P(j) = 1/N and i(j) = log2 N for all j = 1 ... N; H(J) = log2 N = H0.)

  • Entropy of a Source with a Known Event

    (Figure: die with a known outcome, P(6) = 1 and P(1) = ... = P(5) = 0, with the corresponding i(j).)

    $i(\text{"1"}) = \ldots = i(\text{"5"}) = \log_2 \frac{1}{0} \rightarrow \infty$

    $i(\text{"6"}) = \log_2 \frac{1}{1} = 0$

    $H(J) = \sum_{j=1}^{5} 0 \cdot i(j) + 1 \cdot 0 = 0 = \min(H)$ (using the convention $0 \cdot \log_2 \frac{1}{0} = 0$)

    The source does not transmit information: H(J) = 0.

  • Entropy: gezinkter Würfel

    P(1) = P(2) = P(3) = P(4) = 1/8; P(5) = P(6) = 1/4

    (Figure: P(j) and i(j) over j = 1 ... 6; i(j) = 3 for j = 1 ... 4 and i(j) = 2 for j = 5, 6; H(J) = 2.5.)

    $i(\text{"1"}) = \ldots = i(\text{"4"}) = \log_2 \frac{1}{1/8} = 3 \qquad i(\text{"5"}) = i(\text{"6"}) = \log_2 \frac{1}{1/4} = 2$

    $H(\text{"gezinkter Würfel"}) = \sum_{j=1}^{4} \tfrac{1}{8} \cdot 3 + \sum_{j=5}^{6} \tfrac{1}{4} \cdot 2 = 1.5 + 1 = 2.5$

  • Redundancy

    Non-uniformly distributed source: events with different probabilities.

    Entropy smaller than the decision content: $0 \le H(J) \le H_0 = \log_2 N$
    The source contains redundancy, it is predictable: $R = H_0 - H(J)$

    (Figure: P(j) and i(j) over j = 1 ... N for a non-uniform source; 0 <= H(J) <= H0.)
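    The redundancy R = H0 - H(J) can be computed the same way; a sketch (the variable names are mine) for the loaded die from the earlier exercise:

        from math import log2

        def entropy(probabilities):
            return sum(p * log2(1 / p) for p in probabilities if p > 0)

        loaded_die = [1/8, 1/8, 1/8, 1/8, 1/4, 1/4]
        H0 = log2(len(loaded_die))   # decision content: log2(6) = 2.585 bit
        H = entropy(loaded_die)      # 2.5 bit
        R = H0 - H                   # redundancy: ~0.085 bit per event
        print(H0, H, R)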

  • Digital Representation

    Continuous signals are analogue; the representation (e.g. a voltage) follows the physical phenomenon (sound, light).

    Goal: digital representation, i.e. representation via a sequence of symbols.
    Mostly binary symbols (bits) with 2 possible values (states), e.g. 0 and 1, A and B.
    Physical representation of this information about states then e.g. by 2 voltage values.

    (Figure: voltage U(t) over time t switching on/off to represent the bit sequence 1 0 1 1 0.)

  • Digital Representation

    Not the signal itself but a different description of it is represented: the information.

    High quality at low cost.
    Error-free and noise-free transmission, storage and copying are possible.
    Digital manipulation (editing) of the signals is possible, in software.
    Integration into multimedia is possible, new formats of content.

  • Digital Representation

    Coding: assignment of information to a symbol.

    Kopf = 1, Zahl = 0

    Source alphabet (discrete) -> codebook:
    rot (red) = 00, gelb (yellow) = 01, grün (green) = 10, weiß (white) = 11

    01 00 10 01 = gelb rot grün gelb

  • Digital Representation

    A codebook with N bit can represent 2^N code words.

    N = 1: code words 1, 0 => 2 source events can be represented (source alphabet Kopf, Zahl).

    N = 2: code words 00, 01, 10, 11 => 4 source events can be represented (e.g. the four colours above).

  • Fixed Length Coding

    Source alphabet (discrete) -> codebook: a source with M events requires a codebook with N >= log2 M bit.

    Würfel: M = 6 -> N >= log2 6 = 2.58 bit -> N = 3 bit
    e.g. 000 = 1, 001 = 2, 010 = 3, 011 = 4, 100 = 5, 101 = 6
    Unused code words: 110, 111

    German alphabet: M = 26 -> N >= log2 26 = 4.7 bit -> N = 5 bit
    e.g. 00000 = a, 00001 = b, ...
    Possible code words with 5 bit: 2^5 = 32 -> 6 code words unused
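    The required fixed code length is N = ceil(log2 M); a short sketch (the function name is mine) reproducing the two examples above, and equally applicable to the exercises that follow:

        from math import ceil, log2

        def fixed_length_bits(m):
            """Bits per code word for a fixed-length binary code over M source events."""
            return ceil(log2(m))

        for m in (6, 26):                 # die, German alphabet
            n = fixed_length_bits(m)
            print(m, n, 2**n - m)         # events, bits per code word, unused code words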

  • Fixed Length Coding

    Exercise: create a binary fixed-length code for the teams in the Fussball Bundesliga (18 teams).
    Which code length is required? How many code words are not used?

    M = 18 => N >= log2 18 bit = 4.17 bit => N = 5 bit
    E.g. 00000 = Hertha BSC, 00001 = Bayern Muenchen, ...
    Possible code words with 5 bit: 2^5 = 32 -> 14 code words unused

  • Fixed Length Coding

    Exercise: create a binary fixed-length code for international airport codes (e.g. TXL, SFO), assuming 26 letters.
    How many elements does the source alphabet contain? Which code length is required? How many code words are not used?

    M = 26 * 26 * 26 = 26^3 = 17576 => N >= log2 17576 bit = 14.10 bit => N = 15 bit
    Possible code words with 15 bit: 2^15 = 32768 -> 15192 code words unused

  • Reminder: Information Content

    Manipulated coin: probability of Kopf: P(j) = 1/4; probability of Zahl: P(j) = 3/4.

    The event Kopf is more surprising -> higher IC:

    $i(\text{"Kopf"}) = \log_2 \frac{1}{1/4} = 2$

    $i(\text{"Zahl"}) = \log_2 \frac{1}{3/4} = 0.42$

  • Reminder: Entropy

    Entropy = mean information content of a source.
    Determines how many bits per symbol are necessary to encode a source binary (theoretical bound).

    $H(J) = \sum_{j=1}^{J} P(j)\, i(j)$

    $H(\text{"Coin"}) = \tfrac{1}{2} \cdot 1 + \tfrac{1}{2} \cdot 1 = 1$

    $H(\text{"Manipulated Coin"}) = \tfrac{3}{4} \cdot 0.42 + \tfrac{1}{4} \cdot 2 = 0.815$

  • Combined Events

    Define a combined event: double manipulated coin.

    Source: KK, KZ, ZK, ZZ

    Probabilities:
    P(KK) = 1/4 * 1/4 = 1/16
    P(KZ) = 1/4 * 3/4 = 3/16
    P(ZK) = 3/4 * 1/4 = 3/16
    P(ZZ) = 3/4 * 3/4 = 9/16

  • Variable Length Codes

    Variable length code (VLC), arbitrarily defined (not a Huffman code!):
    1 = ZZ, 000 = ZK, 001 = KZ, 010 = KK

    Mean bit length per combined event (sum of [probability * number of bits for event j]):

    $\bar{N}(J) = \sum_{j=1}^{J} P(j)\, N(j) = \tfrac{9}{16} \cdot 1 + \tfrac{3}{16} \cdot 3 + \tfrac{3}{16} \cdot 3 + \tfrac{1}{16} \cdot 3 = 1.875$

  • Variable Length Codes

    Mean number of bits per combined event: 1.875 bit.
    I.e. coding of one event requires a mean of 1.875 / 2 = 0.9375 bit.

    Coding as single events requires 1 bit per event.

    Usage of combined events and variable length codes adapted to the probabilities allows an approximation to the theoretical bound given by the entropy of 0.815 bit.
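    The 1.875 and 0.9375 bit figures can be reproduced directly from the code table; a small sketch (the dictionaries are mine, the code table is the slide's):

        from math import log2

        probs = {"KK": 1/16, "KZ": 3/16, "ZK": 3/16, "ZZ": 9/16}        # double manipulated coin
        vlc = {"KK": "010", "KZ": "001", "ZK": "000", "ZZ": "1"}        # arbitrary VLC from the slide

        mean_length = sum(p * len(vlc[ev]) for ev, p in probs.items())  # 1.875 bit per combined event
        per_coin = mean_length / 2                                      # 0.9375 bit per single event
        bound = sum(p * log2(1 / p) for p in (3/4, 1/4))                # entropy bound: ~0.81 bit (0.815 on the slides)
        print(mean_length, per_coin, bound)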

  • Combined Events

    Define a combined event: triple manipulated coin.

    Source: KKK, KKZ, KZK, KZZ, ZKK, ZKZ, ZZK, ZZZ

    2^3 = 8 events

  • Combined Events

    Probabilities:
    P(KKK) = 1/4 * 1/4 * 1/4 = 1/64
    P(KKZ) = 1/4 * 1/4 * 3/4 = 3/64
    P(KZK) = 1/4 * 3/4 * 1/4 = 3/64
    P(KZZ) = 1/4 * 3/4 * 3/4 = 9/64
    P(ZKK) = 3/4 * 1/4 * 1/4 = 3/64
    P(ZKZ) = 3/4 * 1/4 * 3/4 = 9/64
    P(ZZK) = 3/4 * 3/4 * 1/4 = 9/64
    P(ZZZ) = 3/4 * 3/4 * 3/4 = 27/64

  • Variable Length Codes

    Variable length code (Huffman):
    KKK = 00000, KKZ = 00001, KZK = 00010, KZZ = 010, ZKK = 00011, ZKZ = 011, ZZK = 001, ZZZ = 1

  • Variable Length Codes

    Mean bit length per combined event (sum of [probability * number of bits for event j]):

    $\bar{N}(J) = \sum_{j=1}^{J} P(j)\, N(j) = \tfrac{1}{64} \cdot 5 + \tfrac{3}{64} \cdot 5 + \tfrac{3}{64} \cdot 5 + \tfrac{9}{64} \cdot 3 + \tfrac{3}{64} \cdot 5 + \tfrac{9}{64} \cdot 3 + \tfrac{9}{64} \cdot 3 + \tfrac{27}{64} \cdot 1 = 2.469$

  • Variable Length Codes

    Mean number of bits per combined event: 2.469 bit.
    I.e. coding of one event requires a mean of 2.469 / 3 = 0.823 bit.

    Coding as single events requires 1 bit per event.

    Usage of combined events and variable length codes adapted to the probabilities allows an approximation to the theoretical bound given by the entropy of 0.815 bit.

  • Variable Length Codes

    Definition of longer combined events and adequate VLCs allows an even better approximation of the entropy.

    In theory: for combined events of infinite length the entropy is reached for every source.
    But: infinite processing time (delay).

    In practice: optimisation of efficiency and delay for the given application.

  • Shannon's Theorem

    The entropy of a source determines the minimum bit rate required for error-free transmission of the source symbols.

    It can be achieved by block codes and appropriate variable length coding.

  • Entropy Coding

    Purpose: lossless coding of a discrete source A = {a1, a2, ..., aJ} using a codebook C = {c1, c2, ..., cJ}.

    Goal: compression exploiting the redundancy of the source.

    Theoretical bound: the entropy of the source determines the minimum mean number of bits per symbol required for binary coding of the source A:

    $H(\mathbf{A}) = \sum_{j=1}^{J} P(a_j)\, i(a_j)$

  • Entropy Coding

    Instruments: variable length coding, block codes (for combined events).

    Code words of different bit lengths are assigned according to the probabilities of the source events.
    Events with high probability get short code words; unlikely events get longer code words.
    On average this results in a reduction of the overall bit rate.

  • Code Rate

    N(a_j): bit length of the code word for a_j.
    Rate = average bit length of the code words of a code:

    $R = \sum_{j=1}^{J} P(a_j)\, N(a_j) \ge H(\mathbf{A}) = \sum_{j=1}^{J} P(a_j)\, i(a_j)$

    The code must be decodable unambiguously:
    Separators between code words increase the rate.
    Prefix-free codes: no code word may be the beginning of any other code word.
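    Whether a code is prefix-free can be checked mechanically; a small sketch (the check itself is my addition, not from the slides):

        def is_prefix_free(codewords):
            """True if no code word is the beginning of another code word."""
            words = sorted(codewords)
            # after sorting, a prefix relation can only occur between neighbours
            return all(not b.startswith(a) for a, b in zip(words, words[1:]))

        print(is_prefix_free(["1", "000", "001", "010"]))   # VLC of the double coin: True
        print(is_prefix_free(["0", "01", "11"]))            # "0" is a prefix of "01": False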

  • Code Tree

    Prefix-free codes can be represented by a code tree, with a root ("Wurzel"), nodes ("Knoten"), branches ("Zweige") and leaves ("Blätter"); each branch is labelled 0 or 1, and each leaf corresponds to a code word.

    (Figure: example code tree with the leaves "00", "100", "011", "111", "0100", "0101", "1010", "1011", "1100", "1101".)

  • Huffman Codes

    (Figure: Huffman tree construction for a source with eight events; the intermediate node probabilities are 0.03, 0.06, 0.13, 0.27, 0.43 and 0.57.)

    Probabilities and resulting code words:
    p(7) = 0.29 -> "11"
    p(6) = 0.28 -> "10"
    p(5) = 0.16 -> "01"
    p(4) = 0.14 -> "001"
    p(3) = 0.07 -> "0001"
    p(2) = 0.03 -> "00001"
    p(1) = 0.02 -> "000001"
    p(0) = 0.01 -> "000000"

  • Huffman Codes

    Algorithm of Huffman coding:
    1. Sort the events by probability.
    2. Combine the two least probable branches, assign 0 and 1.
    3. Create a new combined branch with the sum of the probabilities.
    4. Combine again the two least probable branches.
    5. Continue until the root is reached.

    A code tree is created; the code words result from reading from the root to the leaves.
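    The algorithm above can be written compactly with a priority queue; a sketch (not the slides' implementation; the tie-breaking counter is an implementation detail I added, and ties may yield a different but equally optimal code assignment) run on the probabilities of the preceding example:

        import heapq
        from itertools import count

        def huffman_code(probabilities):
            """Build a binary Huffman code by repeatedly merging the two least probable branches."""
            tiebreak = count()
            heap = [(p, next(tiebreak), {symbol: ""}) for symbol, p in probabilities.items()]
            heapq.heapify(heap)
            while len(heap) > 1:
                p0, _, branch0 = heapq.heappop(heap)    # least probable branch gets a leading "0"
                p1, _, branch1 = heapq.heappop(heap)    # next least probable branch gets a leading "1"
                merged = {s: "0" + c for s, c in branch0.items()}
                merged.update({s: "1" + c for s, c in branch1.items()})
                heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
            return heap[0][2]

        probs = {7: 0.29, 6: 0.28, 5: 0.16, 4: 0.14, 3: 0.07, 2: 0.03, 1: 0.02, 0: 0.01}
        code = huffman_code(probs)
        rate = sum(probs[s] * len(code[s]) for s in probs)   # 2.49 bit per symbol, vs. entropy 2.45
        print(code, rate)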

  • Huffman Codes

    In this example:

    Entropy: $H(\mathbf{A}) = 2.45$

    Fixed length coding: M = 8 => N = log2 8 = 3 bit

    Rate of the Huffman code: $R = \sum_{j=1}^{J} P(a_j)\, N(a_j) = 2.49$