Multimedia Communications
Information Theory, Entropy Coding
Dr.-Ing. Aljoscha Smolic
FOR CLASS USE ONLY. DO NOT DISTRIBUTE.



  • Overview

    Basic terms and principles of information theory and signal processing for multimedia:

    Information theory, entropy coding
    Communication channel
    Sampling
    Quantisation
    Transformation
    Signal processing (filtering)
    Statistics
    Prediction

  • Materials

    J.-R. Ohm, Multimedia Communication Technology, Springer-Verlag

    Google, Wikipedia: Information theory, Entropy coding

    C. E. Shannon, "A Mathematical Theory of Communication", Bell System Technical Journal, vol. 27, pp. 379-423, 623-656, July, October 1948
    http://cm.bell-labs.com/cm/ms/what/shannonday/shannon1948.pdf

  • Relevance - Irrelevance

    Relevance: parts of a message that have an impact on the receiver.

    Irrelevance: unimportant parts of a message
    Things we cannot perceive (tones > 20 kHz, infrared light, masking in mp3)
    Things the presentation device (e.g. display) cannot handle (conversion cinema -> TV)

    Fundamental instrument of compression: irrelevance reduction, i.e. detection and removal of the irrelevant parts of messages.
    Example: mp3; frequencies we cannot hear are detected and not transmitted.

  • JPEG

    (Image pair: original vs. compressed at 1:150.)

  • Redundancy

    Redundancy: parts of a message that follow from the rest, i.e. that can be reconstructed given the rest.

    E.g. "der weiße Schimmel" (a Schimmel is by definition a white horse), "He jumped into the lake and got wet."

    We can find another representation which is more compact:
    red, red, red, red, red, blue, blue, blue, red, red, red -> 5 x red, 3 x blue, 3 x red
    Run-length coding, fax (black, white)

    Fundamental instrument of compression: redundancy reduction, i.e. detection and removal of the redundant parts of messages.
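    A minimal run-length coding sketch in Python (an illustration added here, not part of the slides): runs of identical symbols are collapsed into (count, symbol) pairs, as in the red/blue example above, and can be expanded again without loss.

        from itertools import groupby

        def rle_encode(symbols):
            """Collapse runs of identical symbols into (count, symbol) pairs."""
            return [(len(list(run)), value) for value, run in groupby(symbols)]

        def rle_decode(pairs):
            """Expand (count, symbol) pairs back into the original sequence."""
            return [value for count, value in pairs for _ in range(count)]

        sequence = ["red"] * 5 + ["blue"] * 3 + ["red"] * 3
        encoded = rle_encode(sequence)          # [(5, 'red'), (3, 'blue'), (3, 'red')]
        assert rle_decode(encoded) == sequence  # redundancy reduction is fully reversible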

  • Reversibility

    Redundancy reduction is fully reversible, i.e. the complete, exact information can be reconstructed.

    Irrelevance reduction is irreversible, i.e. the discarded parts cannot be reconstructed later.

    Redundancy relates to statistics and models.
    Irrelevance relates to human perception.

  • Communication Plane

    (Diagram: communication plane spanned by the axes Relevancy and Non-Redundancy, with a region labelled Interesting.)

  • Information Content

    Information content (IC): a numerical measure for the information that is finally included within a message; a numerical measure for the predictability of an event within its context.

    Measured via the probability of a message/event:
    "Die Tagesschau starts at 20:00" -> low IC
    "Die Tagesschau starts today at 20:30" -> high IC

    The more surprise, the higher the numerical value of the IC.
    Information is surprise, uncertainty.

  • Information Content

    (Figures: example signals over time t, one with low IC and one with high IC; an image pair illustrating low vs. high IC.)

  • Information Content

    j: event from the event space J, e.g. Kopf (heads) from {Kopf, Zahl}.
    Probability of the result Kopf: P(j) = 1/2.

    Definition of information content (unit: bit):

    $i(j) = \log_2 \frac{1}{P(j)}$

    $i(\text{"Kopf"}) = \log_2 \frac{1}{1/2} = 1$
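    The definition can be evaluated directly; a small Python sketch (the function name is mine) for the fair-coin value above:

        from math import log2

        def information_content(p):
            """i(j) = log2(1 / P(j)), in bit."""
            return log2(1 / p)

        print(information_content(1 / 2))   # Kopf on a fair coin: 1.0 bit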

  • Information Content

    Manipulated coin: probability of Kopf: P(j) = 1/4; probability of Zahl: P(j) = 3/4.

    The event Kopf is more surprising -> higher IC:

    $i(\text{"Kopf"}) = \log_2 \frac{1}{1/4} = 2$

    $i(\text{"Zahl"}) = \log_2 \frac{1}{3/4} = 0.42$

  • Entropy

    Entropy = mean information content of a source.
    Determines how many bits per symbol are necessary to encode a source binary (theoretical bound).

    $H(J) = \sum_{j=1}^{J} P(j)\, i(j)$

    $H(\text{"Coin"}) = \tfrac{1}{2} \cdot 1 + \tfrac{1}{2} \cdot 1 = 1$

    $H(\text{"Manipulated Coin"}) = \tfrac{3}{4} \cdot 0.42 + \tfrac{1}{4} \cdot 2 = 0.815$
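    The entropy values above can be checked numerically; a sketch (the helper name is mine) using the two coin distributions:

        from math import log2

        def entropy(probabilities):
            """H(J) = sum_j P(j) * log2(1 / P(j)); zero-probability events contribute nothing."""
            return sum(p * log2(1 / p) for p in probabilities if p > 0)

        print(entropy([1/2, 1/2]))   # fair coin: 1.0 bit
        print(entropy([3/4, 1/4]))   # manipulated coin: ~0.811 bit (0.815 on the slide, which rounds i("Zahl") to 0.42)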

  • Entropy

    The source Coin has a higher entropy (mean information content) than the source Manipulated Coin.
    The source Manipulated Coin contains redundancy; it is predictable.

    In theory: coding of the source Coin requires a mean of 1 bit per event (0 = Kopf, 1 = Zahl).
    Coding of the source Manipulated Coin requires a mean of 0.815 bit per event (see below how to do the code assignment).

  • Entropy

    Exercise: calculate the information content and entropy for the source Würfel (a fair die), assuming the same probability for all events:
    P(j) = 1/6

    $i(\text{"1"}) = \ldots = i(\text{"6"}) = \log_2 \frac{1}{1/6} = 2.58$

    $H(\text{"Würfel"}) = \sum_{j=1}^{6} \tfrac{1}{6} \cdot 2.58 = 2.58$

  • Entropy

    Exercise: calculate the information content and entropy for the source gezinkter Würfel (a loaded die), assuming the following probabilities:
    P(1) = P(2) = P(3) = P(4) = 1/8; P(5) = P(6) = 2/8 = 1/4

    $i(\text{"1"}) = \ldots = i(\text{"4"}) = \log_2 \frac{1}{1/8} = 3$

    $i(\text{"5"}) = i(\text{"6"}) = \log_2 \frac{1}{1/4} = 2$

  • Entropy

    $i(\text{"1"}) = \ldots = i(\text{"4"}) = \log_2 \frac{1}{1/8} = 3$

    $i(\text{"5"}) = i(\text{"6"}) = \log_2 \frac{1}{1/4} = 2$

    $H(\text{"gezinkter Würfel"}) = \sum_{j=1}^{4} \tfrac{1}{8} \cdot 3 + \sum_{j=5}^{6} \tfrac{1}{4} \cdot 2 = 1.5 + 1 = 2.5$

  • Entropy: Würfel

    (Figure: for the fair die, P(j) = 1/6 and i(j) = 2.58 for all j = 1 ... 6; H(J) = 2.58 = log2 6.)

    $i(\text{"1"}) = \ldots = i(\text{"6"}) = \log_2 6 = 2.58$

    $H(J) = \sum_{j=1}^{6} \tfrac{1}{6} \cdot 2.58 = 2.58$

    $i(j) = \log_2 \frac{1}{P(j)} \qquad H(J) = \sum_{j=1}^{J} P(j)\, i(j)$

  • Decision Content

    Uniformly distributed source: all events have the same probability.

    Maximum entropy of a source with N events: decision content
    $H_0 = \log_2 N = \max(H)$

    Equal to the information content of every possible event.
    None of the events is favoured.
    The source is not predictable, completely random.
    No redundancy.

    (Figure: P(j) = 1/N and i(j) = log2 N for all j = 1 ... N; H(J) = log2 N = H0.)

  • Entropy of a Source with a Known Event

    (Figure: die with a known outcome, P(6) = 1 and P(1) = ... = P(5) = 0, with the corresponding i(j).)

    $i(\text{"1"}) = \ldots = i(\text{"5"}) = \log_2 \frac{1}{0} \rightarrow \infty$

    $i(\text{"6"}) = \log_2 \frac{1}{1} = 0$

    $H(J) = \sum_{j=1}^{5} 0 \cdot i(j) + 1 \cdot 0 = 0 = \min(H)$ (using the convention $0 \cdot \log_2 \frac{1}{0} = 0$)

    The source does not transmit information: H(J) = 0.

  • Entropy: gezinkter Würfel

    P(1) = P(2) = P(3) = P(4) = 1/8; P(5) = P(6) = 1/4

    (Figure: P(j) and i(j) over j = 1 ... 6; i(j) = 3 for j = 1 ... 4 and i(j) = 2 for j = 5, 6; H(J) = 2.5.)

    $i(\text{"1"}) = \ldots = i(\text{"4"}) = \log_2 \frac{1}{1/8} = 3 \qquad i(\text{"5"}) = i(\text{"6"}) = \log_2 \frac{1}{1/4} = 2$

    $H(\text{"gezinkter Würfel"}) = \sum_{j=1}^{4} \tfrac{1}{8} \cdot 3 + \sum_{j=5}^{6} \tfrac{1}{4} \cdot 2 = 1.5 + 1 = 2.5$

  • Redundancy

    Non-uniformly distributed source: events with different probabilities.

    Entropy smaller than the decision content: $0 \le H(J) \le H_0 = \log_2 N$
    The source contains redundancy, it is predictable: $R = H_0 - H(J)$

    (Figure: P(j) and i(j) over j = 1 ... N for a non-uniform source; 0 <= H(J) <= H0.)
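    The redundancy R = H0 - H(J) can be computed the same way; a sketch (the variable names are mine) for the loaded die from the earlier exercise:

        from math import log2

        def entropy(probabilities):
            return sum(p * log2(1 / p) for p in probabilities if p > 0)

        loaded_die = [1/8, 1/8, 1/8, 1/8, 1/4, 1/4]
        H0 = log2(len(loaded_die))   # decision content: log2(6) = 2.585 bit
        H = entropy(loaded_die)      # 2.5 bit
        R = H0 - H                   # redundancy: ~0.085 bit per event
        print(H0, H, R)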

  • Digital Representation

    Continuous signals are analogue; the representation (e.g. a voltage) follows the physical phenomenon (sound, light).

    Goal: digital representation, i.e. representation via a sequence of symbols.
    Mostly binary symbols (bits) with 2 possible values (states), e.g. 0 and 1, A and B.
    Physical representation of this information about states then e.g. by 2 voltage values.

    (Figure: voltage U(t) over time t switching on/off to represent the bit sequence 1 0 1 1 0.)

  • Digital Representation

    Not the signal itself but a different description of it is represented: the information.

    High quality at low cost.
    Error-free and noise-free transmission, storage and copying are possible.
    Digital manipulation (editing) of the signals is possible, in software.
    Integration into multimedia is possible, new formats of content.

  • Digital Representation

    Coding: assignment of information to a symbol.

    Kopf = 1, Zahl = 0

    Source alphabet (discrete) -> codebook:
    rot (red) = 00, gelb (yellow) = 01, grün (green) = 10, weiß (white) = 11

    01 00 10 01 = gelb rot grün gelb

  • Digital Representation

    A codebook with N bit can represent 2^N code words.

    N = 1: code words 1, 0 => 2 source events can be represented (source alphabet Kopf, Zahl).

    N = 2: code words 00, 01, 10, 11 => 4 source events can be represented (e.g. the four colours above).

  • Fixed Length Coding

    Source alphabet (discrete) -> codebook: a source with M events requires a codebook with N >= log2 M bit.

    Würfel: M = 6 -> N >= log2 6 = 2.58 bit -> N = 3 bit
    e.g. 000 = 1, 001 = 2, 010 = 3, 011 = 4, 100 = 5, 101 = 6
    Unused code words: 110, 111

    German alphabet: M = 26 -> N >= log2 26 = 4.7 bit -> N = 5 bit
    e.g. 00000 = a, 00001 = b, ...
    Possible code words with 5 bit: 2^5 = 32 -> 6 code words unused
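    The required fixed code length is N = ceil(log2 M); a short sketch (the function name is mine) reproducing the two examples above, and equally applicable to the exercises that follow:

        from math import ceil, log2

        def fixed_length_bits(m):
            """Bits per code word for a fixed-length binary code over M source events."""
            return ceil(log2(m))

        for m in (6, 26):                 # die, German alphabet
            n = fixed_length_bits(m)
            print(m, n, 2**n - m)         # events, bits per code word, unused code words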

  • Fixed Length Coding

    Exercise: create a binary fixed-length code for the teams in the Fussball Bundesliga (18 teams).
    Which code length is required? How many code words are not used?

    M = 18 => N >= log2 18 bit = 4.17 bit => N = 5 bit
    E.g. 00000 = Hertha BSC, 00001 = Bayern Muenchen, ...
    Possible code words with 5 bit: 2^5 = 32 -> 14 code words unused

  • Fixed Length Coding

    Exercise: create a binary fixed-length code for international airport codes (e.g. TXL, SFO), assuming 26 letters.
    How many elements does the source alphabet contain? Which code length is required? How many code words are not used?

    M = 26 * 26 * 26 = 26^3 = 17576 => N >= log2 17576 bit = 14.10 bit => N = 15 bit
    Possible code words with 15 bit: 2^15 = 32768 -> 15192 code words unused

  • Reminder: Information Content

    Manipulated coin: probability of Kopf: P(j) = 1/4; probability of Zahl: P(j) = 3/4.

    The event Kopf is more surprising -> higher IC:

    $i(\text{"Kopf"}) = \log_2 \frac{1}{1/4} = 2$

    $i(\text{"Zahl"}) = \log_2 \frac{1}{3/4} = 0.42$

  • Reminder: Entropy

    Entropy = mean information content of a source.
    Determines how many bits per symbol are necessary to encode a source binary (theoretical bound).

    $H(J) = \sum_{j=1}^{J} P(j)\, i(j)$

    $H(\text{"Coin"}) = \tfrac{1}{2} \cdot 1 + \tfrac{1}{2} \cdot 1 = 1$

    $H(\text{"Manipulated Coin"}) = \tfrac{3}{4} \cdot 0.42 + \tfrac{1}{4} \cdot 2 = 0.815$

  • Combined Events

    Define a combined event: double manipulated coin.

    Source: KK, KZ, ZK, ZZ

    Probabilities:
    P(KK) = 1/4 * 1/4 = 1/16
    P(KZ) = 1/4 * 3/4 = 3/16
    P(ZK) = 3/4 * 1/4 = 3/16
    P(ZZ) = 3/4 * 3/4 = 9/16

  • Variable Length Codes

    Variable length code (VLC), arbitrarily defined (not a Huffman code!):
    1 = ZZ, 000 = ZK, 001 = KZ, 010 = KK

    Mean bit length per combined event (sum of [probability * number of bits for event j]):

    $\bar{N}(J) = \sum_{j=1}^{J} P(j)\, N(j) = \tfrac{9}{16} \cdot 1 + \tfrac{3}{16} \cdot 3 + \tfrac{3}{16} \cdot 3 + \tfrac{1}{16} \cdot 3 = 1.875$

  • Variable Length Codes

    Mean number of bits per combined event: 1.875 bit.
    I.e. coding of one event requires a mean of 1.875 / 2 = 0.9375 bit.

    Coding as single events requires 1 bit per event.

    Usage of combined events and variable length codes adapted to the probabilities allows an approximation to the theoretical bound given by the entropy of 0.815 bit.
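    The 1.875 and 0.9375 bit figures can be reproduced directly from the code table; a small sketch (the dictionaries are mine, the code table is the slide's):

        from math import log2

        probs = {"KK": 1/16, "KZ": 3/16, "ZK": 3/16, "ZZ": 9/16}        # double manipulated coin
        vlc = {"KK": "010", "KZ": "001", "ZK": "000", "ZZ": "1"}        # arbitrary VLC from the slide

        mean_length = sum(p * len(vlc[ev]) for ev, p in probs.items())  # 1.875 bit per combined event
        per_coin = mean_length / 2                                      # 0.9375 bit per single event
        bound = sum(p * log2(1 / p) for p in (3/4, 1/4))                # entropy bound: ~0.81 bit (0.815 on the slides)
        print(mean_length, per_coin, bound)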

  • Combined Events

    Define a combined event: triple manipulated coin.

    Source: KKK, KKZ, KZK, KZZ, ZKK, ZKZ, ZZK, ZZZ

    2^3 = 8 events

  • Combined Events

    Probabilities:
    P(KKK) = 1/4 * 1/4 * 1/4 = 1/64
    P(KKZ) = 1/4 * 1/4 * 3/4 = 3/64
    P(KZK) = 1/4 * 3/4 * 1/4 = 3/64
    P(KZZ) = 1/4 * 3/4 * 3/4 = 9/64
    P(ZKK) = 3/4 * 1/4 * 1/4 = 3/64
    P(ZKZ) = 3/4 * 1/4 * 3/4 = 9/64
    P(ZZK) = 3/4 * 3/4 * 1/4 = 9/64
    P(ZZZ) = 3/4 * 3/4 * 3/4 = 27/64

  • Variable Length Codes

    Variable length code (Huffman):
    KKK = 00000, KKZ = 00001, KZK = 00010, KZZ = 010, ZKK = 00011, ZKZ = 011, ZZK = 001, ZZZ = 1

  • Variable Length Codes

    Mean bit length per combined event (sum of [probability * number of bits for event j]):

    $\bar{N}(J) = \sum_{j=1}^{J} P(j)\, N(j) = \tfrac{1}{64} \cdot 5 + \tfrac{3}{64} \cdot 5 + \tfrac{3}{64} \cdot 5 + \tfrac{9}{64} \cdot 3 + \tfrac{3}{64} \cdot 5 + \tfrac{9}{64} \cdot 3 + \tfrac{9}{64} \cdot 3 + \tfrac{27}{64} \cdot 1 = 2.469$

  • Variable Length Codes

    Mean number of bits per combined event: 2.469 bit.
    I.e. coding of one event requires a mean of 2.469 / 3 = 0.823 bit.

    Coding as single events requires 1 bit per event.

    Usage of combined events and variable length codes adapted to the probabilities allows an approximation to the theoretical bound given by the entropy of 0.815 bit.

  • Variable Length Codes

    Definition of longer combined events and adequate VLCs allows an even better approximation of the entropy.

    In theory: for combined events of infinite length the entropy is reached for every source.
    But: infinite processing time (delay).

    In practice: optimisation of efficiency and delay for the given application.

  • Shannon's Theorem

    The entropy of a source determines the minimum bit rate required for error-free transmission of the source symbols.

    It can be achieved by block codes and appropriate variable length coding.

  • Entropy Coding

    Purpose: lossless coding of a discrete source A = {a1, a2, ..., aJ} using a codebook C = {c1, c2, ..., cJ}.

    Goal: compression exploiting the redundancy of the source.

    Theoretical bound: the entropy of the source determines the minimum mean number of bits per symbol required for binary coding of the source A:

    $H(\mathbf{A}) = \sum_{j=1}^{J} P(a_j)\, i(a_j)$

  • Entropy Coding

    Instruments: variable length coding, block codes (for combined events).

    Code words of different bit lengths are assigned according to the probabilities of the source events.
    Events with high probability get short code words; unlikely events get longer code words.
    On average this results in a reduction of the overall bit rate.

  • Code Rate

    N(a_j): bit length of the code word for a_j.
    Rate = average bit length of the code words of a code:

    $R = \sum_{j=1}^{J} P(a_j)\, N(a_j) \ge H(\mathbf{A}) = \sum_{j=1}^{J} P(a_j)\, i(a_j)$

    The code must be decodable unambiguously:
    Separators between code words increase the rate.
    Prefix-free codes: no code word may be the beginning of any other code word.
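    Whether a code is prefix-free can be checked mechanically; a small sketch (the check itself is my addition, not from the slides):

        def is_prefix_free(codewords):
            """True if no code word is the beginning of another code word."""
            words = sorted(codewords)
            # after sorting, a prefix relation can only occur between neighbours
            return all(not b.startswith(a) for a, b in zip(words, words[1:]))

        print(is_prefix_free(["1", "000", "001", "010"]))   # VLC of the double coin: True
        print(is_prefix_free(["0", "01", "11"]))            # "0" is a prefix of "01": False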

  • Code Tree

    Prefix-free codes can be represented by a code tree, with a root ("Wurzel"), nodes ("Knoten"), branches ("Zweige") and leaves ("Blätter"); each branch is labelled 0 or 1, and each leaf corresponds to a code word.

    (Figure: example code tree with the leaves "00", "100", "011", "111", "0100", "0101", "1010", "1011", "1100", "1101".)

  • Huffman Codes

    (Figure: Huffman tree construction for a source with eight events; the intermediate node probabilities are 0.03, 0.06, 0.13, 0.27, 0.43 and 0.57.)

    Probabilities and resulting code words:
    p(7) = 0.29 -> "11"
    p(6) = 0.28 -> "10"
    p(5) = 0.16 -> "01"
    p(4) = 0.14 -> "001"
    p(3) = 0.07 -> "0001"
    p(2) = 0.03 -> "00001"
    p(1) = 0.02 -> "000001"
    p(0) = 0.01 -> "000000"

  • Huffman Codes

    Algorithm of Huffman coding:
    1. Sort the events by probability.
    2. Combine the two least probable branches, assign 0 and 1.
    3. Create a new combined branch with the sum of the probabilities.
    4. Combine again the two least probable branches.
    5. Continue until the root is reached.

    A code tree is created; the code words result from reading from the root to the leaves.
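    The algorithm above can be written compactly with a priority queue; a sketch (not the slides' implementation; the tie-breaking counter is an implementation detail I added, and ties may yield a different but equally optimal code assignment) run on the probabilities of the preceding example:

        import heapq
        from itertools import count

        def huffman_code(probabilities):
            """Build a binary Huffman code by repeatedly merging the two least probable branches."""
            tiebreak = count()
            heap = [(p, next(tiebreak), {symbol: ""}) for symbol, p in probabilities.items()]
            heapq.heapify(heap)
            while len(heap) > 1:
                p0, _, branch0 = heapq.heappop(heap)    # least probable branch gets a leading "0"
                p1, _, branch1 = heapq.heappop(heap)    # next least probable branch gets a leading "1"
                merged = {s: "0" + c for s, c in branch0.items()}
                merged.update({s: "1" + c for s, c in branch1.items()})
                heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
            return heap[0][2]

        probs = {7: 0.29, 6: 0.28, 5: 0.16, 4: 0.14, 3: 0.07, 2: 0.03, 1: 0.02, 0: 0.01}
        code = huffman_code(probs)
        rate = sum(probs[s] * len(code[s]) for s in probs)   # 2.49 bit per symbol, vs. entropy 2.45
        print(code, rate)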

  • Huffman Codes

    In this example:

    Entropy: $H(\mathbf{A}) = 2.45$

    Fixed length coding: M = 8 => N = log2 8 = 3 bit

    Rate of the Huffman code: $R = \sum_{j=1}^{J} P(a_j)\, N(a_j) = 2.49$