ENTROPY (Notes by Henri Mertens)

Source: luc.devroye.org/HenriMertens-Entropy.pdf


Page 1

ENTROPY

(Notes by Henri Mertens)

Page 2

ENTROPY: THE BASICS OF INFORMATION THEORY

Shannon's theory from 1948.

Shannon's view

Lower bounds for compression

Entropy $E = \sum_i p_i \log_2 \frac{1}{p_i}$

The global view

Back to prefix codes

Lempel-Ziv compression

Page 3

Shannon's view

[Sketch: input file → compressor → compressed binary file.]

Each input file is represented, after (lossless) compression, as a path in a trie.

Expected length of the compressed file $= \sum_i p_i \ell_i$, where $p_i$ is the probability of seeing input file $i$ and $\ell_i$ is the length of its compressed version.

[Sketch: input files 1, 2, 3, 4 with their probabilities.]

Page 4

So, the best compression method, given the $p_i$'s, is the Huffman code.

BUT... the tree is too large! And the $p_i$'s are often not known precisely.

Nevertheless, we know a lot about the best compression method: Shannon's theorem.

$$E \;\le\; \min_{\text{all binary trees}} \sum_i p_i \ell_i \;\le\; E + 1,$$

where $E = \sum_i p_i \log_2 \frac{1}{p_i}$ is the (binary) entropy.
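A minimal Python sketch (my addition, not part of the notes) of the Huffman construction, to check Shannon's bounds numerically; the function name huffman_lengths and the example probabilities are illustrative choices.

    import heapq
    import math

    def huffman_lengths(p):
        """Codeword length of each symbol under a Huffman code for probabilities p."""
        # Heap entries: (probability, tie-breaker, indices of the symbols in this subtree).
        heap = [(pi, i, [i]) for i, pi in enumerate(p)]
        heapq.heapify(heap)
        lengths = [0] * len(p)
        counter = len(p)
        while len(heap) > 1:
            p1, _, s1 = heapq.heappop(heap)
            p2, _, s2 = heapq.heappop(heap)
            for i in s1 + s2:          # each merge adds one bit to every codeword below it
                lengths[i] += 1
            heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
            counter += 1
        return lengths

    p = [0.4, 0.2, 0.2, 0.1, 0.1]
    l = huffman_lengths(p)
    E = sum(pi * math.log2(1 / pi) for pi in p)      # binary entropy
    print(E, sum(pi * li for pi, li in zip(p, l)))   # E <= expected length <= E + 1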

Page 5

KRAFT'S INEQUALITY

Let $\ell_i$ be the depths of the leaves in a binary tree. Then

$$\sum_i 2^{-\ell_i} \le 1.$$

Proof: by induction (exercise).

[Sketch: a small binary tree with its leaf depths.]
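A one-function check (my addition) of Kraft's inequality for a list of leaf depths; the example depths are illustrative.

    def kraft_sum(depths):
        """Sum of 2^(-l) over the leaf depths l; at most 1 for any binary tree."""
        return sum(2.0 ** -l for l in depths)

    print(kraft_sum([1, 2, 3, 3]))   # 1.0   (complete tree: equality)
    print(kraft_sum([2, 2, 3]))      # 0.625 (one leaf position left unused)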

Page 6

PROOF OF SHANNON'S THEOREM (LOWER BOUND)

$$\sum_i p_i \ell_i - E = \sum_i p_i \log_2 2^{\ell_i} - \sum_i p_i \log_2 \frac{1}{p_i} = \sum_i p_i \log_2\!\big(2^{\ell_i} p_i\big) \;\ge\; \log_2 e \sum_i p_i \Big(1 - \frac{1}{2^{\ell_i} p_i}\Big) = \log_2 e \Big(1 - \sum_i 2^{-\ell_i}\Big) \;\ge\; 0,$$

using the inequality $\ln x \ge 1 - \frac{1}{x}$ (equivalently, $\ln x \le x - 1$) and Kraft's inequality. Hence $\sum_i p_i \ell_i \ge E$.

[Sketch: the graph of $\ln x$ lies below the line $y = x - 1$, touching it at $x = 1$.]
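A quick sanity check (my numbers, not the notes'): when every $p_i$ is a power of $1/2$ the bound is tight. For $p = (1/2, 1/4, 1/4)$ and $\ell = (1, 2, 2)$, $\sum_i p_i \ell_i = \frac12 + \frac12 + \frac12 = \frac32$, and $E = \frac12\log_2 2 + \frac14\log_2 4 + \frac14\log_2 4 = \frac32$ as well, so the lower bound holds with equality.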

Page 7

(UPPER BOUND)

By the converse of Kraft's inequality: if $\sum_i 2^{-\ell_i} \le 1$, then there exists a binary tree with leaves at these depths (exercise).

So, given the $p_i$, set

$$\ell_i = \Big\lceil \log_2 \frac{1}{p_i} \Big\rceil.$$

Then $\sum_i 2^{-\ell_i} \le \sum_i p_i = 1$.

So, we can use these $\ell_i$ to make a tree. Let that tree define the code (called the Shannon-Fano code). The expected length is

$$\sum_i p_i \ell_i = \sum_i p_i \Big\lceil \log_2 \frac{1}{p_i} \Big\rceil \le \sum_i p_i \log_2 \frac{1}{p_i} + \sum_i p_i = E + 1.$$

So, there is a code with expected length $\le E + 1$.
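A short Python sketch (my addition) of the Shannon-Fano choice of lengths, confirming that they satisfy Kraft and land within one bit of the entropy; the probabilities are illustrative.

    import math

    p = [0.5, 0.3, 0.15, 0.05]
    l = [math.ceil(math.log2(1 / pi)) for pi in p]   # Shannon-Fano lengths
    E = sum(pi * math.log2(1 / pi) for pi in p)

    print(l)                                         # [1, 2, 3, 5]
    print(sum(2.0 ** -li for li in l))               # Kraft sum <= 1
    print(E, sum(pi * li for pi, li in zip(p, l)))   # E <= expected length <= E + 1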

Page 8

The global view

If the input consists of $M$ independent symbols (an unrealistic assumption) from an alphabet $A$, with symbol probabilities $p_i$, then each symbol can be coded via Huffman, and we have a total expected length

$$\le M \times (E + 1), \qquad E = \text{entropy of one symbol}.$$

Lower bound: $\ge M \times E$ (entropy of the file = sum of the entropies of the symbols).

So... it helps to group the symbols in groups of $k$ ($k$ a small number) and Huffman-code each group (the per-symbol bound is worked out below).

Or: one could use Lempel-Ziv compression (see later).
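Filling in the arithmetic behind the grouping remark (my addition): a group of $k$ independent symbols has entropy $kE$, so Huffman-coding whole groups costs at most $kE + 1$ expected bits per group, i.e. at most $E + 1/k$ bits per symbol; the per-symbol overhead drops from $1$ to $1/k$.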

Page 9

Solution I: Huffman coding on groups of characters

e.g.: group the letters in sets of 3: (abc), (bca), ...

Get the $p_i$ by counting occurrences in a file.

Construct the Huffman code.

Code + decode as for prefix codes.
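A small sketch (my addition) of the counting step: estimate the group probabilities $p_i$ from a file by counting how often each $k$-letter group occurs; these frequencies would then feed the Huffman construction. The example text and the function name are illustrative.

    from collections import Counter

    def group_counts(text, k=3):
        """Split text into consecutive groups of k characters and count them."""
        groups = [text[i:i + k] for i in range(0, len(text) - k + 1, k)]
        return Counter(groups)

    counts = group_counts("abcbcaabcabc", k=3)
    total = sum(counts.values())
    p = {g: c / total for g, c in counts.items()}    # empirical probabilities
    print(p)                                         # {'abc': 0.75, 'bca': 0.25}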

Page 10

RECALL

To decode: start at the root and descend to a leaf. Decode. Repeat.

[Worked example in the original notes: a compressed bit sequence and the symbols it decodes to.]

Time to decode = length of the compressed sequence.
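For instance (my own small example, since the one in the scan is illegible): with the code $a = 0$, $b = 10$, $c = 11$, the compressed sequence 0 10 11 0 10 decodes to a b c a b; each bit moves the pointer one step down the tree, so the number of steps equals the number of bits.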

Page 11

The decoder: given the compressed sequence $s$, and the binary tree with the code: its root is $t$; its leaves contain symbols of the alphabet $A$.

    x ← t                               (traveling pointer in the tree)
    while |s| > 0:
        b ← get next bit from s
        if b = 0 then x ← left[x] else x ← right[x]
        if x is a leaf then output key[x]; x ← t
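A runnable version of the decoder loop (my addition); the nested-tuple tree representation and the toy code a = 0, b = 10, c = 11 are assumptions for illustration.

    def decode(bits, tree):
        """Decode a bit string; leaves are symbols, internal nodes are (left, right) pairs."""
        out = []
        x = tree                            # traveling pointer, starts at the root
        for b in bits:
            x = x[0] if b == "0" else x[1]  # 0: go left, 1: go right
            if not isinstance(x, tuple):    # reached a leaf
                out.append(x)
                x = tree                    # back to the root
        return "".join(out)

    tree = ("a", ("b", "c"))                # a = 0, b = 10, c = 11
    print(decode("010110", tree))           # -> "abca"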

Page 12

LEMPEL-ZIV COMPRESSION (Lempel and Ziv, 1977) (Solution II)

Examples: zip, gif, most compression methods.

Feature: under generous assumptions on the input file, the expected length of the compressed sequence is close to E.

Method: parse the input into the smallest pieces never seen before.

INPUT: [example string over the alphabet {a, b, c}, garbled in the scan]
PIECE #: 0 1 2 3 4 5 6 7 8 9 10 11 12

Each piece is a previously seen piece followed by one new symbol; for every piece we record the number of that earlier piece and the last symbol of the piece.
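A minimal sketch (my addition) of the parsing rule: cut the input into the smallest pieces never seen before and emit one (pointer, symbol) pair per piece, as described on the next page. The dictionary-based bookkeeping and the example string are my own conventions, not the notes'.

    def lz_parse(text):
        seen = {"": 0}                 # piece 0: the empty piece
        out = []                       # one (pointer to earlier piece, new symbol) per piece
        piece = ""
        for ch in text:
            if piece + ch in seen:
                piece += ch                        # still a piece we have already seen
            else:
                out.append((seen[piece], ch))      # new piece = old piece + one symbol
                seen[piece + ch] = len(seen)
                piece = ""
        if piece:                                  # trailing prefix of an already-seen piece
            out.append((seen[piece[:-1]], piece[-1]))
        return out

    print(lz_parse("aaababcbba"))
    # [(0, 'a'), (1, 'a'), (0, 'b'), (1, 'b'), (0, 'c'), (3, 'b'), (0, 'a')]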

Page 13

THE BINARY SEQUENCE

For the k-th piece we record two things: an integer in {0, ..., k-1} (the pointer to an earlier piece) and a symbol from the alphabet.

The pointer needs $\lceil \log_2 k \rceil$ bits:

    piece #        1 2 3 4 5 6 7 8 9 10 11 12
    pointer bits   0 1 2 2 3 3 3 3 4 4  4  4

The symbol needs a fixed number of bits: $\lceil \log_2 |A| \rceil$.

In the output, all bits are clearly identified (the decoder knows how many bits to read for each piece, so no separators are needed).
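For example (my arithmetic, not in the notes): with alphabet $\{a, b, c\}$, so $\lceil\log_2 |A|\rceil = 2$ bits per symbol, an input that parses into 12 pieces is encoded in $\sum_{k=1}^{12}\lceil\log_2 k\rceil + 12\cdot 2 = 33 + 24 = 57$ bits.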

Page 14

DATA STRUCTURE FOR CODING / DECODING: THE DIGITAL SEARCH TREE

(an "$|A|$-ary trie")

INPUT: [the same example string as on the parsing page, garbled in the scan]
PIECE #: 0 1 2 3 4 5 6 7 8 9 10 11 12

[Figure: the digital search tree built from the pieces; each node is a piece, and each edge is labelled with a symbol of $A$.]

Page 15

In the parsing phase: start at the root and descend to a leaf; add a symbol (and a new leaf) to add a piece.

Exercise: if the input is of size $n$, write the parsing algorithm that produces (a) the tree and (b) the sequence (0a)(1a)(0b)..., and show that it takes time $O(n)$.

Page 16

In the decoding phase: keep a table of pointers.

[Table in the original notes: for each piece number, the pointer to an earlier piece and the last symbol of the piece.]

Decode a piece by expanding its (pointer, symbol) pair, then the pair that the pointer refers to, and so on back to piece 0 (in the notes' example, piece 10 is decoded starting from (6, a)).

Exercise: write an $O(n)$ algorithm for decoding.
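A sketch (my addition) of the decoder: rebuild every piece from the table of (pointer, symbol) pairs, such as the ones produced by lz_parse above. Storing each piece as a string keeps the sketch short; the total work is proportional to the length of the decoded output.

    def lz_decode(pairs):
        pieces = [""]                          # piece 0: the empty piece
        out = []
        for pointer, symbol in pairs:
            piece = pieces[pointer] + symbol   # earlier piece + one new symbol
            pieces.append(piece)
            out.append(piece)
        return "".join(out)

    pairs = [(0, "a"), (1, "a"), (0, "b"), (1, "b"), (0, "c"), (3, "b"), (0, "a")]
    print(lz_decode(pairs))                    # -> "aaababcbba"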