ENTROPY (Notes by Henri Mertens)

Source: luc.devroye.org/HenriMertens-Entropy.pdf


Page 1

ENTROPY

(Notes by Henri Mertens)

Page 2

ENTROPY: THE BASICS OF INFORMATION THEORY

Shannon's theory from 1948.

Shannon's view

Lower bounds for compression

Entropy $E = \sum_i p_i \log_2 \frac{1}{p_i}$

The global view

Back to prefix codes

Lempel-Ziv compression

Page 3

Shannon's view

[Sketch: input file → compressor → compressed binary file.]

Each input file is represented, after (lossless) compression, as a path in a trie.

Expected length of the compressed file $= \sum_i p_i \ell_i$, where $p_i$ is the probability of seeing input file $i$ and $\ell_i$ is the length of its compressed version.

[Sketch: input files 1, 2, 3, 4 with their probabilities.]

Page 4

So, the best compression method, given the $p_i$'s, is the Huffman code.

BUT... the tree is too large! And the $p_i$'s are often not known precisely.

Nevertheless, we know a lot about the best compression method: Shannon's theorem.

$$E \;\le\; \min_{\text{all binary trees}} \sum_i p_i \ell_i \;\le\; E + 1,$$

where $E = \sum_i p_i \log_2 \frac{1}{p_i}$ is the (binary) entropy.
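A minimal Python sketch (my addition, not part of the notes) of the Huffman construction, to check Shannon's bounds numerically; the function name huffman_lengths and the example probabilities are illustrative choices.

    import heapq
    import math

    def huffman_lengths(p):
        """Codeword length of each symbol under a Huffman code for probabilities p."""
        # Heap entries: (probability, tie-breaker, indices of the symbols in this subtree).
        heap = [(pi, i, [i]) for i, pi in enumerate(p)]
        heapq.heapify(heap)
        lengths = [0] * len(p)
        counter = len(p)
        while len(heap) > 1:
            p1, _, s1 = heapq.heappop(heap)
            p2, _, s2 = heapq.heappop(heap)
            for i in s1 + s2:          # each merge adds one bit to every codeword below it
                lengths[i] += 1
            heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
            counter += 1
        return lengths

    p = [0.4, 0.2, 0.2, 0.1, 0.1]
    l = huffman_lengths(p)
    E = sum(pi * math.log2(1 / pi) for pi in p)      # binary entropy
    print(E, sum(pi * li for pi, li in zip(p, l)))   # E <= expected length <= E + 1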

Page 5

KRAFT'S INEQUALITY

Let $\ell_i$ be the depths of the leaves in a binary tree. Then

$$\sum_i 2^{-\ell_i} \le 1.$$

Proof: by induction (exercise).

[Sketch: a small binary tree with its leaf depths.]
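A one-function check (my addition) of Kraft's inequality for a list of leaf depths; the example depths are illustrative.

    def kraft_sum(depths):
        """Sum of 2^(-l) over the leaf depths l; at most 1 for any binary tree."""
        return sum(2.0 ** -l for l in depths)

    print(kraft_sum([1, 2, 3, 3]))   # 1.0   (complete tree: equality)
    print(kraft_sum([2, 2, 3]))      # 0.625 (one leaf position left unused)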

Page 6

PROOF OF SHANNON'S THEOREM (LOWER BOUND)

$$\sum_i p_i \ell_i - E = \sum_i p_i \log_2 2^{\ell_i} - \sum_i p_i \log_2 \frac{1}{p_i} = \sum_i p_i \log_2\!\big(2^{\ell_i} p_i\big) \;\ge\; \log_2 e \sum_i p_i \Big(1 - \frac{1}{2^{\ell_i} p_i}\Big) = \log_2 e \Big(1 - \sum_i 2^{-\ell_i}\Big) \;\ge\; 0,$$

using the inequality $\ln x \ge 1 - \frac{1}{x}$ (equivalently, $\ln x \le x - 1$) and Kraft's inequality. Hence $\sum_i p_i \ell_i \ge E$.

[Sketch: the graph of $\ln x$ lies below the line $y = x - 1$, touching it at $x = 1$.]
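A quick sanity check (my numbers, not the notes'): when every $p_i$ is a power of $1/2$ the bound is tight. For $p = (1/2, 1/4, 1/4)$ and $\ell = (1, 2, 2)$, $\sum_i p_i \ell_i = \frac12 + \frac12 + \frac12 = \frac32$, and $E = \frac12\log_2 2 + \frac14\log_2 4 + \frac14\log_2 4 = \frac32$ as well, so the lower bound holds with equality.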

Page 7

(UPPER BOUND)

By the converse of Kraft's inequality: if $\sum_i 2^{-\ell_i} \le 1$, then there exists a binary tree with leaves at these depths (exercise).

So, given the $p_i$, set

$$\ell_i = \Big\lceil \log_2 \frac{1}{p_i} \Big\rceil.$$

Then $\sum_i 2^{-\ell_i} \le \sum_i p_i = 1$.

So, we can use these $\ell_i$ to make a tree. Let that tree define the code (called the Shannon-Fano code). The expected length is

$$\sum_i p_i \ell_i = \sum_i p_i \Big\lceil \log_2 \frac{1}{p_i} \Big\rceil \le \sum_i p_i \log_2 \frac{1}{p_i} + \sum_i p_i = E + 1.$$

So, there is a code with expected length $\le E + 1$.
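A short Python sketch (my addition) of the Shannon-Fano choice of lengths, confirming that they satisfy Kraft and land within one bit of the entropy; the probabilities are illustrative.

    import math

    p = [0.5, 0.3, 0.15, 0.05]
    l = [math.ceil(math.log2(1 / pi)) for pi in p]   # Shannon-Fano lengths
    E = sum(pi * math.log2(1 / pi) for pi in p)

    print(l)                                         # [1, 2, 3, 5]
    print(sum(2.0 ** -li for li in l))               # Kraft sum <= 1
    print(E, sum(pi * li for pi, li in zip(p, l)))   # E <= expected length <= E + 1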

Page 8

The global view

If the input consists of $M$ independent symbols (an unrealistic assumption) from an alphabet $A$, with symbol probabilities $p_i$, then each symbol can be coded via Huffman, and we have a total expected length

$$\le M \times (E + 1), \qquad E = \text{entropy of one symbol}.$$

Lower bound: $\ge M \times E$ (entropy of the file = sum of the entropies of the symbols).

So... it helps to group the symbols in groups of $k$ ($k$ a small number) and Huffman-code each group (the per-symbol bound is worked out below).

Or: one could use Lempel-Ziv compression (see later).
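Filling in the arithmetic behind the grouping remark (my addition): a group of $k$ independent symbols has entropy $kE$, so Huffman-coding whole groups costs at most $kE + 1$ expected bits per group, i.e. at most $E + 1/k$ bits per symbol; the per-symbol overhead drops from $1$ to $1/k$.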

Page 9

Solution I: Huffman coding on groups of characters

e.g.: group the letters in sets of 3: (abc), (bca), ...

Get the $p_i$ by counting occurrences in a file.

Construct the Huffman code.

Code + decode as for prefix codes.
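A small sketch (my addition) of the counting step: estimate the group probabilities $p_i$ from a file by counting how often each $k$-letter group occurs; these frequencies would then feed the Huffman construction. The example text and the function name are illustrative.

    from collections import Counter

    def group_counts(text, k=3):
        """Split text into consecutive groups of k characters and count them."""
        groups = [text[i:i + k] for i in range(0, len(text) - k + 1, k)]
        return Counter(groups)

    counts = group_counts("abcbcaabcabc", k=3)
    total = sum(counts.values())
    p = {g: c / total for g, c in counts.items()}    # empirical probabilities
    print(p)                                         # {'abc': 0.75, 'bca': 0.25}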

Page 10

RECALL

To decode: start at the root and descend to a leaf. Decode. Repeat.

[Worked example in the original notes: a compressed bit sequence and the symbols it decodes to.]

Time to decode = length of the compressed sequence.
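For instance (my own small example, since the one in the scan is illegible): with the code $a = 0$, $b = 10$, $c = 11$, the compressed sequence 0 10 11 0 10 decodes to a b c a b; each bit moves the pointer one step down the tree, so the number of steps equals the number of bits.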

Page 11

The decoder: given the compressed sequence $s$, and the binary tree with the code: its root is $t$; its leaves contain symbols of the alphabet $A$.

    x ← t                               (traveling pointer in the tree)
    while |s| > 0:
        b ← get next bit from s
        if b = 0 then x ← left[x] else x ← right[x]
        if x is a leaf then output key[x]; x ← t
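A runnable version of the decoder loop (my addition); the nested-tuple tree representation and the toy code a = 0, b = 10, c = 11 are assumptions for illustration.

    def decode(bits, tree):
        """Decode a bit string; leaves are symbols, internal nodes are (left, right) pairs."""
        out = []
        x = tree                            # traveling pointer, starts at the root
        for b in bits:
            x = x[0] if b == "0" else x[1]  # 0: go left, 1: go right
            if not isinstance(x, tuple):    # reached a leaf
                out.append(x)
                x = tree                    # back to the root
        return "".join(out)

    tree = ("a", ("b", "c"))                # a = 0, b = 10, c = 11
    print(decode("010110", tree))           # -> "abca"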

Page 12

LEMPEL-ZIV COMPRESSION (Lempel and Ziv, 1977) (Solution II)

Examples: zip, gif, most compression methods.

Feature: under generous assumptions on the input file, the expected length of the compressed sequence is close to E.

Method: parse the input into the smallest pieces never seen before.

INPUT: [example string over the alphabet {a, b, c}, garbled in the scan]
PIECE #: 0 1 2 3 4 5 6 7 8 9 10 11 12

Each piece is a previously seen piece followed by one new symbol; for every piece we record the number of that earlier piece and the last symbol of the piece.
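A minimal sketch (my addition) of the parsing rule: cut the input into the smallest pieces never seen before and emit one (pointer, symbol) pair per piece, as described on the next page. The dictionary-based bookkeeping and the example string are my own conventions, not the notes'.

    def lz_parse(text):
        seen = {"": 0}                 # piece 0: the empty piece
        out = []                       # one (pointer to earlier piece, new symbol) per piece
        piece = ""
        for ch in text:
            if piece + ch in seen:
                piece += ch                        # still a piece we have already seen
            else:
                out.append((seen[piece], ch))      # new piece = old piece + one symbol
                seen[piece + ch] = len(seen)
                piece = ""
        if piece:                                  # trailing prefix of an already-seen piece
            out.append((seen[piece[:-1]], piece[-1]))
        return out

    print(lz_parse("aaababcbba"))
    # [(0, 'a'), (1, 'a'), (0, 'b'), (1, 'b'), (0, 'c'), (3, 'b'), (0, 'a')]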

Page 13

THE BINARY SEQUENCE

For the k-th piece we record two things: an integer in {0, ..., k-1} (the pointer to an earlier piece) and a symbol from the alphabet.

The pointer needs $\lceil \log_2 k \rceil$ bits:

    piece #        1 2 3 4 5 6 7 8 9 10 11 12
    pointer bits   0 1 2 2 3 3 3 3 4 4  4  4

The symbol needs a fixed number of bits: $\lceil \log_2 |A| \rceil$.

In the output, all bits are clearly identified (the decoder knows how many bits to read for each piece, so no separators are needed).
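For example (my arithmetic, not in the notes): with alphabet $\{a, b, c\}$, so $\lceil\log_2 |A|\rceil = 2$ bits per symbol, an input that parses into 12 pieces is encoded in $\sum_{k=1}^{12}\lceil\log_2 k\rceil + 12\cdot 2 = 33 + 24 = 57$ bits.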

Page 14

DATA STRUCTURE FOR CODING / DECODING: THE DIGITAL SEARCH TREE

(an "$|A|$-ary trie")

INPUT: [the same example string as on the parsing page, garbled in the scan]
PIECE #: 0 1 2 3 4 5 6 7 8 9 10 11 12

[Figure: the digital search tree built from the pieces; each node is a piece, and each edge is labelled with a symbol of $A$.]

Page 15

In the parsing phase: start at the root and descend to a leaf; add a symbol (and a new leaf) to add a piece.

Exercise: if the input is of size $n$, write the parsing algorithm that produces (a) the tree and (b) the sequence (0a)(1a)(0b)..., and show that it takes time $O(n)$.

Page 16

In the decoding phase: keep a table of pointers.

[Table in the original notes: for each piece number, the pointer to an earlier piece and the last symbol of the piece.]

Decode a piece by expanding its (pointer, symbol) pair, then the pair that the pointer refers to, and so on back to piece 0 (in the notes' example, piece 10 is decoded starting from (6, a)).

Exercise: write an $O(n)$ algorithm for decoding.
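A sketch (my addition) of the decoder: rebuild every piece from the table of (pointer, symbol) pairs, such as the ones produced by lz_parse above. Storing each piece as a string keeps the sketch short; the total work is proportional to the length of the decoded output.

    def lz_decode(pairs):
        pieces = [""]                          # piece 0: the empty piece
        out = []
        for pointer, symbol in pairs:
            piece = pieces[pointer] + symbol   # earlier piece + one new symbol
            pieces.append(piece)
            out.append(piece)
        return "".join(out)

    pairs = [(0, "a"), (1, "a"), (0, "b"), (1, "b"), (0, "c"), (3, "b"), (0, "a")]
    print(lz_decode(pairs))                    # -> "aaababcbba"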