22
February 6, 2005 LH-1 Higher-Order Lossless Source Codes (1) Block codes (aka dictionary-based codes) (a) Fixed-to-fixed length codes (FFLC) (b) Fixed-to-variable length codes (FVLC) (including the first-order codes discussed earlier) (c) Variable-to-fixed-length (VFLC) (d) Variable-to-variable-length codes (VVLC) (e) Run-length codes (a type VFLC or VVLC) (2) Conditional codes (3) Arithmetic codes (4) Universal/adaptive codes Forward (two pass) or Backward/one pass Dictionary based (principally, Lempel-Ziv type) or Statistical (principally, adaptive arithmetic coding) We will discuss (1b) and (2) here. Will discuss others later as time permits. February 6, 2005 LH-2 Example of FVLC Source: stationary rand. process {X n }, alphabet A = {1,2,3}, H(X) = log 2 3 = 1.585 P(X n =i) = 1 3 , P(X n+1 =j|X n =i) = 1 2 , j=i 1 4 , ji first-order code P(X=x) x codewor d 1/3 1 0 1/3 2 10 1/3 3 11 rate 1.67 ------------------------------------------------ How is it that we have beaten the entropy? As previously defined, the entropy is a lower bound only to the performance of first-order coding. second-order code P(X n =x 1 ,X n+1 =x 2 ) x 1 x 2 code word 1/6 11 000 1/12 12 0110 1/12 13 0111 1/12 21 100 1/6 22 001 1/12 23 101 1/12 31 110 1/12 32 111 1/6 33 010 avg length 19/6 rate 1.583

Higher-Order Lossless Source Codes

  • Upload
    others

  • View
    13

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Higher-Order Lossless Source Codes

Feb

ruar

y 6,

200

5LH

-1

Hig

her-

Ord

er L

ossl

ess

Sou

rce

Cod

es

(1)

Blo

ck c

odes

(a

ka d

ictio

nary

-bas

ed c

odes

)

(a)

Fix

ed-t

o-fix

ed le

ngth

cod

es (

FF

LC)

(b)

Fix

ed-t

o-va

riabl

e le

ngth

cod

es (

FV

LC)

(incl

udin

g th

e fir

st-o

rder

cod

es d

iscu

ssed

ear

lier)

(c)

Var

iabl

e-to

-fix

ed-le

ngth

(V

FLC

)

(d)

Var

iabl

e-to

-var

iabl

e-le

ngth

cod

es (

VV

LC)

(e)

Run

-leng

th c

odes

(a

typ

e V

FLC

or

VV

LC)

(2)

Con

ditio

nal c

odes

(3)

Arit

hmet

ic c

odes

(4)

Uni

vers

al/a

dapt

ive

code

s

For

war

d (t

wo

pass

)or

Bac

kwar

d/on

e pa

ss

Dic

tiona

ry b

ased

(pr

inci

pally

, Le

mpe

l-Ziv

typ

e) o

rS

tatis

tical

(p

rinci

pally

, ad

aptiv

e ar

ithm

etic

cod

ing)

We

will

dis

cuss

(1

b) a

nd (

2) h

ere.

W

ill d

iscu

ss o

ther

s la

ter

as t

ime

pe

rmits

.

Feb

ruar

y 6,

200

5LH

-2

Exa

mpl

e of

FV

LC

So

urc

e:

stat

iona

ry r

and.

pro

cess

{X

n},

alph

abet

A =

{1,

2,3}

, H

(X)

= lo

g 2 3

= 1

.585

P(X

n=i)

= 1 3

, P

(Xn+

1=j|X

n=i)

= 1 2

, j=

i1 4

, j≠i

firs

t-o

rder

co

de

P(X

=x)

xco

de

wo

rd

1/3

10

1/3

210

1/3

311

rate

1.67

----

----

----

----

----

----

----

----

----

----

----

----

How

is it

tha

t w

e ha

ve b

eate

n th

een

trop

y?

As

prev

ious

ly d

efin

ed,

the

entr

opy

is a

low

er b

ound

onl

y to

the

perf

orm

ance

of

first

-ord

er c

odin

g.

se

con

d-o

rder

co

de

P(X

n=x 1

,Xn+

1=x 2

)x 1

x 2co

de

wo

rd1/

611

000

1/12

1201

101/

1213

0111

1/12

2110

01/

622

001

1/12

2310

11/

1231

110

1/12

3211

11/

633

010

avg

leng

th19

/6

rate

1.58

3

Page 2: Higher-Order Lossless Source Codes

Feb

ruar

y 6,

200

5LH

-3

For

mal

ities

of F

ixed

-to-

Var

iabl

e-Le

ngth

Blo

ck C

odes

So

urc

e

Dis

cret

e-tim

e, d

iscr

ete-

valu

ed w

ith a

lpha

bet

A =

{a 1

,...}

= {

1, 2

, ...

},

finite

or c

ount

ably

infin

ite

Mod

eled

as

a di

scre

te-t

ime

rand

om p

roce

ss

Not

atio

n: X

or

{X

n} o

r {

Xn}

∞ n=-∞

or

{X

n}∞ n=

1

Ass

ume

prob

abili

ty d

istr

ibut

ion

is k

now

n,

i.e.

Pr(

X1=

x 1,..

.,Xk=

x k)

iskn

own

for

all c

hoic

es o

f k

and

(x

1,…

,xk)

Ass

ume

X i

s st

atio

nary

, i.e

.

Pr(

X1=

x 1,..

.,Xk=

x k)

= P

r(X

n+1=

x 1,..

.,Xn+

k=x k

)

for

any

n an

d x 1

,…,x

k

Seq

uenc

e no

tatio

n:

Ak

= a

ll se

quen

ces

x =

(x 1

,…,x

k)

of le

ngth

k

fro

m a

lpha

bet

A,

If A

ha

s M

sy

mbo

ls,

then

A

k has

M

k s

eque

nces

.

Alte

rnat

e no

tatio

n:

Ak

= {a

1, a

2, …

},

whe

re e

ach

ai

is s

ome

sequ

ence

of

leng

th k

fro

m

A.

Pro

babi

litie

s:

(P1,

P2,

…),

w

here

P

i = P

r((X

1,…

,Xk)

= a

i) o

r p

(x)

=p X

(x)

= P

r(X

=x)

Feb

ruar

y 6,

200

5LH

-4

Fix

ed-t

o-v

aria

ble

len

gth

(F

VL

) C

od

es

An

FV

L co

de w

ith (

inpu

t) b

lock

leng

th (

aka

"ord

er"

or "

dim

ensi

on")

k

divi

des

the

sour

ce s

eque

nce

into

blo

cks

of le

ngth

k

enco

des

each

blo

ck w

ith a

uni

quel

y de

coda

ble,

var

iabl

e-le

ngth

cod

eha

ving

one

cod

ewor

d fo

r ev

ery

poss

ible

sou

rce

sequ

ence

of

leng

th

k.

Thu

s, it

is e

ssen

tially

the

sam

e as

a f

irst-

orde

r va

riabl

e-le

ngth

cod

e,ex

cept

tha

t a

larg

er c

odeb

ook

is n

eede

d be

caus

e on

e co

dew

ord

isne

eded

for

eac

h bl

ock

of le

ngth

k,

as

opp

osed

to

a sm

alle

r co

debo

okw

ith o

ne c

odew

ord

for

each

ind

ivid

ual

sour

ce s

ymbo

ls.

The

key

cha

ract

eris

tics

of s

uch

a co

de a

re:

k =

blo

ckle

ngth

,

C =

{ v 1

, v2,

… }

=

code

book

, un

ique

ly d

ecod

eabl

e (u

sual

ly p

refix

),w

ith o

ne c

odew

ord

for

each

seq

uenc

e x

in

A

k ,

Cod

ewor

d le

ngth

s l 1

,l 2,..

.,

whe

re

l i =

l(v i

) =

l(i)

= l(

a i)

Eac

h co

dew

ord

has

the

form

v i

= (

v i,1,..

.,v i,l

i) ∈

{0,1

}*,

whe

re

{0,1

}*

is t

he s

et o

f al

l fin

ite-le

ngth

bin

ary

sequ

ence

s.

e =

en

codi

ng r

ule,

whi

ch

assi

gns

a co

dew

ord

in

C

to e

ach

x

in A

k .

Spe

cific

ally

, z

= e

(x)

= v

i w

hen

x =

ai.

deco

ding

pro

cedu

re d

epen

ds o

n th

e co

debo

ok.

Page 3: Higher-Order Lossless Source Codes

Feb

ruar

y 6,

200

5LH

-5

The

ope

ratio

n of

an

FV

L co

de w

ith b

lock

leng

th

k =

3

is il

lust

rate

d be

low

:

sou

rce

seq

ue

nce

en

cod

ed

b

ina

ry s

eq

ue

nce

X

X

X

X

X

X

X

X

X

X

...

1 2

3

4 5

6

7

8

9

10

e(X

X

X

)1 2 3

e(X

X

X

)4 5 6

e(X

X

X

)7 8 9

e(X

.

..)

10

The

per

form

ance

of

the

code

is d

eter

min

ed b

y its

Ave

rage

len

gth

– l =

E l(

X)

= E

l(e(

X))

= ∑ x∈

Ak

l(

x) p

(x)

=

∑ i

l i P

i

Ra

te

R =

– l k

bits

/sou

rce

sym

bol

Feb

ruar

y 6,

200

5LH

-6

Key

Que

stio

ns

1.

How

to

desi

gn b

lock

leng

th F

VL

code

s w

ith in

put

bloc

klen

gth

ksm

all a

vera

ge le

ngth

or

rate

?

2.

How

sm

all c

an b

e th

e ra

te o

f an

FV

L co

de w

ith b

lock

engt

h k

?

3.

How

sm

all c

an b

e th

e ra

te o

f an

y F

VL

code

with

any

blo

ckle

ngth

wha

tsoe

ver?

W

hat

valu

e of

k

is b

est?

An

swe

rs

1.

How

to

desi

gn F

VL

code

s w

ith s

mal

l ave

rage

leng

th o

r ra

te?

Use

sam

e ap

proa

ches

as

befo

re,

i.e.

appl

y S

hann

on o

r H

uffm

ande

sign

str

ateg

ies

to t

he s

et c

onta

inin

g pr

obab

ilitie

s of

(

P1,

P2,

… ),

equi

vale

ntly

to

{p(

x):

x ∈

Ak }

Page 4: Higher-Order Lossless Source Codes

Feb

ruar

y 6,

200

5LH

-7

2.

How

sm

all c

an b

e th

e ra

te o

f an

FV

L co

de w

ith b

lock

engt

h k

?

Def

ine:

− l* k =

sm

alle

st a

vg le

ngth

am

ong

all p

refix

(or

UD

) co

des

with

bloc

klen

gth

k

R* k

= − l* k k

=

sm

alle

st r

ate

amon

g al

l FV

L: c

odes

with

blo

ckle

ngth

k

Fro

m t

he C

odin

g T

heor

em f

or f

irst-

orde

r co

des,

we

dedu

ce

FV

L C

od

ing

Th

eore

m 1

:

H(X

1,...

,Xk)

− l* k <

H(X

1,...

,Xk)

+ 1

Hk(

X)

R* k

< H

k(X

) +

1 k

wh

ere H

(X1,

...,X

k)

= -

∑ i P

i log

2 P

i =

-∑ x∈

Ak

p

(x)

log 2

p(x

)

=

entr

opy

of

X1,

...,X

k

Hk

= H

k(X

) =

1 k H(X

1,...

,Xk)

= k

th-o

rder

ent

ropy

of

X

Feb

ruar

y 6,

200

5LH

-8

3.

How

sm

all c

an b

e th

e ra

te o

f an

y F

VL

code

with

any

blo

ckle

ngth

wha

tsoe

ver?

W

hat

valu

e of

k

is b

est?

Incr

easi

ng

k d

ecre

ases

the

1 k ter

m,

whi

ch is

a g

ood

thin

g.

But

we

also

need

to

know

how

H

k w

ith

k?

Fac

t 1:

F

or a

sta

tiona

ry r

ando

m p

roce

ss,

Hk+

1 ≤

Hk

≤ H

1

Hk'

s co

nver

ge t

o a

limit.

lim k→∞ H

k =

inf k

Hk

Pro

of:

The

ine

qual

ities

will

be

prov

ed l

ater

whe

n w

e di

scus

s co

nditi

onal

codi

ng.

Sin

ce t

he

Hk'

s a

re n

onin

crea

sing

and

bou

nded

fro

m b

elow

by

zero

,th

ey m

ust

appr

oach

a li

mit.

(R

esul

t fr

om a

naly

sis

(e.g

. M

ath

451)

.)

Sin

ce t

heH

k's

are

non

incr

easi

ng t

he li

mit

is a

lso

the

inf.

Def

ine: H

∞ =

lim k→

∞ H

k =

ent

ropy

rat

e of

the

sou

rce

X

Page 5: Higher-Order Lossless Source Codes

Feb

ruar

y 6,

200

5LH

-9

Cod

ing

The

orem

1 a

nd F

act

1 im

ply

that

tha

t ra

te

R

of a

ny

code

with

bloc

klen

gth

k

satis

fies

R ≥

Hk

≥ H

Tha

t is

, no

FV

L co

de w

hats

oeve

r ca

n ha

ve r

ate

less

tha

n H

∞.

Mor

eove

r, f

or a

ny s

mal

l num

ber

ε,

we

can

choo

se

k la

rge

enou

gh t

hat

Hk(

X)

≤ H

∞ +

ε 2

and

1 k ≤ ε 2

.

The

n C

odin

g T

heor

em 1

impl

ies

that

for

thi

s k

, t

here

is a

cod

e w

ith r

ate

R

=

R* k

< H

k(X

) +

1 k ≤

H∞ +

ε 2 +

ε 2 =

H∞ +

ε.

Sin

ce

ε c

an b

e ar

bitr

arily

sm

all,

this

sho

ws

ther

e ar

e co

des

with

rat

ear

bitr

arily

clo

se t

o H

∞.

In

som

e ca

ses,

the

re m

ay a

ctua

lly b

e a

code

with

rate

H

∞,

but

in o

ther

s th

ere

is n

ot.

In

any

even

t, w

e ha

ve p

rove

d th

efo

llow

ing:

FV

L C

od

ing

Th

eore

m 2

: F

or a

sta

tiona

ry s

ourc

e X

, t

he "

leas

t" a

vera

gera

te o

f F

VL

code

s of

any

blo

ckle

ngth

is

R* F

VL

=∆ i

nf k

R* k

= H

∞.

Equ

ival

ently

, no

FV

L co

de c

an h

ave

rate

less

tha

n H

∞,

and

for

any

ε >

0,

ther

e is

a c

ode

with

rat

e le

ss t

han

H∞

+ ε

.

Feb

ruar

y 6,

200

5LH

-10

Not

es a

nd E

xam

ples

(1)

The

R

* k's

nee

d no

t de

crea

se m

onot

onic

ally

. (

The

y ar

e su

badd

itive

.)

Exa

mpl

e:

Con

side

r a

bina

ry I

ID s

ourc

e w

ith

p(0

)=.7

, p

(1)

= .

2

R* 1

= 1

,

{0,

1}

is a

n op

timal

cod

e w

ith b

lock

leng

th 1

R* 2

= 1.

81 2 =

.905

, {0

, 10

, 11

0, 1

11}

is a

n op

timal

cod

e w

ith b

lock

leng

th 2

R* 3

= 2.

726

3 =

.908

7{0

0, 1

0, 0

10, 0

11, 1

100,

110

1, 1

110,

111

1}is

an

optim

al c

ode

with

blo

ckle

ngth

3

(2)

R* k

is s

ubad

ditiv

e,

i.e.

for

any

pos

itive

inte

gers

k

and

l,

R* k+

l ≤ k k+

l R* k

+ l l+k

R* l

As

we

have

dis

cuss

ed p

revi

ousl

y, a

sub

addi

tive

sequ

ence

has

a l

imit

that

equa

ls it

s in

f. T

here

fore

, su

badd

itivi

ty im

plie

s

R*

= i

nf k R

* k =

lim k→

∞ R

* k

Sub

addi

tivity

impl

ies

that

R

* k d

ecre

ases

whe

n k

in

crea

ses

by a

n in

tege

rm

ultip

le:

R* 1

≥ R

* k ≥

R* mk

≥ R

* f

or a

ll po

sitiv

e in

tege

rs

k a

nd

m

Page 6: Higher-Order Lossless Source Codes

Feb

ruar

y 6,

200

5LH

-11

Pro

of

of

sub

add

itiv

ity:

Let

Ck

and

C

l be

cod

es w

ith b

lock

leng

ths

k

and

l,

resp

ectiv

ely,

tha

t ha

vera

tes

R* k

and

R

* l,

resp

ectiv

ely.

Le

t C

be

the

blo

ckle

ngth

k+

l co

deob

tain

ed b

y us

ing

Ck

on

the

first

k

sou

rce

sym

bols

and

C

l on

the

nex

t l

sour

ce s

ymbo

ls.

Tha

t is

, C

= C

k×C

l. T

hen,

the

ave

rage

of

leng

th o

f C

is

sum

of

the

aver

age

leng

ths

of

Ck

and

C

l, i.

e.

– l(C

) =

– l(C

k) +

– l(C

l) ,

and

the

rate

of

C

R(C

) =

– l(C

)k+

l =

k k+l – l(

Ck)

k +

l k+l – l(

Cl)

l

=

k k+l R

(Ck)

+

l k+l R

(Cl)

=

k k+l R

* k +

l k+l R

* l

Sin

ce

R* k

≤ R

(C),

w

e ha

ve f

rom

the

abo

ve t

hat

R

* k ≤

k k+

l R* k

+

l k+l R

* l

(3)

In m

ost

case

s, t

here

is n

o co

de w

ith r

ate

exac

tly e

qual

to

H∞

,

and

ther

e is

no

code

with

blo

ckle

ngth

k

and

rat

e ex

actly

equ

al t

o H

k.

(4)

We

will

sho

w la

ter

that

for

an

IID s

ourc

e (i.

e. s

tatio

nary

and

mem

oryl

ess)

H1

= H

2 =

... =

Hk

= H

Thi

s ha

ppen

s on

ly w

hen

the

sour

ce i

s m

emor

yles

s.

So

ther

e is

les

s be

nefit

to in

crea

sing

k

for

an

IID s

ourc

e th

an f

or a

sou

rce

with

mem

ory.

Feb

ruar

y 6,

200

5LH

-12

(5)

We

will

sho

w la

ter

that

for

a (

first

-ord

er)

Mar

kov

sour

ce

Hk

= k-

1 k H

(X1,

X2)

- k-

2 k H

(X1)

=

1 k H

(X1)

+ k-

1 k H

(X2|

X1)

,

whe

re

H(X

2|X

1)

is t

he c

ondi

tiona

l ent

ropy

to

be d

efin

ed la

ter.

Exa

mpl

e:

A s

tatio

nary

Mar

kov

sour

ce

X

alph

abet

A =

{1,

2,3}

, P

(Xn=

i) =

1 3 ,

P(X

n+1=

j|Xn=

i) =

1 2 ,

j=i

1 4 , j≠

ik

12

3∞

Hk

log 2

3

1.58

5

1 2 (lo

g 2 3

+ 3 2)

1.54

2

1 3 (lo

g 2 3

+ 2

3 2)

1.52

8

3 2

1.50

0

R* k

1.66

71.

583

1.55

61.

500

Hk

+ 1 k

2.58

52.

043

1.86

11.

500

Hk

+ P

max k

1.91

81.

626

1.55

61.

500

1.0

00

1.5

00

2.0

00

2.5

00

3.0

00

02

46

81

01

2

blo

ckle

ng

th

Hk

Hk

+ 1/

k

R* k

Page 7: Higher-Order Lossless Source Codes

Feb

ruar

y 6,

200

5LH

-13

(6)

"Red

unda

ncy

of a

cod

e"

=∆ R

ate

- H

By

Cod

ing

The

orem

1:

redu

ndan

cy o

f th

e be

st F

VL

code

with

blo

ckle

ngth

k

is a

t m

ost

1/k.

(7)

Tig

hter

upp

er b

ound

on

R* k

(G

alla

ger1 )

It

can

be s

how

n th

at

− l* k ≤

H(X

1,...

,Xk)

+ P

max

if

Pm

ax <

1/2

H(X

1,...

,Xk)

+ P

max

+ 0

.086

if P

max

≥ 1

/2

whe

re

Pm

ax

=

max

{P1,

P2,

… }

. H

ence

,

R* k

≤ H

k(X

)+ P

max

/k i

f P

max

< 1

/2

Hk(

X)

+ P

max

/k +

0.0

86/k

if P

max

≥ 1

/2

Sin

ce

Pm

ax

decr

ease

s w

ith k

(u

sual

ly q

uite

rap

idly

), t

his

indi

cate

s th

atre

dund

ancy

dec

reas

es m

ore

rapi

dly

than

1/k

.

1 R.G

. Gal

lage

r, "

Var

iatio

ns o

n a

them

by

Huf

fman

," IE

EE

Tra

ns. I

nfor

m. T

hy.,

pp. 6

68-6

74, N

ov. 1

978.

Feb

ruar

y 6,

200

5LH

-14

(8)

Alth

ough

the

re a

re o

ther

typ

es o

f lo

ssle

ss c

odes

bes

ides

FV

L bl

ock

code

s,

it tu

rns

out

that

for

a s

tatio

nary

sou

rce,

no

code

can

hav

e ra

te le

ssth

an

H∞

. T

hus

FV

L co

des

perf

orm

as

wel

l as

any

code

s.

(9)

Com

plex

ity:

A

n F

VL

code

with

blo

ckle

ngth

k

for

a s

ourc

e w

ith a

n M

-sy

mbo

l alp

habe

t ha

s M

k co

dew

ords

. T

hus,

com

plex

ity o

f th

is t

ype

ofco

ding

gro

ws

rapi

dly

(exp

onen

tially

) w

ith k

.

Page 8: Higher-Order Lossless Source Codes

Feb

ruar

y 6,

200

5LH

-15

Est

imat

es o

f H

k fo

r V

ario

us L

angu

ages

M=

|A|

log 2

MH

1H

2H

3H

4H

Eng

lish

264.

704.

143.

853.

671.

3E

nglis

h27

4.75

4.03

3.68

3.48

Eng

lish

264.

704.

12E

nglis

h27

4.75

4.09

3.66

3.39

3.21

Fre

nch

264.

703.

98S

pani

sh26

4.70

4.02

Ge

rma

n26

4.70

4.10

Po

rtu

gu

ese

26?

4.70

?3.

923.

723.

53R

uss

ian

365.

174.

554.

003.

653.

42S

am

oan

174.

093.

403.

042.

832.

69T

amil

304.

914.

34A

rab

ic32

5.00

4.21

3.99

3.49

Ch

ine

se47

0012

.20

9.63

Ref

eren

ce:

Tex

t C

ompr

essi

on,

Bel

l, C

lear

y, W

itten

, p.

95.

(Som

e "c

onve

rsio

n" w

as n

eede

d to

obt

ain

H2,

H3,

H4

from

the

tab

le g

iven

ther

e.)

Feb

ruar

y 6,

200

5LH

-16

Con

ditio

nal L

ossl

ess

Cod

es

Exa

mp

le:

X, s

tatio

nary

, alp

habe

t: A

= {

1,2,

3},

P(X

n=i)

= 1 3

, P

(Xn+

1=j|X

n=i)

= 1 2

, j=

i1 4

, j≠i

firs

t-o

rder

co

de

P(X

=x)

xco

de

wo

rd1/

31

01/

32

101/

33

11R

ate

1.67

seco

nd

-ord

er c

od

e

P(X

n=x 1

,Xn+

1=x 2

)x 1

x 2co

de

wo

rd

1/6

1100

01/

1212

0110

1/12

1301

111/

1221

100

1/6

2200

11/

1223

101

1/12

3111

01/

1232

111

1/6

3301

0av

g le

ngth

19/

6R

ate

1.58

3

con

dit

ion

al c

od

e

pr

evio

us s

ymbo

l Xn

-1

Xn

12

3

11/

20

1/4

111/

410

21/

410

1/2

01/

411

31/

411

1/4

101/

20

− l1.

51.

51.

5

Rat

e =

1.5

bits

/sym

bol

=

H∞

Is c

ode

uniq

uely

dec

odab

le?

Yes

, kn

owin

g th

e pr

evio

us s

ymbo

l, th

ede

code

r kn

ows

whi

ch p

refix

cod

e w

asus

ed t

o en

code

the

cur

rent

sym

bol.

It is

sai

d to

be

"bac

kwar

d ad

aptiv

e".

Page 9: Higher-Order Lossless Source Codes

Feb

ruar

y 6,

200

5LH

-17

sour

cese

quen

ce

enco

ded

bina

ry s

eque

nce

X

X

X

X

X

X

X

X

...

1

2

3

4

5

6

7

8

e(X

|X

)2

1 e(

X |

X )

3

2e(

X |

X )

5

4e(

X |

X )

7

6

e(X

|X

)4

3

e(X

|X

)6

5

e(X

|X

)8

7

Con

ditio

nal L

ossl

ess

Cod

ing:

For

mal

Intr

oduc

tion

Ass

ume

disc

rete

-val

ued,

sta

tiona

ry s

ourc

e X

, as

usu

al.

Bas

ic id

ea:

Whe

n en

codi

ng

Xn,

use

a c

ode

desi

gned

for

the

con

ditio

nal p

roba

bilit

ydi

strib

utio

n of

X

n g

iven

the

val

ue o

f th

e pr

evio

us s

ymbo

l X

n-1.

Tha

t is

, w

hen

Xn-

1=a,

us

e a

pref

ix c

ode

Ca

= {

v a,1

, va,

2, …

} w

ith le

ngth

s {

l a,1

, la,

2, ..

. }

desi

gned

for

p(1|

a), p

(2|a

), p

(3|a

), ..

.

wh

ere

p(i|a

) =

Pr(

Xn=

i | X

n-1=

a )

Con

ditio

nal R

ate:

giv

en

Xn-

1=a,

Ra

= – l a

= ∑

i l a

,i p(

i|a)

Ove

rall

Rat

e:

R =

∑ a p

(a)

– l a =

∑a

∑ i l a,i

p(a)

p(i|

a)

Feb

ruar

y 6,

200

5LH

-18

If fo

r ea

ch a

, C

a is

opt

imal

for

( p

(1|a

), p

(2|a

), p

(3|a

), ..

. ),

then

H(X

|a)

≤ R

a <

H(X

|a)

+ 1

wh

ere

H(X

|a)

= H

(X2|

X1=

a) =

-∑ i

p(i|

a) lo

g 2 p

(i|a)

= c

ond'

l ent

r’y o

f X

2 gi

ven

X1=

a

and

the

over

all r

ate

is

H(X

2|X

1) ≤

R <

H(X

2|X

1) +

1

wh

ere

H(X

2|X

1) =

∑ a p

(a)

H(X

2|X

1=a)

=

-∑ a

∑ i p

(a)

p(i|a

) lo

g 2 p

(i|a)

=

cond

ition

al e

ntro

py o

f X

2 gi

ven

X1

Not

e ho

w s

imila

r th

is n

otat

ion

is t

o th

at o

f H

(X2|

X1=

a).

The

y ar

e no

t th

e sa

me.

Co

nd

itio

nal

Lo

ssle

ss C

od

ing

Th

eore

m:

For

a s

tatio

nary

sou

rce

X,

H(X

2|X

1) ≤

R* c

< H

(X2|

X1)

+ 1

,

whe

re

R* c

den

otes

the

leas

t ra

te o

f co

nditi

onal

cod

es u

sed

to e

ncod

eso

urce

X

.

Page 10: Higher-Order Lossless Source Codes

Feb

ruar

y 6,

200

5LH

-19

No

tes

:

(1)

Thi

s is

a k

ind

of b

ackw

ard

adap

tive

codi

ng.

The

enc

oder

and

dec

oder

adap

t th

e co

de b

ased

on

the

prev

ious

sym

bol.

(2)

It is

not

abs

olut

ely

esse

ntia

l tha

t th

e C

a's

be

pref

ix c

odes

. H

owev

er,

if no

t, on

e w

ould

hav

e to

req

uire

a k

ind

of "

mut

ual"

uni

que

deco

dabi

lity,

nam

ely,

tha

t an

y fin

ite s

eque

nce

of c

odew

ords

fro

m a

ny o

f th

e co

des

be d

ecod

able

in o

nly

one

way

.

(3)

The

upp

er b

ound

R

* c <

H(X

2|X

1)+

1 c

an b

e tig

hten

ed b

y us

ing

the

boun

d R

< H

(X2|

X1)

+p c

,max

+0.

0836

, w

here

p c

,max

is

the

larg

est v

alue

of

p(i|a

) o

ver

all c

hoic

es o

f i

and

a.

Feb

ruar

y 6,

200

5LH

-20

Pre

dict

ive/

Diff

eren

tial C

odin

g

A s

peci

al k

ind

of c

ondi

tiona

l cod

e

FV

Len

code

rF

VL

deco

der

Exa

mp

le:

Alp

habe

t of

X:

AX =

{0,

1,2}

Alp

habe

t of

U:

AU =

{-2

,-1,

0,1,

2}

Cod

eboo

k fo

r U

: C

U =

{110

, 01,

00,

10,

111

}

Equ

ival

ent

cond

ition

al e

ncod

ing

tabl

e

prev

ious

sym

bol X

n-1

Xn

01

2

000

0111

01

1000

012

111

1000

Com

plex

ity:

Com

pute

U

n =

Xn-

Xn-

1

Sto

re ju

st t

he o

ne c

odeb

ook

CU

with

2|

A|-

1 co

dew

ords

Ra

te:

R ≅

H(U

)

Ofte

n, H

(U)

≅ H

(X2|

X1)

Imp

rove

me

nt:

Rep

lace

"de

lay"

with

a p

redi

ctor

~ Xn

= g

(Xn-

1,X

n-2,

...)

Wid

ely

used

in

loss

less

im

age

codi

ng,

e.g.

"lo

ssle

ss J

PE

G".

Page 11: Higher-Order Lossless Source Codes

Feb

ruar

y 6,

200

5LH

-21

Impr

oved

Ver

sion

s of

Con

ditio

nal L

ossl

ess

Cod

ing

The

re a

re t

wo

way

s to

im

prov

e co

nditi

onal

cod

ing.

(1)

mth

-ord

er c

ondi

tiona

l enc

odin

g:

cond

ition

on

m p

ast

sym

bols

(ra

ther

tha

n on

on

e)

(2)

(m

,k)

cond

ition

al b

lock

cod

ing:

co

nditi

onal

ly e

ncod

e k

sym

bols

at

atim

e (r

athe

r th

an o

ne).

Feb

ruar

y 6,

200

5LH

-22

(1)

mth

-ord

er c

on

dit

ion

al c

od

ing

In m

th-o

rder

con

ditio

nal c

odin

g, w

hen

enco

ding

sym

bol

Xn

one

use

s a

pref

ixco

debo

ok d

eter

min

ed b

y th

e pr

evio

us

m

sym

bols

Xn-

m,…

,Xn-

1.

Tha

t is

, fo

r ev

ery

m-t

uple

x =

(x 1

,…,x

m)

in A

m ,

the

re is

a p

refix

cod

eboo

k

Cx

= {

v x,1

, vx,

2, …

}w

ith l

engt

hs { lx,

1, l x

,2, …

}an

d a

dist

inct

cod

ewor

d fo

r ev

ery

sym

bol i

n A

. W

hen

(X

n-m

,…,X

n-1)

= x

,th

en

Xn

is e

ncod

ed w

ith c

odeb

ook

Cx.

The

ope

ratio

n of

a 2

nd-o

rder

con

ditio

nal c

ode

is il

lust

rate

d be

low

:

sourc

ese

quence

enco

ded

bin

ary

sequence

X X

X

X

X

X

...

1

2

3

4

5

6

e(X

|X

X

)

3

2

1

e(X

|X

X

)

4

3

2(X |X

X

)

5

4

3

e(X

|X

X

)

6

5

4

Def

init

ion

:

R* 1|

m =

leas

t ra

te o

f m

th-o

rder

con

ditio

nal c

odin

g fo

r so

urce

X

Page 12: Higher-Order Lossless Source Codes

Feb

ruar

y 6,

200

5LH

-23

An

alys

is

Giv

en t

hat

(X

n-m

,…,X

n-1)

= x

, t

he c

ondi

tiona

l ave

rage

rat

e of

the

cod

e is

Rx

= – l x

= ∑

i

l x,i

p(i|x

)

whe

re

p(i|x

) =

Pr(

Xn=

i|Xn-

m,…

,Xn-

1=x)

,

and

the

over

all r

ate

is

R =

∑ x p

(x)

Rx

= ∑ x

p(x

) – l x

whe

re

p(x)

= P

r(X

n-1

n-m

=x)

= P

r(X

m 1=

x),

(

the

latte

r by

sta

tiona

rity)

.

If fo

r ea

ch

x, C

x is

opt

imal

for

(p(

1|x)

, p(

2|x)

, p(

3|x)

, ...

),

then

H(X

n|x)

≤ R

x <

H(X

n|x)

+ 1

wh

ere

H(X

n|x)

=

H(X

n|X

n-m

,…,X

n-1=

x) =

- ∑ i

p(i|

x) lo

g 2 p

(i|x)

=

con

ditio

nal e

ntro

py o

f X

2 g

iven

Xn-

m,…

,Xn-

1=x.

The

ove

rall

rate

is

R* 1|

m

and

from

the

abo

ve

H1|

m ≤

R* 1|

m <

H1|

m +

1

wh

ere

H1|

m =

H(X

n|X

n-m

,…,X

n-1)

= ∑ a

p(x

) H

(Xn|

Xn-

m,…

,Xn-

1=x)

= c

ondi

tiona

l ent

ropy

of

H(X

n|X

n-m

,…,X

n-1)

Aga

in,

note

how

sim

ilar

this

nam

e is

to

that

of

H(X

n|X

n-m

,…,X

n-1=

x).

Feb

ruar

y 6,

200

5LH

-24

Est

imat

es o

f H

(Xn|

Xn-

m,…

,Xn-

1) f

or v

ario

us la

ngua

ges

H1|

m

=∆ H

(Xn|

Xn-

m,…

,Xn-

1)

M=

|A|

log 2

MH

(X1)

H1|

2H

1|3

H1|

4H

1|8

H1|

12H

1|∞

Eng

lish

264.

704.

143.

563.

31.

3

Eng

lish

274.

754.

033.

323.

1

Eng

lish

264.

704.

12

Eng

lish

274.

754.

093.

232.

852.

662.

432.

40

Fre

nch

264.

703.

98

Spa

nish

264.

704.

02

Ger

man

264.

704.

10

Por

tugu

ese

26?

4.70

?3.

923.

513.

15

Rus

sian

365.

174.

553.

442.

952.

722.

452.

40

Sam

oan

174.

093.

402.

682.

402.

282.

162.

14

Tam

il30

4.91

4.34

Ara

bic

325.

004.

213.

772.

49

Chi

nese

4700

12.2

09.

63

Ref

eren

ce:

Tex

t C

ompr

essi

on,

Bel

l, C

lear

y, W

itten

, p.

95.

Page 13: Higher-Order Lossless Source Codes

Feb

ruar

y 6,

200

5LH

-25

(2)

Co

nd

itio

nal

Blo

ck C

od

ing

.

In (

k|m

)-co

nditi

onal

-blo

ck e

ncod

ing,

just

as

with

FV

L en

codi

ng w

ithbl

ockl

engt

h k

, o

ne d

ivid

es t

he s

ourc

e se

quen

ce in

to b

lock

s of

leng

th

k.

The

n as

with

mth

-ord

er c

ondi

tiona

l cod

ing,

whe

n en

codi

ng a

blo

ckbe

ginn

ing

with

sym

bol

Xn,

i.e

. th

e bl

ock

(X

n,…

,Xn+

k-1)

on

e us

es a

pref

ix c

odeb

ook

dete

rmin

ed b

y th

e pr

evio

us

m

sym

bols

X

n-m

,…,X

n-1.

Tha

t is

, fo

r ev

ery

m-t

uple

x

= (

x 1,…

,xm

) in

A

m ,

the

re is

a p

refix

code

book

C

x =

{ v x

,1, v

x,2,

… }

with

leng

ths

{ l x

,1, l

x,2,

… }

and

one

dist

inct

cod

ewor

d fo

r ev

ery

sequ

ence

in

Ak .

W

hen

(X

n-m

,…,X

n-1)

= x

,th

en

(Xn,

…,X

n+k-

1)

is e

ncod

ed w

ith c

odeb

ook

Cx.

The

ope

ratio

n of

a (

1,2)

-con

d'l b

lock

cod

e is

illu

stra

ted

belo

w:

sourc

ese

quence

enco

ded

bin

ary

sequence

X X

X

X

X

X

X

..

. 1

2

3

4

5

6

7

e(X

X

|X

)

3

2

1e(X

X

|X

)

7

6

5e(X

X

|X

)

5

4

3

Def

init

ion

:

R* k|

m

=

leas

t ra

te o

f an

y (

k|m

)-co

nditi

onal

cod

e fo

r X

Feb

ruar

y 6,

200

5LH

-26

An

alys

is

To

allo

w m

ore

com

pact

exp

ress

ions

, le

t X

v u d

enot

e th

e se

quen

ce

(Xu,

…,X

v).

Giv

en th

at X

n-1

n-m

= x

, t

he a

vera

ge r

ate

of t

he c

ode

is

Rx

=

1 k – l x =

1 k ∑ i

l x,i

p(a i

|x)

whe

re

p(a i

|x)

= P

r(X

n+k-

1n

= a i

|Xn-

1n-

m =

x),

an

d th

e ov

eral

l rat

e is

R =

1 k ∑ x p

(x)

Rx

=

1 k ∑ x p

(x)

– l x

whe

re

p(x)

= P

r(X

n-1

n-m

=x)

= P

r(X

m 1=

x),

(th

e la

tter

by s

tatio

narit

y).

If C

x is

opt

imal

for

(p(

a 1|x

), p

(a2|

x), p

(a3|

x), .

.. ),

th

en

1 k H(X

n+k-

1n

|x)

Rx

<

1 k H

(Xn+

k-1

n|x

) +

1 k

wh

ere

H(X

n+k-

1n

|x)

=

H(X

n+k-

1n

|Xn-

1n-

m=x

) =

- ∑ i

p(a

i|x)

log 2

p(a

i|x)

=

con

ditio

nal e

ntro

py o

f X

n+k-

1

giv

en X

n-1

n-m

=x.

and

from

the

abo

ve,

the

over

all r

ate

is

Hk|

m ≤

R* k|

m <

Hk|

m +

1 k

Page 14: Higher-Order Lossless Source Codes

Feb

ruar

y 6,

200

5LH

-27

wh

ere

Hk|

m =

1 k H

(Xn+

k-1

n|X

n-1

n-m

)

=

(k|m

)-th

ord

er e

ntro

py

and

H(X

n+k-

1n

|Xn-

1n-

m)

= ∑ x

p(x

) H

(Xn+

k-1

n|x

) =

- ∑ x

∑ i p

(x)

p( a

i|x)

log 2

p(a

i|x)

=

cond

ition

al e

ntro

py o

f X

n+k-

1

giv

en X

n-1

n-m

Aga

in,

note

how

sim

ilar

this

nam

e is

to

that

of

H(X

n+k-

1n

|Xn-

1n-

m=x

).

Feb

ruar

y 6,

200

5LH

-28

We

now

sum

mar

ize

and

take

the

lim

it.

Co

din

g T

heo

rem

fo

r C

on

dit

ion

al B

lock

Co

des

:

For

a s

tatio

nary

sou

rce

X a

nd in

tege

rs

k≥1,

m≥0

(a)

Hk|

m ≤

R* k|

m <

Hk|

m +

1 k ,

wh

ere

R* k|

m

deno

tes

the

leas

t ra

te o

f an

y (k

|m)

cond

ition

al b

lock

cod

e

Hk|

m =

1 k H

(Xn+

k-1

n|X

n-1

n-m

) is

the

(k|

m)-

th o

rder

ent

ropy

.

Mo

reo

ver,

(b)

lim k→∞

R* k|

m =

H∞,

for

any

m

(c)

H∞ ≤

lim m→

∞R

* k|m

< H

∞ +

1 k ,

for

any

k.

Whe

n m

=0,

w

e in

terp

ret

a (k

|m)

cond

ition

al b

lock

cod

e to

mea

n an

unco

nditi

onal

blo

ck c

ode.

T

his

theo

rem

gen

eral

izes

tho

se o

n pp

. 7,

9.

Pro

of:

F

or

m=

0,

(a)

and

(b)

wer

e es

tabl

ishe

d in

the

the

orem

s on

pp.

7,9

.F

or

m≥1

, (

a) w

as e

stab

liish

ed o

n pr

evio

us p

ages

. (

b) a

nd (

c) f

ollo

w f

rom

(a)

and

Pro

pert

ies

15 a

nd 1

6, g

iven

late

r, w

hich

sho

w

lim k→∞H

k|m

= H

∞ f

or a

ny m

, a

nd

lim

m→

∞H

k|m

= H

∞ f

or a

ny k

Thi

s th

eore

m d

emon

stra

tes

that

one

can

app

roac

h th

e le

ast

poss

ible

rat

e, H

∞,

in a

num

ber

of w

ays.

Page 15: Higher-Order Lossless Source Codes

Feb

ruar

y 6,

200

5LH

-29

Co

mp

lexi

ty o

f co

nd

itio

nal

an

d c

on

dit

ion

al b

lock

co

des

Sup

pose

the

sou

rce

alph

abet

con

tain

s M

sy

mbo

ls.

A d

irect

impl

emen

tatio

n of

an

(k|m

) co

nditi

onal

blo

ck c

ode

requ

ires

stor

ing

the

Mk

cod

ewor

ds o

f ea

ch o

f th

e M

m

code

s C

x.

Tha

t is

, th

eto

tal s

tora

ge r

equi

red

for

enco

ding

or

deco

ding

is p

ropo

rtio

nal t

o M

k+m

.

Thu

s co

mpl

exity

gro

ws

expo

nent

ially

with

m

an

d k

.

Sup

pose

we

fix a

val

ue

K

and

requ

ire t

hat

k+

m =

K,

the

reby

fix

ing

the

com

plex

ity,

at le

ast

appr

oxim

atel

y.

Wha

t va

lues

of

k

and

m

lead

to

the

low

est

rate

?

The

ans

wer

dep

ends

on

the

sour

ce,

so t

here

is

no u

nive

rsal

ly b

est

choi

ce.

One

can

use

H

k|m

+ 1

/k

as a

gui

de t

o th

e ch

oice

of

k

and

m.

The

fac

ts g

iven

late

r sh

ow t

hat

amon

g ch

oice

s of

k

,m

such

tha

tk+

m=

K,

Hk|

m

is s

mal

lest

whe

n k

=1

and

m

=K

-1,

i.e.

H1|

K-1

is

sma

llest

.

Feb

ruar

y 6,

200

5LH

-30

Exa

mp

le:

A h

ypo

thet

ical

20t

h-o

rder

Mar

kov

sou

rce

wit

h

H1=

10,

H

∞ =

2

Hk|

m

02468

10

12

05

10

15

20

25

bloc

k si

ze k

bits/symbol

m=

0

m=

1

m=

2

m=

4

m=

6

m=

8

m=

10

mem

ory

leng

th

Page 16: Higher-Order Lossless Source Codes

Feb

ruar

y 6,

200

5LH

-31

1 5 9 13 1

7m=0

m=2

m=4

m=6

m=8

m=10

012345678910

bits/symbol

bloc

k si

ze k

Hk|

m

9-1

0

8-9

7-8

6-7

5-6

4-5

3-4

2-3

1-2

0-1

mem

ory

Feb

ruar

y 6,

200

5LH

-32

Tab

le o

f H

k|m

m =

01

23

45

67

89

10

11

12

k=

11

08.

217.

476.

96.

426

5.62

5.27

4.94

4.6

4.34

4.07

3.8

29.

117.

847.

196.

666.

25.

815.

445.

14.

84.

494.

213.

943.

68

38.

567.

536.

936.

46.

015.

635.

284.

94.

644.

354.

073.

813.

6

48.

037.

156.

66.

155.

755.

385

4.73

4.43

4.15

3.88

3.6

3.38

57.

626.

86.

345.

915.

525.

24.

854.

544.

253.

983.

73.

473.

23

67.

36.

586.

15.

695.

34.

994.

674.

384.

13.

83.

573.

333.

09

77

6.34

5.89

5.5

5.15

4.82

4.51

4.22

3.9

3.69

3.44

3.2

2.96

86.

746.

135.

75.

324.

984.

664.

364.

13.

813.

563.

313.

072.

84

96.

515.

95.

525.

154.

824.

514.

23.

943.

683.

433.

192.

952.

75

10

6.3

5.77

5.37

5.02

4.69

4.4

4.1

3.83

3.57

3.33

3.09

2.86

2.67

11

6.14

5.62

5.23

4.88

4.6

4.27

3.99

3.72

3.47

3.22

2.99

2.78

2.61

12

5.95

5.45

5.07

4.7

4.42

4.13

3.86

3.6

3.35

3.12

2.91

2.71

2.56

13

5.76

5.28

4.9

4.59

4.28

43.

733.

473.

243.

042.

842.

662.

52

14

5.59

5.1

4.77

4.45

4.15

3.87

3.61

3.37

3.15

2.96

2.78

2.61

2.48

15

5.4

4.97

4.63

4.31

4.02

3.75

3.5

3.28

3.08

2.9

2.73

2.57

2.45

16

5.26

4.82

4.49

4.18

3.89

3.64

3.41

3.2

3.01

2.84

2.68

2.54

2.42

17

5.1

4.68

4.35

4.05

3.78

3.54

3.32

3.13

2.95

2.79

2.64

2.5

2.4

18

4.95

4.54

4.22

3.94

3.68

3.46

3.25

3.06

2.9

2.75

2.6

2.48

2.37

19

4.81

4.41

4.1

3.84

3.6

3.38

3.18

3.01

2.85

2.71

2.57

2.45

2.35

20

4.67

4.29

43.

743.

523.

313.

122.

962.

812.

672.

542.

432.

34

The

val

ues

on a

dia

gona

l, su

ch a

s th

ose

prin

ted

in b

old,

hav

e th

e sa

me

com

ple

xity

.

Page 17: Higher-Order Lossless Source Codes

Feb

ruar

y 6,

200

5LH

-33

Mor

e D

efin

ition

s an

d P

rope

rtie

s of

Ent

ropy

Fo

r T

wo

Ran

do

m V

aria

ble

s

Def

init

ion

s:

(a)

The

ent

ropy

of

a pa

ir of

ran

dom

var

iabl

es X

and

Y

(som

etim

es c

alle

d th

eir

"joi

nt"

entr

opy)

is

H(X

,Y)

= -

∑ x,y p

(x,y

) lo

g 2 p

(x,y

) ,

w

here

p(

x,y)

=∆ P

r(X

=x

and

Y=

y).

Thi

nk o

f (

X,Y

) as

one

ran

dom

var

iabl

e w

ith a

n al

phab

et c

onsi

stin

g of

pai

rs.

The

nth

e ab

ove

is ju

st t

he u

sual

def

initi

on o

f en

trop

y, a

nd h

as t

he u

sual

pro

pert

ies.

(b)

The

con

ditio

nal e

ntro

py o

f ra

ndom

var

iabl

e X

gi

ven

a pa

rtic

ular

val

ue o

f ra

ndom

varia

ble

Y H(X

|Y=y

) =

-∑ x

p(x

|y)

log 2

p(x

|y),

whe

re

p(x|

y) =

Pr(

X=

x|Y

=y)

.

Thi

s is

an

ordi

nary

ent

ropy

--

nam

ely

the

entr

opy

of

X

whe

n a

spec

ific

valu

e of

Y

is g

iven

--

and

as s

uch

it ha

s al

l the

usu

al p

rope

rtie

s of

ent

ropy

.

(c)

Con

ditio

nal e

ntro

py o

f ra

ndom

var

iabl

e X

giv

en a

ran

dom

var

iabl

e Y

H(X

|Y)

= ∑ y

p(y

) H

(X|Y

=y)

= -

∑ x,y

p(x

,y)

log 2

p(x

|y)

Thi

s is

an

aver

age

of t

he p

revi

ous

kind

of

cond

ition

al e

ntro

py.

Feb

ruar

y 6,

200

5LH

-34

Pro

per

ties

: (

elem

enta

ry i

n in

form

atio

n th

eory

)

1.H

(X|Y

) ≥

0

with

equ

ality

iff

X

is a

func

tion

of

Y

Pro

of:

H(X

|Y=

y) ≥

0

for

all y

be

caus

e it

is a

n or

dina

ry e

ntro

py.

Sin

ce H

(X|Y

)is

the

ave

rage

of

such

, it

too

is n

onne

gativ

e.

H(X

|Y)

= 0

if

and

only

ifH

(X|Y

=y)

=0

for

each

y

with

pos

itive

pro

babi

lity.

T

his

happ

ens

if an

d on

ly f

orea

ch y

with

pos

itive

pro

babi

lity,

the

re is

a v

alue

x s

uch

that

P

r(X

=x|

Y=

y)=

1.

Inot

her

wor

ds,

Y d

eter

min

es

X;

i.e.

X

is

a f

unct

ion

of

Y.

2.H

(X|Y

) ≤

H(X

)

with

equ

ality

iff

X a

nd Y

are

in

depe

nden

t.

Pro

of:

We'

ll sh

ow

H(X

) -

H(X

|Y)

≥ 0

with

equ

ality

iff

X in

dep

of Y

.

H(X

) -

H(X

|Y)

= -

∑ x,y p

(x,y

) lo

g 2 p

(x)

+ ∑ x,

y p

(x,y

) lo

g 2 p(

x,y)

p(y)

= -

∑ x,y p

(x,y

) ln

p(x)

p(y

)p(

x,y)

1 ln

2

≥ -

∑ x,y p

(x,y

)

p

(x) p

(y)

p(x,

y) -

1

1 ln 2

u

sing

ln z

≤ z

-1

= -

∑ x,y p

(x) p

(y)

1 ln 2

+ ∑ x,

y p

(x,y

) 1 ln

2 =

0.

Equ

ality

hol

ds if

and

onl

y if

p(x

) p(y

) =

p(x

,y)

for

all x

,y; i

.e.

if an

d on

ly if

X a

ndY

ar

e in

depe

nden

t.

Page 18: Higher-Order Lossless Source Codes

Feb

ruar

y 6,

200

5LH

-35

3.

Cha

in r

ule:

H(X

,Y)

= H

(X)

+ H

(Y|X

) =

H(Y

) +

H(X

|Y)

Pro

of: H

(X,Y

)=

-∑ x,

y p

(x,y

) lo

g 2 p

(x) p

(y|x

)

= -∑ x,

y p

(x,y

) lo

g 2 p

(x)

-∑ x,

y p

(x,y

) lo

g 2 p

(y|x

)

= H

(X)

+ H

(Y|X

).

Inte

rcha

ngin

g X

an

d Y

in

thi

s ar

gum

ent

show

s H

(X,Y

) =

H(Y

)+H

(X|Y

).

4.H

(X,Y

) ≤

H(X

) +

H(Y

)

with

equ

ality

iff

X a

nd Y

are

inde

pend

ent

Pro

of:

Usi

ng F

acts

3 a

nd 2

,

H(X

,Y)

= H

(Y)

+ H

(X|Y

) ≤

H(Y

) +

H(X

)

with

equ

ality

iff

X a

nd Y

are

inde

pend

ent.

5.H

(X)

≤ H

(X,Y

)w

ith e

qual

ity if

f Y

is a

func

tion

of X

Pro

of:

By

Fac

t 3,

H(X

) =

H(X

,Y)

- H

(Y|X

) ≤

H(X

,Y),

w

here

the

ineq

ualit

y is

from

Fac

t 1, a

nd e

qual

ity h

olds

iff

H(X

|Y)

= 0

, whi

ch b

y F

act 1

hap

pens

iff

X i

s a

func

tion

of

Y.

Feb

ruar

y 6,

200

5LH

-36

Mo

re T

han

Tw

o R

and

om

Var

iab

les

Def

init

ion

s:

(d)

Ent

ropy

of

X1,

…,X

k (

som

etim

es c

alle

d th

eir

"joi

nt e

ntro

py")

H(X

1,...

,Xk)

= -

∑ x

p(x

) lo

g 2 p

(x)

w

here

x

= (

x 1,…

,xk)

(e)

Con

ditio

nal e

ntro

py o

f ra

ndom

var

iabl

es

X1,

…,X

k g

iven

par

ticul

ar v

alue

s of

rand

om v

aria

bles

Y

1,…

,Ym

H(X

1,…

,Xk|

Y1,

…,Y

m=

y 1,…

,ym

) =

-∑ x

p(x

|y)

log 2

p(x

|y),

whe

re

y =

(y 1

,…,y

m)

(f)

C

ondi

tiona

l ent

ropy

of

rand

om v

aria

bles

X1,

…,X

k g

iven

ran

dom

var

iabl

esY

1,…

,Ym

H(X

1,…

,Xk|

Y1,

…,Y

m)

= ∑ y

p(y

) H

(X1,

…,X

k|Y

1,…

,Ym

=y 1

,…,y

m)

= -∑ x,

y p

(x,y

) lo

g 2 p

(x|y

)

Not

e:If

one

thin

ks o

f (

X1,

…,X

k)

and

(Y

1,…

,Ym

) e

ach

as a

sin

gle

rand

om v

aria

ble

with

a v

ecto

r-va

lued

alp

habe

t, th

en t

he a

bove

for

mul

as a

re t

he s

ame

as t

heco

rres

pond

ing

"tw

o-ra

ndom

var

iabl

e" f

orm

ulas

.

Page 19: Higher-Order Lossless Source Codes

Feb

ruar

y 6,

200

5LH

-37

Pro

per

ties

:

6.H

(Y1,

...,Y

n|X

1,…

,Xm

) ≤

H(Y

1,…

,Yn|

X1,

…,X

m'),

0 ≤

m' <

m,

with

equ

ality

iff

Y1,

…,Y

n is

con

ditio

nally

inde

pend

ent

of

Xm

'+1,

…,X

m

give

nX

1,…

,Xm

'.

Pro

of:

Like

that

of F

act 2

.

7.C

hain

rul

e:

H(X

1,...

,Xk)

= H

(X1)

+ H

(X2|

X1)

+ H

(X3|

X1,

X2)

+ H

(X4|

X1,

X2,

X3)

+

... +

H(X

k|X

1,X

2,…

,Xk-

1)

Pro

of:

Like

that

of F

act 3

.

8.H

(X1,

...,X

k) ≤

H(X

1) +

H(X

2) +

... +

H(X

k)

with

equ

ality

iff

the

Xi's

ar

e in

depe

nden

t

Pro

of:

Fol

low

s fr

om th

e ch

ain

rule

and

the

fact

that

H

(Xi|X

1,…

,Xi-1

) ≤

H(X

i) w

itheq

ualit

y if

and

only

if X

i is

inde

pend

ent o

f X

1,…

,Xi-1

.

Feb

ruar

y 6,

200

5LH

-38

Fo

r S

tati

on

ary

Ran

do

m P

roce

sses

Let

{X

k}

be s

tatio

nary

ran

dom

pro

cess

.

Def

init

ion

s:

(g)

Hk

=∆

1 k H(X

1,...

,Xk)

(h)

H1|

m =∆

H(X

n|X

n-m

,X2,

…,X

n-1)

( H

1|0

=∆ H

(X1)

= H

1 )

(i)H

k|m

=∆

1 k H

(Xn+

k-1

n|X

n-1

n-m

)(H

1|m

= H

1|m

)

(j)H

∞ =∆

lim k→∞ H

k =

ent

ropy

rat

e of

X

Fac

ts t

o b

e es

tab

lish

ed:

Hk|

m

decr

ease

s or

sta

ys t

he s

ame

as

k o

r m

de

crea

ses

Hk|

m →

H∞ a

s k

and

/or

m g

o to

infin

ity..

Page 20: Higher-Order Lossless Source Codes

Feb

ruar

y 6,

200

5LH

-39

Pro

per

ties

:

9.H

1|m

+1 ≤

H1|

m

Pro

of:

Fol

low

s fr

om P

rope

rty

6 an

d st

atio

narit

y.

10.

Hk

= 1 k

(H1

+ H

1|1

+ H

1|2

+ …

+H

1|k-

1) ≥

H1|

k-1

≥ H

1|k

Pro

of:

Hk

= 1 k

H(X

1,…

,Xk)

= 1 k

(H(X

1) +

H(X

2|X

1) +

H(X

3|X

1,X

2) +

... +

H(X

k|X

1,X

2,…

,Xk-

1))

by c

hain

rul

e=

1 k (H

1 +

H1|

1 +

H1|

2 +

… +

H1|

k-1)

by

sta

tiona

rity

≥ 1 k

(H1|

k-1

+ H

1|k-

1 +

H1|

k-1

+ …

+ H

1|k-

1)

by

Pro

p. 9

.

= H

1|k-

1 ≥

H1|

k b

y P

rop.

9

11.

Hk+

1 ≤

Hk

Sin

ce

Hk'

s a

re n

onin

crea

sing

and

bou

nded

bel

ow b

y ze

ro,

they

mus

t ha

ve a

limit.

H

ence

, H

is a

wel

l-def

ined

qua

ntity

.

Pro

of:

By

Pro

p. 1

0,

Hk

is th

e av

erag

e of

k te

rms

H1,

H1|

1, H

1|2,

…, H

1|k-

1.

Sim

ilarly

, H

k+1

is a

vera

ge o

f k+

1 te

rms

H(X

1), H

1|1,

H1|

2, …

, H

1|k-

1, H

1|k

.

Sin

ce t

he e

xtra

ter

m in

H

k+1

is n

o la

rger

tha

n al

l oth

er t

erm

s,

Hk+

1 ≤

Hk.

Feb

ruar

y 6,

200

5LH

-40

12.

lim k→∞H

1|k

= lim k→

∞H

k =∆

H

Pro

of:

Sin

ce t

he

H1|

k's

are

non

incr

easi

ng w

ith

k a

nd b

ound

ed b

elow

by

zero

, th

ey m

ust

have

a li

mit.

S

ince

by

Pro

p. 1

0,

Hk

is t

he a

vera

ge o

f th

e k

term

s H

1, H

1|1,

H1|

2, …

, H1|

k-1,

th

e lim

it of

the

H1|

k's

equ

als

the

limit

of t

heH

k's,

w

hich

by

defin

ition

is

H∞

.

13.

Hk|

m =

1 k (H

1|m

+ H

1|m

+1 +

... +

H1|

m+k

-1)

Pro

of:

Hk|

m =

1 k

H(X

n+k-

1n

|Xn-

1n-

m)

= 1 k

(H(X

n|X

n-1

n-m

) +

H(X

n+1|

Xn n-

m)

+ ...

+ H

(Xn+

k-1|

Xn+

k-2

n-m

)) =

1 k (H

1|m

+ H

1|m

+2 +

... +

H1|

m+

k-1)

) b

y st

atio

narit

y

14.

H1|

k+m

≤ H

k|m

≤ H

k

Pro

of:

(a)

Hk|

m =

1 k H

(Xn+

k-1

n|X

n-1

n-m

) ≤

1 k H

(Xn+

k-1

n)

= H

k

(b)

By

Pro

p. 1

3,

Hk|

m

is t

he a

vera

ge o

f te

rms,

eac

h of

whi

ch is

at

leas

t H

1|k+

m.

The

refo

re,

Hk|

m ≥

H1|

k+m

.

Page 21: Higher-Order Lossless Source Codes

Feb

ruar

y 6,

200

5LH

-41

15H

K+1

|m ≤

HK

|m

Hk|

m+1

≤ H

K|m

The

refo

re,

H1

≥ H

2 ≥

H3

... d

ecre

ases

with

k

H1|

1

≥H

2|1

≥H

3|1

≥H

1|2

H3|

2≥

H3|

2≥

......

......

......

decr

ease

s w

ith

m

16.

lim k→∞H

k|m

= H

∞ f

or a

ny m

Pro

of:

Rec

all f

rom

Pro

p. 1

3 th

at

Hk|

m =

1 k (H

1|m

+ H

1|m

+2 +

... +

H1|

k+m

)

It fo

llow

s th

at t

he li

mit

as

k→∞

of

H

k|m

eq

uals

lim k→∞H

1|k+

m =

H∞

17.

lim m→

∞H

k|m

= H

for

any

k

Pro

of:

Rec

all f

rom

Pro

p. 1

3 th

at

Hk|

m =

1 k (H

1|m

+ H

1|m

+2 +

... +

H1|

k+m

)

It fo

llow

s th

at t

he li

mit

as

m→

of

Hk|

m

equa

ls

lim m→

∞H

1|k+

m =

H∞

Feb

ruar

y 6,

200

5LH

-42

Exa

mp

les:

A.

IID

sou

rce:

H(X

1) =

H1

= H

k =

H∞ =

H1|

1 =

H1|

k =

Hk|

m,

for

all k

,m

B.

(firs

t-or

der)

Mar

kov

sour

ce:

Def

init

ion

: A

sta

tiona

ry s

ourc

e is

firs

t-or

der

Mar

kov

if

p(x m

|x1,

…,x

m-1

) =

p(x

m|x

m-1

) f

or a

ll m

and

x1,

…,x

m

The

n fo

r an

y m

≥ 1

,

H1|

m =

H(X

m|X

1,…

,Xm

-1)

=

-∑

x 1…

x m

p(x

1,…

,xm

) log

2 p(

x m|x

1,…

,xm

-1)

= -

∑x 1

…x m

p(x

1,…

,xm

) lo

g 2 p

(xm

|xm

-1)

= -

∑x m

-1,x

m

p(x

m-1

,xm

) lo

g 2 p

(xm

|xm

-1)

=

H(X

m|X

m-1

) =

H

(X2|

X1)

=

H1|

1

by s

tatio

narit

y

Sin

ce a

ll th

e H

1|m

's

equa

l H

1|1,

H∞ =

lim m→

∞ H

1|m

= H

1|1

By

Pro

p. 1

0 a

nd t

he a

bove

Hk

= 1 k

(H1+

H1|

1+H

1|2+

…+H

1|k-

1) =

1 k(

H1+

(k-1

)H1|

1) =

1 k

H1+

k-1 k H

1|1

from

whi

ch w

e se

e th

at

Hk

con

verg

es d

own

to

H∞

, a

s it

shou

ld.

Page 22: Higher-Order Lossless Source Codes

Feb

ruar

y 6,

200

5LH

-43

Sim

ilarly

, on

e ca

n sh

ow

Hk|

m =

H1|

1 =

H∞

f

or a

ll k,

m

Bec

ause

H

1|1

= H

∞,

we

see

that

firs

t-or

der

cond

ition

al c

odin

g ca

n co

me

with

inon

e bi

t of

opt

imal

for

firs

t-or

der

Mar

kov

sour

ces.

In

oth

er w

ords

, th

ere

is n

ore

ason

to

use

2nd

or h

ighe

r or

der

cond

ition

al c

odin

g, a

nd t

he o

nly

reas

on t

ous

e bl

ock

codi

ng is

to

redu

ce t

he p

ossi

bly

one

bit

of r

edun

danc

y.

C.

nth

-ord

er M

arko

v so

urc

e:

Def

init

ion

: A

sta

tiona

ry s

ourc

e is

nth

-ord

er M

arko

v if

p(x k

|x1,

…,x

k-1)

= p

(xk|

x k-n

,…,x

k-1)

fo

r al

l k>

n &

x1,

…,x

k

The

n as

in t

he c

ase

of 1

st-o

rder

Mar

kov

sour

ces,

it c

an b

e sh

own

that

if

m≥n

,

H1|

m =

Hk|

m =

H1|

n =

H∞ ,

fo

r ev

ery

k,

and

that

if k

> n

,

Hk

=

1 k (nH

n+H

1|n+

H1|

n+1+

…+H

1|k-

1) =

1 k (nH

n+H

1|n+

H1|

n+…

+H1|

n)

=

n k H

n +

k-n k H

1|n

whi

ch c

onve

rges

dow

n to

H

∞ =

H1|

n a

s k

→ ∞

Sin

ce

H1|

n =

H∞

, n

th-o

rder

con

ditio

nal c

odin

g co

mes

with

in 1

bit

of o

pt'l

for

nth-

orde

r M

arko

v so

urce

s.

Thu

s th

ere

is n

o re

ason

to

use

high

er o

rder

con

di-

tiona

l cod

ing,

and

the

onl

y re

ason

to

use

bloc

k co

ding

is t

o re

duce

the

at

mos

t1

bit

redu

ndan

cy.