Coding and Entropy
Squeezing out the “Air”
• Suppose you want to ship pillows in boxes and are charged by the size of the box
• Lossless data compression
• Entropy = lower limit of compressibility
Claude Shannon (1916-2001), A Mathematical Theory of Communication (1948)
Communication over a Channel
• Source → coded bits → channel → received bits → decoded message:
  S (symbols) → X (bits) → Y (bits) → T (symbols)
• Encode bits before putting them in the channel; decode bits when they come out of the channel
• E.g. the transformation from S into X changes “yea” --> 1, “nay” --> 0
• Changing Y into T does the reverse
• For now, assume no noise in the channel, i.e. X = Y
Example: Telegraphy
• Source: English letters → Morse Code
• [Diagram: the letter D is encoded as “-..” in Washington, transmitted over the wire, and decoded back to D in Baltimore]
Low and High Information Content Messages
• The more frequent a message is, the less information it conveys when it occurs
• Two weather forecast messages: Boston vs. LA
• In LA “Sunny” is a low information message and “cloudy” is a high information message
Harvard Grades
Less information in Harvard grades now than in recent past
Year   A    A-   B+   B    B-   C+   (% of grades)
2005   24   25   21   13   6    2
1995   21   23   20   14   8    3
1986   14   19   21   17   10   5
Fixed Length Codes (Block Codes)
• Example: 4 symbols, A, B, C, D: A=00, B=01, C=10, D=11
• In general, with n symbols, codes need to be of length lg n, rounded up
• For English text, 26 letters + space = 27 symbols, so length = 5 since 2^4 < 27 < 2^5 (replace all punctuation marks by space)
• AKA “block codes”
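A small Python sketch of the “lg n, rounded up” rule (the helper name is mine, not from the slides):

```python
import math

def block_code_length(num_symbols):
    # A fixed-length (block) code needs ceil(lg n) bits per symbol.
    return math.ceil(math.log2(num_symbols))

print(block_code_length(4))    # 2 bits for A, B, C, D
print(block_code_length(27))   # 5 bits for 26 letters + space
```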
Modeling the Message Source
Characteristics of the stream of messages coming from the source affect the choice of the coding method
We need a model for a source of English text that can be described and analyzed mathematically
[Diagram: Source → Destination]
How can we improve on block codes?
• Simple 4-symbol example: A, B, C, D
• If that is all we know, need 2 bits/symbol
• What if we know symbol frequencies? Use shorter codes for more frequent symbols
• Morse Code does something like this
• Example:

Symbol:  A    B    C    D
Freq:    .7   .1   .1   .1
Code:    0    100  101  110
Prefix Codes
• Only one way to decode left to right

Symbol:  A    B    C    D
Freq:    .7   .1   .1   .1
Code:    0    100  101  110
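To illustrate the “only one way to decode left to right” property, here is a minimal Python decoding sketch for the code above (function and variable names are my own):

```python
CODE = {"A": "0", "B": "100", "C": "101", "D": "110"}
DECODE = {bits: sym for sym, bits in CODE.items()}

def decode(bitstring):
    symbols, current = [], ""
    for bit in bitstring:
        current += bit
        # Because no codeword is a prefix of another, the first match
        # is the only possible one, so decoding is unambiguous.
        if current in DECODE:
            symbols.append(DECODE[current])
            current = ""
    return "".join(symbols)

print(decode("0100101110"))  # -> ABCD
```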
Minimum Average Code Length?
Average bits per symbol:

Symbol:   A    B    C    D
Freq:     .7   .1   .1   .1
Code 1:   0    100  101  110   →  .7·1 + .1·3 + .1·3 + .1·3 = 1.6
Code 2:   0    10   110  111   →  .7·1 + .1·2 + .1·3 + .1·3 = 1.5

1.5 bits/symbol (down from 2)
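The same arithmetic as a short Python sketch (names are illustrative):

```python
freqs = {"A": 0.7, "B": 0.1, "C": 0.1, "D": 0.1}
code1 = {"A": "0", "B": "100", "C": "101", "D": "110"}
code2 = {"A": "0", "B": "10", "C": "110", "D": "111"}

def average_length(code, freqs):
    # Average bits per symbol = sum of frequency * codeword length.
    return sum(p * len(code[s]) for s, p in freqs.items())

print(average_length(code1, freqs))  # ≈ 1.6
print(average_length(code2, freqs))  # ≈ 1.5
```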
Entropy of this source ≤ 1.5 bits/symbol

Symbol:  A    B    C    D
Freq:    .7   .1   .1   .1
Code:    0    10   110  111

Average length = .7·1 + .1·2 + .1·3 + .1·3 = 1.5
Possibly lower? How low?
Self-Information
• If a symbol S has frequency p, its self-information is H(S) = lg(1/p) = -lg p

S       A     B     C     D
p       .25   .25   .25   .25
H(S)    2     2     2     2

p       .7    .1    .1    .1
H(S)    .51   3.32  3.32  3.32
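A tiny Python check of the self-information values in the table (helper name is mine):

```python
import math

def self_information(p):
    # Self-information in bits of a symbol with frequency p: lg(1/p) = -lg p.
    return -math.log2(p)

print(self_information(0.25))  # 2.0
print(self_information(0.7))   # ≈ 0.51
print(self_information(0.1))   # ≈ 3.32
```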
First-Order Entropy of Source = Average Self-Information

S         A      B      C      D      -∑ p·lg p
p         .25    .25    .25    .25
-lg p     2      2      2      2
-p·lg p   .5     .5     .5     .5     2

p         .7     .1     .1     .1
-lg p     .51    3.32   3.32   3.32
-p·lg p   .360   .332   .332   .332   1.357
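A short sketch of the first-order entropy computation (function name is mine):

```python
import math

def first_order_entropy(freqs):
    # Average self-information: -sum of p * lg p over all symbols.
    return -sum(p * math.log2(p) for p in freqs)

print(first_order_entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0
print(first_order_entropy([0.7, 0.1, 0.1, 0.1]))      # ≈ 1.357
```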
Entropy, Compressibility, Redundancy
• Lower entropy: more redundant, more compressible, less information
• Higher entropy: less redundant, less compressible, more information
• A source of “yea”s and “nay”s takes 24 bits per symbol but contains at most one bit per symbol of information:
  010110010100010101000001 = yea
  010011100100000110101001 = nay
Entropy and Compression

Symbol:  A    B    C    D
Freq:    .7   .1   .1   .1
Code:    0    10   110  111

• Average length for this code = .7·1 + .1·2 + .1·3 + .1·3 = 1.5
• No code taking only symbol frequencies into account can be better than first-order entropy
• First-order entropy of this source = .7·lg(1/.7) + .1·lg(1/.1) + .1·lg(1/.1) + .1·lg(1/.1) = 1.357
• First-order entropy of English is about 4 bits/character based on “typical” English texts
• “Efficiency” of code = (entropy of source)/(average code length) = 1.357/1.5 ≈ 90%
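Putting the two numbers together, a minimal efficiency check in Python (variable names are illustrative):

```python
import math

freqs = {"A": 0.7, "B": 0.1, "C": 0.1, "D": 0.1}
code = {"A": "0", "B": "10", "C": "110", "D": "111"}

entropy = -sum(p * math.log2(p) for p in freqs.values())   # ≈ 1.357 bits/symbol
avg_len = sum(p * len(code[s]) for s, p in freqs.items())  # 1.5 bits/symbol
print(f"efficiency ≈ {entropy / avg_len:.0%}")             # ≈ 90%
```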
A Simple Prefix Code: Huffman Codes
• Suppose we know the symbol frequencies. We can calculate the (first-order) entropy. Can we design a code to match?
• There is an algorithm that transforms a set of symbol frequencies into a variable-length prefix code that achieves average code length approximately equal to the entropy.
• David Huffman, 1951
Huffman Code Example

Symbol:  A     B     C     D     E
Freq:    .35   .05   .2    .15   .25

[Tree construction, merging the two lowest-frequency nodes at each step:
 B(.05) + D(.15) → BD(.2);  BD(.2) + C(.2) → BCD(.4);
 A(.35) + E(.25) → AE(.6);  AE(.6) + BCD(.4) → ABCDE(1.0)]
Huffman Code Example (continued)

[Same tree, with 0 and 1 labels on each pair of branches:
 ABCDE(1.0): 0 → AE(.6), 1 → BCD(.4);  AE: 0 → A, 1 → E;
 BCD: 0 → BD(.2), 1 → C;  BD: 0 → B, 1 → D]

Resulting code:
A   00
B   100
C   11
D   101
E   01

Entropy: 2.12
Average length: 2.20
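The merging procedure shown in the example can be written as a short program. This is a minimal Python sketch (not the lecture's notation) that keeps a heap of partial code tables and merges the two lowest-frequency entries at each step:

```python
import heapq

def huffman_code(freqs):
    # Each heap entry: (total weight, tie-breaker, {symbol: partial codeword}).
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)   # two least-frequent subtrees
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]

freqs = {"A": 0.35, "B": 0.05, "C": 0.2, "D": 0.15, "E": 0.25}
code = huffman_code(freqs)
print(code)
print(sum(p * len(code[s]) for s, p in freqs.items()))  # average length ≈ 2.2
```

The result need not be bit-for-bit identical to the code on the slide (ties can be broken either way), but the codeword lengths (A, C, E: 2 bits; B, D: 3 bits) and the 2.2 bits/symbol average come out the same.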
Efficiency of Huffman Codes
• Huffman codes are as efficient as possible if only first-order information (symbol frequencies) is taken into account.
• The average length of a Huffman code is always within 1 bit/symbol of the entropy.
Second-Order Entropy
• Second-order entropy of a source is calculated by treating digrams (adjacent pairs of symbols) as single symbols according to their frequencies
• Occurrences of q and u are not independent, so it is helpful to treat qu as one symbol
• Second-order entropy of English is about 3.3 bits/character
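A rough Python illustration of the idea, under the assumption that “second-order” here means computing the entropy of the digram distribution and dividing by two to get bits per character:

```python
import math
from collections import Counter

def digram_entropy_per_char(text):
    # Treat overlapping adjacent pairs (digrams) as single symbols,
    # compute the entropy of that distribution, and divide by 2
    # to express the result in bits per character.
    digrams = [text[i:i + 2] for i in range(len(text) - 1)]
    total = len(digrams)
    counts = Counter(digrams)
    h = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return h / 2

sample = "the quick brown fox jumps over the lazy dog"
print(digram_entropy_per_char(sample))  # toy sample; a real estimate needs a large corpus
```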
How English Would Look
Based on letter frequencies alone:
• 0: xfoml rxkhrjffjuj zlpwcfwkcyj ffjeyvkcqsghyd qpaamkbzaacibzlhjqd
• 1: ocroh hli rgwr nmielwis eu ll nbnesebya th eei alhenhttpa oobttva
• 2: On ie antsoutinys are t inctore st be s deamy achin d ilonasive tucoowe at
• 3: IN NO IST LAT WHEY CRATICT FROURE BIRS GROCID PONDENOME OF DEMONSTURES OF THE REPTAGIN IS REGOACTIONA
How English Would Look
Based on word frequencies:
• 1) REPRESENTING AND SPEEDILY IS AN GOOD APT OR COME CAN DIFFERENT NATURAL HERE HE THE A IN CAME THE TO OF TO EXPERT GRAY COME TO FURNISHES THE LINE MESSAGE HAD BE THESE
• 2) THE HEAD AND IN FRONTAL ATTACK ON AN ENGLISH WRITER THAT THE CHARACTER OF THIS POINT IS THEREFORE ANOTHER METHOD FOR THE LETTERS THAT THE TIME OF WHO EVER TOLD THE PROBLEM FOR AN UNEXPECTED
What is the entropy of English?
• Entropy is the “limit” of the information per symbol using single symbols, digrams, trigrams, …
• Not really calculable because English is a finite language!
• Nonetheless it can be determined experimentally using Shannon’s game
• Answer: a little more than 1 bit/character
Shannon’s Remarkable 1948 paper
Shannon’s Source Coding Theorem
• No code can achieve efficiency greater than 1, but
• For any source, there are codes with efficiency as close to 1 as desired.
• The proof does not give a method to find the best codes. It just sets a limit on how good they can be.
Huffman coding used widely
• E.g. JPEGs use Huffman codes for the pixel-to-pixel changes in color values
• Colors usually change gradually, so there are many small numbers (0, 1, 2) in this sequence
• JPEGs sometimes use a fancier compression method called “arithmetic coding”
• Arithmetic coding produces 5% better compression
Why don’t JPEGs use arithmetic coding?
• Because it is patented by IBM:

United States Patent 4,905,297
Langdon, Jr., et al. February 27, 1990
Arithmetic coding encoder and decoder system

Abstract: Apparatus and method for compressing and de-compressing binary decision data by arithmetic coding and decoding wherein the estimated probability Qe of the less probable of the two decision events, or outcomes, adapts as decisions are successively encoded. To facilitate coding computations, an augend value A for the current number line interval is held to approximate …
What if Huffman had patented his code?