Coding and Entropy
Harvard QR48, February 3, 2010


Page 1: Coding and Entropy

Page 2: Squeezing out the “Air”

• Suppose you want to ship pillows in boxes and are charged by the size of the box
• Lossless data compression
• Entropy = lower limit of compressibility

Page 3: Claude Shannon (1916-2001)

“A Mathematical Theory of Communication” (1948)

Page 4: Communication over a Channel

  Source S (symbols) --> Coded Bits X --> [Channel] --> Received Bits Y --> Decoded Message T (symbols)

• Encode bits before putting them in the channel; decode bits when they come out of the channel
• E.g. the transformation from S into X changes “yea” --> 1, “nay” --> 0
• Changing Y into T does the reverse
• For now, assume no noise in the channel, i.e. X = Y

Page 5: Example: Telegraphy

Source: English letters --> Morse Code

  Washington: D --> -..  (encode)        Baltimore: -.. --> D  (decode)

Page 6: Low and High Information Content Messages

• The more frequent a message is, the less information it conveys when it occurs
• Two weather forecast messages: Boston vs. LA
• In LA, “sunny” is a low information message and “cloudy” is a high information message

Page 7: Harvard Grades

Less information in Harvard grades now than in the recent past:

  %      A    A-   B+   B    B-   C+
  2005   24   25   21   13    6    2
  1995   21   23   20   14    8    3
  1986   14   19   21   17   10    5

Page 8: Fixed-Length Codes (Block Codes)

• Example: 4 symbols A, B, C, D; A=00, B=01, C=10, D=11
• In general, with n symbols, codes need to be of length lg n, rounded up
• For English text, 26 letters + space = 27 symbols, so length = 5 since 2^4 < 27 < 2^5 (replace all punctuation marks by space)
• AKA “block codes”
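A minimal sketch of the rounding-up rule in Python (the helper name block_code_length is mine, not the lecture's):

import math

def block_code_length(n_symbols: int) -> int:
    """Bits per symbol for a fixed-length (block) code over n_symbols."""
    return math.ceil(math.log2(n_symbols))

# 4 symbols (A, B, C, D) need 2 bits; 27 symbols (a-z plus space) need 5,
# since 2^4 = 16 < 27 <= 32 = 2^5.
print(block_code_length(4))    # 2
print(block_code_length(27))   # 5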

Page 9: Modeling the Message Source

• Characteristics of the stream of messages coming from the source affect the choice of the coding method
• We need a model for a source of English text that can be described and analyzed mathematically

  [Diagram: Source --> Destination]
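One simple model along these lines draws each symbol independently from a fixed frequency table (a zeroth-order model). A toy sketch in Python; the frequency values below are illustrative placeholders, not figures from the lecture:

import random

# Rough, partial letter frequencies; a real model would cover all 27 symbols
# (26 letters plus space) with measured values.
freqs = {"e": 0.127, "t": 0.091, "a": 0.082, "o": 0.075, " ": 0.19}

def sample_source(n: int) -> str:
    """Emit n symbols independently, each drawn according to freqs."""
    symbols = list(freqs)
    weights = list(freqs.values())
    return "".join(random.choices(symbols, weights=weights, k=n))

print(sample_source(40))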

Page 10: How Can We Improve on Block Codes?

• Simple 4-symbol example: A, B, C, D
• If that is all we know, we need 2 bits/symbol
• What if we know symbol frequencies? Use shorter codes for more frequent symbols
• Morse Code does something like this
• Example:

  Symbol     A    B    C    D
  Frequency  .7   .1   .1   .1
  Code       0    100  101  110
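To see the saving concretely, here is a small Python sketch (the 10-symbol message is my own example, chosen to match the .7/.1/.1/.1 mix) comparing the 2-bit block code with the variable-length code above:

# The slide's variable-length code vs. the 2-bit block code,
# on a 10-symbol message with 7 A's, 1 B, 1 C, 1 D.
variable = {"A": "0", "B": "100", "C": "101", "D": "110"}
block    = {"A": "00", "B": "01", "C": "10", "D": "11"}

message = "AAAAAAABCD"

def encode(msg, code):
    return "".join(code[s] for s in msg)

print(len(encode(message, block)))     # 20 bits (2 per symbol)
print(len(encode(message, variable)))  # 16 bits (1.6 per symbol on this message)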

Page 11: Prefix Codes

• No codeword is a prefix of another, so there is only one way to decode, reading left to right

  Symbol     A    B    C    D
  Frequency  .7   .1   .1   .1
  Code       0    100  101  110
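A minimal left-to-right decoder for this code, sketched in Python; decode_table and the sample bitstring are my own illustration, not the lecture's:

code = {"A": "0", "B": "100", "C": "101", "D": "110"}
decode_table = {bits: sym for sym, bits in code.items()}

def decode(bitstring: str) -> str:
    """Decode left to right: because no codeword is a prefix of another,
    the first codeword that matches is the only possible one."""
    out, current = [], ""
    for bit in bitstring:
        current += bit
        if current in decode_table:
            out.append(decode_table[current])
            current = ""
    if current:
        raise ValueError("leftover bits: " + current)
    return "".join(out)

print(decode("0100101110"))  # "ABCD": 0 | 100 | 101 | 110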

Page 12: Minimum Average Code Length?

Average bits per symbol:

  Symbol     A    B    C    D
  Frequency  .7   .1   .1   .1
  Code 1     0    100  101  110    .7·1 + .1·3 + .1·3 + .1·3 = 1.6
  Code 2     0    10   110  111    .7·1 + .1·2 + .1·3 + .1·3 = 1.5

1.5 bits/symbol (down from 2)
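The same averages can be computed mechanically. A short Python sketch, with average_length as my own helper name:

freqs = {"A": 0.7, "B": 0.1, "C": 0.1, "D": 0.1}
code1 = {"A": "0", "B": "100", "C": "101", "D": "110"}
code2 = {"A": "0", "B": "10", "C": "110", "D": "111"}

def average_length(freqs, code):
    """Expected bits per symbol: sum of frequency * codeword length."""
    return sum(p * len(code[s]) for s, p in freqs.items())

print(round(average_length(freqs, code1), 3))  # 1.6
print(round(average_length(freqs, code2), 3))  # 1.5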

Page 13: Entropy of This Source ≤ 1.5 bits/symbol

  Symbol     A    B    C    D
  Frequency  .7   .1   .1   .1
  Code       0    10   110  111    .7·1 + .1·2 + .1·3 + .1·3 = 1.5

Possibly lower? How low?

Page 14: Self-Information

If a symbol S has frequency p, its self-information is H(S) = lg(1/p) = -lg p.

  S      A     B     C     D
  p      .25   .25   .25   .25
  H(S)   2     2     2     2

  p      .7    .1    .1    .1
  H(S)   .51   3.32  3.32  3.32
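The table values follow directly from the formula. A small Python check (self_information is my own helper name):

import math

def self_information(p: float) -> float:
    """Self-information of a symbol with frequency p, in bits: lg(1/p) = -lg p."""
    return -math.log2(p)

for p in (0.25, 0.7, 0.1):
    print(p, round(self_information(p), 2))
# 0.25 -> 2.0, 0.7 -> 0.51, 0.1 -> 3.32, matching the table above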

Page 15: First-Order Entropy of Source = Average Self-Information

  S         A      B      C      D       -∑ p·lg p
  p         .25    .25    .25    .25
  -lg p     2      2      2      2
  -p·lg p   .5     .5     .5     .5      2

  p         .7     .1     .1     .1
  -lg p     .51    3.32   3.32   3.32
  -p·lg p   .360   .332   .332   .332    1.357
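The column sums are just the same per-symbol quantities added up. A short Python check (first_order_entropy is my own helper name); the second distribution comes out at about 1.357 bits/symbol:

import math

def first_order_entropy(freqs) -> float:
    """First-order entropy: -sum of p * lg p over all symbols, in bits/symbol."""
    return -sum(p * math.log2(p) for p in freqs)

print(round(first_order_entropy([0.25, 0.25, 0.25, 0.25]), 3))  # 2.0
print(round(first_order_entropy([0.7, 0.1, 0.1, 0.1]), 3))      # 1.357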

Page 16: Entropy, Compressibility, Redundancy

• Lower entropy: more redundant, more compressible, less information
• Higher entropy: less redundant, less compressible, more information
• A source of “yea”s and “nay”s takes 24 bits per symbol (three 8-bit characters) but contains at most one bit per symbol of information

  010110010100010101000001 = yea
  010011100100000110101001 = nay

Page 17: Entropy and Compression

  Symbol     A    B    C    D
  Frequency  .7   .1   .1   .1
  Code       0    10   110  111

• Average length for this code = .7·1 + .1·2 + .1·3 + .1·3 = 1.5
• No code taking only symbol frequencies into account can be better than first-order entropy
• First-order entropy of this source = .7·lg(1/.7) + .1·lg(1/.1) + .1·lg(1/.1) + .1·lg(1/.1) ≈ 1.357
• First-order entropy of English is about 4 bits/character, based on “typical” English texts
• “Efficiency” of code = (entropy of source)/(average code length) = 1.357/1.5 ≈ 90%
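The "about 4 bits/character" figure can be estimated from any reasonably long sample of English restricted to letters and spaces. A rough Python sketch; the repeated pangram below is only a stand-in for a real text sample, so its estimate will not match English exactly:

import math
from collections import Counter

def text_entropy(text: str) -> float:
    """Estimate first-order entropy (bits/character) from observed character frequencies."""
    counts = Counter(text)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# A toy sample; on real English text (letters and spaces only) the estimate
# comes out near the 4 bits/character figure quoted above.
sample = "the quick brown fox jumps over the lazy dog " * 100
print(round(text_entropy(sample), 2))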

Page 18: A Simple Prefix Code: Huffman Codes

• Suppose we know the symbol frequencies. We can calculate the (first-order) entropy. Can we design a code to match?
• There is an algorithm that transforms a set of symbol frequencies into a variable-length prefix code that achieves average code length approximately equal to the entropy.
• David Huffman, 1951

Page 19: Huffman Code Example

Symbol frequencies: A = .35, B = .05, C = .2, D = .15, E = .25

Repeatedly merge the two lowest-frequency groups into a tree:

  B + D     --> BD     (.2)
  C + BD    --> BCD    (.4)
  A + E     --> AE     (.6)
  AE + BCD  --> ABCDE  (1.0)

Page 20: Huffman Code Example (continued)

Label the two branches at each node 0 and 1, then read each symbol’s code from root to leaf:

  A  00
  B  100
  C  11
  D  101
  E  01

  Entropy     2.12
  Ave length  2.20   (= .35·2 + .05·3 + .2·2 + .15·3 + .25·2)
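A compact sketch of Huffman's merge procedure in Python (huffman_code is my own helper name). For the frequencies above it reproduces the codeword lengths and the 2.20 average, though the particular 0/1 labels may differ from the slide:

import heapq
import math

def huffman_code(freqs):
    """Build a Huffman code: repeatedly merge the two lowest-frequency subtrees."""
    # Heap entries: (frequency, tiebreaker, {symbol: partial codeword}).
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, code1 = heapq.heappop(heap)
        p2, _, code2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in code1.items()}
        merged.update({s: "1" + c for s, c in code2.items()})
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]

freqs = {"A": 0.35, "B": 0.05, "C": 0.2, "D": 0.15, "E": 0.25}
code = huffman_code(freqs)
avg = sum(p * len(code[s]) for s, p in freqs.items())
entropy = -sum(p * math.log2(p) for p in freqs.values())
print(code)                                # codeword lengths 2, 3, 2, 3, 2 as on the slide
print(round(avg, 2), round(entropy, 2))    # 2.2 2.12

Ties in the merge order can produce different but equally good codes, which is why only the codeword lengths and the average are guaranteed to match the slide.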

Page 21: Efficiency of Huffman Codes

• Huffman codes are as efficient as possible if only first-order information (symbol frequencies) is taken into account.
• A Huffman code is always within 1 bit/symbol of the entropy.

Page 22: Second-Order Entropy

• Second-order entropy of a source is calculated by treating digrams as single symbols according to their frequencies
• Occurrences of q and u are not independent, so it is helpful to treat “qu” as one symbol
• Second-order entropy of English is about 3.3 bits/character
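One rough way to estimate this from text is to count adjacent pairs, take the entropy of that digram distribution, and divide by 2 to put it on a per-character basis. A Python sketch under that reading; the sample string is a placeholder, not real English data:

import math
from collections import Counter

def digram_entropy_per_char(text: str) -> float:
    """Second-order estimate: entropy of digrams (adjacent pairs), divided by 2
    to express it per character."""
    digrams = [text[i:i + 2] for i in range(len(text) - 1)]
    counts = Counter(digrams)
    total = len(digrams)
    h = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return h / 2

sample = "in english the letter q is almost always followed by u " * 50
print(round(digram_entropy_per_char(sample), 2))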

Page 23: How English Would Look, Based on Frequencies Alone

• 0: xfoml rxkhrjffjuj zlpwcfwkcyj ffjeyvkcqsghyd qpaamkbzaacibzlhjqd

• 1: ocroh hli rgwr nmielwis eu ll nbnesebya th eei alhenhttpa oobttva

• 2: On ie antsoutinys are t inctore st be s deamy achin d ilonasive tucoowe at

• 3: IN NO IST LAT WHEY CRATICT FROURE BIRS GROCID PONDENOME OF DEMONSTURES OF THE REPTAGIN IS REGOACTIONA


Page 24: How English Would Look, Based on Word Frequencies

• 1) REPRESENTING AND SPEEDILY IS AN GOOD APT OR COME CAN DIFFERENT NATURAL HERE HE THE A IN CAME THE TO OF TO EXPERT GRAY COME TO FURNISHES THE LINE MESSAGE HAD BE THESE

• 2) THE HEAD AND IN FRONTAL ATTACK ON AN ENGLISH WRITER THAT THE CHARACTER OF THIS POINT IS THEREFORE ANOTHER METHOD FOR THE LETTERS THAT THE TIME OF WHO EVER TOLD THE PROBLEM FOR AN UNEXPECTED


Page 25: What Is the Entropy of English?

• Entropy is the “limit” of the information per symbol using single symbols, digrams, trigrams, …
• Not really calculable, because English is a finite language!
• Nonetheless it can be determined experimentally using Shannon’s game
• Answer: a little more than 1 bit/character

Page 26: Shannon’s Remarkable 1948 Paper

Page 27: Shannon’s Source Coding Theorem

• No code can achieve efficiency greater than 1, but
• For any source, there are codes with efficiency as close to 1 as desired.
• The proof does not give a method to find the best codes. It just sets a limit on how good they can be.

Page 28: Huffman Coding Used Widely

• E.g. JPEGs use Huffman codes for the pixel-to-pixel changes in color values
• Colors usually change gradually, so there are many small numbers (0, 1, 2) in this sequence
• JPEGs sometimes use a fancier compression method called “arithmetic coding”
• Arithmetic coding produces about 5% better compression
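For flavor, here is the basic interval-narrowing idea behind arithmetic coding, sketched in Python with exact fractions. This is only a toy (a made-up four-symbol alphabet with fixed probabilities), not the adaptive binary scheme described in the IBM patent on the next slide:

from fractions import Fraction

# Toy alphabet with fixed probabilities (must sum to 1).
probs = {"a": Fraction(7, 10), "b": Fraction(1, 10), "c": Fraction(1, 10), "!": Fraction(1, 10)}

def intervals(probs):
    """Map each symbol to its half-open sub-interval [low, high) of [0, 1)."""
    out, low = {}, Fraction(0)
    for sym, p in probs.items():
        out[sym] = (low, low + p)
        low += p
    return out

def encode(message):
    """Narrow [low, high) once per symbol; any number in the final interval encodes the message."""
    ivals = intervals(probs)
    low, high = Fraction(0), Fraction(1)
    for sym in message:
        s_low, s_high = ivals[sym]
        width = high - low
        low, high = low + width * s_low, low + width * s_high
    return (low + high) / 2      # a representative point inside the interval

def decode(x, length):
    """Recover the message by asking which sub-interval x falls into at each step."""
    ivals = intervals(probs)
    out = []
    low, high = Fraction(0), Fraction(1)
    for _ in range(length):
        width = high - low
        for sym, (s_low, s_high) in ivals.items():
            if low + width * s_low <= x < low + width * s_high:
                out.append(sym)
                low, high = low + width * s_low, low + width * s_high
                break
    return "".join(out)

msg = "aaabca!"
x = encode(msg)
print(x, decode(x, len(msg)))   # decodes back to "aaabca!"

Real implementations use fixed-precision integer arithmetic and emit bits incrementally; the exact fractions here just keep the toy simple and correct.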

Page 29: Why Don’t JPEGs Use Arithmetic Coding?

Because it is patented by IBM:

  United States Patent 4,905,297
  Langdon, Jr., et al., February 27, 1990
  Arithmetic coding encoder and decoder system

  Abstract: Apparatus and method for compressing and de-compressing binary decision data by arithmetic coding and decoding wherein the estimated probability Qe of the less probable of the two decision events, or outcomes, adapts as decisions are successively encoded. To facilitate coding computations, an augend value A for the current number line interval is held to approximate …

What if Huffman had patented his code?