1
Digital Communication Systems ECS 452
Asst. Prof. Dr. Prapun [email protected]
2. Source Coding
Office Hours: BKD, 6th floor of Sirindhralai building
Monday 10:00-10:40, Tuesday 12:00-12:40, Thursday 14:20-15:30
Elements of a Digital Communication System
2
[Block diagram: Information Source → (Message) → Source Encoder (remove redundancy) → Channel Encoder (add systematic redundancy) → Digital Modulator → (Transmitted Signal) → Channel, subject to Noise & Interference → (Received Signal) → Digital Demodulator → Channel Decoder → Source Decoder → (Recovered Message) → Destination. The Source Encoder, Channel Encoder, and Digital Modulator form the Transmitter; the Digital Demodulator, Channel Decoder, and Source Decoder form the Receiver.]
System Under Consideration
3
[Same block diagram as above; the blocks under consideration in this chapter are the Source Encoder and Source Decoder.]
Main Reference
4
Cover, T. M., and Thomas, J. A., Elements of Information Theory, 2nd Edition, Wiley, 2006.
Chapters 2, 4, and 5.
"the jewel in Stanford's crown"
"One of the greatest information theorists since Claude Shannon (and the one most like Shannon in approach, clarity, and taste)."
(Both quotes describe the book's first author, Thomas M. Cover.)
English Alphabet (Non-Technical Use)
5
The ASCII Coded Character Set
6 [The ARRL Handbook for Radio Communications 2013]
[Figure: ASCII character chart, column headings 0, 16, 32, 48, 64, 80, 96, 112]
(American Standard Code for Information Interchange)
Example: ASCII Encoder
7
Character x    Codeword c(x)
⋮
E    1000101
L    1001100
O    1001111
V    1010110
⋮
Source Encoder
Information Source
"LOVE" → "1001100100111110101101000101"
MATLAB:
>> M = 'LOVE';
>> X = dec2bin(M,7);
>> X = reshape(X',1,numel(X))
X =
1001100100111110101101000101
Remark: numel(A) = prod(size(A)) (the number of elements in matrix A).
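Decoding is just the reverse reshape. A minimal MATLAB sketch (ours, not from the slides), assuming fixed-length 7-bit ASCII codewords as above:

>> X = '1001100100111110101101000101';
>> M = char(bin2dec(reshape(X,7,[])'))'   % split into 7-bit rows, convert back to characters
M =
LOVE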
English Redundancy: Ex. 1
8
J-st tr- t- r--d th-s s-nt-nc-.
English Redundancy: Ex. 2
9
yxx cxn xndxrstxnd whxt x xm wrxtxng xvxn xf x rxplxcx xll thx vxwxls wxth xn 'x' (t gts lttl hrdr f y dn't vn kn whr th vwls r).
English Redundancy: Ex. 3
10
To be, or xxx xx xx, xxxx xx xxx xxxxxxxx
Entropy Rate of Thai Text
11
Introduction to Data Compression
12 [ https://www.khanacademy.org/computing/computer-science/informationtheory/moderninfotheory/v/compressioncodes ]
Introduction to Data Compression
13 [ https://www.khanacademy.org/computing/computer-science/informationtheory/moderninfotheory/v/compressioncodes ]
ASCII: Source Alphabet of Size = 128
14 [The ARRL Handbook for Radio Communications 2013]
[Figure: ASCII character chart, column headings 0, 16, 32, 48, 64, 80, 96, 112]
(American Standard Code for Information Interchange)
Ex. Source alphabet of size = 4
15
Ex. DMS (1)
16
Source alphabet: X = {a, b, c, d, e}
a c a c e c d b c e
d a e e d a b b b d
b b a a b e b e d c
c e d b c e c a a c
a a e a c c a a d c
d e e a a c a a a b
b c a e b b e d b c
d e b c a e e d d c
d a b c a b c d d e
d c e a b a a c a d
Information Source
p_X(x) = 1/5 for x ∈ {a, b, c, d, e}, and 0 otherwise.
Approximately 20% are the letter 'a'. [GenRV_Discrete_datasample_Ex1.m]
Ex. DMS (1)
17 [GenRV_Discrete_datasample_Ex1.m]
clear all; close all;
S_X = 'abcde';
p_X = [1/5 1/5 1/5 1/5 1/5];
n = 100;
MessageSequence = datasample(S_X,n,'Weights',p_X)
MessageSequence = reshape(MessageSequence,10,10)
>> GenRV_Discrete_datasample_Ex1
MessageSequence =
eebbedddeceacdbcbedeecacaecedcaedabecccabbcccebdbbbeccbadeaaaecceccdaccedadabceddaceadacdaededcdcade
MessageSequence =
eeeabbacde
eacebeeead
bcadcccdce
bdcacccaed
ebabcbedac
dceeeacadd
dbccbdcbac
deecdedcca
eddcbaaedd
cecabacdae
Ex. DMS (2)
18
Source alphabet: X = {1, 2, 3, 4}
p_X(x) = 1/2 for x = 1; 1/4 for x = 2; 1/8 for x = 3 or 4; and 0 otherwise.
Information Source
Approximately 50% are the number '1'.
2 1 1 2 1 4 1 1 1 1
1 1 4 1 1 2 4 2 2 1
3 1 1 2 3 2 4 1 2 4
2 1 1 2 1 1 3 3 1 1
1 3 4 1 4 1 1 2 4 1
4 1 4 1 2 2 1 4 2 1
4 1 1 1 1 2 1 4 2 4
2 1 1 1 2 1 2 1 3 2
2 1 1 1 1 1 1 2 3 2
2 1 1 2 1 4 2 1 2 1
[GenRV_Discrete_datasample_Ex2.m]
Ex. DMS (2)
19 [GenRV_Discrete_datasample_Ex2.m]
clear all; close all;
S_X = [1 2 3 4]; p_X = [1/2 1/4 1/8 1/8];
n = 20;
MessageSequence = randsrc(1,n,[S_X;p_X]);
%MessageSequence = datasample(S_X,n,'Weights',p_X);

rf = hist(MessageSequence,S_X)/n; % Rel. freq. calculation
stem(S_X,rf,'rx','LineWidth',2)   % Plot rel. freq.
hold on
stem(S_X,p_X,'bo','LineWidth',2)  % Plot pmf
xlim([min(S_X)-1,max(S_X)+1])
legend('Rel. freq. from sim.','pmf p_X(x)')
xlabel('x')
grid on
[Plot: relative frequency from the simulation ('x' markers) vs. the pmf p_X(x) ('o' markers), for x = 1, 2, 3, 4.]
20
DMS in MATLAB

clear all; close all;
S_X = [1 2 3 4]; p_X = [1/2 1/4 1/8 1/8]; n = 1e6;
SourceString = randsrc(1,n,[S_X;p_X]);

rf = hist(SourceString,S_X)/n;    % Rel. freq. calculation
stem(S_X,rf,'rx','LineWidth',2)   % Plot rel. freq.
hold on
stem(S_X,p_X,'bo','LineWidth',2)  % Plot pmf
xlim([min(S_X)-1,max(S_X)+1])
legend('Rel. freq. from sim.','pmf p_X(x)')
xlabel('x')
grid on

Alternatively, we can also use
SourceString = datasample(S_X,n,'Weights',p_X);
[GenRV_Discrete_datasample_Ex.m]
A more realistic example of pmf:
21 [http://en.wikipedia.org/wiki/Letter_frequency]
Relative freq. of letters in the English language
A more realistic example of pmf:
22
Relative freq. of letters in the English language, ordered by frequency
[http://en.wikipedia.org/wiki/Letter_frequency]
Example: ASCII Encoder
23
Character x    Codeword c(x)
⋮
E    1000101
L    1001100
O    1001111
V    1010110
⋮
Source Encoder
Information Source
"LOVE" → "1001100100111110101101000101"
MATLAB:
>> M = 'LOVE';
>> X = dec2bin(M,7);
>> X = reshape(X',1,numel(X))
X =
1001100100111110101101000101
c(“L”) c(“O”) c(“V”) c(“E”)
Codebook
Remark: numel(A) = prod(size(A)) (the number of elements in matrix A).
The ASCII Coded Character Set
24 [The ARRL Handbook for Radio Communications 2013]
[Figure: ASCII character chart, column headings 0, 16, 32, 48, 64, 80, 96, 112]
A Byte (8 bits) vs. 7 bits
25
>> dec2bin('I Love ECS452',7)
ans =
1001001
0100000
1001100
1101111
1110110
1100101
0100000
1000101
1000011
1010011
0110100
0110101
0110010

>> dec2bin('I Love ECS452',8)
ans =
01001001
00100000
01001100
01101111
01110110
01100101
00100000
01000101
01000011
01010011
00110100
00110101
00110010

>> dec2bin('I Love You',8)
ans =
01001001
00100000
01001100
01101111
01110110
01100101
00100000
01011001
01101111
01110101
Geeky ways to express your love
26
>> dec2bin('i love you',8)
ans =
01101001
00100000
01101100
01101111
01110110
01100101
00100000
01111001
01101111
01110101
https://www.etsy.com/listing/91473057/binary-i-love-you-printable-for-your?ref=sr_gallery_9&ga_search_query=binary&ga_filters=holidays+-supplies+valentine&ga_search_type=all&ga_view_type=gallery
http://mentalfloss.com/article/29979/14-geeky-valentines-day-cards
https://www.etsy.com/listing/174002615/binary-love-geeky-romantic-pdf-cross?ref=sr_gallery_26&ga_search_query=binary&ga_filters=holidays+-supplies+valentine&ga_search_type=all&ga_view_type=gallery
https://www.etsy.com/listing/185919057/i-love-you-binary-925-silver-dog-tag-can?ref=sc_3&plkey=cdf3741cf5c63291bbc127f1fa7fb03e641daafd%3A185919057&ga_search_query=binary&ga_filters=holidays+-supplies+valentine&ga_search_type=all&ga_view_type=gallery
http://www.cafepress.com/+binary-code+long_sleeve_tees
Summary: Source Encoder
27
Source Encoder
Information Source
"LOVE" → "1001100100111110101101000101"
c("L") c("O") c("V") c("E")
• An encoder c(·) is a function that maps each symbol in the source alphabet to a corresponding (binary) codeword.
• The list for such mapping is called the codebook.
Source Symbol x    Codeword c(x)
⋮
E    1000101
L    1001100
O    1001111
V    1010110
⋮
Discrete Memoryless Source (DMS)
• The codeword corresponding to a source symbol x is denoted by c(x); ℓ(x) denotes the length of c(x).
• Each codeword is constructed from a code alphabet. For binary codewords, the code alphabet is {0, 1}.
• source string → encoded string (the concatenation of the codewords of its symbols)
• The source alphabet is the collection of all possible source symbols.
• Each symbol that the source generates is assumed to be randomly selected from the source alphabet.
(without source extension)
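As a small illustration of the codebook idea, here is a table-lookup encoder in MATLAB (a sketch; the four-entry codebook below is hypothetical, restricted to the symbols of the running example rather than the full ASCII table):

% Hypothetical codebook with only the four symbols used in the example
codebook = containers.Map({'L','O','V','E'}, ...
                          {'1001100','1001111','1010110','1000101'});
msg = 'LOVE';
% Encode by looking up each source symbol and concatenating the codewords
encoded = strjoin(cellfun(@(ch) codebook(ch), num2cell(msg), ...
                          'UniformOutput', false), '')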
Morse code
28
Telegraph network
Samuel Morse, 1838
A sequence of on-off tones (or lights, or clicks)
(wired and wireless)
Example
29 [http://www.wolframalpha.com/input/?i=%22I+love+you.%22+in+Morse+code]
Example
30
Morse code: Key Idea
31
Frequently-used characters are mapped to short codewords.
Relative frequencies of letters in the English language
Morse code: Key Idea
32
Frequently-used characters (e,t) are mapped to short codewords.
Relative frequencies of letters in the English language
Morse code: Key Idea
33
Frequently-used characters (e,t) are mapped to short codewords.
Basic form of compression.
Thai Morse Code (รหัสมอร์สภาษาไทย)
34
Example: ASCII Encoder
35
Character    Codeword
⋮
E    1000101
L    1001100
O    1001111
V    1010110
⋮
Source Encoder
Information Source
"LOVE" → "1001100100111110101101000101"
MATLAB:
>> M = 'LOVE';
>> X = dec2bin(M,7);
>> X = reshape(X',1,numel(X))
X =
1001100100111110101101000101
Another Example of non-UD code
36
Suppose we want to convey the sequence of outcomes from rolling a die.
x c(x)
1 1
2 10
3 11
4 100
5 101
6 110
A sequence of throws such as 53214 is encoded as 10111101100
Another Example of non-UD code
37
Suppose we want to convey the sequence of outcomes from rolling a die.
x c(x)
1 1
2 10
3 11
4 100
5 101
6 110
The encoded string 11 could be interpreted as "11" (1, 1) or as "3" (11).
The encoded string 110 could be interpreted as "12" (1, 10) or as "6" (110).
Hence the code is not uniquely decodable.
Another Example of non-UD code
38
x c(x)
A 1
B 011
C 01110
D 1110
E 10011
Another Example of non-UD code
39
x c(x)
A 1
B 011
C 01110
D 1110
E 10011
Consider the encoded string 011101110011.
It can be interpreted as CDB (01110 1110 011) or as BABE (011 1 011 10011).
[ https://en.wikipedia.org/wiki/Sardinas%E2%80%93Patterson_algorithm ]
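The Sardinas–Patterson algorithm referenced above mechanizes this check. Below is a minimal MATLAB sketch (our implementation, not part of the course files): it grows sets of "dangling suffixes" and declares the code non-UD as soon as a codeword appears as a dangling suffix.

function UD = isUD(C)
% Sardinas-Patterson test (sketch). C: cell array of codeword strings,
% e.g. C = {'1','011','01110','1110','10011'} from the slide above.
S = dangling(C, C);       % S_1: dangling suffixes of the code against itself
seen = {};                % suffix sets already visited (for termination)
UD = true;
while ~isempty(S)
    if ~isempty(intersect(S, C)) % a codeword is a dangling suffix -> not UD
        UD = false; return
    end
    key = strjoin(sort(S), ','); % canonical signature of the current set
    if any(strcmp(key, seen))    % set repeats -> no new ambiguity -> UD
        return
    end
    seen{end+1} = key;                          %#ok<AGROW>
    S = union(dangling(C, S), dangling(S, C));  % next set of dangling suffixes
end
end

function D = dangling(A, B)
% Nonempty suffixes v such that b = [a v] for some a in A and b in B.
D = {};
for i = 1:numel(A)
    for j = 1:numel(B)
        a = A{i}; b = B{j};
        if length(b) > length(a) && strncmp(a, b, length(a))
            D{end+1} = b(length(a)+1:end);      %#ok<AGROW>
        end
    end
end
D = unique(D);
end

For example, isUD({'1','011','01110','1110','10011'}) returns false, consistent with the two interpretations of 011101110011 above; so does isUD({'1','10','11','100','101','110'}) for the die-rolling code.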
Game: 20 Questions
40
20 Questions is a classic game that has been played since the 19th century.
One person thinks of something (an object, a person, an animal, etc.)
The others playing can ask up to 20 yes/no questions in an effort to guess what it is.
20 Questions: Example
41
Shannon–Fano coding
42
Proposed in Shannon’s “A Mathematical Theory of Communication” in 1948
The method was attributed to Fano, who later published it as a technical report: Fano, R. M. (1949), "The Transmission of Information," Technical Report No. 65, Research Laboratory of Electronics at MIT, Cambridge, MA, USA.
Should not be confused with Shannon coding, the coding method used to prove Shannon's
noiseless coding theorem, or with Shannon–Fano–Elias coding (also known as Elias coding), the
precursor to arithmetic coding.
Prof. Robert Fano (1917-2016), Shannon Award (1976)
Huffman Code
43
MIT, 1951: an information theory class taught by Professor Fano. Huffman and his classmates were given the choice of
a term paper on the problem of finding the most efficient binary code, or
a final exam.
Huffman, unable to prove that any existing codes were the most efficient, was about to give up and start studying for the final when he hit upon the idea of using a frequency-sorted binary tree, and he quickly proved this method to be the most efficient.
Huffman avoided the major flaw of the suboptimal Shannon-Fano coding by building the tree from the bottom up instead of from the top down.
David Huffman (1925–1999)Hamming Medal (1999)
Huffman’s paper (1952)
44 [D. A. Huffman, "A Method for the Construction of Minimum-Redundancy Codes," Proceedings of the IRE, vol. 40, no. 9, pp. 1098-1101, Sept. 1952.] [http://ieeexplore.ieee.org/document/4051119/]
Summary:
45
[2.16-17] A good code must be uniquely decodable (UD). Unique decodability is difficult to check directly.
[2.24] Consider a special family of codes: prefix(-free) codes. They are always UD, and being prefix-free is the same as being instantaneous.
[Venn diagram: Huffman codes ⊂ prefix-free codes ⊂ UD codes ⊂ nonsingular codes ⊂ all codes]
[Defn 2.30] Huffman's recipe: repeatedly combine the two least-likely (combined) symbols. This automatically gives a prefix-free code. (A small numerical sketch follows this summary.)
[2.37] For a given source's pmf, Huffman codes are optimal [Defn 2.36] among all UD codes for that source.
[Defn 2.18] Prefix-free: no codeword is a prefix of any other codeword. Equivalently, the code is instantaneous: each source symbol can be decoded as soon as we come to the end of the codeword corresponding to it.
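Here is the promised numerical sketch of the recipe (our code, not from the slides). It computes only the expected codeword length: each merge of the two least-likely probabilities puts one extra bit on every leaf beneath the merged node, so the merge adds the combined probability to the expected length.

p = [0.5 0.25 0.125 0.125];  % pmf from Ex. 2.31
L = 0;                       % accumulates the expected codeword length
while numel(p) > 1
    p = sort(p);             % ascending: two least-likely probabilities first
    m = p(1) + p(2);         % combine the two least-likely (combined) symbols
    L = L + m;               % this merge adds 1 bit to every symbol beneath it
    p = [m, p(3:end)];       % replace the pair by the combined symbol
end
L                            % = 1.75 bits/symbol, matching Ex. 2.31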
Huffman coding
46 [ https://www.khanacademy.org/computing/computer-science/informationtheory/moderninfotheory/v/compressioncodes ]
Ex. Huffman Coding in MATLAB
47 [Huffman_Demo_Ex1]
Observe that MATLAB automatically gives the expected length of the codewords (EL).
pX = [0.5 0.25 0.125 0.125];    % pmf of X
SX = 1:length(pX);              % source alphabet
[dict,EL] = huffmandict(SX,pX); % create codebook

%% Pretty-print the codebook
codebook = dict;
for i = 1:length(codebook)
    codebook{i,2} = num2str(codebook{i,2});
end
codebook

%% Try to encode some random source string
n = 10; % number of source symbols to be generated
sourceString = randsrc(1,n,[SX; pX])           % create data using pX
encodedString = huffmanenco(sourceString,dict) % encode the data
[Ex. 2.31]
Ex. Huffman Coding in MATLAB
48
codebook = 
    [1]    '0'
    [2]    '1 0'
    [3]    '1 1 1'
    [4]    '1 1 0'
sourceString =
1 4 4 1 3 1 1 4 3 4
encodedString =
0 1 1 0 1 1 0 0 1 1 1 0 0 1 1 0 1 1 1 1 1 0
[Huffman_Demo_Ex1]
Ex. Huffman Coding in MATLAB
49 [Huffman_Demo_Ex2]
pX = [0.4 0.3 0.1 0.1 0.06 0.04]; % pmf of X
SX = 1:length(pX);                % source alphabet
[dict,EL] = huffmandict(SX,pX);   % create codebook

%% Pretty-print the codebook
codebook = dict;
for i = 1:length(codebook)
    codebook{i,2} = num2str(codebook{i,2});
end
codebook
EL
[Ex. 2.32]
The codewords can be different from our answers found earlier, but the expected length is the same.
>> Huffman_Demo_Ex2
codebook = 
    [1]    '1'
    [2]    '0 1'
    [3]    '0 0 0 0'
    [4]    '0 0 1'
    [5]    '0 0 0 1 0'
    [6]    '0 0 0 1 1'
EL =
2.2000
Ex. Huffman Coding in MATLAB
50
pX = [1/8, 5/24, 7/24, 3/8];    % pmf of X
SX = 1:length(pX);              % source alphabet
[dict,EL] = huffmandict(SX,pX); % create codebook

%% Pretty-print the codebook
codebook = dict;
for i = 1:length(codebook)
    codebook{i,2} = num2str(codebook{i,2});
end
codebook
EL
[Exercise]
codebook = 
    [1]    '0 0 1'
    [2]    '0 0 0'
    [3]    '0 1'
    [4]    '1'

EL =
    1.9583

>> -pX*(log2(pX)).'
ans =
    1.8956
Entropy and Description of RV
57 [ https://www.khanacademy.org/computing/computer-science/informationtheory/moderninfotheory/v/information-entropy ]
Entropy and Description of RV
58 [ https://www.khanacademy.org/computing/computer-science/informationtheory/moderninfotheory/v/information-entropy ]
Summary: Optimality of Huffman Codes
59
Consider a given DMS with known pmf. [Defn 2.36] A code is optimal if it is UD and its corresponding expected length is the shortest among all possible UD codes for that source.
[2.37] Huffman codes are optimal.
[Venn diagram: Huffman codes ⊂ prefix-free codes ⊂ UD codes ⊂ nonsingular codes ⊂ all codes]
[2.49-2.54] Bounds on expected lengths:
H(X) ≤ expected length (per source symbol) of an optimal code ≤ expected length (per source symbol) of a Huffman code < H(X) + 1
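A quick numerical check of these bounds (a sketch using huffmandict, as in the earlier demos) for the pmf of Example 2.32:

pX = [0.4 0.3 0.1 0.1 0.06 0.04];
[~, EL] = huffmandict(1:numel(pX), pX); % expected Huffman codeword length
H = -sum(pX .* log2(pX));               % entropy in bits
fprintf('H = %.4f <= EL = %.4f < H+1 = %.4f\n', H, EL, H+1)
% Prints: H = 2.1435 <= EL = 2.2000 < H+1 = 3.1435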
Summary: Entropy
60
Entropy measures the amount of uncertainty (randomness) in a RV.
Three formulas for calculating entropy:
[Defn 2.41] Given the pmf p_X of a RV X: H(X) ≡ -∑_x p_X(x) log2 p_X(x).
[2.44] Given a probability vector p: H(p) ≡ -∑_i p_i log2 p_i.
[Defn 2.47] Given a number p ∈ [0, 1], the binary entropy function: h(p) ≡ -p log2 p - (1-p) log2(1-p).
Convention: 0 log2 0 = 0.
[2.56] Operational meaning: the entropy of a random variable is the average length of its shortest description.
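A one-line MATLAB companion (our helper, not from the slides); restricting the sum to p > 0 implements the 0 log2 0 = 0 convention:

H = @(p) -sum(p(p>0) .* log2(p(p>0))); % entropy in bits; p(p>0) enforces 0*log2(0) = 0
H([0.5 0.25 0.125 0.125]) % = 1.75 (Ex. 2.31)
H([0.5 0.5 0])            % = 1; the zero-probability outcome contributes nothing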
Examples
61
Example 2.31: the Huffman code achieves E[ℓ(X)] = 1.75 = H(X). Efficiency = 100%.
Example 2.32: the Huffman code achieves E[ℓ(X)] = 2.2, while H(X) ≈ 2.14. Efficiency ≈ 97%.
Examples
62
Example 2.33: the Huffman code achieves E[ℓ(X)] = 2.3, while H(X) ≈ 2.29. Efficiency ≈ 99%.
Example 2.34 (source alphabet {A, B, C, D}): the Huffman code achieves E[ℓ(X)] = 2, while H(X) ≈ 1.86. Efficiency ≈ 93%.
Summary: Entropy
63
Important bounds: 0 ≤ H(X) ≤ log2|X|. The lower bound is achieved iff X is deterministic; the upper bound is achieved iff X is uniform.
The entropy of a uniform (discrete) random variable: H(X) = log2|X|.
The entropy of a Bernoulli random variable: H(X) = h(p), the binary entropy function.
Huffman Coding: Source Extension
66
[Plot: expected codeword length per source symbol, L_n, vs. the order of extension n (n = 1, ..., 8), for X_k i.i.d. Bernoulli(p) with p = 0.1; L_1 = 1, L_2 = 0.645, L_3 ≈ 0.533, decreasing toward H(X).]
[Ex.2.40]
Huffman Coding: Source Extension
67
[Plot: L_n vs. the order of source extension n for the same source (X_k i.i.d. Bernoulli(p), p = 0.1), together with the curves H(X) and H(X) + 1/n; L_n lies between the two curves and converges to H(X).]
[Ex.2.40]
Summary: Source Extension
68
The encoder operates on blocks of source symbols rather than on individual symbols.
[Defn 2.39] n-th extension coding: 1 block = n successive source symbols.
L_n = expected (average) codeword length per source symbol when Huffman coding is used on the n-th extension.
H(X) ≤ L_n < H(X) + 1/n, so L_n → H(X) as n → ∞.
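A minimal MATLAB sketch of n-th extension Huffman coding (our code; it assumes huffmandict as in the earlier demos and the Bernoulli source of Ex. 2.40), reproducing the values L_1 = 1 and L_2 = 0.645 quoted above:

p = 0.1;  % P(X_k = 1) for the i.i.d. Bernoulli source
for n = 1:3
    blocks = dec2bin(0:2^n-1) - '0';                   % all length-n bit patterns
    pBlock = prod(p.^blocks .* (1-p).^(1-blocks), 2)'; % pmf of one block (row)
    [~, ELblock] = huffmandict(1:2^n, pBlock);         % Huffman code on the blocks
    fprintf('n = %d: L_n = %.4f bits per source symbol\n', n, ELblock/n)
end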
[Plot (as on slide 67): L_n vs. the order of source extension n, with the bounds H(X) and H(X) + 1/n.]