1
Digital Communication Systems ECS 452
Asst. Prof. Dr. Prapun [email protected]
2. Source Coding
Office Hours: BKD, 6th floor of Sirindhralai building
Monday 10:00-10:40, Tuesday 12:00-12:40, Thursday 14:20-15:30
Elements of a Digital Communication System
2
[Block diagram: Information Source → (Message) → Source Encoder (remove redundancy) → Channel Encoder (add systematic redundancy) → Digital Modulator → (Transmitted Signal) → Channel, subject to Noise & Interference → (Received Signal) → Digital Demodulator → Channel Decoder → Source Decoder → (Recovered Message) → Destination. The Source Encoder, Channel Encoder, and Digital Modulator form the Transmitter; the Digital Demodulator, Channel Decoder, and Source Decoder form the Receiver.]
System Under Consideration
3
[Same block diagram as above; the blocks under consideration in this chapter are the Source Encoder and Source Decoder.]
Main Reference
4
Cover, T. M., and Thomas, J. A., Elements of Information Theory, 2nd Edition, Wiley, 2006.
Chapters 2, 4, and 5.
"the jewel in Stanford's crown"
"One of the greatest information theorists since Claude Shannon (and the one most like Shannon in approach, clarity, and taste)."
(Both quotes describe the book's first author, Thomas M. Cover.)
English Alphabet (Non-Technical Use)
5
The ASCII Coded Character Set
6 [The ARRL Handbook for Radio Communications 2013]
[Figure: ASCII character chart, column headings 0, 16, 32, 48, 64, 80, 96, 112]
(American Standard Code for Information Interchange)
Example: ASCII Encoder
7
Character x    Codeword c(x)
⋮
E    1000101
L    1001100
O    1001111
V    1010110
⋮
Source Encoder
Information Source
"LOVE" → "1001100100111110101101000101"
MATLAB:
>> M = 'LOVE';
>> X = dec2bin(M,7);
>> X = reshape(X',1,numel(X))
X =
1001100100111110101101000101
Remark: numel(A) = prod(size(A)) (the number of elements in matrix A).
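Decoding is just the reverse reshape. A minimal MATLAB sketch (ours, not from the slides), assuming fixed-length 7-bit ASCII codewords as above:

>> X = '1001100100111110101101000101';
>> M = char(bin2dec(reshape(X,7,[])'))'   % split into 7-bit rows, convert back to characters
M =
LOVE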
English Redundancy: Ex. 1
8
J-st tr- t- r--d th-s s-nt-nc-.
English Redundancy: Ex. 2
9
yxx cxn xndxrstxnd whxt x xm wrxtxng xvxn xf x rxplxcx xll thx vxwxls wxth xn 'x' (t gts lttl hrdr f y dn't vn kn whr th vwls r).
English Redundancy: Ex. 3
10
To be, or xxx xx xx, xxxx xx xxx xxxxxxxx
Entropy Rate of Thai Text
11
Introduction to Data Compression
12 [ https://www.khanacademy.org/computing/computer-science/informationtheory/moderninfotheory/v/compressioncodes ]
Introduction to Data Compression
13 [ https://www.khanacademy.org/computing/computer-science/informationtheory/moderninfotheory/v/compressioncodes ]
ASCII: Source Alphabet of Size = 128
14 [The ARRL Handbook for Radio Communications 2013]
[Figure: ASCII character chart, column headings 0, 16, 32, 48, 64, 80, 96, 112]
(American Standard Code for Information Interchange)
Ex. Source alphabet of size = 4
15
Ex. DMS (1)
16
Source alphabet: X = {a, b, c, d, e}
a c a c e c d b c e
d a e e d a b b b d
b b a a b e b e d c
c e d b c e c a a c
a a e a c c a a d c
d e e a a c a a a b
b c a e b b e d b c
d e b c a e e d d c
d a b c a b c d d e
d c e a b a a c a d
Information Source
p_X(x) = 1/5 for x ∈ {a, b, c, d, e}, and 0 otherwise.
Approximately 20% are the letter 'a'. [GenRV_Discrete_datasample_Ex1.m]
Ex. DMS (1)
17 [GenRV_Discrete_datasample_Ex1.m]
clear all; close all;
S_X = 'abcde';
p_X = [1/5 1/5 1/5 1/5 1/5];
n = 100;
MessageSequence = datasample(S_X,n,'Weights',p_X)
MessageSequence = reshape(MessageSequence,10,10)
>> GenRV_Discrete_datasample_Ex1
MessageSequence =
eebbedddeceacdbcbedeecacaecedcaedabecccabbcccebdbbbeccbadeaaaecceccdaccedadabceddaceadacdaededcdcade
MessageSequence =
eeeabbacde
eacebeeead
bcadcccdce
bdcacccaed
ebabcbedac
dceeeacadd
dbccbdcbac
deecdedcca
eddcbaaedd
cecabacdae
Ex. DMS (2)
18
Source alphabet: X = {1, 2, 3, 4}
p_X(x) = 1/2 for x = 1; 1/4 for x = 2; 1/8 for x = 3 or 4; and 0 otherwise.
Information Source
Approximately 50% are the number '1'.
2 1 1 2 1 4 1 1 1 1
1 1 4 1 1 2 4 2 2 1
3 1 1 2 3 2 4 1 2 4
2 1 1 2 1 1 3 3 1 1
1 3 4 1 4 1 1 2 4 1
4 1 4 1 2 2 1 4 2 1
4 1 1 1 1 2 1 4 2 4
2 1 1 1 2 1 2 1 3 2
2 1 1 1 1 1 1 2 3 2
2 1 1 2 1 4 2 1 2 1
[GenRV_Discrete_datasample_Ex2.m]
Ex. DMS (2)
19 [GenRV_Discrete_datasample_Ex2.m]
clear all; close all;
S_X = [1 2 3 4]; p_X = [1/2 1/4 1/8 1/8];
n = 20;
MessageSequence = randsrc(1,n,[S_X;p_X]);
%MessageSequence = datasample(S_X,n,'Weights',p_X);

rf = hist(MessageSequence,S_X)/n; % Rel. freq. calculation
stem(S_X,rf,'rx','LineWidth',2)   % Plot rel. freq.
hold on
stem(S_X,p_X,'bo','LineWidth',2)  % Plot pmf
xlim([min(S_X)-1,max(S_X)+1])
legend('Rel. freq. from sim.','pmf p_X(x)')
xlabel('x')
grid on
[Plot: relative frequency from the simulation ('x' markers) vs. the pmf p_X(x) ('o' markers), for x = 1, 2, 3, 4.]
20
DMS in MATLAB

clear all; close all;
S_X = [1 2 3 4]; p_X = [1/2 1/4 1/8 1/8]; n = 1e6;
SourceString = randsrc(1,n,[S_X;p_X]);

rf = hist(SourceString,S_X)/n;    % Rel. freq. calculation
stem(S_X,rf,'rx','LineWidth',2)   % Plot rel. freq.
hold on
stem(S_X,p_X,'bo','LineWidth',2)  % Plot pmf
xlim([min(S_X)-1,max(S_X)+1])
legend('Rel. freq. from sim.','pmf p_X(x)')
xlabel('x')
grid on

Alternatively, we can also use
SourceString = datasample(S_X,n,'Weights',p_X);
[GenRV_Discrete_datasample_Ex.m]
A more realistic example of pmf:
21 [http://en.wikipedia.org/wiki/Letter_frequency]
Relative freq. of letters in the English language
A more realistic example of pmf:
22
Relative freq. of letters in the English language, ordered by frequency
[http://en.wikipedia.org/wiki/Letter_frequency]
Example: ASCII Encoder
23
Character x    Codeword c(x)
⋮
E    1000101
L    1001100
O    1001111
V    1010110
⋮
Source Encoder
Information Source
"LOVE" → "1001100100111110101101000101"
MATLAB:
>> M = 'LOVE';
>> X = dec2bin(M,7);
>> X = reshape(X',1,numel(X))
X =
1001100100111110101101000101
c(“L”) c(“O”) c(“V”) c(“E”)
Codebook
Remark: numel(A) = prod(size(A)) (the number of elements in matrix A).
The ASCII Coded Character Set
24 [The ARRL Handbook for Radio Communications 2013]
[Figure: ASCII character chart, column headings 0, 16, 32, 48, 64, 80, 96, 112]
A Byte (8 bits) vs. 7 bits
25
>> dec2bin('I Love ECS452',7)
ans =
1001001
0100000
1001100
1101111
1110110
1100101
0100000
1000101
1000011
1010011
0110100
0110101
0110010

>> dec2bin('I Love ECS452',8)
ans =
01001001
00100000
01001100
01101111
01110110
01100101
00100000
01000101
01000011
01010011
00110100
00110101
00110010

>> dec2bin('I Love You',8)
ans =
01001001
00100000
01001100
01101111
01110110
01100101
00100000
01011001
01101111
01110101
Geeky ways to express your love
26
>> dec2bin('i love you',8)
ans =
01101001
00100000
01101100
01101111
01110110
01100101
00100000
01111001
01101111
01110101
https://www.etsy.com/listing/91473057/binary-i-love-you-printable-for-your?ref=sr_gallery_9&ga_search_query=binary&ga_filters=holidays+-supplies+valentine&ga_search_type=all&ga_view_type=gallery
http://mentalfloss.com/article/29979/14-geeky-valentines-day-cards
https://www.etsy.com/listing/174002615/binary-love-geeky-romantic-pdf-cross?ref=sr_gallery_26&ga_search_query=binary&ga_filters=holidays+-supplies+valentine&ga_search_type=all&ga_view_type=gallery
https://www.etsy.com/listing/185919057/i-love-you-binary-925-silver-dog-tag-can?ref=sc_3&plkey=cdf3741cf5c63291bbc127f1fa7fb03e641daafd%3A185919057&ga_search_query=binary&ga_filters=holidays+-supplies+valentine&ga_search_type=all&ga_view_type=gallery
http://www.cafepress.com/+binary-code+long_sleeve_tees
Summary: Source Encoder
27
Source Encoder
Information Source
"LOVE" → "1001100100111110101101000101"
c("L") c("O") c("V") c("E")
• An encoder c(·) is a function that maps each symbol in the source alphabet to a corresponding (binary) codeword.
• The list for such mapping is called the codebook.
Source Symbol x    Codeword c(x)
⋮
E    1000101
L    1001100
O    1001111
V    1010110
⋮
Discrete Memoryless Source (DMS)
• The codeword corresponding to a source symbol x is denoted by c(x); ℓ(x) denotes the length of c(x).
• Each codeword is constructed from a code alphabet. For binary codewords, the code alphabet is {0, 1}.
• source string → encoded string (the concatenation of the codewords of its symbols)
• The source alphabet is the collection of all possible source symbols.
• Each symbol that the source generates is assumed to be randomly selected from the source alphabet.
(without source extension)
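As a small illustration of the codebook idea, here is a table-lookup encoder in MATLAB (a sketch; the four-entry codebook below is hypothetical, restricted to the symbols of the running example rather than the full ASCII table):

% Hypothetical codebook with only the four symbols used in the example
codebook = containers.Map({'L','O','V','E'}, ...
                          {'1001100','1001111','1010110','1000101'});
msg = 'LOVE';
% Encode by looking up each source symbol and concatenating the codewords
encoded = strjoin(cellfun(@(ch) codebook(ch), num2cell(msg), ...
                          'UniformOutput', false), '')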
Morse code
28
Telegraph network
Samuel Morse, 1838
A sequence of on-off tones (or lights, or clicks)
(wired and wireless)
Example
29 [http://www.wolframalpha.com/input/?i=%22I+love+you.%22+in+Morse+code]
Example
30
Morse code: Key Idea
31
Frequently-used characters are mapped to short codewords.
Relative frequencies of letters in the English language
Morse code: Key Idea
32
Frequently-used characters (e,t) are mapped to short codewords.
Relative frequencies of letters in the English language
Morse code: Key Idea
33
Frequently-used characters (e,t) are mapped to short codewords.
Basic form of compression.
Thai Morse Code (รหัสมอร์สภาษาไทย)
34
Example: ASCII Encoder
35
Character    Codeword
⋮
E    1000101
L    1001100
O    1001111
V    1010110
⋮
Source Encoder
Information Source
"LOVE" → "1001100100111110101101000101"
MATLAB:
>> M = 'LOVE';
>> X = dec2bin(M,7);
>> X = reshape(X',1,numel(X))
X =
1001100100111110101101000101
Another Example of non-UD code
36
Suppose we want to convey the sequence of outcomes from rolling a die.
x c(x)
1 1
2 10
3 11
4 100
5 101
6 110
A sequence of throws such as 53214 is encoded as 10111101100
Another Example of non-UD code
37
Suppose we want to convey the sequence of outcomes from rolling a die.
x c(x)
1 1
2 10
3 11
4 100
5 101
6 110
The encoded string 11 could be interpreted as "11" (1, 1) or as "3" (11).
The encoded string 110 could be interpreted as "12" (1, 10) or as "6" (110).
Hence the code is not uniquely decodable.
Another Example of non-UD code
38
x c(x)
A 1
B 011
C 01110
D 1110
E 10011
Another Example of non-UD code
39
x c(x)
A 1
B 011
C 01110
D 1110
E 10011
Consider the encoded string 011101110011.
It can be interpreted as CDB (01110 1110 011) or as BABE (011 1 011 10011).
[ https://en.wikipedia.org/wiki/Sardinas%E2%80%93Patterson_algorithm ]
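The Sardinas–Patterson algorithm referenced above mechanizes this check. Below is a minimal MATLAB sketch (our implementation, not part of the course files): it grows sets of "dangling suffixes" and declares the code non-UD as soon as a codeword appears as a dangling suffix.

function UD = isUD(C)
% Sardinas-Patterson test (sketch). C: cell array of codeword strings,
% e.g. C = {'1','011','01110','1110','10011'} from the slide above.
S = dangling(C, C);       % S_1: dangling suffixes of the code against itself
seen = {};                % suffix sets already visited (for termination)
UD = true;
while ~isempty(S)
    if ~isempty(intersect(S, C)) % a codeword is a dangling suffix -> not UD
        UD = false; return
    end
    key = strjoin(sort(S), ','); % canonical signature of the current set
    if any(strcmp(key, seen))    % set repeats -> no new ambiguity -> UD
        return
    end
    seen{end+1} = key;                          %#ok<AGROW>
    S = union(dangling(C, S), dangling(S, C));  % next set of dangling suffixes
end
end

function D = dangling(A, B)
% Nonempty suffixes v such that b = [a v] for some a in A and b in B.
D = {};
for i = 1:numel(A)
    for j = 1:numel(B)
        a = A{i}; b = B{j};
        if length(b) > length(a) && strncmp(a, b, length(a))
            D{end+1} = b(length(a)+1:end);      %#ok<AGROW>
        end
    end
end
D = unique(D);
end

For example, isUD({'1','011','01110','1110','10011'}) returns false, consistent with the two interpretations of 011101110011 above; so does isUD({'1','10','11','100','101','110'}) for the die-rolling code.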
Game: 20 Questions
40
20 Questions is a classic game that has been played since the 19th century.
One person thinks of something (an object, a person, an animal, etc.)
The others playing can ask up to 20 yes/no questions in an effort to guess what it is.
20 Questions: Example
41
Shannon–Fano coding
42
Proposed in Shannon’s “A Mathematical Theory of Communication” in 1948
The method was attributed to Fano, who later published it as a technical report: Fano, R. M. (1949), "The Transmission of Information," Technical Report No. 65, Research Laboratory of Electronics at MIT, Cambridge, MA, USA.
Should not be confused with Shannon coding, the coding method used to prove Shannon's
noiseless coding theorem, or with Shannon–Fano–Elias coding (also known as Elias coding), the
precursor to arithmetic coding.
Prof. Robert Fano (1917-2016), Shannon Award (1976)
Huffman Code
43
MIT, 1951: an information theory class taught by Professor Fano. Huffman and his classmates were given the choice of
a term paper on the problem of finding the most efficient binary code, or
a final exam.
Huffman, unable to prove that any existing codes were the most efficient, was about to give up and start studying for the final when he hit upon the idea of using a frequency-sorted binary tree, and he quickly proved this method to be the most efficient.
Huffman avoided the major flaw of the suboptimal Shannon-Fano coding by building the tree from the bottom up instead of from the top down.
David Huffman (1925–1999)Hamming Medal (1999)
Huffman’s paper (1952)
44 [D. A. Huffman, "A Method for the Construction of Minimum-Redundancy Codes," Proceedings of the IRE, vol. 40, no. 9, pp. 1098-1101, Sept. 1952.] [http://ieeexplore.ieee.org/document/4051119/]
Summary:
45
[2.16-17] A good code must be uniquely decodable (UD). Unique decodability is difficult to check directly.
[2.24] Consider a special family of codes: prefix(-free) codes. They are always UD, and being prefix-free is the same as being instantaneous.
[Venn diagram: Huffman codes ⊂ prefix-free codes ⊂ UD codes ⊂ nonsingular codes ⊂ all codes]
[Defn 2.30] Huffman's recipe: repeatedly combine the two least-likely (combined) symbols. This automatically gives a prefix-free code. (A small numerical sketch follows this summary.)
[2.37] For a given source's pmf, Huffman codes are optimal [Defn 2.36] among all UD codes for that source.
[Defn 2.18] Prefix-free: no codeword is a prefix of any other codeword. Equivalently, the code is instantaneous: each source symbol can be decoded as soon as we come to the end of the codeword corresponding to it.
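Here is the promised numerical sketch of the recipe (our code, not from the slides). It computes only the expected codeword length: each merge of the two least-likely probabilities puts one extra bit on every leaf beneath the merged node, so the merge adds the combined probability to the expected length.

p = [0.5 0.25 0.125 0.125];  % pmf from Ex. 2.31
L = 0;                       % accumulates the expected codeword length
while numel(p) > 1
    p = sort(p);             % ascending: two least-likely probabilities first
    m = p(1) + p(2);         % combine the two least-likely (combined) symbols
    L = L + m;               % this merge adds 1 bit to every symbol beneath it
    p = [m, p(3:end)];       % replace the pair by the combined symbol
end
L                            % = 1.75 bits/symbol, matching Ex. 2.31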
Huffman coding
46 [ https://www.khanacademy.org/computing/computer-science/informationtheory/moderninfotheory/v/compressioncodes ]
Ex. Huffman Coding in MATLAB
47 [Huffman_Demo_Ex1]
Observe that MATLAB automatically gives the expected length of the codewords (EL).
pX = [0.5 0.25 0.125 0.125];    % pmf of X
SX = 1:length(pX);              % source alphabet
[dict,EL] = huffmandict(SX,pX); % create codebook

%% Pretty-print the codebook
codebook = dict;
for i = 1:length(codebook)
    codebook{i,2} = num2str(codebook{i,2});
end
codebook

%% Try to encode some random source string
n = 10; % number of source symbols to be generated
sourceString = randsrc(1,n,[SX; pX])           % create data using pX
encodedString = huffmanenco(sourceString,dict) % encode the data
[Ex. 2.31]
Ex. Huffman Coding in MATLAB
48
codebook = 
    [1]    '0'
    [2]    '1 0'
    [3]    '1 1 1'
    [4]    '1 1 0'
sourceString =
1 4 4 1 3 1 1 4 3 4
encodedString =
0 1 1 0 1 1 0 0 1 1 1 0 0 1 1 0 1 1 1 1 1 0
[Huffman_Demo_Ex1]
Ex. Huffman Coding in MATLAB
49 [Huffman_Demo_Ex2]
pX = [0.4 0.3 0.1 0.1 0.06 0.04]; % pmf of X
SX = 1:length(pX);                % source alphabet
[dict,EL] = huffmandict(SX,pX);   % create codebook

%% Pretty-print the codebook
codebook = dict;
for i = 1:length(codebook)
    codebook{i,2} = num2str(codebook{i,2});
end
codebook
EL
[Ex. 2.32]
The codewords can be different from our answers found earlier, but the expected length is the same.
>> Huffman_Demo_Ex2
codebook = 
    [1]    '1'
    [2]    '0 1'
    [3]    '0 0 0 0'
    [4]    '0 0 1'
    [5]    '0 0 0 1 0'
    [6]    '0 0 0 1 1'
EL =
2.2000
Ex. Huffman Coding in MATLAB
50
pX = [1/8, 5/24, 7/24, 3/8];    % pmf of X
SX = 1:length(pX);              % source alphabet
[dict,EL] = huffmandict(SX,pX); % create codebook

%% Pretty-print the codebook
codebook = dict;
for i = 1:length(codebook)
    codebook{i,2} = num2str(codebook{i,2});
end
codebook
EL
[Exercise]
codebook = 
    [1]    '0 0 1'
    [2]    '0 0 0'
    [3]    '0 1'
    [4]    '1'

EL =
    1.9583

>> -pX*(log2(pX)).'
ans =
    1.8956
Entropy and Description of RV
57 [ https://www.khanacademy.org/computing/computer-science/informationtheory/moderninfotheory/v/information-entropy ]
Entropy and Description of RV
58 [ https://www.khanacademy.org/computing/computer-science/informationtheory/moderninfotheory/v/information-entropy ]
Summary: Optimality of Huffman Codes
59
Consider a given DMS with known pmf. [Defn 2.36] A code is optimal if it is UD and its corresponding expected length is the shortest among all possible UD codes for that source.
[2.37] Huffman codes are optimal.
[Venn diagram: Huffman codes ⊂ prefix-free codes ⊂ UD codes ⊂ nonsingular codes ⊂ all codes]
[2.49-2.54] Bounds on expected lengths:
H(X) ≤ expected length (per source symbol) of an optimal code ≤ expected length (per source symbol) of a Huffman code < H(X) + 1
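A quick numerical check of these bounds (a sketch using huffmandict, as in the earlier demos) for the pmf of Example 2.32:

pX = [0.4 0.3 0.1 0.1 0.06 0.04];
[~, EL] = huffmandict(1:numel(pX), pX); % expected Huffman codeword length
H = -sum(pX .* log2(pX));               % entropy in bits
fprintf('H = %.4f <= EL = %.4f < H+1 = %.4f\n', H, EL, H+1)
% Prints: H = 2.1435 <= EL = 2.2000 < H+1 = 3.1435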
Summary: Entropy
60
Entropy measures the amount of uncertainty (randomness) in a RV.
Three formulas for calculating entropy:
[Defn 2.41] Given the pmf p_X of a RV X: H(X) ≡ -∑_x p_X(x) log2 p_X(x).
[2.44] Given a probability vector p: H(p) ≡ -∑_i p_i log2 p_i.
[Defn 2.47] Given a number p ∈ [0, 1], the binary entropy function: h(p) ≡ -p log2 p - (1-p) log2(1-p).
Convention: 0 log2 0 = 0.
[2.56] Operational meaning: the entropy of a random variable is the average length of its shortest description.
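A one-line MATLAB companion (our helper, not from the slides); restricting the sum to p > 0 implements the 0 log2 0 = 0 convention:

H = @(p) -sum(p(p>0) .* log2(p(p>0))); % entropy in bits; p(p>0) enforces 0*log2(0) = 0
H([0.5 0.25 0.125 0.125]) % = 1.75 (Ex. 2.31)
H([0.5 0.5 0])            % = 1; the zero-probability outcome contributes nothing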
Examples
61
Example 2.31: the Huffman code achieves E[ℓ(X)] = 1.75 = H(X). Efficiency = 100%.
Example 2.32: the Huffman code achieves E[ℓ(X)] = 2.2, while H(X) ≈ 2.14. Efficiency ≈ 97%.
Examples
62
Example 2.33: the Huffman code achieves E[ℓ(X)] = 2.3, while H(X) ≈ 2.29. Efficiency ≈ 99%.
Example 2.34 (source alphabet {A, B, C, D}): the Huffman code achieves E[ℓ(X)] = 2, while H(X) ≈ 1.86. Efficiency ≈ 93%.
Summary: Entropy
63
Important bounds: 0 ≤ H(X) ≤ log2|X|. The lower bound is achieved iff X is deterministic; the upper bound is achieved iff X is uniform.
The entropy of a uniform (discrete) random variable: H(X) = log2|X|.
The entropy of a Bernoulli random variable: H(X) = h(p), the binary entropy function.
Huffman Coding: Source Extension
66
[Plot: expected codeword length per source symbol, L_n, vs. the order of extension n (n = 1, ..., 8), for X_k i.i.d. Bernoulli(p) with p = 0.1; L_1 = 1, L_2 = 0.645, L_3 ≈ 0.533, decreasing toward H(X).]
[Ex.2.40]
Huffman Coding: Source Extension
67
[Plot: L_n vs. the order of source extension n for the same source (X_k i.i.d. Bernoulli(p), p = 0.1), together with the curves H(X) and H(X) + 1/n; L_n lies between the two curves and converges to H(X).]
[Ex.2.40]
Summary: Source Extension
68
The encoder operates on blocks of source symbols rather than on individual symbols.
[Defn 2.39] n-th extension coding: 1 block = n successive source symbols.
L_n = expected (average) codeword length per source symbol when Huffman coding is used on the n-th extension.
H(X) ≤ L_n < H(X) + 1/n, so L_n → H(X) as n → ∞.
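A minimal MATLAB sketch of n-th extension Huffman coding (our code; it assumes huffmandict as in the earlier demos and the Bernoulli source of Ex. 2.40), reproducing the values L_1 = 1 and L_2 = 0.645 quoted above:

p = 0.1;  % P(X_k = 1) for the i.i.d. Bernoulli source
for n = 1:3
    blocks = dec2bin(0:2^n-1) - '0';                   % all length-n bit patterns
    pBlock = prod(p.^blocks .* (1-p).^(1-blocks), 2)'; % pmf of one block (row)
    [~, ELblock] = huffmandict(1:2^n, pBlock);         % Huffman code on the blocks
    fprintf('n = %d: L_n = %.4f bits per source symbol\n', n, ELblock/n)
end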
[Plot (as on slide 67): L_n vs. the order of source extension n, with the bounds H(X) and H(X) + 1/n.]