Page 1: INFORMATION THEORY Pui-chor Wong

INFORMATION THEORY

Pui-chor Wong

Page 2: INFORMATION THEORY Pui-chor Wong

Introduction

• Information theory deals with the amount, encoding, transmission and decoding of information. It deals with the measurement of information but not with its meaning.

• The motivation for information theory is provided by Shannon's coding theorem.

• Shannon's coding theorem: if a source has an information rate less than the channel capacity, there exists a coding procedure such that the source output can be transmitted over the channel with an arbitrarily small probability of error.

Page 3: INFORMATION THEORY Pui-chor Wong

SIGNALLING SYSTEM

• Input signals
• Encoded input
• Channel-transmitted information (a noise source will also be an input to this stage)
• Encoded output
• Output signals

Page 4: INFORMATION THEORY Pui-chor Wong

Information

Definition:

• If a message e has probability pe, its information I is given by:

  I = logb(1/pe)

where b is the base of the logarithm; with b = 2 the information is measured in bits.

Page 5: INFORMATION THEORY Pui-chor Wong

Example

Determine the information associated with an input set consisting of the 26 letters of the alphabet. Assume each letter is sent with the same probability.

Page 6: INFORMATION THEORY Pui-chor Wong

Solution:

• The probability of each letter is 1/26.
• Hence the information is

  I = log2(26) = 4.7 bits

Page 7: INFORMATION THEORY Pui-chor Wong

Entropy

• Entropy of an input symbol set (denoted by H(X)), is defined to be the average information of all symbols in the input and is measured in bits/symbol.

• This is a useful property in practical communication systems, since long sequences are usually transmitted from an information source.

• The average information can be obtained by weighting each information value I(xi) by the fraction of time it occurs (i.e. its probability p(xi)) and summing the partial products p(xi)I(xi) over all symbols.

Page 8: INFORMATION THEORY Pui-chor Wong

Formula
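In symbols, for an input set of N symbols with probabilities p(xi), the average information (entropy) is:

  H(X) = Σi p(xi) I(xi) = Σi p(xi) log2(1/p(xi))  bits/symbol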

Page 9: INFORMATION THEORY Pui-chor Wong

Comments on Entropy

• For an input set of N symbols, it can be shown that the entropy H(X) satisfies the relation: 0 ≤ H(X) ≤ log2(N)

• H(X) is the average amount of information necessary to specify which symbol has been generated.

Page 10: INFORMATION THEORY Pui-chor Wong

Matlab code

function my_entropy()
% Plot of entropy against probability for a 2-symbol set.
p = eps:0.01:1-eps;                    % avoid p = 0, where 0*log2(1/0) is undefined
H = p.*log2(1./p) + (1-p).*log2(1./(1-p));
clf
figure(1)
plot(p, H)
title('Entropy variation with probability')
xlabel('Probability')
ylabel('Entropy')
grid on

Page 11: INFORMATION THEORY Pui-chor Wong

Information rate

In order to specify the characteristics of a communication channel, one design criterion is the knowledge of how fast information is generated. This criterion is referred to as the information rate and is measured in bits per second.

If a source X emits symbols at a rate r symbols (messages) per second, then the information rate is defined to be

  R = rH(X) = average number of bits of information per second.

Page 12: INFORMATION THEORY Pui-chor Wong

Example

An analog signal of bandwidth B is sampled at the Nyquist rate (i.e. 2B samples/s). Assume the resulting samples are quantized into 4 levels Q1, Q2, Q3 and Q4 with probabilities p1 = p4 = 1/8 and p2 = p3 = 3/8. Determine the information rate of the source.

Page 13: INFORMATION THEORY Pui-chor Wong

Solution
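Using the probabilities above, the entropy of the quantizer output is:

  H(X) = 2 (1/8) log2(8) + 2 (3/8) log2(8/3)
       = 0.75 + 1.06
       ≈ 1.8 bits/symbol

The symbol (sample) rate is r = 2B samples/s.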

Page 14: INFORMATION THEORY Pui-chor Wong

continued..

The information rate R is:

  R = rH(X) = 2B(1.8) = 3.6B bits/s

As indicated earlier, maximum entropy occurs when each symbol is transmitted with equal probability. Hence if p = 1/4 for every level:

Page 15: INFORMATION THEORY Pui-chor Wong

continued..
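With p = 1/4 for every level, the entropy takes its maximum value and the corresponding information rate follows directly:

  Hmax(X) = log2(4) = 2 bits/symbol
  Rmax = rHmax(X) = 2B(2) = 4B bits/s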

Page 16: INFORMATION THEORY Pui-chor Wong

Source Coding

• Purpose:
  – To minimize the average bit rate required to represent the source by reducing the redundancy of the information source; or, alternatively, to increase the efficiency of transmission by reducing the bandwidth requirement.

Page 17: INFORMATION THEORY Pui-chor Wong

Code length

• If the binary code assigned to symbol xi by the encoder has a length ni (measured in bits), the average code length L per source symbol is given by:
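In symbols, weighting each code length ni by its symbol probability:

  L = Σi p(xi) ni   bits/symbol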

Page 18: INFORMATION THEORY Pui-chor Wong

Code Efficiency

• The code efficiency is defined as:
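In symbols:

  η = Lmin / L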

where Lmin is the minimum possible value of L. When the efficiency η approaches 1, the code is said to be efficient.

Page 19: INFORMATION THEORY Pui-chor Wong

Code Redundancy

• The code redundancy is defined as:
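In symbols, with η the code efficiency defined above:

  redundancy = 1 - η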

Page 20: INFORMATION THEORY Pui-chor Wong

Source coding theorem

• This states that for a DMS with entropy H(X), the average code word length L per symbol is bounded as:

  L ≥ H(X)

• With Lmin written as H(X), the code efficiency can be written as:

  η = H(X) / L

Page 21: INFORMATION THEORY Pui-chor Wong

Channel Capacity

• Hartley's law: Cc = 2B log2(M), where
  – Cc = capacity in bps
  – B = channel bandwidth in Hz
  – M = the number of levels for each signaling element

Page 22: INFORMATION THEORY Pui-chor Wong

...With Noise

• The number of signaling levels M is related to the signal-to-noise ratio as follows:

  M = (1 + S/N)^(1/2)

• The channel capacity can now be expressed as:

  Cc = 2B log2(M) = B log2(1 + S/N)

Page 23: INFORMATION THEORY Pui-chor Wong

Channel Coding

• Purpose:

– To design codes for the reliable transmission of digital information over noisy channels.

• The task of source coding is to represent the source information with the minimum of symbols. When a code is transmitted over a channel in the presence of noise, errors will occur. The task of channel coding is to represent the source information in a manner that minimizes the error probability in decoding.

• It is apparent that channel coding requires the use of redundancy. If all possible outputs of the channel correspond uniquely to a source input, there is no possibility of detecting errors in the transmission. To detect, and possibly correct errors, the channel code sequence must be longer than the source sequence. The rate R of a channel code is the average ratio of the source sequence length to the channel code length. Thus, R < 1.

Page 24: INFORMATION THEORY Pui-chor Wong

Code classification

xi Code 1 Code 2 Code 3 Code 4 Code 5 Code 6

X1 00 00 0 0 0 1

X2 01 01 1 10 01 01

X3 00 10 00 110 011 001

X4 11 11 11 111 0111 0001

Page 25: INFORMATION THEORY Pui-chor Wong

Code classification.. 2

Fixed-length codes: The code word length is fixed. Examples: Codes 1 and 2.

Variable-length codes: The code word length is not fixed. All codes except Codes 1 and 2.

Distinct codes: Each code word is distinguishable from the other code words. All except Code 1 (note that the code words for x1 and x3 are the same).

Prefix-free codes: No code word can be formed by adding code symbols to another code word. Thus, in a prefix-free code, no code word is a prefix of another. Codes 2, 4 and 6 are prefix-free.

Uniquely decodable codes: The original source sequence can be reconstructed perfectly from the encoded binary sequence. Code 3 is not uniquely decodable, since the sequence 1001 may correspond to x2x3x2 or x2x1x1x2.

Instantaneous codes: A uniquely decodable code is instantaneous if the end of any code word is recognizable without examining subsequent code symbols.

Page 26: INFORMATION THEORY Pui-chor Wong

Huffman encoding algorithm

• 1. Sort the source outputs in decreasing order of their probabilities.
• 2. Merge the two least-probable outputs into a single output whose probability is the sum of the corresponding probabilities.
• 3. If the number of remaining outputs is 2, go to the next step; otherwise go to step 1.
• 4. Arbitrarily assign 0 and 1 as code words for the 2 remaining outputs.
• 5. If an output is the result of a merger of two outputs in a preceding step, append a 0 and a 1 to the current code word to obtain the code words for the preceding outputs, then repeat step 5. If no output is preceded by another output in a preceding step, stop.
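A minimal MATLAB sketch of the steps above. The function names (huffman_demo, my_huffman) and the 5-symbol probability set are illustrative assumptions, not from the slides:

function huffman_demo()
% Huffman-encode an assumed 5-symbol source and print the code words.
p = [0.4 0.2 0.2 0.1 0.1];            % example symbol probabilities (assumed)
codes = my_huffman(p);
for i = 1:numel(p)
    fprintf('symbol %d (p = %.2f): %s\n', i, p(i), codes{i});
end
end

function codes = my_huffman(p)
% Build Huffman code words by repeatedly merging the two least-probable nodes.
n = numel(p);
codes = repmat({''}, 1, n);           % code word built up for each source symbol
groups = num2cell(1:n);               % source symbols contained in each node
probs = p;
while numel(probs) > 1
    [~, idx] = sort(probs);           % positions of the two smallest probabilities
    i1 = idx(1); i2 = idx(2);
    for s = groups{i1}, codes{s} = ['0' codes{s}]; end   % step 5: prepend 0
    for s = groups{i2}, codes{s} = ['1' codes{s}]; end   % step 5: prepend 1
    probs(i1) = probs(i1) + probs(i2);                   % step 2: merge the two nodes
    groups{i1} = [groups{i1} groups{i2}];
    probs(i2) = [];
    groups(i2) = [];
end
end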

Page 27: INFORMATION THEORY Pui-chor Wong

Example

Page 28: INFORMATION THEORY Pui-chor Wong

Error detection coding

• The theoretical limitations of coding are set by the results of information theory. These results are frustrating in that they offer little clue as to how the coding should be performed. Error detection coding is designed to permit the detection of errors. Once an error is detected, the receiver may ask for a re-transmission of the erroneous bits, or it may simply inform the recipient that the transmission was corrupted. In a binary channel, error-checking codes are called parity check codes.

• Our ability to detect errors depends on the code rate. A low rate has a high detection probability, but a high redundancy.

• The receiver will assign to the received codeword the preassigned codeword that minimizes the Hamming distance between the two words. If we wish to identify any pattern of n or less errors, the Hamming distance between the preassigned codewords must be n + 1 or greater.

Page 29: INFORMATION THEORY Pui-chor Wong

Single parity check code

A very common code is the single parity check code. This code appends to each K data bits an additional bit whose value is chosen to make the parity of the (K + 1)-bit word even (or odd). Such a choice is said to have even (odd) parity. With even (odd) parity, a single bit error will make the received word odd (even). The preassigned code words always have even (odd) parity, and hence are separated by a Hamming distance of 2 or more.
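A minimal MATLAB sketch of appending an even-parity bit, using an assumed 7-bit data word:

data = [1 0 1 1 0 0 1];            % K data bits (assumed example word)
parity = mod(sum(data), 2);        % 1 if the data word has odd weight
codeword = [data parity];          % K+1 bits with even overall parity
assert(mod(sum(codeword), 2) == 0) % any single bit error would make this odd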

Page 30: INFORMATION THEORY Pui-chor Wong

Some Math..

Suppose the bit error rate (BER) is p = 10^-4.

• P{single bit error} = p
• P{no error in a single bit} = (1 - p)
• P{no error in 8 bits} = (1 - p)^8
• P{undetected error in 8 bits} = 1 - (1 - p)^8 ≈ 7.9 x 10^-4

Page 31: INFORMATION THEORY Pui-chor Wong

continued..

• P{no error in a single bit} = (1 - p)
• P{no error in 9 bits} = (1 - p)^9
• P{single error in 9 bits} = 9 (P{single bit error})(P{no error in the other 8 bits}) = 9p(1 - p)^8
• P{undetected error in 9 bits} = 1 - P{no error in 9 bits} - P{single error in 9 bits}
  = 1 - (1 - p)^9 - 9p(1 - p)^8
  ≈ 3.6 x 10^-7

The addition of a parity bit has reduced the uncorrected error rate by three orders of magnitude.

Page 32: INFORMATION THEORY Pui-chor Wong

Hamming distance & weight

• Hamming distance d(ci, cj) or dij between codewords ci and cj: the number of positions in which ci and cj differ.

• Hamming weight w(ci) of a codeword ci: the number of non-zero elements in ci. It is equivalent to the Hamming distance between ci and 0 (0 being the all-zeros sequence).

Page 33: INFORMATION THEORY Pui-chor Wong

Example

• Compute the Hamming distance between the 2 code words, 101101 and 001100

  101101 ⊕ 001100 = 100001

  dij = w(100001) = 2

Page 34: INFORMATION THEORY Pui-chor Wong

Detection & Correction

• Error Detection:- It can be shown that to detect n bit errors, a coding scheme requires the use of codewords with a Hamming distance of at least n + 1.

• Error Correction:- It can also be shown that to correct n bit errors requires a coding scheme with at least a Hamming distance of 2n + 1 between the codewords.

Page 35: INFORMATION THEORY Pui-chor Wong

Example

• A code consists of 8 codewords:

  0001011, 1110000, 1000110, 1111011, 0110110, 1001101, 0111101, 0000000

• If 1101011 is received, what is the decoded codeword?

Page 36: INFORMATION THEORY Pui-chor Wong

Solution

• The decoded codeword is the codeword closest in Hamming distance to 1101011.

  d(1101011, 0001011) = w(1100000) = 2
  d(1101011, 1110000) = w(0011011) = 4
  d(1101011, 1000110) = w(0101101) = 4
  d(1101011, 1111011) = w(0010000) = 1
  d(1101011, 0110110) = w(1011101) = 5
  d(1101011, 1001101) = w(0100110) = 3
  d(1101011, 0111101) = w(1010110) = 4
  d(1101011, 0000000) = w(1101011) = 5

• Hence the decoded codeword is 1111011.
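A minimal MATLAB sketch of this minimum-distance decoding (variable names are illustrative):

C = ['0001011'; '1110000'; '1000110'; '1111011';
     '0110110'; '1001101'; '0111101'; '0000000'] - '0';   % codewords as a binary matrix
r = [1 1 0 1 0 1 1];                      % received word 1101011
d = sum(mod(C + r, 2), 2);                % Hamming distance from r to each codeword (r is added to every row)
[dmin, k] = min(d);
fprintf('decoded codeword: %s (distance %d)\n', char(C(k,:) + '0'), dmin);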

Page 37: INFORMATION THEORY Pui-chor Wong

Linear block codes

• An (n,k) block code is completely defined by M = 2^k binary sequences, each of fixed length n.
• Each of these sequences of fixed length n is referred to as a code word; k is the number of information bits.
• The code C thus consists of M code words:

  C = {c1, c2, c3, c4, … cM}

Page 38: INFORMATION THEORY Pui-chor Wong

continued..

• Practical codes are normally block codes. A block code converts a fixed-length block of k data bits to a fixed-length n-bit codeword, where n > k. The code rate Rc is:

  Rc = k/n

• and the redundancy of the code is 1 - Rc = 1 - k/n.

Page 39: INFORMATION THEORY Pui-chor Wong

operations

• Arithmetic operations involve addition and multiplication. These are performed according to the conventions of the arithmetic field. The elements used for codes are from a finite set generally referred to as a Galois field and denoted by GF(q), where q is the number of elements in that field.

• Binary codewords use 2 elements (0 and 1) and hence GF(2) is used. Arithmetic operations are performed modulo 2.

Page 40: INFORMATION THEORY Pui-chor Wong

Generator and Parity check Matrices

• The output of a linear binary block encoder (i.e. a codeword) for an (n,k) linear block code is a linear combination of a set of k basis vectors, each of length n, denoted by g1, g2, ….. gk.

• The vectors g1, g2, …..gk are not unique.

Page 41: INFORMATION THEORY Pui-chor Wong

continued..

• From linear algebra, the basis vectors, g1, g2, …..gk, can be represented as a matrix G defined as:

  G = [ g1; g2; … ; gk ]

i.e. a k-by-n matrix whose rows are the basis vectors.

Page 42: INFORMATION THEORY Pui-chor Wong

Denoting the k bits in Xm as: Xm = {xm1, xm2, xm3, ……. xmk} and the n bits in Cm as: Cm = {cm1, cm2, cm3, ……. cmn}, the code word Cm can be obtained using the generator matrix as: Cm = Xm G

Page 43: INFORMATION THEORY Pui-chor Wong

Example

• Given

  G = [ 1 0 1 0 0
        0 1 1 1 1 ]

• Determine the codewords for each of the input messages.

• The result is a (5,2) code obtained by mapping the information sequences {00, 01, 10, 11} to C = {00000, 01111, 10100, 11011}.
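A minimal MATLAB sketch that reproduces this mapping (the loop and variable names are illustrative):

G = [1 0 1 0 0;
     0 1 1 1 1];                        % generator matrix of the (5,2) code
for m = 0:3
    Xm = [bitget(m, 2) bitget(m, 1)];   % message bits, e.g. m = 2 -> [1 0]
    Cm = mod(Xm * G, 2);                % codeword = message times G, modulo 2
    fprintf('%d%d -> %s\n', Xm(1), Xm(2), sprintf('%d', Cm));
end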

Page 44: INFORMATION THEORY Pui-chor Wong

Any generator matrix G of an (n,k) code can be reduced by row operations (and column permutations) to the ‘systematic form’ defined as:

  G = [ Ik | P ]

where Ik is a k-by-k identity matrix and P is a k-by-(n-k) binary matrix. The P matrix determines the n-k redundant or parity check bits of the generated code word. For a given information bit sequence, the code word obtained using a systematic-form G matrix has its first k bits identical to the information bits, and the remaining n-k bits are linear combinations of the k information bits. The resulting (n,k) code is a systematic code.

Page 45: INFORMATION THEORY Pui-chor Wong

For a systematic binary code,

  G = [ Ik | P ]

  H = [ -P^T | I(n-k) ]

A further simplification can be obtained for a binary code: since -P^T = P^T (modulo 2), it follows that

  H = [ P^T | I(n-k) ]

Page 46: INFORMATION THEORY Pui-chor Wong

parity check matrix

The parity check matrix H is one which satisfies the following orthogonality principle:

  cm H^T = 0

where H^T is the transpose of H.
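A minimal MATLAB sketch checking this orthogonality for the (5,2) code from the earlier example, written as G = [I2 | P] (variable names are illustrative):

P = [1 0 0;
     1 1 1];
G = [eye(2) P];                         % systematic generator matrix of the (5,2) code
H = [P' eye(3)];                        % corresponding parity check matrix
C = mod([0 0; 0 1; 1 0; 1 1] * G, 2);   % all four codewords, one per row
disp(mod(C * H', 2))                    % every row is 0 0 0, i.e. cm*H^T = 0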

Page 47: INFORMATION THEORY Pui-chor Wong

Cyclic codes

• Definition:
  – A cyclic code is a linear block code with the extra condition that if c is a code word, a cyclic shift of it (e.g. a left circular shift) is also a code word.

• If C = {c1, c2, c3, c4, … cn}, then C(1) = {c2, c3, c4, … cn, c1} is a cyclically shifted version of C.

Page 48: INFORMATION THEORY Pui-chor Wong

Generation of cyclic codes

• For any (n,k) cyclic code, the code word polynomial c(p) corresponding to an information/message sequence X = {x1, x2, x3, ……. xk} can be obtained from a generator polynomial g(p) as follows:

  c(p) = X(p) g(p)

Page 49: INFORMATION THEORY Pui-chor Wong

The message and generator polynomials are:

  X(p) = Σi xi p^(k-i) = x1 p^(k-1) + x2 p^(k-2) + …… + x(k-1) p + xk

  g(p) = p^(n-k) + g(n-k-1) p^(n-k-1) + …… + g2 p^2 + g1 p + 1

The generator polynomial g(p) is always a factor of p^n + 1 (or p^n - 1) and is of degree n - k.

Page 50: INFORMATION THEORY Pui-chor Wong

Example

• A message bit sequence is given by [1010100101]. When the 3-bit parity sequence [111] is appended to the message sequence, determine:

• a) The code word • b) The polynomial representation of the

code word.

Page 51: INFORMATION THEORY Pui-chor Wong

Solution

• The new code word is: [1010100101111]
• This is a (13,10) code word.

  X(p) = p^9 + p^7 + p^5 + p^2 + 1

  r(p) = p^2 + p + 1

  c(p) = p^3 X(p) + r(p)
       = p^12 + p^10 + p^8 + p^5 + p^3 + p^2 + p + 1

Page 52: INFORMATION THEORY Pui-chor Wong

Example

• Generate a (7,4) cyclic code. The generator polynomial should be of degree n - k = 7 - 4 = 3. The polynomial has to divide p^7 + 1:

  p^7 + 1 = (p + 1)(p^3 + p^2 + 1)(p^3 + p + 1)

The only 3rd-degree polynomials which divide p^7 + 1 are p^3 + p^2 + 1 and p^3 + p + 1. If we choose

  g(p) = p^3 + p^2 + 1

Page 53: INFORMATION THEORY Pui-chor Wong

and multiply it by all polynomials of the form

  X(p) = x1 p^3 + x2 p^2 + x3 p + x4

for the information sequence { 0000, 0001, 0010,……1110, 1111 }, the corresponding code word sequence is:  { 0 0 0 0 0 0 0, 0 0 0 1 1 0 1, 0 0 1 1 0 1 0, 0 0 1 0 1 1 1, 0 1 1 0 1 0 0, 0 1 1 1 0 0 1, 0 1 0 1 1 1 0, 0 1 0 0 0 1 1, 1 1 0 1 0 0 0, 1 1 0 0 1 0 1, 1 1 1 0 0 1 0, 1 1 1 1 1 1 1, 1 0 1 1 1 0 0, 1 0 1 0 0 0 1, 1 0 0 0 1 1 0, 1 0 0 1 0 1 1 }
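A minimal MATLAB sketch that generates this set by polynomial multiplication over GF(2) (the loop variable and print formatting are illustrative):

g = [1 1 0 1];                              % g(p) = p^3 + p^2 + 1, highest power first
for m = 0:15
    X = [bitget(m,4) bitget(m,3) bitget(m,2) bitget(m,1)];   % x1 x2 x3 x4
    c = mod(conv(X, g), 2);                 % c(p) = X(p) g(p), coefficients mod 2
    fprintf('%s -> %s\n', sprintf('%d', X), sprintf('%d', c));
end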

Page 54: INFORMATION THEORY Pui-chor Wong

Generating Systematic cyclic codes

• An (n,k) codeword created by appending (n-k) redundant (parity check) bits to the k message bits is referred to as a systematic code.

• Generating a code word c of length n by appending (n-k) bits to the k message or data bits can be expressed as follows:

  c(p) = p^(n-k) X(p) + r(p)

  where:
  • c(p) is the polynomial representation of the code word
  • X(p) is the polynomial representation of the message bits
  • r(p) is the polynomial representation of the appended bits

Page 55: INFORMATION THEORY Pui-chor Wong

Implementation

• Multiply the message polynomial X(p) by p^(n-k).
• Divide p^(n-k) X(p) by g(p) (modulo-2 division) to obtain the remainder r(p). r(p) is the polynomial of the appended bits (parity check bits, frame check sequence).
• Add r(p) to p^(n-k) X(p). (The addition is performed modulo-2.)
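A minimal MATLAB sketch of these three steps; the function name systematic_cyclic_encode is illustrative, and coefficient vectors are given highest power first:

function c = systematic_cyclic_encode(msg, g)
% Append the (n-k) parity bits obtained by dividing p^(n-k)X(p) by g(p).
nk = numel(g) - 1;                  % degree of g(p) = number of parity bits
work = [msg zeros(1, nk)];          % coefficients of p^(n-k) * X(p)
for i = 1:numel(msg)                % modulo-2 long division by g(p)
    if work(i) == 1
        work(i:i+nk) = mod(work(i:i+nk) + g, 2);
    end
end
r = work(end-nk+1:end);             % remainder r(p): the parity check bits
c = [msg r];                        % code word = message bits followed by parity bits
end

For the example on the following slides, systematic_cyclic_encode([1 1 1 0 0 1 1 0], [1 1 0 0 1]) returns the code word [1 1 1 0 0 1 1 0 0 1 1 0].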

Page 56: INFORMATION THEORY Pui-chor Wong

Example

• Given a message sequence of 8 bits [11100110], obtain the (12,8) systematic code word given the generator polynomial

  g(p) = p^4 + p^3 + 1

Page 57: INFORMATION THEORY Pui-chor Wong

Solution

• Step 1: Multiply the message polynomial X(p) by p^(12-8) = p^4 to obtain:

  p^4 X(p) = p^4 (p^7 + p^6 + p^5 + p^2 + p) = p^11 + p^10 + p^9 + p^6 + p^5

Page 58: INFORMATION THEORY Pui-chor Wong

Solution..

• Step 2: Divide p^(n-k) X(p) by g(p) (modulo-2 division) to obtain the remainder r(p):

  (p^11 + p^10 + p^9 + p^6 + p^5) ÷ (p^4 + p^3 + 1)  leaves the remainder  r(p) = p^2 + p

• Step 3: Add r(p) to p^(n-k) X(p):

  c(p) = p^11 + p^10 + p^9 + p^6 + p^5 + p^2 + p

Hence the systematic code word is: [111001100110]