Proving Shannon’s Second Theorem



CSC310 Information Theory, Sam Roweis

Lecture 17:

Proving Shannon's Second Theorem

November 8, 2006


How Good Are Simple Codes?

Shannon's noisy coding theorem says we can get the probability of error in decoding a block, pB, arbitrarily close to zero when transmitting at any rate, R, below the capacity, C, if we use good codes of large enough length, N.

For repetition codes, as N increases, pB → 0, but R → 0 as well.

For Hamming codes, which we will study next week, as N increases, R → 1, but pB → 1 as well, since there's bound to be more than one error in a really big block.

[Figure: block error probability pB (log scale, from 0.1 down to 1e-15) plotted against rate (0 to 1), with points for the repetition codes R1, R3, R5, ..., R61, and a region labelled "more useful codes" at high rate and low error probability.]


Good Codes Aren't Easy to Find

In the 60 years since Shannon's noisy coding theorem, many schemes for creating codes have been found, but most of them don't allow one to reach the performance promised by the theorem.

They can still be useful. For example, error correction in computer memory necessarily works on fairly small blocks (e.g., 64 bits). Performance on bigger blocks is irrelevant.

But in other applications (computer networks, communication with spacecraft, digital television) we could use quite big blocks if it would help with error correction.

    How can we do this in practice?


Getting to Capacity for the BEC

We can get near-error-free transmission for the binary erasure channel, at any rate below capacity, using a practical method.

We use a linear [N, K] code, defined by a set of M = N - K parity-check equations (arithmetic mod 2):

c_{1,1} v_1 + c_{1,2} v_2 + ··· + c_{1,N} v_N = 0
c_{2,1} v_1 + c_{2,2} v_2 + ··· + c_{2,N} v_N = 0
...
c_{M,1} v_1 + c_{M,2} v_2 + ··· + c_{M,N} v_N = 0

For the BEC, any bit received as 0 or 1 is guaranteed to be correct. To decode, we fill in these known values in the equations above, and then try to solve for the unknown values, where the bit was received as an erasure (?).

This can be done either iteratively (as you will do in the upcoming assignment) or via more computationally intensive methods.
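As an illustration of the iterative approach, here is a minimal sketch of a "peeling" decoder (the function name, the use of None to mark erasures, and the [7,4] Hamming example are mine, not from the slides). It repeatedly looks for a parity check that involves exactly one erased bit and solves for it:

```python
def peel_decode(H, received):
    """Fill in erased bits by repeatedly finding a parity-check equation
    that involves exactly one still-unknown bit."""
    v = list(received)                          # working copy: bits are 0, 1, or None
    progress = True
    while progress and any(b is None for b in v):
        progress = False
        for row in H:                           # one parity-check equation
            unknown = [j for j, c in enumerate(row) if c == 1 and v[j] is None]
            if len(unknown) == 1:               # exactly one erased bit: solve for it
                j = unknown[0]
                known = sum(v[k] for k, c in enumerate(row) if c == 1 and k != j) % 2
                v[j] = known                    # the bit must make the check sum to 0 mod 2
                progress = True
    return v                                    # any remaining None means peeling got stuck

# Example: the three parity checks of a [7,4] Hamming code, with one erased bit.
H = [[1, 1, 1, 0, 1, 0, 0],
     [1, 1, 0, 1, 0, 1, 0],
     [1, 0, 1, 1, 0, 0, 1]]
print(peel_decode(H, [1, 0, None, 1, 0, 0, 1]))   # recovers the erased third bit (a 1)
```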


When Will BEC Decoding Succeed?

If the probability of an erasure is f, and N is large, there will very likely be around Nf erasures in the received data, since erasures occur independently on a BEC (this is just the Law of Large Numbers).

So the decoder will be solving M equations in U unknowns, where U is very likely to be near Nf.

These equations will be consistent, since the correct decoding is certainly a solution.

The correct decoding will be the unique solution, which the decoder is guaranteed to find, as long as U out of the M equations are independent.
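A quick sanity check, not on the original slide but immediate from the counts above: a unique solution needs U independent equations, and we only have M of them, so we need

M = N - K ≥ U ≈ Nf,   i.e.   R = K/N ≤ 1 - f,

and 1 - f is exactly the capacity of the binary erasure channel. Counting equations already rules out rates above capacity; the next two slides show that a randomly chosen code gets essentially all the way up to it.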


Picking the Code at Random

Suppose we pick a code specified by the parity-check coefficients, c_{ij}, at random. (!!!)

How likely is it that the equations that we need to solve to decode a transmission that has U erasures will have a unique solution?

Imagine randomly picking the parity-check equations after we receive the transmission with U erasures. How many equations would we expect to have to pick to get U independent equations?

Once we have i independent equations, the probability that the next equation picked will be dependent on these will be

2^i / 2^U = 1 / 2^{U-i}

since there are 2^i ways of combining the previous equations, and 2^U possible equations.
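One can check the 1/2^{U-i} figure empirically. The sketch below (illustrative; the sizes U and i and the helper gf2_rank are mine) encodes each parity check restricted to the U erased positions as an integer bitmask, collects i independent equations, and then estimates how often a freshly drawn equation is dependent on them:

```python
import random

def gf2_rank(rows):
    """Rank over GF(2) of equations encoded as integer bitmasks."""
    rows = list(rows)
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot == 0:
            continue                    # the all-zero equation adds nothing
        rank += 1
        low = pivot & -pivot            # lowest set bit of the pivot
        rows = [r ^ pivot if r & low else r for r in rows]
    return rank

U, i = 10, 6                            # illustrative sizes
rng = random.Random(1)

# Collect i independent random equations over the U unknowns.
eqs = []
while len(eqs) < i:
    e = rng.getrandbits(U)
    if gf2_rank(eqs + [e]) > len(eqs):
        eqs.append(e)

# Estimate the chance that one more random equation is dependent on them.
trials = 20000
dependent = sum(gf2_rank(eqs + [rng.getrandbits(U)]) == i for _ in range(trials))
print(dependent / trials, "vs predicted", 2**i / 2**U)
```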


Picking the Code at Random (Continued)

The expected number of dependent equations picked before we get U independent ones is

∑_{i=0}^{U-1} (1/2^{U-i}) / (1 - 1/2^{U-i})  =  ∑_{i=0}^{U-1} 1 / (2^{U-i} - 1)

    Reordering the terms, we can see that this is small:

1 + 1/3 + 1/7 + 1/15 + ···  <  2
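A two-line numerical check (illustrative, not from the slides) confirms that this expected overhead stays below 2 no matter how large U gets:

```python
# Expected number of dependent (wasted) picks before U independent equations
# are found: sum over i of 1/(2^(U-i) - 1).  It converges to about 1.606.
for U in (5, 10, 20, 50):
    print(U, sum(1 / (2**(U - i) - 1) for i in range(U)))
```

So on average fewer than two of the randomly picked parity checks are wasted, which is why a randomly chosen code needs only slightly more than Nf checks, i.e. can have rate close to 1 - f.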


What about the BSC?

A similar argument using randomly chosen codes is used in the proof of Shannon's noisy coding theorem for the BSC. We'll look at a sketch of this proof.

But unlike the random codes for the BEC (which are practical), the random codes used in this proof are completely impractical.

This is why we will need to consider random codes of a different kind, whose parity-check matrices are mostly zeros. These are the Low Density Parity Check codes, which can be used in practice, and allow near-error-free transmission at close to capacity.


Shannon's Noisy Coding Theorem for the BSC

Consider a BSC with error probability f < 1/2. The theorem says that for any ε > 0, and for any desired limit on error probability, δ > 0, there is a code of some length N whose rate, R, is at least C - ε, and for which the probability that nearest-neighbor decoding will decode a codeword incorrectly is less than δ.

We can now give a proof of this, which more or less follows the proof for general channels in Chapter 10 of MacKay's book.

The idea is based on showing that a randomly chosen code performs quite well, and hence that there must be specific codes which also perform quite well.


Strategy for Proving the Theorem

Rather than showing how to construct a specific code for given values of f, ε, and δ, we will consider choosing a code of a suitable length, N, and rate log2(M)/N, by picking M codewords at random from Z_2^N.

    We consider the following scenario:

1. We randomly pick a code, C, which we give to both the sender and the receiver.

2. The sender randomly picks a codeword x ∈ C, and transmits it through the channel.

3. The channel randomly generates an error pattern, n, and delivers y = x + n to the receiver.

4. The receiver decodes y to a codeword, x̂, that is nearest to y in Hamming distance.

If the probability that this process leads to x̂ ≠ x is less than δ, then there must be some specific code with error probability less than δ.
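The scenario above can be simulated directly for toy parameters. The sketch below (my own illustrative values of N, K, and f; only tiny codes are feasible, which is exactly the impracticality noted on the previous slide) draws a fresh random code each trial, sends a codeword through a BSC(f), and applies nearest-neighbor decoding:

```python
import random

def hamming(a, b):
    """Hamming distance between two equal-length bit lists."""
    return sum(x != y for x, y in zip(a, b))

def block_error_rate(N, K, f, trials, rng):
    """Estimate the probability of a decoding error, drawing a fresh
    random code of M = 2^K codewords for every trial."""
    errors = 0
    for _ in range(trials):
        code = [[rng.random() < 0.5 for _ in range(N)]
                for _ in range(2 ** K)]                   # 1. pick a random code C
        x = rng.choice(code)                              # 2. sender picks a codeword
        y = [b ^ (rng.random() < f) for b in x]           # 3. BSC flips each bit with prob. f
        x_hat = min(code, key=lambda c: hamming(c, y))    # 4. nearest-neighbor decoding
        errors += (x_hat != x)
    return errors / trials

rng = random.Random(0)
N, f = 16, 0.1                        # toy sizes; capacity is 1 - H2(0.1), about 0.53
for K in (4, 8, 10):                  # rates 0.25, 0.5, 0.625
    print(K / N, block_error_rate(N, K, f, trials=200, rng=rng))
# The error rate grows with the rate; N must be far larger than this toy value
# before the sharp below-capacity threshold promised by the theorem shows up.
```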


Rearranging the Order of Choices

It will be convenient to rearrange the order in which random choices are made, as follows:

1. We randomly pick one codeword, x, which is the one the sender transmits.

2. The channel randomly generates an error pattern, n, that is added to x to give the received data, y. Let the number of transmission errors (i.e., ones in n) be w.

3. We now randomly pick the other M - 1 codewords. If the Hamming distance from y of all these codewords is greater than w, nearest-neighbor decoding will make the correct choice.

The probability of the decoder making the wrong choice here is the same as before. Sneaky, huh?


The Typical Number of Errors

If N is large, we expect that close to Nf of the N bits in a codeword will be received in error. In other words, we expect the error vector, n, to contain close to Nf ones.

Specifically, the Law of Large Numbers tells us that for any β > 0, there is some value for N such that

P(f - β < w/N < f + β) ≥ 1 - δ/2

where w is the number of errors in n. We'll say that error vectors, n, for which f - β < w/N < f + β are typical.

[Diagram: the typical interval (f - β, f + β) marked on the w/N axis.]
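To see the Law of Large Numbers at work, here is a small exact computation of P(f - β < w/N < f + β) for a few block lengths, using the binomial distribution of w (the values of f and β are illustrative choices of mine):

```python
from math import exp, log, lgamma

def log_binom_pmf(N, w, f):
    """log P(exactly w errors in N uses of a BSC with flip probability f)."""
    return (lgamma(N + 1) - lgamma(w + 1) - lgamma(N - w + 1)
            + w * log(f) + (N - w) * log(1 - f))

f, beta = 0.1, 0.02
for N in (100, 1000, 10000):
    p_typical = sum(exp(log_binom_pmf(N, w, f))
                    for w in range(N + 1) if f - beta < w / N < f + beta)
    print(N, round(p_typical, 4))
# The probability of landing in (f - beta, f + beta) climbs towards 1 as N grows.
```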


How Many Typical Error Vectors Are There?

How many error vectors, n, are there for which f - β < w/N < f + β?

If n is such a typical error pattern with w errors, then

P(n) = f^w (1-f)^{N-w} > f^{N(f+β)} (1-f)^{N(1-f-β)}

Let J be the number of typical error vectors. Since the total probability of all these vectors must not exceed one, we must have

J f^{N(f+β)} (1-f)^{N(1-f-β)} < 1

and hence

J < f^{-N(f+β)} (1-f)^{-N(1-f-β)}

Equivalently,

J < 2^{N (H2(f) + β log2((1-f)/f))}
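The bound is easy to check numerically; the snippet below (illustrative values of f and β, chosen by me) counts the typical error vectors exactly and compares log2 J with the exponent in the bound:

```python
from math import comb, log2

def H2(p):
    """Binary entropy in bits."""
    return -p * log2(p) - (1 - p) * log2(1 - p)

f, beta = 0.1, 0.02
for N in (100, 1000):
    J = sum(comb(N, w) for w in range(N + 1) if f - beta < w / N < f + beta)
    bound = N * (H2(f) + beta * log2((1 - f) / f))
    print(N, round(log2(J), 1), "<", round(bound, 1))
```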


Decoding with Typical Error Patterns

The probability that the codeword nearest to y is the correct decoding will be at least as great as the probability that the following sub-optimal decoder decodes correctly:

If there is exactly one codeword x for which n = y - x has a typical number of ones, then decode to x; otherwise declare that decoding has failed.

    This sub-optimal decoder can fail in two ways:

The correct decoding, x, may correspond to an error pattern, n = y - x, that is not typical.

Some other codeword, x′, may exist for which the error pattern n′ = y - x′ is typical.
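For concreteness, the sub-optimal rule can be written as a few lines of code (a sketch in the same illustrative style as the earlier Monte Carlo snippet; `code` is a list of codewords given as 0/1 lists):

```python
def typical_set_decode(code, y, f, beta):
    """Return x if it is the unique codeword whose implied error pattern
    n = y - x (the positions where x and y differ) has a typical fraction
    of ones; otherwise return None to declare a decoding failure."""
    N = len(y)
    candidates = [x for x in code
                  if f - beta < sum(xi != yi for xi, yi in zip(x, y)) / N < f + beta]
    return candidates[0] if len(candidates) == 1 else None
```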


Bounding the Probability of Failure (I)

The total probability of decoding failure is less than the sum of the probabilities of failing in these two ways. We will try to limit each of these to δ/2.

We can choose N to be big enough that

P(f - β < w/N < f + β) ≥ 1 - δ/2

This ensures that the actual error pattern will be non-typical with probability less than δ/2.

We now need to limit the probability that some other codeword also corresponds to a typical error pattern.


Bounding the Probability of Failure (II)

The number of typical error patterns is

J < 2^{N (H2(f) + β log2((1-f)/f))}

For a random codeword, x′, other than the one actually transmitted, the corresponding error pattern given y will contain 0s and 1s that are independent and equally likely.

The probability that one such codeword will produce a typical error pattern is therefore

J / 2^N < 2^{-N (1 - H2(f) - β log2((1-f)/f))}

The probability that any of the other M - 1 codewords will correspond to a typical error pattern is bounded by M times this. We need this to be less than δ/2, i.e.

M · 2^{-N (1 - H2(f) - β log2((1-f)/f))} < δ/2


Finishing the Proof

Finally, we need to pick β, M, and N so that the two types of error have probabilities less than δ/2, and the rate, R, is at least C - ε.

We will let M = 2^{⌈(C-ε)N⌉}, and make sure N is large enough that R = ⌈(C-ε)N⌉/N < C.

With this value of M, we need

2^{⌈(C-ε)N⌉} · 2^{-N (1 - H2(f) - β log2((1-f)/f))} < δ/2

2^{-N (1 - H2(f) - ⌈(C-ε)N⌉/N - β log2((1-f)/f))} < δ/2

The channel capacity is C = 1 - H2(f), so 1 - H2(f) - ⌈(C-ε)N⌉/N = C - R is positive.

For a sufficiently small value of β, 1 - H2(f) - ⌈(C-ε)N⌉/N - β log2((1-f)/f) will also be positive. With this and a large enough N, the probabilities of both types of error will be less than δ/2, so the total error probability will be less than δ.
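To make the last step concrete, here is an illustrative back-of-the-envelope search (the values of f, ε, δ and the particular choice of β are mine, not the slides') for a block length N that makes the union-bound term smaller than δ/2:

```python
from math import ceil, log2

def H2(p):
    """Binary entropy in bits."""
    return -p * log2(p) - (1 - p) * log2(1 - p)

f, eps, delta = 0.1, 0.05, 1e-6
C = 1 - H2(f)
beta = eps / (4 * log2((1 - f) / f))      # small enough to keep the exponent positive

for N in range(100, 100001, 100):
    R = ceil((C - eps) * N) / N           # rate is at least C - eps
    exponent = 1 - H2(f) - R - beta * log2((1 - f) / f)
    if exponent > 0 and 2.0 ** (-N * exponent) < delta / 2:
        print("N =", N, " rate =", round(R, 3), " exponent =", round(exponent, 4))
        break
# This only covers the second failure mode; N must separately be large enough
# that a typical error pattern occurs with probability at least 1 - delta/2.
```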


Good Codes and Minimum Distance

Recall that for a code to be guaranteed to correct up to t errors, its minimum distance must be at least 2t + 1.

What's the minimum distance for the random codes used to prove the noisy coding theorem?

A random N-bit code is very likely to have minimum distance d of at most about N/2: if we pick two codewords randomly, about half their bits will differ. So these codes are likely not guaranteed to correct patterns of N/4 or more errors.

A BSC with error probability f will produce about Nf errors.

So for f > 1/4, we expect to get more errors than the code is guaranteed to correct. Yet we know these codes are good!

Conclusion: A code may be able to correct almost all patterns of t errors even if it can't correct all such patterns.