
1

Zero Error

James L. Massey

Prof.-em. ETH Zürich; Adjunct Prof., Lund Univ., Sweden; Adjunct Prof., Tech. Univ. Denmark

Trondhjemsgade 3, 2TH, DK-2100 Copenhagen East

[email protected]

Information Theory Winter School 2007

La Colle sur Loup, 16 March 2007

2

The new results in these lectures were obtained in joint research with

Peter C. Massey

3

Since its founding by Shannon in 1948, information theory has mostly dealt with the small-error capacity C of channels.

C is the largest number such that, for every ε > 0 and every δ > 0, information bits from a Binary Symmetric Source (BSS) can, with the use of proper coding, be sent over the channel at a rate R > C - δ bits/channel-use and with bit error probability Pb < ε.

BSS = Monkey with a fair coin.

4

Example: The Binary Erasure Channel (BEC)

[Figure: the BEC with input X ∈ {0, 1} and output Y ∈ {0, ∆, 1}; each input is received correctly with probability 1 - p and erased to ∆ with probability p.]

As almost everybody knows,

C = 1 - p bits/use

Hereafter, we consider only Discrete Memoryless Channels (DMCs).

5

Peter Elias (1923-2001) and friends, Boston, 17 August 1998

6

In his 1956 paper, “The zero error capacity of a noisy channel”, Shannon defined the zero-error capacity Co as the largest number such that, for every δ > 0, K information bits from a Binary Symmetric Source can, with the use of a block code of length N, be sent over the channel at a rate R = K/N > Co - δ bits/channel-use and with block error probability PB = 0.

What is the zero-error capacity of the BEC?

N.B. When one deals with zero error, bit error probability and block error probability coincide.

7

First consider the special case of the BEC with p = 0.

[Figure: the BEC with p = 0; input 0 goes to output 0 and input 1 to output 1, each with probability 1.]

(We will not show impossible transitions.)

The output letter ∆ is unreachable for this BEC.

Obviously, Co = 1 bit/use.

8

What is the zero-error capacity of the BEC in the nontrivial case where p > 0 ?

Trivial Lemma: If you cannot send even one information bit over the channel with zero error, no matter how large the block length N, then Co = 0 bits/use.

To send one information bit, we need a code with only two codewords of length N, e.g., 0 0 … 0 and 1 1 … 1. But no matter which codeword we send, ∆ ∆ … ∆ will be received with nonzero probability and the decoder cannot then make a zero-error decision. Therefore, Co = 0 bits/use.

9

Let Y(x) denote the set of all output letters reachable (with positive probability) from the input letter x.

Theorem (Shannon, 1956): The zero-error capacity Co of a discrete memoryless channel is nonzero if and only if it has two input letters x and x’ whose reachable sets Y(x) and Y(x’) are disjoint.

N.B. If Co ≠ 0, then Co ≥ 1 bit/use; in fact R = 1 bit/use with zero error can be achieved with a block code of length N = 1.

The two input letters of the BEC with p > 0 do not have disjoint reachable sets because the erasure symbol ∆ is in both sets. Thus Co = 0.
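As an illustration, here is a minimal Python sketch of Shannon's test, assuming the channel is given as a matrix P[x][y] of transition probabilities; the function name and output ordering are ours, purely for illustration.

```python
import numpy as np

def zero_error_capacity_is_positive(P):
    """Shannon's 1956 test: Co > 0 iff some two input letters have
    disjoint reachable sets (no output reachable from both)."""
    reach = np.asarray(P) > 0           # reach[x, y]: y reachable from x
    n = reach.shape[0]
    return any(not np.any(reach[x] & reach[xp])
               for x in range(n) for xp in range(x + 1, n))

p = 0.1                                  # outputs ordered as 0, erasure, 1
bec = [[1 - p, p, 0.0],
       [0.0,   p, 1 - p]]
print(zero_error_capacity_is_positive(bec))                    # False: Co = 0
print(zero_error_capacity_is_positive([[1, 0, 0], [0, 0, 1]])) # True (p = 0)
```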

10

Shannon called two input letters x and x’ adjacent if their reachable sets Y(x) and Y(x’) are not disjoint, i.e., if there is an output letter reachable from both. (Shannon was usually a master at choosing appropriate terminology but, in my opinion, not this time.) I am going to use the term confusable instead of “adjacent”.

We will say that two or more channel input letters are non-confusable if no output letter can be reached from more than one of these inputs.

Similarly, we will say that two or more channel input sequences of the same length N (N > 0) are non-confusable if there is no output sequence of length N that can result with nonzero probability from transmitting two of these input sequences.

11

In our language, the previous theorem becomes:

Theorem (Shannon, 1956): The zero-error capacity Co of a discrete memoryless channel is nonzero if and only if it has two non-confusable input letters.

Shannon’s terminology may have been unfortunate, but Shannon was putting his finger on the right concept! When one is considering zero-error capacity, the only thing that counts about a transition probability is whether it is zero or not.

This makes computing zero-error capacity a combinatorial problem rather than an analytical probabilistic problem.

12

Claude Elwood Shannon (1916–2001) (photograph 17 April 1961 by Göran Einarsson)

Shannon liked ideas more than equations!

13

If the maximum number of non-confusable input letters is M, then the zero-error capacity of the channel satisfies

Co ≥ log2 M bits/use.
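The number M is the independence number of the confusability graph, which brute force can find for small alphabets. A hypothetical helper (ours, not from the lecture), under the same matrix convention as the earlier sketch:

```python
from itertools import combinations
import numpy as np

def max_nonconfusable_inputs(P):
    """Size of a largest set of pairwise non-confusable input letters,
    i.e., a maximum independent set in the confusability graph.
    Brute force: only sensible for small input alphabets."""
    reach = np.asarray(P) > 0
    n = reach.shape[0]
    for size in range(n, 0, -1):
        for subset in combinations(range(n), size):
            if all(not np.any(reach[a] & reach[b])
                   for a, b in combinations(subset, 2)):
                return size
    return 0

# noiseless binary channel: both inputs are non-confusable, so M = 2
print(max_nonconfusable_inputs([[1, 0], [0, 1]]))   # 2 -> Co >= 1 bit/use
```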

14

An example from Shannon:

[Figure: Shannon’s five-input, five-output channel; each input i reaches the two outputs i and i + 1 (mod 5), so each input is confusable only with its two cyclic neighbors.]

0 and 2 are non-confusable inputs.

1 and 3 are non-confusable inputs.

2 and 4 are non-confusable inputs.

0 and 3 are non-confusable inputs.

1 and 4 are non-confusable inputs.

No three input letters are non-confusable. Is Co = 1 bit/use ?

15

Shannon pointed out that if we use only the following pairs of inputs: 00, 12, 24, 31, 43, then the channel equivalently becomes a five-input compound channel.

[Figure: the five input pairs 00, 12, 24, 31, 43 and the length-2 output sequences reachable from each.]

These five inputs for the compound channel are non-confusable, so we know that

Co ≥ (log2 5)/2 ≈ 1.16 bits/use. Is this number Co ?
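One can verify Shannon's five pairs mechanically. A small sketch of ours, assuming the pentagon structure stated above (input i reaches outputs i and i + 1 mod 5):

```python
import numpy as np
from itertools import combinations
from math import log2

# assumed structure of Shannon's channel: input i reaches i and (i+1) mod 5
reach = np.zeros((5, 5), dtype=bool)
for i in range(5):
    reach[i, i] = reach[i, (i + 1) % 5] = True

def confusable(u, v):
    # memoryless channel: two input words can produce a common output word
    # iff their per-letter reachable sets intersect in every position
    return all(np.any(reach[a] & reach[b]) for a, b in zip(u, v))

words = [(0, 0), (1, 2), (2, 4), (3, 1), (4, 3)]
assert not any(confusable(u, v) for u, v in combinations(words, 2))
print(log2(5) / 2)   # ≈ 1.1610: five non-confusable words of length 2
```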

16

*László Lovász, “On the Shannon capacity of a graph,” IEEE Trans. Info. Theory, vol. IT-25, pp. 1-7, January 1979.

In a celebrated 1979 paper* that won the Information Theory Society’s annual paper award, Lovász proved that Co = (log2 5)/2 ≈ 1.16 bits/use for Shannon’s channel.

This was 23 years after Shannon’s paper was published!

17

In spite of what I’ve said until now, I’m not really interested in zero-error capacity except when there is a feedback channel available.

First we consider what Shannon called complete feedback, by which he meant that “there exists a return channel sending back from the receiving point to the transmitting point, without error, the letters actually received. It is assumed that this information is received at the transmitting point before the next letter is transmitted, and can be used, therefore, if desired, in choosing the next transmitted letter.”

Shannon wrote CoF to denote the zero-error capacity with complete feedback.

18

Example of the BEC:

[Figure: a BSS feeds a feedback-assisted encoder, which transmits over the BEC(p) to a decoder and sink; the received sequence, erasures included, is returned to the encoder over the complete-feedback line.]

(There is assumed to be a slight delay in the complete feedback line.)

The encoder is using the rule: transmit each information bit repeatedly until it is received unerased.

This coding scheme clearly gives zero error! Moreover, the rate of transmission is R = 1 - p bits/use (the probability of success on each transmission), so that the BEC is used 1/(1 - p) times on average for each info. bit.
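The scheme is easy to simulate; a sketch of ours (function name and parameters illustrative) confirming zero error and a rate near 1 - p:

```python
import random

def send_bits_over_bec_with_feedback(bits, p, rng):
    """Repeat each information bit until it arrives unerased; the complete
    feedback tells the encoder which transmissions were erased."""
    uses, decoded = 0, []
    for b in bits:
        while True:
            uses += 1
            if rng.random() >= p:     # transmission not erased
                decoded.append(b)
                break
    return decoded, uses

rng = random.Random(1)
p = 0.25
bits = [rng.getrandbits(1) for _ in range(100_000)]
decoded, uses = send_bits_over_bec_with_feedback(bits, p, rng)
assert decoded == bits                # zero error by construction
print(len(bits) / uses)               # ≈ 1 - p = 0.75 bits/use
```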

Can we conclude that CoF = 1 - p = C ???

19

Shannon says NO!

Shannon proved in 1956 that if Co = 0 then CoF = 0.

How can we reconcile this statement with the example of the BEC ???

20

Shannon’s definition allowed only “block” codes!

In his 1956 paper, “The zero error capacity of a noisy channel”, Shannon in fact defined the zero-error capacity CoF with complete feedback as the largest number such that, for every δ > 0, K information bits from a Binary Symmetric Source can be sent with N uses of the DMC with complete feedback at a rate R = K/N > CoF - δ bits/channel-use and with zero error.

But if Co = 0, then one cannot send one information bit with zero error over a BEC, even with an adaptive code of length N and no matter how large N may be, because the received sequence can be ∆ ∆ … ∆ for both values of the information bit. The same argument applies to any channel with Co = 0.

21

In 1948, Shannon used the following definition:

The small-error capacity C is the largest number such that, for every ε > 0 and every δ > 0, information bits from a Binary Symmetric Source can, with the use of a block code of sufficiently large length N, be sent over the channel at a rate R > C - δ bits/channel-use and with block error probability PB < ε.

When one deals with “small error”, there is no loss of generality in restricting oneself to block coding, but for “zero error” one loses generality. Shannon seems to have overlooked this point.

22

In his 1956 paper, Shannon proved that the small-error capacity of a discrete memoryless channel (the only kind of channel that we have been, and will be, talking about) is not increased by the availability of complete feedback. It may have seemed natural to him then that the zero-error capacity also should not be increased by the availability of complete feedback. Thus, he may not have reflected much over the generality of his restriction to “block” codes in his definition of CoF.

There are very few places in his enormous record of contributions to information theory that Shannon adopted an unnecessarily restrictive approach; this appears to be one of them.

23

We will use the definition:

The zero-error capacity CoFa with complete feedback is the largest number such that, for every δ > 0, information bits from a Binary Symmetric Source can, with the use of some adaptive coding scheme, be sent with zero error over the channel at rate R > CoFa - δ bits/channel-use and with average coding delay at most Da, Da < ∞, for every information bit.

(The “a” in CoFa is intended as a reminder that we are considering “average coding delay”.)

24

By the coding delay for an information bit, we mean the number of channel uses beginning when the information bit enters the encoder and ending when the information bit is assigned its value by the decoder.

Let Di be the coding delay for the ith information bit. For a block code of length N, Di ≤ N for all i.

Our previous example of adaptive coding for the BEC gives average coding delay 1/(1 - p) for every information bit and hence

CoFa = 1 - p = C.

25

We will say that an output letter y of a discrete memoryless channel is a disprover for the input letter x if y cannot be reached from x but can be reached from at least one other input letter.

[Figure: the BEC again; each input is received correctly with probability 1 - p and erased to ∆ with probability p.]

For the BEC with 0 ≤ p < 1, the output letter 0 is a disprover for the input letter 1, and the output letter 1 is a disprover for the input letter 0.
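Disprovers can likewise be found mechanically. An illustrative helper of ours, under the same matrix convention as the earlier sketches:

```python
import numpy as np

def disprovers(P):
    """All pairs (y, x) such that output y is a disprover for input x:
    y is unreachable from x but reachable from some other input."""
    reach = np.asarray(P) > 0
    n_in, n_out = reach.shape
    return [(y, x) for x in range(n_in) for y in range(n_out)
            if not reach[x, y] and np.any(reach[:, y])]

p = 0.1                                  # outputs ordered as 0, erasure, 1
bec = [[1 - p, p, 0.0],
       [0.0,   p, 1 - p]]
print(disprovers(bec))   # [(2, 0), (0, 1)]: output 1 disproves input 0,
                         # and output 0 disproves input 1
```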

26

The idea behind the definition of a “disprover”: if y is a disprover for x, then the appearance of y in the received sequence proves that the corresponding transmitted letter was not x.

What is the simplest noisy discrete memoryless channel with Co = 0 whose output letters include a disprover ?

Is it the Binary Erasure Channel ?

27

No! It is the Z-channel !

[Figure: the Z-channel; input 0 is always received as 0, while input 1 is received as 1 with probability 1 - p and as 0 with probability p (0 < p < 1).]

The output letter 1 is a disprover for the input letter 0.

Does this channel have CoFa > 0 ?

28

Yes!

Suppose we create a new channel by using the Z-channel for the input pairs 01 and 10 only.

[Figure: the derived channel; input pair 01 yields 01 with probability 1 - p, input pair 10 yields 10 with probability 1 - p, and either pair yields 00 with probability p, so 00 plays the role of the erasure symbol.]

Thus, by our previous result for the BEC, we know that for the Z-channel CoFa ≥ (1 - p)/2.
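A simulation sketch of this pairing trick (our code, illustrative parameters): two Z-channel uses behave like one BEC use with erasure output 00, and the repeat-until-unerased rule again gives zero error.

```python
import random

def z_channel(bit, p, rng):
    # Z-channel: 0 is always delivered; 1 becomes 0 with probability p
    return 0 if bit == 1 and rng.random() < p else bit

def send_bit_as_pair(b, p, rng):
    """Encode b as the pair 01 or 10; two Z-channel uses then act like
    one BEC use, with the received pair 00 playing the role of ∆."""
    pair = (0, 1) if b == 0 else (1, 0)
    out = tuple(z_channel(c, p, rng) for c in pair)
    if out == (0, 1): return 0
    if out == (1, 0): return 1
    return None                          # 00 received: erasure

rng = random.Random(7)
p, n_bits, uses, errors = 0.3, 50_000, 0, 0
for _ in range(n_bits):
    b = rng.getrandbits(1)
    r = None
    while r is None:                     # repeat until unerased, as for the BEC
        uses += 2
        r = send_bit_as_pair(b, p, rng)
    errors += (r != b)
print(errors, n_bits / uses)             # 0 errors; rate ≈ (1 - p)/2 = 0.35
```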

29

We can do this same “trick” for any discrete memoryless channel whose output alphabet contains at least one disprover y for some input x.

[Figure: input x’ reaches y with probability 1 - p and the set of all outputs except y with probability p, while input x reaches only outputs other than y (0 ≤ p < 1). Here x’ is any input letter that can reach y.]

It follows that any such channel has CoFa ≥ (1 - p)/2 > 0.

30

We have proved:

Theorem 1: For a discrete memoryless channel, CoFa > 0 if and only if the output alphabet contains at least one disprover y for some input letter x.

But what is the actual value of CoFa ? Is it C or is it in general smaller?

31

Theorem 2: If a discrete memoryless channel has CoFa > 0, i.e., if the output alphabet contains at least one disprover y for some input letter x, then CoFa = C (the small-error capacity).

Proof: We can use the disprover y for the input letter x to create a Z-channel with crossover probability p (0 ≤ p < 1) to the receiver. We can use two uses of this Z-channel to create a Binary Erasure Channel (BEC) with erasure probability p as shown on slide 26. We will use this BEC to send a one-bit ACK or NAK to the receiver in the following way each time a coded block is sent on the forward channel:

32

Suppose we have a block code of length N for sending K information bits in N uses of the forward channel with rate RB = K/N > C - δB and block error probability PB < εB.

After we send a block, we know from the complete feedback whether the decoding was correct or not. If yes, we use our created BEC to send a 1 (ACK) to the receiver without error with an average of D = 2/(1 - p) channel uses. If no, we use our created BEC to send a 0 (NAK) to the receiver without error, again with an average of D = 2/(1 - p) channel uses. The ACK informs the receiver that a new block will now be transmitted; the NAK informs the receiver that the previous block will be retransmitted.

33

The probability of success (an ACK) on each block transmission is 1 - PB, so the average transmission rate is

R = K(1 - PB)/(N + D) = RB(1 - PB)/(1 + D/N) ≥ (C - δB)(1 - εB)/(1 + D/N).

Thus, for any given δ > 0, we can choose δB and εB sufficiently small and N sufficiently large so that

R ≥ C - δ,

which proves Theorem 2.
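For concreteness, a small numeric instance of this bound (the parameter values are ours, purely illustrative):

```python
# illustrative parameter values (not from the lecture)
C = 0.5                      # small-error capacity of the forward channel
p = 0.25                     # erasure probability of the created BEC
D = 2 / (1 - p)              # average channel uses per zero-error ACK/NAK
dB, eB = 0.01, 0.001         # delta_B and epsilon_B for the block code
for N in (100, 1_000, 10_000):
    print(N, (C - dB) * (1 - eB) / (1 + D / N))
# -> roughly 0.4768, 0.4882, 0.4894, climbing toward C - dB = 0.49
```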

34

Consider now noiseless, but not necessarily complete, feedback.

How much noiseless (but not complete) feedback is needed to approach the zero-error capacity with feedback, CoFa ?

A first answer:

There are coding schemes using noiseless feedback for sending with zero error at any rate less than CoFa = C over a DMC with CoFa > 0 that come arbitrarily close to using only one binary digit of noiseless feedback for each bit of information transmitted on the forward channel.

35

In our ACK/NAK block coding scheme for approaching CoFa, K = NRB binary digits of noiseless feedback suffice for each block of length N transmitted on the forward channel. The receiver can simply send back the K decoded information bits over the noiseless channel. (This would mean that the receiver has to wait until decoding was complete to send the feedback and hence the sender would have to interleave the transmission of two blocks to keep the forward channel busy.) The repetitions of “NAK”ed blocks will cause the average number of binary digits sent on the noiseless feedback channel to slightly exceed K for each block of K information bits, but one can make this average approach K arbitrarily closely by choosing N sufficiently large.

36

Are there DMCs with CoFa > 0 and Co = 0 for which one can send with zero error using much less than one binary digit of noiseless feedback for each bit of information sent over the forward channel?

Is it possible to make the number of binary digits of noiseless feedback for each bit of information sent over the forward channel approach zero?

37

YES, to both questions.

Proposition 1: There are coding schemes using noiseless feedback for sending with zero error at any rate less than CoFa = C = 1 - p over a BEC with 0 < p < 1, in which the number of binary digits of noiseless feedback for each bit of information can be made to approach 0 arbitrarily closely.

38

The main idea: For block coding on the BEC, if there is only one codeword that agrees with the received word in all unerased positions, then the decoder can output this codeword with no possibility of error.

We will call such certain-to-be-correct decoding unambiguous decoding and denote its probability by PUA, and we write PA = 1 - PUA for the probability of ambiguous decoding.

Let PML be the probability of error for maximum-likelihood decoding.

39

Proposition 2: For block coding on a BEC with 0 < p < 1,

PA ≤ 2 PML.

Proof: Suppose y is the received block and x is a codeword. P(y|x) = 0 unless x agrees with y in all unerased positions, in which case P(y|x) = (1 - p)^(N-e) p^e, where e is the number of erasures. ⇒ every “all-agreeing” codeword is a valid choice for the maximum-likelihood decoding decision. The correct codeword must agree with y in all unerased positions. Thus the conditional probability of error for the ML decoder must be 0 when the decoding is unambiguous and is at least 1/2 when decoding is ambiguous, since at least two agreeing codewords are then equally likely. It follows that PML ≥ (1/2)PA.
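Proposition 2 can also be checked empirically. A Monte Carlo sketch of ours for a small even-weight code on the BEC, where the ML decoder may pick any agreeing codeword:

```python
import random

code = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]   # [3,2] even-weight code
p, trials = 0.3, 200_000
rng = random.Random(3)
ambiguous = ml_errors = 0
for _ in range(trials):
    x = rng.choice(code)
    y = [b if rng.random() >= p else None for b in x]  # None marks an erasure
    # candidates: codewords agreeing with y in every unerased position
    cands = [c for c in code
             if all(yb is None or yb == cb for yb, cb in zip(y, c))]
    ambiguous += (len(cands) > 1)
    ml_errors += (rng.choice(cands) != x)   # ML may pick any agreeing codeword
print(ambiguous / trials, 2 * ml_errors / trials)      # observe P_A <= 2 P_ML
```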

40

Suppose we have a block code of length N for sending K information bits in N uses of the BEC at rate RB = K/N > CoFa - δB = 1 - p - δB with maximum-likelihood block error probability PML < εB.

After receiving a block, the receiver sends a 1 (ACK) to the sender on the noiseless feedback channel if the decoding was unambiguous (which tells the sender to transmit a new block). If the decoding was ambiguous, the receiver sends a 0 (NAK) on the noiseless feedback channel (which tells the sender to retransmit the current block).

The probability that a NAK will be sent for any block is PA ≤ 2PML < 2εB. Thus the average number of binary digits of noiseless feedback for K = NRB information bits is less than 1/(1 - 2εB).

41

Do coding schemes for sending with zero error at rates approaching CoFa = C = 1 - p over a BEC with 0 < p < 1 actually require noiseless feedback?

Would a Z-channel with 0 < pz < 1 be good enough for the feedback channel?

Would any DMC with CoFa > 0 be good enough for the feedback channel?

Are there DMCs with CoFa = 0 that are good enough for the feedback channel?

42

The answers are NO, YES, YES and NO.

Suppose when the decoding is unambiguous the receiver feeds back a string of n 1’s (ACKs) to the sender on the Z-channel, and when the decoding is ambiguous feeds back a string of n 0’s (NAKs) on the Z-channel. The probability that at least one ACK will get through is 1 - pz^n. If the sender gets no ACK for a block, the sender repeats the block.

But this doesn’t quite work as is, because the receiver would not know in general whether the decoded block is a repetition or is a new block.

43

The sender solves this problem by making the first information bit in each block a toggle bit, which is initially set to 0 and which is complemented each time the sender transmits a new block.

[Figure: the block b1 b2 … bK (b1 the toggle bit, b2 … bK the true information bits) enters the block encoder, which produces the codeword c1 c2 … cN for the BEC.]

The receiver maintains a test bit, which is initially 0, and recognizes a new unambiguously decoded block by the fact that it yields a toggle bit equal to the test bit, following which recognition the receiver complements the test bit.
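The handshake logic can be sketched as follows (our abstraction: forward-block decoding is modeled only as an ambiguity coin flip with probability p_ambig, so this shows the toggle/test bookkeeping, not the coding itself):

```python
import random

def toggle_protocol(n_blocks, p_ambig, n, pz, rng):
    """Deliver n_blocks blocks with the toggle-bit handshake.
    ACK = a string of n ones on the Z feedback channel; any surviving 1
    proves to the sender that decoding was unambiguous."""
    toggle, test = 0, 0
    accepted = []
    for blk in range(n_blocks):
        acked = False
        while not acked:
            ambiguous = rng.random() < p_ambig
            if not ambiguous:
                if toggle == test:        # toggle matches test bit: new block
                    accepted.append(blk)
                    test ^= 1             # receiver complements its test bit
                # receiver sends n ones; each survives the Z-channel w.p. 1 - pz
                acked = any(rng.random() >= pz for _ in range(n))
            # if ambiguous, the receiver sends n zeros; the sender hears no 1
        toggle ^= 1                       # new block: sender flips the toggle
    return accepted

out = toggle_protocol(1_000, p_ambig=0.05, n=8, pz=0.3, rng=random.Random(11))
assert out == list(range(1_000))          # every block accepted once, in order
```

Note that even when an ACK is lost entirely (all n ones flipped, probability pz^n) and a decoded block is retransmitted, the receiver discards the repetition because its toggle bit no longer matches the test bit.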

44

The analysis is the same as for the case of noiseless feedback except that

• the probability of block retransmission, PA, is replaced by PA + (1 - PA)pz^n when computing the average number of uses of the feedback channel per block of information bits, and

• there are now K - 1 = NRB - 1 information bits, rather than K = NRB, in each block.

It should be obvious that, for proper choice of n, these changes have a negligible effect on the average rate of information transmission and that the average number of binary digits of noiseless feedback per information bit can still be made arbitrarily small.

45

Theorem 1 implies that, as illustrated on slide 27, every DMC with CoFa > 0 contains an embedded Z-channel and hence can be used as the feedback channel in the previously described zero-error CoFa-approaching coding scheme for the BEC.

If CoFa = 0, then it follows from Theorem 1 that the DMC output alphabet contains no disprovers, and hence from a received sequence on this channel one can never be certain that one of two possible codewords was not sent. Thus, such a DMC cannot be used as the feedback channel in any zero-error coding scheme for the BEC.

46

The foregoing results imply that for a two-way channel consisting of two BECs, i.e.,

[Figure: BEC #1 carries X1 to Y1 in one direction and BEC #2 carries X2 to Y2 in the other.]

one can transmit with zero error at rates R1 and R2 simultaneously approaching the capacities C1 and C2 of the individual BECs (because the ACK/NAK feedback transmissions can be made with an arbitrarily small fraction of channel usage).

47

Do coding schemes for sending with zero error at rates approaching CoFa > 0 with erasure feedback* over a DMC with Co = 0 require that this forward channel be an erasure channel?

*or, equivalently, with any DMC having Co = 0 and CoFa > 0 as the feedback channel, because every such channel can be converted to a BEC channel as shown on slides 27 and 28.

N.B. One can certainly send with zero error at some positive rate with erasure feedback over any DMC with Co = 0 and CoFa > 0 because such a channel can be converted to a BEC with positive capacity.

48

Our guess is that this forward channel must be an erasure channel, but we are not sure of this.

The next example suggests that we can come fairly close to capacity when the forward channel is a Z-channel.

49

Suppose the forward channel is a Z-channel with 0 < p < 1. If one uses a constant-weight block code, then P(y|x) = 0 unless the codeword x agrees with the received block y in all positions where y contains a 1, in which case

P(y|x) = p^(w - w’) (1 - p)^(w’),

where w is the Hamming weight of a codeword and w’ is the Hamming weight of y. Thus decoding is unambiguous if there is only one such codeword, and otherwise all (and at least two) such agreeing codewords are valid choices for a maximum-likelihood decoder. Again this implies PA ≤ 2PML. But it does not seem possible to reach the capacity of the Z-channel with such constant-weight coding.