24
6990 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 59, NO. 11, NOVEMBER 2013 Achievable Rates for Channels With Deletions and Insertions Ramji Venkataramanan, Member, IEEE, Sekhar Tatikonda, Senior Member, IEEE, and Kannan Ramchandran, Fellow, IEEE Abstract—This paper considers a binary channel with deletions and insertions, where each input bit is transformed in one of the following ways: it is deleted with probability , or an extra bit is added after it with probability , or it is transmitted unmodied with probability . A computable lower bound on the ca- pacity of this channel is derived. The transformation of the input sequence by the channel may be viewed in terms of runs as fol- lows: some runs of the input sequence get shorter/longer, some runs get deleted, and some new runs are added. It is difcult for the decoder to synchronize the channel output sequence to the trans- mitted codeword mainly due to deleted runs and new inserted runs. The main idea is a mutual information decomposition in terms of the rate achieved by a suboptimal decoder that determines the po- sitions of the deleted and inserted runs in addition to decoding the transmitted codeword. The mutual information between the channel input and output sequences is expressed as the sum of the rate achieved by this decoder and the rate loss due to its subopti- mality. Obtaining computable lower bounds on each of these quan- tities yields a lower bound on the capacity. The bounds proposed in this paper provide the rst characterization of achievable rates for channels with general insertions, and for channels with both dele- tions and insertions. For the special case of the deletion channel, the proposed bound improves on the previous best lower bound for deletion probabilities up to 0.3. Index Terms—Deletion channel, insertion channel, capacity lower bounds. I. INTRODUCTION C ONSIDER a binary input channel where for each bit (de- noted ), the output is generated in one of the following ways: 1) The bit is deleted with probability . 2) An extra bit is inserted after with probability . The extra bit is equal to (a duplication) with probability , and equal to (a complementary insertion) with probability . Manuscript received July 01, 2011; revised March 29, 2013; accepted June 21, 2013. Date of publication August 15, 2013; date of current version Oc- tober 16, 2013. This work was supported in part by the NSF under Grant CCF- 1017744. This paper was presented in part at the 2011 IEEE International Sym- posium on Information Theory and in part at the 2013 IEEE Information Theory Workshop. R. Venkataramanan was with the Department of Electrical Engineering, Yale University, New Haven, CT 06511 USA. He is now with the Department of Engineering, University of Cambridge, Cambridge CB2 1PZ, U.K. (e-mail: [email protected]). S. Tatikonda is with the Department of Electrical Engineering, Yale Univer- sity, New Haven, CT 06511 USA (e-mail: [email protected]). K. Ramchandran is with the Department of Electrical Engineering and Computer Science, University of California, Berkeley, CA 94704 USA (e-mail: [email protected]). Communicated by S. Diggavi, Associate Editor for Shannon Theory. Color versions of one or more of the gures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identier 10.1109/TIT.2013.2278181 3) No deletions or insertions occur, and the output is with probability . The channel acts independently on each bit. We refer to this channel as the InDel channel with parameters . 
If the input to the channel is a sequence of bits, the length of the output sequence will be close to for large due to the law of large numbers. Channels with synchronization errors can be used to model timing mismatch in communication systems. Channels with deletions and insertions also occur in magnetic recording [1]. The problem of synchronization also appears in le backup and le sharing [2], [3], where distributed nodes may have different versions of the same le which differ by a small number of edits. The edits may include deletions, insertions, and substitutions. The minimum communication rate required to synchronize the remote sources is closely related to the capacity of an associated synchronization channel. This con- nection is discussed at the end of this paper. The model above with corresponds to the deletion channel, which has been studied in several recent papers, e.g., [4]–[11]. When , we obtain the insertion channel with pa- rameters . The insertion channel with is the elemen- tary sticky channel [12], where all insertions are duplications. In this paper, we obtain lower bounds on the capacity of the InDel channel. Our starting point is the result of Dobrushin [13] for general synchronization channels which states that the ca- pacity is given by the maximum of the mutual information per bit between the input and output sequences. There are two chal- lenges to computing the capacity through this characterization. The rst is evaluating the mutual information, which is a dif- cult task because of the memory inherent in the joint distribu- tion of the input and output sequences. The second challenge is to optimize the mutual information over all input distributions. In this paper, we choose the input distribution to be the class of rst-order Markov processes and focus on the problem of evaluating the mutual information. It is known that rst-order Markov input distributions yield good capacity lower bounds for the deletion channel [4], [5] and the elementary sticky channel [12], both special cases of the InDel channel. This suggests they are likely to perform well on the InDel channel as well. The runs of a binary sequence are its alternating blocks of contiguous zeros and ones. First-order Markov sequences have runs that are independent, and the average run length can be controlled via the Markov parameter. This ts well with the techniques used in this paper, which are based on the relationship between input and output runs of the channel. For a synchronization channel, it is useful to think of the input and output sequences in terms of runs of symbols rather than individual symbols. If there was a one-to-one correspon- dence between the runs of the input sequence and those of 0018-9448 © 2013 IEEE

6990 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 59, …rv285/ITNov13_indel.pdf · 2013. 10. 18. · 6990 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 59, NO. 11, NOVEMBER 2013

  • Upload
    others

  • View
    6

  • Download
    0

Embed Size (px)

Citation preview

Page 1: 6990 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 59, …rv285/ITNov13_indel.pdf · 2013. 10. 18. · 6990 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 59, NO. 11, NOVEMBER 2013

6990 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 59, NO. 11, NOVEMBER 2013

Achievable Rates for Channels WithDeletions and Insertions

Ramji Venkataramanan, Member, IEEE, Sekhar Tatikonda, Senior Member, IEEE, andKannan Ramchandran, Fellow, IEEE

Abstract—This paper considers a binary channel with deletionsand insertions, where each input bit is transformed in one of thefollowing ways: it is deleted with probability , or an extra bit isadded after it with probability , or it is transmitted unmodifiedwith probability . A computable lower bound on the ca-pacity of this channel is derived. The transformation of the inputsequence by the channel may be viewed in terms of runs as fol-lows: some runs of the input sequence get shorter/longer, some runsget deleted, and some new runs are added. It is difficult for thedecoder to synchronize the channel output sequence to the trans-mitted codewordmainly due to deleted runs and new inserted runs.The main idea is a mutual information decomposition in terms ofthe rate achieved by a suboptimal decoder that determines the po-sitions of the deleted and inserted runs in addition to decodingthe transmitted codeword. The mutual information between thechannel input and output sequences is expressed as the sum of therate achieved by this decoder and the rate loss due to its subopti-mality. Obtaining computable lower bounds on each of these quan-tities yields a lower bound on the capacity. The bounds proposed inthis paper provide the first characterization of achievable rates forchannels with general insertions, and for channels with both dele-tions and insertions. For the special case of the deletion channel,the proposed bound improves on the previous best lower boundfor deletion probabilities up to 0.3.

Index Terms—Deletion channel, insertion channel, capacitylower bounds.

I. INTRODUCTION

C ONSIDER a binary input channel where for each bit (de-noted ), the output is generated in one of the following

ways:1) The bit is deleted with probability .2) An extra bit is inserted after with probability . The extrabit is equal to (a duplication) with probability , andequal to (a complementary insertion) with probability

.

Manuscript received July 01, 2011; revised March 29, 2013; accepted June21, 2013. Date of publication August 15, 2013; date of current version Oc-tober 16, 2013. This work was supported in part by the NSF under Grant CCF-1017744. This paper was presented in part at the 2011 IEEE International Sym-posium on Information Theory and in part at the 2013 IEEE Information TheoryWorkshop.R. Venkataramanan was with the Department of Electrical Engineering, Yale

University, New Haven, CT 06511 USA. He is now with the Department ofEngineering, University of Cambridge, Cambridge CB2 1PZ, U.K. (e-mail:[email protected]).S. Tatikonda is with the Department of Electrical Engineering, Yale Univer-

sity, New Haven, CT 06511 USA (e-mail: [email protected]).K. Ramchandran is with the Department of Electrical Engineering and

Computer Science, University of California, Berkeley, CA 94704 USA (e-mail:[email protected]).Communicated by S. Diggavi, Associate Editor for Shannon Theory.Color versions of one or more of the figures in this paper are available online

at http://ieeexplore.ieee.org.Digital Object Identifier 10.1109/TIT.2013.2278181

3) No deletions or insertions occur, and the output is withprobability .

The channel acts independently on each bit. We refer to thischannel as the InDel channel with parameters . If theinput to the channel is a sequence of bits, the length of theoutput sequence will be close to for large due tothe law of large numbers.Channels with synchronization errors can be used to model

timing mismatch in communication systems. Channels withdeletions and insertions also occur in magnetic recording [1].The problem of synchronization also appears in file backupand file sharing [2], [3], where distributed nodes may havedifferent versions of the same file which differ by a smallnumber of edits. The edits may include deletions, insertions,and substitutions. The minimum communication rate requiredto synchronize the remote sources is closely related to thecapacity of an associated synchronization channel. This con-nection is discussed at the end of this paper.The model above with corresponds to the deletion

channel, which has been studied in several recent papers, e.g.,[4]–[11]. When , we obtain the insertion channel with pa-rameters . The insertion channel with is the elemen-tary sticky channel [12], where all insertions are duplications.In this paper, we obtain lower bounds on the capacity of the

InDel channel. Our starting point is the result of Dobrushin [13]for general synchronization channels which states that the ca-pacity is given by the maximum of the mutual information perbit between the input and output sequences. There are two chal-lenges to computing the capacity through this characterization.The first is evaluating the mutual information, which is a diffi-cult task because of the memory inherent in the joint distribu-tion of the input and output sequences. The second challenge isto optimize the mutual information over all input distributions.In this paper, we choose the input distribution to be the class

of first-order Markov processes and focus on the problem ofevaluating the mutual information. It is known that first-orderMarkov input distributions yield good capacity lower boundsfor the deletion channel [4], [5] and the elementary stickychannel [12], both special cases of the InDel channel. Thissuggests they are likely to perform well on the InDel channelas well. The runs of a binary sequence are its alternating blocksof contiguous zeros and ones. First-order Markov sequenceshave runs that are independent, and the average run lengthcan be controlled via the Markov parameter. This fits wellwith the techniques used in this paper, which are based on therelationship between input and output runs of the channel.For a synchronization channel, it is useful to think of the

input and output sequences in terms of runs of symbols ratherthan individual symbols. If there was a one-to-one correspon-dence between the runs of the input sequence and those of

0018-9448 © 2013 IEEE

Page 2: 6990 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 59, …rv285/ITNov13_indel.pdf · 2013. 10. 18. · 6990 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 59, NO. 11, NOVEMBER 2013

VENKATARAMANAN et al.: ACHIEVABLE RATES FOR CHANNELS WITH DELETIONS AND INSERTIONS 6991

the output sequence , we could write the conditional distribu-tion as a product distribution of run-length transforma-tions; computing the mutual information would then be straight-forward. Unfortunately, such a correspondence is not possiblesince deletions can lead to some runs being lost, and insertionsto new runs being inserted.Themain idea of the paper is to use auxiliary sequences which

indicate the positions (in the output sequence) where runs weredeleted and inserted. Consider a decoder that decodes the auxil-iary sequences in addition to the transmitted codeword. Such adecoder is suboptimal compared to a maximum-likelihood de-coder because of the extra information it decodes, but its perfor-mance is tractable. The mutual information between the channelinput and output sequences is decomposed as the sum of twoterms: 1) the rate achieved by the suboptimal decoder, and 2)the rate loss due to the suboptimality of the decoder. We obtaina lower bound on the channel capacity via lower bounds on eachof the terms above. For the special case of the deletion channel,the rate achieved by the suboptimal decoder can be preciselycalculated.To gain insight, we first consider the special cases of the inser-

tion channel and the deletion channel separately. The insertionchannel with parameters introduces approximately in-sertions in a sufficiently long input sequence of length . A frac-tion nearly of these insertions are duplications, and the restare complementary insertions. Note that new runs can only beintroduced by complementary insertions. We consider a subop-timal decoder that first decodes the positions of the complemen-tary insertions. For the deletion channel, we consider a decoderthat first decodes an auxiliary sequence whose symbols indicatethe number of runs deleted between each pair of adjacent bitsin the output sequence. Augmenting the output sequence withthe positions of deleted runs results in a one-to-one correspon-dence between input and output runs. For the InDel channel, thesuboptimal decoder decodes both auxiliary sequences describedabove. In each case, a capacity lower bound is obtained by com-bining bounds on the rate achieved by the suboptimal decoderand the rate loss due to suboptimality.The main contributions of the paper are the following:1) Theorems 1 and 2 together provide the first characteriza-tion of achievable rates for the general insertion channel

. Previous results exist only for the special case ofthe sticky channel ( , i.e., only duplications).

2) For the special case of the deletion channel , The-orem 3 improves on the best-known capacity lower boundin [5] for .

3) Theorem 4 provides the first characterization of achievablerates for the InDel channel.

Our approach provides a general framework to compute thecapacity of channels with synchronization errors, and suggestsseveral directions to obtain sharper capacity bounds. For ex-ample, results on the structure of optimal input distributions forthese channels (in the spirit of [10], [11]) could be combinedwith our approach to improve the lower bounds. One could alsoobtain upper bounds on the capacity by assuming that the auxil-iary sequences are available “for free” at the decoder, as done in[4] for the deletion channel. For clarity, we only consider the bi-nary InDel channel. The results presented here can be extended

to channels with any finite alphabet. This is briefly discussed inSection VII.

A. Related Work

Jigsaw Decoding: The best previous lower bounds for thedeletion channel are due to Drinea and Mitzenmacher [5]. Theyuse a “jigsaw” decoder which decodes the transmitted codewordby determining exactly which group of runs in give rise toeach run in (this is called the “type” of the -run in [5]). An-alyzing the performance of such a decoder yields a lower boundon the deletion capacity. The suboptimality of the jigsaw de-coder is due to the fact that there may be many sequences oftypes consistent with a given pair . The rate loss due tothis suboptimality is precisely characterized in [6] in terms of amutual information decomposition. For a given input distribu-tion that is i.i.d across runs, [6] expresses themutual informationas the sum of two quantities—the first is the rate achieved by thejigsaw decoder, the second is a conditional entropy term that isthe rate loss incurred due to using a jigsaw decoder rather thanan optimal (maximum-likelihood) decoder. This conditional en-tropy is a multi-letter expression that is hard to compute and isestimated via simulation in [6] for a few values of .Our approach to the deletion channel in Section V also in-

volves a mutual information decomposition, but in terms of adifferent suboptimal decoder. The first term in the decomposi-tion is the rate achieved by decoding the positions of the deletedruns in addition to the transmitted codeword; the second term israte penalty incurred by such a decoder. An interesting observa-tion (discussed in Section V-B) is that the decoder we consideris actually inferior to the jigsaw decoder. However, the penaltyterm of our decoder is easier to bound analytically. As a con-sequence, our mutual information decomposition yields betterlower bounds on the deletion capacity for a range of deletionprobabilities. Additionally, the idea of a decoder that synchro-nizes input and output runs naturally extends to channels withgeneral insertions: we decompose the mutual information interms of the rate achieved by imposing that the positions of com-plementary insertions be decoded, and the rate penalty incurredby such decoder. The jigsaw decoder, in contrast, requires thateach output run be associated with a set of complete input runs,which is not possible when there are complementary insertions.Other Related Work: Dobrushin’s capacity characterization

was used in [11] to establish a series expansion for the dele-tion capacity at small values of . The capacity is estimated bycomputing the leading terms of the expansion, and it is shownthat the optimal input distribution can be obtained by smoothlyperturbing the i.i.d Bernoulli process. In [7], a genie-aideddecoder with access to the locations of deleted runs was usedto upper bound the deletion capacity using an equivalent dis-crete memoryless channel (DMC). In [9], bounds on the dele-tion capacity were obtained by considering a decoder equippedwith side-information specifying the number of output bits cor-responding to successive blocks of input bits, for any positiveinteger . This new channel is equivalent to a DMC with aninput alphabet of size , whose capacity can be numericallycomputed using the Blahut–Arimoto algorithm (for as largeas computationally feasible). The upper bound in [9] is the best

Page 3: 6990 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 59, …rv285/ITNov13_indel.pdf · 2013. 10. 18. · 6990 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 59, NO. 11, NOVEMBER 2013

6992 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 59, NO. 11, NOVEMBER 2013

known for a wide range of , but the lower bound is weaker thanthat of [5] and the one proposed here.In [14], bounds are obtained on the capacity of a channel

with deletions and duplications by converting it to an equiva-lent channel with states. Various results on the capacity of chan-nels with synchronization and substitution errors are obtainedin [15]. Finally, we note that a different channel model with bitflips and synchronization errors was studied in [16] and [17].In this model, an insertion is defined as an input bit being re-placed by two random bits. We have only mentioned the papersthat are closely related to the results of this paper. The reader isreferred to [8] for an exhaustive list of references on synchro-nization channels. Mercier et al. [15] also contain a review ofexisting results on these channels.After laying down the formal definitions and technical

machinery in Section II, in Section III we describe codingschemes which give intuition about our bounding techniques.In Section IV, we consider the insertion channel andderive two lower bounds on its capacity. For this channel,previous bounds exist only for the special case of the elemen-tary sticky channel [12]. In Section V, we derive alower bound on the capacity of the deletion channeland compare it with the best previous lower bound [5]. Wealso compare the suboptimality of decoding the positions ofdeleted runs with the suboptimality of the jigsaw decoder of[5]. In Section VI, we combine the ideas of Sections IV andV to obtain a lower bound for the InDel channel. Section VIIconcludes the paper with a discussion of open questions.

II. PRELIMINARIES

Notation: denotes the set of nonnegative integers, andthe set of natural numbers. For , .

Logarithms are with base 2, and entropy is measured in bits.is the binary entropy function and is the indicator function ofthe set . We use uppercase letters to denote random variables,bold-face letters for random processes, and superscript notationto denote random vectors.The communication over the channel is characterized by

three random processes defined over the same probabilityspace: the input process , the output process

, and , where is the numberof output symbols corresponding to the first input symbols. Ifthe underlying probability space is , each realization

determines the sample paths ,, and .

The length channel input sequence isand the output sequence is . Note that is a randomvariable determined by the channel realization. For brevity, wesometimes use underlined notation for random vectors whenwe do not need to be explicit about their length. Thus,

and .Definition 1: An code with block length and rateconsists of1) An encoder mapping , and2) A decoder mapping where is

for the deletion channel, for theinsertion channel, and for the InDel channel.

Assuming the message is drawn uniformly from the set, the probability of error of a code is

A rate is achievable if there exists a sequence ofcodes such that as . The supremum of allachievable rates is the capacity . The following characteriza-tion of capacity follows from a result proved for a general classof synchronization channels by Dobrushin [13].Fact 1: Let . Then,

exists, and is equal to the capacity ofthe InDel channel.

Proof: Dobrushin proved the following general result in[13]. Consider a channel with and denoting the alphabets ofpossible symbols at the input and output, respectively. For eachinput symbol in , the output belongs to , the set of all finitesequences of elements of , including the empty sequence. Thechannel is memoryless and is specified by the stochastic matrix

. Also assume that for each inputsymbol , the length of the (possibly empty) output sequencehas nonzero finite expected value. Then exists, andis equal the capacity of the channel.The InDel channel is a special case of the above model with

, and the length of the output correspondingto any input symbol has a maximum value of two and expectedvalue equal to , which is nonzero for all . Hence,the claim is a direct consequence of Dobrushin’s result.In this paper, we fix the input process to be the class of binary

symmetric first-orderMarkov processes and focus on evaluatingthe mutual information. This will give us a lower bound on thecapacity. The input process is characterized bythe following distribution for all :

where for , and for

(1)A binary sequence may be represented by a sequence of positiveintegers representing the lengths of its runs, and the value of thefirst bit (to indicate whether the first run has zeros or ones). Forexample, the sequence 0001100000 can be represented as (3, 2,5) if we know that the first bit is 0. The value of the first bitof can be communicated to the decoder with vanishing rate,and we will assume this has been done at the outset. Hence,denoting the length of the th run of by we have thefollowing equivalence: . For a first-orderMarkov binary source of (1), the run-lengths are independentand geometrically distributed, i.e.,

(2)

The average length of a run in is , so the number of runsin a sequence of length is close to for large . Our

Page 4: 6990 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 59, …rv285/ITNov13_indel.pdf · 2013. 10. 18. · 6990 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 59, NO. 11, NOVEMBER 2013

VENKATARAMANAN et al.: ACHIEVABLE RATES FOR CHANNELS WITH DELETIONS AND INSERTIONS 6993

bounding techniques aim to establish a one-to-one correspon-dence between input runs and output runs. The independenceof run-lengths of enables us to obtain analytical boundson the capacity. We denote by , ,

the mutual information and entropies computedwith the channel input sequence distributed as in (1). Forall , we have

(3)

Therefore,

(4)

where is the entropy rate of the Markov process [18].We will derive upper bounds onand use it in (4) to obtain a lower bound on the capacity.

A. Technical Lemmas

To formally prove our results, we will use a frameworksimilar to [6]. The notion of uniform integrability will play animportant role. We list the relevant definitions and technicallemmas below. The reader is referred to [6, Appendix I] for agood overview of the concept of uniform integrability.Definition 2: A family of random variables is uni-

formly integrable if

Lemma 1. [19, Lemma 7.10.6]: A family of random vari-ables is uniformly integrable if and only if both thefollowing conditions hold:1) , and2) For any , there exists some such that for alland any event with , we have.

Let denote the random variable whose value isthe size of the support of the conditional distribution of given.Lemma 2. [6, Lemma 4]: Let be a sequence of

pairs of discrete random variables with for

some constant . Then,

. In particular, the sequence is uni-formly integrable.Lemma 3. [19, Th. 7.10.3]: Suppose that is a

sequence of random variables that converges to in probability.Then the following are equivalent.1) is uniformly integrable.2) for all , and .Lemma 4: Let be a process for which the

asymptotic equipartition property (AEP) holds, i.e.,

where is the (finite) entropy rate of the process . Letbe a sequence of positive integer valued random vari-

ables defined on the same probability space as the s, and sup-

pose that almost surely for some constant .Then,

Proof: Fix and define and. Since a.s., there exists

an such that for all ,

It follows that for all ,

(5)

Hence,

(6)

Similarly, one can show that

(7)

Since is arbitrary, combining (6) and (7), we get the resultof the lemma.

III. CODING SCHEMES

In this section, we describe coding schemes to give intuitionabout the auxiliary sequences used to obtain the bounds. Thediscussion here is informal. The capacity bounds are rigorouslyproved in Sections IV –VII where the auxiliary sequences areused to directly decompose and the limiting be-havior is bounded using information-theoretic inequalities andelementary tools from analysis.

A. Insertion Channel

Consider the insertion channel with parameters . For, the inserted bits may create new runs, so we cannot

associate each run of with a run in . For example, let

(8)

where the inserted bits are indicated in large italics. There isone duplication (in the third run), and two complementary in-sertions (in the first and second runs). While a duplication neverintroduces a new run, a complementary insertion introduces anew run, except when it occurs at the end of a run of (e.g.,the 0 inserted at the end of the second run in (8)). For anyinput-pair , define an auxiliary sequence

where if is a complementary inser-tion, and otherwise. The sequence indicates thepositions of the complementary insertions in . In the ex-ample of (8), .

Page 5: 6990 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 59, …rv285/ITNov13_indel.pdf · 2013. 10. 18. · 6990 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 59, NO. 11, NOVEMBER 2013

6994 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 59, NO. 11, NOVEMBER 2013

Consider the following coding scheme. Construct a codebookof codewords of length , each chosen independently ac-cording to the first-order Markov distribution (1). Let de-note the transmitted codeword, and the channel output.From , the decoder decodes (using joint typicality) the po-sitions of the complementary insertions, in addition to the inputsequence. The joint distribution of these sequences is deter-mined by the input distribution (1) and the channel parameters

.Such a decoder is suboptimal since the complementary in-

sertion pattern is not unique given an input–output pair. For example, the pair , can

either correspond to a complementary insertion in the secondbit or a duplication in the third bit. The maximum rate achiev-able by this decoder is obtained by analyzing the probability oferror. Assuming all sequences satisfy the asymptotic equiparti-tion property [18], we have for sufficiently large

(9)

The second term above is the probability thatare jointly typical when is picked independently from

. The first term is obtained by taking a union boundover all the codewords and all the typical complementary inser-tion patterns for each codeword. Hence, the probability of errorgoes to zero if

(10)

We can decompose the mutual information as

(11)

The first part above is the rate achieved by the decoder describedabove and the second “penalty” term represents the rate-loss dueto its suboptimality. We obtain a lower bound on the insertioncapacity in Section IV-B by obtaining good single-letter lowerbounds on the limiting behavior of both terms in (11).

B. Deletion Channel

Consider the following pair of input and output sequencesfor the deletion channel: , . For thispair, we can associate each run of uniquely with a run in .Therefore, we can write

where , denote the lengths of the th runs of and ,respectively. We observe that if no runs in are completely

deleted, then the conditional distribution of given may bewritten as a product distribution of run-length transformations

(12)

where for all runs , for. In general, there are runs of that are completely

deleted. For example, if and , wecannot associate the single run in uniquely with a run in .For any input–output pair , define an auxiliary

sequence , where isthe number of runs completely deleted in between the bitscorresponding to and . ( is the number of runs deletedbefore the first output symbol , and is the number ofruns deleted after the last output symbol .) For example, if

and the bits shown in italics were deleted to

give , then . On the other hand, if thelast six bits were all deleted, i.e., , then

. Thus, is not uniquely determined given .The auxiliary sequence enables us to augment with thepositions of missing runs. As will be explained in Section V,the runs of this augmented output sequence are in one-to-onecorrespondence with the runs of the input sequence.Consider the following coding scheme. Construct a codebook

of codewords of length , each chosen independently ac-cording to (1). The decoder receives , and decodes (usingjoint typicality) both the auxiliary sequence and the input se-quence. Such a decoder is suboptimal since the auxiliary se-quence is not unique given a codeword and theoutput . Assuming all sequences satisfy the asymptoticequipartition property, we have for sufficiently large

(13)The second term above is the probability that

are jointly typical when ispicked independently from . The first term isobtained by taking a union bound over all the codewords andall the typical auxiliary sequences for each codeword. Hence,the probability of error goes to zero if

(14)

We can decompose the mutual information as

(15)

In Section V, we obtain an exact expression for the limit of thefirst term as and a lower bound for the penalty term.These together yield a lower bound on the deletion capacity.

Page 6: 6990 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 59, …rv285/ITNov13_indel.pdf · 2013. 10. 18. · 6990 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 59, NO. 11, NOVEMBER 2013

VENKATARAMANAN et al.: ACHIEVABLE RATES FOR CHANNELS WITH DELETIONS AND INSERTIONS 6995

C. InDel Channel

For the InDel channel, we use both auxiliary sequencesand . The suboptimal decoder decodes both these se-quences in addition to the codeword . The mutual informa-tion decomposition in this case is

(16)

In Section VI, we establish a lower bound on the capacity ofthe InDel channel by obtaining lower bounds for both parts of(16). As seen from (11), (15), and (16), the rate penalty for usingthe suboptimal decoder is the conditional entropy of the auxil-iary sequences given both the input and output sequences; this isessentially the extra information decoded compared to a max-imum-likelihood decoder. In Sections IV –VI, we bound thisconditional entropy by identifying insertion/deletion patternsthat lead to different auxiliary sequences for the samepair.

IV. INSERTION CHANNEL

In this channel, an extra bit may be inserted after each bit ofwith probability . When a bit is inserted after , theinserted bit is equal to (a duplication) with probability , andequal to (a complementary insertion) with probability .When , we have only duplications—this is the elementarysticky channel studied in [12]. In this case, we can associate eachrun of with a unique run in , which leads to a computablesingle-letter characterization of the best achievable rates witha first-order Markov distribution. We derive two lower boundson the capacity of the insertion channel, each using a differentauxiliary sequence.

A. Lower Bound 1

For any input-pair , define an auxiliary sequencewhere if is an inserted bit, and

otherwise. The sequence indicates the positionsof all the inserted bits in , and is not unique for a given

. Using , we can decompose as

since . Therefore,

(17)

The last term above represents the rate loss of a suboptimaldecoder that decodes the transmitted codeword by first deter-mining the positions of all the insertions. We obtain a lowerbound on the insertion capacity by deriving an upper bound onthe limsup and a lower bound on the liminf in (17).Proposition 1: The process

is a second-order Markov process character-ized by the following joint distribution for all :

where for , and :

(18)

Proof: We need to show that for all , the fol-lowing Markov relation holds:

. First consider. Since , is the most recent input

bit (say ) before .is the

probability that the following independent events occur: 1) theinput bit equals and 2) there was no insertion afterinput bit . Since the insertion process is i.i.d and independentof the first-order Markov process , we have

Similarly, we obtain

Next consider

Since , is the most recent input bit (say, )before . Also note that is the input bit since isan insertion. (At most one insertion can occur after each inputbit.) Hence,

is just the probability that , which is equal to .Similarly,

equals .

Page 7: 6990 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 59, …rv285/ITNov13_indel.pdf · 2013. 10. 18. · 6990 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 59, NO. 11, NOVEMBER 2013

6996 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 59, NO. 11, NOVEMBER 2013

Remark: Proposition 1 implies that the process canbe characterized as a Markov chain with state at time given by

. This is an aperiodic, irreducible Markov chain.Hence, a stationary distribution exists, which forcan be verified to be

(19)

Lemma 5:

Proof: See Appendix I-A.Lemma 6:

and

(20)

Proof: See Appendix I-B.Bounding the penalty term, we next focus on the penalty

term which is the uncertainty in the posi-tions of the insertions given both the channel input and outputsequences. Consider the following example:

(21)

Assume that the value of is known for all bits in exceptthe run of ones shown above. Further Suppose that itis known that the -bits shown in the first line of (21) exactlycorrespond to the -bits in the second line. For any and

, the following are all the insertion patterns that areconsistent with the shown pair.1) The 0 preceding the -run undergoes a complementaryinsertion leading to the first 1 in the -run. Then,out of the 1s in the -run undergo duplications, theremaining 1s are transmitted without any insertions.

2) The 0 preceding the -run is transmitted without any in-sertions. of the 1s in the -run undergo duplications,the remaining are transmitted without insertions.

For the same pair, the first scenario above leads todifferent sequences, and the second leads to another

s. Calculating the entropy associated with these patterns

yields a lower bound on the penalty term. This intuition is maderigorous in the following lemma.Lemma 7:

where

(22)

with .Proof: See Appendix I-C.

Theorem 1: (LB 1) The capacity of the insertion channel withparameters can be lower bounded as

where is defined in (22).Proof: Using Lemmas 5, 6, and 7 in (17), we obtain the

RHS above, which is a lower bound on the insertion capacitydue to (4). We optimize the lower bound by maximizing overthe Markov parameter .

B. Lower Bound 2

For any input-pair , define an auxiliary sequencewhere if is a complementary

insertion, and otherwise. The sequence indicatesthe positions of the complementary insertions in . Note that

is different from the sequence , which indicates thepositions of all the insertions. Using , we can decompose

as

(23)

Using this, we have

(24)

We obtain a lower bound on the insertion capacity by boundingeach of the limiting terms in (24).Lemma 8:

Proof: The proof of this lemma is identical to that ofLemma 5, and can be obtained by replacing with .

Page 8: 6990 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 59, …rv285/ITNov13_indel.pdf · 2013. 10. 18. · 6990 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 59, NO. 11, NOVEMBER 2013

VENKATARAMANAN et al.: ACHIEVABLE RATES FOR CHANNELS WITH DELETIONS AND INSERTIONS 6997

Lemma 9:

and

(25)

Proof: See Appendix I-D.We now derive two upper bounds on the limiting behavior

of . Define as the sequence obtainedfrom by flipping the complementary insertions in

, i.e., flip bit if . has insertions in the samelocations as , but the insertions are all duplications. Hence,

has the same number of runs as . Recall from Section IIthat we can represent both binary sequences in terms of theirrun-lengths as

where , the number of runs in (and ) is a randomvariable. Therefore, for all we have the upper bound

(26)

where the inequality holds because is a function of.

We can obtain another upper bound by removing the com-plementary insertions from . Define as the sequence ob-tained from by deleting the complementary inser-tions. Let denote the length of . Since all the complemen-tary insertions have been removed, has the samenumber of runs as . We therefore have the bound

(27)

Proposition 2: The processes

and

are both i.i.d processes characterized by the following joint dis-tributions for all . For and ,

(28)

(29)

Proof: Since is a Markov process, are inde-pendent with

is generated from by independently duplicating eachbit with probability . Hence, can be thought of being ob-tained by passing a run of length through a DMC with tran-sition probability

Equation (29) can be obtained in a similar fashion by observingthat is generated from by independently duplicatingeach bit with probability .Lemma 10:

where the joint distributions of and aregiven by Proposition 2.

Proof: See Appendix I-E.The penalty term is the uncertainty in

the positions of the complementary insertions given both thechannel input and output sequences. We lower bound this usinga technique very similar to the one used for the penalty termin Section IV-A. Consider again the pair shown in (21)with the knowledge that the -bits shown in the first line of(21) yielded the -bits in the second line. Further assume thatthe value of is known for all bits in except the run ofones shown above. Denoting the first bit of the run of ones inby , the only remaining uncertainty in is in . Indeed,1) if the 0 preceding the -run undergoes a comple-mentary insertion leading to the first 1 in the -run. Then,

out of the 1s in the -run undergo duplications,the remaining 1s are transmitted without any insertions.

2) if the 0 preceding the -run is transmitted withoutany insertions. out of the 1s in the -run un-dergo duplications, the remaining are transmitted withoutinsertions.

Calculating the binary entropy associated with the two casesabove yields a lower bound on the penalty term. This intuitionis made rigorous in the following lemma.Lemma 11:

where

Proof: See Appendix I-F.

Page 9: 6990 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 59, …rv285/ITNov13_indel.pdf · 2013. 10. 18. · 6990 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 59, NO. 11, NOVEMBER 2013

6998 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 59, NO. 11, NOVEMBER 2013

Fig. 1. Comparison of the two lower bounds on with .

Theorem 2 (LB 2): The capacity of the insertion channel withparameters can be lower bounded as

where , are computed using thejoint distributions given in Proposition 2.

Proof: Using Lemmas 8–11 in (24), we obtain the RHSabove, which is a lower bound on the insertion capacity dueto (4). We optimize the lower bound by maximizing over theMarkov parameter .We note that the infinite sums in the formulas for

and can be truncated to compute the capacity lowerbounds since they appear with positive signs in Theorems 1and 2. Fig. 1 compares with for different values ofwith fixed at 0.8. We show the bounds obtained by evalu-ating separately with (deleting the complementary in-sertions) and (flipping the complementary insertions). Forsmaller insertion probabilities we observe the -bound is better,i.e., . We also observe thatis generally a better bound than , except when is large.For large , it is more efficient to decode the positions of all theinsertions rather than just the complementary insertions. Specif-ically, comparing Lemmas 6 and 9,

for large values of because is very likely to be 1 if .(Recall that whenever .) Combining the boundsof Theorems 1 and 2, we observe that is alower bound to the insertion capacity. This is plotted in Fig. 2for various values of , for , 0.5, 0.8, 1. The lower boundis not monotonic in for a given . This is because the curve isthe maximum of three bounds ( , with and ), eachof which has a different behavior as we vary , .For , the bound is very close to the near-optimal lower

bound in [12]. The difference is entirely due to using a first-orderMarkov input distribution rather than the numerically optimizedinput distribution in [12].

Fig. 2. Lower bound on the insertion capacity for , 0.8. For, the lower bound of [12] is shown using .

V. DELETION CHANNEL

In this channel, each input bit is deleted with probability, or retained with probability . For any input–outputpair , define the auxiliary sequence , where

is the number of runs completely deleted inbetween the bits corresponding to and . ( is thenumber of runs deleted before the first output symbol , and

is the number of runs deleted after the last outputsymbol .) Examples of for the input–output pair

were given in Section III-B.The auxiliary sequence let us augment with the positions

of missing runs. Consider . If the decoder isgiven and , it can form the augmentedsequence , where a “ ” denotes a missing run, orequivalently a run of length 0 in . With the “ ” markers indi-cating deleted runs, we can associate each run of the augmentedsequence uniquely with a run in . Denote bythe run-lengths of the augmented sequence , whereif run is a “ .” Then, we have

(30)where :

(31)

Using the auxiliary sequence , we can write

(32)

We therefore have

(33)

Page 10: 6990 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 59, …rv285/ITNov13_indel.pdf · 2013. 10. 18. · 6990 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 59, NO. 11, NOVEMBER 2013

VENKATARAMANAN et al.: ACHIEVABLE RATES FOR CHANNELS WITH DELETIONS AND INSERTIONS 6999

We will show that exists,and obtain an analytical expression for this limit. We also derivea lower bound on the penalty term, thereby obtaining a lowerbound on the deletion capacity. We remark that it has beenshown in [6] that for any input distribution with independentruns, exists for the deletion channel.Hence, the on the left-hand side of (33) is actually alimit.Proposition 3: The process is a first-order

Markov process characterized by the following joint distribu-tion for all

where for , and

(34)

Proof: The proof of this proposition can be found in [8].Proposition 4: The process

is a first-order Markov process characterizedby the following joint distribution for all :

where for and :

(35)

and

(36)

Proof: In the sequel, we use the shorthand notation to de-note the sequence . We needto show that

for all and , .Let the output symbols correspond to

input symbols for some positive inte-gers . is the number of runsbetween the input symbols and , not countingthe runs containing and . Similarly, is thenumber of runs between the input symbols and ,not counting the runs containing and , etc.First consider the case where . When

and , note that , the number of

completely deleted runs between and , is either zeroor an odd number. We have

(37)

where is obtained as follows. is the probabilitythat the input run containing contains bits after ,and is the probability that at least one of them is notdeleted. This needs to hold for some in order to have

and . By reasoning similar to the above, wehave for :

(38)

where the first term in is the probability that the remainder ofthe run containing is completely deleted, the second termis the probability that the next runs are deleted, and the lastterm is the probability that the subsequent run is not completelydeleted.When and , the number of deleted runs

is either zero or an even number. For we have

(39)

In the above, the first term in is the probability that the re-mainder of the run containing is completely deleted, thesecond term is the probability that the next runs are deleted (may be equal to zero), and the third term is the probability thatthe subsequent run is not completely deleted. This completes theproof of the lemma.We now show that and

each exist, thereby provingthe existence of .Lemma 12:

where the joint distribution of isgiven by (34)–(36).

Proof: See Appendix II-A.To determine the limiting behavior of

, we recall that can beequivalently represented in terms of its run-lengths as

, where , the number of runs in ,

Page 11: 6990 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 59, …rv285/ITNov13_indel.pdf · 2013. 10. 18. · 6990 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 59, NO. 11, NOVEMBER 2013

7000 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 59, NO. 11, NOVEMBER 2013

is a random variable. Also recall from the discussion atthe beginning of this section that the pair of sequences

is equivalent to an augmented sequenceformed by adding the positions of the deleted runs to

. can be equivalently represented in terms ofits run-lengths as , where we emphasize that

can take value 0 as well. To summarize, we have

(40)Thus, for all

(41)Proposition 5: The process

is an i.i.d process charac-terized by the following joint distribution for all :

(42)

Proof: Since is a Markov process, are inde-pendent with

Since the deletion process is i.i.d, each can be thought ofbeing obtained by passing a run of length through a DMCwith transition probability

Lemma 13:

where the joint distribution of is given by (42).Proof: See Appendix II-B.

A. Bounding the Penalty Term

The penalty term is the uncertainty inthe positions of the deleted runs given both the channel inputand output sequences. To get some intuition about this term,consider the following example:

(43)

Given the uncertainty in corresponds to how manyof the output bits came from the first run of zeros in ,and how many came from the second. In (43), can be oneof four sequences: (2, 0, 0, 0), (0, 0, 0, 2), (0, 1, 0, 0), and(0, 0, 1, 0). The first case corresponds to all the output bitscoming from the second run of zeros; in the second case all theoutput bits come from the first run. The third and fourth casescorrespond to the output bits coming from both input runs ofzeros. The probability of the deletion patterns resulting in each

of these possibilities can be calculated. We can thus computeprecisely for this example. For general , we

lower bound by considering patterns in ofthe form shown in (43). This is done in the following lemma.Lemma 14:

where

(44)

where and is the entropy of the pmf. (In (44) it is assumed that for .)Proof: See Appendix II-C.

Theorem 3: The deletion channel capacity can be lowerbounded as

where

(45)

and

(46)

Proof: We obtain the lower bound on the deletion ca-pacity by using Lemmas 12–14 in (33). is thenbe computed using the joint distribution given by (34)–(36).

can be computed using the joint distribution givenin Proposition 5. Finally, we optimize the lower bound bymaximizing over the Markov parameter .Since the summations in (46) and (44) appear with positive

signs in Theorem 3, they can be truncated to compute a capacitylower bound. Table I shows the capacity lower bound of The-orem 3 for various values of together with opti-mized with a resolution of 0.005. We notice that the maximizingincreases with , i.e., input runs get longer and are hence less

likely to be deleted completely. We observe that for(values shown in bold) our lower bound improves on that of[5], the best previous lower bound on the deletion capacity.A sharper lower bound on the penalty term will improve the

capacity bound of Theorem 3. In deriving in Lemma 14,

Page 12: 6990 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 59, …rv285/ITNov13_indel.pdf · 2013. 10. 18. · 6990 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 59, NO. 11, NOVEMBER 2013

VENKATARAMANAN et al.: ACHIEVABLE RATES FOR CHANNELS WITH DELETIONS AND INSERTIONS 7001

TABLE ICAPACITY LOWER BOUND FOR THE DELETION CHANNEL

we considered -runs obtained from either one or three adjacent-runs and lower bounded the conditional entropy

assuming all but three -runs giving rise to the -run wereknown. The lower bound can be refined by additionally con-sidering the cases where -run arose from adjacent-runs. However, this will imply a more complicated formula

for in (44).

B. Comparison With Jigsaw Decoding

The jigsaw decoder decodes the type of each run in the outputsequence . The type of a -run is the set of input runs thatgave rise to it, with the first input run in the set contributing atleast one bit. The penalty for decoding the codeword by firstdecoding the sequence of types is . Acharacterization of this conditional entropy in terms of the jointdistribution of is derived in [6], but we will not need theprecise expression for the discussion below.Given a pair , observe that knowledge of uniquely

determines the sequence of types of , but not vice versa. Forexample, consider the pair

(47)

Suppose we know that , i.e., can be aug-mented with deleted runs as . Then, the types of thethree runs are

(48)

In contrast, suppose we know that the set of types for thepair in (47) is as shown in (48). Then, and

(corresponding to deletion patternsand , respectively) are both consistent -sequenceswith the given set of types. In summary, since the set of types isa function of we have

In other words, the rate penalty incurred by the jigsaw decoderis smaller than the penalty of the suboptimal decoder consideredhere. However, the penalty term for our decoder can be lowerbounded analytically, which leads to improved lower bounds onthe deletion capacity for . The jigsaw penalty term is

Fig. 3. Cascade channel equivalent to the InDel channel.

harder to lower bound and is estimated via simulation for a fewvalues of in [6].We note that the analysis of the jigsaw decoder in [5] and

[6] relies on two conditions being satisfied: 1) output runs areindependent, 2) each output run arises from a set of completeinput runs. In a channel with only deletions and duplications, thesecond condition is always true, and the first is guaranteed bychoosing the input distribution to be i.i.d across runs. With com-plementary insertions, the output runs are dependent even whenthe input distribution is i.i.d. Further, we cannot associate eachoutput run with a set of complete input runs. For example, if

and , each output run of zeros correspondsto only a part of the input run. It is therefore hard to extendjigsaw decoding to channels where complementary insertionsoccur. For such channels, the rate achieved by a decoder whichdecodes auxiliary sequences to synchronize the output runs withthe runs of the transmitted codeword can still be lower boundedanalytically. Though the rate of such a decoder may not be closeto capacity, the final capacity bound is higher since it also in-cludes the lower bound to the penalty term.

VI. INDEL CHANNEL

The InDel channel is defined by three parameterswith . Each input bit undergoes a deletion with proba-bility , a duplication with probability , a complementary in-sertion with probability . Each input bit is deleted with proba-bility ; given that a particular bit is not deleted, the probabilitythat it undergoes an insertion is . Therefore, one can thinkof the channel as a cascade of two channels, as shown in Fig. 3.The first channel is a deletion channel that deletes each bit inde-pendently with probability . The second channel is an insertionchannel with parameters , where . We prove theequivalence of this cascade decomposition below.Claim: The InDel channel is equivalent to the cascade

channel in the sense that both have the same transition proba-bility , and hence the same capacity.

Proof: For an -bit input sequence, define the deletion–in-sertion pattern of the channel as thesequence where indicates whether the channel introduces adeletion/duplication/complementary insertion/no modificationin bit of the input. Note that if the underlying probabilityspace is , the realization determines thedeletion–insertion pattern . We calculate the probabilityof any specified pattern occurring in a) the InDel channel, andb)the cascade channel.Consider a deletion–insertion pattern with deletions at

positions , duplications at positions ,and complementary insertions at positions . Theprobability of this pattern occurring in the InDel channel is

Page 13: 6990 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 59, …rv285/ITNov13_indel.pdf · 2013. 10. 18. · 6990 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 59, NO. 11, NOVEMBER 2013

7002 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 59, NO. 11, NOVEMBER 2013

The probability of this pattern occurring in the cascade channelof Fig. 3 is

(49)

where the first term in is the probability of deletions occur-ring in the specified positions in the first channel, and the secondterm is the probability of the insertions occurring in the speci-fied positions in the second channel. Thus, for any fixed pair

, every deletion–insertion pattern that produces fromhas the same probability in both the InDel channel and the

cascade channel. This implies that the two channels have thesame transition probability.To obtain a lower bound on the capacity, we work with

the cascade channel and use two auxiliary sequences,and .

As in Section IV-B, indicates the complementary inser-tions in : if is a complementary insertion, and

otherwise. As in Section V, indicates the po-sitions of the missing runs: , if runs were completelydeleted between and . We decomposeas

(50)

We therefore have

(51)

Using the techniques developed in the previous two sections, webound each of the limiting terms above to obtain a lower boundon the InDel capacity.Lemma 15:

where .

Proof: We first note that

(52)

The proof of (52) is essentially the same as that of Lemma 5, with two changes: replaces , and converges to

for the InDel channel. We then have

(53)

Therefore,

(54)

provided the limit exists. From the cascade representation in Fig. 3, we see that the insertions are introduced by the second channel in the cascade. The input to this insertion channel is a process which is the output of the first channel in the cascade. From Proposition 3, is a first-order Markov process with parameter . We therefore need to calculate where is the output when a first-order Markov process with parameter is transmitted through an insertion channel with parameters . But we have already computed in Lemma 9 for an insertion channel with parameters with a first-order Markov input with parameter . Hence, in Lemma 9 we can replace by and by to obtain

Substituting and simplifying yields a lower bound for from (54). Combining with (52) gives the statement of the lemma.
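The Markov parameter of the deletion-channel output that the proof takes from Proposition 3 is elided in this extraction. The sketch below offers a hedged reconstruction of it, derived here from first principles rather than quoted from the paper: consecutive output bits are input bits separated by a Geometric(1-d) gap, and for a symmetric binary Markov chain with parameter g we have P(x_{i+k} = x_i) = (1 + (2g-1)^k)/2, which gives g' = 1/2 + (1/2)(1-d)(2g-1)/(1 - d(2g-1)). The names g and d are ours, and a simulation check is included.

```python
# A hedged reconstruction (our derivation, not the paper's stated formula).
import random

def output_markov_parameter(g, d):
    """Derived Markov parameter of the deletion-channel output process."""
    return 0.5 + 0.5 * (1 - d) * (2 * g - 1) / (1 - d * (2 * g - 1))

def simulate(g, d, n=1_000_000, seed=1):
    """Empirical probability that consecutive surviving bits agree."""
    rng = random.Random(seed)
    x, prev, agree, pairs = 0, None, 0, 0
    for _ in range(n):
        if rng.random() >= d:                 # bit survives deletion
            if prev is not None:
                pairs += 1
                agree += (x == prev)
            prev = x
        x = x if rng.random() < g else 1 - x  # next Markov input bit
    return agree / pairs

g, d = 0.7, 0.3
print(output_markov_parameter(g, d), simulate(g, d))  # both ~0.659
```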



Lemma 16:

where

Proof: See Appendix III-A.

As in Section IV-B, we upper bound in two ways. The first involves flipping the complementary insertions in to obtain ; the second bound is obtained by deleting the complementary insertions to obtain . In both cases, the extra runs introduced by the channel are removed. Using , we can augment (resp. ) by adding the positions of the deleted runs to obtain a sequence (resp. ) which contains the same number of runs as . can be represented in terms of its run-lengths as , where we emphasize that can take value 0 as well. To summarize, we have

(55)

Proposition 6:
1) The process is an i.i.d. process characterized by the following joint distribution for all :

(56)

(57)

where , the set of possible values for the number of insertions , is given by

2) The process is an i.i.d. process whose joint distribution is obtained by replacing in (57) with .

Proof: Since is a Markov process, are independent with

Since there is a one-to-one correspondence between the runs of and the runs of , we can think of each being obtained by passing a run of length through a DMC. For a pair , if the number of insertions is , the number of deletions is easily seen to be . Since there can be at most one insertion after each input bit, no more than half the bits in an output run can be insertions; hence the maximum value of is . The minimum value of is zero for , and for . Using these together with the fact that each bit can independently undergo an insertion with probability , a deletion with probability , or no change with probability , the transition probability of the memoryless run-length channel is given by (57).

The proof for the second part is identical except that the effective insertion probability is now since the complementary insertions have been removed.
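The transition probability (57) is elided in this extraction, but the counting argument just given determines its form. The sketch below implements that count (our reconstruction; the names p, q, l, m stand in for the paper's elided symbols): with k insertions, an output run of length m arising from an input run of length l requires l + k - m deletions, leaving m - 2k bits copied verbatim, and the bits playing each role can be chosen in a multinomial number of ways.

```python
# A reconstruction sketch of the run-length DMC in Proposition 6.
from math import comb

def run_length_transition(m, l, p, q):
    """P(output run length = m | input run length = l), where each input
    bit is independently duplicated w.p. p, deleted w.p. q, or copied
    w.p. 1 - p - q."""
    total = 0.0
    for k in range(max(0, m - l), m // 2 + 1):   # k insertions
        d = l + k - m                            # forced number of deletions
        copied = m - 2 * k                       # bits passed unchanged
        if d < 0 or copied < 0:
            continue
        ways = comb(l, k) * comb(l - k, d)       # choose duplicated/deleted bits
        total += ways * p**k * q**d * (1 - p - q)**copied
    return total

# sanity check: each row of the transition matrix sums to 1
l, p, q = 5, 0.1, 0.2
print(sum(run_length_transition(m, l, p, q) for m in range(2 * l + 1)))
```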

Lemma 17:

(58)

where the joint distributions of and are given by Proposition 6.

Proof: The proof is identical to that of Lemma 13.

Using the cascade representation, the penalty term in (51) can be bounded as follows.

Lemma 18:

Proof: We have

(59)

The first term in (59) can be lower bounded as

(60)

The equality above holds due to the Markov chain

(61)



where and denote the deletion and insertion processes of the first and second channels in the cascade, respectively.¹ The equality in (60) is due to the fact that the process is a function of . We then have

(62)

which is obtained as follows. The second channel in the cascade has insertion probability and input which is first-order Markov with parameter . We then obtain (62) via steps very similar to the proof of Lemma 11, accounting for the fact that . The second term in (59) is bounded as follows:

(63)

In (63), holds because is a function of . (Recall if is an inserted bit and 0 otherwise.) To obtain , first note that can be determined from (by deleting the inserted bits from ). is the length- sequence corresponding to just the first channel in the cascade: for , is the number of runs completely deleted between bits and . In contrast, is the -sequence for the overall channel: is the number of deleted runs between bits and . holds because given , knowledge of is sufficient to reconstruct and vice versa. To obtain , we observe that is a function of . Then, it follows from the Markov chain (61) that is conditionally independent of given , resulting in . Finally, applying Lemma 14 to the first channel in the cascade we get

(64)

Using (62)–(64) in (59) completes the proof.

Theorem 4: The capacity of the InDel channel can be lower bounded as

where , , , , are defined in Lemma 16, and , are computed using the joint distributions given in Proposition 6.

Proof: The result is obtained by using Lemmas 15–18 in (51).

¹ is a process where indicates if the th input bit was deleted or not. Similarly, specifies which input bits to the second channel undergo duplications and which undergo complementary insertions.

Fig. 4. Lower bound on the InDel capacity for .

The lower bound is plotted in Fig. 4 for various values of , for and for . Like the deletion channel, the maximizing was found to increase as was increased; this has the effect of making runs longer and thus less likely to be deleted completely.

For Theorem 4, we used the sequence to indicate the positions of complementary insertions together with to indicate deleted runs. We can obtain another lower bound on the InDel capacity by using the sequence instead of . This bound can be derived in a straightforward manner by combining the techniques of Sections IV-A and V, and is omitted. As in Section IV-A, we expect that a lower bound with will improve on a bound with when is large. Such a bound would be useful for InDel channels with large and small .

VII. DISCUSSION

The approach we have used to obtain capacity lower bounds consists of two parts: analyzing a decoder that synchronizes input and output runs by decoding auxiliary sequences, and bounding the rate penalty due to the suboptimality of such a decoder. The rate penalty measures the extra information that is decoded compared to maximum-likelihood decoding, and can be lower bounded by identifying patterns of deletions and insertions that give rise to multiple auxiliary sequences for the same pair. We now discuss some directions for future work.

Improving the Lower Bounds: The lower bound for the penalty terms can be sharpened by identifying additional deletion/insertion patterns that lead to different auxiliary sequences for the same pair of input and output sequences. For the deletion channel, such patterns were briefly described at the end of Section V-A. There are also a few other ways to sharpen the bounds for the insertion and InDel channels. Creating the sequences and using the positions of complementary insertions synchronizes the input and output runs, but is not an optimal way to use the sequence . Is there a better way to use the knowledge of ? Another observation is that the presence of insertions results in an output process that is not Markov, which is the reason an exact expression for the limiting behavior of could not be obtained in Lemma 9.



A better bound for this term would improve the capacity lower bound.

Another direction is to investigate the performance of more general input distributions with i.i.d. runs. In [11], a specific input distribution with i.i.d. runs was shown to be optimal for the deletion channel as . Further, it was shown that a first-order Markov input distribution incurs a rate loss of roughly with respect to the optimal distribution as . A similar result on the structure of the optimal input distribution for the InDel channel with small values of and would be very useful, and could be combined with the approach used here to obtain good estimates of the capacity for small insertion and deletion probabilities.

Upper Bounds: Obtaining upper bounds for insertion and

InDel channels is an important topic to address. For the InDel channel with , the upper bounding techniques of [7] and [12] can be used. Here an augmented decoder which knows the positions of the deleted runs is considered, and the capacity per unit cost of the resulting synchronized channel is upper bounded. For channels where complementary insertions occur , the computational approach of [9], [17] can be used to obtain upper bounds. Here it is assumed that the decoder is supplied with markers distinguishing blocks of the output which arose from successive input blocks of length . This augmented channel is then equivalent to a memoryless channel with input alphabet size , whose capacity can be evaluated numerically using the Blahut–Arimoto algorithm.
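Since the final numerical step in that approach is a standard capacity computation, here is a minimal Blahut–Arimoto sketch for a generic discrete memoryless channel; it is a textbook implementation, not code from [9] or [17], and the transition matrix in the sanity check is illustrative only.

```python
# Standard Blahut-Arimoto iteration for a DMC with W[x][y] = P(y|x).
import numpy as np

def blahut_arimoto(W, iters=500):
    W = np.asarray(W, dtype=float)
    p = np.full(W.shape[0], 1.0 / W.shape[0])   # start from uniform inputs
    for _ in range(iters):
        q = p @ W                               # induced output distribution
        with np.errstate(divide="ignore", invalid="ignore"):
            logratio = np.where(W > 0, np.log2(W / q), 0.0)
        d = (W * logratio).sum(axis=1)          # D(W(.|x) || q), in bits
        cap = float((p * d).sum())              # capacity estimate, bits/use
        p = p * np.exp2(d)                      # multiplicative update
        p /= p.sum()
    return cap

# sanity check: BSC(0.1) has capacity 1 - h(0.1) ~ 0.531 bits
print(blahut_arimoto([[0.9, 0.1], [0.1, 0.9]]))
```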

It is interesting to explore how the insertion capacity varies with for fixed . Fig. 2 shows that the insertion lower bound is not a monotonic function of , but is the highest for . Does the actual capacity also behave in a similar way? Further, what is the “worst” for a given ?

Extension to Larger Alphabets: The capacity lower bounds can be extended to channels with larger alphabets. Though some modifications arise, the steps leading up to the result are similar. For an alphabet of size , fix a symmetric first-order Markov input distribution such that the probability that is and the probability that takes any of the other values in the alphabet is . Consider the deletion channel where each symbol is deleted with probability . The output sequence is still first-order Markov. Proposition 4 gets modified so that the right side of (36) is multiplied by to account for the multiple possibilities when . Lemmas 12 and 14 remain unchanged. The input entropy rate is computed with the new input distribution, and the conditional entropy rate calculation in Lemma 13 now has to account for the fact that when runs are deleted, the symbol corresponding to the run also needs to be indicated. In other words, (40) and (41) are modified so that corresponds to a sequence of (symbol, run-length) pairs, as does . The insertion and InDel capacity lower bounds can be similarly extended for larger alphabet channels that are symmetric; a small sampler sketch of such an input process follows.
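Below is a small sampler sketch for the symmetric first-order Markov input just described; the names K (alphabet size) and g (the probability of repeating the previous symbol) stand in for the paper's elided symbols.

```python
# Symmetric first-order Markov source over an alphabet of size K:
# repeat the previous symbol w.p. g, otherwise move to one of the
# other K-1 symbols uniformly at random.
import random

def symmetric_markov(n, K, g, seed=0):
    rng = random.Random(seed)
    x = [rng.randrange(K)]
    for _ in range(n - 1):
        if rng.random() < g:
            x.append(x[-1])                    # extend the current run
        else:
            step = rng.randrange(1, K)         # uniform over other symbols
            x.append((x[-1] + step) % K)
    return x

# run lengths are i.i.d. Geometric(1-g), so the mean run length is 1/(1-g)
x = symmetric_markov(100_000, K=4, g=0.8)
runs = 1 + sum(x[i] != x[i - 1] for i in range(1, len(x)))
print(len(x) / runs)   # ~ 1 / (1 - 0.8) = 5
```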

The framework used here can be extended to derive bounds for channels with substitution errors in addition to deletions and insertions. For this, we would need an additional auxiliary sequence, e.g., one that indicates the positions of substitutions.

The problem of synchronization also appears in file backup and file sharing [2], [3]. In the basic model, we have two terminals, with the first terminal having source and the second having , which is an edited version of . The edits may include deletions, insertions, and substitutions. A basic question is: to update to , what is the minimum communication rate needed from the first node to the second? It can be shown that regardless of whether the first terminal knows or not, the optimal rate is given by the limiting behavior of .

(65)

(66)



The results derived in this paper provide bounds on this optimal rate for the case where is Markov, and the edit model is one with i.i.d. deletions and insertions. Extension of these results to edit models with substitution errors would yield rates to benchmark the performance of practical file synchronization tools such as rsync [20].

APPENDIX I
INSERTION CHANNEL

A) Proof of Lemma 5: We begin by noting that almost surely, due to the strong law of large numbers.

We decompose and upper bound as shown in (65) at the bottom of the previous page. First examine the second term in the last line of (65). The size of the support of is at most , since is a binary sequence of length at most . Hence, from Lemma 2, is uniformly integrable. From Lemma 1, for any , there exists a such that

(67)

whenever . Since almost surely, is less than for all sufficiently large . Thus, (67) is true for all sufficiently large . Similarly, the third term can be shown to be smaller than for all sufficiently large . Therefore, for all sufficiently large , (65) becomes

(68)

where holds because and can each take on at most different values. Hence,

Since is arbitrary, we let to obtain

(69)

Using steps similar to (65), we can decompose and lower bound as shown in (66) at the bottom of the previous page. Using arguments identical to the one used for the upper bound, we can show that the last two terms in (66) are smaller than in absolute value for all sufficiently large , leading to

(70)

Combining (69) and (70) completes the proof.

B) Proof of Lemma 6: We have

(71)

where the inequality holds because conditioning cannot increase entropy. Therefore,

(72)

From Proposition 1, the process is characterized by a Markov chain with state at time given by . The conditional distribution is given by (18) and the stationary joint distribution is given by (19). The right side of (72) can then be computed from the entropy rate of this Markov chain as

(73)

where the subscript refers to the stationary joint distribution on , given by (19) and (18).

can be computed as follows. First, we note that whenever . Therefore,

(74)

From (19) and (18), we have

(75)

and

(76)



The remaining terms in (74) can be similarly calculated to obtain (20).
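As a concrete illustration of the entropy-rate computation invoked in (73), the following sketch evaluates the entropy rate of a generic stationary ergodic Markov chain from its transition matrix. The two-state matrix in the example is illustrative only; it does not reproduce the paper's state space or the distributions (18) and (19).

```python
# Entropy rate of a stationary Markov chain: sum_s pi(s) * H(P(.|s)).
import numpy as np

def entropy_rate(P):
    P = np.asarray(P, dtype=float)
    # stationary distribution: left eigenvector of P for eigenvalue 1
    w, v = np.linalg.eig(P.T)
    pi = np.real(v[:, np.argmin(np.abs(w - 1))])
    pi /= pi.sum()
    with np.errstate(divide="ignore"):
        logP = np.where(P > 0, np.log2(P), 0.0)
    return float(-(pi[:, None] * P * logP).sum())

# illustrative 2-state chain; its stationary law is (5/6, 1/6)
print(entropy_rate([[0.9, 0.1], [0.5, 0.5]]))   # ~0.557 bits per step
```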

C) Proof of Lemma 7: In the following, all entropies are with respect to the distribution , so we drop the subscript for brevity. Let denote the th run of , and denote the sequence of -bits corresponding to the th run of . (Note that is distinct from the th run of .) Let denote the number of runs in sequence . Using this notation, we can write . We have

(77)

The inequality in the last line of (77) is obtained by conditioning on additional random variables: denotes all the bits in except . Given , we know exactly which bit in corresponds to each bit in . Therefore, the set of bits in that correspond to the run is also known. This set of bits is denoted by .² To summarize, given , the pair is determined and any remaining uncertainty may only be about which bits in underwent insertions to yield .

To obtain an analytical lower bound, we only consider terms

in (77) for which has a particular structure. Motivated by the discussion in Section IV-A (see (21)), we consider terms for which is an -run of length and is a -run of length for some and . Let and be the -bits just before and after , respectively; similarly denote by and the -bits just before and after the run . Define the set as in (78) at the bottom of the page. The arrow in the definition in (78) means that the input bits give rise to

through some pattern of insertions. We lower

²Note that is not the th run of . In fact, may not even be a full run of .

bound (77) by considering only that belong to:

(79)

Given , the following are the different choices for insertion pattern :
1) The bit undergoes a complementary insertion leading to the first in the . Then, out of the s in undergo duplications, the remaining s are transmitted without any insertions. There are such insertion patterns, each of which occurs with probability .
2) is transmitted without any insertions. out of the s in the undergo duplications, the remaining are transmitted without insertions. There are such insertion patterns, each of which occurs with probability .
We thus compute for all

(80)

with . Substituting (80) in (79), we obtain

(81)

The term in square brackets is simply the expected number of -runs for which the event in (78) occurs. By definition, the event occurs only when the run is a full -run that yields without any complementary insertions in . (If there is a complementary insertion in a full run of , it will create a for which is only part of the full -run.) Hence in (81), the term in brackets can be written as shown in (82) at the bottom of the next page. In (82), each term in

(78)



is the probability of an -run having length bits³ multiplied by the probability of it generating a -run of length (given by points 1 and 2 above). is obtained by recognizing that the expected number of runs in is . Substituting (82) in (81) and dividing throughout by yields the lemma.

D) Proof of Lemma 9: We have

(83)

Therefore,

(84)

We now show that the on the right side of (84) is a limit and is given by (25).

Note that whenever since we cannot have two consecutive insertions. Also, whenever since only when is a complementary insertion. Thus, we have for all :

(85)

³Conditioned on , the run-lengths of are not strictly independent as they have to sum to . This can be handled by observing that concentrates around its mean and taking the sum only over values much smaller than , e.g., . We then show that conditioned on , the probability that a run of has length is very close to .

The remainder of the proof consists of showing that the quantities in (85) all converge to well-defined limits which can be easily computed.

For all , , where if is an inserted bit, and otherwise. Therefore,

(86)

The binary-valued process is a Markov chain with transition probabilities

(87)

For , this is an irreducible, aperiodic Markov chain with a stationary distribution given by

(88)
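As a quick numerical check of (88), under the reading of (87) suggested by the surrounding text (a regular bit is followed by an insertion with the insertion probability, here called p, and an inserted bit is never followed by another insertion), the stationary law is (1/(1+p), p/(1+p)), consistent with the output length concentrating near n(1+p) noted earlier. The parameter name p is ours, not the paper's notation.

```python
# Stationary distribution of the two-state insertion-indicator chain.
import numpy as np

p = 0.2
P = np.array([[1 - p, p],    # from a regular bit: insertion next w.p. p
              [1.0, 0.0]])   # from an inserted bit: next bit is regular
pi = np.linalg.matrix_power(P, 200)[0]   # rows converge to the stationary law
print(pi, np.array([1.0, p]) / (1 + p))  # the two agree
```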

Hence for any ,

(89)

for all sufficiently large . Using this in (86), for all sufficiently large , the distribution is within total variation norm of the following stationary distribution:

(90)

Further, we have

since the input distribution and the insertion process are both symmetric in 0 and 1. Hence, the stationary distribution for is

(91)

(82)



Next, we determine the conditional distribution.

For ,

(92)

In the chain above, is obtained using (86). is obtained as follows. The event implies is a duplication, and hence corresponds to an input bit (say ), and is the next input bit . The probability that is . Hence, . When , corresponds to an input bit, say . Conditioned on this, the event can occur in two ways:
1) is the next input bit and is equal to . This event has probability .
2) is a duplication of . This event has probability .
Hence, . We similarly calculate

(93)

(94)

and

(95)

Using (89) in (92)–(95), we see that for all sufficiently large , the distribution is within a total variation norm from the following stationary distribution:

(96)

We conclude that the distribution converges to as , where the joint distribution is obtained by combining (91) and (96). Due to the continuity of the entropy function in the joint distribution, we also have

We can use these facts in (85) and compute to obtain the lemma.

E) Proof of Lemma 10: Due to (26), it is enough to show that converges to . Since is an i.i.d. process, from the strong law of large numbers, we have

(97)

Further, the normalized number of input runs almost surely. Using the above in Lemma 4, we obtain

(98)

We now argue that is uniformly integrable. can be upper bounded by since the random sequence is equivalent to , which can take on at most values. Hence, from Lemma 2, is uniformly integrable. Using this together with (98) in Lemma 3, we conclude that

(99)

The proof that converges to is essentially identical.

F) Proof of Lemma 11: The proof is similar to that of Lemma 7 in Appendix I-C. As before, let denote the th run of and be the sequence of s corresponding to the th run of . We can expand as

(100)



The last inequality holds because can be determined from and extra conditioning cannot increase the entropy. As in Appendix I-C, we only consider terms in (100) for which belongs to for some and . (Please refer to (78) for the definition of and related notation.) We lower bound the right side of (100) as follows:

(101)

Given , the only uncertainty in is in the first bit of , which is denoted (using the notation introduced in (78)). The different possibilities for are the following.
1) if undergoes a complementary insertion leading to . In this case, out of the s in undergo duplications and the remaining s are transmitted without any insertions. There are such insertion patterns, each of which occurs with probability .
2) if is transmitted without any insertions. In this case, out of the s in the undergo duplications, the remaining are transmitted without insertions. There are such insertion patterns, each of which occurs with probability .
is the binary entropy associated with the two possibilities above and is given by

(102)

Substituting (102) in (101), we obtain

(103)

The term in square brackets above was computed in (82). Substituting it in (103) and dividing throughout by completes the proof.

APPENDIX II
DELETION CHANNEL

A) Proof of Lemma 12: We first show that

(104)

From Propositions 3 and 4, and are both ergodic Markov chains with stationary transition probabilities. Therefore, from the Shannon–McMillan–Breiman theorem [21], we have

(105)

(106)

Subtracting (106) from (105), we obtain

(107)

Due to Lemma 4, (107) and the fact that almost surely imply that (104) holds.

The support of the random variable can be upper bounded by representing as

(108)

where the s represent the bits of the sequence , and each represents a missing run. Since the maximum length of the binary sequence in (108) is , we have . Hence, from Lemma 2, is uniformly integrable. Using this together with (104) in Lemma 3, we conclude that

(109)

We thus have

B) Proof of Lemma 13: Due to (41), it is enough to show that converges to . Since is an i.i.d. process, from the strong law of large numbers, we have

(110)



Further, we have the normalized number of input runs almost surely. Using the above in Lemma 4, we obtain

(111)

Next, can be upper bounded by since the random sequence is equivalent to , which can take on at most values. Hence, from Lemma 2, is uniformly integrable. Using this together with (111) in Lemma 3, we conclude that

(112)

C) Proof of Lemma 14: We expand in terms of the runs of . Following the notation used in Appendix I-C, we denote the number of runs in by , the th run of by and the corresponding part of by . We have

(113)

where is the sequence obtained by removing from . denotes the exact deletion pattern corresponding to the output bits , i.e., it tells us which bit in corresponds to each bit in . The inequality in (113) holds since is determined by .

Motivated by the discussion in Section V-A, we obtain an

analytical lower bound for the right side of (113) by considering only those terms for which the run is generated from either one or three adjacent runs in , as in (43). For , and , define the set as in (114) at the bottom of the page. We allow the possibility that all of is generated from just one of the three runs; further note that the -run in the middle is always deleted. The right side of (113) is lower bounded by considering only triples , as follows:

(116)

can be computed as follows for . Given , we exactly know the set of adjacent runs in that gave rise to . Given , arises from one or three adjacent runs of . It is possible that additional runs may be deleted in on either side of the three adjacent runs shown in (114). To handle this case, we can assume that gives enough information so that we know the three adjacent input runs that correspond to . (Conditioning on additional random variables can only decrease the lower bound.) Then, the length- vector has at most one nonzero element: for , if was formed with bits from the first

(114)

(115)

Page 23: 6990 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 59, …rv285/ITNov13_indel.pdf · 2013. 10. 18. · 6990 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 59, NO. 11, NOVEMBER 2013

7012 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 59, NO. 11, NOVEMBER 2013

length- run and bits from the third length- run, will have a nonzero in position . If was formed with all bits from the first length- run, then all the elements of are zero and the symbol in immediately after is nonzero. We thus have

(117)

Next, for a fixed , we compute the three innermost sums of in (116). This is done in (115) at the bottom of the previous page, where is obtained as follows. In the third line of (115), each term of the inner expectation is the probability of three successive -runs having the specified lengths and giving rise to a -run of length . We use the fact that is first-order Markov and thus has independent runs.⁴ holds because is first-order Markov with parameter and has expected length . The expected number of runs in equals times the expected length of . Substituting (117) and (115) in (116), we obtain

(118)

Dividing both sides by yields the result of the lemma.

APPENDIX III
INDEL CHANNEL

A) Proof of Lemma 16: We first note that

(119)

The proof of (119) is along the same lines as that of Lemma 5: we use the uniform integrability of the sequence along with the fact that . The uniform integrability follows from Lemma 2 since is upper bounded by as explained in Section II-A. We then have

(120)

⁴Conditioned on , the run-lengths of are not strictly independent. This can be handled by observing that concentrates around its mean and taking the , sums only over values much smaller than , e.g., , . We can then show that conditioned on , the probability that a run of has length is very close to .

We will show that exists and obtain an analytical expression for it. For all ,

(121)

The first equality above holds since implies is an inserted bit, and so no deleted runs occur between and .

can be computed as follows:

(122)

The last two terms in (a) are obtained by noting that implies is an insertion and hence . In this case, is the last noninserted bit before , and the last two terms in (a) correspond to being a duplication and a complementary insertion, respectively. The convergence in the last line is due to the fact that is a Markov chain with stationary distribution

(123)

Similarly, as , converges to

(124)

The joint distributions and are next determined in order to compute the entropies in (121). For , we have

(125)

The first term in (125) corresponds to being an original input bit, the second term to being a duplication, and the third to being a complementary insertion. Each of these terms can be calculated in a manner very similar to



(35) and (36) in Proposition 4. Combined with the convergence of to the stationary distribution (123), we obtain that converges to the distribution

(126)

Similarly, we can also show that converges to the distribution

(127)

As a sanity check, it can be verified that when the joint distributions given by (126) and (127) are summed over , they yield the distributions specified in (122) and (124), respectively.

The expression in (121) can now be computed by substituting from (122) and (124) for the terms and calculating the entropy terms using the joint distributions in (126) and (127). (The calculation is elementary but somewhat tedious, and hence omitted.) This yields the limiting value of . The lemma then follows from (119) and (120).

ACKNOWLEDGMENT

We thank the Associate Editor and the anonymous reviewers for their comments which helped us strengthen our initial results and improve the paper.

REFERENCES

[1] A. R. Iyengar, P. H. Siegel, and J. K. Wolf, “Write channel model for bit-patterned media recording,” IEEE Trans. Magn., vol. 47, no. 1, pp. 35–45, Jan. 2011.

[2] N. Ma, K. Ramchandran, and D. Tse, “Efficient file synchronization: A distributed source coding approach,” in Proc. IEEE Int. Symp. Inf. Theory, 2011, pp. 583–587.

[3] R. Venkataramanan, S. Tatikonda, and K. Ramchandran, “Bounds on the optimal rate for synchronization from deletions and insertions,” presented at the Inf. Theory Appl. Workshop, San Diego, CA, USA, 2011.

[4] S. N. Diggavi and M. Grossglauser, “On information transmission over a finite buffer channel,” IEEE Trans. Inf. Theory, vol. 52, no. 3, pp. 1226–1237, Mar. 2006.

[5] E. Drinea and M. Mitzenmacher, “Improved lower bounds for the capacity of i.i.d. deletion and duplication channels,” IEEE Trans. Inf. Theory, vol. 53, no. 8, pp. 2693–2714, Aug. 2007.

[6] A. Kirsch and E. Drinea, “Directly lower bounding the information capacity for channels with i.i.d. deletions and duplications,” IEEE Trans. Inf. Theory, vol. 56, no. 1, pp. 86–102, Jan. 2010.

[7] S. Diggavi, M. Mitzenmacher, and H. Pfister, “Capacity upper bounds for the deletion channel,” in Proc. IEEE Int. Symp. Inf. Theory, 2007, pp. 1716–1720.

[8] M. Mitzenmacher, “A survey of results for deletion channels and related synchronization channels,” Probab. Surv., vol. 6, pp. 1–33, 2009.

[9] D. Fertonani and T. M. Duman, “Novel bounds on the capacity of the binary deletion channel,” IEEE Trans. Inf. Theory, vol. 56, no. 6, pp. 2753–2765, Jun. 2010.

[10] A. Kalai, M. Mitzenmacher, and M. Sudan, “Tight asymptotic bounds for the deletion channel with small deletion probabilities,” in Proc. IEEE Int. Symp. Inf. Theory, 2010, pp. 997–1001.

[11] Y. Kanoria and A. Montanari, “On the deletion channel with small deletion probability,” in Proc. IEEE Int. Symp. Inf. Theory, 2010, pp. 1002–1006 [Online]. Available: http://arxiv.org/abs/1104.5546

[12] M. Mitzenmacher, “Capacity bounds for sticky channels,” IEEE Trans. Inf. Theory, vol. 54, no. 1, pp. 72–77, Jan. 2008.

[13] R. L. Dobrushin, “Shannon’s theorems for channels with synchronization errors,” Probl. Peredachi Inf., vol. 3, no. 4, pp. 18–36, 1967.

[14] A. R. Iyengar, P. H. Siegel, and J. K. Wolf, “Modeling and information rates for synchronization error channels,” in Proc. IEEE Int. Symp. Inf. Theory, 2011, pp. 380–384 [Online]. Available: http://arxiv.org/abs/1302.2702

[15] H. Mercier, V. Tarokh, and F. Labeau, “Bounds on the capacity of discrete memoryless channels corrupted by synchronization and substitution errors,” IEEE Trans. Inf. Theory, vol. 58, no. 7, pp. 4306–4330, Jul. 2012.

[16] R. G. Gallager, “Sequential decoding for binary channels with noise and synchronization errors,” Lincoln Lab. Group Report, Oct. 1961.

[17] D. Fertonani, T. M. Duman, and M. F. Erden, “Bounds on the capacity of channels with insertions, deletions and substitutions,” IEEE Trans. Commun., vol. 59, no. 1, pp. 2–6, Jan. 2011.

[18] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York, NY, USA: Wiley-Interscience, 2006.

[19] G. Grimmett and D. Stirzaker, Probability and Random Processes. Oxford, U.K.: Oxford Univ. Press, 2001.

[20] A. Tridgell and P. Mackerras, “The rsync algorithm,” Nov. 1998 [Online]. Available: http://rsync.samba.org/

[21] P. H. Algoet and T. M. Cover, “A sandwich proof of the Shannon–McMillan–Breiman theorem,” Ann. Probab., vol. 16, no. 2, pp. 899–909, 1988.

Ramji Venkataramanan (S’06–M’09) received the B.Tech. degree in Electrical Engineering from the Indian Institute of Technology, Madras in 2002, and the Ph.D. degree in Electrical Engineering (Systems) from the University of Michigan, Ann Arbor in 2008. He held post-doctoral positions at Stanford University and Yale University before joining the University of Cambridge in 2013, where he is a University Lecturer in the Department of Engineering and a Fellow of Trinity Hall. His current research interests are in information theory, coding and statistical inference.

Sekhar Tatikonda (S’92–M’00–SM’13) received the B.S. (1993), M.S. (1995), and Ph.D. (2000) in Electrical Engineering and Computer Science from the Massachusetts Institute of Technology, Cambridge, MA. He was a postdoctoral fellow at the University of California, Berkeley from 2000–2002. In 2002, he joined the Electrical Engineering Department at Yale University, New Haven, CT, where he is currently an associate professor. His research interests are in communication theory, information theory, stochastic control, distributed optimization, statistical machine learning and inference.

Kannan Ramchandran (F’05) received the Ph.D. degree in 1993 from Columbia University, New York. He is currently a Professor of Electrical Engineering and Computer Science at the University of California at Berkeley, where he has been since 1999. Prior to that, he was with the University of Illinois at Urbana-Champaign from 1993 to 1999, and was at AT&T Bell Laboratories from 1984 to 1990. He is a Fellow of the IEEE and has won numerous awards including the Eli Jury thesis award at Columbia, a couple of Best Paper awards from the IEEE Signal Processing Society, a Hank Magnusky Scholar award at Illinois, an Okawa Foundation Research Prize at Berkeley, an Outstanding Teaching Award from the EECS Department at UC Berkeley, and has coauthored several best student paper awards at conferences and workshops. His current research interests include distributed signal processing and coding for wireless systems, coding for distributed storage, peer-to-peer networking and video content delivery, security, and multiuser information and communication theory.