Polar Codes for Channels with Insertions, Deletions, and Substitutions
Henry D. Pfister
Duke University
Ido Tal
Technion
Abstract—This paper presents a coding scheme for an insertion deletion substitution channel. We extend a previous scheme for the deletion channel, where polar codes are modified by adding "guard bands" between segments. In the new scheme, each guard band is comprised of a middle segment of '1' symbols, and left and right segments of '0' symbols. Our coding scheme allows for a regular hidden-Markov input distribution, and achieves the information rate between the input and corresponding output of such a distribution. Thus, we prove that our scheme can be used to efficiently achieve the capacity of the channel. The probability of error of our scheme decays exponentially in the cube-root of the block length.
I. INTRODUCTION
In many communications systems, symbol-timing errors can result in insertion and deletion errors. For example, the insertion deletion substitution (IDS) channel maps a length-N input string to a finite output string by sampling an i.i.d. output process for each input that selects between insertion, deletion, and substitution. These types of channels were first studied in the 1960s [1], [2], and modern coding techniques were first applied to them in [3]. Over the past 15 years, bounds on the capacity of synchronization error channels have been significantly improved [4]–[8].

In [9], [10], a capacity-achieving coding scheme is introduced for the deletion channel based on polar codes. The construction and proof build upon many earlier results (e.g., [11]–[19]); see [10] for a detailed description of these connections. The construction in [9], [10] is based on generating codewords consisting of smaller blocks separated by guard bands. After reception, the overall output sequence is separated into blocks associated with the smaller input blocks. However, the separation process changes the effective channel experienced by the small blocks. In particular, the guard bands are long blocks of zeros, and the separation process removes all zeros on either side of the small block. The analysis in [9], [10] shows that the resulting channel, dubbed the trimmed deletion channel (TDC), polarizes weakly and has the same mutual information rate as the original deletion channel. Due to the possibly unbounded memory in the deletion channel, the standard extension [20] to strong polarization does not work. Instead, strong polarization can be shown for the polar combining of these small blocks due to the independence provided by the guard bands. These elements complete the achievability proof for the deletion channel.

In this paper, we apply roughly the same coding scheme to the IDS channel. The main difference is that separating the overall output sequence into smaller blocks is more challenging.
For the deletion channel, an input that only contains zeros always gives an output that only contains zeros. Thus, the separation process consists of parsing into small blocks and removing zeros from the edges. For the IDS channel, an input only containing zeros typically gives an output containing both zeros and ones. Fortunately, the expected fraction of zeros will be noticeably larger than the fraction of ones. This observation, along with a more complicated parsing process, can be used to separate the overall output sequence into blocks associated with the smaller input blocks.

The key challenge is designing the parsing process so that the effective channel experienced by the small block can be analyzed. In particular, our parsing process produces segments that can be seen roughly as the IDS output of an input consisting of a prefix of zeros, the original input data, and a suffix of zeros. For all the small output blocks, the prefix and suffix lengths are i.i.d. random variables with a known distribution. We refer to the resulting channel as the dirty zero-padded (DZP) IDS channel. To establish the coding theorem for the IDS channel, we must show three things. First, that our parsing of the IDS channel output gives small blocks whose joint input-output distribution matches that of the DZP channel. Second, that the DZP channel polarizes weakly and has the same mutual information rate as the original IDS channel. Third, that the trellis representation of the joint input-output distribution of the IDS channel [3] can be modified to give the joint input-output distribution of the DZP channel. In this work, we establish these three elements and describe the first two elements herein.

By combining the parsing process described in this paper with the results of [9], [10], one gets the following theorem. Due to space limitations, many details are deferred to the extended version of this paper.

Theorem 1.
Fix a regular hidden-Markov input process and a parameter ν ∈ (0, 1/3). The rate of our coding scheme approaches the mutual information rate between the input process and the binary IDS channel output. The encoding and decoding complexities are O(Λ log Λ) and O(Λ^{1+3ν}), respectively, where Λ is the blocklength. For any 0 < ν′ < ν and sufficiently large blocklength Λ, the probability of decoding error is at most 2^{−Λ^{ν′}}.

The structure of this paper is as follows. In Section II we define the IDS channel, and also a close variant which we term the "dirty zero padding IDS channel" (DZP). Section III details how encoding is done. In Section IV, we define two decoding methods. Namely, we first define a decoding method executed by a genie, which is in possession of some extra information (it knows where the "commas" which separate the outputs corresponding to certain input blocks are). The utility of the genie's decoding method is that it is easy to analyze (the DZP channel is used in the analysis). We then define a second decoding method: Aladdin's decoding method. Since Aladdin is a mere mortal, he does not have knowledge of where the above commas lie. That is, Aladdin's method is the one we can actually implement. The main trick is to show that, with very high probability, the genie's decoder and Aladdin's decoder produce the exact same result.

II. CHANNEL MODELS
In this section, we define the IDS and DZP channels.
A. Dobrushin’s Channel and the IDS Channel
In 1967, Dobrushin introduced a general class of channels with synchronization errors and proved a random coding theorem for that class [2]. The model consists of a finite input alphabet X and a conditional distribution p_{Y|X}(·|x) over finite output strings Y ⊆ X* = ∪_{n=0}^{∞} X^n given x ∈ X, where ǫ denotes the empty string of length 0. For the input x = (x_1, x_2, . . . , x_N) ∈ X^N, the channel output is generated by drawing Y_n ∼ p_{Y|X}(·|x_n) i.i.d. and concatenating to get Y = Y_1 ⊙ Y_2 ⊙ · · · ⊙ Y_N.

For example, the binary deletion channel with deletion probability p_d has X = {0, 1} and Y = {ǫ, 0, 1} with non-zero probabilities P_{Y|X}(ǫ|x) = p_d and P_{Y|X}(x|x) = 1 − p_d for all x ∈ X. Similarly, the binary IDS channel we consider has IDS probabilities (p_i, p_d, p_s), X = {0, 1}, and Y = {ǫ, 0, 1, 00, 01, 10, 11} with non-zero probabilities P_{Y|X}(ǫ|x) = p_d, P_{Y|X}(x|x) = 1 − p_i − p_d − p_s, P_{Y|X}(x̄|x) = p_s, and P_{Y|X}(0x|x) = P_{Y|X}(1x|x) = p_i/2 for all x ∈ X, where x̄ denotes the complement of x. While we focus on this binary IDS channel for concreteness, the approach described here should generalize to any well-behaved binary-input Dobrushin channel for which the output distribution associated with the all-zero input is distinguishable from finite shifts of the output distribution associated with the all-one input. For simplicity, we focus on the case where they are distinguishable simply by counting ones and zeros.

Define α_{0|x} (α_{1|x}) as the expected number of '0' ('1') symbols at the output of the channel, given that the input was x ∈ X. Note that the expected length of an output, given that the input was x ∈ X, is α_{0|x} + α_{1|x}. We require that this sum is independent of x, and denote it as

β = α_{0|0} + α_{1|0} = α_{0|1} + α_{1|1}. (1)

We also require an "advantage" to x at the output, if the input was x.
That is, we require that

α_{0|0} > α_{1|0} and α_{1|1} > α_{0|1}, (2)

and denote

γ ≜ min{α_{0|0} − α_{1|0}, α_{1|1} − α_{0|1}} > 0, (3)

where the inequality follows by (2).

Informally, the above "advantage to the input at the output" will allow us to differentiate between a long input of '0' symbols and a long input of '1' symbols. Specifically, fix a window length h > 0 and an x ∈ X. Then, generate an output sequence of length at least h + 1 and optionally remove the first output bit. Then, we count the number of '0' symbols contained in the first h positions of the string. If it is at least h/2, then we declare that x = 0; otherwise, we declare that x = 1. The following lemma states that we have a very high chance of guessing correctly, for large enough window length h.

Lemma 2.
Let x ∈ X be fixed, and let a window length h ≥ h_0 be given, where h_0 is a constant dependent on the channel. Let Y be a string of length h generated by truncating the output associated with the all-x input, where the first output bit is optionally removed. Then, the probability that Y contains fewer than h/2 bits equal to x is less than err_h ≜ e^{−h·c}, where c is a positive constant dependent on the channel.

Proof: See Appendix.
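For concreteness, the IDS transition law and the counting rule above can be sketched in a few lines of Python. This is a simulation sketch under our notation; the function names and parameter values are ours, not part of the construction.

```python
import random

def ids_channel(bits, p_i, p_d, p_s, rng):
    """Pass a bit string through the binary IDS channel: each input bit is,
    independently, deleted (prob. p_d), substituted (p_s), preceded by a
    uniformly random inserted bit (p_i), or transmitted cleanly."""
    out = []
    for x in bits:
        u = rng.random()
        if u < p_d:
            continue                            # deletion: output the empty string
        elif u < p_d + p_s:
            out.append(1 - x)                   # substitution: flip the bit
        elif u < p_d + p_s + p_i:
            out.extend([rng.randint(0, 1), x])  # insertion: random bit, then x
        else:
            out.append(x)                       # clean transmission
    return out

def guess_input_symbol(window, h):
    """Majority count over the first h output symbols: declare the (repeated)
    input symbol was 0 iff at least h/2 of them are '0'."""
    zeros = sum(1 for b in window[:h] if b == 0)
    return 0 if zeros >= h / 2 else 1
```

With, say, (p_i, p_d, p_s) = (0.05, 0.05, 0.05), each input '0' contributes on average 0.925 '0' symbols and 0.075 '1' symbols to the output, so the h/2 threshold misclassifies a long all-zero input with probability decaying exponentially in h, in line with Lemma 2.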
B. Dirty-Zero-Padding IDS channel
The DZP channel W⋆ is defined by the IDS channel W, the input blocklength N, and two probability distributions over X*, P_padLeft and P_padRight. Given the length-N input string x, we first pass x through the IDS channel W and let y_middle denote the output. Next, we draw two independent vectors y_left and y_right according to the probability distributions P_padLeft and P_padRight, respectively. The output of the DZP channel is then given by

y⋆ = y_left ⊙ y_middle ⊙ y_right. (4)

We will specify P_padLeft and P_padRight later. For now, let us say informally that y_left and y_right are the result of passing strings of '0' symbols through the channel W. Hence the name: we pad y from the left and right by vectors corresponding to zeros "dirtied" by passing through the channel W.

Informally, the following lemma states that, in the limit as N → ∞, the mutual information rates of W and W⋆ are equal. As will become apparent later, the maximum possible length of y_left and the maximum possible length of y_right both grow sub-linearly in N. Hence, the condition of the lemma is not vacant.

Lemma 3.
Let X ∈ X^N be a random vector of length N. Let Y and Y⋆ be the outputs gotten when X is input to the IDS channel W and the DZP channel W⋆, respectively. Let N be large enough so that the maximum length that y_left can take and the maximum length that y_right can take are both at most N − 1. Then,

(I(X; Y) − 2 log N)/N ≤ I(X; Y⋆)/N ≤ I(X; Y)/N.

Proof.
It suffices to prove the inequalities for the numerators, since the denominators all equal N. The inequality I(X; Y⋆) ≤ I(X; Y) follows by the data-processing inequality, since X, Y, and Y⋆ form a Markov chain, in that order. We will show that

I(X; Y) − 2 log N ≤ I(X; Y⋆). (5)

Let us first denote Y⋆ = Y_left ⊙ Y_middle ⊙ Y_right, as described above. Next, note that we can assume w.l.o.g. that Y = Y_middle. Finally, note that X, (Y⋆, |Y_left|, |Y_right|), and Y form a Markov chain, in that order, where |·| denotes the length of a string. Thus,

I(X; Y) ≤ I(X; Y⋆, |Y_left|, |Y_right|)
≤ I(X; Y⋆) + H(|Y_left|) + H(|Y_right|)
≤ I(X; Y⋆) + 2 log(N),

because both |Y_left| and |Y_right| can take at most N different values. Thus, (5) holds and the proof is complete.

III. ENCODING
Suppose for a moment that we were coding not for the IDS channel W, but for the DZP channel W⋆. First of all, recall that the channel W⋆ accepts a block of N_0 bits. We choose a typically "large" N_0. However, instead of only sending a single block of length N_0, we send Φ such blocks, denoted x(1), x(2), . . . , x(Φ). The important point to note is the output: denote the output of W⋆ corresponding to x(i) as y⋆(i). We assume that the output corresponding to the above input is (y⋆(1), y⋆(2), . . . , y⋆(Φ)), as opposed to y⋆(1) ⊙ y⋆(2) ⊙ · · · ⊙ y⋆(Φ). That is, we assume that the output blocks corresponding to the input blocks are punctuated. Namely, given the output corresponding to Φ blocks, we can distinguish the output corresponding to input block x(i). This is in stark contrast to W, in which no such punctuation is given.

For this setting, one can both encode and decode using polar codes; this is very similar to what was done in [10], with the DZP channel playing the role of the block-TDC channel. Given the information symbols and frozen indices, the information symbols are mapped to a polar codeword of length Φ·N_0 using polar encoding. Also, extending the ideas in [3], [10], we can build a trellis for calculating the joint probability of x being the input to W⋆ and y⋆ being the output (building such a trellis involves the use of P_padLeft and P_padRight). Finally, using Lemma 3 and essentially the same proof as [10], this coding scheme can approach the capacity of the IDS channel W. Due to lack of space, we do not go into further details.

Our coding scheme for the IDS channel W consists of two phases. In the first phase, we produce the blocks x(1), x(2), . . . , x(Φ) by taking the whole polar codeword and adding commas to separate it into blocks of length N_0. Then, we imagine these blocks being transmitted over W⋆. In the second phase, we add guard bands (defined shortly) between the above blocks.
The result is a long codeword that is transmitted over the channel W. Loosely speaking, the purpose of the guard bands is to allow the decoder to simulate the operation of W⋆ on the blocks x(1), x(2), . . . , x(Φ), even though we are in fact transmitting over the channel W.

Denote N = 2^n, N_0 = 2^{n_0}, and Φ = 2^{n − n_0}, where n_0 = ⌈nν⌉ and ν was fixed in Theorem 1. Let

x = x(1) ⊙ x(2) ⊙ · · · ⊙ x(Φ) (6)

be a vector of length N, consisting of blocks x(i), each of length N_0. We denote by g(x) the result of adding guard bands to x. For this, let us denote x = x_I ⊙ x_II, where x_I and x_II are the left and right halves of x, each of length N/2. Then

g(x) ≜ x, if n ≤ n_0; g(x_I) ⊙ g_n ⊙ g(x_II), if n > n_0, (7)

where g_n is termed the guard band and is defined as follows. Denote by 0^(ℓ) and 1^(ℓ) a string of ℓ consecutive '0' symbols and a string of ℓ consecutive '1' symbols, respectively. Let

ℓ_n ≜ ⌊2^{(1−ξ)(n−1)}⌋,

where ξ ∈ (0, 1/2) is a 'small' constant determined by the difference between ν and ν′ in Theorem 1. Then,

g_n ≜ 0^(ℓ_{n_0+1}) ⊙ 1^(ℓ_n) ⊙ 1^(ℓ_n) ⊙ 0^(ℓ_{n_0+1}), (8)

where the four segments are denoted g_left_n, g_midleft_n, g_midright_n, and g_right_n, respectively, and the two middle segments together form g_mid_n. We note that g_left_n and g_right_n are not, in fact, functions of n.

IV. DECODING
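Before turning to decoding, the recursive guard-band construction g(·) of (7) can be sketched in a few lines of Python. This is a simplified sketch in which all four guard-band segments share a single length function `ell` (a parameter of our making); the actual segment lengths are as specified in the construction.

```python
def guard_band(n, ell):
    """Level-n guard band: a run of '0's, a middle run of '1's (split into
    midleft/midright halves), and a closing run of '0's."""
    return [0] * ell(n) + [1] * ell(n) + [1] * ell(n) + [0] * ell(n)

def add_guard_bands(x, n, n0, ell):
    """Recursive construction g(.): split x into halves, recurse at level
    n - 1, and join the halves with the level-n guard band; stop once the
    blocks have length 2**n0."""
    if n <= n0:
        return list(x)
    half = len(x) // 2
    return (add_guard_bands(x[:half], n - 1, n0, ell)
            + guard_band(n, ell)
            + add_guard_bands(x[half:], n - 1, n0, ell))
```

For a length-16 codeword (n = 4) with blocks of length 4 (n_0 = 2) and segment length 2 at every level, the output interleaves the Φ = 4 data blocks with Φ − 1 = 3 guard bands, each of length 8.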
We now consider two settings for decoding. In both settings, a vector g(x) is transmitted over the IDS channel W, and the corresponding output is y. The settings differ only in their preliminary step, which parses the received vector y into Φ sub-vectors. In the first setting, which we call "genie parsing", an all-knowing genie receives the output y and adds commas in certain appropriate places. Recall from (6) that x is comprised of Φ blocks. After adding commas to the output, the genie produces for each block x(i) a corresponding output y⋆(i). The result is a series of outputs y⋆ = (y⋆(1), y⋆(2), . . . , y⋆(Φ)), where for each i, the probability law of y⋆(i) given x(i) is the DZP channel W⋆(y⋆(i)|x(i)). We then use the methods described in [10] to decode x from y⋆.

The second setting is called "Aladdin parsing". As before, g(x) is transmitted and y is received. The goal of Aladdin is to produce the same sequence y⋆ as the genie. Since Aladdin is a mere mortal, he does not have the knowledge required to guarantee that he will add commas in the appropriate places. This raises the question, "Why does the genie output have dirty zero-padding?". An all-knowing genie could produce the IDS output sequences y(1), . . . , y(Φ). But our genie chooses a weaker strategy (based on an i.i.d. dither sequence) so that Aladdin can hope to match the genie's parsing by making use of the guard bands. Thus, we will show that Aladdin can succeed in producing y⋆ with very high probability.

A. Genie parsing
Recall from (7) that the codeword we transmit is comprised of blocks x(i), 1 ≤ i ≤ Φ, separated by guard bands. Let g(i) denote the guard band between x(i) and x(i + 1), where g(i) equals g_{n′} for some n_0 < n′ ≤ n which is a function of i. Now we recall from (8) that each guard band g(i) is comprised of four blocks, which we denote g_left(i), g_midleft(i), g_midright(i), and g_right(i), for 1 ≤ i ≤ Φ − 1. The genie receives the output y, and adds commas between all the blocks, because the genie can distinguish which substring of y equals y(i), the output corresponding to x(i). It can also distinguish which part of y corresponds to g_□(i), where □ ∈ {left, midleft, midright, right}. We denote the relevant part of y as d_□(i), where "d" stands for "dirty".

Recall from (4) that, in order to return

y⋆(i) = y_left(i) ⊙ y(i) ⊙ y_right(i),

the DZP channel W⋆ must pad y(i) from the left and right. This padding is according to the probability distributions P_padLeft and P_padRight, which have yet to be specified. Now, we define how the genie produces y_left(i) and y_right(i) from the following punctuated segment of y:

d_midright(i − 1), d_right(i − 1), y(i), d_left(i), d_midleft(i).

In doing this, we implicitly define P_padLeft and P_padRight as the distributions of y_left(i) and y_right(i). Before we proceed, we encourage the reader to validate the following points: y_left(i) and y_right(i) are independent, and their distributions
• depend on the channel statistics of W;
• are not functions of i;
• are not functions of either x(i) or y(i).

Consider an index 1 < i < Φ (not the first nor last block). We now describe how y_left(i) depends on d_midright(i − 1) and d_right(i − 1).
The description of how y_right(i) depends on d_left(i) and d_midleft(i) is given by reflection symmetry. Before diving into the details, we emphasize that y_left(i) will consist of some suffix of d_right(i − 1). Since d_right(i − 1) is the result of sending a string of zeros, g_right(i − 1), we will indeed pad y(i) with a string of "dirty zeros". Here are the details.

1) The genie considers the length of d_midright(i − 1).
   a) If it is less than h, where h ≜ ℓ_{n_0} · β, the genie pads d_midright(i − 1) from the left. This is done by conceptually drawing a string from p_{Y|X}(·|1) and prepending d_midright(i − 1) with the string. In practice, we use independent random variables to simulate p_{Y|X}. This is repeated until the length of d_midright(i − 1) is at least h.
2) The genie considers the concatenated string z = d_midright(i − 1) ⊙ d_right(i − 1). It places a window of length h at the right side of d_midright(i − 1). That is, the window starts at z_s and ends at z_e, where e = |d_midright(i − 1)| and s = e − h.
3) The genie draws a random integer ρ = ρ_left(i) uniformly from {0, 1, . . . , h}. We think of ρ as a "random dither".
4) The genie shifts the window by ρ positions right. That is, ρ is added to both s and e.
   a) If the window falls off z, that is, if e > |z|, the genie chooses y_left = ǫ, the empty string. Otherwise, the genie continues to the next step.
5) The genie counts the number of '0' symbols in the window (i.e., the cardinality of {s ≤ j ≤ e | z_j = 0}).
6) If the count is at least h/2, the genie sets y_left to the remainder of z after deleting z_1 to z_e, and then finishes by returning y_left.
7) Otherwise, the genie shifts the window one frame right. That is, h is added to both s and e.
   a) If the window falls off z, that is, if e > |z|, the genie chooses y_left = ǫ, the empty string. Otherwise, the genie continues to the next step.
8) We set y_left to the remainder of z after deleting z_1 to z_e, and then finish by returning y_left.

The rationale of the above procedure will become clearer after we explain Aladdin's algorithm. For now, note that it is well defined and does indeed satisfy the requirements stated previously. The reader should also keep in mind that getting into a substep is 'bad' with respect to Aladdin's ability to mimic the genie. That is, we would like the probability of entering substeps 1a, 4a, or 7a to be 'small'.

We must address one last point: how the paddings for blocks i = 1 and i = Φ are handled. The right padding for i = 1 and the left padding for i = Φ are as above. The left padding for i = 1 and the right padding for i = Φ (i.e., the edge padding) are given by random sampling from P_padLeft and P_padRight. These choices are coupled so that the genie and Aladdin always choose the same realizations for these edge paddings.

B. Aladdin parsing
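Aladdin's parsing, described next, is built around the same dithered window scan that the genie's procedure uses. The following is a minimal sketch of that primitive; the function name, the single-sided orientation, and the convention of returning None on failure are simplifications of ours.

```python
def trim_left(z, h, dither):
    """Scan z with a length-h window: start at offset `dither`, slide one
    frame (h positions) at a time until the window holds at least h/2 '0'
    symbols, then drop everything up to and including that window.
    Returns None when the window falls off the end (the 'failure' case)."""
    s = dither
    while s + h <= len(z):
        window = z[s:s + h]
        if sum(1 for b in window if b == 0) >= h / 2:
            return z[s + h:]
        s += h
    return None
```

Note that the scan stops inside the zero-rich run rather than at its exact boundary, so the remainder typically begins with a few leftover "dirty zeros"; this is precisely the padding that the DZP channel model accounts for.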
Aladdin receives the vector y, and as a preliminary step adds the edge padding on the left and right (both of which are coupled to the genie's choices). We denote the resulting vector y_pad. Aladdin's parsing is given by mimicGenie(y_pad, n), where the recursive function mimicGenie(z, m) is defined by:
• If m = n_0, Aladdin returns z. Otherwise,
• Aladdin builds z_I and z_II as follows, and then returns mimicGenie(z_I, m − 1) (which contains 2^{m−n_0−1} vectors), followed by mimicGenie(z_II, m − 1) (which also contains 2^{m−n_0−1} vectors). Namely, Aladdin returns 2^{m−n_0} vectors.
• Let z_I be the left half of z and z_II be the right half of z (in case |z| is odd, z_I is longer than z_II by one bit). Then, Aladdin trims z_I and z_II.
• Trimming z_I is the "mirror image" of trimming z_II, which is done as follows:
  – Aladdin places a window of length h at the start of z_II. That is, the window starts at s = 0 and ends at e = h. If at any stage of the algorithm the window "falls off z_II", meaning that e > |z_II|, Aladdin declares failure.
  – Aladdin chooses a random dither ρ′ uniformly from {0, 1, . . . , h}.
  – Aladdin shifts the window ρ′ positions right by adding ρ′ to both s and e.
  – Aladdin checks if the window contains at least h/2 '0' symbols. If it does, the process continues to the next step. If it does not, Aladdin moves the window one frame to the right by adding h to both s and e, and then repeats this bullet point.
  – Aladdin trims z_II by removing the first e symbols.

C. Connections between Aladdin and genie parsing

Let us compare Aladdin's parsing to that of the genie. First of all, note that both decoders use 2(Φ − 1) random dithers during their runs (recall that we've denoted a random dither as ρ for the genie and ρ′ for Aladdin). To help Aladdin match the genie, these dithers can be coupled. That is, for each choice of random dithers the genie makes, we couple a unique choice of random dithers that Aladdin makes.
We also couple the remaining random choices made by the two decoders. The utility of this coupling is that, with high probability, both the genie and Aladdin return the same vector of DZP channel outputs (y⋆(i))_{i=1}^{Φ}.

Before describing the coupling, we note that, given the DZP parsing, the proof of Theorem 1 follows essentially the same steps as the main result in [10]. The steps will be detailed in a forthcoming longer version of this paper [21].

For brevity, we explain the coupling in terms of just two dithers. Consider the ρ that the genie chooses for padding x(i) from the left, for i = Φ/2 + 1. We couple this ρ with the ρ′ Aladdin chooses in the topmost part of the recursion, for producing z_II. Typically, the midpoint of z is in d_midright(i − 1) or d_midleft(i − 1). Aladdin adds the dither ρ′ to the window, and then shifts it one frame right, repeatedly, until the number of zeros is large enough. By Lemma 2, we conclude that the number of zeros will typically not be large enough until the window contains some part of d_right(i − 1). Consider the first time this happens, and set the genie's ρ to the number of window symbols taken from d_right(i − 1). One can think of the genie as having a 'shortcut' that avoids the previous steps Aladdin took. Both Aladdin and the genie have the same window at this point. If it contains enough zero symbols, they return the same padding. If it does not, they both shift it one frame right. At this stage, typically, the window will only contain symbols from d_right(i − 1), "dirty zeros". Thus, again by Lemma 2, Aladdin will typically stop at this stage, as the genie always does, and both will return the same left padding.

APPENDIX
A. Proof of Lemma 2
Let 0 < δ < 1 be a constant, dependent on the channel, that we will fix later. Recall that the expected length of an output corresponding to a single input is β, and define

k = ⌈h(1 − δ)/β⌉. (9)

We can think of the output Y as being manufactured as follows. We input the first bit (x) to the channel, then input k more bits (all x), and if the output up to this point has length less than h + 1, input however many bits (all x) are needed in order for the output length to be at least h + 1. Then, we possibly remove the first bit of the output, and set Y to the first h bits. We will call the output corresponding to the k input bits after the first input bit the essential output. Our proof hinges on showing that the following two events occur with very high probability: 1) all of the essential output is contained in Y, and 2) the essential output has more than h/2 bits equal to x.

Denote by Z = Z_1 ⊙ Z_2 ⊙ · · · ⊙ Z_k the essential output, where Z_i is the output corresponding to input bit i + 1. We find it easier to define bad events: event A occurs if the length of Z is at least h − 1; event B occurs if Z contains at most h/2 bits equal to x. Clearly, if neither A nor B occurs, the above good events occur, and we correctly guess x.

Now, let us choose

δ = min{ γ/(4α_{0|0}), γ/(4α_{1|1}), 1/2 }. (10)

Note that, indeed, 0 < δ < 1. Let

h′ = 2(β + 1)/δ² > 0, (11)

where the inequality holds since δ > 0, by (10). Assume that h ≥ h′. Since each |Z_i| is bounded, by Hoeffding's bound [22, Theorem 4.12],

P(A) ≤ 2 e^{−(k/2)·((h−1)/k − β)²} ≤(a) 2 e^{−(h(1−δ)/(2β))·((h−1)/k − β)²},

where (a) follows from (9). Noting the squared term on the RHS, we next show that

(h − 1)/k ≥ (h − 1)/(h(1 − δ)/β + 1) ≥ β/(1 − δ) − δ/2 > β.

Indeed, the first inequality follows from (9), noting that h − 1 is positive, since h ≥ h′ > 1; the second follows from (11), recalling that h ≥ h′; the third follows since δ > 0. Thus, from the above two displayed equations we conclude that

P(A) < 2 e^{−h·((1−δ)/(2β))·(β/(1−δ) − δ/2 − β)²}. (12)

For event B, we use Hoeffding's inequality and (9) to get

P(B) ≤ 2 e^{−(k/2)·(α_{x|x} − h/(2k))²} ≤ 2 e^{−(h(1−δ)/(2β))·(α_{x|x} − h/(2k))²}.

Focusing on the squared term on the RHS, we now prove that

α_{x|x} ≥ β/(2(1 − δ)) ≥ h/(2k).

The second inequality follows easily from (9). For the first inequality, first recall that β = α_{x|x} + α_{¬x|x}, by (1). Thus, it suffices to prove that α_{x|x} − α_{¬x|x} ≥ 2δ·α_{x|x}. By (3), this will follow if we prove that γ ≥ 2δ·α_{x|x}, which holds by (10). Thus, from the above two displayed equations we conclude that

P(B) ≤ 2 e^{−(h(1−δ)/(2β))·(α_{x|x} − β/(2(1−δ)))²} = 2 e^{−(h(1−δ)/(2β))·((α_{x|x} − α_{¬x|x} − 2δα_{x|x})/(2(1−δ)))²}.

Slightly refining the above arguments, we get from (1), (3), and (10) that

α_{x|x} − α_{¬x|x} − 2δ·α_{x|x} ≥ γ − 2δ·α_{x|x} ≥ γ − γ/2 = γ/2.

Thus, from the above two displayed equations we get that

P(B) ≤ 2 e^{−(h(1−δ)/(2β))·(γ/(4(1−δ)))²}. (13)

Recall that the output due to the first input bit has length at most 2. In light of (12) and (13), let us define

c ≜ (1/2)·min{c′, c″}, where c′ ≜ ((1−δ)/(2β))·(β/(1−δ) − δ/2 − β)² and c″ ≜ ((1−δ)/(2β))·(γ/(4(1−δ)))²,

and take h_0 ≥ h′ large enough such that for all h ≥ h_0,

2 e^{−h·c′} + 2 e^{−h·c″} ≤ e^{−h·c}.

That is, h_0 ≜ max{h′, ln(4)/c}.

REFERENCES

[1] R. Gallager, "Sequential decoding for binary channels with noise and synchronization errors," 1961, Lincoln Lab Group Report.
[2] R. L. Dobrushin, "Shannon's theorems for channels with synchronization errors,"
Problemy Peredachi Informatsii, vol. 3, no. 4, pp. 18–36, 1967.
[3] M. C. Davey and D. J. MacKay, "Reliable communication over channels with insertions, deletions, and substitutions," IEEE Trans. Inform. Theory, vol. 47, no. 2, pp. 687–698, 2001.
[4] M. Mitzenmacher, "A survey of results for deletion channels and related synchronization channels," Probability Surveys, vol. 6, pp. 1–33, 2009.
[5] Y. Kanoria and A. Montanari, "Optimal coding for the binary deletion channel with small deletion probability," IEEE Trans. Inform. Theory, vol. 59, no. 10, pp. 6192–6219, 2013.
[6] M. Rahmati and T. M. Duman, "Upper bounds on the capacity of deletion channels using channel fragmentation," IEEE Trans. Inform. Theory, vol. 61, no. 1, pp. 146–156, 2015.
[7] J. Castiglione and A. Kavcic, "Trellis based lower bounds on capacities of channels with synchronization errors," in Information Theory Workshop. Jeju, South Korea: IEEE, 2015, pp. 24–28.
[8] M. Cheraghchi, "Capacity upper bounds for deletion-type channels," Journal of the ACM (JACM), vol. 66, no. 2, p. 9, 2019.
[9] I. Tal, H. D. Pfister, A. Fazeli, and A. Vardy, "Polar codes for the deletion channel: Weak and strong polarization," in Proc. IEEE Int. Symp. Inform. Theory, 2019, pp. 1362–1366.
[10] ——, "Polar codes for the deletion channel: Weak and strong polarization," 2020, preprint arXiv:1904.13385v2.
[11] R. Wang, R. Liu, and Y. Hou, "Joint successive cancellation decoding of polar codes over intersymbol interference channels," 2014, preprint arXiv:1404.3001.
[12] R. Wang, J. Honda, H. Yamamoto, R. Liu, and Y. Hou, "Construction of polar codes for channels with memory," in Information Theory Workshop, October 2015, pp. 187–191.
[13] E. K. Thomas, V. Y. F. Tan, A. Vardy, and M. Motani, "Polar coding for the binary erasure channel with deletions," IEEE Communications Letters, vol. 21, no. 4, pp. 710–713, April 2017.
[14] K. Tian, A. Fazeli, A. Vardy, and R. Liu, "Polar codes for channels with deletions," in Allerton Conference on Communication, Control, and Computing, 2017, pp. 572–579.
[15] K. Tian, A. Fazeli, and A. Vardy, "Polar coding for deletion channels: Theory and implementation," in IEEE International Symposium on Information Theory, 2018, pp. 1869–1873.
[16] ——, "Polar coding for deletion channels," 2018, submitted to IEEE Trans. Inform. Theory.
[17] Y. Li and V. Y. F. Tan, "On the capacity of channels with deletions and states," 2019, preprint arXiv:1911.04473.
[18] E. Şaşoğlu and I. Tal, "Polar coding for processes with memory," IEEE Trans. Inform. Theory, vol. 65, no. 4, pp. 1994–2003, April 2019.
[19] B. Shuval and I. Tal, "Universal polarization for processes with memory," 2018, preprint arXiv:1811.05727v1.
[20] ——, "Fast polarization for processes with memory," IEEE Trans. Inform. Theory, vol. 65, no. 4, pp. 2004–2020, April 2019.
[21] H. D. Pfister and I. Tal, "Polar codes for channels with insertions, deletions, and substitutions," arXiv preprint in preparation.
[22] M. Mitzenmacher and E. Upfal, Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press, 2005.