Channels That Die
Lav R. Varshney, Sanjoy K. Mitter, and Vivek K Goyal
Abstract
Given the possibility of communication systems failing catastrophically, we investigate limits to communicating over channels that fail at random times. These channels are finite-state semi-Markov channels. We show that communication with arbitrarily small probability of error is not possible. Making use of results in finite blocklength channel coding, we determine sequences of blocklengths that optimize transmission volume communicated at fixed maximum message error probabilities. We provide a partial ordering of communication channels. A dynamic programming formulation is used to show the structural result that channel state feedback does not improve performance.

"a communication channel . . . might be inoperative because of an amplifier failure, a broken or cut telephone wire, . . ." — I. M. Jacobs [2]
I. INTRODUCTION
Physical systems have a tendency to fail at random times [3]. This is true whether considering communication systems embedded in sensor networks that may run out of energy [4], synthetic communication systems embedded in biological cells that may die [5] (we sidestep teleological discussions of natural biology [6], [7] by considering synthetic biology [8]), communication systems embedded in spacecraft that may enter black holes [9], or communication systems embedded in oceans with undersea cables that may be cut [10]. In these scenarios and beyond, failure of the communication system may be modeled as communication channel death.

As such, it is of interest to study information-theoretic limits on communicating over channels that die at random times. This paper gives results on the fundamental limits of what is possible and what is impossible when communicating over channels that die. Communication with arbitrarily small probability of error (Shannon reliability) is not possible for any positive communication volume; however, a suitably defined notion of η-reliability is possible. Schemes that optimize communication volume for a given level of η-reliability are developed herein.

The central trade-off in communicating over channels that die is in the lengths of codeword blocks. Longer blocks improve communication performance as classically known, whereas shorter blocks have a smaller probability of being prematurely terminated due to channel death. In several settings, a simple greedy algorithm for determining the sequence of blocklengths yields a certifiably optimal solution. We also develop a dynamic programming formulation to optimize the ordered integer partition that determines the sequence of blocklengths. Besides algorithmic utility, solving the dynamic program demonstrates the structural result that channel state feedback does not improve performance.

The optimization of codeword blocklengths is reminiscent of frame size control in wireless networks [11]–[14]; however, such techniques are used in conjunction with automatic repeat request protocols and are motivated by amortizing protocol information. Moreover, those results demonstrate the benefit of adapting to either channel state or decision feedback. Contrarily, we show that adaptation to channel state provides no benefit for channels that die.

Limits on channel coding with finite blocklength [15]–[21] are central to our development. Indeed, channels that die bring the notion of finite blocklength to the fore and provide a concrete physical reason to step back from infinity. (The phrase "back from infinity" is borrowed from J. Ziv's 1997 Shannon Lecture.) Notions of outage in wireless communication [22], [23] and lost letters in postal channels [24] are similar to channel death, except that neither outage nor lost letters are permanent conditions. Therefore blocklength asymptotics are useful to study those channel models but are not useful for channels that die. Recent work that has similar motivations as this paper provides the outage capacity of a wireless channel [25].

The remainder of the paper is organized as follows. Section II defines discrete memoryless channels that die and shows that these channels have zero Shannon capacity. Section III states the communication system model and also fixes our novel performance criteria. Section IV shows that our notion of Shannon reliability is not achievable, strengthening the result of zero Shannon capacity, and then provides a communication scheme and determines its performance. Section V optimizes performance for several death distributions using either a greedy algorithm or a dynamic programming algorithm. Optimization demonstrates that channel state feedback does not improve performance. Section VI discusses the partial ordering of channels. Section VII suggests several extensions to this work.

This work was supported in part by the NSF Grants CCR-0325774 and CCF-0729069. This work appeared in part in the Proceedings of the Forty-Seventh Annual Allerton Conference on Communication, Control, and Computing [1].
L. R. Varshney was with the Department of Electrical Engineering and Computer Science, the Research Laboratory of Electronics, and the Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, Cambridge, MA 02139 USA. He is now with the IBM Thomas J. Watson Research Center, Hawthorne, NY 10532 USA (e-mail: [email protected]).
S. K. Mitter is with the Department of Electrical Engineering and Computer Science, the Engineering Systems Division, and the Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, Cambridge, MA 02139 USA (e-mail: [email protected]).
V. K. Goyal is with the Department of Electrical Engineering and Computer Science and the Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, MA 02139 USA (e-mail: [email protected]).

II. CHANNEL MODEL
Consider a channel with finite input alphabet X and finite output alphabet Y. It has an alive state s = a, in which it acts like a noisy discrete memoryless channel (DMC), and a dead state s = d, in which it erases the input. Assume throughout the paper that the DMC from the alive state has zero-error capacity [28] equal to zero. (Our results can be extended to cover cases where the channel acts like other channels [26], [27] in the alive state. If the channel is noiseless in the alive state, the problem is similar to settings where fountain codes [29] are used in the point-to-point case and growth codes [30] are used in the network case.)

For example, if the channel acts like a binary symmetric channel (BSC) with crossover probability 0 < ε < 1/2 in the alive state, with X = {0, 1} and Y = {0, 1, ?}, then the transmission matrix in the alive state is

p(y|x, s = a) = p_a(y|x) = [ 1−ε  ε  0 ; ε  1−ε  0 ],   (1)

and the transmission matrix in the dead state is

p(y|x, s = d) = p_d(y|x) = [ 0  0  1 ; 0  0  1 ].   (2)

The channel starts in state s = a and then transitions to s = d at some random time T, where it remains for all time thereafter. That is, the channel is in state a for times n = 1, 2, ..., T and in state d for times n = T + 1, T + 2, .... The death time distribution is denoted p_T(t). Note that there is always a finite t† such that p_T(t†) > 0.
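To make the model concrete, here is a minimal simulation sketch (the function names and the 0/1 symbol convention are our own, not from the paper): a BSC(ε) is applied while the channel is alive, and the erasure symbol ? is emitted after the death time T, here sampled from a geometric distribution.

```python
import random

def sample_death_time(alpha):
    """Sample a geometric death time: p_T(t) = alpha * (1 - alpha)**(t - 1)."""
    t = 1
    while random.random() >= alpha:
        t += 1
    return t

def channel_that_dies(x_seq, eps, death_time):
    """BSC(eps) while alive (times n <= death_time), erasure '?' afterwards."""
    y_seq = []
    for n, x in enumerate(x_seq, start=1):
        if n <= death_time:
            y_seq.append(x ^ 1 if random.random() < eps else x)  # flip w.p. eps
        else:
            y_seq.append('?')
    return y_seq

# Example: transmit ten zeros with eps = 0.1 and death parameter alpha = 0.05.
print(channel_that_dies([0] * 10, 0.1, sample_death_time(0.05)))
```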
A. Finite-State Semi-Markov Channel

Channels that die can be classified as finite-state channels (FSCs) [31, Sec. 4.6].
Proposition 1: A channel that dies (X, p_a(y|x), p_d(y|x), p_T(t), Y) is a finite-state channel.

Proof: Follows by definition, since the channel has two states.

Channels that die have semi-Markovian [32, Sec. 4.8], [33, Sec. 5.7] properties.
Definition 1: A semi-Markov process changes state according to a Markov chain but takes a random amount of time between changes. More specifically, it is a stochastic process with states from a discrete alphabet S, such that whenever it enters state s ∈ S:
• The next state it will enter is state r with probability that depends only on s, r ∈ S.
• Given that the next state to be entered is state r, the time until the transition from s to r occurs has a distribution that depends only on s, r ∈ S.

Definition 2: The Markovian sequence of states of a semi-Markov process is called the embedded Markov chain of the semi-Markov process.
Definition 3: A semi-Markov process is irreducible if its embedded Markov chain is irreducible.

Proposition 2: A channel that dies (X, p_a(y|x), p_d(y|x), p_T(t), Y) has a channel state sequence that is a non-irreducible semi-Markov process.
Proof: When in state a, the next state is d with probability 1, and given that the next state is to be d, the time until the transition from a to d has distribution p_T(t). When in state d, the next state is d with probability 1. Thus, the channel state sequence is a semi-Markov process. The semi-Markov state process is not irreducible because the a state of the embedded Markov chain is transient.

Note that when T is a geometric random variable, the channel state process forms a Markov chain, with transient state a and recurrent, absorbing state d.

There are further special classes of FSCs.

Definition 4: An FSC is a finite-state semi-Markov channel (FSSMC) if its state sequence forms a semi-Markov process.
Definition 5: An FSC is a finite-state Markov channel (FSMC) if its state sequence forms a Markov chain.
Proposition 3: A channel that dies (X, p_a(y|x), p_d(y|x), p_T(t), Y) is an FSSMC, and is an FSMC when T is geometric.

Proof: Follows from Props. 1 and 2.

FSMCs have been widely studied in the literature [31], [34], [35], particularly the panic button/child's toy channel of Gallager [34, p. 26], [31, p. 103] and the Gilbert-Elliott channel and its extensions [36], [37]. Contrarily, FSSMCs seem not to have been specifically studied in information theory. There are a few works [38]–[40] that give semi-Markov channel models for wireless communication systems but do not provide information-theoretic characterizations.
B. Capacity is Zero
A channel that dies has Shannon capacity equal to zero. To show this, first notice that if the initial state of a channel that dies were not fixed, then it would be an indecomposable FSC [31, Sec. 4.6], where the effect of the initial state dies away.
Proposition 4: If the initial state of a channel that dies (X, p_a(y|x), p_d(y|x), p_T(t), Y) is not fixed, then it is an indecomposable FSC.

Proof: The embedded Markov chain for a channel that dies has a unique absorbing state d.

Indecomposable FSCs have the property that the upper capacity, defined in [31, (4.6.6)], and the lower capacity, defined in [31, (4.6.3)], are identical [31, Thm. 4.6.4]. This can be used to show that the capacity of a channel that dies is zero.
Proposition 5: The Shannon capacity, C, of a channel that dies (X, p_a(y|x), p_d(y|x), p_T(t), Y) is zero.

Proof: Although the initial state is s = a here, temporarily suppose that s may be either a or d. Then the channel is indecomposable by Prop. 4. The lower capacity C̲ equals the upper capacity C̄ for indecomposable channels by [31, Thm. 4.6.4]. The information rate of the memoryless p_d(y|x) 'dead' channel is clearly zero for any input distribution, so the lower capacity C̲ = 0. Thus the Shannon capacity for a channel that dies with initial alive state is C = C̲ = C̄ = 0.

III. COMMUNICATION SYSTEM
In order to information-theoretically characterize a channel that dies, a communication system that contains the channel is described.

We have an information stream (like i.i.d. equiprobable bits), which can be grouped into a sequence of k messages, (W_1, W_2, ..., W_k). Each message W_i is drawn from a message set 𝒲_i = {1, 2, ..., M_i}. Each message W_i is encoded into a channel input codeword X^{n_i}(W_i), and these codewords (X^{n_1}(W_1), X^{n_2}(W_2), ..., X^{n_k}(W_k)) are transmitted in sequence over the channel. A noisy version of this codeword sequence is received, Y^{n_1+n_2+···+n_k}(W_1, W_2, ..., W_k). The receiver then guesses the sequence of messages using an appropriate decoding rule g, to produce (Ŵ_1, Ŵ_2, ..., Ŵ_k) = g(Y^{n_1+n_2+···+n_k}). The Ŵ_i are drawn from alphabets 𝒲_i^⊖ = 𝒲_i ∪ {⊖}, where the ⊖ message indicates the decoder declaring an erasure. The receiver makes an error on message i if Ŵ_i ≠ W_i and Ŵ_i ≠ ⊖.

Block coding results are typically expressed with the concern of sending one message rather than k messages as here. (Tree codes are beyond the scope of this paper, since we desire to communicate messages. A reformulation of communicating over channels that die using tree codes [41, Ch. 10] with early termination [42] would, however, be interesting. In fact, communicating over channels that die using convolutional codes with sequential decoding would be very natural, but would require performance criteria different from the ones developed herein.)
System definitions can be formalized as follows.
Definition 6: An (M_i, n_i) individual message code for a channel that dies (X, p_a(y|x), p_d(y|x), p_T(t), Y) consists of:
1) An individual message index set {1, 2, ..., M_i}, and
2) An individual message encoding function f_i : {1, 2, ..., M_i} → X^{n_i}.
The individual message index set {1, 2, ..., M_i} is denoted 𝒲_i, and the set of individual message codewords {f_i(1), f_i(2), ..., f_i(M_i)} is called the individual message codebook.

Definition 7: An (M_i, n_i)_{i=1}^k code for a channel that dies (X, p_a(y|x), p_d(y|x), p_T(t), Y) is a sequence of k individual message codes, (M_i, n_i)_{i=1}^k, in the sense of comprising:
1) A sequence of individual message index sets 𝒲_1, 𝒲_2, ..., 𝒲_k,
2) A sequence of individual message encoding functions f = (f_1, f_2, ..., f_k), and
3) A decoding function g : Y^{Σ_{i=1}^k n_i} → 𝒲_1^⊖ × 𝒲_2^⊖ × ··· × 𝒲_k^⊖.

There is no essential loss of generality in assuming that the decoding function g is decomposed into a sequence of individual message decoding functions g = (g_1, g_2, ..., g_k), where g_i : Y^{n_i} → 𝒲_i^⊖, when individual messages are chosen independently, due to this independence and the conditional memorylessness of the channel.

To define performance measures, we assume that the decoder operates on an individual message basis. That is, when applying the communication system, let Ŵ_1 = g_1(Y^{n_1}), Ŵ_2 = g_2(Y_{n_1+1}^{n_1+n_2}), and so on.

For the sequel, we make a further assumption on the operation of the decoder.

Assumption 1: If all n_i channel output symbols used by individual message decoder g_i are not ?, then the range of g_i is 𝒲_i. If any of the n_i channel output symbols used by individual message decoder g_i are ?, then g_i maps to ⊖.

This assumption corresponds to the physical properties of a communication system where the decoder fails catastrophically. Once the decoder fails, it cannot perform any decoding operations, and so the ? symbols in the channel model of system failure must be ignored.

A. Performance Measures
We formally write the notion of error for the communication system as follows.
Definition 8: For all 1 ≤ w ≤ M_i, let

λ_w(i) = Pr[Ŵ_i ≠ w | W_i = w, Ŵ_i ≠ ⊖]

be the conditional message probability of error given that the ith individual message is w.

Definition 9: The maximal probability of error for an (M_i, n_i) individual message code is λ_max(i) = max_{w∈𝒲_i} λ_w(i).

Definition 10: The maximal probability of error for an (M_i, n_i)_{i=1}^k code is λ_max = max_{i∈{1,...,k}} λ_max(i).

Performance criteria weaker than traditional in information theory are defined, since the Shannon capacity of a channel that dies is zero (Prop. 5). In particular, we define formal notions of how much information is transmitted using a code and how long it takes.

Definition 11: The transmission time of an (M_i, n_i)_{i=1}^k code is N = Σ_{i=1}^k n_i.

Definition 12: The expected transmission volume of an (M_i, n_i)_{i=1}^k code is

V = E_T[ Σ_{i∈{1,...,k : Ŵ_i ≠ ⊖}} log M_i ].

Notice that although declared erasures do not lead to errors, they do not contribute transmission volume either. The several performance criteria for a code may be combined together.

Definition 13: Given 0 ≤ η < 1, a pair of numbers (𝒩, 𝒱) (where 𝒩 is a positive integer and 𝒱 is non-negative) is said to be an achievable transmission time-volume at η-reliability if there exists, for some k, an (M_i, n_i)_{i=1}^k code for the channel that dies (X, p_a(y|x), p_d(y|x), p_T(t), Y) such that

λ_max ≤ η,   (3)
N ≤ 𝒩, and   (4)
V ≥ 𝒱.   (5)

Moreover, (𝒩, 𝒱) is said to be an achievable transmission time-volume at Shannon reliability if it is an achievable transmission time-volume at η-reliability for all 0 < η < 1.

IV. LIMITS ON COMMUNICATION
Having defined the notion of achievable transmission time-volume at various levels of reliability, the goal of this work is to demarcate what is achievable.
A. Shannon Reliability is Not Achievable
Not only is the Shannon capacity of a channel that dies zero, but also there is no V > 0 such that (N, V) is an achievable transmission time-volume at Shannon reliability. A coding scheme that always declares erasures would achieve zero error probability (and therefore Shannon reliability) but would not provide positive transmission volume; this is also not allowed under Assumption 1.

Lemmas are stated and proved after the proof of the main proposition. For brevity, the proof is limited to the alive-BSC case, but can be extended to general alive-DMCs by choosing the two most distant letters in Y for constructing the repetition code, among other things.

Proposition 6: For a channel that dies (X, p_a(y|x), p_d(y|x), p_T(t), Y), there is no V > 0 such that (N, V) is an achievable transmission time-volume at Shannon reliability.

Proof: From the error probability viewpoint, transmitting longer codes is not harder than transmitting shorter codes (Lem. 1), and transmitting smaller codes is not harder than transmitting larger codes (Lem. 2). Hence, the desired result follows from showing that even the longest and smallest code that has positive expected transmission volume cannot achieve Shannon reliability. Clearly the longest and smallest code uses a single individual message code of length n → ∞ and size M = 2. Among such codes, transmitting the binary repetition code is not harder than transmitting any other code (Lem. 3). Hence showing that the binary repetition code cannot achieve Shannon reliability yields the desired result.

Consider transmitting a single (M = 2, n) individual message code that is simply a binary repetition code over a channel that dies (X, p_a(y|x), p_d(y|x), p_T(t), Y). Let 𝒲_1 = {00···0, 11···1}, where the two codewords are of length n. Assume that the all-zeros codeword and the all-ones codeword are each transmitted with probability 1/2 and measure average probability of error, since average error probability lower bounds λ_max(1) [31, Problem 5.32]. The transmission time is N = n; let N → ∞. The expected transmission volume is log 2 > 0.

Under equiprobable signaling over a BSC, the minimum error probability decoder is the maximum likelihood decoder, which in turn is the minimum distance decoder [43, Problem 2.13]. The scenario corresponds to binary hypothesis testing over a BSC(ε) with T observations (since after the channel dies, the output symbols do not help with hypothesis testing). Since there is a finite t† such that p_T(t†) > 0, there is a fixed constant K such that λ_max > K > 0 for any realization T = t. Thus Shannon reliability is not achievable.
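To see the final step concretely: under majority (minimum distance) decoding, the error probability of a length-t repetition code over a BSC(ε) is a fixed positive constant for every finite t. A small computational sketch (the helper name is ours; ties are counted as errors for simplicity):

```python
from math import comb

def repetition_error(t, eps):
    """Majority-decoding error probability of a length-t binary repetition
    code over a BSC(eps); ties (even t) are counted as errors."""
    return sum(comb(t, s) * eps**s * (1 - eps) ** (t - s)
               for s in range((t + 1) // 2, t + 1))

# For any realization T = t of the death time, the error probability is a
# strictly positive constant, e.g.:
print(repetition_error(5, 0.1))   # about 0.00856
```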
Lemma 1: When transmitting over the alive state's memoryless channel p_a(y|x), let the maximal probability of error λ_max(i) for an optimal (M_i, n_i) individual message code and minimum probability of error individual decoder g_i be λ_max(i; n_i). Then λ_max(i; n_i + 1) ≤ λ_max(i; n_i).

Proof: Consider the optimal blocklength-n_i individual message code/decoder, which achieves λ_max(i; n_i). Use it to construct a blocklength-(n_i + 1) individual message code that appends a dummy symbol to each codeword, and an associated decoder that operates by ignoring this last symbol. The error performance of this (suboptimal) code/decoder is clearly λ_max(i; n_i), and so the optimal performance can only be better: λ_max(i; n_i + 1) ≤ λ_max(i; n_i).
Lemma 2: When transmitting over the alive state's memoryless channel p_a(y|x), let the maximal probability of error λ_max(i) for an optimal (M_i, n_i) individual message code and minimum probability of error individual decoder g_i be λ_max(i; M_i). Then λ_max(i; M_i) ≤ λ_max(i; M_i + 1).

Proof: Follows from sphere-packing principles.
Lemma 3: When transmitting over the alive state's memoryless channel p_a(y|x), the optimal (M_i = 2, n_i) individual message code can be taken as a binary repetition code.

Proof: Under minimum distance decoding (which yields the minimum error probability [43, Problem 2.13]) for a code transmitted over a BSC, increasing the distance between codewords can only reduce error probability. The repetition code has maximum Hamming distance between codewords.

Notice that Prop. 6 also directly implies Prop. 5, providing an alternate proof.
Corollary 1: The Shannon capacity of a channel that dies (X, p_a(y|x), p_d(y|x), p_T(t), Y) is zero.

B. Finite Blocklength Channel Coding
Before developing an optimal scheme for η-reliable communication over a channel that dies, finite blocklength channel coding is reviewed.

Under our definitions, traditional channel coding results [15], [17]–[21] provide information about individual message codes, determining the achievable trios (n_i, M_i, λ_max(i)). In particular, the largest possible M_i for a given n_i and λ_max(i) is denoted M*(n_i, λ_max(i)).

The purpose of this work is not to improve upper and lower bounds on finite blocklength channel coding, but to use existing results to study channels that die. In fact, for the sequel, simply assume that the function M*(n_i, λ_max(i)) is known, as are codes/decoders that achieve this value. In principle, optimal individual message codes may be found through exhaustive search [17], [44]. Although algebraic notions of code quality do not directly imply error probability quality [45], perfect codes such as the Hamming or Golay codes may also be optimal in certain limited cases.

Recent results comparing upper and lower bounds around Strassen's normal approximation to log M*(n_i, λ_max(i)) [46] have demonstrated that the approximation is quite good [19].

Remark 1:
We assume that optimal M*(n_i, η)-achieving individual message codes are known. Exact upper and lower bounds to log M*(n_i, η) can be substituted to make our results precise. For numerical demonstrations, we will further assume that optimal codes have performance given by Strassen's approximation.

The following expression for log M*(n_i, η), which first appeared in [46], is also given as [19, Thm. 6].

Lemma 4: Let M*(n_i, η) be the largest size of an individual message code with blocklength n_i and maximal error probability upper bounded by λ_max(i) < η. Then, for any DMC with capacity C and 0 < η ≤ 1/2,

log M*(n_i, η) = n_i C − √(n_i ρ) Q^{−1}(η) + O(log n_i),

where

Q(x) = (1/√(2π)) ∫_x^∞ e^{−t²/2} dt,
ρ = min_{X : C = I(X;Y)} var[ log( p_{Y|X}(y|x) / p_Y(y) ) ],

and standard asymptotic notation [47] is used.

For the BSC(ε), the approximation (ignoring the O(log n_i) term above) is

log M* ≈ n_i (1 − h(ε)) − √(n_i ε(1 − ε)) Q^{−1}(η) log((1 − ε)/ε),   (6)

where h(·) is the binary entropy function. This BSC expression first appeared in [48].

For intuition, we plot the approximate log M*(n_i, η) function for a BSC(ε) in Fig. 1(a). Notice that log M* is zero for small n_i since no code can achieve the target error probability η. Also notice that log M* is a monotonically increasing function of n_i. Moreover, notice in Fig. 1(b) that even when normalized, (log M*)/n_i is a monotonically increasing function of n_i. Therefore longer blocks provide more 'bang for the buck.' The curve in Fig. 1(b) asymptotically approaches capacity.

[Fig. 1. (a) Individual message transmission volume: the expression (6) as a function of individual message blocklength n_i, for fixed ε and η. (b) Individual message rate: the normalized version (log M*(n_i, η))/n_i, which asymptotically approaches the capacity 1 − h(ε).]

C. η-reliable Communication

We now describe a coding scheme that achieves positive expected transmission volume at η-reliability. Survival probability of the channel plays a key role in measuring performance.
Definition 14: The survival function of a channel that dies (X, p_a(y|x), p_d(y|x), p_T(t), Y) is Pr[T > t]; it is denoted R_T(t) and satisfies

R_T(t) = Pr[T > t] = 1 − Σ_{τ=1}^t p_T(τ) = 1 − F_T(t),

where F_T is the cumulative distribution function. R_T(t) is a non-increasing function.
Proposition 7: The transmission time-volume

N = Σ_{i=1}^k n_i,   V = Σ_{i=1}^k R_T(e_i) log M*(n_i, η)

is achievable at η-reliability for any sequence (n_i)_{i=1}^k of individual message codeword lengths, where e_0 = 0, e_1 = n_1, e_2 = n_1 + n_2, ..., e_k = Σ_{i=1}^k n_i.

Proof:

Code Design: A target error probability η and a sequence (n_i)_{i=1}^k of individual message codeword lengths are fixed. Construct a length-k sequence of (M_i, n_i) individual message codes and individual decoding functions (𝒲_i, f_i, g_i) that achieve optimal performance. The size of 𝒲_i is |𝒲_i| = M*(n_i, η). Note that the individual decoding functions g_i have range 𝒲_i rather than 𝒲_i^⊖.

Encoding: A codeword W_1 = w_1 is selected uniformly at random from the codebook 𝒲_1. The mapping of this codeword into n_1 channel input letters, X_{e_0+1}^{e_1} = f_1(w_1), is transmitted in channel usage times n = e_0 + 1, e_0 + 2, ..., e_1. Then a codeword W_2 = w_2 is selected uniformly at random from the codebook 𝒲_2. The mapping of this codeword into n_2 channel input letters, X_{e_1+1}^{e_2} = f_2(w_2), is transmitted in channel usage times n = e_1 + 1, e_1 + 2, ..., e_2. This procedure continues until the last individual message code in the code is transmitted. That is, a codeword W_k = w_k is selected uniformly at random from the codebook 𝒲_k. The mapping of this codeword into n_k channel input letters, X_{e_{k−1}+1}^{e_k} = f_k(w_k), is transmitted in channel usage times n = e_{k−1} + 1, e_{k−1} + 2, ..., e_k. We refer to channel usage times n ∈ {e_{i−1} + 1, e_{i−1} + 2, ..., e_i} as the ith transmission epoch.

Decoding: For decoding, the channel output symbols for each epoch are processed separately. If any of the channel output symbols in an epoch are erasure symbols ?, then a decoding erasure ⊖ is declared for the message in that epoch, i.e. Ŵ_i = ⊖. Otherwise, the individual message decoding function g_i : Y^{n_i} → 𝒲_i is applied to obtain Ŵ_i = g_i(Y_{e_{i−1}+1}^{e_i}).
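The epoch-splitting decoding rule is mechanical to implement; here is a minimal sketch (names are ours) that treats each individual message decoder g_i as a black box:

```python
def decode_epochs(y_seq, ns, decoders, erasure='?'):
    """Split the channel output into epochs of lengths ns; declare the
    decoding erasure for any epoch containing an erasure symbol, and
    otherwise apply that epoch's individual message decoder g_i."""
    estimates, start = [], 0
    for n_i, g_i in zip(ns, decoders):
        block = y_seq[start:start + n_i]
        estimates.append('erasure' if erasure in block else g_i(block))
        start += n_i
    return estimates
```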
Performance Analysis: Having defined the communication scheme, we measure the error probability, transmission time, and expected transmission volume. The decoder will either produce an erasure ⊖ or use an individual message decoder g_i. When g_i is used, the maximal error probability of individual message code error is bounded as λ_max(i) < η by construction. Since declared erasures ⊖ do not lead to error, and since all λ_max(i) < η, it follows that λ_max < η. The transmission time is simply N = Σ n_i.

Recall the definition of expected transmission volume,

V = E_T[ Σ_{i∈{1,...,k : Ŵ_i ≠ ⊖}} log M_i ],

and the fact that the channel produces the erasure symbol ? for all channel usage times after death, n > T, but not before. Combining this with the size of an optimal code, log M*(n_i, η), leads to the expression

Σ_{i=1}^k Pr[T > e_i] log M*(n_i, η),

since all individual message codewords that are received in their entirety before the channel dies are decoded using g_i, whereas any individual message codewords that are even partially cut off are declared ⊖. Recalling the definition of the survival function, the expected transmission volume of the communication scheme is

Σ_{i=1}^k R_T(e_i) log M*(n_i, η),

as desired.

Prop. 7 is valid for any choice of (n_i)_{i=1}^k. Since (log M*)/n_i is monotonically increasing, it is better to use individual message codes that are as long as possible. With longer individual message codes, however, there is a greater chance of many channel usages being wasted if the channel dies in the middle of transmission. The basic trade-off is captured in picking the set of values {n_1, n_2, ..., n_k}. For fixed and finite N, this involves picking an ordered integer partition n_1 + n_2 + ··· + n_k = N. We optimize this choice in Section V.
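The Prop. 7 objective is easy to evaluate numerically, which is useful both here and for the optimization in Section V. The following sketch (all helper names are ours) evaluates the expected transmission volume of a given ordered partition under the Strassen approximation (6) for a BSC(ε) alive state, with Q^{−1} computed by bisection:

```python
import math

def gauss_q(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def q_inv(p, lo=-10.0, hi=10.0):
    """Inverse of the (decreasing) Q-function, by bisection."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if gauss_q(mid) > p else (lo, mid)
    return 0.5 * (lo + hi)

def log_m_star(n, eta, eps):
    """Strassen approximation (6) for the BSC(eps), 0 < eps < 1/2, in bits;
    clipped at 0 since no code meets the target at very small n."""
    if n <= 0:
        return 0.0
    h = -eps * math.log2(eps) - (1 - eps) * math.log2(1 - eps)
    c = 1.0 - h                                     # BSC capacity
    rho = eps * (1 - eps) * math.log2((1 - eps) / eps) ** 2
    return max(0.0, n * c - math.sqrt(n * rho) * q_inv(eta))

def expected_volume(ns, survival, eta, eps):
    """Prop. 7 objective: sum_i R_T(e_i) log M*(n_i, eta), with the
    survival function R_T passed in as a Python callable."""
    e, total = 0, 0.0
    for n in ns:
        e += n
        total += survival(e) * log_m_star(n, eta, eps)
    return total
```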
D. Converse Arguments

Since we simply have operational expressions and no informational expressions in our development, as per Remark 1, and since optimal individual message codes and individual message decoders are assumed to be used, it may seem as though converse arguments are not required. This would indeed follow if the following two things were true, which follow from Assumption 1. First, that there is no benefit in trying to decode the last partially erased message block. Second, that there is no benefit to errors-and-erasures decoding [49] by the g_i for codewords that are received before channel death. Under Assumption 1, Prop. 7 gives the best performance possible.

One might wonder whether Assumption 1 is needed. That there would be no benefit in trying to decode the last partially erased block follows from the conjecture that an optimal individual message code would have no latent redundancy that could be exploited to achieve λ_max(i = last) < η, but this is a property of the actual optimal code. Understanding the possibility of errors-and-erasures decoding [49] by the individual message decoders also requires knowing properties of actual optimal codes. It is unclear how the choice of threshold in errors-and-erasures decoding would affect the expected transmission volume

Σ_{i=1}^k (1 − ξ_i) R_T(e_i) log M*(n_i, ξ_i, η),

where ξ_i would be the specified erasure probability for individual message i, and M*(n_i, ξ_i, η) would be the maximum individual message codebook size under erasure probability ξ_i and maximum error probability η. What we can say, however, is that at the level of Strassen's approximation (up to the log n term), log M*(n_i, ξ_i, η) and log M*(n_i, η) are the same [50, Thm. 47].

V. OPTIMIZING THE COMMUNICATION SCHEME
In Section IV-C, we had not optimized the lengths of the individual message codes; we do so here. For fixed η and N, we maximize the expected transmission volume V over the choice of the ordered integer partition n_1 + n_2 + ··· + n_k = N:

max_{(n_i)_{i=1}^k : Σ n_i = N}  Σ_{i=1}^k R_T(e_i) log M*(n_i, η).   (7)

For finite N, this optimization can be carried out by an exhaustive search over all 2^{N−1} ordered integer partitions. If the death distribution p_T(t) has finite support, there is no loss of generality in considering only finite N. Since exhaustive search has exponential complexity, however, there is value in trying to use a simplified algorithm. A dynamic programming formulation for the finite horizon case is developed in Section V-C. The next subsection develops a greedy algorithm which is applicable to both the finite and infinite horizon cases and yields the optimal solution for certain problems.

A. A Greedy Algorithm
To try to solve the optimization problem (7), we propose a greedy algorithm that optimizes the blocklengths n_i one by one (a code sketch follows the algorithm statement).

Algorithm 1:
1) Maximize R_T(n_1) log M*(n_1, η) through the choice of n_1, independently of any other n_i.
2) Maximize R_T(e_2) log M*(n_2, η) after fixing e_1 = n_1, but independently of later n_i.
3) Maximize R_T(e_3) log M*(n_3, η) after fixing e_2, but independently of later n_i.
4) Continue in the same manner for all subsequent n_i.

Sometimes the algorithm produces the correct solution.
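A direct implementation sketch of Algorithm 1, reusing log_m_star from the previous snippet (the epoch-count and blocklength search caps are our own practical choices):

```python
def greedy_blocklengths(survival, eta, eps, num_epochs=50, n_max=500):
    """Algorithm 1: choose each n_i to maximize R_T(e_i) * log M*(n_i, eta)
    given the already-fixed start of its epoch, ignoring later epochs."""
    epochs, e = [], 0
    for _ in range(num_epochs):
        best_n, best_val = 0, 0.0
        for n in range(1, n_max + 1):
            val = survival(e + n) * log_m_star(n, eta, eps)
            if val > best_val:
                best_n, best_val = n, val
        if best_n == 0:     # no further epoch adds positive expected volume
            break
        epochs.append(best_n)
        e += best_n
    return epochs
```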
Proposition 8: The solution produced by the greedy algorithm, (n_i), is locally optimal if

[ R_T(e_i) log M*(n_i, η) − R_T(e_i − 1) log M*(n_i − 1, η) ] / ( R_T(e_{i+1}) [ log M*(n_{i+1} + 1, η) − log M*(n_{i+1}, η) ] ) ≥ 1   (8)

for each i.

Proof: The solution of the greedy algorithm partitions time using a set of epoch boundaries (e_i). The proof proceeds by testing whether local perturbation of an arbitrary epoch boundary can improve performance. There are two possible perturbations: a shift to the left or a shift to the right.

First consider shifting an arbitrary epoch boundary e_i to the right by one. This makes the left epoch longer and the right epoch shorter. Lengthening the left epoch does not improve performance due to the greedy optimization of the algorithm. Shortening the right epoch does not improve performance since R_T(e_{i+1}) remains unchanged whereas log M*(n_{i+1}, η) does not increase, since log M* is a non-decreasing function of n_i.

Now consider shifting an arbitrary epoch boundary e_i to the left by one. This makes the left epoch shorter and the right epoch longer. Reducing the left epoch will not improve performance due to greediness, but enlarging the right epoch might improve performance, so the gain and loss must be balanced. The loss in performance (a positive quantity) for the left epoch is

Δ_l = R_T(e_i) log M*(n_i, η) − R_T(e_i − 1) log M*(n_i − 1, η),

whereas the gain in performance (a positive quantity) for the right epoch is

Δ_r = R_T(e_{i+1}) [ log M*(n_{i+1} + 1, η) − log M*(n_{i+1}, η) ].

If Δ_l ≥ Δ_r, then perturbation will not improve performance. This rearranged is exactly condition (8), so the left-perturbation does not improve performance. Hence, the solution produced by the greedy algorithm is locally optimal.
Proposition 9: The solution produced by the greedy algorithm, (n_i), is globally optimal if

[ R_T(e_i) log M*(n_i, η) − R_T(e_i − K_i) log M*(n_i − K_i, η) ] / ( R_T(e_{i+1}) [ log M*(n_{i+1} + K_i, η) − log M*(n_{i+1}, η) ] ) ≥ 1   (9)

for each i and any non-negative integers K_i ≤ n_i.

Proof: The result follows by repeating the argument for local optimality in Prop. 8 for shifts of any admissible size K_i.

There is an easily checked special case of the global optimality condition (9) under the Strassen approximation, given in the forthcoming Prop. 10.

Lemma 5: The function log M*_S(z, η) − log M*_S(z − K, η) is a non-decreasing function of z for any K, where

log M*_S(z, η) = zC − √(zρ) Q^{−1}(η)   (10)

is Strassen's approximation.

Proof: Essentially follows from the fact that √z is a concave (∩) function of z. More specifically, √z satisfies

−√z + √(z − K) ≤ −√(z + 1) + √(z + 1 − K)

for K ≤ z. This implies

−√(zρ) Q^{−1}(η) + √((z − K)ρ) Q^{−1}(η) ≤ −√((z + 1)ρ) Q^{−1}(η) + √((z + 1 − K)ρ) Q^{−1}(η).

Adding the positive constant KC to both sides, in the form zC − (z − K)C on the left and in the form (z + 1)C − (z + 1 − K)C on the right, yields

[zC − √(zρ) Q^{−1}(η)] − [(z − K)C − √((z − K)ρ) Q^{−1}(η)] ≤ [(z + 1)C − √((z + 1)ρ) Q^{−1}(η)] − [(z + 1 − K)C − √((z + 1 − K)ρ) Q^{−1}(η)],

and so

log M*_S(z, η) − log M*_S(z − K, η) ≤ log M*_S(z + 1, η) − log M*_S(z + 1 − K, η).
Proposition 10: If the solution produced by the greedy algorithm using Strassen's approximation (10) satisfies n_1 ≥ n_2 ≥ ··· ≥ n_k, then condition (9) for global optimality is satisfied.

Proof: Since R_T(·) is a non-increasing survival function,

R_T(e_i − K) ≥ R_T(e_{i+1})   (11)

for any non-negative integer K. Since the function log M*_S(z, η) − log M*_S(z − K, η) is a non-decreasing function of z by Lem. 5, and since the n_i are in non-increasing order,

log M*_S(n_i, η) − log M*_S(n_i − K, η) ≥ log M*_S(n_{i+1} + K, η) − log M*_S(n_{i+1}, η).   (12)

Taking products of (11) and (12) and rearranging yields the condition

[ R_T(e_i − K) (log M*_S(n_i, η) − log M*_S(n_i − K, η)) ] / ( R_T(e_{i+1}) [ log M*_S(n_{i+1} + K, η) − log M*_S(n_{i+1}, η) ] ) ≥ 1.

Since R_T(·) is a non-increasing survival function, R_T(e_i − K) ≥ R_T(e_i) ≥ R_T(e_{i+1}). Therefore the global optimality condition (9) is also satisfied, by substituting R_T(e_i) for R_T(e_i − K) in one place.
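Condition (9) is straightforward to verify numerically for a candidate solution; a brute-force sketch (reusing log_m_star; the loop framing is ours):

```python
def satisfies_condition_9(ns, survival, eta, eps):
    """Check the global optimality condition (9) for every epoch i and
    every admissible shift K <= n_i; returns False on any violation."""
    es, e = [], 0
    for n in ns:
        e += n
        es.append(e)                 # epoch end times e_i
    for i in range(len(ns) - 1):
        for K in range(1, ns[i] + 1):
            loss = (survival(es[i]) * log_m_star(ns[i], eta, eps)
                    - survival(es[i] - K) * log_m_star(ns[i] - K, eta, eps))
            gain = survival(es[i + 1]) * (log_m_star(ns[i + 1] + K, eta, eps)
                                          - log_m_star(ns[i + 1], eta, eps))
            if loss < gain:          # shifting e_i left by K would help
                return False
    return True
```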
B. Geometric Death Distribution

A common failure mode for systems that do not age is a geometric death time T [3]:

p_T(t) = α(1 − α)^{t−1}  and  R_T(t) = (1 − α)^t,

where α is the death time parameter.

Proposition 11: When T is geometric, the solution to (7) under Strassen's approximation yields equal epoch sizes. This optimal size is given by arg max_ν R_T(ν) log M*(ν, η).

Proof: Begin by showing that Algorithm 1 will produce a solution with equal epoch sizes. Recall that the survival function of a geometric random variable with parameter 0 < α ≤ 1 is R_T(t) = (1 − α)^t. Therefore the first step of the algorithm will choose n_1 as

n_1 = arg max_ν (1 − α)^ν log M*(ν, η).

The second step of the algorithm will choose

n_2 = arg max_ν (1 − α)^{n_1} (1 − α)^ν log M*(ν, η) = arg max_ν (1 − α)^ν log M*(ν, η),

which is the same as n_1. In general,

n_i = arg max_ν (1 − α)^{e_{i−1}} (1 − α)^ν log M*(ν, η) = arg max_ν (1 − α)^ν log M*(ν, η),

so n_1 = n_2 = ···. Such a solution satisfies n_1 ≥ n_2 ≥ ··· and so it is optimal by Prop. 10.

The optimal epoch size for geometric death under Strassen's approximation can be found analytically [51, Sec. 6.4.2]. Consider the setting where the alive state corresponds to a BSC(ε). For fixed crossover probability ε and target error probability η, the optimal epoch size is plotted as a function of α in Fig. 2. The less likely the channel is to die early, the longer the optimal epoch length.

[Fig. 2. Optimal epoch lengths under Strassen's approximation for an (ε, α) BSC-geometric channel that dies, plotted against the channel death parameter α for fixed ε and η.]

[Fig. 3. Achievable η-reliability in sending a fixed number of bits over an (ε, α) BSC-geometric channel that dies: contours of η = 1e−2, 1e−3, ..., 1e−8 over the channel death α versus channel noise ε plane.]
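For geometric death the greedy step is the same one-dimensional search at every epoch, as in this usage sketch (the parameter values are illustrative assumptions, reusing log_m_star from above):

```python
alpha, eps, eta = 0.01, 0.05, 0.01          # assumed illustrative values
survival = lambda t: (1.0 - alpha) ** t     # geometric survival function
# Prop. 11: every epoch has the same optimal size.
nu_star = max(range(1, 1000),
              key=lambda nu: survival(nu) * log_m_star(nu, eta, eps))
```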
Alternatively, rather than fixing η, one might fix the number of bits to be communicated and find the best level of reliability that is possible. Fig. 3 shows the best λ_max = η that is possible when communicating a fixed number of bits over a BSC(ε)-geometric(α) channel that dies.

Notice that the geometric death time distribution forms a boundary case for Prop. 10. One can consider discrete Weibull death time distributions [52] to see what happens with heavier tails:

p_T(t) = (1 − α)^{(t−1)^β} − (1 − α)^{t^β}  and  R_T(t) = (1 − α)^{t^β},

where β is the shape parameter. When β > 1, the tail is lighter than geometric, and when β < 1, the tail is heavier than geometric. With heavy-tailed death distributions, the greedy algorithm gives epoch sizes that are non-increasing, n_1 ≥ n_2 ≥ ···, and therefore optimal; it is better to send long blocks first and then send shorter ones.

C. Dynamic Programming
The greedy algorithm of the previous section solves (7) under certain conditions. For finite N, a dynamic program (DP) may be used to solve (7) under any conditions. To develop the DP formulation [53], we assume that channel state feedback (whether the channel output is ? or whether it is some other symbol) is available to the transmitter; solving the DP will show, however, that channel state feedback is not required.

System Dynamics:

[ ζ_n ; ω_n ] = [ (ζ_{n−1} + 1) ŝ_{n−1} ; ω_{n−1} κ_{n−1} ],   (13)

for n = 1, 2, ..., N + 1. The following state variables, disturbances, and controls are used:
• ζ_n ∈ Z* (the non-negative integers) is a state variable that counts the location in the current transmission epoch,
• ω_n ∈ {0, 1} is a state variable that indicates whether the channel is alive (1) or dead (0),
• κ_n ∈ {0, 1} ∼ Bern(R_T(n)) is a disturbance that kills (0) or revives (1) the channel in the next time step, and
• ŝ_n ∈ {0, 1} is a control input that starts (0) or continues (1) a transmission epoch in the next time step.
Initial State: Since the channel starts alive (note that R_T(1) = 1) and since the first transmission epoch starts at the beginning of time,

[ ζ_1 ; ω_1 ] = [ 0 ; 1 ].   (14)
Additive Cost: Transmission volume log M*(ζ_n + 1, η) is credited if the channel is alive (i.e., ω_n = 1) and the transmission epoch is to be restarted in the next time step (i.e., 1 − ŝ_n = 1). This implies the cost function

c_n(ζ_n, ω_n, ŝ_n) = −(1 − ŝ_n) ω_n log M*(ζ_n + 1, η).   (15)

This is negative so that smaller is better.

Terminal Cost: There is no terminal cost: c_{N+1} = 0.

Cost-to-go: The cost-to-go from time n to time N + 1 is

E_κ⃗ { Σ_{i=n}^N c_i(ζ_i, ω_i, ŝ_i) } = −E_κ⃗ { Σ_{i=n}^N (1 − ŝ_i) ω_i log M*(ζ_i + 1, η) }.

Notice that the state variable ζ_n, which counts epoch time, is known to the transmitter and is determinable by the receiver through transmitter simulation. The state variable ω_n indicates the channel state and is known to the receiver by observing the channel output. It may be communicated to the transmitter through channel state feedback. The following result follows directly.
Proposition 12: A communication scheme that follows the dynamics (13) and additive cost (15) achieves the transmission time-volume

( N, V = −E[ Σ_{n=1}^N c_n ] )

at η-reliability.

DP may be used to find the optimal control policy (ŝ_n).

Proposition 13: The optimal −V for the initial state (14), dynamics (13), additive cost (15), and no terminal cost is equal to the cost of the solution produced by the dynamic programming algorithm.

Proof: The system described by initial state (14), dynamics (13), and additive cost (15) is in the form of the basic problem of dynamic programming [53, Sec. 1.2]. Thus the result follows from [53, Prop. 1.3.1].

The DP optimization computations are now carried out; standard J notation is used for cost [53]. The base case at time N + 1 is

J_{N+1}(ζ_{N+1}, ω_{N+1}) = c_{N+1} = 0.

Proceeding backwards from time N to time 1,

J_n(ζ_n, ω_n) = min_{ŝ_n∈{0,1}} E_{κ_n} { c_n(ζ_n, ω_n, ŝ_n) + J_{n+1}(f_n(ζ_n, ω_n, ŝ_n, κ_n)) },

for n = 1, 2, ..., N, where f_n(ζ_n, ω_n, ŝ_n, κ_n) = [ζ_{n+1} ω_{n+1}]^T = [(ζ_n + 1)ŝ_n  ω_n κ_n]^T. Substituting our additive cost function yields

J_n(ζ_n, ω_n) = min_{ŝ_n∈{0,1}} −E_{κ_n} { (1 − ŝ_n) ω_n log M*(ζ_n + 1, η) } + E_{κ_n} { J_{n+1} }   (16)
             = min_{ŝ_n∈{0,1}} −(1 − ŝ_n) R_T(n) log M*(ζ_n + 1, η) + E_{κ_n} { J_{n+1} }.

Notice that the state variable ω_n dropped out of the first term when we took the expectation with respect to the disturbance κ_n. This is true for each stage in the DP.
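Because ω_n drops out, the optimal policy depends only on epoch timing, and the DP reduces to a backward recursion over epoch start times. The following finite-horizon sketch solves (7) directly in that reduced form (reusing log_m_star; the function name and framing are ours):

```python
def optimal_partition(survival, horizon, eta, eps):
    """Backward DP for (7): best[e] is the maximum expected transmission
    volume obtainable from channel uses e+1, ..., horizon."""
    best = [0.0] * (horizon + 1)
    choice = [0] * (horizon + 1)      # 0 means: stop transmitting at time e
    for e in range(horizon - 1, -1, -1):
        for n in range(1, horizon - e + 1):
            val = survival(e + n) * log_m_star(n, eta, eps) + best[e + n]
            if val > best[e]:
                best[e], choice[e] = val, n
    epochs, e = [], 0                 # reconstruct the optimal partition
    while e < horizon and choice[e] > 0:
        epochs.append(choice[e])
        e += choice[e]
    return best[0], epochs
```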
Proposition 14: For a channel that dies (X, p_a(y|x), p_d(y|x), p_T(t), Y), channel state feedback does not improve performance.

Proof: By repeating the expectation calculation in (16) for each stage n in the stage-by-stage DP algorithm, it is verified that the state variable ω does not enter into the stage optimization problem. Hence the transmitter does not require channel state feedback to determine the optimal signaling strategy.

D. A Dynamic Programming Example
To provide some intuition on the choice of epoch lengths, we present a short example. Consider the channel that dies with X = {0, 1}, Y = {0, 1, ?}, p_a(y|x) given by (1) with a fixed crossover probability ε, p_d(y|x) given by (2), and p_T(t) uniform over a finite horizon of length 40 (disallowing death in the first time step):

p_T(t) = 1/39 for t = 2, ..., 40, and p_T(t) = 0 otherwise.

Our goal is to communicate with η-reliability for a fixed target η.

Since the death distribution has finite support, there is no benefit to transmitting after death is guaranteed. Suppose some sequence of n_i is chosen arbitrarily: (n_1 = 13, n_2 = 13, n_3 = 13, n_4 = 1). This has expected transmission volume (under the Strassen approximation)

V = Σ_{i=1}^4 R_T(e_i) log M*(n_i, η)
  (a)= log M*(13, η) Σ_{i=1}^3 R_T(e_i)
  = log M*(13, η) [R_T(13) + R_T(26) + R_T(39)]
  = log M*(13, η) [27/39 + 14/39 + 1/39],

where (a) removes the fourth epoch since uncoded transmission cannot achieve η-reliability.

If we run the DP algorithm to optimize the ordered integer partition, we get the result (n_1 = 20, n_2 = 12, n_3 = 6, n_4 = 2); equivalently, (n_1 = 20, n_2 = 12, n_3 = 6, n_4 = 1, n_5 = 1), since the last two channel usages are wasted to hedge against channel death (see Fig. 1(a)). Notice that since the solution is in order, the greedy algorithm would also have succeeded. The expected transmission volume for this strategy (under the Strassen approximation) is

V = R_T(20) log M*(20, η) + R_T(32) log M*(12, η) + R_T(38) log M*(6, η)
  = (20/39) log M*(20, η) + (8/39) log M*(12, η) + (2/39) log M*(6, η),

which exceeds the expected transmission volume of the arbitrary choice above.
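One can reproduce the flavor of this example with the optimal_partition sketch above (the ε and η values here are illustrative assumptions):

```python
horizon = 40
survival = lambda t: max(0.0, (40 - t) / 39.0)   # uniform death on t = 2..40
vol, epochs = optimal_partition(survival, horizon, eta=0.01, eps=0.1)
# With parameters in this vicinity one obtains a non-increasing partition,
# such as [20, 12, 6, ...], matching the structure reported in the text.
print(vol, epochs)
```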
E. A Precise Solution

It has been assumed that optimal finite blocklength codes are known and used. Moreover, the Strassen approximation has been used for certain computations. It is, however, also of interest to determine precisely which code should be used over a channel that dies. This subsection gives an example where a sequence of length-23 binary Golay codes [54] is optimal. Similar examples may be developed for other perfect codes; a perfect code is one for which there are equal-radius spheres centered at the codewords that are disjoint and that completely fill X^{n_i}.

Before presenting the example, the sphere-packing upper bound on log M*(n_i, η) for a BSC(ε) is derived. Recall the notion of decoding radius [55] and let ρ(ε, η) be the largest integer such that

Σ_{s=0}^{ρ} C(n_i, s) ε^s (1 − ε)^{n_i − s} ≤ 1 − η.

The sphere-packing bound follows from counting how many decoding regions of radius ρ could conceivably fit in the Hamming space {0, 1}^{n_i} disjointly. Let D_{s,m} be the number of channel output sequences that are decoded into message w_m and have distance s from the mth codeword. By the nature of Hamming space,

D_{s,m} ≤ C(n_i, s),

and due to the volume constraint,

Σ_{m=1}^{M} Σ_{s=0}^{ρ} D_{s,m} ≤ 2^{n_i}.

Since each decoding region must contain at least Σ_{s=0}^{ρ} C(n_i, s) output sequences to meet the target error probability, the maximal codebook size M*(n_i, η) is upper-bounded as

M*(n_i, η) ≤ 2^{n_i} / Σ_{s=0}^{ρ(ε,η)} C(n_i, s).

Thus the sphere-packing upper bound on log M*(n_i, η) is

log M*(n_i, η) ≤ n_i − log Σ_{s=0}^{ρ(ε,η)} C(n_i, s) ≜ log M_sp(n_i, η).

Perfect codes such as the binary Golay code of length 23 can sometimes achieve the sphere-packing bound with equality.

Consider an (ε, α) BSC-geometric channel that dies, with fixed ε and α, and a fixed small target error probability η. For such values of ε and η, the decoding radius ρ(ε, η) grows in steps with n_i, taking the values 1, 2, 3, 4, ... over successive ranges of n_i; in particular, ρ(ε, η) = 3 at n_i = 23. Moreover, one can note that the (n = 23, M = 4096) binary Golay code has a decoding radius of 3; thus it meets the BSC sphere-packing bound M_sp(23, η) = 2^{12} with equality.

Now to bring channel death into the picture. If one proceeds greedily, following Algorithm 1, but using the sphere-packing bound log M_sp(n_i, η) rather than the optimal log M*(n_i, η), the first epoch length is

n_1(ε, α, η) = arg max_ν ᾱ^ν [ ν − log Σ_{s=0}^{ρ(ε,η)} C(ν, s) ] = 23,

where ᾱ = 1 − α. By the memorylessness argument of Prop. 11, it follows that running Algorithm 1 with the sphere-packing bound will yield 23 = n_1 = n_2 = ···.

It remains to show that Algorithm 1 actually gives the true solution. Had Strassen's approximation been used rather than the sphere-packing bound, the result would follow directly from Prop. 11. Instead, the global optimality condition (9) can be verified exhaustively for all possible shift sizes K for the first epoch:

[ ᾱ^{23} log M_sp(23, η) − ᾱ^{23−K} log M_sp(23 − K, η) ] / ( ᾱ^{46} [ log M_sp(23 + K, η) − log M_sp(23, η) ] ) ≥ 1.

Then the same exhaustive verification is performed for all possible shifts for the second epoch:

[ ᾱ^{46} log M_sp(23, η) − ᾱ^{46−K} log M_sp(23 − K, η) ] / ( ᾱ^{69} [ log M_sp(23 + K, η) − log M_sp(23, η) ] ) ≥ 1,

which, upon dividing the numerator and denominator by ᾱ^{23}, reduces to the condition for the first epoch. The exhaustive verification can be carried out indefinitely in this manner to show that using the length-23 binary Golay code for every epoch is optimal.
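The decoding radius and the sphere-packing bound are exactly computable, so the greedy step can be checked directly. A sketch with our own helper names (the ε, α, η values are illustrative assumptions); note that Σ_{s≤3} C(23, s) = 2^{11}, which is what makes the length-23 Golay code perfect:

```python
from math import comb, log2

def decoding_radius(n, eps, eta):
    """Largest rho with sum_{s<=rho} C(n,s) eps^s (1-eps)^(n-s) <= 1 - eta."""
    total, rho = 0.0, -1
    for s in range(n + 1):
        total += comb(n, s) * eps**s * (1 - eps) ** (n - s)
        if total <= 1 - eta:
            rho = s
        else:
            break
    return rho

def log_m_sp(n, eps, eta):
    """Sphere-packing upper bound n - log2(sum_{s<=rho} C(n,s)), in bits."""
    rho = decoding_radius(n, eps, eta)
    if rho < 0:                 # empty sum: the bound is vacuous (2**n)
        return float(n)
    return n - log2(sum(comb(n, s) for s in range(rho + 1)))

# Assumed parameters: decoding_radius(23, eps, eta) == 3 here, and
# log_m_sp(23, eps, eta) == 12.0, met with equality by the Golay code.
eps, alpha, eta = 0.01, 0.001, 2e-5
n1 = max(range(1, 200),
         key=lambda nu: (1 - alpha) ** nu * log_m_sp(nu, eps, eta))
```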
F. Practical Codes and Empirical Death Distributions

It should be noted that the algorithms developed for optimizing communication schemes over channels that die work with arbitrary death distributions, even empirically measured ones, e.g. the experimentally characterized death properties of a synthetic biology communication system [5, Fig. 3: Reliability].

Further, rather than considering the log M*(n_i, η) function for optimal finite blocklength codes, the code optimization procedures would work just as well if a collection of finite blocklength codes were provided. Such a limited set of codes might be selected for decoding complexity or other practical reasons. As an example, consider the collection C of binary minimum distance codes of bounded lengths given in [44, DVD supplement]. We run the optimization over the example in Sec. V-D but restricting to C.

The result obtained for epoch sizes is (n_1 = 15, n_2 = 15, n_3 = 9, n_4 = 1). Under the Strassen approximation, this set of epoch sizes gives a smaller expected transmission volume than the optimal epoch sizes do; the Strassen approximation is not exact, however, and the actual number of bits achieved with the optimized epoch sizes for C follows from the true sizes of the codes used. The two minimum distance codes used are the (n = 15, M = 256, d = 5) code and the (n = 9, M = 6, d = 3) code. It remains to be seen whether the restriction to the collection of minimum distance codes is actually suboptimal.

VI. PARTIAL ORDERING OF CHANNELS
It is of interest to order channels that die by quality. The partial ordering of DMCs was studied by Shannon [56], and as a first step, we can slightly extend his result to order channels that die having common death distributions.
Definition 15: Let p(i, j) be the transition probabilities for a DMC C_1 and let q(k, l) be the transition probabilities for a DMC C_2. Then C_1 is said to include C_2, written C_1 ⊇ C_2, if there exist two sets of valid transition probabilities r_γ(k, i) and t_γ(j, l), and a vector g with g_γ ≥ 0 and Σ_γ g_γ = 1, such that

Σ_{γ,i,j} g_γ r_γ(k, i) p(i, j) t_γ(j, l) = q(k, l).

Proposition 15: Consider two channels that die with identical death distributions: (X, p_a, p_d, p_T(t), Y) and (X, q_a, q_d, p_T(t), Y). Let DMC C_1 correspond to p_a and let DMC C_2 correspond to q_a, and moreover suppose that C_1 ⊇ C_2. Fix a transmission time N and an expected transmission volume V. Let η_1 be the best level of reliability for the first channel and η_2 be the best level of reliability for the second channel, under (N, V). Then η_1 ≤ η_2.

Proof: The main theorem of [56] proves that the average error probability when transmitting an individual message code over C_1 is less than or equal to the average error probability when transmitting the same individual message code over C_2. Shannon's proof [56] holds mutatis mutandis for maximum error probability, replacing "average error probability" by "maximum error probability." The desired result follows by concatenating individual message codes into a code.

We can also order channels that die having common alive-state transition probabilities.
Definition 16: Consider two random variables T and U with survival functions R_T(·) and R_U(·) respectively. Then U is said to stochastically dominate T, written U ≥_st T, if R_T(t) ≤ R_U(t) for all t.

Proposition 16: Consider two channels that die with identical state properties: (X, p_a(y|x), p_d(y|x), p_T, Y) and (X, p_a(y|x), p_d(y|x), q_U, Y). Let death random variable T correspond to p_T and let death random variable U correspond to q_U, and moreover suppose that U ≥_st T. Fix a transmission time N and a level of reliability η. Let V_1 be the best expected transmission volume for the first channel and V_2 be the best expected transmission volume for the second channel, under (N, η). Then V_2 ≥ V_1.

Proof: Recall the expected transmission volume expression (7) for the first channel,

max_{(n_i) : Σ n_i = N} Σ_i R_T(e_i) log M*(n_i, η),

and for the second channel,

max_{(ν_i) : Σ ν_i = N} Σ_i R_U(ι_i) log M*(ν_i, η).

Since R_T(t) ≤ R_U(t) for all t, the result follows directly.

These two results give individual ordering principles in the two dimensions essentially depicted in Fig. 3. Putting them together provides a partial order on all channels that die: if one channel is better than another channel in both dimensions, then it is better overall.

Proposition 17: Consider two channels that die: (X, p_a, p_d, p_T, Y) and (X, q_a, q_d, q_U, Y). Let DMC C_1 correspond to p_a and let DMC C_2 correspond to q_a, and moreover suppose that C_2 ⊇ C_1. Let death random variable T correspond to p_T and let death random variable U correspond to q_U, and moreover suppose that U ≥_st T. Fix a transmission time N and a level of reliability η. Let V_1 be the best expected transmission volume for the first channel and V_2 be the best expected transmission volume for the second channel, under (N, η). Then V_2 ≥ V_1.

VII. CONCLUSION AND FUTURE WORK
We have formulated the problem of communication over channels that die and have shown how to maximize expected transmission volume at a given level of error probability reliability. There are several extensions to the basic formulation studied in this work that one might consider; we list a few:
• Inspired by synthetic biology [5], rather than thinking of the death time as independent of the signaling scheme X^n, one might consider channels that die because they lose fitness as a consequence of operation: T would be dependent on X^n. This would be similar to Gallager's panic button/child's toy channel, and would have intersymbol interference [31], [34]. There would also be strong connections to channels that heat up [57] and communication with a dynamic cost [58, Ch. 3].
• In the emerging attention economy [59], agents faced with information overload [60] may permanently stop listening to certain communication media received over noisy channels. This setting is exactly modeled by channels that die. The impact of communication over channels that die on the productivity and efficiency of human organizations may be determined by building on the results herein.
• Since channel death is indicated by the symbol ?, the receiver unequivocally knows the death time. Other channel models might not have a distinct output letter for death and would need to detect death, perhaps using the theory of estimating stopping times [61].
• Inspired by communication terminals that randomly lie within communication range, e.g. in vehicular communication, one might also consider a channel that is born at a random time and then dies at a random time. One would suspect that channel state feedback would be beneficial. Networks of birth-death channels are also of interest and would have connections to percolation-style work [2].
• This work has simply considered the channel coding problem; however, there are several formulations of end-to-end information transmission problems over channels that die, which are of interest in many application areas. There is no reason to suspect a separation principle.

Randomly stepping back from infinity leads to some new understanding of the fundamental limits of communication in the presence of noise and unreliability.
ACKNOWLEDGMENT

We thank Barry Canton (Ginkgo BioWorks) and Drew Endy (Stanford University) for discussions on synthetic biology that initially inspired this work. Discussions with Baris Nakiboglu and Yury Polyanskiy (Princeton University) are also appreciated.

REFERENCES

[1] L. R. Varshney, S. K. Mitter, and V. K. Goyal, "Channels that die," in Proc. 47th Annu. Allerton Conf. Commun. Control Comput., Sept.-Oct. 2009, pp. 566–573.
[2] I. M. Jacobs, "Connectivity in probabilistic graphs: An abstract study of reliable communications in systems containing unreliable components," Sc.D. thesis, Massachusetts Institute of Technology, Cambridge, MA, Aug. 1959.
[3] D. J. Davis, "An analysis of some failure data," J. Am. Stat. Assoc., vol. 47, no. 258, pp. 113–150, Jun. 1952.
[4] I. Dietrich and F. Dressler, "On the lifetime of wireless sensor networks," ACM Trans. Sensor Netw., vol. 5, no. 1, p. 5, Feb. 2009.
[5] B. Canton, A. Labno, and D. Endy, "Refinement and standardization of synthetic biological parts and devices," Nat. Biotechnol., vol. 26, no. 7, pp. 787–793, Jul. 2008.
[6] J. Pfeifer, "The use of information theory in biology: Lessons from social insects," Biol. Theory, vol. 1, no. 3, pp. 317–330, Summer 2006.
[7] L. R. Varshney, "Optimal information storage: Nonsequential sources and neural channels," S.M. thesis, Massachusetts Institute of Technology, Cambridge, MA, Jun. 2006.
[8] D. Endy, "Foundations for engineering biology," Nature, vol. 438, no. 7067, pp. 449–453, Nov. 2005.
[9] J. D. Bekenstein, "The limits of information," Stud. Hist. Philos. Mod. Phys., vol. 32, no. 4, pp. 511–524, Dec. 2001.
[10] D. R. Headrick, The Invisible Weapon: Telecommunications and International Politics, 1851–1945. New York: Oxford University Press, 1991.
[11] S. Hara, A. Ogino, M. Araki, M. Okada, and N. Morinaga, "Throughput performance of SAW–ARQ protocol with adaptive packet length in mobile packet data transmission," IEEE Trans. Veh. Technol., vol. 45, no. 3, pp. 561–569, Aug. 1996.
[12] E. Modiano, "An adaptive algorithm for optimizing the packet size used in wireless ARQ protocols," Wireless Netw., vol. 5, no. 4, pp. 279–286, Jul. 1999.
[13] P. Lettieri and M. B. Srivastava, "Adaptive frame length control for improving wireless link throughput, range, and energy efficiency," in Proc. 17th Annu. Joint Conf. IEEE Computer Commun. Soc. (INFOCOM'98), vol. 2, Mar. 1998, pp. 564–571.
[14] S. Ci, H. Sharif, and K. Nuli, "Study of an adaptive frame size predictor to enhance energy conservation in wireless sensor networks," IEEE J. Sel. Areas Commun., vol. 23, no. 2, pp. 283–292, Feb. 2005.
[15] D. Slepian, "Bounds on communication," Bell Syst. Tech. J., vol. 42, pp. 681–707, May 1963.
[16] S. S. L. Chang, B. Harris, and J. J. Metzner, "Optimum message transmission in a finite time," IRE Trans. Inf. Theory, vol. 8, no. 5, pp. 215–224, Sep. 1962.
[17] S. J. MacMullan and O. M. Collins, "A comparison of known codes, random codes, and the best codes," IEEE Trans. Inf. Theory, vol. 44, no. 7, pp. 3009–3022, Nov. 1998.
[18] J. N. Laneman, "On the distribution of mutual information," in Proc. Inf. Theory Appl. Inaugural Workshop, Feb. 2006.
[19] Y. Polyanskiy, H. V. Poor, and S. Verdú, "New channel coding achievability bounds," in Proc. 2008 IEEE Int. Symp. Inf. Theory, Jul. 2008, pp. 1763–1767.
[20] D. Buckingham and M. C. Valenti, "The information-outage probability of finite-length codes over AWGN channels," in Proc. 42nd Annu. Conf. Inf. Sci. Syst. (CISS 2008), Mar. 2008, pp. 390–395.
[21] G. Wiechman and I. Sason, "An improved sphere-packing bound for finite-length codes over symmetric memoryless channels," IEEE Trans. Inf. Theory, vol. 54, no. 5, pp. 1962–1990, May 2008.
[22] L. H. Ozarow, S. Shamai, and A. D. Wyner, "Information theoretic considerations for cellular mobile radio," IEEE Trans. Veh. Technol., vol. 43, no. 2, pp. 359–378, May 1994.
[23] A. Goldsmith, Wireless Communications. New York: Cambridge University Press, 2005.
[24] J. K. Wolf, A. D. Wyner, and J. Ziv, "The channel capacity of the postal channel," Inf. Control, vol. 16, no. 2, pp. 167–172, Apr. 1970.
[25] M. Zeng, R. Zhang, and S. Cui, "On the outage capacity of a dying channel," in Proc. IEEE Global Telecommun. Conf. (GLOBECOM 2008), Dec. 2008.
[26] Y. Polyanskiy, H. V. Poor, and S. Verdú, "Dispersion of the Gilbert-Elliott channel," in Proc. 2009 IEEE Int. Symp. Inf. Theory, Jul. 2009, pp. 2209–2213.
[27] ——, "Dispersion of Gaussian channels," in
Proc. 2009 IEEE Int. Symp. Inf. Theory , Jul. 2009, pp. 2204–2208.[28] C. E. Shannon, “The zero error capacity of a noisy channel,”
IRE Trans. Inf. Theory , vol. IT-2, no. 3, pp. 8–19, Sep. 1956.[29] S. Sanghavi, “Intermediate performance of rateless codes,” in
Proc. IEEE Inf. Theory Workshop (ITW’07) , Sep. 2007, pp. 478–482.[30] A. Kamra, V. Misra, J. Feldman, and D. Rubenstein, “Growth codes: Maximizing sensor network data persistence,” in
Proc. 2006 Conf.Appl. Technol. Archit. Protocols Comput. Commun. (SIGCOMM’06) , Sep. 2006, pp. 255–266.[31] R. G. Gallager,
Information Theory and Reliable Communication . New York: John Wiley & Sons, 1968.[32] S. M. Ross,
Stochastic Processes . John Wiley & Sons, 1996.[33] R. G. Gallager,
Discrete Stochastic Processes . Boston: Kluwer Academic Publishers, 1996.[34] R. Gallager,
Information Theory and Reliable Communication , ser. International Centre for Mechanical Sciences, Courses and Lectures.Vienna: Springer-Verlag, 1972, no. 30.[35] S. Tatikonda and S. Mitter, “The capacity of channels with feedback,”
IEEE Trans. Inf. Theory , vol. 55, no. 1, pp. 323–349, Jan. 2009.[36] M. Mushkin and I. Bar-David, “Capacity and coding for the Gilbert–Elliott channels,”
IEEE Trans. Inf. Theory , vol. 35, no. 6, pp.1211–1290, Nov. 1989.[37] A. J. Goldsmith and P. P. Varaiya, “Capacity, mutual information, and coding for finite-state Markov channels,”
IEEE Trans. Inf. Theory ,vol. 42, no. 3, pp. 868–886, May 1996.[38] L. E. Braten and T. Tjelta, “Semi-Markov multistate modeling of the land mobile propagation channel for geostationary satellites,”
IEEE Trans. Antennas Propag. , vol. 50, no. 12, pp. 1795–1802, Dec. 2002.[39] J. Wang, J. Cai, and A. S. Alfa, “New channel model for wireless communications: Finite-state phase-type semi-Markov channelmodel,” in
Proc. IEEE Int. Conf. Commun. (ICC 2008) , May 2008, pp. 4461–4465. [40] S. Wang and J.-T. Park, “Modeling and analysis of multi-type failures in wireless body area networks with semi-Markov model,” IEEECommun. Lett. , vol. 14, no. 1, pp. 6–8, Jan. 2010.[41] F. Jelinek,
Probabilistic Information Theory: Discrete and Memoryless Models . New York: McGraw-Hill Book Company, 1968.[42] G. D. Forney, Jr., “Convolutional codes II. Maximum-likelihood decoding,”
Inf. Control , vol. 25, no. 3, pp. 222–266, Jul. 1974.[43] R. J. McEliece,
The Theory of Information and Coding . Cambridge: Cambridge University Press, 2002.[44] P. Kaski and P. R. J. ¨Osterg˚ard,
Classification Algorithms for Codes and Designs . Berlin: Springer, 2006.[45] A. Barg and A. McGregor, “Distance distribution of binary codes and the error probability of decoding,”
IEEE Trans. Inf. Theory ,vol. 51, no. 12, pp. 4237–4246, Dec. 2005.[46] V. Strassen, “Asymptotische absch¨atzungen in Shannons informationstheorie,” in
Transactions of the 3rd Prague Conference onInformation Theory, Statistical Decision Functions, Random Processes . Prague: Pub. House of the Czechoslovak Academy of Sciences,1962, pp. 689–723.[47] D. E. Knuth, “Big omicron and big omega and big theta,”
SIGACT News , vol. 8, no. 2, pp. 18–24, Apr.-June 1976.[48] L. Weiss, “On the strong converse of the coding theorem for symmetric channels without memory,”
Q. Appl. Math. , vol. 18, no. 3, pp.209–214, Oct. 1960.[49] G. D. Forney, Jr., “Exponential error bounds for erasure, list, and decision feedback schemes,”
IEEE Trans. Inf. Theory , vol. IT-14,no. 2, pp. 206–220, Mar. 1968.[50] Y. Polyanskiy, “Channel coding: non-asymptotic fundamental limits,” Ph.D. dissertation, Princeton University, Nov. 2010.[51] L. R. Varshney, “Unreliable and resource-constrained decoding,” Ph.D. thesis, Massachusetts Institute of Technology, Cambridge, MA,Jun. 2010.[52] M. S. A. Khan, A. Khalique, and A. M. Abouammoh, “On estimating parameters in a discrete Weibull distribution,”
IEEE Trans. Rel. ,vol. 38, no. 3, pp. 348–350, Aug. 1989.[53] D. P. Bertsekas,
Dynamic Programming and Optimal Control , 3rd ed. Belmont, MA: Athena Scientific, 2005, vol. 1.[54] M. J. E. Golay, “Notes on digital coding,”
Proc. IRE , vol. 37, no. 6, p. 657, Jun. 1949.[55] R. E. Blahut,
Theory and Practice of Error Control Codes . Reading, MA: Addison-Wesley Publishing Company, 1983.[56] C. E. Shannon, “A note on a partial ordering for communication channels,”
Inf. Control , vol. 1, no. 4, pp. 390–397, Dec. 1958.[57] T. Koch, A. Lapidoth, and P. P. Sotiriadis, “Channels that heat up,”
IEEE Trans. Inf. Theory , vol. 55, no. 8, pp. 3594–3612, Aug. 2009.[58] K. Eswaran, “Communication and third parties: Costs, cues, and confidentiality,” Ph.D. dissertation, University of California, Berkeley,Berkeley, CA, 2009.[59] T. H. Davenport and J. C. Beck,
The Attention Economy: Understanding the New Currency of Business . Boston: Harvard BusinessSchool Press, 2001.[60] T. Van Zandt, “Information overload in a network of targeted communication,”
Rand J. Econ. , vol. 35, no. 3, pp. 542–560, Autumn2004.[61] U. Niesen and A. Tchamkerten, “Tracking stopping times through noisy observations,”