On the Delay of Network Coding over Line Networks
Theodoros K. Dikaliotis, Alexandros G. Dimakis, Tracey Ho, Michelle Effros
Department of Electrical Engineering, California Institute of Technology
Email: {tdikal, adim, tho, effros}@caltech.edu

Abstract
We analyze a simple network where a source and a receiver are connected by a line of erasure channels of different reliabilities. Recent prior work has shown that random linear network coding can achieve the min-cut capacity, and therefore the asymptotic rate is determined by the worst link of the line network. In this paper we investigate the delay for transmitting a batch of packets, which is a function of all the erasure probabilities and the number of packets in the batch. We show a monotonicity result on the delay function and derive simple expressions which characterize the expected delay behavior of line networks. Further, we use a martingale bounded-differences argument to show that the actual delay is tightly concentrated around its expectation.
I. INTRODUCTION
A common approach for practical network coding performs random linear coding over batches or generations [1], where the relevant delay measure is the time taken for the batch to be received. Such in-network coding is particularly beneficial in lossy networks [2] compared to end-to-end erasure coding. In this paper we investigate the batch end-to-end delay for lossy line networks. We consider the use of random linear network coding without feedback and a packet erasure model with different link qualities. All the nodes in the network store all the packets they receive and, whenever given a transmission opportunity, send a random linear combination of all the stored packets [2], [3] over erasure links.

Despite the extensive recent work on network coding over lossy networks (e.g. [2], [3], [4]), the expected time required to send a fixed number of packets over a network of erasure links is not completely characterized. Closely related work on delay in queueing theory [5], [6] assumes Poisson arrivals, and the results pertain to the delay of individual packets in steady state; [7] examines the delay for a single queue multicasting to several users using block network coding. In our work, we consider a batch of n packets that need to be communicated over a line network of ℓ erasure links, where link i experiences an erasure with probability p_i, 1 ≤ i ≤ ℓ, and we are interested in the expected total time E[T_n] for the n packets to travel across the line network.

Prior work [2], [3] established that random linear network coding can achieve the min-cut capacity, and therefore the asymptotic rate is determined by the worst link of the line network. Therefore, the expected time E[T_n] for the n packets to cross the network is

E[T_n] = n / (1 - max_{1≤i≤ℓ} p_i) + D(n, p_1, p_2, ..., p_ℓ),   (1)

where the delay function D(n, p_1, p_2, ..., p_ℓ) is the sublinear part:

lim_{n→∞, ℓ fixed} D(n, p_1, p_2, ..., p_ℓ) / n = 0.
However, relatively little is known about the delay function D(n, p_1, p_2, ..., p_ℓ). In this work we characterize the delay function by showing that it is non-decreasing in n and is bounded by a simple function D̄(p_1, p_2, ..., p_ℓ) of the link erasure probabilities. The main results of this paper are the following two theorems, which characterize the expected behavior and show a concentration of the actual delay random variable close to this expectation.

Theorem 1:
Consider n packets communicated through a line network of ℓ links with erasure probabilities p_1, p_2, ..., p_ℓ, and assume that there is a unique worst link:

p_m := max_{1≤i≤ℓ} p_i,   with p_i < p_m < 1 for all i ≠ m.

The expected time E[T_n] to send all n packets is

E[T_n] = n / (1 - max_{1≤i≤ℓ} p_i) + D(n, p_1, p_2, ..., p_ℓ),

where the delay function D(n, p_1, p_2, ..., p_ℓ) is non-decreasing in n and upper bounded by

D̄(p_1, p_2, ..., p_ℓ) := Σ_{i=1, i≠m}^{ℓ} p_m / (p_m - p_i).

If, on the other hand, there are two links that take the worst value, then the delay function is not bounded but still exhibits the sublinear behavior; Pakzad et al. [3] prove that in the case of a two-hop network with identical links the delay function grows as √n. We also prove the following concentration result:

Theorem 2:
The time T_n for n packets to travel across the network is concentrated around its expected value with high probability. In particular, for sufficiently large n,

P[|T_n - E[T_n]| > ε_n] ≤ (1 - max_{1≤i≤ℓ} p_i) / n + o(1/n),

for deviations ε_n = n^{3/4} / (1 - max_{1≤i≤ℓ} p_i). Since E[T_n] grows linearly in n and the deviations ε_n are sublinear, T_n is tightly concentrated around its expectation for large n with probability approaching one.

The remainder of this paper is organized as follows: Section II presents the precise model we use for packet communication. Section III presents the analysis for the general multi-hop network. Section IV contains a discussion of the results presented in this paper along with comments for future research.

II. MODEL
The general network under consideration is depicted in Fig. 1. The network consists of ℓ + 1 nodes N(i), 1 ≤ i ≤ ℓ + 1, and ℓ links L(i), 1 ≤ i ≤ ℓ, with source node N(1) and destination node N(ℓ+1). Node N(i), 1 ≤ i ≤ ℓ, is connected to node N(i+1) to its right through the erasure link L(i).

We assume a discrete time model in which the source wishes to transmit n packets to the destination. At each time step, node N(i) can transmit one packet through link L(i) to node N(i+1), 1 ≤ i ≤ ℓ. The transmission succeeds with probability 1 - p_i, or the packet gets erased with probability p_i. Erasures across different links and time steps are assumed to be independent. At each time step the packet transmitted by node N(i) is a random linear combination of all previously received packets at the node. We want to determine the time T_n taken for the destination node to receive (decode) all the n packets initially present at the source node N(1). We assume that no link fails with probability 1 (p_i < 1, 1 ≤ i ≤ ℓ), or else the problem becomes trivial since no packets travel through the network. The destination node N(ℓ+1) will decode once it receives n linearly independent combinations of the initial packets.

Coding at each hop (network coding) is needed to achieve minimum delay when feedback is unavailable, slow or expensive. If instantaneous feedback is available at each hop, an automatic repeat request (ARQ) scheme with simple forwarding of packets achieves a block delay performance identical to network coding. Note that coding only at the source is suboptimal in terms of throughput and delay [2]. The only feedback required in the network coding case is that the destination node N(ℓ+1), once it receives all the necessary linearly independent packets, signals the end of transmission to all the other nodes.

As explained in [8], information travels through the network in the form of innovative packets.
A packet at node N(i), 2 ≤ i ≤ ℓ, is innovative if it does not belong to the space spanned by the packets present at node N(i+1). Each node needs to code, and therefore store, only the part of the information that has not already been received by N(i+1). If feedback were present, nodes could equivalently drop packets that do not add information to the nodes on their right. Therefore the analysis becomes essentially a queueing theory problem for innovative packets.

In our model, in case of a success the packet is assumed to be transmitted to the next node instantaneously, i.e. we ignore the transmission delay along the links. Moreover, there is no restriction on the number of packets n or the number of hops ℓ, and there is no requirement for the network to reach steady state.

Fig. 1. Multi-hop network
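The model above is straightforward to simulate. The sketch below is our own illustration (not code from the paper, and all function names are ours): it tracks the rank of each node under the standard large-field assumption that a received coded packet is innovative exactly when the sender's rank exceeds the receiver's, and it also evaluates the bound D̄ of Theorem 1 for comparison.

```python
import random
import statistics

def simulate_Tn(n, ps, rng):
    """Time for n packets to cross a line of erasure links with erasure
    probabilities ps[i]: link i succeeds with probability 1 - ps[i], and
    a success raises node (i+1)'s rank iff node i's rank at the start of
    the slot was strictly larger (the innovative-packet dynamics)."""
    l = len(ps)
    rank = [n] + [0] * l              # rank[0]: source holds all n packets
    t = 0
    while rank[l] < n:                # until the destination can decode
        t += 1
        snap = rank[:]                # all links act on the slot-start state
        for i, p in enumerate(ps):
            if rng.random() > p and snap[i] > snap[i + 1]:
                rank[i + 1] += 1      # an innovative packet crosses link i
    return t

def delay_bound(ps):
    """D_bar(p_1, ..., p_l) = sum over i != m of p_m / (p_m - p_i)
    from Theorem 1; requires a unique worst link."""
    pm = max(ps)
    if ps.count(pm) != 1:
        raise ValueError("Theorem 1 requires a unique worst link")
    return sum(pm / (pm - p) for p in ps if p != pm)

rng = random.Random(0)
ps, n = [0.5, 0.2], 200
mean_T = statistics.fmean([simulate_Tn(n, ps, rng) for _ in range(500)])
lower = n / (1 - max(ps))             # linear term of equation (1)
upper = lower + delay_bound(ps)       # Theorem 1: D(n, ...) <= D_bar
```

With perfect links the pipeline is deterministic: n packets over ℓ lossless hops take exactly n + ℓ - 1 slots, e.g. `simulate_Tn(5, [0.0, 0.0], rng)` returns 6. For lossy links, the Monte Carlo mean lands near the n/(1 - p_max) + D̄ prediction.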
III. GENERAL LINE NETWORKS
A. Proof of Theorem 1
Let the random variable R_n^{(i)}, 2 ≤ i ≤ ℓ, denote the rank difference between node N(i) and node N(i+1) at the moment packet n arrives at N(2). This is exactly the number of innovative packets present at node N(i) at the random time when packet n arrives at N(2).

The time T_n taken to send n packets from the source node N(1) to the destination N(ℓ+1) can be expressed as the sum of the time T_n^{(1)} required for all the n packets to cross the first link and the time τ_n required for all the remaining innovative packets R_n^{(2)}, ..., R_n^{(ℓ)} at nodes N(2), ..., N(ℓ) respectively to reach the destination node N(ℓ+1):

T_n = T_n^{(1)} + τ_n.   (2)

All the quantities in equation (2) are random variables and we want to compute their expected values. By the linearity of expectation,

E[T_n] = E[T_n^{(1)}] + E[τ_n],   (3)

and by defining X_j^{(1)}, 1 ≤ j ≤ n, to be the time taken for packet j to cross the first link, we get

E[T_n^{(1)}] = Σ_{j=1}^{n} E[X_j^{(1)}] = n / (1 - p_1),   (4)

since the X_j^{(1)}, 1 ≤ j ≤ n, are all geometric random variables (P(X_j^{(1)} = k) = (1 - p_1) p_1^{k-1}, k ≥ 1). Therefore, combining equations (3) and (4), we get

E[T_n] = n / (1 - p_1) + E[τ_n].   (5)

Equations (1) and (5) give

D(n, p_1, p_2, ..., p_ℓ) = n / (1 - p_1) - n / (1 - max_{1≤i≤ℓ} p_i) + E[τ_n],

and clearly the key quantity for calculating the delay function D(n, p_1, p_2, ..., p_ℓ) is the expected time E[τ_n] taken for all the remaining innovative packets at nodes N(2), ..., N(ℓ) to reach the destination. For the simplest case of a two-hop network (ℓ = 2) we can derive recursive formulas for computing this expectation for each n. Table I has closed-form expressions for the delay function D(n, p_1, p_2) for the first few values of n.

TABLE I
THE DELAY FUNCTION D(n, p_1, p_2) FOR DIFFERENT VALUES OF n

It is seen that as n grows, the number of terms in these expressions increases rapidly, making the exact formulas impractical, and as expected, for larger values of ℓ (ℓ ≥ 3) the situation only worsens. Our subsequent analysis derives tight upper bounds on the delay function D(n, p_1, p_2, ..., p_ℓ) for any ℓ which do not depend on n.

The (ℓ-1)-tuple Y_n = (R_n^{(2)}, ..., R_n^{(ℓ)}) representing the number of innovative packets remaining at nodes N(2), ..., N(ℓ) at the moment packet n arrives at node N(2) (including packet n) is a multidimensional Markov process with state space E ⊂ N^{ℓ-1} (the state space is a proper subset of N^{ℓ-1} since Y_n can never take the values (0, *, ..., *)). Using the coupling method [9] and an argument similar to the one given in Proposition 2 of [10], it can be shown that Y_n is a stochastically increasing function of n (meaning that as n increases there is a higher probability of having more innovative packets at nodes N(2), ..., N(ℓ)).

Proposition 1:
The Markov process Y_n = (R_n^{(2)}, ..., R_n^{(ℓ)}) is ⪯_st-increasing.

Proof:
Given in the appendix along with the necessary definitions.

A direct result of Proposition 1 is that the expected time E[τ_n] for the remaining packets at nodes N(2), ..., N(ℓ) to reach the destination is a non-decreasing function of n:

E[τ_n] ≤ E[τ_{n+1}] ≤ lim_{n→∞} E[τ_n],   (6)

where the second inequality is meaningful when the limit exists.

Innovative packets travelling in the network from node N(2) to node N(ℓ+1) can be viewed as customers travelling through a network of service stations in tandem. Indeed, each innovative packet (customer) arrives at the first station (node N(2)) according to a geometric arrival process, and the transmission (service) time is also geometrically distributed. Once an innovative packet has been transmitted (serviced), it leaves the current node (station) and arrives at the next node (station), waiting for its next transmission (service).

By the interchangeability result on service stations from Weber [11], we can interchange the position of any two links without affecting the departure process of node N(ℓ), and therefore without affecting the delay function. Consequently, we can swap the position of the worst link (which is unique by the assumptions of Theorem 1) with the first link, leaving the positions of all other links unaltered; without loss of generality we can therefore simply assume that the first link is the worst one (p_2, p_3, ..., p_ℓ < p_1 < 1).

Taking the first link to be the worst one lets us use the results of Hsu and Burke [12], who proved that a tandem network with geometrically distributed service times and a geometric input process reaches steady state as long as the input process is slower than any of the service processes. Our line network is depicted in Fig. 1, and the input process (of innovative packets) is the geometric arrival process at node N(2) from N(1). Since p_2, p_3, ..., p_ℓ < p_1, the arrival process is slower than any service process (transmission of an innovative packet to the next hop), and therefore the network in Fig. 1 reaches steady state.

Sending an arbitrarily large number of packets (n → ∞) makes the problem of estimating lim_{n→∞} E[τ_n] (if the network did not reach steady state this limit would diverge) the same as calculating the expected time taken for all the remaining innovative packets at nodes N(2), ..., N(ℓ) to reach the destination N(ℓ+1) at steady state. This is exactly the expected end-to-end delay for a single customer in a line network that has reached equilibrium. This quantity has been calculated in [13] (page 67, Theorem 4.10) and is equal to

lim_{n→∞} E[τ_n] = Σ_{i=2}^{ℓ} p_1 / (p_1 - p_i).   (7)

Combining equations (6) and (7) concludes the proof of Theorem 1, after renaming p_1 to p_m := max_{1≤i≤ℓ} p_i < 1.

B. Proof of concentration
Here we present a martingale concentration argument. In particular, we prove a slightly stronger version of Theorem 2:
Theorem 3 (Extended version of Theorem 2):
The time T_n for n packets to travel across the line network is concentrated around its expected value with high probability. In particular, for sufficiently large n,

P[|T_n - E[T_n]| > ε_n] ≤ (1 - max_{1≤i≤ℓ} p_i) / n + 2 (1 - max_{1≤i≤ℓ} p_i) n^δ / (n - n^δ),

for deviations ε_n = n^{1/2+δ} / (1 - max_{1≤i≤ℓ} p_i), δ ∈ (0, 1/2].

Proof:
The main idea of the proof is to use the method of martingale bounded differences [14]. This method works as follows: first we show that the random variable whose concentration we want to establish is a function of a finite set of independent random variables. Then we show that this function is Lipschitz with respect to these random variables, i.e. its value cannot change too much if only one of these variables is modified. Using this function we construct the corresponding Doob martingale and use the Azuma-Hoeffding inequality [14] to establish concentration. See also [15], [16] for related concentration results using similar martingale techniques.

Unfortunately, this method does not seem to be directly applicable to T_n, because T_n cannot be naturally expressed as a function of a bounded number of independent random variables. We therefore first show concentration for another quantity and then link that concentration to the concentration of T_n.

Specifically, we define R_t to be the number of innovative (linearly independent) packets received at the destination node N(ℓ+1) after t time steps. R_t is linked with T_n through the equation

T_n = min{t : R_t = n}.   (8)

The number of received packets is a well-defined function of the link states at each time step. If there are ℓ links in total, then

R_t = g(z_11, ..., z_1ℓ, ..., z_t1, ..., z_tℓ),

where z_ij, 1 ≤ i ≤ t and 1 ≤ j ≤ ℓ, is equal to 0 or 1 depending on whether link j is OFF or ON at time i. If a packet is sent on a link that is ON, it is received successfully; if sent on a link that is OFF, it is erased. This function satisfies a bounded Lipschitz condition with a bound equal to 1:

|g(z_11, ..., z_ij, ..., z_tℓ) - g(z_11, ..., z'_ij, ..., z_tℓ)| ≤ 1.
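The Lipschitz condition is easy to check empirically. In the sketch below (our own illustration; the implementation of g via the innovative-packet rank dynamics of Section II is ours), a single link state z_ij is flipped and the final received rank is recomputed:

```python
import random

def g(states, n):
    """Received rank R_t as a function of the link states z_ij:
    states[i][j] is True iff link j is ON at time step i."""
    l = len(states[0])
    rank = [n] + [0] * l
    for slot in states:
        snap = rank[:]                     # links act on the slot-start state
        for j, on in enumerate(slot):
            if on and snap[j] > snap[j + 1]:
                rank[j + 1] += 1           # innovative packet crosses link j
    return rank[l]

# empirically check the bounded-differences property |g(z) - g(z')| <= 1
rng = random.Random(0)
t, l, n = 14, 3, 10
z = [[rng.random() > 0.3 for _ in range(l)] for _ in range(t)]
for _ in range(200):
    i, j = rng.randrange(t), rng.randrange(l)
    zp = [row[:] for row in z]
    zp[i][j] = not zp[i][j]                # flip one link state z_ij
    assert abs(g(z, n) - g(zp, n)) <= 1
```

Every random single-bit flip changes the received rank by at most one, as the Lipschitz bound above requires.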
This is because, if we look at the history of all the links failing or succeeding over all t time slots, changing one of these link states in one time slot can influence the received rank by at most one. Using the Azuma-Hoeffding inequality (Theorem 4 in the Appendix) on the Doob martingale constructed from R_t = g(z_11, ..., z_tℓ), we get the following concentration result:

Proposition 2:
The number of received packets R_t is concentrated around its mean value:

P(|R_t - E[R_t]| ≥ ε_t) ≤ 1/t,   where ε_t := √(tℓ ln(2t)).   (9)

Proof:
Given in the appendix.

Using this concentration and the relation (8) between T_n and R_t, we can show that deviations of the order ε_t := √(tℓ ln(2t)) for R_t translate to deviations of the order of ε_n = n^{1/2+δ} / (1 - max_{1≤i≤ℓ} p_i) for T_n. In Theorem 3, smaller values of δ give tighter bounds that hold only for larger n.

Define the events H_t = {|R_t - E[R_t]| < ε_t} and H̄_t = {|R_t - E[R_t]| ≥ ε_t}, and further define t_un (u stands for upper bound) to be some t, ideally the smallest t, such that E[R_t] - ε_t ≥ n, and t_ln (l stands for lower bound) to be some t, ideally the largest t, such that E[R_t] + ε_t ≤ n. Then we have

P(T_n ≥ t_un) = P(T_n ≥ t_un | H_{t_un}) P(H_{t_un}) + P(T_n ≥ t_un | H̄_{t_un}) P(H̄_{t_un}),

where:
• P(T_n ≥ t_un | H_{t_un}) = 0, since at time t = t_un the destination has already received more than n innovative packets. Indeed, given that H_{t_un} holds, n ≤ E[R_{t_un}] - ε_{t_un} < R_{t_un}, where the first inequality is due to the definition of t_un.
• P(H_{t_un}) ≤ 1.
• P(T_n ≥ t_un | H̄_{t_un}) ≤ 1.
• P(H̄_{t_un}) ≤ 1/t_un, due to equation (9).

Therefore

P(T_n ≥ t_un) ≤ 1/t_un.   (10)

Similarly,

P(T_n ≤ t_ln) = P(T_n ≤ t_ln | H_{t_ln}) P(H_{t_ln}) + P(T_n ≤ t_ln | H̄_{t_ln}) P(H̄_{t_ln}),

where:
• P(T_n ≤ t_ln | H_{t_ln}) = 0, since at time t = t_ln the destination has received fewer than n innovative packets. Indeed, given that H_{t_ln} holds, R_{t_ln} < E[R_{t_ln}] + ε_{t_ln} ≤ n, where the last inequality is due to the definition of t_ln.
• P(H_{t_ln}) ≤ 1.
• P(T_n ≤ t_ln | H̄_{t_ln}) ≤ 1.
• P(H̄_{t_ln}) ≤ 1/t_ln, due to equation (9).

Therefore

P(T_n ≤ t_ln) ≤ 1/t_ln.   (11)

Equations (10) and (11) show that the random variable T_n representing the time required for n packets to travel across a line network is concentrated between t_ln and t_un, which are both functions of n.
In the case of a line network, E[R_t] = A·t - r(t), where A = 1 - max_{1≤i≤ℓ} p_i is a constant equal to the capacity of the line network and r(t) is a bounded function representing the expected number of innovative packets that have crossed the first link (once again, the worst link in the network has been positioned as the first link) by time t without having reached the destination. Since r(t) is bounded, a legitimate choice for large enough n for t_ln and t_un is the following (see Lemma 1 in the Appendix):

t_un = (n + n^{1/2+δ′}) / A,   δ′ ∈ (0, 1/2],   (12)
t_ln = (n - n^{1/2+δ′}) / A,   δ′ ∈ (0, 1/2].   (13)

From both (10) and (11),

P(t_ln ≤ T_n ≤ t_un) = 1 - P(T_n ≤ t_ln) - P(T_n ≥ t_un) ≥ 1 - 1/t_ln - 1/t_un,   (14)

and by substituting in (14) the t_un, t_ln from equations (12) and (13) we get

P(-n^{1/2+δ′}/A ≤ T_n - n/A ≤ n^{1/2+δ′}/A) ≥ 1 - A/(n - n^{1/2+δ′}) - A/(n + n^{1/2+δ′}),

and since E[T_n] = n/A + O(1) we have

P(|T_n - E[T_n]| ≤ n^{1/2+δ}/A) ≥ 1 - A/n - 2A n^δ/(n - n^δ),

or

P(|T_n - E[T_n]| > n^{1/2+δ}/A) ≤ A/n + 2A n^δ/(n - n^δ),

where δ > δ′; a simple substitution of A with 1 - max_{1≤i≤ℓ} p_i concludes the proof.

Fig. 2. The probability mass function of T_n of a two-hop network with n = 50, p_1 = 0. , p_2 = 0.
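The concentration asserted above can be observed directly in simulation. The sketch below is our own illustration (not the authors' code): it reuses the innovative-packet rank dynamics of Section II and checks that essentially all samples of T_n fall within a deviation of order n^{3/4}/A of the sample mean:

```python
import random
import statistics

def simulate_Tn(n, ps, rng):
    """Transfer time via the innovative-packet rank dynamics of Sec. II."""
    l = len(ps)
    rank = [n] + [0] * l
    t = 0
    while rank[l] < n:
        t += 1
        snap = rank[:]
        for i, p in enumerate(ps):
            if rng.random() > p and snap[i] > snap[i + 1]:
                rank[i + 1] += 1
    return t

rng = random.Random(1)
n, ps = 200, [0.5, 0.2]
A = 1 - max(ps)                        # capacity of the line network
eps = n ** 0.75 / A                    # deviations of order n^{3/4} / A
samples = [simulate_Tn(n, ps, rng) for _ in range(300)]
mean = statistics.fmean(samples)
# fraction of samples deviating from the mean by more than eps
outliers = sum(abs(s - mean) > eps for s in samples) / len(samples)
```

Empirically the outlier fraction is (near) zero and the sample mean sits close to n/A, consistent with Theorems 1 and 2.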
ISCUSSION AND C ONCLUSIONS
In this paper we analyzed the delay function and characterized its asymptotic behavior for an arbitraryset of erasure probabilities p , p , . . . , p ℓ that has a single worst link. The validity of our analysis isexperimentally shown in Fig. 4 and 5. In particular, Fig. 4 shows the probability mass function (pmf) —computed via simulation — of T n tightly concentrated around its expected value for a somewhat smallvalue of n = 50 . Fig. 5 shows the delay function D ( n, p , p ) rapidly approaching the computed bound ¯ D ( p , p ) as n grows (for p = 0 . , p = 0 . ).One limitation of our technique is the assumption of a single worst link. It is critical in our analysisbecause after bringing the worst link in the first position, it is equivalent to guaranteeing that all the otherqueues are bounded in expectation. If there is more than one bottleneck link the delay function can beunbounded [3] and the general behavior remains a topic for future work. Further understanding the delayfunction for more general networks is a challenging problem that might be relevant for delay critical Fig. 3. The delay function D ( n, p , p ) for a two-hop network with p = 0 . , p = 0 . applications. A CKNOWLEDGMENT
This material is partly funded by subcontract
APPENDIX
Definition 1:
A binary relation ⪯ defined on a set P is called a preorder if it is reflexive and transitive, i.e. for all a, b, c ∈ P:

a ⪯ a (reflexivity),   (15)
(a ⪯ b) ∧ (b ⪯ c) ⇒ a ⪯ c (transitivity).   (16)

Definition 2:
On the set N^{ℓ-1} of all integer (ℓ-1)-tuples we define the regular (componentwise) preorder ⪯: for a, b ∈ N^{ℓ-1}, a ⪯ b iff a_1 ≤ b_1, ..., a_{ℓ-1} ≤ b_{ℓ-1}, where a = (a_1, ..., a_{ℓ-1}) and b = (b_1, ..., b_{ℓ-1}). Similarly we can define the preorder ⪰.

Definition 3:
A random vector X ∈ N^{ℓ-1} is said to be stochastically smaller in the usual stochastic order than a random vector Y ∈ N^{ℓ-1} (denoted by X ⪯_st Y) if for all ω ∈ N^{ℓ-1}, P(X ⪰ ω) ≤ P(Y ⪰ ω).

Definition 4:
A family of random variables {Y_n}_{n∈N} is called stochastically increasing (⪯_st-increasing) if Y_k ⪯_st Y_n whenever k ≤ n.

Proof: [Proof of Proposition 1] The Markov process {Y_n, n ≥ 1} is a multidimensional process on E ⊂ N^{ℓ-1} representing the number of innovative packets at nodes N(2), ..., N(ℓ) when packet n arrives at N(2). To prove that the Markov process {Y_n, n ≥ 1} is stochastically increasing we introduce two other processes {X_n, n ≥ 1} and {Z_n, n ≥ 1} having the same state space and transition probabilities as {Y_n, n ≥ 1}.

More precisely, the Markov process {Y_n, n ≥ 1} is effectively observing the evolution of the number of innovative packets present at every node of the tandem queue. We define the two new processes {X_n, n ≥ 1} and {Z_n, n ≥ 1} to observe the evolution of two other tandem queues having the same link failure probabilities as the queue of {Y_n, n ≥ 1}.

Fig. 4. Multi-hop network with the corresponding Markov chains
As seen in Fig. 4, at each time step and at every link, the queues for {X_n, n ≥ 1} and {Z_n, n ≥ 1} either both succeed or both fail together. Moreover, the successes or failures on each link of the queues observed by {X_n, n ≥ 1} and {Z_n, n ≥ 1} are independent of the successes or failures on the queue observed by {Y_n, n ≥ 1}. Formally, the joint process {(X_n, Z_n), n ≥ 1} constitutes a coupling, meaning that marginally each of {X_n, n ≥ 1} and {Z_n, n ≥ 1} has the transition matrix P_Y of {Y_n, n ≥ 1}.

If the Markov processes {X_n, n ≥ 1} and {Z_n, n ≥ 1} have different initial conditions, then the following relation holds:

X_1 ⪯ Z_1 ⇒ X_n ⪯ Z_n.   (17)

The proof of this statement is very similar to the proof of Proposition 2 in [10]. Essentially, relation (17) states that since at both queues all links succeed or fail together, the queue that holds more packets at each node initially (n = 1) will also hold more packets subsequently (n > 1) at every node.

The initial state Y_1 of the Markov process {Y_n, n ≥ 1} is the state α = (1, 0, ..., 0), also called the minimal state since any other state is greater than the minimal state. To prove Proposition 1 we let both processes {Y_n, n ≥ 1} and {X_n, n ≥ 1} start from the minimal state (Y_1 =_D δ_α, X_1 =_D δ_α, where =_D means equality in distribution), whereas the process {Z_n, n ≥ 1} has initial distribution µ, the distribution of the process {Y_n, n ≥ 1} after (n - k) steps (µ = P_Y^{n-k} δ_α and Z_1 =_D µ). Then for every ω in the state space of {Y_n, n ≥ 1} we get

P(X_n ⪰ ω) = P(Y_n ⪰ ω) = P(Z_k ⪰ ω),   (18)

where the first equality holds since the two processes have the same distribution (both start from the minimal element and have the same transition matrices) and the second equality holds since Z_k =_D P_Y^k µ = P_Y^k (P_Y^{n-k} δ_α) = P_Y^n δ_α =_D Y_n.

Moreover, by the definition of the minimal element, X_1 ⪯ Z_1, and using (17) we get X_n ⪯ Z_n. Therefore

P(Z_k ⪰ ω) ≥ P(X_k ⪰ ω) = P(Y_k ⪰ ω),   (19)

where the last equality follows from the fact that X_k and Y_k have the same law. Equations (18) and (19) conclude the proof.

Definition 5:
A sequence of random variables V_0, V_1, ... is said to be a martingale with respect to another sequence U_0, U_1, ... if, for all n ≥ 0, the following conditions hold:
• E[|V_n|] < ∞
• E[V_{n+1} | U_0, ..., U_n] = V_n
A sequence of random variables V_0, V_1, ... is called a martingale when it is a martingale with respect to itself, that is:
• E[|V_n|] < ∞
• E[V_{n+1} | V_0, ..., V_n] = V_n

Theorem 4 (Azuma-Hoeffding inequality): Let X_0, X_1, ..., X_n be a martingale such that

B_k ≤ X_k - X_{k-1} ≤ B_k + d_k

for some constants d_k and for some random variables B_k that may be functions of X_0, ..., X_{k-1}. Then for all t ≥ 0 and any λ > 0,

P(|X_t - X_0| ≥ λ) ≤ 2 exp(-2λ² / Σ_{k=1}^{t} d_k²).

Proof:
Theorem 12.6 in [14]
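As a quick numerical sanity check (ours, not part of the paper), a symmetric ±1 random walk is a martingale with B_k = -1 and d_k = 2, so Theorem 4 gives P(|X_t - X_0| ≥ λ) ≤ 2 exp(-λ²/(2t)); an empirical tail estimate indeed stays below this bound:

```python
import math
import random

rng = random.Random(0)
t, lam, trials = 100, 30, 2000

# empirical tail probability of a +/-1 symmetric random walk after t steps
hits = 0
for _ in range(trials):
    s = sum(rng.choice((-1, 1)) for _ in range(t))
    if abs(s) >= lam:
        hits += 1
empirical = hits / trials

# Azuma-Hoeffding bound with d_k = 2: 2 exp(-2 lam^2 / (4 t))
azuma = 2 * math.exp(-lam ** 2 / (2 * t))
assert empirical <= azuma
```

For these parameters the bound evaluates to 2e^{-4.5} ≈ 0.022, comfortably above the empirical tail frequency.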
Proof: [Proof of Proposition 2] The proof is based on the fact that from a sequence of random variables U_1, U_2, ..., U_n and any function f it is possible to define a new sequence V_0, V_1, ..., V_n,

V_0 = E[f(U_1, ..., U_n)],
V_i = E[f(U_1, ..., U_n) | U_1, ..., U_i],

that is a martingale (a Doob martingale). Using the identity E[V | W] = E[E[V | U, W] | W] it is easy to verify that the above sequence V_0, ..., V_n is indeed a martingale. Moreover, if the function f is c-Lipschitz and U_1, ..., U_n are independent, it can be proved that the differences V_i - V_{i-1} are restricted within bounded intervals [14] (pages 305-306).

The function R_t = g(z_11, ..., z_tℓ) has a bounded expectation, is 1-Lipschitz, and the random variables z_ij are independent; therefore all the requirements of the above analysis hold. Specifically, by setting

G_h = E[g(z_11, ..., z_tℓ) | z_11, ..., z_kr]   (h terms in the conditioning)

we can apply the Azuma-Hoeffding inequality to the martingale G_0, ..., G_tℓ and we get the following concentration result:

P[|G_tℓ - G_0| ≥ λ] = P[|R_t - E[R_t]| ≥ λ] ≤ 2 exp(-2λ² / (tℓ)).   (20)

The equality above holds since
• G_0 = E[R_t]
• G_tℓ = R_t (the random variable itself)
and by substituting λ in (20) with ε_t := √(tℓ ln(2t)) we get

P[|R_t - E[R_t]| ≥ ε_t] ≤ 2 exp(-2 ln(2t)) = 1/(2t²) ≤ 1/t.

Lemma 1:
When the expected number of innovative packets E[R_t] received at the destination by time t is given by E[R_t] = A·t - r(t), where A is a constant and r(t) is a bounded function, then one legitimate choice for t_un and t_ln is

t_un = (n + n^{1/2+δ′}) / A,   δ′ ∈ (0, 1/2],
t_ln = (n - n^{1/2+δ′}) / A,   δ′ ∈ (0, 1/2].

Proof:
The only requirement for t_un is that it is a t such that E[R_t] - ε_t ≥ n. This is indeed true for large enough n if we substitute t_un = (n + n^{1/2+δ′})/A:

E[R_{t_un}] - ε_{t_un} ≥ n
⇔ A t_un - r(t_un) - ε_{t_un} ≥ n
⇔ A t_un - r(t_un) - √(ℓ t_un ln(2 t_un)) ≥ n
⇔ n + n^{1/2+δ′} - r(t_un) - √(ℓ (n + n^{1/2+δ′})/A · ln(2(n + n^{1/2+δ′})/A)) ≥ n
⇐ n^{1/2+δ′} ≥ √(ℓ (n + n^{1/2+δ′})/A · ln(2(n + n^{1/2+δ′})/A)) + B
⇔ n^{δ′} ≥ √(ℓ (1 + n^{δ′-1/2})/A · ln(2(n + n^{1/2+δ′})/A)) + B/√n,

where B is an upper bound on the function r(t); the last inequality holds for large enough n, since the left-hand side grows polynomially in n while the right-hand side grows only logarithmically. Similarly, t_ln must be a t such that E[R_t] + ε_t ≤ n. This is indeed true for large enough n if we substitute t_ln = (n - n^{1/2+δ′})/A:

E[R_{t_ln}] + ε_{t_ln} ≤ n
⇐ A t_ln + ε_{t_ln} ≤ n   (since r(t) ≥ 0)
⇔ n - n^{1/2+δ′} + √(ℓ (n - n^{1/2+δ′})/A · ln(2(n - n^{1/2+δ′})/A)) ≤ n
⇔ √(ℓ (n - n^{1/2+δ′})/A · ln(2(n - n^{1/2+δ′})/A)) ≤ n^{1/2+δ′}
⇔ √(ℓ (1 - n^{δ′-1/2})/A · ln(2(n - n^{1/2+δ′})/A)) ≤ n^{δ′},

which again holds for large enough n.

REFERENCES

[1] P. A. Chou, Y. Wu, and K. Jain, "Practical network coding," in
Proc. 41st Annual Allerton Conference on Communication, Control, and Computing, 2003. [Online]. Available: http://research.microsoft.com/~pachou/pubs/ChouWJ03.pdf
[2] D. S. Lun, M. Médard, and M. Effros, "On coding for reliable communication over packet networks," in Proc. 42nd Annual Allerton Conference on Communication, Control, and Computing, invited paper, September-October 2004.
[3] P. Pakzad, C. Fragouli, and A. Shokrollahi, "Coding schemes for line networks," in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Sep. 2005, pp. 1853-1857.
[4] A. F. Dana, R. Gowaikar, R. Palanki, B. Hassibi, and M. Effros, "Capacity of wireless erasure networks," IEEE Transactions on Information Theory, vol. 52, pp. 789-804, 2006.
[5] I. Rubin, "Communication networks: Message path delays," IEEE Trans. Inf. Theory, vol. 20, no. 6, pp. 738-745, Nov. 1974.
[6] M. Shalmon, "Exact delay analysis of packet-switching concentrating networks," IEEE Trans. Commun., vol. 35, no. 12, pp. 1265-1271, Dec. 1987.
[7] B. Shrader and A. Ephremides, "On the queueing delay of a multicast erasure channel," in Proceedings of the IEEE Information Theory Workshop, 2006.
[8] J. Sundararajan, D. Shah, and M. Médard, "On queueing in coded networks: queue size follows degrees of freedom," in Proc. IEEE Information Theory Workshop on Information Theory for Wireless Networks, pp. 1-6, July 2007.
[9] T. Lindvall, Lectures on the Coupling Method. Courier Dover Publications, 2002.
[10] H. Castel-Taleb, L. Mokdad, and N. Pekergin, "Aggregated bounding Markov processes applied to the analysis of tandem queues," in ValueTools '07: Proceedings of the 2nd International Conference on Performance Evaluation Methodologies and Tools. ICST, 2007, pp. 1-10.
[11] R. R. Weber, "The interchangeability of tandem queues with heterogeneous customers and dependent service times," Adv. Appl. Probability, vol. 24, no. 3, pp. 727-737, Sep. 1992.
[12] J. Hsu and P. Burke, "Behavior of tandem buffers with geometric input and Markovian output," IEEE Trans. Commun., vol. 24, no. 3, pp. 358-361, Mar. 1976.
[13] H. Daduna, Queueing Networks with Discrete Time Scale. New York: Springer-Verlag, 2001.
[14] M. Mitzenmacher and E. Upfal, Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press, 2005.