Characterization of Random Linear Network Coding with Application to Broadcast Optimization in Intermittently Connected Networks
Gabriel Popa
Computer Engineering and Networks Laboratory, ETH Zurich, Gloriastr. 35, 8092 Zurich, Switzerland
Email: [email protected]
Abstract: We address the problem of optimizing the throughput of network coded traffic in mobile networks operating in challenging environments where connectivity is intermittent and locally available memory space is limited. Random linear network coding (RLNC) is shown to be equivalent (across all possible initial conditions) to a random message selection strategy where nodes are able to exchange buffer occupancy information during contacts. This result creates the premises for a tractable analysis of RLNC packet spread, which is in turn used for enhancing its throughput under broadcast. By exploiting the similarity between channel coding and RLNC in intermittently connected networks, we show that, quite surprisingly, network coding, when not used properly, still significantly underutilizes network resources. We propose an enhanced forwarding protocol that considerably increases the throughput for practical cases, with negligible additional delay.
I. INTRODUCTION
The paper focuses on improving throughput in intermittently connected networks while maintaining low delivery delays. Intermittently connected networks (or DTNs, disruption tolerant networks) are networks of highly mobile, power- and memory-constrained devices where connectivity is sporadic. This is the model of choice for wireless networks operating in challenging conditions (networks of UAVs, disaster relief scenarios, etc.). As traditional routing approaches cannot be applied in this case (little is known in advance about future connectivity), the literature has developed opportunistic (epidemic) forwarding protocols that replicate packets to multiple relay nodes in order to optimize the delivery delay and/or the chance that packets get delivered to the destination(s) [1]. Increasing throughput while keeping delay low is a problem of practical interest, as it would enable nodes to receive more information per time unit with almost the same delay. RLNC has emerged recently as a promising approach for such applications. It improves transmission by introducing the diversity of multiple independent combinations into the epidemic forwarding. Nevertheless, analyzing and optimizing RLNC for DTNs is difficult [2]. We prove that RLNC in DTNs is in fact equivalent to a forwarding algorithm not employing network coding, which is much easier to analyze. Using this equivalence, we show that RLNC still produces too many redundant packets during contacts, thereby underutilizing network resources (inter-contact times and consequently buffer space). Our study shows that the transmissions of a backlogged source can be conveniently pipelined, even though no feedback from destinations is available, such that significant throughput gains can be attained with negligible additional delay. Based on these observations, we design and evaluate a forwarding protocol with reduced buffer and energy requirements (less mobility is required for collecting the same number of packets).
Related work: In their seminal paper, Deb et al. [3] offer an in-depth analysis of random linear network coding (RLNC) for networks with intermittent contacts. Numerous studies have built upon these results, extending them to the case of DTNs [4], [5], [6]; RLNC is used to improve the average delivery ratio within a given time unit. Our work is motivated by the observation that these studies extend the conclusions of [3] past the assumptions under which those results were obtained, such that they no longer hold. In particular, all protocols for optimizing throughput or delays in DTNs break the initial condition assumption (messages do not have equal initial spread), which leads to significant throughput loss. Both Lin et al. [6] and Altman et al. [7] (who study network coding and Reed-Solomon codes in two-hop DTNs) note the similarity between channel coding and data transmission in DTNs. We study the implications of this analogy for the aforementioned initial condition assumption.

II. NETWORK MODELS
The network model is similar to the one used in [2]. The network consists of N mobile nodes with the same radio range and buffer space B. We consider that a wireless link (contact) is established between two nodes when they are in each other's radio range. All contacts are bidirectional. Their duration is considered negligible with respect to the inter-contact times, but sufficient to allow the transmission of one packet in each direction. We consider mainly the case of a backlogged source broadcasting data to the entire network and then extend the conclusions to multi- and unicast. The source aims at maximizing the average throughput at destinations. We consider a mobility model with exponential inter-contact times of parameter λ, which has been validated for a wide spectrum of mobility scenarios [8]. Our analysis is however not constrained to this type of mobility.
The backlogged source is considered to have at least ν ∈ ℕ packets in its buffer. These packets are hereafter called variables and represent original (not coded) source-generated packets. Out of them, the source selects every time the set of ν oldest packets to be transmitted in the network. When the transmission of the ν packets is considered completed (after a fixed time, TTL, set as a function of N), they are deleted by all nodes, the source selects the next ν packets and repeats the operation. Minimizing the time to delivery and the probability that not all nodes have decoded the content is desirable.
All nodes implement random linear network coding over a finite field $GF(2^k)$. The source is assumed to send coded packets to the nodes that it encounters (one such packet per contact). Coded packets are elements of the set of ν independent linear combinations of ν variables (a set called a packet batch), where coefficients are randomly selected from $GF(2^k)$.
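The coding operations of this model can be sketched in a few lines. The sketch below is only an illustration under simplifying assumptions that are not in the paper: the field is GF(2) (k = 1), so a coded packet is represented solely by its coefficient vector, stored as an integer bitmask, and payloads are omitted. The function names (`rank_gf2`, `source_batch`, `recode`, `increases_rank`) are hypothetical.

```python
import random

def rank_gf2(vectors):
    """Rank over GF(2) of coefficient vectors stored as integer bitmasks."""
    basis = []  # reduced vectors, each with a distinct leading bit
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)  # clears b's leading bit in v when it is set
        if v:
            basis.append(v)
    return len(basis)

def source_batch(nu, rng):
    """Draw nu random nonzero coefficient vectors until they are independent."""
    while True:
        batch = [rng.randrange(1, 1 << nu) for _ in range(nu)]
        if rank_gf2(batch) == nu:
            return batch

def recode(buffer, rng):
    """Relay step: XOR together a random subset of the buffered packets."""
    out = 0
    for v in buffer:
        if rng.getrandbits(1):  # GF(2) coefficient drawn uniformly from {0, 1}
            out ^= v
    return out

def increases_rank(buffer, packet):
    """True if `packet` adds a new dimension to the span of `buffer`."""
    return rank_gf2(buffer + [packet]) > rank_gf2(buffer)

rng = random.Random(7)
batch = source_batch(4, rng)   # packet batch for nu = 4 variables
relay = batch[:2]              # a relay that has received two coded packets
coded = recode(batch, rng)     # combination sent by a node holding the full batch
print(rank_gf2(batch), increases_rank(relay, coded))
```

A node can decode once the rank of its buffer reaches ν. With a larger field the same structure applies, but the XOR and the rank computation would have to be replaced by proper $GF(2^k)$ arithmetic.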
Note that ν ≤ B and the buffer occupancy is described by the number of independent linear combinations present in a node's buffer. Packets have size K (a multiple of k) and are treated as vectors of values from $GF(2^k)$. During a contact, a node scales each vector (coded packet) in its buffer by a randomly selected element of $GF(2^k)$ and adds the results, thereby creating a new network coded packet, which is sent to the other node. A node is able to decode all variables only when it has received ν independent linear combinations. We say that a coded packet received by a node is innovative if it increases the rank of the equation system formed by the coded packets in that node's buffer. A contact is efficient iff at least one innovative packet is transferred. We analyze two protocols: one in which relay nodes send random linear combinations of the coded packets stored in their buffers during contacts (as described above), and another in which nodes compare their buffers and only forward to each other (coded) packets selected uniformly at random among those not held by the other. The two protocols are denoted by Γ (true RLNC) and ∆ (a type of random message selection), respectively. RLNC schemes transport, along with the packets, the random coefficients as well as the identities of the original variables combined in the coded packets, thereby providing a distributed solution [9], [10]. It can be proven that the overhead of storing and transporting these random coefficients is small. Note that ∆ can also be used with variables as packets (instead of coded packets), as relays do not perform network coding, thus eliminating coefficient overhead. Γ and ∆ are similar to E-NCP and E-RP [6], respectively.

III. MAIN RESULTS
A. Random Message Selection with Feedback vs. RLNC
The following result shows that the operation of random message selection with buffer feedback during contacts (∆) is almost identical to true RLNC (Γ). Thus, results for ∆ apply to Γ and vice versa. The equivalence uncovered by this theorem can be used for designing optimal distributed network coding protocols for intermittently connected networks, initially under the more tractable ∆ and then applied to Γ. ∆ relies on nodes exchanging information about the list of packets in their buffers during contacts. Should this capability not be available, Γ-type RLNC offers the distributed counterpart.

Theorem 3.1:
Given identical mobility and initial conditions (the set of packets already disseminated by the source in the network and which are prepared to start the epidemic network spread), an arbitrarily selected contact between two nodes A and B at time t will have approximately the same probability that A delivers a novel packet to B under both Γ and ∆.

Proof:
We use the following notation: for a node w, $S_w^-$ and $S_w^+$ designate the subspace spanned by the coded packets belonging to this node's buffer, before and after a contact with another node v, respectively.

Footnote: Using counting Bloom filters [5] also implies that Γ ≅ ∆ ≅ global rarest.

It is thus easy to infer [3] that:

$$\Pr[S_w^+ \not\subseteq S_u^- \mid S_w^- \subseteq S_u^-,\ S_v^- \not\subseteq S_u^-] \ge 1 - \frac{1}{q} \quad (1)$$
$$\Pr[\dim(S_w^+) > \dim(S_w^-) \mid S_v^- \not\subseteq S_w^-] \ge 1 - \frac{1}{q} \quad (2)$$

where $q = |GF(2^k)|$ and u can be any other node. The two probabilities describe the way a node w acquires new degrees of freedom in its buffer under RLNC with Γ-type forwarding. Each of these probabilities is equal to 1 for ∆-type forwarding, so eq. (1) and (2) continue to be true even for ∆. If we consider the case of a very large q (q → +∞), then Γ and ∆ become identical. For known mobility and known initial packet distribution, we can construct a discrete-time Markov chain (DTMC) to capture the packet propagation. A state contains an array of size N, and the element of this array at index i stores the list of degrees of freedom acquired by node i until that time step. There are N degrees of freedom under both Γ and ∆. Consider a contact between nodes A and B, where we analyze only the transmission from A to B. From eq. (1), (2), this transmission is successful iff node A's buffer has one degree of freedom not available to B. In both Γ and ∆ this degree of freedom is selected uniformly at random from those available to A and not available to B. Thus, the transition probabilities are the same for both DTMCs and the two protocols behave identically.
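For a small field, the gap between Γ and ∆ in a single contact can be computed exactly. The sketch below is an illustration only and assumes q = 2 (GF(2), coefficient vectors as integer bitmasks), which is the worst case for a bound of the form 1 − 1/q. It enumerates every coefficient choice a Γ-type sender could draw and counts how many of the resulting packets add a dimension at the receiver. The helper names are hypothetical.

```python
from itertools import product

def rank_gf2(vectors):
    """Rank over GF(2) of coefficient vectors stored as integer bitmasks."""
    basis = []  # reduced vectors, each with a distinct leading bit
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)  # clears b's leading bit in v when it is set
        if v:
            basis.append(v)
    return len(basis)

def innovative_fraction(sender, receiver):
    """Exact share of GF(2) combinations of `sender` outside span(receiver)."""
    base = rank_gf2(receiver)
    outcomes = []
    for coeffs in product((0, 1), repeat=len(sender)):
        packet = 0
        for c, v in zip(coeffs, sender):
            if c:
                packet ^= v
        outcomes.append(rank_gf2(receiver + [packet]) > base)
    return sum(outcomes) / len(outcomes)

sender = [0b001, 0b010, 0b100]    # sender holds three independent packets
receiver = [0b001, 0b010]         # receiver lacks exactly one dimension
print(innovative_fraction(sender, receiver))
```

In this example half of the eight possible combinations carry the missing dimension, so the Γ-type transmission helps with probability 0.5 = 1 − 1/q, whereas a ∆-type sender would pick the one missing packet with probability 1. For larger q the enumeration would need true $GF(2^k)$ arithmetic, and the gap shrinks.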
In a more realistic setting where q is finite, RLNC with Γ will in fact slightly underperform ∆, because the probabilities in eq. (1), (2) will be 1 for ∆ and $\ge 1 - \frac{1}{q}$ for Γ.
To prove rigorously that the uniform selection of degrees of freedom (dimensions) leads to similar behavior of Γ and ∆, we invoke the following elementary theorem, known from linear algebra and presented here without proof: Every n-dimensional vector space V over some finite field F is isomorphic to $F^n$. If $v_1, v_2, v_3, \ldots, v_n$ is a basis of V, then the mapping $\varphi : F^n \to V$, $(a_1, \ldots, a_n) \mapsto \sum_{k=1}^{n} a_k v_k$ is an isomorphism.

Observation:
Since the choice of basis for V is not unique (there are many possibilities), the above isomorphism is also not unique. In fact, we can construct many such isomorphisms.

Final steps:
We need this isomorphism simply because tracking the evolution of vector spaces (that is, node buffers) during the packet spread process is very challenging. Such isomorphisms offer an easy way to label buffers in a consistent manner. In particular, we are interested in mapping each buffer to a subset of the basis $\{p_1, \ldots, p_\nu\}$, where $p_1, \ldots, p_\nu$ are the initial packets at the source. We regard each buffer as a subspace/subset of the ν-dimensional vector space. Each such buffer/subspace is generated by the vectors/packets present in it. Note that the labelling is performed for every node and holds at every step of the packet spread. However, a final point needs to be discussed. We cannot simply map all k-dimensional subspaces to the same set of k vectors of the basis (actually, to the subspace that they generate), because then all buffers would look identical after applying the isomorphism. Based on the fact that every intersection of subspaces is also a subspace, we can build the mappings/labellings for each node in a way that prevents this problem. To this end, we specify a hard constraint requiring that the intersection of subspaces be respected even after applying the isomorphism. This can be translated as follows: the intersection of any number of subspaces (buffers) has to be a subspace of the same dimension in the original version and after applying the isomorphism. This is effectively the final step of our proof. The attentive reader will have already noticed that instead of working with coded packets, we have mapped our buffers to sets of original packets, thus effectively equating Γ to ∆.

B. Finding Optimal Spread Ratios
We analyze how the number of coded packet copies influences the instantaneous throughput of broadcast under ∆-type forwarding and extend the result to Γ forwarding, uni- and multicast. For our mobility model, if coded packets each have the same number of copies in the network at the beginning of the forwarding, then, for an arbitrary k, $\Pr[\text{node } \ell \text{ has a copy of coded packet } p_k] = \zeta_k = \text{ct.}$, $(\forall)\, \ell$. $m_{p_i}(t)$ is the number of copies of $p_i$ contained in the network at time t (not counting the source), and $\rho_{p_i}(t) = \frac{m_{p_i}(t)}{N-1}$ is the corresponding instantaneous density. We seek the relation between $\rho_{p_i}(t)$, $i = \overline{1,\nu}$, that maximizes the instantaneous throughput. For this, we analyze the efficiency of each node's first contact after an arbitrarily chosen instant t. For tractability, we first look at the case when each relay node contains exactly one coded packet at time t and generalize afterwards. In this case, $\sum_{i=1}^{\nu} m_{p_i}(t) = N-1$ and $\sum_{i=1}^{\nu} \rho_{p_i}(t) = 1$. We exclude w.l.o.g. the source, as its contacts are efficient by definition anyway. If $\mathcal{A}^{(t)}_{p_i}$ is the set of nodes (without the source) containing a copy of coded packet $p_i$ at time t, then $|\mathcal{A}^{(t)}_{p_i}| = m_{p_i}(t)$, $(\forall)\, i = \overline{1,\nu}$ $\Rightarrow$ $\bigcup_{i=1}^{\nu} \mathcal{A}^{(t)}_{p_i} = \mathcal{N} - \{s\}$, $\mathcal{A}^{(t)}_{p_i} \cap \mathcal{A}^{(t)}_{p_j} = \emptyset$, $(\forall)\, i \neq j$, where $\mathcal{N} - \{s\}$ is the set of all nodes without the source. For an arbitrary $\ell \in \mathcal{A}^{(t)}_{p_i}$, $\Pr[\text{next contact of } \ell \text{ is inefficient}] = \rho_{p_i}(t) - \frac{1}{N-1}$ (efficient with probability $1 - \rho_{p_i}(t) + \frac{1}{N-1}$), meaning that ℓ has met another node from the same set, no data transfer occurred and the waiting time preceding the contact was wasted.
We are interested in maximizing throughput (maximizing the expected number of efficient first contacts of each node after instant t). Therefore, under $\sum_{j=1}^{\nu} \rho_{p_j}(t) = 1$ we maximize

$$f(\rho_{p_1}(t), \ldots, \rho_{p_\nu}(t)) = \sum_{k=1}^{\nu} \sum_{\ell \in \mathcal{A}^{(t)}_{p_k}} (1 - \rho_{p_k}(t)) \quad (3)$$
$$= \sum_{k=1}^{\nu} m_{p_k} (1 - \rho_{p_k}) = (N-1) \cdot \sum_{k=1}^{\nu} \rho_{p_k} (1 - \rho_{p_k}) \quad (4)$$

Using Lagrange multipliers,

$$\Lambda(\rho_{p_1}(t), \ldots, \rho_{p_\nu}(t), \lambda_{sol}) = f(\rho_{p_1}(t), \ldots, \rho_{p_\nu}(t)) + \lambda_{sol} \cdot \Big( \sum_{k=1}^{\nu} \rho_{p_k}(t) - 1 \Big) \quad (5)$$
$$\Rightarrow \frac{\partial \Lambda}{\partial \rho_{p_k}} = (N-1) \cdot (1 - 2\rho_{p_k}) + \lambda_{sol} = 0 \quad (6)$$
$$\Rightarrow \lambda_{sol} = \frac{2(N-1)}{\nu} - (N-1) \quad (7)$$

Footnotes: The analysis considers bidirectional contacts. The approximation that node densities do not change significantly between two contacts is verified in Section III-C. The constant $\frac{1}{N-1}$ does not influence the result.

Replacing $\lambda_{sol}$ we find that $\rho_{p_1}(t) = \ldots = \rho_{p_\nu}(t) \triangleq \rho(t) = \frac{1}{\nu}$. Thus, all densities should be equal at instant t. The generalization for the case of more (or less) than one packet/node on average ($\sum_{j=1}^{\nu} \rho_{p_j}(t) = c$, $c \ge 0$) is provided by

Theorem 3.2:
The following condition is necessary for maximizing the throughput of a batch of ν coded packets in a DTN with ∆-type forwarding: $\rho_{p_1}(t) = \rho_{p_2}(t) = \rho_{p_3}(t) = \ldots = \rho_{p_\nu}(t) \triangleq \rho(t)$, $(\forall)\, t$. In other words, regardless of the buffer occupancy level, packet densities should be roughly equal to ensure maximal throughput.

Proof:
For each node ℓ, define the concept of entire buffer packet as the indicator function $\mathbb{1}_\ell : \{p_1, p_2, p_3, \ldots, p_\nu\} \to \{0, 1\}$, $\mathbb{1}_\ell(p_k) = 1 \Leftrightarrow$ the buffer of ℓ contains coded packet $p_k$. A contact between $\ell, \aleph \in \mathcal{N}$ is efficient $\Leftrightarrow \mathbb{1}_\ell \neq \mathbb{1}_\aleph$. Using the above argument for entire buffer packets, we check that $\sum_{\ell \in \mathcal{N}-\{s\}} \rho_{\mathbb{1}_\ell} = 1$ and therefore $\rho_{\mathbb{1}_\ell}(t) = \rho_{buf}(t) = \text{ct.}$, $(\forall)\, \ell \in \mathcal{N}-\{s\}$, is necessary for throughput maximization. From this set of equalities at time t, considering an arbitrarily chosen but fixed entire buffer packet $\mathbb{1}_\aleph$ $\Rightarrow$ $\Pr[\ell \in \mathcal{A}^{(t)}_{\mathbb{1}_\aleph}] = \Pr[\mathbb{1}_\ell(p_1) = \mathbb{1}_\aleph(p_1)] \cdot \Pr[\mathbb{1}_\ell(p_2) = \mathbb{1}_\aleph(p_2)] \cdot \Pr[\mathbb{1}_\ell(p_3) = \mathbb{1}_\aleph(p_3)] \cdot \ldots \cdot \Pr[\mathbb{1}_\ell(p_\nu) = \mathbb{1}_\aleph(p_\nu)] = \rho_{buf}(t) = \text{ct.}$, $(\forall)\, \ell \in \mathcal{N}-\{s\}$. But since none of the buffers is yet full ($\max_{\ell \in \mathcal{N}-\{s\}} |\mathbb{1}_\ell| = \kappa < N$, where $|\mathbb{1}_\ell| = \kappa_\ell$ is ℓ's buffer occupancy) $\Rightarrow$ $\Pr[\mathbb{1}_\ell(p_{i_1}) = 1] \cdot \Pr[\mathbb{1}_\ell(p_{i_2}) = 1] \cdot \Pr[\mathbb{1}_\ell(p_{i_3}) = 1] \cdot \ldots \cdot \Pr[\mathbb{1}_\ell(p_{i_{\kappa_\ell}}) = 1] = \Omega = \text{ct.}$, $i_1, i_2, i_3, \ldots, i_{\kappa_\ell} = \overline{1,\nu}$, $(\forall)\, \ell \in \mathcal{N}-\{s\}$. This is an equation system with N − 1 product equations and ν ≤ B ≤ N unknowns, where each probability is known to be strictly positive. By equalizing equations with the same number of factors, we obtain that with high probability $\Pr[\mathbb{1}_\ell(p_1) = 1] = \Pr[\mathbb{1}_\ell(p_2) = 1] = \Pr[\mathbb{1}_\ell(p_3) = 1] = \ldots = \Pr[\mathbb{1}_\ell(p_\nu) = 1] = \text{ct.}$, $(\forall)\, \ell \in \mathcal{N}-\{s\}$, which means that coded packets should have equal spreads.

Remark:
The same condition is also necessary for providing optimal throughput for unicast and multicast. This is because each node will deliver a packet to the target(s) with equal probability.
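The equal-density condition of Section III-B can be checked numerically for the one-packet-per-node case. The sketch below is not part of the original analysis: it evaluates the objective of eq. (4), $(N-1) \sum_k \rho_{p_k}(1 - \rho_{p_k})$, at the uniform point and at randomly sampled density vectors that sum to 1. The values N = 100 and ν = 10 and the sampling scheme (normalised uniform weights) are illustrative assumptions, and the function names are hypothetical.

```python
import random

def expected_efficient_contacts(rho, n_nodes):
    """Objective of eq. (4): (N - 1) * sum_k rho_k * (1 - rho_k)."""
    return (n_nodes - 1) * sum(r * (1.0 - r) for r in rho)

def random_densities(nu, rng):
    """Random density vector with sum 1 (normalised positive weights)."""
    weights = [rng.random() + 1e-12 for _ in range(nu)]
    total = sum(weights)
    return [w / total for w in weights]

N, NU = 100, 10                       # illustrative values, not from the paper
uniform = [1.0 / NU] * NU
best = expected_efficient_contacts(uniform, N)

rng = random.Random(0)
# Smallest advantage of the uniform point over 10,000 random alternatives.
worst_gap = min(
    best - expected_efficient_contacts(random_densities(NU, rng), N)
    for _ in range(10_000)
)
print(best, worst_gap >= 0)
```

Since $\sum_k \rho_{p_k}(1 - \rho_{p_k}) = 1 - \sum_k \rho_{p_k}^2$, the objective is largest when the squared densities are smallest, which under the sum constraint happens at $\rho_{p_k} = 1/\nu$; the random search merely illustrates this.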
C. Impact of Packet Counts on Contact Efficiency
In this paragraph we explain why the assumption of equal packet spread cannot be taken for granted and demonstrate its performance impact. We define the entropy of relative (normalized) coded packet densities at time instant t as $H(\rho'(t)) = \sum_{k=1}^{\nu} \rho'_{p_k}(t) \cdot \log_\nu \big( \frac{1}{\rho'_{p_k}(t)} \big)$, where $\rho'(t) = (\rho'_{p_1}(t), \rho'_{p_2}(t), \rho'_{p_3}(t), \ldots, \rho'_{p_\nu}(t))$ and $\rho'_{p_k}(t) \in [0, 1]$ are the normalized counterparts of $\rho_{p_k}(t)$, $(\forall)\, k = \overline{1,\nu}$. The entropy is close to 1 iff network coded packets have similar instantaneous densities in the network. The entropy allows us to quantify the discrepancy between densities through a scalar. We analyze the evolution with time of $H(\rho'(t))$ and use as an example a network with N = 100 nodes, B = 11, λ = 0.…

Footnotes: The independence assumption is reasonable, given that a coded packet is selected by a node randomly from its buffer for transmission during a contact (only from those packets that are innovative). We consider by convention that $0 \cdot \log(+\infty) = 0$.

[Figure 1: (a) Protocol 1 (continuous source transmission) vs. Protocol 2 (equal densities); (b) Protocol 1 (continuous source transmission) vs. Protocol 3 (equal densities). Axes: entropy of normalized packet densities vs. time (s).]
Fig. 1. Comparison of instantaneous entropies of normalized network coded packet densities for a network of 100 nodes.

In Fig. 1(a)-1(b) we show how the entropy evolves over time using three representative mobility realizations that run until all destinations decode the data, for the following protocols:
1. A benchmark ∆-type protocol: the source transmits a batch of coded packets continuously until all destination nodes have decoded the data. No specific measure is taken to maintain equal densities;
2. Another ∆-type protocol with a batch of coded packets, each placed in a separate node before the spread is triggered. The source continues transmitting coded packets to non-full nodes;
3. Similar to 2., with the exception that after distributing the initial copies and triggering their spread, the source stops disseminating data.

Remarks:
There are a number of observations, which hold in general (also for Γ). Firstly, high entropies are conserved by exponential inter-contact times. Secondly, the delay of protocols 2. and 3. is almost always identical, meaning that the source intervention no longer improves the throughput and that high entropy should be sufficient for maximizing throughput. Thirdly, when the source injects packets in a greedy manner (a strategy commonly considered to yield minimal delay [6]), the entropy drops significantly, impacting overall contact efficiency.

Footnotes: The same behavior is observed for other combinations of parameters. The timeline is scaled with N − 1 in the plots. Fig. 1(a), 1(b) show time-evolving entropies for protocols 2. and 3. (equal densities) without the time needed by the source to place a copy of each coded packet in disjoint nodes. This will be studied in Section IV.

IV. IMPROVED FORWARDING PROTOCOL
We define the seeding phase of a transmission as the time interval used by the source to place ν independent coded packets, each on a distinct relay node, for a batch of ν variables. The time needed for this operation is a random variable $T^s_n$, where n ∈ ℕ is the identifier of the packet batch (or $T^s$ when we do not refer to a specific n). Similarly, we define the propagation phase of the transmission as the interval in which the independent coded packets are forwarded epidemically in the network; this step finishes when all destination nodes have successfully decoded the packets. The time needed for the propagation phase of batch n is another random variable, $T^p_n$ (identically distributed as $T^p$). For every packet batch, the propagation phase takes place immediately after the seeding phase. The key idea is that the seeding phase of packet batch n + 1 can be performed in parallel with the propagation phase of packet batch n. To accomplish this, we need to ensure that ν = B − O(C), s.t. B − O(C) buffer slots are available to the propagation of batch n and O(C) are reserved for the seeding of batch n + 1. For each node, these B − ν places will host packets copied directly from the source.

Remark:
The throughput loss caused by the fact that ν ≠ B is shown to be negligible in comparison to the gain resulting from pipelining (see Section V). In practice, B − ν ∈ {1, 2}.

Theorem 4.1:
The seeding phase can be completed in Θ(ν) steps (in practice, approximately ν consecutive contacts of the source) with high probability. At the end of the seeding phase, each of the ν independent coded packets will be placed on a different relay node with high probability.

Proof:
The seeding algorithm performed by the source is described in the following. From the original ν variables, the source constructs ν independent coded packets with RLNC. Each of these ν coded packets is sent by the source only once. During the j-th contact, the j-th coded packet $p_j$ is sent by the source to the peer node, $j = \overline{1,\nu}$. To ensure that all packets start spreading at roughly the same time (during the propagation phase), the source specifies that coded packet $p_j$ should be forwarded only after the estimated time to finish the seeding. In the most favorable case, the source encounters a different node every time. This happens for B ≪ N (ν ≪ N). In this case we can set ν = B − 1. Let $P'_i$ be the probability that packet $p_i$ will be successfully placed in a node not already containing a packet from the same batch. Then $P'_i = \frac{N-i+1}{N}$ $\Rightarrow$ $E[X] = \sum_{i=1}^{\nu} \frac{1}{P'_i}$, where X is a r.v. (the number of steps needed to perform seeding). Thus E[X] ≈ ν for B ≪ N. The source faces a variant of the coupon collector problem for higher B. For this case, we can set ν = B − 2, and the probability that two coded packets of the same batch end up in the same node during seeding is much higher. However, the relay will move the extra coded packet to another node (not already containing a coded packet from the same batch) at the first opportunity, which occurs with high probability. Therefore, the probability that $p_i$ is not placed successfully in this case (neither by the source, nor by the relay at some point in the future, before the end of the seeding phase, which should occur after the ν-th contact of the source) is $\pi_i \le (1 - P'_i) \cdot \prod_{k=i+1}^{\nu} (1 - P'_k) = \frac{i-1}{N} \cdot \prod_{k=i+1}^{\nu} \frac{k-1}{N}$. In practice, we work with networks of limited buffers, where ν ≤ B ≪ N and therefore $\pi_i \approx 0$. This probability is very low even when ν = ⌈ N ⌉.
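The expected number of seeding steps used in this argument is easy to evaluate. The sketch below is an illustration only: it computes $E[X] = \sum_{i=1}^{\nu} 1/P'_i$ with $P'_i = (N - i + 1)/N$ exactly, using rational arithmetic, for a network size N = 100 and a few batch sizes chosen here as examples. The function names are hypothetical.

```python
from fractions import Fraction

def placement_probability(i, n_nodes):
    """P'_i = (N - i + 1) / N, the success probability for the i-th packet."""
    return Fraction(n_nodes - i + 1, n_nodes)

def expected_seeding_steps(nu, n_nodes):
    """E[X] = sum over i = 1..nu of 1 / P'_i (valid for nu <= n_nodes)."""
    return sum(1 / placement_probability(i, n_nodes) for i in range(1, nu + 1))

for nu in (5, 10, 50):                # illustrative batch sizes
    steps = expected_seeding_steps(nu, 100)
    print(nu, float(steps))
```

For ν = 10 and N = 100 the sum is roughly 10.5, i.e. close to ν, in line with E[X] ≈ ν for ν ≪ N. As ν approaches N the terms grow and the coupon-collector effect becomes visible.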
In conclusion, seeding can be done successfully in ν steps on average.
Splitting the transmission into two phases (seeding and propagation) is suggested by the resemblance to channel coding: to approach the capacity of the channel (which is analogous to the DTN), a block of ν bits is assembled, coded, sent and then decoded by the destination. As ν → +∞, the capacity can be approached asymptotically.

Corollary 4.1:
The seeding phase occurs with the minimum possible energy consumption for the source.
No feedback is assumed and reliable packet delivery is required even when the source is backlogged. We therefore enforce $T^p_l$ as the deadline for the propagation phase and aim to achieve full delivery with high probability before this deadline is reached. The time spent in the propagation phase is measured from the ν-th contact of the source (the one that delivered the last packet of the batch to the network). The probability that the propagation phase will be longer than $T^p_l$ can be bounded using one of the following:
• $\Pr[T^p > T^p_l] \le \frac{E[T^p]}{T^p_l} = \varepsilon_p$ (Markov's inequality);
• $\Pr[T^p > T^p_l] \le \inf_{s > 0} e^{-s T^p_l} \cdot M_{T^p}(s) = \varepsilon_p$, where $M_X(s)$ is the mgf of variable X (Chernoff's inequality, which is the tightest, if applicable).
The derivation for $T^p$ (CCDF) is omitted here due to limited space and is provided by [11]. Moreover, because packet densities are almost equal at the beginning of the propagation phase, the assumptions made in [6] are now accurate, allowing easier analytical treatment. The probability that there is at least one destination that has not decoded all data is $\varepsilon_p \to 0$ for $T^p_l$ reasonably large. In Section V we show that this condition can be achieved already for low $T^p_l$ and therefore throughput is not affected.

V. SIMULATION RESULTS
We test the pipelined-Γ protocol (with RLNC at intermediary nodes) against the simple Γ forwarding protocol, which also uses coding at intermediary nodes and which should be throughput optimal. Fig. 2(a) shows the additional throughput provided by the pipelined protocol. To ensure full delivery, we let $T^p_l = \max_{n \in \{1, \ldots, 100\}} \{T^p_n\}$. Surprisingly, pipelining is quite close to achieving the throughput capacity (no more than one packet, coded or not, can be sent by the source during a contact). Fig. 2(b) shows the extra delay incurred by packets due to the seeding phase exceeding the length of the propagation phase; its impact is however minimal. Using smaller buffers, pipelining can achieve throughputs superior to usual RLNC schemes (which need more memory), at the cost of a small additional delay. The pipelining protocol can also be used with non-coded packets. This is necessary when destination nodes only require some of the packets to be delivered and do not need to decode the entire packet batch. In this case the overhead associated with the transmission of coefficients and computations over the finite field is eliminated, but the observations from Fig. 2(a) and 2(b) remain valid. The attempt made in [6] to use equalizing spray counts does not obtain better delays simply because it still allows a long initial low-entropy interval, which has a snowball effect. Further increasing the throughput by setting ν > B is not possible, because in broadcast every node must be able to decode the transmission.

[Figure 2: (a) Throughputs (λ = 0.…, N = 100): packets/time unit (per destination node) vs. buffer size, for the pipelined protocol and network coding; (b) Delays (λ = 0.…, N = 100): average delay to full delivery vs. buffer size, showing total delay (pipelined), seeding extra delay (pipelined) and total delay (network coding).]
Fig. 2. Performance evaluation of the pipelining protocol (averaged over 100 source-generated packet batches).

Our conclusions would seem to contradict the results of Ahlswede et al. [12] and Deb et al. [3]; this is however not the case, because we in fact complement the two papers for the case of intermittently connected networks. Firstly, RLNC would reach the max-flow min-cut bound when ν → +∞, which would mean very large buffers (not possible in our case). Secondly, [3] assumes equal initial packet spread, an assumption which does not hold in DTNs with usual forwarding protocols.

VI. CONCLUSIONS
In this paper we consider the problem of optimizing throughput in intermittently connected networks, with minimal impact on delay. We specifically address the practical case of limited buffers. It is proven that network coding underutilizes the available resources. Following information theoretical hints, we design a practical forwarding protocol relying on pipelining, which achieves asymptotically reliable delivery and outperforms network coding in throughput, energy consumption and memory usage with negligible delay overhead. DTNs are shown to be very sensitive to initial forwarding conditions (in particular, the initial number of packet copies). Setting them to convenient values is easily achieved and yields significant performance gains. On the other hand, trying to control the network after the initiation of the forwarding process is much more challenging. Our analysis of the single source broadcast generalizes to uni- and multicast. A thorough consideration of energy constraints, congestion, multiple unsynchronized sources, comparison with other coding techniques, improved pipelining and applicability of the maximum-entropy principle to other mobility models is left for future work.

REFERENCES
[1] A. Vahdat and D. Becker, "Epidemic routing for partially-connected ad hoc networks," Duke University, Tech. Rep. CS-2000-06, 2000.
[2] R. Subramanian and F. Fekri, "Throughput performance of network-coded multicast in an intermittently-connected network," in Modeling and Optimization in Mobile, Ad Hoc and Wireless Networks (WiOpt), 2010 Proceedings of the 8th International Symposium on, 31 2010.
[3] S. Deb, M. Médard, and C. Choute, "Algebraic gossip: a network coding approach to optimal multiple rumor mongering," IEEE/ACM Trans. Netw., vol. 14, pp. 2486–2507, June 2006. [Online]. Available: http://dx.doi.org/10.1109/TIT.2006.874532
[4] J. Widmer and J.-Y. Le Boudec, "Network coding for efficient communication in extreme networks," in Proceedings of the 2005 ACM SIGCOMM workshop on Delay-tolerant networking, ser. WDTN '05. New York, NY, USA: ACM, 2005, pp. 284–291. [Online]. Available: http://doi.acm.org/10.1145/1080139.1080147
[5] Y. Lin, B. Liang, and B. Li, "Performance modeling of network coding in epidemic routing," in Proceedings of the 1st international MobiSys workshop on Mobile opportunistic networking, ser. MobiOpp '07. New York, NY, USA: ACM, 2007, pp. 67–74. [Online]. Available: http://doi.acm.org/10.1145/1247694.1247709
[6] Y. Lin, B. Li, and B. Liang, "Efficient network coded data transmissions in disruption tolerant networks," in INFOCOM 2008. The 27th Conference on Computer Communications. IEEE, 2008, pp. 1508–1516.
[7] E. Altman, F. De Pellegrini, and L. Sassatelli, "Dynamic control of coding in delay tolerant networks," in INFOCOM, 2010 Proceedings IEEE, 2010, pp. 1–5.
[8] R. Groenevelt, P. Nain, and G. Koole, "Message delay in manet," in Proceedings of the 2005 ACM SIGMETRICS international conference on Measurement and modeling of computer systems, ser. SIGMETRICS '05. New York, NY, USA: ACM, 2005, pp. 412–413. [Online]. Available: http://doi.acm.org/10.1145/1064212.1064280
[9] T. Ho, M. Médard, J. Shi, M. Effros, and D. R. Karger, "On randomized network coding," in Proc. Allerton, Oct. 2003.
[10] P. A. Chou, Y. Wu, and K. Jain, "Practical network coding," in Proc. Allerton, Oct. 2003.
[11] T. Spyropoulos, T. Turletti, and K. Obraczka, "Routing in delay-tolerant networks comprising heterogeneous node populations," Mobile Computing, IEEE Transactions on, vol. 8, no. 8, pp. 1132–1147, 2009.
[12] R. Ahlswede, N. Cai, S.-Y. Li, and R. Yeung, "Network information flow," Information Theory, IEEE Transactions on, vol. 46, no. 4, pp. 1204–1216, Jul. 2000.

APPENDIX
A. Remarks Regarding the Equivalence Between ∆ and Γ

To see why the equivalence holds, consider the effect of the initial packet density distribution on both schemes. Assume that, out of the $\nu$ coded packets, the source has so far managed to send only $\psi < \nu$ to the network. These packets have the normalized densities $\rho'_1, \rho'_2, \rho'_3, \ldots, \rho'_\psi$, with $\rho'_j = 0$ for all $j = \psi + 1, \ldots, \nu$. Due to the uniformity of the mobility model, the probability distribution of having already received these packets is the same across nodes. Assume w.l.o.g. that no packets from the source have yet been coded together.

The lower the entropy of this distribution, the more skewed it is: some packets have already achieved high spread, while most others have only a few copies in the network. The chance that the two nodes in the next contact have exactly the same buffer content is then very high, in which case Γ and ∆ generate the same inefficient contacts. Even if the buffers are not exactly the same, their overlap will still be significant. The contact will, with high probability, deliver an independent packet to the destination, but the problem is that most contacts in the network generate new random vectors from the same very few linear subspaces of similar dimension. For this reason, a node that has received a network coded packet is still very likely to deliver, during its next contact, a vector which already lies in the linear subspace of the receiving node.

The essential observation is that Γ does not promote packets of lower densities any better than ∆. A rare packet is coded together with others at essentially the same rate as the one at which ∆ promotes it. Under Γ, the nodes receiving a rare packet are indeed able to deliver new combinations to others, since the combinations contain the new packet. But exactly the same happens under ∆.
As the buffers will be almost identical, the nodes holding the rare packets will have to send them anyway, just as in Γ, because almost all the other packets they hold are already present in the nodes they meet. In other words, nodes receive new degrees of freedom at the same rate under both ∆ and Γ.

What matters is not only that a new independent vector has been received, but also the way it was obtained. If most nodes receive independent vectors generated from the same few bases, then in the next step they will almost surely deliver redundant packets. As the source disseminates the initial base $\{p_1, p_2, p_3, \ldots, p_\nu\}$ in the network during its contacts, what matters is which packets of the initial base have reached the destination nodes, and not the way these packets have been combined by the network. This is precisely why the assumption made above, that the network has not yet coded packets together, is without loss of generality. These simple facts give the result that the behavior of ∆ and Γ is almost identical. ∆ can therefore be used as a very good approximation of Γ wherever this is necessary for tractability reasons.

B. Impact of Entropy on Contact Efficiency

Fig. 3(a)-3(c) show contact efficiency for the same three mobility traces used in Fig. 1(a)-1(b). High entropies clearly allow the number of efficient contacts per unit of time to increase very fast and to remain at high levels, thereby improving throughput, as opposed to the low entropy case. Low entropy always generates far fewer efficient contacts, with negative effects because both directions of the bidirectional links established during contacts are affected.

[Figure 3, three panels: (a) Random mobility with seed no. 1; (b) Random mobility with seed no. 2; (c) Random mobility with seed no. 3. Vertical axis: Avg. number of efficient simplex transfers (per second). Horizontal axis: Time (s). Curves: Equal densities; Continuous source transmission.]

Fig. 3. Average number of efficient contacts per unit of time (over a sliding window of 50 contacts), for protocols 1. (black) and 3. (blue).
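
To make the two notions used in these appendices concrete, the following Python sketch computes the Shannon entropy of a normalized packet density distribution and tests whether a received coded vector adds a degree of freedom at the receiver. It is an illustration only, not the simulator behind Fig. 3: the function names are hypothetical, and coding is assumed to be over GF(2), with coefficient vectors stored as integer bit masks, purely for brevity. The field size is not stated in this section.

```python
import math


def density_entropy(densities):
    """Shannon entropy (in bits) of a packet density distribution.

    `densities` holds one non-negative value per source packet (for example
    the number of copies in the network). Values are normalized to sum to 1.
    """
    total = sum(densities)
    if total <= 0:
        raise ValueError("at least one packet must have non-zero density")
    probs = [d / total for d in densities if d > 0]
    return -sum(p * math.log2(p) for p in probs)


def reduce_vector(basis, vector):
    """Reduce `vector` against `basis` over GF(2) (assumed field).

    `basis` maps a leading-bit index to a stored vector with that leading
    bit. Bit j of a vector is set when packet p_(j+1) is part of the
    combination. Returns 0 when `vector` already lies in the span.
    """
    while vector:
        lead = vector.bit_length() - 1
        if lead not in basis:
            return vector
        vector ^= basis[lead]  # clears the leading bit
    return 0


def receive(basis, vector):
    """Store `vector` if it is innovative; return True in that case."""
    residue = reduce_vector(basis, vector)
    if residue == 0:
        return False  # redundant: already in the receiver's subspace
    basis[residue.bit_length() - 1] = residue
    return True
```

With equal densities over four packets, `density_entropy([1, 1, 1, 1])` reaches the maximum of 2 bits, whereas a skewed input such as `[97, 1, 1, 1]` yields a much smaller value. A vector for which `receive` returns `False` is redundant for the receiver, which is the situation the text describes as an inefficient contact.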