Capacity Approximations for Gaussian Relay Networks
Ritesh Kolte, Student Member, IEEE, Ayfer Özgür, Member, IEEE, and Abbas El Gamal, Fellow, IEEE
Abstract—Consider a Gaussian relay network where a source node communicates to a destination node with the help of several layers of relays. Recent work has shown that compress-and-forward based strategies can achieve the capacity of this network within an additive gap. Here, the relays quantize their received signals at the noise level and map them to random Gaussian codebooks. The resultant gap to capacity is independent of the SNRs of the channels in the network and the topology but is linear in the total number of nodes. In this paper, we provide an improved lower bound on the rate achieved by compress-and-forward based strategies (noisy network coding in particular) in arbitrary Gaussian relay networks, whose gap to capacity depends on the network not only through the total number of nodes but also through the degrees of freedom of the min cut of the network. We illustrate that for many networks, this refined lower bound can lead to a better approximation of the capacity. In particular, we demonstrate that it leads to a logarithmic rather than linear capacity gap in the total number of nodes for certain classes of layered networks. The improvement comes from quantizing the received signals of the relays at a resolution decreasing with the total number of nodes in the network. This suggests that the rule-of-thumb in the literature of quantizing the received signals at the noise level can be highly suboptimal.
Index Terms—Relay Networks, Gap to Capacity, Noisy Network Coding, Network Topology, Quantization
I. INTRODUCTION
Consider a source node communicating to a destination node via a sequence of relays connected by point-to-point AWGN channels, as depicted in Figure 1. The capacity of this line network is achieved by simple decode-and-forward and is equal to the minimum of the capacities of the successive point-to-point links. The decoding at each stage removes the noise corrupting the information signal, and therefore the end-to-end rate achieved is independent of the number of times the message is retransmitted.

Unfortunately, the optimality of decode-and-forward is limited to this line topology, and to physically degraded networks in general. In more general networks with multiple relays at each layer, it is well understood that the rate achieved by decode-and-forward can be arbitrarily smaller than capacity. Characterizing the capacity of more general networks has been
Manuscript received July 14, 2014; accepted July 6, 2015; date of current version August 2015. This work was presented in part at ITW 2013, Seville, Spain [1] and IZS 2014, Zurich, Switzerland [2]. Communicated by Tie Liu, Associate Editor for Shannon Theory. The authors are with the Department of Electrical Engineering at Stanford University (Emails: [email protected], [email protected], [email protected]). The work of R. Kolte and A. Özgür was partly supported by a Stanford Graduate Fellowship, NSF CAREER award 1254786 and the NSF Center for Science of Information under grant agreement CCF-0939370.

Fig. 1: Line Network

of interest for a long time [3] (see also [4] and references therein). Recently, significant progress has been made ([5], [6], [7], [8], [9]) showing that compress-and-forward based strategies can be a better fit for general relay networks. Here, relays quantize/compress their observations without decoding and forward the compressions to the destination by mapping them to a new codebook. In particular, it has been shown that compress-and-forward based relaying strategies (such as quantize-map-and-forward in [5] and noisy network coding in [6]) can achieve rates that are within a bounded gap to the capacity of any relay network with multi-source multicast traffic. The gap is independent of the coefficients and SNRs of the constituent channels and the topology of the network. However, it depends linearly on the total number of nodes, which limits the applicability of these results to small networks with a few relays. A recent result that we would like to point out here is [10], in which an extension of partial-decode-forward, called distributed decode-forward, has been shown to achieve a similar result.
The gap to capacity for this scheme is also shown to be linear in the number of nodes, with a lower constant compared to noisy network coding.

Since the gap to capacity of compress-forward based strategies is linear in the number of nodes, for the line network in Figure 1 they yield an achievable rate whose gap to capacity is linear in the depth of the network D. One natural way to explain this gap is noise accumulation. As the information signal proceeds deeper into the network, it is corrupted by more and more noise. Therefore, any strategy that does not remove the noise corrupting the signal at each stage by decoding the source message will naturally suffer a rate loss that increases with the number of stages. However, it is not clear why this rate loss should be linear in the depth of the network, as the current results in the literature suggest [5], [6], [7]. The total variance of the accumulated noise over the D stages of the network is D times the variance of the noise at each stage (assuming identical noise variances over the D stages). A factor of D increase in the noise variance in a point-to-point Gaussian channel would lead to at most a log D decrease in capacity, and therefore it is natural to ask if we can reduce the performance loss of compress-and-forward strategies from linear to logarithmic in D, first in the context of this example and then in more general networks.

The first contribution of this paper is to show that a judicious choice of the quantization (or compression) resolutions at the relays can significantly improve the performance of compress-and-forward based strategies (noisy network coding in particular). For example, in the line network in Figure 1, if the relay nodes quantize their observed signals at a resolution decreasing linearly in D, the rate loss due to compress-and-forward is only logarithmic in D. (See Section IV.)
This is counterintuitive, as coarser quantization introduces more noise to the communication, and our result suggests that the more relaying stages we have, the more coarsely we should quantize. The rule-of-thumb used in the current literature [5], [6], [7] is to quantize the received signals at the noise level (independent of the number of relays), which we show to be highly suboptimal. The improvement due to coarser quantization arises because in compress-and-forward there is a rate penalty for communicating the quantized signals to the destination, and this rate penalty can be significantly larger than the rate penalty associated with coarser quantization. A detailed discussion of this is presented in Section V. The fact that optimizing the quantization resolutions can lead to better rates for compress-and-forward was also observed in [11], [12] in the context of the Gaussian diamond network.

An immediate question is whether this observation can lead to better capacity approximations for more general Gaussian networks beyond the line network. To address this question, we suggest a new approximation philosophy for the capacity of Gaussian networks. The current approach is to approximate the capacity within a gap that depends only on the number of nodes. However, two networks with the same number of nodes can have very different topologies, which can potentially lead to significantly different performance for compress-and-forward. While it is desirable to have capacity approximations which are independent of the instantaneous channel realizations and SNRs in the network, since these parameters have a wide dynamic range and typically change over a short time scale in wireless networks, topological properties of a network typically change over a much longer time scale.
Developing capacity approximations which reveal the dependence of the gap not only on the number of nodes but also on other structural properties of the network can allow for a better understanding of the performance gap of compress-and-forward strategies, as well as yield tighter capacity approximations for many Gaussian networks.

The main result of this paper is a new capacity approximation for Gaussian networks where the gap to capacity depends not only on the number of nodes but also on the number of degrees of freedom (DOF) of the min cut of the network. While the DOF of the min cut of the network can be carefully evaluated for a given network with specific channel realizations (in which case our result will yield the tightest approximation for this network), in many cases this quantity can be easily bounded based only on the topological properties of the network. For example, for the line network in Figure 1 the DOF of the min cut is trivially bounded by 1, while for a diamond network [11] it can be trivially bounded by 2. For such networks, our result yields a logarithmic rather than linear gap in the number of nodes. As before, the improvement is based on a judicious choice of the quantization resolutions at the relays with noisy network coding.

Finally, we look at specific settings and demonstrate that our general result can yield better capacity approximations for these settings than those available in the literature. The first setup we consider is the multi-layer fast-fading Gaussian relay network in Figure 2. Here a source node equipped with K antennas communicates to a destination node equipped with K antennas over D layers, each layer containing K single-antenna relays. Each relay observes a noisy linear combination of the signals transmitted by the relays in the previous layer. All channels are subject to i.i.d. Rayleigh fast-fading. Current results on compress-and-forward [5], [6], [7] yield a rate which is within a gap linear in KD of the capacity of this network, where KD is the total number of nodes. Instead, we show that if relays quantize their received signals at a resolution that decreases as the number of layers increases, compress-and-forward can achieve a rate which is within an additive gap of K log D + K of the network capacity. So for a fixed K, as the number of layers D increases, this gap only grows logarithmically in the depth of the network D.

As a side result, we provide an analysis of the compress-and-forward based strategies in [5], [6], [7] in fast-fading wireless networks. Fast-fading wireless networks are considered in Theorem 8.4 of [5]; however, the conclusion of the theorem and its proof are erroneous. Theorem 8.4 of [5] suggests that the ergodic fast-fading capacity of a wireless relay network is approximately given by the expected value of the cutset upper bound (where the expectation is over the fading distribution). In contrast, we show that the capacity is approximately given by the minimum of the expected cut values. The difference is in the order of the expectation over the fading distribution and the minimization over different cuts. Note that the second quantity can be arbitrarily larger than the first.

Fig. 2: Multi-Layer Relay Network for K = 3; each H_i is a Rayleigh fading matrix

The problem of developing better capacity approximations for this setup has also been considered in [13], where a computation alignment strategy is proposed to remove the noise accumulating with the depth of the network. This yields a gap of K + 5K log K. Computation alignment is based on the idea of combining compute-forward [14] with the ergodic alignment proposed in [15]. While the gap to capacity obtained by computation alignment is independent of D, this strategy is significantly more complex than compress-forward and has a number of problems from a practical perspective.
In particular, ergodic alignment over the fading process leads to large delays in communication and requires each relay to know the instantaneous realizations of all the channels in the network. Moreover, its performance critically depends on the symmetry of the fading statistics. The compress-forward strategy with improved quantization that we propose in this paper requires only the destination to know the instantaneous channel realizations in the network. In particular, no channel state information is required at the source and at the relays, and the fading statistics are not critical to the operation of the strategy.

To illustrate this last point, we consider another setup where the network has the same layered topology, but the channel coefficients for each link are now fixed with unit magnitudes and arbitrary phases (i.e., each channel coefficient is of the form e^{jθ} for some arbitrary θ ∈ [0, 2π)). Our approximation gap for this setup is K log D + K log K + K, which is again logarithmic in the depth of the network rather than linear. Computation alignment is obviously not applicable in this case, and the best currently available capacity approximation for this setup is linear in KD, which follows from capacity approximations for general Gaussian networks [5], [6], [7].

The aforementioned and previous results raise the question of whether tighter gaps scaling sublinearly in the network size can be obtained in the general case (independent of network topology). In this respect, we would like to mention an interesting recent work [16] which shows that obtaining a gap between capacity and cutset bound that is sublinear in the number of nodes for general Gaussian relay networks is possible if and only if the cutset bound is tight for all Gaussian relay networks.

The paper is organized as follows. The next section describes the model and some background. The main results and a discussion of the results are presented in Section III.
We illustrate the basic idea behind the results via the simple example of a line network in Section IV. Section V aims to clarify the counterintuitive observation that coarser quantization at the relays can result in a better achievable rate. The formal proofs of the main results are presented in Sections VI, VII and VIII.

II. MODEL AND PRELIMINARIES
In the following subsection, we describe the general model of a Gaussian relay network, which is the subject of our main result.
A. General Model
Consider a Gaussian relay network, as depicted in Figure 3, where a source node s communicates a message m ∈ [1 : 2^{nR}] to a destination node d in n transmissions with the help of a set of relay nodes. Let the number of transmit antennas and receive antennas at node i be M_i and N_i respectively. We assume N_s = 0 and M_d = 0. Let N denote the set of all nodes, and let M = Σ_{i∈N} M_i and N = Σ_{i∈N} N_i be the total number of transmit and receive antennas respectively. The signal received by node i at time t is denoted by Y_i[t] ∈ C^{N_i×1} and is given by

Y_i[t] = Σ_{j≠i} H_{ij} X_j[t] + Z_i[t],

where H_{ij} ∈ C^{N_i×M_j} contains the (complex) channel gains from node j to node i, and X_j[t] ∈ C^{M_j×1} is the vector transmitted by node j at time t. We assume that Y_s = 0 and X_d = 0. Each node is subject to an average power constraint P per antenna, and Z_i[t] ∼ CN(0, σ²I), independent across time and across different receive antennas. The relays are constrained to be strictly causal in their operations, i.e., at any relay node i, X_i[t] can be a function only of {Y_i[1], Y_i[2], ..., Y_i[t−1]}. A rate R is said to be achievable if the probability of error of decoding the message m ∈ [1 : 2^{nR}] at the destination d can be made arbitrarily small by choosing a sufficiently large n. The supremum of all achievable rates is called the capacity C of the network.

Fig. 3: Gaussian Relay Network

In Sections VII and VIII, we focus on the following two special cases of Gaussian relay networks, respectively.
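The channel model above is easy to sketch numerically. The following minimal Python snippet (our own illustration, not from the paper; all function names are hypothetical) generates one channel use of Y_i[t] = Σ_{j≠i} H_{ij} X_j[t] + Z_i[t] with circularly-symmetric complex Gaussian noise CN(0, σ²):

```python
import random

def crandn(var=1.0):
    """Draw from the circularly-symmetric complex Gaussian CN(0, var)."""
    s = (var / 2.0) ** 0.5
    return complex(random.gauss(0.0, s), random.gauss(0.0, s))

def received_signal(H, X, sigma2):
    """One channel use at a receiver: Y = sum_j H[j] X[j] + Z.
    H[j] is an N_i x M_j complex matrix (list of rows), X[j] a length-M_j
    transmit vector, and Z ~ CN(0, sigma2 I) across the receive antennas."""
    n_rx = len(next(iter(H.values())))
    Y = [crandn(sigma2) for _ in range(n_rx)]      # noise vector Z_i
    for j, Hj in H.items():                        # superpose all transmitters
        for r in range(n_rx):
            Y[r] += sum(Hj[r][c] * X[j][c] for c in range(len(X[j])))
    return Y

# Example: two single-antenna transmitters seen at a single-antenna receiver.
H = {1: [[1 + 1j]], 2: [[0.5 - 0.5j]]}
X = {1: [1 + 0j], 2: [2 + 0j]}
Y = received_signal(H, X, sigma2=0.0)  # noiseless, for a deterministic check
```

With sigma2 = 0 the output is exactly the noiseless superposition (1+1j)·1 + (0.5−0.5j)·2 = 2.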
B. Fast-fading Layered Network
In Section VII, as stated in the introduction and depicted in Figure 2, we consider a fast-fading layered network, where each layer except the first and last contains K single-antenna nodes. The nodes in the i-th layer are collectively referred to as V_i, where 0 ≤ i ≤ D, while a particular node j in layer i is referred to by the pair (i, j). The layer V_0 consists of the source node s, which has K transmit antennas, while the layer V_D consists of the destination node d, which has K receive antennas. Let V^i denote V_0 ∪ V_1 ∪ ··· ∪ V_i. We assume that s and d are equipped with multiple antennas in order to keep the problem interesting. Otherwise, the minimum cut becomes the multiple-input single-output cut from the last layer of relays to d, and this trivializes the problem of approximately achieving the capacity of the network. Instead of multiple antennas at d, one can also assume orthogonal bit-pipes from the nodes in V_{D−1} to d, as done in [13].

For 0 ≤ i ≤ D − 1, the received signal at node (i+1, j) in V_{i+1} (or at antenna j if i = D − 1) depends only on the transmit signals of the nodes in V_i and at time t is given by

Y_{(i+1,j)}[t] = Σ_{k=1}^{K} h_{(i,k)→(i+1,j)}[t] X_{(i,k)}[t] + Z_{(i+1,j)}[t].

The channel gain h_{(i,k)→(i+1,j)} is i.i.d. CN(0, 1) across time, independent of everything else (i.e., other channel gains, noise and transmitted signals). In other words, we assume independent fast Rayleigh fading. The source node and the relay nodes do not know the instantaneous realizations of the channel coefficients, i.e., they have no transmit or receive channel state information. (The source node knows the topology of the network and the channel statistics, i.e., the end-to-end ergodic rate supported by the network.)
All instantaneous channel realizations are known at the destination node and are used while decoding the transmitted message from the source node. Thus, we can effectively treat {Y_d, H} as the received signal at the destination, where H contains all the channel realizations.

C. Static Layered Network
The topology of the static layered network that we consider in Section VIII is the same as that of the fast-fading layered network, i.e., a source node with K transmit antennas communicates to a destination node with K receive antennas over D − 1 layers, each containing K single-antenna relays. However, instead of assuming fast-fading, we now focus on the case where each channel gain h_{(i,k)→(i+1,j)} is an arbitrary complex number with unit magnitude, i.e., of the form e^{jθ} for some arbitrary θ ∈ [0, 2π) (possibly different for different (i, k) → (i+1, j)), where the j in the exponent stands for the imaginary unit.

D. Background
An upper bound on the capacity C of any relay network is given by the cutset bound [17], which is as follows:

C ≤ C̄ ≜ sup_{p(x_N)} ( min_{Ω: s∈Ω, d∈Ω^c} C(Ω) ),   (1)

where Ω is a subset of N,

C(Ω) ≜ I(X_Ω; Y_{Ω^c} | X_{Ω^c}),   (2)

and Ω^c denotes N \ Ω. The notation X_Ω is standard and refers to the set of random variables {X_i : i ∈ Ω}.

In [6], the authors propose an achievability scheme based on compress-and-forward operation at the relays, named "noisy network coding" (NNC). This scheme achieves any rate R that is less than R_NNC, which is given in (3) at the top of the next page. To keep the expressions short, we assume that Ŷ_{Ω^c} contains Y_d; in other words, Ŷ_d can be set equal to Y_d. We refer the reader to [6] for the details of this scheme. It is shown in [6] that the gap between the cutset bound and the rates achieved by noisy network coding for Gaussian relay networks with multi-source multicast traffic is no more than a constant multiple of |N|.

III. MAIN RESULT
Given a Gaussian relay network as described in Section II-A and a cut Ω ⊆ N of this network, for any Q ≥ 0 we define

C^iid_Q(Ω) ≜ log det( I + (P/((Q+1)σ²)) H_{Ω→Ω^c} H†_{Ω→Ω^c} ),   (4)

where the matrix H_{Ω→Ω^c} denotes the induced MIMO matrix from Ω to Ω^c. In the case of single-antenna nodes, it is obtained by enumerating the nodes in Ω and Ω^c in an arbitrary fashion, and H_{Ω→Ω^c} is the |Ω^c| × |Ω| matrix whose (i, j)-th entry contains the channel coefficient from node j ∈ Ω to node i ∈ Ω^c. In the case of multiple antennas, it is obtained by enumerating the transmit antennas in Ω and the receive antennas in Ω^c, and the entries of the matrix denote the corresponding channel coefficients. In this paper, log denotes the natural logarithm. The expression in (4) is the mutual information across the cut Ω, defined in (2), when the channel input distributions at each node are i.i.d. CN(0, P I) and the noise at each antenna is i.i.d. CN(0, (Q+1)σ²) (instead of CN(0, σ²) as originally defined in Section II-A). For a given Q ≥ 0, let Ω*_Q be the cut that minimizes C^iid_Q(Ω),

Ω*_Q ≜ arg min_{Ω: s∈Ω, d∈Ω^c} C^iid_Q(Ω).   (5)

Let d*_Q be the rank of the corresponding MIMO matrix H_{Ω*_Q→(Ω*_Q)^c}. We will also refer to d*_Q as the number of degrees of freedom of the MIMO channel corresponding to the cut Ω*_Q, expressed succinctly as

d*_Q = DOF( arg min_{Ω: s∈Ω, d∈Ω^c} C^iid_Q(Ω) ).   (6)

Note that the min cut Ω*_Q, and therefore d*_Q, depends on Q. In particular, if Q₁ and Q₂ are two non-negative numbers with Q₁ > Q₂ ≥ 0, then d*_{Q₁} can be larger than, smaller than or the same as d*_{Q₂}. The following theorem states our main result.

Theorem 1.
The capacity C of the network described in Section II-A satisfies

C̄ ≥ C ≥ C̄ − d*_0 log(M/d*_0) − N/Q − d*_Q log(Q + 1),

for any non-negative Q, where C̄ is the cutset bound of the network given in (1).

Note that Q in the theorem is a free parameter that can be optimized for a given network to minimize the gap between the achieved rate and the cutset upper bound. In the proof of the theorem, we will see that Qσ² corresponds to the variance of the quantization noise introduced at the relays in noisy network coding [6]; larger Q corresponds to coarser quantization. In previous works [5], [6], Q is chosen to be a constant independent of the number of nodes (or antennas) N (i.e., Q ≈ 1, and the quantization noise Qσ² is of the order of the Gaussian noise variance σ²). Observe that due to the second term N/Q of the gap in Theorem 1, this choice results in a gap that is at least linear in N. Trivially upper bounding both d*_0 and d*_Q by N makes the first and the third terms also linear in N. However, in many cases the min cut of the network can have much smaller DOF than M and N, and in such cases allowing Q to depend on N can result in a much smaller gap.

For example, in the diamond network with a single antenna at each node, it is clear a priori that any cut of the network has at most two degrees of freedom, regardless of the number of relays, and therefore d*_Q ≤ 2 for any Q. It can be seen immediately from the above theorem that choosing Q = N in this case results in a gap logarithmic in N [11], which compares favorably with a gap that is linear in N. Similarly, for the fast-fading layered network with K single-antenna nodes per layer defined in Section II-B, we show in Section VII that d*_Q ≤ K for any Q. If there are D layers in the network, so that N = M = KD, the above expression tells us that choosing Q to be proportional to D gives a gap that is logarithmic in D instead of linear in D.
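To make the role of d*_Q concrete, the following self-contained Python sketch (our own illustration, not the paper's code; the gains and helper names are hypothetical) enumerates all cuts of a small single-antenna diamond network, evaluates C^iid_Q(Ω) = log det(I + P/((Q+1)σ²) H H†) for each cut, and reports the value and rank d*_Q of the minimizing cut. For the diamond topology every cut matrix has rank at most 2, matching the bound d*_Q ≤ 2 discussed above.

```python
import itertools
import math

def gram(H):
    """G = H H† for a complex matrix H given as a list of rows."""
    m, n = len(H), len(H[0])
    return [[sum(H[i][k] * H[j][k].conjugate() for k in range(n))
             for j in range(m)] for i in range(m)]

def logdet_I_plus(c, G):
    """log det(I + c*G) via LU elimination; I + c*G is positive definite here."""
    m = len(G)
    A = [[c * G[i][j] + (1.0 if i == j else 0.0) for j in range(m)] for i in range(m)]
    det = 1.0 + 0.0j
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))
        if piv != col:
            A[col], A[piv] = A[piv], A[col]
            det = -det
        det *= A[col][col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for cc in range(col, m):
                A[r][cc] -= f * A[col][cc]
    return math.log(abs(det))

def rank(H, tol=1e-9):
    """Rank of a complex matrix by Gaussian elimination."""
    A = [row[:] for row in H]
    m, n, r = len(A), len(A[0]), 0
    for col in range(n):
        if r == m:
            break
        piv = max(range(r, m), key=lambda i: abs(A[i][col]))
        if abs(A[piv][col]) < tol:
            continue
        A[r], A[piv] = A[piv], A[r]
        for i in range(r + 1, m):
            f = A[i][col] / A[r][col]
            for cc in range(col, n):
                A[i][cc] -= f * A[r][cc]
        r += 1
    return r

def cut_matrix(S, h, g, R):
    """MIMO matrix H_{Omega -> Omega^c} for Omega = {s} U S, diamond topology."""
    cols = ['s'] + sorted(S)                               # transmitters in Omega
    rows = [r for r in range(1, R + 1) if r not in S] + ['d']
    M = []
    for rx in rows:
        row = []
        for tx in cols:
            if tx == 's' and rx != 'd':
                row.append(h[rx])                          # source -> relay rx
            elif tx != 's' and rx == 'd':
                row.append(g[tx])                          # relay tx -> destination
            else:
                row.append(0.0 + 0.0j)                     # no s->d or relay->relay link
        M.append(row)
    return M

R, P, sigma2 = 4, 10.0, 1.0
h = {r: complex(1.0, 0.3 * r) for r in range(1, R + 1)}    # arbitrary fixed gains
g = {r: complex(0.7, -0.2 * r) for r in range(1, R + 1)}

def min_cut(Q):
    """Return (C^iid_Q(Omega*_Q), d*_Q) by brute force over all cuts."""
    best = None
    for k in range(R + 1):
        for S in itertools.combinations(range(1, R + 1), k):
            M = cut_matrix(set(S), h, g, R)
            val = logdet_I_plus(P / ((Q + 1) * sigma2), gram(M))
            if best is None or val < best[0]:
                best = (val, rank(M))
    return best
```

Brute-force cut enumeration is exponential in the number of relays and is used here purely to illustrate the definitions (5) and (6) on a toy instance.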
In Section VIII, we demonstrate yet another setting in which applying Theorem 1 and choosing Q to be proportional to the number of layers allows us to obtain an improved gap. This demonstrates that the rule of thumb in the current literature of quantizing received signals at the noise level (Q ≈ 1) can be highly suboptimal.

R_NNC ≜ sup_{∏_{k∈N} p(x_k) p(ŷ_k | y_k, x_k)}  min_{Ω: s∈Ω, d∈Ω^c} ( I(X_Ω; Ŷ_{Ω^c} | X_{Ω^c}) − I(Y_Ω; Ŷ_Ω | X_N, Ŷ_{Ω^c}) ).   (3)

Theorems 2 and 3, stated below, formally provide the results mentioned in the preceding paragraph.

Theorem 2.
The capacity C of the fast-fading layered network described in Section II-B satisfies

C̄ ≥ C ≥ C̄ − K log D − K.   (7)

Theorem 2 follows from evaluating the required quantities in the expression in Theorem 1 for the setup in Section II-B. However, directly applying the result of Theorem 1 to this setup yields a gap with larger constants. It turns out that we can tighten the gap to K log D + K based on the observation that for this setup the cutset bound can be evaluated explicitly, and the optimal channel input distribution turns out to be independent across the antennas. The detailed proof appears in Sections VII-A and VII-B.

The following corollary extends the result of Theorem 2 to the setup considered in [13]. In this setup, instead of a single K-antenna source, there are K single-antenna sources {s_1, s_2, ..., s_K} interested in communicating with the destination, as depicted in Figure 4. We show that Theorem 2 also implies a similar result for the sum-capacity C of this network.

Corollary 1.
The sum-capacity C of the network in Figure 4 satisfies

C̄ ≥ C ≥ C̄ − K log D − K.   (8)

The proof of Corollary 1 appears in Section VII-C.

Fig. 4: Fast-Fading Layered Network with multiple sources

The following theorem states the result for the static layered network setup; the proof is given in Section VIII.
Theorem 3.
For K ≥ 2 and D ≥ 2, the capacity C of the layered network described in Section II-C satisfies

C̄ ≥ C ≥ C̄ − K log D − K log K − K.   (9)

IV. LINE NETWORK
We first illustrate the main idea of this paper in a simple setting, the line network in Figure 1. Here we assume that each link i is an AWGN channel with gain h_i, and the channel gains h_i are fixed and known. Each node has power P and the noise variance is σ². (The conclusions below also hold under a fast-fading assumption similar to the one described in Section II.) It is clear that a decode-forward strategy at the relays achieves the capacity of this line network, while compress-and-forward based strategies (such as quantize-map-forward in [5] and noisy network coding in [6]) with quantization done at the noise level have a gap to capacity that is linear in the number of nodes D. Here, we show that if the relays instead quantize at a resolution (D − 1) times the noise level, the gap to capacity becomes logarithmic in D.

Number the nodes s through d as 0, 1, 2, ..., D. Let us consider the rate achievable by noisy network coding for this network, assuming all relay nodes choose their transmission codebooks independently from a Gaussian distribution, i.e., X_i ∼ CN(0, P), independent of each other. As described in Section II-D, the rate

min_{0 ≤ i ≤ D−1} ( I(X_i; Ŷ_{i+1} | X_{i+1}) − I(Y_{V_i}; Ŷ_{V_i} | X_N, Ŷ_{N\V_i}) )

is achievable, where V_i = {0, ..., i}, and each relay chooses Ŷ_i = Y_i + Ẑ_i, where Ẑ_i ∼ CN(0, (D−1)σ²) independent of everything else. Since Y_{i+1} = h_i X_i + Z_{i+1}, the channel from X_i to Ŷ_{i+1} is effectively an AWGN channel with noise power Dσ² and gain h_i. Then the first term in the achievable rate expression becomes log(1 + |h_i|²P/(Dσ²)), which is greater than or equal to log(1 + |h_i|²P/σ²) − log(D).

Due to the coarse quantization, the second term in the achievable rate expression is reduced significantly as compared to quantizing at the noise level. We have

I(Y_{V_i}; Ŷ_{V_i} | X_N, Ŷ_{N\V_i}) = I(Z_{V_i}; {Z + Ẑ}_{V_i}) = (|V_i| − 1) log(1 + σ²/((D−1)σ²)) = i log(D/(D−1)) ≤ i/(D−1) ≤ 1,

since i ≤ D − 1. Since the capacity of the line network is given by the minimum of the capacities of each link, min_i log(1 + |h_i|²P/σ²), we see that decreasing the resolution of quantization as the number of nodes increases results in a gap of log(D) + 1 to capacity. If the quantization were done at the noise level, the first term in the noisy network coding achievable rate would suffer only a log(2) decrease instead of log(D) with respect to capacity; however, the second term would be linear in D, overall resulting in a gap to capacity that is linear in D.

At first glance, coarser quantization resulting in better achievable rates might seem counterintuitive. We discuss this in more depth in the following section.
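The calculation above is easy to check numerically. The short Python sketch below (our own illustration, with hypothetical names) evaluates the NNC lower bound for a line network with equal per-link SNR, once with quantization noise (D−1)σ² and once at the noise level, and compares both with the capacity min_i log(1 + |h_i|²P/σ²):

```python
import math

def nnc_rate_line(D, snr, Q):
    """NNC achievable-rate lower bound for a line network with D hops,
    equal per-link SNR, and quantization noise Q*sigma^2 at each relay
    (natural-log units); can be negative when D is large and Q small."""
    first = math.log(1 + snr / (1 + Q))        # worst-cut information term
    second = (D - 1) * math.log(1 + 1 / Q)     # largest quantization penalty
    return first - second

D, snr = 64, 100.0
capacity = math.log(1 + snr)                   # min over identical links
gap_coarse = capacity - nnc_rate_line(D, snr, Q=D - 1)   # quantize at (D-1) x noise
gap_noise_level = capacity - nnc_rate_line(D, snr, Q=1)  # quantize at noise level
```

Here gap_coarse stays below log(D) + 1 while gap_noise_level exceeds (D−1) log 2, i.e., grows linearly in D, matching the discussion above.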
V. GAP TO CAPACITY WITH NOISY NETWORK CODING
In this section, we discuss the elements of the gap between the rate achieved by noisy network coding (NNC) and the cutset bound, and identify a trade-off between different elements of the gap. Our main result builds on an understanding of this trade-off.

Consider an arbitrary discrete memoryless network with a set of nodes N where a source node s wants to communicate to a destination node d with the help of the remaining nodes acting as relays. As stated earlier in Section II-D, noisy network coding can achieve the rate given in (3). Comparing this with the cutset bound on the capacity of the network,

C̄ = sup_{p(x_N)} min_{Ω: s∈Ω, d∈Ω^c} I(X_Ω; Y_{Ω^c} | X_{Ω^c}),   (10)

we observe the following differences. First, while the maximization in (10) is over all possible input distributions, only independent input distributions are admissible in (3). This gap corresponds to a potential beamforming gain that is allowed in the cutset bound but not exploited by NNC. Second, the first term in (3) is similar to (10) but with Y_{Ω^c} in (10) replaced by Ŷ_{Ω^c} in (3). This difference corresponds to a rate loss due to the quantization noise introduced by the relays. Third, there is the extra term I(Y_Ω; Ŷ_Ω | X_N, Ŷ_{Ω^c}) reducing the rate in (3). One way to interpret this term would be as the rate penalty for communicating the quantized (compressed) observations Ŷ_Ω to the destination on top of the desired message. Note that this is the rate required to describe the observations Y_Ω, at the distortion dictated by Ŷ_Ω, to a decoder that already knows (or has decoded) X_N, Ŷ_{Ω^c}.

However, it is not completely clear whether this interpretation is precise, because the non-unique decoder employed by NNC does not require the quantization indices to be explicitly decoded. The non-unique decoder of NNC searches for the unique source codeword that is jointly typical with some (not necessarily unique) set of quantization indices at the relays and the received signal at the destination.
The following example in Figure 5 illustrates that in certain cases the decoder can indeed recover the transmitted message even if it cannot uniquely recover the quantization index of the relay. Even though we focus on the extremal case where the r–d link is zero, the discussion extends to the case where this link is sufficiently weak.

Fig. 5: Example

Consider the classical relay channel with a very weak link from the relay to the destination. Clearly, as long as the source uses a codebook of rate less than the capacity of the direct link, no matter what the operation at the relay is, the destination can always decode the source message by performing a joint typicality test between its received signal and the source codebook (which is subsumed by the non-unique typicality test of NNC). In particular, if the relay quantizes too finely, then there is no way for the destination to recover the relay's quantization index, even though the source message can still be recovered.

On the other hand, this example reveals the following strange property of the expression in (3). While the above discussion shows that in the setup of Fig. 5 the rate achieved by NNC is equal to the capacity of the direct link, independent of the relay's operation (i.e., of what Ŷ_r is), the rate in (3) is decreasing with increasing resolution of the quantization at the relay (due to the subtractive term I(Y_Ω; Ŷ_Ω | X_N, Ŷ_{Ω^c})). This suggests a more careful analysis of the rate achieved by NNC, which leads to the improved rate given in (11) at the top of the next page. Here, only those relays that are in M ⊆ N are considered in the non-unique typicality decoding, while the other relay transmissions are treated as noise.
For example, for the relay channel in Figure 5, this would correspond to not considering the relay in the typicality decoding.

It has been shown in [18] that if M* is the subset that maximizes (11) for a given ∏_{i∈N} p(x_i) p(ŷ_i | y_i, x_i), then the quantization indices of the relays in M* can be uniquely decoded at the destination, while the quantization indices of the relays in N \ M* cannot be decoded; in fact, it is optimal to treat the transmissions from these relays as noise. Since the transmissions from N \ M* are treated as noise, the expression (11) is increased if these relays are shut down. Hence, we can conclude that in the optimal distribution ∏_{i∈N} p(x_i) p(ŷ_i | y_i, x_i) for NNC, some relays can be off (not utilized, or equivalently always quantizing their received signals to zero) and some relays can be active, but the quantization indices of all relays (the active ones and, trivially, the inactive ones) can be uniquely decoded at the destination. Since the quantization indices are communicated to the destination together with the source message, there should be a rate penalty for communicating them, which is precisely the term I(Y_Ω; Ŷ_Ω | X_M, Ŷ_{Ω^c}).

The above discussion reveals that NNC communicates not only the source message but also the quantization indices to the destination, despite the non-unique typicality test performed at the decoder; and while making the quantizations finer introduces less quantization noise in the communication, it leads to a larger rate penalty for communicating these quantization indices to the destination. This tradeoff is made explicit in Theorem 1, which establishes the achievable rate

C̄ − d*_0 log(M/d*_0) − N/Q − d*_Q log(Q + 1),

for any Q ≥ 0. Here, the term N/Q corresponds to the rate penalty associated with communicating the quantization indices, and the term d*_Q log(Q + 1) corresponds to the rate penalty due to the quantization noise.
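The trade-off between the two Q-dependent terms can be seen directly by evaluating them. The snippet below (an illustration under the simplifying assumption that d*_Q equals a fixed d for every Q, which need not hold in general) compares the penalty N/Q + d*_Q log(Q+1) at Q ≈ 1 against coarse quantization Q ≈ N, and also checks the elementary bound log(1 + 1/Q) ≤ 1/Q that underlies the N/Q term:

```python
import math

def penalty(N, d, Q):
    """Sum of the two Q-dependent gap terms from Theorem 1, under the
    hypothetical assumption d*_Q = d for all Q."""
    return N / Q + d * math.log(Q + 1)

N, d = 1000, 2
at_noise_level = penalty(N, d, 1.0)        # Q ~ 1: quantize at the noise level
at_coarse = penalty(N, d, float(N))        # Q ~ N: coarse quantization
best = min(penalty(N, d, q / 10.0) for q in range(1, 100 * N))  # grid search

# sanity check of log(1 + 1/Q) <= 1/Q, the step behind the N/Q term
ok = all(math.log(1 + 1 / q) <= 1 / q for q in (0.1, 0.5, 1.0, 10.0, 1000.0))
```

With these numbers, at_noise_level is dominated by N/Q ≈ 1000 while at_coarse is roughly 1 + d log(N+1) ≈ 15, i.e., logarithmic in N, which is exactly the effect Theorem 1 exploits.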
Choosing a larger $Q$ increases the latter but decreases the former.

VI. PROOF OF MAIN RESULT
$$\sup_{\prod_{i\in\mathcal{N}} p(x_i)p(\hat y_i\mid y_i,x_i)} \;\; \sup_{\mathcal{M}\subseteq\mathcal{N}} \;\; \min_{\Omega\subseteq\mathcal{M}:\, s\in\Omega,\, d\in\mathcal{M}\setminus\Omega} \Big( I(X_\Omega; \hat Y_{\Omega^c} \mid X_{\Omega^c}) - I(Y_\Omega; \hat Y_\Omega \mid X_{\mathcal{M}}, \hat Y_{\Omega^c}) \Big) \qquad (11)$$

In this section we prove Theorem 1 by evaluating the rate achieved by noisy network coding in (3) for a specific choice of the distribution $\prod_{k\in\mathcal{N}} p(x_k)\,p(\hat y_k\mid y_k,x_k)$ that satisfies the power constraint. We choose the channel input vector at each node $j$ as $X_j \sim \mathcal{CN}(0, P\,I)$, and $\hat Y_k$ for each receive antenna in the network is chosen such that
$$\hat Y_k = Y_k + \hat Z_k, \quad \text{where } \hat Z_k \sim \mathcal{CN}(0, Q\sigma^2), \qquad (12)$$
independent of everything else, for some $Q > 0$. Then the achievable rate stated in (3) is given by
$$\min_{\Omega:\, s\in\Omega,\, d\in\Omega^c} \Big( I(X_\Omega; \hat Y_{\Omega^c} \mid X_{\Omega^c}) - I(Y_\Omega; \hat Y_\Omega \mid X_{\mathcal{N}}, \hat Y_{\Omega^c}) \Big). \qquad (13)$$
This implies that the following rate is also achievable:
$$\min_{\Omega:\, s\in\Omega,\, d\in\Omega^c} I(X_\Omega; \hat Y_{\Omega^c} \mid X_{\Omega^c}) - \max_{\Omega:\, s\in\Omega,\, d\in\Omega^c} I(Y_\Omega; \hat Y_\Omega \mid X_{\mathcal{N}}, \hat Y_{\Omega^c}). \qquad (14)$$
We first show that, for the choice of the distribution of the $X_j$'s and $\hat Y_k$'s in (12), we have $I(Y_\Omega; \hat Y_\Omega \mid X_{\mathcal{N}}, \hat Y_{\Omega^c}) \le N/Q$ for all cuts $\Omega$ such that $s\in\Omega$, $d\in\Omega^c$, as follows:
$$\begin{aligned}
I(Y_\Omega; \hat Y_\Omega \mid X_{\mathcal{N}}, \hat Y_{\Omega^c})
&= h(\hat Y_\Omega \mid X_{\mathcal{N}}, \hat Y_{\Omega^c}) - h(\hat Y_\Omega \mid Y_\Omega, X_{\mathcal{N}}, \hat Y_{\Omega^c}) \\
&\stackrel{(a)}{=} h(\hat Y_\Omega \mid X_{\mathcal{N}}, \hat Y_{\Omega^c}) - h(\hat Y_\Omega \mid Y_\Omega, X_{\mathcal{N}}) \\
&\le h(\hat Y_\Omega \mid X_{\mathcal{N}}) - h(\hat Y_\Omega \mid Y_\Omega, X_{\mathcal{N}}) \\
&\stackrel{(b)}{=} \sum_{j\in\Omega} N_j \log\big(\pi e (Q+1)\sigma^2\big) - \sum_{j\in\Omega} N_j \log\big(\pi e\, Q\sigma^2\big) \\
&= \sum_{j\in\Omega} N_j \log\Big(1+\frac{1}{Q}\Big) \;\le\; \frac{N}{Q}, \qquad (15)
\end{aligned}$$
where both (a) and (b) follow from our specific choice of the distribution $\prod_{k\in\mathcal{N}} p(x_k)\,p(\hat y_k\mid y_k,x_k)$. Hence,
$$\max_{\Omega:\, s\in\Omega,\, d\in\Omega^c} I(Y_\Omega; \hat Y_\Omega \mid X_{\mathcal{N}}, \hat Y_{\Omega^c}) \le \frac{N}{Q}. \qquad (16)$$
We now lower bound the first term in (14). Since $X_\Omega$ is chosen to be $\mathcal{CN}(0, P\,I)$, the quantity $I(X_\Omega; \hat Y_{\Omega^c} \mid X_{\Omega^c})$ equals $C^{\mathrm{i.i.d.}}_Q(\Omega)$, where $C^{\mathrm{i.i.d.}}_Q(\Omega)$ is defined in (4).
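Both penalty bounds used above reduce to elementary scalar inequalities. The following sketch (ours, using natural logarithms) checks them numerically: the index-communication penalty per receive antenna in (15), and the SNR-scaling step used later in (20), which, via the eigenvalues of the PSD matrix, reduces to a scalar fact.

```python
import math

# (15)/(16): the per-antenna index-communication penalty log(1 + 1/Q)
# is at most 1/Q for every Q > 0.
idx_penalty_ok = all(
    math.log(1 + 1 / Q) <= 1 / Q
    for Q in [0.25, 0.5, 1, 2, 8, 64, 1024]
)

# (20): since log det(I + A) = sum_i log(1 + lam_i) over the eigenvalues
# lam_i >= 0 of a PSD matrix A, scaling the SNR down by (Q+1) costs at
# most log(Q+1) per nonzero eigenvalue:
#   log(1 + lam/(Q+1)) >= log(1 + lam) - log(Q+1).
snr_penalty_ok = all(
    math.log(1 + lam / (Q + 1)) >= math.log(1 + lam) - math.log(Q + 1) - 1e-12
    for lam in [0.0, 0.1, 1.0, 10.0, 1e4]
    for Q in [0.5, 1, 3, 100]
)
```

With a logarithm of base other than $e$, the first bound picks up only a constant factor; the direction of the tradeoff is unchanged.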
Let $\Omega^*_Q$ denote the cut with minimal cut value, as defined in (5). Then,
$$\begin{aligned}
\min_{\Omega:\, s\in\Omega,\, d\in\Omega^c} I(X_\Omega; \hat Y_{\Omega^c} \mid X_{\Omega^c})
&= \min_{\Omega:\, s\in\Omega,\, d\in\Omega^c} C^{\mathrm{i.i.d.}}_Q(\Omega) = C^{\mathrm{i.i.d.}}_Q(\Omega^*_Q) \\
&\stackrel{(a)}{\ge} C^{\mathrm{i.i.d.}}(\Omega^*_Q) - d^*_Q\log(Q+1) \qquad (17) \\
&\stackrel{(b)}{\ge} C^{\mathrm{i.i.d.}}(\Omega^*) - d^*_Q\log(Q+1) \\
&\stackrel{(c)}{\ge} \sup_{p(x_\mathcal{N})} I(X_{\Omega^*}; Y_{(\Omega^*)^c}\mid X_{(\Omega^*)^c}) - d^*\log\Big(\frac{\sum_{i\in\Omega^*} M_i}{d^*}\Big) - d^*_Q\log(Q+1) \qquad (18) \\
&\ge \sup_{p(x_\mathcal{N})} I(X_{\Omega^*}; Y_{(\Omega^*)^c}\mid X_{(\Omega^*)^c}) - d^*\log\Big(\frac{M}{d^*}\Big) - d^*_Q\log(Q+1) \\
&\ge \sup_{p(x_\mathcal{N})} \min_{\Omega:\, s\in\Omega,\, d\in\Omega^c} I(X_\Omega; Y_{\Omega^c}\mid X_{\Omega^c}) - d^*\log\Big(\frac{M}{d^*}\Big) - d^*_Q\log(Q+1) \\
&= \overline{C} - d^*\log\Big(\frac{M}{d^*}\Big) - d^*_Q\log(Q+1), \qquad (19)
\end{aligned}$$
where (a) is justified by the following:
$$\begin{aligned}
C^{\mathrm{i.i.d.}}_Q(\Omega^*_Q)
&= \log\det\Big(I + \frac{P}{(Q+1)\sigma^2}\, H_{\Omega^*_Q\to(\Omega^*_Q)^c} H^\dagger_{\Omega^*_Q\to(\Omega^*_Q)^c}\Big) \\
&\ge \log\det\Big(I + \frac{P}{\sigma^2}\, H_{\Omega^*_Q\to(\Omega^*_Q)^c} H^\dagger_{\Omega^*_Q\to(\Omega^*_Q)^c}\Big) - d^*_Q\log(Q+1) \\
&= C^{\mathrm{i.i.d.}}(\Omega^*_Q) - d^*_Q\log(Q+1); \qquad (20)
\end{aligned}$$
(b) follows by the definition of $\Omega^*$; and (c) follows from [5, Lemma 6.6], equation (144), which considers a MIMO channel with per-antenna power constraints and bounds the gap between its capacity and the largest rate achievable with no spatial coding, i.e., the rate achieved by using independent inputs at the antennas.

The proof of Theorem 1 follows from (16) and (19).

We next state an observation which will be useful in Section VIII, when we analyze the static layered network.

Remark 1.
If there exists a set of cuts $\mathcal{A}$ such that
$$\min_{\Omega:\, s\in\Omega,\, d\in\Omega^c} C^{\mathrm{i.i.d.}}_Q(\Omega) \ge \min_{\Omega\in\mathcal{A}:\, s\in\Omega,\, d\in\Omega^c} C^{\mathrm{i.i.d.}}_Q(\Omega) - \kappa$$
for all $Q$, where $\kappa$ is a constant, then the gap between the upper and lower bounds in Theorem 1 can potentially be improved to
$$\tilde d^* \log\Big(\frac{M}{\tilde d^*}\Big) + \frac{N}{Q} + \tilde d^*_Q \log(Q+1) + \kappa, \qquad (21)$$
where
$$\tilde d^*_Q \triangleq \mathrm{DOF}\Big(\arg\min_{\Omega\in\mathcal{A}:\, s\in\Omega,\, d\in\Omega^c} C^{\mathrm{i.i.d.}}_Q(\Omega)\Big). \qquad (22)$$
This can be seen by modifying the proof of the lower bound (19) slightly, as
$$\begin{aligned}
\min_{\Omega:\, s\in\Omega,\, d\in\Omega^c} I(X_\Omega;\hat Y_{\Omega^c}\mid X_{\Omega^c})
&= \min_{\Omega:\, s\in\Omega,\, d\in\Omega^c} C^{\mathrm{i.i.d.}}_Q(\Omega) \\
&\ge \min_{\Omega\in\mathcal{A}:\, s\in\Omega,\, d\in\Omega^c} C^{\mathrm{i.i.d.}}_Q(\Omega) - \kappa \\
&\ge \min_{\Omega\in\mathcal{A}:\, s\in\Omega,\, d\in\Omega^c} C^{\mathrm{i.i.d.}}(\Omega) - \tilde d^*_Q \log(Q+1) - \kappa \\
&\ge \overline{C} - \tilde d^* \log\Big(\frac{M}{\tilde d^*}\Big) - \tilde d^*_Q \log(Q+1) - \kappa,
\end{aligned}$$
where each step follows by the same arguments as in (19).

VII. FAST-FADING LAYERED NETWORK
In this section, we concentrate on the fast-fading layered network defined in Section II-B and obtain an approximation for the capacity of this network.
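The cut values that drive this section are expectations of the form $\mathbb{E}[\log\det(I + c\,HH^\dagger)]$ for $H$ with i.i.d. $\mathcal{CN}(0,1)$ entries. A minimal Monte Carlo sketch of such a quantity (the helper name `f_mc` and the parameters are ours, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)

def f_mc(x, y, c=1.0, trials=4000):
    """Monte Carlo estimate of E[log det(I + c H H^dagger)] for an
    x-by-y matrix H with i.i.d. CN(0, 1) entries."""
    total = 0.0
    for _ in range(trials):
        H = (rng.standard_normal((x, y))
             + 1j * rng.standard_normal((x, y))) / np.sqrt(2)
        total += np.linalg.slogdet(np.eye(x) + c * H @ H.conj().T)[1]
    return total / trials

# HH† and H†H share their nonzero eigenvalues, so the estimate is
# symmetric in (x, y) up to Monte Carlo noise -- the symmetry property
# of f_Q used later in the appendices.
a, b = f_mc(2, 5), f_mc(5, 2)
```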
A. Applying Theorem 1 to the fast-fading layered network
For the fast-fading setup, we assume that the destination knows all the instantaneous channel realizations in the network, while the source and the relay nodes know only the statistics of the channel coefficients. We first note that under this assumption, the cutset bound and the noisy network coding rate can be expressed as follows.

- Cutset bound: Noting that under the above assumption the effective received signal at the destination can be taken to be $(Y_d, H)$, where $H$ contains all the channel realizations in the network, the cutset bound in (1) can be written as
$$\overline{C} = \sup_{p(x_\mathcal{N})} \Big(\min_{\Omega:\, s\in\Omega,\, d\in\Omega^c} \overline{C}(\Omega)\Big), \qquad (23)$$
where $\overline{C}(\Omega) \triangleq I(X_\Omega; Y_{\Omega^c}, H \mid X_{\Omega^c}) = I(X_\Omega; Y_{\Omega^c} \mid X_{\Omega^c}, H)$, since $X_\mathcal{N}$ is independent of $H$.
- Noisy network coding: The rate achieved by noisy network coding is given by (24) at the top of the next page, where we have again used the fact that $X_\mathcal{N}$ is independent of $H$.

We now proceed to the proof of Theorem 2. We first note that by following steps similar to those in the proof of Theorem 1, we obtain
$$\overline{C} \;\ge\; C \;\ge\; \overline{C} - d^*\log\Big(\frac{M}{d^*}\Big) - \frac{N}{Q} - d^*_Q\log(Q+1), \qquad (25)$$
where $d^*_Q$ is now analogously defined as the expected degrees of freedom of the fast-fading MIMO channel corresponding to the cut $\Omega^*_Q$ that minimizes $\mathbb{E}[C^{\mathrm{i.i.d.}}_Q(\Omega)]$, which we express as
$$d^*_Q \triangleq \mathrm{DOF}\Big(\arg\min_{\Omega:\, s\in\Omega,\, d\in\Omega^c} \mathbb{E}\big[C^{\mathrm{i.i.d.}}_Q(\Omega)\big]\Big),$$
and the expectation is with respect to the randomness in the channels. Note that when we proved Theorem 1, we defined $C^{\mathrm{i.i.d.}}_Q(\Omega)$ to be the first mutual information term in the achievable rate for noisy network coding in (13) when the input distributions $X_j$ are i.i.d. $\mathcal{CN}(0, P\,I)$ and the $\hat Y_k$'s are chosen according to (12). In the current fast-fading case, the first mutual information term in the achievable rate for noisy network coding in (24) is equal to $\mathbb{E}[C^{\mathrm{i.i.d.}}_Q(\Omega)]$ under the same distribution for the $X_j$'s and $\hat Y_k$'s.
Therefore, the proof of Theorem 1 applies verbatim in the current case, with only the definition of $d^*_Q$ modified accordingly.

Now, choosing $Q$ equal to $Q' = D-1$, we get
$$\begin{aligned}
C &\ge \overline{C} - d^*\log\Big(\frac{M}{d^*}\Big) - \frac{N}{Q'} - d^*_{Q'}\log(Q'+1) \\
&= \overline{C} - d^*\log\Big(\frac{K(D-1)}{d^*}\Big) - \frac{K(D-1)}{Q'} - d^*_{Q'}\log(Q'+1) \\
&\stackrel{(a)}{=} \overline{C} - K\log\Big(\frac{K(D-1)}{K}\Big) - \frac{K(D-1)}{Q'} - K\log(Q'+1) \\
&\stackrel{(b)}{\ge} \overline{C} - K\log D - K - K\log D \\
&= \overline{C} - 2K\log D - K,
\end{aligned}$$
where
- (a) follows from Lemma 1, provided below, which states that $d^*_Q = K$ for any $Q > 0$; and
- (b) follows since $Q' = D-1$.

Thus, we have characterized the capacity of the fast-fading layered network within a gap of $2K\log D + K$. The next subsection describes how this result can be tightened to obtain a gap of $K\log D + K$, which will conclude the proof of Theorem 2.

Lemma 1.
For the fast-fading layered network, we have, for any $Q \ge 0$,
$$\min_{\Omega:\, s\in\Omega,\, d\in\Omega^c} \mathbb{E}\big[C^{\mathrm{i.i.d.}}_Q(\Omega)\big] = \mathbb{E}\big[C^{\mathrm{i.i.d.}}_Q(\mathcal{V}_1)\big],$$
which implies $d^*_Q = K$.

Proof:
See Appendix A.
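To see numerically what the choice $Q' = D-1$ buys, the following sketch (our own illustration, in natural-log units, with the reconstructed gap expression of Theorem 1 specialized via Lemma 1 to $d^* = d^*_Q = K$ and $M = N = K(D-1)$) compares noise-level quantization ($Q = 1$) against the coarse choice $Q = D-1$:

```python
import math

def gap(K, D, Q):
    # Theorem 1 gap, specialized to the fast-fading layered network:
    # d* log(M/d*) + N/Q + d*_Q log(Q+1), with d* = d*_Q = K and
    # M = N = K(D-1).
    N = K * (D - 1)
    return K * math.log(N / K) + N / Q + K * math.log(Q + 1)

K = 3
depths = [4, 16, 64, 256]
noise_level = [gap(K, D, Q=1) for D in depths]   # grows linearly in D
coarse = [gap(K, D, Q=D - 1) for D in depths]    # grows like 2K log D

# The coarse choice stays under the stated 2K log D + K bound.
bounds = [2 * K * math.log(D) + K for D in depths]
```

The linear-in-$D$ growth of the $Q=1$ column against the logarithmic growth of the $Q=D-1$ column is exactly the "quantize more coarsely in deeper networks" principle of the paper.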
B. Tightening the approximation
The main idea in tightening the approximation is that for the fast-fading layered network, we can remove the term $d^*\log\big(\frac{M}{d^*}\big)$ from the gap given by Theorem 1.

Recall from the proof of Theorem 1 that this term appears because we need to bound the difference between the capacity of a MIMO channel with per-antenna power constraints and the rate achievable by using independent inputs at each antenna. However, for an i.i.d. Rayleigh fast-fading MIMO channel, independent inputs at each antenna are optimal, so the largest rate achievable with independent inputs equals the capacity [19].
$$R_{\mathrm{NNC}} = \sup_{\prod_{k\in\mathcal{N}} p(x_k)p(\hat y_k\mid y_k,x_k)} \;\; \min_{\Omega:\, s\in\Omega,\, d\in\Omega^c} \Big( I(X_\Omega;\hat Y_{\Omega^c}\mid X_{\Omega^c}, H) - I(Y_\Omega;\hat Y_\Omega\mid X_\mathcal{N},\hat Y_{\Omega^c}, H) \Big) \qquad (24)$$
Then the proof of (25), which is based on the proof of Theorem 1, can be repeated verbatim except for one change: in (18), the term $d^*\log\big(\frac{\sum_{i\in\Omega^*} M_i}{d^*}\big)$ can be removed. This is valid since $\Omega^* = \mathcal{V}_1$, as shown by Lemma 1, which induces an i.i.d. Rayleigh fast-fading $K\times K$ MIMO channel. This improves the lower bound obtained in the previous subsection from $\overline{C} - 2K\log D - K$ to $\overline{C} - K\log D - K$. For clarity, we present the argument in full below.

We first define, for any $Q \ge 0$,
$$f_Q(x,y) \triangleq \mathbb{E}\Big[\log\det\Big(I + \frac{P}{(Q+1)\sigma^2}\, H_{x,y}H^\dagger_{x,y}\Big)\Big], \qquad (26)$$
where $H_{x,y}$ is an $x\times y$ matrix with i.i.d. $\mathcal{CN}(0,1)$ entries. Note that with this notation, $\mathbb{E}[C^{\mathrm{i.i.d.}}_Q(\mathcal{V}_1)]$ equals $f_Q(K,K)$, and $f_0$ denotes the case $Q=0$.

Using this notation, the statement of Lemma 1 is
$$\min_{\Omega:\, s\in\Omega,\, d\in\Omega^c} \mathbb{E}\big[C^{\mathrm{i.i.d.}}_Q(\Omega)\big] = \mathbb{E}\big[C^{\mathrm{i.i.d.}}_Q(\mathcal{V}_1)\big] = f_Q(K,K).
(27)

Before proceeding to the proof of the lower bound, we give the following lemma, which states that the cutset bound defined in (23), which involves a maximization over all possible input distributions, is equal to $\min_{\Omega:\, s\in\Omega,\, d\in\Omega^c} \mathbb{E}\big[C^{\mathrm{i.i.d.}}(\Omega)\big]$.

Lemma 2.
For the fast-fading layered network,
$$\overline{C} = \min_{\Omega:\, s\in\Omega,\, d\in\Omega^c} \mathbb{E}\big[C^{\mathrm{i.i.d.}}(\Omega)\big],$$
and hence $\overline{C}$ also equals $\mathbb{E}\big[C^{\mathrm{i.i.d.}}(\mathcal{V}_1)\big] = f_0(K,K)$.

Proof:
See Appendix B.

Using the above lemma, we can now complete the proof of the tighter lower bound via the following chain of inequalities. Recall that the $X_j$ are chosen i.i.d. $\mathcal{CN}(0, P\,I)$ and the $\hat Y_k$'s are chosen according to (12). As in the previous subsection, we set $Q$ equal to $Q' = D-1$.
$$\begin{aligned}
C &\stackrel{(a)}{\ge} \min_{\Omega:\, s\in\Omega,\, d\in\Omega^c} \Big( I(X_\Omega;\hat Y_{\Omega^c}\mid X_{\Omega^c},H) - I(Y_\Omega;\hat Y_\Omega\mid X_\mathcal{N},\hat Y_{\Omega^c},H) \Big) \\
&\ge \min_{\Omega:\, s\in\Omega,\, d\in\Omega^c} I(X_\Omega;\hat Y_{\Omega^c}\mid X_{\Omega^c},H) - \max_{\Omega:\, s\in\Omega,\, d\in\Omega^c} I(Y_\Omega;\hat Y_\Omega\mid X_\mathcal{N},\hat Y_{\Omega^c},H) \\
&\stackrel{(b)}{\ge} \min_{\Omega:\, s\in\Omega,\, d\in\Omega^c} I(X_\Omega;\hat Y_{\Omega^c}\mid X_{\Omega^c},H) - \frac{K(D-1)}{Q'} \\
&= \min_{\Omega:\, s\in\Omega,\, d\in\Omega^c} \mathbb{E}\big[C^{\mathrm{i.i.d.}}_{Q'}(\Omega)\big] - \frac{K(D-1)}{Q'} \\
&\stackrel{(c)}{=} f_{Q'}(K,K) - \frac{K(D-1)}{Q'} \\
&\stackrel{(d)}{\ge} f_0(K,K) - K\log(Q'+1) - \frac{K(D-1)}{Q'} \\
&\stackrel{(e)}{=} \overline{C} - K\log(Q'+1) - \frac{K(D-1)}{Q'} \\
&= \overline{C} - K\log D - K, \qquad (28)
\end{aligned}$$
where
- (a) gives the rate achieved by noisy network coding,
- (b) follows since, similarly to (15), $\max_{\Omega:\, s\in\Omega,\, d\in\Omega^c} I(Y_\Omega;\hat Y_\Omega\mid X_\mathcal{N},\hat Y_{\Omega^c},H) \le \frac{K(D-1)}{Q'}$,
- (c) follows from (27),
- (d) follows, similarly to (17), because
$$f_{Q'}(K,K) = \mathbb{E}\Big[\log\det\Big(I + \frac{P}{(Q'+1)\sigma^2}H_{K,K}H^\dagger_{K,K}\Big)\Big] \ge \mathbb{E}\Big[\log\det\Big(I + \frac{P}{\sigma^2}H_{K,K}H^\dagger_{K,K}\Big)\Big] - K\log(Q'+1) = f_0(K,K) - K\log(Q'+1), \qquad (29)$$
- (e) follows from Lemma 2. Note the difference between this step and the corresponding step (18) in the proof of Theorem 1: for general networks the term $d^*\log\big(\frac{\sum_{i\in\mathcal{N}} M_i}{d^*}\big)$ is required, while for the special case of fast-fading layered networks we are able to remove it.

This concludes the proof of Theorem 2.

C. Proof of Corollary 1
In this subsection, we prove that the result of Theorem 2 can be extended to the case of multiple sources. Assume that $K$ single-antenna sources each wish to transmit a message at rate $R/K$, so that the sum-rate is $R$. Via the cutset bound, we have the following upper bound on the achievable sum-rate:
$$R < \sup_{p(x_\mathcal{N})} \min_{\Omega:\, s_1,s_2,\ldots,s_K\in\Omega,\, d\in\Omega^c} I(X_\Omega; Y_{\Omega^c}\mid X_{\Omega^c}, H).$$
The right-hand side of the above expression equals the cutset bound on the achievable rate in the case of a single source, as given in (23). Hence, if a sum-rate $R$ is achievable, it must satisfy $R < \overline{C}$.
This proves the upper bound on the sum-capacity. In the remainder of this subsection, we focus on proving the lower bound. As before, we fix the distribution $p(x_\mathcal{N})$ to be $\prod_{k\in\mathcal{N}} p(x_k)$, with each term being $\mathcal{CN}(0,P)$. The distribution $p(\hat y_k\mid y_k,x_k)$ at the relays is chosen of the same form as in (12). From the result for multiple sources stated in [6, Theorem 1], we get that $R$ is achievable if, for all $1 \le k \le K$,
$$\frac{kR}{K} < \min_{\Omega:\, |\{s_i:\, s_i\in\Omega\}| = k,\, d\in\Omega^c} \Big( I(X_\Omega;\hat Y_{\Omega^c}\mid X_{\Omega^c}, H) - I(Y_\Omega;\hat Y_\Omega\mid X_\mathcal{N},\hat Y_{\Omega^c}, H) \Big). \qquad (30)$$
For a given $k$, the above constraint is obtained by considering cuts $\Omega$ which contain $k$ source nodes, and it therefore upper bounds the sum-rate $kR/K$ achievable for these $k$ sources. Note that we get a constraint on $R$ for each value of $k$, where $k\in\{1,2,\ldots,K\}$. Also note that for $k=K$, we get a constraint on $R$ that is the same as (24). So if this were the only constraint on $R$, the proof of Theorem 2 in Section VII-B, which shows that the right-hand side of (24) is larger than $\overline{C} - K\log D - K$, would conclude the proof of Corollary 1. Towards this goal, we prove in Appendix C that any $k < K$ imposes a constraint on $R$ that is only looser than the constraint
$$R < \overline{C} - K\log D - K = f_0(K,K) - K\log D - K.$$
This concludes the proof of Corollary 1.

VIII. STATIC LAYERED NETWORKS
In this section, we prove Theorem 3. We first show that, for any $Q \ge 0$, $\min_{\Omega:\, s\in\Omega,\, d\in\Omega^c} C^{\mathrm{i.i.d.}}_Q(\Omega)$ can be approximated up to an additive constant by restricting the minimization to cuts in a particular class. Theorem 3 is then proved using Remark 1.

For convenience, let $H_{\mathcal{V}_i\to\mathcal{V}_{i+1}}$ denote the matrix in $\mathbb{C}^{K\times K}$ containing the channel gains from the nodes in layer $i$ to the nodes in layer $i+1$, and call the $K^2$ entries of $H_{\mathcal{V}_i\to\mathcal{V}_{i+1}}$ the links in layer $i$. With this convention in mind, let $\mathcal{A}$ denote the set of cuts $\Omega$ for which the links crossing from $\Omega$ to $\Omega^c$ come from at most $K-1$ layers; see Figure 6 for an example.

Fig. 6: The cut $\Omega$ depicted here is not in $\mathcal{A}$, since the crossing links come from 4 layers and $4 > K-1$.

Lemma 3.
For the static layered network of Section II-C, we have, for any $Q \ge 0$,
$$\min_{\Omega:\, s\in\Omega,\, d\in\Omega^c} C^{\mathrm{i.i.d.}}_Q(\Omega) \le \min_{\Omega\in\mathcal{A}:\, s\in\Omega,\, d\in\Omega^c} C^{\mathrm{i.i.d.}}_Q(\Omega),$$
and
$$\min_{\Omega:\, s\in\Omega,\, d\in\Omega^c} C^{\mathrm{i.i.d.}}_Q(\Omega) \ge \min_{\Omega\in\mathcal{A}:\, s\in\Omega,\, d\in\Omega^c} C^{\mathrm{i.i.d.}}_Q(\Omega) - K\log K.$$

Proof:
The upper bound is immediate. The lower bound follows by noting that the chain of inequalities displayed after the proof of Lemma 4 holds for any cut $\Omega\notin\mathcal{A}$, where (a) follows since for any cut not in $\mathcal{A}$, at least $K$ terms in the summation are non-zero, and each of these terms can be lower bounded by the AWGN capacity of a point-to-point channel between a single transmit antenna and a single receive antenna with unit-magnitude channel coefficient; and (b) follows by Lemma 4, which is stated and proved below. This concludes the proof of the lemma.

Lemma 4.
For the static layered network of Section II-C, we have, for any $Q \ge 0$,
$$C^{\mathrm{i.i.d.}}_Q(\mathcal{V}_1) \le K\log\Big(1+\frac{P}{(Q+1)\sigma^2}\Big) + K\log K.$$

Proof:
$$\begin{aligned}
C^{\mathrm{i.i.d.}}_Q(\mathcal{V}_1)
&= \log\det\Big(I + \frac{P}{(Q+1)\sigma^2}\, H_{\mathcal{V}_1\to\mathcal{V}_2} H^\dagger_{\mathcal{V}_1\to\mathcal{V}_2}\Big) \\
&\stackrel{(a)}{\le} \sum_{i=1}^{K} \log\Big(1 + \frac{P}{(Q+1)\sigma^2}\, h_i h_i^\dagger\Big) \\
&\stackrel{(b)}{=} \sum_{i=1}^{K} \log\Big(1 + \frac{PK}{(Q+1)\sigma^2}\Big) \\
&\le K\log\Big(1 + \frac{P}{(Q+1)\sigma^2}\Big) + K\log K,
\end{aligned}$$
where $h_i$ denotes the $i$th row of $H_{\mathcal{V}_1\to\mathcal{V}_2}$, (a) follows by Hadamard's inequality, and (b) follows from the fact that the channel gains have unit magnitude.

The chain of inequalities used in the proof of Lemma 3 is, for any cut $\Omega\notin\mathcal{A}$,
$$C^{\mathrm{i.i.d.}}_Q(\Omega) = \sum_{i=0}^{D-1} \log\det\Big(I + \frac{P}{(Q+1)\sigma^2}\, H_{(\mathcal{V}_i\cap\Omega)\to(\mathcal{V}_{i+1}\cap\Omega^c)} H^\dagger_{(\mathcal{V}_i\cap\Omega)\to(\mathcal{V}_{i+1}\cap\Omega^c)}\Big) \stackrel{(a)}{\ge} K\log\Big(1+\frac{P}{(Q+1)\sigma^2}\Big) \stackrel{(b)}{\ge} C^{\mathrm{i.i.d.}}_Q(\mathcal{V}_1) - K\log K \ge \min_{\Omega\in\mathcal{A}:\, s\in\Omega,\, d\in\Omega^c} C^{\mathrm{i.i.d.}}_Q(\Omega) - K\log K.$$

We now use the observation made in Remark 1 to prove Theorem 3. As in the previous section, first note that $M = N = K(D-1)$. Then we note that for any cut $\Omega$ in $\mathcal{A}$, the matrix $H_{\Omega\to\Omega^c}$ can have at most $K(K-1)$ columns. This is because the links crossing from $\Omega$ to $\Omega^c$ come from at most $K-1$ layers, hence there can be at most $K(K-1)$ nodes in $\Omega$ from which the crossing links originate. Hence, a trivial upper bound on $\tilde d^*_Q$ (defined in (22)) for any $Q$ is
$$\tilde d^*_Q \le K(K-1) \le K^2. \qquad (31)$$
Now, we set $Q$ to $Q' = D-1$ and use the result in (21) to prove Theorem 3 as follows:
$$\begin{aligned}
C &\ge \overline{C} - \tilde d^*\log\Big(\frac{M}{\tilde d^*}\Big) - \frac{N}{Q'} - \tilde d^*_{Q'}\log(Q'+1) - \kappa \\
&\stackrel{(a)}{\ge} \overline{C} - \tilde d^*\log\Big(\frac{M}{\tilde d^*}\Big) - \frac{N}{Q'} - \tilde d^*_{Q'}\log(Q'+1) - K\log K \\
&\stackrel{(b)}{\ge} \overline{C} - K^2\log\Big(\frac{K(D-1)}{K^2}\Big) - \frac{K(D-1)}{Q'} - K^2\log(Q'+1) - K\log K \\
&\stackrel{(c)}{=} \overline{C} - K^2\log\Big(\frac{D-1}{K}\Big) - K - K^2\log D - K\log K \\
&\ge \overline{C} - 2K^2\log D - K\log K - K,
\end{aligned}$$
where (a) follows by Lemma 3, (b) follows from (31) and the fact that $x\log(1+M/x)$ is an increasing function of $x$, and (c) follows since $Q' = D-1$.
This concludes the proof of Theorem 3.

IX. CONCLUDING REMARKS
In this paper, we have developed improved capacity approximations for Gaussian relay networks. While existing approximations bound the capacity gap only in terms of the total number of nodes in the network, we have developed a refined approximation for the capacity of general Gaussian relay networks in which the gap depends not only on the total number of nodes but also on other structural properties of the network (the degrees of freedom of the min-cut). We have shown that this refined result allows us to better approximate the capacity of many Gaussian networks, in particular some classes of layered networks.

The improvement comes from carefully exploiting a tradeoff inherent to compress-and-forward based strategies. When relays quantize/compress signals very finely, little quantization noise is introduced into the communication. When relays quantize/compress signals coarsely, there is a smaller rate penalty associated with communicating the quantization indices to the destination. We have shown that this tradeoff can be very much in favor of coarse quantization, leading to the counter-intuitive principle of quantizing signals more and more coarsely with increasing number of relaying stages.

APPENDIX A
PROOF OF LEMMA 1

Proof:
By the definition of $C^{\mathrm{i.i.d.}}_Q(\Omega)$,
$$\mathbb{E}\big[C^{\mathrm{i.i.d.}}_Q(\Omega)\big] = \mathbb{E}\Big[\log\det\Big(I + \frac{P}{(Q+1)\sigma^2}\, H_{\Omega\to\Omega^c} H^\dagger_{\Omega\to\Omega^c}\Big)\Big].$$
We first note that for any cut $\Omega$ in the set $\{\mathcal{V}_1, \mathcal{V}_2, \ldots, \mathcal{V}_{D-1}\}$, the statistics of $H_{\Omega\to\Omega^c}$ are identical. Hence the value of $\mathbb{E}[C^{\mathrm{i.i.d.}}_Q(\Omega)]$ is the same for all these cuts, and we use $\mathcal{V}_1$ as a representative.

We now prove the statement: for any $Q \ge 0$,
$$\min_{\Omega:\, s\in\Omega,\, d\in\Omega^c} \mathbb{E}\big[C^{\mathrm{i.i.d.}}_Q(\Omega)\big] = \mathbb{E}\big[C^{\mathrm{i.i.d.}}_Q(\mathcal{V}_1)\big]. \qquad (32)$$
The "$\le$" direction of the equality, i.e.,
$$\min_{\Omega:\, s\in\Omega,\, d\in\Omega^c} \mathbb{E}\big[C^{\mathrm{i.i.d.}}_Q(\Omega)\big] \le \mathbb{E}\big[C^{\mathrm{i.i.d.}}_Q(\mathcal{V}_1)\big],$$
is immediate. We focus on proving the inequality in the other direction in the remainder of this proof.

Consider a cut $\Omega$ that contains $M_1$ nodes from $\mathcal{V}_1$, $M_2$ from $\mathcal{V}_2$, and so on until $M_{D-1}$ from $\mathcal{V}_{D-1}$ (see Figure 6). Then $\mathbb{E}[C^{\mathrm{i.i.d.}}_Q(\Omega)]$ is given by
$$\mathbb{E}\Big[\log\det\Big(I + \frac{P}{(Q+1)\sigma^2}\, H_{\Omega\to\Omega^c} H^\dagger_{\Omega\to\Omega^c}\Big)\Big],$$
where $H_{\Omega\to\Omega^c}$ is a block diagonal matrix containing blocks of size $M^c_1$-by-$K$, $M^c_2$-by-$M_1$, $M^c_3$-by-$M_2$, \ldots, $M^c_{D-1}$-by-$M_{D-2}$, and finally $K$-by-$M_{D-1}$. In the preceding sentence, we have abused notation slightly by using $M^c_i$ to mean $|\mathcal{V}_i| - M_i = K - M_i$.

Since $H_{\Omega\to\Omega^c}$ has a block diagonal structure, $\mathbb{E}[C^{\mathrm{i.i.d.}}_Q(\Omega)]$ breaks down into a sum of terms, each a function of the number of nodes in $\Omega$ that belong to two adjacent layers. Thus,
$$\mathbb{E}\big[C^{\mathrm{i.i.d.}}_Q(\Omega)\big] = f_Q(M^c_1, K) + f_Q(M^c_2, M_1) + \cdots + f_Q(M^c_{D-1}, M_{D-2}) + f_Q(K, M_{D-1}), \qquad (33)$$
where $f_Q(x,y)$ is defined as in (26):
$$f_Q(x,y) \triangleq \mathbb{E}\Big[\log\det\Big(I + \frac{P}{(Q+1)\sigma^2}\, H_{x,y} H^\dagger_{x,y}\Big)\Big],$$
and $H_{x,y}$ is an $x\times y$ matrix containing i.i.d. $\mathcal{CN}(0,1)$ entries. Note that with this notation, $\mathbb{E}[C^{\mathrm{i.i.d.}}_Q(\mathcal{V}_1)]$ is equal to $f_Q(K,K)$.
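The decomposition in (33) uses only the fact that the log-determinant of $I + c\,HH^\dagger$ is additive across the diagonal blocks of a block-diagonal $H$. A minimal numerical check of this step (the block sizes here are arbitrary, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

def cplx(shape):
    # i.i.d. CN(0, 1) entries.
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

c = 1.5
H1, H2 = cplx((2, 3)), cplx((4, 1))
# Block-diagonal H, as in the cut matrix H_{Omega -> Omega^c}.
H = np.block([[H1, np.zeros((2, 1))],
              [np.zeros((4, 3)), H2]])

# log det over the whole matrix vs. the sum over the diagonal blocks.
whole = np.linalg.slogdet(np.eye(6) + c * H @ H.conj().T)[1]
parts = (np.linalg.slogdet(np.eye(2) + c * H1 @ H1.conj().T)[1]
         + np.linalg.slogdet(np.eye(4) + c * H2 @ H2.conj().T)[1])
```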
So our aim is to show that for any cut $\Omega$, the quantity appearing in (33) is no less than $f_Q(K,K)$.

To accomplish this, we note the following properties of the function $f_Q(x,y)$:
a) $f_Q(x,y) = f_Q(y,x)$;
b) $f_Q(z,y) \ge f_Q(x,y)$ if $z \ge x$;
c) $f_Q(x,y) + f_Q(K-x,y) \ge f_Q(K,y)$.
The first two properties are straightforward, and the third follows via a simple application of Hadamard's inequality. Proving that the quantity in (33) is no less than $f_Q(K,K)$ is just a matter of applying these properties multiple times. For concreteness, we show this below for the case $D=4$, which generalizes in a straightforward fashion to higher values of $D$:
$$\begin{aligned}
& f_Q(M^c_1, K) + f_Q(M^c_2, M_1) + f_Q(M^c_3, M_2) + f_Q(K, M_3) \\
&\ge f_Q(M^c_1, K) + f_Q(M^c_2, M_1) + f_Q(M^c_3, M_2) + f_Q(M_3, M_2) \\
&\ge f_Q(M^c_1, K) + f_Q(M^c_2, M_1) + f_Q(K, M_2) \\
&\ge f_Q(M^c_1, K) + f_Q(M^c_2, M_1) + f_Q(M_2, M_1) \\
&\ge f_Q(M^c_1, K) + f_Q(K, M_1) \\
&\ge f_Q(K, K) \qquad (34) \\
&= \mathbb{E}\big[C^{\mathrm{i.i.d.}}_Q(\mathcal{V}_1)\big],
\end{aligned}$$
where the first inequality follows by applying properties (b) and (a) to the last term in the first line, the second inequality follows by applying (c) to the last two terms of the previous line, and so on. Since this holds for any cut $\Omega$, we have shown that
$$\min_{\Omega:\, s\in\Omega,\, d\in\Omega^c} \mathbb{E}\big[C^{\mathrm{i.i.d.}}_Q(\Omega)\big] \ge \mathbb{E}\big[C^{\mathrm{i.i.d.}}_Q(\mathcal{V}_1)\big]. \qquad (35)$$
Thus (32) holds, i.e.,
$$\min_{\Omega:\, s\in\Omega,\, d\in\Omega^c} \mathbb{E}\big[C^{\mathrm{i.i.d.}}_Q(\Omega)\big] = \mathbb{E}\big[C^{\mathrm{i.i.d.}}_Q(\mathcal{V}_1)\big] = f_Q(K,K), \qquad (36)$$
which implies that $\mathcal{V}_1 \in \arg\min_{\Omega:\, s\in\Omega,\, d\in\Omega^c} \mathbb{E}\big[C^{\mathrm{i.i.d.}}_Q(\Omega)\big]$. This further implies that $d^*_Q = K$, since the DOF of the fast-fading MIMO channel corresponding to $\mathcal{V}_1$ is $K$.
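Property (c) in fact holds realization by realization: stacking an $x\times y$ block $H_1$ on a $(K-x)\times y$ block $H_2$ gives $H^\dagger H = H_1^\dagger H_1 + H_2^\dagger H_2$, and for PSD matrices $A, B$ one has $\det(I+A+B) \le \det(I+A)\det(I+B)$, which is the Hadamard-type step the proof refers to. A numerical sketch (arbitrary sizes, our choice):

```python
import numpy as np

rng = np.random.default_rng(3)

def cplx(shape):
    # i.i.d. CN(0, 1) entries.
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

x, K_minus_x, y, c = 2, 3, 4, 1.5
H1, H2 = cplx((x, y)), cplx((K_minus_x, y))

# Gram matrices of the two stacked blocks.
A = c * H1.conj().T @ H1
B = c * H2.conj().T @ H2
I = np.eye(y)

# det(I + A + B) <= det(I + A) det(I + B), in log form.
stacked = np.linalg.slogdet(I + A + B)[1]
split = np.linalg.slogdet(I + A)[1] + np.linalg.slogdet(I + B)[1]
```

Taking expectations of the two sides over the channel statistics gives exactly $f_Q(K,y) \le f_Q(x,y) + f_Q(K-x,y)$.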
APPENDIX B
PROOF OF LEMMA 2

$$\begin{aligned}
\overline{C} &= \sup_{p(x_\mathcal{N})} \Big(\min_{\Omega:\, s\in\Omega,\, d\in\Omega^c} \overline{C}(\Omega)\Big) = \sup_{p(x_\mathcal{N})} \Big(\min_{\Omega:\, s\in\Omega,\, d\in\Omega^c} I(X_\Omega; Y_{\Omega^c}\mid X_{\Omega^c}, H)\Big) \\
&\le \sup_{p(x_\mathcal{N})} I(X_{\mathcal{V}_1}; Y_{(\mathcal{V}_1)^c}\mid X_{(\mathcal{V}_1)^c}, H) \\
&\stackrel{(a)}{=} \mathbb{E}\Big[\log\det\Big(I + \frac{P}{\sigma^2}\, H_{\mathcal{V}_1\to(\mathcal{V}_1)^c} H^\dagger_{\mathcal{V}_1\to(\mathcal{V}_1)^c}\Big)\Big] = \mathbb{E}\big[C^{\mathrm{i.i.d.}}(\mathcal{V}_1)\big] \\
&\stackrel{(b)}{=} \min_{\Omega:\, s\in\Omega,\, d\in\Omega^c} \mathbb{E}\big[C^{\mathrm{i.i.d.}}(\Omega)\big] \le \sup_{p(x_\mathcal{N})} \Big(\min_{\Omega:\, s\in\Omega,\, d\in\Omega^c} I(X_\Omega;Y_{\Omega^c}\mid X_{\Omega^c},H)\Big) = \overline{C},
\end{aligned}$$
where (a) follows from the fact that for an i.i.d. Rayleigh fast-fading MIMO channel, the optimal input distribution is independent across antennas [19], and (b) follows from (32), which shows that the cut minimizing $\mathbb{E}[C^{\mathrm{i.i.d.}}(\Omega)]$ is $\mathcal{V}_1$.

APPENDIX
C

In this appendix, we elaborate on the argument required to prove the lower bound in Corollary 1. Consider a cut $\Omega$ such that $|\{s_i:\, s_i\in\Omega\}| = k$. Let $\Omega$ contain $M_i$ nodes from layer $\mathcal{V}_i$, for $1 \le i \le D-1$. As before, we choose the quantization noise parameter $Q$ to be $Q' = D-1$. This gives a constraint on the achievable sum-rate $R$ as follows:
$$\begin{aligned}
R &< \frac{K}{k}\Big( I(X_\Omega;\hat Y_{\Omega^c}\mid X_{\Omega^c},H) - I(Y_\Omega;\hat Y_\Omega\mid X_\mathcal{N},\hat Y_{\Omega^c},H) \Big) \\
&= \frac{K}{k}\Big( \mathbb{E}\big[C^{\mathrm{i.i.d.}}_{Q'}(\Omega)\big] - I(Y_\Omega;\hat Y_\Omega\mid X_\mathcal{N},\hat Y_{\Omega^c},H) \Big) \\
&= \frac{K}{k}\Big( f_{Q'}(M^c_1,k) + f_{Q'}(M^c_2,M_1) + \cdots + f_{Q'}(K,M_{D-1}) - I(Y_\Omega;\hat Y_\Omega\mid X_\mathcal{N},\hat Y_{\Omega^c},H) \Big),
\end{aligned}$$
where we use the notation $f_Q(x,y)$ defined in (26). Since
$$I(Y_\Omega;\hat Y_\Omega\mid X_\mathcal{N},\hat Y_{\Omega^c},H) \le \frac{\sum_{i=1}^{D-1} M_i}{Q'} = \frac{\sum_{i=1}^{D-1} M_i}{D-1},$$
which can be proved using steps similar to those used to arrive at (15), we can impose a tighter constraint on the sum-rate $R$ due to the cut $\Omega$, as follows:
$$R < \frac{K}{k}\Big( f_{Q'}(M^c_1,k) + f_{Q'}(M^c_2,M_1) + \cdots + f_{Q'}(K,M_{D-1}) - \frac{\sum_{i=1}^{D-1} M_i}{D-1} \Big). \qquad (37)$$
In the following, we show that for any $k < K$, the above is weaker than
$$R < f_0(K,K) - K\log D - K, \qquad (38)$$
i.e., the right-hand side of (37) for any $k < K$ is larger than $f_0(K,K) - K\log D - K$.

Note that if $f_0(K,K) - K\log D - K \le 0$, the achievable rate claimed by (38) is zero, so there is nothing to prove; we therefore assume that $f_0(K,K) - K\log D - K > 0$.
• If the cut $\Omega$ has $M_1 = M_2 = \cdots = M_{D-1} = 0$, then the expression in the constraint (37) becomes
$$\frac{K}{k}\Big( f_{Q'}(M^c_1,k) + f_{Q'}(M^c_2,M_1) + \cdots + f_{Q'}(K,M_{D-1}) - \frac{\sum_{i=1}^{D-1}M_i}{D-1} \Big) = \frac{K}{k} f_{Q'}(K,k) \stackrel{(a)}{\ge} f_{Q'}(K,K) \ge f_{Q'}(K,K) - K \stackrel{(b)}{\ge} f_0(K,K) - K\log D - K,$$
where (a) follows from Claim 1, provided at the end of this appendix, and (b) follows by the same argument as in (29).

• Otherwise, let $i^*$ be an index such that $M_{i^*} = \max_{1\le i\le D-1} M_i \ge 1$. The right-hand side of the constraint due to $\Omega$ is
$$\begin{aligned}
&\frac{K}{k}\Big( f_{Q'}(M^c_1,k) + f_{Q'}(M^c_2,M_1) + \cdots + f_{Q'}(K,M_{D-1}) - \frac{\sum_{i=1}^{D-1}M_i}{D-1} \Big) \\
&= \frac{K}{k} f_{Q'}(M^c_1,k) + \frac{K}{k}\Big( f_{Q'}(M^c_2,M_1) + \cdots + f_{Q'}(K,M_{D-1}) - \frac{\sum_{i=1}^{D-1}M_i}{D-1} \Big) \\
&\stackrel{(a)}{\ge} f_{Q'}(M^c_1,K) + \frac{K}{k}\Big( f_{Q'}(M^c_2,M_1) + \cdots + f_{Q'}(K,M_{D-1}) - \frac{\sum_{i=1}^{D-1}M_i}{D-1} \Big) \\
&\stackrel{(b)}{\ge} f_{Q'}(M^c_1,K) + \Big( f_{Q'}(M^c_2,M_1) + \cdots + f_{Q'}(K,M_{D-1}) - \frac{\sum_{i=1}^{D-1}M_i}{D-1} \Big) \\
&\stackrel{(c)}{\ge} f_{Q'}(K,K) - K \ge f_0(K,K) - K\log D - K,
\end{aligned}$$
where
- (a) follows by Claim 1,
- (b) follows because $K/k \ge 1$ and because
$$f_{Q'}(M^c_2,M_1) + \cdots + f_{Q'}(K,M_{D-1}) - \frac{\sum_{i=1}^{D-1}M_i}{D-1}$$
is non-negative, which is proved as follows:
$$\begin{aligned}
f_{Q'}(M^c_2,M_1) + \cdots + f_{Q'}(K,M_{D-1}) - \frac{\sum_{i=1}^{D-1}M_i}{D-1}
&\ge f_{Q'}(K,M_{i^*}) - \frac{\sum_{i=1}^{D-1}M_i}{D-1} \\
&\ge f_{Q'}(K,M_{i^*}) - M_{i^*} \\
&\ge \frac{M_{i^*}}{K} f_{Q'}(K,K) - M_{i^*} \\
&= \frac{M_{i^*}}{K}\big( f_{Q'}(K,K) - K \big) \\
&\ge \frac{M_{i^*}}{K}\big( f_0(K,K) - K\log D - K \big) \ge 0,
\end{aligned}$$
- (c) follows by noting that the expression in (b) is the constraint on the sum-rate imposed by the cut $\mathcal{V}_1 \cup \Omega$, which we know is lower bounded by $f_{Q'}(K,K) - K$.

The above analysis shows that (38) renders all other constraints redundant.
Claim 1.
For any $Q \ge 0$, any $k \in \{1, 2, \ldots, K-1\}$ and any $l \in \{1, 2, \ldots, K\}$,
$$\frac{K}{k}\, f_Q(l,k) \ge f_Q(l,K).$$

Proof:
Recall that $f_Q(l,K)$ is defined to be
$$\mathbb{E}\Big[\log\det\Big(I + \frac{P}{(Q+1)\sigma^2}\, H^\dagger_{l,K} H_{l,K}\Big)\Big].$$
To be more explicit in what follows, we write $I_p$ to denote an identity matrix of size $p$. Also, for brevity, we denote $\frac{P}{(Q+1)\sigma^2}$ by $\lambda$. For any fixed $H_{l,K}$, we have by [20, eq. (3.15)] that
$$\frac{1}{\binom{K}{k}} \sum_{1\le i_1 < \cdots < i_k \le K} \log\det\Big(I_l + \lambda\, H_{l,(i_1,\ldots,i_k)} H^\dagger_{l,(i_1,\ldots,i_k)}\Big) \ge \frac{k}{K}\log\det\Big(I_l + \lambda\, H_{l,K}H^\dagger_{l,K}\Big),$$
where $H_{l,(i_1,\ldots,i_k)}$ is obtained by choosing the columns of $H_{l,K}$ indexed by $(i_1,\ldots,i_k)$. Hence, taking expectations on both sides and noting that every column subset of $H_{l,K}$ has the same distribution, each term in the average on the left has expectation $f_Q(l,k)$, which yields $f_Q(l,k) \ge \frac{k}{K} f_Q(l,K)$, i.e., the claim.
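The Han-type subset-averaging inequality underlying Claim 1 holds for every fixed channel matrix, so it can be checked directly by enumerating column subsets (sizes and $\lambda$ below are arbitrary illustrative choices):

```python
import itertools
import numpy as np

rng = np.random.default_rng(4)

# Check: averaging over all k-column subsets S of an l-by-K matrix H,
#   (1/C(K,k)) * sum_S log det(I + lam H_S H_S†)
#     >= (k/K) * log det(I + lam H H†).
l, K, k, lam = 3, 5, 2, 1.0
H = (rng.standard_normal((l, K)) + 1j * rng.standard_normal((l, K))) / np.sqrt(2)

def ld(M):
    # log det(I_l + lam M M†) for an l-by-* submatrix M.
    return np.linalg.slogdet(np.eye(l) + lam * (M @ M.conj().T))[1]

subsets = list(itertools.combinations(range(K), k))
avg = sum(ld(H[:, list(S)]) for S in subsets) / len(subsets)
full = ld(H)
```

Taking expectations of this per-realization inequality, with all column subsets identically distributed, recovers $f_Q(l,k) \ge \frac{k}{K} f_Q(l,K)$.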
[1] IEEE Information Theory Workshop (ITW), Seville, 2013.
[2] R. Kolte, A. Özgür, and A. El Gamal, "Optimized noisy network coding for Gaussian relay networks," in IEEE International Zurich Seminar on Communications, 2014, pp. 140–143.
[3] T. Cover and A. El Gamal, "Capacity theorems for the relay channel," IEEE Transactions on Information Theory, vol. 25, no. 5, pp. 572–584, Sep. 1979.
[4] G. Kramer, I. Maric, and R. D. Yates, Cooperative Communications. Foundations and Trends in Networking, Now Publishers, 2007.
[5] A. Avestimehr, S. Diggavi, and D. Tse, "Wireless network information flow: A deterministic approach," IEEE Transactions on Information Theory, vol. 57, no. 4, pp. 1872–1905, 2011.
[6] S. Lim, Y.-H. Kim, A. El Gamal, and S.-Y. Chung, "Noisy network coding," IEEE Transactions on Information Theory, vol. 57, no. 5, pp. 3132–3152, 2011.
[7] A. Özgür and S. Diggavi, "Approximately achieving Gaussian relay network capacity with lattice-based QMF codes," IEEE Transactions on Information Theory, vol. 59, no. 12, pp. 8275–8294, Dec. 2013.
[8] A. Raja and P. Viswanath, "Compress-and-forward scheme for relay networks: Backword decoding and connection to bisubmodular flows," IEEE Transactions on Information Theory, vol. 60, no. 9, pp. 5627–5638, Sep. 2014.
[9] G. Kramer and J. Hou, "Short-message quantize-forward network coding," in 8th International Workshop on Multi-Carrier Systems & Solutions (MC-SS), 2011, pp. 1–3.
[10] S. H. Lim, K. T. Kim, and Y.-H. Kim, "Distributed decode-forward for multicast," in IEEE International Symposium on Information Theory, 2014, pp. 636–640.
[11] B. Chern and A. Özgür, "Achieving the capacity of the N-relay Gaussian diamond network within log N bits," IEEE Transactions on Information Theory, vol. 60, no. 12, pp. 7708–7718, Dec. 2014.
[12] A. Sengupta, I.-H. Wang, and C. Fragouli, "Optimizing quantize-map-and-forward relaying for Gaussian diamond networks," in IEEE Information Theory Workshop (ITW), Lausanne, 2012, pp. 381–385.
[13] U. Niesen, B. Nazer, and P. Whiting, "Computation alignment: Capacity approximation without noise accumulation," IEEE Transactions on Information Theory, vol. 59, no. 6, pp. 3811–3832, 2013.
[14] B. Nazer and M. Gastpar, "Compute-and-forward: Harnessing interference through structured codes," IEEE Transactions on Information Theory, vol. 57, no. 10, pp. 6463–6486, 2011.
[15] B. Nazer, M. Gastpar, S. Jafar, and S. Vishwanath, "Ergodic interference alignment," IEEE Transactions on Information Theory, vol. 58, no. 10, pp. 6355–6371, 2012.
[16] T. Courtade and A. Özgür, "Approximate capacity of Gaussian relay networks: Is a sublinear gap to the cutset bound plausible?" in IEEE International Symposium on Information Theory, 2015.
[17] A. El Gamal, "On information flow in relay networks," in NTC '81; National Telecommunications Conference, vol. 2, 1981, pp. D4.1.1–D4.1.4.
[18] X. Wu and L.-L. Xie, "On the optimal compressions in the compress-and-forward relay schemes," IEEE Transactions on Information Theory, vol. 59, no. 5, pp. 2613–2628, 2013.
[19] I. E. Telatar, "Capacity of multi-antenna Gaussian channels," European Transactions on Telecommunications, vol. 10, pp. 585–595, 1999.
[20] T. S. Han, "Nonnegative entropy measures of multivariate symmetric correlations,"