On the Capacity of Multiplicative Finite-Field Matrix Channels
Roberto W. Nóbrega, Student Member, IEEE, Danilo Silva, Member, IEEE, and Bartolomeu F. Uchôa-Filho, Senior Member, IEEE
Abstract—This paper deals with the multiplicative finite-field matrix channel, a discrete memoryless channel whose input and output are matrices (over a finite field) related by a multiplicative transfer matrix. The model considered here assumes that all transfer matrices with the same rank are equiprobable, so that the channel is completely characterized by the rank distribution of the transfer matrix. This model is seen to be more flexible than previously proposed ones in describing random linear network coding systems subject to link erasures, while still being sufficiently simple to allow tractability. The model is also conservative in the sense that its capacity provides a lower bound on the capacity of any channel with the same rank distribution. A main contribution is to express the channel capacity as the solution of a convex optimization problem which can be easily solved by numerical computation. For the special case of constant-rank input, a closed-form expression for the capacity is obtained. The behavior of the channel for asymptotically large field size or packet length is studied, and it is shown that constant-rank input suffices in this case. Finally, it is proved that the well-known approach of treating inputs and outputs as subspaces is information-lossless even in this more general model.
Index Terms—Channel capacity, finite-field matrix channel, multiplicative matrix channel, noncoherent network coding, random linear network coding, subspace coding.
I. INTRODUCTION
Finite-field matrix channels are communication channels where both the input and the output are matrices over some finite field F_q. The interest in such channels has been rising since the seminal work of Koetter and Kschischang [3], which connects finite-field matrix channels to the problem of error control in noncoherent network coding. In contrast with the combinatorial framework of [3], the present paper follows [4]–[6] and adopts a probabilistic approach.

[Footnotes: This work was supported in part by CNPq–Brazil. The material in this paper was presented in part at the 2011 IEEE International Symposium on Information Theory [1]. Some of the earlier ideas on which this work is based appeared in an unpublished draft [2]. The authors are with the Department of Electrical Engineering of the Federal University of Santa Catarina, Florianópolis 88040-970, Brazil (email: [email protected]; [email protected]; [email protected]). Copyright (c) 2013 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to [email protected]. Throughout this paper, random entities are represented using boldface letters, while italic letters are used for their samples.]

The object of study of this work is the multiplicative finite-field matrix channel (MMC), modeled by the law

  Y = GX,  (1)

where X ∈ F_q^{n×ℓ} is the channel input matrix, Y ∈ F_q^{m×ℓ} is the channel output matrix, and G ∈ F_q^{m×n} is the channel transfer matrix, with X and G statistically independent. For simplicity, we assume max{n, m} ≤ ℓ. This model turns out to be well-suited for random linear network coding systems [7] in the absence of malicious nodes, but possibly subject to link erasures. In this context, X is the matrix whose rows are the n packets transmitted by the source node, Y is the matrix whose rows are the m packets received by the sink node, and ℓ is the number of q-ary symbols in each packet.
Also, G is the network transfer matrix, whose probability distribution is dictated by the network topology, the random choices of coding coefficients, and the link erasure probabilities.

Multiplicative finite-field matrix channels have been previously considered by Silva et al. [5] and Jafari et al. [6]. Specifically, in [5], G is chosen uniformly at random among all full-rank matrices, while in [6], G has i.i.d. entries selected uniformly at random (or, equivalently, G is uniform over all matrices). Although these transfer matrix distributions could in principle be used to model random linear network coding systems, they cannot properly reflect different network topologies or accurately describe systems in which link erasures play an important role. This is because in these models the transfer matrix distribution is completely specified by the field size q and the dimensions n and m. On the other hand, a full description of a completely general transfer matrix distribution requires, in addition, the specification of q^{nm} parameters (namely, Pr[G = G], for G ∈ F_q^{m×n}), therefore being impractical even for modest values of q, n, and m.

In view of this tension between tractability and generality, the present paper suggests a new model which generalizes both the models of [5] and [6], but still keeps the amount of information needed to describe the channel at a realistic level. Specifically, we allow the probability distribution of the rank of G to be arbitrary; nevertheless, we consider that all matrices with the same rank are equiprobable. We say such a transfer matrix is uniform given rank (abbreviated as u.g.r.). Under this assumption, the probability distribution of the rank of the transfer matrix completely determines the distribution of the transfer matrix itself and, therefore, also completely determines the channel. Thus, the model only requires min{n, m} + 1 parameters to describe the channel (namely, Pr[rank G = r], for 0 ≤ r ≤ min{n, m}).
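To make this parameterization concrete, the following Python sketch (function names are ours, not from the paper) samples a u.g.r. transfer matrix over F_2 from its min{n, m} + 1 rank parameters. It uses the fact that a uniform rank-r matrix can be drawn as the product of two independent uniform full-rank factors of sizes m × r and r × n, each obtained here by rejection sampling.

```python
import numpy as np

rng = np.random.default_rng(0)

def rank_gf2(M):
    """Rank of a 0/1 integer matrix over F_2, by Gaussian elimination."""
    M = M.copy() % 2
    rank = 0
    for col in range(M.shape[1]):
        pivot = next((i for i in range(rank, M.shape[0]) if M[i, col]), None)
        if pivot is None:
            continue
        M[[rank, pivot]] = M[[pivot, rank]]  # move pivot row into place
        for i in range(M.shape[0]):
            if i != rank and M[i, col]:
                M[i] ^= M[rank]              # eliminate over F_2
        rank += 1
    return rank

def sample_ugr(m, n, p_rank, rng):
    """Sample a u.g.r. matrix in F_2^{m x n} with rank distribution p_rank.

    p_rank[r] = Pr[rank G = r], r = 0, ..., min(n, m): the min(n, m) + 1
    parameters that fully describe the channel in the u.g.r. model.
    """
    r = rng.choice(len(p_rank), p=p_rank)
    if r == 0:
        return np.zeros((m, n), dtype=int)
    while True:
        # A uniform rank-r matrix is A @ B with A (m x r) and B (r x n)
        # independent uniform full-rank factors; draw them by rejection.
        A = rng.integers(0, 2, size=(m, r))
        B = rng.integers(0, 2, size=(r, n))
        if rank_gf2(A) == r and rank_gf2(B) == r:
            return A @ B % 2

G = sample_ugr(3, 4, [0.1, 0.2, 0.3, 0.4], rng)
assert 0 <= rank_gf2(G) <= 3
```

The factorization argument makes the sampler exact: every rank-r matrix admits the same number of such factorizations, so uniform full-rank factors induce the uniform distribution over T_r.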
While it is a challenging problem to obtain the rank distribution analytically for a general network topology (even in the simplest case of erasure-free links), in practice, a reasonable estimate may be obtained more simply by Monte Carlo simulation for a given network model. In fact, the (empirical) rank distribution is a natural figure of merit for most noncoherent network coding implementations (see, e.g., [8]). Thus, it is not entirely unrealistic to assume that this information is indeed available.

In order to convince the reader of the usefulness of the proposed model in practical scenarios, we provide an example (see Section IV) of how the u.g.r. transfer matrix is able to better capture some properties of noncoherent network coding systems when compared to existing models. Specifically, we will see that for certain network topologies, the capacities in [5], [6] deviate more and more from the true capacity as the (graph) distance between the source and sink nodes increases or the link erasure probability grows. Furthermore, as we shall prove, any MMC can be reduced to our model (although with a potential decrease in the channel capacity) by means of a simple preprocessing at the transmitter and receiver. Since this preprocessing does not alter the rank distribution of the transfer matrix, this implies that among all transfer matrices sharing the same rank distribution, the u.g.r. one has the lowest channel capacity. In this sense, the u.g.r. model seems to arise naturally in the study of multiplicative finite-field matrix channels.

In this paper, we concentrate on the problem of finding the capacity and mutual information of the MMC with u.g.r. transfer matrix. We show that the capacity is achieved when the input matrix (similarly to the transfer matrix) is u.g.r., and an expression for the mutual information is derived for this kind of input.
As a consequence, we are able to greatly reduce the complexity of the convex optimization problem involved in obtaining the channel capacity and the associated optimal input, when compared to the most general MMC model: a reduction from q^{nℓ} to n + 1 variables, as we shall see. We then turn to the special situation of constant-rank input. In this case, we are able to obtain a closed-form expression for the constant-rank capacity. Later on, we consider the problem in which q or ℓ are allowed to grow arbitrarily, and show that the true channel capacity is achieved by constant-rank input. As a final contribution, we verify that communication via subspaces is still optimal when the transfer matrix is u.g.r. This generalizes similar conclusions previously obtained in [5] and [6].

A related line of work by Yang et al. [9]–[12], done concurrently with and independently of our work, considers a completely general transfer matrix distribution (with the transfer matrix still independent of the input). They were able to identify a class of inputs (which they call “α-type”) that is sufficient to achieve the channel capacity. As a result, the number of optimization variables required to compute the channel capacity is reduced, although to a number that is still exponential in the matrix size. They also derive upper and lower bounds on the capacity which depend only on the rank distribution of the transfer matrix. It is worth mentioning that some of our results can be obtained by specializing the results in [9] to a u.g.r. transfer matrix. (Appropriate comparisons are made along the text whenever applicable.) Nevertheless, we believe that the approach we follow here is simpler and more insightful for this particular case.

Finally, it is worth noticing that some of the results obtained in this paper have been subsequently employed in [13], where an arbitrarily varying channel approach to the MMC is considered.
More precisely, [13] assumes that the rank of the transfer matrix is randomly chosen according to a known probability distribution, but, apart from that, the transfer matrix can be changed arbitrarily from time-slot to time-slot. It is shown that the capacity of this channel is the same as the capacity of the MMC with u.g.r. transfer matrix considered here.

The remainder of this paper is organized as follows. Section II presents some notation, basic facts, and a brief review of discrete memoryless channels. Section III defines the channel model under consideration. Section IV considers a motivating example. Section V contains the main results of this work, whose proofs are located in Section VI. Section VII concludes the paper.

II. NOTATION AND BACKGROUND
Let F_q be a finite field. We denote by F_q^{m×n} the set of all m × n matrices with entries in F_q, and by T_r(F_q^{m×n}) those matrices in F_q^{m×n} with rank r. For notational convenience, we sometimes set T_r = T_r(F_q^{m×n}) when the matrix dimension m × n and the field size q are implied by the context. Also, T(F_q^{m×n}) ≜ T_{min{n,m}}(F_q^{m×n}) is the set of all m × n full-rank matrices. It is well-known (see, e.g., [14]) that

  |T(F_q^{m×r})| = ∏_{i=0}^{r−1} (q^m − q^i),  for r ≤ m,

and

  |T_r(F_q^{m×n})| = |T(F_q^{m×r})| [n r]_q,  (2)

where

  [n r]_q ≜ ∏_{i=0}^{r−1} (q^n − q^i)/(q^r − q^i)  if 0 ≤ r ≤ n, and [n r]_q ≜ 0 otherwise,  (3)

denotes the Gaussian binomial coefficient. It is also known that the Gaussian binomial coefficient satisfies [3, Lemma 4]

  q^{r(n−r)} ≤ [n r]_q ≤ γ_q q^{r(n−r)},  (4)

where γ_q = ∏_{i=1}^{∞} 1/(1 − q^{−i}).

In this paper, we let ⟨A⟩ denote the row space of a matrix A, and 1[P] the indicator function of P, that is, 1[P] = 1 if P is true, and 0 otherwise.

A discrete memoryless channel (DMC) [15] with input x and output y is defined by a triplet (X, p_{y|x}, Y), where X and Y are the channel input and output alphabets, respectively, and p_{y|x}, called the channel transition probability, gives the conditional probability that y = y ∈ Y is received given that x = x ∈ X is sent. The channel is memoryless in the sense that what happens to the transmitted symbol at one time is independent of what happens to the transmitted symbol at any other time. The capacity of the DMC is then given by

  C = max_{p_x} I(x; y),

where I(x; y) is the mutual information between x and y, and the maximization is over all possible input distributions p_x.

An interesting question is whether input or output letters of a DMC can be grouped together without reducing the channel mutual information. The following result (see, e.g., [16]) provides an answer.

Lemma 1:
Let (X, p_{y|x}, Y) be a DMC with input x and output y. In addition, let f : X → U and g : Y → V be surjective functions, and define u = f(x) and v = g(y). The following holds:

1) I(x; y) = I(u; y) for all p_x if and only if, for every pair x, x′ ∈ X satisfying f(x) = f(x′), we have p_{y|x}(y|x) = p_{y|x}(y|x′) for all y ∈ Y.

2) I(x; y) = I(x; v) for all p_x if and only if, for every pair y, y′ ∈ Y satisfying g(y) = g(y′), there exists some real number α such that p_{y|x}(y′|x) = α p_{y|x}(y|x) for all x ∈ X.

III. CHANNEL MODEL
The MMC described by the channel law (1) can naturally be viewed as a DMC defined by (X = F_q^{n×ℓ}, p_{Y|X}, Y = F_q^{m×ℓ}), where the channel transition probability is given by

  p_{Y|X}(Y|X) = Σ_G p_{G|X}(G|X) p_{Y|X,G}(Y|X, G) = Σ_G p_G(G) 1[Y = GX]

(and is thus completely characterized by p_G). This work deals with a special class of this channel, in which the transfer matrix G is “uniform given rank,” a concept defined next.

Definition:
A random matrix A ∈ F_q^{m×n} distributed according to p_A is said to be uniform given rank (u.g.r., for short) if, for every A, A′ ∈ F_q^{m×n}, we have p_A(A) = p_A(A′) whenever rank A = rank A′.

Let A be a random matrix over F_q^{m×n} with probability distribution p_A. Also, let k = rank A; this is a random variable taking values on {0, . . . , min{n, m}} according to a probability distribution p_k given by

  p_k(k) = Σ_{A ∈ T_k} p_A(A).

Then, it is clear that A is u.g.r. if and only if

  p_A(A) = p_k(k) / |T_k(F_q^{m×n})|,

where k = rank A. In this way, the rank probability distribution p_k completely determines p_A for A u.g.r. In addition, it is not hard to show that the entropy of A satisfies

  H(A) ≤ Σ_k p_k(k) log_q (|T_k(F_q^{m×n})| / p_k(k)),  (5)

with equality when A is u.g.r. This is because, among all matrices with a given rank probability distribution, the u.g.r. one has the largest entropy.

As said before, both the models of Silva et al. [5] and Jafari et al. [6] are special cases of the u.g.r. model considered here. Indeed, let r = rank G, distributed according to p_r(r) = Σ_{G ∈ T_r} p_G(G), be the random variable representing the rank of the transfer matrix. Then, for the channel model in [5], where G is uniformly distributed over T(F_q^{n×n}), we have

  p_r(r) = 1 if r = n, and 0 otherwise,  (6)

while for the channel model in [6], where G is uniformly distributed over F_q^{m×n}, we have

  p_r(r) = |T_r(F_q^{m×n})| / q^{nm}.  (7)

Fig. 1: Turning an arbitrary MMC into an MMC with u.g.r. transfer matrix (G′ = T₁GT₂). The rank distribution of the new channel is the same as that of the original channel.

We remark that every MMC can be artificially transformed into an MMC with u.g.r. transfer matrix (having the same rank distribution as the original channel) by means of “randomization” at both the transmitter and receiver.
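The counting formulas (2)–(3) and the entropy characterization (5) are easy to check numerically. The Python sketch below (helper names are ours) instantiates them for the uniform model (7): since that model makes G uniform over all of F_q^{m×n}, the u.g.r. entropy sum must come out to exactly nm q-ary symbols.

```python
import math

def gaussian_binomial(q, n, r):
    """[n r]_q from (3): the number of r-dimensional subspaces of F_q^n."""
    if not 0 <= r <= n:
        return 0
    num = den = 1
    for i in range(r):
        num *= q**n - q**i
        den *= q**r - q**i
    return num // den  # always an exact integer

def num_rank_r(q, m, n, r):
    """|T_r(F_q^{m x n})| from (2)."""
    full = 1
    for i in range(r):
        full *= q**m - q**i          # |T(F_q^{m x r})|
    return full * gaussian_binomial(q, n, r)

q, m, n = 2, 3, 4
sizes = [num_rank_r(q, m, n, k) for k in range(min(m, n) + 1)]
assert sum(sizes) == q**(m * n)      # the ranks partition F_q^{m x n}

p_k = [s / q**(m * n) for s in sizes]  # rank distribution of model (7)
H = sum(p * math.log(s / p, q) for p, s in zip(p_k, sizes) if p > 0)
assert abs(H - m * n) < 1e-9         # u.g.r. entropy (5) = nm, as expected
```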
Theorem 2 below makes this precise. We prove this theorem as an application of a generalized version of the crypto lemma [17], which may be useful in other applications. The proofs are given in Appendix A.

Theorem 2:
Let G ∈ F_q^{m×n} be a random matrix with arbitrary probability distribution, and define G′ = T₁GT₂, where T₁ ∈ T(F_q^{m×m}) and T₂ ∈ T(F_q^{n×n}) are uniformly distributed full-rank square matrices, independent of G and of each other. Then, G′ is u.g.r. and has the same rank distribution as G.

Effectively (see Fig. 1), instead of transmitting the original source packets (say X′), the transmitter sends X = T₂X′; and instead of the actual channel output (say Y), the receiver considers Y′ = T₁Y for decoding. (Here, T₁ and T₂ are defined as in Theorem 2.) Consequently, if the transfer matrix of the original channel is G, we have Y′ = T₁Y = T₁GX = T₁GT₂X′ = G′X′, where G′, according to Theorem 2, is u.g.r. and has the same rank distribution as G. Naturally, from the data-processing inequality [15], we have I(X′; Y′) ≤ I(X; Y), so that this transformation comes at the expense of a potential reduction of the channel capacity. Thus, we conclude that, among all transfer matrices sharing the same rank distribution, the u.g.r. one has the lowest channel capacity, and that any capacity result obtained for the MMC with u.g.r. transfer matrix can be used as a lower bound for MMCs with non-u.g.r. transfer matrices.
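The rank-preservation property underlying Theorem 2 can be checked numerically over F_2. The sketch below (assuming numpy; function names are ours) draws uniform invertible matrices T₁ and T₂ by rejection sampling and verifies that G′ = T₁GT₂ keeps the rank of G, which is why the randomization leaves the rank distribution unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)

def rank_gf2(M):
    """Rank of a 0/1 integer matrix over F_2, by Gaussian elimination."""
    M = M.copy() % 2
    rank = 0
    for col in range(M.shape[1]):
        pivot = next((i for i in range(rank, M.shape[0]) if M[i, col]), None)
        if pivot is None:
            continue
        M[[rank, pivot]] = M[[pivot, rank]]  # move pivot row into place
        for i in range(M.shape[0]):
            if i != rank and M[i, col]:
                M[i] ^= M[rank]              # eliminate over F_2
        rank += 1
    return rank

def random_invertible(k, rng):
    """Uniform invertible k x k matrix over F_2, by rejection sampling."""
    while True:  # acceptance probability prod_{i>=1}(1 - 2^{-i}) > 0.288
        T = rng.integers(0, 2, size=(k, k))
        if rank_gf2(T) == k:
            return T

m, n = 3, 4
G = rng.integers(0, 2, size=(m, n))
T1, T2 = random_invertible(m, rng), random_invertible(n, rng)
G_prime = T1 @ G @ T2 % 2
assert rank_gf2(G_prime) == rank_gf2(G)  # rank is preserved
```

Since multiplication by invertible matrices never changes rank, repeating the experiment with fresh T₁, T₂ leaves the empirical rank distribution of G′ identical to that of G, as Theorem 2 states.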
Fig. 2: Wireless layered relay network. There are L layers, and each layerhas N relay nodes. A few more comments are in order. First, note that random-ization at the transmitter (but not at the receiver) is already ausual practice in random linear network coding systems [3].Second, since both the multiplication of matrices and thegeneration of a random invertible matrix can be accomplishedin polynomial time, the randomization is also a polynomial-time procedure. Third, because T and T are independentof G and of each other, no channel knowledge is assumed,and no common randomness shared by the transmitter andreceiver is required. Finally, for a numerical quantification ofthe rate loss incurred by randomization, refer to Example 2 inSection IV. IV. M OTIVATING E XAMPLE
In this section, we present an example showing how the u.g.r. model is able to better describe a noncoherent network coding system. Consider the wireless relay network depicted in Fig. 2, with L layers (columns) and N relay nodes per layer. Assume that the system operates with packets of length ℓ, and that between each two consecutive layers (also between the source node and layer 1, and between layer L and the sink node) there are N orthogonal broadcast channels, which are subject to independent erasures occurring at the end of the channel with probability ε. Whenever a packet is erased, it is considered to be received as the all-zero vector. In addition, assume that there is no communication between nonadjacent layers, as well as between nodes in the same layer.

The system operates as follows. First, the source node transmits packets to the first layer by using all the N orthogonal broadcast channels. It repeats this process M times, so that a total of MN packets is received by each node in the first layer. (It is assumed that the source does not perform any randomization.) After that, each node in the first layer computes M random linear combinations (with i.i.d. uniform coefficients in F_q) of all its received packets, and broadcasts these linear combinations to the second layer, again in M time slots, by using one of the N orthogonal channels assigned to it. In this way, a total of MN packets is received by each node in the second layer, M from each node of the first layer. The system operates similarly up to layer L. Finally, the sink node receives MN packets, M from each node in layer L.

We now show that this system can be modeled as an MMC with n = m = MN. Let X ∈ F_q^{MN×ℓ} (resp., Y ∈ F_q^{MN×ℓ}) denote the matrix whose rows are the packets transmitted (resp., received) by the source (resp., sink) node.
Let R_{i,j} ∈ F_q^{MN×ℓ} (resp., S_{i,j} ∈ F_q^{M×ℓ}) denote the matrix whose rows are the packets received (resp., transmitted) by the j-th relay node of the i-th layer, for 1 ≤ i ≤ L and 1 ≤ j ≤ N. From the network operation just described, we know that

  S_{i,j} = A_{i,j} R_{i,j},  for 1 ≤ i ≤ L and 1 ≤ j ≤ N,

where A_{i,j} ∈ F_q^{M×MN} are matrices whose entries are i.i.d. selected uniformly at random. We also know that

  R_{1,j} = E_{1,j} X,
  R_{i,j} = E_{i,j} [S_{i−1,1}; . . . ; S_{i−1,N}],
  Y = E′ [S_{L,1}; . . . ; S_{L,N}],

for 2 ≤ i ≤ L and 1 ≤ j ≤ N, where E_{i,j}, E′ ∈ F_q^{MN×MN} are diagonal matrices (modeling the erasures) whose diagonal entries are i.i.d. with p(0) = ε and p(1) = 1 − ε. From this, we can deduce that Y = GX, where

  G = E′ A_L E_L · · · A_2 E_2 A_1 E_1,  (8)

in which A_i ∈ F_q^{MN×MN²} (a block-diagonal matrix) and E_i ∈ F_q^{MN²×MN} are given by

  A_i = diag(A_{i,1}, . . . , A_{i,N}),  E_i = [E_{i,1}; . . . ; E_{i,N}].

Note that, in general, the transfer matrix given in (8) is not u.g.r. Therefore, as mentioned in Section III, the capacity results from Section V will serve only as lower bounds on the channel capacity. We here call attention to the fact that the calculation of the real value of the channel capacity is a computationally heavy task, even for small values of the parameters. For example, when q = 2 and n = m = ℓ = 8, a priori, we need to solve an optimization problem over q^{nℓ} = 2^{64} variables, which is clearly impractical. According to [9], we could simplify the problem to Σ_{u=0}^{n} [n u]_q variables, but this number is still impractical.

Example 1:
Figs. 3a and 3b show the rank distribution p_r induced by the wireless layered relay network with q = 2 and N = M = 2 (thus, n = m = MN = 4), as a function of ε, for L = 1, and as a function of L, for ε = 0, respectively. Note that the value of ℓ is unimportant here. Both rank distributions were obtained from (8) by the Monte Carlo method with 100,000 realizations.

Figs. 3c and 3d show the channel capacity of the corresponding MMC assuming a u.g.r. transfer matrix, with the rank distributions of Figs. 3a and 3b, and considering a packet length ℓ = 8. The results were obtained from Theorem 4 of Section V. The figures also show the capacity obtained for a system with the same parameters q, n, m, and ℓ, but modeled with a full-rank uniform transfer matrix [5] or with a uniform transfer matrix [6], as well as the coherent upper bound of [9] (i.e., the channel capacity assuming that both the transmitter and receiver know the transfer matrix).

Fig. 3: Rank distribution and channel capacity for the wireless layered relay network with N = M = 2 and q = 2. (a) Rank distribution for L = 1, as a function of ε. (b) Rank distribution for ε = 0, as a function of L. (c) Capacity for L = 1 and ℓ = 8, as a function of ε. (d) Capacity for ε = 0 and ℓ = 8, as a function of L. Panels (c) and (d) show the coherent upper bound, the u.g.r. lower bound, and the capacities of the models of Silva et al. and Jafari et al.

Clearly, the models of [5] and [6] are insensitive to the effects of link erasures and variations in the topology (here illustrated by the number of layers). The capacities for these models are seen to deviate substantially from the true capacity. In contrast, from the trends of the lower and upper bound curves, it can be inferred that the capacity for the u.g.r. model behaves much like the true capacity (note that the upper bound goes to zero as ε approaches one or L increases; therefore, so does the true capacity). In fact, as the next example illustrates, the u.g.r. lower bound may actually be close to the true capacity.

Example 2:
This example aims to quantify the rate loss incurred by treating a matrix channel as u.g.r. when, in fact, it is not. For that, we consider the wireless layered relay network with field size q = 2, a single layer (L = 1), and two relay nodes (N = 2). We also set M = 1, so that n = m = 2. In this case, (8) yields

  G = E′ A₁ E₁ = [ e₃a₁e₁  e₃a₂e₂ ; e₄a₃e₁  e₄a₄e₂ ],

where e₁, . . . , e₄ ∈ F₂ (related to the erasures) are i.i.d. with Pr[e_i = 0] = ε, and a₁, . . . , a₄ ∈ F₂ (the network coding coefficients) are i.i.d. with Pr[a_i = 0] = 1/2. The transfer matrix distribution p_G(G) with ε = 1/2 is shown in Fig. 4a, which also shows the corresponding u.g.r. distribution.

Fig. 4b shows the true channel capacity (obtained by solving the original maximization problem over q^{nℓ} = 64 variables), along with the u.g.r. lower bound (obtained by solving a maximization problem over n + 1 = 3 variables, according to Theorem 4), and the coherent upper bound (given by E[r], according to Yang et al. [9]), as a function of ε, for a packet length ℓ = 3. It is interesting to observe that the u.g.r. lower bound is tight for ε = 0, since in this case G becomes uniformly distributed over F_q^{m×n}, and thus u.g.r. Also, for all other values of ε, the true capacity is very close to the u.g.r. lower bound, which constitutes evidence that the u.g.r. model serves as a good approximation for noncoherent network coding systems.

V. MAIN RESULTS
This section presents the main results of this work, whose proofs are left to Section VI. In what follows, we consider an MMC with input matrix X, output matrix Y, and u.g.r. transfer matrix G. In addition to r ≜ rank G, distributed according to p_r(r) = Σ_{G ∈ T_r} p_G(G), we also make use of the random variables u ≜ rank X and v ≜ rank Y, whose probability distributions are given by p_u(u) = Σ_{X ∈ T_u} p_X(X) and p_v(v) = Σ_{Y ∈ T_v} p_Y(Y), respectively.

The rank transition probability, that is, the probability of receiving a matrix with rank v = v given that the transmitted matrix has rank u = u, plays an important role in this work. Since u → X → Y → v forms a Markov chain, the rank transition probability is given by

  p_{v|u}(v|u) = Σ_{X,Y} p_{v|Y}(v|Y) p_{Y|X}(Y|X) p_{X|u}(X|u) = Σ_{X ∈ T_u} p_{X|u}(X|u) Σ_{Y ∈ T_v} p_{Y|X}(Y|X),

and, therefore, may depend not only on p_{Y|X} (i.e., on p_G), but also on p_{X|u}. In the next theorem, we find the value of the rank transition probability for the case of a u.g.r. transfer matrix, and we show that it is independent of p_{X|u}. We also determine the channel transition probability in terms of the rank transition probability.

Fig. 4: Transfer matrix distribution and channel capacity for the wireless layered relay network with L = 1, N = 2, M = 1, and q = 2. (a) True and u.g.r. transfer matrix distributions, for ε = 1/2; the horizontal axis consists of all the matrices in F₂^{2×2}, ordered from left to right as follows: [00; 00], [10; 00], [01; 00], [00; 10], [00; 01], [11; 00], [00; 11], [10; 10], [01; 01], [11; 11], [10; 01], [01; 10], [11; 10], [11; 01], [10; 11], [01; 11]. (b) True capacity, u.g.r. lower bound, and coherent upper bound, as a function of ε, for ℓ = 3.

Theorem 3:
The following holds for the MMC with u.g.r. transfer matrix:

1) Let u, v, and r be nonnegative integers such that r ≤ min{n, m}. We have

  p_{v|u,r}(v|u, r) = ( [u v]_q [n−u r−v]_q / [n r]_q ) q^{v(n−u−r+v)},  (9)

which does not depend on p_{X|u}. Thus, the rank transition probability is given by

  p_{v|u}(v|u) = Σ_r p_r(r) p_{v|u,r}(v|u, r),

and the output rank probability is given by

  p_v(v) = Σ_u p_u(u) p_{v|u}(v|u).
2) The channel transition probability is given by

  p_{Y|X}(Y|X) = p_{v|u}(v|u) / |T_v(F_q^{m×u})|  if ⟨Y⟩ ⊆ ⟨X⟩, and 0 otherwise.  (10)

Moreover, if the input X is u.g.r., so is the output Y.

Remark:
Let u, v, and r be nonnegative integers such that r ≤ min{n, m}. Recall from (3) that the Gaussian binomial coefficient [x y]_q is nonzero if and only if 0 ≤ y ≤ x. Thus, according to (9), we have p_{v|u,r}(v|u, r) ≠ 0 if and only if 0 ≤ v ≤ u and 0 ≤ r − v ≤ n − u; these, in turn, are equivalent to u + r − n ≤ v ≤ min{u, r}. This is expected: the upper bound follows trivially because rank GX ≤ min{rank X, rank G}, and the lower bound follows from Sylvester’s rank inequality, which says that, if G and X are matrices of sizes m × n and n × ℓ, respectively, then rank X + rank G − n ≤ rank GX.

We next derive the channel capacity. We will see that u.g.r. input suffices to achieve the capacity, so that there is no need to consider more general inputs. Let

  I*(p_u) ≜ max_{p_X : p_u} I(X; Y),  (11)

where the maximum is over the collection of all matrix probability distributions p_X with associated rank probability distribution equal to p_u, that is, over the set {p_X : Σ_{X ∈ T_u} p_X(X) = p_u(u), for u = 0, . . . , n}.

Theorem 4:
The capacity of the MMC with u.g.r. transfer matrix is given by

  C = max_{p_u} I*(p_u),

where I*(p_u), as defined in (11), is achieved by u.g.r. input, and is given by

  I*(p_u) = Σ_v p_v(v) log_q (|T_v(F_q^{m×ℓ})| / p_v(v)) − Σ_u h_u p_u(u),  (12)

where

  h_u = Σ_v p_{v|u}(v|u) log_q (|T_v(F_q^{m×u})| / p_{v|u}(v|u)).  (13)

From Theorem 4, we can see that the problem of finding the capacity and the corresponding optimal input for the MMC with u.g.r. transfer matrix, which was originally a convex optimization problem over q^{nℓ} variables (namely, p_X(X) for X ∈ F_q^{n×ℓ}), is simplified to another convex optimization problem, this time involving only n + 1 variables (namely, p_u(u), for u = 0, . . . , n). The solution to this optimization problem can be obtained by standard methods (see, e.g., [18]).

We now focus on the special situation in which the input matrices are restricted to have constant rank. This case is of interest for at least two reasons. First, constant-rank input happens to be asymptotically optimal both in the packet length and in the field size (as we shall see next). And second, most of the existing practical constructions for subspace codes are “codes in the Grassmannian,” that is, constant-dimension subspace codes [3].

Let C_u denote the maximum channel mutual information when the input is restricted to rank-u matrices. Let u* denote the value of u that maximizes C_u, so that C_{u*} = max_u C_u. We call C_u the rank-u capacity, and C_{u*} the constant-rank capacity of the multiplicative finite-field matrix channel.

Theorem 5:
The rank-u capacity of the MMC with u.g.r. transfer matrix is achieved by the uniform [over T_u(F_q^{n×ℓ})] input distribution, and is given by

  C_u = Σ_v p_{v|u}(v|u) log_q ([ℓ v]_q / [u v]_q).  (14)

Moreover,

  C_{u*} ≤ C ≤ C_{u*} + log_q (min{n, m} + 1).  (15)

Remark:
In particular, if the input is always full rank (i.e., u = n), then v = r (since v = rank Y = rank GX = rank G = r). The capacity becomes simply

  C_n = Σ_r p_r(r) log_q ([ℓ r]_q / [n r]_q),

a result obtained earlier in [2]. Moreover, since p_{v|⟨X⟩}(v|U) only depends on U through u = dim U (see Theorem 3), our result agrees with [9, Theorem 7].

We next turn to the behavior of the channel for asymptotically large packet length ℓ, and asymptotically large field size q. We show that, for both scenarios, constant-rank input suffices to achieve the capacity.

Consider first the asymptotic behavior in the packet length ℓ. In this situation, it is appropriate to define ¯C ≜ C/ℓ, the normalized capacity of the matrix channel, measured in packets per channel use. We also define the normalized rank-u capacity as ¯C_u ≜ C_u/ℓ, and the normalized constant-rank capacity as ¯C_{u*}, where u* is the value of u that maximizes ¯C_u.

Theorem 6:
Asymptotically in the packet length ℓ, the normalized capacity of the MMC with u.g.r. transfer matrix is achieved with constant-rank uniform input, and is given by

  lim_{ℓ→∞} ¯C = E[r].

The optimal input rank is always u* = n.

Remark:
This result is also obtained in [9, Corollary 1] for the case of an MMC with a general transfer matrix.

We now turn to the asymptotic behavior in the field size $q$. In general, the rank distribution may depend on $q$ [for example, the case in (7)]. Thus, in what follows, we let
$$p_r^\infty(r) \triangleq \lim_{q\to\infty} p_r(r)$$
denote the limiting distribution of $\mathbf{r}$ in $q$, assuming such a limit exists. Of course, when the rank distribution does not depend on $q$, then $p_r^\infty(r) = p_r(r)$.

Theorem 7:
Asymptotically in the field size $q$, the capacity of the MMC with u.g.r. transfer matrix is achieved with constant-rank uniform input, and is given by
$$\lim_{q\to\infty} C = \max_u \left[ (\ell - u) \sum_r p_r^\infty(r) \min\{u, r\} \right].$$

Remark:
Consider random linear network coding in the absence of link errors and erasures. When the field size $q$ is asymptotically large, it is known [7] that the transfer matrix will have rank $h$ with probability approaching one, where $h$ is the network min-cut. In this case, $p_r^\infty(r) = 1[r = h]$, so that
$$\lim_{q\to\infty} C = \max_u \left[(\ell - u)\min\{u, h\}\right] = (\ell - u^*)\,u^*,$$
where $u^* = \min\{h, \lfloor \ell/2 \rfloor\}$, since $(\ell - u)u$ is maximized at $u = \lfloor \ell/2 \rfloor$. For the sub-case in which $h = \min\{n, m\}$, we have $u^* = \min\{n, m, \lfloor \ell/2 \rfloor\}$, which agrees with [5, Proposition 3] and [6, Theorem 2], since in both cases $p_r^\infty(r) = 1[r = \min\{n, m\}]$ [see equations (6) and (7)].

Our last result concerns the optimality of subspace coding [3] for the MMC with u.g.r. transfer matrix. Let $P(F_q^\ell, d)$ denote the set of all subspaces of $F_q^\ell$ with dimension $d$ or less.

Theorem 8:
Consider the MMC with u.g.r. transfer matrix. Define $U \triangleq \langle X\rangle$ and $V \triangleq \langle Y\rangle$. Then,
$$I(X; Y) = I(U; V), \qquad (16)$$
for every input distribution $p_X$. Furthermore, for every $U \in P(F_q^\ell, n)$ and $V \in P(F_q^\ell, m)$, we have
$$p_{V|U}(V \mid U) = |T(F_q^{m\times\dim V})|\; p_{Y|X}(Y \mid X), \qquad (17)$$
where $X \in F_q^{n\times\ell}$ and $Y \in F_q^{m\times\ell}$ are any matrices such that $\langle X\rangle = U$ and $\langle Y\rangle = V$.

As a consequence of Theorem 8, the matrix channel $(\mathcal{X} = F_q^{n\times\ell},\, p_{Y|X},\, \mathcal{Y} = F_q^{m\times\ell})$ can be transformed into a (simpler) subspace channel $(\mathcal{U} = P(F_q^\ell, n),\, p_{V|U},\, \mathcal{V} = P(F_q^\ell, m))$ with channel transition probability $p_{V|U}$ given by (17). Concretely, the new channel is obtained by concatenating the original channel at the input with a device that takes a subspace $U$ to any matrix $X$ such that $\langle X\rangle = U$, and at the output with a device that computes $V = \langle Y\rangle$. Due to (16), any coding scheme for the matrix channel has a counterpart in the subspace channel achieving exactly the same mutual information, and vice versa. In particular, one may focus solely on $(\mathcal{U}, p_{V|U}, \mathcal{V})$ when designing and analyzing capacity-achieving schemes.

VI. PROOFS
This section presents the proofs omitted from Section V. To save space, we often drop the subscripts of the probability distributions, writing, for example, $p(X)$ instead of $p_X(X)$. Before we proceed, we present a series of matrix enumeration results that will prove useful throughout this section.

Lemma 9:
Let $X \in T_u(F_q^{n\times\ell})$ be given. The number of matrices $Y \in T_v(F_q^{m\times\ell})$ such that $\langle Y\rangle \subseteq \langle X\rangle$ is given by
$$|\{Y \in T_v : \langle Y\rangle \subseteq \langle X\rangle\}| = |T_v(F_q^{m\times u})|.$$
Now, let $Y \in T_v(F_q^{m\times\ell})$ be given. The number of matrices $X \in T_u(F_q^{n\times\ell})$ such that $\langle Y\rangle \subseteq \langle X\rangle$ is given by
$$|\{X \in T_u : \langle Y\rangle \subseteq \langle X\rangle\}| = |T_v(F_q^{m\times u})|\, \frac{|T_u(F_q^{n\times\ell})|}{|T_v(F_q^{m\times\ell})|}.$$

Proof:
For every $X \in T_u(F_q^{n\times\ell})$, define
$$J(X) = \{Y \in T_v : \langle Y\rangle \subseteq \langle X\rangle\}.$$
Let $X_1, X_2 \in T_u(F_q^{n\times\ell})$. Then, there exist invertible matrices $S \in F_q^{n\times n}$ and $T \in F_q^{\ell\times\ell}$ such that $X_2 = S X_1 T$. It is not hard to show that $Y \mapsto Y T^{-1}$ is a bijection between $J(X_2)$ and $J(X_1)$, so that we must have $|J(X_1)| = |J(X_2)|$. Therefore, to compute the value of $|J(X)|$, we can set
$$X = \begin{bmatrix} I_u & 0 \\ 0 & 0 \end{bmatrix} \in F_q^{n\times\ell},$$
where $I_u$ is the $u\times u$ identity matrix. Since $Y \in J(X)$ if and only if $Y$ is of the form $[\,Y_0 \;\; 0\,]$, where $Y_0 \in T_v(F_q^{m\times u})$, we conclude that $|J(X)| = |T_v(F_q^{m\times u})|$, as desired.

Now, for every $Y \in T_v(F_q^{m\times\ell})$, define
$$K(Y) = \{X \in T_u : \langle Y\rangle \subseteq \langle X\rangle\}.$$
Similarly to the previous paragraph, it is possible to show that $|K(Y_1)| = |K(Y_2)|$ for every $Y_1, Y_2 \in T_v(F_q^{m\times\ell})$. Consider then a bipartite graph where the $X$s in $T_u(F_q^{n\times\ell})$ are the nodes on the left-hand side, the $Y$s in $T_v(F_q^{m\times\ell})$ are the nodes on the right-hand side, and in which a node $X$ is connected to a node $Y$ if and only if $\langle Y\rangle \subseteq \langle X\rangle$. The number of edges incident to nodes on the left-hand side, namely $|T_u(F_q^{n\times\ell})|\,|J(X)|$, must equal the number of edges incident to nodes on the right-hand side, namely $|T_v(F_q^{m\times\ell})|\,|K(Y)|$, from which the second statement follows.

The next lemma is a combinatorial result by Brawley and Carlitz [19].

Lemma 10:
Let $G_0 \in T_v(F_q^{m\times u})$ be a given matrix. The number of matrices $G \in T_r(F_q^{m\times n})$ whose left $m\times u$ submatrix is $G_0$ is given by
$$\phi_q(m, n, u, r, v) \triangleq \frac{|T(F_q^{m\times r})|}{|T(F_q^{m\times v})|} \binom{n-u}{r-v}_q q^{v(n-u-r+v)}.$$

We now derive another basic enumeration result which is closely related to the multiplicative finite-field matrix channel.
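These counting results are easy to sanity-check by exhaustive enumeration over a tiny field. The following Python sketch (our own illustration, not part of the paper) verifies the first count of Lemma 9 for $q=2$, $m=2$, $\ell=3$, $u=2$, $v=1$, and the count of Lemma 10 for $q=2$, $m=n=2$, $u=v=1$, $r=2$; all function and variable names are ours.

```python
from itertools import product

q = 2  # all checks are over the binary field F_2

def rank_gf2(rows):
    """Rank over GF(2); `rows` is a sequence of equal-length 0/1 tuples."""
    rows = [list(r) for r in rows]
    rank, ncols = 0, (len(rows[0]) if rows else 0)
    for col in range(ncols):
        piv = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if piv is None:
            continue
        rows[rank], rows[piv] = rows[piv], rows[rank]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                rows[i] = [a ^ b for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank

def matrices(m, l):
    """All m-by-l matrices over GF(2), as tuples of row tuples."""
    return [tuple(rows) for rows in product(product(range(q), repeat=l), repeat=m)]

# Lemma 9, first count: fix a rank-2 matrix X in F_2^{2x3}; count rank-1
# matrices Y in F_2^{2x3} with <Y> contained in <X>, tested via
# rank([X; Y]) == rank(X).  Expected: |T_1(F_2^{2x2})| = 9.
X = ((1, 0, 0), (0, 1, 0))
count9 = sum(1 for Y in matrices(2, 3)
             if rank_gf2(Y) == 1
             and rank_gf2(list(X) + list(Y)) == rank_gf2(X))
assert count9 == 9

# Lemma 10: fix G0 = [1, 0]^T (rank 1, 2x1); count invertible G in F_2^{2x2}
# whose left column equals G0.  Expected: phi_2(2, 2, 1, 2, 1) = 6/3 = 2.
G0 = ((1,), (0,))
count10 = sum(1 for G in matrices(2, 2)
              if rank_gf2(G) == 2
              and tuple((row[0],) for row in G) == G0)
assert count10 == 2
```

Both assertions pass: the $9$ matches $|T_1(F_2^{2\times 2})| = (q^2-1)^2/(q-1)$, and the $2$ matches $\phi_2(2,2,1,2,1) = |T(F_2^{2\times 2})|/|T(F_2^{2\times 1})| = 6/3$.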
Lemma 11:
Let $X \in T_u(F_q^{n\times\ell})$ and $Y \in T_v(F_q^{m\times\ell})$. The number of matrices $G \in T_r(F_q^{m\times n})$ such that $GX = Y$ is
$$|\{G \in T_r : GX = Y\}| = \phi_q(m, n, u, r, v)\; 1[\langle Y\rangle \subseteq \langle X\rangle].$$

Proof:
Let $X \in T_u(F_q^{n\times\ell})$, $Y \in T_v(F_q^{m\times\ell})$, and define
$$J(X, Y) = \{G \in T_r : GX = Y\}.$$
If $\langle Y\rangle \not\subseteq \langle X\rangle$, then clearly $|J(X, Y)| = 0$, since no $G$ can take $X$ into $Y$. Suppose, then, that $\langle Y\rangle \subseteq \langle X\rangle$. Using a similar argument as employed in the proof of Lemma 9, we can conclude that it suffices to show the result for
$$X = \begin{bmatrix} I_u & 0 \\ 0 & 0 \end{bmatrix} \in F_q^{n\times\ell},$$
where $I_u$ is the $u\times u$ identity matrix. For this particular $X$, we must have $Y = [\,Y_0 \;\; 0\,]$ for some $Y_0 \in T_v(F_q^{m\times u})$ (recall that $\langle Y\rangle \subseteq \langle X\rangle$ is assumed). On the other hand, we also have $Y = GX = [\,G_0 \;\; 0\,]$, where $G_0 \in F_q^{m\times u}$ is the left $m\times u$ submatrix of $G$. We thus have $G \in J(X, Y)$ if and only if $G \in T_r(F_q^{m\times n})$ and $G_0 = Y_0 \in T_v(F_q^{m\times u})$. The result now follows from Lemma 10.

We are finally ready to prove the theorems.

Proof of Theorem 3:
Let $X \in T_u(F_q^{n\times\ell})$, $Y \in T_v(F_q^{m\times\ell})$, and $r$ such that $0 \le r \le \min\{n, m\}$. We have
$$p(Y \mid X, r) = \sum_{G \in T_r} p(G \mid r)\, p(Y \mid X, G) \overset{(a)}{=} \frac{1}{|T_r(F_q^{m\times n})|} \sum_{G \in T_r} 1[Y = GX] \overset{(b)}{=} \frac{\phi_q(m, n, u, r, v)}{|T_r(F_q^{m\times n})|}\; 1[\langle Y\rangle \subseteq \langle X\rangle],$$
where (a) follows because $G$ is u.g.r., and (b) follows from Lemma 11. Therefore, from Lemma 9, we may write
$$p(v \mid X, r) = \sum_{Y \in T_v} p(Y \mid X, r) = \frac{|T_v(F_q^{m\times u})|}{|T_r(F_q^{m\times n})|}\, \phi_q(m, n, u, r, v),$$
so that
$$p(v \mid u, r) = \sum_{X \in T_u} p(X \mid u)\, p(v \mid X, r) = \sum_{X \in T_u} p(X \mid u)\, \frac{|T_v(F_q^{m\times u})|}{|T_r(F_q^{m\times n})|}\, \phi_q(m, n, u, r, v) = \frac{|T_v(F_q^{m\times u})|}{|T_r(F_q^{m\times n})|}\, \phi_q(m, n, u, r, v),$$
and (10) follows by comparing the expressions for $p(Y \mid X, r)$ and $p(v \mid u, r)$. To prove (9), we substitute $\phi_q(m, n, u, r, v)$ with its definition (see Lemma 10), to get
$$p(v \mid u, r) = \frac{|T_v(F_q^{m\times u})|}{|T_r(F_q^{m\times n})|}\, \frac{|T(F_q^{m\times r})|}{|T(F_q^{m\times v})|} \binom{n-u}{r-v}_q q^{v(n-u-r+v)} = \frac{\binom{u}{v}_q}{\binom{n}{r}_q} \binom{n-u}{r-v}_q q^{v(n-u-r+v)},$$
where we used (2) in the last step.

To finish the proof, assume that $X$ is u.g.r. Then, for each $Y \in T_v(F_q^{m\times\ell})$, we have
$$p(Y) = \sum_u \sum_{X \in T_u} p(Y \mid X)\, p(X) \overset{(a)}{=} \sum_u \frac{p(u)}{|T_u(F_q^{n\times\ell})|} \sum_{X \in T_u} p(Y \mid X) \overset{(b)}{=} \sum_u \frac{p(u)}{|T_u(F_q^{n\times\ell})|}\, \frac{p(v \mid u)}{|T_v(F_q^{m\times u})|} \sum_{X \in T_u} 1[\langle Y\rangle \subseteq \langle X\rangle] \overset{(c)}{=} \sum_u \frac{p(u)}{|T_u(F_q^{n\times\ell})|}\, \frac{p(v \mid u)}{|T_v(F_q^{m\times u})|}\, |T_v(F_q^{m\times u})|\, \frac{|T_u(F_q^{n\times\ell})|}{|T_v(F_q^{m\times\ell})|} = \frac{p(v)}{|T_v(F_q^{m\times\ell})|},$$
where (a) follows because $X$ is u.g.r., (b) follows from (10), and (c) follows from Lemma 9. Therefore, $Y$ is also u.g.r., as claimed.

Proof of Theorem 4:
For each $X \in T_u(F_q^{n\times\ell})$, we have
$$H(\mathbf{Y} \mid \mathbf{X} = X) = -\sum_v \sum_{Y \in T_v} p(Y \mid X) \log_q p(Y \mid X) = \sum_v p(v \mid u) \log_q \frac{|T_v(F_q^{m\times u})|}{p(v \mid u)} = h_u,$$
where we substituted $p(Y \mid X)$ as in (10). Averaging over all $X \in F_q^{n\times\ell}$, we get
$$H(\mathbf{Y} \mid \mathbf{X}) = \sum_u \sum_{X \in T_u} H(\mathbf{Y} \mid \mathbf{X} = X)\, p(X) = \sum_u h_u \sum_{X \in T_u} p(X) = \sum_u h_u\, p(u),$$
which depends on $p_X$ only through $p_u$. Therefore,
$$I^*(p_u) = \max_{p_X : p_u} I(\mathbf{X}; \mathbf{Y}) = \max_{p_X : p_u} \left[H(\mathbf{Y}) - H(\mathbf{Y} \mid \mathbf{X})\right] = \left[\max_{p_X : p_u} H(\mathbf{Y})\right] - \sum_u h_u\, p(u),$$
and we get the desired result from (5).

Proof of Theorem 5:
If the input is restricted to rank-$u$ matrices, then $\mathbf{u} = u$ is a constant, and therefore $p(v) = p(v \mid u)$. The channel mutual information given by Theorem 4 simplifies to
$$\sum_v p(v \mid u) \log_q \frac{|T_v(F_q^{m\times\ell})|}{|T_v(F_q^{m\times u})|},$$
and we get (14) by applying (2).

The lower bound of (15) is immediate. Similarly to Yang et al. in [9, Lemma 4], we can rewrite the mutual information (12) as
$$I^*(p_u) = \sum_v p(v) \log_q \frac{|T_v(F_q^{m\times\ell})|}{p(v)} - \sum_u p(u)\, h_u = \sum_{u,v} p(u)\, p(v \mid u) \log_q \frac{|T_v(F_q^{m\times\ell})|}{p(v)} - \sum_{u,v} p(u)\, p(v \mid u) \log_q \frac{|T_v(F_q^{m\times u})|}{p(v \mid u)} = \sum_{u,v} p(u)\, p(v \mid u) \log_q \frac{|T_v(F_q^{m\times\ell})|}{|T_v(F_q^{m\times u})|} + \sum_{u,v} p(u)\, p(v \mid u) \log_q \frac{p(v \mid u)}{p(v)} = \sum_u p(u)\, C_u + I(\mathbf{u}; \mathbf{v}),$$
where $I(\mathbf{u}; \mathbf{v})$ is the mutual information between the random variables $\mathbf{u}$ and $\mathbf{v}$. The upper bound of (15) then follows because $\sum_u p(u)\, C_u \le \max_u C_u = C_{u^*}$ and $I(\mathbf{u}; \mathbf{v}) \le \log_q(\min\{n, m\} + 1)$.

Proof of Theorem 6:
Dividing (15) by $\ell$ and taking the limit as $\ell \to \infty$, we obtain $\lim_{\ell\to\infty} \bar{C} = \lim_{\ell\to\infty} \bar{C}_{u^*}$, so that constant-rank input is sufficient to achieve capacity for asymptotically large $\ell$. Now, dividing (14) by $\ell$ and taking the limit as $\ell \to \infty$, we obtain
$$\lim_{\ell\to\infty} \bar{C}_u = \sum_v p(v \mid u) \left( \lim_{\ell\to\infty} \frac{1}{\ell} \log_q \frac{\binom{\ell}{v}_q}{\binom{u}{v}_q} \right) = \sum_v p(v \mid u) \left( \lim_{\ell\to\infty} \frac{1}{\ell} \log_q \binom{\ell}{v}_q - \lim_{\ell\to\infty} \frac{1}{\ell} \log_q \binom{u}{v}_q \right) = \sum_v p(v \mid u) \left( \lim_{\ell\to\infty} \frac{1}{\ell} \log_q \binom{\ell}{v}_q \right) = \sum_v v\, p(v \mid u) = E[\mathbf{v} \mid \mathbf{u} = u],$$
where the first equality in the last line is a consequence of (4). Finally, since $\mathbf{v} \le \mathbf{r}$, we have
$$E[\mathbf{v} \mid \mathbf{u} = u, \mathbf{r} = r] \le r = E[\mathbf{v} \mid \mathbf{u} = n, \mathbf{r} = r],$$
for all $u \in \{0, \ldots, n\}$. Multiplying both sides by $p(r)$ and summing over $r$, we obtain
$$E[\mathbf{v} \mid \mathbf{u} = u] \le E[\mathbf{r}] = E[\mathbf{v} \mid \mathbf{u} = n],$$
which shows that $\lim_{\ell\to\infty} \bar{C}_u = E[\mathbf{v} \mid \mathbf{u} = u]$ is maximized when $u = n$, with maximum value $E[\mathbf{r}]$.

For the next result, we will need the following intuitive fact.

Lemma 12:
We have
$$\lim_{q\to\infty} p(v \mid u, r) = \begin{cases} 1, & \text{if } v = \min\{u, r\}, \\ 0, & \text{otherwise.} \end{cases}$$

Proof:
This is clearly true if $v > \min\{u, r\}$. When $v \le \min\{u, r\}$, we have from (4) and from Theorem 3 that
$$q^{v(u-v)} \cdot \gamma_q^{-1} q^{-r(n-r)} \cdot q^{(r-v)(n-u-r+v)} \cdot q^{v(n-u-r+v)} \;\le\; p(v \mid u, r) = \binom{u}{v}_q \binom{n}{r}_q^{-1} \binom{n-u}{r-v}_q q^{v(n-u-r+v)} \;\le\; \gamma_q q^{v(u-v)} \cdot q^{-r(n-r)} \cdot \gamma_q q^{(r-v)(n-u-r+v)} \cdot q^{v(n-u-r+v)}.$$
After simplifying, we get
$$\gamma_q^{-1} q^{-(u-v)(r-v)} \le p(v \mid u, r) \le \gamma_q^2\, q^{-(u-v)(r-v)},$$
and the desired result follows because $\lim_{q\to\infty} \gamma_q = 1$.

Proof of Theorem 7:
The quantity $\log_q(\min\{n, m\} + 1)$ on the right-hand side of (15) goes to zero as $q \to \infty$, so that $\lim_{q\to\infty} C = \lim_{q\to\infty} C_{u^*}$; that is, constant-rank input suffices for asymptotically large $q$. Now, from (14), we have
$$\lim_{q\to\infty} C_u = \sum_v \left( \lim_{q\to\infty} p(v \mid u) \right) \left( \lim_{q\to\infty} \log_q \frac{\binom{\ell}{v}_q}{\binom{u}{v}_q} \right).$$
For the first parenthesis, we have from Lemma 12 that
$$\lim_{q\to\infty} p(v \mid u) = \sum_{r=0}^{n} p_r^\infty(r)\, 1[v = \min\{u, r\}].$$
For the second parenthesis, we have from (4) that
$$\lim_{q\to\infty} \log_q \frac{\binom{\ell}{v}_q}{\binom{u}{v}_q} = v(\ell - u).$$
Therefore,
$$\lim_{q\to\infty} C_u = \sum_v \sum_r p_r^\infty(r)\, 1[v = \min\{u, r\}]\, v(\ell - u) = (\ell - u) \sum_r p_r^\infty(r) \sum_v 1[v = \min\{u, r\}]\, v = (\ell - u) \sum_r p_r^\infty(r) \min\{u, r\},$$
as desired.

Proof of Theorem 8:
From Theorem 3 we know that $p_{Y|X}(Y \mid X)$ depends on $X$ and $Y$ only through $\langle X\rangle$ and $\langle Y\rangle$. Therefore, according to Lemma 1, the maps $f(X) = \langle X\rangle$ and $g(Y) = \langle Y\rangle$ are information-lossless. This proves (16).

To prove (17), we first apply the input grouping to the original matrix channel $(\mathcal{X}, p_{Y|X}, \mathcal{Y})$, to get an intermediate channel $(\mathcal{U}, p_{Y|U}, \mathcal{Y})$, with $p_{Y|U}(Y \mid U) = p_{Y|X}(Y \mid X)$, where $X$ is such that $\langle X\rangle = U$. Then, we apply the output grouping to this intermediate channel to get the subspace channel $(\mathcal{U}, p_{V|U}, \mathcal{V})$ with
$$p_{V|U}(V \mid U) = \sum_{Y' : \langle Y'\rangle = V} p_{Y|U}(Y' \mid U) = |T(F_q^{m\times\dim V})|\; p_{Y|U}(Y \mid U),$$
where $Y$ is such that $\langle Y\rangle = V$. Note that the last step in the above equation follows from
$$|\{Y' \in F_q^{m\times\ell} : \langle Y'\rangle = V\}| = |T(F_q^{m\times\dim V})|,$$
which is true because associated with every $Y' \in F_q^{m\times\ell}$ such that $\langle Y'\rangle = V$, there is a unique full-rank matrix $T \in T(F_q^{m\times\dim V})$ such that $Y' = T\tilde{Y}$, where $\tilde{Y} \in T(F_q^{\dim V\times\ell})$ is any fixed full-rank matrix satisfying $\langle\tilde{Y}\rangle = V$.

VII. CONCLUSIONS
This work has considered probabilistic multiplicative finite-field matrix channels in which the transfer matrix is uniformly distributed conditioned on its rank. We advocate the application of this channel model to practical noncoherent network coding systems subject to link erasures, for we believe it is flexible enough to capture the essential characteristics of the system while still being mathematically tractable. This contrasts with previously considered channel models, which are either too restrictive or too complex.

As contributions, we have shown that the problem of finding the channel capacity can be reduced to a convex optimization problem over $n + 1$ variables (rather than $q^{n\ell}$), allowing for easy numerical computation by standard techniques. We have also specialized our results to the important case of constant-rank input, for which we were able to find a closed-form expression for the capacity. For asymptotically large field size or packet length, we have shown that constant-rank input is optimal. Finally, we have proven that, even in our more general setup, subspace coding is still sufficient to achieve capacity. Many of our results generalize existing conclusions in the prior literature.

The present paper has focused mainly on the capacity and mutual information of the multiplicative finite-field matrix channel. The design of low-complexity capacity-achieving schemes for this channel is an important and still largely open problem. Recent work by Yang et al. [9], [12] has addressed this problem by considering the construction of codes based on the expected value of the rank of the transfer matrix, $E[\mathbf{r}]$. Nevertheless, the design of codes based on the rank distribution $p_r$ is yet to be investigated. Finally, another challenging and interesting research line motivated by the present work is the computation of the rank distribution as a function of a given network topology.

APPENDIX A
A VARIATION OF THE CRYPTO LEMMA

We start by recalling the following well-known result, known as the crypto lemma, for the case of finite groups [17].
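Before stating the lemmas, here is a small brute-force illustration in Python (our own sketch, not part of the paper) of the orbit-wise uniformity phenomenon, using the two-sided action of pairs of invertible matrices on $F_2^{2\times 2}$ that is later used to prove Theorem 2: for a fixed input of rank 1, the output $S x T$ with $(S, T)$ uniform over all invertible pairs is uniform over the 9 rank-1 matrices.

```python
from itertools import product
from collections import Counter

# 2x2 matrices over GF(2), encoded as 4-tuples (a, b, c, d) ~ [[a, b], [c, d]].
def mul(A, B):
    a, b, c, d = A
    e, f, g, h = B
    return ((a*e + b*g) % 2, (a*f + b*h) % 2,
            (c*e + d*g) % 2, (c*f + d*h) % 2)

def rank(A):
    a, b, c, d = A
    if A == (0, 0, 0, 0):
        return 0
    return 1 if (a*d - b*c) % 2 == 0 else 2

# The 6 invertible 2x2 matrices over GF(2).
GL2 = [A for A in product(range(2), repeat=4) if rank(A) == 2]

x = (1, 0, 0, 0)  # a fixed rank-1 input
hist = Counter(mul(mul(S, x), T) for S in GL2 for T in GL2)

# y = S x T stays in the rank-1 orbit and is uniform over it:
assert all(rank(y) == 1 for y in hist)   # outputs lie in the same orbit
assert len(hist) == 9                    # the orbit is all 9 rank-1 matrices
assert set(hist.values()) == {4}         # each hit 36/9 = 4 times -> uniform
```

The final assertion is exactly the orbit-stabilizer count: each output in the orbit is reached by $|\mathcal{G}|/|\text{orbit}| = 36/9 = 4$ group elements.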
Lemma 13:
Let $(\mathcal{G}, \cdot)$ be a finite group. Let $\mathbf{y} = \mathbf{g} \cdot \mathbf{x}$, where $\mathbf{x}$ and $\mathbf{g}$ are random variables over $\mathcal{G}$, and $\mathbf{g}$ is uniform over $\mathcal{G}$ and independent of $\mathbf{x}$. Then, $\mathbf{y}$ is uniform over $\mathcal{G}$ and independent of $\mathbf{x}$.

Now, let $\mathcal{S}$ be a set. Recall that a (left) group action of $\mathcal{G}$ on $\mathcal{S}$ is a binary operator $\circ : \mathcal{G} \times \mathcal{S} \to \mathcal{S}$ such that $(g_1 \cdot g_2) \circ x = g_1 \circ (g_2 \circ x)$, for all $g_1, g_2 \in \mathcal{G}$ and $x \in \mathcal{S}$; and $e \circ x = x$, for all $x \in \mathcal{S}$, where $e$ is the identity element of $\mathcal{G}$. Every group $\mathcal{G}$ acts on itself ($\mathcal{S} = \mathcal{G}$) by left multiplication, that is, through the action given by $g \circ x = g \cdot x$. This appendix generalizes the crypto lemma from this special case to the case of an arbitrary action of $\mathcal{G}$ on some finite set $\mathcal{S}$. (This appendix is joint work with Chen Feng.)

Before we proceed, we need to recall a few basic facts about group actions [20]. For $x \in \mathcal{S}$, the orbit of $\mathcal{G}$ containing $x$ is defined as $\mathcal{G} \circ x \triangleq \{g \circ x : g \in \mathcal{G}\}$. The relation on $\mathcal{S}$ defined by $x \sim y$ iff $x = g \circ y$ for some $g \in \mathcal{G}$ is an equivalence relation. We have $x \sim y$ iff $\mathcal{G} \circ x = \mathcal{G} \circ y$ iff $x$ and $y$ are in the same orbit. The size of each orbit is given by $|\mathcal{G} \circ x| = |\mathcal{G}| / |\mathcal{G}_{x,x}|$, where $\mathcal{G}_{x,x} \triangleq \{g \in \mathcal{G} : g \circ x = x\}$ is the stabilizer of $x$ in $\mathcal{G}$ (a subgroup of $\mathcal{G}$). An action is called transitive if there is only one orbit.

Lemma 14:
Let $(\mathcal{G}, \cdot)$ be a finite group, $\mathcal{S}$ a finite set, and $\circ : \mathcal{G} \times \mathcal{S} \to \mathcal{S}$ a group action of $\mathcal{G}$ on $\mathcal{S}$. Let $\mathbf{y} = \mathbf{g} \circ \mathbf{x}$ (so that $\mathbf{x}$ and $\mathbf{y}$ lie in the same orbit), where $\mathbf{x}$ and $\mathbf{g}$ are random variables over $\mathcal{S}$ and $\mathcal{G}$, respectively, and $\mathbf{g}$ is uniform over $\mathcal{G}$ and independent of $\mathbf{x}$. Then, $\mathbf{y}$ is piecewise uniform over the orbits of the action and conditionally independent of $\mathbf{x}$ given that a particular orbit occurs.

Remark:
In particular, if the action is transitive, then $\mathbf{y}$ is uniform over $\mathcal{S}$ and independent of $\mathbf{x}$. This is the case for the action $g \circ x = g \cdot x$, so we recover Lemma 13.

Proof:
Since $\mathbf{g}$ is uniform and independent of $\mathbf{x}$, we have that, for all $x, y \in \mathcal{S}$,
$$p_{y|x}(y \mid x) = \frac{|\mathcal{G}_{x,y}|}{|\mathcal{G}|},$$
where $\mathcal{G}_{x,y} \triangleq \{g \in \mathcal{G} : g \circ x = y\}$. If $x \sim y$ (so that $\mathcal{G} \circ x = \mathcal{G} \circ y$), it can be shown that $\mathcal{G}_{x,y}$ is a coset of the stabilizer $\mathcal{G}_{x,x}$, which implies $|\mathcal{G}_{x,y}| = |\mathcal{G}_{x,x}|$, and thus
$$p_{y|x}(y \mid x) = \frac{|\mathcal{G}_{x,x}|}{|\mathcal{G}|} = \frac{1}{|\mathcal{G} \circ x|} = \frac{1}{|\mathcal{G} \circ y|}.$$
On the other hand, if $x \nsim y$, then clearly $p_{y|x}(y \mid x) = 0$. Therefore,
$$p_y(y) = \sum_x p_{y|x}(y \mid x)\, p_x(x) = \frac{1}{|\mathcal{G} \circ y|} \sum_{x : x \sim y} p_x(x) = \frac{\Pr[\mathbf{x} \sim y]}{|\mathcal{G} \circ y|},$$
and the lemma follows. Theorem 2 is a corollary of this result.

Proof of Theorem 2:
The result follows after applying Lemma 14 with $\mathcal{G} = T(F_q^{m\times m}) \times T(F_q^{n\times n})$, where the group operation is $(T_1', T_2') \cdot (T_1, T_2) = (T_1' T_1,\, T_2 T_2')$, with $\mathcal{S} = F_q^{m\times n}$ and $\circ : \mathcal{G} \times \mathcal{S} \to \mathcal{S}$ defined by $(T_1, T_2) \circ M = T_1 M T_2$. The facts that $(\mathcal{G}, \cdot)$ is a group and that $\circ$ is an action of $\mathcal{G}$ on $\mathcal{S}$ follow from basic linear algebra; the orbits, in this case, are $\{T_r(F_q^{m\times n}) : r = 0, \ldots, \min\{n, m\}\}$, which are completely characterized by the rank of $G$.

ACKNOWLEDGMENTS
The authors would like to thank Chen Feng, Frank Kschischang, and Shenghao Yang for useful discussions. We also thank the anonymous reviewers for their helpful comments and suggestions.

REFERENCES

[1] R. W. Nóbrega, B. F. Uchôa-Filho, and D. Silva, "On the capacity of multiplicative finite-field matrix channels," in Proceedings of the 2011 IEEE International Symposium on Information Theory (ISIT'11), Saint Petersburg, Russia, Jul. 2011, pp. 248–252.
[2] B. F. Uchôa-Filho and R. W. Nóbrega, "The capacity of random linear coding networks as subspace channels," Computing Research Repository (CoRR), vol. abs/1001.1021, Jan. 2010.
[3] R. Koetter and F. R. Kschischang, "Coding for errors and erasures in random network coding," IEEE Transactions on Information Theory, vol. 54, no. 8, pp. 3579–3591, Aug. 2008.
[4] A. Montanari and R. L. Urbanke, "Iterative coding for network coding," IEEE Transactions on Information Theory, vol. 59, no. 3, pp. 1563–1572, Mar. 2013.
[5] D. Silva, F. R. Kschischang, and R. Koetter, "Communication over finite-field matrix channels," IEEE Transactions on Information Theory, vol. 56, no. 2, pp. 1296–1305, Mar. 2010.
[6] M. Jafari Siavoshani, S. Mohajer, C. Fragouli, and S. Diggavi, "On the capacity of non-coherent network coding," IEEE Transactions on Information Theory, vol. 57, no. 2, pp. 1046–1066, Feb. 2011.
[7] T. Ho, M. Médard, R. Koetter, D. Karger, M. Effros, J. Shi, and B. Leong, "A random linear network coding approach to multicast," IEEE Transactions on Information Theory, vol. 52, no. 10, pp. 4413–4430, Oct. 2006.
[8] P. Chou, Y. Wu, and K. Jain, "Practical network coding," in Proceedings of the 41st Annual Allerton Conference on Communication, Control, and Computing (Allerton'03), Monticello, Illinois, Oct. 2003.
[9] S. Yang, S.-W. Ho, J. Meng, E.-h. Yang, and R. W. Yeung, "Linear operator channels over finite fields," Computing Research Repository (CoRR), vol. abs/1002.2293, Apr. 2010.
[10] S. Yang, S.-W. Ho, J. Meng, and E.-h. Yang, "Capacity analysis of linear operator channels over finite fields," Computing Research Repository (CoRR), vol. abs/1108.4257, Dec. 2012.
[11] ——, "Optimality of subspace coding for linear operator channels over finite fields," in Proceedings of the 2010 IEEE Information Theory Workshop (ITW'10), Cairo, Egypt, Jan. 2010, pp. 400–404.
[12] S. Yang, J. Meng, and E.-h. Yang, "Coding for linear operator channels over finite fields," in Proceedings of the 2010 IEEE International Symposium on Information Theory (ISIT'10), Austin, Texas, Jun. 2010, pp. 2413–2417.
[13] M. Jafari Siavoshani, S. Yang, and R. W. Yeung, "Non-coherent network coding: An arbitrarily varying channel approach," in Proceedings of the 2012 IEEE International Symposium on Information Theory (ISIT'12), Cambridge, Massachusetts, Jul. 2012, pp. 1672–1676.
[14] S. D. Fisher and M. N. Alexander, "Matrices over a finite field," American Mathematical Monthly, vol. 73, pp. 639–641, Jun. 1966.
[15] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. Wiley-Interscience, 2006.
[16] N. Abramson, Information Theory and Coding. McGraw-Hill, 1963.
[17] G. D. Forney Jr., "On the role of MMSE estimation in approaching the information-theoretic limits of linear Gaussian channels: Shannon meets Wiener," in Proceedings of the 41st Annual Allerton Conference on Communication, Control, and Computing (Allerton'03), Monticello, Illinois, Oct. 2003.
[18] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004.
[19] J. V. Brawley and L. Carlitz, "Enumeration of matrices with prescribed row and column sums," Linear Algebra and its Applications, vol. 6, pp. 165–174, 1973.
[20] D. S. Dummit and R. M. Foote,