On ℓ p -norm Computation over Multiple-Access Channels
aa r X i v : . [ c s . I T ] S e p On ℓ p -norm Computation over Multiple-AccessChannels (Invited Paper) Steffen Limmer ∗ , Sławomir Sta ´nczak ∗†∗ Fraunhofer Institute for Telecommunications, Heinrich Hertz Institute, Einsteinufer 37, 10587 Berlin, Germany, † Fachgebiet Informationstheorie und theoretische Informationstechnik,Technische Universit¨at Berlin, Einsteinufer 25, 10587 Berlin, Germany
Abstract
This paper addresses some aspects of the general problem of information transfer and distributed function computationin wireless networks. Many applications of wireless technology foresee networks of autonomous devices executing tasks thatcan be posed as distributed function computation. In today’s wireless networks, the tasks of communication and (distributed)computation are performed separately, although an efficient network operation calls for approaches in which the informationtransfer is dynamically adapted to time-varying computation objectives. Thus, wireless communications and function computationmust be tightly coupled and it is shown in this paper that information theory may play a crucial role in the design of efficientcomputation-aware wireless communication and networking strategies. This is explained in more detail by considering the problemof computing ℓ p -norms over multiple access channels. I. I
NTRODUCTION
Future wireless networks are envisioned to consist of a massive large of communication devices that perform network tasksautonomously. A main enabler for this vision is the ability of the network to extract the useful information from a huge amountof data distributed over the different nodes. In many scenarios, the network reveals its true purpose-centric character, and theindividual transmission of every collected sensor value to some sink node can be circumvented. Consider for example anenvironment monitoring system, where the main objective is to make predictions with respect to a small set of state variablesthat are formed by aggregating measured values of the network nodes. In such a setting, the wireless network can leverage achannel that emerges from a dumb bit-pipe to a network and signal processing building block providing arithmetic operationsthat are facilitated by the laws of physics. In particular, the superposition property of electromagnetic waves provides certaincomputation capabilities inherently. In turn, this property can drive the convergence between pure transmission of waveformsand performing arithmetic operations on network variables. In this regard, References [1], [2] showed that channel collisionscan be exploited through a generalized
Computation over Multiple-Access Channels (CoMAC) framework. This frameworksubsumes techniques and mechanisms for coding and transmission to compute functions at a designated sink node using thesuperposition property of the multiple-access channel. In [2], [3], the authors showed that natural characteristics allow tocompute functions contained in the space of nomographic functions . Continuing the analysis of computable functions withinthis framework, this work analyzes the computation of ℓ p norms. Computing ℓ p norms is of high practical relevance for manyapplications, as it allows to compute the number of non-zero elements for p → (and various proxies for < p ≤ ) or themaximum value for p → ∞ (see Fig. 1). The main contribution of this work is a unified analysis for CoMAC of ℓ p using shortsequences and a fixed energy detector at the receiver. A. Notation
Scalars, vectors and matrices are denoted by lowercase, bold lowercase and bold uppercase letters, respectively. R , R + , Z + and N denote the sets of real, nonnegative real, nonnegative integer and natural numbers. , and I K denote the vector of allzeros, all ones and the identity matrix of size K × K . tr {·} , vec {·} and unvec {·} denote the trace of a matrix, the vectorizationof a matrix obtained by stacking it’s columns and the inverse vectorization operation. N ( · , · ) and E [ · ] describe the normaldistribution and the expectation operator. ( · ) T and ⊗ denote transposition and kronecker product, respectively.II. C OMPUTING ℓ p - NORMS OVER M ULTIPLE -A CCESS C HANNELS
Consider a wireless sensor network consisting of one designated sink node and K ∈ N nodes that monitor a physical quantityby sensor values [ x , . . . , x K ] T := x ∈ R K . The objective of the network is to compute functions of the form f ( x ) = k x k pp = K X k =1 | x k | p (1)at the sink node for given values of p > (see also Fig. 1 for illustration) Precisely, the class of functions to be computed(referred to as desired functions in what follows in accordance with [2]) is given by the ℓ p -(pseudo)-norm to the power ofSfrag replacements x | x | p p=2 p=.25p=.5p=1p=2-1.5 -1 -0.5 0 0.5 1 1.500.511.522.5 Fig. 1: ℓ p (pseudo-)norms for different values of p . p . In order to compute such non-linear functions, we assume as in [2] that each node is equipped with a pre-processingunit ϕ : R → R + and transmit their pre-processed values simultaneously to the sink node using transmit sequences S =[ s , . . . , s K ] ∈ R M × K (see Fig. 2). For ease of exposition, we assume in the following that(i) the sensor nodes have perfect channel knowledge so that the effect of the communication channel can be perfectlyequalized,(ii) the nodes can be synchronized on frame and symbol level and(iii) the receiver side detector is restricted to pure energy detection. Moreover, we treat the problem in the real-valued domain and point out that the analysis for the complex domain followsalong similar lines.
Remark 1.
It is important to emphasize that except for the requirement of real-valued entries, there are no additional constraintson the matrix S . This means that transmit sequences are jointly optimized with transmit powers. This stands in contrast to thestudies related to CDMA systems, where the spreading sequences are normalized to be of unit norm and transmit powers aredefined separately (see for instance [4]). In the proposed setup depicted in Fig. 2, the received signal y and the output of the energy detector ˆ f are, respectively,given by y = Sϕ ( x ) + n (2) ˆ f = k Sϕ ( x ) + n k . (3)Here, ϕ : R K → R K denotes the concatenated mapping from raw sensor readings x to transmit symbols ϕ ( x ) that is definedto be ϕ ( x ) := [ | x | p , . . . , | x K | p ] T . (4)PSfrag replacements x x K ϕ ( · ) ϕ ( · ) s s K n k·k ˆ f ( x ) Fig. 2: System structure of the proposed ℓ p -norm computation scheme. Here, the prefix pseudo applies to p < , in which case the functions are not a proper norm. For an analysis of different channel state information schemes at the sensor nodes and the effect of coarse frame synchronization we refer the reader to[2]. An analysis of more complex (e.g. affine or nonlinear) receiver-side detectors is beyond the scope of the paper. emark 2.
In an idealized setting with n = and M ≥ K , the desired function value can be recovered exactly using any S such that S T S = I K . In this case, we have ˆ f ( x ) = k Sϕ ( x ) k = ϕ ( x ) T S T Sϕ ( x ) = K X k =1 (cid:16) | x k | p (cid:17) (5) = k x k pp ≡ f ( x ) . However, the case described in Remark 2 is of minor practical relevance, as it excludes the effect of noise and is limited tothe case K ≥ M , i.e. sequence lengths that exceed the number of network nodes. The more interesting case is a noisy settingwith M < K , and we motivate the use of the given pre-processing function and system structure by the identity in (5). Wenote that in addition to the application for networked computation of ℓ p -norms, the considered setting is also closely relatedto problems in coding theory and robust dimensionality reduction (see e.g. [5],[6]).To simplify the subsequent analysis, we fix some M < K and consider an
MSE metric for estimating f by ˆ f using S anda fixed receiver structure as depicted in Fig. 2. In this case, the MSE can be computed as J ( S ) = E h ( f − ˆ f ) i (6) = E ϕ ( x ) T ( I K − S T S ) ϕ ( x ) | {z } a − ϕ ( x ) T S T n | {z } b − n T n | {z } c = E (cid:2) a + b + c − ab − ac + 2 bc (cid:3) , which can be evaluated assuming that the probability distribution functions of x and n are given. Lemma 1.
Assume that x and n are independent and distributed as x ∼ N ( , σ x I K ) and n ∼ N ( , σ n I M ) . Under thisassumption, the expectation in (6) decomposes and we have E (cid:2) a (cid:3) = tr (cid:8) M ( S T S ⊗ S T S ) (cid:9) (7) − (cid:8) M ( I ⊗ S T S ) (cid:9) + tr { M } E (cid:2) b (cid:3) = 4 σ n tr (cid:8) CS T S (cid:9) E (cid:2) c (cid:3) = tr { N } E [ ac ] = M σ n tr (cid:8) C ( I − S T S ) (cid:9) E [ ab ] = E [ bc ] = 0 , with C = E (cid:2) ϕ ( x ) ϕ ( x ) T (cid:3) (8) M = E h vec (cid:8) ϕ ( x ) ϕ ( x ) T (cid:9) vec (cid:8) ϕ ( x ) ϕ ( x ) T (cid:9) T i N = E h vec (cid:8) nn T (cid:9) vec (cid:8) nn T (cid:9) T i , Proof:
The result for E [ a ] follows from ϕ ( x ) T Aϕ ( x ) ϕ ( x ) T Bϕ ( x ) = (9) = tr (cid:8) ϕ ( x ) ϕ ( x ) T Aϕ ( x ) ϕ ( x ) T B (cid:9) = vec (cid:8) A T ϕ ( x ) ϕ ( x ) T (cid:9) T vec (cid:8) ϕ ( x ) ϕ ( x ) T B (cid:9) = tr n ( B T ⊗ A )vec (cid:8) ϕ ( x ) ϕ ( x ) T (cid:9) vec (cid:8) ϕ ( x ) ϕ ( x ) T (cid:9) T o using tr { ABC } = tr { CAB } = tr { BCA } (10) tr (cid:8) A T B (cid:9) = vec { A } T vec { B } vec { ABC } = ( C T ⊗ A )vec { B } ( A ⊗ B )( C ⊗ D ) = AC ⊗ BD , and performing the expectation w.r.t. the random variable x . The term E [ c ] follows similarly with n as random variable. E [ b ] , E [ ac ] , E [ ab ] and E [ b ] follow from independency of x and n and the zero mean assumption on n , respectively.oreover, we point out the following without a proof. Lemma 2.
The MSE function J : R M × K → R + defined by (6) attains a minimum on R M × K . To compute the matrices C , M and N , we note, that all matrix entries are equal to monomials with either non-negativerational exponents α ∈ R K + for C and M , or non-negative integer exponents β ∈ Z M + for N . Thus, the entries of C and M can be computed by extending the derivation of central absolute moments in [7] to the monomial case: Q ( α ) = E " K Y k =1 | x k | α k = Z R K | x | α · · · | x K | α K p x ( x ) d x = (2 σ x ) P Kk =1 αk √ π K K Y k =1 Γ (cid:18) α k + 12 (cid:19) . (11)Similarly, the entries of N can be obtained by extending the derivation of central moments in [7]: I ( β ) = E " M Y m =1 n β k k = Z R M n β · · · n β M M p n ( n ) d n (12) = if some β m is odd (2 σ n ) P Mm =1 βm √ π M Q Mm =1 Γ( β m +12 ) if all β m are even.Now we are in a position to state our optimization problem. Proposition 1.
Let ˆ f be given by (3) , and let x ∼ N ( , σ x I K ) and n ∼ N ( , σ n I M ) be independent. Then, the optimalsequences for computing f = k x k pp given the receiver structure in Fig. 2 w.r.t. an MSE metric can be obtained as a solutionto the problem min S ∈ R M × K tr (cid:8) M ( S T S ⊗ S T S ) (cid:9) − (cid:8) M ( ⊗ S T S ) (cid:9) + tr { M } + 4 σ n tr (cid:8) CS T S (cid:9) + tr { N } − M σ n tr (cid:8) C (cid:0) I − S T S (cid:1)(cid:9) (13) = min S ∈ R M × K J ( S ) . Proof:
The proof follows directly by combining Lemma 2, Lemma 1 and (6).To the best of our knowledge, a closed-form solution to this problem is not known. Consequently we are going to approachthe problem by numerical methods.III. F
IRST -O RDER O PTIMIZATION OF T RANSMIT S EQUENCES
In this section, we develop a simple gradient descent algorithm to optimize the cost function J over the unconstrained inputdomain R M × K . Unfortunately, the problem of Proposition 1 (Problem (13)) is not convex and there is no guarantee that thealgorithm converges to a global minimum. The problem of designing an algorithm with global convergence is left as an openproblem for future research.Instead, we make use of the well-known gradient descent iteration to optimize the MSE function J : S ( t +1) = S ( t ) − µ ( t ) ∇ S J ( S ( t ) ) , (14)where a suitable step-size that guarantees a non-increasing sequence of objective values J ( S ( t ) ) can be obtained using theArmijo criterion [8], [9] J (cid:16) S ( t ) − µ ( t ) ∇ S J ( S ( t ) ) (cid:17) ≤ J ( S ( t ) ) − cµ ( t ) k∇ S J ( S ( t ) ) k F . (15)In fact, it can be shown that a sufficiently small constant step size would guarantee a non-increasing sequence of objectivevalues as well. Since the objective function is bounded below (it is greater than zero), we can conclude that the sequence J ( S ( t ) ) must converge under (15). This fact will be used for termination condition.To obtain an analytic expression for ∇ S J ( S ) we refer to the analytic expression in Lemma 1 and (6), and simple standardformulas (e.g. [10]) yield ∇ S E (cid:2) b (cid:3) = 4 σ n ( SC T + SC ) (16) ∇ S E [ ac ] = − M σ n ( SC T + SC ) ∇ S E (cid:2) b (cid:3) = ∇ S E (cid:2) c (cid:3) = 0 . t remains to compute ∇ S E (cid:2) a (cid:3) . To this end, we introduce the following lemma: Lemma 3.
Let A ∈ R K × K and R ( A ) denote a permutation of A given by R ( A ) = [vec { A , } , . . . , vec { A K, } , vec { A , } , . . . , vec { A , } , . . . , vec { A K,K } ] T (17) where A i,j ∈ R K × K , i, j ∈ { , . . . , K } denote the K × K block matrix partitioning of A . If R ( A ) has a singular valuedecomposition R ( A ) = K X k =1 σ k u k v Tk , (18) where σ k , u k and v k are the corresponding singular values and singular vectors, then the matrices U k = unvec { u k } and V k = unvec { v k } form a decomposition A = K X k =1 σ k U k ⊗ V k . (19) Proof:
The proof follows directly from Corollary 2.2 in [11].
Remark 3.
In our case M has the additional property that R ( M ) = M = M T , which is stated here without proof. Hence,the SVD in Lemma 3 can be replaced by an EVD. Consequently, the matrix M can be decomposed as M = K X k =1 M k ⊗ M k , (20) with M k := √ σ k U k ≡ √ σ k V k . Now we are in a position to obtain an analytic expression for ∇ E [ a ] . Proposition 2.
Let E [ a ] be given according to Lemma 1 and M according to Lemma 3 and Remark 3. Then, ∇ E [ a ] isgiven by ∇ S E [ a ] = 2 K X k =1 tr { M k S T S } · ( SM k + SM Tk ) (21) − K X k =1 tr { M k } · ( SM Tk + SM k ) . Proof:
Define ∇ S E [ a ] := ∆ (1) − (2) + ∆ (3) with ∆ (1) := ∇ S tr (cid:8) M ( S T S ⊗ S T S ) (cid:9) (22) ∆ (2) := ∇ S tr (cid:8) M ( I ⊗ S T S ) (cid:9) ∆ (3) := ∇ S tr { M } . Using Lemma 3, Remark 3 and trace derivatives (e.g. [10]) we obtain ∆ (1) = ∇ S K X k =1 tr (cid:8) ( M k ⊗ M k ) · ( S T S ⊗ S T S ) (cid:9) (23) = ∇ S K X k =1 tr { M k S T S } tr { M k S T S } = 2 K X k =1 tr { M k S T S } · ( SM k + SM Tk ) , (2) = ∇ S K X k =1 tr (cid:8) ( M k ⊗ M k ) · ( ⊗ S T S ) (cid:9) (24) = ∇ S K X k =1 tr { M k } · tr { M k S T S } = K X k =1 tr { M k } · ( SM Tk + SM k ) , and ∆ (3) = 0 , (25)which completes the proof.Using Proposition 2 and (16), the overall gradient of J ( S ) w.r.t. S is given by ∇ S J ( S ) = ∇ S E [ a ] + ∇ S E [ b ] − ∇ S E [ ac ] (26) = ∆ (1) − (2) + (2 M + 4) σ n (cid:0) SC T + SC (cid:1) . In practice, the sum involved in the computation of ∆ (1) and ∆ (2) can be truncated depending on the decay of singular values σ k to reduce the computational burden. The resulting gradient descent algorithm is described in Alg. 1. Data : initial iterate S (0) , desired norm p Result : optimized matrix ˜ S ∈ R M × K initialization: compute C , M , M k , N ; while J ( S ( t ) ) − J ( S ( t − ) ≥ εJ ( S ( t ) ) and t ≤ T do find µ ( t ) that satisfies (15) ; S ( t +1) = S ( t ) − µ ( t ) ∇ S J ( S ) ; end Algorithm 1: Gradient descent algorithm for optimizing sequence matrices.IV. N
UMERICAL R ESULTS
To evaluate the performance of the proposed first-order optimization scheme we simulate a network consisting of K ∈ { , } nodes and sequence lengths M ∈ { , } to compute the desired function f = k x k pp for p ∈ [10 − , . The signal and noisepowers are set to σ x = 1 and σ n ∈ { . , . } , the number of Monte Carlo iterations is and the gradient descentoptimization is carried out using relative threshold ε = 10 − , Armijo parameter c = 0 . and a maximum of T = 10 gradientdescent iterations. For comparison, we choose equiangular tight frames (ETFs), which are known to meet both Welch Bound and
Maximum Welch Bound with equality and are good candidate solutions for many applications in communications andcoding (see e.g. [5]). For the simulations, we use scaled versions of the × and × ETFs from [12, p. 78f], where thescaling factor is obtained by line-search to optimize (6) (denoted by
WBE ). The result is fed as initial iterate into the gradientdescent optimization algorithm from Alg. 1 (denoted by
ALG 1 ). The results are depicted in Fig. 3 and 4. According to oursimulation results, we can achieve considerable performance gains over ETFs for p (strictly) between − and . On the otherhand, the performance gains for the case p = 4 and noise level σ n = 10 − as well as the case p = 10 − and noise level σ n = 0 . are rather moderate. Remark 4.
It is important to emphasize that the transmit powers (norms) of the compared sequences are allowed to bedifferent in our setting. However, higher transmit powers do not necessarily result in a lower estimation error due to the fixedenergy detector at the receiver (see Fig. 2). In fact, our simulation results show, that optimized sequences can even have lowertotal/maximum transmit power in some cases.
V. C
ONCLUSION
In this paper, we studied the problem of computing ℓ p - norms over the wireless channel using a previously proposed schemein an idealized setting comprising perfect channel equalization and node synchronization. Assuming a simple energy detectionscheme at a designated sink node and scalar pre-processing units at the transmitter nodes we optimize sequences for the bestMSE performance. For the case of Gaussian priors on signal and noise, we give a unified error-analysis for the resultingSfrag replacements p l og ( J ) WBE: 6x16 ALG 1: 3x6WBE: 3x6ALG 1: 6x16WBE: 6x16 σ x = 1 and σ n = 10 − . Monte Carlo results are shown in solid, analytical results from (13) indashed linestyle.PSfrag replacements p l og ( J ) WBE: 6x16 ALG 1: 3x6WBE: 3x6ALG 1: 6x16WBE: 6x16 σ x = 1 and σ n = 0 . . Monte Carlo results are shown in solid, analytical results from (13) indashed linestyle.MSE as a function of p and the deployed transmit sequences. By using a simple gradient descent scheme, we showed that (Maximum) Welch Bound Equality Sequences , which are a good candidate solution for network tasks involving interferenceavoidance, can be outperformed in terms of an MSE criterion, though the performance gains in the investigated small-scalenetwork are rather moderate. An interesting direction to further improve the MSE performance is the use of more complexreceiver structures as well as fixed-rank manifold based optimization methods as outlined in [9]. Promising applications ofthe outlined computation scheme involve measuring the sparsity of sensor values in a network or the maximum sensor value.However, for measuring the sparsity of the sensor values, the Gaussian signal prior poses a limitation in the sense that typicalrealizations are not sparse. Using a more accurate model for sparse signals by sparse processes or compressible distributionsconstitutes an interesting task, however, the required analysis seems to be much more complicated.A CKNOWLEDGMENT
This work was supported by the German Research Foundation (DFG) under grant STA 864/7-1 and by the German Ministry ofResearch and Education (BMBF) under grant 01BU1224. The authors would like to thank J. Mohammadi, R.L.G. Cavalcante andM. Goldenbaum for helpful comments and discussions as well as the authors of [9] for making available their implementation,which was used as a reference for the implementation of Alg. 1.
EFERENCES[1] B. Nazer and M. Gastpar, “Computation over multiple-access channels,”
IEEE Transactions on Information Theory , vol. 53, no. 10, pp. 3498–3516,2007.[2] M. Goldenbaum and S. Stanczak, “Robust analog function computation via wireless multiple-access channels,”
IEEE Transactions on Communications ,vol. 61, no. 9, 2013.[3] M. Goldenbaum and S. Stanczak, “Computing functions via SIMO multiple-access channels: How much channel knowledge is needed?,” in
AcousticsSpeech and Signal Processing (ICASSP), 2010 IEEE International Conference on . IEEE, 2010, pp. 3394–3397.[4] H. Boche and S. Stanczak, “Iterative algorithm for finding optimal resource allocations in symbol-asynchronous cdma channels with different sirrequirements,” in
Signals, Systems and Computers, 2002. Conference Record of the Thirty-Sixth Asilomar Conference on . IEEE, 2002, vol. 2, pp.1909–1913.[5] T. Strohmer and R. W. Heath, “Grassmannian frames with applications to coding and communication,”
Applied and computational harmonic analysis ,vol. 14, no. 3, pp. 257–275, 2003.[6] P. Li, T. J. Hastie, and K. W. Church, “Nonlinear estimators and tail bounds for dimension reduction in ℓ using cauchy random projections,” in Learning Theory , pp. 514–529. Springer, 2007.[7] A. Winkelbauer, “Moments and absolute moments of the normal distribution,” arXiv preprint arXiv:1209.4340 , 2012.[8] S. J. Wright and J. Nocedal,
Numerical optimization , Springer New York, 1999.[9] B. Mishra, G. Meyer, and R. Sepulchre, “Low-rank optimization for distance matrix completion,” in
IEEE Conference on Decision and Control andEuropean Control Conference (CDC-ECC) , 2011.[10] A. Hjorungnes and D. Gesbert, “Complex-valued matrix differentiation: Techniques and key results,”
IEEE Transactions on Signal Processing , vol. 55,no. 6, pp. 2740–2746, 2007.[11] C. F. Van Loan and N. Pitsianis,
Approximation with Kronecker products , Springer, 1993.[12] D. Redmond,