Resource Allocation and Dithering of Bayesian Parameter Estimation Using Mixed-Resolution Data

Abstract

Quantization of signals is an integral part of modern signal processing applications, such as sensing, communication, and inference. While signal quantization provides many physical advantages, it usually degrades the subsequent estimation performance that is based on quantized data. In order to maintain physical constraints and simultaneously bring substantial performance gain, in this work we consider systems with mixed-resolution, 1-bit quantized and continuous-valued, data. First, we describe the linear minimum mean-squared error (LMMSE) estimator and its associated mean-squared error (MSE) for the general mixed-resolution model. However, computing the MSE of the LMMSE estimator requires a matrix inversion in which the number of measurements defines the matrix dimensions, and, thus, it is not a tractable tool for optimization and system design. Therefore, we present the linear Gaussian orthonormal (LGO) measurement model and derive a closed-form analytic expression for the MSE of the LMMSE estimator under this model. In addition, we present two common special cases of the LGO model: 1) scalar parameter estimation and 2) channel estimation in mixed-ADC multiple-input multiple-output (MIMO) communication systems. We then solve the resource allocation optimization problem of the LGO model, with the proposed tractable form of the MSE as the objective function and under a power constraint, using a one-dimensional search. Moreover, we present the concept of dithering for mixed-resolution models and optimize the dithering noise as part of the resource allocation optimization problem for two dithering schemes: 1) adding noise only to the quantized measurements and 2) adding noise to both measurement types. Finally, we present simulations that demonstrate the advantages of using mixed-resolution measurements and the possible improvement introduced by dithering and resource allocation.


arXiv preprint [eess.SP], Sep.

Itai E. Berman, Student Member, IEEE, and Tirza Routtenberg, Senior Member, IEEE


Index Terms —Massive MIMO, resource allocation, mixed-ADC, linear minimum mean-squared error, dithering

I. INTRODUCTION

Traditional statistical signal processing (SSP) techniques were developed for high-resolution sensors, under the unrealistic assumption of infinite-precision sampling, or "analog" data, for which the quantization effect can be neglected. High-resolution sensors result in high performance in various SSP tasks, such as parameter estimation. In modern signal processing, signal quantization plays an important role in various applications, including wireless sensor networks (WSNs) [1]–[5], direction-of-arrival estimation [6], target tracking [7]–[9], multiple-input multiple-output (MIMO) communications [10]–[17], cognitive radio [18], and array processing [19]. For example, in communication systems there is usually a need for cheaper, less power-hungry analog-to-digital converters (ADCs) while maintaining accurate channel estimation [13].

This work is partially supported by the ISRAEL SCIENCE FOUNDATION (ISF), grant No. 1173/16. I. Berman and T. Routtenberg are with the School of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Beer-Sheva 84105, Israel.

The widespread use of signal quantization is due to its many advantages, which include reductions in hardware complexity, power consumption, communication bandwidth, sensor cost, and sensor size, as well as enabling high-rate sampling [20], [21]. Despite these practical advantages, quantization produces low-resolution signals, which degrade the performance of subsequent parameter estimation based on the quantized data. Moreover, signal quantization introduces nonlinear effects into the system, which pose new challenges for parameter estimation that relies on these signals, such as non-convex optimization [22], [23]. The nonlinear problem of parameter estimation based on low-precision samples, especially from 1-bit (sign) measurements, has been discussed widely in the literature (see, e.g., [23]–[27]). For instance, in [1], [2], [28] the maximum-likelihood (ML) estimator and the corresponding Cramér-Rao bounds (CRBs) for quantized samples are presented. However, the ML estimator is usually intractable, and, thus, various suboptimal, low-complexity methods have been developed in the literature [14], [19], [24], [29]. In [30], the estimation of a random parameter from 1-bit dithered measurements is studied, where lower bounds on the mean-squared error (MSE) are derived using the Bayesian CRB and dither strategies are designed. Studies on channel estimation in massive MIMO systems with 1-bit ADCs show that, owing to the large number of antennas, the channel capacity and the achievable rate remain acceptable compared with those obtained with analog ADCs [10], [31], [32].
In all these methods, the use of quantized data degrades the estimation performance compared with methods based on analog data. In addition to purely quantized or purely analog data, a few works have used data with multiple quantization resolutions [4], [13], [33]. For example, the ML estimator and the CRB for non-Bayesian estimation with partially quantized observations are considered in [33]. For the Bayesian case, minimum MSE (MMSE) estimation of a uniformly distributed parameter based on both quantized and unquantized observations has been suggested in [4], and linear MMSE (LMMSE) estimation in specific applications is discussed in [11], [12], [15]. However, while there are various estimation algorithms based on quantized signals (see above), there has been less emphasis on the analysis and design of mixed-resolution architectures. Considerable improvements could be obtained by optimizing estimation schemes that rely on both quantized and continuous-valued data with respect to their parameters, performance, and complexity. This optimization is crucial for coping with the limited resources for data processing, storage, and communication in real-world applications. However, incorporating quantized data results in non-trivial operations, creating a need for new tools. A main example is dithering, i.e., adding noise to a signal prior to its quantization: in contrast with continuous-valued data, dithering can improve the estimation performance based on the quantized signal [22], [34], [35].

In this work, we consider Bayesian parameter estimation in systems with mixed-resolution, analog and 1-bit quantized, measurements. We develop the LMMSE estimator and its associated MSE for the considered model.
We present the linear Gaussian orthonormal (LGO) measurement model, which is shown to generalize common schemes, including 1) scalar parameter estimation and 2) channel estimation in mixed-ADC massive MIMO communication systems. A closed-form analytic expression for the MSE of the LMMSE estimator is derived under the LGO measurement model. The resource allocation problem is formulated under the LGO model, with the tractable MSE expression as the objective function and under power constraints, and is solved using a one-dimensional search. The concept of dithering, i.e., adding noise to the measurements before quantization, is presented for the mixed-resolution scheme, and the resource allocation problem is solved while also optimizing the dithering noise. Finally, simulations of the LGO model demonstrate the advantages of mixed-resolution estimation compared with purely quantized or purely analog settings, both for the scalar case and for channel estimation in massive MIMO. In addition, they show the possible improvement from dithering, even when the noise is added to both the analog and the quantized measurements.

The remainder of the paper is organized as follows: Section II presents the general mixed-resolution measurement model and the resource allocation problem, which is shown to be computationally tedious. In Section III, the LGO measurement model is presented and the resource allocation problem is solved for this model, including dithering design. In Section IV, special cases of the LGO model are discussed. Simulations of the resource allocation method are given in Section V. Finally, our conclusions are presented in Section VI.

Notation:

We use boldface lowercase letters to denote vectors and boldface capital letters for matrices. The identity matrix of size $M\times M$ is denoted by $\mathbf{I}_M$, and the vector of ones of length $N$ is denoted by $\mathbf{1}_N$. The symbols $(\cdot)^*$, $(\cdot)^T$, and $(\cdot)^H$ represent the conjugate, transpose, and conjugate transpose operators, respectively. The symbol $\otimes$ denotes the Kronecker product. We use $\mathrm{trace}(\mathbf{A})$ to denote the trace of the matrix $\mathbf{A}$, and $\mathrm{diag}(\mathbf{A})$ to denote a diagonal matrix containing only the diagonal elements of $\mathbf{A}$. The $\arcsin(\cdot)$ function, when applied to a vector or matrix, is applied element-wise. The distribution of a circularly symmetric complex Gaussian random vector with mean $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$ is denoted by $\mathcal{CN}(\boldsymbol{\mu},\boldsymbol{\Sigma})$ and is referred to hereafter simply as complex Gaussian. We denote the covariance matrix of a (zero-mean) vector $\mathbf{a}$ by $\mathbf{C}_{\mathbf{a}} = \mathrm{E}[\mathbf{a}\mathbf{a}^H]$ and the cross-covariance between vectors $\mathbf{a}$ and $\mathbf{b}$ by $\mathbf{C}_{\mathbf{a}\mathbf{b}} = \mathrm{E}[\mathbf{a}\mathbf{b}^H]$. The set of non-negative integers is denoted by $\mathbb{Z}_+$. The 1-bit element-wise quantization function is applied separately to the real part, $\mathrm{Re}(z)$, and the imaginary part, $\mathrm{Im}(z)$, of any complex number $z\in\mathbb{C}$, and is defined as

$$Q(z) = \frac{1}{\sqrt{2}}\left(\begin{cases} 1, & \mathrm{Re}(z)\geq 0\\ -1, & \mathrm{Re}(z)<0\end{cases} + j\begin{cases} 1, & \mathrm{Im}(z)\geq 0\\ -1, & \mathrm{Im}(z)<0\end{cases}\right). \tag{1}$$

II. SYSTEM MODEL

In this section, we present the system model and introduce the problem of parameter estimation using mixed-resolution measurements. In Subsection II-A, the measurement model is presented, and in Subsection II-B, the LMMSE estimator and its associated MSE are derived for this model. Finally, in Subsection II-C, we present the resource allocation optimization problem and discuss the difficulties of solving it.
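The 1-bit quantizer in (1), and the arcsine relation that later appears in the quantized block of (10), can be illustrated with a short Python sketch. It is a minimal illustration, not the authors' code: the function names (`quantize_1bit`, `complex_gaussian`, `arcsine_law_check`) are hypothetical, and the check assumes two scalar samples that share a common complex Gaussian signal term with variance `signal_var` and carry independent noise with variance `noise_var`, so that their normalized correlation is real and equal to `signal_var / (signal_var + noise_var)`.

```python
import math
import random

def quantize_1bit(z: complex) -> complex:
    """Complex 1-bit quantizer Q(z) of (1).

    The signs of Re(z) and Im(z) are taken separately (zero maps to +1) and
    scaled by 1/sqrt(2), so that |Q(z)|^2 = 1 for every z.
    """
    re = 1.0 if z.real >= 0 else -1.0
    im = 1.0 if z.imag >= 0 else -1.0
    return complex(re, im) / math.sqrt(2.0)

def complex_gaussian(var: float, rng: random.Random) -> complex:
    """One draw from CN(0, var): independent real/imaginary parts, each N(0, var/2)."""
    sd = math.sqrt(var / 2.0)
    return complex(rng.gauss(0.0, sd), rng.gauss(0.0, sd))

def arcsine_law_check(signal_var=1.0, noise_var=1.0, n=50_000, seed=0):
    """Monte Carlo estimate of E[Q(y1) Q(y2)^*] for y_i = s + w_i, with a shared
    signal s ~ CN(0, signal_var) and independent noises w_i ~ CN(0, noise_var),
    returned next to the arcsine-law prediction (2/pi) * arcsin(rho) for the
    real normalized correlation rho = signal_var / (signal_var + noise_var)."""
    rng = random.Random(seed)
    acc = 0j
    for _ in range(n):
        s = complex_gaussian(signal_var, rng)
        y1 = s + complex_gaussian(noise_var, rng)
        y2 = s + complex_gaussian(noise_var, rng)
        acc += quantize_1bit(y1) * quantize_1bit(y2).conjugate()
    rho = signal_var / (signal_var + noise_var)
    return acc / n, (2.0 / math.pi) * math.asin(rho)

empirical, predicted = arcsine_law_check()
print(empirical, predicted)
```

With the default values, the normalized correlation is 0.5 and the prediction is $(2/\pi)\arcsin(0.5) = 1/3$; the Monte Carlo estimate should agree with it up to sampling error.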

A. General Mixed-Resolution Measurement Model

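We consider the problem of estimating a random parameter vector based on mixed-resolution data. In particular, we assume a parameter vector, $\boldsymbol{\theta}\in\mathbb{C}^M$, with a zero-mean complex Gaussian distribution, $\boldsymbol{\theta}\sim\mathcal{CN}(\mathbf{0},\boldsymbol{\Sigma}_{\boldsymbol{\theta}})$, where $\boldsymbol{\Sigma}_{\boldsymbol{\theta}}$ is a known positive definite covariance matrix. The goal is to estimate $\boldsymbol{\theta}$ from a linear measurement model having both analog, high-resolution measurements:

$$\mathbf{x}_a = \mathbf{H}\boldsymbol{\theta} + \mathbf{w}_a, \tag{2}$$

and quantized, low-resolution measurements:

$$\mathbf{x}_q = Q(\mathbf{G}\boldsymbol{\theta} + \mathbf{w}_q), \tag{3}$$

where the quantization operator, $Q(\cdot)$, is defined in (1) and applied element-wise. The matrices $\mathbf{H}\in\mathbb{C}^{N_a\times M}$ and $\mathbf{G}\in\mathbb{C}^{N_q\times M}$ are known, with $N_a$ and $N_q$ being the numbers of analog and quantized measurements, respectively, and the noise vectors, $\mathbf{w}_a$ and $\mathbf{w}_q$, are zero-mean complex Gaussian, i.e., $\mathbf{w}_a\sim\mathcal{CN}(\mathbf{0},\sigma_a^2\mathbf{I}_{N_a})$ and $\mathbf{w}_q\sim\mathcal{CN}(\mathbf{0},\sigma_q^2\mathbf{I}_{N_q})$. It is also assumed that the noise vectors, $\mathbf{w}_a$ and $\mathbf{w}_q$, and the unknown parameter vector, $\boldsymbol{\theta}$, are mutually independent. As a result, the analog measurements follow a complex Gaussian distribution, i.e., $\mathbf{x}_a\sim\mathcal{CN}(\mathbf{0},\mathbf{C}_{\mathbf{x}_a})$, with the covariance matrix

$$\mathbf{C}_{\mathbf{x}_a} = \mathbf{H}\boldsymbol{\Sigma}_{\boldsymbol{\theta}}\mathbf{H}^H + \sigma_a^2\mathbf{I}_{N_a}. \tag{4}$$

Similarly, the vector

$$\mathbf{y} \triangleq \mathbf{G}\boldsymbol{\theta} + \mathbf{w}_q \tag{5}$$

is a complex Gaussian vector, i.e., $\mathbf{y}\sim\mathcal{CN}(\mathbf{0},\mathbf{C}_{\mathbf{y}})$, with the covariance matrix

$$\mathbf{C}_{\mathbf{y}} = \mathbf{G}\boldsymbol{\Sigma}_{\boldsymbol{\theta}}\mathbf{G}^H + \sigma_q^2\mathbf{I}_{N_q}. \tag{6}$$

In particular, since $\mathbf{y}$ is zero-mean and circularly symmetric, (3) implies that $\mathrm{E}[\mathbf{x}_q] = \mathbf{0}$. However, the distribution of the quantized measurements $\mathbf{x}_q$ in (3) does not have a closed-form expression in general. The goal is to use the mixed-resolution measurements, i.e., the augmented vector

$$\mathbf{x} \triangleq \left[\mathbf{x}_a^T\ \ \mathbf{x}_q^T\right]^T, \tag{7}$$

to estimate $\boldsymbol{\theta}$ efficiently.

The considered model is fundamental in various signal processing applications with mixed-resolution data. An important case of this model, which is discussed in detail in Subsection IV-B, is channel estimation in massive MIMO communication systems.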
In this case, multiple users transmit data to multiple antennas, and the goal is to estimate the channel between the users and each antenna, which is the unknown parameter vector $\boldsymbol{\theta}$ in this case. Due to system limitations, such as sensor power consumption, sensor cost, and channel capacity, part of the observed data is quantized at the antenna and sent to the fusion center for estimation. This scenario, presented schematically in Fig. 1, can be interpreted as a joint distributed-centralized estimation setup in a MIMO communication system with partially quantized measurements.

Fig. 1: Schematic system model of estimation with mixed-resolution data. A data vector, $\boldsymbol{\theta}$, is transmitted and received over a known channel and undergoes either a high-resolution or a 1-bit low-resolution quantization. The unknown data vector is then estimated from the mixed-resolution measurements in the fusion center.

B. LMMSE Estimation

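The MMSE estimator of $\boldsymbol{\theta}$ based on the mixed-resolution data $\mathbf{x}$ in (7) is given by the conditional expectation, $\hat{\boldsymbol{\theta}}_{\mathrm{MMSE}} = \mathrm{E}[\boldsymbol{\theta}\,|\,\mathbf{x}]$. Deriving the MMSE estimator requires an analytic form of the conditional probability density function, $f(\boldsymbol{\theta}\,|\,\mathbf{x})$, which, in general, does not have a closed-form expression in the presence of quantized measurements. Moreover, since the MMSE estimator is a function of both the analog (continuous-valued) and the quantized (discrete-valued) measurements, even the numerical evaluation of $\hat{\boldsymbol{\theta}}_{\mathrm{MMSE}}$ is intractable, as it requires multidimensional numerical integration. Therefore, the LMMSE estimator is usually used when quantized measurements are involved.

For simplicity of presentation, in the following, $\hat{\boldsymbol{\theta}}$ denotes the LMMSE estimator. For zero-mean measurements, as in our case, and under the assumption that $\mathbf{C}_{\mathbf{x}}$ is non-singular, the LMMSE estimator based on both $\mathbf{x}_a$ and $\mathbf{x}_q$ is given by

$$\hat{\boldsymbol{\theta}} = \mathbf{C}_{\boldsymbol{\theta}\mathbf{x}}\mathbf{C}_{\mathbf{x}}^{-1}\mathbf{x}, \tag{8}$$

and the associated MSE of the LMMSE estimator is

$$\mathrm{MSE} = \mathrm{E}\left[(\hat{\boldsymbol{\theta}}-\boldsymbol{\theta})^H(\hat{\boldsymbol{\theta}}-\boldsymbol{\theta})\right] = \mathrm{trace}(\boldsymbol{\Sigma}_{\boldsymbol{\theta}}) - \mathrm{trace}\left(\mathbf{C}_{\boldsymbol{\theta}\mathbf{x}}\mathbf{C}_{\mathbf{x}}^{-1}\mathbf{C}_{\boldsymbol{\theta}\mathbf{x}}^H\right). \tag{9}$$

In Appendix A it is shown that the auto-covariance matrix of $\mathbf{x}$ and the cross-covariance matrix of $\boldsymbol{\theta}$ and $\mathbf{x}$ are the block matrices

$$\mathbf{C}_{\mathbf{x}} = \begin{bmatrix} \mathbf{H}\boldsymbol{\Sigma}_{\boldsymbol{\theta}}\mathbf{H}^H + \sigma_a^2\mathbf{I}_{N_a} & \sqrt{\tfrac{2}{\pi}}\,\mathbf{H}\boldsymbol{\Sigma}_{\boldsymbol{\theta}}\mathbf{G}^H\left(\mathrm{diag}(\mathbf{C}_{\mathbf{y}})\right)^{-\frac{1}{2}} \\ \left(\sqrt{\tfrac{2}{\pi}}\,\mathbf{H}\boldsymbol{\Sigma}_{\boldsymbol{\theta}}\mathbf{G}^H\left(\mathrm{diag}(\mathbf{C}_{\mathbf{y}})\right)^{-\frac{1}{2}}\right)^H & \frac{2}{\pi}\left(\arcsin\left(\left(\mathrm{diag}(\mathbf{C}_{\mathbf{y}})\right)^{-\frac{1}{2}}\mathrm{Re}(\mathbf{C}_{\mathbf{y}})\left(\mathrm{diag}(\mathbf{C}_{\mathbf{y}})\right)^{-\frac{1}{2}}\right) + j\arcsin\left(\left(\mathrm{diag}(\mathbf{C}_{\mathbf{y}})\right)^{-\frac{1}{2}}\mathrm{Im}(\mathbf{C}_{\mathbf{y}})\left(\mathrm{diag}(\mathbf{C}_{\mathbf{y}})\right)^{-\frac{1}{2}}\right)\right) \end{bmatrix} \tag{10}$$

and

$$\mathbf{C}_{\boldsymbol{\theta}\mathbf{x}} = \left[\boldsymbol{\Sigma}_{\boldsymbol{\theta}}\mathbf{H}^H \quad \sqrt{\tfrac{2}{\pi}}\,\boldsymbol{\Sigma}_{\boldsymbol{\theta}}\mathbf{G}^H\left(\mathrm{diag}(\mathbf{C}_{\mathbf{y}})\right)^{-\frac{1}{2}}\right], \tag{11}$$

respectively. Substituting (10) and (11) into (8) and (9) yields the LMMSE estimator and its associated MSE.

C. Optimization of MSE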

The main goal of this paper is to find the optimal numbers of analog and quantized measurements, $N_a^*$ and $N_q^*$, respectively, in the sense of minimizing the MSE of the associated LMMSE estimator, given in (9), under physical constraints. That is, we aim to solve the following optimization problem:

$$\begin{aligned} \min_{N_a,N_q}\ & \mathrm{trace}(\boldsymbol{\Sigma}_{\boldsymbol{\theta}}) - \mathrm{trace}\left(\mathbf{C}_{\boldsymbol{\theta}\mathbf{x}}\mathbf{C}_{\mathbf{x}}^{-1}\mathbf{C}_{\boldsymbol{\theta}\mathbf{x}}^H\right)\\ \text{s.t.}\ & \mathbf{h}(N_a,N_q)\leq\mathbf{0},\\ & \tilde{\mathbf{h}}(N_a,N_q)=\mathbf{0},\\ & N_a,N_q\in\mathbb{Z}_+, \end{aligned} \tag{12}$$

where $\mathbf{h}(N_a,N_q)\leq\mathbf{0}$ and $\tilde{\mathbf{h}}(N_a,N_q)=\mathbf{0}$ represent inequality and equality constraints, respectively, which stem from physical system requirements. The optimization problem in (12) is an integer programming problem, which often leads to solutions of a combinatorial nature that cannot be found in reasonable time, even for small datasets [36]. Moreover, since the decision variables $N_a$ and $N_q$ determine the dimensions of the matrices $\mathbf{C}_{\boldsymbol{\theta}\mathbf{x}}$ and $\mathbf{C}_{\mathbf{x}}$ in the objective function of (12), the problem cannot be solved by a simple relaxation that allows non-integer solutions. As a result, for each value pair of the decision variables, $N_a$ and $N_q$, we need to compute the inverse matrix, $\mathbf{C}_{\mathbf{x}}^{-1}$, and perform matrix multiplications, with a total computational complexity of $O\left((N_a+N_q)^3 + 2M(N_a+N_q)^2\right)$, which increases with the number of measurements. This approach may be intractable and may hinder insight into the original problem.

III. OPTIMIZATION FOR THE ORTHONORMAL MEASUREMENT MODEL

In this section, we present the LGO measurement model, optimize the resource allocation for this model, and propose the use of dithering. In Subsection III-A, we present the LGO model and derive a tractable expression for the MSE under this model. In Subsection III-B, we discuss some physical constraints common in real-world systems with mixed-resolution measurements, and in Subsection III-C, we solve the resource allocation optimization problem under a power constraint. In Subsection III-D, the concept of dithering for the LGO model is discussed, and two variants of the optimization problem that allow dithering are presented.
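To make the cost discussed at the end of Section II concrete, the following Python sketch evaluates the MSE in (9) directly from (10) and (11) for the illustrative choice $M=1$, $\boldsymbol{\Sigma}_{\boldsymbol{\theta}}=1$, $\mathbf{H}=\mathbf{1}_{N_a}$, and $\mathbf{G}=\mathbf{1}_{N_q}$ (assumptions made here only so that all covariances are real). Every candidate pair $(N_a,N_q)$ requires building and solving an $(N_a+N_q)$-dimensional linear system, which is exactly the step that a closed-form MSE avoids. The function names are hypothetical, and the solver is plain Gaussian elimination rather than a numerically tuned routine.

```python
import math

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(aug[i][k]))  # pivot row
        aug[k], aug[p] = aug[p], aug[k]
        for i in range(k + 1, n):
            f = aug[i][k] / aug[k][k]
            for j in range(k, n + 1):
                aug[i][j] -= f * aug[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def direct_mse_scalar(n_a, n_q, s2_a, s2_q):
    """MSE (9) for M = 1, Sigma_theta = 1, H = 1_{N_a}, G = 1_{N_q}.

    Builds C_x from (10) and C_theta_x from (11); all entries are real here.
    Assumes n_a + n_q >= 1, s2_a > 0 and s2_q > 0, so that C_x is non-singular.
    """
    n = n_a + n_q
    d = 1.0 + s2_q                                # every diagonal entry of C_y in (6)
    g = math.sqrt(2.0 / (math.pi * d))            # sqrt(2/pi) * diag(C_y)^{-1/2}
    off_q = (2.0 / math.pi) * math.asin(1.0 / d)  # arcsine law, quantized off-diagonal
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i < n_a and j < n_a:               # analog block, as in (4)
                C[i][j] = 1.0 + (s2_a if i == j else 0.0)
            elif i >= n_a and j >= n_a:           # quantized block
                C[i][j] = 1.0 if i == j else off_q
            else:                                 # analog/quantized cross blocks
                C[i][j] = g
    c = [1.0] * n_a + [g] * n_q                   # C_theta_x from (11)
    z = solve_linear(C, c)                        # C_x^{-1} C_theta_x^H
    return 1.0 - sum(ci * zi for ci, zi in zip(c, z))

print(direct_mse_scalar(3, 4, 0.5, 0.8))
```

For example, a single quantized sample with unit-variance signal and noise gives $1 - 2/(2\pi) = 1 - 1/\pi$, since then $\mathbf{C}_{\mathbf{x}} = 1$ and $\mathbf{C}_{\boldsymbol{\theta}\mathbf{x}} = 1/\sqrt{\pi}$.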

A. Orthonormal Measurement Model

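The LGO model is the model described in Subsection II-A that also satisfies the following assumptions:

A.1) The elements of $\boldsymbol{\theta}$ are uncorrelated with unit variance, i.e., $\boldsymbol{\Sigma}_{\boldsymbol{\theta}} = \mathbf{I}_M$.

A.2) The matrix $\mathbf{H}$ is a block matrix of size $N_a\times M$, where $N_a = Mn_a$:

$$\mathbf{H} = \left[\mathbf{H}_1^T\ \mathbf{H}_2^T\ \cdots\ \mathbf{H}_{n_a}^T\right]^T, \tag{13}$$

where each $M\times M$ block satisfies

$$\mathbf{H}_i^H\mathbf{H}_i = \rho_a\mathbf{I}_M,\quad i=1,\ldots,n_a, \tag{14}$$

and $\rho_a>0$. For $i\neq j$, the product $\mathbf{H}_i^H\mathbf{H}_j$ can take arbitrary values.

A.3) The matrix $\mathbf{G}$ is a block matrix of size $N_q\times M$, where $N_q = Mn_q$, with equal blocks:

$$\mathbf{G} = \mathbf{1}_{n_q}\otimes\bar{\mathbf{G}}, \tag{15}$$

where

$$\bar{\mathbf{G}}^H\bar{\mathbf{G}} = \rho_q\mathbf{I}_M, \tag{16}$$

in which $\rho_q>0$.

Theorem 1.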

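The LMMSE estimator for the mixed-resolution model described in Subsection II-A, under Assumptions A.1–A.3, is

$$\hat{\boldsymbol{\theta}} = \left[\left(\frac{1}{\rho_a n_a+\sigma_a^2} - \frac{2\rho_q n_q\sigma_a^2}{\pi(\rho_q+\sigma_q^2)(\alpha+\beta(n_a)\rho_q n_q)(\rho_a n_a+\sigma_a^2)^2}\right)\mathbf{H}^H \quad \sqrt{\frac{2}{\pi(\rho_q+\sigma_q^2)}}\,\frac{\sigma_a^2}{(\alpha+\beta(n_a)\rho_q n_q)(\rho_a n_a+\sigma_a^2)}\,\mathbf{G}^H\right]\mathbf{x}, \tag{17}$$

and its associated MSE is

$$\mathrm{MSE} = M - M\left(\frac{\rho_a n_a}{\rho_a n_a+\sigma_a^2} + \frac{2\rho_q n_q\sigma_a^4}{\pi(\rho_q+\sigma_q^2)(\alpha+\beta(n_a)\rho_q n_q)(\rho_a n_a+\sigma_a^2)^2}\right), \tag{18}$$

where

$$\alpha \triangleq 1 - \frac{2}{\pi}\arcsin\left(\frac{\rho_q}{\rho_q+\sigma_q^2}\right) = \frac{2}{\pi}\arccos\left(\frac{\rho_q}{\rho_q+\sigma_q^2}\right) \tag{19}$$

and

$$\beta(n_a) \triangleq \frac{2}{\pi\rho_q}\arcsin\left(\frac{\rho_q}{\rho_q+\sigma_q^2}\right) - \frac{2\rho_a n_a}{\pi(\rho_q+\sigma_q^2)(\rho_a n_a+\sigma_a^2)}. \tag{20}$$

Proof: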

The proof appears in Appendix B.

The MSE in (18) does not require matrix inversion and is a function of the scalar variables $n_a$, $n_q$, $\rho_a$, $\rho_q$, $\sigma_a^2$, and $\sigma_q^2$ only. Thus, it can be used to solve various optimization problems. The two extreme cases of Theorem 1 are $n_a = 0$ and $n_q = 0$. Substituting $n_a = 0$ in (18), and noting that $\beta(0)\rho_q = 1-\alpha$, we obtain

$$\mathrm{MSE} = M - M\frac{2\rho_q n_q}{\pi(\rho_q+\sigma_q^2)(\alpha+(1-\alpha)n_q)}. \tag{21}$$

Similarly, substituting $n_q = 0$ in (18), we obtain

$$\mathrm{MSE} = M - M\frac{\rho_a n_a}{\rho_a n_a+\sigma_a^2}. \tag{22}$$

In both cases, the MSE in (21) and (22) is a monotonically decreasing function of $n_q$ and $n_a$, respectively. The expression in (21) coincides with the result in [10] for purely quantized data.

B. Constraints

Physical distributed networks, such as sensor networks, typically suffer from energy constraints and limited communication bandwidth, which require quantization before data can be transmitted to a fusion center for further processing. In this context, one of the incentives for integrating low-resolution ADCs is their power consumption, which is much lower than that of high-resolution ADCs. In particular, the power consumed by each ADC can be expressed as a function of the number of quantization bits, $\tilde{b}$, as follows:

$$P_{\mathrm{ADC}} = \mathrm{FOM}_W\, f_s\, 2^{\tilde{b}}, \tag{23}$$

where $f_s$ is the sampling rate and $\mathrm{FOM}_W$ is Walden's figure-of-merit for evaluating the power efficiency of an ADC with respect to its resolution and speed [20]. In this paper, the power consumption of a single high-resolution ADC with $b$ bits is denoted by $P_H$ and that of a single 1-bit ADC by $P_L$. Thus, the total power consumption of the $N_a$ high-resolution measurements is $N_aP_H$, and that of the $N_q$ low-resolution measurements is $N_qP_L$. Due to the power limits of physical systems, we assume that the following constraint is imposed on the total power:

$$N_aP_H + N_qP_L \leq P_{\max}. \tag{24}$$

Substituting the power consumption of each ADC from (23), with $\tilde{b}=b$ and $\tilde{b}=1$ for the $b$-bit and 1-bit measurements, respectively, into (24), we obtain the constraint

$$2^bN_a + 2N_q \leq \tilde{P}_{\max}, \tag{25}$$

where $\tilde{P}_{\max} \triangleq P_{\max}/(\mathrm{FOM}_W f_s)$ is the normalized maximum power. By substituting $N_a = Mn_a$ and $N_q = Mn_q$ from Assumptions A.2 and A.3 of the LGO model, (25) can be rewritten as

$$2^bMn_a + 2Mn_q \leq \tilde{P}_{\max}. \tag{26}$$

It should be noted that while the constraint in (24) treats the high-resolution measurements as finite $b$-bit quantized data, the MSE in (18) is derived under the assumption of purely analog measurements. Therefore, the number of bits used by the high-resolution ADC, $b$, should be chosen such that the quantization error is negligible.
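The normalized budget in (26) makes the trade-off between the two ADC types explicit: one block of $M$ $b$-bit measurements costs as much as $2^{b-1}$ blocks of $M$ 1-bit measurements. The Python sketch below, with hypothetical helper names, evaluates the left-hand side of (26) and the largest number of quantized blocks that still fits once the number of analog blocks is fixed; it assumes $\tilde{P}_{\max}$ is given as an integer in units of $\mathrm{FOM}_W f_s$.

```python
def normalized_power(n_a: int, n_q: int, M: int, b: int) -> int:
    """Left-hand side of (26): 2^b * M * n_a + 2 * M * n_q (units of FOM_W * f_s)."""
    return (2 ** b) * M * n_a + 2 * M * n_q

def is_feasible(n_a: int, n_q: int, M: int, b: int, p_max: int) -> bool:
    """True if (n_a, n_q) are non-negative and satisfy the power constraint (26)."""
    return n_a >= 0 and n_q >= 0 and normalized_power(n_a, n_q, M, b) <= p_max

def max_quantized_blocks(n_a: int, M: int, b: int, p_max: int) -> int:
    """Largest n_q allowed by (26) once n_a analog blocks are fixed (n_a assumed feasible)."""
    return (p_max - (2 ** b) * M * n_a) // (2 * M)

# Example: M = 4 parameters, b = 8-bit high-resolution ADCs, normalized budget 4096.
M, b, p_max = 4, 8, 4096
print(normalized_power(1, 0, M, b))          # cost of one analog block: 1024
print(normalized_power(0, 1, M, b))          # cost of one quantized block: 8
print(max_quantized_blocks(3, M, b, p_max))  # quantized blocks left after n_a = 3: 128
```

In this example, giving up a single analog block frees enough power for 128 additional quantized blocks, which is the kind of exchange the resource allocation problem has to weigh.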
In the simulations, we demonstrate that by choosing $b$ large enough, the analog-measurement approximation holds, and the MSE in (18) is achieved with $b$-bit quantized data.

In addition to the power constraint in (25), systems may also have other physical constraints on the number of measurements. These can be due to system design or the available workspace, such as a field in which sensors are deployed and must keep a minimal distance from each other to avoid interference. The optimization in the following subsection can be readily extended to incorporate such constraints.

C. Resource Allocation

In this subsection, we optimize the resource allocation of the LGO model using the analytical expression of the MSE derived in Subsection III-A as the objective function and imposing the constraint from Subsection III-B. We show that, under this model, the integer programming problem in (12) can be solved in polynomial time. It should be noted that the expression of the MSE in (18) can also serve as a tractable objective function for other optimization problems of the LGO mixed-resolution scheme.

Under Assumptions A.1–A.3, the objective function, i.e., the MSE in (9), is given by (18). In addition, since $M$ is known, the decision variables can be changed to $n_a$ and $n_q$, using the relations $N_a = Mn_a$ and $N_q = Mn_q$. The minimization of the MSE in the following is conducted under the power constraint in Subsection III-B. Thus, applying the constraint in (26) and substituting (18) in (12), the minimum-MSE problem under a power constraint is formulated as

$$\begin{aligned} \min_{n_a,n_q}\ & M - M\left(\frac{\rho_a n_a}{\rho_a n_a+\sigma_a^2} + \frac{2\rho_q n_q\sigma_a^4}{\pi(\rho_q+\sigma_q^2)(\alpha+\beta(n_a)\rho_q n_q)(\rho_a n_a+\sigma_a^2)^2}\right)\\ \text{s.t.}\ & 2^bMn_a + 2Mn_q \leq \tilde{P}_{\max},\\ & n_a,n_q\in\mathbb{Z}_+. \end{aligned} \tag{27}$$

Solving the optimization problem in (27) no longer requires the inversion of $\mathbf{C}_{\mathbf{x}}$. Moreover, $n_a$ and $n_q$ no longer appear in the matrix dimensions. Thus, (27) can be solved using either a standard search approach or a conventional relaxation approach. In this paper, we adopt the first option.

Proposition 1.

The optimization problem in (27) can be solved using a one-dimensional search over $n_a$, taking the discrete values

$$n_a\in\left\{0,1,\ldots,\left\lfloor\frac{\tilde{P}_{\max}}{2^bM}\right\rfloor\right\}, \tag{28}$$

where, for each value of $n_a$, the value of $n_q$ is chosen to use the maximum power, i.e.,

$$n_q = \left\lfloor\frac{\tilde{P}_{\max}-2^bMn_a}{2M}\right\rfloor. \tag{29}$$

Proof:

The proof appears in Appendix C.

Based on Proposition 1, one can numerically find the optimal value, $n_a^*$, by a simple one-dimensional search over the discrete values of $n_a$ in (28) that minimizes the MSE. Then, the optimal number of quantized measurement blocks, $n_q^*$, is obtained by substituting $n_a = n_a^*$ in (29). The optimal resource allocation depends on the system parameters: the total power budget $\tilde{P}_{\max}$ and the noise variances $\sigma_a^2$ and $\sigma_q^2$.

In the following, we present a few special cases to interpret the MSE in (18) and the resource allocation optimization in (27).

• For the trivial case where the noise of the analog measurements approaches zero, i.e., $\sigma_a^2\to 0$, the MSE in (18) also approaches zero for any $n_a\geq 1$. Therefore, in this case, an optimal solution of (27) is $n_a^* = 1$, which enables the estimation of $\boldsymbol{\theta}$ without error, such that the MSE is no longer a function of $n_q$. That is, additional measurements, either analog or quantized, do not change the optimal MSE value.

• For the case where the noise of the quantized measurements approaches zero, i.e., $\sigma_q^2\to 0$, the parameter $\alpha$ defined in (19) also approaches zero. By substituting $\sigma_q^2\to 0$ and $\alpha\to 0$ in (18), we obtain that the MSE in this case is given by

$$\lim_{\sigma_q^2\to 0}\mathrm{MSE} = M - M\left(\frac{\rho_a n_a}{\rho_a n_a+\sigma_a^2} + \frac{2\sigma_a^4}{\left(\pi(\rho_a n_a+\sigma_a^2)-2\rho_a n_a\right)(\rho_a n_a+\sigma_a^2)}\right) \tag{30}$$

for $n_q\geq 1$, and by (22) for $n_q = 0$. The optimal resource allocation scheme in this case is determined by the available power, $\tilde{P}_{\max}$. It can be seen that for $n_q\geq 1$, the MSE in (30) is constant w.r.t. $n_q$.
That is, in the noiseless case, the number of quantized measurements does not change the MSE as long as there is at least a single quantized measurement. Therefore, for $\sigma_q^2\to 0$, the optimal policy in the sense of minimum MSE of the LMMSE estimator is one of the following two options: 1) take the maximum possible number of analog measurements and at least a single quantized measurement; or 2) use only analog measurements, i.e., $n_a = n_{a,\max} \triangleq \left\lfloor\frac{\tilde{P}_{\max}}{2^bM}\right\rfloor$ and $n_q = 0$, with the MSE given in (22). The choice between these two options is as follows. If the available power satisfies

$$\tilde{P}_{\max} - \left\lfloor\frac{\tilde{P}_{\max}}{2^bM}\right\rfloor 2^bM \geq 2M, \tag{31}$$

then Option 1 is the optimal solution of (27) in this case. If the inequality in (31) does not hold, the optimal solution is either Option 1 with $n_a^* = n_{a,\max}-1$ and $n_q^*\geq 1$, or Option 2 with $n_a^* = n_{a,\max}$ and $n_q^* = 0$. Comparing the MSE in (22) and (30) for the latter case, Option 2 is optimal if the following inequality holds:

$$\left((\pi-2)\rho_a^2 - 2\rho_a\sigma_a^2\right)n_{a,\max} > 2\sigma_a^4 - \pi\rho_a\sigma_a^2 + (\pi-2)\rho_a^2, \tag{32}$$

and otherwise, Option 1 is optimal. Thus, the sampling policy in this case depends on the noise variance of the analog measurements, $\sigma_a^2$, the factor $\rho_a$, and the maximum number of analog measurements available under the power constraint in (24), i.e., on the condition in (31).

D. Optimization with Dithering

Dither, roughly speaking, is a random noise process added to a signal prior to its quantization [22], [34], [35]. The addition of dithering noise is commonly used in both Bayesian and non-Bayesian estimation with low-resolution quantized data. Although noise commonly degrades the performance of a system, it has been shown that adding noise to quantized measurements can improve system performance. In this subsection, we consider the addition of independent, zero-mean, complex Gaussian dithering noise vectors, $\mathbf{w}_{d_a}\sim\mathcal{CN}(\mathbf{0},\sigma_{d_a}^2\mathbf{I}_{N_a})$ and $\mathbf{w}_{d_q}\sim\mathcal{CN}(\mathbf{0},\sigma_{d_q}^2\mathbf{I}_{N_q})$, to the analog and quantized measurements, respectively, before quantization. The dithering noise vectors, $\mathbf{w}_{d_a}$ and $\mathbf{w}_{d_q}$, and the vectors $\boldsymbol{\theta}$, $\mathbf{w}_a$, and $\mathbf{w}_q$ are assumed to be mutually independent. Therefore, the analog measurement vector in (2) now equals

$$\mathbf{x}_a = \mathbf{H}\boldsymbol{\theta} + \mathbf{w}_a + \mathbf{w}_{d_a}, \tag{33}$$

and the quantized measurement vector in (3) satisfies

$$\mathbf{x}_q = Q\left(\mathbf{G}\boldsymbol{\theta} + \mathbf{w}_q + \mathbf{w}_{d_q}\right). \tag{34}$$

Since the vectors $\boldsymbol{\theta}$, $\mathbf{w}_{d_a}$, $\mathbf{w}_{d_q}$, $\mathbf{w}_a$, and $\mathbf{w}_q$ are mutually independent, it can be shown, similarly to the derivation of (4) and (6), that (33) and (34) imply the following covariance matrices of the measurements:

$$\mathbf{C}_{\mathbf{x}_a} = \mathbf{H}\boldsymbol{\Sigma}_{\boldsymbol{\theta}}\mathbf{H}^H + (\sigma_a^2+\sigma_{d_a}^2)\mathbf{I}_{N_a} \tag{35}$$

and

$$\mathbf{C}_{\mathbf{y}} = \mathbf{G}\boldsymbol{\Sigma}_{\boldsymbol{\theta}}\mathbf{G}^H + (\sigma_q^2+\sigma_{d_q}^2)\mathbf{I}_{N_q}. \tag{36}$$

Thus, all of the results developed in Subsections III-A and III-C hold, with the noise variance of the analog measurements, $\sigma_a^2$, increased by $\sigma_{d_a}^2$, and that of the quantized measurements, $\sigma_q^2$, increased by $\sigma_{d_q}^2$. In particular, the MSE for the LGO model in (18) becomes

$$\mathrm{MSE} = M - M\left(\frac{\rho_a n_a}{\rho_a n_a+\sigma_a^2+\sigma_{d_a}^2} + \frac{2\rho_q n_q(\sigma_a^2+\sigma_{d_a}^2)^2}{\pi(\rho_q+\sigma_q^2+\sigma_{d_q}^2)(\alpha_d+\beta_d(n_a)\rho_q n_q)(\rho_a n_a+\sigma_a^2+\sigma_{d_a}^2)^2}\right), \tag{37}$$

where

$$\alpha_d \triangleq \frac{2}{\pi}\arccos\left(\frac{\rho_q}{\rho_q+\sigma_q^2+\sigma_{d_q}^2}\right) \tag{38}$$

and

$$\beta_d(n_a) \triangleq \frac{2}{\pi\rho_q}\arcsin\left(\frac{\rho_q}{\rho_q+\sigma_q^2+\sigma_{d_q}^2}\right) - \frac{2\rho_a n_a}{\pi(\rho_q+\sigma_q^2+\sigma_{d_q}^2)(\rho_a n_a+\sigma_a^2+\sigma_{d_a}^2)} \tag{39}$$

are the counterparts of (19) and (20), respectively, obtained by replacing $\sigma_a^2$ and $\sigma_q^2$ with $\sigma_a^2+\sigma_{d_a}^2$ and $\sigma_q^2+\sigma_{d_q}^2$, respectively.

Under this model, the goal is to find the optimal resource allocation, i.e., the numbers of analog and quantized measurements that minimize the estimator's MSE, while also optimizing the variance of the added dithering noise. Therefore, we consider the LGO model from Subsection III-A and solve the optimization problem in (27) with the objective function replaced by the MSE in (37) and with the dithering noise variances, $\sigma_{d_a}^2$ and $\sigma_{d_q}^2$, as additional decision variables. Mathematically, the optimization problem in (27) with dithering noise can be rewritten as

$$\begin{aligned} \min_{n_a,n_q,\sigma_{d_a}^2,\sigma_{d_q}^2}\ & M - M\left(\frac{\rho_a n_a}{\rho_a n_a+\sigma_a^2+\sigma_{d_a}^2} + \frac{2\rho_q n_q(\sigma_a^2+\sigma_{d_a}^2)^2}{\pi(\rho_q+\sigma_q^2+\sigma_{d_q}^2)(\alpha_d+\beta_d(n_a)\rho_q n_q)(\rho_a n_a+\sigma_a^2+\sigma_{d_a}^2)^2}\right)\\ \text{s.t.}\ & 2^bMn_a + 2Mn_q \leq \tilde{P}_{\max},\\ & n_a,n_q\in\mathbb{Z}_+,\\ & \sigma_{d_a}^2\geq 0,\ \sigma_{d_q}^2\geq 0, \end{aligned} \tag{40}$$

where $\alpha_d$ and $\beta_d$ are given in (38) and (39), respectively.

It can be shown that, for any given values of $n_a$, $n_q$, and $\sigma_{d_q}^2$, the optimal dither variance for the analog measurements, $\sigma_{d_a}^2$, is zero. This result is intuitive, since for analog data the MSE decreases as the signal-to-noise ratio (SNR) increases, and, thus, adding dithering noise can only degrade the estimation performance. Therefore, in the following, we discuss two dithering scenarios: 1) the optimal solution, which is obtained by adding dithering noise to the quantized measurements only; and 2) adding dithering noise with the same variance to the entire system, i.e., to both the analog and the quantized measurements.
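Because (37) only shifts the noise variances in (18), a single function can evaluate both. The Python sketch below is a hedged transcription of (37)–(39) with hypothetical names: `s2_a` and `s2_q` stand for $\sigma_a^2$ and $\sigma_q^2$, and `d2_a` and `d2_q` for the dither variances $\sigma_{d_a}^2$ and $\sigma_{d_q}^2$. It assumes $\sigma_a^2+\sigma_{d_a}^2>0$ and falls back to (22) when $n_q=0$, which also avoids a $0/0$ form when the quantized branch is noiseless. Setting `d2_a = 0` corresponds to Scenario 1, and `d2_a = d2_q` to Scenario 2.

```python
import math

def lgo_mse(n_a, n_q, M, rho_a, rho_q, s2_a, s2_q, d2_a=0.0, d2_q=0.0):
    """MSE (37) of the LMMSE estimator in the LGO model with Gaussian dithering.

    With d2_a = d2_q = 0 this is (18); with n_q = 0 it is (22).
    Assumes s2_a + d2_a > 0.
    """
    va = s2_a + d2_a                  # sigma_a^2 + sigma_{d_a}^2
    vq = s2_q + d2_q                  # sigma_q^2 + sigma_{d_q}^2
    da = rho_a * n_a + va             # rho_a n_a + sigma_a^2 + sigma_{d_a}^2
    analog = rho_a * n_a / da
    if n_q == 0:
        return M - M * analog         # (22) with the shifted analog noise variance
    r = rho_q / (rho_q + vq)
    alpha = (2 / math.pi) * math.acos(r)                                # (38)
    beta = (2 / (math.pi * rho_q)) * math.asin(r) \
        - 2 * rho_a * n_a / (math.pi * (rho_q + vq) * da)               # (39)
    quant = 2 * rho_q * n_q * va ** 2 / (
        math.pi * (rho_q + vq) * (alpha + beta * rho_q * n_q) * da ** 2)
    return M - M * (analog + quant)

# Scenario 1 (dither on the quantized branch only) vs. Scenario 2 (dither on both).
args = dict(n_a=2, n_q=10, M=4, rho_a=1.0, rho_q=1.0, s2_a=1.0, s2_q=1.0)
for d2 in (0.0, 0.5, 1.0):
    print(d2, lgo_mse(**args, d2_q=d2), lgo_mse(**args, d2_a=d2, d2_q=d2))
```

Comparing the two printed columns for a given dither variance shows how much Scenario 2 pays for also degrading the analog branch.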
To cover both scenarios with a single search, we introduce one decision variable in (40), denoted by $\sigma_d^2$, where for Scenario 1 we set $\sigma_{d_a}^2 = 0$ and $\sigma_{d_q}^2 = \sigma_d^2$, and for Scenario 2 we set $\sigma_{d_a}^2 = \sigma_{d_q}^2 = \sigma_d^2$.

Similarly to the derivation of Proposition 1, it can be proved that, for each value of $n_a$ and given dithering noise variances $\sigma_{d_a}^2$ and $\sigma_{d_q}^2$, the number of quantized measurements, $n_q$, should be chosen as the maximum allowed under the power constraint. Works that utilize dithering, such as [13], [37], find the optimal dithering variance by an exhaustive search. Similarly, we find the optimal allocation with dithering, i.e., the solution of (40), by a two-dimensional search: over the value pairs $(n_a, n_q)$ that use the maximum power and, for each pair, over the candidate values of the dithering variance $\sigma_d^2$. This algorithm is summarized in Algorithm 1.

Algorithm 1:

Resource Allocation with Dithering

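Input: $M$, $b$, $\tilde{P}_{\max}$, $\sigma_{d,\max}^2$, $\sigma_{d,\mathrm{res}}^2$
Output: $n_a^*$, $n_q^*$, $\sigma_d^{2*}$
Initialize $MSE_{\mathrm{opt}} = M$
for $n_a = 0 : \lfloor \tilde{P}_{\max}/(2^b M) \rfloor$ do
    $n_q = \lfloor (\tilde{P}_{\max} - 2^b M n_a)/(2M) \rfloor$
    for $\sigma_d^2 = 0 : \sigma_{d,\mathrm{res}}^2 : \sigma_{d,\max}^2$ do
        Calculate $MSE_{\mathrm{temp}}$ by substituting $n_a$, $n_q$, and $\sigma_d^2$ in (37)
        if $MSE_{\mathrm{temp}} < MSE_{\mathrm{opt}}$ then
            Update the optimal values: $MSE_{\mathrm{opt}}$, $n_a^*$, $n_q^*$, $\sigma_d^{2*}$
        end
    end
end

IV. SPECIAL CASES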

In this section, we discuss two special cases of the LGO model, described in Subsection III-C. In Subsection IV-A, we present the problem of estimating a scalar parameter from noisy measurements, a model that is widely used in WSNs, for example [1], [22]. In Subsection IV-B, we discuss the allocation of analog and quantized measurements for channel estimation in massive MIMO communication systems [10], [15], [31], [38]–[40].

A. Estimation of a Scalar Parameter

In this subsection, we are interested in estimating a scalar unknown parameter, $\theta \in \mathbb{C}$, with a zero-mean complex Gaussian distribution, $\theta \sim \mathcal{CN}(0, 1)$, based on mixed-resolution data. Consider, for example, a WSN with $N$ sensors, where $N_a$ of them transmit analog measurements and $N_q$ transmit 1-bit quantized measurements to a central unit for estimation, with $N = N_a + N_q$. In this case, (2) and (3) reduce to

$$\mathbf{x}_a = \mathbf{1}_{N_a}\theta + \mathbf{w}_a \tag{41}$$

and

$$\mathbf{x}_q = Q\left(\mathbf{1}_{N_q}\theta + \mathbf{w}_q\right), \tag{42}$$

respectively. We assume that $\mathbf{w}_a$ and $\mathbf{w}_q$ are independent, zero-mean complex Gaussian noise vectors distributed as $\mathbf{w}_a \sim \mathcal{CN}(\mathbf{0}, \sigma_a^2\mathbf{I}_{N_a})$ and $\mathbf{w}_q \sim \mathcal{CN}(\mathbf{0}, \sigma_q^2\mathbf{I}_{N_q})$. This scalar estimation problem satisfies the LGO model assumptions. First, the distribution of the unknown parameter, $\theta \sim \mathcal{CN}(0, 1)$, satisfies A.1). Second, $\mathbf{H} = \mathbf{1}_{N_a}$ and $\mathbf{G} = \mathbf{1}_{N_q}$ can be treated as block matrices with each block being the scalar $1$, which in turn satisfies Assumptions A.2) and A.3) with $\rho_a = 1$ and $\rho_q = 1$. Since these assumptions hold, we can use Theorem 1 to find the optimal measurement allocation scheme for the scalar case. It should be noted that the dimensions of the auto-covariance matrix $\mathbf{C}_{\mathbf{x}}$ are determined by the numbers of measurements, $N_a$ and $N_q$, and not by the size of the unknown parameter vector $\boldsymbol{\theta}$. Therefore, even in the scalar case, solving the optimization problem in (12) still requires the inversion of $\mathbf{C}_{\mathbf{x}}$ for each value pair, and the proposed tractable formulation in (27) is also relevant and important for the scalar case.

B. Channel Estimation in Massive MIMO

In this subsection, we consider the special case of channel estimation using analog and 1-bit quantized measurements in massive MIMO networks. Massive MIMO is a promising enabling technology for cellular systems beyond the fourth generation, such as fifth-generation (5G) systems, due to its advantages in terms of spectral efficiency, energy efficiency, and the ability to use low-cost, low-power hardware [41], [42]. The following model of mixed-ADC massive MIMO has been used in [10], [15], [31], [38]–[40]; we describe it in detail due to its importance and to clarify its relation to the considered LGO model.

We study the uplink of a single-cell multi-user MIMO system consisting of $K$ single-antenna users that simultaneously transmit independent data symbols to a base station (BS) equipped with $L$ antennas. We consider a block-fading model with coherence bandwidth $W_c$ and coherence time $T_c$. In this model, each channel remains constant within a coherence interval of length $T = T_c W_c$ symbols and changes independently between intervals. The coherence interval is divided into two parts: the first part is used for channel estimation and is referred to as the training phase, while the second part is used for data transmission. During training, all $K$ users simultaneously transmit their pilot sequences of $K$ mutually orthogonal pilot symbols. Therefore, the received signal during the training phase is

$$\mathbf{R} = \sqrt{\rho}\,\mathbf{A}\boldsymbol{\Phi}^T + \mathbf{W}, \tag{43}$$

where $\mathbf{A} \in \mathbb{C}^{L \times K}$ is the channel matrix, $\rho$ is the pilot transmission power, $\boldsymbol{\Phi} \in \mathbb{C}^{K \times K}$ is the pilot signal matrix transmitted from the $K$ users, and $\mathbf{W}$ is independent, zero-mean, complex Gaussian noise with each element distributed as $\mathcal{CN}(0, \sigma^2)$. We assume that the channel vectors are i.i.d. and denote the $l$th row of $\mathbf{A}$ by $\mathbf{a}_l$, where $\mathbf{a}_l$ has a zero-mean complex Gaussian distribution, i.e., $\mathbf{a}_l \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}_K)$. The pilot sequences are drawn from the pilot signal matrix $\boldsymbol{\Phi}$, where $\boldsymbol{\Phi}^H\boldsymbol{\Phi} = \mathbf{I}_K$. Since we assume independent channels and noise, the rows of $\mathbf{R}$ in (43) are mutually independent, which enables us to analyze each row separately.
Therefore, the $l$th row of $\mathbf{R}$, viewed as a column vector, satisfies

$$\mathbf{r}_l = \sqrt{\rho}\,\boldsymbol{\Phi}\mathbf{a}_l + \mathbf{w}_l, \quad l = 1, \ldots, L. \tag{44}$$

Due to the high power consumption of high-resolution ADCs and the less informative data of low-resolution ADCs, many works use both analog and quantized measurements to benefit from both worlds (see, e.g., [12], [13] and references therein). Achieving good performance requires more measurements when working with quantized data than with analog data. Therefore, the pilot sequence, $\boldsymbol{\Phi}$, can be transmitted several times, with the antennas switching between the high- and low-resolution ADCs at each transmission. Let the $K$ pilot symbols be transmitted $N$ times. For each channel $l \in \{1, \ldots, L\}$, we denote the number of times the pilot sequence is transmitted with the analog ADC connected to the antenna by $n_{a_l}$, and with the 1-bit quantized ADC by $n_{q_l}$, where $n_{a_l} + n_{q_l} = N$. We can now organize the measurements such that the analog measurements from (44) are

$$\mathbf{r}_{a_l} = \sqrt{\rho}\,\boldsymbol{\Phi}_a\mathbf{a}_l + \mathbf{w}_a, \tag{45}$$

where $\mathbf{r}_{a_l} \in \mathbb{C}^{n_{a_l}K}$ and $\boldsymbol{\Phi}_a$ is an $n_{a_l}K \times K$ block matrix defined as

$$\boldsymbol{\Phi}_a = \mathbf{1}_{n_{a_l}} \otimes \boldsymbol{\Phi}. \tag{46}$$

Similarly, the quantized measurements from (44) are

$$\mathbf{r}_{q_l} = Q\left(\sqrt{\rho}\,\boldsymbol{\Phi}_q\mathbf{a}_l + \mathbf{w}_q\right), \tag{47}$$

where $\mathbf{r}_{q_l} \in \mathbb{C}^{n_{q_l}K}$ and $\boldsymbol{\Phi}_q$ is an $n_{q_l}K \times K$ block matrix defined as

$$\boldsymbol{\Phi}_q = \mathbf{1}_{n_{q_l}} \otimes \boldsymbol{\Phi}. \tag{48}$$

By setting $\mathbf{a}_l = \boldsymbol{\theta}$, $\mathbf{r}_{a_l} = \mathbf{x}_a$, and $\mathbf{r}_{q_l} = \mathbf{x}_q$, the measurement model in (45) and (47) coincides with the general measurement model in (2) and (3) with $M = K$. Moreover, we now show that Assumptions A.1)–A.3) from Subsection III-A are satisfied for the mixed-ADC massive MIMO model. First, the channel is modeled such that $\mathbf{a}_l \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}_M)$, which satisfies $\boldsymbol{\Sigma}_{\boldsymbol{\theta}} = \mathbf{I}_M$; thus, Assumption A.1) is satisfied. Second, by using (45) and (46), it can be seen that the matrix $\mathbf{H} = \sqrt{\rho}\,\boldsymbol{\Phi}_a$ is a block matrix of size $n_{a_l}K \times K$ whose blocks satisfy $(\sqrt{\rho}\,\boldsymbol{\Phi})^H\sqrt{\rho}\,\boldsymbol{\Phi} = \rho\mathbf{I}_K$; thus, Assumption A.2) is satisfied with $\rho_a = \rho$.
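As a quick numerical illustration of (46), (48), and the structure required by Assumptions A.2) and A.3), the following sketch builds a pilot matrix and the stacked model matrices. It assumes that $\boldsymbol{\Phi}$ is a random unitary matrix obtained from a QR decomposition. The text only states that $\boldsymbol{\Phi}$ is randomly generated with $\boldsymbol{\Phi}^H\boldsymbol{\Phi} = \mathbf{I}_K$. The values of `K`, `rho`, `n_a`, and `n_q` are arbitrary, and the helper names are hypothetical.

```python
import numpy as np


def random_unitary(K, rng):
    """A random K x K unitary pilot matrix: the Q factor of a complex Gaussian matrix."""
    Z = rng.standard_normal((K, K)) + 1j * rng.standard_normal((K, K))
    Q, _ = np.linalg.qr(Z)
    return Q


def stacked_pilots(Phi, n_rep):
    """Block matrix 1_{n_rep} kron Phi, i.e., Phi repeated n_rep times, as in (46) and (48)."""
    return np.kron(np.ones((n_rep, 1)), Phi)


rng = np.random.default_rng(0)
K, rho, n_a, n_q = 4, 2.0, 3, 5
Phi = random_unitary(K, rng)
H = np.sqrt(rho) * stacked_pilots(Phi, n_a)        # analog model matrix, n_a K x K
G = np.sqrt(rho) * stacked_pilots(Phi, n_q)        # 1-bit model matrix, n_q K x K

print(np.allclose(Phi.conj().T @ Phi, np.eye(K)))              # Phi^H Phi = I_K
print(np.allclose(H.conj().T @ H, rho * n_a * np.eye(K)))      # H^H H = rho n_a I_K
print(np.allclose(G.conj().T @ G, rho * n_q * np.eye(K)))      # G^H G = rho n_q I_K
# G G^H / rho = (1 1^T) kron I_K, the structure later used in (63)
print(np.allclose(G @ G.conj().T / rho, np.kron(np.ones((n_q, n_q)), np.eye(K))))
```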
Similarly, by using (47) and (48), it can be seen that the matrix $\mathbf{G} = \sqrt{\rho}\,\boldsymbol{\Phi}_q$ satisfies Assumption A.3) with $\rho_q = \rho$.

The goal under this model is to estimate the channel of a system with a single antenna, $L = 1$, while using both a high- and a low-resolution ADC at the BS, between which the antenna can switch to acquire both analog and quantized measurements. The number of measurements taken, or equivalently the number of times the pilot signal is transmitted using each ADC resolution, is optimized to minimize the MSE of the LMMSE estimator without exceeding the power consumption budget at the BS. By substituting $M = K$, $\rho_a = \rho_q = \rho$, and $\sigma_a^2 = \sigma_q^2 = \sigma^2$ in (27), the resource allocation problem can be solved for channel estimation in massive MIMO systems. The same substitution can be made in (40), which allows us to find the optimal resource allocation with dithering.

This approach can be extended to the more general case of $L$ i.i.d. antennas, or channels, to estimate. The channel estimation problem for the channel in (43) can be decomposed into parallel estimation problems [30] that are solved separately for each channel, with the maximum power available to each channel equal, for example, to $P_{\max}/L$. Allocating an equal power budget to each channel thus allows us to optimize the whole system by solving the problem for a single channel.

V. SIMULATIONS

In this section, we numerically evaluate the performance of the mixed-resolution system presented in Section III and of the proposed resource allocation optimization approach. In Subsection V-A, we simulate the scalar case from Subsection IV-A; in Subsection V-B, we simulate the model of channel estimation in massive MIMO from Subsection IV-B; and in Subsection V-C, we compare the run time of the proposed resource allocation approach with that of the brute-force approach in (12). As discussed in Subsection III-B, in order for the quantization noise to be negligible, we use $b = 6$ bits over a fixed quantization range to represent the analog measurements. The noise added to the analog and quantized measurements is assumed to have the same variance, $\sigma_a^2 = \sigma_q^2 = \sigma^2$, and we set $\rho_a = \rho_q = 1$. Our results are averaged over 100 Monte-Carlo simulations.

A. Scalar Parameter Estimation

In this subsection, the estimation of a scalar parameter, $M = 1$, as discussed in Subsection IV-A, is evaluated. The simulation results in Fig. 2 show the behavior of the MSE from (18) for the scalar case with different values of $n_a$ and $n_q$ as a function of the noise variance $\sigma^2$. It can be seen that adding measurements, whether analog or quantized, does not degrade the performance in terms of MSE. In addition, when only analog measurements are used, $n_q = 0$, the MSE monotonically decreases as $\sigma^2$ decreases. The same cannot be said for the mixed-resolution case, for which the MSE is not monotonic in $\sigma^2$ and is not even convex. For example, there are cases, such as $n_a = 1$ and $n_q = 100$, in which adding dithering noise to both measurement types can improve the MSE. This may seem counterintuitive given the behavior of estimation from purely analog measurements.

[Fig. 2 plot: MSE versus noise variance for $N_a \in \{1, 5\}$ and $N_q \in \{0, 40, 60, 80, 100\}$]

Fig. 2: Scalar case: effects of the noise variance on the estimator's MSE for different numbers of measurements.

In Fig. 3a and Fig. 3b, the MSE is shown for a given noise variance, $\sigma^2$, as a function of the numbers of analog and quantized measurements, $n_a$ and $n_q$. Dots on the graphs show the possible value pairs of analog and quantized measurements, $n_a$ and $n_q$, for different values of the available power, $\tilde{P}_{\max}$. As before, these figures show that taking more measurements does not increase the MSE. In addition, it can be seen that a mixed-resolution approach can achieve a lower MSE than a naive approach that utilizes only one type of measurement up to the maximum available power.

B. Channel Estimation in Massive MIMO

Fig. 3: Scalar case: the MSE as a function of the numbers of analog and quantized measurements, $n_a$ and $n_q$. The noise variance is $\sigma^2 = 1$ in (a) and $\sigma^2 = 2$ in (b), where $\sigma^2 = \sigma_a^2 = \sigma_q^2$.

In this subsection, the massive MIMO model described in Subsection IV-B is simulated. The resource allocation optimization problem in (27) is solved for $M = 10$ users transmitting a randomly generated pilot matrix $\boldsymbol{\Phi}$. The maximum available power is set to $\tilde{P}_{\max} = 2^b M n_a^{\max}$, where $b = 6$ and $n_a^{\max} = 20$. In Fig. 4, we compare the optimal resource allocation, calculated using a one-dimensional search as described in Subsection III-C, with two naive solutions: 1) a greedy scheme, which utilizes the maximum number of analog measurements; and 2) a cost-effective scheme, which maximizes the number of measurements by taking only quantized measurements. The analytic MSE is calculated under the assumption of unquantized analog measurements, i.e., as given in (27), although in practice we use $b = 6$-bit quantized data. Therefore, we also present Monte-Carlo simulations of the MSE obtained in practice, which show that the $b = 6$ bits used to represent the analog measurements make the quantization noise negligible for the purpose of estimation. The graph can be divided into three regions: 1) low noise variance, in which using only analog measurements is optimal; 2) high noise variance, in which using only quantized measurements is optimal; and 3) intermediate noise variance, in which a mixed-resolution measurement scheme is optimal. Moreover, Fig. 4 also compares the aforementioned solutions with the solution of the resource allocation optimization problem that also optimizes the dithering noise added only to the quantized measurements. This is done using a two-dimensional search over $n_a$ and $\sigma_{d_q}^2$, with $\sigma_{d_q}^2$ taken on a uniform grid starting at zero, as in Algorithm 1.
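The theoretical curves compared in Fig. 4 can be sketched as follows, under the setting stated above: $M = 10$, $b = 6$, $n_a^{\max} = 20$, $\rho_a = \rho_q = 1$, and $\sigma_a^2 = \sigma_q^2 = \sigma^2$. The sketch uses the closed-form MSE, with $\alpha$ and $\beta(n_a)$ taken in the forms of (68) and (81). The dither grid (0 to 2 in steps of 0.05) is a hypothetical choice, because the grid used for Fig. 4 is not stated here. The sketch does not reproduce the Monte-Carlo curves, and `compare_strategies` is a hypothetical helper.

```python
import math


def mse_lgo(n_a, n_q, M, var_a, var_q, rho=1.0):
    """Closed-form LGO MSE with rho_a = rho_q = rho; 1-bit dither is folded into var_q."""
    alpha = 1.0 - (2.0 / math.pi) * math.asin(rho / (rho + var_q))
    den_a = rho * n_a + var_a
    gain_a, resid_a = (rho * n_a / den_a, var_a / den_a) if den_a > 0 else (0.0, 1.0)
    k2 = 2.0 / (math.pi * (rho + var_q))
    beta = (1.0 - alpha) / rho - k2 * gain_a
    gain_q = k2 * rho * n_q * resid_a ** 2 / (alpha + beta * rho * n_q) if n_q > 0 else 0.0
    return M * (1.0 - gain_a - gain_q)


def compare_strategies(var, M=10, b=6, n_a_max=20, dither_grid=None):
    """Theoretical MSE of the four allocation strategies at noise variance var."""
    p_max = 2 ** b * M * n_a_max
    cost_a, cost_q = 2 ** b * M, 2 * M          # power per analog / 1-bit measurement, as in (40)

    def n_q_of(n_a):                              # remaining power spent on 1-bit data
        return (p_max - cost_a * n_a) // cost_q

    n_a_values = range(p_max // cost_a + 1)
    if dither_grid is None:
        dither_grid = [0.05 * i for i in range(41)]   # hypothetical grid: 0, 0.05, ..., 2
    return {
        "greedy": mse_lgo(n_a_max, n_q_of(n_a_max), M, var, var),
        "cost_eff": mse_lgo(0, n_q_of(0), M, var, var),
        "optimal": min(mse_lgo(n, n_q_of(n), M, var, var) for n in n_a_values),
        "optimal_dither": min(mse_lgo(n, n_q_of(n), M, var, var + d)
                              for n in n_a_values for d in dither_grid),
    }


for var in (0.01, 0.1, 1.0, 10.0):
    row = compare_strategies(var)
    print(f"sigma^2 = {var}: " + ", ".join(f"{k} = {v:.4f}" for k, v in row.items()))
```

Both naive schemes are special points of the one-dimensional search, and the dither grid contains zero. By construction, therefore, the optimized curves can never lie above the naive ones.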
In the intermediate region, where the mixed-resolution scheme outperforms the naive solutions, adding dithering noise improves the performance even further. As expected, in the low-noise-variance region, where using only analog measurements is optimal, dithering has no effect. However, it does affect part of the high-noise-variance region, where it improves the performance when all measurements are quantized. Therefore, we conclude that dithering can improve system performance in terms of the MSE of the LMMSE estimator.

[Fig. 4 plot: MSE versus noise variance; theoretical and Monte-Carlo curves for the optimal allocation with dither, the optimal allocation without dither, the greedy scheme, and the cost-effective scheme]

Fig. 4: Channel estimation in massive MIMO with $M = 10$ users: the MSE of the LMMSE estimator versus the noise variance, where $\sigma^2 = \sigma_a^2 = \sigma_q^2$. In the mixed-resolution scheme, dithering noise is added only to the 1-bit quantized measurements.

C. Run Time

In this subsection, we evaluate the computational complexity of solving the resource allocation optimization problem under the LGO measurement model. To this end, we compare the time required to solve the optimization problem in (12) using the closed-form analytic expression of the MSE derived in Theorem 1 with the time required using the general MSE expression in (9), which requires matrix inversion. In both cases, the one-dimensional search from Proposition 1 is used. It should be noted that the general MSE expression in (9) does not give insight into the behavior of the MSE. Therefore, Proposition 1 is not proven for the general expression, which in turn requires a two-dimensional search over all possible value pairs of $n_a$ and $n_q$ to solve the resource allocation optimization problem. Thus, in the general case, the run time of the optimization with the general MSE expression is much longer than that presented in this simulation. The average computation time (run time) was evaluated by running the algorithm in Matlab on an Intel Xeon E5-2660 CPU. Fig. 5 shows the run time of the optimal resource allocation with both forms of the MSE as a function of the maximum number of available analog measurements. The maximum available power is set to $\tilde{P}_{\max} = 2^b M n_a^{\max}$, and we evaluate the cases $M = 1, 3, 6$. It can be seen that the more analog measurements are available, the larger the maximum power. The computation time therefore increases, since the search is over more values. This effect is larger when using matrix inversion, since $\mathbf{C}_{\mathbf{x}} \in \mathbb{C}^{(n_a+n_q)M \times (n_a+n_q)M}$. In addition, the size of the unknown parameter vector, $\boldsymbol{\theta} \in \mathbb{C}^M$, does not affect the run time of the proposed approach in Theorem 1, whereas it increases the calculation time of the matrix-inversion MSE in the direct approach.

[Fig. 5 plot: run time, Time [s] on a logarithmic axis ($10^{-6}$ to $10^{-2}$), versus the maximum number of analog measurements (10 to 50); curves: Direct Approach and Proposed Approach for $M = 1, 3, 6$]

Fig. 5: Run time comparison between calculating the MSE using the direct approach and using the closed-form expression in Theorem 1 for different values of $M$.

VI. CONCLUSION

In this paper, we consider Bayesian parameter estimation using mixed-resolution measurements. First, we derive the LMMSE estimator and its associated MSE for the mixed-resolution case. It is shown that the MSE requires matrix inversion, with the size of the matrix depending on the numbers of analog and quantized measurements. Optimization problems that aim to minimize the MSE w.r.t. the numbers of measurements are therefore impractical, due to the exhaustive search required. Next, we present the LGO model, for which we derive a closed-form expression for the LMMSE estimator and its corresponding MSE. Two special cases in which the mixed-resolution scheme under the LGO model is relevant are presented: 1) scalar parameter estimation, which is used, for example, in WSNs; and 2) channel estimation in massive MIMO communication systems. Based on the closed-form expression of the MSE and a power consumption constraint, a resource allocation optimization problem is formulated with the goal of finding the optimal resources, namely, the numbers of analog and quantized measurements. A one-dimensional search is proven to be sufficient for finding the optimal solution to this problem. Furthermore, the concept of dithering is presented, and the resource allocation optimization problem is extended to also optimize the variance of the dithering noise added to the system. Finally, the simulations show that the mixed-resolution scheme outperforms the naive approaches of purely analog or purely quantized measurements for ranges of noise variance that lie neither in the asymptotic region nor in the so-called non-informative region. In addition, the possible benefits of dithering for estimation performance in terms of MSE are demonstrated for mixed-resolution schemes. The resource allocation optimization problem for the LGO measurement model has a low-complexity solution that allows fast calculation.
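The mixed-resolution scheme can and should be adopted in different real-world applications. Because the solution for the LGO measurement model is easily obtained, doing so can improve system performance.

APPENDIX A
DERIVATION OF (10) AND (11)

In this appendix, we develop the auto-covariance and cross-covariance matrices in (10) and (11) for the general model described in Subsection II-A. By using (7), it can be verified that the auto-covariance and cross-covariance matrices are block matrices given by

$$\mathbf{C}_{\mathbf{x}} = \begin{bmatrix} \mathbf{C}_{\mathbf{x}_a} & \mathbf{C}_{\mathbf{x}_a\mathbf{x}_q} \\ \mathbf{C}_{\mathbf{x}_q\mathbf{x}_a} & \mathbf{C}_{\mathbf{x}_q} \end{bmatrix} \tag{49}$$

and

$$\mathbf{C}_{\boldsymbol{\theta}\mathbf{x}} = \begin{bmatrix} \mathbf{C}_{\boldsymbol{\theta}\mathbf{x}_a} & \mathbf{C}_{\boldsymbol{\theta}\mathbf{x}_q} \end{bmatrix}, \tag{50}$$

respectively, where $\mathbf{C}_{\mathbf{x}_a}$ is given in (4). From the analog measurement vector $\mathbf{x}_a$ in (2) and based on the measurement model in Subsection II-A, it can be verified that

$$\mathbf{C}_{\boldsymbol{\theta}\mathbf{x}_a} = \boldsymbol{\Sigma}_{\boldsymbol{\theta}}\mathbf{H}^H. \tag{51}$$

To calculate the covariance matrix $\mathbf{C}_{\mathbf{x}_q}$, we use the arcsine law (p. 396 in [43]). This law implies that, given two zero-mean jointly complex Gaussian random variables, $r$ and $t$, the cross-covariance of the quantized variables, $Q(r)$ and $Q(t)$, is given by

$$C_{Q(r),Q(t)} = \mathrm{E}[Q(r)Q^*(t)] = \frac{2}{\pi}\left[\arcsin\left(\frac{\mathrm{Re}\{C_{rt}\}}{\sqrt{\sigma_t^2\sigma_r^2}}\right) + j\arcsin\left(\frac{\mathrm{Im}\{C_{rt}\}}{\sqrt{\sigma_t^2\sigma_r^2}}\right)\right], \tag{52}$$

where $\sigma_t^2$ and $\sigma_r^2$ are the variances of the random variables $t$ and $r$, respectively. The measurement vector $\mathbf{x}_q$ in (3) is the 1-bit quantization of $\mathbf{y}$ in (5). Therefore, applying the result in (52) element-wise, the auto-covariance matrix is given by

$$\mathbf{C}_{\mathbf{x}_q} = \mathrm{E}[\mathbf{x}_q\mathbf{x}_q^H] = \mathrm{E}[Q(\mathbf{y})Q^H(\mathbf{y})] = \frac{2}{\pi}\Big[\arcsin\Big((\mathrm{diag}(\mathbf{C}_{\mathbf{y}}))^{-\frac{1}{2}}\mathrm{Re}(\mathbf{C}_{\mathbf{y}})(\mathrm{diag}(\mathbf{C}_{\mathbf{y}}))^{-\frac{1}{2}}\Big) + j\arcsin\Big((\mathrm{diag}(\mathbf{C}_{\mathbf{y}}))^{-\frac{1}{2}}\mathrm{Im}(\mathbf{C}_{\mathbf{y}})(\mathrm{diag}(\mathbf{C}_{\mathbf{y}}))^{-\frac{1}{2}}\Big)\Big], \tag{53}$$

where $\mathbf{C}_{\mathbf{y}}$ is defined in (6). It should be noted that the matrix $\mathbf{C}_{\mathbf{x}_q}$ in (53) is well defined, as explained next. The elements of the matrix

$$\left[(\mathrm{diag}(\mathbf{C}_{\mathbf{y}}))^{-\frac{1}{2}}\mathbf{C}_{\mathbf{y}}(\mathrm{diag}(\mathbf{C}_{\mathbf{y}}))^{-\frac{1}{2}}\right]_{i,j} = \frac{C_{y_iy_j}}{\sqrt{C_{y_i}C_{y_j}}} \tag{54}$$

are, by definition, the Pearson correlation coefficients.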
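Therefore, from the properties of the Pearson correlation coefficient,

$$\left|\frac{C_{y_iy_j}}{\sqrt{C_{y_i}C_{y_j}}}\right| \le 1, \tag{55}$$

and since $|\mathrm{Re}(C_{y_iy_j})| \le |C_{y_iy_j}|$, we obtain

$$\left|\frac{\mathrm{Re}(C_{y_iy_j})}{\sqrt{C_{y_i}C_{y_j}}}\right| \le 1. \tag{56}$$

The same holds for $\mathrm{Im}(C_{y_iy_j})$. Thus, the arguments of the arcsine functions lie in $[-1, 1]$, and the auto-covariance matrix $\mathbf{C}_{\mathbf{x}_q}$ in (53) is well defined.

Finally, to calculate the cross-covariance matrices of $\mathbf{x}_q$ with $\boldsymbol{\theta}$ and with $\mathbf{x}_a$, we use the Bussgang theorem [44]. This theorem implies that, for two zero-mean jointly complex Gaussian random variables, $r$ and $t$, with 1-bit quantization as given in (1), the cross-covariance is given by

$$C_{rQ(t)} = \mathrm{E}[rQ^*(t)] = \sqrt{\frac{2}{\pi\sigma_t^2}}\,C_{rt}, \tag{57}$$

where $\sigma_t^2$ is the variance of the random variable $t$. Therefore, the cross-covariance matrix of $\boldsymbol{\theta}$ and $\mathbf{x}_q$ is given by

$$\mathbf{C}_{\boldsymbol{\theta}\mathbf{x}_q} = \mathrm{E}[\boldsymbol{\theta}Q^H(\mathbf{y})] = \sqrt{\frac{2}{\pi}}\,\mathrm{E}\left[\boldsymbol{\theta}(\boldsymbol{\theta}^H\mathbf{G}^H + \mathbf{w}_q^H)\right](\mathrm{diag}(\mathbf{C}_{\mathbf{y}}))^{-\frac{1}{2}} = \sqrt{\frac{2}{\pi}}\,\boldsymbol{\Sigma}_{\boldsymbol{\theta}}\mathbf{G}^H(\mathrm{diag}(\mathbf{C}_{\mathbf{y}}))^{-\frac{1}{2}}, \tag{58}$$

where the second equality is obtained by applying the Bussgang formula in (57) element-wise. The last equality follows from the fact that $\boldsymbol{\theta}$ and $\mathbf{w}_q$ are mutually independent and $\mathbf{G}$ is deterministic. Similarly, from (57) and since $\mathbf{w}_a$, $\mathbf{w}_q$, and $\boldsymbol{\theta}$ are mutually independent, the cross-covariance of $\mathbf{x}_a$ and $\mathbf{x}_q$ is given by

$$\mathbf{C}_{\mathbf{x}_a\mathbf{x}_q} = \mathrm{E}[\mathbf{x}_aQ^H(\mathbf{y})] = \sqrt{\frac{2}{\pi}}\,\mathrm{E}\left[(\mathbf{H}\boldsymbol{\theta} + \mathbf{w}_a)(\boldsymbol{\theta}^H\mathbf{G}^H + \mathbf{w}_q^H)\right](\mathrm{diag}(\mathbf{C}_{\mathbf{y}}))^{-\frac{1}{2}} = \sqrt{\frac{2}{\pi}}\,\mathbf{H}\boldsymbol{\Sigma}_{\boldsymbol{\theta}}\mathbf{G}^H(\mathrm{diag}(\mathbf{C}_{\mathbf{y}}))^{-\frac{1}{2}}. \tag{59}$$

By substituting (51), (53), (58), and (59) in (49) and (50), we obtain that the auto-covariance and cross-covariance matrices are given by (10) and (11), respectively.

APPENDIX B
PROOF OF THEOREM 1

By substituting $\boldsymbol{\Sigma}_{\boldsymbol{\theta}} = \mathbf{I}_M$ from Assumption A.1 in (4), (6), and (51), we obtain that in this case the auto-covariance and cross-covariance matrices satisfy

$$\mathbf{C}_{\mathbf{x}_a} = \mathbf{H}\mathbf{H}^H + \sigma_a^2\mathbf{I}_{N_a}, \tag{60}$$

$$\mathbf{C}_{\mathbf{y}} = \mathbf{G}\mathbf{G}^H + \sigma_q^2\mathbf{I}_{N_q}, \tag{61}$$

and

$$\mathbf{C}_{\boldsymbol{\theta}\mathbf{x}_a} = \mathbf{H}^H. \tag{62}$$

By substituting (15) and (16) from Assumption A.3 in (61), we obtain

$$\mathbf{C}_{\mathbf{y}} = \rho_q(\mathbf{1}_{n_q}\mathbf{1}_{n_q}^T) \otimes \mathbf{I}_M + \sigma_q^2\mathbf{I}_{N_q}, \tag{63}$$

which is a real matrix. Thus, by applying the diagonal operator on $\mathbf{C}_{\mathbf{y}}$ in (63), one obtains

$$\mathrm{diag}(\mathbf{C}_{\mathbf{y}}) = (\rho_q + \sigma_q^2)\mathbf{I}_{N_q}, \tag{64}$$

and therefore,

$$(\mathrm{diag}(\mathbf{C}_{\mathbf{y}}))^{-\frac{1}{2}} = \frac{1}{\sqrt{\rho_q + \sigma_q^2}}\mathbf{I}_{N_q}. \tag{65}$$

From (63) and (65), we obtain

$$(\mathrm{diag}(\mathbf{C}_{\mathbf{y}}))^{-\frac{1}{2}}\mathbf{C}_{\mathbf{y}}(\mathrm{diag}(\mathbf{C}_{\mathbf{y}}))^{-\frac{1}{2}} = \mathbf{I}_{N_q} + \frac{\rho_q}{\rho_q + \sigma_q^2}\left((\mathbf{1}_{n_q}\mathbf{1}_{n_q}^T) \otimes \mathbf{I}_M - \mathbf{I}_{N_q}\right). \tag{66}$$

Since $\mathbf{C}_{\mathbf{y}}$ is real, substituting (66) in (53) yields

$$\mathbf{C}_{\mathbf{x}_q} = \frac{2}{\pi}\arcsin\left((\mathrm{diag}(\mathbf{C}_{\mathbf{y}}))^{-\frac{1}{2}}\mathrm{Re}(\mathbf{C}_{\mathbf{y}})(\mathrm{diag}(\mathbf{C}_{\mathbf{y}}))^{-\frac{1}{2}}\right). \tag{67}$$

In (66), the diagonal entries equal $1$ and the nonzero off-diagonal entries equal $\rho_q/(\rho_q+\sigma_q^2)$. Applying the element-wise arcsine function on (66) therefore results in

$$\mathbf{C}_{\mathbf{x}_q} = \frac{2}{\pi}\left(\frac{\pi}{2}\mathbf{I}_{N_q} + \arcsin\left(\frac{\rho_q}{\rho_q+\sigma_q^2}\right)\left(\frac{1}{\rho_q}\mathbf{G}\mathbf{G}^H - \mathbf{I}_{N_q}\right)\right) = \alpha\mathbf{I}_{N_q} + (1-\alpha)\frac{1}{\rho_q}\mathbf{G}\mathbf{G}^H, \tag{68}$$

where $\alpha$ is defined in (19).

Substituting $\boldsymbol{\Sigma}_{\boldsymbol{\theta}} = \mathbf{I}_M$ from Assumption A.1 and (65) into (59) and (58), we obtain

$$\mathbf{C}_{\mathbf{x}_a\mathbf{x}_q} = \sqrt{\frac{2}{\pi(\rho_q+\sigma_q^2)}}\,\mathbf{H}\mathbf{G}^H \tag{69}$$

and

$$\mathbf{C}_{\boldsymbol{\theta}\mathbf{x}_q} = \sqrt{\frac{2}{\pi(\rho_q+\sigma_q^2)}}\,\mathbf{G}^H, \tag{70}$$

respectively. Substituting (60), (68), (69), and (70) in (10) and (11) results in

$$\mathbf{C}_{\mathbf{x}} = \begin{bmatrix} \mathbf{C}_{\mathbf{x}_a} & \mathbf{C}_{\mathbf{x}_a\mathbf{x}_q} \\ \mathbf{C}_{\mathbf{x}_q\mathbf{x}_a} & \mathbf{C}_{\mathbf{x}_q} \end{bmatrix} = \begin{bmatrix} \mathbf{H}\mathbf{H}^H + \sigma_a^2\mathbf{I}_{N_a} & \sqrt{\frac{2}{\pi(\rho_q+\sigma_q^2)}}\,\mathbf{H}\mathbf{G}^H \\ \sqrt{\frac{2}{\pi(\rho_q+\sigma_q^2)}}\,\mathbf{G}\mathbf{H}^H & \alpha\mathbf{I}_{N_q} + \frac{1-\alpha}{\rho_q}\mathbf{G}\mathbf{G}^H \end{bmatrix} \tag{71}$$

and

$$\mathbf{C}_{\boldsymbol{\theta}\mathbf{x}} = \begin{bmatrix} \mathbf{H}^H & \sqrt{\frac{2}{\pi(\rho_q+\sigma_q^2)}}\,\mathbf{G}^H \end{bmatrix}, \tag{72}$$

respectively.

The auto-covariance matrix in (71) is a block matrix. To calculate its inverse, we first use the Woodbury matrix identity (Eq. (0.7.4.1) in [45]). By this identity, the inverse of the upper-left block of $\mathbf{C}_{\mathbf{x}}$, which is given in (60), satisfies

$$\mathbf{C}_{\mathbf{x}_a}^{-1} = \frac{1}{\sigma_a^2}\left(\mathbf{I}_{N_a} - \frac{1}{\sigma_a^2}\mathbf{H}\left(\mathbf{I}_M + \frac{1}{\sigma_a^2}\mathbf{H}^H\mathbf{H}\right)^{-1}\mathbf{H}^H\right) = \frac{1}{\sigma_a^2}\left(\mathbf{I}_{N_a} - \frac{1}{\rho_a n_a + \sigma_a^2}\mathbf{H}\mathbf{H}^H\right), \tag{73}$$

where the last equality is obtained from (13) and (14) in Assumption A.2, which imply that $\mathbf{H}^H\mathbf{H} = \rho_a n_a\mathbf{I}_M$. Second, by using (68), (69), and (73), it can be verified that

$$\mathbf{C}_{\mathbf{x}_q} - \mathbf{C}_{\mathbf{x}_q\mathbf{x}_a}\mathbf{C}_{\mathbf{x}_a}^{-1}\mathbf{C}_{\mathbf{x}_a\mathbf{x}_q} = \alpha\mathbf{I}_{N_q} + \frac{1-\alpha}{\rho_q}\mathbf{G}\mathbf{G}^H - \frac{2}{\pi(\rho_q+\sigma_q^2)}\frac{1}{\sigma_a^2}\mathbf{G}\mathbf{H}^H\left(\mathbf{I}_{N_a} - \frac{1}{\rho_a n_a+\sigma_a^2}\mathbf{H}\mathbf{H}^H\right)\mathbf{H}\mathbf{G}^H = \alpha\left(\mathbf{I}_{N_q} + \frac{\beta(n_a)}{\alpha}\mathbf{G}\mathbf{G}^H\right) \triangleq \mathbf{D}, \tag{74}$$

where $\beta(n_a)$ is defined in (20) and we used $\mathbf{H}^H\mathbf{H} = \rho_a n_a\mathbf{I}_M$. By using the Woodbury matrix identity on (74), the inverse matrix is given by

$$\left(\mathbf{C}_{\mathbf{x}_q} - \mathbf{C}_{\mathbf{x}_q\mathbf{x}_a}\mathbf{C}_{\mathbf{x}_a}^{-1}\mathbf{C}_{\mathbf{x}_a\mathbf{x}_q}\right)^{-1} = \frac{1}{\alpha}\left(\mathbf{I}_{N_q} - \frac{\beta(n_a)}{\alpha}\mathbf{G}\left(\mathbf{I}_M + \frac{\beta(n_a)}{\alpha}\mathbf{G}^H\mathbf{G}\right)^{-1}\mathbf{G}^H\right) = \frac{1}{\alpha}\left(\mathbf{I}_{N_q} - \frac{\beta(n_a)}{\alpha + \beta(n_a)\rho_q n_q}\mathbf{G}\mathbf{G}^H\right), \tag{75}$$

where the last equality is obtained from (15) and (16) in Assumption A.3, which imply that $\mathbf{G}^H\mathbf{G} = \rho_q n_q\mathbf{I}_M$. Using block matrix inversion together with the results in (73) and (75), it can be verified that the inverse of the auto-covariance matrix in (71) is

$$\mathbf{C}_{\mathbf{x}}^{-1} = \begin{bmatrix} \mathbf{C}_{\mathbf{x}_a}^{-1} + \mathbf{C}_{\mathbf{x}_a}^{-1}\mathbf{C}_{\mathbf{x}_a\mathbf{x}_q}\mathbf{D}^{-1}\mathbf{C}_{\mathbf{x}_q\mathbf{x}_a}\mathbf{C}_{\mathbf{x}_a}^{-1} & -\mathbf{C}_{\mathbf{x}_a}^{-1}\mathbf{C}_{\mathbf{x}_a\mathbf{x}_q}\mathbf{D}^{-1} \\ -\mathbf{D}^{-1}\mathbf{C}_{\mathbf{x}_q\mathbf{x}_a}\mathbf{C}_{\mathbf{x}_a}^{-1} & \mathbf{D}^{-1} \end{bmatrix} = \begin{bmatrix} \frac{1}{\sigma_a^2}\left(\mathbf{I}_{N_a} + \nu(n_a,n_q)\mathbf{H}\mathbf{H}^H\right) & -\xi(n_a,n_q)\mathbf{H}\mathbf{G}^H \\ -\xi(n_a,n_q)\mathbf{G}\mathbf{H}^H & \frac{1}{\alpha}\left(\mathbf{I}_{N_q} - \frac{\beta(n_a)}{\alpha+\beta(n_a)\rho_q n_q}\mathbf{G}\mathbf{G}^H\right) \end{bmatrix}, \tag{76}$$

where

$$\nu(n_a,n_q) \triangleq -\frac{1}{\rho_a n_a + \sigma_a^2} + \frac{2\rho_q n_q\sigma_a^2}{\pi(\rho_q+\sigma_q^2)(\alpha+\beta(n_a)\rho_q n_q)(\rho_a n_a+\sigma_a^2)^2} \tag{77}$$

and

$$\xi(n_a,n_q) \triangleq \frac{\sqrt{2/(\pi(\rho_q+\sigma_q^2))}}{(\alpha+\beta(n_a)\rho_q n_q)(\rho_a n_a+\sigma_a^2)}. \tag{78}$$

Substituting (72) and (76) into (8) and (9), we obtain (17) and (18), respectively.

APPENDIX C
PROOF OF PROPOSITION 1

In this appendix, we prove the following. For any given number of analog measurements, $n_a$, taking the maximum number of quantized measurements, $n_q$, allowed under the power constraint is optimal in terms of minimizing the MSE, as in the optimization problem given in (27).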
We show that, for any given value of $n_a$,

$$MSE|_{n_a,n_q} - MSE|_{n_a,n_q+1} \ge 0, \tag{79}$$

meaning that adding a quantized measurement can only improve the MSE. Substituting the MSE in (18) into (79), we obtain

$$\begin{aligned} MSE|_{n_a,n_q} - MSE|_{n_a,n_q+1} &= M - M\left(\frac{\rho_a n_a}{\rho_a n_a+\sigma_a^2} + \frac{2\rho_q n_q\sigma_a^4}{\pi(\rho_q+\sigma_q^2)(\alpha+\beta(n_a)\rho_q n_q)(\rho_a n_a+\sigma_a^2)^2}\right) \\ &\quad - \left[M - M\left(\frac{\rho_a n_a}{\rho_a n_a+\sigma_a^2} + \frac{2\rho_q(n_q+1)\sigma_a^4}{\pi(\rho_q+\sigma_q^2)(\alpha+\beta(n_a)\rho_q(n_q+1))(\rho_a n_a+\sigma_a^2)^2}\right)\right] \\ &= \frac{2M\rho_q\sigma_a^4}{\pi(\rho_q+\sigma_q^2)(\rho_a n_a+\sigma_a^2)^2}\left[\frac{n_q+1}{\alpha+\beta(n_a)\rho_q(n_q+1)} - \frac{n_q}{\alpha+\beta(n_a)\rho_q n_q}\right] \\ &= \frac{2M\rho_q\sigma_a^4}{\pi(\rho_q+\sigma_q^2)(\rho_a n_a+\sigma_a^2)^2}\cdot\frac{\alpha}{(\alpha+\beta(n_a)\rho_q(n_q+1))(\alpha+\beta(n_a)\rho_q n_q)} \ge 0. \end{aligned} \tag{80}$$

In the following, we show that $\beta(n_a) > 0$. First, according to the definition of $\beta(n_a)$ in (20), we have

$$\beta(n_a) = \frac{2}{\pi\rho_q}\arcsin\left(\frac{\rho_q}{\rho_q+\sigma_q^2}\right) - \frac{2}{\pi\rho_q}\frac{\rho_a n_a}{\rho_a n_a+\sigma_a^2}\frac{\rho_q}{\rho_q+\sigma_q^2} = \frac{2}{\pi\rho_q}\left(\arcsin\left(\frac{\rho_q}{\rho_q+\sigma_q^2}\right) - \frac{\rho_a n_a}{\rho_a n_a+\sigma_a^2}\frac{\rho_q}{\rho_q+\sigma_q^2}\right), \tag{81}$$

where

$$0 \le \frac{\rho_a n_a}{\rho_a n_a+\sigma_a^2} \le 1, \tag{82}$$

since $\rho_a > 0$, $\sigma_a^2 \ge 0$, and $n_a \ge 0$. In addition,

$$0 < \frac{\rho_q}{\rho_q+\sigma_q^2} \le 1, \tag{83}$$

since $\rho_q > 0$ and $\sigma_q^2 \ge 0$. Since $\arcsin(x) > x$ for every $x \in (0, 1]$, (82) and (83) imply that the following expression is also positive:

$$\arcsin\left(\frac{\rho_q}{\rho_q+\sigma_q^2}\right) - \frac{\rho_a n_a}{\rho_a n_a+\sigma_a^2}\frac{\rho_q}{\rho_q+\sigma_q^2} > 0, \tag{84}$$

and thus we can conclude that $\beta(n_a) > 0$. Moreover, as a result of (83), we also have

$$0 \le \alpha < 1. \tag{85}$$

Since $n_a$, $n_q$, $\rho_a$, $\rho_q$, $\sigma_a^2$, $\sigma_q^2$, $\alpha$, and $\beta(n_a)$ in (80) are nonnegative, the inequality in (79) holds. Therefore, given the number of analog measurements, we take the maximum number of quantized measurements possible under the power constraint. This in turn allows us to solve (27) using a one-dimensional search over $n_a$, with the value of $n_q$ given in (29).

REFERENCES

[1] A. Ribeiro and G. B. Giannakis, “Bandwidth-constrained distributed estimation for wireless sensor networks-Part I: Gaussian case,” IEEE Trans. Signal Processing, vol. 54, no. 3, pp. 1131–1143, Mar. 2006.
[2] ——, “Bandwidth-constrained distributed estimation for wireless sensor networks-Part II: Unknown probability density function,” IEEE Trans. Signal Processing, vol. 54, no. 7, pp. 2784–2796, July 2006.
[3] P. Chavali and A. Nehorai, “Managing multi-modal sensor networks using price theory,” IEEE Trans. Signal Processing, vol. 60, no. 9, pp. 4874–4887, 2012.
[4] D. Saska, R. S. Blum, and L. Kaplan, “Fusion of quantized and unquantized sensor data for estimation,” IEEE Signal Processing Letters, vol. 22, no. 11, pp. 1927–1930, Nov. 2015.
[5] D. Mandic, M. Golz, A. Kuh, D. Obradovic, and T. Tanaka, Signal Processing Techniques for Knowledge Extraction and Information Fusion. US: Springer, 2008.
[6] R. M. Corey and A. C. Singer, “Wideband source localization using one-bit quantized arrays,” in . IEEE, 2017, pp. 1–5.
[7] A. Ribeiro, I. D. Schizas, S. I. Roumeliotis, and G. Giannakis, “Kalman filtering in wireless sensor networks,” IEEE Control Systems Magazine, vol. 30, no. 2, pp. 66–86, Apr. 2010.
[8] M. Stein, A. Kürzl, A. Mezghani, and J. A. Nossek, “Asymptotic parameter tracking performance with measurement data of 1-bit resolution,” IEEE Trans. Signal Processing, vol. 63, no. 22, pp. 6086–6095, Nov. 2015.
[9] P. M. Djurić, M. Vemula, and M. F. Bugallo, “Target tracking by particle filtering in binary sensor networks,” IEEE Trans. Signal Processing, vol. 56, no. 6, pp. 2229–2238, June 2008.
[10] Y. Li, C. Tao, G. Seco-Granados, A. Mezghani, A. L. Swindlehurst, and L. Liu, “Channel estimation and performance analysis of one-bit massive MIMO systems,” IEEE Trans. Signal Processing, vol. 65, no. 15, pp. 4075–4089, 2017.
[11] J. Park, S. Park, A. Yazdan, and R. W. Heath, “Optimization of mixed-ADC multi-antenna systems for cloud-RAN deployments,” IEEE Trans. Communications, vol. 65, no. 9, pp. 3962–3975, 2017.
[12] T.-C. Zhang, C.-K. Wen, S. Jin, and T. Jiang, “Mixed-ADC massive MIMO detectors: Performance analysis and design optimization,” IEEE Trans. Wireless Communications, vol. 15, no. 11, pp. 7738–7752, 2016.
[13] N. Liang and W. Zhang, “Mixed-ADC massive MIMO,” IEEE Journal on Selected Areas in Communications, vol. 34, no. 4, pp. 983–997, 2016.
[14] J. Choi, J. Mo, and R. W. Heath, “Near maximum-likelihood detector and channel estimator for uplink multiuser massive MIMO systems with one-bit ADCs,” IEEE Trans. Communications, vol. 64, no. 5, pp. 2005–2018, May 2016.
[15] H. Pirzadeh and A. L. Swindlehurst, “Spectral efficiency of mixed-ADC massive MIMO,” IEEE Trans. Signal Processing, vol. 66, no. 13, pp. 3599–3613, 2018.
[16] N. Shlezinger, Y. C. Eldar, and M. R. Rodrigues, “Asymptotic task-based quantization with application to massive MIMO,” IEEE Trans. Signal Processing, vol. 67, no. 15, pp. 3995–4012, 2019.
[17] T. L. Marzetta, “Noncooperative cellular wireless with unlimited numbers of base station antennas,” IEEE Trans. Wireless Communications, vol. 9, no. 11, pp. 3590–3600, 2010.
[18] J. Lunden, V. Koivunen, and H. V. Poor, “Spectrum exploration and exploitation for cognitive radio: Recent advances,” IEEE Signal Processing Magazine, vol. 32, no. 3, pp. 123–140, May 2015.
[19] O. Bar-Shalom and A. J. Weiss, “DOA estimation using one-bit quantized measurements,” IEEE Trans. Aerospace and Electronic Systems, vol. 38, no. 3, pp. 868–884, 2002.
[20] R. H. Walden, “Analog-to-digital converter survey and analysis,” IEEE Journal on Selected Areas in Communications, vol. 17, no. 4, pp. 539–550, 1999.
[21] Bin Le, T. W. Rondeau, J. H. Reed, and C. W. Bostian, “Analog-to-digital converters,” IEEE Signal Processing Magazine, vol. 22, no. 6, pp. 69–77, Nov. 2005.
[22] H. C. Papadopoulos, G. W. Wornell, and A. V. Oppenheim, “Sequential signal encoding from noisy measurements using quantizers with dynamic bias control,” IEEE Trans. Information Theory, vol. 47, no. 3, pp. 978–1002, Mar. 2001.
[23] M. S. Stein, S. Bar, J. A. Nossek, and J. Tabrikian, “Performance analysis for channel estimation with 1-bit ADC and unknown quantization threshold,” IEEE Trans. Signal Processing, vol. 66, no. 10, pp. 2557–2571, May 2018.
[24] A. Host-Madsen and P. Handel, “Effects of sampling and quantization on single-tone frequency estimation,” IEEE Trans. Signal Processing, vol. 48, no. 3, pp. 650–662, Mar. 2000.
[25] A. Kipnis and J. C. Duchi, “Mean estimation from one-bit measurements,” CoRR, vol. abs/1901.03403, 2019. [Online]. Available: http://arxiv.org/abs/1901.03403
[26] G. Wang, J. Zhu, R. S. Blum, P. Willett, S. Marano, V. Matta, and P. Braca, “Signal amplitude estimation and detection from unlabeled binary quantized samples,” IEEE Trans. Signal Processing, vol. 66, no. 16, pp. 4291–4303, Aug. 2018.
[27] M. S. Stein, “Spectral power parameter estimation of random sources with binary sampled signals,” 2019.
[28] J. Zhu, X. Lin, R. S. Blum, and Y. Gu, “Parameter estimation from quantized observations in multiplicative noise environments,” IEEE Trans. Signal Processing, vol. 63, no. 15, pp. 4037–4050, Aug. 2015.
[29] J. Ren, T. Zhang, J. Li, and P. Stoica, “Sinusoidal parameter estimation from signed measurements via majorization-minimization based RELAX,” IEEE Trans. Signal Processing, vol. 67, no. 8, pp. 2173–2186, Apr. 2019.
[30] G. Zeitler, G. Kramer, and A. C. Singer, “Bayesian parameter estimation using single-bit dithered quantization,” IEEE Trans. Signal Processing, vol. 60, no. 6, pp. 2713–2726, 2012.
[31] S. Jacobsson, G. Durisi, M. Coldrey, U. Gustavsson, and C. Studer, “One-bit massive MIMO: Channel estimation and high-order modulations,” in IEEE International Conference on Communication Workshop (ICCW), 2015, pp. 1304–1309.
[32] Q. Wan, J. Fang, H. Duan, Z. Chen, and H. Li, “Generalized Bussgang LMMSE channel estimation for one-bit massive MIMO systems,” IEEE Trans. Wireless Communications, 2020.
[33] N. Harel and T. Routtenberg, “Non-Bayesian estimation with partially quantized observations,” in International Conference on Digital Signal Processing (DSP), 2017, pp. 1–5.
[34] F. Gustafsson and R. Karlsson, “Generating dithering noise for maximum likelihood estimation from quantized data,” Automatica, vol. 49, no. 2, pp. 554–560, 2013.
[35] O. Dabeer and E. Masry, “Multivariate signal parameter estimation under dependent noise from 1-bit dithered quantized data,” IEEE Trans. Information Theory, vol. 54, no. 4, pp. 1637–1654, 2008.
[36] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004.
[37] E. Vlachos and J. Thompson, “Dithered beamforming for channel estimation in mmWave-based massive MIMO,” in . IEEE, 2018, pp. 3604–3608.
[38] L. Lu, G. Y. Li, A. L. Swindlehurst, A. Ashikhmin, and R. Zhang, “An overview of massive MIMO: Benefits and challenges,” IEEE Journal of Selected Topics in Signal Processing, vol. 8, no. 5, pp. 742–758, 2014.
[39] H. Pirzadeh and A. L. Swindlehurst, “Spectral efficiency under energy constraint for mixed-ADC MRC massive MIMO,” IEEE Signal Processing Letters, vol. 24, no. 12, pp. 1847–1851, 2017.
[40] S. Jacobsson, G. Durisi, M. Coldrey, U. Gustavsson, and C. Studer, “Throughput analysis of massive MIMO uplink with low-resolution ADCs,” IEEE Trans. Wireless Communications, vol. 16, no. 6, pp. 4038–4051, 2017.
[41] E. G. Larsson, O. Edfors, F. Tufvesson, and T. L. Marzetta, “Massive MIMO for next generation wireless systems,” IEEE Communications Magazine, vol. 52, no. 2, pp. 186–195, 2014.
[42] A. Puglielli, A. Townley, G. LaCaille, V. Milovanović, P. Lu, K. Trotskovsky, A. Whitcombe, N. Narevsky, G. Wright, T. Courtade et al., “Design of energy- and cost-efficient massive MIMO arrays,”

Proceedingsof the IEEE , vol. 104, no. 3, pp. 586–606, 2015.[43] A. Papoulis,

Probability, random variables and stochastic processes ,4th ed. Boston: McGraw-Hill, 2002.[44] J. J. Bussgang, “Crosscorrelation functions of amplitude-distorted gaus-sian signals,”

Research Laboratory of Electronics, Massachusetts Insti-tute of Technology , 1952.[45] R. A. Horn and C. R. Johnson,

Matrix analysis . Cambridge universitypress, 2012.. Cambridge universitypress, 2012.