Massive MIMO Channel Subspace Estimation from Low-Dimensional Projections
Saeid Haghighatshoar,
Member, IEEE,
Giuseppe Caire,
Fellow, IEEE
Abstract—Massive MIMO is a variant of multiuser MIMO in which the number of base-station antennas M is very large, and generally much larger than the number of spatially multiplexed data streams. The benefits of this approach have been intensively investigated in the past few years, and all-digital experimental implementations have also been demonstrated. Unfortunately, the front-end A/D conversion necessary to drive hundreds of antennas, with a signal bandwidth of the order of 10 to 100 MHz, requires a very large sampling bit-rate and power consumption.

In order to reduce these implementation requirements, Hybrid Digital-Analog architectures have been proposed. In particular, our work in this paper is motivated by one such scheme, named Joint Spatial Division and Multiplexing (JSDM), where the downlink precoder (resp., uplink linear receiver) is split into the product of a baseband linear projection (digital) and an RF reconfigurable beamforming network (analog), such that only a reduced number m ≪ M of A/D converters and RF modulation/demodulation chains is needed. In JSDM, users are grouped according to the similarity of their dominant channel subspaces, and these groups are separated by the analog beamforming stage, while the multiplexing gain within each group is achieved using the digital precoder. It is therefore apparent that extracting the channel subspace information of the M-dim channel vectors from snapshots of m-dim projections, with m ≪ M, plays a fundamental role in JSDM implementation.

In this paper, we develop novel efficient algorithms that require sampling only m = O(2√M) specific array elements according to a coprime sampling scheme and, for a given p ≪ M, return a p-dim beamformer whose performance is comparable with that of the best p-dim beamformer that can be designed from full knowledge of the exact channel covariance matrix.
We assess the performance of our proposed estimators both analytically and empirically via numerical simulations. We also demonstrate by simulation that the proposed subspace estimation methods provide near-ideal performance for a massive MIMO JSDM system, by comparison with the case where the user channel covariances are perfectly known.

I. INTRODUCTION

Consider a multiuser MIMO channel formed by a base station (BS) with M antennas and K single-antenna mobile users in a cellular network. Following the current massive MIMO approach [1-4], uplink (UL) and downlink (DL) are organized in Time Division Duplexing (TDD), and the BS transmit/receive hardware is designed or calibrated to preserve UL-DL reciprocity [5, 6], such that the BS can estimate the channel vectors of the users from UL training signals sent by the users on orthogonal dimensions. Since there is no multiuser interference in the UL training phase, in this paper we shall focus on the basic channel estimation problem for a single user.

(A shorter version of this paper was presented at the International Zurich Seminar, Zurich, Switzerland, March 2016. The authors are with the Communications and Information Theory Group, Technische Universität Berlin ({saeid.haghighatshoar, caire}@tu-berlin.de).)

In massive MIMO systems, the number of antennas M is typically much larger than the number of users K scheduled to communicate over a given transmission time slot (i.e., the number of spatially multiplexed data streams). Letting D denote the duration of a time slot (expressed in channel uses), τD channel uses, for some τ ∈ (0, 1), are dedicated to training and the remaining (1 − τ)D channel uses are devoted to data transmission, where it is assumed that D is not larger than the channel coherence block length, i.e., the number of channel uses over which the channel is nearly constant [1]. It turns out that for isotropically distributed channel vectors with min{M, K} ≥ D/2, it is optimal to devote a fraction τ = 1/2 of the slot to channel estimation while serving only D/2 out of the K users in the remaining half [1].

In many relevant scenarios, the channel vectors are highly correlated since the propagation occurs through a small set of Angles of Arrival (AoAs). This correlation can be exploited to improve the system multiplexing gain and decrease the training overhead.
A particularly effective scheme is the Joint Spatial Division and Multiplexing (JSDM) approach proposed and analyzed in [7-11]. JSDM starts from the observation that, for a user with channel vector h ∈ C^M, the signal covariance matrix S = E[hh^H] is typically low-rank. This is especially true in the case of a tower-mounted BS and/or in the case of mm-wave channels, as experimentally confirmed by channel measurements (see [9] and references therein). Moreover, according to the well-known and widely accepted Wide-Sense Stationary Uncorrelated Scattering (WSSUS) channel model, S is invariant over time and frequency. In particular, while the small-scale fading has a coherence time between 0.1 s and 10 ms for motion speeds between 1 m/s and 10 m/s at a carrier frequency of 3 GHz, the time over which the channel vector can be considered WSS is of the order of tens of seconds, i.e., from 2 to 4 orders of magnitude larger. Hence, estimating the signal subspace of a user is a much easier task than estimating the instantaneous channel vector h on each coherence time slot. This is especially important in mm-wave channels (e.g., carrier frequencies of the order of 30 GHz) since, due to the higher carrier frequency, the Doppler bandwidth of these channels is large and therefore D is small, i.e., the multiplexing gain of D/2 achieved by estimating the channels by TDD on each given slot as in [1] is significantly impaired. (When K > D/2, groups of D/2 users are scheduled over different time slots such that all users achieve a positive throughput, i.e., rate averaged over a long sequence of scheduling slots.)

When the subspace information for the users can be accurately estimated over a long sequence of time slots, JSDM partitions the users into G > 1 groups such that users in each group have approximately the same dominant channel subspace [7-9]. The overall multiplexing gain is obtained in two stages, as the concatenation of two linear projections. Namely, groups are separated by zero-forcing beamforming that uses only the group subspace information. Then, additional multiuser multiplexing gain is obtained by conventional linear precoding applied independently within each group. In this way, the system multiplexing gain can be boosted by a factor G, such that a decrease in D can be compensated by a larger G [12]. Furthermore, JSDM lends itself naturally to a Hybrid Digital-Analog (HDA) implementation, where the group-separating beamformer is implemented in the analog (RF) domain, and the multiuser precoding inside each group is implemented in the digital (baseband) domain. The analog beamforming projection reduces the dimensionality from M to some intermediate dimension m ≪ M. The resulting m outputs (UL) are then converted into digital baseband signals and further processed in the digital domain. This has the additional non-trivial advantage that only m ≪ M RF chains (A/D converters and modulators) are needed, thus significantly reducing the complexity and power consumption of the massive MIMO BS receiver/transmitter front-end.

From the above, it is apparent that a central task at the BS side consists in estimating, for each user, a subspace containing a significant amount of its received signal power.
Since in an HDA implementation we do not have direct access to all M antennas, but only to m ≪ M analog output observations, we need to estimate this subspace from snapshots of a low-dimensional projection of the signal.

A. Contribution
In this paper, we aim to design such a subspace estimator for a BS with a large uniform linear array (ULA) with M ≫ 1 antennas. The geometry of the array is shown in Fig. 1, with array elements having uniform spacing d.

Fig. 1: Array configuration at a multi-antenna receiver in the presence of a scattering channel with discrete angles of arrival (array elements at positions 0, d, 2d, ..., (M − 1)d; scatterers seen at AoAs θ_i).
We assume that the array serves users in the angular range [−θ_max, θ_max] for some θ_max ∈ (0, π/2], and we let d = λ/(2 sin(θ_max)), where λ is the wavelength. In general, we assume that we can observe only low-dimensional sketches of the received signal via m ≪ M linear projections. In the case where the projection matrix contains a single non-zero element equal to 1 in each row, we recover array subsampling as a special case. In particular, we shall consider a coprime sampling scheme requiring m = O(2√M). Coprime subsampling was first developed by Vaidyanathan and Pal in [13, 14], where they showed that, for a given spatial span of the array, approximately the same resolution as a uniform linear array is obtained by nonuniformly sampling only a few array elements at coprime locations. We propose several algorithms for estimating the signal subspace and cast them as convex optimization problems that can be solved efficiently. We also compare via simulation the performance of our algorithms with other state-of-the-art algorithms in the literature. The relevance of the proposed approach for JSDM is demonstrated via a representative example, where the dominant subspaces of users with different channel correlations are estimated and the users grouped according to the Grassmannian quantization scheme introduced in [8]. Then, JSDM is applied to the estimated user groups. We compare the achieved sum rate of our scheme with the ideal case where the users' channel covariances are perfectly known, as in [8], and we find that the performance penalty incurred by our proposed method is negligible, even for very short training lengths.

Notation.
Throughout the paper, the output of an optimization algorithm arg min_x f(x) is denoted by x*. We use T and T+ for the spaces of all M × M Hermitian Toeplitz and Hermitian positive semi-definite Toeplitz matrices, respectively. We always use I for the identity matrix, where the dimension may be explicitly indicated for the sake of clarity (e.g., I_k denotes the k × k identity matrix). We denote a k × k diagonal matrix with diagonal elements α_1, ..., α_k by diag(α_1, ..., α_k). We define H(M, p) = {U ∈ C^{M×p} : U^H U = I_p} as the set of tall unitary matrices of dimension M × p. For matrices and vectors of appropriate dimensions, we define the inner product by ⟨K, L⟩ = Tr(KL^H), where Tr denotes the trace operator, and the induced norm by ‖K‖ = √⟨K, K⟩, also known as the Frobenius norm for matrices. For an integer k ∈ Z, we use the shorthand notation [k] for the set of non-negative integers {0, 1, ..., k − 1}, where the set is empty if k < 1.

II. RELATED WORK
Several lines of work in the literature are related to the problem addressed in this paper; they can be grouped into the following four categories: subspace tracking, low-rank matrix recovery, direction-of-arrival (DoA) estimation, and the Multiple Measurement Vectors (MMV) problem in compressed sensing (CS). For the sake of completeness, in this section we briefly review these approaches.

While in the rest of this paper we treat a general scattering model, for simplicity of exposition it is convenient to focus here on a simple model in which the transmission between a user and the BS occurs through p scatterers (see Fig. 1). One snapshot of the received signal is given by

y = Σ_{ℓ=1}^{p} a(θ_ℓ) w_ℓ x + n,   (1)

where x is the transmitted (training) symbol, w_ℓ ∼ CN(0, σ_ℓ²) is the channel gain of the ℓ-th multipath component, n ∼ CN(0, σ² I_M) is the additive white Gaussian noise at the receiver antennas, and a(θ) ∈ C^M is the array response at AoA θ, whose k-th component is given by

[a(θ)]_k = e^{jkπ sin(θ)/sin(θ_max)}.   (2)

According to the WSSUS model, the channel gains of different paths, i.e., {w_ℓ}_{ℓ=1}^{p}, are uncorrelated. Without loss of generality, we suppose x = 1 in all training snapshots. Letting A = [a(θ_1), ..., a(θ_p)], we have

y(t) = A w(t) + n(t),  t ∈ [T],   (3)

where the vectors w(t) = (w_1(t), ..., w_p(t))^T are statistically independent across t ∈ [T]. Also, we assume that the AoAs {θ_ℓ}_{ℓ=1}^{p} are invariant over the whole training period of length T slots. We gather the received and subsampled signal snapshots into the M × T matrix Y = [y(0), ..., y(T − 1)] and the m × T matrix X = [x(0), ..., x(T − 1)], where x(t) = B y(t), i.e., X = BY for the m × M projection matrix B. From (3), the covariance of y(t) is given by

C_y = A Σ A^H + σ² I_M = Σ_{ℓ=1}^{p} σ_ℓ² a(θ_ℓ) a(θ_ℓ)^H + σ² I_M,   (4)

where Σ = diag(σ_1², ..., σ_p²) is the covariance matrix of w(t).

A. Subspace Tracking from Incomplete Observations
Let C_y = U Λ U^H be the singular value decomposition (SVD) of C_y, where Λ = diag(λ_1, ..., λ_M) denotes the diagonal matrix of singular values, sorted in non-increasing order. Denoting by U_p the M × p matrix consisting of the first p columns of U, the columns of U_p form an orthonormal basis for the signal subspace. The goal of subspace tracking from incomplete observations is to estimate this subspace from the noisy low-dimensional sketches x(t) = B y(t), revealed to the estimator sequentially for t ∈ [T]. The noiseless version of this problem was studied by Chi et al. in [15], who proposed the PETRELS algorithm. Another algorithm, named GROUSE, was proposed by Balzano et al. in [16]. The main focus of both algorithms is to optimize the computational complexity rather than the sample size, and they are therefore mainly suited to the case where both M and T are large.

B. Low-rank Matrix Recovery
For p ≪ M and high signal-to-noise ratio (SNR), the covariance matrix C_y in (4) is nearly low-rank. Recovery of low-rank matrices from a few, possibly noisy, samples is of great importance in signal processing and machine learning. Recently, it has been shown that this can be achieved via nuclear-norm minimization, which is a convex problem and can be solved efficiently [17]. For a Hermitian matrix M, the nuclear norm ‖M‖_* is given by the sum of the absolute values of the eigenvalues of M, and reduces to Tr(M) when M is positive semi-definite (PSD). In our case, we have only a collection of T snapshots X = BY as defined before. Let

Ĉ_y = (1/T) Σ_{t=1}^{T} y(t) y(t)^H,  Ĉ_x = B Ĉ_y B^H   (5)

be the sample covariances of the full and projected signals. A natural extension of matrix completion by nuclear-norm minimization to our case is readily given by

min_M Tr(M) subject to M ∈ T+, ‖Ĉ_x − B M B^H‖ ≤ ε,   (6)

where ε is an estimate of the ℓ2-norm of the error.

C. Direction-of-arrival Estimation and Super-resolution
From (3), it is seen that the received signal y(t) is a noisy superposition of p independent Gaussian sources arriving from p different angles. This is the same model studied in direction-of-arrival (DoA) estimation. There are two main categories of DoA estimation algorithms: classical super-resolution (SR) algorithms such as ESPRIT [18] and MUSIC [19], and more recent compressed-sensing-based algorithms that exploit the angular sparsity of the signal over a discrete grid of AoAs. Although grid-based approaches suffer from the mismatch of off-grid sources [20], they have been vastly studied [21-29]. Recently, Candès and Fernandez-Granda [30, 31] developed an SR technique based on total-variation (TV) minimization, which inherits the convex-optimization computational advantage of compressed sensing. This approach was extended by Tan et al. in [32] to DoA estimation with coprime arrays, when the AoAs are sufficiently separated. In a wireless environment, however, the AoAs may be clustered, so that the separation requirement of the SR setup may not be met. For example, a continuous AoA density function is often observed in measurements (e.g., see [33]) and is adopted in channel models (e.g., see [34]). This represents an obstacle to a straightforward application of SR methods, both classical [18, 19] and modern [30-32]. Since in this paper we aim at estimating the subspace of the signal rather than the DoAs, in Section IV-D we extend the SR approach and develop a new algorithm for estimating the signal subspace.

D. Multiple Measurement Vectors (MMV)
It is seen from (3) that, neglecting the measurement noise n(t), the signal y(t) typically has a sparse representation over the continuous dictionary {B a(θ) : θ ∈ [−θ_max, θ_max]}, i.e., only the p atoms {B a(θ_i)}_{i=1}^{p} are needed to represent the signal. After a suitable discretization of the dictionary (e.g., over a discrete grid of AoAs), the problem of estimating Y from the collection of snapshots X = BY, knowing that each column x(t) of X has the same sparsity pattern (i.e., it is a linear combination of the same dictionary elements {B a(θ_i)}_{i=1}^{p}), is the classical Multiple Measurement Vectors (MMV) problem in compressed sensing, which has been widely studied in the literature (see, e.g., [26, 35-40] and references therein). Since (as in our case) the underlying dictionary may be continuous, off-grid MMV techniques have more recently been developed [41-43].

We will compare the performance of our algorithms with a grid-based MMV approach as in [36], where the channel coefficients W = [w(0), ..., w(T − 1)] are estimated by

W* = arg min_{M ∈ C^{G×T}} ‖M‖_{2,1} subject to ‖X − DM‖ ≤ ε,   (7)

where D = [B a(θ_1), ..., B a(θ_G)] is a quantized dictionary over a grid of AoAs of size G, and where the so-called ℓ_{2,1}-norm of the matrix M = [m_1, ..., m_G]^T is defined as ‖M‖_{2,1} = Σ_{i=1}^{G} ‖m_i‖, where m_i ∈ C^T, i = 1, ..., G, denote the rows of M. The signal subspace is eventually given by span{a(θ_i) : i ∈ A}, where A contains the indices of the "active" columns of D, i.e., those indexed by the support set of W*.

We will also compare our algorithms with a grid-less approach inspired by [41-43], based on applying atomic-norm denoising to the received signal Y.
This can be cast as the following semi-definite program (SDP):

(T*, W*, Z*) = arg min_{T ∈ T+, W ∈ C^{T×T}, Z ∈ C^{M×T}} Tr(T) + Tr(W)
subject to [ T    Z  ]
           [ Z^H  W ] ⪰ 0,  ‖X − BZ‖ ≤ ε′,   (8)

where T* gives an estimate of the signal covariance matrix and ε′ is an estimate of the ℓ2-norm of the noise in the projected data.

In both (7) and (8), the computational complexity scales with the number of observation snapshots T. This poses a problem when the training time T is large. Although an ADMM formulation as in [44] is proposed in [42] to reduce the computational complexity, the parameters of ADMM need to be selected very carefully to guarantee convergence. We shall see that our algorithms perform as well as or better than (7) and (8) and have significantly lower complexity for large T, since their complexity does not scale with T.

III. CHANNEL MODEL AND PROBLEM STATEMENT
(Footnote: the complexity scaling of (7) and (8) depends highly on the specific SDP solver and on the structure of the matrix, but it is typically at least of order O(T).)

More generally than in (1), the channel vector may be formed by the superposition of a continuum of array responses. In order to include this case, we define the AoA scattering function γ(u), which describes the received power density along the direction identified by u ∈ [−1, 1], where u = sin(θ)/sin(θ_max) for θ ∈ [−θ_max, θ_max]. We denote the array vector in the u-domain by a(u), where [a(u)]_k = e^{jkπu}. The channel model is then given by

y(t) = ∫_{−1}^{1} √γ(u) a(u) z(u, t) du + n(t),   (9)

where z(u, t) is a white circularly symmetric Gaussian process with covariance function E[z(u, t) z(u′, t′)*] = δ(u − u′) δ_{t,t′}. The covariance matrix of y(t) is given by

C_y = ∫_{−1}^{1} γ(u) a(u) a(u)^H du + σ² I_M = S + σ² I_M,   (10)

where S = S(γ) := ∫_{−1}^{1} γ(u) a(u) a(u)^H du denotes the covariance matrix of the signal part, and σ² I_M is the covariance matrix of the white additive noise. We define the received SNR by

snr := Tr(S(γ)) / Tr(σ² I_M) = ( ∫_{−1}^{1} γ(u) ‖a(u)‖² du ) / (M σ²) = ( ∫_{−1}^{1} γ(u) du ) / σ²,

where ∫_{−1}^{1} γ(u) du is the whole received signal power at a given array element. For the ULA, S is a Toeplitz matrix with [S]_{ij} = [f]_{i−j}, where f is an M-dim vector with [f]_k = ∫_{−1}^{1} γ(u) e^{jkπu} du for k ∈ [M], i.e., [f]_k is the k-th Fourier coefficient of the density γ.

We define the best p-dim beamforming matrix for the covariance matrix S as V_p = arg max_{U ∈ H(M,p)} ⟨S, UU^H⟩. Letting S = VΛV^H = Σ_{i=1}^{M} λ_i v_i v_i^H be the SVD of S, the matrix V_p is the M × p tall unitary matrix formed by the first p columns of V.
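The structure of S and the consistency of the sample covariance are easy to check numerically. The following NumPy sketch is purely illustrative (a hypothetical flat density γ on [−0.3, 0.3] and toy values of M, σ², and T, none taken from the paper): it builds S from the Fourier coefficients [f]_k, verifies the Hermitian Toeplitz PSD structure, draws snapshots y(t) ∼ CN(0, S + σ²I_M), and extracts the top-p eigenvectors, which play the role of V_p.

```python
import numpy as np

# Build S(gamma) from its Fourier coefficients [f]_k = \int gamma(u) e^{j k pi u} du
# (gamma is a hypothetical flat density on [-0.3, 0.3]), then draw snapshots
# y(t) ~ CN(0, S + sigma^2 I) and form the sample covariance as in (5).
M_dim, sigma2, T = 16, 0.1, 5000
u = np.linspace(-1.0, 1.0, 4001)
du = u[1] - u[0]
gamma = ((u >= -0.3) & (u <= 0.3)).astype(float)

f = np.array([(gamma * np.exp(1j * np.pi * k * u)).sum() * du for k in range(M_dim)])
S = np.array([[f[i - j] if i >= j else np.conj(f[j - i]) for j in range(M_dim)]
              for i in range(M_dim)])                  # Hermitian Toeplitz, [S]_{ij} = [f]_{i-j}
C_y = S + sigma2 * np.eye(M_dim)                       # covariance (10)
snr = np.real(np.trace(S)) / (M_dim * sigma2)          # = (\int gamma du) / sigma^2

rng = np.random.default_rng(0)
L = np.linalg.cholesky(C_y)
Y = L @ ((rng.standard_normal((M_dim, T)) + 1j * rng.standard_normal((M_dim, T))) * np.sqrt(0.5))
C_y_hat = (Y @ Y.conj().T) / T                         # sample covariance, -> C_y as T grows

# The best p-dim beamformer is the matrix of top-p eigenvectors of S (here p = 3).
eigvals, eigvecs = np.linalg.eigh(S)
V_p = eigvecs[:, ::-1][:, :3]
```

With a compactly supported γ, most of the power of S concentrates in a few eigenvalues, which is precisely what makes a low-dimensional beamformer effective.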
The signal power captured by this beamformer is given by ⟨S, V_p V_p^H⟩ = Σ_{i=1}^{p} λ_i.

In this paper, we are concerned with the estimation of V_p, for some appropriately chosen p, from the noisy snapshots of the projected channel (sketches) X = BY obtained during a training period of length T, as defined at the beginning of Section II. In order to measure the "goodness" of estimators, we propose the following performance metric, which is relevant to the underlying communication problem of JSDM group-separation beamforming. First, we define the efficiency of the best p-dim beamformer by

η_p = ⟨S, V_p V_p^H⟩ / Tr(S) = Tr(V_p^H S V_p) / Tr(S).   (11)

If η_p ≈ 1 for some p ≪ M, then a significant amount of the signal's power is captured by a low-dim beamformer. Let now Ṽ_p = Ṽ_p(X) be an estimator of V_p from the sketches X. We define the relative efficiency of Ṽ_p as

Γ_p = ⟨S, Ṽ_p Ṽ_p^H⟩ / ⟨S, V_p V_p^H⟩ = 1 − ( ⟨S, V_p V_p^H⟩ − ⟨S, Ṽ_p Ṽ_p^H⟩ ) / ⟨S, V_p V_p^H⟩.   (12)

Hence, the efficiency of Ṽ_p is given by η̃_p = Γ_p η_p, and 1 − Γ_p represents the fraction of signal power lost due to the mismatch between the optimal beamformer and its estimate. It is immediate to see that Γ_p ∈ [0, 1], and it is desirable to make it as close to 1 as possible, in particular for those values of p for which η_p ≈ 1.

Remark 1. We shall compare different subspace estimators for given channel statistics, number of antennas, and number of measurements (i.e., γ, snr, M, and m) according to the following procedure: 1) fix some ε ∈ (0, 1); 2) find the minimum p such that η_p ≥ 1 − ε; 3) compare subspace estimators in terms of Γ_p. This approach is quite different from classical DoA estimation as used in array processing (e.g., in radar).
There, the relevant parameters to be estimated are the AoAs. In our problem, we do not really care about discrete angles, but only about a good approximation (in terms of captured signal power) of the span of the corresponding array response vectors. It follows that the identifiability problem that typically arises in DoA estimation when the minimum angular spacing is too small is irrelevant here. This is the reason why we can handle continuous AoA scattering functions γ, in contrast to SR methods that assume discrete and sufficiently spaced AoAs. ♦

IV. PROPOSED ALGORITHMS FOR SUBSPACE ESTIMATION
In this section, we introduce the coprime sampling scheme that we will use throughout the paper, explain the proposed algorithms for estimating the signal subspace, and provide further intuition and discussion about their performance.
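Since the algorithms of this section are compared through the metrics (11)-(12), the following toy NumPy sketch may help fix ideas. The matrix S, its rank, and the perturbation applied to the estimated beamformer are illustrative choices, not quantities from the paper.

```python
import numpy as np

# Efficiency (11) and relative efficiency (12) for a toy PSD "signal covariance" S.
rng = np.random.default_rng(1)
M_dim, p = 12, 2
G = rng.standard_normal((M_dim, 4)) + 1j * rng.standard_normal((M_dim, 4))
S = G @ np.diag([4.0, 2.0, 0.2, 0.1]) @ G.conj().T / M_dim   # rank-4, PSD

lam, V = np.linalg.eigh(S)                  # eigh returns ascending eigenvalues
lam, V = lam[::-1], V[:, ::-1]              # sort non-increasing
V_p = V[:, :p]                              # best p-dim beamformer V_p
eta_p = np.real(np.trace(V_p.conj().T @ S @ V_p)) / np.real(np.trace(S))   # (11)

# A deliberately perturbed orthonormal estimate of V_p, and its Gamma_p from (12).
V_t = np.linalg.qr(V_p + 0.1 * rng.standard_normal((M_dim, p)))[0]
Gamma_p = np.real(np.trace(V_t.conj().T @ S @ V_t)) / np.real(np.trace(V_p.conj().T @ S @ V_p))
```

By optimality of V_p among tall unitary matrices, Γ_p never exceeds 1, and η_p equals the fraction of Tr(S) carried by the top-p eigenvalues.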
A. Coprime Sampling Operator
Let D be a subset of [M] of size L and consider a ULA whose elements are located at id with i ∈ D (see Fig. 1). The array is called a minimum-redundancy linear array (MRLA) if for every ℓ ∈ [M], with ℓ ≠ 0, there are unique elements i, i′ ∈ D such that ℓ = i − i′. This implies that M = L(L − 1)/2 + 1, or approximately L ≈ √(2M). Now, consider an arbitrary configuration of sensors D ⊂ [M] and define the difference set

ΔD = {i − i′ : i, i′ ∈ D with i ≥ i′}.   (13)

It is clear that ΔD ⊂ [M]. We call D a complete cover (CC) if ΔD = [M]. This implies that for every ℓ ∈ [M] there is at least one (not necessarily unique) pair i, i′ ∈ D such that ℓ = i − i′. By this definition, the sensor locations of an MRLA build a CC of minimum size. For large values of M, it is possible to build a CC of size approximately 2√M by coprime sampling [13, 14]. Let q_1, q_2 be coprime numbers (i.e., gcd(q_1, q_2) = 1) that are very close to each other and satisfy q_1 q_2 ≈ M, such that q_1 ≈ q_2 ≈ √M. Let D be the set of all nonnegative integer multiples of q_1 or q_2 less than or equal to M − 1, i.e., D = ∪_{i=1,2} {k : k ∈ [M], mod(k, q_i) = 0}. Note that |D| ≈ 2√M. We define the covering set of an element k ∈ [M] by

X_k = {(i, i′) : i, i′ ∈ D, i ≥ i′, i − i′ = k},   (14)

and its size by c_k = |X_k|. For a suitable selection of q_1 and q_2 and for sufficiently large M, c_k ≥ 1 for almost all k ∈ [M], the set ΔD is approximately equal to [M], and D is a CC for [M]. In the rest of the paper, we always assume that D is a CC for [M]. Suppose the elements d_i of D are sorted in increasing order, with d_i the i-th smallest element in the list. Also, we let m = |D| and let B be the m × M binary matrix with elements [B]_{i,d_i} = 1 for i ∈ {1, ..., m} and zero otherwise.
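A minimal Python sketch of this construction, for the illustrative choice M = 64 with (q_1, q_2) = (8, 9): it builds D and the selection matrix B, lists the lags left uncovered by ΔD (nonempty here, since M is small and this D is then not a complete cover), and checks that each entry of BSB^H reveals one lag [f]_{d_i − d_{i′}} of a Hermitian Toeplitz S.

```python
import numpy as np

# Coprime sampling set D for M = 64 with (q1, q2) = (8, 9) (illustrative values),
# the m x M selection matrix B, and the lags covered by the difference set.
M_dim, q1, q2 = 64, 8, 9
D = sorted({k for k in range(M_dim) if k % q1 == 0 or k % q2 == 0})
m = len(D)                                                 # about 2*sqrt(M) elements
B = np.zeros((m, M_dim))
B[range(m), D] = 1.0

covered = {di - dj for di in D for dj in D if di >= dj}    # the set Delta_D
uncovered = sorted(set(range(M_dim)) - covered)            # nonempty for small M

# For a Hermitian Toeplitz S with [S]_{ij} = [f]_{i-j}, every entry of B S B^H
# equals one lag of f, so all covered lags can be read off the low-dim sketch.
rng = np.random.default_rng(0)
f = rng.standard_normal(M_dim) + 1j * rng.standard_normal(M_dim)
f[0] = f[0].real                                           # real diagonal
S = np.array([[f[i - j] if i >= j else np.conj(f[j - i]) for j in range(M_dim)]
              for i in range(M_dim)])
C = B @ S @ B.T                                            # B is real, so B.T = B^H
f_rec = {di - dj: C[i, j] for i, di in enumerate(D) for j, dj in enumerate(D) if di >= dj}
```

For this small M the contiguous covered range is only 0, ..., q_1 + q_2 − 1, which is exactly why the paper assumes M large enough (or a suitable (q_1, q_2)) so that the uncovered fraction is negligible.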
It is immediate to check that BB^H = I_m. We will use B as the projection matrix that produces the low-dim observations X = BY. In passing, this has the advantage that the projection reduces to array subsampling, or "antenna selection", which is very easy to implement in the analog RF domain by simple switches connecting the selected antennas to the RF demodulation chains and A/D converters.

The first and most basic property that a projection matrix B must satisfy in order to allow for efficient subspace estimation is identifiability, that is, the associated matrix map must be a bijection when restricted to the class of signal covariance matrices generated by the model at hand. (For small values of M, the set D might not be a CC; however, as we will explain, the performance of our proposed algorithms does not change dramatically as long as the number of uncovered elements in [M] is negligible compared with M.) For coprime sampling, identifiability is ensured by the following result.

Proposition 1. Let S be an M × M Hermitian Toeplitz matrix and let B be the coprime sampling matrix. Then the mapping S → BSB^H is a bijection. □

Proof:
Since S is Toeplitz, for any i, j ∈ [M] with i ≥ j we have [S]_{i,j} = [f]_{i−j} for some M-dim vector f. Also, since S is Hermitian, f fully specifies S. Let i, i′ ∈ {1, ..., m} with i ≥ i′. We can check that

[BSB^H]_{i,i′} = [S]_{d_i, d_{i′}} = [f]_{d_i − d_{i′}}.   (15)

As D is a complete cover for [M], for any k ∈ [M] there are d_i, d_{i′} ∈ D such that d_i − d_{i′} = k, which by (15) implies that [BSB^H]_{i,i′} = [f]_k. Thus, all the elements of S can be recovered from the low-dim matrix BSB^H and vice versa; hence the mapping is a bijection.

Remark 2. Although in this paper, for simplicity of implementation, we focus on a coprime sampling matrix B, all the proposed algorithms, except the super-resolution (SR) algorithm in Section IV-D, can be applied to other sampling matrices (e.g., i.i.d. Gaussian matrices). ♦

B. Algorithm 1: Approximate Maximum Likelihood (AML) Estimator
For the signal model (9), we can immediately prove the following result.
Proposition 2. Let Ĉ_x = (1/T) XX^H be the sample covariance of the observations X. Then Ĉ_x is a sufficient statistic for estimating the signal covariance matrix S. □

Proof:
Recall that C_x = B C_y B^H = B S B^H + σ² I_m, where we have explicitly used the fact that BB^H = I_m. As the observations X are Gaussian, after some simple algebra the likelihood function is given by

p(X | S) = exp{ −T Tr( Ĉ_x (BSB^H + σ² I_m)^{−1} ) } / ( π^{Tm} det(BSB^H + σ² I_m)^T ).   (16)

It follows that the likelihood function depends on X only via Ĉ_x. From the Fisher-Neyman factorization theorem [45], it follows that Ĉ_x is a sufficient statistic.

We always assume that the noise variance σ² can be estimated during the system's operation. In this section, for simplicity of notation, we suppose that the input signal is scaled by 1/σ, and denote the resulting sample covariance by Ĉ_x̃, where due to the normalization Ĉ_x̃ = Ĉ_x/σ². Then, the Maximum-Likelihood (ML) estimator for the normalized subsampled data can be written as S̃* = arg min_{S̃ ∈ T+} L(S̃), where S̃ = S/σ², and where L(S̃) is the negative log-likelihood function given by

L(S̃) = log det(I_m + B S̃ B^H) + Tr( Ĉ_x̃ (I_m + B S̃ B^H)^{−1} ).

By direct inspection, we have the following result.
Proposition 3. L(S̃) is the sum of a concave function L_cav(S̃) = log det(I_m + B S̃ B^H) and a convex function L_vex(S̃) = Tr( Ĉ_x̃ (I_m + B S̃ B^H)^{−1} ). □

As L(S̃) is not convex, local optimization techniques such as gradient descent are not guaranteed to converge to the globally optimal solution. Since S̃ scales with the SNR, it is possible to obtain a convex (indeed, linear) approximation of the concave function L_cav(S̃) which is tight at low SNR. More precisely, we have the following result.

Proposition 4. L_cav(S̃) ≤ Tr(B S̃ B^H) for all S̃ ∈ T+. Moreover, in the low-SNR regime (snr ≪ 1), we have L_cav(S̃) = Tr(B S̃ B^H) + o(snr). □

Proof:
See Appendix C-A.

Proposition 4 states that, at low SNR, Tr(B S̃ B^H) is the best linear approximation of L_cav(S̃), which implies that

L_app(S̃) = Tr(B S̃ B^H) + Tr( Ĉ_x̃ (I_m + B S̃ B^H)^{−1} )   (17)

is the best convex upper bound on L(S̃). We define the approximate maximum likelihood (AML) estimator for the normalized input by S̃* = arg min_{S̃ ∈ T+} L_app(S̃).

Remark 3. It is interesting to note that this approximation is valid independently of the length T of the training period, as long as snr is sufficiently small. Although the total signal-to-noise ratio of the estimation problem increases with T, the validity of the approximation depends only on the SNR of an individual sample rather than on the cumulative signal-to-noise ratio of the whole set of training samples. On the other hand, increasing T yields Ĉ_x → C_x = E[x(t) x(t)^H] by consistency of the sample covariance estimator, and improves the estimation. ♦

The next proposition shows that the AML estimation can be cast as an SDP.
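The bound of Proposition 4 is straightforward to verify numerically. In the following toy NumPy check (dimensions and matrices are illustrative), H stands in for B S̃ B^H, and scaling H down mimics the low-SNR regime in which the bound becomes nearly tight.

```python
import numpy as np

# Check log det(I + H) <= Tr(H) for a Hermitian PSD matrix H, and that the gap
# shrinks when H is scaled down (the low-SNR regime of Proposition 4).
rng = np.random.default_rng(3)
m = 6
A = rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))
H = A @ A.conj().T / m                        # Hermitian PSD, stands in for B S~ B^H

logdet = np.linalg.slogdet(np.eye(m) + H)[1]  # L_cav evaluated at this H
bound = np.real(np.trace(H))                  # its linear upper bound
gap_low_snr = np.real(np.trace(0.01 * H)) - np.linalg.slogdet(np.eye(m) + 0.01 * H)[1]
```

Eigenvalue-wise this is just log(1 + λ) ≤ λ for λ ≥ 0, with equality approached as λ → 0, which is the whole content of the low-SNR approximation.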
Proposition 5. Let L_app(S̃) = Tr(B S̃ B^H) + Tr( Ĉ_x̃ (I_m + B S̃ B^H)^{−1} ) and let Ĉ_x̃ = UΛU^H be the SVD of Ĉ_x̃. Then the AML estimate can be obtained from the following SDP:

(S̃*, W*) = arg min_{M ∈ T+, W} Tr(BMB^H) + Tr(W)
subject to [ I_m + BMB^H  Δ̃ ]
           [ Δ̃^H          W ] ⪰ 0,   (18)

where Δ̃ = Ĉ_x̃^{1/2} = UΛ^{1/2}. □

Proof:
See Appendix C-B. ∎

Some remarks about the optimization problem (18) are in order.
Remark 4: Although the optimization (18) yields an approximation of the ML estimate in the low-SNR regime, it does not require explicit knowledge of the SNR. However, an estimate of the noise level in the array is necessary to scale the input data. By Proposition 4, we expect the performance of AML to be very close to that of ML in the low-SNR regime. ♦

Remark 5: After de-scaling all the parameters by $\sigma^2$, the SDP in (18) can be equivalently written as

$(\mathbf{S}^*, \mathbf{W}^*) = \arg\min_{\mathbf{M} \in \mathcal{T}_+,\, \mathbf{W}}\ {\rm Tr}(\mathbf{B}\mathbf{M}\mathbf{B}^{\sf H}) + {\rm Tr}(\mathbf{W})$
subject to
$\begin{bmatrix} \sigma^2\mathbf{I}_m + \mathbf{B}\mathbf{M}\mathbf{B}^{\sf H} & \boldsymbol{\Delta} \\ \boldsymbol{\Delta}^{\sf H} & \mathbf{W} \end{bmatrix} \succeq 0, \qquad (19)$

where $\boldsymbol{\Delta} = \sigma\widetilde{\boldsymbol{\Delta}} = \widehat{\mathbf{C}}_x^{1/2}$, and $\widehat{\mathbf{C}}_x$ is the sample covariance of the input data without any normalization. This form can be used to estimate $\mathbf{S}$ directly. The optimization (19) shows a close resemblance to the atomic-norm computation introduced in (8) for the MMV problem. However, there are also some interesting differences. For example, in (19) the noise variance $\sigma^2$ and the sampling operator $\mathbf{B}$ appear directly in the SDP constraint, whereas in the MMV formulation (8) they appear through an additional regularization term, i.e., $\|\mathbf{X} - \mathbf{B}\mathbf{Z}\| \leq \epsilon'$, where $\epsilon'$ can be set from the value of $\sigma^2$. Moreover, the whole observation $\mathbf{X}$ collected during the training period is replaced in (19) by $\boldsymbol{\Delta} = \widehat{\mathbf{C}}_x^{1/2}$; thus, its complexity is independent of the training length $T$. ♦

Improving AML via the Concave-Convex Procedure. As the cost function $L(\widetilde{\mathbf{S}}) = L_{\rm cav}(\widetilde{\mathbf{S}}) + L_{\rm vex}(\widetilde{\mathbf{S}})$ is the sum of a concave and a convex function, by slightly modifying the algorithm (18) we can obtain better estimates of the signal covariance matrix $\widetilde{\mathbf{S}}$, even in the high-SNR regime, via the concave-convex procedure (CCCP) [46]. This consists of running a sequence of convex programs iteratively, such that each iteration computes a better estimate of the optimal (not necessarily globally optimal) ML solution. Let $\widetilde{\mathbf{S}}_\ell$, $\ell = 0, 1, \ldots$, denote the estimate generated at iteration $\ell$. Consider iteration $k$, where the estimates $\widetilde{\mathbf{S}}_0, \widetilde{\mathbf{S}}_1, \ldots, \widetilde{\mathbf{S}}_k$ have already been computed. Given the last estimate $\widetilde{\mathbf{S}}_k$, let $\boldsymbol{\Gamma}_k := \mathbf{I}_m + \mathbf{B}\widetilde{\mathbf{S}}_k\mathbf{B}^{\sf H}$. Then we can bound the concave function $L_{\rm cav}(\widetilde{\mathbf{S}})$ as

$L_{\rm cav}(\widetilde{\mathbf{S}}) = \log\det(\mathbf{I}_m + \mathbf{B}\widetilde{\mathbf{S}}\mathbf{B}^{\sf H}) = \log\det\big(\boldsymbol{\Gamma}_k + \mathbf{B}(\widetilde{\mathbf{S}} - \widetilde{\mathbf{S}}_k)\mathbf{B}^{\sf H}\big)$
$= \log\det(\boldsymbol{\Gamma}_k) + \log\det\big(\mathbf{I}_m + \boldsymbol{\Gamma}_k^{-1/2}\mathbf{B}(\widetilde{\mathbf{S}} - \widetilde{\mathbf{S}}_k)\mathbf{B}^{\sf H}\boldsymbol{\Gamma}_k^{-1/2}\big)$
$\overset{(a)}{\leq} \log\det(\mathbf{I}_m + \mathbf{B}\widetilde{\mathbf{S}}_k\mathbf{B}^{\sf H}) + {\rm Tr}\big(\boldsymbol{\Gamma}_k^{-1/2}\mathbf{B}(\widetilde{\mathbf{S}} - \widetilde{\mathbf{S}}_k)\mathbf{B}^{\sf H}\boldsymbol{\Gamma}_k^{-1/2}\big)$
$= L_{\rm cav}(\widetilde{\mathbf{S}}_k) + \langle \mathbf{B}^{\sf H}\boldsymbol{\Gamma}_k^{-1}\mathbf{B},\, \widetilde{\mathbf{S}} - \widetilde{\mathbf{S}}_k \rangle, \qquad (20)$

where in $(a)$ we used an extension of Proposition 4, proved in Appendix C-A, to upper bound $\log\det(\mathbf{I}_m + \mathbf{H})$ by ${\rm Tr}(\mathbf{H})$ for a Hermitian matrix $\mathbf{H}$ with $\mathbf{I}_m + \mathbf{H} \succ 0$. We also define $\Upsilon_k(\widetilde{\mathbf{S}}; \widetilde{\mathbf{S}}_k) := L_{\rm cav}(\widetilde{\mathbf{S}}_k) + \langle \mathbf{B}^{\sf H}\boldsymbol{\Gamma}_k^{-1}\mathbf{B},\, \widetilde{\mathbf{S}} - \widetilde{\mathbf{S}}_k \rangle$ and $L_k(\widetilde{\mathbf{S}}; \widetilde{\mathbf{S}}_k) := \Upsilon_k(\widetilde{\mathbf{S}}; \widetilde{\mathbf{S}}_k) + L_{\rm vex}(\widetilde{\mathbf{S}})$. It follows that, for arbitrary $\widetilde{\mathbf{S}}$ and $\widetilde{\mathbf{S}}_k$ in $\mathcal{T}_+$, we have $L(\widetilde{\mathbf{S}}) \leq L_k(\widetilde{\mathbf{S}}; \widetilde{\mathbf{S}}_k)$.
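The mechanics of the bound (20) and of the resulting iteration can be seen in a one-dimensional caricature (a sketch with made-up scalar constants, not the actual array problem): take $f(s) = \log(1 + b^2 s) + c/(1 + b^2 s)$, linearize the concave log term at the current iterate, and minimize the convex surrogate in closed form.

```python
import math

# 1-D caricature of CCCP: f = f_cav + f_vex with
#   f_cav(s) = log(1 + b2*s)   (concave, like L_cav)
#   f_vex(s) = c / (1 + b2*s)  (convex,  like L_vex)
# The constants b2, c are made up for illustration.
b2, c = 1.0, 5.0

def f(s):
    return math.log(1 + b2 * s) + c / (1 + b2 * s)

def cccp_step(s_k):
    # Surrogate: f_cav(s_k) + b2*(s - s_k)/(1 + b2*s_k) + c/(1 + b2*s),
    # an upper bound on f by concavity of the log (cf. (20)).
    # Its minimizer over s >= 0 solves (1 + b2*s)^2 = c*(1 + b2*s_k):
    return max((math.sqrt(c * (1 + b2 * s_k)) - 1.0) / b2, 0.0)

s, values = 0.0, []
for _ in range(20):
    values.append(f(s))
    s = cccp_step(s)
values.append(f(s))

# Each surrogate touches f at s_k and upper-bounds it elsewhere, so the
# iterates can only decrease f.
assert all(v2 <= v1 + 1e-12 for v1, v2 in zip(values, values[1:]))
print(round(s, 4), round(f(s), 4))
```

In this toy instance the iterates converge to the stationary point of $f$, and the objective decreases monotonically at every step, which is exactly the behavior established for the matrix-valued iteration.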
Also, from Proposition 4, we know that $\Upsilon_k(\widetilde{\mathbf{S}}; \widetilde{\mathbf{S}}_k)$ is a tight convex upper bound on the concave function $L_{\rm cav}(\widetilde{\mathbf{S}})$, especially around $\widetilde{\mathbf{S}}_k$. Thus, $L_k(\widetilde{\mathbf{S}}; \widetilde{\mathbf{S}}_k)$ is a tight convex upper bound on $L(\widetilde{\mathbf{S}})$. To find the next estimate $\widetilde{\mathbf{S}}_{k+1}$ in CCCP, we solve the convex optimization

$\widetilde{\mathbf{S}}_{k+1} = \arg\min_{\mathbf{M} \in \mathcal{T}_+} L_k(\mathbf{M}; \widetilde{\mathbf{S}}_k). \qquad (21)$

Using Proposition 5, this can also be cast as an SDP that can be solved efficiently. We initialize the estimates with $\widetilde{\mathbf{S}}_0 = \mathbf{0}$ at $k = 0$. It is immediately seen that $\Upsilon_0(\widetilde{\mathbf{S}}; \widetilde{\mathbf{S}}_0) = {\rm Tr}(\mathbf{B}\widetilde{\mathbf{S}}\mathbf{B}^{\sf H})$, and $L_0(\widetilde{\mathbf{S}}; \widetilde{\mathbf{S}}_0) = L_{\rm app}(\widetilde{\mathbf{S}})$ coincides with the AML function in (17). Thus, the estimate $\widetilde{\mathbf{S}}_1$ corresponds to the AML estimate. We can also see that the sequence of estimates $\{\widetilde{\mathbf{S}}_k\}_{k=0}^{\infty}$ monotonically improves the likelihood function, i.e.,

$L(\widetilde{\mathbf{S}}_{k+1}) \leq L_k(\widetilde{\mathbf{S}}_{k+1}; \widetilde{\mathbf{S}}_k) = \min_{\mathbf{M} \in \mathcal{T}_+} L_k(\mathbf{M}; \widetilde{\mathbf{S}}_k) \qquad (22)$
$\leq L_k(\widetilde{\mathbf{S}}_k; \widetilde{\mathbf{S}}_k) = L(\widetilde{\mathbf{S}}_k), \qquad (23)$

where we used the identity $\Upsilon_k(\widetilde{\mathbf{S}}_k; \widetilde{\mathbf{S}}_k) = L_{\rm cav}(\widetilde{\mathbf{S}}_k)$, which implies $L_k(\widetilde{\mathbf{S}}_k; \widetilde{\mathbf{S}}_k) = L(\widetilde{\mathbf{S}}_k)$. As a result, if AML is a good approximation of ML, we expect $\widetilde{\mathbf{S}}_1$ to provide a good initialization point for the CCCP, such that the sequence $\{\widetilde{\mathbf{S}}_k\}_{k=1}^{\infty}$ converges to the ML estimate (the globally optimal point).

C. Algorithm 2: MMV with Reduced Dimension (RMMV)
One of the main problems with the grid-based and off-grid MMV optimizations in (7) and (8) is that their complexity scales rapidly with the sample size $T$. Here we propose an SVD-based technique, as in [26], to reduce the computational complexity of (7) and (8). Consider again the observation $\mathbf{X} = \mathbf{B}\mathbf{Y}$ over the training period of length $T$. For the time being, assume that the discrete AoA model holds and that the arrival angles belong to a prefixed grid with $G$ elements in the interval $[-\theta_{\max}, \theta_{\max}]$. In this case, the model $\mathbf{X} = \mathbf{D}\mathbf{W} + \mathbf{N}$ with the discretized dictionary $\mathbf{D} = [\mathbf{B}\mathbf{a}(\theta_1), \ldots, \mathbf{B}\mathbf{a}(\theta_G)]$ holds exactly.

We assume $T \gg m = O(\sqrt{M})$ and consider the "economy-form" SVD $\mathbf{X} = \mathbf{U}_m\boldsymbol{\Sigma}_m\mathbf{V}_m^{\sf H}$, where $\mathbf{V}_m$ is a $T \times m$ tall unitary matrix and $\boldsymbol{\Sigma}_m$ is the $m \times m$ diagonal matrix of the non-zero singular values. We define the new data $\widetilde{\mathbf{X}} = \mathbf{X}\mathbf{V}_m = \mathbf{U}_m\boldsymbol{\Sigma}_m$. Notice that $\widetilde{\mathbf{X}}$ can be computed directly from the sample covariance matrix of the data, $\widehat{\mathbf{C}}_x = \frac{1}{T}\mathbf{X}\mathbf{X}^{\sf H}$; thus, it is not necessary to store the whole observation $\mathbf{X}$ during the training time. Moreover, $\widehat{\mathbf{C}}_x$ can in turn be computed from $\widetilde{\mathbf{X}}$; thus, Proposition 2 implies that $\widetilde{\mathbf{X}}$ is also a sufficient statistic. We also have

$\widetilde{\mathbf{X}} = \mathbf{D}\mathbf{W}\mathbf{V}_m + \mathbf{N}\mathbf{V}_m = \mathbf{D}\widetilde{\mathbf{W}} + \widetilde{\mathbf{N}}, \qquad (24)$

where $\widetilde{\mathbf{W}}$ and $\widetilde{\mathbf{N}}$, of dimensions $G \times m$ and $m \times m$ respectively, are the modified channel gains and array noise. It is not difficult to check that the reduced observation in (24) is still in the MMV format, in the sense that the matrix $\widetilde{\mathbf{W}}$ has nonzero rows only at the grid points corresponding to the channel AoAs. However, the dimension of the problem is now fixed and does not scale with $T$.
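The reduction above is only a few lines of linear algebra. The following numpy sketch (synthetic data, sizes of our own choosing) checks that $\widetilde{\mathbf{X}} = \mathbf{X}\mathbf{V}_m$ is $m \times m$ yet carries the same sample covariance as $\mathbf{X}$:

```python
import numpy as np

# SVD-based reduction: replace the m x T observation X by the m x m
# matrix X_tilde = X V_m = U_m Sigma_m. Synthetic data for illustration.
rng = np.random.default_rng(1)
m, T = 19, 500
X = rng.standard_normal((m, T)) + 1j * rng.standard_normal((m, T))

U, s, Vh = np.linalg.svd(X, full_matrices=False)   # economy-form SVD
X_tilde = U * s                                    # U_m @ diag(s) = X @ V_m

# Same second-order statistics, fixed size independent of T.
C_full = (X @ X.conj().T) / T
C_red = (X_tilde @ X_tilde.conj().T) / T
print(X_tilde.shape, np.allclose(C_full, C_red),
      np.allclose(X_tilde, X @ Vh.conj().T))
```

Since $\widehat{\mathbf{C}}_x$ determines $\widetilde{\mathbf{X}}$ (up to an irrelevant unitary factor) and vice versa, this numerically confirms the sufficient-statistic argument above.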
Of course, since in reality the AoAs are not exactly placed on a grid, this method suffers from the already mentioned mismatch due to domain discretization. Our second algorithm for subspace estimation, referred to in the following as Reduced MMV (RMMV), simply applies the off-grid atomic-norm minimization for the MMV problem, reviewed in Section II-D, to the low-dimensional data $\widetilde{\mathbf{X}}$. This can be cast as the following SDP:

$(\mathbf{S}^*, \mathbf{W}^*, \mathbf{Z}^*) = \arg\min_{\mathbf{M} \in \mathcal{T}_+,\, \mathbf{W} \in \mathbb{C}^{m \times m},\, \mathbf{Z} \in \mathbb{C}^{M \times m}}\ {\rm Tr}(\mathbf{M}) + {\rm Tr}(\mathbf{W})$
subject to
$\begin{bmatrix} \mathbf{M} & \mathbf{Z} \\ \mathbf{Z}^{\sf H} & \mathbf{W} \end{bmatrix} \succeq 0, \quad \|\widetilde{\mathbf{X}} - \mathbf{B}\mathbf{Z}\| \leq \epsilon, \qquad (25)$

where $\epsilon$ is an estimate of the norm of $\widetilde{\mathbf{N}}$. For large values of $T$, we expect $\widehat{\mathbf{C}}_x \approx \sigma^2\mathbf{I}_m + \mathbf{B}\mathbf{S}\mathbf{B}^{\sf H}$ by the consistency of the sample covariance estimator, so that the noise components in $\widetilde{\mathbf{N}}$ remain approximately independent and Gaussian. If $m$ is sufficiently large, the optimal value of $\epsilon$ concentrates around $\epsilon^* = \sigma\sqrt{m^2} = m\sigma \approx 2\sigma\sqrt{M}$, where $\sigma^2$ is the noise variance at each array element, and where we used the fact that, for coprime sampling, $m \approx 2\sqrt{M}$. The noise level $\sigma^2$ at the output of the array elements can typically be estimated during the system's operation.

D. Algorithm 3: Super Resolution (SR)
Let $\mathbf{S}(\gamma) = \int_{-1}^{1} \gamma(u)\,\mathbf{a}(u)\mathbf{a}(u)^{\sf H}\,du$ be the signal covariance matrix as in (10). It is not difficult to check that the first column of $\mathbf{S}$ contains the vector of Fourier coefficients of $\gamma$, given by $\mathbf{f} := \langle \gamma, \mathbf{a} \rangle := \int_{-1}^{1} \gamma(u)\,\mathbf{a}(u)\,du$, where $[\mathbf{f}]_k = [\langle \gamma, \mathbf{a} \rangle]_k := \int_{-1}^{1} \gamma(u)\,e^{jk\pi u}\,du$, $k \in [M]$. Since $\mathbf{S}(\gamma)$ is Toeplitz, Proposition 1 yields that, for the coprime sampling matrix $\mathbf{B}$ introduced in Section IV-A, all the elements of $\mathbf{S}$, and as a result $\mathbf{f}$, can be identified from $\mathbf{B}\mathbf{S}\mathbf{B}^{\sf H}$. This implies that, for sufficiently large $T$, we can estimate $\mathbf{f}$ accurately from the elements of the sample covariance matrix $\widehat{\mathbf{C}}_x = \mathbf{B}\widehat{\mathbf{C}}_y\mathbf{B}^{\sf H}$.

Let $\mathcal{X}_k$ and $c_k = |\mathcal{X}_k|$ be the covering set and the covering number of element $k$, as defined in (14). We define the following estimator for $[\mathbf{f}]_k$:

$[\widehat{\mathbf{f}}]_k = \frac{1}{c_k}\sum_{(i,i') \in \mathcal{X}_k} [\widehat{\mathbf{C}}_x]_{i,i'}. \qquad (26)$

In Proposition 9 in Appendix B-A, we prove that, for $k \neq 0$, $[\widehat{\mathbf{f}}]_k$ is an unbiased estimator of $[\mathbf{f}]_k$ with an approximate variance (exact if $c_k = 1$) given by $\frac{(\sigma^2 + [\mathbf{f}]_0)^2}{T c_k}$, converging to $0$ as $T \to \infty$. We propose the following TV-minimization to recover the signal subspace from the estimates $\widehat{\mathbf{f}}$:

$\gamma^* = \arg\min_{\gamma}\ \|\gamma\|_{\rm TV} \quad \text{subject to} \quad \|\langle \gamma, \mathbf{a} \rangle - \widehat{\mathbf{f}}\| \leq \epsilon, \qquad (27)$

where $\epsilon$ is an estimate of the norm of the noise in the data. For a non-negative measure $\gamma$, the total-variation norm $\|\gamma\|_{\rm TV}$ is simply given by $\int_{-1}^{1} \gamma(u)\,du = [\mathbf{f}]_0$. Moreover, as $\mathbf{f}$ can be obtained from the first column of the Toeplitz matrix $\mathbf{S}(\gamma)$, we can write the optimization (27) directly in terms of the covariance matrix:

$\mathbf{S}^* = \arg\min_{\mathbf{M} \in \mathcal{T}_+}\ {\rm Tr}(\mathbf{M}) \quad \text{subject to} \quad \|\mathbf{M}\mathbf{e}_1 - \widehat{\mathbf{f}}\| \leq \xi\sqrt{\tfrac{M}{T}}\,\big(\sigma^2 + [\mathbf{M}]_{0,0}\big), \qquad (28)$

where $\mathbf{e}_1 = (1, 0, \ldots, 0)^{\sf T}$ is the first canonical basis vector of dimension $M \times 1$, $[\mathbf{M}]_{0,0}$ is the diagonal element of the Toeplitz matrix $\mathbf{M}$ (equivalent to $[\mathbf{f}]_0$), and the $\epsilon$ parameter in (27) has been replaced by an estimate thereof, in which $\xi \approx 1$ is a parameter that can be tuned appropriately. The motivation for (28) is that, for sufficiently large $M$ and for the true signal distribution $\gamma$, the best value of $\epsilon$ in (27) can be estimated via

$\|\langle \gamma, \mathbf{a} \rangle - \widehat{\mathbf{f}}\|^2 = \sum_k \big|[\widehat{\mathbf{f}}]_k - [\mathbf{f}]_k\big|^2 \to \sum_k \mathbb{E}\big[|[\widehat{\mathbf{f}}]_k - [\mathbf{f}]_k|^2\big] = \sum_k {\rm Var}\big[[\widehat{\mathbf{f}}]_k\big] \overset{(a)}{\leq} \frac{M(\sigma^2 + [\mathbf{f}]_0)^2}{T}, \qquad (29)$

where in $(a)$ we used the variance of $[\widehat{\mathbf{f}}]_k$ derived in Appendix B-A, together with the fact that for elements with $c_k > 1$ the variance is less than $\frac{(\sigma^2 + [\mathbf{f}]_0)^2}{T}$. Thus, we have replaced $\epsilon$ in (27) by its approximation $\sqrt{\tfrac{M}{T}}(\sigma^2 + [\mathbf{f}]_0) = \sqrt{\tfrac{M}{T}}(\sigma^2 + [\mathbf{M}]_{0,0})$, with an additional tuning by the scaling parameter $\xi$ to account for the variation of this optimal value around its mean. The optimization (28) is convex and can be solved whenever an estimate of the noise variance $\sigma^2$ is available. In particular, no prior knowledge of the SNR is necessary.

Remark 6: It may happen, especially for small array size $M$, that some of the elements $k \in [M]$ are not covered by the coprime sampling set $\mathcal{D}$, i.e., $c_k = |\mathcal{X}_k| = 0$. In this case, $[\widehat{\mathbf{f}}]_k$ cannot be estimated for those elements. However, we can still run (27), or equivalently (28), by including in the constraint only the Fourier coefficients corresponding to the elements with $c_k \geq 1$. Note that, since the optimization is performed over $\mathcal{T}_+$, if the number of unobserved elements of $\mathbf{f}$ (i.e., those with $c_k = 0$) is negligible compared with $M$, they do not affect the performance considerably. ♦
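The lag-averaging estimator (26) can be sketched in a few lines (an illustration, not the paper's code; we assume the covering set $\mathcal{X}_k$ collects pairs of sampled antenna positions $(i, i')$ with lag $i - i' = k$, and that the coprime pattern keeps the antenna indices divisible by 7 or 9, as in the simulations of Section VI). With the exact covariance in place of $\widehat{\mathbf{C}}_x$, every covered lag is reproduced exactly:

```python
import numpy as np
from itertools import product

M = 80
D = sorted(d for d in range(M) if d % 7 == 0 or d % 9 == 0)  # sampled antennas
pos = {d: i for i, d in enumerate(D)}

# Ground-truth Hermitian Toeplitz covariance S with first column f.
rng = np.random.default_rng(2)
f = rng.standard_normal(M) + 1j * rng.standard_normal(M)
f[0] = abs(f[0])                       # diagonal entry must be real
S = np.empty((M, M), dtype=complex)
for i, j in product(range(M), repeat=2):
    S[i, j] = f[i - j] if i >= j else np.conj(f[j - i])

C = S[np.ix_(D, D)]                    # plays the role of B S B^H

def f_hat(k):
    # Lag-averaging estimator (26): average C over the covering set X_k.
    pairs = [(i, j) for i, j in product(D, repeat=2) if i - j == k]
    return np.mean([C[pos[i], pos[j]] for i, j in pairs]) if pairs else None

covered = [k for k in range(M) if f_hat(k) is not None]
exact = all(np.allclose(f_hat(k), f[k]) for k in covered)
print(len(D), len(covered), exact)
```

The lags with an empty covering set are exactly the uncovered elements ($c_k = 0$) discussed in Remark 6; with a noisy sample covariance, the same averaging yields the unbiased estimate whose variance is bounded above.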
Remark 7: A coprime sampling scheme similar to ours, combined with TV-minimization, was used in [32] for DoA estimation. Provided the AoAs are well separated, the estimation algorithm in [32] can recover them from the dual optimization proposed in [30, 31]. However, the authors do not exploit the positivity of the measure (in our case $\gamma$) that arises naturally from the positive semi-definiteness of the signal covariance matrix. Moreover, in this paper we deal with a wireless scattering channel in which the AoAs may be clustered, so the separation requirement of the super-resolution setup may not be met. However, since our aim is to estimate the signal subspace rather than the AoAs, we can use the positivity of the underlying measure and solve the primal problem (27) directly, rather than the dual one used for DoA estimation. In particular, we do not need to go through the complicated and error-prone procedure of estimating the support (the AoAs) via the dual polynomial, as required for DoA estimation in [32]. ♦

E. Algorithm 4: Covariance Matrix Projection (CMP)
We define the sampling operator ${\rm sub}: \mathbb{C}^{M \times M} \to \mathbb{C}^{m \times m}$ by ${\rm sub}(\mathbf{K}) = \mathbf{B}\mathbf{K}\mathbf{B}^{\sf H}$. Using ${\rm sub}$, we can define a positive semi-definite bilinear form on the space of $M \times M$ matrices by $\langle \mathbf{K}, \mathbf{L} \rangle_{\mathbf{B}} = \langle {\rm sub}(\mathbf{K}), {\rm sub}(\mathbf{L}) \rangle$, where the latter inner product is taken in the space of $m \times m$ matrices. This induces the seminorm $\|\mathbf{K}\|_{\mathbf{B}} = \sqrt{\langle \mathbf{K}, \mathbf{K} \rangle_{\mathbf{B}}}$. It can easily be checked that $\langle \mathbf{K}, \mathbf{L} \rangle_{\mathbf{B}}$ restricted to the Toeplitz matrices is indeed an inner product, and thus induces a well-defined norm there. We also define

$\alpha_{\mathbf{B}}(M) = \max_{\mathbf{K} \in \mathcal{T}} \frac{\|\mathbf{K}\|}{\|\mathbf{K}\|_{\mathbf{B}}}, \qquad (30)$

where the dependence on $M$ indicates that the maximization is performed over the space $\mathcal{T}$ of all $M \times M$ Hermitian Toeplitz matrices. The parameter $\alpha_{\mathbf{B}}(M)$ is a measure of the coherence of the sampling matrix $\mathbf{B}$ with respect to the space of Toeplitz matrices. It is not difficult to check that, for the coprime matrix $\mathbf{B}$, it holds that $1 \leq \alpha_{\mathbf{B}}(M) \leq \sqrt{M}$. Our analysis shows that the CMP algorithm, defined in the following, performs better for sampling matrices $\mathbf{B}$ with a smaller $\alpha_{\mathbf{B}}(M)$. Note that $\|\cdot\|_{\mathbf{B}}$ is not a norm on the space of all $M \times M$ matrices, since one can easily find an $M \times M$ matrix $\mathbf{K} \neq \mathbf{0}$ for which $\|\mathbf{K}\|_{\mathbf{B}} = 0$.

Let $\widehat{\mathbf{C}}_x = \frac{1}{T}\mathbf{X}\mathbf{X}^{\sf H}$. In order to recover the dominant $p$-dimensional subspace of the signal, we first find an estimate of the whole signal covariance matrix $\mathbf{C}_y$ by

$\mathbf{C}_y^* = \arg\min_{\mathbf{M} \in \mathcal{T}_+} \|\widehat{\mathbf{C}}_x - \mathbf{B}\mathbf{M}\mathbf{B}^{\sf H}\|. \qquad (31)$

This is equivalent to the following optimization problem:

$\mathbf{C}_y^* = \arg\min_{\mathbf{M} \in \mathcal{T}_+} \|\widehat{\mathbf{C}}_y - \mathbf{M}\|_{\mathbf{B}}, \qquad (32)$

which shows that the optimal solution $\mathbf{C}_y^*$ is the projection of the sample covariance matrix of the whole array signal onto $\mathcal{T}_+$ under the seminorm $\|\cdot\|_{\mathbf{B}}$. Note that this projection is unique, since the restriction of the seminorm $\|\cdot\|_{\mathbf{B}}$ to $\mathcal{T}_+$ is indeed a norm and the projection theorem holds.

If an estimate of the noise variance $\sigma^2$ is available, we can directly estimate the covariance matrix of the signal via the following variant of (32):

$\mathbf{S}^* = \arg\min_{\mathbf{M} \in \mathcal{T}_+} \|\widehat{\mathbf{C}}_x - \sigma^2\mathbf{I}_m - \mathbf{B}\mathbf{M}\mathbf{B}^{\sf H}\|. \qquad (33)$

Once $\mathbf{C}_y^*$ or $\mathbf{S}^*$ is estimated, we use its $p$-dimensional dominant subspace (for an appropriately chosen $p \in \{1, \ldots, M\}$) as an estimate of the signal subspace.

V. Performance Analysis and Discussion
In this section, we provide lower bounds on the performance of the SR and CMP algorithms. We were not able to make progress in the analysis of AML and RMMV, due to the difficulty of characterizing the SDP solutions; this is left as future work. For the CMP algorithm, we obtain the following performance bound.
Theorem 1: Consider the signal model given by (9) over a training period of length $T$. Then, for a given $p \in \{1, \ldots, M\}$, the CMP estimate of a $p$-dimensional signal subspace achieves a performance measure $\Gamma_p$ (see (12)) satisfying

$\mathbb{E}[\Gamma_p] \geq \max\Big\{1 - \frac{\sqrt{p}}{\eta_p\sqrt{T}}\Big(1 + \frac{1}{\mathsf{snr}}\Big),\ 0\Big\}, \qquad (34)$
${\rm Var}[\Gamma_p] \leq \frac{p}{T\eta_p^2}\Big(1 + \frac{1}{\mathsf{snr}}\Big)^2, \qquad (35)$

where $\eta_p$ is defined in (11), and where $\mathsf{snr}$ denotes the SNR of one snapshot $t \in [T]$. □

Proof:
See Appendix B-B. ∎

A result similar to Theorem 1 holds for the SR algorithm.
Theorem 2: Consider the signal model (9) with power distribution $\gamma(u)$ and the $M$-dimensional vector of Fourier coefficients $\mathbf{f} = \int_{-1}^{1} \gamma(u)\,\mathbf{a}(u)\,du$. Suppose that $\xi$ in (28) is sufficiently large that $\mathbf{f}$ is feasible. Then, for a given $p \in \{1, \ldots, M\}$, the SR estimate of a $p$-dimensional signal subspace achieves a performance measure $\Gamma_p$ satisfying

$\Gamma_p \geq 1 - \frac{\sqrt{p}\,\xi}{\eta_p\sqrt{T}}\Big(1 + \frac{1}{\mathsf{snr}}\Big), \qquad (36)$

where $\mathsf{snr}$ denotes the SNR of one snapshot $t \in [T]$. □

Proof:
See Appendix B-B. ∎

Some remarks about Theorems 1 and 2 are in order here.
Remark 8: It is seen from Theorem 1 that, as $T \to \infty$, $\mathbb{E}[\Gamma_p]$ converges to $1$ and ${\rm Var}[\Gamma_p]$ tends to $0$, which shows the consistency of CMP. Similarly, it is not difficult to check that, for large $T$ and $\xi \approx 1$, the conditions of Theorem 2 hold with very high probability. This implies that $\lim_{T \to \infty} \Gamma_p = 1$ in probability, yielding the consistency of SR. Thus, for large $T$, the $p$-dimensional subspace estimate obtained from either algorithm is as efficient as the best $p$-dimensional signal subspace. ♦

Remark 9: Both Theorems 1 and 2 indicate that, even at infinite SNR, one still needs to take some measurements. The reason is that, even in the absence of noise, the signal $\mathbf{y}(t)$ in (9) is stochastic, and both estimators need some data to discover the underlying signal covariance structure. It is also seen that the performance is not very sensitive to the SNR when the latter is not too small. However, as $\mathsf{snr} \downarrow 0$, the training time required to achieve a specific target performance scales as $T = O(\mathsf{snr}^{-2})$. This may indicate that subspace estimation is quite challenging in low-SNR channels such as mm-wave. ♦

Remark 10:
Let us assume that the signal power is concentrated in an $\alpha$-dimensional subspace. In this case, $\eta_\alpha \approx 1$, and for a moderate value of the SNR the required training time scales as $T = O(\alpha)$. Two different models can be considered for the signal. In the first model, the effective dimension $\alpha$ does not scale with $M$; thus, the required training length is independent of the embedding dimension, i.e., of the number of array elements $M$. In the second, the user has a fixed angular range $\Delta\theta = \beta\pi$ for some $\beta \in (0, 1)$, for which $\alpha \approx \beta M$ scales linearly with $M$. In this case, the required training length also scales linearly with $M$. ♦

VI. Simulation Results
In this section, we assess the performance of our proposed estimators via numerical simulations. A computationally efficient implementation of the proposed optimization problems is beyond the scope of the present work; therefore, we use the general-purpose CVX package [47] to run all the convex optimizations.
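A quick sanity check on the coprime sampling pattern used throughout this section (a sketch; we assume the pattern retains every antenna index divisible by $q_1 = 7$ or $q_2 = 9$, which matches the set size $|\mathcal{D}| = 19$ reported below): it reproduces $|\mathcal{D}| \approx 2\sqrt{M}$ and shows that a few lags are indeed left uncovered by the difference set.

```python
# Sanity check of the coprime sampling pattern for M = 80, q1 = 7, q2 = 9.
# Assumption: the pattern keeps every antenna index divisible by q1 or q2.
M, q1, q2 = 80, 7, 9
D = sorted(d for d in range(M) if d % q1 == 0 or d % q2 == 0)

# Lags covered by the difference set of D.
covered = {i - j for i in D for j in D if i >= j}
uncovered = sorted(k for k in range(M) if k not in covered)

print(len(D), 2 * M ** 0.5, uncovered)
```

Only a handful of (large) lags are missing, which is consistent with the observation below that the estimators are insensitive to a few unobserved elements.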
A. Comparing the Performance of Subspace Estimators
We consider an array of size $M = 80$ and $\theta_{\max} = 60$ degrees (corresponding to an angular sector of $120$ degrees). We use coprime sampling with $q_1 = 7$, $q_2 = 9$, and denote the set of indices of the sampled antennas by $\mathcal{D}$, where $|\mathcal{D}| = 19$. Although some array indices in $[M] = \{0, \ldots, 79\}$ are not covered by the difference set $\Delta\mathcal{D}$, the simulations show that the estimators are quite insensitive to the presence of a few unobserved elements. We also assume that only $m = 20$ RF chains are available at the BS, which is enough to implement the coprime sampling and to serve up to $20$ data streams in the UL or DL.

As an example, we consider a scattering channel with AoAs in the range $\Theta = [\theta_1, \theta_1'] \cup [\theta_2, \theta_2']$, where the second interval is $[\theta_2, \theta_2'] = [10, 20]$ degrees and the first interval lies at negative angles. We assume a uniform power distribution over $\Theta$. The AoA scattering function $\gamma(u)$ is given by

$\gamma(u) = \begin{cases} \dfrac{\kappa}{\sqrt{1 - u^2}} & u \in [u_1, u_1'] \cup [u_2, u_2'], \\ 0 & \text{otherwise,} \end{cases} \qquad (37)$

where $u_i = \sin(\theta_i)$ and $u_i' = \sin(\theta_i')$ for $i = 1, 2$, and where $\kappa > 0$ is a normalization constant. We calculate the vector of Fourier coefficients $\mathbf{f} = \int_{-1}^{1} \gamma(u)\,\mathbf{a}(u)\,du$, from which we construct the Toeplitz signal covariance matrix $\mathbf{S}$. The i.i.d. channel vectors in each training period are generated according to $\mathbf{h} = \mathbf{S}^{1/2}\mathbf{n}_1$, and the corresponding observation at the array antennas is $\mathbf{y} = \mathbf{h} + \sigma\mathbf{n}_0$, where $\mathbf{n}_0, \mathbf{n}_1 \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}_M)$ are independent vectors and $\sigma^2$ denotes the noise variance.

We compare the performance of each subspace estimator with that of the optimal beamformer capturing a prescribed fraction $\eta_p$ of the signal's power, from which we obtain the required dimension $p$. We estimate the efficiency of each estimator, denoted by $\Gamma_p$, via numerical simulations. To compare the performance of our algorithms with the state of the art, we have selected three candidate algorithms, which we have also reviewed in Section II.

• PETRELS.
We implemented the algorithm introduced in [15], with the difference that here we fix the data size: in every step of the algorithm, we select a training sample at random from the fixed training set and update the estimated signal subspace.

• Nuclear-norm minimization.
We run the optimization problem given in (6). Here we optimistically assume that the algorithm knows the best value of $\epsilon$, given by $\epsilon^* = \|\widehat{\mathbf{C}}_x - \mathbf{B}\mathbf{C}_y\mathbf{B}^{\sf H}\|$, although in reality this would not be available and would have to be estimated.

• MMV algorithms.
We compare our algorithms with the state-of-the-art MMV methods summarized in Section II-D. The first algorithm runs the optimization introduced in (7), where we consider a quantized grid of $G = 3M$ equally spaced AoAs; we call this algorithm grid-based MMV (GBMMV). The second is based on the off-grid techniques given by the optimization (8). A byproduct of this optimization is a direct estimate of the covariance of the data, $\mathbf{C}_y^* = \mathbf{M}^*$, given by the matrix $\mathbf{M}^*$ that achieves the minimum in (8); we then extract the dominant $p$-dimensional subspace from this estimate. We call this algorithm grid-less MMV (GLMMV). We set $\epsilon'$ to its optimal value, given by the $\ell_2$-norm of the noise in the subsampled observations $\mathbf{X}$.

Performance vs. signal-to-noise ratio.
Fig. 2 compares the performance of our proposed algorithms with those in the literature over a range of SNR. The curve for each algorithm is obtained by averaging over independent runs. It is seen that AML, RMMV, and SR perform comparably with GLMMV while having much lower computational complexity, which in particular does not scale with $T$. The performance of CMP is as good as that of GBMMV and better than that of nuclear-norm minimization, especially at higher SNR, while its complexity is much lower than that of GBMMV, especially for large $T$. PETRELS does not perform well for the fixed data size; e.g., its performance even for $T = 800$ is worse than that of the other algorithms.

Fig. 2: Performance comparison of various subspace estimators versus the received SNR for training length $T = 100$.

Performance vs. training length $T$. Fig. 3 compares the scaling performance of our proposed algorithms and nuclear-norm minimization for different training lengths. As the performance of AML and RMMV is comparable with that of GLMMV and better than that of GBMMV, and since these algorithms run very slowly for large training lengths $T$, we have not included them in this figure. The results generally show that, for a moderate range of SNR, the performance of the proposed algorithms improves considerably with increasing $T$. However, at low SNR, the resulting improvement is quite negligible. This confirms the comment made in Remark 9 about the scaling behaviour of the training length $T$ with $\mathsf{snr}$.

Fig. 3: Scaling of the performance of different estimators with training length $T \in \{200, 400, 800, 1600\}$.

B. JSDM with Subspace Estimation
Basic Setup.
As explained in the Introduction, our main motivation for estimating the users' signal subspaces comes from JSDM [7–9], where users are grouped/clustered based on the similarity of their signal subspaces so that they can be served efficiently by the BS. In this section, we demonstrate the performance of our proposed subspace estimation algorithms as a component of a JSDM system. In particular, we consider a setup closely inspired by [8], where users have a uniform AoA scattering function over an interval located in the angular sector $[-\theta_{\max}, \theta_{\max}]$ covered by the BS. Users are grouped by a Grassmannian quantizer with a fixed, pre-defined set of quantization points. Following the approach in [8], we fix the Grassmannian quantizer such that each quantization point is a subspace spanned by a group of adjacent columns of the $M \times M$ Discrete Fourier Transform (DFT) matrix. Once the users are partitioned into groups, a fixed number of users per group is served by JSDM with per-group processing (PGP) (see the details in [7]). The main motivation of this paper is the HDA low-complexity implementation of JSDM, for which the number of RF chains (and A/D converters) at the BS side is limited to $m \ll M$. Therefore, both the estimation of the users' subspaces and the estimation of the per-group effective channels, including the JSDM pre-beamforming, must be obtained by sampling no more than $m$ analog signal dimensions. In our example, we consider a ULA with $M = 80$ elements and $\theta_{\max} = 60$ degrees, the same coprime sampling as in Section VI-A, and we assume that the BS can sample only $m = 20$ analog RF demodulated signals. While in [8] it is assumed that the users' channel covariance matrices are ideally known, and the grouping by Grassmannian quantization is performed on the subspaces extracted from the SVD of these covariance matrices, here we estimate the users' subspaces using our algorithms, from a block of $T$ i.i.d. noisy channel snapshots captured during the training period, as explained before. Then we apply the same grouping scheme and DFT pre-beamforming to both the estimated case and the ideally known case, and compare the results in terms of the achieved total sum-rate, averaged over a sufficiently large number of independent channel realizations.

Signal Model.
We consider a collection of users $\mathcal{K} = \{1, \ldots, K\}$ of size $K = |\mathcal{K}| = 200$. The AoA scattering function of user $k$ is a uniform distribution over the interval $[\theta_k - \frac{\Delta\theta_k}{2}, \theta_k + \frac{\Delta\theta_k}{2}]$, corresponding to the following power distribution in the $u$-domain (recall that $u = \frac{\sin(\theta)}{\sin(\theta_{\max})}$):

$\gamma_k(u) = \begin{cases} \dfrac{\chi}{\sqrt{1 - u^2}} & u \in \Big[\dfrac{\sin(\theta_k - \frac{\Delta\theta_k}{2})}{\sin(\theta_{\max})},\ \dfrac{\sin(\theta_k + \frac{\Delta\theta_k}{2})}{\sin(\theta_{\max})}\Big], \\ 0 & \text{otherwise,} \end{cases}$

where $\chi > 0$ is a normalization constant. The users' angular spread (the same for all users) is $\Delta\theta_k = \Delta\theta = 20$ degrees, and the center AoA $\theta_k$ is i.i.d. and uniformly distributed in $[-\theta_{\max} + \frac{\Delta\theta}{2}, \theta_{\max} - \frac{\Delta\theta}{2}]$. Letting $\Delta u_k$ denote the support size of the power distribution $\gamma_k(u)$, we define $\Delta u = \max_{k \in \mathcal{K}} \Delta u_k$ as the maximum support size of the users in the $u$-domain, where $\Delta u = \frac{2\sin(\Delta\theta/2)}{\sin(\theta_{\max})} \approx 0.4$.

User Grouping.
The Grassmannian quantizer is obtained by following [8]. We divide the domain $u \in [-1, 1]$ into intertwined groups $\mathcal{G} = \{1, \ldots, G\}$, with $G = 9$, as illustrated in Fig. 4. We denote the $u$-support of group $g \in \mathcal{G}$ by $\mathcal{U}_g \subset [-1, 1]$. Since the length of $\mathcal{U}_g$ satisfies $|\mathcal{U}_g| \approx \Delta u = 0.4$, due to the intertwined structure of the groups (see Fig. 4) we expect that $\gamma_k(u)$ for each user $k$ is well aligned with at least one $\mathcal{U}_g$. The signal subspace for group $g \in \mathcal{G}$ is simply given by a submatrix $\mathbf{F}_g$ of the $M \times M$ DFT matrix, as follows.

Fig. 4: Grouping of the users in the $u$-domain.

We define the discrete grid $\mathcal{U}_{\rm disc} = \{-1, -1 + \frac{2}{M}, \ldots, 1 - \frac{2}{M}\}$ and set the columns of $\mathbf{F}_g$ to the normalized array steering vectors $\frac{1}{\sqrt{M}}\mathbf{a}(u)$ for all $u \in \mathcal{U}_{\rm disc} \cap \mathcal{U}_g$. In our case, the matrix $\mathbf{F}_g$ has dimension $M \times q_g$ with $q_g = M/5 = 16$. Therefore, the JSDM pre-beamforming matrices $\mathbf{B}_g$ (see the notation in [7]) are simply given by $\mathbf{B}_g = \mathbf{F}_g$.

We cluster the users into the groups in $\mathcal{G}$, where each user $k \in \mathcal{K}$ is assigned to the group $i_k = \arg\max_{g \in \mathcal{G}} \langle \mathbf{S}_k, \mathbf{F}_g\mathbf{F}_g^{\sf H} \rangle$, i.e., the group whose subspace captures the maximum amount of power in $\mathbf{S}_k$. This can be seen as an improved, weighted version of the chordal distance between subspaces used in [8], and it reduces to [8] when the non-zero eigenvalues of $\mathbf{S}_k$ are all equal. As said before, we apply the same clustering/quantization scheme both to the ideally known set $\{\mathbf{S}_k\}$ and to the estimated set of user covariances $\{\widehat{\mathbf{S}}_k\}$. In this example, we used the SR algorithm in (28) with tuning parameter $\xi = 2$. Since the performance of our proposed algorithms in terms of $\Gamma_p$ is similar (see Section VI-A), we expect the other proposed algorithms to yield similar results in terms of JSDM sum-rate as the ones reported here for SR.

Beamforming and Scheduling.
After clustering the users, JSDM with PGP is applied. Following [8], we divide the groups $\mathcal{G}$ into two subsets $\mathcal{G}_1 = \{1, 3, 5, 7, 9\}$ and $\mathcal{G}_2 = \{2, 4, 6, 8\}$ with intertwined angular supports (see Fig. 4), such that within each subset the groups have mutually orthogonal matrices $\mathbf{F}_g$. The two group subsets are served in orthogonal resource blocks, while the groups within each subset are served using spatial multiplexing.

For clarity, we focus on the subset $\mathcal{G}_1$; the scheme for $\mathcal{G}_2$ follows immediately. Let $\mathbf{h}_{g,\ell}$ be the channel vector of the $\ell$-th scheduled user in group $g \in \mathcal{G}_1$, and let $\mathbf{H}_g = [\mathbf{h}_{g,1}, \ldots, \mathbf{h}_{g,S_g}]$ and $\mathbf{H} = [\mathbf{H}_1, \mathbf{H}_3, \ldots]$ be the channel matrix of the scheduled users in group $g$ and the channel matrix of all the groups combined in a given coherence block, respectively. The BS wishes to serve $S = \sum_{g \in \mathcal{G}_1} S_g$ data streams (users), where $S_g \leq q_g$ is the number of users scheduled in group $g$. We suppose that the scheduled users in group $g$ are selected at random from the set of users assigned to group $g$ by the grouping algorithm.
Fig. 5: Sum-rate vs. SNR performance of JSDM with PGP and exact subspace knowledge (PGP), compared with JSDM with PGP and subspace estimation (PGP-SE), for training lengths $T \in \{13, 25, 50, 100\}$.

Since we have only $m = 20$ RF chains, we set $S_g = 4$ for $g \in \mathcal{G}_1$ (and $S_g = 5$ for $g \in \mathcal{G}_2$). The JSDM two-stage precoder is given by $\mathbf{V} = \mathbf{U}\mathbf{R}$, where $\mathbf{U} = [\mathbf{F}_1, \mathbf{F}_3, \ldots, \mathbf{F}_9]$ is the $M \times q$ DFT pre-beamforming matrix with $q = \sum_{g \in \mathcal{G}_1} q_g$, and where $\mathbf{R} \in \mathbb{C}^{q \times S}$ is the baseband (implemented in the digital domain) MIMO multiuser precoding matrix, in block-diagonal form according to the PGP scheme. The MIMO precoding matrix $\mathbf{R}$ depends on the instantaneous realizations of the reduced-dimensional projected channel matrices $\mathbf{H}_{gg} = \mathbf{F}_g^{\sf H}\mathbf{H}_g$ for $g \in \mathcal{G}_1$. The scheduled users in each group are served using a linear zero-forcing beamforming (ZFBF) matrix, obtained as the column-normalized version of the pseudo-inverse of the projected channel matrix $\mathbf{H}_{gg}$. Further details can be found in [7] and are omitted here due to space limitations.

Sum-rate performance and simulation results.
Let $\mathbf{U}$ and $\mathbf{R}$ be the pre-beamforming and MIMO precoding matrices, and let $\mathbf{d} \in \mathbb{C}^S$ be the vector of $S$ symbols corresponding to the $S$ data streams. We normalize the symbol energy and the users' channel vectors such that $\mathbb{E}[\mathbf{d}\mathbf{d}^{\sf H}] = \mathbf{I}_S$ and $\mathbb{E}[\|\mathbf{h}_k\|^2] = 1$ for all $k \in \mathcal{K}$. When $\mathbf{d}$ is transmitted in the DL, the received signal at the user side is given by $\mathbf{y} = \mathbf{H}^{\sf H}\mathbf{U}\mathbf{R}\mathbf{d} + \mathbf{n}$, where $\mathbf{n} \sim \mathcal{CN}(\mathbf{0}, \frac{1}{\mathsf{snr}}\mathbf{I}_S)$ is the $S$-dimensional vector of additive noise. This is equivalent to an $S \times S$ MIMO multiuser interference channel with transfer matrix $\mathbf{T} := \mathbf{H}^{\sf H}\mathbf{U}\mathbf{R}$. Treating the interference as noise, the signal-to-interference-plus-noise ratio (SINR) of data stream $s$ is given by

$\mathsf{sinr}_s = \frac{|[\mathbf{T}]_{s,s}|^2}{\sum_{s' \neq s} |[\mathbf{T}]_{s,s'}|^2 + \frac{1}{\mathsf{snr}}}. \qquad (38)$

The corresponding sum-rate achieved in the block is given by

$C_{\rm sum}(\mathbf{H}, \mathsf{snr}) = \frac{1}{2}\sum_{s=1}^{S} \log(1 + \mathsf{sinr}_s), \qquad (39)$

where the factor $\frac{1}{2}$ accounts for the fact that each group subset is served in one of the two orthogonal resource blocks. In the simulations, we evaluate $C_{\rm sum}(\mathbf{H}, \mathsf{snr})$ for independent realizations of the channel matrix $\mathbf{H}$ and plot the average sum-rate, which yields the "ergodic" rate achieved by coding over many fading blocks and ideally adapting the rate of the data streams in each block. For simplicity, in our simulations we used the same value of $\mathsf{snr}$ both for the DL data transmission and for the UL training used to estimate the users' subspaces. In practice, the UL and DL SNRs may differ; however, this does not change the qualitative conclusions of our results, since a lower UL SNR can always be compensated by increasing the subspace-estimation training length $T$. In order to clearly put in evidence the effect of the subspace estimation, we calculated the achievable sum-rate under the assumption that the projected channel matrices $\mathbf{H}_{gg}$ are perfectly known at the BS. In general, a dedicated UL training phase per group must be implemented, from which $\mathbf{H}_{gg}$ is estimated via UL-DL reciprocity [5, 6]. We did not include this projected-channel estimation phase here, since it incurs only a small performance degradation with respect to the ideal case in the practically relevant range of SNR (see the analysis in [7]), and it would have the effect of "blurring" the dependence of the performance on the subspace estimation, which is the focus of this paper.

Fig. 5 shows the JSDM sum-rate vs. SNR for the system described above. It is seen that subspace estimation, even with very short training length $T$, does not incur any significant loss in terms of achievable rate.

VII. Conclusion
In this paper, we studied the estimation of the user signal subspace in a MIMO wireless scenario where a transmitter (user) with a single antenna sends training symbols, and a receiver (BS) equipped with an array of M antennas obtains noisy linear sketches of the corresponding channel vector, modeled as an i.i.d. (in time) and correlated (in the antenna domain) vector Gaussian random process. Our algorithms require sampling only m = O(2√M) antenna array elements and return a p-dimensional estimated subspace capturing almost the same amount of signal power as the best possible p-dimensional subspace. We analyzed the performance of some of the proposed algorithms and compared it with that of state-of-the-art methods in the literature via numerical simulations.

We also illustrated in detail how the users' subspace information can be exploited in a JSDM hybrid digital-analog implementation of a massive MIMO system. We provided numerical simulations showing that, in terms of the achieved ergodic sum-rate, such a JSDM system in which the users' subspaces are estimated through our proposed algorithms performs almost indistinguishably from a scheme that assumes ideal knowledge of the users' channel covariances. This indicates that the proposed algorithms are well suited for JSDM with an HDA implementation with a small number m ≪ M of RF chains (and A/D converters), where the group-separating beamformer is implemented in the analog (RF) domain, and the multiuser precoding is implemented in the digital (baseband) domain.

APPENDIX A
OVERVIEW OF COMPLEX GAUSSIAN RANDOM VARIABLES
In this appendix, we review some properties of complex circularly symmetric Gaussian random variables that are needed in this paper.
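The key moment identity established in this appendix (part 2 of Proposition 8) can be checked numerically. The following Monte Carlo snippet is purely illustrative and is not part of the proofs; the correlation value and sample size are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(1)

# Draw N pairs (X, Y) of zero-mean, unit-variance, circularly symmetric
# complex Gaussians with a chosen (illustrative) complex correlation rho.
N = 2_000_000
rho = 0.3 - 0.4j
G = (rng.standard_normal((N, 2)) + 1j * rng.standard_normal((N, 2))) / np.sqrt(2)
X = G[:, 0]
Y = np.conj(rho) * G[:, 0] + np.sqrt(1 - abs(rho) ** 2) * G[:, 1]

# Fourth-moment identity used in the variance computations of Appendix B:
#   E[|X Y*|^2] = E[|X|^2] E[|Y|^2] + |E[X Y*]|^2.
lhs = np.mean(np.abs(X * np.conj(Y)) ** 2)
rhs = np.mean(np.abs(X) ** 2) * np.mean(np.abs(Y) ** 2) + abs(np.mean(X * np.conj(Y))) ** 2
print(lhs, rhs)   # the two values agree up to Monte Carlo error
```

By construction, Y has unit variance and empirical correlation with X close to the chosen rho, so the right-hand side is approximately 1 + |rho|².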
Proposition 6. Let X = X_r + jX_i and Y = Y_r + jY_i be two zero-mean, unit-variance, circularly symmetric complex Gaussian variables with correlation coefficient ρ = ρ_r + jρ_i. Then we have:
1) E[XX*] = E[YY*] = 1, and E[XX] = E[YY] = E[XY] = 0.
2) E[X_r²] = E[X_i²] = E[Y_r²] = E[Y_i²] = 1/2.
3) E[X_r X_i] = E[Y_r Y_i] = 0, E[X_r Y_r] = E[X_i Y_i] = ρ_r/2, and E[X_r Y_i] = −E[X_i Y_r] = ρ_i/2.

We also need the following proposition for real-valued Gaussian variables.

Proposition 7 (Price's theorem).
Let Z and W be two real-valued, jointly Gaussian N(0, 1) variables with covariance ρ. Let g(z, w) be a differentiable function of (z, w), and let I(ρ) = E[g(Z, W)]. Then

dI/dρ = E[ ∂²g/∂z∂w (Z, W) ]. □

Using Proposition 7, we can prove the following result.
Proposition 8. Let X and Y be as in Proposition 6. Then, we have:
1) E[X_i² Y_r²] = E[X_r² Y_i²] = (1 + 2ρ_i²)/4, and E[X_i² Y_i²] = E[X_r² Y_r²] = (1 + 2ρ_r²)/4.
2) E[|XY*|²] = E[|X|²] E[|Y|²] + |E[XY*]|².

Proof:
For simplicity, we prove only one of the identities in part 1; let us consider E[X_r² Y_i²]. Note that, from the properties of complex Gaussian variables, (Z, W) = (√2 X_r, √2 Y_i) are jointly Gaussian N(0, 1) random variables with covariance ρ_i. This implies that g(ρ_i) = 4 E[X_r² Y_i²] = E[Z² W²] is a function of ρ_i. Applying Price's theorem in Proposition 7, we have

dg/dρ_i = 4 E[ZW] = 4 ρ_i,   (40)

which implies that g(ρ_i) = 2ρ_i² + κ, where κ is a constant. For ρ_i = 0, the random variables (Z, W) are independent of each other, and g(0) = E[Z²W²] = E[Z²] E[W²] = 1, which implies that κ = 1. Hence, we have g(ρ_i) = 2ρ_i² + 1, which implies that E[X_r² Y_i²] = (1 + 2ρ_i²)/4.

To prove part 2, note that |XY*|² = (X_r² + X_i²)(Y_r² + Y_i²) can be expanded into four terms whose expected values can be computed using Price's theorem as above, where we obtain

E[|XY*|²] = 2 (1 + 2ρ_r²)/4 + 2 (1 + 2ρ_i²)/4 = 1 + |ρ|².   (41)

Moreover, we have

E[XY*] = E[X_r Y_r + X_i Y_i] + j E[X_i Y_r − X_r Y_i] = ρ_r − jρ_i = ρ*.   (42)

The result follows from (41), (42), and E[|X|²] = E[|Y|²] = 1.

APPENDIX B
ANALYSIS AND PROOF TECHNIQUES
A. Statistics of the Subsampled Signal
From the signal model in (9), it is seen that the received signal y(t) is a complex Gaussian vector with covariance matrix C_y = S(γ) + σ² I_M. Since we assume that y(t) is independent across different snapshots t ∈ [T], this completely specifies its statistics. Similarly, it results that the statistics of the sketches x(t) = B y(t) are fully specified by the covariance matrix C_x = B C_y B^H. For the coprime sampling matrix B, and for i, j ∈ {1, ..., m} with i ≥ j, we obtain

[C_x]_{i,j} = [S(γ)]_{d_i, d_j} + σ² δ_{ij} = [f]_{d_i − d_j} + σ² δ_{ij} := [g]_{d_i − d_j},   (43)

where d_i ∈ D is the i-th largest index of the sampled antennas as in Section IV-A, and where [f]_k = ∫_{−1}^{1} γ(u) e^{jkπu} du denotes the k-th Fourier coefficient of γ. Notice that the SNR is simply given by snr = [f]_0 / σ².

Now consider a specific k ∈ [M] and let d_i, d_j ∈ D be such that k = d_i − d_j. Since, as in Section IV-A, we assume that D is a complete cover for [M], such d_i and d_j exist. Let us also define [ĝ]_k = [Ĉ_x]_{i,j}. The following proposition characterizes the mean and the variance of [ĝ]_k. The proof uses the properties of complex Gaussian variables reviewed in Appendix A.

Proposition 9. Let k ∈ [M] with k = d_i − d_j, and let [ĝ]_k = [Ĉ_x]_{i,j}. Then, we have

E[[ĝ]_k] = [g]_k,   Var[[ĝ]_k] = (σ² + [f]_0)² / T = σ⁴ (1 + snr)² / T. □

Proof:
Taking the expectation, we have

E[[ĝ]_k] = E[[Ĉ_x]_{i,j}] = [C_x]_{i,j} = [C_y]_{d_i, d_j} = [f]_{d_i − d_j} + σ² δ_{ij} = [g]_k.   (44)

Since the observations y(t), and as a result x(t), are independent for t ∈ [T], and

[Ĉ_x]_{i,j} = (1/T) Σ_{t=1}^{T} [x(t)]_i [x(t)]*_j = (1/T) Σ_{t=1}^{T} [y(t)]_{d_i} [y(t)]*_{d_j},   (45)

it results that Var[[ĝ]_k] = (1/T) Var[ [y(t)]_{d_i} [y(t)]*_{d_j} ]. Hence, using the properties of complex Gaussian variables proved in Proposition 8, we obtain

Var[ [y(t)]_{d_i} [y(t)]*_{d_j} ] = E[ |[y(t)]_{d_i} [y(t)]*_{d_j}|² ] − | E[ [y(t)]_{d_i} [y(t)]*_{d_j} ] |²
= E[ |[y(t)]_{d_i}|² ] E[ |[y(t)]_{d_j}|² ] + | E[ [y(t)]_{d_i} [y(t)]*_{d_j} ] |² − | E[ [y(t)]_{d_i} [y(t)]*_{d_j} ] |²
= ([f]_0 + σ²)² = σ⁴ (1 + snr)²,   (46)

where snr = [f]_0 / σ² as before. Hence, Var[[ĝ]_k] = σ⁴ (1 + snr)² / T.

B. Analysis of the Performance of the CMP and SR Algorithms
In this section, we analyze the performance of the CMP and SR algorithms. First, we need the following preliminary results.
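Before turning to the formal results, the lag-covariance estimation setup of the previous subsection can be made concrete with a small numerical sketch. The index set D and the dimensions below are hypothetical choices for illustration (a hand-picked difference cover of {0, ..., M−1}, not the paper's coprime construction), and real-valued signals are used for simplicity:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(2)

# Hypothetical small example: D is a difference cover of {0, ..., M-1}
# with m = 7 elements, roughly m ~ 2*sqrt(M).
M = 16
D = np.array([0, 1, 2, 3, 7, 11, 15])
m = len(D)

# Complete-cover property: every lag k in [0, M) is realized as d_i - d_j;
# c[k] is the covering number (how many ordered pairs realize lag k).
c = np.zeros(M, dtype=int)
for di, dj in product(D, D):
    if 0 <= di - dj < M:
        c[di - dj] += 1

# Simulate T snapshots of a zero-mean stationary Gaussian array signal with a
# Toeplitz covariance C_y, keep only the antennas in D, and estimate each lag
# covariance [g]_k from the sketches, in the spirit of (43)-(45).
T = 5000
g = 0.8 ** np.arange(M)                                  # hypothetical lags [g]_k
C_y = np.array([[g[abs(i - j)] for j in range(M)] for i in range(M)])
L = np.linalg.cholesky(C_y + 1e-9 * np.eye(M))
y = L @ rng.standard_normal((M, T))                      # M x T snapshots
x = y[D, :]                                              # low-dimensional sketches

# One estimate of [g]_k per realizing pair, averaged over all such pairs:
g_hat = np.zeros(M)
for k in range(M):
    vals = [np.mean(x[i] * x[j]) for i in range(m) for j in range(m) if D[i] - D[j] == k]
    g_hat[k] = np.mean(vals)
print(np.max(np.abs(g_hat - g)))                         # small for large T
```

Averaging over all pairs realizing the same lag (the c_k covering pairs) reduces the estimator variance; the per-lag error decays as O(1/√T), consistent with Proposition 9.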
Proposition 10. Let Ĉ_y be the sample covariance of the signal y(t), t ∈ [T], and let C*_y be the CMP estimate given by (31) or, equivalently, (32). Let C′ ∈ T_+ be an arbitrary Hermitian PSD Toeplitz matrix. Then

max{ ||Ĉ_y − C*_y||²_B , ||C*_y − C′||²_B } ≤ ||Ĉ_y − C′||²_B. □

Proof:
The inequality ||Ĉ_y − C*_y||_B ≤ ||Ĉ_y − C′||_B simply follows from the definition of C*_y as the projection of Ĉ_y onto the space T_+. Thus, we only need to prove the other inequality, ||C*_y − C′||_B ≤ ||Ĉ_y − C′||_B. To prove this, note that the seminorm ||·||_B is defined from a PSD bilinear form. As C′ itself belongs to T_+, it is not difficult to see that at the projection C*_y, the vector C′ − C*_y is a feasible direction to move, because from the convexity of the space T_+ it results that C*_y + α (C′ − C*_y) ∈ T_+ for any α ∈ [0, 1]. Thus, from the optimality of C*_y, it results that ⟨Ĉ_y − C*_y, C′ − C*_y⟩_B ≤ 0. This implies that

||Ĉ_y − C′||²_B = ||Ĉ_y − C*_y + C*_y − C′||²_B   (47)
= ||Ĉ_y − C*_y||²_B + ||C*_y − C′||²_B   (48)
− 2 ⟨Ĉ_y − C*_y, C′ − C*_y⟩_B   (49)
≥ ||Ĉ_y − C*_y||²_B + ||C*_y − C′||²_B,   (50)

from which the desired inequality ||C*_y − C′||_B ≤ ||Ĉ_y − C′||_B results. Combining the two inequalities gives the proof.

Proposition 11.
Let C*_y and C_y be as defined before. Suppose that ||C*_y − C_y||_B ≤ ε, and let V ∈ H(M, p) be an arbitrary M × p matrix with V^H V = I_p. Then, |⟨C_y, VV^H⟩ − ⟨C*_y, VV^H⟩| ≤ ε √(pM). □

Proof:
Using the Cauchy-Schwarz inequality, we have

|⟨C_y, VV^H⟩ − ⟨C*_y, VV^H⟩| = |⟨C_y − C*_y, VV^H⟩| ≤ ||C_y − C*_y|| √( Tr(VV^H VV^H) ) = ||C_y − C*_y|| √p ≤(a) α_B(M) ||C_y − C*_y||_B √p ≤ ε √(pM),   (51)

where in (a) we used the coherence parameter of the coprime matrix, α_B(M) ≤ √M.

After finding the projection C*_y, we use its p-dim dominant subspace to design a beamformer matrix for the received signal y(t), t ∈ [T]. Let C*_y = U Λ U^H be the SVD of C*_y, and let V ∈ H(M, p) be the matrix consisting of the first p columns of U. If the estimate C*_y is very close to C_y, then we expect that V, in terms of capturing the signal power, is a good approximation of the dominant p-dim subspace of the signal. This is formalized in the following propositions.

Proposition 12.
Let C*_y and C_y be as defined before, and let S be the signal covariance matrix, so that C_y = S + σ² I_M. Assume that ||C*_y − C_y||_B ≤ ε. Let C_y = U Λ U^H and C*_y = Ũ Λ̃ Ũ^H be the SVDs of C_y and C*_y, respectively. Let V and Ṽ be the M × p matrices consisting of the first p columns of U and Ũ. Then,

|⟨S, VV^H⟩ − ⟨S, ṼṼ^H⟩| ≤ 2 ε √(pM). □

Proof:
First note that C_y and S differ by a multiple of the identity matrix I_M. As ⟨I_M, VV^H⟩ = ⟨I_M, ṼṼ^H⟩ = p, we can equivalently prove the inequality |⟨C_y, VV^H⟩ − ⟨C_y, ṼṼ^H⟩| ≤ 2ε√(pM).

From the triangle inequality, |⟨C_y, VV^H⟩ − ⟨C_y, ṼṼ^H⟩| can be upper bounded by

|⟨C_y, VV^H⟩ − ⟨C*_y, VV^H⟩| + |⟨C*_y, VV^H⟩ − ⟨C_y, ṼṼ^H⟩|.

Using Proposition 11, the first term is less than ε√(pM). For the second term, note that 0 ≤ ⟨C_y, ṼṼ^H⟩ ≤ ⟨C_y, VV^H⟩, because from the SVD, V is the best p-dim beamformer for C_y. Similarly, we have 0 ≤ ⟨C*_y, VV^H⟩ ≤ ⟨C*_y, ṼṼ^H⟩. Consequently, we can always make |⟨C*_y, VV^H⟩ − ⟨C_y, ṼṼ^H⟩| larger by either changing V into Ṽ or vice versa. This implies that |⟨C*_y, VV^H⟩ − ⟨C_y, ṼṼ^H⟩| is always smaller than the maximum of |⟨C*_y, VV^H⟩ − ⟨C_y, VV^H⟩| and |⟨C*_y, ṼṼ^H⟩ − ⟨C_y, ṼṼ^H⟩|, where from Proposition 11 both terms are smaller than ε√(pM). Combining with the first upper bound, we have

|⟨C_y, VV^H⟩ − ⟨C_y, ṼṼ^H⟩| ≤ 2 ε √(pM),   (52)

which is the desired result.

Remark.
Notice that V is the optimal p-dim beamformer for S (or, equivalently, for C_y). Proposition 12 implies that the optimal beamformer for the estimated covariance matrix C*_y is 2ε√(pM)-optimal for S. ♦

We also need the following result.
Proposition 13. Let C*_y and C_y be as before. Then

E[ ||C*_y − C_y||²_B ] ≤ 2 M σ⁴ (1 + snr)² / T,   (53)

which also implies E[ ||C*_y − C_y||_B ] ≤ √2 σ² √M (1 + snr) / √T. □

Proof:
First notice that we have

E[ ||C*_y − C_y||²_B ] ≤(a) E[ ||Ĉ_y − C_y||²_B ]   (54)
= Σ_{k=0}^{M−1} c_k E[ |[ĝ]_k − [g]_k|² ]   (55)
= Σ_{k=0}^{M−1} c_k Var[[ĝ]_k] =(b) ( σ⁴ (1 + snr)² / T ) Σ_{k=0}^{M−1} c_k   (56)
= ( m(m+1)/2 ) σ⁴ (1 + snr)² / T ≈ 2 M σ⁴ (1 + snr)² / T,   (57)

where in (a) we used the inequality proved in Proposition 10 with C′ = C_y, where [g]_k are as in (43), where c_k denotes the covering number of k ∈ [M] by the coprime sampling set D with m = |D| = O(2√M), and where (b) results from Proposition 9. The other result simply follows from the inequality ( E[ ||C*_y − C_y||_B ] )² ≤ E[ ||C*_y − C_y||²_B ].

Proof of Theorem 1:
Using the definition of Γ_p and taking the expectation, we obtain

E[Γ_p] = 1 − E[ ( ⟨S, VV^H⟩ − ⟨S, ṼṼ^H⟩ ) / ⟨S, VV^H⟩ ]   (58)
=(a) 1 − E[ |⟨S, ṼṼ^H⟩ − ⟨S, VV^H⟩| / ⟨S, VV^H⟩ ]   (59)
=(b) 1 − E[ |⟨C_y, ṼṼ^H⟩ − ⟨C_y, VV^H⟩| ] / ( η_p Tr(S) )   (60)
≥(c) 1 − 2 √(pM) E[ ||C*_y − C_y||_B ] / ( η_p M [f]_0 )   (61)
≥(d) 1 − 2√2 σ² √(pM) √M (1 + snr) / ( η_p M [f]_0 √T )   (62)
= 1 − ( 2√2 √p / ( η_p √T ) ) ( 1 + 1/snr ),   (63)

where in (a) we used ⟨S, ṼṼ^H⟩ ≤ ⟨S, VV^H⟩, in (b) we used C_y = S + σ² I_M and Tr(VV^H) = Tr(ṼṼ^H), in (c) we used Proposition 12, and finally in (d) we used Proposition 13. As Γ_p ≥ 0, this implies that

E[Γ_p] ≥ max{ 1 − ( 2√2 √p / ( η_p √T ) ) ( 1 + 1/snr ), 0 }.

In a similar way, we obtain

Var[Γ_p] = Var[ |⟨S, ṼṼ^H⟩ − ⟨S, VV^H⟩| / ⟨S, VV^H⟩ ]
= Var[ |⟨C_y, ṼṼ^H⟩ − ⟨C_y, VV^H⟩| ] / ( η_p Tr(S) )²
≤ E[ |⟨C_y, ṼṼ^H⟩ − ⟨C_y, VV^H⟩|² ] / ( η_p Tr(S) )²
≤ E[ ( 2 √(pM) ||C*_y − C_y||_B )² ] / ( η_p Tr(S) )²
= 4 p M E[ ||C*_y − C_y||²_B ] / ( η_p M [f]_0 )²
=(a) 8 p M² σ⁴ (1 + snr)² / ( T ( η_p M [f]_0 )² )
= ( 8 p / ( T η_p² ) ) ( 1 + 1/snr )²,   (64)

where in (a) we used Proposition 13 and the fact that snr = [f]_0 / σ². This completes the proof.

Proof of Theorem 2: Let f be the vector of Fourier coefficients of γ, and let f̂ be its estimate as in Section IV-D.
Let S* be the Hermitian Toeplitz matrix obtained from the optimization (28), and let us denote its first column by f*. Note that Tr(S*) = M [f*]_0. We suppose that the parameter ξ is tuned large enough that the true f is feasible. Also, from the optimality of f*, it results that [f*]_0 ≤ [f]_0. Thus, we have both inequalities

||f* − f̂|| ≤ ξ √(M/T) ( σ² + [f*]_0 ) ≤ ξ √(M/T) ( σ² + [f]_0 ),   (65)
||f − f̂|| ≤ ξ √(M/T) ( σ² + [f]_0 ).   (66)

Applying the triangle inequality, we have ||f − f*|| ≤ 2 ξ √(M/T) σ² (1 + snr), where as before snr = [f]_0 / σ². Using the Toeplitz property, we obtain that

||S − S*|| ≤ √M ||f − f*|| ≤ 2 ξ M σ² (1 + snr) / √T.   (67)

Let V and Ṽ be the M × p matrices whose columns span the dominant p-dim signal subspaces of S and S*, respectively. Using a result similar to Propositions 11 and 12, we obtain

|⟨S, VV^H⟩ − ⟨S, ṼṼ^H⟩| ≤ 2 √p ||S − S*||   (68)
≤ 4 √p ξ M σ² (1 + snr) / √T.   (69)

Dividing both sides by ⟨S, VV^H⟩, using the definition of Γ_p in (12) and the definition of η_p in (11), and using the fact that Γ_p ∈ [0, 1], we have

Γ_p ≥ 1 − ( 4 √p ξ / ( η_p √T ) ) ( 1 + 1/snr ).   (70)

This completes the proof.

APPENDIX C
PROOFS OF THE PROPOSITIONS
A. Proof of Proposition 4
Note that B S̃ B^H is a PSD matrix. We prove the following more general statement: for any m × m Hermitian matrix H for which I_m + H is PSD, we have logdet(I_m + H) = Tr(H) + o(Tr(H)). Let U Λ U^H be the SVD of H. Then,

logdet(I_m + H) = logdet(I_m + U Λ U^H) = logdet(I_m + Λ) = Σ_{ℓ=1}^{m} log(1 + λ_ℓ) = Σ_{ℓ=1}^{m} λ_ℓ + o(Tr(H)) = Tr(H) + o(Tr(H)).   (71)

In particular, from the concavity of the logarithm, it results that log(1 + λ_ℓ) ≤ λ_ℓ. This implies that logdet(I_m + H) ≤ Tr(H). This completes the proof.

B. Proof of Proposition 5
Let (S̃*, W*) be the output of the SDP (18), and let S̃_opt be the AML estimate given by the optimization S̃_opt = argmin_{S̃ ∈ T_+} L_app(S̃). Let us define H_opt := I_m + B S̃_opt B^H and W_opt := Δ̃^H H_opt^{−1} Δ̃. Using the well-known Schur complement condition for positive semi-definiteness (see [49], p. 28), it results that

[ H_opt   Δ̃ ;  Δ̃^H   W_opt ] ⪰ 0,

which implies that (S̃_opt, W_opt) satisfies the SDP constraint in (18). Hence,

L_app(S̃_opt) = Tr(B S̃_opt B^H) + Tr( Ĉ_x̃ (I_m + B S̃_opt B^H)^{−1} )
=(a) Tr(B S̃_opt B^H) + Tr( Δ̃^H H_opt^{−1} Δ̃ )
= Tr(B S̃_opt B^H) + Tr(W_opt)
≥ Tr(B S̃* B^H) + Tr(W*)
≥(b) Tr(B S̃* B^H) + Tr( Δ̃^H (I_m + B S̃* B^H)^{−1} Δ̃ )
= L_app(S̃*),   (72)

where in (a) we used Ĉ_x̃ = Δ̃ Δ̃^H, and where in (b) we applied the Schur complement condition to the SDP constraint to obtain W* ⪰ Δ̃^H (I_m + B S̃* B^H)^{−1} Δ̃, and took the trace of both sides to obtain the desired inequality. As S̃_opt is the AML estimate, we also have L_app(S̃_opt) ≤ L_app(S̃*), which together with (72) completes the proof.

REFERENCES

[1] T. L. Marzetta, "Noncooperative cellular wireless with unlimited numbers of base station antennas,"
IEEE Trans. Wireless Commun., vol. 9, no. 11, pp. 3590-3600, Nov. 2010.
[2] H. Huh, G. Caire, H. Papadopoulos, and S. Ramprashad, "Achieving massive MIMO spectral efficiency with a not-so-large number of antennas," IEEE Trans. Wireless Commun., vol. 11, no. 9, pp. 3226-3239, 2012.
[3] J. Hoydis, S. Ten Brink, and M. Debbah, "Massive MIMO in the UL/DL of cellular networks: How many antennas do we need?" IEEE J. Sel. Areas Commun., vol. 31, no. 2, pp. 160-171, 2013.
[4] E. Larsson, O. Edfors, F. Tufvesson, and T. Marzetta, "Massive MIMO for next generation wireless systems," IEEE Commun. Mag., vol. 52, no. 2, pp. 186-195, 2014.
[5] C. Shepard, H. Yu, N. Anand, E. Li, T. Marzetta, R. Yang, and L. Zhong, "Argos: Practical many-antenna base stations," in Proc. 18th Annual Int. Conf. on Mobile Computing and Networking. ACM, 2012, pp. 53-64.
[6] R. Rogalin, O. Y. Bursalioglu, H. Papadopoulos, G. Caire, A. F. Molisch, A. Michaloliakos, V. Balan, and K. Psounis, "Scalable synchronization and reciprocity calibration for distributed multiuser MIMO," IEEE Trans. Wireless Commun., vol. 13, no. 4, pp. 1815-1831, 2014.
[7] A. Adhikary, J. Nam, J.-Y. Ahn, and G. Caire, "Joint spatial division and multiplexing: The large-scale array regime," IEEE Trans. Inform. Theory, vol. 59, no. 10, pp. 6441-6463, 2013.
[8] J. Nam, A. Adhikary, J.-Y. Ahn, and G. Caire, "Joint spatial division and multiplexing: Opportunistic beamforming, user grouping and simplified downlink scheduling," IEEE J. Sel. Topics Signal Process., vol. 8, no. 5, pp. 876-890, 2014.
[9] A. Adhikary, E. Al Safadi, M. K. Samimi, R. Wang, G. Caire, T. S. Rappaport, and A. F. Molisch, "Joint spatial division and multiplexing for mm-wave channels," IEEE J. Sel. Areas Commun., vol. 32, no. 6, pp. 1239-1255, 2014.
[10] A. Adhikary, H. S. Dhillon, and G. Caire, "Massive-MIMO meets HetNet: Interference coordination through spatial blanking," IEEE J. Sel. Areas Commun., 2014.
[11] ——, "Spatial blanking and inter-tier coordination in massive-MIMO heterogeneous cellular networks," in Proc. IEEE Globecom Workshops, 2014, pp. 1229-1234.
[12] J. Nam, G. Caire, Y. Ko, and J. Ha, "On the role of transmit correlation diversity in multiuser MIMO systems," CoRR, vol. abs/1505.02896, 2015. [Online]. Available: http://arxiv.org/abs/1505.02896
[13] P. P. Vaidyanathan and P. Pal, "Sparse sensing with co-prime samplers and arrays," IEEE Trans. Signal Process., vol. 59, no. 2, pp. 573-586, 2011.
[14] P. Vaidyanathan and P. Pal, "Theory of sparse coprime sensing in multiple dimensions," IEEE Trans. Signal Process., vol. 59, no. 8, pp. 3592-3608, 2011.
[15] Y. Chi, Y. C. Eldar, and R. Calderbank, "PETRELS: Parallel subspace estimation and tracking by recursive least squares from partial observations," IEEE Trans. Signal Process., vol. 61, no. 23, pp. 5947-5959, 2013.
[16] L. Balzano, R. Nowak, and B. Recht, "Online identification and tracking of subspaces from highly incomplete information," in Proc. 48th Annual Allerton Conf. on Communication, Control, and Computing. IEEE, 2010, pp. 704-711.
[17] E. J. Candès and B. Recht, "Exact matrix completion via convex optimization," Foundations of Computational Mathematics, vol. 9, no. 6, pp. 717-772, 2009.
[18] R. Roy and T. Kailath, "ESPRIT-estimation of signal parameters via rotational invariance techniques," IEEE Trans. Acoust., Speech, Signal Process., vol. 37, no. 7, pp. 984-995, 1989.
[19] R. O. Schmidt, "Multiple emitter location and signal parameter estimation," IEEE Trans. Antennas Propag., vol. 34, no. 3, pp. 276-280, 1986.
[20] Y. Chi, L. L. Scharf, A. Pezeshki et al., "Sensitivity to basis mismatch in compressed sensing," IEEE Trans. Signal Process., vol. 59, no. 5, pp. 2182-2195, 2011.
[21] W. U. Bajwa, J. Haupt, A. M. Sayeed, and R. Nowak, "Compressed channel sensing: A new approach to estimating sparse multipath channels," Proc. IEEE, vol. 98, no. 6, pp. 1058-1076, 2010.
[22] R. Baraniuk and P. Steeghs, "Compressive radar imaging," in Proc. IEEE Radar Conf., 2007, pp. 128-133.
[23] M. F. Duarte and R. G. Baraniuk, "Spectral compressive sensing," Applied and Computational Harmonic Analysis, vol. 35, no. 1, pp. 111-129, 2013.
[24] A. C. Fannjiang, T. Strohmer, and P. Yan, "Compressed remote sensing of sparse objects," SIAM Journal on Imaging Sciences, vol. 3, no. 3, pp. 595-618, 2010.
[25] M. Herman, T. Strohmer et al., "High-resolution radar via compressed sensing," IEEE Trans. Signal Process., vol. 57, no. 6, pp. 2275-2284, 2009.
[26] D. Malioutov, M. Çetin, and A. S. Willsky, "A sparse signal reconstruction perspective for source localization with sensor arrays," IEEE Trans. Signal Process., vol. 53, no. 8, pp. 3010-3022, 2005.
[27] S. Kunis and H. Rauhut, "Random sampling of sparse trigonometric polynomials, II. Orthogonal matching pursuit versus basis pursuit," Foundations of Computational Mathematics, vol. 8, no. 6, pp. 737-763, 2008.
[28] P. Stoica and P. Babu, "SPICE and LIKES: Two hyperparameter-free methods for sparse-parameter estimation," Signal Processing, vol. 92, no. 7, pp. 1580-1590, 2012.
[29] P. Stoica, P. Babu, and J. Li, "New method of sparse parameter estimation in separable models and its use for spectral analysis of irregularly sampled data," IEEE Trans. Signal Process., vol. 59, no. 1, pp. 35-47, 2011.
[30] E. J. Candès and C. Fernandez-Granda, "Towards a mathematical theory of super-resolution," Communications on Pure and Applied Mathematics, vol. 67, no. 6, pp. 906-956, 2014.
[31] ——, "Super-resolution from noisy data," Journal of Fourier Analysis and Applications, vol. 19, no. 6, pp. 1229-1254, 2013.
[32] Z. Tan, Y. C. Eldar, and A. Nehorai, "Direction of arrival estimation using co-prime arrays: A super resolution viewpoint," IEEE Trans. Signal Process., vol. 62, no. 21, pp. 5565-5576, 2014.
[33] M. Toeltsch, J. Laurila, K. Kalliola, A. F. Molisch, P. Vainikainen, and E. Bonek, "Statistical characterization of urban spatial radio channels," IEEE J. Sel. Areas Commun., vol. 20, no. 3, pp. 539-549, 2002.
[34] H. Asplund, A. A. Glazunov, A. F. Molisch, K. Pedersen, M. Steinbauer et al., "The COST 259 directional channel model-part II: Macrocells," IEEE Trans. Wireless Commun., vol. 5, no. 12, pp. 3434-3450, 2006.
[35] J. A. Tropp, A. C. Gilbert, and M. J. Strauss, "Algorithms for simultaneous sparse approximation. Part I: Greedy pursuit," Signal Processing, vol. 86, no. 3, pp. 572-588, 2006.
[36] J. A. Tropp, "Algorithms for simultaneous sparse approximation. Part II: Convex relaxation," Signal Processing, vol. 86, no. 3, pp. 589-602, 2006.
[37] K. Lee, Y. Bresler, and M. Junge, "Subspace methods for joint sparse recovery," IEEE Trans. Inform. Theory, vol. 58, no. 6, pp. 3613-3641, 2012.
[38] J. M. Kim, O. K. Lee, and J. C. Ye, "Compressive MUSIC: Revisiting the link between compressive sensing and array signal processing," IEEE Trans. Inform. Theory, vol. 58, no. 1, pp. 278-301, 2012.
[39] M. E. Davies and Y. C. Eldar, "Rank awareness in joint sparse recovery," IEEE Trans. Inform. Theory, vol. 58, no. 2, pp. 1135-1146, 2012.
[40] M. Mishali and Y. C. Eldar, "Reduce and boost: Recovering arbitrary sets of jointly sparse vectors," IEEE Trans. Signal Process., vol. 56, no. 10, pp. 4692-4702, 2008.
[41] G. Tang, B. N. Bhaskar, P. Shah, and B. Recht, "Compressed sensing off the grid," IEEE Trans. Inform. Theory, vol. 59, no. 11, pp. 7465-7490, 2013.
[42] Y. Li and Y. Chi, "Off-the-grid line spectrum denoising and estimation with multiple measurement vectors," arXiv preprint arXiv:1408.2242, 2014.
[43] Z. Yang and L. Xie, "Exact joint sparse frequency recovery via optimization methods," arXiv preprint arXiv:1405.6585, 2014.
[44] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004.
[45] J. Neyman, Su un teorema concernente le cosiddette statistiche sufficienti. Istituto Italiano degli Attuari, 1936.
[46] A. L. Yuille and A. Rangarajan, "The concave-convex procedure," Neural Computation, vol. 15, no. 4, pp. 915-936, 2003.
[47] M. Grant, S. Boyd, and Y. Ye, "CVX: Matlab software for disciplined convex programming," 2008.
[48] R. Price, "A useful theorem for nonlinear devices having Gaussian inputs," IRE Trans. Inform. Theory, vol. 4, no. 2, pp. 69-72, 1958.
[49] S. P. Boyd, L. El Ghaoui, E. Feron, and V. Balakrishnan,