SiML: Sieved Maximum Likelihood for Array Signal Processing
Matthieu Simeoni∗, Paul Hurley†

∗École Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland
†Western Sydney University (WSU), NSW-2150 Sydney, Australia

©Copyright 2021 IEEE. Published in ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), scheduled for 6-11 June 2021 in Toronto, Ontario, Canada. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works, must be obtained from the IEEE. Contact: Manager, Copyrights and Permissions / IEEE Service Center / 445 Hoes Lane / P.O. Box 1331 / Piscataway, NJ 08855-1331, USA. Telephone: + Intl. 908-562-3966.
ABSTRACT
Stochastic Maximum Likelihood (SML) is a popular direction of arrival (DOA) estimation technique in array signal processing. It is a parametric method that jointly estimates signal and instrument noise by maximum likelihood, achieving excellent statistical performance. Some drawbacks are the computational overhead as well as the limitation to a point-source data model with fewer sources than sensors. In this work, we propose a Sieved Maximum Likelihood (SiML) method. It uses a general functional data model, allowing an unrestricted number of arbitrarily-shaped sources to be recovered. To this end, we leverage functional analysis tools and express the data in terms of an infinite-dimensional sampling operator acting on a Gaussian random function. We show that SiML is computationally more efficient than traditional SML, resilient to noise, and results in much better accuracy than spectral-based methods.
Index Terms — stochastic maximum likelihood, sieved maximum likelihood, spatially extended sources, random fields, sampling operator, array signal processing.
1. INTRODUCTION
Array signal processing [1, 2, 3] is primarily concerned with the sensing, processing and estimation of random wavefields (electromagnetic or mechanical). Techniques from array signal processing are used in a myriad of applications, including for example acoustics [4, 5], radio-interferometry [6, 7], radar and sonar systems [2, 8], wireless networks [9, 10, 11], and medical imagery [12, 13]. The sensing devices in all those applications consist of large networks of sensors, called sensor arrays or phased arrays.

A common task in array signal processing consists of estimating the intensity field of an emitting wavefield. The various algorithms available in the literature for this purpose divide into two categories [1]: spectral-based and parametric methods. Spectral-based ones estimate the intensity field by "steering" the array towards particular directions in space and evaluating the output power. The intensity field is thus recovered sequentially by scanning via beamforming [1, 8, 11] a grid covering the field of view. Famous beamformers include Bartlett, also known as Matched Beamforming (MB), and Capon, which is often called Minimum Variance Distortionless Response (MVDR) [1]. These are extremely simple to implement, computationally attractive, and quite generic, with little structural or distributional assumptions on the sensed wavefield. They are
Part of this work was carried out at the IBM Zurich Research Laboratory, where both authors previously worked. The work of Matthieu Simeoni was supported by the Swiss National Science Foundation under Grant 200 021181 978/1, "SESAM - Sensing and Sampling: Theory and Algorithms".

however limited in terms of accuracy, in particular for extreme acquisition conditions such as small sample size, low signal-to-noise ratio and/or spatial correlation in the wavefield [1].

Parametric methods, on the other hand, attempt to overcome those limitations using a statistical model for the instrument noise and the unknown wavefield. Typically, the thermal noise at each sensor is modelled as an additive Gaussian white random process, while the wavefield is assumed to be the incoherent sum of Q point sources with amplitudes distributed according to a multivariate complex Gaussian distribution, where Q is strictly smaller than the total number of sensors composing the array. In this context, the problem of estimating the intensity field is generally referred to as a direction of arrival (DOA) estimation problem [1]. By thus specifying the data model, parametric methods can achieve much better recovery performance, both theoretically and in practice [1]. Stochastic Maximum Likelihood (SML) [1, 14, 15] is perhaps the most well-known parametric method. It uses explicit maximum likelihood expressions [16] to estimate the various parameters involved in the traditional point-source model [1, 14], namely the noise and source powers as well as the directions of arrival. Each parameter is estimated consistently, and the general theory of maximum likelihood guarantees efficient estimation as the number of samples grows to infinity. These strong theoretical guarantees and excellent empirical performance come however at the cost of very intense computation [1]. Moreover, the data model assumptions restrict its application to point sources, of which there must be fewer than the number of sensors, preventing its use in many applications. This is particularly true for radio-astronomy [6, 7], where the number of sources is typically far larger than the number of antennas forming the array. Moreover, the increased resolution of modern radio-interferometers [17, 18] permits celestial sources with spatial extent and complex shapes to be resolved, for which a point-source model is overly simplistic.

The present work, SiML, takes a maximum likelihood approach based on a more general functional data model, which, in particular, allows potentially correlated sources with spatial extent and arbitrary shapes to be recovered. To this end, we leverage functional analysis tools [19, 20] and formulate the data in terms of an infinite-dimensional sampling operator [19] acting on the wavefield's amplitude function, modelled as a complex Gaussian random function [21]. Based on this data model, we derive a joint maximum likelihood estimate for both the covariance kernel of this random function and the sensor noise power. As the optimisation problem elicits many solutions, we deploy the method of sieves [22, 23] as a means of restricting the optimisation problem to a lower dimensional subspace. For identifiability, we show that this subspace must have a smaller dimension than the total number of sensors in the array. A suitable subspace dimension is obtained by trading off between likelihood improvement and model complexity through the Bayesian Information Criterion (BIC) [24]. Simulations reveal that the subspace dimension acts as a regularisation parameter, increasing or decreasing with the SNR. For a known noise level, the resulting estimate of the covariance field is shown to be an unbiased, consistent and asymptotically efficient estimate of a particular oblique projection of the true covariance field. The method is computationally far more efficient than traditional SML and resilient to noise. Finally, we demonstrate by simulation that SiML obtains much better accuracy and contrast than spectral-based methods.
2. A FUNCTIONAL DATA MODEL
To allow for the handling of very general sources, we introduce in this section a functional data model. Leveraging tools from functional analysis [19, 25], the sensing device can be modelled as an infinite-dimensional sampling operator acting on an unknown random amplitude function (see [21] for an introduction to random functions). We first investigate the population version of the data model, where the covariance matrix of the instrument recordings is known, before presenting its empirical counterpart, where the covariance matrix is estimated from i.i.d. observations. More details on the modelling assumptions can be found in [1, 14, 26, 3].
Consider an array of $L$ sensors with positions $\{p_1, \ldots, p_L\} \subset \mathbb{R}^3$. Assuming the emitting sources are in the far-field [3] of the sensor array, they can be thought of as lying on the unit sphere $\mathbb{S}^2$. To allow for arbitrary numbers of complex sources, we consider a notional continuous source field covering the entire sphere, with an associated amplitude function, describing for each direction $r \in \mathbb{S}^2$ the emission strength of the source field. In practice, source amplitudes fluctuate randomly [1, 14], and the amplitude function can be modelled as a complex random function $S = \{S(r) : \Omega \to \mathbb{C}, \ r \in \mathbb{S}^2\}$, where $\Omega$ is some probability space. More precisely, we assume $S$ to be a Gaussian random function [21], i.e., that all its finite marginals have distribution:

$$[S(r_1), \cdots, S(r_n)] \overset{d}{\sim} \mathbb{CN}_n(0, K_{I_n}), \quad \forall I_n \subset \mathbb{S}^2, \ \forall n \in \mathbb{N},$$

where $\mathbb{CN}_n$ denotes the $n$-variate centrally symmetric, complex Gaussian distribution [27, 28], $I_n := \{r_1, \ldots, r_n\}$ and $K_{I_n} \in \mathbb{C}^{n \times n}$ is some valid covariance matrix depending on the set $I_n$.

From the Huygens-Fresnel principle [29], exciting the source field with a narrowband waveform of wavelength $\lambda > 0$ results in a diffracted wavefront, which, after travelling through an assumed homogeneous medium, is recorded by the sensor array. In a far-field context, the Fraunhofer equation [29, 3] permits this wavefront at each sensor position $p_i \in \mathbb{R}^3$ to be approximated by:

$$Y(p_i) = \int_{\mathbb{S}^2} S(r) \exp\left(-j \frac{2\pi}{\lambda} \langle r, p_i \rangle\right) dr + n_i, \qquad (1)$$

where $i = 1, \ldots, L$, and $n = [n_1, \ldots, n_L]$ is an additive white noise term capturing the inaccuracies in measurement of each sensor, distributed as [1] $n \overset{d}{\sim} \mathbb{CN}_L(0, \sigma^2 I_L)$, $\sigma^2 > 0$.
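As a concrete illustration, Eq. (1) can be discretised with a quadrature rule on the sphere. The sketch below simulates one noisy measurement vector for a randomly placed array; the sensor layout, wavelength, and white amplitude function are all made-up assumptions for demonstration only.

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 1.0                      # wavelength (arbitrary units, hypothetical)
L = 16                         # number of sensors

# Hypothetical sensor positions in 3-D
p = rng.uniform(-2, 2, size=(L, 3))

# Crude Monte-Carlo quadrature on the unit sphere: n directions r_k, weights w_k
n = 500
r = rng.normal(size=(n, 3))
r /= np.linalg.norm(r, axis=1, keepdims=True)   # points on the sphere
w = np.full(n, 4 * np.pi / n)                   # equal quadrature weights

# Discretised sampling operator Phi^*: row i evaluates the integral of Eq. (1),
# i.e. the kernel exp(-j 2*pi <r, p_i> / lambda) times the quadrature weights
Phi_star = np.exp(-2j * np.pi * (r @ p.T) / lam).T * w   # shape (L, n)

# One realisation of a white complex Gaussian amplitude function on the grid
S = (rng.normal(size=n) + 1j * rng.normal(size=n)) / np.sqrt(2)

# Additive circularly symmetric sensor noise with power sigma^2
sigma2 = 0.1
noise = np.sqrt(sigma2 / 2) * (rng.normal(size=L) + 1j * rng.normal(size=L))

Y = Phi_star @ S + noise       # measurement vector of Eq. (1)
print(Y.shape)
```

The quadrature grid and weights stand in for the integral over the sphere; any spherical quadrature scheme could be substituted.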
Noise across sensors is assumed to be identically and independently distributed, and independent of the random amplitude function $S$. We assume that every realisation, or sample function [21], $s_\omega : \mathbb{S}^2 \to \mathbb{C}$ of $S$ is an element of some Hilbert space $\mathcal{H} = L^2(\mathbb{S}^2, \mathbb{C})$ of finite-energy functions, and thus Eq. (1) can be written as:

$$Y(p_i) = \langle S, \phi_i \rangle + n_i, \quad i = 1, \ldots, L,$$

where $\phi_i(r) := \exp(j 2\pi \langle r, p_i \rangle / \lambda)$, which in turn can be re-written more compactly using an analysis operator [19] $\Phi^* : \mathcal{H} \to \mathbb{C}^L$, mapping an element of $\mathcal{H}$ to a finite number $L$ of measurements:

$$Y = \begin{bmatrix} Y(p_1) \\ \vdots \\ Y(p_L) \end{bmatrix} = \begin{bmatrix} \langle S, \phi_1 \rangle \\ \vdots \\ \langle S, \phi_L \rangle \end{bmatrix} + \begin{bmatrix} n_1 \\ \vdots \\ n_L \end{bmatrix} = \Phi^* S + n.$$

We call $\Phi^*$ the sampling operator [19] associated with the sensor array. As the sum of two independent centred complex Gaussian random vectors, the vector of measurements $Y$ is also a centred complex Gaussian random vector with covariance matrix $\Sigma \in \mathbb{C}^{L \times L}$:

$$(\Sigma)_{ij} = \iint_{\mathbb{S}^2 \times \mathbb{S}^2} \kappa(r, \rho)\, \phi_i^*(r) \phi_j(\rho)\, dr\, d\rho + \sigma^2 \delta_{ij}, \qquad (2)$$

where $i, j = 1, \ldots, L$, $\delta_{ij}$ denotes the Kronecker delta and $\kappa : \mathbb{S}^2 \times \mathbb{S}^2 \to \mathbb{C}$ is the covariance kernel [21] of $S$:

$$\kappa(r, \rho) := \mathbb{E}\left[S(r) S^*(\rho)\right], \quad r, \rho \in \mathbb{S}^2.$$

Introducing the associated covariance operator $T_\kappa : \mathcal{H} \to \mathcal{H}$:

$$(T_\kappa f)(r) := \int_{\mathbb{S}^2} \kappa(r, \rho) f(\rho)\, d\rho, \quad f \in \mathcal{H}, \ r \in \mathbb{S}^2,$$

we can again reformulate Eq. (2) in terms of the sampling operator $\Phi^*$ and its adjoint, called the synthesis operator [19], $\Phi : \mathbb{C}^L \to \mathcal{H}$:

$$\Sigma = \Phi^* T_\kappa \Phi + \sigma^2 I_L. \qquad (3)$$

By analogy with the finite dimensional case [30], it is customary to write $\kappa = \mathrm{vec}(T_\kappa)$, where the $\mathrm{vec}(\cdot)$ operator maps an infinite-dimensional linear operator onto its associated kernel representation. Because of the Gaussianity assumption, the covariance kernel $\kappa$ (or equivalently the covariance operator $T_\kappa$) completely determines the distribution of $S$. Our goal is hence to leverage Eq.
(3) in order to form an estimate of $\kappa$ from the covariance matrix $\Sigma$ of the instrument recordings. Often the source field is assumed to be spatially uncorrelated, in which case the random function $S$ is Gaussian white noise [21] and $\kappa$ becomes diagonal. The diagonal part of $\kappa$,

$$I(r) := \kappa(r, r), \quad r \in \mathbb{S}^2,$$

is called the intensity function of the source field, of crucial interest in many array signal processing applications.

In practice of course, the covariance matrix $\Sigma$ needs to be estimated from a finite number of i.i.d. observations of $Y$, say $N$. Typically, the maximum likelihood estimate of $\Sigma$ is formed by $\hat{\Sigma} = \frac{1}{N}\sum_{i=1}^{N} y_i y_i^H$. It follows an $L$-variate complex Wishart distribution [27, 31] with $N$ degrees of freedom and mean $\Sigma$:

$$N \hat{\Sigma} \overset{d}{\sim} \mathbb{CW}_L(N, \Sigma). \qquad (4)$$

The density of a complex Wishart distribution can be found in [31]. In the next section, we use it to form the likelihood function of the data $\hat{\Sigma}$ and derive maximum likelihood estimates of the covariance kernel $\kappa$ and the noise level $\sigma^2$.

3. SIEVED MAXIMUM LIKELIHOOD

We now take the population and empirical data models Eqs. (3) and (4) and derive maximum likelihood estimates for $\kappa$ and $\sigma^2$. The simpler case of known noise power, which allows for an insightful geometric interpretation of the maximum likelihood estimate in terms of projection operators, is presented first. That is then followed by the more general case given an unknown noise level.

The log-likelihood function for $\kappa$ and $\sigma^2$ given the sufficient statistic $\hat{\Sigma}$ [32] can be written in terms of the density function of the complex Wishart distribution [31]:

$$\ell\left(\kappa, \sigma^2 \,\middle|\, \hat{\Sigma}\right) = -\mathrm{Tr}\left[\left(\Phi^* T_\kappa \Phi + \sigma^2 I_L\right)^{-1} \hat{\Sigma}\right] - \log\left|\Phi^* T_\kappa \Phi + \sigma^2 I_L\right|, \qquad (5)$$

where the terms independent of $\kappa$ and $\sigma^2$ have been dropped. As $\sigma^2 > 0$, it is guaranteed that the matrix $\Phi^* T_\kappa \Phi + \sigma^2 I_L$ is invertible and that the log-likelihood function is hence well-defined.
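The concentration of the sample covariance around its mean, as implied by Eq. (4), is easy to check numerically. The sketch below uses a hypothetical ground-truth $\Sigma$, draws $N$ i.i.d. circularly symmetric Gaussian snapshots, and forms the maximum likelihood estimate $\hat{\Sigma}$.

```python
import numpy as np

rng = np.random.default_rng(2)
L, N = 6, 5000

# A hypothetical Hermitian positive-definite ground-truth covariance Sigma
A = rng.normal(size=(L, L)) + 1j * rng.normal(size=(L, L))
Sigma = A @ A.conj().T + L * np.eye(L)

# Draw N i.i.d. snapshots y_i ~ CN_L(0, Sigma) via a Cholesky factor
C = np.linalg.cholesky(Sigma)
z = (rng.normal(size=(L, N)) + 1j * rng.normal(size=(L, N))) / np.sqrt(2)
y = C @ z

# ML estimate of Sigma: (1/N) * sum_i y_i y_i^H ; N * Sigma_hat ~ CW_L(N, Sigma)
Sigma_hat = (y @ y.conj().T) / N

# Relative estimation error shrinks as N grows
rel_err = np.linalg.norm(Sigma_hat - Sigma) / np.linalg.norm(Sigma)
print(rel_err)
```

With $N = 5000$ snapshots and $L = 6$ sensors, the relative Frobenius error is a few percent, consistent with the $O(1/\sqrt{N})$ Wishart fluctuations.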
Maximum likelihood estimates for $\kappa$ and $\sigma^2$ are then obtained by maximising Eq. (5) with respect to $\kappa \in L^2(\mathbb{S}^2 \times \mathbb{S}^2)$ and $\sigma^2 > 0$. Since the sampling operator $\Phi^*$ has finite rank and consequently a non-trivial kernel, the log-likelihood function admits infinitely many local maxima. Indeed, for $f \in \mathcal{N}(\Phi^*)$, adding a kernel of the form $\bar{f} \otimes f$ to $T_\kappa$ in (5) does not change the value of the log-likelihood function. We thus choose to impose a unique maximum by restricting the search space for $\kappa$ to a lower dimensional subspace, and look for solutions in the range of some synthesis operator $\bar{\Psi} \otimes \Psi$, which will be specified in Section 3.4:

$$\kappa = \left(\bar{\Psi} \otimes \Psi\right) \mathrm{vec}(R) = \sum_{i,j=1}^{M} R_{ij}\, \bar{\psi}_j \otimes \psi_i \quad \Leftrightarrow \quad T_\kappa = \Psi R \Psi^*,$$

where $R \in \mathbb{C}^{M \times M}$ is a Hermitian symmetric matrix and $\Psi^* : \mathcal{H} \to \mathbb{C}^M$, $\Psi : \mathbb{C}^M \to \mathcal{H}$ are the analysis and synthesis operators associated with the family of functions $\{\psi_1, \ldots, \psi_M\} \subset \mathcal{H}$. This regularisation of the likelihood problem by restricting the parameter space to a lower dimensional subspace is generally known as the method of sieves [22, 23]. The maximum likelihood estimates of $R$ and $\sigma^2$ are then given by minimising the negative log-likelihood:

$$\hat{R}, \hat{\sigma}^2 = \underset{R \in \mathbb{C}^{M \times M},\ \sigma^2 > 0}{\arg\min}\ \mathrm{Tr}\left[\left(G R G^H + \sigma^2 I_L\right)^{-1} \hat{\Sigma}\right] + \log\left|G R G^H + \sigma^2 I_L\right|, \qquad (6)$$

where $G = \Phi^* \Psi \in \mathbb{C}^{L \times M}$ is the so-called Gram matrix [19], given by $(G)_{ij} = \langle \psi_j, \phi_i \rangle$. For Eq. (6) to admit a unique solution, it is necessary to have at least as many measurements as unknowns. When the noise power is unknown a priori, this requires that $M < L$. When the noise power is known, there is one less unknown, leading to $M \leq L$. This is however not a sufficient condition for identifiability, and we must further assume $G$ to be of full column-rank. If the latter condition is verified, we say that the two families of functions $\{\phi_1, \ldots, \phi_L\}$ and $\{\psi_1, \ldots, \psi_M\}$ are coherent with one another.
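The objective of Eq. (6) is cheap to evaluate once the Gram matrix is in hand. A minimal sketch, with a hypothetical full-column-rank $G$ and sample covariance $\hat{\Sigma}$ standing in for real array data:

```python
import numpy as np

rng = np.random.default_rng(3)
L, M = 10, 4

# Hypothetical Gram matrix G = Phi^* Psi (full column rank with probability 1)
G = rng.normal(size=(L, M)) + 1j * rng.normal(size=(L, M))

# Hypothetical Hermitian positive-definite sample covariance
A = rng.normal(size=(L, L)) + 1j * rng.normal(size=(L, L))
Sigma_hat = A @ A.conj().T / L + np.eye(L)

def neg_log_likelihood(R, sigma2):
    """Objective of Eq. (6): Tr[(G R G^H + s I)^(-1) Sigma_hat] + log|G R G^H + s I|."""
    S = G @ R @ G.conj().T + sigma2 * np.eye(L)
    _, logdet = np.linalg.slogdet(S)          # log|S| for a Hermitian PD matrix
    return np.trace(np.linalg.solve(S, Sigma_hat)).real + logdet

R0 = np.eye(M, dtype=complex)                 # a candidate sieve coefficient matrix
print(neg_log_likelihood(R0, 0.5))
```

Using `solve` and `slogdet` avoids forming an explicit inverse, which matters once $L$ is in the hundreds as in the experiments below.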
Suppose the noise power $\sigma^2$ is known. Then $R$ becomes the only variable in Eq. (6), and a solution can easily be obtained by cancelling the derivative. This yields

$$\hat{R} = G^\dagger \tilde{\Sigma} \left(G^\dagger\right)^H = G^\dagger \left[\hat{\Sigma} - \sigma^2 I_L\right] \left(G^\dagger\right)^H,$$

where $G^\dagger \in \mathbb{C}^{M \times L}$ is the left pseudo-inverse [33] of $G$. Hence, when restricting the search space to $\mathcal{R}(\bar{\Psi} \otimes \Psi)$, the maximum likelihood estimate of $\kappa$ is given by

$$\hat{\kappa} = \sum_{i,j=1}^{M} \hat{R}_{ij}\, \bar{\psi}_j \otimes \psi_i = \left(\bar{\Psi} \otimes \Psi\right) \mathrm{vec}\left(G^\dagger \left[\hat{\Sigma} - \sigma^2 I_L\right] \left(G^\dagger\right)^H\right) = \left(\bar{\Psi} \otimes \Psi\right) \left[\bar{G}^\dagger \otimes G^\dagger\right] (\hat{\varsigma} - \sigma^2 \epsilon), \qquad (7)$$

with $\hat{\varsigma} = \mathrm{vec}(\hat{\Sigma}) \in \mathbb{C}^{L^2}$ and $\epsilon = \mathrm{vec}(I_L) \in \mathbb{C}^{L^2}$. The intensity function is then obtained by taking the diagonal part of $\hat{\kappa}$:

$$\hat{I}(r) = \sum_{i,j=1}^{M} \hat{R}_{ij}\, \psi_i(r) \bar{\psi}_j(r), \quad \forall r \in \mathbb{S}^2.$$

Using properties of the tensor product and the vec operator, we can re-write Eq. (3) as $\varsigma = \left(\bar{\Phi} \otimes \Phi\right)^* \kappa + \sigma^2 \epsilon$. Hence, since $\mathbb{E}[\hat{\varsigma}] = \varsigma$, Eq. (7) becomes in expectation

$$\mathbb{E}[\hat{\kappa}] = \left(\bar{\Psi} \otimes \Psi\right) \left[\bar{G}^\dagger \otimes G^\dagger\right] \left(\bar{\Phi} \otimes \Phi\right)^* \kappa.$$

For $M = L$, $G$ is invertible and $G^\dagger = G^{-1}$, making $(\bar{\Psi} \otimes \Psi)[\bar{G}^{-1} \otimes G^{-1}](\bar{\Phi} \otimes \Phi)^*$ an oblique projection operator [19]. The operator $(\bar{\Psi} \otimes \Psi)[\bar{G}^{-1} \otimes G^{-1}]$ is indeed a right-inverse of $(\bar{\Phi} \otimes \Phi)^*$:

$$\left(\bar{\Phi} \otimes \Phi\right)^* \left(\bar{\Psi} \otimes \Psi\right) \left[\bar{G}^{-1} \otimes G^{-1}\right] = \overline{\Phi^* \Psi G^{-1}} \otimes \Phi^* \Psi G^{-1} = I_{L^2}. \qquad (8)$$

In the specific case where $M = L$, the maximum likelihood estimate $\hat{\kappa}$ is hence an unbiased, consistent and asymptotically efficient estimator of the oblique projection of $\kappa$ onto $\mathcal{R}(\bar{\Psi} \otimes \Psi)$. When additionally setting $\Psi = \Phi$, the projection becomes orthogonal.

(The tensor product $\otimes$ is defined as $(\bar{f} \otimes f) g := \langle g, f \rangle f$, for all $f, g \in L^2(\mathbb{S}^2, \mathbb{C})$.)
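The known-noise closed form can be sanity-checked on the population model: plugging an exact $\Sigma = G R G^H + \sigma^2 I_L$ into the formula recovers $R$ exactly, since $G^\dagger G = I_M$. A sketch with hypothetical dimensions and a random full-column-rank $G$:

```python
import numpy as np

rng = np.random.default_rng(4)
L, M, sigma2 = 12, 5, 0.3

# Hypothetical Gram matrix (full column rank with probability 1)
G = rng.normal(size=(L, M)) + 1j * rng.normal(size=(L, M))

# Ground-truth Hermitian R and the corresponding population covariance (Eq. 3 form)
B = rng.normal(size=(M, M)) + 1j * rng.normal(size=(M, M))
R_true = B @ B.conj().T
Sigma = G @ R_true @ G.conj().T + sigma2 * np.eye(L)

# Known-noise closed form: R_hat = G^dagger (Sigma - sigma^2 I) (G^dagger)^H
G_pinv = np.linalg.pinv(G)            # left pseudo-inverse (G^H G)^(-1) G^H
R_hat = G_pinv @ (Sigma - sigma2 * np.eye(L)) @ G_pinv.conj().T

print(np.allclose(R_hat, R_true))
```

On real data $\Sigma$ is replaced by $\hat{\Sigma}$, so $\hat{R}$ inherits the Wishart fluctuations of Eq. (4) rather than being exact.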
Suppose now the noise power is unknown. We must hence minimise Eq. (6) with respect to both $\sigma^2$ and $R$. Using the result from Theorem 1.1 of [16], we can write explicit solutions for the unique minimisers of Eq. (6):

$$\hat{\sigma}^2 = \frac{\mathrm{Tr}\left(\hat{\Sigma} - G G^\dagger \hat{\Sigma}\right)}{L - M}, \qquad \hat{R} = G^\dagger \left[\hat{\Sigma} - \hat{\sigma}^2 I_L\right] \left(G^\dagger\right)^H. \qquad (9)$$

Again, the constrained maximum likelihood estimate of $\kappa$ is given by

$$\hat{\kappa} = \left(\bar{\Psi} \otimes \Psi\right) \left[\bar{G}^\dagger \otimes G^\dagger\right] (\hat{\varsigma} - \hat{\sigma}^2 \epsilon), \qquad (10)$$

with intensity function $\hat{I}(r) = \sum_{i,j=1}^{M} \hat{R}_{ij} \psi_i(r) \bar{\psi}_j(r)$. (The left pseudo-inverse exists since $G$ is assumed full column-rank.) This time, since $M < L$, the consistency condition Eq. (8) cannot be met, and $\mathbb{E}[\hat{\kappa}]$ can no longer be interpreted as an oblique projection of $\kappa$. For values of $M$ comparable to $L$ though, the consistency condition should still hold approximately (more precisely, on a subspace of dimension $M^2$), and this geometrical interpretation provides intuition.

3.4. On the choice of Ψ

We have thus far only required the synthesis operator $\Psi$ to be identifiable, with the coherency condition requiring $G = \Phi^* \Psi$ to be full column-rank. This still leaves plenty of potential candidates. For practical purposes, we recommend taking $\Psi = \Phi W$, where $W \in \mathbb{C}^{L \times M}$ is a tall matrix with columns containing the first $M$ eigenvectors of $\hat{\Sigma}$ (assuming eigenvalues sorted in descending order). Such a choice presents numerous advantages. First, since $\mathcal{R}(\Phi)^\perp = \mathcal{N}(\Phi^*)$, the instrument can only sense functions within the range of $\Phi$, and it is hence natural to choose $\mathcal{R}(\Psi) = \mathcal{R}(\Phi)$. This canonical choice moreover yields an analytically computable Gram matrix $G$. Indeed, we have $G = \Phi^* \Phi W = H W$, where $H \in \mathbb{C}^{L \times L}$ is given by (see [7, Chapter 4, Section 1.1]):

$$(H)_{ij} = 4\pi\, \mathrm{sinc}\left(2\pi \|p_i - p_j\| / \lambda\right), \quad i, j = 1, \ldots, L.$$

Finally, by choosing the columns of $W$ as the first $M$ eigenvectors of $\hat{\Sigma}$, $M$ acts as a regularisation parameter. Indeed, the eigenvectors associated to the smallest eigenvalues of $\hat{\Sigma}$ are usually the most polluted by noise. Hence, truncating to the $M$ largest eigenvalues reduces the contribution of the noise in the final estimate (see Figs. 2f to 2h). Moreover, small values of $M$ will increase the chances of $(G^H G) \in \mathbb{C}^{M \times M}$ in the left pseudo-inverse $G^\dagger = (G^H G)^{-1} G^H$ being well-conditioned, thus improving the overall numerical stability of the algorithm. Suitable values of $M$ can be obtained by minimising the Bayesian Information Criterion (BIC) [24], often used in model selection:
$$\mathrm{BIC}(M) = -\hat{\ell}_M + 2M \log(L),$$

where $\hat{\ell}_M$ is the maximised log-likelihood function for a specific choice of $M$. An example of a BIC profile and the evolution of the BIC-selected $M$ with the signal-to-noise ratio are depicted in Fig. 1.

Fig. 2 compares the performance of the proposed Sieved Maximum Likelihood (SiML) method in a radio astronomy setup to three popular spectral-based methods, namely Matched Beamforming (MB), Minimum Variance Distortionless Response (MVDR) and the Adapted Angular Response (AAR) [34]. For this experiment, we randomly generated a layout of L = 300 antennas and simulated N = 2000 random measurements from the ground-truth intensity field of Fig. 2a. Furthermore, we considered two metrics to assess the quality of the recovered images: the traditional relative Mean Squared Error (MSE) and the Root Mean Squared (RMS) metric, which measures the contrast of an image by computing its standard deviation over all pixels. The simulations reveal that SiML outperforms all the traditional algorithms for the considered SNR range in both metrics, except for large SNRs where MVDR exhibits a slightly better contrast. As for the traditional SML method, SiML performs particularly well for challenging scenarios with very low SNR.
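The full recipe behind such an experiment — analytic Gram matrix $H$, eigenvector sieve $\Psi = \Phi W$, the closed forms of Eq. (9), and selection of $M$ by a BIC-style criterion — can be sketched end-to-end at toy scale. The layout and snapshot covariance below are made-up stand-ins, much smaller than the $L = 300$ setup of the paper.

```python
import numpy as np

rng = np.random.default_rng(5)
lam, L, N = 1.0, 30, 1000

# Hypothetical random antenna layout
p = rng.uniform(-5, 5, size=(L, 3))

# Analytic Gram matrix of the phi_i's: (H)_ij = 4*pi*sinc(2*pi*||p_i - p_j||/lambda),
# with unnormalised sinc. Note np.sinc(x) = sin(pi x)/(pi x), hence the argument 2d/lam.
d = np.linalg.norm(p[:, None, :] - p[None, :, :], axis=-1)
H = 4 * np.pi * np.sinc(2 * d / lam)

# Simulated snapshots from a hypothetical covariance, then the sample covariance
A = rng.normal(size=(L, L)) + 1j * rng.normal(size=(L, L))
Sigma_true = A @ A.conj().T / L + 0.5 * np.eye(L)
C = np.linalg.cholesky(Sigma_true)
y = C @ (rng.normal(size=(L, N)) + 1j * rng.normal(size=(L, N))) / np.sqrt(2)
Sigma_hat = y @ y.conj().T / N

def fit(M):
    """SiML fit for sieve dimension M; returns the maximised log-likelihood."""
    # Psi = Phi W with W the first M eigenvectors of Sigma_hat, so G = H W
    eigvals, eigvecs = np.linalg.eigh(Sigma_hat)    # ascending order
    W = eigvecs[:, ::-1][:, :M]                     # M leading eigenvectors
    G = H @ W
    Gp = np.linalg.pinv(G)
    P = G @ Gp                                      # orthogonal projector onto R(G)
    sigma2 = np.trace(Sigma_hat - P @ Sigma_hat).real / (L - M)   # Eq. (9)
    R = Gp @ (Sigma_hat - sigma2 * np.eye(L)) @ Gp.conj().T       # Eq. (9)
    S = G @ R @ G.conj().T + sigma2 * np.eye(L)
    _, logdet = np.linalg.slogdet(S)
    return -(np.trace(np.linalg.solve(S, Sigma_hat)).real + logdet)

# Model selection over the sieve dimension, BIC written as in the text
Ms = range(2, L - 1, 4)
bic = {M: -fit(M) + 2 * M * np.log(L) for M in Ms}
M_best = min(bic, key=bic.get)
print(M_best)
```

The eigendecomposition could be computed once outside `fit`; it is repeated here only to keep the function self-contained.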
4. CONCLUSION
SiML generalises the traditional SML method to a wider class of signals, encompassing arbitrarily shaped, possibly correlated sources, of which there may be more than the number of sensors. The method is numerically stable and admits a nice geometrical interpretation in the case of known noise power. Simulations revealed its superiority with respect to state-of-the-art subspace-based methods, both in terms of accuracy and contrast. Finally, the tensor product structure in Eq. (10) makes the estimate $\hat{\kappa}$ very efficient to compute. This is in contrast to traditional SML, which requires minimising a highly non-linear multi-dimensional function [1].
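The efficiency claim about the tensor structure can be made concrete: the factored form works with the $M \times M$ matrix $\hat{R}$, whereas evaluating Eq. (10) literally would route an $L^2$-vector through an explicit Kronecker matrix. A sketch with hypothetical dimensions (the Kronecker ordering below follows NumPy's row-major `reshape` convention, which may differ from the paper's vec convention):

```python
import numpy as np

rng = np.random.default_rng(7)
L, M = 40, 20
G = rng.normal(size=(L, M)) + 1j * rng.normal(size=(L, M))
A = rng.normal(size=(L, L)) + 1j * rng.normal(size=(L, L))
Sigma_hat = A @ A.conj().T / L + np.eye(L)       # hypothetical sample covariance
sigma2 = 0.1

Gp = np.linalg.pinv(G)
X = Sigma_hat - sigma2 * np.eye(L)

# Factored evaluation: an M x M object, O(L^2 M) work
R_hat = Gp @ X @ Gp.conj().T

# Literal vec() evaluation: builds the M^2 x L^2 Kronecker factor explicitly.
# With row-major vec, vec(A X B) = (A kron B^T) vec(X); here B = (G^dagger)^H.
K = np.kron(Gp, Gp.conj())                       # (M*L)^2 = 640,000 entries here
r_vec = K @ X.reshape(-1)

print(np.allclose(r_vec.reshape(M, M), R_hat))   # same numbers, very different cost
```

Both paths agree numerically; only the factored one remains feasible at $L = 300$, where the Kronecker factor would have billions of entries.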
Fig. 1: Optimal choice of the dimension M based on the Bayesian Information Criterion (BIC). (a) BIC profile for the setup described in Fig. 2. (b) BIC-selected M vs. SNR for the setup described in Fig. 2.

Fig. 2: Comparison between the sieved maximum likelihood method and various steered-response power methods (MB, MVDR, AAR). The parameters of the experiment were set to: L = 300, N = 2000, SNR = 5 dB. (a) Intensity field. (b) Relative MSE and RMS contrast scores of the various images below. (c) MB estimate. (d) MVDR estimate. (e) AAR estimate. (f) SiML estimate (M = 15). (g) SiML estimate (BIC-selected M = 101). (h) SiML estimate (M = 295). (i) Performance of the different algorithms for various SNR. The standard deviation was obtained over 10 repeated experiments.

5. REFERENCES

[1] Hamid Krim and Mats Viberg, "Two decades of array signal processing research: the parametric approach," IEEE Signal Processing Magazine, vol. 13, no. 4, pp. 67–94, 1996.
[2] Robert J. Mailloux, Phased Array Antenna Handbook, vol. 2, Artech House, Boston, 2005.
[3] Don H. Johnson and Dan E. Dudgeon, Array Signal Processing: Concepts and Techniques, Simon & Schuster, 1992.
[4] Michael Brandstein and Darren Ward, Microphone Arrays: Signal Processing Techniques and Applications, Springer Science & Business Media, 2013.
[5] Jacob Benesty, Jingdong Chen, and Yiteng Huang, Microphone Array Signal Processing, vol. 1, Springer Science & Business Media, 2008.
[6] A. Richard Thompson, James M. Moran, and George W. Swenson Jr., Interferometry and Synthesis in Radio Astronomy, John Wiley & Sons, 2008.
[7] Matthieu Simeoni, "Towards more accurate and efficient beamformed radio interferometry imaging," M.S. thesis, EPFL, Spring 2015.
[8] Simon Haykin, "Array signal processing," Englewood Cliffs, NJ: Prentice-Hall, 1985.
[9] Lal Chand Godara, "Application of antenna arrays to mobile communications. II. Beam-forming and direction-of-arrival considerations," Proceedings of the IEEE, vol. 85, no. 8, pp. 1195–1245, 1997.
[10] Arogyaswami J. Paulraj and Constantinos B. Papadias, "Space-time processing for wireless communications," IEEE Signal Processing Magazine, vol. 14, no. 6, pp. 49–83, 1997.
[11] P. Hurley and M. Simeoni, "Flexibeam: analytic spatial filtering by beamforming," in International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, March 2016.
[12] Zhi-Pei Liang and Paul C. Lauterbur, Principles of Magnetic Resonance Imaging: A Signal Processing Perspective, IEEE Press, 2000.
[13] Boaz Rafaely, Fundamentals of Spherical Array Processing, vol. 8, Springer, 2015.
[14] Petre Stoica, Björn Ottersten, Mats Viberg, and Randolph L. Moses, "Maximum likelihood array processing for stochastic coherent sources," IEEE Transactions on Signal Processing, vol. 44, no. 1, pp. 96–105, 1996.
[15] Petre Stoica, Erik G. Larsson, and Alex B. Gershman, "The stochastic CRB for array processing: a textbook derivation," IEEE Signal Processing Letters, vol. 8, no. 5, pp. 148–150, 2001.
[16] Petre Stoica and Arye Nehorai, "On the concentrated stochastic likelihood function in array signal processing," Circuits, Systems and Signal Processing, vol. 14, no. 5, pp. 669–674, 1995.
[17] M. P. van Haarlem, M. W. Wise, A. W. Gunst, George Heald, J. P. McKean, J. W. T. Hessels, A. G. De Bruyn, Ronald Nijboer, John Swinbank, Richard Fallows, et al., "LOFAR: The low-frequency array," Astronomy & Astrophysics, vol. 556, p. A2, 2013.
[18] Peter E. Dewdney, Peter J. Hall, Richard T. Schilizzi, and T. Joseph L. W. Lazio, "The square kilometre array," Proceedings of the IEEE, vol. 97, no. 8, pp. 1482–1496, 2009.
[19] Martin Vetterli, Jelena Kovačević, and Vivek K. Goyal, Foundations of Signal Processing, Cambridge University Press, 2014.
[20] James O. Ramsay, Functional Data Analysis, Wiley Online Library, 2006.
[21] Mikhail Lifshits, "Lectures on Gaussian processes," in Lectures on Gaussian Processes, pp. 1–117, Springer, 2012.
[22] Ulf Grenander, "Abstract inference," Tech. Rep., 1981.
[23] Stuart Geman and Chii-Ruey Hwang, "Nonparametric maximum likelihood estimation by the method of sieves," The Annals of Statistics, pp. 401–414, 1982.
[24] Harish S. Bhat and Nitesh Kumar, "On the derivation of the Bayesian information criterion," School of Natural Sciences, University of California, 2010.
[25] James O. Ramsay and Bernard W. Silverman, Applied Functional Data Analysis: Methods and Case Studies, vol. 77, 2002.
[26] Björn Ottersten, Petre Stoica, and Richard Roy, "Covariance matching estimation techniques for array signal processing applications," Digital Signal Processing, vol. 8, no. 3, pp. 185–210, 1998.
[27] Nathaniel R. Goodman, "Statistical analysis based on a certain multivariate complex Gaussian distribution (an introduction)," The Annals of Mathematical Statistics, vol. 34, no. 1, pp. 152–177, 1963.
[28] Robert G. Gallager, Principles of Digital Communication, vol. 1, Cambridge University Press, 2008.
[29] T. Douglas Mast, "Fresnel approximations for acoustic fields of rectangularly symmetric sources," The Journal of the Acoustical Society of America, vol. 121, no. 6, pp. 3311–3322, 2007.
[30] K. G. Jinadasa, "Applications of the matrix operators vech and vec," Linear Algebra and its Applications, vol. 101, pp. 73–79, 1988.
[31] D. Maiwald and D. Kraus, "Calculation of moments of complex Wishart and complex inverse Wishart distributed matrices," IEE Proceedings - Radar, Sonar and Navigation, vol. 147, no. 4, pp. 162–168, 2000.
[32] Victor M. Panaretos, "Statistics for mathematicians."
[33] Heinz Werner Engl, Martin Hanke, and Andreas Neubauer, Regularization of Inverse Problems, vol. 375, Springer Science & Business Media, 1996.
[34] Alle-Jan van der Veen and Stefan J. Wijnholds, "Signal processing tools for radio astronomy," in