Generalized robust shrinkage estimator and its application to STAP detection problem
SUBMITTED TO IEEE TRANS. ON SIGNAL PROCESSING
Frédéric Pascal,
Senior Member, IEEE,
Yacine Chitour and Yihui Quek
Abstract—Recently, in the context of covariance matrix estimation, in order to improve as well as to regularize the performance of Tyler's estimator [1], also called the Fixed-Point Estimator (FPE) [2], a "shrinkage" fixed-point estimator was originally introduced in [3]. First, this work extends the results of [4], [5] by giving the general solution of the "shrinkage" fixed-point equation. Secondly, by analyzing this solution, called the generalized robust shrinkage estimator, we prove that it converges to a unique solution when the shrinkage parameter β (loading factor) tends to 0. This limit is exactly the FPE whose inverse has trace equal to the dimension of the problem. This general result allows one to give another interpretation of the FPE and, more generally, of the Maximum Likelihood approach for covariance matrix estimation when constraints are added. Then, simulations illustrate the theoretical results as well as the way to choose an optimal shrinkage factor. Finally, this work is applied to a Space-Time Adaptive Processing (STAP) detection problem on real STAP data.

Index Terms—Covariance matrix estimation, robust shrinkage estimation, Fixed Point Estimator, Tyler's Estimator
I. INTRODUCTION
In the statistical signal processing area, the problem of covariance matrix estimation is an active research topic [2], [4]–[11]. From an application point of view, better accuracy in terms of covariance matrix estimation directly translates into improved system performance in terms of estimation, detection and/or classification, as shown in radar applications or in Direction-Of-Arrival (DOA) estimation problems (see e.g. [8], [9] and references therein). Until recent years, the classical assumption used for data modelling was the Gaussian model, under which the well-known Sample Covariance Matrix (SCM) is the Maximum Likelihood (ML) estimator of the data covariance matrix. However, in many practical cases the SCM suffers from major drawbacks, for instance in adaptive radar and sonar processing [12], since its performance can be strongly degraded. This is also the case in the presence of non-Gaussian, impulsive and/or heterogeneous noise, as well as in the presence of outliers (see e.g. [13] and references therein). To fill these gaps, a general framework on robust
estimation theory has been extensively studied in the statistical community since the 1970s, following the seminal works of Huber, Hampel and Maronna [14]–[17]. The multivariate real case introduced by Maronna in [17] has recently been extended by Ollila to the complex case [2], [8], [9], more adapted to signal processing applications. Under this robust framework, most recent works on covariance matrix estimation consider the broader class of Complex Elliptically Symmetric (CES) distributions, originally introduced by Kelker [18], which encompasses the Spherically Invariant Random Vectors (SIRV) [19] as well as the Multivariate Generalized Gaussian Distributions (MGGD) [20]. [9] provides a complete review of CES distributions applied to array processing, and we refer to that paper for the main results on CES. From a signal processing point of view, an important feature of this class of distributions is that the covariance matrix does not necessarily exist, as is the case for instance for the multivariate Cauchy distribution, whose variance is infinite. Thus, one can always consider the so-called scatter matrix, which is always well defined, contrary to the covariance matrix. In the case of a finite second-order moment, the covariance matrix is equal to the scatter matrix up to a scale factor (see [9] for more details). One important consequence is that one can always estimate the scatter matrix instead of the covariance matrix.

F. Pascal is with SONDRA, Supelec, Plateau du Moulon, 3 rue Joliot-Curie, F-91190 Gif-sur-Yvette, France (e-mail: [email protected]). Y. Chitour is with the Laboratoire des Signaux et Systèmes, Supelec, Plateau du Moulon, 3 rue Joliot-Curie, F-91192 Gif-sur-Yvette Cedex, France (e-mail: [email protected]). Y. Quek is a B.Sc. student in Physics and Applied Mathematics at the Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, USA. This work has been partially supported by the DGA grant N◦
Moreover, for applications that are invariant to a scale factor, such as DOA estimation with MUltiple SIgnal Classification (MUSIC) [21] or detection using the Adaptive Normalized Matched Filter (ANMF), first introduced in [22] and analyzed in [23]–[26], the resulting performance is the same.

In this context of covariance matrix estimation, to improve the estimator performance as well as to deal with under-sampled cases (i.e., when the number of samples is smaller than the dimension of the data), a common regularization approach has been widely studied: the diagonal loading approach, originally introduced by [27], [28] and applied to the SCM. More recently, a shrinkage has been proposed in [29], again applied to the SCM. To deal with non-Gaussian models and to obtain a robust approach, this shrinkage has been applied to Tyler's estimator [1] to obtain a "shrinkage" fixed-point estimator (FPE), originally introduced by [3] for the case where the number of samples is smaller than the dimension. A rigorous proof of existence, uniqueness and convergence of the associated recursive algorithm has been given by [4] in the case where a trace normalization is added. Moreover, in [5], a similar shrinkage fixed-point estimator has been analyzed by including a penalty term on the trace of the inverse. Then, in [30]–[32], this estimator has been used with the Expected Likelihood approach. However, in all these works, no general proof of the existence and uniqueness of this "shrinkage" FPE is provided; only the particular case where the trace of the estimator is fixed is analyzed. To fill this gap, this work provides the general solution of this fixed-point problem, even in the under-sampled case, i.e., when the number N of samples is smaller than the dimension m of the observations.
Moreover, this "shrinkage" FPE is compared to the classical FPE and, interestingly, it provides a simple way to build a unique FPE of the covariance matrix. Finally, the proof relies on the analysis of a continuous function that can be seen as a Likelihood Function (LF) generalizing the LF of the so-called Angular Complex Gaussian (ACG) distributions.

The second part of this paper is devoted to the analysis of the "shrinkage" FPE in a Space-Time Adaptive Processing (STAP) context [33], [34]. For that purpose, the shrinkage parameter is studied in order to provide the best detection performance. In contrast with the work of [31], this paper considers the so-called over-sampled case, which means that the number N of samples is greater than the dimension m of the observations. However, some preliminary results on the under-sampled case will be provided when applying the proposed approach to STAP data. These results are linked with those of [32], which also considers the under-sampled case, i.e., where the number N of samples is smaller than the dimension m of the observations.

The paper is organized as follows: Section II presents the estimation context, while Section III contains the main contribution of this work, i.e., the derivation of the shrinkage FPE in a general context. The first part of Section IV is devoted to the analysis of the shrinkage FPE through Monte Carlo simulations; this estimator is then applied to a STAP detection problem on a real data set. Section V draws the conclusions and gives some outlines for further work. For clarity of presentation, some parts of the proofs are postponed to the Appendix.

The following convention is adopted: italic indicates a scalar quantity, lower case boldface a vector and upper case boldface a matrix. T denotes the transpose operator and H the transpose conjugate. E[.] is the expected value operator and Tr(.) denotes the trace operator.
CN(a, M) is a complex Gaussian distribution with mean vector a and covariance matrix M. I is the identity matrix of appropriate dimension.

II. BACKGROUND
In the context of covariance matrix estimation, this paper focuses on two particular estimators: the FPE or Tyler's estimator, and the shrinkage fixed-point estimator, also called the diagonally loaded fixed-point estimator. Let us consider an N-sample (x_1, ..., x_N) of independent and identically distributed (i.i.d.) m-variate random vectors with covariance matrix Σ if it exists; otherwise Σ is the scatter matrix. The FPE or Tyler's estimator [1], [2] is defined as the solution of the following fixed-point equation:

\[ \Sigma = f(\Sigma), \tag{1} \]

where the map f is defined over the positive definite Hermitian matrices of size m by

\[ f(\Sigma) = \frac{m}{N} \sum_{n=1}^{N} \frac{\mathbf{x}_n \mathbf{x}_n^{H}}{\mathbf{x}_n^{H} \Sigma^{-1} \mathbf{x}_n}. \tag{2} \]

As shown in [2] for the complex case, the solution exists and is unique up to a scale factor. Moreover, the associated recursive algorithm defined by

\[ \widetilde{\Sigma}^{(k+1)} = \frac{m}{N} \sum_{n=1}^{N} \frac{\mathbf{x}_n \mathbf{x}_n^{H}}{\mathbf{x}_n^{H}\, \widehat{\Sigma}^{(k)\,-1}\, \mathbf{x}_n}, \qquad \widehat{\Sigma}^{(k+1)} = \frac{m}{\mathrm{Tr}\big(\widetilde{\Sigma}^{(k+1)}\big)}\, \widetilde{\Sigma}^{(k+1)} \tag{3} \]

converges towards the solution satisfying the constraint that its trace is equal to m, whatever the initialization matrix Σ^{(0)}.

It is important to notice that the constraint on the trace, used here for identifiability considerations, is only enforced in the recursive algorithm in order to obtain a unique solution. However, this solution is not the general solution of Eq. (1), since the general solution is not unique but defined up to a scale constant, as shown in [2]. More precisely, under the constraint Tr(Σ) = m, it is proved in [2] that a particular solution exists, is unique and can be reached by using the recursive algorithm defined in Eq. (3).
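As an illustrative sketch (our own code, not the authors'; function and variable names are ours), the recursion of Eq. (3) can be written in a few lines, assuming the N samples are stored as the rows of an array X:

```python
import numpy as np

def tyler_fpe(X, n_iter=200):
    """Recursion of Eq. (3): Tyler fixed-point step followed by the
    normalization Tr(Sigma_hat) = m. Rows of X are the samples x_n^T.
    Illustrative sketch only."""
    N, m = X.shape
    Sigma = np.eye(m, dtype=X.dtype)
    for _ in range(n_iter):
        # quadratic forms q_n = x_n^H Sigma^{-1} x_n, computed for all n at once
        q = np.real(np.einsum('ni,ij,nj->n', X.conj(), np.linalg.inv(Sigma), X))
        S = (m / N) * (X.T / q) @ X.conj()       # the map f of Eq. (2)
        Sigma = m * S / np.real(np.trace(S))     # trace normalization of Eq. (3)
    return Sigma
```

Whatever the positive definite initialization, the iterates converge to the unique solution normalized by Tr(Σ̂) = m, as recalled above.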
In the following, this point will be detailed further for the shrinkage fixed-point estimator, since it is proved in this work that no additional constraint is required to obtain the uniqueness of a solution of the shrinkage fixed-point equation.

Let us now consider the shrinkage (diagonally loaded) fixed point, originally introduced in [3] and fully analyzed in [4], defined as the solution of the following fixed-point equation, for β ∈ [0, 1]:

\[ \Sigma(\beta) = (1-\beta)\,\frac{m}{N} \sum_{n=1}^{N} \frac{\mathbf{x}_n \mathbf{x}_n^{H}}{\mathbf{x}_n^{H}\, \Sigma(\beta)^{-1}\, \mathbf{x}_n} + \beta\,\mathbf{I}. \tag{4} \]

Notice that no proof of existence and uniqueness of a solution of Eq. (4) is given in [4], despite the statement of Theorem 1 of [4]. Actually, it is proved there that the following recursive algorithm converges to a unique solution whatever the initialization:

\[ \widetilde{\Sigma}^{(k+1)} = (1-\beta)\,\frac{m}{N} \sum_{n=1}^{N} \frac{\mathbf{x}_n \mathbf{x}_n^{H}}{\mathbf{x}_n^{H}\, \widehat{\Sigma}^{(k)\,-1}\, \mathbf{x}_n} + \beta\,\mathbf{I}, \qquad \widehat{\Sigma}^{(k+1)} = \frac{m}{\mathrm{Tr}\big(\widetilde{\Sigma}^{(k+1)}\big)}\, \widetilde{\Sigma}^{(k+1)}. \tag{5} \]
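The regularized recursion of Eq. (5) only adds the loading term βI and the factor (1 − β) to the Tyler recursion; a sketch (again illustrative code with our own naming):

```python
import numpy as np

def shrinkage_fpe_normalized(X, beta, n_iter=200):
    """Recursion of Eq. (5): shrinkage fixed-point step of [4], followed by
    the trace normalization Tr(Sigma_hat) = m. Rows of X are samples x_n^T.
    Illustrative sketch only."""
    N, m = X.shape
    Sigma = np.eye(m, dtype=X.dtype)
    for _ in range(n_iter):
        # q_n = x_n^H Sigma^{-1} x_n for all n
        q = np.real(np.einsum('ni,ij,nj->n', X.conj(), np.linalg.inv(Sigma), X))
        S = (1 - beta) * (m / N) * (X.T / q) @ X.conj() + beta * np.eye(m)
        Sigma = m * S / np.real(np.trace(S))   # normalization step of Eq. (5)
    return Sigma
```

Section III below shows that the normalization step can in fact be dropped: for admissible β, the un-normalized iteration already converges to the unique solution of Eq. (4).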
In [5], a similar shrinkage fixed point has been proposed, defined by

\[ \widetilde{\Sigma}(\beta) = (1-\beta)\,\frac{m}{N} \sum_{n=1}^{N} \frac{\mathbf{x}_n \mathbf{x}_n^{H}}{\mathbf{x}_n^{H}\, \widetilde{\Sigma}(\beta)^{-1}\, \mathbf{x}_n} + \beta\, \frac{m}{\mathrm{Tr}\big(\widetilde{\Sigma}(\beta)^{-1}\big)}\,\mathbf{I}. \tag{6} \]

The aim of this paper is to analyze the fixed-point scheme defined by Eq. (4). This is the purpose of the next section.

III. MAIN CONTRIBUTION
One of the main contributions of this paper is to prove that Eq. (4) admits a unique solution for β ∈ (β̄, 1], where β̄ := max(0, 1 − N/m), and that this solution can be reached by an algorithm similar to (5) but without the step that imposes the trace of the solution to be equal to m. Before turning to the proof of this result, let us notice that, interestingly, we have the following proposition.

Proposition III.1
If Eq. (4) admits a solution Σ for some β ∈ (0, 1], then Σ satisfies the following constraint:

\[ \mathrm{Tr}\big(\Sigma^{-1}\big) = m. \tag{7} \]

Proof:
Let us whiten Eq. (4) by Σ^{-1/2}:

\[ \mathbf{I} = (1-\beta)\,\frac{m}{N} \sum_{n=1}^{N} \frac{\Sigma^{-1/2}\,\mathbf{x}_n \mathbf{x}_n^{H}\,\Sigma^{-1/2}}{\mathbf{x}_n^{H}\, \Sigma^{-1}\, \mathbf{x}_n} + \beta\,\Sigma^{-1}. \]

By setting z_n = Σ^{-1/2} x_n / √(x_n^H Σ^{-1} x_n), which is of unit norm, one obtains

\[ \mathbf{I} = (1-\beta)\,\frac{m}{N} \sum_{n=1}^{N} \mathbf{z}_n \mathbf{z}_n^{H} + \beta\,\Sigma^{-1}, \tag{8} \]

and, by taking the trace,

\[ m = (1-\beta)\,\frac{m}{N} \sum_{n=1}^{N} \mathrm{Tr}\big(\mathbf{z}_n \mathbf{z}_n^{H}\big) + \beta\,\mathrm{Tr}\big(\Sigma^{-1}\big). \]

Since each z_n is of unit norm, Tr(z_n z_n^H) = 1 and the above equation reduces to

\[ m = (1-\beta)\,m + \beta\,\mathrm{Tr}\big(\Sigma^{-1}\big), \]

which concludes the proof for any β ≠ 0.

Consequently, there is no reason for a solution to simultaneously satisfy Tr(Σ^{-1}) = m and Tr(Σ) = m, as provided by the algorithm of [4] recalled in Eq. (5). Moreover, the penalization on the trace of the inverse proposed by [5] is not necessary, since this constraint is naturally satisfied.

The second contribution of this paper consists in showing, when m/N < 1, that the map β ↦ Σ(β) defined on (0, 1] admits a limit as β tends to zero, and thus converges to the unique fixed point of Eq. (1) whose inverse has trace equal to m. The following theorem proves the existence and uniqueness of a solution of the fixed-point Eq. (4) for any β ∈ (0, 1] when N > m, but also in the case where N ≤ m for particular values of β. In addition, a simpler recursive algorithm is provided, whose convergence to the solution is ensured. We use D to denote the set of Positive Definite Symmetric (PDS) matrices of size m.

Theorem III.1
The following fixed-point equation,

\[ \Sigma = (1-\beta)\,\frac{m}{N} \sum_{n=1}^{N} \frac{\mathbf{x}_n \mathbf{x}_n^{H}}{\mathbf{x}_n^{H}\, \Sigma^{-1}\, \mathbf{x}_n} + \beta\,\mathbf{I}, \qquad (\Sigma, \beta) \in D \times (0, 1], \tag{9} \]

admits a solution if and only if β ∈ (β̄, 1], where β̄ := max(0, 1 − N/m); in that case, the solution is unique and denoted Σ(β). Moreover, when m/N < 1, lim_{β→0} Σ(β) exists and is equal to the unique fixed point of Eq. (1) whose inverse has trace equal to m.

Let us notice that Theorem III.1 provides the main result of this work: it directly implies that the unique solution of Eq. (4) tends, as β tends to 0, to a particular Tyler's estimator, namely the unique one whose inverse has trace equal to m.

In the course of the proof of Theorem III.1, we will consider the functions F : D × [0, 1] → (0, +∞) and f : D × [0, 1] → D given by

\[ F(\Sigma, \beta) := \frac{\exp\!\big(-N\beta\,\mathrm{Tr}(\Sigma^{-1})\big)}{\det(\Sigma)^{N}} \prod_{n=1}^{N} \big(\mathbf{x}_n^{\top} \Sigma^{-1} \mathbf{x}_n\big)^{-m(1-\beta)}, \tag{10} \]

\[ f(\Sigma, \beta) := (1-\beta)\,\frac{m}{N} \sum_{n=1}^{N} \frac{\mathbf{x}_n \mathbf{x}_n^{\top}}{\mathbf{x}_n^{\top}\, \Sigma^{-1}\, \mathbf{x}_n} + \beta\,\mathbf{I}. \tag{11} \]

The extension to the complex case follows straightforwardly, as proved in [2]. In the above functions, β appears as an argument, but it will be fixed (β ∈ (β̄, 1]) in the proof. These notations will allow one to maximize the function F(·, β), which can be regarded as a likelihood function, with respect to (w.r.t.) Σ. As mentioned in [31], [32], an important challenge when dealing with the shrinkage estimator is to find a "good" shrinkage parameter. Discussions of the theorem's results are provided after the proof, in Remark III.1.

Proof:
The proof is divided into two parts according to the value of the ratio m/N.

A. Case where m/N < 1

The proof strategy is similar to that of [11]. More precisely, we first prove that, for every β ∈ (0, 1], Eq. (4) has solutions by combining two facts: (a) the solutions of Eq. (4) are exactly the critical points of F(·, β), and (b) F(·, β) admits a unique strict global maximum on D. In a second step, we show that every critical point of F(·, β) in D must be a strict local maximum, and we then conclude the first part of Theorem III.1 by a topological argument. As for the second part, it results from the study of the map β ↦ Σ(β) as β tends to zero.
In the sequel, we use F_β and f_β to denote the maps over D given by F(·, β) and f(·, β), respectively. Note that the maps F and f have been studied in detail in [2]. Let us now consider the "log-likelihood function"

\[ \log(F_\beta)(\Sigma) = -N \log(\det(\Sigma)) - N\beta\,\mathrm{Tr}(\Sigma^{-1}) - m(1-\beta) \sum_{n=1}^{N} \log\big(\mathbf{x}_n^{\top} \Sigma^{-1} \mathbf{x}_n\big). \]

By differentiation w.r.t. Σ, one obtains

\[ \nabla \log(F_\beta)(\Sigma) = -N\,\Sigma^{-1}\big(\Sigma - f_\beta(\Sigma)\big)\Sigma^{-1}, \tag{12} \]

where ∇log(F_β)(Σ) is the unique symmetric matrix such that d log(F_β)(Σ)(Q) = Tr(∇log(F_β)(Σ) Q) for every symmetric matrix Q. We therefore conclude that the fixed points of f(·, β) are exactly the critical points of F_β. We next state the following proposition.

Proposition III.2
Assume that m/N < 1. For every (Σ, β) ∈ D × [0, 1], one has that (a) F_β can be extended by continuity on the boundary of D by zero; (b) F_β(Σ) tends to zero as ‖Σ‖ tends to infinity. Hence, F_β admits a global maximum in D.

Proof: The proof is postponed to Appendix A.

We next prove that every critical point of F_β in D must be a strict local maximum. We then immediately conclude the argument for the existence of a solution of Eq. (4).

Proposition III.3
For every β ∈ (0, 1], if Σ ∈ D is a critical point of F_β, then

\[ \frac{\mathrm{Hess}_\beta(\Sigma)(Q)}{N\,\beta\,F_\beta(\Sigma)} \le -\,\beta\,\mathrm{Tr}\big(Q\,\Sigma^{-1}\,Q\,\Sigma^{-1}\big), \tag{13} \]

where Hess_β(Σ) is the Hessian of F_β at Σ. One deduces at once that Σ must be a strict local maximum.

Proof: The proof is postponed to Appendix B.

As for the uniqueness of the solution of Eq. (4), we argue by contradiction. Assuming that two such solutions Σ_1 and Σ_2 exist in D, we can apply the mountain-pass theorem to the functional 1/F_β (see [35]), since the latter functional tends to infinity as one approaches the boundary of D. We thus obtain the existence of a saddle point of F_β in D, which is not possible according to Proposition III.3.

We now turn to the second statement in Theorem III.1, namely the fact that lim_{β→0} Σ(β) exists and belongs to D. First notice that if Σ̂ is an accumulation point of Σ(β) as β tends to zero such that Σ̂ ∈ D, then Σ̂ must be a fixed point of f whose inverse has trace equal to m. According to Theorem IV.1 of [2], the latter is unique. To prove the claim, it is therefore sufficient to prove that Σ(β) is lower bounded, for β small enough, by an element of D. That statement would follow from an upper bound on det(Σ(β)) for β small enough, since Tr(Σ(β)^{-1}) = m implies that Σ(β) ≥ (1/m) I. This is the object of the following proposition.

Proposition III.4
There exists C > 0 such that, for every β ∈ (0, 1], det(Σ(β)) ≤ C.

Proof: The proof is postponed to Appendix C.

The proof of Theorem III.1 when m/N < 1 is then complete. Let us now turn to the case where m/N ≥ 1.

B. Case where m/N ≥ 1

We have the following result.
Proposition III.5
The fixed-point equation (9) admits a unique solution if and only if (1 − β) m/N < 1.

Proof: The proof is postponed to Appendix D.

This concludes the proof of Theorem III.1. Some comments on the results of Theorem III.1 are given in Remark III.1 below.
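Both statements can be illustrated numerically (an illustrative sketch with synthetic data and our own naming; real-valued samples for simplicity). Below, first N < m, so β̄ = 1 − N/m = 0.5, and the plain iteration of Eq. (9) is run for an admissible β = 0.7; at convergence the solution is invertible and, in agreement with Proposition III.1, the trace of its inverse equals m. A second experiment, with m/N < 1, illustrates the limit statement of Theorem III.1: the relative distance between Σ(β) and the suitably rescaled Tyler fixed point shrinks as β tends to 0:

```python
import numpy as np

def solve_eq9(X, beta, iters=500):
    """Plain iteration of Eq. (9), with no trace normalization.
    For beta = 0 (Tyler case) the scale is fixed by Tr(Sigma) = m."""
    N, m = X.shape
    Sigma = np.eye(m)
    for _ in range(iters):
        q = np.einsum('ni,ij,nj->n', X, np.linalg.inv(Sigma), X)
        Sigma = (1 - beta) * (m / N) * (X.T / q) @ X + beta * np.eye(m)
        if beta == 0:
            Sigma *= m / np.trace(Sigma)
    return Sigma

rng = np.random.default_rng(2)

# 1) Under-sampled case: N < m, beta admissible since (1 - beta) * m / N = 0.6 < 1.
m, N, beta = 12, 6, 0.7
Xu = rng.standard_normal((N, m))
S = solve_eq9(Xu, beta)
print(np.linalg.eigvalsh(S).min() > 0)          # prints True (invertible solution)
print(round(np.trace(np.linalg.inv(S)), 6))     # prints 12.0, i.e. m (Prop. III.1)

# 2) m/N < 1: Sigma(beta) approaches, as beta -> 0, the Tyler fixed point
#    rescaled so that Tr(Sigma_FP^{-1}) = m.
m, N = 3, 12
Xo = rng.standard_normal((N, m))
S_fp = solve_eq9(Xo, 0.0)
S_fp *= np.trace(np.linalg.inv(S_fp)) / m       # rescale: Tr(S_fp^{-1}) = m
errs = [np.linalg.norm(solve_eq9(Xo, b) - S_fp) / np.linalg.norm(S_fp)
        for b in (0.4, 0.2, 0.1, 0.05)]
print(errs)                                      # shrinks as beta -> 0
```

Note that in the under-sampled experiment the sample outer-product sum has rank N = 6 < m = 12, so the un-regularized FPE would not even be well defined.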
Remark III.1
The following main points are implied by Theorem III.1:

• First, when m/N > 1, the parameter β must be greater than 1 − N/m, which is a very intuitive condition. Indeed, the greater the ratio m/N, the more a priori information is needed or, equivalently, the stronger the regularization has to be. For this shrinkage FP algorithm, this implies that the weight applied to the identity matrix has to be higher.

• Second, the proposed shrinkage FPE is unique, which differs from the one defined by Eq. (6), which is unique only up to a scaling factor. Actually, the consequence of the penalization function used in [5] is to remove the natural trace constraint.

• Moreover, the approach used to prove Theorem III.1 relies on the analysis of a likelihood function (LF) which, due to the parameter β, can be seen as a generalization of the complex Angular Gaussian (AG) distributions detailed in [9]. Of course, when β = 0, one retrieves the classical AG distribution. Interestingly, this LF can be maximized w.r.t. the CM but also w.r.t. β, as a way to set the β parameter optimally. Unfortunately, as shown in the Appendix by Proposition E.1, the log-likelihood function is convex in β, so the maximum is attained at β = 0 or β = 1.

• Following the previous remark, a way to find an optimal (in the sense of minimizing the Mean Square Error) value of β has recently been proposed in [36] by means of large Random Matrix Theory.

IV. SIMULATIONS

This section is divided into two parts. First, the algorithmic behaviour of the shrinkage estimator is analysed and compared to that of the FPE, for estimation purposes only. Moreover, the convergence of the shrinkage estimator towards the particular FPE satisfying Tr(Σ̂^{-1}) = m, proved by Theorem III.1, is illustrated as the shrinkage parameter β tends to 0. Then, the shrinkage estimator is used in a STAP application and its performance is compared to that of the FPE, the standard Sample Covariance Matrix (SCM) and the Diagonally Loaded SCM (DL-SCM).

Fig. 1. Normalized mean square error of M̂ estimated by the FPE and the shrinkage FPE versus the parameter β of the shrinkage FPE, for different covariance matrices, i.e. for different values of the correlation coefficient ρ, where N = 24 samples and the dimension of the data is m = 12, for Gaussian noise.

A. Shrinkage FPE algorithm analysis
In these simulations, the true covariance matrix Σ = α M is defined through a correlation coefficient ρ as follows:

\[ M_{ij} = \rho^{|i-j|}, \qquad \rho \in (0, 1), \tag{14} \]

where α is a scalar that ensures Tr(Σ^{-1}) = m, i.e. α = Tr(M^{-1})/m, to be consistent with the theoretical results and the discussion of the trace constraint. Consequently, the particular FPE that is studied is the one whose inverse has trace equal to m. Of course, for application purposes with a priori knowledge on this coefficient, the recursive algorithm can be modified without changing the results (convergence to a unique solution), as proved in Theorem III.1. This definition allows one to use covariance matrices close to the identity matrix (for ρ close to 0) as well as ill-conditioned covariance matrices when ρ is close to 1. The samples are zero-mean, generated from a Gaussian distribution with covariance matrix Σ. Let us notice that the results remain valid for any Spherically Invariant Random Vectors (SIRV) since, by definition, neither estimator (shrinkage estimator or FPE) depends on a scalar factor multiplying the samples x_n.

Furthermore, the dimension m of the data is set to 12, the number N of samples is equal to 24, 48 or 200, and the Normalized Mean Square Error (NMSE) is used to assess the estimation performance.

Figures 1(a), 1(b) and 1(c) show the NMSE versus the regularization parameter β for the shrinkage estimator as well as for the FPE, for different values of the correlation coefficient ρ and for a number of samples N equal to 24, i.e. twice the dimension of the observations. Let us first notice that in all scenarios the shrinkage estimator outperforms the FPE, particularly when ρ is close to 0, since the regularization drives the estimate towards the identity matrix. But even for large values of ρ, the NMSE of the shrinkage estimator is lower than that of the FPE.
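The setup above can be reproduced as follows (illustrative sketch, our own naming; the exact NMSE convention is not spelled out in the text, so a standard normalized squared Frobenius criterion is assumed):

```python
import numpy as np

def toeplitz_cov(rho, m):
    """Covariance model of Eq. (14): M_ij = rho^{|i-j|}, rescaled by
    alpha = Tr(M^{-1}) / m so that Tr(Sigma^{-1}) = m."""
    M = rho ** np.abs(np.subtract.outer(np.arange(m), np.arange(m)))
    return (np.trace(np.linalg.inv(M)) / m) * M

def nmse(Sigma_hat, Sigma):
    """Assumed NMSE criterion: squared Frobenius error, normalized."""
    return (np.linalg.norm(Sigma_hat - Sigma, 'fro')
            / np.linalg.norm(Sigma, 'fro')) ** 2

# zero-mean Gaussian samples with covariance Sigma, as in the text
rng = np.random.default_rng(4)
m, N, rho = 12, 24, 0.9
Sigma = toeplitz_cov(rho, m)
X = rng.multivariate_normal(np.zeros(m), Sigma, size=N)
print(nmse(X.T @ X / N, Sigma))   # NMSE of the plain SCM, for reference
```

In a Monte Carlo run, the NMSE curves of the figures are obtained by averaging this criterion over many sample draws, for each estimator and each value of β.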
Another interesting result is that the optimal regularization parameter changes with the true covariance matrix: the larger the correlation coefficient ρ, the smaller the optimal regularization parameter, and conversely. This behavior is also visible in Figures 2(a), 2(b) and 2(c), where the number of samples is N = 48. Interestingly, for a larger N, the shrinkage estimator does not always outperform the FPE. Moreover, when N increases, the optimal value of β tends to 0, i.e. the shrinkage estimator reduces to the FPE, as illustrated in Figure 3 for ρ = 0.99 and N = 200. This can be explained by the fact that the FPE is a consistent ML estimator associated to a particular distribution of the samples, as shown in [9], [37], while, as shown by Equation (10), the shrinkage estimator does not derive from any known distribution. Furthermore, this confirms what was shown previously: maximizing the LF associated with the shrinkage estimator over both Σ and β is not a good way to optimize the regularization parameter β, since the optimization yields β = 0 or β = 1.

Conversely, when the number of samples N is small, the regularization plays an important role (as it is designed to) and significantly improves the performance, in the sense of the particular criterion used in this work, i.e. the NMSE. Besides, as shown in the following application, the regularization also allows one to deal with problems where N is smaller than the dimension m, which is impossible with the FPE since it

Fig. 2. Normalized mean square error of M̂ estimated by the FPE and the shrinkage FPE versus the parameter β of the shrinkage FPE, for different covariance matrices, i.e.
for different values of the correlation coefficient ρ, where N = 48 samples and the dimension of the data is m = 12, for Gaussian noise.

requires a matrix inversion in its definition.

Finally, Figure 4 illustrates the convergence of the shrinkage estimator towards the FPE when β tends to 0, by plotting the criterion C(β) = ‖Σ(β) − Σ_FP‖_F / ‖Σ_FP‖_F for different values of ρ and for m = 3 and N = 12.

B. Application to real STAP data
This section is devoted to the analysis of the performance of the shrinkage estimator on a real STAP data set. This performance is analyzed through the target detection problem.

Fig. 3. Normalized mean square error of M̂ estimated by the FPE and the shrinkage FPE versus the parameter β of the shrinkage FPE, for different covariance matrices, i.e. for different values of the correlation coefficient ρ, where N = 200 samples and the dimension of the data is m = 12, for Gaussian noise.

Fig. 4. Convergence of Σ(β) towards Σ_FP (which satisfies Tr(Σ_FP^{-1}) = m) when β → 0, for N = 12, m = 3. The criterion used is C(β) = ‖Σ(β) − Σ_FP‖_F / ‖Σ_FP‖_F.

STAP [33], [34] is a recent technique used in airborne phased-array radar to detect moving targets embedded in an interference background such as jamming or strong clutter. While conventional radars are capable of detecting targets both in the time domain, related to target range, and in the frequency domain, related to target velocity, STAP uses an additional domain (spatial information collected by an antenna array) related to the angular localization of the target. The joint processing of these space-time data, by appropriate two-dimensional adaptive filtering methods, allows stronger interference/clutter rejection and therefore improved target detection.

The STAP data are provided by the French agency DGA/MI: the clutter is real but the targets are synthetic. The number of sensors is S = 4 and the number of coherent pulses can be up to M = 64. The centre frequency and the bandwidth are respectively equal to f0 = 10 GHz and B = 5 MHz. The radar velocity is V = 100 m/s. The inter-element spacing is d = 0.
m and the pulse repetition frequency is f_r = 1 kHz. The Clutter-to-Noise Ratio (CNR) is equal to 20 dB. The maximum number of samples available for all scenarios is N = 408. In the different scenarios, targets with a Signal-to-Clutter Ratio (SCR) of −5 dB are present.

Fig. 5. log(Λ̂_S-FPE) for β = 0.5, 0.6, 0.7, 0.8, 0.9 and 1: angle/speed maps with parameters m = 256, N = 400. The colour scale is in dB with a dynamic of 30 dB from the maximum value of log(Λ̂_S-FPE).

The data under study contain 3 synthetic targets, at range cells 216, 256 and 296. We consider the range cell 256 under test.

The authors are grateful to DGA/MI for providing them with this set of real data.
Fig. 6. log(Λ̂_S-FPE-W) for β = 0.5, 0.6, 0.7, 0.8, 0.9 and 1: angle/speed maps with parameters m = 256, N = 400. The colour scale is in dB with a dynamic of 30 dB from the maximum value of log(Λ̂_S-FPE-W).

Moreover, the range cells 216 and 296 are kept in the set of secondary data, to illustrate the robustness of the shrinkage algorithm in the case of contaminated secondary data. For both algorithms under study, four guard cells are removed around the range cell 256. To detect the target and to analyze the performance of the shrinkage FPE, we use the Adaptive Normalized Matched Filter (ANMF), introduced by [22] and analyzed in [23]–[26]. It is given by

\[ \widehat{\Lambda}(\widehat{M}) = \frac{\big|\mathbf{p}^{H} \widehat{M}^{-1} \mathbf{y}\big|^{2}}{\big(\mathbf{p}^{H} \widehat{M}^{-1} \mathbf{p}\big)\big(\mathbf{y}^{H} \widehat{M}^{-1} \mathbf{y}\big)}, \tag{15} \]

where p is the so-called STAP steering vector, y is the observation under test, i.e. in this case the range cell 256, and M̂ is the covariance matrix estimator built on the set of secondary data, i.e. data that are assumed to be signal-free and i.i.d. The ANMF enjoys an invariance property: it is invariant to a multiplicative scale factor on the CM estimator. This implies that there is no need to impose any trace constraint on the various CM estimators. Let us recall that the shrinkage FPE has, by construction, a trace constraint on its inverse. Now, let us denote Λ̂_SCM-DL = Λ̂(M̂_SCM-DL(β)), Λ̂_S-FPE = Λ̂(M̂_S-FPE(β)) and Λ̂_S-FPE-W = Λ̂(M̂_S-FPE-W(β)), where

\[ \widehat{M}_{SCM\text{-}DL}(\beta) = \frac{1-\beta}{N} \sum_{n=1}^{N} \mathbf{x}_n \mathbf{x}_n^{H} + \beta\,\mathbf{I}, \]

M̂_S-FPE(β) is the unique solution of Equation (4) and M̂_S-FPE-W(β) is a solution of Equation (6).
Moreover, let us denote Λ̂_SCM the ANMF built with the classical SCM, which will play the role of a benchmark.
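For reference, the detector (15) and the diagonally loaded SCM can be sketched as follows (illustrative code, our own naming); the scale-invariance property mentioned above is easy to check on this implementation:

```python
import numpy as np

def anmf(p, y, M_hat):
    """ANMF statistic of Eq. (15); invariant to M_hat -> c * M_hat, and
    bounded by 1 (Cauchy-Schwarz in the M_hat^{-1} inner product)."""
    Mi = np.linalg.inv(M_hat)
    num = np.abs(p.conj() @ Mi @ y) ** 2
    den = np.real(p.conj() @ Mi @ p) * np.real(y.conj() @ Mi @ y)
    return num / den

def scm_dl(X, beta):
    """Diagonally loaded SCM: (1 - beta)/N * sum_n x_n x_n^H + beta I.
    Rows of X are the samples x_n^T."""
    N, m = X.shape
    return (1 - beta) / N * X.T @ X.conj() + beta * np.eye(m)
```

Since Λ̂ is unchanged when M̂ is multiplied by a scalar, any scale (or trace) convention chosen for the CM estimators leaves the detection maps unaffected.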
Fig. 7. log(Λ̂_SCM-DL) for β = 0.5, 0.6, 0.7, 0.8, 0.9 and 1: angle/speed maps with parameters m = 256, N = 400. The colour scale is in dB with a dynamic of 30 dB from the maximum value of log(Λ̂_SCM).

Figure 5 (resp. Figure 6) depicts the detection results obtained with Λ̂_S-FPE (resp. Λ̂_S-FPE-W), while Figure 7 gives the results for Λ̂_SCM-DL, for different speeds and different azimuths. More precisely, Figures 5, 6 and 7 represent the detection results for the range cell 256 in colour scale; the y-axis corresponds to the target angle (in degrees) while the x-axis corresponds to the target velocity (in m/s). These figures have been obtained for N = 400 secondary data, which means that it is a classical over-sampled case. Finally, for each case, results are given for 6 different values of the shrinkage parameter β, to highlight the impact of this parameter on the detection performance.

The first comment is that the well-known diagonal loading technique allows one to improve the clutter rejection, except for the SCM-DL on the clutter ridge. More interestingly, one can notice that the result for the SCM-DL is the same for all values of β, which means that there is no adaptive whitening (clutter cancellation) with this covariance matrix estimate. The result is the same as when plugging the identity matrix (β = 1) into the ANMF instead of an estimate: the clutter, which is mainly on the diagonal, is not removed. On the opposite, the S-FPE provides very interesting results since, according to the values of β, the clutter is more or less totally removed. The cases where β equals 0.7 and 0.8 provide the best clutter rejection.
On the other hand, one can notice that the poor results of the SCM-DL are due to the presence of targets in the secondary data, while the performance of the S-FPE, thanks to the robustness of the Tyler's estimator, is not affected by these contaminated data. Finally, the S-FPE and the S-FPE-W seem to provide very similar results (the differences are of negligible order), which could be explained by the fact that the resulting covariance matrix estimators are equal up to a scaling constant.

[Figure 8 panels: angle (deg) vs. speed (m/s), Trial 10, β ∈ {0.5, 0.6, 0.7, 0.8, 0.9, 1}, 200 secondary data.]
Fig. 8. log(Λ̂_S-FPE) for different values of β: angle/speed map with parameters m = 256, N = 200. The colour scale is in dB with a dynamic of 30 dB from the maximum value of log(Λ̂_S-FPE).

Then, since in a STAP context the ground clutter has been shown to be low-rank [38], it is possible to estimate the covariance matrix with fewer secondary data by using DL techniques. For this dataset, the estimated clutter rank, obtained from the Brennan rule [38], is equal to r = 45. This value is small in comparison with the full size of the clutter covariance matrix, m = SM = 256.

[Figure 9 panels: angle (deg) vs. speed (m/s), Trial 10, β ∈ {0.5, 0.6, 0.7, 0.8, 0.9, 1}, 200 secondary data.]
Fig. 9. log(Λ̂_S-FPE-W) for different values of β: angle/speed map with parameters m = 256, N = 200. The colour scale is in dB with a dynamic of 30 dB from the maximum value of log(Λ̂_S-FPE-W).

This is the purpose of figures 8, 9 and 10, where the number of secondary data used to estimate the CM is N = 200, which is less than the data dimension. Thus, without any DL technique, the CM estimate is not invertible. First, one can notice that the results for the DL-SCM are very similar to the previous ones, due to the fact that the term with the identity matrix prevails over the SCM. More importantly, concerning the S-FPE, the DL approach, for β large enough, makes it possible to run the FP algorithm, which requires a matrix inversion. Let us recall that, according to Theorem III.1, β has to be greater than 1 − N/m, i.e. approximately 0.22. Moreover, this leads to good results in terms of clutter rejection as well as of target detection; the "optimal" value of β has changed and is now closer to 0.8, optimal in the sense that there is a good clutter cancellation. Finally, previous comments concerning the robustness of the S-FPE to the contaminated data remain valid, as do the similar results between the S-FPE and the S-FPE-W.

[Figure 10 panels: angle (deg) vs. speed (m/s), Trial 10, β ∈ {0.5, 0.6, 0.7, 0.8, 0.9, 1}, 200 secondary data.]
Fig. 10. log(Λ̂_SCM-DL) for different values of β: angle/speed map with parameters m = 256, N = 200. The colour scale is in dB with a dynamic of 30 dB from the maximum value of log(Λ̂_SCM).

Now, to highlight the improvement brought by the shrinkage techniques, figure 11 depicts the results obtained by the "classical" ANMF built with the SCM, Λ̂_SCM, in the same context as for the other detectors, for N = 400 and N = 200. As expected, when N = 400, the performance is degraded in comparison with that of the other detectors. However, the target can be detected, but with a strong clutter level. Then, for a smaller number of secondary data, i.e. N = 200, Λ̂_SCM is not able to detect the target anymore.
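For completeness, the ANMF test statistic used by all the detectors compared here has the classical form Λ̂ = |pᴴ Σ̂⁻¹ x|² / ((pᴴ Σ̂⁻¹ p)(xᴴ Σ̂⁻¹ x)), which lies in [0, 1] by the Cauchy-Schwarz inequality. Below is a minimal sketch (variable names are ours), together with a check of the admissibility bound β > 1 − N/m recalled above.

```python
import numpy as np

def anmf_statistic(x, p, sigma_inv):
    """ANMF statistic for a cell under test x, a steering vector p
    and an inverse covariance matrix estimate sigma_inv."""
    num = np.abs(p.conj() @ sigma_inv @ x) ** 2
    den = np.real(p.conj() @ sigma_inv @ p) * np.real(x.conj() @ sigma_inv @ x)
    return float(num / den)

# Lower bound on the admissible shrinkage parameter (Theorem III.1):
m = 256
for N in (400, 200):
    beta_min = max(0.0, 1.0 - N / m)
    print(f"N = {N}: beta must exceed {beta_min:.3f}")
```

For N = 200 and m = 256 this gives 1 − 200/256 ≈ 0.219, consistent with the value 0.22 quoted in the text; for N = 400 the over-sampled case imposes no positive lower bound.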
[Figure 11 panels: angle (deg) vs. speed (m/s), Trial 10: (a) N = 400, (b) N = 200.]
Fig. 11. log(Λ̂_SCM): angle/speed map with parameters m = 256 for the range bin 268 and for different values of N. The colour scale is in dB with a dynamic of 30 dB from the maximum value of log(Λ̂_SCM).

V. CONCLUSION
In the context of covariance matrix estimation, this paper presents the derivation, namely the proofs of existence and uniqueness, of the shrinkage Fixed Point estimator. Contrary to the results presented in [4], this proof does not require any additional constraint on the trace of the shrinkage FP. However, this more general case has some limitations, since it is proved that the existence and uniqueness of the shrinkage FP are only valid for certain values of the shrinkage parameter. More precisely, the result is true for m/N < 1 and, when m/N ≥ 1, β has to be greater than 1 − N/m, which seems to be a realistic constraint. On the other hand, the performance of the shrinkage FP has been analyzed through Monte-Carlo simulations, and then it has been applied on STAP data for target detection purposes. These results show the interest of using such a shrinkage method, since it improves the detection performance and presents the main advantage of being able to deal with problems where the number of samples is less than the dimension of the observations.

APPENDIX A
PROOF OF PROPOSITION
III.2

Define on $D$ the functional $L$ by
$$ L(\Sigma) = \frac{\exp(-\mathrm{Tr}(\Sigma^{-1}))}{\det(\Sigma)}. \qquad (16) $$
Then, one trivially has
$$ F_\beta(\Sigma) = L(\Sigma)^{N\beta}\, F(\Sigma)^{1-\beta}. \qquad (17) $$
Moreover, note that the real-valued function $x \mapsto x\exp(-x)$ defined on $(\mathbb{R}^+)^*$ is upper bounded by one and tends to zero as $x$ tends either to zero or to $+\infty$. Therefore, $L(\Sigma)$ tends to zero as soon as one of the eigenvalues of $\Sigma$ tends to zero or $+\infty$. By the properties of $F$ proved in Theorem IV.1 of [2], $F$ is bounded over $D$ if $m/N < 1$ and one deduces at once Items (a) and (b).

APPENDIX B
PROOF OF PROPOSITION
III.3

By a trivial computation, one has, for every $\beta \in (0,1]$ and symmetric matrix $Q$, that
$$ \frac{\mathrm{Hess}_\beta(\Sigma)(Q)}{\beta F_\beta(\Sigma)} = \langle Q,\, d\nabla \log F_\beta(\Sigma)(Q)\rangle = -N\, \langle Q,\, \Sigma^{-1} V(\Sigma,Q,\beta)\, \Sigma^{-1}\rangle, $$
where
$$ V(\Sigma,Q,\beta) := Q - (1-\beta)\frac{m}{N}\sum_{n=1}^{N} \frac{x_n^\top \Sigma^{-1} Q \Sigma^{-1} x_n}{(x_n^\top \Sigma^{-1} x_n)^2}\, x_n x_n^\top. $$
Setting $R := \Sigma^{-1/2} Q\, \Sigma^{-1/2}$ and using the vectors $z_n$ defined in Eq. (8), one rewrites the above equation as
$$ \frac{\mathrm{Hess}_\beta(\Sigma)(Q)}{N\beta F_\beta(\Sigma)} = -\Big[\mathrm{Tr}(R^2) - (1-\beta)\frac{m}{N}\sum_{n=1}^{N} (z_n^\top R z_n)^2\Big]. $$
Multiplying Eq. (8) on the left and on the right by $R$, then taking the trace, one gets an expression for $\mathrm{Tr}(R^2)$ which is reported in the above equation. One therefore obtains that
$$ \frac{\mathrm{Hess}_\beta(\Sigma)(Q)}{N\beta F_\beta(\Sigma)} = -\Big[\beta\, \mathrm{Tr}(R\Sigma^{-1}R) + (1-\beta)\frac{m}{N}\sum_{n=1}^{N}\big(\|R z_n\|^2 - (z_n^\top R z_n)^2\big)\Big]. $$
Since the vectors $z_n$ have unit length, one deduces at once from the Cauchy-Schwarz inequality that the summation term in the above equality is non-negative. Hence Eq. (13). Therefore, if $\Sigma \in D$ is a critical point of $F_\beta$, $\mathrm{Hess}_\beta(\Sigma)$ is negative definite, implying that $\Sigma$ is a strict local maximum for $F_\beta$.

APPENDIX C
PROOF OF PROPOSITION
III.4

Let $P$ be the unique fixed point of $F$ such that $\mathrm{Tr}(P) = m$. Then, for every $\beta \in (0,1]$, one has $F_\beta(\Sigma(\beta)) \ge F_\beta(P)$ and $F(\Sigma(\beta)) \le F(P)$. We multiply the two inequalities and, after using Eq. (17) and the fact that $\mathrm{Tr}(\Sigma(\beta)^{-1}) = m$, we deduce that, for every $\beta \in (0,1]$,
$$ \det(\Sigma(\beta)) \le \frac{\exp(-m)}{L(P)}. $$

APPENDIX D
PROOF OF PROPOSITION
III.5

Set $\gamma := (1-\beta)m/N$ and assume first that Eq. (9) admits a solution $\Sigma$ in $D$. Set $d := x^\top \Sigma^{-1} x$ and $\bar{\Sigma} := \Sigma - \frac{\gamma}{d}\, x x^\top$. Since $\bar{\Sigma} \ge \beta I$, one gets that $\bar{\Sigma}$ is positive definite and $\bar{d} := x^\top \bar{\Sigma}^{-1} x$ is strictly positive. By a standard computation, one gets that $d = (1-\gamma)\bar{d}$, which implies that $\gamma < 1$.

Conversely, assume that $\gamma < 1$. A careful examination of the proof of Theorem III.1 reveals that the assumption $m/N < 1$ is required only to get that the functional $F$ can be extended by continuity on the boundary of $D$ by zero and is bounded over $D$. It turns out that, under the relaxed condition $\gamma < 1$, one can show that $F_\beta$ can be extended by continuity on the boundary of $D$ by zero and that $F_\beta(\Sigma)$ tends to zero as $\|\Sigma\|$ tends to $+\infty$. To proceed, one can also suppose without loss of generality that $N \le m$ and that the $(x_i)_{1\le i\le N}$ are linearly independent. For $\Sigma \in D$, set $G_\beta(\Sigma) := F_\beta(\Sigma)^{1/N}$. If $0 < \lambda_1(\Sigma) \le \cdots \le \lambda_m(\Sigma)$ denote the ordered eigenvalues of $\Sigma$, we define on $D$ the following functional for $\rho > 0$:
$$ H_\rho(\Sigma) := \Big(\prod_{n=1}^{N} \frac{\exp(-\rho/\lambda_n(\Sigma))}{\lambda_n(\Sigma)^{\gamma}}\Big)\cdot\Big(\prod_{n=N+1}^{m} \frac{\exp(-\rho/\lambda_n(\Sigma))}{\lambda_n(\Sigma)}\Big). \qquad (18) $$
Moreover, complete orthogonally the $(x_i)_{1\le i\le N}$ by an orthonormal set $(y_k)_{N+1\le k\le m}$ and let $Z$ be the $m \times m$ matrix whose columns are the $(x_i)_{1\le i\le N}$ followed by the $(y_k)_{N+1\le k\le m}$. Finally, set $\widetilde{\Sigma} := Z^{-1}\Sigma (Z^{-1})^\top$ for every $\Sigma \in D$. Then one has
$$ G_\beta(\Sigma) = \frac{\exp\big(-\beta\, \mathrm{Tr}((Z^\top Z)^{-1}\widetilde{\Sigma}^{-1})\big)}{\det(Z)^{2}\,\det(\widetilde{\Sigma})}\ \prod_{n=1}^{N}\big(\widetilde{\Sigma}^{-1}\big)_{nn}^{-\gamma}. $$
We finally need the following lemma before estimating $G_\beta$.

Lemma D.1
(i) If $A, B$ are two symmetric positive definite $m \times m$ matrices, then there exist two positive constants $a_1, a_2$, only depending on $A$, such that $a_1 \mathrm{Tr}(B) \le \mathrm{Tr}(AB) \le a_2 \mathrm{Tr}(B)$.
(ii) If $A$ is a symmetric positive definite $m \times m$ matrix, then $\prod_{j=1}^{N} A_{jj} \ge \prod_{j=1}^{N} \lambda_j(A)$.

Proof:
The right inequality in Item (i) follows from Von Neumann's trace theorem (cf. [39]) and the left inequality follows as an application of the right inequality with $A^{-1}$ and $A^{1/2}BA^{1/2}$ instead of $A$ and $B$. As for Item (ii), first consider $A_N = (A_{ij})_{1\le i,j\le N}$. According to Hadamard's inequality, one gets $\prod_{j=1}^{N} A_{jj} \ge \det(A_N)$. Then one concludes by using the eigenvalue interlacing theorem of [39], stating that $\lambda_j(A_N) \ge \lambda_j(A)$ for $1 \le j \le N$.

Now, we deduce at once from the previous lemma and Eq. (18) that $G_\beta(\Sigma) \le C\, H_{a\beta}(\widetilde{\Sigma})$, where the constants $C$ and $a$ only depend on $Z$. Note that the real-valued function $x \mapsto x\exp(-a\beta x)$ defined on $(\mathbb{R}^+)^*$ is upper bounded and tends to zero as $x$ tends either to zero or to $+\infty$. The same result holds true for $x \mapsto x^{\gamma}\exp(-a\beta x)$ since $0 < \gamma < 1$. One gets immediately that $H_{a\beta}$ (and then $G_\beta$) admits a global maximum on $D$, and one concludes the proof of Proposition III.5 by using the rest of the argument of Theorem III.1.

APPENDIX E
ANALYSIS OF THE GLOBAL MAXIMUM
In this section, we investigate the more general question of maximizing $F$ over the full state space $[0,1] \times D$. Taking into account the previous results, it is enough to maximize over $[0,1]$ the function $M(\beta) := \log F(\beta, \Sigma(\beta))$. We get the following.

Proposition E.1
The maximum of $F$ over $[0,1] \times D$ is reached for $\beta$ equal to zero or one and, therefore, the set of maximizers is either $\mathrm{Id}_m$ or the half-line $\mathbb{R}^{+*} P$, where $P$ is the unique fixed point of $F$ such that $\mathrm{Tr}(P) = m$.

Proof: We will differentiate the function $M(\cdot)$ and, for that, we next prove that $M(\cdot)$ is of class $C^1$ over $(0,1)$. To show that, it amounts to proving that the function $\beta \mapsto \Sigma(\beta)$ is itself of class $C^1$ over $(0,1)$. To see the latter, notice that $\Sigma(\beta)$ is defined implicitly by the equation
$$ \frac{\partial \log F}{\partial \Sigma}(\beta, \Sigma(\beta)) = 0, \qquad (19) $$
for $\beta \in (0,1)$. Since $F$ is real-analytic with respect to its arguments, if one is able to apply the implicit function theorem to the above equation, then one gets at once the desired regularity for $\beta \mapsto \Sigma(\beta)$ over $(0,1)$. In turn, to meet the conditions of the implicit function theorem, one must get the invertibility of the map $\frac{\partial^2 \log F}{\partial \Sigma^2}(\beta, \Sigma(\beta))$, i.e. that of $\mathrm{Hess}_\beta(\Sigma(\beta))$, which is established in Proposition III.3.

For $\beta \in (0,1)$, one has Eq. (19) and one deduces that
$$ M'(\beta) = \frac{\partial \log F}{\partial \beta}(\beta, \Sigma(\beta)) \qquad (20) $$
$$ = -Nm + m\sum_{n=1}^{N} \log\big(x_n^\top \Sigma^{-1}(\beta)\, x_n\big). \qquad (21) $$
We next prove that $M''(\beta) > 0$ for $\beta \in (0,1)$. First, taking the derivative with respect to $\beta$ in Eqs. (19) and (21) yields that, for $\beta \in (0,1)$,
$$ \frac{\partial^2 \log F}{\partial \beta\, \partial \Sigma}(\beta, \Sigma(\beta)) + \frac{\partial^2 \log F}{\partial \Sigma^2}(\beta, \Sigma(\beta))(\Sigma'(\beta), \cdot) = 0, $$
and on the other hand
$$ M''(\beta) = \frac{\partial^2 \log F}{\partial \beta\, \partial \Sigma}(\beta, \Sigma(\beta))(\Sigma'(\beta)). $$
One immediately deduces that, for $\beta \in (0,1)$,
$$ M''(\beta) = -\mathrm{Hess}_\beta(\Sigma(\beta))(\Sigma'(\beta)) > 0. $$
We deduce that $M$ is convex on $[0,1]$ and, therefore, achieves its maximum at $\beta$ equal to zero or one.

Remark E.1
According to the above proposition, one must compare one with $\prod_{n=1}^{N}\big(x_n^\top \Sigma^{-1} x_n\big)$ to decide whether the maximum is reached for $\beta = 0$ or $\beta = 1$.

ACKNOWLEDGEMENT
The authors would like to thank Dr. Romain Couillet for pointing out the necessary part of Proposition III.5.

REFERENCES

[1] D. Tyler, "A distribution-free M-estimator of multivariate scatter," The Annals of Statistics, vol. 15, no. 1, pp. 234–251, 1987.
[2] F. Pascal, Y. Chitour, J. Ovarlez, P. Forster, and P. Larzabal, "Covariance structure maximum-likelihood estimates in compound Gaussian noise: existence and algorithm analysis," Signal Processing, IEEE Transactions on, vol. 56, no. 1, pp. 34–48, Jan. 2008.
[3] Y. Abramovich and N. K. Spencer, "Diagonally Loaded Normalised Sample Matrix Inversion (LNSMI) for Outlier-Resistant Adaptive Filtering," in Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on, vol. 3. IEEE, 2007, pp. 1105–1108.
[4] Y. Chen, A. Wiesel, and A. O. Hero, "Robust shrinkage estimation of high-dimensional covariance matrices," Signal Processing, IEEE Transactions on, vol. 59, no. 9, pp. 4097–4107, 2011.
[5] A. Wiesel, "Unified framework to regularized covariance estimation in scaled Gaussian models," Signal Processing, IEEE Transactions on, vol. 60, no. 1, pp. 29–38, 2012.
[6] F. Pascal, L. Bombrun, J.-Y. Tourneret, and Y. Berthoumieu, "Parameter estimation for multivariate generalized Gaussian distributions," Signal Processing, IEEE Transactions on, vol. 61, no. 23, pp. 5960–5971, December 2013.
[7] Y. Abramovich and O. Besson, "Regularized covariance matrix estimation in complex elliptically symmetric distributions using the expected likelihood approach," ISAE, Tech. Rep., 2012.
[8] M. Mahot, F. Pascal, P. Forster, and J.-P. Ovarlez, "Asymptotic properties of robust complex covariance matrix estimates," Signal Processing, IEEE Transactions on, vol. 61, no. 13, pp. 3348–3356, July 2013.
[9] E. Ollila, D. Tyler, V. Koivunen, and H. Poor, "Complex elliptically symmetric distributions: Survey, new results and applications," Signal Processing, IEEE Transactions on, vol. 60, no. 11, pp. 5597–5625, November 2012.
[10] ——, "Compound-Gaussian clutter modeling with an inverse Gaussian texture distribution," Signal Processing Letters, IEEE, vol. 19, no. 12, pp. 876–879, December 2012.
[11] Y. Chitour and F. Pascal, "Exact maximum likelihood estimates for SIRV covariance matrix: existence and algorithm analysis," Signal Processing, IEEE Transactions on, vol. 56, no. 10, pp. 4563–4573, Oct. 2008.
[12] S. M. Kay, Fundamentals of Statistical Signal Processing - Detection Theory. Prentice-Hall PTR, 1998, vol. 2.
[13] M. Mahot, P. Forster, J.-P. Ovarlez, and F. Pascal, "Robustness analysis of covariance matrix estimates," European Signal Processing Conference (EUSIPCO), Aalborg, Denmark, August 2010.
[14] P. J. Huber, "Robust estimation of a location parameter," The Annals of Mathematical Statistics, vol. 35, no. 1, pp. 73–101, 1964.
[15] ——, "The 1972 Wald lecture. Robust statistics: A review," Annals of Mathematical Statistics, vol. 43, no. 4, pp. 1041–1067, August 1972.
[16] F. R. Hampel, "The influence curve and its role in robust estimation," Journal of the American Statistical Association, vol. 69, no. 346, pp. 383–393, June 1974.
[17] R. A. Maronna, "Robust M-estimators of multivariate location and scatter," Annals of Statistics, vol. 4, no. 1, pp. 51–67, January 1976.
[18] D. Kelker, "Distribution theory of spherical distributions and a location-scale parameter generalization," Sankhyā: The Indian Journal of Statistics, Series A, vol. 32, no. 4, pp. 419–430, Dec. 1970.
[19] K. Yao, "A representation theorem and its applications to spherically invariant random processes," Information Theory, IEEE Transactions on, vol. 19, no. 5, pp. 600–608, September 1973.
[20] E. Gomez, M. A. Gomez-Villegas, and J.-M. Marin, "A multivariate generalization of the power exponential family of distributions," Communications in Statistics. Theory and Methods, vol. 27, no. 3, pp. 589–600, 1998.
[21] R. Schmidt, "Multiple emitter localization and signal parameter estimation," in Proc. RADC, Spectral Estimation Workshop, Rome, NY, 1979, pp. 243–248.
[22] E. Conte, M. Lops, and G. Ricci, "Asymptotically Optimum Radar Detection in Compound-Gaussian Clutter," Aerospace and Electronic Systems, IEEE Transactions on, vol. 31, no. 2, pp. 617–625, April 1995.
[23] J. Liu, Z.-J. Zhang, Y. Yang, and H. Liu, "A CFAR Adaptive Subspace Detector for First-Order or Second-Order Gaussian Signals Based on a Single Observation," Signal Processing, IEEE Transactions on, vol. 59, no. 11, pp. 5126–5140, 2011.
[24] S. Kraut and L. Scharf, "The CFAR adaptive subspace detector is a scale-invariant GLRT," Signal Processing, IEEE Transactions on, vol. 47, no. 9, pp. 2538–2541, 1999.
[25] S. Kraut, L. L. Scharf, and L. T. Mc Whorter, "Adaptive subspace detectors," IEEE Trans.-SP, vol. 49, no. 1, pp. 1–16, January 2001.
[26] F. Pascal, J.-P. Ovarlez, P. Forster, and P. Larzabal, "Constant false alarm rate detection in spherically invariant random processes," in Proc. of the European Signal Processing Conf., EUSIPCO-04, Vienna, Sep. 2004, pp. 2143–2146.
[27] Y. Abramovich, "Controlled method for adaptive optimization of filters using the criterion of maximum SNR," Radio Eng. Electron. Phys, vol. 26, no. 3, pp. 87–95, 1981.
[28] B. D. Carlson, "Covariance matrix estimation errors and diagonal loading in adaptive arrays," Aerospace and Electronic Systems, IEEE Transactions on, vol. 24, no. 4, pp. 397–401, 1988.
[29] O. Ledoit and M. Wolf, "A well-conditioned estimator for large-dimensional covariance matrices," Journal of Multivariate Analysis, vol. 88, no. 2, pp. 365–411, 2004.
[30] Y. I. Abramovich and O. Besson, "Covariance matrix estimation in complex elliptic distributions using the expected likelihood approach," in Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, 2013, pp. 6476–6480.
[31] Y. Abramovich and O. Besson, "Regularized covariance matrix estimation in complex elliptically symmetric distributions using the expected likelihood approach, Part 1: The over-sampled case," Signal Processing, IEEE Transactions on, vol. 61, no. 23, pp. 5807–5818, 2013.
[32] O. Besson and Y. Abramovich, "Regularized covariance matrix estimation in complex elliptically symmetric distributions using the expected likelihood approach, Part 2: The under-sampled case," Signal Processing, IEEE Transactions on, vol. 61, no. 23, pp. 5819–5829, 2013.
[33] J. Ward, "Space-time adaptive processing for airborne radar," Lincoln Lab, MIT, Tech. Report, Dec 1994.
[34] R. Klemm, Principles of Space-Time Adaptive Processing. IET, 2002, no. 159.
[35] M. Struwe, Variational Methods: Applications to Nonlinear Partial Differential Equations and Hamiltonian Systems. Springer, 2008, vol. 34.
[36] R. Couillet and M. R. McKay, "Large Dimensional Analysis and Optimization of Robust Shrinkage Covariance Matrix Estimators," arXiv preprint arXiv:1401.4083, 2014.
[37] F. Pascal, P. Forster, J. Ovarlez, and P. Larzabal, "Performance analysis of covariance matrix estimates in impulsive noise," Signal Processing, IEEE Transactions on, vol. 56, no. 6, pp. 2206–2217, Jun. 2008.
[38] L. Brennan and F. Staudaher, "Subclutter visibility demonstration," Adaptive Sensors, Inc., Santa Monica, CA, Tech. Rep. RL-TR-92-21, 1992.
[39] R. A. Horn and C. R. Johnson,