Joint Estimation of Location and Scatter in Complex Elliptical Distributions: A robust semiparametric and computationally efficient R-estimator of the shape matrix
Journal of Signal Processing Systems manuscript No. (will be inserted by the editor)

Stefano Fortunati · Alexandre Renaux · Frédéric Pascal
Abstract The joint estimation of the location vector and the shape matrix of a set of independent and identically Complex Elliptically Symmetric (CES) distributed observations is investigated from both the theoretical and computational viewpoints. This joint estimation problem is framed in the original context of semiparametric models, which allows us to handle the (generally unknown) density generator as an infinite-dimensional nuisance parameter. In the first part of the paper, a computationally efficient and memory-saving implementation of the robust and semiparametric efficient R-estimator of shape matrices is derived. Building upon this result, in the second part, a joint estimator, relying on Tyler's M-estimator of location and on the R-estimator of shape matrix, is proposed, and its Mean Squared Error (MSE) performance is compared with the Constrained Semiparametric Cramér-Rao Bound (CSCRB).

The work of S. Fortunati, A. Renaux and F. Pascal has been partially supported by DGA under grant ANR-17-ASTR-0015.
Stefano Fortunati: Université Paris-Saclay, CNRS, CentraleSupélec, Laboratoire des signaux et systèmes, 91190, Gif-sur-Yvette, France & DR2I-IPSA, 94200, Ivry-sur-Seine, France. E-mail: [email protected]
Alexandre Renaux and Frédéric Pascal: Université Paris-Saclay, CNRS, CentraleSupélec, Laboratoire des signaux et systèmes, 91190, Gif-sur-Yvette, France. E-mail: [email protected], [email protected]

1 Introduction

Inferring the correlation structure of a set of centered data is a key step in many signal processing and machine learning procedures. Among others, radar/sonar detection, image segmentation, dimension reduction, distance learning and clustering rely on the estimation of the covariance/correlation matrix of an acquired data set [2]. Along with the need for an estimate of the covariance matrix, there is another common aspect in all the above-mentioned applications: the non-Gaussian and heavy-tailed nature of the data. As a consequence, popular Gaussian and pseudo-Gaussian inference procedures may present a dramatic performance decay, as extensively shown in the statistics and signal processing literature (see e.g. [33] and the references therein).

Motivated by a wide range of experimental evidence and measurement campaigns, the (Real or Complex) Elliptically Symmetric (ES) model has recently been adopted to characterize the statistical behavior of the data. The RES and CES models have in fact been proved able to capture the data heavy-tailedness in a large variety of applications, such as radar/sonar [23, 29], hyperspectral imaging [8, 17] and clustering [28, 30], just to cite a few. From now on, in this paper, we will focus our attention only on complex-valued, CES-distributed data sets. This choice allows us to work in the most general framework, since the obtained results can easily be "brought back" to the real-valued case.

Together with its generality, the second feature that has placed the elliptical model in the spotlight of the signal processing and machine learning communities is its "parsimony" in terms of required parameters. In fact, the CES model is fully specified by two finite-dimensional parameters, i.e.
a location vector and a covariance/scatter matrix (as in the classical Gaussian model), and by an infinite-dimensional functional parameter, usually called the density generator, characterizing the data heavy-tailedness. To better understand the role of this infinite-dimensional term, let us take a step back to introduce the notion of semiparametric model.

Let {z_l}_{l=1}^L be a set of L independent and identically distributed (i.i.d.) N-dimensional observations sharing the same probability density function (pdf), i.e. ℂ^N ∋ z_l ∼ p_Z, ∀l. A parametric model P_θ ≜ {p_Z | p_Z(z_l; θ), θ ∈ Θ} is then defined as a family of pdfs parameterized by a finite-dimensional vector θ ∈ Θ. As a classical example, in multivariate Gaussian-based inference, θ is made up of the mean vector μ and of the covariance/scatter matrix Σ.

However, a parametric model is generally too "narrow" and it fails to take into account all the actual uncertainty about the data distribution that is generally present in practical scenarios. Semiparametric models have then been introduced to provide an additional (functional) degree of freedom [1]. A semiparametric model P_{θ,h} ≜ {p_Z | p_Z(z_l; θ, h), θ ∈ Θ, h ∈ G} is a family of pdfs parameterized by a finite-dimensional vector θ ∈ Θ (as in the classical parametric case) and by a function h belonging to some suitable function space G. Usually, in most applications, θ ∈ Θ is the parameter vector of interest, while h ∈ G can be considered as a nuisance function that "contains" the missing knowledge of the functional form of the data pdf p_Z. Consequently, inference procedures in semiparametric models aim at estimating/testing for θ ∈ Θ in the presence of an unknown function h ∈ G whose estimation is not strictly required.

It is now immediate to realize that the CES model can be framed as a semiparametric model [3, 10, 12].
Formally, the pdf p_Z of a CES-distributed random vector z_l ∈ ℂ^N can be expressed as [22]:

p_Z(z_l | μ, Σ, h) = |Σ|^{−1} h((z_l − μ)^H Σ^{−1} (z_l − μ)),   (1)

where, as said before, the finite-dimensional parameter of interest θ ∈ Θ is composed of the location vector μ and of the scatter matrix Σ that represents the correlation structure of the data, while the nuisance density generator h belongs to the set

G = { h : ℝ⁺ → ℝ⁺ | ∫_0^∞ t^{N−1} h(t) dt < ∞, ∫ p_Z = 1 }.   (2)

Two considerations are now in order:

– The identifiability issue: the scatter matrix Σ and the density generator h are not jointly identifiable. Consequently, only scaled versions, usually called shape matrices, V ≜ Σ/s(Σ), can be estimated [22]. According to our recent work [6], from now on we consider the shape matrix

V ≜ Σ/[Σ]_{1,1},   (3)

i.e. the one obtained from the scatter matrix by constraining its first top-left element to be equal to one.

– Augmented complex representation of θ: following the rules of the Wirtinger calculus [13, 15, 27], in order to take into account the complex-valued nature of the location vector μ and of the shape matrix V, the finite-dimensional parameter θ has to be built as [6]:

θ ≜ (μ^⊤, μ^H, vec₋(V)^⊤)^⊤ ∈ Θ ⊆ ℂ^q,   (4)

where q = N(N + 2) − 1 and ℂ^q is a complex vector space over the real field of dimension q [13, 15]. Here, the operator vec₋(A) denotes the (N² − 1)-dimensional vector obtained from vec(A) by deleting its first element, i.e. vec(A) ≜ [a_{1,1}, vec₋(A)^⊤]^⊤; in general, in this paper we adopt the same notation used in our previous work [6]. Note that the "− 1" in q is due to the fact that we constrain the first top-left element of V to be equal to 1, so it does not have to be estimated.

Building upon the previous considerations, the semiparametric CES model [3] can be cast as:

P_{θ,h} = { p_Z | p_Z(z_l | θ, h) = |V|^{−1} h((z_l − μ)^H V^{−1} (z_l − μ)); θ ∈ Θ, h ∈ G }.   (5)

Estimating θ ∈ Θ in the presence of different "degrees of uncertainty" on the density generator h is a well-known problem in robust statistics and signal processing. The most popular class of robust estimators of θ ∈ Θ belongs to the family of M-estimators [14]; it was first proposed by Maronna in [18] and further developed and investigated by Tyler [31]. However, if on one hand the Maronna/Tyler M-estimators have the remarkable robustness property, on the other hand they fail to be semiparametrically efficient, i.e. their Mean Squared Error (MSE) does not achieve the Semiparametric Cramér-Rao Bound [3, 4]. In order to fill this gap, in their seminal work Hallin, Oja and Paindaveine [12] proposed a new class of rank-based R-estimators of the shape matrix for a set of centered RES-distributed data, able to be both distributionally robust and (almost) semiparametrically efficient. The real-valued R-estimator proposed in [12] has been extended to the case of CES-distributed data in our recent work [6], where a theoretical and simulation-based analysis of its "finite-sample" performance has been provided as well.

Following the trail of [6, 12], the present paper has two main goals:

1. Derive a "computationally efficient" version of the complex-valued shape matrix R-estimator proposed in [6];
2.
Investigate the joint estimation of the location parameter μ and of the shape matrix V in the presence of an unknown density generator h.

Having a "computationally efficient" implementation of an estimator is of fundamental importance in real-time applications and in high-dimensional data sets. The new version of the R-estimator of the shape matrix proposed in Section 2 of this paper is computationally faster and more "memory saving" than the previous implementation in [6], making its exploitation possible in a wider range of applications. The paper continues with an exhaustive theoretical investigation of the statistical interrelations underlying the joint semiparametric estimation of the location vector μ and of the shape matrix V, provided in Section 3, while a robust, semiparametric (and computationally) efficient joint estimator of location and shape is discussed in Section 4. The numerical analysis of the performance of the proposed joint estimator is presented in Section 5. Finally, some concluding remarks are collected in Section 6.

We conclude this introductory Section with three paragraphs summarizing some useful notation and definitions that serve as prerequisites for the comprehension of the material presented in the rest of the paper.

Algebraic notation: for the sake of consistency with our previous works, in the rest of this paper we adopt the same notation already introduced in [6, 7]. In addition to the list of symbols detailed in [6], we will make extensive use of some specific matrices whose definitions are collected below.
In particular:

P ≜ [e_2 | e_3 | · · · | e_{N²}]^⊤,   (6)

where e_i is the i-th vector of the canonical basis of ℝ^{N²}; the projection matrix

Π⊥_{vec(I_N)} = I_{N²} − N^{−1} vec(I_N) vec(I_N)^⊤;   (7)

and

L_V = P (V^{−⊤/2} ⊗ V^{−1/2}) Π⊥_{vec(I_N)},   (8)

where V is the shape matrix previously introduced. (Footnotes: the interested reader can find the Matlab and Python code related to our implementation of this R-estimator for both RES- and CES-distributed data at https://github.com/StefanoFor. The second of the two goals above has been partially addressed in our related conference paper [7].)

CES-related notation: without any claim of completeness, we collect below the basic properties of and notation for CES-distributed random vectors. We refer the reader to the excellent review paper [22] for additional material. Let θ₀ ≜ (μ₀^⊤, μ₀^H, vec₋(V_{0,1})^⊤)^⊤ be the "true" parameter vector to be estimated and let h₀ be the actual (and unknown) density generator. Let ℂ^N ∋ z ∼ p₀(z) ≡ p_Z(z; θ₀, h₀) ≡ CES_N(μ₀, V_{0,1}, h₀) be a CES-distributed random vector parameterized by a location vector μ₀, a shape matrix V_{0,1} ≜ Σ₀/[Σ₀]_{1,1}, where Σ₀ represents the relevant scatter matrix, and a density generator h₀ ∈ G. Then, z satisfies the following stochastic representation:

z =_d μ₀ + √Q Σ₀^{1/2} u,   (9)

where u ∼ U(ℂS^{N−1}) is a complex random vector uniformly distributed on the unit N-sphere and =_d stands for "has the same distribution as". The second-order modular variate Q is independent of u and such that (s.t.):

Q =_d (z − μ₀)^H Σ₀^{−1} (z − μ₀) ≜ Q₀.   (10)

Moreover, Q₀ is distributed according to the following pdf:

p_{Q₀}(q) = π^N Γ(N)^{−1} q^{N−1} h₀(q),   (11)

where Γ(·) is the Gamma function. For any h ∈ G, the function ψ is defined as

ψ(t) ≜ d ln h(t)/dt.
(12)

In particular, ψ₀ denotes the function associated with the true density generator h₀. Finally, the expectation operator of any measurable function f with respect to p₀(z) is indicated as E{f(z)} ≜ ∫ f(z) p_Z(z; θ₀, h₀) dz.

Ranks: the concept of the ranks of a set of relevant random variables is a useful tool in non-parametric statistics, and numerous works can be found on this topic (see [9], [32, Ch. 13] and references therein). Far be it from us to propose a comprehensive overview of the use of ranks in robust statistics; in the following we limit ourselves to introducing their definition, since they will play a crucial role in the definition of the R-estimator of the shape matrix. Let {a_1, a_2, …, a_L} be a set of L continuous i.i.d. random variables with unspecified distribution. Let us rearrange the variables a_l, l = 1, …, L in ascending order a_{L(1)} < a_{L(2)} < · · · < a_{L(L)} and, consequently, build the vector of order statistics as v_A ≜ [a_{L(1)}, a_{L(2)}, …, a_{L(L)}]^⊤. Then, the rank r_l ∈ ℕ∖{0} of a_l is the position index of a_l in v_A.

2 A computationally efficient R-estimator for shape matrices

Building upon the seminal work of Hallin, Oja and Paindaveine [12], in our recent papers [5, 6] a robust and semiparametric efficient R-estimator V̂_{1,R} of the shape matrix V_{0,1} of CES-distributed data has been proposed and its properties investigated. This R-estimator has its roots in Le Cam's theory of efficient "one-step" estimators [16] and consequently it can be expressed as a linear combination of two terms:

vec(V̂_{1,R}) = vec(V̂*) + L^{−1/2} Υ̂_C^{−1} Δ̃_C^{V̂*},   (13)

where V̂* is a √L-consistent preliminary estimator that provides V̂_{1,R} with the consistency property, while the linear correction term L^{−1/2} Υ̂_C^{−1} Δ̃_C^{V̂*} makes V̂_{1,R} semiparametric efficient.
Let us have a closer look at the two quantities which constitute the linear correction term (all the details can be found in [6]). The (N² − 1)-dimensional vector Δ̃_C^{V̂*} is the "distribution-free" version of the efficient central sequence [1] and it can be explicitly expressed as:

Δ̃_C^{V̂*} ≜ (1/√L) L_{V̂*} Σ_{l=1}^L K_h(r*_l/(L + 1)) vec(û*_l (û*_l)^H),   (14)

where {r*_l}_{l=1}^L are the ranks of the random variables {Q̂*_l}_{l=1}^L defined as:

Q̂*_l ≜ (z_l − μ̂*)^H [V̂*]^{−1} (z_l − μ̂*),   (15)

and μ̂* and V̂* are two √L-consistent preliminary estimators of the location vector μ₀ and of the shape matrix V_{0,1}, respectively. The random vectors {û*_l}_{l=1}^L are given by:

û*_l ≜ (Q̂*_l)^{−1/2} [V̂*]^{−1/2} (z_l − μ̂*).   (16)

The function K_h : (0, 1) → ℝ⁺ is the so-called score function, and it is a key element in guaranteeing the robustness of V̂_{1,R}. We refer the reader to [6] for further details on the assumptions that a score function has to satisfy and on how to build it starting from the set of density generators G.
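As a concrete illustration of Eqs. (15) and (16) and of the ranks defined in Section 1, the following NumPy sketch computes Q̂*_l, û*_l and {r*_l}; this is our own minimal illustration (not the released code linked above), where `mu_prelim` and `V_prelim` stand for the generic preliminary estimators μ̂* and V̂*:

```python
import numpy as np

def rank_statistics(Z, mu_prelim, V_prelim):
    """Compute Q*_l (Eq. 15), u*_l (Eq. 16) and the ranks r*_l for a set of
    complex samples Z of shape (L, N), given preliminary estimates of the
    location (mu_prelim) and of the shape matrix (V_prelim)."""
    L, N = Z.shape
    Zc = Z - mu_prelim                                   # center the data
    V_inv = np.linalg.inv(V_prelim)
    # inverse Hermitian square root via eigendecomposition
    w, U = np.linalg.eigh(V_prelim)
    V_inv_sqrt = (U / np.sqrt(w)) @ U.conj().T
    Q = np.real(np.einsum('li,ij,lj->l', Zc.conj(), V_inv, Zc))    # Eq. (15)
    Uvec = (V_inv_sqrt @ Zc.T).T / np.sqrt(Q)[:, None]             # Eq. (16)
    ranks = np.empty(L, dtype=int)
    ranks[np.argsort(Q)] = np.arange(1, L + 1)           # rank = position in the ordered sample
    return Q, Uvec, ranks
```

Note that each û*_l has unit norm by construction, since (û*_l)^H û*_l = (Q̂*_l)^{−1} (z_l − μ̂*)^H [V̂*]^{−1} (z_l − μ̂*) = 1.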
The (N² − 1) × (N² − 1) matrix Υ̂_C represents the "distribution-free" approximation of the semiparametric Fisher Information Matrix (SFIM) and it is given by [6, Eq. (52)]:

Υ̂_C ≜ α̂_C L_{V̂*} L_{V̂*}^H,   (17)

where α̂_C is a scalar that can be obtained as [6, Eq. (53)]:

α̂_C = ‖Δ̃_C^{V̂* + L^{−1/2} H_C} − Δ̃_C^{V̂*}‖ / ‖L_{V̂*} L_{V̂*}^H vec(H_C)‖,   (18)

and H_C is a "small-perturbation" Hermitian matrix s.t. [H_C]_{1,1} = 0. (The choice of these preliminary estimators and their impact on the asymptotic performance of V̂_{1,R} will be extensively discussed in the next Sections.)

By substituting Eqs. (14) and (17) into Eq. (13), the R-estimator V̂_{1,R} can be explicitly rewritten as (see [6, Eq. (54)]):

vec(V̂_{1,R}) = vec(V̂*) + (1/(L α̂_C)) [L_{V̂*} L_{V̂*}^H]^{−1} L_{V̂*} Σ_{l=1}^L K_h(r*_l/(L + 1)) vec(û*_l (û*_l)^H).   (19)

For an in-depth discussion of the semiparametric efficiency and the robustness properties characterizing V̂_{1,R}, we refer the reader to our previous works [5, 6] and to the related statistical literature [10, 11, 12]. Here we focus our attention on an important aspect that has not been fully addressed yet: the computational cost underlying the calculation of Eq. (19). As already noted in [6, Sec. V.C], there is a main (computational) drawback in Eqs. (19) and (18) that really stands out: to evaluate the N × N matrix V̂_{1,R} (or, equivalently, its vectorized version vec(V̂_{1,R})), we have to calculate the (N² − 1) × N² matrix L_{V̂*}. This may become a cumbersome bottleneck in many practical applications.

Fortunately, as proved in Appendix A of this paper, it is possible to recast Eqs. (19) and (18) in order to avoid the calculation of L_{V̂*}. In particular, a computationally efficient "matrix version" of the R-estimator V̂_{1,R} can be expressed as:

V̂_{1,R} = V̂* + (1/α̂_C) (W − [W]_{1,1} V̂*),   (20)

α̂_C = ‖z_{V̂* + L^{−1/2} H_C} − z_{V̂*}‖ / ‖vec((V̂*)^{−1} H_C (V̂*)^{−1} − N^{−1} tr((V̂*)^{−1} H_C) (V̂*)^{−1})‖,   (21)

where:

W ≜ L^{−1/2} (V̂*)^{1/2} R (V̂*)^{1/2},   (22)

z_{V̂*} ≜ vec((V̂*)^{−1/2} R (V̂*)^{−1/2} − ζ (V̂*)^{−1}),   (23)

R ≜ (1/√L) Σ_{l=1}^L K_h(r*_l/(L + 1)) û*_l (û*_l)^H,   (24)

ζ ≜ (1/(N√L)) Σ_{l=1}^L K_h(r*_l/(L + 1)).   (25)

It is worth stressing that Eqs. (20) and (21) involve only N × N matrices and N²-dimensional vectors, and this fact leads to a great reduction in terms of computational load with respect to Eqs. (19) and (18) that, on the contrary, rely on the calculation of matrices whose linear dimension is of order N². A quantitative analysis of the reduction of the computational load will be provided in Sec. 5. We conclude this Section by noticing that an expression similar to Eq. (20), characterizing the R-estimator of the shape matrix of a set of Real Elliptically Symmetric (RES) distributed data, was first provided in [12, Eq. (3.9)].
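The matrix-form update of Eqs. (20)-(25) can be sketched as follows. This is our illustrative implementation (the authors' released code is at the GitHub link given in the text): the score function `kh` is a generic callable, the data are assumed already centered, and H_C is drawn as a small random Hermitian perturbation with [H_C]_{1,1} = 0, as required by Eq. (21):

```python
import numpy as np

def inv_sqrt_herm(V):
    # inverse Hermitian square root of a positive definite matrix
    w, U = np.linalg.eigh(V)
    return (U / np.sqrt(w)) @ U.conj().T

def r_one_step(Z, V_prelim, kh):
    """One-step R-estimator of the shape matrix, Eqs. (20)-(25).
    Z: (L, N) complex array of already-centered observations;
    V_prelim: preliminary shape estimate with [V]_{1,1} = 1;
    kh: score function on (0, 1)."""
    L, N = Z.shape

    def r_and_zeta(V):
        # rank statistics at a given shape matrix: Eqs. (15)-(16), (24)-(25)
        Q = np.real(np.einsum('li,ij,lj->l', Z.conj(), np.linalg.inv(V), Z))
        U_ = (inv_sqrt_herm(V) @ Z.T).T / np.sqrt(Q)[:, None]
        ranks = np.empty(L)
        ranks[np.argsort(Q)] = np.arange(1, L + 1)
        scores = kh(ranks / (L + 1))
        outer = U_[:, :, None] * U_.conj()[:, None, :]     # u_l u_l^H, shape (L, N, N)
        R = (scores[:, None, None] * outer).sum(axis=0) / np.sqrt(L)   # Eq. (24)
        zeta = scores.sum() / (N * np.sqrt(L))             # Eq. (25)
        return R, zeta

    def z_stat(V):
        # Eq. (23): vec(V^{-1/2} R V^{-1/2} - zeta V^{-1})
        R, zeta = r_and_zeta(V)
        Vis = inv_sqrt_herm(V)
        return (Vis @ R @ Vis - zeta * np.linalg.inv(V)).ravel()

    # small random Hermitian perturbation H_C with [H_C]_{1,1} = 0
    rng = np.random.default_rng(0)
    H = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
    H = 0.05 * (H + H.conj().T)
    H[0, 0] = 0.0

    Vinv = np.linalg.inv(V_prelim)
    denom = Vinv @ H @ Vinv - (np.trace(Vinv @ H) / N) * Vinv        # Eq. (21), denominator
    alpha = (np.linalg.norm(z_stat(V_prelim + H / np.sqrt(L)) - z_stat(V_prelim))
             / np.linalg.norm(denom.ravel()))

    R, _ = r_and_zeta(V_prelim)
    w, U = np.linalg.eigh(V_prelim)
    V_half = (U * np.sqrt(w)) @ U.conj().T
    W = V_half @ R @ V_half / np.sqrt(L)                              # Eq. (22)
    return V_prelim + (W - W[0, 0] * V_prelim) / alpha                # Eq. (20)
```

A useful sanity check: since [V̂*]_{1,1} = 1, the update leaves the (1,1) entry unchanged, so the output automatically satisfies the shape constraint [V̂_{1,R}]_{1,1} = 1.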
Remark: the interested reader can find our Matlab implementation of the computationally efficient R-estimator V̂_{1,R} for the shape matrix of both Real and Complex Elliptically Symmetric distributed data at https://github.com/StefanoFor.

3 On the joint estimation of location and shape

After having introduced a computationally efficient version of the R-estimator of the shape matrix V_{0,1}, in this Section we focus on a different, yet interrelated, topic: what is the impact of not knowing the location vector μ₀ when estimating V_{0,1}? Along with its theoretical implications, the answer to this question has a practical importance as well. As shown in the previous Section (see Eqs. (15) and (16)), the R-estimator V̂_{1,R} relies on μ̂*, i.e. a √L-consistent preliminary estimator of μ₀. However, Section 2 does not provide any suggestion on which specific estimator μ̂* should be chosen among all the possible √L-consistent ones. In this Section, then, we provide the theoretical framework that will allow us to make a sound choice for μ̂*.

Let us start by formalizing the problem. Let {z_l}_{l=1}^L be a set of CES-distributed vectors such that ℂ^N ∋ z_l ∼ p₀ ≡ CES_N(μ₀, V_{0,1}, h₀), ∀l, where μ₀ and V_{0,1} have to be considered as two finite-dimensional parameters of interest while h₀ is a functional nuisance term.

The two fundamental questions underlying the above-mentioned joint estimation problem are:

1. What is the impact of not knowing h₀ on the joint estimation of (μ₀, V_{0,1})?
2. What is the (asymptotic) impact that the lack of knowledge of μ₀ has on the estimation of V_{0,1}, and vice versa?

To answer these two questions, we need to introduce the semiparametric efficient score vector s̄_{θ₀} and the semiparametric Fisher Information Matrix (SFIM) Ī(θ₀|h₀).
As discussed in the relevant statistical literature for a generic semiparametric model [1], and recently investigated for the specific CES model [3, 4], the semiparametric efficient score vector for the joint estimation of the location and shape matrix of a set of CES-distributed data is given by:

s̄_{θ₀} = [s̄_{μ₀}^⊤, s̄_{μ₀*}^⊤, s̄_{vec(V_{0,1})}^⊤]^⊤ = s_{θ₀} − Π(s_{θ₀}|T_{h₀}),   (26)

where s_{θ₀} is the "classical" score vector defined, by means of the Wirtinger derivatives, as [13]:

[s_{θ₀}]_i ≜ ∂ ln p_Z(z; θ, h₀)/∂θ*_i |_{θ=θ₀}, i = 1, …, q,   (27)

θ₀ is given in Eq. (4) and q = N(N + 2) − 1. The term Π(s_{θ₀}|T_{h₀}) indicates the orthogonal projection of s_{θ₀} onto the nuisance tangent space T_{h₀} of the CES model P_{θ,h} in Eq. (5), evaluated at the true density generator h₀. Specifically, Π(s_{θ₀}|T_{h₀}) quantifies the loss of information on the estimation of θ₀ due to the lack of knowledge of h₀. In our previous work [3, Sec. III.A], we proved the following facts:

1. The projection of s_{μ₀} and s_{μ₀*} onto T_{h₀} is equal to zero:

Π(s_{μ₀}|T_{h₀}) = Π(s_{μ₀*}|T_{h₀}) = 0_N.   (28)

This implies that the lack of knowledge of h₀ does not have any impact on the (asymptotic) estimation of the location parameter μ₀.

2. The projection of s_{vec(V_{0,1})} onto T_{h₀} is generally different from zero and it is given by:

Π(s_{vec(V_{0,1})}|T_{h₀}) = −(1 + N^{−1} Q₀ ψ₀(Q₀)) vec(V_{0,1}^{−1}),   (29)

where ψ₀ is defined through Eq. (12) while Q₀ is given in Eq. (10). Consequently, not knowing h₀ does have an impact on the estimation of the shape matrix V_{0,1}.

Points 1) and 2) answer the first question.

To address the second question about the (asymptotic) cross-information between μ₀ and V_{0,1}, we need to check the structure of the SFIM Ī(θ₀|h₀). The SFIM for the joint estimation of μ₀ and V_{0,1} in the CES semiparametric model P_{θ,h} in Eq. (5) has been evaluated in [3, Sec. III.C] as:

Ī(θ₀|h₀) ≜ E{s̄_{θ₀} s̄_{θ₀}^H} = [ Ī(μ₀|h₀)  0_{2N×(N²−1)} ; 0_{(N²−1)×2N}  Ī(vec(V_{0,1})|h₀) ],   (30)

Ī(μ₀|h₀) = (E{Q₀ ψ₀(Q₀)²}/N) [ V_{0,1}^{−1}  0_{N×N} ; 0_{N×N}  V_{0,1}^{−*} ],   (31)

Ī(vec(V_{0,1})|h₀) = (E{Q₀² ψ₀(Q₀)²}/(N(N + 1))) L_{V_{0,1}} L_{V_{0,1}}^H
= (E{Q₀² ψ₀(Q₀)²}/(N(N + 1))) P [V_{0,1}^{−⊤} ⊗ V_{0,1}^{−1} − N^{−1} vec(V_{0,1}^{−1}) vec(V_{0,1}^{−1})^H] P^⊤,   (32)

where, as before, the function ψ₀ is given through Eq. (12) while Q₀ is given in Eq. (10). Eq. (30) clearly shows that the efficient SFIM Ī(θ₀|h₀) is a block-diagonal matrix, i.e. the cross-information terms between the location μ₀ and the shape matrix V_{0,1} are equal to zero.
Consequently, the relevant estimation problems are (asymptotically) decorrelated and can be considered as two separate estimation problems. This fact greatly simplifies the implementation of a practical joint estimation algorithm. In fact, in estimating the shape matrix V_{0,1}, the true (and generally unknown) location vector μ₀ can be substituted by any of its √L-consistent estimators without any impact on the (asymptotic) performance of the estimator of V_{0,1}. Of course, the vice versa holds true as well, i.e. any √L-consistent estimator of V_{0,1} can be used in place of the true shape matrix without any (asymptotic) impact on the estimation of μ₀. This important theoretical result will be exploited in the next Section to implement a robust, semiparametric efficient joint estimator of the location and shape matrix of CES-distributed data.

4 Robust and semiparametric efficient joint estimation of location and shape

Robust estimation of location and shape in elliptical distributions has been a well-known topic in statistics and signal processing since the seminal paper of Maronna [18]. In particular, in [18], a general class of joint M-estimators of μ₀ and V_{0,1} (in the presence of an unknown density generator h₀) has been introduced as the "fixed-point" solution of the following system of equations:

Σ_{l=1}^L u_1(Q̂_l^{1/2}) (z_l − μ̂) = 0,   (33)

L^{−1} Σ_{l=1}^L u_2(Q̂_l) (z_l − μ̂)(z_l − μ̂)^H = V̂,   (34)

where, according to Eq. (10), Q̂_l = (z_l − μ̂)^H V̂^{−1} (z_l − μ̂), and {z_l}_{l=1}^L is the set of available CES-distributed observations such that ℂ^N ∋ z_l ∼ p₀ ≡ CES_N(μ₀, V_{0,1}, h₀), ∀l. The functions u_1 and u_2 have to satisfy a given set of assumptions that guarantees the existence and the uniqueness of the solution of Eqs.
(33) and (34) (see [18] for the real case and [22] for the extension to the complex one).

4.1 Tyler's joint M-estimator of μ₀ and V_{0,1}

Among the different possible choices for u_1 and u_2, Tyler in [31] (see also [8], [20] and [28]) showed that the functions u_1(Q) = Q^{−1/2} and u_2(Q) = N Q^{−1} lead to the "minimax robust" M-estimator of the location and shape. Specifically, by defining

Q̂_l^{(k)} = (z_l − μ̂^{(k)})^H [V̂_1^{(k)}]^{−1} (z_l − μ̂^{(k)}),   (35)

where k indicates the iteration number, we have that Tyler's joint M-estimator of location and shape, i.e. (μ̂_Ty, V̂_{1,Ty}), can be obtained as the convergence point (k → ∞) of the following iterations:

μ̂_Ty^{(k+1)} = [Σ_{l=1}^L (Q̂_l^{(k)})^{−1/2}]^{−1} Σ_{l=1}^L (Q̂_l^{(k)})^{−1/2} z_l,   (36)

V̂_Ty^{(k+1)} = (N/L) Σ_{l=1}^L (z_l − μ̂_Ty^{(k)})(z_l − μ̂_Ty^{(k)})^H / Q̂_l^{(k)},   V̂_{1,Ty}^{(k+1)} ≜ V̂_Ty^{(k+1)} / [V̂_Ty^{(k+1)}]_{1,1}.   (37)

Note that, even if a formal proof of the joint convergence of Eqs. (36) and (37) is still an open problem, this iterative algorithm has been shown to provide reliable estimates in most of the scenarios of possible interest in practical applications. We refer to [8], where joint M-estimators of the form (33)-(34) have been exploited in hyperspectral anomaly detection problems, and to [28], where joint M-estimators have been derived as part of a general Expectation-Maximization (EM) algorithm for clustering applications.

The estimators μ̂_Ty and V̂_{1,Ty} have the remarkable property of being √L-consistent under any (unknown) density generator h₀ ∈ G (see [31] for the real-valued case and [20] for the complex-valued case). Consistency, however, is only one of the properties that good robust estimators should have.
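The fixed-point iterations in Eqs. (36) and (37) translate almost verbatim into code. Below is a minimal NumPy sketch of our own; the sample-mean initialization and the simple stopping rule are our arbitrary choices, not prescribed by the text:

```python
import numpy as np

def tyler_joint(Z, n_iter=100, tol=1e-6):
    """Tyler's joint M-estimator of location and shape, Eqs. (36)-(37),
    for complex data Z of shape (L, N). Sketch only: initialization and
    stopping rule are arbitrary choices."""
    L, N = Z.shape
    mu = Z.mean(axis=0)                 # arbitrary initialization
    V = np.eye(N, dtype=complex)
    for _ in range(n_iter):
        Zc = Z - mu
        Vinv = np.linalg.inv(V)
        Q = np.real(np.einsum('li,ij,lj->l', Zc.conj(), Vinv, Zc))   # Eq. (35)
        w = 1.0 / np.sqrt(Q)
        mu_new = (w[:, None] * Z).sum(axis=0) / w.sum()              # Eq. (36)
        V_new = (N / L) * (Zc.T * (1.0 / Q)) @ Zc.conj()             # Eq. (37), scatter update
        V_new = V_new / V_new[0, 0].real                             # shape constraint [V]_{1,1} = 1
        done = (np.linalg.norm(mu_new - mu) < tol
                and np.linalg.norm(V_new - V) < tol)
        mu, V = mu_new, V_new
        if done:
            break
    return mu, V
```

Note that, consistently with Eq. (37), the location used in the shape update is the one from the previous iteration.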
Another important property is (semiparametric) efficiency.

4.2 The Semiparametric Cramér-Rao Bound (SCRB)

A robust estimator is said to be semiparametric efficient if its Mean Squared Error (MSE) achieves the Semiparametric Cramér-Rao Bound (SCRB) [1] as the number of available observations L goes to infinity. The SCRB for the joint estimation of location and shape in CES-distributed data has been derived in [3] as the inverse of the SFIM in Eq. (30). We refer the reader to [3] for all the details of its calculation. Here, for the sake of conciseness, we report only the final expression. As discussed before, since the efficient score vectors for the location, i.e. s̄_{μ₀} and s̄_{μ₀*}, are orthogonal to the nuisance tangent space T_{h₀}, the SCRB on the estimation of μ₀ is equal to the "classical" CRB and it is given by

SCRB(μ₀|h₀) ≜ Ī(μ₀|h₀)^{−1} = (N/E{Q₀ ψ₀(Q₀)²}) [ V_{0,1}  0_{N×N} ; 0_{N×N}  V_{0,1}^* ].   (38)

On the other hand, since, as previously shown in Eq. (29), Π(s_{vec(V_{0,1})}|T_{h₀}) ≠ 0, the SCRB on the estimation of the shape matrix V_{0,1} is tighter than the "classical" CRB (that is obtained for a perfectly known h₀) and is given by:

SCRB(vec(V_{0,1})|h₀) ≜ Ī(vec(V_{0,1})|h₀)^{−1} = (N(N + 1)/E{Q₀² ψ₀(Q₀)²}) [L_{V_{0,1}} L_{V_{0,1}}^H]^{−1},   (39)

where the matrix L_{V_{0,1}} is defined in Eq. (8). It is worth mentioning that the expression of the SCRB given in Eq. (39) is valid only if the shape matrix is defined through the constraint in Eq. (3), i.e. when the first top-left element of V_{0,1} is forced to be equal to 1. The interested reader may find the general form of the SCRB for shape matrix estimation under any constraint (e.g. constraints on the trace or the determinant) in [3, 4].

In [3, 4], it has been shown that robust M-estimators of the shape matrix are not semiparametric efficient. This efficiency issue can be overcome by exploiting the R-estimator of the shape given in Eq. (20) in Sec.
2.

4.3 An R-estimator of V_{0,1} for non-centered CES data

In this subsection, we finally put all our previous results together to provide a robust and semiparametric efficient joint estimator of the location vector μ₀ and of the shape matrix V_{0,1} of a set {z_l}_{l=1}^L of CES-distributed observations. As previously discussed, in order to gain semiparametric efficiency, we will exploit the R-estimator in Eq. (20). As amply discussed in Section 2, to implement this estimator we need:

1. a preliminary √L-consistent estimator V̂* of the shape matrix and another √L-consistent estimator μ̂* of the location vector;
2. a score function K_h : (0, 1) → ℝ⁺.

Due to their properties of minimax robustness and √L-consistency under any density generator h₀ ∈ G, the Tyler's estimators previously introduced in Eqs. (36) and (37) are perfect candidates for this role, i.e. μ̂* ≡ μ̂_Ty and V̂* ≡ V̂_{1,Ty}. Specifically, the general expression of the (computationally efficient) R-estimator given in Section 2 can be recast as:

V̂_{1,R} = V̂_{1,Ty} + (1/α̂_C) (W − [W]_{1,1} V̂_{1,Ty}).   (40)

Note that all the other related quantities reported in Eqs. (22)-(25) have to be evaluated by substituting the Tyler's estimators of location and shape, μ̂_Ty and V̂_{1,Ty} respectively, for the generic preliminary estimators μ̂* and V̂*. Regarding the second point, i.e.
the choice of a score function K_h, we will exploit two different options [6]:

– The complex van der Waerden score function:

K_vdW(u) ≜ Φ_G^{−1}(u), u ∈ (0, 1),   (41)

where Φ_G^{−1} indicates the inverse of the cdf of a Gamma-distributed random variable with parameters (N, 1).

– The complex t_ν-score, given by:

K_{t_ν}(u) = N(2N + ν) F_{2N,ν}^{−1}(u) / (ν + 2N F_{2N,ν}^{−1}(u)), u ∈ (0, 1),   (42)

where F_{2N,ν}(u) stands for the Fisher cdf with 2N and ν ∈ (0, ∞) degrees of freedom.

The van der Waerden score K_vdW has been proved to have excellent performance in terms of efficiency in the estimation of the shape matrix of centered CES data [6, 24], while the t_ν-score K_{t_ν} is able to provide better robustness to the presence of possible outliers, thanks to the additional "tuning" parameter ν.

To conclude this subsection, the pseudocode for the implementation of the R-estimator in Eq. (40) is provided in the following, while the related Matlab code can be downloaded at https://github.com/StefanoFor.
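Both score functions in Eqs. (41) and (42) reduce to inverse-cdf evaluations that are readily available in SciPy. A brief sketch of our own (assuming, as stated above, Gamma parameters (N, 1) for the van der Waerden score):

```python
import numpy as np
from scipy.stats import gamma, f

def k_vdw(u, N):
    """Complex van der Waerden score, Eq. (41): inverse cdf of a
    Gamma(N, 1) random variable, evaluated at u in (0, 1)."""
    return gamma.ppf(u, a=N)

def k_tnu(u, N, nu):
    """Complex t_nu-score, Eq. (42), based on the inverse Fisher (F)
    cdf with 2N and nu degrees of freedom."""
    Finv = f.ppf(u, dfn=2 * N, dfd=nu)
    return N * (2 * N + nu) * Finv / (nu + 2 * N * Finv)
```

Both functions are vectorized, so they can be applied directly to the whole array of normalized ranks r*_l/(L + 1).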
Algorithm 1 Computationally efficient R-estimator for $\mathbf{V}_1$.
Input: $\mathbf{z}_1, \ldots, \mathbf{z}_L$. Output: $\widehat{\mathbf{V}}_{1,R}$.
1: Evaluate the preliminary Tyler joint estimators: $\hat{\boldsymbol{\mu}}_{Ty} \leftarrow \lim_{k\to\infty} \hat{\boldsymbol{\mu}}^{(k)}_{Ty}$ in (36), $\widehat{\mathbf{V}}_{1,Ty} \leftarrow \lim_{k\to\infty} \widehat{\mathbf{V}}^{(k+1)}_{1,Ty}$ in (37),
2: Data centring: $\{\mathbf{z}_l\}_{l=1}^L \leftarrow \{\mathbf{z}_l - \hat{\boldsymbol{\mu}}_{Ty}\}_{l=1}^L$,
3: for $l = 1$ to $L$ do
4: $\hat{Q}^\star_l \leftarrow \mathbf{z}_l^H \widehat{\mathbf{V}}_{1,Ty}^{-1} \mathbf{z}_l$,
5: $\hat{\mathbf{u}}^\star_l \leftarrow (\hat{Q}^\star_l)^{-1/2}\, \widehat{\mathbf{V}}_{1,Ty}^{-1/2} \mathbf{z}_l$,
6: end for
7: Evaluate the ranks $\{r^\star_1, \ldots, r^\star_L\}$ of $\{\hat{Q}^\star_1, \ldots, \hat{Q}^\star_L\}$,
8: Select a score function $K_h$ ($K_{vdW}$ in (41) and $K_{t_\nu}$ in (42) are two options),
9: $\mathbf{R} \leftarrow \frac{1}{\sqrt{L}} \sum_{l=1}^L K_h\!\left(\frac{r^\star_l}{L+1}\right) \hat{\mathbf{u}}^\star_l (\hat{\mathbf{u}}^\star_l)^H$,
10: $\zeta \leftarrow \frac{1}{N\sqrt{L}} \sum_{l=1}^L K_h\!\left(\frac{r^\star_l}{L+1}\right)$,
11: $\mathbf{W} \leftarrow L^{-1/2}\, (\widehat{\mathbf{V}}_{1,Ty})^{1/2} \mathbf{R}\, (\widehat{\mathbf{V}}_{1,Ty})^{1/2}$,
12: $\mathbf{z}_{\widehat{\mathbf{V}}_{1,Ty}} \leftarrow \mathrm{vec}\!\left((\widehat{\mathbf{V}}_{1,Ty})^{-1/2} \mathbf{R}\, (\widehat{\mathbf{V}}_{1,Ty})^{-1/2} - \zeta\, (\widehat{\mathbf{V}}_{1,Ty})^{-1}\right)$,
13: Obtain $\hat{\alpha}_C$ through the following two steps:
1) Generate a random Hermitian matrix $\mathbf{H}_C$ s.t. $[\mathbf{H}_C]_{1,1} = 0$,
2) $\hat{\alpha}_C \leftarrow \dfrac{\left\| \mathbf{z}_{\widehat{\mathbf{V}}_{1,Ty} + L^{-1/2}\mathbf{H}_C} - \mathbf{z}_{\widehat{\mathbf{V}}_{1,Ty}} \right\|}{\left\| \mathrm{vec}\!\left( (\widehat{\mathbf{V}}_{1,Ty})^{-1} \mathbf{H}_C (\widehat{\mathbf{V}}_{1,Ty})^{-1} - N^{-1} \mathrm{tr}\!\left( (\widehat{\mathbf{V}}_{1,Ty})^{-1} \mathbf{H}_C \right) (\widehat{\mathbf{V}}_{1,Ty})^{-1} \right) \right\|}$,
Remark:
The entries of $\mathbf{H}_C$ should be "small enough" to guarantee that $\widehat{\mathbf{V}}_{1,Ty} + L^{-1/2}\mathbf{H}_C$ is a positive definite matrix.
14: Final step: $\widehat{\mathbf{V}}_{1,R} \leftarrow \widehat{\mathbf{V}}_{1,Ty} + \frac{1}{\hat{\alpha}_C}\left(\mathbf{W} - [\mathbf{W}]_{1,1}\,\widehat{\mathbf{V}}_{1,Ty}\right)$,
15: return $\widehat{\mathbf{V}}_{1,R}$

This section is divided into two parts. In the first, we discuss the computational advantages that the "matrix version" of the R-estimator provided in Eq. (20) has with respect to the "vectorized version" derived in [6] and recalled in Eq. (19). In the second, we assess, through numerical simulations, the semiparametric efficiency of the joint estimator $(\hat{\boldsymbol{\mu}}_{Ty}, \widehat{\mathbf{V}}_{1,R})$ given in Eqs. (36) and (40), respectively.

Data generation: In both parts, we generate the set of non-centered CES-distributed data $\{\mathbf{z}_l\}_{l=1}^L$ according to a Generalized Gaussian (GG) distribution [25], such that $\mathbb{C}^N \ni \mathbf{z}_l \sim p_Z$, $\forall l$, where
\[
p_Z(\mathbf{z}_l) = |\boldsymbol{\Sigma}|^{-1}\, h\!\left( (\mathbf{z}_l - \boldsymbol{\mu})^H \boldsymbol{\Sigma}^{-1} (\mathbf{z}_l - \boldsymbol{\mu}) \right), \quad (43)
\]
while the relevant density generator is given by:
\[
h(t) \triangleq \frac{s\,\Gamma(N)\, b^{-N/s}}{\pi^N\, \Gamma(N/s)} \exp\!\left( -\frac{t^s}{b} \right), \quad t \in \mathbb{R}^+. \quad (44)
\]
We chose the GG distribution to assess the performance of the proposed joint estimator because of its flexibility in characterizing the data heavy-tailedness relative to the Gaussian distribution. In fact, according to the value of the shape parameter $s > 0$, the GG density generator in (44) defines distributions with both heavier tails ($0 < s < 1$) and lighter tails ($s > 1$) compared to the Gaussian one ($s = 1$).

The parameters adopted in our simulations are:
– $\boldsymbol{\Sigma}$ is a Toeplitz Hermitian matrix whose first column is given by $[1, \rho, \ldots, \rho^{N-1}]^\top$; $\rho = 0.\,e^{j\pi/}$ and $N = 8$.
– Shape matrix: $\mathbf{V}_1 \triangleq \boldsymbol{\Sigma} / [\boldsymbol{\Sigma}]_{1,1}$.
– Location vector: $[\boldsymbol{\mu}]_n \triangleq \,e^{j\pi/\,(n-1)}$, $n = 1, \ldots, N$.
– Scale parameter: $b = \left[ \sigma_X N \Gamma(N/s) / \Gamma((N+1)/s) \right]^s$ in Eq. (44), with $\sigma_X = E\{Q\}/N = 4$.
– Number of observations: $L = 5N$. This clearly defines a "finite-sample" regime.

5.1 Computational efficiency of the proposed "matrix version" of the R-estimator for shape

As discussed in Section 2, the crucial difference between the "vectorized" implementation of the R-estimator given in Eq. (19) and its "matrix version" provided in Eq. (20) is that the former relies on the calculation of $N^2 \times N^2$ matrix quantities, while the latter involves only $N \times N$ matrices. Clearly, this leads to a large gain in computational efficiency, in particular as the data dimension $N$ increases. In order to highlight this fact, Fig. 1 reports the time (in seconds) required for the calculation of three shape matrix estimators as a function of the data dimension $N$:
– The Tyler estimator $\widehat{\mathbf{V}}_{1,Ty}$ in Eq. (37),
– The "vectorized version" of the R-estimator in Eq. (19), exploiting $\widehat{\mathbf{V}}_{1,Ty}$ as preliminary estimator of the shape matrix,
– The "matrix version" of the R-estimator in Eq. (20), exploiting $\widehat{\mathbf{V}}_{1,Ty}$ as preliminary estimator of the shape matrix.
The curves in Fig. 1 are unambiguous: the proposed "matrix version" of the R-estimator is more than two orders of magnitude faster than the "vectorized" one derived in [6], and the gap widens as the data dimension $N$ increases. Moreover, the computational time of the "matrix version" of the R-estimator in Eq. (20) is similar to that of the Tyler estimator.
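To make the simulation setup and the matrix-version update concrete, the following Python sketch generates GG-distributed data according to Eqs. (43)-(44) and then performs the one-step update of Eq. (40) with the van der Waerden score. It is only a minimal stand-in for the authors' Matlab code: purely for brevity, the Tyler fixed points of Eqs. (36)-(37) are replaced here by the sample mean and the normalized SCM as preliminary estimators (any $\sqrt{L}$-consistent pair can be plugged in), and the constants for $\rho$ and $\boldsymbol{\mu}$ are illustrative assumptions, since the exact values were lost in extraction:

```python
import numpy as np
from scipy.stats import gamma
from scipy.linalg import toeplitz
from scipy.special import gammaln

rng = np.random.default_rng(0)

def mat_pow(V, p):
    """Power of a Hermitian positive definite matrix via eigendecomposition."""
    w, U = np.linalg.eigh(V)
    return (U * w**p) @ U.conj().T

# ---- GG-distributed CES data, Eqs. (43)-(44) ----
N, s, sigma_X = 8, 0.5, 4.0                       # s = 0.5: heavier tails than Gaussian
L = 5 * N                                         # "finite-sample" regime
b = (sigma_X * N * np.exp(gammaln(N / s) - gammaln((N + 1) / s)))**s
rho = 0.8 * np.exp(1j * np.pi / 4)                # illustrative value
Sigma = toeplitz(rho**np.arange(N))               # Hermitian Toeplitz scatter matrix
mu = 0.5 * np.exp(1j * np.pi * np.arange(N) / N)  # illustrative location vector
Qmod = (b * rng.gamma(N / s, size=L))**(1 / s)    # modular variate: Q^s / b ~ Gamma(N/s)
Usph = rng.standard_normal((L, N)) + 1j * rng.standard_normal((L, N))
Usph /= np.linalg.norm(Usph, axis=1, keepdims=True)   # uniform on the complex sphere
Z = mu + np.sqrt(Qmod)[:, None] * (Usph @ mat_pow(Sigma, 0.5).T)  # rows = z_l

# ---- Preliminary estimators (sample-based stand-ins for Eqs. (36)-(37)) ----
mu_hat = Z.mean(axis=0)
Zc = Z - mu_hat
S = np.einsum('li,lj->ij', Zc, Zc.conj()) / L
V_hat = S / S[0, 0].real                          # shape normalization [V]_{1,1} = 1

# ---- Matrix-version one-step update, Eq. (40), van der Waerden score ----
def z_and_R(V, Zc):
    """Steps 4-12 of Algorithm 1: z_V = vec(V^{-1/2} R V^{-1/2} - zeta V^{-1})."""
    Lnum, Ndim = Zc.shape
    Vm12 = mat_pow(V, -0.5)
    Y = Zc @ Vm12.T                               # rows = V^{-1/2} z_l
    Q = np.sum(np.abs(Y)**2, axis=1)              # Q_l = z_l^H V^{-1} z_l
    Uh = Y / np.sqrt(Q)[:, None]
    ranks = Q.argsort().argsort() + 1
    k = gamma.ppf(ranks / (Lnum + 1), a=Ndim)     # K_vdW score, Eq. (41)
    R = np.einsum('l,li,lj->ij', k, Uh, Uh.conj()) / np.sqrt(Lnum)
    zeta = k.sum() / (Ndim * np.sqrt(Lnum))
    return (Vm12 @ R @ Vm12 - zeta * np.linalg.inv(V)).flatten('F'), R

z0, R0 = z_and_R(V_hat, Zc)
W = mat_pow(V_hat, 0.5) @ R0 @ mat_pow(V_hat, 0.5) / np.sqrt(L)
# Step 13: alpha from a "small enough" random Hermitian H with [H]_{1,1} = 0
H = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
H = (H + H.conj().T) / 2
H[0, 0] = 0.0
H *= 0.1 * np.linalg.eigvalsh(V_hat).min() * np.sqrt(L) / np.linalg.norm(H, 2)
z1, _ = z_and_R(V_hat + H / np.sqrt(L), Zc)       # perturbed V stays positive definite
Vinv = np.linalg.inv(V_hat)
den = np.linalg.norm((Vinv @ H @ Vinv - np.trace(Vinv @ H) / N * Vinv).flatten('F'))
alpha = np.linalg.norm(z1 - z0) / den
V_R = V_hat + (W - W[0, 0].real * V_hat) / alpha  # Step 14, Eq. (40)
```

By construction the update leaves the normalization untouched: since $[\mathbf{W} - [\mathbf{W}]_{1,1}\widehat{\mathbf{V}}^\star]_{1,1} = 0$, the returned estimate still satisfies $[\widehat{\mathbf{V}}_{1,R}]_{1,1} = 1$, and only $N \times N$ objects are ever formed.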
These considerations provide hard evidence of the computational effectiveness of the "matrix version" of the R-estimator derived in Section 2 and suggest adopting it as the standard form of the R-estimator.

Fig. 1 Comparison of the time required to calculate the Tyler estimator and the "vectorized" and proposed "matrix" versions of the R-estimator.

5.2 Statistical efficiency of the joint estimator for location and shape

As previously anticipated, this last subsection is devoted to the assessment of the semiparametric efficiency of the joint estimator $(\hat{\boldsymbol{\mu}}_{Ty}, \widehat{\mathbf{V}}_{1,R})$, where $\hat{\boldsymbol{\mu}}_{Ty}$ is the Tyler estimator in Eq. (36) of the location vector $\boldsymbol{\mu}$, while $\widehat{\mathbf{V}}_{1,R}$ is the R-estimator in Eq. (40) of the shape matrix $\mathbf{V}_1$, exploiting the Tyler joint estimator $(\hat{\boldsymbol{\mu}}_{Ty}, \widehat{\mathbf{V}}_{1,Ty})$ as preliminary $\sqrt{L}$-consistent estimators.

As a basis of comparison, we also report the performance of the joint "sample" estimator $(\hat{\boldsymbol{\mu}}_{SM}, \widehat{\mathbf{V}}_{1,SCM})$, defined as:
\[
\hat{\boldsymbol{\mu}}_{SM} \triangleq L^{-1} \sum\nolimits_{l=1}^L \mathbf{z}_l, \quad (45)
\]
\[
\widehat{\boldsymbol{\Sigma}}_{SCM} \triangleq L^{-1} \sum\nolimits_{l=1}^L (\mathbf{z}_l - \hat{\boldsymbol{\mu}}_{SM})(\mathbf{z}_l - \hat{\boldsymbol{\mu}}_{SM})^H, \quad \widehat{\mathbf{V}}_{1,SCM} \triangleq \widehat{\boldsymbol{\Sigma}}_{SCM} / [\widehat{\boldsymbol{\Sigma}}_{SCM}]_{1,1}. \quad (46)
\]

The performance assessment relies on the following indices.

Bias indices
– Bias index for the estimation of the location vector $\boldsymbol{\mu}$:
\[
\beta_\gamma \triangleq \left\| E\{\hat{\boldsymbol{\mu}}_\gamma - \boldsymbol{\mu}\} \right\|, \quad (47)
\]
where $\gamma \in \{SM, Ty\}$ indicates the sample mean in Eq. (45) or the Tyler estimator in Eq. (36).
– Bias index for the estimation of the shape matrix $\mathbf{V}_1$:
\[
\varphi_\gamma \triangleq \left\| E\{\mathrm{vec}(\widehat{\mathbf{V}}_{1,\gamma} - \mathbf{V}_1)\} \right\|, \quad (48)
\]
where $\gamma \in \{SCM, Ty, R\text{-}vdW, R\text{-}t\}$ indicates a specific estimator among the SCM in Eq. (46), the Tyler shape estimator in Eq. (37), and the R-estimator in Eq. (40) relying on the Tyler estimator as preliminary estimator. Note that for the R-estimator we have two options: $\widehat{\mathbf{V}}_{1,R\text{-}vdW}$ indicates the R-estimator in Eq. (40) exploiting the van der Waerden score in Eq. (41), while $\widehat{\mathbf{V}}_{1,R\text{-}t}$ indicates the R-estimator in Eq. (40) exploiting the $t$-score in Eq. (42) with $\nu = 5$.

Mean Squared Error (MSE) indices
– MSE index for the estimation of the location vector $\boldsymbol{\mu}$:
\[
\varrho_\gamma \triangleq \left\| E\{ (\hat{\boldsymbol{\mu}}^a_\gamma - \boldsymbol{\mu}^a)(\hat{\boldsymbol{\mu}}^a_\gamma - \boldsymbol{\mu}^a)^H \} \right\|_F, \quad (49)
\]
where $\gamma \in \{SM, Ty\}$ and, for a given $\mathbf{x} \in \mathbb{C}^N$, $\mathbf{x}^a \triangleq (\mathbf{x}^\top, \mathbf{x}^H)^\top \in \mathbb{C}^{2N}$.
– MSE index for the estimation of the shape matrix $\mathbf{V}_1$:
\[
\varsigma_\gamma \triangleq \left\| E\{ \mathrm{vec}(\widehat{\mathbf{V}}_{1,\gamma} - \mathbf{V}_1)\, \mathrm{vec}(\widehat{\mathbf{V}}_{1,\gamma} - \mathbf{V}_1)^H \} \right\|_F, \quad (50)
\]
where $\gamma \in \{SCM, Ty, R\text{-}vdW, R\text{-}t\}$, as before, indicates the relevant estimator at hand.

As lower bound indices, we use
\[
\varepsilon_{SCRB,\boldsymbol{\mu}} \triangleq \| \mathrm{SCRB}(\boldsymbol{\mu}|h) \|_F, \quad (51)
\]
\[
\varepsilon_{SCRB,\mathbf{V}_1} \triangleq \| \mathrm{SCRB}(\mathrm{vec}(\mathbf{V}_1)|h) \|_F, \quad (52)
\]
where $\mathrm{SCRB}(\boldsymbol{\mu}|h)$ is given in (38) and $\mathrm{SCRB}(\mathrm{vec}(\mathbf{V}_1)|h)$ in (39).

The bias indices of the sample mean in Eq. (45) and of the Tyler location estimator in Eq. (36) are reported in Fig. 2. As we can note, the bias is on the order of $10^{-}$, so it can be considered negligible and the two estimators unbiased. Fig. 3 shows the MSE performance of the sample mean $\hat{\boldsymbol{\mu}}_{SM}$ in (45) and of the Tyler estimator $\hat{\boldsymbol{\mu}}_{Ty}$ in (36), compared to the lower bound in (38). As we can see, $\hat{\boldsymbol{\mu}}_{Ty}$ is almost efficient with respect to $\mathrm{SCRB}(\boldsymbol{\mu}|h)$ for heavy-tailed data ($0 < s < 1$) and outperforms $\hat{\boldsymbol{\mu}}_{SM}$, which is known to be non-robust. On the other hand, $\hat{\boldsymbol{\mu}}_{SM}$ is efficient in the Gaussian case ($s = 1$) and tends to have better performance than $\hat{\boldsymbol{\mu}}_{Ty}$ for $s > 1$. However, in this light-tails scenario, the MSE of $\hat{\boldsymbol{\mu}}_{Ty}$ does not explode and remains close to that of $\hat{\boldsymbol{\mu}}_{SM}$.

Fig. 2 Bias in the estimation of $\boldsymbol{\mu}$ for the sample mean and for the Tyler estimator.

Fig. 3 MSE in the estimation of $\boldsymbol{\mu}$ for the sample mean and for the Tyler estimator.

Fig. 4 Bias in the estimation of $\mathbf{V}_1$ for the SCM, the Tyler estimator and the R-estimator exploiting both the van der Waerden and the $t$-score functions.

As far as the shape matrix estimation is concerned, the simulation results are shown in Fig. 4 for the bias and Fig. 5 for the MSE. The main finding is that the two R-estimators $\widehat{\mathbf{V}}_{1,R\text{-}vdW}$ and $\widehat{\mathbf{V}}_{1,R\text{-}t}$ in Eq. (40) outperform the Tyler estimator $\widehat{\mathbf{V}}_{1,Ty}$ in (37) for every value of $s$, i.e. for both heavy-tailed and light-tailed data. Moreover, as expected, $\widehat{\mathbf{V}}_{1,R\text{-}vdW}$ and $\widehat{\mathbf{V}}_{1,R\text{-}t}$ greatly outperform the sample covariance matrix $\widehat{\mathbf{V}}_{1,SCM}$ in Eq. (46) in the presence of heavy-tailed data ($0 < s < 1$), while the performance remains comparable for $s > 1$. Between the two R-estimators, we can notice that $\widehat{\mathbf{V}}_{1,R\text{-}vdW}$ has better performance than $\widehat{\mathbf{V}}_{1,R\text{-}t}$ in terms of both bias and MSE. Finally, a comment on the efficiency of the above-mentioned estimators is in order. As we can see, there is a gap between the MSE indices of $\widehat{\mathbf{V}}_{1,SCM}$, $\widehat{\mathbf{V}}_{1,Ty}$, $\widehat{\mathbf{V}}_{1,R\text{-}vdW}$ and $\widehat{\mathbf{V}}_{1,R\text{-}t}$ and the SCRB. However, it is worth underlining that our aim here is to compare the performance of shape matrix estimators in a "finite-sample" regime, i.e. with a number of observations equal to $L = 5N$, which represents a reasonable value in many practical applications. Of course,

Fig. 5 MSE in the estimation of $\mathbf{V}_1$ for the SCM, the Tyler estimator and the R-estimator exploiting both the van der Waerden and the $t$-score functions.

by letting $L \to \infty$, it can be shown that both R-estimators $\widehat{\mathbf{V}}_{1,R\text{-}vdW}$ and $\widehat{\mathbf{V}}_{1,R\text{-}t}$ achieve the bound $\mathrm{SCRB}(\mathrm{vec}(\mathbf{V}_1)|h)$ in Eq. (39), as predicted by theoretical considerations [6, 12].

In summary, the previous simulations highlight the benefits that the proposed robust R-estimator can bring. Specifically, it always outperforms the Tyler estimator in both heavy- and light-tails scenarios. Moreover, its estimation performance is far better than that of the SCM for heavy-tailed data, while being almost identical in light-tailed scenarios: high gain, very small loss. These very promising results promote the use of the R-estimator in other problems, such as the structured shape estimation discussed in [19, 21].

This paper dealt with the fundamental problem of estimating the location vector and the shape matrix of a set of CES-distributed data. In the first part of this work, we derived a computationally efficient version of the robust and semiparametric efficient R-estimator previously proposed in [6]. Remarkably, the new "matrix version" of the R-estimator provides the same estimate of the shape matrix at a computational time that is more than two orders of magnitude smaller than that of the "vectorized version" previously derived in [6]. This fundamental property suggests using the new version given in Eq. (20) as the default version of the R-estimator. In the second part of this paper, the joint estimation of the location vector $\boldsymbol{\mu}$ and the shape matrix $\mathbf{V}_1$ of a set of i.i.d. CES-distributed multivariate observations has been addressed.
Building upon the asymptotic decorrelation of the location and shape estimation problems, a joint estimator relying on the Tyler M-estimator $\hat{\boldsymbol{\mu}}_{Ty}$ for $\boldsymbol{\mu}$ and on a recently proposed R-estimator $\widehat{\mathbf{V}}_{1,R}$ for $\mathbf{V}_1$ has been discussed, and its performance, in terms of both bias and MSE, assessed and compared with the relevant Semiparametric Cramér-Rao Bound. Our simulation results, obtained for GG-distributed data, have shown that the joint estimator $(\hat{\boldsymbol{\mu}}_{Ty}, \widehat{\mathbf{V}}_{1,R})$ of location and shape represents a good alternative to the classical Maronna joint M-estimators. In particular, in terms of shape matrix estimation, the proposed joint estimator outperforms the joint Tyler estimator for both heavy-tailed and light-tailed data. Future work will investigate the application of the proposed estimator to robust clustering and distance-learning problems.

A Appendix: Proof of Eqs. (20) and (21)

The main aim of this Appendix is to show how to obtain a closed-form expression of the (complex-valued) R-estimator of the shape matrix $\mathbf{V}_1$ provided in (19) (see also [6, Eq. (54)]) that avoids the use of the matrix $\mathbf{L}_{\mathbf{V}}$. The matrix $\mathbf{L}_{\mathbf{V}}$ is in fact a structured $(N^2-1) \times N^2$ matrix built upon the $N \times N$ matrix $\mathbf{V}$. Consequently, if we are able to obtain an expression of the R-estimator that relies only on $\mathbf{V}$, and not on $\mathbf{L}_{\mathbf{V}}$, we gain a lot in terms of computational efficiency.

At first, let us recall the expression of the R-estimator introduced in Eqs. (19) and (18):
\[
\mathrm{vec}(\widehat{\mathbf{V}}_{1,R}) = \mathrm{vec}(\widehat{\mathbf{V}}^\star) + \frac{1}{L\,\hat{\alpha}_C} \left[ \mathbf{L}_{\widehat{\mathbf{V}}^\star} \mathbf{L}^H_{\widehat{\mathbf{V}}^\star} \right]^{-1} \mathbf{L}_{\widehat{\mathbf{V}}^\star} \sum\nolimits_{l=1}^L K_h\!\left( \frac{r^\star_l}{L+1} \right) \mathrm{vec}(\hat{\mathbf{u}}^\star_l (\hat{\mathbf{u}}^\star_l)^H),
\]
where $\hat{\alpha}_C = \| \widetilde{\boldsymbol{\Delta}}^C_{\widehat{\mathbf{V}}^\star + L^{-1/2}\mathbf{H}_C} - \widetilde{\boldsymbol{\Delta}}^C_{\widehat{\mathbf{V}}^\star} \| / \| \mathbf{L}_{\widehat{\mathbf{V}}^\star} \mathbf{L}^H_{\widehat{\mathbf{V}}^\star} \mathrm{vec}(\mathbf{H}_C) \|$, and $\mathbf{H}_C$ is a "small-perturbation" Hermitian matrix s.t. $[\mathbf{H}_C]_{1,1} = 0$.

In the calculations proposed below, we make extensive use of the following properties, holding for conforming matrices (see e.g. [26]):
\[
\mathrm{tr}(\mathbf{A}\mathbf{B}) = \mathrm{vec}(\mathbf{A}^\top)^\top \mathrm{vec}(\mathbf{B}) \quad (A.1)
\]
\[
\mathrm{vec}(\mathbf{A}\mathbf{X}\mathbf{B}) = (\mathbf{B}^\top \otimes \mathbf{A})\, \mathrm{vec}(\mathbf{X}) \quad (A.2)
\]
\[
(\mathbf{A} \otimes \mathbf{B})(\mathbf{C} \otimes \mathbf{D}) = (\mathbf{A}\mathbf{C} \otimes \mathbf{B}\mathbf{D}) \quad (A.3)
\]
\[
(\mathbf{A} \otimes \mathbf{B})^\top = \mathbf{A}^\top \otimes \mathbf{B}^\top, \quad (\mathbf{A} \otimes \mathbf{B})^H = \mathbf{A}^H \otimes \mathbf{B}^H \quad (A.4)
\]

A.1 Matrix version of the central sequence

The "distribution-free" version of the (complex-valued) efficient central sequence $\widetilde{\boldsymbol{\Delta}}^C_{\widehat{\mathbf{V}}^\star}$ is defined in Eq. (14) as:
\[
\widetilde{\boldsymbol{\Delta}}^C_{\widehat{\mathbf{V}}^\star} \triangleq \frac{1}{\sqrt{L}}\, \mathbf{L}_{\widehat{\mathbf{V}}^\star} \sum_{l=1}^L K_h\!\left( \frac{r^\star_l}{L+1} \right) \mathrm{vec}(\hat{\mathbf{u}}^\star_l (\hat{\mathbf{u}}^\star_l)^H).
\]
Let us start by showing how to rewrite $\widetilde{\boldsymbol{\Delta}}^C_{\widehat{\mathbf{V}}^\star}$ without using $\mathbf{L}_{\mathbf{V}}$. For notational simplicity, we introduce the scalar $\kappa_l$ as:
\[
\kappa_l \triangleq \frac{1}{\sqrt{L}}\, K_h\!\left( \frac{r^\star_l}{L+1} \right). \quad (A.5)
\]
Let us now define the $N^2$-dimensional vector $\mathbf{z}_{\widehat{\mathbf{V}}^\star}$ as:
\[
\mathbf{z}_{\widehat{\mathbf{V}}^\star} = \left[ z_1, (\widetilde{\boldsymbol{\Delta}}^C_{\widehat{\mathbf{V}}^\star})^\top \right]^\top, \quad (A.6)
\]
where $z_1$ is an unspecified complex number.
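Both "$\mathbf{L}_{\mathbf{V}}$-free" reductions derived in this Appendix rest on two Kronecker-algebra identities that can be checked numerically. The sketch below, in Python/NumPy (an assumption of convenience; the paper's code is in Matlab), uses the column-stacking vec convention of property (A.2) and verifies (i) that $(\mathbf{V}^{-\top/2} \otimes \mathbf{V}^{-1/2})\,\boldsymbol{\Pi}^\perp_{\mathrm{vec}(\mathbf{I}_N)}$ acting on $\mathrm{vec}(\mathbf{M})$, with $\mathbf{M} = \sum_l \kappa_l \hat{\mathbf{u}}_l \hat{\mathbf{u}}_l^H$, collapses to $\mathrm{vec}(\mathbf{V}^{-1/2}\mathbf{M}\mathbf{V}^{-1/2} - (\mathrm{tr}(\mathbf{M})/N)\mathbf{V}^{-1})$, and (ii) the analogous expansion for a Hermitian perturbation $\mathbf{H}$ with $[\mathbf{H}]_{1,1} = 0$:

```python
import numpy as np

rng = np.random.default_rng(1)
N, L = 4, 20

def vec(A):
    """Column-stacking vec(.), matching vec(AXB) = (B^T kron A) vec(X) in (A.2)."""
    return A.flatten(order='F')

def mat_pow(V, p):
    """Power of a Hermitian positive definite matrix via eigendecomposition."""
    w, U = np.linalg.eigh(V)
    return (U * w**p) @ U.conj().T

# Random Hermitian positive definite "shape" matrix V
G = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
V = G @ G.conj().T + N * np.eye(N)
Vinv, Vm12 = np.linalg.inv(V), mat_pow(V, -0.5)
# Projector orthogonal to vec(I_N)
Pi_perp = np.eye(N * N) - np.outer(vec(np.eye(N)), vec(np.eye(N))) / N

# (i) central-sequence reduction: M = sum_l kappa_l u_l u_l^H with unit-norm u_l
Uh = rng.standard_normal((L, N)) + 1j * rng.standard_normal((L, N))
Uh /= np.linalg.norm(Uh, axis=1, keepdims=True)
kap = rng.uniform(0.1, 1.0, L)
M = np.einsum('l,li,lj->ij', kap, Uh, Uh.conj())
lhs1 = np.kron(Vm12.T, Vm12) @ (Pi_perp @ vec(M))
rhs1 = vec(Vm12 @ M @ Vm12 - (np.trace(M) / N) * Vinv)
assert np.allclose(lhs1, rhs1)

# (ii) denominator of alpha: Hermitian H with [H]_{1,1} = 0
H = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
H = (H + H.conj().T) / 2
H[0, 0] = 0.0
mid = np.kron(Vinv.T, Vinv) - np.outer(vec(Vinv), vec(Vinv).conj()) / N
lhs2 = mid @ vec(H)
rhs2 = vec(Vinv @ H @ Vinv - (np.trace(Vinv @ H) / N) * Vinv)
assert np.allclose(lhs2, rhs2)
print("both identities verified")
```

Since the identities are exact, the assertions pass up to floating-point error; the right-hand sides involve only $N \times N$ matrix products, which is precisely the source of the computational gain discussed in Section 5.1.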
Then, from Eq. (14) and from the definition of the matrices $\mathbf{P}$ and $\mathbf{L}_{\mathbf{V}}$ in Eqs. (6) and (8) respectively, the vector $\mathbf{z}_{\widehat{\mathbf{V}}^\star}$ can be cast as:
\[
\begin{aligned}
\mathbf{z}_{\widehat{\mathbf{V}}^\star} &= \left[ (\widehat{\mathbf{V}}^\star)^{-\top/2} \otimes (\widehat{\mathbf{V}}^\star)^{-1/2} \right] \boldsymbol{\Pi}^\perp_{\mathrm{vec}(\mathbf{I}_N)} \sum_{l=1}^L \kappa_l\, \mathrm{vec}(\hat{\mathbf{u}}^\star_l (\hat{\mathbf{u}}^\star_l)^H) \\
&= \left[ (\widehat{\mathbf{V}}^\star)^{-\top/2} \otimes (\widehat{\mathbf{V}}^\star)^{-1/2} \right] \left[ \sum_{l=1}^L \kappa_l\, \mathrm{vec}(\hat{\mathbf{u}}^\star_l (\hat{\mathbf{u}}^\star_l)^H) - \frac{\mathrm{vec}(\mathbf{I}_N)}{N} \sum_{l=1}^L \kappa_l \right] \\
&= \mathrm{vec}\!\left( (\widehat{\mathbf{V}}^\star)^{-1/2} \mathbf{R}\, (\widehat{\mathbf{V}}^\star)^{-1/2} - \zeta\, (\widehat{\mathbf{V}}^\star)^{-1} \right), \quad (A.7)
\end{aligned}
\]
where:
\[
\mathbf{R} \triangleq \sum_{l=1}^L \kappa_l\, \hat{\mathbf{u}}^\star_l (\hat{\mathbf{u}}^\star_l)^H, \quad (A.8)
\]
and
\[
\zeta \triangleq \frac{1}{N} \sum_{l=1}^L \kappa_l. \quad (A.9)
\]
Consequently, we have that:
\[
\widetilde{\boldsymbol{\Delta}}^C_{\widehat{\mathbf{V}}^\star} = \mathrm{vec}\!\left( (\widehat{\mathbf{V}}^\star)^{-1/2} \mathbf{R}\, (\widehat{\mathbf{V}}^\star)^{-1/2} - \zeta\, (\widehat{\mathbf{V}}^\star)^{-1} \right) \triangleq \mathbf{z}_{\widehat{\mathbf{V}}^\star}. \quad (A.10)
\]
Note that in (A.7) we have used the fact that:
\[
\mathrm{tr}(\hat{\mathbf{u}}^\star_l (\hat{\mathbf{u}}^\star_l)^H) = (\hat{\mathbf{u}}^\star_l)^H \hat{\mathbf{u}}^\star_l = 1, \quad \forall l. \quad (A.11)
\]
To avoid confusion, we indicate as $\mathbf{z}_{\widehat{\mathbf{V}}^\star}$ the "$\mathbf{L}_{\widehat{\mathbf{V}}^\star}$-free" version of $\widetilde{\boldsymbol{\Delta}}^C_{\widehat{\mathbf{V}}^\star}$ obtained in Eq. (A.10). It is important to underline that $\mathbf{z}_{\widehat{\mathbf{V}}^\star}$ does not make use of the "unnecessarily large" $(N^2-1) \times N^2$ matrix $\mathbf{L}_{\widehat{\mathbf{V}}^\star}$: only $N \times N$ matrices are involved.

A.2 An "$\mathbf{L}_{\widehat{\mathbf{V}}^\star}$-free" version of the scalar $\hat{\alpha}_C$

Let us define the $N^2$-dimensional vector:
\[
\mathbf{r}_{\widehat{\mathbf{V}}^\star} = \left[ r_1, \left( \mathbf{L}_{\widehat{\mathbf{V}}^\star} \mathbf{L}^H_{\widehat{\mathbf{V}}^\star} \mathrm{vec}(\mathbf{H}_C) \right)^\top \right]^\top, \quad (A.12)
\]
where $r_1$ is an unspecified complex scalar. By using the fact that, by definition, $[\mathbf{H}_C]_{1,1} = 0$, we have that:
\[
\begin{aligned}
\mathbf{r}_{\widehat{\mathbf{V}}^\star} &= \left[ \left( (\widehat{\mathbf{V}}^\star)^{-\top/2} \otimes (\widehat{\mathbf{V}}^\star)^{-1/2} \right) \boldsymbol{\Pi}^\perp_{\mathrm{vec}(\mathbf{I}_N)} \left( (\widehat{\mathbf{V}}^\star)^{-\top/2} \otimes (\widehat{\mathbf{V}}^\star)^{-1/2} \right)^H \right] \mathrm{vec}(\mathbf{H}_C) \\
&= \left[ (\widehat{\mathbf{V}}^\star)^{-\top} \otimes (\widehat{\mathbf{V}}^\star)^{-1} - \frac{1}{N}\, \mathrm{vec}\!\left( (\widehat{\mathbf{V}}^\star)^{-1} \right) \mathrm{vec}\!\left( (\widehat{\mathbf{V}}^\star)^{-1} \right)^H \right] \mathrm{vec}(\mathbf{H}_C) \\
&= \mathrm{vec}\!\left( (\widehat{\mathbf{V}}^\star)^{-1} \mathbf{H}_C (\widehat{\mathbf{V}}^\star)^{-1} - \frac{1}{N}\, \mathrm{tr}\!\left( (\widehat{\mathbf{V}}^\star)^{-1} \mathbf{H}_C \right) (\widehat{\mathbf{V}}^\star)^{-1} \right). \quad (A.13)
\end{aligned}
\]
Then, from Eq. (A.12), we get:
\[
\mathbf{r}_{\widehat{\mathbf{V}}^\star} = \mathrm{vec}\!\left( (\widehat{\mathbf{V}}^\star)^{-1} \mathbf{H}_C (\widehat{\mathbf{V}}^\star)^{-1} - \frac{1}{N}\, \mathrm{tr}\!\left( (\widehat{\mathbf{V}}^\star)^{-1} \mathbf{H}_C \right) (\widehat{\mathbf{V}}^\star)^{-1} \right), \quad (A.14)
\]
which depends only on $N \times N$ matrix quantities. Finally, by using the previous results, an "$\mathbf{L}_{\widehat{\mathbf{V}}^\star}$-free" version of $\hat{\alpha}_C$ can be expressed as:
\[
\hat{\alpha}_C = \frac{\left\| \widetilde{\boldsymbol{\Delta}}^C_{\widehat{\mathbf{V}}^\star + L^{-1/2}\mathbf{H}_C} - \widetilde{\boldsymbol{\Delta}}^C_{\widehat{\mathbf{V}}^\star} \right\|}{\left\| \mathbf{L}_{\widehat{\mathbf{V}}^\star} \mathbf{L}^H_{\widehat{\mathbf{V}}^\star} \mathrm{vec}(\mathbf{H}_C) \right\|} = \frac{\left\| \mathbf{z}_{\widehat{\mathbf{V}}^\star + L^{-1/2}\mathbf{H}_C} - \mathbf{z}_{\widehat{\mathbf{V}}^\star} \right\|}{\left\| \mathbf{r}_{\widehat{\mathbf{V}}^\star} \right\|}. \quad (A.15)
\]

A.3 An "$\mathbf{L}_{\widehat{\mathbf{V}}^\star}$-free" matrix version of the R-estimator

By using the scalar $\kappa_l$, previously defined in Eq. (A.5), we can rewrite the expression of the R-estimator in Eq. (19) as:
\[
\mathrm{vec}(\widehat{\mathbf{V}}_{1,R}) = \mathrm{vec}(\widehat{\mathbf{V}}^\star) + \frac{1}{\sqrt{L}\, \hat{\alpha}_C} \left[ \mathbf{L}_{\widehat{\mathbf{V}}^\star} \mathbf{L}^H_{\widehat{\mathbf{V}}^\star} \right]^{-1} \mathbf{L}_{\widehat{\mathbf{V}}^\star} \sum_{l=1}^L \kappa_l\, \mathrm{vec}(\hat{\mathbf{u}}^\star_l (\hat{\mathbf{u}}^\star_l)^H). \quad (A.16)
\]
By using the "$\mathbf{L}_{\widehat{\mathbf{V}}^\star}$-free" version of $\widetilde{\boldsymbol{\Delta}}^C_{\widehat{\mathbf{V}}^\star}$ obtained in Eq. (A.10), the expression in Eq. (A.16) can be rewritten as:
\[
\mathrm{vec}(\widehat{\mathbf{V}}_{1,R}) = \mathrm{vec}(\widehat{\mathbf{V}}^\star) + \frac{1}{\sqrt{L}\, \hat{\alpha}_C}\, \mathbf{P}^\top \left[ \mathbf{L}_{\widehat{\mathbf{V}}^\star} \mathbf{L}^H_{\widehat{\mathbf{V}}^\star} \right]^{-1} \mathbf{P}\, \mathrm{vec}\!\left( (\widehat{\mathbf{V}}^\star)^{-1/2} \mathbf{R}\, (\widehat{\mathbf{V}}^\star)^{-1/2} - \zeta\, (\widehat{\mathbf{V}}^\star)^{-1} \right). \quad (A.17)
\]
To get rid of the "unnecessarily large" $N^2 \times N^2$ matrix $\mathbf{P}^\top \left[ \mathbf{L}_{\widehat{\mathbf{V}}^\star} \mathbf{L}^H_{\widehat{\mathbf{V}}^\star} \right]^{-1} \mathbf{P}$, we make use of the extension to the complex field of the result obtained in Lemma 3.1 of [10].
Specifically, it can be shown that (see [12, Appendix A.2]):
\[
\mathbf{P}^\top \left[ \mathbf{L}_{\widehat{\mathbf{V}}^\star} \mathbf{L}^H_{\widehat{\mathbf{V}}^\star} \right]^{-1} \mathbf{P} = \left[ \mathbf{I}_{N^2} - \mathrm{vec}(\widehat{\mathbf{V}}^\star)\, \mathbf{e}_1^\top \right] \left[ (\widehat{\mathbf{V}}^\star)^\top \otimes \widehat{\mathbf{V}}^\star \right] \left[ \mathbf{I}_{N^2} - \mathrm{vec}(\widehat{\mathbf{V}}^\star)\, \mathbf{e}_1^\top \right]^H, \quad (A.18)
\]
where $\mathbf{e}_1$ is the first vector of the canonical basis of $\mathbb{R}^{N^2}$. Through direct calculation, we can easily show that:
\[
\begin{aligned}
\mathbf{P}^\top &\left[ \mathbf{L}_{\widehat{\mathbf{V}}^\star} \mathbf{L}^H_{\widehat{\mathbf{V}}^\star} \right]^{-1} \mathbf{P}\, \mathrm{vec}\!\left( (\widehat{\mathbf{V}}^\star)^{-1/2} \mathbf{R}\, (\widehat{\mathbf{V}}^\star)^{-1/2} - \zeta\, (\widehat{\mathbf{V}}^\star)^{-1} \right) \\
&= \left[ \mathbf{I}_{N^2} - \mathrm{vec}(\widehat{\mathbf{V}}^\star)\, \mathbf{e}_1^\top \right] \left[ (\widehat{\mathbf{V}}^\star)^\top \otimes \widehat{\mathbf{V}}^\star \right] \mathrm{vec}\!\left( (\widehat{\mathbf{V}}^\star)^{-1/2} \mathbf{R}\, (\widehat{\mathbf{V}}^\star)^{-1/2} - \zeta\, (\widehat{\mathbf{V}}^\star)^{-1} \right) \\
&= \left[ \mathbf{I}_{N^2} - \mathrm{vec}(\widehat{\mathbf{V}}^\star)\, \mathbf{e}_1^\top \right] \mathrm{vec}\!\left( (\widehat{\mathbf{V}}^\star)^{1/2} \mathbf{R}\, (\widehat{\mathbf{V}}^\star)^{1/2} - \zeta\, \widehat{\mathbf{V}}^\star \right) \\
&= \sqrt{L}\, \mathrm{vec}\!\left( \mathbf{W} - [\mathbf{W}]_{1,1}\, \widehat{\mathbf{V}}^\star \right), \quad (A.19)
\end{aligned}
\]
where
\[
\mathbf{W} \triangleq L^{-1/2}\, (\widehat{\mathbf{V}}^\star)^{1/2} \mathbf{R}\, (\widehat{\mathbf{V}}^\star)^{1/2} = (\widehat{\mathbf{V}}^\star)^{1/2} \left[ \frac{1}{L} \sum_{l=1}^L K_h\!\left( \frac{r^\star_l}{L+1} \right) \hat{\mathbf{u}}^\star_l (\hat{\mathbf{u}}^\star_l)^H \right] (\widehat{\mathbf{V}}^\star)^{1/2}. \quad (A.20)
\]
By collecting the previous results, the R-estimator in Eq. (19) can be expressed as:
\[
\mathrm{vec}(\widehat{\mathbf{V}}_{1,R}) = \mathrm{vec}(\widehat{\mathbf{V}}^\star) + \frac{1}{\hat{\alpha}_C}\, \mathrm{vec}\!\left( \mathbf{W} - [\mathbf{W}]_{1,1}\, \widehat{\mathbf{V}}^\star \right). \quad (A.21)
\]
Finally, in matrix form, we have:
\[
\widehat{\mathbf{V}}_{1,R} = \widehat{\mathbf{V}}^\star + \frac{1}{\hat{\alpha}_C} \left( \mathbf{W} - [\mathbf{W}]_{1,1}\, \widehat{\mathbf{V}}^\star \right), \quad (A.22)
\]
where
\[
\hat{\alpha}_C = \frac{\left\| \mathbf{z}_{\widehat{\mathbf{V}}^\star + L^{-1/2}\mathbf{H}_C} - \mathbf{z}_{\widehat{\mathbf{V}}^\star} \right\|}{\left\| \mathbf{r}_{\widehat{\mathbf{V}}^\star} \right\|}. \quad (A.23)
\]
This concludes the proof of Eqs. (20) and (21).

References
1. Bickel P, Klaassen C, Ritov Y, Wellner J (1993) Efficient and Adaptive Estimation for Semiparametric Models. Johns Hopkins University Press
2. Bishop CM (2007) Pattern Recognition and Machine Learning (Information Science and Statistics), 1st edn. Springer
3. Fortunati S, Gini F, Greco MS, Zoubir AM, Rangaswamy M (2019) Semiparametric CRB and Slepian-Bangs formulas for complex elliptically symmetric distributions. IEEE Transactions on Signal Processing 67(20):5352–5364
4. Fortunati S, Gini F, Greco MS, Zoubir AM, Rangaswamy M (2019) Semiparametric inference and lower bounds for real elliptically symmetric distributions. IEEE Transactions on Signal Processing 67(1):164–177
5. Fortunati S, Renaux A, Pascal F (2020) Properties of a new R-estimator of shape matrices. EUSIPCO 2020
6. Fortunati S, Renaux A, Pascal F (2020) Robust semiparametric efficient estimators in complex elliptically symmetric distributions. IEEE Transactions on Signal Processing 68:5003–5015
7. Fortunati S, Renaux A, Pascal F (2020) Robust semiparametric joint estimators of location and scatter in elliptical distributions. In: 2020 IEEE 30th International Workshop on Machine Learning for Signal Processing (MLSP), pp 1–6, DOI 10.1109/MLSP49062.2020.9231865
8. Frontera-Pons J, Veganzones MA, Pascal F, Ovarlez J (2016) Hyperspectral anomaly detectors using robust estimators. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 9(2):720–731
9. Hájek J (1968) Asymptotic normality of simple linear rank statistics under alternatives. Ann Math Statist 39(2):325–346
10. Hallin M, Paindaveine D (2006) Semiparametrically efficient rank-based inference for shape I. Optimal rank-based tests for sphericity. The Annals of Statistics 34(6):2707–2756
11. Hallin M, Paindaveine D (2009) Parametric and semiparametric inference for shape: the role of the scale functional. Statistics & Decisions 24(3):327–350
12. Hallin M, Oja H, Paindaveine D (2006) Semiparametrically efficient rank-based inference for shape II. Optimal R-estimation of shape. The Annals of Statistics 34(6):2757–2789
13. Hjørungnes A (2011) Complex-Valued Matrix Derivatives With Applications in Signal Processing and Communications. Cambridge University Press
14. Huber PJ, Ronchetti EM (2011) Robust Statistics (Second Edition). John Wiley & Sons
15. Kreutz-Delgado K (2009) The complex gradient operator and the CR-calculus. URL https://arxiv.org/abs/0906.4835
16. Le Cam L, Yang GL (2000) Asymptotics in Statistics: Some Basic Concepts (second edition). Springer Series in Statistics
17. Manolakis DG, Marden D, Kerekes JP, Shaw GA (2001) Statistics of hyperspectral imaging data. In: Shen SS, Descour MR (eds) Algorithms for Multispectral, Hyperspectral, and Ultraspectral Imagery VII, International Society for Optics and Photonics, SPIE, vol 4381, pp 308–316
18. Maronna RA (1976) Robust M-estimators of multivariate location and scatter. Ann Statist 4(1):51–67
19. Mériaux B, Ren C, El Korso MN, Breloy A, Forster P (2019) Robust estimation of structured scatter matrices in (mis)matched models. Signal Processing 165:163–174
20. Mériaux B, Ren C, El Korso MN, Breloy A, Forster P (2019) Asymptotic performance of complex M-estimators for multivariate location and scatter estimation. IEEE Signal Processing Letters 26(2):367–371
21. Mériaux B, Ren C, Breloy A, El Korso MN, Forster P (2020) Matched and mismatched estimation of Kronecker product of linearly structured scatter matrices under elliptical distributions. IEEE Transactions on Signal Processing, DOI 10.1109/TSP.2020.3042946
22. Ollila E, Tyler DE, Koivunen V, Poor HV (2012) Complex elliptically symmetric distributions: Survey, new results and applications. IEEE Transactions on Signal Processing 60(11):5597–5625
23. Ollila E, Tyler DE, Koivunen V, Poor HV (2012) Compound-Gaussian clutter modeling with an inverse Gaussian texture distribution. IEEE Signal Processing Letters 19(12):876–879
24. Paindaveine D (2006) A Chernoff-Savage result for shape: on the non-admissibility of pseudo-Gaussian methods. Journal of Multivariate Analysis 97(10):2206–2220
25. Pascal F, Bombrun L, Tourneret J, Berthoumieu Y (2013) Parameter estimation for multivariate generalized Gaussian distributions. IEEE Transactions on Signal Processing 61(23):5960–5971
26. Petersen KB, Pedersen MS (2012) The matrix cookbook. Version 20121115
27. Remmert R (1991) Theory of Complex Functions. New York: Springer
28. Roizman V, Jonckheere M, Pascal F (2020) A flexible EM-like clustering algorithm for noisy data
29. Sangston KJ, Gini F, Greco MS (2012) Coherent radar target detection in heavy-tailed compound-Gaussian clutter. IEEE Transactions on Aerospace and Electronic Systems 48(1):64–77
30. Schroth CA, Muma M (2020) Robust M-estimation based Bayesian cluster enumeration for real elliptically symmetric distributions. Submitted to IEEE Transactions on Signal Processing (available on arXiv), URL https://arxiv.org/abs/2005.01404