A short overview of adaptive multichannel filters SNR loss analysis
Olivier Besson

March 2021
Abstract
Many multichannel systems use a linear filter to retrieve a signal of interest corrupted by noise whose statistics are partly unknown. The optimal filter in Gaussian noise requires knowledge of the noise covariance matrix $\Sigma$ and in practice the latter is estimated from a set of training samples. An important issue concerns the characterization of the performance of such adaptive filters. This is generally achieved using as figure of merit the ratio of the signal to noise ratio (SNR) at the output of the adaptive filter to the SNR obtained with the clairvoyant (known $\Sigma$) filter. This problem has been studied extensively since the seventies and this document presents a concise overview of results published in the literature. We consider various cases about the training samples covariance matrix and we investigate fully adaptive, partially adaptive and regularized filters.

Keywords
Adaptive array processing, covariance mismatch, signal to noise ratio loss, Wishart matrices, Student distributed training samples.

Preamble
This report focuses on performance analysis of multichannel adaptive filters through analysis of their SNR loss. It is an outgrowth of a course on array processing I have been giving for about fifteen years in various institutions in Toulouse at the Master of Sciences level. It should be viewed as a personal excerpt of the many results published in the literature since the seventies and it is in no way claimed to be exhaustive. Similarly it is not meant to provide an exhaustive list of references. Rather I will highlight along the document the references that have been most influential to me, starting with references [1–3] by Van Trees, Kelly and Ward which contain invaluable information. Very good overviews of topics related to this problem can also be found in [4–6]. The derivations leading to the representations of the SNR loss mostly borrow from results in multivariate statistics, especially from the theory of Gaussian and Wishart distributions. Numerous references are available concerning this area, including [7–10].
1 Problem statement

Let us start with the problem of detecting the presence and/or estimating the amplitude $\alpha$ of a known signal of interest (SoI) $v$ from a corrupted version $x = \alpha v + n$ where $n$ is the disturbance and will be referred to as noise. In radar applications, $v$ stands for the space and/or time signature of a potential target and $n$ comprises clutter, thermal noise and possibly jammers. The simplest processor is a linear filter $w$ which enables one to estimate $\alpha$ as $\hat{\alpha} = w^H x$ and to decide on the SoI's presence, e.g., by comparing $|w^H x|$ to a threshold. The SNR at the output of the (fixed) filter $w$ is given by
$$\mathrm{SNR}(w) = \frac{E\{|\alpha\, w^H v|^2\}}{E\{|w^H n|^2\}} = \frac{P\,|w^H v|^2}{w^H \Sigma w} \qquad (1)$$
where $P = E\{|\alpha|^2\}$. When the noise $n$ follows a complex multivariate Gaussian distribution with zero mean and covariance matrix $\Sigma$, the maximum likelihood estimate (MLE) of $\alpha$ writes $\hat{\alpha}_{\mathrm{ML}} = w_{\mathrm{opt}}^H x$ with
$$w_{\mathrm{opt}} = \frac{\Sigma^{-1}v}{v^H\Sigma^{-1}v} \qquad (2)$$
With white noise, i.e., when $\Sigma = \gamma I_N$, (2) boils down to what we will refer to as the white noise matched filter $w_{\mathrm{wnmf}} = (v^Hv)^{-1}v$. The optimal filter $w_{\mathrm{opt}}$ is also obtained as the solution to the following minimization problem:
$$\min_w\ w^H\Sigma w \ \text{ subject to } \ w^Hv = 1 \qquad (3)$$
In other words this filter minimizes the output power under the constraint that the signal of interest goes unscathed through the filter; equivalently, $w_{\mathrm{opt}}$ maximizes $\mathrm{SNR}(w)$. Note that this interpretation holds irrespective of the distribution of $n$. The filter $w_{\mathrm{opt}}$ results in
$$\mathrm{SNR}_{\mathrm{opt}} = \mathrm{SNR}(w_{\mathrm{opt}}) = P\,v^H\Sigma^{-1}v \qquad (4)$$
and consequently the performance of an arbitrary $w$ can be evaluated from the SNR loss:
$$\rho(w) = \frac{\mathrm{SNR}(w)}{\mathrm{SNR}_{\mathrm{opt}}} = \frac{|w^Hv|^2}{(v^H\Sigma^{-1}v)(w^H\Sigma w)} \qquad (5)$$
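For readers who want to experiment with these quantities, the following short numpy sketch evaluates (2) and (5); the function names and the way $\Sigma$ is passed are mine, not the report's.

```python
import numpy as np

def optimal_filter(v, Sigma):
    """Clairvoyant MVDR filter (2): Sigma^{-1} v / (v^H Sigma^{-1} v)."""
    Sinv_v = np.linalg.solve(Sigma, v)
    return Sinv_v / np.vdot(v, Sinv_v).real

def snr_loss(w, v, Sigma):
    """SNR loss (5) of an arbitrary filter w; scale-invariant in w."""
    num = abs(np.vdot(w, v)) ** 2
    den = (np.vdot(v, np.linalg.solve(Sigma, v)).real
           * np.vdot(w, Sigma @ w).real)
    return num / den
```

By construction `snr_loss(optimal_filter(v, Sigma), v, Sigma)` returns 1, while the white noise matched filter $(v^Hv)^{-1}v$ yields a value below 1 whenever the noise is colored.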
2 Fully adaptive processing

Let us consider the practical situation where $\Sigma$ is unknown and hence needs to be inferred from a set of $K$ samples which are usually referred to as training samples. We assume in this section that $K \geq N$. Let us assume that the training samples are independent and identically distributed according to a multivariate Gaussian distribution with zero mean and covariance matrix $\Sigma_t$ which, at this stage, is arbitrary and possibly different from $\Sigma$. Let $X_t$ be the $N \times K$ matrix gathering the training samples so that, with the notations described in the appendix, $X_t \overset{d}{=} \mathcal{CN}_{N,K}(0, \Sigma_t, I_K)$. Let $S_t = X_tX_t^H$ denote the sample covariance matrix and let us build the adaptive matched filter
$$w = (v^HS_t^{-1}v)^{-1}S_t^{-1}v \qquad (6)$$
In order to evaluate its performance we will use the SNR loss which, with the above choice of $w$, becomes
$$\rho = \frac{(v^HS_t^{-1}v)^2}{(v^H\Sigma^{-1}v)(v^HS_t^{-1}\Sigma S_t^{-1}v)} \qquad (7)$$
Since $S_t$ is random, $\rho$ is a random variable whose probability density function (p.d.f.) we are interested in.

Before continuing, we would like to make the following comments. First note that the case where $\Sigma_t = \Sigma$ is of primary importance and has been studied in the fundamental work by Reed, Mallett and Brennan [11] from a radar application point of view; see also [12, 13] for a multivariate analysis oriented study. Following Van Trees's terminology [1], we will refer to this case as the minimum variance distortionless response (MVDR) scenario. Now in some applications there is possibly a covariance mismatch between the samples that are used to train the filter and the samples to be filtered, and a large number of papers have addressed this issue. The simplest case is the so-called partially homogeneous environment where $\Sigma_t = \gamma\Sigma$ [14, 15]. A second common example is the case where the training samples contain the SoI, i.e., $\Sigma_t = \Sigma + Pvv^H$, a scenario which will be referred to as the minimum power distortionless response (MPDR) scenario. Its thorough analysis can be found in [16]. The training samples can also be contaminated by signal-like components or outliers [17, 18] or there might exist a rank-one difference between $\Sigma_t$ and $\Sigma$, for instance a surprise or undernulled interference [19–21]. The case where $\Sigma_t$ and $\Sigma$ are different but satisfy the so-called generalized eigen-relation is dealt with in [22, 23] while an arbitrary $\Sigma_t$ is considered in [24–27].

In what follows we let $\Sigma_t^{1/2}$ denote a square root of $\Sigma_t$, i.e., $\Sigma_t = \Sigma_t^{1/2}(\Sigma_t^{1/2})^H$, and we use the short-hand notations $\Sigma_t^{H/2} = (\Sigma_t^{1/2})^H$, $\Sigma_t^{-1/2} = (\Sigma_t^{1/2})^{-1}$ and $\Sigma_t^{-H/2} = (\Sigma_t^{H/2})^{-1}$. The definitions of the complex matrix-variate distributions appearing below can be found in the appendix.

We consider first an arbitrary $\Sigma_t$ and then specialize to the above cases of interest. Our aim is to obtain a statistical representation of $\rho$ in terms of well-known distributions and the derivations below follow from [26]. Since $X_t \overset{d}{=} \Sigma_t^{1/2}N$ with $N \overset{d}{=} \mathcal{CN}_{N,K}(0, I_N, I_K)$, it ensues that $S_t \overset{d}{=} \Sigma_t^{1/2}W\Sigma_t^{H/2}$ with $W \overset{d}{=} \mathcal{CW}_N(K, I_N)$. Therefore, one can write
$$\rho = \frac{(v^HS_t^{-1}v)^2}{(v^H\Sigma^{-1}v)(v^HS_t^{-1}\Sigma S_t^{-1}v)} \overset{d}{=} \frac{(v^H\Sigma_t^{-H/2}W^{-1}\Sigma_t^{-1/2}v)^2}{(v^H\Sigma^{-1}v)(v^H\Sigma_t^{-H/2}W^{-1}\Sigma_t^{-1/2}\Sigma\Sigma_t^{-H/2}W^{-1}\Sigma_t^{-1/2}v)} \overset{d}{=} \frac{(v^H\Sigma_t^{-H/2}QW^{-1}Q^H\Sigma_t^{-1/2}v)^2}{(v^H\Sigma^{-1}v)(v^H\Sigma_t^{-H/2}QW^{-1}Q^H\Sigma_t^{-1/2}\Sigma\Sigma_t^{-H/2}QW^{-1}Q^H\Sigma_t^{-1/2}v)} \qquad (8)$$
for any unitary matrix $Q$, since $W$ and $QWQ^H$ have the same distribution. Let us define
$$\Omega = Q^H\Sigma_t^{-1/2}\Sigma\Sigma_t^{-H/2}Q = \begin{pmatrix}\Omega_{11} & \Omega_{12} \\ \Omega_{21} & \Omega_{22}\end{pmatrix} \qquad (9)$$
and let us choose $Q$ such that $Q^H\Sigma_t^{-1/2}v = (v^H\Sigma_t^{-1}v)^{1/2}e_1$ where $e_1 = \begin{bmatrix}1 & 0 & \dots & 0\end{bmatrix}^T$.
With this choice we obtain
$$\rho \overset{d}{=} \frac{v^H\Sigma_t^{-1}v}{v^H\Sigma^{-1}v}\,\frac{(e_1^HW^{-1}e_1)^2}{e_1^HW^{-1}\Omega W^{-1}e_1} \qquad (10)$$
If we partition $W$ as in (9) we get
$$W^{-1}e_1 = \begin{pmatrix}W_{11.2}^{-1} & -W_{11.2}^{-1}W_{12}W_{22}^{-1} \\ -W_{22}^{-1}W_{21}W_{11.2}^{-1} & W_{22.1}^{-1}\end{pmatrix}\begin{pmatrix}1 \\ 0\end{pmatrix} = W_{11.2}^{-1}\begin{pmatrix}1 \\ -t\end{pmatrix} \qquad (11)$$
where
$$W_{11.2} = W_{11} - W_{12}W_{22}^{-1}W_{21}; \quad t = W_{22}^{-1}W_{21} \qquad (12)$$
The SNR loss can thus be written as
$$\rho = \frac{v^H\Sigma_t^{-1}v}{v^H\Sigma^{-1}v}\left\{\begin{bmatrix}1 & -t^H\end{bmatrix}\Omega\begin{bmatrix}1 \\ -t\end{bmatrix}\right\}^{-1} \qquad (13)$$
Moreover
$$\begin{bmatrix}1 & -t^H\end{bmatrix}\Omega\begin{bmatrix}1 \\ -t\end{bmatrix} = \Omega_{11} - \Omega_{12}t - t^H\Omega_{21} + t^H\Omega_{22}t = \Omega_{11.2} + (t-\bar{t})^H\Omega_{22}(t-\bar{t}) \qquad (14)$$
which, along with the readily verified fact that $\Omega_{11.2} = \frac{v^H\Sigma_t^{-1}v}{v^H\Sigma^{-1}v}$, yields the following compact expression
$$\rho = \left[1 + \frac{v^H\Sigma^{-1}v}{v^H\Sigma_t^{-1}v}\,(t-\bar{t})^H\Omega_{22}(t-\bar{t})\right]^{-1} \qquad (15)$$
with $\bar{t} = \Omega_{22}^{-1}\Omega_{21}$. From [13], $t$ follows a complex multivariate Student distribution and can be represented as
$$t \overset{d}{=} \frac{n}{\sqrt{V}}; \quad n \overset{d}{=} \mathcal{CN}_{N-1}(0, I_{N-1}), \quad V \overset{d}{=} \mathcal{C}\chi^2_{K-N+2}(0) \qquad (16)$$
Let $\Omega_{22} = U\Lambda U^H$ denote the eigenvalue decomposition of $\Omega_{22}$. Then one can write the quadratic form in (15) as
$$\mathcal{Q} = (t-\bar{t})^H\Omega_{22}(t-\bar{t}) = V^{-1}(n - V^{1/2}\bar{t})^HU\Lambda U^H(n - V^{1/2}\bar{t}) = V^{-1}\sum_{i=1}^{N-1}\lambda_i\left|u_i^H(n - V^{1/2}\bar{t})\right|^2 \overset{d}{=} V^{-1}\sum_{i=1}^{N-1}\lambda_i\,\mathcal{C}\chi^2_1(V\delta_i) \qquad (17)$$
where $V \overset{d}{=} \mathcal{C}\chi^2_{K-N+2}(0)$ and $\delta_i = |u_i^H\bar{t}|^2$. Therefore, the SNR loss admits the following representation:
$$\rho \overset{d}{=} \left[1 + \frac{v^H\Sigma^{-1}v}{v^H\Sigma_t^{-1}v}\,\frac{\sum_{i=1}^{N-1}\lambda_i\,\mathcal{C}\chi^2_1(V\delta_i)}{V}\right]^{-1} \qquad (18)$$
The previous equation provides a statistical representation of the SNR loss for arbitrary matrices $\Sigma$ and $\Sigma_t$. We will examine later on the impact of $\Sigma_t \neq \Sigma$ on the SNR loss distribution. Prior to that, let us consider the MVDR scenario for which $\Sigma_t = \Sigma$, or possibly $\Sigma_t = \gamma\Sigma$ since it does not change the distribution of the SNR loss. In this case, one has $\Omega = \gamma^{-1}I_N$, $\lambda_i = \gamma^{-1}$ and $\delta_i = 0$, which results in
$$\rho_{\mathrm{mvdr}} \overset{d}{=} \left[1 + \frac{\mathcal{C}\chi^2_{N-1}(0)}{\mathcal{C}\chi^2_{K-N+2}(0)}\right]^{-1} \qquad (19)$$
and therefore $\rho_{\mathrm{mvdr}} \overset{d}{=} \mathrm{Beta}(N-1, K-N+2)$ with a p.d.f. given by
$$p_{\mathrm{mvdr}}(\rho) = \frac{1}{B_{N-1,K-N+2}}\,\rho^{K-N+1}(1-\rho)^{N-2} \qquad (20)$$
The distribution of $\rho_{\mathrm{mvdr}}$ is therefore independent of $\Sigma$ and depends only on $N$ and $K$. It is straightforward to see that $E\{\rho_{\mathrm{mvdr}}\} = (K-N+2)/(K+1)$, so that $E\{\rho_{\mathrm{mvdr}}\} = 0.5 \Leftrightarrow K = 2N-3$, the celebrated Reed-Mallett-Brennan rule [11]. As an illustration, we display in Figure 1 the Beta distribution (20) for different values of $K$ with $N = 16$. As can be seen the fully adaptive processor does a good job only when a sufficient number of training samples is available, whereas when $K$ approaches $N$ the support of this distribution is significantly moved towards very small values, and one may wonder about using $w = (v^HS_t^{-1}v)^{-1}S_t^{-1}v$ under these circumstances. Note that the requirement for a large number of training samples can somehow be relieved if one assumes some structure for $\Sigma$. For instance if $\Sigma$ is known to be persymmetric and this property is exploited then the corresponding fully adaptive filter has a $\mathrm{Beta}\left(\frac{N-1}{2}, \frac{2K-N+2}{2}\right)$ distribution [28] and hence approximately $N$ samples are required to achieve convergence.
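The Beta result (20) and the Reed-Mallett-Brennan rule are easy to check by Monte Carlo. A minimal sketch follows, where the steering vector and the Toeplitz covariance are hypothetical choices (the distribution of $\rho_{\mathrm{mvdr}}$ does not depend on them):

```python
import numpy as np
rng = np.random.default_rng(0)

N = 16
K = 2 * N - 3                        # RMB rule: E{rho} = 0.5, i.e., -3 dB
v = np.ones(N, dtype=complex)        # hypothetical signature
Sigma = np.array([[0.9 ** abs(i - j) for j in range(N)]
                  for i in range(N)], dtype=complex)
L = np.linalg.cholesky(Sigma)        # a square root Sigma^{1/2}

losses = []
for _ in range(5000):
    # K training samples ~ CN(0, Sigma)
    Xt = L @ (rng.standard_normal((N, K))
              + 1j * rng.standard_normal((N, K))) / np.sqrt(2)
    w = np.linalg.solve(Xt @ Xt.conj().T, v)   # S_t^{-1} v (scaling irrelevant)
    num = abs(np.vdot(w, v)) ** 2
    den = (np.vdot(v, np.linalg.solve(Sigma, v)).real
           * np.vdot(w, Sigma @ w).real)
    losses.append(num / den)

print(np.mean(losses), (K - N + 2) / (K + 1))  # both close to 0.5
```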
Figure 1: Distribution of the SNR loss when $\Sigma_t = \Sigma$. $N = 16$ and varying $K$.

Let us now move to the MPDR scenario which has been thoroughly studied by Boroson [16]. Actually the latter reference considers a rather large set of scenarios (MVDR or MPDR) and additionally takes into account errors about the SoI signature, i.e., Boroson derives representations of the SNR loss of $w \propto S_t^{-1}\bar{v}$ where $\bar{v} \neq v$, for both the MVDR and MPDR scenarios. Assuming no SoI signature error and a partially homogeneous MPDR scenario, $\Sigma_t = \gamma\Sigma + Pvv^H$, it can be shown that
$$\rho_{\mathrm{mpdr}} \overset{d}{=} \left[1 + (1 + \gamma^{-1}\mathrm{SNR}_{\mathrm{opt}})\,\frac{\mathcal{C}\chi^2_{N-1}(0)}{\mathcal{C}\chi^2_{K-N+2}(0)}\right]^{-1} \qquad (22)$$
which depends only on $K$, $N$ and $\gamma^{-1}\mathrm{SNR}_{\mathrm{opt}}$. It is clear that the difference between the MVDR and the MPDR scenarios will be all the more pronounced as $\gamma^{-1}\mathrm{SNR}_{\mathrm{opt}}$ is large. The p.d.f. of $\rho_{\mathrm{mpdr}}$ is given by
$$p_{\mathrm{mpdr}}(\rho) = \frac{(1 + \gamma^{-1}\mathrm{SNR}_{\mathrm{opt}})^{K-N+2}}{B_{N-1,K-N+2}}\,\frac{\rho^{K-N+1}(1-\rho)^{N-2}}{(1 + \gamma^{-1}\mathrm{SNR}_{\mathrm{opt}}\,\rho)^{K+1}} \qquad (23)$$
The detrimental effect of the presence of the SoI in the training samples is illustrated in Figure 2 where we plot the distribution of $\rho_{\mathrm{mpdr}}$ for $K = 2N$ and various values of $\gamma^{-1}\mathrm{SNR}_{\mathrm{opt}}$, and where we compare it with the distribution of $\rho_{\mathrm{mvdr}}$. Obviously when $\gamma^{-1}\mathrm{SNR}_{\mathrm{opt}}$ increases a large difference is observed and $\rho_{\mathrm{mpdr}}$ is likely to take very low values, which makes the interest of the adaptive filter questionable in this situation.

Another case of interest stems from observing the importance of the vector $\bar{t}$ in (17). Indeed the non-centrality parameters $\delta_i$ of (18) depend on it and therefore become 0 if $\bar{t} = 0$. The latter condition is equivalent to the so-called generalized eigen-relation (GER) [19, 22] which states that $\Sigma_t^{-1}v = \lambda\Sigma^{-1}v$. If the latter condition is fulfilled then
$$\rho_{\mathrm{GER}} \overset{d}{=} \left[1 + \frac{\sum_{i=1}^{N-1}\lambda_i\,\mathcal{C}\chi^2_1(0)}{\lambda\,\mathcal{C}\chi^2_{K-N+2}(0)}\right]^{-1} \qquad (24)$$
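Representation (22) can be simulated directly from two chi-square draws, assuming the complex chi-square convention of the appendix, $\mathcal{C}\chi^2_m(0) = \chi^2_{2m}/2$. The function below is an illustrative sketch:

```python
import numpy as np
rng = np.random.default_rng(1)

def sample_rho_mpdr(N, K, snr_opt, gamma=1.0, n=100_000):
    """Draws from (22); complex Cchi2_m(0) is simulated as chi2(2m)/2."""
    c1 = rng.chisquare(2 * (N - 1), n) / 2        # Cchi2_{N-1}(0)
    c2 = rng.chisquare(2 * (K - N + 2), n) / 2    # Cchi2_{K-N+2}(0)
    return 1.0 / (1.0 + (1.0 + snr_opt / gamma) * c1 / c2)

# average loss in dB for SNRopt = 10 dB (hypothetical values)
rho = sample_rho_mpdr(N=16, K=32, snr_opt=10.0)
print(10 * np.log10(rho.mean()))
```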
Figure 2: Distribution of the SNR loss in the presence of the SoI in the training samples versus $\gamma^{-1}\mathrm{SNR}_{\mathrm{opt}}$. $N = 16$ and $K = 2N$.

Before illustrating the general case, we investigate a special case of the GER, namely when $\Sigma = \Sigma_t + qq^H$ and $q^H\Sigma_t^{-1}v = 0$, i.e., the data to be filtered contains a rank-one component (e.g., a surprise interference) which is not accounted for by the training samples and which falls in a null of the optimal filter $w_{\mathrm{opt}}$. In this case, the expression in (24) can be simplified to
$$\rho_{\Sigma = \Sigma_t + qq^H,\,q^H\Sigma_t^{-1}v = 0} \overset{d}{=} \left[1 + \frac{\mathcal{C}\chi^2_{N-2}(0) + (1 + q^H\Sigma_t^{-1}q)\,\mathcal{C}\chi^2_1(0)}{\mathcal{C}\chi^2_{K-N+2}(0)}\right]^{-1} \qquad (25)$$
which depends on $q^H\Sigma_t^{-1}q$ only. In Figure 3 we illustrate the impact of such a surprise interference which is present in the data to be filtered but has not been learned from the training samples. As the power of this interference increases the degradation is seen to be substantial.

Let us come back to the general case where $\Sigma_t \neq \Sigma$ but the GER $\Sigma_t^{-1}v = \lambda\Sigma^{-1}v$ is satisfied. The representation of the SNR loss is given by (24) and is shown to depend on the scenario through $\lambda$ and the eigenvalues $\lambda_i$. In the sequel we consider a scenario dealing with array processing, more precisely a uniform linear array with $N = 16$ elements spaced a half wavelength apart. The data to be filtered consists of thermal (white Gaussian) noise and 3 interfering signals, one at a negative angle and two at $9^{\circ}$ and $25^{\circ}$ (measured with respect to the normal of the array), with respective powers 35 dB, 25 dB and 30 dB above thermal noise power. We would like to stress that this is a very specific scenario where $\Sigma$ consists of a very strong low-rank component plus a scaled identity matrix. Thus it may not be representative of other applications and consequently the conclusions drawn hereafter apply only to this case. As for $\Sigma_t$ we will use the method in [26] where $\Sigma_t = \Sigma^{1/2}W^{-1}\Sigma^{H/2}$ and, in the general case, $W$ is a Wishart matrix with mean value $\eta I_N$ and $10\log\eta$ uniformly distributed over a symmetric interval; in the GER case $W$ has a specific form, see [26] for details. In Figure 4 five independent matrices $\Sigma_t$ were so randomly generated and we compare the distribution of the SNR loss when $\Sigma_t \neq \Sigma$ and $\Sigma_t^{-1}v = \lambda\Sigma^{-1}v$ to the case where $\Sigma_t = \Sigma$ (solid line in the figure). As can be observed, a mismatch of the training samples covariance matrix results in a degradation of the SNR loss. Let us finally deal with the more general case where $\Sigma_t = \Sigma^{1/2}W^{-1}\Sigma^{H/2}$ and the GER is not satisfied. Figure 5 plots the distribution of the SNR loss corresponding to the representation in (18). Again the deleterious effect of $\Sigma_t \neq \Sigma$ is observed.
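For arbitrary $(\Sigma, \Sigma_t)$ the representation (18) can also be sampled once $\lambda_i$ and $\delta_i$ are computed. The sketch below follows the construction (8)-(18), again with $\mathcal{C}\chi^2_m(\delta) = \chi^2_{2m}(2\delta)/2$; it is an illustration, not the code used for the figures of the report:

```python
import numpy as np
rng = np.random.default_rng(2)

def sample_rho(Sigma, Sigma_t, v, K, n=20_000):
    """Draws from representation (18) for arbitrary Sigma, Sigma_t."""
    N = len(v)
    Lt = np.linalg.cholesky(Sigma_t)                  # Sigma_t^{1/2}
    u = np.linalg.solve(Lt, v)                        # Sigma_t^{-1/2} v
    # unitary Q whose first column is u/||u||
    Z = rng.standard_normal((N, N - 1)) + 1j * rng.standard_normal((N, N - 1))
    Q, _ = np.linalg.qr(np.column_stack([u / np.linalg.norm(u), Z]))
    # M = Sigma_t^{-1/2} Sigma Sigma_t^{-H/2}, then Omega as in (9)
    M = np.linalg.solve(Lt, np.linalg.solve(Lt, Sigma).conj().T)
    Omega = Q.conj().T @ M @ Q
    lam, U = np.linalg.eigh(Omega[1:, 1:])            # eigenpairs of Omega_22
    tbar = np.linalg.solve(Omega[1:, 1:], Omega[1:, 0])
    delta = np.abs(U.conj().T @ tbar) ** 2
    r = (np.vdot(v, np.linalg.solve(Sigma, v)).real
         / np.vdot(v, np.linalg.solve(Sigma_t, v)).real)
    V = rng.chisquare(2 * (K - N + 2), n) / 2         # Cchi2_{K-N+2}(0)
    S = np.zeros(n)
    for li, di in zip(lam, delta):                    # sum of Cchi2_1(V*delta_i)
        S += li * rng.noncentral_chisquare(2, 2 * V * di) / 2
    return 1.0 / (1.0 + r * S / V)
```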
Figure 3: Distribution of the SNR loss when $\Sigma = \Sigma_t + qq^H$ and $q^H\Sigma_t^{-1}v = 0$. $N = 16$ and $K = 2N$.

Before closing this section, we mention that approximations of the distribution of $\rho$ of the form
$$\rho \overset{d}{\approx} \left[1 + a\,\frac{\mathcal{C}\chi^2_\nu(0)}{\mathcal{C}\chi^2_\mu(0)}\right]^{-1} \qquad (26)$$
are proposed in [26], both under the GER assumption and in the general case. They are based on approximating a quadratic form in centered normal variables (GER case) or non-centered Student variables (general case) and rely on Pearson's method [29], see also [30–32].

So far we assumed a fixed and arbitrary matrix $\Sigma_t$ and, with no loss of generality, we can write it as $\Sigma_t = \Sigma^{1/2}\Gamma^{-1}\Sigma^{H/2}$ where $\Gamma$ is an arbitrary matrix. Therefore, the p.d.f. of $X_t|\Gamma$ is given by
$$p(X_t|\Gamma) = \pi^{-NK}\left|\Sigma^{1/2}\Gamma^{-1}\Sigma^{H/2}\right|^{-K}\mathrm{etr}\left\{-X_t^H\left[\Sigma^{1/2}\Gamma^{-1}\Sigma^{H/2}\right]^{-1}X_t\right\} = \pi^{-NK}|\Sigma|^{-K}|\Gamma|^K\,\mathrm{etr}\left\{-\Gamma\Sigma^{-1/2}X_tX_t^H\Sigma^{-H/2}\right\} \qquad (27)$$
Let us now consider $\Gamma$ as a random matrix and for mathematical convenience let us assume for $\Gamma$ a prior conjugate to (27), namely a Wishart distribution $\Gamma \overset{d}{=} \mathcal{CW}_N(\nu, \mu^{-1}I_N)$:
$$p(\Gamma) \propto |\mu^{-1}I_N|^{-\nu}|\Gamma|^{\nu-N}\,\mathrm{etr}\{-\mu\Gamma\} \qquad (28)$$
The mean value of $\Gamma^{-1}$ is $E\{\Gamma^{-1}\} = (\nu - N)^{-1}\mu I_N$, so that choosing $\mu = \nu - N$ results in $\Gamma^{-1}$ "fluctuating" around $I_N$ and hence $\Sigma_t$ fluctuating around $\Sigma$. Under the assumptions (27)-(28) it can be shown that $X_t$ follows a complex matrix-variate Student distribution $X_t \overset{d}{=} \mathcal{CT}_{N,K}(\nu - N + 1, 0, \mu\Sigma, I_K)$ whose p.d.f. is given by
$$p(X_t) \propto |\mu\Sigma|^{-K}\left|I_N + (\mu\Sigma)^{-1}X_tX_t^H\right|^{-(\nu+K)} \qquad (29)$$
Figure 4: Distribution of the SNR loss when $\Sigma_t \neq \Sigma$ and $\Sigma_t^{-1}v = \lambda\Sigma^{-1}v$. The solid line represents the case $\Sigma_t = \Sigma$. $N = 16$ and $K = 2N$.

In other words, this type of random covariance mismatch results in a distribution mismatch, as the training samples are no longer Gaussian but Student distributed. In fact we have now $X_t \overset{d}{=} (\mu\Sigma)^{1/2}W_\nu^{-1/2}N$ where $W_\nu \overset{d}{=} \mathcal{CW}_N(\nu, I_N)$ is independent of $N \overset{d}{=} \mathcal{CN}_{N,K}(0, I_N, I_K)$, so that
$$S_t \overset{d}{=} \mu\Sigma^{1/2}W_\nu^{-1/2}W_KW_\nu^{-H/2}\Sigma^{H/2} = \mu\Sigma^{1/2}F^{-1}\Sigma^{H/2} \qquad (30)$$
where $W_K \overset{d}{=} \mathcal{CW}_N(K, I_N)$. It follows that $F = W_\nu^{1/2}W_K^{-1}W_\nu^{H/2} \overset{d}{=} \mathcal{CF}_N(\nu, K)$. Recall that in the Gaussian case we had $S_t \overset{d}{=} \Sigma_t^{1/2}W_K\Sigma_t^{H/2}$ with $W_K \overset{d}{=} \mathcal{CW}_N(K, I_N)$. Therefore the analysis of the Student case follows along the same lines as in the Gaussian case except that now one needs to use properties of partitioned F-distributed matrices rather than partitioned Wishart matrices. In the Student case it can be shown [33] that the SNR loss admits the following representation:
$$\rho_{\mathrm{Student}} \overset{d}{=} \left[1 + \left(1 + \frac{\mathcal{C}\chi^2_{K-N+1}(0)}{\mathcal{C}\chi^2_\nu(0)}\right)\frac{\mathcal{C}\chi^2_{N-1}(0)}{\mathcal{C}\chi^2_{K-N+2}(0)}\right]^{-1} \qquad (31)$$
When comparing the Student representation (31) to its Gaussian counterpart (19) it is clear that the SNR loss is likely to take lower values in the Student case than in the Gaussian case. Note also that we recover that (19) and (31) are equivalent when $\nu \to \infty$. The p.d.f. of $\rho_{\mathrm{Student}}$ is given by
$$p_{\mathrm{Student}}(\rho) = \frac{\rho^{K-N+1}(1-\rho)^{N-2}}{B_{N-1,K-N+2}} \times \frac{B_{K-N+1,\nu+N-1}}{B_{K-N+1,\nu}}\,{}_2F_1(K+1, K-N+1; \nu+K; 1-\rho) \qquad (32)$$
where the first term is recognized as the distribution of the SNR loss in the Gaussian case. As an illustration of the impact of Student distributed training samples we first look at the influence of $\nu$ in Figure 6 where we display the p.d.f. of $\rho_{\mathrm{Student}}$ for $K = 2N$. Here $\mu = \nu - N$ so that $E\{X_tX_t^H\} = \Sigma$. As can be seen, the impact is rather significant, with the support of $\rho_{\mathrm{Student}}$'s p.d.f. moved downwards compared to that in the Gaussian case. We also observe that it depends on $K$, as illustrated in Figure 7: the difference between the Student and the Gaussian cases increases with $K$.
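Again, (19) and (31) are one-liners to simulate under the chi-square convention above; a possible sketch:

```python
import numpy as np
rng = np.random.default_rng(3)

def sample_rho_student(N, K, nu=None, n=100_000):
    """Draws from (19) if nu is None (Gaussian case), from (31) otherwise."""
    c = lambda m: rng.chisquare(2 * m, n) / 2     # Cchi2_m(0)
    f = 1.0 if nu is None else 1.0 + c(K - N + 1) / c(nu)
    return 1.0 / (1.0 + f * c(N - 1) / c(K - N + 2))

N, K = 16, 32
for nu in (None, 8 * N, 2 * N):                   # heavier tails as nu decreases
    print(nu, 10 * np.log10(sample_rho_student(N, K, nu).mean()))
```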
Figure 5: Distribution of the SNR loss when $\Sigma_t = \Sigma^{1/2}W^{-1}\Sigma^{H/2}$. The solid line represents the case $\Sigma_t = \Sigma$. $N = 16$ and $K = 2N$.

This section was devoted to the analysis of the SNR loss associated with the fully adaptive filter $w \propto S_t^{-1}v$. We looked at the case where $\Sigma_t = \Sigma$ (for both Gaussian and Student distributed training samples) and also at the case $\Sigma_t \neq \Sigma$. A short synthesis of this section is found below.

Synthesis
When the training samples have covariance matrix $\Sigma_t$ and the filter $w = (v^HS_t^{-1}v)^{-1}S_t^{-1}v$ is used in lieu of $w_{\mathrm{opt}} = (v^H\Sigma^{-1}v)^{-1}\Sigma^{-1}v$, degradation occurs, with the SNR loss a random variable taking values in $[0, 1]$. The least impact is achieved in the MVDR scenario where $\Sigma_t = \Sigma$, and $K = 2N - 3$ training samples are necessary to achieve an average SNR 3 dB below the optimum SNR. In the MPDR scenario where $\Sigma_t = \Sigma + Pvv^H$ the number of samples required to achieve convergence increases with $\mathrm{SNR}_{\mathrm{opt}}$. The detrimental effect of a covariance mismatch between the training samples and the data to be filtered was also analysed. Finally we illustrated that with Student distributed training samples (which can occur as a particular (Bayesian) type of covariance mismatch) the support of the p.d.f. of $\rho$ is also moved towards lower values.
Figure 6: Probability density function of the SNR loss with Student distributed samples for various $\nu$. $E\{X_tX_t^H\} = \Sigma$. $N = 16$ and $K = 2N$.

The SNR loss of the MPDR beamformer has been particularly studied in [16], where errors on the signature $v$ are also considered. This reference allows a more straightforward derivation of $\rho_{\mathrm{mpdr}}$, which we adapt here. Let us assume that $\Sigma_t = \Sigma + Pvv^H$ and let us write
$$\begin{aligned}
\rho_{\mathrm{mpdr}} &= \frac{(v^HS_t^{-1}v)^2}{(v^H\Sigma^{-1}v)\left(v^HS_t^{-1}\Sigma S_t^{-1}v\right)} = \frac{(v^HS_t^{-1}v)^2}{(v^H\Sigma^{-1}v)\left[v^HS_t^{-1}(\Sigma_t - Pvv^H)S_t^{-1}v\right]} \\
&= \frac{(v^HS_t^{-1}v)^2}{(v^H\Sigma_t^{-1}v)(v^HS_t^{-1}\Sigma_tS_t^{-1}v)}\cdot\frac{v^H\Sigma_t^{-1}v}{v^H\Sigma^{-1}v}\cdot\left(1 - \frac{P(v^HS_t^{-1}v)^2}{v^HS_t^{-1}\Sigma_tS_t^{-1}v}\right)^{-1} \\
&\overset{d}{=} \rho_{\mathrm{mvdr}}\left[(1 + Pv^H\Sigma^{-1}v)(1 - \rho_{\mathrm{mvdr}}\,Pv^H\Sigma_t^{-1}v)\right]^{-1} \\
&= \rho_{\mathrm{mvdr}}\left[1 + Pv^H\Sigma^{-1}v - \rho_{\mathrm{mvdr}}\,Pv^H\Sigma^{-1}v\right]^{-1} = \rho_{\mathrm{mvdr}}\left[1 + (1 - \rho_{\mathrm{mvdr}})\mathrm{SNR}_{\mathrm{opt}}\right]^{-1}
\end{aligned} \qquad (33)$$
where we used the fact that $v^H\Sigma_t^{-1}v = (1 + Pv^H\Sigma^{-1}v)^{-1}(v^H\Sigma^{-1}v)$. The simple change of variables $\rho_{\mathrm{mpdr}} \to \rho_{\mathrm{mvdr}}$ in (33), along with the p.d.f. of $\rho_{\mathrm{mvdr}}$ in (20), enables one to recover the p.d.f. of $\rho_{\mathrm{mpdr}}$ in (23).
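The mapping (33) is easy to validate numerically: simulate $\rho_{\mathrm{mpdr}}$ from its definition and compare with transformed Beta draws. A sketch, with hypothetical scenario values (white noise, $\mathrm{SNR}_{\mathrm{opt}} = 10$):

```python
import numpy as np
rng = np.random.default_rng(4)

N, K, trials = 8, 16, 5000
v = np.ones(N, dtype=complex)
Sigma = np.eye(N, dtype=complex)                  # hypothetical noise covariance
snr_opt = 10.0
P = snr_opt / np.vdot(v, np.linalg.solve(Sigma, v)).real
Lt = np.linalg.cholesky(Sigma + P * np.outer(v, v.conj()))  # SoI in training data

rho = []
for _ in range(trials):
    Xt = Lt @ (rng.standard_normal((N, K))
               + 1j * rng.standard_normal((N, K))) / np.sqrt(2)
    w = np.linalg.solve(Xt @ Xt.conj().T, v)
    rho.append(abs(np.vdot(w, v)) ** 2
               / (np.vdot(v, np.linalg.solve(Sigma, v)).real
                  * np.vdot(w, Sigma @ w).real))

b = rng.beta(K - N + 2, N - 1, trials)            # rho_mvdr, p.d.f. (20)
print(np.mean(rho), np.mean(b / (1 + (1 - b) * snr_opt)))  # should agree
```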
Figure 7: Probability density function of the SNR loss with Student distributed samples for various $K$. $E\{X_tX_t^H\} = \Sigma$. $N = 16$ and $\nu = 2N$.

The derivations above also enable one to derive stochastic representations for the weight vector of the adaptive filter itself, see [34] for related work. Indeed one has
$$w = \frac{S_t^{-1}v}{v^HS_t^{-1}v} \overset{d}{=} \frac{\Sigma_t^{-H/2}W^{-1}\Sigma_t^{-1/2}v}{v^H\Sigma_t^{-H/2}W^{-1}\Sigma_t^{-1/2}v} \overset{d}{=} \frac{\Sigma_t^{-H/2}QW^{-1}Q^H\Sigma_t^{-1/2}v}{v^H\Sigma_t^{-H/2}QW^{-1}Q^H\Sigma_t^{-1/2}v} \qquad (34)$$
for any unitary $Q$. Let us again choose
$$Q = \begin{bmatrix}\Sigma_t^{-1/2}v(v^H\Sigma_t^{-1}v)^{-1/2} & \Sigma_t^{H/2}V_\perp(V_\perp^H\Sigma_tV_\perp)^{-H/2}\end{bmatrix}$$
where $V_\perp$ is a semi-unitary matrix such that $V_\perp^Hv = 0$. Then
$$w \overset{d}{=} (v^H\Sigma_t^{-1}v)^{-1/2}\begin{bmatrix}\Sigma_t^{-1}v(v^H\Sigma_t^{-1}v)^{-1/2} & V_\perp(V_\perp^H\Sigma_tV_\perp)^{-H/2}\end{bmatrix}\begin{bmatrix}1 \\ -t\end{bmatrix} = \frac{\Sigma_t^{-1}v}{v^H\Sigma_t^{-1}v} - (v^H\Sigma_t^{-1}v)^{-1/2}V_\perp(V_\perp^H\Sigma_tV_\perp)^{-H/2}\frac{n}{\sqrt{V}} \qquad (35)$$
with $n \overset{d}{=} \mathcal{CN}_{N-1}(0, I_{N-1})$ and $V \overset{d}{=} \mathcal{C}\chi^2_{K-N+2}(0)$. When $\Sigma_t = \Sigma$ we obtain
$$w_{\mathrm{mvdr}} \overset{d}{=} w_{\mathrm{opt}} - (v^H\Sigma^{-1}v)^{-1/2}V_\perp(V_\perp^H\Sigma V_\perp)^{-H/2}\frac{n}{\sqrt{V}} \qquad (36)$$
while when $\Sigma_t = \Sigma + Pvv^H$ we get
$$w_{\mathrm{mpdr}} \overset{d}{=} w_{\mathrm{opt}} - (1 + \mathrm{SNR}_{\mathrm{opt}})^{1/2}(v^H\Sigma^{-1}v)^{-1/2}V_\perp(V_\perp^H\Sigma V_\perp)^{-H/2}\frac{n}{\sqrt{V}} \qquad (37)$$
We can notice that the two beamformers differ in the subspace orthogonal to $v$. One can also observe that
$$E\{\|w\|^2\} = \left\|\frac{\Sigma_t^{-1}v}{v^H\Sigma_t^{-1}v}\right\|^2 + \frac{1}{v^H\Sigma_t^{-1}v}\,\frac{\mathrm{Tr}\{(V_\perp^H\Sigma_tV_\perp)^{-1}\}}{K-N+1} \qquad (38)$$
This implies in particular that
$$E\{\|w_{\mathrm{mpdr}}\|^2\} = E\{\|w_{\mathrm{mvdr}}\|^2\} + \frac{P\,\mathrm{Tr}\{(V_\perp^H\Sigma V_\perp)^{-1}\}}{K-N+1} \qquad (39)$$
Therefore we can expect the white noise array gain $\|w_{\mathrm{mpdr}}\|^{-2}$ of the MPDR beamformer to be lower than that of the MVDR beamformer, with a difference that increases with the power of the signal of interest.

3 Partially adaptive processing

This section is devoted to partially adaptive processing, where the adaptive filter belongs to a subspace of the entire space. In array processing applications this includes beamspace processing or reduced-rank adaptive beamforming. First we set the principle of this approach and give some insights about when it can be as efficient as fully adaptive processing. Then we will successively consider the case of fixed transformations and the case of principal-component based transformations. We will also briefly allude to random transformations. Unless otherwise stated we assume in this section that $\Sigma_t = \Sigma$.

3.1 Principle

Consider the general structure of a partially adaptive beamformer of the form described in Figure 8: the data $x$ is first transformed through a $N \times (R+1)$ matrix $T$ to get $\tilde{x} = T^Hx$, and then $\tilde{x}$ is filtered by the length-$(R+1)$ filter $\tilde{w}$ to obtain the output $\tilde{w}^H\tilde{x}$. The equivalent length-$N$ filter is $w = T\tilde{w}$.

Figure 8: Structure of a partially adaptive processor with $T$ a $N \times (R+1)$ matrix.

We will often consider the case where $T = \begin{bmatrix}\|v\|^{-2}v & V_\perp\Psi\end{bmatrix}$ with $\Psi$ a $(N-1) \times R$ matrix, which is described in Figure 9. This choice of $T$ is reminiscent of a sidelobe canceler structure where the columns of $V_\perp\Psi$ are orthogonal to $v$ and aimed at capturing the part of the noise that goes through the white noise matched filter $w_{\mathrm{wnmf}} = \|v\|^{-2}v$.

Figure 9: Structure of a partially adaptive processor with $T = \begin{bmatrix}w_{\mathrm{wnmf}} & V_\perp\Psi\end{bmatrix}$.

Let us start as before with minimizing the output power subject to unit gain towards $v$:
$$\min_{\tilde{w}}\ \tilde{w}^H\tilde{\Sigma}_t\tilde{w} \ \text{ subject to } \ \tilde{w}^H\tilde{v} = 1 \qquad (40)$$
where $\tilde{\Sigma}_t = T^H\Sigma_tT = T^H\Sigma T$ and $\tilde{v} = T^Hv$.
The solution is $\tilde{w}_{\mathrm{opt}} = (\tilde{v}^H\tilde{\Sigma}_t^{-1}\tilde{v})^{-1}\tilde{\Sigma}_t^{-1}\tilde{v}$. Let us now ask the following question: can we possibly have $T\tilde{w}_{\mathrm{opt}} = w_{\mathrm{opt}} = (v^H\Sigma^{-1}v)^{-1}\Sigma^{-1}v$? If so it would mean that no loss is incurred when using a partially adaptive processor compared to a fully adaptive processor. Let us examine how such an equality could be achieved:
$$T\tilde{w}_{\mathrm{opt}} \propto \Sigma^{-1}v \Leftrightarrow T\tilde{\Sigma}_t^{-1}\tilde{v} \propto \Sigma^{-1}v \Leftrightarrow T(T^H\Sigma T)^{-1}T^Hv \propto \Sigma^{-1}v \Leftrightarrow \Pi_{\Sigma^{1/2}T}\,\Sigma^{-1/2}v \propto \Sigma^{-1/2}v \Leftrightarrow \Sigma^{-1/2}v \in \mathcal{R}\{\Sigma^{1/2}T\} \Leftrightarrow \Sigma^{-1}v \in \mathcal{R}\{T\} \qquad (41)$$
The latter equality means that the range space of $T$ should include $\Sigma^{-1}v$. At first glance this is a meaningless condition since, if $\Sigma^{-1}v$ was known, we would already have the optimal filter. However, let us consider a situation where $\Sigma = GG^H + \gamma I_N$ where $G$ is a $N \times R$ matrix. Then
$$\Sigma^{-1}v = (GG^H + \gamma I_N)^{-1}v = \gamma^{-1}\left[I_N - G\left(\gamma I_R + G^HG\right)^{-1}G^H\right]v = \gamma^{-1}\begin{bmatrix}v & G\end{bmatrix}\begin{bmatrix}1 \\ -(\gamma I_R + G^HG)^{-1}G^Hv\end{bmatrix} \qquad (42)$$
It follows that if $\mathcal{R}\{\begin{bmatrix}v & G\end{bmatrix}\} \subset \mathcal{R}\{T\}$ then $T\tilde{w}_{\mathrm{opt}} = w_{\mathrm{opt}}$. In other words, if the range space of $T$ contains $v$ and the principal subspace of $\Sigma$ then there is no loss in using partially adaptive processing. In this case $w_{\mathrm{opt}}$ actually belongs to a subspace. This suggests that partially adaptive processing might be particularly effective when the noise covariance matrix has a strong low-rank component; a small numerical check is given below. Before pursuing, let us just see what the condition (41) means when $T = \begin{bmatrix}\|v\|^{-2}v & V_\perp\Psi\end{bmatrix}$. We have
$$\Sigma^{-1}v \in \mathcal{R}\{T\} \Leftrightarrow \Sigma^{-1}v = \alpha w_{\mathrm{wnmf}} + V_\perp\Psi\beta \Leftrightarrow (\|v\|^{-2}vv^H + V_\perp V_\perp^H)\Sigma^{-1}v = \alpha\|v\|^{-2}v + V_\perp\Psi\beta \Leftrightarrow \|v\|^{-2}(v^H\Sigma^{-1}v - \alpha)v + V_\perp(V_\perp^H\Sigma^{-1}v - \Psi\beta) = 0 \qquad (43)$$
Since $v$ and $V_\perp$ are orthogonal this is equivalent to $V_\perp^H\Sigma^{-1}v \in \mathcal{R}\{\Psi\}$. Moreover, if $\Sigma = GG^H + \gamma I_N$ then from (42) $V_\perp^H\Sigma^{-1}v = -\gamma^{-1}V_\perp^HG(\gamma I_R + G^HG)^{-1}G^Hv \in \mathcal{R}\{V_\perp^HG\}$. Therefore if $\mathcal{R}\{V_\perp^HG\} \subset \mathcal{R}\{\Psi\}$ the partially adaptive filter coincides with the fully adaptive filter.

Before continuing, note that if $\Sigma = GG^H + \gamma I_N$ and $GG^H \gg \gamma I_N$ then from (42)
$$w_{\mathrm{opt}} \propto v - G\left(\gamma I_R + G^HG\right)^{-1}G^Hv \simeq v - G\left(G^HG\right)^{-1}G^Hv = \Pi_G^\perp v \qquad (44)$$
where $\Pi_G^\perp$ denotes the projector onto the orthogonal complement of $\mathcal{R}\{G\}$. Therefore if the low-rank component in $\Sigma$ is much stronger than the white noise component, the optimal filter consists of projecting the desired signal $v$ onto the so-called noise subspace. This property will be the basis for principal component adaptive processing to be described later.
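The no-loss condition is easily checked numerically. In the sketch below (all values hypothetical) $\Sigma = GG^H + \gamma I_N$ and $T = [v\ G]$, so that $\mathcal{R}\{T\}$ contains $\Sigma^{-1}v$ by (42) and the reduced-dimension filter reproduces $w_{\mathrm{opt}}$ exactly:

```python
import numpy as np
rng = np.random.default_rng(5)

N, R, gamma = 16, 3, 1.0
v = np.ones(N, dtype=complex)
G = 10 * (rng.standard_normal((N, R)) + 1j * rng.standard_normal((N, R)))
Sigma = G @ G.conj().T + gamma * np.eye(N)

w_opt = np.linalg.solve(Sigma, v)
w_opt /= np.vdot(v, w_opt).real

T = np.column_stack([v, G])        # range{T} = span{v, G} contains Sigma^{-1} v
vt = T.conj().T @ v
wt = np.linalg.solve(T.conj().T @ Sigma @ T, vt)
wt /= np.vdot(vt, wt).real

print(np.allclose(T @ wt, w_opt))  # True: no loss, cf. (41)-(42)
```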
3.2 Analysis of the SNR loss for fixed T and insights

Let us now consider the practical case where a set of training samples $X_t$ is used to design the filter $\tilde{w}$ and we solve (40) with $\tilde{S}_t = T^HS_tT$ substituted for $\tilde{\Sigma}_t$, i.e.,
$$\min_{\tilde{w}}\ \tilde{w}^H\tilde{S}_t\tilde{w} \ \text{ subject to } \ \tilde{w}^H\tilde{v} = 1 \qquad (45)$$
whose solution is
$$\tilde{w} = \frac{\tilde{S}_t^{-1}\tilde{v}}{\tilde{v}^H\tilde{S}_t^{-1}\tilde{v}} = \frac{(T^HS_tT)^{-1}T^Hv}{v^HT(T^HS_tT)^{-1}T^Hv} \ \Rightarrow \ w = \frac{T(T^HS_tT)^{-1}T^Hv}{v^HT(T^HS_tT)^{-1}T^Hv} \qquad (46)$$
In the case where $T = \begin{bmatrix}w_{\mathrm{wnmf}} & V_\perp\Psi\end{bmatrix}$ the weight vector can be written as $w = w_{\mathrm{wnmf}} - V_\perp\Psi\tilde{w}_a$ with $\tilde{w}_a$ a $R$-dimensional vector which is found by minimizing the output power in an unconstrained way, since $(w_{\mathrm{wnmf}} - V_\perp\Psi\tilde{w}_a)^Hv = 1$. In other words the problem is now
$$\min_{\tilde{w}_a}\ (w_{\mathrm{wnmf}} - V_\perp\Psi\tilde{w}_a)^HS_t(w_{\mathrm{wnmf}} - V_\perp\Psi\tilde{w}_a) \qquad (47)$$
whose solution results in the filter
$$w = w_{\mathrm{wnmf}} - V_\perp\Psi(\Psi^HV_\perp^HS_tV_\perp\Psi)^{-1}\Psi^HV_\perp^HS_tw_{\mathrm{wnmf}} = \begin{bmatrix}w_{\mathrm{wnmf}} & V_\perp\Psi\end{bmatrix}\begin{bmatrix}1 \\ -\hat{\Sigma}_{\tilde{z}}^{-1}\hat{r}_{d\tilde{z}}\end{bmatrix} \qquad (48)$$
where $\hat{\Sigma}_{\tilde{z}}$ and $\hat{r}_{d\tilde{z}}$ are the sample versions of $E\{\tilde{z}\tilde{z}^H\}$ and $E\{d^*\tilde{z}\}$, see Figure 9.

We first analyse the SNR loss obtained with the filter $w = T\tilde{w} = (\tilde{v}^H\tilde{S}_t^{-1}\tilde{v})^{-1}T\tilde{S}_t^{-1}\tilde{v}$. Let us assume first an arbitrary $\Sigma_t$ and note that $\tilde{X}_t = T^HX_t \overset{d}{=} \mathcal{CN}_{R+1,K}(0, \tilde{\Sigma}_t, I_K)$ so that $\tilde{S}_t = \tilde{X}_t\tilde{X}_t^H \overset{d}{=} \tilde{\Sigma}_t^{1/2}\tilde{W}\tilde{\Sigma}_t^{H/2}$ with $\tilde{W} \overset{d}{=} \mathcal{CW}_{R+1}(K, I_{R+1})$. Mimicking the derivations used in the fully adaptive case, we can write
$$\rho = \frac{(w^Hv)^2}{(v^H\Sigma^{-1}v)(w^H\Sigma w)} = \frac{(\tilde{v}^H\tilde{S}_t^{-1}\tilde{v})^2}{(v^H\Sigma^{-1}v)(\tilde{v}^H\tilde{S}_t^{-1}\tilde{\Sigma}\tilde{S}_t^{-1}\tilde{v})} \overset{d}{=} \frac{(\tilde{v}^H\tilde{\Sigma}_t^{-H/2}\tilde{Q}\tilde{W}^{-1}\tilde{Q}^H\tilde{\Sigma}_t^{-1/2}\tilde{v})^2}{(v^H\Sigma^{-1}v)(\tilde{v}^H\tilde{\Sigma}_t^{-H/2}\tilde{Q}\tilde{W}^{-1}\tilde{Q}^H\tilde{\Sigma}_t^{-1/2}\tilde{\Sigma}\tilde{\Sigma}_t^{-H/2}\tilde{Q}\tilde{W}^{-1}\tilde{Q}^H\tilde{\Sigma}_t^{-1/2}\tilde{v})} \qquad (49)$$
for any unitary matrix $\tilde{Q}$, where $\tilde{\Sigma} = T^H\Sigma T$. Again let us choose $\tilde{Q}$ such that $\tilde{Q}^H\tilde{\Sigma}_t^{-1/2}\tilde{v} = (\tilde{v}^H\tilde{\Sigma}_t^{-1}\tilde{v})^{1/2}\tilde{e}_1$ where $\tilde{e}_1 = \begin{bmatrix}1 & 0 & \dots & 0\end{bmatrix}^T$, and let us define
$$\tilde{\Omega} = \tilde{Q}^H\tilde{\Sigma}_t^{-1/2}\tilde{\Sigma}\tilde{\Sigma}_t^{-H/2}\tilde{Q} = \begin{pmatrix}\tilde{\Omega}_{11} & \tilde{\Omega}_{12} \\ \tilde{\Omega}_{21} & \tilde{\Omega}_{22}\end{pmatrix} \qquad (50)$$
Then we get
$$\rho \overset{d}{=} \frac{\tilde{v}^H\tilde{\Sigma}_t^{-1}\tilde{v}}{v^H\Sigma^{-1}v} \times \frac{(\tilde{e}_1^H\tilde{W}^{-1}\tilde{e}_1)^2}{\tilde{e}_1^H\tilde{W}^{-1}\tilde{\Omega}\tilde{W}^{-1}\tilde{e}_1} \qquad (51)$$
Similarly to what was done in the fully adaptive case, one can show that
$$\frac{\tilde{e}_1^H\tilde{W}^{-1}\tilde{\Omega}\tilde{W}^{-1}\tilde{e}_1}{(\tilde{e}_1^H\tilde{W}^{-1}\tilde{e}_1)^2} = \frac{\tilde{v}^H\tilde{\Sigma}_t^{-1}\tilde{v}}{\tilde{v}^H\tilde{\Sigma}^{-1}\tilde{v}} + \tilde{\mathcal{Q}} = \frac{\tilde{v}^H\tilde{\Sigma}_t^{-1}\tilde{v}}{\tilde{v}^H\tilde{\Sigma}^{-1}\tilde{v}}\left[1 + \frac{\tilde{v}^H\tilde{\Sigma}^{-1}\tilde{v}}{\tilde{v}^H\tilde{\Sigma}_t^{-1}\tilde{v}}\,\tilde{\mathcal{Q}}\right] \qquad (52)$$
with
$$\tilde{\mathcal{Q}} \overset{d}{=} \tilde{V}^{-1}\sum_{i=1}^R\tilde{\lambda}_i\,\mathcal{C}\chi^2_1(\tilde{V}\tilde{\delta}_i) \qquad (53)$$
where $\tilde{V} \overset{d}{=} \mathcal{C}\chi^2_{K-R+1}(0)$, $\tilde{\delta}_i = |\tilde{u}_i^H\tilde{\Omega}_{22}^{-1}\tilde{\Omega}_{21}|^2$, and $\tilde{u}_i$, $\tilde{\lambda}_i$ are the $R$ eigenvectors and eigenvalues of $\tilde{\Omega}_{22}$. It follows that, for arbitrary $\Sigma_t$, the SNR loss of the partially adaptive filter can be represented as
$$\rho \overset{d}{=} \frac{\tilde{v}^H\tilde{\Sigma}^{-1}\tilde{v}}{v^H\Sigma^{-1}v}\left[1 + \frac{\tilde{v}^H\tilde{\Sigma}^{-1}\tilde{v}}{\tilde{v}^H\tilde{\Sigma}_t^{-1}\tilde{v}}\,\frac{\sum_{i=1}^R\tilde{\lambda}_i\,\mathcal{C}\chi^2_1(\tilde{V}\tilde{\delta}_i)}{\tilde{V}}\right]^{-1} \qquad (54)$$
Let us now focus on the MVDR scenario for which $\Sigma_t = \Sigma$. In this case $\tilde{\Omega} = I_{R+1}$, $\tilde{\lambda}_i = 1$, $\tilde{\delta}_i = 0$ and therefore one obtains
$$\rho_{\mathrm{pa\text{-}mvdr}} \overset{d}{=} \frac{\tilde{v}^H\tilde{\Sigma}^{-1}\tilde{v}}{v^H\Sigma^{-1}v}\,\mathrm{Beta}(R, K-R+1) \qquad (55)$$
Therefore, the SNR loss is now a scaled beta distributed random variable whose p.d.f. greatly depends on
$$a = \frac{\tilde{v}^H\tilde{\Sigma}^{-1}\tilde{v}}{v^H\Sigma^{-1}v} = \frac{v^HT(T^H\Sigma T)^{-1}T^Hv}{v^H\Sigma^{-1}v} = \frac{v^H\Sigma^{-H/2}\Pi_{\Sigma^{1/2}T}\Sigma^{-1/2}v}{v^H\Sigma^{-1}v} = \frac{\text{energy of }\Sigma^{-1/2}v\text{ in }\mathcal{R}\{\Sigma^{1/2}T\}}{\text{energy of }\Sigma^{-1/2}v} \qquad (56)$$
which confirms what was observed in (41), i.e., the importance of how much energy of $\Sigma^{-1/2}v$ is contained in $\mathcal{R}\{\Sigma^{1/2}T\}$. In particular the support of the distribution of $\rho$ is shrunk due to the scaling and is now limited to $[0, a]$. Of course the value of $a$ is unknown since it depends on $\Sigma$. In Figure 10 we illustrate the distribution of $\rho_{\mathrm{pa\text{-}mvdr}}$ for various values of $N$, $K$ and a hypothetical $a$. It can be seen that in limited sample support, e.g., $K = N$, partially adaptive beamforming is even to be preferred to fully adaptive beamforming.

Nevertheless in practice one does not know the value of $a$ a particular choice of $T$ will result in. For illustration purposes let us assume that $\Sigma = GG^H + \gamma I_N$ and let us choose $T = \begin{bmatrix}\|v\|^{-2}v & V_\perp\Psi\end{bmatrix}$. We consider two choices for $\Psi$. In one case $\Psi$ is picked at random while in a second case $\Psi$ is such that the angles between $\mathcal{R}\{V_\perp^HG\}$ and $\mathcal{R}\{\Psi\}$ are less than $45^\circ$. The corresponding values of $a$ are given in Figure 11 for 100 different trials of $\Psi$. It is clear that picking $\Psi$ at random does not offer much guarantee: some values of $a$ can be very small even though some can be rather high. Consequently selecting one random $\Psi$ is risky, yet the idea of using a few random matrices $\Psi$ is definitely not irrelevant; we will come back to this point later. On the contrary, it is seen that if $\Psi$ is selected such that $\mathcal{R}\{V_\perp^HG\}$ and $\mathcal{R}\{\Psi\}$ are close, $a$ is very close to 1 which, following Figure 10, should result in a very effective partially adaptive processor. This is a main argument for partially adaptive processors based on using the principal subspace of $S_t$.
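The coefficient $a$ of (56) is directly computable for any candidate $T$, which is a convenient way to reproduce experiments like the one in Figure 11; a small helper (the name is mine):

```python
import numpy as np

def subspace_coefficient(T, v, Sigma):
    """Coefficient a in (56): fraction of the energy of Sigma^{-1/2} v
    lying in range{Sigma^{1/2} T}; the SNR loss support is [0, a]."""
    num = np.vdot(v, T @ np.linalg.solve(T.conj().T @ Sigma @ T,
                                         T.conj().T @ v)).real
    den = np.vdot(v, np.linalg.solve(Sigma, v)).real
    return num / den
```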
Figure 10: Distribution of the SNR loss of a partially adaptive filter versus the value of $a$. The left panels deal with $K = N$, the right panels with $K = 2N$; $R = N/4$. (a) $N = 16$, $K = 16$, $R = 4$; (b) $N = 16$, $K = 32$, $R = 4$; (c) $N = 64$, $K = 64$, $R = 16$; (d) $N = 64$, $K = 128$, $R = 16$.

3.3 Principal component adaptive processing

As shown earlier, partially adaptive processing is particularly suitable when the noise covariance matrix is the sum of a strong low-rank component and white noise. In this case the optimal filter indeed belongs to a subspace, see (42), and can be approximately written, see (44), as a projection onto the orthogonal complement of the principal subspace of $\Sigma$. Moreover there would be no loss if one were able to select $T = \begin{bmatrix}v & G\end{bmatrix}$ or equivalently to select $\Psi = V_\perp^HG$. These observations have led to the use of what we will refer to as the class of principal component adaptive processing, whose gist is to use an estimate of $\mathcal{R}\{G\}$ based on the principal eigenvectors of $S_t$. There is a long list of references about such an approach but the fundamentals are described in [35–37]. In these references the so-called eigencanceler is defined as
$$w_{\mathrm{ec}} = v - \sum_{r=1}^R(u_r^Hv)u_r \qquad (57)$$
Figure 11: Value of $a$ when $\Psi$ is picked at random or close to the principal subspace of $\Sigma = GG^H + \gamma I_N$.

where $u_r$, $r = 1, \dots, N$, stands for the eigenvectors of $S_t$ with the eigenvalues arranged in descending order, i.e.,
$$S_t = \sum_{n=1}^N\lambda_nu_nu_n^H; \quad \lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_N \qquad (58)$$
The form in (57) is in fact the sample version of (44). To our knowledge there does not exist any exact analysis of the distribution of the SNR loss $\rho_{\mathrm{ec}}$ associated with $w_{\mathrm{ec}}$. Only approximations of its distribution are available, under the assumption that $\Sigma = GG^H + \gamma I_N$ and $GG^H \gg \gamma I_N$.
To be more precise, Kirsteins and Tufts [35] showed that, under the latter assumption,
$$\rho_{\mathrm{ec}} \overset{d}{\approx} \mathrm{Beta}(R, K-R+1) \qquad (59)$$
which, when looking at (55), would mean that the eigencanceler achieves a value of $a = 1$, i.e., the value obtained with the clairvoyant choice of $T$. It may be felt as a rather optimistic approximation but it can predict fairly well the actual distribution of $\rho_{\mathrm{ec}}$ when $\Sigma$ is the sum of a very powerful low-rank term and a scaled identity matrix. This is illustrated in Figure 12 where the actual distribution of $\rho_{\mathrm{ec}}$ obtained from Monte-Carlo simulations is compared with the $\mathrm{Beta}(R, K-R+1)$ distribution. Again we remind that the scenario of this simulation is extremely favourable to the eigencanceler. With a smoother variation of the eigenvalues of $\Sigma$ this conclusion should be re-examined.

A few observations can be made concerning the above technique. First note that it uses the eigenvectors associated with the largest eigenvalues. Another meaningful approach was proposed by Goldstein and Reed [38–40], called the cross spectral metric (CSM), whose principle is to select the eigenvectors that contribute most to increasing the SNR. Note also that the original eigencanceler makes use of the eigenvalue decomposition to obtain an estimate of the principal subspace of $\Sigma$. However this is not the only method for that purpose. A very interesting approach which avoids eigenvalue decomposition was proposed by Goldstein, Reed and Scharf in [41] and referred to as the multistage Wiener filter (MWF), see also [42, 43]. It turns out that, at each stage, the MWF operates in the Krylov subspace of the sample covariance matrix, as does the conjugate gradient method at each of its iterations to solve a linear system of the type $S_tw = v$. This analogy puts conjugate gradient forward as an effective and efficient reduced-rank method for partially adaptive processing.
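A direct implementation of the eigencanceler (57) only requires one Hermitian eigendecomposition; a minimal sketch:

```python
import numpy as np

def eigencanceler(St, v, R):
    """Eigencanceler (57): remove from v its projection onto the R
    principal eigenvectors of the sample covariance matrix St."""
    _, U = np.linalg.eigh(St)        # eigenvalues in ascending order
    Ur = U[:, -R:]                   # R principal eigenvectors
    return v - Ur @ (Ur.conj().T @ v)
```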
Figure 12: Probability density function of $\rho_{\mathrm{ec}}$ and comparison with its $\mathrm{Beta}(R, K-R+1)$ prediction. $N = 16$ and $R = 3$.

In [44] we proposed to use a partial Cholesky factorization as a way to approximate the principal subspace. The partial Cholesky factor of $\Sigma$ of rank $R$ will be denoted as $\mathrm{pchol}(\Sigma, R)$. It is the $N \times R$ (with $R \leq \mathrm{rank}(\Sigma)$) lower triangular matrix $G_\Sigma = \begin{pmatrix}G_{1\Sigma} \\ G_{2\Sigma}\end{pmatrix}$, where $G_{1\Sigma}$ is a $R \times R$ lower triangular matrix with positive diagonal elements and $G_{2\Sigma}$ is a $(N-R) \times R$ matrix, defined from
$$\Sigma = \begin{pmatrix}\Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22}\end{pmatrix} = \begin{pmatrix}\Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}\end{pmatrix} + \begin{pmatrix}0 & 0 \\ 0 & \Sigma_{2.1}\end{pmatrix} = \begin{pmatrix}G_{1\Sigma} \\ G_{2\Sigma}\end{pmatrix}\begin{pmatrix}G_{1\Sigma}^H & G_{2\Sigma}^H\end{pmatrix} + \begin{pmatrix}0 & 0 \\ 0 & \Sigma_{2.1}\end{pmatrix} \qquad (60)$$
where $\Sigma_{2.1} = \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$, $G_{1\Sigma}G_{1\Sigma}^H = \Sigma_{11}$ and $G_{2\Sigma}G_{1\Sigma}^H = \Sigma_{21}$. From a practical point of view, $G_\Sigma$ can be obtained, e.g., by using only $R$ steps of Algorithm 4.2.2 of [45]. In practice we can use as an alternative to (57) the vector
$$w_{\mathrm{ec\text{-}pchol}} = \Pi^\perp_{\mathrm{pchol}(S_t,R)}v \qquad (61)$$
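A partial Cholesky factor is obtained by stopping a standard outer-product Cholesky factorization after $R$ steps. The following is a plain, unpivoted sketch of this idea (in the spirit of Algorithm 4.2.2 of [45]), together with the filter (61):

```python
import numpy as np

def pchol(S, R):
    """First R steps of an outer-product Cholesky factorization of S,
    returning the N x R partial factor of (60)."""
    S = S.astype(complex).copy()
    N = S.shape[0]
    G = np.zeros((N, R), dtype=complex)
    for r in range(R):
        G[r:, r] = S[r:, r] / np.sqrt(S[r, r].real)
        S[r:, r:] -= np.outer(G[r:, r], G[r:, r].conj())
    return G

def ec_pchol(St, v, R):
    """Filter (61): project v orthogonally to range{pchol(St, R)}."""
    Q, _ = np.linalg.qr(pchol(St, R))     # orthonormal basis of the range
    return v - Q @ (Q.conj().T @ v)
```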
An analysis was conducted in [44] showing that
$$\rho_{\mathrm{ec\text{-}pchol}} \overset{d}{\approx} a'\,\mathrm{Beta}(R, K-R+1) \qquad (62)$$
where $a'$ is a scalar that depends on $\Sigma$ and is rather close to 1. The representation in (62) is compliant with that of (55) and predicts that the partial Cholesky factorization could perform well. Actually it was shown that the distributions of $\rho_{\mathrm{ec}}$ and $\rho_{\mathrm{ec\text{-}pchol}}$ are almost identical, with the latter possibly better when $\Sigma_t = \Sigma + Pvv^H$. Therefore, the computationally simpler partial Cholesky factorization is a very good alternative to eigenvalue decomposition.

Finally we would like to draw attention to the fact that all these partially adaptive processors require the selection of $R$. Obviously, when $\Sigma = GG^H + \gamma I_N$ and the first low-rank term is predominant, $R$ should be chosen as the rank of $G$. Actually selecting $R$ below this rank has dramatic consequences with very low SNR, mainly due to the fact that the range space of $G$ cannot be captured by $T$. Over-estimating $R$ is less consequential, yet it leads to performance loss.

3.4 Random reduced-dimension transformations

We make a digression here and consider an interesting class of partially adaptive processors based on random $T$, actually random $\Psi$. As said before, if one picks $\Psi$ at random then the coefficient $a = \frac{\tilde{v}^H\tilde{\Sigma}^{-1}\tilde{v}}{v^H\Sigma^{-1}v}$ which intervenes in (55) can sometimes be very close to 1 (hence a good filter) yet other times small. This is undesirable in practice as one cannot control the performance of the partially adaptive processor, which depends on the particular outcome of $\Psi$. However, using and improving over this basic idea results in a rather efficient scheme, as proposed in [46]. The idea of Marzetta et al. is to use a certain number of random matrices $\Psi_\ell$ and to average over the corresponding filters, as illustrated in Figure 13.

Figure 13: Structure of Marzetta et al. partially adaptive processor based on random reduced-dimension transformations.

The final adaptive filter writes
$$w_{\mathrm{Marzetta}} = w_{\mathrm{wnmf}} - \frac{1}{L}\sum_{\ell=1}^LV_\perp\Psi_\ell(\Psi_\ell^HV_\perp^HS_tV_\perp\Psi_\ell)^{-1}\Psi_\ell^HV_\perp^HS_tw_{\mathrm{wnmf}} \qquad (63)$$
In [46] it is proposed to draw the matrices $\Psi_\ell$ from a uniform distribution on the Stiefel manifold of $(N-1) \times R$ semi-unitary matrices. However the vector in (63) remains the same if the $\Psi_\ell$ are drawn from a $\mathcal{CN}_{N-1,R}(0, I_{N-1}, I_R)$ distribution. Analysis of the SNR loss associated with $w_{\mathrm{Marzetta}}$ is not available but very good performance was observed. Moreover a number of interesting results and insights are provided in [46] regarding the average value of $\Psi(\Psi^H\Omega\Psi)^{-1}\Psi^H$ for arbitrary positive semi-definite matrices $\Omega$. For illustration purposes, we display in Figure 14 the SNR loss of $w_{\mathrm{Marzetta}}$ when $R = 3$, i.e., when $R$ corresponds to the number of interfering signals. As can be seen one does not reach the performance of the eigencanceler predicted by (59) but the difference is small.

Figure 14: Probability density function of Marzetta's partially adaptive scheme SNR loss. $N = 16$, $K = 6$ and $R = 3$.

However we observe that improvement can be achieved when $R$ is slightly above the number of interfering signals, as depicted in Figure 15. Therefore one advantage of this method is that one does not need to know precisely the rank of the true interference covariance matrix.
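A sketch of (63) is given below, drawing the $\Psi_\ell$ as complex Gaussian matrices (which, as noted above, leaves the filter unchanged compared to Stiefel-uniform draws); the completion of $V_\perp$ and the default $L$ are my choices:

```python
import numpy as np
rng = np.random.default_rng(6)

def marzetta_filter(St, v, R, L=20):
    """Averaged random reduced-dimension filter (63)."""
    N = len(v)
    w_wnmf = v / np.vdot(v, v).real
    # V_perp: orthonormal basis of the orthogonal complement of v
    Z = rng.standard_normal((N, N - 1)) + 1j * rng.standard_normal((N, N - 1))
    Q, _ = np.linalg.qr(np.column_stack([v / np.linalg.norm(v), Z]))
    Vp = Q[:, 1:]
    w = np.zeros(N, dtype=complex)
    for _ in range(L):
        Psi = (rng.standard_normal((N - 1, R))
               + 1j * rng.standard_normal((N - 1, R)))
        B = Vp @ Psi                                  # one random branch
        g = B.conj().T @ (St @ w_wnmf)
        w += w_wnmf - B @ np.linalg.solve(B.conj().T @ St @ B, g)
    return w / L
```

Note that each branch only inverts an $R \times R$ matrix, so the scheme remains usable even when $K < N$ and $S_t$ is singular.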
Synthesis

In this section we looked at partially adaptive processing, i.e., when the adaptive filter belongs to a given subspace of the entire observation space. We began with a fixed reduced-dimension transformation $T$ and a known $\Sigma$ and we showed that this kind of reduced-dimension structure can be as efficient as the optimal processor when $\Sigma$ contains a strong low-rank component and $T$ is suitably chosen. In practice the sample covariance matrix is substituted for $\Sigma$ and we showed that the SNR loss is a scaled beta distributed random variable for fixed $T$. Next we addressed adaptive techniques where $T$ is selected from the data with the main purpose of retrieving the principal subspace of $\Sigma$. In this case no exact distribution of the SNR loss is available but a rather accurate beta distribution approximation exists which shows a better convergence of partially adaptive filters compared to fully adaptive filters. Finally we hinted at an original idea developed by Marzetta et al. relying on random reduced-dimension transformations.
Figure 15: Probability density function of Marzetta's partially adaptive scheme SNR loss. $N = 16$, $K = 6$ and $R = 4$.

4 Diagonal loading

One of the most widely used methods to cope with the effects of small sample support on conventional adaptive filters is to use regularization and to solve
$$\min_w\ \underbrace{w^H\hat{\Sigma}w}_{\text{output power}} + \mu\|w\|^2 \ \text{ subject to }\ \underbrace{w^Hv = 1}_{\text{constraint}} \qquad (64)$$
where $\hat{\Sigma} = K^{-1}S_t$. In adaptive beamforming the penalizing term $\mu\|w\|^2$ is inversely proportional to the white noise array gain and hence fills the purpose of ensuring that the latter is not too low. The solution is given by
$$w_{\mathrm{dl}} = \frac{(S_t + K\mu I_N)^{-1}v}{v^H(S_t + K\mu I_N)^{-1}v} \qquad (65)$$
and is often referred to as diagonal loading [47]. Diagonal loading is a ubiquitous technique that appears as the solution of many problem formulations in robust adaptive beamforming, see e.g. [4, 48–52]. Although diagonal loading has been dealt with in hundreds of papers, to the best of our knowledge there is no exact analysis of the SNR loss associated with $w_{\mathrm{dl}}$ for arbitrary $\Sigma$. Only approximations are available in the case $\Sigma = GG^H + \gamma I_N$ and $GG^H \gg \gamma I_N$. These approximations require that $\mu$ is larger than $\gamma$ (i.e., the loading level is larger than the white noise power), yet much below the eigenvalues of $GG^H$, which is possible only if a very large gap exists between the latter and the white noise power. Moreover, they also assume that $K$ is small and typically slightly above the rank of $G$. The fundamental works where derivation and analysis of this technique can be found are [53, 54]. In the sequel we provide a sketch of the derivations leading to the distribution of the SNR loss of $w_{\mathrm{dl}}$ when $\Sigma = GG^H + \gamma I_N$.
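For completeness, (65) in code form; choosing the loading level $\mu$ between the white noise power and the interference eigenvalues is left to the user, in line with the discussion above:

```python
import numpy as np

def diagonal_loading_filter(St, v, mu, K):
    """Diagonally loaded filter (65), normalized so that w^H v = 1."""
    w = np.linalg.solve(St + K * mu * np.eye(len(v)), v)
    return w / np.vdot(v, w).real
```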
Let us write the eigenvalue decomposition of $\Sigma$ as
$$\Sigma = GG^H + \gamma I_N = \begin{bmatrix}U_s & U_n\end{bmatrix}\begin{pmatrix}\Lambda_s & 0 \\ 0 & 0\end{pmatrix}\begin{bmatrix}U_s^H \\ U_n^H\end{bmatrix} + \gamma I_N = \begin{bmatrix}U_s & U_n\end{bmatrix}\begin{pmatrix}\Lambda_s + \gamma I_R & 0 \\ 0 & \gamma I_{N-R}\end{pmatrix}\begin{bmatrix}U_s^H \\ U_n^H\end{bmatrix} = U\Lambda U^H \qquad (66)$$
and let us assume that $\Lambda_s(r,r) \gg \gamma$. Under this hypothesis, one has
$$\Sigma^{-1} = U_s(\Lambda_s + \gamma I_R)^{-1}U_s^H + \gamma^{-1}U_nU_n^H \simeq \gamma^{-1}U_nU_n^H \qquad (67)$$
which means that inverting $\Sigma$ amounts to projecting onto the noise subspace. This is basically the main property that will be used below. We begin with the exact representation of $S_t + K\mu I_N$ as
$$S_t + K\mu I_N \overset{d}{=} U\Lambda^{1/2}NN^H\Lambda^{1/2}U^H + K\mu I_N = U\Lambda^{1/2}\left(NN^H + K\mu\Lambda^{-1}\right)\Lambda^{1/2}U^H = U\Lambda^{1/2}\tilde{W}\Lambda^{1/2}U^H \qquad (68)$$
where $N \overset{d}{=} \mathcal{CN}_{N,K}(0, I_N, I_K)$ and $\tilde{W} = NN^H + K\mu\Lambda^{-1}$. It follows that
$$\rho_{\mathrm{dl}} = \frac{[v^H(S_t + K\mu I_N)^{-1}v]^2}{(v^H\Sigma^{-1}v)[v^H(S_t + K\mu I_N)^{-1}\Sigma(S_t + K\mu I_N)^{-1}v]} \overset{d}{=} \frac{[v^HU\Lambda^{-1/2}\tilde{W}^{-1}\Lambda^{-1/2}U^Hv]^2}{(v^H\Sigma^{-1}v)[v^HU\Lambda^{-1/2}\tilde{W}^{-2}\Lambda^{-1/2}U^Hv]} \qquad (69)$$
Now we use the fact that $\Lambda_s(r,r) \gg \gamma$ to write
$$\tilde{W} = NN^H + K\mu\Lambda^{-1} = \begin{bmatrix}N_s \\ N_n\end{bmatrix}\begin{bmatrix}N_s^H & N_n^H\end{bmatrix} + K\mu\begin{pmatrix}(\Lambda_s + \gamma I_R)^{-1} & 0 \\ 0 & \gamma^{-1}I_{N-R}\end{pmatrix} \simeq \begin{pmatrix}N_sN_s^H & N_sN_n^H \\ N_nN_s^H & N_nN_n^H + K\mu\gamma^{-1}I_{N-R}\end{pmatrix} = \begin{pmatrix}\tilde{W}_{ss} & \tilde{W}_{sn} \\ \tilde{W}_{ns} & \tilde{W}_{nn}\end{pmatrix} \qquad (70)$$
and
$$\Lambda^{-1/2}U^Hv \simeq \begin{bmatrix}0 \\ \gamma^{-1/2}U_n^Hv\end{bmatrix} = \gamma^{-1/2}\begin{bmatrix}0 \\ v_n\end{bmatrix} \qquad (71)$$
where $v_n = U_n^Hv$. These approximations result in
$$\rho_{\mathrm{dl}} \simeq \frac{\left(\begin{bmatrix}0 & v_n^H\end{bmatrix}\tilde{W}^{-1}\begin{bmatrix}0 \\ v_n\end{bmatrix}\right)^2}{(v_n^Hv_n)\left(\begin{bmatrix}0 & v_n^H\end{bmatrix}\tilde{W}^{-2}\begin{bmatrix}0 \\ v_n\end{bmatrix}\right)} = \frac{[v_n^H\tilde{W}_{n.s}^{-1}v_n]^2}{(v_n^Hv_n)\left[v_n^H\tilde{W}_{n.s}^{-1}\left(I_{N-R} + \tilde{W}_{ns}\tilde{W}_{ss}^{-2}\tilde{W}_{sn}\right)\tilde{W}_{n.s}^{-1}v_n\right]} \qquad (72)$$
At this stage further approximations are required to obtain tractable expressions and eventually arrive at the final representation in (77). In [54] two successive approximations are actually made while [55] directly states the expression (75) below. Whatever the route, it amounts to using the following approximations:
$$\tilde{W}_{n.s}^{-1} = [\tilde{W}_{nn} - \tilde{W}_{ns}\tilde{W}_{ss}^{-1}\tilde{W}_{sn}]^{-1} \simeq [K\mu\gamma^{-1}I_{N-R} + N_n\Pi^\perp_{N_s^H}N_n^H]^{-1} = [K\mu\gamma^{-1}I_{N-R} + N_nH_sH_s^HN_n^H]^{-1} = (K\mu\gamma^{-1})^{-1}\left[I_{N-R} - N_nH_s(K\mu\gamma^{-1}I_{K-R} + H_s^HN_n^HN_nH_s)^{-1}H_s^HN_n^H\right] \simeq (K\mu\gamma^{-1})^{-1}\left[I_{N-R} - (K\mu\gamma^{-1})^{-1}N_nH_sH_s^HN_n^H\right] \simeq (K\mu\gamma^{-1})^{-1}I_{N-R} \qquad (73)$$
where $H_s$ is a $K \times (K-R)$ semi-unitary matrix such that $\Pi^\perp_{N_s^H} = I_K - N_s^H(N_sN_s^H)^{-1}N_s = H_sH_s^H$, and
$$\tilde{W}_{ns}\tilde{W}_{ss}^{-2}\tilde{W}_{sn} = \left[N_nN_s^H(N_sN_s^H)^{-1/2}\right]W_{ss}^{-1}\left[N_nN_s^H(N_sN_s^H)^{-1/2}\right]^H = N_n'W_{ss}^{-1}(N_n')^H \qquad (74)$$
Note that $N_n' = N_nN_s^H(N_sN_s^H)^{-1/2}$ is independent of $N_s$ (hence of $W_{ss}$) and follows a $\mathcal{CN}_{N-R,R}(0, I_{N-R}, I_R)$ distribution, while $W_{ss} = N_sN_s^H \overset{d}{=} \mathcal{CW}_R(K, I_R)$. Therefore
$$\rho_{\mathrm{dl}} \simeq \frac{(v_n^Hv_n)^2}{(v_n^Hv_n)\left[v_n^H\left(I_{N-R} + N_n'W_{ss}^{-1}(N_n')^H\right)v_n\right]} = \left[1 + \frac{v_n^HN_n'W_{ss}^{-1}(N_n')^Hv_n}{v_n^Hv_n}\right]^{-1} \qquad (75)$$
From $W_{ss} \overset{d}{=} \mathcal{CW}_R(K, I_R)$ and $N_n' \overset{d}{=} \mathcal{CN}_{N-R,R}(0, I_{N-R}, I_R)$, we have that
$$v_n^HN_n'W_{ss}^{-1}(N_n')^Hv_n \overset{d}{=} \frac{v_n^HN_n'(N_n')^Hv_n}{\mathcal{C}\chi^2_{K-R+1}(0)} \overset{d}{=} (v_n^Hv_n)\,\frac{\mathcal{C}\chi^2_R(0)}{\mathcal{C}\chi^2_{K-R+1}(0)} \qquad (76)$$
It yields
$$\rho_{\mathrm{dl}} \overset{d}{\approx} \left[1 + \frac{\mathcal{C}\chi^2_R(0)}{\mathcal{C}\chi^2_{K-R+1}(0)}\right]^{-1} \overset{d}{=} \mathrm{Beta}(R, K-R+1) \qquad (77)$$
which coincides with the distribution of the SNR loss of the eigencanceler in (59). This is actually not surprising.
Indeed, if the eigenvalue decomposition of $S_t$ writes as in (58) with $\lambda_R \gg \lambda_{R+1} \simeq \lambda_{R+2} \simeq \dots \simeq \lambda_N$, then
$$(S_t + K\mu I_N)^{-1}v = \left(\sum_{n=1}^N(\lambda_n + K\mu)^{-1}u_nu_n^H\right)v \simeq \left(\sum_{n=R+1}^N(\lambda_n + K\mu)^{-1}u_nu_n^H\right)v \simeq \eta\left(\sum_{n=R+1}^Nu_nu_n^H\right)v = \eta\left(I_N - \sum_{r=1}^Ru_ru_r^H\right)v \propto w_{\mathrm{ec}} \qquad (78)$$
and therefore diagonal loading more or less behaves like the eigencanceler whose SNR loss distribution is given by (59). Numerical simulations tend to confirm this fact and show that diagonal loading is very efficient in low sample support with strong low-rank interference, provided that the loading level $\mu$ is chosen properly. This is illustrated in Figure 16 where, if $\mu$ is chosen properly, the filter $w_{\mathrm{dl}}$ has a performance commensurate with that of the eigencanceler. Note also that when $\mu$ is not sufficiently larger than $\gamma$ or not sufficiently smaller than the largest eigenvalues then the approximation does not hold.

Figure 16: Probability density function of the SNR loss with diagonal loading. $N = 16$ and $K = 6$.

As said before, an exact analysis of the SNR loss of $w_{\mathrm{dl}}$ does not exist, the main technical problem being that the Wishart distribution of $S_t$ is lost when adding the scaled identity matrix. There have been attempts at asymptotic (i.e., when $K \to \infty$) analysis, see [56–58]. While these analyses are interesting they hold under a framework (large $K$) which does not correspond to the framework of most interest for diagonal loading, namely low sample support. We believe that a much more adequate and solid framework is that of random matrix theory (RMT), for which $N, K \to \infty$ with $N/K \to c$.
The beauty of RMT lies in 1) a very rigorous theory which applies to a large class of processes and 2) its ability to predict fairly well what happens in finite samples despite its "asymptotic" nature, certainly because $K$ scales with $N$. Among the numerous important works in this area we would like to excerpt papers by Mestre and his co-authors [59, 60] which deal with diagonal loading. In these references a central limit theorem is provided showing that $b_N\sqrt{K}\left(\mathrm{SNR}(w_{\mathrm{dl}}) - \overline{\mathrm{SNR}}(w_{\mathrm{dl}})\right)$, where $\overline{\mathrm{SNR}}(w_{\mathrm{dl}})$ is a deterministic equivalent of the output SNR, converges to a normalized Gaussian distribution $\mathcal{N}(0, 1)$.

5 Concluding remarks

In this report we provided a short overview of many works, spanning over fifty years, aimed at analyzing the distribution of the SNR loss at the output of multichannel adaptive filters trained with a finite number of snapshots. We successively investigated fully adaptive processing, partially adaptive processing and regularized adaptive processing and tried to summarize the main results. We also tried to show how powerful the statistical tools related to matrix-variate distributions are in deriving these results. As said before this presentation is not exhaustive and probably a lot of interesting results are missing. In particular we only alluded to RMT which, in recent years, has proved to provide tremendous tools to analyze the performance of many array processing methods.
A Complex matrix-variate distributions
In this appendix, we briefly give the definitions and some properties of the complex matrix-variate distributions used in this report.
Gaussian distribution
The complex matrix-variate Gaussian distribution is denoted by $\mathcal{CN}_{N,K}(\bar{X}, \Sigma, \Omega)$ with a p.d.f. given by
$$p(X) = \pi^{-NK}|\Sigma|^{-K}|\Omega|^{-N}\,\mathrm{etr}\left\{-\Sigma^{-1}(X - \bar{X})\Omega^{-1}(X - \bar{X})^H\right\} \qquad \text{(Gaussian)}$$
For $K = 1$ we note $\mathcal{CN}_N(\bar{x}, \Sigma)$ the distribution of the vector $x$.

Wishart distribution
When $X \overset{d}{=} \mathcal{CN}_{N,K}(0, \Sigma, I_K)$ with $K \geq N$, then $S = XX^H \overset{d}{=} \mathcal{CW}_N(K, \Sigma)$ follows a complex Wishart distribution with p.d.f.
$$p(S) \propto |\Sigma|^{-K}|S|^{K-N}\,\mathrm{etr}\left\{-\Sigma^{-1}S\right\} \qquad \text{(Wishart)}$$
where $\propto$ means "proportional to". Properties of partitioned Wishart matrices are very important to derive the SNR loss distribution and we briefly recall some of them. Let us partition $S$ as
$$S = \begin{pmatrix}S_{11} & S_{12} \\ S_{21} & S_{22}\end{pmatrix} \qquad (79)$$
where $S_{11}$ is $P \times P$ and $S_{22}$ is $Q \times Q$. Then the inverse of $S$ can be partitioned as
$$S^{-1} = \begin{pmatrix}S_{11.2}^{-1} & -S_{11.2}^{-1}S_{12}S_{22}^{-1} \\ -S_{22}^{-1}S_{21}S_{11.2}^{-1} & S_{22.1}^{-1}\end{pmatrix} \qquad (80)$$
where $S_{11.2} = S_{11} - S_{12}S_{22}^{-1}S_{21}$ and $S_{22.1} = S_{22} - S_{21}S_{11}^{-1}S_{12}$. It has been shown that $S_{11.2}$ and $(S_{22}^{-1}S_{21}, S_{22})$ are independently distributed with
$$S_{11.2} \overset{d}{=} \mathcal{CW}_P(K - Q, \Sigma_{11.2}) \qquad (81)$$
$$S_{22} \overset{d}{=} \mathcal{CW}_Q(K, \Sigma_{22}) \qquad (82)$$
and the conditional distribution of the $Q \times P$ matrix $S_{22}^{-1}S_{21}$ given $S_{22}$ is
$$S_{22}^{-1}S_{21}\,|\,S_{22} \overset{d}{=} \mathcal{CN}_{Q,P}\left(\Sigma_{22}^{-1}\Sigma_{21}, S_{22}^{-1}, \Sigma_{11.2}\right) \qquad (83)$$
The marginal distribution of $T = S_{22}^{-1}S_{21}$ is a matrix-variate Student distribution (given below), i.e., $T \overset{d}{=} \mathcal{CT}_{Q,P}\left(K - Q + 1, \Sigma_{22}^{-1}\Sigma_{21}, \Sigma_{22}^{-1}, \Sigma_{11.2}\right)$.

The complex chi-square distribution with $N$ degrees of freedom and non-centrality parameter $\delta$ will be denoted as $\mathcal{C}\chi^2_N(\delta)$. It is the distribution of $x^Hx$ when $x \overset{d}{=} \mathcal{CN}_N(\bar{x}, I_N)$ and $\delta = \bar{x}^H\bar{x}$.

Student distribution

The complex matrix-variate $t$ distribution is denoted by $\mathcal{CT}_{N,K}(\nu, \bar{X}, \Sigma, \Omega)$ and its p.d.f. is given by
$$p(X) \propto |\Sigma|^{-K}|\Omega|^{-N}\left|I_N + \Sigma^{-1}(X - \bar{X})\Omega^{-1}(X - \bar{X})^H\right|^{-(\nu+N+K-1)} \qquad \text{(matrix-Student)}$$
It can be represented as
$$X \overset{d}{=} \bar{X} + (W_1^{-1/2})^HY_1; \quad W_1 \overset{d}{=} \mathcal{CW}_N(\nu + N - 1, \Sigma^{-1}),\ Y_1 \overset{d}{=} \mathcal{CN}_{N,K}(0, I_N, \Omega)$$
$$X \overset{d}{=} \bar{X} + Y_2W_2^{-1/2}; \quad Y_2 \overset{d}{=} \mathcal{CN}_{N,K}(0, \Sigma, I_K),\ W_2 \overset{d}{=} \mathcal{CW}_K(\nu + K - 1, \Omega^{-1}) \qquad (84)$$
where $Y_i$ is independent of $W_i$. For $K = 1$ we let
$$p(x) \propto |\Sigma|^{-1}\left[1 + (x - \bar{x})^H\Sigma^{-1}(x - \bar{x})\right]^{-(\nu+N)} \qquad \text{(vector-Student)}$$
and one has $x \overset{d}{=} \bar{x} + \mathcal{CN}_N(0, \Sigma)/\sqrt{\mathcal{C}\chi^2_\nu(0)}$.

F distribution

If $S_i \overset{d}{=} \mathcal{CW}_N(K_i, \Sigma)$, $i = 1, 2$, then $F = S_1^{1/2}S_2^{-1}S_1^{1/2}$, where $S_1^{1/2}$ denotes the unique Hermitian square root of $S_1$, follows a complex matrix-variate F distribution with p.d.f.
$$p(F) \propto |F|^{K_1-N}\left|I_N + F\right|^{-(K_1+K_2)} \qquad \text{(F)}$$
and we note $F \overset{d}{=} \mathcal{CF}_N(K_1, K_2)$. If $F$ is partitioned as in (79) then $F_{11.2}$ and $F_{22}$ are independent and
$$F_{11.2} \overset{d}{=} \mathcal{CF}_P(K_1 - Q, K_2); \quad F_{22} \overset{d}{=} \mathcal{CF}_Q(K_1, K_2 - P) \qquad (85)$$
$$F_{22}^{-1}F_{21}\,|\,F_{11.2}, F_{22} \overset{d}{=} \mathcal{CT}_{Q,P}\left(K_1 + K_2 - N + 1, 0, I_Q + F_{22}^{-1}, I_P + F_{11.2}\right) \qquad (86)$$
When $N = 1$, $F \overset{d}{=} \mathcal{C}\chi^2_{K_1}(0)/\mathcal{C}\chi^2_{K_2}(0) \overset{d}{=} \mathcal{CF}_1(K_1, K_2)$.
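All simulations in this report boil down to drawing from these distributions. Under the conventions above they can be sampled with a few lines of numpy (a sketch; the helper names are mine):

```python
import numpy as np
rng = np.random.default_rng(7)

def cgauss(N, K):
    """CN_{N,K}(0, I_N, I_K): i.i.d. unit-variance circular complex entries."""
    return (rng.standard_normal((N, K))
            + 1j * rng.standard_normal((N, K))) / np.sqrt(2)

def cwishart(K, Sigma):
    """CW_N(K, Sigma), generated as X X^H with X = Sigma^{1/2} CN(0, I, I)."""
    X = np.linalg.cholesky(Sigma) @ cgauss(Sigma.shape[0], K)
    return X @ X.conj().T

def cchi2(m, delta=0.0, size=None):
    """Complex chi-square Cchi2_m(delta), i.e., chi2(2m, nc=2*delta)/2."""
    if delta == 0.0:
        return rng.chisquare(2 * m, size) / 2
    return rng.noncentral_chisquare(2 * m, 2 * delta, size) / 2
```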
Beta distribution

Let $F \overset{d}{=} \mathcal{CF}_1(K_1, K_2)$. Then $B = (1 + F)^{-1} \overset{d}{=} \mathrm{Beta}(K_1, K_2)$ is beta distributed and its probability density function is given by
$$p(B) = \frac{1}{B_{K_1,K_2}}\,B^{K_2-1}(1 - B)^{K_1-1} \qquad \text{(Beta)}$$
where $B_{K_1,K_2} = \frac{\Gamma(K_1)\Gamma(K_2)}{\Gamma(K_1 + K_2)}$.

References

[1] H. L. Van Trees. Optimum Array Processing. John Wiley, New York, 2002.
[2] E. J. Kelly and K. M. Forsythe. Adaptive detection and parameter estimation for multidimensional signal models. Technical Report 848, Massachusetts Institute of Technology, Lincoln Laboratory, Lexington, MA, 19 April 1989.

[3] J. Ward. Space-time adaptive processing for airborne radar. Technical Report 1015, Lincoln Laboratory, Massachusetts Institute of Technology, Lexington, MA, December 1994.

[4] A. B. Gershman. Robustness issues in adaptive beamforming and high-resolution direction finding. In Y. Hua, A. Gershman, and Q. Chen, editors, High Resolution and Robust Signal Processing, chapter 2, pages 63–110. Marcel Dekker, 2003.

[5] J. R. Guerci. Space-Time Adaptive Processing for Radar. Artech House, Norwood, MA, 2003.

[6] S. A. Vorobyov. Adaptive and robust beamforming. In A. M. Zoubir, M. Viberg, R. Chellappa, and S. Theodoridis, editors, Academic Press Library in Signal Processing: Volume 3, pages 503–552. Elsevier, 2014.

[7] R. J. Muirhead. Aspects of Multivariate Statistical Theory. John Wiley & Sons, Hoboken, NJ, 1982.

[8] A. K. Gupta and D. K. Nagar. Matrix Variate Distributions. Chapman & Hall/CRC, Boca Raton, FL, 2000.

[9] N. R. Goodman. Statistical analysis based on a certain multivariate complex Gaussian distribution (an introduction). The Annals of Mathematical Statistics, 34(1):152–177, March 1963.

[10] C. G. Khatri. Classical statistical analysis based on a certain multivariate complex Gaussian distribution. The Annals of Mathematical Statistics, 36(1):98–114, February 1965.

[11] I. S. Reed, J. D. Mallett, and L. E. Brennan. Rapid convergence rate in adaptive arrays. IEEE Transactions on Aerospace and Electronic Systems, 10(6):853–863, November 1974.

[12] R. Choudary Hanumara. An alternate derivation of the distribution of the conditioned signal-to-noise ratio. IEEE Transactions on Antennas and Propagation, 34(3):463–464, March 1986.

[13] C. G. Khatri and C. R. Rao. Effects of estimated noise covariance matrix in optimal signal detection. IEEE Transactions on Acoustics, Speech, and Signal Processing, 35(5):671–679, May 1987.

[14] S. Kraut and L. L. Scharf. The CFAR adaptive subspace detector is a scale-invariant GLRT. IEEE Transactions on Signal Processing, 47(9):2538–2541, September 1999.

[15] S. Kraut, L. L. Scharf, and R. W. Butler. The adaptive coherence estimator: A uniformly most powerful invariant adaptive detection statistic. IEEE Transactions on Signal Processing, 53(2):427–438, February 2005.

[16] D. M. Boroson. Sample size considerations for adaptive arrays. IEEE Transactions on Aerospace and Electronic Systems, 16(4):446–451, July 1980.

[17] K. Gerlach. The effects of signal contamination on two adaptive detectors. IEEE Transactions on Aerospace and Electronic Systems, 31(1):297–309, January 1995.

[18] K. Gerlach. Outlier resistant adaptive matched filtering. IEEE Transactions on Aerospace and Electronic Systems, 38(3):885–901, July 2002.

[19] C. D. Richmond. Statistics of adaptive nulling and use of the generalized eigenrelation (GER) for modeling inhomogeneities in adaptive processing. IEEE Transactions on Signal Processing, 48(5):1263–1273, May 2000.

[20] O. Besson. Detection in the presence of surprise or undernulled interference. IEEE Signal Processing Letters, 14(5):352–354, May 2007.

[21] O. Besson and D. Orlando. Adaptive detection in non-homogeneous environments using the generalized eigenrelation. IEEE Signal Processing Letters, 14(10):731–734, October 2007.

[22] C. D. Richmond. Performance of a class of adaptive detection algorithms in nonhomogeneous environments. IEEE Transactions on Signal Processing, 48(5):1248–1262, May 2000.

[23] R. S. Blum and K. F. McDonald. Analysis of STAP algorithms for cases with mismatched steering and clutter statistics. IEEE Transactions on Signal Processing, 48(2):301–310, February 2000.

[24] K. F. McDonald and R. S. Blum. Exact performance of STAP algorithms with mismatched steering and clutter statistics. IEEE Transactions on Signal Processing, 48(10):2750–2763, October 2000.

[25] R. S. Raghavan. False alarm analysis of the AMF algorithm for mismatched training. IEEE Transactions on Signal Processing, 67(1):83–96, January 2019.

[26] O. Besson. Analysis of the SNR loss distribution with covariance mismatched training samples. IEEE Transactions on Signal Processing, 68:5759–5768, 2020.

[27] O. Besson. Impact of covariance mismatched training samples on constant false alarm rate detectors. IEEE Transactions on Signal Processing, 69:755–765, 2021.

[28] J. Liu, W. Liu, H. Liu, B. Chen, X.-G. Xia, and F. Dai. Average SINR calculation of a persymmetric sample matrix inversion beamformer. IEEE Transactions on Signal Processing, 64(8):2135–2145, April 2016.

[29] E. S. Pearson. Note on an approximation to the distribution of non-central χ². Biometrika, 46(3/4):364, December 1959.

[30] J. T. Imhof. Computing the distribution of quadratic forms in normal variables. Biometrika, 48(3/4):419–426, December 1961.

[31] H. Solomon and M. A. Stephens. Distribution of a sum of weighted chi-square variables. Journal of the American Statistical Association, 72(360):881–885, December 1977.

[32] A. M. Mathai and S. B. Provost. Quadratic Forms in Random Variables: Theory and Applications. Marcel Dekker, Inc., New York, NY, 1992.

[33] O. Besson. On the distributions of some statistics related to adaptive filters trained with t-distributed samples. arXiv:2101.1069.v2 [math.ST], 2021.

[34] C. D. Richmond. Derived PDF of maximum likelihood signal estimator which employs an estimated noise covariance. IEEE Transactions on Signal Processing, 44(2):305–315, February 1996.

[35] I. P. Kirsteins and D. W. Tufts. Rapidly adaptive nulling of interference. In Michel Bouvet and Georges Bienvenu, editors, High-Resolution Methods in Underwater Acoustics, pages 217–249. Springer Berlin Heidelberg, Berlin, Heidelberg, 1991.

[36] A. M. Haimovich and Y. Bar-Ness. An eigenanalysis interference canceler. IEEE Transactions on Acoustics, Speech, and Signal Processing, 39(1):76–84, January 1991.

[37] A. M. Haimovich. The eigencanceler: Adaptive radar by eigenanalysis methods. IEEE Transactions on Aerospace and Electronic Systems, 32(2):532–542, April 1996.

[38] J. S. Goldstein and I. S. Reed. Reduced-rank adaptive filtering. IEEE Transactions on Signal Processing, 45(2):492–496, February 1997.

[39] J. S. Goldstein and I. S. Reed. Subspace selection for partially adaptive sensor array processing. IEEE Transactions on Aerospace and Electronic Systems, 33(2):539–544, April 1997.

[40] J. S. Goldstein and I. S. Reed. Theory of partially adaptive radar. IEEE Transactions on Aerospace and Electronic Systems, 33(4):1309–1325, October 1997.

[41] J. S. Goldstein, I. S. Reed, and L. L. Scharf. A multistage representation of the Wiener filter based on orthogonal projections.
IEEE Transactions Information Theory , 44(7):2943–2959,November 1998.[42] M. L. Honig and W. Xao. Performance of reduced-rank linear interference suppression.
IEEE Transactions Information Theory , 47(5):1928–1946, July 2001.[43] M. L. Honig and J. S. Goldstein. Adaptive reduced-rank interference suppression basedon the multistage Wiener filter.
IEEE Transactions Communications , 50(6):986–994, June2002.[44] O. Besson and F. Vincent. Properties of the partial Cholesky factorization and applicationto reduced-rank adaptive beamforming.
Signal Processing , 167:107300, February 2020.[45] G. Golub and C. Van Loan.
Matrix Computations . John Hopkins University Press, Balti-more, 3rd edition, 1996.[46] T. L. Marzetta, G. H. Tucci, and S. H. Simon. A random matrix theoretic approach to han-dling singular covariance estimates.
IEEE Transactions Information Theory , 57(9):6256–6271, September 2011.[47] B. D. Carlson. Covariance matrix estimation errors and diagonal loading in adaptive arrays.
IEEE Transactions Aerospace Electronic Systems , 24(4):397–401, July 1988.[48] H. Cox, R. M. Zeskind, and M. M. Owen. Robust adaptive beamforming.
IEEE Transac-tions Acoustics Speech Signal Processing , 35(10):1365–1376, October 1987.[49] J. Li, P. Stoica, and Z. Wang. On robust Capon beamforming and diagonal loading.
IEEETransactions Signal Processing , 51(7):1702–1715, July 2003.[50] J. Li, P. Stoica, and Z. Wang. Doubly constrained robust Capon beamforming. In
Pro-ceedings 37th Asilomar Conference on Signals, Systems and Computers , pages 1335–1339,Pacific Grove, CA, November 9-12 2003.[51] S. A. Vorobyov, A. B. Gershman, and Z. Luo. Robust adaptive beamforming using worst-case performance optimization: A solution to the signal mismatch problem.
IEEE Trans-actions Signal Processing , 51(2):313–324, February 2003.[52] S. Shahbazpanahi, A. B. Gershman, Z.-Q. Luo, and K. M. Wong. Robust adaptive beam-forming for general-rank signal models.
IEEE Transactions Signal Processing , 51(9):2257–2269, September 2003.[53] Y. I. Abramovich. Controlled method for adaptive optimization of filters using the criterionof maximum SNR.
Radio Engineering and Electronic Physics , 26:87–95, March 1981.3054] O. P. Cheremisin. Efficiency of adaptive algorithms with regularised sample covariancematrix.
Radiotechnika i Electronika , 27(10):1933–1941, October 1982.[55] Y. Abramovich. Convergence analysis of linearly constrained SMI and LSMI adaptivealgorithms. In
Proceedings IEEE Adaptive Systems for Signal Processing, Communications,and Control Symposium (AS-SPCC) , pages 255–59, October 1-4 2000.[56] M. W. Ganz, R. L. Moses, and S. L. Wilson. Convergence of the SMI and the diagonallyloaded SMI algorithms with weak interference.
IEEE Transactions Antennas Propagation ,38(3):394–399, March 1990.[57] R. L. Dilsavor and R. L. Moses. Analysis of modified SMI method for adaptive array weightcontrol.
IEEE Transactions Signal Processing , 41(2):721–726, February 1993.[58] L. B. Fertig. Statistical performance of the MVDR beamformer in the presence of diagonalloading. In
Proceedings 1st IEEE SAM Workshop , pages 77–81, Cambridge, MA, March16-17 2000.[59] X. Mestre and M. A. Lagunas. Finite sample size effect on minimum variance beamformers:Optimum diagonal loading factor for large arrays.
IEEE Transactions Signal Processing ,54(1):69–82, January 2006.[60] F. Rubio, X. Mestre, and W. Hachem. A CLT on the SNR of the diagonally loaded MVDRfilters.