On the distributions of some statistics related to adaptive filters trained with t-distributed samples
Olivier Besson∗

March 2021

∗The author is with Université de Toulouse, ISAE-SUPAERO, 10 avenue Edouard Belin, 31055 Toulouse, France. Email: [email protected]
Abstract
In this paper we analyse the behaviour of adaptive filters or detectors when they are trained with t-distributed samples rather than Gaussian distributed samples. More precisely, we investigate the impact on the distribution of some relevant statistics, including the signal to noise ratio loss and the Gaussian generalized likelihood ratio test. Some properties of partitioned complex F distributed matrices are derived, which enable one to obtain statistical representations in terms of independent chi-square distributed random variables. These representations are compared with their Gaussian counterparts, and numerical simulations illustrate and quantify the induced degradation.

Keywords— Adaptive multichannel processing, complex matrix-variate F distribution, SNR loss
Estimating the amplitude α of a known signal v, or detecting its presence, from a noise corrupted version x = αv + n is a recurrent problem in numerous applications, including radar, where v stands for the space and/or time signature of a potential target and n gathers disturbance sources, mostly clutter and thermal noise [1–3]. When the noise n follows a complex Gaussian distribution with zero mean and covariance matrix Σ, the maximum likelihood estimate (MLE) of α writes α̂_ML = w_opt^H x with w_opt = (v^H Σ^{-1} v)^{-1} Σ^{-1} v. This optimal filter w_opt is also obtained as the solution to the following minimization problem

    min_w  w^H Σ w   subject to   w^H v = 1        (1)

In other words, this filter minimizes the output power under the constraint that the signal of interest goes unscathed through the filter. Note that this interpretation holds irrespective of the distribution of n. Since Σ is usually unknown, a set of training samples is used which, in the best case, shares the same distribution as n. In the Gaussian framework, Σ is replaced by the sample covariance matrix (SCM) S = XX^H, where X is the training samples data matrix, on the rationale that S is, up to a scaling factor, the MLE of Σ. Proceeding this way results in w_amf = (v^H S^{-1} v)^{-1} S^{-1} v, which is usually referred to as the adaptive matched filter [4]. For any filter w, a classical figure of merit is the signal to noise ratio (SNR) loss, defined as

    ρ(w) = SNR(w)/SNR(w_opt) = |w^H v|² / [(v^H Σ^{-1} v)(w^H Σ w)]        (2)

which corresponds to the ratio of the SNR obtained with w to that obtained with w_opt. In the sequel, we concentrate on the SNR loss of w_amf, which we will denote as ρ and which is given by

    ρ = ρ(w_amf) = (v^H S^{-1} v)² / [(v^H Σ^{-1} v)(v^H S^{-1} Σ S^{-1} v)]        (3)
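As a concrete illustration of (2)–(3), the following minimal Monte Carlo sketch (not from the paper; the values of N, K, the toy covariance Σ and the signature v are illustrative assumptions) builds w_amf from Gaussian training samples and evaluates its SNR loss; the sample mean can be checked against the Gaussian value (K − N + 2)/(K + 1) recalled below.

```python
# Minimal sketch: SNR loss of the adaptive matched filter with Gaussian
# training samples. N, K, Sigma and v are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
N, K, n_mc = 16, 32, 2000
v = np.ones(N, dtype=complex)                       # assumed signature
Sigma = 0.9 ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))  # toy covariance

def crandn(*shape):
    """Unit-variance circular complex Gaussian samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

L = np.linalg.cholesky(Sigma)
rho = np.empty(n_mc)
for m in range(n_mc):
    X = L @ crandn(N, K)                            # training matrix, Gaussian case
    S = X @ X.conj().T                              # sample covariance matrix
    w = np.linalg.solve(S, v)                       # w_amf up to a scale factor; (2) is scale-invariant
    num = np.abs(w.conj() @ v) ** 2
    den = (v.conj() @ np.linalg.solve(Sigma, v)).real * (w.conj() @ Sigma @ w).real
    rho[m] = num / den
print(rho.mean(), (K - N + 2) / (K + 1))            # sample mean vs (K-N+2)/(K+1)
```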
Assuming a Gaussian distribution for X, it has been shown that ρ is beta distributed, with parameters that depend only on the size of the observations N and the number of training samples K [5,6]. A similar beta distribution with different parameters is obtained when persymmetry is exploited [7]. Unfortunately, for some applications it may not be possible to dispose of Gaussian distributed training samples, as the data may exhibit heavier distribution tails. This is often the case in radar applications, where the main source of noise is the clutter, which is generally non-Gaussian [8–10]. Therefore, it becomes of interest to study what happens when the training samples are no longer Gaussian distributed. This is the aim of this paper, where we assume that X follows a complex matrix-variate t (Student) distribution, and we study the impact on the distribution of some random variables commonly used in adaptive filtering and detection, including the SNR loss. As we shall see later, the complex matrix-variate Student distribution also appears naturally when the training samples exhibit a particular case of covariance mismatch. In this paper we derive stochastic representations of relevant statistics in terms of independent random variables following a complex chi-square distribution. These representations rely on some properties of partitioned complex F distributed matrices, and they allow quick insights into the impact of mismatched training samples.

We note that in the literature the impact of mismatch on adaptive filters or adaptive detectors has been extensively studied, with two main types of mismatch considered. The first concerns a mismatch on the signal of interest signature v, see e.g., [11–16]. Alternatively, researchers have studied the case where the covariance matrix of X differs from that of the data to be filtered or the data under test [17–20]. A possible combination of the two mismatches is addressed in [21,22]. The situation considered herein is different, as the mismatch concerns the distribution of the training samples.

Before proceeding, we state the notations used in this paper concerning matrix-variate distributions (MVD). References [23,24] provide a very comprehensive overview of real-valued MVD; for their extension to complex-valued MVD we refer to e.g. [25–28], where most of the distributions considered below are studied. In the sequel we note ˜N_{p,n}(X̄, Σ, Ω) the complex matrix-variate Gaussian distribution whose probability density function (p.d.f.) is p(X) = π^{-pn}|Σ|^{-n}|Ω|^{-p} etr{−Σ^{-1}(X − X̄)Ω^{-1}(X − X̄)^H}. When X̄ = E{X} = 0, the matrix S = XX^H ∼ ˜W_p(n, Σ) follows a complex Wishart distribution with p.d.f. p(S) ∝ |Σ|^{-n}|S|^{n−p} etr{−Σ^{-1}S}, where ∝ means "proportional to". The complex matrix-variate t distribution is denoted by ˜T_{p,n}(ν, X̄, Σ, Ω), and its p.d.f. is given by p(X) ∝ |Σ|^{-n}|Ω|^{-p}|I_p + Σ^{-1}(X − X̄)Ω^{-1}(X − X̄)^H|^{−(ν+p+n−1)}. It is the distribution of X = X̄ + (W^{-1/2})^H Y, where W ∼ ˜W_p(ν + p − 1, Σ^{-1}) is independent of Y ∼ ˜N_{p,n}(0, I_p, Ω); here W^{1/2} denotes any square-root of W and, unless otherwise stated, we use its unique Hermitian square-root.
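As a side illustration, the Wishart–normal representation just quoted yields a direct sampler for ˜T_{p,n}(ν, X̄, Σ, Ω). The sketch below (not from the paper) uses the Hermitian square-root; all parameter values and helper names are illustrative assumptions.

```python
# Sampler sketch for the complex matrix-variate t distribution via the
# Wishart-normal mixture X = Xbar + (W^{-1/2})^H Y. Illustrative parameters.
import numpy as np

rng = np.random.default_rng(1)

def crandn(*shape):
    """Unit-variance circular complex Gaussian entries."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def complex_wishart(n, Sigma):
    """W ~ W_p(n, Sigma), generated as A A^H with n columns drawn from CN(0, Sigma)."""
    A = np.linalg.cholesky(Sigma) @ crandn(Sigma.shape[0], n)
    return A @ A.conj().T

def matrix_t(nu_t, Xbar, Sigma, Omega):
    """X ~ T_{p,n}(nu_t, Xbar, Sigma, Omega)."""
    p, n = Xbar.shape
    W = complex_wishart(nu_t + p - 1, np.linalg.inv(Sigma))  # W ~ W_p(nu_t+p-1, Sigma^{-1})
    Y = crandn(p, n) @ np.linalg.cholesky(Omega).conj().T    # Y ~ N_{p,n}(0, I_p, Omega)
    vals, vecs = np.linalg.eigh(W)                           # Hermitian square-root of W^{-1}
    W_m12 = (vecs * (1.0 / np.sqrt(vals))) @ vecs.conj().T
    return Xbar + W_m12 @ Y

# Example with the training model used later in the paper: nu_t = nu - N + 1,
# Sigma = mu * I_N, mu = nu - N (values below are illustrative).
N, K, nu, mu = 16, 32, 32, 16
X = matrix_t(nu - N + 1, np.zeros((N, K), dtype=complex), mu * np.eye(N), np.eye(K))
print(np.trace(X @ X.conj().T).real / K)   # per-sample power, on average mu*N/(nu-N) = 16 here
```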
If S_i ∼ ˜W_p(n_i, Σ), i = 1, 2, then F = S_1^{1/2} S_2^{-1} S_1^{1/2} follows a complex matrix-variate F distribution with p.d.f. p(F) ∝ |F|^{n_1−p}|I_p + F|^{−(n_1+n_2)}, and we note F ∼ ˜F_p(n_1, n_2). The complex chi-square distribution with q degrees of freedom and non-centrality parameter δ will be denoted as ˜χ²_q(δ).

In the sequel we assume that K training samples are available and distributed according to X ∼ ˜T_{N,K}(ν − N + 1, 0, µΣ, I_K), so that their p.d.f. is given by

    p(X) ∝ |µΣ|^{-K} |I_N + (µΣ)^{-1} XX^H|^{−(ν+K)}        (4)

As explained above, there are situations where the training samples are not Gaussian distributed, e.g., in radar applications where the dominant part of the noise, namely the clutter, is often non-Gaussian and well modelled by the class of compound-Gaussian distributions, of which the Student distribution is a member. Other applications also have to deal with non-Gaussian data, and therefore it is of interest to investigate what happens when a filter is trained with samples that no longer follow a Gaussian distribution but rather a Student distribution. Note that the SNR loss, as given in (2)-(3), does not depend on the distribution of the data to be filtered; it only requires that their covariance matrix is Σ.

A second motivation for the use of the Student distribution is the following. Assume that the training samples are Gaussian distributed but have a covariance matrix Σ_t different from Σ, say with no loss of generality Σ_t = Σ^{1/2} W^{-1} (Σ^{1/2})^H for some positive definite matrix W. This is the case, for instance, in non-homogeneous environments in radar applications. We can thus assume that X | W ∼ ˜N_{N,K}(0, Σ^{1/2} W^{-1} (Σ^{1/2})^H, I_K). In [19] we analysed the distribution of the SNR loss for fixed and arbitrary W: we showed that it can be written as a quadratic form in normal or Student random variables, and we proposed approximations of it. Now the matrix W may be considered as a random matrix and, if we assume a conjugate prior W ∼ ˜W_N(ν, µ^{-1} I_N), then the marginal distribution of X is given by (4). In other words, the statistical model used herein results from a Bayesian model of covariance mismatch where the samples used to train the filter do not share the same covariance matrix as the samples to be filtered. Therefore the model used in this paper covers the two cases described above. Note that the smaller ν, the more heavy-tailed the Student distribution.

The sample covariance matrix S = XX^H is still, up to a scaling factor, the MLE of E{XX^H} = K(ν − N)^{-1}µΣ, and thus can still be used to design the adaptive filter w_amf = (v^H S^{-1} v)^{-1} S^{-1} v, whose SNR loss we are interested in. First, let us note that X =_d (µΣ)^{1/2} W_ν^{-1/2} N, where W_ν ∼ ˜W_N(ν, I_N) is independent of N ∼ ˜N_{N,K}(0, I_N, I_K) [24], so that

    S =_d µ Σ^{1/2} W_ν^{-1/2} W_K W_ν^{-1/2} Σ^{1/2} = µ Σ^{1/2} F^{-1} Σ^{1/2}        (5)

where W_K ∼ ˜W_N(K, I_N). It follows that F = W_ν^{1/2} W_K^{-1} W_ν^{1/2} ∼ ˜F_N(ν, K) [26,29]. Therefore the SNR loss can be represented as

    ρ = SNR(w_amf)/SNR(w_opt) = (v^H S^{-1} v)² / [(v^H Σ^{-1} v)(v^H S^{-1} Σ S^{-1} v)]
      =_d (v^H Σ^{-1/2} F Σ^{-1/2} v)² / [(v^H Σ^{-1} v)(v^H Σ^{-1/2} F² Σ^{-1/2} v)]
      =_d (v^H Σ^{-1/2} Q F Q^H Σ^{-1/2} v)² / [(v^H Σ^{-1} v)(v^H Σ^{-1/2} Q F² Q^H Σ^{-1/2} v)]        (6)

for any unitary matrix Q, since F and Q^H F Q have the same distribution. Let us choose Q such that Q^H Σ^{-1/2} v = (v^H Σ^{-1} v)^{1/2} e_N, where e_N = [0 ... 0 1]^T. Partitioning F as

    F = ( F_{11}  F_{12} ; F_{21}  F_{22} )        (7)

where F_{11} is (N − 1) × (N − 1) and F_{22} is a scalar, it follows that

    ρ =_d (e_N^H F e_N)² / (e_N^H F² e_N) = F_{22}² / (F_{22}² + F_{12}^H F_{12}) = 1/(1 + t^H t)        (8)

with t = F_{12} F_{22}^{-1}.
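The rotated representation (6)–(8) can be cross-checked by simulation: sample F = W_ν^{1/2} W_K^{-1} W_ν^{1/2} with identity-scale Wisharts and form ρ = F_{NN}²/[F²]_{NN}. A sketch with illustrative sizes (the helper names are hypothetical):

```python
# Monte Carlo cross-check sketch of (6)-(8): rho =_d F_NN^2 / [F^2]_NN with
# F = W_nu^{1/2} W_K^{-1} W_nu^{1/2} ~ F_N(nu, K). Sizes are illustrative.
import numpy as np

rng = np.random.default_rng(2)
N, K, nu, n_mc = 8, 16, 24, 4000

def crandn(*shape):
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def sqrtm_h(W):
    """Hermitian square root of a Hermitian positive semidefinite matrix."""
    vals, vecs = np.linalg.eigh(W)
    return (vecs * np.sqrt(vals.clip(min=0))) @ vecs.conj().T

rho = np.empty(n_mc)
for m in range(n_mc):
    Gn = crandn(N, nu); Wn = Gn @ Gn.conj().T      # W_nu ~ W_N(nu, I)
    Gk = crandn(N, K);  Wk = Gk @ Gk.conj().T      # W_K  ~ W_N(K, I)
    R = sqrtm_h(Wn)
    F = R @ np.linalg.solve(Wk, R)                 # F = W_nu^{1/2} W_K^{-1} W_nu^{1/2}
    F2 = F @ F
    rho[m] = (F[-1, -1].real ** 2) / F2[-1, -1].real
print(rho.mean())   # compare with the closed-form mean (13) derived below, for these N, K, nu
```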
As shown in Appendix A, one has

    t =_d (1 + F_{22}^{-1})^{1/2} n/√γ        (9)

where F_{22}, n and γ are independent, with n ∼ ˜N_{N−1}(0, I_{N−1}), γ ∼ ˜χ²_{K−N+2}(0) and

    F_{22} =_d ˜χ²_ν(0) / ˜χ²_{K−N+1}(0)        (10)

It ensues that the SNR loss admits the following representation

    ρ_Student =_d [ 1 + (1 + ˜χ²_{K−N+1}(0)/˜χ²_ν(0)) ˜χ²_{N−1}(0)/˜χ²_{K−N+2}(0) ]^{-1}        (11)

which provides a simple and convenient expression as a function of independent chi-square distributed random variables. This should be compared to its counterpart when X is Gaussian distributed, namely

    ρ_Gaussian =_d [ 1 + ˜χ²_{N−1}(0)/˜χ²_{K−N+2}(0) ]^{-1}        (12)

Clearly the SNR loss is likely to take lower values in the Student case than in the Gaussian case, and we recover that the two representations are equivalent as ν → ∞.
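Representations (11)–(12) are immediate to sample once one notes that a complex chi-square with q degrees of freedom is a Gamma(q, 1) variable, i.e. half a real chi-square with 2q degrees of freedom. A sketch with illustrative N, K, ν:

```python
# Direct sampler sketch for the representations (11)-(12). Complex chi-square
# with q dof == Gamma(q, 1). N, K, nu below are illustrative (nu = 2N).
import numpy as np

rng = np.random.default_rng(3)
N, K, nu, n_mc = 16, 32, 32, 200_000

c1 = rng.gamma(K - N + 1, size=n_mc)    # chi2_{K-N+1}(0)
c2 = rng.gamma(nu,        size=n_mc)    # chi2_{nu}(0)
c3 = rng.gamma(N - 1,     size=n_mc)    # chi2_{N-1}(0)
c4 = rng.gamma(K - N + 2, size=n_mc)    # chi2_{K-N+2}(0)

rho_student  = 1.0 / (1.0 + (1.0 + c1 / c2) * c3 / c4)   # eq. (11)
rho_gaussian = 1.0 / (1.0 + c3 / c4)                      # eq. (12)
# The two empirical probabilities below can be compared with Figure 1.
print(np.mean(rho_student <= 0.5), np.mean(rho_gaussian <= 0.5))
```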
Moreover, the average value of the term ˜χ²_{K−N+1}(0)/˜χ²_ν(0) is (K − N + 1)/(ν − 1), and hence the difference is expected to increase as K increases. The representation in (11) also allows one to derive (see Appendix B) the p.d.f. of the SNR loss, which is given in equation (44), as well as its mean value, which writes

    E{ρ_Student} = [ν(K − N + 2) / ((ν + K − N + 1)(K + 1))] ₃F₂(1, K − N + 3, K − N + 1; K + 2, ν + K − N + 2; 1)        (13)

to be compared with E{ρ_Gaussian} = (K − N + 2)/(K + 1).
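Equation (13) is straightforward to evaluate numerically; a sketch using mpmath's generalized hypergeometric routine (assuming mpmath is available; the scanned ν values are illustrative):

```python
# Evaluation sketch of the mean SNR loss (13) versus the Gaussian value.
from mpmath import hyper

def mean_rho_student(N, K, nu):
    pre = nu * (K - N + 2) / ((nu + K - N + 1) * (K + 1))
    return pre * hyper([1, K - N + 3, K - N + 1], [K + 2, nu + K - N + 2], 1)

N, K = 16, 32
for nu in (N + 2, 2 * N, 10 * N):
    print(nu, mean_rho_student(N, K, nu), (K - N + 2) / (K + 1))
```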
We now provide a numerical evaluation of the difference between the distribution of the SNR loss obtained with Gaussian training samples and that obtained with Student training samples. Through preliminary simulations, we checked that the distribution of the SNR loss obtained from the representation in (11) coincides with the distribution obtained when one generates snapshots from (4), computes w_amf and evaluates its SNR loss in (3). We consider a scenario with N = 16, and µ is chosen equal to ν − N. We first look at the influence of ν in Figure 1, where we display the p.d.f. and the cumulative distribution function (c.d.f.) of ρ for K = 2N. As can be seen, the impact is rather significant: while P(ρ_Gaussian ≤ 0.5) ≈ 0.3, P(ρ_Student ≤ 0.5) = 0.4, 0.748 and 0.896 for ν = 10N, ν = 2N and ν = N + 2 respectively. This impact depends however on K, as illustrated in Figure 2. As could be expected from (11), the difference between the Student and the Gaussian cases increases with K. For instance, for K = 4N the probability of having an SNR loss lower than 0.5 goes from P(ρ_Gaussian ≤ 0.5) = 3.65 × 10⁻⁶ to P(ρ_Student ≤ 0.5) = 0.19, while for K = 2N one goes from P(ρ_Gaussian ≤ 0.5) ≈ 0.3 to P(ρ_Student ≤ 0.5) = 0.748. The larger K, the larger the difference between E{ρ_Gaussian} and E{ρ_Student}, see Figure 3. Another impact concerns the rate of convergence of the adaptive filter, which is slower with Student training samples, as can be observed in Figure 4 where we plot the value of K required to have E{ρ_Student} = 0.5: clearly the required number of samples decreases when ν increases, going from K ≃ 30 in the Gaussian case to K ≃ 96 when ν = N + 2.
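A curve of the kind displayed in Figure 4 can be reproduced by scanning K until the mean value (13) reaches 0.5; a brute-force sketch (the scanned ν values are illustrative, and mpmath is assumed available):

```python
# Sketch: smallest K such that E{rho_Student} >= 0.5, for a few nu values.
from mpmath import hyper

def mean_rho_student(N, K, nu):
    pre = nu * (K - N + 2) / ((nu + K - N + 1) * (K + 1))
    return pre * hyper([1, K - N + 3, K - N + 1], [K + 2, nu + K - N + 2], 1)

N = 16
for nu in (N + 2, 2 * N, 10 * N):
    K = N
    while mean_rho_student(N, K, nu) < 0.5:   # E{rho} increases with K
        K += 1
    print(nu, K)
```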
We now study the impact of Student distributed training samples on a related problem, namely adaptive detection. A very common problem in multichannel processing [30] is to test H_0 versus H_1 where

    H_0 : x ∼ ˜N_N(0, Σ);   X ∼ ˜N_{N,K}(0, Σ, I_K)
    H_1 : x ∼ ˜N_N(αv, Σ);  X ∼ ˜N_{N,K}(0, Σ, I_K)        (14)

The maximal invariant statistic for the detection problem in (14) is bi-dimensional [31] and is a one-to-one function of β = 1/(1 + s_1 − s_2) and ˜t = s_2/(1 + s_1 − s_2), where

    s_1 = x^H S^{-1} x;   s_2 = |x^H S^{-1} v|² / (v^H S^{-1} v)        (15)

β corresponds to the loss factor, whose distribution is actually that of ρ_Gaussian when X is Gaussian distributed, and ˜t corresponds to Kelly's generalized likelihood ratio test (GLRT) statistic [30]. Any detector which is a function of (β, ˜t) has the constant false alarm rate property, and actually most of the adaptive detectors derived so far can be expressed as a function of (β, ˜t) [32]. Therefore, it is of interest to see how the performance of these detectors is affected when the training samples are no longer Gaussian but Student distributed. Note that the impact of a fixed covariance mismatch between x and X, with both being Gaussian distributed, has been studied in [17,18,20–22]. In the sequel we consider a distribution mismatch: we assume that x ∼ ˜N_N(αv, Σ) (where α is possibly equal to zero) and that X ∼ ˜T_{N,K}(ν − N + 1, 0, µΣ, I_K) as before. This is different from assuming that [x X] ∼ ˜T_{N,K+1}(ν − N + 1, [αv 0], µΣ, I_{K+1}); in the latter case it has been shown [33,34] that the GLRT is still Kelly's detector [30] and that its distribution under the null hypothesis is the same as in the Gaussian case. The assumption here is different, since we have a distribution mismatch between x and X, which can be direct or the consequence of a particular covariance mismatch. The aim of the present section is to derive statistical representations of (β, ˜t) under this framework, in order to figure out how they deviate from the Gaussian case.

Let us start with

    s_1 = x^H S^{-1} x =_d µ^{-1} x^H Σ^{-1/2} F Σ^{-1/2} x =_d µ^{-1} x̃^H F x̃        (16)

where x̃ = Q^H Σ^{-1/2} x ∼ ˜N_N(α(v^H Σ^{-1} v)^{1/2} e_N, I_N). Similarly,

    s_2 = |x^H S^{-1} v|² / (v^H S^{-1} v) =_d µ^{-1} |x̃^H F e_N|² / (e_N^H F e_N)        (17)

Partitioning x̃ = [x̃_1 ; x̃_2] and F as in (7), it is readily shown that

    s_1 = s_2 + µ^{-1} x̃_1^H F_{1.2} x̃_1;   s_2 = µ^{-1} F_{22} |x̃_2 + x̃_1^H t|²        (18)

where F_{1.2} = F_{11} − F_{12} F_{22}^{-1} F_{21}. Consequently,

    β = (1 + µ^{-1} x̃_1^H F_{1.2} x̃_1)^{-1}        (19)

From Appendix A, we have that F_{1.2} =_d ˜F_{N−1}(ν − 1, K) =_d W_1^{1/2} W_2^{-1} W_1^{1/2}, where W_1 ∼ ˜W_{N−1}(ν − 1, I_{N−1}) and W_2 ∼ ˜W_{N−1}(K, I_{N−1}). Using well-known results on quadratic forms in Wishart distributed matrices, it comes

    x̃_1^H F_{1.2} x̃_1 =_d (x̃_1^H x̃_1) ˜χ²_{ν−1}(0)/˜χ²_{K−N+2}(0)        (20)

and finally

    β =_d [ 1 + (˜χ²_{ν−1}(0)/µ) ˜χ²_{N−1}(0)/˜χ²_{K−N+2}(0) ]^{-1}        (21)

where we used the fact that x̃_1 ∼ ˜N_{N−1}(0, I_{N−1}) and hence x̃_1^H x̃_1 ∼ ˜χ²_{N−1}(0). The previous equation should be compared to its Gaussian counterpart, namely

    β_Gaussian =_d [ 1 + ˜χ²_{N−1}(0)/˜χ²_{K−N+2}(0) ]^{-1}        (22)

The difference lies in the factor µ^{-1} ˜χ²_{ν−1}(0). The latter is gamma distributed with mean µ^{-1}(ν − 1) and variance µ^{-2}(ν − 1). If µ is chosen such that K^{-1} E{XX^H} = (ν − N)^{-1} µΣ = Σ, then the mean and variance become (ν − 1)/(ν − N) and (ν − 1)/(ν − N)².
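Representations (21)–(22) can be sampled exactly as (11)–(12); a sketch with µ = ν − N, as in the simulations, and illustrative N, K, ν:

```python
# Sampler sketch for the loss factor representations (21)-(22).
# Complex chi-squares are Gamma(q, 1); mu = nu - N as in the simulations.
import numpy as np

rng = np.random.default_rng(4)
N, K, nu, n_mc = 16, 32, 32, 200_000
mu = nu - N

g_nu1 = rng.gamma(nu - 1,    size=n_mc)   # chi2_{nu-1}(0)
g_n1  = rng.gamma(N - 1,     size=n_mc)   # chi2_{N-1}(0)
g_k2  = rng.gamma(K - N + 2, size=n_mc)   # chi2_{K-N+2}(0)

beta_student  = 1.0 / (1.0 + (g_nu1 / mu) * g_n1 / g_k2)   # eq. (21)
beta_gaussian = 1.0 / (1.0 + g_n1 / g_k2)                  # eq. (22)
print(np.median(beta_student), np.median(beta_gaussian))
```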
Therefore, as ν → ∞, the distribution of the factor µ^{-1} ˜χ²_{ν−1}(0) becomes more and more concentrated around 1 and the two representations are equivalent. However, for small ν there is a difference, which will be quantified below. Another observation is that in the Gaussian case β and ρ have the same distribution, which is no longer true with Student distributed samples.

Let us now turn to ˜t, which is the test statistic of Kelly's GLRT and can be written as

    ˜t = s_2/(1 + s_1 − s_2) =_d [µ^{-1} F_{22} |x̃_2 + x̃_1^H t|²] / [1 + µ^{-1} x̃_1^H F_{1.2} x̃_1]        (23)

From the representation of t in (9), we have that

    x̃_2 + x̃_1^H t = x̃_2 + (1 + F_{22}^{-1})^{1/2} x̃_1^H n/√γ        (24)

which implies, since x̃_2 ∼ ˜N(α(v^H Σ^{-1} v)^{1/2}, 1), that

    x̃_2 + x̃_1^H t | x̃_1, F_{22}, γ ∼ ˜N( α(v^H Σ^{-1} v)^{1/2}, 1 + (1 + F_{22}^{-1}) x̃_1^H x̃_1/γ )        (25)

Consequently,

    ˜t =_d { µ^{-1} F_{22} [1 + (1 + F_{22}^{-1}) x̃_1^H x̃_1/γ] ˜χ²_1(δ) } / { 1 + (˜χ²_{ν−1}(0)/µ) x̃_1^H x̃_1/˜χ²_{K−N+2}(0) }        (26)

where δ = |α|²(v^H Σ^{-1} v) / [1 + (1 + F_{22}^{-1}) x̃_1^H x̃_1/γ], the distributions of F_{22} and γ are given in (9)-(10), and x̃_1^H x̃_1 ∼ ˜χ²_{N−1}(0). Equation (26) provides the statistical representation of ˜t as a function of independent chi-square distributed random variables. It should be compared with the Gaussian expression

    ˜t_Gaussian | β_Gaussian =_d ˜χ²_1( β_Gaussian |α|²(v^H Σ^{-1} v) ) / ˜χ²_{K−N+1}(0)        (27)

We now evaluate how the distributions of β and ˜t in the Student case depart from their distributions with Gaussian distributed training samples. As before, we have N = 16 and µ = ν − N. Figures 5-6 display the c.d.f. of β and ˜t for K = 2N. Similarly to what was observed for the SNR loss, we see that the impact is significant, especially for β. This suggests that using β in the test statistic may lead to significant performance degradation. We also notice that, contrary to the Gaussian case, ρ and β do not have the same distribution in the Student case. Furthermore, similarly to what was observed for ρ, the difference between the Gaussian and Student distributions is all the more important as K is large, see Figures 7-8. Finally, we investigate the influence of Student distributed training samples on the probability of false alarm of ˜t, i.e., Kelly's Gaussian GLRT. The threshold is set to achieve a preset P_fa in the Gaussian case. Figure 9 shows the actual P_fa when t-distributed training samples are used. One can observe two things. First, P_fa is increased, and the increase is more pronounced as K grows. Second, one can see that even for large ν we do not recover the Gaussian P_fa, due to the distribution mismatch between the data under test x and the training samples X.
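One way to reproduce this kind of false alarm study is to sample (26) under the null hypothesis (α = 0, so that ˜χ²_1(0) is a unit-mean exponential) and to set the threshold from the Gaussian null distribution (27), for which P(˜t > η) = (1 + η)^{−(K−N+1)}. The sketch below follows (26) literally, with an illustrative target P_fa; it is a sketch under these assumptions, not the paper's simulation code.

```python
# Sketch: actual Pfa of Kelly's GLRT with t-distributed training samples,
# sampling representation (26) with alpha = 0. Parameters are illustrative.
import numpy as np

rng = np.random.default_rng(5)
N, K, nu, n_mc = 16, 32, 32, 1_000_000
mu = nu - N
pfa_target = 1e-3                                   # illustrative preset Pfa
eta = pfa_target ** (-1.0 / (K - N + 1)) - 1.0      # Gaussian threshold from (27)

F22 = rng.gamma(nu, size=n_mc) / rng.gamma(K - N + 1, size=n_mc)   # eq. (10)
x   = rng.gamma(N - 1, size=n_mc)                   # x1^H x1 ~ chi2_{N-1}(0)
gam = rng.gamma(K - N + 2, size=n_mc)               # gamma in (9)
num = F22 / mu * (1.0 + (1.0 + 1.0 / F22) * x / gam) * rng.gamma(1.0, size=n_mc)
den = 1.0 + rng.gamma(nu - 1, size=n_mc) / mu * x / rng.gamma(K - N + 2, size=n_mc)
t_student = num / den
print((t_student > eta).mean())                     # exceeds pfa_target, as in Figure 9
```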
In this paper we were interested in what happens to statistics commonly used in adaptive multichannel processing when the training samples used to infer the noise statistics are no longer Gaussian distributed but t distributed. Statistical representations of the SNR loss and of some statistics used for adaptive detection were derived, based on properties of partitioned matrix-variate F distributions. The expressions derived are given in terms of independent chi-square distributed random variables. They enable one to quickly evaluate the impact of this type of distribution mismatch, which was illustrated numerically.

A Properties of partitioned complex matrix-variate F distributed matrices

In this appendix we derive some properties of partitioned complex matrix-variate F distributed matrices. Most of these properties were derived in the real case in [35]; we extend them to the complex case and provide new additional results concerning marginalization of the distribution of t, see below. Let F ∼ ˜F_p(q, n) and let us partition it as

    F = ( F_{11}  F_{12} ; F_{21}  F_{22} )        (28)

where F_{11} is r × r and F_{22} is s × s, with p = r + s. The p.d.f. of F is given by p(F) ∝ |F|^{q−p}|I_p + F|^{−(q+n)}. Now we have |F| = |F_{1.2}||F_{22}|, where F_{1.2} = F_{11} − F_{12} F_{22}^{-1} F_{21} is the Schur complement of the (2,2) block. Moreover,

    [I_p + F]_{1.2} = I_r + F_{11} − F_{12}(I_s + F_{22})^{-1} F_{21}
                    = I_r + F_{1.2} + F_{12} F_{22}^{-1} F_{21} − F_{12}(I_s + F_{22})^{-1} F_{21}
                    = I_r + F_{1.2} + F_{12} [F_{22}^{-1} − (I_s + F_{22})^{-1}] F_{21}
                    = I_r + F_{1.2} + F_{12} F_{22}^{-1}(I_s + F_{22})^{-1} F_{21}
                    = I_r + F_{1.2} + T(I_s + F_{22})^{-1} F_{22} T^H        (29)

with T = F_{12} F_{22}^{-1}. It ensues that

    |[I_p + F]_{1.2}| = |I_r + F_{1.2}| × |I_r + (I_r + F_{1.2})^{-1} T(I_s + F_{22})^{-1} F_{22} T^H|        (30)

Since the Jacobian is J(F → F_{1.2}, T, F_{22}) = |F_{22}|^r [26], we can write the joint density of (F_{1.2}, T, F_{22}) as

    p(F_{1.2}, T, F_{22}) ∝ |F_{1.2}|^{q−r−s} |I_r + F_{1.2}|^{−(q+n−s)}
        × |F_{22}|^{q−s} |I_s + F_{22}|^{−(q+n−r)}
        × |I_r + F_{1.2}|^{−s} |F_{22}^{-1}(I_s + F_{22})|^{−r} |I_r + (I_r + F_{1.2})^{-1} T(I_s + F_{22})^{-1} F_{22} T^H|^{−(q+n)}        (31)

Therefore F_{1.2} and F_{22} are independent, and

    F_{1.2} ∼ ˜F_r(q − s, n);   F_{22} ∼ ˜F_s(q, n − r)        (32)
    T | F_{1.2}, F_{22} ∼ ˜T_{r,s}( n + q − p + 1, 0, I_r + F_{1.2}, F_{22}^{-1}(I_s + F_{22}) )        (33)

These results extend those of [35] to the complex case. Next, we marginalize with respect to F_{1.2} in order to obtain the distribution of T | F_{22}. To do so, note that

    |I_r + (I_r + F_{1.2})^{-1} T(I_s + F_{22})^{-1} F_{22} T^H| = |I_r + F_{1.2}|^{-1} |∆ + F_{1.2}|        (34)

where ∆ = I_r + T(I_s + F_{22})^{-1} F_{22} T^H. It follows that

    p(T | F_{22}) = ∫ p(T | F_{22}, F_{1.2}) p(F_{1.2}) dF_{1.2}
                  ∝ |F_{22}^{-1}(I_s + F_{22})|^{−r} ∫ |I_r + F_{1.2}|^{n+q−s} |∆ + F_{1.2}|^{−(q+n)} p(F_{1.2}) dF_{1.2}
                  ∝ |F_{22}^{-1}(I_s + F_{22})|^{−r} ∫ |F_{1.2}|^{q−r−s} |∆ + F_{1.2}|^{−(q+n)} dF_{1.2}
                  ∝ |F_{22}^{-1}(I_s + F_{22})|^{−r} |I_r + T(I_s + F_{22})^{-1} F_{22} T^H|^{−(n+s)}        (35)

This proves that

    T | F_{22} ∼ ˜T_{r,s}( n − r + 1, 0, I_r, F_{22}^{-1}(I_s + F_{22}) )        (36)

When s = 1, (36) reduces to

    t | F_{22} ∼ ˜T_{p−1,1}( n − p + 2, 0, I_{p−1}, 1 + F_{22}^{-1} )        (37)

which means that t can be modelled as

    t = (1 + F_{22}^{-1})^{1/2} n/√γ        (38)

with n ∼ ˜N_{p−1}(0, I_{p−1}) and γ ∼ ˜χ²_{n−p+2}(0). The distribution of t can be evaluated by marginalizing p(t | F_{22}), which gives

    p(t) = ∫_0^∞ p(t | F_{22}) p(F_{22}) dF_{22}
         = [Γ(n+1) / (π^{p−1} Γ(n−p+2))] ∫_0^∞ F_{22}^{p−1}(1 + F_{22})^{−(p−1)} [1 + F_{22}(1 + F_{22})^{-1} t^H t]^{−(n+1)} p(F_{22}) dF_{22}        (39)

From (32), F_{22} has a complex scalar ˜F_1(q, n − p + 1) distribution, so that

    p(F_{22}) = F_{22}^{q−1}(1 + F_{22})^{−(q+n−p+1)} / B_{q,n−p+1}        (40)

where B_{a,b} = Γ(a)Γ(b)/Γ(a + b). It follows that

    p(t) = ∫_0^∞ C F_{22}^{p+q−2}(1 + F_{22})^{−(q+n)} [1 + F_{22}(1 + F_{22})^{-1} t^H t]^{−(n+1)} dF_{22}
         = C B_{p+q−1,n−p+1} ₂F₁(n + 1, p + q − 1; n + q; −t^H t)        (41)

with C = Γ(n+1) / (π^{p−1} Γ(n−p+2) B_{q,n−p+1}), and where we used [36] to obtain the last line. The unconditional distribution is seen to depend only on t^H t. Finally, note that for the purpose of analyzing the SNR loss the conditional distribution p(t | F_{22}) is the most convenient and is actually used.
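The block independence and marginal distributions in (32) can be checked by simulation; the sketch below (not from the paper) does so in the smallest non-trivial case p = 2, r = s = 1, with illustrative q and n.

```python
# Monte Carlo sketch of (32) for p = 2, r = s = 1: the Schur complement F_{1.2}
# should follow F_1(q-1, n), i.e. chi2_{q-1}(0)/chi2_n(0) with complex
# chi-squares (Gamma variables). q and n are illustrative.
import numpy as np

rng = np.random.default_rng(6)
q, n, n_mc = 12, 20, 100_000

def crandn(*shape):
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

f12 = np.empty(n_mc)
for m in range(n_mc):
    A = crandn(2, q); S1 = A @ A.conj().T          # S1 ~ W_2(q, I)
    B = crandn(2, n); S2 = B @ B.conj().T          # S2 ~ W_2(n, I)
    vals, vecs = np.linalg.eigh(S1)
    R = (vecs * np.sqrt(vals)) @ vecs.conj().T     # S1^{1/2}
    F = R @ np.linalg.solve(S2, R)                 # F ~ F_2(q, n)
    f12[m] = (F[0, 0] - F[0, 1] * F[1, 0] / F[1, 1]).real   # Schur complement F_{1.2}
ref = rng.gamma(q - 1, size=n_mc) / rng.gamma(n, size=n_mc)
print(f12.mean(), ref.mean())                      # both should approach (q-1)/(n-1)
```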
B Distribution and average value of the SNR loss in the Student case

In this appendix we derive the p.d.f. as well as the mean value of the SNR loss. Let F_1 =_d ˜χ²_{K−N+1}(0)/˜χ²_ν(0) and F_2 =_d ˜χ²_{N−1}(0)/˜χ²_{K−N+2}(0), and let ρ =_d [1 + (1 + F_1)F_2]^{-1}. Let us first evaluate the distribution of ρ | F_1. The p.d.f. of F_2 is given by

    p_{F_2}(f) = [1/B_{N−1,K−N+2}] f^{N−2}(1 + f)^{−(K+1)}        (42)

Making the change of variables ρ = [1 + (1 + F_1)F_2]^{-1} ⇔ F_2 = (1 + F_1)^{-1}(ρ^{-1} − 1), whose Jacobian is J(F_2 → ρ | F_1) = (1 + F_1)^{-1} ρ^{-2}, it follows that

    p(ρ | F_1) = [(1 + F_1)^{K−N+2} / B_{N−1,K−N+2}] ρ^{K−N+1}(1 − ρ)^{N−2}(1 + ρF_1)^{−(K+1)}        (43)

Setting F_1 = 0, one recovers the usual beta distribution of the SNR loss in the Gaussian case. Marginalizing with respect to the p.d.f. of F_1, we obtain

    p(ρ) = ∫_0^∞ p(ρ | f) p_{F_1}(f) df
         = [ρ^{K−N+1}(1 − ρ)^{N−2} / (B_{N−1,K−N+2} B_{K−N+1,ν})] ∫_0^∞ f^{K−N}(1 + f)^{−(ν−1)}(1 + ρf)^{−(K+1)} df
         = [B_{K−N+1,ν+N−1} / (B_{N−1,K−N+2} B_{K−N+1,ν})] ρ^{K−N+1}(1 − ρ)^{N−2} ₂F₁(K + 1, K − N + 1; ν + K; 1 − ρ)        (44)

where we made use of [36] to obtain the last equality.
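Equation (44) involves only standard special functions; the sketch below (assuming scipy is available; parameters are illustrative) evaluates it and compares it with a histogram of samples drawn from representation (11).

```python
# Evaluation sketch of the SNR loss density (44) against samples from (11).
import numpy as np
from scipy.special import hyp2f1, betaln

def pdf_rho(rho, N, K, nu):
    logc = (betaln(K - N + 1, nu + N - 1)
            - betaln(N - 1, K - N + 2) - betaln(K - N + 1, nu))
    return (np.exp(logc) * rho ** (K - N + 1) * (1 - rho) ** (N - 2)
            * hyp2f1(K + 1, K - N + 1, nu + K, 1 - rho))

rng = np.random.default_rng(7)
N, K, nu, n_mc = 16, 32, 32, 200_000
c1 = rng.gamma(K - N + 1, size=n_mc); c2 = rng.gamma(nu, size=n_mc)
c3 = rng.gamma(N - 1, size=n_mc);     c4 = rng.gamma(K - N + 2, size=n_mc)
rho = 1.0 / (1.0 + (1.0 + c1 / c2) * c3 / c4)          # samples from (11)
hist, edges = np.histogram(rho, bins=50, density=True)
mid = 0.5 * (edges[:-1] + edges[1:])
print(np.max(np.abs(hist - pdf_rho(mid, N, K, nu))))   # small (Monte Carlo noise) if (44) is right
```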
Equation (44) also allows one to calculate the average value of the SNR loss. Let us start with the conditional mean of ρ:

    E{ρ | F_1} = ∫_0^1 ρ p(ρ | F_1) dρ
               = [(1 + F_1)^{K−N+2} / B_{N−1,K−N+2}] ∫_0^1 ρ^{K−N+2}(1 − ρ)^{N−2}(1 + ρF_1)^{−(K+1)} dρ
               = [B_{N−1,K−N+3} / B_{N−1,K−N+2}] (1 + F_1)^{K−N+2} ₂F₁(K + 1, K − N + 3; K + 2; −F_1)
               = [(K − N + 2)/(K + 1)] (1 + F_1)^{K−N+2} ₂F₁(K + 1, K − N + 3; K + 2; −F_1)
               = [(K − N + 2)/(K + 1)] (1 + F_1)^{-1} ₂F₁(1, K − N + 3; K + 2; F_1/(1 + F_1))        (45)

where the two last lines are obtained from equivalent expressions of the hypergeometric function [36]. If we set F_1 = 0 in the previous equation, we recover the Gaussian case, for which E{ρ_Gaussian} = (K − N + 2)/(K + 1). Next, we need to integrate with respect to the density of F_1:

    E{ρ} = ∫_0^∞ E{ρ | f} p_{F_1}(f) df
         = ∫_0^∞ E{ρ | f} [1/B_{K−N+1,ν}] f^{K−N}(1 + f)^{−(ν+K−N+1)} df
         = [(K − N + 2) / ((K + 1) B_{K−N+1,ν})] ∫_0^∞ [f^{K−N}/(1 + f)^{ν+K−N+2}] ₂F₁(1, K − N + 3; K + 2; f/(1 + f)) df        (46)

Making the change of variables x = f/(1 + f), the integral above can be written as

    I = ∫_0^1 x^{K−N}(1 − x)^ν ₂F₁(1, K − N + 3; K + 2; x) dx
      = B_{K−N+1,ν+1} ₃F₂(1, K − N + 3, K − N + 1; K + 2, ν + K − N + 2; 1)        (47)

which finally results in

    E{ρ} = [(K − N + 2)/(K + 1)] [B_{K−N+1,ν+1}/B_{K−N+1,ν}] ₃F₂(1, K − N + 3, K − N + 1; K + 2, ν + K − N + 2; 1)
         = [ν(K − N + 2) / ((ν + K − N + 1)(K + 1))] ₃F₂(1, K − N + 3, K − N + 1; K + 2, ν + K − N + 2; 1)        (48)

References

[1] J. Ward. Space-time adaptive processing for airborne radar. Technical Report 1015, Lincoln Laboratory, Massachusetts Institute of Technology, Lexington, MA, December 1994.
[2] W. L. Melvin and J. A. Scheer, editors. Principles of Modern Radar: Advanced Principles, volume 2. Institution of Engineering and Technology, 2012.
[3] M. A. Richards. Fundamentals of Radar Signal Processing. McGraw-Hill, 2nd edition, 2014.
[4] F. C. Robey, D. R. Fuhrmann, E. J. Kelly, and R. Nitzberg. A CFAR adaptive matched filter detector. IEEE Transactions on Aerospace and Electronic Systems, 28(1):208–216, January 1992.
[5] I. S. Reed, J. D. Mallett, and L. E. Brennan. Rapid convergence rate in adaptive arrays. IEEE Transactions on Aerospace and Electronic Systems, 10(6):853–863, November 1974.
[6] C. G. Khatri and C. R. Rao. Effects of estimated noise covariance matrix in optimal signal detection. IEEE Transactions on Acoustics, Speech and Signal Processing, 35(5):671–679, May 1987.
[7] J. Liu, W. Liu, H. Liu, B. Chen, X.-G. Xia, and F. Dai. Average SINR calculation of a persymmetric sample matrix inversion beamformer. IEEE Transactions on Signal Processing, 64(8):2135–2145, April 2016.
[8] A. Farina, F. Gini, M. V. Greco, and L. Verrazzani. High resolution sea clutter data: statistical analysis of recorded live data. IEE Proceedings - Radar, Sonar and Navigation, 144(3):121–130, 1997.
[9] J. B. Billingsley, A. Farina, F. Gini, M. V. Greco, and L. Verrazzani. Statistical analyses of measured radar ground clutter data. IEEE Transactions on Aerospace and Electronic Systems, 35(2):579–593, April 1999.
[10] E. Conte, A. De Maio, and A. Farina. Statistical tests for higher order analysis of radar clutter: their application to L-band measured data. IEEE Transactions on Aerospace and Electronic Systems, 41(1):205–218, January 2005.
[11] D. M. Boroson. Sample size considerations for adaptive arrays. IEEE Transactions on Aerospace and Electronic Systems, 16(4):446–451, July 1980.
[12] E. J. Kelly. Performance of an adaptive detection algorithm; rejection of unwanted signals. IEEE Transactions on Aerospace and Electronic Systems, 25(2):122–133, April 1989.
[13] S. Z. Kalson. An adaptive array detector with mismatched signal rejection. IEEE Transactions on Aerospace and Electronic Systems, 28(1):195–207, January 1992.
[14] S. Bose and A. O. Steinhardt. Adaptive array detection of uncertain rank one waveforms. IEEE Transactions on Signal Processing, 44(11):2801–2809, November 1996.
[15] F. Bandiera, D. Orlando, and G. Ricci. Advanced Radar Detection Schemes Under Mismatched Signal Models, volume 4 of Synthesis Lectures on Signal Processing. Morgan & Claypool, 2009.
[16] J. Liu, D. Orlando, P. Addabbo, and W. Liu. SINR distribution for the persymmetric SMI beamformer with steering vector mismatches. IEEE Transactions on Signal Processing, 67(5):1382–1392, March 2019.
[17] C. D. Richmond. Performance of a class of adaptive detection algorithms in nonhomogeneous environments. IEEE Transactions on Signal Processing, 48(5):1248–1262, May 2000.
[18] R. S. Raghavan. False alarm analysis of the AMF algorithm for mismatched training. IEEE Transactions on Signal Processing, 67(1):83–96, January 2019.
[19] O. Besson. Analysis of the SNR loss distribution with covariance mismatched training samples. IEEE Transactions on Signal Processing, 68:5759–5768, 2020.
[20] O. Besson. Impact of covariance mismatched training samples on constant false alarm rate detectors. IEEE Transactions on Signal Processing, 69:755–765, 2021.
[21] R. S. Blum and K. F. McDonald. Analysis of STAP algorithms for cases with mismatched steering and clutter statistics. IEEE Transactions on Signal Processing, 48(2):301–310, February 2000.
[22] K. F. McDonald and R. S. Blum. Exact performance of STAP algorithms with mismatched steering and clutter statistics. IEEE Transactions on Signal Processing, 48(10):2750–2763, October 2000.
[23] R. J. Muirhead. Aspects of Multivariate Statistical Theory. John Wiley & Sons, Hoboken, NJ, 1982.
[24] A. K. Gupta and D. K. Nagar. Matrix Variate Distributions. Chapman & Hall/CRC, Boca Raton, FL, 2000.
[25] N. R. Goodman. Statistical analysis based on a certain multivariate complex Gaussian distribution (an introduction). The Annals of Mathematical Statistics, 34(1):152–177, March 1963.
[26] C. G. Khatri. Classical statistical analysis based on a certain multivariate complex Gaussian distribution. The Annals of Mathematical Statistics, 36(1):98–114, February 1965.
[27] P. R. Krishnaiah. Some recent developments in complex multivariate distributions. Journal of Multivariate Analysis, 6:1–30, March 1976.
[28] A. M. Mathai and S. B. Provost. Some complex matrix-variate statistical distributions on rectangular matrices. Linear Algebra and its Applications, 410:198–216, November 2005.
[29] I. Olkin and H. Rubin. Multivariate beta distributions and independence properties of Wishart matrices. The Annals of Mathematical Statistics, 35(1):261–269, March 1964.
[30] E. J. Kelly. An adaptive detection algorithm. IEEE Transactions on Aerospace and Electronic Systems, 22(1):115–127, March 1986.
[31] S. Bose and A. O. Steinhardt. A maximal invariant framework for adaptive detection with structured and unstructured covariance matrices. IEEE Transactions on Signal Processing, 43(9):2164–2175, September 1995.
[32] A. Coluccia, A. Fascista, and G. Ricci. CFAR feature plane: A novel framework for the analysis and design of radar detectors. IEEE Transactions on Signal Processing, 68:3903–3916, 2020.
[33] C. D. Richmond. A note on non-Gaussian adaptive array detection and signal parameter estimation. IEEE Signal Processing Letters, 3(8):251–252, August 1996.
[34] C. D. Richmond. Adaptive Array Signal Processing and Performance Analysis in Non-Gaussian Environments. PhD thesis, Massachusetts Institute of Technology, 1996.
[35] W. Y. Tan. Note on the multivariate and the generalized multivariate beta distributions. Journal of the American Statistical Association, 64:230–241, March 1969.
[36] I. S. Gradshteyn and I. M. Ryzhik. Table of Integrals, Series and Products. Academic Press, 7th edition, 2007.

Figure 1: Probability density function and cumulative distribution function of ρ for various ν. K = 2N.

Figure 2: Probability density function and cumulative distribution function of ρ for various K. ν = 2N.
Figure 3: Average value of ρ_Student versus K for various ν.

Figure 4: Number of snapshots required to have E{ρ_Student} = 0.5, versus ν.

Figure 5: Cumulative distribution function of β for various ν. K = 2N.

Figure 6: Cumulative distribution function of ˜t for various ν. K = 2N.

Figure 7: Cumulative distribution function of β for various K. ν = 2N.

Figure 8: Cumulative distribution function of ˜t for various K. ν = 2N.