Ellipse Combining with Unknown Cross-Ellipse Correlations
Adam Hall
We discuss the combining of measurements where single-measurement covariances are given but the joint measurement covariance is unknown. For this paper we assume the mapping of a single measurement to the solution space is the identity matrix. We examine the solution when it is assumed all measurements are uncorrelated. We then present a way to parameterize the joint measurement covariance based on pairwise correlation coefficients. Finally, we discuss how to use this parameterization to combine the measurements.
Introduction
In this paper, we are going to examine the linear combining, or fusion, of location estimates. Each estimate is a vector of size $k \times 1$, and for each estimate we are also provided a $k \times k$ covariance matrix. We will assume that the overall joint system covariance, which describes the correlation between location estimates, is not provided. Specifically, we consider the generalized least squares problem of finding an estimate $\hat{x}$ given the following model

$$y = Ax + \epsilon, \qquad (1)$$

where $y$ is a column-stacked vector of $n$, $k \times 1$ estimates; each $k \times 1$ vector is denoted as $y_i$:

$$y = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} \qquad (2)$$

and $A$, our design matrix, is a block matrix, a stack of $n$, $k \times k$ identity matrices:

$$A = \begin{bmatrix} I_{k \times k} \\ \vdots \\ I_{k \times k} \end{bmatrix}. \qquad (3)$$

The size of $A$ is $nk \times k$. Finally, $\epsilon$ is a zero-mean Gaussian process with a covariance matrix that overall is unknown but whose block diagonal is provided by the $y_i$ estimated covariances, denoted as $E_i$, where $E_i$ is a $k \times k$ matrix. We define the joint covariance matrix $R$ as a block matrix as follows:

$$E\left(\epsilon \epsilon^T\right) = R = \begin{bmatrix} E_1 & E\left(y_1 y_2^T\right) & \cdots & E\left(y_1 y_n^T\right) \\ E\left(y_2 y_1^T\right) & E_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & \vdots \\ E\left(y_n y_1^T\right) & \cdots & \cdots & E_n \end{bmatrix} \qquad (4)$$

where $E\left(y_i y_j^T\right)$ is not known when $i \neq j$.

At first this model may look restrictively specific because the design matrix $A$ consists of a column stack of identity matrices. Often in practice measurements must be transformed via a design matrix $A$ which is not a column stack of identity matrices. However, the model above is applicable to data fusion problems where the raw measurements are unavailable. Assume for example that we obtain $n$ estimates of the $k \times 1$ parameters, where each $y_i$ estimate was provided by some black-box processor. Ideally we would have access to the raw measurements the black-box processor used to produce the $n$ estimates. However, this is often not possible; thus, we have the problem presented and discussed in this paper.

In this paper, we will examine the best linear unbiased estimator (BLUE) for this model under various constructions of the joint unknown covariance matrix $R$. First, we will examine the trivial solution when one assumes that all cross-estimate correlations are zero, $E\left(y_i y_j^T\right)|_{i \neq j} = 0$. This approach is sometimes called ellipse convolving. In practice, assuming datasets to be completely uncorrelated often produces solution covariance estimates that are too small because the assumption of uncorrelated datasets is often wrong. Some examples of correlation found in datasets thought to be uncorrelated are found in the following sources [15, 10, 7, 5, 14]. We will then look at an approach to parameterize the covariance matrix by cross-correlation coefficients. Under this parameterization we will solve for the max-entropy solution as a function of the cross-correlation coefficient for combining two estimates $y_1$ and $y_2$. Finally, we will extend this pairwise approach to $n$ estimates.
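To make the model concrete, the following Python sketch (ours, not from the paper) assembles the stacked vector $y$, the design matrix $A$, and the known block diagonal of $R$; the sizes `k` and `n` and the random example values are assumptions chosen purely for illustration.

```python
import numpy as np

k, n = 2, 3                                          # parameter size and number of estimates
rng = np.random.default_rng(0)
estimates = [rng.normal(size=k) for _ in range(n)]   # the y_i, one k-vector each
covariances = [np.eye(k) * (i + 1) for i in range(n)]  # the E_i, one k x k matrix each

y = np.concatenate(estimates)        # column-stacked y, size nk (Equation 2)
A = np.tile(np.eye(k), (n, 1))       # stack of n identity matrices, size nk x k (Equation 3)

# Only the block diagonal of R is known; the off-diagonal blocks E(y_i y_j^T)
# are the unknowns this paper parameterizes.
R_block_diag = np.zeros((n * k, n * k))
for i, E_i in enumerate(covariances):
    R_block_diag[i*k:(i+1)*k, i*k:(i+1)*k] = E_i
```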
Use of the Terms: Ellipse, Error Integral and Entropy.

The probability density function of the $y_i$ measurement with covariance $E_i$ is

$$f(y_i) = \left([2\pi]^k |E_i|\right)^{-1/2} \exp\left(-\frac{1}{2}(y - y_i)^T E_i^{-1} (y - y_i)\right). \qquad (5)$$
We can define the contour where $f(y_i)$ is equal to a constant by finding all points whose Mahalanobis distance is the same. In other words, we define the contour as all points $y$ such that

$$\sqrt{(y - y_i)^T E_i^{-1} (y - y_i)} = C, \qquad (6)$$

where $C$ is a positive constant. [12] It is well known that this contour is the equation of an ellipse for $k = 2$ and in general an ellipsoid. For this reason we use the capital $E$ to denote the covariance and refer to combining measurements as Ellipse Combining. Furthermore, for $k = 2$ the area of the ellipse (or for $k = 3$ the volume of the ellipsoid), defined as the region where

$$\sqrt{(y - y_i)^T E_i^{-1} (y - y_i)} < C, \qquad (7)$$

is proportional to $\sqrt{|E_i|}$. [4] In this paper we will say that $\sqrt{|E|}$ is proportional to the error integral of the ellipse $E$. Finally, throughout this paper we will refer to the differential entropy of a continuous random variable simply as the entropy.
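As a numeric sanity check of the error-integral claim (our own illustration, not from the paper): for $k = 2$ the area enclosed by the contour of Equation 7 is $\pi C^2 \sqrt{|E|}$, which the following Monte Carlo sketch confirms. The matrix `E`, the constant `C`, and the sample count are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
E = np.array([[4.0, 1.0], [1.0, 2.0]])
C = 1.5

# Monte Carlo: sample a box that contains the ellipse and count interior points.
half_width = 2.0 * C * np.sqrt(np.diag(E)).max()   # crude bounding half-width
pts = rng.uniform(-half_width, half_width, size=(200_000, 2))
E_inv = np.linalg.inv(E)
inside = np.einsum("ij,jk,ik->i", pts, E_inv, pts) < C**2
area_mc = inside.mean() * (2 * half_width) ** 2

area_analytic = np.pi * C**2 * np.sqrt(np.linalg.det(E))
print(area_mc, area_analytic)   # the two agree to Monte Carlo error
```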
Review of Generalized Least Squares.

For reference and completeness, when the covariance matrix is fully specified and one combines the measurements using a positive semidefinite weight matrix $W$, we have the well-known results [2]

$$\hat{x} = (A^T W A)^{-1} A^T W y, \qquad (8)$$

$$E\left(\hat{x}\hat{x}^T\right) = P(W, R) = (A^T W A)^{-1} A^T W R W A (A^T W A)^{-1}. \qquad (9)$$

The best linear unbiased estimator (BLUE) is found by setting $W = R^{-1}$, and for this specific case Equation 9 simplifies to

$$E\left(\hat{x}\hat{x}^T\right) = P(R^{-1}, R) = (A^T R^{-1} A)^{-1}. \qquad (10)$$

As defined so far, the BLUE solution above requires that $R$ be positive definite. We can extend the solution to incorporate an $R$ which is only positive semidefinite by setting $W = R^{\dagger}$, where $X^{\dagger}$ is defined as the pseudo-inverse of a positive semidefinite matrix $X$. [6] In the case where $R$ is only positive semidefinite we also require that the solution covariance $P$ be positive definite. For the rest of this paper, for simplicity, we will denote the pseudo-inverse of $R$ as $R^{-1}$ and not $R^{\dagger}$. To summarize, in this paper we require that $R$ be positive semidefinite such that the resulting solution $P$ is strictly positive definite, and $R^{-1}$ will be used to denote the pseudo-inverse of $R$.

Defining α and β to compare models.

We can compare $P(W, R)$ for various $W$ and $R$. For example, assume that we have some guess or estimate of the joint covariance denoted $\hat{R}$. We then set $W^{-1} = \hat{R}$, compute a solution, and report a solution covariance via Equation 9 setting $R = \hat{R}$. Our reported solution covariance will be mismatched if $R \neq \hat{R}$. To describe this mismatch we define

$$\alpha(W, R) = \sqrt{\frac{|P(W, R)|}{|P(W, W^{-1})|}}. \qquad (11)$$

The scalar $\alpha(W, R)$ is the number of times smaller or bigger the error integral of our estimate is compared to what we report.

We also define

$$\beta(W, R) = \sqrt{\frac{|P(W, R)|}{|P(R^{-1}, R)|}}. \qquad (12)$$

The scalar $\beta(W, R)$ is the number of times bigger the error integral of our estimate is compared to the BLUE solution. The scalar $\beta(W, R) \geq 1$.

The scalar $\alpha(W, R)$ represents a mismatch between the reported covariance and the actual covariance. The scalar $\beta(W, R)$ represents the number of times bigger our solution error integral is compared to what it could be if we fully knew $R$ and applied the BLUE weights (i.e., set $W = R^{-1}$). There are other metrics and methods to compare $P(W, R)$ for various $W$ and $R$; nevertheless, we will use $\alpha$ and $\beta$ as defined above as our metrics for comparison.
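The quantities of Equations 8 through 12 translate directly into code. The following functions (our transcription; the names `solution_cov`, `alpha`, and `beta` are our own) compute $P(W, R)$ and the two comparison metrics, assuming dense NumPy arrays and using the pseudo-inverse where the text allows a positive semidefinite $R$.

```python
import numpy as np

def solution_cov(W, R, A):
    """P(W, R) of Equation 9."""
    N = np.linalg.inv(A.T @ W @ A)
    return N @ A.T @ W @ R @ W @ A @ N

def alpha(W, R, A):
    """Equation 11: reported vs. actual error-integral ratio."""
    P_actual = solution_cov(W, R, A)
    P_reported = solution_cov(W, np.linalg.pinv(W), A)
    return np.sqrt(np.linalg.det(P_actual) / np.linalg.det(P_reported))

def beta(W, R, A):
    """Equation 12: actual vs. best-possible (BLUE) error-integral ratio."""
    P_actual = solution_cov(W, R, A)
    P_blue = np.linalg.inv(A.T @ np.linalg.pinv(R) @ A)
    return np.sqrt(np.linalg.det(P_actual) / np.linalg.det(P_blue))
```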
Ellipse Convolving

We can construct $R$ as an $n \times n$ block matrix where each block is $k \times k$; the size of $R$ is $(nk) \times (nk)$. We denote the $i$th row and $j$th column block matrix by $R_{i,j}$. If we assume that all non-diagonal blocks consist of null matrices, we have

$$R_{i,i} = E_i, \qquad (13)$$

$$R_{i,j}|_{j \neq i} = 0_{k \times k}. \qquad (14)$$

Then we find the BLUE solution is

$$\hat{x} = (A^T R^{-1} A)^{-1} A^T R^{-1} y = \left(\sum_{i=1}^{n} E_i^{-1}\right)^{-1} \left(\sum_{i=1}^{n} E_i^{-1} y_i\right) \qquad (15)$$

with covariance

$$P = (A^T R^{-1} A)^{-1} = \left(\sum_{i=1}^{n} E_i^{-1}\right)^{-1}. \qquad (16)$$

For the rest of this paper, we will denote this solution as $\hat{x}_c$ and its covariance as $P_c$, where the subscript $c$ is for ellipse convolving. The underlying assumption is that all provided estimates are independent (and linear). We will consider this the best-case scenario and as a result treat $|P_c|$ as a lower bound. In practice, estimates will almost never be completely independent and linear. For example, our estimates might all be off by some unknown bias, which we can treat as a nonzero positive cross correlation between all the estimates. In exploring these other covariance models, we will find that, strictly speaking, $P_c$ is not always the most optimistic or best-case scenario. In other words, there are plenty of cases where one can design $R$ to have nonzero positive cross-estimate correlation values whose BLUE combined estimate covariance $P$ is such that $|P| < |P_c|$. However, given that cross-estimate correlations are unknown, why select weights that reflect the cross-estimate correlations working together to improve your estimates compared to zero cross correlation? Based on this reasoning, such matrices that theoretically outperform convolving are rejected. Instead, this paper will investigate covariance matrices that result in $|P| \geq |P_c|$.
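A minimal sketch of ellipse convolving (Equations 15 and 16), assuming a Python list of $k$-vectors and a matching list of $k \times k$ covariance matrices:

```python
import numpy as np

def convolve_ellipses(y_list, E_list):
    """Ellipse convolving: BLUE under the all-blocks-independent assumption."""
    E_invs = [np.linalg.inv(E) for E in E_list]
    P_c = np.linalg.inv(sum(E_invs))                              # Equation 16
    x_hat = P_c @ sum(Ei @ yi for Ei, yi in zip(E_invs, y_list))  # Equation 15
    return x_hat, P_c
```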
Pairwise Ellipse Combining Parameterized by a Scalar Cross-Ellipse Correlation Coefficient

The previous section showed that ellipse convolving is the best linear unbiased estimator (i.e., BLUE) for combining measurements under the assumption that all cross-ellipse correlation values are zero. This section will investigate BLUE for combining two ellipses whose cross-ellipse correlation is parameterized by a scalar cross-correlation coefficient $r$. Then we will find the cross-correlation coefficient $r_{max}$ which maximizes the entropy of the combined ellipse solution.

We start by considering the combining of two ellipses with covariance matrices $E_1$ and $E_2$ respectively and a joint system covariance matrix

$$R = \begin{bmatrix} E_1 & r\sqrt{E_1 E_2} \\ r\left(\sqrt{E_1 E_2}\right)^T & E_2 \end{bmatrix} \qquad (17)$$

where we define the square root of a matrix $X$ as the principal square root. [9] The Schur complement of the $E_1$ block matrix of the matrix $R$ is

$$R/E_1 = E_2 - r^2 \left(\sqrt{E_1 E_2}\right)^T E_1^{-1} \sqrt{E_1 E_2} = (1 - r^2) E_2. \qquad (18)$$

Clearly the matrix $R/E_1$ is positive semidefinite if and only if $|r| \leq 1$. Or, equivalently, $R$ is positive semidefinite if and only if $|r| \leq 1$. [6] Also, if $|r| = 1$, then $|R| = 0$. Via the block matrix inversion identity [3] we find

$$R^{-1} = (1 - r^2)^{-1} \begin{bmatrix} E_1^{-1} & -r\sqrt{E_1^{-1} E_2^{-1}} \\ -r\left(\sqrt{E_1^{-1} E_2^{-1}}\right)^T & E_2^{-1} \end{bmatrix}. \qquad (19)$$

The entropy of the BLUE solution is

$$h(P) = \frac{k}{2} + \frac{k}{2}\ln(2\pi) + \frac{1}{2}\ln(|P|), \qquad (20)$$

where $P$ is defined by Equation 10. [1] Clearly, finding $r_{max}$, the cross-correlation coefficient that maximizes the entropy, is equivalent to finding the $r_{max}$ that maximizes $|P|$. For algebraic simplicity, we will examine $|P^{-1}|$,

$$P^{-1} = (1 - r^2)^{-1} (S - rZ) \qquad (21)$$

where we define

$$Z = \sqrt{E_1^{-1} E_2^{-1}} + \left(\sqrt{E_1^{-1} E_2^{-1}}\right)^T, \qquad (22)$$

$$S = E_1^{-1} + E_2^{-1}. \qquad (23)$$

To find the critical points of $|P^{-1}|$ with respect to $r$ we look at

$$\frac{d}{dr} |P^{-1}| = \frac{d}{dr} \left[(1 - r^2)^{-k} |S - rZ|\right] \qquad (24)$$

and via the Jacobi identity [11] and the derivative chain rule we find

$$\frac{d}{dr} |P^{-1}| = \frac{r^2 \,\mathrm{tr}(\mathrm{adj}(S - rZ) Z) + 2kr |S - rZ| - \mathrm{tr}(\mathrm{adj}(S - rZ) Z)}{(1 - r^2)^{k+1}} \qquad (25)$$

where $\mathrm{adj}(X)$ is the adjugate matrix of $X$ and $\mathrm{tr}(X)$ is the trace of $X$.

By looking at the values of $r$ for which $\frac{d}{dr}|P^{-1}| = 0$ we can determine the value $r_{max}$ that maximizes $|P|$ for positive coefficient values $0 < r \leq 1$. In doing so, we can find an interval $0 \leq r \leq r_{max} \leq 1$ where $|P|$ is monotonically increasing with respect to $r$. In other words, the entropy of our combined ellipse increases with respect to $r$. We will restrict ourselves to the range $0 \leq r \leq r_{max}$ when considering alternative algorithms besides convolving to enable ellipse combining.

It is also useful to compare a mismatch between a weight model based on a predicted $r$, say $r_p$, versus the actual unknown $r$, say we label it $r_n$. For the pairwise case we find that the covariance for the mismatched case given by Equation 9, now rewritten as a function of $r_p$ and $r_n$ instead of $W$ and $R$, becomes

$$P(r_p, r_n) = (S - r_p Z)^{-1} \left[(1 - 2r_p r_n + r_p^2) S + (r_n - 2r_p + r_n r_p^2) Z\right] (S - r_p Z)^{-1}. \qquad (26)$$

We note $P_c = P(0, 0)$ and introduce a new constant $P_{max}$ defined as

$$P_{max} = P(r_{max}, r_{max}). \qquad (27)$$

We can rewrite Equation 11 in terms of $r_p$ and $r_n$ instead of $W$ and $R$; we find

$$\alpha(r_p, r_n) = (1 - r_p^2)^{-k/2} \sqrt{\frac{\left|(1 - 2r_p r_n + r_p^2) S + (r_n - 2r_p + r_n r_p^2) Z\right|}{|S - r_p Z|}}. \qquad (28)$$

Finally, we can rewrite Equation 12 in terms of $r_p$ and $r_n$ as

$$\beta(r_p, r_n) = \sqrt{\frac{\left|(S - r_n Z)\left([1 - 2r_p r_n + r_p^2] S + [r_n - 2r_p + r_n r_p^2] Z\right)\right|}{(1 - r_n^2)^k |S - r_p Z|^2}}. \qquad (29)$$

The next two sections provide more specific discussion for 1-D and 2-D location estimates.
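Before specializing, the pairwise machinery above can be sketched in code. The functions below (ours) build $S$ and $Z$ per Equations 22 and 23, evaluate the mismatched covariance of Equation 26, and locate $r_{max}$ by a simple grid search on $|P|$ rather than by solving Equation 25 analytically; `scipy.linalg.sqrtm` supplies the principal matrix square root.

```python
import numpy as np
from scipy.linalg import sqrtm

def pairwise_terms(E1, E2):
    """S and Z of Equations 22-23."""
    G = np.real(sqrtm(np.linalg.inv(E1) @ np.linalg.inv(E2)))
    Z = G + G.T
    S = np.linalg.inv(E1) + np.linalg.inv(E2)
    return S, Z

def P_mismatched(r_p, r_n, S, Z):
    """Equation 26: covariance when weighting with r_p but the true value is r_n."""
    M = np.linalg.inv(S - r_p * Z)
    mid = (1 - 2*r_p*r_n + r_p**2) * S + (r_n - 2*r_p + r_n*r_p**2) * Z
    return M @ mid @ M

def r_max_grid(S, Z, steps=10_000):
    """Grid-search stand-in for the root of Equation 25; note P(r, r) matches Eq. 10."""
    rs = np.linspace(0.0, 1.0, steps, endpoint=False)
    dets = [np.linalg.det(P_mismatched(r, r, S, Z)) for r in rs]
    return rs[int(np.argmax(dets))]
```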
Pairwise Combine: The 1-D case, k = 1.

If we have $k = 1$, the covariance matrices $E_1$ and $E_2$ are scalars; we define

$$E_1 = \sigma_1^2, \qquad (30)$$

$$E_2 = \sigma_2^2. \qquad (31)$$

With these definitions, we find

$$Z = \frac{2}{\sigma_1 \sigma_2}, \qquad (32)$$

$$S = \frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2}. \qquad (33)$$

And Equation 25 simplifies to

$$\frac{d}{dr}|P^{-1}| = \frac{-Zr^2 + 2Sr - Z}{(1 - r^2)^2}. \qquad (34)$$

Setting the derivative of $|P^{-1}|$ to zero and restricting ourselves to the region $|r| < 1$, we find one critical point, and we can easily verify this point is a local maximum of $|P|$; we have

$$r_{max} = \min\left(\frac{\sigma_1}{\sigma_2}, \frac{\sigma_2}{\sigma_1}\right). \qquad (35)$$

We also find

$$\hat{x} = w_1 y_1 + w_2 y_2 \qquad (36)$$

where

$$w_1 = \frac{\sigma_2^2 - r\sigma_1\sigma_2}{\sigma_1^2 + \sigma_2^2 - 2r\sigma_1\sigma_2}, \qquad (37)$$

$$w_2 = \frac{\sigma_1^2 - r\sigma_1\sigma_2}{\sigma_1^2 + \sigma_2^2 - 2r\sigma_1\sigma_2}. \qquad (38)$$

We note that for values $r > r_{max}$ we find either $w_1$ or $w_2$ is less than zero. For the case where the cross-correlation coefficient is unknown and we are computing a weighted average, it is counter-intuitive to select an $r$ value that results in a negative weight. Throughout this paper, we avoid this by restricting ourselves to the region $0 \leq r \leq r_{max}$.

For a numerical example, let us consider $\sigma_1 = 1.0$ and $\sigma_2 = 2.0$. Equation 35 gives $r_{max} = 0.5$. Figure 1 shows the BLUE solution as a function of $r$. Figure 2 shows $\alpha$ for both $r_p = 0$ and $r_p = r_{max}$. As $|r_p - r_n|$ increases, $\alpha(0, r_n)$ also increases, becoming greater than 1, which means that if we convolve and compute the solution covariance assuming the variables are independent and they are not, our solution covariance error integral will be too small. In contrast, as $|r_p - r_n|$ increases, $\alpha(r_{max}, r_n)$ decreases, becoming slightly less than 1, which means that if we compute the combined solution assuming $r = r_{max}$, our computed covariance error integral will be approximately correct, and slightly too big if the variables are closer to independent. Finally, Figure 3 shows $\beta$ for both $r_p = 0$ and $r_p = r_{max}$. From this figure we note that as $|r_p - r_n|$ increases, both $\beta(0, r_n)$ and $\beta(r_{max}, r_n)$ increase. We also note that $\beta(r_{max}, r_n)$ increases faster than $\beta(0, r_n)$ as a function of $|r_p - r_n|$. This means that if we guess $r$ but are off compared to the true $r$, our algorithm will compute a covariance whose error integral is larger than theoretically possible, but we will be closer to the lower bound using weights based on $r = 0$ (convolving) compared to $r = r_{max}$.

Figure 1: Numerical example of the solution $P$ (defined in Equation 26) as a function of $r$. The solid line is the interval $0 \leq r \leq r_{max}$; we recommend only considering $r$ values in this region. The dotted line shows the region where $r_{max} < r \leq 1$.

Figure 2: Numerical example of $\alpha(r_p, r_n)$ as a function of $|r_p - r_n|$, where $0 \leq r_n \leq r_{max}$.

Figure 3: Numerical example of $\beta(r_p, r_n)$ as a function of $|r_p - r_n|$, where $0 \leq r_n \leq r_{max}$.
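The $k = 1$ case reduces to a few lines. The sketch below (ours) implements Equations 35, 37, and 38 and reproduces the $\sigma_1 = 1.0$, $\sigma_2 = 2.0$ example. Note the design consequence: at $r = r_{max}$ all weight falls on the lower-variance measurement.

```python
def combine_1d(y1, y2, s1, s2, r):
    """Pairwise 1-D combine with cross-correlation coefficient r."""
    denom = s1**2 + s2**2 - 2 * r * s1 * s2
    w1 = (s2**2 - r * s1 * s2) / denom   # Equation 37
    w2 = (s1**2 - r * s1 * s2) / denom   # Equation 38
    return w1 * y1 + w2 * y2, (w1, w2)

s1, s2 = 1.0, 2.0
r_max = min(s1 / s2, s2 / s1)            # Equation 35 -> 0.5
print(r_max, combine_1d(0.0, 1.0, s1, s2, r_max))  # weights (1.0, 0.0)
```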
Pairwise Combine: The 2-D case, k = 2.

For $2 \times 2$ ellipses, we find

$$|P| = \left[\frac{r^2|Z| - \lambda r + |S|}{(1 - r^2)^2}\right]^{-1} \qquad (39)$$

where

$$\lambda = \mathrm{tr}(S \,\mathrm{adj}(Z)). \qquad (40)$$

We find the $r_{max}$ value will be one of the solutions to the cubic equation

$$2|Z| r^3 - 3\lambda r^2 + (4|S| + 2|Z|) r - \lambda = 0. \qquad (41)$$

We can find solutions to the cubic equation above, throw out solutions where $|r| > 1$, and from the remaining solutions find $r_{max}$ and the interval $0 \leq r \leq r_{max}$ where $|P|$ is monotonically increasing with respect to $r$.

For a numerical example, let us consider

$$E_1 = \cdots, \qquad (42)$$

$$E_2 = \cdots. \qquad (43)$$

For this example Equation 41 becomes

$$\cdots \qquad (44)$$

and we find $r_{max} \approx \cdots$. Figure 4 shows the BLUE solution as a function of $r$. Figure 5 shows $\alpha$ for both $r_p = 0$ and $r_p = r_{max}$. Finally, Figure 6 shows $\beta$ for both $r_p = 0$ and $r_p = r_{max}$. While the exact values are different, the patterns we observed in the previous section for $k = 1$ are the same for $k = 2$. Namely, if we construct the weights and covariance solution estimate assuming $r = 0$, our constructed covariance error integral will be too small if the measurements' cross-correlation is nonzero. Conversely, if we construct the weights and covariance solution estimate assuming $r = r_{max}$ and $r$ is actually less than $r_{max}$, our constructed covariance error integral will be approximately correct but slightly too large. Additionally, the ratio between the theoretical best error integral and the one computed for a fixed value of $|r_p - r_n|$ is smaller when $r_p = 0$ compared to $r_p = r_{max}$.

Figure 4: Numerical example of the solution $P$ (defined in Equation 26) as a function of $r$. The solid line is the interval $0 \leq r \leq r_{max}$; we recommend only considering $r$ values in this region. The dotted line shows the region where $r_{max} < r \leq 1$.

Figure 5: Numerical example of $\alpha(r_p, r_n)$ as a function of $|r_p - r_n|$, where $0 \leq r_n \leq r_{max}$.

Figure 6: Numerical example of $\beta(r_p, r_n)$ as a function of $|r_p - r_n|$, where $0 \leq r_n \leq r_{max}$.
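For $k = 2$ the root-finding is also short. The following sketch (ours) forms $\lambda$ per Equation 40, solves the cubic of Equation 41 with `numpy.roots`, and keeps the admissible root that maximizes $|P|$; the leading cubic coefficients follow our reconstruction of Equation 41 above.

```python
import numpy as np
from scipy.linalg import sqrtm

def adj2(X):
    """Adjugate of a 2 x 2 matrix."""
    return np.array([[X[1, 1], -X[0, 1]], [-X[1, 0], X[0, 0]]])

def r_max_2d(E1, E2):
    G = np.real(sqrtm(np.linalg.inv(E1) @ np.linalg.inv(E2)))
    Z = G + G.T                                   # Equation 22
    S = np.linalg.inv(E1) + np.linalg.inv(E2)     # Equation 23
    lam = np.trace(S @ adj2(Z))                   # Equation 40
    dZ, dS = np.linalg.det(Z), np.linalg.det(S)
    roots = np.roots([2 * dZ, -3 * lam, 4 * dS + 2 * dZ, -lam])  # Equation 41
    real = roots[np.abs(roots.imag) < 1e-9].real
    candidates = real[(real >= 0) & (real <= 1)]  # throw out |r| > 1
    # Keep the admissible root that maximizes |P| = (1 - r^2)^2 / |S - r Z|.
    det_P = lambda r: (1 - r**2) ** 2 / np.linalg.det(S - r * Z)
    return max(candidates, key=det_P)
```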
Extending Ellipse Combining to n Ellipses

In general, we will have more than two ellipses we wish to combine. The extension is straightforward. We parameterize the joint system covariance matrix by a correlation coefficient between every pair of ellipses. We consider $n$ ellipses with covariance matrices $E_1, E_2, \ldots, E_j, \ldots, E_n$, all of size $k \times k$. Then, we define the joint covariance matrix parameterized by pairwise correlation coefficients as a block matrix consisting of $n \times n$ blocks. Each block is a square matrix equal in size to the individual ellipse covariance. The $i$ row, $j$ column sub-block is defined as

$$R_{i,i} = E_i, \qquad R_{i,j}|_{i \neq j} = r_{i,j}\sqrt{E_i E_j}.$$
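A sketch (ours) of the joint covariance just defined, assuming the completed block definition above; `r` is an $n \times n$ symmetric array of pairwise coefficients whose diagonal is ignored.

```python
import numpy as np
from scipy.linalg import sqrtm

def build_joint_R(E_list, r):
    """Joint covariance with E_i on the block diagonal and r_ij * sqrt(E_i E_j) off it."""
    n, k = len(E_list), E_list[0].shape[0]
    R = np.zeros((n * k, n * k))
    for i in range(n):
        R[i*k:(i+1)*k, i*k:(i+1)*k] = E_list[i]
        for j in range(i + 1, n):
            block = r[i, j] * np.real(sqrtm(E_list[i] @ E_list[j]))
            R[i*k:(i+1)*k, j*k:(j+1)*k] = block
            R[j*k:(j+1)*k, i*k:(i+1)*k] = block.T
    return R
```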
The convolving algorithm and resulting combined covariance are based on an assumption of independent and zero-mean measurements. If we have reason to believe that our observations are correlated or biased, we can adopt a different strategy. While there are numerous possible strategies, in this section we will discuss three.
Max Entropy BLUE.
Given $n$ observations $y_i$ and their respective covariance matrices $E_i$, we build the max entropy parameterized covariance matrix $R_{max}$ as defined by Equation 49. Then, we compute the BLUE solution. The resulting combined estimate and reported covariance are given by

$$\hat{x} = (A^T R_{max}^{-1} A)^{-1} A^T R_{max}^{-1} y, \qquad (51)$$

$$P_{max} = (A^T R_{max}^{-1} A)^{-1}. \qquad (52)$$

This algorithm assumes the cross-observation correlations are such as to make our estimate as uncertain as possible. Compared to the next proposed algorithm, this algorithm is computationally expensive, especially as the number of measurements $n$ increases. Computing $\hat{x}$ directly involves both finding $R_{max}$ and taking the inverse of $R_{max}$, a matrix of size $(nk) \times (nk)$. We note, one could also use $R_{pm}$ in place of $R_{max}$ in order to avoid the computational cost of finding $R_{max}$.
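As we read Equations 51 and 52, the algorithm is a direct BLUE evaluation against $R_{max}$. The sketch below (ours) reuses `build_joint_R` from the previous section; `r_max_pairs` stands in for the Equation 49 coefficients $r_{i,j,max}$, which we assume have been computed pairwise (for example, with the grid search shown earlier).

```python
import numpy as np

def max_entropy_blue(y_list, E_list, r_max_pairs):
    n, k = len(E_list), E_list[0].shape[0]
    R_max = build_joint_R(E_list, r_max_pairs)   # sketch from the previous section
    A = np.tile(np.eye(k), (n, 1))
    y = np.concatenate(y_list)
    Ri = np.linalg.pinv(R_max)                   # pseudo-inverse, per the text
    P_max = np.linalg.inv(A.T @ Ri @ A)          # Equation 52
    x_hat = P_max @ (A.T @ Ri @ y)               # Equation 51
    return x_hat, P_max
```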
Assume Max Entropy to Report Covariance; Convolve to Estimate the Solution.

Given $n$ observations $y_i$ and their respective covariance matrices $E_i$, we build the max entropy parameterized covariance matrix $R_{max}$ as defined by Equation 49. Next, we estimate the solution assuming the measurements are uncorrelated (i.e., convolve) and report the combined estimate covariance under the max entropy assumption. We define

$$P_c = \left(\sum_{i=1}^{n} E_i^{-1}\right)^{-1}, \qquad (53)$$

and

$$P_r = P_c \left(\sum_{i=1}^{n} \sum_{j=i+1}^{n} r_{i,j,max} \left[\sqrt{E_i^{-1} E_j^{-1}} + \sqrt{E_j^{-1} E_i^{-1}}\right]\right) P_c. \qquad (54)$$

Subsequently, we find in this algorithm $\hat{x}$ and the reported covariance $P$ as follows:

$$\hat{x} = P_c \left(\sum_{i=1}^{n} E_i^{-1} y_i\right), \qquad (55)$$

$$P = P_c + P_r. \qquad (56)$$

This algorithm is convolving, with the only difference being a modified reported covariance. The modified covariance matrix is a conservative estimate which assumes the max entropy joint covariance $R_{max}$. In contrast to the previous algorithm, $R_{max}$ is never inverted. Additionally, if one assumes that new measurements are coming in over time, it is easy to write this algorithm as an update to previous solutions with each new estimate.

Max Entropy Provided a Sum of Covariances per Measurement.

This approach requires that each observation covariance is specified as a linear sum of covariances, to include an independent portion and some number, say $m$, of correlated or bias portions:

$$E_i = B_0 + \sum_{a=1}^{m} B_a. \qquad (57)$$

Then we define the joint covariance model as

$$R = R_0 + \sum_{a=1}^{m} R_a(\vec{r}_a), \qquad (58)$$

where $\vec{r}_a$ is a vector of correlation coefficients for the $a$th covariance matrix in the sum. We define the number of covariance terms $m$ and their respective $\vec{r}_a$ based on the specific model we wish to adopt. We have defined $B_0$ to be the portion of each covariance that is completely independent, and therefore the corresponding joint covariance term $R_0$ is just a block diagonal matrix. After constructing $R$, the algorithm performs the best linear unbiased estimation. For example, we might have a bias covariance due to a common instrument used to make the observations, so for all measurements that come from that instrument we select the max entropy coefficient within $\vec{r}_a$, while if observations come from another instrument, we set the appropriate $\vec{r}_a$ term to zero. This approach requires that the measurements provided come with more than just a single covariance matrix to enable one to construct $R$. One could make these models very complex, with a large $m$ and different rules for building each $\vec{r}_a$, but doing so would require that each measurement be stored with all of these covariance terms. Specific approaches will depend on each application; we present one simple version of this approach which leverages the time of a measurement in the construction of $R$.

A simple version of this approach, which enables weight models that emphasize time diversity, is the case where we set $m = 1$; that is, assume that each observation is broken into an independent covariance term, $B_0$, and a bias or correlated bias term, $B_1$. Additionally, assume each measurement comes with a measurement time, $t_i$. Then for the bias vector $\vec{r}_1$ compute the pairwise component as

$$r_{i,j} = r_{i,j,max} \exp(-\gamma |t_i - t_j|), \qquad (59)$$

where $\gamma$ is a positive real constant. This model reflects a system where measurements close in time are correlated, but this correlation decreases as the time between measurements increases.
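One possible rendering (ours) of the last two strategies together: the reported-covariance update of Equations 53 through 56, with the time-decayed coefficients of Equation 59 standing in for the pairwise entries. The constant `gamma` and the measurement times are assumptions for illustration.

```python
import numpy as np
from scipy.linalg import sqrtm

def convolve_report_max_entropy(y_list, E_list, r):
    """r: n x n symmetric array of pairwise coefficients (Eq. 59 values or r_max)."""
    E_invs = [np.linalg.inv(E) for E in E_list]
    P_c = np.linalg.inv(sum(E_invs))                              # Equation 53
    x_hat = P_c @ sum(Ei @ yi for Ei, yi in zip(E_invs, y_list))  # Equation 55
    n = len(E_list)
    inner = np.zeros_like(P_c)
    for i in range(n):
        for j in range(i + 1, n):
            G = np.real(sqrtm(E_invs[i] @ E_invs[j]))
            inner += r[i, j] * (G + G.T)                          # Equation 54
    P_r = P_c @ inner @ P_c
    return x_hat, P_c + P_r                                       # Equation 56

def time_decay_coeffs(r_max_pairs, times, gamma):
    """Equation 59: r_ij = r_ij_max * exp(-gamma * |t_i - t_j|)."""
    t = np.asarray(times)
    return r_max_pairs * np.exp(-gamma * np.abs(t[:, None] - t[None, :]))
```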
Conclusion

Measurement combining via convolving assumes the provided measurements are uncorrelated. When the measurements are not uncorrelated, the reported covariance that comes from convolving is often too small. We show an approach to construct joint measurement covariance matrices parameterized by measurement pairwise correlation coefficients. Finally, based on this approach, we present three alternatives to convolving when the full joint covariance is not provided and must be constructed.

In practice, linear uncorrelated location estimates almost never exist. When we assume estimates are uncorrelated, the combined estimated covariance determinant shrinks to zero as the number of measurements increases. In contrast, this is not true for the alternative algorithms presented in this paper. In real-world applications, believing we can achieve arbitrarily precise estimates by increasing the number of measurements is naive (for further discussion and examples see [8, 16, 13]). We therefore strongly recommend algorithms like the ones discussed in this paper in place of convolving. While this paper focused on cases where the mapping of a single measurement vector to the solution space is the identity matrix, the concepts and ideas are extendable to other design matrices.

Lastly, we note that all algorithms presented are linear. It is well known that linear estimates are particularly sensitive to outliers or data contamination. Appropriate modifications to the presented algorithms to make them robust are recommended.

References
[1] Nabil Ali Ahmed and D. V. Gokhale. Entropy expressions and their estimators for multivariate distributions. IEEE Transactions on Information Theory, 35(3):688–692, 1989.

[2] Alexander C. Aitken. IV.—On least squares and linear combination of observations. Proceedings of the Royal Society of Edinburgh, 55:42–48, 1936.

[3] Tadeusz Banachiewicz. Zur Berechnung der Determinanten, wie auch der Inversen, und zur darauf basierten Auflösung der Systeme linearer Gleichungen. Acta Astronom. Ser. C, 3:41–67, 1937.

[4] Mitra Basu and Tin Kam Ho. Data Complexity in Pattern Recognition, page 181. Springer Science & Business Media, 2006.

[5] Rainer Beck, Anvar Shukurov, Dmitry Sokoloff, and Richard Wielebinski. Systematic bias in interstellar magnetic field estimates. Astronomy & Astrophysics, 411(2):99–107, 2003.

[6] Jean Gallier. The Schur complement and symmetric positive semidefinite (and definite) matrices. 2019.

[7] Frank R. Hampel, Elvezio M. Ronchetti, Peter J. Rousseeuw, and Werner A. Stahel. Robust Statistics: The Approach Based on Influence Functions, volume 196, pages 22–25. John Wiley & Sons, 2011.

[8] Frank R. Hampel, Elvezio M. Ronchetti, Peter J. Rousseeuw, and Werner A. Stahel. Robust Statistics: The Approach Based on Influence Functions, volume 196, pages 387–397. John Wiley & Sons, 2011.

[9] Nicholas J. Higham. Computing real square roots of a real matrix. Linear Algebra and its Applications, 88:405–430, 1987.

[10] Harold Jeffreys. The Theory of Probability. OUP Oxford, 1998.

[11] J. R. Magnus and H. Neudecker. Matrix Differential Calculus with Applications in Statistics and Econometrics, pages 149–150. Wiley, 1999. ISBN 9780471986331.

[12] Prasanta Chandra Mahalanobis. On the generalized distance in statistics. National Institute of Science of India, 1936.

[13] Frederick Mosteller, John Wilder Tukey, et al. Data Analysis and Regression: A Second Course in Statistics, chapter Hunting Out the Real Uncertainty. 1977.

[14] Thuan Nguyen and Jiming Jiang. Simple estimation of hidden correlation in repeated measures. Statistics in Medicine, 30(29):3403–3415, 2011.

[15] Karl Pearson. On the mathematical theory of errors of judgement, with special reference to the personal equation. Proceedings of the Royal Society of London, 68(442–450):369–372, 1901.

[16] W. J. Youden. Enduring values. Technometrics, 14(1):1–11, 1972.