Ellipse Combining with Unknown Cross-Ellipse Correlations
Adam Hall
We discuss the combining of measurements where single-measurement covariances are given but the joint measurement covariance is unknown. For this paper we assume the mapping of a single measurement to the solution space is the identity matrix. We examine the solution when it is assumed all measurements are uncorrelated. We then present a way to parameterize the joint measurement covariance based on pairwise correlation coefficients. Finally, we discuss how to use this parameterization to combine the measurements.
Introduction
In this paper, we are going to examine the linear combining, or fusion, of location estimates. Each estimate is a vector of size $k \times 1$, and for each estimate we are also provided a $k \times k$ covariance matrix. We will assume that the overall joint system covariance, which describes the correlation between location estimates, is not provided. Specifically, we consider the generalized least squares problem of finding an estimate $\hat{x}$ given the following model

$$y = Ax + \epsilon, \qquad (1)$$

where $y$ is a column-stacked vector of $n$, $k \times 1$ estimates; each $k \times 1$ vector is denoted as $y_i$:

$$y = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} \qquad (2)$$

and $A$, our design matrix, is a block matrix, a stack of $n$, $k \times k$ identity matrices:

$$A = \begin{bmatrix} I_{k \times k} \\ \vdots \\ I_{k \times k} \end{bmatrix}. \qquad (3)$$

The size of $A$ is $nk \times k$. Finally, $\epsilon$ is a zero-mean Gaussian process with a covariance matrix that overall is unknown but whose block diagonal is provided by the $y_i$ estimated covariances, denoted as $E_i$, where $E_i$ is a $k \times k$ matrix. We define the joint covariance matrix $R$ as a block matrix as follows:

$$E\left(\epsilon \epsilon^T\right) = R = \begin{bmatrix} E_1 & E\left(y_1 y_2^T\right) & \cdots & E\left(y_1 y_n^T\right) \\ E\left(y_2 y_1^T\right) & E_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & \vdots \\ E\left(y_n y_1^T\right) & \cdots & \cdots & E_n \end{bmatrix} \qquad (4)$$

where $E\left(y_i y_j^T\right)$ is not known when $i \neq j$.

At first this model may look restrictively specific because the design matrix $A$ consists of a column stack of identity matrices. Often in practice measurements must be transformed via a design matrix $A$ which is not a column stack of identity matrices. However, the model above is applicable to data fusion problems where the raw measurements are unavailable. Assume for example that we obtain $n$ estimates of the $k \times 1$ parameters, where each $y_i$ estimate was provided by some black-box processor. Ideally we would have access to the raw measurements the black-box processor used to produce the $n$ estimates. However, this is often not possible; thus, we have the problem presented and discussed in this paper.

In this paper, we will examine the best linear unbiased estimator (BLUE) for this model under various constructions of the joint unknown covariance matrix $R$. First, we will examine the trivial solution when one assumes that all cross-estimate correlations are zero, $E\left(y_i y_j^T\right)|_{i \neq j} = 0$. This approach is sometimes called ellipse convolving. In practice, assuming datasets to be completely uncorrelated often produces solution covariance estimates that are too small because the assumption of uncorrelated datasets is often wrong. Some examples of correlation found in datasets thought to be uncorrelated are found in the following sources [15, 10, 7, 5, 14]. We will then look at an approach to parameterize the covariance matrix by cross-correlation coefficients. Under this parameterization we will solve for the max-entropy solution as a function of the cross-correlation coefficient for combining two estimates $y_1$ and $y_2$. Finally, we will extend this pairwise approach to $n$ estimates.
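To make the model concrete, the following Python sketch (ours, not from the paper) assembles the stacked vector $y$, the design matrix $A$, and the known block diagonal of $R$; the sizes `k` and `n` and the random example values are assumptions chosen purely for illustration.

```python
import numpy as np

k, n = 2, 3                                          # parameter size and number of estimates
rng = np.random.default_rng(0)
estimates = [rng.normal(size=k) for _ in range(n)]   # the y_i, one k-vector each
covariances = [np.eye(k) * (i + 1) for i in range(n)]  # the E_i, one k x k matrix each

y = np.concatenate(estimates)        # column-stacked y, size nk (Equation 2)
A = np.tile(np.eye(k), (n, 1))       # stack of n identity matrices, size nk x k (Equation 3)

# Only the block diagonal of R is known; the off-diagonal blocks E(y_i y_j^T)
# are the unknowns this paper parameterizes.
R_block_diag = np.zeros((n * k, n * k))
for i, E_i in enumerate(covariances):
    R_block_diag[i*k:(i+1)*k, i*k:(i+1)*k] = E_i
```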
Use of the Terms: Ellipse, Error Integral and Entropy.

The probability density function of the $y_i$ measurement with covariance $E_i$ is

$$f(y_i) = \left([2\pi]^k |E_i|\right)^{-1/2} \exp\left(-\frac{1}{2}(y - y_i)^T E_i^{-1} (y - y_i)\right). \qquad (5)$$
We can define the contour where $f(y_i)$ is equal to a constant by finding all points whose Mahalanobis distance is the same. In other words, we define the contour as all points $y$ such that

$$\sqrt{(y - y_i)^T E_i^{-1} (y - y_i)} = C, \qquad (6)$$

where $C$ is a positive constant. [12] It is well known that this contour is the equation of an ellipse for $k = 2$ and in general an ellipsoid. For this reason we use the capital $E$ to denote the covariance and refer to combining measurements as Ellipse Combining. Furthermore, for $k = 2$ the area of the ellipse (or for $k = 3$ the volume of the ellipsoid), defined as the region where

$$\sqrt{(y - y_i)^T E_i^{-1} (y - y_i)} < C, \qquad (7)$$

is proportional to $\sqrt{|E_i|}$. [4] In this paper we will say that $\sqrt{|E|}$ is proportional to the error integral of the ellipse $E$. Finally, throughout this paper we will refer to the differential entropy of a continuous random variable simply as the entropy.
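As a numeric sanity check of the error-integral claim (our own illustration, not from the paper): for $k = 2$ the area enclosed by the contour of Equation 7 is $\pi C^2 \sqrt{|E|}$, which the following Monte Carlo sketch confirms. The matrix `E`, the constant `C`, and the sample count are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
E = np.array([[4.0, 1.0], [1.0, 2.0]])
C = 1.5

# Monte Carlo: sample a box that contains the ellipse and count interior points.
half_width = 2.0 * C * np.sqrt(np.diag(E)).max()   # crude bounding half-width
pts = rng.uniform(-half_width, half_width, size=(200_000, 2))
E_inv = np.linalg.inv(E)
inside = np.einsum("ij,jk,ik->i", pts, E_inv, pts) < C**2
area_mc = inside.mean() * (2 * half_width) ** 2

area_analytic = np.pi * C**2 * np.sqrt(np.linalg.det(E))
print(area_mc, area_analytic)   # the two agree to Monte Carlo error
```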
Review of Generalized Least Squares.

For reference and completeness, when the covariance matrix is fully specified and one combines the measurements using a positive semidefinite weight matrix $W$, we have the well-known results [2]

$$\hat{x} = (A^T W A)^{-1} A^T W y, \qquad (8)$$

$$E\left(\hat{x}\hat{x}^T\right) = P(W, R) = (A^T W A)^{-1} A^T W R W A (A^T W A)^{-1}. \qquad (9)$$

The best linear unbiased estimator (BLUE) is found by setting $W = R^{-1}$, and for this specific case Equation 9 simplifies to

$$E\left(\hat{x}\hat{x}^T\right) = P(R^{-1}, R) = (A^T R^{-1} A)^{-1}. \qquad (10)$$

As defined so far, the BLUE solution above requires that $R$ be positive definite. We can extend the solution to incorporate an $R$ which is only positive semidefinite by setting $W = R^{\dagger}$, where $X^{\dagger}$ is defined as the pseudo-inverse of a positive semidefinite matrix $X$. [6] In the case where $R$ is only positive semidefinite we also require that the solution covariance $P$ be positive definite. For the rest of this paper, for simplicity, we will denote the pseudo-inverse of $R$ as $R^{-1}$ and not $R^{\dagger}$. To summarize, in this paper we require that $R$ be positive semidefinite such that the resulting solution $P$ is strictly positive definite, and $R^{-1}$ will be used to denote the pseudo-inverse of $R$.

Defining α and β to compare models.

We can compare $P(W, R)$ for various $W$ and $R$. For example, assume that we have some guess or estimate of the joint covariance denoted $\hat{R}$. We then set $W^{-1} = \hat{R}$, compute a solution, and report a solution covariance via Equation 9 setting $R = \hat{R}$. Our reported solution covariance will be mismatched if $R \neq \hat{R}$. To describe this mismatch we define

$$\alpha(W, R) = \sqrt{\frac{|P(W, R)|}{|P(W, W^{-1})|}}. \qquad (11)$$

The scalar $\alpha(W, R)$ is the number of times smaller or bigger the error integral of our estimate is compared to what we report.

We also define

$$\beta(W, R) = \sqrt{\frac{|P(W, R)|}{|P(R^{-1}, R)|}}. \qquad (12)$$

The scalar $\beta(W, R)$ is the number of times bigger the error integral of our estimate is compared to the BLUE solution. The scalar $\beta(W, R) \geq 1$.

The scalar $\alpha(W, R)$ represents a mismatch between the reported covariance and the actual covariance. The scalar $\beta(W, R)$ represents the number of times bigger our solution error integral is compared to what it could be if we fully knew $R$ and applied the BLUE weights (i.e., set $W = R^{-1}$). There are other metrics and methods to compare $P(W, R)$ for various $W$ and $R$; nevertheless, we will use $\alpha$ and $\beta$ as defined above as our metrics for comparison.
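The quantities of Equations 8 through 12 translate directly into code. The following functions (our transcription; the names `solution_cov`, `alpha`, and `beta` are our own) compute $P(W, R)$ and the two comparison metrics, assuming dense NumPy arrays and using the pseudo-inverse where the text allows a positive semidefinite $R$.

```python
import numpy as np

def solution_cov(W, R, A):
    """P(W, R) of Equation 9."""
    N = np.linalg.inv(A.T @ W @ A)
    return N @ A.T @ W @ R @ W @ A @ N

def alpha(W, R, A):
    """Equation 11: reported vs. actual error-integral ratio."""
    P_actual = solution_cov(W, R, A)
    P_reported = solution_cov(W, np.linalg.pinv(W), A)
    return np.sqrt(np.linalg.det(P_actual) / np.linalg.det(P_reported))

def beta(W, R, A):
    """Equation 12: actual vs. best-possible (BLUE) error-integral ratio."""
    P_actual = solution_cov(W, R, A)
    P_blue = np.linalg.inv(A.T @ np.linalg.pinv(R) @ A)
    return np.sqrt(np.linalg.det(P_actual) / np.linalg.det(P_blue))
```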
Ellipse Convolving

We can construct $R$ as an $n \times n$ block matrix where each block is $k \times k$; the size of $R$ is $(nk) \times (nk)$. We denote the $i$th row and $j$th column block matrix by $R_{i,j}$. If we assume that all non-diagonal blocks consist of null matrices, we have

$$R_{i,i} = E_i, \qquad (13)$$

$$R_{i,j}|_{j \neq i} = 0_{k \times k}. \qquad (14)$$

Then we find the BLUE solution is

$$\hat{x} = (A^T R^{-1} A)^{-1} A^T R^{-1} y = \left(\sum_{i=1}^{n} E_i^{-1}\right)^{-1} \left(\sum_{i=1}^{n} E_i^{-1} y_i\right) \qquad (15)$$

with covariance

$$P = (A^T R^{-1} A)^{-1} = \left(\sum_{i=1}^{n} E_i^{-1}\right)^{-1}. \qquad (16)$$

For the rest of this paper, we will denote this solution as $\hat{x}_c$ and its covariance as $P_c$, where the subscript $c$ is for ellipse convolving. The underlying assumption is that all provided estimates are independent (and linear). We will consider this the best-case scenario and as a result treat $|P_c|$ as a lower bound. In practice, estimates will almost never be completely independent and linear. For example, our estimates might all be off by some unknown bias, which we can treat as a nonzero positive cross correlation between all the estimates. In exploring these other covariance models, we will find that, strictly speaking, $P_c$ is not always the most optimistic or best-case scenario. In other words, there are plenty of cases where one can design $R$ to have nonzero positive cross-estimate correlation values whose BLUE combined estimate covariance $P$ is such that $|P| < |P_c|$. However, given that cross-estimate correlations are unknown, why select weights that reflect the cross-estimate correlations working together to improve your estimates compared to zero cross correlation? Based on this reasoning, such matrices that theoretically outperform convolving are rejected. Instead, this paper will investigate covariance matrices that result in $|P| \geq |P_c|$.
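A minimal sketch of ellipse convolving (Equations 15 and 16), assuming a Python list of $k$-vectors and a matching list of $k \times k$ covariance matrices:

```python
import numpy as np

def convolve_ellipses(y_list, E_list):
    """Ellipse convolving: BLUE under the all-blocks-independent assumption."""
    E_invs = [np.linalg.inv(E) for E in E_list]
    P_c = np.linalg.inv(sum(E_invs))                              # Equation 16
    x_hat = P_c @ sum(Ei @ yi for Ei, yi in zip(E_invs, y_list))  # Equation 15
    return x_hat, P_c
```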
Pairwise Ellipse Combining Parameterized by a Scalar Cross-Ellipse Correlation Coefficient

The previous section showed that ellipse convolving is the best linear unbiased estimator (i.e., BLUE) for combining measurements under the assumption that all cross-ellipse correlation values are zero. This section will investigate BLUE for combining two ellipses whose cross-ellipse correlation is parameterized by a scalar cross-correlation coefficient $r$. Then we will find the cross-correlation coefficient $r_{max}$ which maximizes the entropy of the combined ellipse solution.

We start by considering the combining of two ellipses with covariance matrices $E_1$ and $E_2$ respectively and a joint system covariance matrix

$$R = \begin{bmatrix} E_1 & r\sqrt{E_1 E_2} \\ r\left(\sqrt{E_1 E_2}\right)^T & E_2 \end{bmatrix} \qquad (17)$$

where we define the square root of a matrix $X$ as the principal square root. [9] The Schur complement of the $E_1$ block matrix of the matrix $R$ is

$$R/E_1 = E_2 - r^2 \left(\sqrt{E_1 E_2}\right)^T E_1^{-1} \sqrt{E_1 E_2} = (1 - r^2) E_2. \qquad (18)$$

Clearly the matrix $R/E_1$ is positive semidefinite if and only if $|r| \leq 1$. Or, equivalently, $R$ is positive semidefinite if and only if $|r| \leq 1$. [6] Also, if $|r| = 1$, then $|R| = 0$. Via the block matrix inversion identity [3] we find

$$R^{-1} = (1 - r^2)^{-1} \begin{bmatrix} E_1^{-1} & -r\sqrt{E_1^{-1} E_2^{-1}} \\ -r\left(\sqrt{E_1^{-1} E_2^{-1}}\right)^T & E_2^{-1} \end{bmatrix}. \qquad (19)$$

The entropy of the BLUE solution is

$$h(P) = \frac{k}{2} + \frac{k}{2}\ln(2\pi) + \frac{1}{2}\ln(|P|), \qquad (20)$$

where $P$ is defined by Equation 10. [1] Clearly, finding $r_{max}$, the cross-correlation coefficient that maximizes the entropy, is equivalent to finding the $r_{max}$ that maximizes $|P|$. For algebraic simplicity, we will examine $|P^{-1}|$,

$$P^{-1} = (1 - r^2)^{-1} (S - rZ) \qquad (21)$$

where we define

$$Z = \sqrt{E_1^{-1} E_2^{-1}} + \left(\sqrt{E_1^{-1} E_2^{-1}}\right)^T, \qquad (22)$$

$$S = E_1^{-1} + E_2^{-1}. \qquad (23)$$

To find the critical points of $|P^{-1}|$ with respect to $r$ we look at

$$\frac{d}{dr} |P^{-1}| = \frac{d}{dr} \left[(1 - r^2)^{-k} |S - rZ|\right] \qquad (24)$$

and via the Jacobi identity [11] and the derivative chain rule we find

$$\frac{d}{dr} |P^{-1}| = \frac{r^2 \,\mathrm{tr}(\mathrm{adj}(S - rZ) Z) + 2kr |S - rZ| - \mathrm{tr}(\mathrm{adj}(S - rZ) Z)}{(1 - r^2)^{k+1}} \qquad (25)$$

where $\mathrm{adj}(X)$ is the adjugate matrix of $X$ and $\mathrm{tr}(X)$ is the trace of $X$.

By looking at the values of $r$ for which $\frac{d}{dr}|P^{-1}| = 0$ we can determine the value $r_{max}$ that maximizes $|P|$ for positive coefficient values $0 < r \leq 1$. In doing so, we can find an interval $0 \leq r \leq r_{max} \leq 1$ where $|P|$ is monotonically increasing with respect to $r$. In other words, the entropy of our combined ellipse increases with respect to $r$. We will restrict ourselves to the range $0 \leq r \leq r_{max}$ when considering alternative algorithms besides convolving to enable ellipse combining.

It is also useful to compare a mismatch between a weight model based on a predicted $r$, say $r_p$, versus the actual unknown $r$, say we label it $r_n$. For the pairwise case we find that the covariance for the mismatched case given by Equation 9, now rewritten as a function of $r_p$ and $r_n$ instead of $W$ and $R$, becomes

$$P(r_p, r_n) = (S - r_p Z)^{-1} \left[(1 - 2r_p r_n + r_p^2) S + (r_n - 2r_p + r_n r_p^2) Z\right] (S - r_p Z)^{-1}. \qquad (26)$$

We note $P_c = P(0, 0)$ and introduce a new constant $P_{max}$ defined as

$$P_{max} = P(r_{max}, r_{max}). \qquad (27)$$

We can rewrite Equation 11 in terms of $r_p$ and $r_n$ instead of $W$ and $R$; we find

$$\alpha(r_p, r_n) = (1 - r_p^2)^{-k/2} \sqrt{\frac{\left|(1 - 2r_p r_n + r_p^2) S + (r_n - 2r_p + r_n r_p^2) Z\right|}{|S - r_p Z|}}. \qquad (28)$$

Finally, we can rewrite Equation 12 in terms of $r_p$ and $r_n$ as

$$\beta(r_p, r_n) = \sqrt{\frac{\left|(S - r_n Z)\left([1 - 2r_p r_n + r_p^2] S + [r_n - 2r_p + r_n r_p^2] Z\right)\right|}{(1 - r_n^2)^k |S - r_p Z|^2}}. \qquad (29)$$

The next two sections provide more specific discussion for 1-D and 2-D location estimates.
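Before specializing, the pairwise machinery above can be sketched in code. The functions below (ours) build $S$ and $Z$ per Equations 22 and 23, evaluate the mismatched covariance of Equation 26, and locate $r_{max}$ by a simple grid search on $|P|$ rather than by solving Equation 25 analytically; `scipy.linalg.sqrtm` supplies the principal matrix square root.

```python
import numpy as np
from scipy.linalg import sqrtm

def pairwise_terms(E1, E2):
    """S and Z of Equations 22-23."""
    G = np.real(sqrtm(np.linalg.inv(E1) @ np.linalg.inv(E2)))
    Z = G + G.T
    S = np.linalg.inv(E1) + np.linalg.inv(E2)
    return S, Z

def P_mismatched(r_p, r_n, S, Z):
    """Equation 26: covariance when weighting with r_p but the true value is r_n."""
    M = np.linalg.inv(S - r_p * Z)
    mid = (1 - 2*r_p*r_n + r_p**2) * S + (r_n - 2*r_p + r_n*r_p**2) * Z
    return M @ mid @ M

def r_max_grid(S, Z, steps=10_000):
    """Grid-search stand-in for the root of Equation 25; note P(r, r) matches Eq. 10."""
    rs = np.linspace(0.0, 1.0, steps, endpoint=False)
    dets = [np.linalg.det(P_mismatched(r, r, S, Z)) for r in rs]
    return rs[int(np.argmax(dets))]
```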
Pairwise Combine: The 1-D case, k = 1.

If we have $k = 1$, the covariance matrices $E_1$ and $E_2$ are scalars; we define

$$E_1 = \sigma_1^2, \qquad (30)$$

$$E_2 = \sigma_2^2. \qquad (31)$$

With these definitions, we find

$$Z = \frac{2}{\sigma_1 \sigma_2}, \qquad (32)$$

$$S = \frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2}. \qquad (33)$$

And Equation 25 simplifies to

$$\frac{d}{dr}|P^{-1}| = \frac{-Zr^2 + 2Sr - Z}{(1 - r^2)^2}. \qquad (34)$$

Setting the derivative of $|P^{-1}|$ to zero and restricting ourselves to the region $|r| < 1$, we find one critical point, and we can easily verify this point is a local maximum of $|P|$; we have

$$r_{max} = \min\left(\frac{\sigma_1}{\sigma_2}, \frac{\sigma_2}{\sigma_1}\right). \qquad (35)$$

We also find

$$\hat{x} = w_1 y_1 + w_2 y_2 \qquad (36)$$

where

$$w_1 = \frac{\sigma_2^2 - r\sigma_1\sigma_2}{\sigma_1^2 + \sigma_2^2 - 2r\sigma_1\sigma_2}, \qquad (37)$$

$$w_2 = \frac{\sigma_1^2 - r\sigma_1\sigma_2}{\sigma_1^2 + \sigma_2^2 - 2r\sigma_1\sigma_2}. \qquad (38)$$

We note that for values $r > r_{max}$ we find either $w_1$ or $w_2$ is less than zero. For the case where the cross-correlation coefficient is unknown and we are computing a weighted average, it is counter-intuitive to select an $r$ value that results in a negative weight. Throughout this paper, we avoid this by restricting ourselves to the region $0 \leq r \leq r_{max}$.

For a numerical example, let us consider $\sigma_1 = 1.0$ and $\sigma_2 = 2.0$. Equation 35 gives $r_{max} = 0.5$. Figure 1 shows the BLUE solution as a function of $r$. Figure 2 shows $\alpha$ for both $r_p = 0$ and $r_p = r_{max}$. As $|r_p - r_n|$ increases, $\alpha(0, r_n)$ also increases, becoming greater than 1, which means that if we convolve and compute the solution covariance assuming the variables are independent and they are not, our solution covariance error integral will be too small. In contrast, as $|r_p - r_n|$ increases, $\alpha(r_{max}, r_n)$ decreases, becoming slightly less than 1, which means that if we compute the combined solution assuming $r = r_{max}$, our computed covariance error integral will be approximately correct, and slightly too big if the variables are closer to independent. Finally, Figure 3 shows $\beta$ for both $r_p = 0$ and $r_p = r_{max}$. From this figure we note that as $|r_p - r_n|$ increases, both $\beta(0, r_n)$ and $\beta(r_{max}, r_n)$ increase. We also note that $\beta(r_{max}, r_n)$ increases faster than $\beta(0, r_n)$ as a function of $|r_p - r_n|$. This means that if we guess $r$ but are off compared to the true $r$, our algorithm will compute a covariance whose error integral is larger than theoretically possible, but we will be closer to the lower bound using weights based on $r = 0$ (convolving) compared to $r = r_{max}$.

Figure 1: Numerical example of the solution $P$ (defined in Equation 26) as a function of $r$. The solid line is the interval $0 \leq r \leq r_{max}$; we recommend only considering $r$ values in this region. The dotted line shows the region where $r_{max} < r \leq 1$.

Figure 2: Numerical example of $\alpha(r_p, r_n)$ as a function of $|r_p - r_n|$, where $0 \leq r_n \leq r_{max}$.

Figure 3: Numerical example of $\beta(r_p, r_n)$ as a function of $|r_p - r_n|$, where $0 \leq r_n \leq r_{max}$.
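The $k = 1$ case reduces to a few lines. The sketch below (ours) implements Equations 35, 37, and 38 and reproduces the $\sigma_1 = 1.0$, $\sigma_2 = 2.0$ example. Note the design consequence: at $r = r_{max}$ all weight falls on the lower-variance measurement.

```python
def combine_1d(y1, y2, s1, s2, r):
    """Pairwise 1-D combine with cross-correlation coefficient r."""
    denom = s1**2 + s2**2 - 2 * r * s1 * s2
    w1 = (s2**2 - r * s1 * s2) / denom   # Equation 37
    w2 = (s1**2 - r * s1 * s2) / denom   # Equation 38
    return w1 * y1 + w2 * y2, (w1, w2)

s1, s2 = 1.0, 2.0
r_max = min(s1 / s2, s2 / s1)            # Equation 35 -> 0.5
print(r_max, combine_1d(0.0, 1.0, s1, s2, r_max))  # weights (1.0, 0.0)
```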
Pairwise Combine: The 2-D case, k = 2.

For $2 \times 2$ ellipses, we find

$$|P| = \left[\frac{r^2|Z| - \lambda r + |S|}{(1 - r^2)^2}\right]^{-1} \qquad (39)$$

where

$$\lambda = \mathrm{tr}(S \,\mathrm{adj}(Z)). \qquad (40)$$

We find the $r_{max}$ value will be one of the solutions to the cubic equation

$$2|Z| r^3 - 3\lambda r^2 + (4|S| + 2|Z|) r - \lambda = 0. \qquad (41)$$

We can find solutions to the cubic equation above, throw out solutions where $|r| > 1$, and from the remaining solutions find $r_{max}$ and the interval $0 \leq r \leq r_{max}$ where $|P|$ is monotonically increasing with respect to $r$.

For a numerical example, let us consider

$$E_1 = \cdots, \qquad (42)$$

$$E_2 = \cdots. \qquad (43)$$

For this example Equation 41 becomes

$$\cdots \qquad (44)$$

and we find $r_{max} \approx \cdots$. Figure 4 shows the BLUE solution as a function of $r$. Figure 5 shows $\alpha$ for both $r_p = 0$ and $r_p = r_{max}$. Finally, Figure 6 shows $\beta$ for both $r_p = 0$ and $r_p = r_{max}$. While the exact values are different, the patterns we observed in the previous section for $k = 1$ are the same for $k = 2$. Namely, if we construct the weights and covariance solution estimate assuming $r = 0$, our constructed covariance error integral will be too small if the measurements' cross-correlation is nonzero. Conversely, if we construct the weights and covariance solution estimate assuming $r = r_{max}$ and $r$ is actually less than $r_{max}$, our constructed covariance error integral will be approximately correct but slightly too large. Additionally, the ratio between the theoretical best error integral and the one computed for a fixed value of $|r_p - r_n|$ is smaller when $r_p = 0$ compared to $r_p = r_{max}$.

Figure 4: Numerical example of the solution $P$ (defined in Equation 26) as a function of $r$. The solid line is the interval $0 \leq r \leq r_{max}$; we recommend only considering $r$ values in this region. The dotted line shows the region where $r_{max} < r \leq 1$.

Figure 5: Numerical example of $\alpha(r_p, r_n)$ as a function of $|r_p - r_n|$, where $0 \leq r_n \leq r_{max}$.

Figure 6: Numerical example of $\beta(r_p, r_n)$ as a function of $|r_p - r_n|$, where $0 \leq r_n \leq r_{max}$.
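For $k = 2$ the root-finding is also short. The following sketch (ours) forms $\lambda$ per Equation 40, solves the cubic of Equation 41 with `numpy.roots`, and keeps the admissible root that maximizes $|P|$; the leading cubic coefficients follow our reconstruction of Equation 41 above.

```python
import numpy as np
from scipy.linalg import sqrtm

def adj2(X):
    """Adjugate of a 2 x 2 matrix."""
    return np.array([[X[1, 1], -X[0, 1]], [-X[1, 0], X[0, 0]]])

def r_max_2d(E1, E2):
    G = np.real(sqrtm(np.linalg.inv(E1) @ np.linalg.inv(E2)))
    Z = G + G.T                                   # Equation 22
    S = np.linalg.inv(E1) + np.linalg.inv(E2)     # Equation 23
    lam = np.trace(S @ adj2(Z))                   # Equation 40
    dZ, dS = np.linalg.det(Z), np.linalg.det(S)
    roots = np.roots([2 * dZ, -3 * lam, 4 * dS + 2 * dZ, -lam])  # Equation 41
    real = roots[np.abs(roots.imag) < 1e-9].real
    candidates = real[(real >= 0) & (real <= 1)]  # throw out |r| > 1
    # Keep the admissible root that maximizes |P| = (1 - r^2)^2 / |S - r Z|.
    det_P = lambda r: (1 - r**2) ** 2 / np.linalg.det(S - r * Z)
    return max(candidates, key=det_P)
```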
Extending Ellipse Combining to n Ellipses

In general, we will have more than two ellipses we wish to combine. The extension is straightforward. We parameterize the joint system covariance matrix by a correlation coefficient between every pair of ellipses. We consider $n$ ellipses with covariance matrices $E_1, E_2, \ldots, E_j, \ldots, E_n$, all of size $k \times k$. Then, we define the joint covariance matrix parameterized by pairwise correlation coefficients as a block matrix consisting of $n \times n$ blocks. Each block is a square matrix equal in size to the individual ellipse covariance. The $i$ row, $j$ column sub-block is defined as

$$R_{i,i} = E_i, \qquad R_{i,j}|_{i \neq j} = r_{i,j}\sqrt{E_i E_j}.$$
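A sketch (ours) of the joint covariance just defined, assuming the completed block definition above; `r` is an $n \times n$ symmetric array of pairwise coefficients whose diagonal is ignored.

```python
import numpy as np
from scipy.linalg import sqrtm

def build_joint_R(E_list, r):
    """Joint covariance with E_i on the block diagonal and r_ij * sqrt(E_i E_j) off it."""
    n, k = len(E_list), E_list[0].shape[0]
    R = np.zeros((n * k, n * k))
    for i in range(n):
        R[i*k:(i+1)*k, i*k:(i+1)*k] = E_list[i]
        for j in range(i + 1, n):
            block = r[i, j] * np.real(sqrtm(E_list[i] @ E_list[j]))
            R[i*k:(i+1)*k, j*k:(j+1)*k] = block
            R[j*k:(j+1)*k, i*k:(i+1)*k] = block.T
    return R
```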
The convolving algorithm and resulting combined covariance are based on an assumption of independent and zero-mean measurements. If we have reason to believe that our observations are correlated or biased, we can adopt a different strategy. While there are numerous possible strategies, in this section we will discuss three.
Max Entropy BLUE.
Given $n$ observations $y_i$ and their respective covariance matrices $E_i$, we build the max entropy parameterized covariance matrix $R_{max}$ as defined by Equation 49. Then, we compute the BLUE solution. The resulting combined estimate and reported covariance are given by

$$\hat{x} = (A^T R_{max}^{-1} A)^{-1} A^T R_{max}^{-1} y, \qquad (51)$$

$$P_{max} = (A^T R_{max}^{-1} A)^{-1}. \qquad (52)$$

This algorithm assumes the cross-observation correlations are such as to make our estimate as uncertain as possible. Compared to the next proposed algorithm, this algorithm is computationally expensive, especially as the number of measurements $n$ increases. Computing $\hat{x}$ directly involves both finding $R_{max}$ and taking the inverse of $R_{max}$, a matrix of size $(nk) \times (nk)$. We note, one could also use $R_{pm}$ in place of $R_{max}$ in order to avoid the computational cost of finding $R_{max}$.
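As we read Equations 51 and 52, the algorithm is a direct BLUE evaluation against $R_{max}$. The sketch below (ours) reuses `build_joint_R` from the previous section; `r_max_pairs` stands in for the Equation 49 coefficients $r_{i,j,max}$, which we assume have been computed pairwise (for example, with the grid search shown earlier).

```python
import numpy as np

def max_entropy_blue(y_list, E_list, r_max_pairs):
    n, k = len(E_list), E_list[0].shape[0]
    R_max = build_joint_R(E_list, r_max_pairs)   # sketch from the previous section
    A = np.tile(np.eye(k), (n, 1))
    y = np.concatenate(y_list)
    Ri = np.linalg.pinv(R_max)                   # pseudo-inverse, per the text
    P_max = np.linalg.inv(A.T @ Ri @ A)          # Equation 52
    x_hat = P_max @ (A.T @ Ri @ y)               # Equation 51
    return x_hat, P_max
```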
Assume Max Entropy to Report Covariance; Convolve to Estimate the Solution.

Given $n$ observations $y_i$ and their respective covariance matrices $E_i$, we build the max entropy parameterized covariance matrix $R_{max}$ as defined by Equation 49. Next, we estimate the solution assuming the measurements are uncorrelated (i.e., convolve) and report the combined estimate covariance under the max entropy assumption. We define

$$P_c = \left(\sum_{i=1}^{n} E_i^{-1}\right)^{-1}, \qquad (53)$$

and

$$P_r = P_c \left(\sum_{i=1}^{n} \sum_{j=i+1}^{n} r_{i,j,max} \left[\sqrt{E_i^{-1} E_j^{-1}} + \sqrt{E_j^{-1} E_i^{-1}}\right]\right) P_c. \qquad (54)$$

Subsequently, we find in this algorithm $\hat{x}$ and the reported covariance $P$ as follows:

$$\hat{x} = P_c \left(\sum_{i=1}^{n} E_i^{-1} y_i\right), \qquad (55)$$

$$P = P_c + P_r. \qquad (56)$$

This algorithm is convolving, with the only difference being a modified reported covariance. The modified covariance matrix is a conservative estimate which assumes the max entropy joint covariance $R_{max}$. In contrast to the previous algorithm, $R_{max}$ is never inverted. Additionally, if one assumes that new measurements are coming in over time, it is easy to write this algorithm as an update to previous solutions with each new estimate.

Max Entropy Provided a Sum of Covariances per Measurement.

This approach requires that each observation covariance is specified as a linear sum of covariances, to include an independent portion and some number, say $m$, of correlated or bias portions:

$$E_i = B_0 + \sum_{a=1}^{m} B_a. \qquad (57)$$

Then we define the joint covariance model as

$$R = R_0 + \sum_{a=1}^{m} R_a(\vec{r}_a), \qquad (58)$$

where $\vec{r}_a$ is a vector of correlation coefficients for the $a$th covariance matrix in the sum. We define the number of covariance terms $m$ and their respective $\vec{r}_a$ based on the specific model we wish to adopt. We have defined $B_0$ to be the portion of each covariance that is completely independent, and therefore the corresponding joint covariance term $R_0$ is just a block diagonal matrix. After constructing $R$, the algorithm performs the best linear unbiased estimation. For example, we might have a bias covariance due to a common instrument used to make the observations, so for all measurements that come from that instrument we select the max entropy coefficient within $\vec{r}_a$, while if observations come from another instrument, we set the appropriate $\vec{r}_a$ term to zero. This approach requires that the measurements provided come with more than just a single covariance matrix to enable one to construct $R$. One could make these models very complex, with a large $m$ and different rules for building each $\vec{r}_a$, but doing so would require that each measurement be stored with all of these covariance terms. Specific approaches will depend on each application; we present one simple version of this approach which leverages the time of a measurement in the construction of $R$.

A simple version of this approach, which enables weight models that emphasize time diversity, is the case where we set $m = 1$; that is, assume that each observation is broken into an independent covariance term, $B_0$, and a bias or correlated bias term, $B_1$. Additionally, assume each measurement comes with a measurement time, $t_i$. Then for the bias vector $\vec{r}_1$ compute the pairwise component as

$$r_{i,j} = r_{i,j,max} \exp(-\gamma |t_i - t_j|), \qquad (59)$$

where $\gamma$ is a positive real constant. This model reflects a system where measurements close in time are correlated, but this correlation decreases as the time between measurements increases.
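One possible rendering (ours) of the last two strategies together: the reported-covariance update of Equations 53 through 56, with the time-decayed coefficients of Equation 59 standing in for the pairwise entries. The constant `gamma` and the measurement times are assumptions for illustration.

```python
import numpy as np
from scipy.linalg import sqrtm

def convolve_report_max_entropy(y_list, E_list, r):
    """r: n x n symmetric array of pairwise coefficients (Eq. 59 values or r_max)."""
    E_invs = [np.linalg.inv(E) for E in E_list]
    P_c = np.linalg.inv(sum(E_invs))                              # Equation 53
    x_hat = P_c @ sum(Ei @ yi for Ei, yi in zip(E_invs, y_list))  # Equation 55
    n = len(E_list)
    inner = np.zeros_like(P_c)
    for i in range(n):
        for j in range(i + 1, n):
            G = np.real(sqrtm(E_invs[i] @ E_invs[j]))
            inner += r[i, j] * (G + G.T)                          # Equation 54
    P_r = P_c @ inner @ P_c
    return x_hat, P_c + P_r                                       # Equation 56

def time_decay_coeffs(r_max_pairs, times, gamma):
    """Equation 59: r_ij = r_ij_max * exp(-gamma * |t_i - t_j|)."""
    t = np.asarray(times)
    return r_max_pairs * np.exp(-gamma * np.abs(t[:, None] - t[None, :]))
```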
Conclusion

Measurement combining via convolving assumes the provided measurements are uncorrelated. When the measurements are not uncorrelated, the reported covariance that comes from convolving is often too small. We show an approach to construct joint measurement covariance matrices parameterized by measurement pairwise correlation coefficients. Finally, based on this approach, we present three alternatives to convolving when the full joint covariance is not provided and must be constructed.

In practice, linear uncorrelated location estimates almost never exist. When we assume estimates are uncorrelated, the combined estimated covariance determinant shrinks to zero as the number of measurements increases. In contrast, this is not true for the alternative algorithms presented in this paper. In real-world applications, believing we can achieve arbitrarily precise estimates by increasing the number of measurements is naive (for further discussion and examples see [8, 16, 13]). We therefore strongly recommend algorithms like the ones discussed in this paper in place of convolving. While this paper focused on cases where the mapping of a single measurement vector to the solution space is the identity matrix, the concepts and ideas are extendable to other design matrices.

Lastly, we note that all algorithms presented are linear. It is well known that linear estimates are particularly sensitive to outliers or data contamination. Appropriate modifications to the presented algorithms to make them robust are recommended.

References
[1] Nabil Ali Ahmed and D. V. Gokhale. Entropy expressions and their estimators for multivariate distributions. IEEE Transactions on Information Theory, 35(3):688–692, 1989.

[2] Alexander C. Aitken. IV.—On least squares and linear combination of observations. Proceedings of the Royal Society of Edinburgh, 55:42–48, 1936.

[3] Tadeusz Banachiewicz. Zur Berechnung der Determinanten, wie auch der Inversen, und zur darauf basierten Auflösung der Systeme linearer Gleichungen. Acta Astronom. Ser. C, 3:41–67, 1937.

[4] Mitra Basu and Tin Kam Ho. Data Complexity in Pattern Recognition, page 181. Springer Science & Business Media, 2006.

[5] Rainer Beck, Anvar Shukurov, Dmitry Sokoloff, and Richard Wielebinski. Systematic bias in interstellar magnetic field estimates. Astronomy & Astrophysics, 411(2):99–107, 2003.

[6] Jean Gallier. The Schur complement and symmetric positive semidefinite (and definite) matrices. 2019.

[7] Frank R. Hampel, Elvezio M. Ronchetti, Peter J. Rousseeuw, and Werner A. Stahel. Robust Statistics: The Approach Based on Influence Functions, volume 196, pages 22–25. John Wiley & Sons, 2011.

[8] Frank R. Hampel, Elvezio M. Ronchetti, Peter J. Rousseeuw, and Werner A. Stahel. Robust Statistics: The Approach Based on Influence Functions, volume 196, pages 387–397. John Wiley & Sons, 2011.

[9] Nicholas J. Higham. Computing real square roots of a real matrix. Linear Algebra and its Applications, 88:405–430, 1987.

[10] Harold Jeffreys. The Theory of Probability. OUP Oxford, 1998.

[11] J. R. Magnus and H. Neudecker. Matrix Differential Calculus with Applications in Statistics and Econometrics, pages 149–150. Wiley, 1999. ISBN 9780471986331.

[12] Prasanta Chandra Mahalanobis. On the generalized distance in statistics. National Institute of Science of India, 1936.

[13] Frederick Mosteller, John Wilder Tukey, et al. Data Analysis and Regression: A Second Course in Statistics, chapter Hunting Out the Real Uncertainty. 1977.

[14] Thuan Nguyen and Jiming Jiang. Simple estimation of hidden correlation in repeated measures. Statistics in Medicine, 30(29):3403–3415, 2011.

[15] Karl Pearson. On the mathematical theory of errors of judgement, with special reference to the personal equation. Proceedings of the Royal Society of London, 68(442–450):369–372, 1901.

[16] W. J. Youden. Enduring values. Technometrics, 14(1):1–11, 1972.