Jackknife covariance matrix estimation for observations from mixture
Modern Stochastics: Theory and Applications 6 (4) (2019) 495–513. https://doi.org/10.15559/19-VMSTA145
Rostyslav Maiboroda* (corresponding author), Olena Sugakova
Taras Shevchenko National University of Kyiv, Kyiv, Ukraine
[email protected] (R. Maiboroda), [email protected] (O. Sugakova)
Received: 20 May 2019, Revised: 7 September 2019, Accepted: 14 October 2019, Published online: 7 November 2019
Abstract
A general jackknife estimator for the asymptotic covariance of moment estimators is considered in the case when the sample is taken from a mixture with varying concentrations of components. Consistency of the estimator is demonstrated. A fast algorithm for its calculation is described. The estimator is applied to the construction of confidence sets for regression parameters in linear regression with errors in variables. An application to sociological data analysis is considered.
Keywords
Finite mixture model, orthogonal regression, mixture with varying concentrations, nonparametric estimation, asymptotic covariance matrix estimation, confidence ellipsoid, jackknife, errors-in-variables model
1 Introduction

Finite Mixture Models (FMM) are widely used in the analysis of biological, economic and sociological data. For a comprehensive survey of different statistical techniques based on FMMs, see [9]. Mixtures with Varying Concentrations (MVC) form a subclass of these models in which the mixing probabilities are not constant, but vary for different observations (see [4, 5]).

In this paper we consider an application of the jackknife technique to the estimation of the asymptotic covariance matrix (the covariance matrix of an asymptotically normal
estimator; ACM for short) in the case when the data are described by the MVC model. The jackknife is a well-known resampling technique usually applied to i.i.d. samples (see Section 5.5.2 in [11], Chapter 4 in [15], Chapter 2 in [12]). On jackknife estimates of variance for censored and dependent data, see [14]. Its modification to the MVC model, in which the observations are still independent but not identically distributed, requires some effort.

We obtain a general theorem on the consistency of the jackknife estimators of the ACM for moment estimators in MVC models and apply this result to construct confidence sets for the regression coefficients in linear errors-in-variables models for MVC data. On general errors-in-variables models, see [2, 3, 8]. The model and the estimators of the regression coefficients considered in this paper were proposed in [6], where the asymptotic normality of these estimators is shown.

The rest of the paper is organized as follows. In Section 2 we introduce the MVC model and describe the estimation technique for these models based on weighted moments. In Section 3 the jackknife estimators of the ACM are introduced and conditions of their consistency are formulated. Section 4 is devoted to an algorithm for fast computation of the jackknife estimates. In Section 5 we apply the previous results to construct confidence sets for linear regression coefficients in errors-in-variables models with MVC. In Section 6 the results of simulations are presented. In Section 7 we present the results of an application of the proposed technique to the analysis of sociological data. Proofs are placed in Section 8. Section 9 contains concluding remarks.
2 Mixtures with varying concentrations

We consider a dataset in which each observed subject $O$ belongs to one of $M$ sub-populations (mixture components). The number $\kappa(O)$ of the component to which $O$ belongs is unknown. We observe $d$ numeric characteristics of $O$ which form the vector $\xi(O) = (\xi_1(O),\dots,\xi_d(O))^T \in \mathbb{R}^d$ of observable variables. The distribution of $\xi(O)$ may depend on the component $\kappa(O)$:
$$F^{(m)}_\xi(A) = P\{\xi(O)\in A \mid \kappa(O) = m\}, \quad m = 1,\dots,M,$$
where $A$ is any Borel subset of $\mathbb{R}^d$.

We observe the variables of $n$ independent subjects, $\xi_j = \xi(O_j)$. The probability of obtaining the $j$-th subject from the $m$-th component, $p^{(m)}_j = P\{\kappa(O_j) = m\}$, can be considered as the concentration of the $m$-th component in the mixture when the $j$-th observation was made. The concentrations are known and can vary for different observations. So the distribution of $\xi_j$ is described by the model of mixture with varying concentrations:
$$P\{\xi_j\in A\} = \sum_{m=1}^M p^{(m)}_j F^{(m)}_\xi(A). \qquad(1)$$

We will denote by
$$\mu^{(m)} = E^{(m)}[\xi] = E[\xi(O)\mid\kappa(O)=m] = \int_{\mathbb{R}^d} x\,F^{(m)}_\xi(dx)$$
the vector of theoretical first moments of the $m$-th component distribution. In what follows, $\mathrm{Cov}^{(m)}[\xi]$ means the covariance matrix of $\xi(O)$ for the $m$-th component, $\mathrm{Var}^{(m)}[\xi_l]$ means the variance of $\xi_l(O)$ for this component, and so on.

To estimate $\mu^{(k)}$ from the observations $\xi_1,\dots,\xi_n$ one can use the weighted sample mean
$$\bar\xi^{(k);n} = \bar\xi^{(k)} = \sum_{j=1}^n a^{(k)}_j \xi_j, \qquad(2)$$
where $a^{(k)}_j = a^{(k)}_{j;n}$ are some weights which depend on the components' concentrations but not on the observed $\xi_j = \xi_{j;n}$. (In what follows the subscript $;n$ indicates that the corresponding quantity is considered for the sample size $n$; in most cases this subscript is dropped to simplify notation.)

To obtain unbiased estimates in (2) one needs to select weights satisfying the assumption
$$\sum_{j=1}^n a^{(k)}_j p^{(m)}_j = \begin{cases} 1 & \text{if } k = m,\\ 0 & \text{if } k\ne m.\end{cases} \qquad(3)$$

Let us denote
$$\Xi = (\xi_1,\dots,\xi_n)^T = \begin{pmatrix}\xi_{11} & \dots & \xi_{d1}\\ \vdots & \ddots & \vdots\\ \xi_{1n} & \dots & \xi_{dn}\end{pmatrix},\qquad a = \begin{pmatrix}a^{(1)}_1 & \dots & a^{(M)}_1\\ \vdots & \ddots & \vdots\\ a^{(1)}_n & \dots & a^{(M)}_n\end{pmatrix},\qquad p = \begin{pmatrix}p^{(1)}_1 & \dots & p^{(M)}_1\\ \vdots & \ddots & \vdots\\ p^{(1)}_n & \dots & p^{(M)}_n\end{pmatrix}.$$
Then $p^{(m)}_{\bullet} = (p^{(m)}_1,\dots,p^{(m)}_n)^T$, $p_{\bullet j} = (p^{(1)}_j,\dots,p^{(M)}_j)^T$, and the same notation is used for the matrix $a$.

In this notation the unbiasedness condition (3) reads
$$a^T p = E, \qquad(4)$$
where $E$ denotes the $M\times M$ unit matrix.

There can be many choices of $a$ satisfying (4). In [4, 5] the minimax weights are considered, defined by
$$a = p\,\Gamma^{-1}, \qquad(5)$$
where $\Gamma = \Gamma_{;n} = p^T p$ is the Gram matrix of the set of concentration vectors $p^{(1)}_{\bullet},\dots,p^{(M)}_{\bullet}$. (In fact, in [4] and [5] the weights are defined as $\tilde a = n a$ and $\bar\xi^{(k)} = \frac1n\sum_{j=1}^n\tilde a^{(k)}_j\xi_j$; in this paper we adopt the notation which simplifies the formulas for fast estimator calculation in Section 4.) In what follows we assume that these vectors are linearly independent, so $\det\Gamma > 0$ and $\Gamma^{-1}$ exists. See [5] on the optimality properties of the estimates based on the minimax weights (5).

To describe the asymptotic behavior of $\bar\xi^{(k);n}$ as $n\to\infty$ we will calculate its covariance matrix. Notice that
$$\mathrm{Cov}[\xi_j] = E[\xi_j\xi_j^T] - E[\xi_j]E[\xi_j]^T = \sum_{m=1}^M p^{(m)}_j\big(\Sigma^{(m)} + \mu^{(m)}(\mu^{(m)})^T\big) - \sum_{m,l=1}^M p^{(m)}_j p^{(l)}_j\,\mu^{(m)}(\mu^{(l)})^T,$$
where $\Sigma^{(m)} = \mathrm{Cov}^{(m)}[\xi] = \mathrm{Cov}[\xi(O)\mid\kappa(O)=m]$.
So,
$$n\,\mathrm{Cov}\big[\bar\xi^{(k)}\big] = \sum_{m=1}^M \big\langle (a^{(k)})^2 p^{(m)}\big\rangle_{;n}\big(\Sigma^{(m)} + \mu^{(m)}(\mu^{(m)})^T\big) - \sum_{m,l=1}^M \big\langle (a^{(k)})^2 p^{(m)} p^{(l)}\big\rangle_{;n}\,\mu^{(m)}(\mu^{(l)})^T,$$
where
$$\big\langle (a^{(k)})^2 p^{(m)}\big\rangle_{;n} = n\sum_{j=1}^n (a^{(k)}_j)^2 p^{(m)}_j, \qquad \big\langle (a^{(k)})^2 p^{(m)} p^{(l)}\big\rangle_{;n} = n\sum_{j=1}^n (a^{(k)}_j)^2 p^{(m)}_j p^{(l)}_j.$$
Assume that the limits
$$\big\langle (a^{(k)})^2 p^{(m)} p^{(l)}\big\rangle_{\infty} \stackrel{\mathrm{def}}{=} \lim_{n\to\infty}\big\langle (a^{(k)})^2 p^{(m)} p^{(l)}\big\rangle_{;n} \qquad(6)$$
exist. Then the limits
$$\big\langle (a^{(k)})^2 p^{(m)}\big\rangle_{\infty} \stackrel{\mathrm{def}}{=} \lim_{n\to\infty}\big\langle (a^{(k)})^2 p^{(m)}\big\rangle_{;n}$$
exist as well, since $\sum_{l=1}^M p^{(l)}_j = 1$ for all $j$.

So, under this assumption,
$$n\,\mathrm{Cov}\big[\bar\xi^{(k)}\big]\to \Sigma^{(k)}_\infty \quad\text{as } n\to\infty, \qquad(7)$$
where
$$\Sigma^{(k)}_\infty \stackrel{\mathrm{def}}{=} \sum_{m=1}^M \big\langle (a^{(k)})^2 p^{(m)}\big\rangle_{\infty}\big(\Sigma^{(m)} + \mu^{(m)}(\mu^{(m)})^T\big) - \sum_{m,l=1}^M \big\langle (a^{(k)})^2 p^{(m)} p^{(l)}\big\rangle_{\infty}\,\mu^{(m)}(\mu^{(l)})^T.$$
Theorem 1. Assume that:
1. $\frac1n\Gamma_{;n}\to\Gamma_\infty$ as $n\to\infty$ and $\det\Gamma_\infty > 0$.
2. Assumption (6) holds.
3. $E^{(m)}[\|\xi\|^2] < \infty$ for all $m = 1,\dots,M$.
Then
$$\sqrt n\,(\bar\xi^{(k);n} - \mu^{(k)}) \xrightarrow{W} N(0,\Sigma^{(k)}_\infty).$$
This theorem is a simple corollary of Theorem 4.3 in [5].
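The estimator (2) with the minimax weights (5) reduces to two matrix products, so all $M$ component means can be computed at once. The following Python sketch (our code, using numpy; the function names are not from the paper) illustrates the computation.

```python
import numpy as np

def minimax_weights(p):
    """Minimax weights a = p Gamma^{-1} of eq. (5), where Gamma = p^T p.

    p : (n, M) array of the known concentrations p_j^{(m)}.
    The returned (n, M) matrix satisfies the unbiasedness condition a^T p = E, eq. (4).
    """
    gamma = p.T @ p                     # Gram matrix of the concentration vectors
    return p @ np.linalg.inv(gamma)

def weighted_means(xi, p):
    """Weighted sample means (2) for all components at once.

    xi : (n, d) data matrix Xi.
    Returns an (M, d) array whose k-th row is the estimate of mu^{(k)}.
    """
    a = minimax_weights(p)
    return a.T @ xi                     # row k equals sum_j a_j^{(k)} xi_j
```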
3 Jackknife estimator of the asymptotic covariance matrix

In what follows we consider unknown parameters of the component distribution $F^{(k)}_\xi$ which can be represented in the form
$$\vartheta = \vartheta^{(k)} = H(\mu^{(k)}), \qquad(8)$$
where $H\colon\mathbb{R}^d\to\mathbb{R}^q$ is some known function. A natural estimator of such a parameter from the sample $\xi_1,\dots,\xi_n$ is
$$\hat\vartheta = \hat\vartheta^{(k);n} = H(\bar\xi^{(k);n}). \qquad(9)$$
The asymptotic behavior of this estimator is described by the following theorem.
Theorem 2. Under the assumptions of Theorem 1, if $H$ is continuously differentiable in some neighborhood of $\mu^{(k)}$, then
$$\sqrt n\,(\hat\vartheta^{(k);n} - \vartheta^{(k)}) \xrightarrow{W} N(0, V_\infty),$$
where
$$V_\infty = V^{(k)}_\infty = H'(\mu^{(k)})\,\Sigma^{(k)}_\infty\,(H'(\mu^{(k)}))^T, \qquad(10)$$
$$H' = \begin{pmatrix}\frac{\partial H_1}{\partial\mu_1} & \dots & \frac{\partial H_1}{\partial\mu_d}\\ \vdots & \ddots & \vdots\\ \frac{\partial H_q}{\partial\mu_1} & \dots & \frac{\partial H_q}{\partial\mu_d}\end{pmatrix}.$$
This theorem is a simple implication of our Theorem 1 and Theorem 3 in Section 5, Chapter 1 of [1].

So $V_\infty$ defined by (10) is the ACM of the estimator $\hat\vartheta^{(k)}$ (the covariance matrix of the limit normal distribution of the normalized difference between the estimator and the estimated parameter). If it were known, one could use it to construct tests for hypotheses on $\vartheta^{(k)}$ or to derive confidence sets for $\vartheta^{(k)}$. In fact, for most estimators the ACM is unknown, and some estimate of $V_\infty$ is usually used in its place in statistical algorithms.

The jackknife is one of the most general techniques of ACM estimation. Let $\hat\vartheta$ be any estimator of $\vartheta$ based on the data $\xi_1,\dots,\xi_n$: $\hat\vartheta = \hat\vartheta(\xi_1,\dots,\xi_n)$. Consider the estimates of the same form calculated from all observations except one:
$$\hat\vartheta_{i-} = \hat\vartheta(\xi_1,\dots,\xi_{i-1},\xi_{i+1},\dots,\xi_n).$$
Then the jackknife estimator of $V_\infty$ is defined by
$$\hat V_{;n} = \hat V^{(k);n} = n\sum_{i=1}^n (\hat\vartheta_{i-} - \hat\vartheta)(\hat\vartheta_{i-} - \hat\vartheta)^T. \qquad(11)$$
In our case $\hat\vartheta = H(\bar\xi^{(k)})$, so
$$\hat\vartheta_{i-} = H(\bar\xi^{(k)}_{i-}), \qquad(12)$$
where
$$\bar\xi^{(k)}_{i-} = \sum_{j\ne i} a^{(k)}_{ji-}\,\xi_j. \qquad(13)$$
Here $a_{i-} = (a^{(m)}_{ji-},\ j = 1,\dots,n,\ m = 1,\dots,M)$ is the minimax weights matrix calculated from the matrix $p_{i-}$ of the concentrations of all observations except the $i$-th one. That is,
$$p_{i-} = (p_{\bullet 1},\dots,p_{\bullet\,i-1},\,0,\,p_{\bullet\,i+1},\dots,p_{\bullet n})^T,$$
$$a_{i-} = p_{i-}\,\Gamma^{-1}_{i-}, \qquad(14)$$
$$\Gamma_{i-} = p^T_{i-}\,p_{i-}. \qquad(15)$$
Notice that $0$ is placed at the $i$-th row of $p_{i-}$ as a placeholder only, to preserve the numbering of the rows of $p_{i-}$ and $a_{i-}$, which corresponds to the numbering of the subjects in the sample.
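A literal implementation of (11)–(15) recomputes the leave-one-out weights $a_{i-}$ for every $i$ and therefore costs on the order of $n^2$ operations; the algorithm of Section 4 removes this bottleneck. The following sketch of the direct version (our code; $H$ is any user-supplied smooth function as in (8), returning a 1-D array) can serve as a reference implementation.

```python
import numpy as np

def jackknife_acm_naive(xi, p, H, k):
    """Direct jackknife estimator (11) of the ACM of theta_hat = H(xi_bar^{(k)}).

    xi : (n, d) data, p : (n, M) concentrations, k : index of the component of interest.
    The leave-one-out weights (14)-(15) are recomputed for every i, hence O(n^2) cost.
    """
    n = p.shape[0]
    a = p @ np.linalg.inv(p.T @ p)                # minimax weights (5)
    theta = H(a[:, k] @ xi)                       # theta_hat = H(xi_bar^{(k)})
    V = 0.0
    for i in range(n):
        p_i = p.copy()
        p_i[i, :] = 0.0                           # zero row: placeholder, as in p_{i-}
        a_i = p_i @ np.linalg.inv(p_i.T @ p_i)    # leave-one-out weights, eqs. (14)-(15)
        d_i = H(a_i[:, k] @ xi) - theta           # theta_hat_{i-} - theta_hat
        V = V + np.outer(d_i, d_i)
    return n * V                                  # eq. (11)
```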
Theorem 3. Let $\vartheta$ be defined by (8), $\hat\vartheta$ by (9), $V_\infty$ by (10) and $\hat V^{(k);n}$ by (11)–(15). Assume that:
1. $H$ is twice continuously differentiable in some neighborhood of $\mu^{(k)}$.
2. There exists some $\alpha > 4$ such that $E[\|\xi(O)\|^\alpha\mid\kappa(O)=m] < \infty$ for all $m = 1,\dots,M$.
3. $\frac1n\Gamma_{;n}\to\Gamma_\infty$ as $n\to\infty$ and $\det\Gamma_\infty > 0$.
4. Assumption (6) holds.
Then $\hat V^{(k);n}\to V^{(k)}_\infty$ in probability.

For the proof see Section 8.
4 A fast algorithm for calculation of the estimator

Direct calculation of $\hat V_{;n}$ by (11)–(15) requires $\sim Cn^2$ elementary operations. Here we describe an algorithm which reduces the computational complexity to $\sim Cn$ operations (linear complexity).

Notice that $\Gamma_{i-} = \Gamma - p_{\bullet i}(p_{\bullet i})^T$. So
$$(\Gamma_{i-})^{-1} = \Gamma^{-1} + \frac{1}{1-h_i}\,\Gamma^{-1}p_{\bullet i}(p_{\bullet i})^T\Gamma^{-1}, \qquad(16)$$
where
$$h_i = (p_{\bullet i})^T\Gamma^{-1}p_{\bullet i}. \qquad(17)$$
(Formula (16) can be verified directly by checking that $\Gamma^{-1}_{i-}\Gamma_{i-} = E$. It is also a corollary of the Sherman–Morrison–Woodbury formula, see A.9.4 in [13].)

Let us denote by $\bar\xi_{i-}$ and $\bar\xi$ the $M\times d$ matrices whose rows are the (leave-one-out) component means,
$$\bar\xi_{i-} = \big((\bar\xi^{(1)}_{i-})^T,\dots,(\bar\xi^{(M)}_{i-})^T\big)^T, \qquad \bar\xi = \big((\bar\xi^{(1)})^T,\dots,(\bar\xi^{(M)})^T\big)^T. \qquad(18)$$
Then $\bar\xi_{i-} = (\Gamma_{i-})^{-1}p^T_{i-}\Xi_{i-}$, where $\Xi_{i-} = (\xi_1,\dots,\xi_{i-1},\,0,\,\xi_{i+1},\dots,\xi_n)^T$. (The zero at the $i$-th row is a placeholder, as in the matrix $p_{i-}$.) Applying (16) one obtains
$$\bar\xi_{i-} = \Gamma^{-1}(p_{i-})^T\Xi_{i-} + \frac{1}{1-h_i}\,\Gamma^{-1}p_{\bullet i}(p_{\bullet i})^T\Gamma^{-1}(p_{i-})^T\Xi_{i-}.$$
This together with $(p_{i-})^T\Xi_{i-} = p^T\Xi - p_{\bullet i}\xi_i^T$ implies
$$\bar\xi_{i-} = \bar\xi + \frac{1}{1-h_i}\,a_{\bullet i}\big((p_{\bullet i})^T\bar\xi - \xi_i^T\big). \qquad(19)$$
Then the following algorithm allows one to calculate $\hat V^{(m)}$ for all $m = 1,\dots,M$ at once in $\sim Cn$ operations.

Algorithm
1. Calculate $\Gamma$ and $a$ by (5).
2. Calculate $h_i$, $i = 1,\dots,n$, by (17).
3. Calculate $\bar\xi$ by (2) and (18).
4. Calculate $\bar\xi_{i-}$, $i = 1,\dots,n$, by (19).
5. Calculate $\hat\vartheta_{i-} = \hat\vartheta^{(m)}_{i-} = H(\bar\xi^{(m)}_{i-})$ for $i = 1,\dots,n$, $m = 1,\dots,M$.
6. Calculate $\hat V^{(m);n}$ by (11) for all $m = 1,\dots,M$.
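A Python sketch of Steps 1–6 (our code, not the authors' implementation) is given below; thanks to the update formula (19), no matrix inversion is performed inside the loop and the total cost is linear in $n$.

```python
import numpy as np

def jackknife_acm_fast(xi, p, H, k):
    """Linear-time jackknife ACM estimator following Steps 1-6 of the Algorithm.

    xi : (n, d) data matrix Xi, p : (n, M) concentrations,
    H  : smooth map R^d -> R^q returning a 1-D array, k : component index.
    """
    n = p.shape[0]
    gamma_inv = np.linalg.inv(p.T @ p)          # Step 1: Gamma^{-1} and a = p Gamma^{-1}
    a = p @ gamma_inv
    h = np.sum(a * p, axis=1)                   # Step 2: h_i = p_i^T Gamma^{-1} p_i, eq. (17)
    xi_bar = a.T @ xi                           # Step 3: (M, d) matrix of weighted means, eq. (2)
    theta = H(xi_bar[k])
    V = 0.0
    for i in range(n):
        # Step 4: leave-one-out means via the rank-one update (19)
        xi_bar_i = xi_bar + np.outer(a[i], p[i] @ xi_bar - xi[i]) / (1.0 - h[i])
        # Step 5: leave-one-out estimate for component k
        d_i = H(xi_bar_i[k]) - theta
        # Step 6: accumulate the outer products of eq. (11)
        V = V + np.outer(d_i, d_i)
    return n * V
```

Up to numerical error this routine returns the same matrix as the direct implementation of (11)–(15), at a fraction of the cost.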
5 Confidence sets for errors-in-variables linear regression

In this section we consider a mixture of simple linear regressions with errors in variables. A modification of the orthogonal regression estimation technique for estimating the regression coefficients was proposed in [6]. We will show how the jackknife ACM estimators from Section 3 can be applied in this case to construct confidence sets for the regression coefficients.

Let us recall the errors-in-variables regression model in the context of a mixture with varying concentrations. We consider the case when each subject $O$ has two variables of interest, $x(O)$ and $y(O)$. These variables are related by a strict linear dependence with coefficients depending on the component $O$ belongs to:
$$y(O) = b_0^{(\kappa(O))} + b_1^{(\kappa(O))}\,x(O), \qquad(20)$$
where $b_0^{(m)}$, $b_1^{(m)}$ are the regression coefficients of the $m$-th component.
The true values of $x(O)$ and $y(O)$ are unobservable. These variables are observed with measurement errors:
$$X(O) = x(O) + \varepsilon_X(O), \qquad Y(O) = y(O) + \varepsilon_Y(O). \qquad(21)$$
Here we assume that the errors $\varepsilon_X(O)$ and $\varepsilon_Y(O)$ are conditionally independent given $\kappa(O) = m$,
$$E^{(m)}\varepsilon_X = E^{(m)}\varepsilon_Y = 0 \quad\text{and}\quad \mathrm{Var}^{(m)}\varepsilon_X = \mathrm{Var}^{(m)}\varepsilon_Y = \sigma^2_{(m)} \qquad(22)$$
for all $m = 1,\dots,M$. So the distributions of $\varepsilon_X(O)$ and $\varepsilon_Y(O)$ can be different, but their variances are the same for a given subject. We assume that $\sigma^2_{(m)} > 0$, $m = 1,\dots,M$, and that these variances are unknown.

As in Section 2, we observe a sample $(X(O_j), Y(O_j))^T = (X_j, Y_j)^T$, $j = 1,\dots,n$, from the mixture with known concentrations $p^{(m)}_j = P\{\kappa(O_j) = m\}$.

In the case of a homogeneous sample, when there is no mixture, the classical way to estimate $b_0$ and $b_1$ is orthogonal regression. That is, the estimator is taken as the minimizer of the total least squares functional, which is the sum of the squared minimal Euclidean distances from the observation points to the regression line. The modification of this technique for mixtures with varying concentrations proposed in [6] leads to the following estimators of $b_0^{(k)}$ and $b_1^{(k)}$:
$$\hat b_1^{(k)} = \frac{\hat S^{(k)}_{YY} - \hat S^{(k)}_{XX} + \sqrt{(\hat S^{(k)}_{XX} - \hat S^{(k)}_{YY})^2 + 4(\hat S^{(k)}_{XY})^2}}{2\hat S^{(k)}_{XY}}, \qquad \hat b_0^{(k)} = \bar Y^{(k)} - \hat b_1^{(k)}\bar X^{(k)}, \qquad(23)$$
where
$$\bar X^{(k)} = \sum_{j=1}^n a^{(k)}_j X_j, \qquad \bar Y^{(k)} = \sum_{j=1}^n a^{(k)}_j Y_j,$$
$$\hat S^{(k)}_{XX} = \sum_{j=1}^n a^{(k)}_j (X_j - \bar X^{(k)})^2, \qquad \hat S^{(k)}_{YY} = \sum_{j=1}^n a^{(k)}_j (Y_j - \bar Y^{(k)})^2,$$
$$\hat S^{(k)}_{XY} = \sum_{j=1}^n a^{(k)}_j (X_j - \bar X^{(k)})(Y_j - \bar Y^{(k)}).$$
Conditions for the consistency and asymptotic normality of these estimators are given in Theorems 5.1 and 5.2 of [6]. For example, under the assumptions of Theorem 4 below we obtain
$$\sqrt n\,(\hat\vartheta^{(k);n} - \vartheta^{(k)}) \xrightarrow{W} N(0, V^{(k)}_\infty), \quad\text{where } \vartheta^{(k)} = (b_0^{(k)}, b_1^{(k)})^T.$$
The ACM $V_\infty$ of this estimator is given by formula (21) in [6]. That formula is rather complicated and involves theoretical moments of the unobservable variables $x(O)$, $\varepsilon_X(O)$ and $\varepsilon_Y(O)$. So it is natural to estimate $V_\infty$ by the jackknife technique, which does not require knowing or estimating these moments.

Notice that the estimator $(\hat b_0^{(k)}, \hat b_1^{(k)})^T$ can be represented in the terms of Section 3 if we expand the space of observable variables by including quadratic terms. That is, we consider the sample
$$\xi_j = \big(X_j,\ Y_j,\ (X_j)^2,\ (Y_j)^2,\ X_jY_j\big)^T, \qquad j = 1,\dots,n.$$
Then the estimator $\hat\vartheta_{;n} = (\hat\vartheta_{1n}, \hat\vartheta_{2n})^T = (\hat b_0^{(k)}, \hat b_1^{(k)})^T$ defined by (23) can be represented in the form (9) with a twice continuously differentiable function $H$, provided $\mathrm{Var}^{(k)}[x]\ne 0$ and $b_1^{(k)}\ne 0$.

So we can apply the technique developed in Sections 3–4. Let us define the estimator $\hat V^{(k);n}$ of $V^{(k)}_\infty$ by (11).
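In this representation the function $H$ of (9) maps the five weighted first moments of $\xi_j$ to $(\hat b_0^{(k)}, \hat b_1^{(k)})$. A possible Python sketch of such an $H$ (our code; the name is ours) is given below; passing it, together with the expanded observations $\xi_j$, to a routine like jackknife_acm_fast above yields $\hat V^{(k);n}$.

```python
import numpy as np

def H_orthogonal(mu):
    """Map the weighted first moments of (X, Y, X^2, Y^2, XY) to (b0, b1), cf. eq. (23).

    mu : length-5 array (mu_X, mu_Y, mu_XX, mu_YY, mu_XY) for one component.
    Requires s_xy != 0, which holds asymptotically when Var[x] != 0 and b1 != 0.
    """
    m_x, m_y, m_xx, m_yy, m_xy = mu
    s_xx = m_xx - m_x ** 2              # weighted analogues of S_XX, S_YY, S_XY
    s_yy = m_yy - m_y ** 2
    s_xy = m_xy - m_x * m_y
    b1 = (s_yy - s_xx + np.sqrt((s_xx - s_yy) ** 2 + 4.0 * s_xy ** 2)) / (2.0 * s_xy)
    b0 = m_y - b1 * m_x
    return np.array([b0, b1])
```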
Theorem 4. Assume that the following conditions hold.
1. $\frac1n\Gamma_{;n}\to\Gamma_\infty$ as $n\to\infty$ and $\det\Gamma_\infty > 0$.
2. Assumption (6) holds.
3. For some $\alpha > 4$, $E^{(m)}[(x(O))^{2\alpha}] < \infty$, $E^{(m)}[(\varepsilon_X(O))^{2\alpha}] < \infty$ and $E^{(m)}[(\varepsilon_Y(O))^{2\alpha}] < \infty$ for all $m = 1,\dots,M$.
4. $\mathrm{Var}^{(k)}[x(O)]\ne 0$ and $b_1^{(k)}\ne 0$.
Then $\hat V^{(k);n}\to V^{(k)}_\infty$ in probability as $n\to\infty$.

This theorem is a simple combination of Theorem 3 and Theorem 5.2 of [6].

In what follows we assume that $V^{(k)}_\infty$ is nonsingular. This assumption holds, e.g., if for all $m$ the distributions of $x(O)$, $\varepsilon_X(O)$ and $\varepsilon_Y(O)$ given $\kappa(O) = m$ are absolutely continuous with continuous PDFs. (The proof of this fact is rather technical, so we do not present it here.)

We can construct a confidence set (ellipsoid) for the unknown parameter $\vartheta^{(k)}$ applying Theorem 4 in the usual way. Namely, for any $t\in\mathbb{R}^2$ let
$$T_{;n}(t) = n\,(t - \hat\vartheta^{(k);n})^T(\hat V^{(k);n})^{-1}(t - \hat\vartheta^{(k);n}).$$
Then, under the assumptions of Theorem 4, if $\det V^{(k)}_\infty\ne 0$,
$$T_{;n}(\vartheta^{(k)}) \xrightarrow{W} \eta \quad\text{as } n\to\infty, \qquad(24)$$
where $\eta$ is a random variable (r.v.) with the chi-square distribution with 2 degrees of freedom.

Consider $B_{\alpha;n} = \{t\in\mathbb{R}^2\colon T_{;n}(t)\le Q_\eta(1-\alpha)\}$, where $Q_\eta(\alpha)$ denotes the quantile of level $\alpha$ of the r.v. $\eta$. By (24),
$$P\{\vartheta^{(k)}\in B_{\alpha;n}\}\to 1-\alpha, \qquad(25)$$
so $B_{\alpha;n}$ is an asymptotic confidence set for $\vartheta^{(k)}$ of level $1-\alpha$.
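Membership of a point $t$ in $B_{\alpha;n}$ is decided by a single quadratic form. A minimal sketch (our code; the chi-square quantile is taken from scipy) is:

```python
import numpy as np
from scipy.stats import chi2

def in_confidence_set(t, theta_hat, V_hat, n, alpha=0.05):
    """Check whether t lies in the asymptotic confidence ellipsoid B_{alpha;n} of (25).

    theta_hat : estimate (b0_hat, b1_hat), V_hat : jackknife ACM estimate (11),
    n : sample size, alpha : asymptotic significance level.
    """
    diff = np.asarray(t, dtype=float) - np.asarray(theta_hat, dtype=float)
    T = n * diff @ np.linalg.inv(V_hat) @ diff      # statistic T_{;n}(t), cf. (24)
    return T <= chi2.ppf(1.0 - alpha, df=diff.size)
```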
6 Results of simulations

To assess the performance of the proposed technique we carried out a small simulation study. In the following three experiments we calculated the covering frequencies of the confidence sets for the regression coefficients in the model (20)–(22), constructed by (25), and of the corresponding one-dimensional confidence intervals.

In all experiments, for sample sizes from $n = 100$ to $n = 5000$ we generated $B = 1000$ samples and calculated the estimates of the parameters and the corresponding confidence sets. The one-dimensional confidence intervals for $b_i^{(k)}$ were calculated by the standard formula
$$\Big(\hat b_i^{(k);n} - \lambda_{\alpha/2}\sqrt{\tfrac{\hat v_{ii}^{(k);n}}{n}},\ \ \hat b_i^{(k);n} + \lambda_{\alpha/2}\sqrt{\tfrac{\hat v_{ii}^{(k);n}}{n}}\Big),$$
where $\hat v_{ii}^{(k);n}$ is the $i$-th diagonal entry of the matrix $\hat V^{(k);n}$ and $\lambda_{\alpha/2}$ is the quantile of level $1-\alpha/2$ of the standard normal distribution. The confidence level for the sets and intervals was taken to be $\alpha = 0.05$.

Then the number of cases in which the confidence set covers the true value of the estimated parameter was calculated and divided by $B$. These are the covering frequencies reported in the tables below.

In all the experiments we considered a two-component mixture ($M = 2$) with the concentrations of the components
$$p^{(1)}_{j;n} = j/n, \qquad p^{(2)}_{j;n} = 1 - j/n.$$
The regression coefficients $b_0^{(m)}$, $b_1^{(m)}$ of the two components were fixed at different values (with $b_1^{(1)} = 2$), and the distribution of the true (unobservable) regressor $x(O)$ was normal with mean 0 for $\kappa(O) = 1$ and mean 1 for $\kappa(O) = 2$.
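The data-generating mechanism of the experiments below can be sketched as follows (our code; the regression coefficients, the means and scale of the true regressor and the error scale are passed as arguments rather than hard-coded, and normal errors are assumed as in Experiments 1 and 2):

```python
import numpy as np

def simulate_mvc_sample(n, b, mu_x, sd_x, sigma, rng=None):
    """One sample from the two-component MVC regression design of this section.

    b     : (2, 2) array, row m is (b0, b1) for component m + 1;
    mu_x  : length-2 array of means of the true regressor x in the two components;
    sd_x  : standard deviation of x;  sigma : standard deviation of eps_X and eps_Y.
    Concentrations are p_j^(1) = j/n, p_j^(2) = 1 - j/n.
    Returns the observed X, Y and the (n, 2) concentration matrix p.
    """
    rng = np.random.default_rng() if rng is None else rng
    j = np.arange(1, n + 1)
    p = np.column_stack([j / n, 1.0 - j / n])
    kappa = (rng.random(n) >= p[:, 0]).astype(int)     # 0 -> component 1, 1 -> component 2
    b = np.asarray(b, dtype=float)
    x = rng.normal(loc=np.asarray(mu_x, dtype=float)[kappa], scale=sd_x)
    y = b[kappa, 0] + b[kappa, 1] * x                  # exact linear relation (20)
    X = x + rng.normal(scale=sigma, size=n)            # measurement errors, eq. (21)
    Y = y + rng.normal(scale=sigma, size=n)
    return X, Y, p
```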
Experiment 1.
In this experiment we let $\varepsilon_X$ and $\varepsilon_Y$ be zero-mean normal with a very small variance, so small that the regression coefficients can be estimated without difficulty even for small sample sizes.

The covering frequencies for the confidence sets are presented in Table 1. It seems that they approach the nominal covering probability 0.95 with satisfactory accuracy as the sample size grows.
Table 1. Covering frequencies for confidence sets in Experiment 1

n      b_0^{(1)}   b_1^{(1)}   (b_0^{(1)}, b_1^{(1)})   b_0^{(2)}   b_1^{(2)}   (b_0^{(2)}, b_1^{(2)})
100    0.935       0.961       0.948                    0.936       0.987       0.957
250    0.953       0.960       0.950                    0.964       0.980       0.950
500    0.940       0.954       0.939                    0.958       0.973       0.962
1000   0.946       0.949       0.943                    0.954       0.971       0.935
2500   0.961       0.949       0.948                    0.937       0.953       0.947
5000   0.947       0.949       0.948                    0.954       0.956       0.958
Experiment 2.
In this experiment we enlarged the variance of the error terms, taking $\sigma^2_{(m)} = 2$. All other parameters were the same as in Experiment 1. The results are presented in Table 2.
Table 2. Covering frequencies for confidence sets in Experiment 2

n      b_0^{(1)}   b_1^{(1)}   (b_0^{(1)}, b_1^{(1)})   b_0^{(2)}   b_1^{(2)}   (b_0^{(2)}, b_1^{(2)})
100    0.969       0.942       0.918                    0.950       0.974       0.958
250    0.958       0.956       0.945                    0.946       0.962       0.959
500    0.949       0.945       0.936                    0.953       0.966       0.960
1000   0.959       0.946       0.954                    0.947       0.958       0.942
2500   0.956       0.949       0.950                    0.947       0.961       0.958
5000   0.953       0.941       0.952                    0.955       0.955       0.968
It seems that the increase of the errors' dispersion does not deteriorate the covering accuracy of the confidence sets.
Experiment 3.
Here we consider the case when the error distributions are heavy-tailed. We generated the data with $\varepsilon_X$ and $\varepsilon_Y$ having Student's $t$ distribution with df = 14 degrees of freedom. (This is the smallest df for which the assumptions of Theorem 4 hold.) The covering frequencies are presented in Table 3.
Table 3. Covering frequencies for confidence sets in Experiment 3

n      b_0^{(1)}   b_1^{(1)}   (b_0^{(1)}, b_1^{(1)})   b_0^{(2)}   b_1^{(2)}   (b_0^{(2)}, b_1^{(2)})
100    0.935       0.961       0.948                    0.936       0.987       0.957
250    0.953       0.960       0.950                    0.964       0.980       0.950
500    0.940       0.954       0.939                    0.958       0.973       0.962
1000   0.946       0.949       0.943                    0.954       0.971       0.935
2500   0.961       0.949       0.948                    0.937       0.953       0.947
5000   0.947       0.949       0.948                    0.954       0.956       0.958
It seems that the covering accuracy decreased slightly, but the decrease is insignificant for practical purposes.
7 Application to sociological data analysis

We would like to demonstrate the advantages of the proposed technique by applying it to the analysis of External Independent Testing (EIT) data (see [7]). EIT is a set of exams for high school graduates in Ukraine which must be passed for admission to universities. We use data on EIT-2016 from the official site of the
Ukrainian Center for Educational Quality Assessment (https://zno.testportal.com.ua/stat/201606). In this presentation we consider only the data on the scores in two subjects:
Ukrainian language and literature (Ukr) and
Mathematics (Math). The scores range from 100 to 200 points. (We have excluded the data on persons who failed one of these exams or did not take them at all.) EIT-2016 contains such data on 246 thousand examinees. The information on the region (oblast) of Ukraine in which each examinee attended high school is also available in EIT-2016.

Our aim is to investigate how the dependence between the Ukr and Math scores differs for examinees who grew up in different environments. There can be, e.g., an environment of adherents of Ukrainian culture and the Ukrainian state, or an environment of persons critical toward Ukrainian independence. EIT-2016 does not contain information on such issues. So we use the data on the Ukrainian Parliament (Verkhovna Rada) election results to deduce approximate proportions of adherents of different political choices in different regions of Ukraine.

We divided the adherents of the 29 parties and blocs that took part in the elections into three large groups, which are the components of our mixture:
(1) Pro-Ukrainian persons, voting for the parties that then created the ruling coalition (BPP, Batkivschyna, Narodny Front, Radicals and Samopomich).
(2) Contra-Ukrainian persons, who voted for the Opposition Bloc, voted against all, or voted for small parties which were under the 5% threshold in these elections.
(3) Neutral persons, who did not take part in the voting.

Combining these data with EIT-2016 we obtain the sample $(X_j, Y_j)$, $j = 1,\dots,n$, where $X_j$ is the Math score of the $j$-th examinee and $Y_j$ is his/her Ukr score. The concentrations of the components $(p^{(1)}_j, p^{(2)}_j, p^{(3)}_j)$ are taken as the frequencies of adherents of the corresponding political choice in the region where the $j$-th examinee attended high school.

In [7] the authors propose to use the classical linear regression model (in which the error appears in the response only) to describe the dependence between $X_j$ and $Y_j$ in these data. But the errors-in-variables model can be more adequate, since the causes which deteriorate the ideal functional dependence Ukr $= b_0 + b_1\,$Math can affect both the Math and the Ukr scores, causing random deviations of, maybe, the same dispersion for each variable.

So, in this presentation, we assume that the data are described by the model (20)–(22), where $\kappa(O) = 1, 2, 3$ indicates the component (the environment in which the person $O$ grew up) corresponding to one of the three political choices listed above.

In this model we calculated the confidence intervals at the level $\alpha = 0.05/3 \approx 0.017$, so that the overall level $0.05$ is maintained when the three intervals derived for the three different components are compared. The results are presented in Table 4; the lower and upper endpoints of the confidence intervals are given in the columns named low and upp, respectively. We observe that the obtained intervals are rather narrow.
Table 4. Confidence intervals for the coefficients of the regression between Math and Ukr (rows: $b_0^{(k)}$ and $b_1^{(k)}$; columns: lower (low) and upper (upp) endpoints for the Pro, Contra and Neutral components).
Fig. 1. Estimated orthogonal regression lines for the EIT-2016 data.

The intervals do not intersect across components, so the regression coefficients for different components are significantly different. (Of course, this is so only if our theoretical model of the data distribution is adequate.)

The orthogonal regression lines corresponding to the different components are presented in Fig. 1: the solid line corresponds to the Pro component, the dashed line to the Contra component and the dotted line to the Neutral component.

These results have a simple and plausible explanation. Say, in the Pro component success in Ukr is positively correlated with general school success, and hence with the Math score too; this is natural for persons who are interested in Ukrainian culture and literature. In the Contra component the correlation is negative. Why? The persons with high Math grades in this component do not feel the need to learn Ukrainian, while the persons with less success in Math try to improve their average score (by which admission to universities is decided) by increasing their Ukr score. The Neutral component shows a positive correlation between Math and Ukr, but it is smaller than in the Pro component.

Surely, these explanations are too simple to be absolutely correct. We consider them only as examples of hypotheses which can be deduced from the data by the proposed technique.
8 Proofs

To prove Theorem 3 we need three lemmas. Below the symbols $C$ and $c$ denote finite constants, possibly different in different formulas.
Lemma 1.
Assume that $\frac1n\Gamma_{;n}\to\Gamma_\infty$ and $\det\Gamma_\infty > 0$. Then:
1. $\sup_{j=1,\dots,n;\ m=1,\dots,M} |a^{(m)}_{j;n}| = O(n^{-1})$.
2. $\sup_{i,j=1,\dots,n;\ i\ne j;\ m=1,\dots,M} |a^{(m)}_{j;n} - a^{(m)}_{ji-;n}| = O(n^{-2})$.

Proof.
By assumption, $\frac1n\Gamma_{;n}\to\Gamma_\infty$, so there exists $c > 0$ such that $\det(\frac1n\Gamma_{;n}) > c$ for all $n$ large enough. Together with $|p^{(m)}_{j;n}|\le 1$ this implies
$$\|\Gamma^{-1}_{;n}\|\le C/n. \qquad(26)$$
(Here $\|\cdot\|$ denotes the operator norm.) Taking into account that $a_{;n} = p_{;n}\Gamma^{-1}_{;n}$, we obtain the first statement of the lemma.

Then by (16)–(17),
$$a_{\bullet j} - a_{\bullet j,i-} = \Gamma^{-1}p_{\bullet j} - \Gamma^{-1}_{i-}p_{\bullet j} = -\frac{1}{1-h_i}\,\Gamma^{-1}p_{\bullet i}(p_{\bullet i})^T\Gamma^{-1}p_{\bullet j}.$$
This together with (26) yields the second statement.
Lemma 2.
Assume that for $m = 1,\dots,M$:
1. $E^{(m)}[\xi] = 0$;
2. for some $\delta > 0$, $E^{(m)}\big[(\xi_l)^2\,|\log|\xi_l||^{1+\delta}\big] < \infty$ for all $l = 1,\dots,d$;
3. $\det\Gamma_\infty > 0$.
Then for some $C < \infty$,
$$P\Big\{\|\bar\xi^{(k)}\| > C\sqrt{\tfrac{\log\log n}{n}}\Big\}\to 0 \quad\text{as } n\to\infty.$$

Proof.
Let $\eta_1,\dots,\eta_n$ be independent random variables with $E\eta_i = 0$. Let us denote $B_n = \sum_{j=1}^n E(\eta_j^2)$. Then the last formula in the proof of Theorem 7.2 and Theorem 7.3 in [10] imply the following proposition.

Proposition 1. If
$$\lim_{n\to\infty}\frac1n\sum_{j=1}^n E\big[\eta_j^2\,|\log|\eta_j||^{1+\delta}\big] < \infty,$$
then for any $b$ such that $0 < b < \sqrt{\delta}$,
$$P\Big\{\sum_{j=1}^n \eta_j \ge b\sqrt{B_n\log\log B_n}\Big\}\le (\log B_n)^{-b}$$
for $n$ large enough.

Let $\eta_j = \pm n\,a^{(k)}_{j;n}\xi_{lj}$. Then $B_n = n^2\sum_{j=1}^n (a^{(k)}_{j;n})^2\,\mathrm{Var}\,\xi_{lj}\sim Cn$ by Lemma 1. Assumption 2 implies that the assumption of Proposition 1 holds. So,
$$P\Big\{\Big|\sum_{j=1}^n n\,a^{(k)}_{j;n}\xi_{lj}\Big| > b\sqrt{Cn\log\log (Cn)}\Big\}\to 0.$$
This implies the statement of the lemma.
Lemma 3.
Assume that for some $\alpha > 0$,
$$E^{(m)}[\,\|\xi\|^\alpha\,] < \infty \quad\text{for all } m = 1,\dots,M.$$
Then for any $\beta > 1/\alpha$ there exists $C < \infty$ such that
$$P\Big\{\sup_{j=1,\dots,n}\|\xi_j\| > Cn^\beta\Big\}\to 0 \quad\text{as } n\to\infty.$$

Proof.
By the Chebyshev inequality we obtain that for some $0 < R < \infty$, $P\{\|\xi_j\| > x\}\le R x^{-\alpha}$. Then, since the $\xi_j$ are independent, for $\alpha\beta > 1$,
$$P\Big\{\sup_{j=1,\dots,n}\|\xi_j\| > Cn^\beta\Big\} = 1 - P\Big\{\sup_{j=1,\dots,n}\|\xi_j\|\le Cn^\beta\Big\} = 1 - \prod_{j=1}^n P\{\|\xi_j\|\le Cn^\beta\} = 1 - \prod_{j=1}^n\big(1 - P\{\|\xi_j\| > Cn^\beta\}\big)$$
$$\le 1 - \Big(1 - \frac{R}{C^\alpha n^{\alpha\beta}}\Big)^n = 1 - \exp\Big(n\log\Big(1 - \frac{R}{C^\alpha n^{\alpha\beta}}\Big)\Big)\sim 1 - \exp\Big(-\frac{nR}{C^\alpha n^{\alpha\beta}}\Big)\to 0$$
as $n\to\infty$, since $\alpha\beta > 1$. The lemma is proved.

Proof of Theorem 3.
Let $\xi'_j = \xi_j - E\xi_j$. Then
$$\bar\xi^{(k)} = \sum_{j=1}^n a^{(k)}_j\xi_j = \sum_{j=1}^n a^{(k)}_j\xi'_j + \sum_{j=1}^n\sum_{m=1}^M a^{(k)}_j p^{(m)}_j\mu^{(m)} = \sum_{j=1}^n a^{(k)}_j\xi'_j + \mu^{(k)},$$
due to (3). Similarly, $\bar\xi^{(k)}_{i-} = \sum_{j\ne i} a^{(k)}_{ji-}\xi'_j + \mu^{(k)}$.

Let us denote $U_i = (U_{1i},\dots,U_{di})^T = \bar\xi^{(k)} - \bar\xi^{(k)}_{i-}$. Then
$$U_i = \sum_{j=1}^n a^{(k)}_j\xi'_j - \sum_{j\ne i} a^{(k)}_{ji-}\xi'_j = a^{(k)}_i\xi'_i + \sum_{j\ne i}\big(a^{(k)}_j - a^{(k)}_{ji-}\big)\xi'_j \qquad(27)$$
and
$$\hat\vartheta - \hat\vartheta_{i-} = H(\bar\xi^{(k)}) - H(\bar\xi^{(k)}_{i-}) = H'(\zeta_i)U_i,$$
where $\zeta_i$ is some intermediate point between $\bar\xi^{(k)}$ and $\bar\xi^{(k)}_{i-}$. So,
$$\hat V_{;n} = n\sum_{i=1}^n H'(\zeta_i)U_iU_i^T(H'(\zeta_i))^T. \qquad(28)$$
Let us denote
$$\tilde V_{;n} = n\sum_{i=1}^n H'(\mu^{(k)})U_iU_i^T(H'(\mu^{(k)}))^T. \qquad(29)$$
We will show that
$$\tilde V_{;n}\to V_\infty \quad\text{as } n\to\infty \text{ in probability} \qquad(30)$$
and
$$\|\hat V_{;n} - \tilde V_{;n}\|\to 0 \quad\text{as } n\to\infty \text{ in probability}. \qquad(31)$$
These two statements imply the statement of the theorem.

We start from (30). Let us calculate $E\tilde V_{;n}$. Notice that
$$E\,U_iU_i^T = (a^{(k)}_i)^2\,E\,\xi'_i(\xi'_i)^T + \sum_{j\ne i}\big(a^{(k)}_j - a^{(k)}_{ji-}\big)^2\,E\,\xi'_j(\xi'_j)^T.$$
By Assumption 2 of the theorem, $\sup_i\|E\,\xi'_i(\xi'_i)^T\| < C$, and by Lemma 1, $\sup_{j}\big(a^{(k)}_j - a^{(k)}_{ji-}\big)^2 = O(n^{-4})$. So,
$$E\tilde V_{;n} = n\,H'(\mu^{(k)})\sum_{i=1}^n (a^{(k)}_i)^2\,E\,\xi'_i(\xi'_i)^T\,(H'(\mu^{(k)}))^T + O(n^{-1}).$$
In the same way as in the derivation of (7) we obtain
$$E\tilde V_{;n}\to H'(\mu^{(k)})\,\Sigma^{(k)}_\infty\,(H'(\mu^{(k)}))^T = V_\infty. \qquad(32)$$
Now let us estimate
$$E\|\tilde V_{;n} - E\tilde V_{;n}\|^2 \le Cn^2\sum_{l_1,l_2=1}^d\sum_{i=1}^n E\big(U_{l_1i}U_{l_2i} - E\,U_{l_1i}U_{l_2i}\big)^2 \le Cn^2\sum_{l=1}^d\sum_{i=1}^n E(U_{li})^4. \qquad(33)$$
Notice that
$$E(U_{li})^4 = E\Big(a^{(k)}_i\xi'_{li} + \sum_{j\ne i}\big(a^{(k)}_j - a^{(k)}_{ji-}\big)\xi'_{lj}\Big)^4$$
$$= (a^{(k)}_i)^4 E(\xi'_{li})^4 + 6(a^{(k)}_i)^2 E(\xi'_{li})^2\,E\Big(\sum_{j\ne i}\big(a^{(k)}_j - a^{(k)}_{ji-}\big)\xi'_{lj}\Big)^2 + E\Big(\sum_{j\ne i}\big(a^{(k)}_j - a^{(k)}_{ji-}\big)\xi'_{lj}\Big)^4 = O(n^{-4})$$
due to Lemma 1 and Assumption 2 of the theorem. So by (33) we obtain
$$E\|\tilde V_{;n} - E\tilde V_{;n}\|^2 = O(n^{-1}).$$
This and (32) imply (30).

Let us show (31). Notice that
$$\hat V_{;n} - \tilde V_{;n} = n\sum_{i=1}^n\big(H'(\zeta_i) - H'(\mu^{(k)})\big)U_iU_i^T(H'(\zeta_i))^T + n\sum_{i=1}^n H'(\mu^{(k)})U_iU_i^T\big(H'(\zeta_i) - H'(\mu^{(k)})\big)^T. \qquad(34)$$
By Lemma 1, $\sup_i\|U_iU_i^T\|\le Cn^{-2}\sup_i\|\xi'_i\|^2$. By Lemma 3 and Assumption 2 of the theorem, $\sup_i\|\xi'_i\| = O_P(n^\beta)$ for any $\beta > 1/\alpha$. Since $\alpha > 4$ we may take here $\beta < 1/4$.

Let us estimate $\sup_i\|H'(\zeta_i) - H'(\mu^{(k)})\|$. Notice that $\zeta_i$ is an intermediate point between $\bar\xi^{(k)}$ and $\bar\xi^{(k)}_{i-}$. By Lemma 2,
$$\|\bar\xi^{(k)} - \mu^{(k)}\| = O_P\Big(\sqrt{\tfrac{\log\log n}{n}}\Big).$$
Then, by Lemma 1, $\sup_i\|\bar\xi^{(k)} - \bar\xi^{(k)}_{i-}\|\le Cn^{-1}\sup_i\|\xi'_i\| = O_P(n^{\beta-1})$, and therefore
$$\sup_i\|\zeta_i - \mu^{(k)}\| = O_P\Big(\sqrt{\tfrac{\log\log n}{n}}\Big).$$
Due to Assumption 1 of the theorem this implies
$$\sup_i\|H'(\zeta_i) - H'(\mu^{(k)})\| = O_P\Big(\sqrt{\tfrac{\log\log n}{n}}\Big) \quad\text{and}\quad \sup_i\|H'(\zeta_i)\| = O_P(1).$$
Combining these estimates with (34), we obtain
$$\hat V_{;n} - \tilde V_{;n} = n\sum_{i=1}^n O_P\Big(\sqrt{\tfrac{\log\log n}{n}}\Big)\,Cn^{-2}n^{2\beta}\,O_P(1) = o_P(1),$$
since $\beta < 1/4$. This is (31). Combining (31) and (30), we obtain the statement of the theorem.

9 Concluding remarks

We introduced a modification of the jackknife technique for ACM estimation for moment estimators based on observations from mixtures with varying concentrations. A fast algorithm which implements this technique is proposed. Consistency of the derived estimator is demonstrated. The results of the simulations demonstrate its practical applicability for moderate and large sample sizes.

References

[1] Borovkov, A.A.: Mathematical Statistics. Gordon and Breach Science Publishers, Amsterdam (1998). MR1712750
[2] Branham, R.: Total least squares in astronomy. In: Total Least Squares and Errors-in-Variables Modeling, pp. 375–384. Springer, Dordrecht (2002). MR1952962. doi:10.1007/978-94-017-3552-0_33
[3] Cheng, C.-L., Van Ness, J.: Statistical Regression with Measurement Error. Kendall's Library of Statistics 6. Arnold, London (1999). MR1719513
[4] Maiboroda, R.: Statistical Analysis of Mixtures. Kyiv University Publishers, Kyiv (2003) (in Ukrainian)
[5] Maiboroda, R., Sugakova, O.: Statistics of mixtures with varying concentrations with application to DNA microarray data analysis. J. Nonparametr. Stat. (1), 201–215 (2012). MR2885834. doi:10.1080/10485252.2011.630076
[6] Maiboroda, R., Navara, H., Sugakova, O.: Orthogonal regression for observations from mixtures. Teor. Imovir. Mat. Stat., 152–167 (2018)
[7] Miroshnichenko, V., Maiboroda, R.: Confidence ellipsoids for regression coefficients by observations from a mixture. Mod. Stoch. Theory Appl. (2), 225–245 (2018). MR3813093. doi:10.15559/18-vmsta105
[8] Masiuk, S., Kukush, A., Shklyar, S., Chepurny, M., Likhtarov, I. (eds.): Radiation Risk Estimation: Based on Measurement Error Models, 2nd edn. De Gruyter Series in Mathematics and Life Sciences, vol. 5. De Gruyter (2017). MR3726857
[9] McLachlan, G.J., Lee, S.X., Rathnayake, S.I.: Finite mixture models. Ann. Rev. Stat. Appl., 355–378 (2019). MR3939525. doi:10.1146/annurev-statistics-031017-100325
[10] Petrov, V.: Limit Theorems of Probability Theory: Sequences of Independent Random Variables. Clarendon Press
[11] Shao, J.: Mathematical Statistics. Springer, New York (2007). MR2002723. doi:10.1007/b97553
[12] Shao, J., Tu, D.: The Jackknife and Bootstrap. Springer (2012). MR1351010. doi:10.1007/978-1-4612-0795-5