Semiparametric empirical likelihood inference with estimating equations under density ratio models
Meng Yuan, Pengfei Li and Changbao Wu

The density ratio model (DRM) provides a flexible and useful platform for combining information from multiple sources. In this paper, we consider statistical inference under two-sample DRMs with additional parameters defined through, and/or additional auxiliary information expressed as, estimating equations. We examine the asymptotic properties of the maximum empirical likelihood estimators (MELEs) of the unknown parameters in the DRMs and/or defined through estimating equations, and establish the chi-square limiting distributions for the empirical likelihood ratio (ELR) statistics. We show that the asymptotic variance of the MELEs of the unknown parameters does not decrease if one estimating equation is dropped. Similar properties are obtained for inferences on the cumulative distribution function and quantiles of each of the populations involved. We also propose an ELR test for the validity and usefulness of the auxiliary information. Simulation studies show that correctly specified estimating equations for the auxiliary information result in more efficient estimators and shorter confidence intervals. Two real-data examples are used for illustrations.
Keywords: Auxiliary information, density ratio model, empirical likelihood, estimating equations.
Suppose we have two independent random samples {X_{01}, ..., X_{0n_0}} and {X_{11}, ..., X_{1n_1}} from two populations with cumulative distribution functions (CDFs) F_0 and F_1, respectively. The dimension of X_{ij} can be one or greater than one. We assume that the CDFs F_0 and F_1 are linked through a semiparametric density ratio model (DRM) (Anderson, 1979; Qin, 2017),

dF_1(x) = exp{α + β^⊤ q(x)} dF_0(x) = exp{θ^⊤ Q(x)} dF_0(x),   (1)

where dF_i(x) denotes the density of F_i(x) for i = 0 and 1; θ = (α, β^⊤)^⊤ are the unknown parameters for the DRM; Q(x) = (1, q(x)^⊤)^⊤ with q(x) being a prespecified, nontrivial function of dimension d; and the baseline distribution F_0 is unspecified. (Meng Yuan is a doctoral student, Pengfei Li is Professor, and Changbao Wu is Professor, Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, ON N2L 3G1, Canada. E-mails: [email protected], [email protected] and [email protected].) We further assume that information about F_0, F_1, and θ is available in the form of functionally independent unbiased estimating equations (EEs):

E{g(X; ψ, θ)} = 0,   (2)

where E(·) refers to the expectation operator with respect to F_0, ψ consists of additional parameters of interest and has dimension p, g(·; ·) is r-dimensional, and r ≥ p. In this paper, our goal is twofold:

(1) we develop new and general semiparametric inference procedures for (ψ, θ) and (F_0, F_1) along with their quantiles under Model (1) with unbiased EEs in (2);

(2) we propose a new testing procedure on the validity of (2) under Model (1), which leads to a practical validation method on the usefulness of the auxiliary information.

The semiparametric DRM in (1) provides a flexible and useful platform for combining information from multiple sources (Qin, 2017).
It enables us to utilize information from both F_0 and F_1 to improve inferences on the unknown model parameters and the summary population quantities of interest (Chen & Liu, 2013; Cai et al., 2017; Zhuang et al., 2019). With the unspecified F_0, the DRM embraces many commonly used statistical models, including distributions of exponential families (Kay & Little, 1987). For example, when q(x) = log x, the DRM includes two log-normal distributions with the same variance with respect to the log-scale, as well as two gamma distributions with the same scale parameter; when q(x) = x, it includes two normal distributions with different means but a common variance, and two exponential distributions. Jiang & Tu (2012) observed that the DRM is actually broader than Cox proportional hazards models. Moreover, it has a natural connection to the well-studied logistic regression if one treats D = 0 and 1 as indicators for the observations from F_0 and F_1, respectively. Among others, Anderson (1979) and Qin & Zhang (1997) noticed that the DRM is equivalent to the logistic regression model via the fact that

P(D = 1 | x) = exp{α* + β^⊤ q(x)} / [1 + exp{α* + β^⊤ q(x)}],   (3)

where α* = α + log{P(D = 1)/P(D = 0)}.

The EEs in (2) play two important roles. First, they can be used to define many important summary population quantities, such as the ratio of the two population means, the centered and uncentered moments, the generalized entropy class of inequality measures, the CDFs, and the quantiles of each population. See Example 1 below and Section 1 of the Supplementary Material for more examples. Second, they provide a unified platform for the use of auxiliary information. With many data sources being increasingly available, it becomes more feasible to access auxiliary information, and using such information to enhance statistical inference is an important and active research topic in many fields.
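The two-normal example above can be checked directly: with a common variance, the log density ratio is exactly linear in x, so q(x) = x suffices. The following minimal sketch (a numerical check with hypothetical parameter values, not code from the paper) verifies this.

```python
import numpy as np

# Numerical check (hypothetical parameter values): for N(mu0, s^2) vs N(mu1, s^2),
# the log density ratio equals alpha + beta*x with beta = (mu1 - mu0)/s^2 and
# alpha = (mu0^2 - mu1^2)/(2 s^2), so the DRM holds with q(x) = x.
def log_normal_pdf(x, mu, s):
    return -0.5 * np.log(2 * np.pi * s ** 2) - (x - mu) ** 2 / (2 * s ** 2)

mu0, mu1, s = 0.0, 1.5, 2.0
beta = (mu1 - mu0) / s ** 2
alpha = (mu0 ** 2 - mu1 ** 2) / (2 * s ** 2)

x = np.linspace(-5.0, 5.0, 101)
log_ratio = log_normal_pdf(x, mu1, s) - log_normal_pdf(x, mu0, s)
assert np.allclose(log_ratio, alpha + beta * x)  # dF1/dF0 = exp{alpha + beta x}
```

An analogous computation with lognormal or gamma densities yields a log ratio linear in log x, matching the q(x) = log x case.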
Calibration estimators, which are widely used in survey sampling, missing data problems, and causal inference, rely heavily on the use of auxiliary information; see Wu & Thompson (2020) and the references therein. Many economics problems can be addressed using similar methodology. For instance, knowledge of the moments of the marginal distributions of economic variables from census reports can be used in combination with microdata to improve the parameter estimates of microeconomic models (Imbens & Lancaster, 1994). Examples 2 and 3 below illustrate the use of auxiliary information through EEs in the form of (2).

Example 1. (The mean ratio of two populations) The ratio of the means of two positive skewed distributions is often of interest in biomedical research (Zhou et al., 1997; Wu et al., 2002). Let µ_0 and µ_1 be the means with respect to F_0 and F_1, respectively. Further, let δ = µ_1/µ_0 denote the mean ratio of the two populations. For inference on δ, a common assumption is that both distributions are lognormal. To alleviate the risk of parametric assumptions, we could use the DRM in (1) with q(x) = log x or q(x) = (log x, (log x)²)^⊤, depending on whether or not the variances with respect to the log-scale are the same. Then, under the DRM (1), δ can be defined through the following EE:

g(x; ψ, θ) = δx − x exp{θ^⊤ Q(x)},

with ψ = δ. When additional information is available, we may add more EEs to improve the estimation efficiency; see Section 4.1 for further detail.

Example 2. (Retrospective case-control studies with auxiliary information) Consider a retrospective case-control study with D = 1 or 0 representing diseased or disease-free status, and X representing the collection of risk factors. Note that the two samples are collected retrospectively, given the diseased status. Let F_0 and F_1 denote the CDF of X given D = 0 and D = 1, respectively. Assume that the relationship between D and X can be modeled by the logistic regression specified in (3).
Then, using the equivalence between the DRM and the logistic regression discussed above, F_0 and F_1 satisfy the DRM (1).

Qin et al. (2015) used covariate-specific disease prevalence information to improve the power of case-control studies. Specifically, let X = (Y, Z)^⊤ with Y and Z being two risk factors. Assume that we know the disease prevalence at various levels of Y: φ(a_{l−1}, a_l) = P(D = 1 | a_{l−1} < Y ≤ a_l) for l = 1, ..., k. Let π = P(D = 1) be the overall disease prevalence. Using Bayes' formula, the information in the φ(a_{l−1}, a_l)'s can be summarized as E{g(X; ψ, θ)} = 0, where ψ = π and the l-th component of g(x; ψ, θ) is

g_l(x; ψ, θ) = I(a_{l−1} < y ≤ a_l) [ {π/(1 − π)} exp{θ^⊤ Q(x)} − φ(a_{l−1}, a_l)/{1 − φ(a_{l−1}, a_l)} ].   (4)

Chatterjee et al. (2016) improved the internal study by using summary-level information from an external study. Suppose X = (Y^⊤, Z^⊤)^⊤, where Y is available for both the internal and external studies, while Z is available for only the internal study. Assume that the external study provides the true coefficients (α*_Y, β*_Y) for the following logistic regression model, which may not be the true model:

h(Y; α_Y, β_Y) = P(D = 1 | Y) = exp(α_Y + β_Y^⊤ Y) / {1 + exp(α_Y + β_Y^⊤ Y)}.

This assumption is reasonable when the total sample size n = n_0 + n_1 satisfies n/n_E → 0, where n_E is the total sample size in the external study. Further, assume that the joint distribution of (D, X) is the same for both the internal and external studies. Let h(y) = h(y; α*_Y, β*_Y).
In Section 2 of the Supplementary Material, we argue that if the external study is a prospective case-control study, then E{g(X; ψ, θ)} = 0, where

g(x; ψ, θ) = [−(1 − π) h(y) + π exp{θ^⊤ Q(x)}{1 − h(y)}] (1, y^⊤)^⊤   (5)

with ψ = π; if the external study is a retrospective case-control study, then E{g(X; θ)} = 0, where

g(x; θ) = [−(1 − π_E) h(y) + π_E exp{θ^⊤ Q(x)}{1 − h(y)}] (1, y^⊤)^⊤   (6)

with π_E being the proportion of diseased individuals in the external study.

Example 3. (A two-sample problem with common mean) Tsao & Wu (2006) considered two populations with a common mean. This type of problem occurs when two "instruments" are used to collect data on a common response variable, and these two instruments are believed to have no systematic biases but to differ in precision. The observations from the two instruments then form two samples with a common population mean. In the literature, there has been much interest in using the pooled sample to improve inferences. A common assumption is that the two samples follow normal distributions with a common mean but different variances. To gain robustness with respect to the parametric assumption, we may use the DRM (1) with q(x) = (x, x²)^⊤. Under this model, the common-mean assumption can be incorporated via the EE:

E{X exp{θ^⊤ Q(X)} − X} = 0.   (7)

The DRM has been investigated extensively because of its flexibility and efficiency. For example, it has been applied to multiple-sample hypothesis-testing problems (Fokianos et al., 2001; Cai et al., 2017; Wang et al., 2017, 2018) and quantile and quantile-function estimation (Zhang, 2000; Chen & Liu, 2013).
These inference problems can be viewed as special cases, without auxiliary information, of the first goal to be achieved in this paper. Other applications of the DRM include receiver operating characteristic (ROC) analysis (Qin & Zhang, 2003; Chen et al., 2016; Yuan et al., 2021), inference under semiparametric mixture models (Qin, 1999; Zou et al., 2002; Li et al., 2017), the modeling of multivariate extremal distributions (de Carvalho & Davison, 2014), and dominance index estimation (Zhuang et al., 2019). Recently, Li et al. (2018) studied maximum empirical likelihood estimation (MELE) and empirical likelihood ratio (ELR) based confidence intervals (CIs) for a parameter defined as ψ = ∫ u(x; θ) dF_0(x), where u(·; ·) is a one-dimensional function. They did not consider auxiliary information, and because of the specific form of ψ, their results do not apply to the mean ratio discussed in Example 1. Zhang et al. (2020) investigated the ELR statistic for quantiles under the DRM and showed that the ELR-based confidence region of the quantiles is preferable to the Wald-type confidence region. Again, they did not consider auxiliary information. In summary, the existing literature on DRMs focuses on cases where there is no auxiliary information, and furthermore, there is no general theory available to handle parameters defined through the EEs in (2).

Using the connection of the DRM to the logistic regression model, Qin et al. (2015) studied the MELE of θ and the ELR statistic for testing a parameter in θ under Model (1) with the unbiased EEs in (4). Chatterjee et al. (2016) proposed constrained maximum likelihood estimation for the unknown parameters in the internal study using summary-level information from an external study. In Section 2 of the Supplementary Material, we argue that their results are applicable to the MELE of θ under Model (1) with the unbiased EEs in (5), but not to the MELE of θ under Model (1) with the unbiased EEs in (6).
Furthermore, they did not consider the ELR statistic for the unknown parameters. Qin et al. (2015) and Chatterjee et al. (2016) focused on how to use auxiliary information to improve inference on the unknown parameters, and they did not check the validity of that information or explore inferences on the CDFs (F_0, F_1) and their quantiles.

With two-sample observations from the DRM (1), we use the empirical likelihood (EL) of Owen (1988, 2000) to incorporate the unbiased EEs in (2). We show that the MELE of (ψ, θ) is asymptotically normal, and its asymptotic variance will not decrease when an EE in (2) is dropped. We also develop an ELR statistic for testing a general hypothesis about (ψ, θ), and show that it has a χ² limiting distribution under the null hypothesis. The result can be used to construct the ELR-based confidence region for (ψ, θ). Similar results are obtained for inferences on (F_0, F_1) and their quantiles. Finally, we construct an ELR statistic with a χ² null limiting distribution to test the validity of some or all of the EEs in (2).

We make the following observations:

(1) Our results on the two-sample DRMs contain more advanced development than those in Qin & Lawless (1994) for the one-sample case.

(2) Our inferential framework and theoretical results are very general. The results in Qin et al. (2015) and Chatterjee et al. (2016) for case-control studies are special cases of our theory for an appropriate choice of g(x; ψ, θ) in (2). Our results are also applicable to cases that are not covered by these two earlier studies, e.g., Example 2 with the EEs in (6) and Example 3.

(3) Our proposed ELR statistic, to our best knowledge, is the first formal procedure to test the validity of auxiliary information under the DRM or for case-control studies.

(4) Our proposed inference procedures for (F_0, F_1) and their quantiles in the presence of auxiliary information are new to the literature.

The rest of this paper is organized as follows.
In Section 2, we develop the EL inferential procedures and study the asymptotic properties of the MELE of (ψ, θ). We also investigate the ELR statistics for (ψ, θ) and for testing the validity of the EEs in (2). In Section 3, we discuss inference procedures for (F_0, F_1) and their quantiles. Simulation results are reported in Section 4, and two real-data examples are presented in Section 5. We conclude the paper with a discussion in Section 6. For convenience of presentation, all the technical details are given in the Supplementary Material.

2 Empirical likelihood inference on (ψ, θ)

In this section, we first develop the EL formulation under the DRM (1) with the unbiased EEs in (2). With two samples {X_{01}, ..., X_{0n_0}} and {X_{11}, ..., X_{1n_1}} from F_0 and F_1, respectively, the full likelihood is

∏_{i=0}^{1} ∏_{j=1}^{n_i} dF_i(X_{ij}).

Under the one-sample EL formulation of Owen (2000), the baseline distribution function F_0(x) would have been modeled as F_0(x) = Σ_{j=1}^{n_0} p_j I(X_{0j} ≤ x), where p_j = dF_0(X_{0j}) for j = 1, ..., n_0. Under the two-sample DRM (1), we use the combined sample to model the baseline function F_0(x) as

F_0(x) = Σ_{i=0}^{1} Σ_{j=1}^{n_i} p_{ij} I(X_{ij} ≤ x),   (8)

where p_{ij} = dF_0(X_{ij}) for i = 0, 1 and j = 1, ..., n_i. Note that the size of the combined sample is n = n_0 + n_1. With (8) and under the DRM (1), the EL function is given by

L_n = {∏_{i=0}^{1} ∏_{j=1}^{n_i} p_{ij}} {∏_{j=1}^{n_1} exp[θ^⊤ Q(X_{1j})]}.   (9)

The feasible p_{ij}'s satisfy two sets of constraints given by

C_1 = {(F_0, θ): p_{ij} > 0, Σ_{i=0}^{1} Σ_{j=1}^{n_i} p_{ij} = 1, Σ_{i=0}^{1} Σ_{j=1}^{n_i} p_{ij} exp{θ^⊤ Q(X_{ij})} = 1}   (10)

and

C_2 = {(F_0, ψ, θ): Σ_{i=0}^{1} Σ_{j=1}^{n_i} p_{ij} g(X_{ij}; ψ, θ) = 0},   (11)

where the set of constraints C_1 ensures that F_0 and F_1 are CDFs, and the set of constraints C_2 is induced by the EEs in (2).

Using the Lagrange multiplier method, for given ψ and θ, it can be shown that the maximizer of the EL function is given by

p_{ij} = (1/n) · 1 / [1 + λ{exp(θ^⊤ Q(X_{ij})) − 1} + ν^⊤ g(X_{ij}; ψ, θ)],

where the Lagrange multipliers λ and ν = (ν_1, ..., ν_r)^⊤ are the solutions to the following set of r + 1 equations:

Σ_{i=0}^{1} Σ_{j=1}^{n_i} [exp{θ^⊤ Q(X_{ij})} − 1] / [1 + λ{exp(θ^⊤ Q(X_{ij})) − 1} + ν^⊤ g(X_{ij}; ψ, θ)] = 0,   (12)

Σ_{i=0}^{1} Σ_{j=1}^{n_i} g(X_{ij}; ψ, θ) / [1 + λ{exp(θ^⊤ Q(X_{ij})) − 1} + ν^⊤ g(X_{ij}; ψ, θ)] = 0.   (13)

The profile empirical log-likelihood of (ψ, θ) is given by

ℓ_n(ψ, θ) = −Σ_{i=0}^{1} Σ_{j=1}^{n_i} log{1 + λ[exp{θ^⊤ Q(X_{ij})} − 1] + ν^⊤ g(X_{ij}; ψ, θ)} + Σ_{j=1}^{n_1} θ^⊤ Q(X_{1j}).

The MELEs of ψ and θ are then defined as (ψ̂, θ̂) = arg max_{ψ,θ} ℓ_n(ψ, θ).

We now establish the asymptotic distribution of (ψ̂, θ̂). Let η = (ψ^⊤, θ^⊤)^⊤ be the vector of parameters and η* be the true value of η. Let λ* = n_1/n. We further define

ω(x; θ) = exp{θ^⊤ Q(x)},  ω(x) = ω(x; θ*),  h(x) = 1 + λ*{ω(x) − 1},  h_1(x) = λ* ω(x)/h(x),

G(x; η) = (ω(x; θ) − 1, g(x; ψ, θ)^⊤)^⊤,  G(x) = G(x; η*),

A_θθ = (1 − λ*) E{h_1(X) Q(X) Q(X)^⊤},  A_θu = A_uθ^⊤ = E{∂G(X; η*)/∂θ}^⊤ − E{h_1(X) Q(X) G(X)^⊤},

A_ψu = A_uψ^⊤ = E{∂G(X; η*)/∂ψ}^⊤,  A_uu = E{G(X) G(X)^⊤ / h(X)}.

Noting that ω(·), h(·), h_1(·), and G(·) depend on ψ* and/or θ*, we drop these redundant parameters for notational simplicity.

Theorem 1.
Assume that the regularity conditions in the Appendix are satisfied. As the total sample size n = n_0 + n_1 goes to infinity, we have

n^{1/2}(η̂ − η*) → N(0, J^{−1})

in distribution, where

J = U V^{−1} U^⊤,  U = ( 0     A_ψu
                          A_θθ  A_θu ),  and  V = ( A_θθ  0
                                                    0     A_uu ).
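To make the EL weights concrete, the following minimal sketch (hypothetical data and a fixed θ value, and with no auxiliary EEs, so ν = 0) solves the analogue of (12) for λ by bisection and checks that the resulting weights satisfy the constraints in (10).

```python
import numpy as np

# Minimal sketch: with nu = 0 (no auxiliary EEs) and theta held fixed, the
# multiplier lambda solves (12); the data and theta below are hypothetical.
rng = np.random.default_rng(0)
x = rng.lognormal(0.0, 1.0, size=60)             # pooled sample (hypothetical)
theta = np.array([-0.125, 0.5])                  # (alpha, beta), hypothetical
w = np.exp(theta[0] + theta[1] * np.log(x))      # exp{theta^T Q(x)}, Q = (1, log x)

def score(lam):                                  # left-hand side of (12)
    return np.sum((w - 1.0) / (1.0 + lam * (w - 1.0)))

# bracket chosen so that every denominator stays positive; score() is
# strictly decreasing in lam, so plain bisection finds the root
lo = -1.0 / (w.max() - 1.0) + 1e-6
hi = 1.0 / (1.0 - w.min()) - 1e-6
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if score(mid) > 0.0:
        lo = mid
    else:
        hi = mid
lam = 0.5 * (lo + hi)

p = 1.0 / (len(x) * (1.0 + lam * (w - 1.0)))     # EL weights p_ij
assert abs(p.sum() - 1.0) < 1e-8                 # sum of p_ij equals 1
assert abs((p * w).sum() - 1.0) < 1e-8           # sum of p_ij exp{theta^T Q} equals 1
```

Both constraints in (10) hold at the root of (12) by construction; with auxiliary EEs present, ν enters the denominators and (12)-(13) are solved jointly.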
In the absence of the constraints C_2 in (11), we can maximize the EL function in (9) with respect only to the CDF constraints C_1 in (10) to obtain the MELE θ̃ of θ. Qin & Zhang (1997) and Keziou & Leoni-Aubin (2008) noticed that θ̃ equivalently maximizes the following dual likelihood:

ℓ_{nd}(θ) = −Σ_{i=0}^{1} Σ_{j=1}^{n_i} log{1 + λ*[exp{θ^⊤ Q(X_{ij})} − 1]} + Σ_{j=1}^{n_1} θ^⊤ Q(X_{1j}).   (14)

That is, θ̃ = arg max_θ ℓ_{nd}(θ).

Corollary 1.
Under the conditions of Theorem 1,

(a) if r = p, the asymptotic variance of n^{1/2}(θ̂ − θ*) is the same as that of n^{1/2}(θ̃ − θ*);

(b) if r > p, the asymptotic variance matrix of n^{1/2}(η̂ − η*) cannot decrease if an EE in (2) is dropped.

We provide some further comments on the results presented in Corollary 1. First, when the dimensions of the parameter ψ and the EEs are equal, we can solve

∫ g(x; ψ, θ̃) dF̃_0(x) = 0

to get the estimator ψ̃ of ψ, where F̃_0(x) is the MELE of F_0 without the constraints C_2 in (11). Because of the result in Corollary 1(a), the estimators ψ̃ and ψ̂ share the same asymptotic properties. Second, Corollary 1(b) indicates that additional auxiliary information leads to more efficient estimation of η.

The proposed semiparametric method provides a point estimator of the unknown parameters whose asymptotic normality is analogous to that of a parametric estimator. The semiparametric framework also creates a natural platform for hypothesis tests using the ELR statistic. We consider a general null hypothesis

H_0: H(η) = 0,

where the function H(·) is a q × 1 vector with q ≤ p + d + 1, and the derivative of this function is of rank q. This null hypothesis forms a third set of constraints

C_3 = {η = (ψ^⊤, θ^⊤)^⊤: H(η) = 0}.

The ELR statistic for testing H_0 is then defined as

R_n = 2 {sup_{ψ,θ} ℓ_n(ψ, θ) − sup_{η ∈ C_3} ℓ_n(ψ, θ)}.

The next theorem establishes the asymptotic property of the ELR statistic R_n under the null hypothesis H_0.

Theorem 2. Assume that the conditions of Theorem 1 hold. Under H_0, as n → ∞, the ELR statistic R_n → χ²_q in distribution.

The result of Theorem 2 is very general due to the general form of the function H(·). First, it is applicable to testing problems that focus on some of the parameters in η. For example, if we wish to test H_0: ψ = ψ_0, we can choose H(η) = ψ − ψ_0. Let R*_n(ψ) be the ELR function of ψ.
That is,

R*_n(ψ) = 2 {sup_{ψ,θ} ℓ_n(ψ, θ) − sup_θ ℓ_n(ψ, θ)}.

Then R*_n(ψ_0) has a chi-squared null limiting distribution with p degrees of freedom. Second, the result can be used to construct confidence regions for some of the parameters in η. For example, we can construct an ELR-based confidence region for the parameter ψ at the nominal level 1 − a as

{ψ: R*_n(ψ) ≤ χ²_{q,1−a}},   (15)

where χ²_{q,1−a} is the 100(1 − a)th quantile of the χ²_q distribution.

The use of valid auxiliary information leads to improved inference on η. However, if the information is not properly specified in terms of unbiased estimating functions, the resulting estimator of η may be biased (Qin et al., 2015). Our last major theoretical result is to construct an ELR statistic for testing the validity and usefulness of the auxiliary information. Let

W_n = 2 {sup_{(η,F_0) ∈ C_1} log L_n − sup_{(η,F_0) ∈ C_1 ∩ C_2} log L_n} = 2 {ℓ_{nd}(θ̃) − ℓ_n(ψ̂, θ̂)}.   (16)

Theorem 3.
Under the conditions of Theorem 1 and as n → ∞, we have W_n → χ²_{r−p} in distribution if (2) is correctly specified.

We can also test the validity of some, but not all, of the EEs in (2). To do so, we partition the EEs in (2) into two parts:

g(x; ψ, θ) = (g_1(x; ψ, θ)^⊤, g_2(x; ψ, θ)^⊤)^⊤,

where g_1(·) and g_2(·) are of dimensions r − m and m, with r − m ≥ p. We are interested in testing H_0: E{g_2(X; ψ, θ)} = 0. Let ℓ_{n1}(ψ, θ) be the profile empirical log-likelihood of (ψ, θ) that uses the auxiliary information only through E{g_1(X; ψ, θ)} = 0. That is,

ℓ_{n1}(ψ, θ) = −Σ_{i=0}^{1} Σ_{j=1}^{n_i} log{1 + λ[exp{θ^⊤ Q(X_{ij})} − 1] + ν_1^⊤ g_1(X_{ij}; ψ, θ)} + Σ_{j=1}^{n_1} θ^⊤ Q(X_{1j}),

where λ and ν_1 are the solutions to

Σ_{i=0}^{1} Σ_{j=1}^{n_i} [exp{θ^⊤ Q(X_{ij})} − 1] / [1 + λ{exp(θ^⊤ Q(X_{ij})) − 1} + ν_1^⊤ g_1(X_{ij}; ψ, θ)] = 0,

Σ_{i=0}^{1} Σ_{j=1}^{n_i} g_1(X_{ij}; ψ, θ) / [1 + λ{exp(θ^⊤ Q(X_{ij})) − 1} + ν_1^⊤ g_1(X_{ij}; ψ, θ)] = 0.

Then the ELR statistic for testing H_0: E{g_2(X; ψ, θ)} = 0 can be constructed, similar to (16), as

W*_n = 2 {sup_{ψ,θ} ℓ_{n1}(ψ, θ) − sup_{ψ,θ} ℓ_n(ψ, θ)}.

Corollary 2.
Under the conditions of Theorem 1 and as n → ∞, we have W*_n → χ²_m in distribution if E{g_2(X; ψ, θ)} = 0 is true.

3 Inference on the CDFs and their quantiles

In this section, we discuss inferences on the CDFs F_0 and F_1 and their quantiles. For convenience of presentation, we assume that the dimension of X_{ij} is one.

We first construct point estimators of F_0 and F_1. Let λ̂ and ν̂ be the solutions to (12) and (13) with (ψ, θ) replaced by (ψ̂, θ̂). The MELEs of the p_{ij} are then given as

p̂_{ij} = (1/n) · 1 / [1 + λ̂{exp(θ̂^⊤ Q(X_{ij})) − 1} + ν̂^⊤ g(X_{ij}; ψ̂, θ̂)].

The MELEs of F_0 and F_1 are then defined as

F̂_0(x) = Σ_{i=0}^{1} Σ_{j=1}^{n_i} p̂_{ij} I(X_{ij} ≤ x)  and  F̂_1(x) = Σ_{i=0}^{1} Σ_{j=1}^{n_i} p̂_{ij} exp{θ̂^⊤ Q(X_{ij})} I(X_{ij} ≤ x).

We now present results on the asymptotic properties of the MELEs F̂_0(x) and F̂_1(x) of the two population CDFs F_0(x) and F_1(x). Let

W = V^{−1} U^⊤ J^{−1} U V^{−1} − ( 0  0
                                    0  A_uu^{−1} ),  B*_0(x) = ( B_θ0(x)
                                                                 B_u0(x) ),  B*_1(x) = ( B_θ1(x)
                                                                                          B_u1(x) ),

where

B_θ0(x) = E{h_1(X) Q(X) I(X ≤ x)},  B_u0(x) = E{G(X) I(X ≤ x)/h(X)},

B_θ1(x) = {(1 − λ*)/λ*} E{h_1(X) Q(X) I(X ≤ x)},  B_u1(x) = E{ω(X) G(X) I(X ≤ x)/h(X)}.

Let F̃_0(x) and F̃_1(x) be the MELEs of F_0 and F_1 under the DRM when there is no auxiliary information. We refer to Qin & Zhang (1997) for the forms of F̃_0(x) and F̃_1(x) and their asymptotic properties. Denote x ∧ y = min(x, y).

Theorem 4.
Assume that the conditions of Theorem 1 are satisfied.

(a) For any l, s ∈ {0, 1} and real numbers x and y in the support of F_0, as n → ∞,

√n ( F̂_l(x) − F_l(x), F̂_s(y) − F_s(y) )^⊤ → N(0, Σ_{ls}(x, y)),

where

Σ_{ls}(x, y) = ( σ_{ll}(x, x)  σ_{ls}(x, y)
                 σ_{sl}(y, x)  σ_{ss}(y, y) )

with

σ_{ij}(x, y) = E{ω^{i+j}(X) I(X ≤ x ∧ y)/h(X)} − F_i(x) F_j(y) + B*_i(x)^⊤ W B*_j(y)

for any i, j ∈ {l, s}.

(b) If r = p, the asymptotic variance-covariance matrix Σ_{ls}(x, y) reduces to that of √n (F̃_l(x) − F_l(x), F̃_s(y) − F_s(y))^⊤.

(c) If r > p, the asymptotic variance matrix Σ_{ls}(x, y) cannot decrease if an EE in (2) is dropped.

Theorem 4 indicates that the MELEs F̂_0(x) and F̂_1(x) have asymptotic properties similar to those of η̂. That is, they are asymptotically normally distributed; they are asymptotically equivalent to F̃_0(x) and F̃_1(x) when r = p; and they become more efficient when r > p.

In the second half of this section we discuss the estimation of the quantiles of F_i(x) for i = 0 and 1. For any τ ∈ (0, 1), define the τth quantile of F_i as ξ_{i,τ} = inf{x: F_i(x) ≥ τ} and its MELE as

ξ̂_{i,τ} = inf{x: F̂_i(x) ≥ τ}.   (17)

Similarly, the estimator of ξ_{i,τ} based on F̃_i(x) is defined as

ξ̃_{i,τ} = inf{x: F̃_i(x) ≥ τ}.   (18)

See Zhang (2000) and Chen & Liu (2013) for the asymptotic properties of ξ̃_{i,τ}. We refer to ξ̂_{i,τ} as the "DRM-EE" quantile estimators and ξ̃_{i,τ} as the "DRM" quantile estimators.

The Bahadur representation is a useful tool for studying the asymptotic properties of quantile estimators. In the following theorem, we show that the DRM-EE quantile estimators are Bahadur representable. Let f_i(x) be the probability density function of F_i(x) for i = 0 and 1.

Theorem 5.
Further, for i = 0, 1 and any τ ∈ (0, 1), assume that f_i(x) is continuous and positive at x = ξ_{i,τ}. Then ξ̂_{i,τ} admits the Bahadur representation

ξ̂_{i,τ} = ξ_{i,τ} + {τ − F̂_i(ξ_{i,τ})}/f_i(ξ_{i,τ}) + O_p(n^{−3/4} (log n)^{1/2}).

The following theorem shows that the DRM-EE quantile estimators have asymptotic properties similar to those of the MELEs of η, F_0(x), and F_1(x).

Theorem 6.
Assume that the conditions in Theorem 5 hold for ξ_{l,τ_l} and ξ_{s,τ_s}.

(a) As n → ∞,

√n ( ξ̂_{l,τ_l} − ξ_{l,τ_l}, ξ̂_{s,τ_s} − ξ_{s,τ_s} )^⊤ → N(0, Ω_{ls}),

where

Ω_{ls} = ( σ_{ll}(ξ_{l,τ_l}, ξ_{l,τ_l})/f_l²(ξ_{l,τ_l})                  σ_{ls}(ξ_{l,τ_l}, ξ_{s,τ_s})/{f_l(ξ_{l,τ_l}) f_s(ξ_{s,τ_s})}
           σ_{sl}(ξ_{s,τ_s}, ξ_{l,τ_l})/{f_s(ξ_{s,τ_s}) f_l(ξ_{l,τ_l})}  σ_{ss}(ξ_{s,τ_s}, ξ_{s,τ_s})/f_s²(ξ_{s,τ_s}) ).

(b) If r = p, the asymptotic variance matrix Ω_{ls} of the DRM-EE quantile estimators is the same as that of the DRM quantile estimators;

(c) if r > p, the asymptotic variance matrix Ω_{ls} of the DRM-EE quantile estimators cannot decrease if an EE in (2) is dropped.

Using the results of Theorems 4 and 6, we may construct confidence regions and/or test hypotheses on the CDFs at some fixed points and for quantiles through Wald-type statistics. However, methods based on Wald-type statistics require a consistent estimator of the corresponding asymptotic variance. It is more attractive to use the results in Corollary 2 to construct ELR-based confidence regions for the CDFs at some fixed points and for quantiles.

Suppose we are interested in constructing a (1 − a)-level CI for a CDF at some fixed point x_0, for i = 0 or 1. Denote the parameter of interest as ζ = F_i(x_0). Let

g*(x; θ, ζ) = I(x ≤ x_0) − ζ for i = 0,  and  g*(x; θ, ζ) = exp{θ^⊤ Q(x)} I(x ≤ x_0) − ζ for i = 1.

We further define ℓ*_n(ψ, θ, ζ) to be the profile empirical log-likelihood of (ψ, θ, ζ) under Model (1) with the unbiased EEs in (2) and E{g*(X; θ, ζ)} = 0. Then the ELR function of ζ is defined as

R_n(ζ) = 2 {ℓ_n(ψ̂, θ̂) − sup_{ψ,θ} ℓ*_n(ψ, θ, ζ)}.

We can similarly define the ELR function for a quantile ξ at the quantile level τ for i = 0 or 1, i.e., ξ = ξ_{i,τ}. Let

g*(x; θ, ξ) = I(x ≤ ξ) − τ for i = 0,  and  g*(x; θ, ξ) = exp{θ^⊤ Q(x)} I(x ≤ ξ) − τ for i = 1.
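Once the EL weights are in hand, the quantile estimator in (17) is a weighted-CDF quantile. A minimal standalone sketch (generic weights p_j and hypothetical data; with equal weights it reduces to the usual empirical quantile):

```python
import numpy as np

# Minimal sketch of (17): the tau-th quantile of a weighted CDF
# F(t) = sum_{x_j <= t} p_j, i.e. inf{ t : F(t) >= tau }.
def weighted_quantile(x, p, tau):
    order = np.argsort(x)
    xs, cum = x[order], np.cumsum(p[order])
    # first sorted point at which the cumulative weight reaches tau
    return xs[np.searchsorted(cum, tau, side="left")]

# with equal weights this reduces to the usual empirical quantile
x = np.array([3.0, 1.0, 4.0, 1.5, 5.0, 9.0, 2.0, 6.0])
p = np.full(8, 1.0 / 8)
assert weighted_quantile(x, p, 0.5) == 3.0
```

Plugging in the DRM-EE weights p̂_{ij} (for F̂_0) or p̂_{ij} exp{θ̂^⊤ Q(X_{ij})} (for F̂_1) yields the DRM-EE quantile estimators ξ̂_{i,τ}.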
We further define ℓ*_n(ψ, θ, ξ) to be the profile empirical log-likelihood of (ψ, θ, ξ) under Model (1) with the unbiased EEs in (2) and E{g*(X; θ, ξ)} = 0. Then the ELR function of ξ is defined as

R_n(ξ) = 2 {ℓ_n(ψ̂, θ̂) − sup_{ψ,θ} ℓ*_n(ψ, θ, ξ)}.

Using Corollary 2, we have the following results for R_n(ζ*) and R_n(ξ*), where ζ* and ξ* are the true values of ζ and ξ.

Corollary 3.
Under the conditions of Theorem 1, as n → ∞, both R_n(ζ*) and R_n(ξ*) converge in distribution to χ²_1.

Corollary 3 enables us to construct ELR-based CIs for ζ and ξ. For example, the ELR-based CI for ξ with level 1 − a can be constructed as {ξ: R_n(ξ) ≤ χ²_{1,1−a}}.

4 Simulation studies

We conducted simulation studies to investigate three aspects of the proposed semiparametric inference procedures:

(1) The performance of the inference procedures for ψ;
(2) The power of the ELR test for the validity and usefulness of the auxiliary information;
(3) The performance of the inference procedures for the population quantiles.

We consider four combinations of sample sizes (n_0, n_1): (50, 50), (…).

4.1 Inference on ψ

We start by exploring the first aspect of the proposed semiparametric inference procedures. In the simulations, F_0 and F_1 are the CDFs of LN(0, 1) and LN(…, …), where LN(a, b) denotes the lognormal distribution with mean a and variance b, both with respect to the log scale. It is easy to show that F_0 and F_1 satisfy the DRM in (1) with Q(x) = (1, log x)^⊤. The parameter of interest is the mean ratio δ = µ_1/µ_0, which was discussed in Example 1.

To examine the usefulness of auxiliary information, we construct another variable Z using the following model:

Z = 1 + 0.… X + ε,  with ε ∼ N(0, …).   (19)

That is, given X_{ij}, Z_{ij} is generated from (19), for i = 0, 1 and j = 1, ..., n_i. Hence, the two-sample data consist of T_{ij} = (X_{ij}, Z_{ij})^⊤ for i = 0, 1 and j = 1, ..., n_i. We treat µ_z = E(Z | D = 0), the population mean of the covariate Z for the first group (i.e., the D = 0 group), as the known auxiliary information. Let the CDFs of T given D = 0 and D = 1 be F_0 and F_1, respectively. It can be checked that F_0 and F_1 satisfy the DRM with Q(x, z) = (1, log x)^⊤.

To explore the effect of misspecified estimating equations for the auxiliary information, we introduce a bias by using κµ_z instead of the true value µ_z for E(Z | D = 0). We consider κ = 0.90, 0.95, 1.00, 1.05, 1.10. Note that κ = 1.00 corresponds to correctly specified auxiliary information. We incorporate the biased/unbiased auxiliary information into our problem by setting ψ = δ and

g(t; ψ, θ) = (δx − x exp{θ^⊤ Q(x)}, z − κµ_z)^⊤

in (2). We compare three point estimators:

(i) EMP: δ̄ = µ̄_1/µ̄_0, where µ̄_i = n_i^{−1} Σ_{j=1}^{n_i} X_{ij} for i = 0 and 1;
(ii) DRM: δ̃ = µ̃_1/µ̃_0, where µ̃_i = ∫ x dF̃_i(x) for i = 0 and 1;
(iii) DRM-EE: δ̂ = µ̂_1/µ̂_0, where µ̂_i = ∫ x dF̂_i(x) for i = 0 and 1.

Note that the asymptotic properties of δ̃ and δ̂ are covered in Theorem 1. The performance of each estimator is evaluated by the relative bias (RB) and the mean squared error (MSE). Simulation results on the three point estimators are presented in Table 1.

Table 1: RB (%) and MSE (×…)

(n_0, n_1)          EMP     DRM    DRM-EE
                                   κ=1.00  κ=0.90  κ=0.95  κ=1.05  κ=1.10
(50, 50)     RB     3.37    1.46    1.15   12.73    6.83   −4.24   −9.32
             MSE   20.03   12.50    9.61   16.59   12.07    9.00    9.96
(…)

We first compare the results reported in the third to fifth columns, i.e., EMP, DRM, and DRM-EE with correctly specified auxiliary information (DRM-EE with κ = 1). We see that the EMP estimator has the largest RBs and MSEs in all cases. The DRM-EE estimator with κ = 1 has the best performance, followed by the DRM estimator. This suggests that using correctly specified auxiliary information improves the estimation efficiency, which agrees with Corollary 1 in Section 2. We also note that as the sample size increases, all three estimators have improved performance, and the gaps between the three estimators become less pronounced, especially between DRM and DRM-EE.

The sensitivity of the DRM-EE estimator with respect to misspecified auxiliary information can be observed from the last four columns of Table 1. The DRM-EE estimators for κ ≠ 1 are clearly not as good as the estimator for κ = 1. The absolute value of the RB increases as κ moves further away from 1.

We compare four CIs for δ:

(i) EMP-NA: Wald-type CI for δ based on the asymptotic normality of log δ̄;
(ii) EMP-EL: Owen (2000)'s ELR-based CI for δ;
(iii) DRM: the ELR-based CI for δ in (15) without auxiliary information;
(iv) DRM-EE: the ELR-based CI for δ in (15) with auxiliary information.

The performance of a CI is evaluated in terms of coverage probability (CP) and average length (AL). The simulation results for the four CIs at the 95% nominal level are shown in Table 2.

Table 2: CP (%) and AL of four CIs for the mean ratio at the 95% nominal level

(n_0, n_1)        EMP-NA  EMP-EL   DRM    DRM-EE
                                          κ=1.00  κ=0.90  κ=0.95  κ=1.05  κ=1.10
(50, 50)    CP     92.6    91.6    94.5    94.2    90.7    93.9    92.1    88.1
            AL     1.65    1.65    1.41    1.23    1.38    1.30    1.16    1.10
(…)

As we can see from the third to sixth columns, EMP-NA and EMP-EL are comparable but are clearly inferior to DRM and DRM-EE (κ = 1) in terms of CP and AL. The CPs of the CIs for DRM and DRM-EE with κ = 1 are close to the nominal level for all sample size combinations. This suggests that the limiting distributions provide accurate approximations to the finite-sample distributions of the ELR statistics. The ALs of the CIs for DRM-EE with κ = 1 are always shorter than those of the other CIs, strong evidence that using correctly specified auxiliary information improves the performance of a CI. On the other hand, misspecified auxiliary information results in inaccurate CIs. As κ moves further away from 1, the CP of the ELR-based CI shifts away from the nominal value.

4.1.4 Power of the validity test

In this section, we explore the second aspect of the proposed semiparametric inference procedures: the power of the ELR test for the validity of the auxiliary information. The null hypothesis for the ELR test is H_0: E(Z − κµ_z) = 0. According to Theorem 3 and Corollary 2, the ELR statistic has a χ²_1 limiting distribution under the null hypothesis. We consider misspecified auxiliary information with κ = 0.90, 0.95, 1.05, 1.10 as the alternatives. Table 3 gives the simulated power (κ ≠ 1) and type I error rate (κ = 1) of the ELR test at the 5% significance level.

Table 3: Power and type I error rate of the ELR test (%) at the 5% significance level

(n_0, n_1)   κ=0.90  κ=0.95  κ=1.00  κ=1.05  κ=1.10
(50, 50)     21.43    8.76    5.36    9.41   20.48
(…)

We observe from Table 3 that the type I error rates of the ELR tests are close to the 5% nominal level in all cases, which suggests that the limiting distribution for the ELR test works very well. As κ deviates from 1 and the sample size increases, the power of the test increases, as expected.

The third aspect of the proposed semiparametric inference procedures is inference on population quantiles with auxiliary information. In the simulations, we consider two distributional settings:

(1) f_0 ∼ N(18,
4) and f ∼ N (18 , f ∼ Gam (6 , .
5) and f ∼ Gam (8 , . N ( a, b ) denotes the normal distribution with mean a and variance b and Gam ( a, b ) isthe gamma distribution with shape parameter a and scale parameter b . We are interestedin estimating and constructing CIs for the quantiles of F and F at the levels τ =0 . , . , . , . , . We compare four quantile estimators:(i) EMP: the quantile estimator based on the empirical CDFs;(ii) EL: the quantile estimator based on the MELEs of the CDFs in Tsao & Wu (2006),in which a common mean is assumed;16iii) DRM: the DRM based quantile estimator in (18);(iv) DRM-EE: our proposed quantile estimator in (17) with the common-mean assump-tion or the EE (7) in Example 3.The DRM and DRM-EE methods are calculated with the correctly specified q ( x ),where q ( x ) = ( x, x ) ⊤ for the normal distributional setting and q ( x ) = ( x, log x ) ⊤ forthe gamma distributional setting. The performance of an estimator is evaluated by theRB and MSE. The general patterns of the simulation results for the four methods aresimilar in the two settings. Hence, Table 4 presented here is only for the normal setting;the results under gamma distributions are included in Section 3 of the SupplementaryMaterial.Table 4 shows that the RBs are negligibly small for all methods under all scenar-ios. The EMP estimator has the largest MSEs. The DRM-EE quantile estimators havethe smallest MSEs due to its use of additional information, and the results agree withTheorem 6. We also notice that the EL and DRM quantile estimators are comparable. We compare three CIs:(i) EMP: Owen (2000)’s ELR-based CI for quantiles;(ii) DRM: the ELR-based CI under the DRM without the common-mean assumption(Zhang et al., 2020);(iii) DRM-EE: the proposed ELR-based CI.The construction of CIs for the quantiles under the two-sample EL method with thecommon-mean assumption has not been discussed in the literature, and hence is notincluded in the simulation. The CP and AL are used to compare CIs. 
We present the simulation results for the normal case in Table 5. The results for the gamma distributions display similar patterns and are included in Section 3 of the Supplementary Material. The CIs for all the methods have satisfactory performance in terms of CP. However, the CIs using the DRM-EE method have the shortest ALs. The results indicate that the limiting distribution of the ELR statistic in Corollary 3 works very well, and additional auxiliary information leads to shorter CIs.
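To make the DRM-based estimation concrete, the DRM estimators used above can be computed through the logistic-regression representation of the two-sample DRM that also underlies the goodness-of-fit test of Qin & Zhang (1997): fitting a logistic regression of the sample label on q(x) over the pooled data yields θ̃, and the EL weights of the baseline CDF follow in closed form. The sketch below is only an illustration under assumed toy distributions (F0 = N(2, 1) and F1 = N(3, 1), for which the DRM holds with q(x) = x and the true mean ratio is δ = 1.5); it is not the code used in our simulations.

```python
import numpy as np

def fit_drm(x0, x1, qfun=lambda x: x[:, None]):
    """Fit the two-sample DRM dF1 = exp(alpha + beta' q(x)) dF0 via the
    logistic-regression representation: pooled sample label ~ (1, q(x))."""
    x = np.concatenate([x0, x1])
    d = np.concatenate([np.zeros(len(x0)), np.ones(len(x1))])
    Z = np.column_stack([np.ones(len(x)), qfun(x)])
    coef = np.zeros(Z.shape[1])
    for _ in range(50):                          # plain Newton iterations
        p = 1.0 / (1.0 + np.exp(-Z @ coef))
        step = np.linalg.solve((Z * (p * (1 - p))[:, None]).T @ Z, Z.T @ (d - p))
        coef += step
        if np.max(np.abs(step)) < 1e-10:
            break
    alpha = coef[0] - np.log(len(x1) / len(x0))  # remove the sampling-ratio offset
    beta = coef[1:]
    omega = np.exp(alpha + qfun(x) @ beta)       # fitted density ratio at pooled points
    p0 = 1.0 / (len(x0) + len(x1) * omega)       # EL weights of the baseline F0
    return alpha, beta, x, omega, p0

rng = np.random.default_rng(1)
x0 = rng.normal(2.0, 1.0, 1000)                  # F0 = N(2, 1)
x1 = rng.normal(3.0, 1.0, 1000)                  # F1 = N(3, 1); DRM holds with q(x) = x
alpha, beta, x, omega, p0 = fit_drm(x0, x1)
mu0 = np.sum(p0 * x)                             # DRM estimate of mu_0
mu1 = np.sum(p0 * omega * x)                     # DRM estimate of mu_1
delta = mu1 / mu0                                # DRM estimate of the mean ratio
```

At the fitted values the weights satisfy Σ p0 = 1 and Σ p0·ω = 1, so the fitted F̃0 and F̃1 are proper distributions supported on the pooled sample.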
The first dataset (Simpson et al., 1975) is from a randomized airborne pyrotechnic seeding experiment, which is designed to test whether seeding clouds with silver iodide increases rainfall. The measurements are the amounts of rainfall (in acre-feet) from 52 isolated cumulus clouds, half of which were randomly chosen and massively injected with silver iodide smoke. The rest were untreated. We use D = 0 to indicate untreated clouds and D = 1 for seeded clouds. We estimate the mean ratio δ of the two populations and construct CIs for δ.

Table 4: RB (%) and MSE (scaled) of the four quantile estimators (normal distributions)

                     N(18, 4)                         N(18, ·)
(n0, n1)  τ       EMP    EL     DRM    DRM-EE      EMP    EL     DRM    DRM-EE
(50, 50)  0.10 RB  -0.58  0.08   0.25   0.19       -1.07  -0.10  0.17   -0.07
              MSE  23.87  19.88  18.85  16.32      59.74  44.17  46.26  37.35
          0.25 RB   0.04  0.02   0.15   0.14        0.01  -0.06  -0.14  -0.25
              MSE  14.73  12.25  12.23  9.57       33.32  22.42  29.22  18.11
          0.50 RB  -0.21  0.03   0.04   0.03       -0.43  0.03   0.00   0.03
              MSE  12.47  9.93   10.06  7.76       29.21  16.25  25.08  11.10
          0.75 RB  -0.01  -0.01  -0.08  -0.07      -0.05  0.02   0.03   0.14
              MSE  13.92  11.81  11.97  9.64       34.86  21.55  29.68  16.95
          0.90 RB  -0.62  -0.08  -0.21  -0.18      -0.87  0.08   -0.08  0.10
              MSE  23.36  21.36  19.51  17.66      53.89  43.03  46.50  37.61
[remaining sample-size rows not recoverable]

Table 5: CP (%) and AL of the CIs for the τ%-quantiles (normal distributions)
[entries not recoverable]

To use our proposed method to analyze the dataset, we need to choose an appropriate q(x) in the DRM (1). Simpson et al. (1975) and Krishnamoorthy & Mathew (2003) argued that this dataset is highly skewed. This suggests that the two-sample data can be fitted by the DRM with q(x) = log x. The goodness-of-fit test of Qin & Zhang (1997) gives a p-value of 0.568, which indicates that the DRM with q(x) = log x provides an adequate fit to the two-sample data. Since there is no auxiliary information available, we analyze the data using DRM and the other methods discussed in Section 4.1. For the point estimates, the EMP method gives 2.··; the 95% CIs for δ are summarized in Table 6.

Table 6: 95% CIs for δ (cloud data)

Method    LB     UB     Length
EMP-NA    1.13   6.36   5.23
EMP-EL    1.41   5.24   3.83
DRM       1.21   4.89   3.68
The second dataset (Hawkins, 2002) is from a clinical study of cyclosporine measurements in blood samples of organ transplant recipients. In total, 56 assay pairs for cyclosporine are obtained by a standard approved method, high-performance liquid chromatography (HPLC), and an alternative radio-immunoassay (RIA) method. We would like to investigate whether the RIA assay is essentially equivalent to the HPLC assay. The results in Hawkins (2002) and Bebu & Mathew (2008) indicate that the measurements from the two methods can be modeled by lognormal distributions and have a common mean. Since the quantiles are important characteristics of the population, we consider inference on these quantities at τ = 0.25, 0.5, 0.75. We use D = 0 to indicate the HPLC method for the first group and D = 1 to indicate the RIA method for the second group. This gives two independent samples, shown in Table 7. We set q(x) in the DRM (1) to q(x) = (log x, log²x)⊤. For this choice, the goodness-of-fit test of Qin & Zhang (1997) gives a p-value of 0.··, and a second test gives a p-value of 0.·· (both truncated in the source).

Table 7: Cyclosporine measurements by assay method

HPLC (D = 0)                          RIA (D = 1)
 77  87  93 109 109 129 130           38  98 108 109 111 118 125
153 156 159 185 198 203 227          130 144 149 162 165 169 172
244 245 271 280 285 318 336          204 218 234 235 293 294 303
339 340 440 498 521 556 578          311 341 376 404 406 477 679

We use the methods of Section 4.2 to analyze the independent samples. Table 8 summarizes the point estimates and 95% CIs. Note that the EL method does not specify how to construct CIs for quantiles with the common-mean assumption. We also provide the results of analyzing the original 56 pairs using the EMP method; these are recorded under "EMP-ALL" in Table 8 and serve as the benchmarks. Table 8 shows that the DRM-EE CIs are always shorter than the DRM and EMP CIs. This is in line with our simulation results. Although each independent sample is half the size of the original sample, the DRM-EE quantile estimates and CIs are similar to the EMP-ALL quantile estimates and CIs.
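As a sanity check on the empirical method, the EMP quantile estimates in Table 8 can be reproduced directly from the raw measurements in Table 7 by applying the left-continuous inverse ξτ = inf{x : F(x) ≥ τ} to each sample's empirical CDF. The short sketch below (Python; the data are typed in from Table 7) recovers the EMP point estimates at τ = 0.25, 0.5, 0.75.

```python
import math

# Cyclosporine measurements from Table 7
hplc = [77, 87, 93, 109, 109, 129, 130, 153, 156, 159, 185, 198, 203, 227,
        244, 245, 271, 280, 285, 318, 336, 339, 340, 440, 498, 521, 556, 578]
ria = [38, 98, 108, 109, 111, 118, 125, 130, 144, 149, 162, 165, 169, 172,
       204, 218, 234, 235, 293, 294, 303, 311, 341, 376, 404, 406, 477, 679]

def emp_quantile(sample, tau):
    """Left-continuous inverse of the empirical CDF:
    the smallest order statistic x_(k) with k/n >= tau."""
    s = sorted(sample)
    k = math.ceil(tau * len(s))
    return s[k - 1]

est = {tau: (emp_quantile(hplc, tau), emp_quantile(ria, tau))
       for tau in (0.25, 0.5, 0.75)}
# est[0.25] == (130, 125), est[0.5] == (227, 172), est[0.75] == (336, 303),
# matching the EMP rows of Table 8.
```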
This indicates that our method can combine information from two samples and effectively utilize available auxiliary information.

Table 8: Summary of point estimates and 95% CIs for quantiles (cyclosporine data)

               HPLC (D = 0)                    RIA (D = 1)
τ     Method   Estimate  LB    UB    Length    Estimate  LB    UB    Length
0.25  EMP-ALL  127       109   159   50        141       118   162   44
      EMP      130       93    198   105       125       108   165   57
      EL       130       –     –     –         130       –     –     –
      DRM      144       109   185   76        129       108   162   54
      DRM-EE   130       109   165   56        130       109   162   53
0.5   EMP-ALL  206       159   271   112       196       162   287   125
      EMP      227       156   318   162       172       144   294   150
      EL       227       –     –     –         204       –     –     –
      DRM      234       162   303   141       198       149   280   131
      DRM-EE   218       162   280   118       204       162   280   118
0.75  EMP-ALL  336       271   402   131       311       287   408   121
      EMP      336       240   432   192       303       218   388   170
      EL       336       –     –     –         311       –     –     –
      DRM      339       280   477   197       311       235   406   171
      DRM-EE   318       280   404   124       336       280   406   126

We have proposed new and general semiparametric inference procedures to utilize the combined information from two samples as well as auxiliary information formulated through unbiased EEs. We have established the asymptotic normality of the MELEs of the unknown parameters in the DRMs and/or defined through EEs, and the chi-square limiting distributions for the ELR statistics on the parameters. We have also derived efficiency results for estimating these parameters and obtained similar results for inference on the CDFs and population quantiles. We have developed an ELR test for checking the validity and usefulness of auxiliary information, and conducted simulation studies to evaluate the power of the test. Our theoretical results and simulation studies demonstrate that the use of DRMs and auxiliary information leads to improved efficiency of statistical inferences.

We have focused on two-sample data under the DRM (1) in the current paper. This leads to many interesting potential research topics. First, we may generalize our results to multiple-sample DRMs (Chen & Liu, 2013) with unbiased EEs. Second, we may study other types of parameters, such as the ROC curve and the area under the curve.
Third, in Example 2 (a retrospective case-control study with auxiliary information), it is assumed that the ratio of the total sample size for the internal study to the total sample size for the external study goes to 0. This assumption ensures that the uncertainty of the regression coefficient from the external study is negligible. If the sample sizes of the internal and external studies are comparable, then the variation of the regression coefficient cannot be ignored. Simply discarding the uncertainty may not guarantee efficiency gains from the auxiliary information (Zhang et al., 2020). We may generalize Zhang et al. (2020)'s method from the one-sample case to case-control studies with uncertainty in the regression coefficient for the external study. We hope to address these problems in future research.
Appendix: Regularity conditions
The asymptotic results in this paper depend on the following regularity conditions. We use ||·|| to denote the Euclidean norm, i.e., ||a|| is the square root of the sum of squares of the elements of a.

C1. The total sample size n = n0 + n1 → ∞ and λ∗ = n1/n is a constant.

C2. The two CDFs F0 and F1 satisfy the DRM (1) with a true parameter value θ∗, and ∫ exp{θ⊤Q(x)}dF0(x) < ∞ in a neighborhood of the true value θ∗.

C3. ∫ Q(x)Q(x)⊤dF0(x) exists and is positive definite.

C4. E{g(X; ψ∗, θ∗)} = 0, E{∂g(X; ψ∗, θ∗)/∂η} has rank p, and ∫ G(x)G(x)⊤dF0(x) exists and is positive definite, where G(x) is defined before Theorem 1.

C5. G(x; η) is twice differentiable with respect to η, and ||G(x; η)||, ||∂G(x; η)/∂η||, and ||∂²G(x; η)/(∂η∂η⊤)|| are bounded by some integrable function R(x) with respect to both F0 and F1 in a neighborhood of η∗.

Conditions C1–C3 ensure that the quadratic approximation of the dual likelihood ℓnd in (14) is applicable. Condition C2 guarantees the existence of finite moments of Q(x) in a neighborhood of θ∗. Condition C3 is an identifiability condition, and it ensures that the components of Q(x) are linearly independent under both Fi's, and hence the elements of Q(x) except the first cannot be constant functions. Conditions C3 and C4 together ensure that U and V in Theorem 1 have full rank, guaranteeing that J is invertible. Conditions C1–C5 guarantee that quadratic approximations of the profile empirical log-likelihood ℓn(ψ, θ) are applicable.

References
Anderson, J. A. (1979). Multivariate logistic compounds. Biometrika, 17–26.

Bebu, I. & Mathew, T. (2008). Comparing the means and variances of a bivariate log-normal distribution. Statistics in Medicine, 2684–2696.

Cai, S., Chen, J., & Zidek, J. V. (2017). Hypothesis testing in the presence of multiple samples under density ratio models. Statistica Sinica, 761–783.

Chatterjee, N., Chen, Y.-H., Maas, P., & Carroll, R. J. (2016). Constrained maximum likelihood estimation for model calibration using summary-level information from external big data sources. Journal of the American Statistical Association, 107–117.

Chen, B., Li, P., Qin, J., & Yu, T. (2016). Using a monotonic density ratio model to find the asymptotically optimal combination of multiple diagnostic tests. Journal of the American Statistical Association, 861–874.

Chen, J. & Liu, Y. (2013). Quantile and quantile-function estimations under density ratio model. The Annals of Statistics, 1669–1692.

de Carvalho, M. & Davison, A. C. (2014). Spectral density ratio models for multivariate extremes. Journal of the American Statistical Association, 764–776.

Fokianos, K., Kedem, B., Qin, J., & Short, D. A. (2001). A semiparametric approach to the one-way layout. Technometrics, 56–65.

Hawkins, D. M. (2002). Diagnostics for conformity of paired quantitative measurements. Statistics in Medicine, 1913–1935.

Imbens, G. W. & Lancaster, T. (1994). Combining micro and macro data in microeconometric models. The Review of Economic Studies, 655–680.

Jiang, S. & Tu, D. (2012). Inference on the probability P(T1 < T2) as a measurement of treatment effect under a density ratio model and random censoring. Computational Statistics & Data Analysis, 1069–1078.

Kay, R. & Little, S. (1987). Transformations of the explanatory variables in the logistic regression model for binary data. Biometrika, 495–501.

Keziou, A. & Leoni-Aubin, S. (2008). On empirical likelihood for semiparametric two-sample density ratio models. Journal of Statistical Planning and Inference, 915–928.

Krishnamoorthy, K. & Mathew, T. (2003). Inferences on the means of lognormal distributions using generalized p-values and generalized confidence intervals. Journal of Statistical Planning and Inference, 103–121.

Li, H., Liu, Y., Liu, Y., & Zhang, R. (2018). Comparison of empirical likelihood and its dual likelihood under density ratio model. Journal of Nonparametric Statistics, 581–597.

Li, P., Liu, Y., & Qin, J. (2017). Semiparametric inference in a genetic mixture model. Journal of the American Statistical Association, 1250–1260.

Owen, A. B. (1988). Empirical likelihood ratio confidence intervals for a single functional. Biometrika, 237–249.

Owen, A. B. (2000). Empirical Likelihood. Boca Raton: Chapman and Hall/CRC.

Qin, J. (1999). Empirical likelihood ratio based confidence intervals for mixture proportions. The Annals of Statistics, 1368–1384.

Qin, J. (2017). Biased Sampling, Over-identified Parameter Problems and Beyond. Singapore: Springer.

Qin, J. & Lawless, J. (1994). Empirical likelihood and general estimating equations. The Annals of Statistics, 300–325.

Qin, J. & Zhang, B. (1997). A goodness-of-fit test for logistic regression models based on case-control data. Biometrika, 609–618.

Qin, J. & Zhang, B. (2003). Using logistic regression procedures for estimating receiver operating characteristic curves. Biometrika, 585–596.

Qin, J., Zhang, H., Li, P., Albanes, D., & Yu, K. (2015). Using covariate-specific disease prevalence information to increase the power of case-control studies. Biometrika, 169–180.

Simpson, J., Olsen, A., & Eden, J. C. (1975). A Bayesian analysis of a multiplicative treatment effect in weather modification. Technometrics, 161–166.

Tsao, M. & Wu, C. (2006). Empirical likelihood inference for a common mean in the presence of heteroscedasticity. The Canadian Journal of Statistics, 45–59.

Wang, C., Marriott, P., & Li, P. (2017). Testing homogeneity for multiple nonnegative distributions with excess zero observations. Computational Statistics & Data Analysis, 146–157.

Wang, C., Marriott, P., & Li, P. (2018). Semiparametric inference on the means of multiple nonnegative distributions with excess zero observations. Journal of Multivariate Analysis, 182–197.

Wu, C. & Thompson, M. E. (2020). Sampling Theory and Practice. Cham: Springer.

Wu, J., Jiang, G., Wong, A., & Sun, X. (2002). Likelihood analysis for the ratio of means of two independent log-normal distributions. Biometrics, 463–469.

Yuan, M., Li, P., & Wu, C. (2021). Semiparametric inference of the Youden index and the optimal cut-off point under density ratio models. Canadian Journal of Statistics, forthcoming.

Zhang, A. G., Zhu, G., & Chen, J. (2020). Empirical likelihood ratio test on quantiles under a density ratio model. arXiv:2007.10586.

Zhang, B. (2000). Quantile estimation under a two-sample semi-parametric model. Bernoulli, 491–511.

Zhang, H., Deng, L., Schiffman, M., Qin, J., & Yu, K. (2020). Generalized integration model for improved statistical inference by leveraging external summary data. Biometrika, 689–703.

Zhou, X.-H., Gao, S., & Hui, S. L. (1997). Methods for comparing the means of two independent log-normal samples. Biometrics, 1129–1135.

Zhuang, W., Hu, B., & Chen, J. (2019). Semiparametric inference for the dominance index under the density ratio model. Biometrika, 229–241.

Zou, F., Fine, J. P., & Yandell, B. S. (2002). On empirical likelihood for a semiparametric mixture model. Biometrika, 61–75.

Supplementary Material for "Semiparametric empirical likelihood inference with general estimating equations under density ratio models"

This document provides supplementary materials to the paper entitled "Semiparametric empirical likelihood inference with general estimating equations under density ratio models." Section 1 contains more examples of important summary quantities. Section 2 presents more details of the extraction of the summary-level information from the external case-control study. Section 3 gives additional simulation results, and Sections 4–11 provide technical details and proofs for the theoretical results presented in the main paper.
In this section, we provide some examples to demonstrate that the estimating equations (EEs) E{g(X; ψ, θ)} = 0 can define many important summary quantities. Recall that the two cumulative distribution functions (CDFs) F0 and F1 are linked via the density ratio model (DRM):

dF1(x) = exp{θ⊤Q(x)} dF0(x).  (20)

Example 4. (Means and variances) Let μi and σi² be the mean and variance of Fi for i = 0, 1. Further, let ψ = (μ0, μ1, σ0², σ1²)⊤ and

g(x; ψ, θ) = ( x − μ0,
               x exp{θ⊤Q(x)} − μ1,
               (x − μ0)² − σ0²,
               (x − μ1)² exp{θ⊤Q(x)} − σ1² )⊤.

Then these means and variances can be defined through E{g(X; ψ, θ)} = 0. The general uncentered and centered moments can be defined similarly.

Applying the results in Theorem 2 in the main paper, we can construct an empirical likelihood ratio (ELR) statistic for testing H0 : σ0² = σ1², which to our best knowledge is new for such a testing problem.

Example 5. (Generalized entropy class of inequality measures) Suppose the Xij's are positive random variables. Let

GE(ξ)i = { (ξ² − ξ)^{-1} [ ∫0∞ (x/μi)^ξ dFi(x) − 1 ],  if ξ ≠ 0, 1,
          −∫0∞ log(x/μi) dFi(x),                        if ξ = 0,
           ∫0∞ (x/μi) log(x/μi) dFi(x),                 if ξ = 1,

be the generalized entropy class of inequality measures of the ith population, i = 0, 1. We assume that GE(ξ)i exists. In our setup, (GE(ξ)0, GE(ξ)1)⊤ together with (μ0, μ1) can also be defined through the EEs. For illustration, we consider ξ = 1. Let ψ = (μ0, μ1, GE(1)0, GE(1)1)⊤ and

g(x; ψ, θ) = ( x − μ0,
               x exp{θ⊤Q(x)} − μ1,
               x log(x/μ0) − μ0 GE(1)0,
               x log(x/μ1) exp{θ⊤Q(x)} − μ1 GE(1)1 )⊤.

Then (GE(1)0, GE(1)1)⊤ together with (μ0, μ1) can be defined through E{g(X; ψ, θ)} = 0. For other values of ξ, we can define the corresponding EEs similarly.

Applying the results in Theorem 2 in the main paper, we can also construct an ELR statistic for testing H0 : GE(ξ)0 = GE(ξ)1.
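As a numerical illustration (ours, not part of the paper), the unbiasedness of EEs of the Example 4 type can be checked by Monte Carlo under a concrete DRM pair. We take F0 = N(2, 1) and F1 = N(3, 1), for which the DRM (20) holds with Q(x) = (1, x)⊤ and θ = (−2.5, 1)⊤; all four EE components then have mean zero under F0.

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(2.0, 1.0, 400_000)       # draws from F0 = N(2, 1)
omega = np.exp(-2.5 + x)                # density ratio dF1/dF0 under the DRM

mu0, mu1 = 2.0, 3.0                     # true means of F0 and F1
s0, s1 = 1.0, 1.0                       # true variances

g = np.vstack([
    x - mu0,                            # E0{X - mu0} = 0
    x * omega - mu1,                    # E0{X w(X)} = E1{X} = mu1
    (x - mu0) ** 2 - s0,                # E0{(X - mu0)^2} = s0
    (x - mu1) ** 2 * omega - s1,        # E0{(X - mu1)^2 w(X)} = E1{(X - mu1)^2} = s1
])
print(np.abs(g.mean(axis=1)).max())     # all four sample means are close to 0
```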
Again, to our best knowledge this ELR statistic is new for such testing problems.

Example 6. (Cumulative distribution functions) Suppose we are interested in ζ0 = F0(x0) and ζ1 = F1(x1), where x0 and x1 are fixed points. Let ψ = (ζ0, ζ1)⊤ and

g(x; ψ, θ) = ( I(x ≤ x0) − ζ0, exp{θ⊤Q(x)} I(x ≤ x1) − ζ1 )⊤.

Then (ζ0, ζ1)⊤ can be defined through E{g(X; ψ, θ)} = 0.

Applying the results in Theorem 2 in the main paper, we can also construct an ELR-based confidence interval (CI) for ζ0 or ζ1, or an ELR-based confidence region for (ζ0, ζ1)⊤.

Example 7. (Quantiles) Suppose we are interested in ξ0,τ0 = inf{x : F0(x) ≥ τ0} and ξ1,τ1 = inf{x : F1(x) ≥ τ1}, where τ0, τ1 ∈ (0, 1). Let ψ = (ξ0,τ0, ξ1,τ1)⊤ and

g(x; ψ, θ) = ( I(x ≤ ξ0,τ0) − τ0, exp{θ⊤Q(x)} I(x ≤ ξ1,τ1) − τ1 )⊤.

Then (ξ0,τ0, ξ1,τ1)⊤ can be defined through E{g(X; ψ, θ)} = 0.

Applying the result of Corollary 2 or 3 in the main paper, we can also construct an ELR-based CI for ξ0,τ0 or ξ1,τ1, or an ELR-based confidence region for (ξ0,τ0, ξ1,τ1)⊤.

Let {(Yi, Di) : i = 1, . . . , nE} be the data from an external study, where Di = 0 or 1 indicates that the individual is from a disease-free or diseased group. We model the relationship between D and Y through a logistic regression model, which may not be the true model:

h(Y; αY, βY) = P(D = 1 | Y) = exp(αY + βY⊤Y) / {1 + exp(αY + βY⊤Y)}.  (21)

Let

a(αY, βY) = (1/nE) Σ_{i=1}^{nE} {Di − h(Yi; αY, βY)}(1, Yi⊤)⊤,

and let (αY∗, βY∗) be the solution to E{a(αY, βY)} = 0. That is, E{a(αY∗, βY∗)} = 0. Note that (αY∗, βY∗) may not be known exactly. We can solve the score equations a(αY, βY) = 0 to obtain the estimator (α̂Y, β̂Y). That is, a(α̂Y, β̂Y) = 0. Assume that we have access to the estimator (α̂Y, β̂Y) but not necessarily to the individual-level data {(Yi, Di) : i = 1, . . . , nE}.

When the total sample size n = n0 + n1 for the internal study satisfies n/nE → 0, we can substitute (α̂Y, β̂Y) for (αY∗, βY∗). This will cause a negligible error for inference for the internal study. In the following, we assume that (αY∗, βY∗) is known and we denote h(y) = h(y; αY∗, βY∗).

Next, we discuss how to summarize the information from E{a(αY∗, βY∗)} = 0 into unbiased EEs with respect to F0, which is the setup in the main paper. When the external study is a prospective case-control study, by defining the unknown overall disease prevalence π = P(D = 1), we have

E{a(αY∗, βY∗)} = E[{D − h(Y)}(1, Y⊤)⊤]  (22)
              = E0( [−(1 − π)h(Y) + π exp{θ⊤Q(X)}{1 − h(Y)}](1, Y⊤)⊤ ),  (23)

where we have used the law of total expectation and the DRM (20) in the last step. When the external study is a retrospective case-control study, we have

E{a(αY∗, βY∗)} = −(1 − πE) E0{h(Y)(1, Y⊤)⊤} + πE E1[{1 − h(Y)}(1, Y⊤)⊤],  (24)

where E1 represents the expectation operator with respect to F1, and πE is the proportion of diseased individuals in the external case-control study. Note that πE is a known and fixed value. Using the DRM (20), we further get

E{a(αY∗, βY∗)} = E0( [−(1 − πE)h(Y) + πE exp{θ⊤Q(X)}{1 − h(Y)}](1, Y⊤)⊤ ).  (25)

Summarizing (23) and (25), we have that if the external study is a prospective case-control study, then E{g(X; ψ, θ)} = 0, where

g(x; ψ, θ) = [−(1 − π)h(y) + π exp{θ⊤Q(x)}{1 − h(y)}](1, y⊤)⊤

with ψ = π; if the external study is a retrospective case-control study, then E{g(X; θ)} = 0, where

g(x; θ) = [−(1 − πE)h(y) + πE exp{θ⊤Q(x)}{1 − h(y)}](1, y⊤)⊤.
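The identity behind (23) can also be checked numerically. The sketch below (our illustration, with assumed toy distributions F0 = N(2, 1) and F1 = N(3, 1), prevalence π = 0.4, and an arbitrary working model h) compares the Monte Carlo estimate of E[{D − h(Y)}(1, Y⊤)⊤] from joint (D, Y) draws with the estimate of the rewritten F0-expectation; the two agree up to Monte Carlo error, for any choice of h.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 400_000
pi = 0.4                                 # overall disease prevalence P(D = 1)

# F0 = N(2, 1), F1 = N(3, 1): the DRM holds with theta = (-2.5, 1), Q(y) = (1, y)
d = rng.random(n) < pi
y = np.where(d, rng.normal(3.0, 1.0, n), rng.normal(2.0, 1.0, n))

def h(t):                                # an arbitrary working model h(y; alpha_Y, beta_Y)
    return 1.0 / (1.0 + np.exp(-(-1.0 + 0.5 * t)))

# left-hand side of (22): E[{D - h(Y)}(1, Y)'] from joint (D, Y) draws
lhs = np.array([np.mean(d - h(y)), np.mean((d - h(y)) * y)])

# right-hand side of (23): the same expectation rewritten under F0 via the DRM
y0 = rng.normal(2.0, 1.0, n)             # fresh draws from F0
omega = np.exp(-2.5 + y0)                # density ratio dF1/dF0
core = -(1 - pi) * h(y0) + pi * omega * (1 - h(y0))
rhs = np.array([np.mean(core), np.mean(core * y0)])
print(lhs, rhs)                          # the two estimates agree up to MC error
```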
Similarly, we can summarize the information from E{a(αY∗, βY∗)} = 0 into unbiased EEs with respect to the joint distribution of (D, Y), which is the setup in Chatterjee et al. (2016). Note that when the external study is a retrospective case-control study, (24) can be equivalently written as

E{a(αY∗, βY∗)} = E[ {(1 − πE)/(1 − π)}(1 − D){D − h(Y)}(1, Y⊤)⊤ + (πE/π)D{D − h(Y)}(1, Y⊤)⊤ ].  (26)

Summarizing (22) and (26), we have that if the external study is a prospective case-control study, then E{u(D, Y)} = 0, where

u(D, Y) = {D − h(Y)}(1, Y⊤)⊤;

if the external study is a retrospective case-control study, then E{u(D, Y; π)} = 0, where

u(D, Y; π) = {(1 − πE)/(1 − π)}(1 − D){D − h(Y)}(1, Y⊤)⊤ + (πE/π)D{D − h(Y)}(1, Y⊤)⊤.

Note that the method and theory in Chatterjee et al. (2016) are applicable when there is no unknown parameter in the functions u(·). Hence, their general results do not apply when the external study is a retrospective case-control study.

Table 9 presents the four quantile estimates under gamma distributions. Table 10 presents the three CIs for quantiles under gamma distributions. The general summary statements are similar to those for normal distributions, and hence are omitted.
Recall that the profile empirical log-likelihood of (ψ, θ) is

ℓn(ψ, θ) = − Σ_{i=0}^{1} Σ_{j=1}^{ni} log{ 1 + λ[exp{θ⊤Q(Xij)} − 1] + ν⊤g(Xij; ψ, θ) } + Σ_{j=1}^{n1} θ⊤Q(X1j),

where the Lagrange multipliers satisfy

Σ_{i=0}^{1} Σ_{j=1}^{ni} [exp{θ⊤Q(Xij)} − 1] / [ 1 + λ{exp(θ⊤Q(Xij)) − 1} + ν⊤g(Xij; ψ, θ) ] = 0,

Σ_{i=0}^{1} Σ_{j=1}^{ni} g(Xij; ψ, θ) / [ 1 + λ{exp(θ⊤Q(Xij)) − 1} + ν⊤g(Xij; ψ, θ) ] = 0.

Table 9: RB (%) and MSE (scaled) of the four quantile estimators (gamma distributions)

                     Gam(8, ·)                        Gam(6, ·)
(n0, n1)  τ       EMP    EL     DRM    DRM-EE      EMP     EL      DRM     DRM-EE
(50, 50)  0.10 RB  -2.25  -0.05  0.25   0.16       -1.40   0.71    1.26    0.65
              MSE  29.71  25.04  23.26  20.29      31.70   26.96   26.66   22.88
          0.25 RB   0.01  -0.04  0.08   0.03        0.75   0.30    0.47    -0.06
              MSE  25.02  19.93  21.38  16.39      32.91   24.71   27.78   20.32
          0.50 RB  -1.03  -0.04  -0.15  -0.02      -0.74   -0.07   0.28    -0.08
              MSE  30.99  23.20  25.91  17.32      40.46   25.74   35.52   19.68
          0.75 RB  -0.13  -0.02  -0.33  -0.13      -0.02   -0.20   0.15    0.12
              MSE  48.41  35.85  42.11  28.23      65.70   43.10   57.48   33.81
          0.90 RB  -1.85  0.15   -0.47  -0.20      -1.93   0.01    0.12    0.14
              MSE  99.19  86.91  83.12  62.28      133.79  110.01  120.28  86.79
[remaining sample-size rows not recoverable]

Table 10: CIs for the τ%-quantiles (gamma distributions)
[entries not recoverable]

ℓn(ψ, θ) can be rewritten as ℓn(ψ, θ) = inf_{λ,ν} ln(ψ, θ, λ, ν), where

ln(ψ, θ, λ, ν) = − Σ_{i=0}^{1} Σ_{j=1}^{ni} log{ 1 + λ[exp{θ⊤Q(Xij)} − 1] + ν⊤g(Xij; ψ, θ) } + Σ_{j=1}^{n1} θ⊤Q(X1j).

Equivalently, ℓn(ψ, θ) = ln(ψ, θ, λ, ν) with λ and ν being the solution to

∂ln(ψ, θ, λ, ν)/∂λ = 0 and ∂ln(ψ, θ, λ, ν)/∂ν = 0.

With the above preparation, it can be verified that the maximum empirical likelihood estimator (MELE) (ψ̂, θ̂) of (ψ, θ) and the corresponding Lagrange multipliers (λ̂, ν̂) satisfy

∂ln(ψ̂, θ̂, λ̂, ν̂)/∂ψ = 0, ∂ln(ψ̂, θ̂, λ̂, ν̂)/∂θ = 0, ∂ln(ψ̂, θ̂, λ̂, ν̂)/∂λ = 0, ∂ln(ψ̂, θ̂, λ̂, ν̂)/∂ν = 0.

To investigate the asymptotic properties of ψ̂ and θ̂, we need their approximations. We first find the first and second derivatives of ln(ψ, θ, λ, ν).

For convenience of presentation, we recall and define some notation. We use ψ∗ and θ∗ to denote the true values of ψ and θ. Let

ω(x; θ) = exp{θ⊤Q(x)}, ω(x) = ω(x; θ∗), λ∗ = n1/n,
h0(x) = 1 + λ∗{ω(x) − 1}, h1(x) = λ∗ω(x)/h0(x), h2(x) = (1 − λ∗)/h0(x),
G(x; ψ, θ) = (ω(x; θ) − 1, g(x; ψ, θ)⊤)⊤, G(x) = G(x; ψ∗, θ∗).
Note that ω(·), h0(·), h1(·), h2(·), and G(·) depend on ψ∗ and/or θ∗, and h1(x) + h2(x) = 1. By Condition C1, λ∗ is a fixed value and does not depend on the total sample size n. Recall that η = (ψ⊤, θ⊤)⊤ and u = (λ, ν⊤)⊤. Let γ = (η⊤, u⊤)⊤. We further define

η̂ = (ψ̂⊤, θ̂⊤)⊤, û = (λ̂, ν̂⊤)⊤, γ̂ = (η̂⊤, û⊤)⊤,
η∗ = (ψ∗⊤, θ∗⊤)⊤, u∗ = (λ∗, 0_{1×r})⊤, γ∗ = (η∗⊤, u∗⊤)⊤.

In the following, we use ln(γ), g(x; η), and G(x; η) to denote ln(ψ, θ, λ, ν), g(x; ψ, θ), and G(x; ψ, θ).

4.1 First and second derivatives of ln(γ)

After some algebra, the first derivatives of ln(γ) are found to be

∂ln(γ)/∂ψ = − Σ_{i=0}^{1} Σ_{j=1}^{ni} {∂g(Xij; η)/∂ψ}⊤ν / [ 1 + λ{ω(Xij; θ) − 1} + ν⊤g(Xij; η) ],

∂ln(γ)/∂θ = − Σ_{i=0}^{1} Σ_{j=1}^{ni} [ λω(Xij; θ)Q(Xij) + {∂g(Xij; η)/∂θ}⊤ν ] / [ 1 + λ{ω(Xij; θ) − 1} + ν⊤g(Xij; η) ] + Σ_{j=1}^{n1} Q(X1j),

∂ln(γ)/∂u = − Σ_{i=0}^{1} Σ_{j=1}^{ni} G(Xij; η) / [ 1 + λ{ω(Xij; θ) − 1} + ν⊤g(Xij; η) ].

Then the first derivatives at the true values η∗ and u∗ are

Sn = ∂ln(γ∗)/∂γ = ( ∂ln(γ∗)/∂ψ; ∂ln(γ∗)/∂θ; ∂ln(γ∗)/∂u ) = ( 0; Snθ; Snu ),

since ν∗ = 0, where

Snθ = Σ_{j=1}^{n1} Q(X1j) − Σ_{i=0}^{1} Σ_{j=1}^{ni} h1(Xij)Q(Xij),
Snu = − Σ_{i=0}^{1} Σ_{j=1}^{ni} G(Xij)/h0(Xij).

Similarly, we calculate the second derivatives of ln(γ).
Evaluating them at γ∗ gives

∂²ln(γ∗)/(∂γ∂γ⊤) = [ 0                  0                   ∂²ln(γ∗)/(∂ψ∂u⊤);
                     0                  ∂²ln(γ∗)/(∂θ∂θ⊤)   ∂²ln(γ∗)/(∂θ∂u⊤);
                     ∂²ln(γ∗)/(∂u∂ψ⊤)  ∂²ln(γ∗)/(∂u∂θ⊤)   ∂²ln(γ∗)/(∂u∂u⊤) ],  (27)

where

∂²ln(γ∗)/(∂ψ∂u⊤) = {∂²ln(γ∗)/(∂u∂ψ⊤)}⊤ = − Σ_{i=0}^{1} Σ_{j=1}^{ni} {∂G(Xij; η∗)/∂ψ}⊤/h0(Xij);

∂²ln(γ∗)/(∂θ∂θ⊤) = − Σ_{i=0}^{1} Σ_{j=1}^{ni} h1(Xij)h2(Xij)Q(Xij)Q(Xij)⊤;

∂²ln(γ∗)/(∂θ∂u⊤) = {∂²ln(γ∗)/(∂u∂θ⊤)}⊤ = Σ_{i=0}^{1} Σ_{j=1}^{ni} h1(Xij)Q(Xij)G(Xij)⊤/h0(Xij) − Σ_{i=0}^{1} Σ_{j=1}^{ni} {∂G(Xij; η∗)/∂θ}⊤/h0(Xij);

∂²ln(γ∗)/(∂u∂u⊤) = Σ_{i=0}^{1} Σ_{j=1}^{ni} G(Xij)G(Xij)⊤/h0²(Xij).

We first review a lemma from the supplementary material of Qin et al. (2015), which helps to ease the calculation in our proofs. In the following, we assume that the DRM (20) is satisfied as required in Condition C2.
Lemma 7. Suppose that S is an arbitrary vector-valued function. Let E0(·) represent the expectation operator with respect to F0 and X refer to a random variable from F0. Then we have, for j = 1, . . . , n1,

E{S(X1j)} = E0{ω(X)S(X)}  and  E{ Σ_{i=0}^{1} Σ_{j=1}^{ni} S(Xij) } = nE0{S(X)h0(X)}.

Proof. Under the DRM with true parameter θ∗, we have

E{S(X1j)} = ∫ S(x) dF1(x) = ∫ S(x)ω(x) dF0(x) = E0{ω(X)S(X)}.

Using the fact that λ∗ = n1/n and the definition of the function h0(·), we further have

E{ Σ_{i=0}^{1} Σ_{j=1}^{ni} S(Xij) } = n0E0{S(X)} + n1E0{ω(X)S(X)}
= n[ (1 − λ∗)E0{S(X)} + λ∗E0{ω(X)S(X)} ]
= nE0[ {(1 − λ∗) + λ∗ω(X)}S(X) ]
= nE0{h0(X)S(X)}.

This completes the proof.

Recall that

Aθθ = (1 − λ∗)E0{h1(X)Q(X)Q(X)⊤},
Aθu = Auθ⊤ = E0{∂G(X; η∗)/∂θ}⊤ − E0{h1(X)Q(X)G(X)⊤},
Aψu = Auψ⊤ = E0{∂G(X; η∗)/∂ψ}⊤,
Auu = E0{G(X)G(X)⊤/h0(X)}.

Applying Lemma 7, after some algebra, we have the following lemma.
Lemma 8. (a) With the form of ∂ l n ( γ ∗ ) / ( ∂ γ ∂ γ ⊤ ) defined in (27) , we have − n E (cid:26) ∂ l n ( γ ∗ ) ∂ γ ∂ γ ⊤ (cid:27) = A = A ψu A θθ A θu A uψ A uθ − A uu . (b) Let S ∗ n = ( S ⊤ n θ , S ⊤ n u ) ⊤ . Then as n → ∞ , n − / S ∗ n → N ( , Γ ) in distribution with e θ = (cid:18) d × (cid:19) , e u = (cid:18) r × (cid:19) , C = (cid:18) A θθ e θ − λ ∗ (1 − λ ∗ ) A uu e u (cid:19) , and Γ = (cid:18) A θθ A uu (cid:19) − λ ∗ (1 − λ ∗ ) CC ⊤ . Proof.
For (a): Note that Conditions C3 and C4 ensure that A is well defined. The results then follow by applying Lemma 7 to each term of E{∂²l_n(γ*)/(∂γ∂γ^⊤)}. We use E{∂²l_n(γ*)/(∂θ∂θ^⊤)} as an illustration; for the other entries, the idea is similar and we omit the details.

With Lemma 7 and the fact that h_0(x) h(x) = 1 − λ*, we have

−(1/n) E{ ∂²l_n(γ*)/(∂θ∂θ^⊤) } = (1/n) E{ Σ_{i=0}^{1} Σ_{j=1}^{n_i} h_0(X_{ij}) h_1(X_{ij}) Q(X_{ij}) Q(X_{ij})^⊤ } = (1 − λ*) E_0{h_1(X_0) Q(X_0) Q(X_0)^⊤} = A_θθ.

For (b): Conditions C2–C4 ensure that E(S*_n) and Var(S*_n) are well defined. We first use the results in Lemma 7 to show that E(S*_n) = 0. For E(S_nθ),

E(S_nθ) = n_1 E_1{Q(X_1)} − n E_0{h_1(X_0) h(X_0) Q(X_0)} = n_1 E_0{ω(X_0) Q(X_0)} − n E_0{λ* ω(X_0) Q(X_0)} = 0.

The last step follows from the fact that λ* = n_1/n. The unbiasedness of the EEs leads to E(S_nu) = −n E_0{G(X_0; η*)} = 0. Hence, we have E(S*_n) = 0.

Since S*_n is a summation of independent random vectors, by the central limit theorem, n^{−1/2} S*_n → N(0, Γ) for some Γ. Next, we show that Γ has the form claimed in the lemma.

We start with the variances of n^{−1/2} S_nθ and n^{−1/2} S_nu. Note that

S_nθ = Σ_{j=1}^{n_1} h_0(X_{1j}) Q(X_{1j}) − Σ_{j=1}^{n_0} h_1(X_{0j}) Q(X_{0j}).

With the help of Lemma 7, we have
Var(n^{−1/2} S_nθ) = (1/n) Var( Σ_{j=1}^{n_1} h_0(X_{1j}) Q(X_{1j}) − Σ_{j=1}^{n_0} h_1(X_{0j}) Q(X_{0j}) )
= λ* E_0{h_0²(X_0) ω(X_0) Q(X_0) Q(X_0)^⊤} + (1 − λ*) E_0{h_1²(X_0) Q(X_0) Q(X_0)^⊤} − λ* E_0{h_0(X_0) ω(X_0) Q(X_0)} E_0{h_0(X_0) ω(X_0) Q(X_0)^⊤} − (1 − λ*) E_0{h_1(X_0) Q(X_0)} E_0{h_1(X_0) Q(X_0)^⊤}.

Using the definitions of the functions h_0(·) and h_1(·) and the fact that λ* = n_1/n, we further have

Var(n^{−1/2} S_nθ) = (1 − λ*) E_0{h_1(X_0) Q(X_0) Q(X_0)^⊤} − {(1 − λ*)/λ*} E_0{h_1(X_0) Q(X_0)} E_0{h_1(X_0) Q(X_0)^⊤}
= A_θθ − {λ*(1−λ*)}^{−1} A_θθ e_θ (A_θθ e_θ)^⊤.

Similarly, we compute the variance of n^{−1/2} S_nu as

Var(n^{−1/2} S_nu) = (1/n) Var( −Σ_{i=0}^{1} Σ_{j=1}^{n_i} G(X_{ij})/h(X_{ij}) )
= (1/n) Σ_{i=0}^{1} Σ_{j=1}^{n_i} E{ G(X_{ij}) G(X_{ij})^⊤ / h²(X_{ij}) } − (n_0/n) E_0{G(X_0)/h(X_0)} E_0{G(X_0)^⊤/h(X_0)} − (n_1/n) E_0{ω(X_0) G(X_0)/h(X_0)} E_0{ω(X_0) G(X_0)^⊤/h(X_0)}
= A_uu − (1 − λ*) E_0{G(X_0)/h(X_0)} E_0{G(X_0)^⊤/h(X_0)} − λ* E_0{ω(X_0) G(X_0)/h(X_0)} E_0{ω(X_0) G(X_0)^⊤/h(X_0)}.

It can easily be verified that

(1 − λ*) E_0{G(X_0)/h(X_0)} + λ* E_0{ω(X_0) G(X_0)/h(X_0)} = E_0{G(X_0)} = 0,

which implies that

E_0{ (ω(X_0) − 1) G(X_0)/h(X_0) } = −(1/λ*) E_0{G(X_0)/h(X_0)} = A_uu e_u.

Therefore,
Var(n^{−1/2} S_nu) = A_uu − λ*(1−λ*) A_uu e_u (A_uu e_u)^⊤.

Lastly, we consider the covariance between n^{−1/2} S_nθ and n^{−1/2} S_nu:

Cov(n^{−1/2} S_nθ, n^{−1/2} S_nu)
= −(1/n) Cov( Σ_{j=1}^{n_1} h_0(X_{1j}) Q(X_{1j}) − Σ_{j=1}^{n_0} h_1(X_{0j}) Q(X_{0j}),  Σ_{i=0}^{1} Σ_{j=1}^{n_i} G(X_{ij})^⊤/h(X_{ij}) )
= −(1/n) Σ_{j=1}^{n_1} Cov( h_0(X_{1j}) Q(X_{1j}), G(X_{1j})^⊤/h(X_{1j}) ) + (1/n) Σ_{j=1}^{n_0} Cov( h_1(X_{0j}) Q(X_{0j}), G(X_{0j})^⊤/h(X_{0j}) )
= λ* E_0{ω(X_0) h_0(X_0) Q(X_0)} E_0{ω(X_0) G(X_0)^⊤/h(X_0)} − (1 − λ*) E_0{h_1(X_0) Q(X_0)} E_0{G(X_0)^⊤/h(X_0)}
= (1 − λ*) E_0{h_1(X_0) Q(X_0)} E_0{ (ω(X_0) − 1) G(X_0)^⊤/h(X_0) }
= A_θθ e_θ (A_uu e_u)^⊤.

Then Γ = Var(n^{−1/2} S*_n) has the form claimed in the lemma. This completes the proof.

Proof of Theorem 1
Recall that γ̂ = (η̂^⊤, û^⊤)^⊤ is the MELE of γ. Using an argument similar to that in Qin & Lawless (1994) and Qin et al. (2015), we have that η̂ = η* + O_p(n^{−1/2}) and û = u* + O_p(n^{−1/2}). To develop the asymptotic approximation of η̂, we apply the first-order Taylor expansion to ∂l_n(γ̂)/∂γ at the true value γ*. This, together with Condition C5, gives

0 = S_n + ∂²l_n(γ*)/(∂γ∂γ^⊤) (γ̂ − γ*) + o_p(n^{1/2}).

With the law of large numbers and Lemma 8, we have

(1/n) ∂²l_n(γ*)/(∂γ∂γ^⊤) = (1/n) E{∂²l_n(γ*)/(∂γ∂γ^⊤)} + o_p(1) = −A + o_p(1).   (28)

Hence, we can write

( 0, 0 ; 0, A_θθ )(η̂ − η*) + ( A_ψu ; A_θu )(û − u*) = (1/n)( 0 ; S_nθ ) + o_p(n^{−1/2});   (29)
( A_uψ, A_uθ )(η̂ − η*) − A_uu(û − u*) = (1/n) S_nu + o_p(n^{−1/2}).   (30)

Recall that

U = ( 0, A_ψu ; A_θθ, A_θu ),  V = diag(A_θθ, A_uu),  and  J = U V^{−1} U^⊤.   (31)

Conditions C3 and C4 ensure that U, V, and J have full rank. Then (29) and (30) together imply that

n^{1/2}(η̂ − η*) = J^{−1} U V^{−1} (n^{−1/2} S*_n) + o_p(1).

Applying Lemma 8 and Slutsky's theorem, we have, as n → ∞, n^{1/2}(η̂ − η*) → N(0, Σ) in distribution with Σ = J^{−1} U V^{−1} Var(n^{−1/2} S*_n) V^{−1} U^⊤ J^{−1}.

Recall that Var(n^{−1/2} S*_n) = Γ = V − {λ*(1−λ*)}^{−1} C C^⊤ and C = ( A_θθ e_θ ; −λ*(1−λ*) A_uu e_u ). Since

A_ψu e_u = 0 and A_θu e_u = (1/λ*) E_0{h_1(X_0) Q(X_0)} = {λ*(1−λ*)}^{−1} A_θθ e_θ,
we have
U V^{−1} C = U V^{−1} ( A_θθ e_θ ; −λ*(1−λ*) A_uu e_u ) = ( −λ*(1−λ*) A_ψu e_u ; A_θθ e_θ − λ*(1−λ*) A_θu e_u ) = 0.

This leads to Σ = J^{−1} and completes the proof.

Part (a).
The results in Theorem 1 imply that n^{1/2}(θ̂ − θ*) → N(0, J_θ) in distribution, where

J_θ = { A_θθ + A_θu A_uu^{−1} A_uθ − A_θu A_uu^{−1} A_uψ (A_ψu A_uu^{−1} A_uψ)^{−1} A_ψu A_uu^{−1} A_uθ }^{−1}.

From the definitions of A_uψ and A_uu, we have

A_uψ = ( 0_{1×p} ; E_0{∂g(X_0; η*)/∂ψ} )

and

A_uu = ( E_0{(ω(X_0)−1)²/h(X_0)}, E_0{(ω(X_0)−1) g(X_0; η*)^⊤/h(X_0)} ; E_0{(ω(X_0)−1) g(X_0; η*)/h(X_0)}, E_0{g(X_0; η*) g(X_0; η*)^⊤/h(X_0)} ).

We write A_uu^{−1} = ( A_uu^{11}, A_uu^{12} ; A_uu^{21}, A_uu^{22} ). When r = p, we have

(A_ψu A_uu^{−1} A_uψ)^{−1} = [ E_0{∂g(X_0; η*)/∂ψ}^⊤ A_uu^{22} E_0{∂g(X_0; η*)/∂ψ} ]^{−1} = [ E_0{∂g(X_0; η*)/∂ψ} ]^{−1} (A_uu^{22})^{−1} [ E_0{∂g(X_0; η*)/∂ψ}^⊤ ]^{−1}.

This leads to

A_uu^{−1} A_uψ (A_ψu A_uu^{−1} A_uψ)^{−1} A_ψu A_uu^{−1} = ( A_uu^{12}(A_uu^{22})^{−1}A_uu^{21}, A_uu^{12} ; A_uu^{21}, A_uu^{22} ) = A_uu^{−1} − ( A_uu^{11} − A_uu^{12}(A_uu^{22})^{−1}A_uu^{21}, 0 ; 0, 0 ).

It can be verified that A_θu e_u = {λ*(1−λ*)}^{−1} A_θθ e_θ and

{ A_uu^{11} − A_uu^{12}(A_uu^{22})^{−1}A_uu^{21} }^{−1} = E_0{(ω(X_0)−1)²/h(X_0)} = {λ*(1−λ*)}^{−1} { 1 − e_θ^⊤ A_θθ e_θ/(λ*(1−λ*)) }.

By the Woodbury matrix identity, the variance matrix J_θ can be simplified as

J_θ = { A_θθ + [A_θθ e_θ/{λ*(1−λ*)}] [ E_0{(ω(X_0)−1)²/h(X_0)} ]^{−1} [A_θθ e_θ/{λ*(1−λ*)}]^⊤ }^{−1} = A_θθ^{−1} − e_θ e_θ^⊤/{λ*(1−λ*)}.

This is the same as the asymptotic variance of n^{1/2}(θ̃ − θ*) shown in Lemma 1 of Qin & Zhang (1997) under Conditions C1–C3.

Part (b).
For r > p, let U_m, V_m, and J_m denote the corresponding U, V, and J matrices obtained by using only the first m EEs of g(x; η). With Theorem 1, to complete the proof of this part it suffices to show that J_m ≥ J_{m−1}.

From the definition of the matrix U, we notice that U_m has one more column than U_{m−1}; we denote this extra column by u_m, so that U_m = (U_{m−1}, u_m). Following the proof of Corollary 1 of Qin & Lawless (1994), we have

V_m^{−1} ≥ ( V_{m−1}^{−1}, 0 ; 0, 0 ).   (32)

Therefore,

J_m = U_m V_m^{−1} U_m^⊤ ≥ (U_{m−1}, u_m) ( V_{m−1}^{−1}, 0 ; 0, 0 ) (U_{m−1}, u_m)^⊤ = J_{m−1},   (33)

as required. This completes the proof.

Recall that the null hypothesis forms a constraint C = {η : H(η) = 0}, and the ELR statistic for testing H_0: H(η) = 0 is defined as

R_n = 2{ sup_{ψ,θ} ℓ_n(ψ, θ) − sup_{η∈C} ℓ_n(ψ, θ) } = 2{ ℓ_n(ψ̂, θ̂) − ℓ_n(ψ̌, θ̌) },

where (ψ̌, θ̌) = arg max_{η∈C} ℓ_n(ψ, θ). In the following steps, we find the approximations of ℓ_n(ψ̂, θ̂) and ℓ_n(ψ̌, θ̌).

We first derive the approximation of l_n(γ) when γ is in an n^{−1/2} neighborhood of its true value γ*. Applying the second-order Taylor expansion to l_n(γ), and using (28) and Condition C5, we have

l_n(γ) = l_n(γ*) + S_n^⊤(γ − γ*) − (n/2)(γ − γ*)^⊤ A (γ − γ*) + o_p(1)
= l_n(γ*) + (0^⊤, S_nθ^⊤)(η − η*) + S_nu^⊤(u − u*) − (n/2)(η − η*)^⊤ ( 0, 0 ; 0, A_θθ )(η − η*) − n(η − η*)^⊤ ( A_ψu ; A_θu )(u − u*) + (n/2)(u − u*)^⊤ A_uu (u − u*) + o_p(1).

Setting the derivative of l_n(γ) with respect to u equal to zero gives

u − u* = A_uu^{−1} ( A_uψ, A_uθ )(η − η*) − A_uu^{−1} (S_nu/n) + o_p(n^{−1/2}).

Substituting this approximation of u − u* into l_n(γ) leads to an approximation of ℓ_n(ψ, θ):

ℓ_n(ψ, θ) = l_n(γ*) + (η − η*)^⊤ U V^{−1} S*_n − (n/2)(η − η*)^⊤ J (η − η*) − {1/(2n)} S_nu^⊤ A_uu^{−1} S_nu + o_p(1).
(34)With the approximation of ˆ η in (34), we then have ℓ n ( ˆ ψ , ˆ θ ) = l n ( γ ∗ ) + 12 n S ∗⊤ n V − U ⊤ J − U V − S ∗ n − n S ⊤ n u A − uu S n u + o p (1) . Next, we find an approximation for ˇ η = ( ˇ ψ ⊤ , ˇ θ ⊤ ) ⊤ . We first define ℓ ∗ n ( ψ , θ , v ) = ℓ n ( ψ , θ ) + n v ⊤ H ( η ) , where v is the Lagrange multiplier. Then ˇ η and the corresponding Lagrange multiplierˇ v satisfy ∂ℓ ∗ n ( ˇ ψ , ˇ θ , ˇ v ) ∂ ψ = , ∂ℓ ∗ n ( ˇ ψ , ˇ θ , ˇ v ) ∂ θ = , ∂ℓ ∗ n ( ˇ ψ , ˇ θ , ˇ v ) ∂ v = . (35)It is easy to verify that ˇ γ = γ ∗ + O p ( n − / ) and ˇ v = O p ( n − / ) (Qin & Lawless, 1995;Qin et al., 2015).Let h ∗ = ∂ H ( η ∗ ) /∂ η . When η is in the n − / neighborhood of the true value η ∗ , weapproximate H ( η ) with H ( η ) = h ∗ ( η − η ∗ )+ o p ( n − / ). Together with the approximationof ℓ n ( ψ , θ ) in (34), we approximate ℓ ∗ n ( ψ , θ , v ) at an n − / neighbor of ( ψ ⊤ , θ ⊤ , × q ) ⊤ ℓ ∗ n ( ψ , θ , v ) = l n ( γ ∗ ) + ( η − η ∗ ) ⊤ U V − S ∗ n − n η − η ∗ ) ⊤ J ( η − η ∗ )+ n v ⊤ h ∗ ( η − η ∗ ) − n S ⊤ n u A − uu S n u + o p (1) . Applying the first-order Taylor expansion to (35), we have (cid:18) J − h ∗⊤ − h ∗ (cid:19) (cid:18) ˇ η − η ∗ ˇ v (cid:19) = 1 n (cid:18) U V − S ∗ n (cid:19) + o p ( n − ) . Hence, n / (ˇ η − η ∗ ) = ( I , ) (cid:18) J − h ∗⊤ − h ∗⊤ (cid:19) − (cid:18) n − / U V − S n (cid:19) + o p (1)= { J − − J − h ∗⊤ ( h ∗ J − h ∗⊤ ) − h ∗ J − } U V − ( n − / S ∗ n ) + o p (1) , (36)where I is the identity matrix with dimension p + d + 1.Substituting the expression of ˇ η in (36) into (34) gives ℓ n ( ˇ ψ , ˇ θ ) = l n ( γ ∗ ) + 12 n S ∗⊤ n V − U ⊤ { J − − J − h ∗⊤ ( h ∗ J − h ∗⊤ ) − h ∗ J − } U V − S ∗ n − n S ⊤ n u A − uu S n u + o p (1) . Hence, the ELR statistic R n can be written as R n = 1 n S ∗⊤ n V − U ⊤ J − h ∗⊤ ( h ∗ J − h ∗⊤ ) − h ∗ J − U V − S ∗ n + o p (1) . We find that J − / h ∗⊤ ( h ∗ J − h ∗⊤ ) − h ∗ J − / is an idempotent matrix with rank q . 
Fur-ther, as n → ∞ , J − / U V − ( n − / S ∗ n ) → N (0 , I )in distribution. Therefore, the limiting distribution of R n is χ q under H . We start with the proof of Theorem 3. Recall that the ELR statistic for testing thevalidity of the EEs is defined as W n = 2 n ℓ nd (˜ θ ) − ℓ n ( ˆ ψ , ˆ θ ) o . We first find an approximation of ℓ nd (˜ θ ). Applying the second-order Taylor expansion42o ℓ nd (˜ θ ) at the true value θ ∗ , we have ℓ nd (˜ θ ) = ℓ nd ( θ ∗ ) + (˜ θ − θ ∗ ) ⊤ ∂ℓ nd ( θ ∗ ) ∂ θ + 12 (˜ θ − θ ∗ ) ⊤ ∂ ℓ nd ( θ ∗ ) ∂ θ ∂ θ ⊤ (˜ θ − θ ∗ ) + o p (1) . The fact that ν ∗ = implies ℓ nd ( θ ∗ ) = l n ( γ ∗ ). According to Qin & Zhang (1997), it iseasy to verify that˜ θ − θ ∗ = 1 n A − θθ ∂ℓ nd ( θ ∗ ) ∂ θ + o p ( n − / ) , ∂ℓ nd ( θ ∗ ) ∂ θ = S n θ , and 1 n ∂ ℓ nd ( θ ∗ ) ∂ θ ∂ θ ⊤ = − A θθ + o p (1) . Then ℓ nd (˜ θ ) = l n ( γ ∗ ) + 12 n S ⊤ n θ A − θθ S n θ + o p (1) . Hence, the ELR statistic can be written as W n = 2 n ℓ nd (˜ θ ) − ℓ n ( ˆ ψ , ˆ θ ) o = 1 n S ⊤ n θ A − θθ S n θ + 1 n S ⊤ n u A − uu S n u − n S ∗⊤ n V − U ⊤ J − U V − S ∗ n = 1 n S ∗⊤ n V − ( V − U ⊤ J − U ) V − S ∗ n + o p (1) . (37)Since V is a positive-definite matrix, we define an inner product on the vector space R d + r as < a , b > V − = a ⊤ V − b for any vector a , b in the vector space. Recall that C = (cid:18) A θθ e θ − λ ∗ (1 − λ ∗ ) A uu e u (cid:19) . The vector C and each row in U are linearly independent in the inner product spacebecause U V − C = . Let V be the inner product space spanned by the vector C andeach row in U . Then there exists an orthogonal complement B of the subspace V withthe dimension r − p . Let the columns of C ∗ be the basis of the orthogonal complement B . Then C ∗ satisfies C ∗⊤ V − ( C , U ⊤ ) = . Define M ⊤ = ( C ∗ , C , U ⊤ ), which satisfies M V − M ⊤ = C ∗⊤ V − C ∗ C ⊤ V − C
J ) = diag{ C*^⊤ V^{−1} C*, C^⊤ V^{−1} C, J }.

With the above construction, M is a full-rank matrix and can be inverted. We can write the inverse of M V^{−1} M^⊤ as

(M^⊤)^{−1} V M^{−1} = diag{ (C*^⊤ V^{−1} C*)^{−1}, (C^⊤ V^{−1} C)^{−1}, J^{−1} }.

Hence

V = M^⊤ (M^⊤)^{−1} V M^{−1} M = C*(C*^⊤ V^{−1} C*)^{−1} C*^⊤ + C(C^⊤ V^{−1} C)^{−1} C^⊤ + U^⊤ J^{−1} U.

Note that

C^⊤ V^{−1} S*_n = e_θ^⊤ S_nθ − λ*(1−λ*) e_u^⊤ S_nu = n_1 − Σ_{i=0}^{1} Σ_{j=1}^{n_i} h_1(X_{ij}) + λ*(1−λ*) Σ_{i=0}^{1} Σ_{j=1}^{n_i} (ω(X_{ij}) − 1)/h(X_{ij}) = 0.

This helps to simplify W_n as

W_n = (1/n) S*_n^⊤ V^{−1} C* (C*^⊤ V^{−1} C*)^{−1} C*^⊤ V^{−1} S*_n + o_p(1).

According to Lemma 8, we have
Var(n^{−1/2} S*_n) = V − {λ*(1−λ*)}^{−1} C C^⊤.
00 0 (cid:19) . Then1 n (cid:2) S ∗⊤ n V − ( V − U ⊤ J − U ) V − S ∗ n − S ∗⊤ n V − ( V − U ⊤ J − U ) V − S ∗ n (cid:3) ≥ . Recall that as n → ∞ ,1 n S ∗⊤ n V − ( V − U ⊤ J − U ) V − S ∗ n → χ r − p in distribution. We can similarly prove that as n → ∞ ,1 n S ∗⊤ n V − ( V − U ⊤ J − U ) V − S ∗ n → χ r − m − p in distribution.By the arguments in Qin & Lawless (1994), we conclude that W ∗ n → χ r − p ) − ( r − m − p ) = χ m in distribution as n → ∞ . For (a): We start with some preparation. For any x in the support of F , let F ( x, γ ) = 1 n X i =0 n i X j =1 I ( X ij ≤ x )1 + λ { ω ( X ij ; θ ) − } + ν ⊤ g ( X ij ; ψ , θ ) ,F ( x, γ ) = 1 n X i =0 n i X j =1 ω ( X ij ; θ ) I ( X ij ≤ x )1 + λ { ω ( X ij ; θ ) − } + ν ⊤ g ( X ij ; ψ , θ ) . F ( x ) = F ( x, ˆ γ ) , F ( x, γ ∗ ) = 1 n X i =0 n i X j =1 I ( X ij ≤ x ) h ( X ij ) , ˆ F ( x ) = F ( x, ˆ γ ) , F ( x, γ ∗ ) = 1 n X i =0 n i X j =1 ω ( X ij ) I ( X ij ≤ x ) h ( X ij ) . Next, we explore the properties of the first derivatives of F ( x, γ ) and F ( x, γ ) at thetrue value γ ∗ . Define ∂F ( x, γ ∗ ) ∂ γ = ∂F ( x, γ ∗ ) ∂ ψ ∂F ( x, γ ∗ ) ∂ θ ∂F ( x, γ ∗ ) ∂ u , ∂F ( x, γ ∗ ) ∂ γ = ∂F ( x, γ ∗ ) ∂ ψ ∂F ( x, γ ∗ ) ∂ θ ∂F ( x, γ ∗ ) ∂ u , where ∂F ( x, γ ∗ ) ∂ ψ = ∂F ( x, γ ∗ ) ∂ ψ = ,∂F ( x, γ ∗ ) ∂ θ = − n X i =0 n i X j =1 h ( X ij ) h ( X ij ) Q ( X ij ) I ( X ij ≤ x ) ,∂F ( x, γ ∗ ) ∂ u = − n X i =0 n i X j =1 G ( X ij ) { h ( X ij ) } I ( X ij ≤ x ) ,∂F ( x, γ ∗ ) ∂ θ = 1 n X i =0 n i X j =1 ω ( X ij ) h ( X ij ) h ( X ij ) Q ( X ij ) I ( X ij ≤ x ) ,∂F ( x, γ ∗ ) ∂ u = − n X i =0 n i X j =1 ω ( X ij ) { h ( X ij ) } G ( X ij ) I ( X ij ≤ x ) . Applying Lemma 7, we have the following results for E n ∂F ( x, γ ∗ ) ∂ γ o and E n ∂F ( x, γ ∗ ) ∂ γ o . Lemma 9.
With the forms of ∂F_0(x, γ*)/∂γ and ∂F_1(x, γ*)/∂γ defined above, we have

−E{ ∂F_0(x, γ*)/∂γ } = B_0(x) = ( 0 ; B_0θ(x) ; B_0u(x) ) = ( 0 ; B*_0(x) ),
−E{ ∂F_1(x, γ*)/∂γ } = B_1(x) = ( 0 ; B_1θ(x) ; B_1u(x) ) = ( 0 ; B*_1(x) ),

where

B_0θ(x) = E_0{h_1(X_0) Q(X_0) I(X_0 ≤ x)},  B_0u(x) = E_0{ G(X_0) I(X_0 ≤ x)/h(X_0) },
B_1θ(x) = {(λ* − 1)/λ*} E_0{h_1(X_0) Q(X_0) I(X_0 ≤ x)},  B_1u(x) = E_0{ ω(X_0) G(X_0) I(X_0 ≤ x)/h(X_0) }.
Hence, as n → ∞,

√n ( F̂_l(x) − F_l(x) ; F̂_s(y) − F_s(y) ) → N( 0, Σ_ls(x, y) ),

where

Σ_ls(x, y) = ( σ_ll(x, x), σ_ls(x, y) ; σ_sl(y, x), σ_ss(y, y) ).

To complete the proof of (a), we need to argue that Σ_ls(x, y) has the form claimed in the lemma. According to the expression of F̂_l(x) − F_l(x), we have

σ_ll(x, x) = n Var{F_l(x, γ*)} + n^{−1} Var( B*_l(x)^⊤ W S*_n ) − 2 Cov{ F_l(x, γ*), B*_l(x)^⊤ W S*_n };
σ_ss(y, y) = n Var{F_s(y, γ*)} + n^{−1} Var( B*_s(y)^⊤ W S*_n ) − 2 Cov{ F_s(y, γ*), B*_s(y)^⊤ W S*_n };
σ_ls(x, y) = n Cov{ F_l(x, γ*), F_s(y, γ*) } − Cov{ F_l(x, γ*), B*_s(y)^⊤ W S*_n } − Cov{ F_s(y, γ*), B*_l(x)^⊤ W S*_n } + B*_l(x)^⊤ { n^{−1} Var(W S*_n) } B*_s(y);
σ_sl(y, x) = σ_ls(x, y).

Next, we calculate the covariances and variances appearing above. We start with those related to F_0(x, γ*) and F_0(y, γ*). Let x∧y = min{x, y}. Using Lemma 7, we have

n Cov{ F_0(x, γ*), F_0(y, γ*) } = (1 − λ*) Cov( I(X_0 ≤ x)/h(X_0), I(X_0 ≤ y)/h(X_0) ) + λ* Cov( I(X_1 ≤ x)/h(X_1), I(X_1 ≤ y)/h(X_1) )
= E_0{ I(X_0 ≤ x∧y)/h(X_0) } − (1 − λ*) E_0{ I(X_0 ≤ x)/h(X_0) } E_0{ I(X_0 ≤ y)/h(X_0) } − λ* E_0{ ω(X_0) I(X_0 ≤ x)/h(X_0) } E_0{ ω(X_0) I(X_0 ≤ y)/h(X_0) }.

After some algebra, we have that for any x in the support of F_0,

B_0u(x)^⊤ e_u = E_0{ ω(X_0) I(X_0 ≤ x)/h(X_0) } − E_0{ I(X_0 ≤ x)/h(X_0) },
F_0(x) = E_0{ I(X_0 ≤ x)/h(X_0) } + λ* B_0u(x)^⊤ e_u.
nCov { F ( x, γ ∗ ) , F ( y, γ ∗ ) } is simplified as nCov { F ( x, γ ∗ ) , F ( y, γ ∗ ) } = E (cid:26) I ( X ≤ x ∧ y ) h ( X ) (cid:27) − λ ∗ B u ( x ) ⊤ e u e ⊤ u B u ( y ) − λ ∗ B u ( x ) ⊤ e u E (cid:26) I ( X ≤ y ) h ( X ) (cid:27) − λ ∗ E (cid:26) I ( X ≤ x ) h ( X ) (cid:27) e ⊤ u B u ( y ) − E (cid:26) I ( X ≤ x ) h ( X ) (cid:27) E (cid:26) I ( X ≤ y ) h ( X ) (cid:27) = E (cid:26) I ( X ≤ x ∧ y ) h ( X ) (cid:27) − λ ∗ B u ( x ) ⊤ e u e ⊤ u B u ( y ) − λ ∗ B u ( x ) ⊤ e u (cid:2) F ( y ) − λ ∗ e ⊤ u B u ( y ) (cid:3) − E (cid:26) I ( X ≤ x ) h ( X ) (cid:27) F ( y )= E (cid:26) I ( X ≤ x ∧ y ) h ( X ) (cid:27) − F ( x ) F ( y ) − λ ∗ (1 − λ ∗ ) B u ( x ) ⊤ e u e ⊤ u B u ( y ) . The covariances nCov { F ( x, γ ∗ ) , F ( y, γ ∗ ) } and nCov { F ( x, γ ∗ ) , F ( y, γ ∗ ) } can befound in a similar manner. For nCov { F ( x, γ ∗ ) , F ( y, γ ∗ ) } , we have nCov { F ( x, γ ∗ ) , F ( y, γ ∗ ) } = E (cid:26) ω ( X ) I ( X ≤ x ∧ y ) h ( X ) (cid:27) − (1 − λ ∗ ) E (cid:26) ω ( X ) I ( X ≤ x ) h ( X ) (cid:27) E (cid:26) ω ( X ) I ( X ≤ y ) h ( X ) (cid:27) − λ ∗ E (cid:26) ω ( X ) I ( X ≤ x ) h ( X ) (cid:27) E (cid:26) ω ( X ) I ( X ≤ y ) h ( X ) (cid:27) = E (cid:26) ω ( X ) I ( X ≤ x ∧ y ) h ( X ) (cid:27) − F ( x ) F ( y ) − λ ∗ (1 − λ ∗ ) B u ( x ) ⊤ e u e ⊤ u B u ( y )and nCov { F ( x, γ ∗ ) , F ( y, γ ∗ ) } = E (cid:26) ω ( X ) I ( X ≤ x ∧ y ) h ( X ) (cid:27) − (1 − λ ∗ ) E (cid:26) I ( X ≤ x ) h ( X ) (cid:27) E (cid:26) ω ( X ) I ( X ≤ y ) h ( X ) (cid:27) − λ ∗ E (cid:26) ω ( X ) I ( X ≤ x ) h ( X ) (cid:27) E (cid:26) ω ( X ) I ( X ≤ y ) h ( X ) (cid:27) = E (cid:26) ω ( X ) I ( X ≤ x ∧ y ) h ( X ) (cid:27) − F ( x ) F ( y ) − λ ∗ (1 − λ ∗ ) B u ( x ) ⊤ e u e ⊤ u B u ( y ) . In summary, for any l, s ∈ { , } , we get nCov { F l ( x, γ ∗ ) , F s ( y, γ ∗ ) } = E (cid:26) ω l + s ( X ) I ( X ≤ x ∧ y ) h ( X ) (cid:27) − F l ( x ) F s ( y ) − λ ∗ (1 − λ ∗ ) B l u ( x ) ⊤ e u e ⊤ u B s u ( y ) . (38)Next, we consider the cross-terms with S ∗ n . 
We present the calculation of Cov{ F_0(x, γ*), S*_n }
as an illustration. Using Lemma 7, we get
Cov { F ( x, γ ∗ ) , S n θ } = 1 n Cov ( X i =0 n i X j =1 I ( X ij ≤ x ) h ( X ij ) , n X j =1 h ( X j ) Q ( X j ) ⊤ − n X j =1 h ( X j ) Q ( X j ) ⊤ ) = λ ∗ Cov (cid:26) I ( X ≤ x ) h ( X ) , h ( X ) Q ( X ) ⊤ (cid:27) − (1 − λ ∗ ) Cov (cid:26) I ( X ≤ x ) h ( X ) , h ( X ) Q ( X ) ⊤ (cid:27) = (cid:20) E { h ( X ) I ( X ≤ x ) } − − λ ∗ λ ∗ E { h ( X ) I ( X ≤ x ) } (cid:21) E { h ( X ) Q ( X ) ⊤ } . It can be checked that E { h ( X ) Q ( X ) } = 11 − λ ∗ A θθ e θ ,E { h ( X ) I ( X ≤ x ) } − − λ ∗ λ ∗ E { h ( X ) I ( X ≤ x ) } = − (1 − λ ∗ ) B u ( x ) ⊤ e u . Then we have
Cov{ F_0(x, γ*), S_nθ } = −B_0u(x)^⊤ e_u (A_θθ e_θ)^⊤.

Similarly,
Cov { F ( x, γ ∗ ) , S n u } = − n Cov ( X i =0 n i X j =1 I ( X ij ≤ x ) h ( X ij ) , X i =0 n i X j =1 G ( X ij ) ⊤ h ( X ij ) ) = − λ ∗ Cov (cid:26) I ( X ≤ x ) h ( X ) , G ( X ) ⊤ h ( X ) (cid:27) − (1 − λ ∗ ) Cov (cid:26) I ( X ≤ x ) h ( X ) , G ( X ) ⊤ h ( X ) (cid:27) = − E (cid:26) I ( X ≤ x ) G ( X ) ⊤ h ( X ) (cid:27) + 11 − λ ∗ (cid:20) E { h ( X ) I ( X ≤ x ) } − − λ ∗ λ ∗ E { h ( X ) I ( X ≤ x ) } (cid:21) E { h ( X ) G ( X ) ⊤ } = − E (cid:26) I ( X ≤ x ) G ( X ) ⊤ h ( X ) (cid:27) − B u ( x ) ⊤ e u · E { h ( X ) G ( X ) ⊤ } = − B u ( x ) ⊤ + λ ∗ (1 − λ ∗ ) B u ( x ) ⊤ e u ( A uu e u ) ⊤ , where in the last step we used the facts that B u ( x ) = E (cid:26) I ( X ≤ x ) G ( X ) h ( X ) (cid:27) and E { h ( X ) G ( X ) } = − λ ∗ (1 − λ ∗ ) A uu e u . Recall that C = (cid:18) A θθ e θ − λ ∗ (1 − λ ∗ ) A uu e u (cid:19) . Cov { F ( x, γ ∗ ) , S ∗ n } = − (cid:18) B u ( x ) (cid:19) ⊤ − B u ( x ) ⊤ e u C ⊤ . The covariance between F ( x, γ ∗ ) and S ∗ n can be found in a similar manner; thedetails are omitted. We conclude that for any x in the support of F , Cov { F l ( x, γ ∗ ) , S ∗ n } = − (cid:18) B l u ( x ) (cid:19) ⊤ − B l u ( x ) ⊤ e u C ⊤ , l ∈ { , } . We now return to the form of Σ ( x, y ). Recall that n − V ar ( S n ) = Γ = V − λ ∗ (1 − λ ∗ ) CC ⊤ and U V − C = . This leads to B ∗ l ( x ) ⊤ W Γ = B ∗ l ( x ) ⊤ V − U ⊤ J − U − (cid:18) B l u ( x ) (cid:19) ⊤ − B l u ( x ) ⊤ e u C ⊤ = B ∗ l ( x ) ⊤ V − U ⊤ J − U + Cov { F l ( x, γ ∗ ) , S ∗ n } . Consequently, for l = 0 ,
1, the summation of the last two terms in σ ll ( x, x ) is n − V ar ( B ∗ l ( x ) ⊤ W S ∗ n ) − Cov (cid:8) F l ( x, γ ∗ ) , B ∗ l ( x ) ⊤ W S ∗ n (cid:9) = (cid:2) B ∗ l ( x ) ⊤ W Γ − Cov { F l ( x, γ ∗ ) , S ∗ n } (cid:3) W B ∗ l ( x )= " B ∗ l ( x ) ⊤ V − U ⊤ J − U + (cid:18) B l u ( x ) (cid:19) ⊤ + B l u ( x ) ⊤ e u C ⊤ W B ∗ l ( x )= B ∗ l ( x ) ⊤ W B ∗ l ( x ) + λ ∗ (1 − λ ∗ ) B l u ( x ) ⊤ e u e ⊤ u B l u ( x ) . (39)Combining (38) and (39) leads to σ ll ( x, x ) = E (cid:26) ω l ( X ) I ( X ≤ x ) h ( X ) (cid:27) − F l ( x ) + B ∗ l ( x ) ⊤ W B ∗ l ( x ) . (40)Using similar steps to derive (39), we find that the summation of the last three termsin σ ls ( x, y ) is B ∗ l ( x ) ⊤ W Γ W B ∗ s ( y ) − Cov { F l ( x, γ ∗ ) , S ∗ n } W B ∗ s ( y ) − B ∗ l ( x ) ⊤ W Cov { S ∗ n , F s ( y, γ ∗ ) } = B ∗ l ( x ) ⊤ V − U J − U ⊤ W B ∗ s ( y ) − B ∗ l ( x ) ⊤ W Cov { S ∗ n , F s ( y, γ ∗ ) } = B ∗ l ( x ) ⊤ W B ∗ s ( y ) + λ ∗ (1 − λ ∗ ) B l u ( x ) ⊤ e u e ⊤ u B s u ( y ) . (41)51ombining (38) and (41) gives σ ls ( x, y ) = E (cid:26) ω l + s ( X ) I ( X ≤ x ∧ y ) h ( X ) (cid:27) − F l ( x ) F s ( y ) + B ∗ l ( x ) ⊤ W B ∗ s ( y ) . (42)Summarizing (40) and (42), we conclude that for any i, j ∈ { l, s } σ ij ( x, y ) = E (cid:26) ω i + j ( X ) I ( X ≤ x ∧ y ) h ( X ) (cid:27) − F i ( x ) F j ( y ) + B ∗ i ( x ) ⊤ W B ∗ j ( y ) , (43)which is as claimed in the lemma. This completes the proof of (a).For (b): We prove that the claim in (b) is correct for l = 0 and s = 1. The proofs forthe other cases are similar and are omitted.We first simplify the matrix W . Let M ⊤ q = ( C , U ⊤ ). Then M q is full rank andtherefore invertible. Note that V = M ⊤ q ( M ⊤ q ) − V M − q M q = M ⊤ q ( M q V − M ⊤ q ) − M q . Recall that
U V − C = and J = U V − U ⊤ . Then M q V − M ⊤ q = (cid:18) C ⊤ V − C J (cid:19) and V = C ( C ⊤ V − C ) − C ⊤ + U ⊤ J − U . Note that C ⊤ V − C = e ⊤ θ A θθ e θ + { λ ∗ (1 − λ ∗ ) } e ⊤ u A uu e u = (1 − λ ∗ ) E { h ( X ) } + { λ ∗ (1 − λ ∗ ) } E (cid:20) { ω ( X ) − } h ( X ) (cid:21) = λ ∗ (1 − λ ∗ ) , where we use the fact that λ ∗ E (cid:20) { ω ( X ) − } h ( X ) (cid:21) + E (cid:26) ω ( X ) − h ( X ) (cid:27) = 0in the last step. The matrix V is expressed as V = { λ ∗ (1 − λ ∗ ) } − CC ⊤ + U ⊤ J − U . W as W = V − U ⊤ J − U V − − (cid:18) A − uu (cid:19) = V − { U ⊤ J − U − V } V − + (cid:18) A − θθ
, 0 ; 0, 0 ). That is,

W = ( A_θθ^{−1}, 0 ; 0, 0 ) − {λ*(1−λ*)}^{−1} V^{−1} C C^⊤ V^{−1} = ( A_θθ^{−1}, 0 ; 0, 0 ) − {λ*(1−λ*)}^{−1} ( e_θ ; −λ*(1−λ*) e_u )( e_θ ; −λ*(1−λ*) e_u )^⊤.

Substituting W into (43) and using the facts that

B*_0(x)^⊤ ( e_θ ; −λ*(1−λ*) e_u ) = λ* F_0(x),  B*_1(x)^⊤ ( e_θ ; −λ*(1−λ*) e_u ) = −(1 − λ*) F_1(x),

we find that for any i, j ∈ {l, s},

σ_ij(x, y) = E_0{ ω^{i+j}(X_0) I(X_0 ≤ x∧y)/h(X_0) } + B_iθ(x)^⊤ A_θθ^{−1} B_jθ(y) − δ_ij F_i(x) F_j(y),

where δ_ij = (1 − λ*)^{−1} if i = j = 0, δ_ij = (λ*)^{−1} if i = j = 1, and δ_ij = 0 if i ≠ j. This form is the same as that in Chen & Liu (2013) for the two-sample case, which completes the proof of (b).

For (c): Recall that U_m, V_m, and J_m denote the corresponding U, V, and J matrices obtained by using only the first m EEs of g(x; η). We further define Σ^{(m)}_{ls}(x, y) = { σ^{(m)}_{ij}(x, y) }_{i,j∈{l,s}} and B*^{(m)}_l(x) to denote the corresponding matrix Σ_{ls}(x, y) and vector B*_l(x) obtained by using the first m EEs.

From the definitions of these matrices and vectors, we notice the following relationships:

U_m = (U_{m−1}, u_m);  V_m = ( V_{m−1}, ϑ_{m−1,m} ; ϑ_{m,m−1}, ϑ_{m,m} );  B*^{(m)}_l(x) = ( B*^{(m−1)}_l(x) ; b_{lm}(x) ),

where u_m, ϑ_{m−1,m}, ϑ_{m,m}, and b_{lm}(x) are the extra terms coming from the m-th dimension of the EEs.

With the fact that W = V^{−1}( U^⊤ J^{−1} U − V )V^{−1} + ( A_θθ^{−1}, 0 ; 0, 0 ), the entry in the covariance matrix Σ^{(m)}_{ls}(x, y) can be written as

σ^{(m)}_{ij}(x, y) = E_0{ ω^{i+j}(X_0) I(X_0 ≤ x∧y)/h(X_0) } − F_i(x) F_j(y) + B*^{(m)}_i(x)^⊤ W B*^{(m)}_j(y)
= E_0{ ω^{i+j}(X_0) I(X_0 ≤ x∧y)/h(X_0) } − F_i(x) F_j(y) + B_iθ(x)^⊤ A_θθ^{−1} B_jθ(y) − B*^{(m)}_i(x)^⊤ V_m^{−1}( V_m − U_m^⊤ J_m^{−1} U_m )V_m^{−1} B*^{(m)}_j(y)

for any i, j ∈ {l, s}. Therefore,

Σ^{(m−1)}_{ls}(x, y) − Σ^{(m)}_{ls}(x, y)
= ( B*^{(m)}_l(x), B*^{(m)}_s(y) )^⊤ V_m^{−1}( V_m − U_m^⊤ J_m^{−1} U_m )V_m^{−1} ( B*^{(m)}_l(x), B*^{(m)}_s(y) ) − ( B*^{(m−1)}_l(x), B*^{(m−1)}_s(y) )^⊤ V_{m−1}^{−1}( V_{m−1} − U_{m−1}^⊤ J_{m−1}^{−1} U_{m−1} )V_{m−1}^{−1} ( B*^{(m−1)}_l(x), B*^{(m−1)}_s(y) ).

Using the results in (32) and (33), we have

V_m^{−1}( V_m − U_m^⊤ J_m^{−1} U_m )V_m^{−1} ≥ ( V_{m−1}^{−1}, 0 ; 0, 0 ) { ( V_{m−1}, ϑ_{m−1,m} ; ϑ_{m,m−1}, ϑ_{m,m} ) − (U_{m−1}, u_m)^⊤ J_{m−1}^{−1} (U_{m−1}, u_m) } ( V_{m−1}^{−1}, 0 ; 0, 0 )
≥ ( V_{m−1}^{−1}( V_{m−1} − U_{m−1}^⊤ J_{m−1}^{−1} U_{m−1} )V_{m−1}^{−1}, 0 ; 0, 0 ).

This implies that

Σ^{(m−1)}_{ls}(x, y) − Σ^{(m)}_{ls}(x, y) ≥ ( B*^{(m)}_l(x), B*^{(m)}_s(y) )^⊤ ( V_{m−1}^{−1}( V_{m−1} − U_{m−1}^⊤ J_{m−1}^{−1} U_{m−1} )V_{m−1}^{−1}, 0 ; 0, 0 ) ( B*^{(m)}_l(x), B*^{(m)}_s(y) ) − ( B*^{(m−1)}_l(x), B*^{(m−1)}_s(y) )^⊤ V_{m−1}^{−1}( V_{m−1} − U_{m−1}^⊤ J_{m−1}^{−1} U_{m−1} )V_{m−1}^{−1} ( B*^{(m−1)}_l(x), B*^{(m−1)}_s(y) ) = 0.

This completes the proof of (c).
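Inequalities (32)–(33), reused in the display above, rest on the linear-algebra fact that for a positive-definite V_m, V_m^{−1} ≥ diag(V_{m−1}^{−1}, 0) in the positive semi-definite order, so that J_m = U_m V_m^{−1} U_m^⊤ ≥ J_{m−1}. A small numerical sanity check of this fact can be sketched as follows; the dimensions and random matrices are illustrative assumptions of the example, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
k, m = 3, 6          # illustrative sizes: k rows in U_m, m estimating equations

min_gap = np.inf
for _ in range(200):
    A = rng.normal(size=(m, m))
    V_m = A @ A.T + m * np.eye(m)            # a random positive-definite V_m
    V_m1 = V_m[:m - 1, :m - 1]               # V_{m-1}: leading principal submatrix
    U_m = rng.normal(size=(k, m))
    U_m1 = U_m[:, :m - 1]                    # drop the column u_m for the m-th EE
    J_m = U_m @ np.linalg.solve(V_m, U_m.T)
    J_m1 = U_m1 @ np.linalg.solve(V_m1, U_m1.T)
    # J_m - J_{m-1} should be positive semi-definite
    min_gap = min(min_gap, np.linalg.eigvalsh(J_m - J_m1).min())

print(min_gap > -1e-8)   # True: adding an EE never decreases the information J
```

The check confirms, up to numerical round-off, that the difference J_m − J_{m−1} has no negative eigenvalues for arbitrary U_m and positive-definite V_m.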
We first introduce two lemmas that will be helpful in the proof of Theorem 5. Thefollowing lemma establishes the convergence rate of ˆ ξ i,τ . Lemma 10.
Assume that the conditions of Theorem 5 are satisfied. For each fixed τ ∈ (0, 1) and i = 0, 1, we have ξ̂_{i,τ} − ξ_{i,τ} = O_p(n^{−1/2}).

Proof.
We concentrate on the case i = 0; the case i = 1 can be proved similarly. Let∆ n = sup x | ˆ F ( x ) − F ( x ) | . It suffices to show that (Chen & Liu, 2013; Chen et al., 2019)∆ n = O p ( n − / ) . (44)Define ¯ F ( x ) = 1 n X i =0 n i X j =1 I ( X ij ≤ x )1 + λ ∗ h exp { ˆ θ ⊤ Q ( X ij ) } − i . Then∆ n = sup x | ˆ F ( x ) − F ( x ) | ≤ sup x | ˆ F ( x ) − ¯ F ( x ) | + sup x | ¯ F ( x ) − F i ( x ) | = ∆ n + ∆ n , where ∆ n = sup x | ˆ F ( x ) − ¯ F ( x ) | and ∆ n = sup x | ¯ F ( x ) − F ( x ) | . Following the proof of Theorem 3.1 in Chen & Liu (2013) and Lemma 1 in Chen et al.(2019), we can verify that ∆ n = O p ( n − / ) . With this result, the claim (44) is proved if ∆ n = O p ( n − / ).As preparation, we argue that( n ˆ p ij ) − = 1 + ˆ λ [exp { ˆ θ ⊤ Q ( X ij ) } −
1] + ˆ ν ⊤ g ( X ij ; ˆ ψ , ˆ θ ) ≥ − λ ∗ + o p (1) (45)or equivalently ˆ p ij ≤ n − { − λ ∗ + o p (1) } − = O p (1 /n ). Note that( n ˆ p ij ) − ≥ − ˆ λ + ˆ ν ⊤ g ( X ij ; ˆ ψ , ˆ θ ) ≥ − ˆ λ − k ˆ ν k max ij k g ( X ij ; ˆ ψ , ˆ θ ) k . By Condition C5, max ij k g ( X ij ; ˆ ψ , ˆ θ ) k ≤ max ij R / ( X ij ) = o p ( n / ) , which, together with ˆ γ − γ ∗ = O p ( n − / ), implies that (45) is valid.55e now return to argue that ∆ n = O p ( n − / ). After some algebra, we haveˆ F ( x ) − ¯ F ( x )= X i =0 n i X j =1 ˆ p ij ( λ ∗ − ˆ λ ) h exp { ˆ θ ⊤ Q ( X ij ) } − i − ˆ ν ⊤ g ( X ij ; ˆ ψ , ˆ θ )1 + λ ∗ h exp { ˆ θ ⊤ Q ( X ij ) } − i I ( X ij ≤ x ) . Using (45), we have | ˆ F ( x ) − ¯ F ( x ) | ≤ O p (1 /n ) X i =0 n i X j =1 | ˆ λ − λ ∗ | h exp { ˆ θ ⊤ Q ( X ij ) } + 1 i λ ∗ h exp { ˆ θ ⊤ Q ( X ij ) } − i I ( X ij ≤ x )+ O p (1 /n ) X i =0 n i X j =1 | ˆ ν ⊤ g ( X ij ; ˆ ψ , ˆ θ ) | λ ∗ h exp { ˆ θ ⊤ Q ( X ij ) } − i I ( X ij ≤ x ) ≤ O p (1 /n ) X i =0 n i X j =1 | ˆ λ − λ ∗ | λ ∗ (1 − λ ∗ ) I ( X ij ≤ x )+ O p (1 /n ) X i =0 n i X j =1 | ˆ ν ⊤ g ( X ij ; ˆ ψ , ˆ θ ) | − λ ∗ I ( X ij ≤ x ) . (46)By Condition C5,∆ n = sup x | ˆ F ( x ) − ¯ F ( x ) | ≤ O p (1) | ˆ λ − λ ∗ | + O p (1) 1 n X i =0 n i X j =1 (cid:8) k ˆ ν k R / ( X ij ) (cid:9) , which, together with ˆ γ − γ ∗ = O p ( n − / ), implies that∆ n = O p ( n − / ) . This completes the proof.
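The quantities manipulated in this proof have a concrete computational counterpart in the simplest setting with no auxiliary EEs (so ν̂ = 0 and the weights reduce to p̂_ij = 1/[n{1 + λ̂(ω̂(X_ij) − 1)}]). A minimal numerical sketch, assuming two normal populations with q(x) = x (an illustrative choice, not taken from the paper), fits the DRM by logistic regression on the pooled sample — the dual empirical-likelihood view related to Qin & Zhang (1997) — and then forms the weights and the fitted baseline CDF:

```python
import numpy as np

rng = np.random.default_rng(7)
n0, n1 = 4000, 4000
x0 = rng.normal(0.0, 1.0, n0)          # F0 = N(0,1)
x1 = rng.normal(1.0, 1.0, n1)          # F1 = N(1,1): DRM holds with q(x)=x,
                                       # alpha = -0.5, beta = 1 (example values)
x = np.concatenate([x0, x1])
y = np.concatenate([np.zeros(n0), np.ones(n1)])
n, lam = n0 + n1, n1 / (n0 + n1)

# Fit the DRM by pooled logistic regression (Newton-Raphson);
# the logistic intercept equals alpha + log(n1/n0).
Z = np.column_stack([np.ones(n), x])
t = np.zeros(2)
for _ in range(50):
    p = 1.0 / (1.0 + np.exp(-Z @ t))
    grad = Z.T @ (y - p)
    H = (Z * (p * (1 - p))[:, None]).T @ Z
    t += np.linalg.solve(H, grad)
alpha_hat, beta_hat = t[0] - np.log(n1 / n0), t[1]

# EL weights for the baseline F0, and the fitted CDF estimate
w = np.exp(alpha_hat + beta_hat * x)               # omega(x; theta_hat)
p_ij = 1.0 / (n * (1 - lam + lam * w))             # h(x) in the denominator
F0_hat = lambda t_: np.sum(p_ij * (x <= t_))

print(abs(p_ij.sum() - 1.0) < 1e-8, abs(beta_hat - 1.0) < 0.2)
```

At the fitted maximum the weights sum to one exactly (a consequence of the logistic score equation), and F̂_0 tracks the true baseline CDF, consistent with the O_p(n^{−1/2}) rate established above.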
Lemma 11.
Under the regularity conditions, for any c > 0 and i = 0, 1, we have

sup_{x : |x − ξ_{i,τ}| ≤ c n^{−1/2}} | { F̂_i(x) − F̂_i(ξ_{i,τ}) } − { F_i(x) − F_i(ξ_{i,τ}) } | = o_p(n^{−1/2}).

Proof. We prove this lemma for i = 0; the case i = 1 is similar. Without loss of generality, we assume x ≥ ξ_{0,τ}. Note that

| { F̂_0(x) − F̂_0(ξ_{0,τ}) } − { F_0(x) − F_0(ξ_{0,τ}) } | ≤ | { F̂_0(x) − F̂_0(ξ_{0,τ}) } − { F̄_0(x) − F̄_0(ξ_{0,τ}) } | + | { F̄_0(x) − F̄_0(ξ_{0,τ}) } − { F_0(x) − F_0(ξ_{0,τ}) } |.   (47)

Following the proof of Lemma A.2 in Chen & Liu (2013), we can verify that each term on the right-hand side of (47) is o_p(n^{−1/2}) uniformly over {x : 0 ≤ x − ξ_{0,τ} ≤ c n^{−1/2}}. This completes the proof.

The results in (a) and (b) are direct consequences of Theorems 4 and 5.

For (c): We note that

Ω_ls = diag{ 1/f_l(ξ_{l,τ_l}), 1/f_s(ξ_{s,τ_s}) } Σ_ls(ξ_{l,τ_l}, ξ_{s,τ_s}) diag{ 1/f_l(ξ_{l,τ_l}), 1/f_s(ξ_{s,τ_s}) }.

Then Theorem 4(c) implies the results in (c). This completes the proof.

References

Chatterjee, N., Chen, Y.-H., Maas, P., & Carroll, R. J. (2016). Constrained maximum likelihood estimation for model calibration using summary-level information from external big data sources. Journal of the American Statistical Association, 111, 107–117.

Chen, J., Li, P., Liu, Y., & Zidek, J. V. (2019). Composite empirical likelihood for multisample clustered data. Submitted.

Chen, J. & Liu, Y. (2013). Quantile and quantile-function estimations under density ratio model. The Annals of Statistics, 41, 1669–1692.

Qin, J. & Lawless, J. (1994). Empirical likelihood and general estimating equations. The Annals of Statistics, 22, 300–325.

Qin, J. & Lawless, J. (1995). Estimating equations, empirical likelihood and constraints on parameters. The Canadian Journal of Statistics, 23, 145–159.

Qin, J. & Zhang, B. (1997). A goodness-of-fit test for logistic regression models based on case-control data. Biometrika, 84, 609–618.

Qin, J., Zhang, H., Li, P., Albanes, D., & Yu, K. (2015). Using covariate-specific disease prevalence information to increase the power of case-control studies. Biometrika, 102.