Estimating linear functionals in nonlinear regression with responses missing at random
Institute of Mathematical Statistics, 2009
By Ursula U. Müller
Texas A&M University
We consider regression models with parametric (linear or nonlinear) regression function and allow responses to be "missing at random." We assume that the errors have mean zero and are independent of the covariates. In order to estimate expectations of functions of covariate and response we use a fully imputed estimator, namely an empirical estimator based on estimators of conditional expectations given the covariate. We exploit the independence of covariates and errors by writing the conditional expectations as unconditional expectations, which can now be estimated by empirical plug-in estimators. The mean zero constraint on the error distribution is exploited by adding suitable residual-based weights. We prove that the estimator is efficient (in the sense of Hájek and Le Cam) if an efficient estimator of the parameter is used. Our results give rise to new efficient estimators of smooth transformations of expectations. Estimation of the mean response is discussed as a special (degenerate) case.
1. Introduction.
Consider a regression model $Y = r_\vartheta(X) + \varepsilon$ with linear or nonlinear regression function $r_\vartheta$ depending on a finite-dimensional parameter $\vartheta$ in some open set. Assume that the covariate vector $X$ and the error variable $\varepsilon$ are independent and that $E\varepsilon = 0$. Note that we do not make any further model assumptions on the distributions of the variables. We are interested in the situation where the response $Y$ is missing at random; in other words, we always observe $X$ but only observe $Y$ in those cases where some indicator $Z$ equals one, and the indicator $Z$ is conditionally independent of $Y$ given $X$.

We want to estimate the expectation $Eh(X,Y)$ of some known square-integrable function $h$ from a sample $(X_i, Z_iY_i, Z_i)$, $i = 1, \dots, n$, for example, the mean response, higher moments of $Y$ or $X$, or mixed moments. If all

Received December 2007; revised July 2008.
AMS 2000 subject classifications. Primary 62J02; secondary 62N01, 62F12, 62G20.

Key words and phrases. Semiparametric regression, weighted empirical estimator, empirical likelihood, influence function, gradient, confidence interval.
This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in The Annals of Statistics, 2009, Vol. 37, No. 5A, 2245–2277. This reprint differs from the original in pagination and typographic detail.
indicators $Z_i$ were 1, a simple consistent estimator would be the empirical estimator $n^{-1}\sum_{i=1}^n h(X_i, Y_i)$. A related estimator for the missing data situation considered here would be
$$\frac{1}{n}\sum_{i=1}^n \frac{Z_i}{\hat\pi(X_i)}\, h(X_i, Y_i)$$
with $\hat\pi(X)$ denoting an estimator of the conditional probability $\pi(X) = P(Z=1\mid X) = E(Z\mid X)$. Another estimator is the partially imputed estimator
$$\frac{1}{n}\sum_{i=1}^n \{Z_i h(X_i, Y_i) + (1 - Z_i)\hat\chi(X_i)\},$$
where $\hat\chi(X)$ is a (semiparametric) estimator of the conditional expectation $\chi(X) = E\{h(X,Y)\mid X\}$. An alternative to this estimator is the fully imputed estimator $n^{-1}\sum_{i=1}^n \hat\chi(X_i)$.

If a nonparametric estimator $\hat\chi$ is used, we expect all three estimators to be asymptotically equivalent. For $h(X,Y)=Y$ and the last two estimators, this is sketched in Cheng (1994). Here we assume a specific form of the conditional distribution of $Y$ given $X$, and we can construct better estimators than the nonparametric ones. We then expect the fully imputed estimator $n^{-1}\sum_{i=1}^n \hat\chi(X_i)$ to be better than the partially imputed one, which in turn should be better than the first estimator. For parametric models this is shown for $h(X,Y)=Y$ by Tamhane (1978) and Matloff (1981). Müller, Schick and Wefelmeyer (2006) show for several regression models (not including the present one) and arbitrary $h$ that the fully imputed estimator is usually better than the partially imputed estimator. That the same holds for the nonlinear regression model considered here is intuitively clear: our model $E(Y\mid X) = r_\vartheta(X)$ constitutes a structural constraint. The fully imputed estimator, based on estimators $\hat\chi(X)$ that use the structure, will therefore be better than the partially imputed estimator, which uses this information only at data points where responses are missing.

In this article we study the fully imputed estimator based on suitable estimators for $\chi(X)$ and show that it is efficient.
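As a concrete illustration of the three estimators just described, the small simulation below compares them on the mean functional $h(X,Y)=Y$. The linear model, the logistic missingness mechanism and all variable names are my own illustrative choices, not from the paper; for simplicity the true $\pi$ is plugged into the inverse-probability-weighted estimator.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear model Y = 2X + eps with responses missing at random
n = 2000
X = rng.uniform(1.0, 3.0, n)
Y = 2.0 * X + rng.normal(0.0, 1.0, n)
pi = 1.0 / (1.0 + np.exp(-(X - 2.0)))        # pi(x) = P(Z = 1 | X = x)
Z = (rng.uniform(size=n) < pi).astype(float)

def h(x, y):                                 # estimate E h(X, Y) = EY
    return y

# inverse-probability-weighted estimator (true pi used for simplicity)
ipw = np.mean(Z / pi * h(X, Y))

# parametric estimator of chi(x) = E{h(X,Y) | X = x} = theta*x for h(x,y) = y
theta_hat = np.sum(Z * X * Y) / np.sum(Z * X * X)   # complete-case LSE
chi_hat = theta_hat * X

# partially imputed: impute only where the response is missing
partial = np.mean(Z * h(X, Y) + (1 - Z) * chi_hat)

# fully imputed: impute every response
full = np.mean(chi_hat)

print(ipw, partial, full)    # all close to EY = 2 * EX = 4
```

All three are consistent here; the point of the paper is the efficiency comparison among them.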
The construction is as follows: in a first step we exploit the independence of covariates and errors and the structure of the regression model and write the conditional expectation $\chi(x) = \chi(x,\vartheta)$ as an unconditional expectation of the error distribution,
$$\chi(x,\vartheta) = E\{h(X,Y)\mid X=x\} = Eh\{x, r_\vartheta(x)+\varepsilon\} = Eh\{x, r_\vartheta(x)+Y-r_\vartheta(X)\}.$$
This representation suggests an empirical plug-in estimator based on the observed data, namely
$$\hat\chi(x,\hat\vartheta) = \sum_{j=1}^n Z_j\, h\{x, r_{\hat\vartheta}(x)+Y_j-r_{\hat\vartheta}(X_j)\} \Big/ \sum_{j=1}^n Z_j,$$
where $\hat\vartheta$ is an estimator of $\vartheta$. The corresponding fully imputed estimator is
$$\frac1n\sum_{i=1}^n \hat\chi(X_i,\hat\vartheta) = \frac1n\sum_{i=1}^n \frac{\sum_{j=1}^n Z_j\, h\{X_i, r_{\hat\vartheta}(X_i)+Y_j-r_{\hat\vartheta}(X_j)\}}{\sum_{j=1}^n Z_j}. \tag{1.1}$$
It is straightforward to check that $\hat\chi(x,\vartheta)$ is consistent for $Eh\{x,r_\vartheta(x)+\varepsilon\}$ [which yields consistency of $n^{-1}\sum_{i=1}^n\hat\chi(X_i,\hat\vartheta)$, with $\hat\vartheta$ consistent]; note that $\hat\chi(x,\vartheta)$ tends in probability to $E[Zh\{x,r_\vartheta(x)+\varepsilon\}]/EZ$ with $EZ = E\{E(Z\mid X)\} = E\pi(X)$. Now use the missing at random assumption and the independence of $X$ and $\varepsilon$ to rewrite the numerator,
$$E(E[Zh\{x,r_\vartheta(x)+\varepsilon\}\mid X]) = E(E(Z\mid X)\,E[h\{x,r_\vartheta(x)+\varepsilon\}\mid X]) = E[\pi(X)\,Eh\{x,r_\vartheta(x)+\varepsilon\}] = E\pi(X)\,Eh\{x,r_\vartheta(x)+\varepsilon\}.$$
The limit of $\hat\chi(x,\vartheta)$ is therefore $\chi(x,\vartheta) = Eh\{x,r_\vartheta(x)+\varepsilon\}$.

The estimator (1.1) is well thought out and consistent. However, it is not yet efficient, even if an efficient estimator for $\vartheta$ is used (which is relatively elaborate in the model considered here; see Section 5): we focus on the common situation where the errors have mean zero; this information must also be incorporated in order to obtain efficiency.

Motivated by Owen's empirical likelihood approach, we improve the above estimator by introducing weights which use the mean zero constraint on the error distribution.
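The plug-in construction (1.1) can be sketched directly; the function name and argument conventions below are mine, and a $\sqrt n$-consistent estimate of $\vartheta$ is assumed to be available.

```python
import numpy as np

def fully_imputed(X, Y, Z, r, theta_hat, h):
    """Unweighted fully imputed estimator (1.1): for every covariate X_i,
    chi_hat(X_i) = sum_j Z_j h(X_i, r(X_i) + Y_j - r(X_j)) / sum_j Z_j,
    then average over i. Only observed responses (Z_j = 1) contribute."""
    res = Y - r(X, theta_hat)    # residuals; entries with Z_j = 0 are masked
    m = Z.sum()
    total = 0.0
    for xi in X:
        total += np.sum(Z * h(xi, r(xi, theta_hat) + res)) / m
    return total / len(X)
```

For $h(x,y)=y$ the imputed value at $X_i$ reduces to $r_{\hat\vartheta}(X_i)$ plus the average observed residual, so the routine essentially averages the fitted regression values.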
However, and in contrast to the original approach, we cannot observe the errors and must use residuals. This clearly complicates the situation: since we have missing responses the residuals are partially incomplete and, moreover, they involve parameter estimates $\hat\vartheta$. Formally, we choose weights $\hat w_j$ based on residuals $\hat\varepsilon_j = Y_j - r_{\hat\vartheta}(X_j)$ such that $\sum_{j=1}^n \hat w_j Z_j\hat\varepsilon_j = 0$. (See Section 3 for more details.)

Our final estimator now is a weighted version of the above fully imputed estimator, namely
$$\frac1n\sum_{i=1}^n\hat\chi_w(X_i,\hat\vartheta) = \frac1n\sum_{i=1}^n \frac{\sum_{j=1}^n \hat w_j Z_j\, h\{X_i, r_{\hat\vartheta}(X_i)+Y_j-r_{\hat\vartheta}(X_j)\}}{\sum_{j=1}^n Z_j}. \tag{1.2}$$
The combination of full imputation methods (involving estimators of unconditional expectations of the error distribution) with empirical likelihood ideas provides a new methodology which has not appeared in the literature before. We show in this article that $n^{-1}\sum_{i=1}^n\hat\chi_w(X_i,\hat\vartheta)$ is efficient if an efficient estimator $\hat\vartheta$ for $\vartheta$ is used. The partially imputed estimator will in general not be efficient, even if $\hat\vartheta$ is efficient for $\vartheta$.

For estimation of the mean response, that is, if $h(X,Y)=Y$, which is of particular interest and typically considered in the literature, the estimator simplifies to the straightforward estimator $n^{-1}\sum_{i=1}^n r_{\hat\vartheta}(X_i)$. That the unweighted estimator (1.1) for $EY$ cannot be efficient is immediately apparent: consider the case where all responses are observed. Here (1.1) reduces to the empirical estimator $n^{-1}\sum_{i=1}^n Y_i$, which does not use the regression structure at all. It will be seen that its influence function is not the efficient one. (See Section 6 for details.)

Our efficiency results are based on the Hájek–Le Cam theory for locally asymptotically normal families. As a consequence, our proposed estimators have a limiting normal distribution with the asymptotic variance determined by the influence function.
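A self-contained sketch of the weighted estimator (1.2) follows. The bisection solver for the Lagrange multiplier is my own simplified stand-in for the computational methods Owen describes; the weight formula $\hat w_j = 1/(1+\hat\lambda Z_j\hat\varepsilon_j)$ is derived in Section 3.

```python
import numpy as np

def el_lambda(e, tol=1e-12):
    """Solve sum_j e_j / (1 + lam * e_j) = 0 for lam by bisection
    (e_j = Z_j * residual_j; returns 0 if no sign change, as in Section 3)."""
    if e.min() >= 0.0 or e.max() <= 0.0:
        return 0.0
    lo, hi = -1.0 / e.max() + 1e-10, -1.0 / e.min() - 1e-10
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if np.sum(e / (1.0 + mid * e)) > 0.0:   # constraint is decreasing in lam
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def weighted_fully_imputed(X, Y, Z, r, theta_hat, h):
    """Weighted fully imputed estimator (1.2) with empirical-likelihood-type
    weights w_j = 1/(1 + lam * Z_j * eps_j) enforcing sum_j w_j Z_j eps_j = 0."""
    res = Y - r(X, theta_hat)
    e = Z * res
    lam = el_lambda(e)
    w = 1.0 / (1.0 + lam * e)
    m = Z.sum()
    return np.mean([np.sum(w * Z * h(xi, r(xi, theta_hat) + res)) / m
                    for xi in X])
```

For $h(x,y)=y$ the constraint makes the correction term vanish, so the routine returns $n^{-1}\sum_i r_{\hat\vartheta}(X_i)$ up to rounding.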
It is therefore straightforward to construct asymptotic confidence intervals for $Eh(X,Y)$ (see Section 6.3).

In addition, estimators for smooth (continuously differentiable) transformations of expectations $Eh(X,Y)$ are also now available, with the variance of the response, $\operatorname{Var} Y = EY^2 - (EY)^2$, as an important example. Since efficiency is preserved by smooth transformations, plugging in efficient estimators yields an efficient estimator of the transformation. The transformation for $\operatorname{Var} Y$ in terms of the first two moments is $(EY, EY^2) \mapsto EY^2 - (EY)^2$. Plugging in $n^{-1}\sum_{i=1}^n r_{\hat\vartheta}(X_i)$ for $EY$ and the weighted fully imputed estimator for $EY^2$ (which is straightforward to compute and is also given in Section 6) gives an efficient estimator of the variance.

To our knowledge, our estimator (1.2) is the first efficient estimator for arbitrary linear functionals $Eh(X,Y)$ (including the mean functional $EY$) in the nonlinear regression model (including the linear regression model $Y = \vartheta^\top X + \varepsilon$) with independent centered errors when responses are missing at random. Matloff (1981) considers estimation of the mean $EY$ in a model related to ours, the (parametric) conditional mean model, $E(Y\mid X) = r_\vartheta(X)$, which can (but need not) also be written in the form $Y = r_\vartheta(X) + \varepsilon$ with conditionally centered errors, $E(\varepsilon\mid X) = 0$. He shows that the average of the estimated regression function values (with his estimator $\hat\vartheta$ of $\vartheta$) improves upon the partially imputed estimator. Wang and Rao (2001) consider linearly constrained covariates and develop an empirical likelihood approach for inference about the mean in linear regression (with independent errors) based on partial linear regression imputation. In Wang and Rao (2002) they present an empirical likelihood approach for inference about the mean response in nonparametric regression, based on partial kernel regression imputation as suggested by Cheng (1994).
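The plug-in construction for $\operatorname{Var} Y$ described above can be illustrated as follows. The simulation setup is an assumption of this sketch, not from the paper, and for simplicity the second moment is imputed with the unweighted estimator (1.1) rather than the weighted one.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative model (my own choice): Y = 2X + eps, responses missing
# completely at random, a special case of missing at random.
n = 2000
X = rng.uniform(1.0, 3.0, n)
Y = 2.0 * X + rng.normal(0.0, 1.0, n)
Z = (rng.uniform(size=n) < 0.7).astype(float)

theta_hat = np.sum(Z * X * Y) / np.sum(Z * X * X)   # complete-case LSE
res = Y - theta_hat * X                             # residuals (used where Z=1)
m = Z.sum()

m1 = np.mean(theta_hat * X)                         # estimator of EY
# fully imputed estimator of EY^2: for each X_i, average h = y^2 over the
# imputed responses r(X_i) + residual_j, j with Z_j = 1
m2 = np.mean([np.sum(Z * (theta_hat * xi + res) ** 2) / m for xi in X])

var_hat = m2 - m1 ** 2       # plug-in transformation (EY, EY^2) -> Var Y
print(var_hat)               # near Var Y = 4 Var X + 1 = 4/3 + 1
```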
A different empirical likelihood method for this setting is proposed by Qin and Zhang (2007). Wang (2004) assumes a parametric model for the conditional density of $Y$ given $X$, with constraints on the covariate distribution, and introduces a weighted partial imputation estimator for the mean, utilizing empirical likelihood techniques. Wang, Linton and Härdle (2004) consider a partially linear regression model for the conditional mean function and derive inference tools for the mean response based on a class of asymptotically equivalent (partially and fully imputed) estimators. A related article is Liang, Wang and Carroll (2007), who additionally assume that covariates are measured with error. Chen, Fan, Li and Zhou (2006) consider partially imputed estimators for the mean response in a quasi-likelihood setting. Maity, Ma and Carroll (2007) estimate expectations in semiparametric regression models, with and without missing responses. They consider a general regression function involving a parametric and a nonparametric part, thus covering the partly linear model, and assume that the likelihood function given the covariates is known.

For estimating expectations, little attention has been given to the fully imputed estimator. We anticipate that in many situations, in particular in models with structural assumptions, improved estimators can be obtained by using appropriate full imputation instead of partial imputation estimates. Inference for missing data has been studied by many authors, also recently. Chen and Wang (2009) study estimation of parameters which are defined by model constraints. They introduce an empirical likelihood approach involving estimating equations, where missing variables are replaced using a nonparametric imputation approach. Chen, Hong and Tarozzi (2008) consider parameter estimation as well.
They introduce efficient estimators for parameters in GMM models with missing data, and assume that the missingness can be explained by auxiliary variables. More references to recent literature can be found, for example, in Wang, Linton and Härdle (2004) and in the monograph by Tsiatis (2006). For an introduction, see Tsiatis (2006) and the books by Little and Rubin (2002) and Gelman et al. (1995).

This paper is organized as follows. In Section 2 we derive a stochastic expansion of the unweighted estimator. The expansion of the weighted estimator is given in Section 3, utilizing the results of Section 2. Section 4 characterizes efficient estimators of arbitrary functionals of the joint distribution and gives the efficient influence function of the functional $Eh(X,Y)$ in the nonlinear regression model. In Section 5 we characterize efficient estimators for the parameter vector $\vartheta$ and briefly sketch the construction of such an estimator. In this section we also show our main result, that the weighted estimator with an efficient estimator $\hat\vartheta$ for $\vartheta$ plugged in is efficient for $Eh(X,Y)$. Section 6 contains a short discussion of special cases such as estimation of the mean response. We also compare, using computer simulations, the efficient (weighted fully imputed) estimator with the other approaches, with convincing results. For these studies we considered a linear and a nonlinear regression function and estimation of two simple functionals, namely the response mean and second moment, for which the efficient (weighted fully imputed) estimator simplifies, and estimation of a more complicated expectation. We also briefly sketch the construction of confidence intervals.
2. Expansion of the unweighted estimator.
In this section we derive an expansion of the unweighted estimator $n^{-1}\sum_{i=1}^n\hat\chi(X_i,\hat\vartheta)$, which is a special case of the weighted estimator $n^{-1}\sum_{i=1}^n\hat\chi_w(X_i,\hat\vartheta)$ with all weights equal to one, $w_j = 1$. This can be regarded as a result of independent interest since the estimator (with an appropriate estimator $\hat\vartheta$) would be relevant for regression models where the errors cannot be assumed to have mean zero. Also, we will see in the next section that the weighted estimator can be written as the sum of the unweighted estimator and an additional correction term. Hence we can utilize the results later when we derive an expansion of the weighted estimator.

Throughout this paper we will assume that $Y$ is square integrable and that the error variance $E\varepsilon^2 = \sigma^2$ is nonzero and finite. We also suppose that the error distribution has a Lebesgue density $f$ and finite Fisher information, $E\ell^2(\varepsilon) < \infty$, where $\ell$ denotes the score function for location, $\ell(\varepsilon) = -f'(\varepsilon)/f(\varepsilon)$. The degenerate case that we (almost surely) never observe a response $Y$ is excluded by assuming $P(Z=1) = EZ > 0$. The following assumptions will also be required.
Assumption 1.
The regression function $\tau \mapsto r_\tau(x)$ is differentiable at $\tau = \vartheta$ with a $p$-dimensional square integrable gradient $\dot r_\vartheta(x)$ which satisfies the Lipschitz condition
$$|\dot r_\tau(x) - \dot r_\vartheta(x)| \le |\tau - \vartheta|\, a(x), \qquad a(X) \text{ square integrable}.$$

Later we will also need that the covariance matrix of an efficient parameter estimator $\hat\vartheta$ [which involves the covariance matrix of $\dot r_\vartheta(X)$ and the Fisher information] is invertible.

Now use a Taylor expansion to see that
$$\bigg|\sum_{i=1}^n \{r_\tau(X_i) - r_\vartheta(X_i) - \dot r_\vartheta(X_i)^\top(\tau-\vartheta)\}\bigg| = \bigg|\sum_{i=1}^n \int_0^1 \{\dot r_{\vartheta+u(\tau-\vartheta)}(X_i) - \dot r_\vartheta(X_i)\}^\top(\tau-\vartheta)\,du\bigg| \le |\tau-\vartheta| \sum_{i=1}^n \int_0^1 |\dot r_{\vartheta+u(\tau-\vartheta)}(X_i) - \dot r_\vartheta(X_i)|\,du \le |\tau-\vartheta|^2 \sum_{i=1}^n a(X_i).$$
Assumption 1 therefore guarantees that the function $\tau \mapsto r_\tau(X)$ is stochastically differentiable, that is, for each constant $C$,
$$\sup_{|\tau-\vartheta|\le Cn^{-1/2}} \bigg|\sum_{i=1}^n \{r_\tau(X_i) - r_\vartheta(X_i) - \dot r_\vartheta(X_i)^\top(\tau-\vartheta)\}\bigg| = o_p(1). \tag{2.1}$$

We will not need the first partial derivative of $h(x,y)$, $\partial/\partial x\, h(x,y)$. Therefore we will write $h'$ for the second partial derivative, $h'(x,y) = \partial_2 h(x,y) = \partial/\partial y\, h(x,y)$.

Assumption 2.
The function $h(x,y)$ is differentiable in $y$ with a square integrable partial derivative $h'(x,y) = \partial/\partial y\, h(x,y)$ which satisfies the Lipschitz condition
$$|h'(x,z) - h'(x,y)| \le |z-y|\, b(x,y), \qquad b(X,Y) \text{ square integrable}.$$

In the following $\bar Z$ will denote the average of the indicators $Z_i$, $\bar Z = n^{-1}\sum_{i=1}^n Z_i$. The next lemma gives the expansion of the estimator around the true parameter $\vartheta$.

Lemma 2.1.
Assume that Assumptions 1 and 2 hold and that $\hat\vartheta$ is a $\sqrt n$-consistent estimator of $\vartheta$. Then the unweighted estimator has the expansion
$$\frac1n\sum_{i=1}^n \hat\chi(X_i,\hat\vartheta) = \frac1n\sum_{i=1}^n \hat\chi(X_i,\vartheta) + D^\top(\hat\vartheta-\vartheta) + o_p(n^{-1/2}) \tag{2.2}$$
with $D = E(h(X,Y)[\dot r_\vartheta(X) - E\{\dot r_\vartheta(X)\mid Z=1\}]\,\ell(\varepsilon))$.

Proof.
For reasons of clarity we introduce the notation $f_{ij}(\vartheta) = h\{X_i, r_\vartheta(X_i) + Y_j - r_\vartheta(X_j)\}$ and write $\dot f_{ij}$ for the gradient. Then
$$\frac1n\sum_{i=1}^n\hat\chi(X_i,\hat\vartheta) = \frac{1}{\bar Z n^2}\sum_{i=1}^n\sum_{j=1}^n Z_j\, h\{X_i, r_{\hat\vartheta}(X_i) + Y_j - r_{\hat\vartheta}(X_j)\}$$
$$= \frac{1}{\bar Z n^2}\sum_{i=1}^n\bigg[\sum_{j\ne i} Z_j\, h\{X_i, r_{\hat\vartheta}(X_i) + Y_j - r_{\hat\vartheta}(X_j)\} + Z_i h(X_i, Y_i)\bigg] \tag{2.3}$$
$$= \frac{1}{\bar Z n^2}\sum_{i=1}^n\bigg\{\sum_{j\ne i} Z_j f_{ij}(\vartheta) + Z_i h(X_i, Y_i)\bigg\} + \frac{1}{\bar Z n^2}\sum_{i=1}^n\sum_{j\ne i} Z_j\{f_{ij}(\hat\vartheta) - f_{ij}(\vartheta)\}$$
$$= \frac1n\sum_{i=1}^n\hat\chi(X_i,\vartheta) + \frac{1}{\bar Z n^2}\sum_{i=1}^n\sum_{j\ne i} Z_j\{f_{ij}(\hat\vartheta) - f_{ij}(\vartheta)\}.$$
Below we will show that
$$\frac{1}{\bar Z n^2}\sum_{i=1}^n\sum_{j\ne i} Z_j\{f_{ij}(\hat\vartheta) - f_{ij}(\vartheta)\} = D^\top(\hat\vartheta-\vartheta) + o_p(n^{-1/2}) \tag{2.4}$$
with $D = (EZ)^{-1}E[Z_2\, h'\{X_1, r_\vartheta(X_1)+Y_2-r_\vartheta(X_2)\}\{\dot r_\vartheta(X_1) - \dot r_\vartheta(X_2)\}]$. That this $D$ is indeed of the form given in the lemma can be seen as follows. Consider
$$D = E[h'\{X_1, r_\vartheta(X_1)+\varepsilon_2\}\,\dot r_\vartheta(X_1)] - (EZ)^{-1}E[Z_2\, h'\{X_1, r_\vartheta(X_1)+\varepsilon_2\}\,\dot r_\vartheta(X_2)].$$
The first term can be written $E(E[h'\{X_1, r_\vartheta(X_1)+\varepsilon_2\}\mid X_1]\,\dot r_\vartheta(X_1))$. Integration by parts of the inner integral gives $E[h'\{X_1, r_\vartheta(X_1)+\varepsilon_2\}\mid X_1] = E[h\{X_1, r_\vartheta(X_1)+\varepsilon_2\}\,\ell(\varepsilon_2)\mid X_1]$. The second term is $E[h'\{X_1, r_\vartheta(X_1)+\varepsilon_2\}]\, E\{\dot r_\vartheta(X)\mid Z=1\}$. We proceed analogously and, in conclusion, obtain
$$D = E(h(X,Y)[\dot r_\vartheta(X) - E\{\dot r_\vartheta(X)\mid Z=1\}]\,\ell(\varepsilon)). \tag{2.5}$$
The result now follows from (2.3), (2.4) and (2.5). It remains to verify (2.4). The proof consists of two parts,
$$\frac{1}{\bar Z n^2}\sum_{i=1}^n\sum_{j\ne i} Z_j\{f_{ij}(\hat\vartheta) - f_{ij}(\vartheta) - \dot f_{ij}(\vartheta)^\top(\hat\vartheta-\vartheta)\} = o_p(n^{-1/2}), \tag{2.6}$$
$$\frac{1}{\bar Z n^2}\sum_{i=1}^n\sum_{j\ne i} Z_j\, \dot f_{ij}(\vartheta)^\top(\hat\vartheta-\vartheta) = D^\top(\hat\vartheta-\vartheta) + o_p(n^{-1/2}). \tag{2.7}$$
Statement (2.7) can be quickly proved: since $\hat\vartheta$ is $\sqrt n$-consistent we can replace the gradient by its expectation,
$$\frac{1}{\bar Z n^2}\sum_{i=1}^n\sum_{j\ne i} Z_j\,\dot f_{ij}(\vartheta)^\top(\hat\vartheta-\vartheta) = \frac{1}{\bar Z n^2}\sum_{i=1}^n\sum_{j\ne i} E\{Z_j\,\dot f_{ij}(\vartheta)\}^\top(\hat\vartheta-\vartheta) + o_p(n^{-1/2}) = \frac{1}{EZ}E\{Z_2\,\dot f_{12}(\vartheta)\}^\top(\hat\vartheta-\vartheta) + o_p(n^{-1/2})$$
with $(EZ)^{-1}E\{Z_2\,\dot f_{12}(\vartheta)\} = D$ as given in (2.4). For the proof of (2.6) it suffices to show that
$$\sum_{i=1}^n\bigg|\frac{1}{\sqrt n}\sum_{j\ne i} Z_j\{f_{ij}(\hat\vartheta) - f_{ij}(\vartheta) - \dot f_{ij}(\vartheta)^\top(\hat\vartheta-\vartheta)\}\bigg| = O_p(1).
This holds by the following arguments. Rewrite the above expression and apply the Cauchy–Schwarz inequality to obtain
$$\sum_{i=1}^n\bigg|\frac{1}{\sqrt n}\sum_{j\ne i} Z_j \int_0^1 [\dot f_{ij}\{\vartheta + u(\hat\vartheta-\vartheta)\} - \dot f_{ij}(\vartheta)]^\top(\hat\vartheta-\vartheta)\,du\bigg| \le \frac{1}{\sqrt n}\sum_{i=1}^n\sum_{j\ne i} Z_j\,|\hat\vartheta-\vartheta| \int_0^1 |\dot f_{ij}\{\vartheta + u(\hat\vartheta-\vartheta)\} - \dot f_{ij}(\vartheta)|\,du.$$
The difference $|\dot f_{ij}\{\vartheta + u(\hat\vartheta-\vartheta)\} - \dot f_{ij}(\vartheta)|$ is bounded by $|\hat\vartheta-\vartheta|$ times a square integrable function $A_{ij}$. This holds due to Assumptions 1 and 2, namely the Lipschitz conditions on $\dot r_\vartheta$ and $h'$, and since $a(X)$, $b(X,Y)$, $\dot r_\vartheta(X)$ and $h'(X,Y)$ are square integrable. Summing up, the expression is bounded by $n^{-1/2}|\hat\vartheta-\vartheta|^2\sum_{i=1}^n\sum_{j\ne i} A_{ij}$, which is stochastically bounded since $\hat\vartheta$ is $\sqrt n$-consistent. □

We will now replace the estimated conditional expectation $\hat\chi$ on the right-hand side of (2.2) by the true one. Set
$$S = \frac{1}{n(n-1)}\sum_{i=1}^n\sum_{j\ne i} \frac{Z_j}{EZ}\, h\{X_i, r_\vartheta(X_i)+Y_j-r_\vartheta(X_j)\}.$$
We have
$$\frac1n\sum_{i=1}^n\hat\chi(X_i,\vartheta) = \frac{EZ}{\bar Z}S + O_p(n^{-1}) = S - \frac{\bar Z - EZ}{EZ}\,ES + o_p(n^{-1/2})$$
and, by the Hoeffding decomposition,
$$S = ES + \frac1n\sum_{i=1}^n\{\chi(X_i,\vartheta) - ES\} + \frac1n\sum_{i=1}^n\bigg\{\frac{Z_i\,\bar h(\varepsilon_i)}{EZ} - ES\bigg\} + o_p(n^{-1/2})$$
with $\bar h(\varepsilon) = E\{h(X,Y)\mid\varepsilon\}$, $ES = Eh(X,Y) = E\bar h(\varepsilon)$. Combining the above yields
$$\frac1n\sum_{i=1}^n\hat\chi(X_i,\vartheta) = \frac1n\sum_{i=1}^n\chi(X_i,\vartheta) + \frac1n\sum_{i=1}^n\frac{Z_i}{EZ}\{\bar h(\varepsilon_i) - E\bar h(\varepsilon)\} + o_p(n^{-1/2}).$$
This and Lemma 2.1 give our expansion for the unweighted estimator, which we formulate as a corollary.
Corollary 2.2.
Assume that Assumptions 1 and 2 hold and that $\hat\vartheta$ is a $\sqrt n$-consistent estimator of $\vartheta$. Then, with $D = E(h(X,Y)[\dot r_\vartheta(X) - E\{\dot r_\vartheta(X)\mid Z=1\}]\,\ell(\varepsilon))$ and $\bar h(\varepsilon) = E\{h(X,Y)\mid\varepsilon\}$, the unweighted estimator has the expansion
$$\frac1n\sum_{i=1}^n\hat\chi(X_i,\hat\vartheta) = \frac1n\sum_{i=1}^n\bigg[\chi(X_i,\vartheta) + \frac{Z_i}{EZ}\{\bar h(\varepsilon_i) - E\bar h(\varepsilon)\}\bigg] + D^\top(\hat\vartheta-\vartheta) + o_p(n^{-1/2}).$$
3. Expansion of the weighted estimator.
In this section we study the weighted estimator, which uses residual-based weights $\hat w_j$ constructed by adapting empirical likelihood techniques. The approach is to maximize $\prod_{j=1}^n \hat w_j$ subject to the mean zero constraint on the error distribution, $\sum_{j=1}^n \hat w_j Z_j\hat\varepsilon_j = 0$, with $\hat w_j \ge 0$ and $\sum_{j=1}^n \hat w_j = n$. The weights solving this optimization problem are given by $\hat w_j = 1/(1+\hat\lambda Z_j\hat\varepsilon_j)$, where $\hat\lambda$ denotes the Lagrange multiplier, provided $\hat\lambda$ exists. As shown by Owen (1988, 2001), this is the case if not all residuals have the same sign, that is, on the event $\min_{1\le j\le n}\hat\varepsilon_j < 0 < \max_{1\le j\le n}\hat\varepsilon_j$, which has probability tending to one since the residuals $\hat\varepsilon_j$ are uniformly close to the centered errors $\varepsilon_j$ [see (A.1) in the Appendix]. If $\hat\lambda$ does not exist, we set $\hat\lambda = 0$. Note that the weights equal one if $Z_j = 0$ or $\hat\lambda = 0$. For computational issues we refer to Section 2.9 of Owen's book (2001).

The formula for the weights can be written as an identity, $\hat w_j = 1 - \hat\lambda\hat w_j Z_j\hat\varepsilon_j$. This enables us to decompose the estimator into the unweighted estimator and an additional correction term,
$$\frac1n\sum_{i=1}^n\hat\chi_w(X_i,\hat\vartheta) = \frac1n\sum_{i=1}^n\hat\chi(X_i,\hat\vartheta) - \frac{\hat\lambda}{n^2}\sum_{i=1}^n\sum_{j=1}^n \hat w_j\hat\varepsilon_j\,\frac{Z_j}{\bar Z}\, h\{X_i, r_{\hat\vartheta}(X_i)+\hat\varepsilon_j\}. \tag{3.1}$$
Since we have already derived an expansion of the unweighted estimator (see Corollary 2.2) we only need to study the second term on the right-hand side. In Lemma 3.1 we will derive an expansion of the estimated Lagrange multiplier $\hat\lambda$ and use this result in Lemma 3.2, where we determine an approximation of the extra term. For the proof of Lemma 3.1 we proceed analogously to Owen (2001), pages 219–221 [compare also Müller, Schick and Wefelmeyer (2005)].
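The two identities behind this decomposition, $\sum_j \hat w_j Z_j\hat\varepsilon_j = 0$ at the solved multiplier and $\sum_j \hat w_j = n$ (the latter follows from $\hat w_j = 1 - \hat\lambda\hat w_j Z_j\hat\varepsilon_j$), can be verified numerically. The bisection solver and the stand-in residuals below are my own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500
eps_hat = rng.normal(0.0, 1.0, n)             # stand-in residuals
Z = (rng.uniform(size=n) < 0.6).astype(float)
e = Z * eps_hat                               # Z_j * eps_hat_j

# bisection for the Lagrange multiplier on the interval where 1 + lam*e > 0
lo, hi = -1.0 / e.max() + 1e-10, -1.0 / e.min() - 1e-10
for _ in range(200):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if np.sum(e / (1.0 + mid * e)) > 0.0 else (lo, mid)
lam = 0.5 * (lo + hi)
w = 1.0 / (1.0 + lam * e)                     # weights; w_j = 1 where Z_j = 0

print(np.sum(w * e), w.sum() - n)             # both approximately zero
```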
This requires some auxiliary results, which are proved in the Appendix, namely
$$\max_{1\le i\le n}|Z_i\hat\varepsilon_i| = o_p(n^{1/2}), \tag{3.2}$$
$$\frac1n\sum_{i=1}^n Z_i\hat\varepsilon_i = \frac1n\sum_{i=1}^n Z_i\varepsilon_i - EZ\,E\{\dot r_\vartheta(X)\mid Z=1\}^\top(\hat\vartheta-\vartheta) + o_p(n^{-1/2}) = O_p(n^{-1/2}), \tag{3.3}$$
$$\frac1n\sum_{i=1}^n Z_i\hat\varepsilon_i^2 = \frac1n\sum_{i=1}^n Z_i\varepsilon_i^2 + o_p(1) = EZ\,\sigma^2 + o_p(1), \tag{3.4}$$
where $\hat\vartheta$ is a $\sqrt n$-consistent estimator of $\vartheta$ and $\sigma^2 > 0$.

Lemma 3.1.
Suppose that Assumption 1 is satisfied and let $\hat\vartheta$ be a $\sqrt n$-consistent estimator of $\vartheta$. Then $\max_{1\le j\le n}|\hat w_j - 1| = o_p(1)$ and
$$\hat\lambda = \frac{1}{\sigma^2}\,\frac1n\sum_{j=1}^n \frac{Z_j}{EZ}\varepsilon_j - \frac{1}{\sigma^2}E\{\dot r_\vartheta(X)\mid Z=1\}^\top(\hat\vartheta-\vartheta) + o_p(n^{-1/2}) = O_p(n^{-1/2}). \tag{3.5}$$

Proof.
We first derive the order of $\hat\lambda$. Recall that $\hat w_j = 1/(1+\hat\lambda Z_j\hat\varepsilon_j)$, that $\hat w_j + \hat\lambda\hat w_j Z_j\hat\varepsilon_j = 1$ and that $\sum_{j=1}^n \hat w_j Z_j\hat\varepsilon_j = 0$ by construction. Also note that the $Z_j$'s are binary and that therefore $Z_j^2 = Z_j$. This allows us to write
$$\frac1n\sum_{j=1}^n Z_j\hat\varepsilon_j = \frac1n\sum_{j=1}^n (\hat w_j + \hat\lambda\hat w_j Z_j\hat\varepsilon_j)Z_j\hat\varepsilon_j = \frac{\hat\lambda}{n}\sum_{j=1}^n \hat w_j Z_j\hat\varepsilon_j^2 = \frac{\hat\lambda}{n}\sum_{j=1}^n \frac{Z_j\hat\varepsilon_j^2}{1+\hat\lambda Z_j\hat\varepsilon_j}. \tag{3.6}$$
Note that $1 + \hat\lambda Z_j\hat\varepsilon_j > 0$ and
$$|\hat\lambda|\,\frac1n\sum_{j=1}^n Z_j\hat\varepsilon_j^2 = |\hat\lambda|\,\frac1n\sum_{j=1}^n \frac{Z_j\hat\varepsilon_j^2}{1+\hat\lambda Z_j\hat\varepsilon_j}(1+\hat\lambda Z_j\hat\varepsilon_j) \le |\hat\lambda|\,\frac1n\sum_{j=1}^n \frac{Z_j\hat\varepsilon_j^2}{1+\hat\lambda Z_j\hat\varepsilon_j}\Big(1+|\hat\lambda|\max_{1\le j\le n}|Z_j\hat\varepsilon_j|\Big) = \bigg|\frac1n\sum_{j=1}^n Z_j\hat\varepsilon_j\bigg|\Big(1+|\hat\lambda|\max_{1\le j\le n}|Z_j\hat\varepsilon_j|\Big).$$
The last equality holds due to (3.6). Applying (3.2), (3.3) and (3.4) to the first and last terms of the inequality we obtain $|\hat\lambda|\cdot O_p(1) = O_p(n^{-1/2}) + |\hat\lambda|\,o_p(1)$, which implies $\hat\lambda = O_p(n^{-1/2})$. This and (3.2) give $\max_{1\le j\le n}|\hat\lambda Z_j\hat\varepsilon_j| = o_p(1)$ and therefore our first statement,
$$\max_{1\le j\le n}|\hat w_j - 1| = \max_{1\le j\le n}\bigg|\frac{\hat\lambda Z_j\hat\varepsilon_j}{1+\hat\lambda Z_j\hat\varepsilon_j}\bigg| = o_p(1).$$
We now again make use of (3.6) and write
$$\frac1n\sum_{j=1}^n Z_j\hat\varepsilon_j = \hat\lambda\bigg\{\frac1n\sum_{j=1}^n (\hat w_j - 1)Z_j\hat\varepsilon_j^2 + \frac1n\sum_{j=1}^n Z_j\hat\varepsilon_j^2\bigg\} = \hat\lambda\,\frac1n\sum_{j=1}^n Z_j\hat\varepsilon_j^2 + o_p(n^{-1/2}).$$
For the last statement we utilized (3.4), $\max_{1\le j\le n}|\hat w_j - 1| = o_p(1)$ and $\hat\lambda = O_p(n^{-1/2})$. This and (3.4) give
$$\hat\lambda = \frac{\sum_{j=1}^n Z_j\hat\varepsilon_j}{\sum_{j=1}^n Z_j\hat\varepsilon_j^2} + o_p(n^{-1/2}) = \frac{1}{EZ\,\sigma^2}\,\frac1n\sum_{j=1}^n Z_j\hat\varepsilon_j + o_p(n^{-1/2}).$$
Inserting approximation (3.3) for $n^{-1}\sum_{j=1}^n Z_j\hat\varepsilon_j$ finally yields the desired approximation of $\hat\lambda$. □

Lemma 3.2.
Suppose that Assumptions 1 and 2 are satisfied and let $\hat\vartheta$ be a $\sqrt n$-consistent estimator of $\vartheta$. Then, with $\bar h(\varepsilon) = E\{h(X,Y)\mid\varepsilon\}$,
$$\frac{\hat\lambda}{n^2}\sum_{i=1}^n\sum_{j=1}^n \hat w_j\hat\varepsilon_j\,\frac{Z_j}{\bar Z}\, h\{X_i, r_{\hat\vartheta}(X_i)+\hat\varepsilon_j\} = \frac{1}{\sigma^2}\,\frac1n\sum_{i=1}^n \frac{Z_i}{EZ}\varepsilon_i\, E\{\varepsilon\bar h(\varepsilon)\} - \frac{1}{\sigma^2}E\{\varepsilon\bar h(\varepsilon)\}\, E\{\dot r_\vartheta(X)\mid Z=1\}^\top(\hat\vartheta-\vartheta) + o_p(n^{-1/2}).$$

Proof.
Since $\hat\lambda = O_p(n^{-1/2})$ and $\max_{1\le j\le n}|\hat w_j - 1| = o_p(1)$ by the previous lemma, and since $\max_{1\le i\le n}|Z_i\hat\varepsilon_i| = o_p(n^{1/2})$ by (3.2), it is clear that the terms of the sum with $j=i$, that is, $h\{X_i, r_{\hat\vartheta}(X_i)+\hat\varepsilon_i\} = h(X_i, Y_i)$, can be ignored. It therefore suffices to prove the statement for
$$\frac{\hat\lambda}{n^2}\sum_i\sum_{j\ne i} \hat w_j\hat\varepsilon_j\,\frac{Z_j}{\bar Z}\, h\{X_i, r_{\hat\vartheta}(X_i)+\hat\varepsilon_j\} = \hat\lambda\,\frac{EZ}{\bar Z}\,\psi_w(\hat\vartheta)$$
with
$$\psi_w(\hat\vartheta) = \frac{1}{n^2}\sum_i\sum_{j\ne i} \hat w_j\hat\varepsilon_j\,\frac{Z_j}{EZ}\, h\{X_i, r_{\hat\vartheta}(X_i)+\hat\varepsilon_j\} = \psi(\hat\vartheta) + \frac{1}{n^2}\sum_i\sum_{j\ne i} (\hat w_j - 1)\hat\varepsilon_j\,\frac{Z_j}{EZ}\, h\{X_i, r_{\hat\vartheta}(X_i)+\hat\varepsilon_j\},$$
where $\psi$ is $\psi_w$ with $\hat w_j = 1$. The second part, involving the difference $\hat w_j - 1$, is $o_p(n^{-1/2})$, which can be seen as follows: using $\hat\lambda = O_p(n^{-1/2})$ and $\max_{1\le j\le n}|\hat w_j - 1| = o_p(1)$ we obtain
$$\bigg|\hat\lambda\,\frac{EZ}{\bar Z}\,\frac{1}{n^2}\sum_i\sum_{j\ne i} (\hat w_j - 1)\hat\varepsilon_j\,\frac{Z_j}{EZ}\, h\{X_i, r_{\hat\vartheta}(X_i)+\hat\varepsilon_j\}\bigg| \le \frac{|\hat\lambda|}{\bar Z}\max_{1\le j\le n}|\hat w_j - 1|\,\frac{1}{n^2}\sum_i\sum_{j\ne i} |\hat\varepsilon_j\, h\{X_i, r_{\hat\vartheta}(X_i)+\hat\varepsilon_j\}| = o_p(n^{-1/2})\cdot\frac{1}{n^2}\sum_i\sum_{j\ne i} |\hat\varepsilon_j\, h\{X_i, r_{\hat\vartheta}(X_i)+\hat\varepsilon_j\}|.$$
This gives the claimed rate $o_p(n^{-1/2})$ since the sum is bounded in probability, which follows from the $\sqrt n$-consistency of $\hat\vartheta$ and from Assumptions 1 and 2 applied to the terms of the product $\{Y_2 - r_\tau(X_2)\}\, h\{X_1, r_\tau(X_1)+Y_2-r_\tau(X_2)\}$.

It remains to consider $\hat\lambda\,(EZ/\bar Z)\,\psi(\hat\vartheta)$. Using $\hat\lambda = O_p(n^{-1/2})$ we can replace $\psi(\hat\vartheta)$ by $\psi(\vartheta)$ since $\psi(\hat\vartheta) - \psi(\vartheta) = o_p(1)$, which again follows from Assumptions 1 and 2 and the consistency of $\hat\vartheta$. Further, by the law of large numbers, $EZ/\bar Z = 1 + o_p(1)$ and $\psi(\vartheta) - E\psi(\vartheta) = o_p(1)$. These arguments yield
$$\hat\lambda\,\frac{EZ}{\bar Z}\,\psi(\hat\vartheta) = \hat\lambda\, E\psi(\vartheta) + o_p(n^{-1/2}).$$
The expected value of $\psi(\vartheta)$ is
$$\frac{n-1}{n}\, E\bigg[\varepsilon_2\,\frac{Z_2}{EZ}\, h\{X_1, r_\vartheta(X_1)+\varepsilon_2\}\bigg] = \frac{n-1}{n}\, E\{\varepsilon h(X,Y)\} = \frac{n-1}{n}\, E\{\varepsilon\bar h(\varepsilon)\}.$$
Summing up,
$$\hat\lambda\,\frac{EZ}{\bar Z}\,\psi_w(\hat\vartheta) = \hat\lambda\, E\psi(\vartheta) + o_p(n^{-1/2}) = \hat\lambda\, E\{\varepsilon\bar h(\varepsilon)\} + o_p(n^{-1/2}).$$
Inserting expansion (3.5) for $\hat\lambda$ into the above completes the proof. □

Combining the previous lemma and the approximation of the unweighted estimator from Section 2 gives an expansion for the weighted estimator.
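The first-order approximation $\hat\lambda = \sum_j Z_j\hat\varepsilon_j / \sum_j Z_j\hat\varepsilon_j^2 + o_p(n^{-1/2})$ obtained in the proof of Lemma 3.1 can be checked numerically. The stand-in residuals and the bisection solver below are my own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 2000
eps_hat = rng.normal(0.0, 1.0, n)              # stand-in residuals
Z = (rng.uniform(size=n) < 0.5).astype(float)
e = Z * eps_hat

def g(lam):                                    # constraint sum e/(1 + lam*e)
    return np.sum(e / (1.0 + lam * e))

# exact multiplier by bisection on the admissible interval
lo, hi = -1.0 / e.max() + 1e-10, -1.0 / e.min() - 1e-10
for _ in range(200):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if g(mid) > 0.0 else (lo, mid)
lam_exact = 0.5 * (lo + hi)

lam_approx = np.sum(e) / np.sum(e ** 2)        # first-order approximation

print(lam_exact, lam_approx)                   # agree up to higher-order terms
```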
Theorem 3.3.
Suppose that Assumptions 1 and 2 are satisfied and that $\hat\vartheta$ is a $\sqrt n$-consistent estimator of $\vartheta$. Let $\bar h(\varepsilon) = E\{h(X,Y)\mid\varepsilon\}$. Then
$$\frac1n\sum_{i=1}^n\hat\chi_w(X_i,\hat\vartheta) = \frac1n\sum_{i=1}^n\bigg(\chi(X_i,\vartheta) + \frac{Z_i}{EZ}\bigg[\bar h(\varepsilon_i) - E\bar h(\varepsilon) - \frac{E\{\varepsilon\bar h(\varepsilon)\}}{\sigma^2}\,\varepsilon_i\bigg]\bigg) + D_w^\top(\hat\vartheta-\vartheta) + o_p(n^{-1/2}),$$
where
$$D_w = E(h(X,Y)[\dot r_\vartheta(X) - E\{\dot r_\vartheta(X)\mid Z=1\}]\,\ell(\varepsilon)) + \sigma^{-2}E\{\varepsilon\bar h(\varepsilon)\}\, E\{\dot r_\vartheta(X)\mid Z=1\}.$$

Proof.
Consider the two terms of representation (3.1) and replace them by their approximations given in Corollary 2.2 and Lemma 3.2. This yields
$$\frac1n\sum_{i=1}^n\hat\chi_w(X_i,\hat\vartheta) = \frac1n\sum_{i=1}^n\bigg(\chi(X_i,\vartheta) + \frac{Z_i}{EZ}\bigg[\bar h(\varepsilon_i) - E\bar h(\varepsilon) - \frac{E\{\varepsilon\bar h(\varepsilon)\}}{\sigma^2}\,\varepsilon_i\bigg]\bigg) + \bigg[D^\top + \frac{1}{\sigma^2}E\{\varepsilon\bar h(\varepsilon)\}\, E\{\dot r_\vartheta(X)\mid Z=1\}^\top\bigg](\hat\vartheta-\vartheta) + o_p(n^{-1/2})$$
with $D + \sigma^{-2}E\{\varepsilon\bar h(\varepsilon)\}E\{\dot r_\vartheta(X)\mid Z=1\} = D_w$, by definition of $D$ (see Corollary 2.2). Inserting this into the above gives the desired representation. □
4. Efficiency.
We are interested in efficient estimation of $Eh(X,Y)$ based on observations $(X, ZY, Z)$. Our estimator requires an efficient estimator of $\vartheta$. In this section we determine the influence function of an efficient estimator of $Eh(X,Y)$. In the next section, where the influence function of an efficient estimator $\hat\vartheta$ of $\vartheta$ is determined, this allows us to show that the fully imputed estimator with an efficient $\hat\vartheta$ plugged in is efficient. Throughout we will suppose that the assumptions made earlier are satisfied.

We first calculate the efficient influence function for estimating an arbitrary functional $\kappa$ of the joint distribution $P(dx,dy,dz)$. The joint distribution depends on the marginal distribution $G(dx)$ of $X$, the conditional probability $\pi(x)$ of $Z=1$ given $X=x$, and the conditional distribution $Q(x,dy)$ of $Y$ given $X=x$,
$$P(dx,dy,dz) = G(dx)\, B_{\pi(x)}(dz)\, \{zQ(x,dy) + (1-z)\delta_0(dy)\}.$$
Here $B_p = p\delta_1 + (1-p)\delta_0$ denotes the Bernoulli distribution with parameter $p$ and $\delta_t$ the Dirac measure at $t$. In a first step we consider a nonparametric model for $P$, that is, we allow for arbitrary models for $G$, $Q$ and $\pi$. For this general setting a characterization of efficient estimators of $\kappa(G,Q,\pi)$ is in Müller, Schick and Wefelmeyer (2006), Section 2. In the following we
summarize their key arguments and apply them to the special case of nonlinear regression (which is not considered in that article). We then calculate the efficient influence functions for estimating $Eh(X,Y)$ in the nonlinear regression model and, in the next section, for estimating $\vartheta$.

For the characterization of efficient estimators it is essential to first introduce the notion of tangent spaces. The tangent space of a model is the set of possible perturbations of $P$ within the model. An estimator of a certain functional is, roughly speaking, efficient if its influence function equals the so-called canonical gradient of the functional, which is an element of the tangent space. Hence, in order to characterize the efficient influence function, we first need to determine the tangent space.

Consider (Hellinger differentiable) perturbations of $G$, $Q$ and $\pi$,
$$G_{nu}(dx) \doteq G(dx)\{1 + n^{-1/2}u(x)\},$$
$$Q_{nv}(x,dy) \doteq Q(x,dy)\{1 + n^{-1/2}v(x,y)\},$$
$$B_{\pi_{nw}(x)}(dz) \doteq B_{\pi(x)}(dz)[1 + n^{-1/2}\{z-\pi(x)\}w(x)].$$
To guarantee that the perturbed distributions are probability distributions requires that the (Hellinger) derivative $u$ belongs to
$$L_{2,0}(G) = \bigg\{u \in L_2(G): \int u\,dG = 0\bigg\},$$
that $v$ belongs to
$$V = \bigg\{v \in L_2(M): \int v(x,y)\, Q(x,dy) = 0\bigg\}$$
with $M(dx,dy) = Q(x,dy)G(dx)$, and that $w$ belongs to $L_2(G_\pi)$, where $G_\pi(dx) = \pi(x)\{1-\pi(x)\}G(dx)$. The perturbed joint distribution $P_{nuvw}$ then has derivative $t_{uvw}(x,zy,z) = u(x) + zv(x,y) + \{z-\pi(x)\}w(x)$. Note that models for $G$, $Q$ and $\pi$ will result in further restrictions on the perturbations, which must satisfy the model assumptions. Then $u$, $v$ and $w$ must be restricted to subspaces $U$ of $L_{2,0}(G)$, $V_0$ of $V$ and $W$ of $L_2(G_\pi)$.

In this article we make no model assumptions on $G$ and $\pi$ and thus have $U = L_{2,0}(G)$ and $W = L_2(G_\pi)$.
Since we are considering nonlinear regression we do, however, have a model for the conditional distribution, namely Q(x, dy) = f{y − r_ϑ(x)} dy, with f denoting the (mean zero) density of the error distribution. Perturbations v of Q must therefore satisfy ∫ v(x, y) f{y − r_ϑ(x)} dy = 0. In order to derive an explicit form of V, we introduce perturbations s and t of the two parameters f and ϑ. Write F for the distribution function of f, and remember that we assume that f has finite Fisher information for location, Eℓ²(ε) < ∞, where ℓ = −f′/f is the score function. The perturbed distribution Q now depends on s and t,

    Q_{nv}(x, dy) = Q_{nst}(x, dy) = f_{ns}{y − r_{ϑ_{nt}}(x)} dy

with ϑ_{nt} = ϑ + n^{-1/2} t, t ∈ R^p, f_{ns}(y) = f(y){1 + n^{-1/2} s(y)} and s ∈ S, where

    S = {s ∈ L_2(F) : ∫ s(y) f(y) dy = 0, ∫ y s(y) f(y) dy = 0}.

Note that the space S is determined by two constraints: the perturbed error density f_{ns} must integrate to one, ∫ f_{ns}(y) dy = 1, and must be centered at zero, ∫ y f_{ns}(y) dy = 0. As in Schick (1993), Section 3, we have

    f_{ns}{y − r_{ϑ_{nt}}(x)}
      = f{y − r_{ϑ_{nt}}(x)}[1 + n^{-1/2} s{y − r_{ϑ_{nt}}(x)}]
      ≐ [f{y − r_ϑ(x)} − n^{-1/2} f′{y − r_ϑ(x)} ṙ_ϑ(x)^⊤ t][1 + n^{-1/2} s{y − r_ϑ(x)}]
      ≐ f{y − r_ϑ(x)} (1 + n^{-1/2} [s{y − r_ϑ(x)} − (f′/f){y − r_ϑ(x)} ṙ_ϑ(x)^⊤ t])
      = f{y − r_ϑ(x)} (1 + n^{-1/2} [s{y − r_ϑ(x)} + ℓ{y − r_ϑ(x)} ṙ_ϑ(x)^⊤ t]).

Therefore

    Q_{nst}(x, dy) ≐ f{y − r_ϑ(x)} dy × (1 + n^{-1/2} [s{y − r_ϑ(x)} + ℓ{y − r_ϑ(x)} ṙ_ϑ(x)^⊤ t])

and the subspace V of 𝒱 is

    V = {v(x, y) = s{y − r_ϑ(x)} + ℓ{y − r_ϑ(x)} ṙ_ϑ(x)^⊤ t : s ∈ S, t ∈ R^p}.    (4.1)

We now briefly review some definitions.
We will do this for arbitrary subspaces U, V and W of L_{2,0}(G), 𝒱 and L_2(G_π), and then return to our specific situation.

Let T denote the tangent space consisting of all derivatives t_{uvw}. A functional κ of G, Q and π is called differentiable with gradient g ∈ L_2(P) if, for all u ∈ U, v ∈ V and w ∈ W,

    n^{1/2} {κ(G_{nu}, Q_{nv}, π_{nw}) − κ(G, Q, π)}
      → E{g(X, ZY, Z) t_{uvw}(X, ZY, Z)}.    (4.2)

The (unique) canonical gradient g* = g*(X, ZY, Z) is the projection of g(X, ZY, Z) onto the tangent space T. It is easy to check that T can be written as an orthogonal sum of three subspaces,

    T = {u(X) : u ∈ U} ⊕ {Z v(X, Y) : v ∈ V} ⊕ {{Z − π(X)} w(X) : w ∈ W}.

The random variable g*(X, ZY, Z) is therefore the sum u*(X) + Z v*(X, Y) + {Z − π(X)} w*(X), where u*(X), Z v*(X, Y) and {Z − π(X)} w*(X) are the projections of g(X, ZY, Z) onto these subspaces.
An estimator κ̂ of κ is regular with limit L if L is a random variable such that, for all u ∈ U, v ∈ V and w ∈ W,

    n^{1/2} {κ̂ − κ(G_{nu}, Q_{nv}, π_{nw})} ⇒ L under P_{nuvw}.

The Hájek–Le Cam convolution theorem says that L is distributed as the sum of a normal random variable N, with mean zero and variance E g*², and some independent random variable. This justifies calling an estimator κ̂ efficient if it is regular with limit L = N. As a consequence, a regular estimator is efficient if and only if it is asymptotically linear with influence function g*, that is,

    n^{1/2} {κ̂ − κ(G, Q, π)} = n^{-1/2} Σ_{i=1}^n g*(X_i, Z_i Y_i, Z_i) + o_p(1).

A reference for the convolution theorem and the characterization is Bickel et al. (1998).

Let us now specify the canonical gradient for the functional Eh(X, Y). The canonical gradient is, in particular, a gradient and thus specified by (4.2). Moreover, it is characterized by g*(X, ZY, Z) = u*(X) + Z v*(X, Y) + {Z − π(X)} w*(X), with the terms of the sum being projections as stated above. The canonical gradient for arbitrary κ is therefore determined by

    E{u*(X) u(X)} + E{Z v*(X, Y) v(X, Y)} + E[{Z − π(X)} w*(X) w(X)]
      = lim_{n→∞} n^{1/2} {κ(G_{nu}, Q_{nv}, π_{nw}) − κ(G, Q, π)}.    (4.3)

In the nonlinear regression model we have, as defined earlier, U = L_{2,0}(G), W = L_2(G_π) and Q_{nv} = Q_{nst} with v ∈ V, that is, v(X, Y) = s(ε) + ℓ(ε) ṙ_ϑ(X)^⊤ t [see (4.1)]. Since Eh(X, Y) does not depend on π, we have Eh(X, Y) = κ(G, Q, π) = κ(G, Q) and

    Eh(X, Y) = ∫ h dM = ∫∫ h(x, y) Q(x, dy) G(dx)
             = ∫∫ h(x, y) f{y − r_ϑ(x)} dy G(dx).

Let M_{nuv}(dx, dy) = Q_{nv}(x, dy) G_{nu}(dx) with Q_{nv} = Q_{nst} = f_{ns}{y − r_{ϑ_{nt}}(x)} dy and perturbations G_{nu}, f_{ns} and ϑ_{nt} as defined earlier.
Using the previous approximations we see that the right-hand side of (4.3) is

    lim_{n→∞} n^{1/2} (∫ h dM_{nuv} − ∫ h dM) = E[h(X, Y){u(X) + v(X, Y)}]

with v(X, Y) = s(ε) + ℓ(ε) ṙ_ϑ(X)^⊤ t. The canonical gradient g* of Eh(X, Y) is therefore determined by

    E{u*(X) u(X)} + E{Z v*(X, Y) v(X, Y)} + E[{Z − π(X)} w*(X) w(X)]
      = E[h(X, Y){u(X) + v(X, Y)}]    (4.4)

for all u ∈ U, v ∈ V and w ∈ W, with v of the above form.

In order to specify g* we set u = 0 and v = 0 in (4.4) and see that w* must be zero. Setting v = 0, we see that u*(X) is the projection of h(X, Y) onto U = L_{2,0}(G), that is, u*(X) = χ(X, ϑ) − E{χ(X, ϑ)} with χ(X, ϑ) = E{h(X, Y) | X}. Hence we have

    g*(X, ZY, Z) = χ(X, ϑ) − E{χ(X, ϑ)} + Z v*(X, Y)    (4.5)

and are left to determine v*. Taking u = 0 in (4.4), we see that the projection of Z v*(X, Y) onto Ṽ = {v(X, Y) : v ∈ V} must equal the projection of h(X, Y) onto Ṽ, that is, onto

    Ṽ = {s(ε) + ℓ(ε) ṙ_ϑ(X)^⊤ t : s ∈ S, t ∈ R^p}.

There are two possible ways to obtain v*. One method would be to make an educated guess: in Theorem 3.3 we derived an approximation of an estimator of Eh(X, Y) which we expect to be efficient since it uses all information about the model. The approximation still involves ϑ̂ − ϑ but, combined with the efficient influence function for estimating ϑ (which is relatively easy to derive; see Section 5), it will suggest a candidate for v*. Whether this candidate is the correct v* can be checked with characterization (4.4), that is, with

    E[Z v*(X, Y){s(ε) + ℓ(ε) ṙ_ϑ(X)^⊤ t}] = E[h(X, Y){s(ε) + ℓ(ε) ṙ_ϑ(X)^⊤ t}].    (4.6)

The other method uses the structure of the tangent space. The canonical gradient v* is characterized in terms of projections onto Ṽ. Its derivation as a projection onto Ṽ is simplified by decomposing Ṽ.
Let ℓ_s denote the projection of ℓ onto S,

    ℓ_s(ε) = ℓ(ε) − σ^{-2} ε,

and note that ℓ_s = 0 is possible, namely when the error density f is normal. We now introduce the notation

    ζ = [ṙ_ϑ(X) − E{ṙ_ϑ(X) | Z = 1}] ℓ(ε) + E{ṙ_ϑ(X) | Z = 1} ε/σ²

and, for s ∈ S and t ∈ R^p, write

    s(ε) + ṙ_ϑ(X)^⊤ t ℓ(ε)
      = s(ε) + t^⊤ [ṙ_ϑ(X) − E{ṙ_ϑ(X) | Z = 1}] ℓ(ε)
        + t^⊤ E{ṙ_ϑ(X) | Z = 1} {ℓ(ε) − ε/σ²} + t^⊤ E{ṙ_ϑ(X) | Z = 1} ε/σ²
      = t^⊤ ζ + s(ε) + t^⊤ E{ṙ_ϑ(X) | Z = 1} ℓ_s(ε)

with s(ε) + t^⊤ E{ṙ_ϑ(X) | Z = 1} ℓ_s(ε) ∈ S. Any element of Ṽ can therefore be written t^⊤ ζ + s(ε) for some t ∈ R^p and s ∈ S. Since the canonical gradient v* is in Ṽ by definition, it must be of the form v*(X, Y) = s*(ε) + t*^⊤ ζ with s* ∈ S and t* ∈ R^p to be determined such that (4.6) holds, that is, after our above considerations,

    E[Z {s*(ε) + t*^⊤ ζ}{s(ε) + t^⊤ ζ}] = E[h(X, Y){s(ε) + t^⊤ ζ}]

for all t ∈ R^p and s ∈ S.

We first consider t = 0 and then s = 0 and, in both cases, use the fact that Zζ is orthogonal to S. The above characterization of s* and t* then reduces to two equations, namely

    E{Z s*(ε) s(ε)} = E{h(X, Y) s(ε)} for all s ∈ S,    (4.7)
    E{Z t*^⊤ ζ t^⊤ ζ} = E{h(X, Y) t^⊤ ζ} for all t ∈ R^p.    (4.8)

Consider (4.7) and again use the notation h̄(ε) for the conditional expectation E{h(X, Y) | ε}. Then (4.7) can be written E{Z s*(ε) s(ε)} = E{h̄(ε) s(ε)}, so that h̄(ε)/EZ is an obvious candidate for s*. However, it is not (yet) in S: the desired s* is obtained as its centered version with a correction term chosen such that s* ∈ S,

    s*(ε) = (1/EZ) [h̄(ε) − Eh̄(ε) − σ^{-2} E{ε h̄(ε)} ε].

The vector t* is obtained by solving (4.8),

    t*^⊤ E(Z ζ ζ^⊤) t = E{h(X, Y) ζ^⊤} t for all t ∈ R^p.
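For completeness, one can check directly that the s* displayed above lies in S and solves (4.7). Using the independence of Z and ε, Eε = 0, Eε² = σ², and the two constraints defining S, we get

\[
E\{\varepsilon s^*(\varepsilon)\}
  = \frac{1}{EZ}\bigl[E\{\varepsilon \bar h(\varepsilon)\}
    - E\bar h(\varepsilon)\,E\varepsilon
    - \sigma^{-2}E\{\varepsilon \bar h(\varepsilon)\}\,E\varepsilon^{2}\bigr] = 0,
\]

and similarly \(E\,s^*(\varepsilon) = 0\), so \(s^* \in S\). Moreover, for any \(s \in S\),

\[
E\{Z s^*(\varepsilon) s(\varepsilon)\}
  = EZ \cdot E\{s^*(\varepsilon) s(\varepsilon)\}
  = E\{\bar h(\varepsilon) s(\varepsilon)\}
  = E\{h(X, Y) s(\varepsilon)\},
\]

where the centering term and the correction term drop out because \(\int s f\,dy = 0\) and \(\int y\,s(y) f(y)\,dy = 0\).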
Now use the definition of ζ from above and the definition of the vector D_w from the end of the previous section,

    D_w = E(h(X, Y)[ṙ_ϑ(X) − E{ṙ_ϑ(X) | Z = 1}] ℓ(ε)) + σ^{-2} E{ε h̄(ε)} E{ṙ_ϑ(X) | Z = 1},

and assume that E(Z ζ ζ^⊤) is invertible to obtain

    t*^⊤ = E{h(X, Y) ζ^⊤} E(Z ζ ζ^⊤)^{-1}
         = E(h(X, Y)([ṙ_ϑ(X) − E{ṙ_ϑ(X) | Z = 1}]^⊤ ℓ(ε) + E{ṙ_ϑ(X) | Z = 1}^⊤ ε/σ²)) E(Z ζ ζ^⊤)^{-1}
         = D_w^⊤ E(Z ζ ζ^⊤)^{-1}.
This completes the derivation of v*(X, Y) = s*(ε) + t*^⊤ ζ:

    v*(X, Y) = (1/EZ)[h̄(ε) − Eh̄(ε) − σ^{-2} E{ε h̄(ε)} ε] + D_w^⊤ E(Z ζ ζ^⊤)^{-1} ζ.    (4.9)

Equations (4.5) and (4.9) together finally yield the canonical gradient g*, which is given in the following lemma. Note that we now have the additional assumption that E(Z ζ ζ^⊤) is invertible, where E(Z ζ ζ^⊤) involves the covariance matrix of Z ṙ_ϑ(X) and the Fisher information Eℓ²(ε).

Lemma 4.1.
Let h̄(ε) = E{h(X, Y) | ε},

    ζ = [ṙ_ϑ(X) − E{ṙ_ϑ(X) | Z = 1}] ℓ(ε) + σ^{-2} E{ṙ_ϑ(X) | Z = 1} ε

and

    D_w = E(h(X, Y)[ṙ_ϑ(X) − E{ṙ_ϑ(X) | Z = 1}] ℓ(ε)) + σ^{-2} E{ε h̄(ε)} E{ṙ_ϑ(X) | Z = 1}
        = E{h(X, Y) ζ}.

Suppose, additionally to the model assumptions from Section 2, that E(Z ζ ζ^⊤) is invertible. Then the canonical gradient of the functional Eh(X, Y) is

    χ(X, ϑ) − E{χ(X, ϑ)}
      + (Z/EZ)[h̄(ε) − Eh̄(ε) − σ^{-2} E{ε h̄(ε)} ε]
      + D_w^⊤ E(Z ζ ζ^⊤)^{-1} Z ζ.    (4.10)
5. Estimation of the parameter and main result.
In this section we show that the weighted estimator of Eh(X, Y) with an efficient estimator ϑ̂ of ϑ plugged in is asymptotically linear with influence function equal to the canonical gradient, that is, it is efficient. Let us compare the expansion of the weighted estimator from Theorem 3.3 and the efficient influence function, which is given by the canonical gradient (4.10) in Lemma 4.1. The approximation of n^{-1/2} Σ_{i=1}^n [χ̂_w(X_i, ϑ̂) − E{χ(X, ϑ)}] which we derived in Section 3 is

    n^{-1/2} Σ_{i=1}^n (χ(X_i, ϑ) − E{χ(X, ϑ)} + (Z_i/EZ)[h̄(ε_i) − Eh̄(ε) − σ^{-2} E{ε h̄(ε)} ε_i])
      + D_w^⊤ n^{1/2} (ϑ̂ − ϑ),

where D_w = E(h(X, Y)[ṙ_ϑ(X) − E{ṙ_ϑ(X) | Z = 1}] ℓ(ε)) + σ^{-2} E{ε h̄(ε)} E{ṙ_ϑ(X) | Z = 1}. The efficient influence function determined by the canonical gradient is

    χ(X, ϑ) − E{χ(X, ϑ)} + (Z/EZ)[h̄(ε) − Eh̄(ε) − σ^{-2} E{ε h̄(ε)} ε] + D_w^⊤ E{Z ζ ζ^⊤}^{-1} Z ζ

with ζ = [ṙ_ϑ(X) − E{ṙ_ϑ(X) | Z = 1}] ℓ(ε) + σ^{-2} E{ṙ_ϑ(X) | Z = 1} ε. Using an estimator ϑ̂ with influence function E(Z ζ ζ^⊤)^{-1} Z ζ would therefore yield an efficient estimator of Eh(X, Y). In fact, it is easy to check (this will be done in the following lemma) that this influence function is the canonical gradient of the functional κ(G, Q, π) = ϑ. This means that our estimator of Eh(X, Y) requires an efficient estimator ϑ̂ of ϑ to be plugged in in order to be efficient.

Lemma 5.1.
Let ζ = [ṙ_ϑ(X) − E{ṙ_ϑ(X) | Z = 1}] ℓ(ε) + σ^{-2} E{ṙ_ϑ(X) | Z = 1} ε and suppose that E(Z ζ ζ^⊤) is invertible. An asymptotically linear estimator ϑ̂ of ϑ with influence function E(Z ζ ζ^⊤)^{-1} Z ζ, that is,

    n^{1/2} (ϑ̂ − ϑ) = n^{-1/2} Σ_{i=1}^n E(Z ζ ζ^⊤)^{-1} Z_i [{ṙ_ϑ(X_i) − E[ṙ_ϑ(X) | Z = 1]} ℓ(ε_i) + E{ṙ_ϑ(X) | Z = 1} ε_i/σ²] + o_p(1),

is efficient for ϑ.

Proof.
We have a semiparametric model for the conditional distribution, namely Q(x, dy) = f{y − r_ϑ(x)} dy, and nonparametric models for G and π. The functional ϑ ∈ R^p is therefore a functional of Q, κ(G, Q, π) = κ(Q) = ϑ. By the discussion of the previous section we must show that the influence function of the estimator equals the canonical gradient, which is, for arbitrary functionals κ, determined by (4.3). For the functional ϑ the right-hand side of (4.3) is simply n^{1/2}{(ϑ + n^{-1/2} t) − ϑ} = t. From Section 4 we also know that in the nonlinear regression model any v in Ṽ is of the form v(X, Y) = s(ε) + t^⊤ ζ, where s ∈ S and t ∈ R^p. The canonical gradient u*(X) + Z v*(X, Y) + {Z − π(X)} w*(X) is therefore characterized by

    E{u*(X) u(X)} + E[Z v*(X, Y){s(ε) + ζ^⊤ t}] + E[{Z − π(X)} w*(X) w(X)] = t.

Taking s = 0, t = 0 and w = 0 we see that u* = 0. Analogously one obtains that w* must be zero. The canonical gradient thus reduces to Z v*(X, Y). Again, since v* ∈ Ṽ, we write Z v*(X, Y) = Z s*(ε) + Z ζ^⊤ t* with s* and t* to be determined. Taking t = 0 we see that Z v* must be orthogonal to S, that is, s* = 0, which yields Z v*(X, Y) = Z ζ^⊤ t*. The above characterization therefore reduces to

    t = E[Z ζ^⊤ t*{s(ε) + ζ^⊤ t}] = t*^⊤ E(Z ζ ζ^⊤) t for all t ∈ R^p.

This gives t* = E(Z ζ ζ^⊤)^{-1} and the proof is complete: the canonical gradient of the parameter ϑ is Z v*(X, Y) = Z t*^⊤ ζ = E(Z ζ ζ^⊤)^{-1} Z ζ. □
Note that the asymptotic variance of ϑ̂ is E(Z ζ ζ^⊤)^{-1}. The assumption that E(Z ζ ζ^⊤) be invertible is therefore a condition on the covariance matrix of an efficient estimator of ϑ, which we require to have full rank. Lemma 5.1 combined with the previous discussion yields our main result, which is given in the following theorem. Note that the asymptotic variance of the fully imputed estimator of Eh(X, Y) is E g*², where g* is the canonical gradient from (4.10). This variance is also given in the theorem below and is easily verified by taking into account that the three terms of g* are orthogonal.

Theorem 5.2.
Assume that the assumptions stated earlier hold and that the covariance matrices of ṙ_ϑ(X) and of Z ṙ_ϑ(X) are invertible. Let ϑ̂ be an asymptotically linear estimator of ϑ with influence function E(Z ζ ζ^⊤)^{-1} Z ζ, where ζ = [ṙ_ϑ(X) − E{ṙ_ϑ(X) | Z = 1}] ℓ(ε) + σ^{-2} E{ṙ_ϑ(X) | Z = 1} ε. Then the estimator n^{-1} Σ_{i=1}^n χ̂_w(X_i, ϑ̂) with

    χ̂_w(X_i, ϑ̂) = Σ_{j=1}^n ŵ_j Z_j h{X_i, r_ϑ̂(X_i) + Y_j − r_ϑ̂(X_j)} / Σ_{j=1}^n Z_j

has the expansion

    n^{-1} Σ_{i=1}^n (χ(X_i, ϑ) + (Z_i/EZ)[h̄(ε_i) − Eh̄(ε) − σ^{-2} E{ε h̄(ε)} ε_i]
      + D_w^⊤ E(Z ζ ζ^⊤)^{-1} Z_i ([ṙ_ϑ(X_i) − E{ṙ_ϑ(X) | Z = 1}] ℓ(ε_i) + E{ṙ_ϑ(X) | Z = 1} ε_i/σ²))
      + o_p(n^{-1/2}),

where D_w = E(h(X, Y)[ṙ_ϑ(X) − E{ṙ_ϑ(X) | Z = 1}] ℓ(ε)) + σ^{-2} E{ε h̄(ε)} E{ṙ_ϑ(X) | Z = 1} and h̄(ε) = E{h(X, Y) | ε}. In particular, it is an efficient estimator of Eh(X, Y) and asymptotically normally distributed with asymptotic variance

    Eχ²(X, ϑ) − {Eh(X, Y)}²
      + (1/EZ)[E h̄²(ε) − {Eh(X, Y)}² − σ^{-2} (E{ε h̄(ε)})²]
      + D_w^⊤ E(Z ζ ζ^⊤)^{-1} D_w.

In the linear regression model without missing responses, efficient estimators of ϑ have been constructed by Bickel (1982), Koul and Susarla (1983) and Schick (1987, 1993). Schick (1993) considers general regression models with arbitrary sets of identifiability assumptions and discusses the mean zero constraint on the error distribution as an important example. His construction of an efficient estimator requires a preliminary estimate of ϑ and a direct estimator of the influence function. The influence function for the nonlinear regression model with mean zero errors [see Schick (1993), Section 4.1 and Remark 3.13] is E(ξ ξ^⊤)^{-1} ξ with

    ξ = [ṙ_ϑ(X) − E{ṙ_ϑ(X)}] ℓ(ε) + E{ṙ_ϑ(X)} ε/σ²

and therefore consistent with our findings. A further developed efficient estimator, which requires weaker conditions, is in Forrester et al. (2003). In the model with missing responses an efficient estimator can be constructed analogously, using only the (available) full observations. Note that the only difference in the construction is that the data are incomplete, that is, the presence of the indicators Z_i. In the following we briefly sketch this "one-step improvement" construction of the estimator and refer to Forrester et al. (2003) for details.

Let ϑ̄ denote a √n consistent and discretized estimator of ϑ, that is, with values on a rectangular grid with side lengths of order n^{-1/2}. Write μ(ϑ) for E{ṙ_ϑ(X) | Z = 1}, ε(ϑ) for the error variable ε(ϑ) = Y − r_ϑ(X), and ζ_ϑ{X, ε(ϑ)} for ζ, that is,

    ζ = ζ_ϑ{X, ε(ϑ)} = {ṙ_ϑ(X) − μ(ϑ)} ℓ{ε(ϑ)} + μ(ϑ) ε(ϑ)/σ².

In order to estimate the influence function one replaces the unknown quantities by estimators.
The estimator of ϑ is then of the form

    ϑ̄ + [Σ_{j=1}^n Z_j ζ̂_{ϑ̄}{X_j, ε_j(ϑ̄)} ζ̂_{ϑ̄}{X_j, ε_j(ϑ̄)}^⊤]^{-1} Σ_{j=1}^n Z_j ζ̂_{ϑ̄}{X_j, ε_j(ϑ̄)},

where

    ζ̂_{ϑ̄}{X, ε(ϑ̄)} = [ṙ_{ϑ̄}(X) − μ̂(ϑ̄)] ℓ̂{ε(ϑ̄)} + μ̂(ϑ̄) ε(ϑ̄)/σ̂²(ϑ̄)

with

    μ̂(ϑ̄) = Σ_{j=1}^n Z_j ṙ_{ϑ̄}(X_j) / Σ_{j=1}^n Z_j,    σ̂²(ϑ̄) = Σ_{j=1}^n Z_j ε_j²(ϑ̄) / Σ_{j=1}^n Z_j

and an estimator ℓ̂ of the score function. To describe this estimator let k be a kernel that satisfies the assumptions given in Section 8 of Forrester et al. (2003), for example, a logistic density. For a bandwidth a_n → 0 set k_n(x) = k(x/a_n)/a_n. The estimator of the score function ℓ is a kernel estimator based on the available residuals ε_j(ϑ̄),

    ℓ̂_{ϑ̄}(x) = −f̂′_n(x) / {b_n + f̂_n(x)}

with f̂_n(x) = n^{-1} Σ_{j=1}^n Z_j k_n{x − ε_j(ϑ̄)}, where b_n is a sequence of positive numbers converging to zero. The required orders of a_n and b_n are given in Forrester et al. (2003).

There are other simple estimators of ϑ available which, however, in contrast to the estimators proposed by Schick (1987, 1993) and Forrester et al. (2003), are not efficient for ϑ and which, if used for plug-in, would yield inefficient estimators of Eh(X, Y). One could, for example, estimate ϑ by a weighted least squares estimator, that is, by the solution t = ϑ̂ of an estimating equation Σ_{i=1}^n Z_i w_t(X_i){Y_i − r_t(X_i)} = 0. Such an estimator would be appropriate in a regression model where independence of errors and covariates cannot be assumed. There one could even obtain efficiency for suitably chosen weights [see Müller (2007), for nonlinear regression without missing responses]. The estimating equation can be regarded as an empirical version of the equation E[Z w_t(X){Y − r_t(X)}] = 0. If a solution t = ϑ of this equation exists, the solution ϑ̂ of the empirical version will, in general, be consistent for ϑ.
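The one-step construction just described can be sketched numerically. The following is a minimal illustration and not the construction of Forrester et al. (2003): it treats a one-dimensional ϑ, omits the discretization of ϑ̄, and uses fixed illustrative bandwidths `a_n` and `b_n`; the arguments `r` and `r_dot` stand for the regression function and its derivative in ϑ, and all function names are ours.

```python
import numpy as np

def one_step(theta_bar, X, Y, Z, r, r_dot, a_n=0.5, b_n=0.05):
    """One-step improvement of a root-n consistent theta_bar (scalar case).

    Sketch only: the discretization of theta_bar is omitted and the
    bandwidths a_n, b_n are fixed illustrative values.
    """
    eps = Y - r(X, theta_bar)               # residuals eps_j(theta_bar)
    e = eps[Z == 1]                         # residuals of the complete cases

    def k(u):                               # logistic density as kernel
        s = 1.0 / (1.0 + np.exp(-u))
        return s * (1.0 - s)

    def k_prime(u):                         # derivative of the kernel
        s = 1.0 / (1.0 + np.exp(-u))
        return s * (1.0 - s) * (1.0 - 2.0 * s)

    def score_hat(t):                       # estimated score -f'/f, trimmed by b_n
        f_hat = np.mean(k((t - e) / a_n)) / a_n
        f1_hat = np.mean(k_prime((t - e) / a_n)) / a_n ** 2
        return -f1_hat / (b_n + f_hat)

    mu_hat = np.mean(r_dot(X[Z == 1], theta_bar))   # estimates E{r_dot | Z = 1}
    sigma2_hat = np.mean(e ** 2)                    # estimates sigma^2
    zeta = np.array([(r_dot(x, theta_bar) - mu_hat) * score_hat(ei)
                     + mu_hat * ei / sigma2_hat
                     for x, ei in zip(X[Z == 1], e)])
    # one-step update: theta_bar + (sum Z zeta zeta^T)^{-1} sum Z zeta
    return theta_bar + zeta.sum() / (zeta ** 2).sum()
```

With normal errors the estimated score is close to ε/σ², so the one-step estimator tracks the least squares estimator; for other error densities the kernel score supplies the additional information.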
If one is not interested in efficiency, the estimator n^{-1} Σ_{i=1}^n χ̂_w(X_i, ϑ̂) with a least squares estimator ϑ̂ plugged in would yield a consistent estimator of Eh(X, Y) (but not an efficient one, since the independence structure is not used). Alternatively, the least squares estimator can be used as a preliminary estimator for the one-step improvement approach sketched above.
6. Special cases, simulations and inference.
Sometimes the estimator simplifies considerably, especially if we study simple special cases such as estimation of expectations Eh(X, Y) where h has a simple form. The main result from Theorem 5.2 is therefore useful in proving efficiency of existing approaches for specific applications, or in improving them, and for comparisons of competing methods. Theorem 5.2 further provides the limiting distribution of the efficient estimator, which facilitates the construction of confidence intervals. We will address this and aspects of the construction of estimators in the following, and illustrate the results with simulations.

6.1. Special cases.
We have shown that the fully imputed weighted estimator n^{-1} Σ_{i=1}^n χ̂_w(X_i, ϑ̂) with

    χ̂_w(x, ϑ̂) = Σ_{j=1}^n ŵ_j Z_j h{x, r_ϑ̂(x) + Y_j − r_ϑ̂(X_j)} / Σ_{j=1}^n Z_j

is efficient for Eh(X, Y), where h(X, Y) is a known square-integrable function. The literature usually deals with estimation of the mean response, that is, h(x, y) = y. Other important examples are estimation of higher moments of the response variable Y and estimation of the covariance and of mixed moments of X and Y. In all these cases h(x, y) is a polynomial in x and y, and the estimator often simplifies. This holds for the mean response and, more generally, when h is of the form h(x, y) = a(x) y. Then the estimator reduces to an unweighted empirical estimator, which can be seen as follows. Recall that the weights must be chosen such that Σ_{j=1}^n ŵ_j Z_j ε̂_j = 0 and that ŵ_j = 1 − λ̂ ŵ_j Z_j ε̂_j, which gives Σ_{j=1}^n ŵ_j Z_j / Σ_{j=1}^n Z_j = 1. Hence the estimator of E{a(X) Y} is

    n^{-1} Σ_{i=1}^n χ̂_w(X_i, ϑ̂)
      = n^{-1} Σ_{i=1}^n Σ_{j=1}^n ŵ_j Z_j a(X_i){r_ϑ̂(X_i) + ε̂_j} / Σ_{j=1}^n Z_j
      = n^{-1} Σ_{i=1}^n a(X_i) {r_ϑ̂(X_i) Σ_{j=1}^n ŵ_j Z_j / Σ_{j=1}^n Z_j + Σ_{j=1}^n ŵ_j Z_j ε̂_j / Σ_{j=1}^n Z_j}
      = n^{-1} Σ_{i=1}^n a(X_i) r_ϑ̂(X_i).

In these cases it is therefore not necessary to determine weights: the above intuitive estimator, with an efficient estimator ϑ̂ of ϑ plugged in, is efficient for E{a(X) Y}.

An interesting special case is estimation of the mean response, a(X) = 1, when possibly all responses are observed, which we mentioned in the Introduction. Regardless of whether there are missing responses or not, n^{-1} Σ_{i=1}^n r_ϑ̂(X_i) is efficient for EY, provided ϑ̂ is efficient for ϑ. The difference between the two situations is the construction of ϑ̂, which will be based on either complete data pairs or on missing response data.
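The reduction above says that, for h(x, y) = a(x)y, the weighted fully imputed estimator collapses to n^{-1} Σ a(X_i) r_ϑ̂(X_i), so no weights need to be computed. A minimal numerical sketch follows; the helper names are ours, and the complete-case least squares estimator stands in for an efficient ϑ̂ (which is appropriate in the normal-error linear model).

```python
import numpy as np

def estimate_aXY(X, a, r, theta_hat):
    """Estimator of E{a(X) Y} for h(x, y) = a(x) y: the weighted fully
    imputed estimator reduces to the plug-in mean of a(X) r(X, theta_hat)."""
    return np.mean(a(X) * r(X, theta_hat))

rng = np.random.default_rng(1)
n = 1000
X = rng.uniform(-1, 1, n)
Y = 2.0 * X + rng.standard_normal(n)          # r_theta(x) = theta * x, theta = 2
Z = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-X))).astype(int)

obs = Z == 1
theta_hat = np.sum(X[obs] * Y[obs]) / np.sum(X[obs] ** 2)   # complete-case OLS

# mean response: a(x) = 1, so the estimator is the mean of fitted values
ey_hat = estimate_aXY(X, lambda x: np.ones_like(x), lambda x, t: t * x, theta_hat)
```

Here EY = E(2X) = 0, and `ey_hat` is the fully imputed estimate n^{-1} Σ r_ϑ̂(X_i), which uses the covariates of all observations, including those with missing responses.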
Let us stay with this example and consider, for a comparison, the unweighted estimator (1.1) from the Introduction, that is, with all weights equal to one. It involves the term Σ_{j=1}^n Z_j ε̂_j / Σ_{j=1}^n Z_j, which is nonzero. If all responses are observable, the unweighted estimator further simplifies, namely to

    n^{-1} Σ_{i=1}^n r_ϑ̂(X_i) + n^{-1} Σ_{i=1}^n ε̂_i = n^{-1} Σ_{i=1}^n Y_i

[whereas the weighted estimator is n^{-1} Σ_{i=1}^n r_ϑ̂(X_i)]. Its influence function is Y − EY, which is clearly not the efficient one: our efficient estimator of EY (with an efficient estimator ϑ̂) has the expansion

    n^{-1} Σ_{i=1}^n r_ϑ̂(X_i) ≐ n^{-1} Σ_{i=1}^n r_ϑ(X_i) + (ϑ̂ − ϑ)^⊤ E ṙ_ϑ(X).

We recognize this as the expansion from Theorem 3.3 with D_w = E ṙ_ϑ(X). Even without inserting the expansion of ϑ̂ − ϑ from the previous section, it is clear that this is, in general, not the influence function of n^{-1} Σ_{i=1}^n Y_i, which shows that the latter cannot be efficient. Note that n^{-1} Σ_{i=1}^n Y_i also coincides with the (inefficient) partially imputed estimator if all responses were observed.

6.2. Simulations.
For an illustration with computer simulations we consider a linear regression function, r_ϑ(X) = ϑX with ϑ = 2, and a nonlinear regression function, r_ϑ(X) = cos(ϑX), also with ϑ = 2. The probabilities π(X) = P(Z = 1 | X) = E(Z | X) are chosen as values of a logistic distribution function, π(X) = 1/(1 + e^{−X}), so that on average one half of the simulated responses are missing. We generate covariates X from a uniform distribution on the interval (−1, 1) and error variables ε from a standard normal distribution. If the errors are in fact normally distributed, then ℓ(ε) = ε/σ² and the efficient one-step improvement estimator of ϑ from the previous section is asymptotically equivalent to the ordinary least squares estimator. The following considerations can therefore be based on this straightforward estimation approach.

In a first example we consider estimation of the mean response EY and compare the efficient (fully imputed weighted) estimator, which, as seen above, here simplifies to n^{-1} Σ_{i=1}^n r_ϑ̂(X_i), with the partially imputed estimator n^{-1} Σ_{i=1}^n {Z_i Y_i + (1 − Z_i) r_ϑ̂(X_i)}. We also study the performance of these estimators when the parameter estimates are replaced by their true values, and when all responses are observed, π(·) = 1. Further we calculate the first simple estimator from the Introduction, n^{-1} Σ_{i=1}^n Z_i Y_i / π̂(X_i), with, for reasons of simplicity, the estimated probabilities π̂ replaced by the true ones. The values of the simulated mean squared errors are given in Table 1.

In both the linear and the nonlinear regression model, the fully imputed estimator performs considerably better than the partially imputed estimator. The simple estimator in the last column is clearly outperformed by the imputation approaches.
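A simulation in the spirit of this design can be set up in a few lines. The following is a hedged sketch covering the linear case only, with complete-case ordinary least squares as the (here efficient) plug-in ϑ̂; the function name and default arguments are ours.

```python
import numpy as np

def simulate_mse(n=100, reps=500, theta=2.0, seed=2):
    """Compare simulated MSEs of the fully imputed (FI) and partially
    imputed (PI) estimators of EY in the model Y = theta*X + eps,
    X ~ U(-1, 1), eps ~ N(0, 1), pi(x) = 1/(1 + exp(-x)); true EY = 0."""
    rng = np.random.default_rng(seed)
    fi, pi = [], []
    for _ in range(reps):
        X = rng.uniform(-1, 1, n)
        Y = theta * X + rng.standard_normal(n)
        Z = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-X))).astype(int)
        obs = Z == 1
        theta_hat = np.sum(X[obs] * Y[obs]) / np.sum(X[obs] ** 2)
        fi.append(np.mean(theta_hat * X))                    # fully imputed
        pi.append(np.mean(Z * Y + (1 - Z) * theta_hat * X))  # partially imputed
    ey = 0.0                                                 # true mean response
    return (np.mean((np.array(fi) - ey) ** 2),
            np.mean((np.array(pi) - ey) ** 2))

mse_fi, mse_pi = simulate_mse()
```

With n = 100 the resulting values should be in the neighborhood of the entries reported for the linear model in Table 1, with the fully imputed estimator showing the smaller mean squared error.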
Comparing the columns for the fully imputed estimator with and without parameter estimation (and analogously for the partially imputed estimator), we see that the estimator of the slope ϑ in the linear regression r_ϑ(X) = ϑX is, as a plug-in for estimating EY, better than the estimator of the frequency parameter ϑ in the nonlinear regression model r_ϑ(X) = cos(ϑX): in the linear regression model the mean squared errors of the approaches based on ϑ and ϑ̂ are very similar, in contrast to the nonlinear model, where the differences are quite large. Let us also compare the (a) and (b) sections of the linear regression and the nonlinear regression example, which refer to the situations where (a) responses are missing at random and (b) all responses are available. For the fully imputed estimator n^{-1} Σ_{i=1}^n r_ϑ̂(X_i) we observe the expected improved performance when more (response) data for the estimation of ϑ are available. The situation is different for the partially imputed estimator. Indeed we expect that, similarly, performance will improve as the proportion of observed responses increases. In this case ϑ̂ improves as an estimator of ϑ but, at the same time, the partially imputed estimator will discard more and more information about the structure of the regression function. [In the extreme case π(·) = 1 it equals the empirical estimator n^{-1} Σ_{i=1}^n Y_i.] Our example demonstrates that both scenarios are possible: for the linear regression model the estimator of ϑ performs well and the simulated mean

Table 1
Simulated mean squared errors of estimators of the mean response EY

Linear regression: r_ϑ(X) = ϑX (ϑ = 2)

    π(X)              n     F̂I        FI        P̂I        PI        N
    1/(1 + e^{−X})    50    0.027520  0.026639  0.036231  0.036368  0.104962
                      100   0.013502  0.013298  0.018074  0.018364  0.052680
                      1000  0.001328  0.001325  0.001794  0.001835  0.005270
    1                 50    0.026990  0.026639  0.046322  0.046322  0.046322
                      100   0.013415  0.013298  0.023479  0.023479  0.023479
                      1000  0.001327  0.001325  0.002345  0.002345  0.002345

Nonlinear regression: r_ϑ(X) = cos(ϑX) (ϑ = 2)

    π(X)              n     F̂I        FI        P̂I        PI        N
    1/(1 + e^{−X})    50    0.027858  0.003957  0.031163  0.013272  0.053038
                      100   0.015462  0.002001  0.017147  0.007020  0.028154
                      1000  0.001492  0.000199  0.001671  0.000696  0.002810
    1                 50    0.016512  0.003957  0.023369  0.023369  0.023369
                      100   0.008581  0.002001  0.012043  0.012043  0.012043
                      1000  0.000852  0.000199  0.001207  0.001207  0.001207

Notes. The table entries are the simulated mean squared errors of estimators of EY = E r_ϑ(X) with partially missing responses, π(X) = 1/(1 + e^{−X}), and completely observed data pairs, π(X) = 1. In the first two columns we study the efficient fully imputed weighted estimator with the ordinary least squares estimator ϑ̂ plugged in (F̂I) and its corresponding version using the true parameter, ϑ = 2 (FI). The next two columns refer to the partially imputed estimator using ϑ̂ (P̂I) and the version based on ϑ = 2 (PI). The last column considers the simple estimator n^{-1} Σ_{i=1}^n Z_i Y_i / π(X_i) (N), which does not use imputation. Note that in the sections with π(X) = 1 the columns for P̂I, PI and N are identical: since all the indicators are 1, these estimators coincide with the empirical estimator n^{-1} Σ_{i=1}^n Y_i.

squared error of the partially imputed estimator in (a) is smaller than in (b). In the nonlinear regression model the estimator of ϑ is not as good, and the mean squared error in (a) is larger than the mean squared error of the empirical estimator in (b). Note that this observation about the performance of the partially imputed estimator is only of secondary interest since, in any case, the fully imputed estimator has the smaller mean squared error.

The situation is slightly more complicated when h is of the form h(x, y) = a(x) b(y) with a nonlinear function b, for example, when higher mixed moments of X and Y or just higher moments of Y are estimated. Simplified estimators are available when b has a simple form. For an illustration we consider, in a second example, estimation of the second moment EY² = E r_ϑ²(X) + σ². The fully imputed estimator is

    n^{-1} Σ_{i=1}^n Σ_{j=1}^n ŵ_j Z_j {r_ϑ̂(X_i) + ε̂_j}² / Σ_{j=1}^n Z_j
      = n^{-1} Σ_{i=1}^n r_ϑ̂²(X_i) + Σ_{j=1}^n ŵ_j Z_j ε̂_j² / Σ_{j=1}^n Z_j.
The mean squared errors for the fully imputed and the partially imputed estimator (with and without parameter estimation) are given in Table 2. Consider the lower section on nonlinear regression first. We see that, as expected, the fully imputed estimator outperforms the partially imputed estimator and that, in part (a) with missing responses, both estimators are far better than the simple estimator in the last column. Using an estimator ϑ̂ of ϑ, or the true value ϑ = 2, does not have much impact on the mean squared error here. The upper half of Table 2 on linear regression, however, shows a different picture: although the mean squared errors of the fully imputed and the partially imputed estimator based on the true ϑ are considerably different (which is what we would expect), the values for the estimators based on the ordinary least squares parameter estimator ϑ̂ suggest that the two approaches are asymptotically equivalent. For the extreme case (b), where π(·) = 1, this would mean that the fully imputed estimator n^{-1} Σ_{i=1}^n r_ϑ̂²(X_i) + n^{-1} Σ_{i=1}^n ŵ_i ε̂_i² and the empirical estimator n^{-1} Σ_{i=1}^n Y_i² are asymptotically equivalent. This may be surprising but, in fact, it is easy to see that this is exactly what is happening: we consider the special example of linear regression with normal errors and the ordinary least squares estimator ϑ̂ = Σ_{i=1}^n X_i Y_i / Σ_{i=1}^n X_i². Rewriting the

Table 2
Simulated mean squared errors of estimators of EY²

Linear regression: r_ϑ(X) = ϑX (ϑ = 2)

    π(X)              n     F̂I        FI        P̂I        PI        N
    1/(1 + e^{−X})    50    0.312670  0.116360  0.310263  0.161374  0.528146
                      100   0.158512  0.055343  0.157402  0.079863  0.267601
                      1000  0.016215  0.005470  0.016189  0.008113  0.027298
    1                 50    0.174683  0.070048  0.173817  0.173817  0.173817
                      100   0.088960  0.034685  0.088455  0.088455  0.088455
                      1000  0.008630  0.003359  0.008623  0.008623  0.008623

Nonlinear regression: r_ϑ(X) = cos(ϑX) (ϑ = 2)

    π(X)              n     F̂I        FI        P̂I        PI        N
    1/(1 + e^{−X})    50    0.086350  0.087286  0.092361  0.093401  0.176124
                      100   0.042671  0.042747  0.047054  0.047219  0.092478
                      1000  0.004260  0.004179  0.005032  0.004961  0.010153
    1                 50    0.043774  0.043873  0.066100  0.066100  0.066100
                      100   0.021578  0.021574  0.035573  0.035573  0.035573
                      1000  0.002159  0.002116  0.003713  0.003713  0.003713

Notes. Here we study estimation of EY². The first two columns refer to the fully imputed estimator with the ordinary least squares estimator ϑ̂ plugged in (F̂I) and to its version using ϑ = 2 (FI). In the next two columns we consider the partially imputed estimator based on ϑ̂ (P̂I) and on ϑ = 2 (PI). In the last column the mean squared errors of n^{-1} Σ_{i=1}^n Z_i Y_i² / π(X_i) (N) are listed.

empirical estimator gives

    n^{-1} Σ_{i=1}^n Y_i² = n^{-1} Σ_{i=1}^n r_ϑ̂²(X_i) + n^{-1} Σ_{i=1}^n ε̂_i² + 2 ϑ̂ n^{-1} Σ_{i=1}^n ε̂_i X_i.

The last term vanishes for the least squares estimator ϑ̂, so that n^{-1} Σ_{i=1}^n Y_i² = n^{-1} Σ_{i=1}^n r_ϑ̂²(X_i) + n^{-1} Σ_{i=1}^n ε̂_i². Finally, by our results from Section 3, the estimators n^{-1} Σ_{i=1}^n ŵ_i ε̂_i² and n^{-1} Σ_{i=1}^n ε̂_i² of the error variance σ² are asymptotically equivalent.

In the next example we restrict our attention to linear regression, r_ϑ(X) = ϑX, ϑ = 2, and consider estimation of a more complicated expectation, namely of Eh(X, Y) = E(X e^{XY}). In contrast to the previous examples, the (weighted) fully imputed estimator cannot be reduced. The mean squared errors of this estimator and of the partially imputed estimator are given in Table 3. For each estimator we study the two cases with and without parameter estimation. Again we observe that the performance of the estimators is not much affected by the plug-in parameter estimator. Comparing the fully and the partially imputed estimators, we see that the fully imputed estimator clearly outperforms the partially imputed estimator. In addition we also calculate the simulated mean squared error of the unweighted (inefficient) version of our fully imputed estimator. The performance of this estimator turns out to lie between the fully and the partially imputed one.
In particular, the simulations in case (b), where all data are observed and where the partially imputed estimator equals the empirical estimator, confirm our theoretical observation that incorporating the information about the location of the errors, for example in the form of weights as done in this article, is important.

In order to study the behavior of the fully imputed estimator for multidimensional ϑ we again studied estimation of E(Xe^{XY}). For our simulations

Table 3
Simulated mean squared errors of estimators of E{X exp(XY)} in linear regression

  π(X)           n      FI-hat    FI        U-hat     PI-hat    PI
  1/(1+e^{-X})   50     0.32563   0.29024   0.36187   0.48164   0.47769
                 100    0.15017   0.14085   0.18147   0.24192   0.24698
                 1000   0.01384   0.0137    0.01992   0.02577   0.02703
  1              50     0.28988   0.27262   0.32220   0.58566   0.58566
                 100    0.13804   0.13413   0.16520   0.29948   0.29948
                 1000   0.01332   0.01329   0.01663   0.02997   0.02997

Notes.
We consider estimation of Eh(X,Y) = E(Xe^{XY}) in the linear regression model r_ϑ(X) = ϑX, ϑ = 2. The first two columns give the mean squared errors of the fully imputed estimator with the least squares estimator ϑ̂ plugged in (FI-hat), and of its version using ϑ = 2 (FI). The third column contains the mean squared errors of the unweighted version U-hat of FI-hat. The last two columns refer to the partially imputed estimator using ϑ̂ (PI-hat) and ϑ = 2 (PI). Note that if π(X) = 1 the partially imputed estimator again equals the empirical estimator, PI = PI-hat = $n^{-1}\sum_{i=1}^n X_i\exp(X_iY_i)$.
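To illustrate the estimators being compared, the following is a minimal Python sketch of the unweighted fully imputed and the partially imputed plug-in estimators of E(Xe^{XY}). The data-generating choices (uniform covariates on (0, 1), normal errors with standard deviation 0.25, ϑ = 2) are assumptions for the example, not the paper's simulation setup.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative model (distributional choices are assumptions, not the
# paper's): Y = theta*X + eps with theta = 2, responses missing at
# random with P(Z = 1 | X = x) = pi(x) = 1/(1 + exp(-x)).
n, theta = 200, 2.0
x = rng.uniform(0.0, 1.0, n)
eps = rng.normal(0.0, 0.25, n)
y = theta * x + eps
z = rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-x))  # missingness indicator

def h(a, b):
    """The functional of interest: h(x, y) = x * exp(x*y)."""
    return a * np.exp(a * b)

# Least squares through the origin, using complete cases only.
theta_hat = np.sum(z * x * y) / np.sum(z * x * x)
resid_obs = (y - theta_hat * x)[z]  # residuals of the observed responses

def chi_hat(x0):
    # Plug-in estimator of chi(x0) = E{h(x0, r(x0) + eps)}: because the
    # errors are independent of the covariates, the conditional
    # expectation given X = x0 is an unconditional expectation over the
    # error distribution, estimated by averaging over observed residuals.
    return np.mean(h(x0, theta_hat * x0 + resid_obs))

chi = np.array([chi_hat(xi) for xi in x])
fi = chi.mean()                               # fully imputed (unweighted)
pi_est = np.mean(np.where(z, h(x, y), chi))   # partially imputed
```

Here `chi_hat` imputes every response (full imputation), while the partially imputed version keeps h(X_i, Y_i) whenever Y_i is observed. The weighted version studied in the article would additionally reweight the residuals to exploit the mean-zero constraint on the errors.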
Table 4
Simulated mean squared errors of estimators of E{X exp(XY)} with ϑ ∈ R^p (p = 2, 3)

  r_ϑ(X): ϑ₁X + ϑ₂U;  ϑ₁ + ϑ₂X + ϑ₃U;  ϑ₁X + ϑ₂U + ϑ₃V
  Columns: FI-hat, FI, U-hat, PI-hat, PI

Notes.
The three rows refer to two regression functions with different parametrizations: the first two rows both represent the function 2X − U, with ϑ₁ = 2, ϑ₂ = −1 in the first row and ϑ₁ = 0, ϑ₂ = 2, ϑ₃ = −1 in the second. In all cases n = 100 and π(X) = 1/(1 + e^{−X}). The covariates X, U and V are independent and uniformly distributed.

Table 5
Simulated mean squared errors of estimators of Eh(X,Y) with ϑ̂ inefficient

                        EY                  EY                  E(Xe^{XY})
                        r_ϑ(X) = cos(ϑX)    r_ϑ(X) = ϑX         r_ϑ(X) = ϑX
  π(X)           n      FI-hat    PI-hat    FI-hat    PI-hat    FI-hat    PI-hat
  1/(1+e^{-X})   50     0.03124   0.03545   0.02742   0.03868   0.50275   0.72944
                 100    0.01841   0.02057   0.01375   0.01938   0.24148   0.48759
  1              50     0.02000   0.02864   0.02689   0.05181   0.41476   0.79949
                 100    0.01016   0.01448   0.01359   0.02589   0.25796   0.63103

Notes.
We compare the fully and the partially imputed estimators of EY and E(Xe^{XY}), keeping the previous notation. Again, ϑ̂ is the least squares estimator, but now the errors are from a t-distribution with 10 degrees of freedom.

we restricted our attention to missing data and to samples of size n = 100, and considered three different regression models, which are given in Table 4. Note that the second regression function, ϑ₁ + ϑ₂X + ϑ₃U with ϑ₁ = 0, ϑ₂ = 2 and ϑ₃ = −1, equals the first one, namely 2X − U, but it involves a three-dimensional parameter. As expected, the increase in dimension impairs the performance of the fully imputed (weighted and unweighted) and of the partially imputed estimators. Note that the weighted and unweighted fully imputed estimators (FI-hat and U-hat) in the second regression model are the same: we consider the least squares estimator in a regression model with an intercept term ϑ₁. In this model the least squares estimator solves, by construction, $\sum_{j=1}^n Z_j\hat\varepsilon_j = 0$ (which implies that all weights $\hat w_j$ equal one). Again we observe that the fully imputed estimator consistently outperforms the partially imputed estimator.

We conclude this section with a small simulation study to examine the behavior of the fully imputed estimator when ϑ̂ is inefficient. The simplest setting is to choose the ordinary least squares estimator, as we did before, but in a model with non-normal errors. In Table 5 we consider estimation of the mean response and of E(Xe^{XY}) for linear and nonlinear regression, and for errors from a t-distribution. The results are similar to the previous ones: again the fully imputed estimator performs best, though not as well as when the errors are, in fact, from a normal distribution (cf. Tables 1–3). Simulations with a logistic error density turned out similarly, confirming the better performance of the imputation method. At least in these examples, with moderate sample sizes n = 50 and n = 100, the construction of ϑ̂ does not seem to be as important as the choice between the full and the partial imputation approaches.

6.3. Confidence intervals.
By Theorem 5.2 the fully imputed weighted estimator $n^{-1}\sum_{i=1}^n \hat\chi_w(X_i,\hat\vartheta)$ is asymptotically normally distributed, with asymptotic variance
$$\sigma_*^2 = E\chi^2(X,\vartheta) + (EZ)^{-1}E\bar h^2(\varepsilon) - \{1+(EZ)^{-1}\}\{Eh(X,Y)\}^2 - [E\{\varepsilon\bar h(\varepsilon)\}]^2/(\sigma^2 EZ) + D_w^\top\{E(Z\zeta\zeta^\top)\}^{-1}D_w$$
(see Theorem 5.2 for the notation). An asymptotic confidence interval for Eh(X,Y) with confidence level 1 − α is
$$\Biggl(\frac1n\sum_{i=1}^n \hat\chi_w(X_i,\hat\vartheta) - z_{\alpha/2}\sqrt{\frac{\hat\sigma_*^2}{n}},\; \frac1n\sum_{i=1}^n \hat\chi_w(X_i,\hat\vartheta) + z_{\alpha/2}\sqrt{\frac{\hat\sigma_*^2}{n}}\Biggr),$$
where $z_{\alpha/2}$ denotes the upper α/2 quantile of the standard normal distribution and $\hat\sigma_*^2$ is a consistent estimator of $\sigma_*^2$. Consider, for example, estimation of EY with r_ϑ(X) depending on a scalar parameter ϑ, which covers our previous simple examples r_ϑ(X) = ϑX and r_ϑ(X) = cos(ϑX). Here the confidence interval is $n^{-1}\sum_{i=1}^n r_{\hat\vartheta}(X_i) \pm z_{\alpha/2}(\hat\sigma_*^2/n)^{1/2}$. The asymptotic variance of $n^{-1}\sum_{i=1}^n r_{\hat\vartheta}(X_i)$ is
$$\sigma_*^2 = \operatorname{Var} r_\vartheta(X) + \frac{[E\{\dot r_\vartheta(X)\}]^2}{EZ\operatorname{Var}\{\dot r_\vartheta(X)\mid Z=1\}E\{\ell^2(\varepsilon)\}}.$$
The expectations in the formula can be estimated by empirical methods, with a consistent estimator ϑ̂ of the parameter ϑ plugged in. Consider, for example, $\operatorname{Var}\{\dot r_\vartheta(X)\mid Z=1\} = E\{\dot r_\vartheta^2(X)\mid Z=1\} - [E\{\dot r_\vartheta(X)\mid Z=1\}]^2$. The first expectation is estimated by $(\sum_{i=1}^n Z_i)^{-1}\sum_{i=1}^n Z_i\{\dot r_{\hat\vartheta}(X_i)\}^2$, and the second one analogously.

In order to confirm the theoretical results we also performed some simulation studies, generating confidence intervals for the above examples with the described estimation method. As expected, for α = 0.05 we obtained the desired coverage probability 0.95.

APPENDIX
Lemma A.1.
Suppose that Assumption 1 is satisfied. Then, for a √n-consistent estimator ϑ̂ of ϑ, the statements (3.2)–(3.4) hold.
Proof.
In order to prove (3.2)–(3.4) we first show
$$\max_{1\le i\le n}|Z_i\hat\varepsilon_i - Z_i\varepsilon_i| = o_p(1), \quad\text{(A.1)}$$
$$\sum_{i=1}^n Z_i(\hat\varepsilon_i-\varepsilon_i^*)^2 = o_p(1) \quad\text{with } \varepsilon_i^* = \varepsilon_i - \dot r_\vartheta(X_i)^\top(\hat\vartheta-\vartheta). \quad\text{(A.2)}$$
Result (A.2) immediately follows from the √n-consistency of ϑ̂ and the stochastic differentiability of r_ϑ [implication (2.1) of Assumption 1]:
$$\sum_{i=1}^n Z_i(\hat\varepsilon_i-\varepsilon_i^*)^2 = \sum_{i=1}^n Z_i[\hat\varepsilon_i - \{\varepsilon_i - \dot r_\vartheta(X_i)^\top(\hat\vartheta-\vartheta)\}]^2 \le \sum_{i=1}^n \{r_{\hat\vartheta}(X_i) - r_\vartheta(X_i) - \dot r_\vartheta(X_i)^\top(\hat\vartheta-\vartheta)\}^2 = o_p(1).$$
This gives $\max_{1\le i\le n}|Z_i(\hat\varepsilon_i-\varepsilon_i^*)| = o_p(1)$. In order to establish (A.1) it therefore suffices to show $\max_{1\le i\le n}|Z_i(\varepsilon_i^*-\varepsilon_i)| = o_p(1)$. We have
$$\max_{1\le i\le n}|Z_i(\varepsilon_i^*-\varepsilon_i)| \le \max_{1\le i\le n}|\varepsilon_i^*-\varepsilon_i| \le |\hat\vartheta-\vartheta|\cdot\max_{1\le i\le n}|\dot r_\vartheta(X_i)|.$$
Since ϑ̂ is √n-consistent we only need $n^{-1/2}\max_{1\le i\le n}|\dot r_\vartheta(X_i)| = o_p(1)$. But this holds by Owen (2001), Lemma 11.2, since the variables $|\dot r_\vartheta(X_i)|$, i = 1, ..., n, are i.i.d. and, by Assumption 1, have finite second moments. This shows $\max_{1\le i\le n}|Z_i(\varepsilon_i^*-\varepsilon_i)| = o_p(1)$.

Equation (3.2), $\max_{1\le i\le n}|Z_i\hat\varepsilon_i| = o_p(n^{1/2})$, can be seen as follows: we can bound $\max_{1\le i\le n}|Z_i\hat\varepsilon_i|$ by $\max_{1\le i\le n}|Z_i\hat\varepsilon_i - Z_i\varepsilon_i| + \max_{1\le i\le n}|Z_i\varepsilon_i|$. The first term is $o_p(1)$ by (A.1) and the second term is $o_p(n^{1/2})$ by Owen's Lemma 11.2, since the $Z_i\varepsilon_i$ are i.i.d. with finite variance. We now show (3.3), that is,
$$\frac1n\sum_{i=1}^n Z_i\hat\varepsilon_i = \frac1n\sum_{i=1}^n Z_i\varepsilon_i - EZ\,E\{\dot r_\vartheta(X)\mid Z=1\}^\top(\hat\vartheta-\vartheta) + o_p(n^{-1/2}).$$
In view of (A.2), $n^{-1}\sum_{i=1}^n Z_i\hat\varepsilon_i = n^{-1}\sum_{i=1}^n Z_i\varepsilon_i^* + o_p(n^{-1/2})$. By the law of large numbers we obtain
$$\frac1n\sum_{i=1}^n Z_i\varepsilon_i^* = \frac1n\sum_{i=1}^n Z_i\varepsilon_i - \frac1n\sum_{i=1}^n Z_i\dot r_\vartheta(X_i)^\top(\hat\vartheta-\vartheta) = \frac1n\sum_{i=1}^n Z_i\varepsilon_i - E\{Z\dot r_\vartheta(X)\}^\top(\hat\vartheta-\vartheta) + o_p(n^{-1/2}).$$
Since $E\{Z\dot r_\vartheta(X)\} = EZ\,E\{\dot r_\vartheta(X)\mid Z=1\}$ we have established (3.3). Our last auxiliary result to prove is (3.4),
$$\frac1n\sum_{i=1}^n Z_i\hat\varepsilon_i^2 = \frac1n\sum_{i=1}^n Z_i\varepsilon_i^2 + o_p(1) = EZ\sigma^2 + o_p(1).$$
The second equality is just a consequence of the law of large numbers. To see that the first equality holds, consider
$$\frac1n\sum_{i=1}^n Z_i\hat\varepsilon_i^2 - \frac1n\sum_{i=1}^n Z_i\varepsilon_i^2 = \frac1n\sum_{i=1}^n Z_i(\hat\varepsilon_i-\varepsilon_i)^2 + 2\,\frac1n\sum_{i=1}^n Z_i(\hat\varepsilon_i-\varepsilon_i)\varepsilon_i.$$
The second term on the right-hand side is $o_p(1)$ by (A.1). To show that the first expression is $o_p(1)$ it suffices, in view of (A.2), to consider
$$\frac1n\sum_{i=1}^n Z_i(\varepsilon_i^*-\varepsilon_i)^2 = \frac1n\sum_{i=1}^n Z_i\{\dot r_\vartheta(X_i)^\top(\hat\vartheta-\vartheta)\}^2.$$
This term is $o_p(1)$ since ϑ̂ is √n-consistent and since $\dot r_\vartheta(X)$ is in $L_2(P)$. □

Acknowledgments.
Many thanks to Anton Schick for important suggestions on specifying and constructing efficient parameter estimators in Section 5, to Wolfgang Wefelmeyer for valuable discussions and advice, and to Raymond Carroll for constructive criticism of an earlier draft. Thanks also to two referees for helpful comments.

REFERENCES
Bickel, P. J. (1982). On adaptive estimation. Ann. Statist.
Bickel, P. J., Klaassen, C. A. J., Ritov, Y. and Wellner, J. A. (1998). Efficient and Adaptive Estimation for Semiparametric Models. Springer, New York. MR1623559
Chen, J., Fan, J., Li, K. H. and Zhou, H. (2006). Local quasi-likelihood estimation with data missing at random. Statist. Sinica
Chen, S. X. and Wang, D. (2009). Empirical likelihood for estimating equations with missing values. Ann. Statist.
Chen, X., Hong, H. and Tarozzi, A. (2008). Semiparametric efficiency in GMM models with auxiliary data. Ann. Statist.
Cheng, P. E. (1994). Nonparametric estimation of mean functionals with data missing at random. J. Amer. Statist. Assoc.
Forrester, J., Hooper, W., Peng, H. and Schick, A. (2003). On the construction of efficient estimators in semiparametric models. Statist. Decisions
Gelman, A., Carlin, J. B., Stern, H. S. and Rubin, D. B. (1995). Bayesian Data Analysis. Chapman & Hall, London. MR1385925
Koul, H. L. and Susarla, V. (1983). Adaptive estimation in linear regression. Statist. Decisions
Liang, H., Wang, S. and Carroll, R. J. (2007). Partially linear models with missing response variables and error-prone covariates. Biometrika
Little, R. J. A. and Rubin, D. B. (2002). Statistical Analysis With Missing Data, 2nd ed. Wiley, New York. MR1925014
Maity, A., Ma, Y. and Carroll, R. J. (2007). Efficient estimation of population-level summaries in general semiparametric regression models. J. Amer. Statist. Assoc.
Matloff, N. S. (1981). Use of regression functions for improved estimation of means. Biometrika
Müller, U. U. (2007). Weighted least squares estimators in possibly misspecified nonlinear regression. Metrika
Müller, U. U., Schick, A. and Wefelmeyer, W. (2005). Weighted residual-based density estimators for nonlinear autoregressive models. Statist. Sinica
Müller, U. U., Schick, A. and Wefelmeyer, W. (2006). Imputing responses that are not missing. In Probability, Statistics and Modelling in Public Health (M. Nikulin, D. Commenges and C. Huber, eds.) 350–363. Springer, New York. MR2230741
Owen, A. B. (1988). Empirical likelihood ratio confidence intervals for a single functional. Biometrika
Owen, A. B. (2001). Empirical Likelihood. Monographs on Statistics and Applied Probability. Chapman & Hall, London.
Qin, J. and Zhang, B. (2007). Empirical-likelihood-based inference in missing response problems and its application in observational studies. J. Roy. Statist. Soc. Ser. B
Schick, A. (1987). A note on the construction of asymptotically linear estimators. J. Statist. Plann. Inference
Schick, A. (1993). On efficient estimation in regression models. Ann. Statist. Correction: (1995) 1862–1863. MR1241276
Tamhane, A. C. (1978). Inference based on regression estimator in double sampling. Biometrika
Tsiatis, A. A. (2006). Semiparametric Theory and Missing Data. Springer, New York. MR2233926
Wang, Q. (2004). Likelihood-based imputation inference for mean functionals in the presence of missing responses. Ann. Inst. Statist. Math.
Wang, Q., Linton, O. and Härdle, W. (2004). Semiparametric regression analysis with missing response at random. J. Amer. Statist. Assoc.
Wang, Q. and Rao, J. N. K. (2001). Empirical likelihood for linear regression models under imputation for missing responses. Canad. J. Statist.
Wang, Q. and Rao, J. N. K. (2002). Empirical likelihood-based inference under imputation for missing response data. Ann. Statist.

Department of Statistics
Texas A&M University
College Station, Texas 77843-3143
USA
E-mail: [email protected]