[PDF] Nonclassical Measurement Error in the Outcome Variable

Abstract

We study a semi-/nonparametric regression model with a general form of nonclassical measurement error in the outcome variable. We show equivalence of this model to a generalized regression model. Our main identifying assumptions are a special regressor type restriction and monotonicity in the nonlinear relationship between the observed and unobserved true outcome. Nonparametric identification is then obtained under a normalization of the unknown link function, which is a natural extension of the classical measurement error case. We propose a novel sieve rank estimator for the regression function and establish its rate of convergence. In Monte Carlo simulations, we find that our estimator corrects for biases induced by nonclassical measurement error and provides numerically stable results. We apply our method to analyze belief formation of stock market expectations with survey data from the German Socio-Economic Panel (SOEP) and find evidence for nonclassical measurement error in subjective belief data.

Full PDF

NNonclassical Measurement Error in theOutcome Variable ∗ Christoph Breunig † Emory University

Stephan Martin ‡ Humboldt-Universit¨at zu Berlin

September 29, 2020

We study a semi-/nonparametric regression model with a general form of nonclassicalmeasurement error in the outcome variable. We show equivalence of this model to a gen-eralized regression model. Our main identifying assumptions are a special regressor typerestriction and monotonicity in the nonlinear relationship between the observed and unob-served true outcome. Nonparametric identiﬁcation is then obtained under a normalizationof the unknown link function, which is a natural extension of the classical measurement er-ror case. We propose a novel sieve rank estimator for the regression function. We establisha rate of convergence of the estimator which depends on the strength of identiﬁcation. InMonte Carlo simulations, we ﬁnd that our estimator corrects for biases induced by measure-ment errors and provides numerically stable results. We apply our method to analyze beliefformation of stock market expectations with survey data from the German Socio-EconomicPanel (SOEP) and ﬁnd evidence for non-classical measurement error in subjective beliefdata.

Keywords:

Non-classical measurement error, rank based estimation, shape restrictions,nonparametric identiﬁcation, special regressors, generalized regression, sieve estimation. ∗ We are thankful to seminar participants at CMStatistics/CFE in London, Humboldt-Universit¨at zu Berlin, Retreatof CRC TRR 190, and UEA in Norwich for their helpful suggestions. Financial support by Deutsche Forschungsge-meinschaft through CRC TRR 190 is gratefully acknowledged. † Department of Economics, Emory University, Rich Memorial Building, Atlanta, GA 30322, USA. Email: [email protected] ‡ Humboldt-Universit¨at zu Berlin, Spandauer Straße 1, 10178 Berlin, Germany, e-mail: [email protected] a r X i v : . [ ec on . E M ] S e p . Introduction In empirical research, measurement error is a recurring issue. In recent years much attention hasbeen given to various forms of measurement error in the covariates of econometric models, whereasmeasurement error of the dependent variable is mostly ignored. In many economic environmentsmeasurement error of the dependent variable may be driven (in a nonlinear fashion) by the underlyingvariable. This nonclassical measurement error implies biased estimation results if not accounted for.This paper is concerned with semi-/nonparametric regression models where the dependent variableof interest Y ∗ is generally not observed and only a possibly error-contaminated measurement Y isobservable. Speciﬁcally, Y ∗ satisﬁes Y ∗ = g ( X ) + U, where the unknown function g is of interest given observed covariates X and unobservables U . Westudy the non-classical measurement error case where E [ Y | Y ∗ , X ] (cid:54) = Y ∗ and hence regressing theobserved outcome on covariates generally does not provide us with a consistent estimate for the truemean regression function.Nonparametric identiﬁcation of our model relies on the availability of covariates which do not aﬀectthe measurement error directly. We impose such type of exclusion restriction on a subset Z of the vector X = ( Z, W ) where W are additional controls. Under a monotonicity condition on the measurementerror mechanism E [ Y | Y ∗ , X ] the regression model can be reformulated as a generalized regressionmodel of the form E [ Y | X = x ] = H ( g ( x ) , w )where H ( · , w ) is some nonlinear, monotonic function for w in the support of W . From this gener-alized regression model, we establish identiﬁcation of functions g or g ( · , w ) up to strictly monotonictransformations.The identiﬁcation up to strictly monotonic transformations still allows us to infer economicallyrelevant quantities such as the direction and shape of partial eﬀects. Further, under scale and locationnormalization of the unknown link function H , nonparametric identiﬁcation of the regression function g is obtained. We highlight that normalization of the link function H is equivalent to imposing mildshape restrictions on the measurement error mechanism. Additionally, our normalization conditionson the link function do not only naturally extend the classical measurement case but are also satisﬁedif there is a range of Y ∗ where measurement error is classical. Our nonparametric identiﬁcation resultsbuild thus on intuitive assumptions without relying on high-level assumptions such as completeness,see Hu and Schennach [2008].We propose a novel sieve rank-based minimum distance estimator and establish its asymptoticproperties. The estimator builds on U-Statistics as well as more recent results linking U-statistics andempirical processes. We ﬁnd that the sieve rank estimator generally suﬀers from ill-posedness in theconvergence rate as the rank-based criterion function is not continuous in the usual L -norm. We alsoextend the estimator to continuous controls W using kernel weights.We analyze the performance of the estimator in a Monte Carlo simulation study and in an empirical2pplication using survey data. We apply our estimator to study belief formation with subjective beliefdata from the German Socio-Economic Panel innovation sample (SOEP-IS). Subjective belief data isknown to be plagued by substantial measurement error and it is in general hard to justify that themeasurement error is classical and thus not sensitive to the underlying true individual belief. We studythe impact of an exogenous display of historic stock market returns provided to survey respondentsprior to eliciting their belief on future returns. Applying our method, we ﬁnd a monotonic and concaverelationship between the historic information and stated beliefs indicating that individuals acknowledgethe given information conservatively. Literature

Our work ties into the literature on measurement error in observable variables of econo-metric models. The literature on measurement error in covariates is extensive, whereas measurementerror in the outcome variable has received much less attention. For a review of models with errors incovariates, see e.g. Chen et al. [2011] and Schennach [2013]. Chen et al. [2005] develop a general wayof accounting for measurement error in any variable of a class of semiparametric models once auxiliarydata, e.g. from validation samples is available. However, this is hardly the case in most practical appli-cations. Models focusing on non-classical measurement error in the outcome side are rare. Chapter 3 ofAbrevaya and Hausman [1999] considers a semiparametric model with a more simplistic measurementerror mechanism. Hoderlein and Winter [2010] and Hoderlein et al. [2015] develop structural modelsof response error in surveys due to imperfect recall and derive testable implications for econometricanalyses. The latter paper focuses on the role of rounding in individual reporting behavior which isalso a more speciﬁc form of non-classical measurement error.Nadai and Lewbel [2016] allows for classical measurement error in the outcome variable that iscorrelated with an error in covariates. Abrevaya and Hausman [2004] consider classical measurementerror of the dependent variable in a transformation model. Given we have a precise idea on theform of measurement error, a sizeable literature is usually available providing diﬀerent strategies foridentiﬁcation. For instance a special case of nonclassical measurement error is selective non-responsein the outcome variable, see e.g. D’Haultfoeuille [2010] or Breunig et al. [2018] and references therein.A non-nested form of nonclassical measurement error are Berkson-type errors, see Berkson [1950] andSchennach [2013, Section 6.3].Our identifying assumptions lead us to the literature on generalized regression models as introducedin Han [1987] or the class of nonlinear index models in Matzkin [2007]. See also the model studiedin Jacho-Chavez et al. [2010]. Estimation of such models often proceeds by rank-based estimationstrategies, see Han [1987], Cavanagh and Sherman [1998], Khan [2001], Shin [2010] and Abrevayaand Shin [2011] which all consider parametric regression models with the exception of Matzkin [1991]who studies a nonparametric model with additional shape restrictions on the link function. A recentcontribution studying rank estimators in a high-dimensional setting is Fan et al. [2020]. To the bestof our knowledge, we are the ﬁrst to study nonparametric M-estimation with rank-based criterionfunctions and to point out and illustrate the ill-posedness of the estimation problem.The remainder of the paper is organised as follows. In Section 2 we present our model setup andgive a nonparametric identiﬁcation result for features of the mean regression function when there isa form of non-classical measurement error in the outcome variable. In Section 3 we introduce a sieveestimator with a rank based criterion function and establish its convergence. In Section 4 we analyze3nite sample properties of the estimator in a Monte Carlo simulation study. Section 5 contains anapplication of our method to belief formation of stock market expectations. Appendix A providesdescriptive statistics on the empirical data. Appendix B provides an extension to weighted sieve rankestimation, when control variables are continuous. All proofs are postponed to the Appendix C.

2. Model Setup and Identiﬁcation

We consider a nonparametric econometric model with measurement error in the outcome variable.The model we study is Y ∗ = g ( X ) + U (2.1)where Y ∗ is the scalar, outcome variable, X is a d x -dimensional vector of exogenous covariates, U is ascalar error term, and g a nonparametric function of interest. The outcome variable Y ∗ is not observedby the researcher; only an error contaminated measurement Y is available. We are primarily interestedin the case where the error satisﬁes E [ U | X ] = 0 and thus g is the unknown conditional expectationfunction of Y ∗ given X .Throughout the paper we assume that the regressors X can be decomposed such that X = ( Z (cid:48) , W (cid:48) ) (cid:48) where Z has no direct eﬀect on the measurement error and W are control variables. Also we introducethe notation g w ( · ) ≡ g ( · , w ) for the regression function evaluated at a ﬁxed w in the support of W .Our goal is to identify and estimate the unknown g under possibly non-classical measurement errorin the outcome variable. In the next section we therefore present restrictions on the model and formof the measurement error which give us a nonparametric identiﬁcation result for certain features of g ,i.e., the function g w up to strictly monotonic transformations. Assumption 1 (Exclusion Restriction) . The observed outcome Y is conditionally mean independentof Z given Y ∗ and W , i.e., E [ Y | Y ∗ , Z, W ] = E [ Y | Y ∗ , W ] . Assumption 1 rules out that Z has a direct eﬀect on the measurement Y in conditional means.Assumption 1 is generally weaker than assuming that the conditional distribution of Y given ( Y ∗ , Z, W )does not depend on Z , which restricts Z to have no information on Y that is not captured by ( Y ∗ , W ).Analogues exclusion restrictions are commonly imposed in the literature on non-classical measurementerror in covariates. In Assumption 2 (ii) of Hu and Schennach [2008] the distribution of the error-contaminated regressor is independent of instruments conditional on the latent regressor (see alsoSchennach [2013, Section 4.3]). Assumption 1 is less restrictive than other exclusion restrictions foundin the measurement error literature, see Ben-Moshe et al. [2017, Assumption 2.1 (iii)].A similar condition to Assumption 1 can also be found in the literature on selective non-responsewhich is a special case of non-classical measurement error in the outcome. Individuals either reportthe outcome truthfully (response indicator D = 1) or not at all ( D = 0) so the observed outcome inthis case is Y = DY ∗ . An identifying assumption in D’Haultfoeuille [2010] and Breunig et al. [2018]is that D ⊥⊥ X | ( Y ∗ , W ) which is similar to our Assumption 1. See also Tang et al. [2003] and Zhaoand Shao [2015] for similar conditions. 4n the following, we make use of the notation h ( Y ∗ , W ) = E [ Y | Y ∗ , W ]. Assumption 1 implies themeasurement error model Y = h ( Y ∗ , W ) + V where E [ V | Y ∗ , W ] = 0. Consequently, Assumption 1 implies conditional mean independence of themeasurement error V given the regression error U , that is, E [ V | U ] = 0. Below, for any random variable X , its support is denoted by supp ( X ). We now impose shape restrictions on the conditional meanfunction h . Assumption 2 (Monotonicity) . For any w ∈ supp ( W ) , the function h ( · , w ) is weakly monotonic andnon-constant over the support of Y ∗ . Assumption 2 imposes that the expected observed outcome Y is monotonic in the latent outcome Y ∗ given W . This is trivially satisﬁed when the measurement error is classical, i.e., when h does notdepend on W and is the identity. A similar monotonicity condition has also been imposed in themeasurement error model in Example 3 of Abrevaya and Hausman [1999]. We discuss the plausiblity of Assumption 2 in a setting with survey data in Example 2.1. Note that h does not need to be strictly monotonic which allows to consider models with rounding error in theoutcome, see Hoderlein et al. [2015]. Assumption 3 (Conditional Exogeneity) . The conditional independence restriction Z ⊥⊥ U | W holds. Assumption 3 imposes a conditional independence restriction of Z and the regression error U . Thiscondition is also known as conditional exogeneity assumption following White and Chalak [2010]. In-dependence assumptions can be restrictive, but are often required in the measurement error literature(see, e.g. Hausman et al. [1991], Schennach [2007], Ben-Moshe et al. [2017, Assumption 2.2]), or whenaccounting for endogeneity using control functions (see, e.g. Newey et al. [1999]). We relax the suchrestrictions by imposing independence only conditional on control variables W . Similar conditionsare often employed for identiﬁcation in the econometrics literature, see e.g. Chiappori et al. [2015]for nonparametric identiﬁcation in a transformation model. It corresponds to the unconfoundednessassumption in the treatment eﬀects literature and is also closely related to the special regressor as-sumption, see Lewbel [2014] for a review.A key implication of Assumptions 1–3 is E [ Y | X = x ] = E [ h ( Y ∗ , W ) | Z = z, W = w ]= E [ h ( g ( Z, W ) +

U, W ) | Z = z, W = w ]= E [ h ( g ( z, W ) + U, W ) | W = w ]=: H ( g w ( z ) , w ) (2.2) In our notation Abrevaya and Hausman [1999] consider the error mechanism Y = h ( Y ∗ , V ), with ∂ y h ( Y ∗ , V ) > ∂ v h ( Y ∗ , V ) > V ⊥⊥ ( X, U ). As we allow for heteroscedasticity in the measurement error model, condition ∂ y h ( Y ∗ , V ) > H is a function that is strictly monotonically increasing in its ﬁrst argument as we show inthe proof of our main identiﬁcation result below. As we see from the previous display, nonclassicalmeasurement error implies heterogeneous biases for the marginal eﬀects. When ∂ z H ( g w ( z ) , w ) < attenuation bias for the marginal eﬀect ∂ z g w ( z ) and when ∂ z H ( g w ( z ) , w ) > augmentation bias for ∂ z g w ( z ).This is a type of model that is closely related to the class of generalized regression models studiedby Han [1987], Matzkin [1991], Cavanagh and Sherman [1998] and Matzkin [2007]. Our identiﬁcationargument exploits the monotonicity in (2.2) to establish identiﬁcation of g w up to strictly monotonictransformations based on arguments from the literature on generalized regression models. Example 2.1 (Example with Subjective Belief Data) . In this example, we discuss the plausibility ofour assumptions in our empirical application. In Section 5, we analyze how individuals adapt theirbeliefs on future stock market returns when they are provided with information on historical returns.Survey respondents are presented two randomly chosen realizations from a series of historical stockindex returns which we denote as Z and Z . They are asked to state their belief on stock returnsin the next year. Let Y ∗ denote an individual’s true belief and Y their reported belief. In this case, E [ Y ∗ | Z , Z ] characterizes variation in individual beliefs with respect to historic information. For thisexample we discuss the validity of our identifying Assumptions 1–3. Figure 1: Nonparametric estimates of g . The estimate in the right panel is obtained by assumingat most classical measurement error in the outcome variable, whereas the left panel showsresults obtained from applying our correction. Assumption 1 requires that historical return variations have no information on the mean of reportedbeliefs Y which is not already captured by the true, latent belief Y ∗ . Assumption 2 translates to the ild requirement that individuals with higher beliefs report on average (weakly) larger values thanrespondents with lower beliefs. This weak inequality is important as it allows for ﬂexible forms ofrounding as the reporting function h ( · ) in Assumption 2 can map neighborhoods of Y ∗ into ﬂat regionsof Y . Assumption 3 implies that higher moments of the regression error U conditional on Z , Z are constant, which rules out e.g. conditional heteroskedasticity since we neglect additional controlvariables W in this example. A further analysis, using more control variables is presented in Section5. Figure 1 compares estimation results from ignoring measurement error in the outcome to ourproposed correction.We see that our estimate exhibits a monotone, concave relationship between treatments and beliefs.On the other hand, without accounting for selective measurement error, we obtain a heterogenous, inpart ﬂat or convex relationship which results in a diﬀerent interpretation. Below, we introduce the notation supp ( V ) for the support of a random vector V . Assumption 4.

For any w ∈ supp ( W ) : (i) the function g w is continuous, (ii) and any z , z ∈ supp ( Z ) such that g w ( z ) < g w ( z ) there exists u ∈ supp ( U ) satisfying h ( g w ( z ) + u, w ) < h ( g w ( z ) + u, w ) ; (iii) there is at least one variable Z (1) in Z satisfying f Z (1) | Z ( − ,W ( z | z − , w ) > for all ( z , z − ) ∈ supp ( Z ) . Assumption 4 (ii) is a mild support condition on U conditional on W = w . The unobservable U must vary suﬃciently to shift g w ( Z ) out of a ﬂat region of h . The assumption is not required if h isalready strictly monotonic in its ﬁrst argument. Assumption 4 (iii) requires Z to contain at least onecontinuously distributed variable with suﬃcient variation. The case with scalar Z is allowed with theassumption becoming f Z | W ( z | w ) > z ∈ supp ( Z ). This rules out the case of Z being a discretescalar variable. Lemma 2.1.

Let Assumptions 1–4 be satisﬁed, then for any w ∈ supp ( W ) the function g w ( · ) isidentiﬁed up to strictly increasing transformations. The identiﬁcation result in Lemma 2.1 builds on Matzkin [2007, Theorem 3.2]. Without furthermodel restrictions we are able to point identify those features of g w that are preserved under strictlymonotonic transformations. This includes the sign of partial eﬀects, the ratio of two partial eﬀectsand properties such as quasi-concavity/convexity of the function. For the remainder of the paper weconsider the estimation of g w in the point identiﬁed case.Economic restrictions on the model can be employed to suﬃciently restrict the function space. Werefer to the discussion in Sections 3.4 and 4.4 in Matzkin [2007] where several possible function spacesare discussed that satisfy Assumption 5. This includes the spaces of function that are homogeneousof degree one, additive separable and so called “least-concave” functions, see also Matzkin [1994].Matzkin [2007] shows that imposing homogeneity of degree 1 and a location normalization is suﬃcientfor Assumption 5. Homogeneous functions are frequently encountered in microeconomics. Thus, inapplications where the function g has the structural interpretation of a production or cost function,homogeneity can be a reasonable restriction on the parameter space. In a general mean regressionsetting, however, it is not clear why the regression function should satisfy such a property. Same holdstrue for the least-concavity property. 7e impose the following restriction on the model and the measurement error mechanism describedby the function H . Assumption 5. (i) The function g w is additively separable such that there exists a decomposition Z = ( Z , Z − ) such that g w ( Z ) = m w ( Z ) + l w ( Z − ) for some functions m w , l w . (ii) There exists { z , z } ⊂ supp ( Z ) with g w ( z ) (cid:54) = g w ( z ) and E [ Y | Z = z, W = w ] = E [ Y ∗ | Z = z, W = w ] for z ∈ { z , z } . Assumption 5 (i) imposes an additive separable structure on the regression function g w . Followingthe identiﬁcation statement in Lemma 2.1 mere location and scale normalizations are not suﬃcient topoint identify g w . However for any additive separable model this is the case. See also Jacho-Chavezet al. [2010] who study identiﬁcation of (2.2) under Assumption 5 (i). Assumption 5 (ii) is a restrictionon the measurement error mechanism and supposes that there are points z , where the correct g w coincides with the function obtained from ignoring the measurement error. For instance, one canthink of pension information to account for nonclassical measurement error in labor income surveyquestions (see Breunig and Haan [2018]). Here, for certain ranges of labor income (e.g. close to themedian) we may assume that the measurement error is of classical form. Another interesting feature ofAssumption 5 is that it implies a normalization of the unknown, nonparametric link function H . Thisis in contrast to nonparametric generalized regression models where the normalization is imposed onthe unknown function of interest Assumption 5 (ii) is also in line with normalization requirements foridentiﬁcation under nonclassical measurement error. For instance, Assumption 5 of Hu and Schennach[2008] requires some functional of the distribution of the measurement error conditional on the valueof the true variable to be equal to the true variable itself, such as some quantile of Y | Y ∗ = y ∗ tocorrespond to y ∗ . Corollary 2.2.

Let Assumptions 1– 5 (i) be satisﬁed, then g w ( · ) is identiﬁed up to a location and scalenormalization. If 5 (ii) is additionally satisﬁed then g w is point identiﬁed. Corollary 2.2 establishes identiﬁcation of the regression function under normalization imposed inAssumption 5, which essentially is a shape restriction on the functional form of measurement error.For an alternative identiﬁcation argument for transformed additively separable models see Theorem2.1. of Jacho-Chavez et al. [2010].We neither restrict the support of the observed outcome Y , nor require continuity in the function h ( · , w ). Thus, we can also cover cases where the observed outcome is categorical or has mass points.This likely occurs in survey data as respondents tend to provide rounded values. The following remarksconsider two important special cases of model (2.1) which shed a diﬀerent light on the interpretationof Assumptions 1–3. Remark 2.1 (Control function approach) . We can also motivate the presence of W in Assumption 3as a control function. To this end we deviate for a moment from our previous notation and introducethe following triangular model Y ∗ = g ( X ) + UX = m ( Z, η ) 8 here for simplicity X is a one-dimensional endogenous covariate that may correlate with the modelerror U . m ( Z, η ) is strictly monotonic in η and Z is an appropriate instrument that satisﬁes Z ⊥⊥ ( U, η ) Under additional regularity conditions outlined in Theorem 1 of Imbens and Newey [2009] it holds that X ⊥⊥ U | W with W = F X | Z ( X, Z ) = F η ( η ) Then if Assumption 1 is formulated as E [ Y | Y ∗ , X, W ] = E [ Y | Y ∗ , W ] and if the latter function ismonotonic as in Assumption 2 we can follow the same reasoning leading up to Theorem 2.1 to establishthat g is identiﬁed up to a strictly monotonic transformation. Also note that our method may be appliedin any setting where Y ∗ is an endogenous regressor within a triangular model as it is the outcomevariable in the reduced form equation. Remark 2.2 (Selective Nonresponse) . Consider a nonresponse model Y = DY ∗ D = φ ( Y ∗ , W, V ) , for some unknown function φ , where the response indicator D ∈ { , } is always observed and Y ∗ is only observed if D = 1 . This framework, where the response mechanism is mainly driven by thelatent outcome Y ∗ has been studied by D’Haultfoeuille [2010] and Breunig et al. [2018]. As long asthe conditional mean function h ( Y ∗ , W ) = P ( D = 1 | Y ∗ , W ) Y ∗ is monotonic in its ﬁrst argument, themodel is in accordance to Assumption 2. This holds e.g. when the conditional response probabilityfunction is monotonic and the support of Y ∗ is bounded below . In contrast to D’Haultfoeuille [2010]and Breunig et al. [2018] there is no need for a completeness condition for nonparametric identiﬁcationof the conditional selection probability P ( D = 1 | Y ∗ , W ) via conditional moment restrictions.

3. Estimation and Asymptotic Properties

In this section we introduce a nonparametric sieve M-estimator with a simple, rank-based criterionfunction. For simplicity we consider only the case where W consists of discrete variables and defer theestimation with continuous W to Section B of the appendix. Our identiﬁcation result builds on shape restrictions imposed on the measurement error mechanismwhich imply identiﬁed moment conditions. Speciﬁcally, for a given w we can conclude from the If Y ∗ is bounded below, then Y ∗ can be redeﬁned such that without loss of generality Y ∗ ≥ h ( Y ∗ , W ) = P ( D = 1 | Y ∗ , W ) Y ∗ follows from taking the derivative. g w maximizes the function Q ( φ, w ) = E [ Y { φ ( X ) > φ ( X ) } | W = W = w ] . Based on this population criterion we now consider a sieve rank estimator, which implicitly accountsfor imposed shape restrictions on the measurement error.We introduce a sieve space G K which depends on the dimension parameter K = K ( n ) which growswith sample size n . and thus we suggest the following estimator (cid:98) g w of g w that maximizes a populationanalogue (cid:98) g w = arg max φ ∈G K Q n ( φ, w ) where (3.1) Q n ( φ, w ) := 2 n ( n − (cid:88) ≤ i φ ( Z j ) } . For the special case where W is absent the criterion reduces to Q n ( φ ) = 1 n ( n − n (cid:88) i =1 Y i Rank( φ ( Z i )) (3.2)which is a nonparametric version of the criterion of Cavanagh and Sherman [1998]. We henceforthrefer to (cid:98) g w as the (weighted) sieve rank estimator.In Section B of the appendix we illustrate how to deal with the case of continuous W by introducingkernel weights instead of the indicator function. The speciﬁc choice of G K hinges on the chosennormalization. Under a normalization of the link function H , see Corollary 2.2, we may consider alinear sieve space G K = { φ : φ ( z ) = γ (cid:48) w p K ( z ) } . Let p K = ( p , . . . , p K ) be a K - dimensional vector ofknown basis functions such as polynomials, splines or similar.We can in principal also apply the general sieve estimation technique of Chen [2007] based on theconditional moment restriction E [ Y | X = x ] = H ( g w ( z ) , w ). This would require to estimate H alongwith g w and nesting of two sieve spaces.Our estimation strategy constructively arises from the identiﬁcation argument and provides a simpledirect estimate of g w . We also directly leverage the monotonicity condition on H in the estimation sothere is no need to introduce additional shape-constraints. In this section we develop the convergence rate of the estimator in (3.1). To keep notation simplewe omit the controls W entirely from the following analysis. Estimation amounts to maximizing thecriterion in (3.2) from the previous section over a suitable sieve space. Under slight amendments to thenotation the omission of W is without loss of generality as long as W contains only discrete variables.The estimator in (3.1) is equivalent to applying Q n on a subset of the available sample conditional ona ﬁxed realization of W . Estimation when W contains continuous variables is deferred to Section Bof the appendix. 10or the remainder of the paper we consider the centered criterion function Q ( φ ) = E (cid:2) Y i (cid:0) { φ ( Z i ) > φ ( Z j ) } − { g ( Z i ) > g ( Z j ) } (cid:1)(cid:3) (3.3)where g is the regression function satisfying the model equation (2.1). Centering does not change themaximizer in the optimization problem and is thus without loss of generality. Let Q ( φ − g ) := ∂ ∂τ Q ( g + τ ( φ − g )) (cid:12)(cid:12)(cid:12) τ =0 denote the second directional derivative of the non-linear functional Q in the direction φ − g . Weassume that the functional Q ( · ) is bi-linear and continuous.Then Q ( · ) is a norm which we introduce here, as it is not possible to establish (local) equivalenceof |Q ( · ) | and e.g. the L -norm (cid:107)·(cid:107) L ( Z ) . Here, we denote L ( Z ) = { φ : (cid:107) φ (cid:107) L ( Z ) < ∞} where (cid:107) φ (cid:107) L ( Z ) := (cid:112) E φ ( Z ).The ﬁrst directional derivative of Q is equal to zero for any arbitrary direction and thus not acandidate to construct a weak norm locally equivalent to |Q ( · ) | . We instead use the second directionalderivative which can be viewed as a quadratic approximation to the criterion function Q ( · ). Weintroduce the following sieve measure of ill-posedness τ K = sup φ ∈G K (cid:107) φ − Π K g (cid:107) L ( Z ) Q ( φ − Π K g )to account for the fact that the criterion function and the L -norm are generally not (locally) equivalent.If τ K → ∞ as K → ∞ the problem of estimating g is ill-posed in rate and additional regularizationslows down convergence in the strong L - norm.For the following assumption we introduce a local neighborhood of g and deﬁne the space G δK = { φ ∈ G K : (cid:107) φ − g (cid:107) L ( Z ) < δ } with δ > Assumption 6. (i) A random sample { ( Y i , Z i ) } ni =1 of ( Y, Z ) is observed; (ii) there exists Π K g ∈ G K such that (cid:107) Π K g − g (cid:107) L ( Z ) = O ( K − α/d z ) ; (iii) E [ U ] < ∞ and g ∈ L ( Z ) ; (iv) for any φ in G δK thereexists a constant < η < such that |Q ( φ ) − Q ( φ − g ) | ≤ η · Q ( φ − g ) ; (v) the cdf of g ( Z ) isLipschitz continuous, i.e., | F g ( Z ) ( a ) − F g ( Z ) ( b ) | ≤ C | a − b | for some constant C and any a, b ; and (vi) τ K (cid:112) K/n = o (1) . Assumption 6 (ii) is satisﬁed for many combinations of smoothness classes for g and sieve bases.Examples are the H¨older and Sobolev classes in Chen [2007] and spline or wavelet sieves. Assumption6 (iv) is also known as the tangential cone condition and implies that Q ( φ ) is locally equivalent to Q ( φ − g ) which is a typical condition required to derive the convergence rate for sieve estimators; seeAssumption 4.1(ii) of Chen and Pouzo [2012] and also Dunker et al. [2014].Assumption 6 (v) amounts to a local continuity assumption for the kernel of an empirical process(see Lemma C.1 in the appendix) which is also standard, see e.g. Condition 3.8. in Chen [2007].Assumption 6 (vi) restricts the growth of K relative to the sieve measure of ill-posedness τ K and isrequired for consistency, see Lemma C.2 in the Appendix.11o give an insight on the source of ill-posedness, note that Q ( φ ) = E (cid:2) Y i (cid:0) F g ( Z i ) | Y i ( g ( Z j )) − F φ ( Z i ) | Y i ( φ ( Z j )) (cid:1)(cid:3) which shows that if there is little variation in the distribution of F g ( Z ) | Y for variations of g then theill-posed inverse problem becomes more severe. This is further illustrated by the following lemmawhere we study a special case for which we can derive Q analytically and give suﬃcient conditions forAssumption 6 (iv). Lemma 3.1.

Consider the additive separable model g ( Z ) = Z + (cid:101) g ( Z ) with bivariate Z = ( Z , Z ) .Then Assumption 6 (iv) is satisﬁed if f (cid:48) Z | Z is uniformly bounded away from zero and f (cid:48)(cid:48) Z | Z is uniformlybounded above. The special case outlined in Lemma 3.1 illustrates the behavior of τ K . If the density f Z | Z , thatis the conditional density of the separable covariate, is ﬂat in the relevant support, we may encounterthe case that the criterion Q is close to zero for candidate functions that are arbitrarily far away fromthe true function in the L - sense.We further illustrate this issue in a Monte Carlo simulation study in Section 4, where we showthat the estimation problem is more severely ill-posed whenever f Z | Z is ﬂat. So the behavior of τ K will generally depend on the distribution of observables as well as the speciﬁc model under study andchosen normalization. Theorem 3.2.

Let Assumptions 1-6 be satisﬁed. It holds that (cid:107) (cid:98) g − g (cid:107) L ( Z ) = O p (cid:16) max (cid:110) τ K (cid:114) Kn , K − α/d z (cid:111)(cid:17) The proof is based on the Hoeﬀding decomposition of U-statistics and makes use of a representationof second-order U-processes as empirical processes as in Clemencon et al. [2008]. To the best of ourknowledge this is the ﬁrst convergence rate result for nonparametric M-estimators with a rank-basedcriterion function and thus also the ﬁrst to acknowledge the ill-posedness of the problem.The next corollary provides concrete rates of testing when the dimension parameter K is chosento level variance and square bias under classical smoothness conditions. We call our model mildlyill-posed if: τ k ∼ k γ/d z with γ > severely ill-posed if: τ k ∼ exp( k γ/d ), with γ > Corollary 3.3.

Let Assumptions 1-6 be satisﬁed.1. Mildly ill-posed case: setting K ∼ n d z /d z +2 γ +2 α yields (cid:107) (cid:98) g − g (cid:107) L ( Z ) = O p ( n − α/ α +2 γ + d z ) .

2. Severely ill-posed case: setting K ∼ log( n ) d/γ yields (cid:107) (cid:98) g − g (cid:107) L ( Z ) = O p (log( n ) − α/γ ) . If { a n } and { b n } are sequences of positive numbers, we use the notation a n (cid:46) b n if lim sup n →∞ a n /b n < ∞ and a n ∼ b n if a n (cid:46) b n and b n (cid:46) a n .

4. Monte Carlo Simulation Study

This section demonstrates how non-classical measurement errors in the outcome alters mean regressionresults in ﬁnite samples and shows the usefulness of our approach to correct for such biases. We compareregression function estimates obtained from simply ignoring the measurement error with our estimatorwhich accounts for the presence of the error. Throughout this section, simulation results are based ona sample of size of n = 1000 and 1000 Monte Carlo iterations.We consider the following data generating process Y ∗ = Z + g ( Z ) + UY = h ( Y ∗ ) + V where Z ∼ N (1 , σ ), Z ∼ U [ − ,

3] independent of each other, g ( · ) = sin( · ) and the error terms( U, V ) ∼ N (0 , I ). Here, I is the 2-dimensional identity matrix and for the standard deviation of Z we choose σ = 1, which will be varied later. In the above model, g is identiﬁed up to a locationnormalization. Analogously we could specify a linear or nonlinear function on Z and impose anadditional scale normalization on g . The function h in the measurement error equation is chosen as, h ( Y ∗ ) =  q . + b ( Y ∗ − q . ) if Y ∗ > q . Y ∗ , if q . ≤ Y ∗ ≤ q . q . − a ( q . − Y ∗ ) if Y ∗ < q . where q . , q . denote the 30%- and 70%-quantile of Y ∗ . The setup is analogous to a survey datasetting with over- or underreporting in the tails of Y ∗ , whereas the center of the distribution is notaﬀected. The scalars a, b can be chosen to vary the magnitude of measurement error.Figure 2 illustrates the eﬀects of the measurement error for the case a = b = 0 .

5. We show therealizations of Y and Y ∗ for a speciﬁc draw of the data generating process and plots the function h .We compare the measurement error function h (depicted as red solid line) with the setup of classicalmeasurement error, which is captured by the 45 ◦ line (depicted as black dashed line).We implement the sieve rank estimator (cid:98) g given in (3.1) using a linear sieve space with B-splinebasis functions of order 3 with 2 interior knots that are placed according to quantiles of the empiricaldistribution. Thus we have K = 4. The elements of the sieve space are normalized to move throughthe point (0,0) which is the correct value of the true function sin( · ) at 0. This normalization canalso be rationalized as utilizing prior knowledge on the measurement error mechanism in the senseof Assumption 5 (ii). For instance, we can expect that ignoring the measurement error results inestimates that are close to the true function g in the center of the distribution of Z . Figure 3 showsthe sieve rank estimates (cid:98) g and compares them to a nonparametric series regression that does not13 − Y * Y Figure 2: Realizations of Y ∗ , Y when a = b = 0 . n = 1000. The redsolid line depict the function h and the black dashed line the 45 ◦ line.account for nonclassical measurement error in the outcome using the same order and the same knotplacement as for (cid:98) g . For the latter estimator the same choice of basis functions and tuning parametersis adopted. We study the cases a = b = 0 . a = b = 0 which essentiallyimplies that at some point the measurements Y are merely random ﬂuctuations around a constantvalue . We observe that our estimation strategy results in an accurate estimate of g in both cases,whereas ignoring the measurement error yields estimates with a sizeable bias in the tails of Z . In thesevere setting depicted in the right panel, ignoring measurement error results in a rather ﬂat estimatewhich is signiﬁcantly diﬀerent from the sieve rank estimator.The data generating process chosen here is in line with the model in Lemma 3.1 and thus allows usto study the degree of ill-posedness in the convergence rate of the estimator. As pointed out in thediscussion following Lemma 3.1, the behavior of the sieve measure of ill-posedness τ K is governed bythe conditional density f Z | Z . If the density f Z | Z is ﬂat over the relevant support, τ K diverges fasterand the ill-posedness is more severe.Figure 4 below shows estimates across diﬀerent standard deviations of the separable covariate Z which aﬀects the slope of the density f Z | Z . For small standard deviations, the conditional density f Z | Z will be rather ﬂat over the better part of the support.We can see that pointwise conﬁdence bands increase when the standard deviation of Z is low. Forstandard deviations of 0.5, 1 and 2 we see very small diﬀerences in the bias of our estimator (cid:98) g . For the We perform Kolmogorov-Smirnov tests to test the hypothesis that Y and Y ∗ follow the same probability distributionon every drawn sample of the MC study. In the a = b = 0 . a = b = 0 case we reject the null in 966 cases. Thus in the strong ME setting, Y and Y ∗ have a very similar marginal distribution in contrast to the mild ME setting. − − mild ME, a=b=0.5 Z g ( Z ) −3 −2 −1 0 1 2 3 − − strong ME, a=b=0 Z g ( Z ) Figure 3: Estimation results normalized to go through the coordinate (0 , (cid:98) g , solid red line is the median of a series estimator withsame B-splines speciﬁcation, solid blue line shows true g ( · ) function, and dashed black linesare the 0.95 and 0.05 quantiles over all Monte Carlo rounds. In the left panel we choose a = b = 0 . a = b = 0.case where the standard deviation is lowest, not only conﬁdence bands blow up but also bias increaseswhich suggests that a tiny variation in Z will ultimately also result in erroneous estimation results.The last row shows that increasing the standard deviation more does not lead to smaller conﬁdencebands. This observation is in line with the ﬁnite sample behavior of estimators with ill-posed rate. Ifthe density f Z | Z is ﬂat, τ K diverges faster and we need to choose a smaller number of basis functionsto control the variance.

5. Application: Beliefs on Stock Returns in SOEP

Subjective beliefs on stock market returns are a key variable in economic models that seek to explainstock market participation and portfolio choice, see e.g. Breunig et al. [2019] and the references therein.However subjective belief data is known to be prone to a large degree of measurement error, see thediscussion and references in Drerup et al. [2017]. These authors also argue for the presence of types inthe population which do not hold stable beliefs on stock market returns and thus report rather noisybeliefs that do not help in explaining economic behavior empirically.Acknowledging the presence of diverse measurement error in subjective belief data, it is diﬃcultto rationalize a classical measurement error assumption a priori. In this application we study meanregressions where the outcome variable is a subjective belief measure. We account for the possibilityof non-classical measurement error in the outcome by applying our methodology and comparing it tostandard regression techniques that do not account for this form of measurement error. Thereby we arethe ﬁrst to explicitly acknowledge the possibly nonclassical nature of measurement error in subjectivebelief data. 15 − − s = Z g ( Z ) −3 −2 −1 0 1 2 3 − − − s = Z g ( Z ) −3 −2 −1 0 1 2 3 − − s = Z g ( Z ) −3 −2 −1 0 1 2 3 − − s = Z g ( Z ) Figure 4: Estimation results with varying standard deviation σ of Z . Otherwise, the same legendapplies as in Figure 3. Note the diﬀerent scale on the y-axis in the ﬁrst and second panel.Via experimental interventions Breunig et al. [2019] vary subjective beliefs exogenously to determinethe causal impact of individual beliefs in portfolio choices. However, they ﬁnd that the treatments didnot suﬃciently shift reported beliefs. To this end, we study the impact of historic return informationon subjective beliefs of future stock market returns, allowing for nonclassical measurement error in theoutcome variable. The idea is that displaying historic return information to survey respondents priorto eliciting their beliefs can serve as an exogenous shift to their beliefs. This can be seen as a ﬁrststage of a more general analysis tackling the endogeneity of subjective beliefs.We use novel data from the innovation sample of the 2017 wave of the german socio-economic panel(SOEP-IS), which contains survey questions on individual beliefs on future stock market returns. Inthe interviews, respondents are asked what they believe how much the DAX, Germanys prime bluechip stock market index, will change in one, two, ten and thirty years with respect to the current level.They are asked to provide a direction of the change (increase or decrease) as well as a percentagechange. 16efore the individuals are asked about their beliefs, they obtain information about historical DAXreturns. Two observations of the series of yearly DAX returns from 1951 to 2016 are randomly chosenand presented to the respondent. Afterwards they are asked to report their beliefs on how the DAXchanges in the next year (in percentage points).In this application we are interested in the eﬀect of the historical DAX information on the individualsexpected DAX return in one year. Let Y ∗ denote the individual true belief on the DAX return inone year and let Z , Z be the two treatment variables, i.e. the randomly drawn historical returns.The reported belief is denoted by Y . In general we cannot be sure that reported beliefs are free ofnonclassical measurement error and in the following we want to account for it. The data consistsof 1084 interviewed persons but 306 people do not respond to the question on beliefs. We removedmissing values and report the summary statistics below.Min. 1. Quant Median Mean 3. Quant. Max. Y -50.00 1.00 4.00 3.55 7.00 130.00 Z -43.94 -6.08 11.36 14.77 29.06 116.06 Z -43.94 -6.08 13.99 17.13 34.97 116.06Table 1: Summary Statistics (all units are percentage points)We begin the analysis by considering the following additively separable model Y ∗ = g ( Z ) + g ( Z ) + m ( W ) + U, where Z ⊥⊥ U | W (5.1)which under the identifying assumptions 1-3 leads to the model E [ Y | X ] = H [ g ( Z ) + g ( Z ) + m ( W ) , W ]where W contains all observable variables that may have a direct eﬀect on both the latent belief Y ∗ (via some function m ) as well as the measurement Y . By the experimental nature of Z , the randomvariables are credibly fully independent of any observables in W , and thus we refrain from specifying W explicitly. Though the model choice is restrictive in that it ignores any interaction eﬀects betweenthe two treatments, the restrictions imposed on the measurement error mechanism are rather mild. Inaddition to the latent outcome Y ∗ any other observable variable W and even unobservables may havea direct eﬀect on the reporting Y in the sense of Assumption 1.We estimate functions g , g with our method outlined in (3.2) and contrast the results to estimatesobtained from assuming classical measurement error, i.e. from a standard additive-separable, nonpara-metric regression of Y on Z and Z . We choose a B-Spline basis of degree 2 and the number of basisfunctions is K = 2 for each function estimate (resulting from 10-fold cross-validation). The results arepresented in Figure 5 along with implementational details. Note that the absolute value of the y-axisin the left column is not informative as location and scale of g and g are not identiﬁed. Comparingboth estimates, accounting for nonclassical measurement error in the outcome leads to more concaveestimates which implies that the historic information is processed more conservatively.The main concern with the above model is that it ignores interactive eﬀects of both treatments.17

20 −10 0 10 20 30 40 50

ME correction Z g ( Z ) −20 −10 0 10 20 30 40 50 . . . . . ME ignored

Z_1 g ( Z ) −20 −10 0 10 20 30 40 50 Z_2 g ( Z ) −20 −10 0 10 20 30 40 50 . . . . . Z_2 g ( Z ) Figure 5: Left column: Plots of sieve rank estimators. Right column: Series estimators ignoring ME.Thus the resulting qualitative diﬀerences in estimates may be explained by interactive eﬀects thatare captured by the presence of H in (5.1). In order to study a fully nonparametric function of bothtreatments we incorporate additional variables and study the following model Y ∗ = g ( Z , Z ) + Z (cid:48) β + m ( W ) + U where Z ⊥⊥ U | W (5.2)where ( Z , W ) are now additional control variables which contribute to individual beliefs Y ∗ . Here,we assume that Z does directly eﬀect the measurement Y in the sense of Assumption 1 while W mayhave such eﬀects.The SOEP-IS contains respondents’ socio-demographics such as age, gender, tertiary degree in-formation as well as self-assessed cognitive skill measures and personality traits. The educationalinformation (“tertdegree”) and cognitive skill measures (“ire01”, “ire02”) are summarized in W as webelieve they may have a direct impact on the measurement Y . This is backed by ﬁndings of Breuniget al. [2019] where well-educated individuals react to a manipulation of their beliefs. For them thedegree of measurement error may generally be smaller. All remaining control variables are summarizedin Z these contain age, gender, risk attitudes and diﬀerent measures of personality traits, see Table 2in Section A of the appendix for more detail.We impose our identifying Assumptions 1–3 are valid. Compared to the model (5.1) the exclusionof additional variables Z from the set W is necessary to achieve identiﬁcation of g up to locationand scale normalizations. We believe that once we control for cognitive skills in W , variables like age,gender and other personal characteristics do not shift the mean reported beliefs.18herefore the model is more ﬂexible with respect to Z , Z at the cost of assuming that none of thevariables in Z have a direct eﬀect on the measurement Y in the sense of Assumption 1, and that theydo not violate other model assumptions such as the conditional independence in (5.2). The overallcorrelation between the variables in Z and W is negligible, see the correlogram in Figure 7 in SectionA of the appendix. We estimate (5.2) by simply varying the criterion in 3.2 over g ( Z , Z ) + Z (cid:48) β which is equivalent to imposing that Z ⊥⊥ W which does not appear to be critical due to the overalllow correlation of cognitive skills with the remaining covariates. In other cases the weighted rankestimator outlined in Section B of the appendix needs to be used.Again we compare the estimate for g from our method to estimates obtained from ignoring themeasurement error. We choose a bivariate B-Spline basis of degree 2 with K = 4 basis functions. Thisminimizes the 10-fold cross-validation of the nonparametric regression model ignoring the measurementerror and is therefore adapted to our sieve rank estimator.Results are presented in Figure 6. Again, accounting for the measurement error leads to a concave,symmetric eﬀect of both treatments on the individual beliefs. When ignoring the possibility of mea-surement error, results are much more asymmetric, including zero eﬀects of the second treatment andconvex parts in the surface. In contrast, our method yields that individuals learn conservatively fromboth treatments which is in line with the a priori intuition. Note that the z-axis of the sieve rankestimate is not informative since we can choose an arbtirary location and scale normalization of thefunction. The functions are evaluated on a grid ranging from -20 to 50 which corresponds to the 10%-and 90%-quantile of the marginal distributions of the treatment variables.Summarizing the results from the two diﬀerent models considered above, accounting for nonclassicalmeasurement error with our sieve rank methods, yields stable results that are in line with economicexpectations. Ignoring nonclassical measurement error leads to spuriously more interesting ﬁndings inthat the eﬀects of both treatments on beliefs appear diﬀerent and range from ﬂat to convex marginals ofthe regression function. As soon as we account for the potential measurement error in the outcome weﬁnd that that marginal eﬀects of both treatments are symmetric and concave hinting at a conservativelearning from the historical information.

6. Conclusion

This paper provides new insights on the analysis of regression models with non-classical measurementerror in the outcome variable. Our nonparametric identiﬁcation result is based on intuitive assumptionsinvolving shape restrictions on measurement error functions. This novel result builds on the equivalenceof nonclassical measurement models and generalized regression models. We propose a novel sieve rankestimator which constructively arises from our identiﬁcation result and implicitly accounts for therequired shape restrictions. We establish the rate of convergence of the sieve rank estimator whichis aﬀected by a potentially ill-posed inverse problem. The proposed estimation method is easy toimplement and provides numerically stable results as demonstrated in a ﬁnite sample analysis. Finally,we demonstrate the usefulness of our method in an empirical application on belief elicitation wherewe ﬁnd measurement error in subjective belief data to be of a non-classical form.19igure 6: Nonparametric estimates of g ( Z , Z ). The ﬁrst column contains the estimate from our sieverank estimator and the second column the estimate from ignoring measurement error. A. Additional Data Description

Below we give summary statistics on additional key variables. Other variables included are “ibl11”-“ibl18” which are categorical answers to questions measuring the perseverance of a respondent. Vari-ables “isb011”-“isb015” contain answers to questions measuring personality traits such as reciprocityand patience. W consists of “ire01”, “ire02” and “tertdegree”, the remaining variables are summa-rized in Z . Below we summarize the correlation structure in the data, with most variables beinguncorrelated. 20in. 1. Quant Median Mean 3. Quant. Max.age 18.00 39.00 54.00 52.95 66.75 94.00female 0 0 0 0.4448 1 1tertdegree 0 0 0 0.1601 0 1prisk -1 2 4 3.931 6 9ire01 -5 20 40 38.14 50 100ire02 -5 30 40 42.57 50 100Table 2: Summary Statistics of key variables. Age in years, female and tertdegree are dummy variablesindicating female gender and whether a tertiary degree has been obtained. prisk is a score(0-10, -1 indicating nonresponse) of risk preferences, ire01 and ire02 are self-assessed scores(0-100, -1 indicating nonresponse) for calculatory skills and knowledge on nature. B. Extension: Estimation with Continuous W When W does contain continuous variables, we can simply replace the indicator in (3.1) with a kernelfunction to account for the fact that W i = W j = w is a null event. Then estimation can proceed with (cid:98) g w = arg max φ ∈G K Q n ( φ, w ) where (B.1) Q n ( φ, w ) := (cid:88) ≤ i φ ( Z j ) } where K h is deﬁned as K s ( W i − w ) = d w (cid:89) l =1 K (cid:18) W l,i − ws l (cid:19) and K : R → R is some kernel function and s ∈ R d w a vector of bandwidths.As we move from the original criterion of Cavanagh and Sherman [1998] to the conditional versionwith continuous W the computational complexity of the maximization problem increases. Ranking isan O ( n log( n )) operation whereas the weighted ranking is performed in O ( n ) time. This implies thatthe conditional estimation method is not scalable to large data sets and computation time increasesheavily with the sample size.The following criterion can be used to deal with continuous W and computation time scales in n . Q n ( φ, w ) = (cid:88) ≤ i φ ( Z j ) } = (cid:88) i : w − s

Assume the function g ( Z ) does not depend on W . We can consider the followingestimator (cid:98) g = arg max φ ∈G K Q n ( φ ) where Q n ( φ ) := (cid:88) ≤ i φ ( Z j ) } In contrast to before we consider only those observations in a neighborhood around a ﬁxed value w butwe choose the weights according to which distance any pair ( W i , W j ) has to each other. Similar to theapproach in (3.5) this is associated with increasing computational complexity as the computation time oes not scale with the sample size.We thus suggest the following strategy:First use the criterion in (B.2) to obtain estimates (cid:98) g w across diﬀerent values of w ∈ supp ( W ) . Eachis an estimate of g as g does not depend in theory on w , but estimation results may nevertheless varyfor diﬀerent w . Second, aggregate the diﬀerent estimates (cid:98) g w to one ﬁnal estimator for g . To this end,we can follow Chiappori et al. [2015] which discuss the following two ’aggregation’ procedures. (cid:98) g LS ( z ) = arg min q ∈ R (cid:90) supp ( W ) ν ( w )[ (cid:98) g ( z, w ) − q ] dw (cid:98) g LAD ( z ) = arg min q ∈ R (cid:90) supp ( W ) ν ( w ) | (cid:98) g ( z, w ) − q | dw where ν is some weighting function with (cid:82) supp ( W ) ν ( w ) dw = 1 .The implementation is simple. Random draws from { W i } Ni =1 yields a set of diﬀerent realizations w on which to evaluate the local estimators (cid:98) g w . The LS criterion takes the average of the local estimators,the LAD criterion takes the empirical median to aggregate to a ﬁnal estimator for g . In simulationsChiappori et al. [2015] ﬁnd that the latter estimator performs best as for w in the tails of the distributionof W we may get erratically behaving (cid:98) g w . B.1. Weighted Rank Estimation

In this section we assess the performance of a weighted rank estimator for a setting as described inRemark B.1. We consider the following data generating process similar to Section 4, Y ∗ = Z + g ( Z ) + m ( W ) + U · W Y = h ( Y ∗ + W ) + V · | W | where g ( · ) = sin( · ), m ( · ) = cos ( · ), W = 0 . · Z + 0 . · U and the remaining variables as in Section 4with h parameterized by a = b = 0. In this setting there is correlation between Z and W . Furtherthe measurement is additionally aﬀected by the variable W . This setting is in line with Remark B.1as g does not vary with W , and we implement the procedure outlined at the end of this remark withthe LAD-criterion as aggregating procedure.In order to calculate an estimate of g for each Monte Carlo sample, we ﬁrst take 50 random draws ofthe variable W , calculate (cid:98) g w by maximizing (B.2) for each of the 50 diﬀerent realizations w . Finally,we aggregate the results to a ﬁnal estimate by taking the sample median over the local estimates (cid:98) g w .We vary the bandwidth parameter (cid:101) s across diﬀerent experiments. The sample size is n = 1000 and500 Monte Carlo replications are considered. The following Figure 8 shows the results.If we choose s reasonably small, our estimation procedure is quite close to the truth and outperformsthe standard nonparametric estimator that simply ignores the measurement error. Increasing thebandwidth s leads to smaller conﬁdence bands, but considerably increases the bias of the estimate.However in this strong measurement error setting, the weighted sieve rank estimator still outperformsthe estimate from ignoring the measurement error.23 − − s=0. 05 Z g ( Z ) −3 −2 −1 0 1 2 3 − − s=0. 25 Z g ( Z ) −3 −2 −1 0 1 2 3 − − s=0. 5 Z g ( Z ) −3 −2 −1 0 1 2 3 − − s=1 Z g ( Z ) Figure 8: The blue line is the g ( · ) = sin( · ) function, the solid black line denotes the median and thedotted lines the respective 0.95 and 0.05 quantiles of the weighted sieve rank estimator overthe Monte Carlo experiments. The red line is the median of series estimates of g in themodel Y = Z + g ( Z ) + m ( W ) + U . Basis functions are set as in Section 4 with K = 4. C. Proofs and Technical Results

Proof of Lemma 2.1.

First, recall that X = ( Z, W ) and that g w = g ( · , w ). The criterions weconsider are Q ( φ, w ) = 12 E [ H ( g ( X ) , W ) { φ ( X ) > φ ( X ) } + H ( g ( X ) , W ) { φ ( X ) < φ ( X ) } | W = W = w ]24here E [ Y | X ] = H ( g ( X ) , W ) and by the LIE also Q ( φ, w ) = E [ Y { φ ( X ) > φ ( X ) }| W = W = w ]. Without loss of generality we consider the case that Assumption 2 holds with h ( · , w ) weaklymonotonically increasing. The argument for the decreasing case is analogous.We begin by noting that g w is a maximizer of Q ( · , w ), as Q ( g w , w ) = 12 E [max { H ( g ( X ) , W ) , H ( g ( X ) , W ) } | W = W = w ] , which follows by monotonicity of H in its ﬁrst argument. Let m ( · ) be some arbitrary strictly increasingfunction. Then m ◦ g w is as well a maximizer of Q ( · , w ). This implies that without Assumption 5 theregression function g w is at best identiﬁed up to strictly increasing transformations.Below, we show that for any function (cid:101) g w (cid:54) = m ◦ g w for an arbitrary strictly monotonic transfor-mation m we have that Q ( (cid:101) g w , w ) < Q ( g w , w ). That is, a function that is not a strictly monotonictransformation of g w never maximizes the criterion Q ( · , w ) and thus for an arbitrary w ∈ supp ( W ), g w is identiﬁed up to a strictly monotonic transformation. Take some arbitrary function φ ∈ G that isnot a strictly monotonic transformation of g w . Therefore there exist points z (cid:48) and z (cid:48)(cid:48) in the supportof Z such that, g w ( z (cid:48) ) < g w ( z (cid:48)(cid:48) ) and φ ( z (cid:48) ) > φ ( z (cid:48)(cid:48) ). By Assumption 4 (ii) H ( · , w ) is strictly monotonicand it holds for every w that H ( g w ( z (cid:48) ) , w ) < H ( g w ( z (cid:48)(cid:48) ) , w )By continuity of the functions following Assumption 4 (i) the above inequalities hold in neighborhoods B around z (cid:48) and B around z (cid:48)(cid:48) , respectively. By Assumption 4 (iii) these neighborhoods have a strictlypositive probability measure. This implies Q ( g w , w ) − Q ( φ, w ) ≥ E [ H ( g w ( Z ) , W ) − H ( g w ( Z ) , W ) | Z , Z ∈ B × B , W = W = w ] × P ( Z , Z ∈ B × B | W = W = w ) > , with the last inequality following from the strictly positive probability of regions B × B . Thus Q ( · , w ) is only maximized by g w and strictly monotonic transformations of it. Hence g w is identiﬁedup to a strictly monotonic transformation. Proof of Corollary 2.2.

Under Assumption 5 (i) any candidate regression function (cid:101) g w ( Z ) = (cid:101) m w ( Z ) + (cid:101) l w ( Z − ) must satisfy (cid:101) g w ( Z ) = M w ( g w ( Z )) = M w ( m w ( Z ) + l w ( Z − ) = (cid:101) m w ( Z ) + (cid:101) l w ( Z − )for a strictly monotonic function M w . Thus M w must be linear and g w is identiﬁed up to locationand scale transformation. Indeed, given linear and strictly monotonic transformations, g w is the onlymaximizer of Q ( · , w ). Under Assumption 5 (ii) we have that g w ( z ) = E [ Y | Z = z , W = w ] and g w ( z ) = E [ Y | Z = z , W = w ] and ﬁxing the parameter space to move through both points leads to g w being the unique maximizer of Q ( · , w ) over G and thus g w is point identiﬁed.25 roof of Lemma 3.1. Let Z , Z be independent copies of Z . Consider the additive separable case g ( Z ) = Z + (cid:101) g ( Z ) with bivariate Z = ( Z , Z ). Analogously we denote φ ( Z ) = Z + (cid:101) φ ( Z ).The following holds for the criterion Q|Q ( φ ) | = E [ Y ( { Z + (cid:101) g ( Z ) > g ( Z ) } − E [ Y { Z + (cid:101) φ ( Z ) > φ ( Z ) } ]= E [ Y ( F Z | Z ( φ ( Z ) − (cid:101) φ ( Z )) − F Z | Z ( g ( Z ) − (cid:101) g ( Z )))] , as g is the maximizer of Q and with the second equation due to the law of iterated expectation. Usinga second-order Taylor decomposition with directional derivatives yields for all φ in a neighborhoodaround g |Q ( φ ) | = Q g ( φ − g ) + E [ Y f (cid:48)(cid:48) Z | Z ( ξ )( (cid:101) φ ( Z ) − (cid:101) g ( Z ) + (cid:101) g ( Z ) − (cid:101) φ ( Z )) ] (cid:124) (cid:123)(cid:122) (cid:125) = R , where ξ is some intermediate variable and Q g denotes the directional derivative of Q at g which isgiven by Q g ( φ − g ) = E [ Y f (cid:48) Z | Z ( g ( Z ) − (cid:101) g ( Z ))( (cid:101) φ ( Z ) − (cid:101) g ( Z ) + (cid:101) g ( Z ) − (cid:101) φ ( Z )) ] . Applying the Cauchy-Schwarz inequality to Q g ( φ − g ) shows that Q g is weaker than the L -norm.Further, the remainder term R satisﬁes | R | ≤ E (cid:34)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) f (cid:48)(cid:48) Z | Z ( ξ ) f (cid:48) Z | Z ( g ( Z ) − (cid:101) g ( Z )) ( (cid:101) φ ( Z ) − (cid:101) g ( Z ) + (cid:101) g ( Z ) − (cid:101) φ ( Z )) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:35) · Q g ( φ − g )and thus the tangential cone condition in Assumption 6 (iv) is satisﬁed if the ﬁrst factor on the righthand side is bounded between 0 and 1. The lower bound holds directly and the upper bound is easilysatisﬁed if the δ − neighborhood around g is chosen suﬃciently small and derivatives of the density arebounded away from zero and inﬁnity, as is condition.For the proof of the next results, we require some additional notation to deal with the Hoeﬀdingdecomposition of U-statistics, speciﬁc function spaces and their respective envelope functions.We introduce the empirical criterion Q n ( φ ) that can be denoted as Q n ( φ ) = 2 n ( n − (cid:88) ≤ i φ ( Z j ) } − { g ( Z i ) > g ( Z j ) } (cid:1) indexed by φ ∈ G K making it a second-order U-process. Note that Q n is centered here which does More precisely ξ = g ( Z ) − (cid:101) g ( Z ) + s [ φ ( Z ) − (cid:101) φ ( Z ) + (cid:101) g ( Z ) − g ( Z )] for some s ∈ (0 , Q given in (3.3) satisﬁes Q ( φ ) = E [Γ( S i , S j , φ )].For the asymptotic analysis we make use of the Hoeﬀding decomposition of a U-statistic (see e.g.van der Vaart [1998]) Q n ( φ ) = Q ( φ ) + ν n ( φ ) + ξ n ( φ ) (C.1)with short hand notations ν n ( φ ) := 1 n n (cid:88) i =1 ν ( S i , φ ) ,ν ( S i , φ ) := E [Γ( S i , S j , φ ) | S i ] + E [Γ( S j , S i , φ ) | S i ] − E [Γ( S i , S j , φ )] ,ξ n ( φ ) := 2 n ( n − (cid:88) ≤ i φ ( Z j ) } − { g ( Z i ) > g ( Z j ) }| Z i ]+ E [ Y j ( { φ ( Z j ) > φ ( Z i ) } − { g ( Z j ) > g ( Z i ) } ) | Z i ] − E [ Y i ( { φ ( Z i ) > φ ( Z j ) } − { g ( Z i ) > g ( Z j ) } )] |≤| Y i | + 3 E [ | Y i | ] , where (cid:13)(cid:13) F ν (cid:13)(cid:13) L ( S ) ≤ (cid:112) E [ Y ] =: C ν . In addition we have F ξ ( S i , S j ) = 2 | Y i | + 2 E [ | Y i | ] as | ξ ( S i , S j , φ ) | = | Y i ( { φ ( Z i ) > φ ( Z j ) } − { g ( Z i ) > g ( Z j ) } ) − Y i E [ { φ ( Z i ) > φ ( Z j ) } − { g ( Z i ) > g ( Z j ) }| Z i ] − E [ Y i ( { φ ( Z i ) > φ ( Z j ) } − { g ( Z i ) > g ( Z j ) } ) | Z j ]+ E [ Y i ( { φ ( Z i ) > φ ( Z j ) } − { g ( Z i ) > g ( Z j ) } )] | and (cid:13)(cid:13) F (cid:13)(cid:13) L ( S ) ≤ (cid:112) E [ Y ] =: C η . By Assumption 6 (iii) we have C ν , C η < ∞ . Ultimately, we deﬁnethe bracketing integral J [] of the space F ν,K J [] (1 , F ν,K , L ( S )) = (cid:90) (cid:113) N [] ( (cid:15) · (cid:13)(cid:13) F ν (cid:13)(cid:13) L ( S ) , F ν,K , L ( S )) d(cid:15). F ξ,K . Proof of Theorem 3.2.

We begin by noting that consistency of (cid:98) g in the L -norm follows fromLemma C.2. Due to the consistency result in Lemma C.2, we may restrict the function spaces to alocal neighborhood around g , i.e. we deﬁne the space G δK = { φ ∈ G K : (cid:107) φ − g (cid:107) L ( Z ) < δ } and assumethat (cid:98) g ∈ G δK . Further we introduce the space G δ,r n K = { φ ∈ G δK : Q g ( φ − g ) > M r n } where M >

0. Itholds that P (cid:0) Q g ( (cid:98) g − g ) ≥ M r n (cid:1) ≤ P (cid:32) sup φ ∈G δ,rnK Q n ( φ ) ≥ Q n (Π K g ) (cid:33) ≤ P (cid:32) sup φ ∈G δ,rnK Q ( φ ) + ν n ( φ ) + ξ n ( φ ) ≥ Q (Π K g ) + ν n (Π K g ) + ξ n (Π K g ) (cid:33) , by applying the Hoeﬀding decomposition (C.1). Due to Assumption 6 (iv) we have local equivalenceof |Q ( · ) | and Q g ( · ). Since Q ( · ) is negative and thus |Q ( · ) | = −Q ( · ) it follows that P (cid:0) Q g ( (cid:98) g − g ) ≥ M r n (cid:1) ≤ P  sup φ ∈G δ,rnK (cid:16) Q ( φ ) + ν n ( φ ) − ν n (Π K g ) + ξ n ( φ ) − ξ n (Π K g )) (cid:17) ≥ − ηQ g (Π K g − g )  ≤ P  sup φ ∈G δ,rnK (cid:16) ν n ( φ ) − ν n (Π K g ) + ξ n ( φ ) − ξ n (Π K g ) + ηQ g (Π K g − g ) (cid:17) ≥ inf φ ∈G δ,rnK |Q ( φ ) |  ≤ P (cid:32) sup φ ∈G δK ν n ( φ ) − ν n (Π K g ) + sup φ ∈G δK ξ n ( φ ) − ξ n (Π K g ) + ηQ g (Π K g − g ) ≥ C M r n (cid:33) , where it remains to study the asymptotic behavior of each summand in the last line separately. Notethat both summands on the left hand-side are positive, hence if sup G δK ν n ( φ ) is bounded in probabilityso is ν n (Π K g ) and similarly for ξ n .First we study the asymptotic behavior of the empirical process part sup φ ∈G δK ν n ( φ ). Recall thedeﬁnition F ν,K = { ν ( · , φ ) : φ ∈ G δK } with envelope F ν . By applying the last display of Theorem 2.14.2of van der Vaart and Wellner [2000] we can conclude that E (cid:12)(cid:12)(cid:12) sup φ ∈G δK ν n ( φ ) (cid:12)(cid:12)(cid:12) = E (cid:12)(cid:12)(cid:12) sup ν ∈F ν,K n n (cid:88) i =1 ν ( S i ) (cid:12)(cid:12)(cid:12) ≤ J [] (1 , F ν,K , L ( S )) · (cid:13)(cid:13) F ν (cid:13)(cid:13) L ( S ) · n − / where (cid:13)(cid:13) F ν (cid:13)(cid:13) L ( S ) ≤ (cid:13)(cid:13) F ν (cid:13)(cid:13) L ∞ ( S ) ≤ C ν < ∞ . By Lemma C.1 (i) and (ii) we havelog N [] ( (cid:15) · (cid:13)(cid:13) F ν (cid:13)(cid:13) L ∞ ( S ) , F ν,K , L ∞ ( S )) ≤ c K log(1 /(cid:15) · C − ν )and ultimately we obtain J [] (1 , F ν,K , L ∞ ( S )) = O ( √ K ) and by Markov’s inequality sup φ ∈G δK ν n ( φ ) = O p ( (cid:112) K/n ). 28t remains to analyze the convergence rate of the degenerate U-process sup φ ∈G δK ξ n ( φ ). Similar toLemma A.1 in Clemencon et al. [2008] we can make use of the following equality for second-orderU-statistics1 n ( n − (cid:88) i (cid:54) = j ξ ( S i , S j , φ ) = 1 n ! (cid:88) π (cid:98) n/ (cid:99) (cid:98) n/ (cid:99) (cid:88) i =1 ξ ( S i , S (cid:98) n/ (cid:99) + i , φ ) (C.2)where π is short-hand for all permutations of { , . . . , n } . Then applying the triangle inequality to(C.2) leads to E (cid:34)(cid:12)(cid:12)(cid:12) sup φ ∈G δK n ( n − (cid:88) i (cid:54) = j ξ ( S i , S j , φ ) (cid:12)(cid:12)(cid:12)(cid:35) ≤ E (cid:12)(cid:12)(cid:12) sup φ ∈G δK (cid:98) n/ (cid:99) (cid:98) n/ (cid:99) (cid:88) i =1 ξ ( S i , S (cid:98) n/ (cid:99) + i , φ ) (cid:12)(cid:12)(cid:12) (C.3)from which we can conclude that for obtaining the convergence rate of the degenerate U-process onthe left-hand side of (C.3) it is suﬃcient to analyze the convergence rate of an empirical process withkernel ξ indexed by the function G δK .The kernel ξ contains non-smooth indicator functions so we cannot apply the exact same reasoningwe used earlier to derive a bound for ν n , as ξ ( S i , S j , φ ) is not continuous in φ . However we can use thefact that ξ ( · , · , φ ) belongs to a VC- subgraph family and we can thus derive the complexity bound inLemma C.1 (iii).Recall the deﬁnition F ξ,K = { ξ ( · , · , φ ) : φ ∈ G δK } and the associated envelope function F ξ . Now weapply Theorem 2.14.1 of van der Vaart and Wellner [2000] E (cid:12)(cid:12)(cid:12) sup φ ∈G δK (cid:98) n/ (cid:99) (cid:98) n/ (cid:99) (cid:88) i =1 ξ ( S i , S (cid:98) n/ (cid:99) + i , φ ) (cid:12)(cid:12)(cid:12) ≤ J [] (1 , F ξ,K , L ( S )) (cid:13)(cid:13) F ξ (cid:13)(cid:13) L ( S ) (cid:98) ( n/ (cid:99) − / Applying Lemma C.1 (iii) we obtain the bound J [] (1 , F ξ,K , L ( S )) ≤ (cid:90) (cid:112) c + c K log(1 /(cid:15) ) d(cid:15) = O ( √ K )and by Markov’s inequality that sup φ ∈G δK ξ n ( φ ) = O p ( (cid:112) K/n ). Finally, we can conclude that P ( Q g ( (cid:98) g − g ) ≥ M r n ) ≤ P (cid:32) sup φ ∈G δK ν n ( φ ) + sup φ ∈G δK ξ n ( φ ) + Q g (Π K g − g ) ≥ C M r n (cid:33) . with sup φ ∈G δK ν n ( φ ) = O p ( (cid:112) K/n ) and sup φ ∈G δK ξ n ( φ ) = O p ( (cid:112) K/n ). Consequently, choosing r n =max { (cid:112) K/n, Q g (Π K g − g ) } we see that the right hand side probability converges to zero as M → ∞ .29hus Q g ( (cid:98) g − g ) = O p ( r n ). By the deﬁnition of the sieve measure of ill-posedness τ K we obtain (cid:107) (cid:98) g − g (cid:107) L ( Z ) ≤ τ K Q g ( (cid:98) g − g ) ≤ τ K O p (cid:16) max { (cid:112) K/n, Q g (Π K g − g ) } (cid:17) = O p (cid:16) τ K (cid:112) K/n, (cid:107) Π K g − g (cid:107) L ( Z ) (cid:17) which concludes the proof. Lemma C.1.

Under Assumption 6 it holds that(i) sup (cid:107) φ − g (cid:107) ∞ ≤ δ | ν ( S i , φ ) | ≤ M ( S i ) · δ with E [ M ( S i )] < ∞ ,(ii) log N [] ( (cid:15), F ν,K , L ∞ ( S )) ≤ c K log(1 /(cid:15) ) for some positive constant c ,(iii) log N ( (cid:15), F ξ,K , L ( S )) ≤ c + c K log(1 /(cid:15) ) , for positive constants c , c . Proof of Lemma C.1.

Proof of part (i). It holds that ν ( S i , φ ) = Y i E [ { φ ( Z i ) > φ ( Z j ) } − { g ( Z i ) > g ( Z j ) }| Z i ]+ E [ Y j ( { φ ( Z j ) > φ ( Z i ) } − { g ( Z j ) > g ( Z i ) } ) | Z i ] − E [ Y i ( { φ ( Z i ) > φ ( Z j ) } − { g ( Z i ) > g ( Z j ) } )]We make use of the fact that as (cid:107) φ − g (cid:107) ∞ ≤ δ and thus g ( z ) − δ ≤ φ ( z ) ≤ g ( z ) + δ for any z in thesupport of Z . Following Chen et al. [2003] (p. 1599-1600) we have thatsup (cid:107) φ − g (cid:107) ∞ ≤ δ | { φ ( Z j ) < φ ( Z i ) } − { g ( Z j ) < g ( Z i ) }| ≤ | { g ( Z j ) < φ ( Z i ) + δ } − { g ( Z j ) < g ( Z i ) − δ }| and thus | ν ( S i , φ ) | ≤| Y i | · | F g ( Z ) ( φ ( Z i ) + δ ) − F g ( Z ) ( g ( Z i ) − δ ) | + | E [ Y i | Z i ] | · | F g ( Z ) ( φ ( Z i ) + δ ) − F g ( Z ) ( g ( Z i ) − δ ) | + | E [ Y i ] | · E [ | F g ( Z ) ( φ ( Z i ) + δ ) − F g ( Z ) ( g ( Z i ) − δ ) | ] ≤ ( | Y i | + | E [ Y i | Z i ] | + | E [ Y i ] | ) · δ where the last inequality follows from Assumption 6 (v), the Lipschitz continuity for the cdf of g ( Z ).Deﬁne M ( S i ) = | Y i | + | E [ Y i | Z i ] | + | E [ Y i ] | . From Assumption 6 (iii) follows that E [ M ( S i )] < ∞ whichconcludes the argument.We continue with the proof of part (ii). By Lemma C.1 (i) we havelog N [] ( (cid:15), F ν,K , L ∞ ( S )) ≤ log N [] ( (cid:15), G K , L ∞ ( Z )) ≤ cK log(1 /(cid:15) )where both inequalities are due to Chen [2007] (pp. 5595 and 5601).We conclude with the proof of part (iii). We make use of the decomposition ξ ( S i , S j , φ ) = ξ ( S i , S j , φ )+ ξ ( S i , S j , φ ) where ξ ( S i , S j , φ ) = Γ( S i , S j , φ ) and ξ ( S i , S j , φ ) = − E [Γ( S i , S j , φ ) | S i ] − E [Γ( S i , S j , φ ) | S j ] + E [Γ( S i , S j , φ )] . N ( (cid:15), F ξ,K , L ( S )) ≤ log N ( (cid:15), F ξ ,K , L ( S )) + log N ( (cid:15), F ξ ,K , L ( S )) . Similar to the proof of part (ii) of Lemma C.1 we obtain log N ( (cid:15), F ξ ,K , L ( S )) ≤ cK log(1 /(cid:15) ) forsome constant c . Below, we follow Chapter 5 of Sherman [1993] to establish that F ξ ,K belongs to aVC-subgraph class. To this end deﬁne the subgraphsubgraph ( ξ ( · , · , φ )) = { ( s i , s j , t ) ∈ supp ( S ) × R : 0 < t < y i [ { φ ( z i ) > φ ( z j ) } − { g ( z i ) < g ( z j ) } ] } = { y i > }{ φ ( z i ) − φ ( z j ) > }{ t > }{ t < F ξ ( z i , z j ) }{ g ( z i ) − g ( z j ) < }∪ { y i < }{ φ ( z i ) − φ ( z j ) < }{ t > }{ t < F ξ, ( z i , z j ) }{ g ( z i ) − g ( z j ) > } and introduce the function m ( t, s , s ; γ , γ , π , π ) := γ t + γ y + ( g ( z ) , p K ( z )) (cid:48) π + ( g ( z ) , p K ( z )) (cid:48) π with the associated function space M = { m ( · , · , · ; γ , γ , π , π ) : γ ∈ R , γ ∈ R , π ∈ R K +1 , π ∈ R K +1 } . Note that M is a ﬁnite vector space of dimension 2( K + 2) and the subgraph can be written assubgraph ( ξ ( · , · , φ )) = { m > }{ m > }{ m > }{ m > }{ m > }∪ { m > }{ m > }{ m > }{ m > }{ m > } (C.4)with functions m i ∈ M for any i = 1 , . . . ,

10. Following e.g. Lemma 2.4 and 2.5 in Pakes and Pollard[1989] it can be established that subgraph ( ξ ( · , · , φ )) belongs to a VC-class of sets and thus the space F ξ is a VC-class of functions. To bound the complexity of the space we require the VC-index of F ξ which we denote as V ( F ξ ) = V (subgraph( ξ )).From Pollard [1984, Lemma 18] it follows that V ( { m i > } ) ≤ K + 2). Applying in van der Vaartand Wellner [2009, Theorem 1.1] to (C.4) then leads to V (subgraph( ξ )) (cid:46) K + 2), so the VC-indexof the space F ξ increases with the same order as the sieve dimension K . Now applying van der Vaart[1998, Theorem 2.6.7] yieldslog N ( (cid:15), F ξ ,K , L ( S )) ≤ log( C · V ( F ξ )(16 e ) V ( F ξ ) (1 /(cid:15) ) V ( F ξ ) − )= log( C ) + log(2( K + 2)) + 2( K + 2) log(16 e ) + 2( K + 2) log(1 /(cid:15) )and together with log N ( (cid:15), F ξ ,K , L ( S )) ≤ cK log(1 /(cid:15) ) the stated result follows. Lemma C.2.

Under Assumptions 1–6 it holds that (cid:107) (cid:98) g − g (cid:107) L ( Z ) = o p (1) . Proof of Lemma C.2.

We need to check the conditions in Lemma A.2 of Chen and Pouzo [2012].31n their notation Q n = Q and g ( k, n, (cid:15) ) = inf φ ∈G K : (cid:107) φ − g (cid:107) L Z ) ≥ (cid:15) |Q ( φ ) | Their condition a is thus satisﬁed and g ( n, k, (cid:15) ) > K the following holds |Q (Π K g ) − Q ( g ) | (cid:46) Q g (Π K g − g ) (cid:46) τ − K (cid:107) Π K g − g (cid:107) L ( Z ) , and thus Q (Π K g ) − Q ( g ) = o (1). Next, Condition c is implicitly assumed to hold and it remains tocheck condition d which translates asmax {|Q (Π K g ) − Q ( g ) | , sup φ ∈G K |Q n ( φ ) − Q ( φ ) |} g ( n, k, (cid:15) ) = o (1) . Analogous to the empirical process result from (C.2) and (C.3) and the subsequent proceedings, itholds that sup φ ∈G K |Q n ( φ ) − Q ( φ ) | (cid:46) (cid:112) K/n . Then ultimately consider that for any (cid:15) > (cid:15) ∗ > g ( k, n, (cid:15) ) = inf G K : (cid:107) φ − g (cid:107) L Z ) ≥ (cid:15) |Q ( φ ) | ≥ inf G K : (cid:107) φ − g (cid:107) L Z ) ≥ (cid:15) ∗ Q g ( φ − g ) ≥ inf G K : (cid:107) φ − g (cid:107) L Z ) ≥ (cid:15) ∗ τ − K (cid:107) φ − g (cid:107) L ( Z ) ≥ τ − K (cid:15) ∗ . In summary we require thatmax {|Q (Π K g ) − Q ( g ) | , sup φ ∈G K |Q n ( φ ) − Q ( φ ) |} (cid:14) g ( n, k, (cid:15) ) (cid:46) τ K max { (cid:112) K/n, τ − K (cid:107) φ − g (cid:107) L ( Z ) } = o (1) , which follows from the rate restriction in Assumption 6 (vi). References

J. Abrevaya and J. A. Hausman. Semiparametric estimation with mismeasured dependent variables:an application to duration models for unemployment spells.

Annales d’Economie et de Statistique ,pages 243–275, 1999.J. Abrevaya and J. A. Hausman. Response error in a transformation model with an application toearnings-equation estimation.

The Econometrics Journal , 7(2):366–388, 2004.J. Abrevaya and Y. Shin. Rank estimation of partially linear index models.

The Econometrics Journal ,14(3):409–437, 2011.D. Ben-Moshe, X. D’Haultfœuille, and A. Lewbel. Identiﬁcation of additive and polynomial modelsof mismeasured regressors without instruments.

Journal of Econometrics , 200(2):207–222, 2017.32. Berkson. Are there two regressions?

Journal of the American Statistical Association , 45(250):164–180, 1950.C. Breunig and P. Haan. Nonparametric regression with selectively missing covariates. arXiv preprintarXiv:1810.00411 , 2018.C. Breunig, E. Mammen, and A. Simoni. Nonparametric estimation in case of endogenous selection.

Journal of Econometrics , 202(2):268 – 285, 2018.C. Breunig, S. Huck, T. Schmidt, and G. Weizs¨acker. The standard portfolio choice problem ingermany.

CRC TRR 190 Discussion Paper , (171), 2019.C. Cavanagh and R. P. Sherman. Rank estimators for monotonic index models.

Journal of Economet-rics , 84(2):351–381, 1998.X. Chen. Large sample sieve estimation of semi-nonparametric models.

Handbook of Econometrics ,2007.X. Chen and D. Pouzo. Estimation of nonparametric conditional moment models with possibly nons-mooth generalized residuals.

Econometrica , 80(1):277–321, 2012.X. Chen, O. Linton, and I. Van Keilegom. Estimation of semiparametric models when the criterionfunction is not smooth.

Econometrica , 71(5):1591–1608, 2003.X. Chen, H. Hong, and E. Tamer. Measurement Error Models with Auxiliary Data.

The Review ofEconomic Studies , 72(2):343–366, 04 2005.X. Chen, H. Hong, and D. Nekipelov. Nonlinear models of measurement errors.

Journal of EconomicLiterature , 49(4):901–37, December 2011.P.-A. Chiappori, I. Komunjer, and D. Kristensen. Nonparametric identiﬁcation and estimation oftransformation models.

Journal of Econometrics , 188(1):22 – 39, 2015.S. Clemencon, G. Lugosi, and N. Vayatis. Ranking and empirical minimization of u-statistics.

TheAnnals of Statistics , 36(2):844–874, 2008. ISSN 00905364.X. D’Haultfoeuille. A new instrumental method for dealing with endogenous selection.

Journal ofEconometrics , 154(1):1–15, 2010.T. Drerup, B. Enke, and H.-M. von Gaudecker. The precision of subjective data and the explanatorypower of economic models.

Journal of Econometrics , 200(2):378 – 389, 2017.F. Dunker, J.-P. Florens, T. Hohage, J. Johannes, and E. Mammen. Iterative estimation of solu-tions to noisy nonlinear operator equations in nonparametric instrumental regression.

Journal ofEconometrics , 178:444–455, 2014. 33. Fan, F. Han, W. Li, and X.-H. Zhou. On rank estimators in increasing dimensions.

Journal ofEconometrics , 214:379–412, 2020.A. K. Han. Non-parametric analysis of a generalized regression model: the maximum rank correlationestimator.

Journal of Econometrics , 35(2-3):303–316, 1987.J. A. Hausman, W. K. Newey, H. Ichimura, and J. L. Powell. Identiﬁcation and estimation of polyno-mial errors-in-variables models.

Journal of Econometrics , 50(3):273 – 295, 1991.S. Hoderlein and J. Winter. Structural measurement errors in nonseparable models.

Journal ofEconometrics , 157(2):432 – 440, 2010.S. Hoderlein, B. Siﬂinger, and J. Winter. Identiﬁcation of structural models in the presence of mea-surement error due to rounding in survey responses. 2015.Y. Hu and S. M. Schennach. Instrumental variable treatment of nonclassical measurement error models.

Econometrica , 76(1):195–216, 2008.G. W. Imbens and W. K. Newey. Identiﬁcation and estimation of triangular simultaneous equationsmodels without additivity.

Econometrica , 77(5):1481–1512, 2009.D. Jacho-Chavez, A. Lewbel, and O. Linton. Identiﬁcation and nonparametric estimation of a trans-formed additively separable model.

Journal of Econometrics , 156(2):392 – 407, 2010.S. Khan. Two-stage rank estimation of quantile index models.

Journal of Econometrics , 100(2):319–355, 2001.A. Lewbel. An overview of the special regressor method. 08 2014.R. Matzkin.

Nonparametric and Semiparametric Methods in Econometrics and Statistics , chapter ANonparametric Maximum Rank Correlation Estimator. Cambridge: Cambridge University Press,1991.R. L. Matzkin. Restrictions of economic theory in nonparametric methods.

Handbook of econometrics ,4:2523–2558, 1994.R. L. Matzkin. Nonparametric identiﬁcation.

Handbook of Econometrics , 6:5307–5368, 2007.M. D. Nadai and A. Lewbel. Nonparametric errors in variables models with measurement errors onboth sides of the equation.

Journal of Econometrics , 191(1):19 – 32, 2016.W. Newey, J. L. Powell, and F. Vella. Nonparametric estimation of triangular simulataneous equationsmodels.

Econometrica , 67(3):565–603, 1999.D. Nolan and D. Pollard. U-processes: Rates of convergence.

The Annals of Statistics , 15(2):780–799,1987. 34. Pakes and D. Pollard. Simulation and the asymptotics of optimization estimators.

Econometrica:Journal of the Econometric Society , pages 1027–1057, 1989.D. Pollard.

Convergence of Stochastic Processes . Springer Series in Statistics, 1984.S. Schennach. Instrumental variable estimation of nonlinear errors-in-variables models.

Econometrica ,75(1):201–239, 2007.S. M. Schennach. Measurement error in nonlinear models - a review.

Advances in Economics andEconometrics, Theory and Applications: Tenth World Congress of the Econometric Society , 2013.R. P. Sherman. The limiting distribution of the maximum rank correlation estimator.

Econometrica ,pages 123–137, 1993.Y. Shin. Local rank estimation of transformation models with functional coeﬃcients.

EconometricTheory , 26(6):1807–1819, 2010.G. Tang, R. J. Little, and T. E. Raghunathan. Analysis of multivariate missing data with nonignorablenonresponse.

Biometrika , 90(4):747–764, 2003.A. van der Vaart and J. Wellner.

Weak Convergence and Empirical Processes: With Applications toStatistics (Springer Series in Statistics) . Springer, corrected edition, Nov. 2000.A. van der Vaart and J. Wellner. A note on bounds for vc dimensions.

IMS Collections: HighDimensional Probability , 5:103–107, 2009.A. W. van der Vaart.

Asymptotic statistics.

Cambridge University Press, 1998.H. White and K. Chalak. Testing a conditional form of exogeneity.

Economics Letters , 109(2):88–90,2010.J. Zhao and J. Shao. Semiparametric pseudo-likelihoods in generalized linear models with nonignorablemissing data.