Nonclassical Measurement Error in the Outcome Variable∗

Christoph Breunig†, Emory University
Stephan Martin‡, Humboldt-Universität zu Berlin
September 29, 2020
We study a semi-/nonparametric regression model with a general form of nonclassical measurement error in the outcome variable. We show equivalence of this model to a generalized regression model. Our main identifying assumptions are a special regressor type restriction and monotonicity in the nonlinear relationship between the observed and unobserved true outcome. Nonparametric identification is then obtained under a normalization of the unknown link function, which is a natural extension of the classical measurement error case. We propose a novel sieve rank estimator for the regression function and establish a rate of convergence of the estimator which depends on the strength of identification. In Monte Carlo simulations, we find that our estimator corrects for biases induced by measurement errors and provides numerically stable results. We apply our method to analyze belief formation of stock market expectations with survey data from the German Socio-Economic Panel (SOEP) and find evidence for nonclassical measurement error in subjective belief data.
Keywords:
Non-classical measurement error, rank-based estimation, shape restrictions, nonparametric identification, special regressors, generalized regression, sieve estimation.

∗ We are thankful to seminar participants at CMStatistics/CFE in London, Humboldt-Universität zu Berlin, Retreat of CRC TRR 190, and UEA in Norwich for their helpful suggestions. Financial support by Deutsche Forschungsgemeinschaft through CRC TRR 190 is gratefully acknowledged.
† Department of Economics, Emory University, Rich Memorial Building, Atlanta, GA 30322, USA. Email: [email protected]
‡ Humboldt-Universität zu Berlin, Spandauer Straße 1, 10178 Berlin, Germany, e-mail: [email protected]

1. Introduction

In empirical research, measurement error is a recurring issue. In recent years much attention has been given to various forms of measurement error in the covariates of econometric models, whereas measurement error of the dependent variable is mostly ignored. In many economic environments measurement error of the dependent variable may be driven (in a nonlinear fashion) by the underlying variable. This nonclassical measurement error implies biased estimation results if not accounted for.

This paper is concerned with semi-/nonparametric regression models where the dependent variable of interest Y* is generally not observed and only a possibly error-contaminated measurement Y is observable. Specifically, Y* satisfies

Y* = g(X) + U,

where the unknown function g is of interest given observed covariates X and unobservables U. We study the non-classical measurement error case where E[Y | Y*, X] ≠ Y* and hence regressing the observed outcome on covariates generally does not provide us with a consistent estimate of the true mean regression function.

Nonparametric identification of our model relies on the availability of covariates which do not affect the measurement error directly.
We impose such a type of exclusion restriction on a subset Z of the vector X = (Z, W), where W are additional controls. Under a monotonicity condition on the measurement error mechanism E[Y | Y*, X], the regression model can be reformulated as a generalized regression model of the form

E[Y | X = x] = H(g(x), w),

where H(·, w) is some nonlinear, monotonic function for w in the support of W. From this generalized regression model, we establish identification of the functions g or g(·, w) up to strictly monotonic transformations.

The identification up to strictly monotonic transformations still allows us to infer economically relevant quantities such as the direction and shape of partial effects. Further, under scale and location normalization of the unknown link function H, nonparametric identification of the regression function g is obtained. We highlight that normalization of the link function H is equivalent to imposing mild shape restrictions on the measurement error mechanism. Additionally, our normalization conditions on the link function not only naturally extend the classical measurement error case but are also satisfied if there is a range of Y* where the measurement error is classical. Our nonparametric identification results thus build on intuitive assumptions without relying on high-level assumptions such as completeness, see Hu and Schennach [2008].

We propose a novel sieve rank-based minimum distance estimator and establish its asymptotic properties. The estimator builds on U-statistics as well as more recent results linking U-statistics and empirical processes. We find that the sieve rank estimator generally suffers from ill-posedness in the convergence rate as the rank-based criterion function is not continuous in the usual L²-norm. We also extend the estimator to continuous controls W using kernel weights.

We analyze the performance of the estimator in a Monte Carlo simulation study and in an empirical application using survey data.
We apply our estimator to study belief formation with subjective belief data from the German Socio-Economic Panel innovation sample (SOEP-IS). Subjective belief data is known to be plagued by substantial measurement error, and it is in general hard to justify that the measurement error is classical and thus not sensitive to the underlying true individual belief. We study the impact of an exogenous display of historic stock market returns provided to survey respondents prior to eliciting their belief on future returns. Applying our method, we find a monotonic and concave relationship between the historic information and stated beliefs, indicating that individuals acknowledge the given information conservatively.

Literature.
Our work ties into the literature on measurement error in observable variables of econometric models. The literature on measurement error in covariates is extensive, whereas measurement error in the outcome variable has received much less attention. For a review of models with errors in covariates, see e.g. Chen et al. [2011] and Schennach [2013]. Chen et al. [2005] develop a general way of accounting for measurement error in any variable of a class of semiparametric models once auxiliary data, e.g. from validation samples, is available. However, this is hardly the case in most practical applications. Models focusing on non-classical measurement error on the outcome side are rare. Chapter 3 of Abrevaya and Hausman [1999] considers a semiparametric model with a more simplistic measurement error mechanism. Hoderlein and Winter [2010] and Hoderlein et al. [2015] develop structural models of response error in surveys due to imperfect recall and derive testable implications for econometric analyses. The latter paper focuses on the role of rounding in individual reporting behavior, which is also a more specific form of non-classical measurement error.

Nadai and Lewbel [2016] allow for classical measurement error in the outcome variable that is correlated with an error in covariates. Abrevaya and Hausman [2004] consider classical measurement error of the dependent variable in a transformation model. Given a precise idea on the form of measurement error, a sizeable literature is usually available providing different strategies for identification. For instance, a special case of nonclassical measurement error is selective non-response in the outcome variable, see e.g. D'Haultfoeuille [2010] or Breunig et al.
[2018] and references therein. A non-nested form of nonclassical measurement error is the Berkson-type error, see Berkson [1950] and Schennach [2013, Section 6.3].

Our identifying assumptions lead us to the literature on generalized regression models as introduced in Han [1987] or the class of nonlinear index models in Matzkin [2007]. See also the model studied in Jacho-Chavez et al. [2010]. Estimation of such models often proceeds by rank-based estimation strategies, see Han [1987], Cavanagh and Sherman [1998], Khan [2001], Shin [2010] and Abrevaya and Shin [2011], which all consider parametric regression models, with the exception of Matzkin [1991] who studies a nonparametric model with additional shape restrictions on the link function. A recent contribution studying rank estimators in a high-dimensional setting is Fan et al. [2020]. To the best of our knowledge, we are the first to study nonparametric M-estimation with rank-based criterion functions and to point out and illustrate the ill-posedness of the estimation problem.

The remainder of the paper is organized as follows. In Section 2 we present our model setup and give a nonparametric identification result for features of the mean regression function when there is a form of non-classical measurement error in the outcome variable. In Section 3 we introduce a sieve estimator with a rank-based criterion function and establish its convergence. In Section 4 we analyze finite sample properties of the estimator in a Monte Carlo simulation study. Section 5 contains an application of our method to belief formation of stock market expectations. Appendix A provides descriptive statistics on the empirical data. Appendix B provides an extension to weighted sieve rank estimation when control variables are continuous. All proofs are postponed to Appendix C.
2. Model Setup and Identification
We consider a nonparametric econometric model with measurement error in the outcome variable. The model we study is

Y* = g(X) + U,  (2.1)

where Y* is the scalar outcome variable, X is a d_x-dimensional vector of exogenous covariates, U is a scalar error term, and g a nonparametric function of interest. The outcome variable Y* is not observed by the researcher; only an error-contaminated measurement Y is available. We are primarily interested in the case where the error satisfies E[U | X] = 0 and thus g is the unknown conditional expectation function of Y* given X.

Throughout the paper we assume that the regressors X can be decomposed such that X = (Z′, W′)′, where Z has no direct effect on the measurement error and W are control variables. Also we introduce the notation g_w(·) ≡ g(·, w) for the regression function evaluated at a fixed w in the support of W. Our goal is to identify and estimate the unknown g under possibly non-classical measurement error in the outcome variable. In the next section we therefore present restrictions on the model and the form of the measurement error which give us a nonparametric identification result for certain features of g, i.e., the function g_w up to strictly monotonic transformations.

Assumption 1 (Exclusion Restriction). The observed outcome Y is conditionally mean independent of Z given Y* and W, i.e., E[Y | Y*, Z, W] = E[Y | Y*, W].

Assumption 1 rules out that Z has a direct effect on the measurement Y in conditional means. Assumption 1 is generally weaker than assuming that the conditional distribution of Y given (Y*, Z, W) does not depend on Z, which restricts Z to have no information on Y that is not captured by (Y*, W). Analogous exclusion restrictions are commonly imposed in the literature on non-classical measurement error in covariates.
In Assumption 2 (ii) of Hu and Schennach [2008] the distribution of the error-contaminated regressor is independent of instruments conditional on the latent regressor (see also Schennach [2013, Section 4.3]). Assumption 1 is less restrictive than other exclusion restrictions found in the measurement error literature, see Ben-Moshe et al. [2017, Assumption 2.1 (iii)].

A similar condition to Assumption 1 can also be found in the literature on selective non-response, which is a special case of non-classical measurement error in the outcome. Individuals either report the outcome truthfully (response indicator D = 1) or not at all (D = 0), so the observed outcome in this case is Y = DY*. An identifying assumption in D'Haultfoeuille [2010] and Breunig et al. [2018] is that D ⊥⊥ X | (Y*, W), which is similar to our Assumption 1. See also Tang et al. [2003] and Zhao and Shao [2015] for similar conditions.

In the following, we make use of the notation h(Y*, W) = E[Y | Y*, W]. Assumption 1 implies the measurement error model

Y = h(Y*, W) + V,

where E[V | Y*, W] = 0. Consequently, Assumption 1 implies conditional mean independence of the measurement error V given the regression error U, that is, E[V | U] = 0. Below, for any random variable X, its support is denoted by supp(X). We now impose shape restrictions on the conditional mean function h.

Assumption 2 (Monotonicity). For any w ∈ supp(W), the function h(·, w) is weakly monotonic and non-constant over the support of Y*.

Assumption 2 imposes that the expected observed outcome Y is monotonic in the latent outcome Y* given W. This is trivially satisfied when the measurement error is classical, i.e., when h does not depend on W and is the identity. A similar monotonicity condition has also been imposed in the measurement error model in Example 3 of Abrevaya and Hausman [1999]. We discuss the plausibility of Assumption 2 in a setting with survey data in Example 2.1.
Note that h does not need to be strictly monotonic, which allows us to consider models with rounding error in the outcome, see Hoderlein et al. [2015].

Assumption 3 (Conditional Exogeneity). The conditional independence restriction Z ⊥⊥ U | W holds.

Assumption 3 imposes a conditional independence restriction between Z and the regression error U. This condition is also known as a conditional exogeneity assumption following White and Chalak [2010]. Independence assumptions can be restrictive, but are often required in the measurement error literature (see, e.g., Hausman et al. [1991], Schennach [2007], Ben-Moshe et al. [2017, Assumption 2.2]), or when accounting for endogeneity using control functions (see, e.g., Newey et al. [1999]). We relax such restrictions by imposing independence only conditional on the control variables W. Similar conditions are often employed for identification in the econometrics literature, see e.g. Chiappori et al. [2015] for nonparametric identification in a transformation model. It corresponds to the unconfoundedness assumption in the treatment effects literature and is also closely related to the special regressor assumption, see Lewbel [2014] for a review.

A key implication of Assumptions 1–3 is

E[Y | X = x] = E[h(Y*, W) | Z = z, W = w]
             = E[h(g(Z, W) + U, W) | Z = z, W = w]
             = E[h(g(z, W) + U, W) | W = w]
             =: H(g_w(z), w).  (2.2)

(Footnote: In our notation, Abrevaya and Hausman [1999] consider the error mechanism Y = h(Y*, V) with ∂_y h(Y*, V) > 0, ∂_v h(Y*, V) > 0, and V ⊥⊥ (X, U), whereas we allow for heteroscedasticity in the measurement error model.)

H is a function that is strictly monotonically increasing in its first argument, as we show in the proof of our main identification result below. As we see from the previous display, nonclassical measurement error implies heterogeneous biases for the marginal effects: when the derivative of H(·, w) at g_w(z) is smaller than one we obtain attenuation bias for the marginal effect ∂_z g_w(z), and when it is larger than one we obtain augmentation bias for ∂_z g_w(z). This type of model is closely related to the class of generalized regression models studied by Han [1987], Matzkin [1991], Cavanagh and Sherman [1998] and Matzkin [2007]. Our identification argument exploits the monotonicity in (2.2) to establish identification of g_w up to strictly monotonic transformations based on arguments from the literature on generalized regression models.

Example 2.1 (Example with Subjective Belief Data). In this example, we discuss the plausibility of our assumptions in our empirical application. In Section 5, we analyze how individuals adapt their beliefs on future stock market returns when they are provided with information on historical returns. Survey respondents are presented two randomly chosen realizations from a series of historical stock index returns, which we denote as Z_1 and Z_2. They are asked to state their belief on stock returns in the next year. Let Y* denote an individual's true belief and Y their reported belief. In this case, E[Y* | Z_1, Z_2] characterizes variation in individual beliefs with respect to historic information. For this example we discuss the validity of our identifying Assumptions 1–3.

Figure 1: Nonparametric estimates of g.
The estimate in the right panel is obtained by assuming at most classical measurement error in the outcome variable, whereas the left panel shows results obtained from applying our correction.

Assumption 1 requires that historical return variations have no information on the mean of reported beliefs Y which is not already captured by the true, latent belief Y*. Assumption 2 translates to the mild requirement that individuals with higher beliefs report on average (weakly) larger values than respondents with lower beliefs. This weak inequality is important as it allows for flexible forms of rounding, since the reporting function h(·) in Assumption 2 can map neighborhoods of Y* into flat regions of Y. Assumption 3 implies that higher moments of the regression error U conditional on Z_1, Z_2 are constant, which rules out e.g. conditional heteroskedasticity, since we neglect additional control variables W in this example. A further analysis using more control variables is presented in Section 5. Figure 1 compares estimation results from ignoring measurement error in the outcome to our proposed correction. We see that our estimate exhibits a monotone, concave relationship between treatments and beliefs. On the other hand, without accounting for selective measurement error, we obtain a heterogeneous, in part flat or convex relationship which results in a different interpretation.

Below, we introduce the notation supp(V) for the support of a random vector V.

Assumption 4.
For any w ∈ supp(W): (i) the function g_w is continuous; (ii) for any z_1, z_2 ∈ supp(Z) such that g_w(z_1) < g_w(z_2) there exists u ∈ supp(U) satisfying h(g_w(z_1) + u, w) < h(g_w(z_2) + u, w); (iii) there is at least one variable Z^(1) in Z satisfying f_{Z^(1) | Z^(−1), W}(z_1 | z_{−1}, w) > 0 for all (z_1, z_{−1}) ∈ supp(Z).

Assumption 4 (ii) is a mild support condition on U conditional on W = w. The unobservable U must vary sufficiently to shift g_w(Z) out of a flat region of h. The assumption is not required if h is already strictly monotonic in its first argument. Assumption 4 (iii) requires Z to contain at least one continuously distributed variable with sufficient variation. The case with scalar Z is allowed, with the assumption becoming f_{Z|W}(z | w) > 0 for all z ∈ supp(Z). This rules out the case of Z being a discrete scalar variable.

Lemma 2.1.
Let Assumptions 1–4 be satisfied. Then for any w ∈ supp(W) the function g_w(·) is identified up to strictly increasing transformations.

The identification result in Lemma 2.1 builds on Matzkin [2007, Theorem 3.2]. Without further model restrictions we are able to point identify those features of g_w that are preserved under strictly monotonic transformations. This includes the sign of partial effects, the ratio of two partial effects and properties such as quasi-concavity/convexity of the function. For the remainder of the paper we consider the estimation of g_w in the point identified case.

Economic restrictions on the model can be employed to sufficiently restrict the function space. We refer to the discussion in Sections 3.4 and 4.4 in Matzkin [2007], where several possible function spaces are discussed that satisfy Assumption 5. This includes the spaces of functions that are homogeneous of degree one, additively separable, and so-called "least-concave" functions, see also Matzkin [1994]. Matzkin [2007] shows that imposing homogeneity of degree 1 and a location normalization is sufficient for Assumption 5. Homogeneous functions are frequently encountered in microeconomics. Thus, in applications where the function g has the structural interpretation of a production or cost function, homogeneity can be a reasonable restriction on the parameter space. In a general mean regression setting, however, it is not clear why the regression function should satisfy such a property. The same holds true for the least-concavity property.

We impose the following restriction on the model and the measurement error mechanism described by the function H.

Assumption 5. (i) The function g_w is additively separable: there exists a decomposition Z = (Z_1, Z_{−1}) such that g_w(Z) = m_w(Z_1) + l_w(Z_{−1}) for some functions m_w, l_w. (ii) There exist {z_1, z_2} ⊂ supp(Z) with g_w(z_1) ≠ g_w(z_2) and E[Y | Z = z, W = w] = E[Y* | Z = z, W = w] for z ∈ {z_1, z_2}.
Assumption 5 (i) imposes an additively separable structure on the regression function g_w. Following the identification statement in Lemma 2.1, mere location and scale normalizations are not sufficient to point identify g_w; however, for any additively separable model this is the case. See also Jacho-Chavez et al. [2010], who study identification of (2.2) under Assumption 5 (i). Assumption 5 (ii) is a restriction on the measurement error mechanism and supposes that there are points z_1, z_2 where the correct g_w coincides with the function obtained from ignoring the measurement error. For instance, one can think of pension information to account for nonclassical measurement error in labor income survey questions (see Breunig and Haan [2018]). Here, for certain ranges of labor income (e.g. close to the median) we may assume that the measurement error is of classical form. Another interesting feature of Assumption 5 is that it implies a normalization of the unknown, nonparametric link function H. This is in contrast to nonparametric generalized regression models, where the normalization is imposed on the unknown function of interest. Assumption 5 (ii) is also in line with normalization requirements for identification under nonclassical measurement error. For instance, Assumption 5 of Hu and Schennach [2008] requires some functional of the distribution of the measurement error conditional on the value of the true variable to be equal to the true variable itself, such as some quantile of Y | Y* = y* to correspond to y*.

Corollary 2.2.
Let Assumptions 1–5 (i) be satisfied. Then g_w(·) is identified up to a location and scale normalization. If 5 (ii) is additionally satisfied, then g_w is point identified.

Corollary 2.2 establishes identification of the regression function under the normalization imposed in Assumption 5, which essentially is a shape restriction on the functional form of measurement error. For an alternative identification argument for transformed additively separable models, see Theorem 2.1 of Jacho-Chavez et al. [2010].

We neither restrict the support of the observed outcome Y, nor require continuity in the function h(·, w). Thus, we can also cover cases where the observed outcome is categorical or has mass points. This likely occurs in survey data as respondents tend to provide rounded values. The following remarks consider two important special cases of model (2.1) which shed a different light on the interpretation of Assumptions 1–3.

Remark 2.1 (Control function approach). We can also motivate the presence of W in Assumption 3 as a control function. To this end we deviate for a moment from our previous notation and introduce the following triangular model

Y* = g(X) + U
X = m(Z, η),

where for simplicity X is a one-dimensional endogenous covariate that may correlate with the model error U, m(Z, η) is strictly monotonic in η, and Z is an appropriate instrument that satisfies Z ⊥⊥ (U, η). Under additional regularity conditions outlined in Theorem 1 of Imbens and Newey [2009] it holds that X ⊥⊥ U | W with W = F_{X|Z}(X, Z) = F_η(η). Then, if Assumption 1 is formulated as E[Y | Y*, X, W] = E[Y | Y*, W] and if the latter function is monotonic as in Assumption 2, we can follow the same reasoning leading up to Theorem 2.1 to establish that g is identified up to a strictly monotonic transformation. Also note that our method may be applied in any setting where Y* is an endogenous regressor within a triangular model, as it is the outcome variable in the reduced form equation.
Remark 2.2 (Selective Nonresponse). Consider a nonresponse model

Y = DY*
D = φ(Y*, W, V),

for some unknown function φ, where the response indicator D ∈ {0, 1} is always observed and Y* is only observed if D = 1. This framework, where the response mechanism is mainly driven by the latent outcome Y*, has been studied by D'Haultfoeuille [2010] and Breunig et al. [2018]. As long as the conditional mean function h(Y*, W) = P(D = 1 | Y*, W) Y* is monotonic in its first argument, the model is in accordance with Assumption 2. This holds, e.g., when the conditional response probability function is monotonic and the support of Y* is bounded below. In contrast to D'Haultfoeuille [2010] and Breunig et al. [2018], there is no need for a completeness condition for nonparametric identification of the conditional selection probability P(D = 1 | Y*, W) via conditional moment restrictions.
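To make the monotonicity argument in Remark 2.2 concrete, the following minimal numerical sketch checks that, for a weakly increasing response probability and nonnegative Y*, the implied conditional mean h(y*) = P(D = 1 | y*) · y* is weakly increasing. The logistic response probability is our hypothetical choice for illustration, not a specification from the paper.

```python
import numpy as np

def response_prob(y_star):
    # Hypothetical monotone response probability P(D = 1 | Y* = y*);
    # the logistic form is our illustration only.
    return 1.0 / (1.0 + np.exp(-0.8 * np.asarray(y_star, dtype=float)))

def h(y_star):
    # Conditional mean of the observed outcome Y = D * Y* given Y*:
    # h(y*) = P(D = 1 | y*) * y*.
    return response_prob(y_star) * np.asarray(y_star, dtype=float)

# With Y* taken to be nonnegative, h is weakly increasing, since
# d/dy [p(y) * y] = p'(y) * y + p(y) >= 0 for y >= 0.
grid = np.linspace(0.0, 10.0, 1001)
monotone = bool(np.all(np.diff(h(grid)) >= 0))
```

The check only illustrates the sufficient condition stated in the remark; with a decreasing response probability or a support extending below zero, monotonicity can fail.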
3. Estimation and Asymptotic Properties
In this section we introduce a nonparametric sieve M-estimator with a simple, rank-based criterion function. For simplicity we consider only the case where W consists of discrete variables and defer the estimation with continuous W to Section B of the appendix.

(Footnotes to Remark 2.2: If Y* is bounded below, then Y* can be redefined such that without loss of generality Y* ≥ 0. Monotonicity of h(Y*, W) = P(D = 1 | Y*, W) Y* follows from taking the derivative.)

Our identification result builds on shape restrictions imposed on the measurement error mechanism which imply identified moment conditions. Specifically, for a given w we can conclude that g_w maximizes the function

Q(φ, w) = E[ Y_1 1{φ(X_1) > φ(X_2)} | W_1 = W_2 = w ].

Based on this population criterion we now consider a sieve rank estimator, which implicitly accounts for the imposed shape restrictions on the measurement error. We introduce a sieve space G_K which depends on the dimension parameter K = K(n), growing with the sample size n, and we suggest the following estimator ĝ_w of g_w that maximizes a sample analogue of Q:

ĝ_w = arg max_{φ ∈ G_K} Q_n(φ, w), where  (3.1)

Q_n(φ, w) := 2/(n(n − 1)) Σ_{1 ≤ i < j ≤ n} [ Y_i 1{φ(X_i) > φ(X_j)} + Y_j 1{φ(X_j) > φ(X_i)} ] 1{W_i = W_j = w}.
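As a concrete sketch of this rank-based criterion (with W suppressed, i.e. a single cell W = w, and our own toy data-generating choices: g(x) = sin(x) + x and a cubic reporting function), the sample criterion can be computed as a second-order U-statistic:

```python
import numpy as np

def rank_criterion(phi_vals, y):
    # Sample analogue of Q(phi) = E[ Y_1 * 1{phi(X_1) > phi(X_2)} ],
    # averaged over all ordered pairs (i, j) with i != j.
    n = len(y)
    gt = phi_vals[:, None] > phi_vals[None, :]  # gt[i, j] = 1{phi(x_i) > phi(x_j)}
    return float((y[:, None] * gt).sum() / (n * (n - 1)))

rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(-2.0, 2.0, n)
g = np.sin(x) + x                        # hypothetical regression function
y_star = g + rng.normal(0.0, 0.3, n)     # latent outcome Y* = g(X) + U
y = y_star**3 + rng.normal(0.0, 0.3, n)  # observed Y = h(Y*) + V, h monotone but nonlinear

# The criterion ranks the true index above a badly misspecified one; note that
# any strictly increasing transform of g yields exactly the same criterion value,
# mirroring identification only up to monotone transformations.
q_true = rank_criterion(np.sin(x) + x, y)
q_flipped = rank_criterion(-(np.sin(x) + x), y)
```

In practice the maximization in (3.1) runs over a sieve of basis-function coefficients rather than comparing two fixed candidates; the snippet only illustrates the criterion itself.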
Lemma 3.1. Consider the additively separable model g(Z) = Z_1 + g̃(Z_2) with bivariate Z = (Z_1, Z_2). Then Assumption 6 (iv) is satisfied if f′_{Z_1|Z_2} is uniformly bounded away from zero and f″_{Z_1|Z_2} is uniformly bounded above.

The special case outlined in Lemma 3.1 illustrates the behavior of τ_K. If the density f_{Z_1|Z_2}, that is, the conditional density of the separable covariate, is flat on the relevant support, we may encounter the case that the criterion Q is close to zero for candidate functions that are arbitrarily far away from the true function in the L²-sense. We further illustrate this issue in a Monte Carlo simulation study in Section 4, where we show that the estimation problem is more severely ill-posed whenever f_{Z_1|Z_2} is flat. So the behavior of τ_K will generally depend on the distribution of observables as well as the specific model under study and the chosen normalization.

Theorem 3.2.
Let Assumptions 1–6 be satisfied. It holds that

‖ĝ − g‖_{L²(Z)} = O_p( max{ τ_K √(K/n), K^(−α/d_z) } ).

The proof is based on the Hoeffding decomposition of U-statistics and makes use of a representation of second-order U-processes as empirical processes as in Clemencon et al. [2008]. To the best of our knowledge this is the first convergence rate result for nonparametric M-estimators with a rank-based criterion function, and thus also the first to acknowledge the ill-posedness of the problem. The next corollary provides concrete rates of convergence when the dimension parameter K is chosen to balance variance and squared bias under classical smoothness conditions. We call our model

mildly ill-posed if: τ_k ∼ k^(γ/d_z) with γ > 0,
severely ill-posed if: τ_k ∼ exp(k^(γ/d_z)) with γ > 0.

Corollary 3.3.
Let Assumptions 1–6 be satisfied.

1. Mildly ill-posed case: setting K ∼ n^(d_z/(d_z + 2γ + 2α)) yields ‖ĝ − g‖_{L²(Z)} = O_p( n^(−α/(2α + 2γ + d_z)) ).
2. Severely ill-posed case: setting K ∼ (log n)^(d_z/γ) yields ‖ĝ − g‖_{L²(Z)} = O_p( (log n)^(−α/γ) ).

(Footnote: If {a_n} and {b_n} are sequences of positive numbers, we use the notation a_n ≲ b_n if lim sup_{n→∞} a_n/b_n < ∞, and a_n ∼ b_n if a_n ≲ b_n and b_n ≲ a_n.)
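To fix ideas, here is a worked instance of the mildly ill-posed rate, with the illustrative values d_z = 1, smoothness α = 2, and degree of ill-posedness γ = 1 (our choices for illustration, not values from the paper):

```latex
% Mildly ill-posed case with d_z = 1, \alpha = 2, \gamma = 1:
K \sim n^{d_z/(d_z + 2\gamma + 2\alpha)} = n^{1/7}, \qquad
\|\hat g - g\|_{L^2(Z)} = O_p\!\left(n^{-\alpha/(2\alpha + 2\gamma + d_z)}\right)
                        = O_p\!\left(n^{-2/7}\right).
% For comparison, without ill-posedness (\gamma = 0) the same balancing yields
% the classical nonparametric rate n^{-\alpha/(2\alpha + d_z)} = n^{-2/5}.
```

The gap between n^(−2/7) and n^(−2/5) quantifies the price of ill-posedness in this example.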
4. Monte Carlo Simulation Study
This section demonstrates how non-classical measurement error in the outcome alters mean regression results in finite samples and shows the usefulness of our approach to correct for such biases. We compare regression function estimates obtained from simply ignoring the measurement error with our estimator, which accounts for the presence of the error. Throughout this section, simulation results are based on a sample size of n = 1000 and 1000 Monte Carlo iterations.

We consider the following data generating process

Y* = Z_1 + g(Z_2) + U
Y = h(Y*) + V,

where Z_1 ∼ N(1, σ²), Z_2 ∼ U[−3, 3] independent of each other, g(·) = sin(·), and the error terms (U, V) ∼ N(0, I). Here, I is the 2-dimensional identity matrix and for the standard deviation of Z_1 we choose σ = 1, which will be varied later. In the above model, g is identified up to a location normalization. Analogously, we could specify a linear or nonlinear function on Z_1 and impose an additional scale normalization on g. The function h in the measurement error equation is chosen as

h(Y*) = q_{0.7} + b (Y* − q_{0.7})   if Y* > q_{0.7},
        Y*                           if q_{0.3} ≤ Y* ≤ q_{0.7},
        q_{0.3} − a (q_{0.3} − Y*)   if Y* < q_{0.3},

where q_{0.3}, q_{0.7} denote the 30%- and 70%-quantile of Y*. The setup is analogous to a survey data setting with over- or underreporting in the tails of Y*, whereas the center of the distribution is not affected. The scalars a, b can be chosen to vary the magnitude of measurement error.

Figure 2 illustrates the effects of the measurement error for the case a = b = 0.5. It shows the realizations of Y and Y* for a specific draw of the data generating process and plots the function h. We compare the measurement error function h (depicted as red solid line) with the setup of classical measurement error, which is captured by the 45° line (depicted as black dashed line).

Figure 2: Realizations of Y*, Y when a = b = 0.5 and n = 1000. The red solid line depicts the function h and the black dashed line the 45° line.

We implement the sieve rank estimator ĝ given in (3.1) using a linear sieve space with B-spline basis functions of order 3 with 2 interior knots that are placed according to quantiles of the empirical distribution. Thus we have K = 4. The elements of the sieve space are normalized to move through the point (0, 0), which is the correct value of the true function sin(·) at 0. This normalization can also be rationalized as utilizing prior knowledge on the measurement error mechanism in the sense of Assumption 5 (ii). For instance, we can expect that ignoring the measurement error results in estimates that are close to the true function g in the center of the distribution of Z. Figure 3 shows the sieve rank estimates ĝ and compares them to a nonparametric series regression that does not account for nonclassical measurement error in the outcome, using the same choice of basis functions and tuning parameters. We study the cases a = b = 0.5 and a = b = 0, where the latter essentially implies that at some point the measurements Y are merely random fluctuations around a constant value. We observe that our estimation strategy results in an accurate estimate of g in both cases, whereas ignoring the measurement error yields estimates with a sizeable bias in the tails of Z.
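The data generating process and the piecewise-linear reporting function h described above can be sketched as follows; quantile values are estimated from the simulated sample, and the parameter values correspond to the mild-error design a = b = 0.5:

```python
import numpy as np

def make_h(y_star_sample, a, b):
    # Piecewise-linear reporting function from the simulation design:
    # identity between the 30%- and 70%-quantiles of Y*, slopes a and b
    # in the lower and upper tail, respectively.
    q30, q70 = np.quantile(y_star_sample, [0.3, 0.7])
    def h(y):
        y = np.asarray(y, dtype=float)
        out = np.where(y > q70, q70 + b * (y - q70), y)
        out = np.where(y < q30, q30 - a * (q30 - y), out)
        return out
    return h, q30, q70

rng = np.random.default_rng(1)
n = 1000
z1 = rng.normal(1.0, 1.0, n)        # Z_1 ~ N(1, sigma^2) with sigma = 1
z2 = rng.uniform(-3.0, 3.0, n)      # Z_2 ~ U[-3, 3]
u, v = rng.normal(0.0, 1.0, (2, n)) # (U, V) ~ N(0, I)
y_star = z1 + np.sin(z2) + u        # latent outcome Y* = Z_1 + sin(Z_2) + U
h, q30, q70 = make_h(y_star, a=0.5, b=0.5)
y = h(y_star) + v                   # observed outcome, mild measurement error
```

For a = b = 0 the tails of h collapse to the constants q30 and q70, reproducing the strong-error design in which tail measurements are random fluctuations around a constant.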
In the severe setting depicted in the right panel, ignoring measurement error results in a rather flat estimate which is significantly different from the sieve rank estimator.

The data generating process chosen here is in line with the model in Lemma 3.1 and thus allows us to study the degree of ill-posedness in the convergence rate of the estimator. As pointed out in the discussion following Lemma 3.1, the behavior of the sieve measure of ill-posedness τ_K is governed by the conditional density f_{Z_1|Z_2}. If the density f_{Z_1|Z_2} is flat over the relevant support, τ_K diverges faster and the ill-posedness is more severe. Figure 4 below shows estimates across different standard deviations of the separable covariate Z_1, which affect the slope of the density f_{Z_1|Z_2}. For small standard deviations, the conditional density f_{Z_1|Z_2} will be rather flat over the better part of the support.

We can see that pointwise confidence bands widen when the standard deviation of Z_1 is low. For standard deviations of 0.5, 1 and 2 we see very small differences in the bias of our estimator ĝ. (Footnote: We perform Kolmogorov–Smirnov tests of the hypothesis that Y and Y* follow the same probability distribution on every drawn sample of the MC study. In the a = b = 0.5 case we rarely reject the null, whereas in the a = b = 0 case we reject the null in 966 cases. Thus, in the strong ME setting, Y and Y* no longer have a very similar marginal distribution, in contrast to the mild ME setting.)

Figure 3: Estimation results normalized to go through the coordinate (0, 0). The solid black line is the median of the sieve rank estimates ĝ, the solid red line is the median of a series estimator with the same B-splines specification, the solid blue line shows the true g(·) function, and the dashed black lines are the 0.95 and 0.05 quantiles over all Monte Carlo rounds. In the left panel we choose a = b = 0.5 (mild ME) and in the right panel a = b = 0 (strong ME).

For the case where the standard deviation is lowest, not only do the confidence bands blow up but the bias also increases, which suggests that a tiny variation in Z_1 will ultimately result in erroneous estimates. The last row shows that increasing the standard deviation further does not lead to smaller confidence bands. This observation is in line with the finite sample behavior of estimators with an ill-posed rate: if the density f_{Z_1|Z_2} is flat, τ_K diverges faster and we need to choose a smaller number of basis functions to control the variance.
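The pairwise criterion behind ĝ can be written down directly. The sketch below uses Q_n(φ) = 2/(n(n−1)) Σ_{i<j} [Y_i 1{φ(Z_i) > φ(Z_j)} + Y_j 1{φ(Z_j) > φ(Z_i)}] — our reading of the criterion (3.1) defined earlier in the paper — and illustrates numerically why a normalization is needed: Q_n depends on φ only through the ordering it induces, so it is invariant under strictly increasing transformations of φ. The data generating process in the snippet is an arbitrary stand-in with a monotone reporting function.

```python
import random, math

def rank_criterion(y, phi_vals):
    # pairwise rank criterion: a pair (i, j) contributes the outcome of
    # whichever observation phi ranks higher
    n = len(y)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            if phi_vals[i] > phi_vals[j]:
                total += y[i]
            elif phi_vals[j] > phi_vals[i]:
                total += y[j]
    return 2.0 * total / (n * (n - 1))

rng = random.Random(1)
z = [rng.uniform(-3, 3) for _ in range(400)]
ystar = [math.sin(zi) + rng.gauss(0, 0.3) for zi in z]
y = [math.tanh(yi) + rng.gauss(0, 0.3) for yi in ystar]   # monotone, nonclassical reporting

g_vals = [math.sin(zi) for zi in z]
# replace g by its induced ranking: an order-preserving (monotone) transform
order = sorted(range(len(g_vals)), key=lambda i: g_vals[i])
ranks = [0] * len(g_vals)
for pos, i in enumerate(order):
    ranks[i] = pos

assert rank_criterion(y, g_vals) == rank_criterion(y, ranks)
```

Because only the induced ordering of φ enters the criterion, location, scale, and any strictly monotone transformation must be pinned down by a normalization, as done above via the point (0, 0).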
5. Application: Beliefs on Stock Returns in SOEP
Subjective beliefs on stock market returns are a key variable in economic models that seek to explain stock market participation and portfolio choice, see e.g. Breunig et al. [2019] and the references therein. However, subjective belief data are known to be prone to a large degree of measurement error, see the discussion and references in Drerup et al. [2017]. These authors also argue for the presence of types in the population who do not hold stable beliefs on stock market returns and thus report rather noisy beliefs that do not help in explaining economic behavior empirically.

Acknowledging the presence of diverse measurement error in subjective belief data, it is difficult to rationalize a classical measurement error assumption a priori. In this application we study mean regressions where the outcome variable is a subjective belief measure. We account for the possibility of non-classical measurement error in the outcome by applying our methodology and comparing it to standard regression techniques that do not account for this form of measurement error. Thereby we are the first to explicitly acknowledge the possibly nonclassical nature of measurement error in subjective belief data.

Figure 4: Estimation results with varying standard deviation σ of Z_1 (one panel per value of σ). Otherwise, the same legend applies as in Figure 3. Note the different scale on the y-axis in the first and second panels.

Via experimental interventions, Breunig et al. [2019] vary subjective beliefs exogenously to determine the causal impact of individual beliefs on portfolio choices. However, they find that the treatments did not sufficiently shift reported beliefs. To this end, we study the impact of historic return information on subjective beliefs about future stock market returns, allowing for nonclassical measurement error in the outcome variable.
The idea is that displaying historic return information to survey respondents prior to eliciting their beliefs can serve as an exogenous shift to their beliefs. This can be seen as a first stage of a more general analysis tackling the endogeneity of subjective beliefs.

We use novel data from the innovation sample of the 2017 wave of the German Socio-Economic Panel (SOEP-IS), which contains survey questions on individual beliefs about future stock market returns. In the interviews, respondents are asked how much they believe the DAX, Germany's prime blue-chip stock market index, will change in one, two, ten and thirty years with respect to the current level. They are asked to provide a direction of the change (increase or decrease) as well as a percentage change.

Before the individuals are asked about their beliefs, they obtain information about historical DAX returns. Two observations of the series of yearly DAX returns from 1951 to 2016 are randomly chosen and presented to the respondent. Afterwards they are asked to report their beliefs on how the DAX will change in the next year (in percentage points).

In this application we are interested in the effect of the historical DAX information on the individual's expected DAX return in one year. Let Y* denote the individual's true belief about the DAX return in one year and let Z_1, Z_2 be the two treatment variables, i.e. the randomly drawn historical returns. The reported belief is denoted by Y. In general we cannot be sure that reported beliefs are free of nonclassical measurement error, and in the following we account for this possibility. The data consist of 1084 interviewed persons, but 306 people do not respond to the question on beliefs. We removed missing values and report the summary statistics below.

Min. 1. Quant. Median Mean 3. Quant. Max.
Y -50.00 1.00 4.00 3.55 7.00 130.00
Z_1 -43.94 -6.08 11.36 14.77 29.06 116.06
Z_2 -43.94 -6.08 13.99 17.13 34.97 116.06
Table 1: Summary statistics (all units are percentage points)

We begin the analysis by considering the following additively separable model

    Y* = g_1(Z_1) + g_2(Z_2) + m(W) + U,   where Z ⊥⊥ U | W,   (5.1)

which under the identifying Assumptions 1–3 leads to the model E[Y | X] = H[g_1(Z_1) + g_2(Z_2) + m(W), W], where W contains all observable variables that may have a direct effect on both the latent belief Y* (via some function m) as well as the measurement Y. By the experimental nature of Z = (Z_1, Z_2), the treatment variables are credibly fully independent of any observables in W, and thus we refrain from specifying W explicitly. Though the model choice is restrictive in that it ignores any interaction effects between the two treatments, the restrictions imposed on the measurement error mechanism are rather mild: in addition to the latent outcome Y*, any other observable variable W and even unobservables may have a direct effect on the reporting Y in the sense of Assumption 1.

We estimate the functions g_1, g_2 with our method outlined in (3.2) and contrast the results to estimates obtained from assuming classical measurement error, i.e. from a standard additively separable, nonparametric regression of Y on Z_1 and Z_2. We choose a B-spline basis of degree 2, and the number of basis functions is K = 2 for each function estimate (resulting from 10-fold cross-validation). The results are presented in Figure 5 along with implementational details. Note that the absolute value of the y-axis in the left column is not informative, as location and scale of g_1 and g_2 are not identified. Comparing both estimates, accounting for nonclassical measurement error in the outcome leads to more concave estimates, which implies that the historic information is processed more conservatively.

The main concern with the above model is that it ignores interactive effects of both treatments.
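The equivalence to a generalized regression model with monotone link H is what makes pairwise comparisons informative here: a strictly increasing H changes the values of E[Y | X] but not the ordering induced by the index g_1(Z_1) + g_2(Z_2). A small numerical check of this point (the tanh link and the index values are arbitrary stand-ins chosen for the sketch, not the estimates from the application):

```python
import math

def concordant(u, v):
    # fraction of pairs that u and v order the same way (Kendall-type concordance)
    n = len(u)
    agree = pairs = 0
    for i in range(n):
        for j in range(i + 1, n):
            if u[i] != u[j]:
                pairs += 1
                agree += ((u[i] > u[j]) == (v[i] > v[j]))
    return agree / pairs

z1 = [i / 10.0 for i in range(-20, 21)]
z2 = [math.cos(3 * zi) for zi in z1]                        # arbitrary second covariate
index = [z1[k] + 0.3 * math.sin(z2[k]) for k in range(len(z1))]  # stand-in g1 + g2 index
mean_y = [math.tanh(v) for v in index]                      # strictly increasing link H

assert concordant(index, mean_y) == 1.0   # the link preserves the ordering exactly
```

This is why regressing Y on the covariates recovers a distorted (here: compressed) function, while rank-based comparisons of the index remain valid.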
Figure 5: Left column: plots of the sieve rank estimators. Right column: series estimators ignoring ME.

Thus the resulting qualitative differences in estimates may be explained by interactive effects that are captured by the presence of H in (5.1). In order to study a fully nonparametric function of both treatments, we incorporate additional variables and study the following model

    Y* = g(Z_1, Z_2) + Z_3'β + m(W) + U,   where Z ⊥⊥ U | W,   (5.2)

where (Z_3, W) are now additional control variables which contribute to individual beliefs Y*. Here, we assume that Z_3 does not directly affect the measurement Y in the sense of Assumption 1, while W may have such effects.

The SOEP-IS contains respondents' socio-demographics such as age, gender and tertiary degree information as well as self-assessed cognitive skill measures and personality traits. The educational information ("tertdegree") and the cognitive skill measures ("ire01", "ire02") are summarized in W, as we believe they may have a direct impact on the measurement Y. This is backed by findings of Breunig et al. [2019], where well-educated individuals react to a manipulation of their beliefs; for them the degree of measurement error may generally be smaller. All remaining control variables are summarized in Z_3; these contain age, gender, risk attitudes and different measures of personality traits, see Table 2 in Section A of the appendix for more detail.

We impose that our identifying Assumptions 1–3 are valid. Compared to the model (5.1), the exclusion of the additional variables Z_3 from the set W is necessary to achieve identification of g up to location and scale normalizations.
We believe that once we control for cognitive skills in W, variables like age, gender and other personal characteristics do not shift the mean reported beliefs. Therefore the model is more flexible with respect to Z_1, Z_2 at the cost of assuming that none of the variables in Z_3 has a direct effect on the measurement Y in the sense of Assumption 1, and that they do not violate other model assumptions such as the conditional independence in (5.2). The overall correlation between the variables in Z_3 and W is negligible, see the correlogram in Figure 7 in Section A of the appendix. We estimate (5.2) by simply varying the criterion in (3.2) over g(Z_1, Z_2) + Z_3'β, which is equivalent to imposing that Z_3 ⊥⊥ W; this does not appear to be critical due to the overall low correlation of cognitive skills with the remaining covariates. In other cases, the weighted rank estimator outlined in Section B of the appendix needs to be used.

Again we compare the estimate for g from our method to estimates obtained from ignoring the measurement error. We choose a bivariate B-spline basis of degree 2 with K = 4 basis functions. This choice minimizes the 10-fold cross-validation criterion of the nonparametric regression model ignoring the measurement error and is thereby adapted to our sieve rank estimator.

Results are presented in Figure 6. Again, accounting for the measurement error leads to a concave, symmetric effect of both treatments on the individual beliefs. When ignoring the possibility of measurement error, results are much more asymmetric, including zero effects of the second treatment and convex parts in the surface. In contrast, our method yields that individuals learn conservatively from both treatments, which is in line with the a priori intuition. Note that the z-axis of the sieve rank estimate is not informative since we can choose an arbitrary location and scale normalization of the function.
The functions are evaluated on a grid ranging from -20 to 50, which corresponds to the 10%- and 90%-quantiles of the marginal distributions of the treatment variables.

Summarizing the results from the two different models considered above, accounting for nonclassical measurement error with our sieve rank methods yields stable results that are in line with economic expectations. Ignoring nonclassical measurement error leads to spuriously more interesting findings in that the effects of both treatments on beliefs appear different and range from flat to convex marginals of the regression function. As soon as we account for the potential measurement error in the outcome, we find that the marginal effects of both treatments are symmetric and concave, hinting at conservative learning from the historical information.
6. Conclusion
This paper provides new insights into the analysis of regression models with non-classical measurement error in the outcome variable. Our nonparametric identification result is based on intuitive assumptions involving shape restrictions on measurement error functions. This novel result builds on the equivalence of nonclassical measurement models and generalized regression models. We propose a novel sieve rank estimator which constructively arises from our identification result and implicitly accounts for the required shape restrictions. We establish the rate of convergence of the sieve rank estimator, which is affected by a potentially ill-posed inverse problem. The proposed estimation method is easy to implement and provides numerically stable results, as demonstrated in a finite sample analysis. Finally, we demonstrate the usefulness of our method in an empirical application on belief elicitation, where we find measurement error in subjective belief data to be of a non-classical form.

Figure 6: Nonparametric estimates of g(Z_1, Z_2). The first column contains the estimate from our sieve rank estimator and the second column the estimate from ignoring measurement error.

A. Additional Data Description
Below we give summary statistics on additional key variables. Other variables included are "ibl11"–"ibl18", which are categorical answers to questions measuring the perseverance of a respondent. Variables "isb011"–"isb015" contain answers to questions measuring personality traits such as reciprocity and patience. W consists of "ire01", "ire02" and "tertdegree"; the remaining variables are summarized in Z_3. Below we also summarize the correlation structure in the data, with most variables being uncorrelated.

Min. 1. Quant. Median Mean 3. Quant. Max.
age 18.00 39.00 54.00 52.95 66.75 94.00
female 0 0 0 0.4448 1 1
tertdegree 0 0 0 0.1601 0 1
prisk -1 2 4 3.931 6 9
ire01 -5 20 40 38.14 50 100
ire02 -5 30 40 42.57 50 100
Table 2: Summary statistics of key variables. Age in years; female and tertdegree are dummy variables indicating female gender and whether a tertiary degree has been obtained. prisk is a score (0-10, -1 indicating nonresponse) of risk preferences; ire01 and ire02 are self-assessed scores (0-100, -1 indicating nonresponse) for calculatory skills and knowledge of nature.

B. Extension: Estimation with Continuous W

When W does contain continuous variables, we can simply replace the indicator in (3.1) with a kernel function to account for the fact that W_i = W_j = w is a null event. Then estimation can proceed with

    ĝ_w = argmax_{φ ∈ G_K} Q_n(φ, w),   (B.1)

where Q_n(φ, w) denotes the pairwise rank criterion from (3.1) with the indicators in W replaced by kernel weights with bandwidth s.

Remark B.1. Assume the function g(Z) does not depend on W. We can then consider the estimator

    ĝ = argmax_{φ ∈ G_K} Q_n(φ),   (B.2)

where Q_n(φ) aggregates the local criteria over realizations of W.

In this section we assess the performance of a weighted rank estimator for a setting as described in Remark B.1. We consider the following data generating process, similar to Section 4,

    Y* = Z_1 + g(Z_2) + m(W) + U · W,
    Y = h(Y* + W) + V · |W|,

where g(·) = sin(·), m(·) = cos(·), W = 0. · Z_2 + 0. · U, with the remaining variables as in Section 4 and h parameterized by a = b = 0.
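A kernel-weighted version of the pairwise criterion can be sketched as follows. The Gaussian kernel and the product weighting of the two observations in each pair are our assumptions for this illustration, since the text only specifies that the indicator 1{W_i = W_j = w} is replaced by a kernel function with bandwidth s.

```python
import math

def gauss_kernel(u):
    # standard normal density used as kernel weight
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)

def local_rank_criterion(y, phi_vals, w_obs, w, s):
    # pairwise rank criterion localized at W = w via product kernel weights
    n = len(y)
    total = 0.0
    for i in range(n):
        ki = gauss_kernel((w_obs[i] - w) / s)
        for j in range(i + 1, n):
            kj = gauss_kernel((w_obs[j] - w) / s)
            if phi_vals[i] > phi_vals[j]:
                total += y[i] * ki * kj
            elif phi_vals[j] > phi_vals[i]:
                total += y[j] * ki * kj
    return 2.0 * total / (n * (n - 1))
```

For each of several evaluation points w one maximizes this criterion over the sieve coefficients of φ to obtain ĝ_w, and the local estimates are then aggregated, e.g. by the sample median as in the Monte Carlo study below. As s grows large, all pairs receive (proportionally) equal weight and the unweighted criterion is recovered.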
In this setting there is correlation between Z_2 and W. Further, the measurement is additionally affected by the variable W. This setting is in line with Remark B.1 as g does not vary with W, and we implement the procedure outlined at the end of this remark with the LAD criterion as aggregating procedure.

In order to calculate an estimate of g for each Monte Carlo sample, we first take 50 random draws of the variable W and calculate ĝ_w by maximizing (B.2) for each of the 50 different realizations w. Finally, we aggregate the results to a final estimate by taking the sample median over the local estimates ĝ_w. We vary the bandwidth parameter s across different experiments. The sample size is n = 1000 and 500 Monte Carlo replications are considered. Figure 8 shows the results.

If we choose s reasonably small, our estimation procedure is quite close to the truth and outperforms the standard nonparametric estimator that simply ignores the measurement error. Increasing the bandwidth s leads to smaller confidence bands, but considerably increases the bias of the estimate. However, even in this strong measurement error setting, the weighted sieve rank estimator still outperforms the estimate obtained from ignoring the measurement error.

Figure 8: Results for bandwidths s = 0.05, 0.25, 0.5 and 1. The blue line is the g(·) = sin(·) function, the solid black line denotes the median and the dotted lines the respective 0.95 and 0.05 quantiles of the weighted sieve rank estimator over the Monte Carlo experiments. The red line is the median of series estimates of g in the model Y = Z_1 + g(Z_2) + m(W) + U. Basis functions are set as in Section 4 with K = 4.

C. Proofs and Technical Results

Proof of Lemma 2.1. First, recall that X = (Z, W) and that g_w = g(·, w).
The criteria we consider are

    Q(φ, w) = (1/2) E[ H(g(X_1), W_1) 1{φ(X_1) > φ(X_2)} + H(g(X_2), W_2) 1{φ(X_1) < φ(X_2)} | W_1 = W_2 = w ],

where E[Y | X] = H(g(X), W) and, by the law of iterated expectations, also Q(φ, w) = E[ Y_1 1{φ(X_1) > φ(X_2)} | W_1 = W_2 = w ]. Without loss of generality we consider the case that Assumption 2 holds with h(·, w) weakly monotonically increasing; the argument for the decreasing case is analogous.

We begin by noting that g_w is a maximizer of Q(·, w), as

    Q(g_w, w) = (1/2) E[ max{ H(g(X_1), W_1), H(g(X_2), W_2) } | W_1 = W_2 = w ],

which follows by monotonicity of H in its first argument. Let m(·) be some arbitrary strictly increasing function. Then m ∘ g_w is likewise a maximizer of Q(·, w). This implies that without Assumption 5 the regression function g_w is at best identified up to strictly increasing transformations.

Below, we show that for any function g̃_w that does not equal m ∘ g_w for some strictly monotonic transformation m we have Q(g̃_w, w) < Q(g_w, w). That is, a function that is not a strictly monotonic transformation of g_w never maximizes the criterion Q(·, w), and thus for an arbitrary w ∈ supp(W), g_w is identified up to a strictly monotonic transformation. Take some arbitrary function φ ∈ G that is not a strictly monotonic transformation of g_w. Then there exist points z' and z'' in the support of Z such that g_w(z') < g_w(z'') and φ(z') > φ(z''). By Assumption 4 (ii), H(·, w) is strictly monotonic, and it holds for every w that

    H(g_w(z'), w) < H(g_w(z''), w).

By continuity of the functions following Assumption 4 (i), the above inequality holds in neighborhoods B_1 around z' and B_2 around z'', respectively. By Assumption 4 (iii), these neighborhoods have a strictly positive probability measure.
This implies

    Q(g_w, w) − Q(φ, w) ≥ E[ H(g_w(Z_2), W_2) − H(g_w(Z_1), W_1) | (Z_1, Z_2) ∈ B_1 × B_2, W_1 = W_2 = w ] × P( (Z_1, Z_2) ∈ B_1 × B_2 | W_1 = W_2 = w ) > 0,

with the last inequality following from the strictly positive probability of the region B_1 × B_2. Thus Q(·, w) is only maximized by g_w and strictly monotonic transformations of it. Hence g_w is identified up to a strictly monotonic transformation.

Proof of Corollary 2.2. Under Assumption 5 (i), any candidate regression function g̃_w(Z) = m̃_w(Z_1) + l̃_w(Z_{-1}) must satisfy

    g̃_w(Z) = M_w(g_w(Z)) = M_w( m_w(Z_1) + l_w(Z_{-1}) ) = m̃_w(Z_1) + l̃_w(Z_{-1})

for a strictly monotonic function M_w. Thus M_w must be linear and g_w is identified up to location and scale transformations. Indeed, given linear and strictly monotonic transformations, g_w is the only maximizer of Q(·, w). Under Assumption 5 (ii), we have that g_w(z^(1)) = E[Y | Z = z^(1), W = w] and g_w(z^(2)) = E[Y | Z = z^(2), W = w], and fixing the parameter space to move through both points leads to g_w being the unique maximizer of Q(·, w) over G; thus g_w is point identified.

Proof of Lemma 3.1. Let Z' be an independent copy of Z. Consider the additively separable case g(Z) = Z_1 + g̃(Z_2) with bivariate Z = (Z_1, Z_2); analogously we denote φ(Z) = Z_1 + φ̃(Z_2). The following holds for the criterion Q:

    |Q(φ)| = E[ Y 1{ Z_1 + g̃(Z_2) > g(Z') } ] − E[ Y 1{ Z_1 + φ̃(Z_2) > φ(Z') } ]
           = E[ Y ( F_{Z_1|Z_2}( φ(Z') − φ̃(Z_2) ) − F_{Z_1|Z_2}( g(Z') − g̃(Z_2) ) ) ],

as g is the maximizer of Q, and with the second equality due to the law of iterated expectations.
Using a second-order Taylor expansion with directional derivatives yields, for all φ in a neighborhood around g,

    |Q(φ)| = Q_g(φ − g) + E[ Y f''_{Z_1|Z_2}(ξ) ( φ̃(Z_2) − g̃(Z_2) + g̃(Z_2') − φ̃(Z_2') )² ] =: Q_g(φ − g) + R,

where ξ is some intermediate value and Q_g denotes the directional derivative of Q at g, which is given by

    Q_g(φ − g) = E[ Y f'_{Z_1|Z_2}( g(Z') − g̃(Z_2) ) ( φ̃(Z_2) − g̃(Z_2) + g̃(Z_2') − φ̃(Z_2') ) ].

Applying the Cauchy–Schwarz inequality to Q_g(φ − g) shows that Q_g is weaker than the L²-norm. Further, the remainder term R satisfies

    |R| ≤ E[ | f''_{Z_1|Z_2}(ξ) / f'_{Z_1|Z_2}( g(Z') − g̃(Z_2) ) · ( φ̃(Z_2) − g̃(Z_2) + g̃(Z_2') − φ̃(Z_2') ) | ] · Q_g(φ − g),

and thus the tangential cone condition in Assumption 6 (iv) is satisfied if the first factor on the right-hand side is bounded between 0 and 1. The lower bound holds directly, and the upper bound is easily satisfied if the δ-neighborhood around g is chosen sufficiently small and the derivatives of the density are bounded away from zero and infinity, as is assumed.

For the proofs of the next results, we require some additional notation to deal with the Hoeffding decomposition of U-statistics, specific function spaces and their respective envelope functions. We introduce the empirical criterion Q_n(φ), which can be written as

    Q_n(φ) = 2/(n(n−1)) Σ_{1≤i<j≤n} [ Y_i 1{φ(Z_i) > φ(Z_j)} + Y_j 1{φ(Z_j) > φ(Z_i)} ],

and denote its Hoeffding decomposition by

    Q_n(φ) = Q(φ) + ν_n(φ) + ξ_n(φ),   (C.1)

where ν_n(φ) = (2/n) Σ_{i=1}^n ν(S_i, φ) is the empirical process part and ξ_n(φ) is a degenerate second-order U-process with kernel ξ.

We begin by noting that consistency of ĝ in the L²-norm follows from Lemma C.2. Due to this consistency result, we may restrict the function spaces to a local neighborhood around g, i.e. we define the space G_K^δ = { φ ∈ G_K : ‖φ − g‖_{L²(Z)} < δ } and assume that ĝ ∈ G_K^δ.
Further, we introduce the space G_K^{δ,r_n} = { φ ∈ G_K^δ : Q_g(φ − g) > M r_n }, where M > 0. It holds that

    P( Q_g(ĝ − g) ≥ M r_n ) ≤ P( sup_{φ ∈ G_K^{δ,r_n}} Q_n(φ) ≥ Q_n(Π_K g) )
      ≤ P( sup_{φ ∈ G_K^{δ,r_n}} [ Q(φ) + ν_n(φ) + ξ_n(φ) ] ≥ Q(Π_K g) + ν_n(Π_K g) + ξ_n(Π_K g) ),

by applying the Hoeffding decomposition (C.1). Due to Assumption 6 (iv) we have local equivalence of |Q(·)| and Q_g(·). Since Q(·) is negative, and thus |Q(·)| = −Q(·), it follows that

    P( Q_g(ĝ − g) ≥ M r_n )
      ≤ P( sup_{φ ∈ G_K^{δ,r_n}} [ Q(φ) + ν_n(φ) − ν_n(Π_K g) + ξ_n(φ) − ξ_n(Π_K g) ] ≥ −η Q_g(Π_K g − g) )
      ≤ P( sup_{φ ∈ G_K^{δ,r_n}} [ ν_n(φ) − ν_n(Π_K g) + ξ_n(φ) − ξ_n(Π_K g) + η Q_g(Π_K g − g) ] ≥ inf_{φ ∈ G_K^{δ,r_n}} |Q(φ)| )
      ≤ P( sup_{φ ∈ G_K^δ} [ ν_n(φ) − ν_n(Π_K g) ] + sup_{φ ∈ G_K^δ} [ ξ_n(φ) − ξ_n(Π_K g) ] + η Q_g(Π_K g − g) ≥ C M r_n ),

where it remains to study the asymptotic behavior of each summand in the last line separately. Note that both suprema on the left-hand side are positive; hence if sup_{G_K^δ} ν_n(φ) is bounded in probability, so is ν_n(Π_K g), and similarly for ξ_n.

First, we study the asymptotic behavior of the empirical process part sup_{φ ∈ G_K^δ} ν_n(φ). Recall the definition F_{ν,K} = { ν(·, φ) : φ ∈ G_K^δ } with envelope F_ν. By applying the last display of Theorem 2.14.2 of van der Vaart and Wellner [2000], we can conclude that

    E| sup_{φ ∈ G_K^δ} ν_n(φ) | = E| sup_{ν ∈ F_{ν,K}} n^{−1} Σ_{i=1}^n ν(S_i) | ≲ J_[](1, F_{ν,K}, L²(S)) · ‖F_ν‖_{L²(S)} · n^{−1/2},

where ‖F_ν‖_{L²(S)} ≤ ‖F_ν‖_{L∞(S)} ≤ C_ν < ∞.
By Lemma C.1 (i) and (ii) we have

    log N_[]( ε · ‖F_ν‖_{L∞(S)}, F_{ν,K}, L∞(S) ) ≤ c K log( C_ν^{−1}/ε ),

and ultimately we obtain J_[](1, F_{ν,K}, L∞(S)) = O(√K) and, by Markov's inequality, sup_{φ ∈ G_K^δ} ν_n(φ) = O_p(√(K/n)).

It remains to analyze the convergence rate of the degenerate U-process sup_{φ ∈ G_K^δ} ξ_n(φ). Similar to Lemma A.1 in Clemencon et al. [2008], we can make use of the following equality for second-order U-statistics:

    1/(n(n−1)) Σ_{i≠j} ξ(S_i, S_j, φ) = (1/n!) Σ_π (1/⌊n/2⌋) Σ_{i=1}^{⌊n/2⌋} ξ( S_{π(i)}, S_{π(⌊n/2⌋+i)}, φ ),   (C.2)

where π is short-hand for all permutations of {1, ..., n}. Then applying the triangle inequality to (C.2) leads to

    E| sup_{φ ∈ G_K^δ} 1/(n(n−1)) Σ_{i≠j} ξ(S_i, S_j, φ) | ≤ E| sup_{φ ∈ G_K^δ} (1/⌊n/2⌋) Σ_{i=1}^{⌊n/2⌋} ξ( S_i, S_{⌊n/2⌋+i}, φ ) |,   (C.3)

from which we can conclude that, for obtaining the convergence rate of the degenerate U-process on the left-hand side of (C.3), it is sufficient to analyze the convergence rate of an empirical process with kernel ξ indexed by the function space G_K^δ.

The kernel ξ contains non-smooth indicator functions, so we cannot apply the exact same reasoning we used earlier to derive a bound for ν_n, as ξ(S_i, S_j, φ) is not continuous in φ. However, we can use the fact that ξ(·, ·, φ) belongs to a VC-subgraph family, and we can thus derive the complexity bound in Lemma C.1 (iii). Recall the definition F_{ξ,K} = { ξ(·, ·, φ) : φ ∈ G_K^δ } and the associated envelope function F_ξ.
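The permutation identity (C.2), which reduces the degenerate U-process to an empirical process over ⌊n/2⌋ disjoint pairs, can be verified by brute force for small n (an arbitrary non-symmetric kernel stands in for ξ here):

```python
import itertools, math, random

def ustat(xi, S):
    # second-order U-statistic: average of xi over all ordered pairs
    n = len(S)
    return sum(xi(S[i], S[j]) for i in range(n) for j in range(n) if i != j) / (n * (n - 1))

def blocked_average(xi, S):
    # average over all permutations of the mean of xi over floor(n/2) disjoint pairs
    n, half = len(S), len(S) // 2
    total = 0.0
    for perm in itertools.permutations(range(n)):
        total += sum(xi(S[perm[i]], S[perm[half + i]]) for i in range(half)) / half
    return total / math.factorial(n)

rng = random.Random(7)
S = [rng.random() for _ in range(5)]
xi = lambda a, b: a * a + 3.0 * b          # arbitrary non-symmetric stand-in kernel
assert abs(ustat(xi, S) - blocked_average(xi, S)) < 1e-9
```

Each ordered pair appears in exactly ⌊n/2⌋ · (n−2)! of the n! permutation sums, which is why the two sides coincide and maximal-inequality tools for independent sums apply to each permutation block.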
Now we apply Theorem 2.14.1 of van der Vaart and Wellner [2000]:

    E| sup_{φ ∈ G_K^δ} (1/⌊n/2⌋) Σ_{i=1}^{⌊n/2⌋} ξ( S_i, S_{⌊n/2⌋+i}, φ ) | ≲ J_[](1, F_{ξ,K}, L²(S)) · ‖F_ξ‖_{L²(S)} · ⌊n/2⌋^{−1/2}.

Applying Lemma C.1 (iii), we obtain the bound

    J_[](1, F_{ξ,K}, L²(S)) ≤ ∫_0^1 √( c_1 + c_2 K log(1/ε) ) dε = O(√K)

and, by Markov's inequality, sup_{φ ∈ G_K^δ} ξ_n(φ) = O_p(√(K/n)). Finally, we can conclude that

    P( Q_g(ĝ − g) ≥ M r_n ) ≤ P( sup_{φ ∈ G_K^δ} ν_n(φ) + sup_{φ ∈ G_K^δ} ξ_n(φ) + Q_g(Π_K g − g) ≥ C M r_n ),

with sup_{φ ∈ G_K^δ} ν_n(φ) = O_p(√(K/n)) and sup_{φ ∈ G_K^δ} ξ_n(φ) = O_p(√(K/n)). Consequently, choosing r_n = max{ √(K/n), Q_g(Π_K g − g) }, we see that the right-hand side probability converges to zero as M → ∞. Thus Q_g(ĝ − g) = O_p(r_n). By the definition of the sieve measure of ill-posedness τ_K we obtain

    ‖ĝ − g‖_{L²(Z)} ≤ τ_K Q_g(ĝ − g) ≤ τ_K O_p( max{ √(K/n), Q_g(Π_K g − g) } ) = O_p( max{ τ_K √(K/n), ‖Π_K g − g‖_{L²(Z)} } ),

which concludes the proof.

Lemma C.1. Under Assumption 6 it holds that
(i) sup_{‖φ−g‖_∞ ≤ δ} |ν(S_i, φ)| ≤ M(S_i) · δ with E[M(S_i)] < ∞,
(ii) log N_[](ε, F_{ν,K}, L∞(S)) ≤ c K log(1/ε) for some positive constant c,
(iii) log N(ε, F_{ξ,K}, L²(S)) ≤ c_1 + c_2 K log(1/ε) for positive constants c_1, c_2.

Proof of Lemma C.1. Proof of part (i).
It holds that

    ν(S_i, φ) = Y_i E[ 1{φ(Z_i) > φ(Z_j)} − 1{g(Z_i) > g(Z_j)} | Z_i ] + E[ Y_j ( 1{φ(Z_j) > φ(Z_i)} − 1{g(Z_j) > g(Z_i)} ) | Z_i ] − E[ Y_i ( 1{φ(Z_i) > φ(Z_j)} − 1{g(Z_i) > g(Z_j)} ) ].

We make use of the fact that ‖φ − g‖_∞ ≤ δ, and thus g(z) − δ ≤ φ(z) ≤ g(z) + δ for any z in the support of Z. Following Chen et al. [2003] (pp. 1599-1600), we have

    sup_{‖φ−g‖_∞ ≤ δ} | 1{φ(Z_j) < φ(Z_i)} − 1{g(Z_j) < g(Z_i)} | ≤ | 1{g(Z_j) < φ(Z_i) + δ} − 1{g(Z_j) < g(Z_i) − δ} |,

and thus

    |ν(S_i, φ)| ≤ |Y_i| · | F_{g(Z)}(φ(Z_i) + δ) − F_{g(Z)}(g(Z_i) − δ) | + | E[Y_i | Z_i] | · | F_{g(Z)}(φ(Z_i) + δ) − F_{g(Z)}(g(Z_i) − δ) | + | E[Y_i] | · E[ | F_{g(Z)}(φ(Z_i) + δ) − F_{g(Z)}(g(Z_i) − δ) | ]
      ≲ ( |Y_i| + |E[Y_i | Z_i]| + |E[Y_i]| ) · δ,

where the last inequality follows from Assumption 6 (v), the Lipschitz continuity of the cdf of g(Z). Define M(S_i) = |Y_i| + |E[Y_i | Z_i]| + |E[Y_i]|. From Assumption 6 (iii) it follows that E[M(S_i)] < ∞, which concludes the argument.

We continue with the proof of part (ii). By Lemma C.1 (i) we have

    log N_[](ε, F_{ν,K}, L∞(S)) ≤ log N_[](ε, G_K, L∞(Z)) ≤ c K log(1/ε),

where both inequalities are due to Chen [2007] (pp. 5595 and 5601).

We conclude with the proof of part (iii). We make use of the decomposition ξ(S_i, S_j, φ) = ξ_1(S_i, S_j, φ) + ξ_2(S_i, S_j, φ), where ξ_1(S_i, S_j, φ) = Γ(S_i, S_j, φ) and

    ξ_2(S_i, S_j, φ) = −E[Γ(S_i, S_j, φ) | S_i] − E[Γ(S_i, S_j, φ) | S_j] + E[Γ(S_i, S_j, φ)].

Then

    log N(ε, F_{ξ,K}, L²(S)) ≤ log N(ε, F_{ξ_1,K}, L²(S)) + log N(ε, F_{ξ_2,K}, L²(S)).

Similar to the proof of part (ii) of Lemma C.1, we obtain log N(ε, F_{ξ_2,K}, L²(S)) ≤ c K log(1/ε) for some constant c.
Below, we follow Chapter 5 of Sherman [1993] to establish that F_{ξ_1,K} belongs to a VC-subgraph class. To this end define the subgraph

    subgraph( ξ_1(·, ·, φ) ) = { (s_i, s_j, t) ∈ supp(S)² × R : 0 < t < y_i [ 1{φ(z_i) > φ(z_j)} − 1{g(z_i) > g(z_j)} ] }
      = { y_i > 0 } ∩ { φ(z_i) − φ(z_j) > 0 } ∩ { t > 0 } ∩ { t < F_ξ(z_i, z_j) } ∩ { g(z_i) − g(z_j) < 0 }
      ∪ { y_i < 0 } ∩ { φ(z_i) − φ(z_j) < 0 } ∩ { t > 0 } ∩ { t < F_ξ(z_i, z_j) } ∩ { g(z_i) − g(z_j) > 0 },

and introduce the function

    m(t, s_i, s_j; γ_1, γ_2, π_1, π_2) := γ_1 t + γ_2 y_i + ( g(z_i), p^K(z_i) )' π_1 + ( g(z_j), p^K(z_j) )' π_2

with the associated function space M = { m(·, ·, ·; γ_1, γ_2, π_1, π_2) : γ_1 ∈ R, γ_2 ∈ R, π_1 ∈ R^{K+1}, π_2 ∈ R^{K+1} }. Note that M is a finite-dimensional vector space of dimension 2(K + 2), and the subgraph can be written as

    subgraph( ξ_1(·, ·, φ) ) = { m_1 > 0 } ∩ { m_2 > 0 } ∩ { m_3 > 0 } ∩ { m_4 > 0 } ∩ { m_5 > 0 } ∪ { m_6 > 0 } ∩ { m_7 > 0 } ∩ { m_8 > 0 } ∩ { m_9 > 0 } ∩ { m_10 > 0 },   (C.4)

with functions m_i ∈ M for i = 1, ..., 10. Following e.g. Lemmas 2.4 and 2.5 in Pakes and Pollard [1989], it can be established that subgraph( ξ_1(·, ·, φ) ) belongs to a VC-class of sets and thus the space F_{ξ_1} is a VC-class of functions. To bound the complexity of the space we require the VC-index of F_{ξ_1}, which we denote as V(F_{ξ_1}) = V(subgraph(ξ_1)).

From Pollard [1984, Lemma 18] it follows that V({m_i > 0}) ≤ 2(K + 2). Applying van der Vaart and Wellner [2009, Theorem 1.1] to (C.4) then leads to V(subgraph(ξ_1)) ≲ 2(K + 2), so the VC-index of the space F_{ξ_1} increases with the same order as the sieve dimension K. Now applying van der Vaart [1998, Theorem 2.6.7] yields

    log N(ε, F_{ξ_1,K}, L²(S)) ≤ log( C · V(F_{ξ_1}) (16e)^{V(F_{ξ_1})} (1/ε)^{V(F_{ξ_1}) − 1} )
      = log(C) + log(2(K + 2)) + 2(K + 2) log(16e) + 2(K + 2) log(1/ε),

and together with log N(ε, F_{ξ_2,K}, L²(S)) ≤ c K log(1/ε) the stated result follows.

Lemma C.2.
Under Assumptions 1--6 it holds that $\|\widehat{g} - g\|_{L^2(Z)} = o_p(1)$.

Proof of Lemma C.2. We need to check the conditions in Lemma A.2 of Chen and Pouzo [2012]. In their notation,
\[
g(k, n, \epsilon) = \inf_{\phi \in \mathcal{G}_K : \|\phi - g\|_{L^2(Z)} \ge \epsilon} |Q(\phi)|.
\]
Their condition a is thus satisfied and $g(n, k, \epsilon) > 0$. Moreover, the following holds:
\[
|Q(\Pi_K g) - Q(g)| \lesssim Q_g(\Pi_K g - g) \lesssim \tau_K^{-1} \|\Pi_K g - g\|_{L^2(Z)},
\]
and thus $Q(\Pi_K g) - Q(g) = o(1)$. Next, condition c is implicitly assumed to hold and it remains to check condition d, which translates as
\[
\frac{\max\big\{|Q(\Pi_K g) - Q(g)|,\ \sup_{\phi \in \mathcal{G}_K} |Q_n(\phi) - Q(\phi)|\big\}}{g(n, k, \epsilon)} = o(1).
\]
Analogous to the empirical process results in (C.2) and (C.3) and the subsequent arguments, it holds that $\sup_{\phi \in \mathcal{G}_K} |Q_n(\phi) - Q(\phi)| \lesssim \sqrt{K/n}$. Finally, consider that for any $\epsilon > \epsilon^* > 0$,
\[
g(k, n, \epsilon) = \inf_{\phi \in \mathcal{G}_K : \|\phi - g\|_{L^2(Z)} \ge \epsilon} |Q(\phi)| \ge \inf_{\phi \in \mathcal{G}_K : \|\phi - g\|_{L^2(Z)} \ge \epsilon^*} Q_g(\phi - g) \ge \inf_{\phi \in \mathcal{G}_K : \|\phi - g\|_{L^2(Z)} \ge \epsilon^*} \tau_K^{-1} \|\phi - g\|_{L^2(Z)} \ge \tau_K^{-1} \epsilon^*.
\]
In summary, we require that
\[
\frac{\max\big\{|Q(\Pi_K g) - Q(g)|,\ \sup_{\phi \in \mathcal{G}_K} |Q_n(\phi) - Q(\phi)|\big\}}{g(n, k, \epsilon)} \lesssim \tau_K \max\big\{\sqrt{K/n},\ \tau_K^{-1} \|\Pi_K g - g\|_{L^2(Z)}\big\} = o(1),
\]
which follows from the rate restriction in Assumption 6 (vi).

References

J. Abrevaya and J. A. Hausman. Semiparametric estimation with mismeasured dependent variables: an application to duration models for unemployment spells. Annales d'Economie et de Statistique, pages 243–275, 1999.

J. Abrevaya and J. A. Hausman. Response error in a transformation model with an application to earnings-equation estimation. The Econometrics Journal, 7(2):366–388, 2004.

J. Abrevaya and Y. Shin. Rank estimation of partially linear index models.
The Econometrics Journal, 14(3):409–437, 2011.

D. Ben-Moshe, X. D'Haultfœuille, and A. Lewbel. Identification of additive and polynomial models of mismeasured regressors without instruments. Journal of Econometrics, 200(2):207–222, 2017.

J. Berkson. Are there two regressions? Journal of the American Statistical Association, 45(250):164–180, 1950.

C. Breunig and P. Haan. Nonparametric regression with selectively missing covariates. arXiv preprint arXiv:1810.00411, 2018.

C. Breunig, E. Mammen, and A. Simoni. Nonparametric estimation in case of endogenous selection. Journal of Econometrics, 202(2):268–285, 2018.

C. Breunig, S. Huck, T. Schmidt, and G. Weizsäcker. The standard portfolio choice problem in Germany. CRC TRR 190 Discussion Paper, (171), 2019.

C. Cavanagh and R. P. Sherman. Rank estimators for monotonic index models. Journal of Econometrics, 84(2):351–381, 1998.

X. Chen. Large sample sieve estimation of semi-nonparametric models. Handbook of Econometrics, 2007.

X. Chen and D. Pouzo. Estimation of nonparametric conditional moment models with possibly nonsmooth generalized residuals. Econometrica, 80(1):277–321, 2012.

X. Chen, O. Linton, and I. Van Keilegom. Estimation of semiparametric models when the criterion function is not smooth. Econometrica, 71(5):1591–1608, 2003.

X. Chen, H. Hong, and E. Tamer. Measurement error models with auxiliary data. The Review of Economic Studies, 72(2):343–366, 2005.

X. Chen, H. Hong, and D. Nekipelov. Nonlinear models of measurement errors. Journal of Economic Literature, 49(4):901–937, 2011.

P.-A. Chiappori, I. Komunjer, and D. Kristensen. Nonparametric identification and estimation of transformation models. Journal of Econometrics, 188(1):22–39, 2015.

S. Clemencon, G. Lugosi, and N. Vayatis. Ranking and empirical minimization of U-statistics. The Annals of Statistics, 36(2):844–874, 2008.

X. D'Haultfoeuille. A new instrumental method for dealing with endogenous selection.
Journal of Econometrics, 154(1):1–15, 2010.

T. Drerup, B. Enke, and H.-M. von Gaudecker. The precision of subjective data and the explanatory power of economic models. Journal of Econometrics, 200(2):378–389, 2017.

F. Dunker, J.-P. Florens, T. Hohage, J. Johannes, and E. Mammen. Iterative estimation of solutions to noisy nonlinear operator equations in nonparametric instrumental regression. Journal of Econometrics, 178:444–455, 2014.

Y. Fan, F. Han, W. Li, and X.-H. Zhou. On rank estimators in increasing dimensions. Journal of Econometrics, 214:379–412, 2020.

A. K. Han. Non-parametric analysis of a generalized regression model: the maximum rank correlation estimator. Journal of Econometrics, 35(2-3):303–316, 1987.

J. A. Hausman, W. K. Newey, H. Ichimura, and J. L. Powell. Identification and estimation of polynomial errors-in-variables models. Journal of Econometrics, 50(3):273–295, 1991.

S. Hoderlein and J. Winter. Structural measurement errors in nonseparable models. Journal of Econometrics, 157(2):432–440, 2010.

S. Hoderlein, B. Siflinger, and J. Winter. Identification of structural models in the presence of measurement error due to rounding in survey responses. 2015.

Y. Hu and S. M. Schennach. Instrumental variable treatment of nonclassical measurement error models. Econometrica, 76(1):195–216, 2008.

G. W. Imbens and W. K. Newey. Identification and estimation of triangular simultaneous equations models without additivity. Econometrica, 77(5):1481–1512, 2009.

D. Jacho-Chavez, A. Lewbel, and O. Linton. Identification and nonparametric estimation of a transformed additively separable model. Journal of Econometrics, 156(2):392–407, 2010.

S. Khan. Two-stage rank estimation of quantile index models. Journal of Econometrics, 100(2):319–355, 2001.

A. Lewbel. An overview of the special regressor method. 2014.

R. Matzkin. Nonparametric and Semiparametric Methods in Econometrics and Statistics, chapter A Nonparametric Maximum Rank Correlation Estimator.
Cambridge: Cambridge University Press, 1991.

R. L. Matzkin. Restrictions of economic theory in nonparametric methods. Handbook of Econometrics, 4:2523–2558, 1994.

R. L. Matzkin. Nonparametric identification. Handbook of Econometrics, 6:5307–5368, 2007.

M. D. Nadai and A. Lewbel. Nonparametric errors in variables models with measurement errors on both sides of the equation. Journal of Econometrics, 191(1):19–32, 2016.

W. Newey, J. L. Powell, and F. Vella. Nonparametric estimation of triangular simultaneous equations models. Econometrica, 67(3):565–603, 1999.

D. Nolan and D. Pollard. U-processes: rates of convergence. The Annals of Statistics, 15(2):780–799, 1987.

A. Pakes and D. Pollard. Simulation and the asymptotics of optimization estimators. Econometrica, pages 1027–1057, 1989.

D. Pollard. Convergence of Stochastic Processes. Springer Series in Statistics, 1984.

S. Schennach. Instrumental variable estimation of nonlinear errors-in-variables models. Econometrica, 75(1):201–239, 2007.

S. M. Schennach. Measurement error in nonlinear models: a review. Advances in Economics and Econometrics, Theory and Applications: Tenth World Congress of the Econometric Society, 2013.

R. P. Sherman. The limiting distribution of the maximum rank correlation estimator. Econometrica, pages 123–137, 1993.

Y. Shin. Local rank estimation of transformation models with functional coefficients. Econometric Theory, 26(6):1807–1819, 2010.

G. Tang, R. J. Little, and T. E. Raghunathan. Analysis of multivariate missing data with nonignorable nonresponse. Biometrika, 90(4):747–764, 2003.

A. van der Vaart and J. Wellner. Weak Convergence and Empirical Processes: With Applications to Statistics. Springer Series in Statistics. Springer, corrected edition, 2000.

A. van der Vaart and J. Wellner. A note on bounds for VC dimensions. IMS Collections: High Dimensional Probability, 5:103–107, 2009.

A. W. van der Vaart. Asymptotic Statistics.
Cambridge University Press, 1998.

H. White and K. Chalak. Testing a conditional form of exogeneity. Economics Letters, 109(2):88–90, 2010.

J. Zhao and J. Shao. Semiparametric pseudo-likelihoods in generalized linear models with nonignorable missing data.