Rationalizing Rational Expectations: Characterizations and Tests

Abstract

In this paper, we build a new test of rational expectations based on the marginal distributions of realizations and subjective beliefs. This test is widely applicable, including in the common situation where realizations and beliefs are observed in two different datasets that cannot be matched. We show that whether one can rationalize rational expectations is equivalent to the distribution of realizations being a mean-preserving spread of the distribution of beliefs. The null hypothesis can then be rewritten as a system of many moment inequality and equality constraints, for which tests have been recently developed in the literature. The test is robust to measurement errors under some restrictions and can be extended to account for aggregate shocks. Finally, we apply our methodology to test for rational expectations about future earnings. While individuals tend to be right on average about their future earnings, our test strongly rejects rational expectations.


Xavier D’Haultfoeuille †, Christophe Gaillac ‡, Arnaud Maurel §


Keywords:

Rational expectations; Test; Subjective expectations; Data combination.

∗ This paper is based on portions of our previous working paper D’Haultfœuille et al. (2018b). We thank the co-editor Andres Santos, three anonymous referees, Peter Arcidiacono, Levon Barseghyan, Federico Bugni, Pierre Cahuc, Zhuoli Chen, Tim Christensen, Valentina Corradi, Christian Gourieroux, Nathael Gozlan, Gregory Jolivet, Max Kasy, Jia Li, Matt Masten, Magne Mogstad, Andrew Patton, Aureo de Paula, Mirko Wiederholt, Basit Zafar, Yichong Zhang and participants of various seminars and conferences for useful comments and suggestions.
† CREST-ENSAE, [email protected]. Xavier D’Haultfoeuille thanks PSE, where part of this research was conducted, for its hospitality.
‡ CREST-ENSAE and TSE, [email protected].
§ Duke University, NBER and IZA, [email protected].

arXiv [econ.EM]

1 Introduction

How individuals form their beliefs about uncertain future outcomes is critical to understanding decision making. Despite longstanding critiques (see, among many others, Pesaran, 1987; Manski, 2004), rational expectations remain by far the most popular framework to describe belief formation (Muth, 1961). This theory states that agents have expectations that do not systematically differ from the realized outcomes, and efficiently process all private information to form these expectations. Rational expectations (RE) are a key building block in many macro- and micro-economic models, and in particular in most of the dynamic microeconomic models that have been estimated over the last two decades (see, e.g., Aguirregabiria and Mira, 2010; Blundell, 2017, for recent surveys).

In this paper, we build a new test of RE. Our test only requires having access to the marginal distributions of subjective beliefs and realizations, and, as such, can be applied quite broadly. In particular, this test can be used in a data combination context, where individual realizations and subjective beliefs are observed in two different datasets that cannot be matched. Such situations are common in practice (see, e.g., Delavande, 2008; Arcidiacono, Hotz and Kang, 2012; Arcidiacono, Hotz, Maurel and Romano, 2014; Stinebrickner and Stinebrickner, 2014a; Gennaioli, Ma and Shleifer, 2016; Kuchler and Zafar, 2019; Boneva and Rauh, 2018; Biroli, Boneva, Raja and Rauh, 2020). Besides, even in surveys for which an explicit aim is to measure subjective expectations, such as the Michigan Survey of Consumers or the Survey of Consumer Expectations of the New York Fed, expectations and realizations can typically only be matched for a subset of the respondents. And of course, regardless of attrition, whenever one seeks to measure long or medium-term outcomes, matching beliefs with realizations does require waiting for a long period of time before the data can be made available to researchers.
Footnote: Situations where realizations can be perfectly predicted beforehand, such as in school choice settings where assignments are a known function of observed inputs, are notable exceptions.

The tests of RE implemented so far in this context only use specific implications of the RE hypothesis. In contrast, we develop a test that exploits all possible implications of RE. Using the key insight that we can rationalize RE if and only if the distribution of realizations is a mean-preserving spread of the distribution of beliefs, we show that rationalizing RE is equivalent to satisfying one moment equality and (infinitely) many moment inequalities. As a consequence, if these moment conditions hold, RE cannot be rejected, given the data at our disposal. By exhausting all relevant implications of RE, our test is able to detect many more violations of rational expectations than existing tests.

To develop a statistical test of RE rationalization, we build on the recent literature on inference based on moment inequalities, and more specifically, on Andrews and Shi (2017). By applying their results to our context, we show that our test controls size asymptotically and is consistent over fixed alternatives. We also provide conditions under which the test is not conservative.

We then consider several extensions to our baseline test. First, we show that by using a set of covariates that are common to both datasets, we can increase our ability to detect violations of RE. Another important issue is that of unanticipated aggregate shocks. Even if individuals have rational expectations, the mean of observed outcomes may differ from the mean of individual beliefs simply because of aggregate shocks. We show that our test can be easily adapted to account for such shocks.

Finally, we prove that our test is robust to measurement errors in the following sense. If individuals have rational expectations but both beliefs and outcomes are measured with (classical) errors, then our test does not reject RE provided that the amount of measurement errors on beliefs does not exceed the amount of intervening transitory shocks plus the measurement errors on the realized outcomes. In that specific sense, imperfect data quality does not jeopardize the validity of our test. In particular, this allows for elicited beliefs to be noisier than realized outcomes.
This provides a rationale for our test even in cases where realizations and beliefs are observed in the same dataset, since a direct test based on a regression of the outcome on the beliefs (see, e.g., Lovell, 1986) is, at least at the population level, not robust to any amount of measurement errors on the subjective beliefs.

Footnote: Interestingly, the equivalence on which we rely, which is based on Strassen’s theorem (Strassen, 1965), is also used in the microeconomic risk theory literature; see in particular Rothschild and Stiglitz (1970).

We apply our framework to test for rational expectations about future earnings. To do so, we combine elicited beliefs about future earnings with realized earnings, using data from the Labor Market module of the Survey of Consumer Expectations (SCE, New York Fed), and test whether household heads form rational expectations on their annual labor earnings. While a naive test of equality of means between earnings beliefs and realizations shows that earnings expectations are realistic in the sense of not being significantly biased, thus not rejecting the rational expectations hypothesis, our test does reject rational expectations at the 1% level. Taken together, our findings illustrate the practical importance of incorporating the additional restrictions of rational expectations that are embedded in our test. The results of our test also indicate that the RE hypothesis is more credible for certain subpopulations than others. For instance, we reject RE for individuals without a college degree, who exhibit substantial deviations from RE. On the other hand, we fail to reject the hypothesis that college-educated workers have rational expectations on their future earnings.

By developing a test of rational expectations in a setting where realizations and subjective beliefs are observed in two different datasets, we bring together the literature on data combination (see, e.g., Cross and Manski, 2002, Molinari and Peski, 2006, Fan, Sherman and Shum, 2014, Buchinsky, Li and Liao, 2019, and Ridder and Moffitt, 2007 for a survey), and the literature on testing for rational expectations in a micro environment (see, e.g., Lovell, 1986; Gourieroux and Pradel, 1986; Ivaldi, 1992, for seminal contributions).

On the empirical side, we contribute to a rapidly growing literature on the use of subjective expectations data in economics (see, e.g., Manski, 2004; Delavande, 2008; Van der Klaauw and Wolpin, 2008; Van der Klaauw, 2012; Arcidiacono, Hotz, Maurel and Romano, 2014; de Paula, Shapira and Todd, 2014; Stinebrickner and Stinebrickner, 2014b; Wiswall and Zafar, 2015).
In this paper, we show how to incorporate all of the relevant information from subjective beliefs combined with realized data to test for rational expectations.

The remainder of the paper is organized as follows. In Section 2, we present the general set-up and the main theoretical equivalences underlying our RE test. In Section 3, we introduce the corresponding statistical tests and study their asymptotic properties. Section 4 illustrates the finite sample properties of our tests through Monte Carlo simulations. Section 5 applies our framework to expectations about future earnings. Finally, Section 6 concludes. The appendix collects various theoretical extensions, additional simulation results, additional material on the application, and all the proofs. The companion R package RationalExp, described in the user guide (D’Haultfœuille, Gaillac and Maurel, 2018a), performs the test of RE.

We assume that the researcher has access to a first dataset containing the individual outcome variable of interest, which we denote by $Y$. She also observes, through a second dataset drawn from the same population, the elicited individual expectation on $Y$, denoted by $\psi$. The two datasets, however, cannot be matched. We focus on situations where the researcher has access to elicited beliefs about mean outcomes, as opposed to probabilistic expectations about the full distribution of outcomes. The type of subjective expectations data we consider in the paper has been collected in various contexts, and used in a number of prior studies (see, among others, Delavande, 2008; Zafar, 2011b; Arcidiacono, Hotz and Kang, 2012; Arcidiacono, Hotz, Maurel and Romano, 2014; Hoffman and Burks, 2020).

Formally, $\psi = \mathcal{E}[Y|\mathcal{I}]$, where $\mathcal{I}$ denotes the $\sigma$-algebra corresponding to the agent’s information set and $\mathcal{E}[\cdot|\mathcal{I}]$ is the subjective expectation operator (i.e., for any $U$, $\mathcal{E}[U|\mathcal{I}]$ is an $\mathcal{I}$-measurable random variable).
We are interested in testing the rational expectations (RE) hypothesis $\psi = E[Y|\mathcal{I}]$, where $E[\cdot|\mathcal{I}]$ is the conditional expectation operator generated by the true data generating process. Importantly, we remain agnostic throughout most of our analysis on the information set $\mathcal{I}$. Our setting is also compatible with heterogeneity in the information different agents use to form their expectations. To see this, let $(U_1, \dots, U_m)$ denote $m$ variables that agents may or may not observe when they form their expectations, and let $D_k = 1$ if $U_k$ is observed, 0 otherwise. Then, if $\mathcal{I}$ is the information set generated by $(D_1 U_1, \dots, D_m U_m)$, agents will use different subsets of the $(U_k)_{k=1,\dots,m}$ (i.e., different pieces of information) depending on the values of the $(D_k)_{k=1,\dots,m}$. Our setup encompasses a wide variety of situations, where individuals have private information and form their beliefs based on their information set. This includes various contexts where individuals form their expectations about future outcomes, including education, labor market as well as health outcomes. By remaining agnostic on the information set, our analysis complements several studies which primarily focus on testing for different information sets, while maintaining the rational expectations assumption (see Cunha and Heckman, 2007, for a survey).

It is easy to see that the RE hypothesis imposes restrictions on the joint distribution of realizations $Y$ and beliefs $\psi$. In this data combination context, the relevant question of interest is then whether one can rationalize RE, in the sense that there exists a triplet $(Y', \psi', \mathcal{I}')$ such that (i) the pair of random variables $(Y', \psi')$ is compatible with the marginal distributions of $Y$ and $\psi$; and (ii) $\psi'$ corresponds to the rational expectations of $Y'$, given the information set $\mathcal{I}'$, i.e., $E(Y'|\mathcal{I}') = \psi'$.
Hence, we consider the test of the following hypothesis:

$H_0$: there exists a pair of random variables $(Y', \psi')$ and a sigma-algebra $\mathcal{I}'$ such that $\sigma(\psi') \subset \mathcal{I}'$, $Y' \sim Y$, $\psi' \sim \psi$ and $E[Y'|\mathcal{I}'] = \psi'$,

where $\sim$ denotes equality in distribution. Rationalizing RE does not mean that the true realizations $Y$, beliefs $\psi$ and information set $\mathcal{I}$ are such that $E[Y|\mathcal{I}] = \psi$. Instead, it means that there exists a triplet $(Y', \psi', \mathcal{I}')$ consistent with the data and such that $E[Y'|\mathcal{I}'] = \psi'$. In other words, rejecting $H_0$ implies that RE does not hold, in the sense that the true realizations, beliefs, and information set do not satisfy RE ($E[Y|\mathcal{I}] \neq \psi$). The converse, however, is not true.

Let $F_\psi$ and $F_Y$ denote the cumulative distribution functions (cdf) of $\psi$ and $Y$, $x^+ = \max(0, x)$, and define

$$\Delta(y) = \int_{-\infty}^{y} \left( F_Y(t) - F_\psi(t) \right) dt.$$

Throughout most of our analysis, we impose the following regularity conditions on the distributions of realized outcomes ($Y$) and subjective beliefs ($\psi$):

Assumption 1 $E(|Y|) < \infty$ and $E(|\psi|) < \infty$.

The following preliminary result will be useful subsequently.

Lemma 1

Suppose that Assumption 1 holds. Then $H_0$ holds if and only if there exists a pair of random variables $(Y', \psi')$ such that $Y' \sim Y$, $\psi' \sim \psi$ and $E[Y'|\psi'] = \psi'$.

Lemma 1 states that in order to test for $H_0$, we can focus on the constraints on the joint distribution of $Y$ and $\psi$, and ignore those related to the information set. This is intuitive given that we impose no restrictions on this set. Our main result is Theorem 1 below. It states that rationalizing RE (i.e., $H_0$) is equivalent to a continuum of moment inequalities, and one moment equality.

Theorem 1

Suppose that Assumption 1 holds. The following statements are equivalent:
(i) $H_0$ holds;
(ii) ($F_Y$ mean-preserving spread of $F_\psi$) $\Delta(y) \geq 0$ for all $y \in \mathbb{R}$ and $E[Y] = E[\psi]$;
(iii) $E\left[(y - Y)^+ - (y - \psi)^+\right] \geq 0$ for all $y \in \mathbb{R}$ and $E[Y] = E[\psi]$.

The implication (i) ⇒ (iii) and the equivalence between (ii) and (iii) are simple to establish. The key part of the result is to prove that (iii) implies (i). To show this, we first use Lemma 1, which states that $H_0$ is equivalent to the existence of $(Y', \psi')$ such that $Y' \sim Y$, $\psi' \sim \psi$ and $E[Y'|\psi'] = \psi'$. Then the result essentially follows from Strassen’s theorem (Strassen, 1965, Theorem 8).

It is interesting to note that Theorem 1 is related to the theory of risk in microeconomic theory. In particular, using the terminology of Rothschild and Stiglitz (1970), (ii) states that realizations ($Y$) are more risky than beliefs ($\psi$). The main value of Theorem 1, from a statistical point of view, is to transform $H_0$ into the set of moment inequality (and equality) restrictions given by (iii). We show in Section 3 how to build a statistical test of these conditions.

Comparison with alternative approaches

We now compare our approach with alternative ones that have been proposed in the literature. In the following discussion, as in this whole section, we reason at the population level and thus ignore statistical uncertainty. Accordingly, the “tests” we consider here are formally deterministic, and we compare them in terms of data generating processes violating the null hypothesis associated with each of them.

Our approach can clearly detect many more violations of rational expectations than the “naive” approach based solely on the equality $E(Y) = E(\psi)$.
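The gap between the naive approach and condition (iii) of Theorem 1 can be illustrated on a toy example. The sketch below is only an illustration, not the procedure of Section 3 or the RationalExp package: the function name `moment_gaps`, the two-point samples and the grid are assumptions chosen so that the arithmetic can be checked by hand. Beliefs and outcomes have the same mean, so the naive check passes, yet beliefs are more dispersed than outcomes and the moment inequalities fail.

```python
import numpy as np


def moment_gaps(y_sample, psi_sample, grid):
    """Sample analog of E[(t - Y)^+ - (t - psi)^+] for each t in grid.

    The two samples need not be matched: only their marginal
    distributions enter the computation.
    """
    y = np.asarray(y_sample, dtype=float)
    psi = np.asarray(psi_sample, dtype=float)
    t = np.asarray(grid, dtype=float)[:, None]  # column vector for broadcasting
    return np.maximum(t - y, 0.0).mean(axis=1) - np.maximum(t - psi, 0.0).mean(axis=1)


# Hypothetical data: beliefs are spread out, outcomes are not.
beliefs = np.array([-1.0, 1.0])
outcomes = np.array([0.0, 0.0])
grid = np.linspace(-2.0, 2.0, 9)

mean_diff = outcomes.mean() - beliefs.mean()   # naive check: equality of means
gaps = moment_gaps(outcomes, beliefs, grid)    # inequalities of Theorem 1 (iii)
reverse_gaps = moment_gaps(beliefs, outcomes, grid)

print("difference in means:", mean_diff)
print("smallest moment gap:", gaps.min())
```

Swapping the roles of the two samples (`reverse_gaps`) gives non-negative gaps at every grid point, which is the configuration compatible with a mean-preserving spread.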
Our approach also detects more violations than the approach based on the restrictions $E(Y) = E(\psi)$ and $V(Y) \geq V(\psi)$ (approach based on the variance), which has been considered in particular in the macroeconomic literature on the accuracy and rationality of forecasts (see, e.g., Patton and Timmermann, 2012). On the other hand, and as expected since it relies on the joint distribution of $(Y, \psi)$, the “direct” approach for testing RE, based on $E(Y|\psi) = \psi$, can detect more violations of rational expectations than ours.

To better understand the differences between these four approaches (“naive”, variance, “direct”, and ours), it is helpful to consider important particular cases. Of course, if $\psi = E[Y|\mathcal{I}]$, individuals are rational and none of the four approaches leads to reject RE. Next, consider departures from rational expectations of the form $\psi = E[Y|\mathcal{I}] + \eta$, with $\eta$ independent of $E[Y|\mathcal{I}]$. If $E(\eta) \neq 0$, subjective beliefs are biased, and individuals are on average either over-pessimistic or over-optimistic. It follows that $E(Y) \neq E(\psi)$, implying that all four approaches lead to reject RE.

More interestingly, if $E(\eta) = 0$, individuals’ expectations are right on average, and the naive approach does not lead to reject RE. However, it is easy to show that, as long as deviations from RE are heterogeneous in the population ($V(\eta) > 0$), […] ($\eta$) relative to the uncertainty shocks ($\varepsilon = Y - E(Y|\mathcal{I})$). In other words and intuitively, we reject RE whenever departures from RE dominate the uncertainty shocks affecting the outcome.
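The role of the dispersion of $\eta$ relative to that of $\varepsilon$ can be explored by simulation. The following sketch rests on assumptions that are not in the paper: $E[Y|\mathcal{I}]$, $\varepsilon$ and $\eta$ are all drawn as independent normals, the sample size, seed and grid are arbitrary, and the helper names (`min_moment_gap`, `simulate`) are hypothetical. Beliefs and outcomes are drawn as two separate samples to mimic unmatched datasets. The quantities are computed at the sample level, with no attempt at formal inference.

```python
import numpy as np


def min_moment_gap(y, psi, n_grid=200):
    """Smallest sample analog of E[(t - Y)^+] - E[(t - psi)^+] over a grid of t."""
    grid = np.linspace(min(y.min(), psi.min()), max(y.max(), psi.max()), n_grid)
    part_y = np.maximum(grid[:, None] - y[None, :], 0.0).mean(axis=1)
    part_psi = np.maximum(grid[:, None] - psi[None, :], 0.0).mean(axis=1)
    return float((part_y - part_psi).min())


def simulate(sd_eta, n=20_000, seed=0):
    """Unmatched samples of Y = E[Y|I] + eps and psi = E[Y|I] + eta (all normal)."""
    rng = np.random.default_rng(seed)
    y = rng.normal(size=n) + rng.normal(size=n)                  # sd of eps is 1
    psi = rng.normal(size=n) + rng.normal(scale=sd_eta, size=n)  # sd of eta varies
    return y, psi


results = {}
for sd_eta in (0.5, 2.0):
    y, psi = simulate(sd_eta)
    results[sd_eta] = {
        "mean_diff": float(y.mean() - psi.mean()),  # naive approach
        "var_diff": float(y.var() - psi.var()),     # variance approach
        "min_gap": min_moment_gap(y, psi),          # moment inequalities
    }

for sd_eta, res in results.items():
    print(sd_eta, res)
```

In both designs the means are close, so the naive comparison is uninformative. When the standard deviation of $\eta$ is 2 rather than 0.5, the variance difference turns negative and the smallest moment gap becomes clearly negative, in line with the normal case discussed next, where the variance approach and the moment inequalities coincide.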
Formally, and using similar arguments as in Proposition 4 in Subsection 2.2.4, one can show that if $\varepsilon$ is independent of $E[Y|\mathcal{I}]$, we reject $H_0$ as long as the distribution of the uncertainty shocks stochastically dominates at the second-order the distribution of the deviations from RE.

Specifically, if $\varepsilon \sim N(0, \sigma_\varepsilon^2)$ and $\eta \sim N(0, \sigma_\eta^2)$, we reject RE if and only if $\sigma_\eta^2 > \sigma_\varepsilon^2$. In such a case, our approach boils down to the variance approach mentioned above: we reject whenever $V(\psi) > V(Y)$. But interestingly, if the discrepancy ($\eta$) between beliefs and RE is not normally distributed, we can reject $H_0$ even if $V(\psi) \leq V(Y)$. Suppose for instance that $\varepsilon \sim N(0, 1)$ and

$$\eta = a\left(-\mathbf{1}\{U \leq 0.1\} + \mathbf{1}\{U \geq 0.9\}\right), \quad U \sim \mathcal{U}[0, 1] \text{ and } a > 0.$$

In other words, 80% of individuals are rational, 10% are over-pessimistic and form expectations equal to $E[Y|\mathcal{I}] - a$, whereas 10% are over-optimistic and expect $E[Y|\mathcal{I}] + a$. Then one can show that our approach leads to reject RE when $a \geq$ […]. […] $a = 1.$[…] $V(\eta) \simeq$ […] $< V(\varepsilon) = 1$.

Binary outcome

Our equivalence result does not require the outcome $Y$ to be continuously distributed. In the particular case where $Y$ is binary, our test reduces to the naive test of $E(Y) = E(\psi)$. Indeed, when $Y$ is a binary outcome and $\psi \in [0, 1]$, as soon as $E(Y) = E(\psi)$, the inequalities $E\left[(y - Y)^+ - (y - \psi)^+\right] \geq 0$ hold for all $y \in \mathbb{R}$. This applies to expectations about binary events, such as, e.g., being employed or not at a given date.

Interpretation of the boundary condition

To shed further light on our test and on the interpretation of $H_0$, it is instructive to derive the distributions of $Y|\psi$ that correspond to the boundary condition ($\Delta(y) = 0$). The proposition below shows that, in the presence of rational expectations, agents whose beliefs $\psi$ lie at the boundary of $H_0$ have perfect foresight, i.e., $\psi = E[Y|\mathcal{I}] = Y$.

Proposition 1

Suppose that $(Y, \psi)$ satisfies RE, $u \mapsto F^{-1}_{Y|\psi}(\tau|u)$ is continuous for all $\tau \in (0, 1)$, and $\Delta(y) = 0$ for some $y$ in the interior of the support of $\psi$. Then the distribution of $Y$ conditional on $\psi = y$ is degenerate: $P(Y = y | \psi = y) = 1$.

Footnote: For any cdf $F$, we let $F^{-1}$ denote its quantile function, namely $F^{-1}(\tau) = \inf\{x : F(x) \geq \tau\}$.

2.2.2 Equivalence with covariates

In practice we may observe additional variables $X \in \mathbb{R}^{d_X}$ in both datasets. Assuming that $X$ is in the agent’s information set, we modify $H_0$ as follows:

$H_{0X}$: there exists a pair of random variables $(Y', \psi')$ and a sigma-algebra $\mathcal{I}'$ such that $\sigma(\psi', X) \subset \mathcal{I}'$, $Y'|X \sim Y|X$, $\psi'|X \sim \psi|X$ and $E[Y'|\mathcal{I}'] = \psi'$.

Adding covariates increases the number of restrictions that are implied by the rational expectation hypothesis, thus improving our ability to detect violations of rational expectations. Proposition 2 below formalizes this idea and shows that $H_{0X}$ can be expressed as a continuum of conditional moment inequalities, and one conditional moment equality.

Proposition 2

Suppose that Assumption 1 holds. The following two statements are equivalent:
(i) $H_{0X}$ holds;
(ii) Almost surely, $E\left[(y - Y)^+ - (y - \psi)^+ \,\middle|\, X\right] \geq 0$ for all $y \in \mathbb{R}$ and $E[Y - \psi | X] = 0$.
Moreover, if $H_{0X}$ holds, $H_0$ holds as well.

Footnote: See complementary work by Gutknecht et al. (2018), who use subjective expectations data to relax the rational expectations assumption, and propose a method to test whether specific covariates are included in the agents’ information sets.

Oftentimes, the outcome variable is affected not only by individual-specific shocks, but also by aggregate shocks. We denote by $C$ the random variable corresponding to the aggregate shocks. The issue, in this case, is that we observe a single realization of $C$ ($c$, say), along with the outcome variable conditional on that realization $C = c$. In other words, we only identify $F_{Y|C=c}$ rather than $F_Y$, as the latter would require integrating over the distribution of all possible aggregate shocks. Moreover, the restriction $E[Y | C = c, \psi] = \psi$ is generally violated, even though the rational expectations hypothesis holds. It follows that one cannot directly apply our previous results by simply replacing $F_Y$ by $F_{Y|C=c}$. In such a case, one has to make additional assumptions on how the aggregate shocks affect the outcome.

To illustrate our approach, let us consider the example of individual income. Suppose that the logarithm of income of individual $i$ at period $t$, denoted by $Y_{it}$, satisfies a Restricted Income Profile model:

$$Y_{it} = \alpha_i + \beta_t + \varepsilon_{it},$$

where $\beta_t$ captures aggregate (macroeconomic) shocks, $\varepsilon_{it}$ follows a zero-mean random walk, and $\alpha_i$, $(\beta_t)_t$ and $(\varepsilon_{it})_t$ are assumed to be mutually independent. Let $\mathcal{I}_{it-1}$ denote individual $i$’s information set at time $t - 1$, and suppose that $\mathcal{I}_{it-1} = \sigma\left(\alpha_i, (\beta_{t-k})_{k \geq 1}, (\varepsilon_{it-k})_{k \geq 1}\right)$. If individuals form rational expectations on their future outcomes, their beliefs in period $t - 1$ about period $t$ are given by

$$\psi_{it} = E[Y_{it} | \mathcal{I}_{it-1}] = \alpha_i + E\left[\beta_t \,\middle|\, (\beta_{t-k})_{k \geq 1}\right] + \varepsilon_{it-1}.$$

Thus, $Y_{it} = \psi_{it} + C_t + \varepsilon_{it} - \varepsilon_{it-1}$, with $C_t = \beta_t - E\left[\beta_t \,\middle|\, (\beta_{t-k})_{k \geq 1}\right]$. The corresponding conditional expectation is given by:

$$E[Y_{it} | \mathcal{I}_{it-1}, C_t = c_t] = \psi_{it} + c_t \neq \psi_{it}.$$

To get closer to our initial set-up, we now drop indexes $i$ and $t$ and maintain the conditioning on the aggregate shocks $C = c$ implicit. Under these conventions, rationalizing RE does not correspond to $E[Y|\mathcal{I}] = \psi$, but instead to $E[Y|\mathcal{I}] = c_0 + \psi$ for some $c_0 \in \mathbb{R}$. A similar reasoning applies to multiplicative instead of additive aggregate shocks. In such a case, the null takes the form $E[Y|\mathcal{I}] = c_0 \psi$, for some $c_0 > 0$. In these two examples, $c_0$ is identifiable: by $c_0 = E(Y) - E(\psi)$ in the additive case, by $c_0 = E(Y)/E(\psi)$ in the multiplicative case. Moreover, there exists in both cases a known function $q(y, c)$ such that $E(q(Y, c_0)) = E(\psi)$, namely $q(y, c) = y - c$ and $q(y, c) = y/c$ for additive and multiplicative shocks, respectively.

More generally, we consider the following null hypothesis for testing RE in the presence of aggregate shocks:

$H_{0S}$: there exist random variables $(Y', \psi')$, a sigma-algebra $\mathcal{I}'$ and $c_0 \in \mathbb{R}$ such that $\sigma(\psi') \subset \mathcal{I}'$, $Y' \sim Y$, $\psi' \sim \psi$ and $E[q(Y', c_0)|\mathcal{I}'] = \psi'$,

where $q(\cdot, \cdot)$ is a known function supposed to satisfy the following restrictions.

Assumption 2 $E(|\psi|) < \infty$ and for all $c$, $E(|q(Y, c)|) < \infty$. Moreover, $E[q(Y, c)] = E[\psi]$ admits a unique solution, $c_0$.

By applying our main equivalence result (Theorem 1) to $q(Y, c_0)$ and $\psi$, we obtain the following result.

Proposition 3

Suppose that Assumption 2 holds. Then the following statements are equivalent:
(i) $H_{0S}$ holds;
(ii) $E\left[(y - q(Y, c_0))^+ - (y - \psi)^+\right] \geq 0$ for all $y \in \mathbb{R}$.

A few remarks on this proposition are in order. First, this result can be extended in a straightforward way to a setting with covariates. This is important not only to increase the ability of our test to detect violations of RE, but also because this allows for aggregate shocks that differ across observable groups. We discuss this extension further, and the corresponding statistical test, in Appendix A.1. Second, in the presence of aggregate shocks, the null hypothesis does not involve a moment equality restriction anymore; the corresponding moment is used instead to identify $c_0$. Relatedly, a clear limitation of the naive test ($E(Y) = E(\psi)$) is that, unlike our test, it is not robust to aggregate shocks. In this case, rejecting the null could either stem from violations of the rational expectation hypothesis, or simply from the presence of aggregate shocks. Third, in Appendix A.2, we examine whether one can extend the results above to test for RE when aggregate shocks affect the outcomes in a more general way. Proposition 6 establishes a negative result in this respect: as long as one allows for a sufficiently flexible dependence between the outcome and the aggregate shocks, any given distribution of subjective expectations is arbitrarily close to a distribution for which RE can be rationalized. This implies that, within this more general class of outcome models, there does not exist any almost-surely continuous RE test that has non-trivial power.

2.2.4 Robustness to measurement errors

We have assumed so far that $Y$ and $\psi$ were perfectly observed; yet measurement errors in survey data are pervasive (see, e.g., Bound, Brown and Mathiowetz, 2001). We explore in the following the extent to which our test is robust to measurement errors. By robust, we mean that the test does not incorrectly reject RE when it in fact holds.
Specifically, assume that the true variables ($\psi$ and $Y$) are unobserved. Instead, we only observe $\widehat{\psi}$ and $\widehat{Y}$, which are affected by classical measurement errors. Namely:

$$\widehat{\psi} = \psi + \xi_\psi \ \text{ with } \ \xi_\psi \perp\!\!\!\perp \psi, \ E[\xi_\psi] = 0, \qquad \widehat{Y} = Y + \xi_Y \ \text{ with } \ \xi_Y \perp\!\!\!\perp Y, \ E[\xi_Y] = 0. \tag{1}$$

The following proposition shows that our test is robust to a certain degree of measurement errors on the beliefs.

Proposition 4

Suppose that $Y$ and $\psi$ satisfy $H_0$, and let $\varepsilon = Y - \psi$ and $(\widehat{\psi}, \widehat{Y})$ be defined as in (1). Suppose also that $\varepsilon + \xi_Y \perp\!\!\!\perp \psi$ and $F_{\xi_\psi}$ dominates at the second order $F_{\xi_Y + \varepsilon}$. Then $\widehat{Y}$ and $\widehat{\psi}$ satisfy $H_0$.

The key condition is that $F_{\xi_\psi}$ dominates at the second order $F_{\xi_Y + \varepsilon}$, or, equivalently here, that $F_{\xi_Y + \varepsilon}$ is a mean-preserving spread of $F_{\xi_\psi}$. Recall that in the case of normal variables, $\xi_\psi \sim N(0, \sigma_1^2)$ and $\xi_Y + \varepsilon \sim N(0, \sigma_2^2)$, this is in turn equivalent to imposing $\sigma_1^2 \leq \sigma_2^2$. Thus, even if there is no measurement error on $Y$, so that $\xi_Y = 0$, this condition may hold provided that the variance of measurement errors on $\psi$ is smaller than the variance of the uncertainty shocks on $Y$. More generally, this allows elicited beliefs to be (potentially much) noisier than realized outcomes, a setting which is likely to be relevant in practice.

One should not infer, however, that measurement errors are innocuous in our set-up. Indeed, the converse of Proposition 4 does not hold: we may reject $H_0$ with $Y$ and $\psi$, but not with $\widehat{Y}$ and $\widehat{\psi}$. As a simple example, suppose that $Y \sim N(0, \sigma_Y^2)$, $\psi \sim N(0, \sigma_\psi^2)$, $\xi_Y \sim N(0, \sigma^2)$, $\xi_\psi = 0$ and $\sigma_\psi^2 \in (\sigma_Y^2, \sigma_Y^2 + \sigma^2]$. Then, $\widehat{Y}$ and $\widehat{\psi}$ satisfy $H_0$, since $\sigma_\psi^2 \leq \sigma_Y^2 + \sigma^2$, whereas $Y$ and $\psi$ do not, since $\sigma_\psi^2 > \sigma_Y^2$. Importantly though, Proposition 4 does show that our test is conservative in the sense that measurement errors cannot result in incorrectly concluding that the RE hypothesis does not hold.

Footnote: See Zafar (2011a), who does not find evidence of non-classical measurement errors on subjective beliefs elicited from a sample of Northwestern undergraduate students. We conjecture that our test is robust to some forms of non-classical measurement errors. However, it seems difficult in this case to obtain a general result similar to the one in Proposition 4.

In situations where $(\widehat{Y}, \widehat{\psi})$ are jointly observed, one could in principle alternatively implement the direct test.
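The normal example of this subsection, in which measurement error on $Y$ masks a violation of $H_0$, can be checked numerically with a closed form. For $Z \sim N(0, s^2)$, one has $E[(t - Z)^+] = t\,\Phi(t/s) + s\,\varphi(t/s)$, so the moment gap of Theorem 1 (iii) is available without simulation. The sketch below is only illustrative: the variances ($\sigma_Y^2 = 1$, $\sigma_\psi^2 = 1.5$, $\sigma^2 = 1$), the grid and the helper names are assumptions chosen to satisfy $\sigma_\psi^2 \in (\sigma_Y^2, \sigma_Y^2 + \sigma^2]$.

```python
import math


def normal_partial_moment(t, sd):
    """E[(t - Z)^+] for Z ~ N(0, sd^2) with sd > 0: t * Phi(t/sd) + sd * phi(t/sd)."""
    z = t / sd
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return t * cdf + sd * pdf


def min_gap(sd_outcome, sd_belief, grid):
    """Smallest value over the grid of E[(t - outcome)^+] - E[(t - belief)^+]."""
    return min(
        normal_partial_moment(t, sd_outcome) - normal_partial_moment(t, sd_belief)
        for t in grid
    )


grid = [-5.0 + 0.05 * k for k in range(201)]  # points from -5 to 5

sd_y, sd_psi, sd_xi_y = 1.0, math.sqrt(1.5), 1.0   # illustrative values
sd_y_observed = math.sqrt(sd_y**2 + sd_xi_y**2)    # sd of Y + xi_Y

gap_true = min_gap(sd_y, sd_psi, grid)               # true (Y, psi)
gap_observed = min_gap(sd_y_observed, sd_psi, grid)  # observed (Y-hat, psi-hat)

print("true variables, smallest gap:", gap_true)
print("observed variables, smallest gap:", gap_observed)
```

With these values the gap is negative for the true variables and non-negative on the whole grid for the mismeasured ones, which matches the statement that the converse of Proposition 4 fails.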
In contrast to our test, however, the direct test is not robust to any measurement errors on the subjective beliefs $\psi$. Indeed, if RE holds, so that $E[Y|\psi] = \psi$, it is nevertheless the case that $E\left[\widehat{Y} \,\middle|\, \widehat{\psi}\right] \neq \widehat{\psi}$, as long as $\mathrm{Cov}(\xi_Y, \widehat{\psi}) = \mathrm{Cov}(\xi_\psi, Y) = 0$ and $V(\xi_\psi) > 0$. In other words, even if individuals have rational expectations, the direct test will reject the null hypothesis in the presence of even an arbitrarily small degree of measurement errors on the elicited beliefs.

Also, it is unclear whether, in the presence of measurement errors on the elicited beliefs and beyond the restrictions on the marginal distributions, there are restrictions on the copula of $(\widehat{Y}, \widehat{\psi})$ that are implied by RE. For instance, we show in Proposition 7 in Appendix B that under RE, and without imposing restrictions on the dependence between $\xi_Y + \varepsilon$ and $\xi_\psi$, the coefficient of the (theoretical) linear regression of $\widehat{Y}$ on $\widehat{\psi}$ remains unrestricted. On the other hand, if one assumes that $\mathrm{Cov}(\xi_Y + \varepsilon, \xi_\psi) \geq 0$ and $V(\psi)/V(\xi_\psi) \geq \lambda$ for some $\lambda \geq 0$, Proposition 7 also shows that the coefficient of the linear regression of $\widehat{Y}$ on $\widehat{\psi}$ is bounded from below under RE. Such a restriction, which does require taking a stand on the signal-to-noise ratio $V(\psi)/V(\xi_\psi)$, can be easily added to the moment inequalities of our test if $(\widehat{Y}, \widehat{\psi})$ is observed.

Footnote: There might of course possibly be additional relevant information in the higher-order moments, although we have not been able to find any.

We now briefly discuss other relevant directions in which Theorem 1 can be extended. First, another potential source of uncertainty on $\psi$ is rounding. Rounding practices by interviewees are common in the case of subjective beliefs. Under additional restrictions, it is possible in such a case to construct bounds on the true beliefs $\psi$ (see, e.g., Manski and Molinari, 2010). We show in Appendix C that our test can be generalized to accommodate this rounding practice.

Second, we have implicitly maintained the assumption so far that subjective beliefs and realized outcomes are drawn from the same population. In Appendix D, we […] $(Y_k)_{k=1,\dots,K}$ and multiple subjective beliefs $(\psi_k)_{k=1,\dots,K}$ associated with each of these outcomes. Specifically, whether one can rationalize rational expectations in this environment can be written as:

$$E(Y_k | \psi_1, \dots, \psi_K) = \psi_k, \quad \text{for all } k \in \{1, \dots, K\},$$

which, in turn, is equivalent to the distribution of the outcomes $Y_k$ being a mean-preserving spread of the distribution of the beliefs $\psi_k$. This situation arises in various contexts, including cases where respondents declare their subjective probabilities of making particular choices among $K + 1$ possible alternatives. This also arises in situations where expectations about the distribution of a continuous outcome $Y$ are elicited through questions of the form “what do you think is the percent chance that [Y] will be greater than [y]?”, for different values $(y_k)_{k=1,\dots,K}$.
In such cases, it is natural to build a RE test based on the multiple outcomes $(\mathbb{1}\{Y > y_k\})_{k=1,\dots,K}$ and subjective beliefs $(\psi_k)_{k=1,\dots,K}$, where $\psi_k$ is the subjective survival function of $Y$ evaluated at $y_k$.

We now propose a testing procedure for $H_{0X}$, which can be easily adapted to the case where no covariate common to both datasets is available to the analyst. To simplify notation, we use a potential outcome framework to describe our data combination problem. Specifically, instead of observing $(Y, \psi)$, we suppose that we observe only, in addition to the covariates $X$, $\widetilde{Y} = DY + (1 - D)\psi$ and $D$, where $D = 1$ (resp. $D = 0$) if the unit belongs to the dataset of $Y$ (resp. $\psi$). As in Subsection 2.1, we assume that the two samples are drawn from the same population, which amounts to supposing that $D \perp\!\!\!\perp (X, Y, \psi)$ (see Assumption 3-(i) below). In order to build our test, we use the characterization (ii) of Proposition 2:

$$E\left[(y - Y)^+ - (y - \psi)^+ \mid X\right] \geq 0 \ \ \forall y \in \mathbb{R} \quad \text{and} \quad E[Y - \psi \mid X] = 0.$$

Since $D \perp\!\!\!\perp (X, Y, \psi)$, these conditions can be stated in terms of $\widetilde{Y}$ only:

$$E\left[W (y - \widetilde{Y})^+ \mid X\right] \geq 0 \ \ \forall y \in \mathbb{R} \quad \text{and} \quad E\left[W \widetilde{Y} \mid X\right] = 0,$$

where $W = D/E(D) - (1 - D)/E(1 - D)$. This formulation of the null hypothesis allows us to apply the instrumental functions approach of Andrews and Shi (2017, AS), who consider the issue of testing many conditional moment inequalities and equalities. We then build on their results to establish that our test controls size asymptotically and is consistent over fixed alternatives. The initial step is to transform the conditional moments into the following unconditional moment conditions:

$$E\left[W (y - \widetilde{Y})^+ g(X)\right] \geq 0, \qquad E\left[(Y - \psi)\, g(X)\right] = 0,$$

for all $y \in \mathbb{R}$ and $g$ belonging to a suitable class of non-negative functions.

We suppose that we observe a sample $(D_i, X_i, \widetilde{Y}_i)_{i=1,\dots,n}$ of $n$ i.i.d. copies of $(D, X, \widetilde{Y})$. We consider instrumental functions $g$ that are indicators of belonging to specific hypercubes within $[0, 1]^{d_X}$, hence we transform the variables $X_i$ to lie in $[0, 1]^{d_X}$. For notational convenience, we let $\widetilde{X}_i$ denote the nontransformed vector of covariates, and redefine $X_i$ as:

$$X_i = \boldsymbol{\Phi}\left(\widehat{\Sigma}_{\widetilde{X},n}^{-1/2}\left(\widetilde{X}_i - \overline{\widetilde{X}}_n\right)\right),$$

where, for any $x = (x_1, \dots, x_{d_X})$, we let $\boldsymbol{\Phi}(x) = (\Phi(x_1), \dots, \Phi(x_{d_X}))^\top$. Here $\Phi$ denotes the standard normal cdf, $\widehat{\Sigma}_{\widetilde{X},n}$ is the sample covariance matrix of $(\widetilde{X}_i)_{i=1,\dots,n}$ and $\overline{\widetilde{X}}_n$ its sample mean.

Specifically, we consider instrumental functions $g$ belonging to the class of functions $\mathcal{G}_r = \{g_{a,r}, a \in A_r\}$, with $A_r = \{1, 2, \dots, 2r\}^{d_X}$ ($r \geq 1$), $g_{a,r}(x) = \mathbb{1}\{x \in C_{a,r}\}$ and, for any $a = (a_1, \dots, a_{d_X})^\top \in A_r$,

$$C_{a,r} = \prod_{u=1}^{d_X} \left(\frac{a_u - 1}{2r}, \frac{a_u}{2r}\right].$$

(Footnote) Other testing procedures could be used to implement our test, such as that proposed by Linton et al. (2010).

To define the test statistic $T$, we need to introduce additional notation. First, let $w_i = nD_i / \sum_{j=1}^n D_j - n(1 - D_i) / \sum_{j=1}^n (1 - D_j)$ and define, for any $y \in \mathbb{R}$,

$$m\left(D_i, \widetilde{Y}_i, X_i, g, y\right) = \begin{pmatrix} m_1\left(D_i, \widetilde{Y}_i, X_i, g, y\right) \\ m_2\left(D_i, \widetilde{Y}_i, X_i, g, y\right) \end{pmatrix} = \begin{pmatrix} w_i \left(y - \widetilde{Y}_i\right)^+ g(X_i) \\ w_i \widetilde{Y}_i\, g(X_i) \end{pmatrix}. \qquad (2)$$

Let $\overline{m}_n(g, y) = \sum_{i=1}^n m(D_i, \widetilde{Y}_i, X_i, g, y)/n$ and define similarly $\overline{m}_{n,j}$ for $j = 1, 2$. For any function $g$ and any $y \in \mathbb{R}$, we also define, for some $\epsilon > 0$,

$$\overline{\Sigma}_n(g, y) = \widehat{\Sigma}_n(g, y) + \epsilon\, \mathrm{Diag}\left(\widehat{V}(\widetilde{Y}), \widehat{V}(\widetilde{Y})\right),$$

where $\widehat{\Sigma}_n(g, y)$ is the sample covariance matrix of $\sqrt{n}\, \overline{m}_n(g, y)$ and $\widehat{V}(\widetilde{Y})$ is the empirical variance of $\widetilde{Y}$. We then denote by $\overline{\Sigma}_{n,jj}(g, y)$ ($j = 1, 2$) the $j$-th diagonal term of $\overline{\Sigma}_n(g, y)$.

Then the (Cramér-von Mises) test statistic $T$ is defined by

$$T = \sup_{y \in \widehat{\mathcal{Y}}} \sum_{r=1}^{r_n} \frac{(2r)^{-d_X}}{r^2 + 100} \sum_{a \in A_r} \left[ (1 - p) \left\{ \left( -\frac{\sqrt{n}\, \overline{m}_{n,1}(g_{a,r}, y)}{\overline{\Sigma}_{n,11}(g_{a,r}, y)^{1/2}} \right)^{+} \right\}^2 + p \left( \frac{\sqrt{n}\, \overline{m}_{n,2}(g_{a,r}, y)}{\overline{\Sigma}_{n,22}(g_{a,r}, y)^{1/2}} \right)^2 \right],$$

where $\widehat{\mathcal{Y}} = \left[\min_{i=1,\dots,n} \widetilde{Y}_i, \max_{i=1,\dots,n} \widetilde{Y}_i\right]$, $p \in (0, 1)$ is a parameter weighting the moment inequalities versus equalities and $(r_n)_{n \in \mathbb{N}}$ is a deterministic sequence tending to infinity.

To test for rational expectations in the absence of covariates, we set the instrumental function equal to the constant function $g(X) = 1$, and the test statistic is simply written as:

$$T = \sup_{y \in \widehat{\mathcal{Y}}} \left[ (1 - p) \left\{ \left( -\frac{\sqrt{n}\, \overline{m}_{n,1}(y)}{\overline{\Sigma}_{n,11}(y)^{1/2}} \right)^{+} \right\}^2 + p \left( \frac{\sqrt{n}\, \overline{m}_{n,2}(y)}{\overline{\Sigma}_{n,22}(y)^{1/2}} \right)^2 \right],$$

where, using the notation introduced above, $\overline{m}_{n,j}(y) = \overline{m}_{n,j}(1, y)$ and $\overline{\Sigma}_{n,jj}(y) = \overline{\Sigma}_{n,jj}(1, y)$ ($j = 1, 2$).

The resulting test is $\varphi_{n,\alpha} = \mathbb{1}\{T > c^*_{n,\alpha}\}$, where the estimated critical value $c^*_{n,\alpha}$ is obtained by bootstrap using, as in AS, the Generalized Moment Selection method. Specifically, we follow three steps:

1. Compute the function $\varphi_n(y, g) = (\varphi_{n,1}(y, g), 0)^\top$ for $(y, g)$ in $\widehat{\mathcal{Y}} \times \cup_{r=1}^{r_n} \mathcal{G}_r$, with

$$\varphi_{n,1}(y, g) = \overline{\Sigma}_{n,11}^{1/2} B_n\, \mathbb{1}\left\{ \frac{n^{1/2}}{\kappa_n} \overline{\Sigma}_{n,11}^{-1/2}\, \overline{m}_{n,1}(y, g) > 1 \right\},$$

and where $B_n = (b \ln(n) / \ln(\ln(n)))^{1/2}$, $b > 0$, $\kappa_n = (\kappa \ln(n))^{1/2}$, and $\kappa > 0$. To compute $\overline{\Sigma}_{n,11}$, we fix $\epsilon$ to 0.05, as in AS.

2. Let $(D_i^*, \widetilde{Y}_i^*, X_i^*)_{i=1,\dots,n}$ denote a bootstrap sample, i.e., an i.i.d. sample from the empirical cdf of $(D, \widetilde{Y}, X)$, and compute from this sample the bootstrap counterparts of $\overline{m}_n$ and $\overline{\Sigma}_n$, $\overline{m}_n^*$ and $\overline{\Sigma}_n^*$. Then compute the bootstrap counterpart of $T$, $T^*$, replacing $\overline{\Sigma}_n(y, g_{a,r})$ and $\sqrt{n}\, \overline{m}_n(y, g_{a,r})$ by $\overline{\Sigma}_n^*(y, g_{a,r})$ and $\sqrt{n}\,(\overline{m}_n^* - \overline{m}_n)(y, g_{a,r}) + \varphi_n(y, g_{a,r})$, respectively.

3. The threshold $c^*_{n,\alpha}$ is the quantile (conditional on the data) of order $1 - \alpha + \eta$ of $T^* + \eta$ for some $\eta > 0$. Following AS, we set $\eta$ to $10^{-6}$.

Note that, despite the multiple steps involved, the testing procedure remains computationally tractable. In particular, for the baseline sample we use in our application (see Section 5.1), the RE test only takes 2 minutes.

(Footnote) This CPU time is obtained using our companion R package, on an Intel Xeon CPU E5-2643, 3.30GHz with 256Gb of RAM.

We now turn to the asymptotic properties of the test. For that purpose, it is convenient to introduce additional notation. Let $\mathcal{Y}$ and $\mathcal{X}$ denote the support of $Y$ and $X$ respectively, and

$$\mathcal{L}_F = \left\{ (y, g_{a,r}) : y \in \mathcal{Y},\ (a, r) \in A_r \times \mathbb{N} : E_F\left[W (y - \widetilde{Y})^+ g_{a,r}(X)\right] = 0 \right\},$$

where, to make the dependence on the underlying probability measure explicit, $E_F$ denotes the expectation with respect to the distribution $F$ of $(D, \widetilde{Y}, X)$. Finally, let $\mathcal{F}$ denote a subset of all possible cumulative distribution functions of $(D, \widetilde{Y}, X)$ and $\mathcal{F}_0$ be the subset of $\mathcal{F}$ such that $H_{0X}$ holds. We impose the following conditions on $\mathcal{F}$ and $\mathcal{F}_0$.

Assumption 3
(i) For all $F \in \mathcal{F}$, $D \perp\!\!\!\perp (X, Y, \psi)$;
(ii) There exists $M > 0$ such that $\widetilde{Y} \in [-M, M]$ for all $F \in \mathcal{F}$. Also, $\inf_{F \in \mathcal{F}} V_F(\widetilde{Y}) > 0$ and $0 < \inf_{F \in \mathcal{F}} E_F[D] \leq \sup_{F \in \mathcal{F}} E_F[D] < 1$;
(iii) For all $F \in \mathcal{F}$, $K_F$, the asymptotic covariance kernel of $n^{-1/2}\, \mathrm{Diag}(V_F(\widetilde{Y}))^{-1/2}\, \overline{m}_n$, is in a compact set $\mathcal{K}$ of the set of all $2 \times 2$ matrix-valued covariance kernels on $\mathcal{Y} \times \cup_{r \geq 1} \mathcal{G}_r$ with uniform metric $d$ defined by

$$d(K, K') = \sup_{(y, g, y', g') \in (\mathcal{Y} \times \cup_{r \geq 1} \mathcal{G}_r)^2} \left\| K(y, g, y', g') - K'(y, g, y', g') \right\|.$$

The main result of this section is Theorem 2. It shows that, under Assumption 3, the test $\varphi_{n,\alpha}$ controls the asymptotic size and is consistent over fixed alternatives.

Theorem 2 Suppose that $r_n \to \infty$ and Assumption 3 holds. Then:
(i) $\limsup_{n \to \infty} \sup_{F \in \mathcal{F}_0} E_F[\varphi_{n,\alpha}] \leq \alpha$;
(ii) If there exists $F_0 \in \mathcal{F}_0$ such that $\mathcal{L}_{F_0}$ is nonempty and there exists $(j, y_0, g_0)$ in $\{1, 2\} \times \mathcal{L}_{F_0}$ such that $K_{F_0,jj}(y_0, g_0, y_0, g_0) > 0$, then, for any $\alpha \in [0, 1/2)$,
$$\lim_{\eta \to 0} \limsup_{n \to \infty} \sup_{F \in \mathcal{F}_0} E_F[\varphi_{n,\alpha}] = \alpha;$$
(iii) If $F \in \mathcal{F} \setminus \mathcal{F}_0$, then $\lim_{n \to \infty} E_F(\varphi_{n,\alpha}) = 1$.

Theorem 2 (i) is closely related to Theorem 5.1 and Lemma 2 in AS. It shows that the test $\varphi_{n,\alpha}$ controls the asymptotic size, in the sense that the supremum over $\mathcal{F}_0$ of its level is asymptotically lower than or equal to $\alpha$. To prove this result, the key is to establish that, under Assumption 3, the class of transformed unconditional moment restrictions that characterize the null hypothesis satisfies a manageability condition (see Pollard, 1990). Using arguments from Hsu (2016), we then exhibit cases of equality in Theorem 2 (ii), showing that, under mild additional regularity conditions, the test has asymptotically exact size (when letting $\eta$ tend to zero). Finally, Theorem 2 (iii), which is based on Theorem 6.1 in AS, shows that the test is consistent over fixed alternatives.

Extension to account for aggregate shocks. This testing procedure can be easily modified to accommodate unanticipated aggregate shocks. Specifically, using the notation defined in Section 2.2.3, we consider the same test as above after replacing $\widetilde{Y}$ by $\widetilde{Y}_{\widehat{c}} = D\, q(Y, \widehat{c}) + (1 - D)\psi$, where $\widehat{c}$ denotes a consistent estimator of $c$. The resulting test is given by $\varphi_{n,\alpha,\widehat{c}} = \mathbb{1}\{T(\widehat{c}) > c^*_{n,\alpha}\}$ (where $T(\widehat{c})$ is obtained by replacing $\widetilde{Y}$ by $\widetilde{Y}_{\widehat{c}}$ in the original test statistic). Such tests have the same properties as those above under some mild regularity conditions on $q(\cdot, \cdot)$, which hold in particular for the leading examples of additive and multiplicative shocks ($q(y, c) = y - c$ and $q(y, c) = y/c$).
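As an illustration of the shock adjustment just described, the following Python sketch builds the pooled variable $\widetilde{Y}_{\widehat{c}}$ and the indicator $D$ from two unmatched samples. It is only a sketch under stated assumptions: the function name `pooled_outcome` is hypothetical, and the estimator of $c$ (ratio of sample means for multiplicative shocks, difference of sample means for additive shocks) is an assumption motivated by the equality $E[q(Y, c)] = E[\psi]$ under the null, not necessarily the estimator used in the companion R package.

```python
import numpy as np


def pooled_outcome(y, psi, shock="multiplicative"):
    """Return (y_tilde, d, c_hat) for two unmatched samples.

    y_tilde stacks q(Y, c_hat) for the realization sample (D = 1)
    and psi for the belief sample (D = 0).
    """
    y = np.asarray(y, dtype=float)
    psi = np.asarray(psi, dtype=float)
    if shock == "multiplicative":
        # Assumed estimator: solves mean(Y / c) = mean(psi).
        c_hat = y.mean() / psi.mean()
        q = y / c_hat
    elif shock == "additive":
        # Assumed estimator: solves mean(Y - c) = mean(psi).
        c_hat = y.mean() - psi.mean()
        q = y - c_hat
    else:
        raise ValueError("shock must be 'multiplicative' or 'additive'")
    y_tilde = np.concatenate([q, psi])
    d = np.concatenate([np.ones(len(y)), np.zeros(len(psi))])
    return y_tilde, d, c_hat
```

With such a moment-based choice of $\widehat{c}$, the adjusted realizations and the beliefs have equal sample means by construction, so any rejection would have to come from the inequality constraints.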
We refer the reader to Appendix A.1 for a detailed discussion of this extension.

In the following we study the finite sample performance of the test without covariates through Monte Carlo simulations. The finite sample performance of the version of our test that accounts for covariates is reported and discussed in Appendix E.

We suppose that the outcome $Y$ is given by

$$Y = \rho\psi + \varepsilon,$$

with $\rho \in [0, 1]$, $\psi \sim \mathcal{N}(0, 1)$ and

$$\varepsilon = \zeta \left( -\mathbb{1}\{U \leq [\dots]\} + \mathbb{1}\{U \geq [\dots]\} \right),$$

where $\zeta$, $U$ and $\psi$ are mutually independent, $\zeta \sim \mathcal{N}(2, 0.1)$ and $U \sim \mathcal{U}[0, 1]$. In this setting, $E(Y \mid \psi) = \rho\psi$ and expectations are rational if and only if $\rho = 1$. But since we observe $Y$ and $\psi$ in two different datasets, there are values of $\rho \neq 1$ for which our test cannot reject the null hypothesis. More precisely, we can show that as the sample size $n$ grows to infinity, we reject the null if and only if $\rho \leq \rho^* \simeq 0.616$. The naive test based on $E(Y) = E(\psi)$ always fails to reject RE, while the RE test based on variances is only able to detect a subset of violations of RE that correspond to $\rho < [\dots]$.

The test involves the tuning parameters $b$, $\kappa$, $\epsilon$ and $\eta$ (see Section 3 for definitions). As mentioned in Section 3, we set $\epsilon = 0.05$ and $\eta = 10^{-6}$, following Andrews and Shi (2017). Andrews and Shi (2013) show that there exists in practice a large range of admissible values for the other tuning parameters. Regarding $b$ and $\kappa$, we follow Beare and Shi (2019, Section 4.2) and compute, for a grid of candidate parameters, the rejection rate under the null and under one alternative (namely, $\rho = 0.[\dots]$). We then choose $(b, \kappa)$ so as to maximize the power subject to the constraint that the rejection rate under the null is below the nominal size 0.05. That way, we obtain $b = 0.[\dots]$ and $\kappa = 0.001$. The parameter $p$ has a distinct effect, in that its choice does not affect size, at least asymptotically. Rather, this parameter selects to what extent the test aims power at the equality constraint $E(Y - \psi) = 0$ versus the inequalities $E[(y - Y)^+ - (y - \psi)^+] \geq 0$ ($y \in \mathbb{R}$). Setting $p$ to 0.05 leads to slightly higher power in our DGP, but values of $p$ in $[0, 0.31]$ provide similar finite sample performance, with power always greater than 90% of the maximal power.

Figure 1 shows the power curves of the test $\varphi_\alpha$ for five different sample sizes ($n_Y = n_\psi = n \in \{[\dots]\}$) as a function of the parameter $\rho$, using 800 simulations for each value of $\rho$. We use 500 bootstrap simulations to compute the critical values of the test.

Several remarks are in order. First, as expected, under the alternative (i.e., for values of $\rho \leq \rho^* \simeq 0.616$), [...] $n$. In particular, for the largest sample size ($n = 3{,}[\dots]$), [...] $\rho$ as large as 0.45. Second, in this setting, our test is conservative in the sense that rejection frequencies under the null are smaller than $\alpha = 0.05$, for all sample sizes. This should not necessarily come as a surprise since the test proposed by AS has been shown to be conservative in alternative finite-sample settings (see, e.g., Table 1 p. 22 in AS for the case of first-order stochastic dominance tests). However, for the version of our test that accounts for covariates and for the data generating process considered in Appendix E, rejection frequencies under the null are very close to the nominal level.

Notes: The vertical line at $\rho \simeq 0.616$ corresponds to the theoretical limit for the rejection of the null hypothesis using our test. The dotted horizontal line corresponds to the 5% level.

Figure 1: Power curves.
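The simulation design and the no-covariate statistic can be sketched in a few lines of Python. This is a minimal illustration, not the companion R package: the function names are hypothetical, the thresholds `lo` and `hi` on $U$ are placeholders (the values used in the paper are not legible in this copy), $\mathcal{N}(2, 0.1)$ is read as mean 2 and variance 0.1, and the bootstrap step that produces the critical value $c^*_{n,\alpha}$ is omitted.

```python
import numpy as np


def simulate_sample(n, rho, rng, lo=0.1, hi=0.9):
    """Draw unmatched samples of Y = rho * psi + eps and of psi.

    lo and hi are placeholder thresholds on U (assumption).
    """
    psi_of_y = rng.standard_normal(n)            # beliefs behind Y, unobserved
    zeta = rng.normal(2.0, np.sqrt(0.1), n)      # assumes 0.1 is a variance
    u = rng.uniform(size=n)
    eps = zeta * ((u >= hi).astype(float) - (u <= lo).astype(float))
    y = rho * psi_of_y + eps
    psi = rng.standard_normal(n)                 # separate belief sample
    return y, psi


def re_statistic(y, psi, p=0.05, eps=0.05, n_grid=100):
    """Test statistic T without covariates (g = 1), sup taken over a grid."""
    y_tilde = np.concatenate([y, psi])
    d = np.concatenate([np.ones(len(y)), np.zeros(len(psi))])
    n = len(y_tilde)
    w = n * d / d.sum() - n * (1 - d) / (1 - d).sum()
    var_y = y_tilde.var()
    grid = np.linspace(y_tilde.min(), y_tilde.max(), n_grid)

    # Equality moment: w_i * Y~_i (does not depend on the grid point).
    m2 = w * y_tilde
    s2 = m2.var() + eps * var_y
    t_eq = (np.sqrt(n) * m2.mean() / np.sqrt(s2)) ** 2

    # Inequality moments: w_i * (y - Y~_i)^+, one column per grid point.
    m1 = w[:, None] * np.maximum(grid[None, :] - y_tilde[:, None], 0.0)
    s1 = m1.var(axis=0) + eps * var_y
    t_ineq = np.maximum(-np.sqrt(n) * m1.mean(axis=0) / np.sqrt(s1), 0.0) ** 2

    return float(np.max((1 - p) * t_ineq + p * t_eq))
```

For example, `re_statistic(*simulate_sample(1000, 0.3, np.random.default_rng(0)))` returns the statistic for one simulated draw; comparing it with a bootstrap critical value, as in the three steps of Section 3, would complete the test.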

Using the tests developed in Section 3, we now investigate whether household heads form rational expectations on their future earnings. We use for this purpose data from the Survey of Consumer Expectations (SCE), a monthly household survey that has been conducted by the Federal Reserve Bank of New York since 2012 (see Armantier, Topa, Van der Klaauw and Zafar, 2017, for a detailed description of the survey, and Kuchler and Zafar, 2019; Conlon, Philossoph, Wiswall and Zafar, 2018; Fuster, Kaplan and Zafar, 2020 for recent articles using the SCE). The SCE is conducted with the primary goal of eliciting consumer expectations about inflation, household finance, labor market, as well as housing market. It is a rotating internet-based panel of about 1,200 household heads, in which respondents participate for up to twelve months. Each month, the panel consists of about 180 entrants, and 1,100 repeated respondents. While entrants are overall fairly similar to the repeated respondents, [...]

(Footnote) Each survey takes on average about fifteen minutes to complete, and respondents are paid \$[...]

[...] ($\psi$) over the next four months: “What do you believe your annual earnings will be in four months?”. Implicit throughout the rest of our analysis is the assumption that these elicited beliefs correspond to the mean of the subjective beliefs distribution. In this module, respondents are also asked about current job outcomes, including their current annual earnings ($Y$), through the following question: “How much do you make before taxes and other deductions at your [main/current] job, on an annual basis?”.

Specifically, we use for our baseline test the elicited earnings expectations ($\psi$), which are available for two cross-sectional samples of household heads who were working either full-time or part-time at the time of the survey, and responded to the labor market module in March 2015 and July 2015 respectively. We combine these data with current earnings ($Y$) declared in July 2015 and November 2015 by the respondents who are working full-time or part-time at the time of the survey. This leaves us with a final sample of 2,993 observations, which is composed of 1,565 earnings expectation observations, and 1,428 realized earnings observations. 51% (1,536) of these observations correspond to the sub-sample of respondents who are reinterviewed at least once. We refer to Table 1 for additional details on our sample.

(Footnote) This assumption, while often made in the subjective expectations literature, is a priori restrictive. In this application, for the vast majority of the sub-groups of the population, the mean of $\psi$ cannot be statistically distinguished from that of $Y$ (see Table 2 below). This provides empirical support for this assumption.

(Footnote) Throughout our analysis (with the exception of the number of observations reported in Table 2) we use the monthly survey weights of the SCE in order to obtain an estimation sample that is representative of the population of U.S. household heads. See Armantier et al. (2017) for more details on the construction of these weights. We also Winsorize the top 5 percentile of the distributions of realized earnings and earnings beliefs.
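The Winsorization of the top 5 percentile mentioned in the footnote above can be sketched as follows. This is an unweighted illustration with a hypothetical helper name; the application uses the SCE survey weights, which this sketch ignores.

```python
import numpy as np


def winsorize_top(x, q=0.95):
    """Cap values above the q-th empirical quantile at that quantile."""
    x = np.asarray(x, dtype=float)
    cap = np.quantile(x, q)   # unweighted quantile (assumption)
    return np.minimum(x, cap)
```

The same function would be applied separately to the realized earnings sample and to the earnings beliefs sample before pooling them.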

Table 1

                            Mean     Std. dev.
Male                        0.53     0.50
White                       0.74     0.43
College degree              0.49     0.46
Low numeracy                0.33     0.47
Tenure ≤ [...]              [...]    [...]
ψ (Earnings beliefs)        \$[...]   \$[...]
Y (Realized earnings)       \$[...]   \$[...]

We summarize how we implemented the test in practice, either on the overall sample or on each subsample corresponding to the binary covariates in Table 1. For each case, we start by winsorizing the distribution of realized earnings ($Y$) and earnings beliefs ($\psi$) at the 95% level. Then, we perform the test without covariates, where we allow for multiplicative aggregate shocks and thus test $H_{0S}$, with $q(y; c) = y/c$. To do so, we use the function test of our companion R package RationalExp. We choose the same values for the tuning parameters $b = 0.[\dots]$ and $\kappa = 0.001$ as in the Monte Carlo simulations in Section 4. We also set $p = 0.[\dots]$, $\epsilon = 0.05$, and $\eta = 10^{-6}$. Following Andrews and Shi (2017), the interval $\widehat{\mathcal{Y}}$ is approximated by a grid of length 100 from $\min_{i=1,\dots,n} \widetilde{Y}_i$ to $\max_{i=1,\dots,n} \widetilde{Y}_i$. Finally, we use 5,000 bootstrap simulations to compute the critical values of the test.

(Footnote) We show in Table 4 in Appendix F.1 that our results are robust to other levels of Winsorization.

(Footnote) In our application, the parameter $c$ is estimated using survey weights from the SCE.

(Footnote) See Section 3 in our user’s guide (D’Haultfœuille et al., 2018a) for details on this function.

In Table 2 below, we report the results from the naive test of RE ($E(Y) = E(\psi)$), and our preferred test (“Full RE”), where we allow for multiplicative aggregate shocks. We implement the tests both on the overall population and on separate subgroups. [...]

Third, the results from our test point to beliefs formation being heterogeneous across schooling (college degree vs. no college degree) and tenure (more or less than 6 months spent in current job) levels. In particular, we cannot rule out that the beliefs about future earnings of individuals with more schooling experience correspond to rational expectations with respect to some information set. Similarly, while we reject RE at any standard level for the subgroup of workers who have accumulated less than 6 months of experience in their current job, we can only marginally reject RE, at the 10% level, for those who have been in their current job for a longer period of time. As such, these findings complement some of the recent evidence from the economics of education and labor economics literatures that individuals have more accurate beliefs about their ability as they progress through their schooling and work careers (see, e.g., Stinebrickner and Stinebrickner, 2012; Arcidiacono, Aucejo, Maurel and Ransom, 2016).
(Footnote) Respondents’ numeracy is evaluated in the SCE through five questions involving computation of sales, interest on savings, chance of winning a lottery, of getting a disease and being affected by a viral infection. Respondents are then partitioned into two categories: “High numeracy” (4 or 5 correct answers), and “low numeracy” (3 or fewer correct answers).

Table 2

                     E(Y−ψ)/E(Y)   Naive RE   Variance RE   Full RE    Number of obs.
                                   (p-val)    (p-val)       (p-val)    ψ        Y
All                  0.034         0.23       0.71          <0.001     1,565    1,428
Women                0.059         0.13       0.62          <0.001     730      649
Men                  0.025         0.48       0.58          0.210      835      779
White                0.032         0.31       0.67          0.021      1,200    1,097
Minorities           0.046         0.43       0.60          0.006      365      331
College degree       -0.001        0.96       0.50          0.130      1,106    1,053
No college degree    0.093         0.04       0.57          0.013      459      375
High numeracy        0.033         0.28       0.62          0.012      1,158    1,070
Low numeracy         0.055         0.27       0.58          0.022      407      358
Tenure ≤ 6 months    [...]         [...]      [...]         <0.001     271      180
Tenure > 6 months    [...]         [...]      [...]         0.091      1,294    1,248

Notes: “Naive RE” denotes the naive RE test of equality of means between $Y$ and $\psi$. “Variance RE” denotes the variance RE test where the null hypothesis is that the variance of $Y$ is greater than or equal to the variance of $\psi$, once we account for aggregate, multiplicative shocks. “Full RE” denotes the test without covariates, where we test $H_{0S}$ with $q(y, c) = y/c$. We use 5,000 bootstrap simulations to compute the critical values of the Full RE test. Distributions of realized earnings ($Y$) and earnings beliefs ($\psi$) are both Winsorized at the 95% quantile.

Fourth, using the naive test of equality of means between earnings beliefs and realizations, one would instead generally not reject the null at any standard level. The one exception is the subgroup of workers without a college degree, for whom the naive test yields rejection of RE at the 5% level. But, as discussed before, one cannot rule out that such a rejection is due to aggregate shocks.

Even though individuals in the overall sample form expectations over their earnings in the near future that are realistic, in the sense of not being significantly biased, the result from our preferred test shows that earnings expectations are nonetheless not rational. Taken together, these findings highlight the importance of incorporating the additional restrictions of rational expectations that are embedded in our test, using the distributions of subjective beliefs and realized outcomes to detect violations of rational expectations. That the variance test of RE never rejects the null at any standard level indicates that it is important in practice to go beyond the first moments, and exploit instead the full distributions of beliefs and outcomes to detect departures from rational expectations.
These results also suggest that, in order to rationalize the realized and expected earnings data, one should consider alternative models of expectation formation that primarily differ from RE in their third, or higher-order moments.

The results of the direct test of RE on the subsample of individuals who are followed over four months are reported in Table 3 below. While these results generally paint a similar picture to the results of our test, there are some differences. In particular, the direct test rejects RE at the 5% level for men and at 1% for individuals with tenure greater than 6 months, whereas we do not reject RE for the former group and only marginally so, at the 10% level, for the latter. The direct test also rejects with less power than our test for certain groups (low numeracy, tenure lower than 6 months, and minorities). This lower power may seem surprising given that the direct test can exploit the joint distribution of $(Y, \psi)$, but is simply due to the important reduction in sample size when focusing on the subsample of individuals who are followed over four months.

There are also important issues associated with the direct test, which generally warrant caution when interpreting the results from this test. Most importantly, as already discussed in Section 2.2.4, the direct test is not robust to measurement errors on the subjective beliefs $\psi$. As shown in Proposition 7 in Appendix B, it is however possible to derive a restriction on $\beta$ under RE. Specifically, if $\xi_\psi$ is positively correlated with $\varepsilon + \xi_Y$, we have, under RE,

$$\beta \geq 1 - \frac{1}{1 + \lambda}, \qquad (3)$$

where $\lambda$ is a lower bound on the signal-to-noise ratio $V(\psi)/V(\xi_\psi)$. Table 3 also reports the results of tests combining (3) with the restrictions on the marginal distributions used in our full RE test. Adding the restriction (3) does not change the results for values of signal-to-noise ratio between 5 and 20 (i.e., for noise-to-signal ratios between 5% and 20%). Overall, using the subsample of linked data $(Y, \psi)$ through this additional restriction does not add much to our test, at least once we account for possible measurement errors on the elicited beliefs. Another significant concern with the direct test, and, more generally, the use of linked data on $(Y, \psi)$, is that attrition may be endogenous. We discuss this issue in more detail in Appendix F.2.

Table 3: Direct test, our test, and combined test of RE on annual earnings

                     β        Direct test   Full RE    Combined test            Number of obs.
                                                       (bound λ on signal/noise)
                                                       λ = [...]   λ = [...]    ψ        Y        (ψ, Y)
All                  0.954    0.[...]       <0.001     <0.[...]    <0.001       1,565    1,428    768
Women                0.956    0.[...]       <0.001     <0.[...]    <0.001       730      649      356
Men                  0.960    0.021         0.210      0.276       0.276        835      779      412
White                0.963    0.004         0.021      0.019       0.010        1,200    1,097    596
Minorities           0.928    0.010         0.006      0.007       0.005        365      331      172
College degree       0.974    0.060         0.130      0.182       0.182        1,106    1,053    560
No college degree    0.954    0.044         0.013      0.017       0.017        459      375      208
High numeracy        0.959    0.001         0.012      0.016       0.016        1,158    1,070    573
Low numeracy         0.954    0.094         0.022      0.030       0.030        407      358      195
Tenure ≤ 6 months    [...]    0.015         0.001      0.002       0.001        271      180      98
Tenure > 6 months    [...]    0.001         0.091      0.094       0.094        1,294    1,248    670

Notes: “Direct test” denotes the direct test of RE when $(\psi, Y)$ is observed. $\beta$ is the coefficient of the regression of $Y$ on $\psi$ in that case. “Full RE” denotes the test without covariates, where we test $H_{0S}$ with $q(y, c) = y/c$. We use 5,000 bootstrap simulations to compute the critical values of the Full RE test. “Combined test” denotes the test without covariates, where we test $H_{0S}$ with $q(y, c) = y/c$, which is the “Full RE” test, combined with the additional restriction $\beta \geq 1 - 1/(1 + \lambda)$, where $\lambda$ is an a priori bound on the signal-to-noise ratio. Distributions of realized earnings ($Y$) and earnings beliefs ($\psi$) are both Winsorized at the 95% quantile.

Coming back to our test, the rejection of RE for the overall population, and also for most of the subpopulations, is, in view of Proposition 4, unlikely to be due to data quality issues. In that sense, these results may be seen as robust evidence against the RE hypothesis for individual earnings, at least in this context. As a result, conclusions of behavioral models based on the assumption that agents form rational expectations about their future earnings may be misleading. Exploring this important question requires one to go beyond testing though, by quantifying the extent to which model predictions are actually sensitive to the violations of rational expectations that have been detected with our test. We investigate this issue in D’Haultfœuille et al. (2018b) in the context of a life-cycle consumption model.

Conclusion

In this paper, we develop a new test of rational expectations that can be used in a broad range of empirical settings. In particular, our test only requires having access to the marginal distributions of realizations and subjective beliefs. As such, it can be applied in frequent cases where realizations and beliefs are observed in two separate datasets, or only observed for a selected sub-population. By bypassing the need to link beliefs to future realizations, our approach also makes it possible to test for rational expectations without having to wait until the outcomes of interest are realized and made available to researchers. We establish that whether one can rationalize rational expectations is equivalent to the distribution of realizations being a mean-preserving spread of the distribution of beliefs, a condition which can be tested using recent tools from the moment inequalities literature. We show that our test can easily accommodate covariates and aggregate shocks, and, importantly for practical purposes, is robust to some degree of measurement error on the elicited beliefs. We apply our method to test for rational expectations about future earnings, using data from the Survey of Consumer Expectations. While individuals tend to be right on average about their future earnings, our test strongly rejects rational expectations.

Beyond testing, in this application as in any other situation where rational expectations are violated, a natural next step is to evaluate the deviations from rational expectations that one can rationalize from the available data. In the context of structural analysis, a central question then becomes to what extent the main predictions of the model are sensitive to those departures from rational expectations. We explore this important issue and propose in D’Haultfœuille et al. (2018b) a tractable sensitivity analysis framework on the assumed form of expectations.

References

Aguirregabiria, V. and Mira, P. (2010), ‘Dynamic discrete choice structural models: A survey’, Journal of Econometrics, 38–67.

Andrews, D. (1994), ‘Empirical process methods in econometrics’, Handbook of Econometrics, 2247–2294.

Andrews, D. and Shi, X. (2013), ‘Inference based on conditional moment inequalities’, Econometrica (2), 609–666.

Andrews, D. and Shi, X. (2017), ‘Inference based on many conditional moment inequalities’, Journal of Econometrics (2), 275–287.

Arcidiacono, P., Aucejo, E., Maurel, A. and Ransom, T. (2016), College attrition and the dynamics of information revelation. NBER Working Paper No. 22325.

Arcidiacono, P., Hotz, J. and Kang, S. (2012), ‘Modeling college major choices using elicited measures of expectations and counterfactuals’, Journal of Econometrics (1), 3–16.

Arcidiacono, P., Hotz, J. V., Maurel, A. and Romano, T. (2014), Recovering ex ante returns and preferences for occupations using subjective expectations data. NBER Working Paper No. 20626.

Armantier, O., Topa, G., Van der Klaauw, W. and Zafar, B. (2017), ‘An overview of the survey of consumer expectations’, Economic Policy Review (2), 51–72.

Beare, B. and Shi, X. (2019), ‘An improved bootstrap test of density ratio ordering’, Econometrics and Statistics, 9–26.

Bertanha, M. and Moreira, M. J. (2020), ‘Impossible inference in econometrics: Theory and applications’, Journal of Econometrics, 247–270.

Biroli, P., Boneva, T., Raja, A. and Rauh, C. (2020), ‘Parental beliefs about returns to child health investments’, Journal of Econometrics, Forthcoming.

Blundell, R. (2017), ‘What have we learned from structural models?’, American Economic Review: Papers and Proceedings (5), 287–292.

Boneva, T. and Rauh, C. (2018), ‘Parental beliefs about returns to educational investments - the later the better?’, Journal of the European Economic Association (6), 1669–1711.

Bound, J., Brown, C. and Mathiowetz, N. (2001), ‘Measurement error in survey data’, Handbook of Econometrics, 3705–3843.

Buchinsky, M., Li, F. and Liao, Z. (2019), ‘Estimation and inference of semiparametric models using data from several sources’, Journal of Econometrics, Forthcoming.

Chen, X., Linton, O. and Van Keilegom, I. (2003), ‘Estimation of semiparametric models when the criterion function is not smooth’, Econometrica (5), 1591–1608.

Conlon, J. J., Philossoph, L., Wiswall, M. and Zafar, B. (2018), Labor market search with imperfect information and learning. NBER Working Paper No. 24988.

Cross, P. J. and Manski, C. F. (2002), ‘Regressions, short and long’, Econometrica (1), 357–368.

Cunha, F. and Heckman, J. J. (2007), ‘Identifying and estimating the distributions of ex post and ex ante returns to schooling’, Labour Economics (6), 870–93.

Davydov, Y. A., Lifshits, M. A. and Smorodina, N. V. (1998), Local properties of distributions of stochastic functionals, American Mathematical Society.

de Paula, A., Shapira, G. and Todd, P. E. (2014), ‘How beliefs about HIV status affect risky behaviors: Evidence from Malawi’, Journal of Applied Econometrics (6), 944–964.

Delavande, A. (2008), ‘Pill, patch, or shot? Subjective expectations and birth control choice’, International Economic Review (3), 999–1042.

D’Haultfœuille, X., Gaillac, C. and Maurel, A. (2018a), Rationalexp: Tests of and deviations from rational expectations. Working paper.

D’Haultfœuille, X., Gaillac, C. and Maurel, A. (2018b), Rationalizing rational expectations? Tests and deviations. NBER Working Paper No. 25274.

Fan, Y., Sherman, R. and Shum, M. (2014), ‘Identifying treatment effects under data combination’, Econometrica (2), 811–822.

Fuster, A., Kaplan, G. and Zafar, B. (2020), ‘What would you do with \$[...]’, Review of Economic Studies, Forthcoming.

Gennaioli, N., Ma, Y. and Shleifer, A. (2016), ‘Expectations and investment’, NBER Macroeconomics Annual (1), 379–431.

Gourieroux, C. and Pradel, J. (1986), ‘Direct test of the rational expectation hypothesis’, European Economic Review (2), 265–284.

Gozlan, N., Roberto, C., Samson, P.-M., Shu, Y. and Tetali, P. (2018), ‘Characterization of a class of weak transport-entropy inequalities on the line’, Annales de l’IHP (3), 1667–1693.

Gutknecht, D., Hoderlein, S. and Peters, M. (2018), ‘Constrained information processing and individual income expectations’. Working paper.

Hoffman, M. and Burks, S. V. (2020), ‘Worker overconfidence: Field evidence and implications for employee turnover and returns from training’, Quantitative Economics, 315–348.

Hsu, Y.-C. (2016), ‘Consistent tests for conditional treatment effects’, The Econometrics Journal (1), 1–22.

Ivaldi, M. (1992), ‘Survey evidence on the rationality of expectations’, Journal of Applied Econometrics (3), 225–241.

Kuchler, T. and Zafar, B. (2019), ‘Personal experiences and expectations about aggregate outcomes’, Journal of Finance, 2491–2542.

Linton, O., Song, K. and Whang, Y.-J. (2010), ‘An improved bootstrap test of stochastic dominance’, Journal of Econometrics, 186–202.

Lovell, M. C. (1986), ‘Tests of the rational expectations hypothesis’, American Economic Review (1), 110–124.

Manski, C. (2004), ‘Measuring expectations’, Econometrica (5), 1329–1376.

Manski, C. and Molinari, F. (2010), ‘Rounding probabilistic expectations in surveys’, Journal of Business & Economic Statistics (2), 219–231.

Molinari, F. and Peski, M. (2006), ‘Generalization of a result on “regressions, short and long”’, Econometric Theory (1), 159–163.

Mulansky, B. and Neamtu, M. (1998), ‘Interpolation and approximation from convex sets’, Journal of Approximation Theory (1), 82–100.

Muth, J. F. (1961), ‘Rational expectations and the theory of price movements’, Econometrica (3), 315–335.

Patton, A. J. and Timmermann, A. (2012), ‘Forecast rationality tests based on multi-horizon bounds’, Journal of Business & Economic Statistics (1), 1–17.

Pesaran, M. H. (1987), The limits to rational expectations, Basil Blackwell.

Pollard, D. (1990), Empirical processes: theory and applications, in ‘NSF-CBMS regional conference series in probability and statistics’, Institute of Mathematical Statistics and the American Statistical Association, pp. i–86.

Ridder, G. and Moffitt, R. (2007), ‘The econometrics of data combination’, Handbook of Econometrics, 5469–5547.

Rothschild, M. and Stiglitz, J. (1970), ‘Increasing risk: I. A definition’, Journal of Economic Theory (3), 225–243.

Stinebrickner, R. and Stinebrickner, T. (2012), ‘Learning about academic ability and the college dropout decision’, Journal of Labor Economics (4), 707–748.

Stinebrickner, R. and Stinebrickner, T. (2014a), ‘Academic performance and college dropout: Using longitudinal expectations data to estimate a learning model’, Journal of Labor Economics (3), 601–644.

Stinebrickner, R. and Stinebrickner, T. (2014b), ‘A major in science? Initial beliefs and final outcomes for college major and dropout’, The Review of Economic Studies (1), 426–472.

Strassen, V. (1965), ‘The existence of probability measures with given marginals’, The Annals of Mathematical Statistics (2), 423–439.

Van der Klaauw, W. 
(2012), ‘On the use of expectations data in estimating structuraldynamic choice models’, Journal of Labor Economics (3), 521–554.Van der Klaauw, W. and Wolpin, K. I. (2008), ‘Social security and the retirementand savings behavior of low-income households’, Journal of Econometrics (1-2), 21–42.Van der Vaart, A. (2000),

Asymptotic statistics , Cambridge University Press.Van der Vaart, A. and Wellner, J. (1996), Weak convergence, in ‘Weak Convergenceand Empirical Processes’, Springer, pp. 16–28.Wiswall, M. and Zafar, B. (2015), ‘Determinants of college major choice: Identiﬁca-tion using an information experiment’, The Review of Economic Studies (2), 791–824.Zafar, B. (2011 a ), ‘Can subjective expectations data be used in choice models? evi-dence on cognitive biases’, Journal of Applied Econometrics (3), 520–544.Zafar, B. (2011 b ), ‘How do college students form expectations?’, Journal of LaborEconomics (2), 301–348. 34 Aggregate shocks

A.1 Statistical tests in the presence of aggregate shocks

In this appendix, we show how to adapt the construction of the test statistic and obtain similar results as in Theorem 2 in the presence of aggregate shocks. As explained in Section 2.2.3, we mostly have to replace $\widetilde{Y}$ by $\widetilde{Y}_c = D\,q(\widetilde{Y}, c) + (1 - D)\psi$. Because we include covariates here, as in Section 3, $c$ is actually a function of $X$. Also, the true function $c$ has to be estimated. We let $\widehat{c}$ denote such a nonparametric estimator, which is based on $E[q(Y, c(X)) | X] = E[\psi | X]$. When $q(y, c) = y - c$ or $q(y, c) = y/c$, we get respectively $c(X) = E(Y|X) - E(\psi|X)$ and $c(X) = E(Y|X)/E(\psi|X)$, and $\widehat{c}$ is easy to compute using nonparametric estimators of $E(Y|X)$ and $E(\psi|X)$.

Because in Proposition 3 (ii) we do not test for a moment equality anymore, $m(D_i, \widetilde{Y}_i, X_i, g, y)$ reduces to $m(D_i, \widetilde{Y}_{c,i}, X_i, g, y)$. We let hereafter $m_n(g, y) = \sum_{i=1}^n m(D_i, \widetilde{Y}_{c,i}, X_i, g, y)/n$. In the test statistic $T$, we replace, for $(y, g) \in \mathcal{Y} \times \cup_{r \geq 1} \mathcal{G}_r$, $\Sigma_n(g, y)$ by
\[ \Sigma_n(g, y) = \widehat{\Sigma}_n(g, y) + \epsilon\, \mathrm{Diag}\left(\widehat{V}(\widetilde{Y}_{\widehat{c}}), \widehat{V}(\widetilde{Y}_{\widehat{c}})\right), \]
where $\widehat{\Sigma}_n(g, y)$ and $\widehat{V}(\widetilde{Y}_{\widehat{c}})$ are respectively the sample covariance matrix of $\sqrt{n}\, m_n(g, y)$ and the empirical variance of $\widetilde{Y}_{\widehat{c}}$. The last difference with the test considered in Section 3 is that when using the bootstrap to compute the critical value, we also have to re-estimate $c$ in the bootstrap sample.

We obtain in this context a result similar to Theorem 2 above, under the regularity conditions stated in Assumption 4. We let hereafter $C^s([0,1]^{d_X})$ denote the space of continuously differentiable functions of order $s$ on $[0,1]^{d_X}$ that have a finite norm $\|c\|_{s,\infty} = \max_{|k| \leq s} \sup_{x \in [0,1]^{d_X}} |c^{(k)}(x)|$. We also let, for any function $f$ on a set $\mathcal{G}$, $\|f\|_{\mathcal{G}} = \sup_{x \in \mathcal{G}} |f(x)|$. Finally, when the distribution of $(D, \widetilde{Y}, X)$ is $F$, $K_F$ denotes the asymptotic covariance kernel of $n^{-1/2}\, \mathrm{Diag}(V(\widetilde{Y}_c))^{-1/2} m$.

Assumption 4

(i) $\widehat{c}$ and $c$ belong to $C^s([0,1]^{d_X})$, with $s \geq d_X$. Moreover, $\|\widehat{c} - c\|_{[0,1]^{d_X}} = o_P(1)$.
(ii) For all $y \in \mathcal{Y}$, $q$ is Lipschitz on $\mathcal{Y} \times [-C, C]$ for some $C > \|c\|_{[0,1]^{d_X}}$. Moreover, $\sup_{(y,c) \in \mathcal{Y} \times [-C, C]} |q(y, c)| \leq M$;
(iii) For all $c \in \mathbb{R}$, the function $q(\cdot, c) : \mathcal{Y} \to \mathcal{Y}$ is bijective and its inverse $q^I(\cdot, c)$ is Lipschitz on $\mathcal{Y}$;
(iv) $F_{\psi|X}(\cdot|x)$, $F_{Y|X}(\cdot|x)$ are Lipschitz on $\mathcal{Y}$ uniformly in $x \in [0,1]^{d_X}$ with constants $Q_{F,1}$ satisfying $\sup_{F \in \mathcal{F}} Q_{F,1} \leq Q_1 < \infty$. Also, $F_{q(\psi, c(X))}$, $F_{q(Y, c(X))}$ are Lipschitz on $[-M, M]$ with constants $Q_{F,2}$ satisfying $\sup_{F \in \mathcal{F}} Q_{F,2} \leq Q_2 < \infty$;
(v) $\inf_{F \in \mathcal{F}} V_F[\widetilde{Y}_c] > 0$ and $\epsilon \leq \inf_{F \in \mathcal{F}} E_F[D] \leq \sup_{F \in \mathcal{F}} E_F[D] \leq 1 - \epsilon$ for some $\epsilon \in (0, 1/2)$. Also, $\widehat{V}_F[\widetilde{Y}_{\widehat{c}}]$ is a consistent estimator of $V_F[\widetilde{Y}_c]$.

Part (i) imposes some regularity conditions on $c$ and its nonparametric estimator $\widehat{c}$. It is possible to check such regularity conditions on $\widehat{c}$ with kernel or series estimators of $E(Y|X)$ and $E(\psi|X)$. Parts (ii) and (iii) also hold when $q(y, c) = y - c$ and $q(y, c) = y/c$, by imposing in the second case that $c$ belongs to a compact subset of $(0, \infty)$. Proposition 5 shows that under these conditions, the test has asymptotically correct size.

Proposition 5

Suppose that $r_n \to \infty$ and that Assumptions 3 and 4 hold. Then (i) in Proposition 2 holds, replacing $\varphi_{n,\alpha}$ by $\varphi_{n,\alpha,\widehat{c}}$.

Results like (ii) and (iii) in Proposition 2 could also be obtained under the conditions of Proposition 5, modifying directly the proof of Proposition 2.
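As a rough illustration of the additive case $q(y, c) = y - c$ discussed in this appendix, the Python sketch below estimates $c(x) = E(Y|X=x) - E(\psi|X=x)$ by cell means, computes a sample analogue of $E[(t - q(Y, c))^+ | X] - E[(t - \psi)^+ | X]$ on a grid, and re-estimates $c$ on bootstrap resamples. It is not the test statistic of Section 3. The function names, the restriction to a discrete covariate, and the grid-based check are assumptions made only for this example.

```python
import numpy as np

def estimate_c(y, x_y, psi, x_psi):
    """Cell-mean estimate of c(x) = E[Y | X = x] - E[psi | X = x].

    Assumes q(y, c) = y - c and a discrete covariate X; realizations (y, x_y)
    and beliefs (psi, x_psi) may come from two unmatched samples.
    """
    cells = np.intersect1d(np.unique(x_y), np.unique(x_psi))
    return {x: y[x_y == x].mean() - psi[x_psi == x].mean() for x in cells}

def min_gap_by_cell(y, x_y, psi, x_psi, c_hat, n_grid=50):
    """Per cell, min over a grid of t of mean((t - (Y - c))^+) - mean((t - psi)^+)."""
    out = {}
    for x, c in c_hat.items():
        y_c = y[x_y == x] - c          # shifted outcome q(Y, c_hat(x))
        p = psi[x_psi == x]
        grid = np.linspace(min(y_c.min(), p.min()), max(y_c.max(), p.max()), n_grid)
        gaps = [np.maximum(t - y_c, 0.0).mean() - np.maximum(t - p, 0.0).mean()
                for t in grid]
        out[x] = min(gaps)
    return out

def bootstrap_c(y, x_y, psi, x_psi, n_boot=200, seed=0):
    """Re-estimate c on each bootstrap resample of the two samples."""
    rng = np.random.default_rng(seed)
    draws = []
    for _ in range(n_boot):
        i = rng.integers(0, len(y), len(y))      # resample realizations
        j = rng.integers(0, len(psi), len(psi))  # resample beliefs
        draws.append(estimate_c(y[i], x_y[i], psi[j], x_psi[j]))
    return draws
```

A clearly negative minimum gap in some cell is only informal evidence against the inequality constraints. Formal critical values would have to come from the bootstrap procedure of Section 3, with $c$ re-estimated in every bootstrap sample as `bootstrap_c` does.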

A.2 Impossibility results with more flexible effects of aggregate shocks

We show here that restrictions in the way aggregate shocks affect the outcome are needed to be able to reject RE with $F_Y$ and $F_\psi$. We consider for that purpose the following model:
\[ Y = \sum_{k=0}^{K} C_k V^k + \varepsilon, \qquad (4) \]
where $V$ is $\mathcal{I}$-measurable and the individual shock $\varepsilon$ satisfies $E[\varepsilon | \mathcal{I}] = 0$. The vector $C := (C_0, ..., C_K)'$ represents aggregate shocks, which is assumed to be independent of $\mathcal{I}$, with support $\mathbb{R}^{K+1}$. We also assume that $E(C) = (0, 1, 0, ..., 0)'$, so that $V = E[Y | \mathcal{I}]$ and under RE, $\psi = V$. Let $Q_c(y) = \sum_{k=0}^{K} c_k y^k$. Then $E(Y | C = c, \mathcal{I}) = Q_c(V)$ and under RE, we have $E(Y | C = c, \mathcal{I}) = Q_c(\psi)$. The corresponding null hypothesis is:

H$_{SK}$: there exist random variables $(Y', \psi')$, a sigma-algebra $\mathcal{I}'$ and $c \in \mathbb{R}^{K+1}$ such that $\sigma(\psi') \subset \mathcal{I}'$, $Y' \sim Y$, $\psi' \sim \psi$ and $E[Y' | \mathcal{I}'] = Q_c(\psi')$.

The following proposition is a negative result on the possibility to test for H$_{SK}$.

Proposition 6

Suppose that $F_Y$ and $F_\psi$ are continuous with supports that are bounded intervals. For any $\eta > 0$, there exists $K > 0$ and $F$, with $\sup_{u \in \mathbb{R}} |F(u) - F_\psi(u)| < \eta$, such that H$_{SK}$ holds with $Y$ and $\widetilde{\psi} \sim F$ (instead of $\psi$).

Proposition 6 states that as $K$ grows large, the set of cdfs $F_Y$ and $F_\psi$ satisfying H$_{SK}$ (and thus RE in Model (4)) becomes arbitrarily close, for the Kolmogorov-Smirnov metric, to the set of cdfs $F_Y$ and $F_\psi$ that do not satisfy H$_{SK}$. In other words, $\cup_{K \in \mathbb{N}}$ H$_{SK}$ is dense in the set of all continuous cdfs having bounded intervals as supports. When combined with Theorem 2 in Bertanha and Moreira (2020), this implies that there does not exist any almost-surely continuous test of $\cup_{K \in \mathbb{N}}$ H$_{SK}$ that has non-trivial power.

A similar, negative result holds if aggregate shocks are allowed to vary with respect to unobserved, individual-specific variables. For instance, shocks may be sector-specific, but sectors may be unobserved in the data. To show such an impossibility result, consider the following model:
\[ Y = q(C, U) + V + \varepsilon, \]
where both $U$ and $V$ are $\mathcal{I}$-measurable, $C$ is an aggregate shock independent of $\mathcal{I}$ and the individual shock $\varepsilon$ satisfies $E[\varepsilon | \mathcal{I}] = 0$. Thus, aggregate shocks affect the outcome in an additive way, but heterogeneously across individuals, depending on their $U$, which is assumed to be unobserved by the econometrician and can thus depend on $V$ in a flexible way. We assume without loss of generality that $E[q(C, U) | \mathcal{I}] = 0$, so that $\psi = V$ under RE. (Note that $E[q(C, U) | \mathcal{I}] = 0$ implies that $E[C_k] = 0$ for $k = 0, ..., K$, but it does not restrict the set of possible $c'_k$.) Let us also assume that $q(u, c) = \sum_{k=0}^{K} c_k u^k$ and $U = \xi V$, with $\xi > 0$, $\xi \perp\!\!\!\perp V$ and $E[\xi^k] < \infty$ for all $k \leq K$. Let $C'_k = E[\xi^k] C_k$ if $k \neq 1$, $C'_1 = E[\xi] C_1 - 1$ and $C' = (C'_0, ..., C'_K)'$. Then, under RE,
\[ E[Y | C' = c', \mathcal{I}] = \sum_{k=0}^{K} c'_k \psi^k. \]
Because $\mathrm{Supp}(C) = \mathbb{R}^{K+1}$, we also have $\mathrm{Supp}(C') = \mathbb{R}^{K+1}$, and no constraint is imposed on $c'$. As a result, we are led again to test H$_{SK}$, and the same negative result as above holds.

B Tests based on linear regressions with measurement errors

We suppose here to observe both $(\widehat{Y}, \widehat{\psi})$ satisfying (1). In this framework, we study the restrictions that RE entail on the coefficient $\beta$ of the (theoretical) linear regression of $\widehat{Y}$ on $\widehat{\psi}$. (We focus on the regression of $\widehat{Y}$ on $\widehat{\psi}$, since this regression has been very often used to test for RE. This means, however, that there may in principle be additional restrictions on the joint distribution of $(\widehat{Y}, \widehat{\psi})$ implied by RE.)

Proposition 7

1. For any values of $(V(\widehat{Y}), V(\widehat{\psi}), \mathrm{Cov}(\widehat{Y}, \widehat{\psi}))$ such that $V(\widehat{Y}) > V(\widehat{\psi})$, there exists a DGP compatible with this triple, satisfying (1), for which RE hold and such that $\varepsilon + \xi_Y \perp\!\!\!\perp \psi$ and $F_{\xi_\psi}$ dominates at the second order $F_{\xi_Y + \varepsilon}$.
2. If $\beta < 1 - 1/(1 + \lambda)$ for some $\lambda \geq 0$, there exists no DGP compatible with this value of $\beta$, satisfying (1), for which RE hold and such that $\mathrm{corr}(\xi_\psi, \xi_Y + \varepsilon) \geq 0$ and $V(\psi)/V(\xi_\psi) \geq \lambda$.

The first result is a negative one. It implies that without further restrictions than those already imposed in Proposition 4, the regression of $\widehat{Y}$ on $\widehat{\psi}$ does not bring any additional restriction related to RE. The second result, on the other hand, shows that if one assumes a positive correlation between $\xi_\psi$ and $\xi_Y + \varepsilon$ and a lower bound on the signal-to-noise ratio $V(\psi)/V(\xi_\psi)$, then $\beta$ is bounded from below under RE. The restriction $\mathrm{corr}(\xi_\psi, \xi_Y + \varepsilon) \geq 0$ is reasonable in many settings. Because $\varepsilon$ cannot be anticipated, it is natural to assume that $\mathrm{corr}(\xi_\psi, \varepsilon) = 0$. It then follows that the assumption $\mathrm{corr}(\xi_\psi, \xi_Y + \varepsilon) \geq 0$ holds when the measurement errors on $Y$ and $\psi$ are positively correlated. This would typically happen, for instance, if individuals report their expectations and realized earnings omitting in both cases some components of their earnings, or if they instead overstate their realized earnings, and their expectations accordingly.

C Tests with rounding practices

We have considered in Section 2.2.4 the possibility of measurement errors on $\psi$. Another source of uncertainty on $\psi$ is rounding. Rounding practices by interviewees are common. A way to interpret these practices is that in situations of ambiguity, individuals may only be able to bound the distribution of their future outcome $Y$ (Manski, 2004). If individuals round at 5% levels, for instance, a reported answer $\psi$ should be read as an interval around the reported value rather than as a point. Another case where only bounds on $\psi$ are observed is when questions to elicit subjective expectations take the following form: “What do you think is the percent chance that your own [$Y$] will be below [$y$]?”, for a certain grid of $y$. (Note however that in this case, our approach does not take into account all the information on the subjective distribution.) If 0 and 100 are always observed, or if we assume that the support of subjective distributions is included in $[\underline{y}, \overline{y}]$, we can still compute bounds on $\psi$. In such cases, we only observe $(\psi_L, \psi_U)$, with $\psi_L \leq \psi \leq \psi_U$. For a thorough discussion of this issue, and especially of how to infer rounding practices, see Manski and Molinari (2010).

In this setting, rationalizing rational expectations is less stringent than in our baseline set-up since the constraints on the distribution of $\psi$ are weaker. Formally, the null hypothesis takes the following form:

H$_B$: $\exists (Y', \psi', \mathcal{I}')$: $\sigma(\psi') \subset \mathcal{I}'$, $Y' \sim Y$, $F_{\psi_U} \leq F_{\psi'} \leq F_{\psi_L}$ and $E(Y' | \mathcal{I}') = \psi'$.

To obtain an equivalent formulation to H$_B$, a natural idea would be to fix a candidate cdf $F \in [F_{\psi_U}, F_{\psi_L}]$ for $F_\psi$ and apply Theorem 1 with this $F$. Then, letting $\Delta_F(y) = \int_{-\infty}^{y} F_Y(t) - F(t)\, dt$ and $\delta_F = E(Y) - \int u\, dF(u)$, H$_B$ would hold as long as for some $F \in [F_{\psi_U}, F_{\psi_L}]$, $\Delta_F(y) \geq 0$ for all $y \in \mathbb{R}$ and $\delta_F = 0$. In practice though, directly checking whether such a distribution exists would be very difficult. Fortunately, we show in the following proposition that it is in fact sufficient to check these conditions for a single, well-chosen candidate. For any $b \in \mathbb{R}$, consider the random variable
\[ \psi^b = \psi_U 1\{\psi_U < b\} + \max(b, \psi_L) 1\{\psi_U \geq b\}. \]
We also let $\psi^{-\infty} = \psi_L$ and $\psi^{\infty} = \psi_U$. Since $\psi_L \leq \psi_U$, the cdf of $\psi^b$ is then $F_b(t) = F_{\psi_U}(t) 1\{t < b\} + F_{\psi_L}(t) 1\{t \geq b\}$. We let $\mathcal{F}_B$ denote the set of these cdfs $F_b$, for $b \in \mathbb{R} \cup \{-\infty, \infty\}$.

Proposition 8

Suppose that Assumption 5 holds. First, if $E[\psi_L] \leq E[Y] \leq E[\psi_U]$, there exists a unique $F^* \in \mathcal{F}_B$ such that $\delta_{F^*} = 0$. Second, the following statements are equivalent:
(i) H$_B$ holds.
(ii) $E[\psi_L] \leq E[Y] \leq E[\psi_U]$ and $\Delta_{F^*}(y) \geq 0$ for all $y \in \mathbb{R}$.

This test shares some similarities with the test in the presence of aggregate shocks. Specifically, if $E[\psi_L] \leq E[Y] \leq E[\psi_U]$, we first identify $b \in \mathbb{R}$ such that the candidate belief $\psi^b$, which plays a similar role as the modified outcome $q(Y, c)$ in the test with aggregate shocks, satisfies the equality constraint $E[\psi^b] = E[Y]$. Noting that the inequality $\Delta_{F^*}(y) \geq 0$ is equivalent to $E[(y - Y)^+ - (y - \psi^b)^+] \geq 0$, it follows from (ii) that rationalizing RE in this context (i.e., H$_B$) is then equivalent to a set of many moment inequality constraints involving the distributions of realizations $Y$ and candidate belief $\psi^b$.

D Tests with sample selection in the datasets

We consider here cases where the two samples are not representative of the same population, or formally, $D$ is not independent of $(Y, \psi)$. This may arise for instance because of oversampling of some subpopulations or differences in nonresponse between the two surveys that are used. We assume instead that selection is conditionally exogenous, that is to say:
\[ D \perp\!\!\!\perp (Y, \psi) \,|\, X. \qquad (5) \]
We show how to use a propensity score weighting to handle such a selection. Denote by $p(x) = P(D = 1 | X = x) = E[D | X = x]$ the propensity score and by
\[ W(X) = \frac{D}{p(X)} - \frac{1 - D}{1 - p(X)}. \]
The law of iterated expectations combined with Proposition 2 directly yields the following proposition:

Proposition 9

Suppose that (5) and Assumption 1 hold. Then H$_X$ is equivalent to
\[ E\left[W(X)\,(y - \widetilde{Y})^+ \,\middle|\, X\right] \geq 0 \text{ for all } y \in \mathbb{R} \quad \text{and} \quad E\left[W(X)\,\widetilde{Y} \,\middle|\, X\right] = 0. \]

This proposition shows that under sample selection, we can build a statistical test of H$_X$ akin to that developed in Section 3, by merely estimating nonparametrically $p(X)$. We could consider for that purpose a series logit estimator, for instance. Validity of such a test would follow using very similar arguments as for the test with aggregate shocks considered above.

E Simulations with covariates

We consider here simulations including covariates. The DGP is similar to that considered in Section 4. Specifically, we assume that $Y = \rho\psi + \sqrt{X}\varepsilon$, with $\rho \in [0, 1]$, $\psi \sim \mathcal{N}(0, 1)$, $X \sim \mathrm{Beta}(0.[...], 10)$ and $\varepsilon = \zeta(-1\{U \leq [...]\} + 1\{U \geq [...]\})$, where $\zeta \sim \mathcal{N}(2, 0.1)$ and $U \sim \mathcal{U}[0, 1]$. $(\psi, \zeta, U, X)$ are supposed to be mutually independent. Like in the test without covariates, we can show that the test with covariates is able to reject RE if and only if $\rho < [...]$. Here $E[Y|X] = E[\psi|X]$, so the naive conditional test has no power. The test based on conditional variances rejects only if $\rho < [...]$. Without conditioning on $X$, our test has power only for $\rho < 0.52$. Hence, relying on covariates allows us to gain power for $\rho \in [...]$. We consider $n_\psi = n_Y = n \in \{[...]\}$, use 500 bootstrap simulations to compute the critical value, and rely on 800 Monte-Carlo replications for each value of $\rho$ and $n$. We use the same parameters $p = 0.05$ and $b = 0.[...]$.

Figure 2: Power curves for the test with covariates.
Notes: the dotted vertical lines correspond to the theoretical limit for the rejection of the null hypothesis for the test based on variance ($\rho \simeq [...]$) and for the RE tests ($\rho \simeq [...]$ and $\rho = 0.[...]$).

Figure 2 shows that the RE test with covariates asymptotically outperforms the RE test without covariates. The test exhibits a similar behavior as that without covariates, though, as we could expect, the power converges less quickly to one as $n$ tends to infinity.

F Additional material on the application

F.1 Effect of the Winsorization on the RE test

Table 4: Full test of RE with different levels of Winsorization

Winsorization level    0.95         0.97         0.99
                       (p-value)    (p-value)    (p-value)
All                    < [...]      < [...]      < [...]
[...]

Notes: [...] H$_S$ with $q(y, c) = y/c$, using 5,000 bootstrap simulations to compute the critical values. Distributions of realized earnings ($Y$) and earnings beliefs ($\psi$) are both Winsorized at either the 0.95, 0.97, or 0.99 quantile.

F.2 Possibly endogenous attrition in the survey

In addition to measurement errors, another potential issue when using the linked data $(Y, \psi)$ is that attrition may be related to $Y$ itself. This would create a sample selection issue that would invalidate the direct test, even absent any measurement errors. To explore this possibility, Table 5 below reports the estimation results from a logit model of attrition on earnings beliefs, gender, race/ethnicity, college degree attainment, numeracy test score, tenure and a (linear) time trend. The main takeaway from this table is that earnings beliefs $\psi$ are significantly associated with attrition, even after controlling for this extensive set of characteristics. This result suggests that individuals for whom we observe both earnings expectations and realizations are likely to earn more than those who are not followed across the two waves. Along the same lines, a Kolmogorov-Smirnov test rejects at the 1% level the equality of the distributions of realized earnings between the whole sample and the subsample that would be used for the direct test. Similarly, we reject the equality of the distributions of expected earnings between these two samples. These results indicate that, in this context, the direct RE test is likely to be misleading. Conversely, attrition is unlikely to be an issue with our test, since we use in each wave the observations of all respondents. (The one assumption we need to make is that respondents in the surveys used to measure $\psi$ (i.e., those of March and July 2015) are drawn from the same population as those from the surveys used to measure $Y$ (i.e., those of July and November 2015). That there is no significant time trend in the attrition model (Table 5) suggests that this assumption is reasonable in this context.)

Table 5: Logit model of attrition

Columns:          Intercept, ψ, Male, White, Coll. Degree, Low Num., Tenure > [...], [...]
Coefficients:     [...]∗∗  -6.206e-06∗∗  [...]∗∗  [...]  -0.040
Standard errors:  (0.293) (1.621e-06) (0.138) (0.222) (0.139) (0.162) (0.164) (0.033)

Notes: 1,565 observations. Significance levels: †: 10%, ∗: 5%, ∗∗: 1%.

G Proofs

G.1 Notation and preliminaries

For any set $\mathcal{G}$, let us denote by $l^\infty(\mathcal{G})$ the collection of all uniformly bounded real functions on $\mathcal{G}$ equipped with the supremum norm $\|f\|_{\mathcal{G}} = \sup_{x \in \mathcal{G}} |f(x)|$. Denote by $L_2(F)$ the square integrable space with respect to the measure associated with $F$, and let $\|\cdot\|_{F,2}$ be the corresponding norm. We let $N(\epsilon, \mathcal{T}, L_2(F))$ denote the minimal number of $\epsilon$-balls with respect to $\|\cdot\|_{F,2}$ needed to cover $\mathcal{T}$. An $\epsilon$-bracket (with respect to $F$) is a pair of real functions $(l, u)$ such that $l \leq u$ and $\|u - l\|_{F,2} \leq \epsilon$. Then, for any set of real functions $\mathcal{M}$, we let $N_{[\,]}(\epsilon, \mathcal{M}, L_2(F))$ denote the minimum number of $\epsilon$-brackets needed to cover $\mathcal{M}$. We denote by $\mathcal{G} = (\cup_{r \geq 1} \mathcal{G}_r)$. For $x \in \mathbb{R}^d$, $d > 1$, we denote by $\|x\|_\infty = \max_{j=1,...,d} |x_j|$.

For a sequence of random variables $(U_n)_{n \in \mathbb{N}}$ and a set $\mathcal{F}$, we say that $U_n = O_P(1)$ uniformly in $F \in \mathcal{F}$ if for any $\epsilon > 0$ there exist $M > 0$ and $n_0 > 0$ such that $\sup_{F \in \mathcal{F}} P_F(|U_n| > M) < \epsilon$ for all $n > n_0$. Similarly we say that $U_n = o_P(1)$ uniformly in $F \in \mathcal{F}$ if for any $\epsilon > 0$, $\sup_{F \in \mathcal{F}} P_F(|U_n| > \epsilon) \to 0$. Bootstrap counterparts are denoted with a star, e.g., $T^*$ versus $T$. We define $o_{P^*}$ and $O_{P^*}$ as above, but conditional on $(\widetilde{Y}_i, D_i, X_i)_{i=1...n}$. Convergence in distribution conditional on $(\widetilde{Y}_i, D_i, X_i)_{i=1...n}$ is denoted by $\to_{d^*}$.

G.2 Proof of Lemma 1

Under H$_0$, there exist $Y'$, $\psi'$ and $\mathcal{I}'$ such that $Y' \sim Y$, $\psi' \sim \psi$, $\sigma(\psi') \subset \mathcal{I}'$ and $E(Y' | \mathcal{I}') = \psi'$. Then, by the law of iterated expectations,
\[ E[Y' | \psi'] = E[E[Y' | \mathcal{I}'] | \psi'] = E[\psi' | \psi'] = \psi'. \]
Conversely, if there exists $(Y', \psi')$ such that $Y' \sim Y$, $\psi' \sim \psi$ and $E[Y' | \psi'] = \psi'$, let $\mathcal{I}' = \sigma(\psi')$. Then $\psi' = E[Y' | \psi'] = E[Y' | \mathcal{I}']$ and H$_0$ holds.

G.3 Proof of Theorem 1

(i) ⇔ (iii). By Strassen’s theorem (Strassen, 1965, Theorem 8), the existence of $(Y, \psi)$ with margins equal to $F_Y$ and $F_\psi$ and such that $E[Y | \psi] = \psi$ is equivalent to $\int f\, dF_\psi \leq \int f\, dF_Y$ for every convex function $f$. By, e.g., Proposition 2.3 in Gozlan et al. (2018), this is, in turn, equivalent to (iii).

(ii) ⇔ (iii). By Fubini-Tonelli’s theorem,
\[ \int_{-\infty}^{y} F_Y(t)\, dt = E\left[\int_{-\infty}^{y} 1\{t \geq Y\}\, dt\right] = E[(y - Y)^+]. \]
The same holds for $\psi$. Hence, $\Delta(y) \geq 0$ for all $y \in \mathbb{R}$ is equivalent to $E[(y - Y)^+] \geq E[(y - \psi)^+]$ for all $y \in \mathbb{R}$. The result follows.

G.4 Proof of Proposition 1

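First, by Jensen’s inequality, we obtain
\[ E[(y_0 - Y)^+ | \psi] \geq (y_0 - E(Y | \psi))^+ = (y_0 - \psi)^+. \]
Moreover, $\Delta(y_0) = 0$ implies that $E((y_0 - Y)^+) = E((y_0 - \psi)^+)$. Hence, almost surely, we have $E[(y_0 - Y)^+ | \psi] = (y_0 - \psi)^+$. Since equality holds in Jensen’s inequality and $x \mapsto (y_0 - x)^+$ is linear on each side of $y_0$ only, for almost all $u$ we either have $\mathrm{Supp}(Y | \psi = u) \subset [y_0, \infty)$ or $\mathrm{Supp}(Y | \psi = u) \subset (-\infty, y_0]$. Because $E[Y | \psi] = \psi$, $\mathrm{Supp}(Y | \psi = u) \subset [y_0, \infty)$ for almost all $u > y_0$ and $\mathrm{Supp}(Y | \psi = u) \subset (-\infty, y_0]$ for almost all $u < y_0$. Then, for all $\tau \in (0, 1)$, $F^{-1}_{Y|\psi}(\tau | u) \geq y_0$ for almost all $u \geq y_0$ and $F^{-1}_{Y|\psi}(\tau | u) \leq y_0$ for almost all $u \leq y_0$. Thus, for all $\tau \in (0, 1)$, by continuity of $F^{-1}_{Y|\psi}(\tau | \cdot)$, $F^{-1}_{Y|\psi}(\tau | y_0) = y_0$. This implies that $Y | \psi = y_0$ is degenerate.

G.5 Proof of Proposition 2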

We first prove that H$_X$ is equivalent to the existence of $(Y', \psi')$ such that $DY' + (1 - D)\psi' = \widetilde{Y}$, $D \perp\!\!\!\perp (Y', \psi') | X$ and $E(Y' | \psi', X) = \psi'$. First, under H$_X$, there exists $(Y', \psi', \mathcal{I}')$ such that $DY' + (1 - D)\psi' = \widetilde{Y}$, $D \perp\!\!\!\perp (Y', \psi') | X$, $\sigma(\psi', X) \subset \mathcal{I}'$ and $E(Y' | \mathcal{I}') = \psi'$. Then
\[ E[Y' | \psi', X] = E[E[Y' | \mathcal{I}'] | \psi', X] = E[\psi' | \psi', X] = \psi'. \]
Conversely, if there exists $(Y', \psi')$ such that $DY' + (1 - D)\psi' = \widetilde{Y}$, $D \perp\!\!\!\perp (Y', \psi') | X$ and $E(Y' | \psi', X) = \psi'$, let $\mathcal{I}' = \sigma(X, \psi')$. Then $\psi' = E(Y' | \psi', X) = E(Y' | \mathcal{I}')$ and H$_X$ holds. The proposition then follows as Theorem 1.

G.6 Proof of Proposition 4

For all $y$, $\xi \mapsto E[(y - \psi - \xi)^+]$ is decreasing and convex. Then, because $F_{\xi_\psi}$ dominates at the second order $F_{\xi_Y + \varepsilon}$, we have
\[ \int E[(y - \psi - \xi)^+]\, dF_{\varepsilon + \xi_Y}(\xi) \geq \int E[(y - \psi - \xi)^+]\, dF_{\xi_\psi}(\xi). \]
As a result, for all $y$, we obtain
\[ E[(y - \widehat{Y})^+] = \int E[(y - \psi - \varepsilon - \xi_Y)^+ | \varepsilon + \xi_Y = \xi]\, dF_{\varepsilon + \xi_Y}(\xi) = \int E[(y - \psi - \xi)^+]\, dF_{\varepsilon + \xi_Y}(\xi) \geq \int E[(y - \psi - \xi)^+]\, dF_{\xi_\psi}(\xi) = E[(y - \widehat{\psi})^+]. \]
Moreover, $E(\widehat{Y}) = E(\widehat{\psi})$. By Theorem 1, $\widehat{Y}$ and $\widehat{\psi}$ satisfy H$_0$.

G.7 Proof of Theorem 2

(i) This is a particular case of Proposition 5 below, with $q(Y, c) = Y$. The proof is therefore omitted.

(ii) We show that equality holds for $F \in \mathcal{F}$ satisfying the conditions stated in (ii). The proof is divided in three steps. We first prove convergence in distribution of $T$ to $S$ defined below, and conditional convergence of $T^*$ towards the same limit. Then we show that the cdf $H$ of $S$ is continuous and strictly increasing in the neighborhood of its quantile of order $1 - \alpha$, for any $\alpha \in (0, 1/2)$. The third step concludes.

1. Convergence in distribution of $T$ and $T^*$.

Let us introduce some notation. Let $K_{j,j}$ ($j \in \{1, 2\}$) be the $j$-th diagonal element of the covariance kernel $K$,
\[ S : (\nu, K) \mapsto (1 - p)\left(-\nu_1 / K_{1,1}^{1/2}\right)^{+2} + p\left(\nu_2 / K_{2,2}^{1/2}\right)^{2}, \]
$q(r) = (r^2 + 100)^{-1}(2r)^{-d_X}$, and
\[ \nu_{n,F}(y, g) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \mathrm{Diag}\left(V_F(\widetilde{Y})\right)^{-1/2} \left(m(D_i, \widetilde{Y}_i, X_i, g, y) - E_F\left[m(D_i, \widetilde{Y}_i, X_i, g, y)\right]\right). \]
Finally, we define
\[ k_{n,F}(y, g) = \sqrt{n}\, \mathrm{Diag}\left(V_F(\widetilde{Y})\right)^{-1/2} E_F\left[m(D_i, \widetilde{Y}_i, X_i, g, y)\right], \]
\[ K_{n,F}(y, g, y', g') = \mathrm{Diag}\left(V_F(\widetilde{Y})\right)^{-1/2}\, \widehat{\mathrm{Cov}}\left(\sqrt{n}\, m_n(y, g), \sqrt{n}\, m_n(y', g')\right) \mathrm{Diag}\left(V_F(\widetilde{Y})\right)^{-1/2}, \]
\[ \overline{K}_{n,F}(y, g, y', g') = K_{n,F}(y, g, y', g') + \epsilon\, \mathrm{Diag}\left(V_F(\widetilde{Y})\right)^{-1/2} \mathrm{Diag}\left(\widehat{V}(\widetilde{Y})\right) \mathrm{Diag}\left(V_F(\widetilde{Y})\right)^{-1/2}, \]
and use the notations $K_{n,F}(y, g) = K_{n,F}(y, g, y, g)$ and $\overline{K}_{n,F}(y, g) = \overline{K}_{n,F}(y, g, y, g)$. We have, by definition of $T$,
\[ T = \sup_{y \in \mathcal{Y}} \sum_{(a,r):\, r \in \{1, ..., r_n\},\, a \in A_r} q(r)\, S\left(\nu_{n,F}(y, g_{a,r}) + k_{n,F}(y, g_{a,r}), \overline{K}_{n,F}(y, g_{a,r})\right). \]
To characterize the distribution of $T$ (resp. $T^*$), we first prove the convergence of $\nu_{n,F}$ and $\overline{K}_{n,F}(y, g_{a,r})$ (resp. $\nu^*_{n,F}$ and $\overline{K}^*_{n,F}(y, g_{a,r})$). For those purposes, we use a class of functions which is a general form taken by $m$ defined in (2), namely, for any $0 < N < M$,
\[ \mathcal{M} = \left\{ f_{y,\phi_1,\phi_2,g}(\widetilde{y}, x, d) = \left(d\phi_1 (y - \widetilde{y})^+ - (1 - d)\phi_2 (y - \widetilde{y})^+\right) g(x),\ (y, \phi_1, \phi_2, g) \in \mathcal{Y} \times [N, M]^2 \times \mathcal{G} \right\}. \]
The class $\mathcal{M}$ is a particular case of classes $\mathcal{M}$ defined in (9) below. Then, by the proof of Proposition 5 below, Assumptions PS1 and PS2 in AS are satisfied. Thus, the assumptions of Lemma D.2 in AS hold as well. This entails that Assumptions PS4 and PS5 in AS hold. Namely, there exists a Gaussian process $\nu_F$ such that
- $\nu_{n,F} \to_d \nu_F$ and $\nu^*_{n,F} \to_{d^*} \nu_F$;
- For all $r \in \mathbb{N}$ and $(y, g) \in \mathcal{Y} \times \mathcal{G}_r$, $\overline{K}_{n,F}(y, g) \to_P K_F(y, g) + \epsilon I_2$ and $\overline{K}^*_{n,F}(y, g) \to_{P^*} K_F(y, g) + \epsilon I_2$, where $I_2$ is the $2 \times 2$ identity matrix.

Letting $k_F(y, g)$ denote the limit in probability of $k_{n,F}(y, g)$, we have $k_F(y, g) = 0$ if $(y, g) \in \mathcal{L}_F$ and $\infty$ otherwise. Note that by assumption, the set $\mathcal{L}_F$ is nonempty. Thus, using (D.11) in the proof of Theorem D.3. in AS, which is based on the uniform continuity of the function $S$ in the sense of Assumption S2 therein, we have, under $F$,
\[ T \to_d \sup_{y \in \mathcal{Y}} \sum_{(a,r) \in A_r \times \mathbb{N}} S\left(\nu_F(y, g_{a,r}) + k_F(y, g_{a,r}), K_F(y, g_{a,r}) + \epsilon I_2\right) = S := \sup_{y \in \mathcal{Y}} \sum_{(a,r):\, (y, g_{a,r}) \in \mathcal{L}_F} q(r)\, S\left(\nu_F(y, g_{a,r}), K_F(y, g_{a,r}) + \epsilon I_2\right), \]
where the equality follows by definition of $S$ and $k_F(y, g)$. Similarly, using Assumption PS5 and (D.11) in AS, replacing $T$ by $T^*$ and quantities $\nu_{n,F}(y, g_{a,r})$ and $\overline{K}_{n,F}(y, g_{a,r})$ by their bootstrap counterparts (see the proof of Lemma D.4 in AS) we have $T^* \to_{d^*} S$.

2. The cdf $H$ of $S$ is continuous and strictly increasing in the neighborhood of any of its quantiles of order $1 - \alpha > 1/2$.

First, the cdf $H$ of $S$ is a convex functional of the Gaussian process $\nu_F$. Then, as in the proof of Lemma B3 in Andrews and Shi (2013), we can use Theorem 11.1 of Davydov et al. (1998) p.75 to show that $H$ is continuous and strictly increasing at every point of its support except $r_0 = \inf\{r \in \mathbb{R} : H(r) > 0\}$. Moreover, for any $r > 0$,
\[ H(r) \geq P\left(\sup_{y \in \mathcal{Y}} \sum_{(a,r):\, (y, g_{a,r}) \in \mathcal{L}_F} q(r)\, S\left(\nu_F(y, g_{a,r}), K_F(y, g_{a,r}) + \epsilon I_2\right) < r\right) \geq P\left(\sup_{j \in \{1,2\},\, (y,a,r):\, (y, g_{a,r}) \in \mathcal{L}_F} \left|\left(K_{F,j,j}(y, g_{a,r}) + \epsilon\right)^{-1/2} \nu_{F,j}(y, g_{a,r})\right| < \sqrt{r/Q}\right) > 0, \]
where $Q = \sum_{(a,r):\, (y, g_{a,r}) \in \mathcal{L}_F} q(r) < \infty$ and we use Problem 11.3 of Davydov et al. (1998) p.79 for the last inequality. This yields $r > r_0$ and $H$ is continuous and strictly increasing on $(0, \infty)$.

Then, we show that for any $\alpha \in (0, 1/2)$, the quantile of order $1 - \alpha$ of the distribution of $S$ is positive. By assumption, there exists $(y_0, g_0) \in \mathcal{L}_F$ such that either $K_{F,11}(y_0, g_0) > 0$ or $K_{F,22}(y_0, g_0) > 0$. This yields
\[ P(S > 0) = 1 - P\left(\sup_{y \in \mathcal{Y}} \sum_{(a,r):\, (y, g_{a,r}) \in \mathcal{L}_F} q(r)\, S\left(\nu_F(y, g_{a,r}), K_F(y, g_{a,r}) + \epsilon I_2\right) = 0\right) \geq 1 - P\left(\nu_{F,1}(y_0, g_0) \leq 0,\ \nu_{F,2}(y_0, g_0) = 0\right) \geq 1 - \min\left\{P(\nu_{F,1}(y_0, g_0) \leq 0),\ P(\nu_{F,2}(y_0, g_0) = 0)\right\} \geq 1/2. \qquad (6) \]
The first inequality holds by definition of the supremum and because $S$ is nonnegative. To obtain the last inequality, note that either $\nu_{F,1}(y_0, g_0)$ is non-degenerate, in which case the first probability is $1/2$ (since $\nu_{F,1}(y_0, g_0)$ is normal with zero mean), or $\nu_{F,2}(y_0, g_0)$ is non-degenerate, in which case the second probability is 0.

Finally, using that $H$ is strictly increasing on $(0, \infty)$, (6) ensures that any quantile of $S$ of order $1 - \alpha$ with $\alpha \in [0, 1/2)$ is positive. Hence, $H$ is continuous and strictly increasing in the neighborhood of any such quantile.

3. Conclusion.

Using $T^* \to_{d^*} \mathcal{S}$, Step 2 and Lemma 21.2 in Van der Vaart (2000), we have that for $\eta > 0$, $c^*_{n,\alpha} \to_{d^*} c(1 - \alpha + \eta) + \eta$, where $c(1 - \alpha + \eta)$ is the $(1 - \alpha + \eta)$-th quantile of the distribution of $\mathcal{S}$. Because $T \to_d \mathcal{S}$ and $H$ is continuous at $c(1 - \alpha + \eta) + \eta > 0$, we obtain that

$$\lim_{\eta \to 0}\ \limsup_{n \to \infty}\ P_{F_0}\big(T > c^*_{n,\alpha}\big) = \alpha.$$

Combined with the inequality of Part (i) above, this yields the result.

(iii) This result follows from Theorem E.1 in AS. First, Assumption SIG2 in AS holds for $\sigma^2_F = V_F(\widetilde{Y})$, following the proof of Lemma 7.2 (b) under Assumption 3-(ii). Second, Assumptions PS4 and PS5 are satisfied using point (ii) above. Third, Assumptions CI, MQ, S1, S3, S4 in AS are also satisfied by construction of the statistic $T$. Thus, Theorem E.1 in AS yields the result. $\square$

G.8 Proof of Proposition 5

We introduce $E_{F,c} = E_F\big[m\big(D_i, \widetilde{Y}_{c,i}, X_i, g, y\big)\big]$ and

$$\widehat{\nu}_{n,F}(y, g) = \frac{1}{\sqrt{n}} \sum_{i=1}^n \mathrm{Diag}\big(\widehat{V}_F\big(\widetilde{Y}_{\widehat{c}}\big)\big)^{-1/2} \Big(m\big(D_i, \widetilde{Y}_{\widehat{c},i}, X_i, g, y\big) - E_{F,\widehat{c}}\Big),$$

$$\nu_{n,F}(y, g) = \frac{1}{\sqrt{n}} \sum_{i=1}^n \mathrm{Diag}\big(V_F\big(\widetilde{Y}_{c_0}\big)\big)^{-1/2} \Big(m\big(D_i, \widetilde{Y}_{c_0,i}, X_i, g, y\big) - E_{F,c_0}\Big).$$

The proof is based on Theorem 5.1 in AS, hence we have to check that the corresponding assumptions PS1, PS2, and SIG1 hold. Namely, we have to ensure that

- PS1: for all sequences $F \in \mathcal{F}$ and all $(d, y', x, g, y, c) \in \{0,1\} \times \mathcal{Y} \times [0,1]^{d_X} \times \mathcal{G}_r \times \mathcal{Y} \times \mathcal{C}^s\big([0,1]^{d_X}\big)$,
$$\left| \frac{m(d, y', x, g, y)}{V_F\big(\widetilde{Y}_{c,i}\big)} \right| \le M(d, y', x, g, y) \quad \text{and} \quad E_F\Big[M\big(D_i, \widetilde{Y}_{c,i}, X_i, g, y\big)^{\delta}\Big] \le C < \infty,$$
for some $\delta > 2$ and some function $M$;
- PS2: for all sequences $F_n \in \mathcal{F}$, the i.i.d. triangular array of processes
$$\mathcal{T}_n = \left\{ \frac{m\big(D_i, \widetilde{Y}_{n,c}(X_{n,i}), X_{n,i}, g, y\big)}{V_{F_n}\big(\widetilde{Y}_{n,c}(X_{n,i})\big)},\ (c, y, g) \in \mathcal{C}^s\big([0,1]^{d_X}\big) \times \mathcal{Y} \times \mathcal{G},\ i \le n,\ n \ge 1 \right\}$$
is manageable with respect to some envelope function $U$ (see Pollard, 1990, p.38 for the definition of a manageable class);
- SIG1: for all $\zeta > 0$,
$$\sup_{F \in \mathcal{F},\, c \in \mathcal{C}^s([0,1]^{d_X})} P\Big(\Big| \widehat{V}_F\big(\widetilde{Y}_{i,c}\big) / V_F\big(\widetilde{Y}_{i,c}\big) - 1 \Big| > \zeta\Big) \to 0.$$

The proof proceeds in two steps, which account for the fact that $c_0$ and $\mathrm{Diag}\big(V_F\big(\widetilde{Y}_{c_0}\big)\big)^{-1/2}$ are estimated:

1. We first show that
$$\sup_{F \in \mathcal{F}}\ \sup_{g \in \cup_{r \ge 1} \mathcal{G}_r,\, y \in \mathcal{Y}} \big\| \widehat{\nu}_{n,F}(y, g) - \nu_{n,F}(y, g) \big\|_\infty = o_P(1), \quad (7)$$
$$\sup_{F \in \mathcal{F}}\ \sup_{g \in \cup_{r \ge 1} \mathcal{G}_r,\, y \in \mathcal{Y}} \big\| \widehat{\nu}^*_{n,F}(y, g) - \nu^*_{n,F}(y, g) \big\|_\infty = o_{P^*}(1). \quad (8)$$
2. Next, we show that $m$ satisfies assumptions PS1, PS2, and that SIG1 in AS also holds for $\sigma^2_F = V_F\big(\widetilde{Y}_{c_0}\big)$, where $F \in \mathcal{F}$, and $\widehat{\sigma}^2_n = n^{-1} \sum_{i=1}^n \big(\widetilde{Y}_{\widehat{c},i} - n^{-1} \sum_{j=1}^n \widetilde{Y}_{\widehat{c},j}\big)^2$.

1. Proof of (7)-(8)

We apply the uniform version over $F \in \mathcal{F}$ of Theorem 3 in Chen et al. (2003) to a general class of functions to which the moment condition $m_1$ pertains (see (2), with $\widetilde{Y}$ replaced here by $\widetilde{Y}_c = D\, q\big(\widetilde{Y}, c\big) + (1 - D)\psi$ and without the moment equality $m_2$). Hence, it suffices to verify that Assumptions (3.2) and (3.3) of Theorem 3 in Chen et al. (2003) are satisfied. Let us introduce, for any $0 < N_0 < M_0$, the classes of functions

$$\mathcal{M}_1 = \Big\{ f_{c,y,\phi,g}(\widetilde{y}, x) = \phi\, \big(y - q(\widetilde{y}, c(x))\big)^+ g(x),\ (c, y, \phi, g) \in \mathcal{C}^s\big([0,1]^{d_X}\big) \times \mathcal{Y} \times [N_0, M_0] \times \mathcal{G} \Big\}, \quad (9)$$
$$\mathcal{M}_2 = \Big\{ f_{c,y,\phi,g}(\widetilde{y}, x) = \phi\, (y - \widetilde{y})^+ g(x),\ (c, y, \phi, g) \in \mathcal{C}^s\big([0,1]^{d_X}\big) \times \mathcal{Y} \times [N_0, M_0] \times \mathcal{G} \Big\},$$
$$\mathcal{M}_0 = \Big\{ f_{c,y,\phi_1,\phi_2,g}(\widetilde{y}, x, d) = \big(d\, g_{c,y,\phi_1,g} - (1 - d)\, q_{c,y,\phi_2,g}\big)(\widetilde{y}, x),\ g \in \mathcal{M}_1,\ q \in \mathcal{M}_2,\ (c, y, \phi_1, \phi_2, g) \in \mathcal{C}^s\big([0,1]^{d_X}\big) \times \mathcal{Y} \times [N_0, M_0]^2 \times \mathcal{G} \Big\}.$$

Note that $\phi_1$, $\phi_2$, and $c$ in the class $\mathcal{M}_0$ denote components of $m$ that are estimated. Consider the space $\mathcal{C}^s\big([0,1]^{d_X}\big) \times \mathcal{Y} \times [N_0, M_0]^2 \times \mathcal{G}$ equipped with the norm

$$\|(c, y, \phi_1, \phi_2, g)\| = \max\Big\{ \|c\|_{[0,1]^{d_X}},\ |y|,\ |\phi_1|,\ |\phi_2|,\ \|g\|_{[0,1]^{d_X}} \Big\}.$$

For $v = (c, y, \phi_1, \phi_2, g)$, $v' = (c', y', \phi_1', \phi_2', g') \in \mathcal{C}^s\big([0,1]^{d_X}\big) \times \mathcal{Y} \times [N_0, M_0]^2 \times \mathcal{G}$ and $(\widetilde{y}, x, d) \in \mathcal{Y} \times [0,1]^{d_X} \times \{0,1\}$, we have, by the triangular inequality and Assumptions [·]-(i) and 4-(v),

$$|f_v(\widetilde{y}, x, d) - f_{v'}(\widetilde{y}, x, d)| \le \big| g_{c,y,\phi_1,g}(\widetilde{y}, x) - g_{c',y',\phi_1',g'}(\widetilde{y}, x) \big| + \big| q_{c,y,\phi_2,g}(\widetilde{y}, x) - q_{c',y',\phi_2',g'}(\widetilde{y}, x) \big|$$
$$\le (M_0 + M_1)\big(|\phi_1 - \phi_1'| + |\phi_2 - \phi_2'|\big) + 2 M_0 \big[ |y - y'| + |q(\widetilde{y}, c(x)) - q(\widetilde{y}, c'(x))| \big]$$
$$\quad + 2 M_0 M_1 \Big[ \big| \mathbb{1}\{q(\widetilde{y}, c(x)) \le y\} - \mathbb{1}\{q(\widetilde{y}, c(x)) \le y'\} \big| + \big| \mathbb{1}\{q(\widetilde{y}, c(x)) \le y'\} - \mathbb{1}\{q(\widetilde{y}, c'(x)) \le y'\} \big| + |g(x) - g'(x)| \Big].$$

Denote by $K_q > 0$ the Lipschitz constant of $q(\widetilde{y}, \cdot)$. Then, by convexity of $x \mapsto x^2$ applied to the seven terms above, we obtain

$$\tfrac{1}{7} |f_v(\widetilde{y}, x, d) - f_{v'}(\widetilde{y}, x, d)|^2 \le (M_0 + M_1)^2 \big(|\phi_1 - \phi_1'|^2 + |\phi_2 - \phi_2'|^2\big) + 4 M_0^2 \Big[ |y - y'|^2 + K_q^2 \|c - c'\|^2_{[0,1]^{d_X}} \Big]$$
$$\quad + 4 (M_0 M_1)^2 \Big[ \big| \mathbb{1}\{q(\widetilde{y}, c(x)) \le y\} - \mathbb{1}\{q(\widetilde{y}, c(x)) \le y'\} \big| + \big| \mathbb{1}\{q(\widetilde{y}, c(x)) \le y'\} - \mathbb{1}\{q(\widetilde{y}, c'(x)) \le y'\} \big| + \|g - g'\|^2_{[0,1]^{d_X}} \Big].$$

Fix $\delta > 0$. If $\|v - v'\| \le \delta$, this yields

$$\tfrac{1}{7} |f_v(\widetilde{y}, x, d) - f_{v'}(\widetilde{y}, x, d)|^2 \le \delta^2 \Big( 2 (M_0 + M_1)^2 + 4 M_0^2 (1 + K_q^2) + 4 (M_0 M_1)^2 \Big)$$
$$\quad + 4 (M_0 M_1)^2 \Big[ \mathbb{1}\{q(\widetilde{y}, c(x)) \le y + \delta\} - \mathbb{1}\{q(\widetilde{y}, c(x)) \le y - \delta\} + \big| \mathbb{1}\{\widetilde{y} \le q^I(y', c(x))\} - \mathbb{1}\{\widetilde{y} \le q^I(y', c'(x))\} \big| \Big].$$

Next, by Assumption 4-(iv), we obtain

$$E\Big[ \mathbb{1}\big\{q\big(\widetilde{Y}, c(X)\big) \le y + \delta\big\} - \mathbb{1}\big\{q\big(\widetilde{Y}, c(X)\big) \le y - \delta\big\} \Big] = F_{q(\widetilde{Y}, c(X))}(y + \delta) - F_{q(\widetilde{Y}, c(X))}(y - \delta) \le Q_{F,1}\, \delta,$$

$$E\Big[ \big| \mathbb{1}\big\{\widetilde{Y} \le q^I(y', c(X))\big\} - \mathbb{1}\big\{\widetilde{Y} \le q^I(y', c'(X))\big\} \big| \Big] \le E\Big[ \mathbb{1}\big\{\widetilde{Y} \le q^I(y', c(X)) + Q_{q^I} \delta\big\} - \mathbb{1}\big\{\widetilde{Y} \le q^I(y', c(X)) - Q_{q^I} \delta\big\} \Big]$$
$$\le E\Big[ F_{\widetilde{Y}|X}\big(q^I(y', c(X)) + Q_{q^I} \delta \,\big|\, X\big) - F_{\widetilde{Y}|X}\big(q^I(y', c(X)) - Q_{q^I} \delta \,\big|\, X\big) \Big] \le Q_{F,2}\, Q_{q^I}\, \delta,$$

where $Q_{q^I}$ is the Lipschitz constant of $q^I$. Thus, by Assumption 4, there exists $Q_0 > 0$ such that

$$\sup_{F \in \mathcal{F}} E\Bigg[ \sup_{\|v - v'\| \le \delta} \Big| f_v\big(\widetilde{Y}, X, D\big) - f_{v'}\big(\widetilde{Y}, X, D\big) \Big|^2 \Bigg] \le Q_0\, \delta. \quad (10)$$

Therefore the class $\mathcal{M}_0$ satisfies Condition (3.2) of Theorem 3 in Chen et al. (2003) uniformly in $F \in \mathcal{F}$. Moreover, the class $\mathcal{G}$ is manageable and thus Donsker (see Lemma 3 in Andrews and Shi, 2013). Finally, by Remark 3 (ii) in Chen et al. (2003), $\mathcal{C}^s\big([0,1]^{d_X}\big)$ is also Donsker. Then, $\mathcal{C}^s\big([0,1]^{d_X}\big)$, $\mathcal{Y}$, $[N_0, M_0]$, and $\mathcal{G}$ satisfy Condition (3.3) of Theorem 3 in Chen et al. (2003).
The result follows by Theorem 3 in Chen et al. (2003).

2. $m$ satisfies PS1 and PS2 of AS, and SIG1 of AS also holds for $\sigma^2_F$ and $\widehat{\sigma}^2_n$

From Assumption 4 (iii) and the proof of Lemma 7.2 (a) in AS, PS1 is satisfied, replacing $B$ by $\max(M_0, M_1)$ in the proof of Lemma 7.2-(a) in AS.

We now show that PS2 in AS also holds. As the result is uniform over $F$, we have to consider sequences for the cdfs $F_n$ of $(D_{n,i}, Y_{n,i}, X_{n,i})_{i=1,\dots,n}$ (with $F_n \in \mathcal{F}$). We also define

$$\widetilde{Y}_{n,c}(X_{n,i}) = D_{n,i}\, q\big(Y_{n,i}, c(X_{n,i})\big) + (1 - D_{n,i})\, \psi_{n,i},$$
$$W_{n,i} = \frac{D_{n,i}}{E_{F_n}[D_{n,i}]} - \frac{1 - D_{n,i}}{E_{F_n}[1 - D_{n,i}]},$$
$$\sigma^2_{F_n} = V_{F_n}\big(\widetilde{Y}_{n,c}(X_{n,i})\big).$$

Note that by Assumption 3 (iii), $\sigma^2_{F_n} \ge \underline{\sigma}^2 > 0$ for all $F_n \in \mathcal{F}$. Let $\omega$ denote a generic element of the underlying sample space $\Omega$. Showing Assumption PS2 in AS then boils down to proving that for any $0 < N_0 < M_0 := 1 / \inf_F \sigma_F$, the i.i.d. triangular array of processes

$$\mathcal{T}_{1,n,\omega} = \Big\{ W_{n,i}\, \phi\, \big(y - \widetilde{Y}_{n,c}(X_{n,i})\big)^+ g(X_{n,i}),\ (c, y, \phi, g) \in \mathcal{C}^s\big([0,1]^{d_X}\big) \times \mathcal{Y} \times [N_0, M_0] \times \mathcal{G},\ i \le n,\ n \ge 1 \Big\}$$

is manageable with respect to some envelope function $U$. Lemma 3 in Andrews and Shi (2013) shows that the processes $\{g(X_{n,i}),\ g \in \mathcal{G},\ i \le n,\ n \ge 1\}$ are manageable with respect to the constant function 1. Then, using Lemma D.5 in AS, it remains to show that

$$\mathcal{T}'_{1,n,\omega} = \Big\{ W_{n,i}\, \phi\, \big(y - \widetilde{Y}_{n,c}(X_{n,i})\big)^+,\ (c, y, \phi) \in \mathcal{C}^s\big([0,1]^{d_X}\big) \times \mathcal{Y} \times [N_0, M_0],\ i \le n,\ n \ge 1 \Big\}$$

is manageable with respect to some envelope. For such an envelope, we can consider $U'(\omega) = (M_0 + M_1) / (\underline{\sigma}\, \epsilon)$. We now prove the manageability of $\mathcal{T}'_{1,n,\omega}$. Let us define

$$\mathcal{M}'_0 = \Big\{ f_{c,y,\phi_1,\phi_2}(\widetilde{y}, x, d) = d\, \phi_1 \big(y - q(\widetilde{y}, c(x))\big)^+ - (1 - d)\, \phi_2\, (y - \widetilde{y})^+,\ (c, y, \phi_1, \phi_2) \in \mathcal{C}^s\big([0,1]^{d_X}\big) \times \mathcal{Y} \times [N_0, M_0]^2 \Big\}.$$

Reasoning as for the class $\mathcal{M}_0$ defined in (9), and using the last equation of the proof of Theorem 3 in Chen et al. (2003), p.1607, we have that for $\epsilon > 0$,

$$N_{[\cdot]}\big(\epsilon, \mathcal{M}'_0, \|\cdot\|\big) \le N\big(\epsilon', [N_0, M_0]^2, |\cdot|\big) \times N\big(\epsilon', \mathcal{Y}, |\cdot|\big) \times N\big(\epsilon', \mathcal{C}^s\big([0,1]^{d_X}\big), \|\cdot\|_{[0,1]^{d_X}}\big),$$

with $\epsilon' = (\epsilon / (2 Q_0))^2$ and $Q_0$ defined in (10). Using Theorem 2.7.1, page 155, in Van der Vaart and Wellner (1996), there exists a constant $Q_1$ depending only on $s$, $d_X$, and $[0,1]^{d_X}$ such that

$$\ln\Big( N\big(\epsilon', \mathcal{C}^s\big([0,1]^{d_X}\big), \|\cdot\|_{[0,1]^{d_X}}\big) \Big) \le Q_1\, \epsilon'^{-d_X/s}.$$

Moreover, because $\mathcal{Y}$ and $[N_0, M_0]^2$ are compact subsets of two Euclidean spaces, there exist $Q_2, Q_3$ such that

$$N\big(\epsilon', [N_0, M_0]^2, |\cdot|\big) \le Q_2\, \epsilon'^{-2} \quad \text{and} \quad N\big(\epsilon', \mathcal{Y}, |\cdot|\big) \le Q_3\, \epsilon'^{-1}. \quad (11)$$

This yields

$$\ln\Big( N_{[\cdot]}\big(\epsilon, \mathcal{M}'_0, \|\cdot\|\big) \Big) \le (6 + Q_1) \max\big( -\ln(\epsilon'),\ \epsilon'^{-d_X/s} \big) + \ln(Q_2 Q_3). \quad (12)$$

Let $\odot$ denote the element-by-element product and $D\big(\epsilon\, |\alpha \odot U'(\omega)|,\ \alpha \odot \mathcal{T}'_{1,n,\omega}\big)$ denote random packing numbers. By (A.1) in Andrews (1994, p.2284), we have

$$\sup_{\omega \in \Omega,\, n \ge 1,\, \alpha \in \mathbb{R}^n_+} D\big(\epsilon\, |\alpha \odot U'(\omega)|,\ \alpha \odot \mathcal{T}'_{1,n,\omega}\big) \le \sup_{F \in \mathcal{F}} N\Big(\frac{\epsilon}{2}, \mathcal{M}'_0, \|\cdot\|\Big) \le \sup_{F \in \mathcal{F}} N_{[\cdot]}\big(\epsilon, \mathcal{M}'_0, \|\cdot\|\big), \quad (13)$$

where the second inequality follows as in, e.g., Van der Vaart and Wellner (1996, p.84). Then, (12) ensures (see Definition 7.9 in Pollard (1990), p.38) that

$$\sup_{\omega \in \Omega,\, n \ge 1,\, \alpha \in \mathbb{R}^n_+} D\big(\epsilon\, |\alpha \odot U'(\omega)|,\ \alpha \odot \mathcal{T}'_{1,n,\omega}\big) \le \lambda(\epsilon),$$

where $\lambda(\epsilon) = \exp\Big( (6 + Q_1) \max\big( -2 \ln(\epsilon / (2 Q_0)),\ (\epsilon / (2 Q_0))^{-2 d_X / s} \big) + \ln(Q_2 Q_3) \Big)$. Moreover, by using $\sqrt{a + b} \le \sqrt{a} + \sqrt{b}$ for all $a, b \ge 0$,

$$\int_0^1 \sqrt{\ln(\lambda(\epsilon))}\, d\epsilon \le \sqrt{6 + Q_1} \int_0^1 \Big[ \max\big( -2 \ln(\epsilon / (2 Q_0)),\ (\epsilon / (2 Q_0))^{-2 d_X / s} \big) \Big]^{1/2} d\epsilon + \sqrt{\ln(Q_2 Q_3)} < \infty.$$

Thus, $\mathcal{T}'_{1,n,\omega}$, hence $\mathcal{T}_{1,n,\omega}$, are manageable. Therefore, $m$ satisfies PS2 in AS.

Finally, in order to show that SIG1 in AS is satisfied, we use Assumption 4 (iii) and follow the proof of Lemma 7.2 (b) in AS, where we replace $Y$ by $q(Y, c(X))$ and $B$ by $\max(M_0, M_1)$. The result follows.

G.9 Proof of Proposition 6

Hereafter, we let $[\underline{\psi}, \overline{\psi}]$ (resp. $[\underline{y}, \overline{y}]$) denote the support of $\psi$ (resp. of $Y$). As in Lemma 1, $H_{0SK}$ holds if and only if there exists a pair of random variables $(Y', \psi')$ and $c$ such that $Y' \sim Y$, $\psi' \sim \psi$ and $E[Y' | \psi'] = Q_c(\psi')$. Now, if $Q_c$ is strictly increasing on $[\underline{\psi}, \overline{\psi}]$, we have $E[Y' | \psi'] = Q_c(\psi')$ if and only if $E[Y' | Q_c(\psi')] = Q_c(\psi')$. In view of Theorem 1, the latter is equivalent to $F_Y$ being a mean-preserving spread of $F_{Q_c(\psi')}$. Therefore, the proposition holds if for any $\eta > 0$, there exist $K$, $c \in \mathbb{R}^{K+1}$ and $F$ such that (i) $Q_c$ is strictly increasing on $[\underline{\psi}, \overline{\psi}]$; (ii) $\sup_{y \in \mathbb{R}} |F_\psi(y) - F(y)| < \eta$; (iii) $F_Y$ is a mean-preserving spread of $F_{Q_c(\widetilde{\psi})}$, with $\widetilde{\psi} \sim F$.

Fix $\eta > 0$. Since $F_Y$ is continuous on $[\underline{y}, \overline{y}]$, it is uniformly continuous on this set. Hence, there exists $\eta'$ such that

$$|y - y'| < \eta' \ \Rightarrow\ |F_Y(y) - F_Y(y')| < \eta. \quad (14)$$

By assumption, $F_Y^{-1} \circ F_\psi$ is increasing and continuous. Then, by Theorem 9 in Mulansky and Neamtu (1998), there exists a sequence $(P_n)_{n \in \mathbb{N}}$ of increasing polynomials on $[\underline{\psi}, \overline{\psi}]$ satisfying $P_n(\underline{\psi}) = \underline{y}$ and $P_n(\overline{\psi}) = \overline{y}$ and converging uniformly to $F_Y^{-1} \circ F_\psi$. Hence, there exists $P_n$ such that

$$\sup_{y \in [\underline{\psi}, \overline{\psi}]} \big| P_n(y) - F_Y^{-1} \circ F_\psi(y) \big| < \eta'. \quad (15)$$

Let $K$ be the degree of $P_n$ and $c \in \mathbb{R}^{K+1}$ denote the vector of coefficients of $P_n$, so that $Q_c = P_n$. $Q_c$ is a non-constant polynomial, which is increasing on $[\underline{\psi}, \overline{\psi}]$. Hence, its derivative vanishes a finite number of times and $Q_c$ is actually strictly increasing. Hence, Condition (i) above holds. Moreover, combining (15) with (14), we obtain

$$\sup_{y \in [\underline{\psi}, \overline{\psi}]} \big| F_Y \circ Q_c(y) - F_\psi(y) \big| < \eta.$$

Now, let $F := F_Y \circ Q_c$ on $[\underline{\psi}, \overline{\psi}]$, $F(y) := 0$ for all $y < \underline{\psi}$ and $F(y) := 1$ for all $y > \overline{\psi}$. Then $F$ is continuous and increasing, with limits 0 and 1 at $-\infty$ and $\infty$ respectively. Thus, it is a cdf and Condition (ii) above holds. Finally, let $\widetilde{\psi} \sim F$. We have, for any $y \in [\underline{y}, \overline{y}]$,

$$P\big(Q_c(\widetilde{\psi}) \le y\big) = F \circ Q_c^{-1}(y) = F_Y(y).$$

Hence $Q_c(\widetilde{\psi}) \sim F_Y$, so that $F_Y$ is trivially a mean-preserving spread of $F_{Q_c(\widetilde{\psi})}$ and Condition (iii) holds. The result follows.

G.10 Proof of Proposition 7

1. We consider for that purpose $(\psi^*, \xi^*_\psi, \xi^*_Y, \varepsilon^*) \sim \mathcal{N}(m, \Sigma)$, potentially different from the true $(\psi, \xi_\psi, \xi_Y, \varepsilon)$, and let

$$\widehat{\psi}^* = \psi^* + \xi^*_\psi, \qquad \widehat{Y}^* = \psi^* + \varepsilon^* + \xi^*_Y.$$

We then fix $(m, \Sigma)$ so that the DGP satisfies all the restrictions specified in the proposition, and in particular, $\big(V(\widehat{Y}^*), V(\widehat{\psi}^*), \mathrm{Cov}(\widehat{Y}^*, \widehat{\psi}^*)\big) = \big(V(\widehat{Y}), V(\widehat{\psi}), \mathrm{Cov}(\widehat{Y}, \widehat{\psi})\big)$.

First, letting $m = (m_1, m_2, m_3, m_4)'$, we impose $m_2 = m_3 = m_4 = 0$, and set all the non-diagonal terms of $\Sigma$, except $\Sigma_{23} = \mathrm{Cov}(\xi^*_\psi, \xi^*_Y)$, equal to zero. Then $(\widehat{Y}^*, \widehat{\psi}^*, \psi^*)$ satisfy (1) and RE hold (considering $\mathcal{I} = \sigma(\psi^*)$ and $Y^* = \psi^* + \varepsilon^*$). We fix below $\Sigma_{22} \in [0, V(\widehat{\psi})]$. Then let $\Sigma_{11} = V(\widehat{\psi}) - \Sigma_{22}$, $\Sigma_{33} = V(\widehat{Y}) - V(\widehat{\psi}) + \Sigma_{22}$ and $\Sigma_{44} = 0$, so that $\big(V(\widehat{Y}^*), V(\widehat{\psi}^*)\big) = \big(V(\widehat{Y}), V(\widehat{\psi})\big)$. Also, because $V(\widehat{Y}) > V(\widehat{\psi})$, we have $V(\xi^*_\psi) < V(\xi^*_Y + \varepsilon^*)$ and $F_{\xi^*_\psi}$ dominates at the second order $F_{\xi^*_Y + \varepsilon^*}$.

Now, we fix $\Sigma_{22}$. Let $a = V(\widehat{Y}) - V(\widehat{\psi})$ and $c = \mathrm{Cov}(\widehat{Y} - \widehat{\psi}, \widehat{\psi})$. Then, by the Cauchy-Schwarz inequality,

$$c^2 \le V(\widehat{\psi})\, V(\widehat{Y} - \widehat{\psi}) = V(\widehat{\psi})\, (a - 2c).$$

This means that there exists $\sigma \in [0, V(\widehat{\psi})]$ such that

$$c^2 \le \sigma\, (a - 2c). \quad (16)$$

Let $\Sigma_{22} = \sigma$ and $\Sigma_{23} = c + \Sigma_{22}$. Then, by construction,

$$\mathrm{Cov}(\widehat{Y}^*, \widehat{\psi}^*) = \Sigma_{11} + \Sigma_{23} = V(\widehat{\psi}) - \Sigma_{22} + \Sigma_{22} + c = \mathrm{Cov}(\widehat{Y}, \widehat{\psi}).$$

Moreover, in view of (16) and by definition of $\Sigma_{23}$ and $\Sigma_{33}$,

$$\Sigma_{23}^2 = c^2 + 2c\, \Sigma_{22} + \Sigma_{22}^2 \le (a - 2c)\, \Sigma_{22} + 2c\, \Sigma_{22} + \Sigma_{22}^2 = \Sigma_{22}\, \Sigma_{33}.$$

In other words, $\Sigma$ is a proper covariance matrix.

2. Let $\lambda = V(\psi) / \sigma^2_{\xi_\psi}$. If (1) and RE hold, $\mathrm{Cov}(\xi_\psi, \varepsilon + \xi_Y) \ge 0$ and $\lambda \ge \underline{\lambda}$, we obtain

$$\beta - 1 = \frac{\mathrm{Cov}(\widehat{Y} - \widehat{\psi}, \widehat{\psi})}{V(\widehat{\psi})} = \frac{\mathrm{Cov}(\varepsilon + \xi_Y - \xi_\psi, \xi_\psi)}{\sigma^2_{\xi_\psi} (1 + \lambda)} \ge -\frac{1}{1 + \underline{\lambda}}.$$
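The construction in part 1 can be checked numerically. The sketch below is only an illustration under stated assumptions: it orders the latent vector as $(\psi^*, \xi^*_\psi, \xi^*_Y, \varepsilon^*)$, picks $\sigma = V(\widehat{\psi})$ by default (one admissible choice, since (16) then reduces to the Cauchy-Schwarz bound), and uses hypothetical function names `build_sigma` and `implied_moments` that do not come from the paper.

```python
import numpy as np

# Loadings of (psi*, xi_psi*, xi_Y*, eps*) in the observed variables:
# psi_hat* = psi* + xi_psi*  and  Y_hat* = psi* + eps* + xi_Y*.
LOADINGS = np.array([[1.0, 1.0, 0.0, 0.0],   # psi_hat*
                     [1.0, 0.0, 1.0, 1.0]])  # Y_hat*


def build_sigma(v_y, v_psi, cov_y_psi, sigma=None):
    """Build the 4x4 matrix Sigma of part 1 from the three observed moments."""
    a = v_y - v_psi              # a = V(Y_hat) - V(psi_hat)
    c = cov_y_psi - v_psi        # c = Cov(Y_hat - psi_hat, psi_hat)
    if sigma is None:
        sigma = v_psi            # illustrative choice (assumption of this sketch)
    if not (0.0 <= sigma <= v_psi) or c ** 2 > sigma * (a - 2.0 * c) + 1e-12:
        raise ValueError("sigma does not satisfy condition (16)")
    s = np.zeros((4, 4))
    s[0, 0] = v_psi - sigma      # Sigma_11
    s[1, 1] = sigma              # Sigma_22
    s[2, 2] = a + sigma          # Sigma_33
    s[1, 2] = s[2, 1] = c + sigma  # Sigma_23, the only non-zero off-diagonal term
    return s                     # Sigma_44 stays equal to 0


def implied_moments(s):
    """Return (V(Y_hat*), V(psi_hat*), Cov(Y_hat*, psi_hat*)) implied by Sigma."""
    v = LOADINGS @ s @ LOADINGS.T
    return v[1, 1], v[0, 0], v[0, 1]


if __name__ == "__main__":
    sigma_matrix = build_sigma(v_y=2.0, v_psi=1.0, cov_y_psi=0.5)
    print(implied_moments(sigma_matrix))
    print(np.linalg.eigvalsh(sigma_matrix).min() >= -1e-12)
```

With these inputs the implied moments reproduce the three targets and the smallest eigenvalue is numerically nonnegative, which is what "proper covariance matrix" requires.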

The result follows.

G.11 Proof of Proposition 8

We first prove that if $E[\psi_L] \le E[Y] \le E[\psi_U]$, there exists a unique $F^* \in \mathcal{F}_B$ such that $\delta_{F^*} = 0$. First, suppose that $F_b \ne F_{b'}$ and, without loss of generality, $b > b'$. Then $\psi_b \ge \psi_{b'}$, implying that $F_b(y) \le F_{b'}(y)$ for all $y$. Moreover, the inequality is strict for at least one $y$. As a result, $E(\psi_b) > E(\psi_{b'})$. In other words, there is at most one $F^* \in \mathcal{F}_B$ such that $\delta_{F^*} = 0$. If $E[\psi_L] = E[Y]$ or $E[\psi_U] = E[Y]$, such a solution also exists by taking $b = -\infty$ and $b = \infty$, respectively. Now, suppose that $E[\psi_L] < E[Y] < E[\psi_U]$. For all $\infty > b > b' > -\infty$,

$$\psi_b - \psi_{b'} = \big(\psi_U - \max(\psi_L, b')\big)\, \mathbb{1}\{\psi_U \in [b', b)\} + (b - b')\, \mathbb{1}\{\psi_L < b',\ \psi_U \ge b\} + (b - \psi_L)\, \mathbb{1}\{\psi_L \in [b', b),\ \psi_U \ge b\}.$$

As a result, $|\psi_b - \psi_{b'}| \le |b - b'|$. This implies that $\widetilde{\delta} : b \mapsto E[\psi_b]$ is continuous. Moreover, $\lim_{b \to -\infty} \widetilde{\delta}(b) = E[\psi_L] < E(Y)$ and $\lim_{b \to \infty} \widetilde{\delta}(b) = E[\psi_U] > E(Y)$. By the intermediate value theorem, there exists $b^*$ such that $\widetilde{\delta}(b^*) = E(Y)$. Hence, there exists $F^* \in \mathcal{F}_B$ such that $\delta_{F^*} = 0$. The first part of Proposition 8 follows.

Let us turn to the second part of the proposition. First, if (ii) holds, there exists $b_0 \in \mathbb{R}$ such that $F^* = F_{b_0}$. Then, by construction and Theorem 1, $Y$ and $\psi_{b_0}$ satisfy $H_0$. Moreover, $F_{b_0} \in [F_{\psi_U}, F_{\psi_L}]$. Therefore, $H_{0B}$ holds as well.

Now, let us prove that (i) implies (ii). Let us denote by $\mathcal{D}$ the set of all the cdfs for $\psi$ such that $H_{0B}$ holds. By Theorem 1, these are the cdfs $F$ satisfying $F_{\psi_U} \le F \le F_{\psi_L}$, $\delta_F = 0$ and dominating at the second order $F_Y$. We show below that all $F \in \mathcal{D}$ are dominated at the second order by $F^*$. Then, because $F_{\psi_U} \le F^* \le F_{\psi_L}$ and $\int y\, dF^*(y) = \int y\, dF_Y(y)$, $\mathcal{D}$ is not empty only if $F^*$ dominates at the second order $F_Y$. The result then follows by Theorem 1.

Thus, we have to show that for all $t \in \mathbb{R}$,

$$F^* = \operatorname*{argmin}_{F_\psi \in \mathcal{D}} \int_{-\infty}^t F_\psi(y)\, dy. \quad (17)$$

First, if $F^* = F_{-\infty}$, we have for all $F \ne F^*$, $F(y) \le F_{\psi_L}(y) = F^*(y)$ for all $y$, with strict inequality for some $y$. Then $\delta_F > \delta_{F^*} = 0$ and $\mathcal{D} = \{F^*\}$, implying that (17) holds. Similarly, (17) holds if $F^* = F_\infty$.

Suppose now that $F^* = F_{b_0}$ for some $b_0 \in \mathbb{R}$. Because $F^*(y) = F_{\psi_U}(y) \le F_\psi(y)$ for all $y < b_0$ and all $F_\psi \in \mathcal{D}$, (17) holds for all $t < b_0$. We now prove that (17) holds also for $t \ge b_0$. First suppose that $t \ge \max(b_0, 0)$. For all $F_\psi \in \mathcal{D}$, $\int y\, dF_Y(y) = \int y\, dF_\psi(y)$. As a result, by Fubini's theorem,

$$-\int_{-\infty}^0 F^*(y)\, dy + \int_0^t (1 - F^*(y))\, dy + \int_t^\infty (1 - F^*(y))\, dy = -\int_{-\infty}^0 F_\psi(y)\, dy + \int_0^t (1 - F_\psi(y))\, dy + \int_t^\infty (1 - F_\psi(y))\, dy.$$

Because $F_\psi \le F_{\psi_L} = F^*$ on $[b_0, \infty)$, this implies that

$$-\int_{-\infty}^0 F^*(y)\, dy + \int_0^t (1 - F^*(y))\, dy \ge -\int_{-\infty}^0 F_\psi(y)\, dy + \int_0^t (1 - F_\psi(y))\, dy,$$

and thus (17) holds for $t \ge \max(b_0, 0)$. Suppose now that $b_0 < 0$ and $t \in (b_0, 0)$. Then

$$-\left( \int_{-\infty}^t F^*(y)\, dy + \int_t^0 F^*(y)\, dy \right) + \int_0^\infty (1 - F^*(y))\, dy = -\left( \int_{-\infty}^t F_\psi(y)\, dy + \int_t^0 F_\psi(y)\, dy \right) + \int_0^\infty (1 - F_\psi(y))\, dy.$$

Using again $F_\psi \le F_{\psi_L} = F^*$ on $[t, \infty)$ yields

$$-\int_t^0 F^*(y)\, dy + \int_0^\infty (1 - F^*(y))\, dy \le -\int_t^0 F_\psi(y)\, dy + \int_0^\infty (1 - F_\psi(y))\, dy.$$

Submitted on 25 Mar 2020 (v1), last revised 19 Dec 2020 (this version, v3)
