Rationalizing Rational Expectations: Characterizations and Tests∗

Xavier D’Haultfoeuille†, Christophe Gaillac‡, Arnaud Maurel§

Abstract
In this paper, we build a new test of rational expectations based on the marginal distributions of realizations and subjective beliefs. This test is widely applicable, including in the common situation where realizations and beliefs are observed in two different datasets that cannot be matched. We show that whether one can rationalize rational expectations is equivalent to the distribution of realizations being a mean-preserving spread of the distribution of beliefs. The null hypothesis can then be rewritten as a system of many moment inequality and equality constraints, for which tests have been recently developed in the literature. The test is robust to measurement errors under some restrictions and can be extended to account for aggregate shocks. Finally, we apply our methodology to test for rational expectations about future earnings. While individuals tend to be right on average about their future earnings, our test strongly rejects rational expectations.
Keywords: Rational expectations; Test; Subjective expectations; Data combination.

∗ This paper is based on portions of our previous working paper D’Haultfœuille et al. (2018b). We thank the co-editor Andres Santos, three anonymous referees, Peter Arcidiacono, Levon Barseghyan, Federico Bugni, Pierre Cahuc, Zhuoli Chen, Tim Christensen, Valentina Corradi, Christian Gourieroux, Nathael Gozlan, Gregory Jolivet, Max Kasy, Jia Li, Matt Masten, Magne Mogstad, Andrew Patton, Aureo de Paula, Mirko Wiederholt, Basit Zafar, Yichong Zhang and participants of various seminars and conferences for useful comments and suggestions.
† CREST-ENSAE, [email protected]. Xavier D’Haultfoeuille thanks PSE, where part of this research was conducted, for its hospitality.
‡ CREST-ENSAE and TSE, [email protected].
§ Duke University, NBER and IZA, [email protected].

1 Introduction
How individuals form their beliefs about uncertain future outcomes is critical to understanding decision making. Despite longstanding critiques (see, among many others, Pesaran, 1987; Manski, 2004), rational expectations remain by far the most popular framework to describe belief formation (Muth, 1961). This theory states that agents have expectations that do not systematically differ from the realized outcomes, and efficiently process all private information to form these expectations. Rational expectations (RE) are a key building block in many macro- and micro-economic models, and in particular in most of the dynamic microeconomic models that have been estimated over the last two decades (see, e.g., Aguirregabiria and Mira, 2010; Blundell, 2017, for recent surveys).

In this paper, we build a new test of RE. Our test only requires having access to the marginal distributions of subjective beliefs and realizations, and, as such, can be applied quite broadly. In particular, this test can be used in a data combination context, where individual realizations and subjective beliefs are observed in two different datasets that cannot be matched. Such situations are common in practice (see, e.g., Delavande, 2008; Arcidiacono, Hotz and Kang, 2012; Arcidiacono, Hotz, Maurel and Romano, 2014; Stinebrickner and Stinebrickner, 2014a; Gennaioli, Ma and Shleifer, 2016; Kuchler and Zafar, 2019; Boneva and Rauh, 2018; Biroli, Boneva, Raja and Rauh, 2020). Besides, even in surveys for which an explicit aim is to measure subjective expectations, such as the Michigan Survey of Consumers or the Survey of Consumer Expectations of the New York Fed, expectations and realizations can typically only be matched for a subset of the respondents. And of course, regardless of attrition, whenever one seeks to measure long or medium-term outcomes, matching beliefs with realizations requires waiting for a long period of time before the data can be made available to researchers.
Footnote: Situations where realizations can be perfectly predicted beforehand, such as in school choice settings where assignments are a known function of observed inputs, are notable exceptions.

The tests of RE implemented so far in this context only use specific implications of the RE hypothesis. In contrast, we develop a test that exploits all possible implications of RE. Using the key insight that we can rationalize RE if and only if the distribution of realizations is a mean-preserving spread of the distribution of beliefs, we show that rationalizing RE is equivalent to satisfying one moment equality and (infinitely) many moment inequalities. As a consequence, if these moment conditions hold, RE cannot be rejected, given the data at our disposal. By exhausting all relevant implications of RE, our test is able to detect many more violations of rational expectations than existing tests.

To develop a statistical test of RE rationalization, we build on the recent literature on inference based on moment inequalities, and more specifically, on Andrews and Shi (2017). By applying their results to our context, we show that our test controls size asymptotically and is consistent over fixed alternatives. We also provide conditions under which the test is not conservative.

We then consider several extensions to our baseline test. First, we show that by using a set of covariates that are common to both datasets, we can increase our ability to detect violations of RE. Another important issue is that of unanticipated aggregate shocks. Even if individuals have rational expectations, the mean of observed outcomes may differ from the mean of individual beliefs simply because of aggregate shocks. We show that our test can be easily adapted to account for such shocks.

Finally, we prove that our test is robust to measurement errors in the following sense. If individuals have rational expectations but both beliefs and outcomes are measured with (classical) errors, then our test does not reject RE provided that the amount of measurement errors on beliefs does not exceed the amount of intervening transitory shocks plus the measurement errors on the realized outcomes. In that specific sense, imperfect data quality does not jeopardize the validity of our test. In particular, this allows for elicited beliefs to be noisier than realized outcomes.
This provides a rationale for our test even in cases where realizations and beliefs are observed in the same dataset, since a direct test based on a regression of the outcome on the beliefs (see, e.g., Lovell, 1986) is, at least at the population level, not robust to any amount of measurement errors on the subjective beliefs.

Footnote: Interestingly, the equivalence on which we rely, which is based on Strassen’s theorem (Strassen, 1965), is also used in the microeconomic risk theory literature; see in particular Rothschild and Stiglitz (1970).

We apply our framework to test for rational expectations about future earnings. To do so, we combine elicited beliefs about future earnings with realized earnings, using data from the Labor Market module of the Survey of Consumer Expectations (SCE, New York Fed), and test whether household heads form rational expectations on their annual labor earnings. While a naive test of equality of means between earnings beliefs and realizations shows that earnings expectations are realistic in the sense of not being significantly biased, thus not rejecting the rational expectations hypothesis, our test does reject rational expectations at the 1% level. Taken together, our findings illustrate the practical importance of incorporating the additional restrictions of rational expectations that are embedded in our test. The results of our test also indicate that the RE hypothesis is more credible for certain subpopulations than others. For instance, we reject RE for individuals without a college degree, who exhibit substantial deviations from RE. On the other hand, we fail to reject the hypothesis that college-educated workers have rational expectations on their future earnings.

By developing a test of rational expectations in a setting where realizations and subjective beliefs are observed in two different datasets, we bring together the literature on data combination (see, e.g., Cross and Manski, 2002; Molinari and Peski, 2006; Fan, Sherman and Shum, 2014; Buchinsky, Li and Liao, 2019; and Ridder and Moffitt, 2007, for a survey), and the literature on testing for rational expectations in a micro environment (see, e.g., Lovell, 1986; Gourieroux and Pradel, 1986; Ivaldi, 1992, for seminal contributions).

On the empirical side, we contribute to a rapidly growing literature on the use of subjective expectations data in economics (see, e.g., Manski, 2004; Delavande, 2008; Van der Klaauw and Wolpin, 2008; Van der Klaauw, 2012; Arcidiacono, Hotz, Maurel and Romano, 2014; de Paula, Shapira and Todd, 2014; Stinebrickner and Stinebrickner, 2014b; Wiswall and Zafar, 2015).
In this paper, we show how to incorporate all of the relevant information from subjective beliefs combined with realized data to test for rational expectations.

The remainder of the paper is organized as follows. In Section 2, we present the general set-up and the main theoretical equivalences underlying our RE test. In Section 3, we introduce the corresponding statistical tests and study their asymptotic properties. Section 4 illustrates the finite sample properties of our tests through Monte Carlo simulations. Section 5 applies our framework to expectations about future earnings. Finally, Section 6 concludes. The appendix collects various theoretical extensions, additional simulation results, additional material on the application, and all the proofs. The companion R package RationalExp, described in the user guide (D’Haultfœuille, Gaillac and Maurel, 2018a), performs the test of RE.

We assume that the researcher has access to a first dataset containing the individual outcome variable of interest, which we denote by $Y$. She also observes, through a second dataset drawn from the same population, the elicited individual expectation on $Y$, denoted by $\psi$. The two datasets, however, cannot be matched. We focus on situations where the researcher has access to elicited beliefs about mean outcomes, as opposed to probabilistic expectations about the full distribution of outcomes. The type of subjective expectations data we consider in the paper has been collected in various contexts, and used in a number of prior studies (see, among others, Delavande, 2008; Zafar, 2011b; Arcidiacono, Hotz and Kang, 2012; Arcidiacono, Hotz, Maurel and Romano, 2014; Hoffman and Burks, 2020).

Formally, $\psi = \mathcal{E}[Y|\mathcal{I}]$, where $\mathcal{I}$ denotes the $\sigma$-algebra corresponding to the agent’s information set and $\mathcal{E}[\cdot|\mathcal{I}]$ is the subjective expectation operator (i.e., for any $U$, $\mathcal{E}[U|\mathcal{I}]$ is an $\mathcal{I}$-measurable random variable).
We are interested in testing the rational expectations (RE) hypothesis $\psi = \mathbb{E}[Y|\mathcal{I}]$, where $\mathbb{E}[\cdot|\mathcal{I}]$ is the conditional expectation operator generated by the true data generating process. Importantly, we remain agnostic throughout most of our analysis on the information set $\mathcal{I}$. Our setting is also compatible with heterogeneity in the information different agents use to form their expectations. To see this, let $(U_1, \ldots, U_m)$ denote $m$ variables that agents may or may not observe when they form their expectations, and let $D_k = 1$ if $U_k$ is observed, 0 otherwise. Then, if $\mathcal{I}$ is the information set generated by $(D_1 U_1, \ldots, D_m U_m)$, agents will use different subsets of the $(U_k)_{k=1,\ldots,m}$ (i.e., different pieces of information) depending on the values of the $(D_k)_{k=1,\ldots,m}$. Our setup encompasses a wide variety of situations, where individuals have private information and form their beliefs based on their information set. This includes various contexts where individuals form their expectations about future outcomes, including education, labor market as well as health outcomes. By remaining agnostic on the information set, our analysis complements several studies which primarily focus on testing for different information sets, while maintaining the rational expectations assumption (see Cunha and Heckman, 2007, for a survey).

It is easy to see that the RE hypothesis imposes restrictions on the joint distribution of realizations $Y$ and beliefs $\psi$. In this data combination context, the relevant question of interest is then whether one can rationalize RE, in the sense that there exists a triplet $(Y', \psi', \mathcal{I}')$ such that (i) the pair of random variables $(Y', \psi')$ is compatible with the marginal distributions of $Y$ and $\psi$; and (ii) $\psi'$ corresponds to the rational expectations of $Y'$, given the information set $\mathcal{I}'$, i.e., $\mathbb{E}(Y'|\mathcal{I}') = \psi'$.
Hence, we consider the test of the following hypothesis:

$\mathrm{H}_0$: there exist a pair of random variables $(Y', \psi')$ and a sigma-algebra $\mathcal{I}'$ such that $\sigma(\psi') \subset \mathcal{I}'$, $Y' \sim Y$, $\psi' \sim \psi$ and $\mathbb{E}[Y'|\mathcal{I}'] = \psi'$,

where $\sim$ denotes equality in distribution. Rationalizing RE does not mean that the true realizations $Y$, beliefs $\psi$ and information set $\mathcal{I}$ are such that $\mathbb{E}[Y|\mathcal{I}] = \psi$. Instead, it means that there exists a triplet $(Y', \psi', \mathcal{I}')$ consistent with the data and such that $\mathbb{E}[Y'|\mathcal{I}'] = \psi'$. In other words, rejecting $\mathrm{H}_0$ implies that RE does not hold, in the sense that the true realizations, beliefs, and information set do not satisfy RE ($\mathbb{E}[Y|\mathcal{I}] \neq \psi$). The converse, however, is not true.

Let $F_\psi$ and $F_Y$ denote the cumulative distribution functions (cdf) of $\psi$ and $Y$, let $x^+ = \max(0, x)$, and define
\[
\Delta(y) = \int_{-\infty}^{y} \big(F_Y(t) - F_\psi(t)\big)\, dt.
\]
Throughout most of our analysis, we impose the following regularity conditions on the distributions of realized outcomes ($Y$) and subjective beliefs ($\psi$):

Assumption 1. $\mathbb{E}(|Y|) < \infty$ and $\mathbb{E}(|\psi|) < \infty$.

The following preliminary result will be useful subsequently.
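As a reading aid, the following short derivation rewrites $\Delta$ in terms of expectations that are straightforward to estimate from each marginal sample. It is a sketch that only uses Assumption 1 and Fubini's theorem; the generic variable $Z$ is notation introduced here for illustration.

```latex
% For any random variable Z with E|Z| finite, cdf F_Z, and any real y:
\begin{align*}
(y - Z)^+ &= \int_{-\infty}^{y} \mathbf{1}\{Z \le t\}\, dt, \\
\mathbb{E}\big[(y - Z)^+\big] &= \int_{-\infty}^{y} F_Z(t)\, dt
  \quad \text{(Fubini; finite since } \mathbb{E}|Z| < \infty\text{)}, \\
\Delta(y) &= \mathbb{E}\big[(y - Y)^+\big] - \mathbb{E}\big[(y - \psi)^+\big]
  \quad \text{(apply the previous line to } Z = Y \text{ and } Z = \psi\text{)}.
\end{align*}
```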
Lemma 1. Suppose that Assumption 1 holds. Then $\mathrm{H}_0$ holds if and only if there exists a pair of random variables $(Y', \psi')$ such that $Y' \sim Y$, $\psi' \sim \psi$ and $\mathbb{E}[Y'|\psi'] = \psi'$.

Lemma 1 states that in order to test for $\mathrm{H}_0$, we can focus on the constraints on the joint distribution of $Y$ and $\psi$, and ignore those related to the information set. This is intuitive given that we impose no restrictions on this set. Our main result is Theorem 1 below. It states that rationalizing RE (i.e., $\mathrm{H}_0$) is equivalent to a continuum of moment inequalities, and one moment equality.

Theorem 1
Suppose that Assumption 1 holds. The following statements are equivalent:
(i) $\mathrm{H}_0$ holds;
(ii) ($F_Y$ mean-preserving spread of $F_\psi$) $\Delta(y) \geq 0$ for all $y \in \mathbb{R}$ and $\mathbb{E}[Y] = \mathbb{E}[\psi]$;
(iii) $\mathbb{E}\big[(y - Y)^+ - (y - \psi)^+\big] \geq 0$ for all $y \in \mathbb{R}$ and $\mathbb{E}[Y] = \mathbb{E}[\psi]$.

The implication (i) ⇒ (iii) and the equivalence between (ii) and (iii) are simple to establish. The key part of the result is to prove that (iii) implies (i). To show this, we first use Lemma 1, which states that $\mathrm{H}_0$ is equivalent to the existence of $(Y', \psi')$ such that $Y' \sim Y$, $\psi' \sim \psi$ and $\mathbb{E}[Y'|\psi'] = \psi'$. Then the result essentially follows from Strassen’s theorem (Strassen, 1965, Theorem 8).

It is interesting to note that Theorem 1 is related to the theory of risk in microeconomic theory. In particular, using the terminology of Rothschild and Stiglitz (1970), (ii) states that realizations ($Y$) are more risky than beliefs ($\psi$). The main value of Theorem 1, from a statistical point of view, is to transform $\mathrm{H}_0$ into the set of moment inequality (and equality) restrictions given by (iii). We show in Section 3 how to build a statistical test of these conditions.

Comparison with alternative approaches. We now compare our approach with alternative ones that have been proposed in the literature. In the following discussion, as in this whole section, we reason at the population level and thus ignore statistical uncertainty. Accordingly, the “tests” we consider here are formally deterministic, and we compare them in terms of data generating processes violating the null hypothesis associated with each of them.

Our approach can clearly detect many more violations of rational expectations than the “naive” approach based solely on the equality $\mathbb{E}(Y) = \mathbb{E}(\psi)$.
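The conditions in Theorem 1(iii) have direct sample analogues that only require the two marginal samples. The sketch below is illustrative and is not the RationalExp implementation: the function name `re_moment_gaps`, the grid, and the toy samples are assumptions made for this example, and no statistical inference is performed. It shows a case where the naive equality of means holds while some inequality fails.

```python
import numpy as np

def re_moment_gaps(y_sample, psi_sample, grid):
    """Sample analogues of the conditions in Theorem 1(iii).

    Returns the difference in means and, for each y in grid, the gap
    mean((y - Y)^+) - mean((y - psi)^+). The two samples may come from
    different, unmatched datasets.
    """
    y_sample = np.asarray(y_sample, dtype=float)
    psi_sample = np.asarray(psi_sample, dtype=float)
    mean_diff = y_sample.mean() - psi_sample.mean()
    gaps = np.array([
        np.maximum(y - y_sample, 0.0).mean()
        - np.maximum(y - psi_sample, 0.0).mean()
        for y in grid
    ])
    return mean_diff, gaps

# Hypothetical evaluation grid (step 0.5, includes y = 1).
grid = np.linspace(-1.0, 3.0, 9)

# Realizations more dispersed than beliefs, same mean: no inequality fails.
diff_ok, gaps_ok = re_moment_gaps([0.0, 2.0], [1.0, 1.0], grid)

# Beliefs more dispersed than realizations, same mean: the equality of
# means holds, yet the inequality fails at some y.
diff_bad, gaps_bad = re_moment_gaps([1.0, 1.0], [0.0, 2.0], grid)
```

In the second toy sample the means coincide, so a comparison of means alone would not flag anything, whereas the minimum of `gaps_bad` is negative.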
It also detects more violations than the approach based on the restrictions $\mathbb{E}(Y) = \mathbb{E}(\psi)$ and $\mathbb{V}(Y) \geq \mathbb{V}(\psi)$ (approach based on the variance), which has been considered in particular in the macroeconomic literature on the accuracy and rationality of forecasts (see, e.g., Patton and Timmermann, 2012). On the other hand, and as expected since it relies on the joint distribution of $(Y, \psi)$, the “direct” approach for testing RE, based on $\mathbb{E}(Y|\psi) = \psi$, can detect more violations of rational expectations than ours.

To better understand the differences between these four approaches (“naive”, variance, “direct”, and ours), it is helpful to consider important particular cases. Of course, if $\psi = \mathbb{E}[Y|\mathcal{I}]$, individuals are rational and none of the four approaches leads to rejecting RE. Next, consider departures from rational expectations of the form $\psi = \mathbb{E}[Y|\mathcal{I}] + \eta$, with $\eta$ independent of $\mathbb{E}[Y|\mathcal{I}]$. If $\mathbb{E}(\eta) \neq 0$, subjective beliefs are biased, and individuals are on average either over-pessimistic or over-optimistic. It follows that $\mathbb{E}(Y) \neq \mathbb{E}(\psi)$, implying that all four approaches lead to rejecting RE.

More interestingly, if $\mathbb{E}(\eta) = 0$, individuals’ expectations are right on average, and the naive approach does not lead to rejecting RE. However, it is easy to show that, as long as deviations from RE are heterogeneous in the population ($\mathbb{V}(\eta) > 0$), [...] ($\eta$) relative to the uncertainty shocks ($\varepsilon = Y - \mathbb{E}(Y|\mathcal{I})$). In other words and intuitively, we reject RE whenever departures from RE dominate the uncertainty shocks affecting the outcome.
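To make the role of the relative dispersion of $\eta$ and $\varepsilon$ concrete, here is a small simulation sketch. Everything in it is an assumption chosen for illustration: normal draws for $\mathbb{E}[Y|\mathcal{I}]$, $\varepsilon$ and $\eta$, the sample size, the seed, and the helper names `min_gap` and `simulate`. It evaluates sample analogues of the inequalities in Theorem 1(iii) on two independent samples and does not implement the statistical test.

```python
import numpy as np

def min_gap(y_sample, psi_sample, grid):
    """Smallest sample analogue of E[(y - Y)^+ - (y - psi)^+] over grid."""
    gaps = [
        np.maximum(y - y_sample, 0.0).mean()
        - np.maximum(y - psi_sample, 0.0).mean()
        for y in grid
    ]
    return min(gaps)

def simulate(sd_eta, sd_eps=1.0, n=50_000, seed=0):
    """Draw unmatched samples of Y = m + eps and psi = m + eta.

    m plays the role of E[Y | I]; eta has mean zero, so beliefs are
    right on average in both designs below.
    """
    rng = np.random.default_rng(seed)
    # Two separate sets of individuals, as in a data combination setting.
    m_y = rng.normal(0.0, 1.0, n)
    m_psi = rng.normal(0.0, 1.0, n)
    y_sample = m_y + rng.normal(0.0, sd_eps, n)
    psi_sample = m_psi + rng.normal(0.0, sd_eta, n)
    return y_sample, psi_sample

grid = np.linspace(-6.0, 6.0, 61)

# Deviations from RE less dispersed than the uncertainty shocks.
small = min_gap(*simulate(sd_eta=0.5), grid)
# Deviations from RE more dispersed than the uncertainty shocks.
large = min_gap(*simulate(sd_eta=2.0), grid)
```

In this sketch, `small` should be zero or slightly negative only through sampling noise, while `large` should be clearly negative, in line with the intuition that rejection occurs when departures from RE dominate the uncertainty shocks.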
Formally, and using similar arguments as in Proposition 4 in Subsection 2.2.4, one can show that if $\varepsilon$ is independent of $\mathbb{E}[Y|\mathcal{I}]$, we reject $\mathrm{H}_0$ as long as the distribution of the uncertainty shocks stochastically dominates at the second order the distribution of the deviations from RE.

Specifically, if $\varepsilon \sim \mathcal{N}(0, \sigma_\varepsilon^2)$ and $\eta \sim \mathcal{N}(0, \sigma_\eta^2)$, we reject RE if and only if $\sigma_\eta^2 > \sigma_\varepsilon^2$. In such a case, our approach boils down to the variance approach mentioned above: we reject whenever $\mathbb{V}(\psi) > \mathbb{V}(Y)$. But interestingly, if the discrepancy ($\eta$) between beliefs and RE is not normally distributed, we can reject $\mathrm{H}_0$ even if $\mathbb{V}(\psi) \leq \mathbb{V}(Y)$. Suppose for instance that $\varepsilon \sim \mathcal{N}(0, 1)$ and
\[
\eta = a\big(-\mathbf{1}\{U \leq 0.1\} + \mathbf{1}\{U \geq 0.9\}\big), \quad U \sim \mathcal{U}[0, 1] \text{ and } a > 0.
\]
In other words, 80% of individuals are rational, 10% are over-pessimistic and form expectations equal to $\mathbb{E}[Y|\mathcal{I}] - a$, whereas 10% are over-optimistic and expect $\mathbb{E}[Y|\mathcal{I}] + a$. Then one can show that our approach leads to rejecting RE when $a \geq$ [...]. With $a = 1.$[...], $\mathbb{V}(\eta) \simeq$ [...] $< \mathbb{V}(\varepsilon) = 1$.

Binary outcome.
Our equivalence result does not require the outcome $Y$ to be continuously distributed. In the particular case where $Y$ is binary, our test reduces to the naive test of $\mathbb{E}(Y) = \mathbb{E}(\psi)$. Indeed, when $Y$ is a binary outcome and $\psi \in [0, 1]$, then provided that $\mathbb{E}(Y) = \mathbb{E}(\psi)$, the inequalities $\mathbb{E}\big[(y - Y)^+ - (y - \psi)^+\big] \geq 0$ hold for all $y \in \mathbb{R}$. This applies to expectations about binary events, such as being employed or not at a given date.

Interpretation of the boundary condition.
To shed further light on our test and on the interpretation of $\mathrm{H}_0$, it is instructive to derive the distributions of $Y|\psi$ that correspond to the boundary condition ($\Delta(y) = 0$). The proposition below shows that, in the presence of rational expectations, agents whose beliefs $\psi$ lie at the boundary of $\mathrm{H}_0$ have perfect foresight, i.e., $\psi = \mathbb{E}[Y|\mathcal{I}] = Y$.

Proposition 1
Suppose that $(Y, \psi)$ satisfies RE, $u \mapsto F^{-1}_{Y|\psi}(\tau|u)$ is continuous for all $\tau \in (0, 1)$, and $\Delta(y) = 0$ for some $y$ in the interior of the support of $\psi$. Then the distribution of $Y$ conditional on $\psi = y$ is degenerate: $P(Y = y \,|\, \psi = y) = 1$.

Footnote: For any cdf $F$, we let $F^{-1}$ denote its quantile function, namely $F^{-1}(\tau) = \inf\{x : F(x) \geq \tau\}$.

2.2.2 Equivalence with covariates

In practice we may observe additional variables $X \in \mathbb{R}^{d_X}$ in both datasets. Assuming that $X$ is in the agent’s information set, we modify $\mathrm{H}_0$ as follows:

$\mathrm{H}_{0X}$: there exist a pair of random variables $(Y', \psi')$ and a sigma-algebra $\mathcal{I}'$ such that $\sigma(\psi', X) \subset \mathcal{I}'$, $Y'|X \sim Y|X$, $\psi'|X \sim \psi|X$ and $\mathbb{E}[Y'|\mathcal{I}'] = \psi'$.

Adding covariates increases the number of restrictions that are implied by the rational expectations hypothesis, thus improving our ability to detect violations of rational expectations. Proposition 2 below formalizes this idea and shows that $\mathrm{H}_{0X}$ can be expressed as a continuum of conditional moment inequalities, and one conditional moment equality.

Proposition 2
Suppose that Assumption 1 holds. The following two statements are equivalent:
(i) $\mathrm{H}_{0X}$ holds;
(ii) Almost surely, $\mathbb{E}\big[(y - Y)^+ - (y - \psi)^+ \,\big|\, X\big] \geq 0$ for all $y \in \mathbb{R}$ and $\mathbb{E}[Y - \psi \,|\, X] = 0$.
Moreover, if $\mathrm{H}_{0X}$ holds, $\mathrm{H}_0$ holds as well.

Footnote: See complementary work by Gutknecht et al. (2018), who use subjective expectations data to relax the rational expectations assumption, and propose a method to test whether specific covariates are included in the agents’ information sets.

Oftentimes, the outcome variable is affected not only by individual-specific shocks, but also by aggregate shocks. We denote by $C$ the random variable corresponding to the aggregate shocks. The issue, in this case, is that we observe a single realization of $C$ ($c$, say), along with the outcome variable conditional on that realization $C = c$. In other words, we only identify $F_{Y|C=c}$ rather than $F_Y$, as the latter would require integrating over the distribution of all possible aggregate shocks. Moreover, the restriction $\mathbb{E}[Y|C = c, \psi] = \psi$ is generally violated, even though the rational expectations hypothesis holds. It follows that one cannot directly apply our previous results by simply replacing $F_Y$ by $F_{Y|C=c}$. In such a case, one has to make additional assumptions on how the aggregate shocks affect the outcome.

To illustrate our approach, let us consider the example of individual income. Suppose that the logarithm of income of individual $i$ at period $t$, denoted by $Y_{it}$, satisfies a Restricted Income Profile model:
\[
Y_{it} = \alpha_i + \beta_t + \varepsilon_{it},
\]
where $\beta_t$ captures aggregate (macroeconomic) shocks, $\varepsilon_{it}$ follows a zero-mean random walk, and $\alpha_i$, $(\beta_t)_t$ and $(\varepsilon_{it})_t$ are assumed to be mutually independent. Let $\mathcal{I}_{it-1}$ denote individual $i$’s information set at time $t - 1$, and suppose that $\mathcal{I}_{it-1} = \sigma\big(\alpha_i, (\beta_{t-k})_{k \geq 1}, (\varepsilon_{it-k})_{k \geq 1}\big)$. If individuals form rational expectations on their future outcomes, their beliefs in period $t - 1$ about their outcome in period $t$ are given by
\[
\psi_{it} = \mathbb{E}[Y_{it}|\mathcal{I}_{it-1}] = \alpha_i + \mathbb{E}\big[\beta_t \,\big|\, (\beta_{t-k})_{k \geq 1}\big] + \varepsilon_{it-1}.
\]
Thus, $Y_{it} = \psi_{it} + C_t + \varepsilon_{it} - \varepsilon_{it-1}$, with $C_t = \beta_t - \mathbb{E}\big[\beta_t \,\big|\, (\beta_{t-k})_{k \geq 1}\big]$. The corresponding conditional expectation is given by:
\[
\mathbb{E}[Y_{it}|\mathcal{I}_{it-1}, C_t = c_t] = \psi_{it} + c_t \neq \psi_{it}.
\]
To get closer to our initial set-up, we now drop indexes $i$ and $t$ and keep the conditioning on the aggregate shocks $C = c$ implicit. Under these conventions, rationalizing RE does not correspond to $\mathbb{E}[Y|\mathcal{I}] = \psi$, but instead to $\mathbb{E}[Y|\mathcal{I}] = c_0 + \psi$ for some $c_0 \in \mathbb{R}$. A similar reasoning applies to multiplicative instead of additive aggregate shocks. In such a case, the null takes the form $\mathbb{E}[Y|\mathcal{I}] = c_0 \psi$, for some $c_0 > 0$. In these two examples, $c_0$ is identifiable: by $c_0 = \mathbb{E}(Y) - \mathbb{E}(\psi)$ in the additive case, by $c_0 = \mathbb{E}(Y)/\mathbb{E}(\psi)$ in the multiplicative case. Moreover, there exists in both cases a known function $q(y, c)$ such that $\mathbb{E}(q(Y, c_0)) = \mathbb{E}(\psi)$, namely $q(y, c) = y - c$ and $q(y, c) = y/c$ for additive and multiplicative shocks, respectively.

More generally, we consider the following null hypothesis for testing RE in the presence of aggregate shocks:

$\mathrm{H}_{0S}$: there exist random variables $(Y', \psi')$, a sigma-algebra $\mathcal{I}'$ and $c_0 \in \mathbb{R}$ such that $\sigma(\psi') \subset \mathcal{I}'$, $Y' \sim Y$, $\psi' \sim \psi$ and $\mathbb{E}[q(Y', c_0)|\mathcal{I}'] = \psi'$.

Here $q(\cdot, \cdot)$ is a known function assumed to satisfy the following restrictions.

Assumption 2. $\mathbb{E}(|\psi|) < \infty$ and for all $c$, $\mathbb{E}(|q(Y, c)|) < \infty$. Moreover, $\mathbb{E}[q(Y, c)] = \mathbb{E}[\psi]$ admits a unique solution, $c_0$.

By applying our main equivalence result (Theorem 1) to $q(Y, c_0)$ and $\psi$, we obtain the following result.

Proposition 3
Suppose that Assumption 2 holds. Then the following statements are equivalent:
(i) $\mathrm{H}_{0S}$ holds;
(ii) $\mathbb{E}\big[(y - q(Y, c_0))^+ - (y - \psi)^+\big] \geq 0$ for all $y \in \mathbb{R}$.

A few remarks on this proposition are in order. First, this result can be extended in a straightforward way to a setting with covariates. This is important not only to increase the ability of our test to detect violations of RE, but also because this allows for aggregate shocks that differ across observable groups. We discuss this extension further, together with the corresponding statistical test, in Appendix A.1. Second, in the presence of aggregate shocks, the null hypothesis does not involve a moment equality restriction anymore; the corresponding moment is used instead to identify $c_0$. Relatedly, a clear limitation of the naive test ($\mathbb{E}(Y) = \mathbb{E}(\psi)$) is that, unlike our test, it is not robust to aggregate shocks. In this case, rejecting the null could either stem from violations of the rational expectations hypothesis, or simply from the presence of aggregate shocks. Third, in Appendix A.2, we examine whether one can extend the results above to test for RE when aggregate shocks affect the outcomes in a more general way. Proposition 6 establishes a negative result in this respect: as long as one allows for a sufficiently flexible dependence between the outcome and the aggregate shocks, any given distribution of subjective expectations is arbitrarily close to a distribution for which RE can be rationalized. This implies that, within this more general class of outcome models, there does not exist any almost-surely continuous RE test that has non-trivial power.

2.2.4 Robustness to measurement errors

We have assumed so far that $Y$ and $\psi$ were perfectly observed; yet measurement errors in survey data are pervasive (see, e.g., Bound, Brown and Mathiowetz, 2001). We explore in the following the extent to which our test is robust to measurement errors.
By robust, we mean that the test does not incorrectly reject RE when it in fact holds. Specifically, assume that the true variables ($\psi$ and $Y$) are unobserved. Instead, we only observe $\widehat{\psi}$ and $\widehat{Y}$, which are affected by classical measurement errors. Namely:
\[
\begin{aligned}
\widehat{\psi} &= \psi + \xi_\psi \quad \text{with } \xi_\psi \perp\!\!\!\perp \psi,\ \mathbb{E}[\xi_\psi] = 0, \\
\widehat{Y} &= Y + \xi_Y \quad \text{with } \xi_Y \perp\!\!\!\perp Y,\ \mathbb{E}[\xi_Y] = 0.
\end{aligned}
\tag{1}
\]
The following proposition shows that our test is robust to a certain degree of measurement errors on the beliefs.

Proposition 4
Suppose that $Y$ and $\psi$ satisfy $\mathrm{H}_0$, and let $\varepsilon = Y - \psi$ and $\big(\widehat{\psi}, \widehat{Y}\big)$ be defined as in (1). Suppose also that $\varepsilon + \xi_Y \perp\!\!\!\perp \psi$ and $F_{\xi_\psi}$ dominates at the second order $F_{\xi_Y + \varepsilon}$. Then $\widehat{Y}$ and $\widehat{\psi}$ satisfy $\mathrm{H}_0$.

The key condition is that $F_{\xi_\psi}$ dominates at the second order $F_{\xi_Y + \varepsilon}$, or, equivalently here, that $F_{\xi_Y + \varepsilon}$ is a mean-preserving spread of $F_{\xi_\psi}$. Recall that in the case of normal variables, $\xi_\psi \sim \mathcal{N}(0, \sigma_1^2)$ and $\xi_Y + \varepsilon \sim \mathcal{N}(0, \sigma_2^2)$, this is in turn equivalent to imposing $\sigma_1^2 \leq \sigma_2^2$. Thus, even if there is no measurement error on $Y$, so that $\xi_Y = 0$, this condition may hold provided that the variance of measurement errors on $\psi$ is smaller than the variance of the uncertainty shocks on $Y$. More generally, this allows elicited beliefs to be (potentially much) noisier than realized outcomes, a setting which is likely to be relevant in practice. One should not infer, however, that measurement errors are innocuous in our set-up. Indeed, the converse of Proposition 4 does not hold: we may reject $\mathrm{H}_0$ with $Y$ and $\psi$, but not with $\widehat{Y}$ and $\widehat{\psi}$. As a simple example, suppose that $Y \sim \mathcal{N}(0, \sigma_Y^2)$, $\psi \sim \mathcal{N}(0, \sigma_\psi^2)$, $\xi_Y \sim \mathcal{N}(0, \sigma^2)$, $\xi_\psi = 0$ and $\sigma_\psi^2 \in (\sigma_Y^2, \sigma_Y^2 + \sigma^2]$. Then, $\widehat{Y}$ and $\widehat{\psi}$ satisfy $\mathrm{H}_0$, since $\sigma_\psi^2 \leq \sigma_Y^2 + \sigma^2$, whereas $Y$ and $\psi$ do not, since $\sigma_\psi^2 > \sigma_Y^2$. Importantly though, Proposition 4 does show that our test is conservative in the sense that measurement errors cannot result in incorrectly concluding that the RE hypothesis does not hold.

Footnote: See Zafar (2011a), who does not find evidence of non-classical measurement errors on subjective beliefs elicited from a sample of Northwestern undergraduate students. We conjecture that our test is robust to some forms of non-classical measurement errors. However, it seems difficult in this case to obtain a general result similar to the one in Proposition 4.

In situations where $(\widehat{Y}, \widehat{\psi})$ are jointly observed, one could in principle alternatively implement the direct test. However, in contrast to our test, the direct test is not robust to any measurement errors on the subjective beliefs $\psi$. Indeed, if RE holds, so that $\mathbb{E}[Y|\psi] = \psi$, it is nevertheless the case that $\mathbb{E}\big[\widehat{Y} \,\big|\, \widehat{\psi}\big] \neq \widehat{\psi}$, as long as $\mathrm{Cov}(\xi_Y, \widehat{\psi}) = \mathrm{Cov}(\xi_\psi, Y) = 0$ and $\mathbb{V}(\xi_\psi) > 0$. In other words, even if individuals have rational expectations, the direct test will reject the null hypothesis in the presence of even an arbitrarily small degree of measurement errors on the elicited beliefs.

Also, it is unclear whether, in the presence of measurement errors on the elicited beliefs and beyond the restrictions on the marginal distributions, there are restrictions on the copula of $(\widehat{Y}, \widehat{\psi})$ that are implied by RE. For instance, we show in Proposition 7 in Appendix B that under RE, and without imposing restrictions on the dependence between $\xi_Y + \varepsilon$ and $\xi_\psi$, the coefficient of the (theoretical) linear regression of $\widehat{Y}$ on $\widehat{\psi}$ remains unrestricted. On the other hand, if one assumes that $\mathrm{Cov}(\xi_Y + \varepsilon, \xi_\psi) \geq 0$ and $\mathbb{V}(\psi)/\mathbb{V}(\xi_\psi) \geq \lambda$ for some $\lambda \geq 0$, Proposition 7 also shows that the coefficient of the linear regression of $\widehat{Y}$ on $\widehat{\psi}$ is bounded from below under RE. Such a restriction, which does require taking a stand on the signal-to-noise ratio $\mathbb{V}(\psi)/\mathbb{V}(\xi_\psi)$, can be easily added to the moment inequalities of our test if $(\widehat{Y}, \widehat{\psi})$ is observed.

Footnote: There might of course be additional relevant information in the higher-order moments, although we have not been able to find any.

We now briefly discuss other relevant directions in which Theorem 1 can be extended. First, another potential source of uncertainty on $\psi$ is rounding. Rounding practices by interviewees are common in the case of subjective beliefs. Under additional restrictions, it is possible in such a case to construct bounds on the true beliefs $\psi$ (see, e.g., Manski and Molinari, 2010). We show in Appendix C that our test can be generalized to accommodate this rounding practice.

Second, we have implicitly maintained the assumption so far that subjective beliefs and realized outcomes are drawn from the same population. In Appendix D, we [...] $(Y_k)_{k=1,\ldots,K}$ and multiple subjective beliefs $(\psi_k)_{k=1,\ldots,K}$ associated with each of these outcomes. Specifically, whether one can rationalize rational expectations in this environment can be written as:
\[
\mathbb{E}(Y_k \,|\, \psi_1, \ldots, \psi_K) = \psi_k, \quad \text{for all } k \in \{1, \ldots, K\},
\]
which, in turn, is equivalent to the distribution of the outcomes $Y_k$ being a mean-preserving spread of the distribution of the beliefs $\psi_k$. This situation arises in various contexts, including cases where respondents declare their subjective probabilities of making particular choices among $K + 1$ possible alternatives. This also arises in situations where expectations about the distribution of a continuous outcome $Y$ are elicited through questions of the form “what do you think is the percent chance that [Y] will be greater than [y]?”, for different values $(y_k)_{k=1,\ldots,K}$.
In such cases, it is natural to build a RE test based on the multiple outcomes $(\mathbb{1}\{Y > y_k\})_{k=1,\dots,K}$ and subjective beliefs $(\psi_k)_{k=1,\dots,K}$, where $\psi_k$ is the subjective survival function of $Y$ evaluated at $y_k$.

We now propose a testing procedure for $H_{0X}$, which can be easily adapted to the case where no covariate common to both datasets is available to the analyst. To simplify notation, we use a potential outcome framework to describe our data combination problem. Specifically, instead of observing $(Y, \psi)$, we suppose that we observe only, in addition to the covariates $X$, $\widetilde{Y} = DY + (1 - D)\psi$ and $D$, where $D = 1$ (resp. $D = 0$) if the unit belongs to the dataset of $Y$ (resp. $\psi$). As in Subsection 2.1, we assume that the two samples are drawn from the same population, which amounts to supposing that $D \perp\!\!\!\perp (X, Y, \psi)$ (see Assumption 3-(i) below). In order to build our test, we use the characterization (ii) of Proposition 2:

$$E\left[(y - Y)^+ - (y - \psi)^+ \mid X\right] \geq 0 \;\; \forall y \in \mathbb{R} \quad \text{and} \quad E[Y - \psi \mid X] = 0.$$

Equivalently, in terms of $\widetilde{Y}$ only,

$$E\left[W (y - \widetilde{Y})^+ \mid X\right] \geq 0 \;\; \forall y \in \mathbb{R} \quad \text{and} \quad E\left[W \widetilde{Y} \mid X\right] = 0,$$

where $W = D/E(D) - (1 - D)/E(1 - D)$. This formulation of the null hypothesis allows us to apply the instrumental functions approach of Andrews and Shi (2017, AS), who consider the issue of testing many conditional moment inequalities and equalities. We then build on their results to establish that our test controls size asymptotically and is consistent over fixed alternatives. The initial step is to transform the conditional moments into the following unconditional moment conditions:

$$E\left[W (y - \widetilde{Y})^+ g(X)\right] \geq 0, \qquad E\left[(Y - \psi) g(X)\right] = 0,$$

for all $y \in \mathbb{R}$ and $g$ belonging to a suitable class of non-negative functions.

We suppose that we observe a sample $(D_i, X_i, \widetilde{Y}_i)_{i=1\dots n}$ of $n$ i.i.d. copies of $(D, X, \widetilde{Y})$. We consider instrumental functions $g$ that are indicators of belonging to specific hypercubes within $[0, 1]^{d_X}$, hence we transform the variables $X_i$ to lie in $[0, 1]^{d_X}$. For notational convenience, we let $\widetilde{X}_i$ denote the nontransformed vector of covariates, and redefine $X_i$ as:

$$X_i = \Phi_0\left(\widehat{\Sigma}_{\widetilde{X},n}^{-1/2} \left(\widetilde{X}_i - \overline{\widetilde{X}}_n\right)\right),$$

where, for any $x = (x_1, \dots, x_{d_X})$, we let $\Phi_0(x) = (\Phi(x_1), \dots, \Phi(x_{d_X}))^\top$. Here $\Phi$ denotes the standard normal cdf, $\widehat{\Sigma}_{\widetilde{X},n}$ is the sample covariance matrix of $(\widetilde{X}_i)_{i=1\dots n}$ and $\overline{\widetilde{X}}_n$ its sample mean.

Specifically, we consider instrumental functions $g$ belonging to the class of functions $\mathcal{G}_r = \{g_{a,r}, a \in A_r\}$, with $A_r = \{1, 2, \dots, 2r\}^{d_X}$ ($r \geq 1$), $g_{a,r}(x) = \mathbb{1}\{x \in C_{a,r}\}$ and, for any $a = (a_1, \dots, a_{d_X})^\top \in A_r$,

$$C_{a,r} = \prod_{u=1}^{d_X} \left(\frac{a_u - 1}{2r}, \frac{a_u}{2r}\right].$$

(Footnote: Other testing procedures could be used to implement our test, such as that proposed by Linton et al. (2010).)

To define the test statistic $T$, we need to introduce additional notation. First, let $w_i = nD_i / \sum_{j=1}^n D_j - n(1 - D_i) / \sum_{j=1}^n (1 - D_j)$ and define, for any $y \in \mathbb{R}$,

$$m\left(D_i, \widetilde{Y}_i, X_i, g, y\right) = \begin{pmatrix} m_1\left(D_i, \widetilde{Y}_i, X_i, g, y\right) \\ m_2\left(D_i, \widetilde{Y}_i, X_i, g, y\right) \end{pmatrix} = \begin{pmatrix} w_i \left(y - \widetilde{Y}_i\right)^+ g(X_i) \\ w_i \widetilde{Y}_i \, g(X_i) \end{pmatrix}. \qquad (2)$$

Let $\overline{m}_n(g, y) = \sum_{i=1}^n m(D_i, \widetilde{Y}_i, X_i, g, y)/n$ and define similarly $\overline{m}_{n,j}$ for $j = 1, 2$. For any function $g$ and any $y \in \mathbb{R}$, we also define, for some $\epsilon > 0$,

$$\overline{\Sigma}_n(g, y) = \widehat{\Sigma}_n(g, y) + \epsilon \, \mathrm{Diag}\left(\widehat{V}(\widetilde{Y}), \widehat{V}(\widetilde{Y})\right),$$

where $\widehat{\Sigma}_n(g, y)$ is the sample covariance matrix of $\sqrt{n}\,\overline{m}_n(g, y)$ and $\widehat{V}(\widetilde{Y})$ is the empirical variance of $\widetilde{Y}$. We then denote by $\overline{\Sigma}_{n,jj}(g, y)$ ($j = 1, 2$) the $j$-th diagonal term of $\overline{\Sigma}_n(g, y)$.

Then the (Cramér-von-Mises) test statistic $T$ is defined by

$$T = \sup_{y \in \widehat{\mathcal{Y}}} \sum_{r=1}^{r_n} \frac{(2r)^{-d_X}}{(r^2 + 100)} \sum_{a \in A_r} \left[ (1 - p) \left( -\frac{\sqrt{n}\,\overline{m}_{n,1}(g_{a,r}, y)}{\overline{\Sigma}_{n,11}(g_{a,r}, y)^{1/2}} \right)^{+\,2} + p \left( \frac{\sqrt{n}\,\overline{m}_{n,2}(g_{a,r}, y)}{\overline{\Sigma}_{n,22}(g_{a,r}, y)^{1/2}} \right)^{2} \right],$$

where $\widehat{\mathcal{Y}} = \left[\min_{i=1,\dots,n} \widetilde{Y}_i, \max_{i=1,\dots,n} \widetilde{Y}_i\right]$, $p \in (0, 1)$ is a parameter weighting the moment inequalities versus equalities and $(r_n)_{n \in \mathbb{N}}$ is a deterministic sequence tending to infinity.

To test for rational expectations in the absence of covariates, we set the instrumental function equal to the constant function $g(X) = 1$, and the test statistic is simply written as:

$$T = \sup_{y \in \widehat{\mathcal{Y}}} \left[ (1 - p) \left( -\frac{\sqrt{n}\,\overline{m}_{n,1}(y)}{\overline{\Sigma}_{n,11}(y)^{1/2}} \right)^{+\,2} + p \left( \frac{\sqrt{n}\,\overline{m}_{n,2}(y)}{\overline{\Sigma}_{n,22}(y)^{1/2}} \right)^{2} \right],$$

where, using the notation introduced above, $\overline{m}_{n,j}(y) = \overline{m}_{n,j}(1, y)$ and $\overline{\Sigma}_{n,jj}(y) = \overline{\Sigma}_{n,jj}(1, y)$ ($j = 1, 2$).

The test is $\varphi_{n,\alpha} = \mathbb{1}\{T > c^*_{n,\alpha}\}$, where the estimated critical value $c^*_{n,\alpha}$ is obtained by bootstrap using, as in AS, the Generalized Moment Selection method. Specifically, we follow three steps:

1. Compute the function $\varphi_n(y, g) = (\varphi_{n,1}(y, g), 0)^\top$ for $(y, g)$ in $\widehat{\mathcal{Y}} \times \cup_{r=1}^{r_n} \mathcal{G}_r$, with
$$\varphi_{n,1}(y, g) = \overline{\Sigma}_{n,11}^{1/2} B_n \mathbb{1}\left\{ \frac{n^{1/2}}{\kappa_n} \overline{\Sigma}_{n,11}^{-1/2} \overline{m}_{n,1}(y, g) > 1 \right\},$$
and where $B_n = (b \ln(n) / \ln(\ln(n)))^{1/2}$, $b > 0$, $\kappa_n = (\kappa \ln(n))^{1/2}$, and $\kappa > 0$. To compute $\overline{\Sigma}_{n,11}$, we fix $\epsilon$ to 0.05, as in AS.

2. Let $(D_i^*, \widetilde{Y}_i^*, X_i^*)_{i=1,\dots,n}$ denote a bootstrap sample, i.e., an i.i.d. sample from the empirical cdf of $(D, \widetilde{Y}, X)$, and compute from this sample the bootstrap counterparts of $\overline{m}_n$ and $\overline{\Sigma}_n$, $\overline{m}_n^*$ and $\overline{\Sigma}_n^*$. Then compute the bootstrap counterpart of $T$, $T^*$, replacing $\overline{\Sigma}_n(y, g_{a,r})$ and $\sqrt{n}\,\overline{m}_n(y, g_{a,r})$ by $\overline{\Sigma}_n^*(y, g_{a,r})$ and $\sqrt{n}\,(\overline{m}_n^* - \overline{m}_n)(y, g_{a,r}) + \varphi_n(y, g_{a,r})$, respectively.

3. The threshold $c^*_{n,\alpha}$ is the quantile (conditional on the data) of order $1 - \alpha + \eta$ of $T^* + \eta$ for some $\eta > 0$. Following AS, we set $\eta$ to $10^{-6}$.

Note that, despite the multiple steps involved, the testing procedure remains computationally tractable. In particular, for the baseline sample we use in our application (see Section 5.1), the RE test only takes 2 minutes.

(Footnote: This CPU time is obtained using our companion R package, on an Intel Xeon CPU E5-2643, 3.30GHz with 256Gb of RAM.)

We now turn to the asymptotic properties of the test. For that purpose, it is convenient to introduce additional notation. Let $\mathcal{Y}$ and $\mathcal{X}$ denote the support of $Y$ and $X$ respectively, and

$$L_F = \left\{ (y, g_{a,r}) : y \in \mathcal{Y}, (a, r) \in A_r \times \mathbb{N} : E_F\left[W (y - \widetilde{Y})^+ g_{a,r}(X)\right] = 0 \right\},$$

where, to make the dependence on the underlying probability measure explicit, $E_F$ denotes the expectation with respect to the distribution $F$ of $(D, \widetilde{Y}, X)$. Finally, let $\mathcal{F}$ denote a subset of all possible cumulative distribution functions of $(D, \widetilde{Y}, X)$ and $\mathcal{F}_0$ be the subset of $\mathcal{F}$ such that $H_{0X}$ holds. We impose the following conditions on $\mathcal{F}$ and $\mathcal{F}_0$.

Assumption 3

(i) For all $F \in \mathcal{F}$, $D \perp\!\!\!\perp (X, Y, \psi)$;

(ii) There exists $M > 0$ such that $\widetilde{Y} \in [-M, M]$ for all $F \in \mathcal{F}$. Also, $\inf_{F \in \mathcal{F}} V_F(\widetilde{Y}) > 0$ and $0 < \inf_{F \in \mathcal{F}} E_F[D] \leq \sup_{F \in \mathcal{F}} E_F[D] < 1$;

(iii) For all $F \in \mathcal{F}_0$, $K_F$, the asymptotic covariance kernel of $n^{-1/2} \mathrm{Diag}(V_F(\widetilde{Y}))^{-1/2} \overline{m}_n$, is in a compact set $\mathcal{K}_2$ of the set of all $2 \times 2$ matrix-valued covariance kernels on $\mathcal{Y} \times \cup_{r \geq 1} \mathcal{G}_r$ with uniform metric $d$ defined by
$$d(K, K') = \sup_{(y, g, y', g') \in (\mathcal{Y} \times \cup_{r \geq 1} \mathcal{G}_r)^2} \left\| K(y, g, y', g') - K'(y, g, y', g') \right\|.$$

The main result of this section is Theorem 2. It shows that, under Assumption 3, the test $\varphi_{n,\alpha}$ controls the asymptotic size and is consistent over fixed alternatives.

Theorem 2. Suppose that $r_n \to \infty$ and Assumption 3 holds. Then:

(i) $\limsup_{n \to \infty} \sup_{F \in \mathcal{F}_0} E_F[\varphi_{n,\alpha}] \leq \alpha$;

(ii) If there exists $F_0 \in \mathcal{F}_0$ such that $L_{F_0}$ is nonempty and there exists $(j, y_0, g_0)$ in $\{1, 2\} \times L_{F_0}$ such that $K_{F_0,jj}(y_0, g_0, y_0, g_0) > 0$, then, for any $\alpha \in [0, 1/2)$,
$$\lim_{\eta \to 0} \limsup_{n \to \infty} \sup_{F \in \mathcal{F}_0} E_F[\varphi_{n,\alpha}] = \alpha;$$

(iii) If $F \in \mathcal{F} \setminus \mathcal{F}_0$, then $\lim_{n \to \infty} E_F(\varphi_{n,\alpha}) = 1$.

Theorem 2 (i) is closely related to Theorem 5.1 and Lemma 2 in AS. It shows that the test $\varphi_{n,\alpha}$ controls the asymptotic size, in the sense that the supremum over $\mathcal{F}_0$ of its level is asymptotically less than or equal to $\alpha$. To prove this result, the key is to establish that, under Assumption 3, the class of transformed unconditional moment restrictions that characterize the null hypothesis satisfies a manageability condition (see Pollard, 1990). Using arguments from Hsu (2016), we then exhibit cases of equality in Theorem 2 (ii), showing that, under mild additional regularity conditions, the test has asymptotically exact size (when letting $\eta$ tend to zero). Finally, Theorem 2 (iii), which is based on Theorem 6.1 in AS, shows that the test is consistent over fixed alternatives.

Extension to account for aggregate shocks. This testing procedure can be easily modified to accommodate unanticipated aggregate shocks. Specifically, using the notation defined in Section 2.2.3, we consider the same test as above after replacing $\widetilde{Y}$ by $\widetilde{Y}_{\widehat{c}} = D q(Y, \widehat{c}) + (1 - D)\psi$, where $\widehat{c}$ denotes a consistent estimator of $c$. The resulting test is given by $\varphi_{n,\alpha,\widehat{c}} = \mathbb{1}\{T(\widehat{c}) > c^*_{n,\alpha}\}$ (where $T(\widehat{c})$ is obtained by replacing $\widetilde{Y}$ by $\widetilde{Y}_{\widehat{c}}$ in the original test statistic). Such tests have the same properties as those above under some mild regularity conditions on $q(\cdot, \cdot)$, which hold in particular for the leading examples of additive and multiplicative shocks ($q(y, c) = y - c$ and $q(y, c) = y/c$).
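As an illustration only, the no-covariate statistic defined in this section can be sketched in Python. This is not the companion R package: the function name `re_test_statistic`, its arguments, the finite grid used for the supremum, and the optional `c_hat` rescaling for a multiplicative shock are all assumptions made for this sketch, and the bootstrap critical value $c^*_{n,\alpha}$ is not computed.

```python
import numpy as np

def re_test_statistic(y_real, psi, p=0.05, eps=0.05, n_grid=100, c_hat=None):
    """Sketch of the no-covariate statistic T (g(X) = 1); no critical value."""
    y_real = np.asarray(y_real, dtype=float)
    psi = np.asarray(psi, dtype=float)
    if c_hat is not None:
        # Multiplicative aggregate shock: replace Y by q(Y, c_hat) = Y / c_hat.
        y_real = y_real / c_hat
    # Stack both samples: D = 1 for realizations, D = 0 for beliefs.
    y_tilde = np.concatenate([y_real, psi])
    d = np.concatenate([np.ones(y_real.size), np.zeros(psi.size)])
    n = y_tilde.size
    # w_i = n D_i / sum_j D_j - n (1 - D_i) / sum_j (1 - D_j)
    w = n * d / d.sum() - n * (1.0 - d) / (1.0 - d).sum()
    ridge = eps * y_tilde.var()  # eps * empirical variance of Y_tilde

    # Equality moment m_2 = w * Y_tilde (it does not depend on y).
    m2 = w * y_tilde
    t2 = np.sqrt(n) * m2.mean() / np.sqrt(m2.var() + ridge)

    stat = 0.0
    for y in np.linspace(y_tilde.min(), y_tilde.max(), n_grid):
        # Inequality moment m_1 = w * (y - Y_tilde)^+; under the null its mean is >= 0.
        m1 = w * np.maximum(y - y_tilde, 0.0)
        t1 = np.sqrt(n) * m1.mean() / np.sqrt(m1.var() + ridge)
        stat = max(stat, (1 - p) * max(-t1, 0.0) ** 2 + p * t2 ** 2)
    return stat


rng = np.random.default_rng(0)
psi = rng.normal(size=2000)
# Realizations equal in law to beliefs plus mean-zero noise (compatible with RE) ...
t_re = re_test_statistic(rng.normal(size=2000) + rng.normal(size=2000), psi)
# ... versus realizations that are less dispersed than beliefs (incompatible with RE).
t_viol = re_test_statistic(0.2 * rng.normal(size=2000), psi)
```

In the second call the realizations cannot be a mean-preserving spread of the beliefs, so the inequality moments are strongly negative around $y = 0$ and the statistic is far larger than in the first call. A decision at level $\alpha$ would still require the bootstrap steps 1 to 3.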
We refer the reader to Appendix A.1 for a detailed discussion of this extension.

In the following we study the finite sample performance of the test without covariates through Monte Carlo simulations. The finite sample performance of the version of our test that accounts for covariates is reported and discussed in Appendix E.

We suppose that the outcome $Y$ is given by

$$Y = \rho\psi + \varepsilon,$$

with $\rho \in [0, 1]$, $\psi \sim N(0, 1)$ and

$$\varepsilon = \zeta \left( -\mathbb{1}\{U \leq […]\} + \mathbb{1}\{U \geq […]\} \right),$$

where $\zeta$, $U$ and $\psi$ are mutually independent, $\zeta \sim N(2, 0.1)$ and $U \sim U[0, 1]$. Here $E(Y \mid \psi) = \rho\psi$, and expectations are rational if and only if $\rho = 1$. But since we observe $Y$ and $\psi$ in two different datasets, there are values of $\rho \neq 1$ for which our test cannot reject the null hypothesis. More precisely, we can show that as the sample size $n$ grows to infinity, we reject the null if and only if $\rho \leq \rho^* \simeq 0.616$. The naive test of $E(Y) = E(\psi)$ always fails to reject RE, while the RE test based on variances is only able to detect a subset of violations of RE that correspond to $\rho <$ […].

The test involves the tuning parameters $b$, $\kappa$, $\epsilon$ and $\eta$ (see Section 3 for definitions). As mentioned in Section 3, we set $\epsilon = 0.05$ and $\eta = 10^{-6}$, following Andrews and Shi (2017). Andrews and Shi (2013) show that there exists in practice a large range of admissible values for the other tuning parameters. Regarding $b$ and $\kappa$, we follow Beare and Shi (2019, Section 4.2) and compute, for a grid of candidate parameters, the rejection rate under the null and under one alternative (namely, $\rho =$ […]). We then select $(b, \kappa)$ so as to maximize the power subject to the constraint that the rejection rate under the null is below the nominal size 0.05. That way, we obtain $b =$ […] and $\kappa = 0.001$.

The parameter $p$ has a distinct effect, in that its choice does not affect size, at least asymptotically. Rather, this parameter determines the extent to which the test aims power at the equality constraint $E(Y - \psi) = 0$ versus the inequalities $E[(y - Y)^+ - (y - \psi)^+] \geq 0$ ($y \in \mathbb{R}$). Setting $p$ to 0.05 leads to slightly higher power in our DGP, but values of $p$ in $[0, 0.31]$ provide similar finite sample performance, with power always greater than 90% of the maximal power.

Results reported in Figure 1 show the power curves of the test $\varphi_\alpha$ for five different sample sizes ($n_Y = n_\psi = n \in \{[…]\}$) as a function of the parameter $\rho$, using 800 simulations for each value of $\rho$. We use 500 bootstrap simulations to compute the critical values of the test.

Several remarks are in order. First, as expected, under the alternative (i.e. for values of $\rho \leq \rho^* \simeq 0.616$), rejection frequencies increase with the sample size $n$. In particular, for the largest sample size ($n$ = 3,[…]), […] $\rho$ as large as 0.45. Second, in this setting, our test is conservative in the sense that rejection frequencies under the null are smaller than $\alpha = 0.05$, for all sample sizes. This should not necessarily come as a surprise since the test proposed by AS has been shown to be conservative in alternative finite-sample settings (see, e.g., Table 1 p.22 in AS for the case of first-order stochastic dominance tests). However, for the version of our test that accounts for covariates and for the data generating process considered in Appendix E, rejection frequencies under the null are very close to the nominal level.

Notes: The vertical line at $\rho \simeq 0.616$ corresponds to the theoretical limit for the rejection of the null hypothesis using our test. The dotted horizontal line corresponds to the 5% level.

Figure 1: Power curves.
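The simulation design of this section can be mimicked with a few lines of Python. The sketch below draws from a DGP of the form $Y = \rho\psi + \varepsilon$ and evaluates, on a large sample, the three restrictions compared in the text: equality of means, the variance ordering, and the inequalities $E[(y - Y)^+ - (y - \psi)^+] \geq 0$. The cutoffs `lo` and `hi` for $U$, the reading of $N(2, 0.1)$ as mean 2 and variance 0.1, the value $\rho = 0.5$, and all function names are assumptions chosen for illustration, not values taken from the paper.

```python
import numpy as np

def simulate(rho, n, rng, lo=0.1, hi=0.9, zeta_var=0.1):
    """Draw (Y, psi) with Y = rho * psi + eps (illustrative parameter values)."""
    psi = rng.normal(size=n)
    zeta = rng.normal(loc=2.0, scale=np.sqrt(zeta_var), size=n)
    u = rng.uniform(size=n)
    # eps = zeta * (-1{U <= lo} + 1{U >= hi}); E(eps | psi) = 0 when lo = 1 - hi.
    eps = zeta * ((u >= hi).astype(float) - (u <= lo).astype(float))
    return rho * psi + eps, psi

def min_spread_gap(y, psi, n_grid=201):
    """Smallest value over a grid of t of mean((t - Y)^+) - mean((t - psi)^+)."""
    grid = np.linspace(min(y.min(), psi.min()), max(y.max(), psi.max()), n_grid)
    gaps = [np.maximum(t - y, 0.0).mean() - np.maximum(t - psi, 0.0).mean()
            for t in grid]
    return min(gaps)

rng = np.random.default_rng(1)
y_re, psi_re = simulate(rho=1.0, n=100_000, rng=rng)   # rational expectations
y_alt, psi_alt = simulate(rho=0.5, n=100_000, rng=rng)  # violation of RE

gap_re = min_spread_gap(y_re, psi_re)       # non-negative up to sampling noise
gap_alt = min_spread_gap(y_alt, psi_alt)    # negative: inequalities are violated
mean_diff_alt = y_alt.mean() - psi_alt.mean()  # near zero: means are uninformative
var_diff_alt = y_alt.var() - psi_alt.var()     # positive: variances are uninformative
```

Under these assumed values, the alternative $\rho = 0.5$ leaves the means equal and makes $Y$ more dispersed than $\psi$, so neither the mean comparison nor the variance comparison signals a violation, whereas the inequality at $y$ close to zero fails.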
Using the tests developed in Section 3, we now investigate whether household heads form rational expectations on their future earnings. We use for this purpose data from the Survey of Consumer Expectations (SCE), a monthly household survey that has been conducted by the Federal Reserve Bank of New York since 2012 (see Armantier, Topa, Van der Klaauw and Zafar, 2017, for a detailed description of the survey, and Kuchler and Zafar, 2019; Conlon, Pilossoph, Wiswall and Zafar, 2018; Fuster, Kaplan and Zafar, 2020 for recent articles using the SCE). The SCE is conducted with the primary goal of eliciting consumer expectations about inflation, household finance, the labor market, as well as the housing market. It is a rotating internet-based panel of about 1,200 household heads, in which respondents participate for up to twelve months. Each month, the panel consists of about 180 entrants, and 1,100 repeated respondents. While entrants are overall fairly similar to the repeated respondents, […]

(Footnote: Each survey takes on average about fifteen minutes to complete, and respondents are paid $[…])

[…] ($\psi$) over the next four months: “What do you believe your annual earnings will be in four months?”. Implicit throughout the rest of our analysis is the assumption that these elicited beliefs correspond to the mean of the subjective beliefs distribution. In this module, respondents are also asked about current job outcomes, including their current annual earnings ($Y$), through the following question: “How much do you make before taxes and other deductions at your [main/current] job, on an annual basis?”.

Specifically, we use for our baseline test the elicited earnings expectations ($\psi$), which are available for two cross-sectional samples of household heads who were working either full-time or part-time at the time of the survey, and responded to the labor market module in March 2015 and July 2015 respectively. We combine these data with current earnings ($Y$) declared in July 2015 and November 2015 by the respondents who are working full-time or part-time at the time of the survey. This leaves us with a final sample of 2,993 observations, which is composed of 1,565 earnings expectation observations, and 1,428 realized earnings observations. 51% (1,536) of these observations correspond to the sub-sample of respondents who are reinterviewed at least once. We refer to Table 1 for additional details on our sample.

(Footnote: This assumption, while often made in the subjective expectations literature, is a priori restrictive. In this application, for the vast majority of the sub-groups of the population, the mean of $\psi$ cannot be statistically distinguished from that of $Y$ (see Table 2 below). This provides empirical support for this assumption.)

(Footnote: Throughout our analysis (with the exception of the number of observations reported in Table 2) we use the monthly survey weights of the SCE in order to obtain an estimation sample that is representative of the population of U.S. household heads. See Armantier et al. (2017) for more details on the construction of these weights. We also Winsorize the top 5 percentile of the distributions of realized earnings and earnings beliefs.)
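A minimal sketch of this preparation step is given below: it Winsorizes the top 5 percent of each distribution and stacks the two unmatched samples into an indicator $D$ (1 for realizations, 0 for beliefs) and a single outcome vector. For simplicity it ignores the SCE survey weights, and the function names and the simulated inputs are purely illustrative assumptions.

```python
import numpy as np

def winsorize_top(x, q=0.95):
    """Cap values above the q-th (unweighted) sample quantile at that quantile."""
    x = np.asarray(x, dtype=float)
    return np.minimum(x, np.quantile(x, q))

def stack_samples(y_real, psi):
    """Return (D, Y_tilde): D = 1 for realized earnings, D = 0 for beliefs."""
    d = np.concatenate([np.ones(len(y_real), dtype=int),
                        np.zeros(len(psi), dtype=int)])
    return d, np.concatenate([y_real, psi])

rng = np.random.default_rng(0)
# Simulated placeholders with the sample sizes of the application (not SCE data).
earnings = rng.lognormal(mean=10.8, sigma=0.7, size=1428)
beliefs = rng.lognormal(mean=10.8, sigma=0.6, size=1565)

d, y_tilde = stack_samples(winsorize_top(earnings), winsorize_top(beliefs))
```

Winsorizing each sample separately before stacking keeps the cap specific to each distribution; a weighted quantile would be needed to reproduce the use of survey weights.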
Table 1

| | Mean | Std. dev. |
|---|---|---|
| Male | 0.53 | 0.50 |
| White | 0.74 | 0.43 |
| College degree | 0.49 | 0.46 |
| Low numeracy | 0.33 | 0.47 |
| Tenure ≤ 6 months | […] | […] |
| $\psi$ (Earnings beliefs) | $[…] | $[…] |
| $Y$ (Realized earnings) | $[…] | $[…] |

We summarize how we implemented the test in practice, either on the overall sample or on each subsample corresponding to the binary covariates in Table 1. For each case, we start by Winsorizing the distribution of realized earnings ($Y$) and earnings beliefs ($\psi$) at the 95% level. Then, we perform the test without covariates, where we allow for a multiplicative aggregate shock and thus test $H_{0S}$, with $q(y; c) = y/c$. To do so, we use the function test of our companion R package RationalExp. We choose the same values for the tuning parameters $b =$ […] and $\kappa = 0.001$ as in the Monte Carlo simulations in Section 4. We also set $p =$ […], $\epsilon = 0.05$, and $\eta = 10^{-6}$. Following Andrews and Shi (2017), the interval $\widehat{\mathcal{Y}}$ is approximated by a grid of length 100 from $\min_{i=1,\dots,n} \widetilde{Y}_i$ to $\max_{i=1,\dots,n} \widetilde{Y}_i$. Finally, we use 5,000 bootstrap simulations to compute the critical values of the test.

In Table 2 below, we report the results from the naive test of RE ($E(Y) = E(\psi)$), and our preferred test (“Full RE”), where we allow for multiplicative aggregate shocks. We implement the tests both on the overall population and on separate subgroups.

(Footnote: We show in Table 4 in Appendix F.1 that our results are robust to other levels of Winsorization.)

(Footnote: In our application, the parameter $c$ is estimated using survey weights from the SCE.)

(Footnote: See Section 3 in our user’s guide (D’Haultfœuille et al., 2018a) for details on this function.)

[…]

Third, the results from our test point to beliefs formation being heterogeneous across schooling (college degree vs. no college degree) and tenure (more or less than 6 months spent in current job) levels. In particular, we cannot rule out that the beliefs about future earnings of individuals with more schooling experience correspond to rational expectations with respect to some information set. Similarly, while we reject RE at any standard level for the subgroup of workers who have accumulated less than 6 months of experience in their current job, we can only marginally reject RE, at the 10% level, for those who have been in their current job for a longer period of time. As such, these findings complement some of the recent evidence from the economics of education and labor economics literatures that individuals have more accurate beliefs about their ability as they progress through their schooling and work careers (see, e.g., Stinebrickner and Stinebrickner, 2012; Arcidiacono, Aucejo, Maurel and Ransom, 2016).
(Footnote: Respondents’ numeracy is evaluated in the SCE through five questions involving computation of sales, interest on savings, chance of winning a lottery, of getting a disease and of being affected by a viral infection. Respondents are then partitioned into two categories: “High numeracy” (4 or 5 correct answers), and “low numeracy” (3 or fewer correct answers).)

Table 2

| | $E(Y - \psi)/E(Y)$ | Naive RE (p-val) | Variance RE (p-val) | Full RE (p-val) | Number of obs. $\psi$ | Number of obs. $Y$ |
|---|---|---|---|---|---|---|
| All | 0.034 | 0.23 | 0.71 | <0.001 | 1,565 | 1,428 |
| Women | 0.059 | 0.13 | 0.62 | <0.001 | 730 | 649 |
| Men | 0.025 | 0.48 | 0.58 | 0.210 | 835 | 779 |
| White | 0.032 | 0.31 | 0.67 | 0.021 | 1,200 | 1,097 |
| Minorities | 0.046 | 0.43 | 0.60 | <0.006 | 365 | 331 |
| College degree | -0.001 | 0.96 | 0.50 | 0.130 | 1,106 | 1,053 |
| No college degree | 0.093 | 0.04 | 0.57 | 0.013 | 459 | 375 |
| High numeracy | 0.033 | 0.28 | 0.62 | 0.012 | 1,158 | 1,070 |
| Low numeracy | 0.055 | 0.27 | 0.58 | 0.022 | 407 | 358 |
| Tenure ≤ 6 months | […] | […] | […] | <0.001 | 271 | 180 |
| Tenure > 6 months | […] | […] | […] | 0.091 | 1,294 | 1,248 |

Notes: “Naive RE” denotes the naive RE test of equality of means between $Y$ and $\psi$. “Variance RE” denotes the variance RE test where the null hypothesis is the variance of $Y$ being greater than or equal to the variance of $\psi$, once we account for aggregate, multiplicative shocks. “Full RE” denotes the test without covariates, where we test $H_{0S}$ with $q(y, c) = y/c$. We use 5,000 bootstrap simulations to compute the critical values of the Full RE test. Distributions of realized earnings ($Y$) and earnings beliefs ($\psi$) are both Winsorized at the 95% quantile.

Fourth, using the naive test of equality of means between earnings beliefs and realizations, one would instead generally not reject the null at any standard level. The one exception is the subgroup of workers without a college degree, for whom the naive test yields rejection of RE at the 5% level. But, as discussed before, one cannot rule out that such a rejection is due to aggregate shocks.

Even though individuals in the overall sample form expectations over their earnings in the near future that are realistic, in the sense of not being significantly biased, the result from our preferred test shows that earnings expectations are nonetheless not rational. Taken together, these findings highlight the importance of incorporating the additional restrictions of rational expectations that are embedded in our test, using the distributions of subjective beliefs and realized outcomes to detect violations of rational expectations. That the variance test of RE never rejects the null at any standard level indicates that it is important in practice to go beyond the first moments, and exploit instead the full distributions of beliefs and outcomes to detect departures from rational expectations.
These results also suggest that, in order to rationalize the realized and expected earnings data, one should consider alternative models of expectation formation that primarily differ from RE in their third, or higher-order moments.

The results of the direct test of RE on the subsample of individuals who are followed over four months are reported in Table 3 below. While these results generally paint a similar picture to the results of our test, there are some differences. In particular, the direct test rejects RE at the 5% level for men and at 1% for individuals with tenure greater than 6 months, whereas we do not reject RE for the former group and only marginally so, at the 10% level, for the latter. The direct test also rejects with less power than our test for certain groups (low numeracy, tenure lower than 6 months, and minorities). This lower power may seem surprising given that the direct test can exploit the joint distribution of $(Y, \psi)$, but is simply due to the important reduction in sample size when focusing on the subsample of individuals who are followed over four months.

There are also important issues associated with the direct test, which generally warrant caution when interpreting the results from this test. Most importantly, as already discussed in Section 2.2.4, the direct test is not robust to measurement errors on the subjective beliefs $\psi$. As shown in Proposition 7 in Appendix B, it is however possible to derive a restriction on $\beta$ under RE. Specifically, if $\xi_\psi$ is positively correlated with $\varepsilon + \xi_Y$, we have, under RE,

$$\beta \geq 1 - \frac{1}{1 + \lambda}, \qquad (3)$$

where $\lambda$ is a lower bound on the signal-to-noise ratio $V(\psi)/V(\xi_\psi)$. Table 3 also reports the results of tests combining (3) with the restrictions on the marginal distributions used in our full RE test. Adding the restriction (3) does not change the results for values of the signal-to-noise ratio between 5 and 20 (i.e., for noise-to-signal ratios between 5% and 20%). Overall, using the subsample of linked data $(Y, \psi)$ through this additional restriction does not add much to our test, at least once we account for possible measurement errors on the elicited beliefs. Another significant concern with the direct test, and, more generally, the use of linked data on $(Y, \psi)$, is that attrition may be endogenous. We discuss this issue in more detail in Appendix F.2.

Table 3: Direct test, our test, and combined test of RE on annual earnings

| | $\beta$ | Direct test | Full RE | Combined test ($\lambda$ = […]) | Combined test ($\lambda$ = […]) | Number of obs. $\psi$ | Number of obs. $Y$ | Number of obs. $(\psi, Y)$ |
|---|---|---|---|---|---|---|---|---|
| All | 0.954 | […] | <0.001 | […] | <0.001 | 1,565 | 1,428 | 768 |
| Women | 0.956 | […] | <0.001 | […] | <0.001 | 730 | 649 | 356 |
| Men | 0.960 | 0.021 | 0.210 | 0.276 | 0.276 | 835 | 779 | 412 |
| White | 0.963 | 0.004 | 0.021 | 0.019 | 0.010 | 1,200 | 1,097 | 596 |
| Minorities | 0.928 | 0.010 | 0.006 | 0.007 | 0.005 | 365 | 331 | 172 |
| College degree | 0.974 | 0.060 | 0.130 | 0.182 | 0.182 | 1,106 | 1,053 | 560 |
| No college degree | 0.954 | 0.044 | 0.013 | 0.017 | 0.017 | 459 | 375 | 208 |
| High numeracy | 0.959 | 0.001 | 0.012 | 0.016 | 0.016 | 1,158 | 1,070 | 573 |
| Low numeracy | 0.954 | 0.094 | 0.022 | 0.030 | 0.030 | 407 | 358 | 195 |
| Tenure ≤ 6 months | […] | 0.015 | 0.001 | 0.002 | 0.001 | 271 | 180 | 98 |
| Tenure > 6 months | […] | 0.001 | 0.091 | 0.094 | 0.094 | 1,294 | 1,248 | 670 |

Notes: “Direct test” denotes the direct test of RE when $(\psi, Y)$ is observed. $\beta$ is the coefficient of the regression of $Y$ on $\psi$ in that case. “Full RE” denotes the test without covariates, where we test $H_{0S}$ with $q(y, c) = y/c$. We use 5,000 bootstrap simulations to compute the critical values of the Full RE test. “Combined test” denotes the test without covariates, where we test $H_{0S}$ with $q(y, c) = y/c$, which is the “Full RE” test, combined with the additional restriction $\beta \geq 1 - 1/(1 + \lambda)$, where $\lambda$ is an a priori bound on the signal-to-noise ratio. Distributions of realized earnings ($Y$) and earnings beliefs ($\psi$) are both Winsorized at the 95% quantile.

Coming back to our test, the rejection of RE for the overall population but also for most of the subpopulations is, in view of Proposition 4, unlikely to be due to data quality issues. In that sense, these results may be seen as robust evidence against the RE hypothesis for individual earnings, at least in this context. As a result, conclusions of behavioral models based on the assumption that agents form rational expectations about their future earnings may be misleading. Exploring this important question requires one to go beyond testing though, by quantifying the extent to which model predictions are actually sensitive to the violations of rational expectations that have been detected with our test. We investigate this issue in D’Haultfœuille et al. (2018b) in the context of a life-cycle consumption model.

Conclusion
In this paper, we develop a new test of rational expectations that can be used in a broad range of empirical settings. In particular, our test only requires having access to the marginal distributions of realizations and subjective beliefs. As such, it can be applied in frequent cases where realizations and beliefs are observed in two separate datasets, or only observed for a selected sub-population. By bypassing the need to link beliefs to future realizations, our approach also makes it possible to test for rational expectations without having to wait until the outcomes of interest are realized and made available to researchers. We establish that whether one can rationalize rational expectations is equivalent to the distribution of realizations being a mean-preserving spread of the distribution of beliefs, a condition which can be tested using recent tools from the moment inequalities literature. We show that our test can easily accommodate covariates and aggregate shocks, and, importantly for practical purposes, is robust to some degree of measurement errors on the elicited beliefs. We apply our method to test for rational expectations about future earnings, using data from the Survey of Consumer Expectations. While individuals tend to be right on average about their future earnings, our test strongly rejects rational expectations.

Beyond testing, in this application as in any other situation where rational expectations are violated, a natural next step is to evaluate the deviations from rational expectations that one can rationalize from the available data. In the context of structural analysis, a central question then becomes to what extent the main predictions of the model are sensitive to those departures from rational expectations. We explore this important issue and propose in D’Haultfœuille et al. (2018b) a tractable sensitivity analysis framework on the assumed form of expectations.

References

Aguirregabiria, V. and Mira, P. (2010), ‘Dynamic discrete choice structural models: A survey’, Journal of Econometrics, 38–67.

Andrews, D. (1994), ‘Empirical process methods in econometrics’, Handbook of Econometrics, 2247–2294.

Andrews, D. and Shi, X. (2013), ‘Inference based on conditional moment inequalities’, Econometrica (2), 609–666.

Andrews, D. and Shi, X. (2017), ‘Inference based on many conditional moment inequalities’, Journal of Econometrics (2), 275–287.

Arcidiacono, P., Aucejo, E., Maurel, A. and Ransom, T. (2016), College attrition and the dynamics of information revelation. NBER Working Paper No. 22325.

Arcidiacono, P., Hotz, J. and Kang, S. (2012), ‘Modeling college major choices using elicited measures of expectations and counterfactuals’, Journal of Econometrics (1), 3–16.

Arcidiacono, P., Hotz, J. V., Maurel, A. and Romano, T. (2014), Recovering ex ante returns and preferences for occupations using subjective expectations data. NBER Working Paper No. 20626.

Armantier, O., Topa, G., Van der Klaauw, W. and Zafar, B. (2017), ‘An overview of the survey of consumer expectations’, Economic Policy Review (2), 51–72.

Beare, B. and Shi, X. (2019), ‘An improved bootstrap test of density ratio ordering’, Econometrics and Statistics, 9–26.

Bertanha, M. and Moreira, M. J. (2020), ‘Impossible inference in econometrics: Theory and applications’, Journal of Econometrics, 247–270.

Biroli, P., Boneva, T., Raja, A. and Rauh, C. (2020), ‘Parental beliefs about returns to child health investments’, Journal of Econometrics, Forthcoming.

Blundell, R. (2017), ‘What have we learned from structural models?’, American Economic Review: Papers and Proceedings (5), 287–292.

Boneva, T. and Rauh, C. (2018), ‘Parental beliefs about returns to educational investments - the later the better?’, Journal of the European Economic Association (6), 1669–1711.

Bound, J., Brown, C. and Mathiowetz, N. (2001), ‘Measurement error in survey data’, Handbook of Econometrics, 3705–3843.

Buchinsky, M., Li, F. and Liao, Z. (2019), ‘Estimation and inference of semiparametric models using data from several sources’, Journal of Econometrics, Forthcoming.

Chen, X., Linton, O. and Van Keilegom, I. (2003), ‘Estimation of semiparametric models when the criterion function is not smooth’, Econometrica (5), 1591–1608.

Conlon, J. J., Pilossoph, L., Wiswall, M. and Zafar, B. (2018), Labor market search with imperfect information and learning. NBER Working Paper No. 24988.

Cross, P. J. and Manski, C. F. (2002), ‘Regressions, short and long’, Econometrica (1), 357–368.

Cunha, F. and Heckman, J. J. (2007), ‘Identifying and estimating the distributions of ex post and ex ante returns to schooling’, Labour Economics (6), 870–93.

Davydov, Y. A., Lifshits, M. A. and Smorodina, N. V. (1998), Local properties of distributions of stochastic functionals, American Mathematical Society.

de Paula, A., Shapira, G. and Todd, P. E. (2014), ‘How beliefs about HIV status affect risky behaviors: Evidence from Malawi’, Journal of Applied Econometrics (6), 944–964.

Delavande, A. (2008), ‘Pill, patch, or shot? Subjective expectations and birth control choice’, International Economic Review (3), 999–1042.

D’Haultfœuille, X., Gaillac, C. and Maurel, A. (2018a), Rationalexp: Tests of and deviations from rational expectations. Working paper.

D’Haultfœuille, X., Gaillac, C. and Maurel, A. (2018b), Rationalizing rational expectations? Tests and deviations. NBER Working Paper No. 25274.

Fan, Y., Sherman, R. and Shum, M. (2014), ‘Identifying treatment effects under data combination’, Econometrica (2), 811–822.

Fuster, A., Kaplan, G. and Zafar, B. (2020), ‘What would you do with $[…]’, Review of Economic Studies, Forthcoming.

Gennaioli, N., Ma, Y. and Shleifer, A. (2016), ‘Expectations and investment’, NBER Macroeconomics Annual (1), 379–431.

Gourieroux, C. and Pradel, J. (1986), ‘Direct test of the rational expectation hypothesis’, European Economic Review (2), 265–284.

Gozlan, N., Roberto, C., Samson, P.-M., Shu, Y. and Tetali, P. (2018), ‘Characterization of a class of weak transport-entropy inequalities on the line’, Annales de l’IHP (3), 1667–1693.

Gutknecht, D., Hoderlein, S. and Peters, M. (2018), ‘Constrained information processing and individual income expectations’. Working paper.

Hoffman, M. and Burks, S. V. (2020), ‘Worker overconfidence: Field evidence and implications for employee turnover and returns from training’, Quantitative Economics, 315–348.

Hsu, Y.-C. (2016), ‘Consistent tests for conditional treatment effects’, The Econometrics Journal (1), 1–22.

Ivaldi, M. (1992), ‘Survey evidence on the rationality of expectations’, Journal of Applied Econometrics (3), 225–241.

Kuchler, T. and Zafar, B. (2019), ‘Personal experiences and expectations about aggregate outcomes’, Journal of Finance, 2491–2542.

Linton, O., Song, K. and Whang, Y.-J. (2010), ‘An improved bootstrap test of stochastic dominance’, Journal of Econometrics, 186–202.

Lovell, M. C. (1986), ‘Tests of the rational expectations hypothesis’, American Economic Review (1), 110–124.

Manski, C. (2004), ‘Measuring expectations’, Econometrica (5), 1329–1376.

Manski, C. and Molinari, F. (2010), ‘Rounding probabilistic expectations in surveys’, Journal of Business & Economic Statistics (2), 219–231.

Molinari, F. and Peski, M. (2006), ‘Generalization of a result on “regressions, short and long”’, Econometric Theory (1), 159–163.

Mulansky, B. and Neamtu, M. (1998), ‘Interpolation and approximation from convex sets’, Journal of Approximation Theory (1), 82–100.

Muth, J. F. (1961), ‘Rational expectations and the theory of price movements’, Econometrica (3), 315–335.

Patton, A. J. and Timmermann, A. (2012), ‘Forecast rationality tests based on multi-horizon bounds’, Journal of Business & Economic Statistics (1), 1–17.

Pesaran, M. H. (1987), The limits to rational expectations, Basil Blackwell.

Pollard, D. (1990), Empirical processes: theory and applications, in ‘NSF-CBMS regional conference series in probability and statistics’, Institute of Mathematical Statistics and the American Statistical Association, pp. i–86.

Ridder, G. and Moffitt, R. (2007), ‘The econometrics of data combination’, Handbook of Econometrics, 5469–5547.

Rothschild, M. and Stiglitz, J. (1970), ‘Increasing risk: I. A definition’, Journal of Economic Theory (3), 225–243.

Stinebrickner, R. and Stinebrickner, T. (2012), ‘Learning about academic ability and the college dropout decision’, Journal of Labor Economics (4), 707–748.

Stinebrickner, R. and Stinebrickner, T. (2014a), ‘Academic performance and college dropout: Using longitudinal expectations data to estimate a learning model’, Journal of Labor Economics (3), 601–644.

Stinebrickner, R. and Stinebrickner, T. (2014b), ‘A major in science? Initial beliefs and final outcomes for college major and dropout’, The Review of Economic Studies (1), 426–472.

Strassen, V. (1965), ‘The existence of probability measures with given marginals’, The Annals of Mathematical Statistics (2), 423–439.

Van der Klaauw, W.
(2012), ‘On the use of expectations data in estimating structuraldynamic choice models’, Journal of Labor Economics (3), 521–554.Van der Klaauw, W. and Wolpin, K. I. (2008), ‘Social security and the retirementand savings behavior of low-income households’, Journal of Econometrics (1-2), 21–42.Van der Vaart, A. (2000),
Asymptotic statistics , Cambridge University Press.Van der Vaart, A. and Wellner, J. (1996), Weak convergence, in ‘Weak Convergenceand Empirical Processes’, Springer, pp. 16–28.Wiswall, M. and Zafar, B. (2015), ‘Determinants of college major choice: Identifica-tion using an information experiment’, The Review of Economic Studies (2), 791–824.Zafar, B. (2011 a ), ‘Can subjective expectations data be used in choice models? evi-dence on cognitive biases’, Journal of Applied Econometrics (3), 520–544.Zafar, B. (2011 b ), ‘How do college students form expectations?’, Journal of LaborEconomics (2), 301–348. 34 Aggregate shocks
A.1 Statistical tests in the presence of aggregate shocks
In this appendix, we show how to adapt the construction of the test statistic and obtain results similar to those of Theorem 2 in the presence of aggregate shocks. As explained in Section 2.2.3, we mostly have to replace $\widetilde{Y}$ by $\widetilde{Y}_c = D\, q(\widetilde{Y}, c) + (1-D)\psi$. Because we include covariates here, as in Section 3, $c$ is actually a function of $X$. Also, the true function $c$ has to be estimated. We let $\widehat{c}$ denote such a nonparametric estimator, which is based on $E[q(Y, c(X)) \mid X] = E[\psi \mid X]$. When $q(y,c) = y - c$ or $q(y,c) = y/c$, we get respectively $c(X) = E(Y \mid X) - E(\psi \mid X)$ and $c(X) = E(Y \mid X)/E(\psi \mid X)$, and $\widehat{c}$ is easy to compute using nonparametric estimators of $E(Y \mid X)$ and $E(\psi \mid X)$.

Because in Proposition 3 (ii) we do not test for a moment equality anymore, $m(D_i, \widetilde{Y}_i, X_i, g, y)$ reduces to $m(D_i, \widetilde{Y}_{c,i}, X_i, g, y)$. We let hereafter $\overline{m}_n(g,y) = \sum_{i=1}^n m(D_i, \widetilde{Y}_{c,i}, X_i, g, y)/n$. In the test statistic $T$, we replace, for $(y,g) \in \mathcal{Y} \times \cup_{r \geq 1} \mathcal{G}_r$, $\overline{\Sigma}_n(g,y)$ by

$$\overline{\Sigma}_n(g,y) = \widehat{\Sigma}_n(g,y) + \epsilon\, \mathrm{Diag}\left(\widehat{V}(\widetilde{Y}_{\widehat{c}}), \widehat{V}(\widetilde{Y}_{\widehat{c}})\right),$$

where $\widehat{\Sigma}_n(g,y)$ and $\widehat{V}(\widetilde{Y}_{\widehat{c}})$ are respectively the sample covariance matrix of $\sqrt{n}\,\overline{m}_n(g,y)$ and the empirical variance of $\widetilde{Y}_{\widehat{c}}$. The last difference with the test considered in Section 3 is that when using the bootstrap to compute the critical value, we also have to re-estimate $c$ in the bootstrap sample.

We obtain in this context a result similar to Theorem 2 above, under the regularity conditions stated in Assumption 4.
We let hereafter $C^s([0,1]^{d_X})$ denote the space of continuously differentiable functions of order $s$ on $[0,1]^{d_X}$ that have a finite norm $\|c\|_{s,\infty} = \max_{|k| \leq s} \sup_{x \in [0,1]^{d_X}} |c^{(k)}(x)|$. We also let, for any function $f$ on a set $\mathcal{G}$, $\|f\|_{\mathcal{G}} = \sup_{x \in \mathcal{G}} |f(x)|$. Finally, when the distribution of $(D, \widetilde{Y}, X)$ is $F$, $K_F$ denotes the asymptotic covariance kernel of $n^{-1/2}\, \mathrm{Diag}(V(\widetilde{Y}_c))^{-1/2}\, m$.

Assumption 4
(i) $\widehat{c}$ and $c$ belong to $C^s([0,1]^{d_X})$, with $s \geq d_X$. Moreover, $\|\widehat{c} - c\|_{[0,1]^{d_X}} = o_P(1)$;
(ii) For all $y \in \mathcal{Y}$, $q$ is Lipschitz on $\mathcal{Y} \times [-C, C]$ for some $C > \|c\|_{[0,1]^{d_X}}$. Moreover, $\sup_{(y,c) \in \mathcal{Y} \times [-C,C]} |q(y,c)| \leq M$;
(iii) For all $c \in \mathbb{R}$, the function $q(\cdot, c) : \mathcal{Y} \to \mathcal{Y}$ is bijective and its inverse $q^I(\cdot, c)$ is Lipschitz on $\mathcal{Y}$;
(iv) $F_{\psi|X}(\cdot|x)$ and $F_{Y|X}(\cdot|x)$ are Lipschitz on $\mathcal{Y}$ uniformly in $x \in [0,1]^{d_X}$, with constants $Q_{F,1}$ satisfying $\sup_{F \in \mathcal{F}} Q_{F,1} \leq Q_1 < \infty$. Also, $F_{q(\psi, c(X))}$ and $F_{q(Y, c(X))}$ are Lipschitz on $[-M, M]$ with constants $Q_{F,2}$ satisfying $\sup_{F \in \mathcal{F}} Q_{F,2} \leq Q_2 < \infty$;
(v) $\inf_{F \in \mathcal{F}} V_F[\widetilde{Y}_c] > 0$ and $\epsilon \leq \inf_{F \in \mathcal{F}} E_F[D] \leq \sup_{F \in \mathcal{F}} E_F[D] \leq 1 - \epsilon$ for some $\epsilon \in (0, 1/2)$. Also, $\widehat{V}_F[\widetilde{Y}_{\widehat{c}}]$ is a consistent estimator of $V_F[\widetilde{Y}_c]$.

Part (i) imposes some regularity conditions on $c$ and its nonparametric estimator $\widehat{c}$. It is possible to check such regularity conditions on $\widehat{c}$ with kernel or series estimators of $E(Y|X)$ and $E(\psi|X)$. Parts (ii) and (iii) also hold when $q(y,c) = y - c$ and $q(y,c) = y/c$, by imposing in the second case that $c$ belongs to a compact subset of $(0, \infty)$. Proposition 5 shows that under these conditions, the test has asymptotically correct size.

Proposition 5
Suppose that $r_n \to \infty$ and that Assumptions 3 and 4 hold. Then (i) in Theorem 2 holds, replacing $\varphi_{n,\alpha}$ by $\varphi_{n,\alpha,\widehat{c}}$.

Results like (ii) and (iii) in Theorem 2 could also be obtained under the conditions of Proposition 5, by directly modifying the proof of Theorem 2.
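The estimation of $c$ and its re-estimation in each bootstrap sample can be sketched as follows for the additive case $q(y,c) = y - c$. This is only an illustration under simplifying assumptions that are not in the paper: the covariate is discrete, so that cell means serve as the nonparametric estimators of $E(Y|X)$ and $E(\psi|X)$, and the function names (`estimate_c_additive`, `transform_outcome`, `bootstrap_transformed`) are hypothetical.

```python
import numpy as np

def estimate_c_additive(y_tilde, d, x):
    """Cell-mean estimate of c(x) = E[Y|X=x] - E[psi|X=x] for q(y, c) = y - c.

    y_tilde holds Y when d == 1 and psi when d == 0; x is a discrete covariate
    (an assumption of this sketch, not of the paper).
    """
    c_hat = {}
    for cell in np.unique(x):
        in_cell = x == cell
        mean_y = y_tilde[in_cell & (d == 1)].mean()
        mean_psi = y_tilde[in_cell & (d == 0)].mean()
        c_hat[cell] = mean_y - mean_psi
    return c_hat

def transform_outcome(y_tilde, d, x, c_hat):
    """Build D * q(Y_tilde, c_hat(X)) + (1 - D) * psi with q(y, c) = y - c."""
    c_x = np.array([c_hat[cell] for cell in x])
    return np.where(d == 1, y_tilde - c_x, y_tilde)

def bootstrap_transformed(y_tilde, d, x, rng):
    """Draw one bootstrap sample and re-estimate c inside it before transforming."""
    n = len(y_tilde)
    idx = rng.integers(0, n, size=n)
    yb, db, xb = y_tilde[idx], d[idx], x[idx]
    c_b = estimate_c_additive(yb, db, xb)  # c is re-estimated, not reused
    return transform_outcome(yb, db, xb, c_b), db, xb
```

By construction, the transformed realizations and the beliefs have the same mean within each covariate cell, which is why the moment equality no longer needs to be tested. With a continuous $X$, the cell means would be replaced by kernel or series estimators.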
A.2 Impossibility results with more flexible effects of aggregate shocks
We show here that restrictions in the way aggregate shocks affect the outcome are needed to be able to reject RE with $F_Y$ and $F_\psi$. We consider for that purpose the following model:

$$Y = \sum_{k=0}^{K} C_k V^k + \varepsilon, \qquad (4)$$

where $V$ is $\mathcal{I}$-measurable and the individual shock $\varepsilon$ satisfies $E[\varepsilon|\mathcal{I}] = 0$. The vector $C := (C_0, ..., C_K)'$ represents aggregate shocks, which are assumed to be independent of $\mathcal{I}$, with support $\mathbb{R}^{K+1}$. We also assume that $E(C) = (0, 1, 0, ..., 0)'$, so that $V = E[Y|\mathcal{I}]$ and under RE, $\psi = V$. Let $Q_c(y) = \sum_{k=0}^{K} c_k y^k$. Then $E(Y|C = c, \mathcal{I}) = Q_c(V)$ and under RE, we have $E(Y|C = c, \mathcal{I}) = Q_c(\psi)$.

$\mathrm{H}_{0SK}$: there exist random variables $(Y', \psi')$, a sigma-algebra $\mathcal{I}'$ and $c \in \mathbb{R}^{K+1}$ such that $\sigma(\psi') \subset \mathcal{I}'$, $Y' \sim Y$, $\psi' \sim \psi$ and $E[Y'|\mathcal{I}'] = Q_c(\psi')$.

The following proposition is a negative result on the possibility to test for $\mathrm{H}_{0SK}$.

Proposition 6
Suppose that $F_Y$ and $F_\psi$ are continuous with supports that are bounded intervals. For any $\eta > 0$, there exist $K > 0$ and $F$, with $\sup_{u \in \mathbb{R}} |F(u) - F_\psi(u)| < \eta$, such that $\mathrm{H}_{0SK}$ holds with $Y$ and $\widetilde{\psi} \sim F$ (instead of $\psi$).

Proposition 6 states that as $K$ grows large, the set of cdfs $F_Y$ and $F_\psi$ satisfying $\mathrm{H}_{0SK}$ (and thus RE in Model (4)) becomes arbitrarily close, for the Kolmogorov-Smirnov metric, to the set of cdfs $F_Y$ and $F_\psi$ that do not satisfy $\mathrm{H}_{0SK}$. In other words, $\cup_{K \in \mathbb{N}} \mathrm{H}_{0SK}$ is dense in the set of all continuous cdfs having bounded intervals as supports. When combined with Theorem 2 in Bertanha and Moreira (2020), this implies that there does not exist any almost-surely continuous test of $\cup_{K \in \mathbb{N}} \mathrm{H}_{0SK}$ that has non-trivial power.

A similar, negative result holds if aggregate shocks are allowed to vary with respect to unobserved, individual-specific variables. For instance, shocks may be sector-specific, but sectors may be unobserved in the data. To show such an impossibility result, consider the following model:

$$Y = q(C, U) + V + \varepsilon,$$

where both $U$ and $V$ are $\mathcal{I}$-measurable, $C$ is an aggregate shock independent of $\mathcal{I}$ and the individual shock $\varepsilon$ satisfies $E[\varepsilon|\mathcal{I}] = 0$. Thus, aggregate shocks affect the outcome in an additive way, but heterogeneously across individuals, depending on their $U$, which is assumed to be unobserved by the econometrician and can thus depend on $V$ in a flexible way. We assume without loss of generality that $E[q(C, U)|\mathcal{I}] = 0$, so that $\psi = V$ under RE. Let us also assume that $q(c, u) = \sum_{k=0}^{K} c_k u^k$ and $U = \xi V$, with $\xi > 0$, $\xi \perp\!\!\!\perp V$ and $E[\xi^k] < \infty$ for all $k \leq K$. Let $C'_k = E[\xi^k] C_k$ if $k \neq 1$, $C'_1 = E[\xi] C_1 + 1$ and $C' = (C'_0, ..., C'_K)'$. Then, under RE,

$$E[Y|C' = c', \mathcal{I}] = \sum_{k=0}^{K} c'_k \psi^k.$$

Since $\mathrm{Supp}(C) = \mathbb{R}^{K+1}$, we also have $\mathrm{Supp}(C') = \mathbb{R}^{K+1}$, and no constraint is imposed on $c'$.
As a result, we are led again to test $\mathrm{H}_{0SK}$, and the same negative result as above holds.

B Tests based on linear regressions with measurement errors

We suppose here that we observe $(\widehat{Y}, \widehat{\psi})$ satisfying (1). In this framework, we study the restrictions that RE entail on the coefficient $\beta$ of the (theoretical) linear regression of $\widehat{Y}$ on $\widehat{\psi}$.

Proposition 7
1. For any values of $(V(\widehat{Y}), V(\widehat{\psi}), \mathrm{Cov}(\widehat{Y}, \widehat{\psi}))$ such that $V(\widehat{Y}) > V(\widehat{\psi})$, there exists a DGP compatible with this triple, satisfying (1), for which RE hold and such that $\varepsilon + \xi_Y \perp\!\!\!\perp \psi$ and $F_{\xi_\psi}$ dominates at the second order $F_{\xi_Y + \varepsilon}$.
2. If $\beta < 1 - 1/(1 + \lambda)$ for some $\lambda \geq 0$, there exists no DGP compatible with this value of $\beta$, satisfying (1), for which RE hold and such that $\mathrm{corr}(\xi_\psi, \xi_Y + \varepsilon) \geq 0$ and $V(\psi)/V(\xi_\psi) \geq \lambda$.

The first result is a negative one. It implies that without further restrictions than those already imposed in Proposition 4, the regression of $\widehat{Y}$ on $\widehat{\psi}$ does not bring any additional restriction related to RE. The second result, on the other hand, shows that if one assumes a positive correlation between $\xi_\psi$ and $\xi_Y + \varepsilon$ and a lower bound on the signal-to-noise ratio $V(\psi)/V(\xi_\psi)$, then $\beta$ is bounded from below under RE. Regarding the restriction $\mathrm{corr}(\xi_\psi, \xi_Y + \varepsilon) \geq 0$: since $\varepsilon$ cannot be anticipated, it is natural to assume that $\mathrm{corr}(\xi_\psi, \varepsilon) = 0$. It then follows that the assumption $\mathrm{corr}(\xi_\psi, \xi_Y + \varepsilon) \geq 0$ holds when the measurement errors on $Y$ and $\psi$ are positively correlated. This would typically happen, for instance, if individuals report their expectations and realized earnings omitting in both cases some components of their earnings, or if they instead overstate their realized earnings, and their expectations accordingly.

Footnote: $E[q(C, U)|\mathcal{I}] = 0$ implies that $E[C_k] = 0$ for $k = 0, ..., K$, but it does not restrict the set of possible $c'_k$.

Footnote: We focus on the regression of $\widehat{Y}$ on $\widehat{\psi}$, since this regression has been very often used to test for RE. This means, however, that there may in principle be additional restrictions on the joint distribution of $(\widehat{Y}, \widehat{\psi})$ implied by RE.

C Tests with rounding practices
We have considered in Section 2.2.4 the possibility of measurement errors on $\psi$. Another source of uncertainty on $\psi$ is rounding. Rounding practices by interviewees are common. A way to interpret these practices is that in situations of ambiguity, individuals may only be able to bound the distribution of their future outcome $Y$ (Manski, 2004). If individuals round at 5% levels, for instance, a reported value of $\psi$ should be interpreted as an interval around that value rather than as a point. Another case where only bounds on $\psi$ are observed is when questions to elicit subjective expectations take the following form: “What do you think is the percent chance that your own [$Y$] will be below [$y$]?”, for a certain grid of $y$. If 0 and 100 are always observed, or if we assume that the support of subjective distributions is included in $[\underline{y}, \overline{y}]$, we can still compute bounds on $\psi$. In such cases, we only observe $(\psi_L, \psi_U)$, with $\psi_L \leq \psi \leq \psi_U$. For a thorough discussion of this issue, and especially of how to infer rounding practices, see Manski and Molinari (2010).

In this setting, rationalizing rational expectations is less stringent than in our baseline set-up since the constraints on the distribution of $\psi$ are weaker. Formally, the null hypothesis takes the following form:

$\mathrm{H}_{0B}$: $\exists (Y', \psi', \mathcal{I}')$: $\sigma(\psi') \subset \mathcal{I}'$, $Y' \sim Y$, $F_{\psi_U} \leq F_{\psi'} \leq F_{\psi_L}$ and $E(Y'|\mathcal{I}') = \psi'$.

To obtain an equivalent formulation to $\mathrm{H}_{0B}$, a natural idea would be to fix a candidate cdf $F \in [F_{\psi_U}, F_{\psi_L}]$ for $F_\psi$ and apply Theorem 1 with this $F$. Then, letting $\Delta_F(y) = \int_{-\infty}^{y} [F_Y(t) - F(t)]\, dt$ and $\delta_F = E(Y) - \int u\, dF(u)$, $\mathrm{H}_{0B}$ would hold as long as for some $F \in [F_{\psi_U}, F_{\psi_L}]$, $\Delta_F(y) \geq 0$ for all $y \in \mathbb{R}$ and $\delta_F = 0$. In practice though, directly checking whether such a distribution exists would be very difficult.
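For a single candidate cdf $F$, the two conditions $\Delta_F(y) \geq 0$ and $\delta_F = 0$ are straightforward to evaluate; the difficulty lies in searching over all admissible $F$. The sketch below evaluates them on empirical distributions, using the identity $\int_{-\infty}^{y} F_Z(t)\, dt = E[(y - Z)^+]$ from the proof of Theorem 1. It assumes that the candidate $F$ is represented by a sample of beliefs, and the function names are hypothetical. It checks the population conditions on empirical cdfs and is not the statistical test itself.

```python
import numpy as np

def mean_positive_part(sample, grid):
    """Sample analogue of E[(y - Z)^+] for each y in grid."""
    sample = np.asarray(sample, dtype=float)
    grid = np.asarray(grid, dtype=float)
    return np.maximum(grid[:, None] - sample[None, :], 0.0).mean(axis=1)

def delta_curve(y_sample, belief_sample, grid):
    """Delta_F(y) = E[(y - Y)^+] - E[(y - psi_F)^+] on a grid of y values."""
    return mean_positive_part(y_sample, grid) - mean_positive_part(belief_sample, grid)

def mean_gap(y_sample, belief_sample):
    """delta_F = E[Y] - E[psi_F]."""
    return float(np.mean(y_sample) - np.mean(belief_sample))

def conditions_hold(y_sample, belief_sample, grid, tol=1e-12):
    """True if Delta_F >= 0 on the grid and delta_F = 0, up to a tolerance."""
    delta = delta_curve(y_sample, belief_sample, grid)
    return bool(delta.min() >= -tol and abs(mean_gap(y_sample, belief_sample)) <= tol)
```

For instance, if each belief in `[1, 2, 3]` is followed by a realization equal to the belief plus or minus one with equal frequency, the realizations are a mean-preserving spread of the beliefs and `conditions_hold` returns `True`; swapping the two samples makes it return `False`.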
Fortunately, we show in the following proposition that it is in fact sufficient to check that these conditions hold for a single, well-chosen candidate distribution. For any $b \in \mathbb{R}$, consider the random variables $\psi^b = \psi_U 1\{\psi_U < b\} + \max(b, \psi_L) 1\{\psi_U \geq b\}$. We also let $\psi^{-\infty} = \psi_L$ and $\psi^{\infty} = \psi_U$. The cdf of $\psi^b$ is then

$$F^b(t) = F_{\psi_U}(t) 1\{t < b\} + F_{\psi_L}(t) 1\{t \geq b\}.$$

Footnote: Note however that in this case, our approach does not take into account all the information on the subjective distribution.
Proposition 8
Suppose that Assumption 5 holds. First, if $E[\psi_L] \leq E[Y] \leq E[\psi_U]$, there exists a unique $F^* \in \mathcal{F}_B$ such that $\delta_{F^*} = 0$. Second, the following statements are equivalent:
(i) $\mathrm{H}_{0B}$ holds.
(ii) $E[\psi_L] \leq E[Y] \leq E[\psi_U]$ and $\Delta_{F^*}(y) \geq 0$ for all $y \in \mathbb{R}$.

This test shares some similarities with the test in the presence of aggregate shocks. Specifically, if $E[\psi_L] \leq E[Y] \leq E[\psi_U]$, we first identify $b \in \mathbb{R}$ such that the candidate belief $\psi^b$, which plays a similar role as the modified outcome $q(Y, c)$ in the test with aggregate shocks, satisfies the equality constraint $E[\psi^b] = E[Y]$. Noting that the inequality $\Delta_{F^*}(y) \geq 0$ can be written as $E[(y - Y)^+ - (y - \psi^b)^+] \geq 0$, it follows from (ii) that rationalizing RE in this context (i.e., $\mathrm{H}_{0B}$) is then equivalent to a set of many moment inequality constraints involving the distributions of realizations $Y$ and candidate belief $\psi^b$.

D Tests with sample selection in the datasets
We consider here cases where the two samples are not representative of the same population, or formally, $D$ is not independent of $(Y, \psi)$. This may arise for instance because of oversampling of some subpopulations or differences in nonresponse between the two surveys that are used. We assume instead that selection is conditionally exogenous, that is to say:

$$D \perp\!\!\!\perp (Y, \psi) \mid X. \qquad (5)$$

We show how to use propensity score weighting to handle such a selection. Denote by $p(x) = P(D = 1|X = x) = E[D|X = x]$ the propensity score and by

$$W(X) = \frac{D}{p(X)} - \frac{1 - D}{1 - p(X)}.$$

The law of iterated expectations combined with Proposition 2 directly yields the following proposition:
Proposition 9
Suppose that (5) and Assumption 1 hold. Then $\mathrm{H}_{0X}$ is equivalent to

$$E\left[W(X) (y - \widetilde{Y})^+ \,\middle|\, X\right] \geq 0 \text{ for all } y \in \mathbb{R} \quad \text{and} \quad E\left[W(X) \widetilde{Y} \,\middle|\, X\right] = 0.$$

This proposition shows that under sample selection, we can build a statistical test of $\mathrm{H}_{0X}$ akin to that developed in Section 3, by merely estimating $p(X)$ nonparametrically. We could consider for that purpose a series logit estimator, for instance. Validity of such a test would follow using very similar arguments as for the test with aggregate shocks considered above.

E Simulations with covariates
We consider here simulations including covariates. The DGP is similar to that con-sidered in Section 4. Specifically, we assume that Y = ρψ + √ Xε , with ρ ∈ [0 , ψ ∼ N (0 , X ∼ Beta(0 . ,
10) and ε = ζ ( − { U ≤ . } + 1l { U ≥ . } ) , where ζ ∼ N (2 , .
1) and U ∼ U [0 , ψ, ζ, U, X ) are supposed to be mutuallyindependent. Like in the test without covariates, we can show that the test withcovariates is able to reject RE if and only if ρ < . E [ Y | X ] = E [ ψ | X ], so the naive conditional test has no power. The test based on conditionalvariances rejects only if ρ < . X , ourtest has power only for ρ < .
52. Hence, relying on covariates allows us to gain powerfor ρ ∈ [0 . , . n ψ = n Y = n ∈ { , , , } , use 500 bootstrapsimulations to compute the critical value, and rely on 800 Monte-Carlo replicationsfor each value of ρ and n . We use the same parameters p = 0 .
05 and b = 0.

Figure 2: Power curves for the test with covariates.
Notes: the dotted vertical lines correspond to the theoretical limits for the rejection of the null hypothesis, in particular for the test based on variance.

Figure 2 shows that the RE test with covariates asymptotically outperforms the RE test without covariates. The test exhibits a similar behavior as that without covariates, though, as we could expect, the power converges less quickly to one as $n$ tends to infinity.

F Additional material on the application
F.1 Effect of the Winsorization on the RE test
Table 4: Full test of RE with different levels of Winsorization
Winsorization level    0.95         0.97         0.99
                       (p-value)    (p-value)    (p-value)
All < < < < ≤ >
Notes: test of $\mathrm{H}_{0S}$ with $q(y, c) = y/c$, using 5,000 bootstrap simulations to compute the critical values. Distributions of realized earnings ($Y$) and earnings beliefs ($\psi$) are both Winsorized at either the 0.95, 0.97, or 0.99 quantile.

F.2 Possibly endogenous attrition in the survey
In addition to measurement errors, another potential issue when using the linked data $(Y, \psi)$ is that attrition may be related to $Y$ itself. This would create a sample selection issue that would invalidate the direct test, even absent any measurement errors. To explore this possibility, Table 5 below reports the estimation results from a logit model of attrition on earnings beliefs, gender, race/ethnicity, college degree attainment, numeracy test score, tenure and a (linear) time trend. The main takeaway from this table is that earnings beliefs $\psi$ are significantly associated with attrition, even after controlling for this extensive set of characteristics. This result suggests that individuals for whom we observe both earnings expectations and realizations are likely to earn more than those who are not followed across the two waves. Along the same lines, a Kolmogorov-Smirnov test rejects at the 1% level the equality of the distributions of realized earnings between the whole sample and the subsample that would be used for the direct test. Similarly, we reject the equality of the distributions of expected earnings between these two samples. These results indicate that, in this context, the direct RE test is likely to be misleading. Conversely, attrition is unlikely to be an issue with our test, since we use in each wave the observations of all respondents.

Table 5: Logit model of attrition
Intercept   ψ   Male   White   Coll. Degree   Low Num.   Tenure >
∗∗ -6.206e-06 ∗∗ ∗∗ -0.040
(0.293) (1.621e-06) (0.138) (0.222) (0.139) (0.162) (0.164) (0.033)
Notes: 1,565 observations. Significance levels: †: 10%, ∗: 5%, ∗∗: 1%.

G Proofs
G.1 Notation and preliminaries
For any set $\mathcal{G}$, let us denote by $l^\infty(\mathcal{G})$ the collection of all uniformly bounded real functions on $\mathcal{G}$ equipped with the supremum norm $\|f\|_{\mathcal{G}} = \sup_{x \in \mathcal{G}} |f(x)|$. Denote by $L^2(F)$ the square integrable space with respect to the measure associated with $F$, and let $\|\cdot\|_{F,2}$ be the corresponding norm. We let $N(\epsilon, \mathcal{T}, L^2(F))$ denote the minimal number of $\epsilon$-balls with respect to $\|\cdot\|_{F,2}$ needed to cover $\mathcal{T}$. An $\epsilon$-bracket (with respect to $F$) is a pair of real functions $(l, u)$ such that $l \leq u$ and $\|u - l\|_{F,2} \leq \epsilon$. Then, for any set of real functions $\mathcal{M}$, we let $N_{[\,]}(\epsilon, \mathcal{M}, L^2(F))$ denote the minimum number of $\epsilon$-brackets needed to cover $\mathcal{M}$. We denote by $\mathcal{G} = \cup_{r \geq 1} \mathcal{G}_r$. For $x \in \mathbb{R}^d$, $d > 1$, we denote by $\|x\|_\infty = \max_{j=1,...,d} |x_j|$.

For a sequence of random variables $(U_n)_{n \in \mathbb{N}}$ and a set $\mathcal{F}$, we say that $U_n = O_P(1)$ uniformly in $F \in \mathcal{F}$ if for any $\epsilon > 0$ there exist $M > 0$ and $n_0 > 0$ such that $\sup_{F \in \mathcal{F}} P_F(|U_n| > M) < \epsilon$ for all $n > n_0$. Similarly, we say that $U_n = o_P(1)$ uniformly in $F \in \mathcal{F}$ if for any $\epsilon > 0$, $\sup_{F \in \mathcal{F}} P_F(|U_n| > \epsilon) \to 0$. Bootstrap counterparts are denoted with a star, e.g., $T^*$ versus $T$. We define $o_{P^*}$ and $O_{P^*}$ as above, but conditional on $(\widetilde{Y}_i, D_i, X_i)_{i=1...n}$. Convergence in distribution conditional on $(\widetilde{Y}_i, D_i, X_i)_{i=1...n}$ is denoted by $\to_{d^*}$.

Footnote: The one assumption we need to make is that respondents in the surveys used to measure $\psi$ (i.e., those of March and July 2015) are drawn from the same population as those from the surveys used to measure $Y$ (i.e., those of July and November 2015). That there is no significant time trend in the attrition model (Table 5) suggests that this assumption is reasonable in this context.

G.2 Proof of Lemma 1
Under $\mathrm{H}_0$, there exist $Y'$, $\psi'$ and $\mathcal{I}'$ such that $Y' \sim Y$, $\psi' \sim \psi$, $\sigma(\psi') \subset \mathcal{I}'$ and $E(Y'|\mathcal{I}') = \psi'$. Then, by the law of iterated expectations,

$$E[Y'|\psi'] = E[E[Y'|\mathcal{I}']|\psi'] = E[\psi'|\psi'] = \psi'.$$

Conversely, if there exists $(Y', \psi')$ such that $Y' \sim Y$, $\psi' \sim \psi$ and $E[Y'|\psi'] = \psi'$, let $\mathcal{I}' = \sigma(\psi')$. Then $\psi' = E[Y'|\psi'] = E[Y'|\mathcal{I}']$ and $\mathrm{H}_0$ holds.

G.3 Proof of Theorem 1

(i) ⇔ (iii). By Strassen’s theorem (Strassen, 1965, Theorem 8), the existence of $(Y, \psi)$ with margins equal to $F_Y$ and $F_\psi$ and such that $E[Y|\psi] = \psi$ is equivalent to $\int f\, dF_\psi \leq \int f\, dF_Y$ for every convex function $f$. By, e.g., Proposition 2.3 in Gozlan et al. (2018), this is, in turn, equivalent to (iii).

(ii) ⇔ (iii). By Fubini-Tonelli’s theorem,

$$\int_{-\infty}^{y} F_Y(t)\, dt = E\left[\int_{-\infty}^{y} 1\{t \geq Y\}\, dt\right] = E\left[(y - Y)^+\right].$$

The same holds for $\psi$. Hence, $\Delta(y) \geq 0$ for all $y \in \mathbb{R}$ is equivalent to $E[(y - Y)^+] \geq E[(y - \psi)^+]$ for all $y \in \mathbb{R}$. The result follows.

G.4 Proof of Proposition 1
First, by Jensen’s inequality, we obtain

$$E[(y_0 - Y)^+|\psi] \geq (y_0 - E(Y|\psi))^+ = (y_0 - \psi)^+.$$

Moreover, $\Delta(y_0) = 0$ implies that $E((y_0 - Y)^+) = E((y_0 - \psi)^+)$. Hence, almost surely, we have $E[(y_0 - Y)^+|\psi] = (y_0 - \psi)^+$. Equality in Jensen’s inequality implies that, for almost all $u$, we either have $\mathrm{Supp}(Y|\psi = u) \subset [y_0, \infty)$ or $\mathrm{Supp}(Y|\psi = u) \subset (-\infty, y_0]$. Because $E[Y|\psi] = \psi$, $\mathrm{Supp}(Y|\psi = u) \subset [y_0, \infty)$ for almost all $u > y_0$ and $\mathrm{Supp}(Y|\psi = u) \subset (-\infty, y_0]$ for almost all $u < y_0$. Then, for all $\tau \in (0, 1)$, $F^{-1}_{Y|\psi}(\tau|u) \geq y_0$ for almost all $u \geq y_0$ and $F^{-1}_{Y|\psi}(\tau|u) \leq y_0$ for almost all $u \leq y_0$. Thus, for all $\tau \in (0, 1)$, by continuity of $F^{-1}_{Y|\psi}(\tau|\cdot)$, $F^{-1}_{Y|\psi}(\tau|y_0) = y_0$. This implies that $Y|\psi = y_0$ is degenerate.

G.5 Proof of Proposition 2
We first prove that $\mathrm{H}_{0X}$ is equivalent to the existence of $(Y', \psi')$ such that $DY' + (1 - D)\psi' = \widetilde{Y}$, $D \perp\!\!\!\perp (Y', \psi')|X$ and $E(Y'|\psi', X) = \psi'$. First, under $\mathrm{H}_{0X}$, there exists $(Y', \psi', \mathcal{I}')$ such that $DY' + (1 - D)\psi' = \widetilde{Y}$, $D \perp\!\!\!\perp (Y', \psi')|X$, $\sigma(\psi', X) \subset \mathcal{I}'$ and $E(Y'|\mathcal{I}') = \psi'$. Then

$$E[Y'|\psi', X] = E[E[Y'|\mathcal{I}']|\psi', X] = E[\psi'|\psi', X] = \psi'.$$

Conversely, if there exists $(Y', \psi')$ such that $DY' + (1 - D)\psi' = \widetilde{Y}$, $D \perp\!\!\!\perp (Y', \psi')|X$ and $E(Y'|\psi', X) = \psi'$, let $\mathcal{I}' = \sigma(X, \psi')$. Then $\psi' = E(Y'|\psi', X) = E(Y'|\mathcal{I}')$ and $\mathrm{H}_{0X}$ holds. The proposition then follows as in the proof of Theorem 1.

G.6 Proof of Proposition 4
For all $y$, $\xi \mapsto E[(y - \psi - \xi)^+]$ is decreasing and convex. Then, because $F_{\xi_\psi}$ dominates at the second order $F_{\xi_Y + \varepsilon}$, we have

$$\int E\left[(y - \psi - \xi)^+\right] dF_{\varepsilon + \xi_Y}(\xi) \geq \int E\left[(y - \psi - \xi)^+\right] dF_{\xi_\psi}(\xi).$$

As a result, for all $y$, we obtain

$$\begin{aligned}
E\left[(y - \widehat{Y})^+\right] &= \int E\left[(y - \psi - \varepsilon - \xi_Y)^+ \mid \varepsilon + \xi_Y = \xi\right] dF_{\varepsilon + \xi_Y}(\xi) \\
&= \int E\left[(y - \psi - \xi)^+\right] dF_{\varepsilon + \xi_Y}(\xi) \\
&\geq \int E\left[(y - \psi - \xi)^+\right] dF_{\xi_\psi}(\xi) \\
&= E\left[(y - \widehat{\psi})^+\right].
\end{aligned}$$

Moreover, $E(\widehat{Y}) = E(\widehat{\psi})$. By Theorem 1, $\widehat{Y}$ and $\widehat{\psi}$ satisfy $\mathrm{H}_0$.

G.7 Proof of Theorem 2

(i) This is a particular case of Proposition 5 below, with $q(Y, c) = Y$. The proof is therefore omitted.

(ii) We show that equality holds for $F \in \mathcal{F}$ satisfying the conditions stated in (ii). The proof is divided in three steps. We first prove convergence in distribution of $T$ to $S$ defined below, and conditional convergence of $T^*$ towards the same limit. Then we show that the cdf $H$ of $S$ is continuous and strictly increasing in the neighborhood of its quantile of order $1 - \alpha$, for any $\alpha \in (0, 1/2)$. Finally, we conclude.
1. Convergence in distribution of $T$ and $T^*$.

Let us introduce some notation. Let $K_{j,j}$ ($j \in \{1, 2\}$) be the $j$-th diagonal element of the covariance kernel $K$,

$$S : (\nu, K) \mapsto (1 - p) \left(-\nu_1 / K_{1,1}^{1/2}\right)_+^2 + p \left(\nu_2 / K_{2,2}^{1/2}\right)^2,$$

$q(r) = (r^2 + 100)^{-1} (2r)^{-d_X}$, and

$$\nu_{n,F}(y, g) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \mathrm{Diag}\left(V_F(\widetilde{Y})\right)^{-1/2} \left(m(D_i, \widetilde{Y}_i, X_i, g, y) - E_F\left[m(D_i, \widetilde{Y}_i, X_i, g, y)\right]\right).$$

Finally, we define

$$\begin{aligned}
k_{n,F}(y, g) &= \sqrt{n}\, \mathrm{Diag}\left(V_F(\widetilde{Y})\right)^{-1/2} E_F\left[m(D_i, \widetilde{Y}_i, X_i, g, y)\right], \\
\widehat{K}_{n,F}(y, g, y', g') &= \mathrm{Diag}\left(V_F(\widetilde{Y})\right)^{-1/2} \widehat{\mathrm{Cov}}\left(\sqrt{n}\, \overline{m}_n(y, g), \sqrt{n}\, \overline{m}_n(y', g')\right) \mathrm{Diag}\left(V_F(\widetilde{Y})\right)^{-1/2}, \\
\overline{K}_{n,F}(y, g, y', g') &= \widehat{K}_{n,F}(y, g, y', g') + \epsilon\, \mathrm{Diag}\left(V_F(\widetilde{Y})\right)^{-1/2} \mathrm{Diag}\left(\widehat{V}(\widetilde{Y})\right) \mathrm{Diag}\left(V_F(\widetilde{Y})\right)^{-1/2},
\end{aligned}$$

and use the notations $\widehat{K}_{n,F}(y, g) = \widehat{K}_{n,F}(y, g, y, g)$ and $\overline{K}_{n,F}(y, g) = \overline{K}_{n,F}(y, g, y, g)$. We have, by definition of $T$,

$$T = \sup_{y \in \mathcal{Y}} \sum_{(a,r):\, r \in \{1, ..., r_n\},\, a \in A_r} q(r)\, S\left(\nu_{n,F}(y, g_{a,r}) + k_{n,F}(y, g_{a,r}),\ \overline{K}_{n,F}(y, g_{a,r})\right).$$

To characterize the distribution of $T$ (resp. $T^*$), we first prove the convergence of $\nu_{n,F}$ and $\overline{K}_{n,F}(y, g_{a,r})$ (resp. $\nu^*_{n,F}$ and $\overline{K}^*_{n,F}(y, g_{a,r})$). For those purposes, we use a class of functions which is a general form taken by $m$ defined in (2), namely, for any $0 < N < M$,

$$\mathcal{M}_0 = \left\{ f_{y, \phi_1, \phi_2, g}(\widetilde{y}, x, d) = \left(d \phi_1 (y - \widetilde{y})^+ - (1 - d) \phi_2 (y - \widetilde{y})^+\right) g(x),\ (y, \phi_1, \phi_2, g) \in \mathcal{Y} \times [N, M]^2 \times \mathcal{G} \right\}.$$
The class $\mathcal{M}_0$ is a particular case of the classes $\mathcal{M}$ defined in (9) below. Then, by the proof of Proposition 5 below, Assumptions PS1 and PS2 in AS are satisfied. Thus, the assumptions of Lemma D.2 in AS hold as well. This entails that Assumptions PS4 and PS5 in AS hold. Namely, there exists a Gaussian process $\nu_F$ such that
- $\nu_{n,F} \to_d \nu_F$ and $\nu^*_{n,F} \to_{d^*} \nu_F$;
- For all $r \in \mathbb{N}$ and $(y, g) \in \mathcal{Y} \times \mathcal{G}_r$, $\overline{K}_{n,F}(y, g) \to_P K_F(y, g) + \epsilon I_2$ and $\overline{K}^*_{n,F}(y, g) \to_{P^*} K_F(y, g) + \epsilon I_2$, where $I_2$ is the $2 \times 2$ identity matrix.

Letting $k_F(y, g)$ denote the limit in probability of $k_{n,F}(y, g)$, we have $k_F(y, g) = 0$ if $(y, g) \in \mathcal{L}_F$ and $\infty$ otherwise. Note that by assumption, the set $\mathcal{L}_F$ is nonempty. Thus, using (D.11) in the proof of Theorem D.3 in AS, which is based on the uniform continuity of the function $S$ in the sense of Assumption S2 therein, we have, under $F$,

$$\begin{aligned}
T \to_d\ & \sup_{y \in \mathcal{Y}} \sum_{(a,r) \in A_r \times \mathbb{N}} q(r)\, S\left(\nu_F(y, g_{a,r}) + k_F(y, g_{a,r}),\ K_F(y, g_{a,r}) + \epsilon I_2\right) \\
= S :=\ & \sup_{y \in \mathcal{Y}} \sum_{(a,r):\, (y, g_{a,r}) \in \mathcal{L}_F} q(r)\, S\left(\nu_F(y, g_{a,r}),\ K_F(y, g_{a,r}) + \epsilon I_2\right),
\end{aligned}$$

where the equality follows by definition of $S$ and $k_F(y, g)$. Similarly, using Assumption PS5 and (D.11) in AS, replacing $T$ by $T^*$ and the quantities $\nu_{n,F}(y, g_{a,r})$ and $\overline{K}_{n,F}(y, g_{a,r})$ by their bootstrap counterparts (see the proof of Lemma D.4 in AS), we have $T^* \to_{d^*} S$.
2. The cdf $H$ of $S$ is continuous and strictly increasing in the neighborhood of any of its quantiles of order $1-\alpha > 1/2$.

First, $S$, whose cdf is $H$, is a convex functional of the Gaussian process $\nu_F$. Then, as in the proof of Lemma B3 in Andrews and Shi (2013), we can use Theorem 11.1 of Davydov et al. (1998) p.75 to show that $H$ is continuous and strictly increasing at every point of its support except $r_0 = \inf\{r \in \mathbb{R} : H(r) > 0\}$. Moreover, for any $r > 0$,
$$
H(r) \ge P\Big( \sup_{y \in \mathcal{Y}} \sum_{(a,r):(y,g_{a,r}) \in \mathcal{L}_F} q(r)\, S\big(\nu_F(y,g_{a,r}),\, K_F(y,g_{a,r}) + \epsilon I\big) < r \Big)
\ge P\Big( \sup_{j \in \{1,2\},\, (y,a,r):(y,g_{a,r}) \in \mathcal{L}_F} \big| (K_{F,j,j}(y,g_{a,r}) + \epsilon)^{-1/2}\, \nu_{F,j}(y,g_{a,r}) \big| < \sqrt{r/(2Q)} \Big) > 0,
$$
where $Q = \sum_{(a,r):(y,g_{a,r}) \in \mathcal{L}_F} q(r) < \infty$ and we use Problem 11.3 of Davydov et al. (1998) p.79 for the last inequality. This yields $r > r_0$, and $H$ is continuous and strictly increasing on $(0, \infty)$.

Then, we show that for any $\alpha \in (0, 1/2)$, the quantile of order $1-\alpha$ of the distribution of $S$ is positive. By assumption, there exists $(y_0, g_0) \in \mathcal{L}_F$ such that either $K_{F,1,1}(y_0,g_0) > 0$ or $K_{F,2,2}(y_0,g_0) > 0$. This yields
$$
P(S > 0) = 1 - P\Big( \sup_{y \in \mathcal{Y}} \sum_{(a,r):(y,g_{a,r}) \in \mathcal{L}_F} q(r)\, S\big(\nu_F(y,g_{a,r}),\, K_F(y,g_{a,r}) + \epsilon I\big) = 0 \Big)
\ge 1 - P\big( \nu_{F,1}(y_0,g_0) \le 0,\ \nu_{F,2}(y_0,g_0) = 0 \big)
\ge 1 - \min\big\{ P(\nu_{F,1}(y_0,g_0) \le 0),\ P(\nu_{F,2}(y_0,g_0) = 0) \big\}
\ge 1/2. \qquad (6)
$$
The first inequality holds by definition of the supremum and because $S$ is nonnegative. To obtain the last inequality, note that either $\nu_{F,1}(y_0,g_0)$ is non-degenerate, in which case the first probability is $1/2$ ($\nu_{F,1}(y_0,g_0)$ is normal with zero mean), or $\nu_{F,2}(y_0,g_0)$ is non-degenerate, in which case the second probability is 0.

Finally, using that $H$ is strictly increasing on $(0, \infty)$, (6) ensures that any quantile of $S$ of order $1-\alpha$ with $\alpha \in [0, 1/2)$ is positive. Hence, $H$ is continuous and strictly increasing in the neighborhood of any such quantile.
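The last implication of Step 2 can be spelled out as follows. This is only a sketch of the omitted step; it assumes the usual definition of the quantile as a generalized inverse and writes $c(1-\alpha)$ for the quantile of order $1-\alpha$ of $S$, as in Step 3.

```latex
% Sketch: from (6) to positivity of the quantile, for \alpha \in [0, 1/2).
\begin{align*}
H(0) = P(S \le 0) = 1 - P(S > 0) \le \tfrac{1}{2} < 1 - \alpha .
\end{align*}
% H is right-continuous at 0, so H(r) < 1 - \alpha for some r > 0, and hence
\begin{align*}
c(1-\alpha) := \inf\{ r \in \mathbb{R} : H(r) \ge 1 - \alpha \} \ge r > 0 .
\end{align*}
```

Since $c(1-\alpha)$ then lies in $(0, \infty)$, where $H$ is continuous and strictly increasing, the claim of Step 2 follows.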
3. Conclusion.
Using the convergence in distribution $T^* \to_{d^*} S$, Step 2 and Lemma 21.2 in Van der Vaart (2000), we have that for $\eta > 0$, $c^*_{n,\alpha} \to_{d^*} c(1-\alpha+\eta) + \eta$, where $c(1-\alpha+\eta)$ is the $(1-\alpha+\eta)$-th quantile of the distribution of $S$. Because $T \to_d S$ and $H$ is continuous at $c(1-\alpha+\eta) + \eta > 0$, we obtain that
$$
\lim_{\eta \to 0} \limsup_{n \to \infty} P_F\big( T > c^*_{n,\alpha} \big) = \alpha.
$$
Combined with the inequality of Part (i) above, this yields the result.

(iii) This result follows from Theorem E.1 in AS. First, Assumption SIG2 in AS holds for $\sigma_F^2 = V_F(\tilde{Y})$, following the proof of Lemma 7.2 (b) under Assumption 3-(ii). Second, Assumptions PS4 and PS5 are satisfied using point (ii) above. Third, Assumptions CI, MQ, S1, S3, S4 in AS are also satisfied by construction of the statistic $T$. Thus, Theorem E.1 in AS yields the result. $\square$

G.8 Proof of Proposition 5
We introduce $E_{F,c} = E_F\big[ m(D_i, \tilde{Y}_{c,i}, X_i, g, y) \big]$ and
$$
\nu_{n,F}(y,g) = \frac{1}{\sqrt{n}} \sum_{i=1}^n \mathrm{Diag}\big( \hat{V}_F(\tilde{Y}_{\hat{c}}) \big)^{-1/2} \big( m(D_i, \tilde{Y}_{\hat{c},i}, X_i, g, y) - E_{F,\hat{c}} \big),
$$
$$
\nu^0_{n,F}(y,g) = \frac{1}{\sqrt{n}} \sum_{i=1}^n \mathrm{Diag}\big( V_F(\tilde{Y}_{c_0}) \big)^{-1/2} \big( m(D_i, \tilde{Y}_{c_0,i}, X_i, g, y) - E_{F,c_0} \big).
$$
The proof is based on Theorem 5.1 in AS, hence we have to check that the corresponding assumptions PS1, PS2, and SIG1 hold. Namely, we have to ensure that

- PS1: for all sequences $F \in \mathcal{F}$ and all $(d, y', x, g, y, c) \in \{0,1\} \times \mathcal{Y} \times [0,1]^{d_X} \times \mathcal{G}_r \times \mathcal{Y} \times \mathcal{C}^s\big([0,1]^{d_X}\big)$,
$$
\left| \frac{m(d, y', x, g, y)}{V_F(\tilde{Y}_{c,i})^{1/2}} \right| \le M(d, y', x, g, y) \quad \text{and} \quad E_F\Big[ M(D_i, \tilde{Y}_{c,i}, X_i, g, y)^{\delta} \Big] \le C < \infty,
$$
for some envelope function $M$ and exponent $\delta$ as required in AS;
- PS2: for all sequences $F_n \in \mathcal{F}$, the i.i.d. triangular array of processes
$$
\mathcal{T}_n = \left\{ \frac{m\big(D_i, \tilde{Y}_{n,c}(X_{n,i}), X_{n,i}, g, y\big)}{V_{F_n}\big(\tilde{Y}_{n,c}(X_{n,i})\big)^{1/2}},\ (c, y, g) \in \mathcal{C}^s\big([0,1]^{d_X}\big) \times \mathcal{Y} \times \mathcal{G},\ i \le n,\ n \ge 1 \right\}
$$
is manageable with respect to some envelope function $U$ (see Pollard, 1990, p.38 for the definition of a manageable class);
- SIG1: for all $\zeta > 0$,
$$
\sup_{F \in \mathcal{F},\, c \in \mathcal{C}^s([0,1]^{d_X})} P\Big( \big| \hat{V}_F(\tilde{Y}_{i,c}) / V_F(\tilde{Y}_{i,c}) - 1 \big| > \zeta \Big) \to 0.
$$

The proof proceeds in two steps, which account for the fact that $c$ and $\mathrm{Diag}\big( V_F(\tilde{Y}_c) \big)^{-1/2}$ are estimated:

1. We first show that
$$
\sup_{F \in \mathcal{F}} \sup_{g \in \cup_{r \ge 1} \mathcal{G}_r,\, y \in \mathcal{Y}} \big\| \nu_{n,F}(y,g) - \nu^0_{n,F}(y,g) \big\|_\infty = o_P(1), \qquad (7)
$$
$$
\sup_{F \in \mathcal{F}} \sup_{g \in \cup_{r \ge 1} \mathcal{G}_r,\, y \in \mathcal{Y}} \big\| \nu^*_{n,F}(y,g) - \nu^{0*}_{n,F}(y,g) \big\|_\infty = o_{P^*}(1). \qquad (8)
$$
2. Next, we show that $m$ satisfies assumptions PS1, PS2, and that SIG1 in AS also holds for $\sigma_F^2 = V_F(\tilde{Y}_c)$, where $F \in \mathcal{F}$ and $\hat{\sigma}_n^2 = n^{-1} \sum_{i=1}^n \big( \tilde{Y}_{\hat{c},i} - n^{-1} \sum_{j=1}^n \tilde{Y}_{\hat{c},j} \big)^2$.
1. Proof of (7)-(8)

We apply the uniform version over $F \in \mathcal{F}$ of Theorem 3 in Chen et al. (2003) to a general class of functions to which the moment condition $m$ belongs (see (2), with $\tilde{Y}$ replaced here by $\tilde{Y}_c = D q(\tilde{Y}, c) + (1-D)\psi$ and without the moment equality component of $m$). Hence, it suffices to verify that Assumptions (3.2) and (3.3) of Theorem 3 in Chen et al. (2003) are satisfied. Let us introduce, for any $0 < N < M$, the classes of functions
$$
\begin{aligned}
\mathcal{M}_1 &= \Big\{ f_{c,y,\phi,g}(\tilde{y}, x) = \phi\, \big(y - q(\tilde{y}, c(x))\big)^+ g(x),\ (c, y, \phi, g) \in \mathcal{C}^s\big([0,1]^{d_X}\big) \times \mathcal{Y} \times [N, M] \times \mathcal{G} \Big\}, \\
\mathcal{M}_2 &= \Big\{ f_{c,y,\phi,g}(\tilde{y}, x) = \phi\, (y - \tilde{y})^+ g(x),\ (c, y, \phi, g) \in \mathcal{C}^s\big([0,1]^{d_X}\big) \times \mathcal{Y} \times [N, M] \times \mathcal{G} \Big\}, \\
\mathcal{M}_3 &= \Big\{ f_{c,y,\phi_1,\phi_2,g}(\tilde{y}, x, d) = \big( d\, g_{c,y,\phi_1,g} - (1-d)\, q_{c,y,\phi_2,g} \big)(\tilde{y}, x),\ g \in \mathcal{M}_1,\ q \in \mathcal{M}_2, \\
&\qquad (c, y, \phi_1, \phi_2, g) \in \mathcal{C}^s\big([0,1]^{d_X}\big) \times \mathcal{Y} \times [N, M]^2 \times \mathcal{G} \Big\}.
\end{aligned}
\qquad (9)
$$
Note that $\phi_1$, $\phi_2$, and $c$ in the class $\mathcal{M}_3$ denote components of $m$ that are estimated. Consider the space $\mathcal{C}^s\big([0,1]^{d_X}\big) \times \mathcal{Y} \times [N, M]^2 \times \mathcal{G}$ equipped with the norm
$$
\| (c, y, \phi_1, \phi_2, g) \| = \max\Big\{ \| c \|_{[0,1]^{d_X}},\ |y|,\ |\phi_1|,\ |\phi_2|,\ \| g \|_{[0,1]^{d_X}} \Big\}.
$$
For v = ( c, y, φ , φ , g ) , v (cid:48) = ( c (cid:48) , y (cid:48) , φ (cid:48) , φ (cid:48) , g (cid:48) ) ∈ C s (cid:0) [0 , d X (cid:1) × Y × [ N , M ] × G and( (cid:101) y, x, d ) ∈ Y × [0 , d X × { , } , we have, by the triangular inequality and Assumptions51-(i) and 4-(v), | f v ( (cid:101) y, x, d ) − f v (cid:48) ( (cid:101) y, x, d ) | ≤ (cid:12)(cid:12) g c,y,φ ,g ( (cid:101) y, x ) − g c (cid:48) ,y (cid:48) ,φ (cid:48) ,g (cid:48) ( (cid:101) y, x ) (cid:12)(cid:12) + (cid:12)(cid:12) q c,y,φ ,g ( (cid:101) y, x ) − q c (cid:48) ,y (cid:48) ,φ (cid:48) ,g (cid:48) ( (cid:101) y, x ) (cid:12)(cid:12) ≤ ( M + M ) ( | φ − φ (cid:48) | + | φ − φ (cid:48) | )+ 2 M [ | y − y (cid:48) | + | q ( (cid:101) y, c ( x )) − q ( (cid:101) y, c (cid:48) ( x )) | ]+ 2 M M (cid:2) | { q ( (cid:101) y, c ( x )) ≤ y } − { q ( (cid:101) y, c ( x )) ≤ y (cid:48) }| + | { q ( (cid:101) y, c ( x )) ≤ y (cid:48) } − { q ( (cid:101) y, c (cid:48) ( x )) ≤ y (cid:48) }| + | g ( x ) − g (cid:48) ( x ) | (cid:3) . Denote by K q > q ( (cid:101) y, . ). Then, by convexity of x (cid:55)→ x ,we obtain17 | f v ( (cid:101) y, x, d ) − f v (cid:48) ( (cid:101) y, x, d ) | ≤ ( M + M ) (cid:16) | φ − φ (cid:48) | + | φ − φ (cid:48) | (cid:17) + 4 M (cid:104) | y − y (cid:48) | + K q (cid:107) c − c (cid:48) (cid:107) , dX (cid:105) + 4( M M ) (cid:2) | { q ( (cid:101) y, c ( x )) ≤ y } − { q ( (cid:101) y, c ( x )) ≤ y (cid:48) }| + | { q ( (cid:101) y, c ( x )) ≤ y (cid:48) } − { q ( (cid:101) y, c (cid:48) ( x )) ≤ y (cid:48) }| + (cid:107) g − g (cid:48) (cid:107) , dX (cid:3) . Fix δ >
0. If (cid:107) v − v (cid:48) (cid:107) ≤ δ , this yields17 | f v ( (cid:101) y, x, d ) − f v (cid:48) ( (cid:101) y, x, d ) | ≤ δ (cid:0) M + M ) + 4 M (1 + K q ) + 4( M M ) (cid:1) + 4( M M ) (cid:2) { q ( (cid:101) y, c ( x )) ≤ y + δ } − { q ( (cid:101) y, c ( x )) ≤ y − δ } + (cid:12)(cid:12) (cid:8)(cid:101) y ≤ q I ( y (cid:48) , c ( x )) (cid:9) − (cid:8)(cid:101) y ≤ q I ( y (cid:48) , c (cid:48) ( x )) (cid:9)(cid:12)(cid:12) (cid:3) . Next, by Assumption 4-(iv), we obtain E (cid:104) (cid:110) q (cid:16) (cid:101) Y , c ( X ) (cid:17) ≤ y + δ (cid:111) − (cid:110) q (cid:16) (cid:101) Y , c ( X ) (cid:17) ≤ y − δ (cid:111)(cid:105) = F q ( (cid:101) Y ,c ( X ) ) ( y + δ ) − F q ( (cid:101) Y ,c ( X ) ) ( y − δ ) ≤ Q δ. E (cid:2)(cid:12)(cid:12) (cid:8) Y ≤ q I ( y (cid:48) , c ( X )) (cid:9) − (cid:8)(cid:101) y ≤ q I ( y (cid:48) , c (cid:48) ( X )) (cid:9)(cid:12)(cid:12)(cid:3) ≤ E (cid:2) (cid:8) Y ≤ q I ( y (cid:48) , c ( X )) − Q F, δ (cid:9) − (cid:8)(cid:101) y ≤ q I ( y (cid:48) , c ( X )) + Q F, δ (cid:9)(cid:3) ≤ E (cid:2) F Y | X (cid:0) q I ( y (cid:48) , c ( X )) − Q q I δ (cid:12)(cid:12) X (cid:1) − F Y | X (cid:0) q I ( y (cid:48) , c ( X )) + Q q I δ (cid:12)(cid:12) X (cid:1)(cid:3) ≤ Q F, Q q I δ, where Q q I is the Lipschitz constant of q I . Thus, by Assumption 4, there exists Q > F ∈F E (cid:34) sup (cid:107) v − v (cid:48) (cid:107)≤ δ (cid:12)(cid:12)(cid:12) f v (cid:16) (cid:101) Y , X, D (cid:17) − f v (cid:48) (cid:16) (cid:101) Y , X, D (cid:17)(cid:12)(cid:12)(cid:12) (cid:35) ≤ Qδ. (10)Therefore the class M satisfies Condition (3.2) of Theorem 3 in Chen et al. (2003)uniformly in F ∈ F . Moreover, the class G is manageable and thus Donsker (seeLemma 3 in Andrews and Shi, 2013). Finally, by Remark 3 (ii) in Chen et al. (2003), C s (cid:0) [0 , d X (cid:1) is also Donsker. Then, C s (cid:0) [0 , d X (cid:1) , Y , [ N , M ], and G satisfy Condition(3.3) of Theorem 3 in Chen et al. (2003). 
The result follows by Theorem 3 in Chen et al. (2003).

2. $m$ satisfies PS1 and PS2 of AS and SIG1 of AS also holds for $\sigma_F$ and $\hat{\sigma}_n$.

From Assumption 4 (iii) and the proof of Lemma 7.2 (a) in AS, PS1 is satisfied, replacing $B$ in that proof by the larger of the two bounds $M$.

We now show that PS2 in AS also holds. As the result is uniform over $F$, we have to consider sequences for the cdfs $F_n$ of $(D_{n,i}, Y_{n,i}, X_{n,i})_{i=1,\dots,n}$ (with $F_n \in \mathcal{F}$). We also define
$$
\tilde{Y}_{n,c}(X_{n,i}) = D_{n,i}\, q(Y_{n,i}, c(X_{n,i})) + (1 - D_{n,i})\, \psi_{n,i},
$$
$$
W_{n,i} = \frac{D_{n,i}}{E_{F_n}[D_{n,i}]} - \frac{1 - D_{n,i}}{E_{F_n}[1 - D_{n,i}]},
$$
$$
\sigma_{F_n}^2 = V_{F_n}\big( \tilde{Y}_{n,c}(X_{n,i}) \big).
$$
Note that by Assumption 3 (iii), $\sigma_{F_n}^2 \ge \underline{\sigma}^2 > 0$ for all $F_n \in \mathcal{F}$. Let $(\Omega, \mathcal{F}, F_n)$ be a probability space and let $\omega$ denote a generic element in $\Omega$. Showing Assumption PS2 in AS then boils down to proving that for any $0 < N < M := 1/\inf_F \sigma_F$, the i.i.d. triangular array of processes
$$
\mathcal{T}_{n,\omega} = \Big\{ W_{n,i}\, \phi\, \big( y - \tilde{Y}_{n,c}(X_{n,i}) \big)^+ g(X_{n,i}),\ (c, y, \phi, g) \in \mathcal{C}^s\big([0,1]^{d_X}\big) \times \mathcal{Y} \times [N, M] \times \mathcal{G},\ i \le n,\ n \ge 1 \Big\}
$$
is manageable with respect to some envelope function $U$. Lemma 3 in Andrews and Shi (2013) shows that the processes $\{ g(X_{n,i}),\ g \in \mathcal{G},\ i \le n,\ n \ge 1 \}$ are manageable with respect to the constant function 1. Then, using Lemma D.5 in AS, it remains to show that
$$
\mathcal{T}'_{n,\omega} = \Big\{ W_{n,i}\, \phi\, \big( y - \tilde{Y}_{n,c}(X_{n,i}) \big)^+,\ (c, y, \phi) \in \mathcal{C}^s\big([0,1]^{d_X}\big) \times \mathcal{Y} \times [N, M],\ i \le n,\ n \ge 1 \Big\}
$$
is manageable with respect to some envelope. For such an envelope, we can take the constant function $U'(\omega)$ equal to the sum of the two bounds $M$ divided by $\underline{\sigma}\epsilon$. We now prove the manageability of $\mathcal{T}'_{n,\omega}$. Let us define
$$
\mathcal{M}' = \Big\{ f_{c,y,\phi_1,\phi_2}(\tilde{y}, x, d) = d\, \phi_1\, \big( y - q(\tilde{y}, c(x)) \big)^+ - (1-d)\, \phi_2\, (y - \tilde{y})^+,\ (c, y, \phi_1, \phi_2) \in \mathcal{C}^s\big([0,1]^{d_X}\big) \times \mathcal{Y} \times [N, M]^2 \Big\}.
$$
Reasoning as for the class M defined in (9), and using the last equation of the proofof Theorem 3 in Chen et al. (2003), p.1607, we have that for (cid:15) > N [ · ] ( (cid:15), M (cid:48) , (cid:107) · (cid:107) ) ≤ N (cid:0) (cid:15) (cid:48) , [ N , M ] , |·| (cid:1) × N ( (cid:15) (cid:48) , Y , |·| ) × N (cid:0) (cid:15) (cid:48) , C s (cid:0) [0 , d X (cid:1) , (cid:107) · (cid:107) [0 , dX (cid:1) , with (cid:15) (cid:48) = ( (cid:15)/ (2 Q )) and Q defined in (10). Using Theorem 2.7.1 page 155 in Van derVaart and Wellner (1996), there exists a constant Q depending only on s , d X , and[0 , d X such that ln (cid:0) N (cid:0) (cid:15) (cid:48) , C s ([0 , d X ) , (cid:107) · (cid:107) [0 , dX (cid:1)(cid:1) ≤ Q (cid:15) (cid:48)− d X /s . Moreover, because Y and [ N , M ] are compact subsets of two Euclidean spaces, thereexist Q , Q such that N (cid:0) (cid:15) (cid:48) , [ N , M ] , |·| (cid:1) ≤ Q (cid:15) (cid:48)− and N ( (cid:15) (cid:48) , Y , |·| ) ≤ Q (cid:15) (cid:48)− . (11)This yieldsln (cid:0) N [ · ] ( (cid:15), M (cid:48) , (cid:107) · (cid:107) ) (cid:1) ≤ (6 + Q ) max (cid:0) − ln( (cid:15) (cid:48) ) , (cid:15) (cid:48)− d X /s (cid:1) + ln( Q Q ) . (12)54et (cid:12) denote element-by-element product and D (cid:0) (cid:15) | α (cid:12) U (cid:48) ( ω ) | , α (cid:12) T (cid:48) ,n,ω (cid:1) denoterandom packing numbers. 
By (A.1) in Andrews (1994, p.2284), we havesup ω ∈ Ω ,n ≥ , α ∈ R n + D (cid:0) (cid:15) | α (cid:12) U (cid:48) ( ω ) | , α (cid:12) T (cid:48) ,n,ω (cid:1) ≤ sup F ∈F N (cid:16) (cid:15) , M (cid:48) , (cid:107) · (cid:107) (cid:17) ≤ sup F ∈F N [ · ] ( (cid:15), M (cid:48) , (cid:107) · (cid:107) ) , (13)where the second inequality follows as in e.g., Van der Vaart and Wellner (1996, p.84).Then, (12) ensures (see Definition 7.9 in Pollard (1990), p.38) thatsup ω ∈ Ω ,n ≥ , α ∈ R n + D (cid:0) (cid:15) | α (cid:12) U (cid:48) ( ω ) | , α (cid:12) T (cid:48) ,n,ω (cid:1) ≤ λ ( (cid:15) ) , where λ ( (cid:15) ) = exp (cid:16) (6 + Q ) max (cid:16) − (cid:15)/ (2 Q )) , ( (cid:15)/ (2 Q )) − d X /s (cid:17) + ln( Q Q ) (cid:17) . More-over, by using √ a + b ≤ √ a + √ b for all a, b ≥ (cid:90) (cid:112) ln( λ ( (cid:15) )) d(cid:15) ≤ (cid:112) Q (cid:90) (cid:104) max (cid:16) − (cid:15)/ (2 Q )) , ( (cid:15)/ (2 Q )) − d X /s (cid:17)(cid:105) / d(cid:15) + (cid:112) ln( Q Q ) < ∞ . Thus, T (cid:48) ,n,ω hence T ,n,ω are manageable. Therefore, m satisfies PS2 in AS.Finally, in order to show that SIG1 in AS is satisfied, we use Assumption 4 (iii) andfollow the proof of Lemma 7.2 (b) in AS where we replace Y by q ( Y, c ( X )) and B bymax( M, M ). The result follows. G.9 Proof of Proposition 6
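The proof of Proposition 6 relies on the mean-preserving-spread characterization of Theorem 1. For intuition only, the following sketch checks that property on two samples, using the standard integrated-cdf criterion: equal means and $E[(t-\psi)^+] \le E[(t-Y)^+]$ for all $t$. The function names are hypothetical and this is not the test statistic of the paper; it is a naive empirical check that ignores sampling error.

```python
import numpy as np

def integrated_cdf(sample, grid):
    """Return E[(t - Z)^+] for each t in grid, under the empirical law of `sample`.

    This equals the integral of the empirical cdf from -infinity to t.
    """
    sample = np.asarray(sample, dtype=float)
    grid = np.asarray(grid, dtype=float)
    return np.maximum(grid[:, None] - sample[None, :], 0.0).mean(axis=1)

def is_mean_preserving_spread(y, psi, tol=1e-9):
    """Check whether the empirical cdf of y is a mean-preserving spread of that of psi."""
    y = np.asarray(y, dtype=float)
    psi = np.asarray(psi, dtype=float)
    if abs(y.mean() - psi.mean()) > tol:
        return False
    # Both integrated cdfs are piecewise linear with kinks at the sample points,
    # and they coincide outside the sample range when the means are equal,
    # so it is enough to compare them on the pooled sample points.
    grid = np.union1d(y, psi)
    return bool(np.all(integrated_cdf(psi, grid) <= integrated_cdf(y, grid) + tol))

# Illustration: y adds a symmetric +/-1 shock to each value of psi.
psi = np.array([0.0, 1.0, 2.0])
y = np.concatenate([psi - 1.0, psi + 1.0])
print(is_mean_preserving_spread(y, psi))
```

In this toy example the realizations are beliefs plus a mean-zero shock, so the check succeeds; swapping the two arguments makes it fail.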
Hereafter, we let $[\underline{\psi}, \overline{\psi}]$ (resp. $[\underline{y}, \overline{y}]$) denote the support of $\psi$ (resp. of $Y$). As in Lemma 1, $H_{0SK}$ holds if and only if there exist a pair of random variables $(Y', \psi')$ and $c$ such that $Y' \sim Y$, $\psi' \sim \psi$ and $E[Y' | \psi'] = Q_c(\psi')$. Now, if $Q_c$ is strictly increasing on $[\underline{\psi}, \overline{\psi}]$, we have $E[Y' | \psi'] = Q_c(\psi')$ if and only if $E[Y' | Q_c(\psi')] = Q_c(\psi')$. In view of Theorem 1, the latter is equivalent to $F_Y$ being a mean-preserving spread of $F_{Q_c(\psi')}$. Therefore, the proposition holds if for any $\eta > 0$, there exist $K$, $c \in \mathbb{R}^{K+1}$ and $F$ such that (i) $Q_c$ is strictly increasing on $[\underline{\psi}, \overline{\psi}]$; (ii) $\sup_{y \in \mathbb{R}} |F_\psi(y) - F(y)| < \eta$; (iii) $F_Y$ is a mean-preserving spread of $F_{Q_c(\tilde{\psi})}$, with $\tilde{\psi} \sim F$.

Fix $\eta > 0$. Since $F_Y$ is continuous on $[\underline{y}, \overline{y}]$, it is uniformly continuous on this set. Hence, there exists $\eta'$ such that
$$
|y - y'| < \eta' \Rightarrow |F_Y(y) - F_Y(y')| < \eta. \qquad (14)
$$
By assumption, $F_Y^{-1} \circ F_\psi$ is increasing and continuous. Then, by Theorem 9 in Mulansky and Neamtu (1998), there exists a sequence $(P_n)_{n \in \mathbb{N}}$ of increasing polynomials on $[\underline{\psi}, \overline{\psi}]$ satisfying $P_n(\underline{\psi}) = \underline{y}$ and $P_n(\overline{\psi}) = \overline{y}$ and converging uniformly to $F_Y^{-1} \circ F_\psi$. Hence, there exists $P_n$ such that
$$
\sup_{y \in [\underline{\psi}, \overline{\psi}]} \big| P_n(y) - F_Y^{-1} \circ F_\psi(y) \big| < \eta'. \qquad (15)
$$
Let $K$ be the degree of $P_n$ and $c \in \mathbb{R}^{K+1}$ denote the vector of coefficients of $P_n$, so that $Q_c = P_n$. $Q_c$ is a non-constant polynomial, which is increasing on $[\underline{\psi}, \overline{\psi}]$. Hence, its derivative vanishes a finite number of times and $Q_c$ is actually strictly increasing. Hence, Condition (i) above holds. Moreover, combining (15) with (14), we obtain
$$
\sup_{y \in [\underline{\psi}, \overline{\psi}]} \big| F_Y \circ Q_c(y) - F_\psi(y) \big| < \eta.
$$
Now, let $F := F_Y \circ Q_c$ on $[\underline{\psi}, \overline{\psi}]$, $F(y) := 0$ for all $y < \underline{\psi}$ and $F(y) := 1$ for all $y > \overline{\psi}$. Then $F$ is continuous and increasing, with limits 0 and 1 at $-\infty$ and $\infty$ respectively. Thus, it is a cdf and Condition (ii) above holds. Finally, let $\tilde{\psi} \sim F$. We have, for any $y \in [\underline{y}, \overline{y}]$,
$$
P\big( Q_c(\tilde{\psi}) \le y \big) = F \circ Q_c^{-1}(y) = F_Y(y).
$$
Hence $F_{Q_c(\tilde{\psi})} = F_Y$, so that $F_Y$ is trivially a mean-preserving spread of $F_{Q_c(\tilde{\psi})}$ and Condition (iii) holds. The result follows.

G.10 Proof of Proposition 7
1. We consider for that purpose $(\psi^*, \xi^*_\psi, \xi^*_Y, \varepsilon^*) \sim \mathcal{N}(m, \Sigma)$, potentially different from the true $(\psi, \xi_\psi, \xi_Y, \varepsilon)$, and let
$$
\hat{\psi}^* = \psi^* + \xi^*_\psi, \qquad \hat{Y}^* = \psi^* + \varepsilon^* + \xi^*_Y.
$$
We then fix $(m, \Sigma)$ so that the DGP satisfies all the restrictions specified in the proposition, and in particular, $(V(\hat{Y}^*), V(\hat{\psi}^*), \mathrm{Cov}(\hat{Y}^*, \hat{\psi}^*)) = (V(\hat{Y}), V(\hat{\psi}), \mathrm{Cov}(\hat{Y}, \hat{\psi}))$.

First, letting $m = (m_1, m_2, m_3, m_4)'$, we impose $m_2 = m_3 = m_4 = 0$, and set all the non-diagonal terms of $\Sigma$, except $\Sigma_{23} = \mathrm{Cov}(\xi^*_\psi, \xi^*_Y)$, equal to zero. Then $(\hat{Y}^*, \hat{\psi}^*, \psi^*)$ satisfy (1) and RE holds (considering $\mathcal{I} = \sigma(\psi^*)$ and $Y^* = \psi^* + \varepsilon^*$). We fix below $\Sigma_{22} \in [0, V(\hat{\psi})]$. Then let $\Sigma_{11} = V(\hat{\psi}) - \Sigma_{22}$, $\Sigma_{33} = V(\hat{Y}) - V(\hat{\psi}) + \Sigma_{22}$ and $\Sigma_{44} = 0$, so that $(V(\hat{Y}^*), V(\hat{\psi}^*)) = (V(\hat{Y}), V(\hat{\psi}))$. Also, because $V(\hat{Y}) > V(\hat{\psi})$, $V(\xi^*_\psi) < V(\xi^*_Y + \varepsilon^*)$ and $F_{\xi^*_\psi}$ dominates at the second order $F_{\xi^*_Y + \varepsilon^*}$.

Now, we fix $\Sigma_{22}$. Let $a = V(\hat{Y}) - V(\hat{\psi})$ and $c = \mathrm{Cov}(\hat{Y} - \hat{\psi}, \hat{\psi})$. Then, by the Cauchy-Schwarz inequality,
$$
c^2 \le V(\hat{\psi})\, V(\hat{Y} - \hat{\psi}) = V(\hat{\psi})\, (a - 2c).
$$
This means that there exists $\sigma^2 \in [0, V(\hat{\psi})]$ such that
$$
c^2 \le \sigma^2 (a - 2c). \qquad (16)
$$
Let $\Sigma_{22} = \sigma^2$ and $\Sigma_{23} = c + \Sigma_{22}$. Then, by construction,
$$
\mathrm{Cov}(\hat{Y}^*, \hat{\psi}^*) = \Sigma_{11} + \Sigma_{23} = V(\hat{\psi}) - \Sigma_{22} + \Sigma_{22} + c = \mathrm{Cov}(\hat{Y}, \hat{\psi}).
$$
Moreover, in view of (16) and by definition of $\Sigma_{22}$ and $\Sigma_{33}$,
$$
\Sigma_{23}^2 = c^2 + 2c\,\Sigma_{22} + \Sigma_{22}^2 \le (a - 2c)\,\Sigma_{22} + 2c\,\Sigma_{22} + \Sigma_{22}^2 = \Sigma_{22}\,\Sigma_{33}.
$$
In other words, $\Sigma$ is a proper covariance matrix.

2. Let $\lambda = V(\psi)/\sigma^2_{\xi_\psi}$. If (1) and RE hold, $\mathrm{Cov}(\xi_\psi, \varepsilon + \xi_Y) \ge 0$. Then, since $\lambda \ge \underline{\lambda}$, we obtain
$$
\beta - 1 = \frac{\mathrm{Cov}(\hat{Y} - \hat{\psi}, \hat{\psi})}{V(\hat{\psi})} = \frac{\mathrm{Cov}(\varepsilon + \xi_Y - \xi_\psi, \xi_\psi)}{\sigma^2_{\xi_\psi}(1 + \lambda)} \ge -\frac{1}{1 + \underline{\lambda}}.
$$
The result follows.

G.11 Proof of Proposition 8
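As a numerical companion to the first part of the proof that follows (existence of $b^*$ such that $E[\psi_{b^*}] = E[Y]$), the sketch below locates $b^*$ by bisection, mirroring the intermediate value argument. It assumes that $\psi_b = \min(\psi_U, \max(\psi_L, b))$, which is consistent with the expression of $\psi_b - \psi_{b'}$ displayed in the proof but is not stated in this section. The function names and the use of sample means in place of expectations are also illustrative assumptions.

```python
import numpy as np

def psi_b(psi_l, psi_u, b):
    """Assumed form of psi_b: the constant b clipped to the interval [psi_L, psi_U]."""
    return np.minimum(psi_u, np.maximum(psi_l, b))

def solve_b_star(psi_l, psi_u, mean_y, tol=1e-10, max_iter=200):
    """Bisection on b -> mean(psi_b), which is continuous and nondecreasing in b."""
    psi_l = np.asarray(psi_l, dtype=float)
    psi_u = np.asarray(psi_u, dtype=float)
    if not (psi_l.mean() <= mean_y <= psi_u.mean()):
        raise ValueError("mean_y must lie between the means of psi_L and psi_U")
    # At b = min(psi_L), psi_b = psi_L; at b = max(psi_U), psi_b = psi_U.
    lo, hi = psi_l.min(), psi_u.max()
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if psi_b(psi_l, psi_u, mid).mean() < mean_y:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# Illustration with interval-valued beliefs [0, k], k = 1, ..., 4, and E[Y] = 1.5.
psi_l = np.zeros(4)
psi_u = np.array([1.0, 2.0, 3.0, 4.0])
b_star = solve_b_star(psi_l, psi_u, mean_y=1.5)
print(b_star, psi_b(psi_l, psi_u, b_star).mean())
```

When $b \mapsto E[\psi_b]$ is flat around the solution, several values of $b$ solve the equation, but they all yield the same distribution $F^*$, in line with the uniqueness statement of the proposition.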
We first prove that if $E[\psi_L] \le E[Y] \le E[\psi_U]$, there exists a unique $F^* \in \mathcal{F}_B$ such that $\delta_{F^*} = 0$. First, suppose that $F_b \ne F_{b'}$ and, without loss of generality, $b > b'$. Then $\psi_b \ge \psi_{b'}$, implying that $F_b(y) \le F_{b'}(y)$ for all $y$. Moreover, the inequality is strict for at least one $y$. As a result, $E(\psi_b) > E(\psi_{b'})$. In other words, there is at most one $F^* \in \mathcal{F}_B$ such that $\delta_{F^*} = 0$. If $E[\psi_L] = E[Y]$ or $E[\psi_U] = E[Y]$, such a solution also exists by taking $b = -\infty$ and $b = \infty$, respectively. Now, suppose that $E[\psi_L] < E[Y] < E[\psi_U]$. For all $\infty > b > b' > -\infty$,
$$
\psi_b - \psi_{b'} = \big( \psi_U - \max(\psi_L, b') \big)\, 1\{\psi_U \in [b', b)\} + (b - b')\, 1\{\psi_L < b',\ \psi_U \ge b\} + (b - \psi_L)\, 1\{\psi_L \in [b', b),\ \psi_U \ge b\}.
$$
As a result, $|\psi_b - \psi_{b'}| \le |b - b'|$. This implies that $\tilde{\delta} : b \mapsto E[\psi_b]$ is continuous. Moreover, $\lim_{b \to -\infty} \tilde{\delta}(b) = E[\psi_L] < E(Y)$ and $\lim_{b \to \infty} \tilde{\delta}(b) = E[\psi_U] > E(Y)$. By the intermediate value theorem, there exists $b^*$ such that $\tilde{\delta}(b^*) = E(Y)$. Hence, there exists $F^* \in \mathcal{F}_B$ such that $\delta_{F^*} = 0$. The first part of Proposition 8 follows.

Let us turn to the second part of the proposition. First, if (ii) holds, there exists $b_0 \in \overline{\mathbb{R}}$ such that $F^* = F_{b_0}$. Then, by construction and Theorem 1, $Y$ and $\psi_{b_0}$ satisfy $H_0$. Moreover, $F_{b_0} \in [F_{\psi_U}, F_{\psi_L}]$. Therefore, $H_{0B}$ holds as well.

Now, let us prove that (i) implies (ii). Let us denote by $\mathcal{D}$ the set of all the cdfs for $\psi$ such that $H_{0B}$ holds. By Theorem 1, these are cdfs $F$ satisfying $F_{\psi_U} \le F \le F_{\psi_L}$, $\delta_F = 0$ and dominating at the second order $F_Y$. We show below that all $F \in \mathcal{D}$ are dominated at the second order by $F^*$.
Then, because $F_{\psi_U} \le F^* \le F_{\psi_L}$ and $\int y\, dF^*(y) = \int y\, dF_Y(y)$, $\mathcal{D}$ is not empty only if $F^*$ dominates at the second order $F_Y$. The result then follows by Theorem 1.

Thus, we have to show that for all $t \in \mathbb{R}$,
$$
F^* = \arg\min_{F_\psi \in \mathcal{D}} \int_{-\infty}^t F_\psi(y)\, dy. \qquad (17)
$$
First, if $F^* = F_{-\infty}$, we have for all $F \ne F^*$, $F(y) \le F_{\psi_L}(y) = F^*(y)$ for all $y$, with strict inequality for some $y$. Then $\delta_F > \delta_{F^*} = 0$ and $\mathcal{D} = \{F^*\}$, implying that (17) holds. Similarly, (17) holds if $F^* = F_\infty$.

Suppose now that $F^* = F_{b_0}$ for some $b_0 \in \mathbb{R}$. Because $F_{\psi_U}(y) \le F_\psi(y)$ for all $y < b_0$ and all $F_\psi \in \mathcal{D}$, (17) holds for all $t < b_0$. We now prove that (17) holds also for $t \ge b_0$. First suppose that $t \ge \max(b_0, 0)$. For all $F_\psi \in \mathcal{D}$, $\int y\, dF_Y(y) = \int y\, dF_\psi(y)$. As a result, by Fubini's theorem,
$$
-\int_{-\infty}^0 F^*(y)\, dy + \int_0^t (1 - F^*(y))\, dy + \int_t^\infty (1 - F^*(y))\, dy
= -\int_{-\infty}^0 F_\psi(y)\, dy + \int_0^t (1 - F_\psi(y))\, dy + \int_t^\infty (1 - F_\psi(y))\, dy.
$$
Because $F_\psi \le F_{\psi_L} = F^*$ on $[b_0, \infty)$, this implies that
$$
-\int_{-\infty}^0 F^*(y)\, dy + \int_0^t (1 - F^*(y))\, dy \ge -\int_{-\infty}^0 F_\psi(y)\, dy + \int_0^t (1 - F_\psi(y))\, dy
$$
and thus (17) holds for $t \ge \max(b_0, 0)$. Now, if $b_0 < 0$ and $t \in (b_0, 0)$, we have similarly
$$
-\left( \int_{-\infty}^t F^*(y)\, dy + \int_t^0 F^*(y)\, dy \right) + \int_0^\infty (1 - F^*(y))\, dy
= -\left( \int_{-\infty}^t F_\psi(y)\, dy + \int_t^0 F_\psi(y)\, dy \right) + \int_0^\infty (1 - F_\psi(y))\, dy.
$$
Using again $F_\psi \le F_{\psi_L} = F^*$ on $[t, \infty)$ yields
$$
-\int_t^0 F^*(y)\, dy + \int_0^\infty (1 - F^*(y))\, dy \le -\int_t^0 F_\psi(y)\, dy + \int_0^\infty (1 - F_\psi(y))\, dy.
$$