New goodness-of-fit diagnostics for conditional discrete response models
Igor Kheifets∗ and Carlos Velasco†

September 19, 2018
Abstract
This paper proposes new specification tests for conditional models with discrete responses, which are key to apply efficient maximum likelihood methods, to obtain consistent estimates of partial effects and to get appropriate predictions of the probability of future events. In particular, we test the static and dynamic ordered choice model specifications and can cover infinite support distributions, e.g. for count data. The traditional approach to specification testing of discrete response models is based on probability integral transforms of jittered discrete data, which leads to continuous uniform iid series under the true conditional distribution. Standard specification testing techniques for continuous variables can then be applied to the transformed series, but the extra randomness from the jitters affects the power properties of these methods. We investigate in this paper an alternative transformation based only on the original discrete data that avoids any randomization. We analyze the asymptotic properties of goodness-of-fit tests based on this new transformation and explore the finite-sample properties of a bootstrap algorithm to approximate the critical values of the test statistics, which are model and parameter dependent. We show analytically and in simulations that our approach dominates the methods based on randomization in terms of power. We apply the new tests to models of the monetary policy conducted by the Federal Reserve.
Keywords: Specification tests, count data, dynamic discrete choice models, conditional probability integral transform.
JEL classification: C12, C22, C52.

∗ ITAM, Mexico. Email: [email protected]
† Department of Economics, Universidad Carlos III de Madrid. Email: [email protected]

1 INTRODUCTION
Many statistical models specify the conditional distribution of a discrete response variable given some explanatory variables, including the description of binary, multinomial, ordered choice and count data. In this paper we analyze goodness-of-fit tests for both static models with covariates and dynamic ordered choice and count data models, where the conditioning information set may also include past information on the discrete variable and a set of (contemporaneous) explanatory variables which frequently appear in the social sciences, see Kedem and Fokianos (2002) and Greene and Hensher (2010). For example, dynamic models are popular in macroeconomic applications, see for instance Hamilton and Jordá (2002), Dolado and María-Dolores (2002) and Basu and de Jong (2007) for modeling central banks' decisions, or Kauppi and Saikkonen (2008) and Startz (2008) for predicting US recessions; in finance, see e.g. Rydberg and Shephard (2003) for modeling the size of asset price movements and Fokianos et al. (2009) for the number of transactions per minute of a particular stock.

Suppose we observe the random variables $\{Y_t, X_t'\}_{t=1}^T$ and consider the information sets $\Omega_t = \{X_t, Y_{t-1}, X_{t-1}, Y_{t-2}, X_{t-2}, \ldots\}$ for each period $t = 1, 2, \ldots, T$. We are interested in testing the null hypothesis that the distribution of $Y_t$ conditional on $\Omega_t$ is in the parametric family $F_{t,\theta}(\cdot \mid \Omega_t)$, i.e.

$$H_0: \; Y_t \mid \Omega_t \sim F_{t,\theta_0}(\cdot \mid \Omega_t) \ \text{ for some } \theta_0 \in \Theta, \quad t = 1, 2, \ldots, T,$$

where $\Theta \subset \mathbb{R}^m$ is the parameter space, while the alternative hypothesis ($H_1$) for the omnibus test would be the negation of $H_0$.

We consider a class $\mathcal{M}$ of discrete conditional distributions defined on $\mathcal{K} = \{1, 2, \ldots, K\}$ for integer $K > 1$, or $\mathcal{K} = \{1, 2, \ldots, \infty\}$, such that for all $F \in \mathcal{M}$ it holds that $F(0) = 0$, $f(k) := F(k) - F(k-1) \geq 0$, $k = 1, 2, \ldots$, and $\sum_{k \in \mathcal{K}} f(k) = 1$.
This setup includes numerous models that have been used extensively in applied work, both for dynamic and for iid data; here we describe briefly two of them.

Example 1 (Dynamic multinomial ordered choice model). The discrete responses $Y_t$ are assumed to be generated by the rule

$$Y_t = \begin{cases} 1 & \text{if } V_t^* \leq \tau_1, \\ 2 & \text{if } \tau_1 < V_t^* \leq \tau_2, \\ \;\vdots & \\ K & \text{if } V_t^* > \tau_{K-1}, \end{cases}$$

where $V_t^*$ is a continuous latent variable and $\tau_1, \ldots, \tau_{K-1}$ are threshold parameters that define $K$ intervals in $\mathbb{R}$. In a simple model, e.g. Basu and de Jong (2007), the latent variable is determined through the linear equation

$$V_t^* = X_t'\beta + \rho Y_{t-1} + \varepsilon_t,$$

where $X_t$ is a vector of stationary exogenous regressors, $\beta$ a vector of regression parameters, $\varepsilon_t$ is the shock in each period, and $Y_{t-1}$ could be replaced by any function of the past $\{Y_{t-1}, \ldots, Y_{t-n}\}$ for some finite $n$. The cdf of $\varepsilon_t$, $F_\varepsilon$, determines the class of multinomial model, i.e. ordered multinomial probit (if $\varepsilon_t$ is standard normal) or logit (if $\varepsilon_t$ is logistic), since $F_{t,\theta}$ is defined at once from

$$\Pr(Y_t = k \mid \Omega_t) = \Pr(\tau_{k-1} < V_t^* \leq \tau_k \mid \Omega_t) = F_\varepsilon(\tau_k - X_t'\beta - \rho Y_{t-1}) - F_\varepsilon(\tau_{k-1} - X_t'\beta - \rho Y_{t-1}),$$

with $\tau_0 = -\infty$ and $\tau_K = \infty$, and $\theta = (\beta', \rho, \tau_1, \ldots, \tau_{K-1})'$.

Example 2 (Poisson model). The variate $Y_t = Y_t^* + 1$ is defined on the counts $Y_t^* = 0, 1, 2, \ldots$, which are assumed to follow a conditional Poisson distribution

$$Y_t^* \mid \Omega_t \sim \text{Poisson}(\lambda_t),$$

where the conditional mean can depend on covariates through an exponential link as $\lambda_t = \exp(X_t'\beta)$, or on previous observations through an identity link as $\lambda_t = \alpha_0 + \alpha_1 \lambda_{t-1} + \rho Y_{t-1}^*$, e.g. Fokianos et al. (2009), or through the logarithmic canonical link as $\log(\lambda_t) = X_t'\beta + \rho e_{t-1}$, where $e_t = (Y_t^* - \lambda_t)/\lambda_t$ are scaled and centered errors, e.g. Davis et al. (2003).
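As a concrete illustration of Example 1, the following sketch simulates a dynamic ordered probit with $K = 3$ categories and evaluates the implied conditional pmf. The parameter values, the single standard normal regressor and the initial category are illustrative choices, not taken from the paper; only numpy and the standard library are assumed.

```python
import numpy as np
from math import erf, sqrt, inf

def Phi(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def simulate_ordered_probit(T, beta, rho, tau, rng, y0=1):
    """Simulate Example 1: V*_t = X_t*beta + rho*Y_{t-1} + eps_t with
    standard normal errors; Y_t = 1 + #{thresholds tau_j < V*_t}."""
    X = rng.standard_normal(T)               # one exogenous regressor
    Y = np.empty(T, dtype=int)
    y_prev = y0
    for t in range(T):
        v = beta * X[t] + rho * y_prev + rng.standard_normal()
        Y[t] = 1 + np.searchsorted(tau, v)   # counts tau_j < v
        y_prev = Y[t]
    return X, Y

def ordered_probit_pmf(x, y_prev, beta, rho, tau):
    """Pr(Y_t = k | Omega_t), k = 1..K, from the displayed formula
    with tau_0 = -inf, tau_K = +inf and F_eps = Phi."""
    cuts = [-inf] + list(tau) + [inf]
    cdf = np.array([Phi(c - beta * x - rho * y_prev) for c in cuts])
    return np.diff(cdf)                      # length K, sums to one

rng = np.random.default_rng(0)
X, Y = simulate_ordered_probit(500, beta=0.5, rho=0.3,
                               tau=np.array([0.5, 2.0]), rng=rng)
```

Note that the recursion uses the realized $Y_{t-1}$, so the simulated series is genuinely dynamic, matching the information sets $\Omega_t$ above.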
Despite the fact that a correct specification is key to apply efficient maximum likelihood methods, to obtain consistent estimates of partial effects and to get appropriate predictions of the probability of future events, empirical researchers typically do not perform goodness-of-fit testing of such models as they would do in a continuous case. In general, there are only a few specification tests available for discrete data, see Mora and Moro-Egido (2007). Two of them, the test of the Generalized Linear Model (GLM) of Stute and Zhu (2002) and the conditional Kolmogorov test of Andrews (1997), based on the specification of the conditional mean for binary data, can be adapted for this purpose, and we discuss this possibility and compare it to our approach in Section 6. A test related to Andrews', derived for time series by Corradi and Swanson (2006), could also be adapted for discrete data, but it tests a different null hypothesis, concerning a distribution given a finite conditioning set not characterizing the complete dynamics of the process. There are also tests designed specifically for Poisson models (see e.g. Neumann, 2011; Fokianos and Neumann, 2013).

In what follows we propose conditional, dynamic discrete analogs of the Kolmogorov-Smirnov goodness-of-fit measure that can exploit different restrictions derived from the martingale difference property of a particular transformation of the data under the null hypothesis. This property is derived from the specification of a complete dynamic model given the information set generated by all the past observations of the discrete response and other explanatory variables, and is used to build the asymptotic theory for our tests. Under i.i.d.
assumptions, this martingale difference property leads to exact independence of the transformation sequence under the null and a much simpler parallel asymptotic theory.

When the fitted distribution is continuous, the relative distribution of $Y_t$ compared to $F_{t,\theta}$, defined as the cdf of Rosenblatt's (1952) transforms, also called conditional Probability Integral Transforms (PIT),

$$U_t(\theta) := F_{t,\theta}(Y_t \mid \Omega_t), \quad t = 1, 2, \ldots, T,$$

is standard uniform, and the $U_t(\theta)$ are distributed as independent
$U[0,1]$ uniform random variables under $H_0$. This serves as a basis for several specification tests of $H_0$, see e.g. Bai (2003) and Kheifets (2015) for dynamic models and Delgado and Stute (2008) for independent and identically distributed (iid) data. However, the Rosenblatt transformation is not appropriate for discrete support random variables, producing non-iid pseudo residuals even under the null of correct specification. To solve the limitations of PIT-based testing techniques for discrete data, several alternative transforms have been proposed, see Jung, Kukuk and Liesenfeld (2006), Czado, Gneiting and Held (2009) and references therein. An easy and popular way is to randomize, i.e. to interpolate the discrete values of $Y_t$ with independent noise in $[0,1]$. Instead, we consider the nonrandomized transform of $Y_t$, $I_{t,\theta}(u)$ for $u \in [0,1]$,

$$I_{t,\theta}(u) := \begin{cases} 0, & u \leq U_t^-(\theta); \\[2pt] \dfrac{u - U_t^-(\theta)}{U_t(\theta) - U_t^-(\theta)}, & U_t^-(\theta) \leq u \leq U_t(\theta); \\[2pt] 1, & U_t(\theta) \leq u, \end{cases} \tag{1}$$

where $U_t^-(\theta) := F_{t,\theta}(Y_t - 1 \mid \Omega_t)$. This transform, conditional on the data, is nonrandomized in the sense that it does not depend on extra sources of randomness, as opposed to the interpolation transforms discussed in the next section. The unconditional version of this transform appears in Handcock and Morris (1999) and more recently in Czado, Gneiting and Held (2009), where it is used for calibration, but no formal tests are proposed there. This transformation can also be seen as a particular case of the multilinear extension as defined in Genest, Nešlehová and Rémillard (2014). As we show below, for every $u \in [0,1]$, $I_{t,\theta}(u) - u$ constitutes a martingale difference sequence (MDS) with respect to $\Omega_t$ under $H_0$ and can be used for testing $H_0$, as $I_{t,\theta}(u)$ loses this property when the model is misspecified.
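In code, the transform (1) is just a clipped linear interpolation between $U_t^-(\theta)$ and $U_t(\theta)$. The sketch below (numpy assumed; `poisson_cdf_grid` is our own helper, computing the Poisson cdf by direct summation) also illustrates numerically the mean property $E[I_{t,\theta}(u) \mid \Omega_t] = u$, which holds exactly for any fixed discrete distribution:

```python
import numpy as np

def nonrandomized_transform(u, U_minus, U):
    """I_{t,theta}(u) from (1): 0 for u <= U^-_t, linear in between,
    1 for u >= U_t.  U_minus = F(Y_t - 1 | Omega_t), U = F(Y_t | Omega_t);
    requires U > U_minus (positive-probability cells)."""
    return np.clip((u - U_minus) / (U - U_minus), 0.0, 1.0)

def poisson_cdf_grid(lam, kmax):
    """F(-1), F(0), ..., F(kmax) for Poisson(lam), by pmf recursion."""
    pmf = [np.exp(-lam)]
    for k in range(1, kmax + 1):
        pmf.append(pmf[-1] * lam / k)
    return np.concatenate(([0.0], np.cumsum(pmf)))

# Exact check of E[I(u)] = u: sum f(k) * I(u) over the support.
F = poisson_cdf_grid(3.0, 30)
f = np.diff(F)                         # pmf on 0..30
u = 0.37
I_by_cell = nonrandomized_transform(u, F[:-1], F[1:])
# np.dot(f, I_by_cell) equals 0.37 up to truncation/rounding error
```

The identity holds because cells entirely below $u$ contribute $f(k)$, the cell containing $u$ contributes $u - F(k-1)$, and cells above contribute zero, so the sum telescopes to $u$.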
For instance, we can compute the pseudo empirical relative distribution of $Y_t$ compared to $F_{t,\theta}$,

$$\widetilde{F}_\theta(u) := \frac{1}{T}\sum_{t=1}^T I_{t,\theta}(u), \quad u \in [0,1],$$

and the standardized empirical process

$$S_{1T}(u) := \frac{1}{T^{1/2}}\sum_{t=1}^T \{I_{t,\theta}(u) - u\} = T^{1/2}\left(\widetilde{F}_\theta(u) - u\right),$$

which converges weakly to a Gaussian process. In addition, in order to control dynamics in $I_{t,\theta}(u)$, we can compare the joint pseudo empirical cdf with the uniform on a square using the biparameter process

$$S_{2T}(\mathbf{u}) := \frac{1}{(T-1)^{1/2}}\sum_{t=2}^T \{I_{t,\theta}(u_1) I_{t-1,\theta}(u_2) - u_1 u_2\}, \tag{2}$$

where $\mathbf{u} = (u_1, u_2)$. To obtain feasible tests we need to consider norms of $S_{jT}$ for $j = 1, 2$: Cramér-von Mises norms $\int S_{jT}(u)^2\, d\varphi(u)$ for some absolutely continuous measure $\varphi$ on $[0,1]^j$, or Kolmogorov-Smirnov norms $\sup_{u \in [0,1]^j} |S_{jT}(u)|$.

When the parameter $\theta_0$ is unknown under the null, we use an estimate $\widehat{\theta}_T$ and account for the parameter estimation effect in the p-value computation with a parametric bootstrap method. It might also be possible to derive, e.g., martingale distribution-free transforms, but since they typically need to be programmed on a case-by-case basis for each model, they can be impractical and are beyond the scope of this paper. As far as we know, our proposal is the first formal specification test of ordered discrete choice models which accounts properly for parameter uncertainty and is based on a nonrandomized transform, which makes it attractive in terms of power against a wide set of alternative hypotheses.

The rest of the paper is organized as follows. In the next section, we describe different alternatives to the PIT. In Sections 3 and 4, we provide the main asymptotic properties of the nonrandomized transforms and of the resulting univariate and bivariate empirical processes using martingale theory. In particular, we establish weak limits under fixed and local alternatives accounting for the parameter estimation effect. Section 5 discusses the implementation of the new tests with a simple bootstrap algorithm.
Section 6 provides a small simulation exercise and an application exploring the properties of specification tests based on both randomized and nonrandomized transformations. Then we conclude. All proofs are contained in the Appendix.

2 ALTERNATIVES TO PIT FOR DISCRETE DATA
In order to further motivate the nonrandomized transform $I_{t,\theta}$ defined in (1), we introduce the randomized PIT,

$$U_t^r(\theta) := U_t^-(\theta) + Z_t^U\left(U_t(\theta) - U_t^-(\theta)\right), \tag{3}$$

where $\{Z_t^U\}_{t=1}^T$ are independent standard uniform random variables, independent of $Y_t$. Alternatively, $U_t^r$ can be obtained by applying the standard continuous PIT to the continuous random variable $Y_t^\dagger := Y_t - Z_t$, where $\{Z_t\}_{t=1}^T$ are iid with any continuous cdf $F_Z$ on $[0,1]$. Compute the conditional cdf of $Y_t^\dagger$,

$$F_{t,\theta}^\dagger(y \mid \Omega_t) = F_{t,\theta}(\lfloor y \rfloor \mid \Omega_t) + F_Z(y - \lfloor y \rfloor)\left(F_{t,\theta}(\lfloor y+1 \rfloor \mid \Omega_t) - F_{t,\theta}(\lfloor y \rfloor \mid \Omega_t)\right),$$

where $\lfloor y \rfloor$ is the floor function, i.e. the maximum integer not exceeding $y$, and find that

$$U_t^r(\theta) = F_{t,\theta}^\dagger\left(Y_t^\dagger \mid \Omega_t\right),$$

for $Z_t^U = F_Z(Z_t)$ and any choice of $F_Z$, see Kheifets and Velasco (2013). Note that the cdfs of $Y_t^\dagger$ conditional on $\Omega_t$ and on $\{\Omega_t, Z_{t-1}, Z_{t-2}, \ldots, Z_1\}$ coincide. Under $H_0$, the $U_t^r(\theta)$ are iid
$U[0,1]$ variables, as under any continuous distribution specification, while $U_t(\theta)$ and $U_t^-(\theta)$ are neither independent nor $U[0,1]$ distributed. The cdf of $U_t^r(\theta)$, estimated using the randomized transform of $Y_t$, $1\{U_t^r(\theta) \leq u\}$,

$$\widehat{F}_\theta^r(u) := \frac{1}{T}\sum_{t=1}^T 1\{U_t^r(\theta) \leq u\}, \quad u \in [0,1],$$

can be compared to the uniform cdf. Kheifets and Velasco (2013) then test $H_0$ using an empirical process based on the randomized transform,

$$R_{1T}(u) := T^{1/2}\left\{\widehat{F}_\theta^r(u) - u\right\} = \frac{1}{T^{1/2}}\sum_{t=1}^T\left[1\{U_t^r(\theta) \leq u\} - u\right], \quad u \in [0,1].$$

We can also consider reducing the dependence on a particular outcome of the noise $Z_t^U$ in (3) and in the randomized transform by taking averages over $M$ replications of $\{Z_t^U\}_{t=1}^T$, conditional on the original data, similar to the "average-jittering" of Machado and Santos Silva (2005). Suppose that for each $t$ we have $M$ independent sequences of uniform
$U[0,1]$ noises $Z_{t,m}^U$, $m = 1, 2, \ldots, M$, which generate $U_{t,m}^r(\theta)$ according to (3). Define the M-randomized transform of $Y_t$, $I_{t,\theta,M}(Y_t, u)$,

$$I_{t,\theta,M}(Y_t, u) := \frac{1}{M}\sum_{m=1}^M 1\left\{U_{t,m}^r(\theta) \leq u\right\},$$

which takes values in the set $\{0, 1/M, 2/M, \ldots, 1\}$ and has mean $u$ under $H_0$. Then the cdf of $U_t^r(\theta)$ is estimated by

$$\widehat{F}_{\theta,M}^r(u) := \frac{1}{T}\sum_{t=1}^T I_{t,\theta,M}(Y_t, u), \quad u \in [0,1].$$

Note that with $M = 1$ we are back to $\widehat{F}_\theta^r(u)$, and therefore we can generalize $R_{1T}$ to

$$R_{1T,M}(u) := T^{1/2}\left\{\widehat{F}_{\theta,M}^r(u) - u\right\}, \quad u \in [0,1].$$

In order to propose specification tests, following Handcock and Morris (1999), we define the discrete relative distribution of $Y_t$ compared to $F_{t,\theta}$ as the cdf of $U_t^r(\theta)$. Under $H_0$, the discrete relative distribution is the uniform $U[0,1]$. The estimates of the relative distribution of $Y_t$ compared to $F_{t,\theta}$ can be ordered in terms of efficiency in the following way: $\widetilde{F}_\theta(u)$ (the most efficient), $\widehat{F}_{\theta,M}^r(u)$ and $\widehat{F}_\theta^r(u)$. This ordering is determined by the amount of noise introduced in the definitions of the transforms, i.e. in the nonrandomized, M-randomized and (1-)randomized transforms. The nonrandomized transform can be equivalently obtained by integrating out the extra noise in the randomized transform, $I_{t,\theta}(Y_t, u) = \int 1\{U_t^r(\theta) \leq u\}\, dF_Z$, or by taking the number of replications $M$ to infinity, thus completely removing the noise from the estimate of the discrete relative distribution and other functionals of the transforms. The efficiency of the nonrandomized transform translates into the increased power of the specification tests based on this transform, whose properties we study next.

3 PROPERTIES OF EMPIRICAL PROCESSES BASED ON THE NONRANDOMIZED TRANSFORM
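The relation between the randomized, M-randomized and nonrandomized transforms can be made concrete in code: averaging the jittered indicator $1\{U_{t,m}^r(\theta) \leq u\}$ over $m$ recovers $I_{t,\theta}(u)$ as $M \to \infty$. A minimal numpy sketch, where the values of $U_t^-(\theta)$ and $U_t(\theta)$ for a single observation are illustrative:

```python
import numpy as np

def randomized_pit(U_minus, U, rng):
    """Equation (3): jitter uniformly between F(Y_t - 1 | .) and F(Y_t | .)."""
    Z = rng.uniform(size=np.shape(U))
    return U_minus + Z * (U - U_minus)

def m_randomized_transform(u, U_minus, U, M, rng):
    """I_{t,theta,M}(Y_t, u): average of M jittered indicators."""
    Ur = np.stack([randomized_pit(U_minus, U, rng) for _ in range(M)])
    return (Ur <= u).mean(axis=0)

# One observation with F(Y-1|Omega) = 0.2 and F(Y|Omega) = 0.6:
U_minus, U, u = np.array([0.2]), np.array([0.6]), 0.5
exact = np.clip((u - U_minus) / (U - U_minus), 0, 1)   # nonrandomized: 0.75
rng = np.random.default_rng(1)
approx = m_randomized_transform(u, U_minus, U, M=20000, rng=rng)
# approx is close to exact, with O(M^{-1/2}) residual jitter noise
```

This mirrors the ordering in the text: the jitter noise that inflates the variance of $\widehat{F}_\theta^r$ shrinks at rate $M^{-1}$ in $\widehat{F}_{\theta,M}^r$ and vanishes in $\widetilde{F}_\theta$.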
As shown in the next lemma, the building blocks of $\widetilde{F}_\theta(u)$, namely $I_{t,\theta}(u) - u$, constitute a martingale difference sequence (MDS) with respect to $\Omega_t$, and therefore $\widetilde{F}_\theta(u)$ is an unbiased and consistent estimate of the uniform cdf under the null, a reasonable basis for developing tests of $H_0$. Moreover, the MDS property will allow us to establish the asymptotic properties of our tests without imposing any additional restrictions. Let, for $u, v \in [0,1]$,

$$\gamma_{t,\theta}(u,v) := \frac{(F_k - u \vee v)(u \wedge v - F_{k-1})}{F_k - F_{k-1}}\, 1\left\{F_{t,\theta}^-(u \mid \Omega_t) = F_{t,\theta}^-(v \mid \Omega_t)\right\},$$

where $k = k(u) = F_{t,\theta}^-(u \mid \Omega_t)$, with $F_{t,\theta}^-(u \mid \Omega_t) := \min\{y : F_{t,\theta}(y \mid \Omega_t) \geq u\}$ being the conditional quantile function and $F_k := F_{t,\theta}(k \mid \Omega_t)$.

Lemma 1.
Under $H_0$, $I_{t,\theta}(u) - u$ is a martingale difference sequence with respect to $\Omega_t$, i.e. $E[I_{t,\theta}(u) \mid \Omega_t] = u$ a.s., with conditional covariance

$$\mathrm{Cov}[I_{t,\theta}(u), I_{t,\theta}(v) \mid \Omega_t] = u \wedge v - uv - \gamma_{t,\theta}(u,v), \quad \text{a.s.}$$

Note that the $I_{t,\theta}(u)$ are not necessarily independent across $t$, despite the fact that, by the martingale difference property, $I_{t,\theta}(u)$ and $I_{t-j,\theta}(v)$ are serially uncorrelated for all $j \neq 0$ and all $u, v \in [0,1]$, see the Appendix. On the other hand, the $I_{t,\theta}(u)$ are (conditionally) heteroskedastic; therefore the variance of $S_{1T}$ is model and parameter dependent, but its distribution can be simulated conditional on exogenous information in $\Omega_t$. Let $V_{1T}(u,v) := \mathrm{Cov}[S_{1T}(u), S_{1T}(v)]$; then, since $0 \leq \gamma_{t,\theta}(u,v) < 1$ a.s.,

$$V_{1T}(u,v) = u \wedge v - uv - \frac{1}{T}\sum_{t=1}^T E\left[\gamma_{t,\theta}(u,v)\right] \leq u \wedge v - uv,$$

so the variances of $S_{1T}$ are not larger than those of the randomized transformation-based process $R_{1T}$ or of its weak limit, the Brownian sheet, see Corollary 4 in Kheifets and Velasco (2013).

Due to Lemma 1, $E[\widetilde{F}_\theta(u)] = u$ under $H_0$, and the natural empirical process for performing tests on $H_0$ is then $S_{1T}$. This process, being based on a nonrandomized transform, does not involve the extra noise that appears in the randomized transform-based empirical process $R_{1T}$ for testing $U_t^r \sim U[0,1]$, nor the reduced noise remaining in $R_{1T,M}$ based on the M-randomized transform. The next lemma is key to understanding the improvement of the M-randomized over the randomized, and of the nonrandomized, advocated in this paper, over the M-randomized transform approaches.

Lemma 2.
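Lemma 1 can be checked by exact summation over a small discrete distribution: the covariance $E[I(u)I(v)] - uv$ computed directly from the pmf should match $u \wedge v - uv - \gamma_{t,\theta}(u,v)$. A numerical sketch, using an assumed three-point distribution purely for illustration:

```python
import numpy as np

def check_lemma1(pmf, u, v):
    """Compare the direct Cov[I(u), I(v)] with u^v - uv - gamma(u, v)
    for a fixed discrete distribution with the given pmf on {1,..,K}."""
    F = np.concatenate(([0.0], np.cumsum(pmf)))     # F(0), F(1), ..., F(K)
    I_u = np.clip((u - F[:-1]) / pmf, 0.0, 1.0)     # I(u) for each outcome k
    I_v = np.clip((v - F[:-1]) / pmf, 0.0, 1.0)
    direct = np.dot(pmf, I_u * I_v) - u * v         # Cov, since E[I(u)] = u
    k_u = int(np.searchsorted(F[1:], u))            # cell index of F^-(u)
    k_v = int(np.searchsorted(F[1:], v))
    gamma = 0.0
    if k_u == k_v:                                  # u, v in the same cell
        k = k_u
        gamma = (F[k + 1] - max(u, v)) * (min(u, v) - F[k]) / pmf[k]
    formula = min(u, v) - u * v - gamma
    return direct, formula

d, f = check_lemma1(np.array([0.3, 0.5, 0.2]), u=0.35, v=0.55)
# d and f agree (both equal 0.1325 here, up to rounding)
```

When $u$ and $v$ fall in different cells, $\gamma$ vanishes and the covariance equals the Brownian-bridge value $u \wedge v - uv$, which is the variance-reduction mechanism the text describes.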
Suppose that the uniform law of large numbers holds for $\widehat{F}_{\theta,M}^r(u)$ and $\widetilde{F}_\theta(u)$. Independently of whether $H_0$ holds or not, $\widehat{F}_{\theta,M}^r(u)$ and $\widetilde{F}_\theta(u)$ consistently and uniformly in $u$ estimate the relative distribution, i.e. the cdf of $U_t^r(\theta)$. $\widetilde{F}_\theta(u)$ is more efficient, but the difference in efficiency goes to $0$ as $M \to \infty$. In particular, under $H_0$,

$$E[R_{1T,M}(u) R_{1T,M}(v)] = \frac{1}{M} E[R_{1T}(u) R_{1T}(v)] + \left(1 - \frac{1}{M}\right) E[S_{1T}(u) S_{1T}(v)].$$

From Lemma 2 it follows that $S_{1T}$ has the smallest variance; the variance of $R_{1T,M}$ is a weighted sum of those of $S_{1T}$ and $R_{1T}$, see also Equation (5) in Machado and Santos Silva (2005). Other advantages of $S_{1T}$ over $R_{1T,M}$ are 1) computational, as there is no need to simulate $M$ paths of transformations, and 2) theoretical, since weak convergence is easier to prove for processes which are piece-wise linear in parameters. Therefore we concentrate on studying the properties of tests based on the nonrandomized transform, for which we introduce the following assumption.

Assumption 1. $F_{t,\theta_0}(\cdot \mid \Omega_t) \in \mathcal{M}$ a.s. for all $t$. Moreover, there exists a finite function $\gamma_\infty(u,v)$ such that, uniformly in $(u,v) \in [0,1]^2$, $T^{-1}\sum_{t=1}^T \gamma_{t,\theta_0}(u,v) \to_p \gamma_\infty(u,v)$.

This assumption implicitly restricts the dynamics such that a uniform law of large numbers (LLN) holds for the averaged conditional covariance function. In the case of stationary and ergodic data, $\gamma_\infty(u,v) = E[\gamma_{1,\theta_0}(u,v)]$. Sufficient conditions for the stationarity and ergodicity of dynamic multinomial ordered choice models are given in Basu and de Jong (2007), and for autoregressive Poisson models in Davis et al. (2003), Fokianos et al. (2009) and Doukhan et al. (2012). Then it is possible to show the uniformity of the convergence from a point-wise result, since the summands are continuous, piece-wise polynomials in $u$ and $v$.
As an illustration, in Section 8.5 in the Appendix we discuss these assumptions for the Poisson model.

The next result describes the asymptotic distribution of $S_{1T}$ under the null hypothesis. Let $\Rightarrow$ denote weak convergence in $\ell^\infty[0,1]$ and let $V_{1\infty}(u,v) := u \wedge v - uv - \gamma_\infty(u,v)$.

Lemma 3.
Suppose Assumption 1 holds. Under $H_0$, $S_{1T} \Rightarrow S_{1\infty}$, where $S_{1\infty}$ is a Gaussian process on $[0,1]$ with zero mean and covariance function $V_{1\infty}$.

The asymptotic distribution of $S_{1T}$ is model and parameter dependent, and the practical implementation of tests when $\theta_0$ is unknown is discussed in Section 3.2, after presenting a general class of local alternatives to the null of correct specification of the conditional distribution. We next discuss the asymptotic properties of the empirical process $S_{1T}$ under a class of alternative hypotheses that will lead to consistency of the specification tests based on $S_{1T}$ for a wide class of alternatives. We consider the following class of local alternatives to $H_0$:

$$H_{1T}: \; Y_t \mid \Omega_t \sim G_{T,t,\theta_0}(\cdot \mid \Omega_t) \ \text{ for some } \theta_0 \in \Theta,$$

where

$$G_{T,t,\theta_0}(y \mid \Omega_t) = \left(1 - \frac{\delta}{T^{1/2}}\right) F_{t,\theta_0}(y \mid \Omega_t) + \frac{\delta}{T^{1/2}} H_t(y \mid \Omega_t),$$

$0 \leq \delta < T^{1/2}$ and, for all $t$, $H_t(\cdot \mid \Omega_t) \in \mathcal{M}$. When $\delta = 0$, $H_{1T}$ nests $H_0$. Following Kheifets and Velasco (2013), for any discrete distributions $G$ and $F$ in $\mathcal{M}$, with probability functions $g$ and $f$, define

$$d(G, F, u) = G\left(F^-(u)\right) - F\left(F^-(u)\right) - \frac{F(F^-(u)) - u}{f(F^-(u))}\left[g\left(F^-(u)\right) - f\left(F^-(u)\right)\right].$$

Note that $d(G, F, u) = E_G[I_F(Y, u)] - E_F[I_F(Y, u)] = E_G[I_F(Y, u)] - u$, and $d(G, F, u) \equiv 0$ if $G \equiv F$. Under any $G_t(\cdot \mid \Omega_t) \in \mathcal{M}$,

$$\frac{1}{T^{1/2}} E[S_{1T}(u)] = \frac{1}{T}\sum_{t=1}^T E[d(G_t(\cdot \mid \Omega_t), F_{t,\theta}(\cdot \mid \Omega_t), u)].$$

The next assumption guarantees that a LLN can be applied to the empirical discrepancy between $H_t$ and $F_{t,\theta}$.

Assumption 2.
Under $H_{1T}$, there exists a finite function $D_1(u)$ such that, uniformly in $u \in [0,1]$, $\frac{1}{T}\sum_{t=1}^T d(H_t(\cdot \mid \Omega_t), F_{t,\theta_0}(\cdot \mid \Omega_t), u) \to_p D_1(u)$.

Then the following lemma shows that the departure from $H_0$ in the direction of $H_{1T}$ introduces a drift in the asymptotic distribution of $S_{1T}$ that will render consistency of hypothesis tests based on functionals of $S_{1T}$.

Lemma 4.
Suppose Assumptions 1-2 hold. Under $H_{1T}$, $S_{1T} \Rightarrow S_{1\infty} + \delta D_1$, where $S_{1\infty}$ is as in Lemma 3.

In practice, tests based on $S_{1T}$ are unfeasible since $\theta_0$ is unknown and has to be estimated, by $\widehat{\theta}_T$ say. We assume that we have available an estimate $\widehat{\theta}_T$ such that under $H_{1T}$, $T^{1/2}(\widehat{\theta}_T - \theta_0) = O_p(1)$, and define

$$\widehat{S}_{1T}(u) := \frac{1}{T^{1/2}}\sum_{t=1}^T \left\{I_{t,\widehat{\theta}_T}(u) - u\right\}.$$

We next analyze the consequences of replacing $\theta_0$ by $\widehat{\theta}_T$ in $\widehat{S}_{1T}$. Let $\|\cdot\|$ be the Euclidean norm, i.e. for a matrix $A$, $\|A\| = \sqrt{\mathrm{tr}(AA')}$, where $A'$ is the transpose of $A$. For $\varepsilon > 0$, $B(a, \varepsilon)$ is an open ball in $\mathbb{R}^m$ with center at point $a$ and radius $\varepsilon$. For a cdf $F_\theta$ in $\mathcal{M}$ define

$$\nabla(F_\theta, u) := \dot{F}_\theta\left(F_\theta^-(u)\right) - \frac{F_\theta\left(F_\theta^-(u)\right) - u}{f_\theta\left(F_\theta^-(u)\right)}\, \dot{f}_\theta\left(F_\theta^-(u)\right),$$

where $\dot{F}_\theta := (\partial/\partial\theta) F_\theta$ and $\dot{f}_\theta := (\partial/\partial\theta) f_\theta$. We need the following assumptions to analyze the asymptotic properties of $\widehat{S}_{1T}$.

Assumption 3 (Parametric family). (A) The parameter space $\Theta$ is a compact set in a finite-dimensional Euclidean space, $\theta_0 \in \Theta \subset \mathbb{R}^m$. (B) There exists $\delta >$
$0$, such that $F_{t,\theta}(\cdot \mid \Omega_t) \in \mathcal{M}$ for all $t$, $\Omega_t$, $T$ and $\theta \in B(\theta_0, \delta)$. (C) $F_{t,\theta}(k \mid \Omega_t)$ is differentiable with respect to $\theta \in B(\theta_0, \delta)$ and, under $H_{1T}$,

$$\max_t E\left[\max_k \sup_{\theta \in B(\theta_0,\delta)} \left\|\dot{F}_{t,\theta}(k \mid \Omega_t)\right\|\right] \leq M_F < \infty.$$

(D) Under $H_{1T}$, there exists a finite $L_1(u) := \mathrm{plim}_{T\to\infty}\, T^{-1}\sum_{t=1}^T \nabla(F_{t,\theta_0}(\cdot \mid \Omega_t), u)$.

Conditions (A)-(C) about the parametric family of distributions are standard, see e.g. Bai (2003, Assumptions A1-A2). For dynamic ordered choice and Poisson models, the differentiability of the conditional distribution with respect to the parameter is equivalent to the differentiability of the link function. Part (D) guarantees a nice limit behaviour of the average generalized derivative of $I_{t,\theta}$. Conditions for no effect of information truncation can be provided in a similar way to Bai (2003, Assumption A4).

The following lemma provides an expansion of the empirical process with estimated parameters as the sum of the process with known parameters and a random drift describing the parameter estimation effect.

Lemma 5. Suppose Assumptions 1-3 hold and $T^{1/2}(\widehat{\theta}_T - \theta_0) = O_p(1)$. Under $H_{1T}$,

$$\widehat{S}_{1T}(u) = S_{1T}(u) + T^{1/2}\left(\widehat{\theta}_T - \theta_0\right)' \frac{1}{T}\sum_{t=1}^T \nabla(F_{t,\theta_0}(\cdot \mid \Omega_t), u) + o_p(1), \tag{4}$$

uniformly in $u$.

Then continuous functionals of $\widehat{S}_{1T}$ no longer converge to those of $S_{1\infty} + \delta D_1$ under $H_{1T}$, and the estimation effect also has to be taken into account, using the following assumption. Let $Z(\Psi)$ be a normal vector with zero mean and covariance matrix $\Psi$.

Assumption 4 (Parameter estimation). Under $H_{1T}$, the estimator $\widehat{\theta}_T$ admits the asymptotic linear expansion

$$T^{1/2}\left(\widehat{\theta}_T - \theta_0\right) = \delta\xi + \frac{1}{T^{1/2}}\sum_{t=1}^T \ell_t(Y_t, \Omega_t) + o_p(1), \tag{5}$$

where $\xi$ is an $m \times 1$ vector and the $\ell_t$ constitute a martingale difference sequence with respect to $\Omega_t$, such that (A) $E[\ell_t(Y_t, \Omega_t) \mid \Omega_t] = 0$ and $T^{-1}\sum_{t=1}^T E\left[\ell_t(Y_t, \Omega_t)\ell_t(Y_t, \Omega_t)' \mid \Omega_t\right] \to_p \Psi$.
(B) The Lindeberg condition holds: $T^{-1}\sum_{t=1}^T E\left[\|\ell_t(Y_t,\Omega_t)\|^2\, 1\left\{T^{-1/2}\|\ell_t(Y_t,\Omega_t)\| > \varepsilon\right\} \mid \Omega_t\right] \to_p 0$ for all $\varepsilon > 0$. (C) There exists a finite function $W_1(u)$ such that $T^{-1}\sum_{t=1}^T E[I_{t,\theta_0}(u)\,\ell_t(Y_t,\Omega_t) \mid \Omega_t] \to_p W_1(u)$ uniformly in $u$.

In particular, under $H_0$, $\delta\xi = 0$, the estimate $\widehat{\theta}_T$ is centered and $T^{1/2}(\widehat{\theta}_T - \theta_0)$ converges in distribution to $Z(\Psi)$. Assumptions 4(A) and 4(B) hold for the MLE of many popular discrete models, including dynamic probit and logit and general discrete choice models. As an example, consider estimates $\widehat{\theta}_T$ which are asymptotically equivalent to the (conditional) maximum likelihood estimates, i.e.

$$T^{1/2}\left(\widehat{\theta}_T - \theta_0\right) = -B^{-1}\frac{1}{T^{1/2}}\sum_{t=1}^T s_t(Y_t, \Omega_t) + o_p(1),$$

where $s_t(k, \Omega_t) := \dot{f}_{t,\theta_0}(k \mid \Omega_t)/f_{t,\theta_0}(k \mid \Omega_t)$ is the score function and $B$ is a symmetric $m \times m$ positive definite matrix given by the limit of the Hessian,

$$B := \mathrm{plim}_{T\to\infty}\frac{1}{T}\sum_{t=1}^T\sum_{k=1}^K s_t(k, \Omega_t)\,\dot{f}_{t,\theta_0}(k \mid \Omega_t)'.$$

Under $H_{1T}$, $E[s_t(Y_t, \Omega_t) \mid \Omega_t] = \delta T^{-1/2}\sum_{k=1}^K s_t(k, \Omega_t)\, h_t(k \mid \Omega_t)$. Then equation (5) holds with $\xi = -\mathrm{plim}_{T\to\infty} B^{-1} T^{-1}\sum_{t=1}^T\sum_{k=1}^K s_t(k, \Omega_t)\, h_t(k \mid \Omega_t)$ and $\ell_t(Y_t, \Omega_t) = -B^{-1} s_t(Y_t, \Omega_t) + \delta T^{-1/2} B^{-1}\sum_{k=1}^K s_t(k, \Omega_t)\, h_t(k \mid \Omega_t)$, so that the $\ell_t$ are centered conditional on $\Omega_t$.

We can derive the covariance matrix between the process $S_{1T}(u)$ and $T^{1/2}(\widehat{\theta}_T - \theta_0)$ and obtain joint convergence results, so under $H_{1T}$,

$$\left(S_{1T},\, T^{1/2}\left(\widehat{\theta}_T - \theta_0\right)\right) \Rightarrow \left(S_{1\infty} + \delta D_1,\, Z(\Psi) + \delta\xi\right), \tag{6}$$

where the covariance function between $S_{1\infty}$ and $Z(\Psi)$ is $W_1(u)$. We can now state the result on the asymptotic distribution of the empirical process $\widehat{S}_{1T}$ under local alternatives, whose drift is different with respect to the case without estimated parameters.

Theorem 1.
Suppose Assumptions 1-4 hold. Under $H_{1T}$,

$$\widehat{S}_{1T} \Rightarrow \widehat{S}_{1\infty} + \delta\{D_1 + \xi' L_1\},$$

where $\widehat{S}_{1\infty} := S_{1\infty} + Z(\Psi)' L_1$ is a Gaussian process with zero mean and covariance function

$$V_{1\infty}(u,v) + L_1(u)'\Psi L_1(v) + W_1(u)' L_1(v) + W_1(v)' L_1(u).$$

4 EMPIRICAL PROCESSES FOR DYNAMIC SPECIFICATION
Test statistics based on $S_{1T}$, $R_{1T}$ and $R_{1T,M}$ verify that the conditional distribution of $Y_t$ is right on average across all possible $\Omega_t$, so these tests might not capture all sources of misspecification. This issue is raised in Corradi and Swanson (2006), Delgado and Stute (2008) and Kheifets (2015) in relation to testing continuous distributions. However, it is not possible to develop specification tests conditioned on infinite-dimensional values of $\Omega_t$. Instead of truncating $\Omega_t$ or restricting the class of models, we consider $S_{2T}$, a biparameter analog of $S_{1T}$, to control for possible dynamic misspecification. From Lemma 1, since under $H_0$, $I_{t,\theta}(u) - u$ is a MDS, $I_{t,\theta}(u_1)I_{t-1,\theta}(u_2) - u_1 u_2$ is centered around zero, and moreover

$$E[I_{t,\theta}(u_1)I_{t-1,\theta}(u_2) \mid \Omega_{t-1}] = u_1 u_2, \quad \text{a.s.}$$

This motivates us to develop tests based on $S_{2T}$ defined in (2). This process also has zero mean under the null and identifies not only departures from the null derived from deviations of the unconditional expectation of $I_{t,\theta}(u_1)$ from $u_1$, but also departures from a possible failure of the martingale property, whereby $I_{t,\theta}(u_1)$ and $I_{t-1,\theta}(u_2)$ would become correlated. This idea is similar to that exploited in Kheifets (2015) in the context of conditional distribution testing for continuous distributions, where different methods of checking the independence property of the PIT are proposed. Alternative statistics exploiting the lack of correlation at other lags could be proposed, but we expect that low lags are typically more useful for detecting general forms of misspecification. One could also consider a biparameter analog of $R_{1T,M}$, i.e. for some $M = 1, 2, \ldots$,

$$R_{2T,M}(\mathbf{u}) := \frac{1}{(T-1)^{1/2} M}\sum_{t=2}^T\sum_{m=1}^M\left(1\left\{U_{t,m}^r(\theta) \leq u_1\right\} 1\left\{U_{t-1,m}^r(\theta) \leq u_2\right\} - u_1 u_2\right),$$

where $\mathbf{u} = (u_1, u_2) \in [0,1]^2$. In particular, a bivariate analog of $R_{1T}$, $R_{2T}(\mathbf{u}) := R_{2T,1}(\mathbf{u})$, is introduced in Kheifets and Velasco (2013).
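Given a matrix of transform values $I_{t,\theta}(u_g)$ on a grid of $u$ values, both empirical processes and their Kolmogorov-Smirnov norms are a few lines of numpy. The sketch below is a generic implementation under the stated definitions, with the grid choice left to the user:

```python
import numpy as np

def ks_statistics(I, u_grid):
    """Kolmogorov-Smirnov norms of S_1T and of S_2T from equation (2),
    the latter evaluated on the grid x grid square.
    I[t, g] = I_{t,theta}(u_grid[g]); returns (sup|S_1T|, sup|S_2T|)."""
    T = I.shape[0]
    S1 = (I - u_grid).sum(axis=0) / np.sqrt(T)
    # sum over t of I_t(u1) * I_{t-1}(u2), arranged as a (grid x grid) matrix
    cross = I[1:].T @ I[:-1] / np.sqrt(T - 1.0)
    S2 = cross - np.sqrt(T - 1.0) * np.outer(u_grid, u_grid)
    return np.abs(S1).max(), np.abs(S2).max()
```

A quick sanity check: if every $I_{t,\theta}(u_g)$ equals $u_g$ exactly (a degenerate perfectly calibrated case), both statistics are identically zero, since each summand in $S_{1T}$ and $S_{2T}$ vanishes.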
Tests based on $R_{2T}$ and $R_{2T,M}$ involve randomized transforms and therefore suffer from power loss compared to tests based on the nonrandomized transform.

Note that the summands $I_{t,\theta}(u_1)I_{t-1,\theta}(u_2) - u_1 u_2$ of $S_{2T}(\mathbf{u})$ form a martingale difference sequence. This observation will allow us to derive weak convergence of $S_{2T}$ by employing limit theorems for MDS. Properties of $R_{2T}$ were established in Kheifets and Velasco (2013) and could be extended to $R_{2T,M}$. Here we discuss the properties of $S_{2T}$ when we estimate $\theta_0$. In practice we use the process

$$\widehat{S}_{2T}(\mathbf{u}) := \frac{1}{(T-1)^{1/2}}\sum_{t=2}^T\left\{I_{t,\widehat{\theta}_T}(u_1) I_{t-1,\widehat{\theta}_T}(u_2) - u_1 u_2\right\},$$

where we can write, under $H_{1T}$,

$$\widehat{S}_{2T}(\mathbf{u}) = S_{2T}(\mathbf{u}) + T^{1/2}\left(\widehat{\theta}_T - \theta_0\right)'\frac{1}{T}\sum_{t=2}^T \nabla_{2,t}(\mathbf{u}) + o_p(1), \tag{7}$$

uniformly in $\mathbf{u}$, where $\nabla_{2,t}(\mathbf{u}) := I_{t-1,\theta_0}(u_2)\nabla(F_{t,\theta_0}(\cdot \mid \Omega_t), u_1) + u_1\nabla(F_{t-1,\theta_0}(\cdot \mid \Omega_{t-1}), u_2)$, and the asymptotic covariance function is $W_2(\mathbf{u}) := \mathrm{ACov}\left(S_{2T}(\mathbf{u}),\, T^{1/2}(\widehat{\theta}_T - \theta_0)\right)$. To study the asymptotic properties of the biparameter process we introduce the next assumption, which extends Assumption 2.

Assumption 5.
Under $H_{1T}$, there exist finite functions $D_2(\mathbf{u})$ and $L_2(\mathbf{u})$ such that, uniformly in $\mathbf{u}$, (A) $T^{-1}\sum_{t=2}^T\{I_{t-1,\theta_0}(u_2)\, d(H_t(\cdot \mid \Omega_t), F_{t,\theta_0}(\cdot \mid \Omega_t), u_1) + u_1\, d(H_{t-1}(\cdot \mid \Omega_{t-1}), F_{t-1,\theta_0}(\cdot \mid \Omega_{t-1}), u_2)\} \to_p D_2(\mathbf{u})$; (B) $T^{-1}\sum_{t=2}^T \nabla_{2,t}(\mathbf{u}) \to_p L_2(\mathbf{u})$.

Note that the second terms in the definitions of $D_2$ and $L_2$ correspond to $u_1 D_1(u_2)$ and $u_1 L_1(u_2)$ respectively, the equivalents for the single-parameter process $S_{1T}$, but the first ones are new. To state the next result, we need to assume the existence of probabilistic limits of several random functions. For the sake of presentation, we defer the precise statements to the Appendix, see Assumption A.

Theorem 2. Suppose that, in addition to the conditions of Theorem 1, Assumption 5 and Assumption A from the Appendix hold. Under $H_{1T}$, $S_{2T} \Rightarrow S_{2\infty} + \delta D_2$, where $S_{2\infty}$ is a Gaussian process on $[0,1]^2$ with mean zero and covariance function $V_{2\infty}(\mathbf{u}, \mathbf{v})$ defined in the Appendix. Under $H_{1T}$, if parameters are estimated,

$$\widehat{S}_{2T} \Rightarrow \widehat{S}_{2\infty} + \delta\{D_2 + \xi' L_2\},$$

where $\widehat{S}_{2\infty} := S_{2\infty} + Z(\Psi)' L_2$ is a Gaussian process with zero mean and covariance function $V_{2\infty}(\mathbf{u},\mathbf{v}) + L_2(\mathbf{u})'\Psi L_2(\mathbf{v}) + W_2(\mathbf{u})' L_2(\mathbf{v}) + W_2(\mathbf{v})' L_2(\mathbf{u})$.

When $G_t(\cdot \mid \Omega_t)$ is different from $F_{t,\theta}(\cdot \mid \Omega_t)$ such that $D_2$ is non-zero, the test based on $\widehat{S}_{2T}$ has nontrivial power in the direction of $H_{1T}$. In contrast to the univariate case with $S_{1T}$, the first term in the definition of $D_2$ contains correlation with past information and can therefore capture dynamic misspecification when it induces such a correlation, even if the unconditional expectation of $d$, which appears in the second term $u_1 D_1(u_2)$, is zero. This fact is crucial if misspecification occurs in the dynamics and not only in the link function or other static aspects of the model.

To test $H_0$ we consider Cramér-von Mises, Kolmogorov-Smirnov or any other continuous functionals $\eta\left(\widehat{S}_{jT}\right)$ of $\widehat{S}_{jT}$, $j = 1, 2$.
Then consistency properties of specification tests based on $\widehat{S}_{jT}$ can be derived using the discussion in the previous sections by applying the continuous mapping theorem, so we omit the proof of the following result.

Theorem 3.
Suppose that the conditions of Theorem 2 hold. Under H_{1T}, η(Ŝ_{jT}) →_d η(Ŝ_{j∞}), j = 1, 2.

Since the asymptotic distributions of S_{jT}(u) are model dependent, and those of Ŝ_{jT}(u) further depend on the estimation effect, we need to resort to bootstrap methods to implement our tests in practice. In the literature there are several resampling methods suitable for dependent data, but since under H_0 the parametric conditional distribution is fully specified, we apply a conditional parametric bootstrap algorithm that only requires drawing from F_{t,θ̂}(·|Ω_t) to mimic the null distribution of the test statistics. For a discussion of the parametric bootstrap see Stute et al. (1993) and Andrews (1997), which can be adapted to the complications with information truncation and initialization arising in the dynamic case using the discussion in Bai (2003).

To estimate the true 1 − α quantiles c_j(θ_0) of the null asymptotic distribution of the test statistics, given by some continuous functional η applied to Ŝ_{j∞} with δ = 0, we implement the following steps.

1. Estimate the model with data (Y_t, X′_t), t = 1, 2, ..., T, obtain the parameter estimator θ̂_T and compute the test statistics η(Ŝ_{jT}).

2. Simulate Y*_t from F_{θ̂_T}(·|Ω*_t) recursively for t = 1, 2, ..., T, where the bootstrap information set is Ω*_t = (X_t, Y*_{t−1}, X_{t−1}, Y*_{t−2}, X_{t−2}, ...).

3. Estimate the model with the simulated data Y*_t, obtain θ̂*_T using the same method as for θ̂_T, and compute the bootstrapped test statistics η(Ŝ*_{jT}).

4. Repeat steps 2-3 B times and compute the percentiles of the empirical distribution of the B bootstrapped test statistics.

5.
Reject H_0 if η(Ŝ_{jT}) is greater than the (1 − α)th percentile of the empirical distribution of the B bootstrapped test statistics, denoted ĉ*_{jB}(θ̂_T).

To analyze the properties of our parametric bootstrap, we need to assume that the same conditions on the estimation method hold for both the original and the resampled data. More formally, we have

Assumption 6. (A)
The conditional distribution of Y_t given Ω_t coincides with the conditional distribution of Y_t given Ω_t ∪ {X′_k}_{k=t+1}^T.

(B) Suppose that the sample is generated by F_{θ_T}, for some nonrandom sequence θ_T converging to θ_0, i.e. we have a triangular array of random variables {Y_{Tt} : t = 1, 2, ..., T} with (T, t) element generated by F_{θ_T}(·|Ω_{Tt}), where Ω_{Tt} = {X_t, Y_{Tt−1}, X_{t−1}, Y_{Tt−2}, X_{t−2}, ...}. Then the estimator θ̂_T of θ_T admits an asymptotic linear expansion as in Assumption 4. Moreover, assume that under the alternative H_1 there exists some θ_1 ∈ Θ such that θ_1 = plim_{T→∞} θ̂_T.

This assumption ensures that by simulating from the conditional distribution F_{θ_T} we obtain the correct joint distribution of S_{jT} and T^{1/2}(θ̂_T − θ_T), in parallel to those required in Theorems 1-2. Assumption 6(A) says that Y_t and future X_t are independent conditionally on past information, i.e. that there is no direct feedback effect. For example, in a latent-variable form of the ordered probit model, this assumption translates to strict exogeneity, i.e. innovations are independent of future X_t. Dependence between Y_t and future X_t is still allowed through serial dependence in X_t and Y_t. Assumption 6(B) is similar to Condition (5.5) in Burke et al. (1979), Assumption (A1) in Stute et al. (1993) and Assumption E2 in Andrews (1997), and introduces a triangular-array version of the expansion and central limit theorem for parameter estimates; see also the discussion in Section 4.1 in Andrews (1997).

We obtain the following result.

Theorem 4.
Suppose that in addition to the conditions of Theorem 2, Assumption 6 holds. Under H_{1T}, as B, T → ∞,

η(Ŝ*_{jT}) →_d η(Ŝ_{j∞}), j = 1, 2, in probability,

so ĉ*_{jB}(θ̂_T) →_p c_j(θ_0), and therefore, under H_0, Pr(η(Ŝ_{jT}) > ĉ*_{jB}(θ̂_T)) → α. Suppose also that the conditions of Theorem 2 hold for any θ_1 ∈ Θ. Under H_1, as B, T → ∞, ĉ*_{jB}(θ̂_T) = O_p(1).

This theorem shows that the bootstrap test statistic has the same limit distribution as the original one under local alternatives, so that under the null we obtain the correct asymptotic size using bootstrap-estimated critical values, and under local alternatives we obtain nontrivial power when the drifts of the stochastic processes Ŝ_{1T} and Ŝ_{2T} are non-negligible. Similarly, under fixed alternatives we obtain a consistent bootstrap test whenever the asymptotic test itself is consistent, i.e. lim_{T→∞} Pr(η(Ŝ_{jT}) > ĉ*_{jB}(θ̂_T)) = 1 if η(Ŝ_{jT}) diverges asymptotically.

In this section we use a Monte Carlo simulation exercise to investigate the finite-sample properties of the tests proposed in this paper. We take as reference the dynamic ordered discrete choice models investigated in Basu and de Jong (2007) for the modeling of the monetary policy conducted by the Federal Reserve (FED). The dependent variable uses the following codification of the changes in the reference interest rate in the US, the federal funds rate i_t:

Y_t = 1 if ∆i_t < −0.25; 2 if −0.25 ≤ ∆i_t < 0; 3 if 0 ≤ ∆i_t < 0.25; 4 if ∆i_t ≥ 0.25.

Data is monthly and spans January 1990 to December 2006, leading to T = 204 complete observations. The explanatory variables that Basu and de Jong (2007) used to explain the decisions of the FED on ∆i_t are the current value and 4 lags of inflation (inf), the current value and a lag of four different measures of output gap (out) and a series of dummies that describe the decision of the FED in the previous period, dum1_t = 1(∆i_{t−1} < 0), dum2_t = 1(∆i_{t−1} > 0), dum3_t = 1(∆i_{t−1} < −0.25), dum4_t = 1(∆i_{t−1} > 0.25). Instead of these four dummies, we implement an AR(1), 'dynamic' version with one lag of the discrete Y_t as explanatory variable (and a version without lags that we refer to as 'static' to serve as a benchmark to the inclusion of lagged endogenous variables in Ω_t). We consider both the Logit and Probit versions of the models. We fit four versions of the basic model based on different definitions of the output gap and, conditional on the series of inflation and output gap and on the parameter estimates obtained, we simulate series Y_t and conduct our tests on these (see the Monte Carlo scenarios in Table 1).

Table 1: Scenarios for Monte Carlo simulations.
Scenario    Null and Alternative
Size 1      H_0: static probit
Size 2      H_0: static logit
Power 1     H_0: static probit vs H_1: static logit
Power 2     H_0: static probit vs H_1: dynamic probit
Power 3     H_0: static probit vs H_1: dynamic logit

The four choices of output gap lead to Models I-IV. The output gap is the percentage deviation of the actual from the potential output, which is interpolated to obtain a series of monthly frequency by replicating the GDP observation for any quarter to all the months in that quarter. Then two different measures of potential output are used: the potential output series provided by the Congressional Budget Office and a potential output series constructed in a real-time setting using the HP filter, leading to Models I and II.
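The recursive simulation of the discrete response used in step 2 of the bootstrap algorithm, and in the Monte Carlo designs here, can be sketched as follows. This is a minimal illustration assuming a dynamic ordered-probit specification; beta, rho and the cutpoints tau are hypothetical placeholder values, not the estimates reported below:

```python
import numpy as np

def simulate_ordered_response(x, beta, rho, tau, rng):
    """Recursively simulate an ordered discrete response Y_t in {1, ..., K}.

    A latent index y*_t = x_t'beta + rho * Y_{t-1} + eps_t with eps_t ~ N(0, 1)
    (the probit case) is classified by the cutpoints tau_1 < ... < tau_{K-1}:
    Y_t = 1 + #{j : y*_t > tau_j}.
    """
    T = x.shape[0]
    y = np.empty(T, dtype=int)
    y_lag = 0  # initialization of the lagged endogenous variable
    for t in range(T):
        latent = x[t] @ beta + rho * y_lag + rng.standard_normal()
        y[t] = 1 + int(np.sum(latent > tau))
        y_lag = y[t]
    return y

rng = np.random.default_rng(0)
T = 204
x = rng.standard_normal((T, 2))    # stand-in covariates (e.g. inflation, output gap)
beta = np.array([0.5, -0.3])       # hypothetical coefficients
tau = np.array([-1.0, 0.0, 1.0])   # hypothetical cutpoints, so K = 4 categories
y = simulate_ordered_response(x, beta, rho=0.2, tau=tau, rng=rng)
```

In the bootstrap, beta, rho and tau would be replaced by the estimates in θ̂_T, the simulated series re-estimated, and the test statistic recomputed B times.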
Apart from the output gap, other measures of economic activity are used, such as the unemployment rate and capacity utilization, leading to Models III and IV. Data sources are described in Basu and de Jong (2007).

We compare the performance of our tests with alternative tests that are also omnibus and do not require smoothing (and the choice of smoothing parameters). Two general approaches can be adapted to our setup: the test of the Generalized Linear Model (GLM) of Stute and Zhu (2002) and the Conditional Kolmogorov test of Andrews (1997), as discussed in Mora and Moro-Egido (2007). The first is a test based on a marked empirical process for testing the null H′_0 : E[Y | X̃ = x] = m_{β̃_2}(x′β̃_1), where m_{β̃_2}(·) is a parametric link function and β̃_1, β̃_2 are finite-dimensional parameters. In the case where Y takes only two values {0, 1}, the conditional mean coincides with the conditional probability of Y = 1, and the null is similar to our H_0 if we were considering an i.i.d. setup. To test Y_t | X̃_t ∼ P_{β̃_2}(· | X̃′_t β̃_1), define the process

Z_T(y) := T^{−1/2} Σ_{t=1}^T 1{X̃′_t β̃_1 ≤ y} [Y_t − P_{β̃_2}(Y_t = 1 | X̃′_t β̃_1)], y ∈ R.

The second test, by Andrews, is obtained by substituting 1{X̃_t ≤ x̃} (where x̃ is a real vector of the dimension of X̃_t) for 1{X̃′_t β̃_1 ≤ y} in Z_T, but since it always underperforms in the simulations of Mora and Moro-Egido (2007), it is not considered here. If Y takes values {0, ..., K}, Mora and Moro-Egido (2007) substitute testing H_0 by K tests of the hypotheses Y_{jt} | X̃_t ∼ P_{j,β̃_2}(Y_t | X̃′_t β̃_1), with corresponding processes Z_{j,T}, where Y_{jt} = 1{Y_t = j} and j = 1, 2, ..., K; the resulting pooled test statistics are

η_{CvM,Z} = T^{−1} Σ_{j=1}^K Σ_{t=1}^T Z_{j,T}(X̃′_t β̃_1)² and η_{KS,Z} = max_{j=1,...,K} max_{1≤t≤T} |Z_{j,T}(X̃′_t β̃_1)|,

which we call the CvM and KS tests respectively.
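A minimal sketch of the marked empirical process Z_T and the two functionals in the binary case, with data simulated under an assumed probit null and an assumed coefficient value (beta here is a placeholder, not an estimate):

```python
import numpy as np
from math import erf, sqrt

def Phi(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

rng = np.random.default_rng(1)
T = 200
x = rng.standard_normal(T)
beta = 1.0                                   # assumed single-index coefficient
p = np.array([Phi(v) for v in beta * x])     # P(Y = 1 | x'beta) under the probit link
y = (rng.uniform(size=T) < p).astype(float)  # binary response generated under the null

index = beta * x
marks = y - p                                # residual "marks" of the process

def Z_T(val):
    """Z_T(y) = T^{-1/2} sum_t 1{x_t'beta <= y} (Y_t - p_t)."""
    return np.sum((index <= val) * marks) / np.sqrt(T)

z_vals = np.array([Z_T(v) for v in index])   # evaluated at the sample index points
eta_cvm = np.mean(z_vals ** 2)               # Cramer-von Mises-type statistic
eta_ks = np.max(np.abs(z_vals))              # Kolmogorov-Smirnov-type statistic
```

In the multinomial case the same construction is repeated for each indicator Y_{jt} = 1{Y_t = j} and the resulting processes are pooled as in the displayed statistics.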
To apply these tests to our model, let X̃_t = (X′_t, Y_{t−1})′ and β̃_1 = (β′, ρ, −τ)′, and take the corresponding link functions.

We analyze tests based on S_{1T}, R_{1T,M}, R_{1T} and S_{2T}, R_{2T,M}, R_{2T} and Z_T. In all cases we use Kolmogorov-Smirnov (KS) and Cramér-von Mises (CvM) measures. We only consider feasible bootstrap versions of the tests based on Ŝ_{1T}, R̂_{1T,M}, etc., where we replace θ_0 by root-T consistent estimates θ̂_T, the ML estimator in our case. We are not aware of any theoretical results for bootstrap-assisted tests based on Ẑ_T in our setup, although Mora and Moro-Egido (2007) provide some simulations.

Parameter estimates for the real data are reported in Tables 2 and 3. The main question is whether the static Probit or Logit models are appropriate for changes in the interest rates, and we check this with our tests. The p-values in Tables 4 and 5 show that all these models are rejected even at the 1% significance level by the biparameter nonrandomized-transform-based tests. Note that single-parameter static tests (e.g. R̂_{1T}, Ŝ_{1T}) cannot reject any proposed model, with the sole exception of Ŝ_{1T}, which rejects Model II at 5% with the Cramér-von Mises test statistic.

Table 2: ML estimates and standard errors of Models I-IV with static and dynamic specifications and Probit link function applied to the real US data, T = 204.
[The numeric entries of Table 2 are not recoverable from this extraction.]

Table 3 (caption garbled in extraction): ML estimates and standard errors of Models I-IV with static and dynamic specifications, presumably with the Logit link function, applied to the real US data, T = 204. [The numeric entries are not recoverable from this extraction.]
[Table 4 (caption garbled in extraction): p-values for static Probit and Logit link functions applied to the real US data, T = 204, presumably the Cramér-von Mises counterpart of Table 5. The numeric entries are not recoverable.]

Table 5: P-values of Kolmogorov-Smirnov tests for static Probit and Logit link function applied to the real US data, T = 204. [The numeric entries are not recoverable from this extraction.]

[Table 6 (caption garbled in extraction): simulated size/power rates for the nominal 5% level, T = 100, presumably the Cramér-von Mises counterpart of Table 7. The numeric entries are not recoverable.]

Table 7: Simulated size/power rates for the nominal 5% level of Kolmogorov-Smirnov tests of Models I-IV with static and dynamic specifications applied to simulated data, T = 100. [The numeric entries are not recoverable from this extraction.]

[Table 8 (caption garbled in extraction): simulated size/power rates for the nominal 5% level, T = 200, presumably the Cramér-von Mises counterpart of Table 9. The numeric entries are not recoverable.]

Table 9: Simulated size/power rates for the nominal 5% level of Kolmogorov-Smirnov tests of Models I-IV with static and dynamic specifications applied to simulated data, T = 200. [The numeric entries are not recoverable from this extraction.]
To study the reliability of these results we conduct a Monte Carlo experiment using the models estimated on the real data as data generating processes, and obtain the simulations for the discrete response conditional on the covariate time series. In Tables 6 and 7 we provide the empirical size and power results of our tests across simulations for sample size T = 100, static Probit and Logit specifications and the four output-gap choices (Models I to IV). To speed up the simulation procedure, we use the warp bootstrap algorithm of Giacomini, Politis and White (2013). We see that all bootstrap tests provide reasonable size accuracy, with tests based on single-parameter empirical processes underrejecting slightly, while those based on bivariate processes tend to overreject moderately. Kolmogorov-Smirnov and Cramér-von Mises tests perform similarly in all cases, and the choice of the output-gap series does not make large differences either, nor does the introduction of lagged endogenous (discrete) variables in the information set.

The power of the tests for the static Probit model is analyzed against three different alternatives: static Logit, dynamic Probit and dynamic Logit. We see that the tests without randomization, Ŝ_{1T} and Ŝ_{2T}, always perform better than the randomized continuous processes R̂_{1T,M} and R̂_{2T,M}, which in turn dominate R̂_{1T} and R̂_{2T}, thus confirming our theoretical findings. When we compare Probit and Logit specifications while letting the dynamic aspect of the model be well specified (static in both cases), we observe that with this sample size and these specifications it is almost impossible to distinguish Probit from Logit models. The power against the dynamic Probit and Logit alternatives is very high. Since the nature of the misspecification is dynamic, once again bivariate processes should have more power than their single-parameter counterparts, as is confirmed in our simulation results.
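The warp-speed device of Giacomini, Politis and White (2013) replaces the B bootstrap replications inside each Monte Carlo sample with a single bootstrap draw per replication, pooling the draws across replications to form the critical value. A minimal sketch with placeholder ingredients (stat and the standard normal null are illustrative stand-ins, not the paper's test statistics):

```python
import numpy as np

rng = np.random.default_rng(2)

def stat(sample):
    """Placeholder test statistic: scaled absolute sample mean."""
    return abs(sample.mean()) * np.sqrt(len(sample))

def one_bootstrap_stat(sample):
    """One parametric bootstrap draw under the (standard normal) null and its statistic."""
    boot = rng.standard_normal(len(sample))
    return stat(boot)

R, T, alpha = 500, 100, 0.05
stats = np.empty(R)
boot_stats = np.empty(R)
for r in range(R):
    sample = rng.standard_normal(T)             # data generated under the null
    stats[r] = stat(sample)
    boot_stats[r] = one_bootstrap_stat(sample)  # a single draw instead of B of them

crit = np.quantile(boot_stats, 1 - alpha)       # pooled bootstrap critical value
rejection_rate = np.mean(stats > crit)          # close to alpha under the null
```

This cuts the cost of a size/power study from R × B model estimations to roughly 2R, at the price of using a pooled rather than sample-specific critical value.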
It can also be observed that for these alternatives the Cramér-von Mises criterion provides more power than the Kolmogorov-Smirnov tests. As for the alternative tests based on Ẑ_T, they have power comparable to Ŝ_{1T}, sometimes slightly better, and are always outperformed by any bivariate test. This is not surprising, since Ẑ_T has more structure, i.e. it assumes a single-index model for the covariates, but averages across points, thus suffering the same problems as the other single-parameter tests considered here. In Tables 8 and 9 we provide the empirical size and power results of our tests for the larger sample size T = 200. Here the size properties are similar, while power rejection rates are noticeably higher for the dynamic alternatives.

In this paper we have proposed new specification tests for the conditional distribution of discrete data with possibly infinite support. The new tests are functionals of empirical processes based on a nonrandomized transform that solves the implementation problem of the usual PIT for discrete distributions and achieves consistency against a wide class of alternatives. We show the validity of a bootstrap algorithm for approximating the null distribution of the test statistics, which are model and parameter dependent. In our simulation study, we show that our method compares favorably in many relevant situations with other methods available in the literature, and we illustrate the new method in a small application.
In this section we derive the basic properties of the nonrandomized transform, which are required prior to proving the weak convergence results for our empirical process. Without loss of generality, and in order to make the exposition more transparent, we omit the subscripts t, θ_0 and the conditioning set Ω_t, and use the shortcuts I_F(Y, u) = I_{t,θ_0}(Y_t, u) and I_{F,M}(Y, u) = I_{t,θ_0,M}(Y_t, u).

For F ∈ M, F(F⁻(u)) ≥ u > F(F⁻(u) − 1), and equality holds iff u = F(k) for some integer k. For a random variable Y ∼ G ∈ M we find Pr_G(F(Y) < u) = G(F⁻(u) − 1) and g(F⁻(u)) := Pr_G(Y = F⁻(u)) = G(F⁻(u)) − G(F⁻(u) − 1). When G = F, we have that Pr_F(F(Y) < u) = F(F⁻(u) − 1) < u, i.e. F(Y) is not uniform and the expectation of the indicator function 1(F(Y) < u) is never u, as it is for continuous F.

The nonrandomized transform can be written as

I_F(Y, u) = (1 − δ_F(u)) 1{Y = F⁻(u)} + 1{Y < F⁻(u)}, where δ_F(u) := (F(F⁻(u)) − u) / f(F⁻(u)).

Note that δ_F(u) ∈ [0, 1) and that I_F(Y, u) is a piecewise linear (continuous) function, increasing in u. Let

δ_F(u, v) := (δ_F(u ∨ v) − δ_F(u) δ_F(v)) f(F⁻(u ∧ v)) 1{F⁻(u) = F⁻(v)} ∈ [0, u ∧ v ∧ f(F⁻(u ∧ v))],

d(G, F, u, v) := d(G, F, u ∧ v) − (δ_F(u ∨ v) − δ_F(u) δ_F(v)) 1{F⁻(u) = F⁻(v)} (g(F⁻(u)) − f(F⁻(u))).

In Table 10 and Lemma A we list the properties of this transform.
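As a numerical sanity check of this definition (and of Lemma A(i) below with G = F), the following sketch evaluates δ_F and I_F for a Poisson distribution shifted to start at 1, used purely as an illustrative member of M, and verifies that E_F[I_F(Y, u)] = u:

```python
import math

# Shifted Poisson on {1, 2, ...} so that F(0) = 0, as required for members of M
lam, K = 3.0, 80
f = {k: math.exp(-lam) * lam ** (k - 1) / math.factorial(k - 1) for k in range(1, K + 1)}
F = {0: 0.0}
for k in range(1, K + 1):
    F[k] = F[k - 1] + f[k]

def F_inv(u):
    """Generalized inverse F^-(u): the smallest k with F(k) >= u."""
    return next(k for k in range(1, K + 1) if F[k] >= u)

def I(y, u):
    """Nonrandomized transform I_F(y, u) = (1 - delta_F(u)) 1{y = F^-(u)} + 1{y < F^-(u)}."""
    k = F_inv(u)
    delta = (F[k] - u) / f[k]
    return (1.0 - delta) * (y == k) + 1.0 * (y < k)

# Lemma A(i) with G = F: E_F[I_F(Y, u)] = u for every u in (0, 1)
for u in (0.05, 0.3, 0.5, 0.7, 0.95):
    mean = sum(I(y, u) * f[y] for y in range(1, K + 1))
    assert abs(mean - u) < 1e-9
```

The identity follows because (1 − δ_F(u)) f(F⁻(u)) = u − F(F⁻(u) − 1), so the two terms of the transform sum exactly to u in expectation.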
Lemma A. For 0 ≤ v ≤ u ≤ 1 and F, G, H ∈ M:

(i) E_G[I_F(Y, u)] = u + d(G, F, u), where E_G[·] = ∫(·) dG and d(G, F, u) ∈ [−u, 1 − u]. When G = F, the expectation is u.

(ii) I_F(Y, u) I_F(Y, v) = I_F(Y, u ∧ v) − (δ_F(u ∨ v) − δ_F(u) δ_F(v)) 1{Y = F⁻(u) = F⁻(v)}.

(iii) E_G[I_F(Y, u) I_F(Y, v)] = u ∧ v − δ_F(u, v) + d(G, F, u, v).

(iv) |I_F(Y, u) − I_H(Y, u)| ≤ 1 ∧ [(|F(Y) − H(Y)| ∨ |F(Y − 1) − H(Y − 1)|) / (f(Y) ∨ h(Y))]. Moreover, E_F[(I_F(Y, u) − I_H(Y, u))²] ≤ C sup_k |F(k) − H(k)| for an absolute constant C.

(v) |I_F(Y, u) − u − I_F(Y, v) + v| ≤ |u − v| ∨ (1 − f(Y)), and |I_F(Y, u) − u − I_F(Y, v) + v| = |u − v| if u, v ≤ F(Y − 1) or u, v ≥ F(Y). Moreover, E_F[sup_{u,v∈Ψ(ε)} (I_F(Y, u) − u − I_F(Y, v) + v)²] ≤ 2ε, for any interval Ψ(ε) ⊂ [0, 1] of length ε.

Table 10 presents the values of I(·, ·) for all possible values of Y relative to the inverted cdfs at the points u and v. For instance, I_F(Y, u) − I_F(Y, v) = 0 if
Y < F⁻(u) and Y < F⁻(v), while I_F(Y, u) − I_F(Y, v) = −δ_F(u) if Y = F⁻(u) and Y < F⁻(v).

[Table 10, which tabulates the values of I_F(Y, u) − I_F(Y, v), I_F(Y, u) I_F(Y, v), I_F(Y, u) − I_H(Y, u) and I_F(Y, u) I_H(Y, u) for all configurations of Y relative to F⁻(u), F⁻(v) and H⁻(u), is not recoverable from this extraction.]

(vi) E_{F_z}[1{F†(Y†) < u}] = I_F(Y, u).

(vii) E_{F_z}[I_{F,M}(Y, u) I_{F,M}(Y, v)] = M⁻¹ I_F(Y, u ∧ v) + (1 − M⁻¹) I_F(Y, u) I_F(Y, v).

In this section we present Lindeberg-Feller-type sufficient conditions for the functional weak convergence of discrete martingales. In general, to establish weak convergence one needs to check tightness and finite-dimensional convergence. In the case of martingales, both parts can be verified without imposing restrictive conditions. Here we state a result of Nishiyama (2000), which extends Theorem 2.11.9 of van der Vaart and Wellner (1996) to martingales; see also Theorem A.1 in Delgado and Escanciano (2007). Further details on notation and definitions can be found in the books of van der Vaart and Wellner (1996) for empirical processes and row-independent triangular arrays and of Jacod and Shiryaev (2003) for finite-dimensional semimartingales. For every T, let (Ω_T, F_T, {F_{Tt}}, P_T) be a discrete stochastic basis, where (Ω_T, F_T, P_T) is a probability space equipped with a filtration {F_{Tt}}. For a nonempty set Ψ, let {ξ_{Tt}}_{t=1,2,...} be an ℓ∞(Ψ)-valued martingale difference array with respect to the filtration F_{Tt}, i.e.
for every t, ξ_{Tt} maps Ω_T into ℓ∞(Ψ), the space of bounded R-valued functions on Ψ with sup-norm ‖·‖ = ‖·‖_∞, and for each u ∈ Ψ, ξ_{Tt}(u) is an R-valued martingale difference array: ξ_{Tt}(u) is F_{Tt}-measurable and E[ξ_{Tt}(u) | F_{T,t−1}] = 0. We are interested in studying the weak convergence of the discrete martingales Σ_{t=1}^T ξ_{Tt}. Denote a decreasing series of finite partitions (DFP) of Ψ by Π = {Π(ε)}_{ε∈(0,1)∩Q}, where Π(ε) = {Ψ(ε; k)}_{1≤k≤N_Π(ε)} is such that Ψ = ∪_{k=1}^{N_Π(ε)} Ψ(ε; k), N_Π(1) = 1 and lim_{ε→0} N_Π(ε) = ∞ monotonically in ε. The ε-entropy of the DFP Π is H_Π(ε) = √(log N_Π(ε)). The quadratic Π-modulus of {ξ_{Tt}} is the R_+ ∪ {∞}-valued process

‖ξ_T‖_{Π,T} = sup_{ε∈(0,1)∩Q} ε⁻¹ max_{1≤k≤N_Π(ε)} √( Σ_{t=1}^T E[ sup_{u,v∈Ψ(ε;k)} (ξ_{Tt}(u) − ξ_{Tt}(v))² | F_{T,t−1} ] ).   (8)

Theorem A. Let {ξ_{Tt}}_{t=1,2,...} be an ℓ∞(Ψ)-valued martingale difference array and suppose that:

(N1) (conditional variance convergence) Σ_{t=1}^T E[ξ_{Tt}(u) ξ_{Tt}(v) | F_{T,t−1}] →_{P_T} V(u, v) for every u, v ∈ Ψ;

(N2) (Lindeberg condition) Σ_{t=1}^T E[‖ξ_{Tt}‖² 1{‖ξ_{Tt}‖ > ε} | F_{T,t−1}] →_{P_T} 0 for every ε > 0;

(N3) (partitioning entropy condition) there exists a DFP Π of Ψ such that ‖ξ_T‖_{Π,T} = O_{P_T}(1) and ∫_0^1 H_Π(ε) dε < ∞.

Then Σ_{t=1}^T ξ_{Tt} ⇒ S, where S has normal marginals (S(v_1), S(v_2), ..., S(v_d)) ∼_d N(0, Σ) with covariance Σ = {V(v_i, v_j)}_{ij}.

To establish the asymptotic properties of the biparameter process S_{2T} we need the following assumption for the uniform convergence of different empirical quantities.

Assumption A.
Under H_{1T}, the following uniform limits to continuous functions exist:

1. plim_{T→∞} T⁻¹ Σ_{t=2}^T γ_{t−1,θ_0}(u_1, v_1) γ_{t,θ_0}(u_2, v_2),
2. plim_{T→∞} T⁻¹ Σ_{t=2}^T I_{t−1,θ_0}(v_1) γ_{t,θ_0}(u_2, v_2),
3. plim_{T→∞} T⁻¹ Σ_{t=2}^T I_{t−1,θ_0}(u_1) d(H_t(·|Ω_t), F_{t,θ_0}(·|Ω_t), u_2),
4. plim_{T→∞} T⁻¹ Σ_{t=2}^T I_{t−1,θ_0}(u_1) E[I_{t,θ_0}(u_2) ℓ_t(Y_t, Ω_t) | Ω_t],
5. plim_{T→∞} T⁻¹ Σ_{t=2}^T I_{t−1,θ_0}(u_1) ∇(F_{t,θ_0}(·|Ω_t), u_2).

As discussed in the text, these conditions restrict the dynamics of the data process so that some LLN holds, which is the case, e.g., for stationary and ergodic processes.

Proof of Lemma A. (i) By the definition of I_F(Y, u), E_G[I_F(Y, u)] = (1 − δ_F(u)) g(F⁻(u)) + G(F⁻(u)) − g(F⁻(u)) = d(G, F, u) − δ_F(u) f(F⁻(u)) + F(F⁻(u)) = d(G, F, u) + u. Similarly, by direct calculation we obtain (ii), (iii), (vi) and (vii). We now provide a detailed proof of (iv) and (v).

(iv) We prove a stronger result: for G ∈ M such that sup_k |F(k) − G(k)| ∨ |H(k) − G(k)| ≤ sup_k |F(k) − H(k)|, the expectation with respect to G is bounded, E_G[(I_F(Y, u) − I_H(Y, u))²] ≤ C sup_k |F(k) − H(k)|. The required bound is then obtained by setting G ≡ F. Since |I_F(Y, u) − I_H(Y, u)| never exceeds 1, we have E_G[(I_F(Y, u) − I_H(Y, u))²] ≤ E_G[|I_F(Y, u) − I_H(Y, u)|], so we bound the latter expectation.

Suppose that F⁻(u) = H⁻(u). Then I_F(Y, u) − I_H(Y, u) = δ_H(u) − δ_F(u) for Y = F⁻(u), i.e. with probability g(F⁻(u)), and is zero for other Y.
Therefore,

E_G[|I_F(Y, u) − I_H(Y, u)|] = |δ_H(u) − δ_F(u)| g(F⁻(u)) ≤ |F(F⁻(u)) − H(F⁻(u))| + |f(F⁻(u)) − g(F⁻(u))| δ_F(u) + |h(F⁻(u)) − g(F⁻(u))| δ_H(u) ≤ sup_k |F(k) − H(k)| + sup_k |f(k) − g(k)| + sup_k |h(k) − g(k)| ≤ C sup_k |F(k) − H(k)|,

since δ_F(u), δ_H(u) ∈ [0, 1), sup_k |f(k) − g(k)| ≤ 2 sup_k |F(k) − G(k)| and sup_k |h(k) − g(k)| ≤ 2 sup_k |H(k) − G(k)|.

Suppose that F⁻(u) < H⁻(u). Note that I_F(Y, u) − I_H(Y, u) = 0 for Y outside [F⁻(u), H⁻(u)]. We separately bound each term in

E_G[|I_F(Y, u) − I_H(Y, u)|] = E_G[|I_F(Y, u) − I_H(Y, u)| 1{Y = F⁻(u)}] + E_G[|I_F(Y, u) − I_H(Y, u)| 1{Y = H⁻(u)}] + E_G[|I_F(Y, u) − I_H(Y, u)| 1{F⁻(u) < Y < H⁻(u)}].

For Y = F⁻(u), I_F(Y, u) − I_H(Y, u) = −δ_F(u). Then

E_G[|I_F(Y, u) − I_H(Y, u)| 1{Y = F⁻(u)}] = δ_F(u) g(F⁻(u)) = F(F⁻(u)) − u + δ_F(u) (g(F⁻(u)) − f(F⁻(u))) ≤ sup_k |F(k) − H(k)| + sup_k |f(k) − g(k)| ≤ C sup_k |F(k) − H(k)|,

since δ_F(u) ∈ [0, 1) and for u ∈ [H(F⁻(u)), F(F⁻(u))] we have that F(F⁻(u)) − u ≤ F(F⁻(u)) − H(F⁻(u)).

For Y = H⁻(u), I_F(Y, u) − I_H(Y, u) = −(1 − δ_H(u)). Then

E_G[|I_F(Y, u) − I_H(Y, u)| 1{Y = H⁻(u)}] = (1 − δ_H(u)) g(H⁻(u)) = u − H(H⁻(u) − 1) + (1 − δ_H(u)) (g(H⁻(u)) − h(H⁻(u))) ≤ sup_k |F(k) − H(k)| + sup_k |h(k) − g(k)| ≤ C sup_k |F(k) − H(k)|,

since δ_H(u) ∈ [0, 1) and for u ∈ [H(H⁻(u) − 1), F(H⁻(u) − 1)] we have that u − H(H⁻(u) − 1) ≤ F(H⁻(u) − 1) − H(H⁻(u) − 1).

For F⁻(u) < Y < H⁻(u), I_F(Y, u) − I_H(Y, u) = −1. Then

E_G[|I_F(Y, u) − I_H(Y, u)| 1{F⁻(u) < Y < H⁻(u)}] = Σ_{k=F⁻(u)+1}^{H⁻(u)−1} g(k) = G(H⁻(u) − 1) − G(F⁻(u)) ≤ F(H⁻(u) − 1) − F(F⁻(u)) + 2 sup_k |G(k) − F(k)| ≤ F(H⁻(u) − 1) − H(H⁻(u) − 1) + 2 sup_k |G(k) − F(k)| ≤ C sup_k |F(k) − H(k)|,

since H(H⁻(u) − 1) < u ≤ F(F⁻(u)) ≤ F(H⁻(u) − 1). Summing up, E_G[|I_F(Y, u) − I_H(Y, u)|] ≤ C sup_k |F(k) − H(k)| for F⁻(u) < H⁻(u). This bound is symmetric in F and H; therefore it also holds for F⁻(u) > H⁻(u).

(v) Let [a, b] denote the interval Ψ(ε) of length ε, and let sup ξ denote the supremum of ξ over u, v ∈ [a, b], where ξ := I_F(Y, u) − u − I_F(Y, v) + v. Note that |ξ| ≤ 1; moreover, if [F(Y − 1), F(Y)] ∩ [a, b] = ∅, then sup |ξ| = ε, and if [a, b] ⊂ [F(Y − 1), F(Y)], then sup |ξ| = ((1 − f(Y)) / f(Y)) ε.

Suppose that F⁻(a) = F⁻(b), i.e. [a, b] ⊂ [F(F⁻(a) − 1), F(F⁻(a))]. Then

E_F[sup ξ²] ≤ E_F[sup |ξ|] = ε Σ_{k ≠ F⁻(a)} f(k) + ((1 − f(F⁻(a))) / f(F⁻(a))) ε f(F⁻(a)) = 2 (1 − f(F⁻(a))) ε ≤ 2ε.

Suppose that F⁻(a) < F⁻(b), i.e. [a, b] contains at least one point F(k), or even intervals [F(k − 1), F(k)] ⊂ [a, b]. On such intervals |ξ| goes up to 1 − f(k), but the probability of Y taking all such k is bounded by b − a. More precisely, E_F[sup ξ²] ≤ E_F[sup |ξ|] = ε Σ_k
Under $H_0$,
\[
\mathrm{E}\bigl[\mathbf{1}\{I_{t,\theta_0}(u) \le u\} \mid \Omega_t\bigr] = 1 - F_{\theta_0}\bigl(F_{\theta_0}^-(u \mid \Omega_t) \mid \Omega_t\bigr) + \mathbf{1}\bigl\{1 - \delta_{F_{\theta_0}(\cdot \mid \Omega_t)}(u) \le u\bigr\}\, f_{\theta_0}\bigl(F_{\theta_0}^-(u \mid \Omega_t) \mid \Omega_t\bigr),
\]
which depends on $\Omega_t$, and therefore $\mathrm{E}(\mathbf{1}\{I_{t,\theta_0}(u) \le u\} \mid \Omega_t) \ne \mathrm{E}(\mathbf{1}\{I_{t,\theta_0}(u) \le u\})$ with positive probability, and independence does not follow in general.

Proof of Lemma 2. Because the $U_t^r(\theta)$ are continuous, $\widehat F_\theta^r(u)$ is a (uniformly) consistent estimate of the cdf of $U_t^r(\theta)$. Then, by Lemma A(vi) and A(vii) and the ULLN, we get the uniform consistency of $\widehat F_{\theta,M}^r(u)$ and $\widetilde F_\theta^r(u)$. The efficiency gain comes from Lemma A(ii).

Proof of Lemma 3.
We need to verify Conditions N1-N3 of Theorem A. Fix $\varepsilon > 0$, take $\mathcal{H} = [0,1]$ with the usual norm and the equidistant partition $0 = u_0 < u_1 < \ldots < u_{N_\Pi(\varepsilon)} = 1$, i.e. a partition of $[0,1]$ into $N_\Pi(\varepsilon) = [\varepsilon^{-1}] + 1$ equal intervals of length $\varepsilon$ (the last interval may be even smaller), $\Psi(\varepsilon; k) = [u_{k-1}, u_k]$ and $\xi_{Tt} = (I_F(Y_t, u) - u)/\sqrt{T}$, which is a square integrable martingale difference by Lemma 1. Then Condition N1 follows from Lemma 1 and Assumption 1. Condition N2 is satisfied because for $T > [\varepsilon^{-2}]$ the indicator $\mathbf{1}\{\sup_{u \in [0,1]} |I_F(Y_t, u) - u|/\sqrt{T} > \varepsilon\} = 0$. Condition N3 follows from the bound in Lemma A(v). Indeed, $\int_0^1 H_\Pi(\varepsilon)^{1/2}\, d\varepsilon < \infty$ and $\|\xi_{Tt}\|_{\Pi,k} \le \sqrt{6\varepsilon/T}$ uniformly in $\varepsilon \in (0,1] \cap \mathbb{Q}$ and $1 \le k \le N_\Pi(\varepsilon)$.

Proof of Lemma 4.
Apply the weak convergence result from Lemma 3 under $G_{T,\theta_0}(\cdot \mid \Omega_t)$ with
\[
\xi_{Tt} := \Bigl( I_{F_{\theta_0}(\cdot \mid \Omega_t)}(Y_t, u) - u - d\bigl(G_{T,\theta_0}(\cdot \mid \Omega_t), F_{\theta_0}(\cdot \mid \Omega_t), u\bigr) \Bigr)/\sqrt{T},
\]
which is a square integrable martingale difference because of Lemma A(i) with $G = G_{T,\theta_0}(\cdot \mid \Omega_t)$ and $F = F_{\theta_0}(\cdot \mid \Omega_t)$. Then Condition N1 follows from Lemma A(iii) and the fact that the $d(G, F, u, v)$ are bounded in absolute value by $\delta T^{-1/2}$ a.s. Condition N2 is satisfied because for $T > [\varepsilon^{-2}]$ the indicator is 0. Condition N3 follows from the bound in Lemma A(v) and the fact that $(\mathrm{E}_G[\cdot] - \mathrm{E}_F[\cdot])$ applied to a.s. bounded random variables is bounded in absolute value by $\delta T^{-1/2}$ a.s. We obtain that $\sum_{t=1}^T \xi_{Tt} \Rightarrow S$, the same limit as in Lemma 3. Finally, use the additivity of $d(\cdot,\cdot,\cdot)$ in the first argument and apply the ULLN to
\[
S_T - \sum_{t=1}^T \xi_{Tt} = \sum_{t=1}^T d\bigl(G_{T,\theta_0}(\cdot \mid \Omega_t), F_{\theta_0}(\cdot \mid \Omega_t), u\bigr)/\sqrt{T} = \delta \sum_{t=1}^T d\bigl(H(\cdot \mid \Omega_t), F_{\theta_0}(\cdot \mid \Omega_t), u\bigr)/T.
\]

Proof of Lemma 5.
Under $H_T$, i.e. under $G_{T,\theta_0}$, Equation (4) can be established using standard methods, applying the Doob and Rosenthal inequalities for MDS (Hall and Heyde, 1980). Let
\[
\sqrt{T}\, \xi_{Tt} := I_{F_{\widehat\theta_T}(\cdot \mid \Omega_t)}(Y_t, u) - I_{F_{\theta_0}(\cdot \mid \Omega_t)}(Y_t, u) - d\bigl(G_{T,\theta_0}(\cdot \mid \Omega_t), F_{\widehat\theta_T}(\cdot \mid \Omega_t), u\bigr) + d\bigl(G_{T,\theta_0}(\cdot \mid \Omega_t), F_{\theta_0}(\cdot \mid \Omega_t), u\bigr).
\]
Define $z_T := \sum_{t=1}^T \xi_{Tt}$. When necessary, we write the arguments explicitly: $z_T(u, \widehat\theta_T)$. We show that $\sup_u |z_T| = o_p(1)$. Since $\sqrt{T}(\widehat\theta_T - \theta_0) = O_P(1)$, it is sufficient to establish that for some $\gamma < 1/2$,
\[
\sup_{u,\, \|\eta - \theta_0\| \le T^{-\gamma}} |z_T(u, \eta)| = o_p(1).
\]
Note that for $T > \delta^2/\nu^2$, by Assumption 3C,
\[
\Pr\Bigl( \sup_{\eta, t} \max_y \bigl|G_{T,t,\theta_0}(y \mid \Omega_t) - F_{t,\eta}(y \mid \Omega_t)\bigr| > \nu \Bigr) \le M_F T^{-\gamma}/\nu. \tag{9}
\]
First, we show that for fixed $\eta$ and $u$, $|z_T| = o_p(1)$. Since the $\xi_{Tt}$ are bounded in absolute value by $2/\sqrt{T}$ and form a martingale difference sequence with respect to $\Omega_t$, by the Doob inequality, for all $p \ge 2$ and $\varepsilon > 0$,
\[
P\Bigl( \max_{t=1,\ldots,T} |z_t| > \varepsilon \Bigr) \le \mathrm{E}|z_T|^p/\varepsilon^p,
\]
and by the Rosenthal inequality, for all $p \ge 2$ there exists $C$ such that
\[
\mathrm{E}|z_T|^p \le C\biggl[ \mathrm{E}\Bigl\{ \sum_t \mathrm{E}\bigl( \xi_{Tt}^2 \mid \Omega_t \bigr) \Bigr\}^{p/2} + \sum_t \mathrm{E}|\xi_{Tt}|^p \biggr].
\]
Take $p = 4$. The first term is small because of the bounds in Lemma A(iv) and (9). Because $|\xi_{Tt}| \le 2/\sqrt{T}$, $\sum_t \mathrm{E}|\xi_{Tt}|^p \le 2^p T^{1-p/2}$. Therefore we have a pointwise bound. Uniformity in $(u, \eta)$ can be established using the monotonicity of $I_{F_\theta(\cdot \mid \Omega_t)}(Y_t, u)$ in $u$ and the continuity of $d(G_{T,\theta_0}(\cdot \mid \Omega_t), F_{\widehat\theta_T}(\cdot \mid \Omega_t), u)$, employing the bounds in Lemma A(iv) and (9). Finally, use that, uniformly in $u$,
\[
\frac{1}{\sqrt{T}} \sum_t \Bigl( d\bigl(G_{T,\theta_0}(\cdot \mid \Omega_t), F_{\widehat\theta_T}(\cdot \mid \Omega_t), u\bigr) - d\bigl(G_{T,\theta_0}(\cdot \mid \Omega_t), F_{\theta_0}(\cdot \mid \Omega_t), u\bigr) \Bigr) = \sqrt{T}\bigl(\widehat\theta_T - \theta_0\bigr)' \frac{1}{T} \sum_t \nabla\bigl(F_{\theta_0}(\cdot \mid \Omega_t), u\bigr) + o_p(1).
\]

Proof of Theorem 1.
The joint weak convergence (6) follows from finite-dimensional convergence by the CLT for MDS, while tightness was established in the proof of Lemma 4.

Proof of Theorem 2.
Note that
\[
S_{2T} = \sum_{t=2}^{T-1} \xi_{Tt} + T^{-1/2}\bigl\{ (I_{T,\theta_0}(u_1) - u_1)\, I_{T-1,\theta_0}(u_2) + u_1 (I_{1,\theta_0}(u_2) - u_2) \bigr\},
\]
where
\[
\xi_{Tt} := T^{-1/2}\bigl\{ (I_{t,\theta_0}(u_1) - u_1)\, I_{t-1,\theta_0}(u_2) + u_1 (I_{t,\theta_0}(u_2) - u_2) \bigr\}
\]
is a square integrable martingale difference by Lemma 1. The rest is similar to the proof of Theorem 1. To obtain $S_{2T}(u) \Rightarrow S_{2\infty}(u)$ under $H_0$, verify Conditions N1-N3 of Theorem A for $\xi_{Tt}$ as is done in the proof of Lemma 3. The covariance function of $S_{2\infty}(u)$ is
\[
\begin{aligned}
V(u, v) :={} & (u_1 \wedge v_1)(u_2 \wedge v_2) - u_1 v_1 u_2 v_2 + (u_1 \wedge v_1)\, \operatorname*{plim}_{T\to\infty} \frac{1}{T} \sum_{t=2}^T \gamma_{t-1,\theta_0}(u_2, v_2) \\
& - \operatorname*{plim}_{T\to\infty} \frac{1}{T} \sum_{t=2}^T \gamma_{t,\theta_0}(u_1, v_1)\bigl( I_{t-1,\theta_0}(u_2 \wedge v_2) - \gamma_{t-1,\theta_0}(u_2, v_2) \bigr) \\
& + (u_2 \wedge v_1) u_1 v_2 - u_1 \operatorname*{plim}_{T\to\infty} \frac{1}{T} \sum_{t=2}^T \gamma_{t,\theta_0}(u_2, v_1)\, I_{t-1,\theta_0}(v_2) \\
& + (u_1 \wedge v_2) u_2 v_1 - v_1 \operatorname*{plim}_{T\to\infty} \frac{1}{T} \sum_{t=2}^T \gamma_{t,\theta_0}(u_1, v_2)\, I_{t-1,\theta_0}(u_2).
\end{aligned}
\]
Under $H_T$, apply the same weak convergence result under $G_{T,t,\theta_0}(\cdot \mid \Omega_t)$ with
\[
\zeta_{Tt} := \xi_{Tt} - I_{t-1,\theta_0}(u_2)\, d\bigl(G_{T,t,\theta_0}(\cdot \mid \Omega_t), F_{t,\theta_0}(\cdot \mid \Omega_t), u_1\bigr)/\sqrt{T} - u_1\, d\bigl(G_{T,t,\theta_0}(\cdot \mid \Omega_t), F_{t,\theta_0}(\cdot \mid \Omega_t), u_2\bigr)/\sqrt{T},
\]
which is a square integrable martingale difference because of Lemma A(i) with $G = G_{T,t,\theta_0}(\cdot \mid \Omega_t)$ and $F = F_{t,\theta_0}(\cdot \mid \Omega_t)$. Then proceed as in the proof of Lemma 4. In order to establish (7), repeat the steps of the proof of Lemma 5 for $\widetilde\zeta_{Tt} := \zeta_{Tt} - \widehat\zeta_{Tt}$, where $\widehat\zeta_{Tt}$ is $\zeta_{Tt}$ with $F_{t,\widehat\theta_T}$ in place of $F_{t,\theta_0}$.

Proof of Theorem 4.
Repeat the arguments of the proofs of Theorems 1 and 2 for a sample generated by $F_{\theta_T}$, defined in Assumption 6, to obtain conditional convergence. Then proceed as in the proof of Corollary 1 in Andrews (1997).

Checking assumptions for the Poisson model

Here we write $Y_t$ for $Y_t^\star$. For the Poisson model $Y_t \mid \Omega_t \sim \mathrm{Poisson}(\lambda_t)$ the probability distribution is
\[
\Pr(Y_t = k \mid \Omega_t) = P_{\lambda_t}(k) = \frac{\lambda_t^k \exp(-\lambda_t)}{k!}
\]
and the cumulative distribution function is
\[
F_{t,\theta}(k \mid \Omega_t) = \sum_{j=0}^k \Pr(Y_t = j \mid \Omega_t) = \sum_{j=0}^k \frac{\lambda_t^j \exp(-\lambda_t)}{j!} = Q(k+1, \lambda_t),
\]
where $Q(\cdot,\cdot)$ is the regularized gamma function, and $\lambda_t = \lambda_t(\beta) = \exp(X_t'\beta)$, $t = 1, 2, \ldots$. If the covariates $X_t$ are iid or stationary and ergodic, and $\Omega_t$ omits lags of the dependent variable $Y_t$, then the LLN applies both under the null and under local alternatives (like, e.g., the local alternative considered in Eq. (2.12) in Cameron and Trivedi, 1990) to justify Assumptions 2-6 and Assumption A, which involve functions of $\Omega_t$ that are uniformly continuous in $u$. However, it can also be interesting to allow the intensity to depend on lags of the dependent variable. For simplicity we consider $AR(1)$ dynamics; $AR(p)$ can be treated similarly but is lengthier. The parameters enter through
\[
\lambda_t = \lambda_t(\theta) = \alpha_0 + \alpha_1 \lambda_{t-1} + \rho Y_{t-1}, \quad t = 1, 2, \ldots,
\]
and are gathered in $\theta = (\alpha_0, \alpha_1, \rho)'$. We assume that $\alpha_0, \alpha_1, \rho$ are positive, $\lambda_0$ and $Y_0$ are fixed and $\alpha_1 + \rho < 1$. Under these conditions, there exists a unique stationary and ergodic solution to this model (Fokianos et al., 2009). Such data generating processes allow us to use results on (generic, uniform) LLN, which facilitates the checking of the assumptions in the paper. Conditions for stationarity and ergodicity for nonlinear $\lambda_t(\theta)$ can be found in Neumann (2011) and are directly applicable to the analysis under the null hypothesis. However, we are not aware of LLN results for these models under local alternatives, although Fokianos and Neumann (2013, Proposition 2.3(ii)) use related arguments.

Let $\lambda_{t,0} = \lambda_t(\theta_0)$; the null hypothesis is $Y_t \mid \Omega_t \sim \mathrm{Poisson}(\lambda_{t,0})$ for some $\theta_0 \in \Theta$. Then $U_t = Q(Y_t + 1, \lambda_{t,0})$ and $U_t^- = Q(Y_t, \lambda_{t,0})$, and the nonrandomized transform $I_{t,\theta_0}(u)$ for $u \in [0,1]$ is
\[
I_{t,\theta_0}(u) = \begin{cases} 0, & u \le Q(Y_t, \lambda_{t,0}); \\[4pt] \dfrac{u - Q(Y_t, \lambda_{t,0})}{\lambda_{t,0}^{Y_t} \exp(-\lambda_{t,0})/Y_t!}, & Q(Y_t, \lambda_{t,0}) \le u \le Q(Y_t + 1, \lambda_{t,0}); \\[4pt] 1, & Q(Y_t + 1, \lambda_{t,0}) \le u, \end{cases}
\]
from where one obtains the empirical processes and the test statistics defined in Sections 1-2.

Now consider Assumption 1. For the Poisson model,
\[
\gamma_{t,\theta_0}(u, v) = \frac{\bigl( Q(k+1, \lambda_{t,0}) - u \vee v \bigr)\bigl( u \wedge v - Q(k, \lambda_{t,0}) \bigr)}{\lambda_{t,0}^k \exp(-\lambda_{t,0})/k!}\, \mathbf{1}\{k(u) = k(v)\},
\]
where $k = k(u) = \min\{y : Q(y+1, \lambda_{t,0}) \ge u\}$. For the Poisson DGP described above, $Y_t$ is stationary and ergodic, and $\gamma_\infty(u,v) := \mathrm{E}[\gamma_{1,\theta_0}(u,v)]$ satisfies Assumption 1. By the same argument, Assumptions 2, 3D, 4C and 5 are fulfilled. Assumptions 3A and 3B are trivial. For Assumption 3C note that
\[
\dot F_{t,\theta}(k \mid \Omega_t) = \Biggl( \sum_{j=0}^{k-1} \frac{\lambda_t^j}{j!} - \sum_{j=0}^{k} \frac{\lambda_t^j}{j!} \Biggr) \exp(-\lambda_t)\, \dot\lambda_t = -\frac{\lambda_t^k}{k!} \exp(-\lambda_t)\, \dot\lambda_t,
\]
where
\[
\dot\lambda_t = \Bigl( 1 + \alpha_1 \frac{\partial \lambda_{t-1}}{\partial \alpha_0},\; \lambda_{t-1} + \alpha_1 \frac{\partial \lambda_{t-1}}{\partial \alpha_1},\; Y_{t-1} + \alpha_1 \frac{\partial \lambda_{t-1}}{\partial \rho} \Bigr)'.
\]
The last expression can be iterated back to $t = 1$, and because $\alpha_1 < 1$ the resulting geometric weights are summable, so the required bound follows.
We thank Juan Mora for useful comments. Support from the Ministerio de Economía y Competitividad (Spain), grants ECO2012-31748, ECO2014-57007-P and MDM 2014-0431, from the Comunidad de Madrid, MadEco-CM (S2015/HUM-3444), and from the Fundación Ramón Areces is gratefully acknowledged.
References

[1] Andrews, D.W.K. (1997). A conditional Kolmogorov test. Econometrica, 65, 1097-1128.

[2] Bai, J. (2003). Testing parametric conditional distributions of dynamic models. Review of Economics and Statistics, 85, 531-549.

[3] Basu, D. and R. de Jong (2007). Dynamic multinomial ordered choice with an application to the estimation of monetary policy rules. Studies in Nonlinear Dynamics and Econometrics, 11.

[4] Burke, M.D., Csörgő, M., Csörgő, S. and P. Révész (1978). Approximation of the empirical process when parameters are estimated. Annals of Probability, 7, 790-810.

[5] Cameron, A.C. and P.K. Trivedi (1990). Regression-based tests for overdispersion in the Poisson model. Journal of Econometrics, 46, 347-364.

[6] Corradi, V. and R. Swanson (2006). Bootstrap conditional distribution tests in the presence of dynamic misspecification. Journal of Econometrics, 133, 779-806.

[7] Czado, C., Gneiting, T. and L. Held (2009). Predictive model assessment for count data. Biometrics, 65, 1254-1261.

[8] Davis, R.A., W.T.M. Dunsmuir and S.B. Streett (2003). Observation-driven models for Poisson counts. Biometrika, 90, 777-790.

[9] Delgado, M. and J.C. Escanciano (2007). Nonparametric tests for conditional symmetry in dynamic models. Journal of Econometrics, 141, 652-682.

[10] Journal of Econometrics.

[11] Dolado, J.J. and R. María-Dolores (2002). Evaluating changes in the Bank of Spain's interest rate target: an alternative approach using marked point processes. Oxford Bulletin of Economics and Statistics, 64, 159-182.

[12] Doukhan, P., K. Fokianos and D. Tjøstheim (2012). On weak dependence conditions for Poisson autoregressions. Statistics and Probability Letters, 82, 942-948.

[13] Fokianos, K. and M. Neumann (2013). A goodness-of-fit test for Poisson count processes. Electronic Journal of Statistics, 7, 793-819.

[14] Fokianos, K., A. Rahbek and D. Tjøstheim (2009). Poisson autoregression. Journal of the American Statistical Association, 104, 1430-1439.

[15] Bernoulli, 20, 1344-1371.

[16] Giacomini, R., Politis, D.N. and H. White (2013). A warp-speed method for conducting Monte Carlo experiments involving bootstrap estimators. Econometric Theory, 29, 567-589.

[17] Greene, W.H. and D.A. Hensher (2010). Modeling Ordered Choices: A Primer. Cambridge University Press.

[18] Hamilton, J.D. and O. Jordá (2002). A model of the Federal Funds rate target. Journal of Political Economy, 110, 1135-1167.

[19] Hall, P. and C.C. Heyde (1980). Martingale Limit Theory and Its Application. Academic Press, New York.

[20] Jacod, J. and A.N. Shiryaev (2003). Limit Theorems for Stochastic Processes. 2nd ed. Springer, Berlin.

[21] Jung, R.C., M. Kukuk and R. Liesenfeld (2006). Time series of count data: modeling, estimation and diagnostics. Computational Statistics and Data Analysis, 51, 2350-2364.

[22] Kauppi, H. and P. Saikkonen (2008). Predicting U.S. recessions with dynamic binary response models. Review of Economics and Statistics, 90, 777-791.

[23] Kedem, B. and K. Fokianos (2002). Regression Models for Time Series Analysis. Wiley, New Jersey.

[24] Kheifets, I. (2015). Specification tests for nonlinear time series models. Econometrics Journal, 18, 67-94.

[25] Kheifets, I. and C. Velasco (2013). Model adequacy checks for discrete choice dynamic models. In X. Chen and N.R. Swanson (eds.), Recent Advances and Future Directions in Causality, Prediction, and Specification Analysis: Essays in Honor of Halbert L. White Jr., 363-382.

[26] Lee, S. (2014). Goodness of fit test for discrete random variables. Computational Statistics and Data Analysis, 69, 92-100.

[27] Machado, J.A.F. and J.M.C. Santos Silva (2005). Quantiles for counts. Journal of the American Statistical Association, 100, 1226-1237.

[28] Journal of Econometrics.

[29] Neumann, M.H. (2011). Absolute regularity and ergodicity of Poisson count processes. Bernoulli, 17, 1268-1284.

[30] Nishiyama, Y. (2000). Weak convergence of some classes of martingales with jumps. Annals of Probability, 28, 685-712.

[31] Rydberg, T.H. and N. Shephard (2003). Dynamics of trade-by-trade price movements: decomposition and models. Journal of Financial Econometrics, 1, 2-25.

[32] Rosenblatt, M. (1952). Remarks on a multivariate transformation. Annals of Mathematical Statistics, 23, 470-472.

[33] Startz, R. (2008). Binomial autoregressive moving average models with an application to U.S. recessions. Journal of Business and Economic Statistics, 26, 1-8.

[34] Stute, W., González Manteiga, W. and M. Presedo Quindimil (1993). Bootstrap based goodness-of-fit tests. Metrika, 40, 243-256.

[35] Stute, W. and L.-X. Zhu (2002). Model checks for generalized linear models. Scandinavian Journal of Statistics, 29, 535-545.

[36] van der Vaart, A.W. and J.A. Wellner (1996). Weak Convergence and Empirical Processes. Springer, New York.